Assembly Language Programming. MASM & Intel Architecture Documents

CSCI 240 - Assembly Language Programming - MASM & Intel Docs
MASM Documentation
Getting Started
Reference Guide
Programmer's Guide
Environment and Tools
Intel Documentation
Intel Architecture Software Developer's Manual
Volume 1: Basic Architecture
Volume 2: Instruction Set Reference
Volume 3: System Programming
Intel Architecture Optimization Reference Manual
Microsoft MASM 6.1 Documentation

Getting Started
Table of Contents
Ch. 1 - MASM Overview
Ch. 2 - Installing and Using MASM
Ch. 3 - Configuring Your System
Reference Guide
Title Page
Copyright Page
Table of Contents
Introduction
Ch. 1 - Tools
Ch. 2 - Directives
Ch. 3 - Symbols and Operators
Ch. 4 - Processor
Ch. 5 - Coprocessor
Ch. 6 - Macros
Ch. 7 - Tables
Programmer's Guide
Title Page
Copyright Page
Table of Contents
Introduction
Ch. 1 - Understanding Global Concepts
Ch. 2 - Organizing Segments
Ch. 3 - Using Addresses and Pointers
Ch. 4 - Defining and Using Simple Data Types
Ch. 5 - Understanding and Using Complex Data Types
Ch. 6 - Using Floating-Point and Binary Coded Decimal Numbers
Ch. 7 - Controlling Program Flow
Ch. 8 - Sharing Data and Procedures Among Modules and Libraries
Ch. 9 - Using Macros
Ch. 10 - Writing a Dynamic-Link Library for Windows
Ch. 11 - Writing Memory-Resident Software
Ch. 12 - Mixed-Language Programming
Ch. 13 - Writing 32-Bit Applications
App. A - Differences Between MASM 6.1 and 5.1
App. B - BNF Grammar
App. C - Generating and Reading Assembly Listings
App. D - MASM Reserved Words
App. E - Default Segment Names
Glossary
Index
Documentation Feedback
Environment and Tools

Title Page
Copyright Page
Table of Contents
Introduction
Part 1 - The Programmer's WorkBench
Ch. 1 - Introducing the Programmer's WorkBench
Ch. 2 - Quick Start
Ch. 2 continued
Ch. 3 - Managing Multimodule Programs
Ch. 4 - User Interface Details
Ch. 5 - Advanced PWB Techniques
Ch. 5 continued
Ch. 6 - Customizing PWB
Ch. 7 - Programmer's WorkBench Reference
Part 2 - The CodeView Debugger
Ch. 8 - Getting Started with CodeView
Ch. 9 - The CodeView Environment
Ch. 10 - Special Topics
Ch. 11 - Using Expressions in CodeView
Ch. 12 - CodeView Reference
Part 3 - Compiling and Linking
Ch. 13 - Linking Object Files with LINK
Ch. 14 - Creating Module-Definition Files
Ch. 15 - Using EXEHDR
Part 4 - Utilities
Ch. 16 - Managing Projects with NMAKE
Ch. 17 - Managing Libraries with LIB
Ch. 18 - Creating Help Files with HELPMAKE
Ch. 19 - Browser Utilities
Ch. 20 - Using Other Utilities
Part 5 - Using Help
Ch. 21 - Using Help
Appendixes
App. A - Error Messages
App. B - Regular Expressions
Glossary
Index
http://web.sau.edu/LillisKevinM/csci240/masmdocs/ [12/27/2002 10:21:00 PM]
Intel Architecture Software Developer's Manual

Volume 1: Basic Architecture
NOTE: The Intel Architecture Software Developer's Manual consists of three volumes: Basic Architecture, Order Number 243190; Instruction Set Reference, Order Number 243191; and the System Programming Guide, Order Number 243192. Please refer to all three volumes when evaluating your design needs.
1999
Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. Except as provided in Intel's Terms and Conditions of Sale for such products, Intel assumes no liability whatsoever, and Intel disclaims any express or implied warranty, relating to sale and/or use of Intel products including liability or warranties relating to fitness for a particular purpose, merchantability, or infringement of any patent, copyright or other intellectual property right. Intel products are not intended for use in medical, life saving, or life sustaining applications. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. Intel's Intel Architecture processors (e.g., Pentium, Pentium II, Pentium III, and Pentium Pro processors) may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an ordering number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's literature center at http://www.intel.com.
COPYRIGHT INTEL CORPORATION 1999 *THIRD-PARTY BRANDS AND NAMES ARE THE PROPERTY OF THEIR RESPECTIVE OWNERS.
TABLE OF CONTENTS
CHAPTER 1 ABOUT THIS MANUAL
1.1. OVERVIEW OF THE INTEL ARCHITECTURE SOFTWARE DEVELOPER'S MANUAL, VOLUME 1: BASIC ARCHITECTURE
1.2. OVERVIEW OF THE INTEL ARCHITECTURE SOFTWARE DEVELOPER'S MANUAL, VOLUME 2: INSTRUCTION SET REFERENCE
1.3. OVERVIEW OF THE INTEL ARCHITECTURE SOFTWARE DEVELOPER'S MANUAL, VOLUME 3: SYSTEM PROGRAMMING GUIDE
1.4. NOTATIONAL CONVENTIONS
1.4.1. Bit and Byte Order
1.4.2. Reserved Bits and Software Compatibility
1.4.3. Instruction Operands
1.4.4. Hexadecimal and Binary Numbers
1.4.5. Segmented Addressing
1.4.6. Exceptions
1.5. RELATED LITERATURE

CHAPTER 2 INTRODUCTION TO THE INTEL ARCHITECTURE
2.1. BRIEF HISTORY OF THE INTEL ARCHITECTURE
2.2. INCREASING INTEL ARCHITECTURE PERFORMANCE AND MOORE'S LAW
2.3. BRIEF HISTORY OF THE INTEL ARCHITECTURE FLOATING-POINT UNIT
2.4. INTRODUCTION TO THE P6 FAMILY PROCESSOR'S ADVANCED MICROARCHITECTURE
2.5. DETAILED DESCRIPTION OF THE P6 FAMILY PROCESSOR MICROARCHITECTURE
2.5.1. Memory Subsystem
2.5.2. Fetch/Decode Unit
2.5.3. Instruction Pool (Reorder Buffer)
2.5.4. Dispatch/Execute Unit
2.5.5. Retirement Unit

CHAPTER 3 BASIC EXECUTION ENVIRONMENT
3.1. MODES OF OPERATION
3.2. OVERVIEW OF THE BASIC EXECUTION ENVIRONMENT
3.3. MEMORY ORGANIZATION
3.4. MODES OF OPERATION
3.5. 32-BIT VS. 16-BIT ADDRESS AND OPERAND SIZES
3.6. REGISTERS
3.6.1. General-Purpose Data Registers
3.6.2. Segment Registers
3.6.3. EFLAGS Register
3.6.3.1. Status Flags
3.6.3.2. DF Flag
3.6.4. System Flags and IOPL Field
3.7. INSTRUCTION POINTER
3.8. OPERAND-SIZE AND ADDRESS-SIZE ATTRIBUTES
CHAPTER 4 PROCEDURE CALLS, INTERRUPTS, AND EXCEPTIONS
4.1. PROCEDURE CALL TYPES
4.2. STACK
4.2.1. Setting Up a Stack
4.2.2. Stack Alignment
4.2.3. Address-Size Attributes for Stack Accesses
4.2.4. Procedure Linking Information
4.2.4.1. Stack-Frame Base Pointer
4.2.4.2. Return Instruction Pointer
4.3. CALLING PROCEDURES USING CALL AND RET
4.3.1. Near CALL and RET Operation
4.3.2. Far CALL and RET Operation
4.3.3. Parameter Passing
4.3.3.1. Passing Parameters Through the General-Purpose Registers
4.3.3.2. Passing Parameters on the Stack
4.3.3.3. Passing Parameters in an Argument List
4.3.4. Saving Procedure State Information
4.3.5. Calls to Other Privilege Levels
4.3.6. CALL and RET Operation Between Privilege Levels
4.4. INTERRUPTS AND EXCEPTIONS
4.4.1. Call and Return Operation for Interrupt or Exception Handling Procedures
4.4.2. Calls to Interrupt or Exception Handler Tasks
4.4.3. Interrupt and Exception Handling in Real-Address Mode
4.4.4. INT n, INTO, INT 3, and BOUND Instructions
4.5. PROCEDURE CALLS FOR BLOCK-STRUCTURED LANGUAGES
4.5.1. ENTER Instruction
4.5.2. LEAVE Instruction

CHAPTER 5 DATA TYPES AND ADDRESSING MODES
5.1. FUNDAMENTAL DATA TYPES
5.1.1. Alignment of Words, Doublewords, and Quadwords
5.2. NUMERIC, POINTER, BIT FIELD, AND STRING DATA TYPES
5.2.1. Integers
5.2.2. Unsigned Integers
5.2.3. BCD Integers
5.2.4. Pointers
5.2.5. Bit Fields
5.2.6. Strings
5.2.7. Floating-Point Data Types
5.2.8. MMX Technology Data Types
5.2.9. Streaming SIMD Extensions Data Types
5.3. OPERAND ADDRESSING
5.3.1. Immediate Operands
5.3.2. Register Operands
5.3.3. Memory Operands
5.3.3.1. Specifying a Segment Selector
5.3.3.2. Specifying an Offset
5.3.3.3. Assembler and Compiler Addressing Modes
5.3.4. I/O Port Addressing
CHAPTER 6 INSTRUCTION SET SUMMARY
6.1. NEW INTEL ARCHITECTURE INSTRUCTIONS
6.1.1. New Instructions Introduced with the Streaming SIMD Extensions
6.1.2. New Instructions Introduced with the MMX Technology
6.1.3. New Instructions in the Pentium Pro Processor
6.1.4. New Instructions in the Pentium Processor
6.1.5. New Instructions in the Intel486 Processor
6.2. INSTRUCTION SET LIST
6.2.1. Integer Instructions
6.2.1.1. Data Transfer Instructions
6.2.1.2. Binary Arithmetic Instructions
6.2.1.3. Decimal Arithmetic
6.2.1.4. Logic Instructions
6.2.1.5. Shift and Rotate Instructions
6.2.1.6. Bit and Byte Instructions
6.2.1.7. Control Transfer Instructions
6.2.1.8. String Instructions
6.2.1.9. Flag Control Instructions
6.2.1.10. Segment Register Instructions
6.2.1.11. Miscellaneous Instructions
6.2.2. MMX Technology Instructions
6.2.2.1. MMX Data Transfer Instructions
6.2.2.2. MMX Conversion Instructions
6.2.2.3. MMX Packed Arithmetic Instructions
6.2.2.4. MMX Comparison Instructions
6.2.2.5. MMX Logic Instructions
6.2.2.6. MMX Shift and Rotate Instructions
6.2.2.7. MMX State Management
6.2.3. Floating-Point Instructions
6.2.3.1. Data Transfer
6.2.3.2. Basic Arithmetic
6.2.3.3. Comparison
6.2.3.4. Transcendental
6.2.3.5. Load Constants
6.2.3.6. FPU Control
6.2.4. System Instructions
6.2.5. Streaming SIMD Extensions
6.2.5.1. Streaming SIMD Extensions Data Transfer Instructions
6.2.5.2. Streaming SIMD Extensions Conversion Instructions
6.2.5.3. Streaming SIMD Extensions Packed Arithmetic Instructions
6.2.5.4. Streaming SIMD Extensions Comparison Instructions
6.2.5.5. Streaming SIMD Extensions Logical Instructions
6.2.5.6. Streaming SIMD Extensions Data Shuffle Instructions
6.2.5.7. Streaming SIMD Extensions Additional SIMD-Integer Instructions
6.2.5.8. Streaming SIMD Extensions Cacheability Control Instructions
6.2.5.9. Streaming SIMD Extensions State Management Instructions
6.3. DATA MOVEMENT INSTRUCTIONS
6.3.1. General-Purpose Data Movement Instructions
6.3.1.1. Move Instruction
6.3.1.2. Conditional Move Instructions
6.3.1.3. Exchange Instructions
6.3.2. Stack Manipulation Instructions
6.3.2.1. Type Conversion Instructions
6.3.2.2. Simple Conversion
6.3.2.3. Move and Convert
6.4. BINARY ARITHMETIC INSTRUCTIONS
6.4.1. Addition and Subtraction Instructions
6.4.2. Increment and Decrement Instructions
6.4.3. Comparison and Sign Change Instruction
6.4.4. Multiplication and Divide Instructions
6.5. DECIMAL ARITHMETIC INSTRUCTIONS
6.5.1. Packed BCD Adjustment Instructions
6.5.2. Unpacked BCD Adjustment Instructions
6.6. LOGICAL INSTRUCTIONS
6.7. SHIFT AND ROTATE INSTRUCTIONS
6.7.1. Shift Instructions
6.7.2. Double-Shift Instructions
6.7.3. Rotate Instructions
6.8. BIT AND BYTE INSTRUCTIONS
6.8.1. Bit Test and Modify Instructions
6.8.2. Bit Scan Instructions
6.8.3. Byte Set on Condition Instructions
6.8.4. Test Instruction
6.9. CONTROL TRANSFER INSTRUCTIONS
6.9.1. Unconditional Transfer Instructions
6.9.1.1. Jump Instruction
6.9.1.2. Call and Return Instructions
6.9.1.3. Return From Interrupt Instruction
6.9.2. Conditional Transfer Instructions
6.9.2.1. Conditional Jump Instructions
6.9.2.2. Loop Instructions
6.9.2.3. Jump If Zero Instructions
6.9.3. Software Interrupts
6.10. STRING OPERATIONS
6.10.1. Repeating String Operations
6.11. I/O INSTRUCTIONS
6.12. ENTER AND LEAVE INSTRUCTIONS
6.13. EFLAGS INSTRUCTIONS
6.13.1. Carry and Direction Flag Instructions
6.13.2. Interrupt Flag Instructions
6.13.3. EFLAGS Transfer Instructions
6.13.4. Interrupt Flag Instructions
6.14. SEGMENT REGISTER INSTRUCTIONS
6.14.1. Segment-Register Load and Store Instructions
6.14.2. Far Control Transfer Instructions
6.14.3. Software Interrupt Instructions
6.14.4. Load Far Pointer Instructions
6.15. MISCELLANEOUS INSTRUCTIONS
6.15.1. Address Computation Instruction
6.15.2. Table Lookup Instructions
6.15.3. Processor Identification Instruction
6.15.4. No-Operation and Undefined Instructions
CHAPTER 7 FLOATING-POINT UNIT
7.1. COMPATIBILITY AND EASE OF USE OF THE INTEL ARCHITECTURE FPU
7.2. REAL NUMBERS AND FLOATING-POINT FORMATS
7.2.1. Real Number System
7.2.2. Floating-Point Format
7.2.2.1. Normalized Numbers
7.2.2.2. Biased Exponent
7.2.3. Real Number and Non-number Encodings
7.2.3.1. Signed Zeros
7.2.3.2. Normalized and Denormalized Finite Numbers
7.2.3.3. Signed Infinities
7.2.3.4. NaNs
7.2.4. Indefinite
7.3. FPU ARCHITECTURE
7.3.1. FPU Data Registers
7.3.1.1. Parameter Passing with the FPU Register Stack
7.3.2. FPU Status Register
7.3.2.1. Top of Stack (TOP) Pointer
7.3.2.2. Condition Code Flags
7.3.2.3. Exception Flags
7.3.2.4. Stack Fault Flag
7.3.3. Branching and Conditional Moves on FPU Condition Codes
7.3.4. FPU Control Word
7.3.4.1. Exception-Flag Masks
7.3.4.2. Precision Control Field
7.3.4.3. Rounding Control Field
7.3.5. Infinity Control Flag
7.3.6. FPU Tag Word
7.3.7. FPU Instruction and Operand (Data) Pointers
7.3.8. Last Instruction Opcode
7.3.9. Saving the FPU's State
7.4. FLOATING-POINT DATA TYPES AND FORMATS
7.4.1. Real Numbers
7.4.2. Binary Integers
7.4.3. Decimal Integers
7.4.4. Unsupported Extended-Real Encodings
7.5. FPU INSTRUCTION SET
7.5.1. Escape (ESC) Instructions
7.5.2. FPU Instruction Operands
7.5.3. Data Transfer Instructions
7.5.4. Load Constant Instructions
7.5.5. Basic Arithmetic Instructions
7.5.6. Comparison and Classification Instructions
7.5.6.1. Branching on the FPU Condition Codes
7.5.7. Trigonometric Instructions
7.5.8. Pi
7.5.9. Logarithmic, Exponential, and Scale
7.5.10. Transcendental Instruction Accuracy
7.5.11. FPU Control Instructions
7.5.12. Waiting Vs. Non-waiting Instructions
7.5.13. Unsupported FPU Instructions
7.6. OPERATING ON NANS
7.6.1. Operating on NaNs with Streaming SIMD Extensions
7.6.2. Uses for Signaling NANs
7.6.3. Uses for Quiet NANs
7.7. FLOATING-POINT EXCEPTION HANDLING
7.7.1. Arithmetic vs. Non-arithmetic Instructions
7.7.2. Automatic Exception Handling
7.7.3. Software Exception Handling
7.7.3.1. Native Mode
7.7.3.2. MS-DOS* Compatibility Mode
7.7.3.3. Typical Floating-Point Exception Handler Actions
7.8. FLOATING-POINT EXCEPTION CONDITIONS
7.8.1. Invalid Operation Exception
7.8.1.1. Stack Overflow or Underflow Exception (#IS)
7.8.1.2. Invalid Arithmetic Operand Exception (#IA)
7.8.2. Divide-By-Zero Exception (#Z)
7.8.3. Denormal Operand Exception (#D)
7.8.4. Numeric Overflow Exception (#O)
7.8.5. Numeric Underflow Exception (#U)
7.8.6. Inexact Result (Precision) Exception (#P)
7.8.7. Exception Priority
7.9. FLOATING-POINT EXCEPTION SYNCHRONIZATION
CHAPTER 8 PROGRAMMING WITH THE INTEL MMX TECHNOLOGY
8.1. OVERVIEW OF THE MMX TECHNOLOGY PROGRAMMING ENVIRONMENT
8.1.1. MMX Registers
8.1.2. MMX Data Types
8.1.3. Single Instruction, Multiple Data (SIMD) Execution Model
8.1.4. Memory Data Formats
8.1.5. Data Formats for MMX Registers
8.2. MMX INSTRUCTION SET
8.2.1. Saturation Arithmetic and Wraparound Mode
8.2.2. Instruction Operands
8.3. OVERVIEW OF THE MMX INSTRUCTION SET
8.3.1. Data Transfer Instructions
8.3.2. Arithmetic Instructions
8.3.2.1. Packed Addition and Subtraction
8.3.2.2. Packed Multiplication
8.3.2.3. Packed Multiply Add
8.3.3. Comparison Instructions
8.3.4. Conversion Instructions
8.3.5. Logical Instructions
8.3.6. Shift Instructions
8.3.7. EMMS (Empty MMX State) Instruction
8.4. COMPATIBILITY WITH FPU ARCHITECTURE
8.4.1. MMX Instructions and the Floating-Point Tag Word
8.4.2. Effect of Instruction Prefixes on MMX Instructions
8.5. WRITING APPLICATIONS WITH MMX CODE
8.5.1. Detecting Support for MMX Technology Using the CPUID Instruction
8.5.2. Using the EMMS Instruction
8.5.3. Interfacing with MMX Code
8.5.4. Writing Code with MMX and Floating-Point Instructions
8.5.4.1. RECOMMENDATIONS AND GUIDELINES
8.5.5. Using MMX Code in a Multitasking Operating System Environment
8.5.5.1. COOPERATIVE MULTITASKING OPERATING SYSTEM
8.5.5.2. PREEMPTIVE MULTITASKING OPERATING SYSTEM
8.5.6. Exception Handling in MMX Code
8.5.7. Register Mapping
CHAPTER 9 PROGRAMMING WITH THE STREAMING SIMD EXTENSIONS
9.1. OVERVIEW OF THE STREAMING SIMD EXTENSIONS
9.1.1. SIMD Floating-Point Registers
9.1.2. SIMD Floating-Point Data Types
9.1.3. Single Instruction, Multiple Data (SIMD) Execution Model
9.1.4. Pentium III Processor Single Precision Floating-Point Format
9.1.5. Memory Data Formats
9.1.6. SIMD Floating-Point Register Data Formats
9.1.7. SIMD Floating-Point Control/Status Register
9.1.8. Rounding Control Field
9.1.9. Flush-To-Zero
9.2. STREAMING SIMD EXTENSIONS SET
9.2.1. Instruction Operands
9.3. OVERVIEW OF THE STREAMING SIMD EXTENSIONS SET
9.3.1. Data Movement Instructions
9.3.2. Arithmetic Instructions
9.3.2.1. Packed/Scalar Addition and Subtraction
9.3.2.2. Packed/Scalar Multiplication and Division
9.3.2.3. Packed/Scalar Square Root
9.3.2.4. Packed Maximum/Minimum
9.3.3. Comparison Instructions
9.3.4. Conversion Instructions
9.3.5. Logical Instructions
9.3.6. Additional SIMD Integer Instructions
9.3.7. Shuffle Instructions
9.3.8. State Management Instructions
9.3.9. Cacheability Control Instructions
9.4. COMPATIBILITY WITH FPU ARCHITECTURE
9.4.1. Effect of Instruction Prefixes on Streaming SIMD Extensions
9.5. WRITING APPLICATIONS WITH STREAMING SIMD EXTENSIONS CODE
9.5.1. Detecting Support for Streaming SIMD Extensions Using the CPUID Instruction
9.5.2. Interfacing with Streaming SIMD Extensions Procedures and Functions
9.5.3. Writing Code with MMX, Floating-Point, and Streaming SIMD Extensions
9.5.3.1. Cacheability Hint Instructions
9.5.3.2. Recommendations and Guidelines
9.5.4. Using Streaming SIMD Extensions Code in a Multitasking Operating System Environment
9.5.4.1. Cooperative Multitasking Operating System
9.5.4.2. Preemptive Multitasking Operating System
9.5.5. Exception Handling in Streaming SIMD Extensions
CHAPTER 10 INPUT/OUTPUT
10.1. I/O PORT ADDRESSING
10.2. I/O PORT HARDWARE
10.3. I/O ADDRESS SPACE
10.3.1. Memory-Mapped I/O
10.4. I/O INSTRUCTIONS
10.5. PROTECTED-MODE I/O
10.5.1. I/O Privilege Level
10.5.2. I/O Permission Bit Map
10.6. ORDERING I/O
CHAPTER 11 PROCESSOR IDENTIFICATION AND FEATURE DETERMINATION
11.1. PROCESSOR IDENTIFICATION
11.2. IDENTIFICATION OF EARLIER INTEL ARCHITECTURE PROCESSORS
11.3. CPUID INSTRUCTION EXTENSIONS
11.3.1. Version Information
11.3.2. Control Register Extensions
APPENDIX A EFLAGS CROSS-REFERENCE
APPENDIX B EFLAGS CONDITION CODES
APPENDIX C FLOATING-POINT EXCEPTIONS SUMMARY
APPENDIX D SIMD FLOATING-POINT EXCEPTIONS SUMMARY
APPENDIX E GUIDELINES FOR WRITING FPU EXCEPTIONS HANDLERS
E.1. ORIGIN OF THE MS-DOS* COMPATIBILITY MODE FOR HANDLING FPU EXCEPTIONS
E.2. IMPLEMENTATION OF THE MS-DOS* COMPATIBILITY MODE IN THE INTEL486, PENTIUM, AND P6 FAMILY PROCESSORS
E.2.1. MS-DOS* Compatibility Mode in the Intel486 and Pentium Processors
E.2.1.1. Basic Rules: When FERR# Is Generated
E.2.1.2. Recommended External Hardware to Support the MS-DOS* Compatibility Mode
E.2.1.3. No-Wait FPU Instructions Can Get FPU Interrupt in Window
E.2.2. MS-DOS* Compatibility Mode in the P6 Family Processors
E.3. RECOMMENDED PROTOCOL FOR MS-DOS* COMPATIBILITY HANDLERS
E.3.1. Floating-Point Exceptions and Their Defaults
E.3.2. Two Options for Handling Numeric Exceptions
E.3.2.1. Automatic Exception Handling: Using Masked Exceptions
E.3.2.2. Software Exception Handling
E.3.3. Synchronization Required for Use of FPU Exception Handlers
E.3.3.1. Exception Synchronization: What, Why and When
E.3.3.2. Exception Synchronization Examples
E.3.3.3. Proper Exception Synchronization in General
E.3.3.4. FPU Exception Handling Examples
E.3.4. Need for Storing State of IGNNE# Circuit If Using FPU and SMM
E.3.5. Considerations When FPU Shared Between Tasks
E.3.5.1. Speculatively Deferring FPU Saves, General Overview
E.3.5.2. Tracking FPU Ownership
E.3.5.3. Interaction of FPU State Saves and Floating-point Exception Association
E.3.5.4. Interrupt Routing From the Kernel
E.3.5.5. Special Considerations for Operating Systems that Support Streaming SIMD Extensions
E.4. DIFFERENCES FOR HANDLERS USING NATIVE MODE
E.4.1. Origin with the Intel 286 and Intel 287, and Intel386 and Intel 387 Processors
E.4.2. Changes with Intel486, Pentium, and P6 Family Processors with CR0.NE=1
E.4.3. Considerations When FPU Shared Between Tasks Using Native Mode
APPENDIX F GUIDELINES FOR WRITING SIMD FLOATING-POINT EXCEPTION HANDLERS
F.1. TWO OPTIONS FOR HANDLING NUMERIC EXCEPTIONS
F.2. SOFTWARE EXCEPTION HANDLING
F.3. EXCEPTION SYNCHRONIZATION
F.4. SIMD FLOATING-POINT EXCEPTIONS AND THE IEEE-754 STANDARD FOR BINARY FLOATING-POINT COMPUTATIONS
F.4.1. Floating-Point Emulation
F.4.2. Streaming SIMD Extensions Response To Floating-Point Exceptions
F.4.2.1. Numeric Exceptions
F.4.2.2. Results of Operations with NaN Operands or a NaN Result for Streaming SIMD Extensions Numeric Instructions
F.4.2.3. Condition Codes, Exception Flags, and Response for Masked and Unmasked Numeric Exceptions
F.4.3. SIMD Floating-Point Emulation Implementation Example
TABLE OF FIGURES
Figure 1-1. Bit and Byte Order
Figure 2-1. The Processing Units in the P6 Family Processor Microarchitecture and Their Interface with the Memory Subsystem
Figure 2-2. Functional Block Diagram of the P6 Family Processor Microarchitecture
Figure 3-1. P6 Family Processor Basic Execution Environment
Figure 3-2. Three Memory Management Models
Figure 3-3. Application Programming Registers
Figure 3-4. Alternate General-Purpose Register Names
Figure 3-5. Use of Segment Registers for Flat Memory Model
Figure 3-6. Use of Segment Registers in Segmented Memory Model
Figure 3-7. EFLAGS Register
Figure 4-1. Stack Structure
Figure 4-2. Stack on Near and Far Calls
Figure 4-3. Protection Rings
Figure 4-4. Stack Switch on a Call to a Different Privilege Level
Figure 4-5. Stack Usage on Transfers to Interrupt and Exception Handling Routines
Figure 4-6. Nested Procedures
Figure 4-7. Stack Frame after Entering the MAIN Procedure
Figure 4-8. Stack Frame after Entering Procedure A
Figure 4-9. Stack Frame after Entering Procedure B
Figure 4-10. Stack Frame after Entering Procedure C
Figure 5-1. Fundamental Data Types
Figure 5-2. SIMD Floating-Point Data Type
Figure 5-3. Bytes, Words, Doublewords and Quadwords in Memory
Figure 5-4. Numeric, Pointer, and Bit Field Data Types
Figure 5-5. Memory Operand Address
Figure 5-6. Offset (or Effective Address) Computation
Figure 6-1. Operation of the PUSH Instruction
Figure 6-2. Operation of the PUSHA Instruction
Figure 6-3. Operation of the POP Instruction
Figure 6-4. Operation of the POPA Instruction
Figure 6-5. Sign Extension
Figure 6-6. SHL/SAL Instruction Operation
Figure 6-7. SHR Instruction Operation
Figure 6-8. SAR Instruction Operation
Figure 6-9. SHLD and SHRD Instruction Operations
Figure 6-10. ROL, ROR, RCL, and RCR Instruction Operations
Figure 6-11. Flags Affected by the PUSHF, POPF, PUSHFD, and POPFD Instructions
Figure 7-1. Binary Real Number System
Figure 7-2. Binary Floating-Point Format
Figure 7-3. Real Numbers and NaNs
Figure 7-4. Relationship Between the Integer Unit and the FPU
Figure 7-5. FPU Execution Environment
Figure 7-6. FPU Data Register Stack
Figure 7-7. Example FPU Dot Product Computation
Figure 7-8. FPU Status Word
Figure 7-9. Moving the FPU Condition Codes to the EFLAGS Register
Figure 7-10. FPU Control Word
Figure 7-11. FPU Tag Word
Figure 7-12. Contents of FPU Opcode Registers
Figure 7-13. Protected Mode FPU State Image in Memory, 32-Bit Format
Figure 7-14. Real Mode FPU State Image in Memory, 32-Bit Format
Figure 7-15. Protected Mode FPU State Image in Memory, 16-Bit Format
Figure 7-16. Real Mode FPU State Image in Memory, 16-Bit Format
Figure 7-17. Floating-Point Unit Data Type Formats
Figure 8-1. MMX Register Set
Figure 8-2. MMX Data Types
Figure 8-3. Eight Packed Bytes in Memory (at address 1000H)
Figure 9-1. SIMD Floating-Point Registers
Figure 9-2. Packed Single-FP
Figure 9-3. Four Packed FP Data in Memory (at address 1000H)
Figure 9-4. SIMD Floating-Point Control/Status Register Format
Figure 9-5. Packed Operations
Figure 9-6. Scalar Operations
Figure 9-7. Packed Shuffle Operation
Figure 9-8. Unpack High Operation
Figure 9-9. Unpack Low Operation
Figure 10-1. Memory-Mapped I/O
Figure 10-2. I/O Permission Bit Map
Figure 11-1. EAX Return Values
Figure 11-2. CPUID Feature Field Information Bits
Figure 11-3. CR4 Register Extensions
Figure E-1. Recommended Circuit for MS-DOS* Compatibility FPU Exception Handling
Figure E-2. Behavior of Signals During FPU Exception Handling
Figure E-3. Timing of Receipt of External Interrupt
Figure E-4. Arithmetic Example Using Infinity
Figure E-5. General Program Flow for DNA Exception Handler
Figure E-6. Program Flow for a Numeric Exception Dispatch Routine
Figure F-1. Control Flow for Handling Unmasked Floating-Point Exceptions
TABLE OF TABLES
Table 2-1. Processor Performance Over Time and Other Intel Architecture Key Features
Table 3-1. Effective Operand- and Address-Size Attributes
Table 4-1. Exceptions and Interrupts
Table 5-1. Default Segment Selection Rules
Table 6-1. Move Instruction Operations
Table 6-2. Conditional Move Instructions
Table 6-3. Bit Test and Modify Instructions
Table 6-4. Conditional Jump Instructions
Table 6-5. Information Provided by the CPUID Instruction
Table 7-1. Real Number Notation
Table 7-2. Denormalization Process
Table 7-3. FPU Condition Code Interpretation
Table 7-4. Precision Control Field (PC)
Table 7-5. Rounding Control Field (RC)
Table 7-6. Rounding of Positive Numbers with Masked Overflow
Table 7-7. Rounding of Negative Numbers with Masked Overflow
Table 7-8. Length, Precision, and Range of FPU Data Types
Table 7-9. Real Number and NaN Encodings
Table 7-10. Binary Integer Encodings
Table 7-11. Packed Decimal Integer Encodings
Table 7-12. Unsupported Extended-Real Encodings
Table 7-13. Data Transfer Instructions
Table 7-14. Floating-Point Conditional Move Instructions
Table 7-15. Setting of FPU Condition Code Flags for Real Number Comparisons
Table 7-16. Setting of EFLAGS Status Flags for Real Number Comparisons
Table 7-17. TEST Instruction Constants for Conditional Branching
Table 7-18. Rules for Generating QNaNs
Table 7-19. Results of Operations with NaN Operands
Table 7-20. Arithmetic and Non-arithmetic Instructions
Table 7-21. Invalid Arithmetic Operations and the Masked Responses to Them
Table 7-22. Divide-By-Zero Conditions and the Masked Responses to Them
Table 7-23. Masked Responses to Numeric Overflow
Table 8-1. Data Range Limits for Saturation
Table 8-2. MMX Instruction Set Summary
Table 8-3. Effect of Prefixes on MMX Instructions
Table 9-1. Precision and Range of SIMD Floating-point Datatype
Table 9-2. Real Number and NaN Encodings
Table 9-3. Rounding Control Field (RC)
Table 9-4. Streaming SIMD Extensions Behavior with Prefixes
Table 9-5. SIMD Integer Instructions Behavior with Prefixes
Table 9-6. Cacheability Control Instruction Behavior with Prefixes
Table 9-7. Cache Hints
Table 10-1. I/O Instruction Serialization
Table 11-1. EAX Input Value and CPUID Return Values
Table 11-2. New P6-Family Processor Feature Information Returned by CPUID in EDX
Table A-1. EFLAGS Cross-Reference
Table B-1. EFLAGS Condition Codes
Table C-1. Floating-Point Exceptions Summary
Table D-1. Streaming SIMD Extensions Instruction Set Summary
Table F-1. ADDPS, ADDSS, SUBPS, SUBSS, MULPS, MULSS, DIVPS, DIVSS
Table F-2. CMPPS.EQ, CMPSS.EQ, CMPPS.ORD, CMPSS.ORD
Table F-3. CMPPS.NEQ, CMPSS.NEQ, CMPPS.UNORD, CMPSS.UNORD
Table F-4. CMPPS.LT, CMPSS.LT, CMPPS.LE, CMPSS.LE
Table F-5. CMPPS.NLT, CMPSS.NLT, CMPPS.NLE, CMPSS.NLE
Table F-6. COMISS
Table F-7. UCOMISS
Table F-8. CVTPS2PI, CVTSS2SI, CVTTPS2PI, CVTTSS2SI
Table F-9. MAXPS, MAXSS, MINPS, MINSS
Table F-10. SQRTPS, SQRTSS
Table F-11. #I - Invalid Operations
Table F-12. #Z - Divide-by-Zero
Table F-13. #D - Denormal Operand
Table F-14. #O - Numeric Overflow
Table F-15. #U - Numeric Underflow
Table F-16. #P - Inexact Result (Precision)
CHAPTER 1 ABOUT THIS MANUAL

The Intel Architecture Software Developer's Manual, Volume 1: Basic Architecture (Order Number 243190) is part of a three-volume set that describes the architecture and programming environment of all Intel Architecture (IA) processors. The other two volumes in this set are:
• The Intel Architecture Software Developer's Manual, Volume 2: Instruction Set Reference (Order Number 243191).
• The Intel Architecture Software Developer's Manual, Volume 3: System Programming Guide (Order Number 243192).
The Intel Architecture Software Developer's Manual, Volume 1, describes the basic architecture and programming environment of an IA processor; the Intel Architecture Software Developer's Manual, Volume 2, describes the instruction set of the processor and the opcode structure. These two volumes are aimed at application programmers who are writing programs to run under existing operating systems or executives.
The Intel Architecture Software Developer's Manual, Volume 3, describes the operating-system support environment of an IA processor, including memory management, protection, task management, interrupt and exception handling, and system management mode. It also provides IA processor compatibility information. This volume is aimed at operating-system and BIOS designers and programmers.
1.1. OVERVIEW OF THE INTEL ARCHITECTURE SOFTWARE DEVELOPER'S MANUAL, VOLUME 1: BASIC ARCHITECTURE
The contents of this manual are as follows:
Chapter 1 - About This Manual. Gives an overview of all three volumes of the Intel Architecture Software Developer's Manual. It also describes the notational conventions in these manuals and lists related Intel manuals and documentation of interest to programmers and hardware designers.
Chapter 2 - Introduction to the Intel Architecture. Introduces the IA and the families of Intel processors that are based on this architecture. It also gives an overview of the common features found in these processors and a brief history of the IA.
Chapter 3 - Basic Execution Environment. Introduces the models of memory organization and describes the register set used by applications.
Chapter 4 - Procedure Calls, Interrupts, and Exceptions. Describes the procedure stack and the mechanisms provided for making procedure calls and for servicing interrupts and exceptions.
Chapter 5 - Data Types and Addressing Modes. Describes the data types and addressing modes recognized by the processor.
Chapter 6 - Instruction Set Summary. Gives an overview of all the IA instructions except those executed by the processor's floating-point unit. The instructions are presented in functionally related groups.
Chapter 7 - Floating-Point Unit. Describes the IA floating-point unit, including the floating-point registers and data types; gives an overview of the floating-point instruction set; and describes the processor's floating-point exception conditions.
Chapter 8 - Programming with Intel MMX Technology. Describes the Intel MMX technology, including MMX registers and data types, and gives an overview of the MMX instruction set.
Chapter 9 - Programming with the Streaming SIMD Extensions. Describes the Intel Streaming SIMD Extensions, including the registers and data types.
Chapter 10 - Input/Output. Describes the processor's I/O architecture, including I/O port addressing, the I/O instructions, and the I/O protection mechanism.
Chapter 11 - Processor Identification and Feature Determination. Describes how to determine the CPU type and the features that are available in the processor.
Appendix A - EFLAGS Cross-Reference. Summarizes how the IA instructions affect the flags in the EFLAGS register.
Appendix B - EFLAGS Condition Codes. Summarizes how the conditional jump, move, and byte set on condition code instructions use the condition code flags (OF, CF, ZF, SF, and PF) in the EFLAGS register.
Appendix C - Floating-Point Exceptions Summary. Summarizes the exceptions that can be raised by floating-point instructions.
Appendix D - SIMD Floating-Point Exceptions Summary. Provides the Streaming SIMD Extensions mnemonics, and the exceptions that each instruction can cause.
Appendix E - Guidelines for Writing FPU Exception Handlers. Describes how to design and write MS-DOS* compatible exception handling facilities for FPU and SIMD floating-point exceptions, including both software and hardware requirements and assembly-language code examples. This appendix also describes general techniques for writing robust FPU exception handlers.
Appendix F - Guidelines for Writing SIMD-FP Exception Handlers. Provides guidelines for the Streaming SIMD Extensions instructions that can generate numeric (floating-point) exceptions, and gives an overview of the necessary support for handling such exceptions.
1.2. OVERVIEW OF THE INTEL ARCHITECTURE SOFTWARE DEVELOPER'S MANUAL, VOLUME 2: INSTRUCTION SET REFERENCE
The contents of the Intel Architecture Software Developer's Manual, Volume 2, are as follows:
Chapter 1 - About This Manual. Gives an overview of all three volumes of the Intel Architecture Software Developer's Manual. It also describes the notational conventions in these manuals and lists related Intel manuals and documentation of interest to programmers and hardware designers.
Chapter 2 - Instruction Format. Describes the machine-level instruction format used for all IA instructions and gives the allowable encodings of prefixes, the operand-identifier byte (ModR/M byte), the addressing-mode specifier byte (SIB byte), and the displacement and immediate bytes.
Chapter 3 - Instruction Set Reference. Describes each of the IA instructions in detail, including an algorithmic description of operations, the effect on flags, the effect of operand- and address-size attributes, and the exceptions that may be generated. The instructions are arranged in alphabetical order. The FPU, MMX, and Streaming SIMD Extensions instructions are included in this chapter.
Appendix A - Opcode Map. Gives an opcode map for the IA instruction set.
Appendix B - Instruction Formats and Encodings. Gives the binary encoding of each form of each IA instruction.
Appendix C - Compiler Intrinsics and Functional Equivalents. Gives the Intel C/C++ compiler intrinsics and functional equivalents for the MMX Technology instructions and Streaming SIMD Extensions.
1.3. OVERVIEW OF THE INTEL ARCHITECTURE SOFTWARE DEVELOPER'S MANUAL, VOLUME 3: SYSTEM PROGRAMMING GUIDE
The contents of the Intel Architecture Software Developer's Manual, Volume 3, are as follows:
Chapter 1 - About This Manual. Gives an overview of all three volumes of the Intel Architecture Software Developer's Manual. It also describes the notational conventions in these manuals and lists related Intel manuals and documentation of interest to programmers and hardware designers.
Chapter 2 - System Architecture Overview. Describes the modes of operation of an IA processor and the mechanisms provided in the IA to support operating systems and executives, including the system-oriented registers and data structures and the system-oriented instructions. The steps necessary for switching between real-address and protected modes are also identified.
Chapter 3 - Protected-Mode Memory Management. Describes the data structures, registers, and instructions that support segmentation and paging and explains how they can be used to implement a flat (unsegmented) memory model or a segmented memory model.
Chapter 4 - Protection. Describes the support for page and segment protection provided in the IA. This chapter also explains the implementation of privilege rules, stack switching, pointer validation, and user and supervisor modes.
Chapter 5 - Interrupt and Exception Handling. Describes the basic interrupt mechanisms defined in the IA, shows how interrupts and exceptions relate to protection, and describes how the architecture handles each exception type. Reference information for each IA exception is given at the end of this chapter.
Chapter 6 - Task Management. Describes the mechanisms the IA provides to support multitasking and inter-task protection.
Chapter 7 - Multiple Processor Management. Describes the instructions and flags that support multiple processors with shared memory, memory ordering, and the advanced programmable interrupt controller (APIC).
Chapter 8 - Processor Management and Initialization. Defines the state of an IA processor and its floating-point and SIMD floating-point units after reset initialization. This chapter also explains how to set up an IA processor for real-address mode operation and protected-mode operation, and how to switch between modes.
Chapter 9 - Memory Cache Control. Describes the general concept of caching and the caching mechanisms supported by the IA. This chapter also describes the memory type range registers (MTRRs) and how they can be used to map memory types of physical memory. MTRRs were introduced into the IA with the Pentium Pro processor. It also presents information on using the new cache control and memory streaming instructions introduced with the Pentium III processor.
Chapter 10 - MMX Technology System Programming. Describes those aspects of the Intel MMX technology that must be handled and considered at the system programming level, including task switching, exception handling, and compatibility with existing system environments. The MMX technology was introduced into the IA with the Pentium processor.
Chapter 11 - Streaming SIMD Extensions System Programming. Describes those aspects of Streaming SIMD Extensions that must be handled and considered at the system programming level, including task switching, exception handling, and compatibility with existing system environments. Streaming SIMD Extensions were introduced into the IA with the Pentium III processor.
Chapter 12 - System Management Mode (SMM). Describes the IA's system management mode (SMM), which can be used to implement power management functions.
Chapter 13 - Machine-Check Architecture. Describes the machine-check architecture, which was introduced into the IA with the Pentium processor.
Chapter 14 - Code Optimization. Discusses general optimization techniques for programming an IA processor.
Chapter 15 - Debugging and Performance Monitoring. Describes the debugging registers and other debug mechanisms provided in the IA. This chapter also describes the time-stamp counter and the performance-monitoring counters.
Chapter 16 - 8086 Emulation. Describes the real-address and virtual-8086 modes of the IA.
Chapter 17 - Mixing 16-Bit and 32-Bit Code. Describes how to mix 16-bit and 32-bit code modules within the same program or task.
Chapter 18 - Intel Architecture Compatibility. Describes the programming differences between the Intel 286, Intel386, Intel486, Pentium, and P6 family processors. The differences among the 32-bit IA processors (the Intel386, Intel486, Pentium, and P6 family processors) are described throughout the three volumes of the Intel Architecture Software Developer's Manual, as relevant to particular features of the architecture. This chapter provides a collection of all the relevant compatibility information for all IA processors and also describes the basic differences with respect to the 16-bit IA processors (the Intel 8086 and Intel 286 processors).
Appendix A - Performance-Monitoring Events. Lists the events that can be counted with the performance-monitoring counters and the codes used to select these events. Both Pentium processor and P6 family processor events are described.
Appendix B - Model-Specific Registers (MSRs). Lists the MSRs available in the Pentium and P6 family processors and their functions.
Appendix C - Dual-Processor (DP) Bootup Sequence Example (Specific to Pentium Processors). Gives an example of how to use the DP protocol to boot two Pentium processors (a primary processor and a secondary processor) in a DP system and initialize their APICs.
Appendix D - Multiple-Processor (MP) Bootup Sequence Example (Specific to P6 Family Processors). Gives an example of how to use the MP protocol to boot two P6 family processors in an MP system and initialize their APICs.
Appendix E - Programming the LINT0 and LINT1 Inputs. Gives an example of how to program the LINT0 and LINT1 pins for specific interrupt vectors.
1.4. NOTATIONAL CONVENTIONS
This manual uses special notation for data-structure formats, for symbolic representation of instructions, and for hexadecimal numbers. A review of this notation makes the manual easier to read.
1.4.1. Bit and Byte Order
In illustrations of data structures in memory, smaller addresses appear toward the bottom of the figure; addresses increase toward the top. Bit positions are numbered from right to left. The numerical value of a set bit is equal to two raised to the power of the bit position. IA processors are little-endian machines; this means the bytes of a word are numbered starting from the least significant byte. Figure 1-1 illustrates these conventions.
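[Figure: a data structure drawn 32 bits wide. Bit offsets run from 31 on the left to 0 on the right, with Byte 3, Byte 2, Byte 1, and Byte 0 occupying bits 31-24, 23-16, 15-8, and 7-0. Byte offsets run from 0 at the lowest address (bottom) to 28 at the highest address (top) in steps of 4.]
Figure 1-1. Bit and Byte Order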
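The conventions above can be illustrated with a short Python sketch. It is not part of the manual; the value 12345678H and the helper names bit_value and bit_is_set are chosen here purely for illustration.

```python
import struct

value = 0x12345678  # arbitrary 32-bit example value (an assumption)

# Little-endian storage: byte offset 0 (the lowest address) holds the
# least significant byte, byte offset 3 holds the most significant byte.
little = struct.pack("<I", value)
byte_at_offset = [little[i] for i in range(4)]

def bit_value(position):
    """Numerical value of a set bit: two raised to the bit position."""
    return 1 << position

def bit_is_set(word, position):
    """Test one bit; bit 0 is the rightmost (least significant) bit."""
    return (word >> position) & 1 == 1

print([f"{b:02X}H" for b in byte_at_offset])
print(bit_value(31), bit_is_set(value, 3))
```

Packing with "<I" requests a little-endian unsigned 32-bit layout explicitly, so the sketch shows the same byte order regardless of the machine it runs on.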
1.4.2. Reserved Bits and Software Compatibility
In many register and memory layout descriptions, certain bits are marked as reserved. When bits are marked as reserved, it is essential for compatibility with future processors that software treat these bits as having a future, though unknown, effect. The behavior of reserved bits should be regarded as not only undefined, but unpredictable. Software should follow these guidelines in dealing with reserved bits:
• Do not depend on the states of any reserved bits when testing the values of registers which contain such bits. Mask out the reserved bits before testing.
• Do not depend on the states of any reserved bits when storing to memory or to a register.
• Do not depend on the ability to retain information written into any reserved bits.
• When loading a register, always load the reserved bits with the values indicated in the documentation, if any, or reload them with values previously read from the same register.
NOTE
Avoid any software dependence upon the state of reserved bits in IA registers. Depending upon the values of reserved register bits will make software dependent upon the unspecified manner in which the processor handles these bits. Programs that depend upon reserved values risk incompatibility with future processors.
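A minimal Python sketch of these guidelines follows. The register layout is hypothetical (bits 0-7 defined, bits 8-31 reserved) and does not correspond to any real IA register; the names DEFINED_MASK, RESERVED_MASK, defined_bits_equal, and value_to_load are likewise illustrative assumptions.

```python
# Hypothetical 32-bit register layout, for illustration only:
# bits 0-7 are defined, bits 8-31 are marked reserved.
DEFINED_MASK = 0x000000FF
RESERVED_MASK = 0xFFFFFFFF & ~DEFINED_MASK

def defined_bits_equal(reg_value, expected):
    """Mask out the reserved bits before testing a register value."""
    return (reg_value & DEFINED_MASK) == (expected & DEFINED_MASK)

def value_to_load(previously_read, new_defined_bits):
    """Build the value to load into the register.

    The reserved bits are reloaded with the values previously read from
    the same register; only the defined bits take the new setting.
    """
    return (previously_read & RESERVED_MASK) | (new_defined_bits & DEFINED_MASK)

old = 0xABCD1234                       # value read back from the register
print(defined_bits_equal(old, 0x34))   # reserved bits do not affect the test
print(f"{value_to_load(old, 0x5A):08X}H")
```

The read-modify-write in value_to_load covers the case where the documentation gives no required value for the reserved bits; where it does, those documented values would be loaded instead.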
1.4.3. Instruction Operands
When instructions are represented symbolically, a subset of the IA assembly language is used. In this subset, an instruction has the following format:
label: mnemonic argument1, argument2, argument3
where:
• A label is an identifier which is followed by a colon.
• A mnemonic is a reserved name for a class of instruction opcodes which have the same function.
• The operands argument1, argument2, and argument3 are optional. There may be from zero to three operands, depending on the opcode. When present, they take the form of either literals or identifiers for data items. Operand identifiers are either reserved names of registers or are assumed to be assigned to data items declared in another part of the program (which may not be shown in the example).
When two operands are present in an arithmetic or logical instruction, the right operand is the source and the left operand is the destination. For example:
LOADREG: MOV EAX, SUBTOTAL
In this example, LOADREG is a label, MOV is the mnemonic identifier of an opcode, EAX is the destination operand, and SUBTOTAL is the source operand. Some assembly languages put the source and destination in reverse order.
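The symbolic format described above can be sketched as a small Python parser. This is an illustration of the notation only, not an assembler; the function name parse_instruction and the regular expression are assumptions made for this example.

```python
import re

# Optional "label:", then a mnemonic, then zero to three comma-separated operands.
_INSTRUCTION = re.compile(
    r"^\s*(?:(?P<label>[A-Za-z_][A-Za-z0-9_]*):\s*)?"
    r"(?P<mnemonic>[A-Za-z][A-Za-z0-9]*)\s*(?P<operands>.*)$"
)

def parse_instruction(line):
    """Split a symbolic instruction into (label, mnemonic, operands)."""
    match = _INSTRUCTION.match(line)
    if match is None:
        raise ValueError(f"not a symbolic instruction: {line!r}")
    operands = [op.strip() for op in match.group("operands").split(",") if op.strip()]
    if len(operands) > 3:
        raise ValueError("an instruction has from zero to three operands")
    return match.group("label"), match.group("mnemonic"), operands

label, mnemonic, operands = parse_instruction("LOADREG: MOV EAX, SUBTOTAL")
# With two operands, the left one is the destination and the right one the source.
destination, source = operands
print(label, mnemonic, destination, source)
```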
1.4.4. Hexadecimal and Binary Numbers
Base 16 (hexadecimal) numbers are represented by a string of hexadecimal digits followed by the character H (for example, F82EH). A hexadecimal digit is a character from the following set: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F. Base 2 (binary) numbers are represented by a string of 1s and 0s, sometimes followed by the character B (for example, 1010B). The B designation is only used in situations where confusion as to the type of number might arise.
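As an illustration of this notation, the following Python sketch converts strings written with the H and B suffixes into integers. The function name parse_number is an assumption for this example, and the sketch handles only suffixed numbers, not binary strings written without the B designation.

```python
def parse_number(text):
    """Convert a number written with an H (hex) or B (binary) suffix to an int."""
    token = text.strip().upper()
    if token.endswith("H"):
        # Hexadecimal digits 0-9 and A-F followed by the character H.
        return int(token[:-1], 16)
    if token.endswith("B"):
        # A string of 1s and 0s followed by the character B.
        return int(token[:-1], 2)
    raise ValueError(f"no H or B suffix: {text!r}")

print(parse_number("F82EH"))
print(parse_number("1010B"))
```

The H suffix is checked first, so a string such as 1BH is read as hexadecimal even though it contains the letter B.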
1.4.5. Segmented Addressing
The processor uses byte addressing. This means memory is organized and accessed as a sequence of bytes. Whether one or more bytes are being accessed, a byte address is used to locate the byte or bytes in memory. The range of memory that can be addressed is called an address space.
The processor also supports segmented addressing. This is a form of addressing where a program may have many independent address spaces, called segments. For example, a program can keep its code (instructions) and stack in separate segments. Code addresses would always refer to the code space, and stack addresses would always refer to the stack space. The following notation is used to specify a byte address within a segment:
Segment-register:Byte-address
For example, the following segment address identifies the byte at address FF79H in the segment pointed to by the DS register:
DS:FF79H
The following segment address identifies an instruction address in the code segment. The CS register points to the code segment and the EIP register contains the address of the instruction.
CS:EIP
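The Segment-register:Byte-address notation and the idea of an address space can be sketched in Python as follows. The function names are assumptions for this example, and the sketch only splits the notation into its two parts; it does not model how the processor translates a segment address.

```python
def parse_segment_address(text):
    """Split 'Segment-register:Byte-address' into its two components.

    The byte address is returned as an int when it is written as a
    hexadecimal number with an H suffix (for example FF79H), and as a
    register name otherwise (for example EIP). A register name that
    happens to look like a hexadecimal number would be misread; this
    sketch does not try to resolve that ambiguity.
    """
    segment, sep, offset = text.strip().upper().partition(":")
    if not sep or not segment or not offset:
        raise ValueError(f"expected Segment-register:Byte-address, got {text!r}")
    digits = offset[:-1]
    if offset.endswith("H") and digits and all(c in "0123456789ABCDEF" for c in digits):
        return segment, int(digits, 16)
    return segment, offset

def address_space_bytes(address_bits):
    """Number of bytes addressable with byte addresses of the given width."""
    return 1 << address_bits

print(parse_segment_address("DS:FF79H"))
print(parse_segment_address("CS:EIP"))
print(address_space_bytes(32))
```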
1.4.6. Exceptions
An exception is an event that typically occurs when an instruction causes an error. For example, an attempt to divide by zero generates an exception. However, some exceptions, such as breakpoints, occur under other conditions. Some types of exceptions may provide error codes. An error code reports additional information about the error. An example of the notation used to show an exception and error code is shown below.
#PF(fault code)
This example refers to a page-fault exception under conditions where an error code naming a type of fault is reported. Under some conditions, exceptions which produce error codes may not be able to report an accurate code. In this case, the error code is zero, as shown below for a general-protection exception.
#GP(0)
Refer to Chapter 5, Interrupt and Exception Handling, in the Intel Architecture Software Developer's Manual, Volume 3, for a list of exception mnemonics and their descriptions.
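The exception notation shown in this section can be read mechanically, as in the following Python sketch. The function name parse_exception and the pattern are assumptions for this example; the sketch interprets only the written notation and says nothing about how the processor reports exceptions.

```python
import re

# "#" followed by a mnemonic, optionally followed by an error code in parentheses.
_EXCEPTION = re.compile(r"^#(?P<mnemonic>[A-Z]+)(?:\((?P<code>[^()]*)\))?$")

def parse_exception(text):
    """Return (mnemonic, error_code) for notation such as #GP(0)."""
    match = _EXCEPTION.match(text.strip())
    if match is None:
        raise ValueError(f"not exception notation: {text!r}")
    code = match.group("code")
    if code is not None and code.isdigit():
        code = int(code)  # a literal error code, such as the 0 in #GP(0)
    return match.group("mnemonic"), code

print(parse_exception("#PF(fault code)"))
print(parse_exception("#GP(0)"))
```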
1.5. RELATED LITERATURE
The following books contain additional material related to Intel processors:
• Intel Pentium II Processor Specification Update, Order Number 243337-010.
• Intel Pentium Pro Processor Specification Update, Order Number 242689.
• Intel Pentium Processor Specification Update, Order Number 242480.
• AP-485, Intel Processor Identification and the CPUID Instruction, Order Number 241618.
• AP-578, Software and Hardware Considerations for FPU Exception Handlers for Intel Architecture Processors, Order Number 242415-001.
• Pentium Pro Processor Family Developer's Manual, Volume 1: Specifications, Order Number 242690-001.
• Pentium Processor Family Developer's Manual, Order Number 241428.
• Intel486 Microprocessor Data Book, Order Number 240440.
• Intel486 SX CPU/Intel487 SX Math Coprocessor Data Book, Order Number 240950.
• Intel486 DX2 Microprocessor Data Book, Order Number 241245.
• Intel486 Microprocessor Product Brief Book, Order Number 240459.
• Intel386 Processor Hardware Reference Manual, Order Number 231732.
• Intel386 Processor System Software Writer's Guide, Order Number 231499.
• Intel386 High-Performance 32-Bit CHMOS Microprocessor with Integrated Memory Management, Order Number 231630.
• 376 Embedded Processor Programmer's Reference Manual, Order Number 240314.
• 80387 DX User's Manual Programmer's Reference, Order Number 231917.
• 376 High-Performance 32-Bit Embedded Processor, Order Number 240182.
• Intel386 SX Microprocessor, Order Number 240187.
• Microprocessor and Peripheral Handbook (Vol. 1), Order Number 230843.
• AP-528, Optimizations for Intel's 32-Bit Processors, Order Number 242816-001.
CHAPTER 2 INTRODUCTION TO THE INTEL ARCHITECTURE

A strong case can be made that the exponential growth of both the power and breadth of usage of the computer has made it the most important force that is reshaping human technology, business, and society in the second half of the twentieth century. Further, the computer promises to continue to dominate technological growth well into the twenty-first century, in part because other powerful technological forces that are just emerging are strongly dependent on the growth of computing power for their own existence and growth (such as the Internet, and genetics developments like recombinant DNA research and development).
The Intel Architecture (IA) is clearly today's preferred computer architecture, as measured by number of computers in use and total computing power available in the world. Thus it is hard to overestimate the importance of the IA.
2.1. BRIEF HISTORY OF THE INTEL ARCHITECTURE
The developments leading to the IA can be traced back through the 8085 and 8080 microprocessors to the 4004 microprocessor (the first microprocessor, designed by Intel in 1969). However, the first actual processor in the IA family is the 8086, quickly followed by a more cost-effective version for smaller systems, the 8088. The object code programs created for these processors starting in 1978 will still execute on the latest members of the IA family.
The 8086 has 16-bit registers and a 16-bit external data bus, with 20-bit addressing giving a 1-MByte address space. The 8088 is identical except for a smaller external data bus of 8 bits. These processors introduced IA segmentation, but only in Real Mode; 16-bit registers can act as pointers to address into segments of up to 64 KBytes in size. The four segment registers hold the (effectively) 20-bit base addresses of the currently active segments; up to 256 KBytes can be addressed without switching between segments, and a total address range of 1 MByte is available.
The Intel 80286 processor introduced the Protected Mode into the IA. This new mode uses the segment register contents as selectors or pointers into descriptor tables. The descriptors provide 24-bit base addresses, allowing a maximum physical memory size of up to 16 MBytes, support for virtual memory management on a segment swapping basis, and various protection mechanisms. These include segment limit checking, read-only and execute-only segment options, and up to four privilege levels to protect operating system code (in several subdivisions, if desired) from application or user programs. Furthermore, hardware task switching and the local descriptor tables allow the operating system to protect application or user programs from each other.
The Intel386 processor introduced 32-bit registers into the architecture, for use both as operands for calculations and for addressing. The lower half of each 32-bit register retained the properties of one of the 16-bit registers of the earlier two generations, to provide complete upward compatibility. A new virtual-8086 mode was provided to yield greater efficiency when executing programs created for the 8086 and 8088 processors on the new 32-bit machine. The 32-bit addressing was supported with an external 32-bit address bus, giving a 4-GByte address space, and also allowed each segment to be as large as 4 GBytes. The original instructions were enhanced with new 32-bit operand and addressing forms, and completely new instructions were provided, including those for bit manipulation. The Intel386 processor also introduced paging into the IA, with the fixed 4-KByte page size providing a method for virtual memory management that was significantly superior to using segments for the purpose (it was much more efficient for operating systems, and completely transparent to the applications without significant sacrifice of execution speed). Furthermore, the ability to define segments as large as the 4-GByte physical address space, together with paging, allowed the creation of protected flat model (see footnote 1) addressing systems in the architecture, including complete implementations of the widely used mainframe operating system UNIX.
The IA has been and is committed to the task of maintaining backward compatibility at the object code level to preserve our customers' very large investment in software, but at the same time, in each generation of the architecture, the latest, most effective microprocessor architecture and silicon fabrication technologies have been used to produce the fastest, most powerful processors possible.
Intel has worked over the generations to adapt and incorporate increasingly sophisticated techniques from mainframe architecture into microprocessor architecture. Various forms of parallel processing have been the most performance enhancing of these techniques, and the Intel386 processor was the first IA processor to include a number of parallel stages: six.
These are the Bus Interface Unit (accesses memory and I/O for the other units), the Code Prefetch Unit (receives object code from the Bus Unit and puts it into a 16-byte queue), the Instruction Decode Unit (decodes object code from the Prefetch Unit into microcode), the Execution Unit (executes the microcode instructions), the Segment Unit (translates logical addresses to linear addresses and does protection checks), and the Paging Unit (translates linear addresses to physical addresses, does page-based protection checks, and contains a cache with information for up to 32 most recently accessed pages).
The Intel486 processor added more parallel execution capability by (basically) expanding the Intel386 processor's Instruction Decode and Execution Units into five pipelined stages, where each stage (when needed) operates in parallel with the others on up to five instructions in different stages of execution. Each stage can do its work on one instruction in one clock, and so the Intel486 processor can execute as rapidly as one instruction per CPU clock. An 8-KByte on-chip L1 cache was added to the Intel486 processor to greatly increase the percent of instructions that could execute at the scalar rate of one per clock: memory access instructions were now included if the operand was in the L1 cache. The Intel486 processor also for the first time integrated the floating-point math unit onto the same chip as the CPU (refer to Section 2.3., Brief History of the Intel Architecture Floating-Point Unit) and added new pins, bits, and instructions to support more complex and powerful systems (L2 cache support and multiprocessor support).
Late in the Intel486 processor generation, Intel incorporated features designed to support energy savings and other system management capabilities into the IA mainstream with the Intel486 SL Enhanced processors.
These features were developed in the Intel386 SL and Intel486 SL processors, which were specialized for the rapidly growing battery-operated notebook PC market. The features include the new System Management Mode, triggered by its own dedicated interrupt pin, which allows complex system management features (such as power management of various subsystems within the PC) to be added to a system transparently to the main operating system and all applications. The Stop Clock and Auto Halt Powerdown features allow the CPU itself to execute at a reduced clock rate to save power, or to be shut down (with state preserved) to save even more power.
Footnote 1. Requires only one 32-bit address component to access anywhere in the address space.
The Intel Pentium processor added a second execution pipeline to achieve superscalar performance (two pipelines, known as u and v, together can execute two instructions per clock). The on-chip L1 cache has also been doubled, with 8 KBytes devoted to code, and another 8 KBytes devoted to data. The data cache uses the MESI protocol to support the more efficient write-back mode, as well as the write-through mode that is used by the Intel486 processor. Branch prediction with an on-chip branch table has been added to increase performance in looping constructs. Extensions have been added to make the virtual-8086 mode more efficient, and to allow for 4-MByte as well as 4-KByte pages. The main registers are still 32 bits, but internal data paths of 128 and 256 bits have been added to speed internal data transfers, and the burstable external data bus has been increased to 64 bits. The Advanced Programmable Interrupt Controller (APIC) has been added to support systems with multiple Pentium processors, and new pins and a special mode (dual processing) have been designed in to support glueless two-processor systems.
The Intel Pentium Pro processor introduced Dynamic Execution. It has a three-way superscalar architecture, which means that it can execute three instructions per CPU clock. It does this by incorporating even more parallelism than the Pentium processor.
The Pentium Pro processor provides Dynamic Execution (micro-data flow analysis, out-of-order execution, superior branch prediction, and speculative execution) in a superscalar implementation. Three instruction decode units work in parallel to decode object code into smaller operations called micro-ops. These go into an instruction pool, and (when interdependencies don't prevent) can be executed out of order by the five parallel execution units (two integer, two FPU and one memory interface unit). The Retirement Unit retires completed micro-ops in their original program order, taking account of any branches.
The power of the Pentium Pro processor is further enhanced by its caches: it has the same two on-chip 8-KByte L1 caches as does the Pentium processor, and also has a 256-KByte L2 cache that is in the same package as, and closely coupled to, the CPU, using a dedicated 64-bit (backside) full clock speed bus. The L1 cache is dual-ported, the L2 cache supports up to 4 concurrent accesses, and the 64-bit external data bus is transaction-oriented, meaning that each access is handled as a separate request and response, with numerous requests allowed while awaiting a response. These parallel features for data access work with the parallel execution capabilities to provide a non-blocking architecture in which the processor is more fully utilized and performance is enhanced. The Pentium Pro processor also has an expanded 36-bit address bus, giving a maximum physical address space of 64 GBytes.
The Pentium II processor added MMX instructions to the Pentium Pro processor architecture, incorporating the new slot 1 and slot 2 packaging techniques. These new packaging techniques moved the L2 cache off-chip or off-die. The slot 1 and slot 2 package uses a single-edge connector instead of a socket. The Pentium II processor expanded the L1 data cache and L1 instruction cache to 16 KBytes each.
The Pentium II processor has L2 cache sizes of 256 KBytes, 512 KBytes and 1 MByte or 2 MByte (slot 2 only). The slot 1 processor uses a half clock speed backside bus while the slot 2 processor uses a full clock speed backside bus. The Pentium II processors utilize multiple low-power states such as AutoHALT, Stop-Grant, Sleep, and Deep Sleep to conserve power during idle times.
The newest processor in the IA is the Pentium III processor. It is based on the Pentium Pro and Pentium II processors' architectures. The Pentium III processor introduces 70 new instructions to the IA instruction set. These instructions target existing functional units within the architecture as well as the new SIMD floating-point unit.
More detailed discussion of the new features in the Pentium Pro, Pentium II, and Pentium III processors is provided in Section 2.4., Introduction to the P6 Family Processors' Advanced Microarchitecture, and Section 2.5., Detailed Description of the P6 Family Processor Microarchitecture. More detailed hardware and architectural information on each of the generations of the IA family is available in the separate data books for the processor generations (refer to Section 1.5., Related Literature, in Chapter 1, About This Manual).
2.2.
INCREASING INTEL ARCHITECTURE PERFORMANCE AND MOORE'S LAW
In the mid-1960s, Intel Chairman of the Board Gordon Moore deduced a principle or law which has continued to be true for over three decades: the computing power and the complexity (or roughly, the number of transistors per CPU chip) of the silicon integrated circuit microprocessor doubles every one to two years, and the cost per CPU chip is cut in half. This law is the main explanation for the computer revolution, in which the IA plays such a significant role.
The table below shows the dramatic increases in performance and transistor count of the IA processors over their history, as predicted by Moore's Law, and also summarizes the evolution of other key features of the architecture.
Table 2-1. Processor Performance Over Time and Other Intel Architecture Key Features
Intel        Date of       Performance  Max. CPU Freq.   Transistors  Main CPU Register   Extern. Data  Max. Extern.  Caches in CPU
Processor    Introduction  in MIPS (1)  at Introduction  on the Die   Size (2)            Bus Size (2)  Addr. Space   Package (3)
8086         1978          0.8          8 MHz            29 K         16                  16            1 MB          None
Intel 286    1982          2.7          12.5 MHz         134 K        16                  16            16 MB         Note 3
Intel386 DX  1985          6.0          20 MHz           275 K        32                  32            4 GB          Note 3
Intel486 DX  1989          20           25 MHz           1.2 M        32                  32            4 GB          8KB L1
Pentium      1993          100          60 MHz           3.1 M        32                  64            4 GB          16KB L1
Pentium Pro  1995          440          200 MHz          5.5 M        32                  64            64 GB         16KB L1; 256KB or 512KB L2
Pentium II   1997          466          266 MHz          7 M          32                  64            64 GB         32KB L1; 256KB or 512KB L2
Pentium III  1999          1000         500 MHz          8.2 M        32 GP, 128 SIMD-FP  64            64 GB         32KB L1; 512KB L2
NOTES:
1. Performance here is indicated by Dhrystone MIPS (Millions of Instructions per Second) because even though MIPS are no longer considered a preferred measure of CPU performance, they are the only benchmarks that span all six generations of the IA. The MIPS and frequency values given here correspond to the maximum CPU frequency available at product introduction.
2. Main CPU register size and external data bus size are given in bits. Note also that there are 8 and 16-bit data registers in all of the CPUs, there are eight 80-bit registers in the FPUs integrated into the Intel386 chip and beyond, and there are internal data paths that are 2 to 4 times wider than the external data bus for each processor.
3. In addition to the large general-purpose caches listed in the table for the Intel486 processor (8 KBytes of combined code and data) and the Intel Pentium and Pentium Pro processors (8 KBytes each for separate code cache and data cache), there are smaller special purpose caches. The Intel 286 has 6 byte descriptor caches for each segment register. The Intel386 has 8 byte descriptor caches for each segment register, and also a 32-entry, 4-way set associative Translation Lookaside Buffer (cache) to store access information for recently used pages on the chip. The Intel486 has the same caches described for the Intel386, as well as its 8K L1 general-purpose cache. The Intel Pentium and Pentium Pro processors have their general-purpose caches, descriptor caches, and two Translation Lookaside Buffers each (one for each 8K L1 cache). The Pentium II and Pentium III processors have the same cache structure as the Pentium Pro processor except that the size of each cache is 16K.
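The doubling claim can be checked roughly against Table 2-1 by computing the average doubling period implied by two of its entries. The following is a minimal sketch in Python (chosen here only for illustration); the function name doubling_period_years is hypothetical, and the inputs are the 8086 and Pentium III rows of the table.

```python
import math


def doubling_period_years(start_year, start_value, end_year, end_value):
    """Average number of years per doubling between two data points."""
    doublings = math.log2(end_value / start_value)
    return (end_year - start_year) / doublings


# Endpoints taken from Table 2-1: 8086 (1978) and Pentium III (1999).
transistors = doubling_period_years(1978, 29e3, 1999, 8.2e6)
mips = doubling_period_years(1978, 0.8, 1999, 1000)

print(f"transistor count: one doubling per {transistors:.2f} years on average")
print(f"Dhrystone MIPS:   one doubling per {mips:.2f} years on average")
```

Two endpoints give only an average over the whole period; individual generations in the table grow faster or slower than that average.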
2.3.
BRIEF HISTORY OF THE INTEL ARCHITECTURE FLOATING-POINT UNIT
The IA Floating-Point Units (FPUs) before the Intel486 lack the added efficiency of integration into the CPU, but have provided the option of greatly enhanced floating-point performance since the beginning of the family. (Since the earlier FPUs were on separate chips, they were often referred to as numeric processor extensions (NPXs) or math coprocessors (MCPs).) With each succeeding generation, Intel has made significant increases in the power and flexibility of the FPU, and yet has maintained complete upward compatibility. The Pentium Pro processor offers compatibility with object code for 8087, Intel 287, Intel 387 DX, Intel 387 SX, and Intel 487 DX math coprocessors and the Intel486 DX and Pentium processors.
The 8087 numeric processor extension (NPX) was designed for use in 8086-family systems. The 8086 was the first microprocessor family to partition the processing unit to permit high-performance numeric capabilities. The 8087 NPX for this processor family implemented a complete numeric processing environment in compliance with an early proposal for IEEE Standard 754 for Binary Floating-Point Arithmetic.
With the Intel 287 coprocessor NPX, high-speed numeric computations were extended to 80286 high-performance multitasking and multi-user systems. Multiple tasks using the numeric processor extension were afforded the full protection of the 80286 memory management and protection features.
The Intel 387 DX and SX math coprocessors are Intel's third-generation numeric processors. They implement the final IEEE Standard 754, adding new trigonometric instructions, and using a new design and CHMOS-III process to allow higher clock rates and require fewer clocks per instruction. Together, the Intel 387 math coprocessor with additional instructions and the improved standard brought even more convenience and reliability to numeric programming and made this convenience and reliability available to applications that need the high-speed and large memory capacity of the 32-bit environment of the Intel386 microprocessor.
The Intel486 processor FPU is an on-chip equivalent of the Intel 387 DX math coprocessor conforming to both IEEE Standard 754 and the more recent, generalized IEEE Standard 854. Having the FPU on-chip results in a considerable performance improvement in numeric-intensive computation.
The Pentium processor FPU has been completely redesigned over the Intel486 processor FPU while maintaining conformance to both the IEEE Standard 754 and 854. Faster algorithms provide at least three times the performance over the Intel486 processor FPU for common operations including ADD, MUL, and LOAD. Many applications can achieve five times the performance of the Intel486 processor FPU or more with instruction scheduling and pipelined execution.
2.4.
INTRODUCTION TO THE P6 FAMILY PROCESSOR'S ADVANCED MICROARCHITECTURE
The P6 Family processors (introduced by Intel in 1995) are the most recent generation of processors in the IA family; the Pentium Pro processor is their earliest implementation. Like its predecessor, the Pentium processor (introduced by Intel in 1993), the Pentium Pro processor, with its advanced superscalar microarchitecture, sets an impressive performance standard. In designing the P6 Family processors, one of the primary goals of the Intel chip architects was to exceed the performance of the Pentium processor significantly while still using the same 0.6-micrometer, four-layer-metal BiCMOS manufacturing process. Using the same manufacturing process as the Pentium processor meant that performance gains could only be achieved through substantial advances in the microarchitecture.
The resulting P6 Family processor microarchitecture is a three-way superscalar, pipelined architecture. The term "three-way superscalar" means that, using parallel processing techniques, the processor is able on average to decode, dispatch, and complete execution of (retire) three instructions per clock cycle. To handle this level of instruction throughput, the P6 Family processors use a decoupled, 12-stage superpipeline that supports out-of-order instruction execution. Figure 2-1 shows a conceptual view of this pipeline, with the pipeline divided into four processing units (the fetch/decode unit, the dispatch/execute unit, the retire unit, and the instruction pool). Instructions and data are supplied to these units through the bus interface unit.
[Block diagram. Blocks shown: System Bus, L2 Cache, Cache Bus, Bus Interface Unit, L1 Instruction Cache, L1 Data Cache, Fetch/Decode Unit, Dispatch/Execute Unit, Retire Unit, Instruction Pool, and Intel Architecture Registers; paths are labeled Fetch, Load, and Store.]
Figure 2-1. The Processing Units in the P6 Family Processor Microarchitecture and Their Interface with the Memory Subsystem
To ensure a steady supply of instructions and data to the instruction execution pipeline, the P6 Family processor microarchitecture incorporates two cache levels. The L1 cache provides an 8-KByte instruction cache and an 8-KByte data cache, both closely coupled to the pipeline. The L2 cache is a 256-KByte, 512-KByte, or 1-MByte static RAM that is coupled to the core processor through a full clock-speed 64-bit cache bus.
The centerpiece of the P6 Family processor microarchitecture is an innovative out-of-order execution mechanism called dynamic execution. Dynamic execution incorporates three data-processing concepts:
• Deep branch prediction.
• Dynamic data flow analysis.
• Speculative execution.
Branch prediction is a concept found in most mainframe and high-speed microprocessor architectures. It allows the processor to decode instructions beyond branches to keep the instruction pipeline full. In the P6 Family processors, the instruction fetch/decode unit uses a highly optimized branch prediction algorithm to predict the direction of the instruction stream through multiple levels of branches, procedure calls, and returns.
Dynamic data flow analysis involves real-time analysis of the flow of data through the processor to determine data and register dependencies and to detect opportunities for out-of-order instruction execution. The P6 Family processor's dispatch/execute unit can simultaneously monitor many instructions and execute these instructions in the order that optimizes the use of the processor's multiple execution units, while maintaining data integrity. This out-of-order execution keeps the execution units busy even when cache misses and data dependencies among instructions occur.
Speculative execution refers to the processor's ability to execute instructions ahead of the program counter but ultimately to commit the results in the order of the original instruction stream. To make speculative execution possible, the P6 Family processor's microarchitecture decouples the dispatching and executing of instructions from the commitment of results. The processor's dispatch/execute unit uses data-flow analysis to execute all available instructions in the instruction pool and store the results in temporary registers. The retirement unit then linearly searches the instruction pool for completed instructions that no longer have data dependencies with other instructions or unresolved branch predictions. When completed instructions are found, the retirement unit commits the results of these instructions to memory and/or the IA registers (the processor's eight general-purpose registers and eight floating-point unit data registers) in the order they were originally issued and retires the instructions from the instruction pool.
Through deep branch prediction, dynamic data-flow analysis, and speculative execution, dynamic execution removes the constraint of linear instruction sequencing between the traditional fetch and execute phases of instruction execution. It allows instructions to be decoded deep into multi-level branches to keep the instruction pipeline full. It promotes out-of-order instruction execution to keep the processor's six instruction execution units running at full capacity. And finally, it commits the results of executed instructions in original program order to maintain data integrity and program coherency.
The following section describes the P6 Family processor microarchitecture in greater detail. The Pentium Pro processor architecture is the base architecture for the processors that followed it.
The Pentium II processor and now the Pentium III processor are based on the Pentium Pro processor architecture. Changes or enhancements to the Pentium Pro processor architecture are noted where appropriate.
2.5.
DETAILED DESCRIPTION OF THE P6 FAMILY PROCESSOR MICROARCHITECTURE
Figure 2-2 shows a functional block diagram of the P6 Family processor microarchitecture. In this diagram, the following blocks make up the four processing units and the memory subsystem shown in Figure 2-1:
• Memory subsystem: System bus, L2 cache, bus interface unit, instruction cache (L1), data cache unit (L1), memory interface unit, and memory reorder buffer.
• Fetch/decode unit: Instruction fetch unit, branch target buffer, instruction decoder, microcode sequencer, and register alias table.
• Instruction pool: Reorder buffer.
• Dispatch/execute unit: Reservation station, two integer units, one x87 floating-point unit, two address generation units, and two SIMD floating-point units.
• Retire unit: Retire unit and retirement register file.
2.5.1.
Memory Subsystem
The memory subsystem for the P6 Family processor consists of main system memory, the primary cache (L1), and the secondary cache (L2). The bus interface unit accesses system memory through the external system bus. This 64-bit bus is a transaction-oriented bus, meaning that each bus access is handled as separate request and response operations. While the bus interface unit is waiting for a response to one bus request, it can issue numerous additional requests.
The bus interface unit accesses the close-coupled L2 cache through a 64-bit cache bus. This bus is also transaction oriented, supporting up to four concurrent cache accesses, and operates at the full clock speed of the processor.
Access to the L1 caches is through internal buses, also at full clock speed. The 8-KByte L1 instruction cache is four-way set associative; the 8-KByte L1 data cache is dual-ported and two-way set associative, supporting one load and one store operation per cycle.
Coherency between the caches and system memory is maintained using the MESI (modified, exclusive, shared, invalid) cache protocol. This protocol fosters cache coherency in single- and multiple-processor systems. It is also able to detect coherency problems created by self-modifying code.
Memory requests from the processor's execution units go through the memory interface unit and the memory reorder buffer. These units have been designed to support a smooth flow of memory access requests through the cache and system memory hierarchy to prevent memory access blocking. The L1 data cache automatically forwards a cache miss on to the L2 cache, and then, if necessary, the bus interface unit forwards an L2 cache miss to system memory.
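To make the set-associative organization concrete, the sketch below splits an address into tag, set index, and byte offset for the 8-KByte, two-way L1 data cache described above. It is an illustration in Python, not a description of the hardware: the 32-byte line size is an assumption for the data cache (this section states a 32-byte line only for instruction fetch), and the names split_address, NUM_SETS, and so on are hypothetical.

```python
CACHE_BYTES = 8 * 1024   # 8-KByte L1 data cache (from the text)
WAYS = 2                 # two-way set associative (from the text)
LINE_BYTES = 32          # assumption: 32-byte cache lines

NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)   # lines per way
OFFSET_BITS = LINE_BYTES.bit_length() - 1       # log2 of the line size
INDEX_BITS = NUM_SETS.bit_length() - 1          # log2 of the number of sets


def split_address(addr):
    """Return (tag, set_index, byte_offset) for an address."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


# Two addresses that differ only in their tag compete for the same set;
# with two ways, both lines can be resident at the same time.
print(split_address(0x0000), split_address(0x1000))
```

Under the same assumptions, changing WAYS to 4 models the four-way instruction cache, which halves the number of sets.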
[Block diagram. Blocks shown: System Bus (External), L2 Cache, Cache Bus, Bus Interface Unit, Instruction Fetch Unit, Next IP Unit, Instruction Cache (L1), Branch Target Buffer, Instruction Decoder (two simple instruction decoders and one complex instruction decoder), Microcode Instruction Sequencer, Register Alias Table, Retirement Unit, Retirement Register File (Intel Architecture registers), Reorder Buffer (Instruction Pool), Reservation Station, Execution Unit (SIMD FP Unit, Floating-Point Unit, two Integer Units, Memory Interface Unit), Memory Reorder Buffer, Data Cache Unit (L1), and Internal Data-Results Buses.]
Figure 2-2. Functional Block Diagram of the P6 Family Processor Microarchitecture
Memory requests to the L2 cache or system memory go through the memory reorder buffer, which functions as a scheduling and dispatch station. This unit keeps track of all memory requests and is able to reorder some requests to prevent blocks and improve throughput. For example, the memory reorder buffer allows loads to pass stores. It also issues speculative loads. (Stores are always dispatched in order, and speculative stores are never issued.)
2.5.2.
Fetch/Decode Unit
The fetch/decode unit reads a stream of IA instructions from the L1 instruction cache and decodes them into a series of micro-operations called "micro-ops." This micro-op stream (still in the order of the original instruction stream) is then sent to the instruction pool.
The instruction fetch unit fetches one 32-byte cache line per clock from the instruction cache. It marks the beginning and end of the IA instructions in the cache lines and transmits 16 aligned bytes to the decoder.
The instruction fetch unit computes the instruction pointer, based on inputs from the branch target buffer, the exception/interrupt status, and branch-misprediction indications from the integer execution units. The most important part of this process is the branch prediction performed by the branch target buffer. Using an extension of Yeh's algorithm, the 512-entry branch target buffer looks many instructions ahead of the retirement program counter. Within this instruction window there may be numerous branches, procedure calls, and returns that must be correctly predicted if the dispatch/execute unit is to do useful work.
The instruction decoder contains three parallel decoders: two simple-instruction decoders and one complex-instruction decoder. Each decoder converts an IA instruction into one or more triadic micro-ops (two logical sources and one logical destination per micro-op). Micro-ops are primitive instructions that are executed by the processor's six parallel execution units.
Many IA instructions are converted directly into single micro-ops by the simple instruction decoders, and some instructions are decoded into one to four micro-ops by the complex instruction decoder. The more complex IA instructions are decoded into sequences of preprogrammed micro-ops obtained from the microcode instruction sequencer. The instruction decoders also handle the decoding of instruction prefixes and looping operations. The instruction decoder can generate up to six micro-ops per clock cycle (one each for the simple instruction decoders and four for the complex instruction decoder).
The IA's register set can cause resource stalls due to register dependencies. To solve this problem, the processor provides 40 internal, general-purpose registers, which are used for the actual computations. These registers can handle both integer and floating-point values. To allocate the internal registers, the enqueued micro-ops from the instruction decoder are sent to the register alias table unit, where references to the logical IA registers are converted into internal physical register references.
In the final step of the decoding process, the allocator in the register alias table unit adds status bits and flags to the micro-ops to prepare them for out-of-order execution and sends the resulting micro-ops to the instruction pool.
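The renaming step can be illustrated with a toy register alias table. The sketch below (Python, for illustration only) maps logical IA register names onto a pool of 40 physical registers, as the text describes; the class name, the first-free allocation policy, and the omission of register reclamation at retirement are all simplifying assumptions, not the processor's actual mechanism.

```python
class RegisterAliasTable:
    """Toy register alias table: logical IA registers -> physical registers."""

    def __init__(self, num_physical=40):
        self.free = list(range(num_physical))   # unallocated physical registers
        self.alias = {}                          # logical name -> physical index

    def rename(self, dest, sources):
        """Rename one triadic micro-op (one destination, up to two sources).

        Sources are looked up in the current mapping; a source with no
        mapping yet is passed through by name (its value is architectural).
        The destination always receives a fresh physical register.
        """
        phys_sources = [self.alias.get(s, s) for s in sources]
        if not self.free:
            raise RuntimeError("resource stall: no free physical register")
        phys_dest = self.free.pop(0)
        self.alias[dest] = phys_dest
        return phys_dest, phys_sources


rat = RegisterAliasTable()
first = rat.rename("EAX", ["EBX", "ECX"])   # EAX <- EBX op ECX
second = rat.rename("EDX", ["EAX"])         # EDX <- EAX (reads the first result)
third = rat.rename("EAX", ["ESI", "EDI"])   # EAX reused, new physical register
print(first, second, third)
```

Because the third micro-op writes a different physical register than the first, it does not have to wait for the second micro-op to read the old value of EAX.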
2.5.3.
Instruction Pool (Reorder Buffer)
Prior to entering the instruction pool (known formally as the reorder buffer), the micro-op instruction stream is in the same order as the IA instruction stream that was sent to the instruction decoder. No reordering of instructions has taken place.
The reorder buffer is an array of content-addressable memory, arranged into 40 micro-op registers. It contains micro-ops that are waiting to be executed, as well as those that have already been executed but not yet committed to machine state. The dispatch/execute unit can execute instructions from the reorder buffer in any order.
2.5.4.
Dispatch/Execute Unit
The dispatch/execute unit is an out-of-order unit that schedules and executes the micro-ops stored in the reorder buffer according to data dependencies and resource availability and temporarily stores the results of these speculative executions.
The scheduling and dispatching of micro-ops from the reorder buffer is handled by the reservation station. It continuously scans the reorder buffer for micro-ops that are ready to be executed (that is, all the source operands are available) and dispatches them to the available execution units. The results of a micro-op execution are returned to the reorder buffer and stored along with the micro-op until it is retired. This scheduling and dispatching process supports classic out-of-order execution, where micro-ops are dispatched to the execution units strictly according to data-flow constraints and execution resource availability, without regard to the original ordering of the instructions. When two or more micro-ops of the same type (for example, integer operations) are available at the same time, they are executed in a pseudo-FIFO order in the reorder buffer.
Execution of micro-ops is handled by two integer units, two floating-point units, and one memory-interface unit, allowing up to five micro-ops to be scheduled per clock.
The two integer units can handle two integer micro-ops in parallel. One of the integer units is designed to handle branch micro-ops. This unit has the ability to detect branch mispredictions and signal the branch target buffer to restart the pipeline. This operation is handled as follows. The instruction decoder tags each branch micro-op with both branch destination addresses (the predicted destination and the fall-through destination). When the integer unit executes the branch micro-op, it is able to determine whether the predicted or the fall-through destination was taken. If the predicted branch is taken, then speculatively executed micro-ops are marked usable and execution continues along the predicted instruction path. If the predicted branch was not taken, a jump execution unit in the integer unit changes the status of all of the micro-ops following the branch to remove them from the instruction pool. It then provides the proper branch destination to the branch target buffer, which in turn restarts the pipeline from the new target address.
The memory interface unit handles load and store micro-ops. A load access only needs to specify the memory address, so it can be encoded in one micro-op. A store access needs to specify both an address and the data to be written, so it is encoded in two micro-ops. The part of the memory interface unit that handles stores has two ports allowing it to process the address and the data micro-op in parallel. The memory interface unit can thus execute both a load and a store in parallel in one clock cycle.
The floating-point execution units are similar to those found in the Pentium processor. Several new floating-point instructions have been added to the P6 Family processor to streamline conditional branches and moves.
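The branch-resolution steps described above can be sketched as a small function. This is a toy model in Python under stated assumptions: the instruction pool is a plain list in program order, each branch micro-op is a dictionary carrying the two tagged destinations, and the names resolve_branch, predicted, and fall_through are hypothetical.

```python
def resolve_branch(pool, branch_index, prediction_correct):
    """Resolve the branch micro-op at pool[branch_index].

    Returns (surviving_pool, restart_address). restart_address is None
    when the prediction was correct and the pipeline keeps running.
    """
    branch = pool[branch_index]
    if prediction_correct:
        # Speculative micro-ops after the branch become usable as-is.
        return pool, None
    # Misprediction: drop every micro-op that follows the branch and
    # restart fetching from the destination that was not predicted.
    return pool[:branch_index + 1], branch["fall_through"]


pool = [
    {"op": "cmp"},
    {"op": "jcc", "predicted": 0x400, "fall_through": 0x108},
    {"op": "add"},   # fetched speculatively along the predicted path
    {"op": "mov"},   # fetched speculatively along the predicted path
]
kept, restart = resolve_branch(pool, 1, prediction_correct=False)
print(len(kept), hex(restart))
```

Only micro-ops younger than the branch are discarded; older micro-ops and the branch itself remain in the pool and retire normally.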
2.5.5.
Retirement Unit
The retirement unit commits the results of speculatively executed micro-ops to permanent machine state and removes the micro-ops from the reorder buffer. Like the reservation station, the retirement unit continuously checks the status of micro-ops in the reorder buffer, looking for ones that have been executed and no longer have any dependencies with other micro-ops in the instruction pool. It then retires completed micro-ops in their original program order, taking into account interrupts, exceptions, breakpoints, and branch mispredictions.
The retirement unit can retire three micro-ops per clock. In retiring a micro-op, it writes the results to the retirement register file and/or memory. The retirement register file contains the IA registers (eight general-purpose registers and eight floating-point data registers). After the results have been committed to machine state, the micro-op is removed from the reorder buffer.
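The interplay of out-of-order dispatch and in-order retirement can be seen in a small simulation. The sketch below is a toy model in Python, not the processor's scheduler: it takes the five execution units and the retirement width of three from the text, but the latencies, the one-dispatch-per-unit-per-clock rule, and the function name simulate are assumptions. It also assumes registers have already been renamed, so only true data dependencies remain.

```python
def simulate(uops, units=5, retire_width=3):
    """uops: list of (name, dest, sources, latency) in program order.

    Returns (execution_order, retirement_order) as lists of names.
    """
    # For each micro-op, find the earlier micro-ops that produce its sources.
    producers, last_writer = [], {}
    for index, (_, dest, sources, _) in enumerate(uops):
        producers.append([last_writer[s] for s in sources if s in last_writer])
        last_writer[dest] = index

    finish = {}                 # index -> clock at which its result is available
    executed, retired = [], []
    clock = 0
    while len(retired) < len(uops):
        clock += 1
        # Dispatch: scan in program order and start whatever is ready.
        started = 0
        for index, (name, _, _, latency) in enumerate(uops):
            if started == units:
                break
            if index in finish:
                continue
            if all(p in finish and finish[p] <= clock for p in producers[index]):
                finish[index] = clock + latency
                executed.append(name)
                started += 1
        # Retire: strictly in program order, stopping at the first
        # micro-op whose result is not yet available.
        for _ in range(retire_width):
            oldest = len(retired)
            if oldest < len(uops) and finish.get(oldest, clock + 1) <= clock:
                retired.append(uops[oldest][0])
            else:
                break
    return executed, retired


program = [
    ("load A", "r1", [], 3),        # slow, for example a cache miss
    ("add B", "r2", ["r1"], 1),     # must wait for the load
    ("mul C", "r3", [], 1),         # independent of the load
    ("sub D", "r4", ["r3"], 1),     # depends only on the multiply
]
execution_order, retirement_order = simulate(program)
print(execution_order)
print(retirement_order)
```

In this run the independent multiply and subtract execute while the load is still outstanding, yet every micro-op retires in its original program position.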
CHAPTER 3 BASIC EXECUTION ENVIRONMENT

This chapter describes the basic execution environment of an Intel Architecture (IA) processor as seen by assembly-language programmers. It describes how the processor executes instructions and how it stores and manipulates data. The parts of the execution environment described here include memory (the address space), the general-purpose data registers, the segment registers, the EFLAGS register, and the instruction pointer register. The execution environment for the floating-point unit (FPU) is described in Chapter 7, "Floating-Point Unit."
3.1.
MODES OF OPERATION
The IA supports three operating modes: protected mode, real-address mode, and system management mode. The operating mode determines which instructions and architectural features are accessible:
• Protected mode. The native state of the processor. In this mode all instructions and architectural features are available, providing the highest performance and capability. This is the recommended mode for all new applications and operating systems. Among the capabilities of protected mode is the ability to directly execute "real-address mode" 8086 software in a protected, multitasking environment. This feature is called virtual-8086 mode, although it is not actually a processor mode. Virtual-8086 mode is actually a protected mode attribute that can be enabled for any task.
• Real-address mode. Provides the programming environment of the Intel 8086 processor with a few extensions (such as the ability to switch to protected or system management mode). The processor is placed in real-address mode following power-up or a reset.
• System management mode. A standard architectural feature unique to all Intel processors, beginning with the Intel386 SL processor. This mode provides an operating system or executive with a transparent mechanism for implementing platform-specific functions such as power management and system security. The processor enters SMM when the external SMM interrupt pin (SMI#) is activated or an SMI is received from the advanced programmable interrupt controller (APIC). In SMM, the processor switches to a separate address space while saving the entire context of the currently running program or task. SMM-specific code may then be executed transparently. Upon returning from SMM, the processor is placed back into its state prior to the system management interrupt.
The basic execution environment is the same for each of these operating modes, as is described in the remaining sections of this chapter.
3.2.
OVERVIEW OF THE BASIC EXECUTION ENVIRONMENT
Any program or task running on an IA processor is given a set of resources for executing instructions and for storing code, data, and state information. These resources (shown in Figure 3-1) include an address space of up to 2^36 bytes, a set of general data registers, a set of segment registers, and a set of status and control registers. When a program calls a procedure, a procedure stack is added to the execution environment. (Procedure calls and the procedure stack implementation are described in Chapter 4, "Procedure Calls, Interrupts, and Exceptions.")
[Diagram. Items shown: General-Purpose Registers (eight 32-bit registers); Segment Registers (six 16-bit registers); EFLAGS Register (32 bits); EIP, the Instruction Pointer Register (32 bits); Address Space* (0 to 2^36 - 1).]
* The address space can be flat or segmented. The highest physical address is 2^36 - 1; the highest linear address is 2^32 - 1.
Figure 3-1. P6 Family Processor Basic Execution Environment
3.3.
MEMORY ORGANIZATION
The memory that the processor addresses on its bus is called physical memory. Physical memory is organized as a sequence of 8-bit bytes. Each byte is assigned a unique address, called a physical address. The physical address space ranges from zero to a maximum of 2^36 - 1 (64 gigabytes).
Virtually any operating system or executive designed to work with an IA processor will use the processor's memory management facilities to access memory. These facilities provide features such as segmentation and paging, which allow memory to be managed efficiently and reliably. Memory management is described in detail in Chapter 3, "Protected-Mode Memory Management," of the Intel Architecture Software Developer's Manual, Volume 3. The following paragraphs describe the basic methods of addressing memory when memory management is used.
When employing the processor's memory management facilities, programs do not directly address physical memory. Instead, they access memory using any of three memory models: flat, segmented, or real-address mode.
With the flat memory model (refer to Figure 3-2), memory appears to a program as a single, continuous address space, called a linear address space. Code (a program's instructions), data, and the procedure stack are all contained in this address space. The linear address space is byte addressable, with addresses running contiguously from 0 to 2^32 - 1. An address for any byte in the linear address space is called a linear address.
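The sizes quoted above follow directly from the address widths. As a quick arithmetic check, the Python sketch below (function names are hypothetical) converts an address width in bits into the size of the address space and its highest address.

```python
GIBIBYTE = 2 ** 30


def address_space_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits


def highest_address(address_bits):
    """Largest address expressible with the given width."""
    return address_space_bytes(address_bits) - 1


physical_gb = address_space_bytes(36) // GIBIBYTE   # 36-bit physical addresses
linear_gb = address_space_bytes(32) // GIBIBYTE     # 32-bit linear addresses

print(physical_gb, "GBytes physical, top address", hex(highest_address(36)))
print(linear_gb, "GBytes linear, top address", hex(highest_address(32)))
```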
[Diagram with three panels. Flat Model: a linear address points into the linear address space*. Segmented Model: a logical address (segment selector and offset) points into one of several segments within the linear address space*. Real-Address Mode Model: a logical address (segment selector and offset) points into a linear address space divided into equal sized segments.]
* The linear address space can be paged when using the flat or segmented model.
Figure 3-2. Three Memory Management Models
With the segmented memory model, memory appears to a program as a group of independent address spaces called segments. When using this model, code, data, and stacks are typically contained in separate segments. To address a byte in a segment, a program must issue a logical address, which consists of a segment selector and an offset. (A logical address is often referred to as a far pointer.) The segment selector identifies the segment to be accessed and the offset identifies a byte in the address space of the segment. The programs running on an IA processor can address up to 16,383 segments of different sizes and types, and each segment can be as large as 2^32 bytes.
Internally, all the segments that are defined for a system are mapped into the processor's linear address space. The processor translates each logical address into a linear address to access a memory location. This translation is transparent to the application program.
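The translation from a logical address to a linear address can be sketched as a base-plus-offset computation with a limit check. The Python example below is a deliberately simplified illustration: the table contents, the selector values used as keys, and the names SEGMENTS, SegmentLimitError, and to_linear are all hypothetical, and real segment descriptors (covered in Volume 3) carry more fields than a base and a limit.

```python
class SegmentLimitError(Exception):
    """Raised when an offset lies outside its segment."""


# Hypothetical segment table:
# selector -> (base linear address, limit = highest valid offset)
SEGMENTS = {
    0x08: (0x00100000, 0x0000FFFF),   # code segment
    0x10: (0x00200000, 0x00003FFF),   # data segment
    0x18: (0x00300000, 0x00000FFF),   # stack segment
}


def to_linear(selector, offset):
    """Translate a logical address (selector:offset) to a linear address."""
    base, limit = SEGMENTS[selector]
    if not 0 <= offset <= limit:
        raise SegmentLimitError(
            f"offset {offset:#x} exceeds limit {limit:#x} of segment {selector:#x}"
        )
    return base + offset


print(hex(to_linear(0x10, 0x20)))
```

The limit check is what keeps an access in one segment from reaching into another, which is the reliability benefit the following paragraph describes.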
The primary reason for using segmented memory is to increase the reliability of programs and systems. For example, placing a program's stack in a separate segment prevents the stack from growing into the code or data space and overwriting instructions or data, respectively. Placing the operating system's or executive's code, data, and stack in separate segments also protects them from the application program and vice versa.
With either the flat or segmented model, the IA provides facilities for dividing the linear address space into pages and mapping the pages into virtual memory. If an operating system/executive uses the IA's paging mechanism, the existence of the pages is transparent to an application program.
The real-address mode model uses the memory model for the Intel 8086 processor, the first IA processor. It was provided in all the subsequent IA processors for compatibility with existing programs written to run on the Intel 8086 processor. The real-address mode uses a specific implementation of segmented memory in which the linear address space for the program and the operating system/executive consists of an array of segments of up to 64 KBytes in size each. The maximum size of the linear address space in real-address mode is 2^20 bytes. (Refer to Chapter 16, "8086 Emulation," in the Intel Architecture Software Developer's Manual, Volume 3, for more information on this memory model.)
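The real-address mode model can be illustrated with the 8086 address calculation, in which the 16-bit segment value is shifted left by four bits and added to the 16-bit offset. That formula is not stated in this section and is supplied here as background; the sketch also assumes 8086-style wraparound at 1 MByte so that results stay within the 2^20-byte space quoted above. The function name real_mode_linear is hypothetical.

```python
def real_mode_linear(segment, offset):
    """Linear address for a real-address mode segment:offset pair.

    Assumes 8086 behavior: the segment is multiplied by 16, the offset is
    added, and the sum wraps at 2**20 bytes.
    """
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    return ((segment << 4) + offset) & 0xFFFFF


# Different segment:offset pairs can name the same byte.
print(hex(real_mode_linear(0x1234, 0x0010)))
print(hex(real_mode_linear(0x1235, 0x0000)))
```

Each segment value selects a 64-KByte window that starts on a 16-byte boundary, which is why the windows overlap.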
3.4.
MODES OF OPERATION
When writing code for the Pentium Pro processor, a programmer needs to know the operating mode the processor is going to be in when executing the code and the memory model being used. The relationship between operating modes and memory models is as follows:
• Protected mode. When in protected mode, the processor can use any of the memory models described in this section. (The real-addressing mode memory model is ordinarily used only when the processor is in the virtual-8086 mode.) The memory model used depends on the design of the operating system or executive. When multitasking is implemented, individual tasks can use different memory models.
• Real-address mode. When in real-address mode, the processor only supports the real-address mode memory model.
• System management mode. When in SMM, the processor switches to a separate address space, called the system management RAM (SMRAM). The memory model used to address bytes in this address space is similar to the real-address mode model. (Refer to Chapter 12, "System Management Mode (SMM)," in the Intel Architecture Software Developer's Manual, Volume 3, for more information on the memory model used in SMM.)
3.5.
32-BIT VS. 16-BIT ADDRESS AND OPERAND SIZES
The processor can be configured for 32-bit or 16-bit address and operand sizes. With 32-bit address and operand sizes, the maximum linear address or segment offset is FFFFFFFFH (2^32 - 1), and operand sizes are typically 8 bits or 32 bits. With 16-bit address and operand sizes, the maximum linear address or segment offset is FFFFH (2^16 - 1), and operand sizes are typically 8 bits or 16 bits.
When using 32-bit addressing, a logical address (or far pointer) consists of a 16-bit segment selector and a 32-bit offset; when using 16-bit addressing, it consists of a 16-bit segment selector and a 16-bit offset.
Instruction prefixes allow temporary overrides of the default address and/or operand sizes from within a program.
When operating in protected mode, the segment descriptor for the currently executing code segment defines the default address and operand size. A segment descriptor is a system data structure not normally visible to application code. Assembler directives allow the default addressing and operand size to be chosen for a program. The assembler and other tools then set up the segment descriptor for the code segment appropriately.
When operating in real-address mode, the default addressing and operand size is 16 bits. An address-size override can be used in real-address mode to enable 32-bit addressing; however, the maximum allowable 32-bit address is still 0000FFFFH (2^16 - 1).
3.6. REGISTERS
The processor provides 16 registers for use in general system and application programming. As shown in Figure 3-3, these registers can be grouped as follows:
General-purpose data registers. These eight registers are available for storing operands and pointers.
Segment registers. These registers hold up to six segment selectors.
Status and control registers. These registers report and allow modification of the state of the processor and of the program being executed.
3.6.1. General-Purpose Data Registers
The 32-bit general-purpose data registers EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP are provided for holding the following items:
Operands for logical and arithmetic operations
Operands for address calculations
Memory pointers
Although all of these registers are available for general storage of operands, results, and pointers, caution should be used when referencing the ESP register. The ESP register holds the stack pointer and as a general rule should not be used for any other purpose.
[Figure: the general-purpose registers EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP (bits 31 to 0); the segment registers CS, DS, SS, ES, FS, and GS (bits 15 to 0); and the status and control registers EFLAGS and EIP (bits 31 to 0).]
Figure 3-3. Application Programming Registers
Many instructions assign specific registers to hold operands. For example, string instructions use the contents of the ECX, ESI, and EDI registers as operands. When using a segmented memory model, some instructions assume that pointers in certain registers are relative to specific segments. For instance, some instructions assume that a pointer in the EBX register points to a memory location in the DS segment.
The special uses of general-purpose registers by instructions are described in Chapter 6, Instruction Set Summary, in this volume, and Chapter 3, Instruction Set Reference, in the Intel Architecture Software Developer's Manual, Volume 2. The following is a summary of these special uses:
EAX: Accumulator for operands and results data.
EBX: Pointer to data in the DS segment.
ECX: Counter for string and loop operations.
EDX: I/O pointer.
ESI: Pointer to data in the segment pointed to by the DS register; source pointer for string operations.
EDI: Pointer to data (or destination) in the segment pointed to by the ES register; destination pointer for string operations.
ESP: Stack pointer (in the SS segment).
EBP: Pointer to data on the stack (in the SS segment).
As shown in Figure 3-4, the lower 16 bits of the general-purpose registers map directly to the register set found in the 8086 and Intel 286 processors and can be referenced with the names AX, BX, CX, DX, BP, SP, SI, and DI. Each of the lower two bytes of the EAX, EBX, ECX, and EDX registers can be referenced by the names AH, BH, CH, and DH (high bytes) and AL, BL, CL, and DL (low bytes).
32-bit   16-bit (bits 15-0)   High byte (bits 15-8)   Low byte (bits 7-0)
EAX      AX                   AH                      AL
EBX      BX                   BH                      BL
ECX      CX                   CH                      CL
EDX      DX                   DH                      DL
EBP      BP                   -                       -
ESI      SI                   -                       -
EDI      DI                   -                       -
ESP      SP                   -                       -
Figure 3-4. Alternate General-Purpose Register Names
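The aliasing shown in Figure 3-4 can be sketched in Python. This is only an illustration under stated assumptions: a register is modeled as a plain integer, and the helper names sub_registers and write_al are hypothetical, not part of any Intel or MASM tool.

```python
def sub_registers(eax: int) -> dict:
    """Split a 32-bit register value into the alternate names of Figure 3-4."""
    eax &= 0xFFFFFFFF              # keep only 32 bits
    ax = eax & 0xFFFF              # lower 16 bits
    return {
        "EAX": eax,
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,    # bits 15..8
        "AL": ax & 0xFF,           # bits 7..0
    }


def write_al(eax: int, value: int) -> int:
    """Model a write to AL: bits 7..0 change, bits 31..8 are preserved."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


parts = sub_registers(0x12345678)
print({name: hex(v) for name, v in parts.items()})
print(hex(write_al(0x12345678, 0xFF)))
```

The same masks apply to EBX, ECX, and EDX; the EBP, ESI, EDI, and ESP registers have only the 16-bit alias.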
3.6.2. Segment Registers
The segment registers (CS, DS, SS, ES, FS, and GS) hold 16-bit segment selectors. A segment selector is a special pointer that identifies a segment in memory. To access a particular segment in memory, the segment selector for that segment must be present in the appropriate segment register.
When writing application code, programmers generally create segment selectors with assembler directives and symbols. The assembler and other tools then create the actual segment selector values associated with these directives and symbols. If writing system code, programmers may need to create segment selectors directly. (A detailed description of the segment-selector data structure is given in Chapter 3, Protected-Mode Memory Management, of the Intel Architecture Software Developer's Manual, Volume 3.)
How segment registers are used depends on the type of memory management model that the operating system or executive is using. When using the flat (unsegmented) memory model, the segment registers are loaded with segment selectors that point to overlapping segments, each of which begins at address 0 of the linear address space (as shown in Figure 3-5). These overlapping segments then make up the linear address space for the program. (Typically, two overlapping segments are defined: one for code and another for data and stacks. The CS segment register points to the code segment and all the other segment registers point to the data and stack segment.)
When using the segmented memory model, each segment register is ordinarily loaded with a different segment selector so that each segment register points to a different segment within the linear address space (as shown in Figure 3-6). At any time, a program can thus access up to six segments in the linear address space. To access a segment not pointed to by one of the segment registers, a program must first load the segment selector for the segment to be accessed into a segment register.
[Figure: the segment selector in each of the segment registers CS, DS, SS, ES, FS, and GS points to an overlapping segment of up to 4 GBytes, beginning at address 0 of the linear address space for the program.]
Figure 3-5. Use of Segment Registers for Flat Memory Model
[Figure: the segment registers CS, DS, SS, ES, FS, and GS point to a code segment, a data segment, a stack segment, and three further data segments. All segments are mapped to the same linear address space.]
Figure 3-6. Use of Segment Registers in Segmented Memory Model
Each of the segment registers is associated with one of three types of storage: code, data, or stack. For example, the CS register contains the segment selector for the code segment, where the instructions being executed are stored. The processor fetches instructions from the code segment, using a logical address that consists of the segment selector in the CS register and the contents of the EIP register. The EIP register contains the offset within the code segment of the next instruction to be executed. The CS register cannot be loaded explicitly by an application program. Instead, it is loaded implicitly by instructions or internal processor operations that change program control (such as procedure calls, interrupt handling, or task switching).
The DS, ES, FS, and GS registers point to four data segments. The availability of four data segments permits efficient and secure access to different types of data structures. For example, four separate data segments might be created: one for the data structures of the current module, another for the data exported from a higher-level module, a third for a dynamically created data structure, and a fourth for data shared with another program. To access additional data segments, the application program must load segment selectors for these segments into the DS, ES, FS, and GS registers, as needed.
The SS register contains the segment selector for a stack segment, where the procedure stack is stored for the program, task, or handler currently being executed. All stack operations use the SS register to find the stack segment. Unlike the CS register, the SS register can be loaded explicitly, which permits application programs to set up multiple stacks and switch among them.
Refer to Section 3.3., Memory Organization, for an overview of how the segment registers are used in real-address mode.
The four segment registers CS, DS, SS, and ES are the same as the segment registers found in the Intel 8086 and Intel 286 processors; the FS and GS registers were introduced into the IA with the Intel386 family of processors.
3.6.3. EFLAGS Register
The 32-bit EFLAGS register contains a group of status flags, a control flag, and a group of system flags. Figure 3-7 defines the flags within this register. Following initialization of the processor (either by asserting the RESET pin or the INIT pin), the state of the EFLAGS register is 00000002H. Bits 1, 3, 5, 15, and 22 through 31 of this register are reserved. Software should not use or depend on the states of any of these bits.
Some of the flags in the EFLAGS register can be modified directly, using special-purpose instructions (described in the following sections). There are no instructions that allow the whole register to be examined or modified directly. However, the following instructions can be used to move groups of flags to and from the procedure stack or the EAX register: LAHF, SAHF, PUSHF, PUSHFD, POPF, and POPFD. After the contents of the EFLAGS register have been transferred to the procedure stack or EAX register, the flags can be examined and modified using the processor's bit manipulation instructions (BT, BTS, BTR, and BTC).
When suspending a task (using the processor's multitasking facilities), the processor automatically saves the state of the EFLAGS register in the task state segment (TSS) for the task being suspended. When binding itself to a new task, the processor loads the EFLAGS register with data from the new task's TSS.
When a call is made to an interrupt or exception handler procedure, the processor automatically saves the state of the EFLAGS register on the procedure stack. When an interrupt or exception is handled with a task switch, the state of the EFLAGS register is saved in the TSS for the task being suspended.
Bit(s)   Type   Flag
21       X      ID Flag (ID)
20       X      Virtual Interrupt Pending (VIP)
19       X      Virtual Interrupt Flag (VIF)
18       X      Alignment Check (AC)
17       X      Virtual-8086 Mode (VM)
16       X      Resume Flag (RF)
14       X      Nested Task (NT)
13-12    X      I/O Privilege Level (IOPL)
11       S      Overflow Flag (OF)
10       C      Direction Flag (DF)
9        X      Interrupt Enable Flag (IF)
8        X      Trap Flag (TF)
7        S      Sign Flag (SF)
6        S      Zero Flag (ZF)
4        S      Auxiliary Carry Flag (AF)
2        S      Parity Flag (PF)
0        S      Carry Flag (CF)
S Indicates a Status Flag
C Indicates a Control Flag
X Indicates a System Flag
Bits 1, 3, 5, 15, and 22 through 31 are reserved bit positions (bit 1 is shown as 1, the others as 0). DO NOT USE. Always set to values previously read.
Figure 3-7. EFLAGS Register
As the IA has evolved, flags have been added to the EFLAGS register, but the function and placement of existing flags have remained the same from one family of the IA processors to the next. As a result, code that accesses or modifies these flags for one family of IA processors works as expected when run on later families of processors.
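The bit layout in Figure 3-7 lends itself to a small decoding sketch. The Python below is illustrative only; the names EFLAGS_BITS, decode_eflags, and set_flag_names are hypothetical, and the value 0x246 is simply a sample bit pattern chosen for the example.

```python
# Bit positions taken from Figure 3-7 (IOPL is a 2-bit field, handled separately).
EFLAGS_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "TF": 8, "IF": 9,
    "DF": 10, "OF": 11, "NT": 14, "RF": 16, "VM": 17, "AC": 18,
    "VIF": 19, "VIP": 20, "ID": 21,
}


def decode_eflags(value: int) -> dict:
    """Return every single-bit flag plus the 2-bit IOPL field."""
    flags = {name: (value >> bit) & 1 for name, bit in EFLAGS_BITS.items()}
    flags["IOPL"] = (value >> 12) & 0b11   # bits 12 and 13
    return flags


def set_flag_names(value: int) -> list:
    """List the single-bit flags that are set, in alphabetical order."""
    return sorted(name for name, bit in EFLAGS_BITS.items() if (value >> bit) & 1)


print(set_flag_names(0x00000002))  # state after initialization: no flags set
print(set_flag_names(0x00000246))  # sample value with IF, PF, and ZF set
```

Because the reserved bits are excluded from the table, the sketch never reports them, which mirrors the advice that software should not depend on their state.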
3.6.3.1. STATUS FLAGS
The status flags (bits 0, 2, 4, 6, 7, and 11) of the EFLAGS register indicate the results of arithmetic instructions, such as the ADD, SUB, MUL, and DIV instructions. The functions of the status flags are as follows:
CF (bit 0) Carry flag. Set if an arithmetic operation generates a carry or a borrow out of the most-significant bit of the result; cleared otherwise. This flag indicates an overflow condition for unsigned-integer arithmetic. It is also used in multiple-precision arithmetic.
PF (bit 2) Parity flag. Set if the least-significant byte of the result contains an even number of 1 bits; cleared otherwise.
AF (bit 4) Adjust flag. Set if an arithmetic operation generates a carry or a borrow out of bit 3 of the result; cleared otherwise. This flag is used in binary-coded decimal (BCD) arithmetic.
ZF (bit 6) Zero flag. Set if the result is zero; cleared otherwise.
SF (bit 7) Sign flag. Set equal to the most-significant bit of the result, which is the sign bit of a signed integer. (0 indicates a positive value and 1 indicates a negative value.)
OF (bit 11) Overflow flag. Set if the integer result is too large a positive number or too small a negative number (excluding the sign-bit) to fit in the destination operand; cleared otherwise. This flag indicates an overflow condition for signed-integer (two's complement) arithmetic.
Of these status flags, only the CF flag can be modified directly, using the STC, CLC, and CMC instructions. Also, the bit instructions (BT, BTS, BTR, and BTC) copy a specified bit into the CF flag.
The status flags allow a single arithmetic operation to produce results for three different data types: unsigned integers, signed integers, and BCD integers. If the result of an arithmetic operation is treated as an unsigned integer, the CF flag indicates an out-of-range condition (carry or a borrow); if treated as a signed integer (two's complement number), the OF flag indicates a carry or borrow; and if treated as a BCD digit, the AF flag indicates a carry or borrow. The SF flag indicates the sign of a signed integer. The ZF flag indicates either a signed- or an unsigned-integer zero.
When performing multiple-precision arithmetic on integers, the CF flag is used in conjunction with the add with carry (ADC) and subtract with borrow (SBB) instructions to propagate a carry or borrow from one computation to the next.
The condition instructions Jcc (jump on condition code cc), SETcc (byte set on condition code cc), LOOPcc, and CMOVcc (conditional move) use one or more of the status flags as condition codes and test them for branch, set-byte, or end-loop conditions.
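As a rough illustration of how one addition yields all six status flags, the following Python sketch models an 8-bit add with an optional carry-in, then chains two such adds the way ADC propagates a carry. It is a simplified model, not a description of the processor's internals; the function names add8 and add16_with_carry are hypothetical.

```python
def add8(a: int, b: int, carry_in: int = 0):
    """Model an 8-bit ADD (carry_in=0) or ADC (carry_in=CF); return (result, flags)."""
    a &= 0xFF
    b &= 0xFF
    total = a + b + carry_in
    result = total & 0xFF
    flags = {
        "CF": int(total > 0xFF),                              # carry out of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),           # even number of 1 bits
        "AF": int((a & 0xF) + (b & 0xF) + carry_in > 0xF),    # carry out of bit 3
        "ZF": int(result == 0),
        "SF": result >> 7,                                    # copy of the sign bit
        # Signed overflow: both operands differ in sign from the result.
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags


def add16_with_carry(x: int, y: int):
    """Multiple-precision add: low bytes with ADD, high bytes with ADC."""
    low, low_flags = add8(x & 0xFF, y & 0xFF)
    high, high_flags = add8((x >> 8) & 0xFF, (y >> 8) & 0xFF, low_flags["CF"])
    return (high << 8) | low, high_flags["CF"]


print(add8(0x7F, 0x01))            # signed overflow: OF and SF set, CF clear
print(add8(0xFF, 0x01))            # unsigned overflow: CF and ZF set, OF clear
print(add16_with_carry(0x01FF, 0x0001))
```

The two sample additions show why the text distinguishes the data types: the same bit pattern arithmetic is out of range for signed integers in the first case and for unsigned integers in the second.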
3.6.3.2. DF FLAG
The direction flag (DF, located in bit 10 of the EFLAGS register) controls the string instructions (MOVS, CMPS, SCAS, LODS, and STOS). Setting the DF flag causes the string instructions to auto-decrement (that is, to process strings from high addresses to low addresses). Clearing the DF flag causes the string instructions to auto-increment (process strings from low addresses to high addresses). The STD and CLD instructions set and clear the DF flag, respectively.
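The effect of the DF flag can be sketched with a toy model of a repeated byte move. The Python below assumes a flat bytearray as memory and a hypothetical helper named movsb that takes ESI, EDI, and ECX as plain integers; it stands in for a repeated MOVS of byte operands and ignores segments entirely.

```python
def movsb(memory: bytearray, esi: int, edi: int, ecx: int, df: bool):
    """Copy ecx bytes from memory[esi] to memory[edi], stepping by the DF flag."""
    step = -1 if df else 1          # DF set: auto-decrement; DF clear: auto-increment
    while ecx:
        memory[edi] = memory[esi]
        esi += step
        edi += step
        ecx -= 1
    return esi, edi, ecx


# DF clear (as after CLD): start at the lowest addresses and move upward.
mem = bytearray(b".abcd.....")
print(movsb(mem, 1, 6, 4, df=False), bytes(mem))

# DF set (as after STD): start at the highest addresses and move downward.
mem = bytearray(b".abcd.....")
print(movsb(mem, 4, 9, 4, df=True), bytes(mem))
```

Both runs copy the same four bytes; only the starting pointers and the final pointer values differ, which is why the direction matters mainly when the source and destination overlap.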
3.6.4. System Flags and IOPL Field
The system flags and IOPL field in the EFLAGS register control operating-system or executive operations. They should not be modified by application programs. The functions of the system flags are as follows:
IF (bit 9) Interrupt enable flag. Controls the response of the processor to maskable interrupt requests. Set to respond to maskable interrupts; cleared to inhibit maskable interrupts.
TF (bit 8) Trap flag. Set to enable single-step mode for debugging; clear to disable single-step mode.
IOPL (bits 12, 13) I/O privilege level field. Indicates the I/O privilege level of the currently running program or task. The current privilege level (CPL) of the currently running program or task must be less than or equal to the I/O privilege level to access the I/O address space. This field can only be modified by the POPF and IRET instructions when operating at a CPL of 0.
NT (bit 14) Nested task flag. Controls the chaining of interrupted and called tasks. Set when the current task is linked to the previously executed task; cleared when the current task is not linked to another task.
RF (bit 16) Resume flag. Controls the processor's response to debug exceptions.
VM (bit 17) Virtual-8086 mode flag. Set to enable virtual-8086 mode; clear to return to protected mode.
AC (bit 18) Alignment check flag. Set this flag and the AM bit in the CR0 register to enable alignment checking of memory references; clear the AC flag and/or the AM bit to disable alignment checking.
VIF (bit 19) Virtual interrupt flag. Virtual image of the IF flag. Used in conjunction with the VIP flag. (To use this flag and the VIP flag, the virtual mode extensions must be enabled by setting the VME flag in control register CR4.)
VIP (bit 20) Virtual interrupt pending flag. Set to indicate pending interrupts; clear when no interrupts are pending. (Software sets and clears this flag; the processor only reads it.) Used in conjunction with the VIF flag.
ID (bit 21) Identification flag. The ability of a program to set or clear this flag indicates support for the CPUID instruction.
Refer to Chapter 3, Protected-Mode Memory Management, in the Intel Architecture Software Developer's Manual, Volume 3, for a detailed description of these flags.
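The IOPL rule stated above (the CPL must be less than or equal to the I/O privilege level) reduces to a single comparison. The Python sketch below models only that comparison and no other protection mechanism; the names iopl and io_allowed are hypothetical.

```python
def iopl(eflags: int) -> int:
    """Extract the 2-bit I/O privilege level field (bits 12 and 13)."""
    return (eflags >> 12) & 0b11


def io_allowed(cpl: int, eflags: int) -> bool:
    """Apply the rule from the text: access requires CPL <= IOPL."""
    return cpl <= iopl(eflags)


print(io_allowed(3, 0x00003002))  # IOPL = 3: a CPL 3 task passes the check
print(io_allowed(3, 0x00000002))  # IOPL = 0: a CPL 3 task fails the check
print(io_allowed(0, 0x00000002))  # CPL 0 always passes
```

Lower numbers mean more privilege, so the comparison reads "at least as privileged as the I/O level".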
3.7. INSTRUCTION POINTER
The instruction pointer (EIP) register contains the offset in the current code segment for the next instruction to be executed. It is advanced from one instruction boundary to the next in straight-line code, or it is moved ahead or backwards by a number of instructions when executing JMP, Jcc, CALL, RET, and IRET instructions.
The EIP register cannot be accessed directly by software; it is controlled implicitly by control-transfer instructions (such as JMP, Jcc, CALL, and RET), interrupts, and exceptions. The only way to read the EIP register is to execute a CALL instruction and then read the value of the return instruction pointer from the procedure stack. The EIP register can be loaded indirectly by modifying the value of a return instruction pointer on the procedure stack and executing a return instruction (RET or IRET). Refer to Section 4.2.4.2., Return Instruction Pointer, in Chapter 4, Procedure Calls, Interrupts, and Exceptions.
All IA processors prefetch instructions. Because of instruction prefetching, an instruction address read from the bus during an instruction load does not match the value in the EIP register. Even though different processor generations use different prefetching mechanisms, the function of the EIP register to direct program flow remains fully compatible with all software written to run on IA processors.
3.8. OPERAND-SIZE AND ADDRESS-SIZE ATTRIBUTES
When the processor is executing in protected mode, every code segment has a default operand-size attribute and address-size attribute. These attributes are selected with the D (default size) flag in the segment descriptor for the code segment (refer to Chapter 3, Protected-Mode Memory Management, in the Intel Architecture Software Developer's Manual, Volume 3). When the D flag is set, the 32-bit operand-size and address-size attributes are selected; when the flag is clear, the 16-bit size attributes are selected. When the processor is executing in real-address mode, virtual-8086 mode, or SMM, the default operand-size and address-size attributes are always 16 bits.
The operand-size attribute selects the sizes of operands that instructions operate on. When the 16-bit operand-size attribute is in force, operands can generally be either 8 bits or 16 bits, and when the 32-bit operand-size attribute is in force, operands can generally be 8 bits or 32 bits.
The address-size attribute selects the sizes of addresses used to address memory: 16 bits or 32 bits. When the 16-bit address-size attribute is in force, segment offsets and displacements are 16 bits. This restriction limits the size of a segment that can be addressed to 64 KBytes. When the 32-bit address-size attribute is in force, segment offsets and displacements are 32 bits, allowing segments of up to 4 GBytes to be addressed.
The default operand-size attribute and/or address-size attribute can be overridden for a particular instruction by adding an operand-size and/or address-size prefix to an instruction (refer to Chapter 17, Mixing 16-Bit and 32-Bit Code, of the Intel Architecture Software Developer's Manual, Volume 3). The effect of this prefix applies only to the instruction it is attached to.
Table 3-1 shows effective operand size and address size (when executing in protected mode) depending on the settings of the D flag and the operand-size and address-size prefixes.
Table 3-1. Effective Operand- and Address-Size Attributes
D Flag in Code Segment Descriptor   0    0    0    0    1    1    1    1
Operand-Size Prefix 66H             N    N    Y    Y    N    N    Y    Y
Address-Size Prefix 67H             N    Y    N    Y    N    Y    N    Y
Effective Operand Size              16   16   32   32   32   32   16   16
Effective Address Size              16   32   16   32   32   16   32   16
NOTES:
Y Yes, this instruction prefix is present.
N No, this instruction prefix is not present.
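Table 3-1 follows a simple rule: each prefix toggles the corresponding attribute away from the default selected by the D flag. A minimal Python sketch of that rule follows; the function name effective_sizes and its parameter names are hypothetical.

```python
def effective_sizes(d_flag: int, operand_prefix: bool, address_prefix: bool):
    """Return (effective operand size, effective address size) in bits."""
    default = 32 if d_flag else 16       # size selected by the D flag
    toggled = 16 if d_flag else 32       # the other size, selected by a prefix
    operand = toggled if operand_prefix else default   # prefix 66H present?
    address = toggled if address_prefix else default   # prefix 67H present?
    return operand, address


# Reproduce the eight columns of Table 3-1.
for d in (0, 1):
    for op in (False, True):
        for addr in (False, True):
            print(d, "Y" if op else "N", "Y" if addr else "N",
                  *effective_sizes(d, op, addr))
```

The two prefixes act independently, so one instruction can, for example, use 16-bit operands with 32-bit addressing.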
CHAPTER 4 PROCEDURE CALLS, INTERRUPTS, AND EXCEPTIONS

This chapter describes the facilities in the Intel Architecture (IA) for executing calls to procedures or subroutines. It also describes how interrupts and exceptions are handled from the perspective of an application programmer.
4.1. PROCEDURE CALL TYPES
The processor supports procedure calls in the following two different ways:
CALL and RET instructions.
ENTER and LEAVE instructions, in conjunction with the CALL and RET instructions.
Both of these procedure call mechanisms use the procedure stack, commonly referred to simply as the stack, to save the state of the calling procedure, pass parameters to the called procedure, and store local variables for the currently executing procedure. The processor's facilities for handling interrupts and exceptions are similar to those used by the CALL and RET instructions.
4.2. STACK
The stack (refer to Figure 4-1) is a contiguous array of memory locations. It is contained in a segment and identified by the segment selector in the SS register. (When using the flat memory model, the stack can be located anywhere in the linear address space for the program.) A stack can be up to 4 gigabytes long, the maximum size of a segment.
The next available memory location on the stack is called the top of stack. At any given time, the stack pointer (contained in the ESP register) gives the address (that is, the offset from the base of the SS segment) of the top of the stack.
Items are placed on the stack using the PUSH instruction and removed from the stack using the POP instruction. When an item is pushed onto the stack, the processor decrements the ESP register, then writes the item at the new top of stack. When an item is popped off the stack, the processor reads the item from the top of stack, then increments the ESP register. In this manner, the stack grows down in memory (towards lesser addresses) when items are pushed on the stack and shrinks up (towards greater addresses) when the items are popped from the stack.
A program or operating system/executive can set up many stacks. For example, in multitasking systems, each task can be given its own stack. The number of stacks in a system is limited by the maximum number of segments and the available physical memory. When a system sets up many stacks, only one stack (the current stack) is available at a time. The current stack is the one contained in the segment referenced by the SS register.
[Figure: a stack segment, which can be 16 or 32 bits wide. The bottom of the stack is the initial ESP value. The stack holds the local variables for the calling procedure, the parameters passed to the called procedure, a frame boundary, and the return instruction pointer. The EBP register is typically set to point to the return instruction pointer, and the ESP register points to the top of stack. Pushes move the top of stack to lower addresses; pops move the top of stack to higher addresses.]
Figure 4-1. Stack Structure
The processor references the SS register automatically for all stack operations. For example, when the ESP register is used as a memory address, it automatically points to an address in the current stack. Also, the CALL, RET, PUSH, POP, ENTER, and LEAVE instructions all perform operations on the current stack.
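The push and pop sequence described in this section can be modeled in a few lines. The Python class below is a sketch under simplifying assumptions: the stack segment is a small bytearray, every item is 32 bits wide, and the class name Stack32 is hypothetical.

```python
class Stack32:
    """Toy model of a 32-bit-wide stack that grows toward lower addresses."""

    def __init__(self, size: int = 64):
        self.mem = bytearray(size)
        self.esp = size                  # initial ESP: the bottom of the stack

    def push(self, value: int) -> None:
        self.esp -= 4                    # decrement ESP first...
        self.mem[self.esp:self.esp + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

    def pop(self) -> int:
        value = int.from_bytes(self.mem[self.esp:self.esp + 4], "little")
        self.esp += 4                    # ...and increment it after the read
        return value


stack = Stack32()
stack.push(0x11111111)
stack.push(0x22222222)
print(stack.esp)           # two pushes lower ESP by 8
print(hex(stack.pop()))    # the last item pushed comes off first
print(stack.esp)
```

The order of operations matters: a push adjusts ESP before the write, and a pop adjusts it after the read, so ESP always addresses the most recently pushed item.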
4.2.1. Setting Up a Stack
To set up a stack and establish it as the current stack, the program or operating system/executive must do the following:
1. Establish a stack segment.
2. Load the segment selector for the stack segment into the SS register using a MOV, POP, or LSS instruction.
3. Load the stack pointer for the stack into the ESP register using a MOV, POP, or LSS instruction. (The LSS instruction can be used to load the SS and ESP registers in one operation.)
Refer to Chapter 3, Protected-Mode Memory Management, of the Intel Architecture Software Developer's Manual, Volume 3, for information on how to set up a segment descriptor and segment limits for a stack segment.
4.2.2. Stack Alignment
The stack pointer for a stack segment should be aligned on 16-bit (word) or 32-bit (double-word) boundaries, depending on the width of the stack segment. The D flag in the segment descriptor for the current code segment sets the stack-segment width (refer to Chapter 3, Protected-Mode Memory Management, of the Intel Architecture Software Developer's Manual, Volume 3). The PUSH and POP instructions use the D flag to determine how much to decrement or increment the stack pointer on a push or pop operation, respectively. When the stack width is 16 bits, the stack pointer is incremented or decremented in 16-bit increments; when the width is 32 bits, the stack pointer is incremented or decremented in 32-bit increments.
The processor does not check stack pointer alignment. It is the responsibility of the programs, tasks, and system procedures running on the processor to maintain proper alignment of stack pointers. Misaligning a stack pointer can cause serious performance degradation and in some instances program failures.
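Since the processor does not check alignment, software has to. A minimal Python sketch of such a check follows; the helper names push_step and is_aligned are hypothetical, and the stack width is passed in directly rather than read from a segment descriptor.

```python
def push_step(stack_width_bits: int) -> int:
    """Bytes by which a push or pop moves the stack pointer."""
    if stack_width_bits not in (16, 32):
        raise ValueError("stack width must be 16 or 32 bits")
    return stack_width_bits // 8


def is_aligned(stack_pointer: int, stack_width_bits: int) -> bool:
    """True if the stack pointer sits on a word or double-word boundary."""
    return stack_pointer % push_step(stack_width_bits) == 0


print(is_aligned(0x0FFC, 32))  # multiple of 4: aligned for a 32-bit stack
print(is_aligned(0x0FFE, 32))  # multiple of 2 only: misaligned for 32 bits
print(is_aligned(0x0FFE, 16))  # but aligned for a 16-bit stack
```

A stack pointer that starts aligned stays aligned as long as every push and pop uses the full stack width.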
4.2.3. Address-Size Attributes for Stack Accesses
Instructions that use the stack implicitly (such as the PUSH and POP instructions) have two address-size attributes, each of either 16 or 32 bits. This is because they always have the implicit address of the top of the stack, and they may also have an explicit memory address (for example, PUSH Array1[EBX]). The attribute of the explicit address is determined by the D flag of the current code segment and the presence or absence of the 67H address-size prefix, as usual.
The address-size attribute of the top of the stack determines whether SP or ESP is used for the stack access. Stack operations with an address-size attribute of 16 use the 16-bit SP stack pointer register and can use a maximum stack address of FFFFH; stack operations with an address-size attribute of 32 bits use the 32-bit ESP register and can use a maximum address of FFFFFFFFH. The default address-size attribute for data segments used as stacks is controlled by the B flag of the segment's descriptor. When this flag is clear, the default address-size attribute is 16; when the flag is set, the address-size attribute is 32.
4.2.4. Procedure Linking Information
The processor provides two pointers for linking of procedures: the stack-frame base pointer and the return instruction pointer. When used in conjunction with a standard software procedure-call technique, these pointers permit reliable and coherent linking of procedures.
4.2.4.1. STACK-FRAME BASE POINTER
The stack is typically divided into frames. Each stack frame can then contain local variables, parameters to be passed to another procedure, and procedure linking information. The stack-frame base pointer (contained in the EBP register) identifies a fixed reference point within the stack frame for the called procedure. To use the stack-frame base pointer, the called procedure typically copies the contents of the ESP register into the EBP register prior to pushing any local variables on the stack. The stack-frame base pointer then permits easy access to data structures passed on the stack, to the return instruction pointer, and to local variables added to the stack by the called procedure.
Like the ESP register, the EBP register automatically points to an address in the current stack segment (that is, the segment specified by the current contents of the SS register).
4.2.4.2. RETURN INSTRUCTION POINTER
Prior to branching to the first instruction of the called procedure, the CALL instruction pushes the address in the EIP register onto the current stack. This address is then called the return-instruction pointer, and it points to the instruction where execution of the calling procedure should resume following a return from the called procedure. Upon returning from a called procedure, the RET instruction pops the return-instruction pointer from the stack back into the EIP register. Execution of the calling procedure then resumes.
The processor does not keep track of the location of the return-instruction pointer. It is thus up to the programmer to ensure that the stack pointer is pointing to the return-instruction pointer on the stack prior to issuing a RET instruction. A common way to reset the stack pointer to point to the return-instruction pointer is to move the contents of the EBP register into the ESP register. If the EBP register is loaded with the stack pointer immediately following a procedure call, it should point to the return instruction pointer on the stack.
The processor does not require that the return instruction pointer point back to the calling procedure. Prior to executing the RET instruction, the return instruction pointer can be manipulated in software to point to any address in the current code segment (near return) or another code segment (far return). Performing such an operation, however, should be undertaken very cautiously, using only well-defined code entry points.
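The way the two linking pointers work together can be traced with a toy stack. The Python sketch below follows the convention described in this section, where EBP is loaded from ESP immediately after the call so that it points to the return instruction pointer. The class name Machine, the function frame_demo, and all addresses and values are made up for the illustration.

```python
class Machine:
    """Toy 32-bit stack machine: sparse memory plus ESP and EBP."""

    def __init__(self, top: int = 0x1000):
        self.mem = {}
        self.esp = top
        self.ebp = 0

    def push(self, value: int) -> None:
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self) -> int:
        value = self.mem[self.esp]
        self.esp += 4
        return value


def frame_demo() -> dict:
    m = Machine()
    m.push(30)            # caller pushes parameter 2
    m.push(20)            # caller pushes parameter 1
    m.push(0x00401005)    # CALL pushes the return instruction pointer
    m.ebp = m.esp         # callee copies ESP into EBP before pushing locals
    m.push(7)             # callee pushes a local variable
    seen = {
        "param1": m.mem[m.ebp + 4],   # parameters sit above the frame base
        "param2": m.mem[m.ebp + 8],
        "local": m.mem[m.ebp - 4],    # locals sit below it
    }
    m.esp = m.ebp         # copy EBP into ESP: ESP points at the return pointer again
    seen["return_pointer"] = m.pop()  # what RET would load into EIP
    seen["esp_after_return"] = m.esp
    return seen


print(frame_demo())
```

Because EBP stays fixed while ESP moves, the offsets to parameters and locals remain constant no matter how many items the called procedure pushes.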
4.3. CALLING PROCEDURES USING CALL AND RET
The CALL instruction allows control transfers to procedures within the current code segment (near call) and in a different code segment (far call). Near calls usually provide access to local procedures within the currently running program or task. Far calls are usually used to access operating system procedures or procedures in a different task. Refer to Chapter 3, Instruction Set Reference, of the Intel Architecture Software Developer's Manual, Volume 2, for a detailed description of the CALL instruction.
The RET instruction also allows near and far returns to match the near and far versions of the CALL instruction. In addition, the RET instruction allows a program to increment the stack pointer on a return to release parameters from the stack. The number of bytes released from the stack is determined by an optional argument (n) to the RET instruction. Refer to Chapter 3, Instruction Set Reference, of the Intel Architecture Software Developer's Manual, Volume 2, for a detailed description of the RET instruction.
4.3.1. Near CALL and RET Operation
When executing a near call, the processor does the following (refer to Figure 4-2):
1. Pushes the current value of the EIP register on the stack.
2. Loads the offset of the called procedure in the EIP register.
3. Begins execution of the called procedure.
When executing a near return, the processor performs these actions:
1. Pops the top-of-stack value (the return instruction pointer) into the EIP register.
2. (If the RET instruction has an optional n argument.) Increments the stack pointer by the number of bytes specified with the n operand to release parameters from the stack.
3. Resumes execution of the calling procedure.
[Figure: four panels showing the stack frame before and after a call. Stack During Near Call: Param 1, Param 2, Param 3, and the calling EIP, with ESP before the call and ESP after the call marked. Stack During Far Call: Param 1, Param 2, Param 3, the calling CS, and the calling EIP, with ESP before the call and ESP after the call marked. Stack During Near Return and Stack During Far Return: the same contents, with ESP before the return and ESP after the return marked.]
Note: On a near or far return, parameters are released from the stack if the correct value is given for the n operand in the RET n instruction.
Figure 4-2. Stack on Near and Far Calls
4.3.2.
Far CALL and RET Operation
When executing a far call, the processor performs these actions (refer to Figure 4-2):
1. Pushes the current value of the CS register on the stack.
2. Pushes the current value of the EIP register on the stack.
3. Loads the segment selector of the segment that contains the called procedure in the CS register.
4. Loads the offset of the called procedure in the EIP register.
5. Begins execution of the called procedure.
When executing a far return, the processor does the following:
1. Pops the top-of-stack value (the return instruction pointer) into the EIP register.
2. Pops the top-of-stack value (the segment selector for the code segment being returned to) into the CS register.
3. (If the RET instruction has an optional n argument.) Increments the stack pointer by the number of bytes specified with the n operand to release parameters from the stack.
4. Resumes execution of the calling procedure.
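A far call differs from a near call only in that the code segment selector travels with the instruction pointer. The following sketch models that; the dictionary-based cpu, the helper names, and the selector values are hypothetical, and each stack slot is treated as a doubleword.

```python
# Toy model of a far CALL and far RET n. Register names follow the text;
# the dictionary layout and helper names are illustrative only.

def push(cpu, value):
    cpu["esp"] -= 4
    cpu["mem"][cpu["esp"]] = value

def pop(cpu):
    value = cpu["mem"][cpu["esp"]]
    cpu["esp"] += 4
    return value

def far_call(cpu, selector, offset):
    push(cpu, cpu["cs"])        # 1. push the calling CS
    push(cpu, cpu["eip"])       # 2. push the calling EIP (the return address)
    cpu["cs"] = selector        # 3. load the new code segment selector
    cpu["eip"] = offset         # 4. load the offset of the called procedure

def far_ret(cpu, n=0):
    cpu["eip"] = pop(cpu)       # 1. EIP was pushed last, so it is popped first
    cpu["cs"] = pop(cpu)        # 2. then the code segment selector
    cpu["esp"] += n             # 3. release n bytes of parameters

cpu = {"cs": 0x1B, "eip": 0x2000, "esp": 0x8000, "mem": {}}
push(cpu, 0xAAAA)               # one doubleword parameter
far_call(cpu, selector=0x23, offset=0x5000)
in_callee = (cpu["cs"], cpu["eip"], cpu["esp"])
far_ret(cpu, n=4)               # back in the caller, parameter released
```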
4.3.3.
Parameter Passing
Parameters can be passed between procedures in any of three ways: through general-purpose registers, in an argument list, or on the stack.
4.3.3.1. PASSING PARAMETERS THROUGH THE GENERAL-PURPOSE REGISTERS
The processor does not save the state of the general-purpose registers on procedure calls. A calling procedure can thus pass up to six parameters to the called procedure by copying the parameters into any of these registers (except the ESP and EBP registers) prior to executing the CALL instruction. The called procedure can likewise pass parameters back to the calling procedure through general-purpose registers.
4.3.3.2. PASSING PARAMETERS ON THE STACK
To pass a large number of parameters to the called procedure, the parameters can be placed on the stack, in the stack frame for the calling procedure. Here, it is useful to use the stack-frame base pointer (in the EBP register) to make a frame boundary for easy access to the parameters. The stack can also be used to pass parameters back from the called procedure to the calling procedure.
4.3.3.3. PASSING PARAMETERS IN AN ARGUMENT LIST
An alternate method of passing a larger number of parameters (or a data structure) to the called procedure is to place the parameters in an argument list in one of the data segments in memory. A pointer to the argument list can then be passed to the called procedure through a general-purpose register or the stack. Parameters can also be passed back to the calling procedure in this same manner.
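For the stack-based method of Section 4.3.3.2, the following sketch shows why EBP makes a convenient frame boundary: once the called procedure saves the caller's EBP and copies ESP into EBP, each parameter sits at a fixed positive offset from EBP. The right-to-left push order, the addresses, and the values are assumptions chosen for illustration.

```python
# Sketch: locating stack parameters relative to EBP after a near call.
# Addresses and values are made up; every slot is a 4-byte doubleword.

mem = {}
regs = {"esp": 0x1000, "ebp": 0x1100}

def push(value):
    regs["esp"] -= 4
    mem[regs["esp"]] = value

# Caller: push the parameters (assumed right to left), then CALL pushes EIP.
push(30)                    # param 3
push(20)                    # param 2
push(10)                    # param 1
push(0x401234)              # return EIP pushed by the near CALL

# Called procedure: the equivalent of PUSH EBP followed by MOV EBP, ESP.
push(regs["ebp"])
regs["ebp"] = regs["esp"]

# [EBP] holds the saved EBP, [EBP+4] the return EIP, [EBP+8] the first parameter.
ebp = regs["ebp"]
params = [mem[ebp + 8 + 4 * i] for i in range(3)]
```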
4.3.4.
Saving Procedure State Information
The processor does not save the contents of the general-purpose registers, segment registers, or the EFLAGS register on a procedure call. A calling procedure should explicitly save the values in any of the general-purpose registers that it will need when it resumes execution after a return. These values can be saved on the stack or in memory in one of the data segments.
The PUSHA and POPA instructions facilitate saving and restoring the contents of the general-purpose registers. PUSHA pushes the values in all the general-purpose registers on the stack in the following order: EAX, ECX, EDX, EBX, ESP (the value prior to executing the PUSHA instruction), EBP, ESI, and EDI. The POPA instruction pops all the register values saved with a PUSHA instruction (except the ESP value) from the stack to their respective registers.
If a called procedure changes the state of any of the segment registers explicitly, it should restore them to their former value before executing a return to the calling procedure.
If a calling procedure needs to maintain the state of the EFLAGS register, it can save and restore all or part of the register using the PUSHF/PUSHFD and POPF/POPFD instructions. The PUSHF instruction pushes the lower word of the EFLAGS register on the stack, while the PUSHFD instruction pushes the entire register. The POPF instruction pops a word from the stack into the lower word of the EFLAGS register, while the POPFD instruction pops a double word from the stack into the register.
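The PUSHA push order, and the way POPA steps over the saved ESP image instead of loading it, can be checked with a short model. The register dictionary and starting values below are illustrative assumptions.

```python
# Sketch of PUSHA/POPA on 32-bit registers. PUSHA stores the ESP value from
# before the instruction; POPA skips that slot rather than loading it.

PUSHA_ORDER = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]

def pusha(regs, mem):
    original_esp = regs["ESP"]              # value prior to executing PUSHA
    for name in PUSHA_ORDER:
        value = original_esp if name == "ESP" else regs[name]
        regs["ESP"] -= 4
        mem[regs["ESP"]] = value

def popa(regs, mem):
    for name in reversed(PUSHA_ORDER):
        if name != "ESP":                   # the saved ESP image is discarded
            regs[name] = mem[regs["ESP"]]
        regs["ESP"] += 4

regs = {"EAX": 1, "ECX": 2, "EDX": 3, "EBX": 4,
        "ESP": 0x2000, "EBP": 5, "ESI": 6, "EDI": 7}
saved = dict(regs)
mem = {}

pusha(regs, mem)
esp_after_pusha = regs["ESP"]               # eight doublewords lower
for name in ("EAX", "ECX", "EDX", "EBX", "EBP", "ESI", "EDI"):
    regs[name] = 0xDEAD                     # the procedure clobbers everything
popa(regs, mem)                             # all registers restored
```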
4.3.5.
Calls to Other Privilege Levels
The IA's protection mechanism recognizes four privilege levels, numbered from 0 to 3, where greater numbers mean lesser privileges. The primary reason to use these privilege levels is to improve the reliability of operating systems. For example, Figure 4-3 shows how privilege levels can be interpreted as rings of protection. In this example, the highest privilege level 0 (at the center of the diagram) is used for segments that contain the most critical code modules in the system, usually the kernel of an operating system. The outer rings (with progressively lower privileges) are used for segments that contain code modules for less critical software. Code modules in lower privilege segments can only access modules operating at higher privilege segments by means of a tightly controlled and protected interface called a gate. Attempts to access higher privilege segments without going through a protection gate and without having sufficient access rights cause a general-protection exception (#GP) to be generated.
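[Figure 4-3 depicts the privilege levels as concentric protection rings: level 0 (highest privilege) at the center for the operating system kernel, levels 1 and 2 for operating system services (device drivers, etc.), and level 3 (lowest privilege) on the outside for applications.]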
Figure 4-3. Protection Rings
If an operating system or executive uses this multilevel protection mechanism, a call to a procedure that is in a more privileged protection level than the calling procedure is handled in a similar manner as a far call (refer to Section 4.3.2., Far CALL and RET Operation). The differences are as follows:
- The segment selector provided in the CALL instruction references a special data structure called a call gate descriptor. Among other things, the call gate descriptor provides the following:
  - Access rights information.
  - The segment selector for the code segment of the called procedure.
  - An offset into the code segment (that is, the instruction pointer for the called procedure).
- The processor switches to a new stack to execute the called procedure. Each privilege level has its own stack. The segment selector and stack pointer for the privilege level 3 stack are stored in the SS and ESP registers, respectively, and are automatically saved when a call to a more privileged level occurs. The segment selectors and stack pointers for the privilege level 2, 1, and 0 stacks are stored in a system segment called the task state segment (TSS).
The use of a call gate and the TSS during a stack switch are transparent to the calling procedure, except when a general-protection exception is raised.
4.3.6.
CALL and RET Operation Between Privilege Levels
When making a call to a more privileged protection level, the processor does the following (refer to Figure 4-4):
1. Performs an access rights check (privilege check).
2. Temporarily saves (internally) the current contents of the SS, ESP, CS, and EIP registers.
3. Loads the segment selector and stack pointer for the new stack (that is, the stack for the privilege level being called) from the TSS into the SS and ESP registers and switches to the new stack.
4. Pushes the temporarily saved SS and ESP values for the calling procedure's stack onto the new stack.
5. Copies the parameters from the calling procedure's stack to the new stack. (A value in the call gate descriptor determines how many parameters to copy to the new stack.)
6. Pushes the temporarily saved CS and EIP values for the calling procedure to the new stack.
7. Loads the segment selector for the new code segment and the new instruction pointer from the call gate into the CS and EIP registers, respectively.
8. Begins execution of the called procedure at the new privilege level.
When executing a return from the privileged procedure, the processor performs these actions:
1. Performs a privilege check.
2. Restores the CS and EIP registers to their values prior to the call.
3. (If the RET instruction has an optional n argument.) Increments the stack pointer by the number of bytes specified with the n operand to release parameters from the stack. If the call gate descriptor specifies that one or more parameters be copied from one stack to the other, a RET n instruction must be used to release the parameters from both stacks. Here, the n operand specifies the number of bytes occupied on each stack by the parameters. On a return, the processor increments ESP by n for each stack to step over (effectively remove) these parameters from the stacks.
4. Restores the SS and ESP registers to their values prior to the call, which causes a switch back to the stack of the calling procedure.
5. (If the RET instruction has an optional n argument.) Increments the stack pointer by the number of bytes specified with the n operand to release parameters from the stack (refer to the explanation in step 3).
6. Resumes execution of the calling procedure.
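The stack switch is easier to follow with concrete numbers. The sketch below models steps 2 through 7 of the call and the matching return; the privilege checks are omitted, and the selector values, the tss and gate dictionaries, and the helper names are all hypothetical.

```python
# Toy model of a call through a call gate to a more privileged level, and of
# the return. Memory is keyed by (stack segment selector, address).

mem = {}

def push(cpu, value):
    cpu["esp"] -= 4
    mem[(cpu["ss"], cpu["esp"])] = value

def pop(cpu):
    value = mem[(cpu["ss"], cpu["esp"])]
    cpu["esp"] += 4
    return value

def call_inner(cpu, tss, gate):
    old = dict(cpu)                                        # step 2: save SS, ESP, CS, EIP
    cpu["ss"], cpu["esp"] = tss[gate["target_level"]]      # step 3: switch stacks
    push(cpu, old["ss"])                                   # step 4: calling SS, then ESP
    push(cpu, old["esp"])
    for i in reversed(range(gate["param_count"])):         # step 5: copy parameters,
        push(cpu, mem[(old["ss"], old["esp"] + 4 * i)])    #         preserving their order
    push(cpu, old["cs"])                                   # step 6: calling CS, then EIP
    push(cpu, old["eip"])
    cpu["cs"], cpu["eip"] = gate["selector"], gate["offset"]   # step 7

def ret_outer(cpu, n=0):
    cpu["eip"] = pop(cpu)                # restore EIP, then CS
    cpu["cs"] = pop(cpu)
    cpu["esp"] += n                      # release parameters on the inner stack
    outer_esp = pop(cpu)                 # restore ESP, then SS
    outer_ss = pop(cpu)
    cpu["ss"], cpu["esp"] = outer_ss, outer_esp + n   # release them on the outer stack too

cpu = {"ss": 0x2B, "esp": 0x1000, "cs": 0x23, "eip": 0x400}
push(cpu, 22)                            # param 2
push(cpu, 11)                            # param 1
tss = {0: (0x10, 0x9000)}                # level 0 stack: (SS, ESP)
gate = {"target_level": 0, "selector": 0x08, "offset": 0x7000, "param_count": 2}

call_inner(cpu, tss, gate)
inner_view = dict(cpu)
first_param_on_new_stack = mem[(cpu["ss"], cpu["esp"] + 8)]
ret_outer(cpu, n=8)
```

The single n value of 8 is applied once on each stack, which is why the caller ends up with ESP back at 0x1000.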
[Figure 4-4 shows two stacks side by side. The stack for the calling procedure holds Param 1, Param 2, and Param 3, with ESP marked before and after the call. The stack for the called procedure holds the calling SS, the calling ESP, Param 1, Param 2, Param 3, the calling CS, and the calling EIP. A second pair of diagrams marks ESP before and after the return.]
Note: On a return, parameters are released on both stacks if the correct value is given for the n operand in the RET n instruction.
Figure 4-4. Stack Switch on a Call to a Different Privilege Level
Refer to Chapter 4, Protection, of the Intel Architecture Software Developer's Manual, Volume 3, for detailed information on calls to privileged levels and the call gate descriptor.
4.4.
INTERRUPTS AND EXCEPTIONS
The processor provides two mechanisms for interrupting program execution: interrupts and exceptions:
- An interrupt is an asynchronous event that is typically triggered by an I/O device.
- An exception is a synchronous event that is generated when the processor detects one or more predefined conditions while executing an instruction. The IA specifies three classes of exceptions: faults, traps, and aborts.
The processor responds to interrupts and exceptions in essentially the same way. When an interrupt or exception is signaled, the processor halts execution of the current program or task and switches to a handler procedure that has been written specifically to handle the interrupt or exception condition. The processor accesses the handler procedure through an entry in the interrupt descriptor table (IDT). When the handler has completed handling the interrupt or exception, program control is returned to the interrupted program or task.
The operating system, executive, and/or device drivers normally handle interrupts and exceptions independently from application programs or tasks. Application programs can, however, access the interrupt and exception handlers incorporated in an operating system or executive through assembly-language calls. The remainder of this section gives a brief overview of the processor's interrupt and exception handling mechanism. Refer to Chapter 5, Interrupt and Exception Handling, of the Intel Architecture Software Developer's Manual, Volume 3, for a detailed description of this mechanism.
The IA defines 18 predefined interrupts and exceptions and 224 user-defined interrupts, which are associated with entries in the IDT. Each interrupt and exception in the IDT is identified with a number, called a vector. Table 4-1 lists the interrupts and exceptions with entries in the IDT and their respective vector numbers. Vectors 0 through 8, 10 through 14, and 16 through 19 are the predefined interrupts and exceptions, and vectors 32 through 255 are the user-defined interrupts, called maskable interrupts.
Note that the processor defines several additional interrupts that do not point to entries in the IDT; the most notable of these interrupts is the SMI interrupt. Refer to Chapter 5, Interrupt and Exception Handling, of the Intel Architecture Software Developer's Manual, Volume 3, for more information about the interrupts and exceptions that the IA supports.
When the processor detects an interrupt or exception, it does one of the following things:
- Executes an implicit call to a handler procedure.
- Executes an implicit call to a handler task.
The Pentium III processor can generate two types of exceptions:
- Numeric exceptions
- Non-numeric exceptions
When numeric exceptions occur, a processor supporting Streaming SIMD Extensions takes one of two possible courses of action:
- The processor can handle the exception by itself, producing the most reasonable result and allowing numeric program execution to continue undisturbed (i.e., masked exception response).
- A software exception handler can be invoked to handle the exception (i.e., unmasked exception response).
Each of the numeric exception conditions has corresponding flag and mask bits in the MXCSR (Streaming SIMD Extensions control status register). If an exception is masked (the corresponding mask bit in MXCSR = 1), the processor takes an appropriate default action and continues with the computation. If the exception is unmasked (mask bit = 0) and the OS supports SIMD floating-point exceptions (i.e., CR4.OSXMMEXCPT = 1), a software exception handler is invoked immediately through SIMD floating-point exception interrupt vector 19. If the exception is unmasked (mask bit = 0) and the OS does not support SIMD floating-point exceptions (i.e., CR4.OSXMMEXCPT = 0), an invalid opcode exception is signaled instead of a SIMD floating-point exception. Refer to Section 9.5.5., Exception Handling in Streaming SIMD Extensions, in Chapter 9, Programming with the Streaming SIMD Extensions, for more information on handling Streaming SIMD Extensions exceptions.
Note that because SIMD floating-point exceptions are precise and occur immediately, the situation does not arise where an x87-FP instruction, an FWAIT instruction, or another Streaming SIMD Extensions instruction will catch a pending unmasked SIMD floating-point exception.
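The three outcomes described above reduce to a small decision function. In this sketch the function names and return strings are illustrative; the placement of the six mask bits in MXCSR bits 7 through 12 is taken from the MXCSR register layout rather than from this section.

```python
# Sketch of how a SIMD floating-point numeric exception is resolved.
# Mask bits are assumed to occupy MXCSR bits 7..12, in the order listed here.

EXCEPTIONS = ["invalid", "denormal", "divide_by_zero",
              "overflow", "underflow", "precision"]

def is_masked(mxcsr, name):
    bit = 7 + EXCEPTIONS.index(name)
    return (mxcsr >> bit) & 1 == 1

def response(mxcsr, name, osxmmexcpt):
    if is_masked(mxcsr, name):
        return "masked: default result, execution continues"
    if osxmmexcpt:                       # CR4.OSXMMEXCPT = 1
        return "SIMD floating-point exception, vector 19"
    return "invalid opcode exception, vector 6"

ALL_MASKED = 0x1F80                      # every mask bit set
overflow_unmasked = ALL_MASKED & ~(1 << 10)

r_masked = response(ALL_MASKED, "overflow", osxmmexcpt=True)
r_handler = response(overflow_unmasked, "overflow", osxmmexcpt=True)
r_no_os_support = response(overflow_unmasked, "overflow", osxmmexcpt=False)
```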
4.4.1.
Call and Return Operation for Interrupt or Exception Handling Procedures
A call to an interrupt or exception handler procedure is similar to a procedure call to another protection level (refer to Section 4.3.6., CALL and RET Operation Between Privilege Levels). Here, the interrupt vector references one of two kinds of gates: an interrupt gate or a trap gate. Interrupt and trap gates are similar to call gates in that they provide the following information:
- Access rights information.
- The segment selector for the code segment that contains the handler procedure.
- An offset into the code segment to the first instruction of the handler procedure.
The difference between an interrupt gate and a trap gate is as follows. If an interrupt or exception handler is called through an interrupt gate, the processor clears the interrupt enable (IF) flag in the EFLAGS register to prevent subsequent interrupts from interfering with the execution of the handler. When a handler is called through a trap gate, the state of the IF flag is not changed.
If the code segment for the handler procedure has the same privilege level as the currently executing program or task, the handler procedure uses the current stack; if the handler executes at a more privileged level, the processor switches to the stack for the handler's privilege level.
Table 4-1. Exceptions and Interrupts

Vector No.  Mnemonic  Description                                 Source
0           #DE       Divide Error                                DIV and IDIV instructions.
1           #DB       Debug                                       Any code or data reference.
2           -         NMI Interrupt                               Non-maskable external interrupt.
3           #BP       Breakpoint                                  INT 3 instruction.
4           #OF       Overflow                                    INTO instruction.
5           #BR       BOUND Range Exceeded                        BOUND instruction.
6           #UD       Invalid Opcode (UnDefined Opcode)           UD2 instruction or reserved opcode. (note 1)
7           #NM       Device Not Available (No Math Coprocessor)  Floating-point or WAIT/FWAIT instruction.
8           #DF       Double Fault                                Any instruction that can generate an exception, an NMI, or an INTR.
9           -         CoProcessor Segment Overrun (reserved)      Floating-point instruction. (note 2)
10          #TS       Invalid TSS                                 Task switch or TSS access.
11          #NP       Segment Not Present                         Loading segment registers or accessing system segments.
12          #SS       Stack Segment Fault                         Stack operations and SS register loads.
13          #GP       General Protection                          Any memory reference and other protection checks.
14          #PF       Page Fault                                  Any memory reference.
15          -         (Intel reserved. Do not use.)
16          #MF       Floating-Point Error (Math Fault)           Floating-point or WAIT/FWAIT instruction.
17          #AC       Alignment Check                             Any data reference in memory. (note 3)
18          #MC       Machine Check                               Error codes (if any) and source are model dependent. (note 4)
19          #XF       Streaming SIMD Extensions                   SIMD floating-point numeric exceptions. (note 5)
20-31       -         (Intel reserved. Do not use.)
32-255      -         Maskable Interrupts                         External interrupt from INTR pin or INT n instruction.
Notes:
1. The UD2 instruction was introduced in the Pentium Pro processor.
2. IA processors after the Intel386 processor do not generate this exception.
3. This exception was introduced in the Intel486 processor.
4. This exception was introduced in the Pentium processor and enhanced in the Pentium Pro processor.
5. This exception was introduced in the Pentium III processor.
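The vector ranges in Table 4-1 can be expressed as a simple classifier, which is sometimes handy when validating an INT n operand or an IDT index. The function name and category strings below are illustrative.

```python
# Sketch: classify an IDT vector number according to Table 4-1.

def classify_vector(vector):
    if not 0 <= vector <= 255:
        raise ValueError("vector numbers range from 0 to 255")
    if vector >= 32:
        return "user-defined"            # maskable interrupts
    if vector in (9, 15) or 20 <= vector <= 31:
        return "reserved"
    return "predefined"

user_defined_count = sum(
    1 for v in range(256) if classify_vector(v) == "user-defined"
)
kinds = [classify_vector(v) for v in (0, 9, 13, 15, 19, 20, 32, 255)]
```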
If no stack switch occurs, the processor does the following when calling an interrupt or exception handler (refer to Figure 4-5):
1. Pushes the current contents of the EFLAGS, CS, and EIP registers (in that order) on the stack.
2. Pushes an error code (if appropriate) on the stack.
3. Loads the segment selector for the new code segment and the new instruction pointer (from the interrupt gate or trap gate) into the CS and EIP registers, respectively.
4. If the call is through an interrupt gate, clears the IF flag in the EFLAGS register.
5. Begins execution of the handler procedure at the new privilege level.
[Figure 4-5 shows two cases. With no privilege-level change, the interrupted procedure and the handler share one stack, onto which EFLAGS, CS, EIP, and the error code are pushed; ESP is marked before and after the transfer to the handler. With a privilege-level change, the handler's stack receives SS, ESP, EFLAGS, CS, EIP, and the error code, while the interrupted procedure's stack is left unchanged.]
Figure 4-5. Stack Usage on Transfers to Interrupt and Exception Handling Routines
If a stack switch does occur, the processor does the following:
1. Temporarily saves (internally) the current contents of the SS, ESP, EFLAGS, CS, and EIP registers.
2. Loads the segment selector and stack pointer for the new stack (that is, the stack for the privilege level being called) from the TSS into the SS and ESP registers and switches to the new stack.
3. Pushes the temporarily saved SS, ESP, EFLAGS, CS, and EIP values for the interrupted procedure's stack onto the new stack.
4. Pushes an error code on the new stack (if appropriate).
5. Loads the segment selector for the new code segment and the new instruction pointer (from the interrupt gate or trap gate) into the CS and EIP registers, respectively.
6. If the call is through an interrupt gate, clears the IF flag in the EFLAGS register.
7. Begins execution of the handler procedure at the new privilege level.
A return from an interrupt or exception handler is initiated with the IRET instruction. The IRET instruction is similar to the far RET instruction, except that it also restores the contents of the EFLAGS register for the interrupted procedure.
When executing a return from an interrupt or exception handler from the same privilege level as the interrupted procedure, the processor performs these actions:
1. Restores the CS and EIP registers to their values prior to the interrupt or exception.
2. Restores the EFLAGS register.
3. Increments the stack pointer appropriately.
4. Resumes execution of the interrupted procedure.
When executing a return from an interrupt or exception handler from a different privilege level than the interrupted procedure, the processor performs these actions:
1. Performs a privilege check.
2. Restores the CS and EIP registers to their values prior to the interrupt or exception.
3. Restores the EFLAGS register.
4. Restores the SS and ESP registers to their values prior to the interrupt or exception, resulting in a stack switch back to the stack of the interrupted procedure.
5. Resumes execution of the interrupted procedure.
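The delivery and return sequences above, including the difference between interrupt gates and trap gates, can be modeled in a few lines. This is a sketch under stated assumptions: privilege checks are omitted, the selectors and addresses are made up, and the handler is assumed to discard the error code itself before executing IRET.

```python
# Toy model of interrupt/exception delivery and IRET, with and without a
# stack switch. Memory is keyed by (stack segment selector, address).

IF_FLAG = 1 << 9                         # interrupt enable flag, EFLAGS bit 9
mem = {}

def push(cpu, value):
    cpu["esp"] -= 4
    mem[(cpu["ss"], cpu["esp"])] = value

def pop(cpu):
    value = mem[(cpu["ss"], cpu["esp"])]
    cpu["esp"] += 4
    return value

def deliver(cpu, gate, error_code=None, new_stack=None):
    saved = dict(cpu)                    # temporarily save the interrupted state
    if new_stack is not None:            # handler runs at a more privileged level
        cpu["ss"], cpu["esp"] = new_stack
        push(cpu, saved["ss"])
        push(cpu, saved["esp"])
    push(cpu, saved["eflags"])           # EFLAGS, CS, EIP, in that order
    push(cpu, saved["cs"])
    push(cpu, saved["eip"])
    if error_code is not None:
        push(cpu, error_code)
    cpu["cs"], cpu["eip"] = gate["selector"], gate["offset"]
    if gate["type"] == "interrupt":      # trap gates leave IF unchanged
        cpu["eflags"] &= ~IF_FLAG

def iret(cpu, stack_switch=False):
    cpu["eip"] = pop(cpu)
    cpu["cs"] = pop(cpu)
    cpu["eflags"] = pop(cpu)
    if stack_switch:
        outer_esp = pop(cpu)
        outer_ss = pop(cpu)
        cpu["ss"], cpu["esp"] = outer_ss, outer_esp

before = {"ss": 0x2B, "esp": 0x1000, "cs": 0x23, "eip": 0x1234, "eflags": 0x202}

# Case 1: interrupt gate, error code, stack switch to a level 0 stack.
cpu = dict(before)
deliver(cpu, {"type": "interrupt", "selector": 0x08, "offset": 0x9000},
        error_code=0, new_stack=(0x10, 0x8000))
in_handler = dict(cpu)
pop(cpu)                                 # handler discards the error code
iret(cpu, stack_switch=True)

# Case 2: trap gate, no error code, no stack switch.
cpu2 = dict(before)
deliver(cpu2, {"type": "trap", "selector": 0x23, "offset": 0x5000})
trap_state = dict(cpu2)
iret(cpu2)
```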
4.4.2.
Calls to Interrupt or Exception Handler Tasks
Interrupt and exception handler routines can also be executed in a separate task. Here, an interrupt or exception causes a task switch to a handler task. The handler task is given its own address space and (optionally) can execute at a higher protection level than application programs or tasks. The switch to the handler task is accomplished with an implicit task call that references a task gate descriptor. The task gate provides access to the address space for the handler task. As part of the task switch, the processor saves complete state information for the interrupted program or task. Upon returning from the handler task, the state of the interrupted program or task is restored and execution continues. Refer to Chapter 5, Interrupt and Exception Handling, of the Intel Architecture Software Developer's Manual, Volume 3, for a detailed description of the processor's mechanism for handling interrupts and exceptions through handler tasks.
4.4.3.
Interrupt and Exception Handling in Real-Address Mode
When operating in real-address mode, the processor responds to an interrupt or exception with an implicit far call to an interrupt or exception handler. The processor uses the interrupt or exception vector number as an index into an interrupt table. The interrupt table contains instruction pointers to the interrupt and exception handler procedures. The processor saves the state of the EFLAGS register, the EIP register, the CS register, and an optional error code on the stack before switching to the handler procedure. A return from the interrupt or exception handler is carried out with the IRET instruction. Refer to Chapter 16, 8086 Emulation, of the Intel Architecture Software Developer's Manual, Volume 3, for more information on handling interrupts and exceptions in real-address mode.
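As an illustration of indexing the interrupt table by vector number, the sketch below assumes the conventional real-address-mode layout, which this section does not spell out: each entry is four bytes, a 16-bit offset followed by a 16-bit segment, stored little-endian, so the entry for vector v starts at byte 4 * v. The helper names and the sample handler address are hypothetical.

```python
# Sketch: looking up a handler in a real-address-mode interrupt table.
# Assumed layout: 256 entries of 4 bytes each (offset word, then segment word).
import struct

def read_entry(table, vector):
    offset, segment = struct.unpack_from("<HH", table, vector * 4)
    return segment, offset

def linear_address(segment, offset):
    # Real-address-mode translation: segment * 16 + offset.
    return (segment << 4) + offset

table = bytearray(256 * 4)
struct.pack_into("<HH", table, 0x21 * 4, 0x0100, 0xF000)   # install a handler

segment, offset = read_entry(table, 0x21)
handler_address = linear_address(segment, offset)
```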
4.4.4.
INT n, INTO, INT 3, and BOUND Instructions
The INT n, INTO, INT 3, and BOUND instructions allow a program or task to explicitly call an interrupt or exception handler. The INT n instruction uses an interrupt vector as an argument, which allows a program to call any interrupt handler. The INTO instruction explicitly calls the overflow exception (#OF) handler if the overflow flag (OF) in the EFLAGS register is set. The OF flag indicates overflow on arithmetic instructions, but it does not automatically raise an overflow exception. An overflow exception can only be raised explicitly in either of the following ways:
- Execute the INTO instruction.
- Test the OF flag and execute the INT n instruction with an argument of 4 (the vector number of the overflow exception) if the flag is set.
Both methods of dealing with overflow conditions allow a program to test for overflow at specific places in the instruction stream.
The INT 3 instruction explicitly calls the breakpoint exception (#BP) handler.
The BOUND instruction explicitly calls the BOUND-range exceeded exception (#BR) handler if an operand is found to be not within predefined boundaries in memory. This instruction is provided for checking references to arrays and other data structures. Like the overflow exception, the BOUND-range exceeded exception can only be raised explicitly with the BOUND instruction or the INT n instruction with an argument of 5 (the vector number of the bounds-check exception). The processor does not implicitly perform bounds checks and raise the BOUND-range exceeded exception.
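The explicit nature of INTO and BOUND can be sketched as follows: the arithmetic only sets a flag, and nothing is raised until the program asks for the check. The exception class, the add32 helper, and the inclusive-bounds comparison used for BOUND are modeling choices for this example.

```python
# Sketch: INTO and BOUND as explicit checks that raise vectors 4 and 5.

class ProcessorException(Exception):
    def __init__(self, vector, mnemonic):
        super().__init__(f"{mnemonic} (vector {vector})")
        self.vector = vector

def add32(a, b):
    """Signed 32-bit addition; returns (result, overflow flag)."""
    raw = (a + b) & 0xFFFFFFFF
    result = raw - (1 << 32) if raw & 0x80000000 else raw
    return result, result != a + b

def into(overflow_flag):
    if overflow_flag:                    # INTO does nothing when OF is clear
        raise ProcessorException(4, "#OF")

def bound(index, lower, upper):
    if index < lower or index > upper:   # both bounds are inclusive
        raise ProcessorException(5, "#BR")

raised = []
result, of = add32(0x7FFFFFFF, 1)        # overflows, but raises nothing by itself
try:
    into(of)
except ProcessorException as exc:
    raised.append(exc.vector)
try:
    bound(10, 0, 9)
except ProcessorException as exc:
    raised.append(exc.vector)

bound(9, 0, 9)                           # in range: no exception
into(add32(1, 2)[1])                     # OF clear: no exception
```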
4.5.
PROCEDURE CALLS FOR BLOCK-STRUCTURED LANGUAGES
The IA supports an alternate method of performing procedure calls with the ENTER (enter procedure) and LEAVE (leave procedure) instructions. These instructions automatically create and release, respectively, stack frames for called procedures. The stack frames have predefined spaces for local variables and the necessary pointers to allow coherent returns from called procedures. They also allow scope rules to be implemented so that procedures can access their own local variables and some number of other variables located in other stack frames. The ENTER and LEAVE instructions offer two benefits:
- They provide machine-language support for implementing block-structured languages, such as C and Pascal.
- They simplify procedure entry and exit in compiler-generated code.
4.5.1.
ENTER Instruction
The ENTER instruction creates a stack frame compatible with the scope rules typically used in block-structured languages. In block-structured languages, the scope of a procedure is the set of variables to which it has access. The rules for scope vary among languages. They may be based on the nesting of procedures, the division of the program into separately compiled files, or some other modularization scheme.
The ENTER instruction has two operands. The first specifies the number of bytes to be reserved on the stack for dynamic storage for the procedure being called. Dynamic storage is the memory allocated for variables created when the procedure is called, also known as automatic variables. The second operand is the lexical nesting level (from 0 to 31) of the procedure. The nesting level is the depth of a procedure in a hierarchy of procedure calls. The lexical level is unrelated to either the protection privilege level or to the I/O privilege level of the currently running program or task.
The ENTER instruction in the following example allocates 2 Kbytes of dynamic storage on the stack and sets up pointers to two previous stack frames in the stack frame for this procedure.
ENTER 2048,3
The lexical nesting level determines the number of stack frame pointers to copy into the new stack frame from the preceding frame. A stack frame pointer is a doubleword used to access the variables of a procedure. The set of stack frame pointers used by a procedure to access the variables of other procedures is called the display. The first doubleword in the display is a pointer to the previous stack frame. This pointer is used by a LEAVE instruction to undo the effect of an ENTER instruction by discarding the current stack frame.
After the ENTER instruction creates the display for a procedure, it allocates the dynamic local variables for the procedure by decrementing the contents of the ESP register by the number of bytes specified in the first parameter. This new value in the ESP register serves as the initial top-of-stack for all PUSH and POP operations within the procedure.
To allow a procedure to address its display, the ENTER instruction leaves the EBP register pointing to the first doubleword in the display. Because stacks grow down, this is actually the doubleword with the highest address in the display. Data manipulation instructions that specify the EBP register as a base register automatically address locations within the stack segment instead of the data segment.
The ENTER instruction can be used in two ways: nested and non-nested. If the lexical level is 0, the non-nested form is used. The non-nested form pushes the contents of the EBP register on the stack, copies the contents of the ESP register into the EBP register, and subtracts the first operand from the contents of the ESP register to allocate dynamic storage. The non-nested form differs from the nested form in that no stack frame pointers are copied. The nested form of the ENTER instruction occurs when the second parameter (lexical level) is not zero.
The following pseudo code shows the formal definition of the ENTER instruction. STORAGE is the number of bytes of dynamic storage to allocate for local variables, and LEVEL is the lexical nesting level.
PUSH EBP;
FRAME_PTR ← ESP;
IF LEVEL > 0
    THEN
        DO (LEVEL - 1) times
            EBP ← EBP - 4;
            PUSH Pointer(EBP); (* doubleword pointed to by EBP *)
        OD;
        PUSH FRAME_PTR;
FI;
EBP ← FRAME_PTR;
ESP ← ESP - STORAGE;
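The ENTER pseudo code translates almost line for line into a small simulation, shown here with a matching LEAVE (modeled as copying EBP into ESP and then popping EBP). The register dictionary, the starting addresses, and the storage sizes are illustrative, and the return addresses that CALL would push between frames are left out to keep the display layout easy to read.

```python
# Sketch of ENTER and LEAVE on a 32-bit, downward-growing stack.

def push(cpu, mem, value):
    cpu["esp"] -= 4
    mem[cpu["esp"]] = value

def enter(cpu, mem, storage, level):
    push(cpu, mem, cpu["ebp"])               # PUSH EBP
    frame_ptr = cpu["esp"]                   # FRAME_PTR <- ESP
    if level > 0:
        for _ in range(level - 1):           # copy LEVEL - 1 display entries
            cpu["ebp"] -= 4
            push(cpu, mem, mem[cpu["ebp"]])
        push(cpu, mem, frame_ptr)            # add this procedure's own frame pointer
    cpu["ebp"] = frame_ptr                   # EBP <- FRAME_PTR
    cpu["esp"] -= storage                    # ESP <- ESP - STORAGE

def leave(cpu, mem):
    cpu["esp"] = cpu["ebp"]                  # discard locals and display
    cpu["ebp"] = mem[cpu["esp"]]             # restore the previous frame pointer
    cpu["esp"] += 4

cpu = {"esp": 0x1000, "ebp": 0xFFFF}         # 0xFFFF stands in for the old EBP
mem = {}

enter(cpu, mem, storage=12, level=1)         # MAIN at lexical level 1
main_frame = dict(cpu)

enter(cpu, mem, storage=8, level=2)          # a procedure at lexical level 2
display = [mem[cpu["ebp"] - 4 * i] for i in range(3)]

leave(cpu, mem)                              # back to MAIN's frame
```

In this run the level 2 display holds MAIN's EBP twice (once as the saved frame pointer, once as the copied display entry) followed by the new procedure's own frame pointer.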
The main procedure (in which all other procedures are nested) operates at the highest lexical level, level 1. The first procedure it calls operates at the next deeper lexical level, level 2. A level 2 procedure can access the variables of the main program, which are at fixed locations specified by the compiler. In the case of level 1, the ENTER instruction allocates only the requested dynamic storage on the stack because there is no previous display to copy.
A procedure which calls another procedure at a lower lexical level gives the called procedure access to the variables of the caller. The ENTER instruction provides this access by placing a pointer to the calling procedure's stack frame in the display.
A procedure which calls another procedure at the same lexical level should not give access to its variables. In this case, the ENTER instruction copies only that part of the display from the calling procedure which refers to previously nested procedures operating at higher lexical levels. The new stack frame does not include the pointer for addressing the calling procedure's stack frame.
The ENTER instruction treats a re-entrant procedure as a call to a procedure at the same lexical level. In this case, each succeeding iteration of the re-entrant procedure can address only its own variables and the variables of the procedures within which it is nested. A re-entrant procedure always can address its own variables; it does not require pointers to the stack frames of previous iterations.
By copying only the stack frame pointers of procedures at higher lexical levels, the ENTER instruction makes certain that procedures access only those variables of higher lexical levels, not those at parallel lexical levels (refer to Figure 4-6).
Main (Lexical Level 1)
    Procedure A (Lexical Level 2)
        Procedure B (Lexical Level 3)
        Procedure C (Lexical Level 3)
            Procedure D (Lexical Level 4)
Figure 4-6. Nested Procedures
Block-structured languages can use the lexical levels defined by ENTER to control access to the variables of nested procedures. In Figure 4-6, for example, if procedure A calls procedure B which, in turn, calls procedure C, then procedure C will have access to the variables of the MAIN procedure and procedure A, but not those of procedure B because they are at the same lexical level. The following definition describes the access to variables for the nested procedures in Figure 4-6.
1. MAIN has variables at fixed locations.
2. Procedure A can access only the variables of MAIN.
3. Procedure B can access only the variables of procedure A and MAIN. Procedure B cannot access the variables of procedure C or procedure D.
4. Procedure C can access only the variables of procedure A and MAIN. Procedure C cannot access the variables of procedure B or procedure D.
5. Procedure D can access the variables of procedure C, procedure A, and MAIN. Procedure D cannot access the variables of procedure B.
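The access rules listed above follow directly from the nesting in Figure 4-6: a procedure can reach the variables of exactly those procedures that lexically enclose it. A minimal sketch, with the nesting written as a hypothetical parent table:

```python
# Sketch: deriving lexical levels and variable access from procedure nesting.
# PARENT maps each procedure in Figure 4-6 to the procedure that encloses it.

PARENT = {"MAIN": None, "A": "MAIN", "B": "A", "C": "A", "D": "C"}

def lexical_level(proc):
    level = 1                            # MAIN is at lexical level 1
    while PARENT[proc] is not None:
        proc = PARENT[proc]
        level += 1
    return level

def accessible(proc):
    """Procedures whose variables proc can reach, innermost first."""
    result = []
    proc = PARENT[proc]
    while proc is not None:
        result.append(proc)
        proc = PARENT[proc]
    return result

levels = {p: lexical_level(p) for p in PARENT}
access = {p: accessible(p) for p in PARENT}
```

The number of enclosing procedures for a procedure at level n is n - 1, which matches the number of stack frame pointers that ENTER copies from the preceding display.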
In Figure 4-7, an ENTER instruction at the beginning of the MAIN procedure creates three doublewords of dynamic storage for MAIN, but copies no pointers from other stack frames. The first doubleword in the display holds a copy of the last value in the EBP register before the ENTER instruction was executed. The second doubleword holds a copy of the contents of the EBP register following the ENTER instruction. After the instruction is executed, the EBP register points to the first doubleword pushed on the stack, and the ESP register points to the last doubleword in the stack frame.
When MAIN calls procedure A, the ENTER instruction creates a new display (refer to Figure 4-8). The first doubleword is the last value held in MAIN's EBP register. The second doubleword is a pointer to MAIN's stack frame which is copied from the second doubleword in MAIN's display. This happens to be another copy of the last value held in MAIN's EBP register. Procedure A can access variables in MAIN because MAIN is at level 1. Therefore the base address for the dynamic storage used in MAIN is the current address in the EBP register, plus four bytes to account for the saved contents of MAIN's EBP register. All dynamic variables for MAIN are at fixed, positive offsets from this value.
[Stack diagram: Old EBP (addressed by EBP); display holding MAIN's EBP; dynamic storage (its last doubleword addressed by ESP).]
Figure 4-7. Stack Frame after Entering the MAIN Procedure
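[Stack diagram: MAIN's frame (Old EBP, MAIN's EBP); saved MAIN's EBP (addressed by EBP); display holding MAIN's EBP and procedure A's EBP; dynamic storage (its last doubleword addressed by ESP).]
Figure 4-8. Stack Frame after Entering Procedure A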
When procedure A calls procedure B, the ENTER instruction creates a new display (refer to Figure 4-9). The first doubleword holds a copy of the last value in procedure A's EBP register. The second and third doublewords are copies of the two stack frame pointers in procedure A's display. Procedure B can access variables in procedure A and MAIN by using the stack frame pointers in its display.
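[Stack diagram: the frames of MAIN and procedure A; saved procedure A's EBP (addressed by EBP); display holding MAIN's EBP, procedure A's EBP, and procedure B's EBP; dynamic storage (its last doubleword addressed by ESP).]
Figure 4-9. Stack Frame after Entering Procedure B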
When procedure B calls procedure C, the ENTER instruction creates a new display for procedure C (refer to Figure 4-10). The first doubleword holds a copy of the last value in procedure B's EBP register. This is used by the LEAVE instruction to restore procedure B's stack frame. The second and third doublewords are copies of the two stack frame pointers in procedure A's display. If procedure C were at the next deeper lexical level from procedure B, a fourth doubleword would be copied, which would be the stack frame pointer to procedure B's local variables. Note that procedure B and procedure C are at the same level, so procedure C is not intended to access procedure B's variables. This does not mean that procedure C is completely isolated from procedure B; procedure C is called by procedure B, so the pointer to the returning stack frame is a pointer to procedure B's stack frame. In addition, procedure B can pass parameters to procedure C either on the stack or through variables global to both procedures (that is, variables in the scope of both procedures).
[Stack diagram: the frames of MAIN, procedure A, and procedure B; saved procedure B's EBP (addressed by EBP); display holding MAIN's EBP, procedure A's EBP, and procedure C's EBP; dynamic storage (its last doubleword addressed by ESP).]
Figure 4-10. Stack Frame after Entering Procedure C
4.5.2. LEAVE Instruction
The LEAVE instruction, which does not have any operands, reverses the action of the previous ENTER instruction. The LEAVE instruction copies the contents of the EBP register into the ESP register to release all stack space allocated to the procedure. Then it restores the old value of the EBP register from the stack. This simultaneously restores the ESP register to its original value. A subsequent RET instruction then can remove any arguments and the return address pushed on the stack by the calling program for use by the procedure.
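The display construction shown in Figures 4-7 through 4-10 and the action of LEAVE can be sketched as a small C simulation. This is an illustrative model only: the Machine structure, the function names, and the 64-doubleword stack are hypothetical, a 32-bit stack is assumed, and no faults or limit checks are modeled.

```c
#include <stdint.h>

#define STACK_WORDS 64  /* hypothetical size of the simulated stack */

typedef struct {
    uint32_t mem[STACK_WORDS]; /* simulated stack, one doubleword per slot */
    uint32_t esp;              /* byte address of the top of stack */
    uint32_t ebp;              /* byte address of the current frame base */
} Machine;

static void push(Machine *m, uint32_t value)
{
    m->esp -= 4;                    /* the stack grows toward lower addresses */
    m->mem[m->esp / 4] = value;
}

static uint32_t pop(Machine *m)
{
    uint32_t value = m->mem[m->esp / 4];
    m->esp += 4;
    return value;
}

void machine_init(Machine *m)
{
    for (int i = 0; i < STACK_WORDS; i++)
        m->mem[i] = 0;
    m->esp = STACK_WORDS * 4;       /* empty stack */
    m->ebp = 0;                     /* stands in for the caller's "old EBP" */
}

/* Model of ENTER size, level: save EBP, build the display, reserve storage. */
void enter(Machine *m, uint32_t size, unsigned level)
{
    push(m, m->ebp);                /* first doubleword: previous EBP */
    uint32_t frame = m->esp;        /* this becomes the new EBP */
    if (level > 0) {
        /* copy level-1 frame pointers from the caller's display */
        for (unsigned i = 1; i < level; i++) {
            m->ebp -= 4;
            push(m, m->mem[m->ebp / 4]);
        }
        push(m, frame);             /* last display entry: the new frame */
    }
    m->ebp = frame;
    m->esp -= size;                 /* dynamic storage for local variables */
}

/* Model of LEAVE: release the frame and restore the caller's EBP. */
void leave(Machine *m)
{
    m->esp = m->ebp;
    m->ebp = pop(m);
}
```

Calling enter with levels 1, 2, 3, and 3 in turn models MAIN, procedure A, procedure B, and procedure C; each display then holds the frame pointers described for the corresponding figure, and leave returns EBP and ESP to the values procedure B had before the call.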
CHAPTER 5 DATA TYPES AND ADDRESSING MODES

This chapter describes data types and addressing modes available to programmers of the Intel Architecture (IA) processors.
5.1. FUNDAMENTAL DATA TYPES
The fundamental data types of the IA are bytes, words, doublewords, and quadwords (refer to Figure 5-1). A byte is eight bits, a word is 2 bytes (16 bits), a doubleword is 4 bytes (32 bits), and a quadword is 8 bytes (64 bits).
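[Diagram: a byte (bits 7 to 0) at address N; a word (bits 15 to 0) with its low byte at N and high byte at N+1; a doubleword (bits 31 to 0) with its low word at N and high word at N+2; a quadword (bits 63 to 0) with its low doubleword at N and high doubleword at N+4.]
Figure 5-1. Fundamental Data Types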
The Pentium III processor introduced a new data type, a 128-bit packed data type that holds four packed single-precision (32-bit) floating-point numbers. These values are the operands for the SIMD floating-point operations. They are also the operands for the scalar equivalents of these instructions. Refer to Figure 5-2, SIMD Floating-Point Data Type for a description of this data type.
[Diagram: a 128-bit operand divided into four 32-bit fields occupying bits 127 to 96, 95 to 64, 63 to 32, and 31 to 0.]
Figure 5-2. SIMD Floating-Point Data Type
Figure 5-3 shows the byte order of each of the fundamental data types when referenced as operands in memory. The low byte (bits 0 through 7) of each data type occupies the lowest address in memory and that address is also the address of the operand.
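The byte ordering rule can be checked against the operand values given in Figure 5-3 with a short C sketch. The array name fig_mem and the function load_le are hypothetical; bytes at addresses 0H, 4H, 5H, and EH are not recoverable from the figure and are set to zero here as placeholders.

```c
#include <stddef.h>
#include <stdint.h>

/* Memory contents for addresses 0H through EH, taken from the operand
   values in Figure 5-3. Entries marked "placeholder" are not given there. */
static const uint8_t fig_mem[15] = {
    0x00, /* 0H: placeholder */
    0x31, /* 1H */
    0xCB, /* 2H */
    0x74, /* 3H */
    0x00, /* 4H: placeholder */
    0x00, /* 5H: placeholder */
    0x0B, /* 6H */
    0x23, /* 7H */
    0xA4, /* 8H */
    0x1F, /* 9H */
    0x36, /* AH */
    0x06, /* BH */
    0xFE, /* CH */
    0x7A, /* DH */
    0x00  /* EH: placeholder */
};

/* Read an operand of nbytes (1, 2, 4, or 8) whose lowest-addressed byte
   is its least significant byte. */
uint64_t load_le(const uint8_t *mem, size_t addr, size_t nbytes)
{
    uint64_t value = 0;
    for (size_t i = 0; i < nbytes; i++)
        value |= (uint64_t)mem[addr + i] << (8 * i);
    return value;
}
```

For example, load_le(fig_mem, 0x6, 2) assembles the word at address 6H from the bytes 0BH and 23H, giving 230BH as in the figure.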
5.1.1. Alignment of Words, Doublewords, and Quadwords
Words, doublewords, and quadwords do not need to be aligned in memory on natural boundaries. (The natural boundaries for words, doublewords, and quadwords are even-numbered addresses, addresses evenly divisible by four, and addresses evenly divisible by eight, respectively.) However, to improve the performance of programs, data structures (especially stacks) should be aligned on natural boundaries whenever possible. The reason for this is that the processor requires two memory accesses to make an unaligned memory access, whereas an aligned access requires only one. A word or doubleword operand that crosses a 4-byte boundary or a quadword operand that crosses an 8-byte boundary is considered unaligned and requires two separate memory bus cycles to access it; a word that starts on an odd address but does not cross a word boundary is considered aligned and can still be accessed in one bus cycle.
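The two tests described above (natural alignment, and whether an operand crosses a 4-byte or 8-byte boundary) can be sketched in C as follows. The function names are hypothetical, and the bus-cycle count models only the boundary-crossing rule stated in this section, not the behavior of any particular processor or bus.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when addr is a multiple of the operand size in bytes
   (2 for a word, 4 for a doubleword, 8 for a quadword). */
bool is_naturally_aligned(uint32_t addr, uint32_t size)
{
    return addr % size == 0;
}

/* Bus cycles under the rule in the text: a word or doubleword that
   crosses a 4-byte boundary, or a quadword that crosses an 8-byte
   boundary, needs two cycles; otherwise one cycle is enough. */
unsigned bus_cycles(uint32_t addr, uint32_t size)
{
    uint32_t boundary = (size == 8) ? 8 : 4;
    uint32_t first_block = addr / boundary;
    uint32_t last_block = (addr + size - 1) / boundary;
    return (first_block == last_block) ? 1u : 2u;
}
```

Under this model a word at address 1H is not naturally aligned but still needs one cycle, while a word at address 3H needs two because it straddles the boundary at 4H.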
[Memory diagram of addresses 0H through EH, annotated as follows.]
Byte at Address 9H Contains 1FH
Word at Address 1H Contains CB31H
Word at Address 2H Contains 74CBH
Word at Address 6H Contains 230BH
Word at Address BH Contains FE06H
Doubleword at Address AH Contains 7AFE0636H
Quadword at Address 6H Contains 7AFE06361FA4230BH
Figure 5-3. Bytes, Words, Doublewords and Quadwords in Memory
When accessing 128-bit data for the Pentium III processor, data must be aligned on 16-byte boundaries. There are instructions that allow for unaligned access, but additional time is required to receive the data into the cache. If an instruction that expects aligned data is used to access unaligned data, a general protection fault will occur.
5.2. NUMERIC, POINTER, BIT FIELD, AND STRING DATA TYPES
Although bytes, words, and doublewords are the fundamental data types of the IA, some instructions support additional interpretations of these data types to allow operations to be performed on numeric data types (signed and unsigned integers and BCD integers). Refer to Figure 5-4. Also, some instructions recognize and operate on additional pointer, bit field, and string data types. The following sections describe these additional data types.
5.2.1. Integers
Integers are signed binary numbers held in a byte, word, or doubleword. All operations assume a two's complement representation. The sign bit is located in bit 7 in a byte integer, bit 15 in a word integer, and bit 31 in a doubleword integer. The sign bit is set for negative integers and cleared for positive integers and zero. Integer values range from -128 to +127 for a byte integer, from -32,768 to +32,767 for a word integer, and from $-2^{31}$ to $+2^{31} - 1$ for a doubleword integer.
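The role of the sign bit and the resulting value ranges can be illustrated with a short C sketch. The function names are hypothetical; width is assumed to be 8, 16, or 32.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the sign bit (bit width-1) of the operand is set. */
bool sign_bit_set(uint32_t bits, unsigned width)
{
    return ((bits >> (width - 1)) & 1u) != 0;
}

/* Interpret the low `width` bits as a two's complement integer. */
int32_t sign_extend(uint32_t bits, unsigned width)
{
    uint32_t sign = UINT32_C(1) << (width - 1);
    uint32_t mask = (width == 32) ? UINT32_MAX : ((UINT32_C(1) << width) - 1);
    bits &= mask;
    /* Flipping the sign bit and subtracting its weight maps the bit
       pattern onto the range -2^(width-1) .. 2^(width-1) - 1. */
    int64_t value = (int64_t)(bits ^ sign) - (int64_t)sign;
    return (int32_t)value;
}
```

The byte pattern 80H has its sign bit set and evaluates to -128, the smallest byte integer, while 7FH evaluates to +127, the largest.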
[Diagram of data formats: byte, word, and doubleword signed integers (sign in bit 7, 15, and 31); byte, word, and doubleword unsigned integers; BCD integers (one digit in bits 3 to 0 of each byte); packed BCD integers (two digits per byte); near pointer (32-bit offset or linear address); far pointer or logical address (segment selector in bits 47 to 32, offset in bits 31 to 0); bit field (field length, least significant bit).]
Figure 5-4. Numeric, Pointer, and Bit Field Data Types
5.2.2. Unsigned Integers
Unsigned integers are unsigned binary numbers contained in a byte, word, or doubleword. Unsigned integer values range from 0 to 255 for an unsigned byte integer, from 0 to 65,535 for an unsigned word integer, and from 0 to $2^{32} - 1$ for an unsigned doubleword integer. Unsigned integers are sometimes referred to as ordinals.
5.2.3. BCD Integers
Binary-coded decimal integers (BCD integers) are unsigned 4-bit integers with valid values ranging from 0 to 9. BCD integers can be unpacked (one BCD digit per byte) or packed (two BCD digits per byte). The value of an unpacked BCD integer is the binary value of the low half-byte (bits 0 through 3). The high half-byte (bits 4 through 7) can be any value during addition and subtraction, but must be zero during multiplication and division. Packed BCD integers allow two BCD digits to be contained in one byte. Here, the digit in the high half-byte is more significant than the digit in the low half-byte.
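The two BCD layouts can be sketched in C as follows. The function names are hypothetical, and the digits passed to pack_bcd are assumed to be valid (0 through 9).

```c
#include <stdint.h>

/* Value of an unpacked BCD byte: only the low half-byte is significant. */
unsigned unpacked_bcd_value(uint8_t byte)
{
    return byte & 0x0Fu;
}

/* Build a packed BCD byte; the high half-byte holds the more
   significant digit. */
uint8_t pack_bcd(unsigned high_digit, unsigned low_digit)
{
    return (uint8_t)((high_digit << 4) | low_digit);
}

/* Binary value (0 to 99) of a packed BCD byte. */
unsigned packed_bcd_to_binary(uint8_t byte)
{
    return (unsigned)(byte >> 4) * 10u + (byte & 0x0Fu);
}
```

Because the high half-byte of an unpacked digit is ignored here, the ASCII character '7' (37H) has the unpacked BCD value 7; the packed byte 42H represents decimal 42.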
5.2.4. Pointers
Pointers are addresses of locations in memory. The Pentium Pro processor recognizes two types of pointers: a near pointer (32 bits) and a far pointer (48 bits). A near pointer is a 32-bit offset (also called an effective address) within a segment. Near pointers are used for all memory references in a flat memory model or for references in a segmented model where the identity of the segment being accessed is implied. A far pointer is a 48-bit logical address, consisting of a 16-bit segment selector and a 32-bit offset. Far pointers are used for memory references in a segmented memory model where the identity of a segment being accessed must be specified explicitly.
5.2.5. Bit Fields
A bit field is a contiguous sequence of bits. It can begin at any bit position of any byte in memory and can contain up to 32 bits.
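As a minimal sketch of this definition, the following C function reads a bit field of up to 32 bits that starts at an arbitrary bit position. The name extract_bit_field is hypothetical, and the sketch assumes that bit positions are counted from bit 0 of the byte at the base address, with higher positions continuing into higher-addressed bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Read `length` bits (1 to 32) starting `bit_offset` bits above bit 0
   of the byte at `base`. The first bit read becomes bit 0 of the result. */
uint32_t extract_bit_field(const uint8_t *base, size_t bit_offset,
                           unsigned length)
{
    uint32_t value = 0;
    for (unsigned i = 0; i < length; i++) {
        size_t bit = bit_offset + i;
        uint32_t b = (uint32_t)(base[bit / 8] >> (bit % 8)) & 1u;
        value |= b << i;
    }
    return value;
}
```

With the two bytes F0H and 0FH in memory, an 8-bit field starting at bit position 4 spans both bytes and reads as FFH.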
5.2.6. Strings
Strings are continuous sequences of bits, bytes, words, or doublewords. A bit string can begin at any bit position of any byte and can contain up to $2^{32} - 1$ bits. A byte string can contain bytes, words, or doublewords and can range from zero to $2^{32} - 1$ bytes (4 gigabytes).
5.2.7. Floating-Point Data Types
The processor's floating-point instructions recognize a set of real, integer, and BCD integer data types. Refer to Section 7.4., Floating-Point Data Types and Formats in Chapter 7, Floating-Point Unit for a description of FPU data types.
5.2.8. MMX Technology Data Types
IA processors that implement the Intel MMX technology recognize a set of packed 64-bit data types. Refer to Section 8.1.2., MMX Data Types in Chapter 8, Programming with the Intel MMX Technology for a description of the MMX data types.
5.2.9. Streaming SIMD Extensions Data Types
IA processors that implement the Intel Streaming SIMD Extensions recognize a set of 128-bit data types. Refer to Section 9.1.2., SIMD Floating-Point Data Types in Chapter 9, Programming with the Streaming SIMD Extensions for a description of the SIMD floating-point data types.
5.3. OPERAND ADDRESSING
An IA machine-instruction acts on zero or more operands. Some operands are specified explicitly in an instruction and others are implicit to an instruction. An operand can be located in any of the following places:
• The instruction itself (an immediate operand).
• A register.
• A memory location.
• An I/O port.
5.3.1. Immediate Operands
Some instructions use data encoded in the instruction itself as a source operand. These operands are called immediate operands (or simply immediates). For example, the following ADD instruction adds an immediate value of 14 to the contents of the EAX register:
ADD EAX, 14
All the arithmetic instructions (except the DIV and IDIV instructions) allow the source operand to be an immediate value. The maximum value allowed for an immediate operand varies among instructions, but can never be greater than the maximum value of an unsigned doubleword integer ($2^{32} - 1$).
5.3.2. Register Operands
Source and destination operands can be located in any of the following registers, depending on the instruction being executed:
• The 32-bit general-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, or EBP).
• The 16-bit general-purpose registers (AX, BX, CX, DX, SI, DI, SP, or BP).
• The 8-bit general-purpose registers (AH, BH, CH, DH, AL, BL, CL, or DL).
• The segment registers (CS, DS, SS, ES, FS, and GS).
• The EFLAGS register.
• System registers, such as the global descriptor table register (GDTR) or the interrupt descriptor table register (IDTR).
Some instructions (such as the DIV and MUL instructions) use quadword operands contained in a pair of 32-bit registers. Register pairs are represented with a colon separating them. For example, in the register pair EDX:EAX, EDX contains the high order bits and EAX contains the low order bits of a quadword operand.
Several instructions (such as the PUSHFD and POPFD instructions) are provided to load and store the contents of the EFLAGS register or to set or clear individual flags in this register. Other instructions (such as the Jcc instructions) use the state of the status flags in the EFLAGS register as condition codes for branching or other decision making operations.
The processor contains a selection of system registers that are used to control memory management, interrupt and exception handling, task management, processor management, and debugging activities. Some of these system registers are accessible by an application program, the operating system, or the executive through a set of system instructions. When accessing a system register with a system instruction, the register is generally an implied operand of the instruction.
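The EDX:EAX register pair convention described above can be modeled in C as follows. The function names are hypothetical; mul32 models only the result placement of a 32-bit unsigned multiply and does not model the flags.

```c
#include <stdint.h>

/* Combine a register pair into one quadword: EDX supplies the high
   order 32 bits and EAX the low order 32 bits. */
uint64_t join_pair(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Model of an unsigned 32-bit multiply of EAX by a source operand,
   leaving the 64-bit product in EDX:EAX. */
void mul32(uint32_t eax_in, uint32_t src, uint32_t *edx, uint32_t *eax)
{
    uint64_t product = (uint64_t)eax_in * src;
    *edx = (uint32_t)(product >> 32);   /* high order doubleword */
    *eax = (uint32_t)product;           /* low order doubleword */
}
```

Multiplying FFFFFFFFH by 2, for instance, gives the quadword 1FFFFFFFEH, which is held as EDX = 1 and EAX = FFFFFFFEH.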
5.3.3. Memory Operands
Source and destination operands in memory are referenced by means of a segment selector and an offset (refer to Figure 5-5). The segment selector specifies the segment containing the operand and the offset (the number of bytes from the beginning of the segment to the first byte of the operand) specifies the linear or effective address of the operand.
[Diagram: a 16-bit segment selector (bits 15 to 0) paired with a 32-bit offset or linear address (bits 31 to 0).]
Figure 5-5. Memory Operand Address
5.3.3.1. SPECIFYING A SEGMENT SELECTOR
The segment selector can be specified either implicitly or explicitly. The most common method of specifying a segment selector is to load it in a segment register and then allow the processor to select the register implicitly, depending on the type of operation being performed. The processor automatically chooses a segment according to the rules given in Table 5-1.
Table 5-1. Default Segment Selection Rules

Type of Reference     Register Used   Segment Used                                   Default Selection Rule
Instructions          CS              Code Segment                                   All instruction fetches.
Stack                 SS              Stack Segment                                  All stack pushes and pops. Any memory reference which uses the ESP or EBP register as a base register.
Local Data            DS              Data Segment                                   All data references, except when relative to stack or string destination.
Destination Strings   ES              Data Segment pointed to with the ES register   Destination of string instructions.
When storing data in or loading data from memory, the DS segment default can be overridden to allow other segments to be accessed. Within an assembler, the segment override is generally handled with a colon (:) operator. For example, the following MOV instruction moves a value from register EAX into the segment pointed to by the ES register. The offset into the segment is contained in the EBX register:
MOV ES:[EBX], EAX;
(At the machine level, a segment override is specified with a segment-override prefix, which is a byte placed at the beginning of an instruction.) The following default segment selections cannot be overridden:
• Instruction fetches must be made from the code segment.
• Destination strings in string instructions must be stored in the data segment pointed to by the ES register.
• Push and pop operations must always reference the SS segment.
Some instructions require a segment selector to be specified explicitly. In these cases, the 16-bit segment selector can be located in a memory location or in a 16-bit register. For example, the following MOV instruction moves a segment selector located in register BX into segment register DS:
MOV DS, BX
Segment selectors can also be specified explicitly as part of a 48-bit far pointer in memory. Here, the first doubleword in memory contains the offset and the next word contains the segment selector.
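The memory layout of a 48-bit far pointer can be sketched in C as follows. The FarPointer structure and the function load_far_pointer are hypothetical names; the six bytes are assumed to be in little-endian order, offset first.

```c
#include <stdint.h>

typedef struct {
    uint32_t offset;    /* first doubleword in memory */
    uint16_t selector;  /* the word that follows the offset */
} FarPointer;

/* Decode a far pointer from the six bytes that hold it in memory. */
FarPointer load_far_pointer(const uint8_t bytes[6])
{
    FarPointer p;
    p.offset = (uint32_t)bytes[0]
             | ((uint32_t)bytes[1] << 8)
             | ((uint32_t)bytes[2] << 16)
             | ((uint32_t)bytes[3] << 24);
    p.selector = (uint16_t)(bytes[4] | (bytes[5] << 8));
    return p;
}
```

The bytes 78H 56H 34H 12H 23H 00H, for example, decode to the offset 12345678H and the segment selector 0023H.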
5.3.3.2. SPECIFYING AN OFFSET
The offset part of a memory address can be specified either directly as a static value (called a displacement) or through an address computation made up of one or more of the following components:
• Displacement: An 8-, 16-, or 32-bit value.
• Base: The value in a general-purpose register.
• Index: The value in a general-purpose register.
• Scale factor: A value of 2, 4, or 8 that is multiplied by the index value.
The offset which results from adding these components is called an effective address. Each of these components can have either a positive or negative (two's complement) value, with the exception of the scaling factor. Figure 5-6 shows all the possible ways that these components can be combined to create an effective address in the selected segment.
Base:          EAX, EBX, ECX, EDX, ESP, EBP, ESI, EDI
Index:         EAX, EBX, ECX, EDX, EBP, ESI, EDI
Scale:         1, 2, 4, 8
Displacement:  None, 8-bit, 16-bit, 32-bit

Offset = Base + (Index * Scale) + Displacement
Figure 5-6. Offset (or Effective Address) Computation
The uses of general-purpose registers as base or index components are restricted in the following manner:
• The ESP register cannot be used as an index register.
• When the ESP or EBP register is used as the base, the SS segment is the default segment. In all other cases, the DS segment is the default segment.
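The offset computation and the two restrictions above can be modeled in C as follows. All type and function names are hypothetical, the register numbering is chosen for illustration and is not the machine encoding, and a scale of 1 is used to mean that no scaling is applied.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative register numbering (not the hardware encoding). */
typedef enum {
    REG_EAX, REG_EBX, REG_ECX, REG_EDX,
    REG_ESP, REG_EBP, REG_ESI, REG_EDI,
    REG_NONE   /* component not used */
} Reg;

typedef enum { SEG_DS, SEG_SS } DefaultSegment;

typedef struct {
    Reg base;       /* REG_NONE when there is no base */
    Reg index;      /* REG_NONE when there is no index */
    uint32_t scale; /* 1, 2, 4, or 8 */
    int32_t disp;   /* 0 when there is no displacement */
} AddressForm;

/* ESP cannot be an index; a scale factor requires an index. */
bool address_form_is_valid(const AddressForm *a)
{
    if (a->index == REG_ESP)
        return false;
    if (a->index == REG_NONE)
        return a->scale == 1;
    return a->scale == 1 || a->scale == 2 || a->scale == 4 || a->scale == 8;
}

/* Offset = Base + (Index * Scale) + Displacement, modulo 2^32. */
uint32_t effective_address(const AddressForm *a, const uint32_t regs[8])
{
    uint32_t ea = (uint32_t)a->disp;
    if (a->base != REG_NONE)
        ea += regs[a->base];
    if (a->index != REG_NONE)
        ea += regs[a->index] * a->scale;
    return ea;
}

/* SS when ESP or EBP is the base register, DS in all other cases. */
DefaultSegment default_segment(const AddressForm *a)
{
    return (a->base == REG_ESP || a->base == REG_EBP) ? SEG_SS : SEG_DS;
}
```

With EBX = 1000H and ESI = 3, the form base EBX, index ESI, scale 4, displacement 8 yields the offset 1014H in the DS segment; a form whose base is EBP selects the SS segment instead.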
The base, index, and displacement components can be used in any combination, and any of these components can be null. A scale factor may be used only when an index also is used. Each possible combination is useful for data structures commonly used by programmers in high-level languages and assembly language. The following addressing modes suggest uses for common combinations of address components.

Displacement
A displacement alone represents a direct (uncomputed) offset to the operand. Because the displacement is encoded in the instruction, this form of an address is sometimes called an absolute or static address. It is commonly used to access a statically allocated scalar operand.
Base
A base alone represents an indirect offset to the operand. Since the value in the base register can change, it can be used for dynamic storage of variables and data structures.

Base + Displacement
A base register and a displacement can be used together for two distinct purposes:
• As an index into an array when the element size is not 2, 4, or 8 bytes: The displacement component encodes the static offset to the beginning of the array. The base register holds the results of a calculation to determine the offset to a specific element within the array.
• To access a field of a record: The base register holds the address of the beginning of the record, while the displacement is a static offset to the field.
An important special case of this combination is access to parameters in a procedure activation record. A procedure activation record is the stack frame created when a procedure is entered. Here, the EBP register is the best choice for the base register, because it automatically selects the stack segment. This is a compact encoding for this common function.

(Index * Scale) + Displacement
This address mode offers an efficient way to index into a static array when the element size is 2, 4, or 8 bytes. The displacement locates the beginning of the array, the index register holds the subscript of the desired array element, and the processor automatically converts the subscript into an index by applying the scaling factor.

Base + Index + Displacement
Using two registers together supports either a two-dimensional array (the displacement holds the address of the beginning of the array) or one of several instances of an array of records (the displacement is an offset to a field within the record).

Base + (Index * Scale) + Displacement
Using all the addressing components together allows efficient indexing of a two-dimensional array when the elements of the array are 2, 4, or 8 bytes in size.

5.3.3.3. ASSEMBLER AND COMPILER ADDRESSING MODES
At the machine-code level, the selected combination of displacement, base register, index register, and scale factor is encoded in an instruction. All assemblers permit a programmer to use any of the allowable combinations of these addressing components to address operands. High-level language (HLL) compilers will select an appropriate combination of these components based on the HLL construct a programmer defines.
5.3.4. I/O Port Addressing
The processor supports an I/O address space that contains up to 65,536 8-bit I/O ports. Ports that are 16-bit and 32-bit may also be defined in the I/O address space. An I/O port can be addressed with either an immediate operand or a value in the DX register. Refer to Chapter 10, Input/Output for more information about I/O port addressing.
CHAPTER 6 INSTRUCTION SET SUMMARY

This chapter lists all the instructions in the Intel Architecture (IA) instruction set, divided into three functional groups: integer, floating-point, and system. It also briefly describes each of the integer instructions. Brief descriptions of the floating-point instructions are given in Chapter 7, Floating-Point Unit; brief descriptions of the system instructions are given in the Intel Architecture Software Developer's Manual, Volume 3. Detailed descriptions of all the IA instructions are given in the Intel Architecture Software Developer's Manual, Volume 2. Included in this volume are a description of each instruction's encoding and operation, the effect of an instruction on the EFLAGS flags, and the exceptions an instruction may generate.
6.1. NEW INTEL ARCHITECTURE INSTRUCTIONS
The following sections give the IA instructions that were new in the Streaming SIMD Extensions, MMX Technology and in the Pentium Pro, Pentium, and Intel486 processors.
6.1.1. New Instructions Introduced with the Streaming SIMD Extensions
The Intel Streaming SIMD Extensions introduced a new set of instructions to the IA, designed to enhance the performance of multimedia applications, 3D games, and other 3D applications. These instructions are recognized by all IA processors that implement the Streaming SIMD Extensions. The instructions are listed in Section 6.2.5., Streaming SIMD Extensions.
6.1.2. New Instructions Introduced with the MMX Technology
The Intel MMX technology introduced a new set of instructions to the IA, designed to enhance the performance of multimedia applications. These instructions are recognized by all IA processors that implement the MMX technology. The MMX instructions are listed in Section 6.2.2., MMX Technology Instructions.
6.1.3. New Instructions in the Pentium Pro Processor
The following instructions are new in the Pentium Pro processor:
• CMOVcc: Conditional move (refer to Section 6.3.1.2., Conditional Move Instructions).
• FCMOVcc: Floating-point conditional move on condition-code flags in EFLAGS register (refer to Section 7.5.3., Data Transfer Instructions in Chapter 7, Floating-Point Unit).
• FCOMI/FCOMIP/FUCOMI/FUCOMIP: Floating-point compare and set condition-code flags in EFLAGS register (refer to Section 7.5.6., Comparison and Classification Instructions in Chapter 7, Floating-Point Unit).
• RDPMC: Read performance monitoring counters (refer to Chapter 3, Instruction Set Reference of the Intel Architecture Software Developer's Manual, Volume 2). (This instruction is also available in all Pentium processors that implement the MMX technology.)
• UD2: Undefined instruction (refer to Section 6.15.4., No-Operation and Undefined Instructions).
6.1.4. New Instructions in the Pentium Processor

The following instructions are new in the Pentium processor:
• CMPXCHG8B (compare and exchange 8 bytes) instruction.
• CPUID (CPU identification) instruction. (This instruction was introduced in the Pentium processor and added to later versions of the Intel486 processor.)
• RDTSC (read time-stamp counter) instruction.
• RDMSR (read model-specific register) instruction.
• WRMSR (write model-specific register) instruction.
• RSM (resume from SMM) instruction.
The form of the MOV instruction used to access the test registers has been removed on the Pentium and future IA processors.
6.1.5. New Instructions in the Intel486 Processor
The following instructions are new in the Intel486 processor:
• BSWAP (byte swap) instruction.
• XADD (exchange and add) instruction.
• CMPXCHG (compare and exchange) instruction.
• INVD (invalidate cache) instruction.
• WBINVD (write-back and invalidate cache) instruction.
• INVLPG (invalidate TLB entry) instruction.
6.2. INSTRUCTION SET LIST
This section lists all the IA instructions divided into four major groups: integer, MMX technology, floating-point, and system instructions. For each instruction, the mnemonic and descriptive names are given. When two or more mnemonics are given (for example, CMOVA/CMOVNBE), they represent different mnemonics for the same instruction opcode. Assemblers support redundant mnemonics for some instructions to make it easier to read code listings. For instance, CMOVA (Conditional move if above) and CMOVNBE (Conditional move if not below or equal) represent the same condition.
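That the two mnemonics name one condition can be seen by writing each as a predicate on the status flags. In the C sketch below the function names are hypothetical; it relies on the standard definitions of the conditions (above: CF = 0 and ZF = 0; below or equal: CF = 1 or ZF = 1).

```c
#include <stdbool.h>

/* "Above": the carry flag and the zero flag are both clear. */
bool cond_above(bool cf, bool zf)
{
    return !cf && !zf;
}

/* "Not below or equal": the negation of (CF set or ZF set). */
bool cond_not_below_or_equal(bool cf, bool zf)
{
    return !(cf || zf);
}
```

The two predicates agree for all four combinations of CF and ZF, which is why a single opcode can serve both mnemonics.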
6.2.1. Integer Instructions
Integer instructions perform the integer arithmetic, logic, and program flow control operations that programmers commonly use to write application and system software to run on an IA processor. In the following sections, the integer instructions are divided into several instruction subgroups.

6.2.1.1. DATA TRANSFER INSTRUCTIONS
MOV               Move
CMOVE/CMOVZ       Conditional move if equal/Conditional move if zero
CMOVNE/CMOVNZ     Conditional move if not equal/Conditional move if not zero
CMOVA/CMOVNBE     Conditional move if above/Conditional move if not below or equal
CMOVAE/CMOVNB     Conditional move if above or equal/Conditional move if not below
CMOVB/CMOVNAE     Conditional move if below/Conditional move if not above or equal
CMOVBE/CMOVNA     Conditional move if below or equal/Conditional move if not above
CMOVG/CMOVNLE     Conditional move if greater/Conditional move if not less or equal
CMOVGE/CMOVNL     Conditional move if greater or equal/Conditional move if not less
CMOVL/CMOVNGE     Conditional move if less/Conditional move if not greater or equal
CMOVLE/CMOVNG     Conditional move if less or equal/Conditional move if not greater
CMOVC             Conditional move if carry
CMOVNC            Conditional move if not carry
CMOVO             Conditional move if overflow
CMOVNO            Conditional move if not overflow
CMOVS             Conditional move if sign (negative)
CMOVNS            Conditional move if not sign (non-negative)
CMOVP/CMOVPE      Conditional move if parity/Conditional move if parity even
CMOVNP/CMOVPO     Conditional move if not parity/Conditional move if parity odd
XCHG              Exchange
BSWAP             Byte swap
XADD              Exchange and add
CMPXCHG           Compare and exchange
CMPXCHG8B         Compare and exchange 8 bytes
PUSH              Push onto stack
POP               Pop off of stack
PUSHA/PUSHAD      Push general-purpose registers onto stack
POPA/POPAD        Pop general-purpose registers from stack
IN                Read from a port
OUT               Write to a port
CWD/CDQ           Convert word to doubleword/Convert doubleword to quadword
CBW/CWDE          Convert byte to word/Convert word to doubleword in EAX register
MOVSX             Move and sign extend
MOVZX             Move and zero extend
6.2.1.2. BINARY ARITHMETIC INSTRUCTIONS
ADD               Integer add
ADC               Add with carry
SUB               Subtract
SBB               Subtract with borrow
IMUL              Signed multiply
MUL               Unsigned multiply
IDIV              Signed divide
DIV               Unsigned divide
INC               Increment
DEC               Decrement
NEG               Negate
CMP               Compare

6.2.1.3. DECIMAL ARITHMETIC
DAA               Decimal adjust after addition
DAS               Decimal adjust after subtraction
AAA               ASCII adjust after addition
AAS               ASCII adjust after subtraction
AAM               ASCII adjust after multiplication
AAD               ASCII adjust before division

6.2.1.4. LOGIC INSTRUCTIONS
AND               And
OR                Or
XOR               Exclusive or
NOT               Not

6.2.1.5. SHIFT AND ROTATE INSTRUCTIONS
SAR               Shift arithmetic right
SHR               Shift logical right
SAL/SHL           Shift arithmetic left/Shift logical left
SHRD              Shift right double
SHLD              Shift left double
ROR               Rotate right
ROL               Rotate left
RCR               Rotate through carry right
RCL               Rotate through carry left

6.2.1.6. BIT AND BYTE INSTRUCTIONS
BT                Bit test
BTS               Bit test and set
BTR               Bit test and reset
BTC               Bit test and complement
BSF               Bit scan forward
BSR               Bit scan reverse
SETE/SETZ         Set byte if equal/Set byte if zero
SETNE/SETNZ       Set byte if not equal/Set byte if not zero
SETA/SETNBE       Set byte if above/Set byte if not below or equal
SETAE/SETNB/SETNC Set byte if above or equal/Set byte if not below/Set byte if not carry
SETB/SETNAE/SETC  Set byte if below/Set byte if not above or equal/Set byte if carry
SETBE/SETNA       Set byte if below or equal/Set byte if not above
SETG/SETNLE       Set byte if greater/Set byte if not less or equal
SETGE/SETNL       Set byte if greater or equal/Set byte if not less
SETL/SETNGE       Set byte if less/Set byte if not greater or equal
SETLE/SETNG       Set byte if less or equal/Set byte if not greater
SETS              Set byte if sign (negative)
SETNS             Set byte if not sign (non-negative)
SETO              Set byte if overflow
SETNO             Set byte if not overflow
SETPE/SETP        Set byte if parity even/Set byte if parity
SETPO/SETNP       Set byte if parity odd/Set byte if not parity
TEST              Logical compare
6.2.1.7. CONTROL TRANSFER INSTRUCTIONS
JMP               Jump
JE/JZ             Jump if equal/Jump if zero
JNE/JNZ           Jump if not equal/Jump if not zero
JA/JNBE           Jump if above/Jump if not below or equal
JAE/JNB           Jump if above or equal/Jump if not below
JB/JNAE           Jump if below/Jump if not above or equal
JBE/JNA           Jump if below or equal/Jump if not above
JG/JNLE           Jump if greater/Jump if not less or equal
JGE/JNL           Jump if greater or equal/Jump if not less
JL/JNGE           Jump if less/Jump if not greater or equal
JLE/JNG           Jump if less or equal/Jump if not greater
JC                Jump if carry
JNC               Jump if not carry
JO                Jump if overflow
JNO               Jump if not overflow
JS                Jump if sign (negative)
JNS               Jump if not sign (non-negative)
JPO/JNP           Jump if parity odd/Jump if not parity
JPE/JP            Jump if parity even/Jump if parity
JCXZ/JECXZ        Jump register CX zero/Jump register ECX zero
LOOP              Loop with ECX counter
LOOPZ/LOOPE       Loop with ECX and zero/Loop with ECX and equal
LOOPNZ/LOOPNE     Loop with ECX and not zero/Loop with ECX and not equal
CALL              Call procedure
RET               Return
IRET              Return from interrupt
INT               Software interrupt
INTO              Interrupt on overflow
BOUND             Detect value out of range
ENTER             High-level procedure entry
LEAVE             High-level procedure exit
6.2.1.8. STRING INSTRUCTIONS
MOVS/MOVSB        Move string/Move byte string
MOVS/MOVSW        Move string/Move word string
MOVS/MOVSD        Move string/Move doubleword string
CMPS/CMPSB        Compare string/Compare byte string
CMPS/CMPSW        Compare string/Compare word string
CMPS/CMPSD        Compare string/Compare doubleword string
SCAS/SCASB        Scan string/Scan byte string
SCAS/SCASW        Scan string/Scan word string
SCAS/SCASD        Scan string/Scan doubleword string
LODS/LODSB        Load string/Load byte string
LODS/LODSW        Load string/Load word string
LODS/LODSD        Load string/Load doubleword string
STOS/STOSB        Store string/Store byte string
STOS/STOSW        Store string/Store word string
STOS/STOSD        Store string/Store doubleword string
REP               Repeat while ECX not zero
REPE/REPZ         Repeat while equal/Repeat while zero
REPNE/REPNZ       Repeat while not equal/Repeat while not zero
INS/INSB          Input string from port/Input byte string from port
INS/INSW          Input string from port/Input word string from port
INS/INSD          Input string from port/Input doubleword string from port
OUTS/OUTSB        Output string to port/Output byte string to port
OUTS/OUTSW        Output string to port/Output word string to port
OUTS/OUTSD        Output string to port/Output doubleword string to port
6.2.1.9. FLAG CONTROL INSTRUCTIONS
STC               Set carry flag
CLC               Clear the carry flag
CMC               Complement the carry flag
CLD               Clear the direction flag
STD               Set direction flag
LAHF              Load flags into AH register
SAHF              Store AH register into flags
PUSHF/PUSHFD      Push EFLAGS onto stack
POPF/POPFD        Pop EFLAGS from stack
STI               Set interrupt flag
CLI               Clear the interrupt flag

6.2.1.10. SEGMENT REGISTER INSTRUCTIONS
LDS               Load far pointer using DS
LES               Load far pointer using ES
LFS               Load far pointer using FS
LGS               Load far pointer using GS
LSS               Load far pointer using SS

6.2.1.11. MISCELLANEOUS INSTRUCTIONS
LEA               Load effective address
NOP               No operation
UD2               Undefined instruction
XLAT/XLATB        Table lookup translation
CPUID             Processor Identification
6.2.2. MMX Technology Instructions
The MMX instructions execute on those IA processors that implement the Intel MMX technology. These instructions operate on packed-byte, packed-word, packed-doubleword, and quadword operands. As with the integer instructions, the following list of MMX instructions is divided into subgroups.

6.2.2.1. MMX DATA TRANSFER INSTRUCTIONS
MOVD              Move doubleword
MOVQ              Move quadword

6.2.2.2. MMX CONVERSION INSTRUCTIONS
PACKSSWB          Pack words into bytes with signed saturation
PACKSSDW          Pack doublewords into words with signed saturation
PACKUSWB          Pack words into bytes with unsigned saturation
PUNPCKHBW         Unpack high-order bytes from words
PUNPCKHWD         Unpack high-order words from doublewords
PUNPCKHDQ         Unpack high-order doublewords from quadword
PUNPCKLBW         Unpack low-order bytes from words
PUNPCKLWD         Unpack low-order words from doublewords
PUNPCKLDQ         Unpack low-order doublewords from quadword

6.2.2.3. MMX PACKED ARITHMETIC INSTRUCTIONS
PADDB             Add packed bytes
PADDW             Add packed words
PADDD             Add packed doublewords
PADDSB            Add packed bytes with saturation
PADDSW            Add packed words with saturation
PADDUSB           Add packed unsigned bytes with saturation
PADDUSW           Add packed unsigned words with saturation
PSUBB             Subtract packed bytes
PSUBW             Subtract packed words
PSUBD             Subtract packed doublewords
PSUBSB            Subtract packed bytes with saturation
PSUBSW            Subtract packed words with saturation
PSUBUSB           Subtract packed unsigned bytes with saturation
PSUBUSW           Subtract packed unsigned words with saturation
PMULHW            Multiply packed words and store high result
PMULLW            Multiply packed words and store low result
PMADDWD           Multiply and add packed words

6.2.2.4. MMX COMPARISON INSTRUCTIONS
PCMPEQB           Compare packed bytes for equal
PCMPEQW           Compare packed words for equal
PCMPEQD           Compare packed doublewords for equal
PCMPGTB           Compare packed bytes for greater than
PCMPGTW           Compare packed words for greater than
PCMPGTD           Compare packed doublewords for greater than

6.2.2.5. MMX LOGIC INSTRUCTIONS
PAND              Bitwise logical and
PANDN             Bitwise logical and not
POR               Bitwise logical or
PXOR              Bitwise logical exclusive or

6.2.2.6. MMX SHIFT AND ROTATE INSTRUCTIONS
PSLLW             Shift packed words left logical
PSLLD             Shift packed doublewords left logical
PSLLQ             Shift packed quadword left logical
PSRLW             Shift packed words right logical
PSRLD             Shift packed doublewords right logical
PSRLQ             Shift packed quadword right logical
PSRAW             Shift packed words right arithmetic
PSRAD             Shift packed doublewords right arithmetic
6-11
6.2.2.7. EMMS
MMX STATE MANAGEMENT Empty MMX state
6.2.3.
Floating-Point Instructions
The floating-point instructions are those that are executed by the processor's floating-point unit (FPU). These instructions operate on floating-point (real), extended integer, and binary-coded decimal (BCD) operands. As with the integer instructions, the following list of floating-point instructions is divided into subgroups.
6.2.3.1.
DATA TRANSFER
FLD             Load real
FST             Store real
FSTP            Store real and pop
FILD            Load integer
FIST            Store integer
FISTP           Store integer and pop
FBLD            Load BCD
FBSTP           Store BCD and pop
FXCH            Exchange registers
FCMOVE          Floating-point conditional move if equal
FCMOVNE         Floating-point conditional move if not equal
FCMOVB          Floating-point conditional move if below
FCMOVBE         Floating-point conditional move if below or equal
FCMOVNB         Floating-point conditional move if not below
FCMOVNBE        Floating-point conditional move if not below or equal
FCMOVU          Floating-point conditional move if unordered
FCMOVNU         Floating-point conditional move if not unordered
6.2.3.2.
BASIC ARITHMETIC
FADD            Add real
FADDP           Add real and pop
FIADD           Add integer
FSUB            Subtract real
FSUBP           Subtract real and pop
FISUB           Subtract integer
FSUBR           Subtract real reverse
FSUBRP          Subtract real reverse and pop
FISUBR          Subtract integer reverse
FMUL            Multiply real
FMULP           Multiply real and pop
FIMUL           Multiply integer
FDIV            Divide real
FDIVP           Divide real and pop
FIDIV           Divide integer
FDIVR           Divide real reverse
FDIVRP          Divide real reverse and pop
FIDIVR          Divide integer reverse
FPREM           Partial remainder
FPREM1          IEEE Partial remainder
FABS            Absolute value
FCHS            Change sign
FRNDINT         Round to integer
FSCALE          Scale by power of two
FSQRT           Square root
FXTRACT         Extract exponent and significand
6.2.3.3.
COMPARISON
FCOM            Compare real
FCOMP           Compare real and pop
FCOMPP          Compare real and pop twice
FUCOM           Unordered compare real
FUCOMP          Unordered compare real and pop
FUCOMPP         Unordered compare real and pop twice
FICOM           Compare integer
FICOMP          Compare integer and pop
FCOMI           Compare real and set EFLAGS
FUCOMI          Unordered compare real and set EFLAGS
FCOMIP          Compare real, set EFLAGS, and pop
FUCOMIP         Unordered compare real, set EFLAGS, and pop
FTST            Test real
FXAM            Examine real
6.2.3.4.
TRANSCENDENTAL
FSIN            Sine
FCOS            Cosine
FSINCOS         Sine and cosine
FPTAN           Partial tangent
FPATAN          Partial arctangent
F2XM1           2^x - 1
FYL2X           y*log2(x)
FYL2XP1         y*log2(x+1)
6.2.3.5.
LOAD CONSTANTS
FLD1            Load +1.0
FLDZ            Load +0.0
FLDPI           Load π
FLDL2E          Load log2(e)
FLDLN2          Load loge(2)
FLDL2T          Load log2(10)
FLDLG2          Load log10(2)
6.2.3.6.
FPU CONTROL
FINCSTP         Increment FPU register stack pointer
FDECSTP         Decrement FPU register stack pointer
FFREE           Free floating-point register
FINIT           Initialize FPU after checking error conditions
FNINIT          Initialize FPU without checking error conditions
FCLEX           Clear floating-point exception flags after checking for error conditions
FNCLEX          Clear floating-point exception flags without checking for error conditions
FSTCW           Store FPU control word after checking error conditions
FNSTCW          Store FPU control word without checking error conditions
FLDCW           Load FPU control word
FSTENV          Store FPU environment after checking error conditions
FNSTENV         Store FPU environment without checking error conditions
FLDENV          Load FPU environment
FSAVE           Save FPU state after checking error conditions
FNSAVE          Save FPU state without checking error conditions
FRSTOR          Restore FPU state
FSTSW           Store FPU status word after checking error conditions
FNSTSW          Store FPU status word without checking error conditions
WAIT/FWAIT      Wait for FPU
FNOP            FPU no operation
6.2.4.
System Instructions
The following system instructions are used to control those functions of the processor that are provided to support operating systems and executives.
LGDT            Load global descriptor table (GDT) register
SGDT            Store global descriptor table (GDT) register
LLDT            Load local descriptor table (LDT) register
SLDT            Store local descriptor table (LDT) register
LTR             Load task register
STR             Store task register
LIDT            Load interrupt descriptor table (IDT) register
SIDT            Store interrupt descriptor table (IDT) register
MOV             Load and store control registers
LMSW            Load machine status word
SMSW            Store machine status word
CLTS            Clear the task-switched flag
ARPL            Adjust requested privilege level
LAR             Load access rights
LSL             Load segment limit
VERR            Verify segment for reading
VERW            Verify segment for writing
MOV             Load and store debug registers
INVD            Invalidate cache, no writeback
WBINVD          Invalidate cache, with writeback
INVLPG          Invalidate TLB entry
LOCK (prefix)   Lock bus
HLT             Halt processor
RSM             Return from system management mode (SMM)
RDMSR           Read model-specific register
WRMSR           Write model-specific register
RDPMC           Read performance monitoring counters
RDTSC           Read time stamp counter
SYSENTER        Fast system call, transfers to a flat protected mode kernel at CPL=0
SYSEXIT         Fast return from a system call, transfers to flat protected mode user code at CPL=3
6.2.5.
Streaming SIMD Extensions
The Streaming SIMD Extensions execute on those IA processors that implement the Intel Streaming SIMD Extensions. These instructions operate on packed single-precision floating-point operands. As with the MMX instructions, the following list of Streaming SIMD Extensions is divided into subgroups.
6.2.5.1.
STREAMING SIMD EXTENSIONS DATA TRANSFER INSTRUCTIONS
MOVAPS          Move aligned packed single-precision floating-point
MOVUPS          Move unaligned packed single-precision floating-point
MOVHPS          Move unaligned high packed single-precision floating-point
MOVHLPS         Move aligned high packed single-precision floating-point to low packed single-precision floating-point
MOVLPS          Move unaligned low packed single-precision floating-point
MOVLHPS         Move aligned low packed single-precision floating-point to high packed single-precision floating-point
MOVMSKPS        Move mask packed single-precision floating-point
MOVSS           Move scalar single-precision floating-point
6.2.5.2.
STREAMING SIMD EXTENSIONS CONVERSION INSTRUCTIONS
CVTPI2PS        Convert packed 32-bit integer to packed single-precision floating-point
CVTSI2SS        Convert scalar 32-bit integer to scalar single-precision floating-point
CVTPS2PI        Convert packed single-precision floating-point to packed 32-bit integer
CVTTPS2PI       Convert truncate packed single-precision floating-point to packed 32-bit integer
CVTSS2SI        Convert scalar single-precision floating-point to a 32-bit integer
CVTTSS2SI       Convert truncate scalar single-precision floating-point to scalar 32-bit integer
6.2.5.3.
STREAMING SIMD EXTENSIONS PACKED ARITHMETIC INSTRUCTIONS
ADDPS           Add packed single-precision floating-point
SUBPS           Subtract packed single-precision floating-point
ADDSS           Add scalar single-precision floating-point
SUBSS           Subtract scalar single-precision floating-point
MULPS           Multiply packed single-precision floating-point
MULSS           Multiply scalar single-precision floating-point
DIVPS           Divide packed single-precision floating-point
DIVSS           Divide scalar single-precision floating-point
SQRTPS          Square root packed single-precision floating-point
SQRTSS          Square root scalar single-precision floating-point
MAXPS           Maximum packed single-precision floating-point
MAXSS           Maximum scalar single-precision floating-point
MINPS           Minimum packed single-precision floating-point
MINSS           Minimum scalar single-precision floating-point
6.2.5.4.
STREAMING SIMD EXTENSIONS COMPARISON INSTRUCTIONS
CMPPS           Compare packed single-precision floating-point
CMPSS           Compare scalar single-precision floating-point
COMISS          Compare scalar single-precision floating-point ordered and set EFLAGS
UCOMISS         Unordered compare scalar single-precision floating-point and set EFLAGS
6.2.5.5.
STREAMING SIMD EXTENSIONS LOGICAL INSTRUCTIONS
ANDPS           Bit-wise packed logical AND for single-precision floating-point
ANDNPS          Bit-wise packed logical AND NOT for single-precision floating-point
ORPS            Bit-wise packed logical OR for single-precision floating-point
XORPS           Bit-wise packed logical XOR for single-precision floating-point
6.2.5.6.
STREAMING SIMD EXTENSIONS DATA SHUFFLE INSTRUCTIONS
SHUFPS          Shuffle packed single-precision floating-point
UNPCKHPS        Unpack high packed single-precision floating-point
UNPCKLPS        Unpack low packed single-precision floating-point
6.2.5.7.
STREAMING SIMD EXTENSIONS ADDITIONAL SIMD-INTEGER INSTRUCTIONS
PAVGB/PAVGW     Average unsigned source sub-operands, without incurring a loss in precision
PEXTRW          Extract 16-bit word from MMX register
PINSRW          Insert 16-bit word into MMX register
PMAXUB/PMAXSW   Maximum of packed unsigned integer bytes or signed integer words
PMINUB/PMINSW   Minimum of packed unsigned integer bytes or signed integer words
PMOVMSKB        Move byte mask from MMX register
PMULHUW         Unsigned high packed integer word multiply in MMX register
PSADBW          Sum of absolute differences
PSHUFW          Shuffle packed integer word in MMX register
6.2.5.8.
STREAMING SIMD EXTENSIONS CACHEABILITY CONTROL INSTRUCTIONS
MASKMOVQ        Non-temporal byte mask store of packed integer in an MMX register
MOVNTQ          Non-temporal store of packed integer in an MMX register
MOVNTPS         Non-temporal store of packed single-precision floating-point
PREFETCH        Load 32 or greater number of bytes
SFENCE          Store fence
6.2.5.9.
STREAMING SIMD EXTENSIONS STATE MANAGEMENT INSTRUCTIONS
LDMXCSR         Load SIMD floating-point control and status register
STMXCSR         Store SIMD floating-point control and status register
FXSAVE          Saves floating-point and MMX state and SIMD floating-point state to memory
FXRSTOR         Loads floating-point and MMX state and SIMD floating-point state from memory
6.3.
DATA MOVEMENT INSTRUCTIONS
The data movement instructions move bytes, words, doublewords, or quadwords between memory and the processor's registers, and between registers. These instructions are divided into four groups:
- General-purpose data movement.
- Exchange.
- Stack manipulation.
- Type conversion.
6.3.1.
General-Purpose Data Movement Instructions
The MOV (move) and CMOVcc (conditional move) instructions transfer data between memory and registers or between registers.
6.3.1.1.
MOVE INSTRUCTION
The MOV instruction performs basic load and store operations between memory and the processor's registers, and data movement operations between registers. It handles data transfers along the paths listed in Table 6-1. (Refer to "MOV - Move to/from Control Registers" and "MOV - Move to/from Debug Registers" in Chapter 3, Instruction Set Reference, of the Intel Architecture Software Developer's Manual, Volume 2, for information on moving data to and from the control and debug registers.)
The MOV instruction cannot move data from one memory location to another or from one segment register to another segment register. Memory-to-memory moves can be performed with the MOVS (string move) instruction (refer to Section 6.10., String Operations).
6.3.1.2.
CONDITIONAL MOVE INSTRUCTIONS
The CMOVcc instructions are a group of instructions that check the state of the status flags in the EFLAGS register and perform a move operation if the flags are in a specified state (or condition). These instructions can be used to move a 16- or 32-bit value from memory to a general-purpose register or from one general-purpose register to another. The flag state being tested for each instruction is specified with a condition code (cc) that is associated with the instruction. If the condition is not satisfied, a move is not performed and execution continues with the instruction following the CMOVcc instruction.
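The flag test that a CMOVcc instruction performs can be sketched in Python as follows. This is only a model of the behavior described above, covering a subset of the condition codes in Table 6-2. The names CONDITIONS and cmov, and the use of a dictionary of booleans for the status flags, are assumptions made for illustration.

```python
# Sketch of the CMOVcc flag tests from Table 6-2. The keys are condition-code
# suffixes; CONDITIONS and cmov are hypothetical helper names.
CONDITIONS = {
    "A":  lambda f: not (f["CF"] or f["ZF"]),        # (CF or ZF)=0
    "AE": lambda f: not f["CF"],                     # CF=0
    "B":  lambda f: f["CF"],                         # CF=1
    "BE": lambda f: f["CF"] or f["ZF"],              # (CF or ZF)=1
    "E":  lambda f: f["ZF"],                         # ZF=1
    "NE": lambda f: not f["ZF"],                     # ZF=0
    "GE": lambda f: f["SF"] == f["OF"],              # (SF xor OF)=0
    "L":  lambda f: f["SF"] != f["OF"],              # (SF xor OF)=1
    "LE": lambda f: f["SF"] != f["OF"] or f["ZF"],   # ((SF xor OF) or ZF)=1
}


def cmov(cc, flags, dest, src):
    """Return src if condition cc holds for flags; otherwise keep dest."""
    return src if CONDITIONS[cc](flags) else dest
```

For example, with CF=0 and ZF=0, cmov("A", flags, dest, src) returns src, which mirrors CMOVA performing the move after an unsigned "above" comparison.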
Table 6-1. Move Instruction Operations

Type of Data Movement          Source                     Destination
From memory to a register      Memory location            General-purpose register
                               Memory location            Segment register
From a register to memory      General-purpose register   Memory location
                               Segment register           Memory location
Between registers              General-purpose register   General-purpose register
                               General-purpose register   Segment register
                               Segment register           General-purpose register
                               General-purpose register   Control register
                               Control register           General-purpose register
                               General-purpose register   Debug register
                               Debug register             General-purpose register
Immediate data to a register   Immediate                  General-purpose register
Immediate data to memory       Immediate                  Memory location
Table 6-2 shows the mnemonics for the CMOVcc instructions and the conditions being tested for each instruction. The condition code mnemonics are appended to the letters CMOV to form the mnemonics for the CMOVcc instructions. The instructions listed in Table 6-2 as pairs (for example, CMOVA/CMOVNBE) are alternate names for the same instruction. The assembler provides these alternate names to make it easier to read program listings.
The CMOVcc instructions are useful for optimizing small IF constructions. They also help eliminate branching overhead for IF statements and the possibility of branch mispredictions by the processor.
These instructions may not be supported on some processors in the Pentium Pro processor family. Software can check if the CMOVcc instructions are supported by checking the processor's feature information with the CPUID instruction (refer to "CPUID - CPU Identification" in Chapter 3, Instruction Set Reference, of the Intel Architecture Software Developer's Manual, Volume 2).
6.3.1.3.
EXCHANGE INSTRUCTIONS
The exchange instructions swap the contents of one or more operands and, in some cases, perform additional operations such as asserting the LOCK signal or modifying flags in the EFLAGS register.
The XCHG (exchange) instruction swaps the contents of two operands. This instruction takes the place of three MOV instructions and does not require a temporary location to save the contents of one operand location while the other is being loaded. When a memory operand is used with the XCHG instruction, the processor's LOCK signal is automatically asserted. This instruction is thus useful for implementing semaphores or similar data structures for process synchronization. (Refer to Section 7.1.2., Bus Locking, of the Intel Architecture Software Developer's Manual, Volume 3, for more information on bus locking.)
The BSWAP (byte swap) instruction reverses the byte order in a 32-bit register operand. Bit positions 0 through 7 are exchanged with 24 through 31, and bit positions 8 through 15 are exchanged with 16 through 23. Executing this instruction twice in a row leaves the register with the same value as before. The BSWAP instruction is useful for converting between big-endian and little-endian data formats. This instruction also speeds execution of decimal arithmetic. (The XCHG instruction can be used to swap the bytes in a word.)
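The byte reordering done by BSWAP, and the two-byte swap that XCHG can perform on a word, can be modelled in Python as shown here. The function names bswap32 and swap_word_bytes are hypothetical; the sketch only reproduces the bit movements described in the text, not the instructions' encodings or timing.

```python
def bswap32(value):
    """Reverse the byte order of a 32-bit value, as BSWAP does on a register."""
    value &= 0xFFFFFFFF
    return (((value & 0x000000FF) << 24) |   # bits 0-7   -> bits 24-31
            ((value & 0x0000FF00) << 8) |    # bits 8-15  -> bits 16-23
            ((value & 0x00FF0000) >> 8) |    # bits 16-23 -> bits 8-15
            ((value & 0xFF000000) >> 24))    # bits 24-31 -> bits 0-7


def swap_word_bytes(word):
    """Swap the two bytes of a 16-bit word (the effect of XCHG AH, AL on AX)."""
    word &= 0xFFFF
    return ((word << 8) | (word >> 8)) & 0xFFFF
```

Applying bswap32 twice returns the original value, which matches the statement that executing BSWAP twice in a row leaves the register unchanged.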
Table 6-2. Conditional Move Instructions
Instruction Mnemonic   Status Flag States      Condition Description
Unsigned Conditional Moves
CMOVA/CMOVNBE          (CF or ZF)=0            Above/not below or equal
CMOVAE/CMOVNB          CF=0                    Above or equal/not below
CMOVNC                 CF=0                    Not carry
CMOVB/CMOVNAE          CF=1                    Below/not above or equal
CMOVC                  CF=1                    Carry
CMOVBE/CMOVNA          (CF or ZF)=1            Below or equal/not above
CMOVE/CMOVZ            ZF=1                    Equal/zero
CMOVNE/CMOVNZ          ZF=0                    Not equal/not zero
CMOVP/CMOVPE           PF=1                    Parity/parity even
CMOVNP/CMOVPO          PF=0                    Not parity/parity odd
Signed Conditional Moves
CMOVGE/CMOVNL          (SF xor OF)=0           Greater or equal/not less
CMOVL/CMOVNGE          (SF xor OF)=1           Less/not greater or equal
CMOVLE/CMOVNG          ((SF xor OF) or ZF)=1   Less or equal/not greater
CMOVO                  OF=1                    Overflow
CMOVNO                 OF=0                    Not overflow
CMOVS                  SF=1                    Sign (negative)
CMOVNS                 SF=0                    Not sign (non-negative)
The XADD (exchange and add) instruction swaps two operands and then stores the sum of the two operands in the destination operand. The status flags in the EFLAGS register indicate the result of the addition. This instruction can be combined with the LOCK prefix (refer to "LOCK - Assert LOCK# Signal Prefix" in Chapter 3, Instruction Set Reference, of the Intel Architecture Software Developer's Manual, Volume 2) in a multiprocessing system to allow multiple processors to execute one DO loop.
The CMPXCHG (compare and exchange) and CMPXCHG8B (compare and exchange 8 bytes) instructions are used to synchronize operations in systems that use multiple processors. The CMPXCHG instruction requires three operands: a source operand in a register, another source operand in the EAX register, and a destination operand. If the values contained in the destination operand and the EAX register are equal, the destination operand is replaced with the value of the other source operand (the value not in the EAX register). Otherwise, the original value of the destination operand is loaded in the EAX register. The status flags in the EFLAGS register reflect the result that would have been obtained by subtracting the destination operand from the value in the EAX register.
The CMPXCHG instruction is commonly used for testing and modifying semaphores. It checks to see if a semaphore is free. If the semaphore is free, it is marked allocated; otherwise, it gets the ID of the current owner. This is all done in one uninterruptible operation. In a single-processor system, the CMPXCHG instruction eliminates the need to switch to protection level 0 (to disable interrupts) before executing multiple instructions to test and modify a semaphore. For multiple-processor systems, CMPXCHG can be combined with the LOCK prefix to perform the compare and exchange operation atomically. (Refer to Section 7.1., Locked Atomic Operations, of the Intel Architecture Software Developer's Manual, Volume 3, for more information on atomic operations.)
The CMPXCHG8B instruction also requires three operands: a 64-bit value in EDX:EAX, a 64-bit value in ECX:EBX, and a destination operand in memory. The instruction compares the 64-bit value in the EDX:EAX registers with the destination operand. If they are equal, the 64-bit value in the ECX:EBX registers is stored in the destination operand. If the EDX:EAX registers and the destination are not equal, the destination is loaded in the EDX:EAX registers. The CMPXCHG8B instruction can be combined with the LOCK prefix to perform the operation atomically.
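The compare-and-exchange behavior and the semaphore use described above can be sketched in Python. The sketch models only the data flow of CMPXCHG; it is not atomic, and the dictionary used as memory, the convention that 0 means "free", and the names cmpxchg and try_acquire are all assumptions made for illustration.

```python
def cmpxchg(memory, addr, eax, src):
    """Model CMPXCHG with memory[addr] as the destination operand.

    Returns (new_eax, zf). memory is a dict standing in for real storage.
    """
    dest = memory[addr]
    if dest == eax:
        memory[addr] = src       # equal: destination receives the source value
        return eax, True         # ZF set, EAX unchanged
    return dest, False           # not equal: EAX receives the destination value


def try_acquire(memory, addr, owner_id):
    """Hypothetical semaphore: 0 means free, any other value is the owner's ID.

    Returns (acquired, current_owner).
    """
    eax, zf = cmpxchg(memory, addr, 0, owner_id)
    return zf, (owner_id if zf else eax)
```

A first call on a free semaphore marks it allocated to the caller; a second call by another owner fails and reports the ID of the current owner, as the text describes.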
6.3.2.
Stack Manipulation Instructions
The PUSH, POP, PUSHA (push all registers), and POPA (pop all registers) instructions move data to and from the stack. The PUSH instruction decrements the stack pointer (contained in the ESP register), then copies the source operand to the top of stack (refer to Figure 6-1). It operates on memory operands, immediate operands, and register operands (including segment registers). The PUSH instruction is commonly used to place parameters on the stack before calling a procedure. It can also be used to reserve space on the stack for temporary variables.
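(Diagram: stack locations n, n-4, and n-8, with the stack growing toward lower addresses. It shows the position of ESP before pushing a doubleword and, after the push, ESP pointing to the newly stored doubleword value.)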
Figure 6-1. Operation of the PUSH Instruction
The PUSHA instruction saves the contents of the eight general-purpose registers on the stack (refer to Figure 6-2). This instruction simplifies procedure calls by reducing the number of instructions required to save the contents of the general-purpose registers. The registers are pushed on the stack in the following order: EAX, ECX, EDX, EBX, the initial value of ESP before EAX was pushed, EBP, ESI, and EDI.
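(Diagram: stack locations n through n-36, with the stack growing toward lower addresses, shown before and after pushing the registers. After PUSHA, the stack holds, from higher to lower addresses, EAX, ECX, EDX, EBX, the old ESP, EBP, ESI, and EDI, and ESP points to the EDI entry.)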
Figure 6-2. Operation of the PUSHA Instruction
The POP instruction copies the word or doubleword at the current top of stack (indicated by the ESP register) to the location specified with the destination operand, and then increments the ESP register to point to the new top of stack (refer to Figure 6-3). The destination operand may specify a general-purpose register, a segment register, or a memory location.
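(Diagram: stack locations n, n-4, and n-8, with the stack growing toward lower addresses. Before the pop, ESP points to the doubleword value at the top of the stack; after the pop, ESP points to the new top of stack.)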
Figure 6-3. Operation of the POP Instruction
The POPA instruction reverses the effect of the PUSHA instruction. It pops the top eight words or doublewords from the top of the stack into the general-purpose registers, except for the ESP register (refer to Figure 6-4). If the operand-size attribute is 32, the doublewords on the stack are transferred to the registers in the following order: EDI, ESI, EBP, ignore doubleword, EBX, EDX, ECX, and EAX. The ESP register is restored by the action of popping the stack. If the operand-size attribute is 16, the words on the stack are transferred to the registers in the following order: DI, SI, BP, ignore word, BX, DX, CX, and AX.
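(Diagram: stack locations n through n-36, with the stack growing toward lower addresses, shown before and after popping the registers. Before POPA, the stack holds, from higher to lower addresses, the values for EAX, ECX, EDX, EBX, an ignored doubleword, EBP, ESI, and EDI, with ESP pointing to the EDI entry.)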
Figure 6-4. Operation of the POPA Instruction
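The stack behavior of PUSH, POP, PUSHA, and POPA with a 32-bit operand size can be modelled as below. This is a minimal sketch: the Stack32 class, the dictionary used as stack memory, and the register dictionaries are hypothetical constructs, and only the 32-bit case is covered.

```python
class Stack32:
    """Hypothetical model of a 32-bit stack; mem maps addresses to doublewords."""

    def __init__(self, esp):
        self.esp = esp
        self.mem = {}

    def push(self, value):
        self.esp -= 4                          # decrement ESP first
        self.mem[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.mem[self.esp]             # read the current top of stack
        self.esp += 4                          # then increment ESP
        return value


# Order in which PUSHA pushes the general-purpose registers.
PUSHA_ORDER = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def pusha(stack, regs):
    old_esp = stack.esp                        # ESP value before EAX is pushed
    for name in PUSHA_ORDER:
        stack.push(old_esp if name == "ESP" else regs[name])


def popa(stack):
    regs = {}
    for name in reversed(PUSHA_ORDER):         # EDI first, EAX last
        value = stack.pop()
        if name != "ESP":                      # the saved ESP doubleword is ignored
            regs[name] = value
    return regs
```

After pusha followed by popa, the register values and ESP are back where they started, which is why the pair is convenient around procedure bodies.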
6.3.2.1.
TYPE CONVERSION INSTRUCTIONS
The type conversion instructions convert bytes into words, words into doublewords, and doublewords into quadwords. These instructions are especially useful for converting integers to larger integer formats, because they perform sign extension (refer to Figure 6-5). Two kinds of type conversion instructions are provided: simple conversion and move and convert.
Before sign extension (bits 15-0):
S N N N N N N N N N N N N N N N
After sign extension (bits 31-0):
S S S S S S S S S S S S S S S S S N N N N N N N N N N N N N N N
Figure 6-5. Sign Extension
6.3.2.2.
SIMPLE CONVERSION
The CBW (convert byte to word), CWDE (convert word to doubleword extended), CWD (convert word to doubleword), and CDQ (convert doubleword to quadword) instructions perform sign extension to double the size of the source operand.
The CBW instruction copies the sign (bit 7) of the byte in the AL register into every bit position of the upper byte of the AX register. The CWDE instruction copies the sign (bit 15) of the word in the AX register into every bit position of the high word of the EAX register.
The CWD instruction copies the sign (bit 15) of the word in the AX register into every bit position in the DX register. The CDQ instruction copies the sign (bit 31) of the doubleword in the EAX register into every bit position in the EDX register. The CWD instruction can be used to produce a doubleword dividend from a word before a word division, and the CDQ instruction can be used to produce a quadword dividend from a doubleword before doubleword division.
6.3.2.3.
MOVE AND CONVERT
The MOVSX (move with sign extension) and MOVZX (move with zero extension) instructions move the source operand into a register and then perform sign extension or zero extension, respectively. The MOVSX instruction extends an 8-bit value to a 16-bit value, or an 8- or 16-bit value to a 32-bit value, by sign extending the source operand, as shown in Figure 6-5. The MOVZX instruction extends an 8-bit value to a 16-bit value, or an 8- or 16-bit value to a 32-bit value, by zero extending the source operand.
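The difference between sign extension and zero extension can be illustrated with a small Python model. The helper names sign_extend, zero_extend, and cdq are hypothetical, and values are treated as unsigned bit patterns, which is an assumption of this sketch.

```python
def sign_extend(value, from_bits, to_bits):
    """Sign-extend the low from_bits of value to to_bits (MOVSX, CBW, CWDE)."""
    low_mask = (1 << from_bits) - 1
    value &= low_mask
    if value & (1 << (from_bits - 1)):                   # sign bit is set
        value |= ((1 << to_bits) - 1) & ~low_mask        # fill upper bits with 1s
    return value


def zero_extend(value, from_bits):
    """Zero-extend the low from_bits of value (MOVZX): upper bits become 0."""
    return value & ((1 << from_bits) - 1)


def cdq(eax):
    """Return (EDX, EAX) after CDQ: EDX is filled with the sign bit of EAX."""
    eax &= 0xFFFFFFFF
    return (0xFFFFFFFF if eax & 0x80000000 else 0), eax
```

For the byte 80H, sign extension to a word gives FF80H while zero extension gives 0080H, so the choice depends on whether the value is signed.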
6.4.
BINARY ARITHMETIC INSTRUCTIONS
The binary arithmetic instructions operate on 8-, 16-, and 32-bit numeric data encoded as signed or unsigned binary integers. Operations include the add, subtract, multiply, and divide as well as increment, decrement, compare, and change sign (negate). The binary arithmetic instructions may also be used in algorithms that operate on decimal (BCD) values.
6.4.1.
Addition and Subtraction Instructions
The ADD (add integers), ADC (add integers with carry), SUB (subtract integers), and SBB (subtract integers with borrow) instructions perform addition and subtraction operations on signed or unsigned integer operands. The ADD instruction computes the sum of two integer operands. The ADC instruction computes the sum of two integer operands, plus 1 if the CF flag is set. This instruction is used to propagate a carry when adding numbers in stages. The SUB instruction computes the difference of two integer operands. The SBB instruction computes the difference of two integer operands, minus 1 if the CF flag is set. This instruction is used to propagate a borrow when subtracting numbers in stages.
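Carry and borrow propagation "in stages" can be sketched in Python by adding or subtracting a 64-bit value as two 32-bit halves. The function names and the (high, low) argument layout are assumptions made for this example; the CF flag is modelled as an integer 0 or 1.

```python
MASK32 = 0xFFFFFFFF


def add32(a, b, carry_in=0):
    """32-bit ADD (carry_in=0) or ADC (carry_in=CF). Returns (result, CF)."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32


def sub32(a, b, borrow_in=0):
    """32-bit SUB (borrow_in=0) or SBB (borrow_in=CF). Returns (result, CF)."""
    diff = (a & MASK32) - (b & MASK32) - borrow_in
    return diff & MASK32, 1 if diff < 0 else 0


def add64(a_hi, a_lo, b_hi, b_lo):
    """Add 64-bit values held as 32-bit halves: ADD the low halves, ADC the high."""
    lo, cf = add32(a_lo, b_lo)
    hi, cf = add32(a_hi, b_hi, cf)
    return hi, lo, cf


def sub64(a_hi, a_lo, b_hi, b_lo):
    """Subtract 64-bit values held as halves: SUB the low halves, SBB the high."""
    lo, cf = sub32(a_lo, b_lo)
    hi, cf = sub32(a_hi, b_hi, cf)
    return hi, lo, cf
```

The carry out of the low-half ADD becomes the carry in of the high-half ADC, which is the role the CF flag plays between the two instructions.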
6.4.2.
Increment and Decrement Instructions
The INC (increment) and DEC (decrement) instructions add 1 to or subtract 1 from an unsigned integer operand, respectively. A primary use of these instructions is for implementing counters.
6.4.3.
Comparison and Sign Change Instruction
The CMP (compare) instruction computes the difference between two integer operands and updates the OF, SF, ZF, AF, PF, and CF flags according to the result. The source operands are not modified, nor is the result saved. The CMP instruction is commonly used in conjunction with a Jcc (jump) or SETcc (byte set on condition) instruction, with the latter instructions performing an action based on the result of a CMP instruction.
The NEG (negate) instruction subtracts a signed integer operand from zero. The effect of the NEG instruction is to change the sign of a two's complement operand while keeping its magnitude.
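How CMP derives flags from a discarded subtraction, and how NEG forms a two's complement negation, can be sketched in Python. Only ZF, SF, CF, and OF are modelled here; AF and PF are left out for brevity. The function names and the dictionary of flags are assumptions of this sketch.

```python
def cmp_flags(a, b, bits=32):
    """Flags produced by CMP a, b: computed from a - b, result discarded."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "ZF": result == 0,
        "SF": bool(result & sign),
        "CF": a < b,                                  # unsigned borrow
        "OF": bool((a ^ b) & (a ^ result) & sign),    # signed overflow
    }


def neg(value, bits=32):
    """NEG: subtract the operand from zero (two's complement negation)."""
    return (0 - value) & ((1 << bits) - 1)
```

A following Jcc or SETcc would test combinations of these flags; for instance, "signed less than" corresponds to SF differing from OF.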
6.4.4.
Multiplication and Divide Instructions
The processor provides two multiply instructions, MUL (unsigned multiply) and IMUL (signed multiply), and two divide instructions, DIV (unsigned divide) and IDIV (signed divide).
The MUL instruction multiplies two unsigned integer operands. The result is computed to twice the size of the source operands (for example, if word operands are being multiplied, the result is a doubleword).
The IMUL instruction multiplies two signed integer operands. The result is computed to twice the size of the source operands; however, in some cases the result is truncated to the size of the source operands (refer to Chapter 3, Instruction Set Reference, of the Intel Architecture Software Developer's Manual, Volume 2).
The DIV instruction divides one unsigned operand by another unsigned operand and returns a quotient and a remainder. The IDIV instruction is identical to the DIV instruction, except that IDIV performs a signed division.
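The double-width product of MUL and the quotient/remainder pair of DIV can be modelled in Python for the doubleword case. The use of EDX:EAX for the 64-bit product and dividend comes from the Instruction Set Reference rather than from this section, and the function names and the choice of Python exceptions to stand in for the processor's divide error are assumptions of this sketch.

```python
MASK32 = 0xFFFFFFFF


def mul32(eax, operand):
    """Unsigned doubleword MUL: returns (EDX, EAX) holding the 64-bit product."""
    product = (eax & MASK32) * (operand & MASK32)
    return product >> 32, product & MASK32


def div32(edx, eax, divisor):
    """Unsigned doubleword DIV of EDX:EAX. Returns (quotient, remainder)."""
    divisor &= MASK32
    if divisor == 0:
        raise ZeroDivisionError("divide error: divisor is zero")
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    quotient, remainder = divmod(dividend, divisor)
    if quotient > MASK32:
        raise OverflowError("divide error: quotient does not fit in 32 bits")
    return quotient, remainder
```

Multiplying FFFFFFFFH by 2 gives a product that needs 33 bits, so the high doubleword is 1; dividing that 64-bit value by 2 recovers FFFFFFFFH with a remainder of 0.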
6.5.
DECIMAL ARITHMETIC INSTRUCTIONS
Decimal arithmetic can be performed by combining the binary arithmetic instructions ADD, SUB, MUL, and DIV (discussed in Section 6.4., Binary Arithmetic Instructions) with the decimal arithmetic instructions. The decimal arithmetic instructions are provided to carry out the following operations:
- To adjust the results of a previous binary arithmetic operation to produce a valid BCD result.
- To adjust the operands of a subsequent binary arithmetic operation so that the operation will produce a valid BCD result.
These instructions operate on both packed and unpacked BCD values.
6.5.1.
Packed BCD Adjustment Instructions
The DAA (decimal adjust after addition) and DAS (decimal adjust after subtraction) instructions adjust the results of operations performed on packed BCD integers (refer to Section 5.2.3., BCD Integers in Chapter 5, Data Types and Addressing Modes). Adding two packed BCD values requires two instructions: an ADD instruction followed by a DAA instruction. The ADD instruction adds (binary addition) the two values and stores the result in the AL register. The DAA instruction then adjusts the value in the AL register to obtain a valid, 2-digit, packed BCD value and sets the CF flag if a decimal carry occurred as the result of the addition. Likewise, subtracting one packed BCD value from another requires a SUB instruction followed by a DAS instruction. The SUB instruction subtracts (binary subtraction) one BCD value from another and stores the result in the AL register. The DAS instruction then adjusts the value in the AL register to obtain a valid, 2-digit, packed BCD value and sets the CF flag if a decimal borrow occurred as the result of the subtraction.
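The ADD-then-DAA sequence for two-digit packed BCD values can be sketched in Python. The adjustment rules used here (add 06H when the low digit exceeds 9 or a carry out of bit 3 occurred, add 60H when the binary sum exceeded 99H or a carry out of bit 7 occurred) follow the usual description of DAA in the Instruction Set Reference; the function name and return convention are assumptions.

```python
def add_packed_bcd(a, b):
    """ADD followed by DAA on two 2-digit packed BCD bytes. Returns (AL, CF)."""
    total = a + b                                # binary ADD
    al = total & 0xFF
    cf = total > 0xFF                            # carry out of bit 7
    af = ((a & 0x0F) + (b & 0x0F)) > 0x0F        # carry out of bit 3
    old_al, old_cf = al, cf
    if (al & 0x0F) > 9 or af:                    # low digit is not valid BCD
        al = (al + 0x06) & 0xFF
    if old_al > 0x99 or old_cf:                  # high digit needs adjusting
        al = (al + 0x60) & 0xFF
        cf = True                                # decimal carry occurred
    else:
        cf = False
    return al, cf
```

For example, adding the packed BCD values 79H and 35H gives the binary sum AEH, which DAA adjusts to 14H with CF set, representing the decimal result 114.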
6.5.2.
Unpacked BCD Adjustment Instructions
The AAA (ASCII adjust after addition), AAS (ASCII adjust after subtraction), AAM (ASCII adjust after multiplication), and AAD (ASCII adjust before division) instructions adjust the results of arithmetic operations performed on unpacked BCD values (refer to Section 5.2.3., BCD Integers in Chapter 5, Data Types and Addressing Modes). All these instructions assume that the value to be adjusted is stored in the AL register or, in one instance, the AL and AH registers.
The AAA instruction adjusts the contents of the AL register following the addition of two unpacked BCD values. It converts the binary value in the AL register into a decimal value and stores the result in the AL register in unpacked BCD format (the decimal number is stored in the lower 4 bits of the register and the upper 4 bits are cleared). If a decimal carry occurred as a result of the addition, the CF flag is set and the contents of the AH register are incremented by 1.
The AAS instruction adjusts the contents of the AL register following the subtraction of two unpacked BCD values. Here again, a binary value is converted into an unpacked BCD value. If a borrow was required to complete the decimal subtract, the CF flag is set and the contents of the AH register are decremented by 1.
The AAM instruction adjusts the contents of the AL register following a multiplication of two unpacked BCD values. It converts the binary value in the AL register into a decimal value and stores the least significant digit of the result in the AL register (in unpacked BCD format) and the most significant digit, if there is one, in the AH register (also in unpacked BCD format).
The AAD instruction adjusts a two-digit BCD value so that when the value is divided with the DIV instruction, a valid unpacked BCD result is obtained. The instruction converts the BCD value in registers AH (most significant digit) and AL (least significant digit) into a binary value and stores the result in register AL. When the value in AL is divided by an unpacked BCD value, the quotient and remainder will be automatically encoded in unpacked BCD format.
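The unpacked BCD adjustments can be modelled in Python as shown below. The function names and return conventions are assumptions; the af parameter stands for the auxiliary carry flag left by the preceding ADD, and clearing AH in aad reflects the Instruction Set Reference rather than the text above.

```python
def aaa(al, ah, af=False):
    """AAA: adjust AL after adding two unpacked BCD digits.

    Returns (AL, AH, CF). af is the auxiliary carry from the preceding ADD.
    """
    carry = (al & 0x0F) > 9 or af
    if carry:
        al += 6
        ah = (ah + 1) & 0xFF          # decimal carry goes into AH
    return al & 0x0F, ah, carry       # upper 4 bits of AL are cleared


def aam(al):
    """AAM: split the binary value in AL into two unpacked digits (AH, AL)."""
    return al // 10, al % 10


def aad(ah, al):
    """AAD: combine unpacked digits AH:AL into a binary value in AL; AH = 0."""
    return 0, (ah * 10 + al) & 0xFF
```

For instance, 9 times 7 leaves 63 (3FH) in AL, and aam(63) yields the digits 6 and 3; aad(6, 3) converts them back to the binary value 63 before a DIV.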
6.6.
LOGICAL INSTRUCTIONS
The logical instructions AND, OR, XOR (exclusive or), and NOT perform the standard Boolean operations for which they are named. The AND, OR, and XOR instructions require two operands; the NOT instruction operates on a single operand.
6.7.
SHIFT AND ROTATE INSTRUCTIONS
The shift and rotate instructions rearrange the bits within an operand. These instructions fall into the following classes:
- Shift.
- Double shift.
- Rotate.
6.7.1.
Shift Instructions
The SAL (shift arithmetic left), SHL (shift logical left), SAR (shift arithmetic right), and SHR (shift logical right) instructions perform an arithmetic or logical shift of the bits in a byte, word, or doubleword. The SAL and SHL instructions perform the same operation (refer to Figure 6-6). They shift the source operand left by 1 to 31 bit positions. Empty bit positions are cleared. The CF flag is loaded with the last bit shifted out of the operand.
Initial State                      CF = X   10001000 10001000 10001000 10001111
After 1-bit SHL/SAL Instruction    CF = 1   00010001 00010001 00010001 00011110
After 10-bit SHL/SAL Instruction   CF = 0   00100010 00100010 00111100 00000000
Figure 6-6. SHL/SAL Instruction Operation
The SHR instruction shifts the source operand right by 1 to 31 bit positions (refer to Figure 6-7). As with the SHL/SAL instruction, the empty bit positions are cleared and the CF flag is loaded with the last bit shifted out of the operand.
Initial state:            Operand = 1000 1000 1000 1000 1000 1000 1000 1111   CF = X
After 1-bit SHR:          Operand = 0100 0100 0100 0100 0100 0100 0100 0111   CF = 1
After 10-bit SHR:         Operand = 0000 0000 0010 0010 0010 0010 0010 0010   CF = 0
Figure 6-7. SHR Instruction Operation
The SAR instruction shifts the source operand right by 1 to 31 bit positions (refer to Figure 6-8). This instruction differs from the SHR instruction in that it preserves the sign of the source operand by clearing empty bit positions if the operand is positive or setting the empty bits if the operand is negative. Again, the CF flag is loaded with the last bit shifted out of the operand. The SAR and SHR instructions can also be used to perform division by powers of 2 (refer to Chapter 3, Instruction Set Reference of the Intel Architecture Software Developer's Manual, Volume 2).
Initial state (positive operand):   Operand = 0100 0100 0100 0100 0100 0100 0100 0111   CF = X
After 1-bit SAR:                    Operand = 0010 0010 0010 0010 0010 0010 0010 0011   CF = 1
Initial state (negative operand):   Operand = 1100 0100 0100 0100 0100 0100 0100 0111   CF = X
After 1-bit SAR:                    Operand = 1110 0010 0010 0010 0010 0010 0010 0011   CF = 1
Figure 6-8. SAR Instruction Operation
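The three shift behaviours shown in Figures 6-6 through 6-8 can be checked with a short Python model. This is a sketch under stated assumptions: the operand is a 32-bit doubleword held in a Python integer, the count is in the range 1 to 31, and the function names (shl, shr, sar) are illustrative only. Each function returns the result together with the value loaded into the CF flag.

```python
MASK32 = 0xFFFFFFFF

def shl(value, count):
    """SHL/SAL: shift left, clear vacated bits. Returns (result, CF)."""
    cf = (value >> (32 - count)) & 1          # last bit shifted out the top
    return (value << count) & MASK32, cf

def shr(value, count):
    """SHR: shift right, clear vacated bits. Returns (result, CF)."""
    cf = (value >> (count - 1)) & 1           # last bit shifted out the bottom
    return value >> count, cf

def sar(value, count):
    """SAR: shift right, fill vacated bits with the sign bit.
    Returns (result, CF)."""
    cf = (value >> (count - 1)) & 1
    result = value >> count
    if value >> 31:                           # negative operand: set the fill bits
        result |= (MASK32 << (32 - count)) & MASK32
    return result, cf

# Values taken from Figures 6-6, 6-7, and 6-8.
print(shl(0x8888888F, 1))    # result 0x1111111E, CF 1
print(shr(0x8888888F, 10))   # result 0x00222222, CF 0
print(sar(0xC4444447, 1))    # result 0xE2222223, CF 1
```

Python integers are unbounded, so the explicit mask is what stands in for the fixed 32-bit register width.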
6.7.2. Double-Shift Instructions
The SHLD (shift left double) and SHRD (shift right double) instructions shift a specified number of bits from one operand to another (refer to Figure 6-9). They are provided to facilitate operations on unaligned bit strings. They can also be used to implement a variety of bit string move operations.
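SHLD instruction: bits from the source (register, bits 31 through 0) enter the low end of the destination (memory or register, bits 31 through 0); the last bit shifted out of the high end of the destination is loaded into CF.
SHRD instruction: bits from the source (register) enter the high end of the destination (memory or register); the last bit shifted out of the low end of the destination is loaded into CF.
Figure 6-9. SHLD and SHRD Instruction Operations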
The SHLD instruction shifts the bits in the destination operand to the left and fills the empty bit positions (in the destination operand) with bits shifted out of the source operand. The destination and source operands must be the same length (either words or doublewords). The shift count can range from 0 to 31 bits. The result of this shift operation is stored in the destination operand, and the source operand is not modified. The CF flag is loaded with the last bit shifted out of the destination operand.
The SHRD instruction operates the same as the SHLD instruction except that bits are shifted to the right in the destination operand, with the empty bit positions filled with bits shifted out of the source operand.
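As a rough illustration of the double-shift behaviour, the following Python sketch models SHLD and SHRD on 32-bit operands. The function names and the convention of returning None for CF when the count is 0 (operand left unchanged) are assumptions of this model, not part of the instruction definitions.

```python
MASK32 = 0xFFFFFFFF

def shld(dest, src, count):
    """SHLD: shift dest left, filling from the high end of src.
    Returns (new_dest, CF); src itself is never modified."""
    if count == 0:
        return dest, None                     # nothing shifted in this model
    cf = (dest >> (32 - count)) & 1           # last bit shifted out of dest
    return ((dest << count) | (src >> (32 - count))) & MASK32, cf

def shrd(dest, src, count):
    """SHRD: shift dest right, filling from the low end of src.
    Returns (new_dest, CF)."""
    if count == 0:
        return dest, None
    cf = (dest >> (count - 1)) & 1
    return ((dest >> count) | (src << (32 - count))) & MASK32, cf

print(shld(0x12345678, 0x9ABCDEF0, 8))   # new dest 0x3456789A, CF 0
print(shrd(0x12345678, 0x9ABCDEF0, 8))   # new dest 0xF0123456, CF 0
```

Chaining these calls across adjacent doublewords is one way to shift a bit string that spans several memory locations.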
6.7.3. Rotate Instructions
The ROL (rotate left), ROR (rotate right), RCL (rotate through carry left) and RCR (rotate through carry right) instructions rotate the bits in the destination operand out of one end and back through the other end (refer to Figure 6-10). Unlike a shift, no bits are lost during a rotation. The rotate count can range from 0 to 31.
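ROL instruction: bits leaving bit 31 of the destination (memory or register) re-enter at bit 0 and are copied to CF.
ROR instruction: bits leaving bit 0 of the destination (memory or register) re-enter at bit 31 and are copied to CF.
RCL instruction: bits leaving bit 31 of the destination (memory or register) move into CF, and the previous CF value enters at bit 0.
RCR instruction: bits leaving bit 0 of the destination (memory or register) move into CF, and the previous CF value enters at bit 31.
Figure 6-10. ROL, ROR, RCL, and RCR Instruction Operations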
The ROL instruction rotates the bits in the operand to the left (toward more significant bit locations). The ROR instruction rotates the operand right (toward less significant bit locations). The RCL instruction rotates the bits in the operand to the left, through the CF flag. This instruction treats the CF flag as a one-bit extension on the upper end of the operand. Each bit that exits from the most significant bit location of the operand moves into the CF flag. At the same time, the bit in the CF flag enters the least significant bit location of the operand. The RCR instruction rotates the bits in the operand to the right through the CF flag. For all the rotate instructions, the CF flag always contains the value of the last bit rotated out of the operand, even if the instruction does not use the CF flag as an extension of the operand. The value of this flag can then be tested by a conditional jump instruction (JC or JNC).
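The difference between the plain rotates and the rotates through carry can be seen in a small Python model. It assumes a 32-bit operand and a count of 1 to 31; the function names are illustrative. RCL and RCR are written one bit at a time so that the CF flag visibly acts as a 33rd bit of the operand.

```python
MASK32 = 0xFFFFFFFF

def rol(value, count):
    """ROL: returns (result, CF). CF is the last bit rotated out of bit 31."""
    result = ((value << count) | (value >> (32 - count))) & MASK32
    return result, result & 1

def ror(value, count):
    """ROR: returns (result, CF). CF is the last bit rotated out of bit 0."""
    result = ((value >> count) | (value << (32 - count))) & MASK32
    return result, result >> 31

def rcl(value, cf, count):
    """RCL: rotate left through CF. Returns (result, CF)."""
    for _ in range(count):
        new_cf = value >> 31                      # bit leaving the top goes to CF
        value = ((value << 1) | cf) & MASK32      # old CF enters at bit 0
        cf = new_cf
    return value, cf

def rcr(value, cf, count):
    """RCR: rotate right through CF. Returns (result, CF)."""
    for _ in range(count):
        new_cf = value & 1                        # bit leaving the bottom goes to CF
        value = (value >> 1) | (cf << 31)         # old CF enters at bit 31
        cf = new_cf
    return value, cf

print(rol(0x80000001, 1))      # result 0x00000003, CF 1
print(rcl(0x80000000, 0, 2))   # result 0x00000001, CF 0
```

With RCL, the bit that left the operand on the first step comes back in on the second step, which is why the two-bit example ends with 1 in bit 0.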
6.8. BIT AND BYTE INSTRUCTIONS
The bit and byte instructions operate on bit or byte strings. They are divided into four groups:
• Bit test and modify instructions.
• Bit scan instructions.
• Byte set on condition.
• Test.
6.8.1. Bit Test and Modify Instructions
The bit test and modify instructions (refer to Table 6-3) operate on a single bit, which can be in an operand. The location of the bit is specified as an offset from the least significant bit of the operand. When the processor identifies the bit to be tested and modified, it first loads the CF flag with the current value of the bit. Then it assigns a new value to the selected bit, as determined by the modify operation for the instruction.
Table 6-3. Bit Test and Modify Instructions
Instruction                     Effect on CF Flag          Effect on Selected Bit
BT (Bit Test)                   CF flag ← Selected Bit     No effect
BTS (Bit Test and Set)          CF flag ← Selected Bit     Selected Bit ← 1
BTR (Bit Test and Reset)        CF flag ← Selected Bit     Selected Bit ← 0
BTC (Bit Test and Complement)   CF flag ← Selected Bit     Selected Bit ← NOT (Selected Bit)
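The rows of Table 6-3 translate directly into a few lines of Python. This is a minimal sketch that assumes a register operand held in an integer and a bit offset that lies within the operand; the function names simply mirror the mnemonics. Each function returns the new operand value and the value loaded into CF.

```python
def bt(operand, offset):
    """BT: CF <- selected bit; operand unchanged."""
    return operand, (operand >> offset) & 1

def bts(operand, offset):
    """BTS: CF <- selected bit; selected bit <- 1."""
    return operand | (1 << offset), (operand >> offset) & 1

def btr(operand, offset):
    """BTR: CF <- selected bit; selected bit <- 0."""
    return operand & ~(1 << offset), (operand >> offset) & 1

def btc(operand, offset):
    """BTC: CF <- selected bit; selected bit <- NOT selected bit."""
    return operand ^ (1 << offset), (operand >> offset) & 1

print(bts(0b0100, 0))   # (5, 0): bit 0 was clear, now set
print(btr(0b0100, 2))   # (0, 1): bit 2 was set, now clear
```

In every case CF receives the value the bit had before the modification, which is what makes these instructions useful for test-and-set style operations.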
6.8.2. Bit Scan Instructions
The BSF (bit scan forward) and BSR (bit scan reverse) instructions scan a bit string in a source operand for a set bit and store the bit index of the first set bit found in a destination register. The bit index is the offset from the least significant bit (bit 0) in the bit string to the first set bit. The BSF instruction scans the source operand low-to-high (from bit 0 of the source operand toward the most significant bit); the BSR instruction scans high-to-low (from the most significant bit toward the least significant bit).
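A Python sketch of the two scan directions follows. The function names are illustrative, and returning None for a source operand of zero is a modelling choice made here to represent the case in which no set bit is found.

```python
def bsf(source):
    """BSF: index of the lowest set bit, scanning from bit 0 upward."""
    if source == 0:
        return None                               # no set bit found
    return (source & -source).bit_length() - 1    # isolate the lowest set bit

def bsr(source):
    """BSR: index of the highest set bit, scanning from the top downward."""
    if source == 0:
        return None
    return source.bit_length() - 1

print(bsf(0b10100))   # 2
print(bsr(0b10100))   # 4
```

Both results are offsets from bit 0, regardless of the direction in which the scan runs.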
6.8.3. Byte Set on Condition Instructions
The SETcc (set byte on condition) instructions set a destination-operand byte to 0 or 1, depending on the state of selected status flags (CF, OF, SF, ZF, and PF) in the EFLAGS register. The suffix (cc) added to the SET mnemonic determines the condition being tested for. For example, the SETO instruction tests for overflow. If the OF flag is set, the destination byte is set to 1; if OF is clear, the destination byte is cleared to 0. Appendix B, EFLAGS Condition Codes lists the conditions it is possible to test for with this instruction.
6.8.4. Test Instruction
The TEST instruction performs a logical AND of two operands and sets the SF, ZF, and PF flags according to the results. The flags can then be tested by the conditional jump or loop instructions or the SETcc instructions. The TEST instruction differs from the AND instruction in that it does not alter either of the operands.
6.9. CONTROL TRANSFER INSTRUCTIONS
The processor provides both conditional and unconditional control transfer instructions to direct the flow of program execution. Conditional transfers are taken only for specified states of the status flags in the EFLAGS register. Unconditional control transfers are always executed.
6.9.1. Unconditional Transfer Instructions
The JMP, CALL, RET, INT, and IRET instructions transfer program control to another location (destination address) in the instruction stream. The destination can be within the same code segment (near transfer) or in a different code segment (far transfer).
6.9.1.1. JUMP INSTRUCTION
The JMP (jump) instruction unconditionally transfers program control to a destination instruction. The transfer is one-way; that is, a return address is not saved. A destination operand specifies the address (the instruction pointer) of the destination instruction. The address can be a relative address or an absolute address.
A relative address is a displacement (offset) with respect to the address in the EIP register. The destination address (a near pointer) is formed by adding the displacement to the address in the EIP register. The displacement is specified with a signed integer, allowing jumps either forward or backward in the instruction stream.
An absolute address is an offset from address 0 of a segment. It can be specified in either of the following ways:
• An address in a general-purpose register. This address is treated as a near pointer, which is copied into the EIP register. Program execution then continues at the new address within the current code segment.
• An address specified using the standard addressing modes of the processor. Here, the address can be a near pointer or a far pointer. If the address is for a near pointer, the address is translated into an offset and copied into the EIP register. If the address is for a far pointer, the address is translated into a segment selector (which is copied into the CS register) and an offset (which is copied into the EIP register).
In protected mode, the JMP instruction also allows jumps to a call gate, a task gate, and a task-state segment.
6.9.1.2. CALL AND RETURN INSTRUCTIONS
The CALL (call procedure) and RET (return from procedure) instructions allow a jump from one procedure (or subroutine) to another and a subsequent jump back (return) to the calling procedure.
The CALL instruction transfers program control from the current (or calling) procedure to another procedure (the called procedure). To allow a subsequent return to the calling procedure, the CALL instruction saves the current contents of the EIP register on the stack before jumping to the called procedure. The EIP register (prior to transferring program control) contains the address of the instruction following the CALL instruction. When this address is pushed on the stack, it is referred to as the return instruction pointer or return address. The address of the called procedure (the address of the first instruction in the procedure being jumped to) is specified in a CALL instruction the same way as it is in a JMP instruction (refer to Section 6.9.1.1., Jump Instruction). The address can be specified as a relative address or an absolute address. If an absolute address is specified, it can be either a near or a far pointer.
The RET instruction transfers program control from the procedure currently being executed (the called procedure) back to the procedure that called it (the calling procedure). Transfer of control is accomplished by copying the return instruction pointer from the stack into the EIP register. Program execution then continues with the instruction pointed to by the EIP register. The RET instruction has an optional operand, the value of which is added to the contents of the ESP register as part of the return operation. This operand allows the stack pointer to be incremented to remove parameters from the stack that were pushed on the stack by the calling procedure.
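The near call and return sequence described above can be followed in a toy Python model. Everything here is an assumption made for illustration: the Cpu class, its method names, the starting ESP value, and the addresses are hypothetical, and the stack is a dictionary of 4-byte slots rather than real memory. Far calls, which also save CS, are not modelled.

```python
class Cpu:
    """Toy model holding EIP, ESP, and a stack of 4-byte entries."""

    def __init__(self, esp=0x1000):
        self.eip = 0
        self.esp = esp
        self.stack = {}

    def push(self, value):
        self.esp -= 4                      # the stack grows toward lower addresses
        self.stack[self.esp] = value

    def pop(self):
        value = self.stack[self.esp]
        self.esp += 4
        return value

    def call(self, target, return_address):
        """Near CALL: save the return instruction pointer, then jump."""
        self.push(return_address)
        self.eip = target

    def ret(self, release_bytes=0):
        """Near RET: restore EIP, then add the optional operand to ESP."""
        self.eip = self.pop()
        self.esp += release_bytes


cpu = Cpu()
cpu.push(42)                 # calling procedure pushes one 4-byte parameter
cpu.call(0x400, 0x105)       # 0x105 is the instruction after the CALL
cpu.ret(4)                   # return and discard the parameter
print(hex(cpu.eip), hex(cpu.esp))   # 0x105 0x1000
```

After the return, ESP is back at its starting value because the RET operand removed the parameter that the caller pushed.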
Refer to Section 4.3., Calling Procedures Using CALL and RET in Chapter 4, Procedure Calls, Interrupts, and Exceptions for more information on the mechanics of making procedure calls with the CALL and RET instructions.
6.9.1.3. RETURN FROM INTERRUPT INSTRUCTION
When the processor services an interrupt, it performs an implicit call to an interrupt-handling procedure. The IRET (return from interrupt) instruction returns program control from an interrupt handler to the interrupted procedure (that is, the procedure that was executing when the interrupt occurred). The IRET instruction performs a similar operation to the RET instruction (refer to Section 6.9.1.2., Call and Return Instructions) except that it also restores the EFLAGS register from the stack. The contents of the EFLAGS register are automatically stored on the stack along with the return instruction pointer when the processor services an interrupt.
6.9.2. Conditional Transfer Instructions
The conditional transfer instructions execute jumps or loops that transfer program control to another instruction in the instruction stream if specified conditions are met. The conditions for control transfer are specified with a set of condition codes that define various states of the status flags (CF, ZF, OF, PF, and SF) in the EFLAGS register.
6.9.2.1. CONDITIONAL JUMP INSTRUCTIONS
The Jcc (conditional) jump instructions transfer program control to a destination instruction if the conditions specified with the condition code (cc) associated with the instruction are satisfied (refer to Table 6-4). If the condition is not satisfied, execution continues with the instruction following the Jcc instruction. As with the JMP instruction, the transfer is one-way; that is, a return address is not saved.
Table 6-4. Conditional Jump Instructions
Instruction Mnemonic    Condition (Flag States)     Description
Unsigned Conditional Jumps
JA/JNBE                 (CF or ZF)=0                Above/not below or equal
JAE/JNB                 CF=0                        Above or equal/not below
JB/JNAE                 CF=1                        Below/not above or equal
JBE/JNA                 (CF or ZF)=1                Below or equal/not above
JC                      CF=1                        Carry
JE/JZ                   ZF=1                        Equal/zero
JNC                     CF=0                        Not carry
JNE/JNZ                 ZF=0                        Not equal/not zero
JNP/JPO                 PF=0                        Not parity/parity odd
JP/JPE                  PF=1                        Parity/parity even
JCXZ                    CX=0                        Register CX is zero
JECXZ                   ECX=0                       Register ECX is zero
Signed Conditional Jumps
JG/JNLE                 ((SF xor OF) or ZF)=0       Greater/not less or equal
JGE/JNL                 (SF xor OF)=0               Greater or equal/not less
JL/JNGE                 (SF xor OF)=1               Less/not greater or equal
JLE/JNG                 ((SF xor OF) or ZF)=1       Less or equal/not greater
JNO                     OF=0                        Not overflow
JNS                     SF=0                        Not sign (non-negative)
JO                      OF=1                        Overflow
JS                      SF=1                        Sign (negative)
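The flag expressions in Table 6-4 can be evaluated with a short Python sketch. It is an illustration under assumptions: flags_after_cmp is a hypothetical helper that computes the status flags a subtraction-based comparison of two 32-bit values would produce, and CONDITIONS holds a subset of the table keyed by condition code. Only the flag formulas themselves come from the table.

```python
def flags_after_cmp(a, b, width=32):
    """Status flags produced by comparing a with b (computing a - b)."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    r = (a - b) & mask
    sign = width - 1
    return {
        "CF": int(a < b),                         # unsigned borrow
        "ZF": int(r == 0),
        "SF": r >> sign,
        "OF": ((a ^ b) & (a ^ r)) >> sign,        # signed overflow
    }

# Subset of Table 6-4, keyed by the condition code that follows "J".
CONDITIONS = {
    "A":  lambda f: (f["CF"] | f["ZF"]) == 0,
    "AE": lambda f: f["CF"] == 0,
    "B":  lambda f: f["CF"] == 1,
    "BE": lambda f: (f["CF"] | f["ZF"]) == 1,
    "E":  lambda f: f["ZF"] == 1,
    "NE": lambda f: f["ZF"] == 0,
    "G":  lambda f: ((f["SF"] ^ f["OF"]) | f["ZF"]) == 0,
    "GE": lambda f: (f["SF"] ^ f["OF"]) == 0,
    "L":  lambda f: (f["SF"] ^ f["OF"]) == 1,
    "LE": lambda f: ((f["SF"] ^ f["OF"]) | f["ZF"]) == 1,
}

def jcc_taken(cc, flags):
    """True if the jump with condition code cc would be taken."""
    return CONDITIONS[cc](flags)

flags = flags_after_cmp(0xFFFFFFFF, 1)
print(jcc_taken("A", flags))   # True: 4294967295 is above 1 when unsigned
print(jcc_taken("L", flags))   # True: -1 is less than 1 when signed
```

The same bit pattern satisfies both JA and JL, which is why the choice between the unsigned and signed groups depends on how the program interprets its operands.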
The destination operand specifies a relative address (a signed offset with respect to the address in the EIP register) that points to an instruction in the current code segment. The Jcc instructions do not support far transfers; however, far transfers can be accomplished with a combination of a Jcc and a JMP instruction (refer to "Jcc - Jump if Condition Is Met" in Chapter 3, Instruction Set Reference of the Intel Architecture Software Developer's Manual, Volume 2).
Table 6-4 shows the mnemonics for the Jcc instructions and the conditions being tested for each instruction. The condition code mnemonics are appended to the letter J to form the mnemonic for a Jcc instruction. The instructions are divided into two groups: unsigned and signed conditional jumps. These groups correspond to the results of operations performed on unsigned and signed integers, respectively. Those instructions listed as pairs (for example, JA/JNBE) are alternate names for the same instruction. The assembler provides these alternate names to make it easier to read program listings. The JCXZ and JECXZ instructions test the CX and ECX registers, respectively, instead of one or more status flags. Refer to Section 6.9.2.3., Jump If Zero Instructions for more information about these instructions.
6.9.2.2. LOOP INSTRUCTIONS
The LOOP, LOOPE (loop while equal), LOOPZ (loop while zero), LOOPNE (loop while not equal), and LOOPNZ (loop while not zero) instructions are conditional jump instructions that use the value of the ECX register as a count for the number of times to execute a loop. All the loop instructions decrement the count in the ECX register each time they are executed and terminate a loop when zero is reached. The LOOPE, LOOPZ, LOOPNE, and LOOPNZ instructions also accept the ZF flag as a condition for terminating the loop before the count reaches zero.
The LOOP instruction decrements the contents of the ECX register (or the CX register, if the address-size attribute is 16), then tests the register for the loop-termination condition. If the count in the ECX register is non-zero, program control is transferred to the instruction address specified by the destination operand. The destination operand is a relative address (that is, an offset relative to the contents of the EIP register), and it generally points to the first instruction in the block of code that is to be executed in the loop. When the count in the ECX register reaches zero, program control is transferred to the instruction immediately following the LOOP instruction, which terminates the loop. If the count in the ECX register is zero when the LOOP instruction is first executed, the register is pre-decremented to FFFFFFFFH, causing the loop to be executed 2^32 times.
The LOOPE and LOOPZ instructions perform the same operation (they are mnemonics for the same instruction). These instructions operate the same as the LOOP instruction, except that they also test the ZF flag. If the count in the ECX register is not zero and the ZF flag is set, program control is transferred to the destination operand. When the count reaches zero or the ZF flag is clear, the loop is terminated by transferring program control to the instruction immediately following the LOOPE/LOOPZ instruction.
The LOOPNE and LOOPNZ instructions (mnemonics for the same instruction) operate the same as the LOOPE/LOOPZ instructions, except that they terminate the loop if the ZF flag is set.
6.9.2.3. JUMP IF ZERO INSTRUCTIONS
The JECXZ (jump if ECX zero) instruction jumps to the location specified in the destination operand if the ECX register contains the value zero. This instruction can be used in combination with a loop instruction (LOOP, LOOPE, LOOPZ, LOOPNE, or LOOPNZ) to test the ECX register prior to beginning a loop. As described in Section 6.9.2.2., Loop Instructions, the loop instructions decrement the contents of the ECX register before testing for zero. If the value in the ECX register is zero initially, it will be decremented to FFFFFFFFH on the first loop instruction, causing the loop to be executed 2^32 times. To prevent this problem, a JECXZ instruction can be inserted at the beginning of the code block for the loop, causing a jump out of the loop if the ECX register count is initially zero.
When used with repeated string scan and compare instructions, the JECXZ instruction can determine whether the loop terminated because the count reached zero or because the scan or compare conditions were satisfied.
The JCXZ (jump if CX is zero) instruction operates the same as the JECXZ instruction when the 16-bit address-size attribute is used. Here, the CX register is tested for zero.
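The wraparound that JECXZ guards against can be seen in a small Python model of the LOOP instruction. The function names are illustrative, and the 32-bit ECX register is represented by a masked integer. The loop body is assumed to sit before the LOOP instruction, so it always runs at least once unless the guard skips it.

```python
MASK32 = 0xFFFFFFFF

def loop_step(ecx):
    """One execution of LOOP: decrement ECX, then test it.
    Returns (new_ecx, branch_taken)."""
    ecx = (ecx - 1) & MASK32          # 0 wraps around to FFFFFFFFH
    return ecx, ecx != 0

def body_executions(ecx, jecxz_guard=False):
    """Number of times the loop body runs for a given starting count."""
    if jecxz_guard and ecx == 0:
        return 0                      # JECXZ jumps past the loop entirely
    return ecx if ecx != 0 else 2**32

print(loop_step(0))                   # (4294967295, True): the loop keeps going
print(body_executions(0))             # 4294967296
print(body_executions(0, True))       # 0
```

For any non-zero starting count the guard makes no difference; it only changes the outcome when the count is zero on entry.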
6.9.3. Software Interrupts
The INT n (software interrupt), INTO (interrupt on overflow), and BOUND (detect value out of range) instructions allow a program to explicitly raise a specified interrupt or exception, which in turn causes the handler routine for the interrupt or exception to be called.
The INT n instruction can raise any of the processor's interrupts or exceptions by encoding the vector number of the interrupt or exception in the instruction. This instruction can be used to support software-generated interrupts or to test the operation of interrupt and exception handlers. The IRET instruction (refer to Section 6.9.1.3., Return From Interrupt Instruction) allows returns from interrupt handling routines.
The INTO instruction raises the overflow exception if the OF flag is set. If the flag is clear, execution continues without raising the exception. This instruction allows software to access the overflow exception handler explicitly to check for overflow conditions.
The BOUND instruction compares a signed value against upper and lower bounds, and raises the BOUND range exceeded exception if the value is less than the lower bound or greater than the upper bound. This instruction is useful for operations such as checking an array index to make sure it falls within the range defined for the array.
6.10. STRING OPERATIONS

The MOVS (move string), CMPS (compare string), SCAS (scan string), LODS (load string), and STOS (store string) instructions permit large data structures, such as alphanumeric character strings, to be moved and examined in memory. These instructions operate on individual elements in a string, which can be a byte, word, or doubleword. The string elements to be operated on are identified with the ESI (source string element) and EDI (destination string element) registers. Both of these registers contain absolute addresses (offsets into a segment) that point to a string element.
By default, the ESI register addresses the segment identified with the DS segment register. A segment-override prefix allows the ESI register to be associated with the CS, SS, ES, FS, or GS segment register. The EDI register addresses the segment identified with the ES segment register; no segment override is allowed for the EDI register. The use of two different segment registers in the string instructions permits operations to be performed on strings located in different segments. Alternatively, by associating the ESI register with the ES segment register, both the source and destination strings can be located in the same segment. (This latter condition can also be achieved by loading the DS and ES segment registers with the same segment selector and allowing the ESI register to default to the DS register.)
The MOVS instruction moves the string element addressed by the ESI register to the location addressed by the EDI register. The assembler recognizes three short forms of this instruction, which specify the size of the string to be moved: MOVSB (move byte string), MOVSW (move word string), and MOVSD (move doubleword string).
The CMPS instruction subtracts the destination string element from the source string element and updates the status flags (CF, ZF, OF, SF, PF, and AF) in the EFLAGS register according to the results. Neither string element is written back to memory. The assembler recognizes three short forms of the CMPS instruction: CMPSB (compare byte strings), CMPSW (compare word strings), and CMPSD (compare doubleword strings).
The SCAS instruction subtracts the destination string element from the contents of the EAX, AX, or AL register (depending on operand length) and updates the status flags according to the results. The string element and register contents are not modified. The following short forms of the SCAS instruction specify the operand length: SCASB (scan byte string), SCASW (scan word string), and SCASD (scan doubleword string).
The LODS instruction loads the source string element identified by the ESI register into the EAX register (for a doubleword string), the AX register (for a word string), or the AL register (for a byte string). The short forms for this instruction are LODSB (load byte string), LODSW (load word string), and LODSD (load doubleword string). This instruction is usually used in a loop, where other instructions process each element of the string after it is loaded into the target register.
The STOS instruction stores the source string element from the EAX (doubleword string), AX (word string), or AL (byte string) register into the memory location identified with the EDI register. The short forms for this instruction are STOSB (store byte string), STOSW (store word string), and STOSD (store doubleword string). This instruction is also normally used in a loop. Here a string is commonly loaded into the register with a LODS instruction, operated on by other instructions, and then stored again in memory with a STOS instruction. The I/O instructions (refer to Section 6.11., I/O Instructions) also perform operations on strings in memory.
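The load, process, and store pattern described for LODS and STOS can be sketched in Python. This is a model under assumptions: memory is a bytearray with a single flat address space (segments are ignored), the index registers step upward by one byte per element, and the function names and the uppercase conversion are chosen only for illustration.

```python
def lodsb(mem, esi):
    """LODSB: load the byte at ESI into AL and advance ESI.
    Returns (al, new_esi)."""
    return mem[esi], esi + 1

def stosb(mem, edi, al):
    """STOSB: store AL at EDI and advance EDI. Returns new_edi."""
    mem[edi] = al
    return edi + 1

def to_upper(mem, src, dst, count):
    """Copy count bytes from src to dst, converting a-z to A-Z on the way."""
    esi, edi = src, dst
    for _ in range(count):
        al, esi = lodsb(mem, esi)              # load the element
        if ord("a") <= al <= ord("z"):
            al -= 32                           # process it in the register
        edi = stosb(mem, edi, al)              # store it again
    return esi, edi

mem = bytearray(b"abc!" + bytes(4))
print(to_upper(mem, 0, 4, 4))   # (4, 8): both index registers advanced by 4
print(bytes(mem[4:8]))          # b'ABC!'
```

Because each element passes through the register, arbitrary processing can be placed between the load and the store.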
6.10.1. Repeating String Operations

The string instructions described in Section 6.10., String Operations perform one iteration of a string operation. To operate on strings longer than a doubleword, the string instructions can be combined with a repeat prefix (REP) to create a repeating instruction or be placed in a loop.
When used in string instructions, the ESI and EDI registers are automatically incremented or decremented after each iteration of an instruction to point to the next element (byte, word, or doubleword) in the string. String operations can thus begin at higher addresses and work toward lower ones, or they can begin at lower addresses and work toward higher ones. The DF flag in the EFLAGS register controls whether the registers are incremented (DF=0) or decremented (DF=1). The STD and CLD instructions set and clear this flag, respectively.
The following repeat prefixes can be used in conjunction with a count in the ECX register to cause a string instruction to repeat:
• REP: Repeat while the ECX register is not zero.
• REPE/REPZ: Repeat while the ECX register is not zero and the ZF flag is set.
• REPNE/REPNZ: Repeat while the ECX register is not zero and the ZF flag is clear.
When a string instruction has a repeat prefix, the operation executes until one of the termination conditions specified by the prefix is satisfied. The REPE/REPZ and REPNE/REPNZ prefixes are used only with the CMPS and SCAS instructions. Also, note that a REP STOS instruction is the fastest way to initialize a large block of memory.
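The termination conditions of the repeat prefixes can be followed in a Python model of two common combinations, REP MOVSB and REPNE SCASB. As before, this is a sketch: memory is a flat bytearray, the function names are hypothetical, and df stands for the DF flag (0 to increment, 1 to decrement). For REPNE SCASB the model reports ZF as 0 when the count is zero on entry, which is a simplification because no comparison is performed in that case.

```python
def rep_movsb(mem, esi, edi, ecx, df=0):
    """REP MOVSB: copy bytes until ECX reaches zero.
    Returns (esi, edi, ecx)."""
    step = -1 if df else 1
    while ecx != 0:
        mem[edi] = mem[esi]
        esi += step
        edi += step
        ecx -= 1
    return esi, edi, ecx

def repne_scasb(mem, edi, ecx, al, df=0):
    """REPNE SCASB: scan while ECX is not zero and the bytes differ.
    Returns (edi, ecx, zf)."""
    step = -1 if df else 1
    zf = 0
    while ecx != 0:
        zf = int(mem[edi] == al)      # the comparison sets ZF on a match
        edi += step
        ecx -= 1
        if zf:
            break                     # REPNE stops once ZF is set
    return edi, ecx, zf

mem = bytearray(b"hello\x00" + bytes(6))
print(repne_scasb(mem, 0, 12, 0))     # (6, 6, 1): terminator found at offset 5
print(rep_movsb(mem, 0, 6, 6))        # (6, 12, 0): six bytes copied
```

After the scan, EDI points one element past the match, so the string length is the starting count minus the remaining count minus one (here 12 - 6 - 1 = 5).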
6.11. I/O INSTRUCTIONS

The IN (input from port to register), INS (input from port to string), OUT (output from register to port), and OUTS (output string to port) instructions move data between the processor's I/O ports and either a register or memory.
The register I/O instructions (IN and OUT) move data between an I/O port and the EAX register (32-bit I/O), the AX register (16-bit I/O), or the AL (8-bit I/O) register. The I/O port being read or written to is specified with an immediate operand or an address in the DX register.
The block I/O instructions (INS and OUTS) move blocks of data (strings) between an I/O port and memory. These instructions operate similarly to the string instructions (refer to Section 6.10., String Operations). The ESI and EDI registers are used to specify string elements in memory and the repeat prefixes (REP) are used to repeat the instructions to implement block moves. The assembler recognizes the following alternate mnemonics for these instructions: INSB (input byte), INSW (input word), and INSD (input doubleword), and OUTSB (output byte), OUTSW (output word), and OUTSD (output doubleword). The INS and OUTS instructions use an address in the DX register to specify the I/O port to be read or written to.
6.12. ENTER AND LEAVE INSTRUCTIONS

The ENTER and LEAVE instructions provide machine-language support for procedure calls in block-structured languages, such as C and Pascal. These instructions and the call and return mechanism that they support are described in detail in Section 4.5., Procedure Calls for Block-Structured Languages in Chapter 4, Procedure Calls, Interrupts, and Exceptions.
6.13. EFLAGS INSTRUCTIONS

The EFLAGS instructions allow the state of selected flags in the EFLAGS register to be read or modified.
6.13.1. Carry and Direction Flag Instructions

The STC (set carry flag), CLC (clear carry flag), and CMC (complement carry flag) instructions allow the CF flag in the EFLAGS register to be modified directly. They are typically used to initialize the CF flag to a known state before an instruction that uses the flag in an operation is executed. They are also used in conjunction with the rotate-with-carry instructions (RCL and RCR).
The STD (set direction flag) and CLD (clear direction flag) instructions allow the DF flag in the EFLAGS register to be modified directly. The DF flag determines the direction in which index registers ESI and EDI are stepped when executing string processing instructions. If the DF flag is clear, the index registers are incremented after each iteration of a string instruction; if the DF flag is set, the registers are decremented.
6.13.2. Interrupt Flag Instructions

The STI (set interrupt flag) and CLI (clear interrupt flag) instructions allow the interrupt IF flag in the EFLAGS register to be modified directly. The IF flag controls the servicing of hardware-generated interrupts (those received at the processor's INTR pin). If the IF flag is set, the processor services hardware interrupts; if the IF flag is clear, hardware interrupts are masked.
6.13.3. EFLAGS Transfer Instructions

The EFLAGS transfer instructions allow groups of flags in the EFLAGS register to be copied to a register or memory or be loaded from a register or memory. The LAHF (load AH from flags) and SAHF (store AH into flags) instructions operate on five of the EFLAGS status flags (SF, ZF, AF, PF, and CF). The LAHF instruction copies the status flags to bits 7, 6, 4, 2, and 0 of the AH register, respectively. The contents of the remaining bits in the register (bits 5, 3, and 1) are undefined, and the contents of the EFLAGS register remain unchanged. The SAHF instruction copies bits 7, 6, 4, 2, and 0 from the AH register into the SF, ZF, AF, PF, and CF flags, respectively in the EFLAGS register. The PUSHF (push flags), PUSHFD (push flags double), POPF (pop flags), and POPFD (pop flags double) instructions copy the flags in the EFLAGS register to and from the stack. The PUSHF instruction pushes the lower word of the EFLAGS register onto the stack (refer to Figure 6-11). The PUSHFD instruction pushes the entire EFLAGS register onto the stack (with the RF and VM flags read as clear).
PUSHFD/POPFD operate on bits 31 through 0 of the EFLAGS register; PUSHF/POPF operate on bits 15 through 0.
Bit(s)   Flag
31-22    0 (reserved)
21       ID
20       VIP
19       VIF
18       AC
17       VM
16       RF
15       0 (reserved)
14       NT
13-12    IOPL
11       OF
10       DF
9        IF
8        TF
7        SF
6        ZF
5        0 (reserved)
4        AF
3        0 (reserved)
2        PF
1        1 (reserved)
0        CF
Figure 6-11. Flags Affected by the PUSHF, POPF, PUSHFD, and POPFD Instructions
The POPF instruction pops a word from the stack into the EFLAGS register. Only bits 11, 10, 8, 7, 6, 4, 2, and 0 of the EFLAGS register are affected with all uses of this instruction. If the current privilege level (CPL) of the current code segment is 0 (most privileged), the IOPL bits (bits 13 and 12) also are affected. If the I/O privilege level (IOPL) is greater than or equal to the CPL, numerically, the IF flag (bit 9) also is affected. The POPFD instruction pops a doubleword into the EFLAGS register. This instruction can change the state of the AC bit (bit 18) and the ID bit (bit 21), as well as the bits affected by a POPF instruction. The restrictions for changing the IOPL bits and the IF flag that were given for the POPF instruction also apply to the POPFD instruction.
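The bit positions quoted for LAHF, SAHF, and POPF can be turned into masks, as in the following Python sketch. The function and constant names are hypothetical. The model leaves bits 5, 3, and 1 of AH as zero in lahf purely for definiteness, and popf applies only the two privilege rules stated above; other mode-dependent behaviour is not modelled.

```python
AH_FLAG_MASK = 0xD5        # bits 7, 6, 4, 2, 0: SF, ZF, AF, PF, CF
POPF_ALWAYS = 0x0DD5       # bits 11, 10, 8, 7, 6, 4, 2, 0
IOPL_MASK = 0x3000         # bits 13 and 12
IF_MASK = 0x0200           # bit 9

def lahf(eflags):
    """LAHF: copy SF, ZF, AF, PF, and CF into AH. Returns AH."""
    return eflags & AH_FLAG_MASK

def sahf(eflags, ah):
    """SAHF: load SF, ZF, AF, PF, and CF from AH. Returns new EFLAGS."""
    return (eflags & ~AH_FLAG_MASK) | (ah & AH_FLAG_MASK)

def popf(eflags, word, cpl):
    """POPF: load the permitted flags from a word popped off the stack."""
    mask = POPF_ALWAYS
    if cpl == 0:
        mask |= IOPL_MASK                      # only CPL 0 may change IOPL
    if ((eflags >> 12) & 3) >= cpl:
        mask |= IF_MASK                        # IF changes only if IOPL >= CPL
    return (eflags & ~mask) | (word & mask)

print(hex(lahf(0xFFFF)))                 # 0xd5
print(hex(popf(0x0002, 0xFFFF, cpl=3)))  # 0xdd7: IOPL and IF left unchanged
print(hex(popf(0x0002, 0xFFFF, cpl=0)))  # 0x3fd7: IOPL and IF also loaded
```

The two popf calls use the same stack word; only the privilege level differs, which shows how the affected bit set widens at CPL 0.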
6.13.4. Interrupt Flag Instructions

The CLI (clear interrupt flag) and STI (set interrupt flag) instructions clear and set the interrupt flag (IF) in the EFLAGS register, respectively. Clearing the IF flag causes external interrupts to be ignored. The ability to execute these instructions depends on the operating mode of the processor and the current privilege level (CPL) of the program or task attempting to execute these instructions.
6.14. SEGMENT REGISTER INSTRUCTIONS

The processor provides a variety of instructions that address the segment registers of the processor directly. These instructions are only used when an operating system or executive is using the segmented or the real-address mode memory model.
6.14.1. Segment-Register Load and Store Instructions

The MOV instruction (introduced in Section 6.3.1., General-Purpose Data Movement Instructions) and the PUSH and POP instructions (introduced in Section 6.3.2., Stack Manipulation Instructions) can transfer 16-bit segment selectors to and from segment registers (DS, ES, FS, GS, and SS). The transfers are always made to or from a segment register and a general-purpose register or memory. Transfers between segment registers are not supported.
The POP and MOV instructions cannot place a value in the CS register. Only the far control-transfer versions of the JMP, CALL, and RET instructions (refer to Section 6.14.2., Far Control Transfer Instructions) affect the CS register directly.
6.14.2. Far Control Transfer Instructions

The JMP and CALL instructions (refer to Section 6.9., Control Transfer Instructions) both accept a far pointer as a source operand to transfer program control to a segment other than the segment currently being pointed to by the CS register. When a far call is made with the CALL instruction, the current values of the EIP and CS registers are both pushed on the stack. The RET instruction (refer to Section 6.9.1.2., Call and Return Instructions) can be used to execute a far return. Here, program control is transferred from a code segment that contains a called procedure back to the code segment that contained the calling procedure. The RET instruction restores the values of the CS and EIP registers for the calling procedure from the stack.
6.14.3. Software Interrupt Instructions

The software interrupt instructions INT, INTO, BOUND, and IRET (refer to Section 6.9.3., Software Interrupts) can also call and return from interrupt and exception handler procedures that are located in a code segment other than the current code segment. With these instructions, however, the switching of code segments is handled transparently from the application program.
6.14.4. Load Far Pointer Instructions

The load far pointer instructions LDS (load far pointer using DS), LES (load far pointer using ES), LFS (load far pointer using FS), LGS (load far pointer using GS), and LSS (load far pointer using SS) load a far pointer from memory into a segment register and a general-purpose register. The segment selector part of the far pointer is loaded into the selected segment register and the offset is loaded into the selected general-purpose register.
6.15. MISCELLANEOUS INSTRUCTIONS

The following instructions perform miscellaneous operations that are of interest to applications programmers.
6.15.1. Address Computation Instruction

The LEA (load effective address) instruction computes the effective address in memory (offset within a segment) of a source operand and places it in a general-purpose register. This instruction can interpret any of the Pentium Pro processor's addressing modes and can perform any indexing or scaling that may be needed. It is especially useful for initializing the ESI or EDI registers before the execution of string instructions or for initializing the EBX register before an XLAT instruction.
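The address arithmetic that LEA performs can be sketched in Python as a rough model. The function name and its parameters are hypothetical, chosen only for this illustration. The sketch assumes the usual 32-bit form of an effective address (base plus scaled index plus displacement, truncated to 32 bits) and does not model segmentation or 16-bit addressing.

```python
def effective_address(base=0, index=0, scale=1, displacement=0):
    """Model a 32-bit effective address: base + index*scale + displacement.

    Hypothetical helper for illustration only; the result is truncated to
    32 bits, as an offset held in a general-purpose register would be.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale factor must be 1, 2, 4, or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF


# Offset of element 3 in a table of 4-byte entries that starts at offset 0x1000.
addr = effective_address(base=0x1000, index=3, scale=4)
print(hex(addr))
```

Because the result is only an offset, the same computation can stand in for simple arithmetic on register values; no memory is read.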
6.15.2. Table Lookup Instructions

The XLAT and XLATB (table lookup) instructions replace the contents of the AL register with a byte read from a translation table in memory. The initial value in the AL register is interpreted as an unsigned index into the translation table. This index is added to the contents of the EBX register (which contains the base address of the table) to calculate the address of the table entry. These instructions are used for applications such as converting character codes from one alphabet into another (for example, an ASCII code could be used to look up its EBCDIC equivalent in a table).
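As an informal model of the lookup described above, the following Python sketch treats memory as a byte string, EBX as the offset of the table within it, and AL as an unsigned 8-bit index. The function name `xlat`, the memory layout, and the lowercase-to-uppercase table are assumptions made for this example; they are not part of the instruction definition.

```python
def xlat(al, memory, ebx):
    """Return the byte at memory[ebx + al], with AL as an unsigned index."""
    if not 0 <= al <= 0xFF:
        raise ValueError("AL must be an unsigned 8-bit value")
    return memory[ebx + al]


# Hypothetical 256-entry translation table: ASCII lowercase -> uppercase,
# every other code unchanged.
table = bytes(c - 32 if ord("a") <= c <= ord("z") else c for c in range(256))

# Place the table at offset 16 of a small "memory" image; EBX holds that offset.
memory = bytes(16) + table
ebx = 16

translated = bytes(xlat(c, memory, ebx) for c in b"fpu stack")
print(translated)
```

A real code-page conversion would use a table with the same shape, only with different contents.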
6.15.3. Processor Identification Instruction

The CPUID (processor identification) instruction provides information about the processor on which the instruction is executed. To obtain processor information, a value from 0 to 2 is loaded in the EAX register and then the CPUID instruction is executed. The resulting processor information is placed in the EAX, EBX, ECX, and EDX registers. Table 6-5 shows the information that is provided depending on the value initially entered in the EAX register. Refer to Section 11.1., Processor Identification in Chapter 11, Processor Identification and Feature Determination for detailed information on the output of the CPUID instruction.
Table 6-5. Information Provided by the CPUID Instruction
Initial EAX Value | Information Provided about the Processor
0 | Maximum CPUID input value. Vendor identification string (GenuineIntel).
1 | Version information (family ID, model ID, and stepping ID). Feature information (identifies the feature set for the processor model).
2 | Cache information (about the processor's internal cache memory).
6.15.4. No-Operation and Undefined Instructions

The NOP (no operation) instruction increments the EIP register to point at the next instruction, but affects nothing else. The UD2 (undefined) instruction generates an invalid opcode exception. Intel reserves the opcode for this instruction for this function. The instruction is provided to allow software to test an invalid opcode exception handler.
CHAPTER 7 FLOATING-POINT UNIT

The Intel Architecture (IA) Floating-Point Unit (FPU) provides high-performance floating-point processing capabilities. It supports the real, integer, and BCD-integer data types and the floating-point processing algorithms and exception handling architecture defined in the IEEE 754 and 854 Standards for Floating-Point Arithmetic. The FPU executes instructions from the processor's normal instruction stream and greatly improves the efficiency of IA processors in handling the types of high-precision floating-point processing operations commonly found in scientific, engineering, and business applications. This chapter describes the data types that the FPU operates on, the FPU's execution environment, and the FPU-specific instruction set. Detailed descriptions of the FPU instructions are given in Chapter 3, Instruction Set Reference, in the Intel Architecture Software Developer's Manual, Volume 2.
7.1. COMPATIBILITY AND EASE OF USE OF THE INTEL ARCHITECTURE FPU
The architecture of the IA FPU has evolved in parallel with the architecture of early IA processors. The first Intel Math Coprocessors (the Intel 8087, Intel 287, and Intel 387) were companion processors to the Intel 8086/8088, Intel 286, and Intel386 processors, respectively, and were designed to improve and extend the numeric processing capability of the IA. The Intel486 DX processor for the first time integrated the CPU and the FPU architectures on one chip. The Pentium processor's FPU offered the same architecture as the Intel486 DX processor's FPU, but with improved performance. The Pentium Pro processor's FPU further extended the floating-point processing capability of the IA family of processors and added several new instructions to improve processing throughput.
Throughout this evolution, compatibility among the various generations of FPUs and math coprocessors has been maintained. For example, the Pentium Pro processor's FPU is fully compatible with the Pentium and Intel486 DX processors' FPUs.
Each generation of the IA FPUs has been explicitly designed to deliver stable, accurate results when programmed using straightforward pencil and paper algorithms, bringing the functionality and power of accurate numeric computation into the hands of the general user. The IEEE 754 standard specifically addresses this issue, recognizing the fundamental importance of making numeric computations both easy and safe to use.
For example, some processors can overflow when two single-precision floating-point numbers are multiplied together and then divided by a third, even if the final result is a perfectly valid 32-bit number. The IA FPUs deliver the correctly rounded result. Other typical examples of undesirable machine behavior in straightforward calculations occur when computing financial rate of return, which involves the expression $(1 + i)^n$, or when solving for roots of a quadratic equation:
$$\frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
If a does not equal 0, the formula is numerically unstable when the roots are nearly coincident or when their magnitudes are wildly different. The formula is also vulnerable to spurious over/underflows when the coefficients a, b, and c are all very big or all very tiny. When single-precision (4-byte) floating-point coefficients are given as data and the formula is evaluated in the FPU's normal way, keeping all intermediate results in its stack, the FPU produces impeccable single-precision roots. This happens because, by default and with no effort on the programmer's part, the FPU evaluates all those sub-expressions with so much extra precision and range as to overwhelm almost any threat to numerical integrity. If double-precision data and results were at issue, a better formula would have to be used, and once again the FPU's default evaluation of that formula would provide substantially enhanced numerical integrity over mere double-precision evaluation.
On most machines, straightforward algorithms will not deliver consistently correct results (and will not indicate when they are incorrect). To obtain correct results on traditional machines under all conditions usually requires sophisticated numerical techniques that go beyond typical programming practice. General application programmers using straightforward algorithms will produce much more reliable programs using IA processors. This simple fact greatly reduces the software investment required to develop safe, accurate computation-based products.
Beyond traditional numeric support for scientific applications, the IA processors have built-in facilities for commercial computing. They can process decimal numbers of up to 18 digits without round-off errors, performing exact arithmetic on integers as large as $2^{64}$ (or $10^{18}$). Exact arithmetic is vital in accounting applications where rounding errors may introduce monetary losses that cannot be reconciled.
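The multiply-then-divide case mentioned earlier in this section can be illustrated with a small Python sketch. This is only an analogy: Python floats are IEEE double-precision values and stand in here for the wider intermediate format, whereas the FPU keeps intermediates in its 80-bit extended-real format. The helper `to_single` and the sample values are assumptions made for the example.

```python
import math
import struct


def to_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        # Finite, but too large for the single-precision format: overflow.
        return math.copysign(math.inf, x)


a = b = c = to_single(1.0e30)

# Every intermediate result is forced back to single precision: a*b overflows.
narrow = to_single(to_single(a * b) / c)

# Intermediates are kept in a wider format; only the final result is rounded.
wide = to_single(a * b / c)

print(narrow, wide)
```

With these inputs the product is far outside the single-precision range even though the final quotient fits comfortably, which is the situation the text describes.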
The Intel FPUs contain a number of optional numerical facilities that can be invoked by sophisticated users. These advanced features include directed rounding, gradual underflow, and programmed exception-handling facilities. These automatic exception-handling facilities permit a high degree of flexibility in numeric processing software, without burdening the programmer. While performing numeric calculations, the processor automatically detects exception conditions that can potentially damage a calculation (for example, $X \div 0$, or $\sqrt{X}$ when $X < 0$). By default, on-chip exception logic handles these exceptions so that a reasonable result is produced and execution may proceed without program interruption. Alternatively, the processor can invoke a software exception handler to provide special results whenever various types of exceptions are detected.
7.2. REAL NUMBERS AND FLOATING-POINT FORMATS
This section describes how real numbers are represented in floating-point format in the IA FPU. It also introduces terms such as normalized numbers, denormalized numbers, biased exponents, signed zeros, and NaNs. Readers who are already familiar with floating-point processing techniques and the IEEE standards may wish to skip this section.
7.2.1. Real Number System
As shown in Figure 7-1, the real-number system comprises the continuum of real numbers from minus infinity (−∞) to plus infinity (+∞).
[Figure: two number lines marked from -100 to +100. The upper line is the binary real number system; the lower line is the subset of binary real numbers that can be represented with the IEEE single-precision (32-bit) floating-point format. An inset labeled "Precision: 24 Binary Digits" marks a range with the note "Numbers within this range cannot be represented."]
Figure 7-1. Binary Real Number System
Because the size and number of registers that any computer can have is limited, only a subset of the real-number continuum can be used in real-number calculations. As shown at the bottom of Figure 7-1, the subset of real numbers that a particular FPU supports represents an approximation of the real number system. The range and precision of this real-number subset are determined by the format that the FPU uses to represent real numbers.
7.2.2. Floating-Point Format
To increase the speed and efficiency of real-number computations, computers or FPUs typically represent real numbers in a binary floating-point format. In this format, a real number has three parts: a sign, a significand, and an exponent. Figure 7-2 shows the binary floating-point format that the IA FPU uses. This format conforms to the IEEE standard. The sign is a binary value that indicates whether the number is positive (0) or negative (1). The significand has two parts: a 1-bit binary integer (also referred to as the J-bit) and a binary fraction. The J-bit is often not represented, but instead is an implied value. The exponent is a binary integer that represents the power of 2 by which the significand is multiplied.
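[Figure: the three fields of the format, in order: sign, exponent, significand. The significand consists of the integer (or J-bit) followed by the fraction.]
Figure 7-2. Binary Floating-Point Format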
Table 7-1 shows how the real number 178.125 (in ordinary decimal format) is stored in floating-point format. The table lists a progression of real number notations that leads to the single-real, 32-bit floating-point format (which is one of the floating-point formats that the FPU supports). In this format, the significand is normalized (refer to Section 7.2.2.1., Normalized Numbers) and the exponent is biased (refer to Section 7.2.2.2., Biased Exponent). For the single-real format, the biasing constant is +127.
7.2.2.1. NORMALIZED NUMBERS
In most cases, the FPU represents real numbers in normalized form. This means that except for zero, the significand is always made up of an integer of 1 and the following fraction:
1.fff...ff
For values less than 1, leading zeros are eliminated. (For each leading zero eliminated, the exponent is decremented by one.)
Table 7-1. Real Number Notation
Notation | Value
Ordinary Decimal | 178.125
Scientific Decimal | 1.78125E₁₀2
Scientific Binary | 1.0110010001E₂111
Scientific Binary (Biased Exponent) | 1.0110010001E₂10000110
Single-Real Format | Sign: 0; Biased Exponent: 10000110; Normalized Significand: 01100100010000000000000, preceded by 1. (Implied)
Representing numbers in normalized form maximizes the number of significant digits that can be accommodated in a significand of a given width. To summarize, a normalized real number consists of a normalized significand that represents a real number between 1 and 2 and an exponent that specifies the number's binary point.
7.2.2.2. BIASED EXPONENT
The FPU represents exponents in a biased form. This means that a constant is added to the actual exponent so that the biased exponent is always a positive number. The value of the biasing constant depends on the number of bits available for representing exponents in the floating-point format being used. The biasing constant is chosen so that the smallest normalized number can be reciprocated without overflow. (Refer to Section 7.4.1., Real Numbers for a list of the biasing constants that the FPU uses for the various sizes of real data-types.)
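The single-real encoding of 178.125 shown in Table 7-1 can be checked with a short Python sketch. The function name `single_real_fields` is hypothetical; the field widths (1 sign bit, 8 exponent bits, 23 fraction bits) and the bias of +127 are those of the single-real format described above.

```python
import struct


def single_real_fields(value):
    """Split a value's single-real (32-bit) encoding into its three fields."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31
    biased_exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF          # the integer bit (1.) is implied
    return sign, biased_exponent, fraction


sign, biased_exponent, fraction = single_real_fields(178.125)
print(sign, format(biased_exponent, "08b"), format(fraction, "023b"))
# prints: 0 10000110 01100100010000000000000

# Removing the bias recovers the true exponent, 7 (binary 111).
print(biased_exponent - 127)
```

Rebuilding the value as (1 + fraction / 2^23) x 2^(biased exponent - 127) gives 178.125 again, since this number is exactly representable.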
7.2.3. Real Number and Non-number Encodings
A variety of real numbers and special values can be encoded in the FPU's floating-point format. These numbers and values are generally divided into the following classes:
• Signed zeros.
• Denormalized finite numbers.
• Normalized finite numbers.
• Signed infinities.
• NaNs.
• Indefinite numbers.
(The term NaN stands for Not a Number.)
Figure 7-3 shows how the encodings for these numbers and non-numbers fit into the real number continuum. The encodings shown here are for the IEEE single-precision (32-bit) format, where the term S indicates the sign bit, E the biased exponent, and F the fraction. (The exponent values are given in decimal.) The FPU can operate on and/or return any of these values, depending on the type of computation being performed. The following sections describe these number and non-number classes.
7.2.3.1. SIGNED ZEROS
Zero can be represented as a +0 or a −0 depending on the sign bit. Both encodings are equal in value. The sign of a zero result depends on the operation being performed and the rounding mode being used. Signed zeros have been provided to aid in implementing interval arithmetic. The sign of a zero may indicate the direction from which underflow occurred, or it may indicate the sign of an ∞ that has been reciprocated.
7.2.3.2. NORMALIZED AND DENORMALIZED FINITE NUMBERS
Non-zero, finite numbers are divided into two classes: normalized and denormalized. The normalized finite numbers comprise all the non-zero finite values that can be encoded in a normalized real number format between zero and ∞. In the single-real format shown in Figure 7-3, this group of numbers includes all the numbers with biased exponents ranging from 1 to 254₁₀ (unbiased, the exponent range is from −126₁₀ to +127₁₀).
[Figure: the real number line, running from −∞ through −Normalized Finite, −Denormalized Finite, −0, +0, +Denormalized Finite, and +Normalized Finite to +∞, with the NaN encodings shown above both ends of the line.]
Real Number and NaN Encodings For 32-Bit Floating-Point Format
Class | S | E | F
−0 | 1 | 0 | 0
+0 | 0 | 0 | 0
−Denormalized Finite | 1 | 0 | 0.XXX (note 2)
+Denormalized Finite | 0 | 0 | 0.XXX (note 2)
−Normalized Finite | 1 | 1...254 | Any Value
+Normalized Finite | 0 | 1...254 | Any Value
−∞ | 1 | 255 | 0
+∞ | 0 | 255 | 0
−SNaN, +SNaN | X (note 1) | 255 | 1.0XX (note 2)
−QNaN, +QNaN | X (note 1) | 255 | 1.1XX
NOTES: 1. Sign bit ignored. 2. Fractions must be non-zero.
Figure 7-3. Real Numbers and NaNs
When real numbers become very close to zero, the normalized-number format can no longer be used to represent the numbers. This is because the range of the exponent is not large enough to compensate for shifting the binary point to the right to eliminate leading zeros. When the biased exponent is zero, smaller numbers can only be represented by making the integer bit (and perhaps other leading bits) of the significand zero. The numbers in this range are called denormalized (or tiny) numbers. The use of leading zeros with denormalized numbers allows smaller numbers to be represented. However, this denormalization causes a loss of precision (the number of significant bits in the fraction is reduced by the leading zeros).
When performing normalized floating-point computations, an FPU normally operates on normalized numbers and produces normalized numbers as results. Denormalized numbers represent an underflow condition. A denormalized number is computed through a technique called gradual underflow. Table 7-2 gives an example of gradual underflow in the denormalization process. Here the single-real format is being used, so the minimum exponent (unbiased) is −126₁₀. The true result in this example requires an exponent of −129₁₀ in order to have a normalized number. Since −129₁₀ is beyond the allowable exponent range, the result is denormalized by inserting leading zeros until the minimum exponent of −126₁₀ is reached.
Table 7-2. Denormalization Process
Operation | Sign | Exponent* | Significand
True Result | 0 | −129 | 1.01011100000...00
Denormalize | 0 | −128 | 0.10101110000...00
Denormalize | 0 | −127 | 0.01010111000...00
Denormalize | 0 | −126 | 0.00101011100...00
Denormal Result | 0 | −126 | 0.00101011100...00
NOTE: * Expressed as an unbiased, decimal number.
In the extreme case, all the significant bits are shifted out to the right by leading zeros, creating a zero result. The FPU deals with denormal values in the following ways:
• It avoids creating denormals by normalizing numbers whenever possible.
• It provides the floating-point underflow exception to permit programmers to detect cases when denormals are created.
• It provides the floating-point denormal operand exception to permit procedures or programs to detect when denormals are being used as source operands for computations.
When a denormal number in single- or double-real format is used as a source operand and the denormal exception is masked, the FPU automatically normalizes the number when it is converted to extended-real format.
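The denormalization steps of Table 7-2 can be traced with a small Python sketch. It is a simplified model written for this illustration: the function names are hypothetical, the significand is held as an integer with one integer bit and 23 fraction bits, and bits shifted out on the right are simply discarded, so rounding is not modeled.

```python
MIN_EXPONENT = -126   # minimum unbiased exponent of the single-real format
FRACTION_BITS = 23


def denormalize(significand_bits, exponent):
    """Insert leading zeros until the exponent reaches MIN_EXPONENT.

    Returns the list of (exponent, significand_bits) pairs, one per step.
    """
    steps = [(exponent, significand_bits)]
    while exponent < MIN_EXPONENT:
        significand_bits >>= 1   # one more leading zero; the low bit is lost
        exponent += 1
        steps.append((exponent, significand_bits))
    return steps


def as_binary(significand_bits):
    """Format the significand as integer-bit '.' fraction-bits."""
    text = format(significand_bits, "0{}b".format(FRACTION_BITS + 1))
    return text[0] + "." + text[1:]


# True result from Table 7-2: significand 1.010111000...0, exponent -129.
true_significand = 0b1010111 << (FRACTION_BITS - 6)
steps = denormalize(true_significand, -129)
for exponent, significand in steps:
    print(exponent, as_binary(significand))
```

Starting from a significand whose only set bit is the lowest one shows the extreme case: a single shift leaves a zero result.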
7.2.3.3. SIGNED INFINITIES
The two infinities, +∞ and −∞, represent the maximum positive and negative real numbers, respectively, that can be represented in the floating-point format. Infinity is always represented by a zero significand (fraction and integer bit) and the maximum biased exponent allowed in the specified format (for example, 255₁₀ for the single-real format). The signs of infinities are observed, and comparisons are possible. Infinities are always interpreted in the affine sense; that is, −∞ is less than any finite number and +∞ is greater than any finite number. Arithmetic on infinities is always exact. Exceptions are generated only when the use of an infinity as a source operand constitutes an invalid operation. Whereas denormalized numbers represent an underflow condition, the two infinity numbers represent the result of an overflow condition. Here, the normalized result of a computation has a biased exponent greater than the largest allowable exponent for the selected result format.
7.2.3.4. NANS
Since NaNs are non-numbers, they are not part of the real number line. In Figure 7-3, the encoding space for NaNs in the FPU floating-point formats is shown above the ends of the real number line. This space includes any value with the maximum allowable biased exponent and a non-zero fraction. (The sign bit is ignored for NaNs.) The IEEE standard defines two classes of NaN: quiet NaNs (QNaNs) and signaling NaNs (SNaNs). A QNaN is a NaN with the most significant fraction bit set; an SNaN is a NaN with the most significant fraction bit clear. QNaNs are allowed to propagate through most arithmetic operations without signaling an exception. SNaNs generally signal an invalid operation exception whenever they appear as operands in arithmetic operations. Exceptions are discussed in Section 7.7., Floating-Point Exception Handling. Refer to Section 7.6., Operating on NaNs, for detailed information on how the FPU handles NaNs.
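The encoding classes described in Sections 7.2.3.1 through 7.2.3.4 can be summarized, for the single-real format, by a short Python sketch. The function names and the class labels it returns are hypothetical; the tests on the exponent and fraction fields follow the rules stated above (biased exponent 0 for zeros and denormals, 255 for infinities and NaNs, and the most significant fraction bit distinguishing QNaNs from SNaNs).

```python
import struct


def bits_of(value):
    """Return the 32-bit single-real encoding of a Python float."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


def classify_single(bits):
    """Name the class of a 32-bit single-real encoding."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    prefix = "-" if sign else "+"
    if exponent == 0:
        return prefix + ("zero" if fraction == 0 else "denormal")
    if exponent == 255:
        if fraction == 0:
            return prefix + "infinity"
        # The sign bit is ignored for NaNs; the top fraction bit selects the kind.
        return "QNaN" if fraction & 0x400000 else "SNaN"
    return prefix + "normal"


for pattern in (0x80000000, 0x00000001, 0x7F800000, 0x7F800001, 0x7FC00000):
    print(format(pattern, "08X"), classify_single(pattern))
print(classify_single(bits_of(178.125)))
```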
7.2.4. Indefinite
For each FPU data type, one unique encoding is reserved for representing the special value indefinite. For example, when operating on real values, the real indefinite value is a QNaN (refer to Section 7.4.1., Real Numbers). The FPU produces indefinite values as responses to masked floating-point exceptions.
7.3. FPU ARCHITECTURE
From an abstract, architectural view, the FPU is a coprocessor that operates in parallel with the processor's integer unit (refer to Figure 7-4). The FPU gets its instructions from the same instruction decoder and sequencer as the integer unit and shares the system bus with the integer unit. Other than these connections, the integer unit and FPU operate independently and in parallel. (The actual microarchitecture of an IA processor varies among the various families of processors. For example, the Pentium Pro processor has two integer units and two FPUs; whereas, the Pentium processor has two integer units and one FPU, and the Intel486 processor has one integer unit and one FPU.)
[Figure: the instruction decoder and sequencer feeds both the integer unit and the FPU, and both units connect to the data bus.]
Figure 7-4. Relationship Between the Integer Unit and the FPU
The instruction execution environment of the FPU (refer to Figure 7-5) consists of 8 data registers (called the FPU data registers) and the following special-purpose registers:
• The status register.
• The control register.
• The tag word register.
• Instruction pointer register.
• Last operand (data pointer) register.
• Opcode register.
These registers are described in the following sections.
7.3.1. FPU Data Registers
The FPU data registers (shown in Figure 7-5) consist of eight 80-bit registers. Values are stored in these registers in the extended-real format shown in Figure 7-17. When real, integer, or packed BCD integer values (in any of the formats shown in Figure 7-17) are loaded from memory into any of the FPU data registers, the values are automatically converted into extended-real format (if they are not already in that format). When computation results are subsequently transferred back into memory from any of the FPU registers, the results can be left in the extended-real format or converted back into one of the other FPU formats (real, integer, or packed BCD integers) shown in Figure 7-17.
The FPU instructions treat the eight FPU data registers as a register stack (refer to Figure 7-6). All addressing of the data registers is relative to the register on the top of the stack. The register number of the current top-of-stack register is stored in the TOP (stack TOP) field in the FPU status word. Load operations decrement TOP by one and load a value into the new top-of-stack register, and store operations store the value from the current TOP register in memory and then increment TOP by one. (For the FPU, a load operation is equivalent to a push and a store operation is equivalent to a pop.)
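[Figure: the eight 80-bit FPU data registers R7 through R0, each holding a sign (bit 79), an exponent (bits 78 through 64), and a significand (bits 63 through 0); the 16-bit control, status, and tag registers (bits 15 through 0); the 48-bit FPU instruction pointer and FPU operand (data) pointer (bits 47 through 0); and the 11-bit opcode register (bits 10 through 0).]
Figure 7-5. FPU Execution Environment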
If a load operation is performed when TOP is at 0, register wraparound occurs and the new value of TOP is set to 7. The floating-point stack-overflow exception indicates when wraparound might cause an unsaved value to be overwritten (refer to Section 7.8.1.1., Stack Overflow or Underflow Exception (#IS)).
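[Figure: the FPU data register stack, registers 7 through 0, with the stack growing toward lower-numbered registers. In the example shown, TOP = 011B, so ST(0) is register 3, ST(1) is register 4, and ST(2) is register 5.]
Figure 7-6. FPU Data Register Stack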
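The TOP arithmetic described in this section, including the wraparound from 0 to 7, can be modeled with a few lines of Python. The class `FpuStack` and its method names are hypothetical and exist only for this sketch; the initial TOP value of 0 is an assumption, and the stack overflow and underflow exceptions are not modeled.

```python
class FpuStack:
    """Toy model of the eight FPU data registers addressed through TOP."""

    def __init__(self):
        self.regs = [None] * 8   # R0..R7; None marks an empty register
        self.top = 0             # assumed initial value for this sketch

    def load(self, value):
        """Push: decrement TOP (0 wraps to 7), then fill the new top register."""
        self.top = (self.top - 1) & 7
        self.regs[self.top] = value

    def store_and_pop(self):
        """Pop: read the top register, then increment TOP (7 wraps to 0)."""
        value = self.regs[self.top]
        self.regs[self.top] = None
        self.top = (self.top + 1) & 7
        return value

    def st(self, i=0):
        """Return ST(i), the i-th register counted from TOP."""
        return self.regs[(self.top + i) & 7]


stack = FpuStack()
stack.load(1.5)          # TOP wraps from 0 to 7
stack.load(2.5)          # TOP is now 6
print(stack.top, stack.st(0), stack.st(1))
```

The same modulo-8 addressing gives the example in the figure: with TOP equal to 3, ST(2) refers to register 5.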
Many floating-point instructions have several addressing modes that permit the programmer to implicitly operate on the top of the stack, or to explicitly operate on specific registers relative to the TOP. Assemblers support these register addressing modes, using the expression ST(0), or simply ST, to represent the current stack top and ST(i) to specify the ith register from TOP in the stack (0 ≤ i ≤ 7). For example, if TOP contains 011B (register 3 is the top of the stack), the following instruction would add the contents of two registers in the stack (registers 3 and 5):
FADD ST, ST(2);
Figure 7-7 shows an example of how the stack structure of the FPU registers and instructions are typically used to perform a series of computations. Here, a two-dimensional dot product is computed, as follows:
1. The first instruction (FLD value1) decrements the stack register pointer (TOP) and loads the value 5.6 from memory into ST(0). The result of this operation is shown in snap-shot (a).
2. The second instruction multiplies the value in ST(0) by the value 2.4 from memory and stores the result in ST(0), shown in snap-shot (b).
3. The third instruction decrements TOP and loads the value 3.8 in ST(0).
4. The fourth instruction multiplies the value in ST(0) by the value 10.3 from memory and stores the result in ST(0), shown in snap-shot (c).
5. The fifth instruction adds the value in ST(0) and the value in ST(1) and stores the result in ST(0), shown in snap-shot (d).
The style of programming demonstrated in this example is supported by the floating-point instruction set. In cases where the stack structure causes computation bottlenecks, the FXCH (exchange FPU register contents) instruction can be used to streamline a computation.
7.3.1.1. PARAMETER PASSING WITH THE FPU REGISTER STACK
Like the general-purpose registers in the processor's integer unit, the contents of the FPU data registers are unaffected by procedure calls, or in other words, the values are maintained across procedure boundaries. A calling procedure can thus use the FPU data registers (as well as the procedure stack) for passing parameters between procedures. The called procedure can reference parameters passed through the register stack using the current stack register pointer (TOP) and the ST(0) and ST(i) nomenclature. It is also common practice for a called procedure to leave a return value or result in register ST(0) when returning execution to the calling procedure or program.
Computation
Dot Product = (5.6 x 2.4) + (3.8 x 10.3)
Code:
FLD value1 ;(a) value1=5.6
FMUL value2 ;(b) value2=2.4
FLD value3 ; value3=3.8
FMUL value4 ;(c) value4=10.3
FADD ST(1) ;(d)
Register stack snapshots (registers R7 through R0):
(a) ST(0) = 5.6
(b) ST(0) = 13.44
(c) ST(1) = 13.44, ST(0) = 39.14
(d) ST(1) = 13.44, ST(0) = 52.58
Figure 7-7. Example FPU Dot Product Computation
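The instruction sequence of Figure 7-7 can be followed step by step with a Python sketch in which a plain list plays the role of the register stack, with element 0 standing for ST(0). The helper names `fld`, `fmul`, and `fadd_st` are hypothetical stand-ins for the corresponding instructions, and Python's double-precision arithmetic is used in place of the FPU's extended-real format, so the printed values are rounded for display.

```python
stack = []   # stack[0] plays the role of ST(0), stack[1] of ST(1), and so on


def fld(value):
    """Push a value from 'memory' onto the stack."""
    stack.insert(0, value)


def fmul(value):
    """Multiply ST(0) by a value from 'memory'; the result stays in ST(0)."""
    stack[0] *= value


def fadd_st(i):
    """Add ST(i) to ST(0); the result stays in ST(0)."""
    stack[0] += stack[i]


fld(5.6)      # (a) ST(0) = 5.6
fmul(2.4)     # (b) ST(0) = 5.6 * 2.4
fld(3.8)      #     ST(0) = 3.8, previous product moves to ST(1)
fmul(10.3)    # (c) ST(0) = 3.8 * 10.3
fadd_st(1)    # (d) ST(0) = sum of the two products

print(round(stack[0], 2), round(stack[1], 2))
```

After the last step the first product is still available in ST(1), as in snap-shot (d).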
7.3.2. FPU Status Register
The 16-bit FPU status register (refer to Figure 7-8) indicates the current state of the FPU. The flags in the FPU status register include the FPU busy flag, top-of-stack (TOP) pointer, condition code flags, error summary status flag, stack fault flag, and exception flags. The FPU sets the flags in this register to show the results of operations. The contents of the FPU status register (referred to as the FPU status word) can be stored in memory using the FSTSW/FNSTSW, FSTENV/FNSTENV, and FSAVE/FNSAVE instructions. It can also be stored in the AX register of the integer unit, using the FSTSW/FNSTSW instructions.
7.3.2.1. TOP OF STACK (TOP) POINTER
A pointer to the FPU data register that is currently at the top of the FPU register stack is contained in bits 11 through 13 of the FPU status word. This pointer, which is commonly referred to as TOP (for top-of-stack), is a binary value from 0 to 7. Refer to Section 7.3.1., FPU Data Registers, for more information about the TOP pointer.
7.3.2.2. CONDITION CODE FLAGS
The four FPU condition code flags (C0 through C3) indicate the results of floating-point comparison and arithmetic operations. Table 7-3 summarizes the manner in which the floating-point instructions set the condition code flags. These condition code bits are used principally for conditional branching and for storage of information used in exception handling (refer to Section 7.3.3., Branching and Conditional Moves on FPU Condition Codes).
[Figure: layout of the 16-bit FPU status word.]
Bit(s) | Field
15 | B (FPU Busy)
14 | C3 (Condition Code)
13-11 | TOP (Top of Stack Pointer)
10 | C2 (Condition Code)
9 | C1 (Condition Code)
8 | C0 (Condition Code)
7 | ES (Error Summary Status)
6 | SF (Stack Fault)
5 | PE (Precision)
4 | UE (Underflow)
3 | OE (Overflow)
2 | ZE (Zero Divide)
1 | DE (Denormalized Operand)
0 | IE (Invalid Operation)
Figure 7-8. FPU Status Word
As shown in Table 7-3, the C1 condition code flag is used for a variety of functions. When both the IE and SF flags in the FPU status word are set, indicating a stack overflow or underflow exception (#IS), the C1 flag distinguishes between overflow (C1=1) and underflow (C1=0). When the PE flag in the status word is set, indicating an inexact (rounded) result, the C1 flag is set to 1 if the last rounding by the instruction was upward. The FXAM instruction sets C1 to the sign of the value being examined.
The C2 condition code flag is used by the FPREM and FPREM1 instructions to indicate an incomplete reduction (or partial remainder). When a successful reduction has been completed, the C0, C3, and C1 condition code flags are set to the three least-significant bits of the quotient (Q2, Q1, and Q0, respectively). Refer to "FPREM1: Partial Remainder" in Chapter 3, Instruction Set Reference, of the Intel Architecture Software Developer's Manual, Volume 2, for more information on how these instructions use the condition code flags.
The FPTAN, FSIN, FCOS, and FSINCOS instructions set the C2 flag to 1 to indicate that the source operand is beyond the allowable range of $\pm 2^{63}$.
Where the state of the condition code flags is listed as undefined in Table 7-3, do not rely on any specific value in these flags.
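The field positions given for the FPU status word (Figure 7-8) and the use of C1 to tell stack overflow from stack underflow can be expressed as a short Python sketch. The function names and the sample status word are hypothetical; the bit positions are the ones stated in this section.

```python
def decode_status_word(sw):
    """Split a 16-bit FPU status word into its named fields."""
    return {
        "B": (sw >> 15) & 1,
        "C3": (sw >> 14) & 1,
        "TOP": (sw >> 11) & 7,   # bits 11 through 13
        "C2": (sw >> 10) & 1,
        "C1": (sw >> 9) & 1,
        "C0": (sw >> 8) & 1,
        "ES": (sw >> 7) & 1,
        "SF": (sw >> 6) & 1,
        "PE": (sw >> 5) & 1,
        "UE": (sw >> 4) & 1,
        "OE": (sw >> 3) & 1,
        "ZE": (sw >> 2) & 1,
        "DE": (sw >> 1) & 1,
        "IE": sw & 1,
    }


def stack_fault_kind(sw):
    """Return 'overflow' or 'underflow' for a stack fault (#IS), else None."""
    fields = decode_status_word(sw)
    if fields["IE"] and fields["SF"]:
        return "overflow" if fields["C1"] else "underflow"
    return None


# Sample value made up for this sketch: TOP = 3, C1 = 1, SF = 1, IE = 1.
sample = (3 << 11) | (1 << 9) | (1 << 6) | 1
fields = decode_status_word(sample)
print(fields["TOP"], stack_fault_kind(sample))
```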
Table 7-3. FPU Condition Code Interpretation
Instruction | C0 | C3 | C2 | C1
FCOM, FCOMP, FCOMPP, FICOM, FICOMP, FTST, FUCOM, FUCOMP, FUCOMPP | Result of Comparison | Result of Comparison | Operands are not Comparable | 0 or #IS
FCOMI, FCOMIP, FUCOMI, FUCOMIP | Undefined. (These instructions set the status flags in the EFLAGS register.) | Undefined | Undefined | #IS
FXAM | Operand class | Operand class | Operand class | Sign
FPREM, FPREM1 | Q2 | Q1 | 0=reduction complete, 1=reduction incomplete | Q0 or #IS
F2XM1, FADD, FADDP, FBSTP, FCMOVcc, FIADD, FDIV, FDIVP, FDIVR, FDIVRP, FIDIV, FIDIVR, FIMUL, FIST, FISTP, FISUB, FISUBR, FMUL, FMULP, FPATAN, FRNDINT, FSCALE, FST, FSTP, FSUB, FSUBP, FSUBR, FSUBRP, FSQRT, FYL2X, FYL2XP1 | Undefined | Undefined | Undefined | Roundup or #IS
FCOS, FSIN, FSINCOS, FPTAN | Undefined | Undefined | 1=source operand out of range | Roundup or #IS (Undefined if C2=1)
FABS, FBLD, FCHS, FDECSTP, FILD, FINCSTP, FLD, Load Constants, FSTP (ext. real), FXCH, FXTRACT | Undefined | Undefined | Undefined | 0 or #IS
FLDENV, FRSTOR | Each bit loaded from memory | Each bit loaded from memory | Each bit loaded from memory | Each bit loaded from memory
FFREE, FLDCW, FCLEX/FNCLEX, FNOP, FSTCW/FNSTCW, FSTENV/FNSTENV, FSTSW/FNSTSW | Undefined | Undefined | Undefined | Undefined
FINIT/FNINIT, FSAVE/FNSAVE | 0 | 0 | 0 | 0
7.3.2.3. EXCEPTION FLAGS
The six exception flags (bits 0 through 5) of the status word indicate that one or more floating-point exceptions have been detected since the bits were last cleared. The individual exception flags (IE, DE, ZE, OE, UE, and PE) are described in detail in Section 7.7., Floating-Point Exception Handling. Each of the exception flags can be masked by an exception mask bit in the FPU control word (refer to Section 7.3.4., FPU Control Word). The exception summary status (ES) flag (bit 7) is set when any of the unmasked exception flags are set. When the ES flag is set, the FPU exception handler is invoked, using one of the techniques described in Section 7.7.3., Software Exception Handling. (Note that if an exception flag is masked, the FPU will still set the flag if its associated exception occurs, but it will not set the ES flag.)
The exception flags are sticky bits, meaning that once set, they remain set until explicitly cleared. They can be cleared by executing the FCLEX/FNCLEX (clear exceptions) instructions, by reinitializing the FPU with the FINIT/FNINIT or FSAVE/FNSAVE instructions, or by overwriting the flags with an FRSTOR or FLDENV instruction.
The B-bit (bit 15) is included for 8087 compatibility only. It reflects the contents of the ES flag.
7.3.2.4. STACK FAULT FLAG
The stack fault flag (bit 6 of the FPU status word) indicates that stack overflow or stack underflow has occurred. The FPU explicitly sets the SF flag when it detects a stack overflow or underflow condition, but it does not explicitly clear the flag when it detects an invalid-arithmetic-operand condition. When this flag is set, the condition code flag C1 indicates the nature of the fault: overflow (C1 = 1) or underflow (C1 = 0). The SF flag is a sticky flag, meaning that after it is set, the processor does not clear it until it is explicitly instructed to do so (for example, by an FINIT/FNINIT, FCLEX/FNCLEX, or FSAVE/FNSAVE instruction). Refer to Section 7.3.6., FPU Tag Word, for more information on FPU stack faults.
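As an illustration of the flag layout described above, the following Python sketch decodes a status-word value. The bit positions for the exception flags, SF, ES, and B come from the text; the positions used for C0 through C3 and TOP (bits 8, 9, 10, 14, and 11 through 13) are assumed from the status-word figure, which is not reproduced in this section, and the function names are hypothetical.

```python
EXCEPTION_FLAGS = ["IE", "DE", "ZE", "OE", "UE", "PE"]  # status word bits 0..5


def decode_status_word(sw):
    """Split a 16-bit FPU status word into named fields."""
    fields = {name: (sw >> bit) & 1 for bit, name in enumerate(EXCEPTION_FLAGS)}
    fields["SF"] = (sw >> 6) & 1     # stack fault
    fields["ES"] = (sw >> 7) & 1     # exception summary status
    fields["C0"] = (sw >> 8) & 1     # assumed position
    fields["C1"] = (sw >> 9) & 1     # assumed position
    fields["C2"] = (sw >> 10) & 1    # assumed position
    fields["TOP"] = (sw >> 11) & 7   # assumed position (3-bit field)
    fields["C3"] = (sw >> 14) & 1    # assumed position
    fields["B"] = (sw >> 15) & 1     # mirrors ES, kept for 8087 compatibility
    return fields


def stack_fault_kind(sw):
    """Return 'overflow', 'underflow', or None, using SF and C1."""
    fields = decode_status_word(sw)
    if not fields["SF"]:
        return None
    return "overflow" if fields["C1"] else "underflow"


if __name__ == "__main__":
    # IE, SF, and C1 set: an invalid operation caused by stack overflow.
    print(decode_status_word(0x0241))
    print(stack_fault_kind(0x0241))
```

Because the flags are sticky, a handler written along these lines would normally clear them (for example with FNCLEX) after reading them.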
7.3.3.
Branching and Conditional Moves on FPU Condition Codes
The IA FPU (beginning with the Pentium Pro processor) supports two mechanisms for branching and performing conditional moves according to comparisons of two floating-point values. These mechanisms are referred to here as the old mechanism and the new mechanism.
The old mechanism is available in FPUs prior to the Pentium Pro processor and in the Pentium Pro processor. This mechanism uses the floating-point compare instructions (FCOM, FCOMP, FCOMPP, FTST, FUCOMPP, FICOM, and FICOMP) to compare two floating-point values and set the condition code flags (C0 through C3) according to the results. The contents of the condition code flags are then copied into the status flags of the EFLAGS register using a two-step process (refer to Figure 7-9):
1. The FSTSW AX instruction moves the FPU status word into the AX register.
2. The SAHF instruction copies the upper 8 bits of the AX register, which include the condition code flags, into the lower 8 bits of the EFLAGS register.
When the condition code flags have been loaded into the EFLAGS register, conditional jumps or conditional moves can be performed based on the new settings of the status flags in the EFLAGS register.
[Figure: the FSTSW AX instruction copies the FPU status word, including C3, C2, C1, and C0, into the AX register; the SAHF instruction then copies the upper byte of AX into the low byte of the EFLAGS register, where the condition codes appear as ZF, PF, and CF.]
Condition Code   Status Flag
C0               CF
C1               (none)
C2               PF
C3               ZF
Figure 7-9. Moving the FPU Condition Codes to the EFLAGS Register
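The two-step transfer can be modeled in a few lines of Python. This is only a sketch of the data movement: it assumes that C0, C2, and C3 occupy status-word bits 8, 10, and 14, and that SAHF writes only the SF, ZF, AF, PF, and CF bits (EFLAGS bits 7, 6, 4, 2, and 0). The function names are hypothetical.

```python
def fstsw_ax(status_word):
    """FSTSW AX: AX receives the 16-bit FPU status word."""
    return status_word & 0xFFFF


def sahf(ax, eflags=0x0002):
    """SAHF: load SF, ZF, AF, PF, and CF from AH; other EFLAGS bits are kept."""
    ah = (ax >> 8) & 0xFF
    written = 0b1101_0101            # EFLAGS bits 7, 6, 4, 2, 0
    return (eflags & ~written) | (ah & written)


def flags_after_compare(status_word):
    """Return the status flags seen by a conditional jump after FSTSW AX / SAHF."""
    eflags = sahf(fstsw_ax(status_word))
    return {"CF": eflags & 1, "PF": (eflags >> 2) & 1, "ZF": (eflags >> 6) & 1}


if __name__ == "__main__":
    # C3 = 1, C2 = 0, C0 = 0: the compared values were equal.
    print(flags_after_compare(0x4000))
    # C3 = C2 = C0 = 1: the operands were unordered.
    print(flags_after_compare(0x4500))
```

Note that C1 lands on EFLAGS bit 1, which SAHF does not write, so it has no corresponding status flag.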
The new mechanism is available only in the Pentium Pro processor. Using this mechanism, the new floating-point compare and set EFLAGS instructions (FCOMI, FCOMIP, FUCOMI, and FUCOMIP) compare two floating-point values and set the ZF, PF, and CF flags in the EFLAGS register directly. A single instruction thus replaces the three instructions required by the old mechanism. Note also that the FCMOVcc instructions (also new in the Pentium Pro processor) allow conditional moves of floating-point values (values in the FPU data registers) based on the setting of the status flags (ZF, PF, and CF) in the EFLAGS register. These instructions eliminate the need for an IF statement to perform conditional moves of floating-point values.
7.3.4.
FPU Control Word
The 16-bit FPU control word (refer to Figure 7-10) controls the precision of the FPU and rounding method used. It also contains the exception-flag mask bits. The control word is cached in the FPU control register. The contents of this register can be loaded with the FLDCW instruction and stored in memory with the FSTCW/FNSTCW instructions. When the FPU is initialized with either an FINIT/FNINIT or FSAVE/FNSAVE instruction, the FPU control word is set to 037FH, which masks all floating-point exceptions, sets rounding to nearest, and sets the FPU precision to 64 bits.
Bit(s)   Field
15-13    Reserved
12       X: Infinity Control
11-10    RC: Rounding Control
9-8      PC: Precision Control
7-6      Reserved
5        PM: Precision exception mask
4        UM: Underflow exception mask
3        OM: Overflow exception mask
2        ZM: Zero Divide exception mask
1        DM: Denormalized Operand exception mask
0        IM: Invalid Operation exception mask
Figure 7-10. FPU Control Word
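To make the field layout concrete, the sketch below decodes a control-word value in Python and checks the default value 037FH quoted above. The PC and RC encodings are those of Tables 7-4 and 7-5; the mask abbreviations (IM through PM) and the function names are labels chosen for this example.

```python
PRECISION = {0b00: "single (24 bits)", 0b01: "reserved",
             0b10: "double (53 bits)", 0b11: "extended (64 bits)"}
ROUNDING = {0b00: "nearest (even)", 0b01: "down",
            0b10: "up", 0b11: "toward zero"}
MASKS = ["IM", "DM", "ZM", "OM", "UM", "PM"]   # control word bits 0..5


def decode_control_word(cw):
    """Split a 16-bit FPU control word into its documented fields."""
    return {
        "masked": [name for bit, name in enumerate(MASKS) if (cw >> bit) & 1],
        "precision": PRECISION[(cw >> 8) & 0b11],
        "rounding": ROUNDING[(cw >> 10) & 0b11],
        "infinity_control": (cw >> 12) & 1,
    }


def with_rounding(cw, rc):
    """Return a copy of cw with the RC field (bits 10-11) replaced by rc."""
    return (cw & ~(0b11 << 10)) | ((rc & 0b11) << 10)


if __name__ == "__main__":
    print(decode_control_word(0x037F))                        # default after FINIT
    print(hex(with_rounding(0x037F, 0b11)))                   # same, but truncating
```

A read-modify-write of this kind mirrors what software does with FSTCW followed by FLDCW when it needs a different rounding mode temporarily.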
7.3.4.1.
EXCEPTION-FLAG MASKS
The exception-flag mask bits (bits 0 through 5 of the FPU control word) mask the 6 exception flags in the FPU status word (also bits 0 through 5). When one of these mask bits is set, its corresponding floating-point exception is blocked from being generated.
7.3.4.2.
PRECISION CONTROL FIELD
The precision-control (PC) field (bits 8 and 9 of the FPU control word) determines the precision (64, 53, or 24 bits) of floating-point calculations made by the FPU (refer to Table 7-4). The default precision is extended precision, which uses the full 64-bit significand available with the extended-real format of the FPU data registers, but is configurable by the user, compiler, or operating system. This setting is best suited for most applications, because it allows applications to take full advantage of the precision of the extended-real format.
Table 7-4. Precision Control Field (PC)
Precision                        PC Field
Single Precision (24 bits*)      00B
Reserved                         01B
Double Precision (53 bits*)      10B
Extended Precision (64 bits)     11B
NOTE:
* Includes the implied integer bit.
The double precision and single precision settings reduce the size of the significand to 53 bits and 24 bits, respectively. These settings are provided to support the IEEE standard and to allow exact replication of calculations which were done using the lower precision data types. Using these settings nullifies the advantages of the extended-real format's 64-bit significand length. When reduced precision is specified, the rounding of the significand value clears the unused bits on the right to zeros.
The precision-control bits only affect the results of the following floating-point instructions: FADD, FADDP, FSUB, FSUBP, FSUBR, FSUBRP, FMUL, FMULP, FDIV, FDIVP, FDIVR, FDIVRP, and FSQRT.
7.3.4.3.
ROUNDING CONTROL FIELD
The rounding control (RC) field of the FPU control register (bits 10 and 11) controls how the results of floating-point instructions are rounded. Four rounding modes are supported (refer to Table 7-5): round to nearest, round up, round down, and round toward zero. Round to nearest is the default rounding mode and is suitable for most applications. It provides the most accurate and statistically unbiased estimate of the true result.
Table 7-5. Rounding Control Field (RC)
Rounding Mode                  RC Field Setting   Description
Round to nearest (even)        00B                Rounded result is the closest to the infinitely precise result. If two values are equally close, the result is the even value (that is, the one with the least-significant bit of zero).
Round down (toward −∞)         01B                Rounded result is closest to but no greater than the infinitely precise result.
Round up (toward +∞)           10B                Rounded result is closest to but no less than the infinitely precise result.
Round toward zero (Truncate)   11B                Rounded result is closest to but no greater in absolute value than the infinitely precise result.
The round up and round down modes are termed directed rounding and can be used to implement interval arithmetic. Interval arithmetic is used to determine upper and lower bounds for the true result of a multistep computation, when the intermediate results of the computation are subject to rounding. The round toward zero mode (sometimes called the chop mode) is commonly used when performing integer arithmetic with the FPU. Whenever possible, the FPU produces an infinitely precise result in the destination format (single, double, or extended real). However, it is often the case that the infinitely precise result of an arithmetic or store operation cannot be encoded exactly in the format of the destination operand.
For example, the following value (a) has a 24-bit fraction. The least-significant bit of this fraction (the final 1) cannot be encoded exactly in the single-real format (which has only a 23-bit fraction):
(a) 1.0001 0000 1000 0011 1001 0111 E2 101
To round this result (a), the FPU first selects two representable fractions b and c that most closely bracket a in value (b < a < c).
(b) 1.0001 0000 1000 0011 1001 011 E2 101
(c) 1.0001 0000 1000 0011 1001 100 E2 101
The FPU then sets the result to b or to c according to the rounding mode selected in the RC field. Rounding introduces an error in a result that is less than one unit in the last place to which the result is rounded.
The rounded result is called the inexact result. When the FPU produces an inexact result, the floating-point precision (inexact) flag (PE) is set in the FPU status word.
When the overflow exception is masked and the infinitely precise result is between the largest positive finite value allowed in a particular format and +∞, the FPU rounds the result as shown in Table 7-6.
Table 7-6. Rounding of Positive Numbers with Masked Overflow
Rounding Mode                      Result
Rounding to nearest (even)         +∞
Rounding toward zero (Truncate)    Maximum, positive finite value
Rounding up (toward +∞)            +∞
Rounding down (toward −∞)          Maximum, positive finite value
When the overflow exception is masked and the infinitely precise result is between the largest negative finite value allowed in a particular format and −∞, the FPU rounds the result as shown in Table 7-7.
Table 7-7. Rounding of Negative Numbers with Masked Overflow
Rounding Mode                      Result
Rounding to nearest (even)         −∞
Rounding toward zero (Truncate)    Maximum, negative finite value
Rounding up (toward +∞)            Maximum, negative finite value
Rounding down (toward −∞)          −∞
The rounding modes have no effect on comparison operations, operations that produce exact results, or operations that produce NaN results.
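The choice between the bracketing values b and c in the example above, and the masked-overflow results of Tables 7-6 and 7-7, can be reproduced with integer arithmetic. The following Python sketch is an illustration only: it treats the significand as an unsigned magnitude with a separate sign, and the function and mode names are hypothetical.

```python
def round_magnitude(sig, drop_bits, mode, negative=False):
    """Round the magnitude sig by discarding its drop_bits low-order bits.

    mode is 'nearest', 'down', 'up', or 'zero' (the four RC settings).
    """
    lower = sig >> drop_bits                      # truncated magnitude
    remainder = sig & ((1 << drop_bits) - 1)
    if remainder == 0:
        return lower                              # exact result, nothing to round
    upper = lower + 1                             # next representable magnitude
    half = 1 << (drop_bits - 1)
    if mode == "zero":
        return lower
    if mode == "up":                              # toward +infinity
        return lower if negative else upper
    if mode == "down":                            # toward -infinity
        return upper if negative else lower
    if mode == "nearest":
        if remainder != half:
            return upper if remainder > half else lower
        return lower if lower % 2 == 0 else upper  # tie: choose the even value
    raise ValueError("unknown rounding mode: " + mode)


def masked_overflow_result(mode, negative=False):
    """Result of a masked overflow, following Tables 7-6 and 7-7."""
    to_infinity = (mode == "nearest"
                   or (mode == "up" and not negative)
                   or (mode == "down" and negative))
    sign = "-" if negative else "+"
    return sign + ("infinity" if to_infinity else "maximum finite value")


# Significands (a), (b), and (c) from the example, integer bit included.
A = int("1 0001 0000 1000 0011 1001 0111".replace(" ", ""), 2)
B = int("1 0001 0000 1000 0011 1001 011".replace(" ", ""), 2)
C = int("1 0001 0000 1000 0011 1001 100".replace(" ", ""), 2)

if __name__ == "__main__":
    for mode in ("nearest", "down", "up", "zero"):
        result = round_magnitude(A, 1, mode)
        print(mode, "->", "b" if result == B else "c")
```

In this example the discarded bit is exactly half a unit in the last place, so round to nearest falls back to the even rule and selects c, whose least-significant bit is zero.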
7.3.5.
Infinity Control Flag
The infinity control flag (bit 12 of the FPU control word) is provided for compatibility with the Intel 287 Math Coprocessor; it is not meaningful for the Pentium Pro processor FPU or for the Pentium processor FPU, the Intel486 processor FPU, or Intel 387 processor NPX. Refer to Section 7.2.3.3., Signed Infinities, for information on how the IA FPUs handle infinity values.
7.3.6.
FPU Tag Word
The 16-bit tag word (refer to Figure 7-11) indicates the contents of each of the 8 registers in the FPU data-register stack (one 2-bit tag per register). The tag codes indicate whether a register contains a valid number, zero, or a special floating-point number (NaN, infinity, denormal, or unsupported format), or whether it is empty. The FPU tag word is cached in the FPU in the FPU tag word register. When the FPU is initialized with either an FINIT/FNINIT or FSAVE/FNSAVE instruction, the FPU tag word is set to FFFFH, which marks all the FPU data registers as empty.
Bits   15-14    13-12    11-10    9-8      7-6      5-4      3-2      1-0
Tag    TAG(7)   TAG(6)   TAG(5)   TAG(4)   TAG(3)   TAG(2)   TAG(1)   TAG(0)
TAG Values
00 Valid
01 Zero
10 Special: invalid (NaN, unsupported), infinity, or denormal
11 Empty
Figure 7-11. FPU Tag Word
Each tag in the FPU tag word corresponds to a physical register (numbers 0 through 7). The current top-of-stack (TOP) pointer stored in the FPU status word can be used to associate tags with registers relative to ST(0).
The FPU uses the tag values to detect stack overflow and underflow conditions. Stack overflow occurs when the TOP pointer is decremented (due to a register load or push operation) to point to a non-empty register. Stack underflow occurs when the TOP pointer is incremented (due to a save or pop operation) to point to an empty register or when an empty register is also referenced as a source operand. A non-empty register is defined as a register containing a zero (01), a valid value (00), or a special (10) value.
Application programs and exception handlers can use this tag information to check the contents of an FPU data register without performing complex decoding of the actual data in the register. To read the tag register, it must be stored in memory using either the FSTENV/FNSTENV or FSAVE/FNSAVE instructions. The location of the tag word in memory after being saved with one of these instructions is shown in Figures 7-13 through 7-16.
Software cannot directly load or modify the tags in the tag register. The FLDENV and FRSTOR instructions load an image of the tag register into the FPU; however, the FPU uses those tag values only to determine if the data registers are empty (11B) or non-empty (00B, 01B, or 10B). If the tag register image indicates that a data register is empty, the tag in the tag register for that data register is marked empty (11B); if the tag register image indicates that the data register is non-empty, the FPU reads the actual value in the data register and sets the tag for the register accordingly. This action prevents a program from setting the values in the tag register to incorrectly represent the actual contents of non-empty data registers.
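A saved tag word can be decoded with a few shifts. The Python sketch below assumes the layout of Figure 7-11 (TAG(0) in bits 1-0 up to TAG(7) in bits 15-14) and assumes that ST(i) maps to physical register (TOP + i) modulo 8; the function names are hypothetical.

```python
TAG_NAMES = {0b00: "valid", 0b01: "zero", 0b10: "special", 0b11: "empty"}


def decode_tag_word(tag_word):
    """Return the tag names for physical registers 0 through 7."""
    return [TAG_NAMES[(tag_word >> (2 * reg)) & 0b11] for reg in range(8)]


def tags_relative_to_top(tag_word, top):
    """Return the tag names for ST(0) through ST(7), given the TOP field."""
    physical = decode_tag_word(tag_word)
    return [physical[(top + i) % 8] for i in range(8)]


def push_would_overflow(tag_word, top):
    """A push decrements TOP; it overflows if the new top register is non-empty."""
    new_top = (top - 1) % 8
    return decode_tag_word(tag_word)[new_top] != "empty"


if __name__ == "__main__":
    # After initialization (tag word FFFFH, TOP = 0) one value is pushed:
    # TOP becomes 7 and TAG(7) becomes 00 (valid), giving 3FFFH.
    print(tags_relative_to_top(0x3FFF, 7))
    print(push_would_overflow(0x3FFF, 7))
```

The same decoding applies to a tag word read from an FSTENV/FNSTENV or FSAVE/FNSAVE image, since software cannot read the tag register directly.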
7.3.7.
FPU Instruction and Operand (Data) Pointers
The FPU stores pointers to the instruction and operand (data) for the last non-control instruction executed in two 48-bit registers: the FPU instruction pointer and FPU operand (data) pointer registers (refer to Figure 7-5). (This information is saved to provide state information for exception handlers.) The contents of the FPU instruction and operand pointer registers remain unchanged when any of the control instructions (FINIT/FNINIT, FCLEX/FNCLEX, FLDCW, FSTCW/FNSTCW, FSTSW/FNSTSW, FSTENV/FNSTENV, FLDENV, FSAVE/FNSAVE, FRSTOR, and WAIT/FWAIT) are executed. The contents of the FPU operand register are undefined if the prior non-control instruction did not have a memory operand. The pointers stored in the FPU instruction and operand pointer registers consist of an offset (stored in bits 0 through 31) and a segment selector (stored in bits 32 through 47). These registers can be accessed by the FSTENV/FNSTENV, FLDENV, FINIT/FNINIT, FSAVE/FNSAVE and FRSTOR instructions. The FINIT/FNINIT and FSAVE/FNSAVE instructions clear these registers. For all the IA FPUs and NPXs except the 8087, the FPU instruction pointer points to any prefixes that preceded the instruction. For the 8087, the FPU instruction pointer points only to the actual opcode.
7.3.8.
Last Instruction Opcode
The FPU stores the opcode of the last non-control instruction executed in an 11-bit FPU opcode register. (This information provides state information for exception handlers.) Only the first and second opcode bytes (after all prefixes) are stored in the FPU opcode register. Figure 7-12 shows the encoding of these two bytes. Since the upper 5 bits of the first opcode byte are the same for all floating-point opcodes (11011B), only the lower 3 bits of this byte are stored in the opcode register.
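The packing of the two opcode bytes into the 11-bit register can be written out directly. In this Python sketch the function name is hypothetical, and the byte pairs used in the demonstration (D9 FA for FSQRT, D8 C1 for FADD ST(0), ST(1)) are assumed from the instruction set reference rather than taken from this section.

```python
def fpu_opcode_register(byte1, byte2):
    """Combine the first two opcode bytes into the 11-bit FPU opcode value."""
    if byte1 >> 3 != 0b11011:
        raise ValueError("first byte is not a floating-point (ESC) opcode")
    # Low 3 bits of byte 1 become bits 10-8; byte 2 becomes bits 7-0.
    return ((byte1 & 0b111) << 8) | byte2


if __name__ == "__main__":
    print(hex(fpu_opcode_register(0xD9, 0xFA)))   # assumed encoding of FSQRT
    print(hex(fpu_opcode_register(0xD8, 0xC1)))   # assumed encoding of FADD ST(0), ST(1)
```

An exception handler can reverse the process by prepending 11011B to bits 10 through 8 to recover the first opcode byte.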
7.3.9.
Saving the FPU's State
The FSTENV/FNSTENV and FSAVE/FNSAVE instructions store FPU state information in memory for use by exception handlers and other system and application software. The FSTENV/FNSTENV instruction saves the contents of the status, control, tag, FPU instruction pointer, FPU operand pointer, and opcode registers. The FSAVE/FNSAVE instruction stores that information plus the contents of the FPU data registers. Note that the FSAVE/FNSAVE instruction also initializes the FPU to default values (just as the FINIT/FNINIT instruction does) after it has saved the original state of the FPU.
FPU Opcode Register Bits   Source
10-8                       Bits 2-0 of the 1st instruction byte
7-0                        Bits 7-0 of the 2nd instruction byte
Figure 7-12. Contents of FPU Opcode Registers
The manner in which this information is stored in memory depends on the operating mode of the processor (protected mode or real-address mode) and on the operand-size attribute in effect (32-bit or 16-bit). Refer to Figures 7-13 through 7-16. In virtual-8086 mode or SMM, the real-address mode format shown in Figure 7-16 is used. Refer to Chapter 12, System Management Mode (SMM) of the Intel Architecture Software Developer's Manual, Volume 3, for special considerations for using the FPU while in SMM.
32-Bit Protected Mode Format
Offset   Bits 31-16                              Bits 15-0
0        Reserved                                Control Word
4        Reserved                                Status Word
8        Reserved                                Tag Word
12       FPU Instruction Pointer Offset (bits 31-0)
16       Zeros (upper bits), Opcode 10...00      FPU Instruction Pointer Selector
20       FPU Operand Pointer Offset (bits 31-0)
24       Reserved                                FPU Operand Pointer Selector
Figure 7-13. Protected Mode FPU State Image in Memory, 32-Bit Format
32-Bit Real-Address Mode Format
Offset   Bits 31-16                                                          Bits 15-0
0        Reserved                                                            Control Word
4        Reserved                                                            Status Word
8        Reserved                                                            Tag Word
12       Reserved                                                            FPU Instruction Pointer 15...00
16       0000, FPU Instruction Pointer 31...16, 0, Opcode 10...00 (from bit 31 down to bit 0)
20       Reserved                                                            FPU Operand Pointer 15...00
24       0000, FPU Operand Pointer 31...16, 000000000000 (from bit 31 down to bit 0)
Figure 7-14. Real Mode FPU State Image in Memory, 32-Bit Format
16-Bit Protected Mode Format
Offset   Bits 15-0
0        Control Word
2        Status Word
4        Tag Word
6        FPU Instruction Pointer Offset
8        FPU Instruction Pointer Selector
10       FPU Operand Pointer Offset
12       FPU Operand Pointer Selector
Figure 7-15. Protected Mode FPU State Image in Memory, 16-Bit Format
16-Bit Real-Address Mode and Virtual-8086 Mode Format
Offset   Bits 15-0
0        Control Word
2        Status Word
4        Tag Word
6        FPU Instruction Pointer 15...00
8        IP 19..16, 0, Opcode 10...00 (from bit 15 down to bit 0)
10       FPU Operand Pointer 15...00
12       OP 19..16, 000000000000 (from bit 15 down to bit 0)
Figure 7-16. Real Mode FPU State Image in Memory, 16-Bit Format
The FLDENV and FRSTOR instructions allow FPU state information to be loaded from memory into the FPU. Here, the FLDENV instruction loads only the status, control, tag, FPU instruction pointer, FPU operand pointer, and opcode registers, and the FRSTOR instruction loads all the FPU registers, including the FPU stack registers.
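As a sketch of how software might read back an environment image, the Python function below unpacks the 28-byte layout of Figure 7-13 (32-bit protected mode). It assumes the opcode occupies bits 26 through 16 of the doubleword at offset 16; the function name, the field names, and the sample values are hypothetical.

```python
import struct


def parse_env32_protected(image):
    """Decode a 28-byte FPU environment image (32-bit protected-mode format)."""
    if len(image) != 28:
        raise ValueError("expected a 28-byte image")
    w = struct.unpack("<7I", image)          # seven little-endian doublewords
    return {
        "control_word": w[0] & 0xFFFF,
        "status_word": w[1] & 0xFFFF,
        "tag_word": w[2] & 0xFFFF,
        "instruction_offset": w[3],
        "instruction_selector": w[4] & 0xFFFF,
        "opcode": (w[4] >> 16) & 0x7FF,      # assumed position: bits 26-16
        "operand_offset": w[5],
        "operand_selector": w[6] & 0xFFFF,
    }


if __name__ == "__main__":
    # A made-up image: default control word, one valid register on the stack.
    sample = struct.pack("<7I", 0x037F, 0x3800, 0x3FFF, 0x00401000,
                         (0x1FA << 16) | 0x001B, 0x00403000, 0x0023)
    for name, value in parse_env32_protected(sample).items():
        print(f"{name:22s} {value:#x}")
```

The reserved upper halves are masked off rather than checked, since their contents are not specified.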
7.4.
FLOATING-POINT DATA TYPES AND FORMATS
The IA FPU recognizes and operates on seven data types, divided into three groups: reals, integers, and packed BCD integers. Figure 7-17 shows the data formats for each of the FPU data types. Table 7-8 gives the length, precision, and approximate normalized range that can be represented for each FPU data type. Denormal values are also supported in each of the real types, as required by IEEE Standard 854. With the exception of the 80-bit extended-real format, all of these data types exist in memory only. When they are loaded into FPU data registers, they are converted into extended-real format and operated on in that format.
Data Type             Layout (bit positions)
Single Real           Sign 31; Exponent 30-23; Fraction 22-0 (implied integer)
Double Real           Sign 63; Exponent 62-52; Fraction 51-0 (implied integer)
Extended Real         Sign 79; Exponent 78-64; Integer 63; Fraction 62-0
Word Integer          Sign 15; magnitude 14-0
Short Integer         Sign 31; magnitude 30-0
Long Integer          Sign 63; magnitude 62-0
Packed BCD Integers   Sign 79; X 78-72; digits D17 through D0 in 71-0 (4 bits = 1 BCD digit)
Figure 7-17. Floating-Point Unit Data Type Formats
When stored in memory, the least significant byte of an FPU data-type value is stored at the initial address specified for the value. Successive bytes from the value are then stored in successively higher addresses in memory. The floating-point instructions load and store memory operands using only the initial address of the operand.
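The byte ordering described above can be observed with Python's struct module, which can pack values in little-endian order. This is a small illustration; the helper name is hypothetical, and the values shown are examples only.

```python
import struct


def memory_image(fmt, value):
    """Return the bytes of value in increasing address order, as a hex string."""
    return struct.pack("<" + fmt, value).hex()


if __name__ == "__main__":
    print(memory_image("f", 1.0))   # single real 3F800000H
    print(memory_image("d", 1.0))   # double real 3FF0000000000000H
    print(memory_image("h", -2))    # word integer FFFEH
```

In each case the least-significant byte appears first, at the initial address of the operand.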
7.4.1.
Real Numbers
The FPU's three real data types (single-real, double-real, and extended-real) correspond directly to the single-precision, double-precision, and double-extended-precision formats in the IEEE standard. The extended-precision format is the format used by the data registers in the FPU. Table 7-8 gives the precision and range of these data types and Figure 7-17 gives the formats.
For the single-real and double-real formats, only the fraction part of the significand is encoded. The integer is assumed to be 1 for all numbers except 0 and denormalized finite numbers. For the extended-real format, the integer is contained in bit 63, and the most-significant fraction bit is bit 62. Here, the integer is explicitly set to 1 for normalized numbers, infinities, and NaNs, and to 0 for zero and denormalized numbers.
Table 7-8. Length, Precision, and Range of FPU Data Types
Data Type             Length   Precision (Bits)      Approximate Normalized Range
                                                     Binary                  Decimal
Binary Real
  Single real         32       24                    2^-126 to 2^127         1.18 × 10^-38 to 3.40 × 10^38
  Double real         64       53                    2^-1022 to 2^1023       2.23 × 10^-308 to 1.79 × 10^308
  Extended real       80       64                    2^-16382 to 2^16383     3.37 × 10^-4932 to 1.18 × 10^4932
Binary Integer
  Word integer        16       15                    -2^15 to 2^15 - 1       -32,768 to 32,767
  Short integer       32       31                    -2^31 to 2^31 - 1       -2.14 × 10^9 to 2.14 × 10^9
  Long integer        64       63                    -2^63 to 2^63 - 1       -9.22 × 10^18 to 9.22 × 10^18
Packed BCD Integers   80       18 (decimal digits)   Not Pertinent           (-10^18 + 1) to (10^18 - 1)
The exponent of each real data type is encoded in biased format. The biasing constant is 127 for the single-real format, 1023 for the double-real format, and 16,383 for the extended-real format.
Table 7-9 shows the encodings for all the classes of real numbers (that is, zero, denormalized-finite, normalized-finite, and ∞) and NaNs for each of the three real data-types. It also gives the format for the real indefinite value.
When storing real values in memory, single-real values are stored in 4 consecutive bytes in memory; double-real values are stored in 8 consecutive bytes; and extended-real values are stored in 10 consecutive bytes.
As a general rule, values should be stored in memory in double-real format. This format provides sufficient range and precision to return correct results with a minimum of programmer attention. The single-real format is appropriate for applications that are constrained by memory; however, it provides less precision and a greater chance of overflow. The single-real format is also useful for debugging algorithms, because rounding problems will manifest themselves more quickly in this format. The extended-real format is normally reserved for holding intermediate results in the FPU registers and constants. Its extra length is designed to shield final results from the effects of rounding and overflow/underflow in intermediate calculations. However, when an application requires the maximum range and precision of the FPU (for data storage, computations, and results), values can be stored in memory in extended-real format.
The real indefinite value is a QNaN encoding that is stored by several floating-point instructions in response to a masked floating-point invalid operation exception (refer to Table 7-21).
Table 7-9. Real Number and NaN Encodings
Class                               Sign   Biased Exponent    Integer(1)   Fraction
Positive   +∞                       0      11..11             1            00..00
           +Normals                 0      11..10 to 00..01   1            11..11 to 00..00
           +Denormals               0      00..00             0            11..11 to 00..01
           +Zero                    0      00..00             0            00..00
Negative   −Zero                    1      00..00             0            00..00
           −Denormals               1      00..00             0            00..01 to 11..11
           −Normals                 1      00..01 to 11..10   1            00..00 to 11..11
           −∞                       1      11..11             1            00..00
NaNs       SNaN                     X      11..11             1            0X..XX(2)
           QNaN                     X      11..11             1            1X..XX
           Real Indefinite (QNaN)   1      11..11             1            10..00
Field widths:
Single-Real:     8 Bits exponent,  23 Bits fraction
Double-Real:     11 Bits exponent, 52 Bits fraction
Extended-Real:   15 Bits exponent, 63 Bits fraction
NOTES:
1. Integer bit is implied and not stored for single-real and double-real formats.
2. The fraction for SNaN encodings must be non-zero.
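The classes in Table 7-9 can be told apart by inspecting the exponent and fraction fields. The Python sketch below does this for the single-real format only (8-bit exponent, 23-bit fraction, bias 127); the function names are hypothetical.

```python
import struct


def single_bits(x):
    """Return the 32-bit single-real encoding of the Python float x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def unbiased_exponent(bits):
    """Subtract the single-real bias (127) from the stored exponent field."""
    return ((bits >> 23) & 0xFF) - 127


def classify_single(bits):
    """Name the Table 7-9 class of a 32-bit single-real encoding."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    prefix = "-" if sign else "+"
    if exponent == 0:
        return prefix + ("zero" if fraction == 0 else "denormal")
    if exponent == 0xFF:
        if fraction == 0:
            return prefix + "infinity"
        # The most-significant fraction bit separates QNaNs from SNaNs.
        return "QNaN" if fraction >> 22 else "SNaN"
    return prefix + "normal"


if __name__ == "__main__":
    for bits in (single_bits(1.0), 0x00000001, 0x80000000,
                 0x7F800000, 0x7F800001, 0xFFC00000):
        print(f"{bits:08X}", classify_single(bits))
```

The last value, FFC00000H, is the single-real form of the real indefinite: sign 1, all-ones exponent, and fraction 10..00.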
7.4.2.
Binary Integers
The FPU's three binary integer data types (word, short, and long) have identical formats, except for length. Table 7-8 gives the precision and range of these data types and Figure 7-17 gives the formats. Table 7-10 gives the encodings of the three binary integer types.
Table 7-10. Binary Integer Encodings
Class                     Sign   Magnitude
Positive   Largest        0      11..11
           .              .      .
           Smallest       0      00..01
Zero                      0      00..00
Negative   Smallest       1      11..11
           .              .      .
           Largest        1      00..00
Integer Indefinite        1      00..00
Magnitude widths:
Word Integer:    15 Bits
Short Integer:   31 Bits
Long Integer:    63 Bits
The most significant bit of each format is the sign bit (0 for positive and 1 for negative). Negative values are represented in standard two's complement notation. The quantity zero is represented with all bits (including the sign bit) set to zero. Note that the FPU's word-integer data type is identical to the word-integer data type used by the processor's integer unit and the short-integer format is identical to the integer unit's doubleword-integer data type.
Word-integer values are stored in 2 consecutive bytes in memory; short-integer values are stored in 4 consecutive bytes; and long-integer values are stored in 8 consecutive bytes. When loaded into the FPU's data registers, all the binary integers are exactly representable in the extended-real format.
The binary integer encoding 100..00B represents either of two things, depending on the circumstances of its use:
• The largest negative number supported by the format (−2^15, −2^31, or −2^63).
• The integer indefinite value.
If this encoding is used as a source operand (as in an integer load or integer arithmetic instruction), the FPU interprets it as the largest negative number representable in the format being used. If the FPU detects an invalid operation when storing an integer value in memory with an FIST/FISTP instruction and the invalid operation exception is masked, the FPU stores the integer indefinite encoding in the destination operand as a masked response to the exception. In situations where the origin of a value with this encoding may be ambiguous, the invalid operation exception flag can be examined to see if the value was produced as a response to an exception.
If the integer indefinite is stored in memory and is later loaded back into an FPU data register, it is interpreted as the largest negative number supported by the format.
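The dual meaning of the 100..00B encoding is easy to demonstrate. The Python sketch below decodes little-endian two's complement integers of the three FPU lengths and builds the integer indefinite for each; the function names are hypothetical.

```python
def decode_fpu_integer(raw):
    """Interpret 2, 4, or 8 little-endian bytes as a word, short, or long integer."""
    if len(raw) not in (2, 4, 8):
        raise ValueError("expected 2, 4, or 8 bytes")
    return int.from_bytes(raw, "little", signed=True)


def integer_indefinite(size):
    """Return the 100..00B encoding for a format of size bytes."""
    return (1 << (8 * size - 1)).to_bytes(size, "little")


def may_be_indefinite(raw):
    """True if raw is the encoding shared by the indefinite and the most negative value."""
    return raw == integer_indefinite(len(raw))


if __name__ == "__main__":
    for size in (2, 4, 8):
        raw = integer_indefinite(size)
        print(raw.hex(), decode_fpu_integer(raw), may_be_indefinite(raw))
```

As the text notes, only the invalid operation exception flag can tell the two cases apart; the stored bytes are identical.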
7.4.3.
Decimal Integers
Decimal integers are stored in a 10-byte, packed BCD format. Table 7-8 gives the precision and range of this data type and Figure 7-17 shows the format. In this format, the first 9 bytes (bytes 0 through 8) hold 18 BCD digits, 2 digits per byte (refer to Section 5.2.3., BCD Integers in Chapter 5, Data Types and Addressing Modes). The least-significant digit is contained in the lower half-byte of byte 0 and the most-significant digit is contained in the upper half-byte of byte 8. The most significant bit of byte 9 contains the sign bit (0 = positive and 1 = negative). (Bits 0 through 6 of byte 9 are don't-care bits.) Negative decimal integers are not stored in two's complement form; they are distinguished from positive decimal integers only by the sign bit.
Table 7-11 gives the possible encodings of values in the decimal integer data type.
Table 7-11. Packed Decimal Integer Encodings
                                              Magnitude
Class                        Sign   (7 bits)   digit   digit   digit   digit   ...   digit
Positive   Largest           0      0000000    1001    1001    1001    1001    ...   1001
           .                 .      .          .
           Smallest          0      0000000    0000    0000    0000    0000    ...   0001
           Zero              0      0000000    0000    0000    0000    0000    ...   0000
Negative   Zero              1      0000000    0000    0000    0000    0000    ...   0000
           Smallest          1      0000000    0000    0000    0000    0000    ...   0001
           .                 .      .          .
           Largest           1      0000000    1001    1001    1001    1001    ...   1001
Decimal Integer Indefinite   1      1111111    1111    1111    UUUU*   UUUU    ...   UUUU
Widths: sign and 7 bits, 1 byte; magnitude, 9 bytes.
NOTE:
* UUUU means bit values are undefined and may contain any value.
The decimal integer format exists in memory only. When a decimal integer is loaded in a data register in the FPU, it is automatically converted to the extended-real format. All decimal integers are exactly representable in extended-real format.
The packed decimal indefinite encoding is stored by the FBSTP instruction in response to a masked floating-point invalid operation exception. Attempting to load this value with the FBLD instruction produces an undefined result.
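The packed BCD layout can be exercised with a short encoder and decoder. This Python sketch numbers the 10 bytes from 0 (lowest address) to 9, so the 18 digits occupy bytes 0 through 8 and the sign bit is the top bit of the last byte; the function names are hypothetical, and the sketch does not model the FBLD/FBSTP instructions themselves.

```python
def encode_packed_bcd(value):
    """Encode an integer of at most 18 decimal digits as 10 packed BCD bytes."""
    if abs(value) > 10**18 - 1:
        raise ValueError("magnitude needs more than 18 decimal digits")
    digits = f"{abs(value):018d}"              # index 0 is D17, index 17 is D0
    out = bytearray(10)
    for i in range(9):
        low = int(digits[17 - 2 * i])          # digit D(2i), lower half-byte
        high = int(digits[16 - 2 * i])         # digit D(2i+1), upper half-byte
        out[i] = (high << 4) | low
    out[9] = 0x80 if value < 0 else 0x00       # sign bit; low 7 bits are don't-care
    return bytes(out)


def decode_packed_bcd(raw):
    """Decode 10 packed BCD bytes into a Python integer."""
    if len(raw) != 10:
        raise ValueError("expected 10 bytes")
    magnitude = 0
    for byte in reversed(raw[:9]):             # most-significant digits first
        high, low = byte >> 4, byte & 0x0F
        if high > 9 or low > 9:
            raise ValueError("half-byte is not a BCD digit")
        magnitude = magnitude * 100 + high * 10 + low
    return -magnitude if raw[9] & 0x80 else magnitude


if __name__ == "__main__":
    raw = encode_packed_bcd(-1234)
    print(raw.hex())
    print(decode_packed_bcd(raw))
```

The decoder rejects half-bytes above 1001B, which is how it refuses the indefinite encoding (whose upper digits are 1111B) instead of returning an undefined value.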
7.4.4.
Unsupported Extended-Real Encodings
The extended-real format permits many encodings that do not fall into any of the categories shown in Table 7-9. Table 7-12 shows these unsupported encodings. Some of these encodings were supported by the Intel 287 math coprocessor; however, most of them are not supported by the Intel 387 math coprocessor, or the internal FPUs in the Intel486, Pentium, or Pentium Pro processors. These encodings are no longer supported due to changes made in the final version of IEEE Standard 754 that eliminated these encodings. The categories of encodings formerly known as pseudo-NaNs, pseudo-infinities, and un-normal numbers are not supported. The Intel 387 math coprocessor and the internal FPUs in the Intel486, Pentium, and Pentium Pro processors generate the invalid operation exception when they are encountered as operands. The encodings formerly known as pseudo-denormal numbers are not generated by the Intel 387 math coprocessor and the internal FPUs in the Intel486, Pentium, and Pentium Pro processors; however, they are used correctly when encountered as operands. The exponent is treated as if it were 00..01B and the mantissa is unchanged. The denormal exception is generated.
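A rough way to recognize these encodings is to examine the exponent field together with the explicit integer bit. The Python sketch below is an illustration under the assumption that the unsupported classes are characterized as in Table 7-12 (integer bit 0 with a non-zero exponent, or integer bit 1 with a zero exponent for pseudo-denormals); the function names are hypothetical.

```python
def make_extended(sign, exponent, integer_bit, fraction):
    """Assemble a 10-byte little-endian extended-real encoding from its fields."""
    value = (sign << 79) | (exponent << 64) | (integer_bit << 63) | fraction
    return value.to_bytes(10, "little")


def classify_extended(raw):
    """Name the unsupported class of an extended-real value, or 'supported'."""
    if len(raw) != 10:
        raise ValueError("expected 10 bytes")
    value = int.from_bytes(raw, "little")
    exponent = (value >> 64) & 0x7FFF
    integer_bit = (value >> 63) & 1
    fraction = value & ((1 << 63) - 1)
    if integer_bit == 0:
        if exponent == 0x7FFF:
            return "pseudo-infinity" if fraction == 0 else "pseudo-NaN"
        if exponent != 0:
            return "unnormal"
        return "supported"                 # zero or denormal
    if exponent == 0:
        return "pseudo-denormal"
    return "supported"                     # normal, infinity, or NaN


if __name__ == "__main__":
    print(classify_extended(make_extended(0, 0x3FFF, 1, 0)))   # 1.0
    print(classify_extended(make_extended(0, 0x3FFF, 0, 0)))
    print(classify_extended(make_extended(0, 0x0000, 1, 0)))
```

Of the four unsupported classes, only pseudo-denormals are still accepted as operands; the others raise the invalid operation exception.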
Table 7-12. Unsupported Extended-Real Encodings
Class                                   Sign   Biased Exponent    Integer   Fraction
Positive Pseudo-NaNs   Quiet            0      11..11             0         11..11 to 10..00
                       Signaling        0      11..11             0         01..11 to 00..01
Positive Reals         Pseudo-infinity  0      11..11             0         00..00
                       Unnormals        0      11..10 to 00..01   0         11..11 to 00..00
                       Pseudo-denormals 0      00..00             1         11..11 to 00..00
Negative Reals         Pseudo-denormals 1      00..00             1         11..11 to 00..00
                       Unnormals        1      11..10 to 00..01   0         11..01 to 00..00
                       Pseudo-infinity  1      11..11             0         00..00
Negative Pseudo-NaNs   Signaling        1      11..11             0         01..11 to 00..01
                       Quiet            1      11..11             0         11..11 to 10..00
Field widths: biased exponent, 15 bits; fraction, 63 bits.
7.5.
FPU INSTRUCTION SET
The floating-point instructions that the IA FPU supports can be grouped into six functional categories:
• Data transfer instructions
• Basic arithmetic instructions
• Comparison instructions
• Transcendental instructions
• Load constant instructions
• FPU control instructions
Refer to Section 6.2.3., Floating-Point Instructions in Chapter 6, Instruction Set Summary, for a list of the floating-point instructions by category. The following sections briefly describe the instructions in each category. Detailed descriptions of the floating-point instructions are given in Chapter 3, Instruction Set Reference, in the Intel Architecture Software Developer's Manual, Volume 2.
7.5.1.
Escape (ESC) Instructions
All of the instructions in the FPU instruction set fall into a class of instructions known as escape (ESC) instructions. All of these instructions have a common opcode format, which is slightly different from the format used by the integer and operating-system instructions.
7.5.2.
FPU Instruction Operands
Most floating-point instructions require one or two operands, located on the FPU data-register stack or in memory. (None of the floating-point instructions accept immediate operands.) When an operand is located in a data register, it is referenced relative to the ST(0) register (the register at the top of the register stack), rather than by a physical register number. Often the ST(0) register is an implied operand. Operands in memory can be referenced using the same operand addressing methods available for the integer and system instructions.
7.5.3.
Data Transfer Instructions
The data transfer instructions (refer to Table 7-13) perform the following operations:
• Load real, integer, or packed BCD operands from memory into the ST(0) register.
• Store the value in the ST(0) register in memory in real, integer, or packed BCD format.
• Move values between registers in the FPU register stack.
Table 7-13. Data Transfer Instructions
Real                                    Integer                          Packed Decimal
FLD       Load Real                     FILD    Load Integer             FBLD    Load Packed Decimal
FST       Store Real                    FIST    Store Integer
FSTP      Store Real and Pop            FISTP   Store Integer and Pop    FBSTP   Store Packed Decimal and Pop
FXCH      Exchange Register Contents
FCMOVcc   Conditional Move
Operands are normally stored in the FPU data registers in extended-real format (refer to Section 7.3.4.2., Precision Control Field). The FLD (load real) instruction pushes a real operand from memory onto the top of the FPU data-register stack. If the operand is in single- or double-real format, it is automatically converted to extended-real format. This instruction can also be used to push the value in a selected FPU data register onto the top of the register stack.
The FILD (load integer) instruction converts an integer operand in memory into extended-real format and pushes the value onto the top of the register stack. The FBLD (load packed decimal) instruction performs the same load operation for a packed BCD operand in memory.
The FST (store real) and FIST (store integer) instructions store the value in register ST(0) in memory in the destination format (real or integer, respectively). Again, the format conversion is carried out automatically.
The FSTP (store real and pop), FISTP (store integer and pop), and FBSTP (store packed decimal and pop) instructions store the value in the ST(0) register into memory in the destination format (real, integer, or packed BCD), then perform a pop operation on the register stack. A pop operation causes the ST(0) register to be marked empty and the stack pointer (TOP) in the FPU status word to be incremented by 1. The FSTP instruction can also be used to copy the value in the ST(0) register to another FPU register [ST(i)].
The FXCH (exchange register contents) instruction exchanges the value in a selected register in the stack [ST(i)] with the value in ST(0).
The FCMOVcc (conditional move) instructions move the value in a selected register in the stack [ST(i)] to register ST(0). These instructions move the value only if the conditions specified with a condition code (cc) are satisfied (refer to Table 7-14). The conditions being tested with the FCMOVcc instructions are represented by the status flags in the EFLAGS register. The condition code mnemonics are appended to the letters FCMOV to form the mnemonic for an FCMOVcc instruction.
Table 7-14. Floating-Point Conditional Move Instructions
Instruction Mnemonic    Status Flag States    Condition Description
FCMOVB                  CF=1                  Below
FCMOVNB                 CF=0                  Not below
FCMOVE                  ZF=1                  Equal
FCMOVNE                 ZF=0                  Not equal
FCMOVBE                 (CF or ZF)=1          Below or equal
FCMOVNBE                (CF or ZF)=0          Not below nor equal
FCMOVU                  PF=1                  Unordered
FCMOVNU                 PF=0                  Not unordered
Like the CMOVcc instructions, the FCMOVcc instructions are useful for optimizing small IF constructions. They also help eliminate branching overhead for IF operations and the possibility of branch mispredictions by the processor.
NOTE
The FCMOVcc instructions may not be supported on some processors in the Pentium Pro processor family. Software can check whether the FCMOVcc instructions are supported by checking the processor's feature information with the CPUID instruction (refer to "CPUID - CPU Identification" in Chapter 3, Instruction Set Reference, of the Intel Architecture Software Developer's Manual, Volume 2).
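As a rough illustration of the check described in the note, the sketch below tests feature bits in the EDX value that CPUID returns for function 1. It assumes the caller has already obtained that value by some other means (Python has no built-in CPUID access), and the function name fcmov_supported is hypothetical. The bit positions follow the CPUID feature-flag layout: FPU is bit 0 and CMOV is bit 15, and FCMOVcc is available when both are set.

```python
FPU_BIT = 1 << 0    # CPUID function 1, EDX bit 0: on-chip FPU present
CMOV_BIT = 1 << 15  # CPUID function 1, EDX bit 15: conditional moves present


def fcmov_supported(edx: int) -> bool:
    """Return True if both feature flags needed for FCMOVcc are set.

    edx is assumed to be the EDX value returned by CPUID with EAX = 1.
    """
    needed = FPU_BIT | CMOV_BIT
    return (edx & needed) == needed
```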
7.5.4. Load Constant Instructions
The following instructions push commonly used constants onto the top [ST(0)] of the FPU register stack:
FLDZ      Load +0.0
FLD1      Load +1.0
FLDPI     Load π
FLDL2T    Load log2 10
FLDL2E    Load log2 e
FLDLG2    Load log10 2
FLDLN2    Load loge 2
The constant values have full extended-real precision (64 bits) and are accurate to approximately 19 decimal digits. They are stored internally in a format more precise than extended real. When loading the constant, the FPU rounds the more precise internal constant according to the RC (rounding control) field of the FPU control word. Refer to Section 7.5.8., Pi, for information on the π constant.
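For reference, the values behind the load-constant mnemonics can be approximated in double precision as sketched below. This is only an illustration: the dictionary name LOAD_CONSTANTS is hypothetical, and Python floats carry 53 significand bits, not the 64 bits of the extended-real values the FPU actually pushes.

```python
import math

# Double-precision stand-ins for the constants pushed by each instruction.
# The FPU itself holds these to (better than) extended-real precision.
LOAD_CONSTANTS = {
    "FLDZ": 0.0,                  # +0.0
    "FLD1": 1.0,                  # +1.0
    "FLDPI": math.pi,             # pi
    "FLDL2T": math.log2(10.0),    # log2(10)
    "FLDL2E": math.log2(math.e),  # log2(e)
    "FLDLG2": math.log10(2.0),    # log10(2)
    "FLDLN2": math.log(2.0),      # loge(2)
}
```

The constants come in reciprocal pairs (log2 10 with log10 2, and log2 e with loge 2), which is what makes them convenient for base conversion.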
7.5.5. Basic Arithmetic Instructions
The following floating-point instructions perform basic arithmetic operations on real numbers. Where applicable, these instructions match IEEE Standard 754:
FADD/FADDP      Add real
FIADD           Add integer to real
FSUB/FSUBP      Subtract real
FISUB           Subtract integer from real
FSUBR/FSUBRP    Reverse subtract real
FISUBR          Reverse subtract real from integer
FMUL/FMULP      Multiply real
FIMUL           Multiply integer by real
FDIV/FDIVP      Divide real
FIDIV           Divide real by integer
FDIVR/FDIVRP    Reverse divide
FIDIVR          Reverse divide integer by real
FABS            Absolute value
FCHS            Change sign
FSQRT           Square root
FPREM           Partial remainder
FPREM1          IEEE partial remainder
FRNDINT         Round to integral value
FXTRACT         Extract exponent and significand
The add, subtract, multiply and divide instructions operate on the following types of operands:
• Two FPU register values.
• A register value and a real or integer value in memory.
Operands in memory can be in single-real, double-real, short-integer, or word-integer format. They are converted to extended-real format automatically.
Reverse versions of the subtract and divide instructions are provided to foster efficient coding. For example, the FSUB instruction subtracts the value in a specified FPU register [ST(i)] from the value in register ST(0), whereas the FSUBR instruction subtracts the value in ST(0) from the value in ST(i). The results of both operations are stored in register ST(0). These instructions eliminate the need to exchange values between register ST(0) and another FPU register to perform a subtraction or division.
The pop versions of the add, subtract, multiply and divide instructions pop the FPU register stack following the arithmetic operation.
The FPREM instruction computes the remainder from the division of two operands in the manner used by the Intel 8087 and Intel 287 math coprocessors; the FPREM1 instruction computes the remainder in the manner specified in the IEEE specification.
The FSQRT instruction computes the square root of the source operand.
The FRNDINT instruction rounds a real value to an integer value, according to the current rounding mode specified in the RC field of the FPU control word. This instruction performs a function similar to the FIST/FISTP instructions, except that the result is saved in a real format.
The FABS, FCHS, and FXTRACT instructions perform convenient arithmetic operations. The FABS instruction produces the absolute value of the source operand. The FCHS instruction changes the sign of the source operand. The FXTRACT instruction separates the source operand into its exponent and significand and stores each value in a register in real format.
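The difference between the two remainder conventions, and the exponent/significand split performed by FXTRACT, can be illustrated with standard library calls. This is a sketch under stated assumptions, not a model of the instructions: the function names are hypothetical, the iterative "partial" aspect of FPREM/FPREM1 is ignored, and fxtract_like assumes a finite, nonzero input.

```python
import math


def fprem_like(x: float, y: float) -> float:
    """Remainder with the quotient truncated toward zero (8087/287 style)."""
    return math.fmod(x, y)


def fprem1_like(x: float, y: float) -> float:
    """Remainder with the quotient rounded to nearest (IEEE 754 style)."""
    return math.remainder(x, y)


def fxtract_like(x: float):
    """Split a finite, nonzero x into (exponent, significand).

    The significand is scaled so that 1 <= |significand| < 2.
    """
    m, e = math.frexp(x)           # frexp returns 0.5 <= |m| < 1
    return float(e - 1), m * 2.0   # rescale to the 1 <= |m| < 2 convention
```

For 5.0 divided by 3.0, the truncating form leaves 2.0 (quotient 1), while the IEEE form leaves -1.0 (quotient rounded to 2).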
7.5.6. Comparison and Classification Instructions
The following instructions compare or classify real values:
FCOM/FCOMP/FCOMPP       Compare real and set FPU condition code flags.
FUCOM/FUCOMP/FUCOMPP    Unordered compare real and set FPU condition code flags.
FICOM/FICOMP            Compare integer and set FPU condition code flags.
FCOMI/FCOMIP            Compare real and set EFLAGS status flags.
FUCOMI/FUCOMIP          Unordered compare real and set EFLAGS status flags.
FTST                    Test (compare real with 0.0).
FXAM                    Examine.
Comparison of real values differs from comparison of integers because real values have four (rather than three) mutually exclusive relationships: less than, equal, greater than, and unordered. The unordered relationship is true when at least one of the two values being compared is a NaN or in an undefined format. This additional relationship is required because, by definition, NaNs are not numbers, so they cannot have less than, equal, or greater than relationships with other real values.
The FCOM, FCOMP, and FCOMPP instructions compare the value in register ST(0) with a real source operand and set the condition code flags (C0, C2, and C3) in the FPU status word according to the results (refer to Table 7-15). If an unordered condition is detected (one or both of the values is a NaN or in an undefined format), a floating-point invalid operation exception is generated. The pop versions of the instruction pop the FPU register stack once or twice after the comparison operation is complete.
The FUCOM, FUCOMP, and FUCOMPP instructions operate the same as the FCOM, FCOMP, and FCOMPP instructions. The only difference is that with the FUCOM, FUCOMP, and FUCOMPP instructions, if an unordered condition is detected because one or both of the operands is a QNaN, the floating-point invalid operation exception is not generated.
Table 7-15. Setting of FPU Condition Code Flags for Real Number Comparisons
Condition                  C3    C2    C0
ST(0) > Source Operand     0     0     0
ST(0) < Source Operand     0     0     1
ST(0) = Source Operand     1     0     0
Unordered                  1     1     1
The FICOM and FICOMP instructions also operate the same as the FCOM and FCOMP instructions, except that the source operand is an integer value in memory. The integer value is automatically converted into an extended-real value prior to making the comparison. The FICOMP instruction pops the FPU register stack following the comparison operation.
The FTST instruction performs the same operation as the FCOM instruction, except that the value in register ST(0) is always compared with the value 0.0.
The FCOMI and FCOMIP instructions are new in the Intel Pentium Pro processor. They perform the same comparison as the FCOM and FCOMP instructions, except that they set the status flags (ZF, PF, and CF) in the EFLAGS register to indicate the results of the comparison (refer to Table 7-16) instead of the FPU condition code flags. The FCOMI and FCOMIP instructions allow conditional branch instructions (Jcc) to be executed directly from the results of their comparison.
Table 7-16. Setting of EFLAGS Status Flags for Real Number Comparisons
Comparison Results    ZF    PF    CF
ST(0) > ST(i)         0     0     0
ST(0) < ST(i)         0     0     1
ST(0) = ST(i)         1     0     0
Unordered             1     1     1
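The flag settings in Table 7-16 can be modeled in a few lines. The sketch below is illustrative only; the function names are hypothetical, and a Python float NaN stands in for any operand that would produce the unordered result.

```python
import math


def fcomi_flags(st0: float, sti: float) -> dict:
    """Return the ZF, PF, CF values Table 7-16 lists for comparing st0 with sti."""
    if math.isnan(st0) or math.isnan(sti):
        return {"ZF": 1, "PF": 1, "CF": 1}   # unordered
    if st0 > sti:
        return {"ZF": 0, "PF": 0, "CF": 0}
    if st0 < sti:
        return {"ZF": 0, "PF": 0, "CF": 1}
    return {"ZF": 1, "PF": 0, "CF": 0}       # equal


def jb_taken(flags: dict) -> bool:
    """JB (jump if below) tests CF = 1, so it is also taken when unordered."""
    return flags["CF"] == 1
```

Because the unordered row sets all three flags, code that branches with JB or JE after the comparison would normally test PF first (for example with JP) if NaN operands are possible.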
The FUCOMI and FUCOMIP instructions operate the same as the FCOMI and FCOMIP instructions, except that they do not generate a floating-point invalid operation exception if the unordered condition is the result of one or both of the operands being a QNaN. The FCOMIP and FUCOMIP instructions pop the FPU register stack following the comparison operation.
The FXAM instruction determines the classification of the real value in the ST(0) register (that is, whether the value is zero, a denormal number, a normal finite number, ∞, a NaN, or an unsupported format) or that the register is empty. It sets the FPU condition code flags to indicate the classification (refer to "FXAM - Examine" in Chapter 3, Instruction Set Reference, of the Intel Architecture Software Developer's Manual, Volume 2). It also sets the C1 flag to indicate the sign of the value.
7.5.6.1. BRANCHING ON THE FPU CONDITION CODES
The processor does not offer any control-flow instructions that branch on the setting of the condition code flags (C0, C2, and C3) in the FPU status word. To branch on the state of these flags, the FPU status word must first be moved to the AX register in the integer unit. The FSTSW AX (store status word) instruction can be used for this purpose. When these flags are in the AX register, the TEST instruction can be used to control conditional branching as follows:
1. Check for an unordered result. Use the TEST instruction to compare the contents of the AX register with the constant 0400H (refer to Table 7-17). This operation will clear the ZF flag in the EFLAGS register if the condition code flags indicate an unordered result; otherwise, the ZF flag will be set. The JNZ instruction can then be used to transfer control (if necessary) to a procedure for handling unordered operands.
Table 7-17. TEST Instruction Constants for Conditional Branching
Order                      Constant    Branch
ST(0) > Source Operand     4500H       JZ
ST(0) < Source Operand     0100H       JNZ
ST(0) = Source Operand     4000H       JNZ
Unordered                  0400H       JNZ
2. Check ordered comparison result. Use the constants given in Table 7-17 in the TEST instruction to test for a less than, equal to, or greater than result, then use the corresponding conditional branch instruction to transfer program control to the appropriate procedure or section of code.
If a program or procedure has been thoroughly tested and it incorporates periodic checks for QNaN results, then it is not necessary to check for the unordered result every time a comparison is made.
Refer to Section 7.3.3., Branching and Conditional Moves on FPU Condition Codes, for another technique for branching on FPU condition codes.
Some non-comparison FPU instructions update the condition code flags in the FPU status word. To ensure that the status word is not altered inadvertently, store it immediately following a comparison operation.
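The two-step procedure above can be sketched as a simulation of the TEST/Jcc sequence. The value ax is assumed to be the status word as stored by FSTSW AX, with C0, C2, and C3 in bits 8, 10, and 14; the function name classify_status is hypothetical.

```python
def classify_status(ax: int) -> str:
    """Mimic the TEST/Jcc sequence using the constants in Table 7-17."""
    if ax & 0x0400:          # TEST AX, 0400H then JNZ: C2 set, unordered
        return "unordered"
    if (ax & 0x4500) == 0:   # TEST AX, 4500H then JZ: C3 = C2 = C0 = 0
        return "greater"
    if ax & 0x0100:          # TEST AX, 0100H then JNZ: C0 set
        return "less"
    return "equal"           # only C3 (4000H) can remain set here
```

The unordered test comes first because the unordered result also sets C0 and C3, so testing 0100H or 4000H alone would misreport it as "less" or "equal". Other status word bits (such as the TOP field) do not disturb the result, since each TEST constant masks them out.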
7.5.7. Trigonometric Instructions
The following instructions perform four common trigonometric functions:
FSIN       Sine
FCOS       Cosine
FSINCOS    Sine and cosine
FPTAN      Tangent
FPATAN     Arctangent
These instructions operate on the top one or two registers of the FPU register stack and they return their results to the stack. The source operands must be given in radians. The FSINCOS instruction returns both the sine and the cosine of a source operand value. It operates faster than executing the FSIN and FCOS instructions in succession. The FPATAN instruction computes the arctangent of ST(1) divided by ST(0). It is useful for converting rectangular coordinates to polar coordinates.
7.5.8. Pi
When the argument (source operand) of a trigonometric function is within the range of the function, the argument is automatically reduced by the appropriate multiple of 2π through the same reduction mechanism used by the FPREM and FPREM1 instructions. The internal value of π that the IA FPU uses for argument reduction and other computations is as follows:
π = 0.f * 2^2
where:
f = C90FDAA2 2168C234 C
(The spaces in the fraction above indicate 32-bit boundaries.)
This internal value has a 66-bit mantissa, which is 2 bits more than is allowed in the significand of an extended-real value. (Since 66 bits is not an even number of hexadecimal digits, two additional zeros have been added to the value so that it can be represented in hexadecimal format. The least-significant hexadecimal digit (C) is thus 1100B, where the two least-significant bits represent bits 67 and 68 of the mantissa.)
This value of π has been chosen to guarantee no loss of significance in a source operand, provided the operand is within the specified range for the instruction.
If the results of computations that explicitly use π are to be used in the FSIN, FCOS, FSINCOS, or FPTAN instructions, the full 66-bit fraction of π should be used. This ensures that the results are consistent with the argument-reduction algorithms that these instructions use. Using a rounded version of π can cause inaccuracies in result values, which, if propagated through several calculations, might lead to meaningless results.
A common method of representing the full 66-bit fraction of π is to separate the value into two numbers (high and low) that, when added together, give the value for π shown earlier in this section with the full 66-bit fraction:
π = high + low
For example, the following two values (given in scientific notation with the fraction in hexadecimal and the exponent in decimal) represent the 33 most-significant and the 33 least-significant bits of the fraction:
high (unnormalized) = 0.C90FDAA20 * 2^+2
low (unnormalized) = 0.42D184698 * 2^-31
These values encoded in standard IEEE double-real format are as follows:
high = 400921FB 54400000
low = 3DE0B461 1A600000
(Note that in the IEEE double-real format, the exponents are biased (by 1023) and the fractions are normalized.) Similar versions of π can also be written in extended-real format.
When using this two-part π value in an algorithm, parallel computations should be performed on each part, with the results kept separate. When all the computations are complete, the two results can be added together to form the final result.
The complications of maintaining a consistent value of π for argument reduction can be avoided, either by applying the trigonometric functions only to arguments within the range of the automatic reduction mechanism, or by performing all argument reductions (down to a magnitude less than π/4) explicitly in software.
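The double-real encodings of the high and low parts can be checked against the 66-bit fraction with exact rational arithmetic. The sketch below decodes the two hexadecimal patterns given in this section; the helper name double_from_hex and the variable names are hypothetical.

```python
import math
import struct
from fractions import Fraction


def double_from_hex(hex_digits: str) -> float:
    """Decode 16 hexadecimal digits as a big-endian IEEE double."""
    return struct.unpack(">d", bytes.fromhex(hex_digits))[0]


high = double_from_hex("400921FB54400000")
low = double_from_hex("3DE0B4611A600000")

# pi = 0.f * 2^2 with f = C90FDAA2 2168C234 C (17 hex digits, 68 bits).
pi_66 = Fraction(0xC90FDAA22168C234C, 2**68) * 4

# Fraction(float) is exact, so this sum carries all 66 significant bits.
exact_sum = Fraction(high) + Fraction(low)
```

Here exact_sum equals pi_66, whereas the floating-point sum high + low is rounded back to 53 bits and equals math.pi. That rounding is why the two parts are carried through a computation separately and combined only at the end.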
7.5.9. Logarithmic, Exponential, and Scale
The following instructions provide two different logarithmic functions, an exponential function, and a scale function.
FYL2X      Compute log: (y * log2 x)
FYL2XP1    Compute log epsilon: (y * log2(x + 1))
F2XM1      Compute exponential: (2^x - 1)
FSCALE     Scale
The FYL2X and FYL2XP1 instructions perform two different base 2 logarithmic operations. The FYL2X instruction computes y * log2(x). This operation permits the calculation of the log of any base using the following equation:
log_b x = (1 / log2 b) * log2 x
The FYL2XP1 instruction computes the log epsilon, y * log2(x + 1). This operation provides optimum accuracy for values of x that may be very close to 0.
The F2XM1 instruction computes the exponential (2^x - 1). This instruction only operates on source values in the range -1.0 to +1.0.
The FSCALE instruction multiplies the source operand by a power of 2.
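The base-change identity and the reason for a separate log(x + 1) form can be seen in a short double-precision sketch. The function names mirror the mnemonics for readability only; these are ordinary library computations, not models of the instructions' accuracy or operand handling.

```python
import math


def fyl2x(x: float, y: float) -> float:
    """y * log2(x), the quantity FYL2X computes."""
    return y * math.log2(x)


def fyl2xp1(x: float, y: float) -> float:
    """y * log2(x + 1), evaluated without forming x + 1 explicitly."""
    return y * math.log1p(x) / math.log(2.0)


def f2xm1(x: float) -> float:
    """2**x - 1 for source values in the range -1.0 to +1.0."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("source value outside -1.0 to +1.0")
    return math.expm1(x * math.log(2.0))


def log_base(b: float, x: float) -> float:
    """log_b(x), obtained by passing y = 1 / log2(b) to fyl2x."""
    return fyl2x(x, 1.0 / math.log2(b))
```

For x = 1e-20, forming 1.0 + x first rounds to exactly 1.0 and the logarithm collapses to 0, while the log(x + 1) form still returns a small positive value.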
7.5.10. Transcendental Instruction Accuracy

The algorithms that the Pentium and Pentium Pro processors use for the transcendental instructions (FSIN, FCOS, FSINCOS, FPTAN, FPATAN, F2XM1, FYL2X, and FYL2XP1) allow a higher level of accuracy than was possible in earlier IA math coprocessors and FPUs. The accuracy of these instructions is measured in terms of units in the last place (ulp). For a given argument x, let f(x) and F(x) be the correct and computed (approximate) function values, respectively. The error in ulps is defined to be:
$$\text{error} = \left| \frac{f(x) - F(x)}{2^{k-63}} \right|$$
where k is an integer such that $1 \le 2^{-k} f(x) < 2$.
With the Pentium and Pentium Pro processors, the worst-case error in the transcendental instructions is less than 1 ulp when rounding to nearest and less than 1.5 ulps when rounding in other modes. (The FYL2X and FYL2XP1 instructions are two-operand instructions and are guaranteed to be within 1 ulp only when y = 1. When y != 1, the maximum ulp error is always within 1.35 ulps in round-to-nearest mode. The trigonometric instructions may use a 66-bit approximation to the true value of π to reduce the magnitude of the input argument. In this case, the final computed result can vary considerably from the true mathematically precise result.) The instructions are guaranteed to be monotonic, with respect to the input operands, throughout the domain supported by the instruction. (For the two-operand functions, monotonicity was proved by holding one of the operands constant.)
With the Intel486 processor and Intel 387 math coprocessor, the worst-case transcendental-function error is typically 3 or 3.5 ulps, but is sometimes as large as 4.5 ulps.
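The ulp error measure defined in this section can be evaluated exactly with rational arithmetic. In the sketch below the function name ulp_error is hypothetical, and taking the magnitude of the reference value when choosing k (so that negative values are handled) is an assumption of this sketch.

```python
from fractions import Fraction


def ulp_error(correct: Fraction, computed: Fraction) -> Fraction:
    """Error in ulps of a 64-bit significand: |f - F| / 2**(k - 63).

    k is chosen so that 1 <= 2**-k * |correct| < 2.
    """
    mag = abs(correct)
    if mag == 0:
        raise ValueError("k is undefined for a zero reference value")
    # The bit-length difference is floor(log2(mag)) or one more; fix it up.
    k = mag.numerator.bit_length() - mag.denominator.bit_length()
    if Fraction(2) ** k > mag:
        k -= 1
    return abs(correct - computed) / Fraction(2) ** (k - 63)
```

For a reference value of 1 (k = 0), a computed value that is off by 2^-63 is in error by exactly 1 ulp; for a reference value of 3 (k = 1), 1 ulp corresponds to 2^-62.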
7.5.11. FPU Control Instructions

The following instructions control the state and modes of operation of the FPU. They also allow the status of the FPU to be examined:
FINIT/FNINIT      Initialize FPU
FLDCW             Load FPU control word
FSTCW/FNSTCW      Store FPU control word
FSTSW/FNSTSW      Store FPU status word
FCLEX/FNCLEX      Clear FPU exception flags
FLDENV            Load FPU environment
FSTENV/FNSTENV    Store FPU environment
FRSTOR            Restore FPU state
FSAVE/FNSAVE      Save FPU state
FINCSTP           Increment FPU register stack pointer
FDECSTP           Decrement FPU register stack pointer
FFREE             Free FPU register
FNOP              No operation
WAIT/FWAIT        Check for and handle pending unmasked FPU exceptions
The FINIT/FNINIT instructions initialize the FPU and its internal registers to default values.
The FLDCW instruction loads the FPU control word register with a value from memory.
The FSTCW/FNSTCW and FSTSW/FNSTSW instructions store the FPU control and status words, respectively, in memory (or, for an FSTSW/FNSTSW instruction, in a general-purpose register).
The FSTENV/FNSTENV and FSAVE/FNSAVE instructions save the FPU environment and state, respectively, in memory. The FPU environment includes all the FPU's control and status registers; the FPU state includes the FPU environment and the data registers in the FPU register stack. (The FSAVE/FNSAVE instruction also initializes the FPU to default values, like the FINIT/FNINIT instruction, after it saves the original state of the FPU.)
The FLDENV and FRSTOR instructions load the FPU environment and state, respectively, from memory into the FPU. These instructions are commonly used when switching tasks or contexts.
The WAIT/FWAIT instructions are synchronization instructions. (They are actually mnemonics for the same opcode.) These instructions check the FPU status word for pending unmasked FPU exceptions. If any pending unmasked FPU exceptions are found, they are handled before the processor resumes execution of the instructions (integer, floating-point, or system instruction) in the instruction stream. The WAIT/FWAIT instructions are provided to allow synchronization of instruction execution between the FPU and the processor's integer unit. Refer to Section 7.9., Floating-Point Exception Synchronization, for more information on the use of the WAIT/FWAIT instructions.
7.5.12. Waiting Vs. Non-waiting Instructions

All of the floating-point instructions except a few special control instructions perform a wait operation (similar to the WAIT/FWAIT instructions), to check for and handle pending unmasked FPU exceptions, before they perform their primary operation (such as adding two real numbers). These instructions are called waiting instructions. Some of the FPU control instructions, such as FSTSW/FNSTSW, have both waiting and non-waiting versions. The waiting version (with the F prefix) executes a wait operation before it performs its primary operation, whereas the non-waiting version (with the FN prefix) ignores pending unmasked exceptions. Non-waiting instructions allow software to save the current FPU state without first handling pending exceptions or to reset or reinitialize the FPU without regard for pending exceptions.
NOTE
When operating a Pentium or Intel486 processor in MS-DOS compatibility mode, it is possible (under unusual circumstances) for a non-waiting instruction to be interrupted prior to being executed to handle a pending FPU exception. The circumstances where this can happen and the resulting action of the processor are described in Section E.2.1.3., No-Wait FPU Instructions Can Get FPU Interrupt in Window, in Appendix E, Guidelines for Writing FPU Exceptions Handlers. When operating a Pentium Pro processor in MS-DOS compatibility mode, non-waiting instructions cannot be interrupted in this way (refer to Section E.2.2., MS-DOS Compatibility Mode in the P6 Family Processors, in Appendix E, Guidelines for Writing FPU Exceptions Handlers).
7.5.13. Unsupported FPU Instructions

The Intel 8087 instructions FENI and FDISI and the Intel 287 math coprocessor instruction FSETPM perform no function in the Intel 387 math coprocessor, or the Intel486, Pentium, or Pentium Pro processors. If these opcodes are detected in the instruction stream, the FPU performs no specific operation and no internal FPU states are affected.
7.6. OPERATING ON NANS
As was described in Section 7.2.3.4., NaNs, the FPU supports two types of NaNs: SNaNs and QNaNs. An SNaN is any NaN value with its most-significant fraction bit set to 0 and at least one other fraction bit set to 1. (If all the fraction bits are set to 0, the value is an ∞.) A QNaN is any NaN value with the most-significant fraction bit set to 1. The sign bit of a NaN is not interpreted.
As a general rule, when a QNaN is used in one or more arithmetic floating-point instructions, it is allowed to propagate through a computation. An SNaN, on the other hand, causes a floating-point invalid operation exception to be signaled. SNaNs are typically used to trap or invoke an exception handler. They must be inserted by software; that is, the FPU never generates an SNaN as a result.
The floating-point invalid operation exception has a flag and a mask bit associated with it in the FPU status and control registers, respectively (refer to Section 7.7., Floating-Point Exception Handling). The mask bit determines how the FPU handles an SNaN value. If the floating-point invalid operation mask bit is set, the SNaN is converted to a QNaN by setting the most-significant fraction bit of the value to 1. The result is then stored in the destination operand and the floating-point invalid operation flag is set. If the invalid operation mask is clear, a floating-point invalid operation fault is signaled and no result is stored in the destination operand.
When a real operation or exception delivers a QNaN result, the value of the result depends on the source operands, as shown in Table 7-18.
Except for the rules given at the beginning of this section for encoding SNaNs and QNaNs, software is free to use the bits in the significand of a NaN for any purpose. Both SNaNs and QNaNs can be encoded to carry and store data, such as diagnostic information.
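The encoding rules for SNaNs and QNaNs, and the masked response that quiets an SNaN, can be sketched on raw bit patterns. For brevity this sketch assumes the double-real format (11 exponent bits, 52 fraction bits), not the extended-real format the FPU registers hold; the constant and function names are hypothetical. It works on integers so that no floating-point operation can alter the patterns.

```python
EXP_MASK = 0x7FF0_0000_0000_0000    # 11 exponent bits of a double
FRAC_MASK = 0x000F_FFFF_FFFF_FFFF   # 52 fraction bits
QUIET_BIT = 0x0008_0000_0000_0000   # most-significant fraction bit


def classify_bits(bits: int) -> str:
    """Classify a 64-bit double-real pattern; the sign bit is ignored."""
    if (bits & EXP_MASK) != EXP_MASK:
        return "finite"
    frac = bits & FRAC_MASK
    if frac == 0:
        return "infinity"
    return "QNaN" if frac & QUIET_BIT else "SNaN"


def quiet(bits: int) -> int:
    """Convert an SNaN to a QNaN by setting the most-significant fraction bit."""
    return bits | QUIET_BIT
```

Note that quiet() leaves the remaining fraction bits unchanged, so any data software has stored in them survives the conversion.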
Table 7-18. Rules for Generating QNaNs

Source Operands                           QNaN Result
An SNaN and a QNaN.                       The QNaN source operand.
Two SNaNs.                                The SNaN with the larger significand converted into a QNaN.
Two QNaNs.                                The QNaN with the larger significand.
An SNaN and a real value.                 The SNaN converted into a QNaN.
A QNaN and a real value.                  The QNaN source operand.
Neither source operand is a NaN and a     The default QNaN real indefinite.
floating-point invalid operation
exception is signaled.
7.6.1. Operating on NaNs with Streaming SIMD Extensions
The information presented in Section 7.6., Operating on NaNs, is applicable to the floating-point operations in the Streaming SIMD Extensions which operate on data in the floating-point registers. Specific differences are noted in this section.
The invalid operation exception has a flag and a mask bit associated with it in MXCSR. The mask bit determines how an SNaN value is handled. If the invalid operation mask bit is set, the SNaN is converted to a QNaN by setting the most-significant fraction bit of the value to 1. The result is then stored in the destination operand and the invalid operation flag is set. If the invalid operation mask is clear, an invalid operation fault is signaled and no result is stored in the destination operand.
When a real operation or exception delivers a QNaN result, the value of the result depends on the source operands, as shown in Table 7-19. The exceptions to the behavior described in Table 7-19 are the MINPS and MAXPS instructions. If only one source is a NaN for these instructions, the Src2 operand (either NaN or real value) is written to the result; this differs from the behavior for other instructions as defined in Table 7-19, which is to always write the NaN to the result, regardless of which source operand contains the NaN. This approach for MINPS/MAXPS allows NaN data to be screened out of the bounds-checking portion of an algorithm. If, instead of this behavior, it is required that the NaN source operand be returned, the min/max functionality can be emulated using a sequence of instructions: a comparison followed by AND, ANDN, and OR.
In general, Src1 and Src2 relate to a Streaming SIMD Extensions instruction as follows:
ADDPS Src1, Src2/m128
Except for the rules given at the beginning of this section for encoding SNaNs and QNaNs, software is free to use the bits in the significand of a NaN for any purpose. Both SNaNs and QNaNs can be encoded to carry and store data, such as diagnostic information.
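The MINPS rule described in this section, where Src2 is written whenever only one source is a NaN, follows from a simple compare-and-select. The sketch below models one lane with Python floats; both function names are hypothetical, and nan_propagating_min is a plain conditional stand-in for the NaN-returning behavior, not the compare/AND/ANDN/OR instruction sequence itself.

```python
import math


def minps_lane(src1: float, src2: float) -> float:
    """One lane of MINPS: Src2 is returned whenever src1 < src2 is false.

    Any comparison involving a NaN is false, so Src2 is written in that case.
    """
    return src1 if src1 < src2 else src2


def nan_propagating_min(src1: float, src2: float) -> float:
    """Variant that hands back a NaN operand when either source is a NaN."""
    if math.isnan(src1):
        return src1
    if math.isnan(src2):
        return src2
    return minps_lane(src1, src2)
```

With minps_lane, a NaN in Src1 is screened out (the real value in Src2 is returned), while a NaN in Src2 is passed through.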
Table 7-19. Results of Operations with NaN Operands

Source Operands                           NaN Result (invalid operation exception is masked)
An SNaN and a QNaN.                       Src1 NaN (converted to QNaN if Src1 is an SNaN).
Two SNaNs.                                Src1 NaN (converted to QNaN).
Two QNaNs.                                Src1 QNaN.
An SNaN and a real value.                 The SNaN converted into a QNaN.
A QNaN and a real value.                  The QNaN source operand.
An SNaN/QNaN value (for instructions      The SNaN converted into a QNaN/the source QNaN.
which take only one operand, i.e.,
RCPPS, RCPSS, RSQRTPS, RSQRTSS).
Neither source operand is a NaN and a     The default QNaN real indefinite.
floating-point invalid operation
exception is signaled.
7.6.2. Uses for Signaling NaNs
By unmasking the invalid operation exception, the programmer can use signaling NaNs to trap to the exception handler. The generality of this approach and the large number of NaN values that are available provide the sophisticated programmer with a tool that can be applied to a variety of special situations.
For example, a compiler can use signaling NaNs as references to uninitialized (real) array elements. The compiler can preinitialize each array element with a signaling NaN whose significand contains the index (relative position) of the element. Then, if an application program attempts to access an element that it has not initialized, it will use the NaN placed there by the compiler. If the invalid operation exception is unmasked, an interrupt will occur, and the exception handler will be invoked. The exception handler can determine which element has been accessed, since the operand address field of the exception pointers will point to the NaN, and the NaN will contain the index number of the array element.
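The array-preinitialization idea can be sketched at the bit level. This example assumes the double-real format and stores index + 1 in the fraction so that element 0 does not collapse into the infinity encoding; that offset, the 51-bit payload limit, and all names are choices of this sketch, not something the text prescribes. It only builds and decodes the patterns; trapping on access is left to the hardware and its exception handler.

```python
import struct

SNAN_BASE = 0x7FF0_0000_0000_0000   # exponent all ones, quiet bit clear
PAYLOAD_MASK = (1 << 51) - 1        # fraction bits below the quiet bit


def snan_bits_for_index(index: int) -> int:
    """Build a signaling-NaN pattern whose fraction carries index + 1."""
    payload = index + 1             # +1 keeps the fraction from being all zeros
    if not 0 < payload <= PAYLOAD_MASK:
        raise ValueError("index does not fit in the SNaN payload")
    return SNAN_BASE | payload


def index_from_snan_bits(bits: int) -> int:
    """Recover the element index, as an exception handler might."""
    return (bits & PAYLOAD_MASK) - 1


def uninitialized_array(n: int) -> bytes:
    """Raw little-endian image of n doubles, each preset to its own SNaN."""
    return b"".join(struct.pack("<Q", snan_bits_for_index(i)) for i in range(n))
```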
7.6.3. Uses for Quiet NaNs
Quiet NaNs are often used to speed up debugging. In its early testing phase, a program often contains multiple errors. An exception handler can be written to save diagnostic information in memory whenever it is invoked. After storing the diagnostic data, it can supply a quiet NaN as the result of the erroneous instruction, and that NaN can point to its associated diagnostic area in memory. The program will then continue, creating a different NaN for each error. When the program ends, the NaN results can be used to access the diagnostic data saved at the time the errors occurred. Many errors can thus be diagnosed and corrected in one test run.
In embedded applications which use computed results in further computations, an undetected QNaN can invalidate all subsequent results. Such applications should therefore periodically check for QNaNs and provide a recovery mechanism to be used if a QNaN result is detected.
7.7. FLOATING-POINT EXCEPTION HANDLING
The FPU detects six classes of exception conditions while executing floating-point instructions:
• Invalid operation (#I)
  - Stack overflow or underflow (#IS)
  - Invalid arithmetic operation (#IA)
• Divide-by-zero (#Z)
• Denormalized operand (#D)
• Numeric overflow (#O)
• Numeric underflow (#U)
• Inexact result (precision) (#P)
The nomenclature of a # symbol followed by one or two letters (for example, #IS) is used in this manual to indicate exception conditions. It is merely a shorthand form and is not related to assembler mnemonics.
Each of the six exception classes has a corresponding flag bit in the FPU status word and a mask bit in the FPU control word (refer to Section 7.3.2., FPU Status Register, and Section 7.3.4., FPU Control Word, respectively). In addition, the exception summary (ES) flag in the status word indicates when any of the exceptions has been detected, and the stack fault (SF) flag (also in the status word) distinguishes between the two types of invalid operation exceptions.
When the FPU detects a floating-point exception, it sets the appropriate flags in the FPU status word, then takes one of two possible courses of action:
• Handles the exception automatically, producing a predefined (and often usable) result, while allowing program execution to continue undisturbed.
• Invokes a software exception handler to handle the exception.
The following sections describe how the FPU handles exceptions (either automatically or by calling a software exception handler), how the FPU detects the various floating-point exceptions, and the automatic (masked) response to the floating-point exceptions.
7.7.1. Arithmetic vs. Non-arithmetic Instructions
When dealing with floating-point exceptions, it is useful to distinguish between arithmetic instructions and non-arithmetic instructions. Non-arithmetic instructions have no operands or do not make substantial changes to their operands. Arithmetic instructions do make significant changes to their operands; in particular, they make changes that could result in a floating-point exception being signaled. Table 7-20 lists the non-arithmetic and arithmetic instructions. It should be noted that some non-arithmetic instructions can signal a floating-point stack (fault) exception, but this exception is not the result of an operation on an operand.
7.7.2. Automatic Exception Handling
If the FPU detects an exception condition for a masked exception (an exception with its mask bit set), it sets the exception flag for the exception and delivers a predefined (default) response and continues executing instructions. The masked (default) responses to exceptions have been chosen to deliver a reasonable result for each exception condition and are generally satisfactory for most floating-point applications. By masking or unmasking specific floating-point exceptions in the FPU control word, programmers can delegate responsibility for most exceptions to the FPU and reserve the most severe exception conditions for software exception handlers. Because the exception flags are sticky, they provide a cumulative record of the exceptions that have occurred since they were last cleared. A programmer can thus mask all exceptions, run a calculation, and then inspect the exception flags to see if any exceptions were detected during the calculation.
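The "mask everything, compute, then inspect the sticky flags" pattern can be sketched as a decoder for stored status and control words. The sketch assumes the usual layout in which the six exception flags occupy bits 0 through 5 of the status word (IE, DE, ZE, OE, UE, PE) and the corresponding mask bits occupy the same positions in the control word; the flag abbreviations and function names are labels of this sketch.

```python
FLAG_NAMES = ["IE", "DE", "ZE", "OE", "UE", "PE"]   # status word bits 0-5
SF_BIT = 1 << 6    # stack fault flag
ES_BIT = 1 << 7    # exception summary flag


def flags_set(status_word: int) -> list:
    """Names of the sticky exception flags currently set."""
    return [name for bit, name in enumerate(FLAG_NAMES)
            if status_word & (1 << bit)]


def unmasked_pending(status_word: int, control_word: int) -> list:
    """Flags that are set in the status word and unmasked in the control word."""
    pending = status_word & ~control_word & 0x3F
    return flags_set(pending)
```

With a control word of 037FH (all six exceptions masked), a status word of 0005H reports IE and ZE as having occurred but nothing pending for a handler; clearing mask bit 2 (control word 037BH) makes ZE an unmasked exception.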
Table 7-20. Arithmetic and Non-arithmetic Instructions

Non-arithmetic Instructions             Arithmetic Instructions
FABS                                    F2XM1
FCHS                                    FADD/FADDP
FCLEX                                   FBLD
FDECSTP                                 FBSTP
FFREE                                   FCOM/FCOMP/FCOMPP
FINCSTP                                 FCOS
FINIT/FNINIT                            FDIV/FDIVP/FDIVR/FDIVRP
FLD (register-to-register)              FIADD
FLD (extended format from memory)       FICOM/FICOMP
FLD constant                            FIDIV/FIDIVR
FLDCW                                   FILD
FLDENV                                  FIMUL
FNOP                                    FIST/FISTP
FRSTOR                                  FISUB/FISUBR
FSAVE/FNSAVE                            FLD (conversion)
FST/FSTP (register-to-register)         FMUL/FMULP
FSTP (extended format to memory)        FPATAN
FSTCW/FNSTCW                            FPREM/FPREM1
FSTENV/FNSTENV                          FPTAN
FSTSW/FNSTSW                            FRNDINT
WAIT/FWAIT                              FSCALE
FXAM                                    FSIN
FXCH                                    FSINCOS
                                        FSQRT
                                        FST/FSTP (conversion)
                                        FSUB/FSUBP/FSUBR/FSUBRP
                                        FTST
                                        FUCOM/FUCOMP/FUCOMPP
                                        FXTRACT
                                        FYL2X/FYL2XP1
Note that when exceptions are masked, the FPU may detect multiple exceptions in a single instruction, because it continues executing the instruction after performing its masked response. For example, the FPU can detect a denormalized operand, perform its masked response to this exception, and then detect numeric underflow.
7.7.3.
Software Exception Handling
The FPU in the Pentium Pro, Pentium, and Intel486 processors provides two different modes of operation for invoking a software exception handler for floating-point exceptions: native mode and MS-DOS compatibility mode. The mode of operation is selected with the NE flag of control register CR0. (Refer to Chapter 2, System Architecture Overview, in the Intel Architecture Software Developer's Manual, Volume 3, for more information about the NE flag.)
7.7.3.1. NATIVE MODE
The native mode for handling floating-point exceptions is selected by setting the NE flag in control register CR0 to 1. In this mode, if the FPU detects an exception condition while executing a floating-point instruction and the exception is unmasked (the mask bit for the exception is cleared), the FPU sets the flag for the exception and the ES flag in the FPU status word. It then invokes the software exception handler through the floating-point-error exception (#MF, vector 16), immediately before execution of any of the following instructions in the processor's instruction stream:
• The next floating-point instruction, unless it is one of the non-waiting instructions (FNINIT, FNCLEX, FNSTSW, FNSTCW, FNSTENV, and FNSAVE).
• The next WAIT/FWAIT instruction.
• The next MMX instruction.
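The delivery rule in this list can be sketched as a small model. The Python below is only an illustration: it scans the mnemonics that follow a faulting instruction and reports the position before which the handler would be invoked. The function names, the short list of MMX mnemonics, and the shortcut of treating every other mnemonic that starts with "F" as a waiting floating-point instruction are assumptions of the sketch.

```python
# Non-waiting instructions execute without checking for pending exceptions.
NON_WAITING = {"FNINIT", "FNCLEX", "FNSTSW", "FNSTCW", "FNSTENV", "FNSAVE"}

# A few MMX mnemonics, listed only so the model can recognize them.
MMX_EXAMPLES = {"MOVQ", "PADDB", "EMMS"}

def triggers_handler(mnemonic):
    """True if a pending unmasked exception is delivered before this instruction."""
    m = mnemonic.upper()
    if m in ("WAIT", "FWAIT") or m in MMX_EXAMPLES:
        return True
    # Model assumption: any other "F..." mnemonic is a waiting FPU instruction.
    return m.startswith("F") and m not in NON_WAITING

def handler_invoked_before(stream_after_fault):
    """Index of the first instruction that triggers delivery, or None."""
    for index, mnemonic in enumerate(stream_after_fault):
        if triggers_handler(mnemonic):
            return index
    return None

print(handler_invoked_before(["MOV", "FNSTSW", "ADD", "FADD"]))
```

In the sample stream, the integer instructions and FNSTSW execute first, and the handler runs immediately before FADD.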
If the next floating-point instruction in the instruction stream is a non-waiting instruction, the FPU executes the instruction without invoking the software exception handler.
7.7.3.2. MS-DOS* COMPATIBILITY MODE
If the NE flag in control register CR0 is set to 0, the MS-DOS compatibility mode for handling floating-point exceptions is selected. In this mode, the software exception handler for floating-point exceptions is invoked externally using the processor's FERR#, INTR, and IGNNE# pins. This method of reporting floating-point errors and invoking an exception handler is provided to support the floating-point exception handling mechanism used in PC systems that are running the MS-DOS or Windows* 95 operating system.
The MS-DOS compatibility mode is typically used as follows to invoke the floating-point exception handler:
1. If the FPU detects an unmasked floating-point exception, it sets the flag for the exception and the ES flag in the FPU status word.
2. If the IGNNE# pin is deasserted, the FPU then asserts the FERR# pin either immediately, or else delayed (deferred) until just before the execution of the next waiting floating-point instruction or MMX instruction. Whether the FERR# pin is asserted immediately or delayed depends on the type of processor, the instruction, and the type of exception.
3. If a preceding floating-point instruction has set the exception flag for an unmasked FPU exception, the processor freezes just before executing the next WAIT instruction, waiting floating-point instruction, or MMX instruction. Whether the FERR# pin was asserted at the preceding floating-point instruction or is just now being asserted, the freezing of the processor assures that the FPU exception handler will be invoked before the new floating-point (or MMX) instruction gets executed.
4. The FERR# pin is connected through external hardware to IRQ13 of a cascaded, programmable interrupt controller (PIC). When the FERR# pin is asserted, the PIC is programmed to generate an interrupt 75H.
5. The PIC asserts the INTR pin on the processor to signal the interrupt 75H.
6. The BIOS for the PC system handles the interrupt 75H by branching to the interrupt 2 (NMI) interrupt handler.
7. The interrupt 2 handler determines if the interrupt is the result of an NMI interrupt or a floating-point exception.
8. If a floating-point exception is detected, the interrupt 2 handler branches to the floating-point exception handler.
If the IGNNE# pin is asserted, the processor ignores floating-point error conditions. This pin is provided to inhibit floating-point exceptions from being generated while the floating-point exception handler is servicing a previously signaled floating-point exception.
Appendix E, Guidelines for Writing FPU Exceptions Handlers, describes the MS-DOS compatibility mode in much greater detail, including the somewhat more complicated implementations of this mode in the Intel486 and Pentium processors.
7.7.3.3. TYPICAL FLOATING-POINT EXCEPTION HANDLER ACTIONS
After the floating-point exception handler is invoked, the processor handles the exception in the same manner that it handles non-FPU exceptions. (The floating-point exception handler is normally part of the operating system or executive software.) A typical action of the exception handler is to store FPU state information in memory (with the FSTENV/FNSTENV or FSAVE/FNSAVE instructions) so that it can evaluate the exception and formulate an appropriate response (refer to Section 7.3.9., Saving the FPU's State).
Other typical exception handler actions include:
• Examining stored FPU state information (control, status, and tag words, and FPU instruction and operand pointers) to determine the nature of the error.
• Correcting the condition that caused the error.
• Clearing the exception bits in the status word.
• Returning to the interrupted program and resuming normal execution.
If the faulting floating-point instruction is followed by one or more non-floating-point instructions, it may not be useful to re-execute the faulting instruction. Refer to Section 7.9., Floating-Point Exception Synchronization, for more information on synchronizing floating-point exceptions.
In cases where the handler needs to restart program execution with the faulting instruction, the IRET instruction cannot be used directly. Because the exception is not generated until the next floating-point or WAIT/FWAIT instruction following the faulting floating-point instruction, the return instruction pointer on the stack may not point to the faulting instruction. To restart program execution at the faulting instruction, the exception handler must obtain a pointer to the instruction from the saved FPU state information, load it into the return instruction pointer location on the stack, and then execute the IRET instruction.
In lieu of writing recovery procedures, the exception handler can do the following:
• Increment an exception counter for later display or printing.
• Print or display diagnostic information (such as the FPU environment and registers).
• Halt further program execution.
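The restart procedure described earlier in this section (obtain the instruction pointer from the saved FPU state and place it in the return frame) can be modeled as follows. The Python sketch assumes the 28-byte, 32-bit protected-mode memory image written by FSTENV/FNSTENV (the layout covered in Section 7.3.9.); the field names, the sample bytes, and the dictionary standing in for the return frame are all made up for illustration. A real handler would do this in privileged code.

```python
import struct

# Assumed layout: 32-bit protected-mode FSTENV/FNSTENV image (28 bytes).
# The control, status, and tag words and the operand selector are each padded
# to 32 bits; the opcode bits share a doubleword with the instruction
# pointer selector.
ENV32 = struct.Struct("<HHHHHHIHHIHH")

def parse_env32(image):
    """Pick out the fields a handler typically examines."""
    (control, _r0, status, _r1, tag, _r2,
     ip_offset, ip_selector, opcode,
     operand_offset, operand_selector, _r3) = ENV32.unpack(image)
    return {
        "control": control, "status": status, "tag": tag,
        "ip_offset": ip_offset, "ip_selector": ip_selector,
        "opcode": opcode & 0x07FF,
        "operand_offset": operand_offset,
        "operand_selector": operand_selector,
    }

# Made-up sample image: IE and ES set, faulting instruction at 0x00401000.
sample_image = ENV32.pack(0x037E, 0, 0x0081, 0, 0xFFFF, 0,
                          0x00401000, 0x001B, 0, 0x00402000, 0x0023, 0)
env = parse_env32(sample_image)

# Hypothetical return frame: the saved EIP points past the faulting
# instruction, so the handler overwrites it before executing IRET.
frame = {"eip": 0x00401009, "cs": 0x001B}
frame["eip"] = env["ip_offset"]
print(hex(frame["eip"]))
```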
Refer to Section E.3.3.4., FPU Exception Handling Examples in Appendix E, Guidelines for Writing FPU Exceptions Handlers for general examples of floating-point exception handlers and for specific examples of how to write a floating-point exception handler when using the MS-DOS compatibility mode.
7.8.
FLOATING-POINT EXCEPTION CONDITIONS
The following sections describe the various conditions that cause a floating-point exception to be generated and the masked response of the FPU when these conditions are detected. Chapter 3, Instruction Set Reference, in the Intel Architecture Software Developer's Manual, Volume 2, lists the floating-point exceptions that can be signaled for each floating-point instruction.
7.8.1.
Invalid Operation Exception
The floating-point invalid operation exception occurs in response to two general types of operations:
• Stack overflow or underflow (#IS).
• Invalid arithmetic operand (#IA).
The flag for this exception (IE) is bit 0 of the FPU status word, and the mask bit (IM) is bit 0 of the FPU control word. The stack fault flag (SF) of the FPU status word indicates the type of operation that caused the exception. When the SF flag is set to 1, a stack operation has resulted in stack overflow or underflow; when the flag is cleared to 0, an arithmetic instruction has encountered an invalid operand. Note that the FPU explicitly sets the SF flag when it detects a stack overflow or underflow condition, but it does not explicitly clear the flag when it detects an invalid-arithmetic-operand condition. As a result, the state of the SF flag can be 1 following an invalid-arithmetic-operation exception, if it was not cleared from the last time a stack overflow or underflow condition occurred. Refer to Section 7.3.2.4., Stack Fault Flag, for more information about the SF flag.
7.8.1.1. STACK OVERFLOW OR UNDERFLOW EXCEPTION (#IS)
The FPU tag word keeps track of the contents of the registers in the FPU register stack (refer to Section 7.3.6., FPU Tag Word). The FPU uses this information to detect two different types of stack faults:
• Stack overflow: an instruction attempts to write a value into a non-empty FPU register.
• Stack underflow: an instruction attempts to read a value from an empty FPU register.
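The two stack faults can be illustrated with a simplified model of the eight-register stack. The Python sketch below assumes the invalid operation exception is masked, uses `None` to stand for an "empty" tag and the string "indefinite" to stand for the indefinite value, and sets IE, SF, and C1 in the way the following paragraph describes. The class and attribute names are hypothetical.

```python
EMPTY = None  # models a register whose tag says "empty"

class FpuStackModel:
    """Simplified register stack with masked invalid-operation responses."""

    def __init__(self):
        self.regs = [EMPTY] * 8      # physical data registers
        self.top = 0                 # top-of-stack pointer (TOP)
        self.ie = self.sf = self.c1 = 0

    def push(self, value):
        new_top = (self.top - 1) % 8
        if self.regs[new_top] is not EMPTY:
            # Stack overflow: the target register already holds a value.
            self.ie, self.sf, self.c1 = 1, 1, 1
            value = "indefinite"     # masked response overwrites the register
        self.regs[new_top] = value
        self.top = new_top

    def pop(self):
        value = self.regs[self.top]
        if value is EMPTY:
            # Stack underflow: reading from an empty register.
            self.ie, self.sf, self.c1 = 1, 1, 0
            value = "indefinite"
        self.regs[self.top] = EMPTY
        self.top = (self.top + 1) % 8
        return value

stack = FpuStackModel()
for n in range(9):                   # the ninth push wraps onto a full register
    stack.push(float(n))
print(stack.ie, stack.sf, stack.c1)
```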
When the FPU detects stack overflow or underflow, it sets the IE flag (bit 0) and the SF flag (bit 6) in the FPU status word to 1. It then sets condition-code flag C1 (bit 9) in the FPU status word to 1 if stack overflow occurred or to 0 if stack underflow occurred.
If the invalid operation exception is masked, the FPU then returns the real, integer, or BCD-integer indefinite value to the destination operand, depending on the instruction being executed. This value overwrites the destination register or memory location specified by the instruction. If the invalid operation exception is not masked, a software exception handler is invoked (refer to Section 7.7.3., Software Exception Handling) and the top-of-stack pointer (TOP) and source operands remain unchanged.
The term stack overflow comes from the condition where a program has pushed eight values onto the FPU register stack and the next value pushed on the stack causes a stack wraparound to a register that already contains a value. The term stack underflow refers to the opposite condition from stack overflow. Here, a program has popped eight values from the FPU register stack and the next value popped from the stack causes stack wraparound to an empty register.
7.8.1.2. INVALID ARITHMETIC OPERAND EXCEPTION (#IA)
The FPU is able to detect a variety of invalid arithmetic operations that can be coded in a program. These operations generally indicate a programming error, such as dividing ∞ by ∞. Table 7-21 lists the invalid arithmetic operations that the FPU detects. This group includes the invalid operations defined in IEEE Standard 854.
When the FPU detects an invalid arithmetic operand, it sets the IE flag (bit 0) in the FPU status word to 1. If the invalid operation exception is masked, the FPU then returns an indefinite value to the destination operand or sets the floating-point condition codes, as shown in Table 7-21. If the invalid operation exception is not masked, a software exception handler is invoked (refer to Section 7.7.3., Software Exception Handling) and the top-of-stack pointer (TOP) and source operands remain unchanged.
Table 7-21. Invalid Arithmetic Operations and the Masked Responses to Them

Condition: Any arithmetic operation on an operand that is in an unsupported format.
Masked Response: Return the real indefinite value to the destination operand.

Condition: Any arithmetic operation on a SNaN.
Masked Response: Return a QNaN to the destination operand (refer to Section 7.6., Operating on NaNs).

Condition: Compare and test operations: one or both operands are NaNs.
Masked Response: Set the condition code flags (C0, C2, and C3) in the FPU status word to 111B (not comparable).

Condition: Addition: operands are opposite-signed infinities. Subtraction: operands are like-signed infinities.
Masked Response: Return the real indefinite value to the destination operand.

Condition: Multiplication: ∞ by 0; 0 by ∞.
Masked Response: Return the real indefinite value to the destination operand.

Condition: Division: ∞ by ∞; 0 by 0.
Masked Response: Return the real indefinite value to the destination operand.

Condition: Remainder instructions FPREM, FPREM1: modulus (divisor) is 0 or dividend is ∞.
Masked Response: Return the real indefinite; clear condition code flag C2 to 0.

Condition: Trigonometric instructions FCOS, FPTAN, FSIN, FSINCOS: source operand is ∞.
Masked Response: Return the real indefinite; clear condition code flag C2 to 0.

Condition: FIST/FISTP instruction when input operand <> MAXINT for destination operand size.
Masked Response: Return MAXNEG to destination operand.

Condition: FSQRT: negative operand (except FSQRT (-0) = -0); FYL2X: negative operand (except FYL2X (-0) = -∞); FYL2XP1: operand more negative than -1.
Masked Response: Return the real indefinite value to the destination operand.

Condition: FBSTP: source register is empty or it contains a NaN, ∞, or a value that cannot be represented in 18 decimal digits.
Masked Response: Store BCD integer indefinite value in the destination operand.

Condition: FXCH: one or both registers are tagged empty.
Masked Response: Load empty registers with the real indefinite value, then perform the exchange.
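Several of the conditions in Table 7-21 can be reproduced with ordinary IEEE arithmetic. The Python sketch below is only an analogy: Python floats are IEEE double-precision values handled by the host hardware (not necessarily the x87 FPU), and Python does not expose the IE flag. It shows that these operations quietly return a NaN, which corresponds to the masked response, and that a NaN compares as unordered.

```python
import math

inf = math.inf

# Invalid operations from Table 7-21 that Python evaluates without raising.
results = {
    "(+inf) + (-inf)": inf + (-inf),
    "(+inf) - (+inf)": inf - inf,
    "0 * inf": 0.0 * inf,
    "inf / inf": inf / inf,
}
for label, value in results.items():
    print(label, "->", value)

# A NaN is not less than, equal to, or greater than any value.
nan = results["inf / inf"]
unordered = not (nan < 1.0 or nan == 1.0 or nan > 1.0)
print("unordered:", unordered)
```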
7.8.2.
Divide-By-Zero Exception (#Z)
The FPU reports a floating-point zero-divide exception whenever an instruction attempts to divide a finite non-zero operand by 0. The flag (ZE) for this exception is bit 2 of the FPU status word, and the mask bit (ZM) is bit 2 of the FPU control word. The FDIV, FDIVP, FDIVR, FDIVRP, FIDIV, and FIDIVR instructions and the other instructions that perform division internally (FYL2X and FXTRACT) can report the divide-by-zero exception.
When a divide-by-zero exception occurs and the exception is masked, the FPU sets the ZE flag and returns the values shown in Table 7-22. If the divide-by-zero exception is not masked, the ZE flag is set, a software exception handler is invoked (refer to Section 7.7.3., Software Exception Handling), and the top-of-stack pointer (TOP) and source operands remain unchanged.
Table 7-22. Divide-By-Zero Conditions and the Masked Responses to Them

Condition: Divide or reverse divide operation with a 0 divisor.
Masked Response: Returns an ∞ signed with the exclusive OR of the sign of the two operands to the destination operand.

Condition: FYL2X instruction.
Masked Response: Returns an ∞ signed with the opposite sign of the non-zero operand to the destination operand.

Condition: FXTRACT instruction.
Masked Response: ST(1) is set to -∞; ST(0) is set to 0 with the same sign as the source operand.
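The first row of Table 7-22 amounts to a sign computation. The following Python sketch models it for double-precision values; the function name `masked_divide_by_zero` is hypothetical, and the model covers only the divide and reverse-divide case, not FYL2X or FXTRACT.

```python
import math

def masked_divide_by_zero(dividend, divisor):
    """Masked #Z response: an infinity whose sign is the XOR of the operand signs."""
    is_condition = (divisor == 0.0 and dividend != 0.0
                    and math.isfinite(dividend))
    if not is_condition:
        raise ValueError("not a divide-by-zero condition")
    dividend_negative = math.copysign(1.0, dividend) < 0
    divisor_negative = math.copysign(1.0, divisor) < 0   # distinguishes -0.0
    negative = dividend_negative != divisor_negative      # exclusive OR
    return -math.inf if negative else math.inf

print(masked_divide_by_zero(1.0, -0.0))
```

Note that a zero dividend is rejected: dividing 0 by 0 is an invalid arithmetic operation (Table 7-21), not a divide-by-zero condition.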
7.8.3.
Denormal Operand Exception (#D)
The FPU signals the denormal operand exception under the following conditions:
• If an arithmetic instruction attempts to operate on a denormal operand (refer to Section 7.2.3.2., Normalized and Denormalized Finite Numbers).
• If an attempt is made to load a denormal single- or double-real value into an FPU register. (If the denormal value being loaded is an extended-real value, the denormal operand exception is not reported.)
The flag (DE) for this exception is bit 1 of the FPU status word, and the mask bit (DM) is bit 1 of the FPU control word. When a denormal operand exception occurs and the exception is masked, the FPU sets the DE flag, then proceeds with the instruction. The denormal operand in single- or double-real format is automatically normalized when converted to the extended-real format. Operating on denormal numbers will produce results at least as good as, and often better than, what can be obtained when denormal numbers are flushed to zero. In fact, subsequent operations will benefit from the additional precision of the internal extended-real format. Most programmers mask this exception so that a computation may proceed, then analyze any loss of accuracy when the final result is delivered. When a denormal operand exception occurs and the exception is not masked, the DE flag is set and a software exception handler is invoked (refer to Section 7.7.3., Software Exception Handling). The top-of-stack pointer (TOP) and source operands remain unchanged. When denormal operands have reduced significance due to loss of low-order bits, it may be advisable to not operate on them. Precluding denormal operands from computations can be accomplished by an exception handler that responds to unmasked denormal operand exceptions.
7.8.4.
Numeric Overflow Exception (#O)
The FPU reports a floating-point numeric overflow exception (#O) whenever the rounded result of an arithmetic instruction exceeds the largest allowable finite value that will fit into the real format of the destination operand. For example, if the destination format is extended-real (80 bits), overflow occurs when the rounded result falls outside the unbiased range of $-1.0 \times 2^{16384}$ to $1.0 \times 2^{16384}$ (exclusive). Numeric overflow can occur on arithmetic operations where the result is stored in an FPU data register. It can also occur on store-real operations (with the FST and FSTP instructions), where a within-range value in a data register is stored in memory in a single- or double-real format. The overflow threshold range for the single-real format is $-1.0 \times 2^{128}$ to $1.0 \times 2^{128}$; the range for the double-real format is $-1.0 \times 2^{1024}$ to $1.0 \times 2^{1024}$.
The numeric overflow exception cannot occur when a value that is too large is stored in an integer or BCD integer format. Instead, the invalid-arithmetic-operand exception is signaled.
The flag (OE) for the numeric overflow exception is bit 3 of the FPU status word, and the mask bit (OM) is bit 3 of the FPU control word. When a numeric overflow exception occurs and the exception is masked, the FPU sets the OE flag and returns one of the values shown in Table 7-23. The value returned depends on the current rounding mode of the FPU (refer to Section 7.3.4.3., Rounding Control Field).
Table 7-23. Masked Responses to Numeric Overflow

Rounding Mode    Sign of True Result    Result
To nearest       +                      +∞
                 -                      -∞
Toward -∞        +                      Largest finite positive number
                 -                      -∞
Toward +∞        +                      +∞
                 -                      Largest finite negative number
Toward zero      +                      Largest finite positive number
                 -                      Largest finite negative number
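Table 7-23 can be read as a function of the rounding mode and the sign of the true result. The Python sketch below encodes it that way. The mode names are labels chosen for this example, and the largest finite double-precision value stands in for the "largest finite number" of whatever the destination format is.

```python
import math
import sys

LARGEST = sys.float_info.max   # stand-in for the largest finite value

def masked_overflow_result(rounding_mode, negative):
    """Value returned for a masked numeric overflow (Table 7-23)."""
    if rounding_mode == "nearest":
        return -math.inf if negative else math.inf
    if rounding_mode == "down":          # toward minus infinity
        return -math.inf if negative else LARGEST
    if rounding_mode == "up":            # toward plus infinity
        return -LARGEST if negative else math.inf
    if rounding_mode == "zero":          # toward zero
        return -LARGEST if negative else LARGEST
    raise ValueError("unknown rounding mode: " + rounding_mode)

for mode in ("nearest", "down", "up", "zero"):
    print(mode, masked_overflow_result(mode, False),
          masked_overflow_result(mode, True))
```

The pattern is that a directed rounding mode never rounds away from its direction: an overflow is replaced by an infinity only when the infinity lies in the direction of rounding.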
The action that the FPU takes when numeric overflow occurs and the numeric overflow exception is not masked depends on whether the instruction is supposed to store the result in memory or on the register stack.
If the destination is a memory location, the OE flag is set and a software exception handler is invoked (refer to Section 7.7.3., Software Exception Handling). The top-of-stack pointer (TOP) and source and destination operands remain unchanged.
If the destination is the register stack, the exponent of the rounded result is divided by $2^{24576}$ and the result is stored along with the significand in the destination operand. Condition code bit C1 in the FPU status word (called in this situation the round-up bit) is set if the significand was rounded upward and cleared if the result was rounded toward 0. After the result is stored, the OE flag is set and a software exception handler is invoked. The scaling bias value 24,576 is equal to $3 \times 2^{13}$. Biasing the exponent by 24,576 normally translates the number as nearly as possible to the middle of the extended-real exponent range so that, if desired, it can be used in subsequent scaled operations with less risk of causing further exceptions.
When using the FSCALE instruction, massive overflow can occur, where the result is too large to be represented, even with a bias-adjusted exponent. Here, if overflow occurs again, after the result has been biased, a properly signed ∞ is stored in the destination operand.
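The arithmetic behind the scaling bias is easy to check. The Python sketch below treats a result as a power of two and shows how dividing by $2^{24576}$ moves an overflowed exponent back toward the middle of the extended-real range. The function name and the sample exponents are illustrative choices.

```python
BIAS = 3 * 2**13               # scaling bias from the text: 24,576

def rescale_overflowed_exponent(true_exponent):
    """Exponent stored after the unmasked-overflow rescaling."""
    # Dividing a value by 2**BIAS subtracts BIAS from its binary exponent.
    return true_exponent - BIAS

# The product of two values near 2**16383 has a true exponent of 32766,
# far outside the extended-real range; after rescaling it is 8190.
print(BIAS, rescale_overflowed_exponent(32766))
```

The smallest overflowing exponent, 16384, is rescaled to -8192, so rescaled overflow results have exponents well inside the representable range on either side of zero.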
7.8.5.
Numeric Underflow Exception (#U)
The FPU reports a floating-point numeric underflow exception (#U) whenever the rounded result of an arithmetic instruction is tiny (that is, less than the smallest possible normalized, finite value that will fit into the real format of the destination operand). For example, if the destination format is extended-real (80 bits), underflow occurs when the rounded result falls in the unbiased range of $-1.0 \times 2^{-16382}$ to $1.0 \times 2^{-16382}$ (exclusive). Like numeric overflow, numeric underflow can occur on arithmetic operations where the result is stored in an FPU data register. It can also occur on store-real operations (with the FST and FSTP instructions), where a within-range value in a data register is stored in memory in a single- or double-real format. The underflow threshold range for the single-real format is $-1.0 \times 2^{-126}$ to $1.0 \times 2^{-126}$; the range for the double-real format is $-1.0 \times 2^{-1022}$ to $1.0 \times 2^{-1022}$. (The numeric underflow exception cannot occur when storing values in an integer or BCD integer format.)
The flag (UE) for the numeric-underflow exception is bit 4 of the FPU status word, and the mask bit (UM) is bit 4 of the FPU control word.
When a numeric-underflow exception occurs and the exception is masked, the FPU denormalizes the result (refer to Section 7.2.3.2., Normalized and Denormalized Finite Numbers). If the denormalized result is exact, the FPU stores the result in the destination operand, without setting the UE flag. If the denormal result is inexact, the FPU sets the UE flag, then goes on to handle the inexact result exception condition (refer to Section 7.8.6., Inexact Result (Precision) Exception (#P)). It is important to note that if numeric-underflow is masked, a numeric-underflow exception is signaled only if the denormalized result is inexact. If the denormalized result is exact, no flags are set and no exceptions are signaled.
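The distinction between an exact and an inexact denormalized result can be made concrete with exact rational arithmetic. The Python sketch below models the masked response for the double-real format only, with round-to-nearest, and takes the exact (unrounded) tiny result as its input. These simplifications, and the function name, are assumptions of the example.

```python
from fractions import Fraction

MIN_NORMAL = Fraction(1, 2**1022)     # smallest normalized double-real value
DENORMAL_STEP = Fraction(1, 2**1074)  # spacing between denormal double-reals

def masked_underflow(result):
    """Return (stored_value, ue_flag) for an exact tiny result."""
    if not 0 < abs(result) < MIN_NORMAL:
        raise ValueError("result is not tiny")
    steps = result / DENORMAL_STEP
    rounded = round(steps)            # Fraction rounds half to even
    exact = (rounded == steps)
    # UE is set only when denormalizing loses information.
    return rounded * DENORMAL_STEP, 0 if exact else 1

print(masked_underflow(Fraction(3, 2**1030))[1])       # exact denormal
print(masked_underflow(Fraction(1, 3 * 2**1030))[1])   # inexact denormal
```

In the inexact case, the FPU would also go on to report the inexact result condition, as the text above describes.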
The action that the FPU takes when numeric underflow occurs and the numeric-underflow exception is not masked depends on whether the instruction is supposed to store the result in memory or on the register stack.
If the destination is a memory location, the UE flag is set and a software exception handler is invoked (refer to Section 7.7.3., Software Exception Handling). The top-of-stack pointer (TOP) and source and destination operands remain unchanged.
If the destination is the register stack, the exponent of the rounded result is multiplied by $2^{24576}$ and the product is stored along with the significand in the destination operand. Condition code bit C1 in the FPU status register (acting here as a round-up bit) is set if the significand was rounded upward and cleared if the result was rounded toward 0. After the result is stored, the UE flag is set and a software exception handler is invoked. The scaling bias value 24,576 is the same as is used for the overflow exception and has the same effect, which is to translate the result as nearly as possible to the middle of the extended-real exponent range.
When using the FSCALE instruction, massive underflow can occur, where the result is too tiny to be represented, even with a bias-adjusted exponent. Here, if underflow occurs again, after the result has been biased, a properly signed 0 is stored in the destination operand.
7.8.6.
Inexact Result (Precision) Exception (#P)
The inexact result exception (also called the precision exception) occurs if the result of an operation is not exactly representable in the destination format. For example, the fraction 1/3 cannot be precisely represented in binary form. This exception occurs frequently and indicates that some (normally acceptable) accuracy has been lost. The exception is supported for applications that need to perform exact arithmetic only. Because the rounded result is generally satisfactory for most applications, this exception is commonly masked. Note that the transcendental instructions [FSIN, FCOS, FSINCOS, FPTAN, FPATAN, F2XM1, FYL2X, and FYL2XP1] by nature produce inexact results. The inexact result exception flag (PE) is bit 5 of the FPU status word, and the mask bit (PM) is bit 5 of the FPU control word. If the inexact result exception is masked when an inexact result condition occurs and a numeric overflow or underflow condition has not occurred, the FPU sets the PE flag and stores the rounded result in the destination operand. The current rounding mode determines the method used to round the result (refer to Section 7.3.4.3., Rounding Control Field). The C1 (roundup) bit in the FPU status word indicates whether the inexact result was rounded up (C1 is set) or not rounded up (C1 is cleared). In the not rounded up case, the least-significant bits of the inexact result are truncated so that the result fits in the destination format. If the inexact result exception is not masked when an inexact result occurs and numeric overflow or underflow has not occurred, the FPU performs the same operation described in the previous paragraph and, in addition, invokes a software exception handler (refer to Section 7.7.3., Software Exception Handling). If an inexact result occurs in conjunction with numeric overflow or underflow, one of the following operations is carried out:
• If an inexact result occurs along with masked overflow or underflow, the OE or UE flag and the PE flag are set and the result is stored as described for the overflow or underflow exceptions (refer to Section 7.8.4., Numeric Overflow Exception (#O) or Section 7.8.5., Numeric Underflow Exception (#U)). If the inexact result exception is unmasked, the FPU also invokes the software exception handler.
• If an inexact result occurs along with unmasked overflow or underflow and the destination operand is a register, the OE or UE flag and the PE flag are set, the result is stored as described for the overflow or underflow exceptions, and the software exception handler is invoked.
• If an inexact result occurs along with unmasked overflow or underflow and the destination operand is a memory location, the inexact result condition is ignored.
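The PE flag and the C1 round-up bit can be illustrated by comparing an exact value with the value actually stored. The Python sketch below uses the double-real format and round-to-nearest (Python's float conversion), so the rounding direction it reports applies to that format only; the function name and the returned tuple are assumptions of the example.

```python
from fractions import Fraction

def round_to_double(exact):
    """Return (stored, pe, c1) for an exact rational rounded to double-real."""
    stored = Fraction(float(exact))          # float() rounds to nearest
    pe = int(stored != exact)                # inexact result
    c1 = int(abs(stored) > abs(exact))       # 1 if the magnitude was rounded up
    return stored, pe, c1

for value in (Fraction(1, 3), Fraction(1, 10), Fraction(1, 4)):
    _, pe, c1 = round_to_double(value)
    print(value, "PE =", pe, "C1 =", c1)
```

In this format 1/3 is inexact and rounded down, 1/10 is inexact and rounded up, and 1/4 is exactly representable, so no inexact result condition arises for it.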
7.8.7.
Exception Priority
The processor handles exceptions according to a predetermined precedence. When an instruction generates two or more exception conditions, the exception precedence sometimes results in the higher-priority exception being handled and the lower-priority exceptions being ignored. For example, dividing an SNaN by zero can potentially signal an invalid-arithmetic-operand exception (due to the SNaN operand) and a divide-by-zero exception. Here, if both exceptions are masked, the FPU handles the higher-priority exception only (the invalid-arithmetic-operand exception), returning a real indefinite to the destination. Alternatively, a denormal operand or inexact result exception can accompany a numeric underflow or overflow exception, with both exceptions being handled.
The precedence for floating-point exceptions is as follows:
1. Invalid operation exception, subdivided as follows:
   a. Stack underflow.
   b. Stack overflow.
   c. Operand of unsupported format.
   d. SNaN operand.
2. QNaN operand. Though this is not an exception, the handling of a QNaN operand has precedence over lower-priority exceptions. For example, a QNaN divided by zero results in a QNaN, not a zero-divide exception.
3. Any other invalid operation exception not mentioned above or a divide-by-zero exception.
4. Denormal operand exception. If masked, then instruction execution continues, and a lower-priority exception can occur as well.
5. Numeric overflow and underflow exceptions in conjunction with the inexact result exception.
6. Inexact result exception.
Invalid operation, zero divide, and denormal operand exceptions are detected before a floating-point operation begins, whereas overflow, underflow, and precision errors are not detected until a true result has been computed. When a pre-operation exception is detected, the FPU register stack and memory have not yet been updated, and appear as if the offending instruction had not been executed. When a post-operation exception is detected, the register stack and memory may be updated with a result (depending on the nature of the error).
For more information on the order in which multiple exceptions or interrupts are serviced, refer to Section 5.7., Priority Among Simultaneous Exceptions and Interrupts, in Chapter 5, Interrupt and Exception Handling, of the Intel Architecture Software Developer's Manual, Volume 3.
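The precedence list can be expressed as an ordered table. The Python sketch below returns the highest-priority condition among those an instruction raises; the condition labels are shorthand invented for this example. It does not model the cases in which a lower-priority exception is also handled (for instance, a masked denormal operand followed by underflow).

```python
# Conditions in decreasing order of priority, following the list above.
PRECEDENCE = [
    "stack underflow",
    "stack overflow",
    "unsupported format",
    "SNaN operand",
    "QNaN operand",
    "other invalid operation or divide-by-zero",
    "denormal operand",
    "overflow or underflow",
    "inexact result",
]

def highest_priority(conditions):
    """Return the condition that takes precedence among those detected."""
    return min(conditions, key=PRECEDENCE.index)

# Dividing an SNaN by zero raises two conditions; the SNaN operand wins.
print(highest_priority({"other invalid operation or divide-by-zero",
                        "SNaN operand"}))
```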
7.9.
FLOATING-POINT EXCEPTION SYNCHRONIZATION
Because the integer unit and FPU are separate execution units, it is possible for the processor to execute floating-point, integer, and system instructions concurrently. No special programming techniques are required to gain the advantages of concurrent execution. (Floating-point instructions are placed in the instruction stream along with the integer and system instructions.) However, concurrent execution can cause problems for floating-point exception handlers.
This problem is related to the way the FPU signals the existence of unmasked floating-point exceptions. (Special exception synchronization is not required for masked floating-point exceptions, because the FPU always returns a masked result to the destination operand.)
When a floating-point exception is unmasked and the exception condition occurs, the FPU stops further execution of the floating-point instruction and signals the exception event. On the next occurrence of a floating-point instruction or a WAIT/FWAIT instruction in the instruction stream, the processor checks the ES flag in the FPU status word for pending floating-point exceptions. If floating-point exceptions are pending, the FPU makes an implicit call (traps) to the floating-point software exception handler. The exception handler can then execute recovery procedures for selected or all floating-point exceptions.
Synchronization problems occur in the time frame between when the exception is signaled and when it is actually handled. Because of concurrent execution, integer or system instructions can be executed during this time frame. It is thus possible for the source or destination operands for a floating-point instruction that faulted to be overwritten in memory, making it impossible for the exception handler to analyze or recover from the exception.
To solve this problem, an exception synchronizing instruction (either a floating-point instruction or a WAIT/FWAIT instruction) can be placed immediately after any floating-point instruction that might present a situation where state information pertaining to a floating-point exception might be lost or corrupted. Floating-point instructions that store data in memory are prime candidates for synchronization. For example, the following three lines of code have the potential for exception synchronization problems:
    FILD COUNT      ; Floating-point instruction
    INC COUNT       ; Integer instruction
    FSQRT           ; Subsequent floating-point instruction
In this example, the INC instruction modifies the source operand (COUNT) of a floating-point instruction (FILD). If an exception is signaled during the execution of the FILD instruction, the value stored in the COUNT memory location might be overwritten before the exception handler is called. Rearranging the instructions, as follows, so that the FSQRT instruction follows the FILD instruction, synchronizes the exception handling and eliminates the possibility of the exception being handled incorrectly.
    FILD COUNT      ; Floating-point instruction
    FSQRT           ; Subsequent floating-point instruction synchronizes
                    ; any exceptions generated by the FILD instruction.
    INC COUNT       ; Integer instruction
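The difference between the two orderings can be checked mechanically. The Python sketch below is a hypothetical checker, not a description of any existing tool: each instruction is a tuple of (mnemonic, whether it synchronizes floating-point exceptions, memory operand, whether it writes that operand), and the checker reports integer writes to an operand of a floating-point instruction whose exceptions have not yet been synchronized.

```python
def find_hazards(stream):
    """Return (mnemonic, operand) pairs that may corrupt handler state."""
    hazards = []
    pending = set()   # memory operands of not-yet-synchronized FP instructions
    for mnemonic, synchronizes, operand, writes_operand in stream:
        if synchronizes:
            # A waiting FP or WAIT/FWAIT instruction handles pending exceptions.
            pending.clear()
            if operand is not None:
                pending.add(operand)
        elif writes_operand and operand in pending:
            hazards.append((mnemonic, operand))
    return hazards

original = [("FILD", True, "COUNT", False),
            ("INC", False, "COUNT", True),
            ("FSQRT", True, None, False)]
rearranged = [original[0], original[2], original[1]]

print(find_hazards(original))
print(find_hazards(rearranged))
```

The first ordering reports the INC of COUNT as a hazard; the rearranged ordering reports none, because FSQRT synchronizes any exception raised by FILD before COUNT is modified.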
The FSQRT instruction does not require any synchronization, because the results of this instruction are stored in the FPU data registers and will remain there, undisturbed, until the next floating-point or WAIT/FWAIT instruction is executed. To ensure that any exceptions emanating from the FSQRT instruction are handled (for example, prior to a procedure call), a WAIT instruction can be placed directly after the FSQRT instruction.
Note that some floating-point instructions (non-waiting instructions) do not check for pending unmasked exceptions (refer to Section 7.5.11., FPU Control Instructions). They include the FNINIT, FNSTENV, FNSAVE, FNSTSW, FNSTCW, and FNCLEX instructions. When an FNINIT, FNSTENV, FNSAVE, or FNCLEX instruction is executed, all pending exceptions are essentially lost (either the FPU status register is cleared or all exceptions are masked). The FNSTSW and FNSTCW instructions do not check for pending exceptions, but they do not modify the FPU status and control registers. A subsequent waiting floating-point instruction can then handle any pending exceptions.
CHAPTER 8 PROGRAMMING WITH THE INTEL MMX TECHNOLOGY

The Intel MMX technology comprises a set of extensions to the Intel Architecture (IA) that are designed to greatly enhance the performance of advanced media and communications applications. These extensions (which include new registers, data types, and instructions) are combined with a single-instruction, multiple-data (SIMD) execution model to accelerate the performance of applications such as motion video, combined graphics with video, image processing, audio synthesis, speech synthesis and compression, telephony, video conferencing, and 2D and 3D graphics, which typically use compute-intensive algorithms to perform repetitive operations on large arrays of simple, native data elements.
The MMX technology defines a simple and flexible software model, with no new mode or operating-system visible state. All existing software will continue to run correctly, without modification, on IA processors that incorporate the MMX technology, even in the presence of existing and new applications that incorporate this technology.
The following sections of this chapter describe the MMX technology's basic programming environment, including the MMX register set, data types, and instruction set. Detailed descriptions of the MMX instructions are provided in Chapter 3, Instruction Set Reference, of the Intel Architecture Software Developer's Manual, Volume 2. The manner in which the MMX technology is integrated into the IA system programming model is described in Chapter 10, MMX Technology System Programming, in the Intel Architecture Software Developer's Manual, Volume 3.
8.1. OVERVIEW OF THE MMX TECHNOLOGY PROGRAMMING ENVIRONMENT
MMX technology provides the following new extensions to the IA programming environment.
• Eight MMX registers (MM0 through MM7).
• Four MMX data types (packed bytes, packed words, packed doublewords, and quadword).
• The MMX instruction set.
The MMX registers and data types are described in the following sections. Refer to Section 8.3., Overview of the MMX Instruction Set, for an overview of the MMX instructions.
8.1.1. MMX Registers
The MMX register set consists of eight 64-bit registers (refer to Figure 8-1). The MMX instructions access the MMX registers directly using the register names MM0 through MM7. These registers can only be used to perform calculations on MMX data types; they cannot be used to address memory. Addressing of MMX instruction operands in memory is handled by using the standard IA addressing modes and general-purpose registers (EAX, EBX, ECX, EDX, EBP, ESI, EDI, and ESP).
Figure 8-1. MMX Register Set (eight 64-bit registers, MM0 through MM7; bit 63 is the most significant bit)
Although the MMX registers are defined in the IA as separate registers, they are aliased to the registers in the FPU data register stack (R0 through R7). (Refer to Chapter 10, MMX Technology System Programming, in the Intel Architecture Software Developer's Manual, Volume 3, for a more detailed discussion of the aliasing of MMX registers.)
8.1.2. MMX Data Types

The MMX technology defines the following new 64-bit data types (refer to Figure 8-2):
• Packed bytes: Eight bytes packed into one 64-bit quantity.
• Packed words: Four (16-bit) words packed into one 64-bit quantity.
• Packed doublewords: Two (32-bit) doublewords packed into one 64-bit quantity.
• Quadword: One 64-bit quantity.
The bytes in the packed bytes data type are numbered 0 through 7, with byte 0 being contained in the least significant bits of the data type (bits 0 through 7) and byte 7 being contained in the most significant bits (bits 56 through 63). The words in the packed words data type are numbered 0 through 3, with word 0 being contained in bits 0 through 15 of the data type and word 3 being contained in bits 48 through 63. The doublewords in a packed doublewords data type are numbered 0 and 1, with doubleword 0 being contained in bits 0 through 31 and doubleword 1 being contained in bits 32 through 63.
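The element numbering can be checked with a few shifts. The following is a minimal C sketch, not part of the MMX instruction set itself: it treats a 64-bit MMX value as a uint64_t, and the helper names (packed_byte, packed_word, packed_dword) are hypothetical names chosen for illustration.

```c
#include <stdint.h>

/* Byte i (0..7) occupies bits 8*i through 8*i+7 of the 64-bit value. */
uint8_t packed_byte(uint64_t q, unsigned i) {
    return (uint8_t)(q >> (8u * i));
}

/* Word i (0..3) occupies bits 16*i through 16*i+15. */
uint16_t packed_word(uint64_t q, unsigned i) {
    return (uint16_t)(q >> (16u * i));
}

/* Doubleword i (0..1) occupies bits 32*i through 32*i+31. */
uint32_t packed_dword(uint64_t q, unsigned i) {
    return (uint32_t)(q >> (32u * i));
}
```

For the value 0807060504030201H, this model gives byte 0 = 01H, byte 7 = 08H, word 3 = 0807H, and doubleword 1 = 08070605H.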
Figure 8-2. MMX Data Types (packed bytes: 8 x 8 bits; packed words: 4 x 16 bits; packed doublewords: 2 x 32 bits; quadword: 64 bits; bit 0 is the least significant bit and bit 63 the most significant)
The MMX instructions move the packed data types (packed bytes, packed words, or packed doublewords) and the quadword data type to and from memory or to and from the IA general-purpose registers in 64-bit blocks. However, when performing arithmetic or logical operations on the packed data types, the MMX instructions operate in parallel on the individual bytes, words, or doublewords contained in a 64-bit MMX register, as described in the following section (Section 8.1.3., Single Instruction, Multiple Data (SIMD) Execution Model).
When operating on the bytes, words, and doublewords within packed data types, the MMX instructions recognize and operate on both signed and unsigned byte integers, word integers, and doubleword integers.
8.1.3. Single Instruction, Multiple Data (SIMD) Execution Model
The MMX technology uses the single instruction, multiple data (SIMD) technique for performing arithmetic and logical operations on the bytes, words, or doublewords packed into 64-bit MMX registers. For example, the PADDSB instruction adds 8 signed bytes from the source operand to 8 signed bytes in the destination operand and stores 8 byte-results in the destination operand. This SIMD technique speeds up software performance by allowing the same operation to be carried out on multiple data elements in parallel. The MMX technology supports parallel operations on byte, word, and doubleword data elements when contained in MMX registers.
The SIMD execution model supported in the MMX technology directly addresses the needs of modern media, communications, and graphics applications, which often use sophisticated algorithms that perform the same operations on a large number of small data types (bytes, words, and doublewords). For example, most audio data is represented in 16-bit (word) quantities. The MMX instructions can operate on 4 of these words simultaneously with one instruction. Video and graphics information is commonly represented as palettized 8-bit (byte) quantities. Here, one MMX instruction can operate on 8 of these bytes simultaneously.
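To make the lane-wise behavior concrete, the sketch below models a packed word addition with wraparound (the behavior of PADDW) in portable C. It is only a software model under the assumption that a 64-bit MMX value is held in a uint64_t; the function name paddw_model is hypothetical, and the loop stands in for what the hardware does on all four lanes at once.

```c
#include <stdint.h>

/* Software model of PADDW: the four 16-bit lanes are added
   independently, and a carry out of one lane never reaches the next. */
uint64_t paddw_model(uint64_t dest, uint64_t src) {
    uint64_t result = 0;
    for (unsigned lane = 0; lane < 4; lane++) {
        uint16_t a = (uint16_t)(dest >> (16u * lane));
        uint16_t b = (uint16_t)(src >> (16u * lane));
        uint16_t sum = (uint16_t)(a + b);   /* carry out of the lane is dropped */
        result |= (uint64_t)sum << (16u * lane);
    }
    return result;
}
```

Adding 0001000100010001H to 0001FFFF00020003H in this model yields 0002000000030004H: the lane holding FFFFH wraps to 0000H without disturbing its neighbor.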
8.1.4. Memory Data Formats
When stored in memory, the bytes, words, and doublewords in the packed data types are stored in consecutive addresses, with the least significant byte, word, or doubleword being stored in the lowest address and the more significant bytes, words, or doublewords being stored at consecutively higher addresses (refer to Figure 8-3). The ordering of bytes, words, or doublewords in memory is always little endian. That is, the bytes with the lower addresses are less significant than the bytes with the higher addresses.
Figure 8-3. Eight Packed Bytes in Memory (at address 1000H): Byte 0 (bits 0 through 7) is stored at memory address 1000H, and Byte 1 through Byte 7 (bits 56 through 63) follow at consecutively higher addresses below 1008H
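The little-endian layout can be sketched in C without depending on the byte order of the machine that runs the sketch. The helper names store_packed_le and load_packed_le are hypothetical; the array index plays the role of the offset from address 1000H in Figure 8-3.

```c
#include <stdint.h>

/* Write a 64-bit packed value to memory as the text describes:
   byte 0 (bits 0-7) goes to the lowest address, byte 7 to the highest. */
void store_packed_le(uint8_t mem[8], uint64_t q) {
    for (unsigned i = 0; i < 8; i++) {
        mem[i] = (uint8_t)(q >> (8u * i));
    }
}

/* Rebuild the 64-bit value from eight consecutive bytes. */
uint64_t load_packed_le(const uint8_t mem[8]) {
    uint64_t q = 0;
    for (unsigned i = 0; i < 8; i++) {
        q |= (uint64_t)mem[i] << (8u * i);
    }
    return q;
}
```

Storing 0706050403020100H places 00H at offset 0 and 07H at offset 7, and loading the same eight bytes returns the original value.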
8.1.5. Data Formats for MMX Registers
Values in MMX registers have the same format as a 64-bit quantity in memory. MMX registers have two data access modes: 64-bit access mode and 32-bit access mode. The 64-bit access mode is used for 64-bit memory access, 64-bit transfer between MMX registers, all pack, logical and arithmetic instructions, and some unpack instructions. The 32-bit access mode is used for 32-bit memory access, 32-bit transfer between integer registers and MMX registers, and some unpack instructions.
8.2. MMX INSTRUCTION SET
The MMX instruction set consists of 57 instructions, grouped into the following categories:
• Data transfer instructions
• Arithmetic instructions
• Comparison instructions
• Conversion instructions
• Logical instructions
• Shift instructions
• Empty MMX state instruction (EMMS)
When operating on packed data within an MMX register, the data is cast by the type specified by the instruction. For example, the PADDB (add packed bytes) instruction treats the packed data in an MMX register as 8 packed bytes; whereas, the PADDW (add packed words) instruction treats the packed data as 4 packed words. Refer to Section 9.3.6., Additional SIMD Integer Instructions, in Chapter 9, Programming with the Streaming SIMD Extensions, for additional SIMD integer instructions added with the Streaming SIMD Extensions.
8.2.1. Saturation Arithmetic and Wraparound Mode
The MMX technology supports a new arithmetic capability known as saturating arithmetic. Saturation is best defined by contrasting it with wraparound mode. In wraparound mode, results that overflow or underflow are truncated and only the lower (least significant) bits of the result are returned; that is, the carry is ignored. In saturation mode, results of an operation that overflow or underflow are clipped (saturated) to a data-range limit for the data type (refer to Table 8-1). The result of an operation that exceeds the range of a data-type saturates to the maximum value of the range. A result that is less than the range of a data type saturates to the minimum value of the range. This method of handling overflow and underflow is useful in many applications, such as color calculations.
Table 8-1. Data Range Limits for Saturation
Data Type        Lower Limit                Upper Limit
                 Hexadecimal   Decimal      Hexadecimal   Decimal
Signed Byte      80H           -128         7FH           127
Signed Word      8000H         -32,768      7FFFH         32,767
Unsigned Byte    00H           0            FFH           255
Unsigned Word    0000H         0            FFFFH         65,535
For example, when the result exceeds the data range limit for signed bytes, it is saturated to 7FH (FFH for unsigned bytes). If a value is less than the data range limit, it is saturated to 80H for signed bytes (00H for unsigned bytes). Saturation provides a useful feature of avoiding wraparound artifacts. In the example of color calculations, saturation causes a color to remain pure black or pure white without allowing inversion.
MMX instructions do not indicate overflow or underflow occurrence by generating exceptions or setting flags.
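The difference between the two modes is easiest to see on a single byte lane. The following C sketch models one lane of a signed saturating add, an unsigned saturating add, and a wraparound add, using the limits from Table 8-1. The function names are hypothetical, and the wide intermediate (int32_t) is an implementation choice of this sketch, not a statement about how the hardware computes the result.

```c
#include <stdint.h>

/* Clamp a wide intermediate result to the signed-byte limits of Table 8-1. */
int8_t saturate_signed_byte(int32_t v) {
    if (v > 127) return 127;       /* upper limit 7FH */
    if (v < -128) return -128;     /* lower limit 80H */
    return (int8_t)v;
}

/* Clamp a wide intermediate result to the unsigned-byte limits of Table 8-1. */
uint8_t saturate_unsigned_byte(int32_t v) {
    if (v > 255) return 255;       /* upper limit FFH */
    if (v < 0) return 0;           /* lower limit 00H */
    return (uint8_t)v;
}

/* One lane of a signed saturating add (as in PADDSB). */
int8_t add_signed_sat(int8_t a, int8_t b) {
    return saturate_signed_byte((int32_t)a + (int32_t)b);
}

/* One lane of an unsigned saturating add (as in PADDUSB). */
uint8_t add_unsigned_sat(uint8_t a, uint8_t b) {
    return saturate_unsigned_byte((int32_t)a + (int32_t)b);
}

/* One lane of a wraparound add (as in PADDB): only the low 8 bits are kept. */
uint8_t add_wrap(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);
}
```

With these definitions, 200 + 100 gives 255 under unsigned saturation but 44 under wraparound, which is the kind of artifact (a bright pixel turning dark) that saturation avoids.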
8.2.2. Instruction Operands
All MMX instructions, except the EMMS instruction, reference and operate on two operands: the source and destination operands. The first operand is the destination and the second operand is the source. The destination operand may also be a second source operand for the operation. The instruction overwrites the destination operand with the result. For example, a two-operand instruction would be decoded as:
    DEST (first operand) ← DEST (first operand) OPERATION SRC (second operand)
The source operand for all the MMX instructions (except the data transfer instructions) can reside either in memory or in an MMX register. The destination operand resides in an MMX register. For data transfer instructions, the source and destination operands can also be an integer register (for the MOVD instruction) or memory location (for both the MOVD and MOVQ instructions).
8.3. OVERVIEW OF THE MMX INSTRUCTION SET
Table 8-2 shows the instructions in the MMX instruction set. The following sections give a brief overview of each group of instructions in the MMX instruction set and the instructions within each group. Refer to Section 9.3.6., Additional SIMD Integer Instructions, in Chapter 9, Programming with the Streaming SIMD Extensions, for additional SIMD integer instructions added with the Streaming SIMD Extensions.
8.3.1. Data Transfer Instructions
The MOVD (Move 32 Bits) instruction transfers 32 bits of packed data from memory to MMX registers and vice versa, or from integer registers to MMX registers and vice versa. The MOVQ (Move 64 Bits) instruction transfers 64 bits of packed data from memory to MMX registers and vice versa, or transfers data between MMX registers.
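A small C model can illustrate the widths involved. This is a sketch under stated assumptions: the function names are hypothetical, and the zero-extension of the upper doubleword on a MOVD into an MMX register is taken from the instruction's description in the instruction set reference rather than from this section.

```c
#include <stdint.h>

/* MOVD into an MMX register: the 32-bit source lands in the low
   doubleword; the high doubleword is cleared (assumption noted above). */
uint64_t movd_to_mmx(uint32_t src) {
    return (uint64_t)src;
}

/* MOVD out of an MMX register: only the low doubleword is transferred. */
uint32_t movd_from_mmx(uint64_t mm) {
    return (uint32_t)mm;
}

/* MOVQ: all 64 bits are transferred unchanged. */
uint64_t movq_model(uint64_t src) {
    return src;
}
```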
Table 8-2. MMX Instruction Set Summary
Category         Operation                 Wraparound                        Signed Saturation    Unsigned Saturation
Arithmetic       Addition                  PADDB, PADDW, PADDD               PADDSB, PADDSW       PADDUSB, PADDUSW
                 Subtraction               PSUBB, PSUBW, PSUBD               PSUBSB, PSUBSW       PSUBUSB, PSUBUSW
                 Multiplication            PMULLW, PMULHW
                 Multiply and Add          PMADDWD
Comparison       Compare for Equal         PCMPEQB, PCMPEQW, PCMPEQD
                 Compare for Greater Than  PCMPGTB, PCMPGTW, PCMPGTD
Conversion       Pack                                                        PACKSSWB, PACKSSDW   PACKUSWB
                 Unpack High               PUNPCKHBW, PUNPCKHWD, PUNPCKHDQ
                 Unpack Low                PUNPCKLBW, PUNPCKLWD, PUNPCKLDQ

Category         Operation                 Packed                            Full Quadword
Logical          And                                                         PAND
                 And Not                                                     PANDN
                 Or                                                          POR
                 Exclusive OR                                                PXOR
Shift            Shift Left Logical        PSLLW, PSLLD                      PSLLQ
                 Shift Right Logical       PSRLW, PSRLD                      PSRLQ
                 Shift Right Arithmetic    PSRAW, PSRAD

Category         Operation                 Doubleword Transfers              Quadword Transfers
Data Transfer    Register to Register      MOVD                              MOVQ
                 Load from Memory          MOVD                              MOVQ
                 Store to Memory           MOVD                              MOVQ
Empty MMX State                            EMMS
8.3.2. Arithmetic Instructions
The arithmetic instructions perform addition, subtraction, multiplication, and multiply/add operations on packed data types.
8.3.2.1. PACKED ADDITION AND SUBTRACTION
The PADDB, PADDW, and PADDD (packed add) and PSUBB, PSUBW, and PSUBD (packed subtract) instructions add or subtract the signed or unsigned data elements of the source operand to or from the destination operand in wraparound mode. These instructions support packed byte, packed word, and packed doubleword data types.
The PADDSB and PADDSW (packed add with saturation) and PSUBSB and PSUBSW (packed subtract with saturation) instructions add or subtract the signed data elements of the source operand to or from the signed data elements of the destination operand and saturate the result to the limits of the signed data-type range. These instructions support packed byte and packed word data types.
The PADDUSB and PADDUSW (packed add unsigned with saturation) and PSUBUSB and PSUBUSW (packed subtract unsigned with saturation) instructions add or subtract the unsigned data elements of the source operand to or from the unsigned data elements of the destination operand and saturate the result to the limits of the unsigned data-type range. These instructions support packed byte and packed word data types.
8.3.2.2. PACKED MULTIPLICATION
Packed multiplication instructions perform four multiplications on pairs of signed 16-bit operands, producing 32-bit intermediate results. Users may choose the low-order or high-order parts of each 32-bit result. The PMULHW (packed multiply high) and PMULLW (packed multiply low) instructions multiply the signed words of the source and destination operands and write the high-order or low-order 16 bits of each of the results to the destination operand.
8.3.2.3. PACKED MULTIPLY ADD
The PMADDWD (packed multiply and add) instruction calculates the products of the signed words of the source and destination operands. The four intermediate 32-bit doubleword products are summed in pairs to produce two 32-bit doubleword results.
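The packed multiply and multiply-add operations described above can be modeled in C as follows. This is an illustrative sketch only: the function names are hypothetical, a uint64_t stands in for an MMX register, and the conversion of a 16-bit pattern to int16_t assumes the usual two's-complement behavior of C compilers.

```c
#include <stdint.h>

/* Signed 16-bit lane i (0..3) of a packed-words value. */
int16_t signed_word_lane(uint64_t q, unsigned i) {
    return (int16_t)(uint16_t)(q >> (16u * i));
}

/* Model of PMULLW: low-order 16 bits of each 32-bit product. */
uint64_t pmullw_model(uint64_t dest, uint64_t src) {
    uint64_t r = 0;
    for (unsigned i = 0; i < 4; i++) {
        int32_t p = (int32_t)signed_word_lane(dest, i) * signed_word_lane(src, i);
        r |= (uint64_t)((uint32_t)p & 0xFFFFu) << (16u * i);
    }
    return r;
}

/* Model of PMULHW: high-order 16 bits of each 32-bit product. */
uint64_t pmulhw_model(uint64_t dest, uint64_t src) {
    uint64_t r = 0;
    for (unsigned i = 0; i < 4; i++) {
        int32_t p = (int32_t)signed_word_lane(dest, i) * signed_word_lane(src, i);
        r |= (uint64_t)((uint32_t)p >> 16) << (16u * i);
    }
    return r;
}

/* Model of PMADDWD: four products, summed in adjacent pairs
   into two 32-bit doubleword results. */
uint64_t pmaddwd_model(uint64_t dest, uint64_t src) {
    uint64_t r = 0;
    for (unsigned d = 0; d < 2; d++) {
        int32_t p0 = (int32_t)signed_word_lane(dest, 2 * d) * signed_word_lane(src, 2 * d);
        int32_t p1 = (int32_t)signed_word_lane(dest, 2 * d + 1) * signed_word_lane(src, 2 * d + 1);
        uint32_t sum = (uint32_t)p0 + (uint32_t)p1;   /* unsigned add avoids signed overflow in C */
        r |= (uint64_t)sum << (32u * d);
    }
    return r;
}
```

With destination words (2, 3, -1, 4000H) and source words (10, 20, 5, 4), listed from word 0 to word 3, the products are 20, 60, -5, and 65,536. The model returns low halves 0014H, 003CH, FFFBH, 0000H; high halves 0000H, 0000H, FFFFH, 0001H; and pair sums 80 and 65,531.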
8.3.3. Comparison Instructions
The PCMPEQB, PCMPEQW, and PCMPEQD (packed compare for equal) and PCMPGTB, PCMPGTW, and PCMPGTD (packed compare for greater than) instructions compare the corresponding data elements in the source and destination operands for equality or value greater than, respectively. These instructions generate a mask of ones or zeros which are written to the destination operand. Logical operations can use the mask to select elements. This can be used to implement a packed conditional move operation without a branch or a set of branch instructions. No flags are set. These instructions support packed byte, packed word and packed doubleword data types.
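The mask-and-select idea can be sketched in C. The example below models a signed word compare for greater than and then uses the resulting mask with AND, AND NOT, and OR (the roles played by PAND, PANDN, and POR) to pick the larger word in each lane without branching on the data. The function names are hypothetical, and a uint64_t stands in for an MMX register.

```c
#include <stdint.h>

/* Model of PCMPGTW: each 16-bit lane of the result is FFFFH if the
   signed destination word is greater than the source word, else 0000H. */
uint64_t pcmpgtw_model(uint64_t dest, uint64_t src) {
    uint64_t mask = 0;
    for (unsigned i = 0; i < 4; i++) {
        int16_t a = (int16_t)(uint16_t)(dest >> (16u * i));
        int16_t b = (int16_t)(uint16_t)(src >> (16u * i));
        if (a > b) {
            mask |= (uint64_t)0xFFFFu << (16u * i);
        }
    }
    return mask;
}

/* Lane-wise signed maximum built from the mask:
   (a AND mask) OR (b AND NOT mask). */
uint64_t packed_max_words(uint64_t a, uint64_t b) {
    uint64_t mask = pcmpgtw_model(a, b);
    return (a & mask) | (b & ~mask);
}
```

For a = (5, -1, 7, 0) and b = (3, 2, 7, 1), listed from word 0 to word 3, the mask is set only in lane 0 and the selected result is (5, 2, 7, 1).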
8.3.4. Conversion Instructions
The conversion instructions convert the data elements within a packed data type. The PACKSSWB and PACKSSDW (pack with signed saturation) instructions convert signed words into signed bytes or signed doublewords into signed words, in signed saturation mode. The PACKUSWB (pack with unsigned saturation) instruction converts signed words into unsigned bytes, in unsigned saturation mode. The PUNPCKHBW, PUNPCKHWD, and PUNPCKHDQ (unpack high packed data) and PUNPCKLBW, PUNPCKLWD, and PUNPCKLDQ (unpack low packed data) instructions convert bytes to words, words to doublewords, or doublewords to quadwords.
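As an illustration of narrowing and widening, the C sketch below models PACKSSWB and PUNPCKLBW. The element ordering used here (destination elements in the low half of the packed result; destination and source bytes interleaved on unpack) follows the instruction reference as understood by this sketch and is an assumption relative to this section. The function names are hypothetical.

```c
#include <stdint.h>

/* Saturate a signed word to the signed-byte range (80H to 7FH). */
int8_t saturate_word_to_sbyte(int16_t w) {
    if (w > 127) return 127;
    if (w < -128) return -128;
    return (int8_t)w;
}

/* Model of PACKSSWB: four destination words become result bytes 0-3,
   four source words become result bytes 4-7, each with signed saturation. */
uint64_t packsswb_model(uint64_t dest, uint64_t src) {
    uint64_t r = 0;
    for (unsigned i = 0; i < 4; i++) {
        int16_t d = (int16_t)(uint16_t)(dest >> (16u * i));
        int16_t s = (int16_t)(uint16_t)(src >> (16u * i));
        r |= (uint64_t)(uint8_t)saturate_word_to_sbyte(d) << (8u * i);
        r |= (uint64_t)(uint8_t)saturate_word_to_sbyte(s) << (8u * (i + 4u));
    }
    return r;
}

/* Model of PUNPCKLBW: interleave the low four bytes of the destination
   (even result bytes) with the low four bytes of the source (odd bytes). */
uint64_t punpcklbw_model(uint64_t dest, uint64_t src) {
    uint64_t r = 0;
    for (unsigned i = 0; i < 4; i++) {
        uint8_t d = (uint8_t)(dest >> (8u * i));
        uint8_t s = (uint8_t)(src >> (8u * i));
        r |= (uint64_t)d << (16u * i);
        r |= (uint64_t)s << (16u * i + 8u);
    }
    return r;
}
```

Unpacking against a source of zero, as in the second function, is a common way to zero-extend unsigned bytes to words before doing word arithmetic on them.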
8.3.5. Logical Instructions
The PAND (bitwise logical AND), PANDN (bitwise logical AND NOT), POR (bitwise logical OR), and PXOR (bitwise logical exclusive OR) instructions perform bitwise logical operations on 64-bit quantities.
8.3.6. Shift Instructions
The logical shift left, logical shift right and arithmetic shift right instructions shift each element by a specified number of bits. The logical left and right shifts also enable a 64-bit quantity (quadword) to be shifted as one block, assisting in data type conversions and alignment operations.
The PSLLW, PSLLD, and PSLLQ (packed shift left logical) and PSRLW, PSRLD, and PSRLQ (packed shift right logical) instructions perform a logical left or right shift, and fill the empty high or low order bit positions with zeros. These instructions support packed word, packed doubleword, and quadword data types.
The PSRAW and PSRAD (packed shift right arithmetic) instructions perform an arithmetic right shift, copying the sign bit into empty bit positions on the upper end of the operand. These instructions support packed word and packed doubleword data types.
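The difference between the logical and arithmetic right shifts shows up only in lanes whose sign bit is set. The C sketch below models PSRLW and PSRAW on four word lanes. The function names are hypothetical, and the handling of shift counts above 15 (all zeros for the logical shift, all sign bits for the arithmetic shift) is taken from the instruction reference rather than from this section.

```c
#include <stdint.h>

/* Model of PSRLW: logical right shift of each word, zero fill. */
uint64_t psrlw_model(uint64_t dest, unsigned count) {
    uint64_t r = 0;
    for (unsigned i = 0; i < 4; i++) {
        uint16_t w = (uint16_t)(dest >> (16u * i));
        uint16_t out = (count > 15u) ? 0 : (uint16_t)(w >> count);
        r |= (uint64_t)out << (16u * i);
    }
    return r;
}

/* Model of PSRAW: arithmetic right shift of each word, sign fill. */
uint64_t psraw_model(uint64_t dest, unsigned count) {
    if (count > 15u) count = 15u;   /* larger counts leave only sign bits */
    uint64_t r = 0;
    for (unsigned i = 0; i < 4; i++) {
        uint16_t w = (uint16_t)(dest >> (16u * i));
        uint16_t out = (uint16_t)(w >> count);
        if (w & 0x8000u) {
            /* copy the sign bit into the vacated upper bit positions */
            out |= (uint16_t)~(0xFFFFu >> count);
        }
        r |= (uint64_t)out << (16u * i);
    }
    return r;
}
```

Shifting 80007FFF0004FFF0H right by 2 gives 20001FFF00013FFCH logically and E0001FFF0001FFFCH arithmetically: only the two negative lanes differ.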
8.3.7. EMMS (Empty MMX State) Instruction
The EMMS instruction empties the MMX state. This instruction must be used to clear the MMX state (empty the floating-point tag word) at the end of an MMX routine before calling other routines that can execute floating-point instructions.
8.4. COMPATIBILITY WITH FPU ARCHITECTURE
The MMX state is aliased upon the IA floating-point state. No new state or mode is added to support the MMX technology. The same floating-point instructions that save and restore the floating-point state also handle the MMX state (for example, during context switching). MMX technology uses the same interface techniques between the floating-point architecture and the operating system (primarily for task switching purposes). For more details, refer to Chapter 10, MMX Technology System Programming, in the Intel Architecture Software Developer's Manual, Volume 3.
8.4.1. MMX Instructions and the Floating-Point Tag Word
After each MMX instruction, the entire floating-point tag word is set to Valid (00s). The Empty MMX state (EMMS) instruction sets the entire floating-point tag word to Empty (11s). Chapter 10, MMX Technology System Programming, in the Intel Architecture Software Developer's Manual, Volume 3, describes the effects of floating-point and MMX instructions on the floating-point tag word. For details on the floating-point tag word, refer to Section 7.3.6., FPU Tag Word in Chapter 7, Floating-Point Unit.
8.4.2. Effect of Instruction Prefixes on MMX Instructions
Table 8-3 details the effect of an instruction prefix on an MMX instruction.

Table 8-3. Effect of Prefixes on MMX Instructions
Prefix Type           Effect of Prefix
Address size (67H)    Affects MMX instructions with a memory operand. Ignored by MMX instructions without a memory operand.
Operand size (66H)    Reserved.
Segment override      Affects MMX instructions with a memory operand. Ignored by MMX instructions without a memory operand.
Repeat                Reserved.
Lock (F0H)            Generates an invalid opcode exception.
Refer to Section 2.2., Instruction Prefixes in Chapter 2, Instruction Format of the Intel Architecture Software Developer's Manual, Volume 2, for detailed information on prefixes.
8.5. WRITING APPLICATIONS WITH MMX CODE
The following sections give guidelines for writing applications code using the MMX technology.
8.5.1. Detecting Support for MMX Technology Using the CPUID Instruction
Use the CPUID instruction to determine whether the processor supports the MMX instruction set (refer to Section 3.2., Instruction Reference in Chapter 3, Instruction Set Reference of the Intel Architecture Software Developer's Manual, Volume 2, for a detailed description of the CPUID instruction). When the processor supports MMX technology, the CPUID instruction signals this by setting bit 23 (MMX technology bit) in the feature flags to 1.
In general, two versions of the routine can be created: one with scalar instructions and one with MMX instructions. The application will call the appropriate routine depending on the results of the CPUID instruction. If support for MMX technology is detected, then the MMX routine is called; if no support for the MMX technology exists, the application calls the scalar routine.
NOTE
The CPUID instruction will continue to report the existence of the MMX technology even if the CR0.EM bit is set (which signifies that the CPU is configured to generate exception interrupt 7 that can be used to emulate floating-point instructions). In this case, executing an MMX instruction results in an invalid opcode exception.
Example 8-1 illustrates how to use the CPUID instruction. This example does not represent the entire CPUID sequence, but shows the portion used for detection of MMX technology.
Example 8-1. Partial Routine for Detecting MMX Technology with the CPUID Instruction
                                ; identify existence of CPUID instruction
                                ; identify Intel processor
    mov   EAX, 1                ; request for feature flags
    CPUID                       ; 0Fh, 0A2h CPUID instruction
    test  EDX, 00800000h        ; Is IA MMX technology bit (Bit 23 of EDX)
                                ; in feature flags set?
    jnz   MMX_Technology_Found
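The bit test in Example 8-1, combined with the caveat in the note about CR0.EM, can be expressed as a small C sketch. The function names and the idea of passing the register values in as parameters are assumptions made for illustration: the sketch does not execute CPUID itself, and CR0 is normally readable only by system software. CR0.EM is bit 2 of CR0.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_EDX_MMX_BIT 23u   /* feature flag bit tested in Example 8-1 */
#define CR0_EM_BIT         2u   /* emulation bit in control register CR0 */

/* True if the feature flags returned in EDX (CPUID with EAX = 1)
   report MMX technology; the same test as "test EDX, 00800000h". */
bool cpuid_reports_mmx(uint32_t edx_features) {
    return (edx_features & (UINT32_C(1) << CPUID_EDX_MMX_BIT)) != 0;
}

/* True if MMX instructions can actually be executed: the feature bit is
   set and CR0.EM is clear (with EM set, they raise invalid opcode). */
bool mmx_usable(uint32_t edx_features, uint32_t cr0) {
    bool em_set = (cr0 & (UINT32_C(1) << CR0_EM_BIT)) != 0;
    return cpuid_reports_mmx(edx_features) && !em_set;
}
```

An application would typically evaluate this once at startup and then dispatch to either its scalar routine or its MMX routine.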
8.5.2. Using the EMMS Instruction
When integrating an MMX routine into an application running under an existing operating system, programmers need to take special precautions, similar to those when writing floating-point code.
When an MMX instruction executes, the floating-point tag word is marked valid (00s). Subsequent floating-point instructions that will be executed may produce unexpected results because the floating-point stack seems to contain valid data. The EMMS instruction marks the floating-point tag word as empty. Therefore, it is imperative to use the EMMS instruction at the end of every MMX routine, if the next routine may contain FPU code.
The EMMS instruction must be used in each of the following cases:
• When an application using the floating-point instructions calls an MMX technology library/DLL. (Use the EMMS instruction at the end of the MMX code.)
• When an application using MMX instructions calls a floating-point library/DLL. (Use the EMMS instruction before calling the floating-point code.)
• When a switch is made between MMX code in a task/thread and other tasks/threads in cooperative operating systems, unless it is certain that more MMX instructions will be executed before any FPU code.
If the EMMS instruction is not used when trying to execute a floating-point instruction, the following may occur:
• Depending on the exception mask bits of the floating-point control word, a floating-point exception event may be generated.
• A soft exception may occur. In this case floating-point code continues to execute, but generates incorrect results. This happens when the floating-point exceptions are masked and no visible exceptions occur. The internal exception handler (microcode, not user visible) loads a NaN (Not a Number) with an exponent of 11..11B onto the floating-point stack. The NaN is used for further calculations, yielding incorrect results.
• A potential error may occur only if the operating system does NOT manage floating-point context across task switches. These operating systems are usually cooperative operating systems. It is imperative that the EMMS instruction execute at the end of all the MMX routines that may enable a task switch immediately after they end execution (explicit yield API or implicit yield API).
The EMMS instruction is not required when mixing MMX technology instructions and Streaming SIMD Extensions. Refer to Section 9.4., Compatibility with FPU Architecture, in Chapter 9, Programming with the Streaming SIMD Extensions, for more detailed information.
8.5.3. Interfacing with MMX Code
The MMX technology enables direct access to all the MMX registers. This means that all existing interface conventions that apply to the use of the processor's general-purpose registers (EAX, EBX, etc.) also apply to use of the MMX registers. An efficient interface to MMX routines might pass parameters and return values through the MMX registers or through a combination of memory locations (via the stack) and MMX registers. Such an interface would have to be written in assembly language since passing parameters through MMX registers is not currently supported by any existing C compilers. Do not use the EMMS instruction when the interface to the MMX code has been defined to retain values in the MMX registers.
If a high-level language, such as C, is used, the data types could be defined as a 64-bit structure with packed data types. When implementing usage of MMX instructions in high-level languages other approaches can be taken, such as:
• Passing parameters to an MMX routine by passing a pointer to a structure via the integer stack.
• Returning a value from a function by returning the pointer to a structure.
8.5.4. Writing Code with MMX and Floating-Point Instructions
The MMX technology aliases the MMX registers on the floating-point registers. The main reason for this is to enable MMX technology to be fully compatible and transparent to existing software environments (operating systems and applications). This way operating systems will be able to include new applications and drivers that use the MMX technology.
An application can contain both floating-point and MMX code. However, the user is discouraged from causing frequent transitions between MMX and floating-point instructions by mixing MMX code and floating-point code.
8.5.4.1. RECOMMENDATIONS AND GUIDELINES
Do not mix MMX code and floating-point code at the instruction level for the following reasons:
• The TOS (top of stack) value of the floating-point status word is set to 0 after each MMX instruction. This means that the floating-point code loses its pointer to its floating-point registers if the code mixes MMX instructions within a floating-point routine.
• An MMX instruction write to an MMX register writes ones (11s) to the exponent part of the corresponding floating-point register.
• Floating-point code that uses register contents that were generated by the MMX instructions may cause floating-point exceptions or incorrect results. These floating-point exceptions are related to undefined floating-point values and floating-point stack usage.
• All MMX instructions (except EMMS) set the entire tag word to the valid state (00s in all tag fields) without preserving the previous floating-point state.
• Frequent transitions between the MMX and floating-point instructions may result in significant performance degradation in some implementations.
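The side effects listed above can be collected into a toy state model. The C sketch below is illustrative only: the structure layout and the names fpu_mmx_state, mmx_write, and emms_model are hypothetical, and setting the sign bit together with the exponent (all of bits 64 through 79) is an assumption based on the Volume 3 description rather than on this section.

```c
#include <stdint.h>

/* Toy model of the state shared by the FPU and MMX register files. */
typedef struct {
    uint64_t significand[8];    /* bits 0-63 of R0..R7, aliased by MM0..MM7 */
    uint16_t sign_exponent[8];  /* bits 64-79 of R0..R7 */
    uint16_t tag_word;          /* two bits per register: 00 valid, 11 empty */
    unsigned tos;               /* top-of-stack field of the status word */
} fpu_mmx_state;

/* Effects of an MMX instruction that writes register MMn. */
void mmx_write(fpu_mmx_state *s, unsigned n, uint64_t value) {
    s->significand[n & 7u] = value;
    s->sign_exponent[n & 7u] = 0xFFFFu;  /* exponent written with ones (sign too, by assumption) */
    s->tag_word = 0x0000u;               /* entire tag word becomes valid (00s) */
    s->tos = 0;                          /* TOS is reset to 0 */
}

/* Effect of EMMS: the entire tag word becomes empty (11s). */
void emms_model(fpu_mmx_state *s) {
    s->tag_word = 0xFFFFu;
}
```

The model shows why floating-point code that follows MMX code without an intervening EMMS sees a stack that looks full of valid data, with the stack pointer no longer where the floating-point code left it.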
If the application contains floating-point and MMX instructions, follow these guidelines:
• Partition the MMX technology module and the floating-point module into separate instruction streams (separate loops or subroutines) so that they contain only instructions of one type.
• Do not rely on register contents across transitions.
• When the MMX state is not required, empty the MMX state using the EMMS instruction.
• Exit the floating-point code section with an empty stack.
Example 8-2. Floating-point (FP) and MMX Code
FP_code:
    ..
    ..              (*leave the FPU stack empty*)
MMX_code:
    ..
    EMMS            (*mark the FPU tag word as empty*)
FP_code 1:
    ..
    ..              (*leave the FPU stack empty*)
8.5.5. Using MMX Code in a Multitasking Operating System Environment
An application needs to identify the nature of the multitasking operating system on which it runs. Each task retains its own state which must be saved when a task switch occurs. The processor state (context) consists of the general-purpose registers and the floating-point and MMX registers. Operating systems can be classified into two types:
• Cooperative multitasking operating system.
• Preemptive multitasking operating system.
The behavior of the two operating-system types in context switching is described in Section 10.4., Designing Operating System Task and Context Switching Facilities in Chapter 10, MMX Technology System Programming, of the Intel Architecture Software Developer's Manual, Volume 3.
8.5.5.1. COOPERATIVE MULTITASKING OPERATING SYSTEM
Cooperative multitasking operating systems do not save the FPU or MMX state when performing a context switch. Therefore, the application needs to save the relevant state before relinquishing direct or indirect control to the operating system.
8.5.5.2. PREEMPTIVE MULTITASKING OPERATING SYSTEM
Preemptive multitasking operating systems are responsible for saving and restoring the FPU and MMX state when performing a context switch. Therefore, the application does not have to save or restore the FPU and MMX state.
8.5.6. Exception Handling in MMX Code
MMX instructions generate the same type of memory-access exceptions as other IA instructions. Some examples are: page fault, segment not present, and limit violations. Existing exception handlers can handle these types of exceptions. They do not have to be modified. Unless there is a pending floating-point exception, MMX instructions do not generate numeric exceptions. Therefore, there is no need to modify existing exception handlers or add new ones. If a floating-point exception is pending, the subsequent MMX instruction generates a numeric error exception (interrupt 16 and/or FERR#). The MMX instruction resumes execution upon return from the exception handler.
8.5.7. Register Mapping
The MMX registers and their tags are mapped to physical locations of the floating-point registers and their tags. Register aliasing and mapping is described in more detail in Chapter 10, MMX Technology System Programming Model, in the Intel Architecture Software Developer's Manual, Volume 3.
CHAPTER 9 PROGRAMMING WITH THE STREAMING SIMD EXTENSIONS

The Intel Streaming SIMD Extensions comprise a set of extensions to the Intel Architecture (IA) that is designed to greatly enhance the performance of advanced media and communications applications. These extensions (which include new registers, data types, and instructions) are combined with a single-instruction, multiple-data (SIMD) execution model to accelerate the performance of applications. Applications that typically use compute-intensive algorithms to perform repetitive operations on large arrays of simple, native data elements benefit the most. Applications that require regular access to large amounts of data also benefit from the Streaming SIMD Extensions' prefetching and streaming store capabilities. Examples of these types of applications include:
• motion video
• combined graphics with video
• image processing
• audio synthesis
• speech recognition, synthesis, and compression
• telephony
• video conferencing
• 2D and 3D graphics.
The Streaming SIMD Extensions define a simple and flexible software model. This new mode introduces a new operating-system visible state. To enhance performance and yield more concurrent execution, a new set of registers has been added. All existing software will continue to run correctly without modification on IA processors that incorporate the Streaming SIMD Extensions, even in the presence of existing and new applications that incorporate this technology.
The following sections of this chapter describe the Streaming SIMD Extensions' basic programming environment, including the SIMD floating-point register set, data types, and instruction set. Detailed descriptions of the Streaming SIMD Extensions are provided in Chapter 3, Instruction Set Reference, of the Intel Architecture Software Developer's Manual, Volume 2. The manner in which the Streaming SIMD Extensions are integrated into the IA system programming model is described in Chapter 10, MMX Technology System Programming, in the Intel Architecture Software Developer's Manual, Volume 3.
9.1. OVERVIEW OF THE STREAMING SIMD EXTENSIONS
The Streaming SIMD Extensions introduce new, general-purpose, floating-point instructions that operate on a new set of eight 128-bit SIMD floating-point registers. This set enables the programmer to develop algorithms that can finely mix packed, single-precision, floating-point and integer operations, using Streaming SIMD Extensions and MMX instructions respectively. In addition to these instructions, Streaming SIMD Extensions also provide new instructions to control cacheability of all MMX technology and 32-bit IA data types. These instructions include the ability to stream data to memory without polluting the caches, and the ability to prefetch data before it is actually used.
The Streaming SIMD Extensions provide the following new extensions to the IA programming environment:
Eight SIMD floating-point registers (XMM0 through XMM7).
SIMD floating-point data type: 128-bit, packed floating-point.
The Streaming SIMD Extensions set.
The SIMD floating-point registers and data types are described in the following sections. Refer to Section 9.3., Overview of the Streaming SIMD Extensions Set, for an overview of the Streaming SIMD Extensions.
9.1.1. SIMD Floating-Point Registers
The IA Streaming SIMD Extensions provide eight 128-bit general-purpose registers, each of which can be directly addressed. These registers are new, and using them requires support from the operating system. The SIMD floating-point registers hold packed 128-bit data. The Streaming SIMD Extensions access the SIMD floating-point registers directly using the register names XMM0 to XMM7 (Figure 9-1). SIMD floating-point registers can be used to perform calculations on data; they cannot be used to address memory. Addressing is accomplished by using the standard IA addressing modes and the general-purpose integer registers (EAX, EBX, ECX, EDX, EBP, ESI, EDI, and ESP). A new control/status register, MXCSR, is used to mask and unmask numerical exception handling, to set rounding modes, to set flush-to-zero mode, and to view status flags.
[Figure: the eight 128-bit registers, XMM7 through XMM0]
Figure 9-1. SIMD Floating-Point Registers
MMX registers are mapped onto the floating-point registers, so transitioning from MMX operations to floating-point operations requires executing the EMMS instruction. Because the SIMD floating-point registers are a separate register file, MMX instructions and floating-point instructions can be mixed with Streaming SIMD Extensions without executing a special instruction such as EMMS.
9.1.2. SIMD Floating-Point Data Types
The principal data type of the IA Streaming SIMD Extensions is a packed, single-precision, floating-point operand, specifically:
Four 32-bit single-precision (SP), floating-point numbers (Figure 9-2)
The new SIMD-integer instructions will operate on the packed byte, word or doubleword data types. The new prefetch instruction works on typeless data of size 32 bytes or greater.
[Figure: one 128-bit operand divided into four 32-bit fields occupying bits 127-96, 95-64, 63-32, and 31-0]
Figure 9-2. Packed Single-FP
The 32-bit, single-precision, floating-point numbers (doublewords) are numbered 0 through 3, with 0 being contained in the least significant 32 bits (doubleword) of the register. The Streaming SIMD Extensions move the packed data types (single-precision, floating-point doublewords) to and from memory in 64-bit or 128-bit blocks. However, when performing arithmetic or logical operations on the packed data types, the Streaming SIMD Extensions operate in parallel on the individual doublewords contained in the SIMD floating-point registers, as described in Section 9.1.3., Single Instruction, Multiple Data (SIMD) Execution Model.
The new SIMD-integer instructions follow the conventions of the MMX instructions and operate on data in the MMX registers, not the 128-bit SIMD floating-point registers (refer to Section 8.1.1., MMX Registers and Section 8.1.2., MMX Data Types in Chapter 8, Programming with the Intel MMX Technology).
9.1.3. Single Instruction, Multiple Data (SIMD) Execution Model
The Streaming SIMD Extensions use the Single Instruction, Multiple Data (SIMD) technique for performing arithmetic and logical operations on the single-precision, floating-point values in the 128-bit SIMD floating-point registers. This technique speeds up software performance by processing multiple data elements in parallel, using a single instruction. The Streaming SIMD Extensions support operations on packed, single-precision, floating-point data types, and the additional SIMD Integer instructions support operations on packed quadword data types (byte, word, or doubleword). This approach was chosen because most media processing applications have the following characteristics:
inherently parallel;
wide dynamic range, hence floating-point based;
regular and re-occurring memory access patterns;
localized re-occurring operations performed on the data;
data independent control flow.
The Streaming SIMD Extensions are 100% compatible with the IEEE Standard 754 for Binary Floating-Point Arithmetic. The Streaming SIMD Extensions are accessible from all IA execution modes: Protected mode, Real-address mode, and Virtual 8086 mode.
9.1.4. Pentium III Processor Single Precision Floating-Point Format
The Pentium III processor's SIMD floating-point instructions operate on a 32-bit single precision floating-point number. For specific information and details on real numbers and special values represented by the IEEE single precision (32-bit) format, and how the Pentium III processor operates on these values, refer to Section 7.2., Real Numbers and Floating-Point Formats in Chapter 7, Floating-Point Unit.
9.1.5. Memory Data Formats
The IA Streaming SIMD Extensions introduce a new packed 128-bit data type that consists of four single-precision, floating-point numbers. The 128 bits are numbered 0 through 127. Bit 0 is the least significant bit (LSB), and bit 127 is the most significant bit (MSB). Bytes in the new data type format have consecutive memory addresses. The ordering is always little endian; that is, the bytes with the lower addresses are less significant than the bytes with the higher addresses (Figure 9-3).
[Figure: bytes 15 through 0 of the packed operand, with byte 0 at the lowest memory address and byte 15 at the highest]
Figure 9-3. Four Packed FP Data in Memory (at address 1000H)
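The byte ordering described above can be checked with a small model. The following Python sketch is illustrative only: it does not execute any Streaming SIMD Extensions instruction, and the helper names pack_ps and unpack_ps are hypothetical. It assumes only that the packed operand is four IEEE single-precision values stored little endian, with element 0 at the lowest address.

```python
import struct

def pack_ps(values):
    """Build the 16-byte memory image of a packed single-precision operand.

    Hypothetical helper: element 0 is placed at the lowest address.
    """
    if len(values) != 4:
        raise ValueError("a packed operand holds exactly four elements")
    # "<4f": little endian, four IEEE single-precision values in order.
    return struct.pack("<4f", *values)

def unpack_ps(raw):
    """Recover the four elements from a 16-byte memory image."""
    return list(struct.unpack("<4f", raw))

operand = pack_ps([1.0, 2.0, 3.0, 4.0])

# Reading all 16 bytes as one little-endian integer gives the register image:
# bit 0 is the LSB of byte 0, bit 127 is the MSB of byte 15.
register_image = int.from_bytes(operand, "little")
element0_bits = register_image & 0xFFFFFFFF   # bits 31-0
element3_bits = register_image >> 96          # bits 127-96
```

Here element0_bits is the encoding of 1.0 (3F800000H) and element3_bits is the encoding of 4.0 (40800000H), which shows that the lowest-addressed doubleword is the least significant one.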
9.1.6. SIMD Floating-Point Register Data Formats
Values in SIMD floating-point registers have the same format as a 128-bit quantity in memory. They have two data access modes: 128-bit access mode and 32-bit access mode. The data type corresponds directly to the single-precision format in the IEEE standard. Table 9-1 gives the precision and range of this data type. Only the fraction part of the significand is encoded. The integer bit is assumed to be 1 for all numbers except 0 and denormalized finite numbers. The exponent of the single-precision data type is encoded in biased format. The biasing constant is 127 for the single-precision format.
Table 9-1. Precision and Range of SIMD Floating-point Datatype
Data Type | Length | Precision (Bits) | Approximate Normalized Range (Binary) | Approximate Normalized Range (Decimal)
single-precision | 32 | 24 | 2^-126 to 2^127 | 1.18 × 10^-38 to 1.70 × 10^38
Table 9-2 shows the encodings for all the classes of real numbers (that is, zero, denormalized-finite, normalized-finite, and ∞) and NaNs for the single-real data type. It also gives the format for the real indefinite value, which is a QNaN encoding that is generated by several Streaming SIMD Extensions in response to a masked, floating-point, invalid operation exception.
When storing real values in memory, single-real values are stored in 4 consecutive bytes in memory. The 128-bit access mode is used for 128-bit memory accesses, 128-bit transfers between SIMD floating-point registers, and all logical, unpack, and packed arithmetic instructions. The 32-bit access mode is used for 32-bit memory accesses, 32-bit transfers between SIMD floating-point registers, and all scalar arithmetic instructions.
Table 9-2. Real Number and NaN Encodings
Class | Sign | Biased Exponent | Integer (1) | Fraction
+∞ | 0 | 11..11 | 1 | 00..00
+Normals (largest) | 0 | 11..10 | 1 | 11..11
+Normals (smallest) | 0 | 00..01 | 1 | 00..00
+Denormals (largest) | 0 | 00..00 | 0 | 11..11
+Denormals (smallest) | 0 | 00..00 | 0 | 00..01
+Zero | 0 | 00..00 | 0 | 00..00
-Zero | 1 | 00..00 | 0 | 00..00
-Denormals (smallest magnitude) | 1 | 00..00 | 0 | 00..01
-Denormals (largest magnitude) | 1 | 00..00 | 0 | 11..11
-Normals (smallest magnitude) | 1 | 00..01 | 1 | 00..00
-Normals (largest magnitude) | 1 | 11..10 | 1 | 11..11
-∞ | 1 | 11..11 | 1 | 00..00
SNaN | X | 11..11 | 1 | 0X..XX (2)
QNaN | X | 11..11 | 1 | 1X..XX
Real Indefinite (QNaN) | 1 | 11..11 | 1 | 10..00
Field width (Single) | | 8 Bits | | 23 Bits
NOTES:
1. Integer bit is implied and not stored for single-real and double-real formats.
2. The fraction for SNaN encodings must be non-zero.
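The encodings in Table 9-2 can be exercised with a short decoder. This is a minimal Python sketch, not part of the architecture definition; the function names are hypothetical. It assumes the single-precision layout described above: 1 sign bit, 8 biased-exponent bits (bias 127), and 23 fraction bits, with the most significant fraction bit distinguishing QNaN from SNaN.

```python
import struct

def float_to_bits(x):
    """Return the 32-bit single-precision encoding of x as an integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def decode(bits):
    """Split an encoding into (sign, biased exponent, fraction)."""
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def classify(bits):
    """Name the Table 9-2 class that an encoding belongs to."""
    _sign, exponent, fraction = decode(bits)
    if exponent == 0xFF:
        if fraction == 0:
            return "infinity"
        # Top fraction bit set: QNaN. Clear (with non-zero fraction): SNaN.
        return "QNaN" if fraction & 0x400000 else "SNaN"
    if exponent == 0:
        return "zero" if fraction == 0 else "denormal"
    return "normal"

def value_of_finite(bits):
    """Rebuild the numeric value of a zero, denormal, or normal encoding."""
    sign, exponent, fraction = decode(bits)
    if exponent == 0:
        # Integer bit is 0; the exponent is fixed at -126.
        magnitude = (fraction / 2**23) * 2.0**-126
    else:
        # Integer bit is an implied 1; remove the bias of 127.
        magnitude = (1 + fraction / 2**23) * 2.0**(exponent - 127)
    return -magnitude if sign else magnitude
```

For example, classify(0xFFC00000) reports the real indefinite encoding as a QNaN, and decode(float_to_bits(1.0)) yields a biased exponent of 127 with a zero fraction.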
9.1.7. SIMD Floating-Point Control/Status Register
The control/status register is used to enable masked/unmasked numerical exception handling, to set rounding modes, to set flush-to-zero mode, and to view status flags. The contents of this register can be loaded with the LDMXCSR and FXRSTOR instructions and stored in memory with the STMXCSR and FXSAVE instructions. Figure 9-4 shows the format and encoding of the fields in the MXCSR.
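Bits 31-16 | Reserved
Bit 15 | FZ
Bits 14-13 | RC
Bit 12 | PM
Bit 11 | UM
Bit 10 | OM
Bit 9 | ZM
Bit 8 | DM
Bit 7 | IM
Bit 6 | Reserved
Bit 5 | PE
Bit 4 | UE
Bit 3 | OE
Bit 2 | ZE
Bit 1 | DE
Bit 0 | IE
Figure 9-4. SIMD Floating-Point Control/Status Register Format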
Bits 5-0 indicate whether a SIMD floating-point numerical exception has been detected. They are sticky flags, and can be cleared by using the LDMXCSR instruction to write zeroes to these fields. If an LDMXCSR instruction clears a mask bit and sets the corresponding exception flag bit, an exception will not be generated immediately. The exception will occur only when the next Streaming SIMD Extensions instruction causes this type of exception. Streaming SIMD Extensions use only one exception flag for each exception type. There is no provision for individual exception reporting within a packed data type. In situations where multiple identical exceptions occur within the same instruction, the associated exception flag is updated and indicates that at least one of these conditions happened. These flags are cleared upon reset.
Bits 12-7 configure numerical exception masking; an exception type is masked if the corresponding bit is set, and it is unmasked if the bit is clear. These mask bits are set upon reset, meaning that all numerical exceptions are masked.
Bits 14-13 encode the rounding control, which provides for the common round to nearest mode, as well as directed rounding and true chop (refer to Section 9.1.8., Rounding Control Field). The rounding control is set to round to nearest upon reset.
Bit 15 (FZ) is used to turn on the Flush-To-Zero mode (refer to Section 9.1.9., Flush-To-Zero). This bit is cleared upon reset, disabling the Flush-To-Zero mode.
The other bits of MXCSR (bits 31-16 and bit 6) are defined as reserved and cleared; attempting to write a non-zero value to these bits, using either the FXRSTOR or LDMXCSR instructions, will result in a general protection exception.
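The field layout of MXCSR can be summarized as a decoder. The following Python sketch is an illustration under the bit assignments given above; the function and constant names are hypothetical, and the reset value 1F80H is derived from the statement that all mask bits (12-7) are set and all other defined bits are clear upon reset. The ValueError stands in for the general protection exception that hardware would raise.

```python
FLAG_NAMES = ["IE", "DE", "ZE", "OE", "UE", "PE"]   # bits 0-5
MASK_NAMES = ["IM", "DM", "ZM", "OM", "UM", "PM"]   # bits 7-12
ROUNDING = {0b00: "nearest", 0b01: "down", 0b10: "up", 0b11: "toward zero"}
RESERVED_BITS = 0xFFFF0040   # bits 31-16 and bit 6
MXCSR_RESET = 0x1F80         # masks set, flags clear, RC = 00B, FZ = 0

def decode_mxcsr(value):
    """Decode a 32-bit MXCSR image into its named fields."""
    if value & RESERVED_BITS:
        # Models the general protection exception on reserved-bit writes.
        raise ValueError("reserved MXCSR bits must be zero")
    return {
        "flags": [n for i, n in enumerate(FLAG_NAMES) if (value >> i) & 1],
        "masked": [n for i, n in enumerate(MASK_NAMES)
                   if (value >> (7 + i)) & 1],
        "rounding": ROUNDING[(value >> 13) & 0b11],
        "flush_to_zero": bool((value >> 15) & 1),
    }

reset_state = decode_mxcsr(MXCSR_RESET)
```

A value such as MXCSR_RESET | 0x8000 would then decode with flush_to_zero set, while any value with bit 6 or bits 31-16 set is rejected.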
9.1.8. Rounding Control Field
The rounding control (RC) field of MXCSR (bits 13 and 14) controls how the results of floating-point instructions are rounded. Four rounding modes are supported: round to nearest, round up, round down, and round toward zero (refer to Table 9-3). Round to nearest is the default rounding mode and is suitable for most applications. It provides the most accurate and statistically unbiased estimate of the true result.
Rounding Mode | RC Field Setting | Description
Round to nearest (even) | 00B | Rounded result is the closest to the infinitely precise result. If two values are equally close, the result is the even value (that is, the one with the least-significant bit of zero).
Round down (toward -∞) | 01B | Rounded result is close to but no greater than the infinitely precise result.
Round up (toward +∞) | 10B | Rounded result is close to but no less than the infinitely precise result.
Round toward zero (Truncate) | 11B | Rounded result is close to but no greater in absolute value than the infinitely precise result.
The round up and round down modes are termed directed rounding and can be used to implement interval arithmetic. Interval arithmetic is used to determine upper and lower bounds for the true result of a multistep computation, when the intermediate results of the computation are subject to rounding. The round toward zero mode (sometimes called the chop mode) is commonly used when performing integer arithmetic with the processor.
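The four rounding modes can be compared side by side. The Python sketch below is an illustration only: it rounds a real value to an integer, as the conversion instructions do, rather than rounding the last bit of a floating-point result, but the selection rule for each RC setting is the same. The function name round_to_integer is hypothetical.

```python
import math

def round_to_integer(x, rc):
    """Round x to an integer according to a 2-bit RC field setting."""
    if rc == 0b00:
        # Round to nearest; Python's round() resolves ties to the even value.
        return round(x)
    if rc == 0b01:
        return math.floor(x)   # round down, toward minus infinity
    if rc == 0b10:
        return math.ceil(x)    # round up, toward plus infinity
    if rc == 0b11:
        return math.trunc(x)   # round toward zero (chop)
    raise ValueError("RC is a 2-bit field")

# Directed rounding brackets the true value, as interval arithmetic requires.
lower = round_to_integer(-2.5, 0b01)
upper = round_to_integer(-2.5, 0b10)
```

For the value -2.5, round down gives -3 and round up gives -2, so the pair (lower, upper) encloses the true value; round to nearest gives -2 because the tie is resolved to the even integer.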
9.1.9. Flush-To-Zero
Turning on the Flush-To-Zero mode has the following effects during underflow situations:
Zero results are returned with the sign of the true result
Precision and underflow exception flags are set
The IEEE mandated masked response to underflow is to deliver the denormalized result (i.e., gradual underflow); consequently, the Flush-To-Zero mode is not compatible with IEEE Standard 754. It is provided primarily for performance reasons. At the cost of a slight precision loss, faster execution can be achieved for applications where underflows are common. Underflow for Flush-To-Zero is defined to occur when the exponent for a computed result, prior to denormalization scaling, falls in the denormal range; this is regardless of whether a loss of accuracy has occurred. Unmasking the underflow exception takes precedence over Flush-To-Zero mode; this means that an exception handler will be invoked for a Streaming SIMD Extensions instruction that generates an underflow condition while this exception is unmasked, regardless of whether Flush-To-Zero is enabled.
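The difference between gradual underflow and Flush-To-Zero can be modeled for a single multiplication. This Python sketch is a simplification under stated assumptions: the exception is taken to be masked, exception flags are not modeled, the inputs are assumed to be exactly representable single-precision values, and the names mul_ss and to_single are hypothetical. Underflow is detected on the exact product, before any denormalization.

```python
import math
import struct

MIN_NORMAL = 2.0 ** -126   # smallest normalized single-precision magnitude

def to_single(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def mul_ss(a, b, flush_to_zero=False):
    """Model a masked-underflow single-precision multiply."""
    # For single-precision inputs the double-precision product is exact
    # (barring double-precision overflow or underflow).
    exact = a * b
    if flush_to_zero and exact != 0.0 and abs(exact) < MIN_NORMAL:
        # Flush-To-Zero: return zero with the sign of the true result.
        return math.copysign(0.0, exact)
    # IEEE masked response: deliver the (possibly denormalized) result.
    return to_single(exact)

gradual = mul_ss(2.0 ** -100, 2.0 ** -30)
flushed = mul_ss(-(2.0 ** -100), 2.0 ** -30, flush_to_zero=True)
```

With Flush-To-Zero off, the product 2^-130 is delivered as a denormal; with it on, the result is a zero that carries the negative sign of the true result.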
9.2. STREAMING SIMD EXTENSIONS SET
The Streaming SIMD Extensions set consists of 70 instructions, grouped into the following categories:
Data movement instructions
Arithmetic instructions
Comparison instructions
Conversion instructions
Logical instructions
Additional SIMD integer instructions
Shuffle instructions
State management instructions
Cacheability control instructions
9.2.1.
The IA Streaming SIMD Extensions supply a rich set of instructions that operate on either all or the least significant pairs of packed data operands in parallel. The packed instructions operate on a pair of operands as shown in Figure 9-5 while scalar instructions always operate on the least significant pair of the two operands as shown in Figure 9-6; for scalar operations, the three upper components from the first operand are passed through to the destination. In general, the address of a memory operand has to be aligned on a 16-byte boundary for all instructions, except for unaligned loads and stores.
[Figure: each single-precision element X1 through X4 of the first operand is combined with the corresponding element Y1 through Y4 of the second operand, giving X1 op Y1, X2 op Y2, X3 op Y3, and X4 op Y4]
Figure 9-5. Packed Operations
[Figure: only the least significant pair is combined, shown as X4 op Y4; the other three elements of the first operand, X1 through X3, pass through unchanged]
Figure 9-6. Scalar Operations
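The packed and scalar forms shown in Figures 9-5 and 9-6 differ only in how many element pairs are combined. The Python sketch below models both; it is an illustration with hypothetical function names, it represents an operand as a list whose index 0 is the least significant element, and it does not round results to single precision.

```python
import operator

def packed_op(op, x, y):
    """Apply op to all four element pairs (ADDPS, MULPS, and so on)."""
    return [op(a, b) for a, b in zip(x, y)]

def scalar_op(op, x, y):
    """Apply op to the least significant pair only (ADDSS, MULSS, ...).

    The upper three elements are passed through from the first operand.
    """
    return [op(x[0], y[0])] + list(x[1:])

x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
packed_sum = packed_op(operator.add, x, y)
scalar_sum = scalar_op(operator.add, x, y)
```

Here packed_sum holds four sums, while scalar_sum holds one sum in element 0 followed by elements 1 through 3 of x unchanged.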
9.3. OVERVIEW OF THE STREAMING SIMD EXTENSIONS SET
Appendix D, SIMD Floating-Point Exceptions Summary shows the instructions in the Streaming SIMD Extensions set. The following sections give a brief overview of each group of instructions in the Streaming SIMD Extensions set and the instructions within each group.
9.3.1. Data Movement Instructions
The MOVAPS (Move aligned packed, single-precision, floating-point) instruction transfers 128 bits of packed data from memory to SIMD floating-point registers and vice versa, or between SIMD floating-point registers. The memory address must be aligned to a 16-byte boundary; otherwise, a general protection exception will occur.
The MOVUPS (Move unaligned packed, single-precision, floating-point) instruction transfers 128 bits of packed data from memory to SIMD floating-point registers and vice versa, or between SIMD floating-point registers. No assumption is made for alignment.
The MOVHPS (Move unaligned, high packed, single-precision, floating-point) instruction transfers 64 bits of packed data from memory to the upper two fields of a SIMD floating-point register and vice versa. The lower two fields are left unchanged.
The MOVHLPS (Move high to low packed, single-precision, floating-point) instruction transfers the upper 64 bits of the source register into the lower 64 bits of the 128-bit destination register. The upper 64 bits of the destination register are left unchanged.
The MOVLHPS (Move low to high packed, single-precision, floating-point) instruction transfers the lower 64 bits of the source register into the upper 64 bits of the 128-bit destination register. The lower 64 bits of the destination register are left unchanged.
The MOVLPS (Move unaligned, low packed, single-precision, floating-point) instruction transfers 64 bits of packed data from memory to the lower two fields of a SIMD floating-point register and vice versa. The upper two fields are left unchanged.
The MOVMSKPS (Move mask packed, single-precision, floating-point) instruction transfers the most significant bit of each of the four packed, single-precision, floating-point numbers to an IA integer register. This 4-bit value can then be used as a condition to perform branching.
The MOVSS (Move scalar single-precision, floating-point) instruction transfers the least significant 32 bits from memory to a SIMD floating-point register or vice versa, and between registers.
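The register-to-register moves and the mask extraction can be modeled on four-element lists. The following Python sketch is illustrative only; the function names mirror the mnemonics but are hypothetical helpers, index 0 is the least significant element, and is_movaps_aligned simply encodes the 16-byte alignment rule stated for MOVAPS.

```python
import math

def movhlps(dst, src):
    """Upper half of src goes to the lower half of dst; upper half kept."""
    return [src[2], src[3], dst[2], dst[3]]

def movlhps(dst, src):
    """Lower half of src goes to the upper half of dst; lower half kept."""
    return [dst[0], dst[1], src[0], src[1]]

def movmskps(src):
    """Collect the sign bit of each element into a 4-bit integer mask."""
    mask = 0
    for i, value in enumerate(src):
        # copysign sees the sign bit even for -0.0.
        if math.copysign(1.0, value) < 0:
            mask |= 1 << i
    return mask

def is_movaps_aligned(address):
    """True if MOVAPS could access this address without faulting."""
    return address % 16 == 0

sign_mask = movmskps([-1.0, 2.0, -0.0, 4.0])
```

In this example sign_mask has bits 0 and 2 set, because elements 0 and 2 carry a set sign bit; a program could branch on such a value after a packed comparison.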
9.3.2. Arithmetic Instructions
9.3.2.1. PACKED/SCALAR ADDITION AND SUBTRACTION
The ADDPS (Add packed, single-precision, floating-point) and SUBPS (Subtract packed, single-precision, floating-point) instructions add or subtract four pairs of packed, single-precision, floating-point operands.
The ADDSS (Add scalar single-precision, floating-point) and SUBSS (Subtract scalar single-precision, floating-point) instructions add or subtract the least significant pair of packed, single-precision, floating-point operands; the upper three fields are passed through from the source operand.
9.3.2.2. PACKED/SCALAR MULTIPLICATION AND DIVISION
The MULPS (Multiply packed, single-precision, floating-point) instruction multiplies four pairs of packed, single-precision, floating-point operands.
The MULSS (Multiply scalar single-precision, floating-point) instruction multiplies the least significant pair of packed, single-precision, floating-point operands; the upper three fields are passed through from the source operand.
The DIVPS (Divide packed, single-precision, floating-point) instruction divides four pairs of packed, single-precision, floating-point operands.
The DIVSS (Divide scalar single-precision, floating-point) instruction divides the least significant pair of packed, single-precision, floating-point operands; the upper three fields are passed through from the source operand.
9.3.2.3. PACKED/SCALAR SQUARE ROOT
The SQRTPS (Square root packed, single-precision, floating-point) instruction returns the square root of the packed four single-precision, floating-point numbers from the source to a destination register.
The SQRTSS (Square root scalar single-precision, floating-point) instruction returns the square root of the least significant component of the packed, single-precision, floating-point numbers from source to a destination register; the upper three fields are passed through from the source operand.
9.3.2.4. PACKED MAXIMUM/MINIMUM
The MAXPS (Maximum packed, single-precision, floating-point) instruction returns the maximum of each pair of packed, single-precision, floating-point numbers into the destination register. (destreg = {MAX xmm1[1], xmm2[1]; MAX xmm1[2], xmm2[2]; MAX xmm1[3], xmm2[3]; MAX xmm1[4], xmm2[4]})
The MAXSS (Maximum scalar single-precision, floating-point) instruction returns the maximum of the least significant pair of packed, single-precision, floating-point numbers into the destination register; the upper three fields are passed through from the source operand to the destination register.
The MINPS (Minimum packed, single-precision, floating-point) instruction returns the minimum of each pair of packed, single-precision, floating-point numbers into the destination register. (destreg = {MIN xmm1[1], xmm2[1]; MIN xmm1[2], xmm2[2]; MIN xmm1[3], xmm2[3]; MIN xmm1[4], xmm2[4]})
The MINSS (Minimum scalar single-precision, floating-point) instruction returns the minimum of the least significant pair of packed, single-precision, floating-point numbers into the destination register; the upper three fields are passed through from the source operand to the destination register.
9.3.3. Comparison Instructions
The CMPPS (Compare packed, single-precision, floating-point) instruction compares four pairs of packed, single-precision, floating-point numbers using the immediate operand as a predicate, returning per SP field an all "1" 32-bit mask or an all "0" 32-bit mask as a result. The instruction supports a full set of 12 conditions: equal, less than, less than or equal, greater than, greater than or equal, unordered, not equal, not less than, not less than or equal, not greater than, not greater than or equal, ordered.
The CMPSS (Compare scalar single-precision, floating-point) instruction compares the least significant pair of packed, single-precision, floating-point numbers using the immediate operand as a predicate (same as CMPPS), returning an all "1" 32-bit mask or an all "0" 32-bit mask as a result.
The COMISS (Compare scalar single-precision, floating-point ordered and set EFLAGS) instruction compares the least significant pair of packed, single-precision, floating-point numbers, and sets the ZF, PF, and CF bits in the EFLAGS register (the OF, SF, and AF bits are cleared).
The UCOMISS (Unordered compare scalar single-precision, floating-point and set EFLAGS) instruction compares the least significant pair of packed, single-precision, floating-point numbers, and sets the ZF, PF, and CF bits in the EFLAGS register as described above (the OF, SF, and AF bits are cleared).
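The flag results of the scalar compare instructions can be tabulated with a small model. The Python sketch below is illustrative; the function name comiss_flags is hypothetical, and the specific ZF/PF/CF encodings are taken as an assumption from the instruction set reference rather than from this overview, which states only that these three flags are set and that OF, SF, and AF are cleared.

```python
import math

def comiss_flags(a, b):
    """Return the EFLAGS bits produced by comparing a with b.

    Assumed encodings: unordered sets ZF, PF, and CF; greater than
    clears all three; less than sets CF; equal sets ZF.
    """
    flags = {"ZF": 0, "PF": 0, "CF": 0, "OF": 0, "SF": 0, "AF": 0}
    if math.isnan(a) or math.isnan(b):
        flags.update(ZF=1, PF=1, CF=1)   # unordered
    elif a < b:
        flags.update(CF=1)
    elif a == b:
        flags.update(ZF=1)
    # a > b leaves ZF, PF, and CF all clear.
    return flags

less = comiss_flags(1.0, 2.0)
unordered = comiss_flags(float("nan"), 2.0)
```

Because the result lands in CF and ZF, the unsigned conditional branches (below, above, equal) can follow the compare directly, and PF identifies the unordered case.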
9.3.4. Conversion Instructions
These instructions support packed and scalar conversions between 128-bit SIMD floating-point registers and either 64-bit integer MMX registers or 32-bit integer IA32 registers. The packed versions behave identically to original MMX instructions, in the presence of x87-FP instructions, including:
Transition from x87-FP to MMX technology (TOS=0, FP valid bits set to all valid).
MMX instructions write ones (1s) to the exponent part of the corresponding x87-FP register.
Use of EMMS for transition from MMX technology to x87-FP.
The CVTPI2PS (Convert packed 32-bit integer to packed, single-precision, floating-point) instruction converts two 32-bit signed integers in an MMX register to the two least significant single-precision, floating-point numbers. When the conversion is inexact, the rounded value according to the rounding mode in MXCSR is returned. The upper two numbers in the destination register are retained.
The CVTSI2SS (Convert scalar 32-bit integer to scalar single-precision, floating-point) instruction converts a 32-bit signed integer in a 32-bit integer register to the least significant single-precision, floating-point number. When the conversion is inexact, the rounded value according to the rounding mode in MXCSR is returned. The upper three numbers in the destination register are retained.
The CVTPS2PI (Convert packed, single-precision, floating-point to packed 32-bit integer) instruction converts the two least significant single-precision, floating-point numbers to two 32-bit signed integers in an MMX register. When the conversion is inexact, the rounded value according to the rounding mode in MXCSR is returned.
The CVTTPS2PI (Convert truncate packed, single-precision, floating-point to packed 32-bit integer) instruction is similar to CVTPS2PI, except that if the conversion is inexact, the truncated result is returned.
The CVTSS2SI (Convert scalar single-precision, floating-point to a 32-bit integer) instruction converts the least significant single-precision, floating-point number to a 32-bit signed integer in an IA 32-bit integer register. When the conversion is inexact, the rounded value according to the rounding mode in MXCSR is returned.
The CVTTSS2SI (Convert truncate scalar single-precision, floating-point to scalar 32-bit integer) instruction is similar to CVTSS2SI, except that if the conversion is inexact, the truncated result is returned.
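An integer-to-single conversion is inexact whenever the integer needs more than 24 significant bits. The Python sketch below illustrates this for the default round-to-nearest mode only; the helper names are hypothetical, the other MXCSR rounding modes are not modeled, and operands are lists with index 0 as the least significant element.

```python
import struct

def int_to_single(n):
    """Convert a 32-bit signed integer to single precision, round to nearest.

    Packing with the "f" format performs the rounding; ties go to even.
    """
    return struct.unpack("<f", struct.pack("<f", float(n)))[0]

def cvtpi2ps(dst, mm):
    """Model of CVTPI2PS: convert two integers, keep the upper two fields."""
    return [int_to_single(mm[0]), int_to_single(mm[1]), dst[2], dst[3]]

exact = int_to_single(16777216)      # 2^24 fits in 24 significant bits
inexact = int_to_single(16777217)    # 2^24 + 1 does not
converted = cvtpi2ps([9.0, 9.0, 3.0, 4.0], [1, -2])
```

The value 16777217 lies exactly halfway between two representable numbers, so round to nearest returns the even neighbor, 16777216.0, and the precision exception would be signaled on hardware.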
9.3.5. Logical Instructions
The ANDPS (Bit-wise packed logical AND for single-precision, floating-point) instruction returns a bitwise AND between the two operands.
The ANDNPS (Bit-wise packed logical AND NOT for single-precision, floating-point) instruction returns a bitwise AND NOT between the two operands.
The ORPS (Bit-wise packed logical OR for single-precision, floating-point) instruction returns a bitwise OR between the two operands.
The XORPS (Bit-wise packed logical XOR for single-precision, floating-point) instruction returns a bitwise XOR between the two operands.
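A common use of the logical instructions is to combine them with a comparison mask to select elements without branching. The Python sketch below is an illustration with hypothetical helper names; it works on 32-bit integer encodings of each element and assumes that ANDNPS inverts its first (destination) operand before the AND, a detail taken from the instruction set reference rather than from this overview.

```python
import struct

ALL_ONES = 0xFFFFFFFF

def bits(x):
    """Single-precision encoding of x as a 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def single(b):
    """Single-precision value encoded by the 32-bit integer b."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

def cmpltps(x, y):
    """Per-element mask: all ones where x < y, all zeros elsewhere."""
    return [ALL_ONES if a < b else 0 for a, b in zip(x, y)]

def andps(x, y):
    return [a & b for a, b in zip(x, y)]

def andnps(x, y):
    # Assumed operand order: (NOT first operand) AND second operand.
    return [(~a & ALL_ONES) & b for a, b in zip(x, y)]

def orps(x, y):
    return [a | b for a, b in zip(x, y)]

def select_min(x, y):
    """Branch-free per-element minimum built from compare and logic."""
    mask = cmpltps(x, y)
    xb = [bits(v) for v in x]
    yb = [bits(v) for v in y]
    chosen = orps(andps(mask, xb), andnps(mask, yb))
    return [single(b) for b in chosen]

result = select_min([1.0, 5.0, -3.0, 8.0], [2.0, 4.0, -3.5, 8.0])
```

Each result element comes from x where the mask is all ones and from y where it is all zeros, which is the same pattern a program would use to implement a conditional per element.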
9.3.6. Additional SIMD Integer Instructions
Similar to the conversion instructions discussed in Section 9.3.4., Conversion Instructions, these SIMD Integer instructions also behave identically to original MMX instructions, in the presence of x87-FP instructions.
The PAVGB/PAVGW (Average unsigned source sub-operands, without incurring a loss in precision) instructions add the unsigned data elements of the source operand to the unsigned data elements of the destination register. The results of the add are then each independently shifted right by one bit position. The high order bits of each element are filled with the carry bits of the sums. To prevent cumulative round-off errors, an averaging is performed. The low order bit of each final shifted result is set to 1 if at least one of the two least significant bits of the intermediate unshifted sum is 1.
The PEXTRW (Extract 16-bit word from MMX register) instruction moves the word in an MMX register selected by the two least significant bits of the immediate operand to the lower half of a 32-bit integer register; the upper word in the integer register is cleared.
The PINSRW (Insert 16-bit word into MMX register) instruction moves the lower word in a 32-bit integer register or 16-bit word from memory into one of the four word locations in an MMX register, selected by the two least significant bits of the immediate operand.
The PMAXUB/PMAXSW (Maximum of packed unsigned integer bytes or signed integer words) instructions return the maximum of each pair of packed elements into the destination register.
The PMINUB/PMINSW (Minimum of packed unsigned integer bytes or signed integer words) instructions return the minimum of each pair of packed data elements into the destination register.
The PMOVMSKB (Move Byte Mask from MMX register) instruction returns an 8-bit mask formed of the most significant bits of each byte of its source operand in an MMX register to an IA integer register.
The PMULHUW (Unsigned high packed integer word multiply in MMX register) instruction performs an unsigned multiply on each word field of the two source MMX registers, returning the high word of each result to an MMX register.
The PSADBW (Sum of absolute differences) instruction computes the absolute difference for each pair of sub-operand byte sources, and then accumulates the eight differences into a single 16-bit result.
The PSHUFW (Shuffle packed integer word in MMX register) instruction performs a full shuffle of any source word field to any result word field, using an 8-bit immediate operand.
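Several of the SIMD integer instructions reduce to one-line element operations. The Python sketch below models four of them; it is illustrative only, the function names are hypothetical helpers named after the mnemonics, and an MMX register is represented as a list of 8 bytes or 4 words with index 0 as the least significant element. The assignment of immediate bits to result fields in pshufw (two bits per result word, starting from the low bits) is an assumption taken from the instruction set reference.

```python
def psadbw(a, b):
    """Sum of absolute differences of eight unsigned byte pairs."""
    return sum(abs(x - y) for x, y in zip(a, b))

def pmovmskb(src):
    """8-bit mask made from the most significant bit of each byte."""
    return sum(((byte >> 7) & 1) << i for i, byte in enumerate(src))

def pshufw(src, imm8):
    """Result word i is the source word selected by imm8 bits 2i+1..2i."""
    return [src[(imm8 >> (2 * i)) & 3] for i in range(4)]

def pextrw(src, imm8):
    """Word selected by the two low immediate bits, zero-extended."""
    return src[imm8 & 3] & 0xFFFF

sad = psadbw([10, 0, 255, 4, 5, 6, 7, 8], [0, 10, 0, 4, 5, 6, 7, 9])
byte_mask = pmovmskb([0x80, 0x00, 0xFF, 0x7F, 0, 0, 0, 0x81])
reversed_words = pshufw([10, 20, 30, 40], 0x1B)
```

The immediate 1BH (00 01 10 11 in binary) selects source words 3, 2, 1, and 0 for result words 0 through 3, which reverses the word order.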
9.3.7. Shuffle Instructions
The SHUFPS (Shuffle packed, single-precision, floating-point) instruction is able to shuffle any of the packed four single-precision, floating-point numbers from one source operand to the lower two destination fields; the upper two destination fields are generated from a shuffle of any of the four SP FP numbers from the second source operand (Figure 9-7). By using the same register for both sources, SHUFPS can return any combination of the four SP FP numbers from this register.
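[Figure: first operand X4 X3 X2 X1 and second operand Y4 Y3 Y2 Y1; each of the two upper result fields is any one of {Y4 ... Y1}, and each of the two lower result fields is any one of {X4 ... X1}]
Figure 9-7. Packed Shuffle Operation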
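The selection performed by SHUFPS can be written out directly. The Python sketch below is an illustration with a hypothetical function name; operands are lists with index 0 as the least significant element, and the mapping of immediate bit pairs to result fields (bits 1-0 for field 0 up to bits 7-6 for field 3) is an assumption taken from the instruction set reference.

```python
def shufps(dst, src, imm8):
    """Model of SHUFPS on four-element lists.

    The two lower result fields are chosen from dst (first operand),
    the two upper result fields from src (second operand).
    """
    return [
        dst[imm8 & 3],
        dst[(imm8 >> 2) & 3],
        src[(imm8 >> 4) & 3],
        src[(imm8 >> 6) & 3],
    ]

x = [1.0, 2.0, 3.0, 4.0]
y = [5.0, 6.0, 7.0, 8.0]

# Same register for both sources: any permutation or broadcast of x.
broadcast = shufps(x, x, 0x00)
reverse = shufps(x, x, 0x1B)
# Different sources: lower half from x, upper half from y.
mixed = shufps(x, y, 0xE4)
```

With the same register as both sources, the immediate 00H broadcasts element 0 to all four fields and 1BH reverses the element order.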
The UNPCKHPS (Unpack high packed, single-precision, floating-point) instruction performs an interleaved unpack of the high-order data elements of the first and second packed, single-precision, floating-point operands. It ignores the lower half of the sources (Figure 9-8). When unpacking from a memory operand, the full 128-bit operand is accessed from memory, but only the high order 64 bits are utilized by the instruction.
[Figure: first operand X4 X3 X2 X1 and second operand Y4 Y3 Y2 Y1; the result, most significant field first, is Y4 X4 Y3 X3]
Figure 9-8. Unpack High Operation
The UNPCKLPS (Unpack low packed, single-precision, floating-point) instruction performs an interleaved unpack of the low-order data elements of the first and second packed, single-precision, floating-point operands. It ignores the upper half of the sources (Figure 9-9). When unpacking from a memory operand, the full 128-bit operand is accessed from memory, but only the low order 64 bits are utilized by the instruction.
[Figure: first operand X4 X3 X2 X1 and second operand Y4 Y3 Y2 Y1; the result, most significant field first, is Y2 X2 Y1 X1]
Figure 9-9. Unpack Low Operation
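The two unpack operations in Figures 9-8 and 9-9 interleave one half of each operand. The Python sketch below models them with hypothetical function names; operands are lists with index 0 as the least significant element, so the list [X1, X2, X3, X4] corresponds to the figure labels X4 X3 X2 X1 read from right to left.

```python
def unpcklps(dst, src):
    """Interleave the two low elements of dst and src."""
    return [dst[0], src[0], dst[1], src[1]]

def unpckhps(dst, src):
    """Interleave the two high elements of dst and src."""
    return [dst[2], src[2], dst[3], src[3]]

x = ["X1", "X2", "X3", "X4"]
y = ["Y1", "Y2", "Y3", "Y4"]
low = unpcklps(x, y)
high = unpckhps(x, y)
```

Read from the most significant field down, high is Y4 X4 Y3 X3 and low is Y2 X2 Y1 X1, matching the two figures. Interleaving of this kind is typically used when rearranging data between array-of-structures and structure-of-arrays layouts.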
9.3.8. State Management Instructions
The LDMXCSR (Load SIMD Floating-Point Control and Status Register) instruction loads the SIMD floating-point control and status register from memory. The STMXCSR (Store SIMD Floating-Point Control and Status Register) instruction stores the Streaming SIMD Extensions control and status word to memory.
The FXSAVE instruction saves FP and MMX state and SIMD floating-point state to memory. Unlike FSAVE, FXSAVE does not clear the x87-FP state. FXRSTOR loads FP and MMX state and SIMD floating-point state from memory.
9.3.9. Cacheability Control Instructions
Data referenced by a programmer can have temporal (data will be used again) or spatial (data will be in adjacent locations, e.g. same cache line) locality. Some multimedia data types, such as the display list in a 3D graphics application, are referenced once and not reused in the immediate future. This kind of data is referred to as non-temporal data. The programmer does not want the application's cached code and data to be overwritten by this non-temporal data. The cacheability control instructions enable the programmer to control caching so that non-temporal accesses will minimize cache pollution.
In addition, the execution engine needs to be fed such that it does not become stalled waiting for data. Streaming SIMD Extensions allow the programmer to prefetch data long before its final use. These instructions are not architectural since they do not update any architectural state and are specific to each implementation. The programmer may have to tune the application for each implementation to take advantage of these instructions. These instructions merely provide a hint to the hardware, and they will not generate exceptions or faults. Excessive use of prefetch instructions may degrade processor performance due to resource allocation.
The following three instructions provide programmatic control for minimizing cache pollution when writing data to memory from either the MMX registers or the SIMD floating-point registers.
The MASKMOVQ (Non-temporal byte mask store of packed integer in an MMX register) instruction stores data from an MMX register to the location specified by the (DS) EDI register. The most significant bit in each byte of the second MMX mask register is used to selectively write the data of the first register on a per-byte basis. The instruction is implicitly weakly-ordered, with all of the characteristics of the WC memory type; successive non-temporal stores may not write memory in program-order, do not write-allocate (i.e., the processor will not fetch the corresponding cache line into the cache hierarchy, prior to performing the store), write combine/collapse, and minimize cache pollution.
The MOVNTQ (Non-temporal store of packed integer in an MMX register) instruction stores data from an MMX register to memory. The instruction is implicitly weakly-ordered, does not write-allocate, and minimizes cache pollution.
The MOVNTPS (Non-temporal store of packed, single-precision, floating-point) instruction stores data from a SIMD floating-point register to memory. The memory address must be aligned to a 16-byte boundary; if it is not aligned, a general protection exception will occur. The instruction is implicitly weakly-ordered, does not write-allocate, and minimizes cache pollution.
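The per-byte selection performed by MASKMOVQ can be modeled as follows. This Python sketch covers only which bytes are written; it does not model weak ordering, write combining, or cache behavior. The function name and the use of a bytearray as memory are hypothetical, and edi stands for the destination offset held in the EDI register.

```python
def maskmovq(memory, edi, src, mask):
    """Store the bytes of src whose mask byte has its top bit set.

    memory: bytearray standing in for the data segment
    edi:    offset of the first destination byte
    src:    8 data bytes (first MMX register)
    mask:   8 mask bytes (second MMX register)
    """
    for i in range(8):
        if mask[i] & 0x80:
            memory[edi + i] = src[i]

memory = bytearray(16)
maskmovq(memory, 4,
         bytes([1, 2, 3, 4, 5, 6, 7, 8]),
         bytes([0x80, 0x00, 0xFF, 0x7F, 0x00, 0x00, 0x00, 0x80]))
```

After the call, only offsets 4, 6, and 11 have been written; the bytes whose mask has a clear most significant bit keep their previous contents.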
The non-temporal store instructions (MOVNTPS, MOVNTQ, and MASKMOVQ) minimize cache pollution while writing data. The main difference between a non-temporal store and a regular cacheable store is in the write-allocation policy. The memory type of the region being written to can override the non-temporal hint, leading to the following considerations. If the programmer specifies a non-temporal store to:
Uncacheable memory, the store behaves like an uncacheable store; the non-temporal hint is ignored, and the memory type for the region is retained. Uncacheable as referred to here means that the region being written to has been mapped with either a UC or WP memory type. If the memory region has been mapped as WB, WT, or WC, the non-temporal store will implement weakly-ordered (WC) semantic behavior.
Cacheable memory, two cases may result. If the data is:
  Present in the cache hierarchy, the hint is ignored and the cache line is updated normally. A given processor may choose different ways to implement this; some examples include: updating data in-place in the cache hierarchy while preserving the memory type semantics assigned to that region, or evicting the data from the caches and writing the new non-temporal data to memory (with WC semantics).
  Not present in the cache hierarchy, and the destination region is mapped as WB, WT, or WC, the transaction will be weakly-ordered, and is subject to all WC memory semantics; consequently, the programmer is responsible for maintaining coherency. The non-temporal store will not write allocate (i.e., the processor will not fetch the corresponding cache line into the cache hierarchy, prior to performing the store). Different implementations may choose to collapse and combine these stores prior to issuing them to memory.
In general, WC semantics require software to ensure coherency, with respect to other processors and other system agents (such as graphics cards). Appropriate use of synchronization and a fencing operation (refer to SFENCE, below) must be performed for producer-consumer usage models. Fencing ensures that all system agents have global visibility of the stored data. For instance, failure to fence may result in a written cache line staying within a processor, and the line would not be visible to other agents.
For processors that implement non-temporal stores by updating data in-place that already resides in the cache hierarchy, the destination region should also be mapped as WC. Otherwise, if mapped as WB or WT, there is the potential for speculative processor reads to bring the data into the caches. In this case, non-temporal stores would then update in place, and data would not be flushed from the processor by a subsequent fencing operation.
The memory type visible on the bus in the presence of memory type aliasing is implementation-specific. As one possible example, the memory type written to the bus may reflect the memory type for the first store to this line, as seen in program order; other alternatives are possible. This behavior should be considered reserved, and dependence on the behavior of any particular implementation risks future incompatibility.
The PREFETCH (Load 32 or greater number of bytes) instructions load either non-temporal data or temporal data in the specified cache level. This access and the cache level are specified as a hint. The prefetch instructions do not affect functional behavior of the program and will be implementation-specific.
For more information on prefetch hints, refer to Section 9.5.3.1., Cacheability Hint Instructions. For even more detailed information, refer to Chapter 6, Optimizing Cache Utilization for Pentium III Processors, in the Intel Architecture Optimization Reference Manual (Order Number 245127-001).
The SFENCE (Store Fence) instruction guarantees that every store instruction that precedes the store fence instruction in program order is globally visible before any store instruction that follows the fence. The SFENCE instruction provides an efficient way of ensuring ordering between routines that produce weakly-ordered results and routines that consume this data.
The use of weakly-ordered memory types can be important under certain data sharing relationships, such as a producer-consumer relationship. The use of weakly-ordered memory can make the assembling of data more efficient, but care must be taken to ensure that the consumer obtains the data that the producer intended it to see.
9.4. COMPATIBILITY WITH FPU ARCHITECTURE
The Streaming SIMD Extensions introduce a new state in the architecture. Unlike the MMX registers, this state is not aliased onto the floating-point registers. New instructions must be used to save/restore the state of a Pentium III processor. The interface for context switching is discussed in detail in Section 11.5., Saving and Restoring the Streaming SIMD Extensions state and Section 11.6., Designing Operating System Task and Context Switching Facilities in Chapter 11, Streaming SIMD Extensions System Programming, of the Intel Architecture Software Developer's Manual, Volume 3.
9.4.1. Effect of Instruction Prefixes on Streaming SIMD Extensions
The Streaming SIMD Extensions use prefixes as specified in Table 9-4, Table 9-5, and Table 9-6. The effect of multiple prefixes (more than one prefix from a group) is unpredictable and may vary from processor to processor. Applying a prefix, in a manner not defined in this document, is considered reserved behavior. For example, Table 9-4 shows general behavior for most Streaming SIMD Extensions; however, the application of a prefix (Repeat, Repeat NE, Operand Size) is reserved for the following instructions: ANDPS, ANDNPS, COMISS, FXRSTOR, FXSAVE, ORPS, LDMXCSR, MOVAPS, MOVHPS, MOVLPS, MOVMSKPS, MOVNTPS, MOVUPS, SHUFPS, STMXCSR, UCOMISS, UNPCKHPS, UNPCKLPS, XORPS.
Table 9-4. Streaming SIMD Extensions Behavior with Prefixes

Prefix Type | Effect on Streaming SIMD Extensions
Address Size Prefix (67H) | Affects Streaming SIMD Extensions with memory operand. Ignored by Streaming SIMD Extensions without memory operand.
Operand Size (66H) | Reserved and may result in unpredictable behavior.
Segment Override (2EH, 36H, 3EH, 26H, 64H, 65H) | Affects Streaming SIMD Extensions with memory operand. Ignored by Streaming SIMD Extensions without memory operand.
Repeat Prefix (F3H) | Affects Streaming SIMD Extensions.
Repeat NE Prefix (F2H) | Reserved and may result in unpredictable behavior.
Lock Prefix (0F0H) | Generates invalid opcode exception.
Table 9-5. SIMD Integer Instructions Behavior with Prefixes

Prefix Type | Effect on MMX Instructions
Address Size Prefix (67H) | Affects MMX instructions with memory operand. Ignored by MMX instructions without memory operand.
Operand Size (66H) | Reserved and may result in unpredictable behavior.
Segment Override (2EH, 36H, 3EH, 26H, 64H, 65H) | Affects MMX instructions with memory operand. Ignored by MMX instructions without memory operand.
Repeat Prefix (F3H) | Reserved and may result in unpredictable behavior.
Repeat NE Prefix (F2H) | Reserved and may result in unpredictable behavior.
Lock Prefix (0F0H) | Generates invalid opcode exception.
Table 9-6. Cacheability Control Instruction Behavior with Prefixes

Prefix Type | Effect on Streaming SIMD Extensions
Address Size Prefix (67H) | Affects cacheability control instructions with memory operand. Ignored by cacheability control instructions without memory operand.
Operand Size (66H) | Reserved and may result in unpredictable behavior.
Segment Override (2EH, 36H, 3EH, 26H, 64H, 65H) | Affects cacheability control instructions with memory operand. Ignored by cacheability control instructions without memory operand.
Repeat Prefix (F3H) | Reserved and may result in unpredictable behavior.
Repeat NE Prefix (F2H) | Reserved and may result in unpredictable behavior.
Lock Prefix (0F0H) | Generates an invalid opcode exception for all cacheability instructions.
9.5. WRITING APPLICATIONS WITH STREAMING SIMD EXTENSIONS CODE
The following sections give guidelines for writing application code using the Streaming SIMD Extensions.
9.5.1. Detecting Support for Streaming SIMD Extensions Using the CPUID Instruction
Use the CPUID instruction to determine whether the processor supports the Streaming SIMD Extensions (refer to Section 3.2., Instruction Reference in Chapter 3, Instruction Set Reference of the Intel Architecture Software Developer's Manual, Volume 2, for a detailed description of the CPUID instruction). When the processor supports the Streaming SIMD Extensions, the CPUID instruction reports this by setting bit 25 (Streaming SIMD Extensions bit) in the feature flags to 1. This only determines that the Streaming SIMD Extensions are present. There are other support considerations related to Streaming SIMD Extensions. The Streaming SIMD Extensions can be divided into four categories:
• Single-precision, packed/scalar floating-point
• Additional SIMD Integer Instructions
• State management (i.e., FXSAVE/FXRSTOR)
• Cacheability control, further subdivided as:
  • streaming stores for both packed FP (MOVNTPS) and integer MMX (MASKMOVQ and MOVNTQ) instructions.
  • PREFETCH and SFENCE, which are not constrained to work with any specific data type.
In order for an application to use SIMD floating-point extensions, the following conditions must exist, otherwise an invalid opcode exception (Int 6) is generated:
• CR0.EM (bit 2) = 0 (emulation disabled)
• CR4.OSFXSR (bit 9) = 1 (OS supports saving SIMD floating-point state during context switches)
• CPUID.XMM (EDX bit 25) = 1 (processor supports Streaming SIMD Extensions)
To verify support for the additional SIMD Integer instructions, including the corresponding cacheability control instructions, the application needs only to check that CPUID.XMM is set to 1. The SIMD integer instructions behave otherwise identically to the original MMX instructions; this implies that they will generate an invalid opcode exception if CR0.EM is set, but will not generate an exception if CR4.OSFXSR is disabled/cleared. To verify support for the PREFETCH and SFENCE instructions, the application needs only to check that CPUID.XMM is set to 1; these instructions are not affected by CR0.EM or CR4.OSFXSR.
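The three sets of conditions above can be summarized as predicates. The following is a minimal sketch in C, not a detection routine: the function names are hypothetical, and the CR0, CR4, and CPUID EDX values are assumed to be supplied by the caller (application code cannot read CR0 or CR4 directly, so how those values are obtained is outside the scope of this sketch).

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions taken from the conditions listed in the text. */
#define CR0_EM        (1u << 2)   /* CR0.EM: emulation enabled when set      */
#define CR4_OSFXSR    (1u << 9)   /* CR4.OSFXSR: OS saves SIMD FP state      */
#define CPUID_EDX_XMM (1u << 25)  /* CPUID feature flag: extensions present  */

/* SIMD floating-point instructions need all three conditions;
 * otherwise an invalid opcode exception (Int 6) is generated. */
bool simd_fp_usable(uint32_t cr0, uint32_t cr4, uint32_t cpuid_edx)
{
    return (cr0 & CR0_EM) == 0 &&
           (cr4 & CR4_OSFXSR) != 0 &&
           (cpuid_edx & CPUID_EDX_XMM) != 0;
}

/* The additional SIMD integer instructions behave like MMX instructions:
 * CR0.EM must be clear, but CR4.OSFXSR is not consulted. */
bool simd_int_usable(uint32_t cr0, uint32_t cpuid_edx)
{
    return (cr0 & CR0_EM) == 0 && (cpuid_edx & CPUID_EDX_XMM) != 0;
}

/* PREFETCH and SFENCE depend only on the CPUID feature bit. */
bool prefetch_sfence_usable(uint32_t cpuid_edx)
{
    return (cpuid_edx & CPUID_EDX_XMM) != 0;
}
```

With CR4.OSFXSR clear and the other conditions met, `simd_fp_usable` returns false while `simd_int_usable` and `prefetch_sfence_usable` still return true, which matches the distinction drawn in the text.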
For full details on how to determine what support is present for the Streaming SIMD Extensions, please refer to the Intel Processor Identification and the CPUID Instruction Application Note (AP-485), order number 241618-008.
9.5.2. Interfacing with Streaming SIMD Extensions Procedures and Functions
The Streaming SIMD Extensions allow direct access to all SIMD floating-point registers. All existing interface conventions that apply to the use of other general registers (for example: EAX, EBX) will also apply to SIMD floating-point register usage. An efficient interface to the Streaming SIMD Extensions routines might pass parameters and return values through the SIMD floating-point registers or through a combination of memory locations (via the stack) and SIMD floating-point registers. The three common IA-32 calling conventions (cdecl, stdcall, and fastcall) have been extended to support the new register set for Streaming SIMD Extensions in the following ways:
• The first three __m128 parameters are passed in registers xmm0, xmm1, and xmm2 (args in registers). Additional __m128 parameters are passed on the stack as usual.
• __m128 return values are passed in xmm0.
• Registers xmm0 through xmm7 are caller-save.
The caller must reserve the space in the argument block where the first three __m128 parameters would normally appear. These locations are generally left empty by the caller, but can be used by the callee as homes for the xmm0, xmm1, and xmm2 registers if necessary. New versions of the stdarg.h and varargs.h headers are provided with the Intel C/C++ compiler. These new implementations support variable argument lists containing __m128 data (i.e., where padding may have been inserted as required for aligned parameters as described above). The new convention requires that functions with variable argument lists be prototyped before calls are made to them, and that, for this case only, the caller must fill the locations on the stack for data in registers xmm0, xmm1, and xmm2. Callers to non-prototyped functions with variable argument lists with __m128 data must pass parameters both on the stack and in registers.
9.5.3. Writing Code with MMX, Floating-Point, and Streaming SIMD Extensions
The SIMD floating-point registers are separate from the FP / MMX registers. An application can use Streaming SIMD Extensions and MMX instructions or Streaming SIMD Extensions and x87-FP instructions simultaneously, without any penalty. An application can use x87-FP for operations that need double or extended precision arithmetic, or for accessing any of the x87-FP trigonometric instructions.
The restrictions on the simultaneous use of x87-FP and MMX instructions continue to exist, because they share the same architectural registers. The user still needs to perform an EMMS instruction when switching from MMX code to x87-FP code. However, the EMMS instruction is not necessary when integrating a Streaming SIMD Extensions module with existing MMX technology modules or existing x87-FP modules. Streaming SIMD Extensions also do not affect the floating-point tag word (FTW), floating-point control word (FCW), floating-point status word (FSW) or floating-point exception state (FIP, FOP, FCS, FDS and FDP).
The SIMD integer instructions that are included in Streaming SIMD Extensions behave identically to original MMX instructions, in the presence of x87-FP instructions; this includes:
• Transition from x87-FP to MMX technology (TOS=0, FP valid bits set to all valid).
• MMX instructions write ones (1s) to the exponent part of the corresponding x87-FP register.
• Use of EMMS for transition from MMX technology to x87-FP.
The Streaming SIMD Extensions that follow this behavior are: CVTPI2PS, CVTPS2PI, CVTTPS2PI, MASKMOVQ, MOVNTQ, PEXTRW, PINSRW, PMOVMSKB, PMULHUW, PSHUFW.
9.5.3.1. CACHEABILITY HINT INSTRUCTIONS
The Pentium III processor's cacheability control instructions enable the programmer to control caching and prefetching of data. When correctly used, these instructions can significantly improve application performance.
The PREFETCH instruction can minimize the latency of data access in performance-critical sections of application code by allowing data to be fetched in advance of actual usage. The instruction fetches 32 aligned bytes (or more, depending on the implementation) containing the addressed byte, to a location in the processor cache hierarchy as specified by the temporal locality hint (Table 9-7). In this table, cache level 0 is closest to the processor and cache level 2 is farthest from the processor. The hints specify fetch of either temporal or non-temporal data. Subsequent accesses to temporal data are treated like normal accesses, while those to non-temporal data will continue to minimize cache pollution. If the data is already present in a level of the cache hierarchy that is closer to the processor, the PREFETCH instruction will not result in any data movement.
Table 9-7. Cache Hints

HINTS | ACTIONS
T0 | Temporal data - fetch data into all levels of cache hierarchy (L1 or L2 on Pentium III)
T1 | Temporal data - fetch data into level 2 cache and higher (L2 on Pentium III)
T2 | Temporal data - fetch data into level 2 cache and higher (L2 on Pentium III)
NTA | Non-temporal data - fetch data into location close to the processor, minimizing cache pollution (for level 1 cache) (L1 on Pentium III)
The PREFETCH instruction does not change the user-visible semantics of a program, although it may affect the performance of a program. The operation of this instruction is implementation-dependent and can be overloaded to a subset of the hints (for example, T0, T1, and T2 may have the same behavior) or altogether ignored by an implementation. The programmer will have to tune the application for each implementation to take advantage of these instructions. These instructions do not generate exceptions or faults. Excessive usage of prefetch instructions may be throttled by the processor.
For more detailed information on prefetch hints, refer to Chapter 6, Optimizing Cache Utilization for Pentium III Processors, in the Intel Architecture Optimization Reference Manual (Order Number 245127-001).
Some common usage models that may be affected in this way by weakly-ordered stores are:
• library functions, which use weakly-ordered memory to write results
• compiler-generated code, which also benefits from writing weakly-ordered results
• hand-crafted code
The degree to which a consumer of data knows that the data is weakly-ordered can vary for these cases. As a result, the SFENCE instruction should be used to ensure ordering between routines that produce weakly-ordered data and routines that consume this data. The SFENCE instruction provides a performance-efficient way to ensure ordering, by guaranteeing that every store instruction that precedes the store fence instruction in program order is globally visible before any store instruction that follows the fence.
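The producer side of this pattern might look like the following C sketch. It assumes a compiler that provides the SSE intrinsics header `xmmintrin.h` (`_mm_stream_ps` for MOVNTPS, `_mm_sfence` for SFENCE); on other targets it falls back to ordinary stores so that it still builds. The function name `produce_block` and the `ready` flag are hypothetical, and the flag write only marks where the consumer would be signaled; it is not a complete multiprocessor synchronization scheme.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>   /* _mm_set1_ps, _mm_stream_ps, _mm_sfence */
#define USE_SSE 1
#else
#define USE_SSE 0        /* portable fallback: plain stores, no fence needed */
#endif

/* Fill dst[0..n) with `value` using weakly-ordered (non-temporal) stores,
 * fence, and only then signal the consumer through *ready.
 * dst must be 16-byte aligned and n a multiple of 4, because MOVNTPS
 * raises a general protection exception on unaligned addresses. */
void produce_block(float *dst, size_t n, float value, volatile int *ready)
{
    assert(((uintptr_t)dst % 16) == 0);
    assert(n % 4 == 0);
#if USE_SSE
    __m128 v = _mm_set1_ps(value);
    for (size_t i = 0; i < n; i += 4)
        _mm_stream_ps(dst + i, v);  /* MOVNTPS: does not write-allocate */
    _mm_sfence();                   /* SFENCE: earlier stores become globally
                                       visible before any later store */
#else
    for (size_t i = 0; i < n; i++)
        dst[i] = value;
#endif
    *ready = 1;                     /* store that follows the fence */
}
```

Placing the fence between the data stores and the flag store is the point of the sketch: a consumer that observes the flag should then also observe the data the producer intended it to see.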
9.5.3.2. RECOMMENDATIONS AND GUIDELINES
For more specific information relating to these recommendations and guidelines, such as port assignments and prefetch instruction details, refer to the Intel Architecture Optimization Reference Manual (Order Number 245127-001).
• Balance the limitations of the architecture.
  a. Schedule instructions to resolve dependencies.
  b. Intermix SIMD floating-point operations that utilize port 0 and port 1.
  c. Do not issue consecutive instructions that utilize the same port.
• Use the reciprocal instructions followed by iteration for increased accuracy. These instructions yield reduced accuracy but execute much faster. If reduced accuracy is acceptable, use them with no iteration. If near full accuracy is needed, use a Newton-Raphson iteration. If full accuracy is needed, divide and square root provides this but slows down performance.
• Exceptions
  a. Mask exceptions to achieve higher performance. Unmasked exceptions may cause a reduction in the retirement rate.
  b. Utilize the Flush-to-Zero mode for higher performance to avoid the penalty of dealing with denormals and underflows.
• Incorporate the prefetch instruction whenever possible.
• Try to emulate conditional moves by masked compares and logicals instead of using conditional jumps.
• Utilize MMX instructions if the computations can be done in SIMD-integer, or for shuffling data or copying data that is not used later in SIMD floating-point computations.
• If the algorithm requires extended precision, conversion to SIMD floating-point code is not advised, because the SIMD floating-point instructions are single-precision.
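The recommendation on reciprocal instructions can be illustrated with the standard Newton-Raphson step for a reciprocal, x1 = x0 * (2 - a * x0). The C sketch below is scalar and portable; `approx_recip` is a hypothetical stand-in that produces a reduced-accuracy estimate by discarding low mantissa bits, and it is an assumption made for illustration, not the algorithm the reciprocal instructions use.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a fast, reduced-accuracy reciprocal estimate: compute 1/a
 * and clear the low 12 mantissa bits of the single-precision result.
 * (Assumption for illustration only.) */
float approx_recip(float a)
{
    assert(a != 0.0f);
    float r = 1.0f / a;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFF000u;            /* keep sign, exponent, top mantissa bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* One Newton-Raphson step for the reciprocal of a:
 *   x1 = x0 * (2 - a * x0)
 * Each step roughly squares the relative error of the estimate. */
float refine_recip(float a, float x0)
{
    return x0 * (2.0f - a * x0);
}
```

For example, refining `approx_recip(3.0f)` with one call to `refine_recip` brings the estimate close to single-precision accuracy using two multiplies and one subtract, which is the trade-off the guideline describes: skip the iteration when reduced accuracy is acceptable, and add it when near full accuracy is needed.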
9.5.4. Using Streaming SIMD Extensions Code in a Multitasking Operating System Environment
An application needs to identify the nature of the multitasking operating system on which it runs. Each task retains its own state that must be saved when a task switch occurs. The processor state (context) consists of the integer registers, floating-point unit registers, and SIMD floating-point registers.
The STMXCSR and FXSAVE instructions store SIMD floating-point state in memory for use by exception handlers and other system and application software. The STMXCSR instruction saves the contents of the SIMD floating-point control/status register. The FXSAVE instruction saves the x87-FP state (status, control, tag, instruction pointer, data pointer, opcode and stack registers) and SIMD floating-point state (status/control, tag and data registers).
An application needs to verify that the processor supports FXSAVE prior to using this instruction. For a processor that implements FXSAVE but not Streaming SIMD Extensions, this can be done by checking the CPUID.FXSR bit; for a processor that does implement Streaming SIMD Extensions, use the approach described in Section 9.5.1., Detecting Support for Streaming SIMD Extensions Using the CPUID Instruction. For even more detailed information, refer to the Intel Processor Identification and the CPUID Instruction Application Note (AP-485), order number 241618-008 and Identifying Support for Streaming SIMD Extensions in the Processor and Operating System (AP-900).
Operating systems can be classified into two types:
• Cooperative multitasking operating systems
• Preemptive multitasking operating systems
9.5.4.1. COOPERATIVE MULTITASKING OPERATING SYSTEM
This type of multitasking operating system does not save the FP and MMX state and SIMD floating-point state when performing a context switch. Therefore, the application needs to save the relevant state before relinquishing direct or indirect control to the operating system.
9.5.4.2. PREEMPTIVE MULTITASKING OPERATING SYSTEM
This type of multitasking operating system saves the FP and MMX state and SIMD floating-point state when performing a context switch. Therefore, the application does not have to save or restore SIMD floating-point state.
9.5.5. Exception Handling in Streaming SIMD Extensions
Streaming SIMD Extensions can generate two kinds of exceptions:
• Non-numeric exceptions
• Numeric exceptions
Streaming SIMD Extensions can generate the same type of memory access exceptions as the IA instructions do. Some examples are: page fault, segment not present, and limit violations. Existing exception handlers can handle these types of exceptions without any code modification. The Streaming SIMD Extensions PREFETCH instruction hints will not generate any kind of exception and instead will be ignored.
Streaming SIMD Extensions can generate the same six numeric exceptions that x87-FP instructions can generate. All SIMD floating-point numeric exceptions are reported independently of x87-FP numeric exceptions. Independent masking and unmasking of SIMD floating-point numeric exceptions is achieved by setting/resetting specific bits in the MXCSR register. The application must ensure that the OS can support unmasked SIMD floating-point exceptions before unmasking them. (Use the approach described in Section 9.5.1., Detecting Support for Streaming SIMD Extensions Using the CPUID Instruction. For even more detailed information, refer to the Intel Processor Identification and the CPUID Instruction Application Note (AP-485), order number 241618-008 and Identifying Support for Streaming SIMD Extensions in the Processor and Operating System (AP-900).) If an application unmasks exceptions using either FXRSTOR or LDMXCSR without the required OS support being enabled, an invalid opcode fault, instead of a SIMD floating-point exception, will be generated on the first faulting Streaming SIMD Extensions instruction.
SIMD floating-point numeric exceptions are precise and occur as soon as the instruction completes execution. They will not catch pending x87 floating-point exceptions and will not cause assertion of FERR# (independent of the value of CR0.NE). In addition, they ignore the assertion/de-assertion of IGNNE#.
For more details on SIMD floating-point exceptions and exception handlers, refer to Section 4.4., Interrupts and Exceptions, in Chapter 4, Procedure Calls, Interrupts, and Exceptions, Appendix D, SIMD Floating-Point Exceptions Summary, and Appendix E, Guidelines for Writing FPU Exceptions Handlers.
CHAPTER 10 INPUT/OUTPUT
In addition to transferring data to and from external memory, Intel Architecture (IA) processors can also transfer data to and from input/output ports (I/O ports). I/O ports are created in system hardware by circuitry that decodes the control, data, and address pins on the processor. These I/O ports are then configured to communicate with peripheral devices. An I/O port can be an input port, an output port, or a bidirectional port. Some I/O ports are used for transmitting data, such as to and from the transmit and receive registers, respectively, of a serial interface device. Other I/O ports are used to control peripheral devices, such as the control registers of a disk controller.
This chapter describes the processor's I/O architecture. The topics discussed include:
• I/O port addressing.
• I/O instructions.
• I/O protection mechanism.
10.1. I/O PORT ADDRESSING

The processor allows I/O ports to be accessed in either of two ways:
• Through a separate I/O address space.
• Through memory-mapped I/O.
Accessing I/O ports through the I/O address space is handled through a set of I/O instructions and a special I/O protection mechanism. Accessing I/O ports through memory-mapped I/O is handled with the processor's general-purpose move and string instructions, with protection provided through segmentation or paging. I/O ports can be mapped so that they appear in the I/O address space or the physical-memory address space (memory-mapped I/O) or both.
One benefit of using the I/O address space is that writes to I/O ports are guaranteed to be completed before the next instruction in the instruction stream is executed. Thus, I/O writes to control system hardware cause the hardware to be set to its new state before any other instructions are executed. Refer to Section 10.6. for more information on serializing of I/O operations.
10.2. I/O PORT HARDWARE

From a hardware point of view, I/O addressing is handled through the processor's address lines. For Pentium Pro, Pentium II, and Pentium III processors, a special memory-I/O transaction on the system bus indicates whether the address lines are being driven with a memory address or an I/O address; for Pentium and earlier IA processors, the M/IO pin indicates a memory address (1) or an I/O address (0). When the separate I/O address space is selected, it is the responsibility of the hardware to decode the memory-I/O bus transaction to select I/O ports rather than memory. Data is transmitted between the processor and an I/O device through the data lines.
10.3. I/O ADDRESS SPACE

The processor's I/O address space is separate and distinct from the physical-memory address space. The I/O address space consists of 2^16 (64K) individually addressable 8-bit I/O ports, numbered 0 through FFFFH. I/O port addresses 0F8H through 0FFH are reserved. Do not assign I/O ports to these addresses. The result of an attempt to address beyond the I/O address space limit of FFFFH is implementation-specific; refer to the Developer's Manuals for specific processors for more details.
Any two consecutive 8-bit ports can be treated as a 16-bit port, and any four consecutive ports can be a 32-bit port. In this manner, the processor can transfer 8, 16, or 32 bits to or from a device in the I/O address space. Like words in memory, 16-bit ports should be aligned to even addresses (0, 2, 4, ...) so that all 16 bits can be transferred in a single bus cycle. Likewise, 32-bit ports should be aligned to addresses that are multiples of four (0, 4, 8, ...). The processor supports data transfers to unaligned ports, but there is a performance penalty because one or more extra bus cycles must be used.
The exact order of bus cycles used to access unaligned ports is undefined and is not guaranteed to remain the same in future IA processors. If hardware or software requires that I/O ports be written to in a particular order, that order must be specified explicitly. For example, to load a word-length I/O port at address 2H and then another word port at 4H, two word-length writes must be used, rather than a single doubleword write at 2H.
Note that the processor does not mask parity errors for bus cycles to the I/O address space. Accessing I/O ports through the I/O address space is thus a possible source of parity errors.
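The addressing rules in this section (64K range, reserved addresses 0F8H through 0FFH, and natural alignment for 16-bit and 32-bit ports) can be expressed as simple checks. The following C sketch uses hypothetical helper names and is meant only to illustrate the arithmetic; it does not access any I/O port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a port of `width` bytes (1, 2, or 4) starting at `addr` lies
 * entirely within the I/O address space, 0 through FFFFH. */
bool port_in_range(uint32_t addr, unsigned width)
{
    assert(width == 1 || width == 2 || width == 4);
    return addr <= 0xFFFFu && addr + width - 1 <= 0xFFFFu;
}

/* True if the port is naturally aligned, so that it can be transferred in
 * a single bus cycle: 16-bit ports on even addresses, 32-bit ports on
 * multiples of four. */
bool port_is_aligned(uint32_t addr, unsigned width)
{
    assert(width == 1 || width == 2 || width == 4);
    return (addr % width) == 0;
}

/* True if any byte of the port falls in the reserved range 0F8H-0FFH. */
bool port_overlaps_reserved(uint32_t addr, unsigned width)
{
    assert(width == 1 || width == 2 || width == 4);
    uint32_t last = addr + width - 1;
    return addr <= 0xFFu && last >= 0xF8u;
}
```

For instance, `port_is_aligned(0x2, 2)` is true while `port_is_aligned(0x2, 4)` is false, which corresponds to the doubleword write at 2H that the text advises against.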
10.3.1. Memory-Mapped I/O

I/O devices that respond like memory components can be accessed through the processor's physical-memory address space (refer to Figure 10-1). When using memory-mapped I/O, any of the processor's instructions that reference memory can be used to access an I/O port located at a physical-memory address. For example, the MOV instruction can transfer data between any register and a memory-mapped I/O port. The AND, OR, and TEST instructions may be used to manipulate bits in the control and status registers of memory-mapped peripheral devices.
When using memory-mapped I/O, caching of the address space mapped for I/O operations must be prevented. With the Pentium Pro, Pentium II, and Pentium III processors, caching of I/O accesses can be prevented by using memory type range registers (MTRRs) to map the address space used for the memory-mapped I/O as uncacheable (UC). Refer to Chapter 9, Memory Cache Control, in the Intel Architecture Software Developer's Manual, Volume 3, for a complete discussion of the MTRRs.
The Pentium and Intel486 processors do not support MTRRs. Instead, they provide the KEN# pin, which when held inactive (high) prevents caching of all addresses sent out on the system bus. To use this pin, external address decoding logic is required to block caching in specific address spaces.
Figure 10-1. Memory-Mapped I/O (diagram of physical memory up to FFFFFFFFH, showing EPROM, I/O ports, and RAM regions)
All the IA processors that have on-chip caches also provide the PCD (page-level cache disable) flag in page table and page directory entries. This flag allows caching to be disabled on a page-by-page basis. Refer to Section 3.6.4., Page-Directory and Page-Table Entries in Chapter 3, Protected-Mode Memory Management, in the Intel Architecture Software Developer's Manual, Volume 3.
10.4. I/O INSTRUCTIONS

The processor's I/O instructions provide access to I/O ports through the I/O address space. (These instructions cannot be used to access memory-mapped I/O ports.) There are two groups of I/O instructions:
• Those which transfer a single item (byte, word, or doubleword) between an I/O port and a general-purpose register.
• Those which transfer strings of items (strings of bytes, words, or doublewords) between an I/O port and memory.
The register I/O instructions IN (input from I/O port) and OUT (output to I/O port) move data between I/O ports and the EAX register (32-bit I/O), the AX register (16-bit I/O), or the AL (8-bit I/O) register. The address of the I/O port can be given with an immediate value or a value in the DX register.
The string I/O instructions INS (input string from I/O port) and OUTS (output string to I/O port) move data between an I/O port and a memory location. The address of the I/O port being accessed is given in the DX register; the source or destination memory address is given in the DS:ESI or ES:EDI register, respectively.
When used with one of the repeat prefixes (such as REP), the INS and OUTS instructions perform string (or block) input or output operations. The repeat prefix REP modifies the INS and OUTS instructions to transfer blocks of data between an I/O port and memory. Here, the ESI or EDI register is incremented or decremented (according to the setting of the DF flag in the EFLAGS register) after each byte, word, or doubleword is transferred between the selected I/O port and memory. Refer to the individual references for the IN, INS, OUT, and OUTS instructions in Chapter 3, Instruction Set Reference, of the Intel Architecture Software Developer's Manual, Volume 2, for more information on these instructions.
10.5. PROTECTED-MODE I/O

When the processor is running in protected mode, the following protection mechanisms regulate access to I/O ports:
• When accessing I/O ports through the I/O address space, two protection devices control access:
  • The I/O privilege level (IOPL) field in the EFLAGS register.
  • The I/O permission bit map of a task state segment (TSS).
• When accessing memory-mapped I/O ports, the normal segmentation and paging protection and the MTRRs (in processors that support them) also affect access to I/O ports. Refer to Chapter 4, Protection, and Chapter 9, Memory Cache Control, in the Intel Architecture Software Developer's Manual, Volume 3, for a complete discussion of memory protection.
The following sections describe the protection mechanisms available when accessing I/O ports in the I/O address space with the I/O instructions.
10.5.1. I/O Privilege Level

In systems where I/O protection is used, the IOPL field in the EFLAGS register controls access to the I/O address space by restricting use of selected instructions. This protection mechanism permits the operating system or executive to set the privilege level needed to perform I/O. In a typical protection ring model, access to the I/O address space is restricted to privilege levels 0 and 1. Here, the kernel and the device drivers are allowed to perform I/O, while less privileged device drivers and application programs are denied access to the I/O address space. Application programs must then make calls to the operating system to perform I/O.
The following instructions can be executed only if the current privilege level (CPL) of the program or task currently executing is less than or equal to the IOPL: IN, INS, OUT, OUTS, CLI (clear interrupt-enable flag), and STI (set interrupt-enable flag). These instructions are called I/O sensitive instructions, because they are sensitive to the IOPL field. Any attempt by a less privileged program or task to use an I/O sensitive instruction results in a general-protection exception (#GP) being signaled. Because each task has its own copy of the EFLAGS register, each task can have a different IOPL.
The I/O permission bit map in the TSS can be used to modify the effect of the IOPL on I/O sensitive instructions, allowing access to some I/O ports by less privileged programs or tasks (refer to Section 10.5.2.).
A program or task can change its IOPL only with the POPF and IRET instructions; however, such changes are privileged. No procedure may change the current IOPL unless it is running at privilege level 0. An attempt by a less privileged procedure to change the IOPL does not result in an exception; the IOPL simply remains unchanged.
The POPF instruction also may be used to change the state of the IF flag (as can the CLI and STI instructions); however, the POPF instruction in this case is also I/O sensitive. A procedure may use the POPF instruction to change the setting of the IF flag only if the CPL is less than or equal to the current IOPL. An attempt by a less privileged procedure to change the IF flag does not result in an exception; the IF flag simply remains unchanged.
10.5.2. I/O Permission Bit Map

The I/O permission bit map is a device for permitting limited access to I/O ports by less privileged programs or tasks and for tasks operating in virtual-8086 mode. The I/O permission bit map is located in the TSS (refer to Figure 10-2) for the currently running task or program. The address of the first byte of the I/O permission bit map is given in the I/O map base address field of the TSS. The size of the I/O permission bit map and its location in the TSS are variable.
[Figure: Task State Segment (TSS). The I/O map base field lies in the doubleword at offset 64H and points to the I/O permission bit map. The I/O map base must not exceed DFFFH. The last byte of the bit map must be followed by a byte with all bits set.]
Figure 10-2. I/O Permission Bit Map
Because each task has its own TSS, each task has its own I/O permission bit map. Access to individual I/O ports can thus be granted to individual tasks.
If the processor is in protected mode and the CPL is less than or equal to the current IOPL, the processor allows all I/O operations to proceed. If the CPL is greater than the IOPL or if the processor is operating in virtual-8086 mode, the processor checks the I/O permission bit map to determine if access to a particular I/O port is allowed. Each bit in the map corresponds to an I/O port byte address. For example, the control bit for I/O port address 29H in the I/O address space is found at bit position 1 of the sixth byte in the bit map.

Before granting I/O access, the processor tests all the bits corresponding to the I/O port being addressed. For a doubleword access, for example, the processor tests the four bits corresponding to the four adjacent 8-bit port addresses. If any tested bit is set, a general-protection exception (#GP) is signaled. If all tested bits are clear, the I/O operation is allowed to proceed.

Because I/O port addresses are not necessarily aligned to word and doubleword boundaries, the processor reads two bytes from the I/O permission bit map for every access to an I/O port. To prevent exceptions from being generated when the ports with the highest addresses are accessed, an extra byte needs to be included in the TSS immediately after the table. This byte must have all of its bits set, and it must be within the segment limit.

It is not necessary for the I/O permission bit map to represent all the I/O addresses. I/O addresses not spanned by the map are treated as if they had set bits in the map. For example, if the TSS segment limit is 10 bytes past the bit-map base address, the map has 11 bytes and the first 80 I/O ports are mapped. Higher addresses in the I/O address space generate exceptions.

If the I/O bit map base address is greater than or equal to the TSS segment limit, there is no I/O permission map, and all I/O instructions generate exceptions when the CPL is greater than the current IOPL. The I/O bit map base address must be less than or equal to DFFFH.
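The access check described above can be modeled in portable C. The following is only a sketch for illustration, not processor microcode: the function name io_access_allowed and its parameters are hypothetical, and the bit map is passed in as an ordinary byte array whose length includes the all-ones terminator byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the protected-mode I/O permission check.
 *   map     : I/O permission bit map bytes, starting at the I/O map base
 *   map_len : number of bytes inside the TSS limit, including the
 *             all-ones terminator byte
 *   port    : first 8-bit port address touched by the access
 *   width   : access size in bytes (1, 2, or 4)
 * Returns 1 if the access may proceed, 0 if #GP would be signaled. */
int io_access_allowed(unsigned cpl, unsigned iopl, int v86,
                      const uint8_t *map, size_t map_len,
                      uint16_t port, unsigned width)
{
    assert(width == 1 || width == 2 || width == 4);

    /* Outside virtual-8086 mode, CPL <= IOPL permits all I/O. */
    if (!v86 && cpl <= iopl)
        return 1;

    size_t byte = port / 8;      /* byte holding the first port's bit */
    unsigned shift = port % 8;   /* position of that bit in the byte  */

    /* Two bytes are read per access; both must lie within the limit. */
    if (byte + 1 >= map_len)
        return 0;

    unsigned bits = map[byte] | ((unsigned)map[byte + 1] << 8);
    unsigned mask = ((1u << width) - 1u) << shift;

    /* Any set bit among the tested bits denies the access. */
    return (bits & mask) == 0;
}
```

With an 11-byte array (10 map bytes plus the terminator), ports 0 through 79 are covered and any access that reaches port 80 or above is rejected, which matches the example in the text.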
10.6. ORDERING I/O

When controlling I/O devices it is often important that memory and I/O operations be carried out in precisely the order programmed. For example, a program may write a command to an I/O port, then read the status of the I/O device from another I/O port. It is important that the status returned be the status of the device after it receives the command, not before.

When using memory-mapped I/O, caution should be taken to avoid situations in which the programmed order is not preserved by the processor. To optimize performance, the processor allows cacheable memory reads to be reordered ahead of buffered writes in most situations. Internally, processor reads (cache hits) can be reordered around buffered writes. When using memory-mapped I/O, therefore, it is possible that an I/O read might be performed before the memory write of a previous instruction. The recommended method of enforcing program ordering of memory-mapped I/O accesses with the Pentium Pro, Pentium II, and Pentium III processors is to use the MTRRs to make the memory-mapped I/O address space uncacheable; for the Pentium and Intel486 processors, either the KEN# pin or the PCD flags can be used for this purpose (refer to Section 10.3.1.).

When the target of a read or write is in an uncacheable region of memory, memory reordering does not occur externally at the processor's pins (that is, reads and writes appear in-order). Designating a memory-mapped I/O region of the address space as uncacheable ensures that reads and writes of I/O devices are carried out in program order. Refer to Chapter 9, Memory Cache Control, in the Intel Architecture Software Developer's Manual, Volume 3, for more information on using MTRRs.

Another method of enforcing program order is to insert one of the serializing instructions, such as the CPUID instruction, between operations. Refer to Chapter 7, Multiple-Processor Management, in the Intel Architecture Software Developer's Manual, Volume 3, for more information on serialization of instructions.

It should be noted that the chip set being used to support the processor (bus controller, memory controller, and/or I/O controller) may post writes to uncacheable memory, which can lead to out-of-order execution of memory accesses. In situations where out-of-order processing of memory accesses by the chip set can potentially cause faulty memory-mapped I/O processing, code must be written to force synchronization and ordering of I/O operations. Serializing instructions can often be used for this purpose.

When the I/O address space is used instead of memory-mapped I/O, the situation is different in two respects:
• The processor never buffers I/O writes. Therefore, strict ordering of I/O operations is enforced by the processor. (As with memory-mapped I/O, it is possible for a chip set to post writes in certain I/O ranges.)
• The processor synchronizes I/O instruction execution with external bus activity (refer to Table 10-1).
Table 10-1. I/O Instruction Serialization
                    Processor Delays Execution of ...    Until Completion of ...
Instruction Being   Current          Next                Pending        Current
Executed            Instruction?     Instruction?        Stores?        Store?
IN                  Yes              -                   Yes            -
INS                 Yes              -                   Yes            -
REP INS             Yes              -                   Yes            -
OUT                 -                Yes                 Yes            Yes
OUTS                -                Yes                 Yes            Yes
REP OUTS            -                Yes                 Yes            Yes
CHAPTER 11 PROCESSOR IDENTIFICATION AND FEATURE DETERMINATION

When writing software intended to run on several different types of Intel Architecture (IA) processors, it is generally necessary to identify the type of processor present in a system and the processor features that are available to an application. This chapter describes how to identify the processor that is executing the code and determine the features the processor supports. It also shows how to determine if an FPU or NPX is present. For more information about processor identification and supported features, refer to the following documents:
AP-485, Intel Processor Identification and the CPUID Instruction For a complete list of the features that are available for the different IA processors, refer to Chapter 18, Intel Architecture Compatibility of the Intel Architecture Software Developers Manual, Volume 3: System Programming Guide.
11.1. PROCESSOR IDENTIFICATION

The CPUID instruction returns the processor type for the processor that executes the instruction. It also indicates the features that are present in the processor, including the existence of an on-chip FPU. The following information can be obtained with this instruction:
• The highest operand value the instruction responds to (2 for the Pentium Pro processors and 1 for the Pentium processors and recent Intel486 processors).
• The processor's family identification (ID) number, model ID, and stepping ID.
• The presence of an on-chip FPU.
• Support for or the presence of the following architectural extensions and enhancements:
  - Virtual-8086 mode enhancements.
  - Debugging extensions.
  - Page-size extensions.
  - Read time stamp counter (RDTSC) instruction.
  - Read model specific registers (RDMSR) and write model specific registers (WRMSR) instructions.
  - Physical address extension.
  - Machine check exceptions.
  - Compare and exchange 8 bytes instruction (CMPXCHG8B).
  - On-chip, advanced programmable interrupt controller (APIC).
  - Memory-type range registers (MTRRs).
  - Page global flag.
  - Machine check architecture.
  - Conditional move instruction (CMOVcc).
  - MMX technology.
• Cache and TLB information.
To use this instruction, a source operand value of 0, 1, or 2 is placed in the EAX register. Processor identification and feature information is then returned in the EAX, EBX, ECX, and EDX registers. Refer to Section 3.2., Instruction Reference in Chapter 3, Instruction Set Reference of the Intel Architecture Software Developers Manual, Volume 2, for more detailed information about the instruction.
AP-485, Intel Processor Identification and the CPUID Instruction (Order Number 241618), provides additional information and example source code for use in identifying IA processors. It also contains guidelines for using the CPUID instruction to help maintain the widest range of software compatibility. The following guidelines are among the most important, and should always be followed when using the CPUID instruction to determine available features:
• Always begin by testing for the "GenuineIntel" message in the EBX, EDX, and ECX registers when the CPUID instruction is executed with EAX equal to 0. If the processor is not genuine Intel, the feature identification flags may have different meanings than are described in "CPUID: CPU Identification" in Chapter 3, Instruction Set Reference, of the Intel Architecture Software Developer's Manual, Volume 2.
• Do not assume a value of 1 in a feature identification flag indicates that a given feature is present. For future feature identification flags, a value of 1 may indicate that the specific feature is not present.
• Test feature identification flags individually and do not make assumptions about undefined bits.
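The first guideline can be illustrated with a small C sketch that assembles the vendor string from the three register values. It assumes the caller has already executed CPUID with EAX equal to 0 by some other means and passes in the results; the function names cpuid_vendor and is_genuine_intel are hypothetical and do not come from AP-485.

```c
#include <stdint.h>
#include <string.h>

/* Copy the four bytes of a register into memory, low byte first. */
static void put_reg(char *dst, uint32_t reg)
{
    for (int i = 0; i < 4; i++)
        dst[i] = (char)((reg >> (8 * i)) & 0xFF);
}

/* Build the 12-character vendor string from the CPUID(EAX=0) outputs.
 * The string is spread over EBX, EDX, ECX, in that order.
 * out must have room for 13 bytes (12 characters plus terminator). */
void cpuid_vendor(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13])
{
    put_reg(out, ebx);
    put_reg(out + 4, edx);
    put_reg(out + 8, ecx);
    out[12] = '\0';
}

/* Returns 1 only when the registers spell "GenuineIntel". */
int is_genuine_intel(uint32_t ebx, uint32_t edx, uint32_t ecx)
{
    char vendor[13];
    cpuid_vendor(ebx, edx, ecx, vendor);
    return strcmp(vendor, "GenuineIntel") == 0;
}
```

Feature flags would be interpreted as documented only after this check succeeds, in line with the guideline above.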
Note that the CPUID instruction will cause the invalid opcode exception (#UD) if executed on a processor that does not support it. The CPUID instruction application note provides a code sequence to test the validity of the CPUID instruction. Also, this test code (for CPUID valid) is not reliable when executed in virtual-8086 mode. To avoid this, if the test code is written to run in real-address mode, the SMSW instruction must be used to read the PE bit from the MSW (lower half of CR0). If PE flag is set to 1, the Real Mode code is actually being executed in virtual-8086 mode, and the test sequence cannot be guaranteed to return reliable information. (Note that the new version of the CPUID application note (AP-485, Intel Processor Identification and the CPUID Instruction (Order Number 241618-005)), explains this virtual-8086 problem, but the older versions of the application note do not.)
11.2. IDENTIFICATION OF EARLIER INTEL ARCHITECTURE PROCESSORS

The CPUID instruction is only available in the Pentium Pro, Pentium, and recent Intel486 processors. For the earlier IA processors (including the earlier Intel486 processors), several other architectural features can be exploited to identify the processor. The settings of bits 12 and 13 (IOPL), 14 (NT), and 15 (reserved) in the EFLAGS register (refer to Figure 3-7, Section 3.6.3., EFLAGS Register, in Chapter 3, Basic Execution Environment) are different for Intel's 32-bit processors than for the Intel 8086 and Intel 286 processors. By examining the settings of these bits (with the PUSHF/PUSHFD and POPF/POPFD instructions), an application program can determine whether the processor is an 8086, Intel 286, or one of the Intel 32-bit processors:
• 8086 processor: Bits 12 through 15 of the EFLAGS register are always set.
• Intel 286 processor: Bits 12 through 15 are always clear in real-address mode.
• 32-bit processors: In real-address mode, bit 15 is always clear and bits 12 through 14 have the last value loaded into them. In protected mode, bit 15 is always clear, bit 14 has the last value loaded into it, and the IOPL bits depend on the current privilege level (CPL). The IOPL field can be changed only if the CPL is 0.
Other EFLAGS register bits that can be used to differentiate between the 32-bit processors:
• Bit 18 (AC): Implemented only on the Pentium Pro, Pentium, and Intel486 processors. The inability to set or clear this bit distinguishes an Intel386 processor from the other Intel 32-bit processors.
• Bit 21 (ID): Determines if the processor is able to execute the CPUID instruction. The ability to set and clear this bit indicates that the processor is a Pentium Pro, Pentium, or later version Intel486 processor.
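The decision logic behind these tests can be sketched in C. This is an illustrative model only: the actual probing needs PUSHF/POPF sequences in assembly, run in real-address mode, and is assumed to have been done elsewhere. The type cpu_class, the function classify_cpu, and its parameters are hypothetical names.

```c
#include <stdint.h>

typedef enum {
    CPU_8086,          /* bits 12-15 always set                    */
    CPU_286,           /* bits 12-15 always clear (real mode)      */
    CPU_386,           /* 32-bit, AC flag cannot be changed        */
    CPU_486_NO_CPUID,  /* AC can be changed, ID flag cannot        */
    CPU_HAS_CPUID      /* ID flag can be changed: CPUID is usable  */
} cpu_class;

/* flags_after_clear : FLAGS read back after trying to clear bits 12-15
 * flags_after_set   : FLAGS read back after trying to set bits 12-15
 * ac_toggles        : nonzero if EFLAGS bit 18 (AC) could be changed
 * id_toggles        : nonzero if EFLAGS bit 21 (ID) could be changed */
cpu_class classify_cpu(uint16_t flags_after_clear, uint16_t flags_after_set,
                       int ac_toggles, int id_toggles)
{
    if ((flags_after_clear & 0xF000u) == 0xF000u)
        return CPU_8086;
    if ((flags_after_set & 0xF000u) == 0)
        return CPU_286;
    if (!ac_toggles)
        return CPU_386;
    if (!id_toggles)
        return CPU_486_NO_CPUID;
    return CPU_HAS_CPUID;
}
```

The tests are ordered from oldest to newest processor, so the AC and ID results are consulted only once a 32-bit processor has been established.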
To determine whether an FPU or NPX is present in a system, applications can write to the FPU/NPX status and control registers using the FNINIT instruction and then verify the correct values are read back using the FNSTENV instruction. After determining that an FPU or NPX is present, its type can then be determined. In most cases, the processor type will determine the type of FPU or NPX; however, an Intel386 processor is compatible with either an Intel 287 or Intel 387 math coprocessor. The method the coprocessor uses to represent infinity (after the execution of the FINIT, FNINIT, or RESET instruction) indicates which coprocessor is present. The Intel 287 math coprocessor uses the same bit representation for +infinity and -infinity, whereas the Intel 387 math coprocessor uses different representations for +infinity and -infinity.
11.3. CPUID INSTRUCTION EXTENSIONS

The CPUID instructions of all P6-family processors behave identically. The CPUID instruction is described in detail in the application note, AP-485, Intel Processor Identification and the CPUID Instruction. This section describes processor-specific information returned by the CPUID instruction. The CPUID instruction's behavior varies depending upon the contents of the EAX register when the instruction is executed. Table 11-1 shows the interaction between the value in EAX before the call to CPUID and the value that CPUID returns.
Table 11-1. EAX Input Value and CPUID Return Values
Initial EAX   Register   CPUID Return Values
0             EAX        Maximum CPUID input value
              EBX        756E6547H "uneG" (G in BL)
              ECX        6C65746EH "letn" (n in CL)
              EDX        49656E69H "Ieni" (i in DL)
1             EAX        Version information (Type, Family, Model, Stepping)
              EBX        Reserved
              ECX        Reserved
              EDX        Feature information
2             EAX        Cache information
              EBX        Cache information
              ECX        Cache information
              EDX        Cache information
Refer to the CPUID application note, AP-485, for details on cache information. AP-485 is available from the following web site: http://developer.intel.com/design/pro/applnots/ap485.htm. In addition, the following two new cache descriptors are defined for P6-family processors with Model > 3:
Descriptor   Cache Description
44h          1M L2 cache, 4-way set associative, 32-byte line size
45h          2M L2 cache, 4-way set associative, 32-byte line size
11.3.1. Version Information

When the CPUID instruction is executed with a 1 in EAX, it returns version and feature information. Figure 11-1 shows the version information bit fields returned by CPUID in EAX. The 233, 266, and 300 MHz Pentium II processors are indicated by a 6 in the Family ID and a 3 in the Model ID field. Future P6-family processors are indicated by a 6 in the Family ID and a value greater than 3 in the Model ID field.
Bits 31-12   Reserved (0)
Bits 11-08   Family ID
Bits 07-04   Model ID
Bits 03-00   Stepping ID
Figure 11-1. EAX Return Values
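As a rough illustration of the layout in Figure 11-1, the following C sketch splits an EAX value into its three fields. The struct and function names are hypothetical, and the EAX value is assumed to have been obtained from CPUID with EAX equal to 1 elsewhere.

```c
#include <stdint.h>

/* Version fields as laid out in Figure 11-1. */
typedef struct {
    unsigned family;    /* bits 11-08 */
    unsigned model;     /* bits 07-04 */
    unsigned stepping;  /* bits 03-00 */
} cpu_version;

cpu_version decode_version(uint32_t eax)
{
    cpu_version v;
    v.family   = (eax >> 8) & 0xFu;
    v.model    = (eax >> 4) & 0xFu;
    v.stepping = eax & 0xFu;
    return v;
}

/* Per the text: family 6 with a model above 3 marks the later
 * P6-family processors that define cache descriptors 44h and 45h. */
int is_p6_model_above_3(uint32_t eax)
{
    cpu_version v = decode_version(eax);
    return v.family == 6 && v.model > 3;
}
```

For example, a value whose low twelve bits are 633H decodes to family 6, model 3, and stepping 3, the family and model pattern the text gives for the 233, 266, and 300 MHz Pentium II processors.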
Figure 11-2 shows the feature information bit fields returned by CPUID in EDX.
Bits 31-26   Reserved (0)
Bit 25       XMM
Bit 24       FXSR
Bit 23       MMX
Bits 22-19   rsvd
Bit 18       PN
Bit 17       PSE-36
Bit 16       PAT
Bit 15       CMOV
Bit 14       MCA
Bit 13       PGE
Bit 12       MTRR
Bit 11       SEP
Bit 10       rsvd
Bit 09       APIC
Bit 08       CX8
Bit 07       MCE
Bit 06       PAE
Bit 05       MSR
Bit 04       TSC
Bit 03       PSE
Bit 02       DE
Bit 01       VME
Bit 00       FPU
Figure 11-2. CPUID Feature Field Information Bits
Table 11-2 describes the bit representations for the new P6-family processor features.
Table 11-2. New P6-Family Processor Feature Information Returned by CPUID in EDX
Bit     Feature   Value   Description and Notes
11      SEP       1       Fast System Call. Indicates whether the processor supports the Fast System Call instructions SYSENTER and SYSEXIT.
16      PAT       1       Page Attribute Table. Indicates whether the processor supports the Page Attribute Table. This feature augments the Memory Type Range Registers (MTRRs), allowing an operating system to specify attributes of memory on a page granularity through a linear address.
17      PSE-36    1       36-bit Page Size Extension. Indicates whether the processor supports 4-MB pages that are capable of addressing physical memory beyond 4 GB. This feature indicates that the upper four bits of the physical address of the 4-MB page are encoded by bits 13-16 of the page directory entry.
18      PN        1       Processor Number. Indicates whether the processor supports the 96-bit Processor Number feature.
19-22   rsvd      0       Reserved. These bits are reserved for future use. The contents of these fields are not defined and should not be relied upon or altered.
23      MMX       1       MMX technology. Indicates whether the processor supports the MMX technology instruction set and architecture.
24      FXSR      1       Fast floating-point save and restore. Indicates whether the processor supports the FXSAVE and FXRSTOR instructions for fast save and restore of the floating-point context. Presence of this bit also indicates that CR4.OSFXSR is available, allowing an operating system to indicate that it uses the fast save/restore instructions.
25      XMM       1       Streaming SIMD Extension. Indicates whether the processor supports the Streaming SIMD Extensions instruction set.
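Testing these flags one at a time, as the guidelines in Section 11.1 recommend, might look like the following C sketch. The macro names and the helper cpu_has_feature are hypothetical; the bit positions are the ones listed in Table 11-2 and Figure 11-2, and the EDX value is assumed to come from CPUID with EAX equal to 1.

```c
#include <stdint.h>

/* Bit masks for feature flags returned in EDX by CPUID (EAX = 1). */
#define CPUID_FEAT_FPU    (1u << 0)
#define CPUID_FEAT_SEP    (1u << 11)
#define CPUID_FEAT_PAT    (1u << 16)
#define CPUID_FEAT_PSE36  (1u << 17)
#define CPUID_FEAT_PN     (1u << 18)
#define CPUID_FEAT_MMX    (1u << 23)
#define CPUID_FEAT_FXSR   (1u << 24)
#define CPUID_FEAT_XMM    (1u << 25)

/* Returns 1 only if every bit in mask is set in edx. Reserved and
 * undefined bits are never inspected unless the caller asks for them. */
int cpu_has_feature(uint32_t edx, uint32_t mask)
{
    return (edx & mask) == mask;
}
```

A caller would check, for instance, cpu_has_feature(edx, CPUID_FEAT_FXSR) before relying on FXSAVE and FXRSTOR, rather than inferring support from the family or model number.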
11.3.2. Control Register Extensions

The control registers (CR0, CR1, CR2, CR3, and CR4) determine the operating mode of the processor and the characteristics of the currently executing task. A new field has been added to CR4, which contains a group of flags used to enable several architectural extensions as depicted in Figure 11-3.
Bits 31-10   Reserved (set to 0)
Bit 09       OSFXSR
Bit 08       PCE
Bit 07       PGE
Bit 06       MCE
Bit 05       PAE
Bit 04       PSE
Bit 03       DE
Bit 02       TSD
Bit 01       PVI
Bit 00       VME
Figure 11-3. CR4 Register Extensions
The new field at bit 9 (OSFXSR) is set by the operating system to indicate that it uses the FXSAVE/FXRSTOR instructions for saving/restoring FP/MMX state during context switches. This bit defaults to clear (zero) at processor initialization.
APPENDIX A EFLAGS CROSS-REFERENCE

The cross-reference in Table A-1 summarizes how the flags in the processor's EFLAGS register are affected by each instruction. For detailed information on how flags are affected, refer to Chapter 3, Instruction Set Reference, of the Intel Architecture Software Developer's Manual, Volume 2. The following codes describe how the flags are affected:
T       Instruction tests flag.
M       Instruction modifies flag (either sets or resets depending on operands).
0       Instruction resets flag.
1       Instruction sets flag.
-       Instruction's effect on flag is undefined.
R       Instruction restores prior value of flag.
Blank   Instruction does not affect flag.
Table A-1. EFLAGS Cross-Reference

Instruction AAA AAD AAM AAS ADC ADD AND ARPL BOUND BSF/BSR BSWAP BT/BTS/BTR/BTC M M OF M M 0 SF M M M M M ZF M M M M M M AF TM TM M M PF M M M M M CF M M TM M 0 TF IF DF NT RF
Table A-1. EFLAGS Cross-Reference (Contd.)

Instruction CALL CBW CLC CLD CLI CLTS CMC CMOVcc CMP CMPS CMPXCHG CMPXCHG8B CPUID COMISS CWD DAA DAS DEC DIV ENTER ESC FCMOVcc FCOMI, FCOMIP, FUCOMI, FUCOMIP HLT IDIV IMUL IN INC INS INT INTO INVD T 0 0 M M M M M T 0 0 M M T M T M T M M M M M M M M TM TM M M M M TM TM 1 1 1 1 1 1 T M M M T M M M T M M M M M M M T M M M M T M M M T 0 0 0 OF SF ZF AF PF CF TF IF DF NT RF

Instruction INVLPG IRET Jcc JCXZ JMP LAHF LAR LDS/LES/LSS/LFS/LGS LEA LEAVE LGDT/LIDT/LLDT/LMSW LOCK LODS LOOP LOOPE/LOOPNE LSL LTR MOV MOV control, debug, test MOVS MOVSX/MOVZX MUL NEG NOP NOT OR OUT OUTS POP/POPA POPF PUSH/PUSHA/PUSHF RCL/RCR 1 RCL/RCR count M TM TM R R R R R R R R R R T 0 M M M 0 M M M M M M M M T T M T M R T R T R T R R T R T R R R T OF SF ZF AF PF CF TF IF DF NT RF

Instruction OF SF ZF AF PF CF TF IF DF NT RF
RDMSR RDPMC RDTSC REP/REPE/REPNE RET ROL/ROR 1 ROL/ROR count RSM SAHF SAL/SAR/SHL/SHR 1 SAL/SAR/SHL/SHR count SBB SCAS SETcc SGDT/SIDT/SLDT/SMSW SHLD/SHRD STC STD STI STOS STR SUB TEST UCOMISS UD2 VERR/VERRW WAIT WBINVD WRMSR XADD XCHG XLAT XOR 0 M M M 0 M M M M M M M M 0 1 M M 1 M M 1 M 1 M M 1 M 0 1 1 T M M M M 1 1 M M M T M M M R M M M M T M R M M M M T M R M M M R M M M M T M M M R M M TM M T T M M M M M
APPENDIX B EFLAGS CONDITION CODES

Table B-1 gives all the condition codes that can be tested for by the CMOVcc, FCMOVcc, Jcc, and SETcc instructions. The condition codes refer to the setting of one or more status flags (CF, OF, SF, ZF, and PF) in the EFLAGS register. The "Mnemonic" column gives the suffix (cc) added to the instruction to specify the test condition. The "Condition Tested For" column describes the condition specified in the "Status Flags Setting" column. The "Instruction Subcode" column gives the opcode suffix added to the main opcode to specify a test condition.
Table B-1. EFLAGS Condition Codes
Mnemonic (cc)   Condition Tested For                  Instruction Subcode   Status Flags Setting
O               Overflow                              0000                  OF = 1
NO              No overflow                           0001                  OF = 0
B, NAE          Below; Neither above nor equal        0010                  CF = 1
NB, AE          Not below; Above or equal             0011                  CF = 0
E, Z            Equal; Zero                           0100                  ZF = 1
NE, NZ          Not equal; Not zero                   0101                  ZF = 0
BE, NA          Below or equal; Not above             0110                  (CF OR ZF) = 1
NBE, A          Neither below nor equal; Above        0111                  (CF OR ZF) = 0
S               Sign                                  1000                  SF = 1
NS              No sign                               1001                  SF = 0
P, PE           Parity; Parity even                   1010                  PF = 1
NP, PO          No parity; Parity odd                 1011                  PF = 0
L, NGE          Less; Neither greater nor equal       1100                  (SF XOR OF) = 1
NL, GE          Not less; Greater or equal            1101                  (SF XOR OF) = 0
LE, NG          Less or equal; Not greater            1110                  ((SF XOR OF) OR ZF) = 1
NLE, G          Neither less nor equal; Greater       1111                  ((SF XOR OF) OR ZF) = 0
Many of the test conditions are described in two different ways. For example, LE (less or equal) and NG (not greater) describe the same test condition. Alternate mnemonics are provided to make code more intelligible. The terms above and below are associated with the CF flag and refer to the relation between two unsigned integer values. The terms greater and less are associated with the SF and OF flags and refer to the relation between two signed integer values.
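Table B-1 can also be read as a small evaluation rule: the upper three bits of the subcode select a flag expression, and the lowest bit inverts the result. The C sketch below follows that reading of the table; the type status_flags and the function condition_holds are hypothetical names, and each flag field is assumed to hold 0 or 1.

```c
/* Status flags relevant to Table B-1; each field must be 0 or 1. */
typedef struct {
    unsigned cf, pf, zf, sf, of;
} status_flags;

/* Evaluate the condition with the given 4-bit instruction subcode.
 * Returns 1 if the condition is met, 0 otherwise. */
int condition_holds(unsigned subcode, status_flags f)
{
    unsigned r;

    switch ((subcode >> 1) & 7u) {
    case 0:  r = f.of;                  break;  /* O  / NO        */
    case 1:  r = f.cf;                  break;  /* B  / NB        */
    case 2:  r = f.zf;                  break;  /* E  / NE        */
    case 3:  r = f.cf | f.zf;           break;  /* BE / NBE       */
    case 4:  r = f.sf;                  break;  /* S  / NS        */
    case 5:  r = f.pf;                  break;  /* P  / NP        */
    case 6:  r = f.sf ^ f.of;           break;  /* L  / NL        */
    default: r = (f.sf ^ f.of) | f.zf;  break;  /* LE / NLE       */
    }

    /* An odd subcode tests the opposite of the even one before it. */
    return (subcode & 1u) ? !r : (r != 0);
}
```

With SF = 1 and all other flags clear, as after comparing -1 with 1, the signed condition L (1100) holds while the unsigned condition B (0010) does not, which illustrates the distinction between less/greater and below/above drawn in the text.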
APPENDIX C FLOATING-POINT EXCEPTIONS SUMMARY

Table C-1 lists the floating-point instruction mnemonics in alphabetical order. For each mnemonic, it summarizes the exceptions that the instruction may cause. Refer to Section 7.8., Floating-Point Exception Conditions, in Chapter 7, Floating-Point Unit, for a detailed discussion of the floating-point exceptions. The following codes indicate the floating-point exceptions:
#IS   Invalid operation exception for stack underflow or stack overflow.
#IA   Invalid operation exception for invalid arithmetic operands and unsupported formats.
#D    Denormal operand exception.
#Z    Divide-by-zero exception.
#O    Numeric overflow exception.
#U    Numeric underflow exception.
#P    Inexact result (precision) exception.
Table C-1. Floating-Point Exceptions Summary

Mnemonic F2XM1 FABS FADD(P) FBLD FBSTP FCHS FCLEX FCMOVcc FCOM, FCOMP, FCOMPP FCOMI, FCOMIP, FUCOMI, FUCOMIP FCOS FDECSTP FDIV(R)(P) 2
X1
Instruction
#IS Y Y Y Y Y Y
#IA Y
#D Y
#Z
#O
#U Y
#P Y
Absolute value Add real BCD load BCD store and pop Change sign Clear exceptions Floating-point conditional move Compare real Compare real and set EFLAGS Cosine Decrement stack pointer Divide real
Y Y Y Y Y Y Y Y Y Y Y
Table C-1. Floating-Point Exceptions Summary (Contd.)

Mnemonic FFREE FIADD FICOM(P) FIDIV FIDIVR FILD FIMUL FINCSTP FINIT FIST(P) FISUB(R) FLD extended or stack FLD single or double FLD1 FLDCW FLDENV FLDL2E FLDL2T FLDLG2 FLDLN2 FLDPI FLDZ FMUL(P) FNOP FPATAN FPREM FPREM1 FPTAN FRNDINT FRSTOR FSAVE FSCALE FSIN Instruction Free register Integer add Integer compare Integer divide Integer divide reversed Integer load Integer multiply Increment stack pointer Initialize processor Integer store Integer subtract Load real Load real Load + 1.0 Load Control word Load environment Load log2e Load log210 Load log102 Load loge2 Load Load + 0.0 Multiply real No operation Partial arctangent Partial remainder IEEE partial remainder Partial tangent Round to integer Restore state Save state Scale Sine Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y #IS #IA #D #Z #O #U #P
Table C-1. Floating-Point Exceptions Summary (Contd.)

Mnemonic FSINCOS FSQRT FST(P) stack or extended FST(P) single or double FSTCW FSTENV FSTSW (AX) FSUB(R)(P) FTST FUCOM(P)(P) FWAIT FXAM FXCH FXTRACT FYL2X FYL2XP1 Instruction Sine and cosine Square root Store real Store real Store control word Store environment Store status word Subtract real Test Unordered compare real CPU Wait Examine Exchange registers Extract Y log2X Y log2(X + 1) Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y #IS Y Y Y Y Y Y Y Y Y #IA Y Y #D Y Y #Z #O #U Y #P Y Y
APPENDIX D SIMD FLOATING-POINT EXCEPTIONS SUMMARY

Table D-1 lists the Streaming SIMD Extensions mnemonics in alphabetical order. For each mnemonic, it summarizes the exceptions that the instruction may cause. Refer to Section 9.5.5., Exception Handling in Streaming SIMD Extensions, in Chapter 9, Programming with the Streaming SIMD Extensions, for a detailed discussion of the various exceptions that can occur when executing Streaming SIMD Extensions. The following codes indicate the exceptions associated with execution of an instruction that utilizes the 128-bit Streaming SIMD Extensions registers:
#I   Invalid operation exception for invalid arithmetic operands and unsupported formats.
#D   Denormal operand exception.
#Z   Divide-by-zero exception.
#O   Numeric overflow exception.
#U   Numeric underflow exception.
#P   Inexact result (precision) exception.
Table D-1. Streaming SIMD Extensions Instruction Set Summary

Mnemonic ADDPS ADDSS ANDNPS ANDPS CMPPS CMPSS COMISS Instruction Packed add Scalar add Packed logical INVERT and AND Packed logical AND Packed compare Scalar compare Scalar ordered compare lower SP FP numbers and set the status flags Convert two 32-bit signed integers from MM2/Mem to two SP FP. Convert lower 2 SP FP from XMM/Mem to 2 32-bit signed integers in MM using rounding specified by MXCSR. Convert one 32-bit signed integer from Integer Reg/Mem to one SP FP. Convert one SP FP from XMM/Mem to one 32-bit signed integer using rounding mode specified by MXCSR, and move the result to an integer register. Convert lower 2 SP FP from XMM2/Mem to 2 32-bit signed integers in MM1 using truncate. Convert lowest SP FP from XMM/Mem to one 32-bit signed integer using truncate, and move the result to an integer register. Packed divide Scalar divide Load FP and Streaming SIMD Extensions state Store FP and Streaming SIMD Extensions state Load control/status word Packed maximum Scalar maximum Packed minimum Scalar minimum Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y #I Y Y # D Y Y # Z # O Y Y # U Y Y # P Y Y
CVTPI2PS
CVTPS2PI
CVTSI2SS
CVTSS2SI
CVTTPS2PI
CVTTSS2SI
DIVPS DIVSS FXRSTOR FXSAVE LDMXCSR MAXPS MAXSS MINPS MINSS
Y Y
Y Y
Y Y
Y Y
Y Y
Y Y
Mnemonic MOVAPS MOVHPS MOVLPS MOVMSKPS MOVSS MOVUPS MULPS MULSS ORPS RCPPS RCPSS RSQRTPS RSQRTSS SHUFPS SQRTPS SQRTSS STMXCSR SUBPS SUBSS UCOMISS
Instruction Move aligned packed data Move high 64 bits Move low 64 bits Move mask to r32 Move scalar Move unaligned packed data Packed multiply Scalar multiply Packed OR Packed reciprocal Scalar reciprocal Packed reciprocal square root Scalar reciprocal square root Shuffle Square Root of the packed SP FP numbers Scalar square root Store control/status word Packed subtract Scalar subtract Unordered compare lower SP FP numbers and set the status flags Interleave SP FP numbers Interleave SP FP numbers Packed XOR
#I
# D
# Z
# O
# U
# P
Y Y
Y Y
Y Y
Y Y
Y Y
Y Y Y Y Y
Y Y Y Y Y Y Y Y Y
Y Y Y Y
UNPCKHPS UNPCKLPS XORPS
APPENDIX E GUIDELINES FOR WRITING FPU EXCEPTIONS HANDLERS

As described in Chapter 7, Floating-Point Unit, the Intel Architecture (IA) supports two mechanisms for accessing exception handlers to handle unmasked FPU exceptions: native mode and MS-DOS compatibility mode. The primary purpose of this appendix is to provide detailed information to help software engineers design and write FPU exception-handling facilities to run on PC systems that use the MS-DOS compatibility mode (see Note 1) for handling FPU exceptions. Some of the information in this appendix will also be of interest to engineers who are writing native-mode FPU exception handlers. The information provided is as follows:
• Discussion of the origin of the MS-DOS* FPU exception handling mechanism and its relationship to the FPU's native exception handling mechanism.
• Description of the IA flags and processor pins that control the MS-DOS FPU exception handling mechanism.
• Description of the external hardware typically required to support the MS-DOS exception handling mechanism.
• Description of the FPU's exception handling mechanism and the typical protocol for FPU exception handlers.
• Code examples that demonstrate various levels of FPU exception handlers.
• Discussion of FPU considerations in multitasking environments.
• Discussion of native mode FPU exception handling.
The information given is oriented toward the most recent generations of IA processors, starting with the Intel486. It is intended to augment the reference information given in Chapter 7, Floating-Point Unit. A more extensive version of this appendix is available in the application note AP-578, Software and Hardware Considerations for FPU Exception Handlers for Intel Architecture Processors (Order Number 242415-001), which is available from Intel.
NOTES
1. Microsoft Windows* 95 and Windows* 3.1 (and earlier versions) operating systems use almost the same FPU exception handling interface as the MS-DOS operating system. The recommendations in this appendix for an MS-DOS* compatible exception handler thus apply to all three operating systems.
E.1. ORIGIN OF THE MS-DOS* COMPATIBILITY MODE FOR HANDLING FPU EXCEPTIONS
The first generations of IA processors (starting with the Intel 8086 and 8088 processors and going through the Intel 286 and Intel386 processors) did not have an on-chip floating-point unit. Instead, floating-point capability was provided on a separate numeric coprocessor chip. The first of these numeric coprocessors was the Intel 8087, which was followed by the Intel 287 and Intel 387 numeric coprocessors. To allow the 8087 to signal floating-point exceptions to its companion 8086 or 8088, the 8087 has an output pin, INT, which it asserts when an unmasked floating-point exception occurs. The designers of the 8087 recommended that the output from this pin be routed through a programmable interrupt controller (PIC) such as the Intel 8259A to the INTR pin of the 8086 or 8088. The accompanying interrupt vector number could then be used to access the floating-point exception handler. However, the original IBM PC design and MS-DOS operating system used a different mechanism for handling the INT output from the 8087. It connected the INT pin directly to the NMI input pin of the 8086 or 8088. The NMI interrupt handler then had to determine if the interrupt was caused by a floating-point exception or another NMI event. This mechanism is the origin of what is now called the MS-DOS compatibility mode. The decision to use this latter floating-point exception handling mechanism came about because when the IBM PC was first designed, the 8087 was not available. When the 8087 did become available, other functions had already been assigned to the eight inputs to the PIC. One of these functions was a BIOS video interrupt, which was assigned to interrupt number 16 for the 8086 and 8088. The Intel 286 processor created the native mode for handling floating-point exceptions by providing a dedicated input pin (ERROR#) for receiving floating-point exception signals and a dedicated interrupt number, 16. Interrupt 16 was used to signal floating-point errors (also called math faults). 
It was intended that the ERROR# pin on the Intel 286 be connected to a corresponding ERROR# pin on the Intel 287 numeric coprocessor. When the Intel 287 signals a floating-point exception using this mechanism, the Intel 286 generates an interrupt 16 to invoke the floating-point exception handler.
To maintain compatibility with existing PC software, the native floating-point exception handling mode of the Intel 286 and 287 was not used in the IBM PC AT* system design. Instead, the ERROR# pin on the Intel 286 was tied permanently high, and the ERROR# pin from the Intel 287 was routed to a second (cascaded) PIC. The resulting output of this PIC was routed through an exception handler and eventually caused an interrupt 2 (NMI interrupt). Here the NMI interrupt was shared with the PC AT's new parity checking feature. Interrupt 16 remained assigned to the BIOS video interrupt handler. The external hardware for the MS-DOS compatibility mode must prevent the Intel 286 processor from executing past the next FPU instruction when an unmasked exception has been generated. To do this, it asserts the BUSY# signal into the Intel 286 when the ERROR# signal is asserted by the Intel 287.
The Intel386 processor and its companion Intel 387 numeric coprocessor provided the same hardware mechanism for signaling and handling floating-point exceptions as the Intel 286 and 287 processors. And again, to maintain compatibility with existing MS-DOS software, basically the same MS-DOS compatibility floating-point exception handling mechanism that was used in the PC AT was used in PCs based on the Intel386.
E.2. IMPLEMENTATION OF THE MS-DOS* COMPATIBILITY MODE IN THE INTEL486, PENTIUM, AND P6 FAMILY PROCESSORS
Beginning with the Intel486 processor, the IA provided a dedicated mechanism for enabling the MS-DOS compatibility mode for FPU exceptions and for generating external FPU-exception signals while operating in this mode. The following sections describe the implementation of the MS-DOS compatibility mode in Intel486, Pentium processors, and P6 family processors. Also described is the recommended external hardware to support this mode of operation.
E.2.1. MS-DOS* Compatibility Mode in the Intel486 and Pentium Processors
In the Intel486, several things were done to enhance and speed up the numeric coprocessor, now called the floating-point unit (FPU). The most important enhancement was that the FPU was included in the same chip as the processor, for increased speed in FPU computations and reduced latency for FPU exception handling. Also, for the first time, the MS-DOS compatibility mode was built into the chip design, with the addition of the NE bit in control register CR0 and the addition of the FERR# (Floating-point ERRor) and IGNNE# (IGNore Numeric Error) pins.
The NE bit selects the native FPU exception handling mode (NE = 1) or the MS-DOS compatibility mode (NE = 0). When native mode is selected, all signaling of floating-point exceptions is handled internally in the Intel486 chip, resulting in the generation of an interrupt 16. When MS-DOS compatibility mode is selected, the FERR# and IGNNE# pins are used to signal floating-point exceptions. The FERR# output pin, which replaces the ERROR# pin from the previous generations of IA numeric coprocessors, is connected to a PIC. A new input signal, IGNNE#, is provided to allow the FPU exception handler to execute FPU instructions, if desired, without first clearing the error condition and without triggering the interrupt a second time. This IGNNE# feature is needed to replicate the capability that was provided on MS-DOS compatible Intel 286 and Intel 287 and Intel386 and Intel 387 systems by turning off the BUSY# signal, when inside the FPU exception handler, before clearing the error condition.
Note that Intel, in order to provide Intel486 processors for market segments which had no need for an FPU, created the SX versions. These Intel486 SX processors did not contain the floating-point unit. Intel also produced Intel 487 SX processors for end users who later decided to upgrade to a system with an FPU. These Intel 487 SX processors are similar to standard Intel486 processors with a working FPU on board.
Thus, the external circuitry necessary to support the MS-DOS compatibility mode for Intel 487 SX processors is the same as for standard Intel486 DX processors.
The Pentium and P6 family processors offer the same mechanism (the NE bit and the FERR# and IGNNE# pins) as the Intel486 processors for generating FPU exceptions in MS-DOS compatibility mode. The actions of these mechanisms are slightly different and more straightforward for the P6 family processors, as described in Section E.2.2., MS-DOS* Compatibility Mode in the P6 Family Processors.
For Pentium and P6 family processors, it is important to note that the special DP (Dual Processing) mode for Pentium processors and also the more general Intel MultiProcessor Specification for systems with multiple Pentium or P6 family processors support FPU exception handling only in the native mode. Intel does not recommend using the MS-DOS compatibility FPU mode for systems using more than one processor.
E.2.1.1. BASIC RULES: WHEN FERR# IS GENERATED
When MS-DOS compatibility mode is enabled for the Intel486 or Pentium processors (NE bit is set to 0) and the IGNNE# input pin is de-asserted, the FERR# signal is generated as follows:
1. When an FPU instruction causes an unmasked FPU exception, the processor (in most cases) uses a deferred method of reporting the error. This means that the processor does not respond immediately, but rather freezes just before executing the next WAIT or FPU instruction (except for no-wait instructions, which the FPU executes regardless of an error condition).
2. When the processor freezes, it also asserts the FERR# output.
3. The frozen processor waits for an external interrupt, which must be supplied by external hardware in response to the FERR# assertion.
4. In MS-DOS* compatibility systems, FERR# is fed to the IRQ13 input in the cascaded PIC. The PIC generates interrupt 75H, which then branches to interrupt 2, as described earlier in this appendix for systems using the Intel 286 and Intel 287 or Intel386 and Intel 387 processors.
The deferred method of error reporting is used for all exceptions caused by the basic arithmetic instructions (including FADD, FSUB, FMUL, FDIV, FSQRT, FCOM and FUCOM), for precision exceptions caused by all types of FPU instructions, and for numeric underflow and overflow exceptions caused by all types of FPU instructions except stores to memory.
Some FPU instructions with some FPU exceptions use an immediate method of reporting errors. Here, FERR# is asserted immediately, at the time that the exception occurs. The immediate method of error reporting is used for FPU stack fault, invalid operation and denormal exceptions caused by all transcendental instructions, FSCALE, FXTRACT, FPREM and others, and all exceptions (except precision) when caused by FPU store instructions.
Like deferred error reporting, immediate error reporting will cause the processor to freeze just before executing the next WAIT or FPU instruction if the error condition has not been cleared by that time.
Note that in general, whether deferred or immediate error reporting is used for an FPU exception depends both on which exception occurred and which instruction caused that exception. A complete specification of these cases, which applies to both the Pentium and the Intel486 processors, is given in Section 5.1.2.1., Program-Error Exceptions, in Chapter 5, Interrupt and Exception Handling, of the Intel Architecture Software Developer's Manual, Volume 3.
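The deferred and immediate rules summarized above can be restated as a small lookup. The following Python sketch is only an illustration of that summary for the Intel486 and Pentium processors; the function name, the exception labels, and the `is_store` parameter are hypothetical, and the authoritative case list is the one in Section 5.1.2.1. of Volume 3, not this sketch.

```python
# Hypothetical restatement of the Intel486/Pentium reporting summary above.
# It is not a complete specification of every instruction/exception pair.
BASIC_ARITHMETIC = {"FADD", "FSUB", "FMUL", "FDIV", "FSQRT", "FCOM", "FUCOM"}


def reporting_method(mnemonic, exception, is_store=False):
    """Return 'deferred', 'immediate', or 'unspecified' for an unmasked exception.

    exception is one of: 'stack_fault', 'invalid', 'denormal',
    'zero_divide', 'overflow', 'underflow', 'precision'.
    is_store marks an FPU store-to-memory instruction (assumed flag).
    """
    # All exceptions from the basic arithmetic instructions are deferred.
    if mnemonic in BASIC_ARITHMETIC:
        return "deferred"
    # Precision exceptions are deferred for all instruction types.
    if exception == "precision":
        return "deferred"
    # Overflow and underflow are deferred except for stores to memory.
    if exception in ("overflow", "underflow") and not is_store:
        return "deferred"
    # Stores report every exception other than precision immediately.
    if is_store:
        return "immediate"
    # Stack fault, invalid operation, and denormal from transcendental
    # instructions, FSCALE, FXTRACT, FPREM and others are immediate.
    if exception in ("stack_fault", "invalid", "denormal"):
        return "immediate"
    # The summary in this section does not cover the remaining cases.
    return "unspecified"
```

For example, `reporting_method("FSIN", "invalid")` falls in the immediate group, while the same exception raised by `FADD` is deferred.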
If NE=0 but the IGNNE# input is active while an unmasked FPU exception is in effect, the processor disregards the exception, does not assert FERR#, and continues. If IGNNE# is then de-asserted and the FPU exception has not been cleared, the processor will respond as described above. (That is, an immediate exception case will assert FERR# immediately. A deferred exception case will assert FERR# and freeze just before the next FPU or WAIT instruction.) The assertion of IGNNE# is intended for use only inside the FPU exception handler, where it is needed if one wants to execute non-control FPU instructions for diagnosis, before clearing the exception condition. When IGNNE# is asserted inside the exception handler, a preceding FPU exception has already caused FERR# to be asserted, and the external interrupt hardware has responded, but IGNNE# assertion still prevents the freeze at FPU instructions. Note that if IGNNE# is left active outside of the FPU exception handler, additional FPU instructions may be executed after a given instruction has caused an FPU exception. In this case, if the FPU exception handler ever did get invoked, it could not determine which instruction caused the exception.
To properly manage the interface between the processor's FERR# output, its IGNNE# input, and the IRQ13 input of the PIC, additional external hardware is needed. A recommended configuration is described in the following section.
E.2.1.2. RECOMMENDED EXTERNAL HARDWARE TO SUPPORT THE MS-DOS* COMPATIBILITY MODE
Figure E-1 provides an external circuit that will assure proper handling of FERR# and IGNNE# when an FPU exception occurs. In particular, it assures that IGNNE# will be active only inside the FPU exception handler without depending on the order of actions by the exception handler. Some hardware implementations have been less robust because they have depended on the exception handler to clear the FPU exception interrupt request to the PIC (FP_IRQ signal) before the handler causes FERR# to be de-asserted by clearing the exception from the FPU itself.
Figure E-2 shows the details of how IGNNE# will behave when the circuit in Figure E-1 is implemented. The temporal regions within the FPU exception handler activity are described as follows:
1. The FERR# signal is activated by an FPU exception and sends an interrupt request through the PIC to the processor's INTR pin.
2. During the FPU interrupt service routine (exception handler) the processor will need to clear the interrupt request latch (Flip Flop #1). It may also want to execute non-control FPU instructions before the exception is cleared from the FPU. For this purpose IGNNE# must be driven low. Typically in the PC environment an I/O access to Port 0F0H clears the external FPU exception interrupt request (FP_IRQ). In the recommended circuit, this access also is used to activate IGNNE#. With IGNNE# active the FPU exception handler may execute any FPU instruction without being blocked by an active FPU exception.
3. Clearing the exception within the FPU will cause the FERR# signal to be deactivated and then there is no further need for IGNNE# to be active. In the recommended circuit, the deactivation of FERR# is used to deactivate IGNNE#. If another circuit is used, the software and circuit together must assure that IGNNE# is deactivated no later than the exit from the FPU exception handler.
[Figure E-1 diagram not reproduced. Labeled elements: FF #1, FF #2, FP_IRQ, and the Intel486, Pentium, or Pentium Pro processor. Legend: FF #n = Flip Flop #n; CLR = Clear or Reset.]
Figure E-1. Recommended Circuit for MS-DOS* Compatibility FPU Exception Handling
In the circuit in Figure E-1, when the FPU exception handler accesses I/O port 0F0H it clears the IRQ13 interrupt request output from Flip Flop #1 and also clocks out the IGNNE# signal (active) from Flip Flop #2. So the handler can activate IGNNE#, if needed, by doing this 0F0H access before clearing the FPU exception condition (which de-asserts FERR#). However, the circuit does not depend on the order of actions by the FPU exception handler to guarantee the correct hardware state upon exit from the handler. Flip Flop #2, which drives IGNNE# to the processor, has its CLEAR input attached to the inverted FERR#. This ensures that IGNNE# can never be active when FERR# is inactive. So if the handler clears the FPU exception condition before the 0F0H access, IGNNE# does not get activated and left on after exit from the handler.
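The order-independence argument for the recommended circuit can be checked with a toy model. The Python sketch below is an assumption-laden simplification: it treats each signal as a boolean where True means asserted (active), ignores all timing, and uses hypothetical class and method names. It only mirrors the behavior described in the text for Flip Flop #1 (FP_IRQ) and Flip Flop #2 (IGNNE#).

```python
class FerrIgnneCircuit:
    """Toy model of the two flip-flops; True means 'asserted' (active)."""

    def __init__(self):
        self.ferr = False    # FERR# output of the processor
        self.fp_irq = False  # Flip Flop #1 output: request to IRQ13
        self.ignne = False   # Flip Flop #2 output: IGNNE# input

    def raise_fpu_exception(self):
        # An unmasked exception asserts FERR#, which latches FP_IRQ.
        self.ferr = True
        self.fp_irq = True

    def access_port_f0(self):
        # The I/O access to port 0F0H clears Flip Flop #1 and clocks
        # Flip Flop #2. Flip Flop #2 is held in reset while FERR# is
        # inactive, so IGNNE# can only go active if FERR# is active.
        self.fp_irq = False
        self.ignne = self.ferr

    def clear_fpu_exception(self):
        # Clearing the exception in the FPU de-asserts FERR#; the
        # inverted FERR# on the CLEAR input then resets Flip Flop #2.
        self.ferr = False
        self.ignne = False


def run_handler(port_access_first):
    """Run both handler actions in the given order.

    Returns (IGNNE# between the two actions, IGNNE# on exit, FP_IRQ on exit).
    """
    circuit = FerrIgnneCircuit()
    circuit.raise_fpu_exception()
    if port_access_first:
        circuit.access_port_f0()
        ignne_between = circuit.ignne
        circuit.clear_fpu_exception()
    else:
        circuit.clear_fpu_exception()
        ignne_between = circuit.ignne
        circuit.access_port_f0()
    return ignne_between, circuit.ignne, circuit.fp_irq
```

In this model, both `run_handler(True)` and `run_handler(False)` leave IGNNE# and FP_IRQ inactive on exit; only the port-first order makes IGNNE# available to the handler in between.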
[Figure E-2 timing diagram not reproduced. Labeled element: 0F0H Address Decode.]
Figure E-2. Behavior of Signals During FPU Exception Handling
E.2.1.3. NO-WAIT FPU INSTRUCTIONS CAN GET FPU INTERRUPT IN WINDOW
The Pentium and Intel486 processors implement the no-wait floating-point instructions (FNINIT, FNCLEX, FNSTENV, FNSAVE, FNSTSW, FNSTCW, FNENI, FNDISI or FNSETPM) in the MS-DOS compatibility mode in the following manner. (Refer to Section 7.5.11., FPU Control Instructions and Section 7.5.12., Waiting Vs. Non-waiting Instructions in Chapter 7, Floating-Point Unit, for a discussion of the no-wait instructions.)
If an unmasked numeric exception is pending from a preceding FPU instruction, a member of the no-wait class of instructions will, at the beginning of its execution, assert the FERR# pin in response to that exception just like other FPU instructions, but then, unlike the other FPU instructions, FERR# will be de-asserted. This de-assertion was implemented to allow the no-wait class of instructions to proceed without an interrupt due to any pending numeric exception. However, the brief assertion of FERR# is sufficient to latch the FPU exception request into most hardware interface implementations (including Intel's recommended circuit).
All the FPU instructions are implemented such that during their execution, there is a window in which the processor will sample and accept external interrupts. If there is a pending interrupt, the processor services the interrupt first before resuming the execution of the instruction. Consequently, it is possible that the no-wait floating-point instruction may accept the external interrupt caused by its own assertion of the FERR# pin in the event of a pending unmasked numeric exception, which is not an explicitly documented behavior of a no-wait instruction. This process is illustrated in Figure E-3.
[Figure E-3 timing diagram not reproduced. Labeled elements: Exception Generating Floating-Point Instruction; Assertion of FERR# by the Processor; Start of the No-Wait Floating-Point Instruction; System Dependent Delay; Assertion of INTR Pin by the System (Case 1, Case 2); External Interrupt Sampling Window; Window Closed.]
Figure E-3. Timing of Receipt of External Interrupt
Figure E-3 assumes that a floating-point instruction that generates a deferred error (as defined in Section E.2.1.1., Basic Rules: When FERR# Is Generated), which asserts the FERR# pin only on encountering the next floating-point instruction, causes an unmasked numeric exception. Assume that the next floating-point instruction following this instruction is one of the no-wait floating-point instructions. The FERR# pin is asserted by the processor to indicate the pending exception on encountering the no-wait floating-point instruction. After the assertion of the FERR# pin the no-wait floating-point instruction opens a window where the pending external interrupts are sampled. Then there are two cases possible depending on the timing of the receipt of the interrupt via the INTR pin (asserted by the system in response to the FERR# pin) by the processor.
Case 1: If the system responds to the assertion of the FERR# pin by the no-wait floating-point instruction via the INTR pin during this window, then the interrupt is serviced first, before resuming the execution of the no-wait floating-point instruction.
Case 2: If the system responds via the INTR pin after the window has closed, then the interrupt is recognized only at the next instruction boundary.
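The two cases reduce to a comparison between the system-dependent INTR delay and the end of the sampling window. The Python sketch below is a deliberately minimal illustration; the function name and the time units (arbitrary ticks counted from the FERR# assertion) are assumptions, and real hardware timing is not specified here.

```python
def interrupt_case(intr_time, window_close):
    """Classify when INTR is recognized for a no-wait instruction.

    Both arguments are hypothetical tick counts measured from the moment
    the no-wait instruction asserts FERR#. Returns 1 or 2, matching the
    case numbers used in the text.
    """
    if intr_time < window_close:
        # INTR arrives while the sampling window is still open, so the
        # interrupt is serviced before the no-wait instruction resumes.
        return 1
    # INTR arrives after the window has closed, so it is recognized
    # only at the next instruction boundary.
    return 2
```

A short system delay, such as `interrupt_case(3, 10)`, gives Case 1; a long one, such as `interrupt_case(25, 10)`, gives Case 2.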
There are two other ways, in addition to Case 1 above, in which a no-wait floating-point instruction can service a numeric exception inside its interrupt window. First, the first floating-point error condition could be of the immediate category (as defined in Section E.2.1.1., Basic Rules: When FERR# Is Generated) that asserts FERR# immediately. If the system delay before asserting INTR is long enough, relative to the time elapsed before the no-wait floating-point instruction, INTR can be asserted inside the interrupt window for the latter. Second, consider two no-wait FPU instructions in close sequence, and assume that a previous FPU instruction has caused an unmasked numeric exception. Then if the INTR timing is too long for an FERR# signal triggered by the first no-wait instruction to hit the first instruction's interrupt window, it could catch the interrupt window of the second.
The possible malfunction of a no-wait FPU instruction explained above cannot happen if the instruction is being used in the manner for which Intel originally designed it. The no-wait instructions were intended to be used inside the FPU exception handler, to allow manipulation of the FPU before the error condition is cleared, without hanging the processor because of the FPU error condition, and without the need to assert IGNNE#. They will perform this function correctly, since before the error condition is cleared, the assertion of FERR# that caused the FPU error handler to be invoked is still active. Thus the logic that would assert FERR# briefly at a no-wait instruction causes no change since FERR# is already asserted. The no-wait instructions may also be used without problem in the handler after the error condition is cleared, since now they will not cause FERR# to be asserted at all.
If a no-wait instruction is used outside of the FPU exception handler, it may malfunction as explained above, depending on the details of the hardware interface implementation and which particular processor is involved. The actual interrupt inside the window in the no-wait instruction may be blocked by surrounding it with the instructions: PUSHFD, CLI, no-wait, then POPFD. (CLI blocks interrupts, and the push and pop of flags preserves and restores the original value of the interrupt flag.)
However, if FERR# was triggered by the no-wait, its latched value and the PIC response will still be in effect. Further code can be used to check for and correct such a condition, if needed. Section E.3.5., Considerations When FPU Shared Between Tasks discusses an important example of this type of problem and gives a solution.
E.2.2. MS-DOS* Compatibility Mode in the P6 Family Processors
When bit NE=0 in CR0, the MS-DOS compatibility mode of the P6 family processors provides FERR# and IGNNE# functionality that is almost identical to the Intel486 and Pentium processors. The same external hardware described in Section E.2.1.2., Recommended External Hardware to Support the MS-DOS* Compatibility Mode is recommended for the P6 family processors as well as the two previous generations. The only change to MS-DOS compatibility FPU exception handling with the P6 family processors is that all exceptions for all FPU instructions cause immediate error reporting. That is, FERR# is asserted as soon as the FPU detects an unmasked exception; there are no cases in which error reporting is deferred to the next FPU or WAIT instruction. (As is discussed in Section E.2.1.1., Basic Rules: When FERR# Is Generated, most exception cases in the Intel486 and Pentium processors are of the deferred type.)
Although FERR# is asserted immediately upon detection of an unmasked FPU error, this certainly does not mean that the requested interrupt will always be serviced before the next instruction in the code sequence is executed. To begin with, the P6 family processors execute several instructions simultaneously. There also will be a delay, which depends on the external hardware implementation, between the FERR# assertion from the processor and the responding INTR assertion to the processor. Further, the interrupt request to the PICs (IRQ13) may be temporarily blocked by the operating system, or delayed by higher priority interrupts, and processor response to INTR itself is blocked if the operating system has cleared the IF bit in EFLAGS. Note that Streaming SIMD Extensions numeric exceptions will not cause assertion of FERR# (independent of the value of CR0.NE). In addition, they ignore the assertion/de-assertion of IGNNE#.
However, just as with the Intel486 and Pentium processors, if the IGNNE# input is inactive, a floating-point exception which occurred in the previous FPU instruction and is unmasked causes the processor to freeze immediately when encountering the next WAIT or FPU instruction (except for no-wait instructions). This means that if the FPU exception handler has not already been invoked due to the earlier exception (and therefore, the handler has not cleared that exception state from the FPU), the processor is forced to wait for the handler to be invoked and handle the exception, before the processor can execute another WAIT or FPU instruction.
As explained in Section E.2.1.3., No-Wait FPU Instructions Can Get FPU Interrupt in Window, if a no-wait instruction is used outside of the FPU exception handler, in the Intel486 and Pentium processors, it may accept an unmasked exception from a previous FPU instruction which happens to fall within the external interrupt sampling window that is opened near the beginning of execution of all FPU instructions. This will not happen in the P6 family processors, because this sampling window has been removed from the no-wait group of FPU instructions.
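The freeze rule that the Intel486, Pentium, and P6 family processors share in MS-DOS compatibility mode can be written as a single predicate. The Python sketch below is a hedged illustration: the function name and the instruction-kind strings are hypothetical, and the position of NE as bit 5 of CR0 comes from the CR0 register layout rather than from this appendix.

```python
CR0_NE = 1 << 5  # NE (Numeric Error) flag, bit 5 of CR0


def freezes_before(cr0, ignne_active, unmasked_pending, next_instr):
    """Return True if the processor freezes before next_instr.

    next_instr is one of 'fpu', 'wait', 'no-wait', or 'other'
    (labels assumed for this sketch).
    """
    if cr0 & CR0_NE:
        # Native mode: the exception is reported internally through
        # interrupt 16, so the compatibility-mode freeze does not apply.
        return False
    if not unmasked_pending:
        return False
    if ignne_active:
        # With IGNNE# active the processor disregards the exception.
        return False
    # No-wait instructions and non-FPU instructions are not blocked.
    return next_instr in ("fpu", "wait")
```

With NE = 0, IGNNE# inactive, and an unmasked exception pending, `freezes_before(0, False, True, "fpu")` is true, while the same state followed by a no-wait instruction does not freeze.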
E.3. RECOMMENDED PROTOCOL FOR MS-DOS* COMPATIBILITY HANDLERS

The activities of numeric programs can be split into two major areas: program control and arithmetic. The program control part performs activities such as deciding what functions to perform, calculating addresses of numeric operands, and loop control. The arithmetic part simply adds, subtracts, multiplies, and performs other operations on the numeric operands. The processor is designed to handle these two parts separately and efficiently. An FPU exception handler, if a system chooses to implement one, is often one of the most complicated parts of the program control code.
E.3.1. Floating-Point Exceptions and Their Defaults
The FPU can recognize six classes of floating-point exception conditions while executing floating-point instructions:
1. #I Invalid operation
   #IS Stack fault
   #IA IEEE standard invalid operation
2. #Z Divide-by-zero
3. #D Denormalized operand
4. #O Numeric overflow
5. #U Numeric underflow
6. #P Inexact result (precision)
For complete details on these exceptions and their defaults, refer to Section 7.7., Floating-Point Exception Handling and Section 7.8., Floating-Point Exception Conditions in Chapter 7, Floating-Point Unit.
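As a hedged illustration of how a handler might map the six classes onto the FPU status word, the Python sketch below decodes the low exception flag bits. The bit positions (IE, DE, ZE, OE, UE, PE in bits 0 through 5, and the stack fault flag SF in bit 6) are taken from the x87 status word layout, not from this appendix, and the function name is hypothetical.

```python
# (mask, name) pairs for status word bits 0..5, in bit order.
EXCEPTION_FLAGS = [
    (0x01, "#I"),  # IE: invalid operation
    (0x02, "#D"),  # DE: denormalized operand
    (0x04, "#Z"),  # ZE: divide-by-zero
    (0x08, "#O"),  # OE: numeric overflow
    (0x10, "#U"),  # UE: numeric underflow
    (0x20, "#P"),  # PE: inexact result (precision)
]
STACK_FAULT = 0x40  # SF: distinguishes #IS from #IA when IE is set


def decode_exceptions(status_word):
    """Return the names of the exception flags set in an FPU status word."""
    names = []
    for mask, name in EXCEPTION_FLAGS:
        if status_word & mask:
            if name == "#I":
                # Split invalid operation into its two subclasses.
                name = "#IS" if status_word & STACK_FAULT else "#IA"
            names.append(name)
    return names
```

For instance, a status word with IE and SF set decodes to a stack fault, and one with ZE and PE set decodes to divide-by-zero plus precision.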
E.3.2. Two Options for Handling Numeric Exceptions
Depending on options determined by the software system designer, the processor takes one of two possible courses of action when a numeric exception occurs:
• The FPU can handle selected exceptions itself, producing a default fix-up that is reasonable in most situations. This allows the numeric program execution to continue undisturbed. Programs can mask individual exception types to indicate that the FPU should generate this safe, reasonable result whenever the exception occurs. The default exception fix-up activity is treated by the FPU as part of the instruction causing the exception; no external indication of the exception is given (except that the instruction takes longer to execute when it handles a masked exception). When masked exceptions are detected, a flag is set in the numeric status register, but no information is preserved regarding where or when it was set.
• Alternatively, a software exception handler can be invoked to handle the exception. When a numeric exception is unmasked and the exception occurs, the FPU stops further execution of the numeric instruction and causes a branch to a software exception handler. The exception handler can then implement any sort of recovery procedures desired for any numeric exception detectable by the FPU.
E.3.2.1. AUTOMATIC EXCEPTION HANDLING: USING MASKED EXCEPTIONS
Each of the six exception conditions described above has a corresponding flag bit in the FPU status word and a mask bit in the FPU control word. If an exception is masked (the corresponding mask bit in the control word = 1), the processor takes an appropriate default action and continues with the computation. The processor has a default fix-up activity for every possible exception condition it may encounter. These masked-exception responses are designed to be safe and are generally acceptable for most numeric applications.
For example, if the Inexact result (Precision) exception is masked, the system can specify whether the FPU should handle a result that cannot be represented exactly by one of four modes of rounding: rounding it normally, chopping it toward zero, always rounding it up, or always down. If the Underflow exception is masked, the FPU will store a number that is too small to be represented in normalized form as a denormal (or zero if it is smaller than the smallest denormal). Note that when exceptions are masked, the FPU may detect multiple exceptions in a single instruction, because it continues executing the instruction after performing its masked response. For example, the FPU could detect a denormalized operand, perform its masked response to this exception, and then detect an underflow.
As an example of how even severe exceptions can be handled safely and automatically using the default exception responses, consider a calculation of the parallel resistance of several values using only the standard formula (refer to Figure E-4). If R1 becomes zero, the circuit resistance becomes zero. With the divide-by-zero and precision exceptions masked, the processor will produce the correct result. FDIV of R1 into 1 gives infinity, and then FDIV of (infinity + 1/R2 + 1/R3) into 1 gives zero.
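[Figure E-4 circuit diagram not reproduced: resistors R1, R2, and R3 connected in parallel.]
$$\text{Equivalent Resistance} = \frac{1}{\frac{1}{R1} + \frac{1}{R2} + \frac{1}{R3}}$$
Figure E-4. Arithmetic Example Using Infinity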
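The parallel-resistance example can be reproduced numerically. Python raises an error on float division by zero instead of returning infinity, so the sketch below uses a hypothetical helper, `masked_fdiv`, that imitates the masked divide-by-zero response of the FPU (a nonzero value divided by zero yields a signed infinity). It is an illustration of the arithmetic only, not of FPU instruction behavior.

```python
import math


def masked_fdiv(dividend, divisor):
    """Divide, imitating the masked divide-by-zero response (x/0 -> infinity)."""
    if divisor == 0.0:
        if dividend == 0.0 or math.isnan(dividend):
            # 0/0 is an invalid operation, not a divide-by-zero.
            return math.nan
        # The sign of the result is the product of the operand signs.
        sign = math.copysign(1.0, dividend) * math.copysign(1.0, divisor)
        return math.copysign(math.inf, sign)
    return dividend / divisor


def parallel_resistance(r1, r2, r3):
    """Equivalent resistance of three resistors in parallel."""
    total = masked_fdiv(1.0, r1) + masked_fdiv(1.0, r2) + masked_fdiv(1.0, r3)
    return masked_fdiv(1.0, total)
```

With `r1 = 0.0`, the first reciprocal is infinity, the sum is infinity, and the final division gives zero, which matches the physically expected short circuit.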
By masking or unmasking specific numeric exceptions in the FPU control word, programmers can delegate responsibility for most exceptions to the processor, reserving the most severe exceptions for programmed exception handlers. Exception-handling software is often difficult to write, and the masked responses have been tailored to deliver the most reasonable result for each condition. For the majority of applications, masking all exceptions yields satisfactory results with the least programming effort. Certain exceptions can usefully be left unmasked during the debugging phase of software development, and then masked when the clean software is actually run. An invalid operation exception, for example, typically indicates a program error that must be corrected.
The exception flags in the FPU status word provide a cumulative record of exceptions that have occurred since these flags were last cleared. Once set, these flags can be cleared only by executing the FCLEX/FNCLEX (clear exceptions) instruction, by reinitializing the FPU with FINIT/FNINIT or FSAVE/FNSAVE, or by overwriting the flags with an FRSTOR or FLDENV instruction. This allows a programmer to mask all exceptions, run a calculation, and then inspect the status word to see if any exceptions were detected at any point in the calculation.
E.3.2.2. SOFTWARE EXCEPTION HANDLING
If the FPU in or with an IA processor (Intel 286 and onwards) encounters an unmasked exception condition, with the system operated in the MS-DOS compatibility mode and with IGNNE# not asserted, a software exception handler is invoked through a PIC and the processor's INTR pin. The FERR# (or ERROR#) output from the FPU that begins the process of invoking the exception handler may occur when the error condition is first detected, or when the processor encounters the next WAIT or FPU instruction. Which of these two cases occurs depends on the processor generation and also on which exception and which FPU instruction triggered it, as discussed earlier in Section E.1., Origin of the MS-DOS* Compatibility Mode for Handling FPU Exceptions and Section E.2., Implementation of the MS-DOS* Compatibility Mode in the Intel486, Pentium, and P6 Family Processors.
The elapsed time between the initial error signal and the invocation of the FPU exception handler depends, of course, on the external hardware interface, and also on whether the external interrupt for FPU errors is enabled. But the architecture ensures that the handler will be invoked before execution of the next WAIT or floating-point instruction, since an unmasked floating-point exception causes the processor to freeze just before executing such an instruction (unless the IGNNE# input is active, or it is a no-wait FPU instruction).
The frozen processor waits for an external interrupt, which must be supplied by external hardware in response to the FERR# (or ERROR#) output of the processor (or coprocessor), usually through IRQ13 on the slave PIC, and then through INTR. Then the external interrupt invokes the exception handling routine. Note that if the external interrupt for FPU errors is disabled when the processor executes an FPU instruction, the processor will freeze until some other (enabled) interrupt occurs if an unmasked FPU exception condition is in effect. If NE = 0 but the IGNNE# input is active, the processor disregards the exception and continues. Error reporting via an external interrupt is supported for MS-DOS compatibility. Chapter 18, Intel Architecture Compatibility of the Intel Architecture Software Developer's Manual, Volume 3, contains further discussion of compatibility issues.
The references above to the ERROR# output from the FPU apply to the Intel 387 and Intel 287 math coprocessors (NPX chips). If one of these coprocessors encounters an unmasked exception condition, it signals the exception to the Intel 286 or Intel386 processor using the ERROR# status line between the processor and the coprocessor.
Refer to Section E.1., Origin of the MS-DOS* Compatibility Mode for Handling FPU Exceptions, in this appendix, and Chapter 18, Intel Architecture Compatibility, in the Intel Architecture Software Developer's Manual, Volume 3 for differences in FPU exception handling.
The exception-handling routine is normally a part of the systems software. The routine must clear (or disable) the active exception flags in the FPU status word before executing any floating-point instructions that cannot complete execution when there is a pending floating-point exception. Otherwise, the floating-point instruction will trigger the FPU interrupt again, and the system will be caught in an endless loop of nested floating-point exceptions, and hang. In any event, the routine must clear (or disable) the active exception flags in the FPU status word after handling them, and before IRET(D). Typical exception responses may include:
• Incrementing an exception counter for later display or printing.
• Printing or displaying diagnostic information (e.g., the FPU environment and registers).
• Aborting further execution, or using the exception pointers to build an instruction that will run without exception and executing it.
Applications programmers should consult their operating system's reference manuals for the appropriate system response to numerical exceptions. For systems programmers, some details on writing software exception handlers are provided in Chapter 5, Interrupt and Exception Handling, in the Intel Architecture Software Developer's Manual, Volume 3, as well as in Section E.3.3.4., FPU Exception Handling Examples in this appendix.
As discussed in Section E.2.1.2., Recommended External Hardware to Support the MS-DOS* Compatibility Mode, some early FERR# to INTR hardware interface implementations are less robust than the recommended circuit. This is because they depended on the exception handler to clear the FPU exception interrupt request to the PIC (by accessing port 0F0H) before the handler causes FERR# to be de-asserted by clearing the exception from the FPU itself. To eliminate the chance of a problem with this early hardware, Intel recommends that FPU exception handlers always access port 0F0H before clearing the error condition from the FPU.
E.3.3. Synchronization Required for Use of FPU Exception Handlers
Concurrency or synchronization management requires a check for exceptions before letting the processor change a value just used by the FPU. It is important to remember that almost any numeric instruction can, under the wrong circumstances, produce a numeric exception.
E.3.3.1. EXCEPTION SYNCHRONIZATION: WHAT, WHY AND WHEN
Exception synchronization means that the exception handler inspects and deals with the exception in the context in which it occurred. If concurrent execution is allowed, the state of the processor when it recognizes the exception is often not in the context in which it occurred. The processor may have changed many of its internal registers and be executing a totally different program by the time the exception occurs. If the exception handler cannot recapture the original context, it cannot reliably determine the cause of the exception or recover successfully from it. To handle this situation, the FPU has special registers updated at the start of each numeric instruction to describe the state of the numeric program when the failed instruction was attempted. This provides tools to help the exception handler recapture the original context, but the application code must also be written with synchronization in mind. Overall, exception synchronization must ensure that the FPU and other relevant parts of the context are in a well-defined state when the handler is invoked after an unmasked numeric exception occurs.
When the FPU signals an unmasked exception condition, it is requesting help. The fact that the exception was unmasked indicates that further numeric program execution under the arithmetic and programming rules of the FPU will probably yield invalid results. Thus the exception must be handled, and with proper synchronization, or the program will not operate reliably.
For programmers in higher-level languages, all required synchronization is automatically provided by the appropriate compiler. However, for assembly language programmers, exception synchronization remains the responsibility of the programmer. It is not uncommon for a programmer to expect that their numeric program will not cause numeric exceptions after it has been tested and debugged, but in a different system or numeric environment, exceptions may occur regularly nonetheless. An obvious example would be use of the program with some numbers beyond the range for which it was designed and tested. Example E-1 and Example E-2 in Section E.3.3.2., Exception Synchronization Examples, show a more subtle way in which unexpected exceptions can occur.
As described in Section E.3.1., Floating-Point Exceptions and Their Defaults, depending on options determined by the software system designer, the processor can perform one of two possible courses of action when a numeric exception occurs:
• The FPU can provide a default fix-up for selected numeric exceptions. If the FPU performs its default action for all exceptions, then the need for exception synchronization is not manifest. However, code is often ported to contexts and operating systems for which it was not originally designed. Example E-1 and Example E-2, below, illustrate that it is safest to always consider exception synchronization when designing code that uses the FPU.
• Alternatively, a software exception handler can be invoked to handle the exception. When a numeric exception is unmasked and the exception occurs, the FPU stops further execution of the numeric instruction and causes a branch to a software exception handler. When an FPU exception handler will be invoked, synchronization must always be considered to assure reliable performance.
Example E-1 and Example E-2, below, illustrate the need to always consider exception synchronization when writing numeric code, even when the code is initially intended for execution with exceptions masked.
E.3.3.2. EXCEPTION SYNCHRONIZATION EXAMPLES
In the following examples, three instructions are shown to load an integer, calculate its square root, then increment the integer. The synchronous execution of the FPU will allow both of these programs to execute correctly, with INC COUNT being executed in parallel in the processor, as long as no exceptions occur on the FILD instruction. However, if the code is later moved to an environment where exceptions are unmasked, the code in Example E-1 will not work correctly:
Example E-1. Incorrect Error Synchronization
    FILD COUNT    ; FPU instruction
    INC COUNT     ; integer instruction alters operand
    FSQRT         ; subsequent FPU instruction -- error
                  ; from previous FPU instruction detected here
Example E-2. Proper Error Synchronization
    FILD COUNT    ; FPU instruction
    FSQRT         ; subsequent FPU instruction -- error from
                  ; previous FPU instruction detected here
    INC COUNT     ; integer instruction alters operand
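The difference between the two orderings can be illustrated with a small C model. This is only a sketch under stated assumptions, not a description of processor internals: it assumes that FILD raises an unmasked exception, that the handler runs only when the next FPU instruction begins, and that the handler then rereads the memory operand. The names ToyFpu, toy_fild_faulting, toy_next_fpu_instruction and the two handler_view functions are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model (assumption, not real hardware): an unmasked exception raised by
 * one FPU instruction is delivered only when the next FPU instruction begins,
 * and the handler then rereads the faulting memory operand. */
typedef struct {
    const int32_t *faulting_operand; /* NULL when no exception is pending */
    int32_t value_seen_by_handler;   /* what the recovery routine reloads */
} ToyFpu;

/* FILD that is assumed to raise an unmasked exception. */
static void toy_fild_faulting(ToyFpu *fpu, const int32_t *operand) {
    fpu->faulting_operand = operand;
}

/* Start of any later FPU instruction: the pending exception is delivered. */
static void toy_next_fpu_instruction(ToyFpu *fpu) {
    if (fpu->faulting_operand != NULL) {
        fpu->value_seen_by_handler = *fpu->faulting_operand;
        fpu->faulting_operand = NULL;
    }
}

/* Ordering of Example E-1: FILD, INC, FSQRT. */
int32_t handler_view_example_e1(int32_t count) {
    ToyFpu fpu = { NULL, 0 };
    toy_fild_faulting(&fpu, &count);  /* FILD COUNT */
    count++;                          /* INC COUNT runs before the handler */
    toy_next_fpu_instruction(&fpu);   /* FSQRT: handler invoked here */
    return fpu.value_seen_by_handler;
}

/* Ordering of Example E-2: FILD, FSQRT, INC. */
int32_t handler_view_example_e2(int32_t count) {
    ToyFpu fpu = { NULL, 0 };
    toy_fild_faulting(&fpu, &count);  /* FILD COUNT */
    toy_next_fpu_instruction(&fpu);   /* FSQRT: handler invoked here */
    count++;                          /* INC COUNT runs after the handler */
    (void)count;
    return fpu.value_seen_by_handler;
}
```

In this model, a starting COUNT of 5 is seen by the handler as 6 with the ordering of Example E-1 and as 5 with the ordering of Example E-2.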
In some operating systems supporting the FPU, the numeric register stack is extended to memory. To extend the FPU stack to memory, the invalid exception is unmasked. A push to a full register or pop from an empty register sets SF (Stack Fault flag) and causes an invalid operation exception. The recovery routine for the exception must recognize this situation, fix up the stack, then perform the original operation. The recovery routine will not work correctly in Example E-1. The problem is that the value of COUNT is incremented before the exception handler is invoked, so that the recovery routine will load an incorrect value of COUNT, causing the program to fail or behave unreliably.
E.3.3.3. PROPER EXCEPTION SYNCHRONIZATION IN GENERAL
As explained in Section E.2.1.2., Recommended External Hardware to Support the MS-DOS* Compatibility Mode, if the FPU encounters an unmasked exception condition, a software exception handler is invoked before execution of the next WAIT or floating-point instruction. This is because an unmasked floating-point exception causes the processor to freeze immediately before executing such an instruction (unless the IGNNE# input is active, or it is a no-wait FPU instruction). Exactly when the exception handler will be invoked (in the interval between when the exception is detected and the next WAIT or FPU instruction) is dependent on the processor generation, the system, and which FPU instruction and exception is involved.
To be safe in exception synchronization, one should assume the handler will be invoked at the end of the interval. Thus the program should not change any value that might be needed by the handler (such as COUNT in Example E-1 and Example E-2) until after the next FPU instruction following an FPU instruction that could cause an error. If the program needs to modify such a value before the next FPU instruction (or if the next FPU instruction could also cause an error), then a WAIT instruction should be inserted before the value is modified. This will force the handling of any exception before the value is modified. A WAIT instruction should also be placed after the last floating-point instruction in an application so that any unmasked exceptions will be serviced before the task completes.
E.3.3.4. FPU EXCEPTION HANDLING EXAMPLES
There are many approaches to writing exception handlers. One useful technique is to consider the exception handler procedure as consisting of prologue, body, and epilogue sections of code.
In the transfer of control to the exception handler due to an INTR, NMI, or SMI, external interrupts have been disabled by hardware. The prologue performs all functions that must be protected from possible interruption by higher-priority sources. Typically, this involves saving registers and transferring diagnostic information from the FPU to memory. When the critical processing has been completed, the prologue may re-enable interrupts to allow higher-priority interrupt handlers to preempt the exception handler. The standard prologue not only saves the registers and transfers diagnostic information from the FPU to memory but also clears the floating-point exception flags in the status word. Alternatively, when it is not necessary for the handler to be re-entrant, another technique may also be used. In this technique, the exception flags are not cleared in the prologue and the body of the handler must not contain any floating-point instructions that cannot complete execution when there is a pending floating-point exception. (The no-wait instructions are discussed in Section 7.5.12., Waiting Vs. Non-waiting Instructions in Chapter 7, Floating-Point Unit.) Note that the handler must still clear the exception flag(s) before executing the IRET. If the exception handler uses neither of these techniques, the system will be caught in an endless loop of nested floating-point exceptions and hang.
The body of the exception handler examines the diagnostic information and makes a response that is necessarily application-dependent. This response may range from halting execution, to displaying a message, to attempting to repair the problem and proceed with normal execution.
The epilogue essentially reverses the actions of the prologue, restoring the processor so that normal execution can be resumed. The epilogue must not load an unmasked exception flag into the FPU or another exception will be requested immediately.
The following code examples show the ASM386/486 coding of three skeleton exception handlers, with the save spaces given as correct for 32-bit protected mode. They show how prologues and epilogues can be written for various situations, but the application-dependent exception handling body is just indicated by comments showing where it should be placed.
The first two are very similar; their only substantial difference is their choice of instructions to save and restore the FPU. The trade-off here is between the increased diagnostic information provided by FNSAVE and the faster execution of FNSTENV. (Also, after saving the original contents, FNSAVE re-initializes the FPU, while FNSTENV only masks all FPU exceptions.) For applications that are sensitive to interrupt latency or that do not need to examine register contents, FNSTENV reduces the duration of the critical region, during which the processor does not recognize another interrupt request. (Refer to Section 7, Floating-Point Unit in Chapter 7, Floating-Point Unit, for a complete description of the FPU save image.) If the processor supports Streaming SIMD Extensions and the operating system supports it, the FXSAVE instruction should be used instead of FNSAVE. If the FXSAVE instruction is used, the save area should be increased to 512 bytes and aligned to 16 bytes to save the entire state. These steps will ensure that the complete context is saved.
After the exception handler body, the epilogues prepare the processor to resume execution from the point of interruption (i.e., the instruction following the one that generated the unmasked exception).
Notice that the exception flags in the memory image that is loaded into the FPU are cleared to zero prior to reloading (in fact, in these examples, the entire status word image is cleared). Example E-3 and Example E-4 assume that the exception handler itself will not cause an unmasked exception. Where this is a possibility, the general approach shown in Example E-5 can be employed. The basic technique is to save the full FPU state and then to load a new control word in the prologue. Note that considerable care should be taken when designing an exception handler of this type to prevent the handler from being reentered endlessly.
Example E-3. Full-State Exception Handler
SAVE_ALL PROC
;
; SAVE REGISTERS, ALLOCATE STACK SPACE FOR FPU STATE IMAGE
    PUSH EBP
    .
    .
    MOV EBP, ESP
    SUB ESP, 108                    ; ALLOCATES 108 BYTES (32-bit PROTECTED MODE SIZE)
; SAVE FULL FPU STATE, RESTORE INTERRUPT ENABLE FLAG (IF)
    FNSAVE [EBP-108]
    PUSH [EBP + OFFSET_TO_EFLAGS]   ; COPY OLD EFLAGS TO STACK TOP
    POPFD                           ; RESTORE IF TO VALUE BEFORE FPU EXCEPTION
;
; APPLICATION-DEPENDENT EXCEPTION HANDLING CODE GOES HERE
;
; CLEAR EXCEPTION FLAGS IN STATUS WORD (WHICH IS IN MEMORY)
; RESTORE MODIFIED STATE IMAGE
    MOV BYTE PTR [EBP-104], 0H
    FRSTOR [EBP-108]
; DE-ALLOCATE STACK SPACE, RESTORE REGISTERS
    MOV ESP, EBP
    .
    .
    POP EBP
;
; RETURN TO INTERRUPTED CALCULATION
    IRETD
SAVE_ALL ENDP
Example E-4. Reduced-Latency Exception Handler
SAVE_ENVIRONMENT PROC
;
; SAVE REGISTERS, ALLOCATE STACK SPACE FOR FPU ENVIRONMENT
    PUSH EBP
    .
    .
    MOV EBP, ESP
    SUB ESP, 28                     ; ALLOCATES 28 BYTES (32-bit PROTECTED MODE SIZE)
; SAVE ENVIRONMENT, RESTORE INTERRUPT ENABLE FLAG (IF)
    FNSTENV [EBP-28]
    PUSH [EBP + OFFSET_TO_EFLAGS]   ; COPY OLD EFLAGS TO STACK TOP
    POPFD                           ; RESTORE IF TO VALUE BEFORE FPU EXCEPTION
;
; APPLICATION-DEPENDENT EXCEPTION HANDLING CODE GOES HERE
;
; CLEAR EXCEPTION FLAGS IN STATUS WORD (WHICH IS IN MEMORY)
; RESTORE MODIFIED ENVIRONMENT IMAGE
    MOV BYTE PTR [EBP-24], 0H
    FLDENV [EBP-28]
; DE-ALLOCATE STACK SPACE, RESTORE REGISTERS
    MOV ESP, EBP
    .
    .
    POP EBP
;
; RETURN TO INTERRUPTED CALCULATION
    IRETD
SAVE_ENVIRONMENT ENDP
Example E-5. Reentrant Exception Handler
    .
    .
LOCAL_CONTROL DW ?                  ; ASSUME INITIALIZED
    .
    .
REENTRANT PROC
;
; SAVE REGISTERS, ALLOCATE STACK SPACE FOR FPU STATE IMAGE
    PUSH EBP
    .
    .
    MOV EBP, ESP
    SUB ESP, 108                    ; ALLOCATES 108 BYTES (32-bit PROTECTED MODE SIZE)
; SAVE STATE, LOAD NEW CONTROL WORD, RESTORE INTERRUPT ENABLE FLAG (IF)
    FNSAVE [EBP-108]
    FLDCW LOCAL_CONTROL
    PUSH [EBP + OFFSET_TO_EFLAGS]   ; COPY OLD EFLAGS TO STACK TOP
    POPFD                           ; RESTORE IF TO VALUE BEFORE FPU EXCEPTION
    .
    .
;
; APPLICATION-DEPENDENT EXCEPTION HANDLING CODE GOES HERE. AN UNMASKED EXCEPTION
; GENERATED HERE WILL CAUSE THE EXCEPTION HANDLER TO BE REENTERED.
; IF LOCAL STORAGE IS NEEDED, IT MUST BE ALLOCATED ON THE STACK.
;
    .
    .
; CLEAR EXCEPTION FLAGS IN STATUS WORD (WHICH IS IN MEMORY)
; RESTORE MODIFIED STATE IMAGE
    MOV BYTE PTR [EBP-104], 0H
    FRSTOR [EBP-108]
; DE-ALLOCATE STACK SPACE, RESTORE REGISTERS
    MOV ESP, EBP
    .
    .
    POP EBP
;
; RETURN TO POINT OF INTERRUPTION
    IRETD
REENTRANT ENDP
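The 28-byte and 108-byte allocations and the [EBP-104] and [EBP-24] status word addresses used in the examples above can be cross-checked against a C description of the 32-bit protected-mode save image. The following is a sketch only: the type and field names (FpuEnv32, FpuState32 and so on) are hypothetical, and the field order is assumed to follow the 32-bit protected-mode environment layout (control word at offset 0, status word at offset 4, tag word at offset 8). The helper clears the low byte of the saved status word, mirroring the MOV BYTE PTR instruction in the epilogues.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 32-bit protected-mode FPU environment image (FNSTENV/FLDENV). */
typedef struct {
    uint16_t control_word;         uint16_t reserved0;
    uint16_t status_word;          uint16_t reserved1;
    uint16_t tag_word;             uint16_t reserved2;
    uint32_t instruction_offset;
    uint16_t instruction_selector; uint16_t opcode;
    uint32_t operand_offset;
    uint16_t operand_selector;     uint16_t reserved3;
} FpuEnv32;

/* Full state image (FNSAVE/FRSTOR): environment plus eight 80-bit registers. */
typedef struct {
    FpuEnv32 env;
    uint8_t  st[8][10];
} FpuState32;

_Static_assert(sizeof(FpuEnv32) == 28, "environment image is 28 bytes");
_Static_assert(sizeof(FpuState32) == 108, "full state image is 108 bytes");
_Static_assert(offsetof(FpuEnv32, status_word) == 4,
               "status word sits 4 bytes above the image base");

/* Clears the exception flags, SF and ES (bits 0 to 7) in the saved image,
 * the same byte that MOV BYTE PTR [EBP-104], 0H clears in Example E-3. */
void clear_saved_exception_flags(FpuEnv32 *env) {
    assert(env != NULL);
    env->status_word &= 0xFF00u;
}
```

With the image based at EBP-108 (or EBP-28), an offset of 4 gives EBP-104 (or EBP-24), which matches the addresses used in the three examples.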
E.3.4. Need for Storing State of IGNNE# Circuit If Using FPU and SMM
The recommended circuit (refer to Figure E-1) for MS-DOS compatibility FPU exception handling for Intel486 processors and beyond contains two flip flops. When the FPU exception handler accesses I/O port 0F0H it clears the IRQ13 interrupt request output from Flip Flop #1 and also clocks out the IGNNE# signal (active) from Flip Flop #2. The assertion of IGNNE# may be used by the handler if needed to execute any FPU instruction while ignoring the pending FPU errors. The problem here is that the state of Flip Flop #2 is effectively an additional (but hidden) status bit that can affect processor behavior, and so ideally should be saved upon entering SMM, and restored before resuming to normal operation. If this is not done, and also the SMM code saves the FPU state, AND an FPU error handler is being used which relies on IGNNE# assertion, then (very rarely) the FPU handler will nest inside itself and malfunction. The following example shows how this can happen. Suppose that the FPU exception handler includes the following sequence:
    FNSTSW save_sw    ; save the FPU status word
                      ; using a no-wait FPU instruction
    OUT 0F0H, AL      ; clears IRQ13 & activates IGNNE#
    ....
    FLDCW new_cw      ; loads new CW ignoring FPU errors,
                      ; since IGNNE# is assumed active; or any
                      ; other FPU instruction that is not a no-wait
                      ; type will cause the same problem
    ....
    FCLEX             ; clear the FPU error conditions & thus turn off FERR# & reset the IGNNE# FF
The problem will only occur if the processor enters SMM between the OUT and the FLDCW instructions. But if that happens, AND the SMM code saves the FPU state using FNSAVE, then the IGNNE# Flip Flop will be cleared (because FNSAVE clears the FPU errors and thus de-asserts FERR#). When the processor returns from SMM it will restore the FPU state with FRSTOR, which will re-assert FERR#, but the IGNNE# Flip Flop will not get set. Then when the FPU error handler executes the FLDCW instruction, the active error condition will cause the processor to re-enter the FPU error handler from the beginning. This may cause the handler to malfunction.
To avoid this problem, Intel recommends two measures:
1. Do not use the FPU for calculations inside SMM code. (The normal power management, and sometimes security, functions provided by SMM have no need for FPU calculations; if they are needed for some special case, use scaling or emulation instead.) This eliminates the need to do FNSAVE/FRSTOR inside SMM code, except when going into a 0 V suspend state (in which, in order to save power, the CPU is turned off completely, requiring its complete state to be saved).
2. The system should not call upon SMM code to put the processor into 0 V suspend while the processor is running FPU calculations, or just after an interrupt has occurred. Normal power management protocol avoids this by going into power down states only after timed intervals in which no system activity occurs.
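The failure sequence described above can be traced with a small C model of the two signals involved. This is a hedged sketch, not a description of any real chipset: the Board structure, the function names and the SmiKind values are hypothetical, and the third case (SMM code that saves and restores the flip-flop state) simply models the "ideally should be saved ... and restored" remark, without implying that any particular hardware provides a way to do so.

```c
#include <stdbool.h>

/* Toy model of the signals discussed in this section (assumption). */
typedef struct {
    bool fpu_error; /* unmasked error pending in the FPU: FERR# asserted */
    bool ignne_ff;  /* Flip Flop #2 output: IGNNE# asserted */
} Board;

static void set_fpu_error(Board *b, bool error) {
    b->fpu_error = error;
    if (!error) {
        b->ignne_ff = false; /* de-asserting FERR# resets Flip Flop #2 */
    }
}

/* OUT 0F0H, AL: clocks IGNNE# active while FERR# is asserted. */
static void out_port_f0(Board *b) {
    if (b->fpu_error) {
        b->ignne_ff = true;
    }
}

/* A waiting FPU instruction traps if an error is pending and not ignored. */
static bool waiting_fpu_instruction_traps(const Board *b) {
    return b->fpu_error && !b->ignne_ff;
}

typedef enum { NO_SMI, SMI_FNSAVE_ONLY, SMI_SAVES_IGNNE_STATE } SmiKind;

/* Returns true if the FLDCW in the handler re-enters the handler. */
bool handler_nests(SmiKind smi) {
    Board b = { false, false };
    set_fpu_error(&b, true);          /* the exception that invoked the handler */
    out_port_f0(&b);                  /* OUT 0F0H, AL */
    if (smi != NO_SMI) {              /* SMI between OUT and FLDCW */
        bool saved_error = b.fpu_error;
        bool saved_ignne = b.ignne_ff;
        set_fpu_error(&b, false);     /* FNSAVE clears the FPU errors */
        set_fpu_error(&b, saved_error); /* FRSTOR re-asserts FERR# */
        if (smi == SMI_SAVES_IGNNE_STATE) {
            b.ignne_ff = saved_ignne; /* hypothetical restore of Flip Flop #2 */
        }
    }
    return waiting_fpu_instruction_traps(&b); /* FLDCW new_cw */
}
```

In this model the handler nests only in the SMI_FNSAVE_ONLY case, which is the situation that the two recommended measures are meant to avoid.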
E.3.5. Considerations When FPU Shared Between Tasks
The IA allows speculative deferral of floating-point state swaps on task switches. This feature allows postponing an FPU state swap until an FPU instruction is actually encountered in another task. Since kernel tasks rarely use floating-point, and some applications do not use floating-point or use it infrequently, the amount of time saved by avoiding unnecessary stores of the floating-point state is significant. Speculative deferral of FPU saves does, however, place an extra burden on the kernel in three key ways:
1. The kernel must keep track of which thread owns the FPU, which may be different from the currently executing thread.
2. The kernel must associate any floating-point exceptions with the generating task. This requires special handling since floating-point exceptions are delivered asynchronously with other system activity.
3. There are conditions under which spurious floating-point exception interrupts are generated, which the kernel must recognize and discard.
E.3.5.1. SPECULATIVELY DEFERRING FPU SAVES, GENERAL OVERVIEW
In order to support multitasking, each thread in the system needs a save area for the general-purpose registers, and each task that is allowed to use floating-point needs an FPU save area large enough to hold the entire FPU stack and associated FPU state such as the control word and status word. (Refer to Section 7.3.9., Saving the FPU's State in Chapter 7, Floating-Point Unit, for a complete description of the FPU save image.) If the processor and the operating system support Streaming SIMD Extensions, the save area should be large enough and aligned correctly to hold FPU and Streaming SIMD Extensions state.
On a task switch, the general-purpose registers are swapped out to their save area for the suspending thread, and the registers of the resuming thread are loaded. The FPU state does not need to be saved at this point. If the resuming thread does not use the FPU before it is itself suspended, then both a save and a load of the FPU state have been avoided. It is often the case that several threads may be executed without any usage of the FPU.
The processor supports speculative deferral of FPU saves via interrupt 7, Device Not Available (DNA), used in conjunction with CR0 bit 3, the Task Switched bit (TS). (Refer to Section 2.5., Control Registers, in Chapter 2, System Architecture Overview of the Intel Architecture Software Developer's Manual, Volume 3.) Every task switch via the hardware supported task switching mechanism (refer to Section 6.3., Task Switching in Chapter 6, Task Management of the Intel Architecture Software Developer's Manual, Volume 3) sets TS. Multi-threaded kernels that use software task switching [I] can set the TS bit by reading CR0, ORing a 1 into [II] bit 3, and writing back CR0. Any subsequent floating-point instructions (now being executed in a new thread context) will fault via interrupt 7 before execution. This allows a DNA handler to save the old floating-point context and reload the FPU state for the current thread. The handler should clear the TS bit before exit using the CLTS instruction. On return from the handler the faulting thread will proceed with its floating-point computation.
Some operating systems save the FPU context on every task switch, typically because they also change the linear address space between tasks. The problem and solution discussed in the following sections apply to these operating systems also.
NOTE I: In a software task switch, the operating system uses a sequence of instructions to save the suspending thread's state and restore the resuming thread's state, instead of the single long non-interruptible task switch operation provided by the IA.
E.3.5.2. TRACKING FPU OWNERSHIP
Since the contents of the FPU may not belong to the currently executing thread, the thread identifier for the last FPU user needs to be tracked separately. This is not complicated; the kernel should simply provide a variable to store the thread identifier of the FPU owner, separate from the variable that stores the identifier for the currently executing thread. This variable is updated in the DNA exception handler, and is used by the DNA exception handler to find the FPU save areas of the old and new threads. A simplified flow for a DNA exception handler is then:
1. Use the FPU Owner variable to find the FPU save area of the last thread to use the FPU.
2. Save the FPU contents to the old thread's save area, typically using an FNSAVE or FXSAVE instruction.
3. Set the FPU Owner variable to identify the currently executing thread.
4. Reload the FPU contents from the new thread's save area, typically using an FRSTOR or FXRSTOR instruction.
5. Clear TS using the CLTS instruction and exit the DNA exception handler.
While this flow covers the basic requirements for speculatively deferred FPU state swaps, there are some additional subtleties that need to be handled in a robust implementation.
E.3.5.3. INTERACTION OF FPU STATE SAVES AND FLOATING-POINT EXCEPTION ASSOCIATION
Recall these key points from earlier in this document: When considering floating-point exceptions across all implementations of the IA, and across all floating-point instructions, a floating-point exception can be initiated from any time during the excepting floating-point instruction, up to just before the next floating-point instruction. The next floating-point instruction may be the FNSAVE used to save the FPU state for a task switch. In the case of no-wait instructions such as FNSAVE, the interrupt from a previously excepting instruction (NE=0 case) may arrive just before the no-wait instruction, during, or shortly thereafter with a system-dependent delay.
Note that this implies that a floating-point exception might be registered during the state swap process itself, and the kernel and floating-point exception interrupt handler must be prepared for this case.
NOTE II: Although CR0, bit 2, the emulation flag (EM), also causes a DNA exception, do not use the EM bit as a surrogate for TS. EM means that no floating-point unit is available and that floating-point instructions must be emulated. Using EM to trap on task switches is not compatible with IA MMX technology. If the EM flag is set, MMX instructions raise the invalid opcode exception.
A simple way to handle the case of exceptions arriving during FPU state swaps is to allow the kernel to be one of the FPU-owning threads. A reserved thread identifier is used to indicate kernel ownership of the FPU. During a floating-point state swap, the FPU owner variable should be set to indicate the kernel as the current owner. At the completion of the state swap, the variable should be set to indicate the new owning thread. The numeric exception handler needs to check the FPU owner and discard any numeric exceptions that occur while the kernel is the FPU owner. A more general flow for a DNA exception handler that handles this case is shown in Figure E-5.
Numeric exceptions received while the kernel owns the FPU for a state swap must be discarded in the kernel without being dispatched to a handler. A flow for a numeric exception dispatch routine is shown in Figure E-6.
It may at first glance seem that there is a possibility of floating-point exceptions being lost because of exceptions that are discarded during state swaps. This is not the case, as the exception will be re-issued when the floating-point state is reloaded. Walking through state swaps both with and without pending numeric exceptions will clarify the operation of these two handlers.
Case #1: FPU State Swap Without Numeric Exception
Assume two threads A and B, both using the floating-point unit. Let A be the thread to have most recently executed a floating-point instruction, with no pending numeric exceptions. Let B be the currently executing thread. CR0.TS was set when thread A was suspended. When B starts to execute a floating-point instruction the instruction will fault with the DNA exception because TS is set.
At this point the handler is entered, and eventually it finds that the current FPU Owner is not the currently executing thread. To guard the FPU state swap from extraneous numeric exceptions, the FPU Owner is set to be the kernel. The old owner's FPU state is saved with FNSAVE, and the current thread's FPU state is restored with FRSTOR. Before exiting, the FPU owner is set to thread B, and the TS bit is cleared. On exit, thread B resumes execution of the faulting floating-point instruction and continues.
Case #2: FPU State Swap with Discarded Numeric Exception
Again, assume two threads A and B, both using the floating-point unit. Let A be the thread to have most recently executed a floating-point instruction, but this time let there be a pending numeric exception. Let B be the currently executing thread. When B starts to execute a floating-point instruction the instruction will fault with the DNA exception and enter the DNA handler. (If both numeric and DNA exceptions are pending, the DNA exception takes precedence, in order to support handling the numeric exception in its own context.)
DNA Handler Entry
<other handler setup code>
Current Thread same as FPU Owner?
    No:  FPU Owner = Kernel
         Use FNSAVE or FXSAVE to Old Thread's FP Save Area (may cause numeric exception)
         Use FRSTOR or FXRSTOR from Current Thread's FP Save Area
         FPU Owner = Current Thread
    Yes: (state swap skipped)
<other handler code>
<handler final cleanup>
CLTS (clears CR0.TS)
Exit DNA Handler
Figure E-5. General Program Flow for DNA Exception Handler
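The flow in Figure E-5, together with the software setting of CR0.TS described in Section E.3.5.1., can be sketched as a small C model. It is only an illustration under stated assumptions: CR0 is an ordinary variable rather than the control register, a single 16-bit value stands in for the whole FPU state, and all names (ToyKernel, software_task_switch, dna_handler, KERNEL_OWNER, MAX_THREADS) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_TS       (1u << 3)  /* CR0 bit 3, the Task Switched bit */
#define KERNEL_OWNER (-1)       /* hypothetical reserved thread identifier */
#define MAX_THREADS  4

typedef struct {
    uint32_t cr0;                    /* stand-in for the CR0 register */
    int      current_thread;         /* currently executing thread */
    int      fpu_owner;              /* last thread to use the FPU */
    uint16_t live_fpu_state;         /* stand-in for the live FPU contents */
    uint16_t save_area[MAX_THREADS]; /* per-thread FP save areas */
    int      state_swaps;            /* number of save/restore pairs done */
} ToyKernel;

/* Software task switch: read CR0, OR a 1 into bit 3, write CR0 back. */
void software_task_switch(ToyKernel *k, int next_thread) {
    assert(next_thread >= 0 && next_thread < MAX_THREADS);
    k->current_thread = next_thread;
    k->cr0 |= CR0_TS;
}

/* Interrupt 7 (DNA) handler following the order of Figure E-5. */
void dna_handler(ToyKernel *k) {
    if (k->fpu_owner != k->current_thread) {
        int old_owner = k->fpu_owner;
        assert(old_owner >= 0 && old_owner < MAX_THREADS);
        k->fpu_owner = KERNEL_OWNER;                          /* guard the swap */
        k->save_area[old_owner] = k->live_fpu_state;          /* FNSAVE/FXSAVE */
        k->live_fpu_state = k->save_area[k->current_thread];  /* FRSTOR/FXRSTOR */
        k->fpu_owner = k->current_thread;
        k->state_swaps++;
    }
    k->cr0 &= ~CR0_TS;                                        /* CLTS */
}
```

If a thread is switched out and back in without another thread touching the FPU, the handler in this model only clears TS and performs no state swap, which is the saving that speculative deferral is meant to provide.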
Numeric Exception Entry
Is Kernel FPU Owner?
    No:  Normal Dispatch to Numeric Exception Handler
    Yes: Exit
Figure E-6. Program Flow for a Numeric Exception Dispatch Routine
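The dispatch routine of Figure E-6 is short enough to sketch in C, along with a walk-through of the discard-then-redeliver sequence of Case #2. The sketch makes simplifying assumptions: thread identifiers are non-negative integers, and the structure and function names (NumericDispatch, numeric_exception_entry, walk_case2, KERNEL_OWNER, NO_THREAD) are hypothetical.

```c
#include <stdbool.h>

#define KERNEL_OWNER (-1)  /* hypothetical reserved thread identifier */
#define NO_THREAD    (-2)  /* marker: nothing dispatched yet */

typedef struct {
    int fpu_owner;      /* thread id, or KERNEL_OWNER during a state swap */
    int dispatched_to;  /* thread whose handler last received an exception */
    int discarded;      /* exceptions dropped while the kernel owned the FPU */
} NumericDispatch;

/* Figure E-6: discard if the kernel owns the FPU, otherwise dispatch. */
bool numeric_exception_entry(NumericDispatch *d) {
    if (d->fpu_owner == KERNEL_OWNER) {
        d->discarded++;
        return false;
    }
    d->dispatched_to = d->fpu_owner;
    return true;
}

/* Case #2: thread A has a pending numeric exception when B faults with DNA. */
NumericDispatch walk_case2(int thread_a, int thread_b) {
    NumericDispatch d = { thread_a, NO_THREAD, 0 };

    d.fpu_owner = KERNEL_OWNER;    /* DNA handler starts the swap for B */
    numeric_exception_entry(&d);   /* interrupt triggered around FNSAVE: discarded */
    d.fpu_owner = thread_b;        /* swap complete, B owns the FPU */

    d.fpu_owner = KERNEL_OWNER;    /* later: A resumes and faults with DNA */
    d.fpu_owner = thread_a;        /* A's state, with pending flags, reloaded */
    numeric_exception_entry(&d);   /* A's retried instruction raises it again */
    return d;
}
```

In this model one exception is discarded during the swap and the same exception is later delivered to thread A, so nothing is lost.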
When the FNSAVE starts, it will trigger an interrupt via FERR# because of the pending numeric exception. After some system-dependent delay, the numeric exception handler is entered. It may be entered before the FNSAVE starts to execute, or it may be entered shortly after execution of the FNSAVE. Since the FPU Owner is the kernel, the numeric exception handler simply exits, discarding the exception. The DNA handler resumes execution, completing the FNSAVE of the old floating-point context of thread A and the FRSTOR of the floating-point context for thread B.
Thread A eventually gets an opportunity to handle the exception that was discarded during the task switch. After some time, thread B is suspended, and thread A resumes execution. When thread A starts to execute a floating-point instruction, once again the DNA exception handler is entered. B's FPU state is stored, and A's FPU state is restored. Note that in restoring the FPU state from A's save area, the pending numeric exception flags are reloaded into the floating-point status word. Now when the DNA exception handler returns, thread A resumes execution of the faulting floating-point instruction just long enough to immediately generate a numeric exception, which now gets handled in the normal way. The net result is that the task switch and resulting FPU state swap via the DNA exception handler cause an extra numeric exception which can be safely discarded.
E.3.5.4. INTERRUPT ROUTING FROM THE KERNEL
In MS-DOS, an application that wishes to handle numeric exceptions hooks interrupt 16 by placing its handler address in the interrupt vector table, and exiting via a jump to the previous interrupt 16 handler. Protected mode systems that run MS-DOS programs under a subsystem can emulate this exception delivery mechanism. For example, assume a protected mode O.S. that runs with CR0.NE = 1, and that runs MS-DOS programs in a virtual machine subsystem. The MS-DOS program is set up in a virtual machine that provides a virtualized interrupt table. The MS-DOS application hooks interrupt 16 in the virtual machine in the normal way. A numeric exception will trap to the kernel via the real INT 16 residing in the kernel at ring 0. The INT 16 handler in the kernel then locates the correct MS-DOS virtual machine, and reflects the interrupt to the virtual machine monitor. The virtual machine monitor then emulates an interrupt by jumping through the address in the virtualized interrupt table, eventually reaching the application's numeric exception handler.
E.3.5.5. SPECIAL CONSIDERATIONS FOR OPERATING SYSTEMS THAT SUPPORT STREAMING SIMD EXTENSIONS
Operating systems that support Streaming SIMD Extensions instructions introduced with the Pentium III processor should use the FXSAVE and FXRSTOR instructions to save and restore the new SIMD floating-point instruction register state as well as the floating-point state. Such operating systems must consider the following issues:
1. Enlarged state save area: the FNSAVE/FRSTOR instructions operate on a 94-byte or 108-byte memory region, depending on whether they are executed in 16-bit or 32-bit mode. The FXSAVE/FXRSTOR instructions operate on a 512-byte memory region.
2. Alignment requirements: the FXSAVE/FXRSTOR instructions require the memory region on which they operate to be 16-byte aligned (refer to the individual instruction descriptions in Chapter 3, Instruction Set Reference, in the Intel Architecture Software Developer's Manual, Volume 2, for information about exceptions generated if the memory region is not aligned).
3. Maintaining compatibility with legacy applications/libraries: The operating system changes to support Streaming SIMD Extensions must be invisible to legacy applications or libraries that deal only with floating-point instructions. The layout of the memory region operated on by the FXSAVE/FXRSTOR instructions is different from the layout for the FNSAVE/FRSTOR instructions. Specifically, the format of the FPU tag word and the length of the various fields in the memory region are different. Care must be taken to return the FPU state to a legacy application (e.g., when reporting FP exceptions) in the format it expects.
4. Instruction semantic differences: There are some semantic differences between the way the FXSAVE and FSAVE/FNSAVE instructions operate. The FSAVE/FNSAVE instructions clear the FPU after they save the state while the FXSAVE instruction saves the FPU/Streaming SIMD Extensions state but does not clear it. Operating systems that use FXSAVE to save the FPU state before making it available for another thread (e.g., during thread switch time) should take precautions not to pass a dirty FPU to another application.
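The size and alignment requirements in items 1 and 2 can be expressed directly in C11. The following is a minimal sketch, assuming a C11 compiler; the names FxSaveArea, is_valid_fxsave_address and align_up_for_fxsave are hypothetical, and the code only checks and computes addresses without executing FXSAVE itself.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

enum {
    FNSAVE_BYTES_16BIT = 94,   /* FNSAVE/FRSTOR region, 16-bit mode */
    FNSAVE_BYTES_32BIT = 108,  /* FNSAVE/FRSTOR region, 32-bit mode */
    FXSAVE_BYTES       = 512,  /* FXSAVE/FXRSTOR region */
    FXSAVE_ALIGNMENT   = 16    /* required alignment of that region */
};

/* A save area the compiler places on a 16-byte boundary. */
typedef struct {
    alignas(FXSAVE_ALIGNMENT) uint8_t bytes[FXSAVE_BYTES];
} FxSaveArea;

/* True if the address satisfies the FXSAVE alignment requirement. */
bool is_valid_fxsave_address(const void *p) {
    return ((uintptr_t)p % FXSAVE_ALIGNMENT) == 0;
}

/* Rounds a raw buffer address up to the next 16-byte boundary. The caller is
 * assumed to have allocated FXSAVE_BYTES + FXSAVE_ALIGNMENT - 1 bytes. */
uint8_t *align_up_for_fxsave(uint8_t *raw) {
    uintptr_t addr = (uintptr_t)raw;
    addr = (addr + (FXSAVE_ALIGNMENT - 1)) & ~(uintptr_t)(FXSAVE_ALIGNMENT - 1);
    return (uint8_t *)addr;
}
```

A kernel that sizes its per-thread save areas for FNSAVE (108 bytes) would need both the larger size and the stricter alignment before switching to FXSAVE.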
E.4. DIFFERENCES FOR HANDLERS USING NATIVE MODE

The 8087 has a pin INT which it asserts when an unmasked exception occurs. But there is no interrupt input pin in the 8086 or 8088 dedicated to its attachment, nor an interrupt vector number in the 8086 or 8088 specific for an FPU error assertion. Beginning with the Intel 286 and Intel 287, however, hardware connections were dedicated to support the FPU exception, and interrupt vector 16 was assigned to it.
E.4.1. Origin with the Intel 286 and Intel 287, and Intel386 and Intel 387 Processors
The Intel 286 and Intel 287, and Intel386 and Intel 387 processor/coprocessor pairs are each provided with ERROR# pins that are recommended to be connected between the processor and FPU. If this is done, when an unmasked FPU exception occurs, the FPU records the exception, and asserts its ERROR# pin. The processor recognizes this active condition of the ERROR# status line immediately before execution of the next WAIT or FPU instruction (except for the no-wait type) in its instruction stream, and branches to the routine at interrupt vector 16. Thus an FPU exception will be handled before any other FPU instruction (after the one causing the error) is executed (except for no-wait instructions, which will be executed without triggering the FPU exception interrupt, but it will remain pending). Using the dedicated interrupt 16 for FPU exception handling is referred to as the native mode. It is the simplest approach, and the one recommended most highly by Intel.
E.4.2.
Changes with Intel486, Pentium, and P6 Family Processors with CR0.NE=1
With these latest three generations of the IA, more enhancements and speedup features have been added to the corresponding FPUs. Also, the FPU is now built into the same chip as the processor, which allows further increases in the speed at which the FPU can operate as part of the integrated system. This also means that the native mode of FPU exception handling, selected by setting bit NE of register CR0 to 1, is now entirely internal. If an unmasked exception occurs during an FPU instruction, the FPU records the exception internally, and triggers the exception handler through interrupt 16 immediately before execution of the next WAIT or FPU instruction (except for no-wait instructions, which will be executed as described in Section E.4.1., Origin with the Intel 286 and Intel 287, and Intel386 and Intel 387 Processors). An unmasked numerical exception causes the FERR# output to be activated even with NE=1, and at exactly the same point in the program flow as it would have been asserted if NE were zero. However, the system would not connect FERR# to a PIC to generate INTR when operating in the native, internal mode. (If the hardware of a system has FERR# connected to trigger IRQ13 in order to support MS-DOS, but an O/S using the native mode is actually running the system, it is the O/S's responsibility to make sure that IRQ13 is not enabled in the slave PIC.) With this configuration a system is immune to the problem discussed in Section E.2.1.3., No-Wait FPU Instructions Can Get FPU Interrupt in Window, where for Intel486 and Pentium processors a no-wait FPU instruction can get an FPU exception.
E.4.3.
Considerations When FPU Shared Between Tasks Using Native Mode
The protocols recommended in Section E.3.5., Considerations When FPU Shared Between Tasks, for MS-DOS compatibility FPU exception handlers that are shared between tasks may be used without change with the native mode. However, the protocols for a handler written specifically for native mode can be simplified, because the problem of a spurious floating-point exception interrupt occurring while the kernel is executing cannot happen in native mode. The problem as actually found in practical code in an MS-DOS compatibility system happens when the DNA handler uses FNSAVE to switch FPU contexts. If an FPU exception is active, then FNSAVE triggers FERR# briefly, which usually will cause the FPU exception handler to be invoked inside the DNA handler. In native mode, neither FNSAVE nor any other no-wait instruction can trigger interrupt 16. (As discussed above, FERR# gets asserted independent of the value of the NE bit, but when NE=1, the O/S should not enable its path through the PIC.) Another possible (very rare) way a floating-point exception interrupt could occur while the kernel is executing is by an FPU immediate exception case having its interrupt delayed by the external hardware until execution has switched to the kernel. This also cannot happen in native mode because there is no delay through external hardware. Thus the native mode FPU exception handler can omit the test to see if the kernel is the FPU owner, and the DNA handler for a native mode system can omit the step of setting the kernel as the FPU owner at the handler's beginning. However, since these simplifications are minor and save little code, it would be a reasonable and conservative habit (as long as the MS-DOS compatibility mode is widely used) to include these steps in all systems.
Note that the special DP (Dual Processing) mode for Pentium Processors, and also the more general Intel MultiProcessor Specification for systems with multiple Pentium or P6 family processors, support FPU exception handling only in the native mode. Intel does not recommend using the MS-DOS compatibility mode for systems using more than one processor.
APPENDIX F GUIDELINES FOR WRITING SIMD FLOATING-POINT EXCEPTION HANDLERS

Most of the information on Streaming SIMD Extensions instructions can be found in Chapter 9, Programming with the Streaming SIMD Extensions. Exceptions in Streaming SIMD Extensions are specifically presented in Section 9.5.5., Exception Handling in Streaming SIMD Extensions. This appendix considers only the Streaming SIMD Extensions instructions that can generate numeric (floating-point) exceptions, and gives an overview of the necessary support for handling such exceptions. This appendix does not address RSQRTSS, RSQRTPS, RCPSS, RCPPS, or any unlisted instruction. For detailed information on which instructions generate numeric exceptions, and a listing of those exceptions, refer to Appendix D, SIMD Floating-Point Exceptions Summary. Non-numeric exceptions are handled in a way similar to that for the standard IA-32 instructions.
F.1.
TWO OPTIONS FOR HANDLING NUMERIC EXCEPTIONS
Just as for FPU floating-point exceptions, the processor takes one of two possible courses of action when a Streaming SIMD Extensions instruction raises a floating-point exception:
- If the exception being raised is masked (by setting the corresponding mask bit in the MXCSR to 1), then a default result is produced, which is acceptable in most situations. No external indication of the exception is given, but the corresponding exception flags in the MXCSR are set, and may be examined later. Note though that for packed operations, an exception flag that is set in the MXCSR will not tell which of the four sets of sub-operands caused the event to occur.
- If the exception being raised is not masked (by setting the corresponding mask bit in the MXCSR to 0), a software exception handler previously registered by the user will be invoked through the SIMD floating-point exception vector 19. This case is discussed below in Section F.2., Software Exception Handling.
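The choice between the two courses of action can be illustrated with a minimal C sketch. It assumes the MXCSR layout with the six exception flags in bits 0-5 and the six mask bits in bits 7-12 (order I, D, Z, O, U, P from the least significant bit). The function names and the EXC_* constants are hypothetical and are used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names: six exception bits, I, D, Z, O, U, P from bit 0 up. */
#define EXC_ALL         0x3Fu
#define EXC_DIVIDE_ZERO 0x04u
#define EXC_INEXACT     0x20u

/* Nonzero if at least one raised exception is unmasked, that is, if the
   handler registered for vector 19 would be invoked. */
int simd_exception_traps(uint32_t mxcsr, uint32_t raised)
{
    uint32_t masks = (mxcsr >> 7) & EXC_ALL;  /* mask bits 7-12 moved to 0-5 */
    assert((raised & ~EXC_ALL) == 0);
    return (raised & ~masks) != 0;
}

/* Masked case only: returns the MXCSR value with the flags recorded. */
uint32_t simd_record_masked(uint32_t mxcsr, uint32_t raised)
{
    assert(!simd_exception_traps(mxcsr, raised));
    return mxcsr | raised;                    /* flag bits 0-5 are sticky */
}
```

With all exceptions masked (MXCSR = 0x1F80), a divide-by-zero only sets its flag. Clearing mask bit 9 (MXCSR = 0x1D80) makes the same event trap, while an inexact result remains masked.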
F.2.
SOFTWARE EXCEPTION HANDLING
The exception handling routine reached via interrupt vector 19 is usually part of the system software (the operating system kernel). Note that an interrupt descriptor table (IDT) entry must have been previously set up for this vector (refer to Chapter 5, Interrupt and Exception Handling, in the Intel Architecture Software Developer's Manual, Volume 3). Some compilers use specific run-time libraries to assist in floating-point exception handling. If any FPU floating-point operations are going to be performed that might raise floating-point exceptions, then the exception handling routine must either disable all floating-point exceptions (for example, by loading a local control word with FLDCW), or it must be implemented as re-entrant (for the case of FPU exceptions, refer to Example E-5 in Appendix E, Guidelines for Writing FPU Exceptions Handlers). If this is not the case, the routine has to clear the status flags for FPU exceptions, or to mask all FPU floating-point exceptions. For Streaming SIMD Extensions floating-point exceptions though, the exception flags in MXCSR do not have to be cleared, even if they remain unmasked (they may still be cleared). Exceptions are in this case precise and occur immediately, and a Streaming SIMD Extensions exception status flag that is set when the corresponding exception is unmasked will not generate an exception. Typical actions performed by this low-level exception handling routine are:
- incrementing an exception counter for later display or printing
- printing or displaying diagnostic information (e.g. the MXCSR and XMM registers)
- aborting further execution, or using the exception pointers to build an instruction that will run without exception and executing it
- storing information about the exception in a data structure that will be passed to a higher level user exception handler
In most cases (and this applies also to the Streaming SIMD Extensions instructions), there will be three main components of a low-level floating-point exception handler: a prologue, a body, and an epilogue.
The prologue performs functions that must be protected from possible interruption by higher-priority sources, typically saving registers and transferring diagnostic information from the processor to memory. When the critical processing has been completed, the prologue may re-enable interrupts to allow higher-priority interrupt handlers to preempt the exception handler (assuming that the interrupt handler was called through an interrupt gate, meaning that the processor cleared the interrupt enable (IF) flag in the EFLAGS register - refer to Section 4.4.1., Call and Return Operation for Interrupt or Exception Handling Procedures in Chapter 4, Procedure Calls, Interrupts, and Exceptions).
The body of the exception handler examines the diagnostic information and makes a response that is application-dependent. It may range from halting execution, to displaying a message, to attempting to fix the problem and then proceeding with normal execution, to setting up a data structure, calling a higher-level user exception handler and continuing execution upon return from it. This last case will be assumed in Section F.4., SIMD Floating-Point Exceptions and the IEEE-754 Standard for Binary Floating-Point Computations below.
Finally, the epilogue essentially reverses the actions of the prologue, restoring the processor state so that normal execution can be resumed. The following example represents a typical exception handler.
To link it with Example F-2 that will follow in Section F.4.3., SIMD Floating-Point Emulation Implementation Example, assume that the body of the handler (not shown here in detail) passes the saved state to a routine that will examine in turn all the sub-operands of the excepting instruction, invoking a user floating-point exception handler if a particular set of sub-operands raises an unmasked (enabled) exception, or emulating the instruction otherwise.
Example F-1. SIMD Floating-Point Exception Handler

SIMD_FP_EXC_HANDLER PROC
;
;;; PROLOGUE
; SAVE REGISTERS
    PUSH    EBP                             ; SAVE EBP
    PUSH    EAX                             ; SAVE EAX
    ...
    MOV     EBP, ESP                        ; SAVE ESP in EBP
    SUB     ESP, 512                        ; ALLOCATE 512 BYTES
    AND     ESP, 0fffffff0h                 ; MAKE THE ADDRESS 16-BYTE ALIGNED
    FXSAVE  [ESP]                           ; SAVE FP, MMX, AND SIMD FP STATE
    PUSH    DWORD PTR [EBP+EFLAGS_OFFSET]   ; COPY OLD EFLAGS TO STACK TOP
    POPFD                                   ; RESTORE THE INTERRUPT ENABLE FLAG IF
                                            ; TO VALUE BEFORE SIMD FP EXCEPTION
;
;;; BODY
; APPLICATION-DEPENDENT EXCEPTION HANDLING CODE GOES HERE
    LDMXCSR LOCAL_MXCSR                     ; LOAD LOCAL MXCSR VALUE IF NEEDED
    ...
    ...
;
;;; EPILOGUE
    FXRSTOR [ESP]                           ; RESTORE MODIFIED STATE IMAGE
    MOV     ESP, EBP                        ; DE-ALLOCATE STACK SPACE
    ...
    POP     EAX                             ; RESTORE EAX
    POP     EBP                             ; RESTORE EBP
    IRET                                    ; RETURN TO INTERRUPTED CALCULATION
SIMD_FP_EXC_HANDLER ENDP
F.3.
EXCEPTION SYNCHRONIZATION
A Streaming SIMD Extensions instruction can execute in parallel with other similar instructions, with integer instructions, and with floating-point or MMX instructions. Exception synchronization may therefore be necessary, similarly to the situation described in Section E.3.3., Synchronization Required for Use of FPU Exception Handlers, in Appendix E, Guidelines for Writing FPU Exceptions Handlers. Careful coding will ensure proper synchronization in case a floating-point exception handler is invoked, and will lead to reliable performance.
F.4.
SIMD FLOATING-POINT EXCEPTIONS AND THE IEEE-754 STANDARD FOR BINARY FLOATING-POINT COMPUTATIONS
The Streaming SIMD Extensions are 100% compatible with the ANSI/IEEE Standard 754-1985, IEEE Standard for Binary Floating-Point Arithmetic, satisfying all of its mandatory requirements (when the flush-to-zero mode is not enabled). But a programming environment that includes the Streaming SIMD Extensions instructions will comply with both the obligatory and the strongly recommended requirements of the IEEE Standard 754 regarding floating-point exception handling only as a combination of hardware and software (which is acceptable). The standard states that a user should be able to request a trap on any of the five floating-point exceptions (note that the denormal exception is an IA addition), and it also specifies the values (operands or result) to be delivered to the exception handler.
The main issue is that for Streaming SIMD Extensions instructions that raise post-computation exceptions (traps: overflow, underflow, or inexact), unlike for IA-32 FPU instructions, the processor does not provide the result recommended by the IEEE standard to the user handler. If a user program needs the result of an instruction that generated a post-computation exception, it is the responsibility of the software to produce this result by emulating the faulting Streaming SIMD Extensions instruction. Another issue is that the standard does not specify explicitly how to handle multiple floating-point exceptions that occur simultaneously. For packed operations, a logical OR of the flags that would be set by each sub-operation is used to set the exception flags in the MXCSR. The following subsections present one possible way to solve these problems.
F.4.1.
Floating-Point Emulation
Every operating system must provide a kernel level floating-point exception handler (a template was presented in Section F.2., Software Exception Handling above). In the following, assume that a user mode floating-point exception filter is supplied for Streaming SIMD Extensions exceptions (for example as part of a library of C functions), that a user program can invoke in order to handle unmasked exceptions. The user mode floating-point exception filter (not shown here) has to be able to emulate the subset of Streaming SIMD Extensions instructions that can generate numeric exceptions, and has to be able to invoke a user provided floating-point exception handler for floating-point exceptions. When a floating-point exception that is not masked is raised by a Streaming SIMD Extensions instruction, the low-level floating-point exception handler will be called. This low-level handler may in turn call the user mode floating-point exception filter. The filter function receives the original operands of the excepting instruction, as no results are provided by the hardware, whether a pre-computation or a post-computation exception has occurred. The filter will unpack the operands into up to four sets of sub-operands, and will submit them one set at a time to an emulation function (that will be presented in Example F-2 in Section F.4.3., SIMD Floating-Point Emulation Implementation Example, below). The emulation function will examine the sub-operands, and will possibly redo the necessary calculation.
Two cases are possible:
- If an unmasked (enabled) exception occurs in this process, the emulation function will return to its caller (the filter function) with the appropriate information. The filter will invoke a (previously registered) user floating-point exception handler for this set of sub-operands, and will record the result upon return from the user handler (provided the user handler allows continuation of the execution).
- If no unmasked (enabled) exception occurs, the emulation function will determine and will return to its caller the result of the operation for the current set of sub-operands (it has to be IEEE compliant). The filter function will record the result (plus any new flag settings).
The user level filter function will then call the emulation function for the next set of sub-operands (if any). When done, the partial results will be packed (if the excepting instruction has a packed floating-point result, which is true for most Streaming SIMD Extensions numeric instructions) and the filter will return to the low-level exception handler, which in turn will return from the interruption, allowing execution to continue. Note that the instruction pointer (EIP) has to be altered to point to the instruction following the excepting instruction, in order to continue execution correctly. If a user mode floating-point exception filter is not provided, then all the work for decoding the excepting instruction, reading its operands, emulating the instruction for the components of the result that do not correspond to unmasked floating-point exceptions, and providing the compounded result will have to be performed by the user provided floating-point exception handler. Actual emulation will have to take place for one operand or pair of operands for scalar operations, and for all four operands or pairs of operands for packed operations. The steps to perform are the following:
- the excepting instruction has to be decoded and the operands have to be read from the saved context
- the instruction has to be emulated for each (pair of) sub-operand(s); if no floating-point exception occurs, the partial result has to be saved; if a masked floating-point exception occurs, the masked result has to be produced through emulation and saved, and the appropriate status flags have to be set; if an unmasked floating-point exception occurs, the result has to be generated by the user provided floating-point exception handler, and the appropriate status flags have to be set
- the four partial results have to be combined and written to the context that will be restored upon application program resumption
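The per-sub-operand loop described in the steps above might be sketched in C as follows. This is only an illustration under stated assumptions: LANE_OUTCOME, filter_packed, demo_emulate_div, demo_user_handler, and the FLAG_DIVIDE_BY_ZERO encoding are hypothetical names, instruction decoding and context access are omitted, and the toy emulation routine models only an unmasked divide-by-zero.

```c
#include <assert.h>
#include <stddef.h>

#define FLAG_DIVIDE_BY_ZERO 0x04u   /* hypothetical status flag encoding */

typedef struct {
    float result;           /* partial result for one set of sub-operands */
    unsigned int flags;     /* status flags raised by this sub-operation */
    unsigned int unmasked;  /* nonzero if an unmasked exception was raised */
} LANE_OUTCOME;

typedef LANE_OUTCOME (*emulate_fn)(float a, float b);
typedef float (*user_handler_fn)(float a, float b, unsigned int flags);

/* Emulates a packed operation one set of sub-operands at a time, calls the
   user handler where an unmasked exception occurs, and packs the results.
   Returns the logical OR of the status flags of all sub-operations. */
unsigned int filter_packed(const float src1[4], const float src2[4],
                           float dst[4], emulate_fn emulate,
                           user_handler_fn user_handler)
{
    unsigned int all_flags = 0;
    int i;
    assert(src1 != NULL && src2 != NULL && dst != NULL);
    assert(emulate != NULL && user_handler != NULL);
    for (i = 0; i < 4; i++) {
        LANE_OUTCOME out = emulate(src1[i], src2[i]);
        if (out.unmasked)   /* result comes from the user handler */
            out.result = user_handler(src1[i], src2[i], out.flags);
        dst[i] = out.result;
        all_flags |= out.flags;
    }
    return all_flags;
}

/* Toy emulation of one division: only divide-by-zero is modelled, and it
   is treated as unmasked. */
LANE_OUTCOME demo_emulate_div(float a, float b)
{
    LANE_OUTCOME out = { 0.0f, 0u, 0u };
    if (b == 0.0f && a != 0.0f) {
        out.flags = FLAG_DIVIDE_BY_ZERO;
        out.unmasked = 1u;
    } else {
        out.result = a / b;
    }
    return out;
}

/* Toy user handler: substitutes 0.0 for the excepting sub-operation. */
float demo_user_handler(float a, float b, unsigned int flags)
{
    (void)a; (void)b; (void)flags;
    return 0.0f;
}
```

Calling filter_packed with demo_emulate_div on {1, 2, 3, 4} and {2, 0, 4, 8} yields {0.5, 0.0, 0.75, 0.5}, with only the divide-by-zero flag reported.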
A diagram of the control flow in handling an unmasked floating-point exception is presented below.
[Figure: four boxes in call order: User Application; Low-Level Floating-Point Exception Handler; User Level Floating-Point Exception Filter; User Floating-Point Exception Handler]
Figure F-1. Control Flow for Handling Unmasked Floating-Point Exceptions
From the user level floating-point filter, Example F-2 in Section F.4.3., SIMD Floating-Point Emulation Implementation Example, will present only the floating-point emulation part. In order to understand the actions involved, the expected response to exceptions has to be known for all the Streaming SIMD Extensions numeric instructions in two situations: with exceptions enabled (unmasked result), and with exceptions disabled (masked result). The latter can be found in Section 4.4., Interrupts and Exceptions, in Chapter 4, Procedure Calls, Interrupts, and Exceptions. The response to NaN operands that do not raise an exception is specified in Section 9.1.6., SIMD Floating-Point Register Data Formats. Operating on NaNs from the same source. It is also given in more detail in the next subsection, along with the unmasked and masked responses to floating-point exceptions.
F.4.2.
Streaming SIMD Extensions Response To Floating-Point Exceptions
This subsection specifies the unmasked response expected from the Streaming SIMD Extensions instructions that raise floating-point exceptions. The masked response is given in parallel, as it is necessary in the emulation process of the instructions that raise unmasked floating-point exceptions. The response to NaN operands is also included in more detail than in Section 9.1.6., SIMD Floating-Point Register Data Formats. For floating-point exception priority, refer to Section 5.7., Priority Among Simultaneous Exceptions and Interrupts in Chapter 5, Interrupt and Exception Handling. Note that some floating-point instructions (non-waiting instructions) do not check for pending unmasked exceptions (refer to Section 7.5.11., FPU Control Instructions, in Chapter 7, Floating-Point Unit). They include the FNINIT, FNSTENV, FNSAVE, FNSTSW, FNSTCW, and FNCLEX instructions. When an FNINIT, FNSTENV, FNSAVE, or FNCLEX instruction is executed, all pending exceptions are essentially lost (either the FPU status register is cleared or all exceptions are masked). The FNSTSW and FNSTCW instructions do not check for pending interrupts, but they do not modify the FPU status and control registers. A subsequent waiting floating-point instruction can then handle any pending exceptions.
F.4.2.1.
NUMERIC EXCEPTIONS
There are six classes of numeric (floating-point) exception conditions that can occur: Invalid operation (#I), Divide-by-Zero (#Z), Denormal Operand (#D), Numeric Overflow (#O), Numeric Underflow (#U), and Inexact Result (precision) (#P). #I, #Z, #D are pre-computation exceptions (floating-point faults), detected before the arithmetic operation. #O, #U, #P are post-computation exceptions (floating-point traps). Users can control how the exceptions are handled by setting the mask/unmask bits in MXCSR. Masked exceptions are handled by the processor or by software if they are combined with unmasked exceptions occurring in the same instruction. Unmasked exceptions are usually handled by the low-level exception handler, in conjunction with user-level software.
F.4.2.2.
RESULTS OF OPERATIONS WITH NAN OPERANDS OR A NAN RESULT FOR STREAMING SIMD EXTENSIONS NUMERIC INSTRUCTIONS
The tables below specify the response of the Streaming SIMD Extensions technology instructions to NaN inputs, or to other inputs that lead to NaN results. These results will be referenced by subsequent tables. Most operations do not raise an invalid exception for quiet NaN operands, but even so, quiet NaN operands take precedence over the raising of other exceptions. Note that the single-precision QNaN Indefinite value is 0xffc00000, and the Integer Indefinite value is 0x80000000 (not a floating-point number, but it can be the result of a conversion instruction from floating-point to integer). For an unmasked exception, no result will be provided to the user handler. If a user registered floating-point exception handler is invoked, it may provide a result for the excepting instruction, which will be used if execution of the application code is continued after returning from the interruption. In Table F-1 through Table F-10, the specified operands cause an invalid exception, unless the unmasked result is marked with (not an exception). In this latter case, the unmasked and masked results are the same.
Table F-1. ADDPS, ADDSS, SUBPS, SUBSS, MULPS, MULSS, DIVPS, DIVSS

Source Operands                         Masked Result                       Unmasked Result
SNaN1 op SNaN2                          SNaN1 | 0x00400000                  None
SNaN1 op QNaN2                          SNaN1 | 0x00400000                  None
QNaN1 op SNaN2                          QNaN1                               None
QNaN1 op QNaN2                          QNaN1                               QNaN1 (not an exception)
SNaN op real value                      SNaN | 0x00400000                   None
Real value op SNaN                      SNaN | 0x00400000                   None
QNaN op real value                      QNaN                                QNaN (not an exception)
Real value op QNaN                      QNaN                                QNaN (not an exception)
Neither source operand is SNaN,         Single-Precision QNaN Indefinite    None
but #I is signaled (e.g. for
Inf - Inf, Inf * 0, Inf / Inf, 0/0)

Note 1. SNaN | 0x00400000 is a quiet NaN obtained from the signaling NaN given as input
Note 2. Operations involving only quiet NaNs do not raise a floating-point exception
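The NaN rows of Table F-1 can be expressed compactly on the raw single-precision bit patterns. The following C sketch is an illustration only; the helper names are hypothetical. It returns the masked result when at least one source operand is a NaN, and reports whether the operation signals #I.

```c
#include <assert.h>
#include <stdint.h>

#define SP_QUIET_BIT 0x00400000u  /* most significant fraction bit */

/* NaN: exponent all ones and a nonzero fraction. */
int sp_is_nan(uint32_t bits)
{
    return (bits & 0x7F800000u) == 0x7F800000u && (bits & 0x007FFFFFu) != 0;
}

/* Signaling NaN: a NaN whose quiet bit is clear. */
int sp_is_snan(uint32_t bits)
{
    return sp_is_nan(bits) && (bits & SP_QUIET_BIT) == 0;
}

/* Masked result for "src1 op src2" when at least one operand is a NaN:
   the first NaN operand wins, and a signaling NaN is quieted. */
uint32_t masked_nan_result(uint32_t src1, uint32_t src2)
{
    assert(sp_is_nan(src1) || sp_is_nan(src2));
    if (sp_is_nan(src1))
        return src1 | SP_QUIET_BIT;
    return src2 | SP_QUIET_BIT;
}

/* #I is signaled for NaN operands only if one of them is signaling. */
int nan_operands_raise_invalid(uint32_t src1, uint32_t src2)
{
    return sp_is_snan(src1) || sp_is_snan(src2);
}
```

For example, the signaling NaN 0x7f800001 combined with the real value 1.0 (0x3f800000) gives the masked result 0x7fc00001 and signals #I, while two quiet NaNs return the first operand unchanged and signal nothing.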
Table F-2. CMPPS.EQ, CMPSS.EQ, CMPPS.ORD, CMPSS.ORD

Source Operands             Masked Result    Unmasked Result
NaN op Opd2 (any Opd2)      0x00000000       0x00000000 (not an exception)
Opd1 op NaN (any Opd1)      0x00000000       0x00000000 (not an exception)
Table F-3. CMPPS.NEQ, CMPSS.NEQ, CMPPS.UNORD, CMPSS.UNORD

Source Operands             Masked Result    Unmasked Result
NaN op Opd2 (any Opd2)      0xffffffff       0xffffffff (not an exception)
Opd1 op NaN (any Opd1)      0xffffffff       0xffffffff (not an exception)
Table F-4. CMPPS.LT, CMPSS.LT, CMPPS.LE, CMPSS.LE

Source Operands             Masked Result    Unmasked Result
NaN op Opd2 (any Opd2)      0x00000000       None
Opd1 op NaN (any Opd1)      0x00000000       None
Table F-5. CMPPS.NLT, CMPSS.NLT, CMPPS.NLE, CMPSS.NLE

Source Operands             Masked Result    Unmasked Result
NaN op Opd2 (any Opd2)      0xffffffff       None
Opd1 op NaN (any Opd1)      0xffffffff       None
Table F-6. COMISS

Source Operands             Masked Result                 Unmasked Result
SNaN op Opd2 (any Opd2)     OF,SF,AF=000 ZF,PF,CF=111     None
Opd1 op SNaN (any Opd1)     OF,SF,AF=000 ZF,PF,CF=111     None
QNaN op Opd2 (any Opd2)     OF,SF,AF=000 ZF,PF,CF=111     None
Opd1 op QNaN (any Opd1)     OF,SF,AF=000 ZF,PF,CF=111     None
Table F-7. UCOMISS

Source Operands                             Masked Result                 Unmasked Result
SNaN op Opd2 (any Opd2)                     OF,SF,AF=000 ZF,PF,CF=111     None
Opd1 op SNaN (any Opd1)                     OF,SF,AF=000 ZF,PF,CF=111     None
QNaN op Opd2 (any Opd2 other than SNaN)     OF,SF,AF=000 ZF,PF,CF=111     OF,SF,AF=000 ZF,PF,CF=111 (not an exception)
Opd1 op QNaN (any Opd1 other than SNaN)     OF,SF,AF=000 ZF,PF,CF=111     OF,SF,AF=000 ZF,PF,CF=111 (not an exception)
Table F-8. CVTPS2PI, CVTSS2SI, CVTTPS2PI, CVTTSS2SI

Source Operand    Masked Result                       Unmasked Result
SNaN              0x80000000 (Integer Indefinite)     None
QNaN              0x80000000 (Integer Indefinite)     None
Table F-9. MAXPS, MAXSS, MINPS, MINSS

Source Operands             Masked Result    Unmasked Result
Opd1 op NaN2 (any Opd1)     NaN2             None
NaN1 op Opd2 (any Opd2)     Opd2             None

Note: SNaN and QNaN operands raise an Invalid Operand fault
Table F-10. SQRTPS, SQRTSS

Source Operand                          Masked Result                       Unmasked Result
QNaN                                    QNaN                                QNaN (not an exception)
SNaN                                    SNaN | 0x00400000                   None
Source operand is not SNaN, but #I      Single-Precision QNaN Indefinite    None
is signaled (e.g. for sqrt (-1.0))

Note: SNaN | 0x00400000 is a quiet NaN obtained from the signaling NaN given as input
F.4.2.3.
CONDITION CODES, EXCEPTION FLAGS, AND RESPONSE FOR MASKED AND UNMASKED NUMERIC EXCEPTIONS
In the following, the masked response is what the processor provides when a masked exception is raised by a Streaming SIMD Extensions numeric instruction. The same response is provided by the floating-point emulator for Streaming SIMD Extensions numeric instructions, when certain components of the quadruple input operands generate exceptions that are masked (the emulator also generates the correct answer, as specified by the IEEE standard wherever applicable, in the case when no floating-point exception occurs). The unmasked response is what the emulator provides to the user handler for those components of the quadruple input operands of the Streaming SIMD Extensions instructions that raise unmasked exceptions. Note that for pre-computation exceptions (floating-point faults), no result is provided to the user handler. For post-computation exceptions (floating-point traps), a result is also provided to the user handler, as specified below. In the following tables, the result is denoted by res, with the understanding that for the actual instruction, the destination coincides with the first source operand (except for COMISS and UCOMISS, whose destination is the EFLAGS register).
Table F-11. #I - Invalid Operations

Instruction            Condition                                  Masked Response                                Unmasked Response and Exception Code
ADDPS, ADDSS           src1 or src2 = SNaN                        Refer to Table F-1 for NaN operands, #IA=1     src1, src2 unchanged, #IA=1
                       src1=+Inf, src2 = -Inf or                  res = QNaN Indefinite, #IA=1
                       src1=-Inf, src2 = +Inf
SUBPS, SUBSS           src1 or src2 = SNaN                        Refer to Table F-1 for NaN operands, #IA=1     src1, src2 unchanged, #IA=1
                       src1=+Inf, src2 = +Inf or                  res = QNaN Indefinite, #IA=1
                       src1=-Inf, src2 = -Inf
MULPS, MULSS           src1 or src2 = SNaN                        Refer to Table F-1 for NaN operands, #IA=1     src1, src2 unchanged, #IA=1
                       src1=Inf, src2 = 0 or                      res = QNaN Indefinite, #IA=1
                       src1=0, src2 = Inf
DIVPS, DIVSS           src1 or src2 = SNaN                        Refer to Table F-1 for NaN operands, #IA=1     src1, src2 unchanged, #IA=1
                       src1=Inf, src2 = Inf or                    res = QNaN Indefinite, #IA=1
                       src1=0, src2 = 0
SQRTPS, SQRTSS         src = SNaN                                 Refer to Table F-10 for NaN operands, #IA=1    src unchanged, #IA=1
                       src < 0 (note that -0 < 0 is false)        res = QNaN Indefinite, #IA=1
MAXPS, MAXSS           src1 = NaN or src2 = NaN                   res = src2, #IA=1                              src1, src2 unchanged, #IA=1
MINPS, MINSS           src1 = NaN or src2 = NaN                   res = src2, #IA=1                              src1, src2 unchanged, #IA=1
CMPPS.LT, CMPPS.LE,    src1 = NaN or src2 = NaN                   Refer to Table F-4 and Table F-5 for NaN       src1, src2 unchanged, #IA=1
CMPPS.NLT, CMPPS.NLE,                                             operands, #IA=1
CMPSS.LT, CMPSS.LE,
CMPSS.NLT, CMPSS.NLE
COMISS                 src1 = NaN or src2 = NaN                   Refer to Table F-6 for NaN operands            src1, src2, EFLAGS unchanged, #IA=1
UCOMISS                src1 = SNaN or src2 = SNaN                 Refer to Table F-7 for NaN operands            src1, src2, EFLAGS unchanged, #IA=1
CVTPS2PI, CVTSS2SI     src = NaN, Inf, |(src)rnd | > 0x7fffffff   res = Integer Indefinite, #IA=1                src unchanged, #IA=1
CVTTPS2PI, CVTTSS2SI   src = NaN, Inf, |(src)rz | > 0x7fffffff    res = Integer Indefinite, #IA=1                src unchanged, #IA=1

Note 1. rnd signifies the user rounding mode from MXCSR, and rz signifies the rounding mode toward zero (truncate), when rounding a floating-point value to an integer. For more information, refer to Table 9-3 in Section 9.1.8., Rounding Control Field, of Chapter 9, Programming with the Streaming SIMD Extensions.
Note 2. For NaN encodings, see Table 9-2, Chapter 9, Programming with the Streaming SIMD Extensions.
Table F-12. #Z - Divide-by-Zero

Instruction: DIVPS, DIVSS
Condition: src1 = finite non-zero (normal, or denormal), src2 = 0
Masked Response: res = ±Inf, #ZE=1
Unmasked Response and Exception Code: src1, src2 unchanged, #ZE=1
Table F-13. #D - Denormal Operand

Instruction: ADDPS, SUBPS, MULPS, DIVPS, SQRTPS, MAXPS, MINPS, CMPPS, ADDSS, SUBSS, MULSS, DIVSS, SQRTSS, MAXSS, MINSS, CMPSS, COMISS, UCOMISS
Condition: src1 = denormal or src2 = denormal (SQRT only has 1 src)
Masked Response: #DE=1; res = result rounded to the destination precision and using the bounded exponent, but only if no unmasked post-computation exception occurs
Unmasked Response and Exception Code: src1, src2 unchanged, #DE=1

Note: For denormal encodings, see Table 9-2, Chapter 9, Programming with the Streaming SIMD Extensions.
Table F-14. #O - Numeric Overflow

Instruction: ADDPS, SUBPS, MULPS, DIVPS, ADDSS, SUBSS, MULSS, DIVSS
Condition: rounded result > largest single-precision finite normal value

Masked Response:
Rounding        Sign    Result & Status Flags
To nearest      +       #OE=1, #PE=1, res = +Inf
                -       #OE=1, #PE=1, res = -Inf
Toward -Inf     +       #OE=1, #PE=1, res = 1.11...1 * 2^127
                -       #OE=1, #PE=1, res = -Inf
Toward +Inf     +       #OE=1, #PE=1, res = +Inf
                -       #OE=1, #PE=1, res = -1.11...1 * 2^127
Toward 0        +       #OE=1, #PE=1, res = 1.11...1 * 2^127
                -       #OE=1, #PE=1, res = -1.11...1 * 2^127

Unmasked Response and Exception Code: res = (result calculated with unbounded exponent and rounded to the destination precision) / 2^192; #OE=1; #PE=1 if the result is inexact
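The masked overflow response of Table F-14 depends only on the rounding mode and the sign of the result. A minimal C sketch follows, assuming the MXCSR rounding control encoding (00 = to nearest, 01 = toward -Inf, 10 = toward +Inf, 11 = toward 0); the function and constant names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Rounding control values, assumed to follow the MXCSR RC field encoding. */
enum { RC_NEAREST = 0, RC_DOWN = 1, RC_UP = 2, RC_ZERO = 3 };

#define SP_INF_BITS   0x7F800000u  /* +Inf */
#define SP_MAX_FINITE 0x7F7FFFFFu  /* 1.11...1 * 2^127 */
#define SP_SIGN_BIT   0x80000000u

/* Returns the bit pattern of the masked overflow result. Both #OE and #PE
   are set in every case, so only the result is computed here. */
uint32_t masked_overflow_result(unsigned int rc, int negative)
{
    int to_infinity;
    assert(rc <= 3u);
    switch (rc) {
    case RC_NEAREST: to_infinity = 1;               break;
    case RC_DOWN:    to_infinity = (negative != 0); break;
    case RC_UP:      to_infinity = (negative == 0); break;
    default:         to_infinity = 0;               break; /* toward 0 */
    }
    return (to_infinity ? SP_INF_BITS : SP_MAX_FINITE)
         | (negative ? SP_SIGN_BIT : 0u);
}
```

For instance, a positive overflow while rounding toward -Inf yields the largest finite value 0x7f7fffff, whereas a negative overflow in the same mode yields -Inf (0xff800000).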
Table F-15. #U - Numeric Underflow

Instruction: ADDPS, SUBPS, MULPS, DIVPS, ADDSS, SUBSS, MULSS, DIVSS
Condition: result calculated with unbounded exponent and rounded to the destination precision < smallest single-precision finite normal value
Masked Response: res = 0, denormal, or normal; #UE=1 and #PE=1, but only if the result is inexact
Unmasked Response and Exception Code: res = (result calculated with unbounded exponent and rounded to the destination precision) * 2^192; #UE=1; #PE=1 if the result is inexact
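The scaled results delivered for unmasked overflow and underflow can be sketched in C as shown below. The sketch assumes that the unbounded-exponent result is held in a double after having been rounded to 24 significant bits, so that scaling by a power of two and converting to float are both exact. The function names are hypothetical.

```c
#include <assert.h>

/* Exact power of two as a double (n may be negative). */
double pow2i(int n)
{
    double r = 1.0;
    while (n > 0) { r *= 2.0; n--; }
    while (n < 0) { r *= 0.5; n++; }
    return r;
}

/* Unmasked overflow: deliver (unbounded result) / 2^192 to the handler. */
float scaled_overflow_result(double unbounded)
{
    assert(unbounded == unbounded);          /* NaN is not expected here */
    return (float)(unbounded * pow2i(-192));
}

/* Unmasked underflow: deliver (unbounded result) * 2^192 to the handler. */
float scaled_underflow_result(double unbounded)
{
    assert(unbounded == unbounded);
    return (float)(unbounded * pow2i(192));
}
```

As an illustration, an unbounded result of 1.5 * 2^130 is delivered as 1.5 * 2^-62, and a tiny result of 1.5 * 2^-140 is delivered as 1.5 * 2^52; both fit comfortably in the single-precision normal range.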
Table F-16. #P - Inexact Result (Precision)

Instruction: ADDPS, SUBPS, MULPS, DIVPS, SQRTPS, CVTPI2PS, CVTPS2PI, CVTTPS2PI, ADDSS, SUBSS, MULSS, DIVSS, SQRTSS, CVTSI2SS, CVTSS2SI, CVTTSS2SI
Condition: the result is not exactly representable in the destination format
Masked Response: res = result rounded to the destination precision and using the bounded exponent, but only if no unmasked underflow or overflow conditions occur (this exception can occur in the presence of a masked underflow or overflow); #PE=1
Unmasked Response and Exception Code: only if no underflow/overflow condition occurred, or if the corresponding exceptions are masked: set #OE if masked overflow and set result as described above for masked overflow; set #UE if masked underflow and set result as described above for masked underflow; if neither underflow nor overflow, res = the result rounded to the destination precision and using the bounded exponent; set #PE=1
F.4.3.
SIMD Floating-Point Emulation Implementation Example
The sample code listed below may be considered as being part of a user-level floating-point exception filter for Streaming SIMD Extensions numeric instructions. It is assumed that the filter function is invoked by a low-level exception handler (reached via interrupt vector 19 when an unmasked floating-point exception occurs), and that it operates as explained in Section F.4.1., Floating-Point Emulation. The sample code does the emulation for the add, subtract, multiply, and divide operations. For this, it uses C code and IA-32 FPU operations (readability, not efficiency, was the primary goal). Operations corresponding to other Streaming SIMD Extensions numeric instructions have to be emulated, but only placeholders for them are included. The example assumes that the emulation function receives a pointer to a data structure specifying a number of input parameters: the operation that caused the exception, a set of two sub-operands (unpacked, of type float), the rounding mode (the precision is always single), exception masks (having the same relative bit positions as in the MXCSR but starting from bit 0 in an unsigned integer), and a flush-to-zero indicator. The output parameters are a floating-point result (of type float), the cause of the exception (identified by constants not explicitly defined below), and the exception status flags. The corresponding C definition is:
typedef struct {
  unsigned int operation;                     // Streaming SIMD Extensions operation: ADDPS, ADDSS, ...
  float operand1_fval;                        // first operand value
  float operand2_fval;                        // second operand value (if any)
  float result_fval;                          // result value (if any)
  unsigned int rounding_mode;                 // rounding mode
  unsigned int exc_masks;                     // exception masks, in the order P, U, O, Z, D, I
  unsigned int exception_cause;               // exception cause
  unsigned int status_flag_inexact;           // inexact status flag
  unsigned int status_flag_underflow;         // underflow status flag
  unsigned int status_flag_overflow;          // overflow status flag
  unsigned int status_flag_divide_by_zero;    // divide by zero status flag
  unsigned int status_flag_denormal_operand;  // denormal operand status flag
  unsigned int status_flag_invalid_operation; // invalid operation status flag
  unsigned int ftz;                           // flush-to-zero flag
} EXC_ENV;
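As a rough illustration of how a filter function might derive the mask, rounding-mode, and flush-to-zero inputs from a saved MXCSR value, the sketch below shifts the mask bits (MXCSR bits 7 through 12) down to bit 0, extracts the rounding control field (bits 13 and 14), and reads the flush-to-zero bit (bit 15). The names `MXCSR_FIELDS` and `decode_mxcsr` are hypothetical, and the numeric encoding of the rounding mode is an assumption, since the ROUND_* constants are not defined in this section.

```c
/* Hypothetical container for the three MXCSR-derived inputs of EXC_ENV. */
typedef struct {
    unsigned int rounding_mode; /* raw RC field: 0 nearest, 1 down, 2 up, 3 toward zero */
    unsigned int exc_masks;     /* bit 0 = I, 1 = D, 2 = Z, 3 = O, 4 = U, 5 = P */
    unsigned int ftz;           /* flush-to-zero bit */
} MXCSR_FIELDS;

/* Split a saved MXCSR image into the fields listed above. */
static MXCSR_FIELDS decode_mxcsr(unsigned int mxcsr)
{
    MXCSR_FIELDS f;
    f.exc_masks     = (mxcsr >> 7) & 0x3fu;  /* mask bits 7..12 moved to bits 0..5 */
    f.rounding_mode = (mxcsr >> 13) & 0x3u;  /* rounding control, bits 13..14 */
    f.ftz           = (mxcsr >> 15) & 0x1u;  /* flush-to-zero, bit 15 */
    return f;
}
```

For the power-on MXCSR value 0x1F80, this yields all six masks set, round to nearest, and flush-to-zero off.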
The arithmetic operations exemplified are emulated as follows:

1. Perform the operation using IA-32 FPU instructions, with exceptions disabled, the original user rounding mode, and single precision; this will reveal invalid, denormal, or divide-by-zero exceptions (if there are any); store the result in memory as a double precision value (whose exponent range is large enough to look unbounded to the result of the single precision computation).
2. If no unmasked exceptions were detected, determine if the result is tiny (smaller in magnitude than the smallest normal number that can be represented in single precision format), or huge (greater in magnitude than the largest normal number that can be represented in single precision format); if an unmasked overflow or underflow occurs, calculate the scaled result that will be handed to the user exception handler, as specified by the IEEE-754 Standard for Binary Floating-Point Arithmetic.
3. If no exception was raised above, calculate the result with bounded exponent; if the result was tiny, it will require denormalization (shifting the significand right while incrementing the exponent to bring it into the admissible range of [-126,+127] for single precision floating-point numbers); the result obtained in step 1 above cannot be used because it might incur a double rounding error (it was rounded to 24 bits in step 1, and might have to be rounded again in the denormalization process); the way to overcome this is to calculate the result as a double precision value, and then to store it to memory in single precision format. Rounding first to 53 bits in the significand, and then to 24, will never cause a double rounding error (exact properties exist that state when a double rounding error does not occur, but for the elementary arithmetic operations, the rule of thumb is that if we round an infinitely precise result to 2p+1 bits and then again to p bits, the result is the same as when rounding directly to p bits, which means that no double rounding error occurs).
4. If the result is inexact and the inexact exceptions are unmasked, the result calculated in step 3 will be delivered to the user floating-point exception handler.
5. Finally, the flush-to-zero case is dealt with if the result is tiny.
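The first two steps can be approximated in portable C for the round-to-nearest mode. The sketch below is not the method used in this appendix, which relies on the FPU precision control: it rounds a double to a 24-bit significand with integer arithmetic on the bit pattern, keeping the double's exponent range, and then classifies the value as tiny or huge. The names `round_to_24_bits`, `is_tiny`, and `is_huge` are hypothetical, and the input is assumed to be zero or a finite, normal double.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Round a finite, normal double (or zero) to a 24-bit significand using
   round-to-nearest-even. The exponent range of the double is kept, so the
   result plays the role of a single precision value with "unbounded"
   exponent. */
static double round_to_24_bits(double x)
{
    uint64_t bits;
    uint64_t lsb;
    memcpy(&bits, &x, sizeof bits);
    /* A double has 52 fraction bits and a single has 23: drop the low 29. */
    lsb = (bits >> 29) & 1u;           /* lowest fraction bit that is kept */
    bits += 0x0FFFFFFFu + lsb;         /* exact ties go to the even neighbor */
    bits &= ~(uint64_t)0x1FFFFFFF;     /* clear the 29 discarded bits */
    memcpy(&x, &bits, sizeof bits);
    return x;
}

/* Nonzero and smaller in magnitude than the smallest single normal. */
static int is_tiny(double r24)
{
    return (-(double)FLT_MIN < r24 && r24 < 0.0) ||
           (0.0 < r24 && r24 < (double)FLT_MIN);
}

/* Finite and larger in magnitude than the largest single normal. */
static int is_huge(double r24)
{
    return (-DBL_MAX <= r24 && r24 < -(double)FLT_MAX) ||
           ((double)FLT_MAX < r24 && r24 <= DBL_MAX);
}
```

A value just below 2^-126 that rounds up to 2^-126 at 24 bits is not classified as tiny, which is why the check is applied to the rounded value rather than to the exact result.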
The emulation function returns RAISE_EXCEPTION to the filter function if an exception has to be raised (the exception_cause field will indicate the cause); otherwise, the emulation function returns DO_NOT_RAISE_EXCEPTION. In the first case, the result will be provided by the user exception handler called by the filter function. In the second case, it is provided by the emulation function. The filter function has to collect all the partial results and assemble the scalar or packed result that will be used if execution is to be continued.
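One possible shape for the collecting loop in the filter function is sketched below. Everything in it is an assumption made for illustration: the reduced `LANE_ENV` structure stands in for EXC_ENV, the numeric values of RAISE_EXCEPTION and DO_NOT_RAISE_EXCEPTION are invented, `stub_add` is a placeholder for a real emulation routine, and stopping at the first element that raises is a policy chosen for this sketch, not one stated in this section.

```c
#define LANES 4                   /* elements in a packed single precision operand */
#define RAISE_EXCEPTION 1         /* assumed value */
#define DO_NOT_RAISE_EXCEPTION 0  /* assumed value */

/* Hypothetical, reduced stand-in for EXC_ENV: only the fields used here. */
typedef struct {
    float operand1_fval;
    float operand2_fval;
    float result_fval;
} LANE_ENV;

/* Signature of a per-element emulation routine. */
typedef unsigned int (*emulate_fn)(LANE_ENV *env);

/* Placeholder emulation routine: adds the operands and never raises. */
static unsigned int stub_add(LANE_ENV *env)
{
    env->result_fval = env->operand1_fval + env->operand2_fval;
    return DO_NOT_RAISE_EXCEPTION;
}

/* Emulate each element in turn. Stops at the first element that must raise
   an exception and reports its index; otherwise the assembled packed
   result is left in res[] and *faulting_lane is set to -1. */
static unsigned int emulate_packed(emulate_fn emulate,
                                   const float a[LANES], const float b[LANES],
                                   float res[LANES], int *faulting_lane)
{
    int i;
    for (i = 0; i < LANES; i++) {
        LANE_ENV env;
        env.operand1_fval = a[i];
        env.operand2_fval = b[i];
        env.result_fval = 0.0f;
        if (emulate(&env) == RAISE_EXCEPTION) {
            *faulting_lane = i;
            return RAISE_EXCEPTION;
        }
        res[i] = env.result_fval;
    }
    *faulting_lane = -1;
    return DO_NOT_RAISE_EXCEPTION;
}
```

For a scalar instruction, the same loop would run over a single element and copy the remaining elements of the destination unchanged.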
Example F-2. SIMD Floating-Point Emulation

// masks for individual status word bits
#define PRECISION_MASK 0x20
#define UNDERFLOW_MASK 0x10
#define OVERFLOW_MASK 0x08
#define ZERODIVIDE_MASK 0x04
#define DENORMAL_MASK 0x02
#define INVALID_MASK 0x01

// 32-bit constants
static unsigned ZEROF_ARRAY[] = {0x00000000};
#define ZEROF *(float *) ZEROF_ARRAY // +0.0
static unsigned NZEROF_ARRAY[] = {0x80000000};
#define NZEROF *(float *) NZEROF_ARRAY // -0.0
static unsigned POSINFF_ARRAY[] = {0x7f800000};
#define POSINFF *(float *)POSINFF_ARRAY // +Inf
static unsigned NEGINFF_ARRAY[] = {0xff800000};
#define NEGINFF *(float *)NEGINFF_ARRAY // -Inf

// 64-bit constants (low doubleword first)
static unsigned MIN_SINGLE_NORMAL_ARRAY [] = {0x00000000, 0x38100000};
#define MIN_SINGLE_NORMAL *(double *)MIN_SINGLE_NORMAL_ARRAY // +1.0 * 2^-126
static unsigned MAX_SINGLE_NORMAL_ARRAY [] = {0xe0000000, 0x47efffff};
#define MAX_SINGLE_NORMAL *(double *)MAX_SINGLE_NORMAL_ARRAY // +1.1...1 * 2^127 (23 fraction bits set)
static unsigned TWO_TO_192_ARRAY[] = {0x00000000, 0x4bf00000};
#define TWO_TO_192 *(double *)TWO_TO_192_ARRAY // +1.0 * 2^192
static unsigned TWO_TO_M192_ARRAY[] = {0x00000000, 0x33f00000};
#define TWO_TO_M192 *(double *)TWO_TO_M192_ARRAY // +1.0 * 2^-192

// auxiliary functions
static int isnanf (float f); // returns 1 if f is a NaN, and 0 otherwise
static float quietf (float f); // converts a signaling NaN to a quiet NaN, and
                               // leaves a quiet NaN unchanged

// emulation of Streaming SIMD Extensions instructions using
// C code and IA-32 FPU instructions
unsigned int simd_fp_emulate (EXC_ENV *exc_env)
{
  float opd1;       // first operand of the add, subtract, multiply, or divide
  float opd2;       // second operand of the add, subtract, multiply, or divide
  float res;        // result of the add, subtract, multiply, or divide
  double dbl_res24; // result with 24-bit significand, but "unbounded" exponent
                    // (needed to check tininess, to provide a scaled result to
                    // an underflow/overflow trap handler, and in flush-to-zero mode)
  double dbl_res;   // result in double precision format (needed to avoid a
                    // double rounding error when denormalizing)
  unsigned int result_tiny;
  unsigned int result_huge;
  unsigned short int sw; // 16 bits
  unsigned short int cw; // 16 bits

  // have to check first for faults (V, D, Z), and then for traps (O, U, I)
  // initialize FPU (floating-point exceptions are masked)
  __asm {
    fninit;
  }
  result_tiny = 0;
  result_huge = 0;
  switch (exc_env->operation) {
    case ADDPS:
    case ADDSS:
    case SUBPS:
    case SUBSS:
    case MULPS:
    case MULSS:
    case DIVPS:
    case DIVSS:
      opd1 = exc_env->operand1_fval;
      opd2 = exc_env->operand2_fval;
      // execute the operation and check whether the invalid, denormal, or
      // divide by zero flags are set and the respective exceptions enabled
      // set control word with rounding mode set to exc_env->rounding_mode,
      // single precision, and all exceptions disabled
      switch (exc_env->rounding_mode) {
        case ROUND_TO_NEAREST:
          cw = 0x003f; // round to nearest, single precision, exceptions masked
          break;
        case ROUND_DOWN:
          cw = 0x043f; // round down, single precision, exceptions masked
          break;
        case ROUND_UP:
          cw = 0x083f; // round up, single precision, exceptions masked
          break;
        case ROUND_TO_ZERO:
          cw = 0x0c3f; // round to zero, single precision, exceptions masked
          break;
        default:
          ;
      }
      __asm {
        fldcw WORD PTR cw;
      }
      // compute result and round to the destination precision, with
      // "unbounded" exponent (first IEEE rounding)
      switch (exc_env->operation) {
        case ADDPS:
        case ADDSS:
          // perform the addition
          __asm {
            fnclex;
            // load input operands
            fld DWORD PTR opd1; // may set the denormal or invalid status flags
            fld DWORD PTR opd2; // may set the denormal or invalid status flags
            faddp st(1), st(0); // may set the inexact or invalid status flags
            // store result
            fstp QWORD PTR dbl_res24; // exact
          }
          break;
        case SUBPS:
        case SUBSS:
          // perform the subtraction
          __asm {
            fnclex;
            // load input operands
            fld DWORD PTR opd1; // may set the denormal or invalid status flags
            fld DWORD PTR opd2; // may set the denormal or invalid status flags
            fsubp st(1), st(0); // may set the inexact or invalid status flags
            // store result
            fstp QWORD PTR dbl_res24; // exact
          }
          break;
        case MULPS:
        case MULSS:
          // perform the multiplication
          __asm {
            fnclex;
            // load input operands
            fld DWORD PTR opd1; // may set the denormal or invalid status flags
            fld DWORD PTR opd2; // may set the denormal or invalid status flags
            fmulp st(1), st(0); // may set the inexact or invalid status flags
            // store result
            fstp QWORD PTR dbl_res24; // exact
          }
          break;
        case DIVPS:
        case DIVSS:
          // perform the division
          __asm {
            fnclex;
            // load input operands
            fld DWORD PTR opd1; // may set the denormal or invalid status flags
            fld DWORD PTR opd2; // may set the denormal or invalid status flags
            fdivp st(1), st(0); // may set the inexact, divide by zero, or
                                // invalid status flags
            // store result
            fstp QWORD PTR dbl_res24; // exact
          }
          break;
        default:
          ; // will never occur
      }
      // read status word
      __asm {
        fstsw WORD PTR sw;
      }
      if (sw & ZERODIVIDE_MASK)
        sw = sw & ~DENORMAL_MASK; // clear D flag for (denormal / 0)
      // if invalid flag is set, and invalid exceptions are enabled, take trap
      if (!(exc_env->exc_masks & INVALID_MASK) && (sw & INVALID_MASK)) {
        exc_env->status_flag_invalid_operation = 1;
        exc_env->exception_cause = INVALID_OPERATION;
        return (RAISE_EXCEPTION);
      }
      // checking for NaN operands has priority over denormal exceptions; also fix for the
      // differences in treating two NaN inputs between the Streaming SIMD Extensions
      // instructions and other IA-32 instructions
      if (isnanf (opd1) || isnanf (opd2)) {
        if (isnanf (opd1) && isnanf (opd2))
          exc_env->result_fval = quietf (opd1);
        else
          exc_env->result_fval = (float)dbl_res24; // exact
        if (sw & INVALID_MASK) exc_env->status_flag_invalid_operation = 1;
        return (DO_NOT_RAISE_EXCEPTION);
      }
      // if denormal flag is set, and denormal exceptions are enabled, take trap
      if (!(exc_env->exc_masks & DENORMAL_MASK) && (sw & DENORMAL_MASK)) {
        exc_env->status_flag_denormal_operand = 1;
        exc_env->exception_cause = DENORMAL_OPERAND;
        return (RAISE_EXCEPTION);
      }
      // if divide by zero flag is set, and divide by zero exceptions are
      // enabled, take trap (for divide only)
      if (!(exc_env->exc_masks & ZERODIVIDE_MASK) && (sw & ZERODIVIDE_MASK)) {
        exc_env->status_flag_divide_by_zero = 1;
        exc_env->exception_cause = DIVIDE_BY_ZERO;
        return (RAISE_EXCEPTION);
      }
      // done if the result is a NaN (QNaN Indefinite)
      res = (float)dbl_res24;
      if (isnanf (res)) {
        exc_env->result_fval = res; // exact
        exc_env->status_flag_invalid_operation = 1;
        return (DO_NOT_RAISE_EXCEPTION);
      }
      // dbl_res24 is not a NaN at this point
      if (sw & DENORMAL_MASK) exc_env->status_flag_denormal_operand = 1;
      // Note: (dbl_res24 == 0.0 && sw & PRECISION_MASK) cannot occur
      if ((-MIN_SINGLE_NORMAL < dbl_res24 && dbl_res24 < 0.0) ||
          (0.0 < dbl_res24 && dbl_res24 < MIN_SINGLE_NORMAL)) {
        result_tiny = 1;
      }
      // check if the result is huge
      if ((NEGINFF < dbl_res24 && dbl_res24 < -MAX_SINGLE_NORMAL) ||
          (MAX_SINGLE_NORMAL < dbl_res24 && dbl_res24 < POSINFF)) {
        result_huge = 1;
      }
      // at this point, there are no enabled I, D, or Z exceptions; the instr.
      // might lead to an enabled underflow, enabled underflow and inexact,
      // enabled overflow, enabled overflow and inexact, enabled inexact, or
      // none of these; if there are no U or O enabled exceptions, re-execute
      // the instruction using IA-32 double precision format, and the
      // user's rounding mode; exceptions must have been disabled before calling
      // this function; an inexact exception may be reported on the 53-bit
      // fsubp, fmulp, or on both the 53-bit and 24-bit conversions, while an
      // overflow or underflow (with traps disabled) may be reported on the
      // conversion from dbl_res to res
      // check whether there is an underflow, overflow, or inexact trap to be
      // taken
      // if the underflow traps are enabled and the result is tiny, take
      // underflow trap
      if (!(exc_env->exc_masks & UNDERFLOW_MASK) && result_tiny) {
        dbl_res24 = TWO_TO_192 * dbl_res24; // exact
        exc_env->status_flag_underflow = 1;
        exc_env->exception_cause = UNDERFLOW;
        exc_env->result_fval = (float)dbl_res24; // exact
        if (sw & PRECISION_MASK) exc_env->status_flag_inexact = 1;
        return (RAISE_EXCEPTION);
      }
      // if overflow traps are enabled and the result is huge, take
      // overflow trap
      if (!(exc_env->exc_masks & OVERFLOW_MASK) && result_huge) {
        dbl_res24 = TWO_TO_M192 * dbl_res24; // exact
        exc_env->status_flag_overflow = 1;
        exc_env->exception_cause = OVERFLOW;
        exc_env->result_fval = (float)dbl_res24; // exact
        if (sw & PRECISION_MASK) exc_env->status_flag_inexact = 1;
        return (RAISE_EXCEPTION);
      }
      // set control word with rounding mode set to exc_env->rounding_mode,
      // double precision, and all exceptions disabled
      cw = cw | 0x0200; // set precision to double
      __asm {
        fldcw WORD PTR cw;
      }
      switch (exc_env->operation) {
        case ADDPS:
        case ADDSS:
          // perform the addition
          __asm {
            // load input operands
            fld DWORD PTR opd1; // may set the denormal status flag
            fld DWORD PTR opd2; // may set the denormal status flag
            faddp st(1), st(0); // rounded to 53 bits, may set the inexact
                                // status flag
            // store result
            fstp QWORD PTR dbl_res; // exact, will not set any flag
          }
          break;
        case SUBPS:
        case SUBSS:
          // perform the subtraction
          __asm {
            // load input operands
            fld DWORD PTR opd1; // may set the denormal status flag
            fld DWORD PTR opd2; // may set the denormal status flag
            fsubp st(1), st(0); // rounded to 53 bits, may set the inexact
                                // status flag
            // store result
            fstp QWORD PTR dbl_res; // exact, will not set any flag
          }
          break;
        case MULPS:
        case MULSS:
          // perform the multiplication
          __asm {
            // load input operands
            fld DWORD PTR opd1; // may set the denormal status flag
            fld DWORD PTR opd2; // may set the denormal status flag
            fmulp st(1), st(0); // rounded to 53 bits, exact
            // store result
            fstp QWORD PTR dbl_res; // exact, will not set any flag
          }
          break;
        case DIVPS:
        case DIVSS:
          // perform the division
          __asm {
            // load input operands
            fld DWORD PTR opd1; // may set the denormal status flag
            fld DWORD PTR opd2; // may set the denormal status flag
            fdivp st(1), st(0); // rounded to 53 bits, may set the inexact
                                // status flag
            // store result
            fstp QWORD PTR dbl_res; // exact, will not set any flag
          }
          break;
        default:
          ; // will never occur
      }
      // calculate result for the case an inexact trap has to be taken, or
      // when no trap occurs (second IEEE rounding)
      res = (float)dbl_res; // may set P, U or O; may also involve denormalizing
                            // the result
      // read status word
      __asm {
        fstsw WORD PTR sw;
      }
      // if inexact traps are enabled and result is inexact, take inexact trap
      if (!(exc_env->exc_masks & PRECISION_MASK) &&
          ((sw & PRECISION_MASK) || (exc_env->ftz && result_tiny))) {
        exc_env->status_flag_inexact = 1;
        exc_env->exception_cause = INEXACT;
        if (result_tiny) {
          exc_env->status_flag_underflow = 1;
          // if ftz = 1 and result is tiny, result = 0.0
          // (no need to check for underflow traps disabled: result tiny and
          // underflow traps enabled would have caused taking an underflow
          // trap above)
          if (exc_env->ftz) {
            if (res > 0.0)
              res = ZEROF;
            else if (res < 0.0)
              res = NZEROF;
            // else leave res unchanged
          }
        }
        if (result_huge) exc_env->status_flag_overflow = 1;
        exc_env->result_fval = res;
        return (RAISE_EXCEPTION);
      }
      // if it got here, then there is no trap to be taken; the following must
      // hold: ((the MXCSR U exceptions are disabled or
      //
      // the MXCSR underflow exceptions are enabled and the underflow flag is
      // clear and (the inexact flag is set or the inexact flag is clear and
      // the 24-bit result with unbounded exponent is not tiny)))
      // and (the MXCSR overflow traps are disabled or the overflow flag is
      // clear) and (the MXCSR inexact traps are disabled or the inexact flag
      // is clear)
      //
      // in this case, the result has to be delivered (the status flags are
      // sticky, so they are all set correctly already)
      // read status word to see if result is inexact
      __asm {
        fstsw WORD PTR sw;
      }
      if (sw & UNDERFLOW_MASK) exc_env->status_flag_underflow = 1;
      if (sw & OVERFLOW_MASK) exc_env->status_flag_overflow = 1;
      if (sw & PRECISION_MASK) exc_env->status_flag_inexact = 1;
      // if ftz = 1, and result is tiny (underflow traps must be disabled),
      // result = 0.0
      if (exc_env->ftz && result_tiny) {
        if (res > 0.0)
          res = ZEROF;
        else if (res < 0.0)
          res = NZEROF;
        // else leave res unchanged
        exc_env->status_flag_inexact = 1;
        exc_env->status_flag_underflow = 1;
      }
      exc_env->result_fval = res;
      if (sw & ZERODIVIDE_MASK) exc_env->status_flag_divide_by_zero = 1;
      if (sw & DENORMAL_MASK) exc_env->status_flag_denormal_operand = 1;
      if (sw & INVALID_MASK) exc_env->status_flag_invalid_operation = 1;
      return (DO_NOT_RAISE_EXCEPTION);
      break;
    case CMPPS:
    case CMPSS:
      ...
      break;
    case COMISS:
    case UCOMISS:
      ...
      break;
    case CVTPI2PS:
    case CVTSI2SS:
      ...
      break;
    case CVTPS2PI:
    case CVTSS2SI:
    case CVTTPS2PI:
    case CVTTSS2SI:
      ...
      break;
    case MAXPS:
    case MAXSS:
    case MINPS:
    case MINSS:
      ...
      break;
    case SQRTPS:
    case SQRTSS:
      ...
      break;
    case UNSPEC:
      ...
      break;
    default:
      ...
  }
}
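Example F-2 declares the auxiliary functions isnanf and quietf without defining them. A minimal sketch of how they could be written by inspecting the single precision bit pattern is given below; it is one possible implementation, not the one used by the authors of the example. The `sketch_` prefix and the two bit-conversion helpers are hypothetical names, chosen to avoid clashing with any isnanf that a C library may already provide.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float, and back, without conversion. */
static float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

static uint32_t bits_from_float(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Returns 1 if f is a NaN (exponent all ones, nonzero fraction), else 0. */
static int sketch_isnanf(float f)
{
    uint32_t bits = bits_from_float(f);
    return (bits & 0x7f800000u) == 0x7f800000u && (bits & 0x007fffffu) != 0;
}

/* Sets the most significant fraction bit, which turns a signaling NaN into
   a quiet NaN and leaves a quiet NaN unchanged. Intended for NaN inputs
   only. */
static float sketch_quietf(float f)
{
    return float_from_bits(bits_from_float(f) | 0x00400000u);
}
```

Working on the bit pattern avoids floating-point comparisons, which would themselves signal an invalid operation for a signaling NaN.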
INDEX
Numerics
16-bit address size . . . . . . . . . . . . . . . . . . . . . . . . .3-4 operand size . . . . . . . . . . . . . . . . . . . . . . . . .3-4 32-bit address size . . . . . . . . . . . . . . . . . . . . . . . . .3-4 operand size . . . . . . . . . . . . . . . . . . . . . . . . .3-4
B
B (default size) flag, segment descriptor .3-14, 4-3 Base (operand addressing) . . . . . . . . . . .5-9, 5-10 Basic execution environment . . . . . . . . . . . . . . 3-2 B-bit, FPU status word . . . . . . . . . . . . . . . . . . 7-15 BCD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-5 BCD integers . . . . . . . . . . . . . . . . . . . . . . . . . . 5-5 FPU encoding . . . . . . . . . . . . . . . . . . . . . . 7-29 packed. . . . . . . . . . . . . . . . . . . . . . . . .5-5, 6-28 relationship to status flags. . . . . . . . . . . . . 3-12 unpacked. . . . . . . . . . . . . . . . . . . . . . .5-5, 6-28 BH register . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-7 Bias value numeric overflow . . . . . . . . . . . . . . . . . . . . 7-55 numeric underflow. . . . . . . . . . . . . . . . . . . 7-56 Biased exponent. . . . . . . . . . . . . . . . . . . . . . . . 7-5 Binary numbers . . . . . . . . . . . . . . . . . . . . . . . . 1-7 Binary-coded decimal (see BCD) Bit fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-5 Bit order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-5 BOUND instruction . . . . . . . . . . . . 4-17, 6-39, 6-44 BOUND range exceeded exception (#BR) . . . 4-18 BP register . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-7 Branch prediction . . . . . . . . . . . . . . . . . . . . . . . 2-8 Branching, on FPU condition codes . . . .7-15, 7-38 BSF instruction . . . . . . . . . . . . . . . . . . . . . . . . 6-34 BSR instruction. . . . . . . . . . . . . . . . . . . . . . . . 6-34 BSWAP instruction . . . . . . . . . . . . . . . . . .6-3, 6-21 BT instruction . . . . . . . . . . . . . . . . 3-10, 3-12, 6-34 BTC instruction . . . . . . . . . . . . . . . 3-10, 3-12, 6-34 BTR instruction . . . . . . . . . . . . . . . 3-10, 3-12, 6-34 BTS instruction . . . . . . . . . . . . . . . 3-10, 3-12, 6-34 Bus interface unit . . . . . . . . . . . . . . . . . . . . . . . 2-9 BX register . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . 3-7 Byte . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-1 Byte order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-5
A
AAA instruction. . . . . . . . . . . . . . . . . . . . . . . . .6-28 AAD instruction . . . . . . . . . . . . . . . . . . . . . . . .6-28 AAM instruction . . . . . . . . . . . . . . . . . . . . . . . .6-28 AAS instruction. . . . . . . . . . . . . . . . . . . . . . . . .6-28 AC (alignment check) flag, EFLAGS register. .3-13 Access rights, segment descriptor . . . . . . 4-9, 4-13 ADC instruction . . . . . . . . . . . . . . . . . . . . . . . .6-26 ADD instruction . . . . . . . . . . . . . . . . . . . . . . . .6-26 Address size attribute code segment . . . . . . . . . . . . . . . . . . . . . . .3-14 description of . . . . . . . . . . . . . . . . . . . . . . .3-14 of stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-3 Address sizes. . . . . . . . . . . . . . . . . . . . . . . . . . .3-4 Addressing modes assembler . . . . . . . . . . . . . . . . . . . . . . . . . .5-10 base . . . . . . . . . . . . . . . . . . . . . . . . . . 5-9, 5-10 base plus displacement . . . . . . . . . . . . . . .5-10 base plus index plus displacement . . . . . . .5-10 base plus index time scale plus displacement . . . . . . . . . . . . . . . . . . . . .5-10 displacement. . . . . . . . . . . . . . . . . . . . . . . . .5-9 effective address. . . . . . . . . . . . . . . . . . . . . .5-9 immediate operands . . . . . . . . . . . . . . . . . . .5-6 index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-9 index times scale plus displacement . . . . .5-10 memory operands. . . . . . . . . . . . . . . . . . . . .5-7 register operands . . . . . . . . . . . . . . . . . . . . .5-7 scale factor . . . . . . . . . . . . . . . . . . . . . . . . . .5-9 specifying a segment selector . . . . . . . . . . .5-8 specifying an offset . . . . . . . . . . . . . . . . . . . .5-9 Addressing, segments . . . . . . . . . . . . . . . . . . . .1-7 Advanced programmable interrupt controller (see APIC) AF (adjust) flag, EFLAGS register . . . . . . . . . .3-12 AH register . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . .3-7 Alignment of words, doublewords, and quadwords . . . .5-2 AND instruction . . . . . . . . . . . . . . . . . . . . . . . .6-29 APIC, presence of . . . . . . . . . . . . . . . . . . . . . .11-2 Arctangent, FPU operation. . . . . . . . . . . . . . . .7-38 Arithmetic instructions, FPU. . . . . . . . . . . . . . .7-46 Assembler, addressing modes. . . . . . . . . . . . .5-10 AX register . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-7
C
C1 flag, FPU status word . . 7-13, 7-52, 7-55, 7-57 C2 flag, FPU status word . . . . . . . . . . . . . . . . 7-13 Call gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-8 CALL instruction . . . 3-14, 4-4, 4-5, 4-9, 6-36, 6-44 Calls (see Procedure calls) CBW instruction . . . . . . . . . . . . . . . . . . . . . . . 6-26 CDQ instruction . . . . . . . . . . . . . . . . . . . . . . . 6-26 CF (carry) flag, EFLAGS register . . . . . . . . . . 3-12 CH register . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-7 CLC instruction . . . . . . . . . . . . . . . . . . . .3-12, 6-42 CLD instruction . . . . . . . . . . . . . . . . . . . .3-13, 6-42 CLI instruction. . . . . . . . . . . . . . . . . . . . .6-43, 10-4 CMC instruction . . . . . . . . . . . . . . . . . . .3-12, 6-42 CMOVcc instructions . . . . . . . . . . . . . . . .6-2, 6-20
CMP instruction . . . . . . . . . . . . . . . . . . . . . . . .6-27 CMPS instruction . . . . . . . . . . . . . . . . . . 3-13, 6-40 CMPXCHG instruction . . . . . . . . . . . . . . . 6-3, 6-22 CMPXCHG8B instruction . . . . . . . . 6-2, 6-22, 11-2 Code segment . . . . . . . . . . . . . . . . . . . . . . . . . .3-9 Compare compare and exchange . . . . . . . . . . . . . . .6-22 integers . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-27 real numbers, FPU . . . . . . . . . . . . . . . . . . .7-37 strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-40 Compatibility software . . . . . . . . . . . . . . . . . . . . . . . . . . . .1-6 Condition code flags, FPU status word branching on . . . . . . . . . . . . . . . . . . . . . . . .7-15 conditional moves on . . . . . . . . . . . . . . . . .7-15 description of . . . . . . . . . . . . . . . . . . . . . . .7-12 interpretation of. . . . . . . . . . . . . . . . . . . . . .7-14 use of . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-36 Conditional moves, on FPU condition codes . .7-15 Constants (floating-point) descriptions of. . . . . . . . . . . . . . . . . . . . . . .7-34 Cosine, FPU operation. . . . . . . . . . . . . . . . . . .7-38 CPUID instruction. . . . . . . . . 6-2, 6-45, 11-2, 11-4 CS register . . . . . . . . . . . . . . . . . . . . . . . . . 3-7, 3-9 CTI instruction . . . . . . . . . . . . . . . . . . . . . . . . .6-42 Current privilege level (see CPL) Current stack . . . . . . . . . . . . . . . . . . . . . . . 4-2, 4-4 CWD instruction . . . . . . . . . . . . . . . . . . . . . . . .6-26 CWDE instruction. . . . . . . . . . . . . . . . . . . . . . .6-26 CX register . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-7
DE (denormal operand exception) flag, FPU status word. . . . . . . . . . .7-14, 7-54 DEC instruction. . . . . . . . . . . . . . . . . . . . . . . . 6-26 Decimal integers, FPU description of. . . . . . . . . . . . . . . . . . . . . . . 7-29 encodings . . . . . . . . . . . . . . . . . . . . . . . . . 7-29 Deep branch prediction . . . . . . . . . . . . . . . . . . 2-8 Denormal number (see Denormalized finite number) Denormal operand exception (#D) . . . . . . . . . 7-54 Denormalization process . . . . . . . . . . . . . . . . . 7-7 Denormalized finite number . . . . . . . 7-6, 7-25, 9-5 DF (direction) flag, EFLAGS register . . . . . . . 3-13 DH register . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-7 DI register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-7 Dispatch/execute unit . . . . . . . . . . . . . . . . . . . 2-12 Displacement (operand addressing). . . . .5-9, 5-10 DIV instruction . . . . . . . . . . . . . . . . . . . . . . . . 6-27 Division-by-zero exception (#Z) . . . . . . . . . . . 7-53 Double-extended-precision, IEEE floating-point format . . . . . . . . .7-25, 9-5 Double-precision, IEEE floating-point format . . . . . . . . .7-25, 9-5 Double-real floating-point format . . . . . . .7-25, 9-5 Doubleword . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-1 DS register . . . . . . . . . . . . . . . . . . . . . . . . .3-7, 3-9 DX register . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-7 Dynamic data flow analysis . . . . . . . . . . . . . . . 2-8 Dynamic execution . . . . . . . . . . . . . . . . . . . . . . 2-8
D
DAA instruction . . . . . . . . . . . . . . . . . . . . . . . .6-28 DAS instruction . . . . . . . . . . . . . . . . . . . . . . . .6-28 Data pointer, FPU . . . . . . . . . . . . . . . . . . . . . .7-21 Data segment. . . . . . . . . . . . . . . . . . . . . . . . . . .3-9 Data types alignment of words, doublewords, and quadwords . . . . . . . . . . . . . . . . . . . .5-2 BCD integers . . . . . . . . . . . . . . . . . . . 5-5, 6-28 bit fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-5 byte . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-1 doubleword . . . . . . . . . . . . . . . . . . . . . . . . . .5-1 FPU BCD decimal. . . . . . . . . . . . . . . . . . . .7-29 FPU integer. . . . . . . . . . . . . . . . . . . . . . . . .7-27 FPU real number . . . . . . . . . . . . . . . . . . . .7-25 fundamental data types . . . . . . . . . . . . . . . .5-1 integers . . . . . . . . . . . . . . . . . . . 5-3, 6-26, 6-27 packed bytes. . . . . . . . . . . . . . . . . . . . . . . . .8-3 packed doublewords. . . . . . . . . . . . . . . . . . .8-3 packed words . . . . . . . . . . . . . . . . . . . . . . . .8-3 pointers . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-5 quadword . . . . . . . . . . . . . . . . . . . . . . . 5-1, 8-3 strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-5 unsigned integers . . . . . . . . . . . 5-5, 6-26, 6-27 word . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-1
E
EAX register . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6 EBP register . . . . . . . . . . . . . . . . . . . . 3-6, 4-4, 4-7 EBX register . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6 ECX register . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6 EDI register. . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6 EDX register . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6 Effective address . . . . . . . . . . . . . . . . . . . . . . . 5-9 EFLAGS Condition Codes . . . . . . . . . . . . . . . . B-1 EFLAGS register . . . . . . . . . . . . . . . . . . . . . . 3-10 restoring from procedure stack . . . . . . . . . . 4-8 saving on a procedure call . . . . . . . . . . . . . 4-8 status flags . . . . . . . . . . . . . . . 7-15, 7-16, 7-37 EIP register. . . . . . . . . . . . . . . . . . . . . . . .3-8, 3-14 EMMS instruction . . . . . . . . . . . . . . . . . .8-10, 8-12 ENTER instruction . . . . . . . . . . . . . . . . .4-18, 6-41 ES register . . . . . . . . . . . . . . . . . . . . . . . . .3-7, 3-9 ES (exception summary) flag, FPU status word. . . . . . . . . . . . . . . 7-59 ESC instructions, FPU . . . . . . . . . . . . . . . . . . 7-32 ESI register. . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6 ESP register . . . . . . . . . . . . . . . . 3-6, 4-1, 4-3, 4-4 Exception flags, FPU status word. . . . . . . . . . 7-14 Exception handler. . . . . . . . . . . . . . . . . . . . . . 4-11 Exception priority, FPU exceptions. . . . . . . . . 7-57 Exception-flag masks, FPU control word . . . . 7-17
Exceptions BOUND range exceeded (#BR) . . . . . . . . .4-18 description of . . . . . . . . . . . . . . . . . . . . . . .4-11 implicit call to handler . . . . . . . . . . . . . . . . . .4-1 in real-address mode . . . . . . . . . . . . . . . . .4-17 notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1-8 overflow exception (#OF) . . . . . . . . . . . . . .4-17 summary of . . . . . . . . . . . . . . . . . . . . . . . . .4-14 vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-12 Exponent floating-point number . . . . . . . . . . . . . . . . . .7-4 Exponential, FPU operation . . . . . . . . . . . . . . .7-40 Extended real encodings, unsupported . . . . . . . . . . . . . . .7-30 floating-point format . . . . . . . . . . . . . . 7-25, 9-5
F
F2XM1 instruction . . . . . . . . . . . . . . . . . . . . . .7-40 FABS instruction . . . . . . . . . . . . . . . . . . . . . . .7-35 FADD instruction . . . . . . . . . . . . . . . . . . . . . . .7-35 FADDP instruction . . . . . . . . . . . . . . . . . . . . . .7-35 Far call description of . . . . . . . . . . . . . . . . . . . . . . . .4-5 operation. . . . . . . . . . . . . . . . . . . . . . . . . . . .4-6 Far pointer 16-bit addressing . . . . . . . . . . . . . . . . . . . . .3-4 32-bit addressing . . . . . . . . . . . . . . . . . . . . .3-4 description of . . . . . . . . . . . . . . . . . . . . 3-3, 5-5 FBSTP instruction . . . . . . . . . . . . . . . . . . . . . .7-33 FCHS instruction . . . . . . . . . . . . . . . . . . . . . . .7-35 FCLEX/FNCLEX instructions . . . . . . . . . . . . . .7-15 FCMOVcc instructions . . . . . . . . . . 6-2, 7-16, 7-33 FCOM instruction . . . . . . . . . . . . . . . . . . 7-15, 7-36 FCOMI instruction . . . . . . . . . . . . . 6-2, 7-16, 7-36 FCOMIP instruction . . . . . . . . . . . . 6-2, 7-16, 7-36 FCOMP instruction. . . . . . . . . . . . . . . . . 7-15, 7-36 FCOMPP instruction . . . . . . . . . . . . . . . 7-15, 7-36 FCOS instruction . . . . . . . . . . . . . . . . . . 7-13, 7-38 FDIV instruction . . . . . . . . . . . . . . . . . . . . . . . .7-35 FDIVP instruction . . . . . . . . . . . . . . . . . . . . . . .7-35 FDIVR instruction . . . . . . . . . . . . . . . . . . . . . . .7-35 FDIVRP instruction. . . . . . . . . . . . . . . . . . . . . .7-35 Feature determination, of processor . . . . . . . .11-1 Fetch/decode unit. . . . . . . . . . . . . . . . . . . . . . .2-11 FIADD instruction . . . . . . . . . . . . . . . . . . . . . . .7-35 FICOM instruction . . . . . . . . . . . . . . . . . 7-15, 7-36 FICOMP instruction . . . . . . . . . . . . . . . . 7-15, 7-36 FIDIV instruction. . . . . . . . . . . . . . . . . . . . . . . .7-35 FIDIVR instruction . . . . . . . . . . . . . . . . . . . . . .7-35 FILD instruction . . . . . . . . . . . . . . . . 
. . . . . . . .7-33 FIMUL instruction . . . . . . . . . . . . . . . . . . . . . . .7-35 FINIT/FNINIT instructions . 7-15, 7-16, 7-20, 7-41 FIST instruction . . . . . . . . . . . . . . . . . . . . . . . .7-33 FISTP instruction . . . . . . . . . . . . . . . . . . . . . . .7-33 FISUB instruction . . . . . . . . . . . . . . . . . . . . . . .7-35 FISUBR instruction. . . . . . . . . . . . . . . . . . . . . .7-35 Flat memory model . . . . . . . . . . . . . . . . . . 3-2, 3-8
FLD instruction . . . . . . . . . . . . . . . . . . . . . . . . 7-33 FLD1 instruction . . . . . . . . . . . . . . . . . . . . . . . 7-34 FLDCW instruction . . . . . . . . . . . . . . . . .7-16, 7-42 FLDENV instruction . . . . . . 7-15, 7-21, 7-24, 7-42 FLDL2E instruction . . . . . . . . . . . . . . . . . . . . . 7-34 FLDL2T instruction . . . . . . . . . . . . . . . . . . . . . 7-34 FLDLG2 instruction. . . . . . . . . . . . . . . . . . . . . 7-34 FLDLN2 instruction. . . . . . . . . . . . . . . . . . . . . 7-34 FLDPI instruction . . . . . . . . . . . . . . . . . . . . . . 7-34 FLDSW instruction . . . . . . . . . . . . . . . . . . . . . 7-42 FLDZ instruction . . . . . . . . . . . . . . . . . . . . . . . 7-34 Floating-point data types . . . . . . . . . . . . . . . . 7-24 Floating-point exceptions automatic handling . . . . . . . . . . . . . . . . . . 7-47 denormal operand exception. . . . . . . . . . . 7-54 division-by-zero . . . . . . . . . . . . . . . . . . . . . 7-53 exception conditions . . . . . . . . . . . . . . . . . 7-51 exception priority . . . . . . . . . . . . . . . . . . . . 7-57 inexact result (precision) . . . . . . . . . . . . . . 7-57 invalid arithmetic operand . . . . . . . . .7-51, 7-52 numeric overflow . . . . . . . . . . . . . . . . . . . . 7-54 numeric underflow. . . . . . . . . . . . . . . . . . . 7-56 software handling . . . . . . . . . . . . . . . . . . . 7-49 stack overflow . . . . . . . . . . . . . . . . . .7-13, 7-52 stack underflow . . . . . . . . . . . . 7-13, 7-51, 7-52 summary of . . . . . . . . . . . . . . . . . . . . . . . . 7-46 synchronization . . . . . . . . . . . . . . . . . . . . . 7-58 Floating-point format biased exponent . . . . . . . . . . . . . . . . . . . . . 7-5 description of. . . . . . . . . . . . . . . . . . . . . . . 7-24 exponent . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-4 fraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-4 real number system. . . . . . . . . . . . . . . . . . . 
7-2 real numbers . . . . . . . . . . . . . . . . . . . . . . . 7-25 sign . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-4 significand . . . . . . . . . . . . . . . . . . . . . . . . . . 7-4 FMUL instruction . . . . . . . . . . . . . . . . . . . . . . 7-35 FMULP instruction . . . . . . . . . . . . . . . . . . . . . 7-35 FNOP instruction . . . . . . . . . . . . . . . . . . . . . . 7-41 FPATAN instruction . . . . . . . . . . . . . . . .7-38, 7-39 FPREM instruction . . . . . . . . . . . . 7-13, 7-35, 7-39 FPREM1 instruction . . . . . . . . . . . 7-13, 7-35, 7-39 FPTAN instruction . . . . . . . . . . . . . . . . . . . . . 7-13 FPU architecture . . . . . . . . . . . . . . . . . . . . . . . . . 7-8 compatibility with Intel Architecture FPUs and math coprocessors . . . . . . . . . . . . . . . . 7-1 floating-point format . . . . . . . . . . . . . . .7-2, 7-4 IEEE standards . . . . . . . . . . . . . . . . . . . . . . 7-1 presence of . . . . . . . . . . . . . . . . . . . . . . . . 11-2 transcendental instruction accuracy . . . . . 7-40 FPU control word description of. . . . . . . . . . . . . . . . . . . . . . . 7-16 exception-flag masks . . . . . . . . . . . . . . . . 7-17 PC field . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-17 RC field . . . . . . . . . . . . . . . . . . . . . . . .7-18, 9-8 FPU data pointer . . . . . . . . . . . . . . . . . . . . . . 7-21 FPU data registers . . . . . . . . . . . . . . . . . . . . . . 7-9
FPU instruction pointer. . . . . . . . . . . . . . . . . . .7-21 FPU instructions arithmetic vs. non-arithmetic instructions . .7-46 instruction set . . . . . . . . . . . . . . . . . . . . . . .7-31 operands. . . . . . . . . . . . . . . . . . . . . . . . . . .7-32 overview . . . . . . . . . . . . . . . . . . . . . . . . . . .7-31 unsupported . . . . . . . . . . . . . . . . . . . . . . . .7-43 FPU integer description of . . . . . . . . . . . . . . . . . . . . . . .7-27 encodings . . . . . . . . . . . . . . . . . . . . . . . . . .7-28 FPU last opcode. . . . . . . . . . . . . . . . . . . . . . . .7-21 FPU register stack description of . . . . . . . . . . . . . . . . . . . . . . . .7-9 parameter passing . . . . . . . . . . . . . . . . . . .7-11 FPU state image . . . . . . . . . . . . . . . . . . . . . . . . 7-22, 7-23 saving . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-21 FPU status word condition code flags . . . . . . . . . . . . . . . . . .7-12 DE flag . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-54 description of . . . . . . . . . . . . . . . . . . . . . . .7-12 exception flags . . . . . . . . . . . . . . . . . . . . . .7-14 OE flag . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-54 PE flag . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-13 TOP field. . . . . . . . . . . . . . . . . . . . . . . . . . . .7-9 FPU tag word . . . . . . . . . . . . . . . . . . . . . . . . . .7-20 Fraction, floating-point number . . . . . . . . . . . . .7-4 FRNDINT instruction . . . . . . . . . . . . . . . . . . . .7-35 FRSTOR instruction . . . . . . 7-15, 7-21, 7-24, 7-42 FS register . . . . . . . . . . . . . . . . . . . . . . . . . 3-7, 3-9 FSAVE/FNSAVE instructions. . . .7-12, 7-15, 7-20, 7-21, 7-42 FSCALE instruction . . . . . . . . . . . . . . . . . . . . .7-40 FSIN instruction . . . . . . . . . . . . . . . . . . . 7-13, 7-38 FSINCOS instruction . . . . . . . . . . . . . . . 7-13, 7-39 FSQRT instruction . . . . . . . . . . . . . . . . . . 
. . . .7-35 FST instruction . . . . . . . . . . . . . . . . . . . . . . . . .7-33 FSTCW/FNSTCW instructions. . . . . . . . 7-16, 7-42 FSTENV/FNSTENV instructions .7-12, 7-20, 7-21, 7-42 FSTP instruction. . . . . . . . . . . . . . . . . . . . . . . .7-33 FSTSW/FNSTSW instructions . . . . . . . . 7-12, 7-42 FSUB instruction . . . . . . . . . . . . . . . . . . . . . . .7-35 FSUBP instruction . . . . . . . . . . . . . . . . . . . . . .7-35 FSUBR instruction . . . . . . . . . . . . . . . . . . . . . .7-35 FSUBRP instruction . . . . . . . . . . . . . . . . . . . . .7-35 FTST instruction. . . . . . . . . . . . . . . . . . . 7-15, 7-36 FUCOM instruction. . . . . . . . . . . . . . . . . . . . . .7-36 FUCOMI instruction . . . . . . . . . . . . 6-2, 7-16, 7-36 FUCOMIP instruction . . . . . . . . . . . 6-2, 7-16, 7-36 FUCOMP instruction . . . . . . . . . . . . . . . . . . . .7-36 FUCOMPP instruction . . . . . . . . . . . . . . 7-15, 7-36 FXAM instruction . . . . . . . . . . . . . . . . . . 7-13, 7-36 FXCH instruction . . . . . . . . . . . . . . . . . . . . . . .7-33 FXTRACT instruction . . . . . . . . . . . . . . . . . . . .7-35 FYL2X instruction. . . . . . . . . . . . . . . . . . . . . . .7-40 FYL2XP1 instruction . . . . . . . . . . . . . . . . . . . .7-40
G
General-purpose registers . . . . . . . . . . . . .3-5, 3-6 parameter passing . . . . . . . . . . . . . . . . . . . 4-7 GS register . . . . . . . . . . . . . . . . . . . . . . . . .3-7, 3-9
H
Hexadecimal numbers . . . . . . . . . . . . . . . . . . . 1-7
I
ID (identification) flag, EFLAGS register. . . . . 3-14 IDIV instruction . . . . . . . . . . . . . . . . . . . . . . . . 6-27 IE (invalid operation exception) flag, FPU status word. . . . . . . . . . .7-14, 7-52 IEEE 754 and 854 standards for floating-point arithmetic . . . . . . . . . . 7-1 IF (interrupt enable) flag, EFLAGS register . . . . . 3-13, 4-13, 10-5 Immediate operands. . . . . . . . . . . . . . . . . . . . . 5-6 IMUL instruction . . . . . . . . . . . . . . . . . . . . . . . 6-27 IN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-3 IN instruction. . . . . . . . . . . . . . . . . 6-41, 10-3, 10-4 INC instruction . . . . . . . . . . . . . . . . . . . . . . . . 6-26 Indefinite description of. . . . . . . . . . . . . . . . . . . . . . . . 7-8 integer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-28 packed BCD decimal. . . . . . . . . . . . . . . . . 7-30 real . . . . . . . . . . . . . . . . . . . . . . . . . . .7-27, 9-6 Index (operand addressing) . . . . . . . . . . .5-9, 5-10 Inexact result (precision) exception (#P) . . . . 7-57 Inexact result, FPU . . . . . . . . . . . . . . . . . . . . . 7-19 Infinity control flag, FPU control word. . . . . . . 7-20 Infinity, floating-point format . . . . . . . . . . . . . . . 7-8 INIT pin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-10 Input/output (see I/O) INS instruction . . . . . . . . . . . . . . . 6-41, 10-3, 10-4 Instruction decoder . . . . . . . . . . . . . . . . . . . . . 2-11 Instruction operands . . . . . . . . . . . . . . . . . . . . . 1-7 Instruction pointer (EIP register). . . . . . . . . . . 3-14 Instruction pointer, FPU . . . . . . . . . . . . . . . . . 7-21 Instruction pool (reorder buffer) . . . . . . . . . . . 2-11 Instruction prefixes (see Prefixes) Instruction set binary arithmetic instructions. . . . . . . . . . . 6-26 bit scan instructions. . . . . . . . . . . . . . . . . . 6-34 bit test and modify instructions . . . . . . . . . 
6-34 byte-set-on-condition instructions . . . . . . . 6-34 control transfer instructions . . . . . . . . . . . . 6-35 data movement instructions . . . . . . . . . . . 6-20 decimal arithmetic instructions . . . . . . . . . 6-27 EFLAGS instructions. . . . . . . . . . . . . . . . . 6-42 floating-point instructions . . . . . . . . .6-10, 6-12 integer instructions . . . . . . . . . . . . . . . . . . . 6-3 I/O instructions . . . . . . . . . . . . . . . . . . . . . 6-41 lists of . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2 logical instructions. . . . . . . . . . . . . . . . . . . 6-29
MMX instructions . . . . . . . . . . . . . . . . . . . . .8-5 processor identification instruction . . . . . . .6-45 repeating string operations . . . . . . . . . . . . .6-40 rotate instructions . . . . . . . . . . . . . . . . . . . .6-32 segment register instructions . . . . . . . . . . .6-43 shift instructions . . . . . . . . . . . . . . . . . . . . .6-29 software interrupt instructions. . . . . . . . . . .6-39 string operation instructions . . . . . . . . . . . .6-39 summary . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-1 system instructions . . . . . . . . . . . . . . . . . . .6-16 test instruction. . . . . . . . . . . . . . . . . . . . . . .6-35 type conversion instructions . . . . . . . . . . . .6-25 INT instruction . . . . . . . . . . . . . . . . . . . . 4-17, 6-44 Integers . . . . . . . . . . . . . . . . . . . . . 5-3, 6-26, 6-27 Integer, FPU data type description of . . . . . . . . . . . . . . . . . . . . . . .7-27 indefinite . . . . . . . . . . . . . . . . . . . . . . . . . . .7-28 Inter-privilege level call description of . . . . . . . . . . . . . . . . . . . . . . . .4-8 operation. . . . . . . . . . . . . . . . . . . . . . . . . . .4-10 Inter-privilege level return description of . . . . . . . . . . . . . . . . . . . . . . . .4-8 operation. . . . . . . . . . . . . . . . . . . . . . . . . . .4-10 Interrupt gate . . . . . . . . . . . . . . . . . . . . . . . . . .4-13 Interrupt handler. . . . . . . . . . . . . . . . . . . . . . . .4-11 Interrupt vector . . . . . . . . . . . . . . . . . . . . . . . . .4-12 Interrupts description of . . . . . . . . . . . . . . . . . . . . . . .4-11 implicit call to an interrupt handler procedure . . . . . . . . . . . . . . . . . . . . . . .4-13 implicit call to an interrupt handler task. . . .4-17 in real-address mode . . . . . . . . . . . . . . . . .4-17 maskable . . . . . . . . . . . . . . . . . . . . . . . . . .4-12 summary of . . . . . . . . . . . . . . . . . . . . . . . . .4-14 user-defined . . . . . . . . . 
. . . . . . . . . . . . . . .4-12 vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-12 INTn instruction . . . . . . . . . . . . . . . . . . . . . . . .6-39 INTO instruction . . . . . . . . . . . . . . 4-17, 6-39, 6-44 Invalid arithmetic operand exception (#IA), FPU description of . . . . . . . . . . . . . . . . . . . . . . .7-52 masked response to . . . . . . . . . . . . . . . . . .7-53 Invalid operation exception . . . . . . . . . . . . . . .7-51 INVD instruction . . . . . . . . . . . . . . . . . . . . . . . . .6-3 INVLPG instruction. . . . . . . . . . . . . . . . . . . . . . .6-3 IOPL (I/O privilege level) field, EFLAGS register . . . . . . . . . . 3-13, 10-4 IRET instruction 3-14, 4-16, 4-17, 6-36, 6-44, 10-5 I/O address space . . . . . . . . . . . . . . . . . . . . . .10-2 I/O instructions overview of . . . . . . . . . . . . . . . . . . . . 6-41, 10-3 serialization. . . . . . . . . . . . . . . . . . . . . . . . .10-6 I/O map base . . . . . . . . . . . . . . . . . . . . . . . . . .10-5 I/O permission bit map . . . . . . . . . . . . . . . . . . .10-5 I/O ports addressing . . . . . . . . . . . . . . . . . . . . . . . . .10-1 defined . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-1 hardware. . . . . . . . . . . . . . . . . . . . . . . 9-7, 10-1 memory-mapped I/O. . . . . . . . . . . . . . . . . .10-2
ordering . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-6 protected mode I/O . . . . . . . . . . . . . . . . . . 10-4 I/O privilege level (see IOPL) I/O sensitive instructions. . . . . . . . . . . . . . . . . 10-4
J
J-bit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-4 Jcc instructions . . . . . . . . . . . . . . . 3-12, 3-14, 6-37 JMP instruction . . . . . . . . . . . . . . . 3-14, 6-35, 6-44
L
L1 (level 1) cache . . . . . . . . . . . . . . . . . . . .2-7, 2-9 L2 (level 2) cache . . . . . . . . . . . . . . . . . . . .2-7, 2-9 LAHF instruction . . . . . . . . . . . . . . . . . . .3-10, 6-42 Last instruction opcode, FPU . . . . . . . . . . . . . 7-21 LDS instruction . . . . . . . . . . . . . . . . . . . . . . . . 6-44 LEA instruction . . . . . . . . . . . . . . . . . . . . . . . . 6-44 LEAVE instruction. . . . . . . . . . . . . 4-18, 4-24, 6-41 LES instruction . . . . . . . . . . . . . . . . . . . . . . . . 6-44 LGS instruction . . . . . . . . . . . . . . . . . . . . . . . . 6-44 Linear address . . . . . . . . . . . . . . . . . . . . . . . . . 3-2 Linear address space defined . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-2 maximum size . . . . . . . . . . . . . . . . . . . . . . . 3-2 LOCK signal . . . . . . . . . . . . . . . . . . . . . . . . . . 6-21 LODS instruction . . . . . . . . . . . . . . . . . .3-13, 6-40 Log epsilon, FPU operation . . . . . . . . . . . . . . 7-40 Log (base 2), FPU computation . . . . . . . . . . . 7-40 Logical address . . . . . . . . . . . . . . . . . . . . . . . . 3-3 LOOP instructions . . . . . . . . . . . . . . . . . . . . . 6-38 LOOPcc instructions. . . . . . . . . . . . . . . .3-12, 6-38 LSS instruction . . . . . . . . . . . . . . . . . . . . . . . . 6-44
M
Maskable interrupts . . . . . . . . . . . . . . . . . . . . 4-12 Masked responses to denormal operand exception. . . . . . . . . 7-54 to division-by-zero exception. . . . . . . . . . . 7-54 to FPU stack overflow or underflow exception. . . . . . . . . . . . . . . 7-52 to inexact result (precision) exception. . . . 7-57 to invalid arithmetic operation . . . . . . . . . . 7-53 to numeric overflow exception. . . . . . . . . . 7-55 to numeric underflow exception . . . . . . . . 7-56 Masks, exception-flags, FPU control word . . . 7-17 Memory order buffer . . . . . . . . . . . . . . . . . . . . . . . . 2-10 organization. . . . . . . . . . . . . . . . . . . . . .3-2, 3-3 subsystem . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9 Memory interface unit . . . . . . . . . . . . . . . . . . . . 2-9 Memory operands. . . . . . . . . . . . . . . . . . . . . . . 5-7 Memory-mapped I/O. . . . . . . . . . . . . . . .10-1, 10-2 MESI (modified, exclusive, shared, invalid) cache protocol . . . . . . . . . . . . . . . . . 2-9
Microarchitecture detailed description. . . . . . . . . . . . . . . . . . . .2-9 overview . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-6 Micro-ops . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-11 MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7 registers . . . . . . . . . . . . . . . . . .8-2 MMX technology arithmetic instructions . . . . . . . . . . . . . . . . . .8-9 comparison instructions . . . . . . . . . . . . . . . .8-9 compatibility with FPU architecture. . . . . . .8-11 conversion instructions . . . . . . . . . . . . . . . .8-10 CPUID instruction . . . . . . . . . . . . . . . . . . . .11-2 data transfer instructions . . . . . . . . . . . . . . .8-7 data types . . . . . . . . . . . . . . . . . . . . . . . . . . .8-3 detecting MMX technology with CPUID instruction . . . . . . . . . . . . . . . . .8-11 detecting with CPUID instruction . . . . . . . .8-12 effect of instruction prefixes on MMX instructions. . . . . . . . . . . . . . . . . .8-11 EMMS instruction . . . . . . . . . . . . . . . . . . . .8-10 exception handling in MMX code . . . . . . . .8-16 instruction operands . . . . . . . . . . . . . . . . . . .8-7 instruction set . . . . . . . . . . . . . . . . . . . . 8-5, 8-7 interfacing with MMX code . . . . . . . . . . . . .8-13 introduction to . . . . . . . . . . . . . . . . . . . . . . . .8-1 logical instructions . . . . . . . . . . . . . . . . . . .8-10 memory data formats . . . . . . . . . . . . . . . . . .8-4 mixing MMX and floating-point instructions . . . . . . . . . . . . . . . . . . . . . .8-14 programming environment (overview) . . . . .8-1 register data formats. . . . . . . . . . . . . . . . . . .8-5 register mapping . . . . . . . . . . . . . . . . . . . . .8-16 registers . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-2 saturation arithmetic . . . . . . . . . . . . . . . . . . .8-6 shift instructions . . . . . . . . . . . . . . . . . . . . .8-10 SIMD execution environment . . . . . . . . . . . 
.8-4 support for, determining . . . . . . . . . . . . . . . .11-2 using MMX code in a multitasking operating system environment . . . . . . .8-15 using the EMMS instruction . . . . . . . . . . . .8-12 wraparound mode. . . . . . . . . . . . . . . . . . . . .8-6 Modes, operating . . . . . . . . . . . . . . . . . . . . . . . .3-4 MOV instruction . . . . . . . . . . . . . . . . . . . 6-20, 6-43 MOVD instruction . . . . . . . . . . . . . . . . . . . . . . . .8-7 MOVQ instruction. . . . . . . . . . . . . . . . . . . . . . . .8-7 MOVS instruction . . . . . . . . . . . . . . . . . . 3-13, 6-40 MOVSX instruction. . . . . . . . . . . . . . . . . . . . . .6-26 MOVZX instruction . . . . . . . . . . . . . . . . . . . . . .6-26 MTRRs (memory type range registers) presence of . . . . . . . . . . . . . . . . . . . . . . . . .11-2 MUL instruction . . . . . . . . . . . . . . . . . . . . . . . .6-27
N
NaN description of . . . . . . . . . . . . . . . . . . . . 7-5, 7-8 encoding of . . . . . . . . . . . . . . . . . . . . . 7-6, 7-27 operating on . . . . . . . . . . . . . . . . . . . . . . . .7-43 SNaNs vs. QNaNs . . . . . . . . . . . . . . . . . . . 7-8 Near call description of. . . . . . . . . . . . . . . . . . . . . . . . 4-5 operation . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-5 Near pointer description of. . . . . . . . . . . . . . . . . . . . . . . . 5-5 Near return operation . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-5 Near return operation . . . . . . . . . . . . . . . . . . . . 4-6 NEG instruction . . . . . . . . . . . . . . . . . . . . . . . 6-27 Non-arithmetic instructions, FPU . . . . . . . . . . 7-46 Non-number encodings, FPU . . . . . . . . . . . . . . 7-5 Non-waiting instructions . . . . . . . . . . . . .7-42, 7-49 NOP instruction . . . . . . . . . . . . . . . . . . . . . . . 6-45 Normalized finite number . . . . . . . . . . . . . .7-4, 7-6 NOT instruction. . . . . . . . . . . . . . . . . . . . . . . . 6-29 Notation bit and byte order . . . . . . . . . . . . . . . . . . . . 1-5 exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . 1-8 hexadecimal and binary numbers . . . . . . . . 1-7 instruction operands . . . . . . . . . . . . . . . . . . 1-7 reserved bits . . . . . . . . . . . . . . . . . . . . . . . . 1-6 segmented addressing . . . . . . . . . . . . . . . . 1-7 Notational conventions . . . . . . . . . . . . . . . . . . . 1-5 NT (nested task) flag, EFLAGS register . . . . . 3-13 Numeric overflow exception (#O) . . . . . .7-13, 7-54 Numeric underflow exception (#U) . . . . .7-13, 7-56
O
OE (numeric overflow exception) flag, FPU status word. . . . . . . . . . .7-14, 7-55 OF (overflow) flag, EFLAGS register . . .3-12, 4-17 Offset (operand addressing). . . . . . . . . . . . . . . 5-9 Operand FPU instructions . . . . . . . . . . . . . . . . . . . . 7-32 instruction . . . . . . . . . . . . . . . . . . . . . . . . . . 1-7 Operand addressing, modes . . . . . . . . . . . . . . 5-6 Operand sizes . . . . . . . . . . . . . . . . . . . . . . . . . 3-4 Operand-size attribute code segment . . . . . . . . . . . . . . . . . . . . . . 3-14 description of. . . . . . . . . . . . . . . . . . . . . . . 3-14 Operating modes . . . . . . . . . . . . . . . . . . . . . . . 3-4 OR instruction. . . . . . . . . . . . . . . . . . . . . . . . . 6-29 Ordering I/O . . . . . . . . . . . . . . . . . . . . . . . . . . 10-6 OUT instruction. . . . . . . . . . . . . . . 6-41, 10-3, 10-4 OUTS instruction . . . . . . . . . . . . . 6-41, 10-3, 10-4 Overflow exception (#OF). . . . . . . . . . . . . . . . 4-17 Overflow, FPU exception (see Numeric overflow exception) Overflow, FPU stack. . . . . . . . . . . . . . . .7-51, 7-52
P
P6 family processors microarchitecture. . . . . . . . . . . . . . . . . .2-6, 2-9 Packed BCD integers . . . . . . . . . . . . . . . . . . . . 5-5
Packed bytes data type . . . . . . . . . . . . . . . . . . .8-3 Packed decimal indefinite . . . . . . . . . . . . . . . .7-30 Packed doublewords data type . . . . . . . . . . . . .8-3 Packed words data type. . . . . . . . . . . . . . . . . . .8-3 Parameter passing argument list . . . . . . . . . . . . . . . . . . . . . . . . .4-7 FPU register stack . . . . . . . . . . . . . . . . . . .7-11 on procedure stack . . . . . . . . . . . . . . . . . . . .4-7 on the procedure stack . . . . . . . . . . . . . . . . .4-7 through general-purpose registers . . . . . . . .4-7 PC (precision) field, FPU control word . . . . . . .7-17 PE (inexact result exception) flag, FPU status word 7-13, 7-14, 7-19, 7-57 PF (parity) flag, EFLAGS register . . . . . . . . . .3-12 Physical address space . . . . . . . . . . . . . . . . . . .3-2 Physical memory . . . . . . . . . . . . . . . . . . . . . . . .3-2 Pi description of FPU constant . . . . . . . . . . . .7-39 Pointers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-5 POP instruction . . . . . . . . . . . 4-1, 4-3, 6-24, 6-43 POPA instruction . . . . . . . . . . . . . . . . . . . 4-8, 6-24 POPF instruction . . . . . . . . . 3-10, 4-8, 6-42, 10-5 POPFD instruction . . . . . . . . . . . . . 3-10, 4-8, 6-42 Privilege levels description of . . . . . . . . . . . . . . . . . . . . . . . .4-9 inter-privilege level calls . . . . . . . . . . . . . . . .4-8 stack switching . . . . . . . . . . . . . . . . . . . . . .4-13 Procedure calls description of . . . . . . . . . . . . . . . . . . . . . . . .4-5 far call . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-5 for block-structured languages . . . . . . . . . .4-18 inter-privilege level call . . . . . . . . . . . . . . . .4-10 linking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-4 near call . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-5 overview . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-1 procedure stack . . . . . . . . . . . . . . . 
. . . . . . .4-1 return instruction pointer (EIP register). . . . .4-4 saving procedure state information. . . . . . . .4-7 stack switching . . . . . . . . . . . . . . . . . . . . . . .4-9 to exception handler procedure . . . . . . . . .4-13 to exception task. . . . . . . . . . . . . . . . . . . . .4-17 to interrupt handler procedure . . . . . . . . . .4-13 to interrupt task . . . . . . . . . . . . . . . . . . . . . .4-17 to other privilege levels . . . . . . . . . . . . . . . . .4-8 types of . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-1 Procedure stack address-size attribute . . . . . . . . . . . . . . . . . .4-3 alignment of stack pointer. . . . . . . . . . . . . . .4-3 current stack . . . . . . . . . . . . . . . . . . . . . 4-2, 4-4 description of . . . . . . . . . . . . . . . . . . . . . . . .4-1 EIP register (return instruction pointer). . . . .4-4 maximum size. . . . . . . . . . . . . . . . . . . . . . . .4-1 number allowed . . . . . . . . . . . . . . . . . . . . . .4-1 passing parameters on . . . . . . . . . . . . . . . . .4-7 popping values from . . . . . . . . . . . . . . . . . . .4-1 procedure linking information . . . . . . . . . . . .4-4 pushing values on. . . . . . . . . . . . . . . . . . . . .4-1 return instruction pointer . . . . . . . . . . . . . . . .4-4 SS register . . . . . . . . . . . . . . . . . . . . . . . . . .4-1
stack pointer . . . . . . . . . . . . . . . . . . . . . . . . 4-1 stack segment . . . . . . . . . . . . . . . . . . . . . . . 4-1 stack-frame base pointer, EBP register . . . 4-4 switching . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-9 top of stack . . . . . . . . . . . . . . . . . . . . . . . . . 4-1 width . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-3 Processor identification earlier Intel architecture processors . . . . . 11-4 using CPUID instruction . . . . . . . . . . . . . . 11-2 Processor state information, saving on a procedure call . . . . 4-7 Protected mode description of. . . . . . . . . . . . . . . . . . . . . . . . 3-4 I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-4 Pseudo-denormal number . . . . . . . . . . . . . . . 7-30 Pseudo-infinity . . . . . . . . . . . . . . . . . . . . . . . . 7-30 Pseudo-NaN . . . . . . . . . . . . . . . . . . . . . . . . . . 7-30 PUSH instruction . . . . . . . . . . 4-1, 4-3, 6-23, 6-43 PUSHA instruction . . . . . . . . . . . . . . . . . .4-8, 6-23 PUSHF instruction . . . . . . . . . . . . . 3-10, 4-8, 6-42 PUSHFD instruction . . . . . . . . . . . . 3-10, 4-8, 6-42
Q
QNaN description of. . . . . . . . . . . . . . . . . . . . . . . . 7-8 operating on . . . . . . . . . . . . . . . . . . . . . . . 7-43 rules for generating . . . . . . . . . . . . . . . . . . 7-44 Quadword . . . . . . . . . . . . . . . . . . . . . . . . . .5-1, 8-3 Quiet NaN (see QNaN)
R
RC (rounding control) field, FPU control word . . . . . . . . . . . . . . .7-18, 9-8 RCL instruction . . . . . . . . . . . . . . . . . . . . . . . . 6-33 RCR instruction . . . . . . . . . . . . . . . . . . . . . . . 6-33 RDMSR instruction . . . . . . . . . . . . . . . . . .6-2, 11-2 RDPMC instruction . . . . . . . . . . . . . . . . . . . . . . 6-2 RDTSC instruction . . . . . . . . . . . . . . . . . .6-2, 11-2 Real numbers encoding . . . . . . . . . . . . . . . . . . . 7-5, 7-6, 7-27 floating-point format . . . . . . . . . . . . . . . . . 7-25 indefinite . . . . . . . . . . . . . . . . . . . . . . .7-27, 9-6 notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-5 system. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-2 Real-address mode . . . . . . . . . . . . . . . . . . . . . 3-4 handling exceptions in. . . . . . . . . . . . . . . . 4-17 handling interrupts in. . . . . . . . . . . . . . . . . 4-17 Register operands . . . . . . . . . . . . . . . . . . . . . . 5-7 Register stack, FPU . . . . . . . . . . . . . . . . . . . . . 7-9 Registers EFLAGS register . . . . . . . . . . . . . . . . . . . . 3-10 EIP register . . . . . . . . . . . . . . . . . . . . . . . . 3-14 general-purpose registers . . . . . . . . . . .3-5, 3-6 segment registers . . . . . . . . . . . . . . . . .3-5, 3-7 Related literature . . . . . . . . . . . . . . . . . . . . . . . 1-9 REP/REPE/REPZ/REPNE/REPNZ prefixes. . . . . . . . . . . . . . . . . .6-40, 10-4 Reserved bits . . . . . . . . . . . . . . . . . . . . . . . . . . 1-6
RESET pin . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-10 RET instruction. . . . . . . . 3-14, 4-4, 4-5, 6-36, 6-44 Retirement unit. . . . . . . . . . . . . . . . . . . . . . . . .2-13 Return instruction pointer . . . . . . . . . . . . . . . . . .4-4 Returns, from procedure calls exception handler, return from . . . . . . . . . .4-13 far return . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-6 interrupt handler, return from . . . . . . . . . . .4-13 inter-privilege level return . . . . . . . . . . . . . .4-10 near return . . . . . . . . . . . . . . . . . . . . . . . . . .4-5 RF (resume) flag, EFLAGS register . . . . . . . . .3-13 ROL instruction . . . . . . . . . . . . . . . . . . . . . . . .6-33 ROR instruction . . . . . . . . . . . . . . . . . . . . . . . .6-33 Rounding control, RC field of FPU control word . 7-18, 9-8 modes, FPU . . . . . . . . . . . . . . . . . . . . 7-18, 9-8 results, FPU . . . . . . . . . . . . . . . . . . . . . . . .7-19 RSM instruction . . . . . . . . . . . . . . . . . . . . . . . . .6-2
S
SAHF instruction . . . . . . . . . . . . . . . . . . 3-10, 6-42 SAL instruction . . . . . . . . . . . . . . . . . . . . . . . . .6-29 SAR instruction . . . . . . . . . . . . . . . . . . . . . . . .6-30 Saturation arithmetic (MMX instructions) . . . . . .8-6 Saving the FPU state . . . . . . . . . . . . . . . . . . . .7-21 SBB instruction. . . . . . . . . . . . . . . . . . . . . . . . .6-26 Scale (operand addressing) . . . . . . . . . . . 5-9, 5-10 Scale, FPU operation . . . . . . . . . . . . . . . . . . . .7-40 Scaling bias value . . . . . . . . . . . . . . . . . 7-55, 7-56 SCAS instruction . . . . . . . . . . . . . . . . . . 3-13, 6-40 Segment registers description of . . . . . . . . . . . . . . . . . . . . 3-5, 3-7 Segment selector description of . . . . . . . . . . . . . . . . . . . . 3-3, 3-7 specifying . . . . . . . . . . . . . . . . . . . . . . . . . . .5-8 Segmented addressing . . . . . . . . . . . . . . . . . . .1-7 Segmented memory model . . . . . . . . . . . . 3-3, 3-8 Segments defined . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-3 maximum number . . . . . . . . . . . . . . . . . . . . .3-3 Serialization of I/O instructions. . . . . . . . . . . . .10-6 SETcc instructions . . . . . . . . . . . . . . . . . 3-12, 6-34 SF (sign) flag, EFLAGS register. . . . . . . . . . . .3-12 SF (stack fault) flag, FPU status word . . 7-15, 7-52 SHL instruction. . . . . . . . . . . . . . . . . . . . . . . . .6-29 SHLD instruction . . . . . . . . . . . . . . . . . . . . . . .6-32 SHR instruction . . . . . . . . . . . . . . . . . . . . . . . .6-29 SHRD instruction . . . . . . . . . . . . . . . . . . . . . . .6-32 SI register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-7 Signaling NaN (see SNaN) Signed infinity. . . . . . . . . . . . . . . . . . . . . . . . . . .7-8 Signed zero . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-6 Significand of floating-point number . . . . . . . . . . . . . . . .7-4 Sign, floating-point number . . . . . . . . 
. . . . . . . .7-4
SIMD (single-instruction, multiple-data) execution model . . . 8-4
Sine, FPU operation . . . 7-38
Single-precision, IEEE floating-point format . . . 7-25, 9-5
Single-real floating-point format . . . 7-25, 9-5
SNaN
  description of . . . 7-8
  operating on . . . 7-43
  typical uses of . . . 7-43
SP register . . . 3-7
Speculative execution . . . 2-7, 2-8
SS register . . . 3-7, 3-9, 4-1
Stack alignment . . . 4-3
Stack fault, FPU . . . 7-15
Stack overflow and underflow exceptions (#IS), FPU . . . 7-52
Stack overflow exception, FPU . . . 7-13, 7-51
Stack pointer (ESP register) . . . 4-1
Stack segment . . . 3-9
Stack switching
  on calls to interrupt and exception handlers . . . 4-13
  on inter-privilege level calls . . . 4-10, 4-16
Stack underflow exception, FPU . . . 7-13, 7-51
Stack (see Procedure stack)
Stack-frame base pointer, EBP register . . . 4-4
Status flags, EFLAGS register . . . 3-12, 7-15, 7-16, 7-37
STC instruction . . . 3-12, 6-42
STD instruction . . . 3-13, 6-42
STI instruction . . . 6-42, 6-43, 10-4
STOS instruction . . . 3-13, 6-40
Strings . . . 5-5
ST(0), top-of-stack register . . . 7-11
SUB instruction . . . 6-26
Superscalar . . . 2-7
Synchronization, of floating-point exceptions . . . 7-58
System flags, EFLAGS register . . . 3-13
System management mode (SMM) . . . 3-4
T
Tangent, FPU operation . . . 7-38
Task gate . . . 4-17
Task state segment (see TSS)
Tasks
  exception handler . . . 4-17
  interrupt handler . . . 4-17
TEST instruction . . . 6-35
TF (trap) flag, EFLAGS register . . . 3-13
Tiny number . . . 7-7
TOP (stack TOP) field, FPU status word . . . 7-9
Transcendental instruction accuracy . . . 7-40
Trap gate . . . 4-13
TSS
  I/O map base . . . 10-5
  I/O permission bit map . . . 10-5
  saving state of EFLAGS register . . . 3-10
U
UD2 instruction . . . 6-2, 6-45
UE (numeric underflow exception) flag, FPU status word . . . 7-14, 7-56
Underflow, FPU exception (see Numeric underflow exception)
Underflow, FPU stack . . . 7-51, 7-52
Underflow, numeric . . . 7-7
Un-normal number . . . 7-30
Unsigned integers . . . 5-5, 6-26, 6-27
Unsupported floating-point formats . . . 7-30
Unsupported FPU instructions . . . 7-43
V
Vector (see Interrupt vector)
VIF (virtual interrupt) flag, EFLAGS register . . . 3-13
VIP (virtual interrupt pending) flag, EFLAGS register . . . 3-13
Virtual 8086 mode
  description of . . . 3-13
  memory model . . . 3-4
VM (virtual 8086 mode) flag, EFLAGS register . . . 3-13
W
Waiting instructions . . . 7-42
WAIT/FWAIT instructions . . . 7-42, 7-59
WBINVD instruction . . . 6-3
Word . . . 5-1
Wraparound mode (MMX instructions) . . . 8-6
WRMSR instruction . . . 6-2, 11-2
X
XADD instruction . . . 6-3, 6-22
XCHG instruction . . . 6-21
XLAT/XLATB instruction . . . 6-45
XOR instruction . . . 6-29
Z
ZE (division-by-zero exception) flag, FPU status word . . . 7-14
Zero, floating-point format . . . 7-6
ZF (zero) flag, EFLAGS register . . . 3-12

Volume 2: Instruction Set Reference
1999
TABLE OF CONTENTS
CHAPTER 1
ABOUT THIS MANUAL
1.1. OVERVIEW OF THE INTEL ARCHITECTURE SOFTWARE DEVELOPER'S MANUAL, VOLUME 2: INSTRUCTION SET REFERENCE . . . 1-1
1.2. OVERVIEW OF THE INTEL ARCHITECTURE SOFTWARE DEVELOPER'S MANUAL, VOLUME 1: BASIC ARCHITECTURE . . . 1-2
1.3. OVERVIEW OF THE INTEL ARCHITECTURE SOFTWARE DEVELOPER'S MANUAL, VOLUME 3: SYSTEM PROGRAMMING GUIDE . . . 1-3
1.4. NOTATIONAL CONVENTIONS . . . 1-5
  1.4.1. Bit and Byte Order . . . 1-5
  1.4.2. Reserved Bits and Software Compatibility . . . 1-6
  1.4.3. Instruction Operands . . . 1-7
  1.4.4. Hexadecimal and Binary Numbers . . . 1-7
  1.4.5. Segmented Addressing . . . 1-7
  1.4.6. Exceptions . . . 1-8
1.5. RELATED LITERATURE . . . 1-9

CHAPTER 2
INSTRUCTION FORMAT
2.1. GENERAL INSTRUCTION FORMAT . . . 2-1
2.2. INSTRUCTION PREFIXES . . . 2-1
2.3. OPCODE . . . 2-2
2.4. MODR/M AND SIB BYTES . . . 2-2
2.5. DISPLACEMENT AND IMMEDIATE BYTES . . . 2-3
2.6. ADDRESSING-MODE ENCODING OF MODR/M AND SIB BYTES . . . 2-3

CHAPTER 3
INSTRUCTION SET REFERENCE
3.1. INTERPRETING THE INSTRUCTION REFERENCE PAGES . . . 3-1
  3.1.1. Instruction Format . . . 3-1
    3.1.1.1. Opcode Column . . . 3-2
    3.1.1.2. Instruction Column . . . 3-3
    3.1.1.3. Description Column . . . 3-5
    3.1.1.4. Description . . . 3-5
  3.1.2. Operation . . . 3-6
  3.1.3. Intel C/C++ Compiler Intrinsics Equivalent . . . 3-9
    3.1.3.1. The Intrinsics API . . . 3-9
    3.1.3.2. MMX Technology Intrinsics . . . 3-10
    3.1.3.3. SIMD Floating-Point Intrinsics . . . 3-10
  3.1.4. Flags Affected . . . 3-11
  3.1.5. FPU Flags Affected . . . 3-12
  3.1.6. Protected Mode Exceptions . . . 3-12
  3.1.7. Real-Address Mode Exceptions . . . 3-12
  3.1.8. Virtual-8086 Mode Exceptions . . . 3-13
  3.1.9. Floating-Point Exceptions . . . 3-14
  3.1.10. SIMD Floating-Point Exceptions - Streaming SIMD Extensions Only . . . 3-14
3.2. INSTRUCTION REFERENCE . . . 3-16
  AAA - ASCII Adjust After Addition . . . 3-17
  AAD - ASCII Adjust AX Before Division . . . 3-18
  AAM - ASCII Adjust AX After Multiply . . . 3-19
  AAS - ASCII Adjust AL After Subtraction . . . 3-20
  ADC - Add with Carry . . . 3-21
  ADD - Add . . . 3-23
  ADDPS - Packed Single-FP Add . . . 3-25
  ADDSS - Scalar Single-FP Add . . . 3-27
  AND - Logical AND . . . 3-30
  ANDNPS - Bit-wise Logical And Not For Single-FP . . . 3-32
  ANDPS - Bit-wise Logical And For Single FP . . . 3-34
  ARPL - Adjust RPL Field of Segment Selector . . . 3-36
  BOUND - Check Array Index Against Bounds . . . 3-38
  BSF - Bit Scan Forward . . . 3-40
  BSR - Bit Scan Reverse . . . 3-42
  BSWAP - Byte Swap . . . 3-44
  BT - Bit Test . . . 3-45
  BTC - Bit Test and Complement . . . 3-47
  BTR - Bit Test and Reset . . . 3-49
  BTS - Bit Test and Set . . . 3-51
  CALL - Call Procedure . . . 3-53
  CBW/CWDE - Convert Byte to Word/Convert Word to Doubleword . . . 3-64
  CDQ - Convert Double to Quad . . . 3-65
  CLC - Clear Carry Flag . . . 3-66
  CLD - Clear Direction Flag . . . 3-67
  CLI - Clear Interrupt Flag . . . 3-68
  CLTS - Clear Task-Switched Flag in CR0 . . . 3-70
  CMC - Complement Carry Flag . . . 3-71
  CMOVcc - Conditional Move . . . 3-72
  CMP - Compare Two Operands . . . 3-76
  CMPPS - Packed Single-FP Compare . . . 3-78
  CMPS/CMPSB/CMPSW/CMPSD - Compare String Operands . . . 3-87
  CMPSS - Scalar Single-FP Compare . . . 3-90
  CMPXCHG - Compare and Exchange . . . 3-100
  CMPXCHG8B - Compare and Exchange 8 Bytes . . . 3-102
  COMISS - Scalar Ordered Single-FP Compare and Set EFLAGS . . . 3-104
  CPUID - CPU Identification . . . 3-111
  CVTPI2PS - Packed Signed INT32 to Packed Single-FP Conversion . . . 3-119
  CVTPS2PI - Packed Single-FP to Packed INT32 Conversion . . . 3-123
  CVTSI2SS - Scalar Signed INT32 to Single-FP Conversion . . . 3-127
  CVTSS2SI - Scalar Single-FP to Signed INT32 Conversion . . . 3-130
  CVTTPS2PI - Packed Single-FP to Packed INT32 Conversion (Truncate) . . . 3-133
  CVTTSS2SI - Scalar Single-FP to Signed INT32 Conversion (Truncate) . . . 3-137
  CWD/CDQ - Convert Word to Doubleword/Convert Doubleword to Quadword . . . 3-141
  CWDE - Convert Word to Doubleword . . . 3-142
  DAA - Decimal Adjust AL after Addition . . . 3-143
  DAS - Decimal Adjust AL after Subtraction . . . 3-145
  DEC - Decrement by 1 . . . 3-146
  DIV - Unsigned Divide . . . 3-148
  DIVPS - Packed Single-FP Divide . . . 3-151
  DIVSS - Scalar Single-FP Divide . . . 3-154
  EMMS - Empty MMX State . . . 3-156
  ENTER - Make Stack Frame for Procedure Parameters . . . 3-158
  F2XM1 - Compute 2^x - 1 . . . 3-161
  FABS - Absolute Value . . . 3-163
  FADD/FADDP/FIADD - Add . . . 3-165
  FBLD - Load Binary Coded Decimal . . . 3-169
  FBSTP - Store BCD Integer and Pop . . . 3-171
  FCHS - Change Sign . . . 3-174
  FCLEX/FNCLEX - Clear Exceptions . . . 3-176
  FCMOVcc - Floating-Point Conditional Move . . . 3-178
  FCOM/FCOMP/FCOMPP - Compare Real . . . 3-180
  FCOMI/FCOMIP/FUCOMI/FUCOMIP - Compare Real and Set EFLAGS . . . 3-183
  FCOS - Cosine . . . 3-186
  FDECSTP - Decrement Stack-Top Pointer . . . 3-188
  FDIV/FDIVP/FIDIV - Divide . . . 3-189
  FDIVR/FDIVRP/FIDIVR - Reverse Divide . . . 3-193
  FFREE - Free Floating-Point Register . . . 3-197
  FICOM/FICOMP - Compare Integer . . . 3-198
  FILD - Load Integer . . . 3-200
  FINCSTP - Increment Stack-Top Pointer . . . 3-202
  FINIT/FNINIT - Initialize Floating-Point Unit . . . 3-203
  FIST/FISTP - Store Integer . . . 3-205
  FLD - Load Real . . . 3-208
  FLD1/FLDL2T/FLDL2E/FLDPI/FLDLG2/FLDLN2/FLDZ - Load Constant . . . 3-210
  FLDCW - Load Control Word . . . 3-212
  FLDENV - Load FPU Environment . . . 3-214
  FMUL/FMULP/FIMUL - Multiply . . . 3-216
  FNOP - No Operation . . . 3-220
  FPATAN - Partial Arctangent . . . 3-221
  FPREM - Partial Remainder . . . 3-223
  FPREM1 - Partial Remainder . . . 3-226
  FPTAN - Partial Tangent . . . 3-229
  FRNDINT - Round to Integer . . . 3-231
  FRSTOR - Restore FPU State . . . 3-232
  FSAVE/FNSAVE - Store FPU State . . . 3-235
  FSCALE - Scale . . . 3-238
  FSIN - Sine . . . 3-240
  FSINCOS - Sine and Cosine . . . 3-242
  FSQRT - Square Root . . . 3-244
  FST/FSTP - Store Real . . . 3-246
  FSTCW/FNSTCW - Store Control Word . . . 3-249
  FSTENV/FNSTENV - Store FPU Environment . . . 3-251
  FSTSW/FNSTSW - Store Status Word . . . 3-254
  FSUB/FSUBP/FISUB - Subtract . . . 3-257
  FSUBR/FSUBRP/FISUBR - Reverse Subtract . . . 3-261
  FTST - TEST . . . 3-265
  FUCOM/FUCOMP/FUCOMPP - Unordered Compare Real . . . 3-267
  FWAIT - Wait . . . 3-270
  FXAM - Examine . . . 3-271
  FXCH - Exchange Register Contents . . . 3-273
  FXRSTOR - Restore FP and MMX State and Streaming SIMD Extension State . . . 3-275
  FXSAVE - Store FP and MMX State and Streaming SIMD Extension State . . . 3-279
  FXTRACT - Extract Exponent and Significand . . . 3-285
  FYL2X - Compute y * log2(x) . . . 3-287
  FYL2XP1 - Compute y * log2(x + 1) . . . 3-289
  HLT - Halt . . . 3-291
  IDIV - Signed Divide . . . 3-292
  IMUL - Signed Multiply . . . 3-295
  IN - Input from Port . . . 3-299
  INC - Increment by 1 . . . 3-301
  INS/INSB/INSW/INSD - Input from Port to String . . . 3-303
  INT n/INTO/INT 3 - Call to Interrupt Procedure . . . 3-306
  INVD - Invalidate Internal Caches . . . 3-318
  INVLPG - Invalidate TLB Entry . . . 3-320
  IRET/IRETD - Interrupt Return . . . 3-321
  Jcc - Jump if Condition Is Met . . . 3-329
  JMP - Jump . . . 3-333
  LAHF - Load Status Flags into AH Register . . . 3-341
  LAR - Load Access Rights Byte . . . 3-342
  LDMXCSR - Load Streaming SIMD Extension Control/Status . . . 3-345
  LDS/LES/LFS/LGS/LSS - Load Far Pointer . . . 3-349
  LEA - Load Effective Address . . . 3-353
  LEAVE - High Level Procedure Exit . . . 3-355
  LES - Load Full Pointer . . . 3-357
  LFS - Load Full Pointer . . . 3-358
  LGDT/LIDT - Load Global/Interrupt Descriptor Table Register . . . 3-359
  LGS - Load Full Pointer . . . 3-361
  LLDT - Load Local Descriptor Table Register . . . 3-362
  LIDT - Load Interrupt Descriptor Table Register . . . 3-364
  LMSW - Load Machine Status Word . . . 3-365
  LOCK - Assert LOCK# Signal Prefix . . . 3-367
  LODS/LODSB/LODSW/LODSD - Load String . . . 3-369
  LOOP/LOOPcc - Loop According to ECX Counter . . . 3-372
  LSL - Load Segment Limit . . . 3-375
  LSS - Load Full Pointer . . . 3-379
  LTR - Load Task Register . . . 3-380
  MASKMOVQ - Byte Mask Write . . . 3-382
  MAXPS - Packed Single-FP Maximum . . . 3-386
  MAXSS - Scalar Single-FP Maximum . . . 3-390
  MINPS - Packed Single-FP Minimum . . . 3-394
  MINSS - Scalar Single-FP Minimum . . . 3-398
  MOV - Move . . . 3-402
  MOV - Move to/from Control Registers . . . 3-407
  MOV - Move to/from Debug Registers . . . 3-409
  MOVAPS - Move Aligned Four Packed Single-FP . . . 3-411
  MOVD - Move 32 Bits . . . 3-414
  MOVHLPS - High to Low Packed Single-FP . . . 3-417
  MOVHPS - Move High Packed Single-FP . . . 3-419
  MOVLHPS - Move Low to High Packed Single-FP . . . 3-422
  MOVLPS - Move Low Packed Single-FP . . . 3-424
  MOVMSKPS - Move Mask To Integer . . . 3-427
  MOVNTPS - Move Aligned Four Packed Single-FP Non Temporal . . . 3-429
  MOVNTQ - Move 64 Bits Non Temporal . . . 3-431
  MOVQ - Move 64 Bits . . . 3-433
  MOVS/MOVSB/MOVSW/MOVSD - Move Data from String to String . . . 3-435
  MOVSS - Move Scalar Single-FP . . . 3-438
  MOVSX - Move with Sign-Extension . . . 3-441
  MOVUPS - Move Unaligned Four Packed Single-FP . . . 3-443
  MOVZX - Move with Zero-Extend . . . 3-446
  MUL - Unsigned Multiply . . . 3-448
  MULPS - Packed Single-FP Multiply . . . 3-450
  MULSS - Scalar Single-FP Multiply . . . 3-452
  NEG - Two's Complement Negation . . . 3-454
  NOP - No Operation . . . 3-456
  NOT - One's Complement Negation . . . 3-457
  OR - Logical Inclusive OR . . . 3-459
  ORPS - Bit-wise Logical OR for Single-FP Data . . . 3-461
  OUT - Output to Port . . . 3-463
  OUTS/OUTSB/OUTSW/OUTSD - Output String to Port . . . 3-465
  PACKSSWB/PACKSSDW - Pack with Signed Saturation . . . 3-469
  PACKUSWB - Pack with Unsigned Saturation . . . 3-472
  PADDB/PADDW/PADDD - Packed Add . . . 3-475
  PADDSB/PADDSW - Packed Add with Saturation . . . 3-479
  PADDUSB/PADDUSW - Packed Add Unsigned with Saturation . . . 3-482
  PAND - Logical AND . . . 3-485
  PANDN - Logical AND NOT . . . 3-487
  PAVGB/PAVGW - Packed Average . . . 3-489
  PCMPEQB/PCMPEQW/PCMPEQD - Packed Compare for Equal . . . 3-493
  PCMPGTB/PCMPGTW/PCMPGTD - Packed Compare for Greater Than . . . 3-497
  PEXTRW - Extract Word . . . 3-501
  PINSRW - Insert Word . . . 3-503
  PMADDWD - Packed Multiply and Add . . . 3-505
  PMAXSW - Packed Signed Integer Word Maximum . . . 3-508
  PMAXUB - Packed Unsigned Integer Byte Maximum . . . 3-511
  PMINSW - Packed Signed Integer Word Minimum . . . 3-514
  PMINUB - Packed Unsigned Integer Byte Minimum . . . 3-517
  PMOVMSKB - Move Byte Mask To Integer . . . 3-520
  PMULHUW - Packed Multiply High Unsigned . . . 3-522
  PMULHW - Packed Multiply High . . . 3-525
  PMULLW - Packed Multiply Low . . . 3-528
  POP - Pop a Value from the Stack . . . 3-531
  POPA/POPAD - Pop All General-Purpose Registers . . . 3-536
  POPF/POPFD - Pop Stack into EFLAGS Register . . . 3-538
  POR - Bitwise Logical OR . . . 3-541
  PREFETCH - Prefetch . . . 3-543
  PSADBW - Packed Sum of Absolute Differences . . . 3-545
  PSHUFW - Packed Shuffle Word . . . 3-548
  PSLLW/PSLLD/PSLLQ - Packed Shift Left Logical . . . 3-550
  PSRAW/PSRAD - Packed Shift Right Arithmetic . . . 3-555
  PSRLW/PSRLD/PSRLQ - Packed Shift Right Logical . . . 3-558
  PSUBB/PSUBW/PSUBD - Packed Subtract . . . 3-563
  PSUBSB/PSUBSW - Packed Subtract with Saturation . . . 3-567
  PSUBUSB/PSUBUSW - Packed Subtract Unsigned with Saturation . . . 3-570
  PUNPCKHBW/PUNPCKHWD/PUNPCKHDQ - Unpack High Packed Data . . . 3-573
  PUNPCKLBW/PUNPCKLWD/PUNPCKLDQ - Unpack Low Packed Data . . . 3-577
  PUSH - Push Word or Doubleword Onto the Stack . . . 3-581
  PUSHA/PUSHAD - Push All General-Purpose Registers . . . 3-584
  PUSHF/PUSHFD - Push EFLAGS Register onto the Stack . . . 3-587
  PXOR - Logical Exclusive OR . . . 3-589
  RCL/RCR/ROL/ROR - Rotate . . . 3-591
  RCPPS - Packed Single-FP Reciprocal . . . 3-596
  RCPSS - Scalar Single-FP Reciprocal . . . 3-598
  RDMSR - Read from Model Specific Register . . . 3-600
  RDPMC - Read Performance-Monitoring Counters . . . 3-602
  RDTSC - Read Time-Stamp Counter . . . 3-604
  REP/REPE/REPZ/REPNE/REPNZ - Repeat String Operation Prefix . . . 3-605
  RET - Return from Procedure . . . 3-608
  ROL/ROR - Rotate . . . 3-615
  RSM - Resume from System Management Mode . . . 3-616
  RSQRTPS - Packed Single-FP Square Root Reciprocal . . . 3-617
  RSQRTSS - Scalar Single-FP Square Root Reciprocal . . . 3-619
  SAHF - Store AH into Flags . . . 3-621
  SAL/SAR/SHL/SHR - Shift . . . 3-622
  SBB - Integer Subtraction with Borrow . . . 3-627
  SCAS/SCASB/SCASW/SCASD - Scan String . . . 3-629
  SETcc - Set Byte on Condition . . . 3-632
  SFENCE - Store Fence . . . 3-634
  SGDT/SIDT - Store Global/Interrupt Descriptor Table Register . . . 3-636
  SHL/SHR - Shift Instructions . . . 3-639
  SHLD - Double Precision Shift Left . . . 3-640
  SHRD - Double Precision Shift Right . . . 3-643
SHUFPS - Shuffle Single-FP
SIDT - Store Interrupt Descriptor Table Register
SLDT - Store Local Descriptor Table Register
SMSW - Store Machine Status Word
SQRTPS - Packed Single-FP Square Root
SQRTSS - Scalar Single-FP Square Root
STC - Set Carry Flag
STD - Set Direction Flag
STI - Set Interrupt Flag
STMXCSR - Store Streaming SIMD Extension Control/Status
STOS/STOSB/STOSW/STOSD - Store String
STR - Store Task Register
SUB - Subtract
SUBPS - Packed Single-FP Subtract
SUBSS - Scalar Single-FP Subtract
SYSENTER - Fast Transition to System Call Entry Point
SYSEXIT - Fast Transition from System Call Entry Point
TEST - Logical Compare
UCOMISS - Unordered Scalar Single-FP compare and set EFLAGS
UD2 - Undefined Instruction
UNPCKHPS - Unpack High Packed Single-FP Data
UNPCKLPS - Unpack Low Packed Single-FP Data
VERR/VERW - Verify a Segment for Reading or Writing
WAIT/FWAIT - Wait
WBINVD - Write Back and Invalidate Cache
WRMSR - Write to Model Specific Register
XADD - Exchange and Add
XCHG - Exchange Register/Memory with Register
XLAT/XLATB - Table Look-up Translation
XOR - Logical Exclusive OR
XORPS - Bit-wise Logical Xor for Single-FP Data
APPENDIX A OPCODE MAP
A.1. KEY TO ABBREVIATIONS
A.1.1. Codes for Addressing Method
A.1.2. Codes for Operand Type
A.1.3. Register Codes
A.2. OPCODE LOOK-UP EXAMPLES
A.2.1. One-Byte Opcode Instructions
A.2.2. Two-Byte Opcode Instructions
A.2.3. Opcode Map Shading
A.2.4. Opcode Map Notes
A.2.5. Opcode Extensions For One- And Two-byte Opcodes
A.2.6. Escape Opcode Instructions
A.2.6.1. Opcodes with ModR/M Bytes in the 00H through BFH Range
A.2.6.2. Opcodes with ModR/M Bytes outside the 00H through BFH Range
A.2.6.3. Escape Opcodes with D8 as First Byte
A.2.6.4. Escape Opcodes with D9 as First Byte
A.2.6.5. Escape Opcodes with DA as First Byte
A.2.6.6. Escape Opcodes with DB as First Byte
A.2.6.7. Escape Opcodes with DC as First Byte
A.2.6.8. Escape Opcodes with DD as First Byte
A.2.6.9. Escape Opcodes with DE as First Byte
A.2.6.10. Escape Opcodes with DF as First Byte
APPENDIX B INSTRUCTION FORMATS AND ENCODINGS
B.1. MACHINE INSTRUCTION FORMAT
B.1.1. Reg Field (reg)
B.1.2. Encoding of Operand Size Bit (w)
B.1.3. Sign Extend (s) Bit
B.1.4. Segment Register Field (sreg)
B.1.5. Special-Purpose Register (eee) Field
B.1.6. Condition Test Field (tttn)
B.1.7. Direction (d) Bit
B.2. INTEGER INSTRUCTION FORMATS AND ENCODINGS
B.3. MMX INSTRUCTION FORMATS AND ENCODINGS
B.3.1. Granularity Field (gg)
B.3.2. MMX and General-Purpose Register Fields (mmxreg and reg)
B.3.3. MMX Instruction Formats and Encodings Table
B.4. STREAMING SIMD EXTENSION FORMATS AND ENCODINGS TABLE
B.4.1. Instruction Prefixes
B.4.2. Notations
B.4.3. Formats and Encodings
B.5. FLOATING-POINT INSTRUCTION FORMATS AND ENCODINGS
APPENDIX C COMPILER INTRINSICS AND FUNCTIONAL EQUIVALENTS
C.1. SIMPLE INTRINSICS
C.2. COMPOSITE INTRINSICS
TABLE OF FIGURES
Figure 1-1. Bit and Byte Order
Figure 2-1. Intel Architecture Instruction Format
Figure 3-1. Bit Offset for BIT[EAX,21]
Figure 3-2. Memory Bit Indexing
Figure 3-3. Operation of the ADDPS Instruction
Figure 3-4. Operation of the ADDSS Instruction
Figure 3-5. Operation of the ANDNPS Instruction
Figure 3-6. Operation of the ANDPS Instruction
Figure 3-7. Operation of the CMPPS (Imm8=0) Instruction
Figure 3-8. Operation of the CMPPS (Imm8=1) Instruction
Figure 3-9. Operation of the CMPPS (Imm8=2) Instruction
Figure 3-10. Operation of the CMPPS (Imm8=3) Instruction
Figure 3-11. Operation of the CMPPS (Imm8=4) Instruction
Figure 3-12. Operation of the CMPPS (Imm8=5) Instruction
Figure 3-13. Operation of the CMPPS (Imm8=6) Instruction
Figure 3-14. Operation of the CMPPS (Imm8=7) Instruction
Figure 3-15. Operation of the CMPSS (Imm8=0) Instruction
Figure 3-16. Operation of the CMPSS (Imm8=1) Instruction
Figure 3-17. Operation of the CMPSS (Imm8=2) Instruction
Figure 3-18. Operation of the CMPSS (Imm8=3) Instruction
Figure 3-19. Operation of the CMPSS (Imm8=4) Instruction
Figure 3-20. Operation of the CMPSS (Imm8=5) Instruction
Figure 3-21. Operation of the CMPSS (Imm8=6) Instruction
Figure 3-22. Operation of the CMPSS (Imm8=7) Instruction
Figure 3-23. Operation of the COMISS Instruction, Condition One
Figure 3-24. Operation of the COMISS Instruction, Condition Two
Figure 3-25. Operation of the COMISS Instruction, Condition Three
Figure 3-26. Operation of the COMISS Instruction, Condition Four
Figure 3-27. Version and Feature Information in Registers EAX and EDX
Figure 3-28. Operation of the CVTPI2PS Instruction
Figure 3-29. Operation of the CVTPS2PI Instruction
Figure 3-30. Operation of the CVTSI2SS Instruction
Figure 3-31. Operation of the CVTSS2SI Instruction
Figure 3-32. Operation of the CVTTPS2PI Instruction
Figure 3-33. Operation of the CVTTSS2SI Instruction
Figure 3-34. Operation of the DIVPS Instruction
Figure 3-35. Operation of the DIVSS Instruction
Figure 3-36. Operation of the MAXPS Instruction
Figure 3-37. Operation of the MAXSS Instruction
Figure 3-38. Operation of the MINPS Instruction
Figure 3-39. Operation of the MINSS Instruction
Figure 3-40. Operation of the MOVAPS Instruction
Figure 3-41. Operation of the MOVD Instruction
Figure 3-42. Operation of the MOVHLPS Instruction
Figure 3-43. Operation of the MOVHPS Instruction
Figure 3-44. Operation of the MOVLHPS Instruction
Figure 3-45. Operation of the MOVLPS Instruction
Figure 3-46. Operation of the MOVMSKPS Instruction
Figure 3-47. Operation of the MOVQ Instructions
Figure 3-48. Operation of the MOVSS Instruction
Figure 3-49. Operation of the MOVUPS Instruction
Figure 3-50. Operation of the MULPS Instruction
Figure 3-51. Operation of the MULSS Instruction
Figure 3-52. Operation of the ORPS Instruction
Figure 3-53. Operation of the PACKSSDW Instruction
Figure 3-54. Operation of the PACKUSWB Instruction
Figure 3-55. Operation of the PADDW Instruction
Figure 3-56. Operation of the PADDSW Instruction
Figure 3-57. Operation of the PADDUSB Instruction
Figure 3-58. Operation of the PAND Instruction
Figure 3-59. Operation of the PANDN Instruction
Figure 3-60. Operation of the PAVGB/PAVGW Instruction
Figure 3-61. Operation of the PCMPEQW Instruction
Figure 3-62. Operation of the PCMPGTW Instruction
Figure 3-63. Operation of the PEXTRW Instruction
Figure 3-64. Operation of the PINSRW Instruction
Figure 3-65. Operation of the PMADDWD Instruction
Figure 3-66. Operation of the PMAXSW Instruction
Figure 3-67. Operation of the PMAXUB Instruction
Figure 3-68. Operation of the PMINSW Instruction
Figure 3-69. Operation of the PMINUB Instruction
Figure 3-70. Operation of the PMOVMSKB Instruction
Figure 3-71. Operation of the PMULHUW Instruction
Figure 3-72. Operation of the PMULHW Instruction
Figure 3-73. Operation of the PMULLW Instruction
Figure 3-74. Operation of the POR Instruction
Figure 3-75. Operation of the PSADBW Instruction
Figure 3-76. Operation of the PSHUFW Instruction
Figure 3-77. Operation of the PSLLW Instruction
Figure 3-78. Operation of the PSRAW Instruction
Figure 3-79. Operation of the PSRLW Instruction
Figure 3-80. Operation of the PSUBW Instruction
Figure 3-81. Operation of the PSUBSW Instruction
Figure 3-82. Operation of the PSUBUSB Instruction
Figure 3-83. High-Order Unpacking and Interleaving of Bytes With the PUNPCKHBW Instruction
Figure 3-84. Low-Order Unpacking and Interleaving of Bytes With the PUNPCKLBW Instruction
Figure 3-85. Operation of the PXOR Instruction
Figure 3-86. Operation of the RCPPS Instruction
Figure 3-87. Operation of the RCPSS Instruction
Figure 3-88. Operation of the RSQRTPS Instruction
Figure 3-89. Operation of the RSQRTSS Instruction
Figure 3-90. Operation of the SHUFPS Instruction
Figure 3-91. Operation of the SQRTPS Instruction
Figure 3-92. Operation of the SQRTSS Instruction
Figure 3-93. Operation of the SUBPS Instruction
Figure 3-94. Operation of the SUBSS Instruction
Figure 3-95. Operation of the UCOMISS Instruction, Condition One
Figure 3-96. Operation of the UCOMISS Instruction, Condition Two
Figure 3-97. Operation of the UCOMISS Instruction, Condition Three
Figure 3-98. Operation of the UCOMISS Instruction, Condition Four
Figure 3-99. Operation of the UNPCKHPS Instruction
Figure 3-100. Operation of the UNPCKLPS Instruction
Figure 3-101. Operation of the XORPS Instruction
Figure A-1. ModR/M Byte nnn Field (Bits 5, 4, and 3)
Figure B-1. General Machine Instruction Format
Figure B-2. Key to Codes for MMX Data Type Cross-Reference
Figure B-3. Key to Codes for Streaming SIMD Extensions Data Type Cross-Reference
TABLE OF TABLES
Table 2-1. 16-Bit Addressing Forms with the ModR/M Byte
Table 2-2. 32-Bit Addressing Forms with the ModR/M Byte
Table 2-3. 32-Bit Addressing Forms with the SIB Byte
Table 3-1. Register Encodings Associated with the +rb, +rw, and +rd Nomenclature
Table 3-2. Exception Mnemonics, Names, and Vector Numbers
Table 3-3. Floating-Point Exception Mnemonics and Names
Table 3-4. SIMD Floating-Point Exception Mnemonics and Names
Table 3-5. Streaming SIMD Extensions Faults (Interrupts 6 & 7)
Table 3-6. Information Returned by CPUID Instruction
Table 3-7. Processor Type Field
Table 3-8. Feature Flags Returned in EDX Register
Table 3-9. Encoding of Cache and TLB Descriptors
Table A-1. Notes on Instruction Set Encoding Tables
Table A-2. One-byte Opcode Map (Left)
Table A-3. One-byte Opcode Map (Right)
Table A-4. Two-byte Opcode Map (Left) (First Byte is 0FH)
Table A-5. Two-byte Opcode Map (Right) (First Byte is 0FH)
Table A-6. Opcode Extensions for One- and Two-byte Opcodes by Group Number
Table A-7. D8 Opcode Map When ModR/M Byte is Within 00H to BFH
Table A-8. D8 Opcode Map When ModR/M Byte is Outside 00H to BFH
Table A-9. D9 Opcode Map When ModR/M Byte is Within 00H to BFH
Table A-10. D9 Opcode Map When ModR/M Byte is Outside 00H to BFH
Table A-11. DA Opcode Map When ModR/M Byte is Within 00H to BFH
Table A-12. DA Opcode Map When ModR/M Byte is Outside 00H to BFH
Table A-13. DB Opcode Map When ModR/M Byte is Within 00H to BFH
Table A-14. DB Opcode Map When ModR/M Byte is Outside 00H to BFH
Table A-15. DC Opcode Map When ModR/M Byte is Within 00H to BFH
Table A-16. DC Opcode Map When ModR/M Byte is Outside 00H to BFH
Table A-17. DD Opcode Map When ModR/M Byte is Within 00H to BFH
Table A-18. DD Opcode Map When ModR/M Byte is Outside 00H to BFH
Table A-19. DE Opcode Map When ModR/M Byte is Within 00H to BFH
Table A-20. DE Opcode Map When ModR/M Byte is Outside 00H to BFH
Table A-21. DF Opcode Map When ModR/M Byte is Within 00H to BFH
Table A-22. DF Opcode Map When ModR/M Byte is Outside 00H to BFH
Table B-1. Special Fields Within Instruction Encodings
Table B-2. Encoding of reg Field When w Field is Not Present in Instruction
Table B-3. Encoding of reg Field When w Field is Present in Instruction
Table B-4. Encoding of Operand Size (w) Bit
Table B-5. Encoding of Sign-Extend (s) Bit
Table B-6. Encoding of the Segment Register (sreg) Field
Table B-7. Encoding of Special-Purpose Register (eee) Field
Table B-8. Encoding of Conditional Test (tttn) Field
Table B-9. Encoding of Operation Direction (d) Bit
Table B-10. Integer Instruction Formats and Encodings
Table B-11. Encoding of Granularity of Data Field (gg)
Table B-12. Encoding of the MMX Register Field (mmxreg)
Table B-13. Encoding of the General-Purpose Register Field (reg) When Used in MMX Instructions
Table B-14. MMX Instruction Formats and Encodings
Table B-15. Streaming SIMD Extensions Instruction Behavior with Prefixes
Table B-16. SIMD Integer Instructions - Behavior with Prefixes
Table B-17. Cacheability Control Instruction Behavior with Prefixes
Table B-18. Key to Streaming SIMD Extensions Naming Convention
Table B-19. Encoding of the SIMD Floating-Point Register Field
Table B-20. Encoding of the SIMD-Integer Register Field
Table B-21. Encoding of the Streaming SIMD Extensions Cacheability Control Register Field
Table B-22. General Floating-Point Instruction Formats
Table B-23. Floating-Point Instruction Formats and Encodings
Table C-1. Simple Intrinsics
Table C-2. Composite Intrinsics
CHAPTER 1 ABOUT THIS MANUAL

The Intel Architecture Software Developer's Manual, Volume 2: Instruction Set Reference (Order Number 243191) is part of a three-volume set that describes the architecture and programming environment of all Intel Architecture processors. The other two volumes in this set are:
The Intel Architecture Software Developer's Manual, Volume 1: Basic Architecture (Order Number 243190).
The Intel Architecture Software Developer's Manual, Volume 3: System Programming Guide (Order Number 243192).
The Intel Architecture Software Developer's Manual, Volume 1, describes the basic architecture and programming environment of an Intel Architecture processor; the Intel Architecture Software Developer's Manual, Volume 2, describes the instruction set of the processor and the opcode structure. These two volumes are aimed at application programmers who are writing programs to run under existing operating systems or executives.
The Intel Architecture Software Developer's Manual, Volume 3, describes the operating-system support environment of an Intel Architecture processor, including memory management, protection, task management, interrupt and exception handling, and system management mode. It also provides Intel Architecture processor compatibility information. This volume is aimed at operating-system and BIOS designers and programmers.
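1.1. OVERVIEW OF THE INTEL ARCHITECTURE SOFTWARE DEVELOPER'S MANUAL, VOLUME 2: INSTRUCTION SET REFERENCE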
The contents of this manual are as follows:
Chapter 1 - About This Manual. Gives an overview of all three volumes of the Intel Architecture Software Developer's Manual. It also describes the notational conventions in these manuals and lists related Intel manuals and documentation of interest to programmers and hardware designers.
Chapter 2 - Instruction Format. Describes the machine-level instruction format used for all Intel Architecture instructions and gives the allowable encodings of prefixes, the operand-identifier byte (ModR/M byte), the addressing-mode specifier byte (SIB byte), and the displacement and immediate bytes.
Chapter 3 - Instruction Set Reference. Describes each of the Intel Architecture instructions in detail, including an algorithmic description of operations, the effect on flags, the effect of operand- and address-size attributes, and the exceptions that may be generated. The instructions are arranged in alphabetical order. The FPU instructions, MMX technology instructions, and Streaming SIMD Extensions are included in this chapter.
Appendix A - Opcode Map. Gives an opcode map for the Intel Architecture instruction set.
Appendix B - Instruction Formats and Encodings. Gives the binary encoding of each form of each Intel Architecture instruction.
Appendix C - Compiler Intrinsics and Functional Equivalents. Gives the Intel C/C++ compiler intrinsics and functional equivalents for the MMX technology instructions and Streaming SIMD Extensions.
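The Chapter 2 summary above mentions the ModR/M and SIB bytes. As a rough illustration only (it is not taken from the manual), the Python sketch below splits each of those bytes into its three bit fields. The function names `split_modrm` and `split_sib` are hypothetical and exist only for this example; the sketch assumes the usual 2-bit, 3-bit, 3-bit layout, counted from the most significant end of the byte.

```python
def _check_byte(value):
    """Reject anything that does not fit in a single unsigned byte."""
    if not 0 <= value <= 0xFF:
        raise ValueError("expected a value in the range 0x00..0xFF")


def split_modrm(byte):
    """Return the (mod, reg, rm) fields of a ModR/M byte.

    mod is bits 7-6, reg (or an opcode extension) is bits 5-3,
    and r/m is bits 2-0.
    """
    _check_byte(byte)
    mod = (byte >> 6) & 0b11
    reg = (byte >> 3) & 0b111
    rm = byte & 0b111
    return mod, reg, rm


def split_sib(byte):
    """Return the (scale, index, base) fields of a SIB byte.

    The 2-bit scale field in bits 7-6 encodes a factor of 1, 2, 4, or 8;
    it is returned here as that factor. Index is bits 5-3, base is bits 2-0.
    """
    _check_byte(byte)
    scale = 1 << ((byte >> 6) & 0b11)
    index = (byte >> 3) & 0b111
    base = byte & 0b111
    return scale, index, base
```

For instance, `split_modrm(0xD8)` returns `(3, 3, 0)`: a mod field of 11B selects a register operand, and the two remaining fields are register numbers or, for some opcodes, an opcode extension. The encoding tables in Chapter 2 remain the authority on what each combination means.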
1.2. OVERVIEW OF THE INTEL ARCHITECTURE SOFTWARE DEVELOPER'S MANUAL, VOLUME 1: BASIC ARCHITECTURE
The contents of the Intel Architecture Software Developer's Manual, Volume 1, are as follows:
Chapter 1 - About This Manual. Gives an overview of all three volumes of the Intel Architecture Software Developer's Manual. It also describes the notational conventions in these manuals and lists related Intel manuals and documentation of interest to programmers and hardware designers.
Chapter 2 - Introduction to the Intel Architecture. Introduces the Intel Architecture and the families of Intel processors that are based on this architecture. It also gives an overview of the common features found in these processors and a brief history of the Intel Architecture.
Chapter 3 - Basic Execution Environment. Introduces the models of memory organization and describes the register set used by applications.
Chapter 4 - Procedure Calls, Interrupts, and Exceptions. Describes the procedure stack and the mechanisms provided for making procedure calls and for servicing interrupts and exceptions.
Chapter 5 - Data Types and Addressing Modes. Describes the data types and addressing modes recognized by the processor.
Chapter 6 - Instruction Set Summary. Gives an overview of all the Intel Architecture instructions except those executed by the processor's floating-point unit. The instructions are presented in functionally related groups.
Chapter 7 - Floating-Point Unit. Describes the Intel Architecture floating-point unit, including the floating-point registers and data types; gives an overview of the floating-point instruction set; and describes the processor's floating-point exception conditions.
Chapter 8 - Programming with Intel MMX Technology. Describes the Intel MMX technology, including registers and data types, and gives an overview of the MMX technology instruction set.
Chapter 9 - Programming with the Streaming SIMD Extensions. Describes the Intel Streaming SIMD Extensions, including the registers and data types.
Chapter 10 - Input/Output. Describes the processor's I/O architecture, including I/O port addressing, the I/O instructions, and the I/O protection mechanism.
Chapter 11 - Processor Identification and Feature Determination. Describes how to determine the CPU type and the features that are available in the processor.
Appendix A - EFLAGS Cross-Reference. Summarizes how the Intel Architecture instructions affect the flags in the EFLAGS register.
Appendix B - EFLAGS Condition Codes. Summarizes how the conditional jump, conditional move, and byte-set-on-condition instructions use the condition code flags (OF, CF, ZF, SF, and PF) in the EFLAGS register.
Appendix C - Floating-Point Exceptions Summary. Summarizes the exceptions that can be raised by floating-point instructions.
Appendix D - SIMD Floating-Point Exceptions Summary. Provides the Streaming SIMD Extensions mnemonics and the exceptions that each instruction can cause.
Appendix E - Guidelines for Writing FPU Exception Handlers. Describes how to design and write MS-DOS compatible exception handling facilities for FPU and SIMD floating-point exceptions, including both software and hardware requirements and assembly-language code examples. This appendix also describes general techniques for writing robust FPU exception handlers.
Appendix F - Guidelines for Writing SIMD-FP Exception Handlers. Provides guidelines for the Streaming SIMD Extensions instructions that can generate numeric (floating-point) exceptions, and gives an overview of the necessary support for handling such exceptions.
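The EFLAGS condition codes summarized under Appendix B can be pictured with a small model. The Python sketch below is an illustration under stated assumptions, not text from the manual: it assumes the standard EFLAGS bit positions (CF bit 0, PF bit 2, ZF bit 6, SF bit 7, OF bit 11), and the helper name `condition_holds` and its string condition names are hypothetical, chosen to mirror the suffixes of the conditional jump mnemonics (JE, JL, JA, and so on).

```python
# Assumed EFLAGS bit masks for the five condition code flags.
CF = 1 << 0    # carry
PF = 1 << 2    # parity
ZF = 1 << 6    # zero
SF = 1 << 7    # sign
OF = 1 << 11   # overflow


def condition_holds(cond, eflags):
    """Return True if condition `cond` is satisfied by the given EFLAGS value.

    `cond` is a mnemonic suffix such as "E", "NE", "B", "A", "L", or "G".
    """
    cf = bool(eflags & CF)
    pf = bool(eflags & PF)
    zf = bool(eflags & ZF)
    sf = bool(eflags & SF)
    of = bool(eflags & OF)
    table = {
        "O": of, "NO": not of,
        "B": cf, "AE": not cf,                  # unsigned below / above or equal
        "E": zf, "NE": not zf,
        "BE": cf or zf, "A": not (cf or zf),    # unsigned below or equal / above
        "S": sf, "NS": not sf,
        "P": pf, "NP": not pf,
        "L": sf != of, "GE": sf == of,          # signed less / greater or equal
        "LE": zf or (sf != of),
        "G": (not zf) and (sf == of),           # signed greater
    }
    return table[cond]
```

Note how the unsigned comparisons consult CF and ZF while the signed ones compare SF with OF; that is why the same CMP instruction can feed either kind of conditional jump.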
1.3.
The contents of the Intel Architecture Software Developer's Manual, Volume 3, are as follows:
Chapter 1: About This Manual. Gives an overview of all three volumes of the Intel Architecture Software Developer's Manual. It also describes the notational conventions in these manuals and lists related Intel manuals and documentation of interest to programmers and hardware designers.
Chapter 2: System Architecture Overview. Describes the modes of operation of an Intel Architecture processor and the mechanisms provided in the Intel Architecture to support operating systems and executives, including the system-oriented registers and data structures and the system-oriented instructions. The steps necessary for switching between real-address and protected modes are also identified.
Chapter 3: Protected-Mode Memory Management. Describes the data structures, registers, and instructions that support segmentation and paging and explains how they can be used to implement a flat (unsegmented) memory model or a segmented memory model.
Chapter 4: Protection. Describes the support for page and segment protection provided in the Intel Architecture. This chapter also explains the implementation of privilege rules, stack switching, pointer validation, and user and supervisor modes.
Chapter 5: Interrupt and Exception Handling. Describes the basic interrupt mechanisms defined in the Intel Architecture, shows how interrupts and exceptions relate to protection, and describes how the architecture handles each exception type. Reference information for each Intel Architecture exception is given at the end of this chapter.
Chapter 6: Task Management. Describes the mechanisms the Intel Architecture provides to support multitasking and inter-task protection.
Chapter 7: Multiple Processor Management. Describes the instructions and flags that support multiple processors with shared memory, memory ordering, and the advanced programmable interrupt controller (APIC).
Chapter 8: Processor Management and Initialization. Defines the state of an Intel Architecture processor and its floating-point and SIMD floating-point units after reset initialization. This chapter also explains how to set up an Intel Architecture processor for real-address mode operation and protected-mode operation, and how to switch between modes.
Chapter 9: Memory Cache Control. Describes the general concept of caching and the caching mechanisms supported by the Intel Architecture. This chapter also describes the memory type range registers (MTRRs) and how they can be used to map memory types of physical memory. MTRRs were introduced into the Intel Architecture with the Pentium Pro processor. It also presents information on using the new cache control and memory streaming instructions introduced with the Pentium III processor.
Chapter 10: MMX Technology System Programming. Describes those aspects of the Intel MMX technology that must be handled and considered at the system programming level, including task switching, exception handling, and compatibility with existing system environments. The MMX technology was introduced into the Intel Architecture with the Pentium processor.
Chapter 11: Streaming SIMD Extensions System Programming. Describes those aspects of Streaming SIMD Extensions that must be handled and considered at the system programming level, including task switching, exception handling, and compatibility with existing system environments. Streaming SIMD Extensions were introduced into the Intel Architecture with the Pentium III processor.
Chapter 12: System Management Mode (SMM). Describes the Intel Architecture's system management mode (SMM), which can be used to implement power management functions.
Chapter 13: Machine-Check Architecture. Describes the machine-check architecture, which was introduced into the Intel Architecture with the Pentium processor.
Chapter 14: Code Optimization. Discusses general optimization techniques for programming an Intel Architecture processor.
Chapter 15: Debugging and Performance Monitoring. Describes the debugging registers and other debug mechanisms provided in the Intel Architecture. This chapter also describes the time-stamp counter and the performance-monitoring counters.
Chapter 16: 8086 Emulation. Describes the real-address and virtual-8086 modes of the Intel Architecture.
Chapter 17: Mixing 16-Bit and 32-Bit Code. Describes how to mix 16-bit and 32-bit code modules within the same program or task.
Chapter 18: Intel Architecture Compatibility. Describes the programming differences between the Intel 286, Intel386, Intel486, Pentium, and P6 family processors. The differences among the 32-bit Intel Architecture processors (the Intel386, Intel486, Pentium, and P6 family processors) are described throughout the three volumes of the Intel Architecture Software Developer's Manual, as relevant to particular features of the architecture. This chapter provides a collection of all the relevant compatibility information for all Intel Architecture processors and also describes the basic differences with respect to the 16-bit Intel Architecture processors (the Intel 8086 and Intel 286 processors).
Appendix A: Performance-Monitoring Events. Lists the events that can be counted with the performance-monitoring counters and the codes used to select these events. Both Pentium processor and P6 family processor events are described.
Appendix B: Model-Specific Registers (MSRs). Lists the MSRs available in the Pentium and P6 family processors and their functions.
Appendix C: Dual-Processor (DP) Bootup Sequence Example (Specific to Pentium Processors). Gives an example of how to use the DP protocol to boot two Pentium processors (a primary processor and a secondary processor) in a DP system and initialize their APICs.
Appendix D: Multiple-Processor (MP) Bootup Sequence Example (Specific to P6 Family Processors). Gives an example of how to use the MP protocol to boot two P6 family processors in an MP system and initialize their APICs.
Appendix E: Programming the LINT0 and LINT1 Inputs. Gives an example of how to program the LINT0 and LINT1 pins for specific interrupt vectors.
1.4.
1.4.1.
Bit and Byte Order
In illustrations of data structures in memory, smaller addresses appear toward the bottom of the figure; addresses increase toward the top. Bit positions are numbered from right to left. The numerical value of a set bit is equal to two raised to the power of the bit position. Intel Architecture processors are little endian machines; this means the bytes of a word are numbered starting from the least significant byte. Figure 1-1 illustrates these conventions.
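[Figure 1-1: a data structure in memory drawn as rows of four bytes, Byte 3 (bits 31-24) through Byte 0 (bits 7-0), with bit offsets numbered from 31 on the left to 0 on the right and byte offsets 0 through 28 increasing from the lowest address at the bottom to the highest address at the top.]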
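The byte-order convention described above can be illustrated with a short Python sketch. This is not part of the Intel manual; the value 12345678H is an arbitrary example, and the standard struct module is used only to show how a little-endian doubleword is laid out in memory.

```python
import struct

value = 0x12345678  # arbitrary example doubleword

# "<I" packs an unsigned 32-bit integer in little-endian order,
# so Byte 0 (the lowest address) holds the least significant byte.
memory = struct.pack("<I", value)

for offset, byte in enumerate(memory):
    print(f"byte offset {offset}: {byte:02X}H")

# The numerical value of a set bit is two raised to its bit position.
bit_position = 4
bit_value = 1 << bit_position
print(f"bit {bit_position} set has value {bit_value}")
```

Reading the bytes back with the same format string recovers the original value, which is one way to check that a dump is being interpreted with the right byte order.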
1.4.2.
NOTE
Avoid any software dependence upon the state of reserved bits in Intel Architecture registers. Depending upon the values of reserved register bits will make software dependent upon the unspecified manner in which the processor handles these bits. Programs that depend upon reserved values risk incompatibility with future processors.
1.4.3.
When instructions are represented symbolically, a subset of the Intel Architecture assembly language is used. In this subset, an instruction has the following format:
where:
1.4.4.
1.4.5.
The processor uses byte addressing. This means memory is organized and accessed as a sequence of bytes. Whether one or more bytes are being accessed, a byte address is used to locate the byte or bytes of memory. The range of memory that can be addressed is called an address space.
The processor also supports segmented addressing. This is a form of addressing where a program may have many independent address spaces, called segments. For example, a program can keep its code (instructions) and stack in separate segments. Code addresses would always refer to the code space, and stack addresses would always refer to the stack space. The following notation is used to specify a byte address within a segment:
DS:FF79H
CS:EIP
1.4.6.
Exceptions
#PF(fault code)
#GP(0)
Refer to Chapter 5, Interrupt and Exception Handling, in the Intel Architecture Software Developer's Manual, Volume 3, for a list of exception mnemonics and their descriptions.
1.5.
RELATED LITERATURE
Intel Pentium II Processor Specification Update, Order Number 243337-010.
Intel Pentium Pro Processor Specification Update, Order Number 242689.
Intel Pentium Processor Specification Update, Order Number 242480.
AP-485, Intel Processor Identification and the CPUID Instruction, Order Number 241618.
AP-578, Software and Hardware Considerations for FPU Exception Handlers for Intel Architecture Processors, Order Number 242415-001.
Pentium Pro Processor Family Developer's Manual, Volume 1: Specifications, Order Number 242690-001.
Pentium Processor Family Developer's Manual, Order Number 241428.
Intel486 Microprocessor Data Book, Order Number 240440.
Intel486 SX CPU/Intel487 SX Math Coprocessor Data Book, Order Number 240950.
Intel486 DX2 Microprocessor Data Book, Order Number 241245.
Intel486 Microprocessor Product Brief Book, Order Number 240459.
Intel386 Processor Hardware Reference Manual, Order Number 231732.
Intel386 Processor System Software Writer's Guide, Order Number 231499.
Intel386 High-Performance 32-Bit CHMOS Microprocessor with Integrated Memory Management, Order Number 231630.
376 Embedded Processor Programmer's Reference Manual, Order Number 240314.
80387 DX User's Manual Programmer's Reference, Order Number 231917.
376 High-Performance 32-Bit Embedded Processor, Order Number 240182.
Intel386 SX Microprocessor, Order Number 240187.
Microprocessor and Peripheral Handbook (Vol. 1), Order Number 230843.
AP-528, Optimizations for Intel's 32-Bit Processors, Order Number 242816-001.
CHAPTER 2 INSTRUCTION FORMAT

This chapter describes the instruction format for all Intel Architecture processors.
2.1.
GENERAL INSTRUCTION FORMAT
All Intel Architecture instruction encodings are subsets of the general instruction format shown in Figure 2-1. Instructions consist of optional instruction prefixes (in any order), one or two primary opcode bytes, an addressing-form specifier (if required) consisting of the ModR/M byte and sometimes the SIB (Scale-Index-Base) byte, a displacement (if required), and an immediate data field (if required).
Field                  Size
Instruction Prefixes   Up to four prefixes of 1 byte each (optional)
Opcode                 1- or 2-byte opcode
ModR/M                 1 byte (if required)
SIB                    1 byte (if required)
Displacement           Address displacement of 1, 2, or 4 bytes, or none
Immediate              Immediate data of 1, 2, or 4 bytes, or none

ModR/M byte:  bits 7-6 Mod,    bits 5-3 Reg/Opcode,  bits 2-0 R/M
SIB byte:     bits 7-6 Scale,  bits 5-3 Index,       bits 2-0 Base

Figure 2-1. Intel Architecture Instruction Format
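The bit positions given in Figure 2-1 for the ModR/M and SIB bytes can be sketched in Python as follows. The helper names split_modrm and split_sib are hypothetical and chosen only for this illustration.

```python
def split_modrm(byte):
    """Return (mod, reg_opcode, rm) for a ModR/M byte.

    Mod occupies bits 7-6, Reg/Opcode bits 5-3, and R/M bits 2-0.
    """
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def split_sib(byte):
    """Return (scale, index, base) for an SIB byte.

    Scale occupies bits 7-6, Index bits 5-3, and Base bits 2-0.
    """
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


print(split_modrm(0xC1))  # C1H is 11 000 001 in binary
print(split_sib(0x88))    # 88H is 10 001 000 in binary
```

Both bytes share the same 2-3-3 bit layout, so the two helpers differ only in how the caller interprets the fields.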
2.2.
INSTRUCTION PREFIXES
The instruction prefixes are divided into four groups, each with a set of allowable prefix codes:
Lock and repeat prefixes:
    F0H: LOCK prefix.
    F2H: REPNE/REPNZ prefix (used only with string instructions).
    F3H: REP prefix (used only with string instructions).
    F3H: REPE/REPZ prefix (used only with string instructions).
    F3H: Streaming SIMD Extensions prefix.
Segment override:
    2EH: CS segment override prefix.
    36H: SS segment override prefix.
    3EH: DS segment override prefix.
    26H: ES segment override prefix.
    64H: FS segment override prefix.
    65H: GS segment override prefix.
Operand-size override, 66H.
Address-size override, 67H.
For each instruction, one prefix may be used from each of these groups and be placed in any order. The effect of redundant prefixes (more than one prefix from a group) is undefined and may vary from processor to processor.
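The one-prefix-per-group rule can be checked mechanically. The following Python sketch is an illustration only: the names PREFIX_GROUPS, group_of, and scan_prefixes are hypothetical, and the scanner simply stops at the first byte that is not one of the prefix codes listed in the four groups above.

```python
# Prefix codes by group, as listed in this section.
PREFIX_GROUPS = {
    "lock/repeat": {0xF0, 0xF2, 0xF3},
    "segment override": {0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65},
    "operand-size override": {0x66},
    "address-size override": {0x67},
}


def group_of(byte):
    """Return the name of the prefix group for byte, or None if it is not a prefix."""
    for name, codes in PREFIX_GROUPS.items():
        if byte in codes:
            return name
    return None


def scan_prefixes(code):
    """Collect the leading prefix bytes of code, keyed by group.

    Returns (seen, redundant), where redundant lists every group that
    appears more than once (an encoding whose effect is undefined).
    """
    seen = {}
    for byte in code:
        group = group_of(byte)
        if group is None:
            break  # first non-prefix byte: the opcode starts here
        seen.setdefault(group, []).append(byte)
    redundant = sorted(g for g, found in seen.items() if len(found) > 1)
    return seen, redundant


seen, redundant = scan_prefixes(bytes([0x66, 0x2E, 0x8B, 0x07]))
print(seen, redundant)
```

A sequence such as 2EH 36H followed by an opcode would be reported with "segment override" in the redundant list, flagging an encoding whose behavior may vary from processor to processor.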
Streaming SIMD Extensions prefix, 0FH
The nature of Streaming SIMD Extensions allows the use of existing instruction formats. Instructions use the ModR/M format and are preceded by the 0F prefix byte. In general, operations are not duplicated to provide two directions (i.e. separate load and store variants). For more information, see Section B.4.1., Instruction Prefixes in Appendix B, Instruction Formats and Encodings.
2.3.
OPCODE
The primary opcode is either 1 or 2 bytes. An additional 3-bit opcode field is sometimes encoded in the ModR/M byte. Smaller encoding fields can be defined within the primary opcode. These fields define the direction of the operation, the size of displacements, the register encoding, condition codes, or sign extension. The encoding of fields in the opcode varies, depending on the class of operation.
2.4.
MODR/M AND SIB BYTES
Most instructions that refer to an operand in memory have an addressing-form specifier byte (called the ModR/M byte) following the primary opcode. The ModR/M byte contains three fields of information:
The mod field combines with the r/m field to form 32 possible values: eight registers and 24 addressing modes.
The reg/opcode field specifies either a register number or three more bits of opcode information. The purpose of the reg/opcode field is specified in the primary opcode.
The r/m field can specify a register as an operand or can be combined with the mod field to encode an addressing mode.
Certain encodings of the ModR/M byte require a second addressing byte, the SIB byte, to fully specify the addressing form. The base-plus-index and scale-plus-index forms of 32-bit addressing require the SIB byte. The SIB byte includes the following fields:
The scale field specifies the scale factor.
The index field specifies the register number of the index register.
The base field specifies the register number of the base register.
Refer to Section 2.6. for the encodings of the ModR/M and SIB bytes.
2.5.
DISPLACEMENT AND IMMEDIATE BYTES
Some addressing forms include a displacement immediately following either the ModR/M or SIB byte. If a displacement is required, it can be 1, 2, or 4 bytes. If the instruction specifies an immediate operand, the operand always follows any displacement bytes. An immediate operand can be 1, 2, or 4 bytes.
2.6.
ADDRESSING-MODE ENCODING OF MODR/M AND SIB BYTES
The values and the corresponding addressing forms of the ModR/M and SIB bytes are shown in Tables 2-1 through 2-3. The 16-bit addressing forms specified by the ModR/M byte are in Table 2-1, and the 32-bit addressing forms specified by the ModR/M byte are in Table 2-2. Table 2-3 shows the 32-bit addressing forms specified by the SIB byte.
In Tables 2-1 and 2-2, the first column (labeled Effective Address) lists 32 different effective addresses that can be assigned to one operand of an instruction by using the Mod and R/M fields of the ModR/M byte. The first 24 give the different ways of specifying a memory location; the last eight (specified by the Mod field encoding 11B) give the ways of specifying the general-purpose, MMX technology, and SIMD floating-point registers. Each of the register encodings lists five possible registers. For example, the first register encoding (selected by the R/M field encoding of 000B) indicates the general-purpose registers EAX, AX or AL, the MMX technology register MM0, or the SIMD floating-point register XMM0. Which of these five registers is used is determined by the opcode byte and the operand-size attribute, which select either the EAX register (32 bits) or AX register (16 bits).
The second and third columns in Tables 2-1 and 2-2 give the binary encodings of the Mod and R/M fields in the ModR/M byte, respectively, required to obtain the associated effective address listed in the first column. All 32 possible combinations of the Mod and R/M fields are listed.
Across the top of Tables 2-1 and 2-2, the eight possible values of the 3-bit Reg/Opcode field are listed, in decimal (sixth row from top) and in binary (seventh row from top). The seventh row is labeled REG=, which represents the use of these three bits to give the location of a second operand, which must be a general-purpose register, an MMX technology register, or a SIMD floating-point register. If the instruction does not require a second operand to be specified, then the 3 bits of the Reg/Opcode field may be used as an extension of the opcode, which is represented by the sixth row, labeled /digit (Opcode). The five rows above give the byte, word, and doubleword general-purpose registers; the MMX technology registers; the Streaming SIMD Extensions registers; and SIMD floating-point registers that correspond to the register numbers, with the same assignments as for the R/M field when the Mod field encoding is 11B. As with the R/M field register options, which of the five possible registers is used is determined by the opcode byte along with the operand-size attribute.
The body of Tables 2-1 and 2-2 (under the label Value of ModR/M Byte (in Hexadecimal)) contains a 32 by 8 array giving all of the 256 values of the ModR/M byte, in hexadecimal. Bits 3, 4 and 5 are specified by the column of the table in which a byte resides, and the row specifies bits 0, 1 and 2, and also bits 6 and 7.
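The 32 by 8 body of Tables 2-1 and 2-2 follows directly from the bit layout just described, as the following Python sketch suggests. The function name modrm_value is hypothetical; the row and column order is assumed to match the tables (rows ordered by Mod and then R/M, columns by Reg/Opcode).

```python
def modrm_value(mod, reg, rm):
    """Assemble a ModR/M byte: Mod in bits 7-6, Reg/Opcode in bits 5-3, R/M in bits 2-0."""
    return (mod << 6) | (reg << 3) | rm


# One row per (Mod, R/M) pair, one column per Reg/Opcode value.
table = [
    [modrm_value(mod, reg, rm) for reg in range(8)]
    for mod in range(4)
    for rm in range(8)
]

for row in table:
    print(" ".join(f"{value:02X}" for value in row))
```

Because every combination of the three fields appears exactly once, the 256 entries cover each possible byte value with no repeats, which makes the table easy to proofread.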
Table 2-1. 16-Bit Addressing Forms with the ModR/M Byte

r8(/r)                           AL    CL    DL    BL    AH    CH    DH    BH
r16(/r)                          AX    CX    DX    BX    SP    BP(1) SI    DI
r32(/r)                          EAX   ECX   EDX   EBX   ESP   EBP   ESI   EDI
mm(/r)                           MM0   MM1   MM2   MM3   MM4   MM5   MM6   MM7
xmm(/r)                          XMM0  XMM1  XMM2  XMM3  XMM4  XMM5  XMM6  XMM7
/digit (Opcode)                  0     1     2     3     4     5     6     7
REG =                            000   001   010   011   100   101   110   111

Effective Address     Mod  R/M   Value of ModR/M Byte (in Hexadecimal)
[BX+SI]               00   000   00    08    10    18    20    28    30    38
[BX+DI]               00   001   01    09    11    19    21    29    31    39
[BP+SI]               00   010   02    0A    12    1A    22    2A    32    3A
[BP+DI]               00   011   03    0B    13    1B    23    2B    33    3B
[SI]                  00   100   04    0C    14    1C    24    2C    34    3C
[DI]                  00   101   05    0D    15    1D    25    2D    35    3D
disp16(2)             00   110   06    0E    16    1E    26    2E    36    3E
[BX]                  00   111   07    0F    17    1F    27    2F    37    3F
[BX+SI]+disp8(3)      01   000   40    48    50    58    60    68    70    78
[BX+DI]+disp8         01   001   41    49    51    59    61    69    71    79
[BP+SI]+disp8         01   010   42    4A    52    5A    62    6A    72    7A
[BP+DI]+disp8         01   011   43    4B    53    5B    63    6B    73    7B
[SI]+disp8            01   100   44    4C    54    5C    64    6C    74    7C
[DI]+disp8            01   101   45    4D    55    5D    65    6D    75    7D
[BP]+disp8            01   110   46    4E    56    5E    66    6E    76    7E
[BX]+disp8            01   111   47    4F    57    5F    67    6F    77    7F
[BX+SI]+disp16        10   000   80    88    90    98    A0    A8    B0    B8
[BX+DI]+disp16        10   001   81    89    91    99    A1    A9    B1    B9
[BP+SI]+disp16        10   010   82    8A    92    9A    A2    AA    B2    BA
[BP+DI]+disp16        10   011   83    8B    93    9B    A3    AB    B3    BB
[SI]+disp16           10   100   84    8C    94    9C    A4    AC    B4    BC
[DI]+disp16           10   101   85    8D    95    9D    A5    AD    B5    BD
[BP]+disp16           10   110   86    8E    96    9E    A6    AE    B6    BE
[BX]+disp16           10   111   87    8F    97    9F    A7    AF    B7    BF
EAX/AX/AL/MM0/XMM0    11   000   C0    C8    D0    D8    E0    E8    F0    F8
ECX/CX/CL/MM1/XMM1    11   001   C1    C9    D1    D9    E1    E9    F1    F9
EDX/DX/DL/MM2/XMM2    11   010   C2    CA    D2    DA    E2    EA    F2    FA
EBX/BX/BL/MM3/XMM3    11   011   C3    CB    D3    DB    E3    EB    F3    FB
ESP/SP/AH/MM4/XMM4    11   100   C4    CC    D4    DC    E4    EC    F4    FC
EBP/BP/CH/MM5/XMM5    11   101   C5    CD    D5    DD    E5    ED    F5    FD
ESI/SI/DH/MM6/XMM6    11   110   C6    CE    D6    DE    E6    EE    F6    FE
EDI/DI/BH/MM7/XMM7    11   111   C7    CF    D7    DF    E7    EF    F7    FF

NOTES:
1. The default segment register is SS for the effective addresses containing a BP index, DS for other effective addresses.
2. The disp16 nomenclature denotes a 16-bit displacement following the ModR/M byte, to be added to the index.
3. The disp8 nomenclature denotes an 8-bit displacement following the ModR/M byte, to be sign-extended and added to the index.
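As a rough sketch of how the first three columns of Table 2-1 can be applied, the following Python function maps the Mod and R/M fields to the 16-bit addressing form and the default segment register given in note 1. The names BASES_16 and ea16 are hypothetical, and the register forms selected by Mod = 11B are deliberately left out.

```python
# Base/index combination selected by each R/M value in 16-bit addressing.
BASES_16 = ["BX+SI", "BX+DI", "BP+SI", "BP+DI", "SI", "DI", "BP", "BX"]


def ea16(mod, rm):
    """Return (addressing_form, default_segment) for Mod 0-2 and R/M 0-7."""
    if mod == 3:
        raise ValueError("Mod 11B selects a register, not a memory operand")
    if mod == 0 and rm == 6:
        # Special case in Table 2-1: a bare 16-bit displacement, no BP.
        return "disp16", "DS"
    form = f"[{BASES_16[rm]}]"
    if mod == 1:
        form += "+disp8"   # 8-bit displacement, sign-extended
    elif mod == 2:
        form += "+disp16"  # 16-bit displacement
    # Note 1: SS is the default segment when BP takes part in the address.
    segment = "SS" if "BP" in BASES_16[rm] else "DS"
    return form, segment


print(ea16(0, 0))
print(ea16(1, 6))
print(ea16(0, 6))
```

The Mod = 00B, R/M = 110B case is the one irregular entry: it replaces what would otherwise be [BP] with a direct 16-bit displacement.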
Table 2-2. 32-Bit Addressing Forms with the ModR/M Byte

r8(/r)                           AL    CL    DL    BL    AH    CH    DH    BH
r16(/r)                          AX    CX    DX    BX    SP    BP    SI    DI
r32(/r)                          EAX   ECX   EDX   EBX   ESP   EBP   ESI   EDI
mm(/r)                           MM0   MM1   MM2   MM3   MM4   MM5   MM6   MM7
xmm(/r)                          XMM0  XMM1  XMM2  XMM3  XMM4  XMM5  XMM6  XMM7
/digit (Opcode)                  0     1     2     3     4     5     6     7
REG =                            000   001   010   011   100   101   110   111

Effective Address     Mod  R/M   Value of ModR/M Byte (in Hexadecimal)
[EAX]                 00   000   00    08    10    18    20    28    30    38
[ECX]                 00   001   01    09    11    19    21    29    31    39
[EDX]                 00   010   02    0A    12    1A    22    2A    32    3A
[EBX]                 00   011   03    0B    13    1B    23    2B    33    3B
[--][--](1)           00   100   04    0C    14    1C    24    2C    34    3C
disp32(2)             00   101   05    0D    15    1D    25    2D    35    3D
[ESI]                 00   110   06    0E    16    1E    26    2E    36    3E
[EDI]                 00   111   07    0F    17    1F    27    2F    37    3F
disp8[EAX](3)         01   000   40    48    50    58    60    68    70    78
disp8[ECX]            01   001   41    49    51    59    61    69    71    79
disp8[EDX]            01   010   42    4A    52    5A    62    6A    72    7A
disp8[EBX]            01   011   43    4B    53    5B    63    6B    73    7B
disp8[--][--]         01   100   44    4C    54    5C    64    6C    74    7C
disp8[EBP]            01   101   45    4D    55    5D    65    6D    75    7D
disp8[ESI]            01   110   46    4E    56    5E    66    6E    76    7E
disp8[EDI]            01   111   47    4F    57    5F    67    6F    77    7F
disp32[EAX]           10   000   80    88    90    98    A0    A8    B0    B8
disp32[ECX]           10   001   81    89    91    99    A1    A9    B1    B9
disp32[EDX]           10   010   82    8A    92    9A    A2    AA    B2    BA
disp32[EBX]           10   011   83    8B    93    9B    A3    AB    B3    BB
disp32[--][--]        10   100   84    8C    94    9C    A4    AC    B4    BC
disp32[EBP]           10   101   85    8D    95    9D    A5    AD    B5    BD
disp32[ESI]           10   110   86    8E    96    9E    A6    AE    B6    BE
disp32[EDI]           10   111   87    8F    97    9F    A7    AF    B7    BF
EAX/AX/AL/MM0/XMM0    11   000   C0    C8    D0    D8    E0    E8    F0    F8
ECX/CX/CL/MM1/XMM1    11   001   C1    C9    D1    D9    E1    E9    F1    F9
EDX/DX/DL/MM2/XMM2    11   010   C2    CA    D2    DA    E2    EA    F2    FA
EBX/BX/BL/MM3/XMM3    11   011   C3    CB    D3    DB    E3    EB    F3    FB
ESP/SP/AH/MM4/XMM4    11   100   C4    CC    D4    DC    E4    EC    F4    FC
EBP/BP/CH/MM5/XMM5    11   101   C5    CD    D5    DD    E5    ED    F5    FD
ESI/SI/DH/MM6/XMM6    11   110   C6    CE    D6    DE    E6    EE    F6    FE
EDI/DI/BH/MM7/XMM7    11   111   C7    CF    D7    DF    E7    EF    F7    FF

NOTES:
1. The [--][--] nomenclature means a SIB follows the ModR/M byte.
2. The disp32 nomenclature denotes a 32-bit displacement following the SIB byte, to be added to the index.
3. The disp8 nomenclature denotes an 8-bit displacement following the SIB byte, to be sign-extended and added to the index.
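The memory rows of Table 2-2 can be summarized in a few lines of Python. This is a sketch under the table's own notation; the names REGS_32 and ea32 are hypothetical, and the function reports only the addressing form and whether a SIB byte is expected to follow.

```python
REGS_32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def ea32(mod, rm):
    """Return (addressing_form, needs_sib) for Mod 0-2 and R/M 0-7."""
    if mod == 3:
        raise ValueError("Mod 11B selects a register, not a memory operand")
    displacement = {0: "", 1: "disp8", 2: "disp32"}[mod]
    if rm == 4:
        # R/M = 100B: the [--][--] entries, meaning a SIB byte follows.
        return displacement + "[--][--]", True
    if mod == 0 and rm == 5:
        # Mod = 00B, R/M = 101B: a bare 32-bit displacement, no register.
        return "disp32", False
    return f"{displacement}[{REGS_32[rm]}]", False


print(ea32(0, 0))
print(ea32(1, 4))
print(ea32(0, 5))
```

Two R/M values are irregular: 100B never names ESP directly in a memory form, and 101B with Mod = 00B gives up [EBP] in favor of a direct displacement.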
Table 2-3 is organized similarly to Tables 2-1 and 2-2, except that its body gives the 256 possible values of the SIB byte, in hexadecimal. Which of the 8 general-purpose registers will be used as base is indicated across the top of the table, along with the corresponding values of the base field (bits 0, 1 and 2) in decimal and binary. The rows indicate which register is used as the index (determined by bits 3, 4 and 5) along with the scaling factor (determined by bits 6 and 7).
Table 2-3. 32-Bit Addressing Forms with the SIB Byte
r32                              EAX   ECX   EDX   EBX   ESP   [*]   ESI   EDI
Base =                           0     1     2     3     4     5     6     7
Base =                           000   001   010   011   100   101   110   111

Scaled Index          SS   Index Value of SIB Byte (in Hexadecimal)
[EAX]                 00   000   00    01    02    03    04    05    06    07
[ECX]                 00   001   08    09    0A    0B    0C    0D    0E    0F
[EDX]                 00   010   10    11    12    13    14    15    16    17
[EBX]                 00   011   18    19    1A    1B    1C    1D    1E    1F
none                  00   100   20    21    22    23    24    25    26    27
[EBP]                 00   101   28    29    2A    2B    2C    2D    2E    2F
[ESI]                 00   110   30    31    32    33    34    35    36    37
[EDI]                 00   111   38    39    3A    3B    3C    3D    3E    3F
[EAX*2]               01   000   40    41    42    43    44    45    46    47
[ECX*2]               01   001   48    49    4A    4B    4C    4D    4E    4F
[EDX*2]               01   010   50    51    52    53    54    55    56    57
[EBX*2]               01   011   58    59    5A    5B    5C    5D    5E    5F
none                  01   100   60    61    62    63    64    65    66    67
[EBP*2]               01   101   68    69    6A    6B    6C    6D    6E    6F
[ESI*2]               01   110   70    71    72    73    74    75    76    77
[EDI*2]               01   111   78    79    7A    7B    7C    7D    7E    7F
[EAX*4]               10   000   80    81    82    83    84    85    86    87
[ECX*4]               10   001   88    89    8A    8B    8C    8D    8E    8F
[EDX*4]               10   010   90    91    92    93    94    95    96    97
[EBX*4]               10   011   98    99    9A    9B    9C    9D    9E    9F
none                  10   100   A0    A1    A2    A3    A4    A5    A6    A7
[EBP*4]               10   101   A8    A9    AA    AB    AC    AD    AE    AF
[ESI*4]               10   110   B0    B1    B2    B3    B4    B5    B6    B7
[EDI*4]               10   111   B8    B9    BA    BB    BC    BD    BE    BF
[EAX*8]               11   000   C0    C1    C2    C3    C4    C5    C6    C7
[ECX*8]               11   001   C8    C9    CA    CB    CC    CD    CE    CF
[EDX*8]               11   010   D0    D1    D2    D3    D4    D5    D6    D7
[EBX*8]               11   011   D8    D9    DA    DB    DC    DD    DE    DF
none                  11   100   E0    E1    E2    E3    E4    E5    E6    E7
[EBP*8]               11   101   E8    E9    EA    EB    EC    ED    EE    EF
[ESI*8]               11   110   F0    F1    F2    F3    F4    F5    F6    F7
[EDI*8]               11   111   F8    F9    FA    FB    FC    FD    FE    FF

NOTE:
1. The [*] nomenclature means a disp32 with no base if MOD is 00, [EBP] otherwise. This provides the following addressing modes:
    disp32[index]         (MOD=00).
    disp8[EBP][index]     (MOD=01).
    disp32[EBP][index]    (MOD=10).
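A short Python sketch of how a SIB byte from Table 2-3 might be interpreted follows. The names REGS_32 and sib_form are hypothetical. The function describes only the base and scaled-index parts; any displacement implied by the ModR/M Mod field is left out, except for the disp32 that replaces the base when the base field is 101B and MOD is 00.

```python
REGS_32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def sib_form(sib, mod):
    """Describe the base and scaled index selected by a SIB byte.

    sib is the SIB byte value; mod is the Mod field of the preceding
    ModR/M byte (0, 1, or 2).
    """
    scale = 1 << (sib >> 6)    # SS bits 7-6: factor 1, 2, 4, or 8
    index = (sib >> 3) & 0b111  # bits 5-3
    base = sib & 0b111          # bits 2-0
    if base == 5 and mod == 0:
        parts = ["disp32"]      # the [*] column with MOD=00: no base register
    else:
        parts = [f"[{REGS_32[base]}]"]
    if index != 4:              # index 100B means no index register
        name = REGS_32[index]
        parts.append(f"[{name}*{scale}]" if scale > 1 else f"[{name}]")
    return "".join(parts)


print(sib_form(0x88, 0))
print(sib_form(0x25, 0))
print(sib_form(0x4D, 1))
```

The value 88H, for example, sits in the [ECX*4] row and the EAX column of Table 2-3, which is what the first call reports.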
CHAPTER 3 INSTRUCTION SET REFERENCE

This chapter describes the complete Intel Architecture instruction set, including the integer, floating-point, MMX technology, Streaming SIMD Extensions, and system instructions. The instruction descriptions are arranged in alphabetical order. For each instruction, the forms are given for each operand combination, including the opcode, operands required, and a description. Also given for each instruction are a description of the instruction and its operands, an operational description, a description of the effect of the instructions on flags in the EFLAGS register, and a summary of the exceptions that can be generated.
3.1.
INTERPRETING THE INSTRUCTION REFERENCE PAGES
This section describes the information contained in the various sections of the instruction reference pages that make up the majority of this chapter. It also explains the notational conventions and abbreviations used in these sections.
3.1.1.
Instruction Format
The following is an example of the format used for each Intel Architecture instruction description in this chapter:
CMC: Complement Carry Flag

Opcode    Instruction    Description
F5        CMC            Complement carry flag
3.1.1.1.
OPCODE COLUMN
The Opcode column gives the complete object code produced for each form of the instruction. When possible, the codes are given as hexadecimal bytes, in the same order in which they appear in memory. Definitions of entries other than hexadecimal bytes are as follows:
/digit: A digit between 0 and 7 indicates that the ModR/M byte of the instruction uses only the r/m (register or memory) operand. The reg field contains the digit that provides an extension to the instruction's opcode.
/r: Indicates that the ModR/M byte of the instruction contains both a register operand and an r/m operand.
cb, cw, cd, cp: A 1-byte (cb), 2-byte (cw), 4-byte (cd), or 6-byte (cp) value following the opcode that is used to specify a code offset and possibly a new value for the code segment register.
ib, iw, id: A 1-byte (ib), 2-byte (iw), or 4-byte (id) immediate operand to the instruction that follows the opcode, ModR/M bytes or scale-indexing bytes. The opcode determines if the operand is a signed value. All words and doublewords are given with the low-order byte first.
+rb, +rw, +rd: A register code, from 0 through 7, added to the hexadecimal byte given at the left of the plus sign to form a single opcode byte. The register codes are given in Table 3-1.
+i: A number used in floating-point instructions when one of the operands is ST(i) from the FPU register stack. The number i (which can range from 0 to 7) is added to the hexadecimal byte given at the left of the plus sign to form a single opcode byte.
Table 3-1. Register Encodings Associated with the +rb, +rw, and +rd Nomenclature
rb          rw          rd
AL = 0      AX = 0      EAX = 0
CL = 1      CX = 1      ECX = 1
DL = 2      DX = 2      EDX = 2
BL = 3      BX = 3      EBX = 3
AH = 4      SP = 4      ESP = 4
CH = 5      BP = 5      EBP = 5
DH = 6      SI = 6      ESI = 6
BH = 7      DI = 7      EDI = 7
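The +rd nomenclature can be illustrated with a small Python sketch. The helper names are hypothetical, and the base opcode B8 for the MOV r32, imm32 form is an assumption taken from general knowledge of the instruction set rather than from this section. The immediate is emitted low-order byte first, as stated for the id entry in the Opcode column definitions.

```python
# Register codes for the +rd nomenclature (Table 3-1).
RD_CODES = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def opcode_plus_rd(base_opcode, register):
    """Add the register code (0 through 7) to the base opcode byte."""
    return base_opcode + RD_CODES.index(register)


def encode_mov_r32_imm32(register, immediate):
    """Encode MOV r32, imm32, assuming the B8+rd id form."""
    opcode = opcode_plus_rd(0xB8, register)
    # The 4-byte immediate follows the opcode, low-order byte first.
    return bytes([opcode]) + (immediate & 0xFFFFFFFF).to_bytes(4, "little")


print(encode_mov_r32_imm32("ECX", 1).hex(" "))
```

Because the register code is folded into the opcode byte itself, this form needs no ModR/M byte at all.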
3.1.1.2.
INSTRUCTION COLUMN
The Instruction column gives the syntax of the instruction statement as it would appear in an ASM386 program. The following is a list of the symbols used to represent operands in the instruction statements:
rel8: A relative address in the range from 128 bytes before the end of the instruction to 127 bytes after the end of the instruction.
rel16 and rel32: A relative address within the same code segment as the instruction assembled. The rel16 symbol applies to instructions with an operand-size attribute of 16 bits; the rel32 symbol applies to instructions with an operand-size attribute of 32 bits.
ptr16:16 and ptr16:32: A far pointer, typically in a code segment different from that of the instruction. The notation 16:16 indicates that the value of the pointer has two parts. The value to the left of the colon is a 16-bit selector or value destined for the code segment register. The value to the right corresponds to the offset within the destination segment. The ptr16:16 symbol is used when the instruction's operand-size attribute is 16 bits; the ptr16:32 symbol is used when the operand-size attribute is 32 bits.
r8: One of the byte general-purpose registers AL, CL, DL, BL, AH, CH, DH, or BH.
r16: One of the word general-purpose registers AX, CX, DX, BX, SP, BP, SI, or DI.
r32: One of the doubleword general-purpose registers EAX, ECX, EDX, EBX, ESP, EBP, ESI, or EDI.
imm8: An immediate byte value. The imm8 symbol is a signed number between -128 and +127 inclusive. For instructions in which imm8 is combined with a word or doubleword operand, the immediate value is sign-extended to form a word or doubleword. The upper byte of the word is filled with the topmost bit of the immediate value.
imm16: An immediate word value used for instructions whose operand-size attribute is 16 bits. This is a number between -32,768 and +32,767 inclusive.
imm32: An immediate doubleword value used for instructions whose operand-size attribute is 32 bits. It allows the use of a number between +2,147,483,647 and -2,147,483,648 inclusive.
r/m8: A byte operand that is either the contents of a byte general-purpose register (AL, BL, CL, DL, AH, BH, CH, and DH), or a byte from memory.
r/m16: A word general-purpose register or memory operand used for instructions whose operand-size attribute is 16 bits. The word general-purpose registers are: AX, BX, CX, DX, SP, BP, SI, and DI. The contents of memory are found at the address provided by the effective address computation.
r/m32: A doubleword general-purpose register or memory operand used for instructions whose operand-size attribute is 32 bits. The doubleword general-purpose registers are: EAX, EBX, ECX, EDX, ESP, EBP, ESI, and EDI. The contents of memory are found at the address provided by the effective address computation.
m: A 16- or 32-bit operand in memory.
m8: A byte operand in memory, usually expressed as a variable or array name, but pointed to by the DS:(E)SI or ES:(E)DI registers. This nomenclature is used only with the string instructions and the XLAT instruction.
m16: A word operand in memory, usually expressed as a variable or array name, but pointed to by the DS:(E)SI or ES:(E)DI registers. This nomenclature is used only with the string instructions.
m32: A doubleword operand in memory, usually expressed as a variable or array name, but pointed to by the DS:(E)SI or ES:(E)DI registers. This nomenclature is used only with the string instructions.
m64: A quadword operand in memory. This nomenclature is used only with the CMPXCHG8B instruction.
m128: A double quadword operand in memory. This nomenclature is used only with the Streaming SIMD Extensions.
m16:16, m16:32: A memory operand containing a far pointer composed of two numbers. The number to the left of the colon corresponds to the pointer's segment selector. The number to the right corresponds to its offset.
m16&32, m16&16, m32&32: A memory operand consisting of data item pairs whose sizes are indicated on the left and the right side of the ampersand. All memory addressing modes are allowed. The m16&16 and m32&32 operands are used by the BOUND instruction to provide an operand containing an upper and lower bounds for array indices. The m16&32 operand is used by LIDT and LGDT to provide a word with which to load the limit field, and a doubleword with which to load the base field of the corresponding GDTR and IDTR registers.
moffs8, moffs16, moffs32: A simple memory variable (memory offset) of type byte, word, or doubleword used by some variants of the MOV instruction. The actual address is given by a simple offset relative to the segment base. No ModR/M byte is used in the instruction. The number shown with moffs indicates its size, which is determined by the address-size attribute of the instruction.
Sreg: A segment register. The segment register bit assignments are ES=0, CS=1, SS=2, DS=3, FS=4, and GS=5.
m32real, m64real, m80real: A single-, double-, and extended-real (respectively) floating-point operand in memory.
m16int, m32int, m64int: A word-, short-, and long-integer (respectively) floating-point operand in memory.
ST or ST(0): The top element of the FPU register stack.
ST(i): The ith element from the top of the FPU register stack. (i = 0 through 7)
mm: An MMX technology register. The 64-bit MMX technology registers are: MM0 through MM7.
xmm: A SIMD floating-point register. The 128-bit SIMD floating-point registers are: XMM0 through XMM7.
mm/m32: The low order 32 bits of an MMX technology register or a 32-bit memory operand. The 64-bit MMX technology registers are: MM0 through MM7. The contents of memory are found at the address provided by the effective address computation.
mm/m64: An MMX technology register or a 64-bit memory operand. The 64-bit MMX technology registers are: MM0 through MM7. The contents of memory are found at the address provided by the effective address computation.
xmm/m32: A SIMD floating-point register or a 32-bit memory operand. The 128-bit SIMD floating-point registers are XMM0 through XMM7. The contents of memory are found at the address provided by the effective address computation.
xmm/m64: A SIMD floating-point register or a 64-bit memory operand. The 128-bit SIMD floating-point registers are XMM0 through XMM7. The contents of memory are found at the address provided by the effective address computation.
xmm/m128: A SIMD floating-point register or a 128-bit memory operand. The 128-bit SIMD floating-point registers are XMM0 through XMM7. The contents of memory are found at the address provided by the effective address computation.
3.1.1.3.
DESCRIPTION COLUMN
The Description column following the Instruction column briefly explains the various forms of the instruction. The following Description and Operation sections contain more details of the instruction's operation.
3.1.1.4.
DESCRIPTION
The Description section describes the purpose of the instructions and the required operands. It also discusses the effect of the instruction on flags.
3-5
3.1.2. Operation
The Operation section contains an algorithmic description (written in pseudo-code) of the instruction. The pseudo-code uses a notation similar to the Algol or Pascal language. The algorithms are composed of the following elements:
Comments are enclosed within the symbol pairs "(*" and "*)".
Compound statements are enclosed in keywords, such as IF, THEN, ELSE, and FI for an if statement, DO and OD for a do statement, or CASE ... OF and ESAC for a case statement.
A register name implies the contents of the register. A register name enclosed in brackets implies the contents of the location whose address is contained in that register. For example, ES:[DI] indicates the contents of the location whose ES segment relative address is in register DI. [SI] indicates the contents of the address contained in register SI relative to SI's default segment (DS) or overridden segment.
Parentheses around the "E" in a general-purpose register name, such as (E)SI, indicate that an offset is read from the SI register if the current address-size attribute is 16, or from the ESI register if the address-size attribute is 32.
Brackets are also used for memory operands, where they mean that the contents of the memory location is a segment-relative offset. For example, [SRC] indicates that the contents of the source operand is a segment-relative offset.
A ← B; indicates that the value of B is assigned to A.
The symbols =, ≠, ≥, and ≤ are relational operators used to compare two values, meaning equal, not equal, greater or equal, less or equal, respectively. A relational expression such as A = B is TRUE if the value of A is equal to B; otherwise it is FALSE.
The expressions "<< COUNT" and ">> COUNT" indicate that the destination operand should be shifted left or right, respectively, by the number of bits indicated by the count operand.
The following identifiers are used in the algorithmic descriptions:
OperandSize and AddressSize: The OperandSize identifier represents the operand-size attribute of the instruction, which is either 16 or 32 bits. The AddressSize identifier represents the address-size attribute, which is either 16 or 32 bits. For example, the following pseudo-code indicates that the operand-size attribute depends on the form of the CMPS instruction used.
    IF instruction = CMPSW
        THEN OperandSize ← 16;
        ELSE
            IF instruction = CMPSD
                THEN OperandSize ← 32;
            FI;
    FI;
Refer to Section 3.8., Operand-Size and Address-Size Attributes, in Chapter 3, Basic Execution Environment, of the Intel Architecture Software Developer's Manual, Volume 1, for general guidelines on how these attributes are determined.
StackAddrSize: Represents the stack address-size attribute associated with the instruction, which has a value of 16 or 32 bits. For more information, refer to Section 4.2.3., Address-Size Attributes for Stack Accesses, in Chapter 4, Procedure Calls, Interrupts, and Exceptions, of the Intel Architecture Software Developer's Manual, Volume 1.
SRC: Represents the source operand.
DEST: Represents the destination operand.
The following functions are used in the algorithmic descriptions:
ZeroExtend(value): Returns a value zero-extended to the operand-size attribute of the instruction. For example, if the operand-size attribute is 32, zero extending a byte value of -10 converts the byte from F6H to a doubleword value of 000000F6H. If the value passed to the ZeroExtend function and the operand-size attribute are the same size, ZeroExtend returns the value unaltered.
SignExtend(value): Returns a value sign-extended to the operand-size attribute of the instruction. For example, if the operand-size attribute is 32, sign extending a byte containing the value -10 converts the byte from F6H to a doubleword value of FFFFFFF6H. If the value passed to the SignExtend function and the operand-size attribute are the same size, SignExtend returns the value unaltered.
SaturateSignedWordToSignedByte: Converts a signed 16-bit value to a signed 8-bit value. If the signed 16-bit value is less than -128, it is represented by the saturated value -128 (80H); if it is greater than 127, it is represented by the saturated value 127 (7FH).
SaturateSignedDwordToSignedWord: Converts a signed 32-bit value to a signed 16-bit value. If the signed 32-bit value is less than -32768, it is represented by the saturated value -32768 (8000H); if it is greater than 32767, it is represented by the saturated value 32767 (7FFFH).
SaturateSignedWordToUnsignedByte: Converts a signed 16-bit value to an unsigned 8-bit value. If the signed 16-bit value is less than zero, it is represented by the saturated value zero (00H); if it is greater than 255, it is represented by the saturated value 255 (FFH).
SaturateToSignedByte: Represents the result of an operation as a signed 8-bit value. If the result is less than -128, it is represented by the saturated value -128 (80H); if it is greater than 127, it is represented by the saturated value 127 (7FH).
SaturateToSignedWord: Represents the result of an operation as a signed 16-bit value. If the result is less than -32768, it is represented by the saturated value -32768 (8000H); if it is greater than 32767, it is represented by the saturated value 32767 (7FFFH).
SaturateToUnsignedByte: Represents the result of an operation as an unsigned 8-bit value. If the result is less than zero, it is represented by the saturated value zero (00H); if it is greater than 255, it is represented by the saturated value 255 (FFH).
SaturateToUnsignedWord: Represents the result of an operation as an unsigned 16-bit value. If the result is less than zero, it is represented by the saturated value zero (00H); if it is greater than 65535, it is represented by the saturated value 65535 (FFFFH).
LowOrderWord(DEST * SRC): Multiplies a word operand by a word operand and stores the least significant word of the doubleword result in the destination operand.
HighOrderWord(DEST * SRC): Multiplies a word operand by a word operand and stores the most significant word of the doubleword result in the destination operand.
Push(value): Pushes a value onto the stack. The number of bytes pushed is determined by the operand-size attribute of the instruction. Refer to the Operation section in "PUSH - Push Word or Doubleword Onto the Stack" in this chapter for more information on the push operation.
Pop(): Removes the value from the top of the stack and returns it. The statement EAX ← Pop(); assigns to EAX the 32-bit value from the top of the stack. Pop returns either a word or a doubleword depending on the operand-size attribute. Refer to the Operation section in "POP - Pop a Value from the Stack" in this chapter for more information on the pop operation.
PopRegisterStack: Marks the FPU ST(0) register as empty and increments the FPU register stack pointer (TOP) by 1.
Switch-Tasks: Performs a task switch.
Bit(BitBase, BitOffset): Returns the value of a bit within a bit string, which is a sequence of bits in memory or a register. Bits are numbered from low-order to high-order within registers and within memory bytes. If the base operand is a register, the offset can be in the range 0..31. This offset addresses a bit within the indicated register. As an example, the function Bit[EAX,21] is illustrated in Figure 3-1.
Figure 3-1. Bit Offset for BIT[EAX,21] (bits of EAX numbered 31 down to 0; BitOffset = 21)
If BitBase is a memory address, BitOffset can range from -2 GBits to 2 GBits. The addressed bit is numbered (BitOffset MOD 8) within the byte at address (BitBase + (BitOffset DIV 8)), where DIV is signed division with rounding towards negative infinity, and MOD returns a positive number. This operation is illustrated in Figure 3-2.
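The memory form of Bit() can be sketched in C as follows. This is a minimal illustration, not the processor's implementation: the names bit_at, floor_div8, and mod8 are hypothetical, and the caller is assumed to pass a base pointer with enough valid bytes on both sides of it. Because the C operators / and % truncate toward zero, the sketch computes the rounded-down quotient and the non-negative remainder explicitly.

```c
#include <stdint.h>

/* DIV with rounding toward negative infinity (C's "/" truncates toward zero). */
static int64_t floor_div8(int64_t x)
{
    return (x >= 0) ? x / 8 : -((-x + 7) / 8);
}

/* MOD that always returns a value in 0..7 (C's "%" can return a negative value). */
static int mod8(int64_t x)
{
    return (int)(((x % 8) + 8) % 8);
}

/* Returns bit (bit_offset MOD 8) of the byte at base + (bit_offset DIV 8).
   Assumption: the addressed byte lies inside a buffer owned by the caller. */
int bit_at(const uint8_t *base, int64_t bit_offset)
{
    const uint8_t *byte = base + floor_div8(bit_offset);
    return (*byte >> mod8(bit_offset)) & 1;
}
```

With a bit offset of +13 the sketch addresses bit 5 of the byte at base + 1; with a bit offset of -11 it addresses bit 5 of the byte at base - 2.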
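The ZeroExtend, SignExtend, and saturation functions described earlier in this section can be modeled in a few lines of C. The sketch below assumes a 32-bit operand-size attribute and a byte-sized input for the extension functions; all function names are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* ZeroExtend of a byte to a doubleword: upper 24 bits become 0. */
uint32_t zero_extend8(uint8_t v)
{
    return (uint32_t)v;
}

/* SignExtend of a byte to a doubleword: upper 24 bits copy bit 7. */
uint32_t sign_extend8(uint8_t v)
{
    return (v & 0x80u) ? (0xFFFFFF00u | v) : (uint32_t)v;
}

/* SaturateSignedWordToSignedByte: clamp to the range -128..127. */
int8_t saturate_signed_word_to_signed_byte(int16_t v)
{
    if (v < -128) return -128;
    if (v > 127)  return 127;
    return (int8_t)v;
}

/* SaturateSignedWordToUnsignedByte: clamp to the range 0..255. */
uint8_t saturate_signed_word_to_unsigned_byte(int16_t v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}
```

For the byte F6H (the two's complement encoding of -10), zero_extend8 yields 000000F6H and sign_extend8 yields FFFFFFF6H, matching the examples given for the two functions.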
3.1.3. Intel C/C++ Compiler Intrinsics Equivalent
The Pentium with MMX technology, Pentium II, and Pentium III processors have characteristics that enable the development of advanced multimedia applications. This section describes the compiler intrinsic equivalents that can be used with the Intel C/C++ Compiler. Intrinsics are special coding extensions that allow using the syntax of C function calls and C variables instead of hardware registers. Using these intrinsics frees programmers from having to manage registers and write assembly code. Further, the compiler optimizes the instruction scheduling so that executables run faster.
The following sections discuss the intrinsics API and the MMX technology and SIMD floating-point intrinsics. Each intrinsic equivalent is listed with the instruction description. There may be additional intrinsics that do not have an instruction equivalent. It is strongly recommended that the reader consult the compiler documentation for the complete list of supported intrinsics. Please refer to the Intel C/C++ Compiler User's Guide for Win32* Systems With Streaming SIMD Extension Support (Order Number 718195-00B). Refer to Appendix C, Compiler Intrinsics and Functional Equivalents, for more information on using intrinsics.
Most of the intrinsics that use __m64 operands have two different names. If two intrinsic names are shown for the same equivalent, the first name is the intrinsic for Intel C/C++ Compiler versions prior to 4.0 and the second name should be used with the Intel C/C++ Compiler version 4.0 and future versions. The Intel C/C++ Compiler version 4.0 will support the old intrinsic names. Programs written using pre-4.0 intrinsic names will compile with version 4.0. Version 4.0 intrinsic names will not compile on pre-4.0 compilers.
3.1.3.1. THE INTRINSICS API
The benefit of coding with MMX technology intrinsics and SIMD floating-point intrinsics is that you can use the syntax of C function calls and C variables instead of hardware registers. This frees you from managing registers and programming assembly. Further, the compiler optimizes the instruction scheduling so that your executable runs faster. For each computational and data manipulation instruction in the new instruction set, there is a corresponding C intrinsic that implements it directly. The intrinsics allow you to specify the underlying implementation (instruction selection) of an algorithm yet leave instruction scheduling and register allocation to the compiler.
3.1.3.2. MMX TECHNOLOGY INTRINSICS
The MMX technology intrinsics are based on a new __m64 data type to represent the specific contents of an MMX technology register. You can specify values in bytes, short integers, 32-bit values, or a 64-bit object. The __m64 data type, however, is not a basic ANSI C data type, and therefore you must observe the following usage restrictions:
Use __m64 data only on the left-hand side of an assignment, as a return value, or as a parameter. You cannot use it with other arithmetic expressions ("+", ">>", and so on).
Use __m64 objects in aggregates, such as unions (to access the byte elements) and structures; the address of an __m64 object may be taken.
Use __m64 data only with the MMX technology intrinsics described in this guide and the Intel C/C++ Compiler User's Guide for Win32* Systems With Streaming SIMD Extension Support (Order Number 718195-00B). Refer to Appendix C, Compiler Intrinsics and Functional Equivalents, for more information on using intrinsics.
3.1.3.3. SIMD FLOATING-POINT INTRINSICS
The __m128 data type is used to represent the contents of an xmm register, which is either four packed single-precision floating-point values or one scalar single-precision number. The __m128 data type is not a basic ANSI C data type and therefore some restrictions are placed on its usage:
Use __m128 only on the left-hand side of an assignment, as a return value, or as a parameter. Do not use it in other arithmetic expressions such as "+" and ">>".
Do not initialize __m128 with literals; there is no way to express 128-bit constants.
Use __m128 objects in aggregates, such as unions (for example, to access the float elements) and structures. The address of an __m128 object may be taken.
Use __m128 data only with the intrinsics described in this user's guide. Refer to Appendix C, Compiler Intrinsics and Functional Equivalents, for more information on using intrinsics.
The compiler aligns __m128 local data to 16-byte boundaries on the stack. Global __m128 data is also 16-byte-aligned. (To align float arrays, you can use the alignment declspec described in the following section.) Because the new instruction set treats the SIMD floating-point registers in the same way whether you are using packed or scalar data, there is no __m32 data type to represent scalar data as you might expect. For scalar operations, you should use the __m128 objects and the scalar forms of the intrinsics; the compiler and the processor implement these operations with 32-bit memory references.
The suffixes ps and ss are used to denote packed single and scalar single precision operations. The packed floats are represented in right-to-left order, with the lowest word (right-most) being used for scalar operations: [z, y, x, w]. To explain how memory storage reflects this, consider the following example.
The operation
    float a[4] = { 1.0, 2.0, 3.0, 4.0 };
    __m128 t = _mm_load_ps(a);
produces the same result as follows:
    __m128 t = _mm_set_ps(4.0, 3.0, 2.0, 1.0);
In other words,
    t = [ 4.0, 3.0, 2.0, 1.0 ]
where the scalar element is 1.0.
Some intrinsics are composites because they require more than one instruction to implement them. You should be familiar with the hardware features provided by the Streaming SIMD Extensions and MMX technology when writing programs with the intrinsics. Keep the following important issues in mind:
Certain intrinsics, such as _mm_loadr_ps and _mm_cmpgt_ss, are not directly supported by the instruction set. While these intrinsics are convenient programming aids, be mindful of their implementation cost.
Floating-point data loaded or stored as __m128 objects must generally be 16-byte-aligned.
Some intrinsics require that their arguments be immediates, that is, constant integers (literals), due to the nature of the instruction.
The result of arithmetic operations acting on two NaN (Not a Number) arguments is undefined. Therefore, FP operations using NaN arguments will not match the expected behavior of the corresponding assembly instructions.
For a more detailed description of each intrinsic and additional information related to its usage, refer to the Intel C/C++ Compiler User's Guide for Win32* Systems With Streaming SIMD Extension Support (Order Number 718195-00B). Refer to Appendix C, Compiler Intrinsics and Functional Equivalents, for more information on using intrinsics.
3.1.4. Flags Affected
The Flags Affected section lists the flags in the EFLAGS register that are affected by the instruction. When a flag is cleared, it is equal to 0; when it is set, it is equal to 1. The arithmetic and logical instructions usually assign values to the status flags in a uniform manner. For more information, refer to Appendix A, EFLAGS Cross-Reference, of the Intel Architecture Software Developer's Manual, Volume 1. Non-conventional assignments are described in the Operation section. The values of flags listed as undefined may be changed by the instruction in an indeterminate manner. Flags that are not listed are unchanged by the instruction.
Figure 3-2. Memory Bit Indexing (two cases are shown: BitOffset = +13 addresses bit 5 of the byte at BitBase + 1; BitOffset = -11 addresses bit 5 of the byte at BitBase - 2)
3.1.5. FPU Flags Affected
The floating-point instructions have an FPU Flags Affected section that describes how each instruction can affect the four condition code flags of the FPU status word.
3.1.6. Protected Mode Exceptions
The Protected Mode Exceptions section lists the exceptions that can occur when the instruction is executed in protected mode and the reasons for the exceptions. Each exception is given a mnemonic that consists of a pound sign (#) followed by two letters and an optional error code in parentheses. For example, #GP(0) denotes a general protection exception with an error code of 0. Table 3-2 associates each two-letter mnemonic with the corresponding interrupt vector number and exception name. Refer to Chapter 5, Interrupt and Exception Handling, of the Intel Architecture Software Developer's Manual, Volume 3, for a detailed description of the exceptions. Application programmers should consult the documentation provided with their operating systems to determine the actions taken when exceptions occur.
3.1.7. Real-Address Mode Exceptions
The Real-Address Mode Exceptions section lists the exceptions that can occur when the instruction is executed in real-address mode.
Table 3-2. Exception Mnemonics, Names, and Vector Numbers

Vector No.  Mnemonic  Name                                        Source
0           #DE       Divide Error                                DIV and IDIV instructions.
1           #DB       Debug                                       Any code or data reference.
3           #BP       Breakpoint                                  INT 3 instruction.
4           #OF       Overflow                                    INTO instruction.
5           #BR       BOUND Range Exceeded                        BOUND instruction.
6           #UD       Invalid Opcode (Undefined Opcode)           UD2 instruction or reserved opcode. (1)
7           #NM       Device Not Available (No Math Coprocessor)  Floating-point or WAIT/FWAIT instruction.
8           #DF       Double Fault                                Any instruction that can generate an exception, an NMI, or an INTR.
10          #TS       Invalid TSS                                 Task switch or TSS access.
11          #NP       Segment Not Present                         Loading segment registers or accessing system segments.
12          #SS       Stack Segment Fault                         Stack operations and SS register loads.
13          #GP       General Protection                          Any memory reference and other protection checks.
14          #PF       Page Fault                                  Any memory reference.
16          #MF       Floating-Point Error (Math Fault)           Floating-point or WAIT/FWAIT instruction.
17          #AC       Alignment Check                             Any data reference in memory. (2)
18          #MC       Machine Check                               Model dependent. (3)
19          #XF       SIMD Floating-Point Numeric Error           Streaming SIMD Extensions. (4)

NOTES:
1. The UD2 instruction was introduced in the Pentium Pro processor.
2. This exception was introduced in the Intel486 processor.
3. This exception was introduced in the Pentium processor and enhanced in the Pentium Pro processor.
4. This exception was introduced in the Pentium III processor.
3.1.8. Virtual-8086 Mode Exceptions
The Virtual-8086 Mode Exceptions section lists the exceptions that can occur when the instruction is executed in virtual-8086 mode.
3.1.9. Floating-Point Exceptions
The Floating-Point Exceptions section lists additional exceptions that can occur when a floating-point instruction is executed in any mode. All of these exception conditions result in a floating-point error exception (#MF, vector number 16) being generated. Table 3-3 associates each one- or two-letter mnemonic with the corresponding exception name. Refer to Section 7.8., Floating-Point Exception Conditions, in Chapter 7, Floating-Point Unit, of the Intel Architecture Software Developer's Manual, Volume 1, for a detailed description of these exceptions.
Table 3-3. Floating-Point Exception Mnemonics and Names

Vector No.  Mnemonic  Name                                           Source
16          #IS       Floating-point invalid operation:
                      - Stack overflow or underflow                  - FPU stack overflow or underflow
            #IA       - Invalid arithmetic operation                 - Invalid FPU arithmetic operation
16          #Z        Floating-point divide-by-zero                  FPU divide-by-zero
16          #D        Floating-point denormalized operation          Attempting to operate on a denormal number
16          #O        Floating-point numeric overflow                FPU numeric overflow
16          #U        Floating-point numeric underflow               FPU numeric underflow
16          #P        Floating-point inexact result (precision)      Inexact result (precision)
3.1.10. SIMD Floating-Point Exceptions - Streaming SIMD Extensions Only
The SIMD Floating-Point Exceptions section lists additional exceptions that can occur when a SIMD floating-point instruction is executed in any mode. All of these exception conditions result in a SIMD floating-point error exception (#XF, vector number 19) being generated. Table 3-4 associates each one- or two-letter mnemonic with the corresponding exception name. For a detailed description of these exceptions, refer to Chapter 9, Programming with the Streaming SIMD Extensions, of the Intel Architecture Software Developer's Manual, Volume 1.
Table 3-4. SIMD Floating-Point Exception Mnemonics and Names

Vector No.  Mnemonic  Name                  Source
6           #UD       Invalid opcode        Memory access
6           #UD       Invalid opcode        Refer to Note 1 & Table 3-5
7           #NM       Device not available  Refer to Note 1 & Table 3-5
12          #SS       Stack exception       Memory access
13          #GP       General protection    Refer to Note 2
14          #PF       Page fault            Memory access
17          #AC       Alignment check       Refer to Note 3
19          #I        Invalid operation     Refer to Note 4
19          #Z        Divide-by-zero        Refer to Note 4
19          #D        Denormalized operand  Refer to Note 4
19          #O        Numeric overflow      Refer to Note 5
19          #U        Numeric underflow     Refer to Note 5
19          #P        Inexact result        Refer to Note 5

Note 1: These are system exceptions. Table 3-5 lists the causes for Interrupt 6 and Interrupt 7 with Streaming SIMD Extensions.
Note 2: Executing a Streaming SIMD Extension with a misaligned 128-bit memory reference generates a general protection exception; a 128-bit reference within the stack segment that is not aligned to a 16-byte boundary will also generate a GP fault, not a stack exception (SS). However, the MOVUPS instruction, which performs an unaligned 128-bit load or store, will not generate an exception for data that is not aligned to a 16-byte boundary.
Note 3: This type of alignment check is done for operands that are less than 128 bits in size: 32-bit scalar single and 16-bit/32-bit/64-bit integer MMX technology. The exception is the MOVUPS instruction, which performs a 128-bit unaligned load or store and is also covered by this alignment check. There are three conditions that must be true to enable #AC interrupt generation.
Note 4: Invalid, Divide-by-zero, and Denormal exceptions are pre-computation exceptions; that is, they are detected before any arithmetic operation occurs.
Note 5: Underflow, Overflow, and Precision exceptions are post-computation exceptions.
Table 3-5. Streaming SIMD Extensions Faults (Interrupts 6 & 7)

CR0.EM  CR0.TS  CR4.OSFXSR  CPUID.XMM  Exception
1       -       -           -          #UD Interrupt 6
0       1       1           1          #NM Interrupt 7
-       -       0           -          #UD Interrupt 6
-       -       -           0          #UD Interrupt 6

(A dash marks a cell for which no value is listed.)
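The conditions in Table 3-5 can be read as a small decision function. The C sketch below is one possible reading, under the assumption that the #UD conditions are checked before the #NM condition; the enum and function names are hypothetical, and the sketch models only the table, not the processor's actual checking logic.

```c
#include <stdbool.h>

/* Values match the interrupt vector numbers; 0 means no fault from this table. */
enum sse_fault { SSE_FAULT_NONE = 0, SSE_FAULT_UD = 6, SSE_FAULT_NM = 7 };

/* Assumed precedence: any #UD condition wins over the #NM condition. */
enum sse_fault sse_fault_for(bool cr0_em, bool cr0_ts,
                             bool cr4_osfxsr, bool cpuid_xmm)
{
    if (cr0_em || !cr4_osfxsr || !cpuid_xmm)
        return SSE_FAULT_UD;   /* Interrupt 6 */
    if (cr0_ts)
        return SSE_FAULT_NM;   /* Interrupt 7 */
    return SSE_FAULT_NONE;
}
```

For example, with CR0.EM = 0, CR4.OSFXSR = 1, CPUID.XMM = 1, and CR0.TS = 1, the function returns SSE_FAULT_NM, which corresponds to the second row of the table.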
3.2. INSTRUCTION REFERENCE
The remainder of this chapter provides detailed descriptions of each of the Intel Architecture instructions.
AAA - ASCII Adjust After Addition

Opcode  Instruction  Description
37      AAA          ASCII adjust AL after addition

Description
This instruction adjusts the sum of two unpacked BCD values to create an unpacked BCD result. The AL register is the implied source and destination operand for this instruction. The AAA instruction is only useful when it follows an ADD instruction that adds (binary addition) two unpacked BCD values and stores a byte result in the AL register. The AAA instruction then adjusts the contents of the AL register to contain the correct 1-digit unpacked BCD result. If the addition produces a decimal carry, the AH register is incremented by 1, and the CF and AF flags are set. If there was no decimal carry, the CF and AF flags are cleared and the AH register is unchanged. In either case, bits 4 through 7 of the AL register are cleared to 0.
Operation
    IF ((AL AND 0FH) > 9) OR (AF = 1)
        THEN
            AL ← (AL + 6);
            AH ← AH + 1;
            AF ← 1;
            CF ← 1;
        ELSE
            AF ← 0;
            CF ← 0;
    FI;
    AL ← AL AND 0FH;
Flags Affected
The AF and CF flags are set to 1 if the adjustment results in a decimal carry; otherwise they are cleared to 0. The OF, SF, ZF, and PF flags are undefined.
Exceptions (All Operating Modes)
None.
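The AAA operation can be followed step by step with a small C model. This is only a sketch of the pseudo-code in the Operation section; the struct aaa_state type and the aaa function are hypothetical names, and the flags that the instruction leaves undefined are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Only the registers and flags that AAA reads or writes. */
struct aaa_state {
    uint8_t al, ah;
    bool af, cf;
};

/* Models the AAA pseudo-code on a copy of the state. */
struct aaa_state aaa(struct aaa_state s)
{
    if ((s.al & 0x0F) > 9 || s.af) {
        s.al = (uint8_t)(s.al + 6);
        s.ah = (uint8_t)(s.ah + 1);
        s.af = true;
        s.cf = true;
    } else {
        s.af = false;
        s.cf = false;
    }
    s.al &= 0x0F;   /* bits 4 through 7 of AL are cleared */
    return s;
}
```

Adding the unpacked BCD digits 9 and 8 with ADD leaves AL = 11H with AF set. Applying the model to that state gives AH = 1 and AL = 7, the unpacked BCD digits of 17.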
AAD - ASCII Adjust AX Before Division

Opcode  Instruction    Description
D5 0A   AAD            ASCII adjust AX before division
D5 ib   (No mnemonic)  Adjust AX before division to number base imm8

Description
This instruction adjusts two unpacked BCD digits (the least-significant digit in the AL register and the most-significant digit in the AH register) so that a division operation performed on the result will yield a correct unpacked BCD value. The AAD instruction is only useful when it precedes a DIV instruction that divides (binary division) the adjusted value in the AX register by an unpacked BCD value. The AAD instruction sets the value in the AL register to (AL + (10 * AH)), and then clears the AH register to 00H. The value in the AX register is then equal to the binary equivalent of the original unpacked two-digit (base 10) number in registers AH and AL.
The generalized version of this instruction allows adjustment of two unpacked digits of any number base (refer to the Operation section below), by setting the imm8 byte to the selected number base (for example, 08H for octal, 0AH for decimal, or 0CH for base 12 numbers). The AAD mnemonic is interpreted by all assemblers to mean adjust ASCII (base 10) values. To adjust values in another number base, the instruction must be hand coded in machine code (D5 imm8).
Operation
    tempAL ← AL;
    tempAH ← AH;
    AL ← (tempAL + (tempAH * imm8)) AND FFH; (* imm8 is set to 0AH for the AAD mnemonic *)
    AH ← 0
The immediate value (imm8) is taken from the second byte of the instruction.
Flags Affected
The SF, ZF, and PF flags are set according to the result; the OF, AF, and CF flags are undefined.
Exceptions (All Operating Modes)
None.
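As an illustration of the generalized form, the following C sketch applies the AAD pseudo-code to a 16-bit AX value. The function name aad and its parameters are hypothetical; flags are not modeled.

```c
#include <stdint.h>

/* Models AAD: AL <- (AL + AH * base) AND FFH, AH <- 0.
   ax holds AH in its high byte and AL in its low byte. */
uint16_t aad(uint16_t ax, uint8_t base)
{
    uint8_t al = (uint8_t)(ax & 0xFF);
    uint8_t ah = (uint8_t)(ax >> 8);
    al = (uint8_t)((al + ah * base) & 0xFF);
    return al;   /* the high byte (AH) of the result is 0 */
}
```

With AH = 3 and AL = 5 (the unpacked decimal digits of 35) and base 0AH, the model returns 0023H, which is 35 in binary. With AH = 1, AL = 7 and base 08H, it returns 15, the value of octal 17.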
AAM - ASCII Adjust AX After Multiply

Opcode  Instruction    Description
D4 0A   AAM            ASCII adjust AX after multiply
D4 ib   (No mnemonic)  Adjust AX after multiply to number base imm8

Description
This instruction adjusts the result of the multiplication of two unpacked BCD values to create a pair of unpacked (base 10) BCD values. The AX register is the implied source and destination operand for this instruction. The AAM instruction is only useful when it follows a MUL instruction that multiplies (binary multiplication) two unpacked BCD values and stores a word result in the AX register. The AAM instruction then adjusts the contents of the AX register to contain the correct 2-digit unpacked (base 10) BCD result.
The generalized version of this instruction allows adjustment of the contents of the AX register to create two unpacked digits of any number base (refer to the Operation section below). Here, the imm8 byte is set to the selected number base (for example, 08H for octal, 0AH for decimal, or 0CH for base 12 numbers). The AAM mnemonic is interpreted by all assemblers to mean adjust to ASCII (base 10) values. To adjust to values in another number base, the instruction must be hand coded in machine code (D4 imm8).
Operation
    tempAL ← AL;
    AH ← tempAL / imm8; (* imm8 is set to 0AH for the AAM mnemonic *)
    AL ← tempAL MOD imm8;
The immediate value (imm8) is taken from the second byte of the instruction.
Flags Affected
The SF, ZF, and PF flags are set according to the result. The OF, AF, and CF flags are undefined.
Exceptions (All Operating Modes)
None with the default immediate value of 0AH. If, however, an immediate value of 0 is used, it will cause a #DE (divide error) exception.
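The AAM operation, including the divide-error case, can be sketched in C as follows. The function name aam, its return convention (0 for success, -1 standing in for #DE), and the output parameter are assumptions made for this illustration; flags are not modeled.

```c
#include <stdint.h>

/* Models AAM: AH <- AL / base, AL <- AL MOD base.
   Returns 0 and writes the new AX on success.
   Returns -1 (standing in for the #DE exception) when base is 0. */
int aam(uint8_t al, uint8_t base, uint16_t *ax_out)
{
    if (base == 0)
        return -1;
    *ax_out = (uint16_t)(((al / base) << 8) | (al % base));
    return 0;
}
```

Multiplying the unpacked digits 7 and 9 with MUL leaves 63 (3FH) in AL. With base 0AH the model produces AX = 0603H, that is, AH = 6 and AL = 3.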
AAS - ASCII Adjust AL After Subtraction

Opcode  Instruction  Description
3F      AAS          ASCII adjust AL after subtraction

Description
This instruction adjusts the result of the subtraction of two unpacked BCD values to create an unpacked BCD result. The AL register is the implied source and destination operand for this instruction. The AAS instruction is only useful when it follows a SUB instruction that subtracts (binary subtraction) one unpacked BCD value from another and stores a byte result in the AL register. The AAS instruction then adjusts the contents of the AL register to contain the correct 1-digit unpacked BCD result. If the subtraction produced a decimal borrow, the AH register is decremented by 1, and the CF and AF flags are set. If no decimal borrow occurred, the CF and AF flags are cleared, and the AH register is unchanged. In either case, the AL register is left with its top nibble set to 0.
Operation
    IF ((AL AND 0FH) > 9) OR (AF = 1)
        THEN
            AL ← AL - 6;
            AH ← AH - 1;
            AF ← 1;
            CF ← 1;
        ELSE
            CF ← 0;
            AF ← 0;
    FI;
    AL ← AL AND 0FH;
Flags Affected
The AF and CF flags are set to 1 if there is a decimal borrow; otherwise, they are cleared to 0. The OF, SF, ZF, and PF flags are undefined.
Exceptions (All Operating Modes)
None.
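A C model of the AAS pseudo-code, given here only as a sketch, makes the borrow handling concrete. The struct aas_state type and the aas function are hypothetical names; the flags left undefined by the instruction are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Only the registers and flags that AAS reads or writes. */
struct aas_state {
    uint8_t al, ah;
    bool af, cf;
};

/* Models the AAS pseudo-code on a copy of the state. */
struct aas_state aas(struct aas_state s)
{
    if ((s.al & 0x0F) > 9 || s.af) {
        s.al = (uint8_t)(s.al - 6);
        s.ah = (uint8_t)(s.ah - 1);
        s.af = true;
        s.cf = true;
    } else {
        s.cf = false;
        s.af = false;
    }
    s.al &= 0x0F;   /* top nibble of AL is cleared */
    return s;
}
```

Starting from the unpacked digits AH = 1, AL = 2 (decimal 12), the instruction SUB AL,5 leaves AL = FDH with AF set. Applying the model to that state gives AH = 0 and AL = 7, the unpacked result of 12 - 5.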
ADC - Add with Carry

Opcode    Instruction      Description
14 ib     ADC AL,imm8      Add with carry imm8 to AL
15 iw     ADC AX,imm16     Add with carry imm16 to AX
15 id     ADC EAX,imm32    Add with carry imm32 to EAX
80 /2 ib  ADC r/m8,imm8    Add with carry imm8 to r/m8
81 /2 iw  ADC r/m16,imm16  Add with carry imm16 to r/m16
81 /2 id  ADC r/m32,imm32  Add with CF imm32 to r/m32
83 /2 ib  ADC r/m16,imm8   Add with CF sign-extended imm8 to r/m16
83 /2 ib  ADC r/m32,imm8   Add with CF sign-extended imm8 into r/m32
10 /r     ADC r/m8,r8      Add with carry byte register to r/m8
11 /r     ADC r/m16,r16    Add with carry r16 to r/m16
11 /r     ADC r/m32,r32    Add with CF r32 to r/m32
12 /r     ADC r8,r/m8      Add with carry r/m8 to byte register
13 /r     ADC r16,r/m16    Add with carry r/m16 to r16
13 /r     ADC r32,r/m32    Add with CF r/m32 to r32

Description
This instruction adds the destination operand (first operand), the source operand (second operand), and the carry (CF) flag and stores the result in the destination operand. The destination operand can be a register or a memory location; the source operand can be an immediate, a register, or a memory location. (However, two memory operands cannot be used in one instruction.) The state of the CF flag represents a carry from a previous addition. When an immediate value is used as an operand, it is sign-extended to the length of the destination operand format.
The ADC instruction does not distinguish between signed or unsigned operands. Instead, the processor evaluates the result for both data types and sets the OF and CF flags to indicate a carry in the signed or unsigned result, respectively. The SF flag indicates the sign of the signed result.
The ADC instruction is usually executed as part of a multibyte or multiword addition in which an ADD instruction is followed by an ADC instruction.
Operation
    DEST ← DEST + SRC + CF;
Flags Affected
The OF, SF, ZF, AF, CF, and PF flags are set according to the result.
Protected Mode Exceptions
#GP(0)           If the destination is located in a nonwritable segment.
                 If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                 If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
#SS(0)           If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
Real-Address Mode Exceptions
#GP              If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS              If a memory operand effective address is outside the SS segment limit.
Virtual-8086 Mode Exceptions
#GP(0)           If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)           If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If alignment checking is enabled and an unaligned memory reference is made.
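The multiword addition pattern mentioned in the ADC description (an ADD on the low part followed by an ADC on the high part) can be illustrated in C. The sketch below adds two 64-bit values stored as pairs of 32-bit halves; the function name add64 and the {low, high} array layout are assumptions made for this example.

```c
#include <stdint.h>

/* Adds two 64-bit values held as {low, high} doubleword pairs.
   The first step plays the role of ADD, the second the role of ADC. */
void add64(const uint32_t a[2], const uint32_t b[2], uint32_t sum[2])
{
    uint32_t lo = a[0] + b[0];        /* ADD: low doublewords (wraps modulo 2^32) */
    uint32_t cf = (lo < a[0]);        /* carry out of the low addition, like CF */
    sum[0] = lo;
    sum[1] = a[1] + b[1] + cf;        /* ADC: high doublewords plus the carry */
}
```

For instance, adding 00000001_FFFFFFFFH and 00000000_00000001H produces a carry out of the low doubleword, so the result is 00000002_00000000H.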
3-22
ADDAdd
Opcode 04 ib 05 iw 05 id 80 /0 ib 81 /0 iw 81 /0 id 83 /0 ib 83 /0 ib 00 /r 01 /r 01 /r 02 /r 03 /r 03 /r Instruction ADD AL,imm8 ADD AX,imm16 ADD EAX,imm32 ADD r/m8,imm8 ADD r/m16,imm16 ADD r/m32,imm32 ADD r/m16,imm8 ADD r/m32,imm8 ADD r/m8,r8 ADD r/m16,r16 ADD r/m32,r32 ADD r8,r/m8 ADD r16,r/m16 ADD r32,r/m32 Description Add imm8 to AL Add imm16 to AX Add imm32 to EAX Add imm8 to r/m8 Add imm16 to r/m16 Add imm32 to r/m32 Add sign-extended imm8 to r/m16 Add sign-extended imm8 to r/m32 Add r8 to r/m8 Add r16 to r/m16 Add r32 to r/m32 Add r/m8 to r8 Add r/m16 to r16 Add r/m32 to r32
Description
This instruction adds the first operand (destination operand) and the second operand (source operand) and stores the result in the destination operand. The destination operand can be a register or a memory location; the source operand can be an immediate, a register, or a memory location. (However, two memory operands cannot be used in one instruction.) When an immediate value is used as an operand, it is sign-extended to the length of the destination operand format.

The ADD instruction does not distinguish between signed and unsigned operands. Instead, the processor evaluates the result for both data types and sets the OF flag to indicate an overflow in the signed result and the CF flag to indicate a carry in the unsigned result. The SF flag indicates the sign of the signed result.

Operation
DEST ← DEST + SRC;

Flags Affected
The OF, SF, ZF, AF, CF, and PF flags are set according to the result.
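The flag behavior described for ADD can be illustrated with a small software model. The following is only a sketch in portable C for the 8-bit register form; the type add_flags and the function add8 are hypothetical names introduced here and are not part of any Intel or Microsoft interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the six status flags that ADD updates. */
typedef struct {
    bool cf, of, sf, zf, af, pf;
} add_flags;

/* Model of ADD r/m8,r8: stores the sum in *dest and returns the flags. */
add_flags add8(uint8_t *dest, uint8_t src)
{
    uint8_t a = *dest;
    unsigned wide = (unsigned)a + (unsigned)src;  /* 9-bit sum */
    uint8_t r = (uint8_t)wide;
    add_flags f;

    f.cf = wide > 0xFFu;                        /* carry out of bit 7 (unsigned view) */
    f.of = ((a ^ r) & (src ^ r) & 0x80u) != 0;  /* both inputs differ in sign from result */
    f.sf = (r & 0x80u) != 0;                    /* sign of the signed result */
    f.zf = (r == 0);
    f.af = ((a ^ src ^ r) & 0x10u) != 0;        /* carry out of bit 3 */

    /* PF is set when the result byte holds an even number of 1 bits. */
    uint8_t p = r;
    p ^= (uint8_t)(p >> 4);
    p ^= (uint8_t)(p >> 2);
    p ^= (uint8_t)(p >> 1);
    f.pf = (p & 1u) == 0;

    *dest = r;
    return f;
}
```

Adding 01H to 7FH in this model gives 80H with OF set and CF clear (a signed overflow without an unsigned carry), while adding 01H to FFH gives 00H with CF and ZF set and OF clear.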
ADDPS - Packed Single-FP Add

Opcode       Instruction                Description
0F,58,/r     ADDPS xmm1, xmm2/m128      Add packed SP FP numbers from XMM2/Mem to XMM1.

Description
The ADDPS instruction adds the four packed single-precision floating-point (SP FP) numbers in XMM2/Mem to the corresponding numbers in XMM1 and stores the four sums in XMM1.
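[Figure: ADDPS xmm1, xmm2/m128. Each of the four SP FP values in xmm1 is added to the corresponding value in xmm2/m128, and the four sums are written to xmm1. In the illustrated example, every sum is 5.0.]
Figure 3-3. Operation of the ADDPS Instruction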
Operation
DEST[31-0]   = DEST[31-0] + SRC/m128[31-0];
DEST[63-32]  = DEST[63-32] + SRC/m128[63-32];
DEST[95-64]  = DEST[95-64] + SRC/m128[95-64];
DEST[127-96] = DEST[127-96] + SRC/m128[127-96];
Intel C/C++ Compiler Intrinsic Equivalent

__m128 _mm_add_ps(__m128 a, __m128 b)
Adds the four SP FP values of a and b.
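The element-wise behavior of ADDPS can be sketched in portable C without any SIMD header. The function name addps_model is hypothetical, and the sketch assumes that array index 0 stands for bits 31-0 and index 3 for bits 127-96; it ignores exceptions and rounding-control settings.

```c
/* Portable model of ADDPS: element i of dest receives dest[i] + src[i].
 * Index 0 corresponds to bits 31-0, index 3 to bits 127-96 (assumption). */
void addps_model(float dest[4], const float src[4])
{
    for (int i = 0; i < 4; i++) {
        dest[i] = dest[i] + src[i];
    }
}
```

On a compiler that provides the intrinsic named above, _mm_add_ps(a, b) performs the same four additions on __m128 values.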

Exceptions
General protection exception if not aligned on 16-byte boundary, regardless of segment.

Numeric Exceptions
Overflow, Underflow, Invalid, Precision, Denormal.

Protected Mode Exceptions
#GP(0)             For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)             For an illegal address in the SS segment.
#PF(fault-code)    For a page fault.
#UD                If CR0.EM = 1.
#NM                If TS bit in CR0 is set.
#XM                For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 1).
#UD                For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 0).
#UD                If CR4.OSFXSR (bit 9) = 0.
#UD                If CPUID.XMM (EDX bit 25) = 0.

Real Address Mode Exceptions
Interrupt 13       If any part of the operand would lie outside of the effective address space from 0 to 0FFFFH.
#UD                If CR0.EM = 1.
#NM                If TS bit in CR0 is set.
#XM                For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 1).
#UD                For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 0).

Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#PF(fault-code)    For a page fault.
ADDSS - Scalar Single-FP Add

Opcode         Instruction               Description
F3,0F,58,/r    ADDSS xmm1, xmm2/m32      Add the lower SP FP number from XMM2/Mem to XMM1.

Description
The ADDSS instruction adds the lowest SP FP number of XMM2/Mem to the lowest SP FP number of XMM1 and stores the sum in the lowest field of XMM1; the upper three fields are passed through from xmm1.
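[Figure: ADDSS xmm1, xmm2/m32. Only the lowest SP FP value of xmm1 is added to the lowest SP FP value of xmm2/m32; the upper three values of xmm1 are unchanged. In the illustrated example, the lowest sum is 5.0.]
Figure 3-4. Operation of the ADDSS Instruction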
Operation
DEST[31-0]   = DEST[31-0] + SRC/m32[31-0];
DEST[63-32]  = DEST[63-32];
DEST[95-64]  = DEST[95-64];
DEST[127-96] = DEST[127-96];

Intel C/C++ Compiler Intrinsic Equivalent
__m128 _mm_add_ss(__m128 a, __m128 b)
Adds the lower SP FP (single-precision, floating-point) values of a and b; the upper three SP FP values are passed through from a.
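The pass-through of the upper three fields is the main difference from the packed form. A minimal sketch in portable C follows; addss_model is a hypothetical name, index 0 is assumed to represent bits 31-0, and exceptions are not modeled.

```c
/* Portable model of ADDSS: only the lowest element is added;
 * dest[1], dest[2], and dest[3] keep their previous values. */
void addss_model(float dest[4], float src_low)
{
    dest[0] = dest[0] + src_low;   /* bits 31-0 */
    /* bits 127-32 of the destination are intentionally left untouched */
}
```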

Exceptions
None.

Numeric Exceptions
Overflow, Underflow, Invalid, Precision, Denormal.

Protected Mode Exceptions
#GP(0)             For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)             For an illegal address in the SS segment.
#PF(fault-code)    For a page fault.
#UD                If CR0.EM = 1.
#NM                If TS bit in CR0 is set.
#AC                For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; and current CPL is 3).
#XM                For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 1).
#UD                For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 0).
#UD                If CR4.OSFXSR (bit 9) = 0.
#UD                If CPUID.XMM (EDX bit 25) = 0.

Real Address Mode Exceptions
Interrupt 13       If any part of the operand would lie outside of the effective address space from 0 to 0FFFFH.
#UD                If CR0.EM = 1.
#NM                If TS bit in CR0 is set.
#XM                For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 1).
#UD                For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 0).
#UD                If CR4.OSFXSR (bit 9) = 0.
#UD                If CPUID.XMM (EDX bit 25) = 0.

Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#AC                For unaligned memory reference if the current privilege level is 3.
#PF(fault-code)    For a page fault.
AND - Logical AND

Opcode       Instruction          Description
24 ib        AND AL,imm8          AL AND imm8
25 iw        AND AX,imm16         AX AND imm16
25 id        AND EAX,imm32        EAX AND imm32
80 /4 ib     AND r/m8,imm8        r/m8 AND imm8
81 /4 iw     AND r/m16,imm16      r/m16 AND imm16
81 /4 id     AND r/m32,imm32      r/m32 AND imm32
83 /4 ib     AND r/m16,imm8       r/m16 AND imm8 (sign-extended)
83 /4 ib     AND r/m32,imm8       r/m32 AND imm8 (sign-extended)
20 /r        AND r/m8,r8          r/m8 AND r8
21 /r        AND r/m16,r16        r/m16 AND r16
21 /r        AND r/m32,r32        r/m32 AND r32
22 /r        AND r8,r/m8          r8 AND r/m8
23 /r        AND r16,r/m16        r16 AND r/m16
23 /r        AND r32,r/m32        r32 AND r/m32
Description
This instruction performs a bitwise AND operation on the destination (first) and source (second) operands and stores the result in the destination operand location. The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location. Two memory operands cannot, however, be used in one instruction. Each bit of the result is set to 1 if both corresponding bits of the operands are 1; otherwise, it is cleared to 0.

Operation
DEST ← DEST AND SRC;

Flags Affected
The OF and CF flags are cleared; the SF, ZF, and PF flags are set according to the result. The state of the AF flag is undefined.

Protected Mode Exceptions
#GP(0)             If the destination operand points to a nonwritable segment.
                   If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                   If the DS, ES, FS, or GS register contains a null segment selector.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
ANDNPS - Bit-wise Logical And Not For Single-FP

Opcode       Instruction                 Description
0F,55,/r     ANDNPS xmm1, xmm2/m128      Invert the 128 bits in XMM1 and then AND the result with 128 bits from XMM2/Mem.

Description
The ANDNPS instruction returns a bit-wise logical AND between the complement of XMM1 and XMM2/Mem.
[Figure: ANDNPS xmm1, xmm2/m128. Each 32-bit field of xmm1 is inverted and then ANDed with the corresponding field of xmm2/m128; the result is written to xmm1. The illustrated example uses the bit patterns 0x00001111 and 0x11110000.]
Figure 3-5. Operation of the ANDNPS Instruction
Operation
DEST[127-0] = NOT (DEST[127-0]) AND SRC/m128[127-0];

Intel C/C++ Compiler Intrinsic Equivalent
__m128 _mm_andnot_ps(__m128 a, __m128 b)
Computes the bitwise AND-NOT of the four SP FP values of a and b.
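Because only the destination is complemented, operand order matters for ANDNPS. The sketch below models the operation in portable C on four 32-bit fields; andnps_model is a hypothetical name, and the 128-bit register is represented as an array of uint32_t purely for illustration.

```c
#include <stdint.h>

/* Portable model of ANDNPS: dest = (NOT dest) AND src, field by field. */
void andnps_model(uint32_t dest[4], const uint32_t src[4])
{
    for (int i = 0; i < 4; i++) {
        dest[i] = ~dest[i] & src[i];   /* the complement applies to dest only */
    }
}
```

With a destination field of 0x00001111 and a source field of 0x11110000, the result is 0x11110000; swapping the two operands gives 0x00001111 instead. The intrinsic _mm_andnot_ps(a, b) follows the same convention and complements its first argument.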

Exceptions
General protection exception if not aligned on 16-byte boundary, regardless of segment.

Numeric Exceptions
None.

Protected Mode Exceptions
#GP(0)             For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)             For an illegal address in the SS segment.
#PF(fault-code)    For a page fault.
#UD                If CR0.EM = 1.
#NM                If TS bit in CR0 is set.
#UD                If CR4.OSFXSR (bit 9) = 0.
#UD                If CPUID.XMM (EDX bit 25) = 0.

Real-Address Mode Exceptions
Interrupt 13       If any part of the operand would lie outside of the effective address space from 0 to 0FFFFH.
#UD                If CR0.EM = 1.
#NM                If TS bit in CR0 is set.

Virtual-8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#PF(fault-code)    For a page fault.
#UD                If CR4.OSFXSR (bit 9) = 0.
#UD                If CPUID.XMM (EDX bit 25) = 0.

Comments
The usage of Repeat Prefix (F3H) with ANDNPS is reserved. Different processor implementations may handle this prefix differently. Usage of this prefix with ANDNPS risks incompatibility with future processors.
ANDPS - Bit-wise Logical And For Single-FP

Opcode       Instruction                Description
0F,54,/r     ANDPS xmm1, xmm2/m128      Logical AND of 128 bits from XMM2/Mem to XMM1 register.

Description
The ANDPS instruction returns a bit-wise logical AND between XMM1 and XMM2/Mem.
[Figure: ANDPS xmm1, xmm2/m128. Each 32-bit field of xmm1 is ANDed with the corresponding field of xmm2/m128, and the result is written to xmm1. The illustrated example ANDs the bit patterns 0x00001111 and 0x11110000 in each field, giving 0x00000000.]
Figure 3-6. Operation of the ANDPS Instruction
Operation
DEST[127-0] AND= SRC/m128[127-0];

Intel C/C++ Compiler Intrinsic Equivalent
__m128 _mm_and_ps(__m128 a, __m128 b)
Computes the bitwise AND of the four SP FP values of a and b.

Exceptions
General protection exception if not aligned on 16-byte boundary, regardless of segment.

Numeric Exceptions
None.

Protected Mode Exceptions
#GP(0)             For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)             For an illegal address in the SS segment.
#PF(fault-code)    For a page fault.
#UD                If CR0.EM = 1.
#NM                If TS bit in CR0 is set.
#UD                If CR4.OSFXSR (bit 9) = 0.
#UD                If CPUID.XMM (EDX bit 25) = 0.

Real-Address Mode Exceptions
Interrupt 13       If any part of the operand would lie outside of the effective address space from 0 to 0FFFFH.
#UD                If CR0.EM = 1.
#NM                If TS bit in CR0 is set.
#UD                If CR4.OSFXSR (bit 9) = 0.
#UD                If CPUID.XMM (EDX bit 25) = 0.

Virtual-8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#PF(fault-code)    For a page fault.

Comments
The usage of Repeat Prefix (F3H) with ANDPS is reserved. Different processor implementations may handle this prefix differently. Usage of this prefix with ANDPS risks incompatibility with future processors.
ARPL - Adjust RPL Field of Segment Selector

Opcode    Instruction       Description
63 /r     ARPL r/m16,r16    Adjust RPL of r/m16 to not less than RPL of r16
Description
This instruction compares the RPL fields of two segment selectors. The first operand (the destination operand) contains one segment selector and the second operand (source operand) contains the other. (The RPL field is located in bits 0 and 1 of each operand.) If the RPL field of the destination operand is less than the RPL field of the source operand, the ZF flag is set and the RPL field of the destination operand is increased to match that of the source operand. Otherwise, the ZF flag is cleared and no change is made to the destination operand. (The destination operand can be a word register or a memory location; the source operand must be a word register.)

The ARPL instruction is provided for use by operating-system procedures (however, it can also be used by applications). It is generally used to adjust the RPL of a segment selector that has been passed to the operating system by an application program to match the privilege level of the application program. Here the segment selector passed to the operating system is placed in the destination operand and the segment selector for the application program's code segment is placed in the source operand. (The RPL field in the source operand represents the privilege level of the application program.) Execution of the ARPL instruction then ensures that the RPL of the segment selector received by the operating system is no lower (does not have a higher privilege) than the privilege level of the application program. (The segment selector for the application program's code segment can be read from the stack following a procedure call.)

Refer to Section 4.10.4., "Checking Caller Access Privileges (ARPL Instruction)" in Chapter 4, "Protection" of the Intel Architecture Software Developer's Manual, Volume 3, for more information about the use of this instruction.

Operation
IF DEST(RPL) < SRC(RPL)
    THEN
        ZF ← 1;
        DEST(RPL) ← SRC(RPL);
    ELSE
        ZF ← 0;
FI;

Flags Affected
The ZF flag is set to 1 if the RPL field of the destination operand is less than that of the source operand; otherwise, it is cleared to 0.
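The RPL adjustment can be followed step by step in a small software model. This is a sketch in portable C only; arpl_model is a hypothetical name, the selector values in the note below are illustrative, and protection checks and exceptions are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of ARPL r/m16,r16. Returns the resulting ZF value and updates
 * *dest when its RPL (bits 1-0) is numerically lower than the RPL of src. */
bool arpl_model(uint16_t *dest, uint16_t src)
{
    uint16_t dest_rpl = (uint16_t)(*dest & 3u);
    uint16_t src_rpl = (uint16_t)(src & 3u);

    if (dest_rpl < src_rpl) {
        /* Keep the index and table-indicator bits, replace only the RPL. */
        *dest = (uint16_t)((*dest & 0xFFFCu) | src_rpl);
        return true;    /* ZF = 1 */
    }
    return false;       /* ZF = 0, destination unchanged */
}
```

For example, a selector of 0008H (RPL 0) adjusted against a caller selector of 001BH (RPL 3) becomes 000BH and the model reports ZF = 1; a selector that already has RPL 3 is left unchanged and ZF = 0.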

Real-Address Mode Exceptions
#UD                The ARPL instruction is not recognized in real-address mode.

Virtual-8086 Mode Exceptions
#UD                The ARPL instruction is not recognized in virtual-8086 mode.
BOUND - Check Array Index Against Bounds

Opcode    Instruction         Description
62 /r     BOUND r16,m16&16    Check if r16 (array index) is within bounds specified by m16&16
62 /r     BOUND r32,m32&32    Check if r32 (array index) is within bounds specified by m32&32
Description
This instruction determines if the first operand (array index) is within the bounds of an array specified by the second operand (bounds operand). The array index is a signed integer located in a register. The bounds operand is a memory location that contains a pair of signed doubleword-integers (when the operand-size attribute is 32) or a pair of signed word-integers (when the operand-size attribute is 16). The first doubleword (or word) is the lower bound of the array and the second doubleword (or word) is the upper bound of the array. The array index must be greater than or equal to the lower bound and less than or equal to the upper bound plus the operand size in bytes. If the index is not within bounds, a BOUND range exceeded exception (#BR) is signaled. (When this exception is generated, the saved return instruction pointer points to the BOUND instruction.)

The bounds limit data structure (two words or doublewords containing the lower and upper limits of the array) is usually placed just before the array itself, making the limits addressable via a constant offset from the beginning of the array. Because the address of the array already will be present in a register, this practice avoids extra bus cycles to obtain the effective address of the array bounds.

Operation
IF (ArrayIndex < LowerBound OR ArrayIndex > (UpperBound + OperandSize/8))
    (* Below lower bound or above upper bound *)
    THEN
        #BR;
FI;

Flags Affected
None.

Protected Mode Exceptions
#BR                If the bounds test fails.
#UD                If second operand is not a memory location.
#GP(0)             If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                   If the DS, ES, FS, or GS register contains a null segment selector.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real-Address Mode Exceptions
#BR                If the bounds test fails.
#UD                If second operand is not a memory location.
#GP                If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS                If a memory operand effective address is outside the SS segment limit.

Virtual-8086 Mode Exceptions
#BR                If the bounds test fails.
#UD                If second operand is not a memory location.
#GP(0)             If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made.
BSF - Bit Scan Forward

Opcode    Instruction      Description
0F BC     BSF r16,r/m16    Bit scan forward on r/m16
0F BC     BSF r32,r/m32    Bit scan forward on r/m32
Description
This instruction searches the source operand (second operand) for the least significant set bit (1 bit). If a least significant 1 bit is found, its bit index is stored in the destination operand (first operand). The source operand can be a register or a memory location; the destination operand is a register. The bit index is an unsigned offset from bit 0 of the source operand. If the content of the source operand is 0, the content of the destination operand is undefined.

Operation
IF SRC = 0
    THEN
        ZF ← 1;
        DEST is undefined;
    ELSE
        ZF ← 0;
        temp ← 0;
        WHILE Bit(SRC, temp) = 0
        DO
            temp ← temp + 1;
            DEST ← temp;
        OD;
FI;

Flags Affected
The ZF flag is set to 1 if the source operand is 0; otherwise, the ZF flag is cleared. The CF, OF, SF, AF, and PF flags are undefined.

Protected Mode Exceptions
#GP(0)             If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                   If the DS, ES, FS, or GS register contains a null segment selector.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
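The forward scan in the BSF Operation section maps directly onto a short loop. The following portable C sketch models the 32-bit form; bsf32_model is a hypothetical name, and because the destination is undefined for a zero source, the model simply leaves *dest untouched in that case (one possible choice, not a statement about processor behavior).

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of BSF r32,r/m32. Returns the resulting ZF value. */
bool bsf32_model(uint32_t src, uint32_t *dest)
{
    if (src == 0) {
        return true;              /* ZF = 1; destination is undefined */
    }
    uint32_t temp = 0;
    while (((src >> temp) & 1u) == 0) {
        temp++;                   /* walk upward from bit 0 */
    }
    *dest = temp;
    return false;                 /* ZF = 0 */
}
```

For a source of 00000018H the model stores 3, the index of the lowest set bit.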
BSR - Bit Scan Reverse

Opcode    Instruction      Description
0F BD     BSR r16,r/m16    Bit scan reverse on r/m16
0F BD     BSR r32,r/m32    Bit scan reverse on r/m32
Description
This instruction searches the source operand (second operand) for the most significant set bit (1 bit). If a most significant 1 bit is found, its bit index is stored in the destination operand (first operand). The source operand can be a register or a memory location; the destination operand is a register. The bit index is an unsigned offset from bit 0 of the source operand. If the content of the source operand is 0, the content of the destination operand is undefined.

Operation
IF SRC = 0
    THEN
        ZF ← 1;
        DEST is undefined;
    ELSE
        ZF ← 0;
        temp ← OperandSize - 1;
        WHILE Bit(SRC, temp) = 0
        DO
            temp ← temp - 1;
            DEST ← temp;
        OD;
FI;

Flags Affected
The ZF flag is set to 1 if the source operand is 0; otherwise, the ZF flag is cleared. The CF, OF, SF, AF, and PF flags are undefined.

Protected Mode Exceptions
#GP(0)             If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                   If the DS, ES, FS, or GS register contains a null segment selector.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
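The reverse scan differs from BSF only in its starting point and direction. A portable C sketch of the 32-bit form follows; bsr32_model is a hypothetical name, and as an arbitrary modeling choice *dest is left untouched when the source is 0, since the manual defines no result for that case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of BSR r32,r/m32. Returns the resulting ZF value. */
bool bsr32_model(uint32_t src, uint32_t *dest)
{
    if (src == 0) {
        return true;              /* ZF = 1; destination is undefined */
    }
    uint32_t temp = 31;           /* OperandSize - 1 */
    while (((src >> temp) & 1u) == 0) {
        temp--;                   /* walk downward from the top bit */
    }
    *dest = temp;
    return false;                 /* ZF = 0 */
}
```

For a nonzero source, the stored index equals the integer part of the base-2 logarithm of the value; a source of 00000018H yields 4.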
BSWAP - Byte Swap

Opcode       Instruction    Description
0F C8+rd     BSWAP r32      Reverses the byte order of a 32-bit register.
Description
This instruction reverses the byte order of a 32-bit (destination) register: bits 0 through 7 are swapped with bits 24 through 31, and bits 8 through 15 are swapped with bits 16 through 23. This instruction is provided for converting little-endian values to big-endian format and vice versa. To swap bytes in a word value (16-bit register), use the XCHG instruction. When the BSWAP instruction references a 16-bit register, the result is undefined.

Intel Architecture Compatibility
The BSWAP instruction is not supported on Intel Architecture processors earlier than the Intel486 processor family. For compatibility with this instruction, include functionally equivalent code for execution on Intel processors earlier than the Intel486 processor family.

Operation
TEMP ← DEST
DEST(7..0) ← TEMP(31..24)
DEST(15..8) ← TEMP(23..16)
DEST(23..16) ← TEMP(15..8)
DEST(31..24) ← TEMP(7..0)

Flags Affected
None.

Exceptions (All Operating Modes)
None.
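The four byte moves in the Operation section can be expressed with shifts and masks. The sketch below is a portable C model (bswap32_model is a hypothetical name); it also indicates the kind of functionally equivalent code that could stand in for the instruction where BSWAP is unavailable.

```c
#include <stdint.h>

/* Model of BSWAP r32: reverses the order of the four bytes. */
uint32_t bswap32_model(uint32_t x)
{
    return (x >> 24)                    /* bits 31..24 -> bits 7..0   */
         | ((x >> 8) & 0x0000FF00u)     /* bits 23..16 -> bits 15..8  */
         | ((x << 8) & 0x00FF0000u)     /* bits 15..8  -> bits 23..16 */
         | (x << 24);                   /* bits 7..0   -> bits 31..24 */
}
```

Applying the function to 12345678H gives 78563412H, and applying it twice returns the original value.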
BT - Bit Test

Opcode         Instruction       Description
0F A3          BT r/m16,r16      Store selected bit in CF flag
0F A3          BT r/m32,r32      Store selected bit in CF flag
0F BA /4 ib    BT r/m16,imm8     Store selected bit in CF flag
0F BA /4 ib    BT r/m32,imm8     Store selected bit in CF flag
Description
This instruction selects the bit in a bit string (specified with the first operand, called the bit base) at the bit position designated by the bit offset operand (second operand) and stores the value of the bit in the CF flag. The bit base operand can be a register or a memory location; the bit offset operand can be a register or an immediate value. If the bit base operand specifies a register, the instruction takes the modulo 16 or 32 (depending on the register size) of the bit offset operand, allowing any bit position to be selected in a 16- or 32-bit register, respectively (refer to Figure 3-1). If the bit base operand specifies a memory location, it represents the address of the byte in memory that contains the bit base (bit 0 of the specified byte) of the bit string (refer to Figure 3-2). The offset operand then selects a bit position within the range -2^31 to 2^31 - 1 for a register offset and 0 to 31 for an immediate offset.

Some assemblers support immediate bit offsets larger than 31 by using the immediate bit offset field in combination with the displacement field of the memory operand. In this case, the low-order three or five bits (three for 16-bit operands, five for 32-bit operands) of the immediate bit offset are stored in the immediate bit offset field, and the high-order bits are shifted and combined with the byte displacement in the addressing mode by the assembler. The processor will ignore the high-order bits if they are not zero.

When accessing a bit in memory, the processor may access four bytes starting from the memory address for a 32-bit operand size, using the following relationship:

Effective Address + (4 * (BitOffset DIV 32))

Or, it may access two bytes starting from the memory address for a 16-bit operand, using this relationship:

Effective Address + (2 * (BitOffset DIV 16))

It may do so even when only a single byte needs to be accessed to reach the given bit. When using this bit addressing mechanism, software should avoid referencing areas of memory close to address space holes. In particular, it should avoid references to memory-mapped I/O registers. Instead, software should use the MOV instructions to load from or store to these addresses, and use the register form of these instructions to manipulate the data.

Operation
CF ← Bit(BitBase, BitOffset)
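The difference between the register and memory forms of BT can be sketched in portable C. Both function names below are hypothetical. The memory model treats the bit string byte by byte and rounds negative offsets toward lower addresses; it shows which bit is selected, not how many bytes a real processor reads to reach it.

```c
#include <stdint.h>

/* Register form: the bit offset is taken modulo 32. Returns the CF value. */
int bt_reg32_model(uint32_t bit_base, uint32_t bit_offset)
{
    return (int)((bit_base >> (bit_offset % 32u)) & 1u);
}

/* Memory form: bit_base points at the byte holding bit 0 of the string.
 * A signed offset may select a bit in a byte below bit_base. The caller
 * must ensure the selected byte lies inside a valid object. */
int bt_mem_model(const uint8_t *bit_base, int32_t bit_offset)
{
    int32_t byte_index = bit_offset / 8;   /* truncates toward zero in C */
    int32_t bit_index = bit_offset % 8;    /* may be negative in C */

    if (bit_index < 0) {                   /* convert to floor semantics */
        bit_index += 8;
        byte_index -= 1;
    }
    return (bit_base[byte_index] >> bit_index) & 1;
}
```

In the memory model, offset -1 selects bit 7 of the byte immediately below the bit base, whereas in the register model an offset of 35 selects bit 3.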

Flags Affected
The CF flag contains the value of the selected bit. The OF, SF, ZF, AF, and PF flags are undefined.

Protected Mode Exceptions
#GP(0)             If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                   If the DS, ES, FS, or GS register contains a null segment selector.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
BTC - Bit Test and Complement

Opcode         Instruction        Description
0F BB          BTC r/m16,r16      Store selected bit in CF flag and complement
0F BB          BTC r/m32,r32      Store selected bit in CF flag and complement
0F BA /7 ib    BTC r/m16,imm8     Store selected bit in CF flag and complement
0F BA /7 ib    BTC r/m32,imm8     Store selected bit in CF flag and complement
Description
This instruction selects the bit in a bit string (specified with the first operand, called the bit base) at the bit position designated by the bit offset operand (second operand), stores the value of the bit in the CF flag, and complements the selected bit in the bit string. The bit base operand can be a register or a memory location; the bit offset operand can be a register or an immediate value. If the bit base operand specifies a register, the instruction takes the modulo 16 or 32 (depending on the register size) of the bit offset operand, allowing any bit position to be selected in a 16- or 32-bit register, respectively (refer to Figure 3-1). If the bit base operand specifies a memory location, it represents the address of the byte in memory that contains the bit base (bit 0 of the specified byte) of the bit string (refer to Figure 3-2). The offset operand then selects a bit position within the range -2^31 to 2^31 - 1 for a register offset and 0 to 31 for an immediate offset.

Some assemblers support immediate bit offsets larger than 31 by using the immediate bit offset field in combination with the displacement field of the memory operand. Refer to "BT - Bit Test" in this chapter for more information on this addressing mechanism.

Operation
CF ← Bit(BitBase, BitOffset)
Bit(BitBase, BitOffset) ← NOT Bit(BitBase, BitOffset);

Flags Affected
The CF flag contains the value of the selected bit before it is complemented. The OF, SF, ZF, AF, and PF flags are undefined.
BTR - Bit Test and Reset

Opcode         Instruction        Description
0F B3          BTR r/m16,r16      Store selected bit in CF flag and clear
0F B3          BTR r/m32,r32      Store selected bit in CF flag and clear
0F BA /6 ib    BTR r/m16,imm8     Store selected bit in CF flag and clear
0F BA /6 ib    BTR r/m32,imm8     Store selected bit in CF flag and clear
Description
This instruction selects the bit in a bit string (specified with the first operand, called the bit base) at the bit position designated by the bit offset operand (second operand), stores the value of the bit in the CF flag, and clears the selected bit in the bit string to 0. The bit base operand can be a register or a memory location; the bit offset operand can be a register or an immediate value. If the bit base operand specifies a register, the instruction takes the modulo 16 or 32 (depending on the register size) of the bit offset operand, allowing any bit position to be selected in a 16- or 32-bit register, respectively (refer to Figure 3-1). If the bit base operand specifies a memory location, it represents the address of the byte in memory that contains the bit base (bit 0 of the specified byte) of the bit string (refer to Figure 3-2). The offset operand then selects a bit position within the range -2^31 to 2^31 - 1 for a register offset and 0 to 31 for an immediate offset.

Some assemblers support immediate bit offsets larger than 31 by using the immediate bit offset field in combination with the displacement field of the memory operand. Refer to "BT - Bit Test" in this chapter for more information on this addressing mechanism.

Operation
CF ← Bit(BitBase, BitOffset)
Bit(BitBase, BitOffset) ← 0;

Flags Affected
The CF flag contains the value of the selected bit before it is cleared. The OF, SF, ZF, AF, and PF flags are undefined.
BTS - Bit Test and Set

Opcode         Instruction        Description
0F AB          BTS r/m16,r16      Store selected bit in CF flag and set
0F AB          BTS r/m32,r32      Store selected bit in CF flag and set
0F BA /5 ib    BTS r/m16,imm8     Store selected bit in CF flag and set
0F BA /5 ib    BTS r/m32,imm8     Store selected bit in CF flag and set
Description
This instruction selects the bit in a bit string (specified with the first operand, called the bit base) at the bit position designated by the bit offset operand (second operand), stores the value of the bit in the CF flag, and sets the selected bit in the bit string to 1. The bit base operand can be a register or a memory location; the bit offset operand can be a register or an immediate value. If the bit base operand specifies a register, the instruction takes the modulo 16 or 32 (depending on the register size) of the bit offset operand, allowing any bit position to be selected in a 16- or 32-bit register, respectively (refer to Figure 3-1). If the bit base operand specifies a memory location, it represents the address of the byte in memory that contains the bit base (bit 0 of the specified byte) of the bit string (refer to Figure 3-2). The offset operand then selects a bit position within the range -2^31 to 2^31 - 1 for a register offset and 0 to 31 for an immediate offset.

Some assemblers support immediate bit offsets larger than 31 by using the immediate bit offset field in combination with the displacement field of the memory operand. Refer to "BT - Bit Test" in this chapter for more information on this addressing mechanism.

Operation
CF ← Bit(BitBase, BitOffset)
Bit(BitBase, BitOffset) ← 1;

Flags Affected
The CF flag contains the value of the selected bit before it is set. The OF, SF, ZF, AF, and PF flags are undefined.
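BTS, BTR, and BTC share the same test step and differ only in how they rewrite the selected bit. The sketch below models the 32-bit register forms in portable C; the three function names are hypothetical, and each returns the CF value, that is, the state of the bit before it was modified.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register forms take the bit offset modulo 32. */
static uint32_t bit_mask32(uint32_t bit_offset)
{
    return UINT32_C(1) << (bit_offset % 32u);
}

bool bts32_model(uint32_t *reg, uint32_t bit_offset)   /* test and set */
{
    uint32_t mask = bit_mask32(bit_offset);
    bool cf = (*reg & mask) != 0;
    *reg |= mask;
    return cf;
}

bool btr32_model(uint32_t *reg, uint32_t bit_offset)   /* test and reset */
{
    uint32_t mask = bit_mask32(bit_offset);
    bool cf = (*reg & mask) != 0;
    *reg &= ~mask;
    return cf;
}

bool btc32_model(uint32_t *reg, uint32_t bit_offset)   /* test and complement */
{
    uint32_t mask = bit_mask32(bit_offset);
    bool cf = (*reg & mask) != 0;
    *reg ^= mask;
    return cf;
}
```

Starting from a register value of 0, bts32_model with offset 35 reports CF = 0 and leaves 00000008H, because 35 modulo 32 selects bit 3.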

Virtual-8086 Mode Exceptions
#GP                If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS                If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made.
CALL - Call Procedure

Opcode    Instruction       Description
E8 cw     CALL rel16        Call near, relative, displacement relative to next instruction
E8 cd     CALL rel32        Call near, relative, displacement relative to next instruction
FF /2     CALL r/m16        Call near, absolute indirect, address given in r/m16
FF /2     CALL r/m32        Call near, absolute indirect, address given in r/m32
9A cd     CALL ptr16:16     Call far, absolute, address given in operand
9A cp     CALL ptr16:32     Call far, absolute, address given in operand
FF /3     CALL m16:16       Call far, absolute indirect, address given in m16:16
FF /3     CALL m16:32       Call far, absolute indirect, address given in m16:32
Description
This instruction saves procedure linking information on the stack and branches to the procedure (called procedure) specified with the destination (target) operand. The target operand specifies the address of the first instruction in the called procedure. This operand can be an immediate value, a general-purpose register, or a memory location. This instruction can be used to execute four different types of calls:

• Near call - A call to a procedure within the current code segment (the segment currently pointed to by the CS register), sometimes referred to as an intrasegment call.
• Far call - A call to a procedure located in a different segment than the current code segment, sometimes referred to as an intersegment call.
• Inter-privilege-level far call - A far call to a procedure in a segment at a different privilege level than that of the currently executing program or procedure.
• Task switch - A call to a procedure located in a different task.

The latter two call types (inter-privilege-level call and task switch) can only be executed in protected mode. Refer to Section 4.3., "Calling Procedures Using CALL and RET" in Chapter 4, "Procedure Calls, Interrupts, and Exceptions" of the Intel Architecture Software Developer's Manual, Volume 1, for additional information on near, far, and inter-privilege-level calls. Refer to Chapter 6, "Task Management," of the Intel Architecture Software Developer's Manual, Volume 3, for information on performing task switches with the CALL instruction.

Near Call. When executing a near call, the processor pushes the value of the EIP register (which contains the offset of the instruction following the CALL instruction) onto the stack (for use later as a return-instruction pointer). The processor then branches to the address in the current code segment specified with the target operand. The target operand specifies either an absolute offset in the code segment (that is, an offset from the base of the code segment) or a relative offset (a signed displacement relative to the current value of the instruction pointer in the EIP register, which points to the instruction following the CALL instruction). The CS register is not changed on near calls.
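The near-call sequence described above (push the return-instruction pointer, then branch) can be traced with a toy model. This is only a sketch in portable C under simplifying assumptions: the struct cpu_model and both functions are hypothetical, the stack is an array of doublewords indexed by a stand-in for ESP, only the 32-bit operand size is modeled, and no limit or protection checks are performed.

```c
#include <stdint.h>

/* Hypothetical, highly simplified processor state. */
typedef struct {
    uint32_t eip;         /* offset of the instruction following the CALL */
    uint32_t esp;         /* index into stack[]; decreases on each push */
    uint32_t stack[16];
} cpu_model;

/* Push the return-instruction pointer (caller must leave room: esp > 0). */
static void push_return_eip(cpu_model *cpu)
{
    cpu->esp -= 1;
    cpu->stack[cpu->esp] = cpu->eip;
}

/* CALL rel32: the signed displacement is added to the EIP of the next
 * instruction; unsigned arithmetic wraps modulo 2^32. */
void call_near_rel32(cpu_model *cpu, int32_t rel32)
{
    push_return_eip(cpu);
    cpu->eip += (uint32_t)rel32;
}

/* CALL r/m32: the absolute offset is loaded directly into EIP. */
void call_near_abs32(cpu_model *cpu, uint32_t target)
{
    push_return_eip(cpu);
    cpu->eip = target;
}
```

With EIP = 1005H after fetching the CALL and a displacement of -5, the model pushes 1005H and continues at 1000H. CS does not appear in the model because a near call leaves it unchanged.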

For a near call, an absolute offset is specified indirectly in a general-purpose register or a memory location (r/m16 or r/m32). The operand-size attribute determines the size of the target operand (16 or 32 bits). Absolute offsets are loaded directly into the EIP register. If the operand-size attribute is 16, the upper two bytes of the EIP register are cleared to 0s, resulting in a maximum instruction pointer size of 16 bits. (When accessing an absolute offset indirectly using the stack pointer [ESP] as a base register, the base value used is the value of the ESP before the instruction executes.)
A relative offset (rel16 or rel32) is generally specified as a label in assembly code, but at the machine code level, it is encoded as a signed, 16- or 32-bit immediate value. This value is added to the value in the EIP register. As with absolute offsets, the operand-size attribute determines the size of the target operand (16 or 32 bits).
Far Calls in Real-Address or Virtual-8086 Mode. When executing a far call in real-address or virtual-8086 mode, the processor pushes the current value of both the CS and EIP registers onto the stack for use as a return-instruction pointer. The processor then performs a far branch to the code segment and offset specified with the target operand for the called procedure. Here the target operand specifies an absolute far address either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32). With the pointer method, the segment and offset of the called procedure are encoded in the instruction, using a 4-byte (16-bit operand size) or 6-byte (32-bit operand size) far address immediate. With the indirect method, the target operand specifies a memory location that contains a 4-byte (16-bit operand size) or 6-byte (32-bit operand size) far address. The operand-size attribute determines the size of the offset (16 or 32 bits) in the far address. The far address is loaded directly into the CS and EIP registers. If the operand-size attribute is 16, the upper two bytes of the EIP register are cleared to 0s.
Far Calls in Protected Mode. When the processor is operating in protected mode, the CALL instruction can be used to perform the following three types of far calls:
• Far call to the same privilege level.
• Far call to a different privilege level (inter-privilege-level call).
• Task switch (far call to another task).
In protected mode, the processor always uses the segment selector part of the far address to access the corresponding descriptor in the GDT or LDT. The descriptor type (code segment, call gate, task gate, or TSS) and access rights determine the type of call operation to be performed.
If the selected descriptor is for a code segment, a far call to a code segment at the same privilege level is performed. (If the selected code segment is at a different privilege level and the code segment is non-conforming, a general-protection exception is generated.) A far call to the same privilege level in protected mode is very similar to one carried out in real-address or virtual-8086 mode. The target operand specifies an absolute far address either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32). The operand-size attribute determines the size of the offset (16 or 32 bits) in the far address. The new code segment selector and its descriptor are loaded into the CS register, and the offset from the instruction is loaded into the EIP register.
Note that a call gate (described in the next paragraph) can also be used to perform a far call to a code segment at the same privilege level. Using this mechanism provides an extra level of indirection and is the preferred method of making calls between 16-bit and 32-bit code segments.
When executing an inter-privilege-level far call, the code segment for the procedure being called must be accessed through a call gate. The segment selector specified by the target operand identifies the call gate. Here again, the target operand can specify the call gate segment selector either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32). The processor obtains the segment selector for the new code segment and the new instruction pointer (offset) from the call gate descriptor. (The offset from the target operand is ignored when a call gate is used.)
On inter-privilege-level calls, the processor switches to the stack for the privilege level of the called procedure. The segment selector for the new stack segment is specified in the TSS for the currently running task. The branch to the new code segment occurs after the stack switch. (Note that when using a call gate to perform a far call to a segment at the same privilege level, no stack switch occurs.) On the new stack, the processor pushes the segment selector and stack pointer for the calling procedure's stack, an (optional) set of parameters from the calling procedure's stack, and the segment selector and instruction pointer for the calling procedure's code segment. (A value in the call gate descriptor determines how many parameters to copy to the new stack.) Finally, the processor branches to the address of the procedure being called within the new code segment.
Executing a task switch with the CALL instruction is somewhat similar to executing a call through a call gate. Here the target operand specifies the segment selector of the task gate for the task being switched to (and the offset in the target operand is ignored). The task gate in turn points to the TSS for the task, which contains the segment selectors for the task's code and stack segments. The TSS also contains the EIP value for the next instruction that was to be executed before the task was suspended. This instruction pointer value is loaded into the EIP register so that the task begins executing again at this next instruction.
The CALL instruction can also specify the segment selector of the TSS directly, which eliminates the indirection of the task gate. Refer to Chapter 6, "Task Management", of the Intel Architecture Software Developer's Manual, Volume 3, for detailed information on the mechanics of a task switch.
Note that when you execute a task switch with a CALL instruction, the nested task flag (NT) is set in the EFLAGS register and the new TSS's previous task link field is loaded with the old task's TSS selector. Code is expected to suspend this nested task by executing an IRET instruction, which, because the NT flag is set, will automatically use the previous task link to return to the calling task. Refer to Section 6.4., "Task Linking", in Chapter 6, "Task Management", of the Intel Architecture Software Developer's Manual, Volume 3, for more information on nested tasks. Switching tasks with the CALL instruction differs in this regard from the JMP instruction, which does not set the NT flag and therefore does not expect an IRET instruction to suspend the task.
Mixing 16-Bit and 32-Bit Calls. When making far calls between 16-bit and 32-bit code segments, the calls should be made through a call gate. If the far call is from a 32-bit code segment to a 16-bit code segment, the call should be made from the first 64 KBytes of the 32-bit code segment. This is because the operand-size attribute of the instruction is set to 16, so only a 16-bit return address offset is saved. Also, the call should be made using a 16-bit call gate so that 16-bit values will be pushed on the stack. Refer to Chapter 17, "Mixing 16-Bit and 32-Bit Code", of the Intel Architecture Software Developer's Manual, Volume 3, for more information on making calls between 16-bit and 32-bit code segments.
Operation
IF near call
    THEN IF near relative call
        IF the instruction pointer is not within code segment limit THEN #GP(0); FI;
        THEN IF OperandSize = 32
            THEN
                IF stack not large enough for a 4-byte return address THEN #SS(0); FI;
                Push(EIP);
                EIP ← EIP + DEST; (* DEST is rel32 *)
            ELSE (* OperandSize = 16 *)
                IF stack not large enough for a 2-byte return address THEN #SS(0); FI;
                Push(IP);
                EIP ← (EIP + DEST) AND 0000FFFFH; (* DEST is rel16 *)
            FI;
        FI;
        ELSE (* near absolute call *)
            IF the instruction pointer is not within code segment limit THEN #GP(0); FI;
            IF OperandSize = 32
                THEN
                    IF stack not large enough for a 4-byte return address THEN #SS(0); FI;
                    Push(EIP);
                    EIP ← DEST; (* DEST is r/m32 *)
                ELSE (* OperandSize = 16 *)
                    IF stack not large enough for a 2-byte return address THEN #SS(0); FI;
                    Push(IP);
                    EIP ← DEST AND 0000FFFFH; (* DEST is r/m16 *)
            FI;
    FI;
FI;
IF far call AND (PE = 0 OR (PE = 1 AND VM = 1)) (* real-address or virtual-8086 mode *)
    THEN
        IF OperandSize = 32
            THEN
                IF stack not large enough for a 6-byte return address THEN #SS(0); FI;
                IF the instruction pointer is not within code segment limit THEN #GP(0); FI;
                Push(CS); (* padded with 16 high-order bits *)
                Push(EIP);
                CS ← DEST[47:32]; (* DEST is ptr16:32 or [m16:32] *)
                EIP ← DEST[31:0]; (* DEST is ptr16:32 or [m16:32] *)
            ELSE (* OperandSize = 16 *)
                IF stack not large enough for a 4-byte return address THEN #SS(0); FI;
                IF the instruction pointer is not within code segment limit THEN #GP(0); FI;
                Push(CS);
                Push(IP);
                CS ← DEST[31:16]; (* DEST is ptr16:16 or [m16:16] *)
                EIP ← DEST[15:0]; (* DEST is ptr16:16 or [m16:16] *)
                EIP ← EIP AND 0000FFFFH; (* clear upper 16 bits *)
        FI;
FI;
IF far call AND (PE = 1 AND VM = 0) (* Protected mode, not virtual-8086 mode *)
    THEN
        IF segment selector in target operand null THEN #GP(0); FI;
        IF segment selector index not within descriptor table limits
            THEN #GP(new code segment selector);
        FI;
        Read type and access rights of selected segment descriptor;
        IF segment type is not a conforming or nonconforming code segment, call gate, task gate, or TSS
            THEN #GP(segment selector);
        FI;
        Depending on type and access rights
            GO TO CONFORMING-CODE-SEGMENT;
            GO TO NONCONFORMING-CODE-SEGMENT;
            GO TO CALL-GATE;
            GO TO TASK-GATE;
            GO TO TASK-STATE-SEGMENT;
FI;
CONFORMING-CODE-SEGMENT:
    IF DPL > CPL THEN #GP(new code segment selector); FI;
    IF segment not present THEN #NP(new code segment selector); FI;
    IF OperandSize = 32
        THEN
            IF stack not large enough for a 6-byte return address THEN #SS(0); FI;
            IF the instruction pointer is not within code segment limit THEN #GP(0); FI;
            Push(CS); (* padded with 16 high-order bits *)
            Push(EIP);
            CS ← DEST(NewCodeSegmentSelector); (* segment descriptor information also loaded *)
            CS(RPL) ← CPL
            EIP ← DEST(offset);
        ELSE (* OperandSize = 16 *)
            IF stack not large enough for a 4-byte return address THEN #SS(0); FI;
            IF the instruction pointer is not within code segment limit THEN #GP(0); FI;
            Push(CS);
            Push(IP);
            CS ← DEST(NewCodeSegmentSelector); (* segment descriptor information also loaded *)
            CS(RPL) ← CPL
            EIP ← DEST(offset) AND 0000FFFFH; (* clear upper 16 bits *)
    FI;
END;
NONCONFORMING-CODE-SEGMENT:
    IF (RPL > CPL) OR (DPL ≠ CPL) THEN #GP(new code segment selector); FI;
    IF segment not present THEN #NP(new code segment selector); FI;
    IF stack not large enough for return address THEN #SS(0); FI;
    tempEIP ← DEST(offset)
    IF OperandSize = 16
        THEN tempEIP ← tempEIP AND 0000FFFFH; (* clear upper 16 bits *)
    FI;
    IF tempEIP outside code segment limit THEN #GP(0); FI;
    IF OperandSize = 32
        THEN
            Push(CS); (* padded with 16 high-order bits *)
            Push(EIP);
            CS ← DEST(NewCodeSegmentSelector); (* segment descriptor information also loaded *)
            CS(RPL) ← CPL;
            EIP ← tempEIP;
        ELSE (* OperandSize = 16 *)
            Push(CS);
            Push(IP);
            CS ← DEST(NewCodeSegmentSelector); (* segment descriptor information also loaded *)
            CS(RPL) ← CPL;
            EIP ← tempEIP;
    FI;
END;
CALL-GATE:
    IF call gate DPL < CPL or RPL THEN #GP(call gate selector); FI;
    IF call gate not present THEN #NP(call gate selector); FI;
    IF call gate code-segment selector is null THEN #GP(0); FI;
    IF call gate code-segment selector index is outside descriptor table limits
        THEN #GP(code segment selector); FI;
    Read code segment descriptor;
    IF code-segment segment descriptor does not indicate a code segment
    OR code-segment segment descriptor DPL > CPL
        THEN #GP(code segment selector); FI;
    IF code segment not present THEN #NP(new code segment selector); FI;
    IF code segment is non-conforming AND DPL < CPL
        THEN go to MORE-PRIVILEGE;
        ELSE go to SAME-PRIVILEGE;
    FI;
END;
MORE-PRIVILEGE:
    IF current TSS is 32-bit TSS
        THEN
            TSSstackAddress ← new code segment (DPL * 8) + 4
            IF (TSSstackAddress + 7) > TSS limit
                THEN #TS(current TSS selector); FI;
            newSS ← TSSstackAddress + 4;
            newESP ← stack address;
        ELSE (* TSS is 16-bit *)
            TSSstackAddress ← new code segment (DPL * 4) + 2
            IF (TSSstackAddress + 4) > TSS limit
                THEN #TS(current TSS selector); FI;
            newESP ← TSSstackAddress;
            newSS ← TSSstackAddress + 2;
    FI;
    IF stack segment selector is null THEN #TS(stack segment selector); FI;
    IF stack segment selector index is not within its descriptor table limits
        THEN #TS(SS selector); FI
    Read code segment descriptor;
    IF stack segment selector's RPL ≠ DPL of code segment
    OR stack segment DPL ≠ DPL of code segment
    OR stack segment is not a writable data segment
        THEN #TS(SS selector); FI
    IF stack segment not present THEN #SS(SS selector); FI;
    IF CallGateSize = 32
        THEN
            IF stack does not have room for parameters plus 16 bytes
                THEN #SS(SS selector); FI;
            IF CallGate(InstructionPointer) not within code segment limit THEN #GP(0); FI;
            SS ← newSS; (* segment descriptor information also loaded *)
            ESP ← newESP;
            CS:EIP ← CallGate(CS:InstructionPointer); (* segment descriptor information also loaded *)
            Push(oldSS:oldESP); (* from calling procedure *)
            temp ← parameter count from call gate, masked to 5 bits;
            Push(parameters from calling procedure's stack, temp)
            Push(oldCS:oldEIP); (* return address to calling procedure *)
        ELSE (* CallGateSize = 16 *)
            IF stack does not have room for parameters plus 8 bytes
                THEN #SS(SS selector); FI;
            IF (CallGate(InstructionPointer) AND FFFFH) not within code segment limit
                THEN #GP(0); FI;
            SS ← newSS; (* segment descriptor information also loaded *)
            ESP ← newESP;
            CS:IP ← CallGate(CS:InstructionPointer); (* segment descriptor information also loaded *)
            Push(oldSS:oldESP); (* from calling procedure *)
            temp ← parameter count from call gate, masked to 5 bits;
            Push(parameters from calling procedure's stack, temp)
            Push(oldCS:oldEIP); (* return address to calling procedure *)
    FI;
    CPL ← CodeSegment(DPL)
    CS(RPL) ← CPL
END;
SAME-PRIVILEGE:
    IF CallGateSize = 32
        THEN
            IF stack does not have room for 8 bytes THEN #SS(0); FI;
            IF EIP not within code segment limit THEN #GP(0); FI;
            CS:EIP ← CallGate(CS:EIP) (* segment descriptor information also loaded *)
            Push(oldCS:oldEIP); (* return address to calling procedure *)
        ELSE (* CallGateSize = 16 *)
            IF stack does not have room for parameters plus 4 bytes THEN #SS(0); FI;
            IF IP not within code segment limit THEN #GP(0); FI;
            CS:IP ← CallGate(CS:instruction pointer) (* segment descriptor information also loaded *)
            Push(oldCS:oldIP); (* return address to calling procedure *)
    FI;
    CS(RPL) ← CPL
END;
TASK-GATE:
    IF task gate DPL < CPL or RPL THEN #GP(task gate selector); FI;
    IF task gate not present THEN #NP(task gate selector); FI;
    Read the TSS segment selector in the task-gate descriptor;
    IF TSS segment selector local/global bit is set to local
    OR index not within GDT limits
        THEN #GP(TSS selector); FI;
    Access TSS descriptor in GDT;
    IF TSS descriptor specifies that the TSS is busy (low-order 5 bits set to 00001)
        THEN #GP(TSS selector); FI;
    IF TSS not present THEN #NP(TSS selector); FI;
    SWITCH-TASKS (with nesting) to TSS;
    IF EIP not within code segment limit THEN #GP(0); FI;
END;
TASK-STATE-SEGMENT:
    IF TSS DPL < CPL or RPL
    OR TSS descriptor indicates TSS not available
        THEN #GP(TSS selector); FI;
    IF TSS is not present THEN #NP(TSS selector); FI;
    SWITCH-TASKS (with nesting) to TSS
    IF EIP not within code segment limit THEN #GP(0); FI;
END;
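The near-call branches of the Operation pseudocode above can be sketched in C as follows. This is only an illustrative model under stated assumptions: the cpu_t structure, the toy 64-byte stack, and the function names are hypothetical, and the segment-limit and stack-limit checks (#GP, #SS) are left out.

```c
#include <stdint.h>

/* Hypothetical model of the state a near CALL touches; not a real API. */
typedef struct {
    uint32_t eip;        /* offset of the instruction following the CALL */
    uint32_t esp;        /* stack pointer, used here as an index into stack[] */
    uint8_t  stack[64];  /* toy stack memory */
} cpu_t;

/* Push `size` bytes (2 or 4) of `value`, little-endian, stack growing down. */
static void push(cpu_t *cpu, uint32_t value, unsigned size)
{
    cpu->esp -= size;
    for (unsigned i = 0; i < size; i++)
        cpu->stack[cpu->esp + i] = (uint8_t)(value >> (8 * i));
}

/* Near relative call: push the return address, then add the signed
   displacement. With a 16-bit operand size only IP is pushed and the
   upper 16 bits of EIP are cleared. */
void near_call_rel(cpu_t *cpu, int32_t rel, unsigned operand_size)
{
    if (operand_size == 32) {
        push(cpu, cpu->eip, 4);
        cpu->eip = cpu->eip + (uint32_t)rel;
    } else {
        push(cpu, cpu->eip & 0xFFFFu, 2);
        cpu->eip = (cpu->eip + (uint32_t)rel) & 0x0000FFFFu;
    }
}

/* Near absolute indirect call: the value read from r/m16 or r/m32
   replaces EIP instead of being added to it. */
void near_call_abs(cpu_t *cpu, uint32_t target, unsigned operand_size)
{
    if (operand_size == 32) {
        push(cpu, cpu->eip, 4);
        cpu->eip = target;
    } else {
        push(cpu, cpu->eip & 0xFFFFu, 2);
        cpu->eip = target & 0x0000FFFFu;
    }
}
```

In both functions the CS register is never touched, which is what distinguishes a near call from the far-call cases.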
Flags Affected
All flags are affected if a task switch occurs; no flags are affected if a task switch does not occur.
Protected Mode Exceptions
#GP(0)
    If target offset in destination operand is beyond the new code segment limit.
    If the segment selector in the destination operand is null.
    If the code segment selector in the gate is null.
    If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
    If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
#GP(selector)
    If code segment or gate or TSS selector index is outside descriptor table limits.
    If the segment descriptor pointed to by the segment selector in the destination operand is not for a conforming-code segment, nonconforming-code segment, call gate, task gate, or task state segment.
    If the DPL for a nonconforming-code segment is not equal to the CPL or the RPL for the segment's segment selector is greater than the CPL.
    If the DPL for a conforming-code segment is greater than the CPL.
    If the DPL from a call-gate, task-gate, or TSS segment descriptor is less than the CPL or than the RPL of the call-gate, task-gate, or TSS's segment selector.
    If the segment descriptor for a segment selector from a call gate does not indicate it is a code segment.
    If the segment selector from a call gate is beyond the descriptor table limits.
    If the DPL for a code-segment obtained from a call gate is greater than the CPL.
    If the segment selector for a TSS has its local/global bit set for local.
    If a TSS segment descriptor specifies that the TSS is busy or not available.
#SS(0)
    If pushing the return address, parameters, or stack segment pointer onto the stack exceeds the bounds of the stack segment, when no stack switch occurs.
    If a memory operand effective address is outside the SS segment limit.
#SS(selector)
    If pushing the return address, parameters, or stack segment pointer onto the stack exceeds the bounds of the stack segment, when a stack switch occurs.
    If the SS register is being loaded as part of a stack switch and the segment pointed to is marked not present.
    If stack segment does not have room for the return address, parameters, or stack segment pointer, when stack switch occurs.
#NP(selector)
    If a code segment, data segment, stack segment, call gate, task gate, or TSS is not present.
#TS(selector)
    If the new stack segment selector and ESP are beyond the end of the TSS.
    If the new stack segment selector is null.
    If the RPL of the new stack segment selector in the TSS is not equal to the DPL of the code segment being accessed.
    If DPL of the stack segment descriptor for the new stack segment is not equal to the DPL of the code segment descriptor.
    If the new stack segment is not a writable data segment.
    If segment-selector index for stack segment is outside descriptor table limits.
#PF(fault-code)
    If a page fault occurs.
#AC(0)
    If an unaligned memory access occurs when the CPL is 3 and alignment checking is enabled.
Real-Address Mode Exceptions
#GP
    If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
    If the target offset is beyond the code segment limit.
Virtual-8086 Mode Exceptions
#GP(0)
    If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
    If the target offset is beyond the code segment limit.
#PF(fault-code)
    If a page fault occurs.
#AC(0)
    If an unaligned memory access occurs when alignment checking is enabled.
CBW/CWDE - Convert Byte to Word/Convert Word to Doubleword

Opcode  Instruction  Description
98      CBW          AX ← sign-extend of AL
98      CWDE         EAX ← sign-extend of AX
Description
These instructions double the size of the source operand by means of sign extension (refer to Figure 6-5 in Chapter 6, "Instruction Set Summary", of the Intel Architecture Software Developer's Manual, Volume 1). The CBW (convert byte to word) instruction copies the sign (bit 7) of the byte in the AL register into every bit in the AH register. The CWDE (convert word to doubleword) instruction copies the sign (bit 15) of the word in the AX register into the upper 16 bits of the EAX register.
The CBW and CWDE mnemonics reference the same opcode. The CBW instruction is intended for use when the operand-size attribute is 16 and the CWDE instruction for when the operand-size attribute is 32. Some assemblers may force the operand size to 16 when CBW is used and to 32 when CWDE is used. Others may treat these mnemonics as synonyms (CBW/CWDE) and use the current setting of the operand-size attribute to determine the size of values to be converted, regardless of the mnemonic used.
The CWDE instruction is different from the CWD (convert word to double) instruction. The CWD instruction uses the DX:AX register pair as a destination operand, whereas the CWDE instruction uses the EAX register as a destination.
Operation
IF OperandSize = 16 (* instruction = CBW *)
    THEN AX ← SignExtend(AL);
    ELSE (* OperandSize = 32, instruction = CWDE *)
        EAX ← SignExtend(AX);
FI;
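As a minimal sketch of the sign extension described above, the two conversions can be modeled in C. The function names cbw and cwde are chosen here for illustration only; they take and return plain integers rather than touching real registers.

```c
#include <stdint.h>

/* CBW: copy bit 7 of AL into every bit of AH, giving a 16-bit AX value. */
uint16_t cbw(uint8_t al)
{
    return (al & 0x80u) ? (uint16_t)(0xFF00u | al) : (uint16_t)al;
}

/* CWDE: copy bit 15 of AX into the upper 16 bits of EAX. */
uint32_t cwde(uint16_t ax)
{
    return (ax & 0x8000u) ? (0xFFFF0000u | ax) : (uint32_t)ax;
}
```

For example, cbw(0x80) yields 0xFF80 (the byte -128 becomes the word -128), while cbw(0x7F) yields 0x007F.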
CDQ - Convert Double to Quad

Refer to entry for CWD/CDQ - Convert Word to Doubleword/Convert Doubleword to Quadword.
CLC - Clear Carry Flag

Opcode  Instruction  Description
F8      CLC          Clear CF flag
Description
This instruction clears the CF flag in the EFLAGS register.
Operation
CF ← 0;
Flags Affected
The CF flag is cleared to 0. The OF, ZF, SF, AF, and PF flags are unaffected.
Exceptions (All Operating Modes)
None.
CLD - Clear Direction Flag

Opcode  Instruction  Description
FC      CLD          Clear DF flag
Description
This instruction clears the DF flag in the EFLAGS register. When the DF flag is set to 0, string operations increment the index registers (ESI and/or EDI).
Operation
DF ← 0;
Flags Affected
The DF flag is cleared to 0. The CF, OF, ZF, SF, AF, and PF flags are unaffected.
Exceptions (All Operating Modes)
None.
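The effect of DF on string operations can be illustrated with a small C sketch. It assumes the standard EFLAGS layout in which DF is bit 10; the helper names cld and next_index are hypothetical, and next_index models only a single byte-sized string step.

```c
#include <stdint.h>

#define EFLAGS_DF (1u << 10)   /* direction flag is bit 10 of EFLAGS */

/* CLD: clear DF and leave every other flag unchanged. */
uint32_t cld(uint32_t eflags)
{
    return eflags & ~EFLAGS_DF;
}

/* One byte-sized string step: the index register (ESI or EDI) is
   incremented when DF = 0 and decremented when DF = 1. */
uint32_t next_index(uint32_t index, uint32_t eflags)
{
    return (eflags & EFLAGS_DF) ? index - 1 : index + 1;
}
```

After cld, next_index always moves forward through memory, which is why code commonly executes CLD before a forward string copy.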
CLI - Clear Interrupt Flag

Opcode  Instruction  Description
FA      CLI          Clear interrupt flag; interrupts disabled when interrupt flag cleared
Description
This instruction clears the IF flag in the EFLAGS register. No other flags are affected. Clearing the IF flag causes the processor to ignore maskable external interrupts. The IF flag and the CLI and STI instructions have no effect on the generation of exceptions and NMI interrupts.
The following decision table indicates the action of the CLI instruction (bottom of the table) depending on the processor's mode of operation and the CPL and IOPL of the currently running program or procedure (top of the table).
PE =      0    1         1    1         1
VM =      X    0         X    0         1
CPL       X    ≤ IOPL    X    > IOPL    X
IOPL      X    X         =3   X         <3
IF ← 0    Y    Y         Y    N         N
#GP(0)    N    N         N    Y         Y
NOTES:
X  Don't care
N  Action in column 1 not taken
Y  Action in column 1 taken
Operation
IF PE = 0 (* Executing in real-address mode *)
    THEN IF ← 0;
    ELSE
        IF VM = 0 (* Executing in protected mode *)
            THEN
                IF CPL ≤ IOPL
                    THEN IF ← 0;
                    ELSE #GP(0);
                FI;
            ELSE (* Executing in Virtual-8086 mode *)
                IF IOPL = 3
                    THEN IF ← 0
                    ELSE #GP(0);
                FI;
        FI;
FI;
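The mode and privilege checks in the Operation above can be expressed as a short C sketch. The cli_state_t structure and the cli function are hypothetical names used only for illustration; the sketch assumes the standard EFLAGS layout in which IF is bit 9, and it reports a would-be #GP(0) through its return value instead of raising an exception.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_IF (1u << 9)   /* interrupt flag is bit 9 of EFLAGS */

/* Hypothetical snapshot of the state the CLI decision depends on. */
typedef struct {
    bool     pe;      /* protection enable (CR0.PE) */
    bool     vm;      /* virtual-8086 mode (EFLAGS.VM) */
    unsigned cpl;     /* current privilege level, 0..3 */
    unsigned iopl;    /* I/O privilege level, 0..3 */
    uint32_t eflags;
} cli_state_t;

/* Returns true if IF was cleared, false if #GP(0) would be generated. */
bool cli(cli_state_t *s)
{
    bool allowed;

    if (!s->pe)
        allowed = true;                 /* real-address mode */
    else if (!s->vm)
        allowed = s->cpl <= s->iopl;    /* protected mode */
    else
        allowed = s->iopl == 3;         /* virtual-8086 mode */

    if (allowed)
        s->eflags &= ~EFLAGS_IF;        /* only IF changes */
    return allowed;
}
```

A protected-mode program at CPL 3 with IOPL 0, for instance, gets false back and its IF flag is left set.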
Flags Affected
The IF flag is cleared to 0 if the CPL is equal to or less than the IOPL; otherwise, it is not affected. The other flags in the EFLAGS register are unaffected.
Protected Mode Exceptions
#GP(0)
    If the CPL is greater (has less privilege) than the IOPL of the current program or procedure.
Real-Address Mode Exceptions
None.
Virtual-8086 Mode Exceptions
#GP(0)
    If the CPL is greater (has less privilege) than the IOPL of the current program or procedure.
CLTS - Clear Task-Switched Flag in CR0

Opcode  Instruction  Description
0F 06   CLTS         Clears TS flag in CR0
Description
This instruction clears the task-switched (TS) flag in the CR0 register. This instruction is intended for use in operating-system procedures. It is a privileged instruction that can only be executed at a CPL of 0. It is allowed to be executed in real-address mode to allow initialization for protected mode.
The processor sets the TS flag every time a task switch occurs. The flag is used to synchronize the saving of FPU context in multitasking applications. Refer to the description of the TS flag in Section 2.5., "Control Registers", in Chapter 2, "System Architecture Overview", of the Intel Architecture Software Developer's Manual, Volume 3, for more information about this flag.
Operation
CR0(TS) ← 0;
Flags Affected
The TS flag in the CR0 register is cleared.
Protected Mode Exceptions
#GP(0)
    If the CPL is greater than 0.
Real-Address Mode Exceptions
None.
Virtual-8086 Mode Exceptions
#GP(0)
    If the CPL is greater than 0.
CMC - Complement Carry Flag

Opcode  Instruction  Description
F5      CMC          Complement CF flag
Description
This instruction complements the CF flag in the EFLAGS register.
Operation
CF ← NOT CF;
Flags Affected
The CF flag contains the complement of its original value. The OF, ZF, SF, AF, and PF flags are unaffected.
Exceptions (All Operating Modes)
None.
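The two carry-flag instructions, CLC and CMC, each amount to a single bit operation on EFLAGS. The following C sketch assumes the standard layout in which CF is bit 0; the function names are illustrative only.

```c
#include <stdint.h>

#define EFLAGS_CF (1u << 0)   /* carry flag is bit 0 of EFLAGS */

/* CLC: CF <- 0, all other flags unchanged. */
uint32_t clc(uint32_t eflags)
{
    return eflags & ~EFLAGS_CF;
}

/* CMC: CF <- NOT CF, all other flags unchanged. */
uint32_t cmc(uint32_t eflags)
{
    return eflags ^ EFLAGS_CF;
}
```

Applying cmc twice returns the original value, while clc is idempotent.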
CMOVcc - Conditional Move

Opcode     Instruction           Description
0F 47 /r   CMOVA r16, r/m16      Move if above (CF=0 and ZF=0)
0F 47 /r   CMOVA r32, r/m32      Move if above (CF=0 and ZF=0)
0F 43 /r   CMOVAE r16, r/m16     Move if above or equal (CF=0)
0F 43 /r   CMOVAE r32, r/m32     Move if above or equal (CF=0)
0F 42 /r   CMOVB r16, r/m16      Move if below (CF=1)
0F 42 /r   CMOVB r32, r/m32      Move if below (CF=1)
0F 46 /r   CMOVBE r16, r/m16     Move if below or equal (CF=1 or ZF=1)
0F 46 /r   CMOVBE r32, r/m32     Move if below or equal (CF=1 or ZF=1)
0F 42 /r   CMOVC r16, r/m16      Move if carry (CF=1)
0F 42 /r   CMOVC r32, r/m32      Move if carry (CF=1)
0F 44 /r   CMOVE r16, r/m16      Move if equal (ZF=1)
0F 44 /r   CMOVE r32, r/m32      Move if equal (ZF=1)
0F 4F /r   CMOVG r16, r/m16      Move if greater (ZF=0 and SF=OF)
0F 4F /r   CMOVG r32, r/m32      Move if greater (ZF=0 and SF=OF)
0F 4D /r   CMOVGE r16, r/m16     Move if greater or equal (SF=OF)
0F 4D /r   CMOVGE r32, r/m32     Move if greater or equal (SF=OF)
0F 4C /r   CMOVL r16, r/m16      Move if less (SF<>OF)
0F 4C /r   CMOVL r32, r/m32      Move if less (SF<>OF)
0F 4E /r   CMOVLE r16, r/m16     Move if less or equal (ZF=1 or SF<>OF)
0F 4E /r   CMOVLE r32, r/m32     Move if less or equal (ZF=1 or SF<>OF)
0F 46 /r   CMOVNA r16, r/m16     Move if not above (CF=1 or ZF=1)
0F 46 /r   CMOVNA r32, r/m32     Move if not above (CF=1 or ZF=1)
0F 42 /r   CMOVNAE r16, r/m16    Move if not above or equal (CF=1)
0F 42 /r   CMOVNAE r32, r/m32    Move if not above or equal (CF=1)
0F 43 /r   CMOVNB r16, r/m16     Move if not below (CF=0)
0F 43 /r   CMOVNB r32, r/m32     Move if not below (CF=0)
0F 47 /r   CMOVNBE r16, r/m16    Move if not below or equal (CF=0 and ZF=0)
0F 47 /r   CMOVNBE r32, r/m32    Move if not below or equal (CF=0 and ZF=0)
0F 43 /r   CMOVNC r16, r/m16     Move if not carry (CF=0)
0F 43 /r   CMOVNC r32, r/m32     Move if not carry (CF=0)
0F 45 /r   CMOVNE r16, r/m16     Move if not equal (ZF=0)
0F 45 /r   CMOVNE r32, r/m32     Move if not equal (ZF=0)
0F 4E /r   CMOVNG r16, r/m16     Move if not greater (ZF=1 or SF<>OF)
0F 4E /r   CMOVNG r32, r/m32     Move if not greater (ZF=1 or SF<>OF)
0F 4C /r   CMOVNGE r16, r/m16    Move if not greater or equal (SF<>OF)
0F 4C /r   CMOVNGE r32, r/m32    Move if not greater or equal (SF<>OF)
0F 4D /r   CMOVNL r16, r/m16     Move if not less (SF=OF)
0F 4D /r   CMOVNL r32, r/m32     Move if not less (SF=OF)
0F 4F /r   CMOVNLE r16, r/m16    Move if not less or equal (ZF=0 and SF=OF)
0F 4F /r   CMOVNLE r32, r/m32    Move if not less or equal (ZF=0 and SF=OF)
Opcode     Instruction           Description
0F 41 /r   CMOVNO r16, r/m16     Move if not overflow (OF=0)
0F 41 /r   CMOVNO r32, r/m32     Move if not overflow (OF=0)
0F 4B /r   CMOVNP r16, r/m16     Move if not parity (PF=0)
0F 4B /r   CMOVNP r32, r/m32     Move if not parity (PF=0)
0F 49 /r   CMOVNS r16, r/m16     Move if not sign (SF=0)
0F 49 /r   CMOVNS r32, r/m32     Move if not sign (SF=0)
0F 45 /r   CMOVNZ r16, r/m16     Move if not zero (ZF=0)
0F 45 /r   CMOVNZ r32, r/m32     Move if not zero (ZF=0)
0F 40 /r   CMOVO r16, r/m16      Move if overflow (OF=1)
0F 40 /r   CMOVO r32, r/m32      Move if overflow (OF=1)
0F 4A /r   CMOVP r16, r/m16      Move if parity (PF=1)
0F 4A /r   CMOVP r32, r/m32      Move if parity (PF=1)
0F 4A /r   CMOVPE r16, r/m16     Move if parity even (PF=1)
0F 4A /r   CMOVPE r32, r/m32     Move if parity even (PF=1)
0F 4B /r   CMOVPO r16, r/m16     Move if parity odd (PF=0)
0F 4B /r   CMOVPO r32, r/m32     Move if parity odd (PF=0)
0F 48 /r   CMOVS r16, r/m16      Move if sign (SF=1)
0F 48 /r   CMOVS r32, r/m32      Move if sign (SF=1)
0F 44 /r   CMOVZ r16, r/m16      Move if zero (ZF=1)
0F 44 /r   CMOVZ r32, r/m32      Move if zero (ZF=1)
Description
The CMOVcc instructions check the state of one or more of the status flags in the EFLAGS register (CF, OF, PF, SF, and ZF) and perform a move operation if the flags are in a specified state (or condition). A condition code (cc) is associated with each instruction to indicate the condition being tested for. If the condition is not satisfied, a move is not performed and execution continues with the instruction following the CMOVcc instruction.
These instructions can move a 16- or 32-bit value from memory to a general-purpose register or from one general-purpose register to another. Conditional moves of 8-bit register operands are not supported.
The condition for each CMOVcc mnemonic is given in the description column of the above table. The terms "less" and "greater" are used for comparisons of signed integers and the terms "above" and "below" are used for unsigned integers.
Because a particular state of the status flags can sometimes be interpreted in two ways, two mnemonics are defined for some opcodes. For example, the CMOVA (conditional move if above) instruction and the CMOVNBE (conditional move if not below or equal) instruction are alternate mnemonics for the opcode 0F 47H.
The CMOVcc instructions are new for the Pentium Pro processor family; however, they may not be supported by all the processors in the family. Software can determine if the CMOVcc instructions are supported by checking the processor's feature information with the CPUID instruction (refer to "CPUID - CPU Identification" in this chapter).
Operation
temp ← DEST
IF condition TRUE
    THEN DEST ← SRC
    ELSE DEST ← temp
FI;
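The conditions in the opcode tables map onto the low four bits of the second opcode byte (0F 40 through 0F 4F). A minimal C sketch of the flag tests and of the move itself follows; flags_t, condition_holds, and cmov32 are hypothetical names used for illustration, and the flags are modeled as separate booleans rather than as the EFLAGS register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status flags tested by CMOVcc, modeled as individual booleans. */
typedef struct { bool cf, zf, sf, of, pf; } flags_t;

/* cc is the low nibble of the second opcode byte (0F 4x /r). */
bool condition_holds(unsigned cc, flags_t f)
{
    switch (cc & 0xFu) {
    case 0x0: return f.of;                      /* O */
    case 0x1: return !f.of;                     /* NO */
    case 0x2: return f.cf;                      /* B, C, NAE */
    case 0x3: return !f.cf;                     /* AE, NB, NC */
    case 0x4: return f.zf;                      /* E, Z */
    case 0x5: return !f.zf;                     /* NE, NZ */
    case 0x6: return f.cf || f.zf;              /* BE, NA */
    case 0x7: return !f.cf && !f.zf;            /* A, NBE */
    case 0x8: return f.sf;                      /* S */
    case 0x9: return !f.sf;                     /* NS */
    case 0xA: return f.pf;                      /* P, PE */
    case 0xB: return !f.pf;                     /* NP, PO */
    case 0xC: return f.sf != f.of;              /* L, NGE */
    case 0xD: return f.sf == f.of;              /* GE, NL */
    case 0xE: return f.zf || (f.sf != f.of);    /* LE, NG */
    default:  return !f.zf && (f.sf == f.of);   /* G, NLE */
    }
}

/* DEST receives SRC when the condition holds and keeps its value otherwise. */
uint32_t cmov32(unsigned cc, flags_t f, uint32_t dest, uint32_t src)
{
    return condition_holds(cc, f) ? src : dest;
}
```

The switch makes the aliasing visible: CMOVA and CMOVNBE, for example, both land on case 0x7 because they share the opcode 0F 47H.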
Flags Affected
None.
Protected Mode Exceptions
#GP(0)
    If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
    If the DS, ES, FS, or GS register contains a null segment selector.
#SS(0)
    If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)
    If a page fault occurs.
#AC(0)
    If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
CMP - Compare Two Operands

Opcode      Instruction         Description
3C ib       CMP AL, imm8        Compare imm8 with AL
3D iw       CMP AX, imm16       Compare imm16 with AX
3D id       CMP EAX, imm32      Compare imm32 with EAX
80 /7 ib    CMP r/m8, imm8      Compare imm8 with r/m8
81 /7 iw    CMP r/m16, imm16    Compare imm16 with r/m16
81 /7 id    CMP r/m32, imm32    Compare imm32 with r/m32
83 /7 ib    CMP r/m16, imm8     Compare imm8 with r/m16
83 /7 ib    CMP r/m32, imm8     Compare imm8 with r/m32
38 /r       CMP r/m8, r8        Compare r8 with r/m8
39 /r       CMP r/m16, r16      Compare r16 with r/m16
39 /r       CMP r/m32, r32      Compare r32 with r/m32
3A /r       CMP r8, r/m8        Compare r/m8 with r8
3B /r       CMP r16, r/m16      Compare r/m16 with r16
3B /r       CMP r32, r/m32      Compare r/m32 with r32
Description
This instruction compares the first source operand with the second source operand and sets the status flags in the EFLAGS register according to the results. The comparison is performed by subtracting the second operand from the first operand and then setting the status flags in the same manner as the SUB instruction. When an immediate value is used as an operand, it is sign-extended to the length of the first operand.
The CMP instruction is typically used in conjunction with a conditional jump (Jcc), conditional move (CMOVcc), or SETcc instruction. The condition codes used by the Jcc, CMOVcc, and SETcc instructions are based on the results of a CMP instruction. Appendix B, "EFLAGS Condition Codes", in the Intel Architecture Software Developer's Manual, Volume 1, shows the relationship of the status flags and the condition codes.
Operation
temp ← SRC1 − SignExtend(SRC2);
ModifyStatusFlags; (* Modify status flags in the same manner as the SUB instruction *)
Flags Affected
The CF, OF, SF, ZF, AF, and PF flags are set according to the result.
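How the six status flags follow from the subtraction can be sketched in C for the 32-bit case. The status_flags_t structure and the functions cmp32 and cmp32_imm8 are hypothetical names; the flag rules used are the usual ones for x86 subtraction (CF is the unsigned borrow, OF is signed overflow, AF is the borrow out of bit 3, PF reflects even parity of the low byte of the result).

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool cf, zf, sf, of, af, pf; } status_flags_t;

/* Model of CMP r/m32, r32: compute SRC1 - SRC2, keep only the flags. */
status_flags_t cmp32(uint32_t src1, uint32_t src2)
{
    uint32_t temp = src1 - src2;   /* the result itself is discarded by CMP */
    status_flags_t f;

    f.cf = src1 < src2;                                   /* unsigned borrow */
    f.zf = temp == 0;
    f.sf = (temp >> 31) & 1u;
    f.of = (((src1 ^ src2) & (src1 ^ temp)) >> 31) & 1u;  /* signed overflow */
    f.af = ((src1 ^ src2 ^ temp) >> 4) & 1u;              /* borrow from bit 4 */

    /* PF is set when the low byte of the result has an even number of 1s. */
    uint8_t low = (uint8_t)temp;
    low ^= low >> 4;
    low ^= low >> 2;
    low ^= low >> 1;
    f.pf = !(low & 1u);
    return f;
}

/* Model of CMP r/m32, imm8: the immediate is sign-extended to 32 bits first. */
status_flags_t cmp32_imm8(uint32_t src1, int8_t imm8)
{
    return cmp32(src1, (uint32_t)(int32_t)imm8);
}
```

Comparing 1 with 2 sets CF (unsigned "below") and sets SF with OF clear (signed "less"), which are exactly the conditions the Jcc, CMOVcc, and SETcc instructions test afterwards.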
Protected Mode Exceptions
#GP(0)
    If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
    If the DS, ES, FS, or GS register contains a null segment selector.
#SS(0)
    If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)
    If a page fault occurs.
#AC(0)
    If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
CMPPS - Packed Single-FP Compare

Opcode        Instruction                    Description
0F,C2,/r,ib   CMPPS xmm1, xmm2/m128, imm8    Compare packed SP FP numbers from XMM2/Mem to packed SP FP numbers in XMM1 register using imm8 as predicate.
Description
For each individual pair of SP FP numbers, the CMPPS instruction returns an all "1" 32-bit mask or an all "0" 32-bit mask, using the comparison predicate specified by imm8.
Figure 3-7. Operation of the CMPPS (Imm8=0) Instruction
[Figures: one diagram per predicate, for imm8=0 (==), imm8=1 (<), imm8=2 (<=), imm8=3 (?), imm8=4 (!=), imm8=5 (!<), imm8=6 (!<=), and imm8=7 (!?). Each diagram compares the four SP FP values in Xmm1 with the corresponding values in Xmm2/m128 and shows the resulting Xmm1, in which every 32-bit element is either an all-1s mask (True) or an all-0s mask (False).]
Note that a subsequent computational instruction which uses this mask as an input operand will not generate a fault, since a mask of all "0s" corresponds to an FP value of +0.0 and a mask of all "1s" corresponds to an FP value of -qNaN. Some of the comparisons can be achieved only through software emulation. For these comparisons the programmer must swap the operands, copying registers when necessary to protect the data that will now be in the destination, and then perform the compare using a different predicate. The predicate to be used for these emulations is listed in the table under the heading "Emulation". The following table shows the different comparison types:
Predicate  Description                Relation           Emulation           imm8 Encoding  Result if NaN Operand  Q/SNaN Operand Signals Invalid
eq         equal                      xmm1 == xmm2       -                   000B           False                  No
lt         less-than                  xmm1 < xmm2        -                   001B           False                  Yes
le         less-than-or-equal         xmm1 <= xmm2       -                   010B           False                  Yes
-          greater-than               xmm1 > xmm2        swap, protect, lt   -              False                  Yes
-          greater-than-or-equal      xmm1 >= xmm2       swap, protect, le   -              False                  Yes
unord      unordered                  xmm1 ? xmm2        -                   011B           True                   No
neq        not-equal                  !(xmm1 == xmm2)    -                   100B           True                   No
nlt        not-less-than              !(xmm1 < xmm2)     -                   101B           True                   Yes
nle        not-less-than-or-equal     !(xmm1 <= xmm2)    -                   110B           True                   Yes
-          not-greater-than           !(xmm1 > xmm2)     swap, protect, nlt  -              True                   Yes
-          not-greater-than-or-equal  !(xmm1 >= xmm2)    swap, protect, nle  -              True                   Yes
ord        ordered                    !(xmm1 ? xmm2)     -                   111B           False                  No

NOTE: The greater-than, greater-than-or-equal, not-greater-than, and not-greater-than-or-equal relations are not directly implemented in hardware.
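
The predicate table can be sketched in portable C. This is only an illustrative model, not the hardware definition: the names cmp_mask, cmpps_model, and cmp_gt_mask are hypothetical, the model reproduces only the resulting masks (it does not signal the invalid exception), and the greater-than case follows the "swap, protect, lt" emulation listed in the table.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical model of one element of a CMPPS compare.
   Returns 0xFFFFFFFF when the predicate holds, 0 otherwise.
   Only the low three bits of imm8 select the predicate. */
uint32_t cmp_mask(float a, float b, unsigned imm8)
{
    int unordered = isnan(a) || isnan(b);
    int result;

    switch (imm8 & 7u) {
    case 0:  result = (a == b);   break;  /* eq    */
    case 1:  result = (a < b);    break;  /* lt    */
    case 2:  result = (a <= b);   break;  /* le    */
    case 3:  result = unordered;  break;  /* unord */
    case 4:  result = !(a == b);  break;  /* neq   */
    case 5:  result = !(a < b);   break;  /* nlt   */
    case 6:  result = !(a <= b);  break;  /* nle   */
    default: result = !unordered; break;  /* ord   */
    }
    return result ? 0xFFFFFFFFu : 0u;
}

/* Packed form: apply the same predicate to four element pairs. */
void cmpps_model(const float a[4], const float b[4], unsigned imm8,
                 uint32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = cmp_mask(a[i], b[i], imm8);
}

/* Greater-than is not encoded directly: swap the operands and use lt. */
uint32_t cmp_gt_mask(float a, float b)
{
    return cmp_mask(b, a, 1);
}
```

Because C comparisons involving a NaN evaluate to false, the negated forms (neq, nlt, nle) come out true for NaN operands, which matches the "Result if NaN Operand" column.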

Operation
IF (imm8 = 0) THEN OP = "EQ";
ELSE IF (imm8 = 1) THEN OP = "LT";
ELSE IF (imm8 = 2) THEN OP = "LE";
ELSE IF (imm8 = 3) THEN OP = "UNORD";
ELSE IF (imm8 = 4) THEN OP = "NE";
ELSE IF (imm8 = 5) THEN OP = "NLT";
ELSE IF (imm8 = 6) THEN OP = "NLE";
ELSE IF (imm8 = 7) THEN OP = "ORD";
FI FI FI FI FI FI FI FI
CMP0 = DEST[31-0] OP SRC/m128[31-0];
CMP1 = DEST[63-32] OP SRC/m128[63-32];
CMP2 = DEST[95-64] OP SRC/m128[95-64];
CMP3 = DEST[127-96] OP SRC/m128[127-96];
IF (CMP0 = TRUE) THEN DEST[31-0] = 0XFFFFFFFF; ELSE DEST[31-0] = 0X00000000; FI
IF (CMP1 = TRUE) THEN DEST[63-32] = 0XFFFFFFFF; ELSE DEST[63-32] = 0X00000000; FI
IF (CMP2 = TRUE) THEN DEST[95-64] = 0XFFFFFFFF; ELSE DEST[95-64] = 0X00000000; FI
IF (CMP3 = TRUE) THEN DEST[127-96] = 0XFFFFFFFF; ELSE DEST[127-96] = 0X00000000; FI
Intel C/C++ Compiler Intrinsic Equivalents

__m128 _mm_cmpeq_ps(__m128 a, __m128 b)
Compare for equality.

__m128 _mm_cmplt_ps(__m128 a, __m128 b)
Compare for less-than.

__m128 _mm_cmple_ps(__m128 a, __m128 b)
Compare for less-than-or-equal.

__m128 _mm_cmpgt_ps(__m128 a, __m128 b)
Compare for greater-than.

__m128 _mm_cmpge_ps(__m128 a, __m128 b)
Compare for greater-than-or-equal.

__m128 _mm_cmpneq_ps(__m128 a, __m128 b)
Compare for inequality.

__m128 _mm_cmpnlt_ps(__m128 a, __m128 b)
Compare for not-less-than.

__m128 _mm_cmpngt_ps(__m128 a, __m128 b)
Compare for not-greater-than.

__m128 _mm_cmpnge_ps(__m128 a, __m128 b)
Compare for not-greater-than-or-equal.

__m128 _mm_cmpord_ps(__m128 a, __m128 b)
Compare for ordered.

__m128 _mm_cmpunord_ps(__m128 a, __m128 b)
Compare for unordered.

__m128 _mm_cmpnle_ps(__m128 a, __m128 b)
Compare for not-less-than-or-equal.

Exceptions
General protection exception if not aligned on 16-byte boundary, regardless of segment.

Numeric Exceptions
Invalid, if sNaN operands, denormal.

Protected Mode Exceptions
#GP(0)           For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)           For an illegal address in the SS segment.
#PF(fault-code)  For a page fault.
#UD              If CR0.EM = 1.
#NM              If TS bit in CR0 is set.
#XM              For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 1).
#UD              For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 0).
#UD              If CR4.OSFXSR (bit 9) = 0.
#UD              If CPUID.XMM (EDX bit 25) = 0.

Virtual Mode Exceptions
Same exceptions as in Real Address Mode.
#PF(fault-code)  For a page fault.

Comments
Compilers and assemblers should implement the following 2-operand pseudo-ops in addition to the 3-operand CMPPS instruction:

Pseudo-Op                Implementation
CMPEQPS xmm1, xmm2       CMPPS xmm1, xmm2, 0
CMPLTPS xmm1, xmm2       CMPPS xmm1, xmm2, 1
CMPLEPS xmm1, xmm2       CMPPS xmm1, xmm2, 2
CMPUNORDPS xmm1, xmm2    CMPPS xmm1, xmm2, 3
CMPNEQPS xmm1, xmm2      CMPPS xmm1, xmm2, 4
CMPNLTPS xmm1, xmm2      CMPPS xmm1, xmm2, 5
CMPNLEPS xmm1, xmm2      CMPPS xmm1, xmm2, 6
CMPORDPS xmm1, xmm2      CMPPS xmm1, xmm2, 7

The greater-than relations not implemented in hardware require more than one instruction to emulate in software and therefore should not be implemented as pseudo-ops. (For these, the programmer should reverse the operands of the corresponding less than relations and use move instructions to ensure that the mask is moved to the correct destination register and that the source operand is left intact.) Bits 7-4 of the immediate field are reserved. Different processors may handle them differently. Usage of these bits risks incompatibility with future processors.
CMPS/CMPSB/CMPSW/CMPSD - Compare String Operands

Opcode  Instruction      Description
A6      CMPS m8, m8      Compares byte at address DS:(E)SI with byte at address ES:(E)DI and sets the status flags accordingly
A7      CMPS m16, m16    Compares word at address DS:(E)SI with word at address ES:(E)DI and sets the status flags accordingly
A7      CMPS m32, m32    Compares doubleword at address DS:(E)SI with doubleword at address ES:(E)DI and sets the status flags accordingly
A6      CMPSB            Compares byte at address DS:(E)SI with byte at address ES:(E)DI and sets the status flags accordingly
A7      CMPSW            Compares word at address DS:(E)SI with word at address ES:(E)DI and sets the status flags accordingly
A7      CMPSD            Compares doubleword at address DS:(E)SI with doubleword at address ES:(E)DI and sets the status flags accordingly

Description This instruction compares the byte, word, or double word specified with the first source operand with the byte, word, or double word specified with the second source operand and sets the status flags in the EFLAGS register according to the results. Both the source operands are located in memory. The address of the first source operand is read from either the DS:ESI or the DS:SI registers (depending on the address-size attribute of the instruction, 32 or 16, respectively). The address of the second source operand is read from either the ES:EDI or the ES:DI registers (again depending on the address-size attribute of the instruction). The DS segment may be overridden with a segment override prefix, but the ES segment cannot be overridden. At the assembly-code level, two forms of this instruction are allowed: the explicit-operands form and the no-operands form. The explicit-operands form (specified with the CMPS mnemonic) allows the two source operands to be specified explicitly. Here, the source operands should be symbols that indicate the size and location of the source values. This explicit-operands form is provided to allow documentation; however, note that the documentation provided by this form can be misleading. That is, the source operand symbols must specify the correct type (size) of the operands (bytes, words, or doublewords), but they do not have to specify the correct location. The locations of the source operands are always specified by the DS:(E)SI and ES:(E)DI registers, which must be loaded correctly before the compare string instruction is executed. The no-operands form provides short forms of the byte, word, and doubleword versions of the CMPS instructions. Here also the DS:(E)SI and ES:(E)DI registers are assumed by the processor to specify the location of the source operands. The size of the source operands is selected with the mnemonic: CMPSB (byte comparison), CMPSW (word comparison), or CMPSD (doubleword comparison).

After the comparison, the (E)SI and (E)DI registers are incremented or decremented automatically according to the setting of the DF flag in the EFLAGS register. (If the DF flag is 0, the (E)SI and (E)DI registers are incremented; if the DF flag is 1, the (E)SI and (E)DI registers are decremented.) The registers are incremented or decremented by one for byte operations, by two for word operations, or by four for doubleword operations.

The CMPS, CMPSB, CMPSW, and CMPSD instructions can be preceded by the REP prefix for block comparisons of ECX bytes, words, or doublewords. More often, however, these instructions will be used in a LOOP construct that takes some action based on the setting of the status flags before the next comparison is made. Refer to "REP/REPE/REPZ/REPNE/REPNZ - Repeat String Operation Prefix" in this chapter for a description of the REP prefix.

Operation
temp ← SRC1 - SRC2;
SetStatusFlags(temp);
IF (byte comparison)
    THEN IF DF = 0
        THEN (E)SI ← (E)SI + 1; (E)DI ← (E)DI + 1;
        ELSE (E)SI ← (E)SI - 1; (E)DI ← (E)DI - 1;
    FI;
ELSE IF (word comparison)
    THEN IF DF = 0
        THEN (E)SI ← (E)SI + 2; (E)DI ← (E)DI + 2;
        ELSE (E)SI ← (E)SI - 2; (E)DI ← (E)DI - 2;
    FI;
ELSE (* doubleword comparison *)
    IF DF = 0
        THEN (E)SI ← (E)SI + 4; (E)DI ← (E)DI + 4;
        ELSE (E)SI ← (E)SI - 4; (E)DI ← (E)DI - 4;
    FI;
FI; FI;
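
As a rough illustration of the byte form of this operation, the following C sketch models one CMPSB step and a repeat loop around it. The struct cmps_state and both function names are hypothetical; a single flat array stands in for both the DS and ES segments, only ZF and CF are modeled, and the loop assumes the REPE behavior (repeat while the count is nonzero and ZF is 1).

```c
#include <stdint.h>

/* Hypothetical model of the state that CMPSB reads and updates. */
struct cmps_state {
    const uint8_t *mem;  /* flat memory image standing in for DS and ES */
    uint32_t esi;        /* offset of the first source operand */
    uint32_t edi;        /* offset of the second source operand */
    int df;              /* direction flag: 0 = increment, 1 = decrement */
    int zf;              /* set when the two bytes are equal */
    int cf;              /* set when SRC1 - SRC2 needs a borrow */
};

/* One CMPSB: compare, set flags, then step both index registers by 1. */
void cmpsb_step(struct cmps_state *s)
{
    uint8_t src1 = s->mem[s->esi];
    uint8_t src2 = s->mem[s->edi];

    s->zf = (src1 == src2);
    s->cf = (src1 < src2);

    if (s->df == 0) {
        s->esi += 1;
        s->edi += 1;
    } else {
        s->esi -= 1;
        s->edi -= 1;
    }
}

/* REPE-style loop: stop when the count runs out or a mismatch is found.
   Returns the remaining count (the modeled ECX). */
uint32_t repe_cmpsb(struct cmps_state *s, uint32_t ecx)
{
    while (ecx != 0) {
        cmpsb_step(s);
        ecx--;
        if (!s->zf)
            break;
    }
    return ecx;
}
```

Note that the index registers are advanced even on the mismatching element, so after the loop they point one element past the bytes that differed.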

Flags Affected
The CF, OF, SF, ZF, AF, and PF flags are set according to the temporary result of the comparison.

Protected Mode Exceptions
#GP(0)           If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                 If the DS, ES, FS, or GS register contains a null segment selector.
#SS(0)           If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

CMPSS - Scalar Single-FP Compare

Opcode          Instruction                 Description
F3,0F,C2,/r,ib  CMPSS xmm1, xmm2/m32, imm8  Compare lowest SP FP number from XMM2/Mem to lowest SP FP number in XMM1 register using imm8 as predicate.
Description For the lowest pair of SP FP numbers, the CMPSS instruction returns an all "1" 32-bit mask or an all "0" 32-bit mask, using the comparison predicate specified by imm8. The values for the upper three pairs of SP FP numbers are not compared. Note that a subsequent computational instruction, which uses this mask as an input operand, will not generate a fault, since a mask of all "0"s corresponds to an FP value of +0.0, and a mask of all "1s" corresponds to an FP value of -qNaN. Some comparisons can be achieved only through software emulation. For those comparisons, the programmer must swap the operands, copying registers when necessary to protect the data that will now be in the destination, and then perform the compare using a different predicate. The predicate to be used for these emulations is listed under the heading "Emulation."

The following table shows the different comparison types:

Predicate  Description                Relation           Emulation           imm8 Encoding  Result if NaN Operand  qNaN Operand Signals Invalid
eq         equal                      xmm1 == xmm2       -                   000B           False                  No
lt         less-than                  xmm1 < xmm2        -                   001B           False                  Yes
le         less-than-or-equal         xmm1 <= xmm2       -                   010B           False                  Yes
-          greater-than               xmm1 > xmm2        swap, protect, lt   -              False                  Yes
-          greater-than-or-equal      xmm1 >= xmm2       swap, protect, le   -              False                  Yes
unord      unordered                  xmm1 ? xmm2        -                   011B           True                   No
neq        not-equal                  !(xmm1 == xmm2)    -                   100B           True                   No
nlt        not-less-than              !(xmm1 < xmm2)     -                   101B           True                   Yes
nle        not-less-than-or-equal     !(xmm1 <= xmm2)    -                   110B           True                   Yes
-          not-greater-than           !(xmm1 > xmm2)     swap, protect, nlt  -              True                   Yes
-          not-greater-than-or-equal  !(xmm1 >= xmm2)    swap, protect, nle  -              True                   Yes
ord        ordered                    !(xmm1 ? xmm2)     -                   111B           False                  No

NOTE: The greater-than, greater-than-or-equal, not-greater-than, and not-greater-than-or-equal relations are not directly implemented in hardware.

Figure 3-15. Operation of the CMPSS (Imm8=0) Instruction

[Figures: operation of the CMPSS instruction for imm8 = 0 through 7 (predicates ==, <, <=, ?, !=, !<, !<=, !?). Each figure compares the lowest SP FP element of xmm1 with the lowest element of xmm2/m32 and writes the True or False mask into the lowest element of xmm1.]

Operation
IF (imm8 = 0) THEN OP = "EQ";
ELSE IF (imm8 = 1) THEN OP = "LT";
ELSE IF (imm8 = 2) THEN OP = "LE";
ELSE IF (imm8 = 3) THEN OP = "UNORD";
ELSE IF (imm8 = 4) THEN OP = "NE";
ELSE IF (imm8 = 5) THEN OP = "NLT";
ELSE IF (imm8 = 6) THEN OP = "NLE";
ELSE IF (imm8 = 7) THEN OP = "ORD";
FI FI FI FI FI FI FI FI
CMP0 = DEST[31-0] OP SRC/m32[31-0];
IF (CMP0 = TRUE)
    THEN DEST[31-0] = 0XFFFFFFFF;
    ELSE DEST[31-0] = 0X00000000;
FI
DEST[63-32] = DEST[63-32];
DEST[95-64] = DEST[95-64];
DEST[127-96] = DEST[127-96];

__m128 _mm_cmpeq_ss(__m128 a, __m128 b)
Compare for equality.

__m128 _mm_cmplt_ss(__m128 a, __m128 b)
Compare for less-than.

__m128 _mm_cmple_ss(__m128 a, __m128 b)
Compare for less-than-or-equal.

__m128 _mm_cmpgt_ss(__m128 a, __m128 b)
Compare for greater-than.

__m128 _mm_cmpge_ss(__m128 a, __m128 b)
Compare for greater-than-or-equal.

__m128 _mm_cmpneq_ss(__m128 a, __m128 b)
Compare for inequality.

__m128 _mm_cmpnlt_ss(__m128 a, __m128 b)
Compare for not-less-than.

__m128 _mm_cmpnle_ss(__m128 a, __m128 b)
Compare for not-less-than-or-equal.

__m128 _mm_cmpngt_ss(__m128 a, __m128 b)
Compare for not-greater-than.

__m128 _mm_cmpnge_ss(__m128 a, __m128 b)
Compare for not-greater-than-or-equal.

__m128 _mm_cmpord_ss(__m128 a, __m128 b)
Compare for ordered.

__m128 _mm_cmpunord_ss(__m128 a, __m128 b)
Compare for unordered.

Exceptions
None.

Numeric Exceptions
Invalid if sNaN operand, invalid if qNaN and predicate as listed in above table, denormal.

Protected Mode Exceptions
#GP(0)            For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)            For an illegal address in the SS segment.
#PF (fault-code)  For a page fault.
#UD               If CR0.EM = 1.
#NM               If TS bit in CR0 is set.
#AC               For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).
#XM               For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 1).
#UD               For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 0).
#UD               If CR4.OSFXSR (bit 9) = 0.
#UD               If CPUID.XMM (EDX bit 25) = 0.

Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#AC               For unaligned memory reference if the current privilege level is 3.
#PF (fault-code)  For a page fault.

Comments
Compilers and assemblers should implement the following 2-operand pseudo-ops in addition to the 3-operand CMPSS instruction.

Pseudo-Op                Implementation
CMPEQSS xmm1, xmm2       CMPSS xmm1, xmm2, 0
CMPLTSS xmm1, xmm2       CMPSS xmm1, xmm2, 1
CMPLESS xmm1, xmm2       CMPSS xmm1, xmm2, 2
CMPUNORDSS xmm1, xmm2    CMPSS xmm1, xmm2, 3
CMPNEQSS xmm1, xmm2      CMPSS xmm1, xmm2, 4
CMPNLTSS xmm1, xmm2      CMPSS xmm1, xmm2, 5
CMPNLESS xmm1, xmm2      CMPSS xmm1, xmm2, 6
CMPORDSS xmm1, xmm2      CMPSS xmm1, xmm2, 7

The greater-than relations not implemented in hardware require more than one instruction to emulate in software and therefore should not be implemented as pseudo-ops. (For these, the programmer should reverse the operands of the corresponding less than relations and use move instructions to ensure that the mask is moved to the correct destination register and that the source operand is left intact.) Bits 7-4 of the immediate field are reserved. Different processors may handle them differently. Usage of these bits risks incompatibility with future processors.
CMPXCHG - Compare and Exchange

Opcode   Instruction         Description
0F B0/r  CMPXCHG r/m8,r8     Compare AL with r/m8. If equal, ZF is set and r8 is loaded into r/m8. Else, clear ZF and load r/m8 into AL.
0F B1/r  CMPXCHG r/m16,r16   Compare AX with r/m16. If equal, ZF is set and r16 is loaded into r/m16. Else, clear ZF and load r/m16 into AX.
0F B1/r  CMPXCHG r/m32,r32   Compare EAX with r/m32. If equal, ZF is set and r32 is loaded into r/m32. Else, clear ZF and load r/m32 into EAX.

Description
This instruction compares the value in the AL, AX, or EAX register (depending on the size of the operand) with the first operand (destination operand). If the two values are equal, the second operand (source operand) is loaded into the destination operand. Otherwise, the destination operand is loaded into the AL, AX, or EAX register.

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. To simplify the interface to the processor's bus, the destination operand receives a write cycle without regard to the result of the comparison. The destination operand is written back if the comparison fails; otherwise, the source operand is written into the destination. (The processor never produces a locked read without also producing a locked write.)

Intel Architecture Compatibility
This instruction is not supported on Intel processors earlier than the Intel486 processors.

Operation
(* accumulator = AL, AX, or EAX, depending on whether *)
(* a byte, word, or doubleword comparison is being performed *)
IF accumulator = DEST
    THEN
        ZF ← 1
        DEST ← SRC
    ELSE
        ZF ← 0
        accumulator ← DEST
FI;
Flags Affected The ZF flag is set if the values in the destination operand and register AL, AX, or EAX are equal; otherwise it is cleared. The CF, PF, AF, SF, and OF flags are set according to the results of the comparison operation.
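
The 32-bit form of the operation can be modeled in a few lines of C. This is a sketch of the data flow only: the function name cmpxchg32_model is hypothetical, the code is not atomic, and only ZF is modeled.

```c
#include <stdint.h>

/* Hypothetical model of CMPXCHG r/m32, r32.
   *acc stands in for EAX, *dest for the r/m32 operand, src for r32.
   Returns the resulting ZF. */
int cmpxchg32_model(uint32_t *dest, uint32_t *acc, uint32_t src)
{
    if (*acc == *dest) {
        *dest = src;   /* values matched: store the source operand */
        return 1;      /* ZF = 1 */
    }
    *acc = *dest;      /* mismatch: reload the accumulator */
    return 0;          /* ZF = 0 */
}
```

The C11 library function atomic_compare_exchange_strong has the same shape: on failure it writes the current value back into the caller's "expected" variable, much as CMPXCHG reloads the accumulator, so a retry loop can continue with fresh data.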

Protected Mode Exceptions
#GP(0)           If the destination is located in a nonwritable segment.
                 If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                 If the DS, ES, FS, or GS register contains a null segment selector.
#SS(0)           If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

CMPXCHG8B - Compare and Exchange 8 Bytes

Opcode         Instruction     Description
0F C7 /1 m64   CMPXCHG8B m64   Compare EDX:EAX with m64. If equal, set ZF and load ECX:EBX into m64. Else, clear ZF and load m64 into EDX:EAX.

Description
This instruction compares the 64-bit value in EDX:EAX with the operand (destination operand). If the values are equal, the 64-bit value in ECX:EBX is stored in the destination operand. Otherwise, the value in the destination operand is loaded into EDX:EAX. The destination operand is an 8-byte memory location. For the EDX:EAX and ECX:EBX register pairs, EDX and ECX contain the high-order 32 bits and EAX and EBX contain the low-order 32 bits of a 64-bit value.

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. To simplify the interface to the processor's bus, the destination operand receives a write cycle without regard to the result of the comparison. The destination operand is written back if the comparison fails; otherwise, the source operand is written into the destination. (The processor never produces a locked read without also producing a locked write.)

Intel Architecture Compatibility
This instruction is not supported on Intel processors earlier than the Pentium processors.

Operation
IF (EDX:EAX = DEST)
    THEN
        ZF ← 1
        DEST ← ECX:EBX
    ELSE
        ZF ← 0
        EDX:EAX ← DEST
FI;
Flags Affected The ZF flag is set if the destination operand and EDX:EAX are equal; otherwise it is cleared. The CF, PF, AF, SF, and OF flags are unaffected.
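
The register pairing can be made concrete with a small C model. The struct regs and the function cmpxchg8b_model are hypothetical names used only for illustration; the sketch is not atomic and models only ZF.

```c
#include <stdint.h>

/* Hypothetical register file holding the four registers CMPXCHG8B uses. */
struct regs {
    uint32_t eax, ebx, ecx, edx;
};

/* Model of CMPXCHG8B m64. Returns the resulting ZF. */
int cmpxchg8b_model(uint64_t *dest, struct regs *r)
{
    /* EDX and ECX hold the high-order halves, EAX and EBX the low-order. */
    uint64_t expected = ((uint64_t)r->edx << 32) | r->eax;
    uint64_t desired  = ((uint64_t)r->ecx << 32) | r->ebx;

    if (*dest == expected) {
        *dest = desired;
        return 1;                       /* ZF = 1 */
    }
    r->edx = (uint32_t)(*dest >> 32);   /* reload EDX:EAX from memory */
    r->eax = (uint32_t)*dest;
    return 0;                           /* ZF = 0 */
}
```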

Protected Mode Exceptions
#UD              If the destination operand is not a memory location.
#GP(0)           If the destination is located in a nonwritable segment.
                 If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                 If the DS, ES, FS, or GS register contains a null segment selector.
#SS(0)           If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real-Address Mode Exceptions
#UD              If the destination operand is not a memory location.
#GP              If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS              If a memory operand effective address is outside the SS segment limit.

Virtual-8086 Mode Exceptions
#UD              If the destination operand is not a memory location.
#GP(0)           If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)           If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If alignment checking is enabled and an unaligned memory reference is made.

COMISS - Scalar Ordered Single-FP Compare and Set EFLAGS

Opcode 0F,2F,/r Instruction COMISS xmm1, xmm2/m32 Description Compare lower SP FP number in XMM1 register with lower SP FP number in XMM2/Mem and set the status flags accordingly
Description The COMISS instruction compares two SP FP numbers and sets the ZF,PF,CF bits in the EFLAGS register as described above. Although the data type is packed single-FP, only the lower SP numbers are compared. In addition, the OF, SF, and AF bits in the EFLAGS register are zeroed out. The unordered predicate is returned if either input is a NaN (qNaN or sNaN).
Figure 3-23. Operation of the COMISS Instruction, Condition One. EFLAGS: OF,SF,AF=000; EFLAGS: ZF,PF,CF=111; MXCSR flags: Invalid flag is set
Figure 3-24. Operation of the COMISS Instruction, Condition Two. EFLAGS: OF,SF,AF=000; EFLAGS: ZF,PF,CF=000; MXCSR flags: Invalid flag is set
Figure 3-25. Operation of the COMISS Instruction, Condition Three. EFLAGS: OF,SF,AF=000; EFLAGS: ZF,PF,CF=001; MXCSR flags: Invalid flag is set
Figure 3-26. Operation of the COMISS Instruction, Condition Four. EFLAGS: OF,SF,AF=000; EFLAGS: ZF,PF,CF=100; MXCSR flags: Invalid flag is set

Operation
OF = 0; SF = 0; AF = 0;
IF ((DEST[31-0] UNORD SRC/m32[31-0]) = TRUE) THEN
    ZF = 1; PF = 1; CF = 1;
ELSE IF ((DEST[31-0] GTRTHAN SRC/m32[31-0]) = TRUE) THEN
    ZF = 0; PF = 0; CF = 0;
ELSE IF ((DEST[31-0] LESSTHAN SRC/m32[31-0]) = TRUE) THEN
    ZF = 0; PF = 0; CF = 1;
ELSE
    ZF = 1; PF = 0; CF = 0;
FI FI FI
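
The four outcomes map directly onto ordinary C floating-point comparisons. The sketch below is an illustrative model with hypothetical names (struct comiss_flags, comiss_model); it reproduces only the ZF, PF, and CF results and does not model the invalid exception or the MXCSR register.

```c
#include <math.h>

/* Hypothetical container for the three flags COMISS computes. */
struct comiss_flags {
    int zf, pf, cf;
};

/* Model of COMISS on the low elements a (destination) and b (source). */
struct comiss_flags comiss_model(float a, float b)
{
    struct comiss_flags f;

    if (isnan(a) || isnan(b)) {   /* unordered */
        f.zf = 1; f.pf = 1; f.cf = 1;
    } else if (a > b) {           /* greater than */
        f.zf = 0; f.pf = 0; f.cf = 0;
    } else if (a < b) {           /* less than */
        f.zf = 0; f.pf = 0; f.cf = 1;
    } else {                      /* equal */
        f.zf = 1; f.pf = 0; f.cf = 0;
    }
    return f;
}
```

The unordered case has to be tested first, because every ordinary comparison involving a NaN is false and would otherwise fall through to the "equal" branch.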

int _mm_comieq_ss(__m128 a, __m128 b)
Compares the lower SP FP value of a and b for a equal to b. If a and b are equal, 1 is returned. Otherwise 0 is returned.

int _mm_comilt_ss(__m128 a, __m128 b)
Compares the lower SP FP value of a and b for a less than b. If a is less than b, 1 is returned. Otherwise 0 is returned.

int _mm_comile_ss(__m128 a, __m128 b)
Compares the lower SP FP value of a and b for a less than or equal to b. If a is less than or equal to b, 1 is returned. Otherwise 0 is returned.

int _mm_comigt_ss(__m128 a, __m128 b)
Compares the lower SP FP value of a and b for a greater than b. If a is greater than b, 1 is returned. Otherwise 0 is returned.

int _mm_comige_ss(__m128 a, __m128 b)
Compares the lower SP FP value of a and b for a greater than or equal to b. If a is greater than or equal to b, 1 is returned. Otherwise 0 is returned.

int _mm_comineq_ss(__m128 a, __m128 b)
Compares the lower SP FP value of a and b for a not equal to b. If a and b are not equal, 1 is returned. Otherwise 0 is returned.

Exceptions
None.

Numeric Exceptions
Invalid (if sNaN or qNaN operands), Denormal. Integer EFLAGS values will not be updated in the presence of unmasked numeric exceptions.

Protected Mode Exceptions
#GP(0)            For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)            For an illegal address in the SS segment.
#PF (fault-code)  For a page fault.
#UD               If CR0.EM = 1.
#NM               If TS bit in CR0 is set.
#AC               For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).
#XM               For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 1).
#UD               For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 0).
#UD               If CR4.OSFXSR (bit 9) = 0.
#UD               If CPUID.XMM (EDX bit 25) = 0.

Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#AC               For unaligned memory reference if the current privilege level is 3.
#PF (fault-code)  For a page fault.

Comments
COMISS differs from UCOMISS in that it signals an invalid numeric exception when a source operand is either a qNaN or an sNaN; UCOMISS signals invalid only if a source operand is an sNaN. The usage of Repeat (F2H, F3H) and Operand-Size (66H) prefixes with COMISS is reserved. Different processor implementations may handle these prefixes differently. Usage of these prefixes with COMISS risks incompatibility with future processors.

CPUID - CPU Identification

Opcode  Instruction  Description
0F A2   CPUID        EAX ← Processor identification information
Description This instruction provides processor identification information in registers EAX, EBX, ECX, and EDX. This information identifies Intel as the vendor, gives the family, model, and stepping of processor, feature information, and cache information. An input value loaded into the EAX register determines what information is returned, as shown in Table 3-6.
Table 3-6. Information Returned by CPUID Instruction

Initial EAX Value  Register  Information Provided about the Processor
0                  EAX       Maximum CPUID Input Value (2 for the P6 family processors and 1 for the Pentium processor and the later versions of Intel486 processor that support the CPUID instruction).
                   EBX       "Genu"
                   ECX       "ntel"
                   EDX       "ineI"
1                  EAX       Version Information (Type, Family, Model, and Stepping ID)
                   EBX       Reserved
                   ECX       Reserved
                   EDX       Feature Information
2                  EAX       Cache and TLB Information
                   EBX       Cache and TLB Information
                   ECX       Cache and TLB Information
                   EDX       Cache and TLB Information

The CPUID instruction can be executed at any privilege level to serialize instruction execution. Serializing instruction execution guarantees that any modifications to flags, registers, and memory for previous instructions are completed before the next instruction is fetched and executed. For more information, refer to Section 7.4., "Serializing Instructions" in Chapter 7, "Multiple-Processor Management" of the Intel Architecture Software Developer's Manual, Volume 3.

When the input value in register EAX is 0, the processor returns the highest value the CPUID instruction recognizes in the EAX register (refer to Table 3-6). A vendor identification string is returned in the EBX, EDX, and ECX registers. For Intel processors, the vendor identification string is "GenuineIntel" as follows:

EBX ← 756e6547h (* "Genu", with G in the low nibble of BL *)
EDX ← 49656e69h (* "ineI", with i in the low nibble of DL *)
ECX ← 6c65746eh (* "ntel", with n in the low nibble of CL *)
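
To show how the three register values spell out the string, the following C sketch unpacks them byte by byte, lowest byte first, in the order EBX, EDX, ECX. The function names are hypothetical, and the register values are passed in as plain integers rather than read from the processor, so the code runs on any platform.

```c
#include <stdint.h>
#include <string.h>

/* Unpack the vendor string from the register values CPUID returns for
   an input value of 0. out must have room for 12 characters plus NUL. */
void cpuid_vendor_string(uint32_t ebx, uint32_t edx, uint32_t ecx,
                         char out[13])
{
    const uint32_t regs[3] = { ebx, edx, ecx };

    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 4; j++)
            out[4 * i + j] = (char)((regs[i] >> (8 * j)) & 0xFFu);
    out[12] = '\0';
}

/* Hypothetical convenience check for the Intel vendor string. */
int is_genuine_intel(uint32_t ebx, uint32_t edx, uint32_t ecx)
{
    char vendor[13];

    cpuid_vendor_string(ebx, edx, ecx, vendor);
    return strcmp(vendor, "GenuineIntel") == 0;
}
```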

When the input value is 1, the processor returns version information in the EAX register and feature information in the EDX register (refer to Figure 3-27).
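Figure 3-27. Version and Feature Information in Registers EAX and EDX

[Figure: layout of the version information in EAX. Bits 13-12: Processor Type; bits 11-8: Family (0110B for the Pentium Pro Processor Family); bits 7-4: Model (beginning with 0001B); bits 3-0: Stepping ID. The EDX portion of the figure shows the feature flags, including Streaming SIMD Extensions.]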
The version information consists of an Intel Architecture family identifier, a model identifier, a stepping ID, and a processor type. The model, family, and processor type for the first processor in the Intel Pentium Pro family is as follows:
Model: 0001B
Family: 0110B
Processor Type: 00B
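
The field positions in EAX can be extracted with shifts and masks. The following C sketch assumes the layout described for Figure 3-27 (stepping ID in bits 3-0, model in bits 7-4, family in bits 11-8, processor type in bits 13-12); the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the version fields returned in EAX. */
struct cpu_version {
    unsigned type;      /* bits 13-12 */
    unsigned family;    /* bits 11-8  */
    unsigned model;     /* bits 7-4   */
    unsigned stepping;  /* bits 3-0   */
};

/* Split the EAX value returned by CPUID with input value 1. */
struct cpu_version decode_version(uint32_t eax)
{
    struct cpu_version v;

    v.stepping = eax & 0xFu;
    v.model    = (eax >> 4) & 0xFu;
    v.family   = (eax >> 8) & 0xFu;
    v.type     = (eax >> 12) & 0x3u;
    return v;
}
```

For example, an EAX value of 00000611H (the stepping value 1 here is made up for illustration) decodes to type 00B, family 0110B, and model 0001B, the values given above for the first Pentium Pro processor.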

Refer to AP-485, Intel Processor Identification and the CPUID Instruction (Order Number 241618), the Intel Pentium Pro Processor Specification Update (Order Number 242689), and the Intel Pentium Processor Specification Update (Order Number 242480) for more information on identifying earlier Intel Architecture processors. The available processor types are given in Table 3-7. Intel releases information on stepping IDs as needed.
Table 3-7. Processor Type Field

Type                         Encoding
Original OEM Processor       00B
Intel OverDrive Processor    01B
Dual processor*              10B
Intel reserved.              11B

* Not applicable to Intel386 and Intel486 processors.

Table 3-8 shows the encoding of the feature flags in the EDX register. A feature flag set to 1 indicates the corresponding feature is supported. Software should identify Intel as the vendor to properly interpret the feature flags.
Table 3-8. Feature Flags Returned in EDX Register

Bit    Feature  Description
0      FPU - Floating-Point Unit on Chip  Processor contains an FPU and executes the Intel 387 instruction set.
1      VME - Virtual-8086 Mode Enhancements  Processor supports the following virtual-8086 mode enhancements: CR4.VME bit enables virtual-8086 mode extensions; CR4.PVI bit enables protected-mode virtual interrupts; expansion of the TSS with the software indirection bitmap; EFLAGS.VIF bit (virtual interrupt flag); EFLAGS.VIP bit (virtual interrupt pending flag).
2      DE - Debugging Extensions  Processor supports I/O breakpoints, including the CR4.DE bit for enabling debug extensions and optional trapping of access to the DR4 and DR5 registers.
3      PSE - Page Size Extensions  Processor supports 4-Mbyte pages, including the CR4.PSE bit for enabling page size extensions, the modified bit in page directory entries (PDEs), page directory entries, and page table entries (PTEs).
4      TSC - Time Stamp Counter  Processor supports the RDTSC (read time stamp counter) instruction, including the CR4.TSD bit that, along with the CPL, controls whether the time stamp counter can be read.
5      MSR - Model Specific Registers  Processor supports the RDMSR (read model-specific register) and WRMSR (write model-specific register) instructions.
6      PAE - Physical Address Extension  Processor supports physical addresses greater than 32 bits, the extended page-table-entry format, an extra level in the page translation tables, and 2-MByte pages. The CR4.PAE bit enables this feature. The number of address bits is implementation specific. The Pentium Pro processor supports 36 bits of addressing when the PAE bit is set.
7      MCE - Machine Check Exception  Processor supports the CR4.MCE bit, enabling machine check exceptions. However, this feature does not define the model-specific implementations of machine-check error logging, reporting, or processor shutdowns. Machine-check exception handlers might have to check the processor version to do model-specific processing of the exception or check for presence of the machine-check feature.
8      CX8 - CMPXCHG8B Instruction  Processor supports the CMPXCHG8B (compare and exchange 8 bytes) instruction.
9      APIC  Processor contains an on-chip Advanced Programmable Interrupt Controller (APIC) and it has been enabled and is available for use.
10     Reserved
11     SEP - Fast System Call  Indicates whether the processor supports the Fast System Call instructions, SYSENTER and SYSEXIT.
12     MTRR - Memory Type Range Registers  Processor supports machine-specific memory-type range registers (MTRRs). The MTRRs contain bit fields that indicate the processor's MTRR capabilities, including which memory types the processor supports, the number of variable MTRRs the processor supports, and whether the processor supports fixed MTRRs.
13     PGE - PTE Global Flag  Processor supports the CR4.PGE flag enabling the global bit in both PDEs and PTEs. These bits are used to indicate translation lookaside buffer (TLB) entries that are common to different tasks and need not be flushed when control register CR3 is written.
14     MCA - Machine Check Architecture  Processor supports the MCG_CAP (machine check global capability) MSR. The MCG_CAP register indicates how many banks of error reporting MSRs the processor supports.
15     CMOV - Conditional Move and Compare Instructions  Processor supports the CMOVcc instruction and, if the FPU feature flag (bit 0) is also set, supports the FCMOVcc and FCOMI instructions.
16     FGPAT - Page Attribute Table  Processor supports the Page Attribute Table (PAT).
17     PSE-36 - 36-bit Page Size Extension  Processor supports 4MB pages with 36 bit physical addresses.
18     PN - Processor Number  Processor supports the 96-bit Processor Number feature, and the feature is enabled.
19-22  Reserved
23     MMX Technology  Processor supports the MMX instruction set. These instructions operate in parallel on multiple data elements (8 bytes, 4 words, or 2 doublewords) packed into quadword registers or memory locations.
24     FXSR - Fast FP/MMX Technology/Streaming SIMD Extensions save/restore  Indicates whether the processor supports the FXSAVE and FXRSTOR instructions for fast save and restore of the floating point context. Presence of this bit also indicates that CR4.OSFXSR is available for an operating system to indicate that it uses the fast save/restore instructions.
25     XMM - Streaming SIMD Extensions  Processor supports the Streaming SIMD Extensions instruction set.
26-31  Reserved
3-115
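As an illustration of how software might test these flags, the following Python sketch decodes an EDX value of the kind returned by CPUID with an input value of 1. The bit positions come from the table above; the helper name decode_features and the sample EDX value are hypothetical and chosen only for this example.

```python
# Bit positions from the feature-flag table above (EDX after CPUID, EAX = 1).
FEATURE_BITS = {
    "SEP": 11, "MTRR": 12, "PGE": 13, "MCA": 14, "CMOV": 15,
    "FGPAT": 16, "PSE-36": 17, "PN": 18, "MMX": 23, "FXSR": 24, "XMM": 25,
}

def decode_features(edx):
    """Return the sorted names of the listed feature flags that are set in edx."""
    return sorted(name for name, bit in FEATURE_BITS.items() if (edx >> bit) & 1)

# Hypothetical EDX value with only MMX (bit 23), FXSR (bit 24), and XMM (bit 25) set.
sample_edx = (1 << 23) | (1 << 24) | (1 << 25)
print(decode_features(sample_edx))
```

Reserved bits (19-22 and 26-31) are deliberately left out of the lookup, so the sketch ignores them whatever their value.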

When the input value is 2, the processor returns information about the processor's internal caches and TLBs in the EAX, EBX, ECX, and EDX registers. The encoding of these registers is as follows:
- The least-significant byte in register EAX (register AL) indicates the number of times the CPUID instruction must be executed with an input value of 2 to get a complete description of the processor's caches and TLBs. The Pentium Pro family of processors will return a 1.
- The most-significant bit (bit 31) of each register indicates whether the register contains valid information (cleared to 0) or is reserved (set to 1).
- If a register contains valid information, the information is contained in one-byte descriptors. Table 3-9 shows the encoding of these descriptors.
Table 3-9. Encoding of Cache and TLB Descriptors
Descriptor Value    Cache or TLB Description
00H                 Null descriptor
01H                 Instruction TLB: 4K-Byte Pages, 4-way set associative, 32 entries
02H                 Instruction TLB: 4M-Byte Pages, fully associative, two entries
03H                 Data TLB: 4K-Byte Pages, 4-way set associative, 64 entries
04H                 Data TLB: 4M-Byte Pages, 4-way set associative, eight entries
06H                 Instruction cache: 8K Bytes, 4-way set associative, 32 byte line size
08H                 Instruction cache: 16K Bytes, 4-way set associative, 32 byte line size
0AH                 Data cache: 8K Bytes, 2-way set associative, 32 byte line size
0CH                 Data cache: 16K Bytes, 2-way or 4-way set associative, 32 byte line size
40H                 No L2 Cache
41H                 L2 Unified cache: 128K Bytes, 4-way set associative, 32 byte line size
42H                 L2 Unified cache: 256K Bytes, 4-way set associative, 32 byte line size
43H                 L2 Unified cache: 512K Bytes, 4-way set associative, 32 byte line size
44H                 L2 Unified cache: 1M Byte, 4-way set associative, 32 byte line size
45H                 L2 Unified cache: 2M Byte, 4-way set associative, 32 byte line size

The first member of the Pentium Pro processor family will return the following information about caches and TLBs when the CPUID instruction is executed with an input value of 2:
EAX    03 02 01 01H
EBX    0H
ECX    0H
EDX    06 04 0A 42H
These values are interpreted as follows:
- The least-significant byte (byte 0) of register EAX is set to 01H, indicating that the CPUID instruction needs to be executed only once with an input value of 2 to retrieve complete information about the processor's caches and TLBs.
- The most-significant bit of all four registers (EAX, EBX, ECX, and EDX) is set to 0, indicating that each register contains valid 1-byte descriptors.
- Bytes 1, 2, and 3 of register EAX indicate that the processor contains the following:
  01H: A 32-entry instruction TLB (4-way set associative) for mapping 4-KByte pages.
  02H: A 2-entry instruction TLB (fully associative) for mapping 4-MByte pages.
  03H: A 64-entry data TLB (4-way set associative) for mapping 4-KByte pages.
- The descriptors in registers EBX and ECX are valid, but contain null descriptors.
- Bytes 0, 1, 2, and 3 of register EDX indicate that the processor contains the following:
  42H: A 256-KByte unified cache (the L2 cache), 4-way set associative, with a 32-byte cache line size.
  0AH: An 8-KByte data cache (the L1 data cache), 2-way set associative, with a 32-byte cache line size.
  04H: An 8-entry data TLB (4-way set associative) for mapping 4-MByte pages.
  06H: An 8-KByte instruction cache (the L1 instruction cache), 4-way set associative, with a 32-byte cache line size.
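The interpretation steps above can be sketched in Python. This is only an illustrative model: the function name decode_cpuid2 and the shortened description strings are assumptions, while the descriptor values come from Table 3-9 and the register values from the Pentium Pro example.

```python
# Descriptor encodings from Table 3-9 (descriptions abbreviated).
DESCRIPTORS = {
    0x01: "Instruction TLB: 4K-byte pages, 4-way, 32 entries",
    0x02: "Instruction TLB: 4M-byte pages, fully associative, 2 entries",
    0x03: "Data TLB: 4K-byte pages, 4-way, 64 entries",
    0x04: "Data TLB: 4M-byte pages, 4-way, 8 entries",
    0x06: "Instruction cache: 8K bytes, 4-way, 32-byte line",
    0x08: "Instruction cache: 16K bytes, 4-way, 32-byte line",
    0x0A: "Data cache: 8K bytes, 2-way, 32-byte line",
    0x0C: "Data cache: 16K bytes, 2-way or 4-way, 32-byte line",
    0x40: "No L2 cache",
    0x41: "L2 unified cache: 128K bytes, 4-way, 32-byte line",
    0x42: "L2 unified cache: 256K bytes, 4-way, 32-byte line",
    0x43: "L2 unified cache: 512K bytes, 4-way, 32-byte line",
    0x44: "L2 unified cache: 1M byte, 4-way, 32-byte line",
    0x45: "L2 unified cache: 2M byte, 4-way, 32-byte line",
}

def decode_cpuid2(eax, ebx, ecx, edx):
    """Return (iteration_count, descriptions) for one CPUID(2) result."""
    count = eax & 0xFF                      # AL: number of CPUID(2) calls needed
    found = []
    for index, reg in enumerate((eax, ebx, ecx, edx)):
        if reg & 0x80000000:                # bit 31 set: register is reserved
            continue
        for byte_no in range(4):
            if index == 0 and byte_no == 0:
                continue                    # AL holds the count, not a descriptor
            value = (reg >> (8 * byte_no)) & 0xFF
            if value != 0x00:               # 00H is the null descriptor
                found.append(DESCRIPTORS.get(value, "unknown descriptor %02XH" % value))
    return count, found

# Register values from the Pentium Pro example above.
count, caches = decode_cpuid2(0x03020101, 0x0, 0x0, 0x06040A42)
for line in caches:
    print(line)
```

Unknown descriptor bytes are reported rather than rejected, since later processors may define values that Table 3-9 does not list.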
Intel Architecture Compatibility
The CPUID instruction is not supported in early models of the Intel486 processor or in any Intel Architecture processor earlier than the Intel486 processor. The ID flag in the EFLAGS register can be used to determine if this instruction is supported. If a procedure is able to set or clear this flag, the CPUID instruction is supported by the processor running the procedure.

Operation
CASE (EAX) OF
    EAX = 0:
        EAX ← highest input value understood by CPUID; (* 2 for Pentium Pro processor *)
        EBX ← Vendor identification string;
        EDX ← Vendor identification string;
        ECX ← Vendor identification string;
    BREAK;
    EAX = 1:
        EAX[3:0] ← Stepping ID;
        EAX[7:4] ← Model;
        EAX[11:8] ← Family;
        EAX[13:12] ← Processor type;
        EAX[31:14] ← Reserved;
        EBX ← Reserved;
        ECX ← Reserved;
        EDX ← Feature flags; (* Refer to Figure 3-27 *)
    BREAK;
    EAX = 2:
        EAX ← Cache and TLB information;
        EBX ← Cache and TLB information;
        ECX ← Cache and TLB information;
        EDX ← Cache and TLB information;
    BREAK;
    DEFAULT: (* EAX > highest value recognized by CPUID *)
        EAX ← reserved, undefined;
        EBX ← reserved, undefined;
        ECX ← reserved, undefined;
        EDX ← reserved, undefined;
    BREAK;
ESAC;
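The field layout in the Operation section can be illustrated with a short Python sketch. The helper names and the sample signature value 0633H are hypothetical; the vendor-string registers shown are the values that spell "GenuineIntel" when read in the EBX, EDX, ECX order listed above.

```python
import struct

def decode_signature(eax):
    """Split the EAX value returned for input 1 into its documented fields."""
    return {
        "stepping": eax & 0xF,          # EAX[3:0]
        "model": (eax >> 4) & 0xF,      # EAX[7:4]
        "family": (eax >> 8) & 0xF,     # EAX[11:8]
        "type": (eax >> 12) & 0x3,      # EAX[13:12]
    }

def vendor_string(ebx, edx, ecx):
    """Assemble the 12-character vendor string returned for input 0."""
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")

print(decode_signature(0x0633))         # hypothetical signature value
print(vendor_string(0x756E6547, 0x49656E69, 0x6C65746E))
```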
CVTPI2PS - Packed Signed INT32 to Packed Single-FP Conversion

Opcode      Instruction             Description
0F,2A,/r    CVTPI2PS xmm, mm/m64    Convert two 32-bit signed integers from MM/Mem to two SP FP.

Description
The CVTPI2PS instruction converts signed 32-bit integers to SP FP numbers. When the conversion is inexact, rounding is done according to MXCSR. A #MF fault is signaled if there is a pending x87 fault.

Figure 3-28. Operation of the CVTPI2PS Instruction

Operation
DEST[31-0] = (float) (SRC/m64[31-0]);
DEST[63-32] = (float) (SRC/m64[63-32]);
DEST[95-64] = DEST[95-64];
DEST[127-96] = DEST[127-96];

Intel C/C++ Compiler Intrinsic Equivalent
__m128 _mm_cvt_pi2ps(__m128 a, __m64 b)
__m128 _mm_cvtpi32_ps(__m128 a, __m64 b)
Convert the two 32-bit integer values in packed form in b to two SP FP values; the upper two SP FP values are passed through from a.
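The lane behavior described above can be modeled in plain Python as a rough sketch. It assumes the default round-to-nearest mode in MXCSR; the function names are hypothetical, and the struct round trip is used only to round a value to single precision.

```python
import struct

def to_float32(value):
    """Round a number to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]

def cvtpi2ps(dest, src):
    """dest: four SP FP lanes (xmm); src: two signed 32-bit integers (mm/m64)."""
    return [to_float32(src[0]), to_float32(src[1]), dest[2], dest[3]]

# 16777217 (2**24 + 1) has no exact single-precision form, so the conversion is inexact.
print(cvtpi2ps([1.0, 2.0, 3.0, 4.0], [7, 16777217]))
```

The second lane shows the Precision case: the integer is rounded to a neighboring representable value, while the upper two lanes pass through unchanged.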

Exceptions
None.

Numeric Exceptions
Precision.

Protected Mode Exceptions
#GP(0)           For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)           For an illegal address in the SS segment.
#PF(fault-code)  For a page fault.
#UD              If CR0.EM = 1.
#NM              If TS bit in CR0 is set.
#MF              If there is a pending FPU exception.
#AC              For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).
#XM              For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 1).
#UD              For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 0).
#UD              If CR4.OSFXSR (bit 9) = 0.
#UD              If CPUID.XMM (EDX bit 25) = 0.

Real Address Mode Exceptions
Interrupt 13     If any part of the operand would lie outside of the effective address space from 0 to 0FFFFH.
#UD              If CR0.EM = 1.
#NM              If TS bit in CR0 is set.
#MF              If there is a pending FPU exception.
#AC              For unaligned memory reference.
#XM              For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 1).
#UD              For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 0).
#UD              If CR4.OSFXSR (bit 9) = 0.
#UD              If CPUID.XMM (EDX bit 25) = 0.

Comments
This instruction behaves identically to original MMX instructions, in the presence of x87-FP instructions:
- Transition from x87-FP to MMX technology (TOS=0, FP valid bits set to all valid).
- MMX instructions write ones (1s) to the exponent part of the corresponding x87-FP register.
However, the use of a memory source operand with this instruction will not result in the above transition from x87-FP to MMX technology.
Prioritizing for fault and assist behavior for CVTPI2PS is as follows:
Memory source
1. Invalid opcode (CR0.EM=1)
2. DNA (CR0.TS=1)
3. #SS or #GP, for limit violation
4. #PF, page fault
5. Streaming SIMD Extensions numeric fault (i.e., precision)
Register source
1. Invalid opcode (CR0.EM=1)
2. DNA (CR0.TS=1)
3. #MF, pending x87-FP fault signaled
4. After returning from #MF, x87-FP->MMX technology transition
5. Streaming SIMD Extensions numeric fault (i.e., precision)
CVTPS2PI - Packed Single-FP to Packed INT32 Conversion

Opcode      Instruction             Description
0F,2D,/r    CVTPS2PI mm, xmm/m64    Convert lower two SP FP from XMM/Mem to two 32-bit signed integers in MM using rounding specified by MXCSR.

Description
The CVTPS2PI instruction converts the lower two SP FP numbers in xmm/m64 to signed 32-bit integers in mm. When the conversion is inexact, the value rounded according to the MXCSR is returned. If the converted result(s) is/are larger than the maximum signed 32-bit value, the Integer Indefinite value (0x80000000) will be returned.

Figure 3-29. Operation of the CVTPS2PI Instruction

Operation
DEST[31-0] = (int) (SRC/m64[31-0]);
DEST[63-32] = (int) (SRC/m64[63-32]);

Intel C/C++ Compiler Intrinsic Equivalent
__m64 _mm_cvt_ps2pi(__m128 a)
__m64 _mm_cvtps_pi32(__m128 a)
Convert the two lower SP FP values of a to two 32-bit integers according to the current rounding mode, returning the integers in packed form.

Exceptions
None.

Numeric Exceptions
Invalid, Precision.

Protected Mode Exceptions
#GP(0)           For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)           For an illegal address in the SS segment.
#PF(fault-code)  For a page fault.
#UD              If CR0.EM = 1.
#NM              If TS bit in CR0 is set.
#MF              If there is a pending FPU exception.
#AC              For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).
#XM              For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 1).
#UD              For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 0).
#UD              If CR4.OSFXSR (bit 9) = 0.
#UD              If CPUID.XMM (EDX bit 25) = 0.

Real Address Mode Exceptions
Interrupt 13     If any part of the operand would lie outside of the effective address space from 0 to 0FFFFH.
#UD              If CR0.EM = 1.
#NM              If TS bit in CR0 is set.
#MF              If there is a pending FPU exception.
#XM              For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 1).
#UD              For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 0).
#UD              If CR4.OSFXSR (bit 9) = 0.
#UD              If CPUID.XMM (EDX bit 25) = 0.

Comments
This instruction behaves identically to original MMX instructions, in the presence of x87-FP instructions.
Prioritizing for fault and assist behavior for CVTPS2PI is as follows:
Memory source
1. Invalid opcode (CR0.EM=1)
2. DNA (CR0.TS=1)
3. #MF, pending x87-FP fault signaled
4. After returning from #MF, x87-FP->MMX technology transition
5. #SS or #GP, for limit violation
6. #PF, page fault
7. Streaming SIMD Extensions numeric fault (i.e., invalid, precision)
Register source
1. Invalid opcode (CR0.EM=1)
2. DNA (CR0.TS=1)
3. #MF, pending x87-FP fault signaled
4. After returning from #MF, x87-FP->MMX technology transition
5. Streaming SIMD Extensions numeric fault (i.e., precision)
CVTSI2SS - Scalar Signed INT32 to Single-FP Conversion

Opcode         Instruction            Description
F3,0F,2A,/r    CVTSI2SS xmm, r/m32    Convert one 32-bit signed integer from Integer Reg/Mem to one SP FP.

Description
The CVTSI2SS instruction converts a signed 32-bit integer from memory or from a 32-bit integer register to an SP FP number. When the conversion is inexact, rounding is done according to the MXCSR.

Figure 3-30. Operation of the CVTSI2SS Instruction

Operation
DEST[31-0] = (float) (R/m32);
DEST[63-32] = DEST[63-32];
DEST[95-64] = DEST[95-64];
DEST[127-96] = DEST[127-96];

Intel C/C++ Compiler Intrinsic Equivalent
__m128 _mm_cvt_si2ss(__m128 a, int b)
__m128 _mm_cvtsi32_ss(__m128 a, int b)
Convert the 32-bit integer value b to an SP FP value; the upper three SP FP values are passed through from a.

Exceptions
None.

Numeric Exceptions
Precision.

Protected Mode Exceptions
#GP(0)           For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)           For an illegal address in the SS segment.
#PF(fault-code)  For a page fault.
#UD              If CR0.EM = 1.
#NM              If TS bit in CR0 is set.
#AC              For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).
#XM              For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 1).
#UD              For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 0).
#UD              If CR4.OSFXSR (bit 9) = 0.
#UD              If CPUID.XMM (EDX bit 25) = 0.
CVTSS2SI - Scalar Single-FP to Signed INT32 Conversion

Opcode         Instruction              Description
F3,0F,2D,/r    CVTSS2SI r32, xmm/m32    Convert one SP FP from XMM/Mem to one 32-bit signed integer using rounding mode specified by MXCSR, and move the result to an integer register.

Description
The CVTSS2SI instruction converts an SP FP number to a signed 32-bit integer and returns it in the 32-bit integer register. When the conversion is inexact, the rounded value according to the MXCSR is returned. If the converted result is larger than the maximum signed 32-bit integer, the Integer Indefinite value (0x80000000) will be returned.

Figure 3-31. Operation of the CVTSS2SI Instruction

Operation
r32 = (int) (SRC/m32[31-0]);

Intel C/C++ Compiler Intrinsic Equivalent
int _mm_cvt_ss2si(__m128 a)
int _mm_cvtss_si32(__m128 a)
Convert the lower SP FP value of a to a 32-bit integer according to the current rounding mode.
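A minimal Python sketch of this conversion follows. It assumes MXCSR selects the default round-to-nearest mode, which Python's built-in round() matches by rounding halfway cases to the even integer. Returning the Integer Indefinite value for NaN, infinities, and values below the signed 32-bit range is an assumption of this model, as is the function name.

```python
import math

INTEGER_INDEFINITE = -0x80000000   # bit pattern 0x80000000 read as a signed 32-bit value

def cvtss2si(x):
    """Model CVTSS2SI under round-to-nearest (assumed MXCSR default)."""
    if math.isnan(x) or math.isinf(x):
        return INTEGER_INDEFINITE          # assumption: masked Invalid response
    result = round(x)                      # halfway cases round to the even integer
    if not -0x80000000 <= result <= 0x7FFFFFFF:
        return INTEGER_INDEFINITE          # result does not fit in 32 signed bits
    return result

for value in (2.5, 3.5, -1.5, 3.0e9):
    print(value, cvtss2si(value))
```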

Exceptions
None.

Numeric Exceptions
Invalid, Precision.

Protected Mode Exceptions
#GP(0)           For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)           For an illegal address in the SS segment.
#PF(fault-code)  For a page fault.
#UD              If CR0.EM = 1.
#NM              If TS bit in CR0 is set.
#AC              For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).
#XM              For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 1).
#UD              For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 0).
#UD              If CR4.OSFXSR (bit 9) = 0.
#UD              If CPUID.XMM (EDX bit 25) = 0.
CVTTPS2PI - Packed Single-FP to Packed INT32 Conversion (Truncate)

Opcode      Instruction              Description
0F,2C,/r    CVTTPS2PI mm, xmm/m64    Convert lower two SP FP from XMM/Mem to two 32-bit signed integers in MM using truncate.

Description
The CVTTPS2PI instruction converts the lower two SP FP numbers in xmm/m64 to two 32-bit signed integers in mm. If the conversion is inexact, the truncated result is returned. If the converted result(s) is/are larger than the maximum signed 32-bit value, the Integer Indefinite value (0x80000000) will be returned.

Figure 3-32. Operation of the CVTTPS2PI Instruction

Operation
DEST[31-0] = (int) (SRC/m64[31-0]);
DEST[63-32] = (int) (SRC/m64[63-32]);

Intel C/C++ Compiler Intrinsic Equivalent
__m64 _mm_cvtt_ps2pi(__m128 a)
__m64 _mm_cvttps_pi32(__m128 a)
Convert the two lower SP FP values of a to two 32-bit integers with truncation, returning the integers in packed form.

Exceptions
None.

Numeric Exceptions
Invalid, Precision.

Protected Mode Exceptions
#GP(0)           For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)           For an illegal address in the SS segment.
#PF(fault-code)  For a page fault.
#UD              If CR0.EM = 1.
#NM              If TS bit in CR0 is set.
#MF              If there is a pending FPU exception.
#AC              For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).
#XM              For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 1).
#UD              For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 0).
#UD              If CR4.OSFXSR (bit 9) = 0.
#UD              If CPUID.XMM (EDX bit 25) = 0.

Real Address Mode Exceptions
Interrupt 13     If any part of the operand would lie outside of the effective address space from 0 to 0FFFFH.
#UD              If CR0.EM = 1.
#NM              If TS bit in CR0 is set.
#MF              If there is a pending FPU exception.
#XM              For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 1).
#UD              For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 0).
#UD              If CR4.OSFXSR (bit 9) = 0.
#UD              If CPUID.XMM (EDX bit 25) = 0.

Comments
This instruction behaves identically to original MMX instructions, in the presence of x87-FP instructions.
Prioritizing for fault and assist behavior for CVTTPS2PI is as follows:
Memory source
1. Invalid opcode (CR0.EM=1)
2. DNA (CR0.TS=1)
3. #MF, pending x87-FP fault signaled
4. After returning from #MF, x87-FP->MMX technology transition
5. #SS or #GP, for limit violation
6. #PF, page fault
7. Streaming SIMD Extensions numeric fault (i.e., precision)
Register source
1. Invalid opcode (CR0.EM=1)
2. DNA (CR0.TS=1)
3. #MF, pending x87-FP fault signaled
4. After returning from #MF, x87-FP->MMX technology transition
5. Streaming SIMD Extensions numeric fault (i.e., precision)
CVTTSS2SI - Scalar Single-FP to Signed INT32 Conversion (Truncate)

Opcode         Instruction               Description
F3,0F,2C,/r    CVTTSS2SI r32, xmm/m32    Convert lowest SP FP from XMM/Mem to one 32-bit signed integer using truncate, and move the result to an integer register.

Description
The CVTTSS2SI instruction converts an SP FP number to a signed 32-bit integer and returns it in the 32-bit integer register. If the conversion is inexact, the truncated result is returned. If the converted result is larger than the maximum signed 32-bit value, the Integer Indefinite value (0x80000000) will be returned.

Figure 3-33. Operation of the CVTTSS2SI Instruction
Operation
r32 = (INT) (SRC/m32[31-0]);
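The truncating conversion can be sketched in Python with math.trunc, which discards the fraction and so rounds toward zero regardless of any rounding-mode setting. As before, the function name is hypothetical, and returning the Integer Indefinite value for NaN, infinities, and values below the signed 32-bit range is an assumption of the model.

```python
import math

INTEGER_INDEFINITE = -0x80000000   # bit pattern 0x80000000 read as a signed 32-bit value

def cvttss2si(x):
    """Model CVTTSS2SI: convert with truncation (round toward zero)."""
    if math.isnan(x) or math.isinf(x):
        return INTEGER_INDEFINITE          # assumption: masked Invalid response
    result = math.trunc(x)                 # drop the fractional part
    if not -0x80000000 <= result <= 0x7FFFFFFF:
        return INTEGER_INDEFINITE
    return result

for value in (2.9, -2.9, 2147483648.0):
    print(value, cvttss2si(value))
```

For an input of 2.9 a round-to-nearest conversion would give 3, whereas truncation gives 2; this is the practical difference between the CVT and CVTT forms.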

Intel C/C++ Compiler Intrinsic Equivalent
Version 4.0 and later Intel C/C++ Compiler intrinsic:
int _mm_cvtt_ss2si(__m128 a)
int _mm_cvttss_si32(__m128 a)
Convert the lower SP FP value of a to a 32-bit integer with truncation.

Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_from_int(int i)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_cvtsi32_si64(int i)
Convert the integer object i to a 64-bit __m64 object. The integer value is zero extended to 64 bits.

Pre-4.0 Intel C/C++ Compiler intrinsic:
int _m_to_int(__m64 m)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
int _mm_cvtsi64_si32(__m64 m)
Convert the lower 32 bits of the __m64 object m to an integer.

Exceptions
None.

Numeric Exceptions
Invalid, Precision.

Protected Mode Exceptions
#GP(0)           For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)           For an illegal address in the SS segment.
#PF(fault-code)  For a page fault.
#UD              If CR0.EM = 1.
#NM              If TS bit in CR0 is set.
#AC              For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).
#XM              For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 1).
#UD              For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 0).
#UD              If CR4.OSFXSR (bit 9) = 0.
#UD              If CPUID.XMM (EDX bit 25) = 0.
CWD/CDQ - Convert Word to Doubleword/Convert Doubleword to Quadword

Opcode    Instruction    Description
99        CWD            DX:AX ← sign-extend of AX
99        CDQ            EDX:EAX ← sign-extend of EAX

Description
These instructions double the size of the operand in register AX or EAX (depending on the operand size) by means of sign extension and store the result in registers DX:AX or EDX:EAX, respectively. The CWD instruction copies the sign (bit 15) of the value in the AX register into every bit position in the DX register. For more information, refer to Figure 6-5 in Chapter 6, Instruction Set Summary, of the Intel Architecture Software Developer's Manual, Volume 1. The CDQ instruction copies the sign (bit 31) of the value in the EAX register into every bit position in the EDX register.
The CWD instruction can be used to produce a doubleword dividend from a word before a word division, and the CDQ instruction can be used to produce a quadword dividend from a doubleword before doubleword division.
The CWD and CDQ mnemonics reference the same opcode. The CWD instruction is intended for use when the operand-size attribute is 16 and the CDQ instruction for when the operand-size attribute is 32. Some assemblers may force the operand size to 16 when CWD is used and to 32 when CDQ is used. Others may treat these mnemonics as synonyms (CWD/CDQ) and use the current setting of the operand-size attribute to determine the size of values to be converted, regardless of the mnemonic used.

Operation
IF OperandSize = 16 (* CWD instruction *)
    THEN DX ← SignExtend(AX);
    ELSE (* OperandSize = 32, CDQ instruction *)
        EDX ← SignExtend(EAX);
FI;
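The sign-extension step can be modeled in a few lines of Python. This is a sketch only; the function names cwd and cdq simply mirror the mnemonics and return the high and low halves of the extended value.

```python
def cwd(ax):
    """Return (DX, AX) after CWD: DX is filled with bit 15 of AX."""
    ax &= 0xFFFF
    dx = 0xFFFF if ax & 0x8000 else 0x0000
    return dx, ax

def cdq(eax):
    """Return (EDX, EAX) after CDQ: EDX is filled with bit 31 of EAX."""
    eax &= 0xFFFFFFFF
    edx = 0xFFFFFFFF if eax & 0x80000000 else 0x00000000
    return edx, eax

print(["%04X" % v for v in cwd(0x8000)])
print(["%08X" % v for v in cdq(0x00001234)])
```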
CWDE - Convert Word to Doubleword

Refer to entry for CBW/CWDE - Convert Byte to Word/Convert Word to Doubleword.
DAA - Decimal Adjust AL after Addition

Opcode    Instruction    Description
27        DAA            Decimal adjust AL after addition

Description
This instruction adjusts the sum of two packed BCD values to create a packed BCD result. The AL register is the implied source and destination operand. The DAA instruction is only useful when it follows an ADD instruction that adds (binary addition) two 2-digit, packed BCD values and stores a byte result in the AL register. The DAA instruction then adjusts the contents of the AL register to contain the correct 2-digit, packed BCD result. If a decimal carry is detected, the CF and AF flags are set accordingly.

Operation
IF (((AL AND 0FH) > 9) or AF = 1)
    THEN
        AL ← AL + 6;
        CF ← CF OR CarryFromLastAddition; (* CF OR carry from AL ← AL + 6 *)
        AF ← 1;
    ELSE
        AF ← 0;
FI;
IF (((AL AND F0H) > 90H) or CF = 1)
    THEN
        AL ← AL + 60H;
        CF ← 1;
    ELSE
        CF ← 0;
FI;
Example
ADD AL, BL    Before: AL=79H BL=35H EFLAGS(OSZAPC)=XXXXXX
              After:  AL=AEH BL=35H EFLAGS(OSZAPC)=110000
DAA           Before: AL=AEH BL=35H EFLAGS(OSZAPC)=110000
              After:  AL=14H BL=35H EFLAGS(OSZAPC)=X00111
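The Operation pseudocode can be transcribed into Python to trace individual cases. This sketch follows the pseudocode as printed in this section; the function name daa and the sample operands (15H + 27H) are chosen only for illustration.

```python
def daa(al, cf, af):
    """Apply the DAA adjustment to AL; returns (AL, CF, AF)."""
    if (al & 0x0F) > 9 or af:
        total = al + 6
        cf = cf | (1 if total > 0xFF else 0)   # CF OR carry out of AL + 6
        al = total & 0xFF
        af = 1
    else:
        af = 0
    if (al & 0xF0) > 0x90 or cf:
        al = (al + 0x60) & 0xFF
        cf = 1
    else:
        cf = 0
    return al, cf, af

# 15H + 27H: binary ADD leaves AL = 3CH with CF = 0 and AF = 0.
al, cf, af = daa(0x3C, 0, 0)
print("AL=%02XH CF=%d AF=%d" % (al, cf, af))
```

The adjusted value 42H is the packed BCD form of 15 + 27 = 42.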

Flags Affected
The CF and AF flags are set if the adjustment of the value results in a decimal carry in either digit of the result (refer to the Operation section above). The SF, ZF, and PF flags are set according to the result. The OF flag is undefined.

Exceptions (All Operating Modes)
None.
DAS - Decimal Adjust AL after Subtraction

Opcode    Instruction    Description
2F        DAS            Decimal adjust AL after subtraction

Description
This instruction adjusts the result of the subtraction of two packed BCD values to create a packed BCD result. The AL register is the implied source and destination operand. The DAS instruction is only useful when it follows a SUB instruction that subtracts (binary subtraction) one 2-digit, packed BCD value from another and stores a byte result in the AL register. The DAS instruction then adjusts the contents of the AL register to contain the correct 2-digit, packed BCD result. If a decimal borrow is detected, the CF and AF flags are set accordingly.

Operation
IF (AL AND 0FH) > 9 OR AF = 1
    THEN
        AL ← AL - 6;
        CF ← CF OR BorrowFromLastSubtraction; (* CF OR borrow from AL ← AL - 6 *)
        AF ← 1;
    ELSE
        AF ← 0;
FI;
IF ((AL > 9FH) or CF = 1)
    THEN
        AL ← AL - 60H;
        CF ← 1;
    ELSE
        CF ← 0;
FI;
Example
SUB AL, BL    Before: AL=35H BL=47H EFLAGS(OSZAPC)=XXXXXX
              After:  AL=EEH BL=47H EFLAGS(OSZAPC)=010111
DAS           Before: AL=EEH BL=47H EFLAGS(OSZAPC)=010111
              After:  AL=88H BL=47H EFLAGS(OSZAPC)=X10111
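As with DAA, the Operation pseudocode can be transcribed into Python. The function name das is hypothetical; the first call reproduces the adjustment step of the example above (AL=EEH with CF=1 and AF=1 after the SUB).

```python
def das(al, cf, af):
    """Apply the DAS adjustment to AL; returns (AL, CF, AF)."""
    if (al & 0x0F) > 9 or af:
        diff = al - 6
        cf = cf | (1 if diff < 0 else 0)   # CF OR borrow out of AL - 6
        al = diff & 0xFF
        af = 1
    else:
        af = 0
    if al > 0x9F or cf:
        al = (al - 0x60) & 0xFF
        cf = 1
    else:
        cf = 0
    return al, cf, af

al, cf, af = das(0xEE, 1, 1)
print("AL=%02XH CF=%d AF=%d" % (al, cf, af))
```

The result 88H with CF set is the packed BCD representation of 35 - 47 with a decimal borrow (135 - 47 = 88).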
Flags Affected
The CF and AF flags are set if the adjustment of the value results in a decimal borrow in either digit of the result (refer to the Operation section above). The SF, ZF, and PF flags are set according to the result. The OF flag is undefined.

Exceptions (All Operating Modes)
None.
DEC - Decrement by 1

Opcode    Instruction    Description
FE /1     DEC r/m8       Decrement r/m8 by 1
FF /1     DEC r/m16      Decrement r/m16 by 1
FF /1     DEC r/m32      Decrement r/m32 by 1
48+rw     DEC r16        Decrement r16 by 1
48+rd     DEC r32        Decrement r32 by 1

Description
This instruction subtracts one from the destination operand, while preserving the state of the CF flag. The destination operand can be a register or a memory location. This instruction allows a loop counter to be updated without disturbing the CF flag. (To perform a decrement operation that updates the CF flag, use a SUB instruction with an immediate operand of 1.)

Operation
DEST ← DEST - 1;

Flags Affected
The CF flag is not affected. The OF, SF, ZF, AF, and PF flags are set according to the result.

Protected Mode Exceptions
#GP(0)           If the destination operand is located in a nonwritable segment.
                 If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                 If the DS, ES, FS, or GS register contains a null segment selector.
#SS(0)           If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
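The flag behavior can be illustrated with a small Python model. The function name dec and the dictionary layout are assumptions made for this sketch; it passes CF through untouched and derives the remaining flags from the operand and result in the usual way for a subtraction of 1.

```python
def dec(value, bits, cf):
    """Model DEC on a bits-wide operand; returns (result, flags)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    value &= mask
    result = (value - 1) & mask
    flags = {
        "CF": cf,                                   # preserved, never modified by DEC
        "ZF": int(result == 0),
        "SF": int(bool(result & sign)),
        "OF": int(value == sign),                   # most-negative value wraps to most-positive
        "AF": int((value & 0xF) == 0),              # borrow out of the low nibble
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),  # even parity of low byte
    }
    return result, flags

print(dec(0x00, 8, 1))
print(dec(0x80, 8, 0))
```

Decrementing 00H wraps to FFH without touching CF, which is why DEC suits loop counters in multi-precision arithmetic where CF carries between iterations.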
DIV - Unsigned Divide

Opcode    Instruction    Description
F6 /6     DIV r/m8       Unsigned divide AX by r/m8; AL ← Quotient, AH ← Remainder
F7 /6     DIV r/m16      Unsigned divide DX:AX by r/m16; AX ← Quotient, DX ← Remainder
F7 /6     DIV r/m32      Unsigned divide EDX:EAX by r/m32 doubleword; EAX ← Quotient, EDX ← Remainder

Description
This instruction divides (unsigned) the value in the AX register, DX:AX register pair, or EDX:EAX register pair (dividend) by the source operand (divisor) and stores the result in the AX (AH:AL), DX:AX, or EDX:EAX registers. The source operand can be a general-purpose register or a memory location. The action of this instruction depends on the operand size, as shown in the following table:

Operand Size           Dividend    Divisor    Quotient    Remainder    Maximum Quotient
Word/byte              AX          r/m8       AL          AH           255
Doubleword/word        DX:AX       r/m16      AX          DX           65,535
Quadword/doubleword    EDX:EAX     r/m32      EAX         EDX          2^32 - 1

Non-integral results are truncated (chopped) towards 0. The remainder is always less than the divisor in magnitude. Overflow is indicated with the #DE (divide error) exception rather than with the CF flag.

Operation
IF SRC = 0
    THEN #DE; (* divide error *)
FI;
IF OperandSize = 8 (* word/byte operation *)
    THEN
        temp ← AX / SRC;
        IF temp > FFH
            THEN #DE; (* divide error *)
            ELSE
                AL ← temp;
                AH ← AX MOD SRC;
        FI;
    ELSE
        IF OperandSize = 16 (* doubleword/word operation *)
            THEN
                temp ← DX:AX / SRC;
                IF temp > FFFFH
                    THEN #DE; (* divide error *)
                    ELSE
                        AX ← temp;
                        DX ← DX:AX MOD SRC;
                FI;
            ELSE (* quadword/doubleword operation *)
                temp ← EDX:EAX / SRC;
                IF temp > FFFFFFFFH
                    THEN #DE; (* divide error *)
                    ELSE
                        EAX ← temp;
                        EDX ← EDX:EAX MOD SRC;
                FI;
        FI;
FI;

Flags Affected
The CF, OF, SF, ZF, AF, and PF flags are undefined.
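The Operation logic, including both #DE conditions, can be sketched in Python. The DivideError class and the div function are hypothetical names; the dividend is passed as a single integer standing for AX, DX:AX, or EDX:EAX.

```python
class DivideError(Exception):
    """Stands in for the #DE (divide error) exception."""

def div(dividend, divisor, operand_size):
    """Model DIV for an operand size of 8, 16, or 32; returns (quotient, remainder)."""
    if divisor == 0:
        raise DivideError("divisor is 0")
    quotient, remainder = divmod(dividend, divisor)   # unsigned, truncating division
    if quotient > (1 << operand_size) - 1:
        raise DivideError("quotient too large for the destination register")
    return quotient, remainder

# AX = 0123H divided by an 8-bit divisor of 10H.
print(div(0x0123, 0x10, 8))
```

A dividend such as 1000H with a divisor of 1 raises DivideError for an 8-bit operand size, because the quotient would not fit in AL.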

Protected Mode Exceptions
#DE              If the source operand (divisor) is 0.
                 If the quotient is too large for the designated register.
#GP(0)           If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                 If the DS, ES, FS, or GS register contains a null segment selector.
#SS(0)           If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real-Address Mode Exceptions
#DE              If the source operand (divisor) is 0.
                 If the quotient is too large for the designated register.
#GP              If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                 If the DS, ES, FS, or GS register contains a null segment selector.
#SS(0)           If a memory operand effective address is outside the SS segment limit.

Virtual-8086 Mode Exceptions
#DE              If the source operand (divisor) is 0.
                 If the quotient is too large for the designated register.
#GP(0)           If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS              If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If alignment checking is enabled and an unaligned memory reference is made.
DIVPS - Packed Single-FP Divide

Opcode      Instruction               Description
0F,5E,/r    DIVPS xmm1, xmm2/m128     Divide packed SP FP numbers in XMM1 by XMM2/Mem

Description
The DIVPS instruction divides the packed SP FP numbers in the first operand by the corresponding packed SP FP numbers in the second operand.

Figure 3-34. Operation of the DIVPS Instruction

Operation
DEST[31-0] = DEST[31-0] / (SRC/m128[31-0]);
DEST[63-32] = DEST[63-32] / (SRC/m128[63-32]);
DEST[95-64] = DEST[95-64] / (SRC/m128[95-64]);
DEST[127-96] = DEST[127-96] / (SRC/m128[127-96]);

Intel C/C++ Compiler Intrinsic Equivalent
__m128 _mm_div_ps(__m128 a, __m128 b)
Divides the four SP FP values of a and b.
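The lane-wise division can be modeled in Python as a rough sketch. It assumes the inputs are already single-precision values and that round-to-nearest is in effect; the helper names are hypothetical. Python raises ZeroDivisionError for a zero divisor, so the Divide-by-Zero numeric exception is not modeled here.

```python
import struct

def to_float32(value):
    """Round a number to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

def divps(dest, src):
    """Divide four SP FP lanes of dest by the matching lanes of src."""
    return [to_float32(a / b) for a, b in zip(dest, src)]

# Sample lane values chosen for illustration only.
print(divps([10.0, 9.0, 1.0, 7.0], [4.0, 3.0, 3.0, 2.0]))
```

The third lane (1.0 / 3.0) is inexact in single precision, which corresponds to the Precision numeric exception listed for this instruction.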

Exceptions
General protection exception if not aligned on 16-byte boundary, regardless of segment.

Numeric Exceptions
Overflow, Underflow, Invalid, Divide-by-Zero, Precision, Denormal.

Protected Mode Exceptions
#GP              If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#GP(0)           For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)           For an illegal address in the SS segment.
#PF(fault-code)  For a page fault.
#UD              If CR0.EM = 1.
#NM              If TS bit in CR0 is set.
#XM              For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 1).
#UD              For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 0).
#UD              If CR4.OSFXSR (bit 9) = 0.
#UD              If CPUID.XMM (EDX bit 25) = 0.

Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#PF(fault-code)  If a page fault occurs.
DIVSS - Scalar Single-FP Divide

Opcode         Instruction              Description
F3,0F,5E,/r    DIVSS xmm1, xmm2/m32     Divide lower SP FP numbers in XMM1 by XMM2/Mem

Description
The DIVSS instruction divides the lowest SP FP numbers of both operands; the upper three fields are passed through from xmm1.

Figure 3-35. Operation of the DIVSS Instruction

Operation
DEST[31-0] = DEST[31-0] / (SRC/m32[31-0]);
DEST[63-32] = DEST[63-32];
DEST[95-64] = DEST[95-64];
DEST[127-96] = DEST[127-96];

Intel C/C++ Compiler Intrinsic Equivalent
__m128 _mm_div_ss(__m128 a, __m128 b)
Divides the lower SP FP values of a and b; the upper three SP FP values are passed through from a.

Exceptions
None.

Numeric Exceptions
Overflow, Underflow, Invalid, Divide-by-Zero, Precision, Denormal.
EMMS - Empty MMX State

Opcode    Instruction    Description
0F 77     EMMS           Set the FP tag word to empty.

Description
This instruction sets the values of all the tags in the FPU tag word to empty (all ones). This operation marks the MMX technology registers as available, so they can subsequently be used by floating-point instructions. Refer to Figure 7-11 in Chapter 7, Floating-Point Unit, of the Intel Architecture Software Developer's Manual, Volume 1, for the format of the FPU tag word. All other MMX instructions (other than the EMMS instruction) set all the tags in the FPU tag word to valid (all zeroes).
The EMMS instruction must be used to clear the MMX technology state at the end of all MMX technology routines and before calling other procedures or subroutines that may execute floating-point instructions. If a floating-point instruction loads one of the registers in the FPU register stack before the FPU tag word has been reset by the EMMS instruction, a floating-point stack overflow can occur that will result in a floating-point exception or incorrect result.

Operation
FPUTagWord ← FFFFH

Intel C/C++ Compiler Intrinsic Equivalent
Pre-4.0 Intel C/C++ Compiler intrinsic:
void _m_empty()
Version 4.0 and later Intel C/C++ Compiler intrinsic:
void _mm_empty()
Clears the MMX technology state.

Flags Affected
None.

Protected Mode Exceptions
#UD    If EM in CR0 is set.
#NM    If TS in CR0 is set.
#MF    If there is a pending FPU exception.

Real-Address Mode Exceptions
#UD    If EM in CR0 is set.
#NM    If TS in CR0 is set.
#MF    If there is a pending FPU exception.

Virtual-8086 Mode Exceptions
#UD    If EM in CR0 is set.
#NM    If TS in CR0 is set.
#MF    If there is a pending FPU exception.
ENTERMake Stack Frame for Procedure Parameters

Opcode C8 iw 00 C8 iw 01 C8 iw ib Instruction ENTER imm16,0 ENTER imm16,1 ENTER imm16,imm8 Description Create a stack frame for a procedure Create a nested stack frame for a procedure Create a nested stack frame for a procedure
Description
This instruction creates a stack frame for a procedure. The first operand (size operand) specifies the size of the stack frame (that is, the number of bytes of dynamic storage allocated on the stack for the procedure). The second operand (nesting level operand) gives the lexical nesting level (0 to 31) of the procedure. The nesting level determines the number of stack frame pointers that are copied into the display area of the new stack frame from the preceding frame. Both of these operands are immediate values.
The stack-size attribute determines whether the BP (16 bits) or EBP (32 bits) register specifies the current frame pointer and whether SP (16 bits) or ESP (32 bits) specifies the stack pointer.
The ENTER and companion LEAVE instructions are provided to support block-structured languages. The ENTER instruction (when used) is typically the first instruction in a procedure and is used to set up a new stack frame for a procedure. The LEAVE instruction is then used at the end of the procedure (just before the RET instruction) to release the stack frame.
If the nesting level is 0, the processor pushes the frame pointer from the EBP register onto the stack, copies the current stack pointer from the ESP register into the EBP register, and loads the ESP register with the current stack-pointer value minus the value in the size operand. For nesting levels of one or greater, the processor pushes additional frame pointers on the stack before adjusting the stack pointer. These additional frame pointers provide the called procedure with access points to other nested frames on the stack. Refer to Section 4.5., Procedure Calls for Block-Structured Languages in Chapter 4, Procedure Calls, Interrupts, and Exceptions of the Intel Architecture Software Developer's Manual, Volume 1, for more information about the actions of the ENTER instruction.
Operation
NestingLevel ← NestingLevel MOD 32
IF StackSize = 32
    THEN
        Push(EBP);
        FrameTemp ← ESP;
    ELSE (* StackSize = 16 *)
        Push(BP);
        FrameTemp ← SP;
FI;
IF NestingLevel = 0
    THEN GOTO CONTINUE;
FI;
IF (NestingLevel > 0)
    FOR i ← 1 TO (NestingLevel - 1)
        DO
            IF OperandSize = 32
                THEN
                    IF StackSize = 32
                        THEN
                            EBP ← EBP - 4;
                            Push([EBP]); (* doubleword push *)
                        ELSE (* StackSize = 16 *)
                            BP ← BP - 4;
                            Push([BP]); (* doubleword push *)
                    FI;
                ELSE (* OperandSize = 16 *)
                    IF StackSize = 32
                        THEN
                            EBP ← EBP - 2;
                            Push([EBP]); (* word push *)
                        ELSE (* StackSize = 16 *)
                            BP ← BP - 2;
                            Push([BP]); (* word push *)
                    FI;
            FI;
        OD;
    IF OperandSize = 32
        THEN
            Push(FrameTemp); (* doubleword push *)
        ELSE (* OperandSize = 16 *)
            Push(FrameTemp); (* word push *)
    FI;
    GOTO CONTINUE;
FI;
CONTINUE:
IF StackSize = 32
    THEN
        EBP ← FrameTemp
        ESP ← EBP - Size;
    ELSE (* StackSize = 16 *)
        BP ← FrameTemp
        SP ← BP - Size;
FI;
END;
Flags Affected
None.
Protected Mode Exceptions
#SS(0)             If the new value of the SP or ESP register is outside the stack segment limit.
#PF(fault-code)    If a page fault occurs.
Real-Address Mode Exceptions
#SS(0)             If the new value of the SP or ESP register is outside the stack segment limit.
Virtual-8086 Mode Exceptions
#SS(0)             If the new value of the SP or ESP register is outside the stack segment limit.
#PF(fault-code)    If a page fault occurs.
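The frame-building steps of ENTER can be traced with a small Python model. This is only a sketch under stated assumptions: it covers the 32-bit stack-size and operand-size case, memory is a hypothetical dictionary keyed by address, and the names push32 and enter are illustrative. Following the prose description ("the current stack-pointer value minus the value in the size operand"), the final step subtracts the size from ESP after any display entries have been pushed; for nesting level 0 this is the same value as EBP - Size.

```python
def push32(mem, regs, value):
    """Push a doubleword: decrement ESP by 4, then store the value."""
    regs["ESP"] -= 4
    mem[regs["ESP"]] = value


def enter(mem, regs, size, nesting_level):
    """Model of ENTER size, nesting_level (32-bit stack and operand size only)."""
    nesting_level %= 32
    push32(mem, regs, regs["EBP"])            # save the caller's frame pointer
    frame_temp = regs["ESP"]
    if nesting_level > 0:
        for _ in range(1, nesting_level):     # copy NestingLevel - 1 display entries
            regs["EBP"] -= 4
            push32(mem, regs, mem[regs["EBP"]])
        push32(mem, regs, frame_temp)         # display entry for the new frame
    regs["EBP"] = frame_temp
    regs["ESP"] -= size                       # reserve dynamic storage


# ENTER 16,0 with hypothetical starting register values
mem = {}
regs = {"ESP": 0x1000, "EBP": 0x2000}
enter(mem, regs, 16, 0)
print(hex(regs["EBP"]), hex(regs["ESP"]))
```

With nesting level 0 the model reduces to the familiar PUSH EBP / MOV EBP, ESP / SUB ESP, size sequence.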
F2XM1 - Compute 2^x - 1

Opcode    Instruction    Description
D9 F0     F2XM1          Replace ST(0) with (2^ST(0) - 1)
Description
This instruction calculates the exponential value of 2 to the power of the source operand minus 1. The source operand is located in register ST(0) and the result is also stored in ST(0). The value of the source operand must lie in the range -1.0 to +1.0. If the source value is outside this range, the result is undefined.
The following table shows the results obtained when computing the exponential value of various classes of numbers, assuming that neither overflow nor underflow occurs.

ST(0) SRC       ST(0) DEST
-1.0 to -0      -0.5 to -0
-0              -0
+0              +0
+0 to +1.0      +0 to 1.0
Values other than 2 can be exponentiated using the following formula:

x^y = 2^(y * log2(x))

Operation
ST(0) ← (2^ST(0) - 1);
FPU Flags Affected
C1            Set to 0 if stack underflow occurred. Indicates rounding direction if the inexact result exception (#P) is generated: 0 = not roundup; 1 = roundup.
C0, C2, C3    Undefined.
Floating-Point Exceptions
#IS    Stack underflow occurred.
#IA    Source operand is an sNaN value or unsupported format.
#D     Result is a denormal value.
#U     Result is too small for destination format.
#P     Value cannot be represented exactly in destination format.
Protected Mode Exceptions
#NM    EM or TS in CR0 is set.
Real-Address Mode Exceptions
#NM    EM or TS in CR0 is set.
Virtual-8086 Mode Exceptions
#NM    EM or TS in CR0 is set.
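The exponentiation formula x^y = 2^(y * log2(x)) can be checked numerically with a short Python sketch. The functions f2xm1 and power are hypothetical helpers, not part of any Intel tool; the model uses Python doubles rather than the FPU's extended-real format, and it splits the exponent into an integer part (applied by scaling) and a fraction that stays inside the -1.0 to +1.0 range F2XM1 accepts.

```python
import math


def f2xm1(x):
    """Model of F2XM1: 2**x - 1, defined only for -1.0 <= x <= +1.0."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("result is undefined outside the range -1.0 to +1.0")
    return math.expm1(x * math.log(2.0))


def power(x, y):
    """x**y for x > 0, computed as 2**(y * log2(x))."""
    exponent = y * math.log2(x)
    whole = round(exponent)        # integer part, applied by scaling
    fraction = exponent - whole    # lies in [-0.5, +0.5]
    return math.ldexp(f2xm1(fraction) + 1.0, whole)


print(power(3.0, 2.0))   # close to 9.0
```

Keeping the fraction small is a design choice for the sketch: it guarantees the argument passed to f2xm1 never leaves the supported range.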
FABS - Absolute Value

Opcode    Instruction    Description
D9 E1     FABS           Replace ST with its absolute value.

Description
This instruction clears the sign bit of ST(0) to create the absolute value of the operand. The following table shows the results obtained when creating the absolute value of various classes of numbers.

ST(0) SRC    ST(0) DEST
-∞           +∞
-F           +F
-0           +0
+0           +0
+F           +F
+∞           +∞
NaN          NaN
NOTE: F Means finite-real number.

Operation
ST(0) ← |ST(0)|
FPU Flags Affected
C1            Set to 0 if stack underflow occurred; otherwise, cleared to 0.
C0, C2, C3    Undefined.
Floating-Point Exceptions
#IS    Stack underflow occurred.
FADD/FADDP/FIADD - Add

Opcode     Instruction           Description
D8 /0      FADD m32real          Add m32real to ST(0) and store result in ST(0)
DC /0      FADD m64real          Add m64real to ST(0) and store result in ST(0)
D8 C0+i    FADD ST(0), ST(i)     Add ST(0) to ST(i) and store result in ST(0)
DC C0+i    FADD ST(i), ST(0)     Add ST(i) to ST(0) and store result in ST(i)
DE C0+i    FADDP ST(i), ST(0)    Add ST(0) to ST(i), store result in ST(i), and pop the register stack
DE C1      FADDP                 Add ST(0) to ST(1), store result in ST(1), and pop the register stack
DA /0      FIADD m32int          Add m32int to ST(0) and store result in ST(0)
DE /0      FIADD m16int          Add m16int to ST(0) and store result in ST(0)
Description
This instruction adds the destination and source operands and stores the sum in the destination location. The destination operand is always an FPU register; the source operand can be a register or a memory location. Source operands in memory can be in single-real, double-real, word-integer, or short-integer formats.
The no-operand version of the instruction adds the contents of the ST(0) register to the ST(1) register. The one-operand version adds the contents of a memory location (either a real or an integer value) to the contents of the ST(0) register. The two-operand version adds the contents of the ST(0) register to the ST(i) register or vice versa. The value in ST(0) can be doubled by coding:
FADD ST(0), ST(0);
The FADDP instructions perform the additional operation of popping the FPU register stack after storing the result. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1. (The no-operand version of the floating-point add instructions always results in the register stack being popped. In some assemblers, the mnemonic for this instruction is FADD rather than FADDP.)
The FIADD instructions convert an integer source operand to extended-real format before performing the addition.
The table below shows the results obtained when adding various classes of numbers, assuming that neither overflow nor underflow occurs.
When the sum of two operands with opposite signs is 0, the result is +0, except for the round toward -∞ mode, in which case the result is -0. When the source operand is an integer 0, it is treated as a +0.
When both operands are infinities of the same sign, the result is ∞ of the expected sign. If both operands are infinities of opposite signs, an invalid operation exception is generated.
                                        DEST
SRC           -∞     -F          -0     +0     +F          +∞     NaN
-∞            -∞     -∞          -∞     -∞     -∞          *      NaN
-F or -I      -∞     -F          SRC    SRC    ±F or ±0    +∞     NaN
-0            -∞     DEST        -0     ±0     DEST        +∞     NaN
+0            -∞     DEST        ±0     +0     DEST        +∞     NaN
+F or +I      -∞     ±F or ±0    SRC    SRC    +F          +∞     NaN
+∞            *      +∞          +∞     +∞     +∞          +∞     NaN
NaN           NaN    NaN         NaN    NaN    NaN         NaN    NaN
NOTES:
F Means finite-real number.
I Means integer.
* Indicates floating-point invalid-arithmetic-operand (#IA) exception.
Operation
IF instruction is FIADD
    THEN
        DEST ← DEST + ConvertExtendedReal(SRC);
    ELSE (* source operand is real number *)
        DEST ← DEST + SRC;
FI;
IF instruction = FADDP
    THEN
        PopRegisterStack;
FI;
Floating-Point Exceptions
#IS    Stack underflow occurred.
#IA    Operand is an sNaN value or unsupported format.
       Operands are infinities of unlike sign.
#D     Source operand is a denormal value.
#U     Result is too small for destination format.
#O     Result is too large for destination format.
#P     Value cannot be represented exactly in destination format.
Protected Mode Exceptions
#GP(0)             If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                   If the DS, ES, FS, or GS register contains a null segment selector.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#NM                EM or TS in CR0 is set.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
Real-Address Mode Exceptions
#GP    If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS    If a memory operand effective address is outside the SS segment limit.
#NM    EM or TS in CR0 is set.
Virtual-8086 Mode Exceptions
#GP(0)             If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#NM                EM or TS in CR0 is set.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made.
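The add-and-pop behavior described for FADDP can be illustrated with a toy register-stack model in Python. The class FpuStack and its method names are hypothetical, and the model stores Python doubles rather than extended-real values; it is a sketch of the TOP and tag bookkeeping only, with no exception handling.

```python
import math


class FpuStack:
    """Toy model of the eight-register FPU stack."""

    def __init__(self):
        self.regs = [0.0] * 8
        self.empty = [True] * 8
        self.top = 0

    def _physical(self, i):
        return (self.top + i) % 8

    def st(self, i):
        return self.regs[self._physical(i)]

    def push(self, value):
        self.top = (self.top - 1) % 8          # a load decrements TOP
        self.regs[self.top] = value
        self.empty[self.top] = False

    def pop(self):
        self.empty[self.top] = True            # mark ST(0) empty
        self.top = (self.top + 1) % 8          # then increment TOP

    def faddp(self, i=1):
        """FADDP ST(i), ST(0): ST(i) <- ST(i) + ST(0), then pop."""
        target = self._physical(i)
        self.regs[target] = self.regs[target] + self.st(0)
        self.pop()


fpu = FpuStack()
fpu.push(1.5)
fpu.push(2.25)
fpu.faddp()
print(fpu.st(0))                               # the sum is now at the top of the stack

# Infinities of opposite sign: Python yields NaN, similar to a masked #IA response.
opposite = math.inf + (-math.inf)
```

After the pop, the old ST(1) becomes ST(0), which is why the sum ends up at the top of the stack.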
FBLD - Load Binary Coded Decimal

Opcode    Instruction    Description
DF /4     FBLD m80dec    Convert BCD value to real and push onto the FPU stack.
Description
This instruction converts the BCD source operand into extended-real format and pushes the value onto the FPU stack. The source operand is loaded without rounding errors. The sign of the source operand is preserved, including that of -0.
The packed BCD digits are assumed to be in the range 0 through 9; the instruction does not check for invalid digits (AH through FH). Attempting to load an invalid encoding produces an undefined result.
Operation
TOP ← TOP - 1;
ST(0) ← ExtendedReal(SRC);
FPU Flags Affected
C1            Set to 1 if stack overflow occurred; otherwise, cleared to 0.
C0, C2, C3    Undefined.
Floating-Point Exceptions
#IS    Stack overflow occurred.
FBSTP - Store BCD Integer and Pop

Opcode    Instruction     Description
DF /6     FBSTP m80bcd    Store ST(0) in m80bcd and pop ST(0).
Description
This instruction converts the value in the ST(0) register to an 18-digit packed BCD integer, stores the result in the destination operand, and pops the register stack. If the source value is a non-integral value, it is rounded to an integer value, according to the rounding mode specified by the RC field of the FPU control word. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1.
The destination operand specifies the address where the first byte of the destination value is to be stored. The BCD value (including its sign bit) requires 10 bytes of space in memory.
The following table shows the results obtained when storing various classes of numbers in packed BCD format.

ST(0)             DEST
-∞                *
-F < -1           -D
-1 < -F < -0      **
-0                -0
+0                +0
+0 < +F < +1      **
+F > +1           +D
+∞                *
NaN               *
NOTES:
F Means finite-real number.
D Means packed-BCD number.
* Indicates floating-point invalid operation (#IA) exception.
** ±0 or ±1, depending on the rounding mode.

If the source value is too large for the destination format and the invalid operation exception is not masked, an invalid operation exception is generated and no value is stored in the destination operand. If the invalid operation exception is masked, the packed BCD indefinite value is stored in memory.
If the source value is a quiet NaN, an invalid operation exception is generated. Quiet NaNs do not normally cause this exception to be generated.
Operation
DEST ← BCD(ST(0));
PopRegisterStack;
FPU Flags Affected
C1            Set to 0 if stack underflow occurred. Indicates rounding direction if the inexact exception (#P) is generated: 0 = not roundup; 1 = roundup.
C0, C2, C3    Undefined.
Floating-Point Exceptions
#IS    Stack underflow occurred.
#IA    Source operand is empty; contains a NaN, ±∞, or unsupported format; or contains value that exceeds 18 BCD digits in length.
#P     Value cannot be represented exactly in destination format.
Protected Mode Exceptions
#GP(0)             If a segment register is being loaded with a segment selector that points to a nonwritable segment.
                   If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                   If the DS, ES, FS, or GS register contains a null segment selector.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#NM                EM or TS in CR0 is set.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
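The 10-byte packed BCD operand used by FBSTP (and read by FBLD) can be sketched in Python. The layout assumed here is the usual Intel packed-BCD format: bytes 0 through 8 hold the 18 digits, two per byte with the less significant digit in the low nibble, and bit 7 of byte 9 is the sign. The function names are hypothetical, the sketch handles integers only (no rounding of fractional values), and a Python int cannot express -0.

```python
def to_packed_bcd(value):
    """Encode an integer as a 10-byte packed BCD value, least significant byte first."""
    magnitude = abs(value)
    if magnitude >= 10**18:
        raise OverflowError("value exceeds 18 BCD digits")
    out = bytearray(10)
    for index in range(9):
        magnitude, low = divmod(magnitude, 10)
        magnitude, high = divmod(magnitude, 10)
        out[index] = (high << 4) | low
    out[9] = 0x80 if value < 0 else 0x00       # sign bit in the last byte
    return bytes(out)


def from_packed_bcd(data):
    """Decode a 10-byte packed BCD value; digits are assumed valid (0 through 9)."""
    digits = 0
    for byte in reversed(data[:9]):
        digits = digits * 100 + (byte >> 4) * 10 + (byte & 0x0F)
    return -digits if data[9] & 0x80 else digits


encoded = to_packed_bcd(-1234)
print(encoded.hex())
print(from_packed_bcd(encoded))
```

Like the instruction itself, from_packed_bcd does not check for invalid digits; feeding it nibbles above 9 gives a meaningless result.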
FCHS - Change Sign

Opcode    Instruction    Description
D9 E0     FCHS           Complements sign of ST(0)

Description
This instruction complements the sign bit of ST(0). This operation changes a positive value into a negative value of equal magnitude or vice versa. The following table shows the results obtained when changing the sign of various classes of numbers.

ST(0) SRC    ST(0) DEST
-∞           +∞
-F           +F
-0           +0
+0           -0
+F           -F
+∞           -∞
NaN          NaN
NOTE: F Means finite-real number.

Operation
SignBit(ST(0)) ← NOT (SignBit(ST(0)))
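Both FCHS and FABS act only on the sign bit, which can be demonstrated in Python by manipulating the bit pattern of a value directly. This is a sketch under an assumption: it uses 64-bit doubles (sign in bit 63) as a stand-in for the 80-bit extended-real format, and the helper names are hypothetical.

```python
import math
import struct

SIGN_BIT = 1 << 63


def _to_bits(x):
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def _from_bits(bits):
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def fchs(x):
    """Complement the sign bit, as FCHS does to ST(0)."""
    return _from_bits(_to_bits(x) ^ SIGN_BIT)


def fabs(x):
    """Clear the sign bit, as FABS does to ST(0)."""
    return _from_bits(_to_bits(x) & ~SIGN_BIT)


print(fchs(2.5), fchs(0.0), fabs(-math.inf))
```

Because only the sign bit changes, zeros and infinities follow the result tables without any special-case code.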
FCLEX/FNCLEX - Clear Exceptions

Opcode      Instruction    Description
9B DB E2    FCLEX          Clear floating-point exception flags after checking for pending unmasked floating-point exceptions.
DB E2       FNCLEX*        Clear floating-point exception flags without checking for pending unmasked floating-point exceptions.
NOTE: * Refer to Intel Architecture Compatibility below.
Description
This instruction clears the floating-point exception flags (PE, UE, OE, ZE, DE, and IE), the exception summary status flag (ES), the stack fault flag (SF), and the busy flag (B) in the FPU status word. The FCLEX instruction checks for and handles any pending unmasked floating-point exceptions before clearing the exception flags; the FNCLEX instruction does not.
Intel Architecture Compatibility
When operating a Pentium or Intel486 processor in MS-DOS compatibility mode, it is possible (under unusual circumstances) for an FNCLEX instruction to be interrupted prior to being executed to handle a pending FPU exception. Refer to Section E.2.1.3, No-Wait FPU Instructions Can Get FPU Interrupt in Window in Appendix E, Guidelines for Writing FPU Exception Handlers of the Intel Architecture Software Developer's Manual, Volume 1, for a description of these circumstances. An FNCLEX instruction cannot be interrupted in this way on a Pentium Pro processor.
On a Pentium III processor, the FCLEX/FNCLEX instructions operate the same as on a Pentium II processor. They have no effect on the Pentium III processor SIMD floating-point functional unit or control/status register.
Operation
FPUStatusWord[0..7] ← 0;
FPUStatusWord[15] ← 0;
FPU Flags Affected
The PE, UE, OE, ZE, DE, IE, ES, SF, and B flags in the FPU status word are cleared. The C0, C1, C2, and C3 flags are undefined.
Floating-Point Exceptions
None.
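The effect of the Operation section on a 16-bit status word can be written as a single mask in Python. This is a minimal sketch: the function name fnclex and the sample status words are hypothetical, and the condition-code bits are simply left unchanged here even though the manual lists them as undefined.

```python
# Bits 0..7 hold IE, DE, ZE, OE, UE, PE, SF, and ES; bit 15 is the busy flag B.
EXCEPTION_AND_BUSY_MASK = 0x80FF


def fnclex(status_word):
    """Clear FPUStatusWord[0..7] and FPUStatusWord[15]; keep the other bits."""
    return status_word & ~EXCEPTION_AND_BUSY_MASK & 0xFFFF


print(hex(fnclex(0xB8C1)))
```

The TOP field (bits 11 through 13) survives the clear, so the register stack is not disturbed.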
FCMOVcc - Floating-Point Conditional Move

Opcode     Instruction              Description
DA C0+i    FCMOVB ST(0), ST(i)      Move if below (CF=1)
DA C8+i    FCMOVE ST(0), ST(i)      Move if equal (ZF=1)
DA D0+i    FCMOVBE ST(0), ST(i)     Move if below or equal (CF=1 or ZF=1)
DA D8+i    FCMOVU ST(0), ST(i)      Move if unordered (PF=1)
DB C0+i    FCMOVNB ST(0), ST(i)     Move if not below (CF=0)
DB C8+i    FCMOVNE ST(0), ST(i)     Move if not equal (ZF=0)
DB D0+i    FCMOVNBE ST(0), ST(i)    Move if not below or equal (CF=0 and ZF=0)
DB D8+i    FCMOVNU ST(0), ST(i)     Move if not unordered (PF=0)
Description
This instruction tests the status flags in the EFLAGS register and moves the source operand (second operand) to the destination operand (first operand) if the given test condition is true. The conditions for each mnemonic are given in the Description column above and in Table 6-4 in Chapter 6, Instruction Set Summary of the Intel Architecture Software Developer's Manual, Volume 1. The source operand is always in the ST(i) register and the destination operand is always ST(0).
The FCMOVcc instructions are useful for optimizing small IF constructions. They also help eliminate branching overhead for IF operations and the possibility of branch mispredictions by the processor.
A processor may not support the FCMOVcc instructions. Software can check if the FCMOVcc instructions are supported by checking the processor's feature information with the CPUID instruction (refer to COMISS - Scalar Ordered Single-FP Compare and Set EFLAGS in this chapter). If both the CMOV and FPU feature bits are set, the FCMOVcc instructions are supported.
Intel Architecture Compatibility
The FCMOVcc instructions were introduced to the Intel Architecture in the Pentium Pro processor family and are not available in earlier Intel Architecture processors.
Operation
IF condition TRUE
    ST(0) ← ST(i)
FI;
FPU Flags Affected
C1            Set to 0 if stack underflow occurred.
C0, C2, C3    Undefined.
Integer Flags Affected
None.
Protected Mode Exceptions
#NM    EM or TS in CR0 is set.
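The eight FCMOVcc conditions from the opcode table can be written out as predicates on CF, ZF, and PF. The Python sketch below is illustrative only: the CONDITIONS table and the fcmovcc function are hypothetical names, and the register stack is modeled as a plain list whose element 0 stands for ST(0).

```python
CONDITIONS = {
    "FCMOVB":   lambda cf, zf, pf: cf == 1,
    "FCMOVE":   lambda cf, zf, pf: zf == 1,
    "FCMOVBE":  lambda cf, zf, pf: cf == 1 or zf == 1,
    "FCMOVU":   lambda cf, zf, pf: pf == 1,
    "FCMOVNB":  lambda cf, zf, pf: cf == 0,
    "FCMOVNE":  lambda cf, zf, pf: zf == 0,
    "FCMOVNBE": lambda cf, zf, pf: cf == 0 and zf == 0,
    "FCMOVNU":  lambda cf, zf, pf: pf == 0,
}


def fcmovcc(mnemonic, st, i, cf=0, zf=0, pf=0):
    """Return a copy of st with ST(0) replaced by ST(i) when the condition holds."""
    result = list(st)
    if CONDITIONS[mnemonic](cf, zf, pf):
        result[0] = st[i]
    return result


print(fcmovcc("FCMOVBE", [1.0, 2.0], 1, zf=1))
```

Each mnemonic and its "N" counterpart are exact complements, so for any flag state exactly one of the pair performs the move.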
FCOM/FCOMP/FCOMPP - Compare Real

Opcode     Instruction      Description
D8 /2      FCOM m32real     Compare ST(0) with m32real.
DC /2      FCOM m64real     Compare ST(0) with m64real.
D8 D0+i    FCOM ST(i)       Compare ST(0) with ST(i).
D8 D1      FCOM             Compare ST(0) with ST(1).
D8 /3      FCOMP m32real    Compare ST(0) with m32real and pop register stack.
DC /3      FCOMP m64real    Compare ST(0) with m64real and pop register stack.
D8 D8+i    FCOMP ST(i)      Compare ST(0) with ST(i) and pop register stack.
D8 D9      FCOMP            Compare ST(0) with ST(1) and pop register stack.
DE D9      FCOMPP           Compare ST(0) with ST(1) and pop register stack twice.
Description
These instructions compare the contents of register ST(0) and the source value and set condition code flags C0, C2, and C3 in the FPU status word according to the results (refer to the table below). The source operand can be a data register or a memory location. If no source operand is given, the value in ST(0) is compared with the value in ST(1). The sign of zero is ignored, so that -0.0 = +0.0.

Condition        C3    C2    C0
ST(0) > SRC      0     0     0
ST(0) < SRC      0     0     1
ST(0) = SRC      1     0     0
Unordered*       1     1     1
NOTE: * Flags not set if unmasked invalid-arithmetic-operand (#IA) exception is generated.

This instruction checks the class of the numbers being compared (refer to FXAM - Examine in this chapter). If either operand is a NaN or is in an unsupported format, an invalid-arithmetic-operand exception (#IA) is raised and, if the exception is masked, the condition flags are set to unordered. If the invalid-arithmetic-operand exception is unmasked, the condition code flags are not set.
The FCOMP instruction pops the register stack following the comparison operation and the FCOMPP instruction pops the register stack twice following the comparison operation. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1.
The FCOM instructions perform the same operation as the FUCOM instructions. The only difference is how they handle qNaN operands. The FCOM instructions raise an invalid-arithmetic-operand exception (#IA) when either or both of the operands is a NaN value or is in an unsupported format. The FUCOM instructions perform the same operation as the FCOM instructions, except that they do not generate an invalid-arithmetic-operand exception for qNaNs.
Operation
CASE (relation of operands) OF
    ST > SRC:    C3, C2, C0 ← 000;
    ST < SRC:    C3, C2, C0 ← 001;
    ST = SRC:    C3, C2, C0 ← 100;
ESAC;
IF ST(0) or SRC = NaN or unsupported format
    THEN
        #IA
        IF FPUControlWord.IM = 1
            THEN
                C3, C2, C0 ← 111;
        FI;
FI;
IF instruction = FCOMP
    THEN
        PopRegisterStack;
FI;
IF instruction = FCOMPP
    THEN
        PopRegisterStack;
        PopRegisterStack;
FI;
FPU Flags Affected
C1            Set to 0 if stack underflow occurred; otherwise, cleared to 0.
C0, C2, C3    Refer to the condition code table in the Description section.
Floating-Point Exceptions
#IS    Stack underflow occurred.
#IA    One or both operands are NaN values or have unsupported formats.
       Register is marked empty.
#D     One or both operands are denormal values.
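The condition-code table for FCOM can be reproduced with a few lines of Python. This is a sketch, not an emulator: the function fcom is a hypothetical helper that returns the (C3, C2, C0) triple, uses Python doubles, treats any NaN as an invalid operand, and returns None to represent the case where an unmasked #IA leaves the flags unset.

```python
import math


def fcom(st0, src, invalid_masked=True):
    """Return (C3, C2, C0) for comparing ST(0) with SRC, or None on unmasked #IA."""
    if math.isnan(st0) or math.isnan(src):
        return (1, 1, 1) if invalid_masked else None   # unordered
    if st0 > src:
        return (0, 0, 0)
    if st0 < src:
        return (0, 0, 1)
    return (1, 0, 0)                                    # equal; -0.0 equals +0.0


print(fcom(1.0, 2.0), fcom(-0.0, 0.0), fcom(math.nan, 1.0))
```

Programs typically read these bits by storing the status word and testing it, which is why C3, C2, and C0 line up with ZF, PF, and CF when the high byte is copied into the flags register.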
FCOMI/FCOMIP/FUCOMI/FUCOMIP - Compare Real and Set EFLAGS

Opcode     Instruction           Description
DB F0+i    FCOMI ST, ST(i)       Compare ST(0) with ST(i) and set status flags accordingly
DF F0+i    FCOMIP ST, ST(i)      Compare ST(0) with ST(i), set status flags accordingly, and pop register stack
DB E8+i    FUCOMI ST, ST(i)      Compare ST(0) with ST(i), check for ordered values, and set status flags accordingly
DF E8+i    FUCOMIP ST, ST(i)     Compare ST(0) with ST(i), check for ordered values, set status flags accordingly, and pop register stack
Description
These instructions compare the contents of registers ST(0) and ST(i) and set the status flags ZF, PF, and CF in the EFLAGS register according to the results (refer to the table below). The sign of zero is ignored for comparisons, so that -0.0 = +0.0.

Comparison Results    ZF    PF    CF
ST0 > ST(i)           0     0     0
ST0 < ST(i)           0     0     1
ST0 = ST(i)           1     0     0
Unordered*            1     1     1
NOTE: * Flags are set regardless, whether there is an unmasked invalid-arithmetic-operand (#IA) exception generated or not.

The FCOMI/FCOMIP instructions perform the same operation as the FUCOMI/FUCOMIP instructions. The only difference is how they handle qNaN operands. The FCOMI/FCOMIP instructions set the status flags to unordered and generate an invalid-arithmetic-operand exception (#IA) when either or both of the operands is a NaN value (sNaN or qNaN) or is in an unsupported format.
The FUCOMI/FUCOMIP instructions perform the same operation as the FCOMI/FCOMIP instructions, except that they do not generate an invalid-arithmetic-operand exception for qNaNs. Refer to FXAM - Examine in this chapter for additional information on unordered comparisons.
If the invalid operation exception is unmasked, the status flags are not set if the invalid-arithmetic-operand exception is generated.
The FCOMIP and FUCOMIP instructions also pop the register stack following the comparison operation. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1.
Intel Architecture Compatibility
The FCOMI/FCOMIP/FUCOMI/FUCOMIP instructions were introduced to the Intel Architecture in the Pentium Pro processor family and are not available in earlier Intel Architecture processors.
Operation
CASE (relation of operands) OF
    ST(0) > ST(i):    ZF, PF, CF ← 000;
    ST(0) < ST(i):    ZF, PF, CF ← 001;
    ST(0) = ST(i):    ZF, PF, CF ← 100;
ESAC;
IF instruction is FCOMI or FCOMIP
    THEN
        IF ST(0) or ST(i) = NaN or unsupported format
            THEN
                #IA
                IF FPUControlWord.IM = 1
                    THEN
                        ZF, PF, CF ← 111;
                FI;
        FI;
FI;
IF instruction is FUCOMI or FUCOMIP
    THEN
        IF ST(0) or ST(i) = QNaN, but not SNaN or unsupported format
            THEN
                ZF, PF, CF ← 111;
            ELSE (* ST(0) or ST(i) is SNaN or unsupported format *)
                #IA;
                IF FPUControlWord.IM = 1
                    THEN
                        ZF, PF, CF ← 111;
                FI;
        FI;
FI;
IF instruction is FCOMIP or FUCOMIP
    THEN
        PopRegisterStack;
FI;
FPU Flags Affected
C1            Set to 0 if stack underflow occurred; otherwise, cleared to 0.
C0, C2, C3    Not affected.
Floating-Point Exceptions
#IS    Stack underflow occurred.
#IA    (FCOMI or FCOMIP instruction) One or both operands are NaN values or have unsupported formats.
       (FUCOMI or FUCOMIP instruction) One or both operands are sNaN values (but not qNaNs) or have undefined formats. Detection of a qNaN value does not raise an invalid-operand exception.
Protected Mode Exceptions
#NM    EM or TS in CR0 is set.
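The difference between the ordered (FCOMI/FCOMIP) and unordered (FUCOMI/FUCOMIP) forms follows the Operation pseudocode and can be sketched in Python. Everything named here is hypothetical: compare_set_eflags returns the (ZF, PF, CF) triple plus a flag saying whether #IA was raised, and because Python floats do not reliably distinguish signaling from quiet NaNs, the caller states that explicitly through the signaling parameter.

```python
import math


def compare_set_eflags(instruction, st0, sti, signaling=False, invalid_masked=True):
    """Return ((ZF, PF, CF) or None, ia_raised) following the Operation pseudocode."""
    if math.isnan(st0) or math.isnan(sti):
        quiet_ok = instruction in ("FUCOMI", "FUCOMIP") and not signaling
        if quiet_ok:
            return (1, 1, 1), False            # unordered, no exception for a qNaN
        flags = (1, 1, 1) if invalid_masked else None
        return flags, True                     # #IA raised
    if st0 > sti:
        return (0, 0, 0), False
    if st0 < sti:
        return (0, 0, 1), False
    return (1, 0, 0), False


print(compare_set_eflags("FCOMI", math.nan, 1.0))
print(compare_set_eflags("FUCOMI", math.nan, 1.0))
```

Since the result lands in ZF, PF, and CF, ordinary conditional jumps and FCMOVcc can consume it directly, with PF=1 signalling the unordered case.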
FCOS - Cosine

Opcode    Instruction    Description
D9 FF     FCOS           Replace ST(0) with its cosine
Description
This instruction calculates the cosine of the source operand in register ST(0) and stores the result in ST(0). The source operand must be given in radians and must be within the range -2^63 to +2^63. The following table shows the results obtained when taking the cosine of various classes of numbers, assuming that neither overflow nor underflow occurs.

ST(0) SRC    ST(0) DEST
-∞           *
-F           -1 to +1
-0           +1
+0           +1
+F           -1 to +1
+∞           *
NaN          NaN
NOTES:
F Means finite-real number.
* Indicates floating-point invalid-arithmetic-operand (#IA) exception.

If the source operand is outside the acceptable range, the C2 flag in the FPU status word is set, and the value in register ST(0) remains unchanged. The instruction does not raise an exception when the source operand is out of range. It is up to the program to check the C2 flag for out-of-range conditions. Source values outside the range -2^63 to +2^63 can be reduced to the range of the instruction by subtracting an appropriate integer multiple of 2π or by using the FPREM instruction with a divisor of 2π. Refer to Section 7.5.8., Pi in Chapter 7, Floating-Point Unit of the Intel Architecture Software Developer's Manual, Volume 1, for a discussion of the proper value to use for π in performing such reductions.
Operation
IF |ST(0)| < 2^63
    THEN
        C2 ← 0;
        ST(0) ← cosine(ST(0));
    ELSE (* source operand is out-of-range *)
        C2 ← 1;
FI;
FPU Flags Affected
C1        Set to 0 if stack underflow occurred. Indicates rounding direction if the inexact result exception (#P) is generated: 0 = not roundup; 1 = roundup. Undefined if C2 is 1.
C2        Set to 1 if source operand is outside the range -2^63 to +2^63; otherwise, cleared to 0.
C0, C3    Undefined.
Floating-Point Exceptions
#IS    Stack underflow occurred.
#IA    Source operand is an sNaN value, ∞, or unsupported format.
#D     Result is a denormal value.
#U     Result is too small for destination format.
#P     Value cannot be represented exactly in destination format.
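The C2 protocol for out-of-range operands can be modeled in Python as a function that returns both the new ST(0) and the C2 flag. This is a sketch with hypothetical names (fcos, fcos_with_reduction); it uses doubles and reduces with the double-precision value of 2π through math.fmod, which stands in for an FPREM loop and is much less accurate for huge arguments than the reduction the manual discusses.

```python
import math

LIMIT = 2.0 ** 63


def fcos(st0):
    """Return (new ST(0), C2). Out-of-range operands are left unchanged with C2 = 1."""
    if abs(st0) < LIMIT:
        return math.cos(st0), 0
    return st0, 1


def fcos_with_reduction(x):
    """Check C2 and, when it is set, reduce the operand modulo 2*pi and retry."""
    value, c2 = fcos(x)
    if c2:
        value, c2 = fcos(math.fmod(x, 2.0 * math.pi))
    return value


print(fcos(0.0), fcos(2.0 ** 64))
```

The important point for callers is that no exception signals the out-of-range case; only the C2 check catches it.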
FDECSTP - Decrement Stack-Top Pointer

Opcode    Instruction    Description
D9 F6     FDECSTP        Decrement TOP field in FPU status word.
Description
This instruction subtracts one from the TOP field of the FPU status word (decrements the top-of-stack pointer). If the TOP field contains a 0, it is set to 7. The effect of this instruction is to rotate the stack by one position. The contents of the FPU data registers and tag register are not affected.
Operation
IF TOP = 0
    THEN TOP ← 7;
    ELSE TOP ← TOP - 1;
FI;
FPU Flags Affected
The C1 flag is set to 0. The C0, C2, and C3 flags are undefined.
Floating-Point Exceptions
None.
Protected Mode Exceptions
#NM    EM or TS in CR0 is set.
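The wrap-around of the TOP field can be shown on a 16-bit status word in Python. The sketch assumes the standard status-word layout, with TOP in bits 11 through 13 and C1 in bit 9; the function name fdecstp is hypothetical, and the undefined C0, C2, and C3 bits are simply left as they were.

```python
TOP_SHIFT = 11
TOP_MASK = 0x7 << TOP_SHIFT
C1_MASK = 1 << 9


def fdecstp(status_word):
    """Decrement the TOP field modulo 8 and clear C1; other bits are kept."""
    top = (status_word & TOP_MASK) >> TOP_SHIFT
    top = 7 if top == 0 else top - 1
    cleared = status_word & ~TOP_MASK & ~C1_MASK & 0xFFFF
    return cleared | (top << TOP_SHIFT)


print(hex(fdecstp(0x0000)))   # TOP wraps from 0 to 7
```

Only the pointer moves: no data register or tag changes, which is what makes the operation a rotation of the stack.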
FDIV/FDIVP/FIDIV - Divide

Opcode     Instruction           Description
D8 /6      FDIV m32real          Divide ST(0) by m32real and store result in ST(0)
DC /6      FDIV m64real          Divide ST(0) by m64real and store result in ST(0)
D8 F0+i    FDIV ST(0), ST(i)     Divide ST(0) by ST(i) and store result in ST(0)
DC F8+i    FDIV ST(i), ST(0)     Divide ST(i) by ST(0) and store result in ST(i)
DE F8+i    FDIVP ST(i), ST(0)    Divide ST(i) by ST(0), store result in ST(i), and pop the register stack
DE F9      FDIVP                 Divide ST(1) by ST(0), store result in ST(1), and pop the register stack
DA /6      FIDIV m32int          Divide ST(0) by m32int and store result in ST(0)
DE /6      FIDIV m16int          Divide ST(0) by m16int and store result in ST(0)
Description
These instructions divide the destination operand by the source operand and store the result in the destination location. The destination operand (dividend) is always in an FPU register; the source operand (divisor) can be a register or a memory location. Source operands in memory can be in single-real, double-real, word-integer, or short-integer formats.
The no-operand version of the instruction divides the contents of the ST(1) register by the contents of the ST(0) register. The one-operand version divides the contents of the ST(0) register by the contents of a memory location (either a real or an integer value). The two-operand version divides the contents of the ST(0) register by the contents of the ST(i) register or vice versa.
The FDIVP instructions perform the additional operation of popping the FPU register stack after storing the result. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1. The no-operand version of the floating-point divide instructions always results in the register stack being popped. In some assemblers, the mnemonic for this instruction is FDIV rather than FDIVP.
The FIDIV instructions convert an integer source operand to extended-real format before performing the division. When the source operand is an integer 0, it is treated as a +0.
If an unmasked divide-by-zero exception (#Z) is generated, no result is stored; if the exception is masked, an ∞ of the appropriate sign is stored in the destination operand.
The following table shows the results obtained when dividing various classes of numbers, assuming that neither overflow nor underflow occurs.
                              DEST
SRC     -∞     -F     -0     +0     +F     +∞     NaN
-∞      *      +0     +0     -0     -0     *      NaN
-F      +∞     +F     +0     -0     -F     -∞     NaN
-I      +∞     +F     +0     -0     -F     -∞     NaN
-0      +∞     **     *      *      **     -∞     NaN
+0      -∞     **     *      *      **     +∞     NaN
+I      -∞     -F     -0     +0     +F     +∞     NaN
+F      -∞     -F     -0     +0     +F     +∞     NaN
+∞      *      -0     -0     +0     +0     *      NaN
NaN     NaN    NaN    NaN    NaN    NaN    NaN    NaN
NOTES:
F Means finite-real number.
I Means integer.
* Indicates floating-point invalid-arithmetic-operand (#IA) exception.
** Indicates floating-point zero-divide (#Z) exception.
Operation
IF SRC = 0
    THEN
        #Z
    ELSE
        IF instruction is FIDIV
            THEN
                DEST ← DEST / ConvertExtendedReal(SRC);
            ELSE (* source operand is real number *)
                DEST ← DEST / SRC;
        FI;
FI;
IF instruction = FDIVP
    THEN
        PopRegisterStack
FI;
Floating-Point Exceptions
#IS    Stack underflow occurred.
#IA    Operand is an sNaN value or unsupported format.
       ±∞ / ±∞; ±0 / ±0
#D     Result is a denormal value.
#Z     DEST / ±0, where DEST is not equal to ±0.
#U     Result is too small for destination format.
#O     Result is too large for destination format.
#P     Value cannot be represented exactly in destination format.
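The zero and infinity cases in the FDIV result table can be exercised with a small Python model. This is a sketch with hypothetical names: fdiv returns the value that would be stored in DEST, uses doubles instead of extended reals, returns NaN for the masked invalid cases (0/0 and ∞/∞), and raises ZeroDivisionError to stand for an unmasked #Z where no result is stored. The companion fdivr simply swaps the operand roles, as the reverse-divide instructions do.

```python
import math


def fdiv(dest, src, zero_divide_masked=True):
    """Return DEST / SRC following the special-case table."""
    if math.isnan(dest) or math.isnan(src):
        return math.nan
    if (dest == 0.0 and src == 0.0) or (math.isinf(dest) and math.isinf(src)):
        return math.nan                                  # masked #IA response
    sign = math.copysign(1.0, dest) * math.copysign(1.0, src)
    if src == 0.0:
        if math.isinf(dest):
            return math.copysign(math.inf, sign)         # infinity / 0 is not a #Z case
        if not zero_divide_masked:
            raise ZeroDivisionError("#Z unmasked: no result is stored")
        return math.copysign(math.inf, sign)             # masked #Z: signed infinity
    return dest / src


def fdivr(dest, src, zero_divide_masked=True):
    """Reverse divide: return SRC / DEST."""
    return fdiv(src, dest, zero_divide_masked)


print(fdiv(1.0, -0.0), fdiv(6.0, 3.0), fdivr(2.0, 8.0))
```

The explicit zero test is needed because Python, unlike a masked FPU, raises an error on float division by zero.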
FDIVR/FDIVRP/FIDIVR - Reverse Divide

Opcode     Instruction            Description
D8 /7      FDIVR m32real          Divide m32real by ST(0) and store result in ST(0)
DC /7      FDIVR m64real          Divide m64real by ST(0) and store result in ST(0)
D8 F8+i    FDIVR ST(0), ST(i)     Divide ST(i) by ST(0) and store result in ST(0)
DC F0+i    FDIVR ST(i), ST(0)     Divide ST(0) by ST(i) and store result in ST(i)
DE F0+i    FDIVRP ST(i), ST(0)    Divide ST(0) by ST(i), store result in ST(i), and pop the register stack
DE F1      FDIVRP                 Divide ST(0) by ST(1), store result in ST(1), and pop the register stack
DA /7      FIDIVR m32int          Divide m32int by ST(0) and store result in ST(0)
DE /7      FIDIVR m16int          Divide m16int by ST(0) and store result in ST(0)
Description
These instructions divide the source operand by the destination operand and store the result in the destination location. The destination operand (divisor) is always in an FPU register; the source operand (dividend) can be a register or a memory location. Source operands in memory can be in single-real, double-real, word-integer, or short-integer formats.
These instructions perform the reverse operations of the FDIV, FDIVP, and FIDIV instructions. They are provided to support more efficient coding.
The no-operand version of the instruction divides the contents of the ST(0) register by the contents of the ST(1) register. The one-operand version divides the contents of a memory location (either a real or an integer value) by the contents of the ST(0) register. The two-operand version divides the contents of the ST(i) register by the contents of the ST(0) register or vice versa.
The FDIVRP instructions perform the additional operation of popping the FPU register stack after storing the result. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1. The no-operand version of the floating-point divide instructions always results in the register stack being popped. In some assemblers, the mnemonic for this instruction is FDIVR rather than FDIVRP.
The FIDIVR instructions convert an integer source operand to extended-real format before performing the division.
If an unmasked divide-by-zero exception (#Z) is generated, no result is stored; if the exception is masked, an ∞ of the appropriate sign is stored in the destination operand.
The following table shows the results obtained when dividing various classes of numbers, assuming that neither overflow nor underflow occurs.

                              DEST
SRC     -∞     -F     -0     +0     +F     +∞     NaN
-∞      *      +∞     +∞     -∞     -∞     *      NaN
-F      +0     +F     **     **     -F     -0     NaN
-I      +0     +F     **     **     -F     -0     NaN
-0      +0     +0     *      *      -0     -0     NaN
+0      -0     -0     *      *      +0     +0     NaN
+I      -0     -F     **     **     +F     +0     NaN
+F      -0     -F     **     **     +F     +0     NaN
+∞      *      -∞     -∞     +∞     +∞     *      NaN
NaN     NaN    NaN    NaN    NaN    NaN    NaN    NaN

NOTES:
F   Means finite-real number.
I   Means integer.
*   Indicates floating-point invalid-arithmetic-operand (#IA) exception.
**  Indicates floating-point zero-divide (#Z) exception.

When the source operand is an integer 0, it is treated as a +0.

Operation

IF DEST = 0
    THEN
        #Z
    ELSE
        IF instruction is FIDIVR
            THEN
                DEST ← ConvertExtendedReal(SRC) / DEST;
            ELSE (* source operand is real number *)
                DEST ← SRC / DEST;
        FI;
FI;
IF instruction = FDIVRP
    THEN
        PopRegisterStack
FI;

Floating-Point Exceptions
#IS    Stack underflow occurred.
#IA    Operand is an sNaN value or unsupported format.
       ±∞ / ±∞; ±0 / ±0
#D     Result is a denormal value.
#Z     SRC / ±0, where SRC is not equal to ±0.
#U     Result is too small for destination format.
#O     Result is too large for destination format.
#P     Value cannot be represented exactly in destination format.
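
The reverse-divide behavior with masked exceptions can be modeled in a few lines of Python. This is only a sketch: the function name fdivr is hypothetical, Python doubles stand in for extended-real values, and the real indefinite is represented by a plain NaN.

```python
import math

def fdivr(dest: float, src: float) -> float:
    """Model of FDIVR with all exceptions masked: returns SRC / DEST,
    the value the instruction would store in the destination."""
    if math.isnan(dest) or math.isnan(src):
        return math.nan
    if dest == 0.0:
        if src == 0.0:
            # 0 / 0 is an invalid operation (#IA); masked response is a NaN.
            return math.nan
        # Masked zero-divide (#Z): infinity whose sign is the XOR of the operand signs.
        sign = math.copysign(1.0, src) * math.copysign(1.0, dest)
        return math.copysign(math.inf, sign)
    # inf / inf yields NaN in Python, matching the masked #IA response.
    return src / dest
```

For instance, fdivr(2.0, 6.0) divides the source 6.0 by the destination 2.0, the reverse of what FDIV would compute with the same operands.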
FFREE - Free Floating-Point Register

Opcode     Instruction    Description
DD C0+i    FFREE ST(i)    Sets tag for ST(i) to empty

Description

This instruction sets the tag in the FPU tag register associated with register ST(i) to empty (11B). The contents of ST(i) and the FPU stack-top pointer (TOP) are not affected.

Operation

TAG(i) ← 11B;

FPU Flags Affected
C0, C1, C2, C3 undefined.

Floating-Point Exceptions
None.

Protected Mode Exceptions
#NM    EM or TS in CR0 is set.
FICOM/FICOMP - Compare Integer

Opcode    Instruction      Description
DE /2     FICOM m16int     Compare ST(0) with m16int
DA /2     FICOM m32int     Compare ST(0) with m32int
DE /3     FICOMP m16int    Compare ST(0) with m16int and pop stack register
DA /3     FICOMP m32int    Compare ST(0) with m32int and pop stack register
Description

These instructions compare the value in ST(0) with an integer source operand and set the condition code flags C0, C2, and C3 in the FPU status word according to the results (refer to table below). The integer value is converted to extended-real format before the comparison is made.

Condition       C3    C2    C0
ST(0) > SRC     0     0     0
ST(0) < SRC     0     0     1
ST(0) = SRC     1     0     0
Unordered       1     1     1

These instructions perform an "unordered comparison." An unordered comparison also checks the class of the numbers being compared (refer to "FXAM - Examine" in this chapter). If either operand is a NaN or is in an undefined format, the condition flags are set to "unordered."

The sign of zero is ignored, so that -0.0 = +0.0.

The FICOMP instructions pop the register stack following the comparison. To pop the register stack, the processor marks the ST(0) register empty and increments the stack pointer (TOP) by 1.

Operation

CASE (relation of operands) OF
    ST(0) > SRC:    C3, C2, C0 ← 000;
    ST(0) < SRC:    C3, C2, C0 ← 001;
    ST(0) = SRC:    C3, C2, C0 ← 100;
    Unordered:      C3, C2, C0 ← 111;
ESAC;
IF instruction = FICOMP
    THEN
        PopRegisterStack;
FI;

FPU Flags Affected
C1            Set to 0, whether or not stack underflow occurred.
C0, C2, C3    Refer to the condition-code table in the Description section.

Floating-Point Exceptions
#IS    Stack underflow occurred.
#IA    One or both operands are NaN values or have unsupported formats.
#D     One or both operands are denormal values.
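
The condition-code encoding used by FICOM can be sketched as a small Python function. The name ficom is hypothetical, and a Python float stands in for the extended-real value in ST(0); only the NaN case of "unordered" is modeled.

```python
import math

def ficom(st0: float, src: int) -> tuple:
    """Return the (C3, C2, C0) flags for comparing ST(0) with an integer source."""
    value = float(src)          # the integer is converted before comparing
    if math.isnan(st0):
        return (1, 1, 1)        # unordered
    if st0 > value:
        return (0, 0, 0)
    if st0 < value:
        return (0, 0, 1)
    return (1, 0, 0)            # equal; -0.0 compares equal to +0.0
```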
FILD - Load Integer

Opcode    Instruction    Description
DF /0     FILD m16int    Push m16int onto the FPU register stack.
DB /0     FILD m32int    Push m32int onto the FPU register stack.
DF /5     FILD m64int    Push m64int onto the FPU register stack.

Description

This instruction converts the signed-integer source operand into extended-real format and pushes the value onto the FPU register stack. The source operand can be a word, short, or long integer value. It is loaded without rounding errors. The sign of the source operand is preserved.

Operation

TOP ← TOP − 1;
ST(0) ← ExtendedReal(SRC);

FPU Flags Affected
C1            Set to 1 if stack overflow occurred; cleared to 0 otherwise.
C0, C2, C3    Undefined.
FINCSTP - Increment Stack-Top Pointer

Opcode    Instruction    Description
D9 F7     FINCSTP        Increment the TOP field in the FPU status register

Description

This instruction adds one to the TOP field of the FPU status word (increments the top-of-stack pointer). If the TOP field contains a 7, it is set to 0. The effect of this instruction is to rotate the stack by one position. The contents of the FPU data registers and tag register are not affected. This operation is not equivalent to popping the stack, because the tag for the previous top-of-stack register is not marked empty.

Operation

IF TOP = 7
    THEN TOP ← 0;
    ELSE TOP ← TOP + 1;
FI;

FPU Flags Affected
The C1 flag is set to 0. The C0, C2, and C3 flags are undefined.

Floating-Point Exceptions
None.

Protected Mode Exceptions
#NM    EM or TS in CR0 is set.
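
The difference between rotating the stack with FINCSTP and popping it can be illustrated with a minimal Python model. The function names and the list-of-tags representation are assumptions made for this sketch; tags are indexed by physical register number and 11B means empty, as described for FFREE.

```python
EMPTY = 0b11

def fincstp(top: int, tags: list) -> tuple:
    """Rotate the stack: TOP wraps from 7 to 0 and the tags are left untouched."""
    return (top + 1) & 7, list(tags)

def pop_register_stack(top: int, tags: list) -> tuple:
    """Pop the stack: tag the current top as empty, then increment TOP."""
    new_tags = list(tags)
    new_tags[top] = EMPTY
    return (top + 1) & 7, new_tags
```

With TOP = 7 and all registers tagged valid (00B), fincstp leaves every tag valid, while pop_register_stack marks physical register 7 empty. Both return TOP = 0.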
FINIT/FNINIT - Initialize Floating-Point Unit

Opcode      Instruction    Description
9B DB E3    FINIT          Initialize FPU after checking for pending unmasked floating-point exceptions.
DB E3       FNINIT*        Initialize FPU without checking for pending unmasked floating-point exceptions.

NOTE:
* Refer to "Intel Architecture Compatibility" below.
Description

These instructions set the FPU control, status, tag, instruction pointer, and data pointer registers to their default states. The FPU control word is set to 037FH (round to nearest, all exceptions masked, 64-bit precision). The status word is cleared (no exception flags set, TOP is set to 0). The data registers in the register stack are left unchanged, but they are all tagged as empty (11B). Both the instruction and data pointers are cleared.

The FINIT instruction checks for and handles any pending unmasked floating-point exceptions before performing the initialization; the FNINIT instruction does not.

Intel Architecture Compatibility

When operating a Pentium or Intel486 processor in MS-DOS compatibility mode, it is possible (under unusual circumstances) for an FNINIT instruction to be interrupted prior to being executed to handle a pending FPU exception. Refer to Section E.2.1.3, "No-Wait FPU Instructions Can Get FPU Interrupt in Window" in Appendix E, Guidelines for Writing FPU Exception Handlers of the Intel Architecture Software Developer's Manual, Volume 1, for a description of these circumstances. An FNINIT instruction cannot be interrupted in this way on a Pentium Pro processor.

In the Intel387 math coprocessor, the FINIT/FNINIT instruction does not clear the instruction and data pointers.

On a Pentium III processor, the FINIT/FNINIT instructions operate the same as on a Pentium II processor. They have no effect on the Pentium III processor SIMD floating-point functional unit or control/status register.

Operation

FPUControlWord ← 037FH;
FPUStatusWord ← 0;
FPUTagWord ← FFFFH;
FPUDataPointer ← 0;
FPUInstructionPointer ← 0;
FPULastInstructionOpcode ← 0;

FPU Flags Affected
C0, C1, C2, C3 cleared to 0.

Floating-Point Exceptions
None.

Protected Mode Exceptions
#NM    EM or TS in CR0 is set.

Virtual-8086 Mode Exceptions
#NM    EM or TS in CR0 is set.

Comments
This instruction has no effect on the state of SIMD floating-point registers.
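
As a check on the default values loaded by FINIT/FNINIT, the following Python sketch decodes the control word 037FH and the tag word FFFFH. The helper names are hypothetical. The field positions (exception masks in bits 0-5, precision control in bits 8-9, rounding control in bits 10-11, and eight 2-bit tags) are taken from the FPU register descriptions in Volume 1, not from this entry.

```python
def decode_control_word(cw: int) -> dict:
    """Split an FPU control word into the fields relevant to FINIT."""
    return {
        "exception_masks": cw & 0x3F,           # all six set means all masked
        "precision_control": (cw >> 8) & 0x3,   # 11B selects 64-bit precision
        "rounding_control": (cw >> 10) & 0x3,   # 00B selects round to nearest
    }

def decode_tag_word(tw: int) -> list:
    """Return the eight 2-bit tags, lowest-numbered register first."""
    return [(tw >> (2 * i)) & 0x3 for i in range(8)]
```

Decoding 037FH gives all six masks set, precision control 11B, and rounding control 00B, which agrees with the parenthetical description above; FFFFH decodes to eight tags of 11B (empty).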
FIST/FISTP - Store Integer

Opcode    Instruction     Description
DF /2     FIST m16int     Store ST(0) in m16int
DB /2     FIST m32int     Store ST(0) in m32int
DF /3     FISTP m16int    Store ST(0) in m16int and pop register stack
DB /3     FISTP m32int    Store ST(0) in m32int and pop register stack
DF /7     FISTP m64int    Store ST(0) in m64int and pop register stack
Description

The FIST instruction converts the value in the ST(0) register to a signed integer and stores the result in the destination operand. Values can be stored in word- or short-integer format. The destination operand specifies the address where the first byte of the destination value is to be stored.

The FISTP instruction performs the same operation as the FIST instruction and then pops the register stack. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1. The FISTP instruction can also store values in long-integer format.

The following table shows the results obtained when storing various classes of numbers in integer format.

ST(0)              DEST
-∞                 *
-F < -1            -I
-1 < -F < -0       **
-0                 0
+0                 0
+0 < +F < +1       **
+F > +1            +I
+∞                 *
NaN                *

NOTES:
F   Means finite-real number.
I   Means integer.
*   Indicates floating-point invalid operation (#IA) exception.
**  0 or ±1, depending on the rounding mode.

If the source value is a non-integral value, it is rounded to an integer value, according to the rounding mode specified by the RC field of the FPU control word.

If the value being stored is too large for the destination format, is an ∞, is a NaN, or is in an unsupported format and if the invalid-arithmetic-operand exception (#IA) is unmasked, an invalid operation exception is generated and no value is stored in the destination operand. If the invalid operation exception is masked, the integer indefinite value is stored in the destination operand.

Operation

DEST ← Integer(ST(0));
IF instruction = FISTP
    THEN
        PopRegisterStack;
FI;

FPU Flags Affected
C1            Set to 0 if stack underflow occurred.
              Indicates rounding direction if the inexact exception (#P) is generated: 0 = not roundup; 1 = roundup.
              Cleared to 0 otherwise.
C0, C2, C3    Undefined.

Floating-Point Exceptions
#IS    Stack underflow occurred.
#IA    Source operand is too large for the destination format.
       Source operand is a NaN value or unsupported format.
#P     Value cannot be represented exactly in destination format.

Protected Mode Exceptions
#GP(0)             If the destination is located in a nonwritable segment.
                   If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                   If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#NM                EM or TS in CR0 is set.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
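
The rounding and masked-exception behavior of FIST can be sketched in Python as follows. The function name fist and its parameters are hypothetical. The RC encodings (00B nearest, 01B down, 10B up, 11B toward zero) and the integer indefinite value (the most negative integer of the destination width) come from Volume 1 rather than from this entry.

```python
import math

def fist(st0: float, bits: int = 16, rc: int = 0) -> int:
    """Model of FIST with #IA masked: round per RC, or return integer indefinite."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    indefinite = lowest
    if math.isnan(st0) or math.isinf(st0):
        return indefinite
    # Python's round() on a float rounds halves to even, like round-to-nearest.
    rounders = {0: round, 1: math.floor, 2: math.ceil, 3: math.trunc}
    result = rounders[rc](st0)
    if not lowest <= result <= highest:
        return indefinite
    return result
```

For example, storing 2.5 with round-to-nearest yields 2, while storing 40000.0 to a word integer is out of range and yields the indefinite value -32768.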
FLD - Load Real

Opcode     Instruction    Description
D9 /0      FLD m32real    Push m32real onto the FPU register stack.
DD /0      FLD m64real    Push m64real onto the FPU register stack.
DB /5      FLD m80real    Push m80real onto the FPU register stack.
D9 C0+i    FLD ST(i)      Push ST(i) onto the FPU register stack.
Description

This instruction pushes the source operand onto the FPU register stack. If the source operand is in single- or double-real format, it is automatically converted to the extended-real format before being pushed on the stack.

The FLD instruction can also push the value in a selected FPU register [ST(i)] onto the stack. Here, pushing register ST(0) duplicates the stack top.

Operation

IF SRC is ST(i)
    THEN
        temp ← ST(i)
FI;
TOP ← TOP − 1;
IF SRC is memory-operand
    THEN
        ST(0) ← ExtendedReal(SRC);
    ELSE (* SRC is ST(i) *)
        ST(0) ← temp;
FI;

Floating-Point Exceptions
#IS    Stack overflow occurred.
#IA    Source operand is an sNaN value or unsupported format.
#D     Source operand is a denormal value. Does not occur if the source operand is in extended-real format.

Protected Mode Exceptions
#GP(0)             If destination is located in a nonwritable segment.
                   If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                   If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#NM                EM or TS in CR0 is set.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
FLD1/FLDL2T/FLDL2E/FLDPI/FLDLG2/FLDLN2/FLDZ - Load Constant

Opcode    Instruction    Description
D9 E8     FLD1           Push +1.0 onto the FPU register stack.
D9 E9     FLDL2T         Push log₂10 onto the FPU register stack.
D9 EA     FLDL2E         Push log₂e onto the FPU register stack.
D9 EB     FLDPI          Push π onto the FPU register stack.
D9 EC     FLDLG2         Push log₁₀2 onto the FPU register stack.
D9 ED     FLDLN2         Push logₑ2 onto the FPU register stack.
D9 EE     FLDZ           Push +0.0 onto the FPU register stack.

Description

These instructions push one of seven commonly used constants (in extended-real format) onto the FPU register stack. The constants that can be loaded with these instructions include +1.0, +0.0, log₂10, log₂e, π, log₁₀2, and logₑ2. For each constant, an internal 66-bit constant is rounded (as specified by the RC field in the FPU control word) to external-real format. The inexact result exception (#P) is not generated as a result of the rounding.

Refer to Section 7.5.8., "Pi" in Chapter 7, Floating-Point Unit of the Intel Architecture Software Developer's Manual, Volume 1, for a description of the π constant.

Operation

TOP ← TOP − 1;
ST(0) ← CONSTANT;

Intel Architecture Compatibility

When the RC field is set to round to nearest mode, the FPU produces the same constants that are produced by the Intel 8087 and Intel287 math coprocessors.
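
For reference, the seven constants can be approximated with Python's math module. The dictionary name is hypothetical, and the values are double-precision approximations, not the extended-real values the FPU actually pushes.

```python
import math

# Double-precision stand-ins for the constants pushed by each instruction.
FPU_CONSTANTS = {
    "FLD1": 1.0,
    "FLDL2T": math.log2(10.0),     # log base 2 of 10
    "FLDL2E": math.log2(math.e),   # log base 2 of e
    "FLDPI": math.pi,
    "FLDLG2": math.log10(2.0),     # log base 10 of 2
    "FLDLN2": math.log(2.0),       # natural log of 2
    "FLDZ": 0.0,
}
```

The logarithmic constants come in reciprocal pairs: log₂10 × log₁₀2 = 1 and log₂e × logₑ2 = 1, which makes them convenient for converting between logarithm bases.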
FLDCW - Load Control Word

Opcode    Instruction     Description
D9 /5     FLDCW m2byte    Load FPU control word from m2byte.
Description

This instruction loads the 16-bit source operand into the FPU control word. The source operand is a memory location. This instruction is typically used to establish or change the FPU's mode of operation.

If one or more exception flags are set in the FPU status word prior to loading a new FPU control word and the new control word unmasks one or more of those exceptions, a floating-point exception will be generated upon execution of the next floating-point instruction (except for the no-wait floating-point instructions). For more information, refer to Section 7.7.3., "Software Exception Handling" in Chapter 7, Floating-Point Unit of the Intel Architecture Software Developer's Manual, Volume 1. To avoid raising exceptions when changing FPU operating modes, clear any pending exceptions (using the FCLEX or FNCLEX instruction) before loading the new control word.

Intel Architecture Compatibility

On a Pentium III processor, the FLDCW instruction operates the same as on a Pentium II processor. It has no effect on the Pentium III processor SIMD floating-point functional unit or control/status register.

Operation

FPUControlWord ← SRC;

FPU Flags Affected
C0, C1, C2, C3 undefined.

Floating-Point Exceptions
None; however, this operation might unmask a pending exception in the FPU status word. That exception is then generated upon execution of the next waiting floating-point instruction.

Protected Mode Exceptions
#GP(0)             If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                   If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#NM                EM or TS in CR0 is set.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
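
A common reason to load a new control word is to change the rounding mode. The Python sketch below builds such a control word from an existing one. The helper name is hypothetical, and the position of the RC field (bits 10-11) is taken from the control-word description in Volume 1.

```python
RC_MASK = 0x0C00  # bits 10-11 of the control word

def with_rounding_control(cw: int, rc: int) -> int:
    """Return a copy of control word cw with its RC field replaced by rc (0..3)."""
    if not 0 <= rc <= 3:
        raise ValueError("rc must be a 2-bit value")
    return (cw & ~RC_MASK & 0xFFFF) | (rc << 10)
```

Starting from the FINIT default 037FH, selecting RC = 11B (round toward zero) gives 0F7FH; all other fields, including the exception masks, are preserved.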
FLDENV - Load FPU Environment

Opcode    Instruction           Description
D9 /4     FLDENV m14/28byte     Load FPU environment from m14byte or m28byte.
Description

This instruction loads the complete FPU operating environment from memory into the FPU registers. The source operand specifies the first byte of the operating-environment data in memory. This data is typically written to the specified memory location by a FSTENV or FNSTENV instruction.

The FPU operating environment consists of the FPU control word, status word, tag word, instruction pointer, data pointer, and last opcode. Figures 7-13 through 7-16 in Chapter 7, Floating-Point Unit of the Intel Architecture Software Developer's Manual, Volume 1, show the layout in memory of the loaded environment, depending on the operating mode of the processor (protected or real) and the current operand-size attribute (16-bit or 32-bit). In virtual-8086 mode, the real mode layouts are used.

The FLDENV instruction should be executed in the same operating mode as the corresponding FSTENV/FNSTENV instruction.

If one or more unmasked exception flags are set in the new FPU status word, a floating-point exception will be generated upon execution of the next floating-point instruction (except for the no-wait floating-point instructions). For more information, refer to Section 7.7.3., "Software Exception Handling" in Chapter 7, Floating-Point Unit of the Intel Architecture Software Developer's Manual, Volume 1. To avoid generating exceptions when loading a new environment, clear all the exception flags in the FPU status word that is being loaded.

Intel Architecture Compatibility

On a Pentium III processor, the FLDENV instruction operates the same as on a Pentium II processor. It has no effect on the Pentium III processor SIMD floating-point functional unit or control/status register.

Operation

FPUControlWord ← SRC(FPUControlWord);
FPUStatusWord ← SRC(FPUStatusWord);
FPUTagWord ← SRC(FPUTagWord);
FPUDataPointer ← SRC(FPUDataPointer);
FPUInstructionPointer ← SRC(FPUInstructionPointer);
FPULastInstructionOpcode ← SRC(FPULastInstructionOpcode);

FPU Flags Affected
The C0, C1, C2, C3 flags are loaded.

Floating-Point Exceptions
None; however, if an unmasked exception is loaded in the status word, it is generated upon execution of the next waiting floating-point instruction.

Protected Mode Exceptions
#GP(0)             If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                   If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#NM                EM or TS in CR0 is set.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
FMUL/FMULP/FIMUL - Multiply

Opcode     Instruction           Description
D8 /1      FMUL m32real          Multiply ST(0) by m32real and store result in ST(0)
DC /1      FMUL m64real          Multiply ST(0) by m64real and store result in ST(0)
D8 C8+i    FMUL ST(0), ST(i)     Multiply ST(0) by ST(i) and store result in ST(0)
DC C8+i    FMUL ST(i), ST(0)     Multiply ST(i) by ST(0) and store result in ST(i)
DE C8+i    FMULP ST(i), ST(0)    Multiply ST(i) by ST(0), store result in ST(i), and pop the register stack
DE C9      FMULP                 Multiply ST(1) by ST(0), store result in ST(1), and pop the register stack
DA /1      FIMUL m32int          Multiply ST(0) by m32int and store result in ST(0)
DE /1      FIMUL m16int          Multiply ST(0) by m16int and store result in ST(0)
Description

These instructions multiply the destination and source operands and store the product in the destination location. The destination operand is always an FPU data register; the source operand can be an FPU data register or a memory location. Source operands in memory can be in single-real, double-real, word-integer, or short-integer formats.

The no-operand version of the instruction multiplies the contents of the ST(1) register by the contents of the ST(0) register and stores the product in the ST(1) register. The one-operand version multiplies the contents of the ST(0) register by the contents of a memory location (either a real or an integer value) and stores the product in the ST(0) register. The two-operand version multiplies the contents of the ST(0) register by the contents of the ST(i) register, or vice versa, with the result being stored in the register specified with the first operand (the destination operand).

The FMULP instructions perform the additional operation of popping the FPU register stack after storing the product. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1. The no-operand version of the floating-point multiply instructions always results in the register stack being popped. In some assemblers, the mnemonic for this instruction is FMUL rather than FMULP.

The FIMUL instructions convert an integer source operand to extended-real format before performing the multiplication.

The sign of the result is always the exclusive-OR of the source signs, even if one or more of the values being multiplied is 0 or ∞. When the source operand is an integer 0, it is treated as a +0.

The following table shows the results obtained when multiplying various classes of numbers, assuming that neither overflow nor underflow occurs.
                              DEST
SRC     -∞     -F     -0     +0     +F     +∞     NaN
-∞      +∞     +∞     *      *      -∞     -∞     NaN
-F      +∞     +F     +0     -0     -F     -∞     NaN
-I      +∞     +F     +0     -0     -F     -∞     NaN
-0      *      +0     +0     -0     -0     *      NaN
+0      *      -0     -0     +0     +0     *      NaN
+I      -∞     -F     -0     +0     +F     +∞     NaN
+F      -∞     -F     -0     +0     +F     +∞     NaN
+∞      -∞     -∞     *      *      +∞     +∞     NaN
NaN     NaN    NaN    NaN    NaN    NaN    NaN    NaN

NOTES:
F   Means finite-real number.
I   Means integer.
*   Indicates invalid-arithmetic-operand (#IA) exception.
Operation
IF instruction is FIMUL
    THEN
        DEST ← DEST ∗ ConvertExtendedReal(SRC);
    ELSE (* source operand is real number *)
        DEST ← DEST ∗ SRC;
FI;
IF instruction = FMULP
    THEN
        PopRegisterStack
FI;

FPU Flags Affected
C1            Set to 0 if stack underflow occurred.
              Indicates rounding direction if the inexact result exception (#P) is generated: 0 = not roundup; 1 = roundup.
C0, C2, C3    Undefined.
Floating-Point Exceptions
#IS    Stack underflow occurred.
#IA    Operand is an sNaN value or unsupported format.
       One operand is ±0 and the other is ±∞.
#D     Source operand is a denormal value.
#U     Result is too small for destination format.
#O     Result is too large for destination format.
#P     Value cannot be represented exactly in destination format.
FNOP - No Operation

Opcode    Instruction    Description
D9 D0     FNOP           No operation is performed.

Description

This instruction performs no FPU operation. This instruction takes up space in the instruction stream but does not affect the FPU or machine context, except the EIP register.

FPU Flags Affected
C0, C1, C2, C3 undefined.

Floating-Point Exceptions
None.

Protected Mode Exceptions
#NM    EM or TS in CR0 is set.
FPATAN - Partial Arctangent

Opcode    Instruction    Description
D9 F3     FPATAN         Replace ST(1) with arctan(ST(1)/ST(0)) and pop the register stack
Description

This instruction computes the arctangent of the source operand in register ST(1) divided by the source operand in register ST(0), stores the result in ST(1), and pops the FPU register stack. The result in register ST(0) has the same sign as the source operand ST(1) and a magnitude less than +π.

The FPATAN instruction returns the angle between the X axis and the line from the origin to the point (X,Y), where Y (the ordinate) is ST(1) and X (the abscissa) is ST(0). The angle depends on the sign of X and Y independently, not just on the sign of the ratio Y/X. This is because a point (−X,Y) is in the second quadrant, resulting in an angle between π/2 and π, while a point (X,−Y) is in the fourth quadrant, resulting in an angle between 0 and −π/2. A point (−X,−Y) is in the third quadrant, giving an angle between −π/2 and −π.

The following table shows the results obtained when computing the arctangent of various classes of numbers, assuming that underflow does not occur.
                                        ST(0)
ST(1)   -∞        -F            -0       +0       +F            +∞       NaN
-∞      -3π/4*    -π/2          -π/2     -π/2     -π/2          -π/4*    NaN
-F      -π        -π to -π/2    -π/2     -π/2     -π/2 to -0    -0       NaN
-0      -π        -π            -π*      -0*      -0            -0       NaN
+0      +π        +π            +π*      +0*      +0            +0       NaN
+F      +π        +π to +π/2    +π/2     +π/2     +π/2 to +0    +0       NaN
+∞      +3π/4*    +π/2          +π/2     +π/2     +π/2          +π/4*    NaN
NaN     NaN       NaN           NaN      NaN      NaN           NaN      NaN

NOTES:
F   Means finite-real number.
*   Table 7-21 in Chapter 7, Floating-Point Unit of the Intel Architecture Software Developer's Manual, Volume 1, specifies that the ratios 0/0 and ∞/∞ generate the floating-point invalid arithmetic-operation exception and, if this exception is masked, the real indefinite value is returned. With the FPATAN instruction, the 0/0 or ∞/∞ value is actually not calculated using division. Instead, the arctangent of the two variables is derived from a common mathematical formulation that is generalized to allow complex numbers as arguments. In this complex variable formulation, arctangent(0,0) etc. has well defined values. These values are needed to develop a library to compute transcendental functions with complex arguments, based on the FPU functions that only allow real numbers as arguments.
There is no restriction on the range of source operands that FPATAN can accept.

Intel Architecture Compatibility

The source operands for this instruction are restricted for the 80287 math coprocessor to the following range:

0 ≤ |ST(1)| < |ST(0)| < +∞

Operation

ST(1) ← arctan(ST(1) / ST(0));
PopRegisterStack;

Floating-Point Exceptions
#IS    Stack underflow occurred.
#IA    Source operand is an sNaN value or unsupported format.
#D     Source operand is a denormal value.
#U     Result is too small for destination format.
#P     Value cannot be represented exactly in destination format.
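
The quadrant-aware result described for FPATAN corresponds to the two-argument arctangent found in most math libraries. The Python sketch below uses math.atan2 as a stand-in; the wrapper name fpatan is hypothetical, and the equivalence is an illustration in double precision, not a statement about how the FPU computes the result.

```python
import math

def fpatan(st1: float, st0: float) -> float:
    """Angle of the point (X, Y) = (ST(0), ST(1)), in the range -pi to +pi."""
    return math.atan2(st1, st0)
```

A point with negative X and positive Y, such as fpatan(1.0, -1.0), gives an angle near 3π/4 in the second quadrant, even though the ratio Y/X is the same as for a fourth-quadrant point. The special cases also line up with the table: fpatan(0.0, -0.0) returns π rather than raising an error.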
FPREM - Partial Remainder

Opcode    Instruction    Description
D9 F8     FPREM          Replace ST(0) with the remainder obtained from dividing ST(0) by ST(1)
Description

This instruction computes the remainder obtained from dividing the value in the ST(0) register (the dividend) by the value in the ST(1) register (the divisor or modulus), and stores the result in ST(0). The remainder represents the following value:

Remainder = ST(0) − (Q ∗ ST(1))

Here, Q is an integer value that is obtained by truncating the real-number quotient of [ST(0) / ST(1)] toward zero. The sign of the remainder is the same as the sign of the dividend. The magnitude of the remainder is less than that of the modulus, unless a partial remainder was computed (as described below).

This instruction produces an exact result; the precision (inexact) exception does not occur and the rounding control has no effect. The following table shows the results obtained when computing the remainder of various classes of numbers, assuming that underflow does not occur.

                                   ST(1)
ST(0)   -∞       -F          -0     +0     +F          +∞       NaN
-∞      *        *           *      *      *           *        NaN
-F      ST(0)    -F or -0    **     **     -F or -0    ST(0)    NaN
-0      -0       -0          *      *      -0          -0       NaN
+0      +0       +0          *      *      +0          +0       NaN
+F      ST(0)    +F or +0    **     **     +F or +0    ST(0)    NaN
+∞      *        *           *      *      *           *        NaN
NaN     NaN      NaN         NaN    NaN    NaN         NaN      NaN

NOTES:
F   Means finite-real number.
*   Indicates floating-point invalid-arithmetic-operand (#IA) exception.
**  Indicates floating-point zero-divide (#Z) exception.

When the result is 0, its sign is the same as that of the dividend. When the modulus is ∞, the result is equal to the value in ST(0).

The FPREM instruction gets its name "partial remainder" because of the way it computes the remainder. This instruction arrives at a remainder through iterative subtraction. It can, however, reduce the exponent of ST(0) by no more than 63 in one execution of the instruction. If the instruction succeeds in producing a remainder that is less than the modulus, the operation is complete and the C2 flag in the FPU status word is cleared. Otherwise, C2 is set, and the result in ST(0) is called the partial remainder. The exponent of the partial remainder will be less than the exponent of the original dividend by at least 32. Software can re-execute the instruction (using the partial remainder in ST(0) as the dividend) until C2 is cleared. (Note that while executing such a remainder-computation loop, a higher-priority interrupting routine that needs the FPU can force a context switch in-between the instructions in the loop.)

An important use of the FPREM instruction is to reduce the arguments of periodic functions. When reduction is complete, the instruction stores the three least-significant bits of the quotient in the C3, C1, and C0 flags of the FPU status word. This information is important in argument reduction for the tangent function (using a modulus of π/4), because it locates the original angle in the correct one of eight sectors of the unit circle.

Operation
D ← exponent(ST(0)) − exponent(ST(1));
IF D < 64
    THEN
        Q ← Integer(TruncateTowardZero(ST(0) / ST(1)));
        ST(0) ← ST(0) − (ST(1) ∗ Q);
        C2 ← 0;
        C0, C3, C1 ← LeastSignificantBits(Q); (* Q2, Q1, Q0 *)
    ELSE
        C2 ← 1;
        N ← an implementation-dependent number between 32 and 63;
        QQ ← Integer(TruncateTowardZero((ST(0) / ST(1)) / 2^(D − N)));
        ST(0) ← ST(0) − (ST(1) ∗ QQ ∗ 2^(D − N));
FI;

FPU Flags Affected
C0    Set to bit 2 (Q2) of the quotient.
C1    Set to 0 if stack underflow occurred; otherwise, set to least significant bit of quotient (Q0).
C2    Set to 0 if reduction complete; set to 1 if incomplete.
C3    Set to bit 1 (Q1) of the quotient.

Floating-Point Exceptions
#IS    Stack underflow occurred.
#IA    Source operand is an sNaN value, modulus is 0, dividend is ∞, or unsupported format.
#D     Source operand is a denormal value.
#U     Result is too small for destination format.
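
The complete-reduction case of FPREM (the D < 64 branch of the Operation section) can be sketched in Python. The function name fprem is hypothetical, and the model assumes finite operands, a nonzero modulus, and operands small enough that the double-precision quotient truncates to the correct integer; it does not model the partial-remainder branch.

```python
import math

def fprem(st0: float, st1: float) -> tuple:
    """Return (remainder, (C0, C3, C1)) for a reduction that completes in one step."""
    q = math.trunc(st0 / st1)         # quotient truncated toward zero
    remainder = math.fmod(st0, st1)   # exact; takes the sign of the dividend
    bits = abs(q)
    c0 = (bits >> 2) & 1              # Q2
    c3 = (bits >> 1) & 1              # Q1
    c1 = bits & 1                     # Q0
    return remainder, (c0, c3, c1)
```

For 7.0 and 2.0 the quotient is 3 (binary 011), so the sketch returns a remainder of 1.0 with C0 = 0, C3 = 1, and C1 = 1.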
FPREM1 - Partial Remainder

Opcode    Instruction    Description
D9 F5     FPREM1         Replace ST(0) with the IEEE remainder obtained from dividing ST(0) by ST(1)
Description

This instruction computes the IEEE remainder obtained from dividing the value in the ST(0) register (the dividend) by the value in the ST(1) register (the divisor or modulus), and stores the result in ST(0). The remainder represents the following value:

Remainder = ST(0) − (Q ∗ ST(1))

Here, Q is an integer value that is obtained by rounding the real-number quotient of [ST(0) / ST(1)] toward the nearest integer value. The magnitude of the remainder is less than half the magnitude of the modulus, unless a partial remainder was computed (as described below).

This instruction produces an exact result; the precision (inexact) exception does not occur and the rounding control has no effect. The following table shows the results obtained when computing the remainder of various classes of numbers, assuming that underflow does not occur.

                                   ST(1)
ST(0)   -∞       -F          -0     +0     +F          +∞       NaN
-∞      *        *           *      *      *           *        NaN
-F      ST(0)    ±F or -0    **     **     ±F or -0    ST(0)    NaN
-0      -0       -0          *      *      -0          -0       NaN
+0      +0       +0          *      *      +0          +0       NaN
+F      ST(0)    ±F or +0    **     **     ±F or +0    ST(0)    NaN
+∞      *        *           *      *      *           *        NaN
NaN     NaN      NaN         NaN    NaN    NaN         NaN      NaN

NOTES:
F   Means finite-real number.
*   Indicates floating-point invalid-arithmetic-operand (#IA) exception.
**  Indicates floating-point zero-divide (#Z) exception.

When the result is 0, its sign is the same as that of the dividend. When the modulus is ∞, the result is equal to the value in ST(0).

The FPREM1 instruction computes the remainder specified in IEEE Std 754. This instruction operates differently from the FPREM instruction in the way that it rounds the quotient of ST(0) divided by ST(1) to an integer (refer to the "Operation" section below).

Like the FPREM instruction, the FPREM1 instruction computes the remainder through iterative subtraction, but can reduce the exponent of ST(0) by no more than 63 in one execution of the instruction. If the instruction succeeds in producing a remainder that is less than one half the modulus, the operation is complete and the C2 flag in the FPU status word is cleared. Otherwise, C2 is set, and the result in ST(0) is called the partial remainder. The exponent of the partial remainder will be less than the exponent of the original dividend by at least 32. Software can re-execute the instruction (using the partial remainder in ST(0) as the dividend) until C2 is cleared. (Note that while executing such a remainder-computation loop, a higher-priority interrupting routine that needs the FPU can force a context switch in-between the instructions in the loop.)

An important use of the FPREM1 instruction is to reduce the arguments of periodic functions. When reduction is complete, the instruction stores the three least-significant bits of the quotient in the C3, C1, and C0 flags of the FPU status word. This information is important in argument reduction for the tangent function (using a modulus of π/4), because it locates the original angle in the correct one of eight sectors of the unit circle.

Operation

D ← exponent(ST(0)) − exponent(ST(1));
IF D < 64
    THEN
        Q ← Integer(RoundTowardNearestInteger(ST(0) / ST(1)));
        ST(0) ← ST(0) − (ST(1) ∗ Q);
        C2 ← 0;
        C0, C3, C1 ← LeastSignificantBits(Q); (* Q2, Q1, Q0 *)
    ELSE
        C2 ← 1;
        N ← an implementation-dependent number between 32 and 63;
        QQ ← Integer(TruncateTowardZero((ST(0) / ST(1)) / 2^(D − N)));
        ST(0) ← ST(0) − (ST(1) ∗ QQ ∗ 2^(D − N));
FI;

FPU Flags Affected
C0    Set to bit 2 (Q2) of the quotient.
C1    Set to 0 if stack underflow occurred; otherwise, set to least significant bit of quotient (Q0).
C2    Set to 0 if reduction complete; set to 1 if incomplete.
C3    Set to bit 1 (Q1) of the quotient.
Floating-Point Exceptions
#IS    Stack underflow occurred.
#IA    Source operand is an sNaN value, modulus (divisor) is 0, dividend is ∞, or unsupported format.
#D     Source operand is a denormal value.
#U     Result is too small for destination format.
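As a rough illustration of the completed-reduction branch (D < 64) of the Operation above, the following Python sketch models FPREM1 on ordinary double-precision floats rather than on the 80-bit FPU registers. The name fprem1_model and the returned dictionary are hypothetical, and taking the flag bits from the magnitude of the quotient is an assumption of this sketch, not something the text above states.

```python
import math
from fractions import Fraction

def fprem1_model(st0, st1):
    """Model the 'reduction complete' case of FPREM1 for finite st0 and finite, non-zero st1."""
    # IEEE 754 remainder: the quotient is rounded to the nearest integer (ties to even).
    r = math.remainder(st0, st1)
    # The IEEE remainder is exact, so Q = (st0 - r) / st1 is an exact integer.
    q = abs(int((Fraction(st0) - Fraction(r)) / Fraction(st1)))
    # C0, C3, C1 receive quotient bits Q2, Q1, Q0; C2 = 0 means reduction is complete.
    flags = {"C0": (q >> 2) & 1, "C3": (q >> 1) & 1, "C1": q & 1, "C2": 0}
    return r, flags
```

For instance, fprem1_model(7.0, 2.0) rounds the quotient 3.5 to 4 and so yields a remainder of -1.0, whereas a truncating remainder of the kind FPREM computes would yield 1.0.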
FPTAN - Partial Tangent

Opcode    Instruction    Clocks    Description
D9 F2     FPTAN          17-173    Replace ST(0) with its tangent and push 1 onto the FPU stack.

Description
This instruction computes the tangent of the source operand in register ST(0), stores the result in ST(0), and pushes a 1.0 onto the FPU register stack. The source operand must be given in radians and must be less than 2^63 in magnitude. The following table shows the unmasked results obtained when computing the partial tangent of various classes of numbers, assuming that underflow does not occur.

ST(0) SRC    ST(0) DEST
−∞           *
−F           −F to +F
−0           −0
+0           +0
+F           −F to +F
+∞           *
NaN          NaN

NOTES:
F    Means finite-real number.
*    Indicates floating-point invalid-arithmetic-operand (#IA) exception.

If the source operand is outside the acceptable range, the C2 flag in the FPU status word is set, and the value in register ST(0) remains unchanged. The instruction does not raise an exception when the source operand is out of range. It is up to the program to check the C2 flag for out-of-range conditions. Source values outside the range −2^63 to +2^63 can be reduced to the range of the instruction by subtracting an appropriate integer multiple of 2π or by using the FPREM instruction with a divisor of 2π. Refer to Section 7.5.8., "Pi" in Chapter 7, "Floating-Point Unit" of the Intel Architecture Software Developer's Manual, Volume 1, for a discussion of the proper value to use for π in performing such reductions.

The value 1.0 is pushed onto the register stack after the tangent has been computed to maintain compatibility with the Intel 8087 and Intel287 math coprocessors. This operation also simplifies the calculation of other trigonometric functions. For instance, the cotangent (which is the reciprocal of the tangent) can be computed by executing an FDIVR instruction after the FPTAN instruction.
Operation
IF ST(0) < 2^63
    THEN
        C2 ← 0;
        ST(0) ← tan(ST(0));
        TOP ← TOP − 1;
        ST(0) ← 1.0;
    ELSE (* source operand is out-of-range *)
        C2 ← 1;
FI;

FPU Flags Affected
C1        Set to 0 if stack underflow occurred; set to 1 if stack overflow occurred. Indicates rounding direction if the inexact result exception (#P) is generated: 0 = not roundup; 1 = roundup.
C2        Set to 1 if source operand is outside the range −2^63 to +2^63; otherwise, cleared to 0.
C0, C3    Undefined.

Floating-Point Exceptions
#IS    Stack underflow occurred.
#IA    Source operand is an sNaN value, ∞, or unsupported format.
#D     Source operand is a denormal value.
#U     Result is too small for destination format.
#P     Value cannot be represented exactly in destination format.
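The stack effect of FPTAN and the cotangent trick mentioned in the description can be sketched in Python as follows. This is only a model on double-precision floats: the list-based stack (last element standing in for ST(0)) and the names fptan_model and cotangent are hypothetical, and NaN, ∞, and exception handling are deliberately left out.

```python
import math

def fptan_model(stack):
    """Apply a simplified FPTAN to a list whose last element stands in for ST(0).

    Returns the value the C2 flag would take.
    """
    x = stack[-1]
    if abs(x) < 2.0 ** 63:
        stack[-1] = math.tan(x)   # ST(0) <- tan(ST(0))
        stack.append(1.0)         # push 1.0; it becomes the new ST(0)
        return 0
    return 1                      # out of range: ST(0) unchanged, C2 set

def cotangent(x):
    stack = [x]
    if fptan_model(stack) != 0:
        raise ValueError("operand out of range; reduce it first")
    one = stack.pop()             # ST(0) holds 1.0
    tangent = stack.pop()         # ST(1) holds tan(x)
    return one / tangent          # reversed divide, the role FDIVR plays after FPTAN
```

The check of the returned flag mirrors the point made above that the program, not the instruction, is responsible for detecting out-of-range operands.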
FRNDINT - Round to Integer

Opcode    Instruction    Description
D9 FC     FRNDINT        Round ST(0) to an integer.

Description
This instruction rounds the source value in the ST(0) register to an integral value according to the current rounding mode (setting of the RC field of the FPU control word), and stores the result in ST(0). If the source value is ∞, the value is not changed. If the source value is not an integral value, the floating-point inexact result exception (#P) is generated.

Operation

ST(0) ← RoundToIntegralValue(ST(0));

Floating-Point Exceptions
#IS    Stack underflow occurred.
#IA    Source operand is an sNaN value or unsupported format.
#D     Source operand is a denormal value.
#P     Source operand is not an integral value.
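To make the dependence on the rounding mode concrete, here is a small Python sketch that maps the four RC encodings of the FPU control word onto standard-library rounding functions. The name frndint_model and the (result, inexact) return shape are hypothetical; the model works on doubles and ignores sNaN and denormal handling.

```python
import math

# RC field encodings of the FPU control word mapped to equivalent Python rounding functions.
ROUNDERS = {
    0b00: round,        # round to nearest, ties to even
    0b01: math.floor,   # round down, toward -infinity
    0b10: math.ceil,    # round up, toward +infinity
    0b11: math.trunc,   # round toward zero (truncate)
}

def frndint_model(x, rc=0b00):
    """Return (rounded value, inexact flag) for the given rounding-control setting."""
    if math.isnan(x) or math.isinf(x):
        return x, False                       # infinities pass through unchanged
    # copysign keeps the sign of a zero result, e.g. -0.4 rounds to -0.0.
    rounded = math.copysign(float(ROUNDERS[rc](x)), x)
    return rounded, rounded != x              # inexact when the source was not integral
```

With the default RC of 00B, frndint_model(2.5) gives 2.0 because ties go to the even neighbor, while RC = 10B gives 3.0.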
FRSTOR - Restore FPU State

Opcode    Instruction           Description
DD /4     FRSTOR m94/108byte    Load FPU state from m94byte or m108byte.

Description
This instruction loads the FPU state (operating environment and register stack) from the memory area specified with the source operand. This state data is typically written to the specified memory location by a previous FSAVE/FNSAVE instruction.

The FPU operating environment consists of the FPU control word, status word, tag word, instruction pointer, data pointer, and last opcode. Figures 7-13 through 7-16 in Chapter 7, "Floating-Point Unit" of the Intel Architecture Software Developer's Manual, Volume 1, show the layout in memory of the stored environment, depending on the operating mode of the processor (protected or real) and the current operand-size attribute (16-bit or 32-bit). In virtual-8086 mode, the real mode layouts are used. The contents of the FPU register stack are stored in the 80 bytes immediately following the operating environment image.

The FRSTOR instruction should be executed in the same operating mode as the corresponding FSAVE/FNSAVE instruction.

Intel Architecture Compatibility
On a Pentium III processor, the FRSTOR instruction operates the same as on a Pentium II processor. It has no effect on the SIMD floating-point functional unit or control/status register; that is, it does not restore the SIMD floating-point processor state.

Operation

FPUControlWord ← SRC(FPUControlWord);
FPUStatusWord ← SRC(FPUStatusWord);
FPUTagWord ← SRC(FPUTagWord);
FPUDataPointer ← SRC(FPUDataPointer);
FPUInstructionPointer ← SRC(FPUInstructionPointer);
FPULastInstructionOpcode ← SRC(FPULastInstructionOpcode);
ST(0) ← SRC(ST(0));
ST(1) ← SRC(ST(1));
ST(2) ← SRC(ST(2));
ST(3) ← SRC(ST(3));
ST(4) ← SRC(ST(4));
ST(5) ← SRC(ST(5));
ST(6) ← SRC(ST(6));
ST(7) ← SRC(ST(7));
FPU Flags Affected
The C0, C1, C2, C3 flags are loaded.

Floating-Point Exceptions
None; however, this operation might unmask an existing exception that has been detected but not generated, because it was masked. Here, the exception is generated at the completion of the instruction.

Protected Mode Exceptions
#GP(0)             If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                   If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#NM                EM or TS in CR0 is set.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
Comments This instruction has no effect on the state of SIMD floating-point registers.
FSAVE/FNSAVE - Store FPU State

Opcode      Instruction            Description
9B DD /6    FSAVE m94/108byte      Store FPU state to m94byte or m108byte after checking for pending unmasked floating-point exceptions. Then reinitialize the FPU.
DD /6       FNSAVE* m94/108byte    Store FPU environment to m94byte or m108byte without checking for pending unmasked floating-point exceptions. Then reinitialize the FPU.

NOTE:
* Refer to Intel Architecture Compatibility below.

Description
These instructions store the current FPU state (operating environment and register stack) at the specified destination in memory, and then reinitialize the FPU. The FSAVE instruction checks for and handles pending unmasked floating-point exceptions before storing the FPU state; the FNSAVE instruction does not.

The FPU operating environment consists of the FPU control word, status word, tag word, instruction pointer, data pointer, and last opcode. Figures 7-13 through 7-16 in Chapter 7, "Floating-Point Unit" of the Intel Architecture Software Developer's Manual, Volume 1, show the layout in memory of the stored environment, depending on the operating mode of the processor (protected or real) and the current operand-size attribute (16-bit or 32-bit). In virtual-8086 mode, the real mode layouts are used. The contents of the FPU register stack are stored in the 80 bytes immediately following the operating environment image.

The saved image reflects the state of the FPU after all floating-point instructions preceding the FSAVE/FNSAVE instruction in the instruction stream have been executed.

After the FPU state has been saved, the FPU is reset to the same default values it is set to with the FINIT/FNINIT instructions (refer to "FINIT/FNINIT - Initialize Floating-Point Unit" in this chapter).

The FSAVE/FNSAVE instructions are typically used when the operating system needs to perform a context switch, an exception handler needs to use the FPU, or an application program needs to pass a "clean" FPU to a procedure.

Intel Architecture Compatibility
For Intel math coprocessors and FPUs prior to the Intel Pentium processor, an FWAIT instruction should be executed before attempting to read from the memory image stored with a prior FSAVE/FNSAVE instruction. This FWAIT instruction helps ensure that the storage operation has been completed.

On a Pentium III processor, the FSAVE/FNSAVE instructions operate the same as on a Pentium II processor. They have no effect on the Pentium III processor SIMD floating-point functional unit or control/status register; that is, they do not save the SIMD floating-point processor state.
When operating a Pentium or Intel486 processor in MS-DOS compatibility mode, it is possible (under unusual circumstances) for an FNSAVE instruction to be interrupted prior to being executed to handle a pending FPU exception. Refer to Section E.2.1.3, "No-Wait FPU Instructions Can Get FPU Interrupt in Window" in Appendix E, "Guidelines for Writing FPU Exception Handlers" of the Intel Architecture Software Developer's Manual, Volume 1, for a description of these circumstances. An FNSAVE instruction cannot be interrupted in this way on a Pentium Pro processor.

Operation

(* Save FPU State and Registers *)
DEST(FPUControlWord) ← FPUControlWord;
DEST(FPUStatusWord) ← FPUStatusWord;
DEST(FPUTagWord) ← FPUTagWord;
DEST(FPUDataPointer) ← FPUDataPointer;
DEST(FPUInstructionPointer) ← FPUInstructionPointer;
DEST(FPULastInstructionOpcode) ← FPULastInstructionOpcode;
DEST(ST(0)) ← ST(0);
DEST(ST(1)) ← ST(1);
DEST(ST(2)) ← ST(2);
DEST(ST(3)) ← ST(3);
DEST(ST(4)) ← ST(4);
DEST(ST(5)) ← ST(5);
DEST(ST(6)) ← ST(6);
DEST(ST(7)) ← ST(7);
(* Initialize FPU *)
FPUControlWord ← 037FH;
FPUStatusWord ← 0;
FPUTagWord ← FFFFH;
FPUDataPointer ← 0;
FPUInstructionPointer ← 0;
FPULastInstructionOpcode ← 0;

FPU Flags Affected
The C0, C1, C2, and C3 flags are saved and then cleared.

Floating-Point Exceptions
None.
Protected Mode Exceptions
#GP(0)             If destination is located in a nonwritable segment.
                   If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                   If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#NM                EM or TS in CR0 is set.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
Comments This instruction has no effect on the state of SIMD floating-point registers.
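The two image sizes in the operand name m94/108byte follow from the figures given in the description: an environment image (14 or 28 bytes, the sizes used by FSTENV) followed by 80 bytes of register stack. The Python sketch below works through that arithmetic and splits the reinitialized tag word from the Operation section into per-register tags. All names are hypothetical, and reading the 2-bit tag value 11B as "empty" comes from the general FPU tag-word definition rather than from this section.

```python
# Environment image size in bytes, keyed by operand-size attribute in bits.
ENVIRONMENT_BYTES = {16: 14, 32: 28}
# Eight data registers, each stored as an 80-bit (10-byte) extended-real value.
REGISTER_STACK_BYTES = 8 * 10

def fsave_image_size(operand_size_bits):
    """Total bytes written by FSAVE/FNSAVE for the given operand-size attribute."""
    return ENVIRONMENT_BYTES[operand_size_bits] + REGISTER_STACK_BYTES

# Values the FPU holds after the save, taken from the Operation section.
REINITIALIZED = {"control_word": 0x037F, "status_word": 0x0000, "tag_word": 0xFFFF}

def register_tags(tag_word):
    """Split the tag word into eight 2-bit tags; 0b11 marks an empty register."""
    return [(tag_word >> (2 * i)) & 0b11 for i in range(8)]
```

A save area sized with fsave_image_size(32) is therefore large enough for either layout.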
FSCALE - Scale

Opcode    Instruction    Description
D9 FD     FSCALE         Scale ST(0) by ST(1).

Description
This instruction multiplies the destination operand by 2 to the power of the source operand and stores the result in the destination operand. The destination operand is a real value that is located in register ST(0). The source operand is the value in register ST(1) truncated toward 0 to an integer (that is, the nearest integer whose magnitude does not exceed that of the value in ST(1)). This instruction provides rapid multiplication or division by integral powers of 2 because it is implemented by simply adding an integer value (the source operand) to the exponent of the value in register ST(0). The following table shows the results obtained when scaling various classes of numbers, assuming that neither overflow nor underflow occurs.

                  ST(1)
ST(0)      −N      0       +N
−∞         −∞      −∞      −∞
−F         −F      −F      −F
−0         −0      −0      −0
+0         +0      +0      +0
+F         +F      +F      +F
+∞         +∞      +∞      +∞
NaN        NaN     NaN     NaN

NOTES:
F    Means finite-real number.
N    Means integer.

In most cases, only the exponent is changed and the mantissa (significand) remains unchanged. However, when the value being scaled in ST(0) is a denormal value, the mantissa is also changed and the result may turn out to be a normalized number. Similarly, if overflow or underflow results from a scale operation, the resulting mantissa will differ from the source's mantissa.

The FSCALE instruction can also be used to reverse the action of the FXTRACT instruction, as shown in the following example:

    FXTRACT;
    FSCALE;
    FSTP ST(1);
In this example, the FXTRACT instruction extracts the significand and exponent from the value in ST(0) and stores them in ST(0) and ST(1) respectively. The FSCALE instruction then scales the significand in ST(0) by the exponent in ST(1), recreating the original value before the FXTRACT operation was performed. The FSTP ST(1) instruction overwrites the exponent (extracted by the FXTRACT instruction) with the recreated value, which returns the stack to its original state with only one register [ST(0)] occupied.

Operation

ST(0) ← ST(0) ∗ 2^ST(1);

Floating-Point Exceptions
#IS    Stack underflow occurred.
#IA    Source operand is an sNaN value or unsupported format.
#D     Source operand is a denormal value.
#U     Result is too small for destination format.
#O     Result is too large for destination format.
#P     Value cannot be represented exactly in destination format.
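The FXTRACT/FSCALE round trip described above can be approximated with Python's math.frexp and math.ldexp. This is a sketch on doubles for finite, non-zero inputs only; the names fxtract_model and fscale_model are hypothetical, and overflow, underflow, and denormal handling are not modeled.

```python
import math

def fxtract_model(x):
    """Return (significand with magnitude in [1, 2), unbiased exponent) for finite non-zero x."""
    m, e = math.frexp(x)            # x == m * 2**e with 0.5 <= |m| < 1
    return m * 2.0, float(e - 1)    # renormalize so that 1 <= |significand| < 2

def fscale_model(st0, st1):
    """Multiply st0 by 2 raised to st1 truncated toward zero."""
    return math.ldexp(st0, math.trunc(st1))
```

For example, fxtract_model(12.5) gives (1.5625, 3.0), and feeding that pair to fscale_model recreates 12.5. A non-integral scale factor such as 2.9 is truncated to 2 before use.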
FSIN - Sine

Opcode    Instruction    Description
D9 FE     FSIN           Replace ST(0) with its sine.

Description
This instruction calculates the sine of the source operand in register ST(0) and stores the result in ST(0). The source operand must be given in radians and must be within the range −2^63 to +2^63. The following table shows the results obtained when taking the sine of various classes of numbers, assuming that underflow does not occur.

SRC (ST(0))    DEST (ST(0))
−∞             *
−F             −1 to +1
−0             −0
+0             +0
+F             −1 to +1
+∞             *
NaN            NaN

NOTES:
F    Means finite-real number.
*    Indicates floating-point invalid-arithmetic-operand (#IA) exception.

If the source operand is outside the acceptable range, the C2 flag in the FPU status word is set, and the value in register ST(0) remains unchanged. The instruction does not raise an exception when the source operand is out of range. It is up to the program to check the C2 flag for out-of-range conditions. Source values outside the range −2^63 to +2^63 can be reduced to the range of the instruction by subtracting an appropriate integer multiple of 2π or by using the FPREM instruction with a divisor of 2π. Refer to Section 7.5.8., "Pi" in Chapter 7, "Floating-Point Unit" of the Intel Architecture Software Developer's Manual, Volume 1, for a discussion of the proper value to use for π in performing such reductions.

Operation

IF ST(0) < 2^63
    THEN
        C2 ← 0;
        ST(0) ← sin(ST(0));
    ELSE (* source operand out of range *)
        C2 ← 1;
FI;
FPU Flags Affected
C1        Set to 0 if stack underflow occurred. Indicates rounding direction if the inexact result exception (#P) is generated: 0 = not roundup; 1 = roundup.
C2        Set to 1 if source operand is outside the range −2^63 to +2^63; otherwise, cleared to 0.
C0, C3    Undefined.

Floating-Point Exceptions
#IS    Stack underflow occurred.
#IA    Source operand is an sNaN value, ∞, or unsupported format.
#D     Source operand is a denormal value.
#P     Value cannot be represented exactly in destination format.
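The check-C2-then-reduce pattern that the description leaves to the program might look like the following Python sketch. It is a model on doubles, with hypothetical names fsin_model and sin_with_reduction. math.fmod stands in for the truncating remainder that FPREM computes, and the double-precision value of 2π used here is an assumption that loses accuracy for very large arguments, which is the concern the referenced "Pi" section addresses.

```python
import math

LIMIT = 2.0 ** 63

def fsin_model(x):
    """Return (result, C2); an out-of-range operand is returned unchanged with C2 = 1."""
    if abs(x) < LIMIT:
        return math.sin(x), 0
    return x, 1

def sin_with_reduction(x):
    """Retry with a reduced argument when the first attempt reports C2 = 1."""
    result, c2 = fsin_model(x)
    if c2:
        # Truncating remainder by 2*pi, the role FPREM plays in the text.
        result, c2 = fsin_model(math.fmod(x, 2.0 * math.pi))
    return result
```

Because the instruction raises no exception for out-of-range operands, omitting the C2 check would silently leave the unreduced operand in place as if it were the sine.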
FSINCOS - Sine and Cosine

Opcode    Instruction    Description
D9 FB     FSINCOS        Compute the sine and cosine of ST(0); replace ST(0) with the sine, and push the cosine onto the register stack.

Description
This instruction computes both the sine and the cosine of the source operand in register ST(0), stores the sine in ST(0), and pushes the cosine onto the top of the FPU register stack. (This instruction is faster than executing the FSIN and FCOS instructions in succession.)

The source operand must be given in radians and must be within the range −2^63 to +2^63. The following table shows the results obtained when taking the sine and cosine of various classes of numbers, assuming that underflow does not occur. Because the cosine is pushed after the sine is stored, the cosine ends up in ST(0) and the sine in ST(1) once the instruction completes.

SRC ST(0)    DEST Cosine (pushed)    DEST Sine (stored)
−∞           *                       *
−F           −1 to +1                −1 to +1
−0           +1                      −0
+0           +1                      +0
+F           −1 to +1                −1 to +1
+∞           *                       *
NaN          NaN                     NaN

NOTES:
F    Means finite-real number.
*    Indicates floating-point invalid-arithmetic-operand (#IA) exception.

If the source operand is outside the acceptable range, the C2 flag in the FPU status word is set, and the value in register ST(0) remains unchanged. The instruction does not raise an exception when the source operand is out of range. It is up to the program to check the C2 flag for out-of-range conditions. Source values outside the range −2^63 to +2^63 can be reduced to the range of the instruction by subtracting an appropriate integer multiple of 2π or by using the FPREM instruction with a divisor of 2π. Refer to Section 7.5.8., "Pi" in Chapter 7, "Floating-Point Unit" of the Intel Architecture Software Developer's Manual, Volume 1, for a discussion of the proper value to use for π in performing such reductions.
Operation
IF ST(0) < 2^63
    THEN
        C2 ← 0;
        TEMP ← cosine(ST(0));
        ST(0) ← sine(ST(0));
        TOP ← TOP − 1;
        ST(0) ← TEMP;
    ELSE (* source operand out of range *)
        C2 ← 1;
FI;

FPU Flags Affected
C1        Set to 0 if stack underflow occurred; set to 1 if stack overflow occurred. Indicates rounding direction if the inexact result exception (#P) is generated: 0 = not roundup; 1 = roundup.
C2        Set to 1 if source operand is outside the range −2^63 to +2^63; otherwise, cleared to 0.
C0, C3    Undefined.

Floating-Point Exceptions
#IS    Stack underflow occurred.
#IA    Source operand is an sNaN value, ∞, or unsupported format.
#D     Source operand is a denormal value.
#U     Result is too small for destination format.
#P     Value cannot be represented exactly in destination format.
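The order of the two results on the register stack is easy to get wrong, so the Operation pseudocode is transcribed here into a short Python model. The list-based stack (last element standing in for ST(0)) and the name fsincos_model are hypothetical, and the model ignores NaN, ∞, and exceptions.

```python
import math

def fsincos_model(stack):
    """Apply a simplified FSINCOS to a list whose last element stands in for ST(0).

    Returns the value the C2 flag would take.
    """
    x = stack[-1]
    if not abs(x) < 2.0 ** 63:
        return 1                  # out of range: stack untouched, C2 set
    temp = math.cos(x)            # TEMP <- cosine(ST(0))
    stack[-1] = math.sin(x)       # ST(0) <- sine(ST(0))
    stack.append(temp)            # TOP <- TOP - 1; ST(0) <- TEMP
    return 0
```

After a successful call the cosine is the last list element (the new ST(0)) and the sine sits just below it (ST(1)).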
FSQRT - Square Root

Opcode    Instruction    Description
D9 FA     FSQRT          Calculates square root of ST(0) and stores the result in ST(0).

Description
This instruction calculates the square root of the source value in the ST(0) register and stores the result in ST(0). The following table shows the results obtained when taking the square root of various classes of numbers, assuming that neither overflow nor underflow occurs.

SRC (ST(0))    DEST (ST(0))
−∞             *
−F             *
−0             −0
+0             +0
+F             +F
+∞             +∞
NaN            NaN

NOTES:
F    Means finite-real number.
*    Indicates floating-point invalid-arithmetic-operand (#IA) exception.
Operation
ST(0) ← SquareRoot(ST(0));

FPU Flags Affected
C1            Set to 0 if stack underflow occurred. Indicates rounding direction if inexact result exception (#P) is generated: 0 = not roundup; 1 = roundup.
C0, C2, C3    Undefined.
Floating-Point Exceptions
#IS    Stack underflow occurred.
#IA    Source operand is an sNaN value or unsupported format.
       Source operand is a negative value (except for −0).
#D     Source operand is a denormal value.
#P     Value cannot be represented exactly in destination format.
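The special cases in the FSQRT results table, in particular that −0 is a valid operand while every other negative value is not, can be checked with a small Python model. The name fsqrt_model and the (result, invalid) return shape are hypothetical, and returning a NaN for the invalid case is an assumption about the masked response rather than something stated in this section.

```python
import math

def fsqrt_model(x):
    """Return (result, invalid flag) following the FSQRT results table."""
    if math.isnan(x):
        return x, False             # NaN in, NaN out (sNaN signaling is not modeled)
    if x < 0.0:                     # -0.0 < 0.0 is False, so -0 falls through
        return math.nan, True       # negative operand: invalid-arithmetic-operand case
    return math.sqrt(x), False      # sqrt(-0.0) is -0.0 and sqrt(+inf) is +inf
```

The comparison x < 0.0 is what separates −0 from the other negative operands here, since −0.0 compares equal to 0.0.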
FST/FSTP - Store Real

Opcode     Instruction     Description
D9 /2      FST m32real     Copy ST(0) to m32real
DD /2      FST m64real     Copy ST(0) to m64real
DD D0+i    FST ST(i)       Copy ST(0) to ST(i)
D9 /3      FSTP m32real    Copy ST(0) to m32real and pop register stack
DD /3      FSTP m64real    Copy ST(0) to m64real and pop register stack
DB /7      FSTP m80real    Copy ST(0) to m80real and pop register stack
DD D8+i    FSTP ST(i)      Copy ST(0) to ST(i) and pop register stack

Description
The FST instruction copies the value in the ST(0) register to the destination operand, which can be a memory location or another register in the FPU register stack. When storing the value in memory, the value is converted to single- or double-real format.

The FSTP instruction performs the same operation as the FST instruction and then pops the register stack. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1. The FSTP instruction can also store values in memory in extended-real format.

If the destination operand is a memory location, the operand specifies the address where the first byte of the destination value is to be stored. If the destination operand is a register, the operand specifies a register in the register stack relative to the top of the stack.

If the destination size is single- or double-real, the significand of the value being stored is rounded to the width of the destination (according to the rounding mode specified by the RC field of the FPU control word), and the exponent is converted to the width and bias of the destination format. If the value being stored is too large for the destination format, a numeric overflow exception (#O) is generated and, if the exception is unmasked, no value is stored in the destination operand. If the value being stored is a denormal value, the denormal exception (#D) is not generated. This condition is simply signaled as a numeric underflow exception (#U) condition.

If the value being stored is ±0, ±∞, or a NaN, the least-significant bits of the significand and the exponent are truncated to fit the destination format. This operation preserves the value's identity as a 0, ∞, or NaN.

If the destination operand is a non-empty register, the invalid operation exception is not generated.

Operation

DEST ← ST(0);
IF instruction = FSTP
    THEN
        PopRegisterStack;
FI;
FPU Flags Affected
C1            Set to 0 if stack underflow occurred. Indicates rounding direction if the floating-point inexact exception (#P) is generated: 0 = not roundup; 1 = roundup.
C0, C2, C3    Undefined.

Floating-Point Exceptions
#IS    Stack underflow occurred.
#IA    Source operand is an sNaN value or unsupported format.
#U     Result is too small for the destination format.
#O     Result is too large for the destination format.
#P     Value cannot be represented exactly in destination format.
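The narrowing that FST performs when the destination is m32real can be imitated in Python with the struct module, which converts a double to IEEE single precision. This is only an analogy: the source here is a 64-bit double rather than an 80-bit register value, the rounding is whatever struct applies (round to nearest in current CPython, assumed here) rather than the RC field, and the name fst_m32real and its status strings are hypothetical.

```python
import math
import struct

def fst_m32real(x):
    """Return (stored value, status) for storing a double in single-real format."""
    try:
        stored = struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        return None, "#O"            # models the unmasked overflow case: nothing is stored
    if math.isnan(x) or stored == x:
        return stored, "exact"       # 0, infinity, and NaN keep their identity
    return stored, "#P"              # significand was rounded: inexact result
```

Values such as 0.5 survive unchanged, 0.1 is rounded and reported as inexact, and 1e40 is too large for single-real format.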
FSTCW/FNSTCW - Store Control Word

Opcode      Instruction       Description
9B D9 /7    FSTCW m2byte      Store FPU control word to m2byte after checking for pending unmasked floating-point exceptions.
D9 /7       FNSTCW* m2byte    Store FPU control word to m2byte without checking for pending unmasked floating-point exceptions.

NOTE:
* Refer to Intel Architecture Compatibility below.

Description
These instructions store the current value of the FPU control word at the specified destination in memory. The FSTCW instruction checks for and handles pending unmasked floating-point exceptions before storing the control word; the FNSTCW instruction does not.

Intel Architecture Compatibility
When operating a Pentium or Intel486 processor in MS-DOS compatibility mode, it is possible (under unusual circumstances) for an FNSTCW instruction to be interrupted prior to being executed to handle a pending FPU exception. Refer to Section E.2.1.3, "No-Wait FPU Instructions Can Get FPU Interrupt in Window" in Appendix E, "Guidelines for Writing FPU Exception Handlers" of the Intel Architecture Software Developer's Manual, Volume 1, for a description of these circumstances. An FNSTCW instruction cannot be interrupted in this way on a Pentium Pro processor.

On a Pentium III processor, the FSTCW/FNSTCW instructions operate the same as on a Pentium II processor. They have no effect on the Pentium III processor SIMD floating-point functional unit or control/status register.

Operation

DEST ← FPUControlWord;

FPU Flags Affected
The C0, C1, C2, and C3 flags are undefined.

Floating-Point Exceptions
None.
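Once the control word has been stored to memory, a program typically picks its fields apart. The Python sketch below shows one way to do that for a 16-bit value such as the one FSTCW writes. The field positions (exception masks in bits 0-5, precision control in bits 8-9, rounding control in bits 10-11) come from the general FPU control word layout rather than from this section, and the function names are hypothetical.

```python
def decode_control_word(cw):
    """Split a 16-bit FPU control word into its main fields."""
    return {
        "exception_masks": cw & 0x3F,              # bits 0-5: IM, DM, ZM, OM, UM, PM
        "precision_control": (cw >> 8) & 0b11,     # bits 8-9: PC field
        "rounding_control": (cw >> 10) & 0b11,     # bits 10-11: RC field
    }

def with_rounding_control(cw, rc):
    """Return a copy of the control word with the RC field replaced."""
    return (cw & ~0x0C00 & 0xFFFF) | ((rc & 0b11) << 10)
```

Decoding the default value 037FH, for example, shows all six exception masks set and a rounding control of 00B (round to nearest).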
FSTENV/FNSTENV - Store FPU Environment

Opcode      Instruction             Description
9B D9 /6    FSTENV m14/28byte       Store FPU environment to m14byte or m28byte after checking for pending unmasked floating-point exceptions. Then mask all floating-point exceptions.
D9 /6       FNSTENV* m14/28byte     Store FPU environment to m14byte or m28byte without checking for pending unmasked floating-point exceptions. Then mask all floating-point exceptions.

NOTE:
* Refer to Intel Architecture Compatibility below.

Description
These instructions save the current FPU operating environment at the memory location specified with the destination operand, and then mask all floating-point exceptions. The FPU operating environment consists of the FPU control word, status word, tag word, instruction pointer, data pointer, and last opcode. Figures 7-13 through 7-16 in Chapter 7, "Floating-Point Unit" of the Intel Architecture Software Developer's Manual, Volume 1, show the layout in memory of the stored environment, depending on the operating mode of the processor (protected or real) and the current operand-size attribute (16-bit or 32-bit). In virtual-8086 mode, the real mode layouts are used.

The FSTENV instruction checks for and handles any pending unmasked floating-point exceptions before storing the FPU environment; the FNSTENV instruction does not. The saved image reflects the state of the FPU after all floating-point instructions preceding the FSTENV/FNSTENV instruction in the instruction stream have been executed.

These instructions are often used by exception handlers because they provide access to the FPU instruction and data pointers. The environment is typically saved in the stack. Masking all exceptions after saving the environment prevents floating-point exceptions from interrupting the exception handler.

Intel Architecture Compatibility
When operating a Pentium or Intel486 processor in MS-DOS compatibility mode, it is possible (under unusual circumstances) for an FNSTENV instruction to be interrupted prior to being executed to handle a pending FPU exception. Refer to Section E.2.1.3, "No-Wait FPU Instructions Can Get FPU Interrupt in Window" in Appendix E, "Guidelines for Writing FPU Exception Handlers" of the Intel Architecture Software Developer's Manual, Volume 1, for a description of these circumstances. An FNSTENV instruction cannot be interrupted in this way on a Pentium Pro processor.

On a Pentium III processor, the FSTENV/FNSTENV instructions operate the same as on a Pentium II processor. They have no effect on the Pentium III processor SIMD floating-point functional unit or control/status register.
Operation
DEST(FPUControlWord) ← FPUControlWord;
DEST(FPUStatusWord) ← FPUStatusWord;
DEST(FPUTagWord) ← FPUTagWord;
DEST(FPUDataPointer) ← FPUDataPointer;
DEST(FPUInstructionPointer) ← FPUInstructionPointer;
DEST(FPULastInstructionOpcode) ← FPULastInstructionOpcode;

FPU Flags Affected
The C0, C1, C2, and C3 flags are undefined.

Floating-Point Exceptions
None.

Protected Mode Exceptions
#GP(0)             If the destination is located in a nonwritable segment.
                   If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                   If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#NM                EM or TS in CR0 is set.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
FSTSW/FNSTSW - Store Status Word

Opcode      Instruction       Description
9B DD /7    FSTSW m2byte      Store FPU status word at m2byte after checking for pending unmasked floating-point exceptions.
9B DF E0    FSTSW AX          Store FPU status word in AX register after checking for pending unmasked floating-point exceptions.
DD /7       FNSTSW* m2byte    Store FPU status word at m2byte without checking for pending unmasked floating-point exceptions.
DF E0       FNSTSW* AX        Store FPU status word in AX register without checking for pending unmasked floating-point exceptions.

NOTE:
* Refer to Intel Architecture Compatibility below.

Description
These instructions store the current value of the FPU status word in the destination location. The destination operand can be either a two-byte memory location or the AX register. The FSTSW instruction checks for and handles pending unmasked floating-point exceptions before storing the status word; the FNSTSW instruction does not.

The FNSTSW AX form of the instruction is used primarily in conditional branching (for instance, after an FPU comparison instruction or an FPREM, FPREM1, or FXAM instruction), where the direction of the branch depends on the state of the FPU condition code flags. Refer to Section 7.3.3., "Branching and Conditional Moves on FPU Condition Codes" in Chapter 7, "Floating-Point Unit" of the Intel Architecture Software Developer's Manual, Volume 1. This instruction can also be used to invoke exception handlers (by examining the exception flags) in environments that do not use interrupts. When the FNSTSW AX instruction is executed, the AX register is updated before the processor executes any further instructions. The status stored in the AX register is thus guaranteed to be from the completion of the prior FPU instruction.

Intel Architecture Compatibility
When operating a Pentium or Intel486 processor in MS-DOS compatibility mode, it is possible (under unusual circumstances) for an FNSTSW instruction to be interrupted prior to being executed to handle a pending FPU exception. Refer to Section E.2.1.3, "No-Wait FPU Instructions Can Get FPU Interrupt in Window" in Appendix E, "Guidelines for Writing FPU Exception Handlers" of the Intel Architecture Software Developer's Manual, Volume 1, for a description of these circumstances. An FNSTSW instruction cannot be interrupted in this way on a Pentium Pro processor.

On a Pentium III processor, the FSTSW/FNSTSW instructions operate the same as on a Pentium II processor. They have no effect on the Pentium III processor SIMD floating-point functional unit or control/status register.
Operation
DEST ← FPUStatusWord;

FPU Flags Affected
The C0, C1, C2, and C3 flags are undefined.

Floating-Point Exceptions
None.

Protected Mode Exceptions
#GP(0)             If the destination is located in a nonwritable segment.
                   If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                   If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#NM                EM or TS in CR0 is set.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
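To illustrate the conditional-branching use described for FNSTSW AX, the following Python sketch decodes a stored status word and interprets the condition codes left by a comparison. The bit positions (C0 = bit 8, C1 = bit 9, C2 = bit 10, TOP = bits 11-13, C3 = bit 14) and the comparison encoding come from the general status-word and FCOM descriptions rather than from this section, and both function names are hypothetical.

```python
def decode_status_word(sw):
    """Extract the condition codes, TOP field, and exception flags from a status word."""
    return {
        "C0": (sw >> 8) & 1,
        "C1": (sw >> 9) & 1,
        "C2": (sw >> 10) & 1,
        "TOP": (sw >> 11) & 0b111,
        "C3": (sw >> 14) & 1,
        "exception_flags": sw & 0x3F,
    }

def compare_outcome(sw):
    """Interpret C3, C2, C0 as left by an FPU comparison of ST(0) with a source operand."""
    f = decode_status_word(sw)
    return {
        (0, 0, 0): "ST(0) > SRC",
        (0, 0, 1): "ST(0) < SRC",
        (1, 0, 0): "ST(0) = SRC",
        (1, 1, 1): "unordered",
    }.get((f["C3"], f["C2"], f["C0"]), "undefined")
```

In assembly the same test is usually made by masking AX or transferring AH to the EFLAGS register, but the decision logic is the same as in compare_outcome.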
FSUB/FSUBP/FISUB - Subtract

Opcode     Instruction            Description
D8 /4      FSUB m32real           Subtract m32real from ST(0) and store result in ST(0)
DC /4      FSUB m64real           Subtract m64real from ST(0) and store result in ST(0)
D8 E0+i    FSUB ST(0), ST(i)      Subtract ST(i) from ST(0) and store result in ST(0)
DC E8+i    FSUB ST(i), ST(0)      Subtract ST(0) from ST(i) and store result in ST(i)
DE E8+i    FSUBP ST(i), ST(0)     Subtract ST(0) from ST(i), store result in ST(i), and pop register stack
DE E9      FSUBP                  Subtract ST(0) from ST(1), store result in ST(1), and pop register stack
DA /4      FISUB m32int           Subtract m32int from ST(0) and store result in ST(0)
DE /4      FISUB m16int           Subtract m16int from ST(0) and store result in ST(0)

Description
These instructions subtract the source operand from the destination operand and store the difference in the destination location. The destination operand is always an FPU data register; the source operand can be a register or a memory location. Source operands in memory can be in single-real, double-real, word-integer, or short-integer formats.

The no-operand version of the instruction subtracts the contents of the ST(0) register from the ST(1) register and stores the result in ST(1). The one-operand version subtracts the contents of a memory location (either a real or an integer value) from the contents of the ST(0) register and stores the result in ST(0). The two-operand version subtracts the contents of the ST(0) register from the ST(i) register or vice versa.

The FSUBP instructions perform the additional operation of popping the FPU register stack following the subtraction. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1. The no-operand version of the floating-point subtract instructions always results in the register stack being popped. In some assemblers, the mnemonic for this instruction is FSUB rather than FSUBP.

The FISUB instructions convert an integer source operand to extended-real format before performing the subtraction.

The following table shows the results obtained when subtracting various classes of numbers from one another, assuming that neither overflow nor underflow occurs. Here, the SRC value is subtracted from the DEST value (DEST − SRC = result).

When the difference between two operands of like sign is 0, the result is +0, except for the round toward −∞ mode, in which case the result is −0. This instruction also guarantees that +0 − (−0) = +0, and that −0 − (+0) = −0. When the source operand is an integer 0, it is treated as a +0.

When one operand is ∞, the result is ∞ of the expected sign. If both operands are ∞ of the same sign, an invalid operation exception is generated.
                                        SRC
DEST    -∞      -F or -I    -0      +0      +F or +I    +∞      NaN
-∞      *       -∞          -∞      -∞      -∞          -∞      NaN
-F      +∞      ±F or ±0    DEST    DEST    -F          -∞      NaN
-0      +∞      -SRC        ±0      -0      -SRC        -∞      NaN
+0      +∞      -SRC        +0      ±0      -SRC        -∞      NaN
+F      +∞      +F          DEST    DEST    ±F or ±0    -∞      NaN
+∞      +∞      +∞          +∞      +∞      +∞          *       NaN
NaN     NaN     NaN         NaN     NaN     NaN         NaN     NaN

NOTES:
F  Means finite-real number.
I  Means integer.
*  Indicates floating-point invalid-arithmetic-operand (#IA) exception.
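The signed-zero and infinity rules in the table can be checked on any IEEE 754 implementation. The sketch below uses Python floats (64-bit doubles in the default round-to-nearest mode), not the x87 extended-real format, so it illustrates the rules rather than the FPU itself; the helper names `sign_of` and `subtract` are invented for this example.

```python
import math

def sign_of(x: float) -> str:
    """Return '+' or '-' from the IEEE sign bit (also works for zeros)."""
    return "-" if math.copysign(1.0, x) < 0 else "+"

def subtract(dest: float, src: float) -> float:
    """Compute DEST - SRC, the operand order used by the FSUB table."""
    return dest - src

inf = math.inf

# Like-signed operands with a zero difference give +0 in round-to-nearest.
print(sign_of(subtract(5.0, 5.0)))
# +0 - (-0) = +0 and -0 - (+0) = -0.
print(sign_of(subtract(0.0, -0.0)), sign_of(subtract(-0.0, 0.0)))
# Infinities of like sign have no defined difference; Python returns NaN
# where the FPU would signal #IA.
print(math.isnan(subtract(inf, inf)))
```

Python does not expose the rounding-mode control, so the round toward -∞ case (where the zero difference becomes -0) is not shown.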
Operation
IF instruction is FISUB
    THEN
        DEST ← DEST - ConvertExtendedReal(SRC);
    ELSE (* source operand is real number *)
        DEST ← DEST - SRC;
FI;
IF instruction is FSUBP
    THEN
        PopRegisterStack;
FI;
Floating-Point Exceptions
#IS    Stack underflow occurred.
#IA    Operand is an sNaN value or unsupported format. Operands are infinities of like sign.
#D     Source operand is a denormal value.
#U     Result is too small for destination format.
#O     Result is too large for destination format.
#P     Value cannot be represented exactly in destination format.
FSUBR/FSUBRP/FISUBR - Reverse Subtract
Opcode     Instruction           Description
D8 /5      FSUBR m32real         Subtract ST(0) from m32real and store result in ST(0)
DC /5      FSUBR m64real         Subtract ST(0) from m64real and store result in ST(0)
D8 E8+i    FSUBR ST(0), ST(i)    Subtract ST(0) from ST(i) and store result in ST(0)
DC E0+i    FSUBR ST(i), ST(0)    Subtract ST(i) from ST(0) and store result in ST(i)
DE E0+i    FSUBRP ST(i), ST(0)   Subtract ST(i) from ST(0), store result in ST(i), and pop register stack
DE E1      FSUBRP                Subtract ST(1) from ST(0), store result in ST(1), and pop register stack
DA /5      FISUBR m32int         Subtract ST(0) from m32int and store result in ST(0)
DE /5      FISUBR m16int         Subtract ST(0) from m16int and store result in ST(0)
Description
These instructions subtract the destination operand from the source operand and store the difference in the destination location. The destination operand is always an FPU register; the source operand can be a register or a memory location. Source operands in memory can be in single-real, double-real, word-integer, or short-integer formats.

These instructions perform the reverse operations of the FSUB, FSUBP, and FISUB instructions. They are provided to support more efficient coding.

The no-operand version of the instruction subtracts the contents of the ST(1) register from the ST(0) register and stores the result in ST(1). The one-operand version subtracts the contents of the ST(0) register from the contents of a memory location (either a real or an integer value) and stores the result in ST(0). The two-operand version subtracts the contents of the ST(i) register from the ST(0) register or vice versa.

The FSUBRP instructions perform the additional operation of popping the FPU register stack following the subtraction. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1. The no-operand version of the floating-point reverse subtract instructions always results in the register stack being popped. In some assemblers, the mnemonic for this instruction is FSUBR rather than FSUBRP.

The FISUBR instructions convert an integer source operand to extended-real format before performing the subtraction.

The following table shows the results obtained when subtracting various classes of numbers from one another, assuming that neither overflow nor underflow occurs. Here, the DEST value is subtracted from the SRC value (SRC - DEST = result).

When the difference between two operands of like sign is 0, the result is +0, except for the round toward -∞ mode, in which case the result is -0. This instruction also guarantees that +0 - (-0) = +0, and that -0 - (+0) = -0. When the source operand is an integer 0, it is treated as a +0.

When one operand is ∞, the result is ∞ of the expected sign. If both operands are ∞ of the same sign, an invalid operation exception is generated.

                                        SRC
DEST    -∞      -F or -I    -0      +0      +F or +I    +∞      NaN
-∞      *       +∞          +∞      +∞      +∞          +∞      NaN
-F      -∞      ±F or ±0    -DEST   -DEST   +F          +∞      NaN
-0      -∞      SRC         ±0      +0      SRC         +∞      NaN
+0      -∞      SRC         -0      ±0      SRC         +∞      NaN
+F      -∞      -F          -DEST   -DEST   ±F or ±0    +∞      NaN
+∞      -∞      -∞          -∞      -∞      -∞          *       NaN
NaN     NaN     NaN         NaN     NaN     NaN         NaN     NaN

NOTES:
F  Means finite-real number.
I  Means integer.
*  Indicates floating-point invalid-arithmetic-operand (#IA) exception.
Operation
IF instruction is FISUBR
    THEN
        DEST ← ConvertExtendedReal(SRC) - DEST;
    ELSE (* source operand is real number *)
        DEST ← SRC - DEST;
FI;
IF instruction = FSUBRP
    THEN
        PopRegisterStack;
FI;

Floating-Point Exceptions
#IS    Stack underflow occurred.
#IA    Operand is an sNaN value or unsupported format. Operands are infinities of like sign.
#D     Source operand is a denormal value.
#U     Result is too small for destination format.
#O     Result is too large for destination format.
#P     Value cannot be represented exactly in destination format.
FTST - TEST
Opcode     Instruction    Description
D9 E4      FTST           Compare ST(0) with 0.0.
Description
This instruction compares the value in the ST(0) register with 0.0 and sets the condition code flags C0, C2, and C3 in the FPU status word according to the results (refer to the table below).

Condition       C3  C2  C0
ST(0) > 0.0     0   0   0
ST(0) < 0.0     0   0   1
ST(0) = 0.0     1   0   0
Unordered       1   1   1

This instruction performs an unordered comparison. An unordered comparison also checks the class of the numbers being compared (refer to "FXAM - Examine" in this chapter). If the value in register ST(0) is a NaN or is in an undefined format, the condition flags are set to unordered and the invalid operation exception is generated.

The sign of zero is ignored, so that -0.0 = +0.0.

Operation
CASE (relation of operands) OF
    Not comparable:  C3, C2, C0 ← 111;
    ST(0) > 0.0:     C3, C2, C0 ← 000;
    ST(0) < 0.0:     C3, C2, C0 ← 001;
    ST(0) = 0.0:     C3, C2, C0 ← 100;
ESAC;
FPU Flags Affected
C1            Set to 0 if stack underflow occurred; otherwise, cleared to 0.
C0, C2, C3    Refer to above table.

Floating-Point Exceptions
#IS    Stack underflow occurred.
#IA    The source operand is a NaN value or is in an unsupported format.
#D     The source operand is a denormal value.
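The condition-code table for FTST can be modelled in a few lines. This is a minimal sketch on Python floats; the function name `ftst_flags` is hypothetical, and the model ignores exceptions and the C1 flag.

```python
import math

def ftst_flags(st0: float) -> tuple:
    """Return (C3, C2, C0) for a comparison of st0 with 0.0."""
    if math.isnan(st0):
        return (1, 1, 1)   # unordered
    if st0 > 0.0:
        return (0, 0, 0)
    if st0 < 0.0:
        return (0, 0, 1)
    return (1, 0, 0)       # equal; the sign of zero is ignored

print(ftst_flags(-0.0))    # same flags as for +0.0
```

The same mapping applies to the FUCOM family when 0.0 is replaced by the second operand.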
FUCOM/FUCOMP/FUCOMPP - Unordered Compare Real

Opcode     Instruction      Description
DD E0+i    FUCOM ST(i)      Compare ST(0) with ST(i)
DD E1      FUCOM            Compare ST(0) with ST(1)
DD E8+i    FUCOMP ST(i)     Compare ST(0) with ST(i) and pop register stack
DD E9      FUCOMP           Compare ST(0) with ST(1) and pop register stack
DA E9      FUCOMPP          Compare ST(0) with ST(1) and pop register stack twice
Description
These instructions perform an unordered comparison of the contents of registers ST(0) and ST(i) and set condition code flags C0, C2, and C3 in the FPU status word according to the results (refer to the table below). If no operand is specified, the contents of registers ST(0) and ST(1) are compared. The sign of zero is ignored, so that -0.0 = +0.0.

Comparison Results   C3  C2  C0
ST0 > ST(i)          0   0   0
ST0 < ST(i)          0   0   1
ST0 = ST(i)          1   0   0
Unordered            1   1   1

NOTE:
* Flags not set if unmasked invalid-arithmetic-operand (#IA) exception is generated.

An unordered comparison checks the class of the numbers being compared (refer to "FXAM - Examine" in this chapter). The FUCOM instructions perform the same operations as the FCOM instructions. The only difference is that the FUCOM instructions raise the invalid-arithmetic-operand exception (#IA) only when either or both operands are an sNaN or are in an unsupported format; qNaNs cause the condition code flags to be set to unordered, but do not cause an exception to be generated. The FCOM instructions raise an invalid operation exception when either or both of the operands are a NaN value of any kind or are in an unsupported format.

As with the FCOM instructions, if the operation results in an invalid-arithmetic-operand exception being raised, the condition code flags are set only if the exception is masked.

The FUCOMP instruction pops the register stack following the comparison operation and the FUCOMPP instruction pops the register stack twice following the comparison operation. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1.

Operation
CASE (relation of operands) OF
    ST > SRC:  C3, C2, C0 ← 000;
    ST < SRC:  C3, C2, C0 ← 001;
    ST = SRC:  C3, C2, C0 ← 100;
ESAC;
IF ST(0) or SRC = QNaN, but not SNaN or unsupported format
    THEN
        C3, C2, C0 ← 111;
    ELSE (* ST(0) or SRC is SNaN or unsupported format *)
        #IA;
        IF FPUControlWord.IM = 1
            THEN
                C3, C2, C0 ← 111;
        FI;
FI;
IF instruction = FUCOMP
    THEN
        PopRegisterStack;
FI;
IF instruction = FUCOMPP
    THEN
        PopRegisterStack;
        PopRegisterStack;
FI;
FPU Flags Affected
C1            Set to 0 if stack underflow occurred.
C0, C2, C3    Refer to table on previous page.

Floating-Point Exceptions
#IS    Stack underflow occurred.
#IA    One or both operands are sNaN values or have unsupported formats. Detection of a qNaN value in and of itself does not raise an invalid-operand exception.
#D     One or both operands are denormal values.
FWAIT - Wait
Refer to entry for WAIT/FWAIT - Wait.
FXAM - Examine
Opcode     Instruction    Description
D9 E5      FXAM           Classify value or number in ST(0)
Description
This instruction examines the contents of the ST(0) register and sets the condition code flags C0, C2, and C3 in the FPU status word to indicate the class of value or number in the register (refer to the table below).

Class                   C3  C2  C0
Unsupported             0   0   0
NaN                     0   0   1
Normal finite number    0   1   0
Infinity                0   1   1
Zero                    1   0   0
Empty                   1   0   1
Denormal number         1   1   0
The C1 flag is set to the sign of the value in ST(0), regardless of whether the register is empty or full.

Operation
C1 ← sign bit of ST; (* 0 for positive, 1 for negative *)
CASE (class of value or number in ST(0)) OF
    Unsupported:  C3, C2, C0 ← 000;
    NaN:          C3, C2, C0 ← 001;
    Normal:       C3, C2, C0 ← 010;
    Infinity:     C3, C2, C0 ← 011;
    Zero:         C3, C2, C0 ← 100;
    Empty:        C3, C2, C0 ← 101;
    Denormal:     C3, C2, C0 ← 110;
ESAC;
FPU Flags Affected
C1            Sign of value in ST(0).
C0, C2, C3    Refer to table above.
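The classification can be illustrated by decoding the bit fields of a value. The sketch below works on the 64-bit double encoding that Python floats use, not on the 80-bit extended-real format held in ST(0); the "Unsupported" and "Empty" classes therefore cannot occur in this model, and a value that is denormal as a double would be a normal number once loaded into an FPU register. The function name `fxam` is reused here only as a label.

```python
import math
import struct

def fxam(value: float) -> tuple:
    """Return (class name, (C3, C2, C0), C1) for a 64-bit double."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    sign = bits >> 63                      # C1 receives the sign bit
    exponent = (bits >> 52) & 0x7FF        # 11-bit biased exponent
    fraction = bits & ((1 << 52) - 1)      # 52-bit fraction field
    if exponent == 0x7FF:
        result = ("NaN", (0, 0, 1)) if fraction else ("Infinity", (0, 1, 1))
    elif exponent == 0:
        result = ("Denormal", (1, 1, 0)) if fraction else ("Zero", (1, 0, 0))
    else:
        result = ("Normal", (0, 1, 0))
    return result[0], result[1], sign

print(fxam(-0.0))   # a negative zero: class Zero with C1 = 1
```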
Floating-Point Exceptions
None.

Protected Mode Exceptions
#NM    EM or TS in CR0 is set.
FXCH - Exchange Register Contents

Opcode     Instruction    Description
D9 C8+i    FXCH ST(i)     Exchange the contents of ST(0) and ST(i)
D9 C9      FXCH           Exchange the contents of ST(0) and ST(1)
Description
This instruction exchanges the contents of registers ST(0) and ST(i). If no source operand is specified, the contents of ST(0) and ST(1) are exchanged.

This instruction provides a simple means of moving values in the FPU register stack to the top of the stack [ST(0)], so that they can be operated on by those floating-point instructions that can only operate on values in ST(0). For example, the following instruction sequence takes the square root of the third register from the top of the register stack:

FXCH ST(3);
FSQRT;
FXCH ST(3);
Operation
IF number-of-operands is 1
    THEN
        temp ← ST(0);
        ST(0) ← SRC;
        SRC ← temp;
    ELSE
        temp ← ST(0);
        ST(0) ← ST(1);
        ST(1) ← temp;
FI;
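The exchange, operate, exchange pattern from the description can be traced with a list standing in for the register stack. This is only a sketch: `stack[0]` models ST(0), and the helper names `fxch` and `sqrt_at` are invented for the illustration.

```python
import math

def fxch(stack: list, i: int = 1) -> None:
    """Exchange ST(0) and ST(i) in place; with no operand, i defaults to 1."""
    stack[0], stack[i] = stack[i], stack[0]

def sqrt_at(stack: list, i: int) -> None:
    """Take the square root of ST(i) using an operation that only sees ST(0)."""
    fxch(stack, i)                     # FXCH ST(i): bring the value to the top
    stack[0] = math.sqrt(stack[0])     # FSQRT operates on ST(0)
    fxch(stack, i)                     # FXCH ST(i): put everything back

registers = [1.0, 2.0, 3.0, 16.0]
sqrt_at(registers, 3)
print(registers)
```

After the second exchange every register other than ST(3) holds its original value.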
FXRSTOR - Restore FP and MMX State and Streaming SIMD Extension State

Opcode      Instruction         Description
0F,AE,/1    FXRSTOR m512byte    Load FP and MMX technology and Streaming SIMD Extension state from m512byte.

Description
The FXRSTOR instruction reloads the FP and MMX technology state, and the Streaming SIMD Extension state (environment and registers), from the memory area defined by m512byte. This data should have been written by a previous FXSAVE.

The floating-point, MMX technology, and Streaming SIMD Extension environment and registers consist of the following data structure (little-endian byte order as arranged in memory, with byte offset into row described by right column):
Row offset    Contents (byte positions within the 16-byte row, from 15 down to 0)
0             Rsrvd (15-14), CS (13-12), IP (11-8), FOP (7-6), FTW (5-4), FSW (3-2), FCW (1-0)
16            Reserved (15-12), MXCSR (11-8), Rsrvd (7-6), DS (5-4), DP (3-0)
32            Reserved (15-10), ST0/MM0 (9-0)
48            Reserved (15-10), ST1/MM1 (9-0)
64            Reserved (15-10), ST2/MM2 (9-0)
80            Reserved (15-10), ST3/MM3 (9-0)
96            Reserved (15-10), ST4/MM4 (9-0)
112           Reserved (15-10), ST5/MM5 (9-0)
128           Reserved (15-10), ST6/MM6 (9-0)
144           Reserved (15-10), ST7/MM7 (9-0)
160           XMM0 (15-0)
176           XMM1 (15-0)
192           XMM2 (15-0)
208           XMM3 (15-0)
224           XMM4 (15-0)
240           XMM5 (15-0)
256           XMM6 (15-0)
272           XMM7 (15-0)
288 to 496    Reserved (14 rows of 16 bytes)
Three fields in the floating-point save area contain reserved bits that are not indicated in the table:

FOP        The lower 11 bits contain the opcode; the upper 5 bits are reserved.
IP & DP    32-bit mode: 32-bit IP-offset. 16-bit mode: lower 16 bits are IP-offset and upper 16 bits are reserved.

If the MXCSR state contains an unmasked exception with a corresponding status flag also set, loading it will not result in a floating-point error condition being asserted. Only the next occurrence of this unmasked exception will result in the error condition being asserted.

Some bits of MXCSR (bits 31-16 and bit 6) are defined as reserved and cleared; attempting to write a non-zero value to these bits will result in a general protection exception.

FXRSTOR does not flush pending x87-FP exceptions, unlike FRSTOR. To check and raise exceptions when loading a new operating environment, use FWAIT after FXRSTOR.

The Streaming SIMD Extension fields in the save image (XMM0-XMM7 and MXCSR) will not be loaded into the processor if the CR4.OSFXSR bit is not set. This CR4 bit must be set in order to enable execution of Streaming SIMD Extensions.

Operation
FP and MMX technology state and Streaming SIMD Extension state = m512byte;
Exceptions
#AC    If exception detection is disabled, a general protection exception is signaled if the address is not aligned on a 16-byte boundary. Note that if #AC is enabled (and CPL is 3), signaling of #AC is not guaranteed and may vary with implementation. In all implementations where #AC is not signaled, a general protection fault will instead be signaled. In addition, the width of the alignment check when #AC is enabled may also vary with implementation; for instance, for a given implementation, #AC might be signaled for a 2-byte misalignment, whereas #GP might be signaled for all other misalignments (4-/8-/16-byte).

Invalid opcode exception if instruction is preceded by a LOCK override prefix.

General protection fault if reserved bits of MXCSR are loaded with non-zero values.

Numeric Exceptions
None.
Protected Mode Exceptions
#GP(0)              For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments, or if an attempt is made to load non-zero values to reserved bits in the MXCSR field.
#SS(0)              For an illegal address in the SS segment.
#PF (fault-code)    For a page fault.
#NM                 If CR0.EM = 1.
#NM                 If TS bit in CR0 is set.
#AC                 For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).

Real Address Mode Exceptions
Interrupt 13        If any part of the operand would lie outside of the effective address space from 0 to 0FFFFH.
#NM                 If CR0.EM = 1.
#NM                 If TS bit in CR0 is set.

Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#AC                 For unaligned memory reference if the current privilege level is 3.
#PF (fault-code)    For a page fault.

Comments
State saved with FXSAVE and restored with FRSTOR, and state saved with FSAVE and restored with FXRSTOR, will result in incorrect restoration of state in the processor. The address size prefix will have the usual effect on address calculation, but will have no effect on the format of the FXRSTOR image.

The use of Repeat (F2H, F3H) and Operand-size (66H) prefixes with FXRSTOR is reserved. Different processor implementations may handle these prefixes differently. Usage of these prefixes with FXRSTOR risks incompatibility with future processors.
FXSAVE - Store FP and MMX State and Streaming SIMD Extension State

Opcode      Instruction        Description
0F,AE,/0    FXSAVE m512byte    Store FP and MMX technology state and Streaming SIMD Extension state to m512byte.

Description
The FXSAVE instruction writes the current FP and MMX technology state, and Streaming SIMD Extension state (environment and registers), to the specified destination defined by m512byte. It does this without checking for pending unmasked floating-point exceptions (similar to the operation of FNSAVE). Unlike the FSAVE/FNSAVE instructions, the processor retains the contents of the FP and MMX technology state and Streaming SIMD Extension state in the processor after the state has been saved. This instruction has been optimized to maximize floating-point save performance. The save data structure is as follows (little-endian byte order as arranged in memory, with byte offset into row described by right column):
Row offset    Contents (byte positions within the 16-byte row, from 15 down to 0)
0             Rsrvd (15-14), CS (13-12), IP (11-8), FOP (7-6), FTW (5-4), FSW (3-2), FCW (1-0)
16            Reserved (15-12), MXCSR (11-8), Rsrvd (7-6), DS (5-4), DP (3-0)
32            Reserved (15-10), ST0/MM0 (9-0)
48            Reserved (15-10), ST1/MM1 (9-0)
64            Reserved (15-10), ST2/MM2 (9-0)
80            Reserved (15-10), ST3/MM3 (9-0)
96            Reserved (15-10), ST4/MM4 (9-0)
112           Reserved (15-10), ST5/MM5 (9-0)
128           Reserved (15-10), ST6/MM6 (9-0)
144           Reserved (15-10), ST7/MM7 (9-0)
160           XMM0 (15-0)
176           XMM1 (15-0)
192           XMM2 (15-0)
208           XMM3 (15-0)
224           XMM4 (15-0)
240           XMM5 (15-0)
256           XMM6 (15-0)
272           XMM7 (15-0)
288 to 496    Reserved (14 rows of 16 bytes)
Three fields in the floating-point save area contain reserved bits that are not indicated in the table:

FOP        The lower 11 bits contain the opcode; the upper 5 bits are reserved.
IP & DP    32-bit mode: 32-bit IP-offset. 16-bit mode: lower 16 bits are IP-offset and upper 16 bits are reserved.

The FXSAVE instruction is used when an operating system needs to perform a context switch or when an exception handler needs to use the floating-point, MMX technology, and Streaming SIMD Extension units. It cannot be used by an application program to pass a "clean" FP state to a procedure, since it retains the current state. An application must explicitly execute an FINIT instruction after FXSAVE to provide for this functionality.

All of the x87-FP fields retain the same internal format as in FSAVE except for FTW. Unlike FSAVE, FXSAVE saves only the FTW valid bits rather than the entire x87-FP FTW field. The FTW bits are saved in a non-TOS relative order, which means that FR0 is always saved first, followed by FR1, FR2 and so forth. As an example, if TOS=4 and only ST0, ST1 and ST2 are valid, FSAVE saves the FTW field in the following format:

ST3    ST2    ST1    ST0    ST7    ST6    ST5    ST4 (TOS=4)
FR7    FR6    FR5    FR4    FR3    FR2    FR1    FR0
11     xx     xx     xx     11     11     11     11

where xx is one of (00, 01, 10). The value 11 indicates an empty stack element, and 00, 01, and 10 indicate Valid, Zero, and Special, respectively. In this example, FXSAVE would save the following vector:

FR7    FR6    FR5    FR4    FR3    FR2    FR1    FR0
0      1      1      1      0      0      0      0
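The compression from the 16-bit FSAVE tag word to the 8 valid bits saved by FXSAVE can be sketched as follows. The function name `abridged_ftw` is hypothetical, and the model assumes the usual tag-word packing in which the 2-bit tag for FRn occupies bits 2n+1 and 2n.

```python
def abridged_ftw(full_ftw: int) -> int:
    """Compress a 16-bit FSAVE tag word to the 8 valid bits kept by FXSAVE."""
    result = 0
    for n in range(8):                      # FR0 first, then FR1, FR2, ...
        tag = (full_ftw >> (2 * n)) & 0b11
        if tag != 0b11:                     # 11 = empty; 00, 01, 10 are in use
            result |= 1 << n
    return result

# TOS=4 with ST0..ST2 valid: FR4, FR5, FR6 carry tag 00, all others carry 11.
example = 0b11_00_00_00_11_11_11_11
print(format(abridged_ftw(example), "08b"))
```

For the example in the text the printed vector reads 01110000 from FR7 down to FR0, which matches the table above.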
The FSAVE format for FTW can be recreated from the FTW valid bits and the stored 80-bit FP data (assuming the stored data was not the contents of MMX technology registers) using the following table:
Exponent    Exponent    Fraction    J and M    FTW valid    x87 FTW
all 1's     all 0's     all 0's     bits       bit
0           0           0           0x         1            Special    10
0           0           0           1x         1            Valid      00
0           0           1           00         1            Special    10
0           0           1           10         1            Valid      00
0           1           0           0x         1            Special    10
0           1           0           1x         1            Special    10
0           1           1           00         1            Zero       01
0           1           1           10         1            Special    10
1           0           0           1x         1            Special    10
1           0           0           1x         1            Special    10
1           0           1           00         1            Special    10
1           0           1           10         1            Special    10
For all legal combinations above               0            Empty      11
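Read row by row, the table collapses to a short decision rule: an unset valid bit means Empty, an all-ones exponent is always Special, an all-zeros exponent is Zero only when the whole significand is zero, and any other exponent is Valid exactly when the J-bit is 1. The sketch below encodes that reading; the function name `fsave_tag` and its argument layout (15-bit exponent, 64-bit significand with the J-bit in bit 63) are assumptions made for the example.

```python
def fsave_tag(valid_bit: int, exponent: int, significand: int) -> int:
    """Rebuild the 2-bit FSAVE tag for one register from FXSAVE image data."""
    if not valid_bit:
        return 0b11                                   # Empty
    if exponent == 0x7FFF:
        return 0b10                                   # Special (all-ones exponent)
    if exponent == 0:
        return 0b01 if significand == 0 else 0b10     # Zero, else Special
    j_bit = (significand >> 63) & 1
    return 0b00 if j_bit else 0b10                    # Valid needs J = 1

# 1.0 in extended-real format: biased exponent 3FFFH, J-bit set, fraction zero.
print(format(fsave_tag(1, 0x3FFF, 1 << 63), "02b"))
```

As the text notes, this reconstruction is only meaningful when the stored 80-bit data is floating-point data rather than MMX register contents.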
The J-bit is defined to be the 1-bit binary integer to the left of the decimal place in the significand. The M-bit is defined to be the most significant bit of the fractional portion of the significand (i.e., the bit immediately to the right of the decimal place). When the M-bit is the most significant bit of the fractional portion of the significand, it must be 0 if the fraction is all 0's.

If the FXSAVE instruction is immediately preceded by an FP instruction which does not use a memory operand, then the FXSAVE instruction does not write/update the DP field in the FXSAVE image.

MXCSR holds the contents of the SIMD floating-point Control/Status Register. Refer to the LDMXCSR instruction for a full description of this field.

The fields XMM0-XMM7 contain the content of registers XMM0-XMM7 in exactly the same format as they exist in the registers.
The Streaming SIMD Extension fields in the save image (XMM0-XMM7 and MXCSR) may not be loaded into the processor if the CR4.OSFXSR bit is not set. This CR4 bit must be set in order to enable execution of Streaming SIMD Extensions.

The destination m512byte is assumed to be aligned on a 16-byte boundary. If m512byte is not aligned on a 16-byte boundary, FXSAVE generates a general protection exception.

Operation
m512byte = FP and MMX technology state and Streaming SIMD Extension state;
Exceptions
#AC    If exception detection is disabled, a general protection exception is signaled if the address is not aligned on a 16-byte boundary. Note that if #AC is enabled (and CPL is 3), signaling of #AC is not guaranteed and may vary with implementation. In all implementations where #AC is not signaled, a general protection fault will instead be signaled. In addition, the width of the alignment check when #AC is enabled may also vary with implementation; for instance, for a given implementation, #AC might be signaled for a 2-byte misalignment, whereas #GP might be signaled for all other misalignments (4-/8-/16-byte).

Invalid opcode exception if instruction is preceded by a LOCK override prefix.

Numeric Exceptions
Invalid, Precision.

Protected Mode Exceptions
#GP(0)              For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)              For an illegal address in the SS segment.
#PF (fault-code)    For a page fault.
#NM                 If CR0.EM = 1.
#NM                 If TS bit in CR0 is set.
#AC                 For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).
Real Address Mode Exceptions
Interrupt 13        If any part of the operand would lie outside of the effective address space from 0 to 0FFFFH.
#NM                 If CR0.EM = 1.
#NM                 If TS bit in CR0 is set.

Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#AC                 For unaligned memory reference if the current privilege level is 3.
#PF (fault-code)    For a page fault.

Comments
State saved with FXSAVE and restored with FRSTOR, and state saved with FSAVE and restored with FXRSTOR, will result in incorrect restoration of state in the processor. The address size prefix will have the usual effect on address calculation, but will have no effect on the format of the FXSAVE image.

The use of Repeat (F2H, F3H) and Operand-size (66H) prefixes with FXSAVE is reserved. Different processor implementations may handle these prefixes differently. Usage of these prefixes with FXSAVE risks incompatibility with future processors.
FXTRACT - Extract Exponent and Significand

Opcode     Instruction    Description
D9 F4      FXTRACT        Separate value in ST(0) into exponent and significand, store exponent in ST(0), and push the significand onto the register stack.
Description
This instruction separates the source value in the ST(0) register into its exponent and significand, stores the exponent in ST(0), and pushes the significand onto the register stack. Following this operation, the new top-of-stack register ST(0) contains the value of the original significand expressed as a real number. The sign and significand of this value are the same as those found in the source operand, and the exponent is 3FFFH (biased value for a true exponent of zero). The ST(1) register contains the value of the original operand's true (unbiased) exponent expressed as a real number. (The operation performed by this instruction is a superset of the IEEE-recommended logb(x) function.)

This instruction and the F2XM1 instruction are useful for performing power and range scaling operations. The FXTRACT instruction is also useful for converting numbers in extended-real format to decimal representations (e.g., for printing or displaying).

If the floating-point zero-divide exception (#Z) is masked and the source operand is zero, an exponent value of -∞ is stored in register ST(1) and 0 with the sign of the source operand is stored in register ST(0).

Operation
TEMP ← Significand(ST(0));
ST(0) ← Exponent(ST(0));
TOP ← TOP - 1;
ST(0) ← TEMP;
FPU Flags Affected
C1            Set to 0 if stack underflow occurred; set to 1 if stack overflow occurred.
C0, C2, C3    Undefined.

Floating-Point Exceptions
#IS    Stack underflow occurred. Stack overflow occurred.
#IA    Source operand is an sNaN value or unsupported format.
#Z     ST(0) operand is ±0.
#D     Source operand is a denormal value.
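The split into exponent and significand can be reproduced with `math.frexp`, which returns a mantissa in [0.5, 1) rather than the [1, 2) range that FXTRACT produces, so the sketch rescales by one bit. It models finite Python doubles only; the function name `fxtract` and the returned pair (value left in ST(1), value left in ST(0)) are conventions chosen for this example.

```python
import math

def fxtract(x: float) -> tuple:
    """Return (exponent for ST(1), significand for ST(0)) for a finite x."""
    if x == 0.0:
        # Masked #Z response: exponent of -infinity, zero keeps its sign.
        return (-math.inf, x)
    mantissa, exponent = math.frexp(x)    # x == mantissa * 2**exponent, 0.5 <= |mantissa| < 1
    return (float(exponent - 1), mantissa * 2.0)   # rescale so 1 <= |significand| < 2

exp, sig = fxtract(-10.0)
print(exp, sig, sig * 2.0 ** exp)   # the last value reconstructs the input
```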
FYL2X - Compute y * log2(x)
Opcode     Instruction    Description
D9 F1      FYL2X          Replace ST(1) with (ST(1) * log2(ST(0))) and pop the register stack
Description
This instruction calculates (ST(1) * log2(ST(0))), stores the result in register ST(1), and pops the FPU register stack. The source operand in ST(0) must be a non-zero positive number.

The following table shows the results obtained when taking the log of various classes of numbers, assuming that neither overflow nor underflow occurs.

                                  ST(0)
ST(1)    -∞     -F     ±0     +0 < +F < +1    +1     +F > +1    +∞     NaN
-∞       *      *      +∞     +∞              *      -∞         -∞     NaN
-F       *      *      **     +F              -0     -F         -∞     NaN
-0       *      *      *      +0              -0     -0         *      NaN
+0       *      *      *      -0              +0     +0         *      NaN
+F       *      *      **     -F              +0     +F         +∞     NaN
+∞       *      *      -∞     -∞              *      +∞         +∞     NaN
NaN      NaN    NaN    NaN    NaN             NaN    NaN        NaN    NaN

NOTES:
F   Means finite-real number.
*   Indicates floating-point invalid operation (#IA) exception.
**  Indicates floating-point zero-divide (#Z) exception.
If the divide-by-zero exception is masked and register ST(0) contains ±0, the instruction returns ∞ with a sign that is the opposite of the sign of the source operand in register ST(1).

The FYL2X instruction is designed with a built-in multiplication to optimize the calculation of logarithms with an arbitrary positive base (b):

$\log_b x = (\log_2 b)^{-1} \cdot \log_2 x$
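The base-change identity above means that a logarithm in any base b is a single FYL2X with ST(1) preloaded with 1/log2(b). A minimal sketch on Python doubles, with the hypothetical helper names `fyl2x` and `log_base`:

```python
import math

def fyl2x(st1: float, st0: float) -> float:
    """Model of the arithmetic result: ST(1) * log2(ST(0)), for ST(0) > 0."""
    return st1 * math.log2(st0)

def log_base(b: float, x: float) -> float:
    """log_b(x) computed as one FYL2X with the scale factor 1/log2(b) in ST(1)."""
    return fyl2x(1.0 / math.log2(b), x)

print(log_base(10.0, 1000.0))   # close to 3, up to rounding
```

For the common bases the scale factor is a constant, so it can be loaded once and reused.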
Operation
ST(1) ← ST(1) * log2(ST(0));
PopRegisterStack;

Floating-Point Exceptions
#IS    Stack underflow occurred.
#IA    Either operand is an sNaN or unsupported format. Source operand in register ST(0) is a negative finite value (not -0).
#Z     Source operand in register ST(0) is ±0.
#D     Source operand is a denormal value.
#U     Result is too small for destination format.
#O     Result is too large for destination format.
#P     Value cannot be represented exactly in destination format.
FYL2XP1 - Compute y * log2(x + 1)

Opcode     Instruction    Description
D9 F9      FYL2XP1        Replace ST(1) with ST(1) * log2(ST(0) + 1.0) and pop the register stack
Description
This instruction calculates the log epsilon (ST(1) * log2(ST(0) + 1.0)), stores the result in register ST(1), and pops the FPU register stack. The source operand in ST(0) must be in the range:

$-(1 - \sqrt{2}/2)$ to $(1 - \sqrt{2}/2)$

The source operand in ST(1) can range from -∞ to +∞. If the ST(0) operand is outside of its acceptable range, the result is undefined and software should not rely on an exception being generated. Under some circumstances exceptions may be generated when ST(0) is out of range, but this behavior is implementation specific and not guaranteed.

The following table shows the results obtained when taking the log epsilon of various classes of numbers, assuming that underflow does not occur.

                                ST(0)
ST(1)    -(1 - (√2/2)) to -0    -0     +0     +0 to +(1 - (√2/2))    NaN
-∞       +∞                     *      *      -∞                     NaN
-F       +F                     +0     -0     -F                     NaN
-0       +0                     +0     -0     -0                     NaN
+0       -0                     -0     +0     +0                     NaN
+F       -F                     -0     +0     +F                     NaN
+∞       -∞                     *      *      +∞                     NaN
NaN      NaN                    NaN    NaN    NaN                    NaN

NOTES:
F  Means finite-real number.
*  Indicates floating-point invalid operation (#IA) exception.
This instruction provides optimal accuracy for values of epsilon [the value in register ST(0)] that are close to 0. When the epsilon value (ε) is small, more significant digits can be retained by using the FYL2XP1 instruction than by using (ε + 1) as an argument to the FYL2X instruction. The (ε + 1) expression is commonly found in compound interest and annuity calculations. The result can be simply converted into a value in another logarithm base by including a scale factor in the ST(1) source operand. The following equation is used to calculate the scale factor for a particular logarithm base, where n is the logarithm base desired for the result of the FYL2XP1 instruction:

$\text{scale factor} = \log_n 2$
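The accuracy argument can be seen with ordinary doubles: once ε is smaller than the spacing of representable numbers near 1, the sum (ε + 1) rounds to exactly 1 and its logarithm is 0, whereas a log-of-one-plus routine keeps the information. The sketch uses `math.log1p` as a stand-in for the FYL2XP1 computation; the function names are illustrative only.

```python
import math

def fyl2xp1(st1: float, st0: float) -> float:
    """Model of ST(1) * log2(ST(0) + 1.0) that never forms the sum ST(0) + 1.0."""
    return st1 * (math.log1p(st0) / math.log(2.0))

def fyl2x_of_sum(st1: float, st0: float) -> float:
    """The less accurate route: add 1.0 first, then take log2."""
    return st1 * math.log2(st0 + 1.0)

eps = 1e-17
print(fyl2x_of_sum(1.0, eps))   # the sum rounded to 1.0, so the log is 0.0
print(fyl2xp1(1.0, eps))        # a small positive value survives

# Scale factor log_n(2) with n = e gives a natural-log result: ln(1 + 0.25).
print(fyl2xp1(math.log(2.0), 0.25))
```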

Operation
ST(1) ← ST(1) * log2(ST(0) + 1.0);
PopRegisterStack;
Floating-Point Exceptions
#IS    Stack underflow occurred.
#IA    Either operand is an sNaN value or unsupported format.
#D     Source operand is a denormal value.
#U     Result is too small for destination format.
#O     Result is too large for destination format.
#P     Value cannot be represented exactly in destination format.
HLT - Halt
Opcode     Instruction    Description
F4         HLT            Halt
Description
This instruction stops instruction execution and places the processor in a HALT state. An enabled interrupt, NMI, or a reset will resume execution. If an interrupt (including NMI) is used to resume execution after a HLT instruction, the saved instruction pointer (CS:EIP) points to the instruction following the HLT instruction.

The HLT instruction is a privileged instruction. When the processor is running in protected or virtual-8086 mode, the privilege level of a program or procedure must be 0 to execute the HLT instruction.

Operation
Enter Halt state;
Flags Affected
None.

Protected Mode Exceptions
#GP(0)    If the current privilege level is not 0.

Real-Address Mode Exceptions
None.

Virtual-8086 Mode Exceptions
#GP(0)    If the current privilege level is not 0.
IDIV - Signed Divide
Opcode     Instruction    Description
F6 /7      IDIV r/m8      Signed divide AX (where AH must contain sign-extension of AL) by r/m byte. (Results: AL=Quotient, AH=Remainder)
F7 /7      IDIV r/m16     Signed divide DX:AX (where DX must contain sign-extension of AX) by r/m word. (Results: AX=Quotient, DX=Remainder)
F7 /7      IDIV r/m32     Signed divide EDX:EAX (where EDX must contain sign-extension of EAX) by r/m doubleword. (Results: EAX=Quotient, EDX=Remainder)
Description
This instruction divides (signed) the value in the AX register, the DX:AX register pair, or the EDX:EAX register pair by the source operand and stores the result in the AX (AH:AL), DX:AX, or EDX:EAX registers. The source operand can be a general-purpose register or a memory location. The action of this instruction depends on the operand size, as shown in the following table:

Operand Size           Dividend    Divisor    Quotient    Remainder    Quotient Range
Word/byte              AX          r/m8       AL          AH           -128 to +127
Doubleword/word        DX:AX       r/m16      AX          DX           -32,768 to +32,767
Quadword/doubleword    EDX:EAX     r/m32      EAX         EDX          -2^31 to 2^31 - 1
Non-integral results are truncated (chopped) towards 0. The sign of the remainder is always the same as the sign of the dividend. The absolute value of the remainder is always less than the absolute value of the divisor. Overflow is indicated with the #DE (divide error) exception rather than with the OF (overflow) flag.

Operation
IF SRC = 0
    THEN #DE; (* divide error *)
FI;
IF OperandSize = 8 (* word/byte operation *)
    THEN
        temp ← AX / SRC; (* signed division *)
        IF (temp > 7FH) OR (temp < 80H)
        (* if a positive result is greater than 7FH or a negative result is less than 80H *)
            THEN #DE; (* divide error *)
            ELSE
                AL ← temp;
                AH ← AX SignedModulus SRC;
        FI;
    ELSE
        IF OperandSize = 16 (* doubleword/word operation *)
            THEN
                temp ← DX:AX / SRC; (* signed division *)
                IF (temp > 7FFFH) OR (temp < 8000H)
                (* if a positive result is greater than 7FFFH *)
                (* or a negative result is less than 8000H *)
                    THEN #DE; (* divide error *)
                    ELSE
                        AX ← temp;
                        DX ← DX:AX SignedModulus SRC;
                FI;
            ELSE (* quadword/doubleword operation *)
                temp ← EDX:EAX / SRC; (* signed division *)
                IF (temp > 7FFFFFFFH) OR (temp < 80000000H)
                (* if a positive result is greater than 7FFFFFFFH *)
                (* or a negative result is less than 80000000H *)
                    THEN #DE; (* divide error *)
                    ELSE
                        EAX ← temp;
                        EDX ← EDX:EAX SignedModulus SRC;
                FI;
        FI;
FI;
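The rounding and overflow rules differ from the floor division found in many high-level languages, so a small model may help. The sketch below treats the dividend as an ordinary signed integer; `DivideError` stands in for the #DE exception, and both it and the function name `idiv` are inventions of this example.

```python
class DivideError(Exception):
    """Stands in for the #DE (divide error) exception."""

def idiv(dividend: int, divisor: int, operand_size: int) -> tuple:
    """Return (quotient, remainder) for a signed divide; operand_size is 8, 16, or 32."""
    if divisor == 0:
        raise DivideError("divisor is 0")
    # Truncate toward 0 (Python's // operator would round toward -infinity).
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    remainder = dividend - quotient * divisor   # takes the sign of the dividend
    limit = 1 << (operand_size - 1)             # 128, 32768, or 2**31
    if not -limit <= quotient < limit:
        raise DivideError("quotient does not fit in the destination")
    return quotient, remainder

print(idiv(-7, 2, 8))    # quotient -3, remainder -1 (not -4 and +1)
```

Note that a valid dividend and non-zero divisor can still fault: dividing -32768 by -1 with a 16-bit operand yields a quotient one past the largest representable value.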
Flags Affected
The CF, OF, SF, ZF, AF, and PF flags are undefined.

Protected Mode Exceptions
#DE                If the source operand (divisor) is 0. The signed result (quotient) is too large for the destination.
#GP(0)             If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
3-293
IDIVSigned Divide (Continued)

Real-Address Mode Exceptions #DE If the source operand (divisor) is 0. The signed result (quotient) is too large for the destination. #GP #SS If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If a memory operand effective address is outside the SS segment limit.
Virtual-8086 Mode Exceptions #DE If the source operand (divisor) is 0. The signed result (quotient) is too large for the destination. #GP(0) #SS(0) #PF(fault-code) #AC(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If a memory operand effective address is outside the SS segment limit. If a page fault occurs. If alignment checking is enabled and an unaligned memory reference is made.
3-294
IMUL - Signed Multiply

Opcode      Instruction            Description
F6 /5       IMUL r/m8              AX ← AL * r/m byte
F7 /5       IMUL r/m16             DX:AX ← AX * r/m word
F7 /5       IMUL r/m32             EDX:EAX ← EAX * r/m doubleword
0F AF /r    IMUL r16,r/m16         word register ← word register * r/m word
0F AF /r    IMUL r32,r/m32         doubleword register ← doubleword register * r/m doubleword
6B /r ib    IMUL r16,r/m16,imm8    word register ← r/m16 * sign-extended immediate byte
6B /r ib    IMUL r32,r/m32,imm8    doubleword register ← r/m32 * sign-extended immediate byte
6B /r ib    IMUL r16,imm8          word register ← word register * sign-extended immediate byte
6B /r ib    IMUL r32,imm8          doubleword register ← doubleword register * sign-extended immediate byte
69 /r iw    IMUL r16,r/m16,imm16   word register ← r/m16 * immediate word
69 /r id    IMUL r32,r/m32,imm32   doubleword register ← r/m32 * immediate doubleword
69 /r iw    IMUL r16,imm16         word register ← r/m16 * immediate word
69 /r id    IMUL r32,imm32         doubleword register ← r/m32 * immediate doubleword
Description
This instruction performs a signed multiplication of two operands. It has three forms, depending on the number of operands.

- One-operand form. This form is identical to that used by the MUL instruction. Here, the source operand (in a general-purpose register or memory location) is multiplied by the value in the AL, AX, or EAX register (depending on the operand size) and the product is stored in the AX, DX:AX, or EDX:EAX registers, respectively.
- Two-operand form. With this form the destination operand (the first operand) is multiplied by the source operand (second operand). The destination operand is a general-purpose register and the source operand is an immediate value, a general-purpose register, or a memory location. The product is then stored in the destination operand location.
- Three-operand form. This form requires a destination operand (the first operand) and two source operands (the second and the third operands). Here, the first source operand (which can be a general-purpose register or a memory location) is multiplied by the second source operand (an immediate value). The product is then stored in the destination operand (a general-purpose register).

When an immediate value is used as an operand, it is sign-extended to the length of the destination operand format.

The CF and OF flags are set when significant bits are carried into the upper half of the result. The CF and OF flags are cleared when the result fits exactly in the lower half of the result.

The three forms of the IMUL instruction are similar in that the length of the product is calculated to twice the length of the operands. With the one-operand form, the product is stored exactly in the destination. With the two- and three-operand forms, however, the result is truncated to the length of the destination before it is stored in the destination register. Because of this truncation, the CF or OF flag should be tested to ensure that no significant bits are lost.

The two- and three-operand forms may also be used with unsigned operands because the lower half of the product is the same regardless of whether the operands are signed or unsigned. The CF and OF flags, however, cannot be used to determine if the upper half of the result is non-zero.
Operation
IF (NumberOfOperands = 1)
    THEN IF (OperandSize = 8)
        THEN
            AX ← AL * SRC (* signed multiplication *)
            IF ((AH = 00H) OR (AH = FFH))
                THEN CF = 0; OF = 0;
                ELSE CF = 1; OF = 1;
            FI;
        ELSE IF OperandSize = 16
            THEN
                DX:AX ← AX * SRC (* signed multiplication *)
                IF ((DX = 0000H) OR (DX = FFFFH))
                    THEN CF = 0; OF = 0;
                    ELSE CF = 1; OF = 1;
                FI;
            ELSE (* OperandSize = 32 *)
                EDX:EAX ← EAX * SRC (* signed multiplication *)
                IF ((EDX = 00000000H) OR (EDX = FFFFFFFFH))
                    THEN CF = 0; OF = 0;
                    ELSE CF = 1; OF = 1;
                FI;
        FI;
    FI;
    ELSE IF (NumberOfOperands = 2)
        THEN
            temp ← DEST * SRC (* signed multiplication; temp is double DEST size *)
            DEST ← DEST * SRC (* signed multiplication *)
            IF temp ≠ DEST
                THEN CF = 1; OF = 1;
                ELSE CF = 0; OF = 0;
            FI;
        ELSE (* NumberOfOperands = 3 *)
            DEST ← SRC1 * SRC2 (* signed multiplication *)
            temp ← SRC1 * SRC2 (* signed multiplication; temp is double SRC1 size *)
            IF temp ≠ DEST
                THEN CF = 1; OF = 1;
                ELSE CF = 0; OF = 0;
            FI;
    FI;
FI;
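The truncating behaviour of the two- and three-operand forms can be modelled in a few lines. This Python sketch is an illustration under stated assumptions: `imul_truncating` and `to_signed` are hypothetical names, operands are passed as raw register values, and only the two- and three-operand branches of the pseudocode are modelled.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def imul_truncating(src1, src2, operand_size):
    """Model the two- and three-operand IMUL forms.

    Returns (dest, flag), where dest is the truncated raw result and
    flag is the common value of CF and OF.
    """
    # temp holds the full double-width signed product.
    temp = to_signed(src1, operand_size) * to_signed(src2, operand_size)
    dest = temp & ((1 << operand_size) - 1)
    # CF and OF are set when truncation changed the signed value.
    flag = int(to_signed(dest, operand_size) != temp)
    return dest, flag
```

Under this model, `imul_truncating(0x4000, 4, 16)` returns `(0, 1)` because 65,536 does not fit in 16 bits, while `imul_truncating(0xFFFF, 0xFFFF, 16)` returns `(1, 0)`. The second case also shows why the flags say nothing about an unsigned interpretation: read as unsigned, 65,535 * 65,535 has the same low half (1) but a non-zero upper half.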
Flags Affected
For the one-operand form of the instruction, the CF and OF flags are set when significant bits are carried into the upper half of the result and cleared when the result fits exactly in the lower half of the result. For the two- and three-operand forms of the instruction, the CF and OF flags are set when the result must be truncated to fit in the destination operand size and cleared when the result fits exactly in the destination operand size. The SF, ZF, AF, and PF flags are undefined.

Protected Mode Exceptions
#GP(0)            If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                  If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
IN - Input from Port

Opcode   Instruction    Description
E4 ib    IN AL,imm8     Input byte from imm8 I/O port address into AL
E5 ib    IN AX,imm8     Input word from imm8 I/O port address into AX
E5 ib    IN EAX,imm8    Input doubleword from imm8 I/O port address into EAX
EC       IN AL,DX       Input byte from I/O port in DX into AL
ED       IN AX,DX       Input word from I/O port in DX into AX
ED       IN EAX,DX      Input doubleword from I/O port in DX into EAX
Description
This instruction copies the value from the I/O port specified with the second operand (source operand) to the destination operand (first operand). The source operand can be a byte immediate or the DX register; the destination operand can be register AL, AX, or EAX, depending on the size of the port being accessed (8, 16, or 32 bits, respectively). Using the DX register as a source operand allows I/O port addresses from 0 to 65,535 to be accessed; using a byte immediate allows I/O port addresses 0 to 255 to be accessed.

When accessing an 8-bit I/O port, the opcode determines the port size; when accessing a 16- or 32-bit I/O port, the operand-size attribute determines the port size. At the machine code level, I/O instructions are shorter when accessing 8-bit I/O ports. Here, the upper eight bits of the port address will be 0.

This instruction is only useful for accessing I/O ports located in the processor's I/O address space. Refer to Chapter 10, "Input/Output," of the Intel Architecture Software Developer's Manual, Volume 1, for more information on accessing I/O ports in the I/O address space.

Operation
IF ((PE = 1) AND ((CPL > IOPL) OR (VM = 1)))
    THEN (* Protected mode with CPL > IOPL or virtual-8086 mode *)
        IF (Any I/O Permission Bit for I/O port being accessed = 1)
            THEN (* I/O operation is not allowed *)
                #GP(0);
            ELSE (* I/O operation is allowed *)
                DEST ← SRC; (* Reads from selected I/O port *)
        FI;
    ELSE (* Real Mode or Protected Mode with CPL ≤ IOPL *)
        DEST ← SRC; (* Reads from selected I/O port *)
FI;
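The permission test in the pseudocode above can be sketched as a small decision function. This is a hedged model, not a description of processor internals: `check_io_access` and `GeneralProtectionFault` are hypothetical names, the permission bits are represented as a plain bit array with bit n standing for port n, and treating ports beyond the end of the supplied array as denied is an assumption of this sketch.

```python
class GeneralProtectionFault(Exception):
    """Stands in for #GP(0) in this model."""


def check_io_access(port, width, pe, vm, cpl, iopl, io_bitmap):
    """Return True if an I/O access of `width` bytes at `port` is allowed.

    io_bitmap is a bytes-like bit array (bit n = port n, an assumption
    of this sketch). Raises GeneralProtectionFault if any permission bit
    covering the access is 1.
    """
    if pe == 1 and (cpl > iopl or vm == 1):
        # Every byte-wide port covered by the access must have a 0 bit.
        for p in range(port, port + width):
            byte_index, bit = divmod(p, 8)
            if byte_index >= len(io_bitmap):
                raise GeneralProtectionFault(p)  # assumed: out of range is denied
            if (io_bitmap[byte_index] >> bit) & 1:
                raise GeneralProtectionFault(p)
    # Real mode, or protected mode with CPL <= IOPL: no bitmap check.
    return True
```

With `io_bitmap = bytes([0b00000100])`, a byte access to port 1 at CPL 3 and IOPL 0 is allowed, while a word access at port 1 faults because it also covers port 2.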
Protected Mode Exceptions
#GP(0)            If the CPL is greater than (has less privilege than) the I/O privilege level (IOPL) and any of the corresponding I/O permission bits in the TSS for the I/O port being accessed is 1.

Real-Address Mode Exceptions
None.

Virtual-8086 Mode Exceptions
#GP(0)            If any of the I/O permission bits in the TSS for the I/O port being accessed is 1.
INC - Increment by 1

Opcode    Instruction   Description
FE /0     INC r/m8      Increment r/m byte by 1
FF /0     INC r/m16     Increment r/m word by 1
FF /0     INC r/m32     Increment r/m doubleword by 1
40+rw     INC r16       Increment word register by 1
40+rd     INC r32       Increment doubleword register by 1
Description
This instruction adds one to the destination operand, while preserving the state of the CF flag. The destination operand can be a register or a memory location. This instruction allows a loop counter to be updated without disturbing the CF flag. (Use an ADD instruction with an immediate operand of 1 to perform an increment operation that does update the CF flag.)

Operation
DEST ← DEST + 1;
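As an illustration of how INC leaves CF alone while updating the other arithmetic flags, here is a small Python model. The function name `inc` and the dictionary of flags are hypothetical conveniences; the flag rules used (OF on overflow from the largest positive value, AF on a carry out of bit 3, PF on even parity of the low byte) are the standard definitions of those flags rather than text from this section.

```python
def inc(value, operand_size, cf):
    """Model INC on a raw operand value; returns (result, flags).

    cf is the incoming carry flag, which INC passes through unchanged.
    """
    mask = (1 << operand_size) - 1
    sign = 1 << (operand_size - 1)
    value &= mask
    result = (value + 1) & mask
    flags = {
        "CF": cf,                                # not affected by INC
        "OF": int(result == sign),               # 7F..FH + 1 wraps to 80..0H
        "SF": int(bool(result & sign)),
        "ZF": int(result == 0),
        "AF": int((value & 0xF) == 0xF),         # carry out of bit 3
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),
    }
    return result, flags
```

Incrementing 0FFH as a byte wraps to 0 and sets ZF, yet CF keeps whatever value was passed in, which is the difference from ADD with an immediate of 1.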
Flags Affected
The CF flag is not affected. The OF, SF, ZF, AF, and PF flags are set according to the result.

Protected Mode Exceptions
#GP(0)            If the destination operand is located in a nonwritable segment.
                  If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                  If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
INS/INSB/INSW/INSD - Input from Port to String

Opcode   Instruction    Description
6C       INS m8, DX     Input byte from I/O port specified in DX into memory location specified in ES:(E)DI
6D       INS m16, DX    Input word from I/O port specified in DX into memory location specified in ES:(E)DI
6D       INS m32, DX    Input doubleword from I/O port specified in DX into memory location specified in ES:(E)DI
6C       INSB           Input byte from I/O port specified in DX into memory location specified in ES:(E)DI
6D       INSW           Input word from I/O port specified in DX into memory location specified in ES:(E)DI
6D       INSD           Input doubleword from I/O port specified in DX into memory location specified in ES:(E)DI
Description
These instructions copy the data from the I/O port specified with the source operand (second operand) to the destination operand (first operand). The source operand is an I/O port address (from 0 to 65,535) that is read from the DX register. The destination operand is a memory location, the address of which is read from either the ES:EDI or the ES:DI registers (depending on the address-size attribute of the instruction, 32 or 16, respectively). The ES segment cannot be overridden with a segment override prefix. The size of the I/O port being accessed (that is, the size of the source and destination operands) is determined by the opcode for an 8-bit I/O port or by the operand-size attribute of the instruction for a 16- or 32-bit I/O port.

At the assembly-code level, two forms of this instruction are allowed: the "explicit-operands" form and the "no-operands" form. The explicit-operands form (specified with the INS mnemonic) allows the source and destination operands to be specified explicitly. Here, the source operand must be DX, and the destination operand should be a symbol that indicates the size of the I/O port and the destination address. This explicit-operands form is provided to allow documentation; however, note that the documentation provided by this form can be misleading. That is, the destination operand symbol must specify the correct type (size) of the operand (byte, word, or doubleword), but it does not have to specify the correct location. The location is always specified by the ES:(E)DI registers, which must be loaded correctly before the INS instruction is executed.

The no-operands form provides "short forms" of the byte, word, and doubleword versions of the INS instructions. Here also DX is assumed by the processor to be the source operand and ES:(E)DI is assumed to be the destination operand. The size of the I/O port is specified with the choice of mnemonic: INSB (byte), INSW (word), or INSD (doubleword).

After the byte, word, or doubleword is transferred from the I/O port to the memory location, the (E)DI register is incremented or decremented automatically according to the setting of the DF flag in the EFLAGS register. (If the DF flag is 0, the (E)DI register is incremented; if the DF flag is 1, the (E)DI register is decremented.) The (E)DI register is incremented or decremented by one for byte operations, by two for word operations, or by four for doubleword operations.

The INS, INSB, INSW, and INSD instructions can be preceded by the REP prefix for block input of ECX bytes, words, or doublewords. Refer to "REP/REPE/REPZ/REPNE/REPNZ - Repeat String Operation Prefix" in this chapter for a description of the REP prefix.

These instructions are only useful for accessing I/O ports located in the processor's I/O address space. Refer to Chapter 10, "Input/Output," of the Intel Architecture Software Developer's Manual, Volume 1, for more information on accessing I/O ports in the I/O address space.

Operation
IF ((PE = 1) AND ((CPL > IOPL) OR (VM = 1)))
    THEN (* Protected mode with CPL > IOPL or virtual-8086 mode *)
        IF (Any I/O Permission Bit for I/O port being accessed = 1)
            THEN (* I/O operation is not allowed *)
                #GP(0);
            ELSE (* I/O operation is allowed *)
                DEST ← SRC; (* Reads from I/O port *)
        FI;
    ELSE (* Real Mode or Protected Mode with CPL ≤ IOPL *)
        DEST ← SRC; (* Reads from I/O port *)
FI;
IF (byte transfer)
    THEN IF DF = 0
        THEN (E)DI ← (E)DI + 1;
        ELSE (E)DI ← (E)DI - 1;
    FI;
    ELSE IF (word transfer)
        THEN IF DF = 0
            THEN (E)DI ← (E)DI + 2;
            ELSE (E)DI ← (E)DI - 2;
        FI;
        ELSE (* doubleword transfer *)
            IF DF = 0
                THEN (E)DI ← (E)DI + 4;
                ELSE (E)DI ← (E)DI - 4;
            FI;
    FI;
FI;
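The pointer update can be made concrete with a short model of a REP-prefixed INS. This Python sketch makes several simplifying assumptions: `rep_ins` and `read_port` are hypothetical names, memory is a flat `bytearray` standing in for the ES segment, the privilege checks are omitted, and address wraparound is not modelled.

```python
def rep_ins(read_port, memory, edi, count, width, df):
    """Model REP INSB/INSW/INSD (width = 1, 2, or 4 bytes).

    read_port is a callable returning the next value read from the port;
    count plays the role of ECX. Returns the final (E)DI value.
    """
    step = -width if df else width  # DF = 1 walks downward through memory
    for _ in range(count):
        value = read_port()
        # Store the item in little-endian order at the current (E)DI.
        memory[edi:edi + width] = value.to_bytes(width, "little")
        edi += step
    return edi
```

Reading two words with DF = 0 starting at offset 0 leaves (E)DI at 4; with DF = 1 starting at offset 4, the second word lands below the first and (E)DI ends at 0.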
Protected Mode Exceptions
#GP(0)            If the CPL is greater than (has less privilege than) the I/O privilege level (IOPL) and any of the corresponding I/O permission bits in the TSS for the I/O port being accessed is 1.
                  If the destination is located in a nonwritable segment.
                  If an illegal memory operand effective address in the ES segment is given.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Virtual-8086 Mode Exceptions
#GP(0)            If any of the I/O permission bits in the TSS for the I/O port being accessed is 1.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made.
INT n/INTO/INT 3 - Call to Interrupt Procedure

Opcode   Instruction   Description
CC       INT 3         Interrupt 3 - trap to debugger
CD ib    INT imm8      Interrupt vector number specified by immediate byte
CE       INTO          Interrupt 4 - if overflow flag is 1
Description
The INT n instruction generates a call to the interrupt or exception handler specified with the destination operand. For more information, refer to Section 4.4., "Interrupts and Exceptions," in Chapter 4, "Procedure Calls, Interrupts, and Exceptions," of the Intel Architecture Software Developer's Manual, Volume 1. The destination operand specifies an interrupt vector number from 0 to 255, encoded as an 8-bit unsigned immediate value. Each interrupt vector number provides an index to a gate descriptor in the IDT. The first 32 interrupt vector numbers are reserved by Intel for system use. Some of these interrupts are used for internally generated exceptions.

The INT n instruction is the general mnemonic for executing a software-generated call to an interrupt handler. The INTO instruction is a special mnemonic for calling overflow exception (#OF), interrupt vector number 4. The overflow interrupt checks the OF flag in the EFLAGS register and calls the overflow interrupt handler if the OF flag is set to 1.

The INT 3 instruction generates a special one-byte opcode (CC) that is intended for calling the debug exception handler. (This one-byte form is valuable because it can be used to replace the first byte of any instruction with a breakpoint, including other one-byte instructions, without overwriting other code.) To further support its function as a debug breakpoint, the interrupt generated with the CC opcode also differs from the regular software interrupts as follows:

- Interrupt redirection does not happen when in VME mode; the interrupt is handled by a protected-mode handler.
- The virtual-8086 mode IOPL checks do not occur. The interrupt is taken without faulting at any IOPL level.

Note that the "normal" 2-byte opcode for INT 3 (CD03) does not have these special features. Intel and Microsoft assemblers will not generate the CD03 opcode from any mnemonic, but this opcode can be created by direct numeric code definition or by self-modifying code.

The action of the INT n instruction (including the INTO and INT 3 instructions) is similar to that of a far call made with the CALL instruction. The primary difference is that with the INT n instruction, the EFLAGS register is pushed onto the stack before the return address. (The return address is a far address consisting of the current values of the CS and EIP registers.) Returns from interrupt procedures are handled with the IRET instruction, which pops the EFLAGS information and return address from the stack.

The interrupt vector number specifies an interrupt descriptor in the interrupt descriptor table (IDT); that is, it provides an index into the IDT. The selected interrupt descriptor in turn contains a pointer to an interrupt or exception handler procedure. In protected mode, the IDT contains an array of 8-byte descriptors, each of which is an interrupt gate, trap gate, or task gate. In real-address mode, the IDT is an array of 4-byte far pointers (a 2-byte code segment selector and a 2-byte instruction pointer), each of which points directly to a procedure in the selected segment. (Note that in real-address mode, the IDT is called the interrupt vector table, and its pointers are called interrupt vectors.)

The following decision table indicates which action in the lower portion of the table is taken given the conditions in the upper portion of the table. Each Y in the lower section of the decision table represents a procedure defined in the "Operation" section for this instruction (except #GP).
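Conditions (upper portion; one line per column of the original table):

PE   VM   IOPL   DPL/CPL RELATIONSHIP   INTERRUPT TYPE   GATE TYPE
0    -    -      -                      -                -
1    -    -      DPL < CPL              S/W              -
1    -    -      -                      -                Task
1    -    -      DPL > CPL              -                Trap or Interrupt
1    -    -      DPL = CPL or C         -                Trap or Interrupt
1    0    -      DPL < CPL & NC         -                Trap or Interrupt
1    1    <3     -                      -                Trap or Interrupt
1    1    =3     -                      -                Trap or Interrupt

Actions (lower portion): REAL-ADDRESS-MODE, PROTECTED-MODE, TRAP-OR-INTERRUPT-GATE, INTER-PRIVILEGE-LEVEL-INTERRUPT, INTRA-PRIVILEGE-LEVEL-INTERRUPT, INTERRUPT-FROM-VIRTUAL-8086-MODE, TASK-GATE, #GP.

NOTES:
-       Don't Care.
Y       Yes, Action Taken.
Blank   Action Not Taken.

[The positions of the Y entries in the lower portion of the table are not recoverable from this copy; the "Operation" section defines which procedure applies to each condition.]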
When the processor is executing in virtual-8086 mode, the IOPL determines the action of the INT n instruction. If the IOPL is less than 3, the processor generates a general protection exception (#GP); if the IOPL is 3, the processor executes a protected mode interrupt to privilege level 0. The interrupt gate's DPL must be set to 3 and the target CPL of the interrupt handler procedure must be 0 to execute the protected mode interrupt to privilege level 0.

The interrupt descriptor table register (IDTR) specifies the base linear address and limit of the IDT. The initial base address value of the IDTR after the processor is powered up or reset is 0.

Operation
The following operational description applies not only to the INT n and INTO instructions, but also to external interrupts and exceptions.
IF PE=0
    THEN
        GOTO REAL-ADDRESS-MODE;
    ELSE (* PE=1 *)
        IF (VM=1 AND IOPL < 3 AND INT n)
            THEN
                #GP(0);
            ELSE (* protected mode or virtual-8086 mode interrupt *)
                GOTO PROTECTED-MODE;
        FI;
FI;
REAL-ADDRESS-MODE:
    IF ((DEST * 4) + 3) is not within IDT limit THEN #GP; FI;
    IF stack not large enough for a 6-byte return information THEN #SS; FI;
    Push (EFLAGS[15:0]);
    IF ← 0; (* Clear interrupt flag *)
    TF ← 0; (* Clear trap flag *)
    AC ← 0; (* Clear AC flag *)
    Push(CS);
    Push(IP);
    (* No error codes are pushed *)
    CS ← IDT(Descriptor(vector_number * 4), selector);
    EIP ← IDT(Descriptor(vector_number * 4), offset); (* 16-bit offset AND 0000FFFFH *)
END;
PROTECTED-MODE:
    IF ((DEST * 8) + 7) is not within IDT limits
        OR selected IDT descriptor is not an interrupt-, trap-, or task-gate type
        THEN #GP((DEST * 8) + 2 + EXT);
        (* EXT is bit 0 in error code *)
    FI;
    IF software interrupt (* generated by INT n, INT 3, or INTO *)
        THEN
            IF gate descriptor DPL < CPL
                THEN #GP((vector_number * 8) + 2);
                (* PE=1, DPL<CPL, software interrupt *)
            FI;
    FI;
    IF gate not present THEN #NP((vector_number * 8) + 2 + EXT); FI;
    IF task gate (* specified in the selected interrupt table descriptor *)
        THEN GOTO TASK-GATE;
        ELSE GOTO TRAP-OR-INTERRUPT-GATE; (* PE=1, trap/interrupt gate *)
    FI;
END;
TASK-GATE: (* PE=1, task gate *)
    Read segment selector in task gate (IDT descriptor);
    IF local/global bit is set to local OR index not within GDT limits
        THEN #GP(TSS selector);
    FI;
    Access TSS descriptor in GDT;
    IF TSS descriptor specifies that the TSS is busy (low-order 5 bits set to 00001)
        THEN #GP(TSS selector);
    FI;
    IF TSS not present THEN #NP(TSS selector); FI;
    SWITCH-TASKS (with nesting) to TSS;
    IF interrupt caused by fault with error code
        THEN
            IF stack limit does not allow push of error code THEN #SS(0); FI;
            Push(error code);
    FI;
    IF EIP not within code segment limit THEN #GP(0); FI;
END;
TRAP-OR-INTERRUPT-GATE:
    Read segment selector for trap or interrupt gate (IDT descriptor);
    IF segment selector for code segment is null
        THEN #GP(0H + EXT); (* null selector with EXT flag set *)
    FI;
    IF segment selector is not within its descriptor table limits THEN #GP(selector + EXT); FI;
    Read trap or interrupt handler descriptor;
    IF descriptor does not indicate a code segment OR code segment descriptor DPL > CPL
        THEN #GP(selector + EXT);
    FI;
    IF trap or interrupt gate segment is not present THEN #NP(selector + EXT); FI;
    IF code segment is non-conforming AND DPL < CPL
        THEN
            IF VM=0
                THEN
                    GOTO INTER-PRIVILEGE-LEVEL-INTERRUPT;
                    (* PE=1, interrupt or trap gate, nonconforming code segment, DPL<CPL, VM=0 *)
                ELSE (* VM=1 *)
                    IF code segment DPL ≠ 0 THEN #GP(new code segment selector); FI;
                    GOTO INTERRUPT-FROM-VIRTUAL-8086-MODE;
                    (* PE=1, interrupt or trap gate, DPL<CPL, VM=1 *)
            FI;
        ELSE (* PE=1, interrupt or trap gate, DPL ≥ CPL *)
            IF VM=1 THEN #GP(new code segment selector); FI;
            IF code segment is conforming OR code segment DPL = CPL
                THEN
                    GOTO INTRA-PRIVILEGE-LEVEL-INTERRUPT;
                ELSE
                    #GP(CodeSegmentSelector + EXT);
                    (* PE=1, interrupt or trap gate, nonconforming code segment, DPL>CPL *)
            FI;
    FI;
END;
INTER-PRIVILEGE-LEVEL-INTERRUPT:
    (* PE=1, interrupt or trap gate, non-conforming code segment, DPL<CPL *)
    (* Check segment selector and descriptor for stack of new privilege level in current TSS *)
    IF current TSS is 32-bit TSS
        THEN
            TSSstackAddress ← (new code segment DPL * 8) + 4
            IF (TSSstackAddress + 7) > TSS limit THEN #TS(current TSS selector); FI;
            NewSS ← TSSstackAddress + 4;
            NewESP ← stack address;
        ELSE (* TSS is 16-bit *)
            TSSstackAddress ← (new code segment DPL * 4) + 2
            IF (TSSstackAddress + 4) > TSS limit THEN #TS(current TSS selector); FI;
            NewESP ← TSSstackAddress;
            NewSS ← TSSstackAddress + 2;
    FI;
    IF segment selector is null THEN #TS(EXT); FI;
    IF segment selector index is not within its descriptor table limits
        OR segment selector's RPL ≠ DPL of code segment
        THEN #TS(SS selector + EXT);
    FI;
    Read segment descriptor for stack segment in GDT or LDT;
    IF stack segment DPL ≠ DPL of code segment
        OR stack segment does not indicate writable data segment
        THEN #TS(SS selector + EXT);
    FI;
    IF stack segment not present THEN #SS(SS selector + EXT); FI;
    IF 32-bit gate
        THEN
            IF new stack does not have room for 24 bytes (error code pushed)
                OR 20 bytes (no error code pushed)
                THEN #SS(segment selector + EXT);
            FI;
        ELSE (* 16-bit gate *)
            IF new stack does not have room for 12 bytes (error code pushed)
                OR 10 bytes (no error code pushed)
                THEN #SS(segment selector + EXT);
            FI;
    FI;
    IF instruction pointer is not within code segment limits THEN #GP(0); FI;
    SS:ESP ← TSS(NewSS:NewESP) (* segment descriptor information also loaded *)
    IF 32-bit gate
        THEN
            CS:EIP ← Gate(CS:EIP); (* segment descriptor information also loaded *)
        ELSE (* 16-bit gate *)
            CS:IP ← Gate(CS:IP); (* segment descriptor information also loaded *)
    FI;
    IF 32-bit gate
        THEN
            Push(far pointer to old stack); (* old SS and ESP, 3 words padded to 4 *)
            Push(EFLAGS);
            Push(far pointer to return instruction); (* old CS and EIP, 3 words padded to 4 *)
            Push(ErrorCode); (* if needed, 4 bytes *)
        ELSE (* 16-bit gate *)
            Push(far pointer to old stack); (* old SS and SP, 2 words *)
            Push(EFLAGS(15..0));
            Push(far pointer to return instruction); (* old CS and IP, 2 words *)
            Push(ErrorCode); (* if needed, 2 bytes *)
    FI;
    CPL ← CodeSegmentDescriptor(DPL);
    CS(RPL) ← CPL;
    IF interrupt gate
        THEN IF ← 0; (* interrupt flag to 0 (disabled) *)
    FI;
    TF ← 0;
    VM ← 0;
    RF ← 0;
    NT ← 0;
END;
INTERRUPT-FROM-VIRTUAL-8086-MODE:
    (* Check segment selector and descriptor for privilege level 0 stack in current TSS *)
    IF current TSS is 32-bit TSS
        THEN
            TSSstackAddress ← (new code segment DPL * 8) + 4
            IF (TSSstackAddress + 7) > TSS limit THEN #TS(current TSS selector); FI;
            NewSS ← TSSstackAddress + 4;
            NewESP ← stack address;
        ELSE (* TSS is 16-bit *)
            TSSstackAddress ← (new code segment DPL * 4) + 2
            IF (TSSstackAddress + 4) > TSS limit THEN #TS(current TSS selector); FI;
            NewESP ← TSSstackAddress;
            NewSS ← TSSstackAddress + 2;
    FI;
    IF segment selector is null THEN #TS(EXT); FI;
    IF segment selector index is not within its descriptor table limits
        OR segment selector's RPL ≠ DPL of code segment
        THEN #TS(SS selector + EXT);
    FI;
    Access segment descriptor for stack segment in GDT or LDT;
    IF stack segment DPL ≠ DPL of code segment
        OR stack segment does not indicate writable data segment
        THEN #TS(SS selector + EXT);
    FI;
    IF stack segment not present THEN #SS(SS selector + EXT); FI;
    IF 32-bit gate
        THEN
            IF new stack does not have room for 40 bytes (error code pushed)
                OR 36 bytes (no error code pushed)
                THEN #SS(segment selector + EXT);
            FI;
        ELSE (* 16-bit gate *)
            IF new stack does not have room for 20 bytes (error code pushed)
                OR 18 bytes (no error code pushed)
                THEN #SS(segment selector + EXT);
            FI;
    FI;
    IF instruction pointer is not within code segment limits THEN #GP(0); FI;
    TempEFLAGS ← EFLAGS;
    VM ← 0;
    TF ← 0;
    RF ← 0;
    IF service through interrupt gate THEN IF ← 0; FI;
    TempSS ← SS;
    TempESP ← ESP;
    SS:ESP ← TSS(SS0:ESP0); (* Change to level 0 stack segment *)
    (* Following pushes are 16 bits for 16-bit gate and 32 bits for 32-bit gates *)
    (* Segment selector pushes in 32-bit mode are padded to two words *)
    Push(GS);
    Push(FS);
    Push(DS);
    Push(ES);
    Push(TempSS);
    Push(TempESP);
    Push(TempEFLAGS);
    Push(CS);
    Push(EIP);
    GS ← 0; (* segment registers nullified, invalid in protected mode *)
    FS ← 0;
    DS ← 0;
    ES ← 0;
    CS ← Gate(CS);
    IF OperandSize=32
        THEN
            EIP ← Gate(instruction pointer);
        ELSE (* OperandSize is 16 *)
            EIP ← Gate(instruction pointer) AND 0000FFFFH;
    FI;
    (* Starts execution of new routine in Protected Mode *)
END;
INTRA-PRIVILEGE-LEVEL-INTERRUPT:
    (* PE=1, DPL = CPL or conforming segment *)
    IF 32-bit gate
        THEN
            IF current stack does not have room for 16 bytes (error code pushed)
                OR 12 bytes (no error code pushed)
                THEN #SS(0);
            FI;
        ELSE (* 16-bit gate *)
            IF current stack does not have room for 8 bytes (error code pushed)
                OR 6 bytes (no error code pushed)
                THEN #SS(0);
            FI;
    FI;
    IF instruction pointer not within code segment limit THEN #GP(0); FI;
    IF 32-bit gate
        THEN
            Push (EFLAGS);
            Push (far pointer to return instruction); (* 3 words padded to 4 *)
            CS:EIP ← Gate(CS:EIP); (* segment descriptor information also loaded *)
            Push (ErrorCode); (* if any *)
        ELSE (* 16-bit gate *)
            Push (FLAGS);
            Push (far pointer to return location); (* 2 words *)
            CS:IP ← Gate(CS:IP); (* segment descriptor information also loaded *)
            Push (ErrorCode); (* if any *)
    FI;
    CS(RPL) ← CPL;
    IF interrupt gate THEN IF ← 0; FI;
    TF ← 0;
    NT ← 0;
    VM ← 0;
    RF ← 0;
END;
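Of the procedures above, REAL-ADDRESS-MODE is the simplest to trace by hand. The following Python sketch models it under stated assumptions: `int_real_mode` and `GeneralProtectionFault` are hypothetical names, the stack is a Python list whose `append` plays the role of Push, the #SS stack-size check is omitted, and each 4-byte vector is assumed to hold the 16-bit offset in its low word and the segment in its high word (the conventional real-mode layout, which the text does not spell out).

```python
IF_FLAG = 1 << 9    # interrupt enable flag
TF_FLAG = 1 << 8    # trap flag
AC_FLAG = 1 << 18   # alignment check flag


class GeneralProtectionFault(Exception):
    """Stands in for #GP in this model."""


def int_real_mode(vector, ivt, idt_limit, cs, ip, eflags, stack):
    """Model INT n in real-address mode.

    ivt is the interrupt vector table as bytes; idt_limit is the IDTR
    limit. Returns (new_cs, new_ip, new_eflags) and pushes the return
    information onto `stack`.
    """
    entry = vector * 4
    if entry + 3 > idt_limit:
        raise GeneralProtectionFault(vector)
    stack.append(eflags & 0xFFFF)             # Push(EFLAGS[15:0])
    eflags &= ~(IF_FLAG | TF_FLAG | AC_FLAG)  # clear IF, TF, and AC
    stack.append(cs)                          # Push(CS)
    stack.append(ip)                          # Push(IP)
    new_ip = int.from_bytes(ivt[entry:entry + 2], "little")
    new_cs = int.from_bytes(ivt[entry + 2:entry + 4], "little")
    return new_cs, new_ip, eflags
```

With vector 21H pointing at F000:1234 and FLAGS = 0302H on entry, the model pushes 0302H, the old CS, and the old IP in that order, and returns with IF and TF cleared.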
Flags Affected
The EFLAGS register is pushed onto the stack. The IF, TF, NT, AC, RF, and VM flags may be cleared, depending on the mode of operation of the processor when the INT instruction is executed (refer to the "Operation" section). If the interrupt uses a task gate, any flags may be set or cleared, controlled by the EFLAGS image in the new task's TSS.
Protected Mode Exceptions
#GP(0)            If the instruction pointer in the IDT or in the interrupt-, trap-, or task gate is beyond the code segment limits.
#GP(selector)     If the segment selector in the interrupt-, trap-, or task gate is null.
                  If an interrupt-, trap-, or task gate, code segment, or TSS segment selector index is outside its descriptor table limits.
                  If the interrupt vector number is outside the IDT limits.
                  If an IDT descriptor is not an interrupt-, trap-, or task-descriptor.
                  If an interrupt is generated by the INT n, INT 3, or INTO instruction and the DPL of an interrupt-, trap-, or task-descriptor is less than the CPL.
                  If the segment selector in an interrupt- or trap-gate does not point to a segment descriptor for a code segment.
                  If the segment selector for a TSS has its local/global bit set for local.
                  If a TSS segment descriptor specifies that the TSS is busy or not available.
#SS(0)            If pushing the return address, flags, or error code onto the stack exceeds the bounds of the stack segment and no stack switch occurs.
#SS(selector)     If the SS register is being loaded and the segment pointed to is marked not present.
                  If pushing the return address, flags, error code, or stack segment pointer exceeds the bounds of the new stack segment when a stack switch occurs.
#NP(selector)     If code segment, interrupt-, trap-, or task gate, or TSS is not present.
#TS(selector)     If the RPL of the stack segment selector in the TSS is not equal to the DPL of the code segment being accessed by the interrupt or trap gate.
                  If DPL of the stack segment descriptor pointed to by the stack segment selector in the TSS is not equal to the DPL of the code segment descriptor for the interrupt or trap gate.
                  If the stack segment selector in the TSS is null.
                  If the stack segment for the TSS is not a writable data segment.
                  If segment-selector index for stack segment is outside descriptor table limits.
#PF(fault-code)   If a page fault occurs.
Real-Address Mode Exceptions
#GP               If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                  If the interrupt vector number is outside the IDT limits.
#SS               If stack limit violation on push.
                  If pushing the return address, flags, or error code onto the stack exceeds the bounds of the stack segment.
Virtual-8086 Mode Exceptions #GP(0) (For INT n, INTO, or BOUND instruction) If the IOPL is less than 3 or the DPL of the interrupt-, trap-, or task-gate descriptor is not equal to 3. If the instruction pointer in the IDT or in the interrupt-, trap-, or task gate is beyond the code segment limits. #GP(selector) If the segment selector in the interrupt-, trap-, or task gate is null. If a interrupt-, trap-, or task gate, code segment, or TSS segment selector index is outside its descriptor table limits. If the interrupt vector number is outside the IDT limits. If an IDT descriptor is not an interrupt-, trap-, or task-descriptor. If an interrupt is generated by the INT n instruction and the DPL of an interrupt-, trap-, or task-descriptor is less than the CPL. If the segment selector in an interrupt- or trap-gate does not point to a segment descriptor for a code segment. If the segment selector for a TSS has its local/global bit set for local. #SS(selector) If the SS register is being loaded and the segment pointed to is marked not present. If pushing the return address, flags, error code, stack segment pointer, or data segments exceeds the bounds of the stack segment. #NP(selector) #TS(selector) If code segment, interrupt-, trap-, or task gate, or TSS is not present. If the RPL of the stack segment selector in the TSS is not equal to the DPL of the code segment being accessed by the interrupt or trap gate. If DPL of the stack segment descriptor for the TSSs stack segment is not equal to the DPL of the code segment descriptor for the interrupt or trap gate. If the stack segment selector in the TSS is null. If the stack segment for the TSS is not a writable data segment. If segment-selector index for stack segment is outside descriptor table limits. #PF(fault-code) #BP #OF If a page fault occurs. If the INT 3 instruction is executed. If the INTO instruction is executed and the OF flag is set.
INVD - Invalidate Internal Caches

Opcode   Instruction   Description
0F 08    INVD          Flush internal caches; initiate flushing of external caches.

Description
This instruction invalidates (flushes) the processor's internal caches and issues a special-function bus cycle that directs external caches to also flush themselves. Data held in internal caches is not written back to main memory.

After executing this instruction, the processor does not wait for the external caches to complete their flushing operation before proceeding with instruction execution. It is the responsibility of hardware to respond to the cache flush signal.

The INVD instruction is a privileged instruction. When the processor is running in protected mode, the CPL of a program or procedure must be 0 to execute this instruction.

Use this instruction with care. Data cached internally and not written back to main memory will be lost. Unless there is a specific requirement or benefit to flushing caches without writing back modified cache lines (for example, testing or fault recovery where cache coherency with main memory is not a concern), software should use the WBINVD instruction.

Intel Architecture Compatibility
The INVD instruction is implementation dependent, and its function may be implemented differently on future Intel Architecture processors. This instruction is not supported on Intel Architecture processors earlier than the Intel486 processor.

Operation
Flush(InternalCaches);
SignalFlush(ExternalCaches);
Continue; (* Continue execution *)

Real-Address Mode Exceptions
None.

Virtual-8086 Mode Exceptions
#GP(0)    The INVD instruction cannot be executed in virtual-8086 mode.
INVLPG - Invalidate TLB Entry

Opcode     Instruction   Description
0F 01 /7   INVLPG m      Invalidate TLB entry for page that contains m
Description
This instruction invalidates (flushes) the translation lookaside buffer (TLB) entry specified with the source operand. The source operand is a memory address. The processor determines the page that contains that address and flushes the TLB entry for that page.

The INVLPG instruction is a privileged instruction. When the processor is running in protected mode, the CPL of a program or procedure must be 0 to execute this instruction.

The INVLPG instruction normally flushes the TLB entry only for the specified page; however, in some cases, it flushes the entire TLB. Refer to "MOV - Move to/from Control Registers" in this chapter for further information on operations that flush the TLB.

Intel Architecture Compatibility
The INVLPG instruction is implementation dependent, and its function may be implemented differently on future Intel Architecture processors. This instruction is not supported on Intel Architecture processors earlier than the Intel486 processor.

Operation
Flush(RelevantTLBEntries);
Continue; (* Continue execution *)
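
The phrase "the page that contains that address" can be made concrete with a little arithmetic. The Python sketch below is only an illustration: it assumes 4-KByte pages and 32-bit linear addresses, neither of which this section states, and the names page_base and same_page are hypothetical helpers, not processor operations.

```python
PAGE_SIZE = 4096  # assumption: 4-KByte pages; not stated in this section


def page_base(linear_address, page_size=PAGE_SIZE):
    """Return the base address of the page that contains linear_address."""
    # Clearing the low-order offset bits leaves the page base.
    return linear_address & ~(page_size - 1) & 0xFFFFFFFF


def same_page(a, b, page_size=PAGE_SIZE):
    """True if both addresses fall in one page (one TLB entry in this model)."""
    return page_base(a, page_size) == page_base(b, page_size)


if __name__ == "__main__":
    print(hex(page_base(0x12345678)))
```

Under these assumptions, any operand address inside a page names the same page, so the operand does not need to be page aligned.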
Flags Affected
None.

Protected Mode Exceptions
#GP(0)    If the current privilege level is not 0.
#UD       Operand is a register.

Real-Address Mode Exceptions
#UD       Operand is a register.

Virtual-8086 Mode Exceptions
#GP(0)    The INVLPG instruction cannot be executed in virtual-8086 mode.
IRET/IRETD - Interrupt Return

Opcode   Instruction   Description
CF       IRET          Interrupt return (16-bit operand size)
CF       IRETD         Interrupt return (32-bit operand size)
Description
These instructions return program control from an exception or interrupt handler to a program or procedure that was interrupted by an exception, an external interrupt, or a software-generated interrupt. These instructions are also used to perform a return from a nested task. (A nested task is created when a CALL instruction is used to initiate a task switch or when an interrupt or exception causes a task switch to an interrupt or exception handler.) Refer to Section 6.4., "Task Linking" in Chapter 6, "Task Management" of the Intel Architecture Software Developer's Manual, Volume 3.

IRET and IRETD are mnemonics for the same opcode. The IRETD mnemonic (interrupt return double) is intended for use when returning from an interrupt when using the 32-bit operand size; however, most assemblers use the IRET mnemonic interchangeably for both operand sizes.

In Real-Address Mode, the IRET instruction performs a far return to the interrupted program or procedure. During this operation, the processor pops the return instruction pointer, return code segment selector, and EFLAGS image from the stack to the EIP, CS, and EFLAGS registers, respectively, and then resumes execution of the interrupted program or procedure.

In Protected Mode, the action of the IRET instruction depends on the settings of the NT (nested task) and VM flags in the EFLAGS register and the VM flag in the EFLAGS image stored on the current stack. Depending on the setting of these flags, the processor performs the following types of interrupt returns:
• Return from virtual-8086 mode.
• Return to virtual-8086 mode.
• Intra-privilege level return.
• Inter-privilege level return.
• Return from nested task (task switch).
If the NT flag (EFLAGS register) is cleared, the IRET instruction performs a far return from the interrupt procedure, without a task switch. The code segment being returned to must be equally or less privileged than the interrupt handler routine (as indicated by the RPL field of the code segment selector popped from the stack). As with a real-address mode interrupt return, the IRET instruction pops the return instruction pointer, return code segment selector, and EFLAGS image from the stack to the EIP, CS, and EFLAGS registers, respectively, and then resumes execution of the interrupted program or procedure. If the return is to another privilege level, the IRET instruction also pops the stack pointer and SS from the stack, before resuming program execution. If the return is to virtual-8086 mode, the processor also pops the data segment registers from the stack.

If the NT flag is set, the IRET instruction performs a task switch (return) from a nested task (a task called with a CALL instruction, an interrupt, or an exception) back to the calling or interrupted task. The updated state of the task executing the IRET instruction is saved in its TSS. If the task is re-entered later, the code that follows the IRET instruction is executed.

Operation
IF PE = 0
    THEN
        GOTO REAL-ADDRESS-MODE;
    ELSE
        GOTO PROTECTED-MODE;
FI;

REAL-ADDRESS-MODE:
    IF OperandSize = 32
        THEN
            IF top 12 bytes of stack not within stack limits THEN #SS; FI;
            IF instruction pointer not within code segment limits THEN #GP(0); FI;
            EIP ← Pop();
            CS ← Pop(); (* 32-bit pop, high-order 16 bits discarded *)
            tempEFLAGS ← Pop();
            EFLAGS ← (tempEFLAGS AND 257FD5H) OR (EFLAGS AND 1A0000H);
        ELSE (* OperandSize = 16 *)
            IF top 6 bytes of stack are not within stack limits THEN #SS; FI;
            IF instruction pointer not within code segment limits THEN #GP(0); FI;
            EIP ← Pop();
            EIP ← EIP AND 0000FFFFH;
            CS ← Pop(); (* 16-bit pop *)
            EFLAGS[15:0] ← Pop();
    FI;
END;

PROTECTED-MODE:
    IF VM = 1 (* Virtual-8086 mode: PE=1, VM=1 *)
        THEN
            GOTO RETURN-FROM-VIRTUAL-8086-MODE; (* PE=1, VM=1 *)
    FI;
    IF NT = 1
        THEN
            GOTO TASK-RETURN; (* PE=1, VM=0, NT=1 *)
    FI;
    IF OperandSize = 32
        THEN
            IF top 12 bytes of stack not within stack limits
                THEN #SS(0)

            FI;
            tempEIP ← Pop();
            tempCS ← Pop();
            tempEFLAGS ← Pop();
        ELSE (* OperandSize = 16 *)
            IF top 6 bytes of stack are not within stack limits
                THEN #SS(0);
            FI;
            tempEIP ← Pop();
            tempCS ← Pop();
            tempEFLAGS ← Pop();
            tempEIP ← tempEIP AND FFFFH;
            tempEFLAGS ← tempEFLAGS AND FFFFH;
    FI;
    IF tempEFLAGS(VM) = 1 AND CPL = 0
        THEN
            GOTO RETURN-TO-VIRTUAL-8086-MODE; (* PE=1, VM=1 in EFLAGS image *)
        ELSE
            GOTO PROTECTED-MODE-RETURN; (* PE=1, VM=0 in EFLAGS image *)
    FI;

RETURN-FROM-VIRTUAL-8086-MODE:
(* Processor is in virtual-8086 mode when IRET is executed and stays in virtual-8086 mode *)
    IF IOPL = 3 (* Virtual mode: PE=1, VM=1, IOPL=3 *)
        THEN
            IF OperandSize = 32
                THEN
                    IF top 12 bytes of stack not within stack limits THEN #SS(0); FI;
                    IF instruction pointer not within code segment limits THEN #GP(0); FI;
                    EIP ← Pop();
                    CS ← Pop(); (* 32-bit pop, high-order 16 bits discarded *)
                    EFLAGS ← Pop();
                    (* VM, IOPL, VIP, and VIF EFLAGS bits are not modified by pop *)
                ELSE (* OperandSize = 16 *)
                    IF top 6 bytes of stack are not within stack limits THEN #SS(0); FI;
                    IF instruction pointer not within code segment limits THEN #GP(0); FI;
                    EIP ← Pop();
                    EIP ← EIP AND 0000FFFFH;
                    CS ← Pop(); (* 16-bit pop *)
                    EFLAGS[15:0] ← Pop(); (* IOPL in EFLAGS is not modified by pop *)
            FI;
        ELSE
            #GP(0); (* trap to virtual-8086 monitor: PE=1, VM=1, IOPL<3 *)
    FI;

END;

RETURN-TO-VIRTUAL-8086-MODE:
(* Interrupted procedure was in virtual-8086 mode: PE=1, VM=1 in flags image *)
    IF top 24 bytes of stack are not within stack segment limits
        THEN #SS(0);
    FI;
    IF instruction pointer not within code segment limits
        THEN #GP(0);
    FI;
    CS ← tempCS;
    EIP ← tempEIP;
    EFLAGS ← tempEFLAGS;
    TempESP ← Pop();
    TempSS ← Pop();
    ES ← Pop(); (* pop 2 words; throw away high-order word *)
    DS ← Pop(); (* pop 2 words; throw away high-order word *)
    FS ← Pop(); (* pop 2 words; throw away high-order word *)
    GS ← Pop(); (* pop 2 words; throw away high-order word *)
    SS:ESP ← TempSS:TempESP;
    (* Resume execution in Virtual-8086 mode *)
END;

TASK-RETURN: (* PE=1, VM=0, NT=1 *)
    Read segment selector in link field of current TSS;
    IF local/global bit is set to local
        OR index not within GDT limits
            THEN #GP(TSS selector);
    FI;
    Access TSS for task specified in link field of current TSS;
    IF TSS descriptor type is not TSS or if the TSS is marked not busy
        THEN #GP(TSS selector);
    FI;
    IF TSS not present
        THEN #NP(TSS selector);
    FI;
    SWITCH-TASKS (without nesting) to TSS specified in link field of current TSS;
    Mark the task just abandoned as NOT BUSY;
    IF EIP is not within code segment limit
        THEN #GP(0);
    FI;
END;

PROTECTED-MODE-RETURN: (* PE=1, VM=0 in flags image *)
    IF return code segment selector is null THEN #GP(0); FI;
    IF return code segment selector addresses descriptor beyond descriptor table limit

        THEN #GP(selector); FI;
    Read segment descriptor pointed to by the return code segment selector;
    IF return code segment descriptor is not a code segment THEN #GP(selector); FI;
    IF return code segment selector RPL < CPL THEN #GP(selector); FI;
    IF return code segment descriptor is conforming
        AND return code segment DPL > return code segment selector RPL
            THEN #GP(selector); FI;
    IF return code segment descriptor is not present THEN #NP(selector); FI;
    IF return code segment selector RPL > CPL
        THEN GOTO RETURN-TO-OUTER-PRIVILEGE-LEVEL;
        ELSE GOTO RETURN-TO-SAME-PRIVILEGE-LEVEL;
    FI;
END;

RETURN-TO-SAME-PRIVILEGE-LEVEL: (* PE=1, VM=0 in flags image, RPL=CPL *)
    IF EIP is not within code segment limits THEN #GP(0); FI;
    EIP ← tempEIP;
    CS ← tempCS; (* segment descriptor information also loaded *)
    EFLAGS(CF, PF, AF, ZF, SF, TF, DF, OF, NT) ← tempEFLAGS;
    IF OperandSize = 32
        THEN
            EFLAGS(RF, AC, ID) ← tempEFLAGS;
    FI;
    IF CPL ≤ IOPL
        THEN
            EFLAGS(IF) ← tempEFLAGS;
    FI;
    IF CPL = 0
        THEN
            EFLAGS(IOPL) ← tempEFLAGS;
            IF OperandSize = 32
                THEN EFLAGS(VM, VIF, VIP) ← tempEFLAGS;
            FI;
    FI;
END;

RETURN-TO-OUTER-PRIVILEGE-LEVEL:
    IF OperandSize = 32
        THEN
            IF top 8 bytes on stack are not within limits THEN #SS(0); FI;
        ELSE (* OperandSize = 16 *)
            IF top 4 bytes on stack are not within limits THEN #SS(0); FI;
    FI;
    Read return segment selector;
    IF stack segment selector is null THEN #GP(0); FI;
    IF return stack segment selector index is not within its descriptor table limits

        THEN #GP(SS selector); FI;
    Read segment descriptor pointed to by return segment selector;
    IF stack segment selector RPL ≠ RPL of the return code segment selector
        OR the stack segment descriptor does not indicate a writable data segment
        OR stack segment DPL ≠ RPL of the return code segment selector
            THEN #GP(SS selector);
    FI;
    IF stack segment is not present THEN #SS(SS selector); FI;
    IF tempEIP is not within code segment limit THEN #GP(0); FI;
    EIP ← tempEIP;
    CS ← tempCS;
    EFLAGS(CF, PF, AF, ZF, SF, TF, DF, OF, NT) ← tempEFLAGS;
    IF OperandSize = 32
        THEN
            EFLAGS(RF, AC, ID) ← tempEFLAGS;
    FI;
    IF CPL ≤ IOPL
        THEN
            EFLAGS(IF) ← tempEFLAGS;
    FI;
    IF CPL = 0
        THEN
            EFLAGS(IOPL) ← tempEFLAGS;
            IF OperandSize = 32
                THEN EFLAGS(VM, VIF, VIP) ← tempEFLAGS;
            FI;
    FI;
    CPL ← RPL of the return code segment selector;
    FOR each of segment register (ES, FS, GS, and DS)
        DO;
            IF segment register points to data or non-conforming code segment
                AND CPL > segment descriptor DPL (* stored in hidden part of segment register *)
                    THEN (* segment register invalid *)
                        SegmentSelector ← 0; (* null segment selector *)
            FI;
        OD;
END;

Flags Affected
All the flags and fields in the EFLAGS register are potentially modified, depending on the mode of operation of the processor. If performing a return from a nested task to a previous task, the EFLAGS register will be modified according to the EFLAGS image stored in the previous task's TSS.
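
The 32-bit real-address mode branch of the Operation section merges the popped flags image with the current register through two masks, 257FD5H and 1A0000H. The Python sketch below models only that one assignment. The function name is hypothetical, and the bit names in the comments (VM at bit 17, VIF at bit 19, VIP at bit 20) come from the standard EFLAGS layout rather than from this section.

```python
POPPED_MASK = 0x257FD5     # bits taken from the image popped off the stack
PRESERVED_MASK = 0x1A0000  # bits kept from the current EFLAGS (VM, VIF, VIP)


def real_mode_iretd_eflags(current_eflags, popped_image):
    """Model EFLAGS <- (tempEFLAGS AND 257FD5H) OR (EFLAGS AND 1A0000H)."""
    return (popped_image & POPPED_MASK) | (current_eflags & PRESERVED_MASK)


if __name__ == "__main__":
    # A popped image with every bit set cannot change VM, VIF, or VIP.
    print(hex(real_mode_iretd_eflags(0x00000000, 0xFFFFFFFF)))
```

The two masks share no bits, so each EFLAGS bit has exactly one source: the stack image, the current register, or neither (the bit is cleared).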

Protected Mode Exceptions
#GP(0)           If the return code or stack segment selector is null.
                 If the return instruction pointer is not within the return code segment limit.
#GP(selector)    If a segment selector index is outside its descriptor table limits.
                 If the return code segment selector RPL is greater than the CPL.
                 If the DPL of a conforming-code segment is greater than the return code segment selector RPL.
                 If the DPL for a nonconforming-code segment is not equal to the RPL of the code segment selector.
                 If the stack segment descriptor DPL is not equal to the RPL of the return code segment selector.
                 If the stack segment is not a writable data segment.
                 If the stack segment selector RPL is not equal to the RPL of the return code segment selector.
                 If the segment descriptor for a code segment does not indicate it is a code segment.
                 If the segment selector for a TSS has its local/global bit set for local.
                 If a TSS segment descriptor specifies that the TSS is busy or not available.
#SS(0)           If the top bytes of stack are not within stack limits.
#NP(selector)    If the return code or stack segment is not present.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If an unaligned memory reference occurs when the CPL is 3 and alignment checking is enabled.

Real-Address Mode Exceptions
#GP              If the return instruction pointer is not within the return code segment limit.
#SS              If the top bytes of stack are not within stack limits.

Virtual-8086 Mode Exceptions
#GP(0)           If the return instruction pointer is not within the return code segment limit.
                 If IOPL is not equal to 3.
#PF(fault-code)  If a page fault occurs.
#SS(0)           If the top bytes of stack are not within stack limits.
#AC(0)           If an unaligned memory reference occurs and alignment checking is enabled.
Jcc - Jump if Condition Is Met

Opcode        Instruction     Description
77 cb         JA rel8         Jump short if above (CF=0 and ZF=0)
73 cb         JAE rel8        Jump short if above or equal (CF=0)
72 cb         JB rel8         Jump short if below (CF=1)
76 cb         JBE rel8        Jump short if below or equal (CF=1 or ZF=1)
72 cb         JC rel8         Jump short if carry (CF=1)
E3 cb         JCXZ rel8       Jump short if CX register is 0
E3 cb         JECXZ rel8      Jump short if ECX register is 0
74 cb         JE rel8         Jump short if equal (ZF=1)
7F cb         JG rel8         Jump short if greater (ZF=0 and SF=OF)
7D cb         JGE rel8        Jump short if greater or equal (SF=OF)
7C cb         JL rel8         Jump short if less (SF<>OF)
7E cb         JLE rel8        Jump short if less or equal (ZF=1 or SF<>OF)
76 cb         JNA rel8        Jump short if not above (CF=1 or ZF=1)
72 cb         JNAE rel8       Jump short if not above or equal (CF=1)
73 cb         JNB rel8        Jump short if not below (CF=0)
77 cb         JNBE rel8       Jump short if not below or equal (CF=0 and ZF=0)
73 cb         JNC rel8        Jump short if not carry (CF=0)
75 cb         JNE rel8        Jump short if not equal (ZF=0)
7E cb         JNG rel8        Jump short if not greater (ZF=1 or SF<>OF)
7C cb         JNGE rel8       Jump short if not greater or equal (SF<>OF)
7D cb         JNL rel8        Jump short if not less (SF=OF)
7F cb         JNLE rel8       Jump short if not less or equal (ZF=0 and SF=OF)
71 cb         JNO rel8        Jump short if not overflow (OF=0)
7B cb         JNP rel8        Jump short if not parity (PF=0)
79 cb         JNS rel8        Jump short if not sign (SF=0)
75 cb         JNZ rel8        Jump short if not zero (ZF=0)
70 cb         JO rel8         Jump short if overflow (OF=1)
7A cb         JP rel8         Jump short if parity (PF=1)
7A cb         JPE rel8        Jump short if parity even (PF=1)
7B cb         JPO rel8        Jump short if parity odd (PF=0)
78 cb         JS rel8         Jump short if sign (SF=1)
74 cb         JZ rel8         Jump short if zero (ZF=1)
0F 87 cw/cd   JA rel16/32     Jump near if above (CF=0 and ZF=0)
0F 83 cw/cd   JAE rel16/32    Jump near if above or equal (CF=0)
0F 82 cw/cd   JB rel16/32     Jump near if below (CF=1)
0F 86 cw/cd   JBE rel16/32    Jump near if below or equal (CF=1 or ZF=1)
0F 82 cw/cd   JC rel16/32     Jump near if carry (CF=1)

Opcode        Instruction     Description
0F 84 cw/cd   JE rel16/32     Jump near if equal (ZF=1)
0F 8F cw/cd   JG rel16/32     Jump near if greater (ZF=0 and SF=OF)
0F 8D cw/cd   JGE rel16/32    Jump near if greater or equal (SF=OF)
0F 8C cw/cd   JL rel16/32     Jump near if less (SF<>OF)
0F 8E cw/cd   JLE rel16/32    Jump near if less or equal (ZF=1 or SF<>OF)
0F 86 cw/cd   JNA rel16/32    Jump near if not above (CF=1 or ZF=1)
0F 82 cw/cd   JNAE rel16/32   Jump near if not above or equal (CF=1)
0F 83 cw/cd   JNB rel16/32    Jump near if not below (CF=0)
0F 87 cw/cd   JNBE rel16/32   Jump near if not below or equal (CF=0 and ZF=0)
0F 83 cw/cd   JNC rel16/32    Jump near if not carry (CF=0)
0F 85 cw/cd   JNE rel16/32    Jump near if not equal (ZF=0)
0F 8E cw/cd   JNG rel16/32    Jump near if not greater (ZF=1 or SF<>OF)
0F 8C cw/cd   JNGE rel16/32   Jump near if not greater or equal (SF<>OF)
0F 8D cw/cd   JNL rel16/32    Jump near if not less (SF=OF)
0F 8F cw/cd   JNLE rel16/32   Jump near if not less or equal (ZF=0 and SF=OF)
0F 81 cw/cd   JNO rel16/32    Jump near if not overflow (OF=0)
0F 8B cw/cd   JNP rel16/32    Jump near if not parity (PF=0)
0F 89 cw/cd   JNS rel16/32    Jump near if not sign (SF=0)
0F 85 cw/cd   JNZ rel16/32    Jump near if not zero (ZF=0)
0F 80 cw/cd   JO rel16/32     Jump near if overflow (OF=1)
0F 8A cw/cd   JP rel16/32     Jump near if parity (PF=1)
0F 8A cw/cd   JPE rel16/32    Jump near if parity even (PF=1)
0F 8B cw/cd   JPO rel16/32    Jump near if parity odd (PF=0)
0F 88 cw/cd   JS rel16/32     Jump near if sign (SF=1)
0F 84 cw/cd   JZ rel16/32     Jump near if zero (ZF=1)
Description
This instruction checks the state of one or more of the status flags in the EFLAGS register (CF, OF, PF, SF, and ZF) and, if the flags are in the specified state (condition), performs a jump to the target instruction specified by the destination operand. A condition code (cc) is associated with each instruction to indicate the condition being tested for. If the condition is not satisfied, the jump is not performed and execution continues with the instruction following the Jcc instruction.
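
The flag conditions in the opcode table can be written out as predicates. The Python sketch below is an illustrative model, not processor behavior: the names jump_taken and flags_after_cmp8 are hypothetical, and the 8-bit CMP helper is an assumption added only to produce realistic flag values. JCXZ and JECXZ are left out because they test a register rather than the flags.

```python
# Flag conditions copied from the Description column of the opcode table.
CONDITIONS = {
    "JA":  lambda f: f["CF"] == 0 and f["ZF"] == 0,
    "JAE": lambda f: f["CF"] == 0,
    "JB":  lambda f: f["CF"] == 1,
    "JBE": lambda f: f["CF"] == 1 or f["ZF"] == 1,
    "JE":  lambda f: f["ZF"] == 1,
    "JNE": lambda f: f["ZF"] == 0,
    "JG":  lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],
    "JGE": lambda f: f["SF"] == f["OF"],
    "JL":  lambda f: f["SF"] != f["OF"],
    "JLE": lambda f: f["ZF"] == 1 or f["SF"] != f["OF"],
    "JO":  lambda f: f["OF"] == 1,
    "JNO": lambda f: f["OF"] == 0,
    "JP":  lambda f: f["PF"] == 1,
    "JNP": lambda f: f["PF"] == 0,
    "JS":  lambda f: f["SF"] == 1,
    "JNS": lambda f: f["SF"] == 0,
}

# Alternate mnemonics that share an opcode with an entry above.
ALIASES = {
    "JNBE": "JA", "JNB": "JAE", "JNC": "JAE", "JC": "JB", "JNAE": "JB",
    "JNA": "JBE", "JZ": "JE", "JNZ": "JNE", "JNLE": "JG", "JNL": "JGE",
    "JNGE": "JL", "JNG": "JLE", "JPE": "JP", "JPO": "JNP",
}


def jump_taken(mnemonic, flags):
    """Return True if the named Jcc would jump for the given flag values."""
    name = mnemonic.upper()
    return bool(CONDITIONS[ALIASES.get(name, name)](flags))


def flags_after_cmp8(a, b):
    """Hypothetical helper: status flags after an 8-bit CMP a, b."""
    a &= 0xFF
    b &= 0xFF
    result = (a - b) & 0xFF
    return {
        "CF": int(a < b),                         # unsigned borrow
        "ZF": int(result == 0),
        "SF": result >> 7,
        "OF": ((a ^ b) & (a ^ result)) >> 7,      # signed overflow
        "PF": int(bin(result).count("1") % 2 == 0),
    }


if __name__ == "__main__":
    flags = flags_after_cmp8(0xFF, 0x01)
    print(jump_taken("JA", flags), jump_taken("JL", flags))
```

Comparing FFH with 01H makes both JA and JL jump in this model: as unsigned values 255 is above 1, while as signed values -1 is less than 1.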

The target instruction is specified with a relative offset (a signed offset relative to the current value of the instruction pointer in the EIP register). A relative offset (rel8, rel16, or rel32) is generally specified as a label in assembly code, but at the machine code level, it is encoded as a signed 8-, 16-, or 32-bit immediate value, which is added to the instruction pointer. Instruction coding is most efficient for offsets of -128 to +127. If the operand-size attribute is 16, the upper two bytes of the EIP register are cleared to 0s, resulting in a maximum instruction pointer size of 16 bits.

The conditions for each Jcc mnemonic are given in the "Description" column of the opcode table above. The terms "less" and "greater" are used for comparisons of signed integers and the terms "above" and "below" are used for unsigned integers.

Because a particular state of the status flags can sometimes be interpreted in two ways, two mnemonics are defined for some opcodes. For example, the JA (jump if above) instruction and the JNBE (jump if not below or equal) instruction are alternate mnemonics for the opcode 77H.

The Jcc instruction does not support far jumps (jumps to other code segments). When the target for the conditional jump is in a different segment, use the opposite condition from the condition being tested for the Jcc instruction, and then access the target with an unconditional far jump (JMP instruction) to the other segment. For example, the following conditional far jump is illegal:
JZ FARLABEL;
To accomplish this far jump, use the following two instructions:

JNZ BEYOND;
JMP FARLABEL;
BEYOND:
The JECXZ and JCXZ instructions differ from the other Jcc instructions because they do not check the status flags. Instead they check the contents of the ECX and CX registers, respectively, for 0. Either the CX or ECX register is chosen according to the address-size attribute. These instructions are useful at the beginning of a conditional loop that terminates with a conditional loop instruction (such as LOOPNE). They prevent entering the loop when the ECX or CX register is equal to 0, which would cause the loop to execute 2^32 or 64K times, respectively, instead of zero times.

All conditional jumps are converted to code fetches of one or two cache lines, regardless of jump address or cacheability.
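
The 64K figure follows from the count register wrapping around. The Python sketch below models a loop closed by a plain LOOP instruction, assuming that LOOP decrements the count register and exits when it reaches 0. That behavior is not described in this section, and the ZF test that LOOPNE adds is omitted. The function name and parameters are hypothetical.

```python
def loop_iterations(count, guard_with_jcxz, width_bits=16):
    """Count how often the loop body runs for a starting CX/ECX value."""
    mask = (1 << width_bits) - 1
    count &= mask
    if guard_with_jcxz and count == 0:
        return 0                      # JCXZ/JECXZ jumps past the loop
    iterations = 0
    while True:
        iterations += 1               # loop body executes
        count = (count - 1) & mask    # LOOP decrements first (assumed)...
        if count == 0:                # ...and falls through at zero
            break
    return iterations


if __name__ == "__main__":
    print(loop_iterations(0, guard_with_jcxz=False))  # 16-bit CX wraps
    print(loop_iterations(0, guard_with_jcxz=True))
```

With the default 16-bit width, a starting count of 0 without the guard wraps to FFFFH on the first decrement and the body runs 65536 times. The same reasoning with a 32-bit counter gives 2^32.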

Operation
IF condition
    THEN
        EIP ← EIP + SignExtend(DEST);
        IF OperandSize = 16
            THEN
                EIP ← EIP AND 0000FFFFH;
        FI;
FI;
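
The sign extension and 16-bit masking in the Operation above can be traced with a short Python sketch. It assumes that EIP already holds the address of the instruction after the Jcc when the displacement is added, which is what the JMP description states for relative jumps. The names sign_extend and jcc_target are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def jcc_target(next_eip, displacement, disp_bits, operand_size=32):
    """Model EIP <- EIP + SignExtend(DEST), masked when OperandSize = 16."""
    target = (next_eip + sign_extend(displacement, disp_bits)) & 0xFFFFFFFF
    if operand_size == 16:
        target &= 0x0000FFFF          # upper two bytes of EIP cleared
    return target


if __name__ == "__main__":
    # A 2-byte short jump at 1000H with displacement FEH (-2) targets itself.
    print(hex(jcc_target(0x1002, 0xFE, 8)))
```

A displacement byte of FEH is -2, so a two-byte short jump at 1000H (next instruction at 1002H) lands back on itself in this model.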
Flags Affected
None.

Protected Mode Exceptions
#GP(0)    If the offset being jumped to is beyond the limits of the CS segment.

Real-Address Mode Exceptions
#GP       If the offset being jumped to is beyond the limits of the CS segment or is outside of the effective address space from 0 to FFFFH. This condition can occur if a 32-bit address-size override prefix is used.

Virtual-8086 Mode Exceptions
#GP(0)    If the offset being jumped to is beyond the limits of the CS segment or is outside of the effective address space from 0 to FFFFH. This condition can occur if a 32-bit address-size override prefix is used.
JMP - Jump

Opcode   Instruction     Description
EB cb    JMP rel8        Jump short, relative, displacement relative to next instruction
E9 cw    JMP rel16       Jump near, relative, displacement relative to next instruction
E9 cd    JMP rel32       Jump near, relative, displacement relative to next instruction
FF /4    JMP r/m16       Jump near, absolute indirect, address given in r/m16
FF /4    JMP r/m32       Jump near, absolute indirect, address given in r/m32
EA cd    JMP ptr16:16    Jump far, absolute, address given in operand
EA cp    JMP ptr16:32    Jump far, absolute, address given in operand
FF /5    JMP m16:16      Jump far, absolute indirect, address given in m16:16
FF /5    JMP m16:32      Jump far, absolute indirect, address given in m16:32
Description
This instruction transfers program control to a different point in the instruction stream without recording return information. The destination (target) operand specifies the address of the instruction being jumped to. This operand can be an immediate value, a general-purpose register, or a memory location.

This instruction can be used to execute four different types of jumps:

• Near jump - A jump to an instruction within the current code segment (the segment currently pointed to by the CS register), sometimes referred to as an intrasegment jump.
• Short jump - A near jump where the jump range is limited to -128 to +127 from the current EIP value.
• Far jump - A jump to an instruction located in a different segment than the current code segment but at the same privilege level, sometimes referred to as an intersegment jump.
• Task switch - A jump to an instruction located in a different task.
A task switch can only be executed in protected mode. Refer to Chapter 6, "Task Management", of the Intel Architecture Software Developer's Manual, Volume 3, for information on performing task switches with the JMP instruction.

Near and Short Jumps. When executing a near jump, the processor jumps to the address (within the current code segment) that is specified with the target operand. The target operand specifies either an absolute offset (that is, an offset from the base of the code segment) or a relative offset (a signed displacement relative to the current value of the instruction pointer in the EIP register). A near jump to a relative offset of 8 bits (rel8) is referred to as a short jump. The CS register is not changed on near and short jumps.

An absolute offset is specified indirectly in a general-purpose register or a memory location (r/m16 or r/m32). The operand-size attribute determines the size of the target operand (16 or 32 bits). Absolute offsets are loaded directly into the EIP register. If the operand-size attribute is 16, the upper two bytes of the EIP register are cleared to 0s, resulting in a maximum instruction pointer size of 16 bits.
A relative offset (rel8, rel16, or rel32) is generally specified as a label in assembly code, but at the machine code level, it is encoded as a signed 8-, 16-, or 32-bit immediate value. This value is added to the value in the EIP register. (Here, the EIP register contains the address of the instruction following the JMP instruction.) When using relative offsets, the opcode (for short vs. near jumps) and the operand-size attribute (for near relative jumps) determine the size of the target operand (8, 16, or 32 bits).

Far Jumps in Real-Address or Virtual-8086 Mode. When executing a far jump in real-address or virtual-8086 mode, the processor jumps to the code segment and offset specified with the target operand. Here the target operand specifies an absolute far address either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32). With the pointer method, the segment and offset of the jump target are encoded in the instruction, using a 4-byte (16-bit operand size) or 6-byte (32-bit operand size) far address immediate. With the indirect method, the target operand specifies a memory location that contains a 4-byte (16-bit operand size) or 6-byte (32-bit operand size) far address. The far address is loaded directly into the CS and EIP registers. If the operand-size attribute is 16, the upper two bytes of the EIP register are cleared to 0s.

Far Jumps in Protected Mode. When the processor is operating in protected mode, the JMP instruction can be used to perform the following three types of far jumps:
• A far jump to a conforming or non-conforming code segment.
• A far jump through a call gate.
• A task switch.
(The JMP instruction cannot be used to perform interprivilege level far jumps.)

In protected mode, the processor always uses the segment selector part of the far address to access the corresponding descriptor in the GDT or LDT. The descriptor type (code segment, call gate, task gate, or TSS) and access rights determine the type of jump to be performed.

If the selected descriptor is for a code segment, a far jump to a code segment at the same privilege level is performed. (If the selected code segment is at a different privilege level and the code segment is non-conforming, a general-protection exception is generated.) A far jump to the same privilege level in protected mode is very similar to one carried out in real-address or virtual-8086 mode. The target operand specifies an absolute far address either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32). The operand-size attribute determines the size of the offset (16 or 32 bits) in the far address. The new code segment selector and its descriptor are loaded into the CS register, and the offset from the instruction is loaded into the EIP register. Note that a call gate (described in the next paragraph) can also be used to perform a far jump to a code segment at the same privilege level. Using this mechanism provides an extra level of indirection and is the preferred method of making jumps between 16-bit and 32-bit code segments.
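
The way a selector picks a descriptor from the GDT or LDT can be sketched by splitting the 16-bit selector into its fields. The field layout used below (index in bits 15 through 3, table indicator in bit 2, RPL in bits 1 and 0) is the standard Intel Architecture selector format described in Volume 3, not something stated in this section. The Python function names are hypothetical.

```python
def decode_selector(selector):
    """Split a 16-bit segment selector into index, table, and RPL."""
    selector &= 0xFFFF
    return {
        "index": selector >> 3,                      # descriptor table index
        "table": "LDT" if selector & 0x4 else "GDT", # table indicator bit
        "rpl": selector & 0x3,                       # requested privilege level
    }


def is_null_selector(selector):
    """A selector with index 0 in the GDT is null, whatever its RPL."""
    return (selector & 0xFFFC) == 0


if __name__ == "__main__":
    print(decode_selector(0x001B))
    print(is_null_selector(0x0003))
```

The null test matters for the "segment selector in target operand null" check that appears in the Operation section and in the #GP(0) exception conditions.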
When executing a far jump through a call gate, the segment selector specified by the target operand identifies the call gate. (The offset part of the target operand is ignored.) The processor then jumps to the code segment specified in the call gate descriptor and begins executing the instruction at the offset specified in the call gate. No stack switch occurs. Here again, the target operand can specify the far address of the call gate either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32).

Executing a task switch with the JMP instruction is somewhat similar to executing a jump through a call gate. Here the target operand specifies the segment selector of the task gate for the task being switched to (and the offset part of the target operand is ignored). The task gate in turn points to the TSS for the task, which contains the segment selectors for the task's code and stack segments. The TSS also contains the EIP value for the next instruction that was to be executed before the task was suspended. This instruction pointer value is loaded into the EIP register so that the task begins executing again at this next instruction.

The JMP instruction can also specify the segment selector of the TSS directly, which eliminates the indirection of the task gate. Refer to Chapter 6, "Task Management", of the Intel Architecture Software Developer's Manual, Volume 3, for detailed information on the mechanics of a task switch.

Note that when you execute a task switch with a JMP instruction, the nested task flag (NT) is not set in the EFLAGS register and the new TSS's previous task link field is not loaded with the old task's TSS selector. A return to the previous task can thus not be carried out by executing the IRET instruction. Switching tasks with the JMP instruction differs in this regard from the CALL instruction, which does set the NT flag and save the previous task link information, allowing a return to the calling task with an IRET instruction.
Operation
IF near jump
    THEN IF near relative jump
        THEN
            tempEIP ← EIP + DEST; (* EIP is instruction following JMP instruction *)
        ELSE (* near absolute jump *)
            tempEIP ← DEST;
    FI;
    IF tempEIP is beyond code segment limit THEN #GP(0); FI;
    IF OperandSize = 32
        THEN
            EIP ← tempEIP;
        ELSE (* OperandSize = 16 *)
            EIP ← tempEIP AND 0000FFFFH;
    FI;
FI;
IF far jump AND (PE = 0 OR (PE = 1 AND VM = 1)) (* real-address or virtual-8086 mode *)
    THEN
        tempEIP ← DEST(offset); (* DEST is ptr16:32 or [m16:32] *)
        IF tempEIP is beyond code segment limit THEN #GP(0); FI;
        CS ← DEST(segment selector); (* DEST is ptr16:32 or [m16:32] *)
        IF OperandSize = 32
            THEN
                EIP ← tempEIP; (* DEST is ptr16:32 or [m16:32] *)
            ELSE (* OperandSize = 16 *)
                EIP ← tempEIP AND 0000FFFFH; (* clear upper 16 bits *)
        FI;
FI;
IF far jump AND (PE = 1 AND VM = 0) (* Protected mode, not virtual-8086 mode *)
    THEN
        IF effective address in the CS, DS, ES, FS, GS, or SS segment is illegal
            OR segment selector in target operand null
                THEN #GP(0);
        FI;
        IF segment selector index not within descriptor table limits
            THEN #GP(new selector);
        FI;
        Read type and access rights of segment descriptor;
        IF segment type is not a conforming or nonconforming code segment, call gate,
            task gate, or TSS THEN #GP(segment selector); FI;
        Depending on type and access rights
            GO TO CONFORMING-CODE-SEGMENT;
            GO TO NONCONFORMING-CODE-SEGMENT;
            GO TO CALL-GATE;
            GO TO TASK-GATE;
            GO TO TASK-STATE-SEGMENT;
    ELSE
        #GP(segment selector);
FI;

CONFORMING-CODE-SEGMENT:
    IF DPL > CPL THEN #GP(segment selector); FI;
    IF segment not present THEN #NP(segment selector); FI;
    tempEIP ← DEST(offset);
    IF OperandSize = 16
        THEN tempEIP ← tempEIP AND 0000FFFFH;
    FI;
    IF tempEIP not in code segment limit THEN #GP(0); FI;
    CS ← DEST(SegmentSelector); (* segment descriptor information also loaded *)
    CS(RPL) ← CPL;
    EIP ← tempEIP;
END;

NONCONFORMING-CODE-SEGMENT:
    IF (RPL > CPL) OR (DPL ≠ CPL) THEN #GP(code segment selector); FI;
    IF segment not present THEN #NP(segment selector); FI;
    IF instruction pointer outside code segment limit THEN #GP(0); FI;
    tempEIP ← DEST(offset);
    IF OperandSize = 16
        THEN tempEIP ← tempEIP AND 0000FFFFH;
    FI;
    IF tempEIP not in code segment limit THEN #GP(0); FI;
    CS ← DEST(SegmentSelector); (* segment descriptor information also loaded *)
    CS(RPL) ← CPL;
    EIP ← tempEIP;
END;

CALL-GATE:
    IF call gate DPL < CPL
        OR call gate DPL < call gate segment-selector RPL
            THEN #GP(call gate selector); FI;
    IF call gate not present THEN #NP(call gate selector); FI;
    IF call gate code-segment selector is null THEN #GP(0); FI;
    IF call gate code-segment selector index is outside descriptor table limits
        THEN #GP(code segment selector); FI;
    Read code segment descriptor;
3-337
JMPJump (Continued)
IF code-segment segment descriptor does not indicate a code segment OR code-segment segment descriptor is conforming and DPL > CPL OR code-segment segment descriptor is non-conforming and DPL CPL THEN #GP(code segment selector); FI; IF code segment is not present THEN #NP(code-segment selector); FI; IF instruction pointer is not within code-segment limit THEN #GP(0); FI; tempEIP DEST(offset); IF GateSize=16 THEN tempEIP tempEIP AND 0000FFFFH; FI; IF tempEIP not in code segment limit THEN #GP(0); FI; CS DEST(SegmentSelector); (* segment descriptor information also loaded *) CS(RPL) CPL EIP tempEIP; END; TASK-GATE: IF task gate DPL < CPL OR task gate DPL < task gate segment-selector RPL THEN #GP(task gate selector); FI; IF task gate not present THEN #NP(gate selector); FI; Read the TSS segment selector in the task-gate descriptor; IF TSS segment selector local/global bit is set to local OR index not within GDT limits OR TSS descriptor specifies that the TSS is busy THEN #GP(TSS selector); FI; IF TSS not present THEN #NP(TSS selector); FI; SWITCH-TASKS to TSS; IF EIP not within code segment limit THEN #GP(0); FI; END; TASK-STATE-SEGMENT: IF TSS DPL < CPL OR TSS DPL < TSS segment-selector RPL OR TSS descriptor indicates TSS not available THEN #GP(TSS selector); FI; IF TSS is not present THEN #NP(TSS selector); FI; SWITCH-TASKS to TSS IF EIP not within code segment limit THEN #GP(0); FI; END;
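The near-jump branch of the pseudocode above can be modeled in a few lines. The following Python sketch is illustrative only: the function name, its parameters, and the use of ValueError to stand in for #GP(0) are assumptions made for this example, and it follows the pseudocode order literally (limit check before the 16-bit mask).

```python
def near_jump_target(eip_next, dest, operand_size, relative, cs_limit):
    """Model the near JMP branch of the pseudocode (hypothetical helper).

    eip_next  -- address of the instruction following the JMP
    dest      -- signed displacement (relative) or absolute offset
    cs_limit  -- highest valid offset in the code segment
    """
    if relative:
        # tempEIP <- EIP + DEST, wrapped to 32 bits
        temp_eip = (eip_next + dest) & 0xFFFFFFFF
    else:
        # near absolute jump: tempEIP <- DEST
        temp_eip = dest & 0xFFFFFFFF
    if temp_eip > cs_limit:
        # stands in for the #GP(0) exception
        raise ValueError("#GP(0): target beyond code segment limit")
    if operand_size == 16:
        # EIP <- tempEIP AND 0000FFFFH
        temp_eip &= 0x0000FFFF
    return temp_eip
```

A backward jump is expressed with a negative displacement, for example near_jump_target(0x1000, -0x10, 32, True, 0xFFFF).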
Flags Affected All flags are affected if a task switch occurs; no flags are affected if a task switch does not occur.
Protected Mode Exceptions

#GP(0)            If offset in target operand, call gate, or TSS is beyond the code segment limits.
                  If the segment selector in the destination operand, call gate, task gate, or TSS is null.
                  If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                  If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
#GP(selector)     If segment selector index is outside descriptor table limits.
                  If the segment descriptor pointed to by the segment selector in the destination operand is not for a conforming-code segment, nonconforming-code segment, call gate, task gate, or task state segment.
                  If the DPL for a nonconforming-code segment is not equal to the CPL. (When not using a call gate.)
                  If the RPL for the segment's segment selector is greater than the CPL.
                  If the DPL for a conforming-code segment is greater than the CPL.
                  If the DPL from a call-gate, task-gate, or TSS segment descriptor is less than the CPL or than the RPL of the call-gate, task-gate, or TSS's segment selector.
                  If the segment descriptor for selector in a call gate does not indicate it is a code segment.
                  If the segment descriptor for the segment selector in a task gate does not indicate available TSS.
                  If the segment selector for a TSS has its local/global bit set for local.
                  If a TSS segment descriptor specifies that the TSS is busy or not available.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#NP(selector)     If the code segment being accessed is not present.
                  If call gate, task gate, or TSS not present.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3. (Only occurs when fetching target from memory.)

Real-Address Mode Exceptions

#GP               If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS               If a memory operand effective address is outside the SS segment limit.

Virtual-8086 Mode Exceptions

#GP(0)            If the target operand is beyond the code segment limits.
                  If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made. (Only occurs when fetching target from memory.)
LAHF - Load Status Flags into AH Register

Opcode    Instruction    Description
9F        LAHF           Load: AH = EFLAGS(SF:ZF:0:AF:0:PF:1:CF)

Description
This instruction moves the low byte of the EFLAGS register (which includes status flags SF, ZF, AF, PF, and CF) to the AH register. Reserved bits 1, 3, and 5 of the EFLAGS register are set in the AH register as shown in the Operation section below.

Operation
AH ← EFLAGS(SF:ZF:0:AF:0:PF:1:CF);

Flags Affected
None (that is, the state of the flags in the EFLAGS register is not affected).

Exceptions (All Operating Modes)
None.
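As an illustration of the bit layout SF:ZF:0:AF:0:PF:1:CF, the following Python sketch packs five flag values into the byte that LAHF would place in AH. The function name and its arguments are hypothetical; each flag is assumed to be passed as 0 or 1.

```python
def lahf(sf, zf, af, pf, cf):
    """Build the AH value in the order SF:ZF:0:AF:0:PF:1:CF (bit 7 down to bit 0)."""
    return (
        (sf << 7)    # bit 7: sign flag
        | (zf << 6)  # bit 6: zero flag
        | (0 << 5)   # bit 5: reserved, reads as 0
        | (af << 4)  # bit 4: auxiliary carry flag
        | (0 << 3)   # bit 3: reserved, reads as 0
        | (pf << 2)  # bit 2: parity flag
        | (1 << 1)   # bit 1: reserved, reads as 1
        | cf         # bit 0: carry flag
    )
```

With every status flag clear the result is 02H, because reserved bit 1 always reads as 1.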
LAR - Load Access Rights Byte

Opcode      Instruction      Description
0F 02 /r    LAR r16,r/m16    r16 ← r/m16 masked by FF00H
0F 02 /r    LAR r32,r/m32    r32 ← r/m32 masked by 00FxFF00H
Description This instruction loads the access rights from the segment descriptor specified by the second operand (source operand) into the first operand (destination operand) and sets the ZF flag in the EFLAGS register. The source operand (which can be a register or a memory location) contains the segment selector for the segment descriptor being accessed. The destination operand is a general-purpose register. The processor performs access checks as part of the loading process. Once loaded in the destination register, software can perform additional checks on the access rights information. When the operand size is 32 bits, the access rights for a segment descriptor include the type and DPL fields and the S, P, AVL, D/B, and G flags, all of which are located in the second doubleword (bytes 4 through 7) of the segment descriptor. The doubleword is masked by 00FXFF00H before it is loaded into the destination operand. When the operand size is 16 bits, the access rights include the type and DPL fields. Here, the two lower-order bytes of the doubleword are masked by FF00H before being loaded into the destination operand. This instruction performs the following checks before it loads the access rights in the destination register:
- Checks that the segment selector is not null.
- Checks that the segment selector points to a descriptor that is within the limits of the GDT or LDT being accessed.
- Checks that the descriptor type is valid for this instruction. All code and data segment descriptors are valid for (can be accessed with) the LAR instruction. The valid system segment and gate descriptor types are given in the following table.
- If the segment is not a conforming code segment, it checks that the specified segment descriptor is visible at the CPL (that is, if the CPL and the RPL of the segment selector are less than or equal to the DPL of the segment descriptor).
If the segment descriptor cannot be accessed or is an invalid type for the instruction, the ZF flag is cleared and no access rights are loaded in the destination operand. The LAR instruction can only be executed in protected mode.
Type    Name                        Valid
0       Reserved                    No
1       Available 16-bit TSS        Yes
2       LDT                         Yes
3       Busy 16-bit TSS             Yes
4       16-bit call gate            Yes
5       16-bit/32-bit task gate     Yes
6       16-bit interrupt gate       No
7       16-bit trap gate            No
8       Reserved                    No
9       Available 32-bit TSS        Yes
A       Reserved                    No
B       Busy 32-bit TSS             Yes
C       32-bit call gate            Yes
D       Reserved                    No
E       32-bit interrupt gate       No
F       32-bit trap gate            No

Operation
IF SRC(Offset) > descriptor table limit THEN ZF ← 0; FI;
Read segment descriptor;
IF SegmentDescriptor(Type) ≠ conforming code segment
    AND (CPL > DPL) OR (RPL > DPL)
    OR Segment type is not valid for instruction
    THEN
        ZF ← 0
    ELSE
        IF OperandSize = 32
            THEN
                DEST ← [SRC] AND 00FxFF00H;
            ELSE (* OperandSize = 16 *)
                DEST ← [SRC] AND FF00H;
        FI;
FI;
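The masking step can be sketched in Python as follows. This is a hedged model, not the processor's implementation: the function names are hypothetical, the "x" nibble of 00FxFF00H is assumed to read as zero, and the field positions are those of the second doubleword of a segment descriptor (type in bits 11-8, S in bit 12, DPL in bits 14-13, P in bit 15, AVL in bit 20, D/B in bit 22, G in bit 23).

```python
def lar(descriptor_high_dword, operand_size):
    """Return the access-rights value LAR would load (sketch)."""
    if operand_size == 32:
        # 00FxFF00H with the undefined 'x' nibble assumed to be 0
        return descriptor_high_dword & 0x00F0FF00
    # 16-bit operand size: only the type, S, DPL, and P byte survives
    return descriptor_high_dword & 0xFF00


def decode_access_rights(value):
    """Split a 32-bit LAR result into named fields (hypothetical helper)."""
    return {
        "type": (value >> 8) & 0xF,
        "s": (value >> 12) & 1,
        "dpl": (value >> 13) & 3,
        "p": (value >> 15) & 1,
        "avl": (value >> 20) & 1,
        "db": (value >> 22) & 1,
        "g": (value >> 23) & 1,
    }
```

For example, 00CF9A00H is the second doubleword of a typical flat 32-bit code segment descriptor; decode_access_rights(lar(0x00CF9A00, 32)) reports type 0xA with P, S, D/B, and G set and DPL 0.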
Flags Affected
The ZF flag is set to 1 if the access rights are loaded successfully; otherwise, it is cleared to 0.

Protected Mode Exceptions

#GP(0)            If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                  If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3. (Only occurs when fetching target from memory.)
Real-Address Mode Exceptions #UD The LAR instruction is not recognized in real-address mode.
Virtual-8086 Mode Exceptions #UD The LAR instruction cannot be executed in virtual-8086 mode.
LDMXCSR - Load Streaming SIMD Extension Control/Status

Opcode      Instruction    Description
0F,AE,/2    LDMXCSR m32    Load Streaming SIMD Extension control/status word from m32.

Description
The MXCSR control/status register is used to enable masked/unmasked exception handling, to set rounding modes, to set flush-to-zero mode, and to view exception status flags. The following figure shows the format and encoding of the fields in MXCSR:

Bits     Field
31-16    Reserved
15       FZ
14-13    RC
12       PM
11       UM
10       OM
9        ZM
8        DM
7        IM
6        Reserved
5        PE
4        UE
3        OE
2        ZE
1        DE
0        IE
The default MXCSR value at reset is 0x1f80. Bits 5-0 indicate whether a Streaming SIMD Extension numerical exception has been detected. They are sticky flags, and can be cleared by using the LDMXCSR instruction to write zeroes to these fields. If an LDMXCSR instruction clears a mask bit and sets the corresponding exception flag bit, an exception will not be immediately generated. The exception will occur only upon the next Streaming SIMD Extension to cause this type of exception. Streaming SIMD Extension uses only one exception flag for each exception. There is no provision for individual exception reporting within a packed data type. In situations where multiple identical exceptions occur within the same instruction, the associated exception flag is updated and indicates that at least one of these conditions happened. These flags are cleared upon reset. Bits 12-7 configure numerical exception masking. An exception type is masked if the corresponding bit is set, and unmasked if the bit is clear. These enables are set upon reset, meaning that all numerical exceptions are masked. Bits 14-13 encode the rounding control, which provides for the common round to nearest mode, as well as directed rounding and true chop. Rounding control affects the arithmetic instructions and certain conversion instructions. The encoding for RC is as follows:
Rounding Mode                          RC Field    Description
Round to nearest (even)                00B         Rounded result is the closest to the infinitely precise result. If two values are equally close, the result is the even value (that is, the one with the least-significant bit of zero).
Round down (to minus infinity)         01B         Rounded result is close to but no greater than the infinitely precise result.
Round up (toward positive infinity)    10B         Rounded result is close to but no less than the infinitely precise result.
Round toward zero (truncate)           11B         Rounded result is close to but no greater in absolute value than the infinitely precise result.
The rounding control is set to round to nearest upon reset. Bit 15 (FZ) is used to turn on the Flush-To-Zero mode (bit is set). Turning on the Flush-To-Zero mode has the following effects during underflow situations:
- zero results are returned with the sign of the true result
- precision and underflow exception flags are set
The IEEE mandated masked response to underflow is to deliver the denormalized result (i.e., gradual underflow); consequently, the flush-to-zero mode is not compatible with IEEE Std. 754. It is provided primarily for performance reasons. At the cost of a slight precision loss, faster execution can be achieved for applications where underflows are common. Unmasking the underflow exception takes precedence over Flush-To-Zero mode. This arrangement means that an exception handler will be invoked for a Streaming SIMD Extension that generates an underflow condition while this exception is unmasked, regardless of whether flush-to-zero is enabled. The other bits of MXCSR (bits 31-16 and bit 6) are defined as reserved and cleared; attempting to write a non-zero value to these bits, using either the FXRSTOR or LDMXCSR instructions, will result in a general protection exception. The linear address corresponds to the address of the least-significant byte of the referenced memory data. Operation
MXCSR = m32;
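The field layout described above can be checked with a small decoder. The Python sketch below is an illustration under stated assumptions: the names decode_mxcsr, MXCSR_RESERVED_MASK, and ROUNDING_MODES are invented for this example, and a ValueError stands in for the general protection exception raised when reserved bits are written with non-zero values.

```python
# Bits 31-16 and bit 6 are reserved and must be written as zero.
MXCSR_RESERVED_MASK = 0xFFFF0040

# RC field encodings taken from the rounding-control table.
ROUNDING_MODES = {
    0b00: "nearest",
    0b01: "down",
    0b10: "up",
    0b11: "toward zero",
}


def decode_mxcsr(value):
    """Split a candidate MXCSR value into its fields (hypothetical helper)."""
    if value & MXCSR_RESERVED_MASK:
        # models the general protection exception for reserved bits
        raise ValueError("#GP: reserved MXCSR bits must be zero")
    return {
        "exception_flags": value & 0x3F,         # bits 5-0, sticky flags
        "exception_masks": (value >> 7) & 0x3F,  # bits 12-7
        "rounding": ROUNDING_MODES[(value >> 13) & 0b11],
        "flush_to_zero": bool(value & (1 << 15)),
    }
```

Decoding the reset value 0x1f80 shows all six exceptions masked, no flags set, round to nearest, and flush-to-zero off, which matches the reset state described in the text.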
C/C++ Compiler Intrinsic Equivalent

_mm_setcsr(unsigned int i)

Sets the control register to the value specified.

Exceptions
General protection fault if reserved bits are loaded with non-zero values.

Numeric Exceptions
None.

Protected Mode Exceptions

#GP(0)            For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)            For an illegal address in the SS segment.
#PF(fault-code)   For a page fault.
#UD               If CR0.EM = 1.
#NM               If TS bit in CR0 is set.
#AC               For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).
#UD               If CR4.OSFXSR (bit 9) = 0.
#UD               If CPUID.XMM (EDX bit 25) = 0.

Real Address Mode Exceptions

Interrupt 13      If any part of the operand would lie outside of the effective address space from 0 to 0FFFFH.
#UD               If CR0.EM = 1.
#NM               If TS bit in CR0 is set.
#UD               If CR4.OSFXSR (bit 9) = 0.
#UD               If CPUID.XMM (EDX bit 25) = 0.

Virtual 8086 Mode Exceptions

Same exceptions as in Real Address Mode.
#PF(fault-code)   For a page fault.
#AC               For unaligned memory reference.

Comments
The usage of Repeat Prefix (F3H) with LDMXCSR is reserved. Different processor implementations may handle this prefix differently. Usage of this prefix with LDMXCSR risks incompatibility with future processors.
LDS/LES/LFS/LGS/LSS - Load Far Pointer

Opcode      Instruction        Description
C5 /r       LDS r16,m16:16     Load DS:r16 with far pointer from memory
C5 /r       LDS r32,m16:32     Load DS:r32 with far pointer from memory
0F B2 /r    LSS r16,m16:16     Load SS:r16 with far pointer from memory
0F B2 /r    LSS r32,m16:32     Load SS:r32 with far pointer from memory
C4 /r       LES r16,m16:16     Load ES:r16 with far pointer from memory
C4 /r       LES r32,m16:32     Load ES:r32 with far pointer from memory
0F B4 /r    LFS r16,m16:16     Load FS:r16 with far pointer from memory
0F B4 /r    LFS r32,m16:32     Load FS:r32 with far pointer from memory
0F B5 /r    LGS r16,m16:16     Load GS:r16 with far pointer from memory
0F B5 /r    LGS r32,m16:32     Load GS:r32 with far pointer from memory

Description
These instructions load a far pointer (segment selector and offset) from the second operand (source operand) into a segment register and the first operand (destination operand). The source operand specifies a 48-bit or a 32-bit pointer in memory depending on the current setting of the operand-size attribute (32 bits or 16 bits, respectively). The instruction opcode and the destination operand specify a segment register/general-purpose register pair. The 16-bit segment selector from the source operand is loaded into the segment register specified with the opcode (DS, SS, ES, FS, or GS). The 32-bit or 16-bit offset is loaded into the register specified with the destination operand.

If one of these instructions is executed in protected mode, additional information from the segment descriptor pointed to by the segment selector in the source operand is loaded in the hidden part of the selected segment register. Also in protected mode, a null selector (values 0000 through 0003) can be loaded into DS, ES, FS, or GS registers without causing a protection exception. (Any subsequent reference to a segment whose corresponding segment register is loaded with a null selector causes a general-protection exception (#GP) and no memory reference to the segment occurs.)
Operation
IF Protected Mode
    THEN IF SS is loaded
        THEN IF SegmentSelector = null
            THEN #GP(0);
        FI;
        ELSE IF Segment selector index is not within descriptor table limits
            OR Segment selector RPL ≠ CPL
            OR Access rights indicate nonwritable data segment
            OR DPL ≠ CPL
            THEN #GP(selector);
        FI;
        ELSE IF Segment marked not present
            THEN #SS(selector);
        FI;
        SS ← SegmentSelector(SRC);
        SS ← SegmentDescriptor([SRC]);
    ELSE IF DS, ES, FS, or GS is loaded with non-null segment selector
        THEN IF Segment selector index is not within descriptor table limits
            OR Access rights indicate segment neither data nor readable code segment
            OR (Segment is data or nonconforming-code segment
                AND both RPL and CPL > DPL)
            THEN #GP(selector);
        FI;
        ELSE IF Segment marked not present
            THEN #NP(selector);
        FI;
        SegmentRegister ← SegmentSelector(SRC) AND RPL;
        SegmentRegister ← SegmentDescriptor([SRC]);
    ELSE IF DS, ES, FS or GS is loaded with a null selector:
        SegmentRegister ← NullSelector;
        SegmentRegister(DescriptorValidBit) ← 0; (* hidden flag; not accessible by software *)
    FI;
FI;
IF (Real-Address or Virtual-8086 Mode)
    THEN
        SegmentRegister ← SegmentSelector(SRC);
FI;
DEST ← Offset(SRC);
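To make the memory operand concrete, the following Python sketch splits an m16:16 or m16:32 far pointer into its selector and offset. It assumes the usual little-endian Intel layout, with the offset stored first and the 16-bit selector immediately after it; the function names are hypothetical.

```python
import struct


def read_far_pointer(memory, operand_size):
    """Return (selector, offset) from a far pointer stored in memory (sketch).

    memory       -- bytes-like object starting at the pointer
    operand_size -- 32 for m16:32 (6 bytes), 16 for m16:16 (4 bytes)
    """
    if operand_size == 32:
        offset, selector = struct.unpack_from("<IH", memory)
    else:
        offset, selector = struct.unpack_from("<HH", memory)
    return selector, offset


def is_null_selector(selector):
    """Null selectors are the values 0000 through 0003."""
    return selector <= 0x0003
```

For instance, the six bytes 78 56 34 12 23 00 decode to selector 0023H and offset 12345678H.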
Flags Affected
None.

Protected Mode Exceptions

#UD               If source operand is not a memory location.
#GP(0)            If a null selector is loaded into the SS register.
                  If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                  If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
#GP(selector)     If the SS register is being loaded and any of the following is true: the segment selector index is not within the descriptor table limits, the segment selector RPL is not equal to CPL, the segment is a nonwritable data segment, or DPL is not equal to CPL.
                  If the DS, ES, FS, or GS register is being loaded with a non-null segment selector and any of the following is true: the segment selector index is not within descriptor table limits, the segment is neither a data nor a readable code segment, or the segment is a data or nonconforming-code segment and both RPL and CPL are greater than DPL.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#SS(selector)     If the SS register is being loaded and the segment is marked not present.
#NP(selector)     If DS, ES, FS, or GS register is being loaded with a non-null segment selector and the segment is marked not present.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real-Address Mode Exceptions

#GP               If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS               If a memory operand effective address is outside the SS segment limit.
#UD               If source operand is not a memory location.

Virtual-8086 Mode Exceptions

#UD               If source operand is not a memory location.
#GP(0)            If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made.
LEA - Load Effective Address

Opcode    Instruction    Description
8D /r     LEA r16,m      Store effective address for m in register r16
8D /r     LEA r32,m      Store effective address for m in register r32

Description
This instruction computes the effective address of the second operand (the source operand) and stores it in the first operand (destination operand). The source operand is a memory address (offset part) specified with one of the processor's addressing modes; the destination operand is a general-purpose register. The address-size and operand-size attributes affect the action performed by this instruction, as shown in the following table. The operand-size attribute of the instruction is determined by the chosen register; the address-size attribute is determined by the attribute of the code segment.

Operand Size    Address Size    Action Performed
16              16              16-bit effective address is calculated and stored in requested 16-bit register destination.
16              32              32-bit effective address is calculated. The lower 16 bits of the address are stored in the requested 16-bit register destination.
32              16              16-bit effective address is calculated. The 16-bit address is zero-extended and stored in the requested 32-bit register destination.
32              32              32-bit effective address is calculated and stored in the requested 32-bit register destination.

Different assemblers may use different algorithms based on the size attribute and symbolic reference of the source operand.
Operation
IF OperandSize = 16 AND AddressSize = 16
    THEN
        DEST ← EffectiveAddress(SRC); (* 16-bit address *)
    ELSE IF OperandSize = 16 AND AddressSize = 32
        THEN
            temp ← EffectiveAddress(SRC); (* 32-bit address *)
            DEST ← temp[0..15]; (* 16-bit address *)
    ELSE IF OperandSize = 32 AND AddressSize = 16
        THEN
            temp ← EffectiveAddress(SRC); (* 16-bit address *)
            DEST ← ZeroExtend(temp); (* 32-bit address *)
    ELSE IF OperandSize = 32 AND AddressSize = 32
        THEN
            DEST ← EffectiveAddress(SRC); (* 32-bit address *)
    FI;
FI;
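The four size combinations reduce to two masking steps, which the Python sketch below tries to make visible. It is a simplified model with hypothetical parameter names; in particular, it accepts a scale factor for any address size even though scaled-index addressing exists only in 32-bit addressing modes.

```python
def lea(base, index, scale, displacement, operand_size, address_size):
    """Model LEA: compute an effective address and fit it to the destination.

    operand_size and address_size are each 16 or 32.
    """
    # The effective address is computed modulo 2**address_size.
    address_mask = (1 << address_size) - 1
    effective = (base + index * scale + displacement) & address_mask
    # A 16-bit destination keeps the low 16 bits; a 32-bit destination
    # receives the value zero-extended, which masking does implicitly.
    operand_mask = (1 << operand_size) - 1
    return effective & operand_mask
```

For example, lea(0x12340000, 0x10, 4, 8, 16, 32) models the second table row: the 32-bit address 12340048H is computed and only 0048H reaches the 16-bit destination.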
Flags Affected None. Protected Mode Exceptions #UD If source operand is not a memory location.
Real-Address Mode Exceptions #UD If source operand is not a memory location.
Virtual-8086 Mode Exceptions #UD If source operand is not a memory location.
LEAVE - High Level Procedure Exit

Opcode    Instruction    Description
C9        LEAVE          Set SP to BP, then pop BP
C9        LEAVE          Set ESP to EBP, then pop EBP

Description
This instruction releases the stack frame set up by an earlier ENTER instruction. The LEAVE instruction copies the frame pointer (in the EBP register) into the stack pointer register (ESP), which releases the stack space allocated to the stack frame. The old frame pointer (the frame pointer for the calling procedure that was saved by the ENTER instruction) is then popped from the stack into the EBP register, restoring the calling procedure's stack frame.

A RET instruction is commonly executed following a LEAVE instruction to return program control to the calling procedure.

Refer to Section 4.5., Procedure Calls for Block-Structured Languages in Chapter 4, Procedure Calls, Interrupts, and Exceptions of the Intel Architecture Software Developer's Manual, Volume 1, for detailed information on the use of the ENTER and LEAVE instructions.

Operation
IF StackAddressSize = 32
    THEN
        ESP ← EBP;
    ELSE (* StackAddressSize = 16 *)
        SP ← BP;
FI;
IF OperandSize = 32
    THEN
        EBP ← Pop();
    ELSE (* OperandSize = 16 *)
        BP ← Pop();
FI;
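The two steps of LEAVE can be traced on a toy machine state. The Python sketch below covers only the 32-bit case; the register dictionary, the address-keyed stack dictionary, and the function name are assumptions made for this illustration.

```python
def leave(regs, stack):
    """Model the 32-bit form of LEAVE on a toy machine state.

    regs  -- dict with "esp" and "ebp" entries
    stack -- dict mapping addresses to stored doublewords
    """
    # ESP <- EBP: discard locals allocated below the frame pointer
    regs["esp"] = regs["ebp"]
    # EBP <- Pop(): restore the caller's frame pointer
    regs["ebp"] = stack[regs["esp"]]
    regs["esp"] += 4
```

After the call, ESP points at the return address that a following RET would consume.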
Protected Mode Exceptions

#SS(0)            If the EBP register points to a location that is not within the limits of the current stack segment.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real-Address Mode Exceptions

#GP               If the EBP register points to a location outside of the effective address space from 0 to 0FFFFH.

Virtual-8086 Mode Exceptions

#GP(0)            If the EBP register points to a location outside of the effective address space from 0 to 0FFFFH.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made.
LES - Load Full Pointer

Refer to entry for LDS/LES/LFS/LGS/LSS - Load Far Pointer.

LFS - Load Full Pointer

Refer to entry for LDS/LES/LFS/LGS/LSS - Load Far Pointer.
LGDT/LIDT - Load Global/Interrupt Descriptor Table Register

Opcode      Instruction     Description
0F 01 /2    LGDT m16&32     Load m into GDTR
0F 01 /3    LIDT m16&32     Load m into IDTR

Description
These instructions load the values in the source operand into the global descriptor table register (GDTR) or the interrupt descriptor table register (IDTR). The source operand specifies a 6-byte memory location that contains the base address (a linear address) and the limit (size of table in bytes) of the global descriptor table (GDT) or the interrupt descriptor table (IDT). If the operand-size attribute is 32 bits, a 16-bit limit (lower two bytes of the 6-byte data operand) and a 32-bit base address (upper four bytes of the data operand) are loaded into the register. If the operand-size attribute is 16 bits, a 16-bit limit (lower two bytes) and a 24-bit base address (third, fourth, and fifth byte) are loaded. Here, the high-order byte of the operand is not used and the high-order byte of the base address in the GDTR or IDTR is filled with zeroes.

The LGDT and LIDT instructions are used only in operating-system software; they are not used in application programs. They are the only instructions that directly load a linear address (that is, not a segment-relative address) and a limit in protected mode. They are commonly executed in real-address mode to allow processor initialization prior to switching to protected mode.

Refer to the entry for the SGDT/SIDT instructions in this chapter for information on storing the contents of the GDTR and IDTR.

Operation
IF instruction is LIDT
    THEN
        IF OperandSize = 16
            THEN
                IDTR(Limit) ← SRC[0:15];
                IDTR(Base) ← SRC[16:47] AND 00FFFFFFH;
            ELSE (* 32-bit Operand Size *)
                IDTR(Limit) ← SRC[0:15];
                IDTR(Base) ← SRC[16:47];
        FI;
    ELSE (* instruction is LGDT *)
        IF OperandSize = 16
            THEN
                GDTR(Limit) ← SRC[0:15];
                GDTR(Base) ← SRC[16:47] AND 00FFFFFFH;
            ELSE (* 32-bit Operand Size *)
                GDTR(Limit) ← SRC[0:15];
                GDTR(Base) ← SRC[16:47];
        FI;
FI;
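The handling of the 6-byte operand is the same for LGDT and LIDT, so one helper can model both. The following Python sketch is an illustration only; the function name and the dictionary it returns are hypothetical.

```python
import struct


def load_descriptor_table_register(operand, operand_size):
    """Parse the 6-byte LGDT/LIDT memory operand (sketch).

    operand      -- exactly 6 bytes: 16-bit limit, then 32-bit base
    operand_size -- 16 or 32
    """
    limit, base = struct.unpack("<HI", operand)
    if operand_size == 16:
        # Only a 24-bit base is loaded; the high-order byte reads as zero.
        base &= 0x00FFFFFF
    return {"limit": limit, "base": base}
```

With the operand bytes FF 07 00 00 10 AB, a 32-bit operand size yields base AB100000H, while a 16-bit operand size yields base 00100000H; the limit is 07FFH in both cases.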
Flags Affected
None.

Protected Mode Exceptions

#UD               If source operand is not a memory location.
#GP(0)            If the current privilege level is not 0.
                  If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                  If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)   If a page fault occurs.

Real-Address Mode Exceptions

#UD               If source operand is not a memory location.
#GP               If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS               If a memory operand effective address is outside the SS segment limit.

Virtual-8086 Mode Exceptions

#GP(0)            If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
LGS - Load Full Pointer

Refer to entry for LDS/LES/LFS/LGS/LSS - Load Far Pointer.
LLDT - Load Local Descriptor Table Register

Opcode      Instruction    Description
0F 00 /2    LLDT r/m16     Load segment selector r/m16 into LDTR

Description
This instruction loads the source operand into the segment selector field of the local descriptor table register (LDTR). The source operand (a general-purpose register or a memory location) contains a segment selector that points to a local descriptor table (LDT). After the segment selector is loaded in the LDTR, the processor uses the segment selector to locate the segment descriptor for the LDT in the global descriptor table (GDT). It then loads the segment limit and base address for the LDT from the segment descriptor into the LDTR. The segment registers DS, ES, SS, FS, GS, and CS are not affected by this instruction, nor is the LDTR field in the task state segment (TSS) for the current task.

If the source operand is 0, the LDTR is marked invalid and all references to descriptors in the LDT (except by the LAR, VERR, VERW or LSL instructions) cause a general protection exception (#GP).

The operand-size attribute has no effect on this instruction.

The LLDT instruction is provided for use in operating-system software; it should not be used in application programs. Also, this instruction can only be executed in protected mode.

Operation
IF SRC(Offset) > descriptor table limit THEN #GP(segment selector); FI;
Read segment descriptor;
IF SegmentDescriptor(Type) ≠ LDT THEN #GP(segment selector); FI;
IF segment descriptor is not present THEN #NP(segment selector); FI;
LDTR(SegmentSelector) ← SRC;
LDTR(SegmentDescriptor) ← GDTSegmentDescriptor;
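The selector checks that precede the descriptor read can be sketched as follows. This Python model is partial and rests on assumptions: the function name and its string results are invented, a selector is treated as null when its index and table-indicator bits are all zero, bit 2 of the selector is taken as the table indicator (0 for GDT, 1 for LDT), and the descriptor type and present checks are left out because they need the descriptor contents.

```python
def lldt_precheck(selector, gdt_limit):
    """Classify an LLDT source selector before the descriptor is read (sketch)."""
    if (selector & 0xFFFC) == 0:
        # A null selector marks the LDTR invalid instead of faulting.
        return "ldtr-marked-invalid"
    if selector & 0b100:
        # The selector must point into the GDT, not the LDT.
        return "#GP(selector)"
    # Each descriptor occupies 8 bytes; its last byte must be within the limit.
    if (selector & 0xFFF8) + 7 > gdt_limit:
        return "#GP(selector)"
    return "ok"
```

For example, selector 0028H passes against a GDT limit of 002FH but faults against a limit of 0027H.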
Protected Mode Exceptions

#GP(0)            If the current privilege level is not 0.
                  If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                  If the DS, ES, FS, or GS register contains a null segment selector.
#GP(selector)     If the selector operand does not point into the Global Descriptor Table or if the entry in the GDT is not a Local Descriptor Table.
                  Segment selector is beyond GDT limit.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#NP(selector)     If the LDT descriptor is not present.
#PF(fault-code)   If a page fault occurs.

Real-Address Mode Exceptions

#UD               The LLDT instruction is not recognized in real-address mode.

Virtual-8086 Mode Exceptions

#UD               The LLDT instruction is not recognized in virtual-8086 mode.
LIDT - Load Interrupt Descriptor Table Register

Refer to entry for LGDT/LIDT - Load Global/Interrupt Descriptor Table Register.

LMSW - Load Machine Status Word

Opcode      Instruction    Description
0F 01 /6    LMSW r/m16     Loads r/m16 in machine status word of CR0

Description
This instruction loads the source operand into the machine status word, bits 0 through 15 of register CR0. The source operand can be a 16-bit general-purpose register or a memory location. Only the low-order four bits of the source operand (which contains the PE, MP, EM, and TS flags) are loaded into CR0. The PG, CD, NW, AM, WP, NE, and ET flags of CR0 are not affected. The operand-size attribute has no effect on this instruction.

If the PE flag of the source operand (bit 0) is set to 1, the instruction causes the processor to switch to protected mode. While in protected mode, the LMSW instruction cannot be used to clear the PE flag and force a switch back to real-address mode.

The LMSW instruction is provided for use in operating-system software; it should not be used in application programs. In protected or virtual-8086 mode, it can only be executed at CPL 0.

This instruction is provided for compatibility with the Intel 286 processor; programs and procedures intended to run on the P6 family, Intel486, and Intel386 processors should use the MOV (control registers) instruction to load the whole CR0 register. The MOV CR0 instruction can be used to set and clear the PE flag in CR0, allowing a procedure or program to switch between protected and real-address modes.

This instruction is a serializing instruction.

Operation
CR0[0:3] ← SRC[0:3];
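The one-line operation hides the rule that LMSW can set PE but cannot clear it. The Python sketch below folds that rule from the description into the update; the function name is hypothetical and CR0 is modeled as a plain 32-bit integer.

```python
def lmsw(cr0, src):
    """Model LMSW: replace CR0 bits 3-0 (TS, EM, MP, PE) from the source (sketch)."""
    new_low = src & 0xF
    # An already-set PE flag (bit 0) stays set: LMSW cannot clear it.
    new_low |= cr0 & 0x1
    # Bits 4 and above (ET, NE, WP, AM, NW, CD, PG, ...) are preserved.
    return (cr0 & 0xFFFFFFF0) | new_low
```

For example, lmsw(0x80000011, 0x0000) leaves CR0 unchanged because PE was the only low-order flag set and it cannot be cleared this way.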
Flags Affected
None.

Protected Mode Exceptions

#GP(0)            If the current privilege level is not 0.
                  If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                  If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)   If a page fault occurs.

Real-Address Mode Exceptions

#GP               If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

Virtual-8086 Mode Exceptions

#GP(0)            If the current privilege level is not 0.
                  If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)   If a page fault occurs.
LOCKAssert LOCK# Signal Prefix

Opcode F0 Instruction LOCK Description Asserts LOCK# signal for duration of the accompanying instruction
Description This instruction causes the processors LOCK# signal to be asserted during execution of the accompanying instruction (turns the instruction into an atomic instruction). In a multiprocessor environment, the LOCK# signal insures that the processor has exclusive use of any shared memory while the signal is asserted. Note that in later Intel Architecture processors (such as the Pentium Pro processor), locking may occur without the LOCK# signal being asserted. Refer to Intel Architecture Compatibility below. The LOCK prefix can be prepended only to the following instructions and to those forms of the instructions that use a memory operand: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG. An undefined opcode exception will be generated if the LOCK prefix is used with any other instruction. The XCHG instruction always asserts the LOCK# signal regardless of the presence or absence of the LOCK prefix. The LOCK prefix is typically used with the BTS instruction to perform a read-modify-write operation on a memory location in shared memory environment. The integrity of the LOCK prefix is not affected by the alignment of the memory field. Memory locking is observed for arbitrarily misaligned fields. Intel Architecture Compatibility Beginning with the Pentium Pro processor, when the LOCK prefix is prefixed to an instruction and the memory area being accessed is cached internally in the processor, the LOCK# signal is generally not asserted. Instead, only the processors cache is locked. Here, the processors cache coherency mechanism insures that the operation is carried out atomically with regards to memory. Refer to Section 7.1.4., Effects of a LOCK Operation on Internal Processor Caches in Chapter 7, Multiple-Processor Management of the Intel Architecture Software Developers Manual, Volume 3, the for more information on locking of caches. Operation
AssertLOCK#(DurationOfAccompanyingInstruction);
Protected Mode Exceptions #UD If the LOCK prefix is used with an instruction not listed in the Description section above. Other exceptions can be generated by the instruction that the LOCK prefix is being applied to.
Real-Address Mode Exceptions #UD If the LOCK prefix is used with an instruction not listed in the Description section above. Other exceptions can be generated by the instruction that the LOCK prefix is being applied to.
Virtual-8086 Mode Exceptions #UD If the LOCK prefix is used with an instruction not listed in the Description section above. Other exceptions can be generated by the instruction that the LOCK prefix is being applied to.
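The legality rule for the LOCK prefix can be sketched as a small lookup. This is only an illustration in Python, not processor or assembler code: the function name `lock_prefix_is_legal` and its parameters are hypothetical, and the mnemonic list is copied from the Description section above.

```python
# Hypothetical helper: models only the rule stated in the Description section.
# Mnemonics that accept a LOCK prefix when they use a memory operand.
LOCKABLE = frozenset({
    "ADD", "ADC", "AND", "BTC", "BTR", "BTS", "CMPXCHG", "DEC", "INC",
    "NEG", "NOT", "OR", "SBB", "SUB", "XOR", "XADD", "XCHG",
})


def lock_prefix_is_legal(mnemonic: str, uses_memory_operand: bool) -> bool:
    """Return True if LOCK may precede this instruction form.

    A False result corresponds to the #UD (undefined opcode) case.
    """
    return mnemonic.upper() in LOCKABLE and uses_memory_operand
```

For example, `lock_prefix_is_legal("BTS", True)` is accepted, while a register-only BTS or any MOV form is rejected.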
LODS/LODSB/LODSW/LODSD - Load String
Opcode    Instruction    Description
AC        LODS m8        Load byte at address DS:(E)SI into AL
AD        LODS m16       Load word at address DS:(E)SI into AX
AD        LODS m32       Load doubleword at address DS:(E)SI into EAX
AC        LODSB          Load byte at address DS:(E)SI into AL
AD        LODSW          Load word at address DS:(E)SI into AX
AD        LODSD          Load doubleword at address DS:(E)SI into EAX
Description
These instructions load a byte, word, or doubleword from the source operand into the AL, AX, or EAX register, respectively. The source operand is a memory location, the address of which is read from the DS:ESI or the DS:SI registers (depending on the address-size attribute of the instruction, 32 or 16, respectively). The DS segment may be overridden with a segment override prefix.

At the assembly-code level, two forms of this instruction are allowed: the "explicit-operands" form and the "no-operands" form. The explicit-operands form (specified with the LODS mnemonic) allows the source operand to be specified explicitly. Here, the source operand should be a symbol that indicates the size and location of the source value. The destination operand is then automatically selected to match the size of the source operand (the AL register for byte operands, AX for word operands, and EAX for doubleword operands). This explicit-operands form is provided to allow documentation; however, note that the documentation provided by this form can be misleading. That is, the source operand symbol must specify the correct type (size) of the operand (byte, word, or doubleword), but it does not have to specify the correct location. The location is always specified by the DS:(E)SI registers, which must be loaded correctly before the load string instruction is executed.

The no-operands form provides "short forms" of the byte, word, and doubleword versions of the LODS instructions. Here also DS:(E)SI is assumed to be the source operand and the AL, AX, or EAX register is assumed to be the destination operand. The size of the source and destination operands is selected with the mnemonic: LODSB (byte loaded into register AL), LODSW (word loaded into AX), or LODSD (doubleword loaded into EAX).

After the byte, word, or doubleword is transferred from the memory location into the AL, AX, or EAX register, the (E)SI register is incremented or decremented automatically according to the setting of the DF flag in the EFLAGS register. (If the DF flag is 0, the (E)SI register is incremented; if the DF flag is 1, the (E)SI register is decremented.) The (E)SI register is incremented or decremented by one for byte operations, by two for word operations, or by four for doubleword operations.
The LODS, LODSB, LODSW, and LODSD instructions can be preceded by the REP prefix for block loads of ECX bytes, words, or doublewords. More often, however, these instructions are used within a LOOP construct because further processing of the data moved into the register is usually necessary before the next transfer can be made. Refer to "REP/REPE/REPZ/REPNE/REPNZ - Repeat String Operation Prefix" in this chapter for a description of the REP prefix.

Operation
IF (byte load)
    THEN
        AL ← SRC; (* byte load *)
        IF DF = 0
            THEN (E)SI ← (E)SI + 1;
            ELSE (E)SI ← (E)SI - 1;
        FI;
    ELSE IF (word load)
        THEN
            AX ← SRC; (* word load *)
            IF DF = 0
                THEN (E)SI ← (E)SI + 2;
                ELSE (E)SI ← (E)SI - 2;
            FI;
        ELSE (* doubleword transfer *)
            EAX ← SRC; (* doubleword load *)
            IF DF = 0
                THEN (E)SI ← (E)SI + 4;
                ELSE (E)SI ← (E)SI - 4;
            FI;
    FI;
FI;
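The operation above can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions: the function name `lods` is hypothetical, a `bytes` object stands in for the DS segment, the 32-bit address size is assumed (ESI wraps at 32 bits), and segment limit checks are not modeled.

```python
def lods(memory: bytes, esi: int, size: int, df: int = 0):
    """Model one LODSB/LODSW/LODSD step (size = 1, 2, or 4 bytes).

    Returns (value, new_esi). `memory` is an assumed stand-in for the DS
    segment; the value is what would land in AL, AX, or EAX.
    """
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2, or 4")
    # x86 stores multi-byte values in little-endian order.
    value = int.from_bytes(memory[esi:esi + size], "little")
    # DF = 0 increments (E)SI by the operand size; DF = 1 decrements it.
    step = size if df == 0 else -size
    return value, (esi + step) & 0xFFFFFFFF
```

Calling `lods(mem, 0, 2)` on a buffer that starts with the bytes 11H, 22H yields the word 2211H and advances the offset to 2.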
LOOP/LOOPcc - Loop According to ECX Counter

Opcode    Instruction    Description
E2 cb     LOOP rel8      Decrement count; jump short if count ≠ 0
E1 cb     LOOPE rel8     Decrement count; jump short if count ≠ 0 and ZF=1
E1 cb     LOOPZ rel8     Decrement count; jump short if count ≠ 0 and ZF=1
E0 cb     LOOPNE rel8    Decrement count; jump short if count ≠ 0 and ZF=0
E0 cb     LOOPNZ rel8    Decrement count; jump short if count ≠ 0 and ZF=0
Description
These instructions perform a loop operation using the ECX or CX register as a counter. Each time the LOOP instruction is executed, the count register is decremented, then checked for 0. If the count is 0, the loop is terminated and program execution continues with the instruction following the LOOP instruction. If the count is not zero, a near jump is performed to the destination (target) operand, which is presumably the instruction at the beginning of the loop. If the address-size attribute is 32 bits, the ECX register is used as the count register; otherwise the CX register is used.

The target instruction is specified with a relative offset (a signed offset relative to the current value of the instruction pointer in the EIP register). This offset is generally specified as a label in assembly code, but at the machine code level, it is encoded as a signed, 8-bit immediate value, which is added to the instruction pointer. Offsets of -128 to +127 are allowed with this instruction.

Some forms of the loop instruction (LOOPcc) also accept the ZF flag as a condition for terminating the loop before the count reaches zero. With these forms of the instruction, a condition code (cc) is associated with each instruction to indicate the condition being tested for. Here, the LOOPcc instruction itself does not affect the state of the ZF flag; the ZF flag is changed by other instructions in the loop.
Operation
IF AddressSize = 32
    THEN Count is ECX;
    ELSE (* AddressSize = 16 *) Count is CX;
FI;
Count ← Count - 1;
IF instruction is not LOOP
    THEN
        IF (instruction = LOOPE) OR (instruction = LOOPZ)
            THEN
                IF (ZF = 1) AND (Count ≠ 0)
                    THEN BranchCond ← 1;
                    ELSE BranchCond ← 0;
                FI;
        FI;
        IF (instruction = LOOPNE) OR (instruction = LOOPNZ)
            THEN
                IF (ZF = 0) AND (Count ≠ 0)
                    THEN BranchCond ← 1;
                    ELSE BranchCond ← 0;
                FI;
        FI;
    ELSE (* instruction = LOOP *)
        IF (Count ≠ 0)
            THEN BranchCond ← 1;
            ELSE BranchCond ← 0;
        FI;
FI;
IF BranchCond = 1
    THEN
        EIP ← EIP + SignExtend(DEST);
        IF OperandSize = 16
            THEN EIP ← EIP AND 0000FFFFH;
        FI;
    ELSE
        Terminate loop and continue program execution at EIP;
FI;
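The count and branch decision above can be sketched as a small Python function. The name `loop_step` and its parameters are hypothetical; only the decrement and the BranchCond computation are modeled, not the EIP update.

```python
def loop_step(instruction: str, count: int, zf: int, address_size: int = 32):
    """Model the decision made by LOOP, LOOPE/LOOPZ, and LOOPNE/LOOPNZ.

    Returns (new_count, branch_taken). The count wraps like ECX when the
    address size is 32 bits and like CX when it is 16 bits.
    """
    mask = 0xFFFFFFFF if address_size == 32 else 0xFFFF
    count = (count - 1) & mask           # the decrement happens before the test
    name = instruction.upper()
    if name == "LOOP":
        taken = count != 0
    elif name in ("LOOPE", "LOOPZ"):
        taken = zf == 1 and count != 0
    elif name in ("LOOPNE", "LOOPNZ"):
        taken = zf == 0 and count != 0
    else:
        raise ValueError("not a LOOP-family mnemonic")
    return count, taken
```

Because the decrement comes first, entering a LOOP with a count of 0 wraps the counter to its maximum value and the branch is taken, which is why loop code commonly guards against a zero count before the loop body.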
Flags Affected
None.

Protected Mode Exceptions
#GP(0)    If the offset jumped to is beyond the limits of the code segment.

Real-Address Mode Exceptions
None.

Virtual-8086 Mode Exceptions
None.
LSL - Load Segment Limit

Opcode      Instruction        Description
0F 03 /r    LSL r16,r/m16      Load: r16 ← segment limit, selector r/m16
0F 03 /r    LSL r32,r/m32      Load: r32 ← segment limit, selector r/m32
Description This instruction loads the unscrambled segment limit from the segment descriptor specified with the second operand (source operand) into the first operand (destination operand) and sets the ZF flag in the EFLAGS register. The source operand (which can be a register or a memory location) contains the segment selector for the segment descriptor being accessed. The destination operand is a general-purpose register. The processor performs access checks as part of the loading process. Once loaded in the destination register, software can compare the segment limit with the offset of a pointer. The segment limit is a 20-bit value contained in bytes 0 and 1 and in the first four bits of byte 6 of the segment descriptor. If the descriptor has a byte granular segment limit (the granularity flag is set to 0), the destination operand is loaded with a byte granular value (byte limit). If the descriptor has a page granular segment limit (the granularity flag is set to 1), the LSL instruction will translate the page granular limit (page limit) into a byte limit before loading it into the destination operand. The translation is performed by shifting the 20-bit raw limit left 12 bits and filling the low-order 12 bits with 1s. When the operand size is 32 bits, the 32-bit byte limit is stored in the destination operand. When the operand size is 16 bits, a valid 32-bit limit is computed; however, the upper 16 bits are truncated and only the low-order 16 bits are loaded into the destination operand. This instruction performs the following checks before it loads the segment limit into the destination register:
• Checks that the segment selector is not null.
• Checks that the segment selector points to a descriptor that is within the limits of the GDT or LDT being accessed.
• Checks that the descriptor type is valid for this instruction. All code and data segment descriptors are valid for (can be accessed with) the LSL instruction. The valid special segment and gate descriptor types are given in the following table.
• If the segment is not a conforming code segment, the instruction checks that the specified segment descriptor is visible at the CPL (that is, if the CPL and the RPL of the segment selector are less than or equal to the DPL of the segment descriptor).
If the segment descriptor cannot be accessed or is an invalid type for the instruction, the ZF flag is cleared and no value is loaded in the destination operand.
Type    Name                       Valid
0       Reserved                   No
1       Available 16-bit TSS       Yes
2       LDT                        Yes
3       Busy 16-bit TSS            Yes
4       16-bit call gate           No
5       16-bit/32-bit task gate    No
6       16-bit interrupt gate      No
7       16-bit trap gate           No
8       Reserved                   No
9       Available 32-bit TSS       Yes
A       Reserved                   No
B       Busy 32-bit TSS            Yes
C       32-bit call gate           No
D       Reserved                   No
E       32-bit interrupt gate      No
F       32-bit trap gate           No
Operation
IF SRC(Offset) > descriptor table limit
    THEN ZF ← 0;
FI;
Read segment descriptor;
IF SegmentDescriptor(Type) ≠ conforming code segment
    AND (CPL > DPL) OR (RPL > DPL)
    OR Segment type is not valid for instruction
        THEN
            ZF ← 0
        ELSE
            temp ← SegmentLimit([SRC]);
            IF (G = 1)
                THEN temp ← ShiftLeft(12, temp) OR 00000FFFH;
            FI;
            IF OperandSize = 32
                THEN DEST ← temp;
                ELSE (* OperandSize = 16 *) DEST ← temp AND FFFFH;
            FI;
FI;
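The limit translation in the ELSE branch above is plain integer arithmetic, so it can be checked with a short Python sketch. The function name `lsl_limit` and its parameters are hypothetical, and the access checks that precede the load are not modeled.

```python
def lsl_limit(raw_limit: int, granularity: int, operand_size: int = 32) -> int:
    """Return the byte limit LSL would store for a 20-bit raw descriptor limit."""
    limit = raw_limit & 0xFFFFF          # the descriptor holds a 20-bit limit
    if granularity == 1:
        # Page granular: shift left 12 bits and fill the low 12 bits with 1s.
        limit = (limit << 12) | 0xFFF
    if operand_size == 16:
        # Only the low-order 16 bits reach a 16-bit destination.
        limit &= 0xFFFF
    return limit
```

A page-granular raw limit of FFFFFH therefore becomes the byte limit FFFFFFFFH, and a raw limit of 00001H becomes 1FFFH.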
Flags Affected
The ZF flag is set to 1 if the segment limit is loaded successfully; otherwise, it is cleared to 0.

Protected Mode Exceptions
#GP(0)             If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                   If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
Real-Address Mode Exceptions #UD The LSL instruction is not recognized in real-address mode.
Virtual-8086 Mode Exceptions #UD The LSL instruction is not recognized in virtual-8086 mode.
LSS - Load Full Pointer

LTR - Load Task Register

Opcode      Instruction    Description
0F 00 /3    LTR r/m16      Load r/m16 into task register
Description This instruction loads the source operand into the segment selector field of the task register. The source operand (a general-purpose register or a memory location) contains a segment selector that points to a task state segment (TSS). After the segment selector is loaded in the task register, the processor uses the segment selector to locate the segment descriptor for the TSS in the global descriptor table (GDT). It then loads the segment limit and base address for the TSS from the segment descriptor into the task register. The task pointed to by the task register is marked busy, but a switch to the task does not occur. The LTR instruction is provided for use in operating-system software; it should not be used in application programs. It can only be executed in protected mode when the CPL is 0. It is commonly used in initialization code to establish the first task to be executed. The operand-size attribute has no effect on this instruction. Operation
IF SRC(Offset) > descriptor table limit OR IF SRC(type) ≠ global
    THEN #GP(segment selector);
FI;
Read segment descriptor;
IF segment descriptor is not for an available TSS
    THEN #GP(segment selector);
FI;
IF segment descriptor is not present
    THEN #NP(segment selector);
FI;
TSSsegmentDescriptor(busy) ← 1;
(* Locked read-modify-write operation on the entire descriptor when setting busy flag *)
TaskRegister(SegmentSelector) ← SRC;
TaskRegister(SegmentDescriptor) ← TSSSegmentDescriptor;
Protected Mode Exceptions
#GP(0)             If the current privilege level is not 0.
                   If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                   If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
#GP(selector)      If the source selector points to a segment that is not a TSS or to one for a task that is already busy.
                   If the selector points to LDT or is beyond the GDT limit.
#NP(selector)      If the TSS is marked not present.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)    If a page fault occurs.
Real-Address Mode Exceptions #UD The LTR instruction is not recognized in real-address mode.
Virtual-8086 Mode Exceptions #UD The LTR instruction is not recognized in virtual-8086 mode.
MASKMOVQ - Byte Mask Write

Opcode      Instruction          Description
0F,F7,/r    MASKMOVQ mm1, mm2    Move 64 bits representing integer data from MM1 register to memory location specified by the edi register, using the byte mask in MM2 register.
Description Data is stored from the mm1 register to the location specified by the di/edi register (using DS segment). The size of the store depends on the address-size attribute. The most significant bit in each byte of the mask register mm2 is used to selectively write the data (0 = no write, 1 = write) on a per-byte basis. Behavior with a mask of all zeroes is as follows:
• No data will be written to memory. However, transition from FP to MMX technology state (if necessary) will occur, irrespective of the value of the mask.
• For memory references, a zero byte mask does not prevent addressing faults (i.e., #GP, #SS) from being signaled.
• Signaling of page faults (#PF) is implementation-specific.
• #UD, #NM, #MF, and #AC faults are signaled irrespective of the value of the mask.
• Signaling of breakpoints (code or data) is not guaranteed; different processor implementations may signal or not signal these breakpoints.
• If the destination memory region is mapped as UC or WP, enforcement of associated semantics for these memory types is not guaranteed (i.e., is reserved) and is implementation-specific. Dependence on the behavior of a specific implementation in this case is not recommended, and may lead to future incompatibility.
The Mod field of the ModR/M byte must be 11, or an Invalid Opcode Exception will result.
Operation
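(* Per the Description, a mask bit of 0 means the byte is not written. *)
IF (SRC[7] = 1)  THEN m64[EDI][7-0]   = DEST[7-0];   ELSE (* m64[EDI][7-0] is not written *)   FI;
IF (SRC[15] = 1) THEN m64[EDI][15-8]  = DEST[15-8];  ELSE (* m64[EDI][15-8] is not written *)  FI;
IF (SRC[23] = 1) THEN m64[EDI][23-16] = DEST[23-16]; ELSE (* m64[EDI][23-16] is not written *) FI;
IF (SRC[31] = 1) THEN m64[EDI][31-24] = DEST[31-24]; ELSE (* m64[EDI][31-24] is not written *) FI;
IF (SRC[39] = 1) THEN m64[EDI][39-32] = DEST[39-32]; ELSE (* m64[EDI][39-32] is not written *) FI;
IF (SRC[47] = 1) THEN m64[EDI][47-40] = DEST[47-40]; ELSE (* m64[EDI][47-40] is not written *) FI;
IF (SRC[55] = 1) THEN m64[EDI][55-48] = DEST[55-48]; ELSE (* m64[EDI][55-48] is not written *) FI;
IF (SRC[63] = 1) THEN m64[EDI][63-56] = DEST[63-56]; ELSE (* m64[EDI][63-56] is not written *) FI;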
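The per-byte selection can be sketched in Python as follows. This is an illustrative model only: the function name `maskmovq` is hypothetical, a `bytearray` stands in for memory, and `data[0]` and `mask[0]` are assumed to hold the least significant byte of mm1 and mm2 (little-endian order). Faults, memory types, and write combining are not modeled.

```python
def maskmovq(memory: bytearray, edi: int, data: bytes, mask: bytes) -> None:
    """Store each byte of `data` whose mask byte has its top bit set."""
    if len(data) != 8 or len(mask) != 8:
        raise ValueError("data and mask must be 8 bytes each")
    for i in range(8):
        if mask[i] & 0x80:               # most significant bit of the mask byte
            memory[edi + i] = data[i]    # write this byte
        # otherwise the byte in memory is left untouched
```

With an all-zero mask the buffer is unchanged, which matches the first item in the list of zero-mask behaviors in the Description.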
Intel C/C++ Compiler Intrinsic Equivalent
void _m_maskmovq(__m64 d, __m64 n, char *p)

void _mm_maskmove_si64(__m64 d, __m64 n, char *p)
Conditionally store byte elements of d to address p. The high bit of each byte in the selector n determines whether the corresponding byte in d will be stored.

Numeric Exceptions
None.

Protected Mode Exceptions
#GP(0)              For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)              For an illegal address in the SS segment.
#PF (fault-code)    For a page fault.
#UD                 If CR0.EM = 1.
#NM                 If TS bit in CR0 is set.
#MF                 If there is a pending FPU exception.
#AC                 For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).

Real Address Mode Exceptions
Interrupt 13        If any part of the operand would lie outside of the effective address space from 0 to 0FFFFH.
#UD                 If CR0.EM = 1.
#NM                 If TS bit in CR0 is set.
#MF                 If there is a pending FPU exception.
Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#AC                 For unaligned memory reference if the current privilege level is 3.
#PF (fault-code)    For a page fault.

Comments
MASKMOVQ can be used to improve performance for algorithms which need to merge data on a byte granularity. MASKMOVQ should not cause a read for ownership; doing so generates unnecessary bandwidth since data is to be written directly using the byte-mask without allocating old data prior to the store. Similar to the Streaming SIMD Extension non-temporal store instructions, MASKMOVQ minimizes pollution of the cache hierarchy. MASKMOVQ implicitly uses weakly-ordered, write-combining stores (WC). Refer to Section 9.3.9., "Cacheability Control Instructions", in Chapter 9, "Programming with the Streaming SIMD Extensions", of the Intel Architecture Software Developer's Manual, Volume 1, for further information about non-temporal stores.

As a consequence of the resulting weakly-ordered memory consistency model, a fencing operation such as SFENCE should be used if multiple processors may use different memory types to read/write the same memory location specified by edi.

This instruction behaves identically to MMX instructions in the presence of x87-FP instructions: transition from x87-FP to MMX technology (TOS=0, FP valid bits set to all valid).

MASKMOVQ ignores the value of CR4.OSFXSR. Since it does not affect the new Streaming SIMD Extension state, it will not generate an invalid exception if CR4.OSFXSR = 0.
MAXPS - Packed Single-FP Maximum

Opcode      Instruction               Description
0F,5F,/r    MAXPS xmm1, xmm2/m128     Return the maximum SP FP numbers between XMM2/Mem and XMM1.
Description The MAXPS instruction returns the maximum SP FP numbers from XMM1 and XMM2/Mem. If the values being compared are both zeroes, source2 (xmm2/m128) would be returned. If source2 (xmm2/m128) is an sNaN, this sNaN is forwarded unchanged to the destination (i.e., a quieted version of the sNaN is not returned).
Figure 3-36. Operation of the MAXPS Instruction
Operation
IF (DEST[31-0] = NaN)
    THEN DEST[31-0] = SRC[31-0];
ELSE IF (SRC[31-0] = NaN)
    THEN DEST[31-0] = SRC[31-0];
ELSE IF (DEST[31-0] > SRC/m128[31-0])
    THEN DEST[31-0] = DEST[31-0];
    ELSE DEST[31-0] = SRC/m128[31-0];
FI FI FI
IF (DEST[63-32] = NaN)
    THEN DEST[63-32] = SRC[63-32];
ELSE IF (SRC[63-32] = NaN)
    THEN DEST[63-32] = SRC[63-32];
ELSE IF (DEST[63-32] > SRC/m128[63-32])
    THEN DEST[63-32] = DEST[63-32];
    ELSE DEST[63-32] = SRC/m128[63-32];
FI FI FI
IF (DEST[95-64] = NaN)
    THEN DEST[95-64] = SRC[95-64];
ELSE IF (SRC[95-64] = NaN)
    THEN DEST[95-64] = SRC[95-64];
ELSE IF (DEST[95-64] > SRC/m128[95-64])
    THEN DEST[95-64] = DEST[95-64];
    ELSE DEST[95-64] = SRC/m128[95-64];
FI FI FI
IF (DEST[127-96] = NaN)
    THEN DEST[127-96] = SRC[127-96];
ELSE IF (SRC[127-96] = NaN)
    THEN DEST[127-96] = SRC[127-96];
ELSE IF (DEST[127-96] > SRC/m128[127-96])
    THEN DEST[127-96] = DEST[127-96];
    ELSE DEST[127-96] = SRC/m128[127-96];
FI FI FI
Exceptions
General protection exception if not aligned on 16-byte boundary, including unaligned reference within the stack segment.

Intel C/C++ Compiler Intrinsic Equivalent
__m128 _mm_max_ps(__m128 a, __m128 b)
Computes the maximums of the four SP FP values of a and b.

Exceptions
General protection exception if not aligned on 16-byte boundary, regardless of segment.

Numeric Exceptions
Invalid (including qNaN source operand), Denormal.

Protected Mode Exceptions
#GP(0)             For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)             For an illegal address in the SS segment.
#PF(fault-code)    For a page fault.
#UD                If CR0.EM = 1.
#NM                If TS bit in CR0 is set.
#XM                For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 1).
#UD                For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 0).
#UD                If CR4.OSFXSR(bit 9) = 0.
#UD                If CPUID.XMM(EDX bit 25) = 0.
Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#PF(fault-code)    For a page fault.

Comments
Note that if only one source is a NaN for these instructions, the Src2 operand (either NaN or real value) is written to the result; this differs from the behavior for other instructions as defined in Table 7-9 in Chapter 7, "Floating-Point Unit", of the Intel Architecture Software Developer's Manual, Volume 1, which is to always write the NaN to the result, regardless of which source operand contains the NaN. This approach for MAXPS allows compilers to use the MAXPS instruction for common C conditional constructs. If instead of this behavior, it is required that the NaN source operand be returned, the min/max functionality can be emulated using a sequence of instructions: comparison followed by AND, ANDN, and OR.
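The NaN and zero handling described for MAXPS can be modeled lane by lane. The following Python sketch is an illustration under stated assumptions: the names `max_lane` and `maxps` are hypothetical, Python lists stand in for XMM registers, and Python floats are double precision, so single-precision rounding and the sNaN/qNaN distinction are not modeled.

```python
import math


def max_lane(dest: float, src: float) -> float:
    """One MAXPS lane: keep DEST only when DEST > SRC, otherwise take SRC."""
    if math.isnan(dest) or math.isnan(src):
        return src                       # the Src2 operand is written on any NaN
    return dest if dest > src else src   # equal values (e.g. two zeroes) yield SRC


def maxps(xmm1, xmm2):
    """Apply max_lane to four lanes; the result replaces xmm1."""
    return [max_lane(d, s) for d, s in zip(xmm1, xmm2)]
```

The rule mirrors the C conditional `dest > src ? dest : src`, which appears to be the kind of construct the Comments section has in mind.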
MAXSS - Scalar Single-FP Maximum

Opcode         Instruction              Description
F3,0F,5F,/r    MAXSS xmm1, xmm2/m32     Return the maximum SP FP number between the lower SP FP numbers from XMM2/Mem and XMM1.
Description
The MAXSS instruction returns the maximum SP FP number from the lower SP FP numbers of XMM1 and XMM2/Mem; the upper three fields are passed through from xmm1. If the values being compared are both zeroes, source2 (xmm2/m32) will be returned. If source2 (xmm2/m32) is an sNaN, this sNaN is forwarded unchanged to the destination (i.e., a quieted version of the sNaN is not returned).
Figure 3-37. Operation of the MAXSS Instruction (MAXSS xmm1, xmm2/m32; low element shown: Xmm1 = 267.0, Xmm2/m32 = 107.3, result Xmm1 = 267.0)
Operation
IF (DEST[31-0] = NaN)
    THEN DEST[31-0] = SRC[31-0];
ELSE IF (SRC[31-0] = NaN)
    THEN DEST[31-0] = SRC[31-0];
ELSE IF (DEST[31-0] > SRC/m128[31-0])
    THEN DEST[31-0] = DEST[31-0];
    ELSE DEST[31-0] = SRC/m128[31-0];
FI FI FI
DEST[63-32] = DEST[63-32];
DEST[95-64] = DEST[95-64];
DEST[127-96] = DEST[127-96];

Intel C/C++ Compiler Intrinsic Equivalent
__m128 _mm_max_ss(__m128 a, __m128 b)
Computes the maximum of the lower SP FP values of a and b; the upper three SP FP values are passed through from a.

Exceptions
None.

Numeric Exceptions
Invalid (including qNaN source operand), Denormal.
Comments
Note that if only one source is a NaN for these instructions, the Src2 operand (either NaN or real value) is written to the result; this differs from the behavior for other instructions as defined in Table 7-9 in Chapter 7, "Floating-Point Unit", of the Intel Architecture Software Developer's Manual, Volume 1, which is to always write the NaN to the result, regardless of which source operand contains the NaN. The upper three operands are still bypassed from the src1 operand, as in all other scalar operations. This approach for MAXSS allows compilers to use the MAXSS instruction for common C conditional constructs. If instead of this behavior, it is required that the NaN source operand be returned, the min/max functionality can be emulated using a sequence of instructions: comparison followed by AND, ANDN, and OR.
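The scalar form differs from the packed form only in that the upper three fields pass through from the first operand. A minimal Python sketch, assuming a four-element list for xmm1 and a single float for the low element of xmm2/m32 (the name `maxss` is hypothetical, and single-precision rounding is not modeled):

```python
import math


def maxss(xmm1, xmm2_low):
    """Return the new xmm1: lane 0 is the maximum, lanes 1-3 come from xmm1."""
    dest, src = xmm1[0], xmm2_low
    if math.isnan(dest) or math.isnan(src) or not dest > src:
        low = src                        # Src2 wins on NaN, on equality, or when larger
    else:
        low = dest
    return [low] + list(xmm1[1:])        # upper three fields are passed through
```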
MINPS - Packed Single-FP Minimum

Opcode      Instruction               Description
0F,5D,/r    MINPS xmm1, xmm2/m128     Return the minimum SP FP numbers between XMM2/Mem and XMM1.
Description The MINPS instruction returns the minimum SP FP numbers from XMM1 and XMM2/Mem. If the values being compared are both zeroes, source2 (xmm2/m128) would be returned. If source2 (xmm2/m128) is an sNaN, this sNaN is forwarded unchanged to the destination (i.e., a quieted version of the sNaN is not returned).
MINPS xmm1, xmm2/m128
Xmm1 (source 1)    99.1     10.99    65.0    267.0
Xmm2/m128          519.0    8.7      38.9    107.3
Xmm1 (result)      99.1     8.7      38.9    107.3
Figure 3-38. Operation of the MINPS Instruction
Operation
IF (DEST[31-0] = NaN)
    THEN DEST[31-0] = SRC[31-0];
ELSE IF (SRC[31-0] = NaN)
    THEN DEST[31-0] = SRC[31-0];
ELSE IF (DEST[31-0] < SRC/m128[31-0])
    THEN DEST[31-0] = DEST[31-0];
    ELSE DEST[31-0] = SRC/m128[31-0];
FI FI FI
IF (DEST[63-32] = NaN)
    THEN DEST[63-32] = SRC[63-32];
ELSE IF (SRC[63-32] = NaN)
    THEN DEST[63-32] = SRC[63-32];
ELSE IF (DEST[63-32] < SRC/m128[63-32])
    THEN DEST[63-32] = DEST[63-32];
    ELSE DEST[63-32] = SRC/m128[63-32];
FI FI FI
IF (DEST[95-64] = NaN)
    THEN DEST[95-64] = SRC[95-64];
ELSE IF (SRC[95-64] = NaN)
    THEN DEST[95-64] = SRC[95-64];
ELSE IF (DEST[95-64] < SRC/m128[95-64])
    THEN DEST[95-64] = DEST[95-64];
    ELSE DEST[95-64] = SRC/m128[95-64];
FI FI FI
IF (DEST[127-96] = NaN)
    THEN DEST[127-96] = SRC[127-96];
ELSE IF (SRC[127-96] = NaN)
    THEN DEST[127-96] = SRC[127-96];
ELSE IF (DEST[127-96] < SRC/m128[127-96])
    THEN DEST[127-96] = DEST[127-96];
    ELSE DEST[127-96] = SRC/m128[127-96];
FI FI FI

Intel C/C++ Compiler Intrinsic Equivalent
__m128 _mm_min_ps(__m128 a, __m128 b)
Computes the minimums of the four SP FP values of a and b.
Exceptions
General protection exception if not aligned on 16-byte boundary, regardless of segment.

Numeric Exceptions
Invalid (including qNaN source operand), Denormal.

Protected Mode Exceptions
#GP(0)             For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)             For an illegal address in the SS segment.
#PF(fault-code)    For a page fault.
#UD                If CR0.EM = 1.
#NM                If TS bit in CR0 is set.
#XM                For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 1).
#UD                For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 0).
#UD                If CR4.OSFXSR(bit 9) = 0.
#UD                If CPUID.XMM(EDX bit 25) = 0.
Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#PF(fault-code)    For a page fault.

Comments
Note that if only one source is a NaN for these instructions, the Src2 operand (either NaN or real value) is written to the result; this differs from the behavior for other instructions as defined in Table 7-9 in Chapter 7, "Floating-Point Unit", of the Intel Architecture Software Developer's Manual, Volume 1, which is to always write the NaN to the result, regardless of which source operand contains the NaN. This approach for MINPS allows compilers to use the MINPS instruction for common C conditional constructs. If instead of this behavior, it is required that the NaN source operand be returned, the min/max functionality can be emulated using a sequence of instructions: comparison followed by AND, ANDN, and OR.
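The "comparison followed by AND, ANDN, and OR" idiom mentioned in the Comments can be illustrated on 32-bit patterns. The Python sketch below is an assumption-laden illustration, not the sequence Intel prescribes: the helper names are hypothetical, one lane is handled at a time, and the predicate `a < b` is chosen so that the result matches MINPS itself. Returning the NaN operand instead would need a different comparison predicate, which is not shown here.

```python
import struct


def f32_bits(x: float) -> int:
    """Bit pattern of x as an IEEE single-precision value."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    """Single-precision value for a 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def select_min(a: float, b: float) -> float:
    """Branch-free style minimum: compare, then AND / ANDN / OR on bit patterns."""
    mask = 0xFFFFFFFF if a < b else 0            # compare yields all ones or all zeroes
    keep_a = f32_bits(a) & mask                  # AND: a survives where the mask is set
    keep_b = f32_bits(b) & ~mask & 0xFFFFFFFF    # ANDN: b survives where it is clear
    return bits_f32(keep_a | keep_b)             # OR merges the two halves
```

Because any comparison with a NaN is false, this particular predicate selects `b` whenever either input is a NaN, the same outcome the Comments describe for the Src2 operand.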
MINSS - Scalar Single-FP Minimum

Opcode         Instruction              Description
F3,0F,5D,/r    MINSS xmm1, xmm2/m32     Return the minimum SP FP number between the lowest SP FP numbers from XMM2/Mem and XMM1.
Description
The MINSS instruction returns the minimum SP FP number from the lower SP FP numbers of XMM1 and XMM2/Mem; the upper three fields are passed through from xmm1. If the values being compared are both zeroes, source2 (xmm2/m32) will be returned. If source2 (xmm2/m32) is an sNaN, this sNaN is forwarded unchanged to the destination (i.e., a quieted version of the sNaN is not returned).
Figure 3-39. Operation of the MINSS Instruction
Operation
IF (DEST[31-0] = NaN)
    THEN DEST[31-0] = SRC[31-0];
ELSE IF (SRC[31-0] = NaN)
    THEN DEST[31-0] = SRC[31-0];
ELSE IF (DEST[31-0] < SRC/m128[31-0])
    THEN DEST[31-0] = DEST[31-0];
    ELSE DEST[31-0] = SRC/m128[31-0];
FI FI FI
DEST[63-32] = DEST[63-32];
DEST[95-64] = DEST[95-64];
DEST[127-96] = DEST[127-96];

Intel C/C++ Compiler Intrinsic Equivalent
__m128 _mm_min_ss(__m128 a, __m128 b)
Computes the minimum of the lower SP FP values of a and b; the upper three SP FP values are passed through from a.

Exceptions
None.

Numeric Exceptions
Invalid (including qNaN source operand), Denormal.
Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#PF (fault-code)    For a page fault.
#AC                 For unaligned memory references.

Comments
Note that if only one source is a NaN for these instructions, the Src2 operand (either NaN or real value) is written to the result; this differs from the behavior for other instructions as defined in Table 7-9 in Chapter 7, "Floating-Point Unit", of the Intel Architecture Software Developer's Manual, Volume 1, which is to always write the NaN to the result, regardless of which source operand contains the NaN. The upper three operands are still bypassed from the src1 operand, as in all other scalar operations. This approach for MINSS allows compilers to use the MINSS instruction for common C conditional constructs. If instead of this behavior, it is required that the NaN source operand be returned, the min/max functionality can be emulated using a sequence of instructions: comparison followed by AND, ANDN, and OR.
MOV - Move
Opcode     Instruction          Description
88 /r      MOV r/m8,r8          Move r8 to r/m8
89 /r      MOV r/m16,r16        Move r16 to r/m16
89 /r      MOV r/m32,r32        Move r32 to r/m32
8A /r      MOV r8,r/m8          Move r/m8 to r8
8B /r      MOV r16,r/m16        Move r/m16 to r16
8B /r      MOV r32,r/m32        Move r/m32 to r32
8C /r      MOV r/m16,Sreg**     Move segment register to r/m16
8E /r      MOV Sreg,r/m16**     Move r/m16 to segment register
A0         MOV AL,moffs8*       Move byte at (seg:offset) to AL
A1         MOV AX,moffs16*      Move word at (seg:offset) to AX
A1         MOV EAX,moffs32*     Move doubleword at (seg:offset) to EAX
A2         MOV moffs8*,AL       Move AL to (seg:offset)
A3         MOV moffs16*,AX      Move AX to (seg:offset)
A3         MOV moffs32*,EAX     Move EAX to (seg:offset)
B0+ rb     MOV r8,imm8          Move imm8 to r8
B8+ rw     MOV r16,imm16        Move imm16 to r16
B8+ rd     MOV r32,imm32        Move imm32 to r32
C6 /0      MOV r/m8,imm8        Move imm8 to r/m8
C7 /0      MOV r/m16,imm16      Move imm16 to r/m16
C7 /0      MOV r/m32,imm32      Move imm32 to r/m32

NOTES:
*  The moffs8, moffs16, and moffs32 operands specify a simple offset relative to the segment base, where 8, 16, and 32 refer to the size of the data. The address-size attribute of the instruction determines the size of the offset, either 16 or 32 bits.
** In 32-bit mode, the assembler may insert the 16-bit operand-size prefix with this instruction (refer to the following "Description" section for further information).
Description
This instruction copies the second operand (source operand) to the first operand (destination operand). The source operand can be an immediate value, general-purpose register, segment register, or memory location; the destination operand can be a general-purpose register, segment register, or memory location. Both operands must be the same size, which can be a byte, a word, or a doubleword.

The MOV instruction cannot be used to load the CS register. Attempting to do so results in an invalid opcode exception (#UD). To load the CS register, use the far JMP, CALL, or RET instruction.
If the destination operand is a segment register (DS, ES, FS, GS, or SS), the source operand must be a valid segment selector. In protected mode, moving a segment selector into a segment register automatically causes the segment descriptor information associated with that segment selector to be loaded into the hidden (shadow) part of the segment register. While loading this information, the segment selector and segment descriptor information is validated (refer to the "Operation" algorithm below). The segment descriptor data is obtained from the GDT or LDT entry for the specified segment selector.

A null segment selector (values 0000-0003) can be loaded into the DS, ES, FS, and GS registers without causing a protection exception. However, any subsequent attempt to reference a segment whose corresponding segment register is loaded with a null value causes a general protection exception (#GP) and no memory reference occurs.

Loading the SS register with a MOV instruction inhibits all interrupts until after the execution of the next instruction. This operation allows a stack pointer to be loaded into the ESP register with the next instruction (MOV ESP, stack-pointer value) before an interrupt occurs (refer to footnote 1). The LSS instruction offers a more efficient method of loading the SS and ESP registers.

When operating in 32-bit mode and moving data between a segment register and a general-purpose register, the Intel Architecture 32-bit family of processors do not require the use of the 16-bit operand-size prefix (a byte with the value 66H) with this instruction, but most assemblers will insert it if the typical form of the instruction is used (for example, MOV DS, AX). The processor will execute this instruction correctly, but it will usually require an extra clock. With most assemblers, using the instruction form MOV DS, EAX will avoid this unneeded 66H prefix.

When the processor executes the instruction with a 32-bit general-purpose register, it assumes that the 16 least-significant bits of the general-purpose register are the destination or source operand. If the register is a destination operand, the resulting value in the two high-order bytes of the register is implementation dependent. For the Pentium Pro processor, the two high-order bytes are filled with zeroes; for earlier 32-bit Intel Architecture processors, the two high-order bytes are undefined.
1. Note that in a sequence of instructions that individually delay interrupts past the following instruction, only the first instruction in the sequence is guaranteed to delay the interrupt, but subsequent interrupt-delaying instructions may not delay the interrupt. Thus, in the following instruction sequence: STI MOV SS, EAX MOV ESP, EBP interrupts may be recognized before MOV ESP, EBP executes, because STI also delays interrupts for one instruction.
Operation
DEST ← SRC;
Loading a segment register while in protected mode results in special checks and actions, as described in the following listing. These checks are performed on the segment selector and the segment descriptor it points to.
IF SS is loaded;
    THEN
        IF segment selector is null
            THEN #GP(0);
        FI;
        IF segment selector index is outside descriptor table limits
        OR segment selector's RPL ≠ CPL
        OR segment is not a writable data segment
        OR DPL ≠ CPL
            THEN #GP(selector);
        FI;
        IF segment not marked present
            THEN #SS(selector);
            ELSE
                SS ← segment selector;
                SS ← segment descriptor;
        FI;
FI;
IF DS, ES, FS or GS is loaded with non-null selector;
    THEN
        IF segment selector index is outside descriptor table limits
        OR segment is not a data or readable code segment
        OR ((segment is a data or nonconforming code segment)
            AND (both RPL and CPL > DPL))
            THEN #GP(selector);
        FI;
        IF segment not marked present
            THEN #NP(selector);
            ELSE
                SegmentRegister ← segment selector;
                SegmentRegister ← segment descriptor;
        FI;
FI;
IF DS, ES, FS or GS is loaded with a null selector;
    THEN
        SegmentRegister ← segment selector;
        SegmentRegister ← segment descriptor;
FI;
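The null-selector test used by the checks above can be illustrated with a small, portable C model. This is only a sketch: the names selector_fields, decode_selector, selector_is_null, and null_load_faults are hypothetical and do not come from the manual, and the code models the selector layout (index in bits 15..3, table indicator in bit 2, RPL in bits 1..0) rather than executing any MOV instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of a 16-bit segment selector (illustrative model only). */
typedef struct {
    unsigned index; /* bits 15..3: entry number in the descriptor table */
    unsigned ti;    /* bit 2: table indicator, 0 = GDT, 1 = LDT */
    unsigned rpl;   /* bits 1..0: requested privilege level */
} selector_fields;

selector_fields decode_selector(uint16_t sel)
{
    selector_fields f;
    f.index = (unsigned)sel >> 3;
    f.ti = ((unsigned)sel >> 2) & 1u;
    f.rpl = (unsigned)sel & 3u;
    return f;
}

/* A selector is null when it names entry 0 of the GDT, whatever its RPL:
   these are exactly the values 0000H through 0003H. */
bool selector_is_null(uint16_t sel)
{
    return (sel & 0xFFFCu) == 0;
}

/* Hypothetical helper: does the load itself fault because the selector is
   null? Per the listing above, only an SS load rejects a null selector. */
bool null_load_faults(bool dest_is_ss, uint16_t sel)
{
    return dest_is_ss && selector_is_null(sel);
}
```

For example, selector_is_null(0x0003) is true while selector_is_null(0x0004) is false, because 0004H selects entry 0 of the LDT rather than of the GDT.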
Flags Affected
None.

Protected Mode Exceptions
#GP(0)             If attempt is made to load SS register with null segment selector.
                   If the destination operand is in a nonwritable segment.
                   If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                   If the DS, ES, FS, or GS register contains a null segment selector.
#GP(selector)      If segment selector index is outside descriptor table limits.
                   If the SS register is being loaded and the segment selector's RPL and the segment descriptor's DPL are not equal to the CPL.
                   If the SS register is being loaded and the segment pointed to is a nonwritable data segment.
                   If the DS, ES, FS, or GS register is being loaded and the segment pointed to is not a data or readable code segment.
                   If the DS, ES, FS, or GS register is being loaded and the segment pointed to is a data or nonconforming code segment, but both the RPL and the CPL are greater than the DPL.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#SS(selector)      If the SS register is being loaded and the segment pointed to is marked not present.
#NP                If the DS, ES, FS, or GS register is being loaded and the segment pointed to is marked not present.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
#UD                If attempt is made to load the CS register.

Real-Address Mode Exceptions
#GP                If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS                If a memory operand effective address is outside the SS segment limit.
#UD                If attempt is made to load the CS register.
Virtual-8086 Mode Exceptions
#GP(0)             If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made.
#UD                If attempt is made to load the CS register.
MOV - Move to/from Control Registers

Opcode      Instruction     Description
0F 22 /r    MOV CR0,r32     Move r32 to CR0
0F 22 /r    MOV CR2,r32     Move r32 to CR2
0F 22 /r    MOV CR3,r32     Move r32 to CR3
0F 22 /r    MOV CR4,r32     Move r32 to CR4
0F 20 /r    MOV r32,CR0     Move CR0 to r32
0F 20 /r    MOV r32,CR2     Move CR2 to r32
0F 20 /r    MOV r32,CR3     Move CR3 to r32
0F 20 /r    MOV r32,CR4     Move CR4 to r32
Description
This instruction moves the contents of a control register (CR0, CR2, CR3, or CR4) to a general-purpose register or vice versa. The operand size for these instructions is always 32 bits, regardless of the operand-size attribute. Refer to Section 2.5., Control Registers in Chapter 2, System Architecture Overview of the Intel Architecture Software Developer's Manual, Volume 3, for a detailed description of the flags and fields in the control registers.

When loading a control register, a program should not attempt to change any of the reserved bits; that is, always set reserved bits to the value previously read.

At the opcode level, the reg field within the ModR/M byte specifies which of the control registers is loaded or read. The two bits in the mod field are always 11B. The r/m field specifies the general-purpose register loaded or read.

These instructions have the following side effects:
- When writing to control register CR3, all non-global TLB entries are flushed. Refer to Section 3.7., Translation Lookaside Buffers (TLBs) in Chapter 3, Protected-Mode Memory Management of the Intel Architecture Software Developer's Manual, Volume 3, for more information about the TLBs.
The following side effects are implementation specific for the Pentium Pro processors. Software should not depend on this functionality in future Intel Architecture processors:
- When modifying any of the paging flags in the control registers (PE and PG in register CR0 and PGE, PSE, and PAE in register CR4), all TLB entries are flushed, including global entries.
- If the PG flag is set to 1 and control register CR4 is written to set the PAE flag to 1 (to enable the physical address extension mode), the pointers (PDPTRs) in the page-directory pointers table will be loaded into the processor (into internal, non-architectural registers).
- If the PAE flag is set to 1 and the PG flag set to 1, writing to control register CR3 will cause the PDPTRs to be reloaded into the processor.
- If the PAE flag is set to 1 and control register CR0 is written to set the PG flag, the PDPTRs are reloaded into the processor.
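The opcode-level layout given in the Description (mod always 11B, reg selecting the control register, r/m selecting the general-purpose register) can be sketched as a small encoder. The function name encode_mov_cr and its argument convention are hypothetical and used only for illustration; the general-purpose register numbers (EAX=0 through EDI=7) are the standard ModR/M register encodings.

```c
#include <stdint.h>

/* Encode MOV between a control register and a 32-bit general-purpose
   register (illustrative sketch).
   to_cr != 0: MOV CRn, r32 (0F 22 /r); to_cr == 0: MOV r32, CRn (0F 20 /r).
   gpr: EAX=0, ECX=1, EDX=2, EBX=3, ESP=4, EBP=5, ESI=6, EDI=7.
   Returns 0 on success, -1 for a register number this form does not list. */
int encode_mov_cr(int to_cr, unsigned cr, unsigned gpr, uint8_t out[3])
{
    if (!(cr == 0 || cr == 2 || cr == 3 || cr == 4) || gpr > 7)
        return -1;
    out[0] = 0x0F;
    out[1] = to_cr ? 0x22 : 0x20;
    /* ModR/M: mod = 11B, reg = control register, r/m = general register */
    out[2] = (uint8_t)(0xC0u | (cr << 3) | gpr);
    return 0;
}
```

With this model, MOV CR3, EAX encodes as the bytes 0F 22 D8 and MOV EAX, CR0 as 0F 20 C0.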
Operation
DEST ← SRC;
Flags Affected
The OF, SF, ZF, AF, PF, and CF flags are undefined.

Protected Mode Exceptions
#GP(0)    If the current privilege level is not 0.
          If an attempt is made to write invalid bit combinations in CR0 (such as setting the PG flag to 1 when the PE flag is set to 0, or setting the CD flag to 0 when the NW flag is set to 1).
          If an attempt is made to write a 1 to any reserved bit in CR4.
          If an attempt is made to write reserved bits in the page-directory pointers table (used in the extended physical addressing mode) when the PAE flag in control register CR4 and the PG flag in control register CR0 are set to 1.

Real-Address Mode Exceptions
#GP       If an attempt is made to write a 1 to any reserved bit in CR4.

Virtual-8086 Mode Exceptions
#GP(0)    These instructions cannot be executed in virtual-8086 mode.
MOV - Move to/from Debug Registers

Opcode      Instruction          Description
0F 21 /r    MOV r32, DR0-DR7     Move debug register to r32
0F 23 /r    MOV DR0-DR7,r32      Move r32 to debug register
Description
This instruction moves the contents of a debug register (DR0, DR1, DR2, DR3, DR4, DR5, DR6, or DR7) to a general-purpose register or vice versa. The operand size for these instructions is always 32 bits, regardless of the operand-size attribute. Refer to Chapter 15, Debugging and Performance Monitoring of the Intel Architecture Software Developer's Manual, Volume 3, for a detailed description of the flags and fields in the debug registers.

The instructions must be executed at privilege level 0 or in real-address mode.

When the debug extension (DE) flag in register CR4 is clear, these instructions operate on debug registers in a manner that is compatible with Intel386 and Intel486 processors. In this mode, references to DR4 and DR5 refer to DR6 and DR7, respectively. When the DE flag in CR4 is set, attempts to reference DR4 and DR5 result in an undefined opcode (#UD) exception. (The CR4 register was added to the Intel Architecture beginning with the Pentium processor.)

At the opcode level, the reg field within the ModR/M byte specifies which of the debug registers is loaded or read. The two bits in the mod field are always 11B. The r/m field specifies the general-purpose register loaded or read.

Operation
IF ((DE = 1) and (SRC or DEST = DR4 or DR5))
    THEN
        #UD;
    ELSE
        DEST ← SRC;
FI;
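The DR4/DR5 behavior described above (aliasing to DR6/DR7 when DE is clear, #UD when DE is set) can be modeled in a few lines of C. The function name resolve_debug_register and the use of -1 to stand for a #UD exception are assumptions made for this sketch only.

```c
/* Returns the debug register actually accessed (0..7), or -1 where the
   processor would raise #UD. de is the DE flag of CR4 (0 or 1).
   Illustrative model only. */
int resolve_debug_register(unsigned dr, int de)
{
    if (dr > 7)
        return -1;            /* outside the 3-bit reg field; rejected by this model */
    if (dr == 4 || dr == 5) {
        if (de)
            return -1;        /* DE = 1: references to DR4 and DR5 raise #UD */
        return (int)dr + 2;   /* DE = 0: DR4 refers to DR6, DR5 refers to DR7 */
    }
    return (int)dr;           /* all other debug registers are accessed directly */
}
```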
Flags Affected
The OF, SF, ZF, AF, PF, and CF flags are undefined.

Protected Mode Exceptions
#GP(0)    If the current privilege level is not 0.
#UD       If the DE (debug extensions) bit of CR4 is set and a MOV instruction is executed involving DR4 or DR5.
#DB       If any debug register is accessed while the GD flag in debug register DR7 is set.
Real-Address Mode Exceptions
#UD       If the DE (debug extensions) bit of CR4 is set and a MOV instruction is executed involving DR4 or DR5.
#DB       If any debug register is accessed while the GD flag in debug register DR7 is set.

Virtual-8086 Mode Exceptions
#GP(0)    The debug registers cannot be loaded or read when in virtual-8086 mode.
MOVAPS - Move Aligned Four Packed Single-FP

Opcode      Instruction                Description
0F,28,/r    MOVAPS xmm1, xmm2/m128     Move 128 bits representing four packed SP data from XMM2/Mem to XMM1 register.
0F,29,/r    MOVAPS xmm2/m128, xmm1     Move 128 bits representing four packed SP from XMM1 register to XMM2/Mem.

Description
The linear address corresponds to the address of the least-significant byte of the referenced memory data. When a memory address is indicated, the 16 bytes of data at memory location m128 are loaded or stored. When the register-register form of this operation is used, the content of the 128-bit source register is copied into the 128-bit destination register.

Figure 3-40. Operation of the MOVAPS Instruction
Operation
IF (destination = DEST)
    THEN
        IF (SRC = m128)
            THEN (* load instruction *)
                DEST[127-0] = m128;
            ELSE (* move instruction *)
                DEST[127-0] = SRC[127-0];
        FI;
    ELSE
        IF (destination = m128)
            THEN (* store instruction *)
                m128 = SRC[127-0];
            ELSE (* move instruction *)
                DEST[127-0] = SRC[127-0];
        FI;
FI;

__m128 _mm_load_ps(float * p)
Loads four SP FP values. The address must be 16-byte-aligned.

void _mm_store_ps(float *p, __m128 a)
Stores four SP FP values. The address must be 16-byte-aligned.

Exceptions
General protection exception if not aligned on 16-byte boundary, regardless of segment.

Numeric Exceptions
None.
Protected Mode Exceptions
#GP(0)             For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)             For an illegal address in the SS segment.
#PF(fault-code)    For a page fault.
#UD                If CR0.EM = 1.
#NM                If TS bit in CR0 is set.
#UD                If CR4.OSFXSR(bit 9) = 0.
#UD                If CPUID.XMM(EDX bit 25) = 0.

Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#PF(fault-code)    For a page fault.

Comments
MOVAPS should be used when dealing with 16-byte aligned SP FP numbers. If the data is not known to be aligned, MOVUPS should be used instead of MOVAPS. The usage of this instruction should be limited to the cases where the alignment restriction is easy to meet. Processors that support Streaming SIMD Extensions will provide optimal aligned performance for the MOVAPS instruction.

The usage of Repeat Prefix (F3H) with MOVAPS is reserved. Different processor implementations may handle this prefix differently. Usage of this prefix with MOVAPS risks incompatibility with future processors.
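The 16-byte alignment rule that separates MOVAPS from MOVUPS can be checked on an address before a copy routine commits to one instruction or the other. The sketch below is a portable C model; the names is_aligned_16 and pick_move are hypothetical helpers, and the function only returns a mnemonic string rather than issuing any instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when a 16-byte access at this linear address meets the MOVAPS
   alignment rule, that is, the low four address bits are all zero. */
bool is_aligned_16(uintptr_t linear_address)
{
    return (linear_address & 0xFu) == 0;
}

/* Hypothetical dispatcher: names the instruction a copy routine could use
   for data at the given address. */
const char *pick_move(uintptr_t linear_address)
{
    return is_aligned_16(linear_address) ? "MOVAPS" : "MOVUPS";
}
```

An address such as 1000H selects MOVAPS in this model, while 1008H selects MOVUPS because it is only 8-byte aligned.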
MOVD - Move 32 Bits

Opcode      Instruction         Description
0F 6E /r    MOVD mm, r/m32      Move doubleword from r/m32 to mm.
0F 7E /r    MOVD r/m32, mm      Move doubleword from mm to r/m32.

Description
This instruction copies a doubleword from the source operand (second operand) to the destination operand (first operand). Source and destination operands can be MMX technology registers, memory locations, or 32-bit general-purpose registers; however, data cannot be transferred from an MMX technology register to another MMX technology register, from one memory location to another memory location, or from one general-purpose register to another general-purpose register.

When the destination operand is an MMX technology register, the 32-bit source value is written to the low-order 32 bits of the 64-bit MMX technology register and zero-extended to 64 bits (refer to Figure 3-41). When the source operand is an MMX technology register, the low-order 32 bits of the MMX technology register are written to the 32-bit general-purpose register or 32-bit memory location selected with the destination operand.

Figure 3-41. Operation of the MOVD Instruction
Operation
IF DEST is MMX technology register
    THEN
        DEST ← ZeroExtend(SRC);
    ELSE (* SRC is MMX technology register *)
        DEST ← LowOrderDoubleword(SRC);
FI;
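The two directions of the operation above can be modeled with plain 64-bit and 32-bit integers. This is a sketch only: movd_to_mm and movd_from_mm are hypothetical names, and a uint64_t stands in for the MMX technology register.

```c
#include <stdint.h>

/* MOVD mm, r/m32: the doubleword lands in bits 31..0 and
   bits 63..32 of the register become zero (ZeroExtend). */
uint64_t movd_to_mm(uint32_t src)
{
    return (uint64_t)src;
}

/* MOVD r/m32, mm: only the low-order doubleword of the register
   is written to the destination (LowOrderDoubleword). */
uint32_t movd_from_mm(uint64_t mm)
{
    return (uint32_t)(mm & 0xFFFFFFFFu);
}
```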
Flags Affected
None.

Protected Mode Exceptions
#GP(0)             If the destination operand is in a nonwritable segment.
                   If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#UD                If EM in CR0 is set.
#NM                If TS in CR0 is set.
#MF                If there is a pending FPU exception.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real-Address Mode Exceptions
#GP                If any part of the operand lies outside of the effective address space from 0 to FFFFH.
#UD                If EM in CR0 is set.
#NM                If TS in CR0 is set.
#MF                If there is a pending FPU exception.
Virtual-8086 Mode Exceptions
#GP                If any part of the operand lies outside of the effective address space from 0 to FFFFH.
#UD                If EM in CR0 is set.
#NM                If TS in CR0 is set.
#MF                If there is a pending FPU exception.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made.
MOVHLPS - High to Low Packed Single-FP

Opcode      Instruction              Description
0F,12,/r    MOVHLPS xmm1, xmm2       Move 64 bits representing higher two SP operands from xmm2 to lower two fields of xmm1 register.

Description
The upper 64 bits of the source register xmm2 are loaded into the lower 64 bits of the 128-bit register xmm1, and the upper 64 bits of xmm1 are left unchanged.

Figure 3-42. Operation of the MOVHLPS Instruction
Operation
DEST[127-64] = DEST[127-64]; DEST[63-0] = SRC[127-64];

__m128 _mm_movehl_ps(__m128 a, __m128 b)
Moves the upper 2 SP FP values of b to the lower 2 SP FP values of the result. The upper 2 SP FP values of a are passed through to the result.
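The data movement can be pictured with a plain C model that treats an XMM register as four single-precision fields. This is a sketch under stated assumptions: the type xmm_t and the function movhlps are hypothetical stand-ins, not the intrinsic itself, and f[0] is taken to be bits 31..0 of the register.

```c
/* A 128-bit XMM register viewed as four single-precision fields;
   f[0] is bits 31..0 and f[3] is bits 127..96 (illustrative model). */
typedef struct { float f[4]; } xmm_t;

/* MOVHLPS dest, src: the upper two fields of src are copied into the
   lower two fields of dest. */
xmm_t movhlps(xmm_t dest, xmm_t src)
{
    dest.f[0] = src.f[2];
    dest.f[1] = src.f[3];
    /* dest.f[2] and dest.f[3] are left unchanged */
    return dest;
}
```

Calling movhlps(a, b) in this model mirrors the argument order of _mm_movehl_ps(a, b): the upper half of the result comes from a and the lower half from the upper half of b.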
Exceptions
None.

Numeric Exceptions
None.

Protected Mode Exceptions
#UD    If CR0.EM = 1.
#NM    If TS bit in CR0 is set.
#UD    If CR4.OSFXSR(bit 9) = 0.
#UD    If CPUID.XMM(EDX bit 25) = 0.

Real Address Mode Exceptions
#UD    If CR0.EM = 1.
#NM    If TS bit in CR0 is set.
#UD    If CR4.OSFXSR(bit 9) = 0.
#UD    If CPUID.XMM(EDX bit 25) = 0.

Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.

Comments
The usage of Repeat (F2H, F3H) and Operand Size (66H) prefixes with MOVHLPS is reserved. Different processor implementations may handle these prefixes differently. Usage of these prefixes with MOVHLPS risks incompatibility with future processors.
MOVHPS - Move High Packed Single-FP

Opcode      Instruction          Description
0F,16,/r    MOVHPS xmm, m64      Move 64 bits representing two SP operands from Mem to upper two fields of XMM register.
0F,17,/r    MOVHPS m64, xmm      Move 64 bits representing two SP operands from upper two fields of XMM register to Mem.

Description
The linear address corresponds to the address of the least-significant byte of the referenced memory data. When the load form of this operation is used, m64 is loaded into the upper 64 bits of the 128-bit register xmm, and the lower 64 bits are left unchanged.

Figure 3-43. Operation of the MOVHPS Instruction
Operation
IF (destination = DEST)
    THEN (* load instruction *)
        DEST[127-64] = m64;
        DEST[31-0] = DEST[31-0];
        DEST[63-32] = DEST[63-32];
    ELSE (* store instruction *)
        m64 = SRC[127-64];
FI;
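The load and store forms listed above can be modeled in portable C. This is only a sketch: xmm_t, movhps_load, and movhps_store are hypothetical names, the model assumes a 4-byte float, and f[0] stands for bits 31..0 of the register.

```c
#include <string.h>

/* XMM register model: f[0] = bits 31..0, f[3] = bits 127..96. */
typedef struct { float f[4]; } xmm_t;

/* MOVHPS xmm, m64: load 8 bytes from memory into the upper two fields. */
xmm_t movhps_load(xmm_t dest, const void *m64)
{
    memcpy(&dest.f[2], m64, 2 * sizeof(float));
    return dest;              /* f[0] and f[1] are left unchanged */
}

/* MOVHPS m64, xmm: store the upper two fields to 8 bytes of memory. */
void movhps_store(void *m64, xmm_t src)
{
    memcpy(m64, &src.f[2], 2 * sizeof(float));
}
```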

__m128 _mm_loadh_pi(__m128 a, __m64 * p)
Sets the upper two SP FP values with 64 bits of data loaded from the address p; the lower two values are passed through from a.
void _mm_storeh_pi(__m64 * p, __m128 a)
Stores the upper two SP FP values of a to the address p.

Exceptions
None.

Numeric Exceptions
None.

Protected Mode Exceptions
#GP(0)              For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)              For an illegal address in the SS segment.
#PF (fault-code)    For a page fault.
#UD                 If CR0.EM = 1.
#NM                 If TS bit in CR0 is set.
#AC                 For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).
#UD                 If CR4.OSFXSR(bit 9) = 0.
#UD                 If CPUID.XMM(EDX bit 25) = 0.
Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#PF (fault-code)    For a page fault.
#AC                 For unaligned memory reference if the current privilege level is 3.

Comments
The usage of Repeat Prefixes (F2H, F3H) with MOVHPS is reserved. Different processor implementations may handle these prefixes differently. Usage of these prefixes with MOVHPS risks incompatibility with future processors.
MOVLHPS - Move Low to High Packed Single-FP

Opcode      Instruction              Description
0F,16,/r    MOVLHPS xmm1, xmm2       Move 64 bits representing lower two SP operands from xmm2 to upper two fields of xmm1 register.

Description
The lower 64 bits of the source register xmm2 are loaded into the upper 64 bits of the 128-bit register xmm1, and the lower 64 bits of xmm1 are left unchanged.

Figure 3-44. Operation of the MOVLHPS Instruction
Operation
DEST[127-64] = SRC[63-0]; DEST[63-0] = DEST[63-0];
__m128 _mm_movelh_ps (__m128 a, __m128 b)
Moves the lower 2 SP FP values of b to the upper 2 SP FP values of the result. The lower 2 SP FP values of a are passed through to the result.

Exceptions
None.

Numeric Exceptions
None.

Protected Mode Exceptions
#UD    If CR0.EM = 1.
#NM    If TS bit in CR0 is set.
#UD    If CR4.OSFXSR(bit 9) = 0.
#UD    If CPUID.XMM(EDX bit 25) = 0.

Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.

Comments
The usage of Repeat (F2H, F3H) and Operand Size (66H) prefixes with MOVLHPS is reserved. Different processor implementations may handle these prefixes differently. Usage of these prefixes with MOVLHPS risks incompatibility with future processors.
MOVLPS - Move Low Packed Single-FP

Opcode      Instruction          Description
0F,12,/r    MOVLPS xmm, m64      Move 64 bits representing two SP operands from Mem to lower two fields of XMM register.
0F,13,/r    MOVLPS m64, xmm      Move 64 bits representing two SP operands from lower two fields of XMM register to Mem.

Description
The linear address corresponds to the address of the least-significant byte of the referenced memory data. When the load form of this operation is used, m64 is loaded into the lower 64 bits of the 128-bit register xmm, and the upper 64 bits are left unchanged.

Figure 3-45. Operation of the MOVLPS Instruction
Operation
IF (destination = DEST)
    THEN (* load instruction *)
        DEST[63-0] = m64;
        DEST[95-64] = DEST[95-64];
        DEST[127-96] = DEST[127-96];
    ELSE (* store instruction *)
        m64 = SRC[63-0];
FI;

__m128 _mm_loadl_pi(__m128 a, __m64 *p)
Sets the lower two SP FP values with 64 bits of data loaded from the address p; the upper two values are passed through from a.
void _mm_storel_pi(__m64 * p, __m128 a)
Stores the lower two SP FP values of a to the address p.

Exceptions
None.

Numeric Exceptions
None.

Protected Mode Exceptions
#GP(0)              For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)              For an illegal address in the SS segment.
#PF (fault-code)    For a page fault.
#UD                 If CR0.EM = 1.
#NM                 If TS bit in CR0 is set.
#AC                 For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).
#UD                 If CR4.OSFXSR(bit 9) = 0.
#UD                 If CPUID.XMM(EDX bit 25) = 0.
Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#PF (fault-code)    For a page fault.
#AC                 For unaligned memory reference if the current privilege level is 3.

Comments
The usage of Repeat Prefix (F3H) with MOVLPS is reserved. Different processor implementations may handle this prefix differently. Usage of this prefix with MOVLPS risks incompatibility with future processors.
MOVMSKPS - Move Mask To Integer

Opcode      Instruction            Description
0F,50,/r    MOVMSKPS r32, xmm      Move the single mask to r32.

Description
The MOVMSKPS instruction returns to the integer register r32 a 4-bit mask formed of the most significant bits of each SP FP number of its operand.

Figure 3-46. Operation of the MOVMSKPS Instruction
Operation
r32[0] = SRC[31];
r32[1] = SRC[63];
r32[2] = SRC[95];
r32[3] = SRC[127];
r32[7-4] = 0X0;
r32[15-8] = 0X00;
r32[31-16] = 0X0000;
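The bit assignments above amount to collecting the four sign bits into the low nibble of the result. The following portable C model is a sketch only: movmskps here is a hypothetical function operating on an array of four floats (v[0] standing for bits 31..0 of the register), and it assumes that float is a 32-bit IEEE 754 value.

```c
#include <stdint.h>
#include <string.h>

/* Model of MOVMSKPS: bit i of the result is the most significant (sign)
   bit of field i. v[0] stands for bits 31..0 of the XMM register and
   v[3] for bits 127..96. Assumes a 32-bit IEEE 754 float. */
uint32_t movmskps(const float v[4])
{
    uint32_t mask = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t bits;
        memcpy(&bits, &v[i], sizeof bits);  /* read the raw bit pattern */
        mask |= (bits >> 31) << i;          /* move the sign bit to position i */
    }
    return mask;                            /* bits 31..4 are always zero */
}
```

A typical use is testing the result of a packed comparison: a return value of 0 means no field had its top bit set, and 0FH means all four did.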
int _mm_movemask_ps(__m128 a)
Creates a 4-bit mask from the most significant bits of the four SP FP values.

Exceptions
None.

Numeric Exceptions
None.

Protected Mode Exceptions
#UD    If CR0.EM = 1.
#NM    If TS bit in CR0 is set.
#MF    If there is a pending FPU exception.
#UD    If CR4.OSFXSR(bit 9) = 0.
#UD    If CPUID.XMM(EDX bit 25) = 0.

Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.

Comments
The usage of Repeat Prefix (F3H) with MOVMSKPS is reserved. Different processor implementations may handle this prefix differently. Usage of this prefix with MOVMSKPS risks incompatibility with future processors.
MOVNTPS - Move Aligned Four Packed Single-FP Non Temporal

Opcode       Instruction           Description
0F,2B,/r     MOVNTPS m128, xmm     Move 128 bits representing four packed SP FP data from XMM register to Mem, minimizing pollution in the cache hierarchy.

Description
The linear address corresponds to the address of the least-significant byte of the referenced memory data. This store instruction minimizes cache pollution.

Operation
m128 = SRC;

void _mm_stream_ps(float * p, __m128 a)
Stores the data in a to the address p without polluting the caches. The address must be 16-byte-aligned.

Exceptions
General protection exception if not aligned on 16-byte boundary, regardless of segment.

Numeric Exceptions
None.

Protected Mode Exceptions
#GP(0)             For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)             For an illegal address in the SS segment.
#PF(fault-code)    For a page fault.
#UD                If CR0.EM = 1.
#NM                If TS bit in CR0 is set.
#UD                If CR4.OSFXSR(bit 9) = 0.
#UD                If CPUID.XMM(EDX bit 25) = 0.
Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#PF(fault-code)    For a page fault.

Comments
MOVNTPS should be used when dealing with 16-byte aligned single-precision FP numbers. MOVNTPS minimizes pollution in the cache hierarchy. As a consequence of the resulting weakly-ordered memory consistency model, a fencing operation should be used if multiple processors may use different memory types to read/write the memory location. Refer to Section 9.3.9., Cacheability Control Instructions in Chapter 9, Programming with the Streaming SIMD Extensions of the Intel Architecture Software Developer's Manual, Volume 1, for further information about non-temporal stores.

The usage of Repeat Prefix (F3H) with MOVNTPS is reserved. Different processor implementations may handle this prefix differently. Usage of this prefix with MOVNTPS risks incompatibility with future processors.
MOVNTQ - Move 64 Bits Non Temporal

Opcode      Instruction         Description
0F,E7,/r    MOVNTQ m64, mm      Move 64 bits representing integer operands (8b, 16b, 32b) from MM register to memory, minimizing pollution within cache hierarchy.

Description
The linear address corresponds to the address of the least-significant byte of the referenced memory data. This store instruction minimizes cache pollution.

Operation
m64 = SRC;

void _mm_stream_pi(__m64 * p, __m64 a)
Stores the data in a to the address p without polluting the caches.

Numeric Exceptions
None.

Protected Mode Exceptions
#GP(0)              For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)              For an illegal address in the SS segment.
#PF (fault-code)    For a page fault.
#UD                 If CR0.EM = 1.
#NM                 If TS bit in CR0 is set.
#MF                 If there is a pending FPU exception.
#AC                 For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).
Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#AC                 For unaligned memory reference if the current privilege level is 3.
#PF (fault-code)    For a page fault.

Comments
MOVNTQ minimizes pollution in the cache hierarchy. As a consequence of the resulting weakly-ordered memory consistency model, a fencing operation should be used if multiple processors may use different memory types to read/write the memory location. Refer to Section 9.3.9., Cacheability Control Instructions in Chapter 9, Programming with the Streaming SIMD Extensions of the Intel Architecture Software Developer's Manual, Volume 1, for further information about non-temporal stores.

MOVNTQ ignores the value of CR4.OSFXSR. Since it does not affect the new Streaming SIMD Extension state, MOVNTQ will not generate an invalid opcode exception (#UD) if CR4.OSFXSR = 0.
MOVQ - Move 64 Bits

Opcode      Instruction          Description
0F 6F /r    MOVQ mm, mm/m64      Move quadword from mm/m64 to mm.
0F 7F /r    MOVQ mm/m64, mm      Move quadword from mm to mm/m64.

Description
This instruction copies a quadword from the source operand (second operand) to the destination operand (first operand) (refer to Figure 3-47). A source or destination operand can be either an MMX technology register or a memory location; however, data cannot be transferred from one memory location to another memory location. Data can be transferred from one MMX technology register to another MMX technology register.

Figure 3-47. Operation of the MOVQ Instructions
Operation
DEST ← SRC;
Protected Mode Exceptions
#GP(0)             If the destination operand is in a nonwritable segment.
                   If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#UD                If EM in CR0 is set.
#NM                If TS in CR0 is set.
#MF                If there is a pending FPU exception.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
MOVS/MOVSB/MOVSW/MOVSD - Move Data from String to String

Opcode    Instruction        Description
A4        MOVS m8, m8        Move byte at address DS:(E)SI to address ES:(E)DI
A5        MOVS m16, m16      Move word at address DS:(E)SI to address ES:(E)DI
A5        MOVS m32, m32      Move doubleword at address DS:(E)SI to address ES:(E)DI
A4        MOVSB              Move byte at address DS:(E)SI to address ES:(E)DI
A5        MOVSW              Move word at address DS:(E)SI to address ES:(E)DI
A5        MOVSD              Move doubleword at address DS:(E)SI to address ES:(E)DI
Description
These instructions move the byte, word, or doubleword specified with the second operand (source operand) to the location specified with the first operand (destination operand). Both the source and destination operands are located in memory. The address of the source operand is read from the DS:ESI or the DS:SI registers (depending on the address-size attribute of the instruction, 32 or 16, respectively). The address of the destination operand is read from the ES:EDI or the ES:DI registers (again depending on the address-size attribute of the instruction). The DS segment may be overridden with a segment override prefix, but the ES segment cannot be overridden.

At the assembly-code level, two forms of this instruction are allowed: the explicit-operands form and the no-operands form. The explicit-operands form (specified with the MOVS mnemonic) allows the source and destination operands to be specified explicitly. Here, the source and destination operands should be symbols that indicate the size and location of the source value and the destination, respectively. This explicit-operands form is provided to allow documentation; however, note that the documentation provided by this form can be misleading. That is, the source and destination operand symbols must specify the correct type (size) of the operands (bytes, words, or doublewords), but they do not have to specify the correct location. The locations of the source and destination operands are always specified by the DS:(E)SI and ES:(E)DI registers, which must be loaded correctly before the move string instruction is executed.

The no-operands form provides short forms of the byte, word, and doubleword versions of the MOVS instructions. Here also DS:(E)SI and ES:(E)DI are assumed to be the source and destination operands, respectively. The size of the source and destination operands is selected with the mnemonic: MOVSB (byte move), MOVSW (word move), or MOVSD (doubleword move).

After the move operation, the (E)SI and (E)DI registers are incremented or decremented automatically according to the setting of the DF flag in the EFLAGS register. (If the DF flag is 0, the (E)SI and (E)DI registers are incremented; if the DF flag is 1, the (E)SI and (E)DI registers are decremented.) The registers are incremented or decremented by one for byte operations, by two for word operations, or by four for doubleword operations.
The MOVS, MOVSB, MOVSW, and MOVSD instructions can be preceded by the REP prefix (refer to REP/REPE/REPZ/REPNE/REPNZ - Repeat String Operation Prefix in this chapter) for block moves of ECX bytes, words, or doublewords.

Operation
DEST ← SRC;
IF (byte move)
    THEN IF DF = 0
        THEN
            (E)SI ← (E)SI + 1;
            (E)DI ← (E)DI + 1;
        ELSE
            (E)SI ← (E)SI - 1;
            (E)DI ← (E)DI - 1;
        FI;
    ELSE IF (word move)
        THEN IF DF = 0
            THEN
                (E)SI ← (E)SI + 2;
                (E)DI ← (E)DI + 2;
            ELSE
                (E)SI ← (E)SI - 2;
                (E)DI ← (E)DI - 2;
            FI;
        ELSE (* doubleword move *)
            IF DF = 0
                THEN
                    (E)SI ← (E)SI + 4;
                    (E)DI ← (E)DI + 4;
                ELSE
                    (E)SI ← (E)SI - 4;
                    (E)DI ← (E)DI - 4;
            FI;
    FI;
FI;
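The pointer updates above, combined with the REP prefix, can be modeled over a flat byte array. This is a sketch under simplifying assumptions: string_regs and rep_movs are hypothetical names, segmentation is ignored (ESI and EDI are plain offsets into one array), and the 32-bit address-size form is assumed.

```c
#include <stdint.h>
#include <string.h>

/* Register state used by the string move; flat addressing is assumed,
   so esi and edi are plain offsets into one memory array. */
typedef struct {
    uint32_t esi, edi, ecx;
    int df;                /* direction flag: 0 = increment, 1 = decrement */
} string_regs;

/* Model of REP MOVSB/MOVSW/MOVSD. size is 1, 2, or 4 bytes per element. */
void rep_movs(uint8_t *mem, string_regs *r, uint32_t size)
{
    while (r->ecx != 0) {
        memmove(mem + r->edi, mem + r->esi, size);  /* DEST <- SRC */
        if (r->df == 0) {
            r->esi += size;                         /* DF = 0: move upward */
            r->edi += size;
        } else {
            r->esi -= size;                         /* DF = 1: move downward */
            r->edi -= size;
        }
        r->ecx--;
    }
}
```

With DF = 1 the registers must initially point at the last element of each block, not the first, which is why the choice of direction matters when source and destination overlap.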
MOVSSMove Scalar Single-FP

Opcode F3,0F,10,/r F3,0F,11,/r Instruction MOVSS xmm1, xmm2/m32 MOVSS xmm2/m32, xmm1 Description Move 32 bits representing one scalar SP operand from XMM2/Mem to XMM1 register. Move 32 bits representing one scalar SP operand from XMM1 register to XMM2/Mem.
Description
The linear address corresponds to the address of the least-significant byte of the referenced memory data. When a memory address is indicated, the four bytes of data at memory location m32 are loaded or stored. When the load form of this operation is used, the 32 bits from memory are copied into the lower 32 bits of the 128-bit register xmm, the 96 most significant bits being cleared.

Figure 3-48. Operation of the MOVSS Instruction

Operation
IF (destination = DEST)
    THEN IF (SRC == m32)
        THEN (* load instruction *)
            DEST[31-0] = m32;
            DEST[63-32] = 0X00000000;
            DEST[95-64] = 0X00000000;
            DEST[127-96] = 0X00000000;
        ELSE (* move instruction *)
            DEST[31-0] = SRC[31-0];
            DEST[63-32] = DEST[63-32];
            DEST[95-64] = DEST[95-64];
            DEST[127-96] = DEST[127-96];
    FI
ELSE IF (destination = m32)
    THEN (* store instruction *)
        m32 = SRC[31-0];
    ELSE (* move instruction *)
        DEST[31-0] = SRC[31-0];
        DEST[63-32] = DEST[63-32];
        DEST[95-64] = DEST[95-64];
        DEST[127-96] = DEST[127-96];
    FI
FI
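The three MOVSS forms differ in what happens to the upper 96 bits of the destination. The following is a sketch in plain C that models an XMM register as four floats rather than using compiler intrinsics; the type xmm_reg and the function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical model of a 128-bit XMM register; f[0] holds bits 31-0. */
typedef struct { float f[4]; } xmm_reg;

/* Load form: the low field comes from memory, the upper three are cleared. */
void movss_load(xmm_reg *dest, const float *m32)
{
    assert(dest && m32);
    dest->f[0] = *m32;
    dest->f[1] = 0.0f;
    dest->f[2] = 0.0f;
    dest->f[3] = 0.0f;
}

/* Register-to-register form: only the low field is replaced. */
void movss_move(xmm_reg *dest, const xmm_reg *src)
{
    assert(dest && src);
    dest->f[0] = src->f[0];
}

/* Store form: the low field is written to memory. */
void movss_store(float *m32, const xmm_reg *src)
{
    assert(m32 && src);
    *m32 = src->f[0];
}
```

The asymmetry is the point of the sketch: a load clears the upper fields, while a register move leaves them untouched.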

__m128 _mm_load_ss(float * p)
Loads an SP FP value into the low word and clears the upper three words.

void _mm_store_ss(float * p, __m128 a)
Stores the lower SP FP value.

__m128 _mm_move_ss(__m128 a, __m128 b)
Sets the low word to the SP FP value of b. The upper 3 SP FP values are passed through from a.

Exceptions
None.

Numeric Exceptions
None.

Protected Mode Exceptions
#GP(0)            For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)            For an illegal address in the SS segment.
#PF(fault-code)   For a page fault.
#UD               If CR0.EM = 1.
#NM               If TS bit in CR0 is set.
#AC               For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).
#UD               If CR4.OSFXSR (bit 9) = 0.
#UD               If CPUID.XMM (EDX bit 25) = 0.
MOVSX - Move with Sign-Extension

Opcode      Instruction         Description
0F BE /r    MOVSX r16,r/m8      Move byte to word with sign-extension
0F BE /r    MOVSX r32,r/m8      Move byte to doubleword, sign-extension
0F BF /r    MOVSX r32,r/m16     Move word to doubleword, sign-extension
Description
This instruction copies the contents of the source operand (register or memory location) to the destination operand (register) and sign extends the value to 16 or 32 bits. For more information, refer to Section 6-5, "Sign Extension" in Chapter 6, "Instruction Set Summary" of the Intel Architecture Software Developer's Manual, Volume 1. The size of the converted value depends on the operand-size attribute.

Operation
DEST ← SignExtend(SRC);
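Sign extension copies the top bit of the source into every added upper bit. The sketch below spells that out with explicit masks in C; the function names are hypothetical and simply mirror the three instruction forms.

```c
#include <assert.h>
#include <stdint.h>

/* MOVSX r16, r/m8: replicate bit 7 of the source into bits 15..8. */
uint16_t movsx_r16_rm8(uint8_t src)
{
    return (uint16_t)((src & 0x80u) ? (0xFF00u | src) : src);
}

/* MOVSX r32, r/m8: replicate bit 7 of the source into bits 31..8. */
uint32_t movsx_r32_rm8(uint8_t src)
{
    return (src & 0x80u) ? (0xFFFFFF00u | src) : src;
}

/* MOVSX r32, r/m16: replicate bit 15 of the source into bits 31..16. */
uint32_t movsx_r32_rm16(uint16_t src)
{
    return (src & 0x8000u) ? (0xFFFF0000u | src) : src;
}

/* A few spot checks of the three forms. */
void movsx_examples(void)
{
    assert(movsx_r16_rm8(0x7F) == 0x007F);          /* sign bit clear: upper bits 0 */
    assert(movsx_r16_rm8(0x80) == 0xFF80);          /* sign bit set: upper bits 1 */
    assert(movsx_r32_rm8(0xFE) == 0xFFFFFFFEu);     /* -2 stays -2 in 32 bits */
    assert(movsx_r32_rm16(0x8001) == 0xFFFF8001u);
}
```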

Virtual-8086 Mode Exceptions
#GP(0)            If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)   If a page fault occurs.
MOVUPS - Move Unaligned Four Packed Single-FP

Opcode      Instruction               Description
0F,10,/r    MOVUPS xmm1, xmm2/m128    Move 128 bits representing four SP data from XMM2/Mem to XMM1 register.
0F,11,/r    MOVUPS xmm2/m128, xmm1    Move 128 bits representing four SP data from XMM1 register to XMM2/Mem.
Description
The linear address corresponds to the address of the least-significant byte of the referenced memory data. When a memory address is indicated, the 16 bytes of data at memory location m128 are loaded to the 128-bit multimedia register xmm or stored from the 128-bit multimedia register xmm. When the register-register form of this operation is used, the content of the 128-bit source register is copied into 128-bit register xmm. No assumption is made about alignment.

Figure 3-49. Operation of the MOVUPS Instruction

Operation
IF (destination = xmm)
    THEN IF (SRC = m128)
        THEN (* load instruction *)
            DEST[127-0] = m128;
        ELSE (* move instruction *)
            DEST[127-0] = SRC[127-0];
    FI
ELSE IF (destination = m128)
    THEN (* store instruction *)
        m128 = SRC[127-0];
    ELSE (* move instruction *)
        DEST[127-0] = SRC[127-0];
    FI
FI

__m128 _mm_loadu_ps(float * p)
Loads four SP FP values. The address need not be 16-byte-aligned.

void _mm_storeu_ps(float *p, __m128 a)
Stores four SP FP values. The address need not be 16-byte-aligned.

Exceptions
None.

Numeric Exceptions
None.

Protected Mode Exceptions
#GP(0)            For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)            For an illegal address in the SS segment.
#PF(fault-code)   For a page fault.
#UD               If CR0.EM = 1.
#AC               For unaligned memory reference if the current privilege level is 3.
#NM               If TS bit in CR0 is set.

Real Address Mode Exceptions
Interrupt 13      If any part of the operand would lie outside of the effective address space from 0 to 0FFFFH.
#UD               If CR0.EM = 1.
#NM               If TS bit in CR0 is set.

Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#AC               For unaligned memory reference if the current privilege level is 3.
#PF(fault-code)   For a page fault.

Comments
MOVUPS should be used with SP FP numbers when that data is known to be unaligned. The usage of this instruction should be limited to the cases where the aligned restriction is hard or impossible to meet. Streaming SIMD Extension implementations guarantee optimum unaligned support for MOVUPS. Efficient Streaming SIMD Extension applications should mainly rely on MOVAPS, not MOVUPS, when dealing with aligned data.

The usage of Repeat-NE (F2H) and Operand Size (66H) prefixes is reserved. Different processor implementations may handle these prefixes differently. Usage of these prefixes with MOVUPS risks incompatibility with future processors.

A linear address of the 128-bit data access, while executing in 16-bit mode, that overlaps the end of a 16-bit segment is not allowed and is defined as reserved behavior. Different processor implementations may or may not raise a GP fault in this case if the segment limit has been exceeded. Additionally, the address that spans the end of the segment may or may not wrap around to the beginning of the segment.
MOVZX - Move with Zero-Extend

Opcode      Instruction         Description
0F B6 /r    MOVZX r16,r/m8      Move byte to word with zero-extension
0F B6 /r    MOVZX r32,r/m8      Move byte to doubleword, zero-extension
0F B7 /r    MOVZX r32,r/m16     Move word to doubleword, zero-extension
Description
This instruction copies the contents of the source operand (register or memory location) to the destination operand (register) and zero extends the value to 16 or 32 bits. The size of the converted value depends on the operand-size attribute.

Operation
DEST ← ZeroExtend(SRC);

MUL - Unsigned Multiply
Opcode    Instruction    Description
F6 /4     MUL r/m8       Unsigned multiply (AX ← AL * r/m8)
F7 /4     MUL r/m16      Unsigned multiply (DX:AX ← AX * r/m16)
F7 /4     MUL r/m32      Unsigned multiply (EDX:EAX ← EAX * r/m32)
Description
This instruction performs an unsigned multiplication of the first operand (destination operand) and the second operand (source operand) and stores the result in the destination operand. The destination operand is an implied operand located in register AL, AX or EAX (depending on the size of the operand); the source operand is located in a general-purpose register or a memory location. The action of this instruction and the location of the result depends on the opcode and the operand size as shown in the following table.

Operand Size    Source 1    Source 2    Destination
Byte            AL          r/m8        AX
Word            AX          r/m16       DX:AX
Doubleword      EAX         r/m32       EDX:EAX
The result is stored in register AX, register pair DX:AX, or register pair EDX:EAX (depending on the operand size), with the high-order bits of the product contained in register AH, DX, or EDX, respectively. If the high-order bits of the product are 0, the CF and OF flags are cleared; otherwise, the flags are set.

Operation
IF byte operation
    THEN
        AX ← AL * SRC
    ELSE (* word or doubleword operation *)
        IF OperandSize = 16
            THEN
                DX:AX ← AX * SRC
            ELSE (* OperandSize = 32 *)
                EDX:EAX ← EAX * SRC
        FI;
FI;

Flags Affected
The OF and CF flags are cleared to 0 if the upper half of the result is 0; otherwise, they are set to 1. The SF, ZF, AF, and PF flags are undefined.
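The way MUL splits the product across two registers and derives CF and OF from the upper half can be sketched in C. The struct mul_regs and the functions mul32 and mul8 are hypothetical names for this illustration; only the byte and doubleword forms are modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register model: only the state that MUL touches. */
typedef struct {
    uint32_t eax, edx;
    int cf, of;
} mul_regs;

/* MUL r/m32: EDX:EAX <- EAX * src; CF and OF are set when EDX is nonzero. */
void mul32(mul_regs *r, uint32_t src)
{
    assert(r);
    uint64_t product = (uint64_t)r->eax * src;
    r->eax = (uint32_t)product;
    r->edx = (uint32_t)(product >> 32);
    r->cf = r->of = (r->edx != 0);
}

/* MUL r/m8: AX <- AL * src; the upper half of the product lands in AH. */
void mul8(mul_regs *r, uint8_t src)
{
    assert(r);
    uint16_t product = (uint16_t)((r->eax & 0xFFu) * src);
    r->eax = (r->eax & 0xFFFF0000u) | product;   /* only AX is written */
    r->cf = r->of = ((product >> 8) != 0);
}
```

For example, multiplying EAX = 80000000H by 4 yields EDX = 2 and EAX = 0, so CF and OF are set.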

MULPS - Packed Single-FP Multiply

Opcode      Instruction              Description
0F,59,/r    MULPS xmm1, xmm2/m128    Multiply packed SP FP numbers in XMM2/Mem to XMM1.
Description
The MULPS instructions multiply the packed SP FP numbers of both their operands.

Figure 3-50. Operation of the MULPS Instruction
Operation
DEST[31-0] = DEST[31-0] * SRC/m128[31-0];
DEST[63-32] = DEST[63-32] * SRC/m128[63-32];
DEST[95-64] = DEST[95-64] * SRC/m128[95-64];
DEST[127-96] = DEST[127-96] * SRC/m128[127-96];

__m128 _mm_mul_ps(__m128 a, __m128 b)
Multiplies the four SP FP values of a and b.

Exceptions
General protection exception if not aligned on 16-byte boundary, regardless of segment.

Numeric Exceptions
Overflow, Underflow, Invalid, Precision, Denormal.

Protected Mode Exceptions
#GP(0)            For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)            For an illegal address in the SS segment.
#PF(fault-code)   For a page fault.
#UD               If CR0.EM = 1.
#NM               If TS bit in CR0 is set.
#XM               For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 1).
#UD               For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 0).
MULSS - Scalar Single-FP Multiply

Opcode         Instruction             Description
F3,0F,59,/r    MULSS xmm1, xmm2/m32    Multiply the lowest SP FP number in XMM2/Mem to XMM1.
Description
The MULSS instructions multiply the lowest SP FP numbers of both their operands; the upper three fields are passed through from xmm1.

Figure 3-51. Operation of the MULSS Instruction (example in the lowest field: -4.75 * 2501.4 = -11881.65)
Operation
DEST[31-0] = DEST[31-0] * SRC/m32[31-0];
DEST[63-32] = DEST[63-32];
DEST[95-64] = DEST[95-64];
DEST[127-96] = DEST[127-96];

__m128 _mm_mul_ss(__m128 a, __m128 b)
Multiplies the lower SP FP values of a and b; the upper three SP FP values are passed through from a.

Exceptions
None.

Numeric Exceptions
Overflow, Underflow, Invalid, Precision, Denormal.

Protected Mode Exceptions
#GP(0)            For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)            For an illegal address in the SS segment.
#PF(fault-code)   For a page fault.
#UD               If CR0.EM = 1.
#NM               If TS bit in CR0 is set.
#AC               For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).
#XM               For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 1).
#UD               For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 0).
NEG - Two's Complement Negation

Opcode    Instruction    Description
F6 /3     NEG r/m8       Two's complement negate r/m8
F7 /3     NEG r/m16      Two's complement negate r/m16
F7 /3     NEG r/m32      Two's complement negate r/m32
Description
This instruction replaces the value of the operand (the destination operand) with its two's complement. (This operation is equivalent to subtracting the operand from 0.) The destination operand is located in a general-purpose register or a memory location.

Operation
IF DEST = 0
    THEN CF ← 0
    ELSE CF ← 1;
FI;
DEST ← -(DEST)
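As a small illustration of the operation above, NEG can be written in C as a subtraction from zero with the carry flag reported separately. The function name neg32 and the cf output parameter are hypothetical; the other flags are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* NEG r/m32: returns 0 - value and reports the carry flag through *cf. */
uint32_t neg32(uint32_t value, int *cf)
{
    assert(cf);
    *cf = (value != 0);     /* CF is cleared only when the operand is 0 */
    return 0u - value;      /* two's complement negation, modulo 2^32 */
}
```

Note that negating 80000000H gives 80000000H again, because the most negative value has no positive counterpart in 32 bits.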
Flags Affected
The CF flag is cleared to 0 if the source operand is 0; otherwise it is set to 1. The OF, SF, ZF, AF, and PF flags are set according to the result.

Protected Mode Exceptions
#GP(0)            If the destination is located in a nonwritable segment.
                  If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                  If the DS, ES, FS, or GS register contains a null segment selector.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

NOP - No Operation
Opcode    Instruction    Description
90        NOP            No operation

Description
This instruction performs no operation. This instruction is a one-byte instruction that takes up space in the instruction stream but does not affect the machine context, except the EIP register. The NOP instruction is an alias mnemonic for the XCHG (E)AX, (E)AX instruction.

Flags Affected
None.

Exceptions (All Operating Modes)
None.
NOT - One's Complement Negation

Opcode    Instruction    Description
F6 /2     NOT r/m8       Reverse each bit of r/m8
F7 /2     NOT r/m16      Reverse each bit of r/m16
F7 /2     NOT r/m32      Reverse each bit of r/m32
Description
This instruction performs a bitwise NOT operation (each 1 is cleared to 0, and each 0 is set to 1) on the destination operand and stores the result in the destination operand location. The destination operand can be a register or a memory location.

Operation
DEST ← NOT DEST;

Flags Affected
None.

Protected Mode Exceptions
#GP(0)            If the destination operand points to a nonwritable segment.
                  If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                  If the DS, ES, FS, or GS register contains a null segment selector.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

OR - Logical Inclusive OR
Opcode       Instruction         Description
0C ib        OR AL,imm8          AL OR imm8
0D iw        OR AX,imm16         AX OR imm16
0D id        OR EAX,imm32        EAX OR imm32
80 /1 ib     OR r/m8,imm8        r/m8 OR imm8
81 /1 iw     OR r/m16,imm16      r/m16 OR imm16
81 /1 id     OR r/m32,imm32      r/m32 OR imm32
83 /1 ib     OR r/m16,imm8       r/m16 OR imm8 (sign-extended)
83 /1 ib     OR r/m32,imm8       r/m32 OR imm8 (sign-extended)
08 /r        OR r/m8,r8          r/m8 OR r8
09 /r        OR r/m16,r16        r/m16 OR r16
09 /r        OR r/m32,r32        r/m32 OR r32
0A /r        OR r8,r/m8          r8 OR r/m8
0B /r        OR r16,r/m16        r16 OR r/m16
0B /r        OR r32,r/m32        r32 OR r/m32
Description
This instruction performs a bitwise inclusive OR operation between the destination (first) and source (second) operands and stores the result in the destination operand location. The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location. (However, two memory operands cannot be used in one instruction.) Each bit of the result of the OR instruction is 0 if both corresponding bits of the operands are 0; otherwise, each bit is 1.

Operation
DEST ← DEST OR SRC;

ORPS - Bit-wise Logical OR for Single-FP Data

Opcode      Instruction             Description
0F,56,/r    ORPS xmm1, xmm2/m128    OR 128 bits from XMM2/Mem to XMM1 register.
Description
The ORPS instructions return a bit-wise logical OR between xmm1 and xmm2/mem.

Figure 3-52. Operation of the ORPS Instruction (example for one field: 0xEB460053 OR 0x00FF00AA = 0xEBFF00FB)
Operation
DEST[127-0] |= SRC/m128[127-0];
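Because ORPS works on raw bit patterns, it is typically used for bit tricks on floating-point data rather than for arithmetic. The sketch below models the 128-bit operands as four 32-bit fields in plain C; the names orps and force_negative are hypothetical, and the second function assumes float is a 32-bit IEEE 754 single.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* OR two packed operands, each modeled as four 32-bit fields. */
void orps(uint32_t dest[4], const uint32_t src[4])
{
    assert(dest && src);
    for (int i = 0; i < 4; i++)
        dest[i] |= src[i];
}

/* Example use of the same idea on one field: OR-ing in the sign bit
   (bit 31) forces a single-precision value to be negative. */
float force_negative(float x)
{
    uint32_t bits;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &x, sizeof bits);     /* reinterpret without aliasing issues */
    bits |= 0x80000000u;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```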

__m128 _mm_or_ps(__m128 a, __m128 b)
Computes the bitwise OR of the four SP FP values of a and b.

Exceptions
General protection exception if not aligned on 16-byte boundary, regardless of segment.

Numeric Exceptions
None.

Protected Mode Exceptions
#GP(0)            For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)            For an illegal address in the SS segment.
#PF(fault-code)   For a page fault.
#UD               If CR0.EM = 1.
#NM               If TS bit in CR0 is set.

Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#PF(fault-code)   For a page fault.

Comments
The usage of Repeat Prefix (F3H) with ORPS is reserved. Different processor implementations may handle this prefix differently. Usage of this prefix with ORPS risks incompatibility with future processors.
OUT - Output to Port
Opcode    Instruction       Description
E6 ib     OUT imm8, AL      Output byte in AL to I/O port address imm8
E7 ib     OUT imm8, AX      Output word in AX to I/O port address imm8
E7 ib     OUT imm8, EAX     Output doubleword in EAX to I/O port address imm8
EE        OUT DX, AL        Output byte in AL to I/O port address in DX
EF        OUT DX, AX        Output word in AX to I/O port address in DX
EF        OUT DX, EAX       Output doubleword in EAX to I/O port address in DX
Description
This instruction copies the value from the second operand (source operand) to the I/O port specified with the destination operand (first operand). The source operand can be register AL, AX, or EAX, depending on the size of the port being accessed (8, 16, or 32 bits, respectively); the destination operand can be a byte-immediate or the DX register. Using a byte immediate allows I/O port addresses 0 to 255 to be accessed; using the DX register as the destination operand allows I/O ports from 0 to 65,535 to be accessed.

The size of the I/O port being accessed is determined by the opcode for an 8-bit I/O port or by the operand-size attribute of the instruction for a 16- or 32-bit I/O port.

At the machine code level, I/O instructions are shorter when accessing 8-bit I/O ports. Here, the upper eight bits of the port address will be 0.

This instruction is only useful for accessing I/O ports located in the processor's I/O address space. Refer to Chapter 10, "Input/Output" of the Intel Architecture Software Developer's Manual, Volume 1, for more information on accessing I/O ports in the I/O address space.

Intel Architecture Compatibility
After executing an OUT instruction, the Pentium processor ensures that the EWBE# pin has been sampled active before it begins to execute the next instruction. (Note that the instruction can be prefetched if EWBE# is not active, but it will not be executed until the EWBE# pin is sampled active.) Only the P6 family of processors has the EWBE# pin; the other Intel Architecture processors do not.

Operation
IF ((PE = 1) AND ((CPL > IOPL) OR (VM = 1)))
    THEN (* Protected mode with CPL > IOPL or virtual-8086 mode *)
        IF (Any I/O Permission Bit for I/O port being accessed = 1)
            THEN (* I/O operation is not allowed *)
                #GP(0);
            ELSE (* I/O operation is allowed *)
                DEST ← SRC; (* Writes to selected I/O port *)
        FI;
    ELSE (* Real Mode or Protected Mode with CPL ≤ IOPL *)
        DEST ← SRC; (* Writes to selected I/O port *)
FI;
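The permission check in the operation above can be expressed as a small C predicate. This is only a sketch under stated assumptions: the bitmap argument is a hypothetical in-memory copy of the I/O permission bit map with one bit per port (port p at byte p / 8, bit p % 8), the bit map limit is not modeled, and the function name io_access_allowed is invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the access is allowed, 0 if it would raise #GP(0).
   `width` is the access size in bytes (1, 2, or 4); a multi-byte access
   covers `width` consecutive port addresses starting at `port`. */
int io_access_allowed(int pe, int vm, unsigned cpl, unsigned iopl,
                      const uint8_t *bitmap, uint16_t port, unsigned width)
{
    assert(width == 1 || width == 2 || width == 4);
    if (pe == 1 && (cpl > iopl || vm == 1)) {
        for (unsigned i = 0; i < width; i++) {
            uint32_t p = (uint32_t)port + i;
            if (bitmap[p / 8] & (1u << (p % 8)))
                return 0;   /* any set permission bit denies the access */
        }
    }
    return 1;   /* real mode, or CPL <= IOPL, or all bits clear */
}
```

A word access is denied if the bit for either of its two port addresses is set, which is why the loop checks every address the access covers.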
Flags Affected
None.

Protected Mode Exceptions
#GP(0)    If the CPL is greater than (has less privilege than) the I/O privilege level (IOPL) and any of the corresponding I/O permission bits in TSS for the I/O port being accessed is 1.

Real-Address Mode Exceptions
None.

Virtual-8086 Mode Exceptions
#GP(0)    If any of the I/O permission bits in the TSS for the I/O port being accessed is 1.
OUTS/OUTSB/OUTSW/OUTSD - Output String to Port

Opcode    Instruction     Description
6E        OUTS DX, m8     Output byte from memory location specified in DS:(E)SI to I/O port specified in DX
6F        OUTS DX, m16    Output word from memory location specified in DS:(E)SI to I/O port specified in DX
6F        OUTS DX, m32    Output doubleword from memory location specified in DS:(E)SI to I/O port specified in DX
6E        OUTSB           Output byte from memory location specified in DS:(E)SI to I/O port specified in DX
6F        OUTSW           Output word from memory location specified in DS:(E)SI to I/O port specified in DX
6F        OUTSD           Output doubleword from memory location specified in DS:(E)SI to I/O port specified in DX
Description
These instructions copy data from the source operand (second operand) to the I/O port specified with the destination operand (first operand). The source operand is a memory location, the address of which is read from either the DS:ESI or the DS:SI registers (depending on the address-size attribute of the instruction, 32 or 16, respectively). The DS segment may be overridden with a segment override prefix. The destination operand is an I/O port address (from 0 to 65,535) that is read from the DX register. The size of the I/O port being accessed (that is, the size of the source and destination operands) is determined by the opcode for an 8-bit I/O port or by the operand-size attribute of the instruction for a 16- or 32-bit I/O port.

At the assembly-code level, two forms of this instruction are allowed: the explicit-operands form and the no-operands form. The explicit-operands form (specified with the OUTS mnemonic) allows the source and destination operands to be specified explicitly. Here, the source operand should be a symbol that indicates the size of the I/O port and the source address, and the destination operand must be DX. This explicit-operands form is provided to allow documentation; however, note that the documentation provided by this form can be misleading. That is, the source operand symbol must specify the correct type (size) of the operand (byte, word, or doubleword), but it does not have to specify the correct location. The location is always specified by the DS:(E)SI registers, which must be loaded correctly before the OUTS instruction is executed.

The no-operands form provides short forms of the byte, word, and doubleword versions of the OUTS instructions. Here also DS:(E)SI is assumed to be the source operand and DX is assumed to be the destination operand. The size of the I/O port is specified with the choice of mnemonic: OUTSB (byte), OUTSW (word), or OUTSD (doubleword).

After the byte, word, or doubleword is transferred from the memory location to the I/O port, the (E)SI register is incremented or decremented automatically according to the setting of the DF flag in the EFLAGS register. (If the DF flag is 0, the (E)SI register is incremented; if the DF flag is 1, the (E)SI register is decremented.) The (E)SI register is incremented or decremented by one for byte operations, by two for word operations, or by four for doubleword operations.

The OUTS, OUTSB, OUTSW, and OUTSD instructions can be preceded by the REP prefix for block output of ECX bytes, words, or doublewords. Refer to "REP/REPE/REPZ/REPNE/REPNZ - Repeat String Operation Prefix" in this chapter for a description of the REP prefix.

This instruction is only useful for accessing I/O ports located in the processor's I/O address space. Refer to Chapter 10, "Input/Output" of the Intel Architecture Software Developer's Manual, Volume 1, for more information on accessing I/O ports in the I/O address space.

Intel Architecture Compatibility
After executing an OUTS, OUTSB, OUTSW, or OUTSD instruction, the Pentium processor ensures that the EWBE# pin has been sampled active before it begins to execute the next instruction. (Note that the instruction can be prefetched if EWBE# is not active, but it will not be executed until the EWBE# pin is sampled active.) Only the P6 family of processors has the EWBE# pin. For the P6 family processors, upon execution of an OUTS, OUTSB, OUTSW, or OUTSD instruction, the processor will not execute the next instruction until the data phase of the transaction is complete.

Operation
IF ((PE = 1) AND ((CPL > IOPL) OR (VM = 1)))
    THEN (* Protected mode with CPL > IOPL or virtual-8086 mode *)
        IF (Any I/O Permission Bit for I/O port being accessed = 1)
            THEN (* I/O operation is not allowed *)
                #GP(0);
            ELSE (* I/O operation is allowed *)
                DEST ← SRC; (* Writes to I/O port *)
        FI;
    ELSE (* Real Mode or Protected Mode with CPL ≤ IOPL *)
        DEST ← SRC; (* Writes to I/O port *)
FI;
IF (byte transfer)
    THEN IF DF = 0
        THEN (E)SI ← (E)SI + 1;
        ELSE (E)SI ← (E)SI - 1;
    FI;
    ELSE IF (word transfer)
        THEN IF DF = 0
            THEN (E)SI ← (E)SI + 2;
            ELSE (E)SI ← (E)SI - 2;
        FI;
        ELSE (* doubleword transfer *)
            IF DF = 0
                THEN (E)SI ← (E)SI + 4;
                ELSE (E)SI ← (E)SI - 4;
            FI;
    FI;
FI;

Protected Mode Exceptions
#GP(0)            If the CPL is greater than (has less privilege than) the I/O privilege level (IOPL) and any of the corresponding I/O permission bits in TSS for the I/O port being accessed is 1.
                  If a memory operand effective address is outside the limit of the CS, DS, ES, FS, or GS segment.
                  If the segment register contains a null segment selector.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Virtual-8086 Mode Exceptions
#GP(0)            If any of the I/O permission bits in the TSS for the I/O port being accessed is 1.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made.
PACKSSWB/PACKSSDW - Pack with Signed Saturation

Opcode      Instruction              Description
0F 63 /r    PACKSSWB mm, mm/m64      Pack and saturate four signed words from mm and four signed words from mm/m64 into eight signed bytes in mm.
0F 6B /r    PACKSSDW mm, mm/m64      Pack and saturate two signed doublewords from mm and two signed doublewords from mm/m64 into four signed words in mm.
Description
These instructions pack and saturate signed words into bytes (PACKSSWB) or signed doublewords into words (PACKSSDW). The PACKSSWB instruction packs four signed words from the destination operand (first operand) and four signed words from the source operand (second operand) into eight signed bytes in the destination operand. If the signed value of a word is beyond the range of a signed byte (that is, greater than 7FH or less than 80H), the saturated byte value of 7FH or 80H, respectively, is stored into the destination.

The PACKSSDW instruction packs two signed doublewords from the destination operand (first operand) and two signed doublewords from the source operand (second operand) into four signed words in the destination operand (refer to Figure 3-53). If the signed value of a doubleword is beyond the range of a signed word (that is, greater than 7FFFH or less than 8000H), the saturated word value of 7FFFH or 8000H, respectively, is stored into the destination.

The destination operand for either the PACKSSWB or PACKSSDW instruction must be an MMX technology register; the source operand may be either an MMX technology register or a quadword memory location.

Figure 3-53. Operation of the PACKSSDW Instruction

Operation
IF instruction is PACKSSWB
    THEN
        DEST(7..0) ← SaturateSignedWordToSignedByte DEST(15..0);
        DEST(15..8) ← SaturateSignedWordToSignedByte DEST(31..16);
        DEST(23..16) ← SaturateSignedWordToSignedByte DEST(47..32);
        DEST(31..24) ← SaturateSignedWordToSignedByte DEST(63..48);
        DEST(39..32) ← SaturateSignedWordToSignedByte SRC(15..0);
        DEST(47..40) ← SaturateSignedWordToSignedByte SRC(31..16);
        DEST(55..48) ← SaturateSignedWordToSignedByte SRC(47..32);
        DEST(63..56) ← SaturateSignedWordToSignedByte SRC(63..48);
    ELSE (* instruction is PACKSSDW *)
        DEST(15..0) ← SaturateSignedDoublewordToSignedWord DEST(31..0);
        DEST(31..16) ← SaturateSignedDoublewordToSignedWord DEST(63..32);
        DEST(47..32) ← SaturateSignedDoublewordToSignedWord SRC(31..0);
        DEST(63..48) ← SaturateSignedDoublewordToSignedWord SRC(63..32);
FI;
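Signed saturation clamps each value to the range of the narrower type instead of truncating it. The following is a plain C sketch of the PACKSSWB case, with operands modeled as arrays whose element 0 is the least significant; the function names are hypothetical and the result is written to a separate array rather than over the destination.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a signed word to the signed byte range. */
int8_t saturate_word_to_byte(int16_t w)
{
    if (w > 127)  return 127;     /* 7FH */
    if (w < -128) return -128;    /* 80H */
    return (int8_t)w;
}

/* PACKSSWB: the four destination words fill result bytes 0..3,
   the four source words fill result bytes 4..7. */
void packsswb(int8_t result[8], const int16_t dest[4], const int16_t src[4])
{
    assert(result && dest && src);
    for (int i = 0; i < 4; i++) {
        result[i]     = saturate_word_to_byte(dest[i]);
        result[i + 4] = saturate_word_to_byte(src[i]);
    }
}
```

PACKSSDW follows the same pattern with doublewords clamped to the range -32768 to 32767.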
Intel C/C++ Compiler Intrinsic Equivalents
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_packsswb (__m64 m1, __m64 m2)

Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_packs_pi16(__m64 m1, __m64 m2)
Pack the four 16-bit values from m1 into the lower four 8-bit values of the result with signed saturation, and pack the four 16-bit values from m2 into the upper four 8-bit values of the result with signed saturation.

Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_packssdw (__m64 m1, __m64 m2)

Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_packs_pi32 (__m64 m1, __m64 m2)
Pack the two 32-bit values from m1 into the lower two 16-bit values of the result with signed saturation, and pack the two 32-bit values from m2 into the upper two 16-bit values of the result with signed saturation.

Flags Affected
None.

Protected Mode Exceptions
#GP(0)            If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#UD               If EM in CR0 is set.
#NM               If TS in CR0 is set.
#MF               If there is a pending FPU exception.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
PACKUSWB - Pack with Unsigned Saturation

Opcode      Instruction              Description
0F 67 /r    PACKUSWB mm, mm/m64      Pack and saturate four signed words from mm and four signed words from mm/m64 into eight unsigned bytes in mm.
Description
This instruction packs and saturates four signed words from the destination operand (first operand) and four signed words from the source operand (second operand) into eight unsigned bytes in the destination operand (refer to Figure 3-54). If the signed value of a word is beyond the range of an unsigned byte (that is, greater than FFH or less than 00H), the saturated byte value of FFH or 00H, respectively, is stored into the destination.

The destination operand must be an MMX technology register; the source operand may be either an MMX technology register or a quadword memory location.

Figure 3-54. Operation of the PACKUSWB Instruction
Operation
DEST(7..0) ← SaturateSignedWordToUnsignedByte DEST(15..0);
DEST(15..8) ← SaturateSignedWordToUnsignedByte DEST(31..16);
DEST(23..16) ← SaturateSignedWordToUnsignedByte DEST(47..32);
DEST(31..24) ← SaturateSignedWordToUnsignedByte DEST(63..48);
DEST(39..32) ← SaturateSignedWordToUnsignedByte SRC(15..0);
DEST(47..40) ← SaturateSignedWordToUnsignedByte SRC(31..16);
DEST(55..48) ← SaturateSignedWordToUnsignedByte SRC(47..32);
DEST(63..56) ← SaturateSignedWordToUnsignedByte SRC(63..48);
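The detail that is easy to miss here is that the input words are signed while the output bytes are unsigned, so every negative word becomes 00H. The sketch below works directly on 64-bit values held in a uint64_t; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate the signed word whose raw bits are `word_bits` to an unsigned byte. */
uint8_t saturate_signed_word_to_unsigned_byte(uint16_t word_bits)
{
    if (word_bits & 0x8000u) return 0x00;   /* negative as a signed word */
    if (word_bits > 0x00FFu) return 0xFF;   /* above the unsigned byte range */
    return (uint8_t)word_bits;
}

/* PACKUSWB on two 64-bit operands: destination words fill result bytes
   0..3, source words fill result bytes 4..7. */
uint64_t packuswb(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t d = (uint16_t)(dest >> (16 * i));
        uint16_t s = (uint16_t)(src >> (16 * i));
        result |= (uint64_t)saturate_signed_word_to_unsigned_byte(d) << (8 * i);
        result |= (uint64_t)saturate_signed_word_to_unsigned_byte(s) << (8 * i + 32);
    }
    return result;
}

/* Spot check: words 0001H, 0100H, FFFFH, 00FFH and 8000H, 7FFFH, 0000H, 0080H. */
void packuswb_example(void)
{
    assert(packuswb(0x00FFFFFF01000001ull, 0x008000007FFF8000ull)
           == 0x8000FF00FF00FF01ull);
}
```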

Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_packuswb(__m64 m1, __m64 m2)

Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_packs_pu16(__m64 m1, __m64 m2)
Pack the four 16-bit values from m1 into the lower four 8-bit values of the result with unsigned saturation, and pack the four 16-bit values from m2 into the upper four 8-bit values of the result with unsigned saturation.

Flags Affected
None.

Protected Mode Exceptions
#GP(0)            If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#UD               If EM in CR0 is set.
#NM               If TS in CR0 is set.
#MF               If there is a pending FPU exception.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

PADDB/PADDW/PADDD - Packed Add
Opcode      Instruction           Description
0F FC /r    PADDB mm, mm/m64      Add packed bytes from mm/m64 to packed bytes in mm.
0F FD /r    PADDW mm, mm/m64      Add packed words from mm/m64 to packed words in mm.
0F FE /r    PADDD mm, mm/m64      Add packed doublewords from mm/m64 to packed doublewords in mm.
Description
These instructions add the individual data elements (bytes, words, or doublewords) of the source operand (second operand) to the individual data elements of the destination operand (first operand) (refer to Figure 3-55). If the result of an individual addition exceeds the range for the specified data type (overflows), the result is wrapped around, meaning that the result is truncated so that only the lower (least significant) bits of the result are returned (that is, the carry is ignored).

The destination operand must be an MMX technology register; the source operand can be either an MMX technology register or a quadword memory location.

Figure 3-55. Operation of the PADDW Instruction (example words: 1000000000000000 + 1111111111111111 = 0111111111111111; 0111111100111000 + 0001011100000111 = 1001011000111111)
The PADDB instruction adds the bytes of the source operand to the bytes of the destination operand and stores the results to the destination operand. When an individual result is too large to be represented in eight bits, the lower eight bits of the result are written to the destination operand and therefore the result wraps around. The PADDW instruction adds the words of the source operand to the words of the destination operand and stores the results to the destination operand. When an individual result is too large to be represented in 16 bits, the lower 16 bits of the result are written to the destination operand and therefore the result wraps around.
The PADDD instruction adds the doublewords of the source operand to the doublewords of the destination operand and stores the results to the destination operand. When an individual result is too large to be represented in 32 bits, the lower 32 bits of the result are written to the destination operand and therefore the result wraps around. Note that like the integer ADD instruction, the PADDB, PADDW, and PADDD instructions can operate on either unsigned or signed (two's complement notation) packed integers. Unlike the integer instructions, none of the MMX instructions affect the EFLAGS register. With MMX instructions, there are no carry or overflow flags to indicate when overflow has occurred, so the software must control the range of values or else use the "with saturation" MMX instructions.
Operation
IF instruction is PADDB
    THEN
        DEST(7..0) ← DEST(7..0) + SRC(7..0);
        DEST(15..8) ← DEST(15..8) + SRC(15..8);
        DEST(23..16) ← DEST(23..16) + SRC(23..16);
        DEST(31..24) ← DEST(31..24) + SRC(31..24);
        DEST(39..32) ← DEST(39..32) + SRC(39..32);
        DEST(47..40) ← DEST(47..40) + SRC(47..40);
        DEST(55..48) ← DEST(55..48) + SRC(55..48);
        DEST(63..56) ← DEST(63..56) + SRC(63..56);
ELSEIF instruction is PADDW
    THEN
        DEST(15..0) ← DEST(15..0) + SRC(15..0);
        DEST(31..16) ← DEST(31..16) + SRC(31..16);
        DEST(47..32) ← DEST(47..32) + SRC(47..32);
        DEST(63..48) ← DEST(63..48) + SRC(63..48);
ELSE (* instruction is PADDD *)
        DEST(31..0) ← DEST(31..0) + SRC(31..0);
        DEST(63..32) ← DEST(63..32) + SRC(63..32);
FI;
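The wraparound behavior of the packed add can be modeled in portable C. The following is a minimal sketch, not Intel's implementation: the function name paddw_model and the use of a uint64_t to stand in for an MMX register are assumptions made for illustration.

```c
#include <stdint.h>

/* Scalar model of PADDW (hypothetical helper, not an Intel API):
   adds the four 16-bit lanes of two 64-bit values and discards the
   carry out of each lane, so each lane wraps around independently. */
uint64_t paddw_model(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint16_t a = (uint16_t)(dest >> (16 * lane));
        uint16_t b = (uint16_t)(src >> (16 * lane));
        uint16_t sum = (uint16_t)(a + b);   /* keep only the low 16 bits */
        result |= (uint64_t)sum << (16 * lane);
    }
    return result;
}
```

With the two words shown in Figure 3-55, paddw_model(0x80007F38, 0xFFFF1707) gives 0x7FFF963F: the upper lane wraps from 17FFFH to 7FFFH and no carry crosses into the neighboring lane.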
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_paddb(__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_add_pi8(__m64 m1, __m64 m2)
Add the eight 8-bit values in m1 to the eight 8-bit values in m2.
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_paddw(__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_add_pi16(__m64 m1, __m64 m2)
Add the four 16-bit values in m1 to the four 16-bit values in m2.
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_paddd(__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_add_pi32(__m64 m1, __m64 m2)
Add the two 32-bit values in m1 to the two 32-bit values in m2.
Flags Affected
None.
Protected Mode Exceptions
#GP(0)            If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#UD               If EM in CR0 is set.
#NM               If TS in CR0 is set.
#MF               If there is a pending FPU exception.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
PADDSB/PADDSW - Packed Add with Saturation
Opcode      Instruction            Description
0F EC /r    PADDSB mm, mm/m64      Add signed packed bytes from mm/m64 to signed packed bytes in mm and saturate.
0F ED /r    PADDSW mm, mm/m64      Add signed packed words from mm/m64 to signed packed words in mm and saturate.
Description These instructions add the individual signed data elements (bytes or words) of the source operand (second operand) to the individual signed data elements of the destination operand (first operand) (refer to Figure 3-56). If the result of an individual addition exceeds the range for the specified data type, the result is saturated. The destination operand must be an MMX technology register; the source operand can be either an MMX technology register or a quadword memory location.
PADDSW mm, mm/m64
mm        1000000000000000   0111111100111000
+
mm/m64    1111111111111111   0001011100000111
=
mm        1000000000000000   0111111111111111
Figure 3-56. Operation of the PADDSW Instruction
The PADDSB instruction adds the signed bytes of the source operand to the signed bytes of the destination operand and stores the results to the destination operand. When an individual result is beyond the range of a signed byte (that is, greater than 7FH or less than 80H), the saturated byte value of 7FH or 80H, respectively, is written to the destination operand. The PADDSW instruction adds the signed words of the source operand to the signed words of the destination operand and stores the results to the destination operand. When an individual result is beyond the range of a signed word (that is, greater than 7FFFH or less than 8000H), the saturated word value of 7FFFH or 8000H, respectively, is written to the destination operand.
Operation
IF instruction is PADDSB
    THEN
        DEST(7..0) ← SaturateToSignedByte(DEST(7..0) + SRC(7..0));
        DEST(15..8) ← SaturateToSignedByte(DEST(15..8) + SRC(15..8));
        DEST(23..16) ← SaturateToSignedByte(DEST(23..16) + SRC(23..16));
        DEST(31..24) ← SaturateToSignedByte(DEST(31..24) + SRC(31..24));
        DEST(39..32) ← SaturateToSignedByte(DEST(39..32) + SRC(39..32));
        DEST(47..40) ← SaturateToSignedByte(DEST(47..40) + SRC(47..40));
        DEST(55..48) ← SaturateToSignedByte(DEST(55..48) + SRC(55..48));
        DEST(63..56) ← SaturateToSignedByte(DEST(63..56) + SRC(63..56));
ELSE (* instruction is PADDSW *)
        DEST(15..0) ← SaturateToSignedWord(DEST(15..0) + SRC(15..0));
        DEST(31..16) ← SaturateToSignedWord(DEST(31..16) + SRC(31..16));
        DEST(47..32) ← SaturateToSignedWord(DEST(47..32) + SRC(47..32));
        DEST(63..48) ← SaturateToSignedWord(DEST(63..48) + SRC(63..48));
FI;
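Signed saturation can be sketched in portable C as follows. This is an illustrative scalar model under stated assumptions: the names saturate_to_signed_word and paddsw_model are hypothetical, a uint64_t stands in for the MMX register, and the unsigned-to-signed casts assume the usual two's complement conversion.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate sum to the signed word range
   (hypothetical helper mirroring SaturateToSignedWord). */
static int16_t saturate_to_signed_word(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;   /* 7FFFH */
    if (v < INT16_MIN) return INT16_MIN;   /* 8000H */
    return (int16_t)v;
}

/* Scalar model of PADDSW over four signed 16-bit lanes. */
uint64_t paddsw_model(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        /* Assumes two's complement conversion from uint16_t to int16_t. */
        int16_t a = (int16_t)(uint16_t)(dest >> (16 * lane));
        int16_t b = (int16_t)(uint16_t)(src >> (16 * lane));
        int16_t s = saturate_to_signed_word((int32_t)a + (int32_t)b);
        result |= (uint64_t)(uint16_t)s << (16 * lane);
    }
    return result;
}
```

For the two words shown in Figure 3-56, paddsw_model(0x80007F38, 0xFFFF1707) gives 0x80007FFF: one lane clamps at the negative limit and the other at the positive limit.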
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_paddsb(__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_adds_pi8(__m64 m1, __m64 m2)
Add the eight signed 8-bit values in m1 to the eight signed 8-bit values in m2 and saturate.
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_paddsw(__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_adds_pi16(__m64 m1, __m64 m2)
Add the four signed 16-bit values in m1 to the four signed 16-bit values in m2 and saturate.
Flags Affected
None.
PADDUSB/PADDUSW - Packed Add Unsigned with Saturation
Opcode      Instruction             Description
0F DC /r    PADDUSB mm, mm/m64      Add unsigned packed bytes from mm/m64 to unsigned packed bytes in mm and saturate.
0F DD /r    PADDUSW mm, mm/m64      Add unsigned packed words from mm/m64 to unsigned packed words in mm and saturate.
Description These instructions add the individual unsigned data elements (bytes or words) of the packed source operand (second operand) to the individual unsigned data elements of the packed destination operand (first operand) (refer to Figure 3-57). If the result of an individual addition exceeds the range for the specified unsigned data type, the result is saturated. The destination operand must be an MMX technology register; the source operand can be either an MMX technology register or a quadword memory location.
PADDUSB mm, mm/m64
mm        10000000   01111111   00111000
+
mm/m64    11111111   00010111   00000111
=
mm        11111111   10010110   00111111
Figure 3-57. Operation of the PADDUSB Instruction
The PADDUSB instruction adds the unsigned bytes of the source operand to the unsigned bytes of the destination operand and stores the results to the destination operand. When an individual result is beyond the range of an unsigned byte (that is, greater than FFH), the saturated unsigned byte value of FFH is written to the destination operand. The PADDUSW instruction adds the unsigned words of the source operand to the unsigned words of the destination operand and stores the results to the destination operand. When an individual result is beyond the range of an unsigned word (that is, greater than FFFFH), the saturated unsigned word value of FFFFH is written to the destination operand.
Operation
IF instruction is PADDUSB
    THEN
        DEST(7..0) ← SaturateToUnsignedByte(DEST(7..0) + SRC(7..0));
        DEST(15..8) ← SaturateToUnsignedByte(DEST(15..8) + SRC(15..8));
        DEST(23..16) ← SaturateToUnsignedByte(DEST(23..16) + SRC(23..16));
        DEST(31..24) ← SaturateToUnsignedByte(DEST(31..24) + SRC(31..24));
        DEST(39..32) ← SaturateToUnsignedByte(DEST(39..32) + SRC(39..32));
        DEST(47..40) ← SaturateToUnsignedByte(DEST(47..40) + SRC(47..40));
        DEST(55..48) ← SaturateToUnsignedByte(DEST(55..48) + SRC(55..48));
        DEST(63..56) ← SaturateToUnsignedByte(DEST(63..56) + SRC(63..56));
ELSE (* instruction is PADDUSW *)
        DEST(15..0) ← SaturateToUnsignedWord(DEST(15..0) + SRC(15..0));
        DEST(31..16) ← SaturateToUnsignedWord(DEST(31..16) + SRC(31..16));
        DEST(47..32) ← SaturateToUnsignedWord(DEST(47..32) + SRC(47..32));
        DEST(63..48) ← SaturateToUnsignedWord(DEST(63..48) + SRC(63..48));
FI;
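Unsigned saturation differs from the signed case only in the clamp: any byte sum above FFH becomes FFH. A minimal scalar sketch in C, assuming a hypothetical function name paddusb_model and a uint64_t standing in for the MMX register:

```c
#include <stdint.h>

/* Scalar model of PADDUSB (hypothetical helper): adds eight unsigned
   byte lanes and clamps each sum to FFH instead of wrapping. */
uint64_t paddusb_model(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 8; lane++) {
        unsigned a = (uint8_t)(dest >> (8 * lane));
        unsigned b = (uint8_t)(src >> (8 * lane));
        unsigned sum = a + b;
        if (sum > 0xFF)
            sum = 0xFF;                     /* saturate to FFH */
        result |= (uint64_t)sum << (8 * lane);
    }
    return result;
}
```

For the three bytes shown in Figure 3-57, paddusb_model(0x807F38, 0xFF1707) gives 0xFF963F.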
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_paddusb(__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_adds_pu8(__m64 m1, __m64 m2)
Add the eight unsigned 8-bit values in m1 to the eight unsigned 8-bit values in m2 and saturate.
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_paddusw(__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_adds_pu16(__m64 m1, __m64 m2)
Add the four unsigned 16-bit values in m1 to the four unsigned 16-bit values in m2 and saturate.
Flags Affected
None.
PAND - Logical AND
Opcode      Instruction           Description
0F DB /r    PAND mm, mm/m64       AND quadword from mm/m64 to quadword in mm.
Description
This instruction performs a bitwise logical AND operation on the quadword source (second) and destination (first) operands and stores the result in the destination operand location (refer to Figure 3-58). The source operand can be an MMX technology register or a quadword memory location; the destination operand must be an MMX technology register. Each bit of the result of the PAND instruction is set to 1 if the corresponding bits of the operands are both 1; otherwise it is set to 0.
PAND mm, mm/m64
mm        1111111111111000000000000000010110110101100010000111011101110111
&
mm/m64    0001000011011001010100000011000100011110111011110001010110010101
=
mm        0001000011011000000000000000000100010100100010000001010100010101
Figure 3-58. Operation of the PAND Instruction
Operation
DEST ← DEST AND SRC;
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_pand(__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_and_si64(__m64 m1, __m64 m2)
Perform a bitwise AND of the 64-bit value in m1 with the 64-bit value in m2.
Flags Affected
None.
PANDN - Logical AND NOT
Opcode      Instruction           Description
0F DF /r    PANDN mm, mm/m64      AND quadword from mm/m64 to NOT quadword in mm.
Description This instruction performs a bitwise logical NOT on the quadword destination operand (first operand). Then, the instruction performs a bitwise logical AND operation on the inverted destination operand and the quadword source operand (second operand) (refer to Figure 3-59). Each bit of the result of the AND operation is set to one if the corresponding bits of the source and inverted destination bits are one; otherwise it is set to zero. The result is stored in the destination operand location. The source operand can be an MMX technology register or a quadword memory location; the destination operand must be an MMX technology register.
Figure 3-59. Operation of the PANDN Instruction
Operation
DEST ← (NOT DEST) AND SRC;
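Note that PANDN inverts the destination, not the source. A common use is selecting bits from two values under a mask. The sketch below models this in plain C; the names pandn_model and select_by_mask are hypothetical, and a uint64_t stands in for the MMX register.

```c
#include <stdint.h>

/* Scalar model of PANDN: (NOT dest) AND src. */
uint64_t pandn_model(uint64_t dest, uint64_t src)
{
    return ~dest & src;
}

/* Bitwise select (illustrative idiom, not from the manual):
   where a mask bit is 1 take the bit from a, otherwise from b.
   Corresponds to one AND, one AND NOT, and one OR. */
uint64_t select_by_mask(uint64_t mask, uint64_t a, uint64_t b)
{
    return (mask & a) | pandn_model(mask, b);
}
```

For example, select_by_mask(0xFFFF0000FFFF0000, 0x1111222233334444, 0xAAAABBBBCCCCDDDD) gives 0x1111BBBB3333DDDD.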
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_pandn(__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_andnot_si64(__m64 m1, __m64 m2)
Perform a logical NOT on the 64-bit value in m1 and use the result in a bitwise AND with the 64-bit value in m2.
Flags Affected
None.
Protected Mode Exceptions
#GP(0)            If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#UD               If EM in CR0 is set.
#NM               If TS in CR0 is set.
#MF               If there is a pending FPU exception.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
PAVGB/PAVGW - Packed Average
Opcode       Instruction             Description
0F,E0, /r    PAVGB mm1, mm2/m64      Average with rounding packed unsigned bytes from MM2/Mem to packed bytes in MM1 register.
0F,E3, /r    PAVGW mm1, mm2/m64      Average with rounding packed unsigned words from MM2/Mem to packed words in MM1 register.
Description The PAVG instructions add the unsigned data elements of the source operand to the unsigned data elements of the destination register, along with a carry-in. The results of the add are then each independently right-shifted by one bit position. The high order bits of each element are filled with the carry bits of the corresponding sum. The destination operand is an MMX technology register. The source operand can either be an MMX technology register or a 64-bit memory operand. The PAVGB instruction operates on packed unsigned bytes, and the PAVGW instruction operates on packed unsigned words.
Figure 3-60. Operation of the PAVGB/PAVGW Instruction
Operation
IF (* instruction = PAVGB *) THEN
    X[0] = DEST[7-0];      Y[0] = SRC/m64[7-0];
    X[1] = DEST[15-8];     Y[1] = SRC/m64[15-8];
    X[2] = DEST[23-16];    Y[2] = SRC/m64[23-16];
    X[3] = DEST[31-24];    Y[3] = SRC/m64[31-24];
    X[4] = DEST[39-32];    Y[4] = SRC/m64[39-32];
    X[5] = DEST[47-40];    Y[5] = SRC/m64[47-40];
    X[6] = DEST[55-48];    Y[6] = SRC/m64[55-48];
    X[7] = DEST[63-56];    Y[7] = SRC/m64[63-56];
    I = 0;
    WHILE (I < 8)
        TEMP[I] = ZERO_EXT(X[I], 8) + ZERO_EXT(Y[I], 8);
        RES[I] = (TEMP[I] + 1) >> 1;
        I = I + 1;
    ENDWHILE
    DEST[7-0] = RES[0];
    DEST[15-8] = RES[1];
    DEST[23-16] = RES[2];
    DEST[31-24] = RES[3];
    DEST[39-32] = RES[4];
    DEST[47-40] = RES[5];
    DEST[55-48] = RES[6];
    DEST[63-56] = RES[7];
ELSE IF (* instruction PAVGW *) THEN
    X[0] = DEST[15-0];     Y[0] = SRC/m64[15-0];
    X[1] = DEST[31-16];    Y[1] = SRC/m64[31-16];
    X[2] = DEST[47-32];    Y[2] = SRC/m64[47-32];
    X[3] = DEST[63-48];    Y[3] = SRC/m64[63-48];
    I = 0;
    WHILE (I < 4)
        TEMP[I] = ZERO_EXT(X[I], 16) + ZERO_EXT(Y[I], 16);
        RES[I] = (TEMP[I] + 1) >> 1;
        I = I + 1;
    ENDWHILE
    DEST[15-0] = RES[0];
    DEST[31-16] = RES[1];
    DEST[47-32] = RES[2];
    DEST[63-48] = RES[3];
FI;
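The rounding average, (x + y + 1) >> 1 computed at wider precision, can be sketched in portable C. The function name pavgb_model and the uint64_t register stand-in are assumptions for illustration only.

```c
#include <stdint.h>

/* Scalar model of PAVGB (hypothetical helper): per-byte unsigned
   average with rounding. The sum is formed in a wider type so the
   ninth bit (the carry) is not lost before the shift. */
uint64_t pavgb_model(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 8; lane++) {
        unsigned x = (uint8_t)(dest >> (8 * lane));
        unsigned y = (uint8_t)(src >> (8 * lane));
        unsigned avg = (x + y + 1) >> 1;    /* round half up */
        result |= (uint64_t)avg << (8 * lane);
    }
    return result;
}
```

For example, pavgb_model(0x01FF00, 0x02FFFF) gives 0x02FF80: the average of 1 and 2 rounds up to 2, the average of FFH and FFH stays FFH, and the average of 0 and FFH is 80H.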
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_pavgb(__m64 a, __m64 b)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_avg_pu8(__m64 a, __m64 b)
Performs the packed average on the eight 8-bit values of the two operands.
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_pavgw(__m64 a, __m64 b)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_avg_pu16(__m64 a, __m64 b)
Performs the packed average on the four 16-bit values of the two operands.
Numeric Exceptions
None.
Protected Mode Exceptions
#GP(0)            For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)            For an illegal address in the SS segment.
#PF(fault-code)   For a page fault.
#UD               If CR0.EM = 1.
#NM               If TS bit in CR0 is set.
#MF               If there is a pending FPU exception.
#AC               For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).
Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#PF(fault-code)   For a page fault.
#AC               For unaligned memory references (if the current privilege level is 3).
PCMPEQB/PCMPEQW/PCMPEQD - Packed Compare for Equal
Opcode      Instruction             Description
0F 74 /r    PCMPEQB mm, mm/m64      Compare packed bytes in mm/m64 with packed bytes in mm for equality.
0F 75 /r    PCMPEQW mm, mm/m64      Compare packed words in mm/m64 with packed words in mm for equality.
0F 76 /r    PCMPEQD mm, mm/m64      Compare packed doublewords in mm/m64 with packed doublewords in mm for equality.
Description These instructions compare the individual data elements (bytes, words, or doublewords) in the destination operand (first operand) to the corresponding data elements in the source operand (second operand) (refer to Figure 3-61). If a pair of data elements are equal, the corresponding data element in the destination operand is set to all ones; otherwise, it is set to all zeroes. The destination operand must be an MMX technology register; the source operand may be either an MMX technology register or a 64-bit memory location.
PCMPEQW mm, mm/m64
mm        0000000000000000   0000000000000001   0000000000000111   0111000111000111
==
mm/m64    0000000000000000   0000000000000000   0111000111000111   0111000111000111
          True               False              False              True
mm        1111111111111111   0000000000000000   0000000000000000   1111111111111111
Figure 3-61. Operation of the PCMPEQW Instruction
The PCMPEQB instruction compares the bytes in the destination operand to the corresponding bytes in the source operand, with the bytes in the destination operand being set according to the results. The PCMPEQW instruction compares the words in the destination operand to the corresponding words in the source operand, with the words in the destination operand being set according to the results. The PCMPEQD instruction compares the doublewords in the destination operand to the corresponding doublewords in the source operand, with the doublewords in the destination operand being set according to the results.
Operation
IF instruction is PCMPEQB
    THEN
        IF DEST(7..0) = SRC(7..0)
            THEN DEST(7..0) ← FFH;
            ELSE DEST(7..0) ← 0;
        * Continue comparison of second through seventh bytes in DEST and SRC *
        IF DEST(63..56) = SRC(63..56)
            THEN DEST(63..56) ← FFH;
            ELSE DEST(63..56) ← 0;
ELSE IF instruction is PCMPEQW
    THEN
        IF DEST(15..0) = SRC(15..0)
            THEN DEST(15..0) ← FFFFH;
            ELSE DEST(15..0) ← 0;
        * Continue comparison of second and third words in DEST and SRC *
        IF DEST(63..48) = SRC(63..48)
            THEN DEST(63..48) ← FFFFH;
            ELSE DEST(63..48) ← 0;
ELSE (* instruction is PCMPEQD *)
        IF DEST(31..0) = SRC(31..0)
            THEN DEST(31..0) ← FFFFFFFFH;
            ELSE DEST(31..0) ← 0;
        IF DEST(63..32) = SRC(63..32)
            THEN DEST(63..32) ← FFFFFFFFH;
            ELSE DEST(63..32) ← 0;
FI;
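The compare instructions produce a per-element mask rather than a flag. A minimal scalar sketch of the word form in C, with the hypothetical name pcmpeqw_model and a uint64_t standing in for the MMX register:

```c
#include <stdint.h>

/* Scalar model of PCMPEQW (hypothetical helper): each 16-bit lane of
   the result is FFFFH where the lanes are equal, and 0 otherwise. */
uint64_t pcmpeqw_model(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint16_t a = (uint16_t)(dest >> (16 * lane));
        uint16_t b = (uint16_t)(src >> (16 * lane));
        if (a == b)
            result |= (uint64_t)0xFFFF << (16 * lane);
    }
    return result;
}
```

With the operands of Figure 3-61, pcmpeqw_model(0x00000001000771C7, 0x0000000071C771C7) gives 0xFFFF00000000FFFF. Such a mask is typically combined with the logical instructions to select elements without branching.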
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_pcmpeqb (__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_cmpeq_pi8 (__m64 m1, __m64 m2)
If the respective 8-bit values in m1 are equal to the respective 8-bit values in m2, set the respective 8-bit resulting values to all ones; otherwise set them to all zeroes.
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_pcmpeqw (__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_cmpeq_pi16 (__m64 m1, __m64 m2)
If the respective 16-bit values in m1 are equal to the respective 16-bit values in m2, set the respective 16-bit resulting values to all ones; otherwise set them to all zeroes.
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_pcmpeqd (__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_cmpeq_pi32 (__m64 m1, __m64 m2)
If the respective 32-bit values in m1 are equal to the respective 32-bit values in m2, set the respective 32-bit resulting values to all ones; otherwise set them to all zeroes.
Flags Affected
None.
PCMPGTB/PCMPGTW/PCMPGTD - Packed Compare for Greater Than
Opcode      Instruction             Description
0F 64 /r    PCMPGTB mm, mm/m64      Compare packed bytes in mm with packed bytes in mm/m64 for greater value.
0F 65 /r    PCMPGTW mm, mm/m64      Compare packed words in mm with packed words in mm/m64 for greater value.
0F 66 /r    PCMPGTD mm, mm/m64      Compare packed doublewords in mm with packed doublewords in mm/m64 for greater value.
Description These instructions compare the individual signed data elements (bytes, words, or doublewords) in the destination operand (first operand) to the corresponding signed data elements in the source operand (second operand) (refer to Figure 3-62). If a data element in the destination operand is greater than its corresponding data element in the source operand, the data element in the destination operand is set to all ones; otherwise, it is set to all zeroes. The destination operand must be an MMX technology register; the source operand may be either an MMX technology register or a 64-bit memory location.
PCMPGTW mm, mm/m64
mm        0000000000000000   0000000000000001   0000000000000111   0111000111000111
>
mm/m64    0000000000000000   0000000000000000   0111000111000111   0111000111000111
          False              True               False              False
mm        0000000000000000   1111111111111111   0000000000000000   0000000000000000
Figure 3-62. Operation of the PCMPGTW Instruction
The PCMPGTB instruction compares the signed bytes in the destination operand to the corresponding signed bytes in the source operand, with the bytes in the destination operand being set according to the results. The PCMPGTW instruction compares the signed words in the destination operand to the corresponding signed words in the source operand, with the words in the destination operand being set according to the results. The PCMPGTD instruction compares the signed doublewords in the destination operand to the corresponding signed doublewords in the source operand, with the doublewords in the destination operand being set according to the results.
Operation
IF instruction is PCMPGTB
    THEN
        IF DEST(7..0) > SRC(7..0)
            THEN DEST(7..0) ← FFH;
            ELSE DEST(7..0) ← 0;
        * Continue comparison of second through seventh bytes in DEST and SRC *
        IF DEST(63..56) > SRC(63..56)
            THEN DEST(63..56) ← FFH;
            ELSE DEST(63..56) ← 0;
ELSE IF instruction is PCMPGTW
    THEN
        IF DEST(15..0) > SRC(15..0)
            THEN DEST(15..0) ← FFFFH;
            ELSE DEST(15..0) ← 0;
        * Continue comparison of second and third words in DEST and SRC *
        IF DEST(63..48) > SRC(63..48)
            THEN DEST(63..48) ← FFFFH;
            ELSE DEST(63..48) ← 0;
ELSE (* instruction is PCMPGTD *)
        IF DEST(31..0) > SRC(31..0)
            THEN DEST(31..0) ← FFFFFFFFH;
            ELSE DEST(31..0) ← 0;
        IF DEST(63..32) > SRC(63..32)
            THEN DEST(63..32) ← FFFFFFFFH;
            ELSE DEST(63..32) ← 0;
FI;
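Because the greater-than comparison is signed, a word such as 8000H compares as the most negative value, not as a large positive one. The following scalar C sketch illustrates this; pcmpgtw_model is a hypothetical name, a uint64_t stands in for the MMX register, and the casts assume two's complement conversion.

```c
#include <stdint.h>

/* Scalar model of PCMPGTW (hypothetical helper): signed per-word
   compare; a lane becomes FFFFH where dest > src, and 0 otherwise. */
uint64_t pcmpgtw_model(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        /* Assumes two's complement conversion from uint16_t to int16_t. */
        int16_t a = (int16_t)(uint16_t)(dest >> (16 * lane));
        int16_t b = (int16_t)(uint16_t)(src >> (16 * lane));
        if (a > b)
            result |= (uint64_t)0xFFFF << (16 * lane);
    }
    return result;
}
```

With the operands of Figure 3-62, pcmpgtw_model(0x00000001000771C7, 0x0000000071C771C7) gives 0x0000FFFF00000000. Comparing 8000H against 0001H in the low word yields 0, since 8000H is -32768 as a signed word.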
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_pcmpgtb (__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_cmpgt_pi8 (__m64 m1, __m64 m2)
If the respective 8-bit values in m1 are greater than the respective 8-bit values in m2, set the respective 8-bit resulting values to all ones; otherwise set them to all zeroes.
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_pcmpgtw (__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_cmpgt_pi16 (__m64 m1, __m64 m2)
If the respective 16-bit values in m1 are greater than the respective 16-bit values in m2, set the respective 16-bit resulting values to all ones; otherwise set them to all zeroes.
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_pcmpgtd (__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_cmpgt_pi32 (__m64 m1, __m64 m2)
If the respective 32-bit values in m1 are greater than the respective 32-bit values in m2, set the respective 32-bit resulting values to all ones; otherwise set them to all zeroes.
Flags Affected
None.
PEXTRW - Extract Word
Opcode           Instruction              Description
0F,C5, /r, ib    PEXTRW r32, mm, imm8     Extract the word pointed to by imm8 from MM and move it to a 32-bit integer register.
Description The PEXTRW instruction moves the word in MM (selected by the two least significant bits of imm8) to the lower half of a 32-bit integer register.
Figure 3-63. Operation of the PEXTRW Instruction (example shown: PEXTRW r32, mm1, 0x09)
Operation
SEL = imm8 AND 0X3;
MM_TEMP = (SRC >> (SEL * 16)) AND 0XFFFF;
r32[15-0] = MM_TEMP[15-0];
r32[31-16] = 0X0000;
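The selection logic translates directly into C. This is an illustrative scalar sketch; the name pextrw_model and the uint64_t register stand-in are assumptions, not part of the manual.

```c
#include <stdint.h>

/* Scalar model of PEXTRW (hypothetical helper): only the two low bits
   of imm8 select the word; the upper half of the result is zero. */
uint32_t pextrw_model(uint64_t mm, unsigned imm8)
{
    unsigned sel = imm8 & 0x3;
    return (uint32_t)((mm >> (sel * 16)) & 0xFFFF);
}
```

With an immediate of 0x09, as in the figure's example instruction, the selector is 1, so pextrw_model(0x1111222233334444, 0x09) returns 0x00003333.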
Pre-4.0 Intel C/C++ Compiler intrinsic:
int _m_pextrw(__m64 a, int n)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
int _mm_extract_pi16(__m64 a, int n)
Extracts one of the four words of a. The selector n must be an immediate.
Numeric Exceptions
None.
Protected Mode Exceptions
#GP(0)            For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)            For an illegal address in the SS segment.
#PF(fault-code)   For a page fault.
#UD               If CR0.EM = 1.
#NM               If TS bit in CR0 is set.
#MF               If there is a pending FPU exception.
PINSRW - Insert Word
Opcode         Instruction                  Description
0F,C4,/r,ib    PINSRW mm, r32/m16, imm8     Insert the word from the lower half of r32 or from Mem16 into the position in MM pointed to by imm8 without touching the other words.
Description The PINSRW instruction loads a word from the lower half of a 32-bit integer register (or from memory) and inserts it in the MM destination register, at a position defined by the two least significant bits of the imm8 constant. The insertion is done in such a way that the three other words from the destination register are left untouched.
Figure 3-64. Operation of the PINSRW Instruction (example shown: PINSRW mm1, r32/m16, 0x0A)
Operation
SEL = imm8 AND 0X3;
IF (SEL = 0) THEN MASK = 0X000000000000FFFF;
ELSE IF (SEL = 1) THEN MASK = 0X00000000FFFF0000;
ELSE IF (SEL = 2) THEN MASK = 0X0000FFFF00000000;
ELSE MASK = 0XFFFF000000000000;
FI FI FI
DEST = (DEST AND NOT MASK) OR ((m16/r32[15-0] << (SEL * 16)) AND MASK);
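The mask-and-merge step can be sketched in C by deriving the mask from the selector instead of enumerating each case. The name pinsrw_model and the uint64_t register stand-in are assumptions for illustration.

```c
#include <stdint.h>

/* Scalar model of PINSRW (hypothetical helper): replaces the word
   chosen by the two low bits of imm8 with the low 16 bits of r32 and
   leaves the other three words untouched. */
uint64_t pinsrw_model(uint64_t mm, uint32_t r32, unsigned imm8)
{
    unsigned sel = imm8 & 0x3;
    uint64_t mask = (uint64_t)0xFFFF << (sel * 16);
    uint64_t word = (uint64_t)(r32 & 0xFFFF) << (sel * 16);
    return (mm & ~mask) | word;
}
```

With an immediate of 0x0A the selector is 2, so pinsrw_model(0x1111222233334444, 0xABCD985F, 0x0A) gives 0x1111985F33334444; the upper half of r32 is ignored.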
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_pinsrw(__m64 a, int d, int n)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_insert_pi16(__m64 a, int d, int n)
Inserts word d into one of four words of a. The selector n must be an immediate.
Numeric Exceptions
None.
Protected Mode Exceptions
#GP(0)            For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)            For an illegal address in the SS segment.
#PF(fault-code)   For a page fault.
#UD               If CR0.EM = 1.
#NM               If TS bit in CR0 is set.
#MF               If there is a pending FPU exception.
#AC               For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).
PMADDWD - Packed Multiply and Add
Opcode      Instruction             Description
0F F5 /r    PMADDWD mm, mm/m64      Multiply the packed words in mm by the packed words in mm/m64. Add the 32-bit pairs of results and store in mm as doublewords.
Description This instruction multiplies the individual signed words of the destination operand by the corresponding signed words of the source operand, producing four signed, doubleword results (refer to Figure 3-65). The two doubleword results from the multiplication of the high-order words are added together and stored in the upper doubleword of the destination operand; the two doubleword results from the multiplication of the low-order words are added together and stored in the lower doubleword of the destination operand. The destination operand must be an MMX technology register; the source operand may be either an MMX technology register or a 64-bit memory location. The PMADDWD instruction wraps around to 80000000H only when all four words of both the source and destination operands are 8000H.
PMADDWD mm, mm/m64
mm        0111000111000111   0111000111000111
*
mm/m64    1000000000000000   0000010000000000
=
mm        11001000111000111001110000000000   (the two products summed into one doubleword)
Figure 3-65. Operation of the PMADDWD Instruction
Operation
DEST(31..0) ← (DEST(15..0) * SRC(15..0)) + (DEST(31..16) * SRC(31..16));
DEST(63..32) ← (DEST(47..32) * SRC(47..32)) + (DEST(63..48) * SRC(63..48));
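The multiply-and-add step can be modeled in portable C. In this sketch the name pmaddwd_model and the uint64_t register stand-in are assumptions; the pair sum is formed in 64 bits and then truncated so that the one overflowing case wraps without relying on undefined signed overflow in C.

```c
#include <stdint.h>

/* Scalar model of PMADDWD (hypothetical helper): multiplies four
   signed word pairs and sums adjacent products into two doublewords. */
uint64_t pmaddwd_model(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;
    for (int pair = 0; pair < 2; pair++) {
        int64_t sum = 0;
        for (int i = 0; i < 2; i++) {
            int lane = 2 * pair + i;
            /* Assumes two's complement conversion to int16_t. */
            int16_t a = (int16_t)(uint16_t)(dest >> (16 * lane));
            int16_t b = (int16_t)(uint16_t)(src >> (16 * lane));
            sum += (int64_t)a * (int64_t)b;
        }
        /* Truncate to 32 bits; conversion to unsigned is modular. */
        result |= (uint64_t)(uint32_t)sum << (32 * pair);
    }
    return result;
}
```

With the operands of Figure 3-65 in the low doubleword, pmaddwd_model(0x71C771C7, 0x80000400) gives 0xC8E39C00. When every word of both operands is 8000H, each doubleword of the result wraps to 80000000H.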
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_pmaddwd(__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_madd_pi16(__m64 m1, __m64 m2)
Multiply four 16-bit values in m1 by four 16-bit values in m2 producing four 32-bit intermediate results, which are then summed by pairs to produce two 32-bit results.
Flags Affected
None.
Protected Mode Exceptions
#GP(0)            If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#UD               If EM in CR0 is set.
#NM               If TS in CR0 is set.
#MF               If there is a pending FPU exception.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
PMAXSW - Packed Signed Integer Word Maximum
Opcode       Instruction              Description
0F,EE, /r    PMAXSW mm1, mm2/m64      Return the maximum words between MM2/Mem and MM1.
Description The PMAXSW instruction returns the maximum between the four signed words in MM1 and MM2/Mem.
Figure 3-66. Operation of the PMAXSW Instruction
Operation
IF DEST[15-0] > SRC/m64[15-0] THEN DEST[15-0] = DEST[15-0]; ELSE DEST[15-0] = SRC/m64[15-0]; FI
IF DEST[31-16] > SRC/m64[31-16] THEN DEST[31-16] = DEST[31-16]; ELSE DEST[31-16] = SRC/m64[31-16]; FI
IF DEST[47-32] > SRC/m64[47-32] THEN DEST[47-32] = DEST[47-32]; ELSE DEST[47-32] = SRC/m64[47-32]; FI
IF DEST[63-48] > SRC/m64[63-48] THEN DEST[63-48] = DEST[63-48]; ELSE DEST[63-48] = SRC/m64[63-48]; FI
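The element-wise signed maximum can be sketched in portable C. The name pmaxsw_model and the uint64_t register stand-in are assumptions for illustration, and the casts assume two's complement conversion.

```c
#include <stdint.h>

/* Scalar model of PMAXSW (hypothetical helper): per-word signed
   maximum over four 16-bit lanes. */
uint64_t pmaxsw_model(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        /* Assumes two's complement conversion from uint16_t to int16_t. */
        int16_t a = (int16_t)(uint16_t)(dest >> (16 * lane));
        int16_t b = (int16_t)(uint16_t)(src >> (16 * lane));
        int16_t m = (a > b) ? a : b;
        result |= (uint64_t)(uint16_t)m << (16 * lane);
    }
    return result;
}
```

For example, pmaxsw_model(0x80000005FFFF7FFF, 0x0001000400008000) gives 0x0001000500007FFF: 8000H and FFFFH lose to 0001H and 0000H because they are negative as signed words.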
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_pmaxsw(__m64 a, __m64 b)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_max_pi16(__m64 a, __m64 b)
Computes the element-wise maximum of the words in a and b.
Numeric Exceptions
None.
Protected Mode Exceptions
#GP(0)            For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)            For an illegal address in the SS segment.
#PF(fault-code)   For a page fault.
#UD               If CR0.EM = 1.
#NM               If TS bit in CR0 is set.
#MF               If there is a pending FPU exception.
#AC               For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).
Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#PF(fault-code)   For a page fault.
#AC               For unaligned memory reference if the current privilege level is 3.
PMAXUB - Packed Unsigned Integer Byte Maximum
Opcode       Instruction              Description
0F,DE, /r    PMAXUB mm1, mm2/m64      Return the maximum bytes between MM2/Mem and MM1.
Description
The PMAXUB instruction returns the maximum between the eight unsigned bytes in MM1 and MM2/Mem.
PMAXUB mm1, mm2/m64
mm1        59   46   40   87  187   55  221   27
mm2/m64    24   65   11  101   78  207  111   36
=
mm1        59   65   40  101  187  207  221   36
Figure 3-67. Operation of the PMAXUB Instruction
Operation
IF DEST[7-0] > SRC/m64[7-0] THEN DEST[7-0] = DEST[7-0]; ELSE DEST[7-0] = SRC/m64[7-0]; FI
IF DEST[15-8] > SRC/m64[15-8] THEN DEST[15-8] = DEST[15-8]; ELSE DEST[15-8] = SRC/m64[15-8]; FI
IF DEST[23-16] > SRC/m64[23-16] THEN DEST[23-16] = DEST[23-16]; ELSE DEST[23-16] = SRC/m64[23-16]; FI
IF DEST[31-24] > SRC/m64[31-24] THEN DEST[31-24] = DEST[31-24]; ELSE DEST[31-24] = SRC/m64[31-24]; FI
IF DEST[39-32] > SRC/m64[39-32] THEN DEST[39-32] = DEST[39-32]; ELSE DEST[39-32] = SRC/m64[39-32]; FI
IF DEST[47-40] > SRC/m64[47-40] THEN DEST[47-40] = DEST[47-40]; ELSE DEST[47-40] = SRC/m64[47-40]; FI
IF DEST[55-48] > SRC/m64[55-48] THEN DEST[55-48] = DEST[55-48]; ELSE DEST[55-48] = SRC/m64[55-48]; FI
IF DEST[63-56] > SRC/m64[63-56] THEN DEST[63-56] = DEST[63-56]; ELSE DEST[63-56] = SRC/m64[63-56]; FI
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_pmaxub(__m64 a, __m64 b)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_max_pu8(__m64 a, __m64 b)
Computes the element-wise maximum of the unsigned bytes in a and b.
Numeric Exceptions
None.
Protected Mode Exceptions
#GP(0)            For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)            For an illegal address in the SS segment.
#PF(fault-code)   For a page fault.
#UD               If CR0.EM = 1.
#NM               If TS bit in CR0 is set.
#MF               If there is a pending FPU exception.
#AC               For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).
Real Address Mode Exceptions
Interrupt 13      If any part of the operand would lie outside of the effective address space from 0 to 0FFFFH.
#UD               If CR0.EM = 1.
#NM               If TS bit in CR0 is set.
#MF               If there is a pending FPU exception.
PMINSW - Packed Signed Integer Word Minimum
Opcode       Instruction              Description
0F,EA, /r    PMINSW mm1, mm2/m64      Return the minimum words between MM2/Mem and MM1.
Description The PMINSW instruction returns the minimum between the four signed words in MM1 and MM2/Mem.
Figure 3-68. Operation of the PMINSW Instruction
Operation
IF DEST[15-0] < SRC/m64[15-0] THEN DEST[15-0] = DEST[15-0]; ELSE DEST[15-0] = SRC/m64[15-0]; FI
IF DEST[31-16] < SRC/m64[31-16] THEN DEST[31-16] = DEST[31-16]; ELSE DEST[31-16] = SRC/m64[31-16]; FI
IF DEST[47-32] < SRC/m64[47-32] THEN DEST[47-32] = DEST[47-32]; ELSE DEST[47-32] = SRC/m64[47-32]; FI
IF DEST[63-48] < SRC/m64[63-48] THEN DEST[63-48] = DEST[63-48]; ELSE DEST[63-48] = SRC/m64[63-48]; FI
__m64 _m_pminsw(__m64 a, __m64 b)

__m64 _mm_min_pi16(__m64 a, __m64 b)
Computes the element-wise minimum of the words in a and b. Numeric Exceptions None.
3-515
PMINSWPacked Signed Integer Word Minimum (Continued)

Protected Mode Exceptions #GP(0) #SS(0) #PF (fault-code) #UD #NM #MF #AC For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments. For an illegal address in the SS segment. For a page fault. If CR0.EM = 1. If TS bit in CR0 is set. If there is a pending FPU exception. For unaligned memory reference. To enable #AC exceptions, three conditions must be true(CR0.AM is set; EFLAGS.AC is set; current CPL is 3)

PMINUB - Packed Unsigned Integer Byte Minimum

Opcode      Instruction            Description
0F,DA,/r    PMINUB mm1, mm2/m64    Return the minimum bytes between MM2/Mem and MM1.

Description
The PMINUB instruction returns the minimum between the eight unsigned bytes in MM1 and MM2/Mem.

PMINUB mm1, mm2/m64

mm1 (before)    59   46   40   87  187   55  221   27
mm2/m64         24   65   11  101   78  207  111   36
mm1 (after)     24   46   11   87   78   55  111   27

Figure 3-69. Operation of the PMINUB Instruction

Operation
IF (DEST[7-0] < SRC/m64[7-0]) THEN DEST[7-0] = DEST[7-0]; ELSE DEST[7-0] = SRC/m64[7-0]; FI
IF (DEST[15-8] < SRC/m64[15-8]) THEN DEST[15-8] = DEST[15-8]; ELSE DEST[15-8] = SRC/m64[15-8]; FI
IF (DEST[23-16] < SRC/m64[23-16]) THEN DEST[23-16] = DEST[23-16]; ELSE DEST[23-16] = SRC/m64[23-16]; FI
IF (DEST[31-24] < SRC/m64[31-24]) THEN DEST[31-24] = DEST[31-24]; ELSE DEST[31-24] = SRC/m64[31-24]; FI
IF (DEST[39-32] < SRC/m64[39-32]) THEN DEST[39-32] = DEST[39-32]; ELSE DEST[39-32] = SRC/m64[39-32]; FI
IF (DEST[47-40] < SRC/m64[47-40]) THEN DEST[47-40] = DEST[47-40]; ELSE DEST[47-40] = SRC/m64[47-40]; FI
IF (DEST[55-48] < SRC/m64[55-48]) THEN DEST[55-48] = DEST[55-48]; ELSE DEST[55-48] = SRC/m64[55-48]; FI
IF (DEST[63-56] < SRC/m64[63-56]) THEN DEST[63-56] = DEST[63-56]; ELSE DEST[63-56] = SRC/m64[63-56]; FI
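
As a rough illustration of the listing above, the following C sketch applies the same byte-wise unsigned minimum to the operand values printed in Figure 3-69. The function names pminub_model and pminub_example are hypothetical, and the operands are modeled as plain byte arrays rather than MMX registers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of PMINUB: each of the eight unsigned bytes in dest is
   replaced by the corresponding source byte when the source is smaller. */
static void pminub_model(uint8_t dest[8], const uint8_t src[8])
{
    for (int i = 0; i < 8; i++) {
        if (src[i] < dest[i])
            dest[i] = src[i];
    }
}

/* Reproduces the byte values shown in Figure 3-69. */
static void pminub_example(void)
{
    uint8_t mm1[8] = {59, 46, 40, 87, 187, 55, 221, 27};
    const uint8_t mm2[8] = {24, 65, 11, 101, 78, 207, 111, 36};
    const uint8_t expected[8] = {24, 46, 11, 87, 78, 55, 111, 27};

    pminub_model(mm1, mm2);
    assert(memcmp(mm1, expected, sizeof expected) == 0);
}
```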

__m64 _m_pminub(__m64 a, __m64 b)

__m64 _mm_min_pu8(__m64 a, __m64 b)
Computes the element-wise minimum of the unsigned bytes in a and b.

Numeric Exceptions
None.

Protected Mode Exceptions
#GP(0)            For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)            For an illegal address in the SS segment.
#PF(fault-code)   For a page fault.
#UD               If CR0.EM = 1.
#NM               If TS bit in CR0 is set.
#MF               If there is a pending FPU exception.
#AC               For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).

PMOVMSKB - Move Byte Mask To Integer

Opcode      Instruction          Description
0F,D7,/r    PMOVMSKB r32, mm     Move the byte mask of MM to r32.

Description
The PMOVMSKB instruction returns an 8-bit mask formed of the most significant bits of each byte of its source operand.

Figure 3-70. Operation of the PMOVMSKB Instruction
Operation
r32[7] = SRC[63];
r32[6] = SRC[55];
r32[5] = SRC[47];
r32[4] = SRC[39];
r32[3] = SRC[31];
r32[2] = SRC[23];
r32[1] = SRC[15];
r32[0] = SRC[7];
r32[31-8] = 0X000000;

int _m_pmovmskb(__m64 a)

int _mm_movemask_pi8(__m64 a)
Creates an 8-bit mask from the most significant bits of the bytes in a.
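
A minimal C sketch of the mask construction, assuming the 64-bit source is held in an ordinary integer with byte 0 in bits 7-0. The name pmovmskb_model is hypothetical and the code only models the bit gathering described in the Operation section.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of PMOVMSKB: bit i of the result is the most significant
   bit of byte i of the source; bits 31-8 of the result stay zero. */
static uint32_t pmovmskb_model(uint64_t src)
{
    uint32_t mask = 0;
    for (int i = 0; i < 8; i++) {
        uint32_t msb = (uint32_t)((src >> (i * 8 + 7)) & 1u);
        mask |= msb << i;
    }
    return mask;
}

/* Bytes 7, 3, and 0 have their top bit set, so bits 7, 3, and 0 of
   the mask are set. */
static void pmovmskb_example(void)
{
    assert(pmovmskb_model(0x80007F00FF000180ull) == 0x89u);
}
```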

Numeric Exceptions
None.

Protected Mode Exceptions
#GP(0)            For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)            For an illegal address in the SS segment.
#PF(fault-code)   For a page fault.
#UD               If CR0.EM = 1.
#NM               If TS bit in CR0 is set.
#MF               If there is a pending FPU exception.
#AC               For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).

Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#PF(fault-code)   For a page fault.
#AC               For unaligned memory reference if the current privilege level is 3.

PMULHUW - Packed Multiply High Unsigned

Opcode      Instruction             Description
0F,E4,/r    PMULHUW mm1, mm2/m64    Multiply the packed unsigned words in MM1 register with the packed unsigned words in MM2/Mem, then store the high-order 16 bits of the results in MM1.

Description
The PMULHUW instruction multiplies the four unsigned words in the destination operand with the four unsigned words in the source operand. The high-order 16 bits of the 32-bit intermediate results are written to the destination operand.

Figure 3-71. Operation of the PMULHUW Instruction
Operation
DEST[15-0] = (DEST[15-0] * SRC/m64[15-0])[31-16];
DEST[31-16] = (DEST[31-16] * SRC/m64[31-16])[31-16];
DEST[47-32] = (DEST[47-32] * SRC/m64[47-32])[31-16];
DEST[63-48] = (DEST[63-48] * SRC/m64[63-48])[31-16];
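
The high-word extraction can be sketched in C as follows. This is an illustrative scalar model under the assumption that each operand packs four unsigned 16-bit lanes into a 64-bit integer; pmulhuw_model and pmulhuw_example are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of PMULHUW: multiply matching unsigned 16-bit lanes into
   a 32-bit intermediate and keep bits 31-16 of each product. */
static uint64_t pmulhuw_model(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint32_t d = (uint16_t)(dest >> (lane * 16));
        uint32_t s = (uint16_t)(src >> (lane * 16));
        uint32_t product = d * s;            /* fits in 32 bits */
        result |= (uint64_t)(product >> 16) << (lane * 16);
    }
    return result;
}

/* 0xFFFF * 0xFFFF = 0xFFFE0001, so the top lane yields 0xFFFE. */
static void pmulhuw_example(void)
{
    uint64_t r = pmulhuw_model(0xFFFF800000020100ull, 0xFFFF000200030100ull);
    assert(r == 0xFFFE000100000001ull);
}
```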

__m64 _m_pmulhuw(__m64 a, __m64 b)

__m64 _mm_mulhi_pu16(__m64 a, __m64 b)
Multiplies the unsigned words in a and b, returning the upper 16 bits of the 32-bit intermediate results.

Numeric Exceptions
None.

Protected Mode Exceptions
#GP(0)            For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)            For an illegal address in the SS segment.
#PF(fault-code)   For a page fault.
#UD               If CR0.EM = 1.
#NM               If TS bit in CR0 is set.
#MF               If there is a pending FPU exception.
#AC               For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).

PMULHW - Packed Multiply High

Opcode      Instruction          Description
0F E5 /r    PMULHW mm, mm/m64    Multiply the signed packed words in mm by the signed packed words in mm/m64, then store the high-order word of each doubleword result in mm.
Description This instruction multiplies the four signed words of the source operand (second operand) by the four signed words of the destination operand (first operand), producing four signed, doubleword, intermediate results (refer to Figure 3-72). The high-order word of each intermediate result is then written to its corresponding word location in the destination operand. The destination operand must be an MMX technology register; the source operand may be either an MMX technology register or a 64-bit memory location.

PMULHW mm, mm/m64 (two of the four words shown; each result is the high-order word of the product)

mm (before)    0111000111000111    0111000111000111
mm/m64         1000000000000000    0000010000000000
mm (after)     1100011100011100    0000000111000111

Figure 3-72. Operation of the PMULHW Instruction
Operation
DEST(15..0) ← HighOrderWord(DEST(15..0) * SRC(15..0));
DEST(31..16) ← HighOrderWord(DEST(31..16) * SRC(31..16));
DEST(47..32) ← HighOrderWord(DEST(47..32) * SRC(47..32));
DEST(63..48) ← HighOrderWord(DEST(63..48) * SRC(63..48));
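
A small C sketch of the signed high-word multiply, checked against the two word values shown in Figure 3-72. The helper names mulhi_s16, pmulhw_model, and pmulhw_example are hypothetical; the model assumes the usual two's-complement conversion when a 16-bit pattern is reinterpreted as signed.

```c
#include <assert.h>
#include <stdint.h>

/* High-order word of one signed 16 x 16 -> 32-bit product, returned as
   a raw 16-bit pattern. */
static uint16_t mulhi_s16(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return (uint16_t)((uint32_t)product >> 16);
}

/* Scalar model of PMULHW over four signed 16-bit lanes. */
static uint64_t pmulhw_model(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        int16_t d = (int16_t)(uint16_t)(dest >> (lane * 16));
        int16_t s = (int16_t)(uint16_t)(src >> (lane * 16));
        result |= (uint64_t)mulhi_s16(d, s) << (lane * 16);
    }
    return result;
}

/* Figure 3-72 values: 0x71C7 * 0x8000 (that is, -32768) gives high word
   0xC71C, and 0x71C7 * 0x0400 gives high word 0x01C7. */
static void pmulhw_example(void)
{
    assert(pmulhw_model(0x71C771C7ull, 0x80000400ull) == 0xC71C01C7ull);
}
```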

__m64 _m_pmulhw(__m64 m1, __m64 m2)

__m64 _mm_mulhi_pi16(__m64 m1, __m64 m2)
Multiply four signed 16-bit values in m1 by four signed 16-bit values in m2 and produce the high 16 bits of the four results.

Flags Affected
None.

Protected Mode Exceptions
#GP(0)            If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#UD               If EM in CR0 is set.
#NM               If TS in CR0 is set.
#MF               If there is a pending FPU exception.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

PMULLW - Packed Multiply Low

Opcode      Instruction          Description
0F D5 /r    PMULLW mm, mm/m64    Multiply the packed words in mm with the packed words in mm/m64, then store the low-order word of each doubleword result in mm.
Description This instruction multiplies the four signed or unsigned words of the source operand (second operand) with the four signed or unsigned words of the destination operand (first operand), producing four doubleword, intermediate results (refer to Figure 3-73). The low-order word of each intermediate result is then written to its corresponding word location in the destination operand. The destination operand must be an MMX technology register; the source operand may be either an MMX technology register or a 64-bit memory location.

PMULLW mm, mm/m64 (two of the four words shown; each result is the low-order word of the product)

mm (before)    0111000111000111    0111000111000111
mm/m64         1000000000000000    0000010000000000
mm (after)     1000000000000000    0001110000000000

Figure 3-73. Operation of the PMULLW Instruction
Operation
DEST(15..0) ← LowOrderWord(DEST(15..0) * SRC(15..0));
DEST(31..16) ← LowOrderWord(DEST(31..16) * SRC(31..16));
DEST(47..32) ← LowOrderWord(DEST(47..32) * SRC(47..32));
DEST(63..48) ← LowOrderWord(DEST(63..48) * SRC(63..48));
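
The low-word multiply can be modeled the same way. The sketch below is illustrative only (pmullw_model, mullo16, and pmullw_example are hypothetical names); it also checks, for one sample pair, that the low 16 bits come out the same whether the words are read as signed or unsigned, which is why the description allows either interpretation.

```c
#include <assert.h>
#include <stdint.h>

/* Low-order word of one 16 x 16 -> 32-bit product. The multiplication
   is done in uint32_t to avoid signed overflow in the promoted type. */
static uint16_t mullo16(uint16_t a, uint16_t b)
{
    return (uint16_t)((uint32_t)a * (uint32_t)b);
}

/* Scalar model of PMULLW over four 16-bit lanes. */
static uint64_t pmullw_model(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint16_t d = (uint16_t)(dest >> (lane * 16));
        uint16_t s = (uint16_t)(src >> (lane * 16));
        result |= (uint64_t)mullo16(d, s) << (lane * 16);
    }
    return result;
}

static void pmullw_example(void)
{
    /* Figure 3-73 values: low words 0x8000 and 0x1C00. */
    assert(pmullw_model(0x71C771C7ull, 0x80000400ull) == 0x80001C00ull);

    /* 0xFFFF read as -1: (-1) * 2 = -2, whose low word is 0xFFFE, the
       same pattern as the unsigned product 0x1FFFE truncated. */
    int32_t signed_product = (int32_t)(int16_t)-1 * 2;
    assert(mullo16(0xFFFF, 2) == (uint16_t)signed_product);
}
```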

__m64 _m_pmullw(__m64 m1, __m64 m2)

__m64 _mm_mullo_pi16(__m64 m1, __m64 m2)
Multiply four 16-bit values in m1 by four 16-bit values in m2 and produce the low 16 bits of the four results.

Flags Affected
None.

Protected Mode Exceptions
#GP(0)            If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#UD               If EM in CR0 is set.
#NM               If TS in CR0 is set.
#MF               If there is a pending FPU exception.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

POP - Pop a Value from the Stack

Opcode     Instruction    Description
8F /0      POP m16        Pop top of stack into m16; increment stack pointer
8F /0      POP m32        Pop top of stack into m32; increment stack pointer
58+ rw     POP r16        Pop top of stack into r16; increment stack pointer
58+ rd     POP r32        Pop top of stack into r32; increment stack pointer
1F         POP DS         Pop top of stack into DS; increment stack pointer
07         POP ES         Pop top of stack into ES; increment stack pointer
17         POP SS         Pop top of stack into SS; increment stack pointer
0F A1      POP FS         Pop top of stack into FS; increment stack pointer
0F A9      POP GS         Pop top of stack into GS; increment stack pointer
Description
This instruction loads the value from the top of the stack to the location specified with the destination operand and then increments the stack pointer. The destination operand can be a general-purpose register, memory location, or segment register.

The address-size attribute of the stack segment determines the stack pointer size (16 bits or 32 bits, the source address size), and the operand-size attribute of the current code segment determines the amount the stack pointer is incremented (two bytes or four bytes). For example, if these address- and operand-size attributes are 32, the 32-bit ESP register (stack pointer) is incremented by four and, if they are 16, the 16-bit SP register is incremented by two. (The B flag in the stack segment's segment descriptor determines the stack's address-size attribute, and the D flag in the current code segment's segment descriptor, along with prefixes, determines the operand-size attribute and also the address-size attribute of the destination operand.)

If the destination operand is one of the segment registers DS, ES, FS, GS, or SS, the value loaded into the register must be a valid segment selector. In protected mode, popping a segment selector into a segment register automatically causes the descriptor information associated with that segment selector to be loaded into the hidden (shadow) part of the segment register and causes the selector and the descriptor information to be validated (refer to the Operation section below).

A null value (0000-0003) may be popped into the DS, ES, FS, or GS register without causing a general protection fault. However, any subsequent attempt to reference a segment whose corresponding segment register is loaded with a null value causes a general protection exception (#GP). In this situation, no memory reference occurs and the saved value of the segment register is null.

The POP instruction cannot pop a value into the CS register. To load the CS register from the stack, use the RET instruction.

If the ESP register is used as a base register for addressing a destination operand in memory, the POP instruction computes the effective address of the operand after it increments the ESP register. For the case of a 16-bit stack where ESP wraps to 0h as a result of the POP instruction, the resulting location of the memory write is processor-family-specific.

The POP ESP instruction increments the stack pointer (ESP) before data at the old top of stack is written into the destination.

A POP SS instruction inhibits all interrupts, including the NMI interrupt, until after execution of the next instruction. This action allows sequential execution of POP SS and MOV ESP, EBP instructions without the danger of having an invalid stack during an interrupt (see note 1). However, use of the LSS instruction is the preferred method of loading the SS and ESP registers.

Operation
IF StackAddrSize = 32
    THEN
        IF OperandSize = 32
            THEN
                DEST ← SS:ESP; (* copy a doubleword *)
                ESP ← ESP + 4;
            ELSE (* OperandSize = 16 *)
                DEST ← SS:ESP; (* copy a word *)
                ESP ← ESP + 2;
        FI;
    ELSE (* StackAddrSize = 16 *)
        IF OperandSize = 16
            THEN
                DEST ← SS:SP; (* copy a word *)
                SP ← SP + 2;
            ELSE (* OperandSize = 32 *)
                DEST ← SS:SP; (* copy a doubleword *)
                SP ← SP + 4;
        FI;
FI;
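
The load-then-increment behavior for a 32-bit stack can be sketched in C. This is a simplified model under stated assumptions: the Stack32 type, the 64-byte memory array, and the functions pop32, pop16, and pop_example are all hypothetical, segment checks are reduced to a bounds assertion, and values are stored little-endian as on Intel Architecture processors.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flat model of a small stack segment with a 32-bit ESP. */
typedef struct {
    uint8_t memory[64];
    uint32_t esp;
} Stack32;

/* POP with OperandSize = 32: copy a doubleword, then ESP <- ESP + 4. */
static uint32_t pop32(Stack32 *st)
{
    assert(st->esp + 4 <= sizeof st->memory);   /* stands in for #SS(0) */
    uint32_t value = 0;
    for (int i = 0; i < 4; i++)
        value |= (uint32_t)st->memory[st->esp + i] << (8 * i);
    st->esp += 4;
    return value;
}

/* POP with OperandSize = 16: copy a word, then ESP <- ESP + 2. */
static uint16_t pop16(Stack32 *st)
{
    assert(st->esp + 2 <= sizeof st->memory);
    uint16_t value = (uint16_t)(st->memory[st->esp] |
                                (st->memory[st->esp + 1] << 8));
    st->esp += 2;
    return value;
}

static void pop_example(void)
{
    Stack32 st = { .memory = {0}, .esp = 56 };
    const uint8_t bytes[6] = {0x78, 0x56, 0x34, 0x12, 0xCD, 0xAB};
    for (int i = 0; i < 6; i++)
        st.memory[56 + i] = bytes[i];

    assert(pop32(&st) == 0x12345678u && st.esp == 60);
    assert(pop16(&st) == 0xABCDu && st.esp == 62);
}
```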
Loading a segment register while in protected mode results in special checks and actions, as described in the following listing. These checks are performed on the segment selector and the segment descriptor it points to.
IF SS is loaded;
    THEN
        IF segment selector is null
            THEN #GP(0);
        FI;
        IF segment selector index is outside descriptor table limits
            OR segment selector's RPL ≠ CPL
            OR segment is not a writable data segment
            OR DPL ≠ CPL
                THEN #GP(selector);
        FI;
        IF segment not marked present
            THEN #SS(selector);
            ELSE
                SS ← segment selector;
                SS ← segment descriptor;
        FI;
FI;
IF DS, ES, FS or GS is loaded with non-null selector;
    THEN
        IF segment selector index is outside descriptor table limits
            OR segment is not a data or readable code segment
            OR ((segment is a data or nonconforming code segment)
                AND (both RPL and CPL > DPL))
                    THEN #GP(selector);
        IF segment not marked present
            THEN #NP(selector);
            ELSE
                SegmentRegister ← segment selector;
                SegmentRegister ← segment descriptor;
        FI;
FI;
IF DS, ES, FS or GS is loaded with a null selector;
    THEN
        SegmentRegister ← segment selector;
        SegmentRegister ← segment descriptor;
FI;

1. Note that in a sequence of instructions that individually delay interrupts past the following instruction, only the first instruction in the sequence is guaranteed to delay the interrupt; subsequent interrupt-delaying instructions may not delay the interrupt. Thus, in the following instruction sequence:
    STI
    POP SS
    POP ESP
interrupts may be recognized before the POP ESP executes, because STI also delays interrupts for one instruction.

Protected Mode Exceptions
#GP(0)            If attempt is made to load SS register with null segment selector.
                  If the destination operand is in a nonwritable segment.
                  If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                  If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
#GP(selector)     If segment selector index is outside descriptor table limits.
                  If the SS register is being loaded and the segment selector's RPL and the segment descriptor's DPL are not equal to the CPL.
                  If the SS register is being loaded and the segment pointed to is a nonwritable data segment.
                  If the DS, ES, FS, or GS register is being loaded and the segment pointed to is not a data or readable code segment.
                  If the DS, ES, FS, or GS register is being loaded and the segment pointed to is a data or nonconforming code segment, but both the RPL and the CPL are greater than the DPL.
#SS(0)            If the current top of stack is not within the stack segment.
                  If a memory operand effective address is outside the SS segment limit.
#SS(selector)     If the SS register is being loaded and the segment pointed to is marked not present.
#NP               If the DS, ES, FS, or GS register is being loaded and the segment pointed to is marked not present.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If an unaligned memory reference is made while the current privilege level is 3 and alignment checking is enabled.

Real-Address Mode Exceptions
#GP               If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

Virtual-8086 Mode Exceptions
#GP(0)            If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If an unaligned memory reference is made while alignment checking is enabled.

POPA/POPAD - Pop All General-Purpose Registers

Opcode    Instruction    Description
61        POPA           Pop DI, SI, BP, BX, DX, CX, and AX
61        POPAD          Pop EDI, ESI, EBP, EBX, EDX, ECX, and EAX

Description
These instructions pop doublewords (POPAD) or words (POPA) from the stack into the general-purpose registers. The registers are loaded in the following order: EDI, ESI, EBP, EBX, EDX, ECX, and EAX (if the operand-size attribute is 32) and DI, SI, BP, BX, DX, CX, and AX (if the operand-size attribute is 16). These instructions reverse the operation of the PUSHA/PUSHAD instructions. The value on the stack for the ESP or SP register is ignored. Instead, the ESP or SP register is incremented after each register is loaded.

The POPA (pop all) and POPAD (pop all double) mnemonics reference the same opcode. The POPA instruction is intended for use when the operand-size attribute is 16 and the POPAD instruction for when the operand-size attribute is 32. Some assemblers may force the operand size to 16 when POPA is used and to 32 when POPAD is used (using the operand-size override prefix [66H] if necessary). Others may treat these mnemonics as synonyms (POPA/POPAD) and use the current setting of the operand-size attribute to determine the size of values to be popped from the stack, regardless of the mnemonic used. (The D flag in the current code segment's segment descriptor determines the operand-size attribute.)

Operation
IF OperandSize = 32 (* instruction = POPAD *)
    THEN
        EDI ← Pop();
        ESI ← Pop();
        EBP ← Pop();
        increment ESP by 4 (* skip next 4 bytes of stack *)
        EBX ← Pop();
        EDX ← Pop();
        ECX ← Pop();
        EAX ← Pop();
    ELSE (* OperandSize = 16, instruction = POPA *)
        DI ← Pop();
        SI ← Pop();
        BP ← Pop();
        increment ESP by 2 (* skip next 2 bytes of stack *)
        BX ← Pop();
        DX ← Pop();
        CX ← Pop();
        AX ← Pop();
FI;
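
The register order and the skipped ESP slot in the 32-bit case can be illustrated with a short C sketch. The Regs32 structure and the functions popad_model and popad_example are hypothetical, and the stack is modeled as an array of doublewords indexed from the current top rather than as byte-addressed memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register file holding only what POPAD loads. */
typedef struct {
    uint32_t edi, esi, ebp, ebx, edx, ecx, eax;
} Regs32;

/* Model of POPAD: 'top' is the index of the doubleword at the top of
   the stack; the return value is the new top after eight slots. */
static size_t popad_model(const uint32_t *stack, size_t top, Regs32 *r)
{
    r->edi = stack[top++];
    r->esi = stack[top++];
    r->ebp = stack[top++];
    top++;                      /* saved ESP value is skipped, not loaded */
    r->ebx = stack[top++];
    r->edx = stack[top++];
    r->ecx = stack[top++];
    r->eax = stack[top++];
    return top;
}

static void popad_example(void)
{
    /* Slot 3 holds the ignored ESP image left by a PUSHAD. */
    const uint32_t stack[8] = {1, 2, 3, 0xDEADu, 4, 5, 6, 7};
    Regs32 r = {0};

    size_t new_top = popad_model(stack, 0, &r);
    assert(new_top == 8);
    assert(r.edi == 1 && r.esi == 2 && r.ebp == 3);
    assert(r.ebx == 4 && r.edx == 5 && r.ecx == 6 && r.eax == 7);
}
```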

Flags Affected
None.

Protected Mode Exceptions
#SS(0)            If the starting or ending stack address is not within the stack segment.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If an unaligned memory reference is made while the current privilege level is 3 and alignment checking is enabled.

Real-Address Mode Exceptions
#SS               If the starting or ending stack address is not within the stack segment.

Virtual-8086 Mode Exceptions
#SS(0)            If the starting or ending stack address is not within the stack segment.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If an unaligned memory reference is made while alignment checking is enabled.

POPF/POPFD - Pop Stack into EFLAGS Register

Opcode    Instruction    Description
9D        POPF           Pop top of stack into lower 16 bits of EFLAGS
9D        POPFD          Pop top of stack into EFLAGS

Description
These instructions pop a doubleword (POPFD) from the top of the stack (if the current operand-size attribute is 32) and store the value in the EFLAGS register, or pop a word from the top of the stack (if the operand-size attribute is 16) and store it in the lower 16 bits of the EFLAGS register (that is, the FLAGS register). These instructions reverse the operation of the PUSHF/PUSHFD instructions.

The POPF (pop flags) and POPFD (pop flags double) mnemonics reference the same opcode. The POPF instruction is intended for use when the operand-size attribute is 16 and the POPFD instruction for when the operand-size attribute is 32. Some assemblers may force the operand size to 16 when POPF is used and to 32 when POPFD is used. Others may treat these mnemonics as synonyms (POPF/POPFD) and use the current setting of the operand-size attribute to determine the size of values to be popped from the stack, regardless of the mnemonic used.

The effect of the POPF/POPFD instructions on the EFLAGS register changes slightly, depending on the mode of operation of the processor. When the processor is operating in protected mode at privilege level 0 (or in real-address mode, which is equivalent to privilege level 0), all the non-reserved flags in the EFLAGS register except the VIP, VIF, and VM flags can be modified. The VIP and VIF flags are cleared, and the VM flag is unaffected.

When operating in protected mode, with a privilege level greater than 0, but less than or equal to IOPL, all the flags can be modified except the IOPL field and the VIP, VIF, and VM flags. Here, the IOPL flags are unaffected, the VIP and VIF flags are cleared, and the VM flag is unaffected. The interrupt flag (IF) is altered only when executing at a level at least as privileged as the IOPL. If a POPF/POPFD instruction is executed with insufficient privilege, an exception does not occur, but the privileged bits do not change.

When operating in virtual-8086 mode, the I/O privilege level (IOPL) must be equal to 3 to use POPF/POPFD instructions and the VM, RF, IOPL, VIP, and VIF flags are unaffected. If the IOPL is less than 3, the POPF/POPFD instructions cause a general-protection exception (#GP).

Refer to Section 3.6.3. in Chapter 3, Basic Execution Environment of the Intel Architecture Software Developer's Manual, Volume 1, for information about the EFLAGS registers.

Operation
IF VM=0 (* Not in Virtual-8086 Mode *)
    THEN IF CPL=0
        THEN
            IF OperandSize = 32;
                THEN
                    EFLAGS ← Pop();
                    (* All non-reserved flags except VIP, VIF, and VM can be modified; *)
                    (* VIP and VIF are cleared; VM is unaffected *)
                ELSE (* OperandSize = 16 *)
                    EFLAGS[15:0] ← Pop(); (* All non-reserved flags can be modified; *)
            FI;
        ELSE (* CPL > 0 *)
            IF OperandSize = 32;
                THEN
                    EFLAGS ← Pop()
                    (* All non-reserved bits except IOPL, VIP, and VIF can be modified; *)
                    (* IOPL is unaffected; VIP and VIF are cleared; VM is unaffected *)
                ELSE (* OperandSize = 16 *)
                    EFLAGS[15:0] ← Pop();
                    (* All non-reserved bits except IOPL can be modified *)
                    (* IOPL is unaffected *)
            FI;
    FI;
    ELSE (* In Virtual-8086 Mode *)
        IF IOPL=3
            THEN
                IF OperandSize=32
                    THEN
                        EFLAGS ← Pop()
                        (* All non-reserved bits except VM, RF, IOPL, VIP, and VIF *)
                        (* can be modified; VM, RF, IOPL, VIP, and VIF are unaffected *)
                    ELSE
                        EFLAGS[15:0] ← Pop()
                        (* All non-reserved bits except IOPL can be modified *)
                        (* IOPL is unaffected *)
                FI;
            ELSE (* IOPL < 3 *)
                #GP(0); (* trap to virtual-8086 monitor *)
        FI;
    FI;
FI;
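
The privilege-dependent masking for the 32-bit protected-mode case can be sketched in C. This is a simplified model, not a definitive statement of processor behavior: the function popfd_protected and the macro names are hypothetical, the bit positions (IF = bit 9, IOPL = bits 13-12, VM = bit 17, VIF = bit 19, VIP = bit 20) and the non-reserved mask are assumptions taken from the standard EFLAGS layout, and only the rules stated in the description above are applied.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_IF           (1u << 9)
#define EFLAGS_IOPL         (3u << 12)
#define EFLAGS_VM           (1u << 17)
#define EFLAGS_VIF          (1u << 19)
#define EFLAGS_VIP          (1u << 20)
#define EFLAGS_NONRESERVED  0x003F7FD5u   /* assumed set of defined bits */

/* Model of POPFD with VM = 0 and OperandSize = 32. 'old_flags' is the
   current EFLAGS image, 'popped' the doubleword taken from the stack. */
static uint32_t popfd_protected(uint32_t old_flags, uint32_t popped,
                                unsigned cpl)
{
    unsigned iopl = (old_flags >> 12) & 3u;
    uint32_t writable =
        EFLAGS_NONRESERVED & ~(EFLAGS_VM | EFLAGS_VIF | EFLAGS_VIP);

    if (cpl > 0)
        writable &= ~EFLAGS_IOPL;   /* IOPL unaffected above level 0 */
    if (cpl > iopl)
        writable &= ~EFLAGS_IF;     /* IF changes only if CPL <= IOPL */

    uint32_t result = (old_flags & ~writable) | (popped & writable);
    result &= ~(EFLAGS_VIF | EFLAGS_VIP);   /* VIP and VIF are cleared */
    return result;
}

static void popfd_example(void)
{
    /* CPL 3, IOPL 0: only CF is taken; IF and IOPL keep their values. */
    assert(popfd_protected(0x00000202u, 0x00003001u, 3) == 0x00000203u);
    /* CPL 0: IOPL and IF are taken from the stack; VIF is cleared. */
    assert(popfd_protected(0x00000202u, 0x00083001u, 0) == 0x00003003u);
}
```

Note that no exception is raised in the unprivileged case; the protected bits simply keep their old values, matching the statement that insufficient privilege leaves the privileged bits unchanged.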
Flags Affected All flags except the reserved bits and the VM bit.

Protected Mode Exceptions
#SS(0)            If the top of stack is not within the stack segment.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If an unaligned memory reference is made while the current privilege level is 3 and alignment checking is enabled.

Real-Address Mode Exceptions
#SS               If the top of stack is not within the stack segment.

Virtual-8086 Mode Exceptions
#GP(0)            If the I/O privilege level is less than 3.
                  If an attempt is made to execute the POPF/POPFD instruction with an operand-size override prefix.
#SS(0)            If the top of stack is not within the stack segment.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If an unaligned memory reference is made while alignment checking is enabled.

POR - Bitwise Logical OR

Opcode      Instruction       Description
0F EB /r    POR mm, mm/m64    OR quadword from mm/m64 to quadword in mm.
Description This instruction performs a bitwise logical OR operation on the quadword source (second) and destination (first) operands and stores the result in the destination operand location (refer to Figure 3-74). The source operand can be an MMX technology register or a quadword memory location; the destination operand must be an MMX technology register. Each bit of the result is made 0 if the corresponding bits of both operands are 0; otherwise the bit is set to 1.

POR mm, mm/m64

mm (before)    1111111111111000000000000000010110110101100010000111011101110111
mm/m64         0001000011011001010100000011000100011110111011110001010110010101
mm (after)     1111111111111001010100000011010110111111111011110111011111110111

Figure 3-74. Operation of the POR Instruction
Operation
DEST ← DEST OR SRC;
__m64 _m_por(__m64 m1, __m64 m2)

__m64 _mm_or_si64(__m64 m1, __m64 m2)
Perform a bitwise OR of the 64-bit value in m1 with the 64-bit value in m2.

Flags Affected
None.

PREFETCH - Prefetch

Opcode      Instruction       Description
0F,18,/1    PREFETCHT0 m8     Move data specified by address closer to the processor using the t0 hint.
0F,18,/2    PREFETCHT1 m8     Move data specified by address closer to the processor using the t1 hint.
0F,18,/3    PREFETCHT2 m8     Move data specified by address closer to the processor using the t2 hint.
0F,18,/0    PREFETCHNTA m8    Move data specified by address closer to the processor using the nta hint.

Description
If there are no excepting conditions, the prefetch instruction fetches the line containing the addressed byte to a location in the cache hierarchy specified by a locality hint. If the line is already present in the cache hierarchy at a level closer to the processor, no data movement occurs. Bits 5:3 of the ModR/M byte specify locality hints as follows:

    temporal data (t0): prefetch data into all cache levels.
    temporal with respect to first level cache (t1): prefetch data in all cache levels except 0th cache level.
    temporal with respect to second level cache (t2): prefetch data in all cache levels, except 0th and 1st cache levels.
    non-temporal with respect to all cache levels (nta): prefetch data into non-temporal cache structure.

The architectural implementation of this instruction in no way affects the function of a program. Locality hints are processor implementation-dependent, and can be overloaded or ignored by a processor implementation. The prefetch instruction does not cause any exceptions (except for code breakpoints), does not affect program behavior, and may be ignored by the processor implementation.

The amount of data prefetched is processor implementation-dependent. It will, however, be a minimum of 32 bytes. Prefetches to uncacheable or WC memory (UC or WCF memory types) will be ignored.

Additional ModRM encodings, besides those specified above, are defined to be reserved, and the use of reserved encodings risks future incompatibility. Use of any ModRM value other than the specified ones will lead to unpredictable behavior.

Operation
FETCH (m8);

void _mm_prefetch(char *p, int i)
Loads one cache line of data from address p to a location "closer" to the processor. The value i specifies the type of prefetch operation: the constants _MM_HINT_T0, _MM_HINT_T1, _MM_HINT_T2, and _MM_HINT_NTA should be used, corresponding to the type of prefetch instruction.

Numeric Exceptions
None.

Protected Mode Exceptions
None.

Real Address Mode Exceptions
None.

Virtual 8086 Mode Exceptions
None.

Comments
This instruction is merely a hint. If executed, this instruction moves data closer to the processor in anticipation of future use. The performance of these instructions in application code can be implementation specific. To achieve maximum speedup, code tuning might be necessary for each implementation. The non-temporal hint also minimizes pollution of useful cache data.

PREFETCH instructions ignore the value of CR4.OSFXSR. Since they do not affect the new Streaming SIMD Extension state, they will not generate an invalid exception if CR4.OSFXSR = 0.

If the PTE is not in the TLB, the prefetch is ignored. This behavior is specific to the Pentium III processor and may change with future processor implementations.
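
A hedged usage sketch of the intrinsic in a simple summation loop follows. The function sum_with_prefetch, the PREFETCH_T0 wrapper macro, and the prefetch distance of 16 elements are illustrative assumptions, not tuned or recommended values; on targets without SSE support the macro falls back to a no-op, which is consistent with the instruction being only a hint.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
/* Issue a PREFETCHT0-style hint through the documented intrinsic. */
#define PREFETCH_T0(p) _mm_prefetch((const char *)(p), _MM_HINT_T0)
#else
/* No SSE available: the hint is simply dropped. */
#define PREFETCH_T0(p) ((void)(p))
#endif

/* Assumed look-ahead, in elements; real code would tune this value. */
#define PREFETCH_DISTANCE 16

/* Sums an array while hinting that data a few iterations ahead will be
   needed soon. The result is the same with or without the hint. */
static long sum_with_prefetch(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            PREFETCH_T0(&data[i + PREFETCH_DISTANCE]);
        total += data[i];
    }
    return total;
}

static void prefetch_example(void)
{
    int data[100];
    for (int i = 0; i < 100; i++)
        data[i] = i;
    assert(sum_with_prefetch(data, 100) == 4950);
}
```

Because the hint never changes program results, the only observable effect is timing, and any benefit depends on the processor implementation and the access pattern.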

PSADBW - Packed Sum of Absolute Differences

Opcode      Instruction            Description
0F,F6,/r    PSADBW mm1, mm2/m64    Absolute difference of packed unsigned bytes from MM2/Mem and MM1; these differences are then summed to produce a word result.
Description The PSADBW instruction computes the absolute value of the difference of unsigned bytes for mm1 and mm2/m64. These differences are then summed to produce a word result in the lower 16-bit field; the upper three words are cleared. The destination operand is an MMX technology register. The source operand can either be an MMX technology register or a 64-bit memory operand.

PSADBW mm1, mm2/m64

mm1 (before)    59   46   40   87  187   55  221   27
mm2/m64         24   65   11  101   78  207  111   36

Figure 3-75. Operation of the PSADBW Instruction

Operation
TEMP1 = abs(DEST[7-0] - SRC/m64[7-0]);
TEMP2 = abs(DEST[15-8] - SRC/m64[15-8]);
TEMP3 = abs(DEST[23-16] - SRC/m64[23-16]);
TEMP4 = abs(DEST[31-24] - SRC/m64[31-24]);
TEMP5 = abs(DEST[39-32] - SRC/m64[39-32]);
TEMP6 = abs(DEST[47-40] - SRC/m64[47-40]);
TEMP7 = abs(DEST[55-48] - SRC/m64[55-48]);
TEMP8 = abs(DEST[63-56] - SRC/m64[63-56]);
DEST[15:0] = TEMP1 + TEMP2 + TEMP3 + TEMP4 + TEMP5 + TEMP6 + TEMP7 + TEMP8;
DEST[31:16] = 0X00000000;
DEST[47:32] = 0X00000000;
DEST[63:48] = 0X00000000;
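
The sum of absolute differences can be worked through in C using the operand bytes printed in Figure 3-75. The names psadbw_model and psadbw_example are hypothetical, and the operands are modeled as byte arrays; the returned value corresponds to the low word of the destination, with the upper three words implicitly zero.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of PSADBW: sum of |a[i] - b[i]| over eight unsigned
   bytes. The maximum possible sum is 8 * 255 = 2040, so 16 bits suffice. */
static uint16_t psadbw_model(const uint8_t a[8], const uint8_t b[8])
{
    uint16_t sum = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t diff = (uint8_t)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
        sum = (uint16_t)(sum + diff);
    }
    return sum;
}

/* Figure 3-75 operands: the differences are 35, 19, 29, 14, 109, 152,
   110, and 9, which add up to 477. */
static void psadbw_example(void)
{
    const uint8_t mm1[8] = {59, 46, 40, 87, 187, 55, 221, 27};
    const uint8_t mm2[8] = {24, 65, 11, 101, 78, 207, 111, 36};
    assert(psadbw_model(mm1, mm2) == 477);
}
```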
__m64 _m_psadbw(__m64 a, __m64 b)

__m64 _mm_sad_pu8(__m64 a, __m64 b)
Computes the sum of the absolute differences of the unsigned bytes in a and b, returning the value in the lower word. The upper three words are cleared.

Numeric Exceptions
None.

Protected Mode Exceptions
#GP(0)            For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)            For an illegal address in the SS segment.
#PF(fault-code)   For a page fault.
#UD               If CR0.EM = 1.
#NM               If TS bit in CR0 is set.
#MF               If there is a pending FPU exception.
#AC               For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).

PSHUFW - Packed Shuffle Word

Opcode         Instruction                  Description
0F,70,/r,ib    PSHUFW mm1, mm2/m64, imm8    Shuffle the words in MM2/Mem based on the encoding in imm8 and store in MM1.

Description
The PSHUFW instruction uses the imm8 operand to select which of the four words in MM2/Mem will be placed in each of the words in MM1. Bits 1 and 0 of imm8 encode the source for destination word 0 (MM1[15-0]), bits 3 and 2 encode for word 1, bits 5 and 4 encode for word 2, and bits 7 and 6 encode for word 3 (MM1[63-48]). Similarly, the two-bit encoding represents which source word is to be used, e.g., a binary encoding of 10 indicates that source word 2 (MM2/Mem[47-32]) will be used.

Figure 3-76. Operation of the PSHUFW Instruction
Operation
DEST[15-0] = (SRC/m64 >> (imm8[1-0] * 16))[15-0]
DEST[31-16] = (SRC/m64 >> (imm8[3-2] * 16))[15-0]
DEST[47-32] = (SRC/m64 >> (imm8[5-4] * 16))[15-0]
DEST[63-48] = (SRC/m64 >> (imm8[7-6] * 16))[15-0]
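
The selector decoding can be sketched in C as below. The names pshufw_model and pshufw_example are hypothetical, and the source is modeled as a 64-bit integer with word 0 in bits 15-0; each two-bit field of imm8 picks the source word for one destination word, as in the Operation listing.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of PSHUFW: destination word 'lane' is the source word
   selected by bits (2*lane+1):(2*lane) of imm8. */
static uint64_t pshufw_model(uint64_t src, uint8_t imm8)
{
    uint64_t dest = 0;
    for (int lane = 0; lane < 4; lane++) {
        unsigned sel = (imm8 >> (lane * 2)) & 3u;
        uint64_t word = (src >> (sel * 16)) & 0xFFFFu;
        dest |= word << (lane * 16);
    }
    return dest;
}

static void pshufw_example(void)
{
    const uint64_t src = 0x4444333322221111ull;

    /* 0x1B = 00 01 10 11b reverses the word order. */
    assert(pshufw_model(src, 0x1B) == 0x1111222233334444ull);
    /* 0xE4 = 11 10 01 00b leaves the words in place. */
    assert(pshufw_model(src, 0xE4) == src);
    /* 0x00 broadcasts source word 0 to every destination word. */
    assert(pshufw_model(src, 0x00) == 0x1111111111111111ull);
}
```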

__m64 _m_pshufw(__m64 a, int n)

__m64 _mm_shuffle_pi16(__m64 a, int n)
Returns a combination of the four words of a. The selector n must be an immediate.

Numeric Exceptions
None.

Protected Mode Exceptions
#GP(0)            For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)            For an illegal address in the SS segment.
#PF(fault-code)   For a page fault.
#UD               If CR0.EM = 1.
#NM               If TS bit in CR0 is set.
#MF               If there is a pending FPU exception.
#AC               For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).

PSLLW/PSLLD/PSLLQ - Packed Shift Left Logical

Opcode         Instruction         Description
0F F1 /r       PSLLW mm, mm/m64    Shift words in mm left by amount specified in mm/m64, while shifting in zeroes.
0F 71 /6 ib    PSLLW mm, imm8      Shift words in mm left by imm8, while shifting in zeroes.
0F F2 /r       PSLLD mm, mm/m64    Shift doublewords in mm left by amount specified in mm/m64, while shifting in zeroes.
0F 72 /6 ib    PSLLD mm, imm8      Shift doublewords in mm left by imm8, while shifting in zeroes.
0F F3 /r       PSLLQ mm, mm/m64    Shift mm left by amount specified in mm/m64, while shifting in zeroes.
0F 73 /6 ib    PSLLQ mm, imm8      Shift mm left by imm8, while shifting in zeroes.
Description These instructions shift the bits in the data elements (words, doublewords, or quadword) in the destination operand (first operand) to the left by the number of bits specified in the unsigned count operand (second operand) (refer to Figure 3-77). The result of the shift operation is written to the destination operand. As the bits in the data elements are shifted left, the empty low-order bits are cleared (set to zero). If the value specified by the count operand is greater than 15 (for words), 31 (for doublewords), or 63 (for a quadword), then the destination operand is set to all zeroes. The destination operand must be an MMX technology register; the count operand can be either an MMX technology register, a 64-bit memory location, or an 8-bit immediate. The PSLLW instruction shifts each of the four words of the destination operand to the left by the number of bits specified in the count operand; the PSLLD instruction shifts each of the two doublewords of the destination operand; and the PSLLQ instruction shifts the 64-bit quadword in the destination operand. As the individual data elements are shifted left, the empty low-order bit positions are filled with zeroes.
Figure 3-77. Operation of the PSLLW Instruction
[Figure: PSLLW mm, 2. Each word of mm is shifted left by 2 bits; for example, the word 1111111111111100 becomes 1111111111110000 and the word 0001000111000111 becomes 0100011100011100.]
Operation
IF instruction is PSLLW
    THEN
        DEST(15..0) ← DEST(15..0) << COUNT;
        DEST(31..16) ← DEST(31..16) << COUNT;
        DEST(47..32) ← DEST(47..32) << COUNT;
        DEST(63..48) ← DEST(63..48) << COUNT;
ELSE IF instruction is PSLLD
    THEN
        DEST(31..0) ← DEST(31..0) << COUNT;
        DEST(63..32) ← DEST(63..32) << COUNT;
ELSE (* instruction is PSLLQ *)
    DEST ← DEST << COUNT;
FI;
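The word case of the operation above can be modeled in portable C. This is only an illustrative sketch: the helper name psllw_model is hypothetical (it is not an Intel intrinsic), and it treats a uint64_t as the image of an MMX technology register instead of executing the real instruction.

```c
#include <stdint.h>

/* Hypothetical model of PSLLW: dest holds four packed 16-bit words. */
static uint64_t psllw_model(uint64_t dest, uint64_t count)
{
    uint64_t result = 0;

    if (count > 15)
        return 0;                        /* counts above 15 clear every word */

    for (int i = 0; i < 4; i++) {
        uint16_t w = (uint16_t)(dest >> (16 * i));
        /* Bits shifted past bit 15 are discarded; zeroes enter at bit 0. */
        w = (uint16_t)((uint32_t)w << count);
        result |= (uint64_t)w << (16 * i);
    }
    return result;
}
```

With the two words shown in Figure 3-77, psllw_model(0x00000000FFFC11C7, 2) yields 0x00000000FFF0471C.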
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_psllw (__m64 m, __m64 count)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_sll_pi16 (__m64 m, __m64 count)
Shifts four 16-bit values in m left by the amount specified by count while shifting in zeroes.

Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_psllwi (__m64 m, int count)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_slli_pi16 (__m64 m, int count)
Shifts four 16-bit values in m left by the amount specified by count while shifting in zeroes. For the best performance, count should be a constant.

Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_pslld (__m64 m, __m64 count)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_sll_pi32 (__m64 m, __m64 count)
Shifts two 32-bit values in m left by the amount specified by count while shifting in zeroes.

Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_pslldi (__m64 m, int count)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_slli_pi32 (__m64 m, int count)
Shifts two 32-bit values in m left by the amount specified by count while shifting in zeroes. For the best performance, count should be a constant.

Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_psllq (__m64 m, __m64 count)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_sll_si64 (__m64 m, __m64 count)
Shifts the 64-bit value in m left by the amount specified by count while shifting in zeroes.

Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_psllqi (__m64 m, int count)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_slli_si64 (__m64 m, int count)
Shifts the 64-bit value in m left by the amount specified by count while shifting in zeroes. For the best performance, count should be a constant.
Flags Affected
None.
Protected Mode Exceptions
#GP(0)            If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#UD               If EM in CR0 is set.
#NM               If TS in CR0 is set.
#MF               If there is a pending FPU exception.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
PSRAW/PSRAD - Packed Shift Right Arithmetic

Opcode        Instruction         Description
0F E1 /r      PSRAW mm, mm/m64    Shift words in mm right by amount specified in mm/m64 while shifting in sign bits.
0F 71 /4 ib   PSRAW mm, imm8      Shift words in mm right by imm8 while shifting in sign bits.
0F E2 /r      PSRAD mm, mm/m64    Shift doublewords in mm right by amount specified in mm/m64 while shifting in sign bits.
0F 72 /4 ib   PSRAD mm, imm8      Shift doublewords in mm right by imm8 while shifting in sign bits.
Description These instructions shift the bits in the data elements (words or doublewords) in the destination operand (first operand) to the right by the amount of bits specified in the unsigned count operand (second operand) (refer to Figure 3-78). The result of the shift operation is written to the destination operand. The empty high-order bits of each element are filled with the initial value of the sign bit of the data element. If the value specified by the count operand is greater than 15 (for words) or 31 (for doublewords), each destination data element is filled with the initial value of the sign bit of the element. The destination operand must be an MMX technology register; the count operand (source operand) can be either an MMX technology register, a 64-bit memory location, or an 8-bit immediate. The PSRAW instruction shifts each of the four words in the destination operand to the right by the number of bits specified in the count operand; the PSRAD instruction shifts each of the two doublewords in the destination operand. As the individual data elements are shifted right, the empty high-order bit positions are filled with the sign value.
Figure 3-78. Operation of the PSRAW Instruction
[Figure: PSRAW mm, 2. Each word of mm is shifted right by 2 bits with the sign bit replicated; for example, the word 1111111111111100 becomes 1111111111111111 and the word 1101000111000111 becomes 1111010001110001.]
Operation
IF instruction is PSRAW
    THEN
        DEST(15..0) ← SignExtend (DEST(15..0) >> COUNT);
        DEST(31..16) ← SignExtend (DEST(31..16) >> COUNT);
        DEST(47..32) ← SignExtend (DEST(47..32) >> COUNT);
        DEST(63..48) ← SignExtend (DEST(63..48) >> COUNT);
ELSE (* instruction is PSRAD *)
    DEST(31..0) ← SignExtend (DEST(31..0) >> COUNT);
    DEST(63..32) ← SignExtend (DEST(63..32) >> COUNT);
FI;
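As a rough illustration of the sign-filling behavior, the word case can be written in portable C without relying on the implementation-defined right shift of negative values. The name psraw_model is hypothetical, and clamping the count to 15 is an assumption of this sketch: it gives the same result as the "filled with the sign bit" rule stated in the description for counts above 15.

```c
#include <stdint.h>

/* Hypothetical model of PSRAW: dest holds four packed 16-bit words. */
static uint64_t psraw_model(uint64_t dest, uint64_t count)
{
    uint64_t result = 0;

    if (count > 15)
        count = 15;                      /* every bit becomes a copy of the sign */

    for (int i = 0; i < 4; i++) {
        uint16_t w = (uint16_t)(dest >> (16 * i));
        /* Mask covering the high-order bit positions vacated by the shift. */
        uint16_t fill = (w & 0x8000u) ? (uint16_t)~(0xFFFFu >> count) : 0;
        w = (uint16_t)((w >> count) | fill);
        result |= (uint64_t)w << (16 * i);
    }
    return result;
}
```

With the two words shown in Figure 3-78, psraw_model(0x00000000FFFCD1C7, 2) yields 0x00000000FFFFF471.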
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_psraw (__m64 m, __m64 count)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_sra_pi16 (__m64 m, __m64 count)
Shifts four 16-bit values in m right by the amount specified by count while shifting in the sign bit.

Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_psrawi (__m64 m, int count)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_srai_pi16 (__m64 m, int count)
Shifts four 16-bit values in m right by the amount specified by count while shifting in the sign bit. For the best performance, count should be a constant.

Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_psrad (__m64 m, __m64 count)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_sra_pi32 (__m64 m, __m64 count)
Shifts two 32-bit values in m right by the amount specified by count while shifting in the sign bit.

Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_psradi (__m64 m, int count)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_srai_pi32 (__m64 m, int count)
Shifts two 32-bit values in m right by the amount specified by count while shifting in the sign bit. For the best performance, count should be a constant.
PSRLW/PSRLD/PSRLQ - Packed Shift Right Logical

Opcode        Instruction         Description
0F D1 /r      PSRLW mm, mm/m64    Shift words in mm right by amount specified in mm/m64 while shifting in zeroes.
0F 71 /2 ib   PSRLW mm, imm8      Shift words in mm right by imm8.
0F D2 /r      PSRLD mm, mm/m64    Shift doublewords in mm right by amount specified in mm/m64 while shifting in zeroes.
0F 72 /2 ib   PSRLD mm, imm8      Shift doublewords in mm right by imm8.
0F D3 /r      PSRLQ mm, mm/m64    Shift mm right by amount specified in mm/m64 while shifting in zeroes.
0F 73 /2 ib   PSRLQ mm, imm8      Shift mm right by imm8 while shifting in zeroes.
Description These instructions shift the bits in the data elements (words, doublewords, or quadword) in the destination operand (first operand) to the right by the number of bits specified in the unsigned count operand (second operand) (refer to Figure 3-79). The result of the shift operation is written to the destination operand. As the bits in the data elements are shifted right, the empty high-order bits are cleared (set to zero). If the value specified by the count operand is greater than 15 (for words), 31 (for doublewords), or 63 (for a quadword), then the destination operand is set to all zeroes. The destination operand must be an MMX technology register; the count operand can be either an MMX technology register, a 64-bit memory location, or an 8-bit immediate. The PSRLW instruction shifts each of the four words of the destination operand to the right by the number of bits specified in the count operand; the PSRLD instruction shifts each of the two doublewords of the destination operand; and the PSRLQ instruction shifts the 64-bit quadword in the destination operand. As the individual data elements are shifted right, the empty high-order bit positions are filled with zeroes.
Figure 3-79. Operation of the PSRLW Instruction
[Figure: PSRLW mm, 2. Each word of mm is shifted right by 2 bits with zeroes shifted in; for example, the word 1111111111111100 becomes 0011111111111111 and the word 0001000111000111 becomes 0000010001110001.]
Operation
IF instruction is PSRLW
    THEN
        DEST(15..0) ← DEST(15..0) >> COUNT;
        DEST(31..16) ← DEST(31..16) >> COUNT;
        DEST(47..32) ← DEST(47..32) >> COUNT;
        DEST(63..48) ← DEST(63..48) >> COUNT;
ELSE IF instruction is PSRLD
    THEN
        DEST(31..0) ← DEST(31..0) >> COUNT;
        DEST(63..32) ← DEST(63..32) >> COUNT;
ELSE (* instruction is PSRLQ *)
    DEST ← DEST >> COUNT;
FI;
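The three element sizes differ only in where the element boundaries fall, which one generic C routine can show. The function psrl_model and its elem_bits parameter (16, 32, or 64) are hypothetical names introduced for this sketch; it models the register contents in a uint64_t and does not execute the MMX instruction.

```c
#include <stdint.h>

/* Hypothetical model of PSRLW/PSRLD/PSRLQ.
   elem_bits selects the element size: 16, 32, or 64. */
static uint64_t psrl_model(uint64_t dest, uint64_t count, unsigned elem_bits)
{
    uint64_t mask, result = 0;

    if (count >= elem_bits)
        return 0;                        /* count > 15, 31, or 63: all zeroes */

    mask = (elem_bits == 64) ? ~0ULL : ((1ULL << elem_bits) - 1);

    for (unsigned pos = 0; pos < 64; pos += elem_bits) {
        uint64_t elem = (dest >> pos) & mask;   /* isolate one element */
        result |= (elem >> count) << pos;       /* zeroes enter at the top */
    }
    return result;
}
```

With the two words shown in Figure 3-79, psrl_model(0x00000000FFFC11C7, 2, 16) yields 0x000000003FFF0471.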
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_psrlw (__m64 m, __m64 count)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_srl_pi16 (__m64 m, __m64 count)
Shifts four 16-bit values in m right by the amount specified by count while shifting in zeroes.

Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_psrlwi (__m64 m, int count)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_srli_pi16 (__m64 m, int count)
Shifts four 16-bit values in m right by the amount specified by count while shifting in zeroes. For the best performance, count should be a constant.

Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_psrld (__m64 m, __m64 count)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_srl_pi32 (__m64 m, __m64 count)
Shifts two 32-bit values in m right by the amount specified by count while shifting in zeroes.

Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_psrldi (__m64 m, int count)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_srli_pi32 (__m64 m, int count)
Shifts two 32-bit values in m right by the amount specified by count while shifting in zeroes. For the best performance, count should be a constant.

Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_psrlq (__m64 m, __m64 count)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_srl_si64 (__m64 m, __m64 count)
Shifts the 64-bit value in m right by the amount specified by count while shifting in zeroes.

Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_psrlqi (__m64 m, int count)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_srli_si64 (__m64 m, int count)
Shifts the 64-bit value in m right by the amount specified by count while shifting in zeroes. For the best performance, count should be a constant.
Flags Affected
None.
Protected Mode Exceptions
#GP(0)            If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#UD               If EM in CR0 is set.
#NM               If TS in CR0 is set.
#MF               If there is a pending FPU exception.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
PSUBB/PSUBW/PSUBD - Packed Subtract
Opcode     Instruction         Description
0F F8 /r   PSUBB mm, mm/m64    Subtract packed bytes in mm/m64 from packed bytes in mm.
0F F9 /r   PSUBW mm, mm/m64    Subtract packed words in mm/m64 from packed words in mm.
0F FA /r   PSUBD mm, mm/m64    Subtract packed doublewords in mm/m64 from packed doublewords in mm.
Description These instructions subtract the individual data elements (bytes, words, or doublewords) of the source operand (second operand) from the individual data elements of the destination operand (first operand) (refer to Figure 3-80). If the result of a subtraction exceeds the range for the specified data type (overflows), the result is wrapped around, meaning that the result is truncated so that only the lower (least significant) bits of the result are returned (that is, the carry is ignored). The destination operand must be an MMX technology register; the source operand can be either an MMX technology register or a quadword memory location.
Figure 3-80. Operation of the PSUBW Instruction
[Figure: PSUBW mm, mm/m64. For example, the mm word 1000000000000000 minus the mm/m64 word 0000000000000001 gives 0111111111111111, and the mm word 0111111100111000 minus the mm/m64 word 1110100011111001 gives 1001011000111111.]
The PSUBB instruction subtracts the bytes of the source operand from the bytes of the destination operand and stores the results to the destination operand. When an individual result is too large to be represented in eight bits, the lower eight bits of the result are written to the destination operand and therefore the result wraps around. The PSUBW instruction subtracts the words of the source operand from the words of the destination operand and stores the results to the destination operand. When an individual result is too large to be represented in 16 bits, the lower 16 bits of the result are written to the destination operand and therefore the result wraps around.
The PSUBD instruction subtracts the doublewords of the source operand from the doublewords of the destination operand and stores the results to the destination operand. When an individual result is too large to be represented in 32 bits, the lower 32 bits of the result are written to the destination operand and therefore the result wraps around. Note that like the integer SUB instruction, the PSUBB, PSUBW, and PSUBD instructions can operate on either unsigned or signed (two's complement notation) packed integers. Unlike the integer instructions, none of the MMX instructions affect the EFLAGS register. With MMX instructions, there are no carry or overflow flags to indicate when overflow has occurred, so the software must control the range of values or else use the "with saturation" MMX instructions.
Operation
IF instruction is PSUBB
    THEN
        DEST(7..0) ← DEST(7..0) - SRC(7..0);
        DEST(15..8) ← DEST(15..8) - SRC(15..8);
        DEST(23..16) ← DEST(23..16) - SRC(23..16);
        DEST(31..24) ← DEST(31..24) - SRC(31..24);
        DEST(39..32) ← DEST(39..32) - SRC(39..32);
        DEST(47..40) ← DEST(47..40) - SRC(47..40);
        DEST(55..48) ← DEST(55..48) - SRC(55..48);
        DEST(63..56) ← DEST(63..56) - SRC(63..56);
ELSE IF instruction is PSUBW
    THEN
        DEST(15..0) ← DEST(15..0) - SRC(15..0);
        DEST(31..16) ← DEST(31..16) - SRC(31..16);
        DEST(47..32) ← DEST(47..32) - SRC(47..32);
        DEST(63..48) ← DEST(63..48) - SRC(63..48);
ELSE (* instruction is PSUBD *)
    DEST(31..0) ← DEST(31..0) - SRC(31..0);
    DEST(63..32) ← DEST(63..32) - SRC(63..32);
FI;
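The wraparound rule means each element is subtracted modulo its own width, with no borrow crossing into the neighboring element. A minimal C sketch of the word case follows; psubw_model is a hypothetical helper name, and the uint64_t arguments stand in for the register or memory operands.

```c
#include <stdint.h>

/* Hypothetical model of PSUBW: four independent 16-bit subtractions. */
static uint64_t psubw_model(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;

    for (int i = 0; i < 4; i++) {
        uint16_t a = (uint16_t)(dest >> (16 * i));
        uint16_t b = (uint16_t)(src  >> (16 * i));
        /* Truncating to 16 bits keeps only the low-order bits (wraparound). */
        uint16_t diff = (uint16_t)(a - b);
        result |= (uint64_t)diff << (16 * i);
    }
    return result;
}
```

With the two words shown in Figure 3-80, psubw_model(0x0000000080007F38, 0x000000000001E8F9) yields 0x000000007FFF963F.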
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_psubb(__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_sub_pi8(__m64 m1, __m64 m2)
Subtract the eight 8-bit values in m2 from the eight 8-bit values in m1.

Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_psubw(__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_sub_pi16(__m64 m1, __m64 m2)
Subtract the four 16-bit values in m2 from the four 16-bit values in m1.

Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_psubd(__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_sub_pi32(__m64 m1, __m64 m2)
Subtract the two 32-bit values in m2 from the two 32-bit values in m1.
Flags Affected
None.
Protected Mode Exceptions
#GP(0)            If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#UD               If EM in CR0 is set.
#NM               If TS in CR0 is set.
#MF               If there is a pending FPU exception.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
PSUBSB/PSUBSW - Packed Subtract with Saturation

Opcode     Instruction          Description
0F E8 /r   PSUBSB mm, mm/m64    Subtract signed packed bytes in mm/m64 from signed packed bytes in mm and saturate.
0F E9 /r   PSUBSW mm, mm/m64    Subtract signed packed words in mm/m64 from signed packed words in mm and saturate.
Description These instructions subtract the individual signed data elements (bytes or words) of the source operand (second operand) from the individual signed data elements of the destination operand (first operand) (refer to Figure 3-81). If the result of a subtraction exceeds the range for the specified data type, the result is saturated. The destination operand must be an MMX technology register; the source operand can be either an MMX technology register or a quadword memory location.
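Figure 3-81. Operation of the PSUBSW Instruction
[Figure: PSUBSW mm, mm/m64. For example, the mm word 1000000000000000 minus the mm/m64 word 0000000000000001 saturates to 1000000000000000, and the mm word 0111111100111000 minus the mm/m64 word 1110100011111001 saturates to 0111111111111111.]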
The PSUBSB instruction subtracts the signed bytes of the source operand from the signed bytes of the destination operand and stores the results to the destination operand. When an individual result is beyond the range of a signed byte (that is, greater than 7FH or less than 80H), the saturated byte value of 7FH or 80H, respectively, is written to the destination operand. The PSUBSW instruction subtracts the signed words of the source operand from the signed words of the destination operand and stores the results to the destination operand. When an individual result is beyond the range of a signed word (that is, greater than 7FFFH or less than 8000H), the saturated word value of 7FFFH or 8000H, respectively, is written to the destination operand.
Operation
IF instruction is PSUBSB
    THEN
        DEST(7..0) ← SaturateToSignedByte(DEST(7..0) - SRC(7..0));
        DEST(15..8) ← SaturateToSignedByte(DEST(15..8) - SRC(15..8));
        DEST(23..16) ← SaturateToSignedByte(DEST(23..16) - SRC(23..16));
        DEST(31..24) ← SaturateToSignedByte(DEST(31..24) - SRC(31..24));
        DEST(39..32) ← SaturateToSignedByte(DEST(39..32) - SRC(39..32));
        DEST(47..40) ← SaturateToSignedByte(DEST(47..40) - SRC(47..40));
        DEST(55..48) ← SaturateToSignedByte(DEST(55..48) - SRC(55..48));
        DEST(63..56) ← SaturateToSignedByte(DEST(63..56) - SRC(63..56));
ELSE (* instruction is PSUBSW *)
    DEST(15..0) ← SaturateToSignedWord(DEST(15..0) - SRC(15..0));
    DEST(31..16) ← SaturateToSignedWord(DEST(31..16) - SRC(31..16));
    DEST(47..32) ← SaturateToSignedWord(DEST(47..32) - SRC(47..32));
    DEST(63..48) ← SaturateToSignedWord(DEST(63..48) - SRC(63..48));
FI;
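One way to picture SaturateToSignedWord is to compute each difference in a wider type and then clamp it. The sketch below assumes that approach; the names saturate_to_signed_word and psubsw_model are illustrative only and say nothing about how the processor implements the instruction.

```c
#include <stdint.h>

/* Clamp a wide intermediate result to the signed 16-bit range. */
static int32_t saturate_to_signed_word(int32_t v)
{
    if (v > 32767)
        return 32767;                    /* 7FFFH */
    if (v < -32768)
        return -32768;                   /* 8000H */
    return v;
}

/* Hypothetical model of PSUBSW: four signed 16-bit subtractions. */
static uint64_t psubsw_model(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;

    for (int i = 0; i < 4; i++) {
        /* Reinterpret each 16-bit field as a two's complement value. */
        int32_t a = (int32_t)(((dest >> (16 * i)) & 0xFFFF) ^ 0x8000) - 0x8000;
        int32_t b = (int32_t)(((src  >> (16 * i)) & 0xFFFF) ^ 0x8000) - 0x8000;
        int32_t d = saturate_to_signed_word(a - b);
        result |= (uint64_t)(uint16_t)d << (16 * i);
    }
    return result;
}
```

With the two words shown in Figure 3-81, psubsw_model(0x0000000080007F38, 0x000000000001E8F9) yields 0x0000000080007FFF.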
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_psubsb(__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_subs_pi8(__m64 m1, __m64 m2)
Subtract the eight signed 8-bit values in m2 from the eight signed 8-bit values in m1 and saturate.

Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_psubsw(__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_subs_pi16(__m64 m1, __m64 m2)
Subtract the four signed 16-bit values in m2 from the four signed 16-bit values in m1 and saturate.
Flags Affected
None.
PSUBUSB/PSUBUSW - Packed Subtract Unsigned with Saturation

Opcode     Instruction           Description
0F D8 /r   PSUBUSB mm, mm/m64    Subtract unsigned packed bytes in mm/m64 from unsigned packed bytes in mm and saturate.
0F D9 /r   PSUBUSW mm, mm/m64    Subtract unsigned packed words in mm/m64 from unsigned packed words in mm and saturate.
Description These instructions subtract the individual unsigned data elements (bytes or words) of the source operand (second operand) from the individual unsigned data elements of the destination operand (first operand) (refer to Figure 3-82). If the result of an individual subtraction exceeds the range for the specified unsigned data type, the result is saturated. The destination operand must be an MMX technology register; the source operand can be either an MMX technology register or a quadword memory location.
Figure 3-82. Operation of the PSUBUSB Instruction
[Figure: PSUBUSB mm, mm/m64. For example, the mm bytes 10000000, 01111111, and 11111000 minus the mm/m64 bytes 11111111, 00010111, and 00000111 give 00000000 (saturated), 01101000, and 11110001.]
The PSUBUSB instruction subtracts the unsigned bytes of the source operand from the unsigned bytes of the destination operand and stores the results to the destination operand. When an individual result is less than zero (a negative value), the saturated unsigned byte value of 00H is written to the destination operand. The PSUBUSW instruction subtracts the unsigned words of the source operand from the unsigned words of the destination operand and stores the results to the destination operand. When an individual result is less than zero (a negative value), the saturated unsigned word value of 0000H is written to the destination operand.
Operation
IF instruction is PSUBUSB
    THEN
        DEST(7..0) ← SaturateToUnsignedByte (DEST(7..0) - SRC(7..0));
        DEST(15..8) ← SaturateToUnsignedByte (DEST(15..8) - SRC(15..8));
        DEST(23..16) ← SaturateToUnsignedByte (DEST(23..16) - SRC(23..16));
        DEST(31..24) ← SaturateToUnsignedByte (DEST(31..24) - SRC(31..24));
        DEST(39..32) ← SaturateToUnsignedByte (DEST(39..32) - SRC(39..32));
        DEST(47..40) ← SaturateToUnsignedByte (DEST(47..40) - SRC(47..40));
        DEST(55..48) ← SaturateToUnsignedByte (DEST(55..48) - SRC(55..48));
        DEST(63..56) ← SaturateToUnsignedByte (DEST(63..56) - SRC(63..56));
ELSE (* instruction is PSUBUSW *)
    DEST(15..0) ← SaturateToUnsignedWord (DEST(15..0) - SRC(15..0));
    DEST(31..16) ← SaturateToUnsignedWord (DEST(31..16) - SRC(31..16));
    DEST(47..32) ← SaturateToUnsignedWord (DEST(47..32) - SRC(47..32));
    DEST(63..48) ← SaturateToUnsignedWord (DEST(63..48) - SRC(63..48));
FI;
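For unsigned subtraction the only way to leave the valid range is to go below zero, so saturation reduces to "subtract, or produce zero if the subtrahend is larger". The byte case might be sketched in C as follows; psubusb_model is a hypothetical name used only for this illustration.

```c
#include <stdint.h>

/* Hypothetical model of PSUBUSB: eight unsigned 8-bit subtractions. */
static uint64_t psubusb_model(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;

    for (int i = 0; i < 8; i++) {
        uint8_t a = (uint8_t)(dest >> (8 * i));
        uint8_t b = (uint8_t)(src  >> (8 * i));
        /* A negative difference saturates to 00H. */
        uint8_t d = (a > b) ? (uint8_t)(a - b) : 0;
        result |= (uint64_t)d << (8 * i);
    }
    return result;
}
```

With the three bytes shown in Figure 3-82, psubusb_model(0x807FF8, 0xFF1707) yields 0x0068F1.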
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_psubusb(__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_subs_pu8(__m64 m1, __m64 m2)
Subtract the eight unsigned 8-bit values in m2 from the eight unsigned 8-bit values in m1 and saturate.

Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_psubusw(__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_subs_pu16(__m64 m1, __m64 m2)
Subtract the four unsigned 16-bit values in m2 from the four unsigned 16-bit values in m1 and saturate.
Flags Affected
None.
PUNPCKHBW/PUNPCKHWD/PUNPCKHDQ - Unpack High Packed Data

Opcode     Instruction             Description
0F 68 /r   PUNPCKHBW mm, mm/m64    Interleave high-order bytes from mm and mm/m64 into mm.
0F 69 /r   PUNPCKHWD mm, mm/m64    Interleave high-order words from mm and mm/m64 into mm.
0F 6A /r   PUNPCKHDQ mm, mm/m64    Interleave high-order doublewords from mm and mm/m64 into mm.
Description These instructions unpack and interleave the high-order data elements (bytes, words, or doublewords) of the destination operand (first operand) and source operand (second operand) into the destination operand (refer to Figure 3-83). The low-order data elements are ignored. The destination operand must be an MMX technology register; the source operand may be either an MMX technology register or a 64-bit memory location. When the source data comes from a memory operand, the full 64-bit operand is accessed from memory, but the instruction uses only the high-order 32 bits.
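Figure 3-83. High-Order Unpacking and Interleaving of Bytes With the PUNPCKHBW Instruction
[Figure: PUNPCKHBW mm, mm/m64. With the source (mm/m64) bytes labeled 27 through 20 and the destination (mm) bytes labeled 17 through 10, from most to least significant, the resulting mm holds the bytes 27 17 26 16 25 15 24 14.]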
The PUNPCKHBW instruction interleaves the four high-order bytes of the source operand and the four high-order bytes of the destination operand and writes them to the destination operand. The PUNPCKHWD instruction interleaves the two high-order words of the source operand and the two high-order words of the destination operand and writes them to the destination operand. The PUNPCKHDQ instruction interleaves the high-order doubleword of the source operand and the high-order doubleword of the destination operand and writes them to the destination operand.
If the source operand is all zeroes, the result (stored in the destination operand) contains zero extensions of the high-order data elements from the original value in the destination operand. With the PUNPCKHBW instruction the high-order bytes are zero extended (that is, unpacked into unsigned words), and with the PUNPCKHWD instruction, the high-order words are zero extended (unpacked into unsigned doublewords).
Operation
IF instruction is PUNPCKHBW
    THEN
        DEST(7..0) ← DEST(39..32);
        DEST(15..8) ← SRC(39..32);
        DEST(23..16) ← DEST(47..40);
        DEST(31..24) ← SRC(47..40);
        DEST(39..32) ← DEST(55..48);
        DEST(47..40) ← SRC(55..48);
        DEST(55..48) ← DEST(63..56);
        DEST(63..56) ← SRC(63..56);
ELSE IF instruction is PUNPCKHWD
    THEN
        DEST(15..0) ← DEST(47..32);
        DEST(31..16) ← SRC(47..32);
        DEST(47..32) ← DEST(63..48);
        DEST(63..48) ← SRC(63..48);
ELSE (* instruction is PUNPCKHDQ *)
    DEST(31..0) ← DEST(63..32);
    DEST(63..32) ← SRC(63..32);
FI;
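The byte interleave follows directly from the assignments above: byte i of the high half of each operand lands in word i of the result, destination byte low and source byte high. A small C sketch, using the hypothetical helper name punpckhbw_model and plain uint64_t values in place of the operands:

```c
#include <stdint.h>

/* Hypothetical model of PUNPCKHBW on 64-bit operand images. */
static uint64_t punpckhbw_model(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;

    for (int i = 0; i < 4; i++) {
        uint64_t d = (dest >> (32 + 8 * i)) & 0xFF;  /* high-half byte of DEST */
        uint64_t s = (src  >> (32 + 8 * i)) & 0xFF;  /* high-half byte of SRC  */
        result |= d << (16 * i);        /* destination byte: low byte of word i  */
        result |= s << (16 * i + 8);    /* source byte: high byte of word i      */
    }
    return result;
}
```

Using the byte labels of Figure 3-83 as hexadecimal values, punpckhbw_model(0x1716151413121110, 0x2726252423222120) yields 0x2717261625152414. Passing a source of zero shows the zero-extension use: punpckhbw_model(0xAABBCCDD00000000, 0) yields 0x00AA00BB00CC00DD.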
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_punpckhbw (__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_unpackhi_pi8 (__m64 m1, __m64 m2)
Interleave the four 8-bit values from the high half of m1 with the four values from the high half of m2 and take the least significant element from m1.

Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_punpckhwd (__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_unpackhi_pi16 (__m64 m1, __m64 m2)
Interleave the two 16-bit values from the high half of m1 with the two values from the high half of m2 and take the least significant element from m1.

Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_punpckhdq (__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_unpackhi_pi32 (__m64 m1, __m64 m2)
Interleave the 32-bit value from the high half of m1 with the 32-bit value from the high half of m2 and take the least significant element from m1.
Flags Affected
None.
PUNPCKLBW/PUNPCKLWD/PUNPCKLDQ - Unpack Low Packed Data

Opcode     Instruction             Description
0F 60 /r   PUNPCKLBW mm, mm/m32    Interleave low-order bytes from mm and mm/m32 into mm.
0F 61 /r   PUNPCKLWD mm, mm/m32    Interleave low-order words from mm and mm/m32 into mm.
0F 62 /r   PUNPCKLDQ mm, mm/m32    Interleave low-order doublewords from mm and mm/m32 into mm.
Description These instructions unpack and interleave the low-order data elements (bytes, words, or doublewords) of the destination and source operands into the destination operand (refer to Figure 3-84). The destination operand must be an MMX technology register; the source operand may be either an MMX technology register or a memory location. When the source data comes from an MMX technology register, the upper 32 bits of the register are ignored. When the source data comes from memory, only 32 bits are accessed from memory.
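Figure 3-84. Low-Order Unpacking and Interleaving of Bytes With the PUNPCKLBW Instruction
[Figure: PUNPCKLBW mm, mm/m32. With the source (mm/m32) bytes labeled 23 through 20 and the destination (mm) bytes labeled 17 through 10, from most to least significant, the resulting mm holds the bytes 23 13 22 12 21 11 20 10.]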
The PUNPCKLBW instruction interleaves the four low-order bytes of the source operand and the four low-order bytes of the destination operand and writes them to the destination operand. The PUNPCKLWD instruction interleaves the two low-order words of the source operand and the two low-order words of the destination operand and writes them to the destination operand. The PUNPCKLDQ instruction interleaves the low-order doubleword of the source operand and the low-order doubleword of the destination operand and writes them to the destination operand.
If the source operand is all zeroes, the result (stored in the destination operand) contains zero extensions of the low-order data elements from the original value in the destination operand. With the PUNPCKLBW instruction the low-order bytes are zero extended (that is, unpacked into unsigned words), and with the PUNPCKLWD instruction, the low-order words are zero extended (unpacked into unsigned doublewords).
Operation
IF instruction is PUNPCKLBW
    THEN
        DEST(63..56) ← SRC(31..24);
        DEST(55..48) ← DEST(31..24);
        DEST(47..40) ← SRC(23..16);
        DEST(39..32) ← DEST(23..16);
        DEST(31..24) ← SRC(15..8);
        DEST(23..16) ← DEST(15..8);
        DEST(15..8) ← SRC(7..0);
        DEST(7..0) ← DEST(7..0);
ELSE IF instruction is PUNPCKLWD
    THEN
        DEST(63..48) ← SRC(31..16);
        DEST(47..32) ← DEST(31..16);
        DEST(31..16) ← SRC(15..0);
        DEST(15..0) ← DEST(15..0);
ELSE (* instruction is PUNPCKLDQ *)
    DEST(63..32) ← SRC(31..0);
    DEST(31..0) ← DEST(31..0);
FI;
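The word form is small enough to write out without a loop. In the sketch below, punpcklwd_model is a hypothetical helper, and taking the source as a uint32_t reflects the statement that only the low 32 bits of the source are used.

```c
#include <stdint.h>

/* Hypothetical model of PUNPCKLWD: interleave the two low-order words. */
static uint64_t punpcklwd_model(uint64_t dest, uint32_t src)
{
    uint64_t d0 = dest & 0xFFFF;          /* DEST(15..0)  */
    uint64_t d1 = (dest >> 16) & 0xFFFF;  /* DEST(31..16) */
    uint64_t s0 = src & 0xFFFF;           /* SRC(15..0)   */
    uint64_t s1 = (src >> 16) & 0xFFFF;   /* SRC(31..16)  */

    /* The upper half of the original destination is not used. */
    return (s1 << 48) | (d1 << 32) | (s0 << 16) | d0;
}
```

For example, punpcklwd_model(0xFFFFFFFF11112222, 0x33334444) yields 0x3333111144442222, and with a zero source the two low words are zero extended to doublewords: punpcklwd_model(0x0000000080017FFF, 0) yields 0x0000800100007FFF.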
Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_punpcklbw (__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_unpacklo_pi8 (__m64 m1, __m64 m2)
Interleave the four 8-bit values from the low half of m1 with the four values from the low half of m2 and take the least significant element from m1.

Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_punpcklwd (__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_unpacklo_pi16 (__m64 m1, __m64 m2)
Interleave the two 16-bit values from the low half of m1 with the two values from the low half of m2 and take the least significant element from m1.

Pre-4.0 Intel C/C++ Compiler intrinsic:
__m64 _m_punpckldq (__m64 m1, __m64 m2)
Version 4.0 and later Intel C/C++ Compiler intrinsic:
__m64 _mm_unpacklo_pi32 (__m64 m1, __m64 m2)
Interleave the 32-bit value from the low half of m1 with the 32-bit value from the low half of m2 and take the least significant element from m1.
Protected Mode Exceptions
#GP(0)            If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#UD               If EM in CR0 is set.
#NM               If TS in CR0 is set.
#MF               If there is a pending FPU exception.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
PUSH - Push Word or Doubleword Onto the Stack

Opcode    Instruction    Description
FF /6     PUSH r/m16     Push r/m16
FF /6     PUSH r/m32     Push r/m32
50+rw     PUSH r16       Push r16
50+rd     PUSH r32       Push r32
6A        PUSH imm8      Push imm8
68        PUSH imm16     Push imm16
68        PUSH imm32     Push imm32
0E        PUSH CS        Push CS
16        PUSH SS        Push SS
1E        PUSH DS        Push DS
06        PUSH ES        Push ES
0F A0     PUSH FS        Push FS
0F A8     PUSH GS        Push GS
Description This instruction decrements the stack pointer and then stores the source operand on the top of the stack. The address-size attribute of the stack segment determines the stack pointer size (16 bits or 32 bits), and the operand-size attribute of the current code segment determines the amount the stack pointer is decremented (two bytes or four bytes). For example, if these address- and operand-size attributes are 32, the 32-bit ESP register (stack pointer) is decremented by four and, if they are 16, the 16-bit SP register is decremented by 2. (The B flag in the stack segment's segment descriptor determines the stack's address-size attribute, and the D flag in the current code segment's segment descriptor, along with prefixes, determines the operand-size attribute and also the address-size attribute of the source operand.) Pushing a 16-bit operand when the stack address-size attribute is 32 can result in a misaligned stack pointer (that is, the stack pointer is not aligned on a doubleword boundary). The PUSH ESP instruction pushes the value of the ESP register as it existed before the instruction was executed. Thus, if a PUSH instruction uses a memory operand in which the ESP register is used as a base register for computing the operand address, the effective address of the operand is computed before the ESP register is decremented. In the real-address mode, if the ESP or SP register is 1 when the PUSH instruction is executed, the processor shuts down due to a lack of stack space. No exception is generated to indicate this condition. Intel Architecture Compatibility For Intel Architecture processors from the Intel 286 on, the PUSH ESP instruction pushes the value of the ESP register as it existed before the instruction was executed. (This is also true in the real-address and virtual-8086 modes.) For the Intel 8086 processor, the PUSH SP instruction pushes the new value of the SP register (that is, the value after it has been decremented by 2).
Operation
IF StackAddrSize = 32
    THEN
        IF OperandSize = 32
            THEN
                ESP ← ESP - 4;
                SS:ESP ← SRC; (* push doubleword *)
            ELSE (* OperandSize = 16 *)
                ESP ← ESP - 2;
                SS:ESP ← SRC; (* push word *)
        FI;
    ELSE (* StackAddrSize = 16 *)
        IF OperandSize = 16
            THEN
                SP ← SP - 2;
                SS:SP ← SRC; (* push word *)
            ELSE (* OperandSize = 32 *)
                SP ← SP - 4;
                SS:SP ← SRC; (* push doubleword *)
        FI;
FI;
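The decrement-then-store order, and the alignment effect of a 16-bit push on a 32-bit stack, can be tried out with a toy model in C. Everything here is an assumption made for illustration: the stack_model structure, its 64-byte size, and the helpers push32, push16, and load32 are hypothetical, bytes are stored little-endian, and no limit checking is modeled.

```c
#include <stdint.h>

/* Hypothetical model: a 64-byte stack segment with a 32-bit stack pointer. */
struct stack_model {
    uint8_t  mem[64];
    uint32_t esp;
};

/* Push a doubleword: decrement ESP by 4, then store at SS:ESP. */
static void push32(struct stack_model *s, uint32_t value)
{
    s->esp -= 4;
    for (int i = 0; i < 4; i++)
        s->mem[s->esp + i] = (uint8_t)(value >> (8 * i));
}

/* Push a word: decrement ESP by 2, then store at SS:ESP. */
static void push16(struct stack_model *s, uint16_t value)
{
    s->esp -= 2;
    s->mem[s->esp]     = (uint8_t)(value & 0xFF);
    s->mem[s->esp + 1] = (uint8_t)(value >> 8);
}

/* Read back the doubleword stored at a given stack offset. */
static uint32_t load32(const struct stack_model *s, uint32_t addr)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)s->mem[addr + i] << (8 * i);
    return v;
}
```

Starting from esp = 64, the call push32(&s, s.esp) behaves like PUSH ESP on Intel 286 and later processors: the argument is read before the decrement, so the value 64 is stored at offset 60. A following push16 leaves esp at 58, which is no longer on a doubleword boundary.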
Flags Affected
None.
Protected Mode Exceptions
#GP(0)            If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                  If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
Real-Address Mode Exceptions
#GP               If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS               If a memory operand effective address is outside the SS segment limit.
                  If the new value of the SP or ESP register is outside the stack segment limit.
Virtual-8086 Mode Exceptions
#GP(0)            If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made.
PUSHA/PUSHAD - Push All General-Purpose Registers

Opcode    Instruction    Description
60        PUSHA          Push AX, CX, DX, BX, original SP, BP, SI, and DI
60        PUSHAD         Push EAX, ECX, EDX, EBX, original ESP, EBP, ESI, and EDI
Description These instructions push the contents of the general-purpose registers onto the stack. The registers are stored on the stack in the following order: EAX, ECX, EDX, EBX, ESP (original value), EBP, ESI, and EDI (if the current operand-size attribute is 32) and AX, CX, DX, BX, SP (original value), BP, SI, and DI (if the operand-size attribute is 16). These instructions perform the reverse operation of the POPA/POPAD instructions. The value pushed for the ESP or SP register is its value prior to pushing the first register (refer to the Operation section below). The PUSHA (push all) and PUSHAD (push all double) mnemonics reference the same opcode. The PUSHA instruction is intended for use when the operand-size attribute is 16 and the PUSHAD instruction for when the operand-size attribute is 32. Some assemblers may force the operand size to 16 when PUSHA is used and to 32 when PUSHAD is used. Others may treat these mnemonics as synonyms (PUSHA/PUSHAD) and use the current setting of the operand-size attribute to determine the size of values to be pushed onto the stack, regardless of the mnemonic used. In the real-address mode, if the ESP or SP register is 1, 3, or 5 when the PUSHA/PUSHAD instruction is executed, the processor shuts down due to a lack of stack space. No exception is generated to indicate this condition.

Operation
IF OperandSize = 32 (* PUSHAD instruction *)
    THEN
        Temp ← (ESP);
        Push(EAX);
        Push(ECX);
        Push(EDX);
        Push(EBX);
        Push(Temp);
        Push(EBP);
        Push(ESI);
        Push(EDI);
    ELSE (* OperandSize = 16, PUSHA instruction *)
        Temp ← (SP);
        Push(AX);
        Push(CX);
        Push(DX);
        Push(BX);
        Push(Temp);
        Push(BP);
        Push(SI);
        Push(DI);
FI;
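The push order above can be sketched as a small Python model. This is only an illustration: the function name pushad, the dictionary of registers, and the list used as a stack are assumptions made for the example, and segment limits and memory are not modeled.

```python
def pushad(regs, stack):
    """Model PUSHAD: push the eight general-purpose registers in manual order.

    `regs` maps register names to values; `stack` is a list whose last
    element is the top of the stack (hypothetical simplification).
    """
    temp = regs["ESP"]  # ESP as it was before the first push
    for name in ("EAX", "ECX", "EDX", "EBX"):
        stack.append(regs[name])
    stack.append(temp)  # the original ESP is pushed, not the decremented one
    for name in ("EBP", "ESI", "EDI"):
        stack.append(regs[name])
    regs["ESP"] = (temp - 8 * 4) & 0xFFFFFFFF  # eight doubleword pushes
    return stack


if __name__ == "__main__":
    r = {"EAX": 1, "ECX": 2, "EDX": 3, "EBX": 4,
         "ESP": 0x1000, "EBP": 6, "ESI": 7, "EDI": 8}
    print([hex(v) for v in pushad(r, [])], hex(r["ESP"]))
```

The fifth value on the stack is the pre-instruction ESP, which is why the pseudocode saves it in Temp first.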
Flags Affected
None.
Protected Mode Exceptions
#SS(0)           If the starting or ending stack address is outside the stack segment limit.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If an unaligned memory reference is made while the current privilege level is 3 and alignment checking is enabled.

Real-Address Mode Exceptions
#GP              If the ESP or SP register contains 7, 9, 11, 13, or 15.
Virtual-8086 Mode Exceptions
#GP(0)           If the ESP or SP register contains 7, 9, 11, 13, or 15.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If an unaligned memory reference is made while alignment checking is enabled.
PUSHF/PUSHFD-Push EFLAGS Register onto the Stack

Opcode  Instruction  Description
9C      PUSHF        Push lower 16 bits of EFLAGS
9C      PUSHFD       Push EFLAGS
Description
These instructions decrement the stack pointer by four (if the current operand-size attribute is 32) and push the entire contents of the EFLAGS register onto the stack, or decrement the stack pointer by two (if the operand-size attribute is 16) and push the lower 16 bits of the EFLAGS register (that is, the FLAGS register) onto the stack. (These instructions reverse the operation of the POPF/POPFD instructions.) When copying the entire EFLAGS register to the stack, the RF and VM flags (bits 16 and 17, respectively) are not copied; instead, the values for these flags are cleared in the EFLAGS image stored on the stack. Refer to Section 3.6.3. in Chapter 3, Basic Execution Environment of the Intel Architecture Software Developer's Manual, Volume 1, for information about the EFLAGS register.
The PUSHF (push flags) and PUSHFD (push flags double) mnemonics reference the same opcode. The PUSHF instruction is intended for use when the operand-size attribute is 16 and the PUSHFD instruction for when the operand-size attribute is 32. Some assemblers may force the operand size to 16 when PUSHF is used and to 32 when PUSHFD is used. Others may treat these mnemonics as synonyms (PUSHF/PUSHFD) and use the current setting of the operand-size attribute to determine the size of values to be pushed onto the stack, regardless of the mnemonic used.
When in virtual-8086 mode and the I/O privilege level (IOPL) is less than 3, the PUSHF/PUSHFD instruction causes a general protection exception (#GP).
In the real-address mode, if the ESP or SP register is 1, 3, or 5 when the PUSHA/PUSHAD instruction is executed, the processor shuts down due to a lack of stack space. No exception is generated to indicate this condition.
3-587
PUSHF/PUSHFDPush EFLAGS Register onto the Stack (Continued)

Operation
IF (PE=0) OR (PE=1 AND ((VM=0) OR (VM=1 AND IOPL=3)))
(* Real-Address Mode, Protected mode, or Virtual-8086 mode with IOPL equal to 3 *)
    THEN
        IF OperandSize = 32
            THEN
                push(EFLAGS AND 00FCFFFFH);
                (* VM and RF EFLAG bits are cleared in image stored on the stack *)
            ELSE
                push(EFLAGS); (* Lower 16 bits only *)
        FI;
    ELSE (* In Virtual-8086 Mode with IOPL less than 3 *)
        #GP(0); (* Trap to virtual-8086 monitor *)
FI;
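As a rough illustration of the operation above, the following Python sketch computes the EFLAGS image that would be stored on the stack. The function name pushf_image, its parameters, and the exception class are hypothetical names chosen for this example; the mask 00FCFFFFH and the mode test come from the pseudocode.

```python
class GeneralProtectionFault(Exception):
    """Stands in for #GP(0) in this model."""


def pushf_image(eflags, operand_size, pe=1, vm=0, iopl=0):
    """Return the value PUSHF/PUSHFD would push, or raise on #GP(0)."""
    if pe == 1 and vm == 1 and iopl < 3:
        # Virtual-8086 mode with IOPL below 3 traps to the monitor.
        raise GeneralProtectionFault("#GP(0)")
    if operand_size == 32:
        # 00FCFFFFH clears RF (bit 16) and VM (bit 17) in the stored image.
        return eflags & 0x00FCFFFF
    return eflags & 0xFFFF  # lower 16 bits only (the FLAGS register)


if __name__ == "__main__":
    print(hex(pushf_image(0x00030202, 32)))
    print(hex(pushf_image(0x00030202, 16)))
```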
Flags Affected
None.
Protected Mode Exceptions
#SS(0)           If the new value of the ESP register is outside the stack segment boundary.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If an unaligned memory reference is made while the current privilege level is 3 and alignment checking is enabled.
Real-Address Mode Exceptions
None.
Virtual-8086 Mode Exceptions
#GP(0)           If the I/O privilege level is less than 3.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If an unaligned memory reference is made while alignment checking is enabled.
PXOR-Logical Exclusive OR

Opcode    Instruction       Description
0F EF /r  PXOR mm, mm/m64   XOR quadword from mm/m64 to quadword in mm.
Description This instruction performs a bitwise logical exclusive-OR (XOR) operation on the quadword source (second) and destination (first) operands and stores the result in the destination operand location (refer to Figure 3-85). The source operand can be an MMX technology register or a quadword memory location; the destination operand must be an MMX technology register. Each bit of the result is 1 if the corresponding bits of the two operands are different; each bit is 0 if the corresponding bits of the operands are the same.
PXOR mm, mm/m64
mm (before)   1111111111111000000000000000010110110101100010000111011101110111
mm/m64        0001000011011001010100000011000100011110111011110001010110010101
mm (result)   1110111100100001010100000011010010101011011001110110001011100010
Figure 3-85. Operation of the PXOR Instruction
Operation
DEST ← DEST XOR SRC;

Intel C/C++ Compiler Intrinsic Equivalent
__m64 _m_pxor(__m64 m1, __m64 m2)

__m64 _mm_xor_si64(__m64 m1, __m64 m2)
Perform a bitwise XOR of the 64-bit value in m1 with the 64-bit value in m2.
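The quadword XOR can be modeled in a few lines of Python. This is a sketch only: the function name pxor and the operand values are chosen for illustration, and MMX registers are represented as plain 64-bit integers.

```python
MASK64 = (1 << 64) - 1


def pxor(dest, src):
    """Model PXOR: bitwise XOR of two 64-bit quadwords."""
    return (dest ^ src) & MASK64


if __name__ == "__main__":
    print(hex(pxor(0xFFF80005B5887777, 0x10D950311EEF1595)))
    print(hex(pxor(0x1234, 0x1234)))  # XOR of a value with itself is zero
```

Because every bit pair is identical when both operands are the same register, PXOR mm, mm yields zero regardless of the register's prior contents.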
RCL/RCR/ROL/ROR-Rotate
Opcode     Instruction      Description
D0 /2      RCL r/m8,1       Rotate nine bits (CF,r/m8) left once
D2 /2      RCL r/m8,CL      Rotate nine bits (CF,r/m8) left CL times
C0 /2 ib   RCL r/m8,imm8    Rotate nine bits (CF,r/m8) left imm8 times
D1 /2      RCL r/m16,1      Rotate 17 bits (CF,r/m16) left once
D3 /2      RCL r/m16,CL     Rotate 17 bits (CF,r/m16) left CL times
C1 /2 ib   RCL r/m16,imm8   Rotate 17 bits (CF,r/m16) left imm8 times
D1 /2      RCL r/m32,1      Rotate 33 bits (CF,r/m32) left once
D3 /2      RCL r/m32,CL     Rotate 33 bits (CF,r/m32) left CL times
C1 /2 ib   RCL r/m32,imm8   Rotate 33 bits (CF,r/m32) left imm8 times
D0 /3      RCR r/m8,1       Rotate nine bits (CF,r/m8) right once
D2 /3      RCR r/m8,CL      Rotate nine bits (CF,r/m8) right CL times
C0 /3 ib   RCR r/m8,imm8    Rotate nine bits (CF,r/m8) right imm8 times
D1 /3      RCR r/m16,1      Rotate 17 bits (CF,r/m16) right once
D3 /3      RCR r/m16,CL     Rotate 17 bits (CF,r/m16) right CL times
C1 /3 ib   RCR r/m16,imm8   Rotate 17 bits (CF,r/m16) right imm8 times
D1 /3      RCR r/m32,1      Rotate 33 bits (CF,r/m32) right once
D3 /3      RCR r/m32,CL     Rotate 33 bits (CF,r/m32) right CL times
C1 /3 ib   RCR r/m32,imm8   Rotate 33 bits (CF,r/m32) right imm8 times
D0 /0      ROL r/m8,1       Rotate eight bits r/m8 left once
D2 /0      ROL r/m8,CL      Rotate eight bits r/m8 left CL times
C0 /0 ib   ROL r/m8,imm8    Rotate eight bits r/m8 left imm8 times
D1 /0      ROL r/m16,1      Rotate 16 bits r/m16 left once
D3 /0      ROL r/m16,CL     Rotate 16 bits r/m16 left CL times
C1 /0 ib   ROL r/m16,imm8   Rotate 16 bits r/m16 left imm8 times
D1 /0      ROL r/m32,1      Rotate 32 bits r/m32 left once
D3 /0      ROL r/m32,CL     Rotate 32 bits r/m32 left CL times
C1 /0 ib   ROL r/m32,imm8   Rotate 32 bits r/m32 left imm8 times
D0 /1      ROR r/m8,1       Rotate eight bits r/m8 right once
D2 /1      ROR r/m8,CL      Rotate eight bits r/m8 right CL times
C0 /1 ib   ROR r/m8,imm8    Rotate eight bits r/m8 right imm8 times
D1 /1      ROR r/m16,1      Rotate 16 bits r/m16 right once
D3 /1      ROR r/m16,CL     Rotate 16 bits r/m16 right CL times
C1 /1 ib   ROR r/m16,imm8   Rotate 16 bits r/m16 right imm8 times
D1 /1      ROR r/m32,1      Rotate 32 bits r/m32 right once
D3 /1      ROR r/m32,CL     Rotate 32 bits r/m32 right CL times
C1 /1 ib   ROR r/m32,imm8   Rotate 32 bits r/m32 right imm8 times
Description
These instructions shift (rotate) the bits of the first operand (destination operand) the number of bit positions specified in the second operand (count operand) and store the result in the destination operand. The destination operand can be a register or a memory location; the count operand is an unsigned integer that can be an immediate or a value in the CL register. The processor restricts the count to a number between 0 and 31 by masking all the bits in the count operand except the five least-significant bits.
The rotate left (ROL) and rotate through carry left (RCL) instructions shift all the bits toward more-significant bit positions, except for the most-significant bit, which is rotated to the least-significant bit location. For more information, refer to Figure 6-10 in Chapter 6, Instruction Set Summary of the Intel Architecture Software Developer's Manual, Volume 1. The rotate right (ROR) and rotate through carry right (RCR) instructions shift all the bits toward less-significant bit positions, except for the least-significant bit, which is rotated to the most-significant bit location.
The RCL and RCR instructions include the CF flag in the rotation. The RCL instruction shifts the CF flag into the least-significant bit and shifts the most-significant bit into the CF flag. The RCR instruction shifts the CF flag into the most-significant bit and shifts the least-significant bit into the CF flag. For the ROL and ROR instructions, the original value of the CF flag is not a part of the result, but the CF flag receives a copy of the bit that was shifted from one end to the other.
The OF flag is defined only for the 1-bit rotates; it is undefined in all other cases (except that a zero-bit rotate does nothing, that is, it affects no flags). For left rotates, the OF flag is set to the exclusive OR of the CF bit (after the rotate) and the most-significant bit of the result. For right rotates, the OF flag is set to the exclusive OR of the two most-significant bits of the result.
Intel Architecture Compatibility
The 8086 does not mask the rotation count. However, all other Intel Architecture processors (starting with the Intel 286 processor) do mask the rotation count to five bits, resulting in a maximum count of 31. This masking is done in all operating modes (including the virtual-8086 mode) to reduce the maximum execution time of the instructions.
Operation
(* RCL and RCR instructions *)
SIZE ← OperandSize
CASE (determine count) OF
    SIZE = 8:  tempCOUNT ← (COUNT AND 1FH) MOD 9;
    SIZE = 16: tempCOUNT ← (COUNT AND 1FH) MOD 17;
    SIZE = 32: tempCOUNT ← COUNT AND 1FH;
ESAC;
(* RCL instruction operation *)
WHILE (tempCOUNT ≠ 0)
    DO
        tempCF ← MSB(DEST);
        DEST ← (DEST * 2) + CF;
        CF ← tempCF;
        tempCOUNT ← tempCOUNT - 1;
    OD;
ELIHW;
IF COUNT = 1
    THEN OF ← MSB(DEST) XOR CF;
    ELSE OF is undefined;
FI;
(* RCR instruction operation *)
IF COUNT = 1
    THEN OF ← MSB(DEST) XOR CF;
    ELSE OF is undefined;
FI;
WHILE (tempCOUNT ≠ 0)
    DO
        tempCF ← LSB(DEST);
        DEST ← (DEST / 2) + (CF * 2^(SIZE-1));
        CF ← tempCF;
        tempCOUNT ← tempCOUNT - 1;
    OD;
(* ROL and ROR instructions *)
SIZE ← OperandSize
CASE (determine count) OF
    SIZE = 8:  tempCOUNT ← COUNT MOD 8;
    SIZE = 16: tempCOUNT ← COUNT MOD 16;
    SIZE = 32: tempCOUNT ← COUNT MOD 32;
ESAC;
(* ROL instruction operation *)
WHILE (tempCOUNT ≠ 0)
    DO
        tempCF ← MSB(DEST);
        DEST ← (DEST * 2) + tempCF;
        tempCOUNT ← tempCOUNT - 1;
    OD;
ELIHW;
CF ← LSB(DEST);
IF COUNT = 1
    THEN OF ← MSB(DEST) XOR CF;
    ELSE OF is undefined;
FI;
(* ROR instruction operation *)
WHILE (tempCOUNT ≠ 0)
    DO
        tempCF ← LSB(DEST);
        DEST ← (DEST / 2) + (tempCF * 2^(SIZE-1));
        tempCOUNT ← tempCOUNT - 1;
    OD;
ELIHW;
CF ← MSB(DEST);
IF COUNT = 1
    THEN OF ← MSB(DEST) XOR MSB-1(DEST);
    ELSE OF is undefined;
FI;
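The four rotates can be compared side by side with a short Python model. It is a sketch under stated assumptions: the function name rotate and its argument order are invented for the example, the OF flag is not modeled, and a masked count of zero is treated as leaving both the operand and CF unchanged, as the Description states.

```python
def rotate(op, dest, count, cf, size=8):
    """Return (result, CF) for op in {"ROL", "ROR", "RCL", "RCR"}."""
    mask = (1 << size) - 1
    top = size - 1
    dest &= mask
    count &= 0x1F  # only the five least-significant count bits are used
    if count == 0:
        return dest, cf  # zero-bit rotate affects nothing
    through_carry = op in ("RCL", "RCR")
    # RCL/RCR rotate SIZE+1 bits (CF included); ROL/ROR rotate SIZE bits.
    steps = count % (size + 1) if through_carry else count % size
    for _ in range(steps):
        if op == "RCL":
            dest, cf = ((dest << 1) & mask) | cf, (dest >> top) & 1
        elif op == "RCR":
            dest, cf = (dest >> 1) | (cf << top), dest & 1
        elif op == "ROL":
            dest = ((dest << 1) & mask) | ((dest >> top) & 1)
        else:  # ROR
            dest = (dest >> 1) | ((dest & 1) << top)
    if op == "ROL":
        cf = dest & 1  # CF copies the bit that wrapped into the LSB
    elif op == "ROR":
        cf = (dest >> top) & 1  # CF copies the bit that wrapped into the MSB
    return dest, cf


if __name__ == "__main__":
    for op in ("ROL", "ROR", "RCL", "RCR"):
        result, carry = rotate(op, 0x81, 1, 0)
        print(op, hex(result), carry)
```

Rotating an 8-bit operand through carry by nine positions returns the original operand and CF, which is why the pseudocode reduces the count modulo 9 for RCL and RCR.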
Flags Affected
The CF flag contains the value of the bit shifted into it. The OF flag is affected only for single-bit rotates (refer to Description above); it is undefined for multi-bit rotates. The SF, ZF, AF, and PF flags are not affected.
Protected Mode Exceptions
#GP(0)           If the source operand is located in a nonwritable segment.
                 If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                 If the DS, ES, FS, or GS register contains a null segment selector.
#SS(0)           If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
RCPPS-Packed Single-FP Reciprocal

Opcode    Instruction             Description
0F,53,/r  RCPPS xmm1, xmm2/m128   Return a packed approximation of the reciprocal of XMM2/Mem.
Description RCPPS returns an approximation of the reciprocal of the SP FP numbers from xmm2/m128. The maximum error for this approximation is:
|Error| <= 1.5 x 2^-12
Figure 3-86. Operation of the RCPPS Instruction
Operation
DEST[31-0] = APPROX (1.0/(SRC/m128[31-0]));
DEST[63-32] = APPROX (1.0/(SRC/m128[63-32]));
DEST[95-64] = APPROX (1.0/(SRC/m128[95-64]));
DEST[127-96] = APPROX (1.0/(SRC/m128[127-96]));

__m128 _mm_rcp_ps(__m128 a)
Computes the approximations of the reciprocals of the four SP FP values of a.

Exceptions
General protection exception if not aligned on 16-byte boundary, regardless of segment.
Numeric Exceptions
None.
Protected Mode Exceptions
#GP(0)           For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)           For an illegal address in the SS segment.
#PF(fault-code)  For a page fault.
#UD              If CR0.EM = 1.
#NM              If TS bit in CR0 is set.
Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#PF(fault-code)  For a page fault.
Comments
RCPPS is not affected by the rounding control in MXCSR. Denormal inputs are treated as zeroes (of the same sign) and underflow results are always flushed to zero, with the sign of the operand.
RCPSS-Scalar Single-FP Reciprocal

Opcode       Instruction            Description
F3,0F,53,/r  RCPSS xmm1, xmm2/m32   Return an approximation of the reciprocal of the lower SP FP number in XMM2/Mem.
Description RCPSS returns an approximation of the reciprocal of the lower SP FP number from xmm2/m32; the upper three fields are passed through from xmm1. The maximum error for this approximation is:
|Error| <= 1.5 x 2^-12
Figure 3-87. Operation of the RCPSS Instruction
Operation
DEST[31-0] = APPROX (1.0/(SRC/m32[31-0]));
DEST[63-32] = DEST[63-32];
DEST[95-64] = DEST[95-64];
DEST[127-96] = DEST[127-96];
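The difference between the packed and scalar forms can be sketched in Python with registers modeled as four-element lists, where element 0 stands for bits 31-0. Everything here is an assumption made for illustration: the function names, the list representation, and above all approx_rcp, which simply rounds the exact reciprocal to single precision and is not the processor's approximation. The error bound is treated as a relative error, which the text above does not state explicitly.

```python
import struct

MAX_ERROR = 1.5 * 2.0 ** -12  # bound quoted in the Description


def f32(x):
    """Round a Python float to single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def approx_rcp(x):
    """Hypothetical stand-in for APPROX(1.0/x); not the hardware result."""
    return f32(1.0 / x)


def rcpps(src):
    """Packed form: all four lanes are replaced by reciprocals."""
    return [approx_rcp(v) for v in src]


def rcpss(dest, src):
    """Scalar form: lane 0 is computed, lanes 1-3 pass through from dest."""
    return [approx_rcp(src[0])] + list(dest[1:])


if __name__ == "__main__":
    print(rcpps([5.0, 2.0, 4.0, 8.0]))
    print(rcpss([9.0, 1.0, 2.0, 3.0], [4.0, 7.0, 7.0, 7.0]))
```

Zero and denormal inputs are outside the scope of this sketch.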

__m128 _mm_rcp_ss(__m128 a)
Computes the approximation of the reciprocal of the lower SP FP value of a; the upper three SP FP values are passed through.
Numeric Exceptions
None.
Protected Mode Exceptions
#GP(0)            For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)            For an illegal address in the SS segment.
#PF (fault-code)  For a page fault.
#UD               If CR0.EM = 1.
#AC               For unaligned memory reference if the current privilege level is 3.
#NM               If TS bit in CR0 is set.
Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#AC               For unaligned memory reference if the current privilege level is 3.
#PF (fault-code)  For a page fault.
Comments
RCPSS is not affected by the rounding control in MXCSR. Denormal inputs are treated as zeroes (of the same sign) and underflow results are always flushed to zero, with the sign of the operand.
RDMSR-Read from Model Specific Register

Opcode  Instruction  Description
0F 32   RDMSR        Load MSR specified by ECX into EDX:EAX
Description
This instruction loads the contents of a 64-bit model specific register (MSR) specified in the ECX register into registers EDX:EAX. The EDX register is loaded with the high-order 32 bits of the MSR and the EAX register is loaded with the low-order 32 bits. If less than 64 bits are implemented in the MSR being read, the values returned to EDX:EAX in unimplemented bit locations are undefined.
This instruction must be executed at privilege level 0 or in real-address mode; otherwise, a general protection exception #GP(0) will be generated. Specifying a reserved or unimplemented MSR address in ECX will also cause a general protection exception.
The MSRs control functions for testability, execution tracing, performance-monitoring and machine check errors. Appendix B, Model-Specific Registers, in the Intel Architecture Software Developer's Manual, Volume 3, lists all the MSRs that can be read with this instruction and their addresses.
The CPUID instruction should be used to determine whether MSRs are supported (EDX[5]=1) before using this instruction.
Intel Architecture Compatibility
The MSRs and the ability to read them with the RDMSR instruction were introduced into the Intel Architecture with the Pentium processor. Execution of this instruction by an Intel Architecture processor earlier than the Pentium processor results in an invalid opcode exception #UD.
Operation
EDX:EAX ← MSR[ECX];
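The EDX:EAX convention is a plain split of a 64-bit value into two 32-bit halves. A minimal Python sketch, with the helper names split_edx_eax and join_edx_eax invented for this example:

```python
def split_edx_eax(value64):
    """Return (EDX, EAX): high-order and low-order 32 bits of a 64-bit value."""
    return (value64 >> 32) & 0xFFFFFFFF, value64 & 0xFFFFFFFF


def join_edx_eax(edx, eax):
    """Rebuild the 64-bit value that software sees in EDX:EAX."""
    return ((edx & 0xFFFFFFFF) << 32) | (eax & 0xFFFFFFFF)


if __name__ == "__main__":
    edx, eax = split_edx_eax(0x0123456789ABCDEF)
    print(hex(edx), hex(eax), hex(join_edx_eax(edx, eax)))
```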
Flags Affected
None.
Protected Mode Exceptions
#GP(0)  If the current privilege level is not 0.
        If the value in ECX specifies a reserved or unimplemented MSR address.

Real-Address Mode Exceptions
#GP     If the value in ECX specifies a reserved or unimplemented MSR address.
Virtual-8086 Mode Exceptions
#GP(0)  The RDMSR instruction is not recognized in virtual-8086 mode.
RDPMC-Read Performance-Monitoring Counters

Opcode  Instruction  Description
0F 33   RDPMC        Read performance-monitoring counter specified by ECX into EDX:EAX
Description
This instruction loads the contents of the 40-bit performance-monitoring counter specified in the ECX register into registers EDX:EAX. The EDX register is loaded with the high-order eight bits of the counter and the EAX register is loaded with the low-order 32 bits. The Pentium Pro processor has two performance-monitoring counters (0 and 1), which are specified by placing 0000H or 0001H, respectively, in the ECX register.
The RDPMC instruction allows application code running at a privilege level of 1, 2, or 3 to read the performance-monitoring counters if the PCE flag in the CR4 register is set. This instruction is provided to allow performance monitoring by application code without incurring the overhead of a call to an operating-system procedure.
The performance-monitoring counters are event counters that can be programmed to count events such as the number of instructions decoded, number of interrupts received, or number of cache loads. Appendix A, Performance-Monitoring Events, in the Intel Architecture Software Developer's Manual, Volume 3, lists all the events that can be counted.
The RDPMC instruction does not serialize instruction execution. That is, it does not imply that all the events caused by the preceding instructions have been completed or that events caused by subsequent instructions have not begun. If an exact event count is desired, software must use a serializing instruction (such as the CPUID instruction) before and/or after the execution of the RDPMC instruction.
The RDPMC instruction can execute in 16-bit addressing mode or virtual-8086 mode; however, the full contents of the ECX register are used to determine the counter to access and a full 40-bit result is returned (the low-order 32 bits in the EAX register and the high-order eight bits in the EDX register).
Intel Architecture Compatibility
The RDPMC instruction was introduced into the Intel Architecture in the Pentium Pro processor and the Pentium processor with MMX technology. The other Pentium processors have performance-monitoring counters, but they must be read with the RDMSR instruction.
Operation
IF (ECX = 0 OR 1) AND ((CR4.PCE = 1) OR ((CR4.PCE = 0) AND (CPL=0)))
    THEN
        EDX:EAX ← PMC[ECX];
    ELSE (* ECX is not 0 or 1 and/or CR4.PCE is 0 and CPL is 1, 2, or 3 *)
        #GP(0);
FI;
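The access check and the 40-bit split can be modeled together. The sketch below is illustrative only: the function name rdpmc, the counters list, and the exception class are hypothetical, and the two-counter limit reflects the Pentium Pro case described above.

```python
class GeneralProtectionFault(Exception):
    """Stands in for #GP(0) in this model."""


def rdpmc(ecx, cr4_pce, cpl, counters):
    """Return (EDX, EAX) for counter `ecx`, or raise on #GP(0)."""
    if ecx in (0, 1) and (cr4_pce == 1 or cpl == 0):
        value = counters[ecx] & ((1 << 40) - 1)  # counters are 40 bits wide
        return (value >> 32) & 0xFF, value & 0xFFFFFFFF
    raise GeneralProtectionFault("#GP(0)")


if __name__ == "__main__":
    pmc = [0x0000000100, 0xAB12345678]
    print([hex(v) for v in rdpmc(1, cr4_pce=1, cpl=3, counters=pmc)])
```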

Flags Affected
None.
Protected Mode Exceptions
#GP(0)  If the current privilege level is not 0 and the PCE flag in the CR4 register is clear.
        If the value in the ECX register is not 0 or 1.
Real-Address Mode Exceptions
#GP     If the PCE flag in the CR4 register is clear.
        If the value in the ECX register is not 0 or 1.
Virtual-8086 Mode Exceptions
#GP(0)  If the PCE flag in the CR4 register is clear.
        If the value in the ECX register is not 0 or 1.
RDTSC-Read Time-Stamp Counter

Opcode  Instruction  Description
0F 31   RDTSC        Read time-stamp counter into EDX:EAX
Description
This instruction loads the current value of the processor's time-stamp counter into the EDX:EAX registers. The time-stamp counter is contained in a 64-bit MSR. The high-order 32 bits of the MSR are loaded into the EDX register, and the low-order 32 bits are loaded into the EAX register. The processor increments the time-stamp counter MSR every clock cycle and resets it to 0 whenever the processor is reset.
The time stamp disable (TSD) flag in register CR4 restricts the use of the RDTSC instruction. When the TSD flag is clear, the RDTSC instruction can be executed at any privilege level; when the flag is set, the instruction can only be executed at privilege level 0. The time-stamp counter can also be read with the RDMSR instruction, when executing at privilege level 0.
The RDTSC instruction is not a serializing instruction. Thus, it does not necessarily wait until all previous instructions have been executed before reading the counter. Similarly, subsequent instructions may begin execution before the read operation is performed.
This instruction was introduced into the Intel Architecture in the Pentium processor.
Operation
IF (CR4.TSD = 0) OR ((CR4.TSD = 1) AND (CPL=0))
    THEN
        EDX:EAX ← TimeStampCounter;
    ELSE (* CR4.TSD is 1 and CPL is 1, 2, or 3 *)
        #GP(0)
FI;
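A common use of the counter is to take two readings and subtract them. The Python sketch below models the privilege check and the subtraction; the function names and the (EDX, EAX) tuple representation are assumptions made for the example, and the readings are invented values rather than measurements.

```python
def rdtsc_allowed(cr4_tsd, cpl):
    """True when the check in the Operation section permits RDTSC."""
    return cr4_tsd == 0 or cpl == 0


def elapsed_cycles(start, end):
    """Difference between two (EDX, EAX) readings, modulo 2**64."""
    def to64(pair):
        return ((pair[0] & 0xFFFFFFFF) << 32) | (pair[1] & 0xFFFFFFFF)
    return (to64(end) - to64(start)) & ((1 << 64) - 1)


if __name__ == "__main__":
    # The low half wraps between the readings; EDX carries the overflow.
    print(elapsed_cycles((0, 0xFFFFFFF0), (1, 0x00000010)))
```

Because RDTSC is not serializing, a difference taken this way includes some uncertainty about which surrounding instructions had completed at each reading.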
Flags Affected
None.
Protected Mode Exceptions
#GP(0)  If the TSD flag in register CR4 is set and the CPL is greater than 0.
Real-Address Mode Exceptions
#GP     If the TSD flag in register CR4 is set.
Virtual-8086 Mode Exceptions
#GP(0)  If the TSD flag in register CR4 is set.
REP/REPE/REPZ/REPNE/REPNZ-Repeat String Operation Prefix

Opcode  Instruction          Description
F3 6C   REP INS r/m8, DX     Input (E)CX bytes from port DX into ES:[(E)DI]
F3 6D   REP INS r/m16,DX     Input (E)CX words from port DX into ES:[(E)DI]
F3 6D   REP INS r/m32,DX     Input (E)CX doublewords from port DX into ES:[(E)DI]
F3 A4   REP MOVS m8,m8       Move (E)CX bytes from DS:[(E)SI] to ES:[(E)DI]
F3 A5   REP MOVS m16,m16     Move (E)CX words from DS:[(E)SI] to ES:[(E)DI]
F3 A5   REP MOVS m32,m32     Move (E)CX doublewords from DS:[(E)SI] to ES:[(E)DI]
F3 6E   REP OUTS DX,r/m8     Output (E)CX bytes from DS:[(E)SI] to port DX
F3 6F   REP OUTS DX,r/m16    Output (E)CX words from DS:[(E)SI] to port DX
F3 6F   REP OUTS DX,r/m32    Output (E)CX doublewords from DS:[(E)SI] to port DX
F3 AC   REP LODS AL          Load (E)CX bytes from DS:[(E)SI] to AL
F3 AD   REP LODS AX          Load (E)CX words from DS:[(E)SI] to AX
F3 AD   REP LODS EAX         Load (E)CX doublewords from DS:[(E)SI] to EAX
F3 AA   REP STOS m8          Fill (E)CX bytes at ES:[(E)DI] with AL
F3 AB   REP STOS m16         Fill (E)CX words at ES:[(E)DI] with AX
F3 AB   REP STOS m32         Fill (E)CX doublewords at ES:[(E)DI] with EAX
F3 A6   REPE CMPS m8,m8      Find nonmatching bytes in ES:[(E)DI] and DS:[(E)SI]
F3 A7   REPE CMPS m16,m16    Find nonmatching words in ES:[(E)DI] and DS:[(E)SI]
F3 A7   REPE CMPS m32,m32    Find nonmatching doublewords in ES:[(E)DI] and DS:[(E)SI]
F3 AE   REPE SCAS m8         Find non-AL byte starting at ES:[(E)DI]
F3 AF   REPE SCAS m16        Find non-AX word starting at ES:[(E)DI]
F3 AF   REPE SCAS m32        Find non-EAX doubleword starting at ES:[(E)DI]
F2 A6   REPNE CMPS m8,m8     Find matching bytes in ES:[(E)DI] and DS:[(E)SI]
F2 A7   REPNE CMPS m16,m16   Find matching words in ES:[(E)DI] and DS:[(E)SI]
F2 A7   REPNE CMPS m32,m32   Find matching doublewords in ES:[(E)DI] and DS:[(E)SI]
F2 AE   REPNE SCAS m8        Find AL, starting at ES:[(E)DI]
F2 AF   REPNE SCAS m16       Find AX, starting at ES:[(E)DI]
F2 AF   REPNE SCAS m32       Find EAX, starting at ES:[(E)DI]
Description These instructions repeat a string instruction the number of times specified in the count register ((E)CX) or until the indicated condition of the ZF flag is no longer met. The REP (repeat), REPE (repeat while equal), REPNE (repeat while not equal), REPZ (repeat while zero), and REPNZ (repeat while not zero) mnemonics are prefixes that can be added to one of the string instructions. The REP prefix can be added to the INS, OUTS, MOVS, LODS, and STOS instructions, and the REPE, REPNE, REPZ, and REPNZ prefixes can be added to the CMPS and SCAS instructions. (The REPZ and REPNZ prefixes are synonymous forms of the REPE and REPNE prefixes, respectively.) The behavior of the REP prefix is undefined when used with non-string instructions.

The REP prefixes apply only to one string instruction at a time. To repeat a block of instructions, use the LOOP instruction or another looping construct.
All of these repeat prefixes cause the associated instruction to be repeated until the count in register (E)CX is decremented to 0 (refer to the following table). If the current address-size attribute is 32, register ECX is used as a counter, and if the address-size attribute is 16, the CX register is used. The REPE, REPNE, REPZ, and REPNZ prefixes also check the state of the ZF flag after each iteration and terminate the repeat loop if the ZF flag is not in the specified state. When both termination conditions are tested, the cause of a repeat termination can be determined either by testing the (E)CX register with a JECXZ instruction or by testing the ZF flag with a JZ, JNZ, or JNE instruction.
Repeat Conditions
Repeat Prefix  Termination Condition 1  Termination Condition 2
REP            ECX=0                    None
REPE/REPZ      ECX=0                    ZF=0
REPNE/REPNZ    ECX=0                    ZF=1
When the REPE/REPZ and REPNE/REPNZ prefixes are used, the ZF flag does not require initialization because both the CMPS and SCAS instructions affect the ZF flag according to the results of the comparisons they make. A repeating string operation can be suspended by an exception or interrupt. When this happens, the state of the registers is preserved to allow the string operation to be resumed upon a return from the exception or interrupt handler. The source and destination registers point to the next string elements to be operated on, the EIP register points to the string instruction, and the ECX register has the value it held following the last successful iteration of the instruction. This mechanism allows long string operations to proceed without affecting the interrupt response time of the system. When a fault occurs during the execution of a CMPS or SCAS instruction that is prefixed with REPE or REPNE, the EFLAGS value is restored to the state prior to the execution of the instruction. Since the SCAS and CMPS instructions do not use EFLAGS as an input, the processor can resume the instruction after the page fault handler. Use the REP INS and REP OUTS instructions with caution. Not all I/O ports can handle the rate at which these instructions execute. A REP STOS instruction is the fastest way to initialize a large block of memory.

Operation
IF AddressSize = 16
    THEN use CX for CountReg;
    ELSE (* AddressSize = 32 *) use ECX for CountReg;
FI;
WHILE CountReg ≠ 0
    DO
        service pending interrupts (if any);
        execute associated string instruction;
        CountReg ← CountReg - 1;
        IF CountReg = 0
            THEN exit WHILE loop
        FI;
        IF (repeat prefix is REPZ or REPE) AND (ZF=0)
        OR (repeat prefix is REPNZ or REPNE) AND (ZF=1)
            THEN exit WHILE loop
        FI;
    OD;
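The loop above can be traced with a small Python model of REPE/REPNE applied to a byte-wide SCAS. This is a sketch under stated assumptions: the function name rep_scas and its return tuple are invented for the example, the direction flag is assumed clear so the index increments, interrupts are ignored, and a ZF of None means the loop never ran and ZF was left as it was.

```python
def rep_scas(prefix, memory, start, count, al):
    """Model REPE/REPNE SCASB. Returns (final index, remaining count, ZF)."""
    di, cx, zf = start, count, None
    while cx != 0:
        zf = 1 if memory[di] == al else 0  # SCAS compares AL with ES:[(E)DI]
        di += 1
        cx -= 1
        if cx == 0:
            break  # termination condition 1: count exhausted
        if prefix == "REPE" and zf == 0:
            break  # termination condition 2 for REPE/REPZ
        if prefix == "REPNE" and zf == 1:
            break  # termination condition 2 for REPNE/REPNZ
    return di, cx, zf


if __name__ == "__main__":
    print(rep_scas("REPNE", b"hello", 0, 5, ord("l")))
```

When the count runs out, the remaining count alone does not say whether the last element matched; the returned ZF does, which is why the text suggests testing both.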
Flags Affected
None; however, the CMPS and SCAS instructions do set the status flags in the EFLAGS register.
Exceptions (All Operating Modes)
None; however, exceptions can be generated by the instruction a repeat prefix is associated with.
RET-Return from Procedure

Opcode  Instruction  Description
C3      RET          Near return to calling procedure
CB      RET          Far return to calling procedure
C2 iw   RET imm16    Near return to calling procedure and pop imm16 bytes from stack
CA iw   RET imm16    Far return to calling procedure and pop imm16 bytes from stack
Description This instruction transfers program control to a return address located on the top of the stack. The address is usually placed on the stack by a CALL instruction, and the return is made to the instruction that follows the CALL instruction. The optional source operand specifies the number of stack bytes to be released after the return address is popped; the default is none. This operand can be used to release parameters from the stack that were passed to the called procedure and are no longer needed. It must be used when the CALL instruction used to switch to a new procedure uses a call gate with a non-zero word count to access the new procedure. Here, the source operand for the RET instruction must specify the same number of bytes as is specified in the word count field of the call gate. The RET instruction can be used to execute three different types of returns:
Near return - A return to a calling procedure within the current code segment (the segment currently pointed to by the CS register), sometimes referred to as an intrasegment return.
Far return - A return to a calling procedure located in a different segment than the current code segment, sometimes referred to as an intersegment return.
Inter-privilege-level far return - A far return to a different privilege level than that of the currently executing program or procedure.
The inter-privilege-level return type can only be executed in protected mode. Refer to Section 4.3., Calling Procedures Using CALL and RET in Chapter 4, Procedure Calls, Interrupts, and Exceptions of the Intel Architecture Software Developer's Manual, Volume 1, for detailed information on near, far, and inter-privilege-level returns.
When executing a near return, the processor pops the return instruction pointer (offset) from the top of the stack into the EIP register and begins program execution at the new instruction pointer. The CS register is unchanged.
When executing a far return, the processor pops the return instruction pointer from the top of the stack into the EIP register, then pops the segment selector from the top of the stack into the CS register. The processor then begins program execution in the new code segment at the new instruction pointer.
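The stack effects of near and far returns, with and without an immediate operand, can be sketched in Python. The model is deliberately narrow and its names are assumptions: ret, the mem dictionary of doublewords keyed by address, and the returned tuple are all invented for the example, only the 32-bit operand size is covered, and no limit or privilege checks are performed.

```python
def ret(mem, esp, far=False, imm16=0):
    """Model a 32-bit RET. Returns (EIP, CS, ESP); CS is None if unchanged."""
    eip = mem[esp]  # pop the return instruction pointer
    esp += 4
    cs = None
    if far:
        cs = mem[esp] & 0xFFFF  # 32-bit pop, high-order 16 bits discarded
        esp += 4
    esp += imm16  # release parameters after the return address is popped
    return eip, cs, esp & 0xFFFFFFFF


if __name__ == "__main__":
    stack = {0x100: 0x00401000, 0x104: 0x0000001B, 0x108: 111}
    print(ret(stack, 0x100))
    print(ret(stack, 0x100, far=True, imm16=8))
```

A near RET 8 on this stack leaves ESP at 10CH: four bytes for the return address plus the eight bytes of parameters.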

The mechanics of an inter-privilege-level far return are similar to an intersegment return, except that the processor examines the privilege levels and access rights of the code and stack segments being returned to, in order to determine whether the control transfer is allowed. The DS, ES, FS, and GS segment registers are cleared by the RET instruction during an inter-privilege-level return if they refer to segments that are not allowed to be accessed at the new privilege level. Since a stack switch also occurs on an inter-privilege-level return, the ESP and SS registers are loaded from the stack.
If parameters are passed to the called procedure during an inter-privilege-level call, the optional source operand must be used with the RET instruction to release the parameters on the return. Here, the parameters are released both from the called procedure's stack and the calling procedure's stack (that is, the stack being returned to).
Operation
(* Near return *)
IF instruction = near return
    THEN;
        IF OperandSize = 32
            THEN
                IF top 4 bytes of stack not within stack limits THEN #SS(0); FI;
                EIP ← Pop();
            ELSE (* OperandSize = 16 *)
                IF top 2 bytes of stack not within stack limits THEN #SS(0); FI;
                tempEIP ← Pop();
                tempEIP ← tempEIP AND 0000FFFFH;
                IF tempEIP not within code segment limits THEN #GP(0); FI;
                EIP ← tempEIP;
        FI;
        IF instruction has immediate operand
            THEN IF StackAddressSize=32
                THEN
                    ESP ← ESP + SRC; (* release parameters from stack *)
                ELSE (* StackAddressSize=16 *)
                    SP ← SP + SRC; (* release parameters from stack *)
            FI;
        FI;
FI;
(* Real-address mode or virtual-8086 mode *)
IF ((PE = 0) OR (PE = 1 AND VM = 1)) AND instruction = far return
    THEN;

        IF OperandSize = 32
            THEN
                IF top 8 bytes of stack not within stack limits THEN #SS(0); FI;
                EIP ← Pop();
                CS ← Pop(); (* 32-bit pop, high-order 16 bits discarded *)
            ELSE (* OperandSize = 16 *)
                IF top 4 bytes of stack not within stack limits THEN #SS(0); FI;
                tempEIP ← Pop();
                tempEIP ← tempEIP AND 0000FFFFH;
                IF tempEIP not within code segment limits THEN #GP(0); FI;
                EIP ← tempEIP;
                CS ← Pop(); (* 16-bit pop *)
        FI;
        IF instruction has immediate operand
            THEN
                SP ← SP + (SRC AND FFFFH); (* release parameters from stack *)
        FI;
FI;
(* Protected mode, not virtual-8086 mode *)
IF (PE = 1 AND VM = 0) AND instruction = far RET
    THEN
        IF OperandSize = 32
            THEN
                IF second doubleword on stack is not within stack limits THEN #SS(0); FI;
            ELSE (* OperandSize = 16 *)
                IF second word on stack is not within stack limits THEN #SS(0); FI;
        FI;
        IF return code segment selector is null THEN #GP(0); FI;
        IF return code segment selector addresses descriptor beyond descriptor table limit
            THEN #GP(selector);
        FI;
        Obtain descriptor to which return code segment selector points from descriptor table;
        IF return code segment descriptor is not a code segment THEN #GP(selector); FI;
        IF return code segment selector RPL < CPL THEN #GP(selector); FI;
        IF return code segment descriptor is conforming
        AND return code segment DPL > return code segment selector RPL
            THEN #GP(selector);
        FI;
        IF return code segment descriptor is not present THEN #NP(selector); FI;
        IF return code segment selector RPL > CPL
            THEN GOTO RETURN-OUTER-PRIVILEGE-LEVEL;
            ELSE GOTO RETURN-SAME-PRIVILEGE-LEVEL;
        FI;
END;
FI;

RETURN-SAME-PRIVILEGE-LEVEL: IF the return instruction pointer is not within ther return code segment limit THEN #GP(0); FI; IF OperandSize=32 THEN EIP Pop(); CS Pop(); (* 32-bit pop, high-order 16 bits discarded *) ESP ESP + SRC; (* release parameters from stack *) ELSE (* OperandSize=16 *) EIP Pop(); EIP EIP AND 0000FFFFH; CS Pop(); (* 16-bit pop *) ESP ESP + SRC; (* release parameters from stack *) FI; RETURN-OUTER-PRIVILEGE-LEVEL: IF top (16 + SRC) bytes of stack are not within stack limits (OperandSize=32) OR top (8 + SRC) bytes of stack are not within stack limits (OperandSize=16) THEN #SS(0); FI; FI; Read return segment selector; IF stack segment selector is null THEN #GP(0); FI; IF return stack segment selector index is not within its descriptor table limits THEN #GP(selector); FI; Read segment descriptor pointed to by return segment selector; IF stack segment selector RPL RPL of the return code segment selector OR stack segment is not a writable data segment OR stack segment descriptor DPL RPL of the return code segment selector THEN #GP(selector); FI; IF stack segment not present THEN #SS(StackSegmentSelector); FI; IF the return instruction pointer is not within the return code segment limit THEN #GP(0); FI: CPL ReturnCodeSegmentSelector(RPL); IF OperandSize=32 THEN EIP Pop(); CS Pop(); (* 32-bit pop, high-order 16 bits discarded *) (* segment descriptor information also loaded *) CS(RPL) CPL; ESP ESP + SRC; (* release parameters from called procedures stack *) tempESP Pop(); tempSS Pop(); (* 32-bit pop, high-order 16 bits discarded *) (* segment descriptor information also loaded *) ESP tempESP; SS tempSS;
3-611

ELSE (* OperandSize=16 *) EIP Pop(); EIP EIP AND 0000FFFFH; CS Pop(); (* 16-bit pop; segment descriptor information also loaded *) CS(RPL) CPL; ESP ESP + SRC; (* release parameters from called procedures stack *) tempESP Pop(); tempSS Pop(); (* 16-bit pop; segment descriptor information also loaded *) (* segment descriptor information also loaded *) ESP tempESP; SS tempSS; FI; FOR each of segment register (ES, FS, GS, and DS) DO; IF segment register points to data or non-conforming code segment AND CPL > segment descriptor DPL; (* DPL in hidden part of segment register *) THEN (* segment register invalid *) SegmentSelector 0; (* null segment selector *) FI; OD; For each of ES, FS, GS, and DS DO IF segment selector index is not within descriptor table limits OR segment descriptor indicates the segment is not a data or readable code segment OR if the segment is a data or non-conforming code segment and the segment descriptors DPL < CPL or RPL of code segments segment selector THEN segment selector register null selector; OD; ESP ESP + SRC; (* release parameters from calling procedures stack *)
Protected Mode Exceptions
#GP(0)           If the return code or stack segment selector is null.
                 If the return instruction pointer is not within the return code segment limit.
#GP(selector)    If the RPL of the return code segment selector is less than the CPL.
                 If the return code or stack segment selector index is not within its descriptor table limits.
                 If the return code segment descriptor does not indicate a code segment.
                 If the return code segment is non-conforming and the segment selector's DPL is not equal to the RPL of the code segment's segment selector.
                 If the return code segment is conforming and the segment selector's DPL is greater than the RPL of the code segment's segment selector.
                 If the stack segment is not a writable data segment.
                 If the stack segment selector RPL is not equal to the RPL of the return code segment selector.
                 If the stack segment descriptor DPL is not equal to the RPL of the return code segment selector.
#SS(0)           If the top bytes of stack are not within stack limits.
                 If the return stack segment is not present.
#NP(selector)    If the return code segment is not present.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If an unaligned memory access occurs when the CPL is 3 and alignment checking is enabled.

Real-Address Mode Exceptions
#GP              If the return instruction pointer is not within the return code segment limit.
#SS              If the top bytes of stack are not within stack limits.

Virtual-8086 Mode Exceptions
#GP(0)           If the return instruction pointer is not within the return code segment limit.
#SS(0)           If the top bytes of stack are not within stack limits.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If an unaligned memory access occurs when alignment checking is enabled.
ROL/ROR - Rotate
Refer to entry for RCL/RCR/ROL/ROR - Rotate.
RSM - Resume from System Management Mode

Opcode    Instruction    Description
0F AA     RSM            Resume operation of interrupted program

Description
This instruction returns program control from system management mode (SMM) to the application program or operating-system procedure that was interrupted when the processor received an SMM interrupt. The processor's state is restored from the dump created upon entering SMM. If the processor detects invalid state information during state restoration, it enters the shutdown state. The following invalid information can cause a shutdown:
- Any reserved bit of CR4 is set to 1.
- Any illegal combination of bits in CR0, such as (PG=1 and PE=0) or (NW=1 and CD=0).
- (Intel Pentium and Intel486 processors only.) The value stored in the state dump base field is not a 32-KByte aligned address.

The contents of the model-specific registers are not affected by a return from SMM. Refer to Chapter 12, System Management Mode (SMM), in the Intel Architecture Software Developer's Manual, Volume 3, for more information about SMM and the behavior of the RSM instruction.

Operation
ReturnFromSMM;
ProcessorState ← Restore(SMMDump);

Flags Affected
All.

Protected Mode Exceptions
#UD    If an attempt is made to execute this instruction when the processor is not in SMM.

Real-Address Mode Exceptions
#UD    If an attempt is made to execute this instruction when the processor is not in SMM.

Virtual-8086 Mode Exceptions
#UD    If an attempt is made to execute this instruction when the processor is not in SMM.
RSQRTPS - Packed Single-FP Square Root Reciprocal

Opcode      Instruction                Description
0F,52,/r    RSQRTPS xmm1, xmm2/m128    Return a packed approximation of the square root of the reciprocal of XMM2/Mem.

Description
RSQRTPS returns an approximation of the reciprocal of the square root of the SP FP numbers from xmm2/m128. The maximum error for this approximation is:

$|\text{Error}| \le 1.5 \times 2^{-12}$

[Figure 3-88. Operation of the RSQRTPS Instruction]

Operation
DEST[31-0] = APPROX (1.0/SQRT(SRC/m128[31-0]));
DEST[63-32] = APPROX (1.0/SQRT(SRC/m128[63-32]));
DEST[95-64] = APPROX (1.0/SQRT(SRC/m128[95-64]));
DEST[127-96] = APPROX (1.0/SQRT(SRC/m128[127-96]));

__m128 _mm_rsqrt_ps(__m128 a)
Computes the approximations of the reciprocals of the square roots of the four SP FP values of a.
Exceptions
General protection exception if not aligned on 16-byte boundary, regardless of segment.

Numeric Exceptions
None.

Protected Mode Exceptions
#GP(0)           For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)           For an illegal address in the SS segment.
#PF(fault-code)  For a page fault.
#UD              If CR0.EM = 1.
#NM              If TS bit in CR0 is set.
#UD              If CR4.OSFXSR(bit 9) = 0.
#UD              If CPUID.XMM(EDX bit 25) = 0.

Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#PF(fault-code)  For a page fault.

Comments
RSQRTPS is not affected by the rounding control in MXCSR. Denormal inputs are treated as zeroes (of the same sign) and underflow results are always flushed to zero, with the sign of the operand.
RSQRTSS - Scalar Single-FP Square Root Reciprocal

Opcode         Instruction               Description
F3,0F,52,/r    RSQRTSS xmm1, xmm2/m32    Return an approximation of the square root of the reciprocal of the lowest SP FP number in XMM2/Mem.

Description
RSQRTSS returns an approximation of the reciprocal of the square root of the lowest SP FP number from xmm2/m32; the upper three fields are passed through from xmm1. The maximum error for this approximation is:

$|\text{Error}| \le 1.5 \times 2^{-12}$

[Figure 3-89. Operation of the RSQRTSS Instruction]

Operation
DEST[31-0] = APPROX (1.0/SQRT(SRC/m32[31-0]));
DEST[63-32] = DEST[63-32];
DEST[95-64] = DEST[95-64];
DEST[127-96] = DEST[127-96];

__m128 _mm_rsqrt_ss(__m128 a)
Computes the approximation of the reciprocal of the square root of the lower SP FP value of a; the upper three SP FP values are passed through.

Numeric Exceptions
None.

Protected Mode Exceptions
#GP(0)            For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)            For an illegal address in the SS segment.
#PF(fault-code)   For a page fault.
#UD               If CR0.EM = 1.
#NM               If TS bit in CR0 is set.
#AC               For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).

Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#AC               For unaligned memory reference if the current privilege level is 3.
#PF(fault-code)   For a page fault.

Comments
RSQRTSS is not affected by the rounding control in MXCSR. Denormal inputs are treated as zeroes (of the same sign) and underflow results are always flushed to zero, with the sign of the operand.
SAHF - Store AH into Flags

Opcode    Instruction    Clocks    Description
9E        SAHF           2         Loads SF, ZF, AF, PF, and CF from AH into EFLAGS register

Description
This instruction loads the SF, ZF, AF, PF, and CF flags of the EFLAGS register with values from the corresponding bits in the AH register (bits 7, 6, 4, 2, and 0, respectively). Bits 1, 3, and 5 of register AH are ignored; the corresponding reserved bits (1, 3, and 5) in the EFLAGS register remain as shown in the Operation section below.

Operation
EFLAGS(SF:ZF:0:AF:0:PF:1:CF) ← AH;

Flags Affected
The SF, ZF, AF, PF, and CF flags are loaded with values from the AH register. Bits 1, 3, and 5 of the EFLAGS register are unaffected, with the values remaining 1, 0, and 0, respectively.

Exceptions (All Operating Modes)
None.
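The bit selection described for SAHF can be sketched in C as a mask over the low byte of EFLAGS. This is only an illustrative model, not processor code: the function name sahf_model and the idea of passing EFLAGS as a plain 32-bit integer are assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical model of SAHF: returns the EFLAGS value after loading
   SF, ZF, AF, PF, and CF from AH. Bits above bit 7 are left untouched. */
uint32_t sahf_model(uint32_t eflags, uint8_t ah)
{
    /* Bits 7, 6, 4, 2, 0 of AH supply SF, ZF, AF, PF, CF (mask 0xD5). */
    uint32_t low = (ah & 0xD5u);

    /* Reserved bits keep their fixed values: bit 1 = 1, bits 3 and 5 = 0. */
    low |= 0x02u;

    return (eflags & ~0xFFu) | low;
}
```

With AH = FFH the low byte of the modeled EFLAGS becomes D7H, because bits 3 and 5 of AH are ignored and bit 1 stays set.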
SAL/SAR/SHL/SHR - Shift

Opcode      Instruction        Description
D0 /4       SAL r/m8,1         Multiply r/m8 by 2, once
D2 /4       SAL r/m8,CL        Multiply r/m8 by 2, CL times
C0 /4 ib    SAL r/m8,imm8      Multiply r/m8 by 2, imm8 times
D1 /4       SAL r/m16,1        Multiply r/m16 by 2, once
D3 /4       SAL r/m16,CL       Multiply r/m16 by 2, CL times
C1 /4 ib    SAL r/m16,imm8     Multiply r/m16 by 2, imm8 times
D1 /4       SAL r/m32,1        Multiply r/m32 by 2, once
D3 /4       SAL r/m32,CL       Multiply r/m32 by 2, CL times
C1 /4 ib    SAL r/m32,imm8     Multiply r/m32 by 2, imm8 times
D0 /7       SAR r/m8,1         Signed divide* r/m8 by 2, once
D2 /7       SAR r/m8,CL        Signed divide* r/m8 by 2, CL times
C0 /7 ib    SAR r/m8,imm8      Signed divide* r/m8 by 2, imm8 times
D1 /7       SAR r/m16,1        Signed divide* r/m16 by 2, once
D3 /7       SAR r/m16,CL       Signed divide* r/m16 by 2, CL times
C1 /7 ib    SAR r/m16,imm8     Signed divide* r/m16 by 2, imm8 times
D1 /7       SAR r/m32,1        Signed divide* r/m32 by 2, once
D3 /7       SAR r/m32,CL       Signed divide* r/m32 by 2, CL times
C1 /7 ib    SAR r/m32,imm8     Signed divide* r/m32 by 2, imm8 times
D0 /4       SHL r/m8,1         Multiply r/m8 by 2, once
D2 /4       SHL r/m8,CL        Multiply r/m8 by 2, CL times
C0 /4 ib    SHL r/m8,imm8      Multiply r/m8 by 2, imm8 times
D1 /4       SHL r/m16,1        Multiply r/m16 by 2, once
D3 /4       SHL r/m16,CL       Multiply r/m16 by 2, CL times
C1 /4 ib    SHL r/m16,imm8     Multiply r/m16 by 2, imm8 times
D1 /4       SHL r/m32,1        Multiply r/m32 by 2, once
D3 /4       SHL r/m32,CL       Multiply r/m32 by 2, CL times
C1 /4 ib    SHL r/m32,imm8     Multiply r/m32 by 2, imm8 times
D0 /5       SHR r/m8,1         Unsigned divide r/m8 by 2, once
D2 /5       SHR r/m8,CL        Unsigned divide r/m8 by 2, CL times
C0 /5 ib    SHR r/m8,imm8      Unsigned divide r/m8 by 2, imm8 times
D1 /5       SHR r/m16,1        Unsigned divide r/m16 by 2, once
D3 /5       SHR r/m16,CL       Unsigned divide r/m16 by 2, CL times
C1 /5 ib    SHR r/m16,imm8     Unsigned divide r/m16 by 2, imm8 times
D1 /5       SHR r/m32,1        Unsigned divide r/m32 by 2, once
D3 /5       SHR r/m32,CL       Unsigned divide r/m32 by 2, CL times
C1 /5 ib    SHR r/m32,imm8     Unsigned divide r/m32 by 2, imm8 times

NOTE: * Not the same form of division as IDIV; rounding is toward negative infinity.
Description
These instructions shift the bits in the first operand (destination operand) to the left or right by the number of bits specified in the second operand (count operand). Bits shifted beyond the destination operand boundary are first shifted into the CF flag, then discarded. At the end of the shift operation, the CF flag contains the last bit shifted out of the destination operand.

The destination operand can be a register or a memory location. The count operand can be an immediate value or register CL. The count is masked to five bits, which limits the count range to 0 to 31. A special opcode encoding is provided for a count of 1.

The shift arithmetic left (SAL) and shift logical left (SHL) instructions perform the same operation; they shift the bits in the destination operand to the left (toward more significant bit locations). For each shift count, the most significant bit of the destination operand is shifted into the CF flag, and the least significant bit is cleared. Refer to Figure 6-6 in Chapter 6, Instruction Set Summary, of the Intel Architecture Software Developer's Manual, Volume 1.

The shift arithmetic right (SAR) and shift logical right (SHR) instructions shift the bits of the destination operand to the right (toward less significant bit locations). For each shift count, the least significant bit of the destination operand is shifted into the CF flag, and the most significant bit is either set or cleared depending on the instruction type. The SHR instruction clears the most significant bit. For more information, refer to Figure 6-7 in Chapter 6, Instruction Set Summary, of the Intel Architecture Software Developer's Manual, Volume 1. The SAR instruction sets or clears the most significant bit to correspond to the sign (most significant bit) of the original value in the destination operand. In effect, the SAR instruction fills the empty bit position's shifted value with the sign of the unshifted value. For more information, refer to Figure 6-8 in Chapter 6, Instruction Set Summary, of the Intel Architecture Software Developer's Manual, Volume 1.

The SAR and SHR instructions can be used to perform signed or unsigned division, respectively, of the destination operand by powers of 2. For example, using the SAR instruction to shift a signed integer one bit to the right divides the value by 2.

Using the SAR instruction to perform a division operation does not produce the same result as the IDIV instruction. The quotient from the IDIV instruction is rounded toward zero, whereas the quotient of the SAR instruction is rounded toward negative infinity. This difference is apparent only for negative numbers. For example, when the IDIV instruction is used to divide -9 by 4, the result is -2 with a remainder of -1. If the SAR instruction is used to shift -9 right by two bits, the result is -3 and the remainder is +3; however, the SAR instruction stores only the most significant bit of the remainder (in the CF flag).

The OF flag is affected only on 1-bit shifts. For left shifts, the OF flag is cleared to 0 if the most-significant bit of the result is the same as the CF flag (that is, the top two bits of the original operand were the same); otherwise, it is set to 1. For the SAR instruction, the OF flag is cleared for all 1-bit shifts. For the SHR instruction, the OF flag is set to the most-significant bit of the original operand.
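The rounding difference between SAR and IDIV for negative values can be reproduced in portable C. The sketch below is a model under stated assumptions: sar32 is a hypothetical helper name, and it builds the sign fill by hand because C leaves the right shift of a negative signed value implementation-defined. C's own `/` operator truncates toward zero (C99 and later), which matches the IDIV quotient described above.

```c
#include <stdint.h>

/* Hypothetical model of SAR on a 32-bit operand (result value only). */
int32_t sar32(int32_t value, unsigned count)
{
    uint32_t bits = (uint32_t)value;
    uint32_t shifted;

    count &= 0x1Fu;                 /* the count is masked to five bits */
    shifted = bits >> count;        /* logical shift of the bit pattern */

    /* Fill the vacated high bits with copies of the original sign bit. */
    if (value < 0 && count != 0)
        shifted |= ~(0xFFFFFFFFu >> count);

    /* Convert the two's complement bit pattern back to a signed value
       without relying on implementation-defined narrowing. */
    if (shifted & 0x80000000u)
        return -(int32_t)(~shifted) - 1;
    return (int32_t)shifted;
}
```

For the example in the text, `sar32(-9, 2)` yields -3 (rounding toward negative infinity), while the C expression `-9 / 4` yields -2 (rounding toward zero, as IDIV does). For non-negative inputs the two agree.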
Intel Architecture Compatibility
The 8086 does not mask the shift count. However, all other Intel Architecture processors (starting with the Intel 286 processor) do mask the shift count to five bits, resulting in a maximum count of 31. This masking is done in all operating modes (including the virtual-8086 mode) to reduce the maximum execution time of the instructions.

Operation
tempCOUNT ← (COUNT AND 1FH);
tempDEST ← DEST;
WHILE (tempCOUNT ≠ 0)
DO
    IF instruction is SAL or SHL
        THEN CF ← MSB(DEST);
        ELSE (* instruction is SAR or SHR *) CF ← LSB(DEST);
    FI;
    IF instruction is SAL or SHL
        THEN DEST ← DEST * 2;
        ELSE
            IF instruction is SAR
                THEN DEST ← DEST / 2; (* Signed divide, rounding toward negative infinity *)
                ELSE (* instruction is SHR *) DEST ← DEST / 2; (* Unsigned divide *)
            FI;
    FI;
    tempCOUNT ← tempCOUNT - 1;
OD;
(* Determine overflow for the various instructions *)
IF COUNT = 1
THEN
    IF instruction is SAL or SHL
        THEN OF ← MSB(DEST) XOR CF;
        ELSE
            IF instruction is SAR
                THEN OF ← 0;
                ELSE (* instruction is SHR *) OF ← MSB(tempDEST);
            FI;
    FI;
ELSE
    IF COUNT = 0
        THEN All flags remain unchanged;
        ELSE (* COUNT neither 1 or 0 *) OF ← undefined;
    FI;
FI;
Flags Affected
The CF flag contains the value of the last bit shifted out of the destination operand; it is undefined for SHL and SHR instructions where the count is greater than or equal to the size (in bits) of the destination operand. The OF flag is affected only for 1-bit shifts (refer to Description above); otherwise, it is undefined. The SF, ZF, and PF flags are set according to the result. If the count is 0, the flags are not affected. For a non-zero count, the AF flag is undefined.

Protected Mode Exceptions
#GP(0)           If the destination is located in a nonwritable segment.
                 If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                 If the DS, ES, FS, or GS register contains a null segment selector.
#SS(0)           If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
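The count masking and the CF and OF rules for a left shift can be modeled in C as follows. This is a sketch for a 32-bit SHL/SAL only; the struct shl_result type, its field names, and the function shl32_model are hypothetical, and the sketch assumes that the 1-bit overflow rule applies to the masked count.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result record; field names are illustrative only. */
struct shl_result {
    uint32_t dest;       /* shifted value */
    bool cf;             /* last bit shifted out of the destination */
    bool of;             /* modeled only for a masked count of 1 */
    bool flags_changed;  /* false when the masked count is 0 */
};

struct shl_result shl32_model(uint32_t dest, uint8_t count)
{
    struct shl_result r = { dest, false, false, false };
    unsigned n = count & 0x1Fu;        /* tempCOUNT = COUNT AND 1FH */

    if (n == 0)
        return r;                       /* all flags remain unchanged */

    r.flags_changed = true;
    r.cf = (dest >> (32u - n)) & 1u;    /* last bit shifted out */
    r.dest = dest << n;

    if (n == 1)                         /* OF = MSB(DEST) XOR CF */
        r.of = ((r.dest >> 31) & 1u) != (unsigned)r.cf;

    return r;
}
```

Shifting 80000000H left by one gives a zero result with CF and OF both set in this model, while a count of 32 masks to 0 and leaves the operand and the flags alone. For counts other than 0 and 1 the manual leaves OF undefined, so the `of` field carries no meaning there.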
SBB - Integer Subtraction with Borrow

Opcode      Instruction          Description
1C ib       SBB AL,imm8          Subtract with borrow imm8 from AL
1D iw       SBB AX,imm16         Subtract with borrow imm16 from AX
1D id       SBB EAX,imm32        Subtract with borrow imm32 from EAX
80 /3 ib    SBB r/m8,imm8        Subtract with borrow imm8 from r/m8
81 /3 iw    SBB r/m16,imm16      Subtract with borrow imm16 from r/m16
81 /3 id    SBB r/m32,imm32      Subtract with borrow imm32 from r/m32
83 /3 ib    SBB r/m16,imm8       Subtract with borrow sign-extended imm8 from r/m16
83 /3 ib    SBB r/m32,imm8       Subtract with borrow sign-extended imm8 from r/m32
18 /r       SBB r/m8,r8          Subtract with borrow r8 from r/m8
19 /r       SBB r/m16,r16        Subtract with borrow r16 from r/m16
19 /r       SBB r/m32,r32        Subtract with borrow r32 from r/m32
1A /r       SBB r8,r/m8          Subtract with borrow r/m8 from r8
1B /r       SBB r16,r/m16        Subtract with borrow r/m16 from r16
1B /r       SBB r32,r/m32        Subtract with borrow r/m32 from r32

Description
This instruction adds the source operand (second operand) and the carry (CF) flag, and subtracts the result from the destination operand (first operand). The result of the subtraction is stored in the destination operand. The destination operand can be a register or a memory location; the source operand can be an immediate, a register, or a memory location. (However, two memory operands cannot be used in one instruction.) The state of the CF flag represents a borrow from a previous subtraction.

When an immediate value is used as an operand, it is sign-extended to the length of the destination operand format.

The SBB instruction does not distinguish between signed or unsigned operands. Instead, the processor evaluates the result for both data types and sets the OF and CF flags to indicate a borrow in the signed or unsigned result, respectively. The SF flag indicates the sign of the signed result.

The SBB instruction is usually executed as part of a multibyte or multiword subtraction in which a SUB instruction is followed by an SBB instruction.

Operation
DEST ← DEST - (SRC + CF);

Flags Affected
The OF, SF, ZF, AF, PF, and CF flags are set according to the result.
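The SUB-then-SBB pattern for multiword subtraction can be illustrated with a C model of a 64-bit subtraction carried out on 32-bit halves. The type struct u64_parts and the function sub64_via_borrow are hypothetical names chosen for this sketch; the local variable cf stands in for the CF flag.

```c
#include <stdint.h>

/* Hypothetical 64-bit value held as two 32-bit halves. */
struct u64_parts {
    uint32_t lo;
    uint32_t hi;
};

struct u64_parts sub64_via_borrow(struct u64_parts a, struct u64_parts b)
{
    struct u64_parts r;
    uint32_t cf;

    r.lo = a.lo - b.lo;         /* SUB on the low halves */
    cf = (a.lo < b.lo);         /* CF = 1 when the low subtraction borrowed */
    r.hi = a.hi - (b.hi + cf);  /* SBB: DEST - (SRC + CF), modulo 2^32 */

    return r;
}
```

Subtracting 1 from 0000000100000000H, for example, borrows out of the low half: the low result wraps to FFFFFFFFH and the borrow reduces the high half from 1 to 0.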
SCAS/SCASB/SCASW/SCASD - Scan String

Opcode    Instruction    Description
AE        SCAS m8        Compare AL with byte at ES:(E)DI and set status flags
AF        SCAS m16       Compare AX with word at ES:(E)DI and set status flags
AF        SCAS m32       Compare EAX with doubleword at ES:(E)DI and set status flags
AE        SCASB          Compare AL with byte at ES:(E)DI and set status flags
AF        SCASW          Compare AX with word at ES:(E)DI and set status flags
AF        SCASD          Compare EAX with doubleword at ES:(E)DI and set status flags
Description
These instructions compare the byte, word, or doubleword specified with the memory operand with the value in the AL, AX, or EAX register, and set the status flags in the EFLAGS register according to the results. The memory operand address is read from either the ES:EDI or the ES:DI registers (depending on the address-size attribute of the instruction, 32 or 16, respectively). The ES segment cannot be overridden with a segment override prefix.

At the assembly-code level, two forms of this instruction are allowed: the explicit-operands form and the no-operands form. The explicit-operands form (specified with the SCAS mnemonic) allows the memory operand to be specified explicitly. Here, the memory operand should be a symbol that indicates the size and location of the operand value. The register operand is then automatically selected to match the size of the memory operand (the AL register for byte comparisons, AX for word comparisons, and EAX for doubleword comparisons). This explicit-operands form is provided to allow documentation; however, note that the documentation provided by this form can be misleading. That is, the memory operand symbol must specify the correct type (size) of the operand (byte, word, or doubleword), but it does not have to specify the correct location. The location is always specified by the ES:(E)DI registers, which must be loaded correctly before the compare string instruction is executed.

The no-operands form provides short forms of the byte, word, and doubleword versions of the SCAS instructions. Here also ES:(E)DI is assumed to be the memory operand and the AL, AX, or EAX register is assumed to be the register operand. The size of the two operands is selected with the mnemonic: SCASB (byte comparison), SCASW (word comparison), or SCASD (doubleword comparison).

After the comparison, the (E)DI register is incremented or decremented automatically according to the setting of the DF flag in the EFLAGS register. (If the DF flag is 0, the (E)DI register is incremented; if the DF flag is 1, the (E)DI register is decremented.) The (E)DI register is incremented or decremented by one for byte operations, by two for word operations, or by four for doubleword operations.

The SCAS, SCASB, SCASW, and SCASD instructions can be preceded by the REP prefix for block comparisons of ECX bytes, words, or doublewords. More often, however, these instructions will be used in a LOOP construct that takes some action based on the setting of the status flags before the next comparison is made. Refer to REP/REPE/REPZ/REPNE/REPNZ - Repeat String Operation Prefix in this chapter for a description of the REP prefix.
Operation
IF (byte comparison)
THEN
    temp ← AL - SRC;
    SetStatusFlags(temp);
    IF DF = 0
        THEN (E)DI ← (E)DI + 1;
        ELSE (E)DI ← (E)DI - 1;
    FI;
ELSE
    IF (word comparison)
    THEN
        temp ← AX - SRC;
        SetStatusFlags(temp);
        IF DF = 0
            THEN (E)DI ← (E)DI + 2;
            ELSE (E)DI ← (E)DI - 2;
        FI;
    ELSE (* doubleword comparison *)
        temp ← EAX - SRC;
        SetStatusFlags(temp);
        IF DF = 0
            THEN (E)DI ← (E)DI + 4;
            ELSE (E)DI ← (E)DI - 4;
        FI;
    FI;
FI;

Flags Affected
The OF, SF, ZF, AF, PF, and CF flags are set according to the temporary result of the comparison.

Protected Mode Exceptions
#GP(0)           If a memory operand effective address is outside the limit of the ES segment.
                 If the ES register contains a null segment selector.
                 If an illegal memory operand effective address in the ES segment is given.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
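A repeated byte scan can be pictured with a small C model. The sketch below assumes a forward scan (DF = 0) that stops on the first match, which is the behavior of a REPNE-prefixed SCASB; the function name repne_scasb_model and its parameter names are hypothetical, with al, edi, and ecx standing in for AL, ES:(E)DI, and the count register.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a forward (DF = 0) REPNE SCASB loop.
   Returns the number of bytes compared; *found mirrors the final ZF. */
size_t repne_scasb_model(const uint8_t *edi, size_t ecx, uint8_t al, int *found)
{
    size_t scanned = 0;

    *found = 0;
    while (ecx != 0) {
        int equal = (al == *edi);  /* temp = AL - SRC; ZF set when temp is 0 */

        edi += 1;                  /* DF = 0: (E)DI advances by one byte */
        ecx -= 1;
        scanned += 1;

        if (equal) {               /* REPNE stops once ZF = 1 */
            *found = 1;
            break;
        }
    }
    return scanned;
}
```

Note that the pointer has already moved past the matching byte when the loop stops, just as (E)DI does, so the match sits at offset `scanned - 1` from the starting address.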
SETcc - Set Byte on Condition

Opcode    Instruction     Description
0F 97     SETA r/m8       Set byte if above (CF=0 and ZF=0)
0F 93     SETAE r/m8      Set byte if above or equal (CF=0)
0F 92     SETB r/m8       Set byte if below (CF=1)
0F 96     SETBE r/m8      Set byte if below or equal (CF=1 or ZF=1)
0F 92     SETC r/m8       Set if carry (CF=1)
0F 94     SETE r/m8       Set byte if equal (ZF=1)
0F 9F     SETG r/m8       Set byte if greater (ZF=0 and SF=OF)
0F 9D     SETGE r/m8      Set byte if greater or equal (SF=OF)
0F 9C     SETL r/m8       Set byte if less (SF<>OF)
0F 9E     SETLE r/m8      Set byte if less or equal (ZF=1 or SF<>OF)
0F 96     SETNA r/m8      Set byte if not above (CF=1 or ZF=1)
0F 92     SETNAE r/m8     Set byte if not above or equal (CF=1)
0F 93     SETNB r/m8      Set byte if not below (CF=0)
0F 97     SETNBE r/m8     Set byte if not below or equal (CF=0 and ZF=0)
0F 93     SETNC r/m8      Set byte if not carry (CF=0)
0F 95     SETNE r/m8      Set byte if not equal (ZF=0)
0F 9E     SETNG r/m8      Set byte if not greater (ZF=1 or SF<>OF)
0F 9C     SETNGE r/m8     Set if not greater or equal (SF<>OF)
0F 9D     SETNL r/m8      Set byte if not less (SF=OF)
0F 9F     SETNLE r/m8     Set byte if not less or equal (ZF=0 and SF=OF)
0F 91     SETNO r/m8      Set byte if not overflow (OF=0)
0F 9B     SETNP r/m8      Set byte if not parity (PF=0)
0F 99     SETNS r/m8      Set byte if not sign (SF=0)
0F 95     SETNZ r/m8      Set byte if not zero (ZF=0)
0F 90     SETO r/m8       Set byte if overflow (OF=1)
0F 9A     SETP r/m8       Set byte if parity (PF=1)
0F 9A     SETPE r/m8      Set byte if parity even (PF=1)
0F 9B     SETPO r/m8      Set byte if parity odd (PF=0)
0F 98     SETS r/m8       Set byte if sign (SF=1)
0F 94     SETZ r/m8       Set byte if zero (ZF=1)
Description
This instruction sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The condition code suffix (cc) indicates the condition being tested for.

The terms "above" and "below" are associated with the CF flag and refer to the relationship between two unsigned integer values. The terms "greater" and "less" are associated with the SF and OF flags and refer to the relationship between two signed integer values.

Many of the SETcc instruction opcodes have alternate mnemonics. For example, SETG (set byte if greater) and SETNLE (set if not less or equal) both have the same opcode and test for the same condition: ZF equals 0 and SF equals OF. These alternate mnemonics are provided to make code more intelligible. Appendix B, EFLAGS Condition Codes, in the Intel Architecture Software Developer's Manual, Volume 1, shows the alternate mnemonics for various test conditions.

Some languages represent a logical one as an integer with all bits set. This representation can be obtained by choosing the logically opposite condition for the SETcc instruction, then decrementing the result. For example, to test for overflow, use the SETNO instruction, then decrement the result.

Operation
IF condition
    THEN DEST ← 1
    ELSE DEST ← 0;
FI;
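The "opposite condition, then decrement" idiom for producing an all-bits-set truth value can be checked with a short C model. The function name mask_if_overflow is hypothetical, and its overflow parameter stands in for the OF flag.

```c
#include <stdint.h>

/* Hypothetical model: returns 0xFF when 'overflow' is nonzero, else 0x00. */
uint8_t mask_if_overflow(int overflow)
{
    /* SETNO: the byte becomes 1 when OF = 0, and 0 when OF = 1. */
    uint8_t byte = (uint8_t)(overflow ? 0 : 1);

    /* DEC: 1 becomes 0x00, and 0 wraps around to 0xFF. */
    byte = (uint8_t)(byte - 1);

    return byte;
}
```

Testing the opposite condition is what makes the decrement work: the "true" case starts at 0 so that subtracting 1 wraps it to all ones.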
Flags Affected
None.

Protected Mode Exceptions
#GP(0)           If the destination is located in a nonwritable segment.
                 If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                 If the DS, ES, FS, or GS register contains a null segment selector.
#SS(0)           If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)  If a page fault occurs.
SFENCE - Store Fence

Opcode      Instruction    Description
0F AE /7    SFENCE         Guarantees that every store instruction that precedes the store fence instruction in program order is globally visible before any store instruction that follows the fence is globally visible.

Description
Weakly ordered memory types can enable higher performance through such techniques as out-of-order issue, write-combining, and write-collapsing. Memory ordering issues can arise between a producer and a consumer of data, and there are a number of common usage models that may be affected by weakly ordered stores:
1. library functions, which use weakly ordered memory to write results
2. compiler-generated code, which also benefits from writing weakly ordered results
3. hand-written code

The degree to which a consumer of data knows that the data is weakly ordered can vary for these cases. As a result, the SFENCE instruction provides a performance-efficient way of ensuring ordering between routines that produce weakly ordered results and routines that consume this data. The SFENCE instruction is ordered with respect to stores and other SFENCE instructions.

SFENCE uses the following ModRM encoding:
Mod (7:6) = 11B
Reg/Opcode (5:3) = 111B
R/M (2:0) = 000B
All other ModRM encodings are defined to be reserved, and use of these encodings risks incompatibility with future processors.

Operation
WHILE (NOT(preceding_stores_globally_visible)) WAIT();

void _mm_sfence(void)
Guarantees that every preceding store is globally visible before any subsequent store.

Numeric Exceptions
None.

Protected Mode Exceptions
None.

Real-Address Mode Exceptions
None.

Virtual-8086 Mode Exceptions
None.

Comments
SFENCE ignores the value of CR4.OSFXSR. SFENCE will not generate an invalid exception if CR4.OSFXSR = 0.
SGDT/SIDT - Store Global/Interrupt Descriptor Table Register

Opcode      Instruction    Description
0F 01 /0    SGDT m         Store GDTR to m
0F 01 /1    SIDT m         Store IDTR to m

Description
These instructions store the contents of the global descriptor table register (GDTR) or the interrupt descriptor table register (IDTR) in the destination operand. The destination operand specifies a 6-byte memory location. If the operand-size attribute is 32 bits, the 16-bit limit field of the register is stored in the lower two bytes of the memory location and the 32-bit base address is stored in the upper four bytes. If the operand-size attribute is 16 bits, the limit is stored in the lower two bytes and the 24-bit base address is stored in the third, fourth, and fifth byte, with the sixth byte filled with 0s.

The SGDT and SIDT instructions are only useful in operating-system software; however, they can be used in application programs without causing an exception to be generated.

Refer to LGDT/LIDT - Load Global/Interrupt Descriptor Table Register in this chapter for information on loading the GDTR and IDTR.

Intel Architecture Compatibility
The 16-bit forms of the SGDT and SIDT instructions are compatible with the Intel 286 processor, if the upper eight bits are not referenced. The Intel 286 processor fills these bits with 1s; the Pentium Pro, Pentium, Intel486, and Intel386 processors fill these bits with 0s.
Operation
IF instruction is SIDT
THEN
    IF OperandSize = 16
    THEN
        DEST[0:15] ← IDTR(Limit);
        DEST[16:39] ← IDTR(Base); (* 24 bits of base address loaded *)
        DEST[40:47] ← 0;
    ELSE (* 32-bit Operand Size *)
        DEST[0:15] ← IDTR(Limit);
        DEST[16:47] ← IDTR(Base); (* full 32-bit base address loaded *)
    FI;
ELSE (* instruction is SGDT *)
    IF OperandSize = 16
    THEN
        DEST[0:15] ← GDTR(Limit);
        DEST[16:39] ← GDTR(Base); (* 24 bits of base address loaded *)
        DEST[40:47] ← 0;
    ELSE (* 32-bit Operand Size *)
        DEST[0:15] ← GDTR(Limit);
        DEST[16:47] ← GDTR(Base); (* full 32-bit base address loaded *)
    FI;
FI;

Flags Affected
None.

Protected Mode Exceptions
#UD              If the destination operand is a register.
#GP(0)           If the destination is located in a nonwritable segment.
                 If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                 If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
#SS(0)           If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If an unaligned memory access occurs when the CPL is 3 and alignment checking is enabled.
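The 6-byte memory image written by SGDT and SIDT can be sketched in C. This is a model under stated assumptions: store_table_register is a hypothetical name, the limit and base are passed in as ordinary integers instead of being read from GDTR or IDTR, and bytes are laid out least-significant first, as Intel Architecture memory operands are.

```c
#include <stdint.h>

/* Hypothetical model of the image stored by SGDT/SIDT.
   operand_size is 16 or 32; any value other than 16 is treated as 32. */
void store_table_register(uint8_t dest[6], uint16_t limit, uint32_t base,
                          int operand_size)
{
    /* DEST[0:15] = Limit */
    dest[0] = (uint8_t)(limit & 0xFFu);
    dest[1] = (uint8_t)(limit >> 8);

    /* DEST[16:39] = low 24 bits of Base */
    dest[2] = (uint8_t)(base & 0xFFu);
    dest[3] = (uint8_t)((base >> 8) & 0xFFu);
    dest[4] = (uint8_t)((base >> 16) & 0xFFu);

    /* DEST[40:47] = 0 for a 16-bit operand size, else the top base byte */
    dest[5] = (operand_size == 16) ? 0u : (uint8_t)(base >> 24);
}
```

With a limit of 03FFH and a base of 12345678H, the 32-bit form stores the bytes FF 03 78 56 34 12, and the 16-bit form stores FF 03 78 56 34 00.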
3-637
SGDT/SIDTStore Global/Interrupt Descriptor Table Register (Continued)

Real-Address Mode Exceptions
#UD                If the destination operand is a register.
#GP                If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS                If a memory operand effective address is outside the SS segment limit.

Virtual-8086 Mode Exceptions
#UD                If the destination operand is a register.
#GP(0)             If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If an unaligned memory access occurs when alignment checking is enabled.
SHL/SHR - Shift Instructions
Refer to entry for "SAL/SAR/SHL/SHR - Shift".

SHLD - Double Precision Shift Left

Opcode    Instruction            Description
0F A4     SHLD r/m16,r16,imm8    Shift r/m16 to left imm8 places while shifting bits from r16 in from the right
0F A5     SHLD r/m16,r16,CL      Shift r/m16 to left CL places while shifting bits from r16 in from the right
0F A4     SHLD r/m32,r32,imm8    Shift r/m32 to left imm8 places while shifting bits from r32 in from the right
0F A5     SHLD r/m32,r32,CL      Shift r/m32 to left CL places while shifting bits from r32 in from the right
Description
This instruction shifts the first operand (destination operand) to the left the number of bits specified by the third operand (count operand). The second operand (source operand) provides bits to shift in from the right (starting with bit 0 of the destination operand). The destination operand can be a register or a memory location; the source operand is a register. The count operand is an unsigned integer that can be an immediate byte or the contents of the CL register. Only bits 0 through 4 of the count are used, which masks the count to a value between 0 and 31. If the count is greater than the operand size, the result in the destination operand is undefined.
If the count is one or greater, the CF flag is filled with the last bit shifted out of the destination operand. For a 1-bit shift, the OF flag is set if a sign change occurred; otherwise, it is cleared. If the count operand is 0, the flags are not affected.
The SHLD instruction is useful for multiprecision shifts of 64 bits or more.
Operation
COUNT ← COUNT MOD 32;
SIZE ← OperandSize
IF COUNT = 0
    THEN
        no operation
    ELSE
        IF COUNT > SIZE
            THEN (* Bad parameters *)
                DEST is undefined;
                CF, OF, SF, ZF, AF, PF are undefined;
            ELSE (* Perform the shift *)
                CF ← BIT[DEST, SIZE - COUNT];
                (* Last bit shifted out on exit *)
                FOR i ← SIZE - 1 DOWNTO COUNT
                DO
                    BIT[DEST, i] ← BIT[DEST, i - COUNT];
                OD;
                FOR i ← COUNT - 1 DOWNTO 0
                DO
                    BIT[DEST, i] ← BIT[SRC, i - COUNT + SIZE];
                OD;
        FI;
FI;
Flags Affected
If the count is one or greater, the CF flag is filled with the last bit shifted out of the destination operand and the SF, ZF, and PF flags are set according to the value of the result. For a 1-bit shift, the OF flag is set if a sign change occurred; otherwise, it is cleared. For shifts greater than one bit, the OF flag is undefined. If a shift occurs, the AF flag is undefined. If the count operand is 0, the flags are not affected. If the count is greater than the operand size, the flags are undefined.
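The 32-bit form of the shift can be modeled on ordinary integers. The sketch below is only an illustration of the data movement and the CF result; `shld32` is a hypothetical name, and the SF, ZF, PF, OF, and AF effects are deliberately left out.

```c
#include <stdint.h>

/* Model of SHLD r/m32, r32, count.
   Returns the new destination value. *cf receives the last bit shifted
   out of dest; it is left untouched when the masked count is 0. */
static uint32_t shld32(uint32_t dest, uint32_t src, unsigned count, int *cf)
{
    count &= 31;                      /* only bits 0 through 4 are used */
    if (count == 0)
        return dest;                  /* no operation, flags not affected */
    *cf = (int)((dest >> (32 - count)) & 1u);
    /* High bits of src enter dest from the right. */
    return (dest << count) | (src >> (32 - count));
}
```

Because the count is masked to 1..31 before any shift is performed, neither shift expression can reach the full width of the type, which keeps the C well defined.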
SHRD - Double Precision Shift Right

Opcode    Instruction            Description
0F AC     SHRD r/m16,r16,imm8    Shift r/m16 to right imm8 places while shifting bits from r16 in from the left
0F AD     SHRD r/m16,r16,CL      Shift r/m16 to right CL places while shifting bits from r16 in from the left
0F AC     SHRD r/m32,r32,imm8    Shift r/m32 to right imm8 places while shifting bits from r32 in from the left
0F AD     SHRD r/m32,r32,CL      Shift r/m32 to right CL places while shifting bits from r32 in from the left
Description
This instruction shifts the first operand (destination operand) to the right the number of bits specified by the third operand (count operand). The second operand (source operand) provides bits to shift in from the left (starting with the most significant bit of the destination operand). The destination operand can be a register or a memory location; the source operand is a register. The count operand is an unsigned integer that can be an immediate byte or the contents of the CL register. Only bits 0 through 4 of the count are used, which masks the count to a value between 0 and 31. If the count is greater than the operand size, the result in the destination operand is undefined.
If the count is one or greater, the CF flag is filled with the last bit shifted out of the destination operand. For a 1-bit shift, the OF flag is set if a sign change occurred; otherwise, it is cleared. If the count operand is 0, the flags are not affected.
The SHRD instruction is useful for multiprecision shifts of 64 bits or more.
Operation
COUNT ← COUNT MOD 32;
SIZE ← OperandSize
IF COUNT = 0
    THEN
        no operation
    ELSE
        IF COUNT > SIZE
            THEN (* Bad parameters *)
                DEST is undefined;
                CF, OF, SF, ZF, AF, PF are undefined;
            ELSE (* Perform the shift *)
                CF ← BIT[DEST, COUNT - 1]; (* last bit shifted out on exit *)
                FOR i ← 0 TO SIZE - 1 - COUNT
                DO
                    BIT[DEST, i] ← BIT[DEST, i + COUNT];
                OD;
                FOR i ← SIZE - COUNT TO SIZE - 1
                DO
                    BIT[DEST, i] ← BIT[SRC, i + COUNT - SIZE];
                OD;
        FI;
FI;
Flags Affected
If the count is one or greater, the CF flag is filled with the last bit shifted out of the destination operand and the SF, ZF, and PF flags are set according to the value of the result. For a 1-bit shift, the OF flag is set if a sign change occurred; otherwise, it is cleared. For shifts greater than one bit, the OF flag is undefined. If a shift occurs, the AF flag is undefined. If the count operand is 0, the flags are not affected. If the count is greater than the operand size, the flags are undefined.

Protected Mode Exceptions
#GP(0)             If the destination is located in a nonwritable segment.
                   If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                   If the DS, ES, FS, or GS register contains a null segment selector.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
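To make the multiprecision use concrete, here is a small C sketch that shifts a 64-bit quantity held in two 32-bit halves, using one SHRD-style step for the low half. The names `shrd32` and `shr64_by_halves` are hypothetical, flags are not modeled, and the count is assumed to be in the range 0 to 31.

```c
#include <stdint.h>

/* Model of the data movement of SHRD r/m32, r32, count (flags omitted). */
static uint32_t shrd32(uint32_t dest, uint32_t src, unsigned count)
{
    count &= 31;                      /* only bits 0 through 4 are used */
    if (count == 0)
        return dest;
    /* Low bits of src enter dest from the left. */
    return (dest >> count) | (src << (32 - count));
}

/* Shift the 64-bit value hi:lo right by count bits (0..31). */
static void shr64_by_halves(uint32_t *hi, uint32_t *lo, unsigned count)
{
    count &= 31;
    *lo = shrd32(*lo, *hi, count);    /* bits leaving hi arrive in lo */
    *hi >>= count;                    /* plain SHR on the high half */
}
```

The low half must be processed first, because the SHRD step needs the unshifted high half as its source.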
SHUFPS - Shuffle Single-FP

Opcode        Instruction                    Description
0F,C6,/r,ib   SHUFPS xmm1, xmm2/m128, imm8   Shuffle Single.
Description
The SHUFPS instruction is able to shuffle any of the four SP FP numbers from xmm1 to the lower two destination fields; the upper two destination fields are generated from a shuffle of any of the four SP FP numbers from xmm2/m128.

Example 3-1. SHUFPS Instruction
xmm1 (source)      X4             X3             X2             X1
xmm2/m128          Y4             Y3             Y2             Y1
xmm1 (result)      {Y4 ... Y1}    {Y4 ... Y1}    {X4 ... X1}    {X4 ... X1}
By using the same register for both sources, SHUFPS can return any combination of the four SP FP numbers from this register. Bits 0 and 1 of the immediate field are used to select which of the four input SP FP numbers will be put in the first SP FP number of the result; bits 3 and 2 of the immediate field are used to select which of the four input SP FP numbers will be put in the second SP FP number of the result; etc.
Figure 3-90. Operation of the SHUFPS Instruction
Operation
FP_SELECT = (imm8 >> 0) AND 0X3;
IF (FP_SELECT = 0) THEN DEST[31-0] = DEST[31-0];
ELSE IF (FP_SELECT = 1) THEN DEST[31-0] = DEST[63-32];
ELSE IF (FP_SELECT = 2) THEN DEST[31-0] = DEST[95-64];
ELSE DEST[31-0] = DEST[127-96];
FI FI FI

FP_SELECT = (imm8 >> 2) AND 0X3;
IF (FP_SELECT = 0) THEN DEST[63-32] = DEST[31-0];
ELSE IF (FP_SELECT = 1) THEN DEST[63-32] = DEST[63-32];
ELSE IF (FP_SELECT = 2) THEN DEST[63-32] = DEST[95-64];
ELSE DEST[63-32] = DEST[127-96];
FI FI FI

FP_SELECT = (imm8 >> 4) AND 0X3;
IF (FP_SELECT = 0) THEN DEST[95-64] = SRC/m128[31-0];
ELSE IF (FP_SELECT = 1) THEN DEST[95-64] = SRC/m128[63-32];
ELSE IF (FP_SELECT = 2) THEN DEST[95-64] = SRC/m128[95-64];
ELSE DEST[95-64] = SRC/m128[127-96];
FI FI FI

FP_SELECT = (imm8 >> 6) AND 0X3;
IF (FP_SELECT = 0) THEN DEST[127-96] = SRC/m128[31-0];
ELSE IF (FP_SELECT = 1) THEN DEST[127-96] = SRC/m128[63-32];
ELSE IF (FP_SELECT = 2) THEN DEST[127-96] = SRC/m128[95-64];
ELSE DEST[127-96] = SRC/m128[127-96];
FI FI FI
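The four selections can be modeled in portable C without SSE hardware. In this sketch `shufps_model` is a hypothetical name, each register is represented as an array of four floats with element 0 standing for bits 31-0, and every selection is assumed to read the operand values as they were before the instruction started.

```c
#include <string.h>

/* Portable model of SHUFPS dest, src, imm8. */
static void shufps_model(float dest[4], const float src[4], unsigned imm8)
{
    float a[4], b[4];
    /* Snapshot both operands so that selections never see partial
       results, even when dest and src are the same array. */
    memcpy(a, dest, sizeof a);
    memcpy(b, src, sizeof b);
    dest[0] = a[(imm8 >> 0) & 3];   /* lower two fields come from dest */
    dest[1] = a[(imm8 >> 2) & 3];
    dest[2] = b[(imm8 >> 4) & 3];   /* upper two fields come from src */
    dest[3] = b[(imm8 >> 6) & 3];
}
```

Passing the same array for both operands with an immediate of 0 broadcasts element 0 to all four fields, which matches the single-register use the text mentions.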
__m128 _mm_shuffle_ps(__m128 a, __m128 b, unsigned int imm8)
Selects four specific SP FP values from a and b, based on the mask imm8. The mask must be an immediate.

Exceptions
General protection exception if not aligned on 16-byte boundary, regardless of segment.

Numeric Exceptions
None.

Protected Mode Exceptions
#GP(0)             For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)             For an illegal address in the SS segment.
#PF(fault-code)    For a page fault.
#UD                If CR0.EM = 1.
#NM                If TS bit in CR0 is set.
#UD                If CR4.OSFXSR(bit 9) = 0.
#UD                If CPUID.XMM(EDX bit 25) = 0.
Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#PF(fault-code)    For a page fault.

Comments
The usage of Repeat Prefix (F3H) with SHUFPS is reserved. Different processor implementations may handle this prefix differently. Usage of this prefix with SHUFPS risks incompatibility with future processors.
SIDT - Store Interrupt Descriptor Table Register

Refer to entry for "SGDT/SIDT - Store Global/Interrupt Descriptor Table Register".

SLDT - Store Local Descriptor Table Register

Opcode      Instruction    Description
0F 00 /0    SLDT r/m16     Stores segment selector from LDTR in r/m16
0F 00 /0    SLDT r/m32     Stores segment selector from LDTR in low-order 16 bits of r/m32
Description
This instruction stores the segment selector from the local descriptor table register (LDTR) in the destination operand. The destination operand can be a general-purpose register or a memory location. The segment selector stored with this instruction points to the segment descriptor (located in the GDT) for the current LDT. This instruction can only be executed in protected mode.
When the destination operand is a 32-bit register, the 16-bit segment selector is copied into the lower-order 16 bits of the register. The high-order 16 bits of the register are cleared to 0s for the Pentium Pro processor and are undefined for Pentium, Intel486, and Intel386 processors. When the destination operand is a memory location, the segment selector is written to memory as a 16-bit quantity, regardless of the operand size.
The SLDT instruction is only useful in operating-system software; however, it can be used in application programs.

Operation
DEST ← LDTR(SegmentSelector);
Flags Affected
None.

Protected Mode Exceptions
#GP(0)             If the destination is located in a nonwritable segment.
                   If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                   If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
Real-Address Mode Exceptions
#UD                The SLDT instruction is not recognized in real-address mode.

Virtual-8086 Mode Exceptions
#UD                The SLDT instruction is not recognized in virtual-8086 mode.
SMSW - Store Machine Status Word

Opcode      Instruction     Description
0F 01 /4    SMSW r/m16      Store machine status word to r/m16
0F 01 /4    SMSW r32/m16    Store machine status word in low-order 16 bits of r32/m16; high-order 16 bits of r32 are undefined
Description
This instruction stores the machine status word (bits 0 through 15 of control register CR0) into the destination operand. The destination operand can be a 16-bit general-purpose register or a memory location.
When the destination operand is a 32-bit register, the low-order 16 bits of register CR0 are copied into the low-order 16 bits of the register and the upper 16 bits of the register are undefined. When the destination operand is a memory location, the low-order 16 bits of register CR0 are written to memory as a 16-bit quantity, regardless of the operand size.
The SMSW instruction is only useful in operating-system software; however, it is not a privileged instruction and can be used in application programs.
This instruction is provided for compatibility with the Intel 286 processor. Programs and procedures intended to run on the Pentium Pro, Pentium, Intel486, and Intel386 processors should use the MOV (control registers) instruction to load the machine status word.

Operation
DEST ← CR0[15:0]; (* Machine status word *)
Flags Affected
None.

Protected Mode Exceptions
#GP(0)             If the destination is located in a nonwritable segment.
                   If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                   If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
Real-Address Mode Exceptions
#GP                If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
SQRTPS - Packed Single-FP Square Root

Opcode      Instruction               Description
0F,51,/r    SQRTPS xmm1, xmm2/m128    Square Root of the packed SP FP numbers in XMM2/Mem.
Description
The SQRTPS instruction returns the square root of the packed SP FP numbers from xmm2/m128.

Figure 3-91. Operation of the SQRTPS Instruction
Operation
DEST[31-0] = SQRT (SRC/m128[31-0]);
DEST[63-32] = SQRT (SRC/m128[63-32]);
DEST[95-64] = SQRT (SRC/m128[95-64]);
DEST[127-96] = SQRT (SRC/m128[127-96]);

__m128 _mm_sqrt_ps(__m128 a)
Computes the square roots of the four SP FP values of a.
Exceptions
General protection exception if not aligned on 16-byte boundary, regardless of segment.

Numeric Exceptions
Invalid, Precision, Denormal.

Protected Mode Exceptions
#GP(0)             For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)             For an illegal address in the SS segment.
#PF(fault-code)    For a page fault.
#UD                If CR0.EM = 1.
#NM                If TS bit in CR0 is set.
#XM                For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 1).
#UD                For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 0).
#UD                If CR4.OSFXSR(bit 9) = 0.
#UD                If CPUID.XMM(EDX bit 25) = 0.
SQRTSS - Scalar Single-FP Square Root

Opcode         Instruction              Description
F3,0F,51,/r    SQRTSS xmm1, xmm2/m32    Square Root of the lower SP FP number in XMM2/Mem.
Description
The SQRTSS instruction returns the square root of the lowest SP FP number of its source operand.

Figure 3-92. Operation of the SQRTSS Instruction
Operation
DEST[31-0] = SQRT (SRC/m32[31-0]);
DEST[63-32] = DEST[63-32];
DEST[95-64] = DEST[95-64];
DEST[127-96] = DEST[127-96];

__m128 _mm_sqrt_ss(__m128 a)
Computes the square root of the lower SP FP value of a; the upper three SP FP values are passed through.
Exceptions
None.

Numeric Exceptions
Invalid, Precision, Denormal.

Protected Mode Exceptions
#GP(0)             For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)             For an illegal address in the SS segment.
#PF(fault-code)    For a page fault.
#UD                If CR0.EM = 1.
#NM                If TS bit in CR0 is set.
#AC                For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).
#XM                For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 1).
#UD                For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 0).
#UD                If CR4.OSFXSR(bit 9) = 0.
#UD                If CPUID.XMM(EDX bit 25) = 0.
STC - Set Carry Flag

Opcode    Instruction    Description
F9        STC            Set CF flag
Description
This instruction sets the CF flag in the EFLAGS register.

Operation
CF ← 1;

Flags Affected
The CF flag is set. The OF, ZF, SF, AF, and PF flags are unaffected.

Exceptions (All Operating Modes)
None.
STD - Set Direction Flag

Opcode    Instruction    Description
FD        STD            Set DF flag
Description
This instruction sets the DF flag in the EFLAGS register. When the DF flag is set to 1, string operations decrement the index registers (ESI and/or EDI).

Operation
DF ← 1;

Flags Affected
The DF flag is set. The CF, OF, ZF, SF, AF, and PF flags are unaffected.

Exceptions (All Operating Modes)
None.
STI - Set Interrupt Flag

Opcode    Instruction    Description
FB        STI            Set interrupt flag; external, maskable interrupts enabled at the end of the next instruction
Description
This instruction sets the interrupt flag (IF) in the EFLAGS register. After the IF flag is set, the processor begins responding to external, maskable interrupts after the next instruction is executed. The delayed effect of this instruction is provided to allow interrupts to be enabled just before returning from a procedure (or subroutine). For instance, if an STI instruction is followed by an RET instruction, the RET instruction is allowed to execute before external interrupts are recognized (see note 1). This behavior allows external interrupts to be disabled at the beginning of a procedure and enabled again at the end of the procedure. If the STI instruction is followed by a CLI instruction (which clears the IF flag), the effect of the STI instruction is negated.
The IF flag and the STI and CLI instructions have no effect on the generation of exceptions and NMI interrupts.
The following decision table indicates the action of the STI instruction (bottom of the table) depending on the processor's mode of operation and the CPL and IOPL of the currently running program or procedure (top of the table).
PE =       0    1         1         1
VM =       X    0         0         1
CPL        X    ≤ IOPL    > IOPL    =3
IOPL       X    X         X         =3
IF ← 1     Y    Y         N         Y
#GP(0)     N    N         Y         N

NOTES:
X    Don't care.
N    Action in Column 1 not taken.
Y    Action in Column 1 taken.
1. Note that in a sequence of instructions that individually delay interrupts past the following instruction, only the first instruction in the sequence is guaranteed to delay the interrupt, but subsequent interrupt-delaying instructions may not delay the interrupt. Thus, in the following instruction sequence:
    STI
    MOV SS, AX
    MOV ESP, EBP
interrupts may be recognized before MOV ESP, EBP executes, even though MOV SS, AX normally delays interrupts for one instruction.
Operation
IF PE=0 (* Executing in real-address mode *)
    THEN
        IF ← 1; (* Set Interrupt Flag *)
    ELSE (* Executing in protected mode or virtual-8086 mode *)
        IF VM=0 (* Executing in protected mode *)
            THEN
                IF IOPL = 3
                    THEN
                        IF ← 1;
                    ELSE
                        IF CPL ≤ IOPL
                            THEN
                                IF ← 1;
                            ELSE
                                #GP(0);
                        FI;
                FI;
            ELSE (* Executing in Virtual-8086 mode *)
                #GP(0); (* Trap to virtual-8086 monitor *)
        FI;
FI;
Flags Affected
The IF flag is set to 1.

Protected Mode Exceptions
#GP(0)             If the CPL is greater (has less privilege) than the IOPL of the current program or procedure.

Real-Address Mode Exceptions
None.

Virtual-8086 Mode Exceptions
#GP(0)             If the CPL is greater (has less privilege) than the IOPL of the current program or procedure.
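The decision table and the exception lists for STI can be condensed into a few lines of C. This is a sketch of the table only, not of the processor: `sti_decide` and `sti_outcome` are hypothetical names, and the model assumes that virtual-8086 code always runs at CPL 3, so that the same CPL versus IOPL comparison covers both protected mode and virtual-8086 mode.

```c
/* Possible outcomes of executing STI in this model. */
typedef enum { STI_SETS_IF, STI_RAISES_GP } sti_outcome;

/* pe and vm are the CR0.PE and EFLAGS.VM bits; cpl and iopl are 0..3. */
static sti_outcome sti_decide(int pe, int vm, int cpl, int iopl)
{
    if (!pe)
        return STI_SETS_IF;           /* real-address mode: always allowed */
    if (vm)
        cpl = 3;                      /* assumption: V86 code runs at CPL 3 */
    return (cpl <= iopl) ? STI_SETS_IF : STI_RAISES_GP;
}
```

For example, `sti_decide(1, 0, 3, 0)` models an application at CPL 3 under an operating system that keeps IOPL at 0, and yields `STI_RAISES_GP`.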
STMXCSR - Store Streaming SIMD Extension Control/Status

Opcode      Instruction    Description
0F,AE,/3    STMXCSR m32    Store Streaming SIMD Extension control/status word to m32.
Description
The MXCSR control/status register is used to enable masked/unmasked exception handling, to set rounding modes, to set flush-to-zero mode, and to view exception status flags. Refer to LDMXCSR for a description of the format of MXCSR. The linear address corresponds to the address of the least-significant byte of the referenced memory data. The reserved bits in the MXCSR are stored as zeroes.

Operation
m32 = MXCSR;

_mm_getcsr(void)
Returns the contents of the control register.

Exceptions
None.

Numeric Exceptions
None.
Protected Mode Exceptions
#GP(0)             For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)             For an illegal address in the SS segment.
#PF(fault-code)    For a page fault.
#UD                If CR0.EM = 1.
#NM                If TS bit in CR0 is set.
#AC                For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).
#UD                If CR4.OSFXSR(bit 9) = 0.
#UD                If CPUID.XMM(EDX bit 25) = 0.
Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#PF(fault-code)    For a page fault.
#AC                For unaligned memory reference.

Comments
The usage of Repeat Prefix (F3H) with STMXCSR is reserved. Different processor implementations may handle this prefix differently. Usage of this prefix with STMXCSR risks incompatibility with future processors.
STOS/STOSB/STOSW/STOSD - Store String

Opcode    Instruction    Description
AA        STOS m8        Store AL at address ES:(E)DI
AB        STOS m16       Store AX at address ES:(E)DI
AB        STOS m32       Store EAX at address ES:(E)DI
AA        STOSB          Store AL at address ES:(E)DI
AB        STOSW          Store AX at address ES:(E)DI
AB        STOSD          Store EAX at address ES:(E)DI
Description
These instructions store a byte, word, or doubleword from the AL, AX, or EAX register, respectively, into the destination operand. The destination operand is a memory location, the address of which is read from either the ES:EDI or the ES:DI registers (depending on the address-size attribute of the instruction, 32 or 16, respectively). The ES segment cannot be overridden with a segment override prefix.
At the assembly-code level, two forms of this instruction are allowed: the explicit-operands form and the no-operands form. The explicit-operands form (specified with the STOS mnemonic) allows the destination operand to be specified explicitly. Here, the destination operand should be a symbol that indicates the size and location of the destination value. The source operand is then automatically selected to match the size of the destination operand (the AL register for byte operands, AX for word operands, and EAX for doubleword operands). This explicit-operands form is provided to allow documentation; however, note that the documentation provided by this form can be misleading. That is, the destination operand symbol must specify the correct type (size) of the operand (byte, word, or doubleword), but it does not have to specify the correct location. The location is always specified by the ES:(E)DI registers, which must be loaded correctly before the store string instruction is executed.
The no-operands form provides short forms of the byte, word, and doubleword versions of the STOS instructions. Here also ES:(E)DI is assumed to be the destination operand and the AL, AX, or EAX register is assumed to be the source operand. The size of the destination and source operands is selected with the mnemonic: STOSB (byte read from register AL), STOSW (word from AX), or STOSD (doubleword from EAX).
After the byte, word, or doubleword is transferred from the AL, AX, or EAX register to the memory location, the (E)DI register is incremented or decremented automatically according to the setting of the DF flag in the EFLAGS register. (If the DF flag is 0, the (E)DI register is incremented; if the DF flag is 1, the (E)DI register is decremented.) The (E)DI register is incremented or decremented by one for byte operations, by two for word operations, or by four for doubleword operations.
The STOS, STOSB, STOSW, and STOSD instructions can be preceded by the REP prefix for block stores of ECX bytes, words, or doublewords. More often, however, these instructions are used within a LOOP construct because data needs to be moved into the AL, AX, or EAX register before it can be stored. Refer to "REP/REPE/REPZ/REPNE/REPNZ - Repeat String Operation Prefix" in this chapter for a description of the REP prefix.

Operation
IF (byte store)
    THEN
        DEST ← AL;
        IF DF = 0
            THEN (E)DI ← (E)DI + 1;
            ELSE (E)DI ← (E)DI - 1;
        FI;
    ELSE IF (word store)
        THEN
            DEST ← AX;
            IF DF = 0
                THEN (E)DI ← (E)DI + 2;
                ELSE (E)DI ← (E)DI - 2;
            FI;
        ELSE (* doubleword store *)
            DEST ← EAX;
            IF DF = 0
                THEN (E)DI ← (E)DI + 4;
                ELSE (E)DI ← (E)DI - 4;
            FI;
    FI;
FI;
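The combination of the REP prefix with the doubleword form can be pictured with a short C model. In this sketch a flat byte array stands in for the ES segment, `rep_stosd` is a hypothetical name, and no segment-limit or fault checking is modeled; the caller is assumed to pass offsets that stay inside the array.

```c
#include <stdint.h>

/* Model of REP STOSD: stores eax at es[edi], ecx times, stepping edi by
   4 in the direction selected by df. Returns the final value of edi. */
static uint32_t rep_stosd(uint8_t *es, uint32_t edi, uint32_t eax,
                          uint32_t ecx, int df)
{
    while (ecx != 0) {
        /* Little-endian doubleword store of EAX. */
        es[edi + 0] = (uint8_t)(eax);
        es[edi + 1] = (uint8_t)(eax >> 8);
        es[edi + 2] = (uint8_t)(eax >> 16);
        es[edi + 3] = (uint8_t)(eax >> 24);
        edi = df ? edi - 4 : edi + 4;   /* DF = 1 walks downward */
        ecx--;
    }
    return edi;
}
```

With DF clear, two iterations starting at offset 0 fill bytes 0 through 7 and leave the index at 8, which is the behavior a block fill relies on.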
Flags Affected
None.

Protected Mode Exceptions
#GP(0)             If the destination is located in a nonwritable segment.
                   If a memory operand effective address is outside the limit of the ES segment.
                   If the ES register contains a null segment selector.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
Real-Address Mode Exceptions
#GP                If a memory operand effective address is outside the ES segment limit.

Virtual-8086 Mode Exceptions
#GP(0)             If a memory operand effective address is outside the ES segment limit.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made.
STR - Store Task Register

Opcode      Instruction    Description
0F 00 /1    STR r/m16      Stores segment selector from TR in r/m16
Description
This instruction stores the segment selector from the task register (TR) in the destination operand. The destination operand can be a general-purpose register or a memory location. The segment selector stored with this instruction points to the task state segment (TSS) for the currently running task.
When the destination operand is a 32-bit register, the 16-bit segment selector is copied into the lower 16 bits of the register and the upper 16 bits of the register are cleared to 0s. When the destination operand is a memory location, the segment selector is written to memory as a 16-bit quantity, regardless of operand size.
The STR instruction is useful only in operating-system software. It can only be executed in protected mode.

Operation
DEST ← TR(SegmentSelector);
Flags Affected
None.

Protected Mode Exceptions
#GP(0)             If the destination is a memory operand that is located in a nonwritable segment or if the effective address is outside the CS, DS, ES, FS, or GS segment limit.
                   If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
#SS(0)             If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)    If a page fault occurs.
#AC(0)             If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
Real-Address Mode Exceptions
#UD                The STR instruction is not recognized in real-address mode.

Virtual-8086 Mode Exceptions
#UD                The STR instruction is not recognized in virtual-8086 mode.
SUB - Subtract

Opcode      Instruction         Description
2C ib       SUB AL,imm8         Subtract imm8 from AL
2D iw       SUB AX,imm16        Subtract imm16 from AX
2D id       SUB EAX,imm32       Subtract imm32 from EAX
80 /5 ib    SUB r/m8,imm8       Subtract imm8 from r/m8
81 /5 iw    SUB r/m16,imm16     Subtract imm16 from r/m16
81 /5 id    SUB r/m32,imm32     Subtract imm32 from r/m32
83 /5 ib    SUB r/m16,imm8      Subtract sign-extended imm8 from r/m16
83 /5 ib    SUB r/m32,imm8      Subtract sign-extended imm8 from r/m32
28 /r       SUB r/m8,r8         Subtract r8 from r/m8
29 /r       SUB r/m16,r16       Subtract r16 from r/m16
29 /r       SUB r/m32,r32       Subtract r32 from r/m32
2A /r       SUB r8,r/m8         Subtract r/m8 from r8
2B /r       SUB r16,r/m16       Subtract r/m16 from r16
2B /r       SUB r32,r/m32       Subtract r/m32 from r32
Description
This instruction subtracts the second operand (source operand) from the first operand (destination operand) and stores the result in the destination operand. The destination operand can be a register or a memory location; the source operand can be an immediate, register, or memory location. (However, two memory operands cannot be used in one instruction.) When an immediate value is used as an operand, it is sign-extended to the length of the destination operand format.
The SUB instruction does not distinguish between signed or unsigned operands. Instead, the processor evaluates the result for both data types and sets the OF flag to indicate an overflow in the signed result and the CF flag to indicate a borrow in the unsigned result. The SF flag indicates the sign of the signed result.

Operation
DEST ← DEST - SRC;

Flags Affected
The OF, SF, ZF, AF, PF, and CF flags are set according to the result.
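How one subtraction yields both a signed and an unsigned verdict can be shown with a small C model of the 32-bit form. The names `sub_result` and `sub32` are hypothetical, and only CF, OF, ZF, and SF are computed; AF and PF are left out of this sketch.

```c
#include <stdint.h>

/* Result and selected flags of a modeled 32-bit SUB. */
typedef struct {
    uint32_t result;
    int cf, of, zf, sf;
} sub_result;

static sub_result sub32(uint32_t dest, uint32_t src)
{
    sub_result r;
    r.result = dest - src;            /* wraps modulo 2^32 */
    r.cf = dest < src;                /* borrow in the unsigned result */
    /* Signed overflow: operands differ in sign and the result's sign
       differs from the sign of dest. */
    r.of = (int)(((dest ^ src) & (dest ^ r.result)) >> 31);
    r.zf = (r.result == 0);
    r.sf = (int)(r.result >> 31);     /* sign bit of the result */
    return r;
}
```

For instance, subtracting 1 from 80000000h sets OF but not CF: as unsigned values no borrow is needed, while as signed values the most negative number has wrapped to a positive one.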
SUBPS - Packed Single-FP Subtract

Opcode      Instruction              Description
0F,5C,/r    SUBPS xmm1, xmm2/m128    Subtract packed SP FP numbers in XMM2/Mem from XMM1.
Description
The SUBPS instruction subtracts the packed SP FP numbers in xmm2/m128 from the packed SP FP numbers in xmm1.

Figure 3-93. Operation of the SUBPS Instruction
Operation
DEST[31-0] = DEST[31-0] - SRC/m128[31-0];
DEST[63-32] = DEST[63-32] - SRC/m128[63-32];
DEST[95-64] = DEST[95-64] - SRC/m128[95-64];
DEST[127-96] = DEST[127-96] - SRC/m128[127-96];

__m128 _mm_sub_ps(__m128 a, __m128 b)
Subtracts the four SP FP values of a and b.
Exceptions
General protection exception if not aligned on 16-byte boundary, regardless of segment.

Numeric Exceptions
Overflow, Underflow, Invalid, Precision, Denormal.

Protected Mode Exceptions
#GP(0)             For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)             For an illegal address in the SS segment.
#PF(fault-code)    For a page fault.
#UD                If CR0.EM = 1.
#NM                If TS bit in CR0 is set.
#XM                For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 1).
#UD                For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 0).
#UD                If CR4.OSFXSR(bit 9) = 0.
#UD                If CPUID.XMM(EDX bit 25) = 0.
SUBSS - Scalar Single-FP Subtract

Opcode         Instruction             Description
F3,0F,5C,/r    SUBSS xmm1, xmm2/m32    Subtract the lower SP FP numbers in XMM2/Mem from XMM1.
Description
The SUBSS instruction subtracts the lowest SP FP number in xmm2/m32 from the lowest SP FP number in xmm1.

Figure 3-94. Operation of the SUBSS Instruction
Operation
DEST[31-0] = DEST[31-0] - SRC/m32[31-0];
DEST[63-32] = DEST[63-32];
DEST[95-64] = DEST[95-64];
DEST[127-96] = DEST[127-96];

__m128 _mm_sub_ss(__m128 a, __m128 b)
Subtracts the lower SP FP values of a and b. The upper three SP FP values are passed through from a.
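The difference between the packed and scalar forms is easiest to see side by side. The following portable C sketch models each register as an array of four floats, with element 0 standing for bits 31-0; `subps_model` and `subss_model` are hypothetical names, and numeric exceptions and rounding control are not modeled.

```c
/* Packed form: all four fields are subtracted. */
static void subps_model(float dest[4], const float src[4])
{
    for (int i = 0; i < 4; i++)
        dest[i] -= src[i];
}

/* Scalar form: only the lowest field is subtracted; the upper three
   fields of dest pass through unchanged. */
static void subss_model(float dest[4], const float src[4])
{
    dest[0] -= src[0];
}
```

Applied to {10, 20, 30, 40} and {1, 2, 3, 4}, the packed model gives {9, 18, 27, 36} and the scalar model gives {9, 20, 30, 40}.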
Exceptions
None.

Numeric Exceptions
Overflow, Underflow, Invalid, Precision, Denormal.

Protected Mode Exceptions
#GP(0)             For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)             For an illegal address in the SS segment.
#PF(fault-code)    For a page fault.
#UD                If CR0.EM = 1.
#NM                If TS bit in CR0 is set.
#AC                For unaligned memory reference. To enable #AC exceptions, three conditions must be true (CR0.AM is set; EFLAGS.AC is set; current CPL is 3).
#XM                For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 1).
#UD                For an unmasked Streaming SIMD Extension numeric exception (CR4.OSXMMEXCPT = 0).
#UD                If CR4.OSFXSR(bit 9) = 0.
#UD                If CPUID.XMM(EDX bit 25) = 0.
Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#AC                For unaligned memory reference if the current privilege level is 3.
#PF(fault-code)    For a page fault.
SYSENTER - Fast Transition to System Call Entry Point

Opcode    Instruction    Description
0F, 34    SYSENTER       Transition to System Call Entry Point
Description
The SYSENTER instruction is part of the "Fast System Call" facility introduced on the Pentium II processor. The SYSENTER instruction is optimized to provide the maximum performance for transitions to protection ring 0 (CPL = 0). The SYSENTER instruction sets the following registers according to values specified by the operating system in certain model-specific registers.

CS register     set to the value of (SYSENTER_CS_MSR)
EIP register    set to the value of (SYSENTER_EIP_MSR)
SS register     set to the sum of (8 plus the value in SYSENTER_CS_MSR)
ESP register    set to the value of (SYSENTER_ESP_MSR)
The processor does not save user stack or return address information, and does not save any registers. The SYSENTER and SYSEXIT instructions do not constitute a call/return pair; therefore, the system call "stub" routines executed by user code (typically in shared libraries or DLLs) must perform the required register state save to create a system call/return pair. The SYSENTER instruction always transfers to a flat protected mode kernel at CPL = 0. SYSENTER can be invoked from all modes except real mode. The instruction requires that the following conditions are met by the operating system:
- The CS selector for the target ring 0 code segment is 32 bits, mapped as a flat 0-4 GB address space with execute and read permissions.
- The SS selector for the target ring 0 stack segment is 32 bits, mapped as a flat 0-4 GB address space with read, write, and accessed permissions. This selector (Target Ring 0 SS Selector) is assigned the value of the new (CS selector + 8).
An operating system provides values for CS, EIP, SS, and ESP for the ring 0 entry point through use of model-specific registers within the processor. These registers can be read from and written to by using the RDMSR and WRMSR instructions. The register addresses are defined to remain fixed at the following addresses on future processors that provide support for this feature.
Name                Description                      Address
SYSENTER_CS_MSR     Target Ring 0 CS Selector        174h
SYSENTER_ESP_MSR    Target Ring 0 ESP                175h
SYSENTER_EIP_MSR    Target Ring 0 Entry Point EIP    176h

The presence of this facility is indicated by the SYSENTER Present (SEP) bit 11 of CPUID. An operating system that detects the presence of the SEP bit must also qualify the processor family and model to ensure that the SYSENTER/SYSEXIT instructions are actually present. For example:
IF (CPUID SEP bit is set)
    IF (Family == 6) AND (Model < 3) AND (Stepping < 3)
        THEN
            Fast System Call NOT supported
    FI;
    ELSE Fast System Call is supported
FI
The Pentium Pro processor (Model = 1) returns a set SEP CPUID feature bit, but does not support the SYSENTER/SYSEXIT instructions.
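
The qualification above can be sketched in C as follows. This is only an illustration: the function name `fast_syscall_supported` is hypothetical, and it assumes the caller has already executed CPUID with EAX = 1 and passes in the returned EAX (version information, with stepping in bits 3-0, model in bits 7-4, and family in bits 11-8) and EDX (feature flags, with SEP in bit 11).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of any Intel or Microsoft API).
 * eax, edx: values returned by CPUID with EAX = 1 on entry. */
bool fast_syscall_supported(uint32_t eax, uint32_t edx)
{
    uint32_t stepping = eax & 0xF;         /* bits 3-0  */
    uint32_t model    = (eax >> 4) & 0xF;  /* bits 7-4  */
    uint32_t family   = (eax >> 8) & 0xF;  /* bits 11-8 */
    bool sep = ((edx >> 11) & 1) != 0;     /* SEP feature bit */

    if (!sep)
        return false;                      /* facility not reported at all */
    if (family == 6 && model < 3 && stepping < 3)
        return false;                      /* SEP reported, instructions absent */
    return true;
}
```

An operating system would typically run such a check once at startup and fall back to a software interrupt or call gate for system calls when it returns false.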

Operation
SYSENTER
IF CR0.PE == 0 THEN #GP(0)
IF SYSENTER_CS_MSR == 0 THEN #GP(0)
EFLAGS.VM := 0                  // Prevent VM86 mode
EFLAGS.IF := 0                  // Mask interrupts
CS.SEL := SYSENTER_CS_MSR       // Operating system provides CS
// Set rest of CS to a fixed value
CS.SEL.CPL := 0                 // CPL = 0
CS.SEL.BASE := 0                // Flat segment
CS.SEL.LIMIT := 0xFFFF          // 4G limit
CS.SEL.G := 1                   // 4 KB granularity
CS.SEL.S := 1
CS.SEL.TYPE_xCRA := 1011        // Execute + Read, Accessed
CS.SEL.D := 1                   // 32 bit code
CS.SEL.DPL := 0
CS.SEL.RPL := 0
CS.SEL.P := 1
SS.SEL := CS.SEL+8
// Set rest of SS to a fixed value
SS.SEL.BASE := 0                // Flat segment
SS.SEL.LIMIT := 0xFFFF          // 4G limit
SS.SEL.G := 1                   // 4 KB granularity
SS.SEL.S := 1
SS.SEL.TYPE_xCRA := 0011        // Read/Write, Accessed
SS.SEL.D := 1                   // 32 bit stack
SS.SEL.DPL := 0
SS.SEL.RPL := 0
SS.SEL.P := 1
ESP := SYSENTER_ESP_MSR
EIP := SYSENTER_EIP_MSR

Exceptions
#GP(0)    If SYSENTER_CS_MSR contains zero.

Numeric Exceptions
None.

Real Address Mode Exceptions
#GP(0)    If protected mode is not enabled.

SYSEXIT - Fast Transition from System Call Entry Point

Opcode    Instruction    Description
0F, 35    SYSEXIT        Transition from System Call Entry Point

Description
The SYSEXIT instruction is part of the "Fast System Call" facility introduced on the Pentium II processor. The SYSEXIT instruction is optimized to provide the maximum performance for transitions to protection ring 3 (CPL = 3) from protection ring 0 (CPL = 0). The SYSEXIT instruction sets the following registers according to values specified by the operating system in certain model-specific or general purpose registers.

CS register     set to the sum of (16 plus the value in SYSENTER_CS_MSR)
EIP register    set to the value contained in the EDX register
SS register     set to the sum of (24 plus the value in SYSENTER_CS_MSR)
ESP register    set to the value contained in the ECX register
The processor does not save kernel stack or return address information, and does not save any registers. The SYSENTER and SYSEXIT instructions do not constitute a call/return pair; therefore, the system call "stub" routines executed by user code (typically in shared libraries or DLLs) must perform the required register state restore to create a system call/return pair. The SYSEXIT instruction always transfers to a flat protected mode user at CPL = 3. SYSEXIT can be invoked only from protected mode and CPL = 0. The instruction requires that the following conditions are met by the operating system:
- The CS selector for the target ring 3 code segment is 32 bits, mapped as a flat 0-4 GB address space with execute, read, and non-conforming permissions.
- The SS selector for the target ring 3 stack segment is 32 bits, mapped as a flat 0-4 GB address space with expand-up, read, and write permissions.
An operating system must set the following:

Name           Description
CS Selector    The Target Ring 3 CS Selector. This is assigned the sum of (16 + the value of SYSENTER_CS_MSR).
SS Selector    The Target Ring 3 SS Selector. This is assigned the sum of (24 + the value of SYSENTER_CS_MSR).
EIP            Target Ring 3 Return EIP. This is the target entry point, and is assigned the value contained in the EDX register.
ESP            Target Ring 3 Return ESP. This is the target stack pointer, and is assigned the value contained in the ECX register.
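
The selector arithmetic used by SYSENTER and SYSEXIT can be summarized in a short C sketch. The struct and function names are hypothetical, and the sketch assumes the value in SYSENTER_CS_MSR has its RPL bits (bits 1-0) clear; the ring 3 selectors have RPL forced to 3, as in the SYSEXIT operation.

```c
#include <stdint.h>

/* Hypothetical container for the four selectors implied by one MSR value. */
struct fast_syscall_selectors {
    uint16_t ring0_cs;  /* loaded by SYSENTER */
    uint16_t ring0_ss;  /* loaded by SYSENTER */
    uint16_t ring3_cs;  /* loaded by SYSEXIT  */
    uint16_t ring3_ss;  /* loaded by SYSEXIT  */
};

/* Derive all four selectors from the value written to SYSENTER_CS_MSR. */
struct fast_syscall_selectors derive_selectors(uint16_t sysenter_cs_msr)
{
    struct fast_syscall_selectors s;
    s.ring0_cs = sysenter_cs_msr;
    s.ring0_ss = (uint16_t)(sysenter_cs_msr + 8);
    s.ring3_cs = (uint16_t)((sysenter_cs_msr + 16) | 3);  /* RPL := 3 */
    s.ring3_ss = (uint16_t)((sysenter_cs_msr + 24) | 3);  /* RPL := 3 */
    return s;
}
```

One consequence of this layout is that the four descriptors must sit in consecutive descriptor table slots, in the order ring 0 code, ring 0 stack, ring 3 code, ring 3 stack.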

The presence of this facility is indicated by the SYSENTER Present (SEP) bit 11 of CPUID. An operating system that detects the presence of the SEP bit must also qualify the processor family and model to ensure that the SYSENTER/SYSEXIT instructions are actually present, as described for the SYSENTER instruction. The Pentium Pro processor (Model = 1) returns a set SEP CPUID feature bit, but does not support the SYSENTER/SYSEXIT instructions.

Operation
SYSEXIT
IF SYSENTER_CS_MSR == 0 THEN #GP(0)
IF CR0.PE == 0 THEN #GP(0)
IF CPL <> 0 THEN #GP(0)
// Changing CS:EIP and SS:ESP is required
CS.SEL := (SYSENTER_CS_MSR + 16)    // Selector for return CS
CS.SEL.RPL := 3
// Set rest of CS to a fixed value
CS.SEL.BASE := 0                    // Flat segment
CS.SEL.LIMIT := 0xFFFF              // 4G limit
CS.SEL.G := 1                       // 4 KB granularity
CS.SEL.S := 1
CS.SEL.TYPE_xCRA := 1011            // Execute, Read, Non-Conforming Code
CS.SEL.D := 1                       // 32 bit code
CS.SEL.DPL := 3
CS.SEL.P := 1
SS.SEL := (SYSENTER_CS_MSR + 24)
SS.SEL.RPL := 3
// Set rest of SS to a fixed value
SS.SEL.BASE := 0                    // Flat segment
SS.SEL.LIMIT := 0xFFFF              // 4G limit
SS.SEL.G := 1                       // 4 KB granularity
SS.SEL.S := 1
SS.SEL.TYPE_xCRA := 0011            // Expand Up, Read/Write, Data
SS.SEL.D := 1                       // 32 bit stack
SS.SEL.DPL := 3
SS.SEL.CPL := 3
SS.SEL.P := 1
ESP := ECX
EIP := EDX

Exceptions
#GP(0)    If SYSENTER_CS_MSR contains zero.

Numeric Exceptions
None.

Protected Mode Exceptions
#GP(0)    If CPL is non-zero.

Real Address Mode Exceptions
#GP(0)    If protected mode is not enabled.

TEST - Logical Compare

Opcode       Instruction         Description
A8 ib        TEST AL,imm8        AND imm8 with AL; set SF, ZF, PF according to result
A9 iw        TEST AX,imm16       AND imm16 with AX; set SF, ZF, PF according to result
A9 id        TEST EAX,imm32      AND imm32 with EAX; set SF, ZF, PF according to result
F6 /0 ib     TEST r/m8,imm8      AND imm8 with r/m8; set SF, ZF, PF according to result
F7 /0 iw     TEST r/m16,imm16    AND imm16 with r/m16; set SF, ZF, PF according to result
F7 /0 id     TEST r/m32,imm32    AND imm32 with r/m32; set SF, ZF, PF according to result
84 /r        TEST r/m8,r8        AND r8 with r/m8; set SF, ZF, PF according to result
85 /r        TEST r/m16,r16      AND r16 with r/m16; set SF, ZF, PF according to result
85 /r        TEST r/m32,r32      AND r32 with r/m32; set SF, ZF, PF according to result

Description
This instruction computes the bit-wise logical AND of the first operand (source 1 operand) and the second operand (source 2 operand) and sets the SF, ZF, and PF status flags according to the result. The result is then discarded.

Operation
TEMP ← SRC1 AND SRC2;
SF ← MSB(TEMP);
IF TEMP = 0
    THEN ZF ← 1;
    ELSE ZF ← 0;
FI;
PF ← BitwiseXNOR(TEMP[0:7]);
CF ← 0;
OF ← 0;
(* AF is Undefined *)
Flags Affected The OF and CF flags are cleared to 0. The SF, ZF, and PF flags are set according to the result (refer to the Operation section above). The state of the AF flag is undefined.
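
As an illustration of the flag rules above, the following C sketch models the 32-bit form of TEST in software. The struct and function names are hypothetical; AF is omitted because its state is undefined.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of the flags TEST defines (AF is left out: undefined). */
struct test_flags { bool sf, zf, pf, cf, of; };

/* Software model of TEST r/m32, r32: AND the operands, keep only the flags. */
struct test_flags test32_model(uint32_t src1, uint32_t src2)
{
    struct test_flags f;
    uint32_t temp = src1 & src2;
    uint8_t low = (uint8_t)temp;        /* PF looks at bits 7-0 only */

    f.sf = ((temp >> 31) & 1) != 0;     /* most significant bit */
    f.zf = (temp == 0);

    /* Fold the low byte so that bit 0 holds the XOR of all eight bits. */
    low ^= (uint8_t)(low >> 4);
    low ^= (uint8_t)(low >> 2);
    low ^= (uint8_t)(low >> 1);
    f.pf = (low & 1) == 0;              /* set when the count of 1 bits is even */

    f.cf = false;                       /* always cleared */
    f.of = false;                       /* always cleared */
    return f;
}
```

A common use of TEST is checking a single bit without modifying the operand, for example testing a value against a mask of 1 and branching on ZF.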

UCOMISS - Unordered Scalar Single-FP Compare and Set EFLAGS

Opcode      Instruction               Description
0F,2E,/r    UCOMISS xmm1, xmm2/m32    Compare lower SP FP number in XMM1 register with lower SP FP number in XMM2/Mem and set the status flags accordingly.
Description The UCOMISS instructions compare the two lowest scalar SP FP numbers, and set the ZF,PF,CF bits in the EFLAGS register as described above. In addition, the OF, SF, and AF bits in the EFLAGS register are zeroed out. The unordered predicate is returned if either source operand is a NaN (qNaN or sNaN).
Figure 3-95. Operation of the UCOMISS Instruction, Condition One
    EFLAGS: OF,SF,AF = 000; ZF,PF,CF = 111. MXCSR flags: Invalid flag is set.
Figure 3-96. Operation of the UCOMISS Instruction, Condition Two
    EFLAGS: OF,SF,AF = 000; ZF,PF,CF = 000. MXCSR flags: Invalid flag is set.
Figure 3-97. Operation of the UCOMISS Instruction, Condition Three
    EFLAGS: OF,SF,AF = 000; ZF,PF,CF = 001. MXCSR flags: Invalid flag is set.
Figure 3-98. Operation of the UCOMISS Instruction, Condition Four
    EFLAGS: OF,SF,AF = 000; ZF,PF,CF = 100. MXCSR flags: Invalid flag is set.

Operation
OF = 0;
SF = 0;
AF = 0;
IF ((DEST[31-0] UNORD SRC/m32[31-0]) = TRUE)
    THEN
        ZF = 1; PF = 1; CF = 1;
    ELSE
        IF ((DEST[31-0] GTRTHAN SRC/m32[31-0]) = TRUE)
            THEN
                ZF = 0; PF = 0; CF = 0;
            ELSE
                IF ((DEST[31-0] LESSTHAN SRC/m32[31-0]) = TRUE)
                    THEN
                        ZF = 0; PF = 0; CF = 1;
                    ELSE
                        ZF = 1; PF = 0; CF = 0;
                FI
        FI
FI
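
The four outcomes can be modeled in plain C, which may help when reading code that branches on the flags after UCOMISS. This is a sketch with hypothetical names; it models only the ZF, PF, and CF results and ignores the MXCSR exception flags.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical record of the three flags UCOMISS computes. */
struct ucomiss_flags { bool zf, pf, cf; };

/* Software model of the comparison on the low single-precision elements. */
struct ucomiss_flags ucomiss_model(float dest, float src)
{
    struct ucomiss_flags f;
    if (isnan(dest) || isnan(src)) {        /* unordered */
        f.zf = true;  f.pf = true;  f.cf = true;
    } else if (dest > src) {                /* greater than */
        f.zf = false; f.pf = false; f.cf = false;
    } else if (dest < src) {                /* less than */
        f.zf = false; f.pf = false; f.cf = true;
    } else {                                /* equal */
        f.zf = true;  f.pf = false; f.cf = false;
    }
    return f;
}
```

Because the unordered case sets all three flags, code that tests only ZF or CF after the comparison cannot distinguish a NaN operand from "equal" or "less than"; PF has to be checked as well.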

int _mm_ucomieq_ss(__m128 a, __m128 b)
Compares the lower SP FP value of a and b for a equal to b. If a and b are equal, 1 is returned. Otherwise 0 is returned.
int _mm_ucomilt_ss(__m128 a, __m128 b)
Compares the lower SP FP value of a and b for a less than b. If a is less than b, 1 is returned. Otherwise 0 is returned.
int _mm_ucomile_ss(__m128 a, __m128 b)
Compares the lower SP FP value of a and b for a less than or equal to b. If a is less than or equal to b, 1 is returned. Otherwise 0 is returned.
int _mm_ucomigt_ss(__m128 a, __m128 b)
Compares the lower SP FP value of a and b for a greater than b. If a is greater than b, 1 is returned. Otherwise 0 is returned.
int _mm_ucomige_ss(__m128 a, __m128 b)
Compares the lower SP FP value of a and b for a greater than or equal to b. If a is greater than or equal to b, 1 is returned. Otherwise 0 is returned.
int _mm_ucomineq_ss(__m128 a, __m128 b)
Compares the lower SP FP value of a and b for a not equal to b. If a and b are not equal, 1 is returned. Otherwise 0 is returned.

Exceptions
None.

Numeric Exceptions
Invalid (if sNaN operands), Denormal. Integer EFLAGS values will not be updated in the presence of unmasked numeric exceptions.

Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#AC               For unaligned memory reference if the current privilege level is 3.
#PF (fault-code)  For a page fault.

Comments
UCOMISS differs from COMISS in that it signals an invalid numeric exception when a source operand is an sNaN; COMISS signals invalid if a source operand is either a qNaN or an sNaN.
The usage of Repeat (F2H, F3H) and Operand-size (66H) prefixes with UCOMISS is reserved. Different processor implementations may handle these prefixes differently. Usage of these prefixes with UCOMISS risks incompatibility with future processors.

UD2 - Undefined Instruction

Opcode    Instruction    Description
0F 0B     UD2            Raise invalid opcode exception

Description
This instruction generates an invalid opcode. This instruction is provided for software testing to explicitly generate an invalid opcode. The opcode for this instruction is reserved for this purpose. Other than raising the invalid opcode exception, this instruction is the same as the NOP instruction.

Operation
#UD (* Generates invalid opcode exception *);

Flags Affected
None.

Exceptions (All Operating Modes)
#UD    Instruction is guaranteed to raise an invalid opcode exception in all operating modes.

UNPCKHPS - Unpack High Packed Single-FP Data

Opcode      Instruction                  Description
0F,15,/r    UNPCKHPS xmm1, xmm2/m128     Interleaves SP FP numbers from the high halves of XMM1 and XMM2/Mem into XMM1 register.
Description The UNPCKHPS instruction performs an interleaved unpack of the high-order data elements of XMM1 and XMM2/Mem. It ignores the lower half of the sources.
Example 3-2. UNPCKHPS Instruction

xmm1 (before)    X4    X3    X2    X1
xmm2/m128        Y4    Y3    Y2    Y1
xmm1 (after)     Y4    X4    Y3    X3

Figure 3-99. Operation of the UNPCKHPS Instruction
Operation
DEST[31-0] = DEST[95-64]; DEST[63-32] = SRC/m128[95-64]; DEST[95-64] = DEST[127-96]; DEST[127-96] = SRC/m128[127-96];
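
The element movement in the operation above can be modeled with ordinary arrays. In this sketch the function name is hypothetical, and array index 0 stands for bits 31-0 while index 3 stands for bits 127-96.

```c
/* Software model of UNPCKHPS: dest and src each hold four single-precision
 * values, index 0 = bits 31-0, index 3 = bits 127-96. */
void unpckhps_model(float dest[4], const float src[4])
{
    float d2 = dest[2];   /* save the high half of dest before overwriting */
    float d3 = dest[3];

    dest[0] = d2;         /* DEST[31-0]   = DEST[95-64]  */
    dest[1] = src[2];     /* DEST[63-32]  = SRC[95-64]   */
    dest[2] = d3;         /* DEST[95-64]  = DEST[127-96] */
    dest[3] = src[3];     /* DEST[127-96] = SRC[127-96]  */
}
```

With dest holding {X1, X2, X3, X4} and src holding {Y1, Y2, Y3, Y4} in index order, the result is {X3, Y3, X4, Y4}, which matches the Y4 X4 Y3 X3 layout of Example 3-2 when read from the high element down.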

__m128 _mm_unpackhi_ps(__m128 a, __m128 b)
Selects and interleaves the upper two SP FP values from a and b.

Exceptions
General protection exception if not aligned on 16-byte boundary, regardless of segment.

Numeric Exceptions
None.

Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#PF(fault-code)   For a page fault.

Comments
When unpacking from a memory operand, an implementation may decide to fetch only the appropriate 64 bits. Alignment to 16-byte boundary and normal segment checking will still be enforced.
The usage of Repeat Prefix (F3H) with UNPCKHPS is reserved. Different processor implementations may handle this prefix differently. Usage of this prefix with UNPCKHPS risks incompatibility with future processors.

UNPCKLPS - Unpack Low Packed Single-FP Data

Opcode      Instruction                  Description
0F,14,/r    UNPCKLPS xmm1, xmm2/m128     Interleaves SP FP numbers from the low halves of XMM1 and XMM2/Mem into XMM1 register.

Description
The UNPCKLPS instruction performs an interleaved unpack of the low-order data elements of XMM1 and XMM2/Mem. It ignores the upper half of the sources.
Example 3-3. UNPCKLPS Instruction

xmm1 (before)    X4    X3    X2    X1
xmm2/m128        Y4    Y3    Y2    Y1
xmm1 (after)     Y2    X2    Y1    X1

Figure 3-100. Operation of the UNPCKLPS Instruction
Operation
DEST[31-0] = DEST[31-0]; DEST[63-32] = SRC/m128[31-0]; DEST[95-64] = DEST[63-32]; DEST[127-96] = SRC/m128[63-32];

__m128 _mm_unpacklo_ps(__m128 a, __m128 b)
Selects and interleaves the lower two SP FP values from a and b.

Exceptions
General protection exception if not aligned on 16-byte boundary, regardless of segment.

Numeric Exceptions
None.

Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#PF(fault-code)   For a page fault.

Comments
When unpacking from a memory operand, an implementation may decide to fetch only the appropriate 64 bits. Alignment to 16-byte boundary and normal segment checking will still be enforced.
The usage of Repeat Prefixes (F2H, F3H) with UNPCKLPS is reserved. Different processor implementations may handle these prefixes differently. Usage of these prefixes with UNPCKLPS risks incompatibility with future processors.

VERR/VERW - Verify a Segment for Reading or Writing

Opcode      Instruction    Description
0F 00 /4    VERR r/m16     Set ZF=1 if segment specified with r/m16 can be read
0F 00 /5    VERW r/m16     Set ZF=1 if segment specified with r/m16 can be written
Description These instructions verify whether the code or data segment specified with the source operand is readable (VERR) or writable (VERW) from the current privilege level (CPL). The source operand is a 16-bit register or a memory location that contains the segment selector for the segment to be verified. If the segment is accessible and readable (VERR) or writable (VERW), the ZF flag is set; otherwise, the ZF flag is cleared. Code segments are never verified as writable. This check cannot be performed on system segments. To set the ZF flag, the following conditions must be met:
- The segment selector is not null.
- The selector must denote a descriptor within the bounds of the descriptor table (GDT or LDT).
- The selector must denote the descriptor of a code or data segment (not that of a system segment or gate).
- For the VERR instruction, the segment must be readable.
- For the VERW instruction, the segment must be a writable data segment.
- If the segment is not a conforming code segment, the segment's DPL must be greater than or equal to (have less or the same privilege as) both the CPL and the segment selector's RPL.
The validation performed is the same as is performed when a segment selector is loaded into the DS, ES, FS, or GS register, and the indicated access (read or write) is performed. The segment selector's value cannot result in a protection exception, enabling the software to anticipate possible segment access problems.
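
The conditions listed above can be expressed as a small C model. This is a simplified sketch with hypothetical type and function names: it assumes the selector is non-null and within the descriptor table limit, and that the descriptor has already been fetched and decoded into the fields shown.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of an already-fetched segment descriptor. */
struct seg_desc {
    bool is_system;    /* system segment or gate (S = 0) */
    bool is_code;      /* code segment rather than data segment */
    bool conforming;   /* meaningful for code segments only */
    bool readable;     /* data segments are always readable */
    bool writable;     /* meaningful for data segments only */
    unsigned dpl;      /* descriptor privilege level, 0-3 */
};

/* Privilege part of the check, shared by VERR and VERW. */
static bool segment_accessible(const struct seg_desc *d, unsigned cpl, unsigned rpl)
{
    if (d->is_system)
        return false;
    if (d->is_code && d->conforming)
        return true;                     /* DPL is not compared for conforming code */
    return d->dpl >= cpl && d->dpl >= rpl;
}

/* Value ZF would take after VERR. */
bool verr_zf(const struct seg_desc *d, unsigned cpl, unsigned rpl)
{
    return segment_accessible(d, cpl, rpl) && d->readable;
}

/* Value ZF would take after VERW; code segments are never writable. */
bool verw_zf(const struct seg_desc *d, unsigned cpl, unsigned rpl)
{
    return segment_accessible(d, cpl, rpl) && !d->is_code && d->writable;
}
```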

Operation
IF SRC(Offset) > (GDTR(Limit) OR (LDTR(Limit))
    THEN
        ZF ← 0
Read segment descriptor;
IF SegmentDescriptor(DescriptorType) = 0 (* system segment *)
    OR (SegmentDescriptor(Type) ≠ conforming code segment)
    AND (CPL > DPL) OR (RPL > DPL)
    THEN
        ZF ← 0
    ELSE
        IF ((Instruction = VERR) AND (segment = readable))
            OR ((Instruction = VERW) AND (segment = writable))
            THEN
                ZF ← 1;
        FI;
FI;

Flags Affected
The ZF flag is set to 1 if the segment is accessible and readable (VERR) or writable (VERW); otherwise, it is cleared to 0.

Protected Mode Exceptions
The only exceptions generated for these instructions are those related to illegal addressing of the source operand.
#GP(0)            If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                  If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real-Address Mode Exceptions #UD The VERR and VERW instructions are not recognized in real-address mode.
Virtual-8086 Mode Exceptions #UD The VERR and VERW instructions are not recognized in virtual-8086 mode.

WAIT/FWAIT - Wait

Opcode    Instruction    Description
9B        WAIT           Check pending unmasked floating-point exceptions.
9B        FWAIT          Check pending unmasked floating-point exceptions.

Description
These instructions cause the processor to check for and handle pending, unmasked, floating-point exceptions before proceeding. (FWAIT is an alternate mnemonic for WAIT.) This instruction is useful for synchronizing exceptions in critical sections of code. Coding a WAIT instruction after a floating-point instruction ensures that any unmasked floating-point exceptions the instruction may raise are handled before the processor can modify the instruction's results. Refer to Section 7.9., "Floating-Point Exception Synchronization" in Chapter 7, "Floating-Point Unit" of the Intel Architecture Software Developer's Manual, Volume 1, for more information on using the WAIT/FWAIT instruction.

Operation
CheckForPendingUnmaskedFloatingPointExceptions;
FPU Flags Affected
The C0, C1, C2, and C3 flags are undefined.

Floating-Point Exceptions
None.

Protected Mode Exceptions
#NM    If MP and TS in CR0 are set.

Real-Address Mode Exceptions
#NM    If MP and TS in CR0 are set.

Virtual-8086 Mode Exceptions
#NM    If MP and TS in CR0 are set.

WBINVD - Write Back and Invalidate Cache

Opcode    Instruction    Description
0F 09     WBINVD         Write back and flush internal caches; initiate writing-back and flushing of external caches.

Description
This instruction writes back all modified cache lines in the processor's internal cache to main memory and invalidates (flushes) the internal caches. The instruction then issues a special-function bus cycle that directs external caches to also write back modified data and another bus cycle to indicate that the external caches should be invalidated.

After executing this instruction, the processor does not wait for the external caches to complete their write-back and flushing operations before proceeding with instruction execution. It is the responsibility of hardware to respond to the cache write-back and flush signals.

The WBINVD instruction is a privileged instruction. When the processor is running in protected mode, the CPL of a program or procedure must be 0 to execute this instruction. This instruction is also a serializing instruction. For more information, refer to Section 7.4., "Serializing Instructions" in Chapter 7, "Multiple-Processor Management" of the Intel Architecture Software Developer's Manual, Volume 3.

In situations where cache coherency with main memory is not a concern, software can use the INVD instruction.

Intel Architecture Compatibility
The WBINVD instruction is implementation dependent, and its function may be implemented differently on future Intel Architecture processors. The instruction is not supported on Intel Architecture processors earlier than the Intel486 processor.

Operation
WriteBack(InternalCaches);
Flush(InternalCaches);
SignalWriteBack(ExternalCaches);
SignalFlush(ExternalCaches);
Continue (* Continue execution *);

Real-Address Mode Exceptions
None.

Virtual-8086 Mode Exceptions
#GP(0)    The WBINVD instruction cannot be executed in virtual-8086 mode.

WRMSR - Write to Model Specific Register

Opcode    Instruction    Description
0F 30     WRMSR          Write the value in EDX:EAX to MSR specified by ECX

Description
This instruction writes the contents of registers EDX:EAX into the 64-bit model specific register (MSR) specified in the ECX register. The high-order 32 bits are copied from EDX and the low-order 32 bits are copied from EAX. Always set the undefined or reserved bits in an MSR to the values previously read.

This instruction must be executed at privilege level 0 or in real-address mode; otherwise, a general protection exception #GP(0) will be generated. Specifying a reserved or unimplemented MSR address in ECX will also cause a general protection exception.

When the WRMSR instruction is used to write to an MTRR, the TLBs are invalidated, including the global entries. For more information, refer to Section 3.7., "Translation Lookaside Buffers (TLBs)" in Chapter 3, "Protected-Mode Memory Management" of the Intel Architecture Software Developer's Manual, Volume 3. MTRRs are an implementation-specific feature of the Pentium Pro processor.

The MSRs control functions for testability, execution tracing, performance monitoring and machine check errors. Appendix B, "Model-Specific Registers", in the Intel Architecture Software Developer's Manual, Volume 3, lists all the MSRs that can be written to with this instruction and their addresses.

The WRMSR instruction is a serializing instruction. For more information, refer to Section 7.4., "Serializing Instructions" in Chapter 7, "Multiple-Processor Management" of the Intel Architecture Software Developer's Manual, Volume 3.

The CPUID instruction should be used to determine whether MSRs are supported (EDX[5]=1) before using this instruction.

Intel Architecture Compatibility
The MSRs and the ability to write them with the WRMSR instruction were introduced into the Intel Architecture with the Pentium processor. Execution of this instruction by an Intel Architecture processor earlier than the Pentium processor results in an invalid opcode exception #UD.

Operation
MSR[ECX] ← EDX:EAX;
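
Because the 64-bit MSR value travels in two 32-bit registers, system code usually needs a pair of small helpers to split and rejoin it. The following C sketch shows only that arithmetic; the type and function names are hypothetical, and no privileged instruction is executed.

```c
#include <stdint.h>

/* Hypothetical pair holding the register images WRMSR expects. */
struct edx_eax {
    uint32_t edx;  /* high-order 32 bits of the MSR value */
    uint32_t eax;  /* low-order 32 bits of the MSR value  */
};

/* Split a 64-bit MSR value into the EDX:EAX images. */
struct edx_eax split_msr_value(uint64_t value)
{
    struct edx_eax r;
    r.edx = (uint32_t)(value >> 32);
    r.eax = (uint32_t)(value & 0xFFFFFFFFu);
    return r;
}

/* Rebuild the 64-bit value, for example after reading the MSR back. */
uint64_t join_msr_value(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}
```

Preserving reserved bits, as the description requires, then amounts to reading the MSR, joining the halves, changing only the defined bits, and splitting the result again before the write.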

Protected Mode Exceptions
#GP(0)    If the current privilege level is not 0.
          If the value in ECX specifies a reserved or unimplemented MSR address.

Real-Address Mode Exceptions
#GP       If the value in ECX specifies a reserved or unimplemented MSR address.

Virtual-8086 Mode Exceptions
#GP(0)    The WRMSR instruction is not recognized in virtual-8086 mode.

XADD - Exchange and Add

Opcode      Instruction        Description
0F C0 /r    XADD r/m8,r8       Exchange r8 and r/m8; load sum into r/m8.
0F C1 /r    XADD r/m16,r16     Exchange r16 and r/m16; load sum into r/m16.
0F C1 /r    XADD r/m32,r32     Exchange r32 and r/m32; load sum into r/m32.

Description
This instruction exchanges the first operand (destination operand) with the second operand (source operand), then loads the sum of the two values into the destination operand. The destination operand can be a register or a memory location; the source operand is a register.

This instruction can be used with a LOCK prefix.

Intel Architecture Compatibility
Intel Architecture processors earlier than the Intel486 processor do not recognize this instruction. If this instruction is used, you should provide an equivalent code sequence that runs on earlier processors.

Operation
TEMP ← SRC + DEST
SRC ← DEST
DEST ← TEMP
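
The three-step operation is easy to misread, so here is a C model of the 32-bit form. The function name is hypothetical, and the model is not atomic; it only shows which value ends up where.

```c
#include <stdint.h>

/* Software model of XADD r/m32, r32 (no LOCK semantics are modeled). */
void xadd32_model(uint32_t *dest, uint32_t *src)
{
    uint32_t temp = *src + *dest;  /* TEMP <- SRC + DEST */
    *src = *dest;                  /* SRC  <- DEST (the old destination value) */
    *dest = temp;                  /* DEST <- TEMP (the sum) */
}
```

After the call, the source holds the destination's previous value and the destination holds the sum, which is why XADD with a LOCK prefix is commonly used as a fetch-and-add primitive.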
Flags Affected
The CF, PF, AF, SF, ZF, and OF flags are set according to the result of the addition, which is stored in the destination operand.

Protected Mode Exceptions
#GP(0)            If the destination is located in a nonwritable segment.
                  If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                  If the DS, ES, FS, or GS register contains a null segment selector.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

XCHG - Exchange Register/Memory with Register

Opcode    Instruction        Description
90+rw     XCHG AX,r16        Exchange r16 with AX
90+rw     XCHG r16,AX        Exchange AX with r16
90+rd     XCHG EAX,r32       Exchange r32 with EAX
90+rd     XCHG r32,EAX       Exchange EAX with r32
86 /r     XCHG r/m8,r8       Exchange r8 (byte register) with byte from r/m8
86 /r     XCHG r8,r/m8       Exchange byte from r/m8 with r8 (byte register)
87 /r     XCHG r/m16,r16     Exchange r16 with word from r/m16
87 /r     XCHG r16,r/m16     Exchange word from r/m16 with r16
87 /r     XCHG r/m32,r32     Exchange r32 with doubleword from r/m32
87 /r     XCHG r32,r/m32     Exchange doubleword from r/m32 with r32

Description
This instruction exchanges the contents of the destination (first) and source (second) operands. The operands can be two general-purpose registers or a register and a memory location. If a memory operand is referenced, the processor's locking protocol is automatically implemented for the duration of the exchange operation, regardless of the presence or absence of the LOCK prefix or of the value of the IOPL. Refer to the LOCK prefix description in this chapter for more information on the locking protocol.

This instruction is useful for implementing semaphores or similar data structures for process synchronization. Refer to Section 7.1.2., "Bus Locking" in Chapter 7, "Multiple-Processor Management" of the Intel Architecture Software Developer's Manual, Volume 3, for more information on bus locking.

The XCHG instruction can also be used instead of the BSWAP instruction for 16-bit operands.

Operation
TEMP ← DEST
DEST ← SRC
SRC ← TEMP
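
The semaphore use mentioned in the description can be sketched in portable C with the C11 `atomic_exchange` function, which x86 compilers typically implement with an XCHG against memory (that mapping is an assumption about the compiler, not something the C standard guarantees). The lock type and function names are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical lock built on an atomic exchange: 0 = free, 1 = held. */
typedef struct { atomic_int held; } xchg_lock;

/* Swap 1 into the lock word; the lock was acquired if the old value was 0. */
int xchg_try_lock(xchg_lock *lock)
{
    return atomic_exchange(&lock->held, 1) == 0;
}

/* Spin until the exchange observes the lock as free. */
void xchg_lock_acquire(xchg_lock *lock)
{
    while (!xchg_try_lock(lock)) {
        /* busy-wait */
    }
}

/* Mark the lock free again. */
void xchg_lock_release(xchg_lock *lock)
{
    atomic_store(&lock->held, 0);
}
```

The key property is that reading the old value and writing the new one happen as one indivisible step, so two contenders can never both see the lock as free.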

Protected Mode Exceptions
#GP(0)            If either operand is in a nonwritable segment.
                  If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                  If the DS, ES, FS, or GS register contains a null segment selector.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

XLAT/XLATB - Table Look-up Translation

Opcode    Instruction    Description
D7        XLAT m8        Set AL to memory byte DS:[(E)BX + unsigned AL]
D7        XLATB          Set AL to memory byte DS:[(E)BX + unsigned AL]

Description
This instruction locates a byte entry in a table in memory, using the contents of the AL register as a table index, then copies the contents of the table entry back into the AL register. The index in the AL register is treated as an unsigned integer. The XLAT and XLATB instructions get the base address of the table in memory from either the DS:EBX or the DS:BX registers (depending on the address-size attribute of the instruction, 32 or 16, respectively). The DS segment may be overridden with a segment override prefix.

At the assembly-code level, two forms of this instruction are allowed: the explicit-operand form and the no-operand form. The explicit-operand form (specified with the XLAT mnemonic) allows the base address of the table to be specified explicitly with a symbol. This explicit-operand form is provided to allow documentation; however, note that the documentation provided by this form can be misleading. That is, the symbol does not have to specify the correct base address. The base address is always specified by the DS:(E)BX registers, which must be loaded correctly before the XLAT instruction is executed.

The no-operand form (XLATB) provides a short form of the XLAT instruction. Here also the processor assumes that the DS:(E)BX registers contain the base address of the table.

Operation
IF AddressSize = 16
    THEN
        AL ← (DS:BX + ZeroExtend(AL))
    ELSE (* AddressSize = 32 *)
        AL ← (DS:EBX + ZeroExtend(AL));
FI;
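
In C terms, the instruction is a single indexed byte load. The sketch below models it and shows one classic use, converting a 4-bit value to an ASCII hexadecimal digit; both function names and the table are hypothetical examples.

```c
#include <stdint.h>

/* Software model of XLAT/XLATB: table plays the role of DS:(E)BX and al is
 * the index. AL is zero-extended, so it always selects one of entries 0-255. */
uint8_t xlat_model(const uint8_t *table, uint8_t al)
{
    return table[al];
}

/* Example use: translate the low nibble of a value to an ASCII hex digit. */
char nibble_to_hex(uint8_t nibble)
{
    static const uint8_t hex_table[] = "0123456789ABCDEF";
    return (char)xlat_model(hex_table, (uint8_t)(nibble & 0x0F));
}
```

Since the index is an unsigned byte, a full translation table needs 256 entries unless the caller masks the index first, as the example does.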
Flags Affected
None.

Protected Mode Exceptions
#GP(0)            If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                  If the DS, ES, FS, or GS register contains a null segment selector.
#SS(0)            If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)   If a page fault occurs.

XOR - Logical Exclusive OR

Opcode       Instruction         Description
34 ib        XOR AL,imm8         AL XOR imm8
35 iw        XOR AX,imm16        AX XOR imm16
35 id        XOR EAX,imm32       EAX XOR imm32
80 /6 ib     XOR r/m8,imm8       r/m8 XOR imm8
81 /6 iw     XOR r/m16,imm16     r/m16 XOR imm16
81 /6 id     XOR r/m32,imm32     r/m32 XOR imm32
83 /6 ib     XOR r/m16,imm8      r/m16 XOR imm8 (sign-extended)
83 /6 ib     XOR r/m32,imm8      r/m32 XOR imm8 (sign-extended)
30 /r        XOR r/m8,r8         r/m8 XOR r8
31 /r        XOR r/m16,r16       r/m16 XOR r16
31 /r        XOR r/m32,r32       r/m32 XOR r32
32 /r        XOR r8,r/m8         r8 XOR r/m8
33 /r        XOR r16,r/m16       r16 XOR r/m16
33 /r        XOR r32,r/m32       r32 XOR r/m32

Description
This instruction performs a bitwise exclusive OR (XOR) operation on the destination (first) and source (second) operands and stores the result in the destination operand location. The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location. (However, two memory operands cannot be used in one instruction.) Each bit of the result is 1 if the corresponding bits of the operands are different; each bit is 0 if the corresponding bits are the same.

Operation
DEST ← DEST XOR SRC;

XORPS - Bit-wise Logical XOR for Single-FP Data

Opcode      Instruction               Description
0F,57,/r    XORPS xmm1, xmm2/m128     XOR 128 bits from XMM2/Mem to XMM1 register.
Description The XORPS instruction returns a bit-wise logical XOR between XMM1 and XMM2/Mem.
Figure 3-101. Operation of the XORPS Instruction
Operation
DEST[127-0] = DEST[127-0] XOR SRC/m128[127-0]
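
Because XORPS operates on raw bits rather than numeric values, its typical uses are sign manipulation and clearing a register. The portable C sketch below models the operation on four-element arrays; the function name is hypothetical, and the sketch assumes `float` is a 32-bit IEEE 754 single-precision type.

```c
#include <stdint.h>
#include <string.h>

/* Software model of XORPS on four single-precision elements.
 * Assumes float is 32 bits wide (IEEE 754 single precision). */
void xorps_model(float dest[4], const float src[4])
{
    for (int i = 0; i < 4; i++) {
        uint32_t a, b;
        memcpy(&a, &dest[i], sizeof a);   /* reinterpret the bits, no conversion */
        memcpy(&b, &src[i], sizeof b);
        a ^= b;
        memcpy(&dest[i], &a, sizeof a);
    }
}
```

XORing with a mask of four -0.0f values (only the sign bit set in each element) negates all four elements, and XORing a value with itself yields all zero bits.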

__m128 _mm_xor_ps(__m128 a, __m128 b)
Computes bitwise EXOR (exclusive-or) of the four SP FP values of a and b.

Exceptions
General protection exception if not aligned on 16-byte boundary, regardless of segment.

Numeric Exceptions
None.

Protected Mode Exceptions
#GP(0)            For an illegal memory operand effective address in the CS, DS, ES, FS, or GS segments.
#SS(0)            For an illegal address in the SS segment.
#PF(fault-code)   For a page fault.
#UD               If CR0.EM = 1.
#NM               If TS bit in CR0 is set.
#UD               If CR4.OSFXSR (bit 9) = 0.
#UD               If CPUID.XMM (EDX bit 25) = 0.

Virtual 8086 Mode Exceptions
Same exceptions as in Real Address Mode.
#PF(fault-code)   For a page fault.

Comments
The usage of Repeat Prefix (F3H) with XORPS is reserved. Different processor implementations may handle this prefix differently. Usage of this prefix with XORPS risks incompatibility.
Opcode Map
APPENDIX A OPCODE MAP

The opcode tables in this chapter are provided to aid in interpreting Intel Architecture object code. The instructions are divided into three encoding groups: 1-byte opcode encodings, 2-byte opcode encodings, and escape (floating-point) encodings.
The 1- and 2-byte opcode encodings are used to encode integer, system, MMX technology, and Streaming SIMD Extensions instructions. The opcode maps for these instructions are given in Table A-2 through A-6. Section A.2.1., One-Byte Opcode Instructions through Section A.2.5., Opcode Extensions For One- And Two-byte Opcodes give instructions for interpreting 1- and 2-byte opcode maps.
The escape encodings are used to encode floating-point instructions. The opcode maps for these instructions are given in Table A-7 through A-22. Section A.2.6., Escape Opcode Instructions gives instructions for interpreting the escape opcode maps.
The opcode tables in this section aid in interpreting Pentium processor object code. Use the four high-order bits of the opcode as an index to a row of the opcode table; use the four low-order bits as an index to a column of the table. If the opcode is 0FH, refer to the 2-byte opcode table and use the second byte of the opcode to index the rows and columns of that table.
The escape (ESC) opcode tables for floating-point instructions identify the eight high-order bits of the opcode at the top of each page. If the accompanying ModR/M byte is in the range 00H through BFH, bits 3 through 5 identified along the top row of the third table on each page, along with the REG bits of the ModR/M, determine the opcode. ModR/M bytes outside the range 00H through BFH are mapped by the bottom two tables on each page.
Refer to Chapter 2, Instruction Format for detailed information on the ModR/M byte, register values, and the various addressing forms.
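The row and column indexing rule can be sketched as follows. This is an illustrative helper only: the name `locate_in_opcode_map` is hypothetical, and the sketch assumes the byte sequence starts at the opcode (instruction prefixes are not handled).

```python
def locate_in_opcode_map(code: bytes):
    """Return (table, row, column) for the byte that indexes the opcode map."""
    if not code:
        raise ValueError("empty instruction")
    if code[0] == 0x0F:
        # 0FH selects the 2-byte opcode table; the second byte supplies row and column.
        if len(code) < 2:
            raise ValueError("0FH must be followed by a second opcode byte")
        table, index_byte = "two-byte", code[1]
    else:
        table, index_byte = "one-byte", code[0]
    # Four high-order bits select the row, four low-order bits select the column.
    return table, index_byte >> 4, index_byte & 0x0F
```

For an escape opcode (first byte D8H through DFH) this returns the one-byte map cell in row D, which is the ESC entry; the floating-point tables are then selected by the ModR/M byte as described in Section A.2.6.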
A.1. KEY TO ABBREVIATIONS

Operands are identified by a two-character code of the form Zz. The first character, an uppercase letter, specifies the addressing method; the second character, a lowercase letter, specifies the type of operand.
A.1.1. Codes for Addressing Method
The following abbreviations are used for addressing methods:
A    Direct address. The instruction has no ModR/M byte; the address of the operand is encoded in the instruction; and no base register, index register, or scaling factor can be applied (for example, far JMP (EA)).
C    The reg field of the ModR/M byte selects a control register (for example, MOV (0F20, 0F22)).
D    The reg field of the ModR/M byte selects a debug register (for example, MOV (0F21,0F23)).
E    A ModR/M byte follows the opcode and specifies the operand. The operand is either a general-purpose register or a memory address. If it is a memory address, the address is computed from a segment register and any of the following values: a base register, an index register, a scaling factor, a displacement.
F    EFLAGS Register.
G    The reg field of the ModR/M byte selects a general register (for example, AX (000)).
I    Immediate data. The operand value is encoded in subsequent bytes of the instruction.
J    The instruction contains a relative offset to be added to the instruction pointer register (for example, JMP (0E9), LOOP).
M    The ModR/M byte may refer only to memory (for example, BOUND, LES, LDS, LSS, LFS, LGS, CMPXCHG8B).
O    The instruction has no ModR/M byte; the offset of the operand is coded as a word or double word (depending on address size attribute) in the instruction. No base register, index register, or scaling factor can be applied (for example, MOV (A0-A3)).
P    The reg field of the ModR/M byte selects a packed quadword MMX technology register.
Q    A ModR/M byte follows the opcode and specifies the operand. The operand is either an MMX technology register or a memory address. If it is a memory address, the address is computed from a segment register and any of the following values: a base register, an index register, a scaling factor, and a displacement.
R    The mod field of the ModR/M byte may refer only to a general register (for example, MOV (0F20-0F24, 0F26)).
S    The reg field of the ModR/M byte selects a segment register (for example, MOV (8C,8E)).
T    The reg field of the ModR/M byte selects a test register (for example, MOV (0F24,0F26)).
V    The reg field of the ModR/M byte selects a packed SIMD floating-point register.
W    A ModR/M byte follows the opcode and specifies the operand. The operand is either a SIMD floating-point register or a memory address. If it is a memory address, the address is computed from a segment register and any of the following values: a base register, an index register, a scaling factor, and a displacement.
X    Memory addressed by the DS:SI register pair (for example, MOVS, CMPS, OUTS, or LODS).
Y    Memory addressed by the ES:DI register pair (for example, MOVS, CMPS, INS, STOS, or SCAS).
A.1.2. Codes for Operand Type
The following abbreviations are used for operand types:
a     Two one-word operands in memory or two double-word operands in memory, depending on operand-size attribute (used only by the BOUND instruction).
b     Byte, regardless of operand-size attribute.
c     Byte or word, depending on operand-size attribute.
d     Doubleword, regardless of operand-size attribute.
dq    Double-quadword, regardless of operand-size attribute.
p     32-bit or 48-bit pointer, depending on operand-size attribute.
pi    Quadword MMX technology register (e.g., mm0).
ps    128-bit packed FP single-precision data.
q     Quadword, regardless of operand-size attribute.
s     6-byte pseudo-descriptor.
ss    Scalar element of a 128-bit packed FP single-precision data.
si    Doubleword integer register (e.g., eax).
v     Word or doubleword, depending on operand-size attribute.
w     Word, regardless of operand-size attribute.
A.1.3. Register Codes
When an operand is a specific register encoded in the opcode, the register is identified by its name (for example, AX, CL, or ESI). The name of the register indicates whether the register is 32, 16, or 8 bits wide. A register identifier of the form eXX is used when the width of the register depends on the operand-size attribute. For example, eAX indicates that the AX register is used when the operand-size attribute is 16, and the EAX register is used when the operand-size attribute is 32.
A.2. OPCODE LOOK-UP EXAMPLES

This section provides several examples to demonstrate how the following opcode maps are used. Refer to the introduction to Chapter 3, Instruction Set Reference, in the Intel Architecture Software Developer's Manual, Volume 2 for detailed information on the ModR/M byte, register values, and the various addressing forms.
A.2.1. One-Byte Opcode Instructions
The opcode maps for 1-byte opcodes are shown in Table A-2 and A-3. Looking at the 1-byte opcode maps, the instruction and its operands can be determined from the hexadecimal opcode. For example:
Opcode: 030500000000H
Bytes in memory, from LSB address to MSB address: 03 05 00 00 00 00
Opcode 030500000000H for an ADD instruction can be interpreted from the 1-byte opcode map as follows. The first digit (0) of the opcode indicates the row, and the second digit (3) indicates the column in the opcode map tables. The first operand (type Gv) indicates a general register that is a word or doubleword depending on the operand-size attribute. The second operand (type Ev) indicates that a ModR/M byte follows that specifies whether the operand is a word or doubleword general-purpose register or a memory address. The ModR/M byte for this instruction is 05H, which indicates that a 32-bit displacement follows (00000000H). The reg/opcode portion of the ModR/M byte (bits 3 through 5) is 000, indicating the EAX register. Thus, it can be determined that the instruction for this opcode is ADD EAX, mem_op, and the offset of mem_op is 00000000H. Some 1- and 2-byte opcodes point to group numbers. These group numbers indicate that the instruction uses the reg/opcode bits in the ModR/M byte as an opcode extension (refer to Section A.2.5., Opcode Extensions For One- And Two-byte Opcodes).
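The ADD walk-through above can be reproduced with a short sketch. It is deliberately narrow: it assumes 32-bit operand and address sizes, handles only the ModR/M form used in the example (mod = 00, R/M = 101, meaning a 32-bit displacement follows), and the function names are hypothetical.

```python
REG32 = ("EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI")

def split_modrm(byte):
    """Split a ModR/M byte into (mod, reg/opcode, r/m)."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111

def decode_add_gv_ev(code: bytes):
    """Decode opcode 03 (ADD Gv, Ev) for the disp32-only memory form."""
    if code[0] != 0x03:
        raise ValueError("not opcode 03H")
    mod, reg, rm = split_modrm(code[1])
    if (mod, rm) != (0b00, 0b101):
        raise NotImplementedError("only mod=00, r/m=101 (disp32) is sketched here")
    # The 32-bit displacement is stored least-significant byte first.
    disp = int.from_bytes(code[2:6], "little")
    return f"ADD {REG32[reg]}, [{disp:08X}H]"
```

Calling `decode_add_gv_ev(bytes.fromhex("030500000000"))` walks the same steps as the text: ModR/M 05H splits into mod 00, reg 000 (EAX), and R/M 101, and the four bytes that follow form the offset of the memory operand.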
A.2.2. Two-Byte Opcode Instructions
Instructions that begin with 0FH can be found in the two-byte opcode maps given in Table A-4 and A-5. The second opcode byte is used to reference a particular row and column in the tables. For example, the opcode 0FA4050000000003H is located on the two-byte opcode map in row A, column 4. This opcode indicates a SHLD instruction with the operands Ev, Gv, and Ib. These operands are defined as follows:
Ev    The ModR/M byte follows the opcode to specify a word or doubleword operand.
Gv    The reg field of the ModR/M byte selects a general-purpose register.
Ib    Immediate data is encoded in the subsequent byte of the instruction.
The third byte is the ModR/M byte (05H). Its mod and R/M fields (00B and 101B) indicate that the destination is a memory operand addressed by a 32-bit displacement that follows; its reg field (000B) selects the EAX register, which is the source. The next part of the opcode is the 32-bit displacement for the destination memory operand (00000000H), and finally the immediate byte representing the count of the shift (03H). By this breakdown, it has been shown that this opcode represents the instruction:
SHLD DS:00000000H, EAX, 3
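The same breakdown can be followed byte by byte in code. This is a sketch under stated assumptions: 32-bit operand and address sizes, the default DS segment, and only the mod = 00, R/M = 101 (disp32) form of the destination. The name `decode_shld_ib` is hypothetical.

```python
REG32 = ("EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI")

def decode_shld_ib(code: bytes):
    """Decode 0F A4 (SHLD Ev, Gv, Ib) for a disp32-only memory destination."""
    if code[:2] != b"\x0f\xa4":
        raise ValueError("not the two-byte opcode 0F A4")
    modrm = code[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if (mod, rm) != (0b00, 0b101):
        raise NotImplementedError("only mod=00, r/m=101 (disp32) is sketched here")
    disp = int.from_bytes(code[3:7], "little")  # destination offset, LSB first
    count = code[7]                             # immediate shift count (Ib)
    return f"SHLD DS:{disp:08X}H, {REG32[reg]}, {count}"
```

Each field is consumed in the order the text describes: two opcode bytes, the ModR/M byte, the displacement, and the trailing immediate byte.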
Lower case is used in the following tables to highlight the mnemonics added by MMX technology and Streaming SIMD Extensions.
A.2.3. Opcode Map Shading
For the One-byte Opcode Maps (Table A-2 through A-3), grey shading indicates instruction groupings.
A.2.4. Opcode Map Notes
Table A-1 contains notes on particular encodings. These notes are indicated in the following Opcode Maps (Table A-2 through A-6) by superscripts.
Table A-1. Notes on Instruction Set Encoding Tables

Symbol    Note
1A        Bits 5, 4, and 3 of ModR/M byte used as an opcode extension (refer to Section A.2.5., Opcode Extensions For One- And Two-byte Opcodes).
1B        These abbreviations are not actual mnemonics. When shifting by immediate shift counts, the PSHIMD mnemonic represents the PSLLD, PSRAD, and PSRLD instructions, PSHIMW represents the PSLLW, PSRAW, and PSRLW instructions, and PSHIMQ represents the PSLLQ and PSRLQ instructions. The instructions that shift by immediate counts are differentiated by ModR/M bytes (refer to Section A.2.5., Opcode Extensions For One- And Two-byte Opcodes).
1C        Use the 0F0B opcode (UD2 instruction) or the 0FB9H opcode when deliberately trying to generate an invalid opcode exception (#UD).
1D        Some instructions added in the Pentium III processor may use the same two-byte opcode. If the instruction has variations, or the opcode represents different instructions, the ModR/M byte will be used to differentiate the instruction. For the value of the ModR/M byte needed to completely decode the instruction, see Table A-6. (These instructions include SFENCE, STMXCSR, LDMXCSR, FXRSTOR, and FXSAVE, as well as PREFETCH and its variations.)
Table A-2. One-byte Opcode Map (Left)

Each row lists the entries for columns 0 through 7, separated by "|". Superscript notes from Table A-1 are shown in parentheses.
Row 0: ADD Eb,Gb | ADD Ev,Gv | ADD Gb,Eb | ADD Gv,Ev | ADD AL,Ib | ADD eAX,Iv | PUSH ES | POP ES
Row 1: ADC Eb,Gb | ADC Ev,Gv | ADC Gb,Eb | ADC Gv,Ev | ADC AL,Ib | ADC eAX,Iv | PUSH SS | POP SS
Row 2: AND Eb,Gb | AND Ev,Gv | AND Gb,Eb | AND Gv,Ev | AND AL,Ib | AND eAX,Iv | SEG=ES | DAA
Row 3: XOR Eb,Gb | XOR Ev,Gv | XOR Gb,Eb | XOR Gv,Ev | XOR AL,Ib | XOR eAX,Iv | SEG=SS | AAA
Row 4: INC general register: eAX | eCX | eDX | eBX | eSP | eBP | eSI | eDI
Row 5: PUSH general register: eAX | eCX | eDX | eBX | eSP | eBP | eSI | eDI
Row 6: PUSHA/PUSHAD | POPA/POPAD | BOUND Gv,Ma | ARPL Ew,Gw | SEG=FS | SEG=GS | Opd Size | Addr Size
Row 7: Jcc, Jb - Short-displacement jump on condition: O | NO | B/NAE/C | NB/AE/NC | Z/E | NZ/NE | BE/NA | NBE/A
Row 8: Immediate Grp 1 (1A): Eb,Ib | Ev,Iv | Ev,Ib | Ev,Ib | TEST Eb,Gb | TEST Ev,Gv | XCHG Eb,Gb | XCHG Ev,Gv
Row 9: NOP | XCHG word or double-word register with eAX: eCX | eDX | eBX | eSP | eBP | eSI | eDI
Row A: MOV AL,Ob | MOV eAX,Ov | MOV Ob,AL | MOV Ov,eAX | MOVS/MOVSB Xb,Yb | MOVS/MOVSW/MOVSD Xv,Yv | CMPS/CMPSB Xb,Yb | CMPS/CMPSW/CMPSD Xv,Yv
Row B: MOV immediate byte into byte register: AL | CL | DL | BL | AH | CH | DH | BH
Row C: Shift Grp 2 (1A): Eb,Ib | Ev,Ib | RETN Iw | RETN | LES Gv,Mp | LDS Gv,Mp | Grp 11 (1A) - MOV: Eb,Ib | Ev,Iv
Row D: Shift Grp 2 (1A): Eb,1 | Ev,1 | Eb,CL | Ev,CL | AAM Ib | AAD Ib | (blank) | XLAT/XLATB
Row E: LOOPNE/LOOPNZ Jb | LOOPE/LOOPZ Jb | LOOP Jb | JCXZ/JECXZ Jb | IN AL,Ib | IN eAX,Ib | OUT Ib,AL | OUT Ib,eAX
Row F: LOCK | (blank) | REPNE | REP/REPE | HLT | CMC | Unary Grp 3 (1A): Eb | Ev
Table A-3. One-byte Opcode Map (Right)

Each row lists the entries for columns 8 through F, separated by "|". Superscript notes from Table A-1 are shown in parentheses.
Row 0: OR Eb,Gb | OR Ev,Gv | OR Gb,Eb | OR Gv,Ev | OR AL,Ib | OR eAX,Iv | PUSH CS | 2-byte escape
Row 1: SBB Eb,Gb | SBB Ev,Gv | SBB Gb,Eb | SBB Gv,Ev | SBB AL,Ib | SBB eAX,Iv | PUSH DS | POP DS
Row 2: SUB Eb,Gb | SUB Ev,Gv | SUB Gb,Eb | SUB Gv,Ev | SUB AL,Ib | SUB eAX,Iv | SEG=CS | DAS
Row 3: CMP Eb,Gb | CMP Ev,Gv | CMP Gb,Eb | CMP Gv,Ev | CMP AL,Ib | CMP eAX,Iv | SEG=DS | AAS
Row 4: DEC general register: eAX | eCX | eDX | eBX | eSP | eBP | eSI | eDI
Row 5: POP into general register: eAX | eCX | eDX | eBX | eSP | eBP | eSI | eDI
Row 6: PUSH Iv | IMUL Gv,Ev,Iv | PUSH Ib | IMUL Gv,Ev,Ib | INS/INSB Yb,DX | INS/INSW/INSD Yv,DX | OUTS/OUTSB DX,Xb | OUTS/OUTSW/OUTSD DX,Xv
Row 7: Jcc, Jb - Short-displacement jump on condition: S | NS | P/PE | NP/PO | L/NGE | NL/GE | LE/NG | NLE/G
Row 8: MOV Eb,Gb | MOV Ev,Gv | MOV Gb,Eb | MOV Gv,Ev | MOV Ew,Sw | LEA Gv,M | MOV Sw,Ew | POP Ev
Row 9: CBW/CWDE | CWD/CDQ | CALLF Ap | FWAIT/WAIT | PUSHF/PUSHFD Fv | POPF/POPFD Fv | SAHF | LAHF
Row A: TEST AL,Ib | TEST eAX,Iv | STOS/STOSB Yb,AL | STOS/STOSW/STOSD Yv,eAX | LODS/LODSB AL,Xb | LODS/LODSW/LODSD eAX,Xv | SCAS/SCASB AL,Yb | SCAS/SCASW/SCASD eAX,Yv
Row B: MOV immediate word or double into word or double register: eAX | eCX | eDX | eBX | eSP | eBP | eSI | eDI
Row C: ENTER Iw,Ib | LEAVE | RETF Iw | RETF | INT 3 | INT Ib | INTO | IRET
Row D: ESC (Escape to coprocessor instruction set), all eight columns
Row E: CALL Jv | JMP near Jv | JMP far Ap | JMP short Jb | IN AL,DX | IN eAX,DX | OUT DX,AL | OUT DX,eAX
Row F: CLC | STC | CLI | STI | CLD | STD | INC/DEC Grp 4 (1A) | INC/DEC Grp 5 (1A)
GENERAL NOTE: All blanks in the opcode maps A-2 and A-3 are reserved and should not be used. Do not depend on the operation of these undefined opcodes.
Table A-4. Two-byte Opcode Map (Left) (First Byte is 0FH)

Each row lists the entries for columns 0 through 7, separated by "|". Where two instructions share a cell they are separated by "/"; (F3) marks the form selected by the F3H prefix. Superscript notes from Table A-1 are shown in parentheses.
Row 0: Grp 6 (1A) | Grp 7 (1A) | LAR Gv,Ew | LSL Gv,Ew | (blank) | (blank) | CLTS | (blank)
Row 1: movups Vps,Wps / movss (F3) Vss,Wss | movups Wps,Vps / movss (F3) Wss,Vss | movlps Vq,Wq / movhlps Vq,Vq | movlps Wq,Vq | unpcklps Vps,Wq | unpckhps Vps,Wq | movhps Vq,Wq / movlhps Vq,Vq | movhps Wq,Vq
Row 2: MOV Rd,Cd | MOV Rd,Dd | MOV Cd,Rd | MOV Dd,Rd | (blank) | (blank) | (blank) | (blank)
Row 3: WRMSR | RDTSC | RDMSR | RDPMC | SYSENTER | SYSEXIT | (blank) | (blank)
Row 4: CMOVcc, (Gv, Ev) - Conditional Move: O | NO | B/C/NAE | AE/NB/NC | E/Z | NE/NZ | BE/NA | A/NBE
Row 5: movmskps Ed,Vps | sqrtps Vps,Wps / sqrtss (F3) Vss,Wss | rsqrtps Vps,Wps / rsqrtss (F3) Vss,Wss | rcpps Vps,Wps / rcpss (F3) Vss,Wss | andps Vps,Wps | andnps Vps,Wps | orps Vps,Wps | xorps Vps,Wps
Row 6: punpcklbw Pq,Qd | punpcklwd Pq,Qd | punpckldq Pq,Qd | packsswb Pq,Qq | pcmpgtb Pq,Qq | pcmpgtw Pq,Qq | pcmpgtd Pq,Qq | packuswb Pq,Qq
Row 7: pshufw Pq,Qq,Ib | pshimw (1B) Pq,Qq (Grp 12 (1A)) | pshimd (1B) Pq,Qq (Grp 13 (1A)) | pshimq (1B) Pq,Qq (Grp 14 (1A)) | pcmpeqb Pq,Qq | pcmpeqw Pq,Qq | pcmpeqd Pq,Qq | emms
Row 8: Jcc, Jv - Long-displacement jump on condition: O | NO | B/C/NAE | AE/NB/NC | E/Z | NE/NZ | BE/NA | A/NBE
Row 9: SETcc, Eb - Byte Set on condition: O | NO | B/C/NAE | AE/NB/NC | E/Z | NE/NZ | BE/NA | A/NBE
Row A: PUSH FS | POP FS | CPUID | BT Ev,Gv | SHLD Ev,Gv,Ib | SHLD Ev,Gv,CL | (blank) | (blank)
Row B: CMPXCHG Eb,Gb | CMPXCHG Ev,Gv | LSS Mp | BTR Ev,Gv | LFS Mp | LGS Mp | MOVZX Gv,Eb | MOVZX Gv,Ew
Row C: XADD Eb,Gb | XADD Ev,Gv | cmpps Vps,Wps,Ib / cmpss (F3) Vss,Wss,Ib | (blank) | pinsrw Pq,Ed,Ib | pextrw Gd,Pq,Ib | shufps Vps,Wps,Ib | Grp 9 (1A)
Row D: (blank) | psrlw Pq,Qq (Grp 12 (1A)) | psrld Pq,Qq (Grp 13 (1A)) | psrlq Pq,Qq (Grp 14 (1A)) | (blank) | pmullw Pq,Qq | (blank) | pmovmskb Gd,Pq
Row E: pavgb Pq,Qq | psraw Pq,Qq (Grp 12 (1A)) | psrad Pq,Qq (Grp 13 (1A)) | pavgw Pq,Qq | pmulhuw Pq,Qq | pmulhw Pq,Qq | (blank) | movntq Wq,Vq
Row F: (blank) | psllw Pq,Qq (Grp 12 (1A)) | pslld Pq,Qq (Grp 13 (1A)) | psllq Pq,Qq (Grp 14 (1A)) | (blank) | pmaddwd Pq,Qq | psadbw Pq,Qq | maskmovq Ppi,Qpi
GENERAL NOTE: All blanks in the opcode maps A-4 and A-5 are reserved and should not be used. Do not depend on the operation of these undefined opcodes.
Table A-5. Two-byte Opcode Map (Right) (First Byte is 0FH)

Each row lists the entries for columns 8 through F, separated by "|". Where two instructions share a cell they are separated by "/"; (F3) marks the form selected by the F3H prefix. Superscript notes from Table A-1 are shown in parentheses.
Row 0: INVD | WBINVD | (blank) | 2-byte Illegal Opcodes UD2 (1C) | (blank) | (blank) | (blank) | (blank)
Row 1: Prefetch (Grp 16 (1A)) (1D) | (blank) | (blank) | (blank) | (blank) | (blank) | (blank) | (blank)
Row 2: movaps Vps,Wps | movaps Wps,Vps | cvtpi2ps Vps,Qq / cvtsi2ss (F3) Vss,Ed | movntps Wps,Vps | cvttps2pi Qq,Wps / cvttss2si (F3) Gd,Wss | cvtps2pi Qq,Wps / cvtss2si (F3) Gd,Wss | ucomiss Vss,Wss | comiss Vps,Wps
Row 3: (blank), all eight columns
Row 4: CMOVcc(Gv, Ev) - Conditional Move: S | NS | P/PE | NP/PO | L/NGE | NL/GE | LE/NG | NLE/G
Row 5: addps Vps,Wps / addss (F3) Vss,Wss | mulps Vps,Wps / mulss (F3) Vss,Wss | (blank) | (blank) | subps Vps,Wps / subss (F3) Vss,Wss | minps Vps,Wps / minss (F3) Vss,Wss | divps Vps,Wps / divss (F3) Vss,Wss | maxps Vps,Wps / maxss (F3) Vss,Wss
Row 6: punpckhbw Pq,Qd | punpckhwd Pq,Qd | punpckhdq Pq,Qd | packssdw Pq,Qd | (blank) | (blank) | movd Pd,Ed | movq Pq,Qq
Row 7: MMX UD (columns 8 through D) | movd Ed,Pd | movq Qq,Pq
Row 8: Jcc, Jv - Long-displacement jump on condition: S | NS | P/PE | NP/PO | L/NGE | NL/GE | LE/NG | NLE/G
Row 9: SETcc, Eb - Byte Set on condition: S | NS | P/PE | NP/PO | L/NGE | NL/GE | LE/NG | NLE/G
Row A: PUSH GS | POP GS | RSM | BTS Ev,Gv | SHRD Ev,Gv,Ib | SHRD Ev,Gv,CL | (Grp 15 (1A)) (1D) | IMUL Gv,Ev
Row B: (blank) | Grp 10 (1A) Invalid Opcode (1C) | Grp 8 (1A) Ev,Ib | BTC Ev,Gv | BSF Gv,Ev | BSR Gv,Ev | MOVSX Gv,Eb | MOVSX Gv,Ew
Row C: BSWAP: EAX | ECX | EDX | EBX | ESP | EBP | ESI | EDI
Row D: psubusb Pq,Qq | psubusw Pq,Qq | pminub Pq,Qq | pand Pq,Qq | paddusb Pq,Qq | paddusw Pq,Qq | pmaxub Pq,Qq | pandn Pq,Qq
Row E: psubsb Pq,Qq | psubsw Pq,Qq | pminsw Pq,Qq | por Pq,Qq | paddsb Pq,Qq | paddsw Pq,Qq | pmaxsw Pq,Qq | pxor Pq,Qq
Row F: psubb Pq,Qq | psubw Pq,Qq | psubd Pq,Qq | (blank) | paddb Pq,Qq | paddw Pq,Qq | paddd Pq,Qq | (blank)
A.2.5. Opcode Extensions For One- And Two-byte Opcodes
Some of the 1-byte and 2-byte opcodes use bits 5, 4, and 3 of the ModR/M byte (the nnn field in Figure A-1) as an extension of the opcode. Those opcodes that have opcode extensions are indicated in the opcode maps with group numbers (Group 1, Group 2, etc.). The group numbers (ranging from 1 to 16) provide an entry point into Table A-6 where the encoding of the opcode extension field can be found. For example, the ADD instruction with a 1-byte opcode of 80H is a Group 1 instruction. Table A-6 indicates that the opcode extension that must be encoded in the ModR/M byte for this instruction is 000B.
mod nnn R/M
Figure A-1. ModR/M Byte nnn Field (Bits 5, 4, and 3)
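As a small illustration of the opcode-extension lookup, the sketch below resolves a Group 1 opcode (80H through 83H) to its mnemonic from the nnn field, using the Group 1 row of Table A-6. The names `GROUP1` and `group1_mnemonic` are hypothetical; other groups would need their own rows.

```python
# Group 1 row of Table A-6, indexed by the nnn field (bits 5, 4, 3 of ModR/M).
GROUP1 = ("ADD", "OR", "ADC", "SBB", "AND", "SUB", "XOR", "CMP")

def group1_mnemonic(opcode, modrm):
    """Return the instruction selected by the nnn field for opcodes 80H-83H."""
    if not 0x80 <= opcode <= 0x83:
        raise ValueError("not a Group 1 opcode")
    nnn = (modrm >> 3) & 0b111
    return GROUP1[nnn]
```

For opcode 80H with a ModR/M byte whose nnn field is 000B the result is ADD, matching the example in the text; an nnn field of 110B selects XOR, which is the /6 extension shown in the XOR instruction's opcode column.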
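Table A-6. Opcode Extensions for One- and Two-byte Opcodes by Group Number

Each entry gives the opcode, the group number, the Mod field (bits 7,6: mem or 11B), and the instruction selected by each encoding of bits 5,4,3 of the ModR/M byte. Encodings not listed are blank.
Group 1, opcode 80-83, Mod mem/11: 000 ADD | 001 OR | 010 ADC | 011 SBB | 100 AND | 101 SUB | 110 XOR | 111 CMP
Group 2, opcode C0, C1 (reg, imm); D0, D1 (reg, 1); D2, D3 (reg, CL), Mod mem/11: 000 ROL | 001 ROR | 010 RCL | 011 RCR | 100 SHL/SAL | 101 SHR | 111 SAR
Group 3, opcode F6, F7, Mod mem/11: 000 TEST Ib/Iv | 010 NOT | 011 NEG | 100 MUL AL/eAX | 101 IMUL AL/eAX | 110 DIV AL/eAX | 111 IDIV AL/eAX
Group 4, opcode FE, Mod mem/11: 000 INC Eb | 001 DEC Eb
Group 5, opcode FF, Mod mem/11: 000 INC Ev | 001 DEC Ev | 010 CALLN Ev | 011 CALLF Ep | 100 JMPN Ev | 101 JMPF Ep | 110 PUSH Ev
Group 6, opcode 0F 00, Mod mem/11: 000 SLDT Ew | 001 STR Ew | 010 LLDT Ew | 011 LTR Ew | 100 VERR Ew | 101 VERW Ew
Group 7, opcode 0F 01, Mod mem/11: 000 SGDT Ms | 001 SIDT Ms | 010 LGDT Ms | 011 LIDT Ms | 100 SMSW Ew | 110 LMSW Ew | 111 INVLPG Mb
Group 8, opcode 0F BA, Mod mem/11: 100 BT | 101 BTS | 110 BTR | 111 BTC
Group 9, opcode 0F C7, Mod mem: 001 CMPXCH8B Mq. Mod 11: all blank
Group 10, opcode 0F B9, Mod mem and 11: all blank
Group 11, opcode C6, Mod mem/11: 000 MOV Eb, Ib
Group 11, opcode C7, Mod mem/11: 000 MOV Ev, Iv
Group 12, opcode 0F 71, Mod mem: all blank. Mod 11: 010 psrlw Pq, Ib | 100 psraw Pq, Ib | 110 psllw Pq, Ib
Group 13, opcode 0F 72, Mod mem: all blank. Mod 11: 010 psrld Pq, Ib | 100 psrad Pq, Ib | 110 pslld Pq, Ib
Group 14, opcode 0F 73, Mod mem: all blank. Mod 11: 010 psrlq Pq, Ib | 110 psllq Pq, Ib
Group 15, opcode 0F AE, Mod mem: 000 fxsave | 001 fxrstor | 010 ldmxcsr | 011 stmxcsr. Mod 11: 111 sfence
Group 16, opcode 0F 18, Mod mem: 000 prefetch NTA | 001 prefetch T0 | 010 prefetch T1 | 011 prefetch T2. Mod 11: all blank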
GENERAL NOTE: All blanks in the opcode map are reserved and should not be used. Do not depend on the operation of these undefined opcodes.
A.2.6. Escape Opcode Instructions
The opcode maps for the escape instruction opcodes (floating-point instruction opcodes) are given in Table A-7 through A-22. These opcode maps are grouped by the first byte of the opcode from D8 through DF. Each of these opcodes has a ModR/M byte. If the ModR/M byte is within the range of 00H through BFH, bits 5, 4, and 3 of the ModR/M byte are used as an opcode extension, similar to the technique used for 1- and 2-byte opcodes (refer to Section A.2.5., Opcode Extensions For One- And Two-byte Opcodes). If the ModR/M byte is outside the range of 00H through BFH, the entire ModR/M byte is used as an opcode extension.
A.2.6.1. OPCODES WITH MODR/M BYTES IN THE 00H THROUGH BFH RANGE
The opcode DD0504000000H can be interpreted as follows. The instruction encoded with this opcode can be located in Section A.2.6.8., Escape Opcodes with DD as First Byte. Since the ModR/M byte (05H) is within the 00H through BFH range, bits 3 through 5 (000) of this byte indicate the opcode to be for an FLD double-real instruction (refer to Table A-17). The double-real value to be loaded is at 00000004H, which is the 32-bit displacement that follows and belongs to this opcode.
A.2.6.2. OPCODES WITH MODR/M BYTES OUTSIDE THE 00H THROUGH BFH RANGE
The opcode D8C1H illustrates an opcode with a ModR/M byte outside the range of 00H through BFH. The instruction encoded here can be located in Section A.2.6.3., Escape Opcodes with D8 as First Byte. In Table A-8, the ModR/M byte C1H indicates row C, column 1, which is an FADD instruction using ST(0), ST(1) as the operands.
A.2.6.3. ESCAPE OPCODES WITH D8 AS FIRST BYTE
Table A-7 and A-8 contain the opcode maps for the escape instruction opcodes that begin with D8H. Table A-7 shows the opcode map if the accompanying ModR/M byte is within the range of 00H through BFH. Here, the value of bits 5, 4, and 3 (the nnn field in Figure A-1) selects the instruction.
Table A-7. D8 Opcode Map When ModR/M Byte is Within 00H to BFH¹
nnn Field of ModR/M Byte (refer to Figure A-1)
000    FADD single-real
001    FMUL single-real
010    FCOM single-real
011    FCOMP single-real
100    FSUB single-real
101    FSUBR single-real
110    FDIV single-real
111    FDIVR single-real
Table A-8 shows the opcode map if the accompanying ModR/M byte is outside the range of 00H to BFH. In this case the first digit of the ModR/M byte selects the row in the table and the second digit selects the column.
Table A-8. D8 Opcode Map When ModR/M Byte is Outside 00H to BFH¹
In each block of eight columns, the operands run from ST(0),ST(0) in the first column to ST(0),ST(7) in the last.
Row    Columns 0 through 7    Columns 8 through F
C      FADD                   FMUL
D      FCOM                   FCOMP
E      FSUB                   FSUBR
F      FDIV                   FDIVR
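Tables A-7 and A-8 together can be expressed as one small decoder for opcodes whose first byte is D8H. This is a sketch for D8 only; the names `D8_OPS` and `decode_d8` are hypothetical, and the memory operand of the 00H through BFH forms is described rather than decoded.

```python
# Instruction order shared by Table A-7 (nnn = 000..111) and by the
# half-rows of Table A-8 (C0-C7, C8-CF, D0-D7, ..., F8-FF).
D8_OPS = ("FADD", "FMUL", "FCOM", "FCOMP", "FSUB", "FSUBR", "FDIV", "FDIVR")

def decode_d8(modrm):
    """Decode the ModR/M byte that follows a D8H escape opcode."""
    if not 0 <= modrm <= 0xFF:
        raise ValueError("ModR/M must be a single byte")
    if modrm <= 0xBF:
        # Within 00H-BFH: bits 5, 4, 3 (nnn) select the instruction (Table A-7).
        return f"{D8_OPS[(modrm >> 3) & 0b111]} single-real"
    # Outside 00H-BFH: first digit is the row, second digit the column (Table A-8).
    row, col = modrm >> 4, modrm & 0x0F
    op = D8_OPS[(row - 0xC) * 2 + (col >> 3)]
    return f"{op} ST(0),ST({col & 0b111})"
```

With this sketch, `decode_d8(0xC1)` follows the D8C1H example from Section A.2.6.2: row C, column 1, which is FADD with ST(0), ST(1) as the operands.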
A.2.6.4. ESCAPE OPCODES WITH D9 AS FIRST BYTE
Table A-9 and A-10 contain opcode maps for escape instruction opcodes that begin with D9H. Table A-9 shows the opcode map if the accompanying ModR/M byte is within the range of 00H through BFH. Here, the value of bits 5, 4, and 3 (the Figure A-1 nnn field) selects the instruction.
Table A-9. D9 Opcode Map When ModR/M Byte is Within 00H to BFH¹
nnn Field of ModR/M Byte (refer to Figure A-1)
000    FLD single-real
001    (blank)
010    FST single-real
011    FSTP single-real
100    FLDENV 14/28 bytes
101    FLDCW 2 bytes
110    FSTENV 14/28 bytes
111    FSTCW 2 bytes
NOTE: 1. All blanks in the opcode map are reserved and should not be used. Do not depend on the operation of these undefined opcodes.
Table A-10. D9 Opcode Map When ModR/M Byte is Outside 00H to BFH¹
Row C, columns 0 through 7: FLD ST(0),ST(0) through ST(0),ST(7)
Row C, columns 8 through F: FXCH ST(0),ST(0) through ST(0),ST(7)
Row D, columns 0 through 7: FNOP | (blank) | (blank) | (blank) | (blank) | (blank) | (blank) | (blank)
Row D, columns 8 through F: (blank)
Row E, columns 0 through 7: FCHS | FABS | (blank) | (blank) | FTST | FXAM | (blank) | (blank)
Row E, columns 8 through F: FLD1 | FLDL2T | FLDL2E | FLDPI | FLDLG2 | FLDLN2 | FLDZ | (blank)
Row F, columns 0 through 7: F2XM1 | FYL2X | FPTAN | FPATAN | FXTRACT | FPREM1 | FDECSTP | FINCSTP
Row F, columns 8 through F: FPREM | FYL2XP1 | FSQRT | FSINCOS | FRNDINT | FSCALE | FSIN | FCOS
A.2.6.5. ESCAPE OPCODES WITH DA AS FIRST BYTE
Table A-11 and A-12 contain the opcode maps for the escape instruction opcodes that begin with DAH. Table A-11 shows the opcode map if the accompanying ModR/M byte is within the range of 00H through BFH. Here, the value of bits 5, 4, and 3 (the nnn field in Figure A-1) selects the instruction.
Table A-11. DA Opcode Map When ModR/M Byte is Within 00H to BFH¹
nnn Field of ModR/M Byte (refer to Figure A-1)
000    FIADD dword-integer
001    FIMUL dword-integer
010    FICOM dword-integer
011    FICOMP dword-integer
100    FISUB dword-integer
101    FISUBR dword-integer
110    FIDIV dword-integer
111    FIDIVR dword-integer
Table A-12. DA Opcode Map When ModR/M Byte is Outside 00H to BFH¹
In each block of eight columns, the operands run from ST(0),ST(0) in the first column to ST(0),ST(7) in the last.
Row C, columns 0 through 7: FCMOVB
Row C, columns 8 through F: FCMOVE
Row D, columns 0 through 7: FCMOVBE
Row D, columns 8 through F: FCMOVU
Row E, column 9: FUCOMPP (all other columns blank)
Row F: (blank)
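A.2.6.6. ESCAPE OPCODES WITH DB AS FIRST BYTE
Table A-13 and A-14 contain the opcode maps for the escape instruction opcodes that begin with DBH. Table A-13 shows the opcode map if the accompanying ModR/M byte is within the range of 00H through BFH. Here, the value of bits 5, 4, and 3 (the nnn field in Figure A-1) selects the instruction.
Table A-13. DB Opcode Map When ModR/M Byte is Within 00H to BFH¹
nnn Field of ModR/M Byte (refer to Figure A-1)
000    FILD dword-integer
001    (blank)
010    FIST dword-integer
011    FISTP dword-integer
100    (blank)
101    FLD extended-real
110    (blank)
111    FSTP extended-real
Table A-14. DB Opcode Map When ModR/M Byte is Outside 00H to BFH¹
In each block of eight columns, the operands run from ST(0),ST(0) in the first column to ST(0),ST(7) in the last.
Row C, columns 0 through 7: FCMOVNB
Row C, columns 8 through F: FCMOVNE
Row D, columns 0 through 7: FCMOVNBE
Row D, columns 8 through F: FCMOVNU
Row E, columns 0 through 7: (blank) | (blank) | FCLEX | FINIT | (blank) | (blank) | (blank) | (blank)
Row E, columns 8 through F: FUCOMI
Row F, columns 0 through 7: FCOMI
Row F, columns 8 through F: (blank)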
A.2.6.7. ESCAPE OPCODES WITH DC AS FIRST BYTE
Table A-15 and A-16 contain the opcode maps for the escape instruction opcodes that begin with DCH. Table A-15 shows the opcode map if the accompanying ModR/M byte is within the range of 00H through BFH. Here, the value of bits 5, 4, and 3 (the nnn field in Figure A-1) selects the instruction.
Table A-15. DC Opcode Map When ModR/M Byte is Within 00H to BFH¹
nnn Field of ModR/M Byte (refer to Figure A-1)
000    FADD double-real
001    FMUL double-real
010    FCOM double-real
011    FCOMP double-real
100    FSUB double-real
101    FSUBR double-real
110    FDIV double-real
111    FDIVR double-real
Table A-16. DC Opcode Map When ModR/M Byte is Outside 00H to BFH¹
In each block of eight columns, the operands run from ST(0),ST(0) in the first column to ST(7),ST(0) in the last.
Row    Columns 0 through 7    Columns 8 through F
C      FADD                   FMUL
D      (blank)                (blank)
E      FSUBR                  FSUB
F      FDIVR                  FDIV
A.2.6.8. ESCAPE OPCODES WITH DD AS FIRST BYTE
Table A-17 and A-18 contain the opcode maps for the escape instruction opcodes that begin with DDH. Table A-17 shows the opcode map if the accompanying ModR/M byte is within the range of 00H through BFH. Here, the value of bits 5, 4, and 3 (the nnn field in Figure A-1) selects the instruction.
Table A-17. DD Opcode Map When ModR/M Byte is Within 00H to BFH¹
nnn Field of ModR/M Byte (refer to Figure A-1)
000    FLD double-real
001    (blank)
010    FST double-real
011    FSTP double-real
100    FRSTOR 98/108 bytes
101    (blank)
110    FSAVE 98/108 bytes
111    FSTSW 2 bytes
Table A-18. DD Opcode Map When ModR/M Byte is Outside 00H to BFH¹
In each block of eight columns, the operand runs from ST(0) in the first column to ST(7) in the last.
Row    Columns 0 through 7    Columns 8 through F
C      FFREE                  (blank)
D      FST                    FSTP
E      FUCOM                  FUCOMP
F      (blank)                (blank)
A.2.6.9. ESCAPE OPCODES WITH DE AS FIRST BYTE
Table A-19 and A-20 contain the opcode maps for the escape instruction opcodes that begin with DEH. Table A-19 shows the opcode map if the accompanying ModR/M byte is within the range of 00H through BFH. Here, the value of bits 5, 4, and 3 (the nnn field in Figure A-1) selects the instruction.
Table A-19. DE Opcode Map When ModR/M Byte is Within 00H to BFH¹
nnn Field of ModR/M Byte (refer to Figure A-1)
000    FIADD word-integer
001    FIMUL word-integer
010    FICOM word-integer
011    FICOMP word-integer
100    FISUB word-integer
101    FISUBR word-integer
110    FIDIV word-integer
111    FIDIVR word-integer
Table A-20. DE Opcode Map When ModR/M Byte is Outside 00H to BFH¹
In each block of eight columns, the operands run from ST(0),ST(0) in the first column to ST(7),ST(0) in the last.
Row C, columns 0 through 7: FADDP
Row C, columns 8 through F: FMULP
Row D, column 9: FCOMPP (all other columns blank)
Row E, columns 0 through 7: FSUBRP
Row E, columns 8 through F: FSUBP
Row F, columns 0 through 7: FDIVRP
Row F, columns 8 through F: FDIVP
A.2.6.10. ESCAPE OPCODES WITH DF AS FIRST BYTE
Table A-21 and A-22 contain the opcode maps for the escape instruction opcodes that begin with DFH. Table A-21 shows the opcode map if the accompanying ModR/M byte is within the range of 00H through BFH. Here, the value of bits 5, 4, and 3 (the nnn field in Figure A-1) selects the instruction.
Table A-21. DF Opcode Map When ModR/M Byte is Within 00H to BFH¹
nnn Field of ModR/M Byte (refer to Figure A-1)
000    FILD word-integer
001    (blank)
010    FIST word-integer
011    FISTP word-integer
100    FBLD packed-BCD
101    FILD qword-integer
110    FBSTP packed-BCD
111    FISTP qword-integer
Table A-22. DF Opcode Map When ModR/M Byte is Outside 00H to BFH¹
In each block of eight columns, the operands run from ST(0),ST(0) in the first column to ST(0),ST(7) in the last.
Row C: (blank)
Row D: (blank)
Row E, column 0: FSTSW AX (columns 1 through 7 blank)
Row E, columns 8 through F: FUCOMIP
Row F, columns 0 through 7: FCOMIP
Row F, columns 8 through F: (blank)
Instruction Formats and Encodings
APPENDIX B INSTRUCTION FORMATS AND ENCODINGS

This appendix shows the formats and encodings of the Intel Architecture instructions. The main format and encoding tables are Tables B-10, B-14, B-20, and B-23.
B.1. MACHINE INSTRUCTION FORMAT

All Intel Architecture instructions are encoded using subsets of the general machine instruction format shown in Figure B-1. Each instruction consists of an opcode, a register and/or address mode specifier (if required) consisting of the ModR/M byte and sometimes the scale-index-base (SIB) byte, a displacement (if required), and an immediate data field (if required).
Opcode: 1 or 2 bytes (T represents an opcode bit)
    T T T T T T T T    T T T T T T T T
ModR/M byte: Mod (bits 7-6) | Reg* (bits 5-3) | R/M (bits 2-0)
SIB byte: Scale (bits 7-6) | Index (bits 5-3) | Base (bits 2-0)
Address displacement: d32 | 16 | 8 | None (4, 2, 1 bytes or none)
Immediate data: d32 | 16 | 8 | None (4, 2, 1 bytes or none)
The ModR/M byte and the SIB byte together form the register and/or address mode specifier.
* Reg field is sometimes used as an opcode extension field (TTT).
Figure B-1. General Machine Instruction Format
The primary opcode for an instruction is encoded in one or two bytes of the instruction. Some instructions also use an opcode extension field encoded in bits 5, 4, and 3 of the ModR/M byte. Within the primary opcode, smaller encoding fields may be defined. These fields vary according to the class of operation being performed. The fields define such information as register encoding, conditional test performed, or sign extension of immediate byte. Almost all instructions that refer to a register and/or memory operand have a register and/or address mode byte following the opcode. This byte, the ModR/M byte, consists of the mod field, the reg field, and the R/M field. Certain encodings of the ModR/M byte indicate that a second address mode byte, the SIB byte, must be used. If the selected addressing mode specifies a displacement, the displacement value is placed immediately following the ModR/M byte or SIB byte. If a displacement is present, the possible sizes are 8, 16, or 32 bits. If the instruction specifies an immediate operand, the immediate value follows any displacement bytes. An immediate operand, if specified, is always the last field of the instruction.
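The bit positions given in Figure B-1 for the ModR/M and SIB bytes translate directly into shifts and masks. The helpers below are an illustrative sketch with hypothetical names; they return the raw field values and do not interpret them.

```python
def _check_byte(value):
    if not 0 <= value <= 0xFF:
        raise ValueError("expected a value in the range 00H through FFH")

def split_modrm(byte):
    """Return the mod (bits 7-6), reg (bits 5-3), and R/M (bits 2-0) fields."""
    _check_byte(byte)
    return {"mod": byte >> 6, "reg": (byte >> 3) & 0b111, "rm": byte & 0b111}

def split_sib(byte):
    """Return the scale (bits 7-6), index (bits 5-3), and base (bits 2-0) fields."""
    _check_byte(byte)
    return {"scale": byte >> 6, "index": (byte >> 3) & 0b111, "base": byte & 0b111}
```

The reg field returned by `split_modrm` is the same three bits that some instructions reuse as an opcode extension (TTT).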
Table B-1 lists several smaller fields or bits that appear in certain instructions, sometimes within the opcode bytes themselves. The following tables describe these fields and bits and list the allowable values. All of these fields (except the d bit) are shown in the integer instruction formats given in Table B-10.
Table B-1. Special Fields Within Instruction Encodings
Field Name    Description                                                                                              Number of Bits
reg           General-register specifier (refer to Table B-2 or B-3)                                                   3
w             Specifies if data is byte or full-sized, where full-sized is either 16 or 32 bits (refer to Table B-4)   1
s             Specifies sign extension of an immediate data field (refer to Table B-5)                                 1
sreg2         Segment register specifier for CS, SS, DS, ES (refer to Table B-6)                                       2
sreg3         Segment register specifier for CS, SS, DS, ES, FS, GS (refer to Table B-6)                               3
eee           Specifies a special-purpose (control or debug) register (refer to Table B-7)                             3
tttn          For conditional instructions, specifies a condition asserted or a condition negated (refer to Table B-8) 4
d             Specifies direction of data operation (refer to Table B-9)                                               1
B.1.1. Reg Field (reg)
The reg field in the ModR/M byte specifies a general-purpose register operand. The group of registers specified is modified by the presence of and state of the w bit in an encoding (refer to Table B-4). Table B-2 shows the encoding of the reg field when the w bit is not present in an encoding, and Table B-3 shows the encoding of the reg field when the w bit is present.
Table B-2. Encoding of reg Field When w Field is Not Present in Instruction
| reg Field | Register Selected during 16-Bit Data Operations | Register Selected during 32-Bit Data Operations |
|---|---|---|
| 000 | AX | EAX |
| 001 | CX | ECX |
| 010 | DX | EDX |
| 011 | BX | EBX |
| 100 | SP | ESP |
| 101 | BP | EBP |
| 110 | SI | ESI |
| 111 | DI | EDI |
Table B-3. Encoding of reg Field When w Field is Present in Instruction

| reg | 16-Bit Data Operations, w = 0 | 16-Bit Data Operations, w = 1 | 32-Bit Data Operations, w = 0 | 32-Bit Data Operations, w = 1 |
|---|---|---|---|---|
| 000 | AL | AX | AL | EAX |
| 001 | CL | CX | CL | ECX |
| 010 | DL | DX | DL | EDX |
| 011 | BL | BX | BL | EBX |
| 100 | AH | SP | AH | ESP |
| 101 | CH | BP | CH | EBP |
| 110 | DH | SI | DH | ESI |
| 111 | BH | DI | BH | EDI |
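The register selection in Tables B-2 and B-3 can be expressed as a small lookup. This is a minimal sketch: the function name `reg_name` and the convention of passing `w=None` for encodings that have no w bit are assumptions made for this example.

```python
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
REG32 = ["E" + name for name in REG16]


def reg_name(reg, operand_size=32, w=None):
    """Return the register selected by a 3-bit reg field.

    operand_size is the current operand-size attribute (16 or 32).
    w is the w bit (0 or 1), or None when the encoding has no w bit.
    """
    if not 0 <= reg <= 7:
        raise ValueError("reg must be a 3-bit value")
    if operand_size not in (16, 32):
        raise ValueError("operand size attribute must be 16 or 32")
    if w == 0:
        return REG8[reg]            # byte operation (Table B-3, w = 0)
    # w = 1, or no w bit present (Table B-2): full-sized register
    return REG16[reg] if operand_size == 16 else REG32[reg]


print(reg_name(0b100, 32, w=0))   # AH
print(reg_name(0b100, 16, w=1))   # SP
print(reg_name(0b111, 32))        # EDI
```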
B.1.2. Encoding of Operand Size Bit (w)
The current operand-size attribute determines whether the processor is performing 16- or 32-bit operations. Within the constraints of the current operand-size attribute, the operand-size bit (w) can be used to indicate operations on 8-bit operands or the full operand size specified with the operand-size attribute (16 bits or 32 bits). Table B-4 shows the encoding of the w bit depending on the current operand-size attribute.
Table B-4. Encoding of Operand Size (w) Bit
| w Bit | Operand Size When Operand-Size Attribute is 16 bits | Operand Size When Operand-Size Attribute is 32 bits |
|---|---|---|
| 0 | 8 Bits | 8 Bits |
| 1 | 16 Bits | 32 Bits |
B.1.3. Sign Extend (s) Bit
The sign-extend (s) bit occurs primarily in instructions with immediate data fields that are being extended from 8 bits to 16 or 32 bits. Table B-5 shows the encoding of the s bit.
Table B-5. Encoding of Sign-Extend (s) Bit
| s | Effect on 8-Bit Immediate Data | Effect on 16- or 32-Bit Immediate Data |
|---|---|---|
| 0 | None | None |
| 1 | Sign-extend to fill 16-bit or 32-bit destination | None |
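The effect of the s bit on an 8-bit immediate can be sketched as follows. The function name `extend_imm8` and its parameters are assumptions for illustration; the result is returned as an unsigned value of the destination width.

```python
def extend_imm8(imm8, s, operand_size=32):
    """Apply the s bit to an 8-bit immediate.

    With s = 0 the immediate is returned unchanged; with s = 1 it is
    sign-extended to fill a 16-bit or 32-bit destination.
    """
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 must fit in one byte")
    if operand_size not in (16, 32):
        raise ValueError("destination must be 16 or 32 bits")
    if s == 0:
        return imm8
    signed = imm8 - 0x100 if imm8 & 0x80 else imm8   # interpret as signed byte
    return signed & ((1 << operand_size) - 1)        # wrap to destination width


print(hex(extend_imm8(0xFF, 1, 32)))  # 0xffffffff
print(hex(extend_imm8(0x7F, 1, 16)))  # 0x7f
```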
B.1.4. Segment Register Field (sreg)
When an instruction operates on a segment register, the reg field in the ModR/M byte is called the sreg field and is used to specify the segment register. Table B-6 shows the encoding of the sreg field. This field is sometimes a 2-bit field (sreg2) and other times a 3-bit field (sreg3).
Table B-6. Encoding of the Segment Register (sreg) Field
| 2-Bit sreg2 Field | Segment Register Selected |
|---|---|
| 00 | ES |
| 01 | CS |
| 10 | SS |
| 11 | DS |

| 3-Bit sreg3 Field | Segment Register Selected |
|---|---|
| 000 | ES |
| 001 | CS |
| 010 | SS |
| 011 | DS |
| 100 | FS |
| 101 | GS |
| 110 | Reserved* |
| 111 | Reserved* |

* Do not use reserved encodings.
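As a sketch of how the sreg field is read in practice, the following example extracts sreg2 from a 1-byte opcode of the form 000 sreg2 11x (the PUSH and POP segment-register pattern listed in Table B-10) and maps an sreg3 value to a register name. The helper names are assumptions made for this example.

```python
SREG2 = ["ES", "CS", "SS", "DS"]
SREG3 = ["ES", "CS", "SS", "DS", "FS", "GS"]   # 110 and 111 are reserved


def sreg2_from_opcode(opcode):
    """Return the segment register named by bits 4-3 of a 000 sreg2 11x opcode."""
    return SREG2[(opcode >> 3) & 0b11]


def sreg3_name(field):
    """Return the segment register for a 3-bit sreg3 value."""
    if not 0 <= field <= 5:
        raise ValueError("reserved or invalid sreg3 encoding")
    return SREG3[field]


print(sreg2_from_opcode(0x1E))          # DS (0001 1110)
print(sreg3_name((0xD8 >> 3) & 0b111))  # DS (ModR/M 11 011 000)
```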
B.1.5. Special-Purpose Register (eee) Field
When the control or debug registers are referenced in an instruction they are encoded in the eee field, which is located in bits 5, 4, and 3 of the ModR/M byte. Table B-7 shows the encoding of the eee field.
Table B-7. Encoding of Special-Purpose Register (eee) Field
| eee | Control Register | Debug Register |
|---|---|---|
| 000 | CR0 | DR0 |
| 001 | Reserved* | DR1 |
| 010 | CR2 | DR2 |
| 011 | CR3 | DR3 |
| 100 | CR4 | Reserved* |
| 101 | Reserved* | Reserved* |
| 110 | Reserved* | DR6 |
| 111 | Reserved* | DR7 |

* Do not use reserved encodings.
B.1.6. Condition Test Field (tttn)
For conditional instructions (such as conditional jumps and set on condition), the condition test field (tttn) encodes the condition being tested. The ttt part of the field gives the condition to test and the n part indicates whether to use the condition (n = 0) or its negation (n = 1). For 1-byte primary opcodes, the tttn field is located in bits 3, 2, 1, and 0 of the opcode byte; for 2-byte primary opcodes, the tttn field is located in bits 3, 2, 1, and 0 of the second opcode byte. Table B-8 shows the encoding of the tttn field.
Table B-8. Encoding of Conditional Test (tttn) Field
| tttn | Mnemonic | Condition |
|---|---|---|
| 0000 | O | Overflow |
| 0001 | NO | No overflow |
| 0010 | B, NAE | Below, Not above or equal |
| 0011 | NB, AE | Not below, Above or equal |
| 0100 | E, Z | Equal, Zero |
| 0101 | NE, NZ | Not equal, Not zero |
| 0110 | BE, NA | Below or equal, Not above |
| 0111 | NBE, A | Not below or equal, Above |
| 1000 | S | Sign |
| 1001 | NS | Not sign |
| 1010 | P, PE | Parity, Parity Even |
| 1011 | NP, PO | Not parity, Parity Odd |
| 1100 | L, NGE | Less than, Not greater than or equal to |
| 1101 | NL, GE | Not less than, Greater than or equal to |
| 1110 | LE, NG | Less than or equal to, Not greater than |
| 1111 | NLE, G | Not less than or equal to, Greater than |
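The tttn field can be decoded by masking the low four bits of the opcode byte that carries it. The sketch below uses the first mnemonic of each row of Table B-8; the function name `decode_tttn` is an assumption made for this example, and the caller is assumed to pass only opcode bytes that actually contain a tttn field (for example, the 0111 tttn short-jump opcodes).

```python
CONDITIONS = ["O", "NO", "B", "NB", "E", "NE", "BE", "NBE",
              "S", "NS", "P", "NP", "L", "NL", "LE", "NLE"]


def decode_tttn(opcode_byte):
    """Return (mnemonic, ttt, n) for the tttn field in bits 3-0."""
    tttn = opcode_byte & 0x0F
    ttt = tttn >> 1      # which condition is tested
    n = tttn & 1         # 0 = use the condition, 1 = use its negation
    return CONDITIONS[tttn], ttt, n


# 0x74 is 0111 0100: a short conditional jump with tttn = 0100 (E, Z).
print(decode_tttn(0x74))
# 0x75 is 0111 0101: same ttt, negated (NE, NZ).
print(decode_tttn(0x75))
```

Note that each pair of rows sharing the same ttt value differs only in n, which is why E and NE decode to the same ttt.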
B.1.7. Direction (d) Bit
In many two-operand instructions, a direction bit (d) indicates which operand is considered the source and which is the destination. Table B-9 shows the encoding of the d bit. When used for integer instructions, the d bit is located at bit 1 of a 1-byte primary opcode. This bit does not appear as the symbol d in Table B-10; instead, the actual encoding of the bit as 1 or 0 is given. When used for floating-point instructions (in Table B-23), the d bit is shown as bit 2 of the first byte of the primary opcode.
Table B-9. Encoding of Operation Direction (d) Bit

| d | Source | Destination |
|---|---|---|
| 0 | reg Field | ModR/M or SIB Byte |
| 1 | ModR/M or SIB Byte | reg Field |
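For integer instructions, where the d bit sits at bit 1 of a 1-byte primary opcode, Table B-9 can be sketched as a small function. The name `operand_roles` is an assumption for this example, and the function is only meaningful for opcodes that actually define a d bit (such as the register/memory forms of ADD, 0000 000w and 0000 001w).

```python
def operand_roles(opcode):
    """Return which ModR/M field is the source and which is the destination.

    Assumes a 1-byte integer opcode whose bit 1 is the direction (d) bit.
    """
    d = (opcode >> 1) & 1
    if d == 0:
        return {"source": "reg", "destination": "r/m"}
    return {"source": "r/m", "destination": "reg"}


print(operand_roles(0x01))  # ADD, d = 0: reg field is the source
print(operand_roles(0x03))  # ADD, d = 1: reg field is the destination
```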
B.2. INTEGER INSTRUCTION FORMATS AND ENCODINGS

Table B-10 shows the formats and encodings of the integer instructions.
Table B-10. Integer Instruction Formats and Encodings
| Instruction and Format | Encoding |
|---|---|
| AAA - ASCII Adjust after Addition | 0011 0111 |
| AAD - ASCII Adjust AX before Division | 1101 0101 : 0000 1010 |
| AAM - ASCII Adjust AX after Multiply | 1101 0100 : 0000 1010 |
| AAS - ASCII Adjust AL after Subtraction | 0011 1111 |
| ADC - ADD with Carry | |
| register1 to register2 | 0001 000w : 11 reg1 reg2 |
| register2 to register1 | 0001 001w : 11 reg1 reg2 |
| memory to register | 0001 001w : mod reg r/m |
| register to memory | 0001 000w : mod reg r/m |
| immediate to register | 1000 00sw : 11 010 reg : immediate data |
| immediate to AL, AX, or EAX | 0001 010w : immediate data |
| immediate to memory | 1000 00sw : mod 010 r/m : immediate data |
| ADD - Add | |
| register1 to register2 | 0000 000w : 11 reg1 reg2 |
| register2 to register1 | 0000 001w : 11 reg1 reg2 |
| memory to register | 0000 001w : mod reg r/m |
| register to memory | 0000 000w : mod reg r/m |
| immediate to register | 1000 00sw : 11 000 reg : immediate data |
| immediate to AL, AX, or EAX | 0000 010w : immediate data |
| immediate to memory | 1000 00sw : mod 000 r/m : immediate data |
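To show how an entry of Table B-10 turns into bytes, the sketch below fills in the template `0000 000w : 11 reg1 reg2` (ADD, register1 to register2). The function name and register dictionary are assumptions for this example, and a 32-bit operand-size attribute with w = 1 is assumed so that the reg values name 32-bit registers.

```python
REG32 = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3,
         "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}


def encode_add_reg_to_reg(reg1, reg2, w=1):
    """Encode ADD register1 to register2: 0000 000w : 11 reg1 reg2.

    reg1 is the source (reg field), reg2 the destination (r/m field).
    """
    opcode = 0b0000_0000 | (w & 1)
    modrm = (0b11 << 6) | (REG32[reg1] << 3) | REG32[reg2]
    return bytes([opcode, modrm])


# Adds EAX to ECX: opcode 01, ModR/M 11 000 001.
print(encode_add_reg_to_reg("EAX", "ECX").hex())  # 01c1
```

The same pattern applies to the other register-to-register rows of the table by substituting the opcode bits.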
Table B-10. Integer Instruction Formats and Encodings (Contd.)

| Instruction and Format | Encoding |
|---|---|
| AND - Logical AND | |
| register1 to register2 | 0010 000w : 11 reg1 reg2 |
| register2 to register1 | 0010 001w : 11 reg1 reg2 |
| memory to register | 0010 001w : mod reg r/m |
| register to memory | 0010 000w : mod reg r/m |
| immediate to register | 1000 00sw : 11 100 reg : immediate data |
| immediate to AL, AX, or EAX | 0010 010w : immediate data |
| immediate to memory | 1000 00sw : mod 100 r/m : immediate data |
| ARPL - Adjust RPL Field of Selector | |
| from register | 0110 0011 : 11 reg1 reg2 |
| from memory | 0110 0011 : mod reg r/m |
| BOUND - Check Array Against Bounds | 0110 0010 : mod reg r/m |
| BSF - Bit Scan Forward | |
| register1, register2 | 0000 1111 : 1011 1100 : 11 reg2 reg1 |
| memory, register | 0000 1111 : 1011 1100 : mod reg r/m |
| BSR - Bit Scan Reverse | |
| register1, register2 | 0000 1111 : 1011 1101 : 11 reg2 reg1 |
| memory, register | 0000 1111 : 1011 1101 : mod reg r/m |
| BSWAP - Byte Swap | 0000 1111 : 1100 1 reg |
| BT - Bit Test | |
| register, immediate | 0000 1111 : 1011 1010 : 11 100 reg : imm8 data |
| memory, immediate | 0000 1111 : 1011 1010 : mod 100 r/m : imm8 data |
| register1, register2 | 0000 1111 : 1010 0011 : 11 reg2 reg1 |
| memory, reg | 0000 1111 : 1010 0011 : mod reg r/m |
| BTC - Bit Test and Complement | |
| register, immediate | 0000 1111 : 1011 1010 : 11 111 reg : imm8 data |
| memory, immediate | 0000 1111 : 1011 1010 : mod 111 r/m : imm8 data |
| register1, register2 | 0000 1111 : 1011 1011 : 11 reg2 reg1 |
| memory, reg | 0000 1111 : 1011 1011 : mod reg r/m |
| BTR - Bit Test and Reset | |
| register, immediate | 0000 1111 : 1011 1010 : 11 110 reg : imm8 data |
| memory, immediate | 0000 1111 : 1011 1010 : mod 110 r/m : imm8 data |
| register1, register2 | 0000 1111 : 1011 0011 : 11 reg2 reg1 |
| memory, reg | 0000 1111 : 1011 0011 : mod reg r/m |

| Instruction and Format | Encoding |
|---|---|
| BTS - Bit Test and Set | |
| register, immediate | 0000 1111 : 1011 1010 : 11 101 reg : imm8 data |
| memory, immediate | 0000 1111 : 1011 1010 : mod 101 r/m : imm8 data |
| register1, register2 | 0000 1111 : 1010 1011 : 11 reg2 reg1 |
| memory, reg | 0000 1111 : 1010 1011 : mod reg r/m |
| CALL - Call Procedure (in same segment) | |
| direct | 1110 1000 : full displacement |
| register indirect | 1111 1111 : 11 010 reg |
| memory indirect | 1111 1111 : mod 010 r/m |
| CALL - Call Procedure (in other segment) | |
| direct | 1001 1010 : unsigned full offset, selector |
| indirect | 1111 1111 : mod 011 r/m |
| CBW - Convert Byte to Word | 1001 1000 |
| CDQ - Convert Doubleword to Qword | 1001 1001 |
| CLC - Clear Carry Flag | 1111 1000 |
| CLD - Clear Direction Flag | 1111 1100 |
| CLI - Clear Interrupt Flag | 1111 1010 |
| CLTS - Clear Task-Switched Flag in CR0 | 0000 1111 : 0000 0110 |
| CMC - Complement Carry Flag | 1111 0101 |
| CMOVcc - Conditional Move | |
| register2 to register1 | 0000 1111 : 0100 tttn : 11 reg1 reg2 |
| memory to register | 0000 1111 : 0100 tttn : mod reg r/m |
| CMP - Compare Two Operands | |
| register1 with register2 | 0011 100w : 11 reg1 reg2 |
| register2 with register1 | 0011 101w : 11 reg1 reg2 |
| memory with register | 0011 100w : mod reg r/m |
| register with memory | 0011 101w : mod reg r/m |
| immediate with register | 1000 00sw : 11 111 reg : immediate data |
| immediate with AL, AX, or EAX | 0011 110w : immediate data |
| immediate with memory | 1000 00sw : mod 111 r/m : immediate data |
| CMPS/CMPSB/CMPSW/CMPSD - Compare String Operands | 1010 011w |
| CMPXCHG - Compare and Exchange | |
| register1, register2 | 0000 1111 : 1011 000w : 11 reg2 reg1 |
| memory, register | 0000 1111 : 1011 000w : mod reg r/m |

| Instruction and Format | Encoding |
|---|---|
| CMPXCHG8B - Compare and Exchange 8 Bytes | |
| memory, register | 0000 1111 : 1100 0111 : mod reg r/m |
| CPUID - CPU Identification | 0000 1111 : 1010 0010 |
| CWD - Convert Word to Doubleword | 1001 1001 |
| CWDE - Convert Word to Doubleword | 1001 1000 |
| DAA - Decimal Adjust AL after Addition | 0010 0111 |
| DAS - Decimal Adjust AL after Subtraction | 0010 1111 |
| DEC - Decrement by 1 | |
| register | 1111 111w : 11 001 reg |
| register (alternate encoding) | 0100 1 reg |
| memory | 1111 111w : mod 001 r/m |
| DIV - Unsigned Divide | |
| AL, AX, or EAX by register | 1111 011w : 11 110 reg |
| AL, AX, or EAX by memory | 1111 011w : mod 110 r/m |
| ENTER - Make Stack Frame for High Level Procedure | 1100 1000 : 16-bit displacement : 8-bit level (L) |
| HLT - Halt | 1111 0100 |
| IDIV - Signed Divide | |
| AL, AX, or EAX by register | 1111 011w : 11 111 reg |
| AL, AX, or EAX by memory | 1111 011w : mod 111 r/m |
| IMUL - Signed Multiply | |
| AL, AX, or EAX with register | 1111 011w : 11 101 reg |
| AL, AX, or EAX with memory | 1111 011w : mod 101 r/m |
| register1 with register2 | 0000 1111 : 1010 1111 : 11 reg1 reg2 |
| register with memory | 0000 1111 : 1010 1111 : mod reg r/m |
| register1 with immediate to register2 | 0110 10s1 : 11 reg1 reg2 : immediate data |
| memory with immediate to register | 0110 10s1 : mod reg r/m : immediate data |
| IN - Input From Port | |
| fixed port | 1110 010w : port number |
| variable port | 1110 110w |
| INC - Increment by 1 | |
| reg | 1111 111w : 11 000 reg |
| reg (alternate encoding) | 0100 0 reg |
| memory | 1111 111w : mod 000 r/m |
| INS - Input from DX Port | 0110 110w |

| Instruction and Format | Encoding |
|---|---|
| INT n - Interrupt Type n | 1100 1101 : type |
| INT - Single-Step Interrupt 3 | 1100 1100 |
| INTO - Interrupt 4 on Overflow | 1100 1110 |
| INVD - Invalidate Cache | 0000 1111 : 0000 1000 |
| INVLPG - Invalidate TLB Entry | 0000 1111 : 0000 0001 : mod 111 r/m |
| IRET/IRETD - Interrupt Return | 1100 1111 |
| Jcc - Jump if Condition is Met | |
| 8-bit displacement | 0111 tttn : 8-bit displacement |
| full displacement | 0000 1111 : 1000 tttn : full displacement |
| JCXZ/JECXZ - Jump on CX/ECX Zero (address-size prefix differentiates JCXZ and JECXZ) | 1110 0011 : 8-bit displacement |
| JMP - Unconditional Jump (to same segment) | |
| short | 1110 1011 : 8-bit displacement |
| direct | 1110 1001 : full displacement |
| register indirect | 1111 1111 : 11 100 reg |
| memory indirect | 1111 1111 : mod 100 r/m |
| JMP - Unconditional Jump (to other segment) | |
| direct intersegment | 1110 1010 : unsigned full offset, selector |
| indirect intersegment | 1111 1111 : mod 101 r/m |
| LAHF - Load Flags into AH Register | 1001 1111 |
| LAR - Load Access Rights Byte | |
| from register | 0000 1111 : 0000 0010 : 11 reg1 reg2 |
| from memory | 0000 1111 : 0000 0010 : mod reg r/m |
| LDS - Load Pointer to DS | 1100 0101 : mod reg r/m |
| LEA - Load Effective Address | 1000 1101 : mod reg r/m |
| LEAVE - High Level Procedure Exit | 1100 1001 |
| LES - Load Pointer to ES | 1100 0100 : mod reg r/m |
| LFS - Load Pointer to FS | 0000 1111 : 1011 0100 : mod reg r/m |
| LGDT - Load Global Descriptor Table Register | 0000 1111 : 0000 0001 : mod 010 r/m |
| LGS - Load Pointer to GS | 0000 1111 : 1011 0101 : mod reg r/m |
| LIDT - Load Interrupt Descriptor Table Register | 0000 1111 : 0000 0001 : mod 011 r/m |
| LLDT - Load Local Descriptor Table Register | |
| LDTR from register | 0000 1111 : 0000 0000 : 11 010 reg |
| LDTR from memory | 0000 1111 : 0000 0000 : mod 010 r/m |

| Instruction and Format | Encoding |
|---|---|
| LMSW - Load Machine Status Word | |
| from register | 0000 1111 : 0000 0001 : 11 110 reg |
| from memory | 0000 1111 : 0000 0001 : mod 110 r/m |
| LOCK - Assert LOCK# Signal Prefix | 1111 0000 |
| LODS/LODSB/LODSW/LODSD - Load String Operand | 1010 110w |
| LOOP - Loop Count | 1110 0010 : 8-bit displacement |
| LOOPZ/LOOPE - Loop Count while Zero/Equal | 1110 0001 : 8-bit displacement |
| LOOPNZ/LOOPNE - Loop Count while not Zero/Equal | 1110 0000 : 8-bit displacement |
| LSL - Load Segment Limit | |
| from register | 0000 1111 : 0000 0011 : 11 reg1 reg2 |
| from memory | 0000 1111 : 0000 0011 : mod reg r/m |
| LSS - Load Pointer to SS | 0000 1111 : 1011 0010 : mod reg r/m |
| LTR - Load Task Register | |
| from register | 0000 1111 : 0000 0000 : 11 011 reg |
| from memory | 0000 1111 : 0000 0000 : mod 011 r/m |
| MOV - Move Data | |
| register1 to register2 | 1000 100w : 11 reg1 reg2 |
| register2 to register1 | 1000 101w : 11 reg1 reg2 |
| memory to reg | 1000 101w : mod reg r/m |
| reg to memory | 1000 100w : mod reg r/m |
| immediate to register | 1100 011w : 11 000 reg : immediate data |
| immediate to register (alternate encoding) | 1011 w reg : immediate data |
| immediate to memory | 1100 011w : mod 000 r/m : immediate data |
| memory to AL, AX, or EAX | 1010 000w : full displacement |
| AL, AX, or EAX to memory | 1010 001w : full displacement |
| MOV - Move to/from Control Registers | |
| CR0 from register | 0000 1111 : 0010 0010 : 11 000 reg |
| CR2 from register | 0000 1111 : 0010 0010 : 11 010 reg |
| CR3 from register | 0000 1111 : 0010 0010 : 11 011 reg |
| CR4 from register | 0000 1111 : 0010 0010 : 11 100 reg |
| register from CR0-CR4 | 0000 1111 : 0010 0000 : 11 eee reg |

| Instruction and Format | Encoding |
|---|---|
| MOV - Move to/from Debug Registers | |
| DR0-DR3 from register | 0000 1111 : 0010 0011 : 11 eee reg |
| DR4-DR5 from register | 0000 1111 : 0010 0011 : 11 eee reg |
| DR6-DR7 from register | 0000 1111 : 0010 0011 : 11 eee reg |
| register from DR6-DR7 | 0000 1111 : 0010 0001 : 11 eee reg |
| register from DR4-DR5 | 0000 1111 : 0010 0001 : 11 eee reg |
| register from DR0-DR3 | 0000 1111 : 0010 0001 : 11 eee reg |
| MOV - Move to/from Segment Registers | |
| register to segment register | 1000 1110 : 11 sreg3 reg |
| register to SS | 1000 1110 : 11 sreg3 reg |
| memory to segment reg | 1000 1110 : mod sreg3 r/m |
| memory to SS | 1000 1110 : mod sreg3 r/m |
| segment register to register | 1000 1100 : 11 sreg3 reg |
| segment register to memory | 1000 1100 : mod sreg3 r/m |
| MOVS/MOVSB/MOVSW/MOVSD - Move Data from String to String | 1010 010w |
| MOVSX - Move with Sign-Extend | |
| register2 to register1 | 0000 1111 : 1011 111w : 11 reg1 reg2 |
| memory to reg | 0000 1111 : 1011 111w : mod reg r/m |
| MOVZX - Move with Zero-Extend | |
| register2 to register1 | 0000 1111 : 1011 011w : 11 reg1 reg2 |
| memory to register | 0000 1111 : 1011 011w : mod reg r/m |
| MUL - Unsigned Multiply | |
| AL, AX, or EAX with register | 1111 011w : 11 100 reg |
| AL, AX, or EAX with memory | 1111 011w : mod 100 r/m |
| NEG - Two's Complement Negation | |
| register | 1111 011w : 11 011 reg |
| memory | 1111 011w : mod 011 r/m |
| NOP - No Operation | 1001 0000 |
| NOT - One's Complement Negation | |
| register | 1111 011w : 11 010 reg |
| memory | 1111 011w : mod 010 r/m |

| Instruction and Format | Encoding |
|---|---|
| OR - Logical Inclusive OR | |
| register1 to register2 | 0000 100w : 11 reg1 reg2 |
| register2 to register1 | 0000 101w : 11 reg1 reg2 |
| memory to register | 0000 101w : mod reg r/m |
| register to memory | 0000 100w : mod reg r/m |
| immediate to register | 1000 00sw : 11 001 reg : immediate data |
| immediate to AL, AX, or EAX | 0000 110w : immediate data |
| immediate to memory | 1000 00sw : mod 001 r/m : immediate data |
| OUT - Output to Port | |
| fixed port | 1110 011w : port number |
| variable port | 1110 111w |
| OUTS - Output to DX Port | 0110 111w |
| POP - Pop a Word from the Stack | |
| register | 1000 1111 : 11 000 reg |
| register (alternate encoding) | 0101 1 reg |
| memory | 1000 1111 : mod 000 r/m |
| POP - Pop a Segment Register from the Stack | |
| segment register CS, DS, ES | 000 sreg2 111 |
| segment register SS | 000 sreg2 111 |
| segment register FS, GS | 0000 1111 : 10 sreg3 001 |
| POPA/POPAD - Pop All General Registers | 0110 0001 |
| POPF/POPFD - Pop Stack into FLAGS or EFLAGS Register | 1001 1101 |
| PUSH - Push Operand onto the Stack | |
| register | 1111 1111 : 11 110 reg |
| register (alternate encoding) | 0101 0 reg |
| memory | 1111 1111 : mod 110 r/m |
| immediate | 0110 10s0 : immediate data |
| PUSH - Push Segment Register onto the Stack | |
| segment register CS, DS, ES, SS | 000 sreg2 110 |
| segment register FS, GS | 0000 1111 : 10 sreg3 000 |
| PUSHA/PUSHAD - Push All General Registers | 0110 0000 |
| PUSHF/PUSHFD - Push Flags Register onto the Stack | 1001 1100 |

| Instruction and Format | Encoding |
|---|---|
| RCL - Rotate thru Carry Left | |
| register by 1 | 1101 000w : 11 010 reg |
| memory by 1 | 1101 000w : mod 010 r/m |
| register by CL | 1101 001w : 11 010 reg |
| memory by CL | 1101 001w : mod 010 r/m |
| register by immediate count | 1100 000w : 11 010 reg : imm8 data |
| memory by immediate count | 1100 000w : mod 010 r/m : imm8 data |
| RCR - Rotate thru Carry Right | |
| register by 1 | 1101 000w : 11 011 reg |
| memory by 1 | 1101 000w : mod 011 r/m |
| register by CL | 1101 001w : 11 011 reg |
| memory by CL | 1101 001w : mod 011 r/m |
| register by immediate count | 1100 000w : 11 011 reg : imm8 data |
| memory by immediate count | 1100 000w : mod 011 r/m : imm8 data |
| RDMSR - Read from Model-Specific Register | 0000 1111 : 0011 0010 |
| RDPMC - Read Performance Monitoring Counters | 0000 1111 : 0011 0011 |
| RDTSC - Read Time-Stamp Counter | 0000 1111 : 0011 0001 |
| REP INS - Input String | 1111 0011 : 0110 110w |
| REP LODS - Load String | 1111 0011 : 1010 110w |
| REP MOVS - Move String | 1111 0011 : 1010 010w |
| REP OUTS - Output String | 1111 0011 : 0110 111w |
| REP STOS - Store String | 1111 0011 : 1010 101w |
| REPE CMPS - Compare String | 1111 0011 : 1010 011w |
| REPE SCAS - Scan String | 1111 0011 : 1010 111w |
| REPNE CMPS - Compare String | 1111 0010 : 1010 011w |
| REPNE SCAS - Scan String | 1111 0010 : 1010 111w |
| RET - Return from Procedure (to same segment) | |
| no argument | 1100 0011 |
| adding immediate to SP | 1100 0010 : 16-bit displacement |
| RET - Return from Procedure (to other segment) | |
| intersegment | 1100 1011 |
| adding immediate to SP | 1100 1010 : 16-bit displacement |

| Instruction and Format | Encoding |
|---|---|
| ROL - Rotate Left | |
| register by 1 | 1101 000w : 11 000 reg |
| memory by 1 | 1101 000w : mod 000 r/m |
| register by CL | 1101 001w : 11 000 reg |
| memory by CL | 1101 001w : mod 000 r/m |
| register by immediate count | 1100 000w : 11 000 reg : imm8 data |
| memory by immediate count | 1100 000w : mod 000 r/m : imm8 data |
| ROR - Rotate Right | |
| register by 1 | 1101 000w : 11 001 reg |
| memory by 1 | 1101 000w : mod 001 r/m |
| register by CL | 1101 001w : 11 001 reg |
| memory by CL | 1101 001w : mod 001 r/m |
| register by immediate count | 1100 000w : 11 001 reg : imm8 data |
| memory by immediate count | 1100 000w : mod 001 r/m : imm8 data |
| RSM - Resume from System Management Mode | 0000 1111 : 1010 1010 |
| SAHF - Store AH into Flags | 1001 1110 |
| SAL - Shift Arithmetic Left | same instruction as SHL |
| SAR - Shift Arithmetic Right | |
| register by 1 | 1101 000w : 11 111 reg |
| memory by 1 | 1101 000w : mod 111 r/m |
| register by CL | 1101 001w : 11 111 reg |
| memory by CL | 1101 001w : mod 111 r/m |
| register by immediate count | 1100 000w : 11 111 reg : imm8 data |
| memory by immediate count | 1100 000w : mod 111 r/m : imm8 data |
| SBB - Integer Subtraction with Borrow | |
| register1 to register2 | 0001 100w : 11 reg1 reg2 |
| register2 to register1 | 0001 101w : 11 reg1 reg2 |
| memory to register | 0001 101w : mod reg r/m |
| register to memory | 0001 100w : mod reg r/m |
| immediate to register | 1000 00sw : 11 011 reg : immediate data |
| immediate to AL, AX, or EAX | 0001 110w : immediate data |
| immediate to memory | 1000 00sw : mod 011 r/m : immediate data |
| SCAS/SCASB/SCASW/SCASD - Scan String | 1010 111w |

| Instruction and Format | Encoding |
|---|---|
| SETcc - Byte Set on Condition | |
| register | 0000 1111 : 1001 tttn : 11 000 reg |
| memory | 0000 1111 : 1001 tttn : mod 000 r/m |
| SGDT - Store Global Descriptor Table Register | 0000 1111 : 0000 0001 : mod 000 r/m |
| SHL - Shift Left | |
| register by 1 | 1101 000w : 11 100 reg |
| memory by 1 | 1101 000w : mod 100 r/m |
| register by CL | 1101 001w : 11 100 reg |
| memory by CL | 1101 001w : mod 100 r/m |
| register by immediate count | 1100 000w : 11 100 reg : imm8 data |
| memory by immediate count | 1100 000w : mod 100 r/m : imm8 data |
| SHLD - Double Precision Shift Left | |
| register by immediate count | 0000 1111 : 1010 0100 : 11 reg2 reg1 : imm8 |
| memory by immediate count | 0000 1111 : 1010 0100 : mod reg r/m : imm8 |
| register by CL | 0000 1111 : 1010 0101 : 11 reg2 reg1 |
| memory by CL | 0000 1111 : 1010 0101 : mod reg r/m |
| SHR - Shift Right | |
| register by 1 | 1101 000w : 11 101 reg |
| memory by 1 | 1101 000w : mod 101 r/m |
| register by CL | 1101 001w : 11 101 reg |
| memory by CL | 1101 001w : mod 101 r/m |
| register by immediate count | 1100 000w : 11 101 reg : imm8 data |
| memory by immediate count | 1100 000w : mod 101 r/m : imm8 data |
| SHRD - Double Precision Shift Right | |
| register by immediate count | 0000 1111 : 1010 1100 : 11 reg2 reg1 : imm8 |
| memory by immediate count | 0000 1111 : 1010 1100 : mod reg r/m : imm8 |
| register by CL | 0000 1111 : 1010 1101 : 11 reg2 reg1 |
| memory by CL | 0000 1111 : 1010 1101 : mod reg r/m |
| SIDT - Store Interrupt Descriptor Table Register | 0000 1111 : 0000 0001 : mod 001 r/m |
| SLDT - Store Local Descriptor Table Register | |
| to register | 0000 1111 : 0000 0000 : 11 000 reg |
| to memory | 0000 1111 : 0000 0000 : mod 000 r/m |

| Instruction and Format | Encoding |
|---|---|
| SMSW - Store Machine Status Word | |
| to register | 0000 1111 : 0000 0001 : 11 100 reg |
| to memory | 0000 1111 : 0000 0001 : mod 100 r/m |
| STC - Set Carry Flag | 1111 1001 |
| STD - Set Direction Flag | 1111 1101 |
| STI - Set Interrupt Flag | 1111 1011 |
| STOS/STOSB/STOSW/STOSD - Store String Data | 1010 101w |
| STR - Store Task Register | |
| to register | 0000 1111 : 0000 0000 : 11 001 reg |
| to memory | 0000 1111 : 0000 0000 : mod 001 r/m |
| SUB - Integer Subtraction | |
| register1 to register2 | 0010 100w : 11 reg1 reg2 |
| register2 to register1 | 0010 101w : 11 reg1 reg2 |
| memory to register | 0010 101w : mod reg r/m |
| register to memory | 0010 100w : mod reg r/m |
| immediate to register | 1000 00sw : 11 101 reg : immediate data |
| immediate to AL, AX, or EAX | 0010 110w : immediate data |
| immediate to memory | 1000 00sw : mod 101 r/m : immediate data |
| TEST - Logical Compare | |
| register1 and register2 | 1000 010w : 11 reg1 reg2 |
| memory and register | 1000 010w : mod reg r/m |
| immediate and register | 1111 011w : 11 000 reg : immediate data |
| immediate and AL, AX, or EAX | 1010 100w : immediate data |
| immediate and memory | 1111 011w : mod 000 r/m : immediate data |
| UD2 - Undefined instruction | 0000 1111 : 0000 1011 |
| VERR - Verify a Segment for Reading | |
| register | 0000 1111 : 0000 0000 : 11 100 reg |
| memory | 0000 1111 : 0000 0000 : mod 100 r/m |
| VERW - Verify a Segment for Writing | |
| register | 0000 1111 : 0000 0000 : 11 101 reg |
| memory | 0000 1111 : 0000 0000 : mod 101 r/m |
| WAIT - Wait | 1001 1011 |
| WBINVD - Writeback and Invalidate Data Cache | 0000 1111 : 0000 1001 |
| WRMSR - Write to Model-Specific Register | 0000 1111 : 0011 0000 |

| Instruction and Format | Encoding |
|---|---|
| XADD - Exchange and Add | |
| register1, register2 | 0000 1111 : 1100 000w : 11 reg2 reg1 |
| memory, reg | 0000 1111 : 1100 000w : mod reg r/m |
| XCHG - Exchange Register/Memory with Register | |
| register1 with register2 | 1000 011w : 11 reg1 reg2 |
| AL, AX, or EAX with reg | 1001 0 reg |
| memory with reg | 1000 011w : mod reg r/m |
| XLAT/XLATB - Table Look-up Translation | 1101 0111 |
| XOR - Logical Exclusive OR | |
| register1 to register2 | 0011 000w : 11 reg1 reg2 |
| register2 to register1 | 0011 001w : 11 reg1 reg2 |
| memory to register | 0011 001w : mod reg r/m |
| register to memory | 0011 000w : mod reg r/m |
| immediate to register | 1000 00sw : 11 110 reg : immediate data |
| immediate to AL, AX, or EAX | 0011 010w : immediate data |
| immediate to memory | 1000 00sw : mod 110 r/m : immediate data |
| Prefix Bytes | |
| address size | 0110 0111 |
| LOCK | 1111 0000 |
| operand size | 0110 0110 |
| CS segment override | 0010 1110 |
| DS segment override | 0011 1110 |
| ES segment override | 0010 0110 |
| FS segment override | 0110 0100 |
| GS segment override | 0110 0101 |
| SS segment override | 0011 0110 |
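The prefix bytes listed at the end of Table B-10 can be recognized with a simple lookup. The sketch below is illustrative only: the names `PREFIXES` and `split_prefixes` are assumptions, and the lookup covers only the prefixes listed in this table (the repeat prefixes appear separately under the REP, REPE, and REPNE entries).

```python
PREFIXES = {
    0x67: "address size",
    0xF0: "LOCK",
    0x66: "operand size",
    0x2E: "CS segment override",
    0x3E: "DS segment override",
    0x26: "ES segment override",
    0x64: "FS segment override",
    0x65: "GS segment override",
    0x36: "SS segment override",
}


def split_prefixes(code):
    """Separate leading prefix bytes from the rest of an instruction.

    Returns (list of prefix names, remaining bytes).
    """
    names = []
    index = 0
    while index < len(code) and code[index] in PREFIXES:
        names.append(PREFIXES[code[index]])
        index += 1
    return names, bytes(code[index:])


names, rest = split_prefixes(bytes([0x66, 0x2E, 0x01, 0xC1]))
print(names)       # ['operand size', 'CS segment override']
print(rest.hex())  # 01c1
```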
B.3. MMX INSTRUCTION FORMATS AND ENCODINGS

All MMX instructions, except the EMMS instruction, use a format similar to the 2-byte Intel Architecture integer format. Details of subfield encodings within these formats are presented below. For information relating to the use of prefixes with MMX instructions, and the effects of these prefixes, see Section B.4.1. and Section 2.2., Instruction Prefixes, in Chapter 2, Instruction Format.
B.3.1. Granularity Field (gg)
The granularity field (gg) indicates the size of the packed operands that the instruction is operating on. When this field is used, it is located in bits 1 and 0 of the second opcode byte. Table B-11 shows the encoding of this gg field.
Table B-11. Encoding of Granularity of Data Field (gg)
| gg | Granularity of Data |
|---|---|
| 00 | Packed Bytes |
| 01 | Packed Words |
| 10 | Packed Doublewords |
| 11 | Quadword |
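Reading the gg field amounts to masking bits 1 and 0 of the second opcode byte. In the sketch below the function name `granularity` is an assumption, and the example bytes follow the pattern 1111 11gg that Table B-14 lists for PADD.

```python
GRANULARITY = ["Packed Bytes", "Packed Words", "Packed Doublewords", "Quadword"]


def granularity(second_opcode_byte):
    """Return the operand granularity named by the gg field (bits 1-0)."""
    return GRANULARITY[second_opcode_byte & 0b11]


# Second opcode bytes of the form 1111 11gg:
print(granularity(0xFC))  # Packed Bytes
print(granularity(0xFD))  # Packed Words
print(granularity(0xFE))  # Packed Doublewords
```

Not every instruction supports every granularity, so a decoder would still need to check the B, W, D, and Q columns of Table B-14 before accepting a given gg value.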
B.3.2. MMX and General-Purpose Register Fields (mmxreg and reg)
When MMX technology registers (mmxreg) are used as operands, they are encoded in the ModR/M byte in the reg field (bits 5, 4, and 3) and/or the R/M field (bits 2, 1, and 0). Table B-12 shows the 3-bit encodings used for mmxreg fields.
Table B-12. Encoding of the MMX Register Field (mmxreg)
| mmxreg Field Encoding | MMX Register |
|---|---|
| 000 | MM0 |
| 001 | MM1 |
| 010 | MM2 |
| 011 | MM3 |
| 100 | MM4 |
| 101 | MM5 |
| 110 | MM6 |
| 111 | MM7 |
If an MMX instruction operates on a general-purpose register (reg), the register is encoded in the R/M field of the ModR/M byte. Table B-13 shows the encoding of general-purpose registers when used in MMX instructions.
Table B-13. Encoding of the General-Purpose Register Field (reg) When Used in MMX Instructions.
| reg Field Encoding | Register Selected |
|---|---|
| 000 | EAX |
| 001 | ECX |
| 010 | EDX |
| 011 | EBX |
| 100 | ESP |
| 101 | EBP |
| 110 | ESI |
| 111 | EDI |
B.3.3. MMX Instruction Formats and Encodings Table
Table B-14 shows the formats and encodings for MMX instructions for the data types supported: packed byte (B), packed word (W), packed doubleword (D), and quadword (Q). Figure B-2 describes the nomenclature used in columns 3 through 6 of the table.
| Code | Meaning |
|---|---|
| Y | Supported |
| N | Not supported |
| O | Output |
| I | Input |
| n/a | Not Applicable |
Figure B-2. Key to Codes for MMX Data Type Cross-Reference
Table B-14. MMX Instruction Formats and Encodings

| Instruction and Format | Encoding | B | W | D | Q |
|---|---|---|---|---|---|
| EMMS - Empty MMX state | 0000 1111:01110111 | n/a | n/a | n/a | n/a |
| MOVD - Move doubleword | | N | N | Y | N |
| reg to mmxreg | 0000 1111:01101110: 11 mmxreg reg | | | | |
| reg from mmxreg | 0000 1111:01111110: 11 mmxreg reg | | | | |
| mem to mmxreg | 0000 1111:01101110: mod mmxreg r/m | | | | |
| mem from mmxreg | 0000 1111:01111110: mod mmxreg r/m | | | | |
| MOVQ - Move quadword | | N | N | N | Y |
| mmxreg2 to mmxreg1 | 0000 1111:01101111: 11 mmxreg1 mmxreg2 | | | | |
| mmxreg2 from mmxreg1 | 0000 1111:01111111: 11 mmxreg1 mmxreg2 | | | | |
| mem to mmxreg | 0000 1111:01101111: mod mmxreg r/m | | | | |
| mem from mmxreg | 0000 1111:01111111: mod mmxreg r/m | | | | |
| PACKSSDW (note 1) - Pack dword to word data (signed with saturation) | | n/a | O | I | n/a |
| mmxreg2 to mmxreg1 | 0000 1111:01101011: 11 mmxreg1 mmxreg2 | | | | |
| memory to mmxreg | 0000 1111:01101011: mod mmxreg r/m | | | | |
| PACKSSWB (note 1) - Pack word to byte data (signed with saturation) | | O | I | n/a | n/a |
| mmxreg2 to mmxreg1 | 0000 1111:01100011: 11 mmxreg1 mmxreg2 | | | | |
| memory to mmxreg | 0000 1111:01100011: mod mmxreg r/m | | | | |
| PACKUSWB (note 1) - Pack word to byte data (unsigned with saturation) | | O | I | n/a | n/a |
| mmxreg2 to mmxreg1 | 0000 1111:01100111: 11 mmxreg1 mmxreg2 | | | | |
| memory to mmxreg | 0000 1111:01100111: mod mmxreg r/m | | | | |
| PADD - Add with wrap-around | | Y | Y | Y | N |
| mmxreg2 to mmxreg1 | 0000 1111: 111111gg: 11 mmxreg1 mmxreg2 | | | | |
| memory to mmxreg | 0000 1111: 111111gg: mod mmxreg r/m | | | | |
| PADDS - Add signed with saturation | | Y | Y | N | N |
| mmxreg2 to mmxreg1 | 0000 1111: 111011gg: 11 mmxreg1 mmxreg2 | | | | |
| memory to mmxreg | 0000 1111: 111011gg: mod mmxreg r/m | | | | |
Table B-14. MMX Instruction Formats and Encodings (Contd.)

| Instruction and Format | Encoding | B | W | D | Q |
|---|---|---|---|---|---|
| PADDUS - Add unsigned with saturation | | Y | Y | N | N |
| mmxreg2 to mmxreg1 | 0000 1111: 110111gg: 11 mmxreg1 mmxreg2 | | | | |
| memory to mmxreg | 0000 1111: 110111gg: mod mmxreg r/m | | | | |
| PAND - Bitwise And | | N | N | N | Y |
| mmxreg2 to mmxreg1 | 0000 1111:11011011: 11 mmxreg1 mmxreg2 | | | | |
| memory to mmxreg | 0000 1111:11011011: mod mmxreg r/m | | | | |
| PANDN - Bitwise AndNot | | N | N | N | Y |
| mmxreg2 to mmxreg1 | 0000 1111:11011111: 11 mmxreg1 mmxreg2 | | | | |
| memory to mmxreg | 0000 1111:11011111: mod mmxreg r/m | | | | |
| PCMPEQ - Packed compare for equality | | Y | Y | Y | N |
| mmxreg1 with mmxreg2 | 0000 1111:011101gg: 11 mmxreg1 mmxreg2 | | | | |
| mmxreg with memory | 0000 1111:011101gg: mod mmxreg r/m | | | | |
| PCMPGT - Packed compare greater (signed) | | Y | Y | Y | N |
| mmxreg1 with mmxreg2 | 0000 1111:011001gg: 11 mmxreg1 mmxreg2 | | | | |
| mmxreg with memory | 0000 1111:011001gg: mod mmxreg r/m | | | | |
| PMADD - Packed multiply add | | n/a | I | O | n/a |
| mmxreg2 to mmxreg1 | 0000 1111:11110101: 11 mmxreg1 mmxreg2 | | | | |
| memory to mmxreg | 0000 1111:11110101: mod mmxreg r/m | | | | |
| PMULH - Packed multiplication | | N | Y | N | N |
| mmxreg2 to mmxreg1 | 0000 1111:11100101: 11 mmxreg1 mmxreg2 | | | | |
| memory to mmxreg | 0000 1111:11100101: mod mmxreg r/m | | | | |
| PMULL - Packed multiplication | | N | Y | N | N |
| mmxreg2 to mmxreg1 | 0000 1111:11010101: 11 mmxreg1 mmxreg2 | | | | |
| memory to mmxreg | 0000 1111:11010101: mod mmxreg r/m | | | | |
| POR - Bitwise Or | | N | N | N | Y |
| mmxreg2 to mmxreg1 | 0000 1111:11101011: 11 mmxreg1 mmxreg2 | | | | |
| memory to mmxreg | 0000 1111:11101011: mod mmxreg r/m | | | | |

| Instruction and Format | Encoding | B | W | D | Q |
|---|---|---|---|---|---|
| PSLL (note 2) - Packed shift left logical | | N | Y | Y | Y |
| mmxreg1 by mmxreg2 | 0000 1111:111100gg: 11 mmxreg1 mmxreg2 | | | | |
| mmxreg by memory | 0000 1111:111100gg: mod mmxreg r/m | | | | |
| mmxreg by immediate | 0000 1111:011100gg: 11 110 mmxreg: imm8 data | | | | |
| PSRA (note 2) - Packed shift right arithmetic | | N | Y | Y | N |
| mmxreg1 by mmxreg2 | 0000 1111:111000gg: 11 mmxreg1 mmxreg2 | | | | |
| mmxreg by memory | 0000 1111:111000gg: mod mmxreg r/m | | | | |
| mmxreg by immediate | 0000 1111:011100gg: 11 100 mmxreg: imm8 data | | | | |
| PSRL (note 2) - Packed shift right logical | | N | Y | Y | Y |
| mmxreg1 by mmxreg2 | 0000 1111:110100gg: 11 mmxreg1 mmxreg2 | | | | |
| mmxreg by memory | 0000 1111:110100gg: mod mmxreg r/m | | | | |
| mmxreg by immediate | 0000 1111:011100gg: 11 010 mmxreg: imm8 data | | | | |
| PSUB - Subtract with wrap-around | | Y | Y | Y | N |
| mmxreg2 from mmxreg1 | 0000 1111:111110gg: 11 mmxreg1 mmxreg2 | | | | |
| memory from mmxreg | 0000 1111:111110gg: mod mmxreg r/m | | | | |
| PSUBS - Subtract signed with saturation | | Y | Y | N | N |
| mmxreg2 from mmxreg1 | 0000 1111:111010gg: 11 mmxreg1 mmxreg2 | | | | |
| memory from mmxreg | 0000 1111:111010gg: mod mmxreg r/m | | | | |
| PSUBUS - Subtract unsigned with saturation | | Y | Y | N | N |
| mmxreg2 from mmxreg1 | 0000 1111:110110gg: 11 mmxreg1 mmxreg2 | | | | |
| memory from mmxreg | 0000 1111:110110gg: mod mmxreg r/m | | | | |
| PUNPCKH - Unpack high data to next larger type | | Y | Y | Y | N |
| mmxreg2 to mmxreg1 | 0000 1111:011010gg: 11 mmxreg1 mmxreg2 | | | | |
| memory to mmxreg | 0000 1111:011010gg: mod mmxreg r/m | | | | |

| Instruction and Format | Encoding | B | W | D | Q |
|---|---|---|---|---|---|
| PUNPCKL - Unpack low data to next larger type | | Y | Y | Y | N |
| mmxreg2 to mmxreg1 | 0000 1111:011000gg: 11 mmxreg1 mmxreg2 | | | | |
| memory to mmxreg | 0000 1111:011000gg: mod mmxreg r/m | | | | |
| PXOR - Bitwise Xor | | N | N | N | Y |
| mmxreg2 to mmxreg1 | 0000 1111:11101111: 11 mmxreg1 mmxreg2 | | | | |
| memory to mmxreg | 0000 1111:11101111: mod mmxreg r/m | | | | |

NOTES:
1. The pack instructions perform saturation from signed packed data of one type to signed or unsigned data of the next smaller type.
2. The format of the shift instructions has one additional format to support shifting by immediate shift counts. The shift operations are not supported equally for all data types.
B.4. STREAMING SIMD EXTENSION FORMATS AND ENCODINGS TABLE

The nature of the Streaming SIMD Extensions allows the use of existing instruction formats. Instructions use the ModR/M format and are preceded by the 0F prefix byte. In general, operations are not duplicated to provide two directions (i.e., separate load and store variants).
B.4.1. Instruction Prefixes
The Streaming SIMD Extensions use prefixes as specified in Table B-15, Table B-16, and Table B-17. The effect of redundant prefixes (more than one prefix from a group) is undefined and may vary from processor to processor. Applying a prefix, in a manner not defined in this document, is considered reserved behavior. For example, Table B-15 shows general behavior for most Streaming SIMD Extensions; however, the application of a prefix (Repeat, Repeat NE, Operand Size) is reserved for the following instructions: ANDPS, ANDNPS, COMISS, FXRSTOR, FXSAVE, ORPS, LDMXCSR, MOVAPS, MOVHPS, MOVLPS, MOVMSKPS, MOVNTPS, MOVUPS, SHUFPS, STMXCSR, UCOMISS, UNPCKHPS, UNPCKLPS, XORPS.
Table B-15. Streaming SIMD Extensions Instruction Behavior with Prefixes

| Prefix Type | Effect on Streaming SIMD Extensions |
|---|---|
| Address Size Prefix (67H) | Affects Streaming SIMD Extensions with memory operand. Ignored by Streaming SIMD Extensions without memory operand. |
| Operand Size (66H) | Not supported and may result in undefined behavior. |
| Segment Override (2EH, 36H, 3EH, 26H, 64H, 65H) | Affects Streaming SIMD Extensions with memory operand. Ignored by Streaming SIMD Extensions without memory operand. |
| Repeat Prefix (F3H) | Affects Streaming SIMD Extensions. |
| Repeat NE Prefix (F2H) | Not supported and may result in undefined behavior. |
| Lock Prefix (0F0H) | Generates invalid opcode exception. |
Table B-16. SIMD Integer Instructions - Behavior with Prefixes

Prefix Type: Effect on MMX Instructions
Address Size Prefix (67H): Affects MMX instructions with mem. operand. Ignored by MMX instructions without mem. operand.
Operand Size (66H): Reserved and may result in unpredictable behavior.
Segment Override (2EH, 36H, 3EH, 26H, 64H, 65H): Affects MMX instructions with mem. operand. Ignored by MMX instructions without mem. operand.
Repeat Prefix (F3H): Reserved and may result in unpredictable behavior.
Repeat NE Prefix (F2H): Reserved and may result in unpredictable behavior.
Lock Prefix (0F0H): Generates invalid opcode exception.
Table B-17. Cacheability Control Instruction Behavior with Prefixes

Prefix Type: Effect on Streaming SIMD Extensions
Address Size Prefix (67H): Affects cacheability control instructions with a mem. operand. Ignored by cacheability control instructions without a mem. operand.
Operand Size (66H): Ignored by PREFETCH and SFENCE. Not supported and may result in undefined behavior with MOVNTPS. Ignored by MOVNTQ and MASKMOVQ.
Segment Override (2EH, 36H, 3EH, 26H, 64H, 65H): Affects cacheability control instructions with a mem. operand. Ignored by cacheability control instructions without a mem. operand.
Repeat Prefix (F3H): Ignored by PREFETCH and SFENCE instructions. Not supported and may result in undefined behavior with MOVNTPS. Ignored by MOVNTQ and MASKMOVQ.
Repeat NE Prefix (F2H): Ignored by PREFETCH and SFENCE instructions. Not supported and may result in undefined behavior with MOVNTPS. Ignored by MOVNTQ and MASKMOVQ.
Lock Prefix (0F0H): Generates an invalid opcode exception for all cacheability instructions.
B.4.2. Notations
Besides opcodes, two kinds of notation are used, both of which describe information found in the ModR/M byte:
/digit    (digit between 0 and 7) Indicates that the instruction uses only the r/m (register and memory) operand. The reg field contains the digit that provides an extension to the instruction's opcode.
/r        Indicates that the ModR/M byte of an instruction contains both a register operand and an r/m operand.
In addition, the following abbreviations are used:
r32       Intel Architecture 32-bit integer register.
xmm/m128  Indicates a 128-bit multimedia register or a 128-bit memory location.
xmm/m64   Indicates a 128-bit multimedia register or a 64-bit memory location.
xmm/m32   Indicates a 128-bit multimedia register or a 32-bit memory location.
mm/m64    Indicates a 64-bit multimedia register or a 64-bit memory location.
imm8      Indicates an immediate 8-bit operand.
ib        Indicates that an immediate byte operand follows the opcode, ModR/M byte or scaled-indexing byte.
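To make the /digit and /r notations concrete, here is a minimal C sketch that splits a ModR/M byte into its three fields and tests a /digit opcode extension. The names modrm_fields, modrm_split and modrm_matches_digit are hypothetical; only the field layout (mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0) comes from the architecture.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three ModR/M fields. */
typedef struct {
    unsigned mod;  /* bits 7-6: addressing mode, 3 means register form */
    unsigned reg;  /* bits 5-3: register operand (/r) or opcode digit (/digit) */
    unsigned rm;   /* bits 2-0: register or memory operand */
} modrm_fields;

modrm_fields modrm_split(uint8_t byte)
{
    modrm_fields f;
    f.mod = (byte >> 6) & 3u;
    f.reg = (byte >> 3) & 7u;
    f.rm  = byte & 7u;
    return f;
}

/* For a "/digit" instruction the reg field holds an opcode extension
 * rather than a register number; this checks for the expected digit. */
int modrm_matches_digit(uint8_t byte, unsigned digit)
{
    assert(digit < 8);
    return modrm_split(byte).reg == digit;
}
```

For a /r instruction, the reg and rm values returned by modrm_split name the two operands; for a /digit instruction only rm (together with mod) names an operand.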
When there is ambiguity, xmm1 indicates the first source operand and xmm2 the second source operand. Table B-18 describes the naming conventions used in the Streaming SIMD Extensions mnemonics.
Table B-18. Key to Streaming SIMD Extensions Naming Convention

Mnemonic  Description
PI        Packed integer qword (e.g., mm0)
PS        Packed single FP (e.g., xmm0)
SI        Scalar integer (e.g., eax)
SS        Scalar single-FP (e.g., low 32 bits of xmm0)
B.4.3. Formats and Encodings
The following three tables show the formats and encodings for Streaming SIMD Extensions for the data types supported: packed byte (B), packed word (W), packed doubleword (D), quadword (Q), and double quadword (DQ). Table B-19, Table B-20, and Table B-21 correspond respectively to SIMD floating-point, SIMD-Integer, and Cacheability Register Fields. Figure B-3 describes the nomenclature used in columns 3 through 7 of the tables.
Code  Meaning
Y     Supported
N     Not supported
O     Output
I     Input
n/a   Not Applicable
Figure B-3. Key to Codes for Streaming SIMD Extensions Data Type Cross-Reference
Table B-19. Encoding of the SIMD Floating-Point Register Field
Instruction and Format / Encoding (data types: B, W, D, Q, DQ)

ADDPS - Packed Single-FP Add (B: n/a, W: n/a, D: n/a, Q: n/a, DQ: Y)
    xmmreg to xmmreg          00001111:01011000:11 xmmreg1 xmmreg2
    mem to xmmreg             00001111:01011000: mod xmmreg r/m
ADDSS - Scalar Single-FP Add (B: n/a, W: n/a, D: Y, Q: n/a, DQ: n/a)
    xmmreg to xmmreg          11110011:00001111:01011000:11 xmmreg1 xmmreg2
    mem to xmmreg             11110011:00001111:01011000: mod xmmreg r/m
ANDNPS - Bit-wise Logical And Not for Single-FP (B: n/a, W: n/a, D: n/a, Q: n/a, DQ: Y)
    xmmreg to xmmreg          00001111:01010101:11 xmmreg1 xmmreg2
    mem to xmmreg             00001111:01010101: mod xmmreg r/m
ANDPS - Bit-wise Logical And for Single-FP (B: n/a, W: n/a, D: n/a, Q: n/a, DQ: Y)
    xmmreg to xmmreg          00001111:01010100:11 xmmreg1 xmmreg2
    mem to xmmreg             00001111:01010100: mod xmmreg r/m
CMPPS - Packed Single-FP Compare (B: n/a, W: n/a, D: n/a, Q: n/a, DQ: Y)
    xmmreg to xmmreg, imm8    00001111:11000010:11 xmmreg1 xmmreg2: imm8
    mem to xmmreg, imm8       00001111:11000010: mod xmmreg r/m: imm8

Instruction and Format CMPSS - Scalar SingleFP Compare xmmreg to xmmreg, imm8 mem to xmmreg, imm8 COMISS - Scalar Ordered Single-FP compare and set EFLAGS xmmreg to xmmreg mem to xmmreg CVTPI2PS - Packed signed INT32 to Packed Single-FP conversion mmreg to xmmreg mem to xmmreg CVTPS2PI - Packed Single-FP to Packed INT32 conversion xmmreg to mmreg mem to mmreg CVTSI2SS - Scalar signed INT32 to SingleFP conversion r32 to xmmreg1 mem to xmmreg CVTSS2SI - Scalar Single-FP to signed INT32 conversion xmmreg to r32 mem to r32 CVTTPS2PI - Packed Single-FP to Packed INT32 Conversion (truncate) xmmreg to mmreg mem to mmreg 00001111:00101100:11 mmreg1 xmmreg1 00001111:00101100: mod mmreg r/m 11110011:00001111:00101101:11 r32 xmmreg 11110011:00001111:00101101: mod r32 r/m n/a n/a n/a n/a Y 11110011:00001111:00101010:11 xmmreg r32 11110011:00001111:00101010: mod xmmreg r/m n/a n/a Y n/a n/a 00001111:00101101:11 mmreg1 xmmreg1 00001111:00101101: mod mmreg r/m n/a n/a Y n/a n/a 00001111:00101010:11 xmmreg1 mmreg1 00001111:00101010: mod xmmreg r/m n/a n/a n/a n/a Y 00001111:00101111:11 xmmreg1 xmmreg2 00001111:00101111: mod xmmreg r/m n/a n/a n/a n/a Y 11110011:00001111:11000010:11 xmmreg1 xmmreg2: imm8 11110011:00001111:11000010: mod xmmreg r/m: imm8 n/a n/a Y n/a n/a Encoding B n/a W n/a D Y Q n/a DQ n/a

Instruction and Format CVTTSS2SI - Scalar Single-FP to signed INT32 conversion (truncate) xmmreg to r32 mem to r32 DIVPS - Packed SingleFP Divide xmmreg to xmmreg mem to xmmreg DIVSS - Scalar Single-FP Divide xmmreg to xmmreg mem to xmmreg FXRSTOR - Restore FP/MMX and Streaming SIMD Extensions state FXSAVE - Store FP/MMX and Streaming SIMD Extensions state LDMXCSR - Load Streaming SIMD Extensions Technology Control/Status Register m32 to MXCSR MAXPS - Packed SingleFP Maximum xmmreg to xmmreg mem to xmmreg MAXSS - Scalar SingleFP Maximum xmmreg to xmmreg mem to xmmreg MINPS - Packed SingleFP Minimum xmmreg to xmmreg mem to xmmreg 00001111:01011101:11 xmmreg1 xmmreg2 00001111:01011101: mod xmmreg r/m 11110011:00001111:01011111:11 xmmreg1 xmmreg2 11110011:00001111:01011111: mod xmmreg r/m n/a n/a n/a n/a Y 00001111:01011111:11 xmmreg1 xmmreg2 00001111:01011111: mod xmmreg r/m n/a n/a Y n/a n/a 00001111:10101110:10 m32 n/a n/a n/a n/a Y 11110011:00001111:01011110: mod xmmreg r/m 00001111:10101110:01 m512 n/a n/a n/a n/a n/a 00001111:01011110:11 xmmreg1 xmmreg2 00001111:01011110: mod xmmreg r/m 11110011:00001111:01011110:11 xmmreg1 xmmreg2 n/a n/a Y n/a n/a 11110011:00001111:00101100:11 r32 xmmreg1 11110011:00001111:00101100: mod r32 r/m n/a n/a n/a n/a Y Encoding B n/a W n/a D Y Q n/a DQ n/a
FXSAVE m512: 00001111:10101110:00 m512 (B: n/a, W: n/a, D: n/a, Q: n/a, DQ: n/a)

Instruction and Format MINSS - Scalar Single-FP Minimum xmmreg to xmmreg mem to xmmreg MOVAPS - Move Aligned Four Packed Single-FP xmmreg2 to xmmreg1 mem to xmmreg1 xmmreg1 to xmmreg2 xmmreg1 to mem MOVHLPS - Move High to Low Packed Single-FP xmmreg to xmmreg MOVHPS - Move High Packed Single-FP mem to xmmreg xmmreg to mem MOVLHPS - Move Low to High Packed Single-FP xmmreg to xmmreg MOVLPS - Move Low Packed Single-FP mem to xmmreg xmmreg to mem MOVMSKPS - Move Mask To Integer xmmreg to r32 MOVSS - Move Scalar Single-FP xmmreg2 to xmmreg1 mem to xmmreg1 xmmreg1 to xmmreg2 xmmreg1 to mem 11110011:00001111:00010000:11 xmmreg2 xmmreg1 11110011:00001111:00010000: mod xmmreg r/m 11110011:00001111:00010000:11 xmmreg1 xmmreg2 11110011:00001111:00010000: mod xmmreg r/m 00001111:01010000:11 r32 xmmreg n/a n/a Y n/a n/a 00001111:00010010: mod xmmreg r/m 00001111:00010011: mod xmmreg r/m n/a n/a n/a n/a Y n/a n/a n/a Y n/a 00001111:00010110:11 xmmreg1 xmmreg2 00001111:00010110: mod xmmreg r/m 00001111:00010111: mod xmmreg r/m n/a n/a n/a Y n/a 00001111:00010010:11 xmmreg1 xmmreg2 n/a n/a n/a Y n/a 00001111:00101000:11 xmmreg2 xmmreg1 00001111:00101000: mod xmmreg r/m 00001111:00101001:11 xmmreg1 xmmreg2 00001111:00101001: mod xmmreg r/m n/a n/a n/a Y n/a 11110011:00001111:01011101:11 xmmreg1 xmmreg2 11110011:00001111:01011101: mod xmmreg r/m n/a n/a n/a n/a Y Encoding B n/a W n/a D Y Q n/a DQ n/a

Instruction and Format MOVUPS - Move Unaligned Four Packed Single-FP xmmreg2 to xmmreg1 mem to xmmreg1 xmmreg1 to xmmreg2 xmmreg1 to mem MULPS - Packed SingleFP Multiply xmmreg to xmmreg mem to xmmreg MULSS - Scalar SingleFP Multiply xmmreg to xmmreg mem to xmmreg ORPS: Bit-wise Logical OR for Single-FP Data xmmreg to xmmreg mem to xmmreg RCPPS - Packed SingleFP Reciprocal xmmreg to xmmreg mem to xmmreg RCPSS - Scalar SingleFP Reciprocal xmmreg to xmmreg mem to xmmreg RSQRTPS - Packed Single-FP Square Root Reciprocal xmmreg to xmmreg mem to xmmreg 00001111:01010010:11 xmmreg1 xmmreg2 00001111:01010010 mode xmmreg r/m 11110011:00001111:01010011:11 xmmreg1 xmmreg2 11110011:00001111:01010011: mod xmmreg r/m n/a n/a n/a n/a Y 00001111:01010011:11 xmmreg1 xmmreg2 00001111:01010011: mod xmmreg r/m n/a n/a Y n/a n/a 00001111:01010110:11 xmmreg1 xmmreg2 00001111:01010110 mod xmmreg r/m n/a n/a n/a n/a Y 11110011:00001111:010111001:11 xmmreg1 xmmreg2 11110011:00001111:010111001: mod xmmreg r/m n/a n/a n/a n/a Y 00001111:01011001:11 xmmreg1 xmmreg2 00001111:01011001: mod xmmreg rm n/a n/a Y n/a n/a 00001111:00010000:11 xmmreg2 xmmreg1 00001111:00010000: mod xmmreg r/m 00001111:00010001:11 xmmreg1 xmmreg2 00001111:00010001: mod xmmreg r/m n/a n/a n/a n/a Y Encoding B n/a W n/a D n/a Q n/a DQ Y

Instruction and Format RSQRTSS - Scalar Single-FP Square Root Reciprocal xmmreg to xmmreg mem to xmmreg SHUFPS - Shuffle SingleFP xmmreg to xmmreg, imm8 mem to xmmreg, imm8 SQRTPS - Packed Single-FP Square Root xmmreg to xmmreg mem to xmmreg SQRTSS - Scalar SingleFP Square Root xmmreg to xmmreg mem to xmmreg STMXCSR - Store Streaming SIMD Extensions Technology Control/Status Register MXCSR to mem SUBPS: Packed SingleFP Subtract xmmreg to xmmreg mem to xmmreg SUBSS: Scalar Single-FP Subtract xmmreg to xmmreg mem to xmmreg UCOMISS: Unordered Scalar Single-FP compare and set EFLAGS xmmreg to xmmreg mem to xmmreg 00001111:00101110:11 xmmreg1 xmmreg2 00001111:00101110 mod xmmreg r/m 11110011:00001111:01011100:11 xmmreg1 xmmreg2 11110011:00001111:01011100 mod xmmreg r/m n/a n/a Y n/a n/a 00001111:01011100:11 xmmreg1 xmmreg2 00001111:01011100 mod xmmreg r/m n/a n/a Y n/a n/a 00001111:10101110:11 m32 n/a n/a n/a n/a Y 01010011:00001111:01010001:11 xmmreg1 xmmreg 2 01010011:00001111:01010001 mod xmmreg r/m n/a n/a Y n/a n/a 00001111:01010001:11 xmmreg1 xmmreg 2 00001111:01010001 mod xmmreg r/m n/a n/a Y n/a n/a 00001111:11000110:11 xmmreg1 xmmreg2: imm8 00001111:11000110: mod xmmreg r/m: imm8 n/a n/a n/a n/a Y 11110011:00001111:01010010:11 xmmreg1 xmmreg2 11110011:00001111:01010010 mod xmmreg r/m n/a n/a n/a n/a Y Encoding B n/a W n/a D Y Q n/a DQ n/a

Instruction and Format UNPCKHPS: Unpack High Packed Single-FP Data xmmreg to xmmreg mem to xmmreg UNPCKLPS: Unpack Low Packed Single-FP Data xmmreg to xmmreg mem to xmmreg XORPS: Bit-wise Logical Xor for Single-FP Data xmmreg to xmmreg mem to xmmreg 00001111:01010111:11 xmmreg1 xmmreg2 00001111:01010111 mod xmmreg r/m 00001111:00010100:11 xmmreg1 xmmreg2 00001111:00010100 mod xmmreg r/m n/a n/a n/a n/a Y 00001111:00010101:11 xmmreg1 xmmreg2 00001111:00010101 mod xmmreg r/m n/a n/a n/a n/a Y Encoding B n/a W n/a D n/a Q n/a DQ Y
Table B-20. Encoding of the SIMD-Integer Register Field

Instruction and Format PAVGB/PAVGW - Packed Average mmreg to mmreg 00001111:11100000:11 mmreg1 mmreg2 00001111:11100011:11 mmreg1 mmreg2 mem to mmreg 00001111:11100000 mod mmreg r/m 00001111:11100011 mod mmreg r/m PEXTRW - Extract Word mmreg to reg32, imm8 PINSRW - Insert Word reg32 to mmreg, imm8 m16 to mmreg, imm8 PMAXSW - Packed Signed Integer Word Maximum mmreg to mmreg mem to mmreg PMAXUB - Packed Unsigned Integer Byte Maximum mmreg to mmreg mem to mmreg PMINSW - Packed Signed Integer Word Minimum mmreg to mmreg mem to mmreg PMINUB - Packed Unsigned Integer Byte Minimum mmreg to mmreg mem to mmreg PMOVMSKB - Move Byte Mask To Integer mmreg to reg32 00001111:11010111:11 mmreg1 r32 00001111:11011010:11 mmreg1 mmreg2 00001111:11011010 mod mmreg r/m O n/a n/a I n/a 00001111:11101010:11 mmreg1 mmreg2 00001111:11101010 mod mmreg r/m Y n/a n/a n/a n/a 00001111:11011110:11 mmreg1 mmreg2 00001111:11011110 mod mmreg r/m n/a Y n/a n/a n/a 00001111:11101110:11 mmreg1 mmreg2 00001111:11101110 mod mmreg r/m Y n/a n/a n/a n/a 00001111:11000100:11 r32 mmreg1: imm8 00001111:11000100 mod mmreg r/m: imm8 n/a Y n/a n/a n/a 00001111:11000101:11 mmreg r32: imm8 n/a Y n/a n/a n/a n/a Y n/a n/a n/a Encoding B Y W Y D n/a Q n/a DQ n/a

Instruction and Format PMULHUW - Packed Multiply High Unsigned mmreg to mmreg mem to mmreg PSADBW - Packed Sum of Absolute Differences mmreg to mmreg mem to mmreg PSHUFW - Packed Shuffle Word mmreg to mmreg, imm8 mem to mmreg, imm8 00001111:01110000:11 mmreg1 mmreg2: imm8 00001111:01110000:11 mod mmreg r/m: imm8 00001111:11110110:11 mmreg1 mmreg2 00001111:11110110 mod mmreg r/m n/a Y n/a I n/a 00001111:11100100:11 mmreg1 mmreg2 00001111:11100100 mod mmreg r/m I O n/a Y n/a Encoding B n/a W O D n/a Q I DQ n/a
Table B-21. Encoding of the Streaming SIMD Extensions Cacheability Control Register Field
Instruction and Format / Encoding (data types: B, W, D, Q, DQ)

MASKMOVQ - Byte Mask Write (B: n/a, W: n/a, D: n/a, Q: Y, DQ: n/a)
    mmreg to mmreg    00001111:11110111:11 mmreg1 mmreg2
MOVNTPS - Move Aligned Four Packed Single-FP Non Temporal (B: n/a, W: n/a, D: n/a, Q: n/a, DQ: Y)
    xmmreg to mem     00001111:00101011 mod xmmreg r/m
MOVNTQ - Move 64 Bits Non Temporal (B: n/a, W: n/a, D: n/a, Q: Y, DQ: n/a)
    mmreg to mem      00001111:11100111 mod mmreg r/m
PREFETCHT0 - Prefetch to all cache levels (B: Y, W: Y, D: Y, Q: Y, DQ: Y)
    00001111:00011000:01 mem
PREFETCHT1 - Prefetch to all cache levels (B: Y, W: Y, D: Y, Q: Y, DQ: Y)
    00001111:00011000:10 mem
PREFETCHT2 - Prefetch to L2 cache (B: Y, W: Y, D: Y, Q: Y, DQ: Y)
    00001111:00011000:11 mem
PREFETCHNTA - Prefetch to L1 cache (B: Y, W: Y, D: Y, Q: Y, DQ: Y)
    00001111:00011000:00 mem
SFENCE - Store Fence (B: Y, W: Y, D: Y, Q: Y, DQ: Y)
    00001111:10101110:11111000
B.5. FLOATING-POINT INSTRUCTION FORMATS AND ENCODINGS

Table B-22 shows the five different formats used for floating-point instructions. In all cases, instructions are at least two bytes long and begin with the bit pattern 11011.
Table B-22. General Floating-Point Instruction Formats
Format  First Byte (bits 15-11, 10, 9, 8)   Second Byte (bits 7-0)   Optional Fields
1       11011  OPA  1                       mod  1  OPB  r/m         s-i-b  disp
2       11011  MF  OPA                      mod  OPB  r/m            s-i-b  disp
3       11011  d  P  OPA                    1 1  OPB  R  ST(i)
4       11011  0  0  1                      1 1  1  OP
5       11011  0  1  1                      1 1  1  OP
MF = Memory Format
    00 = 32-bit real
    01 = 32-bit integer
    10 = 64-bit real
    11 = 16-bit integer
P = Pop
    0 = Do not pop stack
    1 = Pop stack after operation
d = Destination
    0 = Destination is ST(0)
    1 = Destination is ST(i)
R XOR d = 0: Destination OP Source
R XOR d = 1: Source OP Destination
ST(i) = Register stack element i
    000 = Stack Top
    001 = Second stack element
    111 = Eighth stack element
The Mod and R/M fields of the ModR/M byte have the same interpretation as the corresponding fields of the integer instructions. The SIB byte and disp (displacement) are optionally present in instructions that have Mod and R/M fields. Their presence depends on the values of Mod and R/M, as for integer instructions. Table B-23 shows the formats and encodings of the floating-point instructions.
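The field layout described above can be exercised with a short C sketch. It uses the register form of FADD listed in Table B-23 (11011 d00 : 11 000 ST(i)) as its example; the function names is_fp_escape and fadd_st_encode are hypothetical and serve only to illustrate how the d bit and the ST(i) field are placed.

```c
#include <assert.h>
#include <stdint.h>

/* Every floating-point instruction begins with the bit pattern 11011,
 * so the top five bits of the first byte must equal 1BH. */
int is_fp_escape(uint8_t first_byte)
{
    return (first_byte >> 3) == 0x1B;
}

/* Hypothetical encoder for the register form of FADD.
 * d = 0: destination is ST(0); d = 1: destination is ST(i). */
void fadd_st_encode(unsigned d, unsigned i, uint8_t out[2])
{
    assert(d < 2 && i < 8);
    out[0] = (uint8_t)(0xD8 | (d << 2));  /* 11011 d00 */
    out[1] = (uint8_t)(0xC0 | i);         /* 11 000 ST(i) */
}
```

With d = 0 and i = 3 the sketch produces the bytes D8 C3, and with d = 1 it produces DC C3, which differ only in the destination bit.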
Table B-23. Floating-Point Instruction Formats and Encodings

Instruction and Format F2XM1 Compute 2ST(0) 1 FABS Absolute Value FADD Add ST(0) ST(0) + 32-bit memory ST(0) ST(0) + 64-bit memory ST(d) ST(0) + ST(i) FADDP Add and Pop ST(0) ST(0) + ST(i) FBLD Load Binary Coded Decimal FBSTP Store Binary Coded Decimal and Pop FCHS Change Sign FCLEX Clear Exceptions FCMOVcc Conditional Move on EFLAG Register Condition Codes move if below (B) move if equal (E) move if below or equal (BE) move if unordered (U) move if not below (NB) move if not equal (NE) move if not below or equal (NBE) move if not unordered (NU) FCOM Compare Real 32-bit memory 64-bit memory ST(i) FCOMP Compare Real and Pop 32-bit memory 64-bit memory ST(i) FCOMPP Compare Real and Pop Twice FCOMI Compare Real and Set EFLAGS FCOMIP Compare Real, Set EFLAGS, and Pop FCOS Cosine of ST(0) FDECSTP Decrement Stack-Top Pointer FDIV Divide ST(0) ST(0) 32-bit memory ST(0) ST(0) 64-bit memory ST(d) ST(0) ST(i) FDIVP Divide and Pop ST(0) ST(0) ST(i) 11011 110 : 1111 1 ST(i) 11011 000 : mod 110 r/m 11011 100 : mod 110 r/m 11011 d00 : 1111 R ST(i) 11011 000 : mod 011 r/m 11011 100 : mod 011 r/m 11011 000 : 11 011 ST(i) 11011 110 : 11 011 001 11011 011 : 11 110 ST(i) 11011 111 : 11 110 ST(i) 11011 001 : 1111 1111 11011 001 : 1111 0110 11011 000 : mod 010 r/m 11011 100 : mod 010 r/m 11011 000 : 11 010 ST(i) 11011 010 : 11 000 ST(i) 11011 010 : 11 001 ST(i) 11011 010 : 11 010 ST(i) 11011 010 : 11 011 ST(i) 11011 011 : 11 000 ST(i) 11011 011 : 11 001 ST(i) 11011 011 : 11 010 ST(i) 11011 011 : 11 011 ST(i) 11011 110 : 11 000 ST(i) 11011 111 : mod 100 r/m 11011 111 : mod 110 r/m 11011 001 : 1110 0000 11011 011 : 1110 0010 11011 000 : mod 000 r/m 11011 100 : mod 000 r/m 11011 d00 : 11 000 ST(i) Encoding 11011 001 : 1111 0000 11011 001 : 1110 0001

Instruction and Format FDIVR Reverse Divide ST(0) 32-bit memory ST(0) ST(0) 64-bit memory ST(0) ST(d) ST(i) ST(0) FDIVRP Reverse Divide and Pop ST(0) ST(i) ST(0) FFREE Free ST(i) Register FIADD Add Integer ST(0) ST(0) + 16-bit memory ST(0) ST(0) + 32-bit memory FICOM Compare Integer 16-bit memory 32-bit memory FICOMP Compare Integer and Pop 16-bit memory 32-bit memory FIDIV ST(0) ST(0) + 16-bit memory ST(0) ST(0) + 32-bit memory FIDIVR ST(0) ST(0) + 16-bit memory ST(0) ST(0) + 32-bit memory FILD Load Integer 16-bit memory 32-bit memory 64-bit memory FIMUL ST(0) ST(0) + 16-bit memory ST(0) ST(0) + 32-bit memory FINCSTP Increment Stack Pointer FINIT Initialize Floating-Point Unit FIST Store Integer 16-bit memory 32-bit memory FISTP Store Integer and Pop 16-bit memory 32-bit memory 64-bit memory FISUB ST(0) ST(0) + 16-bit memory ST(0) ST(0) + 32-bit memory 11011 110 : mod 100 r/m 11011 010 : mod 100 r/m 11011 111 : mod 011 r/m 11011 011 : mod 011 r/m 11011 111 : mod 111 r/m 11011 111 : mod 010 r/m 11011 011 : mod 010 r/m 11011 110 : mod 001 r/m 11011 010 : mod 001 r/m 11011 001 : 1111 0111 11011 111 : mod 000 r/m 11011 011 : mod 000 r/m 11011 111 : mod 101 r/m 11011 110 : mod 111 r/m 11011 010 : mod 111 r/m 11011 110 : mod 110 r/m 11011 010 : mod 110 r/m 11011 110 : mod 011 r/m 11011 010 : mod 011 r/m 11011 110 : mod 010 r/m 11011 010 : mod 010 r/m 11011 110 : mod 000 r/m 11011 010 : mod 000 r/m 11011 110 : 1111 0 ST(i) 11011 101 : 1100 0 ST(i) 11011 000 : mod 111 r/m 11011 100 : mod 111 r/m 11011 d00 : 1111 R ST(i) Encoding

Instruction and Format FISUBR ST(0) ST(0) + 16-bit memory ST(0) ST(0) + 32-bit memory FLD Load Real 32-bit memory 64-bit memory 80-bit memory ST(i) FLD1 Load +1.0 into ST(0) FLDCW Load Control Word FLDENV Load FPU Environment FLDL2E Load log2() into ST(0) FLDL2T Load log2(10) into ST(0) FLDLG2 Load log10(2) into ST(0) FLDLN2 Load log(2) into ST(0) FLDPI Load into ST(0) FLDZ Load +0.0 into ST(0) FMUL Multiply ST(0) ST(0) 32-bit memory ST(0) ST(0) 64-bit memory ST(d) ST(0) ST(i) FMULP Multiply ST(0) ST(0) ST(i) FNOP No Operation FPATAN Partial Arctangent FPREM Partial Remainder FPREM1 Partial Remainder (IEEE) FPTAN Partial Tangent FRNDINT Round to Integer FRSTOR Restore FPU State FSAVE Store FPU State FSCALE Scale FSIN Sine FSINCOS Sine and Cosine FSQRT Square Root FST Store Real 32-bit memory 64-bit memory ST(i) FSTCW Store Control Word FSTENV Store FPU Environment 11011 001 : mod 010 r/m 11011 101 : mod 010 r/m 11011 101 : 11 010 ST(i) 11011 001 : mod 111 r/m 11011 001 : mod 110 r/m 11011 110 : 1100 1 ST(i) 11011 001 : 1101 0000 11011 001 : 1111 0011 11011 001 : 1111 1000 11011 001 : 1111 0101 11011 001 : 1111 0010 11011 001 : 1111 1100 11011 101 : mod 100 r/m 11011 101 : mod 110 r/m 11011 001 : 1111 1101 11011 001 : 1111 1110 11011 001 : 1111 1011 11011 001 : 1111 1010 11011 000 : mod 001 r/m 11011 100 : mod 001 r/m 11011 d00 : 1100 1 ST(i) 11011 001 : mod 000 r/m 11011 101 : mod 000 r/m 11011 011 : mod 101 r/m 11011 001 : 11 000 ST(i) 11011 001 : 1110 1000 11011 001 : mod 101 r/m 11011 001 : mod 100 r/m 11011 001 : 1110 1010 11011 001 : 1110 1001 11011 001 : 1110 1100 11011 001 : 1110 1101 11011 001 : 1110 1011 11011 001 : 1110 1110 11011 110 : mod 101 r/m 11011 010 : mod 101 r/m Encoding

Instruction and Format FSTP Store Real and Pop 32-bit memory 64-bit memory 80-bit memory ST(i) FSTSW Store Status Word into AX FSTSW Store Status Word into Memory FSUB Subtract ST(0) ST(0) 32-bit memory ST(0) ST(0) 64-bit memory ST(d) ST(0) ST(i) FSUBP Subtract and Pop ST(0) ST(0) ST(i) FSUBR Reverse Subtract ST(0) 32-bit memory ST(0) ST(0) 64-bit memory ST(0) ST(d) ST(i) ST(0) FSUBRP Reverse Subtract and Pop ST(i) ST(i) ST(0) FTST Test FUCOM Unordered Compare Real FUCOMP Unordered Compare Real and Pop FUCOMPP Unordered Compare Real and Pop Twice FUCOMI Unordered Compare Real and Set EFLAGS FUCOMIP Unordered Compare Real, Set EFLAGS, and Pop FXAM Examine FXCH Exchange ST(0) and ST(i) FXTRACT Extract Exponent and Significand FYL2X ST(1) log2(ST(0)) FYL2XP1 ST(1) log2(ST(0) + 1.0) FWAIT Wait until FPU Ready 11011 110 : 1110 0 ST(i) 11011 001 : 1110 0100 11011 101 : 1110 0 ST(i) 11011 101 : 1110 1 ST(i) 11011 010 : 1110 1001 11011 011 : 11 101 ST(i) 11011 111 : 11 101 ST(i) 11011 001 : 1110 0101 11011 001 : 1100 1 ST(i) 11011 001 : 1111 0100 11011 001 : 1111 0001 11011 001 : 1111 1001 1001 1011 11011 000 : mod 101 r/m 11011 100 : mod 101 r/m 11011 d00 : 1110 R ST(i) 11011 110 : 1110 1 ST(i) 11011 000 : mod 100 r/m 11011 100 : mod 100 r/m 11011 d00 : 1110 R ST(i) 11011 001 : mod 011 r/m 11011 101 : mod 011 r/m 11011 011 : mod 111 r/m 11011 101 : 11 011 ST(i) 11011 111 : 1110 0000 11011 101 : mod 111 r/m Encoding
APPENDIX C COMPILER INTRINSICS AND FUNCTIONAL EQUIVALENTS

The two tables in this appendix itemize the Intel C/C++ compiler intrinsics and functional equivalents for the MMX technology instructions and Streaming SIMD Extensions. There may be additional intrinsics that do not have an instruction equivalent. It is strongly recommended that the reader consult the compiler documentation for the complete list of supported intrinsics. Please refer to the Intel C/C++ Compiler User's Guide for Win32* Systems With Streaming SIMD Extension Support (Order Number 718195-00B). Appendix C catalogs use of these intrinsics. Section 3.1.3., Intel C/C++ Compiler Intrinsics Equivalent, of Chapter 3, Instruction Set Reference, has more general supporting information for the following tables.
Table C-1 presents simple intrinsics, and Table C-2 presents composite intrinsics. Some intrinsics are composites because they require more than one instruction to implement them.
Most of the intrinsics that use __m64 operands have two different names. If two intrinsic names are shown for the same equivalent, the first name is the intrinsic for Intel C/C++ Compiler versions prior to 4.0 and the second name should be used with the Intel C/C++ Compiler version 4.0 and future versions. The Intel C/C++ Compiler version 4.0 will support the old intrinsic names: programs written using pre-4.0 intrinsic names will compile with version 4.0, but version 4.0 intrinsic names will not compile on pre-4.0 compilers.
Intel C/C++ Compiler version 4.0 names reflect the following naming conventions:
a "_mm" prefix, followed by a plain spelling of the operation or the actual instruction's mnemonic, followed by a suffix indicating the operand type.
Since there are many different types of integer data that can be contained within a __m64 data item, the following convention is used:
s - indicates scalar
p - indicates packed
i - indicates signed integer, or in some instructions where the sign does not matter, this is the default
u - indicates an unsigned integer
8, 16, 32, or 64 - the bit size of the data elements.
For example, _mm_add_pi8 indicates addition of packed, 8-bit integers; _mm_slli_pi32() is a logical left shift with an immediate shift count (the "i" after the name) of a packed, 32-bit integer.
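To show what "packed, 8-bit integers" means in practice, the following C sketch models the arithmetic of _mm_add_pi8 in portable scalar code. It is not the intrinsic itself: the function name model_add_pi8 is hypothetical, a uint64_t stands in for the __m64 data item, and the sketch assumes each 8-bit lane wraps around independently with no carry into its neighbor.

```c
#include <stdint.h>

/* Portable model (illustration only) of adding eight packed 8-bit
 * integers held in a 64-bit value. Lane 0 is the least significant byte. */
uint64_t model_add_pi8(uint64_t m1, uint64_t m2)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 8; ++lane) {
        uint8_t x = (uint8_t)(m1 >> (8 * lane));
        uint8_t y = (uint8_t)(m2 >> (8 * lane));
        uint8_t sum = (uint8_t)(x + y);        /* wraps modulo 256 */
        result |= (uint64_t)sum << (8 * lane);
    }
    return result;
}
```

In this model, adding 01H to a lane holding FFH yields 00H in that lane and leaves the adjacent lane untouched, which is what distinguishes a packed add from an ordinary 64-bit add.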
C.1. SIMPLE INTRINSICS

Table C-1. Simple Intrinsics
Mnemonic / Intrinsic: Description

ADDPS
    __m128 _mm_add_ps(__m128 a, __m128 b): Adds the four SP FP values of a and b.
ADDSS
    __m128 _mm_add_ss(__m128 a, __m128 b): Adds the lower SP FP (single-precision, floating-point) values of a and b; the upper three SP FP values are passed through from a.
ANDNPS
    __m128 _mm_andnot_ps(__m128 a, __m128 b): Computes the bitwise AND-NOT of the four SP FP values of a and b.
CMPPS
    __m128 _mm_cmpeq_ps(__m128 a, __m128 b): Compare for equality.
    __m128 _mm_cmplt_ps(__m128 a, __m128 b): Compare for less-than.
    __m128 _mm_cmple_ps(__m128 a, __m128 b): Compare for less-than-or-equal.
    __m128 _mm_cmpgt_ps(__m128 a, __m128 b): Compare for greater-than.
    __m128 _mm_cmpge_ps(__m128 a, __m128 b): Compare for greater-than-or-equal.
    __m128 _mm_cmpneq_ps(__m128 a, __m128 b): Compare for inequality.
    __m128 _mm_cmpnlt_ps(__m128 a, __m128 b): Compare for not-less-than.
    __m128 _mm_cmpngt_ps(__m128 a, __m128 b): Compare for not-greater-than.
    __m128 _mm_cmpnge_ps(__m128 a, __m128 b): Compare for not-greater-than-or-equal.
    __m128 _mm_cmpord_ps(__m128 a, __m128 b): Compare for ordered.
    __m128 _mm_cmpunord_ps(__m128 a, __m128 b): Compare for unordered.
    __m128 _mm_cmpnle_ps(__m128 a, __m128 b): Compare for not-less-than-or-equal.
CMPSS
    __m128 _mm_cmpeq_ss(__m128 a, __m128 b): Compare for equality.
    __m128 _mm_cmplt_ss(__m128 a, __m128 b): Compare for less-than.
    __m128 _mm_cmple_ss(__m128 a, __m128 b): Compare for less-than-or-equal.
    __m128 _mm_cmpgt_ss(__m128 a, __m128 b): Compare for greater-than.
    __m128 _mm_cmpge_ss(__m128 a, __m128 b): Compare for greater-than-or-equal.
    __m128 _mm_cmpneq_ss(__m128 a, __m128 b): Compare for inequality.
    __m128 _mm_cmpnlt_ss(__m128 a, __m128 b): Compare for not-less-than.
    __m128 _mm_cmpnle_ss(__m128 a, __m128 b): Compare for not-less-than-or-equal.
    __m128 _mm_cmpngt_ss(__m128 a, __m128 b): Compare for not-greater-than.
    __m128 _mm_cmpnge_ss(__m128 a, __m128 b): Compare for not-greater-than-or-equal.
    __m128 _mm_cmpord_ss(__m128 a, __m128 b): Compare for ordered.
    __m128 _mm_cmpunord_ss(__m128 a, __m128 b): Compare for unordered.
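The difference between the packed (_ps) and scalar (_ss) forms described for ADDPS and ADDSS can be modeled in plain C. The vec4 type and the model_add_ps and model_add_ss functions below are hypothetical stand-ins for __m128 and the real intrinsics; element f[0] is assumed to be the "lower" SP FP value.

```c
/* Hypothetical stand-in for __m128: four single-precision values,
 * with f[0] as the lowest element. */
typedef struct {
    float f[4];
} vec4;

/* Model of the packed form: all four elements are added. */
vec4 model_add_ps(vec4 a, vec4 b)
{
    vec4 r;
    for (int i = 0; i < 4; ++i) {
        r.f[i] = a.f[i] + b.f[i];
    }
    return r;
}

/* Model of the scalar form: only the lower element is added;
 * the upper three elements are passed through from a. */
vec4 model_add_ss(vec4 a, vec4 b)
{
    vec4 r = a;
    r.f[0] = a.f[0] + b.f[0];
    return r;
}
```

The same packed-versus-scalar pattern applies to the other _ps and _ss pairs in Table C-1, such as the divide, maximum, and minimum intrinsics.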

COMISS
    int _mm_comieq_ss(__m128 a, __m128 b): Compares the lower SP FP value of a and b for a equal to b. If a and b are equal, 1 is returned. Otherwise 0 is returned.
    int _mm_comilt_ss(__m128 a, __m128 b): Compares the lower SP FP value of a and b for a less than b. If a is less than b, 1 is returned. Otherwise 0 is returned.
    int _mm_comile_ss(__m128 a, __m128 b): Compares the lower SP FP value of a and b for a less than or equal to b. If a is less than or equal to b, 1 is returned. Otherwise 0 is returned.
    int _mm_comigt_ss(__m128 a, __m128 b): Compares the lower SP FP value of a and b for a greater than b. If a is greater than b, 1 is returned. Otherwise 0 is returned.
    int _mm_comige_ss(__m128 a, __m128 b): Compares the lower SP FP value of a and b for a greater than or equal to b. If a is greater than or equal to b, 1 is returned. Otherwise 0 is returned.
    int _mm_comineq_ss(__m128 a, __m128 b): Compares the lower SP FP value of a and b for a not equal to b. If a and b are not equal, 1 is returned. Otherwise 0 is returned.
CVTPI2PS
    __m128 _mm_cvt_pi2ps(__m128 a, __m64 b) / __m128 _mm_cvtpi32_ps(__m128 a, __m64 b): Convert the two 32-bit integer values in packed form in b to two SP FP values; the upper two SP FP values are passed through from a.
CVTPS2PI
    __m64 _mm_cvt_ps2pi(__m128 a) / __m64 _mm_cvtps_pi32(__m128 a): Convert the two lower SP FP values of a to two 32-bit integers according to the current rounding mode, returning the integers in packed form.
CVTSI2SS
    __m128 _mm_cvt_si2ss(__m128 a, int b) / __m128 _mm_cvtsi32_ss(__m128 a, int b): Convert the 32-bit integer value b to an SP FP value; the upper three SP FP values are passed through from a.
CVTSS2SI
    int _mm_cvt_ss2si(__m128 a) / int _mm_cvtss_si32(__m128 a): Convert the lower SP FP value of a to a 32-bit integer according to the current rounding mode.
CVTTPS2PI
    __m64 _mm_cvtt_ps2pi(__m128 a) / __m64 _mm_cvttps_pi32(__m128 a): Convert the two lower SP FP values of a to two 32-bit integers with truncation, returning the integers in packed form.
CVTTSS2SI
    int _mm_cvtt_ss2si(__m128 a) / int _mm_cvttss_si32(__m128 a): Convert the lower SP FP value of a to a 32-bit integer with truncation.
(mnemonic not given)
    __m64 _m_from_int(int i) / __m64 _mm_cvtsi32_si64(int i): Convert the integer object i to a 64-bit __m64 object. The integer value is zero extended to 64 bits.
    int _m_to_int(__m64 m) / int _mm_cvtsi64_si32(__m64 m): Convert the lower 32 bits of the __m64 object m to an integer.
DIVPS
    __m128 _mm_div_ps(__m128 a, __m128 b): Divides the four SP FP values of a and b.
DIVSS
    __m128 _mm_div_ss(__m128 a, __m128 b): Divides the lower SP FP values of a and b; the upper three SP FP values are passed through from a.
EMMS
    void _m_empty() / void _mm_empty(): Clears the MMX technology state.

Mnemonic
LDMXCSR MASKMOVQ
Intrinsic
_mm_setcsr(unsigned int i) void _m_maskmovq(__m64 d, __m64 n, char * p) void _mm_maskmove_si64(__m64 d, __m64 n, char *p)
Description
Sets the control register to the value specified. Conditionally store byte elements of d to address p. The high bit of each byte in the selector n determines whether the corresponding byte in d will be stored.

MAXPS
    __m128 _mm_max_ps(__m128 a, __m128 b)
        Computes the maximums of the four SP FP values of a and b.
MAXSS
    __m128 _mm_max_ss(__m128 a, __m128 b)
        Computes the maximum of the lower SP FP values of a and b; the upper three SP FP values are passed through from a.
MINPS
    __m128 _mm_min_ps(__m128 a, __m128 b)
        Computes the minimums of the four SP FP values of a and b.
MINSS
    __m128 _mm_min_ss(__m128 a, __m128 b)
        Computes the minimum of the lower SP FP values of a and b; the upper three SP FP values are passed through from a.
MOVAPS
    __m128 _mm_load_ps(float *p)
        Loads four SP FP values. The address must be 16-byte-aligned.
    void _mm_store_ps(float *p, __m128 a)
        Stores four SP FP values. The address must be 16-byte-aligned.
MOVHLPS
    __m128 _mm_movehl_ps(__m128 a, __m128 b)
        Moves the upper 2 SP FP values of b to the lower 2 SP FP values of the result. The upper 2 SP FP values of a are passed through to the result.
MOVHPS
    __m128 _mm_loadh_pi(__m128 a, __m64 *p)
        Sets the upper two SP FP values with 64 bits of data loaded from the address p; the lower two values are passed through from a.
    void _mm_storeh_pi(__m64 *p, __m128 a)
        Stores the upper two SP FP values of a to the address p.
MOVLPS
    __m128 _mm_loadl_pi(__m128 a, __m64 *p)
        Sets the lower two SP FP values with 64 bits of data loaded from the address p; the upper two values are passed through from a.
    void _mm_storel_pi(__m64 *p, __m128 a)
        Stores the lower two SP FP values of a to the address p.
MOVLHPS
    __m128 _mm_movelh_ps(__m128 a, __m128 b)
        Moves the lower 2 SP FP values of b to the upper 2 SP FP values of the result. The lower 2 SP FP values of a are passed through to the result.
MOVMSKPS
    int _mm_movemask_ps(__m128 a)
        Creates a 4-bit mask from the most significant bits of the four SP FP values.
MOVNTPS
    void _mm_stream_ps(float *p, __m128 a)
        Stores the data in a to the address p without polluting the caches. The address must be 16-byte-aligned.
MOVNTQ
    void _mm_stream_pi(__m64 *p, __m64 a)
        Stores the data in a to the address p without polluting the caches.
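
The 4-bit mask described for MOVMSKPS can be pictured with a scalar model. The sketch below is not an Intel API; the function names are hypothetical, and it assumes that float is a 32-bit IEEE single-precision value and that element 0 (the lowest) maps to bit 0 of the mask.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the most significant (sign) bit of a
   single-precision value as 0 or 1. Assumes a 32-bit IEEE float. */
int sign_bit_model(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* copy the raw bit pattern */
    return (int)(bits >> 31);
}

/* Hypothetical scalar model of the MOVMSKPS mask: bit i of the result
   is the sign bit of element i, where x0 is the lowest element. */
int movemask4_model(float x0, float x1, float x2, float x3)
{
    return sign_bit_model(x0)
         | (sign_bit_model(x1) << 1)
         | (sign_bit_model(x2) << 2)
         | (sign_bit_model(x3) << 3);
}
```

Because only the sign bit is inspected, a negative zero contributes a 1 to the mask just as any other negative value does.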

Mnemonic
    Intrinsic
        Description

MOVSS
    __m128 _mm_load_ss(float *p)
        Loads an SP FP value into the low word and clears the upper three words.
    void _mm_store_ss(float *p, __m128 a)
        Stores the lower SP FP value.
    __m128 _mm_move_ss(__m128 a, __m128 b)
        Sets the low word to the SP FP value of b. The upper 3 SP FP values are passed through from a.
MOVUPS
    __m128 _mm_loadu_ps(float *p)
        Loads four SP FP values. The address need not be 16-byte-aligned.
    void _mm_storeu_ps(float *p, __m128 a)
        Stores four SP FP values. The address need not be 16-byte-aligned.
MULSS
    __m128 _mm_mul_ss(__m128 a, __m128 b)
        Multiplies the lower SP FP values of a and b; the upper three SP FP values are passed through from a.
ORPS
    __m128 _mm_or_ps(__m128 a, __m128 b)
        Computes the bitwise OR of the four SP FP values of a and b.
PACKSSWB
    __m64 _m_packsswb(__m64 m1, __m64 m2)
    __m64 _mm_packs_pi16(__m64 m1, __m64 m2)
        Pack the four 16-bit values from m1 into the lower four 8-bit values of the result with signed saturation, and pack the four 16-bit values from m2 into the upper four 8-bit values of the result with signed saturation.
PACKSSDW
    __m64 _m_packssdw(__m64 m1, __m64 m2)
    __m64 _mm_packs_pi32(__m64 m1, __m64 m2)
        Pack the two 32-bit values from m1 into the lower two 16-bit values of the result with signed saturation, and pack the two 32-bit values from m2 into the upper two 16-bit values of the result with signed saturation.
PACKUSWB
    __m64 _m_packuswb(__m64 m1, __m64 m2)
    __m64 _mm_packs_pu16(__m64 m1, __m64 m2)
        Pack the four 16-bit values from m1 into the lower four 8-bit values of the result with unsigned saturation, and pack the four 16-bit values from m2 into the upper four 8-bit values of the result with unsigned saturation.
PADDB
    __m64 _m_paddb(__m64 m1, __m64 m2)
    __m64 _mm_add_pi8(__m64 m1, __m64 m2)
        Add the eight 8-bit values in m1 to the eight 8-bit values in m2.
PADDW
    __m64 _m_paddw(__m64 m1, __m64 m2)
    __m64 _mm_add_pi16(__m64 m1, __m64 m2)
        Add the four 16-bit values in m1 to the four 16-bit values in m2.
PADDD
    __m64 _m_paddd(__m64 m1, __m64 m2)
    __m64 _mm_add_pi32(__m64 m1, __m64 m2)
        Add the two 32-bit values in m1 to the two 32-bit values in m2.
PADDSB
    __m64 _m_paddsb(__m64 m1, __m64 m2)
    __m64 _mm_adds_pi8(__m64 m1, __m64 m2)
        Add the eight signed 8-bit values in m1 to the eight signed 8-bit values in m2 and saturate.
PADDSW
    __m64 _m_paddsw(__m64 m1, __m64 m2)
    __m64 _mm_adds_pi16(__m64 m1, __m64 m2)
        Add the four signed 16-bit values in m1 to the four signed 16-bit values in m2 and saturate.
PADDUSB
    __m64 _m_paddusb(__m64 m1, __m64 m2)
    __m64 _mm_adds_pu8(__m64 m1, __m64 m2)
        Add the eight unsigned 8-bit values in m1 to the eight unsigned 8-bit values in m2 and saturate.
PADDUSW
    __m64 _m_paddusw(__m64 m1, __m64 m2)
    __m64 _mm_adds_pu16(__m64 m1, __m64 m2)
        Add the four unsigned 16-bit values in m1 to the four unsigned 16-bit values in m2 and saturate.
PAND
    __m64 _m_pand(__m64 m1, __m64 m2)
    __m64 _mm_and_si64(__m64 m1, __m64 m2)
        Perform a bitwise AND of the 64-bit value in m1 with the 64-bit value in m2.
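
The difference between the plain adds (PADDB) and the saturating adds (PADDSB, PADDUSB) is easiest to see on a single element. The sketch below is a scalar model with hypothetical function names, not the intrinsics themselves; it assumes that the plain add wraps around on overflow, while the saturating forms clamp to the limits of the element type.

```c
#include <stdint.h>

/* Hypothetical model of one PADDB element: the sum wraps modulo 256. */
uint8_t add_wrap_u8_model(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Hypothetical model of one PADDSB element: signed saturation
   clamps the sum to the range -128..127. */
int8_t add_sat_i8_model(int8_t a, int8_t b)
{
    int sum = (int)a + (int)b;
    if (sum > INT8_MAX) return INT8_MAX;
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}

/* Hypothetical model of one PADDUSB element: unsigned saturation
   clamps the sum to the range 0..255. */
uint8_t add_sat_u8_model(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return (sum > UINT8_MAX) ? (uint8_t)UINT8_MAX : (uint8_t)sum;
}
```

For example, adding 200 and 100 gives 44 under the wrapping model and 255 under the unsigned saturating model.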

Mnemonic
    Intrinsic
        Description

PANDN
    __m64 _m_pandn(__m64 m1, __m64 m2)
    __m64 _mm_andnot_si64(__m64 m1, __m64 m2)
        Perform a logical NOT on the 64-bit value in m1 and use the result in a bitwise AND with the 64-bit value in m2.
PAVGB
    __m64 _m_pavgb(__m64 a, __m64 b)
    __m64 _mm_avg_pu8(__m64 a, __m64 b)
        Perform the packed average on the eight 8-bit values of the two operands.
PAVGW
    __m64 _m_pavgw(__m64 a, __m64 b)
    __m64 _mm_avg_pu16(__m64 a, __m64 b)
        Perform the packed average on the four 16-bit values of the two operands.
PCMPEQB
    __m64 _m_pcmpeqb(__m64 m1, __m64 m2)
    __m64 _mm_cmpeq_pi8(__m64 m1, __m64 m2)
        If the respective 8-bit values in m1 are equal to the respective 8-bit values in m2, set the respective 8-bit resulting values to all ones; otherwise set them to all zeroes.
PCMPEQW
    __m64 _m_pcmpeqw(__m64 m1, __m64 m2)
    __m64 _mm_cmpeq_pi16(__m64 m1, __m64 m2)
        If the respective 16-bit values in m1 are equal to the respective 16-bit values in m2, set the respective 16-bit resulting values to all ones; otherwise set them to all zeroes.
PCMPEQD
    __m64 _m_pcmpeqd(__m64 m1, __m64 m2)
    __m64 _mm_cmpeq_pi32(__m64 m1, __m64 m2)
        If the respective 32-bit values in m1 are equal to the respective 32-bit values in m2, set the respective 32-bit resulting values to all ones; otherwise set them to all zeroes.
PCMPGTB
    __m64 _m_pcmpgtb(__m64 m1, __m64 m2)
    __m64 _mm_cmpgt_pi8(__m64 m1, __m64 m2)
        If the respective 8-bit values in m1 are greater than the respective 8-bit values in m2, set the respective 8-bit resulting values to all ones; otherwise set them to all zeroes.
PCMPGTW
    __m64 _m_pcmpgtw(__m64 m1, __m64 m2)
    __m64 _mm_cmpgt_pi16(__m64 m1, __m64 m2)
        If the respective 16-bit values in m1 are greater than the respective 16-bit values in m2, set the respective 16-bit resulting values to all ones; otherwise set them to all zeroes.
PCMPGTD
    __m64 _m_pcmpgtd(__m64 m1, __m64 m2)
    __m64 _mm_cmpgt_pi32(__m64 m1, __m64 m2)
        If the respective 32-bit values in m1 are greater than the respective 32-bit values in m2, set the respective 32-bit resulting values to all ones; otherwise set them to all zeroes.
PEXTRW
    int _m_pextrw(__m64 a, int n)
    int _mm_extract_pi16(__m64 a, int n)
        Extracts one of the four words of a. The selector n must be an immediate.
PINSRW
    __m64 _m_pinsrw(__m64 a, int d, int n)
    __m64 _mm_insert_pi16(__m64 a, int d, int n)
        Inserts word d into one of four words of a. The selector n must be an immediate.
PMADDWD
    __m64 _m_pmaddwd(__m64 m1, __m64 m2)
    __m64 _mm_madd_pi16(__m64 m1, __m64 m2)
        Multiply four 16-bit values in m1 by four 16-bit values in m2 producing four 32-bit intermediate results, which are then summed by pairs to produce two 32-bit results.
PMAXSW
    __m64 _m_pmaxsw(__m64 a, __m64 b)
    __m64 _mm_max_pi16(__m64 a, __m64 b)
        Computes the element-wise maximum of the words in a and b.
PMAXUB
    __m64 _m_pmaxub(__m64 a, __m64 b)
    __m64 _mm_max_pu8(__m64 a, __m64 b)
        Computes the element-wise maximum of the unsigned bytes in a and b.
PMINSW
    __m64 _m_pminsw(__m64 a, __m64 b)
    __m64 _mm_min_pi16(__m64 a, __m64 b)
        Computes the element-wise minimum of the words in a and b.
PMINUB
    __m64 _m_pminub(__m64 a, __m64 b)
    __m64 _mm_min_pu8(__m64 a, __m64 b)
        Computes the element-wise minimum of the unsigned bytes in a and b.
PMOVMSKB
    int _m_pmovmskb(__m64 a)
    int _mm_movemask_pi8(__m64 a)
        Creates an 8-bit mask from the most significant bits of the bytes in a.
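
The multiply-and-add described for PMADDWD pairs up adjacent words. The sketch below models one of the two 32-bit results; the function name is hypothetical, and the model assumes the words are treated as signed and that the 32-bit result keeps only the low 32 bits of the sum.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 32-bit PMADDWD result:
   two adjacent signed 16-bit products are summed.
   a0, a1 are adjacent words of m1; b0, b1 are the matching words of m2. */
int32_t madd_pair_model(int16_t a0, int16_t b0, int16_t a1, int16_t b1)
{
    /* Use a 64-bit intermediate so the C addition itself cannot overflow. */
    int64_t sum = (int64_t)a0 * b0 + (int64_t)a1 * b1;

    /* Keep the low 32 bits; the conversion to int32_t assumes the usual
       two's-complement behavior of mainstream compilers. */
    return (int32_t)(uint32_t)sum;
}
```

A full model of the instruction would call this once for words 0 and 1 and once for words 2 and 3.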

Mnemonic
    Intrinsic
        Description

PMULHUW
    __m64 _m_pmulhuw(__m64 a, __m64 b)
    __m64 _mm_mulhi_pu16(__m64 a, __m64 b)
        Multiplies the unsigned words in a and b, returning the upper 16 bits of the 32-bit intermediate results.
PMULHW
    __m64 _m_pmulhw(__m64 m1, __m64 m2)
    __m64 _mm_mulhi_pi16(__m64 m1, __m64 m2)
        Multiply four signed 16-bit values in m1 by four signed 16-bit values in m2 and produce the high 16 bits of the four results.
PMULLW
    __m64 _m_pmullw(__m64 m1, __m64 m2)
    __m64 _mm_mullo_pi16(__m64 m1, __m64 m2)
        Multiply four 16-bit values in m1 by four 16-bit values in m2 and produce the low 16 bits of the four results.
POR
    __m64 _m_por(__m64 m1, __m64 m2)
    __m64 _mm_or_si64(__m64 m1, __m64 m2)
        Perform a bitwise OR of the 64-bit value in m1 with the 64-bit value in m2.
PREFETCH
    void _mm_prefetch(char *a, int sel)
        Loads one cache line of data from address a to a location "closer" to the processor. The value sel specifies the type of prefetch operation.
PSADBW
    __m64 _m_psadbw(__m64 a, __m64 b)
    __m64 _mm_sad_pu8(__m64 a, __m64 b)
PSHUFW
    __m64 _m_pshufw(__m64 a, int n)
    __m64 _mm_shuffle_pi16(__m64 a, int n)
        Returns a combination of the four words of a. The selector n must be an immediate.
PSLLW
    __m64 _m_psllw(__m64 m, __m64 count)
    __m64 _mm_sll_pi16(__m64 m, __m64 count)
        Shift four 16-bit values in m left the amount specified by count while shifting in zeroes.
    __m64 _m_psllwi(__m64 m, int count)
    __m64 _mm_slli_pi16(__m64 m, int count)
        Shift four 16-bit values in m left the amount specified by count while shifting in zeroes. For the best performance, count should be a constant.
PSLLD
    __m64 _m_pslld(__m64 m, __m64 count)
    __m64 _mm_sll_pi32(__m64 m, __m64 count)
        Shift two 32-bit values in m left the amount specified by count while shifting in zeroes.
    __m64 _m_pslldi(__m64 m, int count)
    __m64 _mm_slli_pi32(__m64 m, int count)
        Shift two 32-bit values in m left the amount specified by count while shifting in zeroes. For the best performance, count should be a constant.
PSLLQ
    __m64 _m_psllq(__m64 m, __m64 count)
    __m64 _mm_sll_si64(__m64 m, __m64 count)
        Shift the 64-bit value in m left the amount specified by count while shifting in zeroes.
    __m64 _m_psllqi(__m64 m, int count)
    __m64 _mm_slli_si64(__m64 m, int count)
        Shift the 64-bit value in m left the amount specified by count while shifting in zeroes. For the best performance, count should be a constant.
PSRAW
    __m64 _m_psraw(__m64 m, __m64 count)
    __m64 _mm_sra_pi16(__m64 m, __m64 count)
        Shift four 16-bit values in m right the amount specified by count while shifting in the sign bit.
    __m64 _m_psrawi(__m64 m, int count)
    __m64 _mm_srai_pi16(__m64 m, int count)
        Shift four 16-bit values in m right the amount specified by count while shifting in the sign bit. For the best performance, count should be a constant.
PSRAD
    __m64 _m_psrad(__m64 m, __m64 count)
    __m64 _mm_sra_pi32(__m64 m, __m64 count)
        Shift two 32-bit values in m right the amount specified by count while shifting in the sign bit.
    __m64 _m_psradi(__m64 m, int count)
    __m64 _mm_srai_pi32(__m64 m, int count)
        Shift two 32-bit values in m right the amount specified by count while shifting in the sign bit. For the best performance, count should be a constant.
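
The high and low halves returned by PMULHW, PMULHUW, and PMULLW come from the same 32-bit intermediate product. The sketch below models a single element with hypothetical function names; it is an illustration of the descriptions above, not the intrinsics themselves.

```c
#include <stdint.h>

/* Hypothetical model of one PMULHW element: upper 16 bits of the
   signed 32-bit product. The shift is done on an unsigned copy so the
   result does not depend on how the compiler shifts negative values. */
int16_t mulhi_i16_model(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b;
    return (int16_t)(uint16_t)((uint32_t)prod >> 16);
}

/* Hypothetical model of one PMULLW element: lower 16 bits of the product. */
int16_t mullo_i16_model(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b;
    return (int16_t)(uint16_t)((uint32_t)prod & 0xFFFFu);
}

/* Hypothetical model of one PMULHUW element: upper 16 bits of the
   unsigned 32-bit product. */
uint16_t mulhi_u16_model(uint16_t a, uint16_t b)
{
    uint32_t prod = (uint32_t)a * (uint32_t)b;
    return (uint16_t)(prod >> 16);
}
```

Combining the high and low results of the same pair of inputs reconstructs the full 32-bit product, which is the usual reason for issuing both instructions.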

Mnemonic
    Intrinsic
        Description

PSRLW
    __m64 _m_psrlw(__m64 m, __m64 count)
    __m64 _mm_srl_pi16(__m64 m, __m64 count)
        Shift four 16-bit values in m right the amount specified by count while shifting in zeroes.
    __m64 _m_psrlwi(__m64 m, int count)
    __m64 _mm_srli_pi16(__m64 m, int count)
        Shift four 16-bit values in m right the amount specified by count while shifting in zeroes. For the best performance, count should be a constant.
PSRLD
    __m64 _m_psrld(__m64 m, __m64 count)
    __m64 _mm_srl_pi32(__m64 m, __m64 count)
        Shift two 32-bit values in m right the amount specified by count while shifting in zeroes.
    __m64 _m_psrldi(__m64 m, int count)
    __m64 _mm_srli_pi32(__m64 m, int count)
        Shift two 32-bit values in m right the amount specified by count while shifting in zeroes. For the best performance, count should be a constant.
PSRLQ
    __m64 _m_psrlq(__m64 m, __m64 count)
    __m64 _mm_srl_si64(__m64 m, __m64 count)
        Shift the 64-bit value in m right the amount specified by count while shifting in zeroes.
    __m64 _m_psrlqi(__m64 m, int count)
    __m64 _mm_srli_si64(__m64 m, int count)
        Shift the 64-bit value in m right the amount specified by count while shifting in zeroes. For the best performance, count should be a constant.
PSUBB
    __m64 _m_psubb(__m64 m1, __m64 m2)
    __m64 _mm_sub_pi8(__m64 m1, __m64 m2)
        Subtract the eight 8-bit values in m2 from the eight 8-bit values in m1.
PSUBW
    __m64 _m_psubw(__m64 m1, __m64 m2)
    __m64 _mm_sub_pi16(__m64 m1, __m64 m2)
        Subtract the four 16-bit values in m2 from the four 16-bit values in m1.
PSUBD
    __m64 _m_psubd(__m64 m1, __m64 m2)
    __m64 _mm_sub_pi32(__m64 m1, __m64 m2)
        Subtract the two 32-bit values in m2 from the two 32-bit values in m1.
PSUBSB
    __m64 _m_psubsb(__m64 m1, __m64 m2)
    __m64 _mm_subs_pi8(__m64 m1, __m64 m2)
        Subtract the eight signed 8-bit values in m2 from the eight signed 8-bit values in m1 and saturate.
PSUBSW
    __m64 _m_psubsw(__m64 m1, __m64 m2)
    __m64 _mm_subs_pi16(__m64 m1, __m64 m2)
        Subtract the four signed 16-bit values in m2 from the four signed 16-bit values in m1 and saturate.
PSUBUSB
    __m64 _m_psubusb(__m64 m1, __m64 m2)
    __m64 _mm_subs_pu8(__m64 m1, __m64 m2)
        Subtract the eight unsigned 8-bit values in m2 from the eight unsigned 8-bit values in m1 and saturate.
PSUBUSW
    __m64 _m_psubusw(__m64 m1, __m64 m2)
    __m64 _mm_subs_pu16(__m64 m1, __m64 m2)
        Subtract the four unsigned 16-bit values in m2 from the four unsigned 16-bit values in m1 and saturate.
PUNPCKHBW
    __m64 _m_punpckhbw(__m64 m1, __m64 m2)
    __m64 _mm_unpackhi_pi8(__m64 m1, __m64 m2)
        Interleave the four 8-bit values from the high half of m1 with the four values from the high half of m2 and take the least significant element from m1.
PUNPCKHWD
    __m64 _m_punpckhwd(__m64 m1, __m64 m2)
    __m64 _mm_unpackhi_pi16(__m64 m1, __m64 m2)
        Interleave the two 16-bit values from the high half of m1 with the two values from the high half of m2 and take the least significant element from m1.
PUNPCKHDQ
    __m64 _m_punpckhdq(__m64 m1, __m64 m2)
    __m64 _mm_unpackhi_pi32(__m64 m1, __m64 m2)
        Interleave the 32-bit value from the high half of m1 with the 32-bit value from the high half of m2 and take the least significant element from m1.
PUNPCKLBW
    __m64 _m_punpcklbw(__m64 m1, __m64 m2)
    __m64 _mm_unpacklo_pi8(__m64 m1, __m64 m2)
        Interleave the four 8-bit values from the low half of m1 with the four values from the low half of m2 and take the least significant element from m1.
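
The byte interleave described for PUNPCKHBW and PUNPCKLBW can be modelled on plain 64-bit integers. The sketch below uses a hypothetical function name and treats bits 0 to 7 of each 64-bit value as byte 0; "take the least significant element from m1" is read as meaning that byte 0 of the result comes from m1.

```c
#include <stdint.h>

/* Hypothetical scalar model of PUNPCKHBW (high != 0) and
   PUNPCKLBW (high == 0) on 64-bit values.
   Result bytes, lowest first: m1[k], m2[k], m1[k+1], m2[k+1], ...
   where k is 4 for the high half and 0 for the low half. */
uint64_t unpack_bytes_model(uint64_t m1, uint64_t m2, int high)
{
    int base = high ? 4 : 0;
    uint64_t result = 0;

    for (int i = 0; i < 4; i++) {
        uint64_t from_m1 = (m1 >> (8 * (base + i))) & 0xFF;
        uint64_t from_m2 = (m2 >> (8 * (base + i))) & 0xFF;
        result |= from_m1 << (8 * (2 * i));       /* even result bytes */
        result |= from_m2 << (8 * (2 * i + 1));   /* odd result bytes  */
    }
    return result;
}
```

A common use of this pattern is widening: interleaving with a zero operand turns four bytes into four zero-extended 16-bit words.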

Mnemonic
    Intrinsic
        Description

PUNPCKLWD
    __m64 _m_punpcklwd(__m64 m1, __m64 m2)
    __m64 _mm_unpacklo_pi16(__m64 m1, __m64 m2)
        Interleave the two 16-bit values from the low half of m1 with the two values from the low half of m2 and take the least significant element from m1.
PUNPCKLDQ
    __m64 _m_punpckldq(__m64 m1, __m64 m2)
    __m64 _mm_unpacklo_pi32(__m64 m1, __m64 m2)
        Interleave the 32-bit value from the low half of m1 with the 32-bit value from the low half of m2 and take the least significant element from m1.
PXOR
    __m64 _m_pxor(__m64 m1, __m64 m2)
    __m64 _mm_xor_si64(__m64 m1, __m64 m2)
        Perform a bitwise XOR of the 64-bit value in m1 with the 64-bit value in m2.
RCPPS
    __m128 _mm_rcp_ps(__m128 a)
        Computes the approximations of the reciprocals of the four SP FP values of a.
RCPSS
    __m128 _mm_rcp_ss(__m128 a)
        Computes the approximation of the reciprocal of the lower SP FP value of a; the upper three SP FP values are passed through.
RSQRTPS
    __m128 _mm_rsqrt_ps(__m128 a)
        Computes the approximations of the reciprocals of the square roots of the four SP FP values of a.
RSQRTSS
    __m128 _mm_rsqrt_ss(__m128 a)
        Computes the approximation of the reciprocal of the square root of the lower SP FP value of a; the upper three SP FP values are passed through.
SFENCE
    void _mm_sfence(void)
        Guarantees that every preceding store is globally visible before any subsequent store.
SHUFPS
    __m128 _mm_shuffle_ps(__m128 a, __m128 b, unsigned int imm8)
        Selects four specific SP FP values from a and b, based on the mask imm8. The mask must be an immediate.
SQRTPS
    __m128 _mm_sqrt_ps(__m128 a)
        Computes the square roots of the four SP FP values of a.
SQRTSS
    __m128 _mm_sqrt_ss(__m128 a)
        Computes the square root of the lower SP FP value of a; the upper three SP FP values are passed through.
STMXCSR
    unsigned int _mm_getcsr(void)
        Returns the contents of the control register.
SUBPS
    __m128 _mm_sub_ps(__m128 a, __m128 b)
        Subtracts the four SP FP values of a and b.
SUBSS
    __m128 _mm_sub_ss(__m128 a, __m128 b)
        Subtracts the lower SP FP values of a and b. The upper three SP FP values are passed through from a.
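
How the SHUFPS mask selects its four values is not spelled out in the table. The sketch below decodes the mask under the usual reading of the instruction, stated here as an assumption: the mask holds four 2-bit fields, the two low result elements are picked from a, and the two high result elements are picked from b. The function name is hypothetical.

```c
/* Hypothetical decoder for the SHUFPS immediate.
   lane is the result element (0 = lowest, 3 = highest) and is assumed
   to be in the range 0..3.
   Returns 0..3 when the element comes from a[0..3],
   and 4..7 when it comes from b[0..3]. */
int shufps_source_model(unsigned imm8, int lane)
{
    int field = (int)((imm8 >> (2 * lane)) & 3u);   /* 2-bit selector */
    return (lane < 2) ? field : field + 4;          /* a for 0,1; b for 2,3 */
}
```

With the mask 0x1B the result is a[3], a[2], b[1], b[0] from the lowest element up; passing the same register as both a and b therefore reverses its four values.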

Mnemonic
    Intrinsic
        Description

UCOMISS
    int _mm_ucomieq_ss(__m128 a, __m128 b)
        Compares the lower SP FP value of a and b for a equal to b. If a and b are equal, 1 is returned. Otherwise 0 is returned.
    int _mm_ucomilt_ss(__m128 a, __m128 b)
        Compares the lower SP FP value of a and b for a less than b. If a is less than b, 1 is returned. Otherwise 0 is returned.
    int _mm_ucomile_ss(__m128 a, __m128 b)
        Compares the lower SP FP value of a and b for a less than or equal to b. If a is less than or equal to b, 1 is returned. Otherwise 0 is returned.
    int _mm_ucomigt_ss(__m128 a, __m128 b)
        Compares the lower SP FP value of a and b for a greater than b. If a is greater than b, 1 is returned. Otherwise 0 is returned.
    int _mm_ucomige_ss(__m128 a, __m128 b)
        Compares the lower SP FP value of a and b for a greater than or equal to b. If a is greater than or equal to b, 1 is returned. Otherwise 0 is returned.
    int _mm_ucomineq_ss(__m128 a, __m128 b)
        Compares the lower SP FP value of a and b for a not equal to b. If a and b are not equal, 1 is returned. Otherwise 0 is returned.
UNPCKHPS
    __m128 _mm_unpackhi_ps(__m128 a, __m128 b)
        Selects and interleaves the upper two SP FP values from a and b.
UNPCKLPS
    __m128 _mm_unpacklo_ps(__m128 a, __m128 b)
        Selects and interleaves the lower two SP FP values from a and b.
XORPS
    __m128 _mm_xor_ps(__m128 a, __m128 b)
        Computes bitwise EXOR (exclusive-or) of the four SP FP values of a and b.
C.2. COMPOSITE INTRINSICS

Table C-2. Composite Intrinsics
Mnemonic
    Intrinsic
        Description

(composite)
    __m128 _mm_set_ps1(float w)
    __m128 _mm_set1_ps(float w)
        Sets the four SP FP values to w.
(composite)
    __m128 _mm_set_ps(float z, float y, float x, float w)
        Sets the four SP FP values to the four inputs.
(composite)
    __m128 _mm_setr_ps(float z, float y, float x, float w)
        Sets the four SP FP values to the four inputs in reverse order.
(composite)
    __m128 _mm_setzero_ps(void)
        Clears the four SP FP values.
MOVSS + shuffle
    __m128 _mm_load_ps1(float *p)
    __m128 _mm_load1_ps(float *p)
        Loads a single SP FP value, copying it into all four words.
MOVAPS + shuffle
    __m128 _mm_loadr_ps(float *p)
        Loads four SP FP values in reverse order. The address must be 16-byte-aligned.
MOVSS + shuffle
    void _mm_store_ps1(float *p, __m128 a)
    void _mm_store1_ps(float *p, __m128 a)
        Stores the lower SP FP value across four words.
MOVAPS + shuffle
    void _mm_storer_ps(float *p, __m128 a)
        Stores four SP FP values in reverse order. The address must be 16-byte-aligned.
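
The phrase "in reverse order" for _mm_setr_ps is easier to follow with the element positions written out. The sketch below is a scalar model with hypothetical function names; it assumes the usual convention that _mm_set_ps places its last argument (w) in the lowest element, so that _mm_setr_ps places its first argument (z) there instead.

```c
/* Hypothetical model: value held in element `lane` (0 = lowest, 3 = highest)
   after a call shaped like _mm_set_ps(z, y, x, w).
   lane is assumed to be in the range 0..3. */
float set_ps_lane_model(float z, float y, float x, float w, int lane)
{
    const float lanes[4] = { w, x, y, z };   /* last argument lands lowest */
    return lanes[lane];
}

/* Hypothetical model: the same for a call shaped like
   _mm_setr_ps(z, y, x, w), where the arguments fill the elements
   in the order written. */
float setr_ps_lane_model(float z, float y, float x, float w, int lane)
{
    const float lanes[4] = { z, y, x, w };   /* first argument lands lowest */
    return lanes[lane];
}
```

Under this model, values written with the set form and stored to memory come out in the opposite order from the argument list, while the setr form matches array order.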
Index
Numerics
36-bit Page Size Extension flag, CPUID instruction, 3-115

A
AAA instruction, 3-17
AAD instruction, 3-18
AAM instruction, 3-19, 3-681
AAS instruction, 3-20, 3-685
Abbreviations, opcode key, A-1
Access rights, segment descriptor, 3-342
ADC instruction, 3-21, 3-367
ADD instruction, 3-21, 3-23, 3-143, 3-367
ADDPS instruction, 3-25
Address size attribute override prefix, 2-2
Address size override prefix, 2-2
Addressing methods
    codes, A-1
    operand codes, A-3
    register codes, A-3
Addressing, segments, 1-7
ADDSS instruction, 3-27
Advanced Programmable Interrupt Controller (see APIC)
AND instruction, 3-30, 3-367
ANDNPS instruction, 3-32
ANDPS instruction, 3-34
APIC CPUID instruction flag, 3-114
Arctangent, FPU operation, 3-221
ARPL instruction, 3-36

B
B (default stack size) flag, segment descriptor, 3-531, 3-581
Base (operand addressing), 2-3
BCD integers
    packed, 3-143, 3-145, 3-169, 3-171
    unpacked, 3-17, 3-18, 3-19, 3-20, 3-681, 3-685
Binary numbers, 1-7
Binary-coded decimal (see BCD)
Bit order, 1-5
BOUND instruction, 3-38
BOUND range exceeded exception (#BR), 3-38
BSF instruction, 3-40
BSR instruction, 3-42
BSWAP instruction, 3-44
BT instruction, 3-45
BTC instruction, 3-47, 3-367
BTR instruction, 3-49, 3-367
BTS instruction, 3-51, 3-367
Byte order, 1-5

C
Caches, invalidating (flushing), 3-318, 3-708
Call gate, 3-337
CALL instruction, 3-53
Calls (see Procedure calls)
CBW instruction, 3-64
CDQ instruction, 3-65
CF (carry) flag, EFLAGS register, 3-21, 3-23, 3-45, 3-47, 3-49, 3-51, 3-66, 3-71, 3-146, 3-296, 3-301, 3-448, 3-592, 3-627, 3-640, 3-643, 3-662, 3-673
Classify floating-point value, FPU operation, 3-271
CLC instruction, 3-66
CLD instruction, 3-67
CLI instruction, 3-68
CLTS instruction, 3-70
CMC instruction, 3-71
CMOV flag, CPUID instruction, 3-115
CMOVcc instruction, 3-72
CMOVcc instructions, 3-72, 3-115
CMP instruction, 3-76
CMPPS instruction, 3-78
CMPS instruction, 3-87, 3-605
CMPSB instruction, 3-87
CMPSD instruction, 3-87
CMPSS instruction, 3-90
CMPSW instruction, 3-87
CMPXCHG instruction, 3-100, 3-367
CMPXCHG8B instruction, 3-102
COMISS instruction, 3-104
Compatibility, software, 1-6
Compiler functional equivalents, 1, C-1
Compiler intrinsics, 1, C-1
    composite, C-11
    simple, C-2
Condition code flags, EFLAGS register, 3-72
Condition code flags, FPU status word
    flags affected by instructions, 3-12
    setting, 3-265, 3-267, 3-271
Conditional jump, 3-329
Conditional Move and Compare flag, CPUID instruction, 3-115
Conforming code segment, 3-337, 3-342
Constants (floating point)
    loading, 3-210
Control registers, moving values to and from, 3-407
Cosine, FPU operation, 3-186, 3-242
CPL, 3-68, 3-704
CPUID instruction, 3-111
CPUID instruction flags, 3-114
CR0 control register, 3-654
CS register, 3-53, 3-306, 3-321, 3-333, 3-402, 3-531
CS segment override prefix, 2-2
Current privilege level (see CPL)
CVTPI2PS instruction, 3-119
CVTPS2PI instruction, 3-123
CVTSI2SS instruction, 3-127
CVTSS2SI instruction, 3-130
CVTTPS2PI instruction, 3-133
CVTTSS2SI instruction, 3-137
CWD instruction, 3-141
CWDE instruction (see CBW instruction)
CX8 flag, CPUID instruction, 3-114

D
D (default operation size) flag, segment descriptor, 3-531, 3-536, 3-581
DAA instruction, 3-143
DAS instruction, 3-145
DE flag, CPUID instruction, 3-114
Debug registers, moving value to and from, 3-409
Debugging Extensions flag, CPUID instruction, 3-114
DEC instruction, 3-146, 3-367
Denormal number (see Denormalized finite number)
Denormalized finite number, 3-271
DF (direction) flag, EFLAGS register, 3-67, 3-88, 3-303, 3-369, 3-435, 3-465, 3-629, 3-663
Displacement (operand addressing), 2-3
DIV instruction, 3-148
Divide error exception (#DE), 3-148
DIVPS instruction, 3-151
DIVSS instruction, 3-154
DS register, 3-87, 3-349, 3-369, 3-435, 3-465
DS segment override prefix, 2-2

E
EDI register, 3-87, 3-629, 3-663, 3-668
Effective address, 3-353
EFLAGS register
    condition codes, 3-73, 3-178, 3-183
    flags affected by instructions, 3-11
    loading, 3-341
    popping, 3-538
    popping on return from interrupt, 3-321
    pushing, 3-587
    pushing on interrupts, 3-306
    saving, 3-621
    status flags, 3-76, 3-330, 3-632, 3-688
EIP register, 3-53, 3-306, 3-321, 3-333
EMMS instruction, 3-156
Encoding
    floating-point instruction formats, B-36
    formats and encodings, B-27
    granularity field, B-19
    instruction prefixes, B-24
    instruction prefixes, cacheability control instruction behavior, B-25
    integer instruction, B-6
    MMX instructions, B-19
    MMX instructions, general-purpose register fields, B-19
    notations, B-26
    SIMD floating-point register field, B-27
    SIMD integer instruction behavior, B-25
    SIMD-integer register field, B-34
    Streaming SIMD Extension formats and encodings table, B-24
    Streaming SIMD Extensions cacheability control register field, B-35
ENTER instruction, 3-158
ES register, 3-87, 3-349, 3-465, 3-629, 3-668
ES segment override prefix, 2-2
ESI register, 3-87, 3-369, 3-435, 3-465, 3-663
ESP register, 3-54, 3-532
Exceptions
    BOUND range exceeded (#BR), 3-38
    list, 3-13
    notation, 1-8
    overflow exception (#OF), 3-306
    returning from, 3-321
Exponent
    extracting from floating-point number, 3-285
Extract exponent and significand, FPU operation, 3-285

F
F2XM1 instruction, 3-161, 3-285
FABS instruction, 3-163
FADD instruction, 3-165
FADDP instruction, 3-165
Far call, CALL instruction, 3-53
Far pointer, loading, 3-349
Far return, RET instruction, 3-608
Fast FP/MMX Technology/Streaming SIMD Extensions save/restore flag, CPUID instruction, 3-115
Fast System Call flag, CPUID instruction, 3-115
FBLD instruction, 3-169
FBSTP instruction, 3-171
FCHS instruction, 3-174
FCLEX instruction, 3-176
FCMOVcc instructions, 3-115, 3-178
FCOM instruction, 3-180
FCOMI instruction, 3-115, 3-183
FCOMIP instruction, 3-183
FCOMP instruction, 3-180
FCOMPP instruction, 3-180
FCOS instruction, 3-186
FDECSTP instruction, 3-188
FDIV instruction, 3-189
FDIVP instruction, 3-189
FDIVR instruction, 3-193
FDIVRP instruction, 3-193
Feature information, processor, 3-111
FFREE instruction, 3-197
FIADD instruction, 3-165
FICOM instruction, 3-198
FICOMP instruction, 3-198
FIDIV instruction, 3-189
FIDIVR instruction, 3-193
FILD instruction, 3-200
FIMUL instruction, 3-216
FINCSTP instruction, 3-202
FINIT instruction, 3-203, 3-235
FIST instruction, 3-205
FISTP instruction, 3-205
FISUB instruction, 3-257
FISUBR instruction, 3-261
FLD instruction, 3-208
FLD1 instruction, 3-210
FLDCW instruction, 3-212
FLDENV instruction, 3-214
FLDL2E instruction, 3-210
FLDL2T instruction, 3-210
FLDLG2 instruction, 3-210
FLDLN2 instruction, 3-210
FLDPI instruction, 3-210
FLDZ instruction, 3-210
Floating-point exceptions, 3-14
    list, including mnemonics, 3-14
    Streaming SIMD Extensions, 3-14
Flushing
    caches, 3-318, 3-708
    TLB entry, 3-320
FMUL instruction, 3-216
FMULP instruction, 3-216
FNCLEX instruction, 3-176
FNINIT instruction, 3-203
FNOP instruction, 3-220
FNSAVE instruction, 3-232, 3-235
FNSTCW instruction, 3-249
FNSTENV instruction, 3-214, 3-251
FNSTSW instruction, 3-254
Formats (see Encodings)
FPATAN instruction, 3-221
FPREM instruction, 3-223
FPREM1 instruction, 3-226
FPTAN instruction, 3-229
FPU
    checking for pending FPU exceptions, 3-707
    constants, 3-210
    existence of, 3-114
    initialization, 3-203
FPU control word
    loading, 3-212, 3-214
    RC field, 3-206, 3-210, 3-246
    restoring, 3-232
    saving, 3-235, 3-251
    storing, 3-249
FPU data pointer, 3-214, 3-232, 3-235, 3-251
FPU flag, CPUID instruction, 3-114
FPU instruction pointer, 3-214, 3-232, 3-235, 3-251
FPU last opcode, 3-214, 3-232, 3-235, 3-251
FPU status word
    condition code flags, 3-180, 3-198, 3-265, 3-267, 3-271
    FPU flags affected by instructions, 3-12
    loading, 3-214
    restoring, 3-232
    saving, 3-235, 3-251, 3-254
    TOP field, 3-202
FPU tag word, 3-214, 3-232, 3-235, 3-251
FRNDINT instruction, 3-231
FRSTOR instruction, 3-232
FS register, 3-349
FS segment override prefix, 2-2
FSAVE instruction, 3-232, 3-235
FSCALE instruction, 3-238
FSIN instruction, 3-240
FSINCOS instruction, 3-242
FSQRT instruction, 3-244
FST instruction, 3-246
FSTCW instruction, 3-249
FSTENV instruction, 3-251
FSTP instruction, 3-246
FSTSW instruction, 3-254
FSUB instruction, 3-257
FSUBP instruction, 3-257
FSUBR instruction, 3-261
FSUBRP instruction, 3-261
FTST instruction, 3-265
FUCOM instruction, 3-267
FUCOMI instruction, 3-183
FUCOMIP instruction . . . . . . . . . . . 
. . . . . . . 3-183 FUCOMP instruction. . . . . . . . . . . . . . . . . . . 3-267 FUCOMPP instruction . . . . . . . . . . . . . . . . . 3-267 FWAIT instruction . . . . . . . . . . . . . . . .3-270, 3-707 FXAM instruction . . . . . . . . . . . . . . . . . . . . . 3-271 FXCH instruction . . . . . . . . . . . . . . . . . . . . . 3-273 FXRSTOR instruction . . . . . . . . . . . . . . . . . . 3-275 FXSAVE instruction . . . . . . . . . . . . . . . . . . . 3-279 FXSR flag, CPUID instruction. . . . . . . . . . . . 3-115 FXTRACT instruction . . . . . . . . . . . . .3-238, 3-285 FYL2X instruction . . . . . . . . . . . . . . . . . . . . . 3-287 FYL2XP1 instruction. . . . . . . . . . . . . . . . . . . 3-289
G
GDT (global descriptor table) . . . 3-359, 3-362
GDTR (global descriptor table register) . . . 3-359, 3-636
General-purpose registers
    MMX registers . . . B-19
    moving value to and from . . . 3-402
    popping all . . . 3-536
    pushing all . . . 3-584
GS register . . . 3-349
GS segment override prefix . . . 2-2
H
Hexadecimal numbers . . . 1-7
HLT instruction . . . 3-291
I
IDIV instruction . . . 3-292
IDT (interrupt descriptor table) . . . 3-307, 3-359
IDTR (interrupt descriptor table register) . . . 3-359, 3-636
IF (interrupt enable) flag, EFLAGS register . . . 3-68, 3-664
Immediate operands . . . 2-3
IMUL instruction . . . 3-295
IN instruction . . . 3-299
INC instruction . . . 3-301, 3-367
Index (operand addressing) . . . 2-3
Initialization
    FPU . . . 3-203
Input/output (see I/O)
INS instruction . . . 3-303, 3-605
INSB instruction . . . 3-303
INSD instruction . . . 3-303
Instruction format
    base field . . . 2-3
    description of reference information . . . 3-1
    displacement . . . 2-3
    illustration . . . 2-1
    immediate . . . 2-3
    index field . . . 2-3
    Mod field . . . 2-2
    ModR/M byte . . . 2-2
    opcode . . . 2-2
    prefixes . . . 2-1
    reg/opcode field . . . 2-2
    r/m field . . . 2-2
    scale field . . . 2-3
    SIB byte . . . 2-2
Instruction formats and encodings . . . B-1
Instruction operands . . . 1-7
Instruction prefixes (see Prefixes)
Instruction reference, nomenclature . . . 3-1
Instruction set
    reference . . . 3-1
    string instructions . . . 3-87, 3-303, 3-369, 3-435, 3-465, 3-668
INSW instruction . . . 3-303
INT 3 instruction . . . 3-306
INT3 instruction . . . 3-306
Integer instruction
    encodings . . . B-6
    formats . . . B-6
Integer storing, FPU data type . . . 3-205
Inter-privilege level call, CALL instruction . . . 3-53
Inter-privilege level return, RET instruction . . . 3-608
Interrupts
    interrupt vector 4 . . . 3-306
    returning from . . . 3-321
    software . . . 3-306
INTn instruction . . . 3-306
INTO instruction . . . 3-306
Intrinsics . . . 1, C-1
INVD instruction . . . 3-318
INVLPG instruction . . . 3-320
IOPL (I/O privilege level) field, EFLAGS register . . . 3-68, 3-587, 3-664
IRET instruction . . . 3-321
IRETD instruction . . . 3-321
I/O privilege level (see IOPL)
J
Jcc instructions . . . 3-329
JMP instruction . . . 3-333
Jump operation . . . 3-333
L
LAHF instruction . . . 3-341
LAR instruction . . . 3-342
LDMXCSR instruction . . . 3-345
LDS instruction . . . 3-349
LDT (local descriptor table) . . . 3-362
LDTR (local descriptor table register) . . . 3-362, 3-652
LEA instruction . . . 3-353
LEAVE instruction . . . 3-355
LES instruction . . . 3-349, 3-357
LFS instruction . . . 3-349, 3-358
LGDT instruction . . . 3-359
LGS instruction . . . 3-349, 3-361
LIDT instruction . . . 3-359, 3-364
LLDT instruction . . . 3-362
LMSW instruction . . . 3-365
Load effective address operation . . . 3-353
LOCK prefix . . . 2-1, 3-100, 3-102, 3-367, 3-712, 3-714
Locking operation . . . 3-367
LODS instruction . . . 3-369, 3-605
LODSB instruction . . . 3-369
LODSD instruction . . . 3-369
LODSW instruction . . . 3-369
Log epsilon, FPU operation . . . 3-287
Log (base 2), FPU operation . . . 3-289
LOOP instruction . . . 3-372
LOOPcc instructions . . . 3-372
LSL instruction . . . 3-375
LSS instruction . . . 3-349, 3-379
LTR instruction . . . 3-380
M
Machine Check Architecture flag, CPUID instruction . . . 3-115
Machine Check Exception flag, CPUID instruction . . . 3-114
Machine instruction encoding and format
    condition test field . . . B-5
    direction bit . . . B-5
    operand size bit . . . B-3
    reg field . . . B-2
    segment register field . . . B-4
    sign extend bit . . . B-3
Machine status word, CR0 register . . . 3-365, 3-654
MASKMOVQ instruction . . . 3-382
MAXPS instruction . . . 3-386
MAXSS instruction . . . 3-390
MCA flag, CPUID instruction . . . 3-115
MCE flag, CPUID instruction . . . 3-114
Memory Type Range Registers flag, CPUID instruction . . . 3-115
MINPS instruction . . . 3-394
MINSS instruction . . . 3-398
MMX instruction
    formats and encodings . . . B-19
    general-purpose register fields . . . B-19
    granularity field . . . B-19
MMX™ Technology flag, CPUID instruction . . . 3-115
Mod field, instruction format . . . 2-2
ModR/M byte
    16-bit addressing forms . . . 2-5
    32-bit addressing forms . . . 2-6
    description . . . 2-2
    format . . . 2-1
MOV instruction . . . 3-402
    control registers . . . 3-407
    debug registers . . . 3-409
MOVAPS instruction . . . 3-411
MOVD instruction . . . 3-414
MOVHLPS instruction . . . 3-417
MOVHPS instruction . . . 3-419
MOVLHPS instruction . . . 3-422
MOVLPS instruction . . . 3-424
MOVMSKPS instruction . . . 3-427
MOVNTPS instruction . . . 3-429
MOVNTQ instruction . . . 3-431
MOVQ instruction . . . 3-433
MOVS instruction . . . 3-435, 3-605
MOVSB instruction . . . 3-435
MOVSD instruction . . . 3-435
MOVSS instruction . . . 3-438
MOVSW instruction . . . 3-435
MOVSX instruction . . . 3-441
MOVUPS instruction . . . 3-443
MOVZX instruction . . . 3-446
MSR flag, CPUID instruction . . . 3-114
MSRs (model specific registers)
    existence of . . . 3-114
    reading . . . 3-600
    writing . . . 3-710
MTRRs flag, CPUID instruction . . . 3-115
MUL instruction . . . 3-448, 3-681
MULPS instruction . . . 3-450
MULSS instruction . . . 3-452
N
NaN
    testing for . . . 3-265
Near call, CALL instruction . . . 3-53
Near return, RET instruction . . . 3-608
NEG instruction . . . 3-367, 3-454
Nonconforming code segment . . . 3-337
NOP instruction . . . 3-456
NOT instruction . . . 3-367, 3-457
Notation
    bit and byte order . . . 1-5
    exceptions . . . 1-8
    hexadecimal and binary numbers . . . 1-7
    instruction operands . . . 1-7
    reserved bits . . . 1-6
    segmented addressing . . . 1-7
Notational conventions . . . 1-5
NT (nested task) flag, EFLAGS register . . . 3-321
Numeric overflow exception . . . 3-14
Numeric underflow exception . . . 3-14
O
OF (carry) flag, EFLAGS register . . . 3-296
OF (overflow) flag, EFLAGS register . . . 3-21, 3-23, 3-306, 3-448, 3-627, 3-640, 3-643, 3-673
Opcode
    escape instructions . . . A-12
    format . . . 2-2
    map . . . A-1
Opcode extensions
    description . . . A-10
    table . . . A-11
Opcode integer instructions
    one-byte . . . A-4
    one-byte opcode map . . . A-6, A-7
    two-byte . . . A-4
    two-byte opcode map . . . A-8, A-9
Opcode key abbreviations . . . A-1
Operand
    instruction . . . 1-7
Operand-size override prefix . . . 2-2
OR instruction . . . 3-367, 3-459
ORPS instruction . . . 3-461
OUT instruction . . . 3-463
OUTS instruction . . . 3-465, 3-605
OUTSB instruction . . . 3-465
OUTSD instruction . . . 3-465
OUTSW instruction . . . 3-465
Overflow exception (#OF) . . . 3-306
Overflow, FPU exception (see Numeric overflow exception)
P
PACKSSDW instruction . . . 3-469
PACKSSWB instruction . . . 3-469
PACKUSWB instruction . . . 3-472
PADDB instruction . . . 3-475
PADDD instruction . . . 3-475
PADDSB instruction . . . 3-479
PADDSW instruction . . . 3-479
PADDUSB instruction . . . 3-482
PADDUSW instruction . . . 3-482
PADDW instruction . . . 3-475
PAE flag, CPUID instruction . . . 3-114
Page Attribute Table flag, CPUID instruction . . . 3-115
Page Size Extensions flag, CPUID instruction . . . 3-114
Page-table-entry global flag, CPUID instruction . . . 3-115
PAND instruction . . . 3-485
PANDN instruction . . . 3-487
PAT flag, CPUID instruction . . . 3-115
PAVGB instruction . . . 3-489
PAVGW instruction . . . 3-489
PCMPEQB instruction . . . 3-493
PCMPEQD instruction . . . 3-493
PCMPEQW instruction . . . 3-493
PCMPGTB instruction . . . 3-497
PCMPGTD instruction . . . 3-497
PCMPGTW instruction . . . 3-497
PE flag, CR0 register . . . 3-365
Performance-monitoring counters, reading . . . 3-602
PEXTRW instruction . . . 3-501
PGE flag, CPUID instruction . . . 3-115
Physical Address Extension flag, CPUID instruction . . . 3-114
PINSRW instruction . . . 3-503
Pi, loading . . . 3-210
PMADDWD instruction . . . 3-505
PMAXSW instruction . . . 3-508
PMAXUB instruction . . . 3-511
PMINSW instruction . . . 3-514
PMINUB instruction . . . 3-517
PMOVMSKB instruction . . . 3-520
PMULHUW instruction . . . 3-522
PMULHW instruction . . . 3-525
PMULLW instruction . . . 3-528
PN flag, CPUID instruction . . . 3-115
POP instruction . . . 3-531
POPA instruction . . . 3-536
POPAD instruction . . . 3-536
POPF instruction . . . 3-538
POPFD instruction . . . 3-538
POR instruction . . . 3-541
PREFETCH instruction . . . 3-543
Prefixes
    address size override . . . 2-2
    instruction, description . . . 2-1
    LOCK . . . 2-1, 3-367
    operand-size override . . . 2-2
    REP . . . 3-605
    REPE . . . 3-605
    repeat . . . 2-1
    REPNE . . . 3-605
    REPNZ . . . 3-605
    REPZ . . . 3-605
    segment override . . . 2-2
Procedure stack, pushing values on . . . 3-581
Processor Number flag, CPUID instruction . . . 3-115
Protection Enable flag, CR0 register . . . 3-365
PSADBW instruction . . . 3-545
PSE flag, CPUID instruction . . . 3-114
PSE-36 flag, CPUID instruction . . . 3-115
PSHUFW instruction . . . 3-548
PSLLD instruction . . . 3-550
PSLLQ instruction . . . 3-550
PSLLW instruction . . . 3-550
PSRAD instruction . . . 3-555
PSRAW instruction . . . 3-555
PSRLD instruction . . . 3-558
PSRLQ instruction . . . 3-558
PSRLW instruction . . . 3-558
PSUBB instruction . . . 3-563
PSUBD instruction . . . 3-563
PSUBSB instruction . . . 3-567
PSUBSW instruction . . . 3-567
PSUBUSB instruction . . . 3-570
PSUBUSW instruction . . . 3-570
PSUBW instruction . . . 3-563
PUNPCKHBW instruction . . . 3-573
PUNPCKHDQ instruction . . . 3-573
PUNPCKHWD instruction . . . 3-573
PUNPCKLBW instruction . . . 3-577
PUNPCKLDQ instruction . . . 3-577
PUNPCKLWD instruction . . . 3-577
PUSH instruction . . . 3-581
PUSHA instruction . . . 3-584
PUSHAD instruction . . . 3-584
PUSHF instruction . . . 3-587
PUSHFD instruction . . . 3-587
PXOR instruction . . . 3-589
Q
QNaN . . . 3-82, 3-91, 3-171
Quiet NaN (see QNaN)
R
RC (rounding control) field, FPU control word . . . 3-206, 3-210, 3-246
RCL instruction . . . 3-591
RCPPS instruction . . . 3-596
RCPSS instruction . . . 3-598
RCR instruction . . . 3-591
RDMSR instruction . . . 3-114, 3-600, 3-604
RDPMC instruction . . . 3-602
RDTSC instruction . . . 3-114, 3-604
Reg/opcode field, instruction format . . . 2-2
Related Literature . . . 1-9
Remainder, FPU operation . . . 3-223, 3-226
REP prefix . . . 3-88, 3-605
REPE prefix . . . 3-88, 3-605
REPNE prefix . . . 3-88, 3-605
REPNZ prefix . . . 3-88, 3-605
REPZ prefix . . . 3-88, 3-605
REP/REPE/REPZ/REPNE/REPNZ prefixes . . . 2-1, 3-304, 3-466
Reserved bits . . . 1-6
RET instruction . . . 3-608
ROL instruction . . . 3-591, 3-615
ROR instruction . . . 3-591, 3-615
Rotate operation . . . 3-591
Round to integer, FPU operation . . . 3-231
RPL field . . . 3-36
RSM instruction . . . 3-616
RSQRTPS instruction . . . 3-617
RSQRTSS instruction . . . 3-619
R/m field, instruction format . . . 2-2
S
SAHF instruction . . . 3-621
SAL instruction . . . 3-622
SAR instruction . . . 3-622
SBB instruction . . . 3-367, 3-627
Scale (operand addressing) . . . 2-3
Scale, FPU operation . . . 3-238
SCAS instruction . . . 3-605, 3-629
SCASB instruction . . . 3-629
SCASD instruction . . . 3-629
SCASW instruction . . . 3-629
Segment descriptor, segment limit . . . 3-375
Segment limit . . . 3-375
Segment override prefixes . . . 2-2
Segment registers, moving values to and from . . . 3-402
Segment selector, RPL field . . . 3-36
Segmented addressing . . . 1-7
SEP flag, CPUID instruction . . . 3-115
SETcc instructions . . . 3-632
SF (sign) flag, EFLAGS register . . . 3-21, 3-23
SFENCE instruction . . . 3-634
SGDT instruction . . . 3-636
SHL instruction . . . 3-622, 3-639
SHLD instruction . . . 3-640
SHR instruction . . . 3-622, 3-639
SHRD instruction . . . 3-643
SHUFPS instruction . . . 3-646
SIB byte
    32-bit addressing forms . . . 2-7
    description . . . 2-2
    format . . . 2-1
SIDT instruction . . . 3-636, 3-651
Signaling NaN (see SNaN)
Significand, extracting . . . 3-285
SIMD floating-point exceptions (see Floating-point exceptions)
Sine, FPU operation . . . 3-240, 3-242
SLDT instruction . . . 3-652
SMSW instruction . . . 3-654
SNaN . . . 3-82
SQRTPS instruction . . . 3-656
SQRTSS instruction . . . 3-659
Square root, FPU operation . . . 3-244
SS register . . . 3-349, 3-403, 3-532
SS segment override prefix . . . 2-2
Stack (see Procedure stack)
Status flags, EFLAGS register . . . 3-73, 3-76, 3-178, 3-183, 3-330, 3-632, 3-688
STC instruction . . . 3-662
STD instruction . . . 3-663
STI instruction . . . 3-664
STMXCSR instruction . . . 3-666
STOS instruction . . . 3-605, 3-668
STOSB instruction . . . 3-668
STOSD instruction . . . 3-668
STOSW instruction . . . 3-668
STR instruction . . . 3-671
Streaming SIMD Extensions
    CPUID instruction flag . . . 3-115
    encoding SIMD floating-point register field . . . B-27
    encoding SIMD-integer register field . . . B-34
    encoding Streaming SIMD Extensions cacheability control register field . . . B-35
    formats and encodings . . . B-27
    formats and encodings table . . . B-24
    instruction prefixes . . . B-24, B-25
    instruction prefixes, cacheability control instruction behavior . . . B-25
    notations . . . B-26
    SIMD integer instruction behavior . . . B-25
String operations . . . 3-87, 3-303, 3-369, 3-435, 3-465, 3-668
SUB instruction . . . 3-145, 3-367, 3-673, 3-685
SUBPS instruction . . . 3-675
SUBSS instruction . . . 3-678
SYSENTER instruction . . . 3-681
SYSEXIT instruction . . . 3-685
T
Tangent, FPU operation . . . 3-229
Task gate . . . 3-338
Task register
    loading . . . 3-380
    storing . . . 3-671
Task state segment (see TSS)
Task switch
    return from nested task, IRET instruction . . . 3-321
Task switch, CALL instruction . . . 3-53
TEST instruction . . . 3-688
Time Stamp Counter flag, CPUID instruction . . . 3-114
Time-stamp counter, reading . . . 3-604
TLB entry, invalidating (flushing) . . . 3-320
TS (task switched) flag, CR0 register . . . 3-70
TSC flag, CPUID instruction . . . 3-114
TSD flag, CR4 register . . . 3-604
TSS, relationship to task register . . . 3-671
U
UCOMISS instruction . . . 3-690
UD2 instruction . . . 3-697
Undefined format opcodes . . . 3-265
Underflow, FPU exception (see Numeric underflow exception)
Unordered values . . . 3-180, 3-183, 3-265, 3-267
UNPCKHPS instruction . . . 3-698
UNPCKLPS instruction . . . 3-701
V
Vector (see Interrupt vector)
Vector (see INTn instruction)
VERR instruction . . . 3-704
Version information, processor . . . 3-111
VERW instruction . . . 3-704
Virtual 8086 Mode Enhancements flag, CPUID instruction . . . 3-114
Virtual 8086 Mode flag, EFLAGS register . . . 3-321
VM flag, EFLAGS register . . . 3-321
VME flag, CPUID instruction . . . 3-114
W
WAIT instruction . . . 3-707
WBINVD instruction . . . 3-708
Write-back and invalidate caches . . . 3-708
WRMSR instruction . . . 3-114, 3-710
X
XADD instruction . . . 3-367, 3-712
XCHG instruction . . . 3-367, 3-714
XLAT instruction . . . 3-716
XLATB instruction . . . 3-716
XMM flag, CPUID instruction . . . 3-115
XOR instruction . . . 3-367, 3-718
XORPS instruction . . . 3-720
Z
ZF (zero) flag, EFLAGS register . . . 3-100, 3-102, 3-342, 3-372, 3-375, 3-605, 3-704

Volume 3: System Programming
1999
TABLE OF CONTENTS
CHAPTER 1
ABOUT THIS MANUAL
1.1. P6 FAMILY PROCESSOR TERMINOLOGY . . . 1-1
1.2. OVERVIEW OF THE INTEL ARCHITECTURE SOFTWARE DEVELOPER'S MANUAL, VOLUME 3: SYSTEM PROGRAMMING GUIDE . . . 1-1
1.3. OVERVIEW OF THE INTEL ARCHITECTURE SOFTWARE DEVELOPER'S MANUAL, VOLUME 1: BASIC ARCHITECTURE . . . 1-3
1.4. OVERVIEW OF THE INTEL ARCHITECTURE SOFTWARE DEVELOPER'S MANUAL, VOLUME 2: INSTRUCTION SET REFERENCE . . . 1-5
1.5. NOTATIONAL CONVENTIONS . . . 1-5
1.5.1. Bit and Byte Order . . . 1-6
1.5.2. Reserved Bits and Software Compatibility . . . 1-6
1.5.3. Instruction Operands . . . 1-7
1.5.4. Hexadecimal and Binary Numbers . . . 1-7
1.5.5. Segmented Addressing . . . 1-7
1.5.6. Exceptions . . . 1-8
1.6. RELATED LITERATURE . . . 1-9
CHAPTER 2
SYSTEM ARCHITECTURE OVERVIEW
2.1. OVERVIEW OF THE SYSTEM-LEVEL ARCHITECTURE . . . 2-1
2.1.1. Global and Local Descriptor Tables . . . 2-3
2.1.2. System Segments, Segment Descriptors, and Gates . . . 2-3
2.1.3. Task-State Segments and Task Gates . . . 2-4
2.1.4. Interrupt and Exception Handling . . . 2-4
2.1.5. Memory Management . . . 2-5
2.1.6. System Registers . . . 2-5
2.1.7. Other System Resources . . . 2-6
2.2. MODES OF OPERATION . . . 2-6
2.3. SYSTEM FLAGS AND FIELDS IN THE EFLAGS REGISTER . . . 2-8
2.4. MEMORY-MANAGEMENT REGISTERS . . . 2-10
2.4.1. Global Descriptor Table Register (GDTR) . . . 2-10
2.4.2. Local Descriptor Table Register (LDTR) . . . 2-11
2.4.3. IDTR Interrupt Descriptor Table Register . . . 2-11
2.4.4. Task Register (TR) . . . 2-11
2.5. CONTROL REGISTERS . . . 2-12
2.5.1. CPUID Qualification of Control Register Flags . . . 2-18
2.6. SYSTEM INSTRUCTION SUMMARY . . . 2-18
2.6.1. Loading and Storing System Registers . . . 2-20
2.6.2. Verifying of Access Privileges . . . 2-20
2.6.3. Loading and Storing Debug Registers . . . 2-21
2.6.4. Invalidating Caches and TLBs . . . 2-21
2.6.5. Controlling the Processor . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-22 2.6.6. Reading Performance-Monitoring and Time-Stamp Counters . . . . . . . . . . . . . .2-22 2.6.7. Reading and Writing Model-Specific Registers . . . . . . . . . . . . . . . . . . . . . . . . . .2-23 2.6.8. Loading and Storing the Streaming SIMD Extensions Control/Status Word . . . .2-23
CHAPTER 3 PROTECTED-MODE MEMORY MANAGEMENT
3.1. MEMORY MANAGEMENT OVERVIEW
3.2. USING SEGMENTS
3.2.1. Basic Flat Model
3.2.2. Protected Flat Model
3.2.3. Multisegment Model
3.2.4. Paging and Segmentation
3.3. PHYSICAL ADDRESS SPACE
3.4. LOGICAL AND LINEAR ADDRESSES
3.4.1. Segment Selectors
3.4.2. Segment Registers
3.4.3. Segment Descriptors
3.4.3.1. Code- and Data-Segment Descriptor Types
3.5. SYSTEM DESCRIPTOR TYPES
3.5.1. Segment Descriptor Tables
3.6. PAGING (VIRTUAL MEMORY)
3.6.1. Paging Options
3.6.2. Page Tables and Directories
3.6.2.1. Linear Address Translation (4-KByte Pages)
3.6.2.2. Linear Address Translation (4-MByte Pages)
3.6.2.3. Mixing 4-KByte and 4-MByte Pages
3.6.3. Base Address of the Page Directory
3.6.4. Page-Directory and Page-Table Entries
3.6.5. Not Present Page-Directory and Page-Table Entries
3.7. TRANSLATION LOOKASIDE BUFFERS (TLBS)
3.8. PHYSICAL ADDRESS EXTENSION
3.8.1. Linear Address Translation With Extended Addressing Enabled (4-KByte Pages)
3.8.2. Linear Address Translation With Extended Addressing Enabled (2-MByte or 4-MByte Pages)
3.8.3. Accessing the Full Extended Physical Address Space With the Extended Page-Table Structure
3.8.4. Page-Directory and Page-Table Entries With Extended Addressing Enabled
3.9. 36-BIT PAGE SIZE EXTENSION (PSE)
3.9.1. Description of the 36-bit PSE Feature
3.9.2. Fault Detection
3.10. MAPPING SEGMENTS TO PAGES

CHAPTER 4 PROTECTION
4.1. ENABLING AND DISABLING SEGMENT AND PAGE PROTECTION
4.2. FIELDS AND FLAGS USED FOR SEGMENT-LEVEL AND PAGE-LEVEL PROTECTION
4.3. LIMIT CHECKING
4.4. TYPE CHECKING
4.4.1. Null Segment Selector Checking
4.5. PRIVILEGE LEVELS
4.6. PRIVILEGE LEVEL CHECKING WHEN ACCESSING DATA SEGMENTS
4.6.1. Accessing Data in Code Segments
4.7. PRIVILEGE LEVEL CHECKING WHEN LOADING THE SS REGISTER
4.8. PRIVILEGE LEVEL CHECKING WHEN TRANSFERRING PROGRAM CONTROL BETWEEN CODE SEGMENTS
4.8.1. Direct Calls or Jumps to Code Segments
4.8.1.1. Accessing Nonconforming Code Segments
4.8.1.2. Accessing Conforming Code Segments
4.8.2. Gate Descriptors
4.8.3. Call Gates
4.8.4. Accessing a Code Segment Through a Call Gate
4.8.5. Stack Switching
4.8.6. Returning from a Called Procedure
4.9. PRIVILEGED INSTRUCTIONS
4.10. POINTER VALIDATION
4.10.1. Checking Access Rights (LAR Instruction)
4.10.2. Checking Read/Write Rights (VERR and VERW Instructions)
4.10.3. Checking That the Pointer Offset Is Within Limits (LSL Instruction)
4.10.4. Checking Caller Access Privileges (ARPL Instruction)
4.10.5. Checking Alignment
4.11. PAGE-LEVEL PROTECTION
4.11.1. Page-Protection Flags
4.11.2. Restricting Addressable Domain
4.11.3. Page Type
4.11.4. Combining Protection of Both Levels of Page Tables
4.11.5. Overrides to Page Protection
4.12. COMBINING PAGE AND SEGMENT PROTECTION
CHAPTER 5 INTERRUPT AND EXCEPTION HANDLING
5.1. INTERRUPT AND EXCEPTION OVERVIEW
5.1.1. Sources of Interrupts
5.1.1.1. External Interrupts
5.1.1.2. Maskable Hardware Interrupts
5.1.1.3. Software-Generated Interrupts
5.1.2. Sources of Exceptions
5.1.2.1. Program-Error Exceptions
5.1.2.2. Software-Generated Exceptions
5.1.2.3. Machine-Check Exceptions
5.2. EXCEPTION AND INTERRUPT VECTORS
5.3. EXCEPTION CLASSIFICATIONS
5.4. PROGRAM OR TASK RESTART
5.5. NONMASKABLE INTERRUPT (NMI)
5.5.1. Handling Multiple NMIs
5.6. ENABLING AND DISABLING INTERRUPTS
5.6.1. Masking Maskable Hardware Interrupts
5.6.2. Masking Instruction Breakpoints
5.6.3. Masking Exceptions and Interrupts When Switching Stacks
5.7. PRIORITY AMONG SIMULTANEOUS EXCEPTIONS AND INTERRUPTS
5.8. INTERRUPT DESCRIPTOR TABLE (IDT)
5.9. IDT DESCRIPTORS
5.10. EXCEPTION AND INTERRUPT HANDLING
5.10.1. Exception- or Interrupt-Handler Procedures
5.10.1.1. Protection of Exception- and Interrupt-Handler Procedures
5.10.1.2. Flag Usage By Exception- or Interrupt-Handler Procedure
5.10.2. Interrupt Tasks
5.11. ERROR CODE
5.12. EXCEPTION AND INTERRUPT REFERENCE

CHAPTER 6 TASK MANAGEMENT
6.1. TASK MANAGEMENT OVERVIEW
6.1.1. Task Structure
6.1.2. Task State
6.1.3. Executing a Task
6.2. TASK MANAGEMENT DATA STRUCTURES
6.2.1. Task-State Segment (TSS)
6.2.2. TSS Descriptor
6.2.3. Task Register
6.2.4. Task-Gate Descriptor
6.3. TASK SWITCHING
6.4. TASK LINKING
6.4.1. Use of Busy Flag To Prevent Recursive Task Switching
6.4.2. Modifying Task Linkages
6.5. TASK ADDRESS SPACE
6.5.1. Mapping Tasks to the Linear and Physical Address Spaces
6.5.2. Task Logical Address Space
6.6. 16-BIT TASK-STATE SEGMENT (TSS)

CHAPTER 7 MULTIPLE-PROCESSOR MANAGEMENT
7.1. LOCKED ATOMIC OPERATIONS
7.1.1. Guaranteed Atomic Operations
7.1.2. Bus Locking
7.1.2.1. Automatic Locking
7.1.2.2. Software Controlled Bus Locking
7.1.3. Handling Self- and Cross-Modifying Code
7.1.4. Effects of a LOCK Operation on Internal Processor Caches
7.2. MEMORY ORDERING
7.2.1. Memory Ordering in the Pentium and Intel486 Processors
7.2.2. Memory Ordering in the P6 Family Processors
7.2.3. Out of Order Stores From String Operations in P6 Family Processors
7.2.4. Strengthening or Weakening the Memory Ordering Model
7.3. PROPAGATION OF PAGE TABLE ENTRY CHANGES TO MULTIPLE PROCESSORS
7.4. SERIALIZING INSTRUCTIONS
7.5. ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC)
7.5.1. Presence of APIC
7.5.2. Enabling or Disabling the Local APIC
7.5.3. APIC Bus
7.5.4. Valid Interrupts
7.5.5. Interrupt Sources
7.5.6. Bus Arbitration Overview
7.5.7. The Local APIC Block Diagram
7.5.8. Relocation of the APIC Registers Base Address
7.5.9. Interrupt Destination and APIC ID
7.5.9.1. Physical Destination Mode
7.5.9.2. Logical Destination Mode
7.5.9.3. Flat Model
7.5.9.4. Cluster Model
7.5.9.5. Arbitration Priority
7.5.10. Interrupt Distribution Mechanisms
7.5.11. Local Vector Table
7.5.12. Interprocessor and Self-Interrupts
7.5.13. Interrupt Acceptance
7.5.13.1. Interrupt Acceptance Decision Flow Chart
7.5.13.2. Task Priority Register
7.5.13.3. Processor Priority Register (PPR)
7.5.13.4. Arbitration Priority Register (APR)
7.5.13.5. Spurious Interrupt
7.5.13.6. End-Of-Interrupt (EOI)
7.5.14. Local APIC State
7.5.14.1. Spurious-Interrupt Vector Register
7.5.14.2. Local APIC Initialization
7.5.14.3. Local APIC State After Power-Up Reset
7.5.14.4. Local APIC State After an INIT Reset
7.5.14.5. Local APIC State After INIT-Deassert Message
7.5.15. Local APIC Version Register
7.5.16. APIC Bus Arbitration Mechanism and Protocol
7.5.16.1. Bus Message Formats
7.5.16.2. APIC Bus Status Cycles
7.5.17. Error Handling
7.5.18. Timer
7.5.19. Software Visible Differences Between the Local APIC and the 82489DX
7.5.20. Performance Related Differences between the Local APIC and the 82489DX
7.5.21. New Features Incorporated in the Pentium and P6 Family Processors Local APIC
7.6. DUAL-PROCESSOR (DP) INITIALIZATION PROTOCOL
7.7. MULTIPLE-PROCESSOR (MP) INITIALIZATION PROTOCOL
7.7.1. MP Initialization Protocol Requirements and Restrictions
7.7.2. MP Protocol Nomenclature
7.7.3. Error Detection During the MP Initialization Protocol
7.7.4. Error Handling During the MP Initialization Protocol
7.7.5. MP Initialization Protocol Algorithm
CHAPTER 8 PROCESSOR MANAGEMENT AND INITIALIZATION
8.1. INITIALIZATION OVERVIEW
8.1.1. Processor State After Reset
8.1.2. Processor Built-In Self-Test (BIST)
8.1.3. Model and Stepping Information
8.1.4. First Instruction Executed
8.2. FPU INITIALIZATION
8.2.1. Configuring the FPU Environment
8.2.2. Setting the Processor for FPU Software Emulation
8.3. CACHE ENABLING
8.4. MODEL-SPECIFIC REGISTERS (MSRS)
8.5. MEMORY TYPE RANGE REGISTERS (MTRRS)
8.6. SOFTWARE INITIALIZATION FOR REAL-ADDRESS MODE OPERATION
8.6.1. Real-Address Mode IDT
8.6.2. NMI Interrupt Handling
8.7. SOFTWARE INITIALIZATION FOR PROTECTED-MODE OPERATION
8.7.1. Protected-Mode System Data Structures
8.7.2. Initializing Protected-Mode Exceptions and Interrupts
8.7.3. Initializing Paging
8.7.4. Initializing Multitasking
8.8. MODE SWITCHING
8.8.1. Switching to Protected Mode
8.8.2. Switching Back to Real-Address Mode
8.9. INITIALIZATION AND MODE SWITCHING EXAMPLE
8.9.1. Assembler Usage
8.9.2. STARTUP.ASM Listing
8.9.3. MAIN.ASM Source Code
8.9.4. Supporting Files
8.10. P6 FAMILY MICROCODE UPDATE FEATURE
8.10.1. Microcode Update
8.10.2. Microcode Update Loader
8.10.2.1. Update Loading Procedure
8.10.2.2. Hard Resets in Update Loading
8.10.2.3. Update in a Multiprocessor System
8.10.2.4. Update Loader Enhancements
8.10.3. Update Signature and Verification
8.10.3.1. Determining the Signature
8.10.3.2. Authenticating the Update
8.10.4. P6 Family Processor Microcode Update Specifications
8.10.4.1. Responsibilities of the BIOS
8.10.4.2. Responsibilities of the Calling Program
8.10.4.3. Microcode Update Functions
8.10.4.4. INT 15h-based Interface
8.10.4.5. Return Codes

CHAPTER 9 MEMORY CACHE CONTROL
9.1. INTERNAL CACHES, TLBS, AND BUFFERS
9.2. CACHING TERMINOLOGY
9.3. METHODS OF CACHING AVAILABLE
9.3.1. Buffering of Write Combining Memory Locations
9.3.2. Choosing a Memory Type
9.4. CACHE CONTROL PROTOCOL
9.5. CACHE CONTROL
9.5.1. Precedence of Cache Controls (P6 Family Processor)
9.5.2. Preventing Caching
9.6. CACHE MANAGEMENT INSTRUCTIONS
9.7. SELF-MODIFYING CODE
9.8. IMPLICIT CACHING (P6 FAMILY PROCESSORS)
9.9. EXPLICIT CACHING
9.10. INVALIDATING THE TRANSLATION LOOKASIDE BUFFERS (TLBS)
9.11. WRITE BUFFER
9.12. MEMORY TYPE RANGE REGISTERS (MTRRS)
9.12.1. MTRR Feature Identification
9.12.2. Setting Memory Ranges with MTRRs
9.12.2.1. MTRRdefType Register
9.12.2.2. Fixed Range MTRRs
9.12.2.3. Variable Range MTRRs
9.12.3. Example Base and Mask Calculations
9.12.4. Range Size and Alignment Requirement
9.12.4.1. MTRR Precedences
9.12.5. MTRR Initialization
9.12.6. Remapping Memory Types
9.12.7. MTRR Maintenance Programming Interface
9.12.7.1. MemTypeGet() Function
9.12.7.2. MemTypeSet() Function
9.12.8. Multiple-Processor Considerations
9.12.9. Large Page Size Considerations
9.13. PAGE ATTRIBUTE TABLE (PAT)
9.13.1. Background
9.13.2. Detecting Support for the PAT Feature
9.13.3. Technical Description of the PAT
9.13.4. Accessing the PAT
9.13.5. Programming the PAT

CHAPTER 10 MMX TECHNOLOGY SYSTEM PROGRAMMING
10.1. EMULATION OF THE MMX INSTRUCTION SET
10.2. THE MMX STATE AND MMX REGISTER ALIASING
10.2.1. Effect of MMX and Floating-Point Instructions on the FPU Tag Word
10.3. SAVING AND RESTORING THE MMX STATE AND REGISTERS
10.4. DESIGNING OPERATING SYSTEM TASK AND CONTEXT SWITCHING FACILITIES
10.4.1. Using the TS Flag in Control Register CR0 to Control MMX/FPU State Saving
10.5. EXCEPTIONS THAT CAN OCCUR WHEN EXECUTING MMX INSTRUCTIONS
10.5.1. Effect of MMX Instructions on Pending Floating-Point Exceptions
10.6. DEBUGGING

CHAPTER 11 STREAMING SIMD EXTENSIONS SYSTEM PROGRAMMING
11.1. EMULATION OF THE STREAMING SIMD EXTENSIONS
11.2. MMX STATE AND STREAMING SIMD EXTENSIONS
11.3. NEW PENTIUM III PROCESSOR REGISTERS
11.3.1. SIMD Floating-point Registers
11.3.2. SIMD Floating-point Control/Status Registers
11.3.2.1. Rounding Control Field
11.3.2.2. Flush-to-Zero
11.4. ENABLING STREAMING SIMD EXTENSIONS SUPPORT
11.4.1. Enabling Streaming SIMD Extensions Support
11.4.2. Device Not Available (DNA) Exceptions
11.4.3. FXSAVE/FXRSTOR as a Replacement for FSAVE/FRSTOR
11.4.4. Numeric Error flag and IGNNE#
11.5. SAVING AND RESTORING THE STREAMING SIMD EXTENSIONS STATE
11.6. DESIGNING OPERATING SYSTEM TASK AND CONTEXT SWITCHING FACILITIES
11.6.1. Using the TS Flag in Control Register CR0 to Control SIMD Floating-Point State Saving . . . 11-8
11.7. EXCEPTIONS THAT CAN OCCUR WHEN EXECUTING STREAMING SIMD EXTENSIONS INSTRUCTIONS . . . 11-11
11.7.1. SIMD Floating-point Non-Numeric Exceptions . . . 11-12
11.7.2. SIMD Floating-point Numeric Exceptions . . . 11-13
11.7.2.1. Exception Priority . . . 11-13
11.7.2.2. Automatic Masked Exception Handling . . . 11-14
11.7.2.3. Software Exception Handling - Unmasked Exceptions . . . 11-15
11.7.2.4. Interaction with x87 numeric exceptions . . . 11-16
11.7.3. SIMD Floating-point Numeric Exception Conditions and Masked/Unmasked Responses . . . 11-16
11.7.3.1. Invalid Operation Exception (#IA) . . . 11-17
11.7.3.2. Division-By-Zero Exception (#Z) . . . 11-18
11.7.3.3. Denormal Operand Exception (#D) . . . 11-19
11.7.3.4. Numeric Overflow Exception (#O) . . . 11-19
11.7.3.5. Numeric Underflow Exception (#U) . . . 11-20
11.7.3.6. Inexact Result (Precision) Exception (#P) . . . 11-21
11.7.4. Effect of Streaming SIMD Extensions Instructions on Pending Floating-Point Exceptions . . . 11-22
11.8. DEBUGGING . . . 11-22

CHAPTER 12 SYSTEM MANAGEMENT MODE (SMM)
12.1. SYSTEM MANAGEMENT MODE OVERVIEW . . . 12-1
12.2. SYSTEM MANAGEMENT INTERRUPT (SMI) . . . 12-2
12.3. SWITCHING BETWEEN SMM AND THE OTHER PROCESSOR OPERATING MODES . . . 12-2
12.3.1. Entering SMM . . . 12-2
12.3.1.1. Exiting From SMM . . . 12-3
12.4. SMRAM . . . 12-4
12.4.1. SMRAM State Save Map . . . 12-5
12.4.2. SMRAM Caching . . . 12-7
12.5. SMI HANDLER EXECUTION ENVIRONMENT . . . 12-8
12.6. EXCEPTIONS AND INTERRUPTS WITHIN SMM . . . 12-10
12.7. NMI HANDLING WHILE IN SMM . . . 12-11
12.8. SAVING THE FPU STATE WHILE IN SMM . . . 12-11
12.9. SMM REVISION IDENTIFIER . . . 12-12
12.10. AUTO HALT RESTART . . . 12-13
12.10.1. Executing the HLT Instruction in SMM . . . 12-14
12.11. SMBASE RELOCATION . . . 12-14
12.11.1. Relocating SMRAM to an Address Above 1 MByte . . . 12-15
12.12. I/O INSTRUCTION RESTART . . . 12-15
12.12.1. Back-to-Back SMI Interrupts When I/O Instruction Restart Is Being Used . . . 12-16
12.13. SMM MULTIPLE-PROCESSOR CONSIDERATIONS . . . 12-17

CHAPTER 13 MACHINE-CHECK ARCHITECTURE
13.1. MACHINE-CHECK EXCEPTIONS AND ARCHITECTURE . . . 13-1
13.2. COMPATIBILITY WITH PENTIUM PROCESSOR . . . 13-1
13.3. MACHINE-CHECK MSRS . . . 13-2
13.3.1. Machine-Check Global Control MSRs . . . 13-2
13.3.1.1. MCG_CAP MSR . . . 13-2
13.3.1.2. MCG_STATUS MSR . . . 13-3
13.3.1.3. MCG_CTL MSR . . . 13-4
13.3.2. Error-Reporting Register Banks . . . 13-4
13.3.2.1. MCi_CTL MSR . . . 13-4
13.3.2.2. MCi_STATUS MSR . . . 13-5
13.3.2.3. MCi_ADDR MSR . . . 13-6
13.3.2.4. MCi_MISC MSR . . . 13-7
13.3.3. Mapping of the Pentium Processor Machine-Check Errors to the P6 Family Machine-Check Architecture . . . 13-7
13.4. MACHINE-CHECK AVAILABILITY . . . 13-7
13.5. MACHINE-CHECK INITIALIZATION . . . 13-7
13.6. INTERPRETING THE MCA ERROR CODES . . . 13-8
13.6.1. Simple Error Codes . . . 13-9
13.6.2. Compound Error Codes . . . 13-9
13.6.3. Interpreting the Machine-Check Error Codes for External Bus Errors . . . 13-11
13.7. GUIDELINES FOR WRITING MACHINE-CHECK SOFTWARE . . . 13-14
13.7.1. Machine-Check Exception Handler . . . 13-14
13.7.2. Pentium Processor Machine-Check Exception Handling . . . 13-16
13.7.3. Logging Correctable Machine-Check Errors . . . 13-16

CHAPTER 14 CODE OPTIMIZATION
14.1. CODE OPTIMIZATION GUIDELINES . . . 14-1
14.1.1. General Code Optimization Guidelines . . . 14-1
14.1.2. Guidelines for Optimizing MMX Code . . . 14-2
14.1.3. Guidelines for Optimizing Floating-Point Code . . . 14-2
14.1.4. Guidelines for Optimizing SIMD Floating-point Code . . . 14-3
14.2. BRANCH PREDICTION OPTIMIZATION . . . 14-4
14.2.1. Branch Prediction Rules . . . 14-4
14.2.2. Optimizing Branch Predictions in Code . . . 14-5
14.2.3. Eliminating and Reducing the Number of Branches . . . 14-5
14.3. REDUCING PARTIAL REGISTER STALLS ON P6 FAMILY PROCESSORS . . . 14-7
14.4. ALIGNMENT RULES AND GUIDELINES . . . 14-9
14.4.1. Alignment Penalties . . . 14-9
14.4.2. Code Alignment . . . 14-9
14.4.3. Data Alignment . . . 14-9
14.4.3.1. Alignment of Data Structures and Arrays Greater Than 32 Bytes . . . 14-10
14.4.3.2. Alignment of Data in Memory and on the Stack . . . 14-10
14.5. INSTRUCTION SCHEDULING OVERVIEW . . . 14-12
14.5.1. Instruction Pairing Guidelines . . . 14-12
14.5.1.1. General Pairing Rules . . . 14-12
14.5.1.2. Integer Pairing Rules . . . 14-13
14.5.1.3. MMX Instruction Pairing Guidelines . . . 14-17
14.5.2. Pipelining Guidelines . . . 14-18
14.5.2.1. MMX Instruction Pipelining Guidelines . . . 14-18
14.5.2.2. Floating-Point Pipelining Guidelines . . . 14-18
14.5.3. Scheduling Rules for P6 Family Processors . . . 14-22
14.6. ACCESSING MEMORY . . . 14-24
14.6.1. Using MMX Instructions That Access Memory . . . 14-24
14.6.2. Partial Memory Accesses With MMX Instructions . . . 14-25
14.6.3. Write Allocation Effects . . . 14-27
14.7. ADDRESSING MODES AND REGISTER USAGE . . . 14-29
14.8. INSTRUCTION LENGTH . . . 14-30
14.9. PREFIXED OPCODES . . . 14-31
14.10. INTEGER INSTRUCTION SELECTION AND OPTIMIZATIONS . . . 14-32

CHAPTER 15 DEBUGGING AND PERFORMANCE MONITORING
15.1. OVERVIEW OF THE DEBUGGING SUPPORT FACILITIES . . . 15-1
15.2. DEBUG REGISTERS . . . 15-2
15.2.1. Debug Address Registers (DR0-DR3) . . . 15-4
15.2.2. Debug Registers DR4 and DR5 . . . 15-4
15.2.3. Debug Status Register (DR6) . . . 15-4
15.2.4. Debug Control Register (DR7) . . . 15-5
15.2.5. Breakpoint Field Recognition . . . 15-6
15.3. DEBUG EXCEPTIONS . . . 15-7
15.3.1. Debug Exception (#DB) - Interrupt Vector 1 . . . 15-8
15.3.1.1. Instruction-Breakpoint Exception Condition . . . 15-8
15.3.1.2. Data Memory and I/O Breakpoint Exception Conditions . . . 15-9
15.3.1.3. General-Detect Exception Condition . . . 15-10
15.3.1.4. Single-Step Exception Condition . . . 15-10
15.3.1.5. Task-Switch Exception Condition . . . 15-11
15.3.2. Breakpoint Exception (#BP) - Interrupt Vector 3 . . . 15-11
15.4. LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING . . . 15-11
15.4.1. DebugCtlMSR Register . . . 15-11
15.4.2. Last Branch and Last Exception MSRs . . . 15-13
15.4.3. Monitoring Branches, Exceptions, and Interrupts . . . 15-13
15.4.4. Single-Stepping on Branches, Exceptions, and Interrupts . . . 15-14
15.4.5. Initializing Last Branch or Last Exception/Interrupt Recording . . . 15-14
15.5. TIME-STAMP COUNTER . . . 15-14
15.6. PERFORMANCE-MONITORING COUNTERS . . . 15-15
15.6.1. P6 Family Processor Performance-Monitoring Counters . . . 15-15
15.6.1.1. PerfEvtSel0 and PerfEvtSel1 MSRs . . . 15-16
15.6.1.2. PerfCtr0 and PerfCtr1 MSRs . . . 15-18
15.6.1.3. Starting and Stopping the Performance-Monitoring Counters . . . 15-18
15.6.1.4. Event and Time-Stamp Monitoring Software . . . 15-18
15.6.2. Monitoring Counter Overflow . . . 15-19
15.6.3. Pentium Processor Performance-Monitoring Counters . . . 15-20
15.6.3.1. Control and Event Select Register (CESR) . . . 15-20
15.6.3.2. Use of the Performance-Monitoring Pins . . . 15-21
15.6.3.3. Events Counted . . . 15-22

CHAPTER 16 8086 EMULATION
16.1. REAL-ADDRESS MODE . . . 16-1
16.1.1. Address Translation in Real-Address Mode . . . 16-3
16.1.2. Registers Supported in Real-Address Mode . . . 16-4
16.1.3. Instructions Supported in Real-Address Mode . . . 16-4
16.1.4. Interrupt and Exception Handling . . . 16-6
16.2. VIRTUAL-8086 MODE . . . 16-9
16.2.1. Enabling Virtual-8086 Mode . . . 16-9
16.2.2. Structure of a Virtual-8086 Task . . . 16-9
16.2.3. Paging of Virtual-8086 Tasks . . . 16-10
16.2.4. Protection within a Virtual-8086 Task . . . 16-11
16.2.5. Entering Virtual-8086 Mode . . . 16-11
16.2.6. Leaving Virtual-8086 Mode . . . 16-13
16.2.7. Sensitive Instructions . . . 16-14
16.2.8. Virtual-8086 Mode I/O . . . 16-14
16.2.8.1. I/O-Port-Mapped I/O . . . 16-15
16.2.8.2. Memory-Mapped I/O . . . 16-15
16.2.8.3. Special I/O Buffers . . . 16-15
16.3. INTERRUPT AND EXCEPTION HANDLING IN VIRTUAL-8086 MODE . . . 16-15
16.3.1. Class 1 - Hardware Interrupt and Exception Handling in Virtual-8086 Mode . . . 16-17
16.3.1.1. Handling an Interrupt or Exception Through a Protected-Mode Trap or Interrupt Gate . . . 16-17
16.3.1.2. Handling an Interrupt or Exception With an 8086 Program Interrupt or Exception Handler . . . 16-19
16.3.1.3. Handling an Interrupt or Exception Through a Task Gate . . . 16-20
16.3.2. Class 2 - Maskable Hardware Interrupt Handling in Virtual-8086 Mode Using the Virtual Interrupt Mechanism . . . 16-20
16.3.3. Class 3 - Software Interrupt Handling in Virtual-8086 Mode . . . 16-23
16.3.3.1. Method 1: Software Interrupt Handling . . . 16-25
16.3.3.2. Methods 2 and 3: Software Interrupt Handling . . . 16-26
16.3.3.3. Method 4: Software Interrupt Handling . . . 16-26
16.3.3.4. Method 5: Software Interrupt Handling . . . 16-26
16.3.3.5. Method 6: Software Interrupt Handling . . . 16-27
16.4. PROTECTED-MODE VIRTUAL INTERRUPTS . . . 16-27

CHAPTER 17 MIXING 16-BIT AND 32-BIT CODE
17.1. DEFINING 16-BIT AND 32-BIT PROGRAM MODULES . . . 17-2
17.2. MIXING 16-BIT AND 32-BIT OPERATIONS WITHIN A CODE SEGMENT . . . 17-2
17.3. SHARING DATA AMONG MIXED-SIZE CODE SEGMENTS . . . 17-3
17.4. TRANSFERRING CONTROL AMONG MIXED-SIZE CODE SEGMENTS . . . 17-4
17.4.1. Code-Segment Pointer Size . . . 17-5
17.4.2. Stack Management for Control Transfer . . . 17-5
17.4.2.1. Controlling the Operand-Size Attribute For a Call . . . 17-7
17.4.2.2. Passing Parameters With a Gate . . . 17-7
17.4.3. Interrupt Control Transfers . . . 17-8
17.4.4. Parameter Translation . . . 17-8
17.4.5. Writing Interface Procedures . . . 17-8

CHAPTER 18 INTEL ARCHITECTURE COMPATIBILITY
18.1. INTEL ARCHITECTURE FAMILIES AND CATEGORIES . . . 18-1
18.2. RESERVED BITS . . . 18-1
18.3. ENABLING NEW FUNCTIONS AND MODES . . . 18-2
18.4. DETECTING THE PRESENCE OF NEW FEATURES THROUGH SOFTWARE . . . 18-2
18.5. MMX TECHNOLOGY . . . 18-3
18.6. STREAMING SIMD EXTENSIONS . . . 18-3
18.7. NEW INSTRUCTIONS IN THE PENTIUM AND LATER INTEL ARCHITECTURE PROCESSORS . . . 18-3
18.7.1. Instructions Added Prior to the Pentium Processor . . . 18-5
18.8. OBSOLETE INSTRUCTIONS . . . 18-5
18.9. UNDEFINED OPCODES . . . 18-6
18.10. NEW FLAGS IN THE EFLAGS REGISTER . . . 18-6
18.10.1. Using EFLAGS Flags to Distinguish Between 32-Bit Intel Architecture Processors . . . 18-6
18.11. STACK OPERATIONS . . . 18-7
18.11.1. PUSH SP . . . 18-7
18.11.2. EFLAGS Pushed on the Stack . . . 18-7
18.12. FPU . . . 18-7
18.12.1. Control Register CR0 Flags . . . 18-8
18.12.2. FPU Status Word . . . 18-8
18.12.2.1. Condition Code Flags (C0 through C3) . . . 18-8
18.12.2.2. Stack Fault Flag . . . 18-9
18.12.3. FPU Control Word . . . 18-9
18.12.4. FPU Tag Word . . . 18-9
18.12.5. Data Types . . . 18-10
18.12.5.1. NaNs . . . 18-10
18.12.5.2. Pseudo-zero, Pseudo-NaN, Pseudo-infinity, and Unnormal Formats . . . 18-10
18.12.6. Floating-Point Exceptions . . . 18-11
18.12.6.1. Denormal Operand Exception (#D) . . . 18-11
18.12.6.2. Numeric Overflow Exception (#O) . . . 18-11
18.12.6.3. Numeric Underflow Exception (#U) . . . 18-12
18.12.6.4. Exception Precedence . . . 18-12
18.12.6.5. CS and EIP For FPU Exceptions . . . 18-12
18.12.6.6. FPU Error Signals . . . 18-12
18.12.6.7. Assertion of the FERR# Pin . . . 18-13
18.12.6.8. Invalid Operation Exception On Denormals . . . 18-13
18.12.6.9. Alignment Check Exceptions (#AC) . . . 18-13
18.12.6.10. Segment Not Present Exception During FLDENV . . . 18-14
18.12.6.11. Device Not Available Exception (#NM) . . . 18-14
18.12.6.12. Coprocessor Segment Overrun Exception . . . 18-14
18.12.6.13. General Protection Exception (#GP) . . . 18-14
18.12.6.14. Floating-Point Error Exception (#MF) . . . 18-14
18.12.7. Changes to Floating-Point Instructions . . . 18-14
18.12.7.1. FDIV, FPREM, and FSQRT Instructions . . . 18-15
18.12.7.2. FSCALE Instruction . . . 18-15
18.12.7.3. FPREM1 Instruction . . . 18-15
18.12.7.4. FPREM Instruction . . . 18-15
18.12.7.5. FUCOM, FUCOMP, and FUCOMPP Instructions . . . 18-15
18.12.7.6. FPTAN Instruction . . . 18-15
18.12.7.7. Stack Overflow . . . 18-16
18.12.7.8. FSIN, FCOS, and FSINCOS Instructions . . . 18-16
18.12.7.9. FPATAN Instruction . . . 18-16
18.12.7.10. F2XM1 Instruction . . . 18-16
18.12.7.11. FLD Instruction . . . 18-16
18.12.7.12. FXTRACT Instruction . . . 18-17
18.12.7.13. Load Constant Instructions . . . 18-17
18.12.7.14. FSETPM Instruction . . . 18-17
18.12.7.15. FXAM Instruction . . . 18-17
18.12.7.16. FSAVE and FSTENV Instructions . . . 18-18
18.12.8. Transcendental Instructions . . . 18-18
18.12.9. Obsolete Instructions . . . 18-18
18.12.10. WAIT/FWAIT Prefix Differences . . . 18-18
18.12.11. Operands Split Across Segments and/or Pages . . . 18-18
18.12.12. FPU Instruction Synchronization . . . 18-19
18.13. SERIALIZING INSTRUCTIONS . . . 18-19
18.14. FPU AND MATH COPROCESSOR INITIALIZATION . . . 18-19
18.14.1. Intel 387 and Intel 287 Math Coprocessor Initialization . . . 18-19
18.14.2. Intel486 SX Processor and Intel 487 SX Math Coprocessor Initialization . . . 18-20
18.15. CONTROL REGISTERS . . . 18-21
18.16. MEMORY MANAGEMENT FACILITIES . . . 18-23
18.16.1. New Memory Management Control Flags . . . 18-23
18.16.1.1. Physical Memory Addressing Extension . . . 18-23
18.16.1.2. Global Pages . . . 18-23
18.16.1.3. Larger Page Sizes . . . 18-23
18.16.2. CD and NW Cache Control Flags . . . 18-23
18.16.3. Descriptor Types and Contents . . . 18-24
18.16.4. Changes in Segment Descriptor Loads . . . 18-24
18.17. DEBUG FACILITIES . . . 18-24
18.17.1. Differences in Debug Register DR6 . . . 18-24
18.17.2. Differences in Debug Register DR7 . . . 18-24
18.17.3. Debug Registers DR4 and DR5 . . . 18-25
18.17.4. Recognition of Breakpoints . . . 18-25
18.18. TEST REGISTERS . . . 18-25
18.19. EXCEPTIONS AND/OR EXCEPTION CONDITIONS . . . 18-25
18.19.1. Machine-Check Architecture . . . 18-27
18.19.2. Priority of Exceptions . . . 18-27
18.20. INTERRUPTS . . . 18-27
18.20.1. Interrupt Propagation Delay . . . 18-27
18.20.2. NMI Interrupts . . . 18-28
18.20.3. IDT Limit . . . 18-28
18.21. TASK SWITCHING AND TSS . . . 18-28
18.21.1. P6 Family and Pentium Processor TSS . . . 18-28
18.21.2. TSS Selector Writes . . . 18-28
18.21.3. Order of Reads/Writes to the TSS . . . 18-28
18.21.4. Using A 16-Bit TSS with 32-Bit Constructs . . . 18-29
18.21.5. Differences in I/O Map Base Addresses . . . 18-29
18.22. CACHE MANAGEMENT . . . 18-30
18.22.1. Self-Modifying Code with Cache Enabled . . . 18-31
18.23. PAGING . . . 18-31
18.23.1. Large Pages . . . 18-32
18.23.2. PCD and PWT Flags . . . 18-32
18.23.3. Enabling and Disabling Paging . . . 18-32
18.24. STACK OPERATIONS . . . 18-33
18.24.1. Selector Pushes and Pops . . . 18-33
18.24.2. Error Code Pushes . . . 18-33
18.24.3. Fault Handling Effects on the Stack . . . 18-33
18.24.4. Interlevel RET/IRET From a 16-Bit Interrupt or Call Gate . . . 18-34
18.25. MIXING 16- AND 32-BIT SEGMENTS . . . 18-34
18.26. SEGMENT AND ADDRESS WRAPAROUND . . . 18-35
18.26.1. Segment Wraparound . . . 18-35
18.27. WRITE BUFFERS AND MEMORY ORDERING . . . 18-36
18.28. BUS LOCKING . . . 18-37
18.29. BUS HOLD . . . 18-37
18.30. TWO WAYS TO RUN INTEL 286 PROCESSOR TASKS . . . 18-37
18.31. MODEL-SPECIFIC EXTENSIONS TO THE INTEL ARCHITECTURE . . . 18-38
18.31.1. Model-Specific Registers . . . 18-38
18.31.2. RDMSR and WRMSR Instructions . . . 18-38
18.31.3. Memory Type Range Registers . . . 18-39
18.31.4. Machine-Check Exception and Architecture . . . 18-39
18.31.5. Performance-Monitoring Counters . . . 18-40
APPENDIX A PERFORMANCE-MONITORING EVENTS
A.1. P6 FAMILY PROCESSOR PERFORMANCE-MONITORING EVENTS . . . A-1
A.2. PENTIUM PROCESSOR PERFORMANCE-MONITORING EVENTS . . . A-12
APPENDIX B MODEL-SPECIFIC REGISTERS
APPENDIX C DUAL-PROCESSOR (DP) BOOTUP SEQUENCE EXAMPLE (SPECIFIC TO PENTIUM PROCESSORS)
C.1. PRIMARY PROCESSOR'S SEQUENCE OF EVENTS . . . C-1
C.2. SECONDARY PROCESSOR'S SEQUENCE OF EVENTS FOLLOWING RECEIPT OF START-UP IPI . . . C-4
APPENDIX D MULTIPLE-PROCESSOR (MP) BOOTUP SEQUENCE EXAMPLE (SPECIFIC TO P6 FAMILY PROCESSORS)
D.1. BSP'S SEQUENCE OF EVENTS . . . D-1
D.2. AP'S SEQUENCE OF EVENTS FOLLOWING RECEIPT OF START-UP IPI . . . D-3
APPENDIX E PROGRAMMING THE LINT0 AND LINT1 INPUTS
E.1. CONSTANTS . . . E-1
E.2. LINT[0:1] PINS PROGRAMMING PROCEDURE . . . E-1
TABLE OF FIGURES
Figure 1-1. Bit and Byte Order . . . 1-6
Figure 2-1. System-Level Registers and Data Structures . . . 2-2
Figure 2-2. Transitions Among the Processor's Operating Modes . . . 2-7
Figure 2-3. System Flags in the EFLAGS Register . . . 2-8
Figure 2-4. Memory Management Registers . . . 2-10
Figure 2-5. Control Registers . . . 2-12
Figure 3-1. Segmentation and Paging . . . 3-2
Figure 3-2. Flat Model . . . 3-4
Figure 3-3. Protected Flat Model . . . 3-4
Figure 3-4. Multisegment Model . . . 3-5
Figure 3-5. Logical Address to Linear Address Translation . . . 3-7
Figure 3-6. Segment Selector . . . 3-8
Figure 3-7. Segment Registers . . . 3-9
Figure 3-8. Segment Descriptor . . . 3-11
Figure 3-9. Segment Descriptor When Segment-Present Flag Is Clear . . . 3-13
Figure 3-10. Global and Local Descriptor Tables . . . 3-17
Figure 3-11. Pseudo-Descriptor Format . . . 3-18
Figure 3-12. Linear Address Translation (4-KByte Pages) . . . 3-21
Figure 3-13. Linear Address Translation (4-MByte Pages) . . . 3-22
Figure 3-14. Format of Page-Directory and Page-Table Entries for 4-KByte Pages and 32-Bit Physical Addresses . . . 3-24
Figure 3-15. Format of Page-Directory Entries for 4-MByte Pages and 32-Bit Addresses . . . 3-25
Figure 3-16. Format of a Page-Table or Page-Directory Entry for a Not-Present Page . . . 3-28
Figure 3-17. Register CR3 Format When the Physical Address Extension is Enabled . . . 3-30
Figure 3-18. Linear Address Translation With Extended Physical Addressing Enabled (4-KByte Pages) . . . 3-31
Figure 3-19. Linear Address Translation With Extended Physical Addressing Enabled (2-MByte or 4-MByte Pages) . . . 3-33
Figure 3-20. Format of Page-Directory-Pointer-Table, Page-Directory, and Page-Table Entries for 4-KByte Pages and 36-Bit Extended Physical Addresses . . . 3-34
Figure 3-21. Format of Page-Directory-Pointer-Table and Page-Directory Entries for 2- or 4-MByte Pages and 36-Bit Extended Physical Addresses . . . 3-35
Figure 3-22. PDE Format Differences between 36-bit and 32-bit addressing . . . 3-38
Figure 3-23. Memory Management Convention That Assigns a Page Table to Each Segment . . . 3-40
Figure 4-1. Descriptor Fields Used for Protection . . . 4-4
Figure 4-2. Protection Rings . . . 4-8
Figure 4-3. Privilege Check for Data Access . . . 4-10
Figure 4-4. Examples of Accessing Data Segments From Various Privilege Levels . . . 4-11
Figure 4-5. Privilege Check for Control Transfer Without Using a Gate . . . 4-13
Figure 4-6. Examples of Accessing Conforming and Nonconforming Code Segments From Various Privilege Levels . . . 4-14
Figure 4-7. Call-Gate Descriptor . . . 4-17
Figure 4-8. Call-Gate Mechanism . . . 4-18
Figure 4-9. Privilege Check for Control Transfer with Call Gate . . . 4-19
Figure 4-10. Example of Accessing Call Gates At Various Privilege Levels . . . 4-20
Figure 4-11. Stack Switching During an Interprivilege-Level Call . . . 4-23
Figure 4-12. Use of RPL to Weaken Privilege Level of Called Procedure . . . 4-29
Figure 5-1. Relationship of the IDTR and IDT . . . 5-13
Figure 5-2. IDT Gate Descriptors . . . 5-14
Figure 5-3. Interrupt Procedure Call . . . 5-16
Figure 5-4. Stack Usage on Transfers to Interrupt and Exception-Handling Routines . . . 5-17
Figure 5-5. Interrupt Task Switch . . . 5-19
Figure 5-6. Error Code . . . 5-20
Figure 5-7. Page-Fault Error Code . . . 5-45
Figure 6-1. Structure of a Task . . . 6-2
Figure 6-2. 32-Bit Task-State Segment (TSS) . . . 6-5
Figure 6-3. TSS Descriptor . . . 6-7
Figure 6-4. Task Register . . . 6-9
Figure 6-5. Task-Gate Descriptor . . . 6-9
Figure 6-6. Task Gates Referencing the Same Task . . . 6-11
Figure 6-7. Nested Tasks . . . 6-15
Figure 6-8. Overlapping Linear-to-Physical Mappings . . . 6-18
Figure 6-9. 16-Bit TSS Format . . . 6-20
Figure 7-1. Example of Write Ordering in Multiple-Processor Systems . . . 7-8
Figure 7-2. I/O APIC and Local APICs in Multiple-Processor Systems . . . 7-14
Figure 7-3. Local APIC Structure . . . 7-17
Figure 7-4. APIC_BASE_MSR . . . 7-19
Figure 7-5. Local APIC ID Register . . . 7-20
Figure 7-6. Logical Destination Register (LDR) . . . 7-21
Figure 7-7. Destination Format Register (DFR) . . . 7-21
Figure 7-8. Local Vector Table (LVT) . . . 7-24
Figure 7-9. Interrupt Command Register (ICR) . . . 7-26
Figure 7-10. IRR, ISR and TMR Registers . . . 7-30
Figure 7-11. Interrupt Acceptance Flow Chart for the Local APIC . . . 7-31
Figure 7-12. Task Priority Register (TPR) . . . 7-32
Figure 7-13. EOI Register . . . 7-33
Figure 7-14. Spurious-Interrupt Vector Register (SVR) . . . 7-34
Figure 7-15. Local APIC Version Register . . . 7-36
Figure 7-16. Error Status Register (ESR) . . . 7-42
Figure 7-17. Divide Configuration Register . . . 7-43
Figure 7-18. Initial Count and Current Count Registers . . . 7-44
Figure 7-19. SMP System . . . 7-49
Figure 8-1. Contents of CR0 Register after Reset . . . 8-5
Figure 8-2. Processor Type and Signature in the EDX Register after Reset . . . 8-5
Figure 8-3. Processor State After Reset . . . 8-17
Figure 8-4. Constructing Temporary GDT and Switching to Protected Mode (Lines 162-172 of List File) . . . 8-26
Figure 8-5. Moving the GDT, IDT and TSS from ROM to RAM (Lines 196-261 of List File) . . . 8-27
Figure 8-6. Task Switching (Lines 282-296 of List File) . . . 8-28
Figure 8-7. Integrating Processor Specific Updates . . . 8-32
Figure 8-8. Format of the Microcode Update Data Block . . . 8-35
Figure 8-9. Write Operation Flow Chart . . . 8-47
Figure 9-1. Intel Architecture Caches . . . 9-2
Figure 9-2. Cache-Control Mechanisms Available in the Intel Architecture Processors . . . 9-10
Figure 9-3. Mapping Physical Memory With MTRRs . . . 9-20
Figure 9-4. MTRRcap Register . . . 9-21
Figure 9-5. MTRRdefType Register . . . 9-22
Figure 9-6. MTRRphysBasen and MTRRphysMaskn Variable-Range Register Pair . . . 9-24
Figure 9-7. Page Attribute Table Model Specific Register . . . 9-34
Figure 9-8. Page Attribute Table Index Scheme for Paging Hierarchy . . . 9-36
Figure 10-1. Mapping of MMX Registers to Floating-Point Registers . . . 10-2
Figure 10-2. Example of MMX/FPU State Saving During an Operating System-Controlled Task Switch . . . 10-6
Figure 10-3. Mapping of MMX Registers to Floating-Point (FP) Registers . . . 10-9
Figure 11-1. Streaming SIMD Extensions Control/Status Register Format . . . 11-3
Figure 11-2. Example of SIMD Floating-Point State Saving During an Operating System-Controlled Task Switch . . . 11-9
Figure 12-1. SMRAM Usage . . . 12-5
Figure 12-2. SMM Revision Identifier . . . 12-13
Figure 12-3. Auto HALT Restart Field . . . 12-13
Figure 12-4. SMBASE Relocation Field . . . 12-15
Figure 12-5. I/O Instruction Restart Field . . . 12-16
Figure 13-1. Machine-Check MSRs . . . 13-2
Figure 13-2. MCG_CAP Register . . . 13-3
Figure 13-3. MCG_STATUS Register . . . 13-3
Figure 13-4. MCi_CTL Register . . . 13-4
Figure 13-5. MCi_STATUS Register . . . 13-5
Figure 13-6. Machine-Check Bank Address Register . . . 13-6
Figure 14-1. Stack and Memory Layout of Static Variables . . . 14-11
Figure 14-2. Pipeline Example of AGI Stall . . . 14-29
Figure 15-1. Debug Registers . . . 15-3
Figure 15-2. DebugCtlMSR Register . . . 15-12
Figure 15-3. PerfEvtSel0 and PerfEvtSel1 MSRs . . . 15-17
Figure 15-4. CESR MSR (Pentium Processor Only) . . . 15-21
Figure 16-1. Real-Address Mode Address Translation . . . 16-4
Figure 16-2. Interrupt Vector Table in Real-Address Mode . . . 16-7
Figure 16-3. Entering and Leaving Virtual-8086 Mode . . . 16-12
Figure 16-4. Privilege Level 0 Stack After Interrupt or Exception in Virtual-8086 Mode . . . 16-18
Figure 16-5. Software Interrupt Redirection Bit Map in TSS . . . 16-25
Figure 17-1. Stack after Far 16- and 32-Bit Calls . . . 17-6
Figure 18-1. I/O Map Base Address Differences . . . 18-30
TABLE OF TABLES
Table 2-1. Action Taken for Combinations of EM, MP, TS, CR4.OSFXSR, and CPUID.XMM . . . 2-15
Table 2-2. Summary of System Instructions . . . 2-19
Table 3-1. Code- and Data-Segment Types . . . 3-14
Table 3-2. System-Segment and Gate-Descriptor Types . . . 3-16
Table 3-3. Page Sizes and Physical Address Sizes . . . 3-20
Table 3-4. Paging Modes and Physical Address Size . . . 3-37
Table 4-1. Privilege Check Rules for Call Gates . . . 4-19
Table 4-2. Combined Page-Directory and Page-Table Protection . . . 4-33
Table 5-1. Protected-Mode Exceptions and Interrupts . . . 5-6
Table 5-2. SIMD Floating-Point Exceptions Priority . . . 5-11
Table 5-3. Priority Among Simultaneous Exceptions and Interrupts . . . 5-12
Table 5-4. Interrupt and Exception Classes . . . 5-32
Table 5-5. Conditions for Generating a Double Fault . . . 5-33
Table 5-6. Invalid TSS Conditions . . . 5-35
Table 5-7. Alignment Requirements by Data Type . . . 5-50
Table 6-1. Exception Conditions Checked During a Task Switch . . . 6-13
Table 6-2. Effect of a Task Switch on Busy Flag, NT Flag, Previous Task Link Field, and TS Flag . . . 6-15
Table 7-1. Local APIC Register Address Map . . . 7-18
Table 7-2. Valid Combinations for the APIC Interrupt Command Register . . . 7-29
Table 7-3. EOI Message (14 Cycles) . . . 7-37
Table 7-4. Short Message (21 Cycles) . . . 7-38
Table 7-5. Nonfocused Lowest Priority Message (34 Cycles) . . . 7-39
Table 7-6. APIC Bus Status Cycles Interpretation . . . 7-40
Table 7-7. Types of Boot Phase IPIs . . . 7-47
Table 7-8. Boot Phase IPI Message Format . . . 7-47
Table 8-1. 32-Bit Intel Architecture Processor States Following Power-up, Reset, or INIT . . . 8-3
Table 8-2. Recommended Settings of EM and MP Flags on Intel Architecture Processors . . . 8-7
Table 8-3. Software Emulation Settings of EM, MP, and NE Flags . . . 8-8
Table 8-4. Main Initialization Steps in STARTUP.ASM Source Listing . . . 8-18
Table 8-5. Relationship Between BLD Item and ASM Source File . . . 8-31
Table 8-6. P6 Family Processor MSR Register Components . . . 8-33
Table 8-7. Microcode Update Encoding Format . . . 8-34
Table 8-8. Microcode Update Functions . . . 8-43
Table 8-9. Parameters for the Presence Test . . . 8-44
Table 8-10. Parameters for the Write Update Data Function . . . 8-45
Table 8-11. Parameters for the Control Update Sub-function . . . 8-48
Table 8-12. Mnemonic Values . . . 8-48
Table 8-13. Parameters for the Read Microcode Update Data Function . . . 8-49
Table 8-14. Return Code Definitions . . . 8-50
Table 9-1. Characteristics of the Caches, TLBs, and Write Buffer in Intel Architecture Processors . . . 9-3
Table 9-2. Methods of Caching Available in P6 Family, Pentium, and Intel486 Processors . . . 9-6
Table 9-3. MESI Cache Line States . . . 9-9
Table 9-4. Cache Operating Modes . . . 9-11
Table 9-5. Effective Memory Type Depending on MTRR, PCD, and PWT Settings . . . 9-14
Table 9-6. MTRR Memory Types and Their Properties . . . 9-19
Table 9-7. Address Mapping for Fixed-Range MTRRs . . . 9-23
Table 9-8. PAT Indexing and Values After Reset . . . 9-35
Table 9-9. Effective Memory Type Depending on MTRRs and PAT . . . 9-37
Table 9-10. PAT Memory Types and Their Properties . . . 9-38
Table 10-1. Effects of MMX Instructions on FPU State . . . 10-3
Table 10-2. Effect of the MMX and Floating-Point Instructions on the FPU Tag Word . . . 10-3
Table 11-1. SIMD Floating-point Register Set . . . 11-2
Table 11-2. Rounding Control Field (RC) . . . 11-4
Table 11-3. Rounding of Positive Numbers Greater than the Maximum Positive Finite Value . . . 11-5
Table 11-4. Rounding of Negative Numbers Smaller than the Maximum Negative Finite Value . . . 11-5
Table 11-5. CPUID Bits for Streaming SIMD Extensions Support . . . 11-6
Table 11-6. CR4 Bits for Streaming SIMD Extensions Support . . . 11-6
Table 11-7. Streaming SIMD Extensions Faults . . . 11-12
Table 11-8. Invalid Arithmetic Operations and the Masked Responses to Them . . . 11-18
Table 11-9. Masked Responses to Numeric Overflow . . . 11-20
Table 12-1. SMRAM State Save Map . . . 12-5
Table 12-2. Processor Register Initialization in SMM . . . 12-9
Table 12-3. Auto HALT Restart Flag Values . . . 12-14
Table 12-4. I/O Instruction Restart Field Values . . . 12-16
Table 13-1. Simple Error Codes . . . 13-9
Table 13-2. General Forms of Compound Error Codes . . . 13-9
Table 13-3. Encoding for TT (Transaction Type) Sub-Field . . . 13-10
Table 13-4. Level Encoding for LL (Memory Hierarchy Level) Sub-Field . . . 13-10
Table 13-5. Encoding of Request (RRRR) Sub-Field . . . 13-10
Table 13-6. Encodings of PP, T, and II Sub-Fields . . . 13-11
Table 13-7. Encoding of the MCi_STATUS Register for External Bus Errors . . . 13-11
Table 14-1. Small and Large General-Purpose Register Pairs . . . 14-7
Table 14-2. Pairable Integer Instructions . . . 14-14
Table 15-1. Breakpointing Examples . . . 15-7
Table 15-2. Debug Exception Conditions . . . 15-8
Table 16-1. Real-Address Mode Exceptions and Interrupts . . . 16-8
Table 16-2. Software Interrupt Handling Methods While in Virtual-8086 Mode . . . 16-24
Table 17-1. Characteristics of 16-Bit and 32-Bit Program Modules . . . 17-1
Table 18-1. New Instructions in the Pentium and Later Intel Architecture Processors . . . 18-3
Table 18-1. Recommended Values of the FP Related Bits for Intel486 SX Microprocessor/Intel 487 SX Math Coprocessor System . . . 18-20
Table 18-2. EM and MP Flag Interpretation . . . 18-20
Table A-1. Events That Can Be Counted with the P6 Family Performance-Monitoring Counters . . . A-2
Table A-2. Events That Can Be Counted with the Pentium Processor Performance-Monitoring Counters . . . A-12
Table B-1. Model-Specific Registers (MSRs) . . . B-1
1
About This Manual

The Intel Architecture Software Developer's Manual, Volume 3: System Programming Guide (Order Number 243192) is part of a three-volume set that describes the architecture and programming environment of all Intel Architecture processors. The other two volumes in this set are:
The Intel Architecture Software Developer's Manual, Volume 1: Basic Architecture (Order Number 243190).
The Intel Architecture Software Developer's Manual, Volume 2: Instruction Set Reference (Order Number 243191).
The Intel Architecture Software Developer's Manual, Volume 1, describes the basic architecture and programming environment of an Intel Architecture processor; the Intel Architecture Software Developer's Manual, Volume 2, describes the instruction set of the processor and the opcode structure. These two volumes are aimed at application programmers who are writing programs to run under existing operating systems or executives.
The Intel Architecture Software Developer's Manual, Volume 3, describes the operating-system support environment of an Intel Architecture processor, including memory management, protection, task management, interrupt and exception handling, and system management mode. It also provides Intel Architecture processor compatibility information. This volume is aimed at operating-system and BIOS designers and programmers.
1.1. P6 FAMILY PROCESSOR TERMINOLOGY
This manual includes information pertaining primarily to the 32-bit Intel Architecture processors, which include the Intel386, Intel486, and Pentium processors and the P6 family processors. The P6 family processors are those Intel Architecture processors based on the P6 family microarchitecture. This family includes the Pentium Pro, Pentium II, and Pentium III processors, and any future processors based on the P6 family microarchitecture.
1.2.
The contents of this manual are as follows:
Chapter 1 About This Manual. Gives an overview of all three volumes of the Intel Architecture Software Developer's Manual. It also describes the notational conventions in these manuals and lists related Intel manuals and documentation of interest to programmers and hardware designers.
Chapter 2 System Architecture Overview. Describes the modes of operation of an Intel Architecture processor and the mechanisms provided in the Intel Architecture to support operating systems and executives, including the system-oriented registers and data structures and the system-oriented instructions. The steps necessary for switching between real-address and protected modes are also identified.
Chapter 3 Protected-Mode Memory Management. Describes the data structures, registers, and instructions that support segmentation and paging and explains how they can be used to implement a flat (unsegmented) memory model or a segmented memory model.
Chapter 4 Protection. Describes the support for page and segment protection provided in the Intel Architecture. This chapter also explains the implementation of privilege rules, stack switching, pointer validation, and user and supervisor modes.
Chapter 5 Interrupt and Exception Handling. Describes the basic interrupt mechanisms defined in the Intel Architecture, shows how interrupts and exceptions relate to protection, and describes how the architecture handles each exception type. Reference information for each Intel Architecture exception is given at the end of this chapter.
Chapter 6 Task Management. Describes the mechanisms the Intel Architecture provides to support multitasking and inter-task protection.
Chapter 7 Multiple-Processor Management. Describes the instructions and flags that support multiple processors with shared memory, memory ordering, and the advanced programmable interrupt controller (APIC).
Chapter 8 Processor Management and Initialization. Defines the state of an Intel Architecture processor and its floating-point and SIMD floating-point units after reset initialization. This chapter also explains how to set up an Intel Architecture processor for real-address mode operation and protected-mode operation, and how to switch between modes.
Chapter 9 Memory Cache Control. Describes the general concept of caching and the caching mechanisms supported by the Intel Architecture. This chapter also describes the memory type range registers (MTRRs) and how they can be used to map memory types of physical memory. MTRRs were introduced into the Intel Architecture with the Pentium Pro processor. This chapter also presents information on using the new cache control and memory streaming instructions introduced with the Pentium III processor.
Chapter 10 MMX Technology System Programming. Describes those aspects of the Intel MMX technology that must be handled and considered at the system programming level, including task switching, exception handling, and compatibility with existing system environments. The MMX technology was introduced into the Intel Architecture with the Pentium processor.
Chapter 11 Streaming SIMD Extensions System Programming. Describes those aspects of Streaming SIMD Extensions that must be handled and considered at the system programming level, including task switching, exception handling, and compatibility with existing system environments. Streaming SIMD Extensions were introduced into the Intel Architecture with the Pentium III processor.
Chapter 12 System Management Mode (SMM). Describes the Intel Architecture's system management mode (SMM), which can be used to implement power management functions.
1-2
ABOUT THIS MANUAL
Chapter 13 Machine-Check Architecture. Describes the machine-check architecture, which was introduced into the Intel Architecture with the Pentium processor. Chapter 14 Code Optimization. Discusses general optimization techniques for programming an Intel Architecture processor. Chapter 15 Debugging and Performance Monitoring. Describes the debugging registers and other debug mechanism provided in the Intel Architecture. This chapter also describes the time-stamp counter and the performance-monitoring counters. Chapter 16 8086 Emulation. Describes the real-address and virtual-8086 modes of the Intel Architecture. Chapter 17 Mixing 16-Bit and 32-Bit Code. Describes how to mix 16-bit and 32-bit code modules within the same program or task. Chapter 18 Intel Architecture Compatibility. Describes the programming differences between the Intel 286, Intel386, Intel486, Pentium, and P6 family processors. The differences among the 32-bit Intel Architecture processors (the Intel386, Intel486, Pentium, and P6 family processors) are described throughout the three volumes of the Intel Architecture Software Developers Manual, as relevant to particular features of the architecture. This chapter provides a collection of all the relevant compatibility information for all Intel Architecture processors and also describes the basic differences with respect to the 16-bit Intel Architecture processors (the Intel 8086 and Intel 286 processors). Appendix A Performance-Monitoring Events. Lists the events that can be counted with the performance-monitoring counters and the codes used to select these events. Both Pentium processor and P6 family processor events are described. Appendix B Model-Specific Registers (MSRs). Lists the MSRs available in the Pentium and P6 family processors and their functions. Appendix C Dual-Processor (DP) Bootup Sequence Example (Specific to Pentium Processors). 
Gives an example of how to use the DP protocol to boot two Pentium processors (a primary processor and a secondary processor) in a DP system and initialize their APICs. Appendix D Multiple-Processor (MP) Bootup Sequence Example (Specific to P6 Family Processors). Gives an example of how to use of the MP protocol to boot two P6 family processors in a MP system and initialize their APICs. Appendix E Programming the LINT0 and LINT1 Inputs. Gives an example of how to program the LINT0 and LINT1 pins for specific interrupt vectors.
1.3.
The contents of the Intel Architecture Software Developer's Manual, Volume 1, are as follows:
Chapter 1 About This Manual. Gives an overview of all three volumes of the Intel Architecture Software Developer's Manual. It also describes the notational conventions in these manuals and lists related Intel manuals and documentation of interest to programmers and hardware designers.
Chapter 2 Introduction to the Intel Architecture. Introduces the Intel Architecture and the families of Intel processors that are based on this architecture. It also gives an overview of the common features found in these processors and a brief history of the Intel Architecture.
Chapter 3 Basic Execution Environment. Introduces the models of memory organization and describes the register set used by applications.
Chapter 4 Procedure Calls, Interrupts, and Exceptions. Describes the procedure stack and the mechanisms provided for making procedure calls and for servicing interrupts and exceptions.
Chapter 5 Data Types and Addressing Modes. Describes the data types and addressing modes recognized by the processor.
Chapter 6 Instruction Set Summary. Gives an overview of all the Intel Architecture instructions except those executed by the processor's floating-point unit. The instructions are presented in functionally related groups.
Chapter 7 Floating-Point Unit. Describes the Intel Architecture floating-point unit, including the floating-point registers and data types; gives an overview of the floating-point instruction set; and describes the processor's floating-point exception conditions.
Chapter 8 Programming with the Intel MMX Technology. Describes the Intel MMX technology, including MMX registers and data types, and gives an overview of the MMX instruction set.
Chapter 9 Programming with the Streaming SIMD Extensions. Describes the Intel Streaming SIMD Extensions, including the registers and data types.
Chapter 10 Input/Output. Describes the processor's I/O architecture, including I/O port addressing, the I/O instructions, and the I/O protection mechanism.
Chapter 11 Processor Identification and Feature Determination. Describes how to determine the CPU type and the features that are available in the processor.
Appendix A EFLAGS Cross-Reference. Summarizes how the Intel Architecture instructions affect the flags in the EFLAGS register.
Appendix B EFLAGS Condition Codes. Summarizes how the conditional jump, move, and byte set on condition code instructions use the condition code flags (OF, CF, ZF, SF, and PF) in the EFLAGS register.
Appendix C Floating-Point Exceptions Summary. Summarizes the exceptions that can be raised by floating-point instructions.
Appendix D SIMD Floating-Point Exceptions Summary. Provides the Streaming SIMD Extensions mnemonics, and the exceptions that each instruction can cause.
Appendix E Guidelines for Writing FPU Exception Handlers. Describes how to design and write MS-DOS compatible exception handling facilities for FPU and SIMD floating-point exceptions, including both software and hardware requirements and assembly-language code examples. This appendix also describes general techniques for writing robust FPU exception handlers.
Appendix F Guidelines for Writing SIMD-FP Exception Handlers. Provides guidelines for the Streaming SIMD Extensions instructions that can generate numeric (floating-point) exceptions, and gives an overview of the necessary support for handling such exceptions.
1.4.
The contents of the Intel Architecture Software Developer's Manual, Volume 2, are as follows:
Chapter 1 About This Manual. Gives an overview of all three volumes of the Intel Architecture Software Developer's Manual. It also describes the notational conventions in these manuals and lists related Intel manuals and documentation of interest to programmers and hardware designers.
Chapter 2 Instruction Format. Describes the machine-level instruction format used for all Intel Architecture instructions and gives the allowable encodings of prefixes, the operand-identifier byte (ModR/M byte), the addressing-mode specifier byte (SIB byte), and the displacement and immediate bytes.
Chapter 3 Instruction Set Reference. Describes each of the Intel Architecture instructions in detail, including an algorithmic description of operations, the effect on flags, the effect of operand- and address-size attributes, and the exceptions that may be generated. The instructions are arranged in alphabetical order. The FPU, MMX Technology instructions, and Streaming SIMD Extensions are included in this chapter.
Appendix A Opcode Map. Gives an opcode map for the Intel Architecture instruction set.
Appendix B Instruction Formats and Encodings. Gives the binary encoding of each form of each Intel Architecture instruction.
Appendix C Compiler Intrinsics and Functional Equivalents. Gives the Intel C/C++ compiler intrinsics and functional equivalents for the MMX Technology instructions and Streaming SIMD Extensions.
1.5.
1.5.1.
Bit and Byte Order
In illustrations of data structures in memory, smaller addresses appear toward the bottom of the figure; addresses increase toward the top. Bit positions are numbered from right to left. The numerical value of a set bit is equal to two raised to the power of the bit position. Intel Architecture processors are little endian machines; this means the bytes of a word are numbered starting from the least significant byte. Figure 1-1 illustrates these conventions.
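The two conventions above can be checked with a short C sketch. The helper names `bit_value` and `store_le32` are hypothetical and chosen only for illustration; the code computes the little-endian layout arithmetically, so it gives the same result on any host.

```c
#include <assert.h>
#include <stdint.h>

/* Numerical value of a set bit: two raised to the power of the bit position. */
static uint32_t bit_value(unsigned position)
{
    assert(position < 32);
    return (uint32_t)1 << position;
}

/* Lay out a doubleword the way a little-endian machine stores it:
 * out[0] is the lowest address and receives the least significant byte. */
static void store_le32(uint8_t out[4], uint32_t value)
{
    for (unsigned i = 0; i < 4; i++)
        out[i] = (uint8_t)(value >> (8 * i));
}
```

For the doubleword 12345678H, Byte 0 (lowest address) holds 78H and Byte 3 (highest address) holds 12H.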
1.5.2.
NOTE
Avoid any software dependence upon the state of reserved bits in Intel Architecture registers. Depending upon the values of reserved register bits will make software dependent upon the unspecified manner in which the processor handles these bits. Programs that depend upon reserved values risk incompatibility with future processors.
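[Figure 1-1 (bit and byte order): a data structure in memory drawn with the highest address at the top and the lowest address at the bottom. Bit offsets 31 through 0 run right to left across each row, with Byte 3 in bits 31-24 down to Byte 0 in bits 7-0; rows are labeled with byte offsets 28 down to 0 in steps of 4.]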
1.5.3.
When instructions are represented symbolically, a subset of the Intel Architecture assembly language is used. In this subset, an instruction has the following format:
where:
1.5.4.
1.5.5.
The processor uses byte addressing. This means memory is organized and accessed as a sequence of bytes. Whether one or more bytes are being accessed, a byte address is used to locate the byte or bytes of memory. The range of memory that can be addressed is called an address space. The processor also supports segmented addressing. This is a form of addressing where a program may have many independent address spaces, called segments. For example, a program can keep its code (instructions) and stack in separate segments. Code addresses would always
refer to the code space, and stack addresses would always refer to the stack space. The following notation is used to specify a byte address within a segment:
DS:FF79H
CS:EIP
1.5.6.
Exceptions
#PF(fault code)
#GP(0)
Refer to Chapter 5, Interrupt and Exception Handling, for a list of exception mnemonics and their descriptions.
1.6.
RELATED LITERATURE
• Intel Pentium II Processor Specification Update, Order Number 243337-010.
• Intel Pentium Pro Processor Specification Update, Order Number 242689-031.
• Intel Pentium Processor Specification Update, Order Number 242480.
• AP-485, Intel Processor Identification and the CPUID Instruction, Order Number 241618006.
• AP-578, Software and Hardware Considerations for FPU Exception Handlers for Intel Architecture Processors, Order Number 243291.
• Pentium Pro Processor Data Book, Order Number 242690.
• Pentium Pro BIOS Writer's Guide, http://www.intel.com/procs/ppro/info/index.htm.
• Pentium Processor Data Book, Order Number 241428.
• 82496 Cache Controller and 82491 Cache SRAM Data Book For Use With the Pentium Processor, Order Number 241429.
• Intel486 Microprocessor Data Book, Order Number 240440.
• Intel486 SX CPU/Intel487 SX Math Coprocessor Data Book, Order Number 240950.
• Intel486 DX2 Microprocessor Data Book, Order Number 241245.
• Intel486 Microprocessor Product Brief Book, Order Number 240459.
• Intel386 Processor Hardware Reference Manual, Order Number 231732.
• Intel386 Processor System Software Writer's Guide, Order Number 231499.
• Intel386 High-Performance 32-Bit CHMOS Microprocessor with Integrated Memory Management, Order Number 231630.
• 376 Embedded Processor Programmer's Reference Manual, Order Number 240314.
• 80387 DX User's Manual Programmer's Reference, Order Number 231917.
• 376 High-Performance 32-Bit Embedded Processor, Order Number 240182.
• Intel386 SX Microprocessor, Order Number 240187.
• Intel Architecture Optimization Manual, Order Number 242816-002.
CHAPTER 2 SYSTEM ARCHITECTURE OVERVIEW

The 32-bit members of the Intel Architecture family of processors provide extensive support for operating-system and system-development software. This support is part of the processor's system-level architecture and includes features to assist in the following operations:
• Memory management
• Protection of software modules
• Multitasking
• Exception and interrupt handling
• Multiprocessing
• Cache management
• Hardware resource and power management
• Debugging and performance monitoring
This chapter provides a brief overview of the processor's system-level architecture; a detailed description of each part of this architecture is given in the following chapters. This chapter also describes the system registers that are used to set up and control the processor at the system level and gives a brief overview of the processor's system-level (operating system) instructions. Many of the system-level architectural features of the processor are used only by system programmers. Application programmers may need to read this chapter, and the following chapters that describe the use of these features, in order to understand the hardware facilities used by system programmers to create a reliable and secure environment for application programs.
NOTE
This overview and most of the subsequent chapters of this book focus on the native or protected-mode operation of the 32-bit Intel Architecture processors. As described in Chapter 8, Processor Management and Initialization, all Intel Architecture processors enter real-address mode following a power-up or reset. Software must then initiate a switch from real-address mode to protected mode.
2.1.
OVERVIEW OF THE SYSTEM-LEVEL ARCHITECTURE
The Intel Architecture's system architecture consists of a set of registers, data structures, and instructions designed to support basic system-level operations such as memory management, interrupt and exception handling, task management, and control of multiple processors (multiprocessing). Figure 2-1 provides a generalized summary of the system registers and data structures.
[Figure 2-1 relates the EFLAGS register, the control registers (CR0 through CR4), MXCSR, the task register, and the GDTR, LDTR, and IDTR to the global descriptor table (GDT), local descriptor table (LDT), interrupt descriptor table (IDT), their segment, TSS, and LDT descriptors and gates (call, interrupt, trap, and task gates), the task-state segments, and the page directory and page tables that map a linear address (directory, table, and offset fields) to a physical address. The page mapping example is for 4-KByte pages and the normal 32-bit physical address size. Note 1: MXCSR is a new control/status register in the Pentium III processor.]
Figure 2-1. System-Level Registers and Data Structures
2.1.1.
Global and Local Descriptor Tables
When operating in protected mode, all memory accesses pass through either the global descriptor table (GDT) or the (optional) local descriptor table (LDT), shown in Figure 2-1. These tables contain entries called segment descriptors. A segment descriptor provides the base address of a segment and access rights, type, and usage information. Each segment descriptor has a segment selector associated with it. The segment selector provides an index into the GDT or LDT (to its associated segment descriptor), a global/local flag (that determines whether the segment selector points to the GDT or the LDT), and access rights information. To access a byte in a segment, both a segment selector and an offset must be supplied. The segment selector provides access to the segment descriptor for the segment (in the GDT or LDT). From the segment descriptor, the processor obtains the base address of the segment in the linear address space. The offset then provides the location of the byte relative to the base address. This mechanism can be used to access any valid code, data, or stack segment in the GDT or LDT, provided the segment is accessible from the current privilege level (CPL) at which the processor is operating. (The CPL is defined as the protection level of the currently executing code segment.) In Figure 2-1 the solid arrows indicate a linear address, the dashed lines indicate a segment selector, and the dotted arrows indicate a physical address. For simplicity, many of the segment selectors are shown as direct pointers to a segment. However, the actual path from a segment selector to its associated segment is always through the GDT or LDT. The linear address of the base of the GDT is contained in the GDT register (GDTR); the linear address of the LDT is contained in the LDT register (LDTR).
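The selector-plus-offset path described above can be sketched in C. This is a minimal illustration, not processor logic: the type and function names are hypothetical, the bit positions assume the standard selector layout (index in bits 15 through 3, table indicator in bit 2, requested privilege level in bits 1 and 0) and 8-byte descriptors, and the limit and access-rights checks the processor performs are omitted.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned index; /* bits 15..3: entry number in the GDT or LDT */
    unsigned ti;    /* bit 2: 0 selects the GDT, 1 selects the LDT */
    unsigned rpl;   /* bits 1..0: requested privilege level */
} selector_fields;

static selector_fields decode_selector(uint16_t selector)
{
    selector_fields f;
    f.index = selector >> 3;
    f.ti    = (selector >> 2) & 1u;
    f.rpl   = selector & 3u;
    return f;
}

/* Byte offset of a descriptor inside its table (assumes 8-byte descriptors). */
static uint32_t descriptor_offset(unsigned index)
{
    assert(index < 8192);
    return (uint32_t)index * 8u;
}

/* Linear address of a byte: segment base (from the descriptor) plus offset. */
static uint32_t linear_address(uint32_t segment_base, uint32_t offset)
{
    return segment_base + offset;
}
```

For example, selector 001BH decodes to index 3 in the GDT with a requested privilege level of 3, so its descriptor sits 24 bytes past the base address held in the GDTR.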
2.1.2.
System Segments, Segment Descriptors, and Gates
Besides the code, data, and stack segments that make up the execution environment of a program or procedure, the system architecture also defines two system segments: the task-state segment (TSS) and the LDT. (The GDT is not considered a segment because it is not accessed by means of a segment selector and segment descriptor.) Each of these segment types has a segment descriptor defined for it. The system architecture also defines a set of special descriptors called gates (the call gate, interrupt gate, trap gate, and task gate) that provide protected gateways to system procedures and handlers that operate at different privilege levels than application programs and procedures. For example, a CALL to a call gate provides access to a procedure in a code segment that is at the same or numerically lower privilege level (more privileged) than the current code segment. To access a procedure through a call gate, the calling procedure¹ must supply the selector of the call gate. The processor then performs an access rights check on the call gate, comparing the CPL with the privilege level of the call gate and the destination code segment pointed to by the call gate. If access to the destination code segment is allowed, the processor gets the segment selector for the destination code segment and an offset into that code segment from the call gate.
1. The word procedure is commonly used in this document as a general term for a logical unit or block of code (such as a program, procedure, function, or routine). The term is not restricted to the definition of a procedure in the Intel Architecture assembly language.
If the call requires a change in privilege level, the processor also switches to the stack for that privilege level. (The segment selector for the new stack is obtained from the TSS for the currently running task.) Gates also facilitate transitions between 16-bit and 32-bit code segments, and vice versa.
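The privilege comparison for a CALL through a call gate can be sketched as follows. This is a simplified model under stated assumptions: the function names are hypothetical, the destination is assumed to be a nonconforming code segment, and the check on the selector's requested privilege level (RPL) is a detail taken from the protection rules rather than from this overview. Privilege levels are numeric, with 0 the most privileged.

```c
#include <assert.h>
#include <stdbool.h>

/* Access is allowed when the caller is at least as privileged as the gate
 * requires (CPL and RPL numerically <= gate DPL) and the destination code
 * segment is at the same or a more privileged level (code DPL <= CPL). */
static bool call_gate_access_ok(unsigned cpl, unsigned rpl,
                                unsigned gate_dpl, unsigned code_dpl)
{
    assert(cpl < 4 && rpl < 4 && gate_dpl < 4 && code_dpl < 4);
    return cpl <= gate_dpl && rpl <= gate_dpl && code_dpl <= cpl;
}

/* A stack switch is needed only when the call changes privilege level. */
static bool needs_stack_switch(unsigned cpl, unsigned code_dpl)
{
    assert(cpl < 4 && code_dpl < 4);
    return code_dpl < cpl;
}
```

With these rules, application code at privilege level 3 can reach a level-0 procedure through a gate whose own privilege level is 3, and that call switches stacks; the same gate with privilege level 0 would refuse the caller.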
2.1.3.
Task-State Segments and Task Gates
The TSS (refer to Figure 2-1) defines the state of the execution environment for a task. It includes the state of the general-purpose registers, the segment registers, the EFLAGS register, the EIP register, and segment selectors and stack pointers for three stack segments (one stack each for privilege levels 0, 1, and 2). It also includes the segment selector for the LDT associated with the task and the page-table base address.
All program execution in protected mode happens within the context of a task, called the current task. The segment selector for the TSS for the current task is stored in the task register. The simplest method of switching to a task is to make a call or jump to the task. Here, the segment selector for the TSS of the new task is given in the CALL or JMP instruction. In switching tasks, the processor performs the following actions:
1. Stores the state of the current task in the current TSS.
2. Loads the task register with the segment selector for the new task.
3. Accesses the new TSS through a segment descriptor in the GDT.
4. Loads the state of the new task from the new TSS into the general-purpose registers, the segment registers, the LDTR, control register CR3 (page-table base address), the EFLAGS register, and the EIP register.
5. Begins execution of the new task.
A task can also be accessed through a task gate. A task gate is similar to a call gate, except that it provides access (through a segment selector) to a TSS rather than a code segment.
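The five task-switch actions can be modeled in a few lines of C. This is a toy model, not the hardware algorithm: `task_state`, `cpu_model`, and `switch_task` are hypothetical names, the structure does not follow the real TSS layout, and a flat array indexed by the selector's index field stands in for the lookup through TSS descriptors in the GDT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t gpr[8];  /* general-purpose registers */
    uint16_t seg[6];  /* segment registers */
    uint16_t ldtr;    /* LDT selector for the task */
    uint32_t cr3;     /* page-table base address */
    uint32_t eflags;
    uint32_t eip;
} task_state;

typedef struct {
    task_state live;  /* register state of the running task */
    uint16_t tr;      /* task register: selector of the current TSS */
} cpu_model;

static void switch_task(cpu_model *cpu, task_state *tss_by_index,
                        size_t count, uint16_t new_selector)
{
    size_t old_i = cpu->tr >> 3;
    size_t new_i = new_selector >> 3;
    assert(old_i < count && new_i < count);

    tss_by_index[old_i] = cpu->live;               /* 1. save current state */
    cpu->tr = new_selector;                        /* 2. load task register */
    const task_state *next = &tss_by_index[new_i]; /* 3. locate the new TSS */
    cpu->live = *next;                             /* 4. load the new state */
    /* 5. execution would resume at cpu->live.eip */
}
```

Saving the outgoing state before loading the incoming one is what lets a later switch back to the first task resume where it left off.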
2.1.4.
Interrupt and Exception Handling
External interrupts, software interrupts, and exceptions are handled through the interrupt descriptor table (IDT) (refer to Figure 2-1). The IDT contains a collection of gate descriptors, which provide access to interrupt and exception handlers. Like the GDT, the IDT is not a segment. The linear address of the base of the IDT is contained in the IDT register (IDTR). The gate descriptors in the IDT can be of the interrupt-, trap-, or task-gate type. To access an interrupt or exception handler, the processor must first receive an interrupt vector (interrupt number) from internal hardware, from an external interrupt controller, or from software by means of an INT, INTO, INT 3, or BOUND instruction. The interrupt vector provides an index into the IDT to a gate descriptor. If the selected gate descriptor is an interrupt gate or a trap gate, the associated handler procedure is accessed in a manner very similar to calling a procedure through a call gate. If the descriptor is a task gate, the handler is accessed through a task switch.
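The vector-to-gate lookup can be sketched in C. The names below are hypothetical; the sketch assumes 8-byte gate descriptors in protected mode and an IDT limit expressed as the offset of the table's last valid byte, and it only classifies how the handler would be entered rather than entering it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { GATE_INTERRUPT, GATE_TRAP, GATE_TASK } gate_type;
typedef enum { ENTER_LIKE_CALL_GATE, ENTER_BY_TASK_SWITCH } entry_method;

/* Compute the byte offset of the gate descriptor for a vector.
 * Returns false when the descriptor would lie beyond the IDT limit. */
static bool idt_gate_offset(uint8_t vector, uint16_t idt_limit,
                            uint32_t *offset)
{
    assert(offset != NULL);
    uint32_t first = (uint32_t)vector * 8u;
    if (first + 7u > idt_limit)
        return false;
    *offset = first;
    return true;
}

/* Interrupt and trap gates enter a handler procedure; task gates switch tasks. */
static entry_method entry_for(gate_type type)
{
    return type == GATE_TASK ? ENTER_BY_TASK_SWITCH : ENTER_LIKE_CALL_GATE;
}
```

A full 256-entry table has a limit of 7FFH, so vector 14 resolves to byte offset 112 from the base address held in the IDTR.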
2.1.5.
Memory Management
The system architecture supports either direct physical addressing of memory or virtual memory (through paging). When physical addressing is used, a linear address is treated as a physical address. When paging is used, all the code, data, stack, and system segments and the GDT and IDT can be paged, with only the most recently accessed pages being held in physical memory. The location of pages (or page frames as they are sometimes called in the Intel Architecture) in physical memory is contained in two types of system data structures (a page directory and a set of page tables), both of which reside in physical memory (refer to Figure 2-1). An entry in a page directory contains the physical address of the base of a page table, access rights, and memory management information. An entry in a page table contains the physical address of a page frame, access rights, and memory management information. The base physical address of the page directory is contained in control register CR3. To use this paging mechanism, a linear address is broken into three parts, providing separate offsets into the page directory, the page table, and the page frame. A system can have a single page directory or several. For example, each task can have its own page directory.
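The three-part split of a linear address can be written out directly. This sketch assumes 4-KByte pages and 32-bit physical addresses, which gives a 10-bit page-directory index, a 10-bit page-table index, and a 12-bit offset into the page frame; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned dir;    /* bits 31..22: index into the page directory */
    unsigned table;  /* bits 21..12: index into the page table */
    unsigned offset; /* bits 11..0: byte offset within the page frame */
} linear_parts;

static linear_parts split_linear(uint32_t linear)
{
    linear_parts p;
    p.dir    = linear >> 22;
    p.table  = (linear >> 12) & 0x3FFu;
    p.offset = linear & 0xFFFu;
    return p;
}

/* Combine a 4-KByte-aligned page frame base with the page offset. */
static uint32_t physical_address(uint32_t frame_base, unsigned offset)
{
    assert((frame_base & 0xFFFu) == 0 && offset < 4096);
    return frame_base | offset;
}
```

Linear address 00403025H, for example, selects page-directory entry 1, page-table entry 3, and byte 25H of the page frame that entry points to.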
2.1.6.
System Registers
To assist in initializing the processor and controlling system operations, the system architecture provides system flags in the EFLAGS register and several system registers:
• The system flags and IOPL field in the EFLAGS register control task and mode switching, interrupt handling, instruction tracing, and access rights. Refer to Section 2.3., System Flags and Fields in the EFLAGS Register, for a description of these flags.
• The control registers (CR0, CR2, CR3, and CR4) contain a variety of flags and data fields for controlling system-level operations. With the introduction of the Pentium III processor, CR4 now contains bits indicating support for Pentium III processor-specific capabilities within the OS. Refer to Section 2.5., Control Registers, for a description of these flags.
• The debug registers (not shown in Figure 2-1) allow the setting of breakpoints for use in debugging programs and systems software. Refer to Chapter 15, Debugging and Performance Monitoring, for a description of these registers.
• The GDTR, LDTR, and IDTR registers contain the linear addresses and sizes (limits) of their respective tables. Refer to Section 2.4., Memory-Management Registers, for a description of these registers.
• The task register contains the linear address and size of the TSS for the current task. Refer to Section 2.4., Memory-Management Registers, for a description of this register.
• Model-specific registers (not shown in Figure 2-1).
The model-specific registers (MSRs) are a group of registers available primarily to operating-system or executive procedures (that is, code running at privilege level 0). These registers control items such as the debug extensions, the performance-monitoring counters, the machine-check architecture, and the memory type ranges (MTRRs). The number and functions of these registers vary among the different members of the Intel Architecture processor families. Refer to Section 8.4., Model-Specific Registers (MSRs), in Chapter 8, Processor Management and Initialization, for more information about the MSRs and to Appendix B, Model-Specific Registers, for a complete list of the MSRs.
Most systems restrict access to all system registers (other than the EFLAGS register) by application programs. Systems can be designed, however, where all programs and procedures run at the most privileged level (privilege level 0), in which case application programs are allowed to modify the system registers.
2.1.7.
Other System Resources
Besides the system registers and data structures described in the previous sections, the system architecture provides the following additional resources:
• Operating system instructions (refer to Section 2.6., System Instruction Summary).
• Performance-monitoring counters (not shown in Figure 2-1).
• Internal caches and buffers (not shown in Figure 2-1).
The performance-monitoring counters are event counters that can be programmed to count processor events such as the number of instructions decoded, the number of interrupts received, or the number of cache loads. Refer to Section 15.6., Performance-Monitoring Counters, in Chapter 15, Debugging and Performance Monitoring, for more information about these counters.
The processor provides several internal caches and buffers. The caches are used to store both data and instructions. The buffers hold items such as decoded addresses to system and application segments and write operations waiting to be performed. Refer to Chapter 9, Memory Cache Control, for a detailed discussion of the processor's caches and buffers.
2.2.
MODES OF OPERATION
The Intel Architecture supports three operating modes and one quasi-operating mode:
• Protected mode. This is the native operating mode of the processor. In this mode all instructions and architectural features are available, providing the highest performance and capability. This is the recommended mode for all new applications and operating systems.
• Real-address mode. This operating mode provides the programming environment of the Intel 8086 processor, with a few extensions (such as the ability to switch to protected or system management mode).
• System management mode (SMM). The system management mode (SMM) is a standard architectural feature in all Intel Architecture processors, beginning with the Intel386 SL processor. This mode provides an operating system or executive with a transparent mechanism for implementing power management and OEM differentiation features. SMM is entered through activation of an external system interrupt pin (SMI#), which generates a system management interrupt (SMI). In SMM, the processor switches to a separate address space while saving the context of the currently running program or task. SMM-specific code may then be executed transparently. Upon returning from SMM, the processor is placed back into its state prior to the SMI.
• Virtual-8086 mode. In protected mode, the processor supports a quasi-operating mode known as virtual-8086 mode. This mode allows the processor to execute 8086 software in a protected, multitasking environment.
Figure 2-2 shows how the processor moves among these operating modes.
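[Figure 2-2 labels the following transitions: real-address mode to protected mode (PE=1); protected mode to real-address mode (reset or PE=0); protected mode to virtual-8086 mode (VM=1) and back (VM=0); virtual-8086 mode to real-address mode (reset); any of these three modes to system management mode (SMI#) and back (RSM, or reset to real-address mode).]
Figure 2-2. Transitions Among the Processor's Operating Modes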
The processor is placed in real-address mode following power-up or a reset. Thereafter, the PE flag in control register CR0 controls whether the processor is operating in real-address or protected mode (refer to Section 2.5., Control Registers). Refer to Section 8.8., Mode Switching, in Chapter 8, Processor Management and Initialization, for detailed information on switching between real-address mode and protected mode. The VM flag in the EFLAGS register determines whether the processor is operating in protected mode or virtual-8086 mode. Transitions between protected mode and virtual-8086 mode are generally carried out as part of a task switch or a return from an interrupt or exception handler (refer to Section 16.2.5., Entering Virtual-8086 Mode, in Chapter 16, 8086 Emulation). The processor switches to SMM whenever it receives an SMI while it is in real-address, protected, or virtual-8086 mode. Upon execution of the RSM instruction, the processor always returns to the mode it was in when the SMI occurred.
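The mode rules above reduce to two flag tests plus a saved mode for SMM. The C sketch below is a toy model with hypothetical names; it assumes PE is bit 0 of CR0 and VM is bit 17 of EFLAGS, and it ignores everything else the processor saves and restores around an SMI.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_PE    (1u << 0)   /* protection enable */
#define EFLAGS_VM (1u << 17)  /* virtual-8086 mode */

typedef enum {
    MODE_REAL_ADDRESS, MODE_PROTECTED, MODE_VIRTUAL_8086, MODE_SMM
} cpu_mode;

/* Mode implied by the PE and VM flags (outside SMM). */
static cpu_mode mode_from_flags(uint32_t cr0, uint32_t eflags)
{
    if (!(cr0 & CR0_PE))
        return MODE_REAL_ADDRESS;
    return (eflags & EFLAGS_VM) ? MODE_VIRTUAL_8086 : MODE_PROTECTED;
}

typedef struct {
    cpu_mode mode;        /* current mode */
    cpu_mode before_smi;  /* mode to restore when RSM executes */
} mode_model;

/* An SMI moves the model into SMM and remembers where it came from. */
static void take_smi(mode_model *m)
{
    assert(m->mode != MODE_SMM);
    m->before_smi = m->mode;
    m->mode = MODE_SMM;
}

/* RSM returns to the mode that was active when the SMI occurred. */
static void execute_rsm(mode_model *m)
{
    assert(m->mode == MODE_SMM);
    m->mode = m->before_smi;
}
```

Keeping the interrupted mode in the model mirrors the statement that RSM always returns to the mode the processor was in when the SMI occurred.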
2.3.
SYSTEM FLAGS AND FIELDS IN THE EFLAGS REGISTER
The system flags and IOPL field of the EFLAGS register control I/O, maskable hardware interrupts, debugging, task switching, and the virtual-8086 mode (refer to Figure 2-3). Only privileged code (typically operating system or executive code) should be allowed to modify these bits. The functions of the system flags and IOPL are as follows:
TF Trap (bit 8). Set to enable single-step mode for debugging; clear to disable single-step mode. In single-step mode, the processor generates a debug exception after each instruction, which allows the execution state of a program to be inspected after each instruction. If an application program sets the TF flag using a POPF, POPFD, or IRET instruction, a debug exception is generated after the instruction that follows the POPF, POPFD, or IRET instruction.
[Figure 2-3 shows the system flags and fields in the 32-bit EFLAGS register: ID, identification flag (bit 21); VIP, virtual interrupt pending (bit 20); VIF, virtual interrupt flag (bit 19); AC, alignment check (bit 18); VM, virtual-8086 mode (bit 17); RF, resume flag (bit 16); NT, nested task flag (bit 14); IOPL, I/O privilege level (bits 13 and 12); IF, interrupt enable flag (bit 9); TF, trap flag (bit 8). Bits 31 through 22 are reserved (set to 0).]
Figure 2-3. System Flags in the EFLAGS Register
IF Interrupt enable (bit 9). Controls the response of the processor to maskable hardware interrupt requests (refer to Section 5.1.1.2., Maskable Hardware Interrupts, in Chapter 5, Interrupt and Exception Handling). Set to respond to maskable hardware interrupts; cleared to inhibit maskable hardware interrupts. The IF flag does not affect the generation of exceptions or nonmaskable interrupts (NMI interrupts). The CPL, IOPL, and the state of the VME flag in control register CR4 determine whether the IF flag can be modified by the CLI, STI, POPF, POPFD, and IRET instructions.
IOPL I/O privilege level field (bits 12 and 13). Indicates the I/O privilege level (IOPL) of the currently running program or task. The CPL of the currently running program or task must be less than or equal to the IOPL to access the I/O address space. This field can only be modified by the POPF and IRET instructions when operating at a CPL of 0. Refer to Chapter 10, Input/Output, of the Intel Architecture Software Developer's Manual, Volume 1, for more information on the relationship of the IOPL to I/O operations. The IOPL is also one of the mechanisms that controls the modification of the IF flag and the handling of interrupts in virtual-8086 mode when the virtual mode extensions are in effect (the VME flag in control register CR4 is set).
NT Nested task (bit 14). Controls the chaining of interrupted and called tasks. The processor sets this flag on calls to a task initiated with a CALL instruction, an interrupt, or an exception. It examines and modifies this flag on returns from a task initiated with the IRET instruction. The flag can be explicitly set or cleared with the POPF/POPFD instructions; however, changing the state of this flag can generate unexpected exceptions in application programs. Refer to Section 6.4., Task Linking, in Chapter 6, Task Management, for more information on nested tasks.
RF Resume (bit 16). Controls the processor's response to instruction-breakpoint conditions. When set, this flag temporarily disables debug exceptions (#DB) from being generated for instruction breakpoints, although other exception conditions can cause an exception to be generated. When clear, instruction breakpoints will generate debug exceptions. The primary function of the RF flag is to allow the restarting of an instruction following a debug exception that was caused by an instruction breakpoint condition. Here, debugger software must set this flag in the EFLAGS image on the stack just prior to returning to the interrupted program with the IRETD instruction, to prevent the instruction breakpoint from causing another debug exception. The processor then automatically clears this flag after the instruction returned to has been successfully executed, enabling instruction breakpoint faults again. Refer to Section 15.3.1.1., Instruction-Breakpoint Exception Condition, in Chapter 15, Debugging and Performance Monitoring, for more information on the use of this flag.
VM Virtual-8086 mode (bit 17). Set to enable virtual-8086 mode; clear to return to protected mode. Refer to Section 16.2.1., Enabling Virtual-8086 Mode, in Chapter 16, 8086 Emulation, for a detailed description of the use of this flag to switch to virtual-8086 mode.
AC Alignment check (bit 18). Set this flag and the AM flag in the CR0 register to enable alignment checking of memory references; clear the AC flag and/or the AM flag to disable alignment checking. An alignment-check exception is generated when reference is made to an unaligned operand, such as a word at an odd byte address or a doubleword at an address which is not an integral multiple of four. Alignment-check exceptions are generated only in user mode (privilege level 3). Memory references that default to privilege level 0, such as segment descriptor loads, do not generate this exception even when caused by instructions executed in user mode. The alignment-check exception can be used to check alignment of data. This is useful when exchanging data with other processors, which require all data to be aligned. The alignment-check exception can also be used by interpreters to flag some pointers as special by misaligning the pointer. This eliminates the overhead of checking each pointer; the special pointer is handled only when it is used.
VIF Virtual interrupt (bit 19). Contains a virtual image of the IF flag. This flag is used in conjunction with the VIP flag. The processor only recognizes the VIF flag when either the VME flag or the PVI flag in control register CR4 is set and the IOPL is less than 3. (The VME flag enables the virtual-8086 mode extensions; the PVI flag enables the protected-mode virtual interrupts.) Refer to Section 16.3.3.5., Method 6: Software Interrupt Handling, and Section 16.4., Protected-Mode Virtual Interrupts, in Chapter 16, 8086 Emulation, for detailed information about the use of this flag.
VIP Virtual interrupt pending (bit 20). Set by software to indicate that an interrupt is pending; cleared to indicate that no interrupt is pending. This flag is used in conjunction with the VIF flag. The processor reads this flag but never modifies it. The processor only recognizes the VIP flag when either the VME flag or the PVI flag in control register CR4 is set and the IOPL is less than 3. (The VME flag enables the virtual-8086 mode extensions; the PVI flag enables the protected-mode virtual interrupts.) Refer to Section 16.3.3.5., Method 6: Software Interrupt Handling, and Section 16.4., Protected-Mode Virtual Interrupts, in Chapter 16, 8086 Emulation, for detailed information about the use of this flag.
ID Identification (bit 21). The ability of a program or procedure to set or clear this flag indicates support for the CPUID instruction.
VIP
ID
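The bit positions given in the flag descriptions above can be collected into a small decoder. The following is a minimal sketch in Python, not processor code: the names SYSTEM_FLAGS, decode_system_flags, and can_access_io are hypothetical, and only the flags from IF through ID are covered. It extracts each flag from a saved EFLAGS image and applies the rule that the CPL must be less than or equal to the IOPL to access the I/O address space.

```python
# Bit positions of the EFLAGS system flags, taken from the descriptions above.
SYSTEM_FLAGS = {
    "IF": 9, "NT": 14, "RF": 16, "VM": 17,
    "AC": 18, "VIF": 19, "VIP": 20, "ID": 21,
}

def decode_system_flags(eflags):
    """Return a dict of the single-bit system flags plus the 2-bit IOPL field."""
    flags = {name: bool((eflags >> bit) & 1) for name, bit in SYSTEM_FLAGS.items()}
    flags["IOPL"] = (eflags >> 12) & 0b11  # bits 12 and 13
    return flags

def can_access_io(cpl, iopl):
    """I/O address space access requires CPL <= IOPL."""
    return cpl <= iopl
```

For example, decode_system_flags(0x3202) reports IF set and an IOPL of 3, so code at any CPL passes the can_access_io check.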
2.4.
MEMORY-MANAGEMENT REGISTERS
The processor provides four memory-management registers (GDTR, LDTR, IDTR, and TR) that specify the locations of the data structures which control segmented memory management (refer to Figure 2-4). Special instructions are provided for loading and storing these registers.
System table registers (GDTR and IDTR): bits 47-16 hold the 32-bit linear base address; bits 15-0 hold the 16-bit table limit.
System segment registers (task register and LDTR): bits 15-0 hold the segment selector. Each has an associated segment descriptor register (automatically loaded) that holds the 32-bit linear base address, the segment limit, and the attributes.
Figure 2-4. Memory Management Registers
2.4.1.
Global Descriptor Table Register (GDTR)
The GDTR register holds the 32-bit base address and 16-bit table limit for the GDT. The base address specifies the linear address of byte 0 of the GDT; the table limit specifies the number of bytes in the table. The LGDT and SGDT instructions load and store the GDTR register, respectively. On power up or reset of the processor, the base address is set to the default value of 0 and
the limit is set to FFFFH. A new base address must be loaded into the GDTR as part of the processor initialization process for protected-mode operation. Refer to Section 3.5.1., Segment Descriptor Tables in Chapter 3, Protected-Mode Memory Management for more information on the base address and limit fields.
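As an illustration of the base-and-limit pair that LGDT reads and SGDT writes, the sketch below packs and unpacks a 6-byte memory image in Python. It rests on two assumptions not stated in this section: the image is a 16-bit limit followed by a 32-bit base in little-endian order, and the limit is the offset of the last valid byte (table size minus one), which is how Section 3.5.1 defines it. The names pack_gdtr and unpack_gdtr are hypothetical.

```python
import struct

def pack_gdtr(base, table_bytes):
    """Build the 6-byte image: 16-bit limit, then 32-bit linear base address."""
    limit = table_bytes - 1          # assumption: limit = offset of last valid byte
    return struct.pack("<HI", limit, base)

def unpack_gdtr(image):
    """Return (base, limit) from a 6-byte image such as SGDT would store."""
    limit, base = struct.unpack("<HI", image)
    return base, limit
```

A three-entry GDT (24 bytes) at linear address 00100000H therefore packs to a limit of 0017H followed by the base. The reset state described above corresponds to a base of 0 and a limit of FFFFH.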
2.4.2.
Local Descriptor Table Register (LDTR)
The LDTR register holds the 16-bit segment selector, 32-bit base address, 16-bit segment limit, and descriptor attributes for the LDT. The base address specifies the linear address of byte 0 of the LDT segment; the segment limit specifies the number of bytes in the segment. Refer to Section 3.5.1., Segment Descriptor Tables in Chapter 3, Protected-Mode Memory Management for more information on the base address and limit fields. The LLDT and SLDT instructions load and store the segment selector part of the LDTR register, respectively. The segment that contains the LDT must have a segment descriptor in the GDT. When the LLDT instruction loads a segment selector in the LDTR, the base address, limit, and descriptor attributes from the LDT descriptor are automatically loaded into the LDTR. When a task switch occurs, the LDTR is automatically loaded with the segment selector and descriptor for the LDT for the new task. The contents of the LDTR are not automatically saved prior to writing the new LDT information into the register. On power up or reset of the processor, the segment selector and base address are set to the default value of 0 and the limit is set to FFFFH.
2.4.3.
Interrupt Descriptor Table Register (IDTR)
The IDTR register holds the 32-bit base address and 16-bit table limit for the IDT. The base address specifies the linear address of byte 0 of the IDT; the table limit specifies the number of bytes in the table. The LIDT and SIDT instructions load and store the IDTR register, respectively. On power up or reset of the processor, the base address is set to the default value of 0 and the limit is set to FFFFH. The base address and limit in the register can then be changed as part of the processor initialization process. Refer to Section 5.8., Interrupt Descriptor Table (IDT) in Chapter 5, Interrupt and Exception Handling for more information on the base address and limit fields.
2.4.4.
Task Register (TR)
The task register holds the 16-bit segment selector, 32-bit base address, 16-bit segment limit, and descriptor attributes for the TSS of the current task. It references a TSS descriptor in the GDT. The base address specifies the linear address of byte 0 of the TSS; the segment limit specifies the number of bytes in the TSS. (Refer to Section 6.2.3., Task Register in Chapter 6, Task Management for more information about the task register.) The LTR and STR instructions load and store the segment selector part of the task register, respectively. When the LTR instruction loads a segment selector in the task register, the base
address, limit, and descriptor attributes from the TSS descriptor are automatically loaded into the task register. On power up or reset of the processor, the base address is set to the default value of 0 and the limit is set to FFFFH. When a task switch occurs, the task register is automatically loaded with the segment selector and descriptor for the TSS for the new task. The contents of the task register are not automatically saved prior to writing the new TSS information into the register.
2.5.
CONTROL REGISTERS
The control registers (CR0, CR1, CR2, CR3, and CR4) determine the operating mode of the processor and the characteristics of the currently executing task (refer to Figure 2-5).
CR4: bits 31-11 reserved (set to 0); bit 10 OSXMMEXCPT; bit 9 OSFXSR; bit 8 PCE; bit 7 PGE; bit 6 MCE; bit 5 PAE; bit 4 PSE; bit 3 DE; bit 2 TSD; bit 1 PVI; bit 0 VME.
CR3 (PDBR): bits 31-12 page-directory base; bit 4 PCD; bit 3 PWT.
CR2: bits 31-0 page-fault linear address.
CR1: reserved.
CR0: bit 31 PG; bit 30 CD; bit 29 NW; bit 18 AM; bit 16 WP; bit 5 NE; bit 4 ET; bit 3 TS; bit 2 EM; bit 1 MP; bit 0 PE; all other bits reserved.
Figure 2-5. Control Registers
The control registers:
CR0: Contains system control flags that control operating mode and states of the processor.
CR1: Reserved.
CR2: Contains the page-fault linear address (the linear address that caused a page fault).
CR3: Contains the physical address of the base of the page directory and two flags (PCD and PWT). This register is also known as the page-directory base register (PDBR). Only the 20 most-significant bits of the page-directory base address are specified; the lower 12 bits of the address are assumed to be 0. The page directory must thus be aligned to a page (4-KByte) boundary. The PCD and PWT flags control caching of the page directory in the processor's internal data caches (they do not control TLB caching of page-directory information). When using the physical address extension, the CR3 register contains the base address of the page-directory-pointer table (refer to Section 3.8., Physical Address Extension in Chapter 3, Protected-Mode Memory Management).
CR4: Contains a group of flags that enable several architectural extensions and indicate the level of OS support for the Streaming SIMD Extensions.
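The CR3 layout described above (a 20-bit page-directory base in the upper bits, with the lower 12 bits assumed to be 0) can be sketched as follows. This is an illustrative Python model only; decode_cr3 and make_cr3 are hypothetical names, the PCD and PWT positions (bits 4 and 3) are the ones given in the flag descriptions later in this section, and the physical address extension case is not modeled.

```python
def decode_cr3(cr3):
    """Split a 32-bit CR3 value into the page-directory base and its two flags."""
    return {
        "page_directory_base": cr3 & 0xFFFFF000,  # upper 20 bits; low 12 bits are 0
        "PCD": bool(cr3 & (1 << 4)),
        "PWT": bool(cr3 & (1 << 3)),
    }

def make_cr3(page_directory_base, pcd=False, pwt=False):
    """Compose a CR3 value; the page directory must sit on a 4-KByte boundary."""
    if page_directory_base & 0xFFF:
        raise ValueError("page directory must be aligned to a 4-KByte boundary")
    return page_directory_base | (int(pcd) << 4) | (int(pwt) << 3)
```

Rejecting a misaligned base in make_cr3 mirrors the alignment requirement: because only 20 address bits are stored, an unaligned address could not be represented.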
In protected mode, the move-to-or-from-control-registers forms of the MOV instruction allow the control registers to be read or loaded at privilege level 0 only. Application programs (running at privilege levels 1, 2, or 3) are thus prevented from reading or loading the control registers; an attempt to do so results in a general-protection exception (#GP(0)).
The functions of the flags in the control registers are as follows:
PG Paging (bit 31 of CR0). Enables paging when set; disables paging when clear. When paging is disabled, all linear addresses are treated as physical addresses. The PG flag has no effect if the PE flag (bit 0 of register CR0) is not also set; in fact, setting the PG flag when the PE flag is clear causes a general-protection exception (#GP) to be generated. Refer to Section 3.6., Paging (Virtual Memory) in Chapter 3, Protected-Mode Memory Management for a detailed description of the processor's paging mechanism.
CD Cache Disable (bit 30 of CR0). When the CD and NW flags are clear, caching of memory locations for the whole of physical memory in the processor's internal (and external) caches is enabled. When the CD flag is set, caching is restricted as described in Table 9-4, in Chapter 9, Memory Cache Control. To prevent the processor from accessing and updating its caches, the CD flag must be set and the caches must be invalidated so that no cache hits can occur (refer to Section 9.5.2., Preventing Caching, in Chapter 9, Memory Cache Control). Refer to Section 9.5., Cache Control, Chapter 9, Memory Cache Control, for a detailed description of the additional restrictions that can be placed on the caching of selected pages or regions of memory.
NW Not Write-through (bit 29 of CR0). When the NW and CD flags are clear, write-back (for Pentium and P6 family processors) or write-through (for Intel486 processors) is enabled for writes that hit the cache and invalidation cycles are enabled. Refer to Table 9-4, in Chapter 9, Memory Cache Control, for detailed information about the effect of the NW flag on caching for other settings of the CD and NW flags.
AM Alignment Mask (bit 18 of CR0). Enables automatic alignment checking when set; disables alignment checking when clear. Alignment checking is performed only when the AM flag is set, the AC flag in the EFLAGS register is set, the CPL is 3, and the processor is operating in either protected or virtual-8086 mode.
WP Write Protect (bit 16 of CR0). Inhibits supervisor-level procedures from writing into user-level read-only pages when set; allows supervisor-level procedures to write into user-level read-only pages when clear. This flag facilitates implementation of the copy-on-write method of creating a new process (forking) used by operating systems such as UNIX.
NE Numeric Error (bit 5 of CR0). Enables the native (internal) mechanism for reporting FPU errors when set; enables the PC-style FPU error reporting mechanism when clear. When the NE flag is clear and the IGNNE# input is asserted, FPU errors are ignored. When the NE flag is clear and the IGNNE# input is deasserted, an unmasked FPU error causes the processor to assert the FERR# pin to generate an external interrupt and to stop instruction execution immediately before executing the next waiting floating-point instruction or WAIT/FWAIT instruction. The FERR# pin is intended to drive an input to an external interrupt controller (the FERR# pin emulates the ERROR# pin of the Intel 287 and Intel 387 DX math coprocessors). The NE flag, IGNNE# pin, and FERR# pin are used with external logic to implement PC-style error reporting. (Refer to Software Exception Handling in Chapter 7, and Appendix D in the Intel Architecture Software Developer's Manual, Volume 1, for more information about FPU error reporting and for detailed information on when the FERR# pin is asserted, which is implementation dependent.)
ET Extension Type (bit 4 of CR0). Reserved in the P6 family and Pentium processors. (In the P6 family processors, this flag is hardcoded to 1.) In the Intel386 and Intel486 processors, this flag indicates support of Intel 387 DX math coprocessor instructions when set.
TS Task Switched (bit 3 of CR0). Allows the saving of FPU context on a task switch to be delayed until the FPU is actually accessed by the new task. The processor sets this flag on every task switch and tests it when interpreting floating-point arithmetic instructions.
If the TS flag is set, a device-not-available exception (#NM) is raised prior to the execution of a floating-point instruction. If the TS flag and the MP flag (also in the CR0 register) are both set, an #NM exception is raised prior to the execution of a floating-point instruction or a WAIT/FWAIT instruction.
Table 2-1 shows the actions taken for floating-point, WAIT/FWAIT, MMX, and Streaming SIMD Extensions based on the settings of the TS, EM, and MP flags.
Table 2-1. Action Taken for Combinations of EM, MP, TS, CR4.OSFXSR, and CPUID.XMM
(EM, MP, and TS are CR0 flags; OSFXSR is a CR4 flag; XMM is a CPUID feature flag. A dash means the flag setting or instruction type does not apply to that row.)
EM  MP  TS  OSFXSR  XMM  Floating-Point  WAIT/FWAIT     MMX Technology  Streaming SIMD Extensions
0   0   0   -       -    Execute         Execute        Execute         -
0   0   1   -       -    #NM Exception   Execute        #NM Exception   -
0   1   0   -       -    Execute         Execute        Execute         -
0   1   1   -       -    #NM Exception   #NM Exception  #NM Exception   -
1   0   0   -       -    #NM Exception   Execute        #UD Exception   -
1   0   1   -       -    #NM Exception   Execute        #UD Exception   -
1   1   0   -       -    #NM Exception   Execute        #UD Exception   -
1   1   1   -       -    #NM Exception   #NM Exception  #UD Exception   -
1   -   -   -       -    -               -              -               #UD Interrupt 6
0   -   1   1       1    -               -              -               #NM Interrupt 7
-   -   -   0       -    -               -              -               #UD Interrupt 6
-   -   -   -       0    -               -              -               #UD Interrupt 6
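The floating-point, WAIT/FWAIT, and MMX columns of Table 2-1 follow from three rules stated in the flag descriptions: a floating-point instruction raises #NM when EM or TS is set, WAIT/FWAIT raises #NM only when MP and TS are both set, and an MMX instruction raises #UD when EM is set and otherwise behaves like a floating-point instruction with respect to TS. The Python sketch below encodes those rules; the function names are hypothetical, and the Streaming SIMD Extensions column is not modeled.

```python
def fp_action(em, mp, ts):
    """Floating-point instruction: #NM if EM or TS is set."""
    return "#NM" if (em or ts) else "Execute"

def wait_action(em, mp, ts):
    """WAIT/FWAIT: #NM only if both MP and TS are set."""
    return "#NM" if (mp and ts) else "Execute"

def mmx_action(em, mp, ts):
    """MMX instruction: #UD if EM is set, else #NM if TS is set."""
    if em:
        return "#UD"
    return "#NM" if ts else "Execute"
```

Calling the three functions for all eight EM/MP/TS combinations reproduces the first eight rows of the table.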
The processor does not automatically save the context of the FPU on a task switch. Instead it sets the TS flag, which causes the processor to raise an #NM exception whenever it encounters a floating-point instruction in the instruction stream for the new task. The fault handler for the #NM exception can then be used to clear the TS flag (with the CLTS instruction) and save the context of the FPU. If the task never encounters a floating-point instruction, the FPU context is never saved.
EM Emulation (bit 2 of CR0). Indicates that the processor does not have an internal or external FPU when set; indicates an FPU is present when clear. When the EM flag is set, execution of a floating-point instruction generates a device-not-available exception (#NM). This flag must be set when the processor does not have an internal FPU or is not connected to a math coprocessor. If the processor does have an internal FPU, setting this flag would force all floating-point instructions to be handled by software emulation. Table 8-2 in Chapter 8, Processor Management and Initialization shows the recommended setting of this flag, depending on the Intel Architecture processor and FPU or math coprocessor present in the system. Table 2-1 shows the interaction of the EM, MP, and TS flags. Note that the EM flag also affects the execution of the MMX instructions (refer to Table 2-1). When this flag is set, execution of an MMX instruction causes an invalid opcode exception (#UD) to be generated. Thus, if an Intel Architecture processor incorporates MMX technology, the EM flag must be set to 0 to enable execution of MMX instructions. Similarly, when this flag is set, execution of a Streaming SIMD Extensions instruction causes an invalid opcode exception (#UD) to be generated. Thus, if an Intel Architecture processor incorporates Streaming SIMD Extensions, the EM flag must be set to 0 to enable execution of Streaming SIMD Extensions instructions. The exceptions are the PREFETCH and SFENCE instructions, which are not affected by the EM flag.
MP Monitor Coprocessor (bit 1 of CR0). Controls the interaction of the WAIT (or FWAIT) instruction with the TS flag (bit 3 of CR0). If the MP flag is set, a WAIT instruction generates a device-not-available exception (#NM) if the TS flag is set. If the MP flag is clear, the WAIT instruction ignores the setting of the TS flag. Table 8-2 in Chapter 8, Processor Management and Initialization shows the recommended setting of this flag, depending on the Intel Architecture processor and FPU or math coprocessor present in the system. Table 2-1 shows the interaction of the MP, EM, and TS flags.
PE Protection Enable (bit 0 of CR0). Enables protected mode when set; enables real-address mode when clear. This flag does not enable paging directly. It only enables segment-level protection. To enable paging, both the PE and PG flags must be set. Refer to Section 8.8., Mode Switching in Chapter 8, Processor Management and Initialization for information on using the PE flag to switch between real and protected mode.
PCD Page-level Cache Disable (bit 4 of CR3). Controls caching of the current page directory. When the PCD flag is set, caching of the page directory is prevented; when the flag is clear, the page directory can be cached. This flag affects only the processor's internal caches (both L1 and L2, when present). The processor ignores this flag if paging is not used (the PG flag in register CR0 is clear) or the CD (cache disable) flag in CR0 is set. Refer to Chapter 9, Memory Cache Control, for more information about the use of this flag. Refer to Section 3.6.4., Page-Directory and Page-Table Entries in Chapter 3, Protected-Mode Memory Management for a description of a companion PCD flag in the page-directory and page-table entries.
PWT Page-level Writes Transparent (bit 3 of CR3). Controls the write-through or write-back caching policy of the current page directory. When the PWT flag is set, write-through caching is enabled; when the flag is clear, write-back caching is enabled. This flag affects only the internal caches (both L1 and L2, when present). The processor ignores this flag if paging is not used (the PG flag in register CR0 is clear) or the CD (cache disable) flag in CR0 is set. Refer to Section 9.5., Cache Control, in Chapter 9, Memory Cache Control, for more information about the use of this flag. Refer to Section 3.6.4., Page-Directory and Page-Table Entries in Chapter 3, Protected-Mode Memory Management for a description of a companion PWT flag in the page-directory and page-table entries.
VME Virtual-8086 Mode Extensions (bit 0 of CR4). Enables interrupt- and exception-handling extensions in virtual-8086 mode when set; disables the extensions when clear. Use of the virtual mode extensions can improve the performance of virtual-8086 applications by eliminating the overhead of calling the virtual-8086 monitor to handle interrupts and exceptions that occur while executing an 8086 program and, instead, redirecting the interrupts and exceptions back to the 8086 program's handlers. It also provides hardware support for a virtual interrupt flag (VIF) to improve reliability of running 8086 programs in multitasking and multiple-processor environments. Refer to Section 16.3., Interrupt and Exception Handling in Virtual-8086 Mode in Chapter 16, 8086 Emulation for detailed information about the use of this feature.
PVI Protected-Mode Virtual Interrupts (bit 1 of CR4). Enables hardware support for a virtual interrupt flag (VIF) in protected mode when set; disables the VIF flag in protected mode when clear. Refer to Section 16.4., Protected-Mode Virtual Interrupts in Chapter 16, 8086 Emulation for detailed information about the use of this feature.
TSD Time Stamp Disable (bit 2 of CR4). Restricts the execution of the RDTSC instruction to procedures running at privilege level 0 when set; allows the RDTSC instruction to be executed at any privilege level when clear.
DE Debugging Extensions (bit 3 of CR4). References to debug registers DR4 and DR5 cause an undefined opcode (#UD) exception to be generated when set; when clear, the processor aliases references to registers DR4 and DR5 for compatibility with software written to run on earlier Intel Architecture processors. Refer to Section 15.2.2., Debug Registers DR4 and DR5, in Chapter 15, Debugging and Performance Monitoring, for more information on the function of this flag.
PSE Page Size Extensions (bit 4 of CR4). Enables 4-MByte pages when set; restricts pages to 4 KBytes when clear. Refer to Section 3.6.1., Paging Options in Chapter 3, Protected-Mode Memory Management for more information about the use of this flag.
PAE Physical Address Extension (bit 5 of CR4). Enables the paging mechanism to reference 36-bit physical addresses when set; restricts physical addresses to 32 bits when clear. Refer to Section 3.8., Physical Address Extension in Chapter 3, Protected-Mode Memory Management for more information about the physical address extension.
MCE Machine-Check Enable (bit 6 of CR4). Enables the machine-check exception when set; disables the machine-check exception when clear. Refer to Chapter 13, Machine-Check Architecture, for more information about the machine-check exception and machine-check architecture.
PGE Page Global Enable (bit 7 of CR4). (Introduced in the P6 family processors.) Enables the global page feature when set; disables the global page feature when clear. The global page feature allows frequently used or shared pages to be marked as global to all users (done with the global flag, bit 8, in a page-directory or page-table entry). Global pages are not flushed from the translation-lookaside buffer (TLB) on a task switch or a write to register CR3. In addition, the bit must not be enabled before paging is enabled via CR0.PG. Program correctness may be affected by reversing this sequence, and processor performance will be impacted. Refer to Section 3.7., Translation Lookaside Buffers (TLBs) in Chapter 3, Protected-Mode Memory Management for more information on the use of this bit.
PCE Performance-Monitoring Counter Enable (bit 8 of CR4). Enables execution of the RDPMC instruction for programs or procedures running at any protection level when set; the RDPMC instruction can be executed only at protection level 0 when clear.
OSFXSR Operating System FXSAVE/FXRSTOR Support (bit 9 of CR4). The operating system sets this bit if both the CPU and the OS support the use of FXSAVE/FXRSTOR during context switches.
OSXMMEXCPT Operating System Unmasked Exception Support (bit 10 of CR4). The operating system sets this bit if it provides support for unmasked SIMD floating-point exceptions.
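The bit assignments listed above for CR0 and CR4 can be gathered into lookup tables for decoding a register value, for example one captured in a debugger or a crash dump. The Python sketch below is illustrative only: CR0_FLAGS, CR4_FLAGS, set_flags, and cr0_is_valid are hypothetical names, and cr0_is_valid checks just the one rule stated in the PG description (setting PG while PE is clear causes a #GP).

```python
# Flag name -> bit position, in ascending bit order, from the descriptions above.
CR0_FLAGS = {"PE": 0, "MP": 1, "EM": 2, "TS": 3, "ET": 4, "NE": 5,
             "WP": 16, "AM": 18, "NW": 29, "CD": 30, "PG": 31}
CR4_FLAGS = {"VME": 0, "PVI": 1, "TSD": 2, "DE": 3, "PSE": 4, "PAE": 5,
             "MCE": 6, "PGE": 7, "PCE": 8, "OSFXSR": 9, "OSXMMEXCPT": 10}

def set_flags(value, layout):
    """Return the names of the flags that are set, lowest bit first."""
    return [name for name, bit in layout.items() if (value >> bit) & 1]

def cr0_is_valid(cr0):
    """PG (bit 31) may be set only when PE (bit 0) is also set."""
    pg = (cr0 >> 31) & 1
    pe = cr0 & 1
    return not (pg and not pe)
```

For instance, a CR0 value of 80000011H decodes to PE, ET, and PG, which is a protected-mode configuration with paging enabled.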
2.5.1.
CPUID Qualification of Control Register Flags
The VME, PVI, TSD, DE, PSE, PAE, MCE, PGE, PCE, OSFXSR, and OSXMMEXCPT flags in control register CR4 are model specific. All of these flags (except PCE) can be qualified with the CPUID instruction to determine if they are implemented on the processor before they are used.
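One way to apply this qualification is to map each CR4 flag to the CPUID feature flag that reports it. The Python sketch below takes the feature flags as an integer (the EDX value returned by CPUID with EAX = 1) rather than executing CPUID itself. The bit positions and the flag-to-feature mapping are assumptions recalled from the CPUID description in Volume 2 and should be checked against it; the names CPUID_EDX_BITS, CR4_REQUIRES, and usable_cr4_flags are hypothetical.

```python
# Assumed CPUID feature-flag bit positions (EDX, CPUID function 1); verify in Volume 2.
CPUID_EDX_BITS = {"VME": 1, "DE": 2, "PSE": 3, "TSC": 4, "PAE": 6,
                  "MCE": 7, "PGE": 13, "FXSR": 24, "XMM": 25}

# Assumed mapping from each CR4 flag to the feature flag that qualifies it.
# PCE is omitted because the text excludes it from CPUID qualification.
CR4_REQUIRES = {"VME": "VME", "PVI": "VME", "TSD": "TSC", "DE": "DE",
                "PSE": "PSE", "PAE": "PAE", "MCE": "MCE", "PGE": "PGE",
                "OSFXSR": "FXSR", "OSXMMEXCPT": "XMM"}

def usable_cr4_flags(cpuid_edx):
    """Return the CR4 flags whose qualifying feature flag is set in cpuid_edx."""
    return [flag for flag, feature in CR4_REQUIRES.items()
            if (cpuid_edx >> CPUID_EDX_BITS[feature]) & 1]
```

An operating system would consult such a list before setting a CR4 flag, leaving unsupported flags clear.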
2.6.
SYSTEM INSTRUCTION SUMMARY
The system instructions handle system-level functions such as loading system registers, managing the cache, managing interrupts, or setting up the debug registers. Many of these instructions can be executed only by operating-system or executive procedures (that is, procedures running at privilege level 0). Others can be executed at any privilege level and are thus available to application programs. Table 2-2 lists the system instructions and indicates whether they are available and useful for application programs. These instructions are described in detail in Chapter 3, Instruction Set Reference, of the Intel Architecture Software Developer's Manual, Volume 2.
Table 2-2. Summary of System Instructions
Instruction     Description                               Useful to Application?   Protected from Application?
LLDT            Load LDT Register                         No                       Yes
SLDT            Store LDT Register                        No                       No
LGDT            Load GDT Register                         No                       Yes
SGDT            Store GDT Register                        No                       No
LTR             Load Task Register                        No                       Yes
STR             Store Task Register                       No                       No
LIDT            Load IDT Register                         No                       Yes
SIDT            Store IDT Register                        No                       No
MOV CRn         Load and store control registers          Yes                      Yes (load only)
SMSW            Store MSW                                 Yes                      No
LMSW            Load MSW                                  No                       Yes
CLTS            Clear TS flag in CR0                      No                       Yes
ARPL            Adjust RPL                                Yes (1)                  No
LAR             Load Access Rights                        Yes                      No
LSL             Load Segment Limit                        Yes                      No
VERR            Verify for Reading                        Yes                      No
VERW            Verify for Writing                        Yes                      No
MOV DBn         Load and store debug registers            No                       Yes
INVD            Invalidate cache, no writeback            No                       Yes
WBINVD          Invalidate cache, with writeback          No                       Yes
INVLPG          Invalidate TLB entry                      No                       Yes
HLT             Halt Processor                            No                       Yes
LOCK (Prefix)   Bus Lock                                  Yes                      No
RSM             Return from system management mode        No                       Yes
RDMSR (3)       Read Model-Specific Registers             No                       Yes
WRMSR (3)       Write Model-Specific Registers            No                       Yes
RDPMC (4)       Read Performance-Monitoring Counter       Yes                      Yes (2)
RDTSC (3)       Read Time-Stamp Counter                   Yes                      Yes (2)
LDMXCSR (5)     Load MXCSR Register                       Yes                      No
STMXCSR (5)     Store MXCSR Register                      Yes                      No
NOTES:
1. Useful to application programs running at a CPL of 1 or 2.
2. The TSD and PCE flags in control register CR4 control access to these instructions by application programs running at a CPL of 3.
3. These instructions were introduced into the Intel Architecture with the Pentium processor.
4. This instruction was introduced into the Intel Architecture with the Pentium Pro processor and the Pentium processor with MMX technology.
5. This instruction was introduced into the Intel Architecture with the Pentium III processor.
2.6.1.
Loading and Storing System Registers
The GDTR, LDTR, IDTR, and TR registers each have a load and store instruction for loading data into and storing data from the register:
LGDT (Load GDTR Register) Loads the GDT base address and limit from memory into the GDTR register.
SGDT (Store GDTR Register) Stores the GDT base address and limit from the GDTR register into memory.
LIDT (Load IDTR Register) Loads the IDT base address and limit from memory into the IDTR register.
SIDT (Store IDTR Register) Stores the IDT base address and limit from the IDTR register into memory.
LLDT (Load LDT Register) Loads the LDT segment selector and segment descriptor from memory into the LDTR. (The segment selector operand can also be located in a general-purpose register.)
SLDT (Store LDT Register) Stores the LDT segment selector from the LDTR register into memory or a general-purpose register.
LTR (Load Task Register) Loads the segment selector and segment descriptor for a TSS from memory into the task register. (The segment selector operand can also be located in a general-purpose register.)
STR (Store Task Register) Stores the segment selector for the current task TSS from the task register into memory or a general-purpose register.
The LMSW (load machine status word) and SMSW (store machine status word) instructions operate on bits 0 through 15 of control register CR0. These instructions are provided for compatibility with the 16-bit Intel 286 processor. Programs written to run on 32-bit Intel Architecture processors should not use these instructions. Instead, they should access the control register CR0 using the MOV instruction.
The CLTS (clear TS flag in CR0) instruction is provided for use in handling a device-not-available exception (#NM) that occurs when the processor attempts to execute a floating-point instruction when the TS flag is set. This instruction allows the TS flag to be cleared after the FPU context has been saved, preventing further #NM exceptions. Refer to Section 2.5., Control Registers for more information about the TS flag.
The control registers (CR0, CR1, CR2, CR3, and CR4) are loaded with the MOV instruction. This instruction can load a control register from a general-purpose register or store the contents of the control register in a general-purpose register.
2.6.2.
Verifying of Access Privileges
The processor provides several instructions for examining segment selectors and segment descriptors to determine if access to their associated segments is allowed. These instructions
duplicate some of the automatic access rights and type checking done by the processor, thus allowing operating-system or executive software to prevent exceptions from being generated.
The ARPL (adjust RPL) instruction adjusts the RPL (requestor privilege level) of a segment selector to match that of the program or procedure that supplied the segment selector. Refer to Section 4.10.4., Checking Caller Access Privileges (ARPL Instruction) in Chapter 4, Protection for a detailed explanation of the function and use of this instruction.
The LAR (load access rights) instruction verifies the accessibility of a specified segment and loads the access rights information from the segment's segment descriptor into a general-purpose register. Software can then examine the access rights to determine if the segment type is compatible with its intended use. Refer to Section 4.10.1., Checking Access Rights (LAR Instruction) in Chapter 4, Protection for a detailed explanation of the function and use of this instruction.
The LSL (load segment limit) instruction verifies the accessibility of a specified segment and loads the segment limit from the segment's segment descriptor into a general-purpose register. Software can then compare the segment limit with an offset into the segment to determine whether the offset lies within the segment. Refer to Section 4.10.3., Checking That the Pointer Offset Is Within Limits (LSL Instruction) in Chapter 4, Protection for a detailed explanation of the function and use of this instruction.
The VERR (verify for reading) and VERW (verify for writing) instructions verify whether a selected segment is readable or writable, respectively, at the CPL. Refer to Section 4.10.2., Checking Read/Write Rights (VERR and VERW Instructions) in Chapter 4, Protection for a detailed explanation of the function and use of these instructions.
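The RPL adjustment performed by ARPL can be modeled in a few lines. The Python sketch below assumes the selector layout from Chapter 3 (RPL in bits 1 and 0) and the behavior given in the ARPL instruction description in Volume 2: if the destination selector's RPL is numerically lower than the source's, it is raised to match and ZF is set; otherwise the selector is unchanged and ZF is cleared. The function name arpl and its return convention are hypothetical.

```python
def arpl(dest_selector, src_selector):
    """Model ARPL: return (adjusted destination selector, ZF)."""
    dest_rpl = dest_selector & 0b11   # RPL is bits 1-0 of a selector
    src_rpl = src_selector & 0b11
    if dest_rpl < src_rpl:
        # Less-privileged caller: raise the RPL so the selector cannot
        # be used with more privilege than the caller has.
        adjusted = (dest_selector & 0xFFFC) | src_rpl
        return adjusted, True
    return dest_selector, False
```

For example, a selector of 0008H (RPL 0) passed in by a caller whose code selector is 0023H (RPL 3) becomes 000BH, so later access checks are made at the caller's privilege level.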
2.6.3.
Loading and Storing Debug Registers
The internal debugging facilities in the processor are controlled by a set of 8 debug registers (DR0 through DR7). The MOV instruction allows setup data to be loaded into and stored from these registers.
2.6.4.
Invalidating Caches and TLBs
The processor provides several instructions for use in explicitly invalidating its caches and TLB entries. The INVD (invalidate cache with no writeback) instruction invalidates all data and instruction entries in the internal caches and TLBs and sends a signal to the external caches indicating that they should be invalidated also. The WBINVD (invalidate cache with writeback) instruction performs the same function as the INVD instruction, except that it writes back any modified lines in its internal caches to memory before it invalidates the caches. After invalidating the internal caches, it signals the external caches to write back modified data and invalidate their contents. The INVLPG (invalidate TLB entry) instruction invalidates (flushes) the TLB entry for a specified page.
2.6.5.
Controlling the Processor
The HLT (halt processor) instruction stops the processor until an enabled interrupt (such as NMI or SMI, which are normally enabled), the BINIT# signal, the INIT# signal, or the RESET# signal is received. The processor generates a special bus cycle to indicate that the halt mode has been entered. Hardware may respond to this signal in a number of ways. An indicator light on the front panel may be turned on. An NMI interrupt for recording diagnostic information may be generated. Reset initialization may be invoked. (Note that the BINIT# pin was introduced with the Pentium Pro processor.)
The LOCK prefix invokes a locked (atomic) read-modify-write operation when modifying a memory operand. This mechanism is used to allow reliable communications between processors in multiprocessor systems. In the Pentium and earlier Intel Architecture processors, the LOCK prefix causes the processor to assert the LOCK# signal during the instruction, which always causes an explicit bus lock to occur. In the P6 family processors, the locking operation is handled with either a cache lock or bus lock. If a memory access is cacheable and affects only a single cache line, a cache lock is invoked and the system bus and the actual memory location in system memory are not locked during the operation. Here, other P6 family processors on the bus write back any modified data and invalidate their caches as necessary to maintain system memory coherency. If the memory access is not cacheable and/or it crosses a cache line boundary, the processor's LOCK# signal is asserted and the processor does not respond to requests for bus control during the locked operation.
The RSM (return from SMM) instruction restores the processor (from a context dump) to the state it was in prior to a system management mode (SMM) interrupt.
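The cache-lock versus bus-lock rule described for the P6 family LOCK prefix reduces to a check on cacheability and on whether the operand spans a cache line boundary. The Python sketch below illustrates that check. The 32-byte line size is an assumption about P6 family caches rather than something stated in this section, and lock_kind is a hypothetical name.

```python
CACHE_LINE_BYTES = 32  # assumption: cache line size of the P6 family

def lock_kind(address, size, cacheable=True, line_bytes=CACHE_LINE_BYTES):
    """Classify a locked access as a 'cache lock' or a 'bus lock'."""
    first_line = address // line_bytes
    last_line = (address + size - 1) // line_bytes
    if cacheable and first_line == last_line:
        return "cache lock"   # stays within one cacheable line
    return "bus lock"         # uncacheable or crosses a line boundary
```

Under this model, keeping locked variables naturally aligned avoids the bus lock, since an aligned doubleword can never straddle two lines.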
2.6.6.
Reading Performance-Monitoring and Time-Stamp Counters
The RDPMC (read performance-monitoring counter) and RDTSC (read time-stamp counter) instructions allow an application program to read the processor's performance-monitoring and time-stamp counters, respectively. The P6 family processors have two 40-bit performance counters that record either the occurrence of events or the duration of events. The events that can be monitored include the number of instructions decoded, the number of interrupts received, and the number of cache loads. Each counter can be set up to monitor a different event, using the system instruction WRMSR to set up values in the model-specific registers PerfEvtSel0 and PerfEvtSel1. The RDPMC instruction loads the current count in counter 0 or 1 into the EDX:EAX registers.
The time-stamp counter is a model-specific 64-bit counter that is reset to zero each time the processor is reset. If not reset, the counter will increment ~6.3 x 10^15 times per year when the processor is operating at a clock rate of 200 MHz. At this clock frequency, it would take over 2000 years for the counter to wrap around. The RDTSC instruction loads the current count of the time-stamp counter into the EDX:EAX registers.
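The wraparound estimate above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original manual; it assumes a 365-day year and a counter that increments once per clock at 200 MHz, and the variable names are illustrative only.

```python
# Assumptions for illustration: 200 MHz clock, one increment per clock,
# and a 365-day year.
CLOCK_HZ = 200_000_000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Number of time-stamp counter increments in one year.
ticks_per_year = CLOCK_HZ * SECONDS_PER_YEAR

# Years until a 64-bit counter that started at zero wraps around.
years_to_wrap = 2**64 / ticks_per_year

print(f"increments per year: {ticks_per_year:.2e}")
print(f"years to wrap:       {years_to_wrap:.0f}")
```

Under these assumptions the yearly count is about 6.3 x 10^15 and the wraparound time is a little under 3000 years, which is consistent with the "over 2000 years" figure quoted above.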
Refer to Section 15.5., Time-Stamp Counter, and Section 15.6., Performance-Monitoring Counters, in Chapter 15, Debugging and Performance Monitoring, for more information about the performance monitoring and time-stamp counters. The RDTSC instruction was introduced into the Intel Architecture with the Pentium processor. The RDPMC instruction was introduced into the Intel Architecture with the Pentium Pro processor and the Pentium processor with MMX technology. Earlier Pentium processors have two performance-monitoring counters, but they can be read only with the RDMSR instruction, and only at privilege level 0.
2.6.7.
Reading and Writing Model-Specific Registers
The RDMSR (read model-specific register) and WRMSR (write model-specific register) instructions allow the processor's 64-bit model-specific registers (MSRs) to be read and written, respectively. The MSR to be read or written is specified by the value in the ECX register. The RDMSR instruction reads the value from the specified MSR into the EDX:EAX registers; the WRMSR instruction writes the value in the EDX:EAX registers into the specified MSR. Refer to Section 8.4., Model-Specific Registers (MSRs) in Chapter 8, Processor Management and Initialization for more information about the MSRs. The RDMSR and WRMSR instructions were introduced into the Intel Architecture with the Pentium processor.
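RDMSR, WRMSR, RDTSC, and RDPMC all pass a 64-bit quantity through the EDX:EAX register pair, with EDX holding the high-order 32 bits and EAX the low-order 32 bits. The sketch below only models that split in ordinary integer arithmetic; the function names are hypothetical and nothing here executes a privileged instruction.

```python
MASK32 = 0xFFFFFFFF

def split_edx_eax(value64):
    """Return (edx, eax): the high and low 32 bits of a 64-bit value."""
    return (value64 >> 32) & MASK32, value64 & MASK32

def join_edx_eax(edx, eax):
    """Recombine an EDX:EAX pair into a single 64-bit value."""
    return ((edx & MASK32) << 32) | (eax & MASK32)

edx, eax = split_edx_eax(0x0123456789ABCDEF)
print(hex(edx), hex(eax))
```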
2.6.8.
Loading and Storing the Streaming SIMD Extensions Control/Status Word
The LDMXCSR (load Streaming SIMD Extensions control/status word from memory) and STMXCSR (store Streaming SIMD Extensions control/status word to memory) instructions allow the Pentium III processor's 32-bit control/status word to be loaded and stored, respectively. The MXCSR control/status register is used to enable masked/unmasked exception handling, to set rounding modes, to set flush-to-zero mode, and to view exception status flags. For a complete description of the LDMXCSR and STMXCSR instructions, refer to the Intel Architecture Software Developer's Manual, Volume 2.
CHAPTER 3 PROTECTED-MODE MEMORY MANAGEMENT

This chapter describes the Intel Architecture's protected-mode memory management facilities, including the physical memory requirements, the segmentation mechanism, and the paging mechanism. Refer to Chapter 4, Protection for a description of the processor's protection mechanism. Refer to Chapter 16, 8086 Emulation for a description of memory addressing protection in real-address and virtual-8086 modes.
3.1.
MEMORY MANAGEMENT OVERVIEW
The memory management facilities of the Intel Architecture are divided into two parts: segmentation and paging. Segmentation provides a mechanism for isolating individual code, data, and stack modules so that multiple programs (or tasks) can run on the same processor without interfering with one another. Paging provides a mechanism for implementing a conventional demand-paged, virtual-memory system where sections of a program's execution environment are mapped into physical memory as needed. Paging can also be used to provide isolation between multiple tasks.
When operating in protected mode, some form of segmentation must be used. There is no mode bit to disable segmentation. The use of paging, however, is optional. These two mechanisms (segmentation and paging) can be configured to support simple single-program (or single-task) systems, multitasking systems, or multiple-processor systems that use shared memory.
As shown in Figure 3-1, segmentation provides a mechanism for dividing the processor's addressable memory space (called the linear address space) into smaller protected address spaces called segments. Segments can be used to hold the code, data, and stack for a program or to hold system data structures (such as a TSS or LDT). If more than one program (or task) is running on a processor, each program can be assigned its own set of segments. The processor then enforces the boundaries between these segments and ensures that one program does not interfere with the execution of another program by writing into the other program's segments. The segmentation mechanism also allows typing of segments so that the operations that may be performed on a particular type of segment can be restricted.
All of the segments within a system are contained in the processor's linear address space. To locate a byte in a particular segment, a logical address (sometimes called a far pointer) must be provided. A logical address consists of a segment selector and an offset.
The segment selector is a unique identifier for a segment. Among other things, it provides an offset into a descriptor table (such as the global descriptor table, GDT) to a data structure called a segment descriptor. Each segment has a segment descriptor, which specifies the size of the segment, the access rights and privilege level for the segment, the segment type, and the location of the first byte of the segment in the linear address space (called the base address of the segment). The offset part of the logical address is added to the base address for the segment to locate a byte within the segment. The base address plus the offset thus forms a linear address in the processor's linear address space.
[Figure: Segmentation translates a logical address (or far pointer), made up of a segment selector and an offset, through a segment descriptor in the global descriptor table (GDT) into a linear address in the linear address space. Paging splits the linear address into Dir, Table, and Offset fields and uses a page directory entry and a page table entry to locate the physical address of a page in the physical address space.]
Figure 3-1. Segmentation and Paging
If paging is not used, the linear address space of the processor is mapped directly into the physical address space of the processor. The physical address space is defined as the range of addresses that the processor can generate on its address bus. Because multitasking computing systems commonly define a linear address space much larger than it is economically feasible to contain all at once in physical memory, some method of virtualizing the linear address space is needed. This virtualization of the linear address space is handled through the processor's paging mechanism. Paging supports a virtual memory environment where a large linear address space is simulated with a small amount of physical memory (RAM and ROM) and some disk storage.
When using paging, each segment is divided into pages (ordinarily 4 KBytes each in size), which are stored either in physical memory or on the disk. The operating system or executive maintains a page directory and a set of page tables to keep track of the pages. When a program (or task) attempts to access an address location in the linear address space, the processor uses the page directory and page tables to translate the linear address into a physical address and then performs the requested operation (read or write) on the memory location. If the page being accessed is not currently in physical memory, the processor interrupts execution of the program (by generating a page-fault exception). The operating system or executive then reads the page into physical memory from the disk and continues executing the program.
When paging is implemented properly in the operating system or executive, the swapping of pages between physical memory and the disk is transparent to the correct execution of a program. Even programs written for 16-bit Intel Architecture processors can be paged (transparently) when they are run in virtual-8086 mode.
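As a rough illustration of the paging step in Figure 3-1, the sketch below splits a 32-bit linear address into its Dir, Table, and Offset fields and looks up a page frame. It assumes the ordinary 4-KByte page layout (10-bit directory index, 10-bit table index, 12-bit offset), which is detailed in Section 3.6 rather than here. The dictionary used as a page map and the names split_linear, translate, and PageFault are hypothetical stand-ins for the real page directory and page tables.

```python
class PageFault(Exception):
    """Raised when the toy page map has no entry for the page."""

def split_linear(addr):
    """Split a 32-bit linear address into (directory, table, offset)."""
    directory = (addr >> 22) & 0x3FF  # bits 31-22
    table = (addr >> 12) & 0x3FF      # bits 21-12
    offset = addr & 0xFFF             # bits 11-0
    return directory, table, offset

def translate(addr, page_map):
    """Translate a linear address using a toy map of
    (directory, table) -> physical base address of the page."""
    directory, table, offset = split_linear(addr)
    try:
        frame_base = page_map[(directory, table)]
    except KeyError:
        # Models the page-fault exception for a page not in memory.
        raise PageFault(hex(addr)) from None
    return frame_base + offset

# Hypothetical mapping: linear page (1, 3) lives at physical 0007A000H.
print(hex(translate(0x00403025, {(1, 3): 0x0007A000})))
```

In a real system the operating system would handle the page fault by loading the page and retrying the access; the exception here only marks where that would happen.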
3.2.
USING SEGMENTS
The segmentation mechanism supported by the Intel Architecture can be used to implement a wide variety of system designs. These designs range from flat models that make only minimal use of segmentation to protect programs to multisegmented models that employ segmentation to create a robust operating environment in which multiple programs and tasks can be executed reliably. The following sections give several examples of how segmentation can be employed in a system to improve memory management performance and reliability.
3.2.1.
Basic Flat Model
The simplest memory model for a system is the basic flat model, in which the operating system and application programs have access to a continuous, unsegmented address space. To the greatest extent possible, this basic flat model hides the segmentation mechanism of the architecture from both the system designer and the application programmer.
To implement a basic flat memory model with the Intel Architecture, at least two segment descriptors must be created, one for referencing a code segment and one for referencing a data segment (refer to Figure 3-2). Both of these segments, however, are mapped to the entire linear address space: that is, both segment descriptors have the same base address value of 0 and the same segment limit of 4 GBytes. By setting the segment limit to 4 GBytes, the segmentation mechanism is kept from generating exceptions for out-of-limit memory references, even if no physical memory resides at a particular address. ROM (EPROM) is generally located at the top of the physical address space, because the processor begins execution at FFFF_FFF0H. RAM (DRAM) is placed at the bottom of the address space because the initial base address for the DS data segment after reset initialization is 0.
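A minimal sketch of the two flat-model descriptors follows, using the descriptor layout described in Section 3.4.3. (Figure 3-8). The helper pack_descriptor and the constants are illustrative names, not part of any Intel or MASM interface, and the choice of privilege level 0 and 32-bit segments is an assumption made for the example.

```python
def pack_descriptor(base, limit20, access, g, db):
    """Pack a segment descriptor into a 64-bit integer.

    access is the byte holding P, DPL, S, and Type (bits 15-8 of the
    second doubleword); limit20 is the 20-bit segment limit field.
    """
    low = ((base & 0xFFFF) << 16) | (limit20 & 0xFFFF)
    high = ((base & 0xFF000000)        # Base 31:24
            | (g << 23)                # G (granularity)
            | (db << 22)               # D/B
            | (limit20 & 0xF0000)      # Seg. Limit 19:16
            | (access << 8)            # P, DPL, S, Type
            | ((base >> 16) & 0xFF))   # Base 23:16
    return (high << 32) | low

# Base 0, limit FFFFFH in 4-KByte units (4 GBytes), present, DPL 0.
FLAT_CODE = pack_descriptor(0, 0xFFFFF, 0x9A, g=1, db=1)  # execute/read
FLAT_DATA = pack_descriptor(0, 0xFFFFF, 0x92, g=1, db=1)  # read/write

print(f"{FLAT_CODE:016X}")
print(f"{FLAT_DATA:016X}")
```

Both descriptors differ only in the Type field, which is what makes the two segments overlay the same 4-GByte linear address range.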
[Figure: The segment registers (CS, SS, DS, ES, FS, GS) reference code- and data-segment descriptors (access, limit, base address) that both map onto the entire linear address space (or physical memory) up to FFFFFFFFH, which holds code, a not-present region, and data and stack.]
Figure 3-2. Flat Model
3.2.2.
Protected Flat Model
The protected flat model is similar to the basic flat model, except the segment limits are set to include only the range of addresses for which physical memory actually exists (refer to Figure 3-3). A general-protection exception (#GP) is then generated on any attempt to access nonexistent memory. This model provides a minimum level of hardware protection against some kinds of program bugs.
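[Figure: The segment registers (CS, SS, DS, ES, FS, GS) reference two segment descriptors (access, limit, base address), one for code and one for data and stack, whose limits cover only the parts of the linear address space (or physical memory), 0 to FFFFFFFFH, that hold memory and memory-mapped I/O; the remaining range is marked not present.]
Figure 3-3. Protected Flat Model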
More complexity can be added to this protected flat model to provide more protection. For example, for the paging mechanism to provide isolation between user and supervisor code and data, four segments need to be defined: code and data segments at privilege level 3 for the user, and code and data segments at privilege level 0 for the supervisor. Usually these segments all overlay each other and start at address 0 in the linear address space. This flat segmentation model along with a simple paging structure can protect the operating system from applications, and by adding a separate paging structure for each task or process, it can also protect applications from each other. Similar designs are used by several popular multitasking operating systems.
3.2.3.
Multisegment Model
A multisegment model (such as the one shown in Figure 3-4) uses the full capabilities of the segmentation mechanism to provide hardware-enforced protection of code, data structures, and programs and tasks. Here, each program (or task) is given its own table of segment descriptors and its own segments. The segments can be completely private to their assigned programs or shared among programs. Access to all segments and to the execution environments of individual programs running on the system is controlled by hardware.
[Figure: The segment registers (CS, SS, DS, ES, FS, GS) each reference a separate segment descriptor (access, limit, base address); the descriptors map distinct code, stack, and data segments into the linear address space (or physical memory).]
Figure 3-4. Multisegment Model
Access checks can be used to protect not only against referencing an address outside the limit of a segment, but also against performing disallowed operations in certain segments. For example, since code segments are designated as read-only segments, hardware can be used to prevent writes into code segments. The access rights information created for segments can also be used to set up protection rings or levels. Protection levels can be used to protect operating-system procedures from unauthorized access by application programs.
3.2.4.
Paging and Segmentation
Paging can be used with any of the segmentation models described in Figures 3-2, 3-3, and 3-4. The processor's paging mechanism divides the linear address space (into which segments are mapped) into pages (as shown in Figure 3-1). These linear-address-space pages are then mapped to pages in the physical address space. The paging mechanism offers several page-level protection facilities that can be used with or instead of the segment-protection facilities. For example, it lets read-write protection be enforced on a page-by-page basis. The paging mechanism also provides two-level user-supervisor protection that can be specified on a page-by-page basis.
3.3.
PHYSICAL ADDRESS SPACE
In protected mode, the Intel Architecture provides a normal physical address space of 4 GBytes (2^32 bytes). This is the address space that the processor can address on its address bus. This address space is flat (unsegmented), with addresses ranging continuously from 0 to FFFFFFFFH. This physical address space can be mapped to read-write memory, read-only memory, and memory-mapped I/O. The memory mapping facilities described in this chapter can be used to divide this physical memory up into segments and/or pages.
Beginning with the Pentium Pro processor, the Intel Architecture also supports an extension of the physical address space to 2^36 bytes (64 GBytes), with a maximum physical address of FFFFFFFFFH. This extension is invoked with the physical address extension (PAE) flag, located in bit 5 of control register CR4. (Refer to Section 3.8., Physical Address Extension for more information about extended physical addressing.)
3.4.
LOGICAL AND LINEAR ADDRESSES
At the system-architecture level in protected mode, the processor uses two stages of address translation to arrive at a physical address: logical-address translation and linear address space paging.
Even with the minimum use of segments, every byte in the processor's address space is accessed with a logical address. A logical address consists of a 16-bit segment selector and a 32-bit offset (refer to Figure 3-5). The segment selector identifies the segment the byte is located in and the offset specifies the location of the byte in the segment relative to the base address of the segment.
The processor translates every logical address into a linear address. A linear address is a 32-bit address in the processor's linear address space. Like the physical address space, the linear address space is a flat (unsegmented), 2^32-byte address space, with addresses ranging from 0 to FFFFFFFFH. The linear address space contains all the segments and system tables defined for a system.
To translate a logical address into a linear address, the processor does the following:
1. Uses the offset in the segment selector to locate the segment descriptor for the segment in the GDT or LDT and reads it into the processor. (This step is needed only when a new segment selector is loaded into a segment register.)
2. Examines the segment descriptor to check the access rights and range of the segment to ensure that the segment is accessible and that the offset is within the limits of the segment.
3. Adds the base address of the segment from the segment descriptor to the offset to form a linear address.
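The three translation steps can be sketched in a few lines. This is a simplified model under stated assumptions, not the processor's actual algorithm: descriptor tables are Python lists of dictionaries with hypothetical "base" and "limit" keys, the limit is already expressed in bytes, only expand-up segments are handled, and access-rights checks are omitted. The selector bit layout used here is the one given in Section 3.4.1.

```python
class GeneralProtectionFault(Exception):
    """Stands in for the general-protection exception (#GP)."""

def logical_to_linear(selector, offset, gdt, ldt=None):
    # Step 1: use the selector to locate the descriptor.
    index = selector >> 3                        # bits 15-3
    table = ldt if selector & 0b100 else gdt     # TI flag, bit 2
    if table is None or index >= len(table) or table[index] is None:
        raise GeneralProtectionFault("no usable descriptor")
    descriptor = table[index]

    # Step 2: check that the offset lies within the segment limit.
    if offset > descriptor["limit"]:
        raise GeneralProtectionFault("offset beyond segment limit")

    # Step 3: add the segment base to the offset.
    return (descriptor["base"] + offset) & 0xFFFFFFFF

# Hypothetical GDT: entry 0 is the unused null entry.
gdt = [None, {"base": 0x00010000, "limit": 0xFFFF}]
print(hex(logical_to_linear(0x0008, 0x1234, gdt)))
```

Keeping the limit check separate from the addition mirrors the order of steps 2 and 3: an out-of-range offset is rejected before any linear address is formed.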
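[Figure: The 16-bit segment selector (bits 15-0) of a logical address selects a segment descriptor in a descriptor table; the base address from that descriptor is added to the 32-bit offset (bits 31-0) to produce the 32-bit linear address (bits 31-0).]
Figure 3-5. Logical Address to Linear Address Translation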
If paging is not used, the processor maps the linear address directly to a physical address (that is, the linear address goes out on the processor's address bus). If the linear address space is paged, a second level of address translation is used to translate the linear address into a physical address. Page translation is described in Section 3.6., Paging (Virtual Memory).
3.4.1.
Segment Selectors
A segment selector is a 16-bit identifier for a segment (refer to Figure 3-6). It does not point directly to the segment, but instead points to the segment descriptor that defines the segment. A segment selector contains the following items:
Index (Bits 3 through 15). Selects one of 8192 descriptors in the GDT or LDT. The processor multiplies the index value by 8 (the number of bytes in a segment descriptor) and adds the result to the base address of the GDT or LDT (from the GDTR or LDTR register, respectively).
TI (table indicator) flag (Bit 2). Specifies the descriptor table to use: clearing this flag selects the GDT; setting this flag selects the current LDT.
[Figure: Bits 15-3 hold the Index; bit 2 holds the Table Indicator (TI; 0 = GDT, 1 = LDT); bits 1-0 hold the Requested Privilege Level (RPL).]
Figure 3-6. Segment Selector
Requested Privilege Level (RPL) (Bits 0 and 1). Specifies the privilege level of the selector. The privilege level can range from 0 to 3, with 0 being the most privileged level. Refer to Section 4.5., Privilege Levels in Chapter 4, Protection for a description of the relationship of the RPL to the CPL of the executing program (or task) and the descriptor privilege level (DPL) of the descriptor the segment selector points to.
The first entry of the GDT is not used by the processor. A segment selector that points to this entry of the GDT (that is, a segment selector with an index of 0 and the TI flag set to 0) is used as a null segment selector. The processor does not generate an exception when a segment register (other than the CS or SS registers) is loaded with a null selector. It does, however, generate an exception when a segment register holding a null selector is used to access memory. A null selector can be used to initialize unused segment registers. Loading the CS or SS register with a null segment selector causes a general-protection exception (#GP) to be generated.
Segment selectors are visible to application programs as part of a pointer variable, but the values of selectors are usually assigned or modified by link editors or linking loaders, not application programs.
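The selector fields described in this section can be pulled apart with shifts and masks. The sketch below is illustrative only; decode_selector, descriptor_byte_offset, and is_null_selector are hypothetical helper names.

```python
def decode_selector(selector):
    """Return the Index, TI, and RPL fields of a 16-bit selector."""
    return {
        "index": (selector >> 3) & 0x1FFF,  # bits 15-3
        "ti": (selector >> 2) & 1,          # bit 2: 0 = GDT, 1 = LDT
        "rpl": selector & 0b11,             # bits 1-0
    }

def descriptor_byte_offset(selector):
    """Byte offset of the descriptor from the table base (index * 8)."""
    return ((selector >> 3) & 0x1FFF) * 8

def is_null_selector(selector):
    """A null selector has an index of 0 and the TI flag clear."""
    fields = decode_selector(selector)
    return fields["index"] == 0 and fields["ti"] == 0

print(decode_selector(0x002B), descriptor_byte_offset(0x002B))
```

For the selector 002BH, the index is 5, the TI flag selects the GDT, and the RPL is 3, so the descriptor sits 40 bytes past the GDT base.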
3.4.2.
Segment Registers
To reduce address translation time and coding complexity, the processor provides registers for holding up to 6 segment selectors (refer to Figure 3-7). Each of these segment registers supports a specific kind of memory reference (code, stack, or data). For virtually any kind of program execution to take place, at least the code-segment (CS), data-segment (DS), and stack-segment (SS) registers must be loaded with valid segment selectors. The processor also provides three additional data-segment registers (ES, FS, and GS), which can be used to make additional data segments available to the currently executing program (or task).
For a program to access a segment, the segment selector for the segment must have been loaded in one of the segment registers. So, although a system can define thousands of segments, only 6 can be available for immediate use. Other segments can be made available by loading their segment selectors into these registers during program execution.
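[Figure: Each of the six segment registers (CS, SS, DS, ES, FS, GS) has a visible part, which holds the segment selector, and a hidden part, which holds the base address, limit, and access information.]
Figure 3-7. Segment Registers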
Every segment register has a visible part and a hidden part. (The hidden part is sometimes referred to as a descriptor cache or a shadow register.) When a segment selector is loaded into the visible part of a segment register, the processor also loads the hidden part of the segment register with the base address, segment limit, and access control information from the segment descriptor pointed to by the segment selector. The information cached in the segment register (visible and hidden) allows the processor to translate addresses without taking extra bus cycles to read the base address and limit from the segment descriptor. In systems in which multiple processors have access to the same descriptor tables, it is the responsibility of software to reload the segment registers when the descriptor tables are modified. If this is not done, an old segment descriptor cached in a segment register might be used after its memory-resident version has been modified.
Two kinds of load instructions are provided for loading the segment registers:
1. Direct load instructions such as the MOV, POP, LDS, LES, LSS, LGS, and LFS instructions. These instructions explicitly reference the segment registers.
2. Implied load instructions such as the far pointer versions of the CALL, JMP, and RET instructions and the IRET, INT n, INTO, and INT3 instructions. These instructions change the contents of the CS register (and sometimes other segment registers) as an incidental part of their operation.
The MOV instruction can also be used to store the visible part of a segment register in a general-purpose register.
3.4.3.
Segment Descriptors
A segment descriptor is a data structure in a GDT or LDT that provides the processor with the size and location of a segment, as well as access control and status information. Segment descriptors are typically created by compilers, linkers, loaders, or the operating system or executive, but not application programs. Figure 3-8 illustrates the general descriptor format for all types of segment descriptors.
The flags and fields in a segment descriptor are as follows:
Segment limit field
Specifies the size of the segment. The processor puts together the two segment limit fields to form a 20-bit value. The processor interprets the segment limit in one of two ways, depending on the setting of the G (granularity) flag:
If the granularity flag is clear, the segment size can range from 1 byte to 1 MByte, in byte increments.
If the granularity flag is set, the segment size can range from 4 KBytes to 4 GBytes, in 4-KByte increments.
The processor uses the segment limit in two different ways, depending on whether the segment is an expand-up or an expand-down segment. Refer to Section 3.4.3.1., Code- and Data-Segment Descriptor Types for more information about segment types. For expand-up segments, the offset in a logical address can range from 0 to the segment limit. Offsets greater than the segment limit generate general-protection exceptions (#GP). For expand-down segments, the segment limit has the reverse function; the offset can range from the segment limit to FFFFFFFFH or FFFFH, depending on the setting of the B flag. Offsets less than the segment limit generate general-protection exceptions. Decreasing the value in the segment limit field for an expand-down segment allocates new memory at the bottom of the segment's address space, rather than at the top. Intel Architecture stacks always grow downwards, making this mechanism convenient for expandable stacks.
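The effect of the granularity flag on the 20-bit limit can be worked through numerically. The sketch below covers expand-up segments only and is an illustration, not the processor's implementation; effective_limit and offset_in_expand_up_segment are hypothetical names. With the flag set, the limit counts 4-KByte units and the low 12 bits of the offset are not tested, which is equivalent to shifting the limit left by 12 and filling the low bits with ones.

```python
def effective_limit(limit20, granularity):
    """Highest valid offset for an expand-up segment.

    limit20 is the 20-bit segment limit field; granularity is the G flag.
    """
    if granularity:
        # 4-KByte units; the low 12 offset bits are not tested.
        return (limit20 << 12) | 0xFFF
    return limit20  # byte units

def offset_in_expand_up_segment(offset, limit20, granularity):
    """True if the offset passes the limit check for an expand-up segment."""
    return 0 <= offset <= effective_limit(limit20, granularity)

print(hex(effective_limit(0xFFFFF, 0)))  # byte-granular maximum
print(hex(effective_limit(0xFFFFF, 1)))  # page-granular maximum
```

The two maximum values correspond to the 1-MByte and 4-GByte upper segment sizes given above.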
Second doubleword:
  Bits 31-24   Base 31:24
  Bit 23       G
  Bit 22       D/B
  Bit 21       0
  Bit 20       AVL
  Bits 19-16   Seg. Limit 19:16
  Bit 15       P
  Bits 14-13   DPL
  Bit 12       S
  Bits 11-8    Type
  Bits 7-0     Base 23:16
First doubleword:
  Bits 31-16   Base Address 15:00
  Bits 15-0    Segment Limit 15:00
AVL    Available for use by system software
BASE   Segment base address
D/B    Default operation size (0 = 16-bit segment; 1 = 32-bit segment)
DPL    Descriptor privilege level
G      Granularity
LIMIT  Segment limit
P      Segment present
S      Descriptor type (0 = system; 1 = code or data)
TYPE   Segment type
Figure 3-8. Segment Descriptor
Base address fields
Defines the location of byte 0 of the segment within the 4-GByte linear address space. The processor puts together the three base address fields to form a single 32-bit value. Segment base addresses should be aligned to 16-byte boundaries. Although 16-byte alignment is not required, this alignment allows programs to maximize performance by aligning code and data on 16-byte boundaries.
Type field
Indicates the segment or gate type and specifies the kinds of access that can be made to the segment and the direction of growth. The interpretation of this field depends on whether the descriptor type flag specifies an application (code or data) descriptor or a system descriptor. The encoding of the type field is different for code, data, and system descriptors (refer to Figure 4-1 in Chapter 4, Protection). Refer to Section 3.4.3.1., Code- and Data-Segment Descriptor Types for a description of how this field is used to specify code- and data-segment types.
S (descriptor type) flag
Specifies whether the segment descriptor is for a system segment (S flag is clear) or a code or data segment (S flag is set).
DPL (descriptor privilege level) field
Specifies the privilege level of the segment. The privilege level can range from 0 to 3, with 0 being the most privileged level. The DPL is used to control access to the segment. Refer to Section 4.5., Privilege Levels in Chapter 4, Protection for a description of the relationship of the DPL to the CPL of the executing code segment and the RPL of a segment selector.
P (segment-present) flag
Indicates whether the segment is present in memory (set) or not present (clear). If this flag is clear, the processor generates a segment-not-present exception (#NP) when a segment selector that points to the segment descriptor is loaded into a segment register. Memory management software can use this flag to control which segments are actually loaded into physical memory at a given time. It offers a control in addition to paging for managing virtual memory. Figure 3-9 shows the format of a segment descriptor when the segment-present flag is clear. When this flag is clear, the operating system or executive is free to use the locations marked Available to store its own data, such as information regarding the whereabouts of the missing segment.
D/B (default operation size/default stack pointer size and/or upper bound) flag
Performs different functions depending on whether the segment descriptor is for an executable code segment, an expand-down data segment, or a stack segment. (This flag should always be set to 1 for 32-bit code and data segments and to 0 for 16-bit code and data segments.)
Executable code segment. The flag is called the D flag and it indicates the default length for effective addresses and operands referenced by instructions in the segment. If the flag is set, 32-bit addresses and 32-bit or 8-bit operands are assumed; if it is clear, 16-bit addresses and 16-bit or 8-bit operands are assumed. The instruction prefix 66H can be used to select an operand size other than the default, and the prefix 67H can be used to select an address size other than the default.
Stack segment (data segment pointed to by the SS register). The flag is called the B (big) flag and it specifies the size of the stack pointer used for implicit stack operations (such as pushes, pops, and calls). If the flag is set, a 32-bit stack pointer is used, which is stored in the 32-bit ESP register; if the flag is clear, a 16-bit stack pointer is used, which is stored in the 16-bit SP register. If the stack segment is set up to be an expand-down data segment (described in the next paragraph), the B flag also specifies the upper bound of the stack segment.
Expand-down data segment. The flag is called the B flag and it specifies the upper bound of the segment. If the flag is set, the upper bound is FFFFFFFFH (4 GBytes); if the flag is clear, the upper bound is FFFFH (64 KBytes).
Second doubleword:
  Bits 31-16   Available
  Bit 15       P (clear)
  Bits 14-13   DPL
  Bit 12       S
  Bits 11-8    Type
  Bits 7-0     Available
First doubleword:
  Bits 31-0    Available
Figure 3-9. Segment Descriptor When Segment-Present Flag Is Clear
G (granularity) flag
Determines the scaling of the segment limit field. When the granularity flag is clear, the segment limit is interpreted in byte units; when the flag is set, the segment limit is interpreted in 4-KByte units. (This flag does not affect the granularity of the base address; it is always byte granular.) When the granularity flag is set, the twelve least significant bits of an offset are not tested when checking the offset against the segment limit. For example, when the granularity flag is set, a limit of 0 results in valid offsets from 0 to 4095.
Available and reserved bits
Bit 20 of the second doubleword of the segment descriptor is available for use by system software; bit 21 is reserved and should always be set to 0.
3.4.3.1. CODE- AND DATA-SEGMENT DESCRIPTOR TYPES
When the S (descriptor type) flag in a segment descriptor is set, the descriptor is for either a code or a data segment. The highest order bit of the type field (bit 11 of the second double word of the segment descriptor) then determines whether the descriptor is for a data segment (clear) or a code segment (set). For data segments, the three low-order bits of the type field (bits 8, 9, and 10) are interpreted as accessed (A), write-enable (W), and expansion-direction (E). Refer to Table 3-1 for a description of the encoding of the bits in the type field for code and data segments. Data segments can be read-only or read/write segments, depending on the setting of the write-enable bit.
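To make the field positions concrete, the sketch below unpacks an 8-byte descriptor, held as a 64-bit integer with the first doubleword in the low 32 bits, into the fields described in Section 3.4.3. The function unpack_descriptor and its dictionary keys are illustrative names only, and the sample value is a hypothetical descriptor chosen for the example.

```python
def unpack_descriptor(descriptor):
    """Split a 64-bit segment descriptor into its fields."""
    low = descriptor & 0xFFFFFFFF           # first doubleword
    high = (descriptor >> 32) & 0xFFFFFFFF  # second doubleword
    type_field = (high >> 8) & 0xF          # bits 11-8
    s_flag = (high >> 12) & 1               # bit 12
    return {
        # Three base fields combined into one 32-bit value.
        "base": (low >> 16) | ((high & 0xFF) << 16) | (high & 0xFF000000),
        # Two limit fields combined into one 20-bit value.
        "limit": (low & 0xFFFF) | (high & 0x000F0000),
        "type": type_field,
        "s": s_flag,
        "dpl": (high >> 13) & 0b11,         # bits 14-13
        "p": (high >> 15) & 1,              # bit 15
        "avl": (high >> 20) & 1,            # bit 20
        "db": (high >> 22) & 1,             # bit 22
        "g": (high >> 23) & 1,              # bit 23
        # With S set, bit 11 of the type field separates code from data.
        "is_code": bool(s_flag and (type_field & 0b1000)),
    }

fields = unpack_descriptor(0x124A92345678BCDE)
print(hex(fields["base"]), hex(fields["limit"]), fields["type"])
```

For this sample value the base is 12345678H, the limit field is ABCDEH, and the type field is 2, which Table 3-1 lists as a read/write data segment.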
Table 3-1. Code- and Data-Segment Types

          Type Field                 Descriptor
Decimal   11    10    9     8        Type         Description
                E     W     A
0         0     0     0     0        Data         Read-Only
1         0     0     0     1        Data         Read-Only, accessed
2         0     0     1     0        Data         Read/Write
3         0     0     1     1        Data         Read/Write, accessed
4         0     1     0     0        Data         Read-Only, expand-down
5         0     1     0     1        Data         Read-Only, expand-down, accessed
6         0     1     1     0        Data         Read/Write, expand-down
7         0     1     1     1        Data         Read/Write, expand-down, accessed
                C     R     A
8         1     0     0     0        Code         Execute-Only
9         1     0     0     1        Code         Execute-Only, accessed
10        1     0     1     0        Code         Execute/Read
11        1     0     1     1        Code         Execute/Read, accessed
12        1     1     0     0        Code         Execute-Only, conforming
13        1     1     0     1        Code         Execute-Only, conforming, accessed
14        1     1     1     0        Code         Execute/Read-Only, conforming
15        1     1     1     1        Code         Execute/Read-Only, conforming, accessed
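The encodings in Table 3-1 follow directly from the four type bits, so they can be generated rather than looked up. The sketch below is one way to do that; describe_type is a hypothetical helper, and for types 14 and 15 it prints "Execute/Read" where the table uses the label "Execute/Read-Only".

```python
def describe_type(type_field):
    """Return (descriptor type, description) for a 4-bit type field
    of a code- or data-segment descriptor (S flag set)."""
    accessed = type_field & 0b0001            # bit 8: A
    if type_field & 0b1000:                   # bit 11 set: code segment
        kind = "Code"
        readable = type_field & 0b0010        # bit 9: R
        conforming = type_field & 0b0100      # bit 10: C
        parts = ["Execute/Read" if readable else "Execute-Only"]
        if conforming:
            parts.append("conforming")
    else:                                     # bit 11 clear: data segment
        kind = "Data"
        writable = type_field & 0b0010        # bit 9: W
        expand_down = type_field & 0b0100     # bit 10: E
        parts = ["Read/Write" if writable else "Read-Only"]
        if expand_down:
            parts.append("expand-down")
    if accessed:
        parts.append("accessed")
    return kind, ", ".join(parts)

for value in range(16):
    print(value, *describe_type(value))
```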
Stack segments are data segments which must be read/write segments. Loading the SS register with a segment selector for a nonwritable data segment generates a general-protection exception (#GP). If the size of a stack segment needs to be changed dynamically, the stack segment can be an expand-down data segment (expansion-direction flag set). Here, dynamically changing the segment limit causes stack space to be added to the bottom of the stack. If the size of a stack segment is intended to remain static, the stack segment may be either an expand-up or expand-down type.
The accessed bit indicates whether the segment has been accessed since the last time the operating system or executive cleared the bit. The processor sets this bit whenever it loads a segment selector for the segment into a segment register. The bit remains set until explicitly cleared. This bit can be used both for virtual memory management and for debugging.
For code segments, the three low-order bits of the type field are interpreted as accessed (A), read enable (R), and conforming (C). Code segments can be execute-only or execute/read, depending on the setting of the read-enable bit. An execute/read segment might be used when constants or other static data have been placed with instruction code in a ROM. Here, data can be read from the code segment either by using an instruction with a CS override prefix or by loading a segment selector for the code segment in a data-segment register (the DS, ES, FS, or GS registers). In protected mode, code segments are not writable.
Code segments can be either conforming or nonconforming. A transfer of execution into a more-privileged conforming segment allows execution to continue at the current privilege level. A transfer into a nonconforming segment at a different privilege level results in a general-protection exception (#GP), unless a call gate or task gate is used (refer to Section 4.8.1., Direct Calls or Jumps to Code Segments in Chapter 4, Protection for more information on conforming and nonconforming code segments). System utilities that do not access protected facilities and handlers for some types of exceptions (such as divide error or overflow) may be loaded in conforming code segments. Utilities that need to be protected from less privileged programs and procedures should be placed in nonconforming code segments.
NOTE
Execution cannot be transferred by a call or a jump to a less-privileged (numerically higher privilege level) code segment, regardless of whether the target segment is a conforming or nonconforming code segment. Attempting such an execution transfer will result in a general-protection exception.
All data segments are nonconforming, meaning that they cannot be accessed by less privileged programs or procedures (code executing at numerically higher privilege levels). Unlike code segments, however, data segments can be accessed by more privileged programs or procedures (code executing at numerically lower privilege levels) without using a special access gate.
The processor may update the Type field when a segment is accessed, even if the access is a read cycle. If the descriptor tables have been put in ROM, it may be necessary for hardware to prevent the ROM from being enabled onto the data bus during a write cycle. It also may be necessary to return the READY# signal to the processor when a write cycle to ROM occurs; otherwise the cycle will not terminate. These features of the hardware design are necessary for using ROM-based descriptor tables with the Intel386 DX processor, which always sets the Accessed bit when a segment descriptor is loaded. The P6 family, Pentium, and Intel486 processors, however, only set the accessed bit if it is not already set. Writes to descriptor tables in ROM can be avoided by setting the accessed bits in every descriptor.
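The code- and data-segment type encodings described above can be illustrated with a short Python sketch. It is a model for study only, not processor or operating-system code; the function names decode_segment_type and usable_as_stack are hypothetical and do not come from the manual.

```python
# Sketch: decode the 4-bit type field of a code- or data-segment descriptor.
# Bit 3 of the value corresponds to descriptor bit 11, bit 0 to descriptor bit 8.
# Function names are illustrative assumptions.

def decode_segment_type(type_field: int) -> str:
    """Describe a code/data segment type value in the range 0..15."""
    if not 0 <= type_field <= 15:
        raise ValueError("the type field is 4 bits wide")
    accessed = bool(type_field & 0b0001)            # A flag (bit 8)
    if type_field & 0b1000:                         # bit 11 set: code segment
        kind = "Code"
        readable = bool(type_field & 0b0010)        # R flag (bit 9)
        conforming = bool(type_field & 0b0100)      # C flag (bit 10)
        parts = ["Execute/Read" if readable else "Execute-Only"]
        if conforming:
            parts.append("conforming")
    else:                                           # bit 11 clear: data segment
        kind = "Data"
        writable = bool(type_field & 0b0010)        # W flag (bit 9)
        expand_down = bool(type_field & 0b0100)     # E flag (bit 10)
        parts = ["Read/Write" if writable else "Read-Only"]
        if expand_down:
            parts.append("expand-down")
    if accessed:
        parts.append("accessed")
    return f"{kind}: {', '.join(parts)}"


def usable_as_stack(type_field: int) -> bool:
    """A stack segment must be a writable data segment."""
    is_data = not (type_field & 0b1000)
    is_writable = bool(type_field & 0b0010)
    return is_data and is_writable
```

For example, decode_segment_type(6) yields "Data: Read/Write, expand-down", and usable_as_stack(0) is False because a read-only data segment cannot be loaded into SS without a general-protection exception.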
3.5.
SYSTEM DESCRIPTOR TYPES
When the S (descriptor type) flag in a segment descriptor is clear, the descriptor type is a system descriptor. The processor recognizes the following types of system descriptors:
• Local descriptor-table (LDT) segment descriptor.
• Task-state segment (TSS) descriptor.
• Call-gate descriptor.
• Interrupt-gate descriptor.
• Trap-gate descriptor.
• Task-gate descriptor.
These descriptor types fall into two categories: system-segment descriptors and gate descriptors. System-segment descriptors point to system segments (LDT and TSS segments). Gate descriptors are in themselves gates, which hold pointers to procedure entry points in code segments (call, interrupt, and trap gates) or which hold segment selectors for TSSs (task gates). Table 3-2 shows the encoding of the type field for system-segment descriptors and gate descriptors.
Table 3-2. System-Segment and Gate-Descriptor Types

          Type Field
Decimal   11   10   9    8    Description
0         0    0    0    0    Reserved
1         0    0    0    1    16-Bit TSS (Available)
2         0    0    1    0    LDT
3         0    0    1    1    16-Bit TSS (Busy)
4         0    1    0    0    16-Bit Call Gate
5         0    1    0    1    Task Gate
6         0    1    1    0    16-Bit Interrupt Gate
7         0    1    1    1    16-Bit Trap Gate
8         1    0    0    0    Reserved
9         1    0    0    1    32-Bit TSS (Available)
10        1    0    1    0    Reserved
11        1    0    1    1    32-Bit TSS (Busy)
12        1    1    0    0    32-Bit Call Gate
13        1    1    0    1    Reserved
14        1    1    1    0    32-Bit Interrupt Gate
15        1    1    1    1    32-Bit Trap Gate
For more information on the system-segment descriptors, refer to Section 3.5.1., Segment Descriptor Tables, and Section 6.2.2., TSS Descriptor in Chapter 6, Task Management. For more information on the gate descriptors, refer to Section 4.8.2., Gate Descriptors in Chapter 4, Protection; Section 5.9., IDT Descriptors in Chapter 5, Interrupt and Exception Handling; and Section 6.2.4., Task-Gate Descriptor in Chapter 6, Task Management.
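As a rough illustration of Table 3-2, the following Python sketch maps a system-descriptor type value to its name and to one of the two categories named above. The names SYSTEM_DESCRIPTOR_TYPES and classify_system_descriptor are hypothetical, chosen only for this example.

```python
# Sketch: look up a system-descriptor type (S flag clear) from Table 3-2.
# The list index is the decimal type value. Names are illustrative assumptions.

SYSTEM_DESCRIPTOR_TYPES = [
    "Reserved",                 # 0
    "16-Bit TSS (Available)",   # 1
    "LDT",                      # 2
    "16-Bit TSS (Busy)",        # 3
    "16-Bit Call Gate",         # 4
    "Task Gate",                # 5
    "16-Bit Interrupt Gate",    # 6
    "16-Bit Trap Gate",         # 7
    "Reserved",                 # 8
    "32-Bit TSS (Available)",   # 9
    "Reserved",                 # 10
    "32-Bit TSS (Busy)",        # 11
    "32-Bit Call Gate",         # 12
    "Reserved",                 # 13
    "32-Bit Interrupt Gate",    # 14
    "32-Bit Trap Gate",         # 15
]


def classify_system_descriptor(type_field: int) -> tuple[str, str]:
    """Return (name, category) for a 4-bit system-descriptor type value."""
    if not 0 <= type_field <= 15:
        raise ValueError("the type field is 4 bits wide")
    name = SYSTEM_DESCRIPTOR_TYPES[type_field]
    if name == "Reserved":
        category = "reserved"
    elif "Gate" in name:
        category = "gate"               # call, interrupt, trap, and task gates
    else:
        category = "system-segment"     # LDT and TSS descriptors
    return name, category
```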
3.5.1.
Segment Descriptor Tables
A segment descriptor table is an array of segment descriptors (refer to Figure 3-10). A descriptor table is variable in length and can contain up to 8192 (2^13) 8-byte descriptors. There are two kinds of descriptor tables:
• The global descriptor table (GDT)
• The local descriptor tables (LDT)
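[Figure 3-10 diagram: the TI flag of a segment selector selects either the global descriptor table (TI = 0) or a local descriptor table (TI = 1). Each table is an array of 8-byte descriptors at byte offsets 0, 8, 16, 24, 32, 40, 48, 56, and so on. The first descriptor in the GDT is not used. The GDTR register holds the limit and base address of the GDT; the LDTR register holds the segment selector, limit, and base address of the LDT.]
Figure 3-10. Global and Local Descriptor Tables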
Each system must have one GDT defined, which may be used for all programs and tasks in the system. Optionally, one or more LDTs can be defined. For example, an LDT can be defined for each separate task being run, or some or all tasks can share the same LDT.
The GDT is not a segment itself; instead, it is a data structure in the linear address space. The base linear address and limit of the GDT must be loaded into the GDTR register (refer to Section 2.4., Memory-Management Registers in Chapter 2, System Architecture Overview). The base address of the GDT should be aligned on an eight-byte boundary to yield the best processor performance. The limit value for the GDT is expressed in bytes. As with segments, the limit value is added to the base address to get the address of the last valid byte. A limit value of 0 results in exactly one valid byte. Because segment descriptors are always 8 bytes long, the GDT limit should always be one less than an integral multiple of eight (that is, 8N - 1).
The first descriptor in the GDT is not used by the processor. A segment selector to this null descriptor does not generate an exception when loaded into a data-segment register (DS, ES, FS, or GS), but it always generates a general-protection exception (#GP) when an attempt is made to access memory using the descriptor. By initializing the segment registers with this segment selector, accidental reference to unused segment registers can be guaranteed to generate an exception.
The LDT is located in a system segment of the LDT type. The GDT must contain a segment descriptor for the LDT segment. If the system supports multiple LDTs, each must have a separate segment selector and segment descriptor in the GDT. The segment descriptor for an LDT can be located anywhere in the GDT. Refer to Section 3.5., System Descriptor Types for information on the LDT segment-descriptor type.
An LDT is accessed with its segment selector. To eliminate address translations when accessing the LDT, the segment selector, base linear address, limit, and access rights of the LDT are stored in the LDTR register (refer to Section 2.4., Memory-Management Registers in Chapter 2, System Architecture Overview).
When the GDTR register is stored (using the SGDT instruction), a 48-bit pseudo-descriptor is stored in memory (refer to Figure 3-11). To avoid alignment check faults in user mode (privilege level 3), the pseudo-descriptor should be located at an odd word address (that is, address MOD 4 is equal to 2). This causes the processor to store an aligned word, followed by an aligned doubleword. User-mode programs normally do not store pseudo-descriptors, but the possibility of generating an alignment check fault can be avoided by aligning pseudo-descriptors in this way. The same alignment should be used when storing the IDTR register using the SIDT instruction. When storing the LDTR or task register (using the SLDT or STR instruction, respectively), the pseudo-descriptor should be located at a doubleword address (that is, address MOD 4 is equal to 0).
Bits 47-16: Base Address     Bits 15-0: Limit
Figure 3-11. Pseudo-Descriptor Format
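The limit rule (8N - 1), the pseudo-descriptor layout, and the suggested alignment can be checked with a small Python model. This is a sketch under stated assumptions: the helper names are hypothetical, and the byte string only mimics the 6-byte little-endian image that SGDT would store, it does not execute the instruction.

```python
# Sketch: GDT limit arithmetic and the 48-bit pseudo-descriptor image.
# All function names are illustrative assumptions.
import struct

DESCRIPTOR_SIZE = 8          # every segment descriptor is 8 bytes long
MAX_DESCRIPTORS = 8192       # 2^13 descriptors per table


def gdt_limit(num_descriptors: int) -> int:
    """Limit (offset of the last valid byte) for a table of N descriptors: 8N - 1."""
    if not 1 <= num_descriptors <= MAX_DESCRIPTORS:
        raise ValueError("a descriptor table holds 1 to 8192 descriptors")
    return DESCRIPTOR_SIZE * num_descriptors - 1


def descriptor_in_table(index: int, limit: int) -> bool:
    """True if all 8 bytes of descriptor `index` lie at or below the limit."""
    return index * DESCRIPTOR_SIZE + (DESCRIPTOR_SIZE - 1) <= limit


def pack_pseudo_descriptor(base: int, limit: int) -> bytes:
    """16-bit limit (bits 0-15) followed by 32-bit base (bits 16-47), little-endian."""
    return struct.pack("<HI", limit, base)


def avoids_alignment_fault(address: int) -> bool:
    """Suggested placement for SGDT/SIDT images: address MOD 4 equal to 2."""
    return address % 4 == 2
```

With three descriptors the limit is 23, so descriptor index 2 is the last one inside the table and index 3 falls outside it.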
3.6.
PAGING (VIRTUAL MEMORY)
When operating in protected mode, the Intel Architecture permits the linear address space to be mapped directly into a large physical memory (for example, 4 GBytes of RAM) or indirectly (using paging) into a smaller physical memory and disk storage. This latter method of mapping the linear address space is commonly referred to as virtual memory or demand-paged virtual memory. When paging is used, the processor divides the linear address space into fixed-size pages (generally 4 KBytes in length) that can be mapped into physical memory and/or disk storage. When a program (or task) references a logical address in memory, the processor translates the address into a linear address and then uses its paging mechanism to translate the linear address into a corresponding physical address. If the page containing the linear address is not currently in physical memory, the processor generates a page-fault exception (#PF). The exception handler for the page-fault exception typically directs the operating system or executive to load the page from disk storage into physical memory (perhaps writing a different page from physical memory
out to disk in the process). When the page has been loaded in physical memory, a return from the exception handler causes the instruction that generated the exception to be restarted. The information that the processor uses to map linear addresses into the physical address space and to generate page-fault exceptions (when necessary) is contained in page directories and page tables stored in memory. Paging is different from segmentation through its use of fixed-size pages. Unlike segments, which usually are the same size as the code or data structures they hold, pages have a fixed size. If segmentation is the only form of address translation used, a data structure present in physical memory will have all of its parts in memory. If paging is used, a data structure can be partly in memory and partly in disk storage. To minimize the number of bus cycles required for address translation, the most recently accessed page-directory and page-table entries are cached in the processor in devices called translation lookaside buffers (TLBs). The TLBs satisfy most requests for reading the current page directory and page tables without requiring a bus cycle. Extra bus cycles occur only when the TLBs do not contain a page-table entry, which typically happens when a page has not been accessed for a long time. Refer to Section 3.7., Translation Lookaside Buffers (TLBs) for more information on the TLBs.
3.6.1.
Paging Options
Paging is controlled by three flags in the processor's control registers:
• PG (paging) flag, bit 31 of CR0 (available in all Intel Architecture processors beginning with the Intel386 processor).
• PSE (page size extensions) flag, bit 4 of CR4 (introduced in the Pentium and Pentium Pro processors).
• PAE (physical address extension) flag, bit 5 of CR4 (introduced in the Pentium Pro processors).
The PG flag enables the page-translation mechanism. The operating system or executive usually sets this flag during processor initialization. The PG flag must be set if the processor's page-translation mechanism is to be used to implement a demand-paged virtual memory system or if the operating system is designed to run more than one program (or task) in virtual-8086 mode.
The PSE flag enables large page sizes: 4-MByte pages or 2-MByte pages (when the PAE flag is set). When the PSE flag is clear, the more common page length of 4 KBytes is used. Refer to Section 3.6.2.2., Linear Address Translation (4-MByte Pages) and Section 3.8.2., Linear Address Translation With Extended Addressing Enabled (2-MByte or 4-MByte Pages) for more information about the use of the PSE flag.
The PAE flag enables 36-bit physical addresses. This physical address extension can only be used when paging is enabled. It relies on page directories and page tables to reference physical addresses above FFFFFFFFH. Refer to Section 3.8., Physical Address Extension for more information about the physical address extension.
3.6.2.
Page Tables and Directories
The information that the processor uses to translate linear addresses into physical addresses (when paging is enabled) is contained in four data structures:
• Page directory: An array of 32-bit page-directory entries (PDEs) contained in a 4-KByte page. Up to 1024 page-directory entries can be held in a page directory.
• Page table: An array of 32-bit page-table entries (PTEs) contained in a 4-KByte page. Up to 1024 page-table entries can be held in a page table. (Page tables are not used for 2-MByte or 4-MByte pages. These page sizes are mapped directly from one or more page-directory entries.)
• Page: A 4-KByte, 2-MByte, or 4-MByte flat address space.
• Page-Directory-Pointer Table: An array of four 64-bit entries, each of which points to a page directory. This data structure is only used when the physical address extension is enabled (refer to Section 3.8., Physical Address Extension).
These tables provide access to either 4-KByte or 4-MByte pages when normal 32-bit physical addressing is being used and to either 4-KByte, 2-MByte, or 4-MByte pages when extended (36-bit) physical addressing is being used. Table 3-3 shows the page size and physical address size obtained from various settings of the paging control flags. Each page-directory entry contains a PS (page size) flag that specifies whether the entry points to a page table whose entries in turn point to 4-KByte pages (PS set to 0) or whether the page-directory entry points directly to a 4-MByte or 2-MByte page (PSE or PAE set to 1 and PS set to 1).
Table 3-3. Page Sizes and Physical Address Sizes
PG Flag,   PAE Flag,   PSE Flag,   PS Flag,   Page Size   Physical Address Size
CR0        CR4         CR4         PDE
0          X           X           X          -           Paging Disabled
1          0           0           X          4 KBytes    32 Bits
1          0           1           0          4 KBytes    32 Bits
1          0           1           1          4 MBytes    32 Bits
1          1           X           0          4 KBytes    36 Bits
1          1           X           1          2 MBytes    36 Bits
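The rows of Table 3-3 can be expressed as a small decision function. The following Python sketch is illustrative only; the name paging_mode and its return convention are assumptions made for this example.

```python
# Sketch: page size and physical address width selected by the paging flags,
# following the rows of Table 3-3. The function name is an illustrative assumption.

KBYTE = 1024
MBYTE = 1024 * 1024


def paging_mode(pg: int, pae: int, pse: int, ps: int):
    """Return (page_size_in_bytes, physical_address_bits), or None if paging is disabled."""
    if not pg:
        return None                                  # PG clear: paging disabled
    if pae:
        # PAE set: PSE is a don't-care; PS alone selects the 2-MByte page.
        return (2 * MBYTE if ps else 4 * KBYTE, 36)
    if pse and ps:
        return (4 * MBYTE, 32)                       # large page, 32-bit addressing
    return (4 * KBYTE, 32)                           # PS is ignored when PSE is clear
```

For instance, paging_mode(1, 0, 0, 1) still reports a 4-KByte page, because the PS flag has no effect while the PSE flag in CR4 is clear.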
3.6.2.1.
LINEAR ADDRESS TRANSLATION (4-KBYTE PAGES)
Figure 3-12 shows the page directory and page-table hierarchy when mapping linear addresses to 4-KByte pages. The entries in the page directory point to page tables, and the entries in a page table point to pages in physical memory. This paging method can be used to address up to 2^20 pages, which spans a linear address space of 2^32 bytes (4 GBytes).
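[Figure 3-12 diagram: the 32-bit linear address is split into a Directory field (bits 31-22, 10 bits), a Table field (bits 21-12, 10 bits), and an Offset field (bits 11-0, 12 bits). CR3 (PDBR) locates the page directory; the selected directory entry locates a page table; the selected page-table entry locates a 4-KByte page, and the offset selects the physical address within that page. CR3, the directory entry, and the page-table entry each supply 32 bits aligned onto a 4-KByte boundary. 1024 PDE x 1024 PTE = 2^20 pages.]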
Figure 3-12. Linear Address Translation (4-KByte Pages)
To select the various table entries, the linear address is divided into three sections:
• Page-directory entry: Bits 22 through 31 provide an offset to an entry in the page directory. The selected entry provides the base physical address of a page table.
• Page-table entry: Bits 12 through 21 of the linear address provide an offset to an entry in the selected page table. This entry provides the base physical address of a page in physical memory.
• Page offset: Bits 0 through 11 provide an offset to a physical address in the page.
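The three-way split of the linear address and the two-level lookup can be modeled in a few lines of Python. This is a simplified sketch, not the processor's algorithm in full: physical memory is assumed to be a dictionary mapping doubleword addresses to 32-bit entries, protection checks are omitted, and the names split_linear_4k, translate_4k, and PageFault are hypothetical.

```python
# Sketch: 4-KByte page translation with a dictionary standing in for physical
# memory (address -> 32-bit entry). Names and the memory model are assumptions.

class PageFault(Exception):
    """Raised when a directory or table entry has its present flag clear."""


def split_linear_4k(linear: int) -> tuple[int, int, int]:
    """Return (directory index, table index, page offset) for a 32-bit linear address."""
    directory = (linear >> 22) & 0x3FF   # bits 31-22
    table = (linear >> 12) & 0x3FF       # bits 21-12
    offset = linear & 0xFFF              # bits 11-0
    return directory, table, offset


def translate_4k(linear: int, cr3: int, memory: dict) -> int:
    """Walk the page directory and page table; return the physical address."""
    directory, table, offset = split_linear_4k(linear)
    pde = memory.get((cr3 & 0xFFFFF000) + 4 * directory, 0)   # 4-byte entries
    if not pde & 0x1:                                         # P flag, bit 0
        raise PageFault(hex(linear))
    pte = memory.get((pde & 0xFFFFF000) + 4 * table, 0)
    if not pte & 0x1:
        raise PageFault(hex(linear))
    return (pte & 0xFFFFF000) | offset
```

As a usage example, with a page directory at 1000H whose entry 1 points to a page table at 2000H, and table entry 2 pointing to a page at 5000H, the linear address 00402034H translates to 5034H.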
Memory management software has the option of using one page directory for all programs and tasks, one page directory for each task, or some combination of the two.
3.6.2.2.
LINEAR ADDRESS TRANSLATION (4-MBYTE PAGES)
Figure 3-13 shows how a page directory can be used to map linear addresses to 4-MByte pages. The entries in the page directory point to 4-MByte pages in physical memory. This paging method can be used to map up to 1024 pages into a 4-GByte linear address space.
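[Figure 3-13 diagram: the 32-bit linear address is split into a Directory field (bits 31-22, 10 bits) and an Offset field (bits 21-0, 22 bits). CR3 (PDBR) locates the page directory; the selected directory entry locates a 4-MByte page, and the offset selects the physical address within that page. CR3 supplies 32 bits aligned onto a 4-KByte boundary. 1024 PDE = 1024 pages.]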
Figure 3-13. Linear Address Translation (4-MByte Pages)
The 4-MByte page size is selected by setting the PSE flag in control register CR4 and setting the page size (PS) flag in a page-directory entry (refer to Figure 3-15). With these flags set, the linear address is divided into two sections:
• Page-directory entry: Bits 22 through 31 provide an offset to an entry in the page directory. The selected entry provides the base physical address of a 4-MByte page.
• Page offset: Bits 0 through 21 provide an offset to a physical address in the page.
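The two-section split used for 4-MByte pages can be sketched in Python as follows. The function names are hypothetical, and the model assumes the PSE flag in CR4 is already set; it only checks the P and PS flags of the directory entry it is given.

```python
# Sketch: 4-MByte page translation from a single page-directory entry.
# Assumes CR4.PSE is set; function names are illustrative assumptions.

def directory_index(linear: int) -> int:
    """Bits 31-22 of the linear address select the page-directory entry."""
    return (linear >> 22) & 0x3FF


def translate_4m(linear: int, pde: int) -> int:
    """Combine the 10-bit page base from the PDE with the 22-bit page offset."""
    if not pde & 0x001:                  # P flag, bit 0
        raise ValueError("page-directory entry is not present")
    if not pde & 0x080:                  # PS flag, bit 7
        raise ValueError("PS flag is clear: entry points to a page table")
    page_base = pde & 0xFFC00000         # bits 31-22 of the entry
    offset = linear & 0x003FFFFF         # bits 21-0 of the linear address
    return page_base | offset
```

For example, a directory entry of 00800081H (page base 00800000H, PS and P set) maps the linear address 00C12345H to the physical address 00812345H.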
NOTE
(For the Pentium processor only.) When enabling or disabling large page sizes, the TLBs must be invalidated (flushed) after the PSE flag in control register CR4 has been set or cleared. Otherwise, incorrect page translation might occur due to the processor using outdated page translation information stored in the TLBs. Refer to Section 9.10., Invalidating the Translation Lookaside Buffers (TLBs), in Chapter 9, Memory Cache Control, for information on how to invalidate the TLBs.
3.6.2.3.
MIXING 4-KBYTE AND 4-MBYTE PAGES
When the PSE flag in CR4 is set, both 4-MByte pages and page tables for 4-KByte pages can be accessed from the same page directory. If the PSE flag is clear, only page tables for 4-KByte pages can be accessed (regardless of the setting of the PS flag in a page-directory entry). A typical example of mixing 4-KByte and 4-MByte pages is to place the operating system or executive's kernel in a large page to reduce TLB misses and thus improve overall system performance. The processor maintains 4-MByte page entries and 4-KByte page entries in separate TLBs. Placing often-used code such as the kernel in a large page therefore frees up 4-KByte-page TLB entries for application programs and tasks.
3.6.3.
Base Address of the Page Directory
The physical address of the current page directory is stored in the CR3 register (also called the page directory base register or PDBR). (Refer to Figure 2-5 and Section 2.5., Control Registers in Chapter 2, System Architecture Overview for more information on the PDBR.) If paging is to be used, the PDBR must be loaded as part of the processor initialization process (prior to enabling paging). The PDBR can then be changed either explicitly by loading a new value in CR3 with a MOV instruction or implicitly as part of a task switch. (Refer to Section 6.2.1., Task-State Segment (TSS) in Chapter 6, Task Management for a description of how the contents of the CR3 register are set for a task.)
There is no present flag in the PDBR for the page directory. The page directory may be not-present (paged out of physical memory) while its associated task is suspended, but the operating system must ensure that the page directory indicated by the PDBR image in a task's TSS is present in physical memory before the task is dispatched. The page directory must also remain in memory as long as the task is active.
3.6.4.
Page-Directory and Page-Table Entries
Figure 3-14 shows the format for the page-directory and page-table entries when 4-KByte pages and 32-bit physical addresses are being used. Figure 3-15 shows the format for the page-directory entries when 4-MByte pages and 32-bit physical addresses are being used. Refer to Section 3.8., Physical Address Extension for the format of page-directory and page-table entries when the physical address extension is being used.
Page-Directory Entry (4-KByte Page Table)
Bits 31-12   Page-table base address
Bits 11-9    Avail.: available for system programmer's use
Bit 8        G: global page (ignored)
Bit 7        PS: page size (0 indicates 4 KBytes)
Bit 6        0: reserved (set to 0)
Bit 5        A: accessed
Bit 4        PCD: cache disabled
Bit 3        PWT: write-through
Bit 2        U/S: user/supervisor
Bit 1        R/W: read/write
Bit 0        P: present
Page-Table Entry (4-KByte Page)
Bits 31-12   Page base address
Bits 11-9    Avail.: available for system programmer's use
Bit 8        G: global page
Bit 7        0: reserved (set to 0)
Bit 6        D: dirty
Bit 5        A: accessed
Bit 4        PCD: cache disabled
Bit 3        PWT: write-through
Bit 2        U/S: user/supervisor
Bit 1        R/W: read/write
Bit 0        P: present
Figure 3-14. Format of Page-Directory and Page-Table Entries for 4-KByte Pages and 32-Bit Physical Addresses
Page-Directory Entry (4-MByte Page)
Bits 31-22   Page base address
Bits 21-12   Reserved
Bits 11-9    Avail.: available for system programmer's use
Bit 8        G: global page
Bit 7        PS: page size (1 indicates 4 MBytes)
Bit 6        D: dirty
Bit 5        A: accessed
Bit 4        PCD: cache disabled
Bit 3        PWT: write-through
Bit 2        U/S: user/supervisor
Bit 1        R/W: read/write
Bit 0        P: present
Figure 3-15. Format of Page-Directory Entries for 4-MByte Pages and 32-Bit Addresses
The functions of the flags and fields in the entries in Figures 3-14 and 3-15 are as follows:
Page base address, bits 12 through 31
(Page-table entries for 4-KByte pages.) Specifies the physical address of the first byte of a 4-KByte page. The bits in this field are interpreted as the 20 most-significant bits of the physical address, which forces pages to be aligned on 4-KByte boundaries.
(Page-directory entries for 4-KByte page tables.) Specifies the physical address of the first byte of a page table. The bits in this field are interpreted as the 20 most-significant bits of the physical address, which forces page tables to be aligned on 4-KByte boundaries.
(Page-directory entries for 4-MByte pages.) Specifies the physical address of the first byte of a 4-MByte page. Only bits 22 through 31 of this field are used (and bits 12 through 21 are reserved and must be set to 0, for Intel Architecture processors through the Pentium II processor). The base address bits are interpreted as the 10 most-significant bits of the physical address, which forces 4-MByte pages to be aligned on 4-MByte boundaries.
Present (P) flag, bit 0
Indicates whether the page or page table being pointed to by the entry is currently loaded in physical memory. When the flag is set, the page is in physical memory and address translation is carried out. When the flag is clear, the page is not in memory and, if the processor attempts to access the page, it generates a page-fault exception (#PF). The processor does not set or clear this flag; it is up to the operating system or executive to maintain the state of the flag. The bit must be set to 1 whenever extended physical addressing mode is enabled. If the processor generates a page-fault exception, the operating system must carry out the following operations in the order below:
1. Copy the page from disk storage into physical memory, if needed.
2. Load the page address into the page-table or page-directory entry and set its present flag. Other bits, such as the dirty and accessed flags, may also be set at this time.
3. Invalidate the current page-table entry in the TLB (refer to Section 3.7., Translation Lookaside Buffers (TLBs) for a discussion of TLBs and how to invalidate them).
4. Return from the page-fault handler to restart the interrupted program or task.
Read/write (R/W) flag, bit 1
Specifies the read-write privileges for a page or group of pages (in the case of a page-directory entry that points to a page table). When this flag is clear, the page is read only; when the flag is set, the page can be read and written into. This flag interacts with the U/S flag and the WP flag in register CR0. Refer to Section 4.11., Page-Level Protection and Table 4-2 in Chapter 4, Protection for a detailed discussion of the use of these flags.
User/supervisor (U/S) flag, bit 2
Specifies the user-supervisor privileges for a page or group of pages (in the case of a page-directory entry that points to a page table). When this flag is clear, the page is assigned the supervisor privilege level; when the flag is set, the page is assigned the user privilege level. This flag interacts with the R/W flag and the WP flag in register CR0. Refer to Section 4.11., Page-Level Protection and Table 4-2 in Chapter 4, Protection for a detailed discussion of the use of these flags.
Page-level write-through (PWT) flag, bit 3
Controls the write-through or write-back caching policy of individual pages or page tables. When the PWT flag is set, write-through caching is enabled for the associated page or page table; when the flag is clear, write-back caching is enabled for the associated page or page table. The processor ignores this flag if the CD (cache disable) flag in CR0 is set. Refer to Section 9.5., Cache Control, in Chapter 9, Memory Cache Control, for more information about the use of this flag. Refer to Section 2.5., Control Registers in Chapter 2, System Architecture Overview for a description of a companion PWT flag in control register CR3.
Page-level cache disable (PCD) flag, bit 4
Controls the caching of individual pages or page tables. When the PCD flag is set, caching of the associated page or page table is prevented; when the flag is clear, the page or page table can be cached. This flag permits caching to be disabled for pages that contain memory-mapped I/O ports or that do not provide a performance benefit when cached. The processor ignores this flag (assumes it is set) if the CD (cache disable) flag in CR0 is set. Refer to Chapter 9, Memory Cache Control, for more information about the use of this flag. Refer to Section 2.5. in Chapter 2, System Architecture Overview for a description of a companion PCD flag in control register CR3.
Accessed (A) flag, bit 5
Indicates whether a page or page table has been accessed (read from or written to) when set. Memory management software typically clears this flag when a page or page table is initially loaded into physical memory. The processor then sets this flag the first time a page or page table is accessed. This flag is a sticky flag, meaning that once set, the processor does not implicitly clear it. Only software can clear this flag. The accessed and dirty flags are provided for use by memory management software to manage the transfer of pages and page tables into and out of physical memory.
Dirty (D) flag, bit 6
Indicates whether a page has been written to when set. (This flag is not used in page-directory entries that point to page tables.) Memory management software typically clears this flag when a page is initially loaded into physical memory. The processor then sets this flag the first time a page is accessed for a write operation. This flag is sticky, meaning that once set, the processor does not implicitly clear it. Only software can clear this flag. The dirty and accessed flags are provided for use by memory management software to manage the transfer of pages and page tables into and out of physical memory.
Page size (PS) flag, bit 7
Determines the page size. This flag is only used in page-directory entries. When this flag is clear, the page size is 4 KBytes and the page-directory entry points to a page table. When the flag is set, the page size is 4 MBytes for normal 32-bit addressing (and 2 MBytes if extended physical addressing is enabled) and the page-directory entry points to a page. If the page-directory entry points to a page table, all the pages associated with that page table will be 4-KByte pages.
Global (G) flag, bit 8
(Introduced in the Pentium Pro processor.) Indicates a global page when set. When a page is marked global and the page global enable (PGE) flag in register CR4 is set, the page-table or page-directory entry for the page is not invalidated in the TLB when register CR3 is loaded or a task switch occurs. This flag is provided to prevent frequently used pages (such as pages that contain kernel or other operating system or executive code) from being flushed from the TLB. Only software can set or clear this flag. For page-directory entries that point to page tables, this flag is ignored and the global characteristics of a page are set in the page-table entries. Refer to Section 3.7., Translation Lookaside Buffers (TLBs) for more information about the use of this flag. (This bit is reserved in Pentium and earlier Intel Architecture processors.)
Reserved and available-to-software bits
In a page-table entry, bit 7 is reserved and should be set to 0; in a page-directory entry that points to a page table, bit 6 is reserved and should be set to 0. For a page-directory entry for a 4-MByte page, bits 12 through 21 are reserved and must be set to 0, for Intel Architecture processors through the Pentium II processor. For both types of entries, bits 9, 10, and 11 are available for use by software. (When the present bit is clear, bits 1 through 31 are available to software; refer to Figure 3-16.) When the PSE and PAE flags in control register CR4 are set, the processor generates a page fault if reserved bits are not set to 0.
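The flag positions listed above can be summarized in a short Python decoder. It is a sketch for the 32-bit (non-PAE) entry formats only; the names FLAG_BITS and decode_entry are hypothetical. Note that the meaning of bits 6 and 7 depends on the entry type, which this simple model does not distinguish.

```python
# Sketch: split a 32-bit page-directory or page-table entry into its fields.
# Bit 7 is PS only in page-directory entries (reserved in page-table entries);
# bit 6 is D only in entries that map a page. Names are illustrative assumptions.

FLAG_BITS = {
    "P": 0,     # present
    "R/W": 1,   # read/write
    "U/S": 2,   # user/supervisor
    "PWT": 3,   # page-level write-through
    "PCD": 4,   # page-level cache disable
    "A": 5,     # accessed
    "D": 6,     # dirty
    "PS": 7,    # page size
    "G": 8,     # global
}


def decode_entry(entry: int) -> dict:
    """Return the base address, software-available bits, and flags of an entry."""
    flags = {name: bool((entry >> bit) & 1) for name, bit in FLAG_BITS.items()}
    return {
        "base": entry & 0xFFFFF000,      # bits 31-12 (4-KByte page or page table)
        "avail": (entry >> 9) & 0x7,     # bits 11-9, available to software
        "flags": flags,
    }
```

For example, the entry 00345067H decodes to base 00345000H with the P, R/W, U/S, A, and D flags set.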
3.6.5.
Not Present Page-Directory and Page-Table Entries
When the present flag is clear for a page-table or page-directory entry, the operating system or executive may use the rest of the entry for storage of information such as the location of the page in the disk storage system (refer to Figure 3-16).
Bits 31-1: Available to Operating System or Executive     Bit 0: 0 (present flag clear)
Figure 3-16. Format of a Page-Table or Page-Directory Entry for a Not-Present Page
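How an operating system uses the 31 free bits of a not-present entry is left to the system designer. The Python sketch below assumes one hypothetical encoding, a disk-slot number stored in bits 1 through 31, and models the page-fault servicing steps listed in Section 3.6.4 with dictionaries standing in for the page table and the TLB. None of these names or conventions comes from the manual.

```python
# Sketch: a hypothetical OS convention for not-present entries, plus a model of
# servicing a page fault. The disk-slot encoding, the dictionaries, and all
# function names are assumptions for illustration.

def make_not_present_entry(disk_slot: int) -> int:
    """Store a disk-slot number in bits 31-1, leaving the P flag (bit 0) clear."""
    if not 0 <= disk_slot < 2**31:
        raise ValueError("only 31 bits are available to software")
    return disk_slot << 1


def disk_slot_of(entry: int) -> int:
    """Recover the disk-slot number from a not-present entry."""
    if entry & 0x1:
        raise ValueError("entry is present; bits 31-1 hold an address and flags")
    return entry >> 1


def service_page_fault(page_table: dict, index: int, free_frame: int, tlb: dict) -> int:
    """Model the handler steps; returns the disk slot the page was read from."""
    slot = disk_slot_of(page_table[index])
    # Step 1: copying the page from disk into free_frame is outside this model.
    page_table[index] = (free_frame & 0xFFFFF000) | 0x1   # step 2: base + P flag
    tlb.pop(index, None)                                  # step 3: drop stale TLB entry
    return slot                                           # step 4: caller returns
```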
3.7.
TRANSLATION LOOKASIDE BUFFERS (TLBS)
The processor stores the most recently used page-directory and page-table entries in on-chip caches called translation lookaside buffers or TLBs. The P6 family and Pentium processors have separate TLBs for the data and instruction caches. Also, the P6 family processors maintain separate TLBs for 4-KByte and 4-MByte page sizes. The CPUID instruction can be used to determine the sizes of the TLBs provided in the P6 family and Pentium processors.
Most paging is performed using the contents of the TLBs. Bus cycles to the page directory and page tables in memory are performed only when the TLBs do not contain the translation information for a requested page.
The TLBs are inaccessible to application programs and tasks (privilege level greater than 0); that is, they cannot invalidate TLBs. Only operating system or executive procedures running at privilege level 0 can invalidate TLBs or selected TLB entries. Whenever a page-directory or page-table entry is changed (including when the present flag is set to zero), the operating system must immediately invalidate the corresponding entry in the TLB so that it can be updated the next time the entry is referenced. However, if the physical address extension (PAE) feature is enabled to use 36-bit addressing, a new table is added to the paging hierarchy. This new table is called the page directory pointer table (as described in Section 3.8., Physical Address Extension). If an entry is changed in this table (to point to another page directory), the TLBs must then be flushed by writing to CR3.
All (nonglobal) TLBs are automatically invalidated any time the CR3 register is loaded (unless the G flag for a page or page-table entry is set, as described later in this section). The CR3 register can be loaded in either of two ways:
Explicitly, using the MOV instruction, for example:

MOV CR3, EAX
where the EAX register contains an appropriate page-directory base address.
Implicitly by executing a task switch, which automatically changes the contents of the CR3 register.
The INVLPG instruction is provided to invalidate a specific page-table entry in the TLB. Normally, this instruction invalidates only an individual TLB entry; however, in some cases, it may invalidate more than the selected entry and may even invalidate all of the TLBs. This instruction ignores the setting of the G flag in a page-directory or page-table entry (refer to the following paragraph). (Introduced in the Pentium Pro processor.) The page global enable (PGE) flag in register CR4 and the global (G) flag of a page-directory or page-table entry (bit 8) can be used to prevent frequently used pages from being automatically invalidated in the TLBs on a task switch or a load of register CR3. (Refer to Section 3.6.4., Page-Directory and Page-Table Entries for more information about the global flag.) When the processor loads a page-directory or page-table entry for a global page into a TLB, the entry will remain in the TLB indefinitely. The only way to deterministically invalidate global page entries is to clear the PGE flag and then invalidate the TLBs or to use the INVLPG instruction to invalidate individual page-directory or page-table entries in the TLBs. For additional information about invalidation of the TLBs, refer to Section 9.10., Invalidating the Translation Lookaside Buffers (TLBs), in Chapter 9, Memory Cache Control.
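The invalidation rules described in this section can be modeled with a toy TLB in Python. This is a deliberately simplified sketch: the class name Tlb and its methods are hypothetical, a single dictionary stands in for all of the processor's TLBs, and invlpg always removes exactly one entry even though the real instruction may invalidate more.

```python
# Sketch: a toy TLB that models which entries survive a CR3 load.
# The class, its fields, and its method names are illustrative assumptions.

class Tlb:
    def __init__(self, pge: bool = False):
        self.pge = pge        # models the PGE flag in register CR4
        self.entries = {}     # linear page number -> (frame, is_global)

    def fill(self, page: int, frame: int, is_global: bool = False) -> None:
        """Cache a translation, recording the entry's G flag."""
        self.entries[page] = (frame, is_global)

    def load_cr3(self) -> None:
        """A CR3 load or task switch keeps only global entries, and only if PGE is set."""
        self.entries = {
            page: entry
            for page, entry in self.entries.items()
            if self.pge and entry[1]
        }

    def invlpg(self, page: int) -> None:
        """Invalidate one entry; the G flag is ignored."""
        self.entries.pop(page, None)
```

With PGE set, a global entry survives load_cr3() but is still removed by invlpg(); with PGE clear, load_cr3() empties the model entirely.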
3.8.
PHYSICAL ADDRESS EXTENSION
The physical address extension (PAE) flag in register CR4 enables an extension of physical addresses from 32 bits to 36 bits. (This feature was introduced into the Intel Architecture in the Pentium Pro processors.) Here, the processor provides 4 additional address line pins to accommodate the additional address bits. This option can only be used when paging is enabled (that is, when both the PG flag in register CR0 and the PAE flag in register CR4 are set). When the physical address extension is enabled, the processor allows several sizes of pages: 4-KByte, 2-MByte, or 4-MByte. As with 32-bit addressing, these page sizes can be addressed within the same set of paging tables (that is, a page-directory entry can point to either a 2-MByte or 4-MByte page or a page table that in turn points to 4-KByte pages). To support the 36-bit physical addresses, the following changes are made to the paging data structures:
The paging table entries are increased to 64 bits to accommodate 36-bit base physical addresses. Each 4-KByte page directory and page table can thus have up to 512 entries.
A new table, called the page-directory-pointer table, is added to the linear-address translation hierarchy. This table has 4 entries of 64 bits each, and it lies above the page directory in the hierarchy. With the physical address extension mechanism enabled, the processor supports up to 4 page directories.
The 20-bit page-directory base address field in register CR3 (PDBR) is replaced with a 27-bit page-directory-pointer-table base address field (refer to Figure 3-17). (In this case, register CR3 is called the PDPTR.) This field provides the 27 most-significant bits of the physical address of the first byte of the page-directory-pointer table, which forces the table to be located on a 32-byte boundary.
Linear address translation is changed to allow mapping 32-bit linear addresses into the larger physical address space.
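The 32-byte alignment rule for the page-directory-pointer table can be illustrated with a small C sketch. The function names are hypothetical and the code only manipulates a copy of the CR3 value; it does not read or write the register itself.

```c
#include <stdint.h>

/* With PAE enabled, bits 31:5 of CR3 hold the 27 most-significant bits of
   the page-directory-pointer-table address; the low 5 address bits are 0. */
uint32_t pdpt_base_from_cr3(uint32_t cr3)
{
    return cr3 & 0xFFFFFFE0u;   /* drop bits 4:0 (flag and zero bits) */
}

/* A physical address can hold the table only if it is 32-byte aligned. */
int pdpt_address_is_aligned(uint32_t phys_addr)
{
    return (phys_addr & 0x1Fu) == 0;
}
```

For instance, a CR3 value of 00123458H would yield a table base of 00123440H under this sketch.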
Figure 3-17. Register CR3 Format When the Physical Address Extension is Enabled
(Register layout: bits 31 through 5 hold the page-directory-pointer-table base address; bit 4 is PCD; bit 3 is PWT; bits 2 through 0 are 0.)
3.8.1.
Linear Address Translation With Extended Addressing Enabled (4-KByte Pages)
Figure 3-18 shows the page-directory-pointer, page-directory, and page-table hierarchy when mapping linear addresses to 4-KByte pages with extended physical addressing enabled. This paging method can be used to address up to 2^20 pages, which spans a linear address space of 2^32 bytes (4 GBytes).
Figure 3-18. Linear Address Translation With Extended Physical Addressing Enabled (4-KByte Pages)
(Diagram: CR3 (PDBR) points to the page-directory-pointer table, aligned on a 32-byte boundary. Linear-address bits 31:30 select the directory-pointer entry, bits 29:21 the page-directory entry, bits 20:12 the page-table entry, and bits 11:0 the offset into the 4-KByte page. 4 PDPTE * 512 PDE * 512 PTE = 2^20 pages.)
To select the various table entries and the byte within the page, the linear address is divided into four sections:
Page-directory-pointer-table entry: Bits 30 and 31 provide an offset to one of the 4 entries in the page-directory-pointer table. The selected entry provides the base physical address of a page directory.
Page-directory entry: Bits 21 through 29 provide an offset to an entry in the selected page directory. The selected entry provides the base physical address of a page table.
Page-table entry: Bits 12 through 20 provide an offset to an entry in the selected page table. This entry provides the base physical address of a page in physical memory.
Page offset: Bits 0 through 11 provide an offset to a physical address in the page.
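The field boundaries listed above can be sketched in C as follows. The structure and function names are hypothetical; the code only extracts the index fields and does not walk any tables.

```c
#include <stdint.h>

/* Fields of a 32-bit linear address under PAE with 4-KByte pages. */
struct pae_4k_fields {
    uint32_t pdpt_index;  /* bits 31:30, selects 1 of 4 PDPT entries */
    uint32_t pd_index;    /* bits 29:21, selects 1 of 512 PDEs       */
    uint32_t pt_index;    /* bits 20:12, selects 1 of 512 PTEs       */
    uint32_t offset;      /* bits 11:0, byte offset within the page  */
};

struct pae_4k_fields split_pae_4k(uint32_t linear)
{
    struct pae_4k_fields f;
    f.pdpt_index = (linear >> 30) & 0x3u;
    f.pd_index   = (linear >> 21) & 0x1FFu;
    f.pt_index   = (linear >> 12) & 0x1FFu;
    f.offset     = linear & 0xFFFu;
    return f;
}
```

As a usage example, the linear address C0201ABCH splits into directory-pointer index 3, page-directory index 1, page-table index 1, and offset ABCH.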
3.8.2.
Linear Address Translation With Extended Addressing Enabled (2-MByte or 4-MByte Pages)
Figure 3-19 shows how a page-directory-pointer table and page directories can be used to map linear addresses to 2-MByte or 4-MByte pages. This paging method can be used to map up to 2048 pages (4 page-directory-pointer-table entries times 512 page-directory entries) into a 4-GByte linear address space. The 2-MByte or 4-MByte page size is selected by setting the PSE flag in control register CR4 and setting the page size (PS) flag in a page-directory entry (refer to Figure 3-14). With these flags set, the linear address is divided into three sections:
Page-directory-pointer-table entry: Bits 30 and 31 provide an offset to an entry in the page-directory-pointer table. The selected entry provides the base physical address of a page directory.
Page-directory entry: Bits 21 through 29 provide an offset to an entry in the page directory. The selected entry provides the base physical address of a 2-MByte or 4-MByte page.
Page offset: Bits 0 through 20 provide an offset to a physical address in the page.
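A minimal C sketch of this three-way split follows, assuming the 21-bit offset given above. The helper names are hypothetical, and the page-number helper simply combines the two indexes to show where the figure of 2048 pages comes from.

```c
#include <stdint.h>

uint32_t pae_2m_pdpt_index(uint32_t linear) { return (linear >> 30) & 0x3u; }
uint32_t pae_2m_pd_index(uint32_t linear)   { return (linear >> 21) & 0x1FFu; }
uint32_t pae_2m_offset(uint32_t linear)     { return linear & 0x1FFFFFu; }

/* Ordinal (0..2047) of the large page that a linear address falls in:
   4 directory-pointer entries times 512 page-directory entries. */
uint32_t pae_2m_page_number(uint32_t linear)
{
    return pae_2m_pdpt_index(linear) * 512u + pae_2m_pd_index(linear);
}
```

The highest linear address, FFFFFFFFH, lands in page number 2047, the last of the 2048 pages.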
3.8.3.
Accessing the Full Extended Physical Address Space With the Extended Page-Table Structure
The page-table structure described in the previous two sections allows up to 4 GBytes of the 64-GByte extended physical address space to be addressed at one time. Additional 4-GByte sections of physical memory can be addressed in either of two ways:
Change the pointer in register CR3 to point to another page-directory-pointer table, which in turn points to another set of page directories and page tables.
Change entries in the page-directory-pointer table to point to other page directories, which in turn point to other sets of page tables.
Figure 3-19. Linear Address Translation With Extended Physical Addressing Enabled (2-MByte or 4-MByte Pages)
(Diagram: CR3 (PDBR) points to the page-directory-pointer table, aligned on a 32-byte boundary. Linear-address bits 31:30 select the directory-pointer entry, bits 29:21 the page-directory entry, and bits 20:0 the offset into the 2-MByte or 4-MByte page. 4 PDPTE * 512 PDE = 2048 pages.)
3.8.4.
Page-Directory and Page-Table Entries With Extended Addressing Enabled
Figure 3-20 shows the format for the page-directory-pointer-table, page-directory, and page-table entries when 4-KByte pages and 36-bit extended physical addresses are being used. Figure 3-21 shows the format for the page-directory-pointer-table and page-directory entries when 2-MByte or 4-MByte pages and 36-bit extended physical addresses are being used. The functions of the flags in these entries are the same as described in Section 3.6.4., Page-Directory and Page-Table Entries. The major differences in these entries are as follows:
A page-directory-pointer-table entry is added.
The size of the entries is increased from 32 bits to 64 bits.
The maximum number of entries in a page directory or page table is 512.
The base physical address field in each entry is extended to 24 bits.
Figure 3-20. Format of Page-Directory-Pointer-Table, Page-Directory, and Page-Table Entries for 4-KByte Pages and 36-Bit Extended Physical Addresses
(Three 64-bit entry layouts: Page-Directory-Pointer-Table Entry; Page-Directory Entry (4-KByte Page Table); Page-Table Entry (4-KByte Page). In each, bits 63 through 36 are reserved (set to 0), bits 35 through 12 hold the base address, bits 11 through 9 are available, and the low bits hold the flags.)
The base physical address in an entry specifies the following, depending on the type of entry:
Page-directory-pointer-table entry: the physical address of the first byte of a 4-KByte page directory.
Page-directory entry: the physical address of the first byte of a 4-KByte page table or a 2-MByte page.
Page-table entry: the physical address of the first byte of a 4-KByte page.
For all table entries (except for page-directory entries that point to 2-MByte or 4-MByte pages), the bits in the page base address are interpreted as the 24 most-significant bits of a 36-bit physical address, which forces page tables and pages to be aligned on 4-KByte boundaries. When a page-directory entry points to a 2-MByte or 4-MByte page, the base address is interpreted as the 15 most-significant bits of a 36-bit physical address, which forces pages to be aligned on 2-MByte or 4-MByte boundaries.
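The two interpretations of the base address field can be sketched in C with simple masks over a 64-bit entry. The function names are hypothetical; the masks follow directly from the 24-bit and 15-bit field widths stated above.

```c
#include <stdint.h>

/* Bits 35:12 of an entry: the 24 most-significant bits of the 36-bit
   address of a 4-KByte-aligned page directory, page table, or page. */
uint64_t pae_base_4k(uint64_t entry)
{
    return entry & 0x0000000FFFFFF000ull;
}

/* Bits 35:21 of a page-directory entry that maps a large page: the 15
   most-significant bits of the 36-bit page address. */
uint64_t pae_base_large(uint64_t pde)
{
    return pde & 0x0000000FFFE00000ull;
}
```

Masking rather than shifting keeps the result usable directly as a physical address, with the alignment bits already zero.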
Figure 3-21. Format of Page-Directory-Pointer-Table and Page-Directory Entries for 2- or 4-MByte Pages and 36-Bit Extended Physical Addresses
(Two 64-bit entry layouts: Page-Directory-Pointer-Table Entry; Page-Directory Entry (2- or 4-MByte Pages). In the page-directory entry, bits 63 through 36 are reserved (set to 0), bits 35 through 21 hold the page base address, bits 20 through 12 are reserved (set to 0), bits 11 through 9 are available, and the low bits hold the flags, with the PS flag set to 1.)
The present (P) flag (bit 0) in all page-directory-pointer-table entries must be set to 1 anytime extended physical addressing mode is enabled; that is, whenever the PAE flag (bit 5 in register CR4) and the PG flag (bit 31 in register CR0) are set. If the P flag is not set in all 4 page-directory-pointer-table entries in the page-directory-pointer table when extended physical addressing is enabled, a general-protection exception (#GP) is generated.
The page size (PS) flag (bit 7) in a page-directory entry determines if the entry points to a page table or a 2-MByte or 4-MByte page. When this flag is clear, the entry points to a page table; when the flag is set, the entry points to a 2-MByte or 4-MByte page. This flag allows 4-KByte, 2-MByte, or 4-MByte pages to be mixed within one set of paging tables.
Accessed (A) and dirty (D) flags (bits 5 and 6) are provided for table entries that point to pages.
Bits 9, 10, and 11 in all the table entries for the physical address extension are available for use by software. (When the present flag is clear, bits 1 through 63 are available to software.) All bits in Figure 3-14 that are marked reserved or 0 should be set to 0 by software and not accessed by software. When the PSE and/or PAE flags in control register CR4 are set, the processor generates a page fault (#PF) if reserved bits in page-directory and page-table entries are not set to 0, and it generates a general-protection exception (#GP) if reserved bits in a page-directory-pointer-table entry are not set to 0.
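The flag positions named above can be collected into a short C sketch. The macro and function names are hypothetical; the checks only mirror the rules in the text (all 4 page-directory-pointer-table entries present, PS selecting a large page) and say nothing about how the processor performs them internally.

```c
#include <stdint.h>

#define PAE_FLAG_P      (1ull << 0)   /* present                      */
#define PAE_FLAG_A      (1ull << 5)   /* accessed                     */
#define PAE_FLAG_D      (1ull << 6)   /* dirty                        */
#define PAE_FLAG_PS     (1ull << 7)   /* page size (PDE only)         */
#define PAE_AVAIL_MASK  (0x7ull << 9) /* bits 11:9, software-defined  */

/* The text requires the P flag in all 4 PDPT entries while PAE paging
   is enabled; otherwise #GP is generated. */
int all_pdptes_present(const uint64_t pdpt[4])
{
    for (int i = 0; i < 4; i++)
        if ((pdpt[i] & PAE_FLAG_P) == 0)
            return 0;
    return 1;
}

/* A present PDE with PS set maps a large page instead of a page table. */
int pde_maps_large_page(uint64_t pde)
{
    return (pde & PAE_FLAG_P) != 0 && (pde & PAE_FLAG_PS) != 0;
}

/* Value of the three software-available bits of an entry. */
unsigned entry_avail_bits(uint64_t entry)
{
    return (unsigned)((entry & PAE_AVAIL_MASK) >> 9);
}
```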
3.9.
36-BIT PAGE SIZE EXTENSION (PSE)
The 36-bit PSE extends 36-bit physical address support to 4-MByte pages while maintaining a 4-byte page-directory entry. This approach provides a simple mechanism for operating system vendors to address physical memory above 4 GBytes without requiring major design changes, but has practical limitations with respect to demand paging.
The P6 family of processors' physical address extension (PAE) feature provides generic access to a 36-bit physical address space. However, it requires expansion of the page-directory and page-table entries to an 8-byte format (64 bit), and the addition of a page-directory-pointer table, resulting in another level of indirection to address translation.
For P6-family processors that support the 36-bit PSE feature, the virtual memory architecture is extended to support 4-MByte page size granularity in combination with 36-bit physical addressing. Note that some P6-family processors do not support this feature. For information about determining a processor's feature support, refer to the following documents:
AP-485, Intel Processor Identification and the CPUID Instruction
Addendum: Intel Architecture Software Developer's Manual, Volume 1: Basic Architecture
For information about the virtual memory architecture features of P6-family processors, refer to Chapter 3 of the Intel Architecture Software Developer's Manual, Volume 3: System Programming Guide.
3.9.1.
Description of the 36-bit PSE Feature
The 36-bit PSE feature (PSE-36) is detected by an operating system through the CPUID instruction. Specifically, the operating system executes the CPUID instruction with the value 1 in the EAX register and then determines support for the feature by inspecting bit 17 of the EDX register return value (see Addendum: Intel Architecture Software Developer's Manual, Volume 1: Basic Architecture). If the PSE-36 feature is supported, an operating system is permitted to utilize the feature, as well as use certain formerly reserved bits. To use the 36-bit PSE feature, the PSE flag must be enabled by the operating system (bit 4 of CR4). Note that a separate control bit in CR4 does not exist to regulate the use of 36-bit addressing for 4-MByte pages, because this feature becomes the normal behavior for 4-MByte pages on processors that support it. Table 3-4 shows the page size and physical address size obtained from various settings of the page-control flags for the P6-family processors that support the 36-bit PSE feature. Shaded in gray is the change to this table resulting from the 36-bit PSE feature.
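The detection and enabling steps can be sketched in C as bit tests on register values that the caller has already obtained. Executing CPUID and writing CR4 need compiler intrinsics or assembly and privileged code, which are not shown; the names below are hypothetical.

```c
#include <stdint.h>

#define CPUID_EDX_PSE36  (1u << 17)  /* EDX bit 17 after CPUID with EAX = 1 */
#define CR4_PSE          (1u << 4)   /* page size extensions flag in CR4    */

/* edx: value returned in EDX by CPUID with EAX = 1 (obtained elsewhere). */
int edx_reports_pse36(uint32_t edx)
{
    return (edx & CPUID_EDX_PSE36) != 0;
}

/* Returns the CR4 value with the PSE flag set; the caller is assumed to
   write it back to CR4 from privileged code. */
uint32_t cr4_with_pse(uint32_t cr4)
{
    return cr4 | CR4_PSE;
}
```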
Table 3-4. Paging Modes and Physical Address Size

PG Flag    PAE Flag   PSE Flag   PS Flag        Page Size   Physical Address Size
(in CR0)   (in CR4)   (in CR4)   (in the PDE)
0          X          X          X              --          Paging Disabled
1          0          0          X              4 KB        32 bits
1          0          1          0              4 KB        32 bits
1          0          1          1              4 MB        36 bits
1          1          X          0              4 KB        36 bits
1          1          X          1              2 MB        36 bits
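The flag combinations in Table 3-4 can be expressed as a small lookup function in C. This is a sketch under the reading that a set PS flag yields a 4-MByte page when only PSE is enabled and a 2-MByte page when PAE is enabled, with 4-KByte pages otherwise; the structure and function names are hypothetical, and a result of 0 stands for "paging disabled".

```c
#include <stdint.h>

struct paging_mode {
    uint32_t page_size_bytes;  /* 0 when paging is disabled */
    unsigned phys_addr_bits;   /* 0 when paging is disabled */
};

/* pg, pae, pse: flags from CR0/CR4; ps: PS flag of the page-directory entry. */
struct paging_mode lookup_paging_mode(int pg, int pae, int pse, int ps)
{
    struct paging_mode m = {0u, 0u};
    if (!pg)
        return m;                                   /* paging disabled  */
    if (pae) {                                      /* PSE is ignored   */
        m.page_size_bytes = ps ? 2u * 1024u * 1024u : 4096u;
        m.phys_addr_bits  = 36;
    } else if (pse && ps) {                         /* 36-bit PSE page  */
        m.page_size_bytes = 4u * 1024u * 1024u;
        m.phys_addr_bits  = 36;
    } else {                                        /* classic paging   */
        m.page_size_bytes = 4096u;
        m.phys_addr_bits  = 32;
    }
    return m;
}
```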
To use the 36-bit PSE feature, the PAE feature must be cleared (as indicated in Table 3-4). However, the 36-bit PSE in no way affects the PAE feature. Existing operating systems and software that use the PAE will continue to have compatible functionality and features with P6-family processors that support 36-bit PSE. Specifically, the Page-Directory Entry (PDE) format when PAE is enabled for 2-MByte or 4-MByte pages is exactly as depicted in Figure 3-21 of the Intel Architecture Software Developer's Manual, Volume 3: System Programming Guide.
No matter which 36-bit addressing feature is used (PAE or 36-bit PSE), the linear address space of the processor remains at 32 bits. Applications must partition the address space of their work loads across multiple operating system processes to take advantage of the additional physical memory provided in the system.
The 36-bit PSE feature extends the PDE format of the Intel Architecture for 4-MByte pages and 32-bit addresses by utilizing bits 16-13 (formerly reserved bits that were required to be zero) to extend the physical address without requiring an 8-byte page-directory entry. Therefore, with the 36-bit PSE feature, a page directory can contain up to 1024 entries, each pointing to a 4-MByte page that can exist anywhere in the 36-bit physical address space of the processor.
Figure 3-22 shows the difference between PDE formats for 4-MByte pages on P6-family processors that support the 36-bit PSE feature compared to P6-family processors that do not support the 36-bit PSE feature (i.e., 32-bit addressing). Figure 3-23 shows the linear address mapping to 4-MByte pages when the 36-bit PSE is enabled. The base physical address of the 4-MByte page is contained in the PDE. PA-2 (bits 13-16) is used to provide the upper four bits (bits 32-35) of the 36-bit physical address. PA-1 (bits 22-31) continues to provide the next ten bits (bits 22-31) of the physical address for the 4-MByte page. The offset into the page is provided by the lower 22 bits of the linear address. This scheme eliminates the second level of indirection caused by the use of 4-KByte page tables.
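The PA-1 and PA-2 field arithmetic can be sketched in C as follows. The function names are hypothetical; the code assumes a 4-byte PDE with the PS flag set and does not validate reserved bits.

```c
#include <stdint.h>

/* 36-bit base address of the 4-MByte page described by a PSE-36 PDE. */
uint64_t pse36_page_base(uint32_t pde)
{
    uint64_t pa1 = (uint64_t)(pde & 0xFFC00000u);         /* PDE bits 31:22 */
    uint64_t pa2 = (uint64_t)((pde >> 13) & 0xFu) << 32;  /* PDE bits 16:13 */
    return pa2 | pa1;
}

/* Physical address: page base plus the low 22 bits of the linear address. */
uint64_t pse36_translate(uint32_t pde, uint32_t linear)
{
    return pse36_page_base(pde) | (uint64_t)(linear & 0x003FFFFFu);
}

/* Builds a PDE for a 4-MByte-aligned page; flags supplies bits 11:8 and
   6:0, and the PS flag (bit 7) is always set. */
uint32_t pse36_make_pde(uint64_t page_base, uint32_t flags)
{
    uint32_t pa1 = (uint32_t)(page_base & 0xFFC00000u);
    uint32_t pa2 = (uint32_t)((page_base >> 32) & 0xFu) << 13;
    return pa1 | pa2 | (1u << 7) | (flags & 0xF7Fu);
}
```

For example, a page at physical address 500400000H with the present and read/write flags set would be described by the PDE value 0040A083H in this sketch.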
Figure 3-22. PDE Format Differences between 36-bit and 32-bit addressing
(Two 32-bit layouts. Page Directory Entry format for processors that support 36-bit addressing for 4-MByte pages: PA-1 in bits 31-22, reserved bits 21-17, PA-2 in bits 16-13, PAT in bit 12, PS=1 in bit 7. Page Directory Entry format for processors that support 32-bit addressing for 4-MByte pages: base page address in bits 31-22, reserved bits 21-12, PS=1 in bit 7.)
Notes:
1. PA-2 = Bits 35-32 of the base physical address for the 4-MByte page (correspond to bits 16-13 of the PDE)
2. PA-1 = Bits 31-22 of the base physical address for the 4-MByte page
3. PAT = Bit 12 used as the Most Significant Bit of the index into Page Attribute Table (PAT); see Section 10.2.
4. PS = Bit 7 is the Page Size Bit; indicates 4-MByte page (must be set to 1)
5. Reserved = Bits 21-17 are reserved for future expansion
6. No change in format or meaning of bits 11-8 and 6-0; refer to Figure 3-15 for details.
The PSE-36 feature is transparent to existing operating systems that utilize 4-MByte pages, because unused bits in PA-2 are currently enforced as zero by Intel processors. The feature requires 4-MByte pages aligned on a 4-MByte boundary and 4 MBytes of physically contiguous memory. Therefore, the ten bits of PA-1 are sufficient to specify the base physical address of any 4-MByte page below 4 GBytes. An operating system can easily support addresses greater than 4 GBytes simply by providing the upper 4 bits of the physical address in PA-2 when creating a PDE for a 4-MByte page.
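Figure 3-23. Page Size Extension Linear to Physical Translation
(Diagram: CR3 locates the page directory; linear-address bits 31-22 are the directory index and bits 21-0 the offset into the 4 MB page. The selected PDE supplies PA-1 in bits 31-22, reserved bits 21-17, PA-2 in bits 16-13, PAT in bit 12, and PS=1 in bit 7, which together form the page frame address.)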
3.9.2.
Fault Detection
Several conditions can cause P6-family processors that support this feature to generate a page-fault exception (#PF). These conditions are related to the use of, or switching between, various memory management features:
If the PSE feature is enabled, a nonzero value in any of the remaining reserved bits (17-21) of a 4-MByte PDE causes a page fault, with the reserved bit (bit 3) set in the error code.
If the PAE feature is enabled and set to use 2-MByte or 4-MByte pages (that is, 8-byte page-directory table entries are being used), a nonzero value in any of the reserved bits 13-20 causes a page fault, with the reserved bit (bit 3) set in the error code. Note that bit 12 is now being used to support the Page Attribute Table feature (refer to Section 9.13., Page Attribute Table (PAT)).
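An operating system might validate its own entries against these two rules before installing them. The following C sketch uses hypothetical names and checks only the reserved ranges named above; it assumes the caller has already established that the entry is present and maps a large page.

```c
#include <stdint.h>

#define PSE36_PDE_RESERVED      0x003E0000u    /* bits 21:17 of a 4-byte PDE */
#define PAE_LARGE_PDE_RESERVED  0x001FE000ull  /* bits 20:13 of an 8-byte PDE */
#define PF_ERROR_RSVD           (1u << 3)      /* reserved-bit flag, error code */

/* Nonzero if a 4-MByte PDE leaves its reserved bits clear (PSE enabled). */
int pse36_pde_reserved_ok(uint32_t pde)
{
    return (pde & PSE36_PDE_RESERVED) == 0;
}

/* Nonzero if a large-page PDE leaves its reserved bits clear (PAE enabled). */
int pae_large_pde_reserved_ok(uint64_t pde)
{
    return (pde & PAE_LARGE_PDE_RESERVED) == 0;
}
```

Bit 12 is deliberately outside both masks, because the text assigns it to the Page Attribute Table feature.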
3.10. MAPPING SEGMENTS TO PAGES

The segmentation and paging mechanisms provided in the Intel Architecture support a wide variety of approaches to memory management. When segmentation and paging are combined, segments can be mapped to pages in several ways. To implement a flat (unsegmented) addressing environment, for example, all the code, data, and stack modules can be mapped to one or more large segments (up to 4 GBytes) that share the same range of linear addresses (refer to Figure 3-2). Here, segments are essentially invisible to applications and the operating system or executive. If paging is used, the paging mechanism can map a single linear address space (contained in a single segment) into virtual memory. Or, each program (or task) can have its own large linear address space (contained in its own segment), which is mapped into virtual memory through its own page directory and set of page tables.
Segments can be smaller than the size of a page. If one of these segments is placed in a page which is not shared with another segment, the extra memory is wasted. For example, a small data structure, such as a 1-byte semaphore, occupies 4 KBytes if it is placed in a page by itself. If many semaphores are used, it is more efficient to pack them into a single page.
The Intel Architecture does not enforce correspondence between the boundaries of pages and segments. A page can contain the end of one segment and the beginning of another. Likewise, a segment can contain the end of one page and the beginning of another.
Memory-management software may be simpler and more efficient if it enforces some alignment between page and segment boundaries. For example, if a segment which can fit in one page is placed in two pages, there may be twice as much paging overhead to support access to that segment.
One approach to combining paging and segmentation that simplifies memory-management software is to give each segment its own page table, as shown in Figure 3-24. This convention gives the segment a single entry in the page directory that provides the access control information for paging the entire segment.
Figure 3-24. Memory Management Convention That Assigns a Page Table to Each Segment
(Diagram: each segment descriptor in the LDT corresponds to one PDE in the page directory; each PDE selects a page table whose PTEs map the segment's page frames.)
CHAPTER 4 PROTECTION
In protected mode, the Intel Architecture provides a protection mechanism that operates at both the segment level and the page level. This protection mechanism provides the ability to limit access to certain segments or pages based on privilege levels (four privilege levels for segments and two privilege levels for pages). For example, critical operating-system code and data can be protected by placing them in more privileged segments than those that contain applications code. The processor's protection mechanism will then prevent application code from accessing the operating-system code and data in any but a controlled, defined manner.
Segment and page protection can be used at all stages of software development to assist in localizing and detecting design problems and bugs. It can also be incorporated into end-products to offer added robustness to operating systems, utilities software, and applications software.
When the protection mechanism is used, each memory reference is checked to verify that it satisfies various protection checks. All checks are made before the memory cycle is started; any violation results in an exception. Because checks are performed in parallel with address translation, there is no performance penalty. The protection checks that are performed fall into the following categories:
Limit checks.
Type checks.
Privilege level checks.
Restriction of addressable domain.
Restriction of procedure entry-points.
Restriction of instruction set.
All protection violations result in an exception being generated. Refer to Chapter 5, Interrupt and Exception Handling for an explanation of the exception mechanism. This chapter describes the protection mechanism and the violations which lead to exceptions.
The following sections describe the protection mechanism available in protected mode. Refer to Chapter 16, 8086 Emulation for information on protection in real-address and virtual-8086 mode.
4.1.
ENABLING AND DISABLING SEGMENT AND PAGE PROTECTION
Setting the PE flag in register CR0 causes the processor to switch to protected mode, which in turn enables the segment-protection mechanism. Once in protected mode, there is no control bit for turning the protection mechanism on or off. The part of the segment-protection mechanism that is based on privilege levels can essentially be disabled while still in protected mode by assigning a privilege level of 0 (most privileged) to all segment selectors and segment descriptors. This action disables the privilege level protection barriers between segments, but other protection checks such as limit checking and type checking are still carried out. Page-level protection is automatically enabled when paging is enabled (by setting the PG flag in register CR0). Here again there is no mode bit for turning off page-level protection once paging is enabled. However, page-level protection can be disabled by performing the following operations:
Clear the WP flag in control register CR0.
Set the read/write (R/W) and user/supervisor (U/S) flags for each page-directory and page-table entry.
This action makes each page a writable, user page, which in effect disables page-level protection.
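The two operations can be sketched in C on in-memory copies of the register and table entries. The names are hypothetical, the entries are assumed to be 32-bit page-directory or page-table entries, and the position of the WP flag (bit 16 of CR0) is stated here as background knowledge rather than taken from this section.

```c
#include <stddef.h>
#include <stdint.h>

#define CR0_WP     (1u << 16)  /* write protect flag (bit position assumed) */
#define ENTRY_RW   (1u << 1)   /* read/write flag                           */
#define ENTRY_US   (1u << 2)   /* user/supervisor flag                      */

/* CR0 value with WP cleared; writing it back to CR0 is left to the caller. */
uint32_t cr0_without_wp(uint32_t cr0)
{
    return cr0 & ~CR0_WP;
}

/* One entry marked as a writable, user-level page. */
uint32_t with_user_write_access(uint32_t entry)
{
    return entry | ENTRY_RW | ENTRY_US;
}

/* Applies the same change to every entry of a page directory or page table. */
void open_all_entries(uint32_t *entries, size_t count)
{
    for (size_t i = 0; i < count; i++)
        entries[i] = with_user_write_access(entries[i]);
}
```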
4.2.
FIELDS AND FLAGS USED FOR SEGMENT-LEVEL AND PAGE-LEVEL PROTECTION
The processor's protection mechanism uses the following fields and flags in the system data structures to control access to segments and pages:
Descriptor type (S) flag. (Bit 12 in the second doubleword of a segment descriptor.) Determines if the segment descriptor is for a system segment or a code or data segment.
Type field. (Bits 8 through 11 in the second doubleword of a segment descriptor.) Determines the type of code, data, or system segment.
Limit field. (Bits 0 through 15 of the first doubleword and bits 16 through 19 of the second doubleword of a segment descriptor.) Determines the size of the segment, along with the G flag and E flag (for data segments).
G flag. (Bit 23 in the second doubleword of a segment descriptor.) Determines the size of the segment, along with the limit field and E flag (for data segments).
E flag. (Bit 10 in the second doubleword of a data-segment descriptor.) Determines the size of the segment, along with the limit field and G flag.
Descriptor privilege level (DPL) field. (Bits 13 and 14 in the second doubleword of a segment descriptor.) Determines the privilege level of the segment.
Requested privilege level (RPL) field. (Bits 0 and 1 of any segment selector.) Specifies the requested privilege level of a segment selector.
Current privilege level (CPL) field. (Bits 0 and 1 of the CS segment register.) Indicates the privilege level of the currently executing program or procedure. The term current privilege level (CPL) refers to the setting of this field.
User/supervisor (U/S) flag. (Bit 2 of a page-directory or page-table entry.) Determines the type of page: user or supervisor.
Read/write (R/W) flag. (Bit 1 of a page-directory or page-table entry.) Determines the type of access allowed to a page: read only or read-write.
Figure 4-1 shows the location of the various fields and flags in the data, code, and system-segment descriptors; Figure 3-6 in Chapter 3, Protected-Mode Memory Management shows the location of the RPL (or CPL) field in a segment selector (or the CS register); and Figure 3-14 in Chapter 3, Protected-Mode Memory Management shows the location of the U/S and R/W flags in the page-directory and page-table entries.
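The bit positions given for the segment-level fields can be gathered into a short C decoder. This is a sketch with hypothetical names; dw0 and dw1 stand for the first and second doublewords of a segment descriptor.

```c
#include <stdint.h>

struct seg_fields {
    unsigned type;   /* bits 11:8 of the second doubleword            */
    unsigned s;      /* bit 12: 0 = system, 1 = code or data          */
    unsigned dpl;    /* bits 14:13: descriptor privilege level        */
    unsigned g;      /* bit 23: granularity                           */
    uint32_t limit;  /* 20-bit limit: dw0 bits 15:0 and dw1 bits 19:16 */
};

struct seg_fields decode_descriptor(uint32_t dw0, uint32_t dw1)
{
    struct seg_fields f;
    f.type  = (dw1 >> 8) & 0xFu;
    f.s     = (dw1 >> 12) & 0x1u;
    f.dpl   = (dw1 >> 13) & 0x3u;
    f.g     = (dw1 >> 23) & 0x1u;
    f.limit = (dw0 & 0x0000FFFFu) | (dw1 & 0x000F0000u);
    return f;
}

/* RPL of a segment selector (or CPL when applied to the CS register). */
unsigned selector_rpl(uint16_t selector)
{
    return selector & 0x3u;
}
```

For example, doublewords 0000FFFFH and 00CF9A00H decode to a code or data descriptor (S = 1) with type AH, DPL 0, G set, and a limit field of FFFFFH.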
Figure 4-1. Descriptor Fields Used for Protection
(Three 64-bit descriptor layouts: Data-Segment Descriptor, Code-Segment Descriptor, and System-Segment Descriptor. Each shows Base 31:24, Limit 19:16, the DPL and Type fields, Base 23:16, Base Address 15:00, and Segment Limit 15:00. Legend: A = Accessed; AVL = Available to Sys. Programmers; B = Big; C = Conforming; D = Default; DPL = Descriptor Privilege Level; E = Expansion Direction; G = Granularity; R = Readable; LIMIT = Segment Limit; W = Writable; P = Present.)
Many different styles of protection schemes can be implemented with these fields and flags. When the operating system creates a descriptor, it places values in these fields and flags in keeping with the particular protection style chosen for an operating system or executive. Application programs do not generally access or modify these fields and flags. The following sections describe how the processor uses these fields and flags to perform the various categories of checks described in the introduction to this chapter.
4.3.
LIMIT CHECKING
The limit field of a segment descriptor prevents programs or procedures from addressing memory locations outside the segment. The effective value of the limit depends on the setting of the G (granularity) flag (refer to Figure 4-1). For data segments, the limit also depends on the E (expansion direction) flag and the B (default stack pointer size and/or upper bound) flag. The E flag is one of the bits in the type field when the segment descriptor is for a data-segment type.
When the G flag is clear (byte granularity), the effective limit is the value of the 20-bit limit field in the segment descriptor. Here, the limit ranges from 0 to FFFFFH (1 MByte). When the G flag is set (4-KByte page granularity), the processor scales the value in the limit field by a factor of 2^12 (4 KBytes). In this case, the effective limit ranges from FFFH (4 KBytes) to FFFFFFFFH (4 GBytes). Note that when scaling is used (G flag is set), the lower 12 bits of a segment offset (address) are not checked against the limit; for example, if the segment limit is 0, offsets 0 through FFFH are still valid.
For all types of segments except expand-down data segments, the effective limit is the last address that is allowed to be accessed in the segment, which is one less than the size, in bytes, of the segment. The processor causes a general-protection exception any time an attempt is made to access the following addresses in a segment:
A byte at an offset greater than the effective limit
A word at an offset greater than the (effective-limit - 1)
A doubleword at an offset greater than the (effective-limit - 3)
A quadword at an offset greater than the (effective-limit - 7)
For expand-down data segments, the segment limit has the same function but is interpreted differently. Here, the effective limit specifies the last address that is not allowed to be accessed within the segment; the range of valid offsets is from (effective-limit + 1) to FFFFFFFFH if the B flag is set and from (effective-limit + 1) to FFFFH if the B flag is clear. An expand-down segment has maximum size when the segment limit is 0.
Limit checking catches programming errors such as runaway code, runaway subscripts, and invalid pointer calculations. These errors are detected when they occur, so identification of the cause is easier. Without limit checking, these errors could overwrite code or data in another segment.
In addition to checking segment limits, the processor also checks descriptor table limits. The GDTR and IDTR registers contain 16-bit limit values that the processor uses to prevent programs from selecting a segment descriptor outside the respective descriptor tables. The LDTR and task registers contain a 32-bit segment limit value (read from the segment descriptors for the current LDT and TSS, respectively). The processor uses these segment limits to prevent accesses beyond the bounds of the current LDT and TSS. Refer to Section 3.5.1., Segment Descriptor Tables in Chapter 3, Protected-Mode Memory Management for more information on the GDT and LDT limit fields; refer to Section 5.8., Interrupt Descriptor Table (IDT) in Chapter 5, Interrupt and Exception Handling for more information on the IDT limit field; and refer to Section 6.2.3., Task Register in Chapter 6, Task Management for more information on the TSS segment limit field.
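The limit rules for segments can be modeled in C as follows. This is a software sketch with hypothetical names, not a description of the processor's internal logic; size is the operand size in bytes (1, 2, 4, or 8), and the comparisons are arranged to avoid 32-bit wraparound.

```c
#include <stdint.h>

/* Effective limit from the 20-bit limit field and the G flag. With G set,
   the field is scaled by 2^12 and the low 12 bits are treated as ones. */
uint32_t effective_limit(uint32_t limit20, int g)
{
    return g ? ((limit20 << 12) | 0xFFFu) : limit20;
}

/* Expand-up segments: the last byte accessed must not exceed the limit. */
int access_ok_expand_up(uint32_t eff_limit, uint32_t offset, uint32_t size)
{
    return size >= 1 && size - 1 <= eff_limit
        && offset <= eff_limit - (size - 1);
}

/* Expand-down data segments: valid offsets run from eff_limit + 1 up to
   FFFFFFFFH (B set) or FFFFH (B clear). */
int access_ok_expand_down(uint32_t eff_limit, int b,
                          uint32_t offset, uint32_t size)
{
    uint32_t upper = b ? 0xFFFFFFFFu : 0xFFFFu;
    return size >= 1 && offset > eff_limit && offset <= upper
        && size - 1 <= upper - offset;
}
```

With a limit field of 0 and the G flag set, the sketch accepts a doubleword at offset FFCH and rejects one at offset FFDH, matching the (effective-limit - 3) rule.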
4.4.
TYPE CHECKING
Segment descriptors contain type information in two places:
The S (descriptor type) flag.
The type field.
The processor uses this information to detect programming errors that result in an attempt to use a segment or gate in an incorrect or unintended manner. The S flag indicates whether a descriptor is a system type or a code or data type. The type field provides 4 additional bits for use in defining various types of code, data, and system descriptors. Table 3-1 in Chapter 3, Protected-Mode Memory Management shows the encoding of the type field for code and data descriptors; Table 3-2 in Chapter 3, Protected-Mode Memory Management shows the encoding of the field for system descriptors. The processor examines type information at various times while operating on segment selectors and segment descriptors. The following list gives examples of typical operations where type checking is performed. This list is not exhaustive.
• When a segment selector is loaded into a segment register. Certain segment registers can contain only certain descriptor types, for example:
  - The CS register can be loaded only with a selector for a code segment.
  - Segment selectors for code segments that are not readable or for system segments cannot be loaded into data-segment registers (DS, ES, FS, and GS).
  - Only segment selectors of writable data segments can be loaded into the SS register.
• When a segment selector is loaded into the LDTR or task register.
  - The LDTR can only be loaded with a selector for an LDT.
  - The task register can only be loaded with a segment selector for a TSS.
• When instructions access segments whose descriptors are already loaded into segment registers. Certain segments can be used by instructions only in certain predefined ways, for example:
  - No instruction may write into an executable segment.
  - No instruction may write into a data segment if it is not writable.
  - No instruction may read an executable segment unless the readable flag is set.
• When an instruction operand contains a segment selector. Certain instructions can access segments or gates of only a particular type, for example:
  - A far CALL or far JMP instruction can only access a segment descriptor for a conforming code segment, nonconforming code segment, call gate, task gate, or TSS.
  - The LLDT instruction must reference a segment descriptor for an LDT.
  - The LTR instruction must reference a segment descriptor for a TSS.
  - The LAR instruction must reference a segment or gate descriptor for an LDT, TSS, call gate, task gate, code segment, or data segment.
  - The LSL instruction must reference a segment descriptor for an LDT, TSS, code segment, or data segment.
  - IDT entries must be interrupt, trap, or task gates.
• During certain internal operations. For example:
  - On a far call or far jump (executed with a far CALL or far JMP instruction), the processor determines the type of control transfer to be carried out (call or jump to another code segment, a call or jump through a gate, or a task switch) by checking the type field in the segment (or gate) descriptor pointed to by the segment (or gate) selector given as an operand in the CALL or JMP instruction. If the descriptor type is for a code segment or call gate, a call or jump to another code segment is indicated; if the descriptor type is for a TSS or task gate, a task switch is indicated.
  - On a call or jump through a call gate (or on an interrupt- or exception-handler call through a trap or interrupt gate), the processor automatically checks that the segment descriptor being pointed to by the gate is for a code segment.
  - On a call or jump to a new task through a task gate (or on an interrupt- or exception-handler call to a new task through a task gate), the processor automatically checks that the segment descriptor being pointed to by the task gate is for a TSS.
  - On a call or jump to a new task by a direct reference to a TSS, the processor automatically checks that the segment descriptor being pointed to by the CALL or JMP instruction is for a TSS.
  - On return from a nested task (initiated by an IRET instruction), the processor checks that the previous task link field in the current TSS points to a TSS.
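The first group of checks in the list above (loading a selector into a segment register) can be modeled in software. The following C sketch is illustrative only: the type names and the function are hypothetical, and it assumes the code- and data-segment type encoding of Table 3-1, in which bit 3 of the type field distinguishes code from data and bit 1 means readable (code) or writable (data).

```c
#include <stdbool.h>
#include <stdint.h>

/* Type-field bits for code and data descriptors (S flag = 1). */
enum {
    TYPE_CODE     = 0x8, /* bit 3: 1 = code segment, 0 = data segment */
    TYPE_WRITABLE = 0x2, /* bit 1 of a data segment: writable */
    TYPE_READABLE = 0x2  /* bit 1 of a code segment: readable */
};

typedef enum { REG_CS, REG_SS, REG_DATA /* DS, ES, FS, or GS */ } seg_reg;

/* Hypothetical model of the load-time type check. s_flag is the
 * descriptor's S flag; type is its 4-bit type field. */
bool type_ok_for_register(seg_reg reg, bool s_flag, uint8_t type)
{
    if (!s_flag)
        return false;   /* system descriptors cannot be loaded here */

    bool is_code = (type & TYPE_CODE) != 0;

    switch (reg) {
    case REG_CS:   return is_code;                              /* code only */
    case REG_SS:   return !is_code && (type & TYPE_WRITABLE);   /* writable data */
    case REG_DATA: return !is_code || (type & TYPE_READABLE);   /* data or readable code */
    }
    return false;
}
```

The sketch covers only the type rules; the privilege-level and limit checks that accompany a segment-register load are separate.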
4.4.1. Null Segment Selector Checking
Attempting to load a null segment selector (refer to Section 3.4.1. in Chapter 3, Protected-Mode Memory Management) into the CS or SS segment register generates a general-protection exception (#GP). A null segment selector can be loaded into the DS, ES, FS, or GS register, but any attempt to access a segment through one of these registers when it is loaded with a null segment selector results in a #GP exception being generated. Loading unused data-segment registers with a null segment selector is a useful method of detecting accesses to unused segment registers and/or preventing unwanted accesses to data segments.
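The null-selector rules can be summarized in a short C sketch. The names are hypothetical, and the sketch assumes the definition from Section 3.4.1. that a null selector has an index of 0 and a table indicator of 0, so that only its RPL bits may be nonzero.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { REG_CS, REG_SS, REG_DS, REG_ES, REG_FS, REG_GS } seg_reg;

/* A selector is null when bits 15:2 (index and TI) are all 0;
 * the RPL bits (1:0) do not matter. */
bool is_null_selector(uint16_t selector)
{
    return (selector & 0xFFFCu) == 0;
}

/* True if loading `selector` into `reg` raises #GP because it is null. */
bool null_load_faults(seg_reg reg, uint16_t selector)
{
    return is_null_selector(selector) && (reg == REG_CS || reg == REG_SS);
}

/* True if a memory access through a data-segment register that holds
 * `selector` raises #GP because the selector is null. */
bool null_access_faults(uint16_t selector)
{
    return is_null_selector(selector);
}
```

Loading 0 into DS is therefore accepted at load time, and the exception is deferred until the first access through DS.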
4.5. PRIVILEGE LEVELS
The processor's segment-protection mechanism recognizes 4 privilege levels, numbered from 0 to 3. Greater numbers mean lesser privileges. Figure 4-2 shows how these levels of privilege can be interpreted as rings of protection. The center (reserved for the most privileged code, data, and stacks) is used for the segments containing the critical software, usually the kernel of an operating system. Outer rings are used for less critical software. (Systems that use only 2 of the 4 possible privilege levels should use levels 0 and 3.)
[Figure 4-2. Protection Rings: Level 0, operating system kernel (innermost ring); Levels 1 and 2, operating system services; Level 3, applications (outermost ring).]
The processor uses privilege levels to prevent a program or task operating at a lesser privilege level from accessing a segment with a greater privilege, except under controlled situations. When the processor detects a privilege level violation, it generates a general-protection exception (#GP). To carry out privilege-level checks between code segments and data segments, the processor recognizes the following three types of privilege levels:
• Current privilege level (CPL). The CPL is the privilege level of the currently executing program or task. It is stored in bits 0 and 1 of the CS and SS segment registers. Normally, the CPL is equal to the privilege level of the code segment from which instructions are being fetched. The processor changes the CPL when program control is transferred to a code segment with a different privilege level. The CPL is treated slightly differently when accessing conforming code segments. Conforming code segments can be accessed from any privilege level that is equal to or numerically greater (less privileged) than the DPL of the conforming code segment. Also, the CPL is not changed when the processor accesses a conforming code segment that has a different privilege level than the CPL.
• Descriptor privilege level (DPL). The DPL is the privilege level of a segment or gate. It is stored in the DPL field of the segment or gate descriptor for the segment or gate. When the currently executing code segment attempts to access a segment or gate, the DPL of the segment or gate is compared to the CPL and RPL of the segment or gate selector (as described later in this section). The DPL is interpreted differently, depending on the type of segment or gate being accessed:
  - Data segment. The DPL indicates the numerically highest privilege level that a program or task can have to be allowed to access the segment. For example, if the DPL of a data segment is 1, only programs running at a CPL of 0 or 1 can access the segment.
  - Nonconforming code segment (without using a call gate). The DPL indicates the privilege level that a program or task must be at to access the segment. For example, if the DPL of a nonconforming code segment is 0, only programs running at a CPL of 0 can access the segment.
  - Call gate. The DPL indicates the numerically highest privilege level that the currently executing program or task can be at and still be able to access the call gate. (This is the same access rule as for a data segment.)
  - Conforming code segment and nonconforming code segment accessed through a call gate. The DPL indicates the numerically lowest privilege level that a program or task can have to be allowed to access the segment. For example, if the DPL of a conforming code segment is 2, programs running at a CPL of 0 or 1 cannot access the segment.
  - TSS. The DPL indicates the numerically highest privilege level that the currently executing program or task can be at and still be able to access the TSS. (This is the same access rule as for a data segment.)
• Requested privilege level (RPL). The RPL is an override privilege level that is assigned to segment selectors. It is stored in bits 0 and 1 of the segment selector. The processor checks the RPL along with the CPL to determine if access to a segment is allowed. Even if the program or task requesting access to a segment has sufficient privilege to access the segment, access is denied if the RPL is not of sufficient privilege level. That is, if the RPL of a segment selector is numerically greater than the CPL, the RPL overrides the CPL, and vice versa. The RPL can be used to ensure that privileged code does not access a segment on behalf of an application program unless the program itself has access privileges for that segment. Refer to Section 4.10.4., Checking Caller Access Privileges (ARPL Instruction) for a detailed description of the purpose and typical use of the RPL.
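The way the RPL and CPL override each other can be expressed as taking the numerically greater (less privileged) of the two. The following C sketch shows this under that reading; the function names, including "effective_privilege", are illustrative and are not terms defined in this section.

```c
#include <assert.h>
#include <stdint.h>

/* The RPL occupies bits 0 and 1 of a segment selector. */
unsigned selector_rpl(uint16_t selector)
{
    return selector & 0x3u;
}

/* Returns the weaker (numerically greater) of the CPL and the selector's
 * RPL; this is the level that the DPL checks are effectively made against. */
unsigned effective_privilege(unsigned cpl, uint16_t selector)
{
    assert(cpl <= 3);   /* privilege levels are 0 through 3 */
    unsigned rpl = selector_rpl(selector);
    return rpl > cpl ? rpl : cpl;
}
```

For example, a selector value of 001BH carries an RPL of 3, so a program at CPL 0 that uses it is treated as if it were at level 3 for that access.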
Privilege levels are checked when the segment selector of a segment descriptor is loaded into a segment register. The checks used for data access differ from those used for transfers of program control among code segments; therefore, the two kinds of accesses are considered separately in the following sections.
4.6. PRIVILEGE LEVEL CHECKING WHEN ACCESSING DATA SEGMENTS
To access operands in a data segment, the segment selector for the data segment must be loaded into the data-segment registers (DS, ES, FS, or GS) or into the stack-segment register (SS). (Segment registers can be loaded with the MOV, POP, LDS, LES, LFS, LGS, and LSS instructions.) Before the processor loads a segment selector into a segment register, it performs a privilege check (refer to Figure 4-3) by comparing the privilege levels of the currently running program or task (the CPL), the RPL of the segment selector, and the DPL of the segment's segment descriptor. The processor loads the segment selector into the segment register if the DPL is numerically greater than or equal to both the CPL and the RPL. Otherwise, a general-protection fault is generated and the segment register is not loaded.
[Figure 4-3. Privilege Check for Data Access: the CPL (from the CS register), the RPL (from the segment selector for the data segment), and the DPL (from the data-segment descriptor) are the inputs to the privilege check.]
Figure 4-4 shows four procedures (located in code segments A, B, C, and D), each running at different privilege levels and each attempting to access the same data segment.
• The procedure in code segment A is able to access data segment E using segment selector E1, because the CPL of code segment A and the RPL of segment selector E1 are equal to the DPL of data segment E.
• The procedure in code segment B is able to access data segment E using segment selector E2, because the CPL of code segment B and the RPL of segment selector E2 are both numerically lower (more privileged) than the DPL of data segment E. A code segment B procedure can also access data segment E using segment selector E1.
• The procedure in code segment C is not able to access data segment E using segment selector E3 (dotted line), because the CPL of code segment C and the RPL of segment selector E3 are both numerically greater (less privileged) than the DPL of data segment E. Even if a code segment C procedure were to use segment selector E1 or E2, such that the RPL would be acceptable, it still could not access data segment E because its CPL is not privileged enough.
• The procedure in code segment D should be able to access data segment E because code segment D's CPL is numerically less than the DPL of data segment E. However, the RPL of segment selector E3 (which the code segment D procedure is using to access data segment E) is numerically greater than the DPL of data segment E, so access is not allowed. If the code segment D procedure were to use segment selector E1 or E2 to access the data segment, access would be allowed.
[Figure 4-4. Examples of Accessing Data Segments From Various Privilege Levels: Code Segment C, CPL=3 (lowest privilege); Code Segment A, CPL=2; Code Segment B, CPL=1; Code Segment D, CPL=0 (highest privilege); Segment Sel. E3, RPL=3; Data Segment E, DPL=2.]
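The data-access rule (the DPL must be numerically greater than or equal to both the CPL and the RPL) reduces to a single comparison. The following C sketch uses a hypothetical function name and is meant only as a model of the rule.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if a data-segment selector with the given RPL may be
 * loaded by code running at `cpl` when the segment's descriptor has
 * privilege level `dpl`; false corresponds to a general-protection fault. */
bool data_access_allowed(unsigned cpl, unsigned rpl, unsigned dpl)
{
    assert(cpl <= 3 && rpl <= 3 && dpl <= 3);
    return dpl >= cpl && dpl >= rpl;
}
```

Applied to the Figure 4-4 cases with data segment E at DPL 2: code segment A (CPL 2) with selector E1 (RPL 2) is allowed; code segment C (CPL 3) is refused whatever selector it uses; code segment D (CPL 0) is refused with selector E3 (RPL 3) but allowed with E1. The text states only that the RPL of E2 is numerically lower than the DPL, so any concrete value used for it (such as 1) is an assumption.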
As demonstrated in the previous examples, the addressable domain of a program or task varies as its CPL changes. When the CPL is 0, data segments at all privilege levels are accessible; when the CPL is 1, only data segments at privilege levels 1 through 3 are accessible; when the CPL is 3, only data segments at privilege level 3 are accessible.
The RPL of a segment selector can always override the addressable domain of a program or task. When properly used, RPLs can prevent problems caused by accidental (or intentional) use of segment selectors for privileged data segments by less privileged programs or procedures.
It is important to note that the RPL of a segment selector for a data segment is under software control. For example, an application program running at a CPL of 3 can set the RPL for a data-segment selector to 0. With the RPL set to 0, only the CPL checks, not the RPL checks, will provide protection against deliberate, direct attempts to violate privilege-level security for the data segment. To prevent these types of privilege-level-check violations, a program or procedure can check access privileges whenever it receives a data-segment selector from another procedure (refer to Section 4.10.4., Checking Caller Access Privileges (ARPL Instruction)).
4.6.1. Accessing Data in Code Segments
In some instances it may be desirable to access data structures that are contained in a code segment. The following methods of accessing data in code segments are possible:
1. Load a data-segment register with a segment selector for a nonconforming, readable code segment.
2. Load a data-segment register with a segment selector for a conforming, readable code segment.
3. Use a code-segment override prefix (CS) to read a readable code segment whose selector is already loaded in the CS register.
The same rules for accessing data segments apply to method 1. Method 2 is always valid because the privilege level of a conforming code segment is effectively the same as the CPL, regardless of its DPL. Method 3 is always valid because the DPL of the code segment selected by the CS register is the same as the CPL.
4.7. PRIVILEGE LEVEL CHECKING WHEN LOADING THE SS REGISTER
Privilege level checking also occurs when the SS register is loaded with the segment selector for a stack segment. Here all privilege levels related to the stack segment must match the CPL; that is, the CPL, the RPL of the stack-segment selector, and the DPL of the stack-segment descriptor must be the same. If the RPL and DPL are not equal to the CPL, a general-protection exception (#GP) is generated.
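The stack-segment rule is stricter than the data-segment rule: all three privilege levels must be equal. A minimal C sketch, with a hypothetical function name:

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if the SS register may be loaded; false corresponds to a
 * general-protection exception (#GP). */
bool ss_load_allowed(unsigned cpl, unsigned rpl, unsigned dpl)
{
    assert(cpl <= 3 && rpl <= 3 && dpl <= 3);
    return rpl == cpl && dpl == cpl;
}
```

A program at CPL 3 therefore cannot use a stack segment with a DPL of 0, and a program at CPL 0 cannot use one with a DPL of 3, even though the latter would pass the data-segment check.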
4.8. PRIVILEGE LEVEL CHECKING WHEN TRANSFERRING PROGRAM CONTROL BETWEEN CODE SEGMENTS
To transfer program control from one code segment to another, the segment selector for the destination code segment must be loaded into the code-segment register (CS). As part of this loading process, the processor examines the segment descriptor for the destination code segment and performs various limit, type, and privilege checks. If these checks are successful, the CS register is loaded, program control is transferred to the new code segment, and program execution begins at the instruction pointed to by the EIP register. Program control transfers are carried out with the JMP, CALL, RET, INT n, and IRET instructions, as well as by the exception and interrupt mechanisms. Exceptions, interrupts, and the IRET instruction are special cases discussed in Chapter 5, Interrupt and Exception Handling. This chapter discusses only the JMP, CALL, and RET instructions. A JMP or CALL instruction can reference another code segment in any of four ways:
• The target operand contains the segment selector for the target code segment.
• The target operand points to a call-gate descriptor, which contains the segment selector for the target code segment.
• The target operand points to a TSS, which contains the segment selector for the target code segment.
• The target operand points to a task gate, which points to a TSS, which in turn contains the segment selector for the target code segment.
The following sections describe the first two types of references. Refer to Section 6.3., Task Switching in Chapter 6, Task Management for information on transferring program control through a task gate and/or TSS.
4.8.1. Direct Calls or Jumps to Code Segments
The near forms of the JMP, CALL, and RET instructions transfer program control within the current code segment, so privilege-level checks are not performed. The far forms of the JMP, CALL, and RET instructions transfer control to other code segments, so the processor does perform privilege-level checks. When transferring program control to another code segment without going through a call gate, the processor examines four kinds of privilege level and type information (refer to Figure 4-5):
• The CPL. (Here, the CPL is the privilege level of the calling code segment; that is, the code segment that contains the procedure that is making the call or jump.)
• The DPL of the segment descriptor for the destination code segment that contains the called procedure.
• The RPL of the segment selector of the destination code segment.
• The conforming (C) flag in the segment descriptor for the destination code segment, which determines whether the segment is a conforming (C flag is set) or nonconforming (C flag is clear) code segment. (Refer to Section 3.4.3.1., Code- and Data-Segment Descriptor Types in Chapter 3, Protected-Mode Memory Management for more information about this flag.)
[Figure 4-5. Privilege Check for Control Transfer Without Using a Gate: the CPL (from the CS register), the RPL (from the segment selector for the code segment), and the DPL and C flag (from the destination code-segment descriptor) are the inputs to the privilege check.]
The rules that the processor uses to check the CPL, RPL, and DPL depend on the setting of the C flag, as described in the following sections.
4.8.1.1. ACCESSING NONCONFORMING CODE SEGMENTS
When accessing nonconforming code segments, the CPL of the calling procedure must be equal to the DPL of the destination code segment; otherwise, the processor generates a general-protection exception (#GP). For example, in Figure 4-6, code segment C is a nonconforming code segment. Therefore, a procedure in code segment A can call a procedure in code segment C (using segment selector C1), because they are at the same privilege level (the CPL of code segment A is equal to the DPL of code segment C). However, a procedure in code segment B cannot call a procedure in code segment C (using segment selector C2 or C1), because the two code segments are at different privilege levels.
[Figure 4-6. Examples of Accessing Conforming and Nonconforming Code Segments From Various Privilege Levels: Code Segment B, CPL=3 (lowest privilege); Code Segment A, CPL=2; Segment Sel. C1, RPL=2; Segment Sel. C2, RPL=3; Segment Sel. D1, RPL=2; Segment Sel. D2, RPL=3; Code Segment C, DPL=2 (nonconforming code segment); Code Segment D, DPL=3 (conforming code segment).]
The RPL of the segment selector that points to a nonconforming code segment has a limited effect on the privilege check. The RPL must be numerically less than or equal to the CPL of the calling procedure for a successful control transfer to occur. So, in the example in Figure 4-6, the RPLs of segment selectors C1 and C2 could legally be set to 0, 1, or 2, but not to 3.
When the segment selector of a nonconforming code segment is loaded into the CS register, the privilege level field is not changed; that is, it remains at the CPL (which is the privilege level of the calling procedure). This is true even if the RPL of the segment selector is different from the CPL.
4.8.1.2. ACCESSING CONFORMING CODE SEGMENTS
When accessing conforming code segments, the CPL of the calling procedure may be numerically equal to or greater than (less privileged than) the DPL of the destination code segment; the processor generates a general-protection exception (#GP) only if the CPL is less than the DPL. (The segment selector RPL for the destination code segment is not checked if the segment is a conforming code segment.)
In the example in Figure 4-6, code segment D is a conforming code segment. Therefore, calling procedures in both code segment A and B can access code segment D (using either segment selector D1 or D2, respectively), because they both have CPLs that are greater than or equal to the DPL of the conforming code segment. For conforming code segments, the DPL represents the numerically lowest privilege level that a calling procedure may be at to successfully make a call to the code segment. (Note that segment selectors D1 and D2 are identical except for their respective RPLs. But since RPLs are not checked when accessing conforming code segments, the two segment selectors are essentially interchangeable.)
When program control is transferred to a conforming code segment, the CPL does not change, even if the DPL of the destination code segment is less than the CPL. This situation is the only one where the CPL may be different from the DPL of the current code segment. Also, since the CPL does not change, no stack switch occurs.
Conforming segments are used for code modules such as math libraries and exception handlers, which support applications but do not require access to protected system facilities. These modules are part of the operating system or executive software, but they can be executed at numerically higher privilege levels (less privileged levels). Keeping the CPL at the level of a calling code segment when switching to a conforming code segment prevents an application program from accessing nonconforming code segments while at the privilege level (DPL) of a conforming code segment and thus prevents it from accessing more privileged data.
Most code segments are nonconforming. For these segments, program control can be transferred only to code segments at the same level of privilege, unless the transfer is carried out through a call gate, as described in the following sections.
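The rules of Sections 4.8.1.1. and 4.8.1.2. for far calls and jumps that do not use a gate can be combined into one check. The following C sketch is a model only, with a hypothetical function name; in both cases the CPL is left unchanged by a successful transfer.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if a direct far CALL or JMP is permitted; false
 * corresponds to a general-protection exception (#GP). */
bool direct_transfer_allowed(unsigned cpl, unsigned rpl, unsigned dpl,
                             bool conforming)
{
    assert(cpl <= 3 && rpl <= 3 && dpl <= 3);

    if (conforming)
        return cpl >= dpl;            /* the RPL is not checked */

    return cpl == dpl && rpl <= cpl;  /* nonconforming: same level only */
}
```

For the nonconforming code segment C of Figure 4-6 (DPL 2), a caller at CPL 2 using a selector with an RPL of 2 succeeds, while a caller at CPL 3 fails with any selector. For a conforming segment, the sketch accepts any caller whose CPL is numerically greater than or equal to the segment's DPL.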
4.8.2. Gate Descriptors
To provide controlled access to code segments with different privilege levels, the processor provides a special set of descriptors called gate descriptors. There are four kinds of gate descriptors:
• Call gates
• Trap gates
• Interrupt gates
• Task gates
Task gates are used for task switching and are discussed in Chapter 6, Task Management. Trap and interrupt gates are special kinds of call gates used for calling exception and interrupt handlers. They are described in Chapter 5, Interrupt and Exception Handling. This chapter is concerned only with call gates.
4.8.3. Call Gates
Call gates facilitate controlled transfers of program control between different privilege levels. They are typically used only in operating systems or executives that use the privilege-level protection mechanism. Call gates are also useful for transferring program control between 16-bit and 32-bit code segments, as described in Section 17.4., Transferring Control Among Mixed-Size Code Segments in Chapter 17, Mixing 16-Bit and 32-Bit Code.
Figure 4-7 shows the format of a call-gate descriptor. A call-gate descriptor may reside in the GDT or in an LDT, but not in the interrupt descriptor table (IDT). It performs six functions:
• It specifies the code segment to be accessed.
• It defines an entry point for a procedure in the specified code segment.
• It specifies the privilege level required for a caller trying to access the procedure.
• If a stack switch occurs, it specifies the number of optional parameters to be copied between stacks.
• It defines the size of values to be pushed onto the target stack: 16-bit gates force 16-bit pushes and 32-bit gates force 32-bit pushes.
• It specifies whether the call-gate descriptor is valid.
Upper doubleword:
  Bits 31:16   Offset in Segment 31:16
  Bit 15       P (Gate Valid)
  Bits 14:13   DPL (Descriptor Privilege Level)
  Bit 12       0
  Bits 11:8    Type (1 1 0 0)
  Bits 7:5     0 0 0
  Bits 4:0     Param. Count
Lower doubleword:
  Bits 31:16   Segment Selector
  Bits 15:0    Offset in Segment 15:00
Figure 4-7. Call-Gate Descriptor
The segment selector field in a call gate specifies the code segment to be accessed. The offset field specifies the entry point in the code segment. This entry point is generally the first instruction of a specific procedure. The DPL field indicates the privilege level of the call gate, which in turn is the privilege level required to access the selected procedure through the gate. The P flag indicates whether the call-gate descriptor is valid. (The presence of the code segment to which the gate points is indicated by the P flag in the code segment's descriptor.) The parameter count field indicates the number of parameters to copy from the calling procedure's stack to the new stack if a stack switch occurs (refer to Section 4.8.5., Stack Switching). The parameter count specifies the number of words for 16-bit call gates and doublewords for 32-bit call gates.
Note that the P flag in a gate descriptor is normally set to 1. If it is set to 0, a not-present (#NP) exception is generated when a program attempts to access the descriptor. The operating system can use the P flag for special purposes. For example, it could be used to track the number of times the gate is used. Here, the P flag is initially set to 0, causing a trap to the not-present exception handler. The exception handler then increments a counter and sets the P flag to 1, so that on returning from the handler, the gate descriptor will be valid.
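The field layout of Figure 4-7 can be illustrated by packing and unpacking a 32-bit call-gate descriptor as two doublewords. The following C sketch is one possible representation; the structure, the function names, and the CALL_GATE_TYPE_32 macro are hypothetical, and the type value 1100B is taken from the figure.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t offset;       /* entry point within the code segment */
    uint16_t selector;     /* selector of the destination code segment */
    uint8_t  param_count;  /* 0..31 parameters to copy on a stack switch */
    uint8_t  dpl;          /* privilege level of the gate, 0..3 */
    bool     present;      /* P flag */
} call_gate;

#define CALL_GATE_TYPE_32 0xCu   /* type field 1100B */

/* words[0] is the lower doubleword, words[1] the upper doubleword. */
void call_gate_encode(const call_gate *g, uint32_t words[2])
{
    words[0] = ((uint32_t)g->selector << 16) | (g->offset & 0xFFFFu);
    words[1] = (g->offset & 0xFFFF0000u)
             | ((g->present ? 1u : 0u) << 15)
             | ((uint32_t)(g->dpl & 0x3u) << 13)
             | (CALL_GATE_TYPE_32 << 8)         /* bit 12 stays 0 */
             | (uint32_t)(g->param_count & 0x1Fu);
}

call_gate call_gate_decode(const uint32_t words[2])
{
    call_gate g;
    g.offset      = (words[1] & 0xFFFF0000u) | (words[0] & 0xFFFFu);
    g.selector    = (uint16_t)(words[0] >> 16);
    g.param_count = (uint8_t)(words[1] & 0x1Fu);
    g.dpl         = (uint8_t)((words[1] >> 13) & 0x3u);
    g.present     = ((words[1] >> 15) & 1u) != 0;
    return g;
}
```

The sketch handles only the 32-bit gate format shown in the figure; it does not validate the type field on decode.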
4.8.4. Accessing a Code Segment Through a Call Gate
To access a call gate, a far pointer to the gate is provided as a target operand in a CALL or JMP instruction. The segment selector from this pointer identifies the call gate (refer to Figure 4-8); the offset from the pointer is required, but not used or checked by the processor. (The offset can be set to any value.) When the processor has accessed the call gate, it uses the segment selector from the call gate to locate the segment descriptor for the destination code segment. (This segment descriptor can be in the GDT or the LDT.) It then combines the base address from the code-segment descriptor with the offset from the call gate to form the linear address of the procedure entry point in the code segment. As shown in Figure 4-9, four different privilege levels are used to check the validity of a program control transfer through a call gate:
• The CPL (current privilege level).
• The RPL (requestor's privilege level) of the call gate's selector.
• The DPL (descriptor privilege level) of the call gate descriptor.
• The DPL of the segment descriptor of the destination code segment.
The C flag (conforming) in the segment descriptor for the destination code segment is also checked.
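[Figure 4-8. Call-Gate Mechanism: the segment selector of the far pointer selects the call-gate descriptor in a descriptor table (the offset of the far pointer is required but not used by the processor); the segment selector in the call-gate descriptor selects the code-segment descriptor; the base from the code-segment descriptor is added to the offset from the call-gate descriptor to give the procedure entry point.]
[Figure 4-9. Privilege Check for Control Transfer with Call Gate: the CPL (from the CS register), the RPL (from the call-gate selector), the DPL of the call gate (descriptor), and the DPL of the destination code-segment descriptor are the inputs to the privilege check.]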
The privilege checking rules are different depending on whether the control transfer was initiated with a CALL or a JMP instruction, as shown in Table 4-1.
Table 4-1. Privilege Check Rules for Call Gates
Instruction   Privilege Check Rules
CALL          CPL ≤ call gate DPL; RPL ≤ call gate DPL
              Destination conforming code segment DPL ≤ CPL
              Destination nonconforming code segment DPL ≤ CPL
JMP           CPL ≤ call gate DPL; RPL ≤ call gate DPL
              Destination conforming code segment DPL ≤ CPL
              Destination nonconforming code segment DPL = CPL
The DPL field of the call-gate descriptor specifies the numerically highest privilege level from which a calling procedure can access the call gate; that is, to access a call gate, the CPL of a calling procedure must be equal to or less than the DPL of the call gate. For example, in Figure 4-10, call gate A has a DPL of 3. So calling procedures at all CPLs (0 through 3) can access this call gate, which includes calling procedures in code segments A, B, and C. Call gate B has a DPL of 2, so only calling procedures at a CPL of 0, 1, or 2 can access call gate B, which includes calling procedures in code segments B and C. The dotted line shows that a calling procedure in code segment A cannot access call gate B.
The RPL of the segment selector to a call gate must satisfy the same test as the CPL of the calling procedure; that is, the RPL must be less than or equal to the DPL of the call gate. In the example in Figure 4-10, a calling procedure in code segment C can access call gate B using gate selector B2 or B1, but it could not use gate selector B3 to access call gate B.
If the privilege checks between the calling procedure and call gate are successful, the processor then checks the DPL of the code-segment descriptor against the CPL of the calling procedure. Here, the privilege check rules vary between CALL and JMP instructions. Only CALL instructions can use call gates to transfer program control to more privileged (numerically lower privilege level) nonconforming code segments; that is, to nonconforming code segments with a DPL less than the CPL. A JMP instruction can use a call gate only to transfer program control to a nonconforming code segment with a DPL equal to the CPL. CALL and JMP instructions can both transfer program control to a more privileged conforming code segment; that is, to a conforming code segment with a DPL less than or equal to the CPL.
If a call is made to a more privileged (numerically lower privilege level) nonconforming destination code segment, the CPL is lowered to the DPL of the destination code segment and a stack switch occurs (refer to Section 4.8.5., Stack Switching). If a call or jump is made to a more privileged conforming destination code segment, the CPL is not changed and no stack switch occurs.
[Figure 4-10. Example of Accessing Call Gates At Various Privilege Levels: Code Segment A, CPL=3 (lowest privilege); Code Segment B, CPL=2; Code Segment C, CPL=1; Gate Selector A, RPL=3; Gate Selector B3, RPL=3; Gate Selector B1, RPL=2; Gate Selector B2, RPL=1; Call Gate A, DPL=3; Call Gate B, DPL=2; Code Segment E, DPL=0 (nonconforming code segment, stack switch occurs); Code Segment D, DPL=0 (conforming code segment, no stack switch occurs; highest privilege).]
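The two-stage check described above (caller against gate, then caller against destination code segment) and its effect on the CPL can be modeled as follows. This C sketch follows the rules of Table 4-1; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { XFER_CALL, XFER_JMP } xfer_kind;

typedef struct {
    bool     allowed;       /* false corresponds to #GP */
    unsigned new_cpl;       /* CPL after the transfer */
    bool     stack_switch;  /* true if a stack switch occurs */
} gate_result;

gate_result call_gate_check(xfer_kind kind, unsigned cpl, unsigned rpl,
                            unsigned gate_dpl, unsigned code_dpl,
                            bool conforming)
{
    assert(cpl <= 3 && rpl <= 3 && gate_dpl <= 3 && code_dpl <= 3);
    gate_result r = { false, cpl, false };

    /* Stage 1: both the CPL and the RPL must pass the gate's DPL. */
    if (cpl > gate_dpl || rpl > gate_dpl)
        return r;

    /* Stage 2: the destination code segment's DPL against the CPL. */
    if (conforming) {
        r.allowed = code_dpl <= cpl;       /* CPL unchanged, no switch */
        return r;
    }
    if (kind == XFER_JMP) {
        r.allowed = code_dpl == cpl;       /* same level only */
        return r;
    }
    if (code_dpl <= cpl) {                 /* CALL to nonconforming */
        r.allowed = true;
        r.stack_switch = code_dpl < cpl;
        r.new_cpl = code_dpl;              /* CPL lowered to the DPL */
    }
    return r;
}
```

Using the Figure 4-10 values, a caller at CPL 3 is refused at call gate B (DPL 2) in stage 1, and a caller at CPL 1 is refused there if it uses gate selector B3 (RPL 3) but passes with B2 (RPL 1).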
Call gates allow a single code segment to have procedures that can be accessed at different privilege levels. For example, an operating system located in a code segment may have some services which are intended to be used by both the operating system and application software (such as procedures for handling character I/O). Call gates for these procedures can be set up that allow access at all privilege levels (0 through 3). More privileged call gates (with DPLs of 0 or 1) can then be set up for other operating system services that are intended to be used only by the operating system (such as procedures that initialize device drivers).
4.8.5. Stack Switching
Whenever a call gate is used to transfer program control to a more privileged nonconforming code segment (that is, when the DPL of the nonconforming destination code segment is less than the CPL), the processor automatically switches to the stack for the destination code segment's privilege level. This stack switching is carried out to prevent more privileged procedures from crashing due to insufficient stack space. It also prevents less privileged procedures from interfering (by accident or intent) with more privileged procedures through a shared stack.
Each task must define up to 4 stacks: one for applications code (running at privilege level 3) and one for each of the privilege levels 2, 1, and 0 that are used. (If only two privilege levels are used [3 and 0], then only two stacks must be defined.) Each of these stacks is located in a separate segment and is identified with a segment selector and an offset into the stack segment (a stack pointer).
The segment selector and stack pointer for the privilege level 3 stack are located in the SS and ESP registers, respectively, when privilege-level-3 code is being executed and are automatically stored on the called procedure's stack when a stack switch occurs.
Pointers to the privilege level 0, 1, and 2 stacks are stored in the TSS for the currently running task (refer to Figure 6-2 in Chapter 6, Task Management). Each of these pointers consists of a segment selector and a stack pointer (loaded into the ESP register). These initial pointers are strictly read-only values. The processor does not change them while the task is running. They are used only to create new stacks when calls are made to more privileged levels (numerically lower privilege levels). These stacks are disposed of when a return is made from the called procedure. The next time the procedure is called, a new stack is created using the initial stack pointer. (The TSS does not specify a stack for privilege level 3 because the processor does not allow a transfer of program control from a procedure running at a CPL of 0, 1, or 2 to a procedure running at a CPL of 3, except on a return.)
The operating system is responsible for creating stacks and stack-segment descriptors for all the privilege levels to be used and for loading initial pointers for these stacks into the TSS. Each stack must be read/write accessible (as specified in the type field of its segment descriptor) and must contain enough space (as specified in the limit field) to hold the following items:
• The contents of the SS, ESP, CS, and EIP registers for the calling procedure.
• The parameters and temporary variables required by the called procedure.
• The EFLAGS register and error code, when implicit calls are made to an exception or interrupt handler.
Each stack needs enough space to hold many frames of these items, because procedures often call other procedures, and an operating system may support nesting of multiple interrupts. Each stack should be large enough to allow for the worst-case nesting scenario at its privilege level. (If the operating system does not use the processor's multitasking mechanism, it still must create at least one TSS for this stack-related purpose.)
When a procedure call through a call gate results in a change in privilege level, the processor performs the following steps to switch stacks and begin execution of the called procedure at a new privilege level:
1. Uses the DPL of the destination code segment (the new CPL) to select a pointer to the new stack (segment selector and stack pointer) from the TSS.
2. Reads the segment selector and stack pointer for the stack to be switched to from the current TSS. Any limit violations detected while reading the stack-segment selector, stack pointer, or stack-segment descriptor cause an invalid TSS (#TS) exception to be generated.
3. Checks the stack-segment descriptor for the proper privileges and type and generates an invalid TSS (#TS) exception if violations are detected.
4. Temporarily saves the current values of the SS and ESP registers.
5. Loads the segment selector and stack pointer for the new stack in the SS and ESP registers.
6. Pushes the temporarily saved values for the SS and ESP registers (for the calling procedure) onto the new stack (refer to Figure 4-11).
7. Copies the number of parameters specified in the parameter count field of the call gate from the calling procedure's stack to the new stack. If the count is 0, no parameters are copied.
8. Pushes the return instruction pointer (the current contents of the CS and EIP registers) onto the new stack.
9. Loads the segment selector for the new code segment and the new instruction pointer from the call gate into the CS and EIP registers, respectively, and begins execution of the called procedure.
Refer to the description of the CALL instruction in Chapter 3, Instruction Set Reference, in the Intel Architecture Software Developer's Manual, Volume 2, for a detailed description of the privilege level checks and other protection checks that the processor performs on a far call through a call gate.
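The stack-switching steps above can be illustrated with a small Python model. This is only a sketch under stated assumptions: the function name, the dictionary used for the TSS stack pointers, and the tagged tuples that stand for pushed values are all hypothetical, and the descriptor checks of step 3 are not modeled.

```python
def switch_stack_for_call(tss, new_cpl, caller_ss, caller_esp,
                          caller_stack, param_count, return_cs, return_eip):
    """Return (new SS, initial ESP, items pushed onto the new stack in order).

    tss          -- hypothetical map: privilege level -> (SS selector, ESP)
    caller_stack -- caller's stack items in push order (last item is the top)
    """
    if not 0 <= param_count <= 31:
        raise ValueError("the call-gate parameter count field holds 0..31")
    if param_count > len(caller_stack):
        raise ValueError("caller stack holds fewer items than the count")
    if new_cpl not in tss:
        # Hypothetical stand-in for the invalid-TSS (#TS) exception.
        raise LookupError("#TS: no stack pointer for this privilege level")

    # Steps 1-2: the new CPL selects the stack pointer stored in the TSS.
    new_ss, initial_esp = tss[new_cpl]

    # Steps 4-6: the caller's SS and ESP are saved, then pushed first.
    pushed = [("SS", caller_ss), ("ESP", caller_esp)]

    # Step 7: copy the top param_count items, keeping their order.
    first = len(caller_stack) - param_count
    pushed += [("PARAM", value) for value in caller_stack[first:]]

    # Step 8: push the return instruction pointer (CS, then EIP).
    pushed += [("CS", return_cs), ("EIP", return_eip)]
    return new_ss, initial_esp, pushed
```

The resulting order (calling SS, calling ESP, parameters, calling CS, calling EIP) matches the layout of the called procedure's stack in Figure 4-11.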
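[Figure: Calling procedure's stack, in push order: Parameter 1, Parameter 2, Parameter 3 (ESP points to Parameter 3). Called procedure's stack, in push order: Calling SS, Calling ESP, Parameter 1, Parameter 2, Parameter 3, Calling CS, Calling EIP (ESP points to Calling EIP).]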
Figure 4-11. Stack Switching During an Interprivilege-Level Call
The parameter count field in a call gate specifies the number of data items (up to 31) that the processor should copy from the calling procedure's stack to the stack of the called procedure. If more than 31 data items need to be passed to the called procedure, one of the parameters can be a pointer to a data structure, or the saved contents of the SS and ESP registers may be used to access parameters in the old stack space. The size of the data items passed to the called procedure depends on the call gate size, as described in Section 4.8.3., Call Gates.
4.8.6. Returning from a Called Procedure
The RET instruction can be used to perform a near return, a far return at the same privilege level, and a far return to a different privilege level. This instruction is intended to execute returns from procedures that were called with a CALL instruction. It does not support returns from a JMP instruction, because the JMP instruction does not save a return instruction pointer on the stack. A near return only transfers program control within the current code segment; therefore, the processor performs only a limit check. When the processor pops the return instruction pointer from the stack into the EIP register, it checks that the pointer does not exceed the limit of the current code segment. On a far return at the same privilege level, the processor pops both a segment selector for the code segment being returned to and a return instruction pointer from the stack. Under normal conditions, these pointers should be valid, because they were pushed on the stack by the CALL instruction. However, the processor performs privilege checks to detect situations where the current procedure might have altered the pointer or failed to maintain the stack properly.
A far return that requires a privilege-level change is only allowed when returning to a less privileged level (that is, the DPL of the return code segment is numerically greater than the CPL). The processor uses the RPL field from the CS register value saved for the calling procedure (refer to Figure 4-11) to determine if a return to a numerically higher privilege level is required. If the RPL is numerically greater (less privileged) than the CPL, a return across privilege levels occurs.
The processor performs the following steps when performing a far return to a calling procedure (refer to Figures 4-2 and 4-4 in the Intel Architecture Software Developer's Manual, Volume 1, for an illustration of the stack contents prior to and after a return):
1. Checks the RPL field of the saved CS register value to determine if a privilege level change is required on the return.
2. Loads the CS and EIP registers with the values on the called procedure's stack. (Type and privilege level checks are performed on the code-segment descriptor and RPL of the code-segment selector.)
3. (If the RET instruction includes a parameter count operand and the return requires a privilege level change.) Adds the parameter count (in bytes obtained from the RET instruction) to the current ESP register value (after popping the CS and EIP values), to step past the parameters on the called procedure's stack. The resulting value in the ESP register points to the saved SS and ESP values for the calling procedure's stack. (Note that the byte count in the RET instruction must be chosen to match the parameter count in the call gate that the calling procedure referenced when it made the original call, multiplied by the size of the parameters.)
4. (If the return requires a privilege level change.) Loads the SS and ESP registers with the saved SS and ESP values and switches back to the calling procedure's stack. The SS and ESP values for the called procedure's stack are discarded. Any limit violations detected while loading the stack-segment selector or stack pointer cause a general-protection exception (#GP) to be generated. The new stack-segment descriptor is also checked for type and privilege violations.
5. (If the RET instruction includes a parameter count operand.) Adds the parameter count (in bytes obtained from the RET instruction) to the current ESP register value, to step past the parameters on the calling procedure's stack. The resulting ESP value is not checked against the limit of the stack segment. If the ESP value is beyond the limit, that fact is not recognized until the next stack operation.
6. (If the return requires a privilege level change.) Checks the contents of the DS, ES, FS, and GS segment registers. If any of these registers refer to segments whose DPL is less than the new CPL (excluding conforming code segments), the segment register is loaded with a null segment selector.
Refer to the description of the RET instruction in Chapter 3, Instruction Set Reference, of the Intel Architecture Software Developer's Manual, Volume 2, for a detailed description of the privilege level checks and other protection checks that the processor performs on a far return.
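Three parts of the far-return sequence lend themselves to a short Python sketch: the RPL test of step 1, the byte count that a RET operand must carry (steps 3 and 5), and the clearing of data-segment registers in step 6. The function names and the tuple layout are hypothetical. The parameter sizes (2 bytes for a 16-bit call gate, 4 bytes for a 32-bit call gate) are an assumption taken from the call-gate description rather than from this section.

```python
def far_return_changes_privilege(saved_cs_selector, cpl):
    """Step 1: compare the RPL of the saved CS selector with the CPL."""
    rpl = saved_cs_selector & 0x3
    if rpl < cpl:
        # A return to a more privileged level is not allowed.
        # PermissionError is a hypothetical stand-in for the exception raised.
        raise PermissionError("return to a more privileged level")
    return rpl > cpl


def ret_byte_count(gate_param_count, gate_is_32_bit):
    """Byte operand for RET: gate parameter count times parameter size."""
    return gate_param_count * (4 if gate_is_32_bit else 2)


def clear_stale_data_segments(segment_registers, new_cpl):
    """Step 6: null DS/ES/FS/GS selectors that the new CPL may not keep.

    segment_registers maps a register name to a hypothetical tuple
    (selector, dpl, is_conforming_code) describing what it refers to.
    """
    result = {}
    for name, (selector, dpl, is_conforming_code) in segment_registers.items():
        if dpl < new_cpl and not is_conforming_code:
            result[name] = 0          # null segment selector
        else:
            result[name] = selector
    return result
```

For example, a procedure reached through a 32-bit call gate with a parameter count of 3 would return with a byte count of 12.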
4.9. PRIVILEGED INSTRUCTIONS
Some of the system instructions (called privileged instructions) are protected from use by application programs. The privileged instructions control system functions (such as the loading of system registers). They can be executed only when the CPL is 0 (most privileged). If one of these instructions is executed when the CPL is not 0, a general-protection exception (#GP) is generated. The following system instructions are privileged instructions:
• LGDT: Load GDT register.
• LLDT: Load LDT register.
• LTR: Load task register.
• LIDT: Load IDT register.
• MOV (control registers): Load and store control registers.
• LMSW: Load machine status word.
• CLTS: Clear task-switched flag in register CR0.
• MOV (debug registers): Load and store debug registers.
• INVD: Invalidate cache, without writeback.
• WBINVD: Invalidate cache, with writeback.
• INVLPG: Invalidate TLB entry.
• HLT: Halt processor.
• RDMSR: Read Model-Specific Registers.
• WRMSR: Write Model-Specific Registers.
• RDPMC: Read Performance-Monitoring Counter.
• RDTSC: Read Time-Stamp Counter.
Some of the privileged instructions are available only in the more recent families of Intel Architecture processors (refer to Section 18.7., New Instructions In the Pentium and Later Intel Architecture Processors, in Chapter 18, Intel Architecture Compatibility). The PCE and TSD flags in register CR4 (bits 8 and 2, respectively) control whether the RDPMC and RDTSC instructions, respectively, can be executed at any CPL: RDPMC can be executed at any CPL when the PCE flag is set, and RDTSC can be executed at any CPL when the TSD flag is clear.
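The privilege rule for these instructions can be sketched in Python as follows. The function name and the mnemonic strings are hypothetical. The sketch assumes the CR4 layout in which TSD is bit 2 and PCE is bit 8, with TSD acting as a disable flag (RDTSC is restricted to CPL 0 only when TSD is set) and PCE acting as an enable flag.

```python
CR4_TSD = 1 << 2   # time-stamp disable: when set, RDTSC requires CPL 0
CR4_PCE = 1 << 8   # performance-counter enable: when set, RDPMC at any CPL

# Instructions from the list above that always require CPL 0.
ALWAYS_PRIVILEGED = {
    "LGDT", "LLDT", "LTR", "LIDT", "MOV CR", "LMSW", "CLTS", "MOV DR",
    "INVD", "WBINVD", "INVLPG", "HLT", "RDMSR", "WRMSR",
}


def may_execute(mnemonic, cpl, cr4=0):
    """Return True if the instruction may run at this CPL in protected mode."""
    if mnemonic == "RDTSC":
        return cpl == 0 or not (cr4 & CR4_TSD)
    if mnemonic == "RDPMC":
        return cpl == 0 or bool(cr4 & CR4_PCE)
    if mnemonic in ALWAYS_PRIVILEGED:
        return cpl == 0
    return True   # not a privileged instruction
```

A False result corresponds to the general-protection exception (#GP) described above.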
4.10. POINTER VALIDATION

When operating in protected mode, the processor validates all pointers to enforce protection between segments and maintain isolation between privilege levels. Pointer validation consists of the following checks:
1. Checking access rights to determine if the segment type is compatible with its use.
2. Checking read/write rights.
3. Checking if the pointer offset exceeds the segment limit.
4. Checking if the supplier of the pointer is allowed to access the segment.
5. Checking the offset alignment.
The processor automatically performs the first, second, and third checks during instruction execution. Software must explicitly request the fourth check by issuing an ARPL instruction. The fifth check (offset alignment) is performed automatically at privilege level 3 if alignment checking is turned on. Offset alignment does not affect isolation of privilege levels.
4.10.1. Checking Access Rights (LAR Instruction)

When the processor accesses a segment using a far pointer, it performs an access rights check on the segment descriptor pointed to by the far pointer. This check is performed to determine if the type and privilege level (DPL) of the segment descriptor are compatible with the operation to be performed. For example, when making a far call in protected mode, the segment-descriptor type must be for a conforming or nonconforming code segment, a call gate, a task gate, or a TSS. Then, if the call is to a nonconforming code segment, the DPL of the code segment must be equal to the CPL, and the RPL of the code segment's segment selector must be less than or equal to the DPL. If the type or privilege level is found to be incompatible, the appropriate exception is generated.
To prevent type incompatibility exceptions from being generated, software can check the access rights of a segment descriptor using the LAR (load access rights) instruction. The LAR instruction specifies the segment selector for the segment descriptor whose access rights are to be checked and a destination register. The instruction then performs the following operations:
1. Checks that the segment selector is not null.
2. Checks that the segment selector points to a segment descriptor that is within the descriptor table limit (GDT or LDT).
3. Checks that the segment descriptor is a code, data, LDT, call gate, task gate, or TSS segment-descriptor type.
4. If the segment is not a conforming code segment, checks if the segment descriptor is visible at the CPL (that is, if the CPL and the RPL of the segment selector are less than or equal to the DPL).
5. If the privilege level and type checks pass, loads the second doubleword of the segment descriptor into the destination register (masked by the value 00FXFF00H, where X indicates that the corresponding 4 bits are undefined) and sets the ZF flag in the EFLAGS register.
If the segment selector is not visible at the current privilege level or is an invalid type for the LAR instruction, the instruction does not modify the destination register and clears the ZF flag. Once the access rights information is loaded in the destination register, software can perform additional checks on it.
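The LAR operations can be modeled with a brief Python sketch. It takes the second doubleword of the descriptor directly, so the descriptor-table limit check (operation 2) is not modeled. The function names are hypothetical, the undefined nibble of the 00FXFF00H mask is cleared here for determinism, and the set of system-descriptor type codes accepted by LAR is an assumption based on the instruction reference rather than on this section.

```python
LAR_MASK = 0x00F0FF00   # 00FXFF00H with the undefined 4 bits cleared

# Assumed system-descriptor type codes valid for LAR:
# TSS (1, 3, 9, B), LDT (2), call gates (4, C), task gate (5).
LAR_SYSTEM_TYPES = {0x1, 0x2, 0x3, 0x4, 0x5, 0x9, 0xB, 0xC}


def lar(selector, descriptor_high, cpl):
    """Return (ZF, access rights); access rights is None when ZF is clear."""
    if (selector & 0xFFFC) == 0:
        return False, None                      # null selector
    seg_type = (descriptor_high >> 8) & 0xF
    s_flag = (descriptor_high >> 12) & 1        # 1 = code or data segment
    dpl = (descriptor_high >> 13) & 3
    if s_flag == 0 and seg_type not in LAR_SYSTEM_TYPES:
        return False, None                      # invalid type for LAR
    conforming_code = s_flag == 1 and (seg_type & 0xC) == 0xC
    rpl = selector & 3
    if not conforming_code and max(cpl, rpl) > dpl:
        return False, None                      # not visible at this CPL
    return True, descriptor_high & LAR_MASK


def decode_access_rights(value):
    """Split a LAR result into fields for additional software checks."""
    return {
        "type": (value >> 8) & 0xF,
        "s": (value >> 12) & 1,
        "dpl": (value >> 13) & 3,
        "present": (value >> 15) & 1,
        "granularity": (value >> 23) & 1,
    }
```

Software would test the first element of the result (the ZF flag) before trusting the second, in the same way that code following a LAR instruction tests ZF before using the destination register.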
4.10.2. Checking Read/Write Rights (VERR and VERW Instructions)

When the processor accesses any code or data segment, it checks the read/write privileges assigned to the segment to verify that the intended read or write operation is allowed. Software can check read/write rights using the VERR (verify for reading) and VERW (verify for writing) instructions. Both these instructions specify the segment selector for the segment being checked. The instructions then perform the following operations:
1. Checks that the segment selector is not null.
2. Checks that the segment selector points to a segment descriptor that is within the descriptor table limit (GDT or LDT).
3. Checks that the segment descriptor is a code or data-segment descriptor type.
4. If the segment is not a conforming code segment, checks if the segment descriptor is visible at the CPL (that is, if the CPL and the RPL of the segment selector are less than or equal to the DPL).
5. Checks that the segment is readable (for the VERR instruction) or writable (for the VERW instruction).
The VERR instruction sets the ZF flag in the EFLAGS register if the segment is visible at the CPL and readable; the VERW instruction sets the ZF flag if the segment is visible and writable. (Code segments are never writable.) The ZF flag is cleared if any of these checks fail.
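A minimal Python sketch of the VERR and VERW checks follows. The function names and argument layout are hypothetical. The descriptor fields are passed in directly, so the descriptor-table limit check is not modeled, and the type-field encoding (bit 3 marks a code segment, bit 2 marks a conforming code segment, bit 1 is the readable or writable bit) is an assumption taken from the segment-descriptor format.

```python
def _visible(selector, cpl, seg_type, dpl):
    """Privilege check skipped for conforming code segments."""
    rpl = selector & 3
    conforming_code = (seg_type & 0xC) == 0xC
    return conforming_code or max(cpl, rpl) <= dpl


def verr(selector, cpl, s_flag, seg_type, dpl):
    """Return the ZF flag that VERR would produce."""
    if (selector & 0xFFFC) == 0 or s_flag != 1:
        return False                    # null selector or system descriptor
    is_code = bool(seg_type & 0x8)
    # Data segments are always readable; code segments need the R bit.
    readable = (not is_code) or bool(seg_type & 0x2)
    return readable and _visible(selector, cpl, seg_type, dpl)


def verw(selector, cpl, s_flag, seg_type, dpl):
    """Return the ZF flag that VERW would produce."""
    if (selector & 0xFFFC) == 0 or s_flag != 1:
        return False
    is_code = bool(seg_type & 0x8)
    # Code segments are never writable; data segments need the W bit.
    writable = (not is_code) and bool(seg_type & 0x2)
    return writable and _visible(selector, cpl, seg_type, dpl)
```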
4.10.3. Checking That the Pointer Offset Is Within Limits (LSL Instruction)
When the processor accesses any segment, it performs a limit check to ensure that the offset is within the limit of the segment. Software can perform this limit check using the LSL (load segment limit) instruction. Like the LAR instruction, the LSL instruction specifies the segment selector for the segment descriptor whose limit is to be checked and a destination register. The instruction then performs the following operations:
1. Checks that the segment selector is not null.
2. Checks that the segment selector points to a segment descriptor that is within the descriptor table limit (GDT or LDT).
3. Checks that the segment descriptor is a code, data, LDT, or TSS segment-descriptor type.
4. If the segment is not a conforming code segment, checks if the segment descriptor is visible at the CPL (that is, if the CPL and the RPL of the segment selector are less than or equal to the DPL).
5. If the privilege level and type checks pass, loads the unscrambled limit (the limit scaled according to the setting of the G flag in the segment descriptor) into the destination register and sets the ZF flag in the EFLAGS register.
If the segment selector is not visible at the current privilege level or is an invalid type for the LSL instruction, the instruction does not modify the destination register and clears the ZF flag. Once the segment limit is loaded in the destination register, software can compare it with the offset of a pointer.
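The "unscrambled" limit of operation 5 can be computed as in the following Python sketch. The function names are hypothetical. The layout assumed here (limit bits 15:0 in the first doubleword, limit bits 19:16 and the G flag at bit 23 in the second doubleword, and a 4-KByte scaling unit when G is set) comes from the segment-descriptor format rather than from this section. The comparison helper applies to expand-up segments only.

```python
def segment_limit_in_bytes(descriptor_low, descriptor_high):
    """Return the limit scaled by the G flag, as LSL would load it."""
    raw_limit = (descriptor_low & 0xFFFF) | (descriptor_high & 0x000F0000)
    granularity = (descriptor_high >> 23) & 1
    if granularity:
        # 4-KByte units: the low 12 bits of the byte limit are all ones.
        return (raw_limit << 12) | 0xFFF
    return raw_limit


def access_within_limit(offset, size, limit):
    """True if every byte of the access lies at or below the limit."""
    return offset + size - 1 <= limit
```

For example, a 4-byte access at offset 1FFCH fits in a segment whose limit is 1FFFH, while the same access at offset 1FFDH does not.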
4.10.4. Checking Caller Access Privileges (ARPL Instruction)

The requestor's privilege level (RPL) field of a segment selector is intended to carry the privilege level of a calling procedure (the calling procedure's CPL) to a called procedure. The called procedure then uses the RPL to determine if access to a segment is allowed. The RPL is said to weaken the privilege level of the called procedure to that of the RPL.
Operating-system procedures typically use the RPL to prevent less privileged application programs from accessing data located in more privileged segments. When an operating-system procedure (the called procedure) receives a segment selector from an application program (the calling procedure), it sets the segment selector's RPL to the privilege level of the calling procedure. Then, when the operating system uses the segment selector to access its associated segment, the processor performs privilege checks using the calling procedure's privilege level (stored in the RPL) rather than the numerically lower privilege level (the CPL) of the operating-system procedure. The RPL thus ensures that the operating system does not access a segment on behalf of an application program unless that program itself has access to the segment.
Figure 4-12 shows an example of how the processor uses the RPL field. In this example, an application program (located in code segment A) possesses a segment selector (segment selector D1) that points to a privileged data structure (that is, a data structure located in a data segment D at privilege level 0). The application program cannot access data segment D, because it does not have sufficient privilege, but the operating system (located in code segment C) can. So, in an attempt to access data segment D, the application program executes a call to the operating system and passes segment selector D1 to the operating system as a parameter on the stack. Before passing the segment selector, the (well behaved) application program sets the RPL of the segment selector to its current privilege level (which in this example is 3). If the operating system attempts to access data segment D using segment selector D1, the processor compares the CPL (which is now 0 following the call), the RPL of segment selector D1, and the DPL of data segment D (which is 0). Since the RPL is greater than the DPL, access to data segment D is denied. The processor's protection mechanism thus protects data segment D from access by the operating system, because the application program's privilege level (represented by the RPL of segment selector B) is greater than the DPL of data segment D.
[Figure: Privilege rings from 3 (lowest privilege) to 0 (highest privilege). The application program in code segment A (CPL=3) calls through gate selector B (RPL=3) and call gate B (DPL=3) to the operating system in code segment C (DPL=0), passing a segment selector as a parameter on the stack. With segment selector D1 (RPL=3), access to data segment D (DPL=0) is not allowed; with segment selector D2 (RPL=0), access is allowed.]
Figure 4-12. Use of RPL to Weaken Privilege Level of Called Procedure
Now assume that instead of setting the RPL of the segment selector to 3, the application program sets the RPL to 0 (segment selector D2). The operating system can now access data segment D, because its CPL and the RPL of segment selector D2 are both equal to the DPL of data segment D. Because the application program is able to change the RPL of a segment selector to any value, it can potentially use a procedure operating at a numerically lower privilege level to access a protected data structure. This ability to lower the RPL of a segment selector breaches the processor's protection mechanism.
Because a called procedure cannot rely on the calling procedure to set the RPL correctly, operating-system procedures (executing at numerically lower privilege levels) that receive segment selectors from numerically higher privilege-level procedures need to test the RPL of the segment selector to determine if it is at the appropriate level. The ARPL (adjust requested privilege level) instruction is provided for this purpose. This instruction adjusts the RPL of one segment selector to match that of another segment selector.
The example in Figure 4-12 demonstrates how the ARPL instruction is intended to be used. When the operating system receives segment selector D2 from the application program, it uses the ARPL instruction to compare the RPL of the segment selector with the privilege level of the application program (represented by the code-segment selector pushed onto the stack). If the RPL is less than the application program's privilege level, the ARPL instruction changes the RPL of the segment selector to match the privilege level of the application program (segment selector D1). Using this instruction thus prevents a procedure running at a numerically higher privilege level from accessing numerically lower privilege-level (more privileged) segments by lowering the RPL of a segment selector.
Note that the privilege level of the application program can be determined by reading the RPL field of the segment selector for the application program's code segment. This segment selector is stored on the stack as part of the call to the operating system. The operating system can copy the segment selector from the stack into a register for use as an operand for the ARPL instruction.
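The adjustment that ARPL performs can be expressed as a short Python sketch. The function name and the selector values in the usage note are hypothetical; the only architectural fact relied on is that the RPL occupies the two low-order bits of a segment selector.

```python
def arpl(dest_selector, src_selector):
    """Return (adjusted selector, ZF).

    dest_selector -- selector received from the caller (for example, D2)
    src_selector  -- caller's code-segment selector taken from the stack
    """
    dest_rpl = dest_selector & 3
    src_rpl = src_selector & 3
    if dest_rpl < src_rpl:
        # Raise the RPL to the caller's privilege level and set ZF.
        return (dest_selector & 0xFFFC) | src_rpl, True
    # RPL is already at least as weak as the caller's level; ZF is cleared.
    return dest_selector, False
```

With a received selector of 0020H (RPL 0) and a caller code-segment selector of 001BH (RPL 3), the sketch yields 0023H, which corresponds to turning segment selector D2 into segment selector D1 in the example above.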
4.10.5. Checking Alignment

When the CPL is 3, alignment of memory references can be checked by setting the AM flag in the CR0 register and the AC flag in the EFLAGS register. Unaligned memory references generate alignment exceptions (#AC). The processor does not generate alignment exceptions when operating at privilege level 0, 1, or 2. Refer to Table 5-7 in Chapter 5, Interrupt and Exception Handling, for a description of the alignment requirements when alignment checking is enabled.
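The condition under which alignment checking is in effect can be written as a one-line predicate. In this Python sketch the function name is hypothetical, and the bit positions (bit 18 of CR0 for AM and bit 18 of EFLAGS for AC) are assumptions taken from the register descriptions, not from this section.

```python
CR0_AM = 1 << 18      # alignment mask flag in CR0
EFLAGS_AC = 1 << 18   # alignment check flag in EFLAGS


def alignment_checking_active(cr0, eflags, cpl):
    """True only when AM and AC are both set and the CPL is 3."""
    return cpl == 3 and bool(cr0 & CR0_AM) and bool(eflags & EFLAGS_AC)
```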
4.11. PAGE-LEVEL PROTECTION

Page-level protection can be used alone or applied to segments. When page-level protection is used with the flat memory model, it allows supervisor code and data (the operating system or executive) to be protected from user code and data (application programs). It also allows pages containing code to be write protected. When the segment- and page-level protection are combined, page-level read/write protection allows more protection granularity within segments.
With page-level protection (as with segment-level protection) each memory reference is checked to verify that protection checks are satisfied. All checks are made before the memory cycle is started, and any violation prevents the cycle from starting and results in a page-fault exception being generated. Because checks are performed in parallel with address translation, there is no performance penalty.
The processor performs two page-level protection checks:
• Restriction of addressable domain (supervisor and user modes).
• Page type (read only or read/write).
A violation of either of these checks results in a page-fault exception being generated. Refer to Chapter 5, Interrupt and Exception Handling, for an explanation of the page-fault exception mechanism. This chapter describes the protection violations that lead to page-fault exceptions.
4.11.1. Page-Protection Flags

Protection information for pages is contained in two flags in a page-directory or page-table entry (refer to Figure 3-14 in Chapter 3, Protected-Mode Memory Management): the read/write flag (bit 1) and the user/supervisor flag (bit 2). The protection checks are applied to both first- and second-level page tables (that is, page directories and page tables).
4.11.2. Restricting Addressable Domain

The page-level protection mechanism allows restricting access to pages based on two privilege levels:
• Supervisor mode (U/S flag is 0): (Most privileged) For the operating system or executive, other system software (such as device drivers), and protected system data (such as page tables).
• User mode (U/S flag is 1): (Least privileged) For application code and data.
The segment privilege levels map to the page privilege levels as follows. If the processor is currently operating at a CPL of 0, 1, or 2, it is in supervisor mode; if it is operating at a CPL of 3, it is in user mode. When the processor is in supervisor mode, it can access all pages; when in user mode, it can access only user-level pages. (Note that the WP flag in control register CR0 modifies the supervisor permissions, as described in Section 4.11.3., Page Type.)
Note that to use the page-level protection mechanism, code and data segments must be set up for at least two segment-based privilege levels: level 0 for supervisor code and data segments and level 3 for user code and data segments. (In this model, the stacks are placed in the data segments.) To minimize the use of segments, a flat memory model can be used (refer to Section 3.2.1., Basic Flat Model in Chapter 3, Protected-Mode Memory Management). Here, the user and supervisor code and data segments all begin at address zero in the linear address space and overlay each other. With this arrangement, operating-system code (running at the supervisor level) and application code (running at the user level) can execute as if there are no segments. Protection between operating-system and application code and data is provided by the processor's page-level protection mechanism.
4.11.3. Page Type
The page-level protection mechanism recognizes two page types:
• Read-only access (R/W flag is 0).
• Read/write access (R/W flag is 1).
When the processor is in supervisor mode and the WP flag in register CR0 is clear (its state following reset initialization), all pages are both readable and writable (write-protection is ignored). When the processor is in user mode, it can write only to user-mode pages that are read/write accessible. User-mode pages that are read/write or read-only are readable; supervisor-mode pages are neither readable nor writable from user mode. A page-fault exception is generated on any attempt to violate the protection rules.
The P6 family, Pentium, and Intel486 processors allow user-mode pages to be write-protected against supervisor-mode access. Setting the WP flag in register CR0 to 1 enables supervisor-mode sensitivity to user-mode, write-protected pages. This supervisor write-protect feature is useful for implementing a copy-on-write strategy used by some operating systems, such as UNIX, for task creation (also called forking or spawning). When a new task is created, it is possible to copy the entire address space of the parent task. This gives the child task a complete, duplicate set of the parent's segments and pages. An alternative copy-on-write strategy saves memory space and time by mapping the child's segments and pages to the same segments and pages used by the parent task. A private copy of a page gets created only when one of the tasks writes to the page. By using the WP flag and marking the shared pages as read-only, the supervisor can detect an attempt to write to a user-level page, and can copy the page at that time.
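The page-level rules for the two modes can be summarized in a small Python predicate. The function and parameter names are hypothetical. The sketch treats a supervisor-mode write with WP set as allowed only when the page is read/write, which is an interpretation of the WP description above and of the note to Table 4-2.

```python
def page_access_allowed(cpl, is_write, page_is_user, page_is_writable,
                        wp_set=False):
    """Return True if the access passes page-level protection.

    page_is_user     -- U/S flag of the page (True = user mode)
    page_is_writable -- R/W flag of the page (True = read/write)
    wp_set           -- state of the WP flag in CR0
    """
    supervisor_mode = cpl in (0, 1, 2)
    if supervisor_mode:
        if not is_write:
            return True                     # supervisor can read every page
        # With WP clear, write protection is ignored in supervisor mode.
        return page_is_writable or not wp_set
    if not page_is_user:
        return False                        # user mode sees only user pages
    return page_is_writable or not is_write
```

A False result stands for a page-fault exception. In a copy-on-write scheme, the supervisor write to a shared read-only page with WP set is the case that returns False and gives the operating system the opportunity to copy the page.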
4.11.4. Combining Protection of Both Levels of Page Tables
For any one page, the protection attributes of its page-directory entry (first-level page table) may differ from those of its page-table entry (second-level page table). The processor checks the protection for a page in both its page-directory entry and its page-table entry. Table 4-2 shows the protection provided by the possible combinations of protection attributes when the WP flag is clear.
4.11.5. Overrides to Page Protection
The following types of memory accesses are checked as if they are privilege-level 0 accesses, regardless of the CPL at which the processor is currently operating:
• Access to segment descriptors in the GDT, LDT, or IDT.
• Access to an inner-privilege-level stack during an inter-privilege-level call or a call to an exception or interrupt handler, when a change of privilege level occurs.
4.12. COMBINING PAGE AND SEGMENT PROTECTION

When paging is enabled, the processor evaluates segment protection first, then evaluates page protection. If the processor detects a protection violation at either the segment level or the page level, the memory access is not carried out and an exception is generated. If an exception is generated by segmentation, no paging exception is generated. Page-level protections cannot be used to override segment-level protection. For example, a code segment is by definition not writable. If a code segment is paged, setting the R/W flag for the pages to read-write does not make the pages writable. Attempts to write into the pages will be blocked by segment-level protection checks. Page-level protection can be used to enhance segment-level protection. For example, if a large read-write data segment is paged, the page-protection mechanism can be used to write-protect individual pages.
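The order of evaluation described above can be illustrated with a minimal Python sketch. The function name and the returned strings are hypothetical labels, not processor-defined values.

```python
def check_memory_access(segment_check_passes, page_check_passes):
    """Return None if the access proceeds, else the kind of exception."""
    # Segment protection is evaluated first.
    if not segment_check_passes:
        # No paging exception is generated in this case.
        return "segment-level exception"
    if not page_check_passes:
        return "page-fault exception"
    return None
```

A write into a paged code segment whose pages have the R/W flag set corresponds to check_memory_access(False, True): the segment-level check blocks the write even though the page-level check would pass.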
Table 4-2. Combined Page-Directory and Page-Table Protection
Page-Directory Entry        Page-Table Entry            Combined Effect
Privilege    Access Type    Privilege    Access Type    Privilege    Access Type
User         Read-Only      User         Read-Only      User         Read-Only
User         Read-Only      User         Read-Write     User         Read-Only
User         Read-Write     User         Read-Only      User         Read-Only
User         Read-Write     User         Read-Write     User         Read/Write
User         Read-Only      Supervisor   Read-Only      Supervisor   Read/Write*
User         Read-Only      Supervisor   Read-Write     Supervisor   Read/Write*
User         Read-Write     Supervisor   Read-Only      Supervisor   Read/Write*
User         Read-Write     Supervisor   Read-Write     Supervisor   Read/Write
Supervisor   Read-Only      User         Read-Only      Supervisor   Read/Write*
Supervisor   Read-Only      User         Read-Write     Supervisor   Read/Write*
Supervisor   Read-Write     User         Read-Only      Supervisor   Read/Write*
Supervisor   Read-Write     User         Read-Write     Supervisor   Read/Write
Supervisor   Read-Only      Supervisor   Read-Only      Supervisor   Read/Write*
Supervisor   Read-Only      Supervisor   Read-Write     Supervisor   Read/Write*
Supervisor   Read-Write     Supervisor   Read-Only      Supervisor   Read/Write*
Supervisor   Read-Write     Supervisor   Read-Write     Supervisor   Read/Write
NOTE: * If the WP flag of CR0 is set, the access type is determined by the R/W flags of the page-directory and page-table entries.
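The rows of Table 4-2 follow a simple rule that the Python sketch below reproduces. The function and parameter names are hypothetical. Reading the table note as "with WP set, a supervisor page is read/write only if both entries are read/write" is an interpretation, labeled here as an assumption.

```python
def combined_protection(pde_user, pde_writable, pte_user, pte_writable,
                        wp_set=False):
    """Return (privilege, access type) for a page, as in Table 4-2."""
    # The page is a user page only if both entries mark it as user.
    user = pde_user and pte_user
    both_writable = pde_writable and pte_writable
    if user:
        writable = both_writable
    else:
        # Supervisor pages: write protection is ignored while WP is clear.
        writable = both_writable if wp_set else True
    privilege = "User" if user else "Supervisor"
    access = "Read/Write" if writable else "Read-Only"
    return privilege, access
```

With WP clear, every combination that involves a supervisor entry yields ("Supervisor", "Read/Write"), and a user page is writable only when both entries are read/write.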
CHAPTER 5 INTERRUPT AND EXCEPTION HANDLING

This chapter describes the processor's interrupt and exception-handling mechanism, when operating in protected mode. Most of the information provided here also applies to the interrupt and exception mechanism used in real-address or virtual-8086 mode. Refer to Chapter 16, 8086 Emulation, for a description of the differences in the interrupt and exception mechanism for real-address and virtual-8086 mode.
5.1. INTERRUPT AND EXCEPTION OVERVIEW
Interrupts and exceptions are forced transfers of execution from the currently running program or task to a special procedure or task called a handler. Interrupts typically occur at random times during the execution of a program, in response to signals from hardware. They are used to handle events external to the processor, such as requests to service peripheral devices. Software can also generate interrupts by executing the INT n instruction. Exceptions occur when the processor detects an error condition while executing an instruction, such as division by zero. The processor detects a variety of error conditions including protection violations, page faults, and internal machine faults. The machine-check architecture of the P6 family and Pentium processors also permits a machine-check exception to be generated when internal hardware errors and bus errors are detected.
The processor's interrupt and exception-handling mechanism allows interrupts and exceptions to be handled transparently to application programs and the operating system or executive. When an interrupt is received or an exception is detected, the currently running procedure or task is automatically suspended while the processor executes an interrupt or exception handler. When execution of the handler is complete, the processor resumes execution of the interrupted procedure or task. The resumption of the interrupted procedure or task happens without loss of program continuity, unless recovery from an exception was not possible or an interrupt caused the currently running program to be terminated.
This chapter describes the processor's interrupt and exception-handling mechanism, when operating in protected mode. A detailed description of the exceptions and the conditions that cause them to be generated is given at the end of this chapter. Refer to Chapter 16, 8086 Emulation, for a description of the interrupt and exception mechanism for real-address and virtual-8086 mode.
5.1.1. Sources of Interrupts
The processor receives interrupts from two sources:
- External (hardware generated) interrupts.
- Software-generated interrupts.
5.1.1.1. EXTERNAL INTERRUPTS
External interrupts are received through pins on the processor or through the local APIC serial bus. The primary interrupt pins on a P6 family or Pentium processor are the LINT[1:0] pins, which are connected to the local APIC (refer to Section 7.5., Advanced Programmable Interrupt Controller (APIC) in Chapter 7, Multiple-Processor Management). When the local APIC is disabled, these pins are configured as INTR and NMI pins, respectively. Asserting the INTR pin signals the processor that an external interrupt has occurred, and the processor reads from the system bus the interrupt vector number provided by an external interrupt controller, such as an 8259A (refer to Section 5.2., Exception and Interrupt Vectors). Asserting the NMI pin signals a nonmaskable interrupt (NMI), which is assigned to interrupt vector 2.
When the local APIC is enabled, the LINT[1:0] pins can be programmed through the APIC's vector table to be associated with any of the processor's exception or interrupt vectors.
The processor's local APIC can be connected to a system-based I/O APIC. Here, external interrupts received at the I/O APIC's pins can be directed to the local APIC through the APIC serial bus (pins PICD[1:0]). The I/O APIC determines the vector number of the interrupt and sends this number to the local APIC. When a system contains multiple processors, processors can also send interrupts to one another by means of the APIC serial bus.
The LINT[1:0] pins are not available on the Intel486 processor and the earlier Pentium processors that do not contain an on-chip local APIC. Instead, these processors have dedicated NMI and INTR pins. With these processors, external interrupts are typically generated by a system-based interrupt controller (8259A), with the interrupts being signaled through the INTR pin.
Note that several other pins on the processor cause a processor interrupt to occur; however, these interrupts are not handled by the interrupt and exception mechanism described in this chapter. These pins include the RESET#, FLUSH#, STPCLK#, SMI#, R/S#, and INIT# pins. Which of these pins are included on a particular Intel Architecture processor is implementation dependent. The functions of these pins are described in the data books for the individual processors. The SMI# pin is also described in Chapter 12, System Management Mode (SMM).
5.1.1.2. MASKABLE HARDWARE INTERRUPTS
Any external interrupt that is delivered to the processor by means of the INTR pin or through the local APIC is called a maskable hardware interrupt. The maskable hardware interrupts that can be delivered through the INTR pin include all Intel Architecture defined interrupt vectors from 0 through 255; those that can be delivered through the local APIC include interrupt vectors 16 through 255. All maskable hardware interrupts can be masked as a group. Use the single IF flag in the EFLAGS register (refer to Section 5.6.1., Masking Maskable Hardware Interrupts) to mask these maskable interrupts. Note that when interrupts 0 through 15 are delivered through the local APIC, the APIC indicates the receipt of an illegal vector.
5.1.1.3. SOFTWARE-GENERATED INTERRUPTS
The INT n instruction permits interrupts to be generated from within software by supplying the interrupt vector number as an operand. For example, the INT 35 instruction forces an implicit call to the interrupt handler for interrupt 35. Any of the interrupt vectors from 0 to 255 can be used as a parameter in this instruction. If the processor's predefined NMI vector is used, however, the response of the processor will not be the same as it would be from an NMI interrupt generated in the normal manner. If vector number 2 (the NMI vector) is used in this instruction, the NMI interrupt handler is called, but the processor's NMI-handling hardware is not activated. Note that interrupts generated in software with the INT n instruction cannot be masked by the IF flag in the EFLAGS register.
5.1.2. Sources of Exceptions
The processor receives exceptions from three sources:
- Processor-detected program-error exceptions.
- Software-generated exceptions.
- Machine-check exceptions.
5.1.2.1. PROGRAM-ERROR EXCEPTIONS
The processor generates one or more exceptions when it detects program errors during the execution of an application program or the operating system or executive. The Intel Architecture defines a vector number for each processor-detectable exception. The exceptions are further classified as faults, traps, and aborts (refer to Section 5.3., Exception Classifications).
5.1.2.2. SOFTWARE-GENERATED EXCEPTIONS
The INTO, INT 3, and BOUND instructions permit exceptions to be generated in software. These instructions allow checks for specific exception conditions to be performed at specific points in the instruction stream. For example, the INT 3 instruction causes a breakpoint exception to be generated. The INT n instruction can be used to emulate a specific exception in software, with one limitation. If the n operand in the INT n instruction contains a vector for one of the Intel Architecture exceptions, the processor will generate an interrupt to that vector, which will in turn invoke the exception handler associated with that vector. Because this is actually an interrupt, however, the processor does not push an error code onto the stack, even if a hardware-generated exception for that vector normally produces one. For those exceptions that produce an error code, the exception handler will attempt to pop an error code from the stack while handling the exception. If the INT n instruction was used to emulate the generation of an exception, the handler will pop off and discard the EIP (in place of the missing error code), sending the return to the wrong location.
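The failure described above can be reproduced with a small model of the handler's stack. The following C sketch is illustrative only: the ModelStack type and all function names are hypothetical, and the model covers only the case with no privilege-level change. It shows that a handler written for an error-code vector returns to the saved EIP when a genuine exception entered it, but picks up the saved CS value as its return address when INT n entered it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a stack that grows toward lower indices. */
typedef struct {
    uint32_t slot[8];
    size_t   top;      /* index of the most recently pushed doubleword */
} ModelStack;

static void model_init(ModelStack *s) { s->top = 8; }
static void model_push(ModelStack *s, uint32_t v) { s->slot[--s->top] = v; }
static uint32_t model_pop(ModelStack *s) { return s->slot[s->top++]; }

/* Build the frame saved on entry to a handler (no privilege-level change).
   has_error_code is nonzero only for a genuine exception that defines an
   error code; an INT n to the same vector pushes none. */
static void enter_handler(ModelStack *s, uint32_t eflags, uint32_t cs,
                          uint32_t eip, int has_error_code,
                          uint32_t error_code)
{
    model_push(s, eflags);
    model_push(s, cs);
    model_push(s, eip);
    if (has_error_code)
        model_push(s, error_code);
}

/* A handler written for an error-code vector: it discards one doubleword,
   assuming it is the error code, then treats the next one as the saved EIP. */
static uint32_t handler_return_eip(ModelStack *s)
{
    (void)model_pop(s);
    return model_pop(s);
}
```

With a saved EIP of 0x1000 and a saved CS of 0x08, the genuine exception path yields 0x1000, while the INT n path yields 0x08, which is the wrong return location the text warns about.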
5.1.2.3. MACHINE-CHECK EXCEPTIONS
The P6 family and Pentium processors provide both internal and external machine-check mechanisms for checking the operation of the internal chip hardware and bus transactions. These mechanisms constitute extended (implementation dependent) exception mechanisms. When a machine-check error is detected, the processor signals a machine-check exception (vector 18) and returns an error code. Refer to Interrupt 18 - Machine Check Exception (#MC) at the end of this chapter and Chapter 13, Machine-Check Architecture, for a detailed description of the machine-check mechanism.
5.2. EXCEPTION AND INTERRUPT VECTORS
The processor associates an identification number, called a vector, with each exception and interrupt. Table 5-1 shows the assignment of exception and interrupt vectors. This table also gives the exception type for each vector, indicates whether an error code is saved on the stack for an exception, and gives the source of the exception or interrupt.
The vectors in the range 0 through 31 are assigned to the exceptions and the NMI interrupt. Not all of these vectors are currently used by the processor. Unassigned vectors in this range are reserved for possible future uses. Do not use the reserved vectors.
The vectors in the range 32 to 255 are designated as user-defined interrupts. These interrupts are not reserved by the Intel Architecture and are generally assigned to external I/O devices to permit them to signal the processor through one of the external hardware interrupt mechanisms described in Section 5.1.1., Sources of Interrupts.
5.3. EXCEPTION CLASSIFICATIONS
Exceptions are classified as faults, traps, or aborts depending on the way they are reported and whether the instruction that caused the exception can be restarted with no loss of program or task continuity.
Faults
A fault is an exception that can generally be corrected and that, once corrected, allows the program to be restarted with no loss of continuity. When a fault is reported, the processor restores the machine state to the state prior to the beginning of execution of the faulting instruction. The return address (saved contents of the CS and EIP registers) for the fault handler points to the faulting instruction, rather than the instruction following the faulting instruction.
Note: There is a small subset of exceptions that are normally reported as faults but that, in architectural corner cases, are not restartable, so some processor context will be lost. An example is the execution of the POPAD instruction where the stack frame crosses over the end of the stack segment. The exception handler will see that the CS:EIP has been restored as if the POPAD instruction had not executed; however, internal processor state (general-purpose registers) will have been modified. These corner cases are considered programming errors, and an application causing this class of exceptions will likely be terminated by the operating system.
Traps
A trap is an exception that is reported immediately following the execution of the trapping instruction. Traps allow execution of a program or task to be continued without loss of program continuity. The return address for the trap handler points to the instruction to be executed after the trapping instruction.
Aborts
An abort is an exception that does not always report the precise location of the instruction causing the exception and does not allow restart of the program or task that caused the exception. Aborts are used to report severe errors, such as hardware errors and inconsistent or illegal values in system tables.
Table 5-1. Protected-Mode Exceptions and Interrupts

Vector No.  Mnemonic  Description                                 Type        Error Code  Source
0           #DE       Divide Error                                Fault       No          DIV and IDIV instructions.
1           #DB       Debug                                       Fault/Trap  No          Any code or data reference or the INT 1 instruction.
2                     NMI Interrupt                               Interrupt   No          Nonmaskable external interrupt.
3           #BP       Breakpoint                                  Trap        No          INT 3 instruction.
4           #OF       Overflow                                    Trap        No          INTO instruction.
5           #BR       BOUND Range Exceeded                        Fault       No          BOUND instruction.
6           #UD       Invalid Opcode (Undefined Opcode)           Fault       No          UD2 instruction or reserved opcode. (Note 1)
7           #NM       Device Not Available (No Math Coprocessor)  Fault       No          Floating-point or WAIT/FWAIT instruction.
8           #DF       Double Fault                                Abort       Yes (Zero)  Any instruction that can generate an exception, an NMI, or an INTR.
9                     Coprocessor Segment Overrun (reserved)      Fault       No          Floating-point instruction. (Note 2)
10          #TS       Invalid TSS                                 Fault       Yes         Task switch or TSS access.
11          #NP       Segment Not Present                         Fault       Yes         Loading segment registers or accessing system segments.
12          #SS       Stack-Segment Fault                         Fault       Yes         Stack operations and SS register loads.
13          #GP       General Protection                          Fault       Yes         Any memory reference and other protection checks.
14          #PF       Page Fault                                  Fault       Yes         Any memory reference.
15                    (Intel reserved. Do not use.)                           No
16          #MF       Floating-Point Error (Math Fault)           Fault       No          Floating-point or WAIT/FWAIT instruction.
17          #AC       Alignment Check                             Fault       Yes (Zero)  Any data reference in memory. (Note 3)
18          #MC       Machine Check                               Abort       No          Error codes (if any) and source are model dependent. (Note 4)
19          #XF       Streaming SIMD Extensions                   Fault       No          SIMD floating-point instructions. (Note 5)
20-31                 Intel reserved. Do not use.
32-255                User Defined (Nonreserved) Interrupts       Interrupt               External interrupt or INT n instruction.
NOTES:
1. The UD2 instruction was introduced in the Pentium Pro processor.
2. Intel Architecture processors after the Intel386 processor do not generate this exception.
3. This exception was introduced in the Intel486 processor.
4. This exception was introduced in the Pentium processor and enhanced in the P6 family processors.
5. This exception was introduced in the Pentium III processor.
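As a reading aid, two columns of Table 5-1 can be condensed into small predicates. The C sketch below is illustrative and not part of the manual; the function names are hypothetical. Vector 18 is treated as pushing no error code, following the table's Error Code column, even though the table notes that machine-check error reporting is model dependent.

```c
#include <stdbool.h>
#include <stdint.h>

/* Vectors 0-31 are assigned to exceptions, the NMI, or reserved;
   vectors 32-255 are user-defined interrupts. */
static bool vector_is_user_defined(uint8_t vector)
{
    return vector >= 32;
}

/* True for the vectors marked "Yes" or "Yes (Zero)" in the Error Code
   column of Table 5-1. This applies only to genuine exceptions; an
   interrupt delivered to the same vector pushes no error code. */
static bool vector_pushes_error_code(uint8_t vector)
{
    switch (vector) {
    case 8:   /* #DF */
    case 10:  /* #TS */
    case 11:  /* #NP */
    case 12:  /* #SS */
    case 13:  /* #GP */
    case 14:  /* #PF */
    case 17:  /* #AC */
        return true;
    default:
        return false;
    }
}
```

A handler dispatcher could use such a predicate to decide whether a doubleword must be removed from the stack before the handler returns.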
5.4. PROGRAM OR TASK RESTART
To allow restarting of a program or task following the handling of an exception or an interrupt, all exceptions except aborts are guaranteed to report the exception on a precise instruction boundary, and all interrupts are guaranteed to be taken on an instruction boundary.
For fault-class exceptions, the return instruction pointer that the processor saves when it generates the exception points to the faulting instruction. So, when a program or task is restarted following the handling of a fault, the faulting instruction is restarted (re-executed). Restarting the faulting instruction is commonly used to handle exceptions that are generated when access to an operand is blocked. The most common example of a fault is a page-fault exception (#PF) that occurs when a program or task references an operand in a page that is not in memory. When a page-fault exception occurs, the exception handler can load the page into memory and resume execution of the program or task by restarting the faulting instruction. To ensure that this instruction restart is handled transparently to the currently executing program or task, the processor saves the necessary registers and stack pointers to allow it to restore itself to its state prior to the execution of the faulting instruction.
For trap-class exceptions, the return instruction pointer points to the instruction following the trapping instruction. If a trap is detected during an instruction which transfers execution, the return instruction pointer reflects the transfer. For example, if a trap is detected while executing a JMP instruction, the return instruction pointer points to the destination of the JMP instruction, not to the next address past the JMP instruction. All trap exceptions allow program or task restart with no loss of continuity. For example, the overflow exception is a trapping exception. Here, the return instruction pointer points to the instruction following the INTO instruction that tested the OF (overflow) flag in the EFLAGS register. The trap handler for this exception resolves the overflow condition. Upon return from the trap handler, program or task execution continues at the next instruction following the INTO instruction.
The abort-class exceptions do not support reliable restarting of the program or task. Abort handlers generally are designed to collect diagnostic information about the state of the processor when the abort exception occurred and then shut down the application and system as gracefully as possible.
Interrupts rigorously support restarting of interrupted programs and tasks without loss of continuity. The return instruction pointer saved for an interrupt points to the next instruction to be executed at the instruction boundary where the processor took the interrupt. If the instruction just executed has a repeat prefix, the interrupt is taken at the end of the current iteration with the registers set to execute the next iteration.
The ability of a P6 family processor to speculatively execute instructions does not affect the taking of interrupts by the processor. Interrupts are taken at instruction boundaries located during the retirement phase of instruction execution, so they are always taken in the in-order instruction stream. Refer to Chapter 2, Introduction to the Intel Architecture, in the Intel Architecture Software Developer's Manual, Volume 1, for more information about the P6 family processors' microarchitecture and its support for out-of-order instruction execution.
Note that the Pentium processor and earlier Intel Architecture processors also perform varying amounts of prefetching and preliminary decoding of instructions; however, here also exceptions and interrupts are not signaled until actual in-order execution of the instructions. For a given code sample, the signaling of exceptions will occur uniformly when the code is executed on any family of Intel Architecture processors (except where new exceptions or new opcodes have been defined).
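The rule for the saved return instruction pointer can be summarized in a few lines. The following C sketch is a simplified model, not processor pseudocode; the EventClass type and function name are hypothetical. Aborts are left out because they do not guarantee a precise return location.

```c
#include <stdint.h>

typedef enum { EVT_FAULT, EVT_TRAP, EVT_INTERRUPT } EventClass;

/* current_eip: address of the instruction that raised the fault or trap
                (unused for interrupts).
   next_eip:    address of the instruction that executes next in program
                order; for a control transfer such as JMP this is the
                branch target, not the address past the JMP. */
static uint32_t saved_return_eip(EventClass cls, uint32_t current_eip,
                                 uint32_t next_eip)
{
    /* A fault re-executes the faulting instruction; traps and interrupts
       resume at the next instruction boundary. */
    return (cls == EVT_FAULT) ? current_eip : next_eip;
}
```

For example, a page fault raised by an instruction at 0x1000 saves 0x1000, whereas a trap detected on a JMP at 0x1000 whose target is 0x2000 saves 0x2000.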
5.5. NONMASKABLE INTERRUPT (NMI)
The nonmaskable interrupt (NMI) can be generated in either of two ways:
- External hardware asserts the NMI pin.
- The processor receives a message on the APIC serial bus of delivery mode NMI.
When the processor receives an NMI from either of these sources, the processor handles it immediately by calling the NMI handler pointed to by interrupt vector number 2. The processor also invokes certain hardware conditions to ensure that no other interrupts, including NMI interrupts, are received until the NMI handler has completed executing (refer to Section 5.5.1., Handling Multiple NMIs). Also, when an NMI is received from either of the above sources, it cannot be masked by the IF flag in the EFLAGS register. It is possible to issue a maskable hardware interrupt (through the INTR pin) to vector 2 to invoke the NMI interrupt handler; however, this interrupt will not truly be an NMI interrupt. A true NMI interrupt that activates the processor's NMI-handling hardware can only be delivered through one of the mechanisms listed above.
5.5.1. Handling Multiple NMIs
While an NMI interrupt handler is executing, the processor disables additional calls to the NMI handler until the next IRET instruction is executed. This blocking of subsequent NMIs prevents stacking up calls to the NMI handler. It is recommended that the NMI interrupt handler be accessed through an interrupt gate to disable maskable hardware interrupts (refer to Section 5.6.1., Masking Maskable Hardware Interrupts).
5.6. ENABLING AND DISABLING INTERRUPTS
The processor inhibits the generation of some interrupts, depending on the state of the processor and of the IF and RF flags in the EFLAGS register, as described in the following sections.
5.6.1. Masking Maskable Hardware Interrupts
The IF flag can disable the servicing of maskable hardware interrupts received on the processor's INTR pin or through the local APIC (refer to Section 5.1.1.2., Maskable Hardware Interrupts). When the IF flag is clear, the processor inhibits interrupts delivered to the INTR pin or through the local APIC from generating an internal interrupt request; when the IF flag is set, interrupts delivered to the INTR pin or through the local APIC are processed as normal external interrupts. The IF flag does not affect nonmaskable interrupts (NMIs) delivered to the NMI pin or delivery mode NMI messages delivered through the APIC serial bus, nor does it affect processor-generated exceptions. As with the other flags in the EFLAGS register, the processor clears the IF flag in response to a hardware reset.
The fact that the group of maskable hardware interrupts includes the reserved interrupt and exception vectors 0 through 31 can potentially cause confusion. Architecturally, when the IF flag is set, an interrupt for any of the vectors from 0 through 31 can be delivered to the processor through the INTR pin and any of the vectors from 16 through 31 can be delivered through the local APIC. The processor will then generate an interrupt and call the interrupt or exception handler pointed to by the vector number. So, for example, it is possible to invoke the page-fault handler through the INTR pin (by means of vector 14); however, this is not a true page-fault exception. It is an interrupt. As with the INT n instruction (refer to Section 5.1.2.2., Software-Generated Exceptions), when an interrupt is generated through the INTR pin to an exception vector, the processor does not push an error code on the stack, so the exception handler may not operate correctly.
The IF flag can be set or cleared with the STI (set interrupt-enable flag) and CLI (clear interrupt-enable flag) instructions, respectively. These instructions may be executed only if the CPL is equal to or less than the IOPL. A general-protection exception (#GP) is generated if they are executed when the CPL is greater than the IOPL. (The effect of the IOPL on these instructions is modified slightly when the virtual mode extension is enabled by setting the VME flag in control register CR4; refer to Section 16.3., Interrupt and Exception Handling in Virtual-8086 Mode in Chapter 16, 8086 Emulation.) The IF flag is also affected by the following operations:
- The PUSHF instruction stores all flags on the stack, where they can be examined and modified. The POPF instruction can be used to load the modified flags back into the EFLAGS register.
- Task switches and the POPF and IRET instructions load the EFLAGS register; therefore, they can be used to modify the setting of the IF flag.
- When an interrupt is handled through an interrupt gate, the IF flag is automatically cleared, which disables maskable hardware interrupts. (If an interrupt is handled through a trap gate, the IF flag is not cleared.)
Refer to the descriptions of the CLI, STI, PUSHF, POPF, and IRET instructions in Chapter 3, Instruction Set Reference, of the Intel Architecture Software Developer's Manual, Volume 2, for a detailed description of the operations these instructions are allowed to perform on the IF flag.
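The privilege rule for CLI and STI can be expressed as a check on an EFLAGS image. The C sketch below is a minimal illustration under stated assumptions: the bit positions (IF in bit 9, IOPL in bits 12 and 13) come from the EFLAGS register layout rather than from this section, the macro and function names are hypothetical, and the modified behavior under the VME flag is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_IF         (1u << 9)                   /* interrupt-enable flag */
#define EFLAGS_IOPL_SHIFT 12
#define EFLAGS_IOPL_MASK  (3u << EFLAGS_IOPL_SHIFT)   /* I/O privilege level */

/* Extract the 2-bit IOPL field from an EFLAGS image. */
static unsigned eflags_iopl(uint32_t eflags)
{
    return (eflags & EFLAGS_IOPL_MASK) >> EFLAGS_IOPL_SHIFT;
}

/* CLI and STI execute without a #GP only when CPL <= IOPL. */
static bool cli_sti_permitted(unsigned cpl, uint32_t eflags)
{
    return cpl <= eflags_iopl(eflags);
}

/* Maskable hardware interrupts are serviced only while IF is set. */
static bool maskable_interrupts_enabled(uint32_t eflags)
{
    return (eflags & EFLAGS_IF) != 0;
}
```

With an EFLAGS image of 0x3202 (IOPL 3, IF set), code at CPL 3 may execute CLI; with 0x0202 (IOPL 0) the same code would raise #GP.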
5.6.2. Masking Instruction Breakpoints
The RF (resume) flag in the EFLAGS register controls the response of the processor to instruction-breakpoint conditions (refer to the description of the RF flag in Section 2.3., System Flags and Fields in the EFLAGS Register in Chapter 2, System Architecture Overview). When set, it prevents an instruction breakpoint from generating a debug exception (#DB); when clear, instruction breakpoints will generate debug exceptions. The primary function of the RF flag is to prevent the processor from going into a debug exception loop on an instruction breakpoint. Refer to Section 15.3.1.1., Instruction-Breakpoint Exception Condition, in Chapter 15, Debugging and Performance Monitoring, for more information on the use of this flag.
5.6.3. Masking Exceptions and Interrupts When Switching Stacks
To switch to a different stack segment, software often uses a pair of instructions, for example:
MOV SS, AX
MOV ESP, StackTop
If an interrupt or exception occurs after the segment selector has been loaded into the SS register but before the ESP register has been loaded, these two parts of the logical address into the stack space are inconsistent for the duration of the interrupt or exception handler. To prevent this situation, the processor inhibits interrupts, debug exceptions, and single-step trap exceptions after either a MOV to SS instruction or a POP to SS instruction, until the instruction boundary following the next instruction is reached. All other faults may still be generated. If the LSS instruction is used to modify the contents of the SS register (which is the recommended method of modifying this register), this problem does not occur.
5.7. PRIORITY AMONG SIMULTANEOUS EXCEPTIONS AND INTERRUPTS
If more than one exception or interrupt is pending at an instruction boundary, the processor services them in a predictable order. Table 5-3 shows the priority among classes of exception and interrupt sources. While priority among these classes is consistent throughout the architecture, the priority of exceptions within each class is implementation-dependent and may vary from processor to processor. The processor first services a pending exception or interrupt from the class which has the highest priority, transferring execution to the first instruction of the handler. Lower priority exceptions are discarded; lower priority interrupts are held pending. Discarded exceptions are re-generated when the interrupt handler returns execution to the point in the program or task where the exceptions and/or interrupts occurred.
The Pentium III processor added the SIMD floating-point execution unit. The SIMD floating-point execution unit can generate exceptions as well. Since the SIMD floating-point execution unit utilizes a 4-wide register set, an exception may result from more than one operand within a SIMD floating-point register. Hence the Pentium III processor handles these exceptions according to a predetermined precedence. When a sub-operand of a packed instruction generates two or more exception conditions, the exception precedence sometimes results in the higher priority exception being handled and the lower priority exceptions being ignored. Prioritization of exceptions is performed only on a sub-operand basis, and not between sub-operands. For example, an invalid exception generated by one sub-operand will not prevent the reporting of a divide-by-zero exception generated by another sub-operand. Table 5-2 shows the precedence for Streaming SIMD Extensions numeric exceptions. The table reflects the order in which interrupts are handled upon simultaneous recognition by the processor (for example, when multiple interrupts are pending at an instruction boundary). However, the table does not necessarily reflect the order in which interrupts will be recognized by the processor if received simultaneously at the processor pins.
Table 5-2. SIMD Floating-Point Exceptions Priority
Priority     Description
1 (Highest)  Invalid operation exception due to SNaN operand (or any NaN operand for max, min, or certain compare and convert operations)
2            QNaN operand (Note 1)
3            Any other invalid operation exception not mentioned above or a divide-by-zero exception (Note 2)
4            Denormal operand exception (Note 2)
5            Numeric overflow and underflow exceptions possibly in conjunction with the inexact result exception (Note 2)
6 (Lowest)   Inexact result exception
NOTES:
1. Though this is not an exception, the handling of a QNaN operand has precedence over lower priority exceptions. For example, a QNaN divided by zero results in a QNaN, not a zero-divide exception.
2. If masked, then instruction execution continues, and a lower priority exception can occur as well.
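The per-sub-operand nature of this precedence can be illustrated with a deliberately simplified model. In the C sketch below, the bit encoding and all names are hypothetical, every condition is treated as unmasked (so the continuation described in Note 2 is not modeled), and overflow or underflow accompanied by an inexact result is represented by the priority 5 bit alone. Within one sub-operand only the highest-priority condition survives; the surviving conditions of the four sub-operands are then combined without any further prioritization.

```c
#include <stdint.h>

/* Hypothetical encoding: bit i set means the condition with priority i+1
   in Table 5-2 was detected for a sub-operand. */
enum {
    COND_SNAN_INVALID          = 1 << 0,
    COND_QNAN_OPERAND          = 1 << 1,
    COND_OTHER_INVALID_OR_DIV0 = 1 << 2,
    COND_DENORMAL              = 1 << 3,
    COND_OVER_UNDERFLOW        = 1 << 4,
    COND_INEXACT               = 1 << 5
};

/* Keep only the highest-priority condition (the lowest set bit). */
static uint8_t highest_priority(uint8_t conds)
{
    return (uint8_t)(conds & (0u - conds));
}

/* Prioritize within each of the four sub-operands, then merge the
   results; no prioritization is applied between sub-operands. */
static uint8_t reported_conditions(const uint8_t per_suboperand[4])
{
    uint8_t reported = 0;
    for (int i = 0; i < 4; i++)
        reported |= highest_priority(per_suboperand[i]);
    return reported;
}
```

If sub-operand 0 detects both an SNaN operand and an inexact result while sub-operand 1 detects a divide-by-zero, the model reports the SNaN condition and the divide-by-zero, and drops the inexact result.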
5.8. INTERRUPT DESCRIPTOR TABLE (IDT)
The interrupt descriptor table (IDT) associates each exception or interrupt vector with a gate descriptor for the procedure or task used to service the associated exception or interrupt. Like the GDT and LDTs, the IDT is an array of 8-byte descriptors (in protected mode). Unlike the GDT, the first entry of the IDT may contain a descriptor. To form an index into the IDT, the processor scales the exception or interrupt vector by eight (the number of bytes in a gate descriptor). Because there are only 256 interrupt or exception vectors, the IDT need not contain more than 256 descriptors. It can contain fewer than 256 descriptors, because descriptors are required only for the interrupt and exception vectors that may occur. All empty descriptor slots in the IDT should have the present flag for the descriptor set to 0.
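The scaling step amounts to a single multiplication. The following C sketch shows it for a protected-mode IDT with 8-byte gates; the macro and function names are hypothetical, and the base address in the example is arbitrary.

```c
#include <stdint.h>

#define IDT_GATE_SIZE 8u   /* bytes per gate descriptor in protected mode */

/* Linear address of the gate descriptor for a vector: the vector is
   scaled by the descriptor size and added to the IDT base address. */
static uint32_t idt_gate_address(uint32_t idt_base, uint8_t vector)
{
    return idt_base + (uint32_t)vector * IDT_GATE_SIZE;
}
```

For an IDT based at 0x00100000, the page-fault gate (vector 14) lies at 0x00100070, and vector 0 uses the first entry of the table.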
Table 5-3. Priority Among Simultaneous Exceptions and Interrupts

Priority     Descriptions
1 (Highest)  Hardware Reset and Machine Checks
             - RESET
             - Machine Check
2            Trap on Task Switch
             - T flag in TSS is set
3            External Hardware Interventions
             - FLUSH
             - STOPCLK
             - SMI
             - INIT
4            Traps on the Previous Instruction
             - Breakpoints
             - Debug Trap Exceptions (TF flag set or data/I-O breakpoint)
5            External Interrupts
             - NMI Interrupts
             - Maskable Hardware Interrupts
6            Faults from Fetching Next Instruction
             - Code Breakpoint Fault
             - Code-Segment Limit Violation (Note 1)
             - Code Page Fault (Note 1)
7            Faults from Decoding the Next Instruction
             - Instruction length > 15 bytes
             - Illegal Opcode
             - Coprocessor Not Available
8 (Lowest)   Faults on Executing an Instruction
             - Floating-point exception
             - Overflow
             - Bound error
             - Invalid TSS
             - Segment Not Present
             - Stack fault
             - General Protection
             - Data Page Fault
             - Alignment Check
             - SIMD floating-point exception
NOTE:
1. For the Pentium and Intel486 processors, the Code Segment Limit Violation and the Code Page Fault exceptions are assigned to the priority 7.
The base address of the IDT should be aligned on an 8-byte boundary to maximize performance of cache line fills. The limit value is expressed in bytes and is added to the base address to get the address of the last valid byte. A limit value of 0 results in exactly 1 valid byte. Because IDT entries are always eight bytes long, the limit should always be one less than an integral multiple of eight (that is, 8N - 1).
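The 8N - 1 rule can be written as a one-line helper. This C sketch is illustrative; the function name is hypothetical, and the caller is assumed to pass a gate count between 1 and 256.

```c
#include <stdint.h>

/* Limit value for an IDT holding gate_count 8-byte descriptors
   (gate_count assumed to be in the range 1..256): 8N - 1. */
static uint16_t idt_limit_for(unsigned gate_count)
{
    return (uint16_t)(gate_count * 8u - 1u);
}
```

A full table of 256 gates therefore uses a limit of 2047, and a table covering only vectors 0 through 31 uses a limit of 255.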
The IDT may reside anywhere in the linear address space. As shown in Figure 5-1, the processor locates the IDT using the IDTR register. This register holds both a 32-bit base address and 16-bit limit for the IDT.
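[Figure: the IDTR register holds the IDT base address in bits 47 through 16 and the IDT limit in bits 15 through 0. The base address points to the start of the interrupt descriptor table (IDT). Within the table, the gates for interrupts #1, #2, and #3 are shown at byte offsets 0, 8, and 16, and the gate for interrupt #n at byte offset (n-1)*8.]
Figure 5-1. Relationship of the IDTR and IDT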
The LIDT (load IDT register) and SIDT (store IDT register) instructions load and store the contents of the IDTR register, respectively. The LIDT instruction loads the IDTR register with the base address and limit held in a memory operand. This instruction can be executed only when the CPL is 0. It normally is used by the initialization code of an operating system when creating an IDT. An operating system also may use it to change from one IDT to another. The SIDT instruction copies the base and limit value stored in IDTR to memory. This instruction can be executed at any privilege level. If a vector references a descriptor beyond the limit of the IDT, a general-protection exception (#GP) is generated.
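Two details of this paragraph lend themselves to a short sketch: the 6-byte image that holds the limit and base, and the limit check applied to a vector. The C code below is an illustration under stated assumptions. The byte order follows the register picture in Figure 5-1 (limit in bits 15 through 0, base in bits 47 through 16, stored little-endian); the function names are hypothetical; and the check assumes that all eight bytes of a gate must lie within the limit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Build the 6-byte image of the IDTR contents: the 16-bit limit in the
   low-order bytes, followed by the 32-bit base address. */
static void idtr_pack(uint8_t out[6], uint32_t base, uint16_t limit)
{
    out[0] = (uint8_t)(limit & 0xFFu);
    out[1] = (uint8_t)(limit >> 8);
    out[2] = (uint8_t)(base & 0xFFu);
    out[3] = (uint8_t)((base >> 8) & 0xFFu);
    out[4] = (uint8_t)((base >> 16) & 0xFFu);
    out[5] = (uint8_t)(base >> 24);
}

/* True when the whole 8-byte gate for the vector lies within the limit;
   otherwise a reference through that vector raises #GP. */
static bool vector_within_idt_limit(uint8_t vector, uint16_t limit)
{
    return (uint32_t)vector * 8u + 7u <= limit;
}
```

With a limit of 255, vector 31 is the last usable vector and a reference through vector 32 falls outside the table.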
5.9. IDT DESCRIPTORS
The IDT may contain any of three kinds of gate descriptors:
- Task-gate descriptor
- Interrupt-gate descriptor
- Trap-gate descriptor
Figure 5-2 shows the formats for the task-gate, interrupt-gate, and trap-gate descriptors. The format of a task gate used in an IDT is the same as that of a task gate used in the GDT or an LDT (refer to Section 6.2.4., Task-Gate Descriptor in Chapter 6, Task Management). The task gate contains the segment selector for a TSS for an exception and/or interrupt handler task.
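[Figure: bit layouts of the three 8-byte gate descriptors, each shown as two doublewords at byte offsets 4 and 0. Fields not listed are reserved.]
Task Gate
  Offset 4: bit 15 = P; bits 14..13 = DPL; bits 12..8 = 0 0 1 0 1
  Offset 0: bits 31..16 = TSS Segment Selector
Interrupt Gate
  Offset 4: bits 31..16 = Offset 31..16; bit 15 = P; bits 14..13 = DPL; bits 12..8 = 0 D 1 1 0; bits 7..5 = 0 0 0
  Offset 0: bits 31..16 = Segment Selector; bits 15..0 = Offset 15..0
Trap Gate
  Offset 4: bits 31..16 = Offset 31..16; bit 15 = P; bits 14..13 = DPL; bits 12..8 = 0 D 1 1 1; bits 7..5 = 0 0 0
  Offset 0: bits 31..16 = Segment Selector; bits 15..0 = Offset 15..0
Legend:
  DPL       Descriptor Privilege Level
  Offset    Offset to procedure entry point
  P         Segment Present flag
  Selector  Segment Selector for destination code segment
  D         Size of gate: 1 = 32 bits; 0 = 16 bits
Figure 5-2. IDT Gate Descriptors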
Interrupt and trap gates are very similar to call gates (refer to Section 4.8.3., Call Gates in Chapter 4, Protection). They contain a far pointer (segment selector and offset) that the processor uses to transfer execution to a handler procedure in an exception- or interrupt-handler code segment. These gates differ in the way the processor handles the IF flag in the EFLAGS register (refer to Section 5.10.1.2., Flag Usage By Exception- or Interrupt-Handler Procedure).
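Packing the far pointer and attribute bits of Figure 5-2 into the two doublewords of a gate can be sketched as follows. This C code is illustrative rather than a definitive implementation: the GateDescriptor type, the constants, and the make_gate32 function are hypothetical names, the gate is assumed to be present (P = 1) and 32 bits in size (D = 1), and task gates are not covered.

```c
#include <stdint.h>

/* The two doublewords of a gate: low at byte offset 0, high at offset 4. */
typedef struct {
    uint32_t low;
    uint32_t high;
} GateDescriptor;

/* Bits 10..8 of the high doubleword (1 1 0 and 1 1 1 in Figure 5-2). */
enum { GATE_TYPE_INTERRUPT = 6, GATE_TYPE_TRAP = 7 };

/* Encode a present, 32-bit interrupt or trap gate. */
static GateDescriptor make_gate32(uint16_t selector, uint32_t offset,
                                  unsigned dpl, unsigned type)
{
    GateDescriptor g;
    g.low  = ((uint32_t)selector << 16)     /* segment selector */
           | (offset & 0xFFFFu);            /* offset 15..0 */
    g.high = (offset & 0xFFFF0000u)         /* offset 31..16 */
           | (1u << 15)                     /* P: present */
           | ((uint32_t)(dpl & 3u) << 13)   /* DPL */
           | (1u << 11)                     /* D = 1: 32-bit gate */
           | ((uint32_t)(type & 7u) << 8);  /* interrupt or trap */
    return g;
}
```

For a handler at offset 0x12345678 in the code segment selected by 0x0008, an interrupt gate with DPL 0 encodes as low = 0x00085678 and high = 0x12348E00. The only difference for a trap gate is bit 8 of the high doubleword.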
5.10. EXCEPTION AND INTERRUPT HANDLING

The processor handles calls to exception and interrupt handlers in a manner similar to the way it handles calls with a CALL instruction to a procedure or a task. When responding to an exception or interrupt, the processor uses the exception or interrupt vector as an index to a descriptor in the IDT. If the index points to an interrupt gate or trap gate, the processor calls the exception or interrupt handler in a manner similar to a CALL to a call gate (refer to Section 4.8.2., Gate Descriptors through Section 4.8.6., Returning from a Called Procedure in Chapter 4, Protection). If the index points to a task gate, the processor executes a task switch to the exception- or interrupt-handler task in a manner similar to a CALL to a task gate (refer to Section 6.3., Task Switching in Chapter 6, Task Management).
5.10.1. Exception- or Interrupt-Handler Procedures

An interrupt gate or trap gate references an exception- or interrupt-handler procedure that runs in the context of the currently executing task (refer to Figure 5-3). The segment selector for the gate points to a segment descriptor for an executable code segment in either the GDT or the current LDT. The offset field of the gate descriptor points to the beginning of the exception- or interrupt-handling procedure.
When the processor performs a call to the exception- or interrupt-handler procedure, it saves the current states of the EFLAGS register, CS register, and EIP register on the stack (refer to Figure 5-4). (The CS and EIP registers provide a return instruction pointer for the handler.) If an exception causes an error code to be saved, it is pushed on the stack after the EIP value.
If the handler procedure is going to be executed at the same privilege level as the interrupted procedure, the handler uses the current stack.
If the handler procedure is going to be executed at a numerically lower privilege level, a stack switch occurs. When a stack switch occurs, a stack pointer for the stack to be returned to is also saved on the stack. (The SS and ESP registers provide a return stack pointer for the handler.) The segment selector and stack pointer for the stack to be used by the handler are obtained from the TSS for the currently executing task. The processor copies the EFLAGS, SS, ESP, CS, EIP, and error code information from the interrupted procedure's stack to the handler's stack.
To return from an exception- or interrupt-handler procedure, the handler must use the IRET (or IRETD) instruction. The IRET instruction is similar to the RET instruction except that it restores the saved flags into the EFLAGS register. The IOPL field of the EFLAGS register is restored only if the CPL is 0. The IF flag is changed only if the CPL is less than or equal to the IOPL. Refer to IRET/IRETD - Interrupt Return in Chapter 3 of the Intel Architecture Software Developer's Manual, Volume 2, for the complete operation performed by the IRET instruction.
If a stack switch occurred when calling the handler procedure, the IRET instruction switches back to the interrupted procedure's stack on the return.
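Figure 5-3. Interrupt Procedure Call (the interrupt vector selects an interrupt or trap gate in the IDT; the gate's segment selector selects a segment descriptor in the GDT or LDT, and the descriptor's base address plus the gate's offset locates the interrupt procedure in the destination code segment)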
Figure 5-4. Stack Usage on Transfers to Interrupt and Exception-Handling Routines
Stack usage with no privilege-level change (interrupted procedure's and handler's stack), in push order starting at the ESP before transfer to the handler: EFLAGS, CS, EIP, Error Code.
Stack usage with privilege-level change (handler's stack), in push order: SS, ESP, EFLAGS, CS, EIP, Error Code. The saved SS and ESP hold the interrupted procedure's stack pointer before transfer to the handler.
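The push order described for Figure 5-4 can be sketched in C as follows. This is only an illustrative model, not processor microcode: the `saved_state` structure and the `build_handler_frame` function are hypothetical names introduced here, and the frame is modeled as an array of doublewords in push order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the values saved on entry to a handler through an
 * interrupt gate or trap gate. Field names are illustrative only. */
typedef struct {
    uint32_t ss, esp;          /* interrupted stack pointer (stack switch only) */
    uint32_t eflags, cs, eip;  /* always saved */
    uint32_t error_code;       /* saved only for some exceptions */
} saved_state;

/* Fills `out` in push order (out[0] is pushed first and therefore sits at
 * the highest address) and returns the number of doublewords pushed. */
size_t build_handler_frame(const saved_state *s, int stack_switch,
                           int has_error_code, uint32_t out[6])
{
    size_t n = 0;

    assert(s != NULL && out != NULL);
    if (stack_switch) {        /* handler runs at a numerically lower level */
        out[n++] = s->ss;
        out[n++] = s->esp;
    }
    out[n++] = s->eflags;
    out[n++] = s->cs;
    out[n++] = s->eip;
    if (has_error_code)
        out[n++] = s->error_code;  /* pushed after the EIP value */
    return n;
}
```

With no privilege-level change and no error code the sketch produces three doublewords; with a stack switch and an error code it produces six, matching the two layouts in the figure.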
5.10.1.1. PROTECTION OF EXCEPTION- AND INTERRUPT-HANDLER PROCEDURES
The privilege-level protection for exception- and interrupt-handler procedures is similar to that used for ordinary procedure calls when called through a call gate (refer to Section 4.8.4., Accessing a Code Segment Through a Call Gate in Chapter 4, Protection). The processor does not permit transfer of execution to an exception- or interrupt-handler procedure in a less privileged code segment (numerically greater privilege level) than the CPL. An attempt to violate this rule results in a general-protection exception (#GP). The protection mechanism for exception- and interrupt-handler procedures is different in the following ways:
• Because interrupt and exception vectors have no RPL, the RPL is not checked on implicit calls to exception and interrupt handlers.
• The processor checks the DPL of the interrupt or trap gate only if an exception or interrupt is generated with an INT n, INT 3, or INTO instruction. Here, the CPL must be less than or equal to the DPL of the gate. This restriction prevents application programs or procedures running at privilege level 3 from using a software interrupt to access critical exception handlers, such as the page-fault handler, providing that those handlers are placed in more privileged code segments (numerically lower privilege level). For hardware-generated interrupts and processor-detected exceptions, the processor ignores the DPL of interrupt and trap gates.
Because exceptions and interrupts generally do not occur at predictable times, these privilege rules effectively impose restrictions on the privilege levels at which exception- and interrupt-handling procedures can run. Either of the following techniques can be used to avoid privilege-level violations.
• The exception or interrupt handler can be placed in a conforming code segment. This technique can be used for handlers that only need to access data available on the stack (for example, divide error exceptions). If the handler needs data from a data segment, the data segment needs to be accessible from privilege level 3, which would make it unprotected.
• The handler can be placed in a nonconforming code segment with privilege level 0. This handler would always run, regardless of the CPL that the interrupted program or task is running at.
5.10.1.2. FLAG USAGE BY EXCEPTION- OR INTERRUPT-HANDLER PROCEDURE
When accessing an exception or interrupt handler through either an interrupt gate or a trap gate, the processor clears the TF flag in the EFLAGS register after it saves the contents of the EFLAGS register on the stack. (On calls to exception and interrupt handlers, the processor also clears the VM, RF, and NT flags in the EFLAGS register, after they are saved on the stack.) Clearing the TF flag prevents instruction tracing from affecting interrupt response. A subsequent IRET instruction restores the TF (and VM, RF, and NT) flags to the values in the saved contents of the EFLAGS register on the stack. The only difference between an interrupt gate and a trap gate is the way the processor handles the IF flag in the EFLAGS register. When accessing an exception- or interrupt-handling procedure through an interrupt gate, the processor clears the IF flag to prevent other interrupts from interfering with the current interrupt handler. A subsequent IRET instruction restores the IF flag to its value in the saved contents of the EFLAGS register on the stack. Accessing a handler procedure through a trap gate does not affect the IF flag.
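The gate DPL rule from Section 5.10.1.1 and the flag handling from Section 5.10.1.2 can be sketched together in C. The function and type names below are hypothetical; the EFLAGS bit positions (TF = 8, IF = 9, NT = 14, RF = 16, VM = 17) are the architectural ones. The sketch assumes the original EFLAGS value has already been saved on the stack before the flags are cleared.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_TF (1u << 8)
#define EFLAGS_IF (1u << 9)
#define EFLAGS_NT (1u << 14)
#define EFLAGS_RF (1u << 16)
#define EFLAGS_VM (1u << 17)

typedef enum { GATE_INTERRUPT, GATE_TRAP } gate_type;

/* The gate DPL is checked only for INT n, INT 3, and INTO; it is ignored
 * for hardware interrupts and processor-detected exceptions. */
int gate_dpl_check_passes(unsigned cpl, unsigned gate_dpl,
                          int software_generated)
{
    assert(cpl <= 3 && gate_dpl <= 3);
    if (!software_generated)
        return 1;
    return cpl <= gate_dpl;
}

/* Returns the EFLAGS value the handler starts with. */
uint32_t eflags_on_handler_entry(uint32_t eflags, gate_type type)
{
    /* Cleared for both interrupt gates and trap gates. */
    eflags &= ~(EFLAGS_TF | EFLAGS_VM | EFLAGS_RF | EFLAGS_NT);
    /* Only an interrupt gate also clears IF. */
    if (type == GATE_INTERRUPT)
        eflags &= ~EFLAGS_IF;
    return eflags;
}
```

For example, a privilege level 3 program executing INT n against a gate with DPL 0 fails the check, while a hardware interrupt through the same gate passes it.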
5.10.2. Interrupt Tasks

When an exception or interrupt handler is accessed through a task gate in the IDT, a task switch results. Handling an exception or interrupt with a separate task offers several advantages:
• The entire context of the interrupted program or task is saved automatically.
• A new TSS permits the handler to use a new privilege level 0 stack when handling the exception or interrupt. If an exception or interrupt occurs when the current privilege level 0 stack is corrupted, accessing the handler through a task gate can prevent a system crash by providing the handler with a new privilege level 0 stack.
• The handler can be further isolated from other tasks by giving it a separate address space. This is done by giving it a separate LDT.
The disadvantage of handling an interrupt with a separate task is that the amount of machine state that must be saved on a task switch makes it slower than using an interrupt gate, resulting in increased interrupt latency.
A task gate in the IDT references a TSS descriptor in the GDT (refer to Figure 5-5). A switch to the handler task is handled in the same manner as an ordinary task switch (refer to Section 6.3., Task Switching in Chapter 6, Task Management). The link back to the interrupted task is stored in the previous task link field of the handler task's TSS. If an exception caused an error code to be generated, this error code is copied to the stack of the new task.
Figure 5-5. Interrupt Task Switch (the interrupt vector selects a task gate in the IDT; the gate's TSS selector selects a TSS descriptor in the GDT, whose base address locates the TSS for the interrupt-handling task)
When exception- or interrupt-handler tasks are used in an operating system, there are actually two mechanisms that can be used to dispatch tasks: the software scheduler (part of the operating system) and the hardware scheduler (part of the processor's interrupt mechanism). The software scheduler needs to accommodate interrupt tasks that may be dispatched when interrupts are enabled.
5.11. ERROR CODE

When an exception condition is related to a specific segment, the processor pushes an error code onto the stack of the exception handler (whether it is a procedure or task). The error code has the format shown in Figure 5-6. The error code resembles a segment selector; however, instead of a TI flag and RPL field, the error code contains 3 flags:
EXT External event (bit 0). When set, indicates that an event external to the program caused the exception, such as a hardware interrupt.
IDT Descriptor location (bit 1). When set, indicates that the index portion of the error code refers to a gate descriptor in the IDT; when clear, indicates that the index refers to a descriptor in the GDT or the current LDT.
TI GDT/LDT (bit 2). Only used when the IDT flag is clear. When set, the TI flag indicates that the index portion of the error code refers to a segment or gate descriptor in the LDT; when clear, it indicates that the index refers to a descriptor in the current GDT.
Figure 5-6. Error Code
Bits 31-16   Reserved
Bits 15-3    Segment Selector Index
Bit 2        TI
Bit 1        IDT
Bit 0        EXT
The segment selector index field provides an index into the IDT, GDT, or current LDT to the segment or gate selector being referenced by the error code. In some cases the error code is null (that is, all bits in the lower word are clear). A null error code indicates that the error was not caused by a reference to a specific segment or that a null segment descriptor was referenced in an operation.
The format of the error code is different for page-fault exceptions (#PF); refer to "Interrupt 14 - Page-Fault Exception (#PF)" in this chapter.
The error code is pushed on the stack as a doubleword or word (depending on the default interrupt, trap, or task gate size). To keep the stack aligned for doubleword pushes, the upper half of the error code is reserved. Note that the error code is not popped when the IRET instruction is executed to return from an exception handler, so the handler must remove the error code before executing a return.
Error codes are not pushed on the stack for exceptions that are generated externally (with the INTR or LINT[1:0] pins) or with the INT n instruction, even if an error code is normally produced for those exceptions.
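A handler that receives a segment-related error code might decode it along the following lines. This is a minimal sketch under stated assumptions: the type and function names are hypothetical, the bit layout follows the EXT, IDT, and TI definitions above, and the page-fault (#PF) error code, which has a different format, is not handled.

```c
#include <assert.h>
#include <stdint.h>

#define ERR_EXT (1u << 0)   /* external event */
#define ERR_IDT (1u << 1)   /* index refers to the IDT */
#define ERR_TI  (1u << 2)   /* index refers to the LDT (only if IDT clear) */

typedef enum { TABLE_GDT, TABLE_LDT, TABLE_IDT } descriptor_table;

typedef struct {
    int              ext;      /* nonzero if caused by an external event */
    descriptor_table table;    /* table the index refers to */
    unsigned         index;    /* segment selector index, bits 15:3 */
    int              is_null;  /* all bits in the lower word are clear */
} error_code_info;

error_code_info decode_error_code(uint32_t code)
{
    error_code_info info;
    uint32_t low = code & 0xFFFFu;   /* upper half is reserved */

    info.ext = (low & ERR_EXT) != 0;
    if (low & ERR_IDT)
        info.table = TABLE_IDT;      /* TI is ignored when IDT is set */
    else if (low & ERR_TI)
        info.table = TABLE_LDT;
    else
        info.table = TABLE_GDT;
    info.index = (low >> 3) & 0x1FFFu;
    info.is_null = (low == 0);
    return info;
}

/* Builds an error code from its fields; useful for exercising the decoder. */
uint32_t make_error_code(unsigned index, descriptor_table table, int ext)
{
    uint32_t code;

    assert(index <= 0x1FFFu);
    code = (uint32_t)index << 3;
    if (table == TABLE_IDT)
        code |= ERR_IDT;
    else if (table == TABLE_LDT)
        code |= ERR_TI;
    if (ext)
        code |= ERR_EXT;
    return code;
}
```

For instance, an error code of 002CH decodes to index 5 in the current LDT with EXT clear.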
5.12. EXCEPTION AND INTERRUPT REFERENCE

The following sections describe conditions that generate exceptions and interrupts. They are arranged in the order of vector numbers. The information contained in these sections is as follows:
Exception Class Indicates whether the exception class is a fault, trap, or abort type. Some exceptions can be either a fault or trap type, depending on when the error condition is detected. (This section is not applicable to interrupts.)
Description Gives a general description of the purpose of the exception or interrupt type. It also describes how the processor handles the exception or interrupt.
Exception Error Code Indicates whether an error code is saved for the exception. If one is saved, the contents of the error code are described. (This section is not applicable to interrupts.)
Saved Instruction Pointer Describes which instruction the saved (or return) instruction pointer points to. It also indicates whether the pointer can be used to restart a faulting instruction.
Program State Change Describes the effects of the exception or interrupt on the state of the currently running program or task and the possibilities of restarting the program or task without loss of continuity.
Interrupt 0 - Divide Error Exception (#DE)

Exception Class Fault.
Description Indicates the divisor operand for a DIV or IDIV instruction is 0 or that the result cannot be represented in the number of bits specified for the destination operand.
Exception Error Code None.
Saved Instruction Pointer Saved contents of CS and EIP registers point to the instruction that generated the exception.
Program State Change A program-state change does not accompany the divide error, because the exception occurs before the faulting instruction is executed.
Interrupt 1 - Debug Exception (#DB)

Exception Class Trap or Fault. The exception handler can distinguish between traps or faults by examining the contents of DR6 and the other debug registers.
Description Indicates that one or more of several debug-exception conditions has been detected. Whether the exception is a fault or a trap depends on the condition, as shown below:
Exception Condition                                                    Exception Class
Instruction fetch breakpoint                                           Fault
Data read or write breakpoint                                          Trap
I/O read or write breakpoint                                           Trap
General detect condition (in conjunction with in-circuit emulation)    Fault
Single-step                                                            Trap
Task-switch                                                            Trap
Execution of INT 1 instruction                                         Trap
Refer to Chapter 15, Debugging and Performance Monitoring, for detailed information about the debug exceptions.
Exception Error Code None. An exception handler can examine the debug registers to determine which condition caused the exception.
Saved Instruction Pointer Fault: Saved contents of CS and EIP registers point to the instruction that generated the exception. Trap: Saved contents of CS and EIP registers point to the instruction following the instruction that generated the exception.
Program State Change Fault: A program-state change does not accompany the debug exception, because the exception occurs before the faulting instruction is executed. The program can resume normal execution upon returning from the debug exception handler. Trap: A program-state change does accompany the debug exception, because the instruction or task switch being executed is allowed to complete before the exception is generated. However, the new state of the program is not corrupted and execution of the program can continue reliably.
Interrupt 2 - NMI Interrupt

Exception Class Not applicable.
Description The nonmaskable interrupt (NMI) is generated externally by asserting the processor's NMI pin or through an NMI request set by the I/O APIC to the local APIC on the APIC serial bus. This interrupt causes the NMI interrupt handler to be called.
Exception Error Code Not applicable.
Saved Instruction Pointer The processor always takes an NMI interrupt on an instruction boundary. The saved contents of CS and EIP registers point to the next instruction to be executed at the point the interrupt is taken. Refer to Section 5.4., Program or Task Restart for more information about when the processor takes NMI interrupts.
Program State Change The instruction executing when an NMI interrupt is received is completed before the NMI is generated. A program or task can thus be restarted upon returning from an interrupt handler without loss of continuity, provided the interrupt handler saves the state of the processor before handling the interrupt and restores the processor's state prior to a return.
Interrupt 3 - Breakpoint Exception (#BP)

Exception Class Trap.
Description Indicates that a breakpoint instruction (INT 3) was executed, causing a breakpoint trap to be generated. Typically, a debugger sets a breakpoint by replacing the first opcode byte of an instruction with the opcode for the INT 3 instruction. (The INT 3 instruction is one byte long, which makes it easy to replace an opcode in a code segment in RAM with the breakpoint opcode.) The operating system or a debugging tool can use a data segment mapped to the same physical address space as the code segment to place an INT 3 instruction in places where it is desired to call the debugger.
With the P6 family, Pentium, Intel486, and Intel386 processors, it is more convenient to set breakpoints with the debug registers. (Refer to Section 15.3.2., Breakpoint Exception (#BP) - Interrupt Vector 3, in Chapter 15, Debugging and Performance Monitoring, for information about the breakpoint exception.) If more breakpoints are needed beyond what the debug registers allow, the INT 3 instruction can be used.
The breakpoint (#BP) exception can also be generated by executing the INT n instruction with an operand of 3. The action of this two-byte form (opcode CD 03) is slightly different from that of the one-byte INT 3 instruction (opcode CC); refer to "INT n/INTO/INT 3 - Call to Interrupt Procedure" in Chapter 3 of the Intel Architecture Software Developer's Manual, Volume 2.
Exception Error Code None.
Saved Instruction Pointer Saved contents of CS and EIP registers point to the instruction following the INT 3 instruction.
Program State Change Even though the EIP points to the instruction following the breakpoint instruction, the state of the program is essentially unchanged because the INT 3 instruction does not affect any register or memory locations. The debugger can thus resume the suspended program by replacing the INT 3 instruction that caused the breakpoint with the original opcode and decrementing the saved contents of the EIP register. Upon returning from the debugger, program execution resumes with the replaced instruction.
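The set-and-resume sequence a debugger performs around INT 3 can be sketched in C against an in-memory copy of the code. This is a simplified model: the `breakpoint` structure and both functions are hypothetical names, offsets into a byte buffer stand in for EIP values, and a real debugger would use its operating system's facilities to read and write the target's memory and saved registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCCu   /* one-byte INT 3 encoding */

typedef struct {
    size_t  offset;    /* offset of the patched instruction in the buffer */
    uint8_t original;  /* first opcode byte that INT 3 replaced */
} breakpoint;

/* Replaces the first opcode byte at `offset` with INT 3 and remembers it. */
breakpoint set_breakpoint(uint8_t *code, size_t size, size_t offset)
{
    breakpoint bp;

    assert(code != NULL && offset < size);
    bp.offset = offset;
    bp.original = code[offset];
    code[offset] = INT3_OPCODE;
    return bp;
}

/* `saved_eip` is the return pointer found on the handler's stack; after the
 * trap it points just past the INT 3 byte. Restores the original opcode and
 * returns the decremented EIP at which execution should resume. */
uint32_t resume_after_breakpoint(uint8_t *code, const breakpoint *bp,
                                 uint32_t saved_eip)
{
    assert(code != NULL && bp != NULL);
    assert(saved_eip == bp->offset + 1);
    code[bp->offset] = bp->original;
    return saved_eip - 1;
}
```

The decrement is by one because the INT 3 instruction is exactly one byte long, so the saved EIP is one past the start of the replaced instruction.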
Interrupt 4 - Overflow Exception (#OF)

Exception Class Trap.
Description Indicates that an overflow trap occurred when an INTO instruction was executed. The INTO instruction checks the state of the OF flag in the EFLAGS register. If the OF flag is set, an overflow trap is generated.
Some arithmetic instructions (such as the ADD and SUB) perform both signed and unsigned arithmetic. These instructions set the OF and CF flags in the EFLAGS register to indicate signed overflow and unsigned overflow, respectively. When performing arithmetic on signed operands, the OF flag can be tested directly or the INTO instruction can be used. The benefit of using the INTO instruction is that if the overflow exception is detected, an exception handler can be called automatically to handle the overflow condition.
Exception Error Code None.
Saved Instruction Pointer The saved contents of CS and EIP registers point to the instruction following the INTO instruction.
Program State Change Even though the EIP points to the instruction following the INTO instruction, the state of the program is essentially unchanged because the INTO instruction does not affect any register or memory locations. The program can thus resume normal execution upon returning from the overflow exception handler.
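The distinction between signed overflow (OF) and unsigned overflow (CF) for a 32-bit ADD can be illustrated with a small C model. The `add_flags` structure and function names are hypothetical, and the model covers only these two flags; it is a sketch of the flag semantics, not of the instruction as a whole.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t result;
    int      cf;   /* unsigned overflow: carry out of bit 31 */
    int      of;   /* signed overflow */
} add_flags;

add_flags add32(uint32_t a, uint32_t b)
{
    add_flags r;

    r.result = a + b;        /* unsigned arithmetic wraps modulo 2^32 */
    r.cf = r.result < a;     /* a wrapped sum is smaller than an operand */
    /* Signed overflow occurs when both operands have the same sign and the
     * sign of the result differs from it. */
    r.of = (int)((~(a ^ b) & (a ^ r.result)) >> 31);
    return r;
}

/* Models the INTO check: the overflow trap is generated only if OF is set. */
int into_would_trap(add_flags f)
{
    assert(f.of == 0 || f.of == 1);
    return f.of;
}
```

Adding 1 to 7FFFFFFFH sets OF but not CF, whereas adding 1 to FFFFFFFFH sets CF but not OF, so only the first case would trap on a following INTO.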
Interrupt 5 - BOUND Range Exceeded Exception (#BR)

Exception Class Fault.
Description Indicates that a BOUND-range-exceeded fault occurred when a BOUND instruction was executed. The BOUND instruction checks that a signed array index is within the upper and lower bounds of an array located in memory. If the array index is not within the bounds of the array, a BOUND-range-exceeded fault is generated.
Exception Error Code None.
Saved Instruction Pointer The saved contents of CS and EIP registers point to the BOUND instruction that generated the exception.
Program State Change A program-state change does not accompany the bounds-check fault, because the operands for the BOUND instruction are not modified. Returning from the BOUND-range-exceeded exception handler causes the BOUND instruction to be restarted.
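The comparison performed by the BOUND instruction can be modeled in C as a signed range check against a lower and upper bound stored together in memory. The `bound_pair` structure and the function name are hypothetical; the sketch assumes 32-bit operands and inclusive bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models the memory operand of BOUND: lower bound followed by upper bound. */
typedef struct {
    int32_t lower;
    int32_t upper;
} bound_pair;

/* Returns 1 if the signed index lies outside [lower, upper], which is the
 * condition under which the BOUND-range-exceeded fault is generated. */
int bound_would_fault(int32_t index, const bound_pair *b)
{
    assert(b != NULL);
    return index < b->lower || index > b->upper;
}
```

Because the check does not modify its operands, a handler that corrects the index (or the bounds) and returns causes the same check to be repeated.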
Interrupt 6 - Invalid Opcode Exception (#UD)

Exception Class Fault.
Description Indicates that the processor did one of the following things:
• Attempted to execute a Streaming SIMD Extensions instruction in an Intel Architecture processor that does not support the Streaming SIMD Extensions.
• Attempted to execute a Streaming SIMD Extensions instruction when the OSFXSR bit is not set (0) in CR4. Note that this does not include the following Streaming SIMD Extensions: PAVGB, PAVGW, PEXTRW, PINSRW, PMAXSW, PMAXUB, PMINSW, PMINUB, PMOVMSKB, PMULHUW, PSADBW, PSHUFW, MASKMOVQ, MOVNTQ, PREFETCH and SFENCE.
• Attempted to execute a Streaming SIMD Extensions instruction in an Intel Architecture processor which causes a numeric exception when the OSXMMEXCPT bit is not set (0) in CR4.
• Attempted to execute an invalid or reserved opcode, including any MMX instruction in an Intel Architecture processor that does not support the MMX architecture.
• Attempted to execute an MMX instruction or SIMD floating-point instruction when the EM flag in register CR0 is set. Note that this does not include the following Streaming SIMD Extensions: SFENCE and PREFETCH.
• Attempted to execute an instruction with an operand type that is invalid for its accompanying opcode; for example, the source operand for a LES instruction is not a memory location.
• Executed a UD2 instruction.
• Detected a LOCK prefix that precedes an instruction that may not be locked or one that may be locked but the destination operand is not a memory location.
• Attempted to execute an LLDT, SLDT, LTR, STR, LSL, LAR, VERR, VERW, or ARPL instruction while in real-address or virtual-8086 mode.
• Attempted to execute the RSM instruction when not in SMM mode.
In the P6 family processors, this exception is not generated until an attempt is made to retire the result of executing an invalid instruction; that is, decoding and speculatively attempting to execute an invalid opcode does not generate this exception. Likewise, in the Pentium processor and earlier Intel Architecture processors, this exception is not generated as the result of prefetching and preliminary decoding of an invalid instruction. (Refer to Section 5.4., Program or Task Restart for general rules for taking of interrupts and exceptions.)
The opcodes D6 and F1 are undefined opcodes that are reserved by Intel. These opcodes, even though undefined, do not generate an invalid opcode exception.
The UD2 instruction is guaranteed to generate an invalid opcode exception.
Exception Error Code None.
Saved Instruction Pointer The saved contents of CS and EIP registers point to the instruction that generated the exception.
Program State Change A program-state change does not accompany an invalid-opcode fault, because the invalid instruction is not executed.
Interrupt 7 - Device Not Available Exception (#NM)

Exception Class Fault.
Description Indicates that the device-not-available fault was generated by any of the following three conditions:
• The processor executed a floating-point instruction while the EM flag of register CR0 was set.
• The processor executed a floating-point, MMX or SIMD floating-point (excluding prefetch, sfence or streaming store instructions) instruction while the TS flag of register CR0 was set.
• The processor executed a WAIT or FWAIT instruction while the MP and TS flags of register CR0 were set.
The EM flag is set when the processor does not have an internal floating-point unit. An exception is then generated each time a floating-point instruction is encountered, allowing an exception handler to call floating-point instruction emulation routines.
The TS flag indicates that a context switch (task switch) has occurred since the last time a floating-point, MMX or SIMD floating-point (excluding prefetch, sfence or streaming store instructions) instruction was executed, but that the context of the FPU was not saved. When the TS flag is set, the processor generates a device-not-available exception each time a floating-point, MMX or SIMD floating-point (excluding prefetch, sfence or streaming store instructions) instruction is encountered. The exception handler can then save the context of the FPU before it executes the instruction. Refer to Section 2.5., Control Registers, in Chapter 2, System Architecture Overview, for more information about the TS flag.
The MP flag in control register CR0 is used along with the TS flag to determine if WAIT or FWAIT instructions should generate a device-not-available exception. It extends the function of the TS flag to the WAIT and FWAIT instructions, giving the exception handler an opportunity to save the context of the FPU before the WAIT or FWAIT instruction is executed. The MP flag is provided primarily for use with the Intel286 and Intel386 DX processors. For programs running on the P6 family, Pentium, or Intel486 DX processors, or the Intel 487 SX coprocessors, the MP flag should always be set; for programs running on the Intel486 SX processor, the MP flag should be clear.
Exception Error Code None.
Saved Instruction Pointer The saved contents of CS and EIP registers point to the floating-point instruction or the WAIT/FWAIT instruction that generated the exception.
Program State Change A program-state change does not accompany a device-not-available fault, because the instruction that generated the exception is not executed. If the EM flag is set, the exception handler can then read the floating-point instruction pointed to by the EIP and call the appropriate emulation routine. If the MP and TS flags are set or the TS flag alone is set, the exception handler can save the context of the FPU, clear the TS flag, and continue execution at the interrupted floating-point or WAIT/FWAIT instruction.
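The three conditions that generate the device-not-available fault can be expressed as a small decision function. The sketch below is illustrative: the enumeration and function names are hypothetical, the CR0 bit positions (MP = 1, EM = 2, TS = 3) are the architectural ones, and the treatment of MMX or SIMD floating-point instructions when EM is set is an assumption based on the invalid-opcode (#UD) conditions listed for Interrupt 6.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_MP (1u << 1)
#define CR0_EM (1u << 2)
#define CR0_TS (1u << 3)

/* Coarse instruction categories, for illustration only. */
typedef enum { INSN_X87, INSN_MMX_OR_SIMD_FP, INSN_WAIT } insn_kind;

/* Returns 1 if executing an instruction of the given kind with this CR0
 * value would generate #NM under the three conditions in the text. */
int raises_device_not_available(uint32_t cr0, insn_kind kind)
{
    int em = (cr0 & CR0_EM) != 0;
    int ts = (cr0 & CR0_TS) != 0;
    int mp = (cr0 & CR0_MP) != 0;

    assert(kind == INSN_X87 || kind == INSN_MMX_OR_SIMD_FP ||
           kind == INSN_WAIT);
    switch (kind) {
    case INSN_X87:
        return em || ts;      /* conditions 1 and 2 */
    case INSN_MMX_OR_SIMD_FP:
        /* Assumption: with EM set these instructions are reported as
         * invalid opcodes (#UD) instead, so no #NM is modeled here. */
        return ts && !em;     /* condition 2 */
    case INSN_WAIT:
        return mp && ts;      /* condition 3 */
    }
    return 0;
}
```

A lazy FPU context-switch scheme relies on the second and third cases: the operating system sets TS on a task switch and saves the FPU context only when the first floating-point or WAIT/FWAIT instruction of the new task faults.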
Interrupt 8 - Double Fault Exception (#DF)

Exception Class Abort.
Description Indicates that the processor detected a second exception while calling an exception handler for a prior exception. Normally, when the processor detects another exception while trying to call an exception handler, the two exceptions can be handled serially. If, however, the processor cannot handle them serially, it signals the double-fault exception. To determine when two faults need to be signaled as a double fault, the processor divides the exceptions into three classes: benign exceptions, contributory exceptions, and page faults (refer to Table 5-4).
Table 5-4. Interrupt and Exception Classes
Class                               Vector Number   Description
Benign Exceptions and Interrupts    1               Debug Exception
                                    2               NMI Interrupt
                                    3               Breakpoint
                                    4               Overflow
                                    5               BOUND Range Exceeded
                                    6               Invalid Opcode
                                    7               Device Not Available
                                    9               Coprocessor Segment Overrun
                                    16              Floating-Point Error
                                    17              Alignment Check
                                    18              Machine Check
                                    19              SIMD floating-point extensions
                                    All             INT n
                                    All             INTR
Contributory Exceptions             0               Divide Error
                                    10              Invalid TSS
                                    11              Segment Not Present
                                    12              Stack Fault
                                    13              General Protection
Page Faults                         14              Page Fault
Table 5-5 shows the various combinations of exception classes that cause a double fault to be generated. A double-fault exception falls in the abort class of exceptions. The program or task cannot be restarted or resumed. The double-fault handler can be used to collect diagnostic information about the state of the machine and/or, when possible, to shut the application and/or system down gracefully or restart the system. A segment or page fault may be encountered while prefetching instructions; however, this behavior is outside the domain of Table 5-5. Any further faults generated while the processor is attempting to transfer control to the appropriate fault handler could still lead to a double-fault sequence.
Table 5-5. Conditions for Generating a Double Fault

                     Second Exception
First Exception      Benign                        Contributory                  Page Fault
Benign               Handle Exceptions Serially    Handle Exceptions Serially    Handle Exceptions Serially
Contributory         Handle Exceptions Serially    Generate a Double Fault       Handle Exceptions Serially
Page Fault           Handle Exceptions Serially    Generate a Double Fault       Generate a Double Fault
If another exception occurs while attempting to call the double-fault handler, the processor enters shutdown mode. This mode is similar to the state following execution of an HLT instruction. In this mode, the processor stops executing instructions until an NMI interrupt, SMI interrupt, hardware reset, or INIT# is received. The processor generates a special bus cycle to indicate that it has entered shutdown mode. Software designers may need to be aware of the response of hardware to receiving this signal. For example, hardware may turn on an indicator light on the front panel, generate an NMI interrupt to record diagnostic information, invoke reset initialization, generate an INIT initialization, or generate an SMI. If the shutdown occurs while the processor is executing an NMI interrupt handler, then only a hardware reset can restart the processor.
Exception Error Code Zero. The processor always pushes an error code of 0 onto the stack of the double-fault handler.
Saved Instruction Pointer The saved contents of CS and EIP registers are undefined.
Program State Change The program state following a double-fault exception is undefined. The program or task cannot be resumed or restarted. The only available action of the double-fault exception handler is to collect all possible context information for use in diagnostics and then close the application and/or shut down or reset the processor.
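The classification in Table 5-4 and the combinations in Table 5-5 can be sketched as a lookup in C. The names below are hypothetical. The sketch reads Table 5-5 with the first exception selecting the row and the second exception selecting the column, and it treats any vector not listed as contributory or page fault as benign, which is a simplifying assumption (for example, it does not model a further exception raised while calling the double-fault handler itself).

```c
#include <assert.h>

typedef enum { CLASS_BENIGN, CLASS_CONTRIBUTORY, CLASS_PAGE_FAULT } exc_class;

/* Classifies a vector number according to Table 5-4. */
exc_class classify_vector(unsigned vector)
{
    assert(vector <= 255);
    switch (vector) {
    case 0:   /* Divide Error */
    case 10:  /* Invalid TSS */
    case 11:  /* Segment Not Present */
    case 12:  /* Stack Fault */
    case 13:  /* General Protection */
        return CLASS_CONTRIBUTORY;
    case 14:  /* Page Fault */
        return CLASS_PAGE_FAULT;
    default:  /* assumption: everything else is treated as benign */
        return CLASS_BENIGN;
    }
}

/* Returns 1 if a second exception raised while calling the handler for the
 * first one is signaled as a double fault, 0 if the two are handled serially. */
int is_double_fault(unsigned first_vector, unsigned second_vector)
{
    exc_class first = classify_vector(first_vector);
    exc_class second = classify_vector(second_vector);

    if (first == CLASS_CONTRIBUTORY)
        return second == CLASS_CONTRIBUTORY;
    if (first == CLASS_PAGE_FAULT)
        return second != CLASS_BENIGN;
    return 0;   /* a benign first exception never leads to a double fault */
}
```

Under this reading, a segment-not-present fault while calling the general-protection handler is a double fault, whereas a page fault while calling the general-protection handler is handled serially.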
Interrupt 9 - Coprocessor Segment Overrun

Exception Class Abort. (Intel reserved; do not use. Recent Intel Architecture processors do not generate this exception.)
Description Indicates that an Intel386 CPU-based system with an Intel 387 math coprocessor detected a page or segment violation while transferring the middle portion of an Intel 387 math coprocessor operand. The P6 family, Pentium, and Intel486 processors do not generate this exception; instead, this condition is detected with a general protection exception (#GP), interrupt 13.
Exception Error Code None.
Saved Instruction Pointer The saved contents of CS and EIP registers point to the instruction that generated the exception.
Program State Change The program state following a coprocessor segment-overrun exception is undefined. The program or task cannot be resumed or restarted. The only available action of the exception handler is to save the instruction pointer and reinitialize the FPU using the FNINIT instruction.
Interrupt 10 - Invalid TSS Exception (#TS)

Exception Class Fault.
Description Indicates that a task switch was attempted and that invalid information was detected in the TSS for the target task. Table 5-6 shows the conditions that will cause an invalid-TSS exception to be generated. In general, these invalid conditions result from protection violations for the TSS descriptor; the LDT pointed to by the TSS; or the stack, code, or data segments referenced by the TSS.
Table 5-6. Invalid TSS Conditions
Error Code Index                 Invalid Condition
TSS segment selector index       TSS segment limit less than 67H for 32-bit TSS or less than 2CH for 16-bit TSS
LDT segment selector index       Invalid LDT or LDT not present
Stack-segment selector index     Stack-segment selector exceeds descriptor table limit
Stack-segment selector index     Stack segment is not writable
Stack-segment selector index     Stack segment DPL ≠ CPL
Stack-segment selector index     Stack-segment selector RPL ≠ CPL
Code-segment selector index      Code-segment selector exceeds descriptor table limit
Code-segment selector index      Code segment is not executable
Code-segment selector index      Nonconforming code segment DPL ≠ CPL
Code-segment selector index      Conforming code segment DPL greater than CPL
Data-segment selector index      Data-segment selector exceeds descriptor table limit
Data-segment selector index      Data segment not readable
This exception can be generated either in the context of the original task or in the context of the new task (refer to Section 6.3., Task Switching in Chapter 6, Task Management). Until the processor has completely verified the presence of the new TSS, the exception is generated in the context of the original task. Once the existence of the new TSS is verified, the task switch is considered complete. Any invalid-TSS conditions detected after this point are handled in the context of the new task. (A task switch is considered complete when the task register is loaded with the segment selector for the new TSS and, if the switch is due to a procedure call or interrupt, the previous task link field of the new TSS references the old TSS.)
To ensure that a valid TSS is available to process the exception, the invalid-TSS exception handler must be a task called using a task gate.
Exception Error Code An error code containing the segment selector index for the segment descriptor that caused the violation is pushed onto the stack of the exception handler. If the EXT flag is set, it indicates that the exception was caused by an event external to the currently running program (for example, if an external interrupt handler using a task gate attempted a task switch to an invalid TSS).
Saved Instruction Pointer If the exception condition was detected before the task switch was carried out, the saved contents of CS and EIP registers point to the instruction that invoked the task switch. If the exception condition was detected after the task switch was carried out, the saved contents of CS and EIP registers point to the first instruction of the new task.
Program State Change The ability of the invalid-TSS handler to recover from the fault depends on the error condition that causes the fault. Refer to Section 6.3., Task Switching in Chapter 6, Task Management for more information on the task switch process and the possible recovery actions that can be taken.
If an invalid TSS exception occurs during a task switch, it can occur before or after the commit-to-new-task point. If it occurs before the commit point, no program state change occurs. If it occurs after the commit point (when the segment descriptor information for the new segment selectors has been loaded in the segment registers), the processor will load all the state information from the new TSS before it generates the exception. During a task switch, the processor first loads all the segment registers with segment selectors from the TSS, then checks their contents for validity. If an invalid TSS exception is discovered, the remaining segment registers are loaded but not checked for validity and therefore may not be usable for referencing memory.
The invalid TSS handler should not rely on being able to use the segment selectors found in the CS, SS, DS, ES, FS, and GS registers without causing another exception. The exception handler should load all segment registers before trying to resume the new task; otherwise, general-protection exceptions (#GP) may result later under conditions that make diagnosis more difficult. The Intel-recommended way of dealing with this situation is to use a task for the invalid TSS exception handler. The task switch back to the interrupted task from the invalid-TSS exception-handler task will then cause the processor to check the registers as it loads them from the TSS.
Interrupt 11 - Segment Not Present (#NP)

Exception Class
Fault.

Description
Indicates that the present flag of a segment or gate descriptor is clear. The processor can generate this exception during any of the following operations:
• While attempting to load CS, DS, ES, FS, or GS registers. [Detection of a not-present segment while loading the SS register causes a stack fault exception (#SS) to be generated.] This situation can occur while performing a task switch.
• While attempting to load the LDTR using an LLDT instruction. Detection of a not-present LDT while loading the LDTR during a task switch operation causes an invalid-TSS exception (#TS) to be generated.
• When executing the LTR instruction and the TSS is marked not present.
• While attempting to use a gate descriptor or TSS that is marked segment-not-present, but is otherwise valid.
An operating system typically uses the segment-not-present exception to implement virtual memory at the segment level. If the exception handler loads the segment and returns, the interrupted program or task resumes execution. A not-present indication in a gate descriptor, however, does not indicate that a segment is not present (because gates do not correspond to segments). The operating system may use the present flag for gate descriptors to trigger exceptions of special significance to the operating system.

Exception Error Code
An error code containing the segment selector index for the segment descriptor that caused the violation is pushed onto the stack of the exception handler. If the EXT flag is set, it indicates that the exception resulted from an external event (NMI or INTR) that caused an interrupt, which subsequently referenced a not-present segment. The IDT flag is set if the error code refers to an IDT entry (for example, an INT instruction referencing a not-present gate).

Saved Instruction Pointer
The saved contents of CS and EIP registers normally point to the instruction that generated the exception. If the exception occurred while loading segment descriptors for the segment selectors in a new TSS, the CS and EIP registers point to the first instruction in the new task. If the exception occurred while accessing a gate descriptor, the CS and EIP registers point to the instruction that invoked the access (for example, a CALL instruction that references a call gate).
Program State Change
If the segment-not-present exception occurs as the result of loading a register (CS, DS, SS, ES, FS, GS, or LDTR), a program-state change does accompany the exception, because the register is not loaded. Recovery from this exception is possible by simply loading the missing segment into memory and setting the present flag in the segment descriptor.
If the segment-not-present exception occurs while accessing a gate descriptor, a program-state change does not accompany the exception. Recovery from this exception is possible merely by setting the present flag in the gate descriptor.
If a segment-not-present exception occurs during a task switch, it can occur before or after the commit-to-new-task point (refer to Section 6.3., "Task Switching" in Chapter 6, Task Management). If it occurs before the commit point, no program state change occurs. If it occurs after the commit point, the processor will load all the state information from the new TSS (without performing any additional limit, present, or type checks) before it generates the exception. The segment-not-present exception handler should thus not rely on being able to use the segment selectors found in the CS, SS, DS, ES, FS, and GS registers without causing another exception. (Refer to the Program State Change description for "Interrupt 10 - Invalid TSS Exception (#TS)" in this chapter for additional information on how to handle this situation.)
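The error code described for this exception (a selector index plus the EXT and IDT flags) can be unpacked with a few bit operations. The following Python sketch is illustrative only: the function and field names are hypothetical, and the bit positions (EXT in bit 0, IDT in bit 1, the table-indicator bit TI in bit 2, index in bits 3 through 15) are assumed from the general error-code format given elsewhere in this chapter rather than from the text above.

```python
from typing import NamedTuple


class SelectorErrorCode(NamedTuple):
    """Fields of a selector-style exception error code (hypothetical names)."""
    ext: bool   # bit 0: exception arose while delivering an external event
    idt: bool   # bit 1: index refers to a gate descriptor in the IDT
    ti: bool    # bit 2: when idt is clear, False = GDT, True = LDT
    index: int  # bits 3-15: selector index or IDT vector number


def decode_selector_error_code(code: int) -> SelectorErrorCode:
    """Split the low 16 bits of an error code into its fields."""
    code &= 0xFFFF
    return SelectorErrorCode(
        ext=bool(code & 0x1),
        idt=bool(code & 0x2),
        ti=bool(code & 0x4),
        index=(code >> 3) & 0x1FFF,
    )
```

For instance, an error code of 0x0102 decodes to an IDT reference (vector 32) that was not caused by an external event, which matches the case of an INT instruction referencing a not-present gate.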
Interrupt 12 - Stack Fault Exception (#SS)

Exception Class
Fault.

Description
Indicates that one of the following stack-related conditions was detected:
• A limit violation is detected during an operation that refers to the SS register. Operations that can cause a limit violation include stack-oriented instructions such as POP, PUSH, CALL, RET, IRET, ENTER, and LEAVE, as well as other memory references which implicitly or explicitly use the SS register (for example, MOV AX, [BP+6] or MOV AX, SS:[EAX+6]). The ENTER instruction generates this exception when there is not enough stack space for allocating local variables.
• A not-present stack segment is detected when attempting to load the SS register. This violation can occur during the execution of a task switch, a CALL instruction to a different privilege level, a return to a different privilege level, an LSS instruction, or a MOV or POP instruction to the SS register.
Recovery from this fault is possible by either extending the limit of the stack segment (in the case of a limit violation) or loading the missing stack segment into memory (in the case of a not-present violation).

Exception Error Code
If the exception is caused by a not-present stack segment or by overflow of the new stack during an inter-privilege-level call, the error code contains a segment selector for the segment that caused the exception. Here, the exception handler can test the present flag in the segment descriptor pointed to by the segment selector to determine the cause of the exception. For a normal limit violation (on a stack segment already in use) the error code is set to 0.

Saved Instruction Pointer
The saved contents of CS and EIP registers generally point to the instruction that generated the exception. However, when the exception results from attempting to load a not-present stack segment during a task switch, the CS and EIP registers point to the first instruction of the new task.

Program State Change
A program-state change does not generally accompany a stack-fault exception, because the instruction that generated the fault is not executed. Here, the instruction can be restarted after the exception handler has corrected the stack fault condition.
If a stack fault occurs during a task switch, it occurs after the commit-to-new-task point (refer to Section 6.3., "Task Switching" in Chapter 6, Task Management). Here, the processor loads all the state information from the new TSS (without performing any additional limit, present, or type checks) before it generates the exception. The stack fault handler should thus not rely on being able to use the segment selectors found in the CS, SS, DS, ES, FS, and GS registers without causing another exception. The exception handler should check all segment registers before trying to resume the new task; otherwise, general-protection faults may result later under conditions that are more difficult to diagnose. (Refer to the Program State Change description for "Interrupt 10 - Invalid TSS Exception (#TS)" in this chapter for additional information on how to handle this situation.)
Interrupt 13 - General Protection Exception (#GP)

Exception Class
Fault.

Description
Indicates that the processor detected one of a class of protection violations called general-protection violations. The conditions that cause this exception to be generated comprise all the protection violations that do not cause other exceptions to be generated (such as invalid-TSS, segment-not-present, stack-fault, or page-fault exceptions). The following conditions cause general-protection exceptions to be generated:
• Exceeding the segment limit when accessing the CS, DS, ES, FS, or GS segments.
• Exceeding the segment limit when referencing a descriptor table (except during a task switch or a stack switch).
• Transferring execution to a segment that is not executable.
• Writing to a code segment or a read-only data segment.
• Reading from an execute-only code segment.
• Loading the SS register with a segment selector for a read-only segment (unless the selector comes from a TSS during a task switch, in which case an invalid-TSS exception occurs).
• Loading the SS, DS, ES, FS, or GS register with a segment selector for a system segment.
• Loading the DS, ES, FS, or GS register with a segment selector for an execute-only code segment.
• Loading the SS register with the segment selector of an executable segment or a null segment selector.
• Loading the CS register with a segment selector for a data segment or a null segment selector.
• Accessing memory using the DS, ES, FS, or GS register when it contains a null segment selector.
• Switching to a busy task during a call or jump to a TSS.
• Switching to an available (nonbusy) task during the execution of an IRET instruction.
• Using a segment selector on task switch that points to a TSS descriptor in the current LDT. TSS descriptors can only reside in the GDT.
• Violating any of the privilege rules described in Chapter 4, Protection.
• Exceeding the instruction length limit of 15 bytes (this can only occur when redundant prefixes are placed before an instruction).
• Loading the CR0 register with a set PG flag (paging enabled) and a clear PE flag (protection disabled).
• Loading the CR0 register with a set NW flag and a clear CD flag.
• Referencing an entry in the IDT (following an interrupt or exception) that is not an interrupt, trap, or task gate.
• Attempting to access an interrupt or exception handler through an interrupt or trap gate from virtual-8086 mode when the handler's code segment DPL is greater than 0.
• Attempting to write a 1 into a reserved bit of CR4.
• Attempting to execute a privileged instruction when the CPL is not equal to 0 (refer to Section 4.9., "Privileged Instructions" in Chapter 4, Protection for a list of privileged instructions).
• Writing to a reserved bit in an MSR.
• Accessing a gate that contains a null segment selector.
• Executing the INT n instruction when the CPL is greater than the DPL of the referenced interrupt, trap, or task gate.
• The segment selector in a call, interrupt, or trap gate does not point to a code segment.
• The segment selector operand in the LLDT instruction is a local type (TI flag is set) or does not point to a segment descriptor of the LDT type.
• The segment selector operand in the LTR instruction is local or points to a TSS that is not available.
• The target code-segment selector for a call, jump, or return is null.
• The PAE and/or PSE flag in control register CR4 is set and the processor detects any reserved bits in a page-directory-pointer-table entry set to 1. These bits are checked during a write to control registers CR0, CR3, or CR4 that causes a reloading of the page-directory-pointer-table entry.
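Two of the conditions listed above concern illegal flag combinations in a value loaded into CR0, and they are simple enough to express as a predicate. The Python sketch below is only an illustration of those two rules; the function name is hypothetical, and the bit positions (PE bit 0, NW bit 29, CD bit 30, PG bit 31) are taken from the CR0 layout described in Chapter 2 rather than from this section.

```python
CR0_PE = 1 << 0    # protection enable
CR0_NW = 1 << 29   # not write-through
CR0_CD = 1 << 30   # cache disable
CR0_PG = 1 << 31   # paging


def cr0_load_raises_gp(value: int) -> bool:
    """Return True if loading this value into CR0 matches one of the two
    CR0-related general-protection conditions in the list above."""
    paging_without_protection = bool(value & CR0_PG) and not (value & CR0_PE)
    nw_without_cd = bool(value & CR0_NW) and not (value & CR0_CD)
    return paging_without_protection or nw_without_cd
```

The check covers only these two combinations; the other conditions in the list depend on descriptor contents and privilege state that a single register value cannot capture.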
A program or task can be restarted following any general-protection exception. If the exception occurs while attempting to call an interrupt handler, the interrupted program can be restarted, but the interrupt may be lost.

Exception Error Code
The processor pushes an error code onto the exception handler's stack. If the fault condition was detected while loading a segment descriptor, the error code contains a segment selector to, or the IDT vector number for, the descriptor; otherwise, the error code is 0. The source of the selector in an error code may be any of the following:

• An operand of the instruction.
• A selector from a gate which is the operand of the instruction.
• A selector from a TSS involved in a task switch.
• IDT vector number.

Saved Instruction Pointer
The saved contents of CS and EIP registers point to the instruction that generated the exception.

Program State Change
In general, a program-state change does not accompany a general-protection exception, because the invalid instruction or operation is not executed. An exception handler can be designed to correct all of the conditions that cause general-protection exceptions and restart the program or task without any loss of program continuity.
If a general-protection exception occurs during a task switch, it can occur before or after the commit-to-new-task point (refer to Section 6.3., "Task Switching" in Chapter 6, Task Management). If it occurs before the commit point, no program state change occurs. If it occurs after the commit point, the processor will load all the state information from the new TSS (without performing any additional limit, present, or type checks) before it generates the exception. The general-protection exception handler should thus not rely on being able to use the segment selectors found in the CS, SS, DS, ES, FS, and GS registers without causing another exception. (Refer to the Program State Change description for "Interrupt 10 - Invalid TSS Exception (#TS)" in this chapter for additional information on how to handle this situation.)
Interrupt 14 - Page-Fault Exception (#PF)

Exception Class
Fault.

Description
Indicates that, with paging enabled (the PG flag in the CR0 register is set), the processor detected one of the following conditions while using the page-translation mechanism to translate a linear address to a physical address:
• The P (present) flag in a page-directory or page-table entry needed for the address translation is clear, indicating that a page table or the page containing the operand is not present in physical memory.
• The procedure does not have sufficient privilege to access the indicated page (that is, a procedure running in user mode attempts to access a supervisor-mode page).
• Code running in user mode attempts to write to a read-only page. In the Intel486 and later processors, if the WP flag is set in CR0, the page fault will also be triggered by code running in supervisor mode that tries to write to a read-only user-mode page.
The exception handler can recover from page-not-present conditions and restart the program or task without any loss of program continuity. It can also restart the program or task after a privilege violation, but the problem that caused the privilege violation may be uncorrectable.

Exception Error Code
Yes (special format). The processor provides the page-fault handler with two items of information to aid in diagnosing the exception and recovering from it:
• An error code on the stack. The error code for a page fault has a format different from that for other exceptions (refer to Figure 5-7). The error code tells the exception handler four things:
  - The P flag indicates whether the exception was due to a not-present page (0) or to either an access rights violation or the use of a reserved bit (1).
  - The W/R flag indicates whether the memory access that caused the exception was a read (0) or write (1).
  - The U/S flag indicates whether the processor was executing at user mode (1) or supervisor mode (0) at the time of the exception.
  - The RSVD flag indicates that the processor detected 1s in reserved bits of the page directory, when the PSE or PAE flags in control register CR4 are set to 1. (The PSE flag is only available in the P6 family and Pentium processors, and the PAE flag is only available on the P6 family processors. In earlier Intel Architecture processor families, the bit position of the RSVD flag is reserved.)
Bits 31-4    Bit 3    Bit 2    Bit 1    Bit 0
Reserved     RSVD     U/S      W/R      P

P      0  The fault was caused by a nonpresent page.
       1  The fault was caused by a page-level protection violation.
W/R    0  The access causing the fault was a read.
       1  The access causing the fault was a write.
U/S    0  The access causing the fault originated when the processor was executing in supervisor mode.
       1  The access causing the fault originated when the processor was executing in user mode.
RSVD   0  The fault was not caused by a reserved bit violation.
       1  The page fault occurred because a 1 was detected in one of the reserved bit positions of a page table entry or directory entry that was marked present.

Figure 5-7. Page-Fault Error Code
• The contents of the CR2 register. The processor loads the CR2 register with the 32-bit linear address that generated the exception. The page-fault handler can use this address to locate the corresponding page directory and page-table entries. If another page fault can potentially occur during execution of the page-fault handler, the handler must push the contents of the CR2 register onto the stack before the second page fault occurs.
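As an illustration of how a handler might interpret the two items above, the following Python sketch decodes the page-fault error code using the bit assignments of Figure 5-7 (P in bit 0, W/R in bit 1, U/S in bit 2, RSVD in bit 3). The function name and the dictionary keys are hypothetical, chosen for this example only; the CR2 value is simply passed through as the faulting linear address.

```python
def decode_page_fault(error_code: int, cr2: int) -> dict:
    """Interpret a page-fault error code per Figure 5-7.

    error_code: value pushed on the handler's stack
    cr2:        32-bit linear address that generated the exception
    """
    return {
        # P = 1 covers both access-rights and reserved-bit violations.
        "cause": "protection violation" if error_code & 0x1 else "page not present",
        "access": "write" if error_code & 0x2 else "read",
        "mode": "user" if error_code & 0x4 else "supervisor",
        "reserved_bit_violation": bool(error_code & 0x8),
        "linear_address": cr2 & 0xFFFFFFFF,
    }
```

For example, an error code of 6 with CR2 equal to 0x00401000 describes a user-mode write to a page that is not present, the usual case a demand-paging handler recovers from.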
If a page fault is caused by a page-level protection violation, the access flag in the page-directory entry is set when the fault occurs. The behavior of Intel Architecture processors regarding the access flag in the corresponding page-table entry is model specific and not architecturally defined.

Saved Instruction Pointer
The saved contents of CS and EIP registers generally point to the instruction that generated the exception. If the page-fault exception occurred during a task switch, the CS and EIP registers may point to the first instruction of the new task (as described in the following Program State Change section).

Program State Change
A program-state change does not normally accompany a page-fault exception, because the instruction that causes the exception to be generated is not executed. After the page-fault exception handler has corrected the violation (for example, loaded the missing page into memory), execution of the program or task can be resumed.
When a page-fault exception is generated during a task switch, the program state may change, as follows. During a task switch, a page-fault exception can occur during any of the following operations:
• While writing the state of the original task into the TSS of that task.
• While reading the GDT to locate the TSS descriptor of the new task.
• While reading the TSS of the new task.
• While reading segment descriptors associated with segment selectors from the new task.
• While reading the LDT of the new task to verify the segment registers stored in the new TSS.
In the last two cases the exception occurs in the context of the new task. The instruction pointer refers to the first instruction of the new task, not to the instruction which caused the task switch (or the last instruction to be executed, in the case of an interrupt). If the design of the operating system permits page faults to occur during task switches, the page-fault handler should be called through a task gate.
If a page fault occurs during a task switch, the processor will load all the state information from the new TSS (without performing any additional limit, present, or type checks) before it generates the exception. The page-fault handler should thus not rely on being able to use the segment selectors found in the CS, SS, DS, ES, FS, and GS registers without causing another exception. (Refer to the Program State Change description for "Interrupt 10 - Invalid TSS Exception (#TS)" in this chapter for additional information on how to handle this situation.)

Additional Exception-Handling Information
Special care should be taken to ensure that an exception that occurs during an explicit stack switch does not cause the processor to use an invalid stack pointer (SS:ESP). Software written for 16-bit Intel Architecture processors often uses a pair of instructions to change to a new stack, for example:

    MOV SS, AX
    MOV SP, StackTop

When executing this code on one of the 32-bit Intel Architecture processors, it is possible to get a page fault, general-protection fault (#GP), or alignment check fault (#AC) after the segment selector has been loaded into the SS register but before the ESP register has been loaded. At this point, the two parts of the stack pointer (SS and ESP) are inconsistent. The new stack segment is being used with the old stack pointer.
The processor does not use the inconsistent stack pointer if the exception handler switches to a well-defined stack (that is, the handler is a task or a more privileged procedure). However, if the exception handler is called at the same privilege level and from the same task, the processor will attempt to use the inconsistent stack pointer.
In systems that handle page-fault, general-protection, or alignment check exceptions within the faulting task (with trap or interrupt gates), software executing at the same privilege level as the exception handler should initialize a new stack by using the LSS instruction rather than a pair of MOV instructions, as described earlier in this note. When the exception handler is running at privilege level 0 (the normal case), the problem is limited to procedures or tasks that run at privilege level 0, typically the kernel of the operating system.
Interrupt 16 - Floating-Point Error Exception (#MF)

Exception Class
Fault.

Description
Indicates that the FPU has detected a floating-point-error exception. The NE flag in the register CR0 must be set and the appropriate exception must be unmasked (clear mask bit in the control register) for an interrupt 16, floating-point-error exception to be generated. (Refer to Section 2.5., "Control Registers" in Chapter 2, System Architecture Overview for a detailed description of the NE flag.)
While executing floating-point instructions, the FPU detects and reports six types of floating-point errors:
• Invalid operation (#I)
  - Stack overflow or underflow (#IS)
  - Invalid arithmetic operation (#IA)
• Divide-by-zero (#Z)
• Denormalized operand (#D)
• Numeric overflow (#O)
• Numeric underflow (#U)
• Inexact result (precision) (#P)
For each of these error types, the FPU provides a flag in the FPU status register and a mask bit in the FPU control register. If the FPU detects a floating-point error and the mask bit for the error is set, the FPU handles the error automatically by generating a predefined (default) response and continuing program execution. The default responses have been designed to provide a reasonable result for most floating-point applications.
If the mask for the error is clear and the NE flag in register CR0 is set, the FPU does the following:
1. Sets the necessary flag in the FPU status register.
2. Waits until the next waiting floating-point instruction or WAIT/FWAIT instruction is encountered in the program's instruction stream. (The FPU checks for pending floating-point exceptions on waiting instructions prior to executing them. All the floating-point instructions except the FNINIT, FNCLEX, FNSTSW, FNSTSW AX, FNSTCW, FNSTENV, and FNSAVE instructions are waiting instructions.)
3. Generates an internal error signal that causes the processor to generate a floating-point-error exception.
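The interaction between status flags, mask bits, and the NE flag described above can be modeled in a few lines. The Python sketch below is a simplified illustration, not a model of the FPU itself: the function name and return strings are hypothetical, and the bit positions (exception flags and masks in bits 0 through 5 of the status and control words, in the order #I, #D, #Z, #O, #U, #P, and NE in bit 5 of CR0) are assumptions taken from the register layouts documented in Volume 1 and Chapter 2.

```python
# Bits 0-5 of both the FPU status word (flags) and control word (masks).
FPU_EXCEPTION_NAMES = ["#I", "#D", "#Z", "#O", "#U", "#P"]
CR0_NE = 1 << 5  # numeric error flag in CR0


def fpu_error_response(status_word: int, control_word: int, cr0: int):
    """Return (response, unmasked_errors) for the flagged FPU errors."""
    flagged = status_word & 0x3F
    masks = control_word & 0x3F
    unmasked = flagged & ~masks
    names = [name for bit, name in enumerate(FPU_EXCEPTION_NAMES)
             if (unmasked >> bit) & 1]
    if not names:
        return ("default response, execution continues", [])
    if cr0 & CR0_NE:
        return ("interrupt 16 at next waiting instruction", names)
    # With NE clear, interrupt 16 is not generated (see the NE flag
    # description in Chapter 2 for what happens instead).
    return ("no interrupt 16 (NE clear)", names)
```

With the control word at its usual reset value of 037FH every error is masked, so a divide-by-zero flag produces only the default response; clearing the ZM bit (control word 037BH) with NE set makes the same flag lead to interrupt 16.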
All of the floating-point-error conditions can be recovered from. The floating-point-error exception handler can determine the error condition that caused the exception from the settings of the flags in the FPU status word. Refer to "Software Exception Handling" in Chapter 7 of the Intel Architecture Software Developer's Manual, Volume 1, for more information on handling floating-point-error exceptions.

Exception Error Code
None. The FPU provides its own error information.

Saved Instruction Pointer
The saved contents of CS and EIP registers point to the floating-point or WAIT/FWAIT instruction that was about to be executed when the floating-point-error exception was generated. This is not the faulting instruction in which the error condition was detected. The address of the faulting instruction is contained in the FPU instruction pointer register. Refer to "The FPU Instruction and Operand (Data) Pointers" in Chapter 7 of the Intel Architecture Software Developer's Manual, Volume 1, for more information about the information the FPU saves for use in handling floating-point-error exceptions.

Program State Change
A program-state change generally accompanies a floating-point-error exception because the handling of the exception is delayed until the next waiting floating-point or WAIT/FWAIT instruction following the faulting instruction. The FPU, however, saves sufficient information about the error condition to allow recovery from the error and re-execution of the faulting instruction if needed.
In situations where non-floating-point instructions depend on the results of a floating-point instruction, a WAIT or FWAIT instruction can be inserted in front of a dependent instruction to force a pending floating-point-error exception to be handled before the dependent instruction is executed. Refer to "Floating-Point Exception Synchronization" in Chapter 7 of the Intel Architecture Software Developer's Manual, Volume 1, for more information about synchronization of floating-point-error exceptions.
Interrupt 17 - Alignment Check Exception (#AC)

Exception Class
Fault.

Description
Indicates that the processor detected an unaligned memory operand when alignment checking was enabled. Alignment checks are only carried out in data (or stack) segments (not in code or system segments). An example of an alignment-check violation is a word stored at an odd byte address, or a doubleword stored at an address that is not an integer multiple of 4. Table 5-7 lists the alignment requirements of the various data types recognized by the processor.

Table 5-7. Alignment Requirements by Data Type

Data Type                                        Address Must Be Divisible By
Word                                             2
Doubleword                                       4
Single Real                                      4
Double Real                                      8
Extended Real                                    8
Segment Selector                                 2
32-bit Far Pointer                               2
48-bit Far Pointer                               4
32-bit Pointer                                   4
GDTR, IDTR, LDTR, or Task Register Contents      4
FSTENV/FLDENV Save Area                          4 or 2, depending on operand size
FSAVE/FRSTOR Save Area                           4 or 2, depending on operand size
Bit String                                       2 or 4, depending on the operand-size attribute
128-bit (see note 1)                             16
Note 1. 128-bit data type introduced with the Pentium III processor. This type of alignment check is done for operands less than 128 bits in size: 32-bit scalar single and 16-bit/32-bit/64-bit integer MMX technology; 2-, 4-, or 8-byte alignment checks are possible when #AC is enabled. Some exceptional cases are:
• The MOVUPS instruction, which performs a 128-bit unaligned load or store. In this case, 2/4/8-byte misalignments will be detected, but detection of 16-byte misalignment is not guaranteed and may vary with implementation.
• The FXSAVE/FXRSTOR instructions (refer to the instruction descriptions).
To enable alignment checking, the following conditions must be true:
• AM flag in CR0 register is set.
• AC flag in the EFLAGS register is set.
• The CPL is 3 (protected mode or virtual-8086 mode).
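The three enabling conditions and the divisibility rule of Table 5-7 combine into a simple test. The Python sketch below is a rough model only: the function names and the dictionary of data types are hypothetical, the bit positions (AM in bit 18 of CR0, AC in bit 18 of EFLAGS) are assumed from the register descriptions elsewhere in the manual, and only the fixed-divisor rows of the table are included. It also ignores the exceptional cases in the table's note, such as MOVUPS.

```python
CR0_AM = 1 << 18      # alignment mask flag in CR0
EFLAGS_AC = 1 << 18   # alignment check flag in EFLAGS

# Fixed-divisor rows of Table 5-7 (operand-size-dependent rows omitted).
REQUIRED_DIVISOR = {
    "word": 2,
    "doubleword": 4,
    "single real": 4,
    "double real": 8,
    "extended real": 8,
    "128-bit": 16,
}


def alignment_checking_enabled(cr0: int, eflags: int, cpl: int) -> bool:
    """All three conditions must hold for alignment checking to be active."""
    return bool(cr0 & CR0_AM) and bool(eflags & EFLAGS_AC) and cpl == 3


def raises_alignment_check(address: int, data_type: str,
                           cr0: int, eflags: int, cpl: int) -> bool:
    """True if a data-segment access of this type at this address would be
    reported as misaligned under the simplified rules above."""
    if not alignment_checking_enabled(cr0, eflags, cpl):
        return False
    return address % REQUIRED_DIVISOR[data_type] != 0
```

A word at address 1001H is flagged when all three conditions hold, while the same access at CPL 0 is not, which mirrors the statement that alignment-check faults are generated only at privilege level 3.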
Alignment-check faults are generated only when operating at privilege level 3 (user mode). Memory references that default to privilege level 0, such as segment descriptor loads, do not generate alignment-check faults, even when caused by a memory reference made from privilege level 3.
Storing the contents of the GDTR, IDTR, LDTR, or task register in memory while at privilege level 3 can generate an alignment-check fault. Although application programs do not normally store these registers, the fault can be avoided by aligning the information stored on an even word-address.
FSAVE and FRSTOR instructions generate unaligned references which can cause alignment-check faults. These instructions are rarely needed by application programs.

Exception Error Code
Yes (always zero).

Saved Instruction Pointer
The saved contents of CS and EIP registers point to the instruction that generated the exception.

Program State Change
A program-state change does not accompany an alignment-check fault, because the instruction is not executed.
Interrupt 18 - Machine-Check Exception (#MC)

Exception Class
Abort.

Description
Indicates that the processor detected an internal machine error or a bus error, or that an external agent detected a bus error. The machine-check exception is model-specific, available only on the P6 family and Pentium processors. The implementation of the machine-check exception is different between the P6 family and Pentium processors, and these implementations may not be compatible with future Intel Architecture processors. (Use the CPUID instruction to determine whether this feature is present.)
Bus errors detected by external agents are signaled to the processor on dedicated pins: the BINIT# pin on the P6 family processors and the BUSCHK# pin on the Pentium processor. When one of these pins is enabled, asserting the pin causes error information to be loaded into machine-check registers and a machine-check exception is generated.
The machine-check exception and machine-check architecture are discussed in detail in Chapter 13, Machine-Check Architecture. Also, refer to the data books for the individual processors for processor-specific hardware information.

Exception Error Code
None. Error information is provided by machine-check MSRs.

Saved Instruction Pointer
For the P6 family processors, if the EIPV flag in the MCG_STATUS MSR is set, the saved contents of CS and EIP registers are directly associated with the error that caused the machine-check exception to be generated; if the flag is clear, the saved instruction pointer may not be associated with the error (refer to Section 13.3.1.2., "MCG_STATUS MSR", in Chapter 13, Machine-Check Architecture).
For the Pentium processor, contents of the CS and EIP registers may not be associated with the error.

Program State Change
A program-state change always accompanies a machine-check exception. If the machine-check mechanism is enabled (the MCE flag in control register CR4 is set), a machine-check exception results in an abort; that is, information about the exception can be collected from the machine-check MSRs, but the program cannot be restarted. If the machine-check mechanism is not enabled, a machine-check exception causes the processor to enter the shutdown state.
Interrupt 19 - SIMD Floating-Point Exception (#XF)

Exception Class
Fault.

Description
Indicates the processor has detected a SIMD floating-point execution unit exception. The appropriate status flag in the MXCSR register must be set and the particular exception unmasked for this interrupt to be generated.
There are six classes of numeric exception conditions that can occur while executing Streaming SIMD Extensions:
1. Invalid operation (#I)
2. Divide-by-zero (#Z)
3. Denormalized operand (#D)
4. Numeric overflow (#O)
5. Numeric underflow (#U)
6. Inexact result (Precision) (#P)
Invalid, Divide-by-zero, and Denormal exceptions are pre-computation exceptions; that is, they are detected before any arithmetic operation occurs. Underflow, Overflow, and Precision exceptions are post-computation exceptions.
When numeric exceptions occur, a processor supporting Streaming SIMD Extensions takes one of two possible courses of action:
• The processor can handle the exception by itself, producing the most reasonable result and allowing numeric program execution to continue undisturbed (that is, masked exception response).
• A software exception handler can be invoked to handle the exception (that is, unmasked exception response).
Each of the six exception conditions described above has corresponding flag and mask bits in the MXCSR. If an exception is masked (the corresponding mask bit in MXCSR = 1), the processor takes an appropriate default action and continues with the computation. If the exception is unmasked (mask bit = 0) and the OS supports SIMD floating-point exceptions (that is, CR4.OSXMMEXCPT = 1), a software exception handler is invoked immediately through SIMD floating-point exception interrupt vector 19. If the exception is unmasked (mask bit = 0) and the OS does not support SIMD floating-point exceptions (that is, CR4.OSXMMEXCPT = 0), an invalid opcode exception is signaled instead of a SIMD floating-point exception.
Note that because SIMD floating-point exceptions are precise and occur immediately, the situation does not arise where an x87-FP instruction, an FWAIT instruction, or another Streaming SIMD Extensions instruction will catch a pending unmasked SIMD floating-point exception.
Exception Error Code
None. The Streaming SIMD Extensions provide their own error information.

Saved Instruction Pointer
The saved contents of CS and EIP registers point to the Streaming SIMD Extensions instruction that was executed when the SIMD floating-point exception was generated. This is the faulting instruction in which the error condition was detected.

Program State Change
A program-state change generally accompanies a SIMD floating-point exception because the handling of the exception is immediate unless the particular exception is masked. The Pentium III processor contains sufficient information about the error condition to allow recovery from the error and re-execution of the faulting instruction if needed.
In situations where a SIMD floating-point exception occurred while the SIMD floating-point exceptions were masked, SIMD floating-point exceptions were then unmasked, and a Streaming SIMD Extensions instruction was executed, then no exception is raised.
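The three outcomes described for this exception (masked default action, vector 19, or an invalid opcode exception) depend only on one MXCSR mask bit and on CR4.OSXMMEXCPT. The Python sketch below illustrates that decision; the function name and return strings are hypothetical, and the bit positions (MXCSR mask bits 7 through 12 in the order #I, #D, #Z, #O, #U, #P; OSXMMEXCPT in bit 10 of CR4; invalid opcode as vector 6) are assumptions drawn from the register descriptions elsewhere in the manual.

```python
# Condition numbers follow the MXCSR flag order (bits 0-5).
SIMD_CONDITIONS = {"#I": 0, "#D": 1, "#Z": 2, "#O": 3, "#U": 4, "#P": 5}
MXCSR_MASK_SHIFT = 7          # mask bits occupy MXCSR bits 7-12
CR4_OSXMMEXCPT = 1 << 10      # OS support for SIMD floating-point exceptions


def simd_exception_response(condition: str, mxcsr: int, cr4: int) -> str:
    """Return how the processor responds when the given numeric
    condition is detected by a Streaming SIMD Extensions instruction."""
    bit = SIMD_CONDITIONS[condition]
    masked = (mxcsr >> (MXCSR_MASK_SHIFT + bit)) & 1
    if masked:
        return "masked: default action, computation continues"
    if cr4 & CR4_OSXMMEXCPT:
        return "unmasked: SIMD floating-point exception (vector 19)"
    return "unmasked: invalid opcode exception (vector 6)"
```

With MXCSR at 1F80H all six conditions are masked; clearing bit 9 (giving 1D80H) unmasks divide-by-zero, and the response then depends on whether the operating system has set CR4.OSXMMEXCPT.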
Interrupts 32 to 255: User Defined Interrupts

Exception Class: Not applicable.

Description:
Indicates that the processor did one of the following things:
• Executed an INT n instruction where the instruction operand is one of the vector numbers from 32 through 255.
• Responded to an interrupt request at the INTR pin or from the local APIC when the interrupt vector number associated with the request is from 32 through 255.

Exception Error Code: Not applicable.

Saved Instruction Pointer:
The saved contents of CS and EIP registers point to the instruction that follows the INT n instruction or the instruction following the instruction on which the INTR signal occurred.

Program State Change:
A program-state change does not accompany interrupts generated by the INT n instruction or the INTR signal. The INT n instruction generates the interrupt within the instruction stream. When the processor receives an INTR signal, it commits all state changes for all previous instructions before it responds to the interrupt; so, program execution can resume upon returning from the interrupt handler.
CHAPTER 6 TASK MANAGEMENT

This chapter describes the Intel Architecture's task management facilities. These facilities are only available when the processor is running in protected mode.
6.1. TASK MANAGEMENT OVERVIEW
A task is a unit of work that a processor can dispatch, execute, and suspend. It can be used to execute a program, a task or process, an operating-system service utility, an interrupt or exception handler, or a kernel or executive utility. The Intel Architecture provides a mechanism for saving the state of a task, for dispatching tasks for execution, and for switching from one task to another.

When operating in protected mode, all processor execution takes place from within a task. Even simple systems must define at least one task. More complex systems can use the processor's task management facilities to support multitasking applications.
6.1.1. Task Structure
A task is made up of two parts: a task execution space and a task-state segment (TSS). The task execution space consists of a code segment, a stack segment, and one or more data segments (refer to Figure 6-1). If an operating system or executive uses the processor's privilege-level protection mechanism, the task execution space also provides a separate stack for each privilege level. The TSS specifies the segments that make up the task execution space and provides a storage place for task state information. In multitasking systems, the TSS also provides a mechanism for linking tasks.
NOTE
This chapter describes primarily 32-bit tasks and the 32-bit TSS structure. For information on 16-bit tasks and the 16-bit TSS structure, refer to Section 6.6., 16-Bit Task-State Segment (TSS).

A task is identified by the segment selector for its TSS. When a task is loaded into the processor for execution, the segment selector, base address, limit, and segment descriptor attributes for the TSS are loaded into the task register (refer to Section 2.4.4., Task Register (TR) in Chapter 2, System Architecture Overview). If paging is implemented for the task, the base address of the page directory used by the task is loaded into control register CR3.
Figure 6-1. Structure of a Task
(Diagram not reproduced. Elements shown: task register, CR3, task-state segment (TSS), code segment, data segment, stack segment for the current privilege level, and stack segments for privilege levels 0, 1, and 2.)
6.1.2. Task State
The following items define the state of the currently executing task:
• The task's current execution space, defined by the segment selectors in the segment registers (CS, DS, SS, ES, FS, and GS).
• The state of the general-purpose registers.
• The state of the EFLAGS register.
• The state of the EIP register.
• The state of control register CR3.
• The state of the task register.
• The state of the LDTR register.
• The I/O map base address and I/O map (contained in the TSS).
• Stack pointers to the privilege 0, 1, and 2 stacks (contained in the TSS).
• Link to previously executed task (contained in the TSS).

Prior to dispatching a task, all of these items are contained in the task's TSS, except the state of the task register. Also, the complete contents of the LDTR register are not contained in the TSS, only the segment selector for the LDT.
6.1.3. Executing a Task
Software or the processor can dispatch a task for execution in one of the following ways:
• An explicit call to a task with the CALL instruction.
• An explicit jump to a task with the JMP instruction.
• An implicit call (by the processor) to an interrupt-handler task.
• An implicit call to an exception-handler task.
• A return (initiated with an IRET instruction) when the NT flag in the EFLAGS register is set.

All of these methods of dispatching a task identify the task to be dispatched with a segment selector that points either to a task gate or the TSS for the task. When dispatching a task with a CALL or JMP instruction, the selector in the instruction may select either the TSS directly or a task gate that holds the selector for the TSS. When dispatching a task to handle an interrupt or exception, the IDT entry for the interrupt or exception must contain a task gate that holds the selector for the interrupt- or exception-handler TSS.

When a task is dispatched for execution, a task switch automatically occurs between the currently running task and the dispatched task. During a task switch, the execution environment of the currently executing task (called the task's state or context) is saved in its TSS and execution of the task is suspended. The context for the dispatched task is then loaded into the processor and execution of that task begins with the instruction pointed to by the newly loaded EIP register. If the task has not been run since the system was last initialized, the EIP will point to the first instruction of the task's code; otherwise, it will point to the next instruction after the last instruction that the task executed when it was last active.

If the currently executing task (the calling task) called the task being dispatched (the called task), the TSS segment selector for the calling task is stored in the TSS of the called task to provide a link back to the calling task.

For all Intel Architecture processors, tasks are not recursive. A task cannot call or jump to itself.

Interrupts and exceptions can be handled with a task switch to a handler task. Here, the processor not only can perform a task switch to handle the interrupt or exception, but it can automatically switch back to the interrupted task upon returning from the interrupt- or exception-handler task. This mechanism can handle interrupts that occur during interrupt tasks.
As part of a task switch, the processor can also switch to another LDT, allowing each task to have a different logical-to-physical address mapping for LDT-based segments. The page-directory base register (CR3) also is reloaded on a task switch, allowing each task to have its own set of page tables. These protection facilities help isolate tasks and prevent them from interfering with one another. If one or both of these protection mechanisms are not used, the processor provides no protection between tasks. This is true even with operating systems that use multiple privilege levels for protection. Here, a task running at privilege level 3 that uses the same LDT and page tables as other privilege-level-3 tasks can access code and corrupt data and the stack of other tasks.
Use of task management facilities for handling multitasking applications is optional. Multitasking can be handled in software, with each software-defined task executed in the context of a single Intel Architecture task.
6.2. TASK MANAGEMENT DATA STRUCTURES
The processor defines five data structures for handling task-related activities:
• Task-state segment (TSS).
• Task-gate descriptor.
• TSS descriptor.
• Task register.
• NT flag in the EFLAGS register.
When operating in protected mode, a TSS and TSS descriptor must be created for at least one task, and the segment selector for the TSS must be loaded into the task register (using the LTR instruction).
6.2.1. Task-State Segment (TSS)
The processor state information needed to restore a task is saved in a system segment called the task-state segment (TSS). Figure 6-2 shows the format of a TSS for tasks designed for 32-bit CPUs. (Compatibility with 16-bit Intel 286 processor tasks is provided by a different kind of TSS; refer to Figure 6-9.) The fields of a TSS are divided into two main categories: dynamic fields and static fields.

The processor updates the dynamic fields when a task is suspended during a task switch. The following are dynamic fields:
• General-purpose register fields: State of the EAX, ECX, EDX, EBX, ESP, EBP, ESI, and EDI registers prior to the task switch.
• Segment selector fields: Segment selectors stored in the ES, CS, SS, DS, FS, and GS registers prior to the task switch.
• EFLAGS register field: State of the EFLAGS register prior to the task switch.
• EIP (instruction pointer) field: State of the EIP register prior to the task switch.
• Previous task link field: Contains the segment selector for the TSS of the previous task (updated on a task switch that was initiated by a call, interrupt, or exception). This field (which is sometimes called the back link field) permits a task switch back to the previous task to be initiated with an IRET instruction.

The processor reads the static fields, but does not normally change them. These fields are set up when a task is created. The following are static fields:
• LDT segment selector field: Contains the segment selector for the task's LDT.
Offset  Bits 31:16              Bits 15:0
100     I/O Map Base Address    Reserved; T flag in bit 0
96      Reserved                LDT Segment Selector
92      Reserved                GS
88      Reserved                FS
84      Reserved                DS
80      Reserved                SS
76      Reserved                CS
72      Reserved                ES
68      EDI
64      ESI
60      EBP
56      ESP
52      EBX
48      EDX
44      ECX
40      EAX
36      EFLAGS
32      EIP
28      CR3 (PDBR)
24      Reserved                SS2
20      ESP2
16      Reserved                SS1
12      ESP1
8       Reserved                SS0
4       ESP0
0       Reserved                Previous Task Link

Reserved bits. Set to 0.
Figure 6-2. 32-Bit Task-State Segment (TSS)
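The layout in Figure 6-2 maps naturally onto a C structure. The following is a minimal sketch, assuming a compiler that inserts no padding between naturally aligned members (the static assertions check this); the structure and member names are illustrative and are not defined by the architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One possible C view of the 32-bit TSS in Figure 6-2.
   Each reservedN member covers the upper 16 reserved bits of a doubleword. */
struct tss32 {
    uint16_t prev_task_link, reserved0;   /* offset 0 */
    uint32_t esp0;                        /* offset 4 */
    uint16_t ss0, reserved1;              /* offset 8 */
    uint32_t esp1;                        /* offset 12 */
    uint16_t ss1, reserved2;              /* offset 16 */
    uint32_t esp2;                        /* offset 20 */
    uint16_t ss2, reserved3;              /* offset 24 */
    uint32_t cr3;                         /* offset 28, PDBR */
    uint32_t eip;                         /* offset 32 */
    uint32_t eflags;                      /* offset 36 */
    uint32_t eax, ecx, edx, ebx;          /* offsets 40..52 */
    uint32_t esp, ebp, esi, edi;          /* offsets 56..68 */
    uint16_t es, reserved4;               /* offset 72 */
    uint16_t cs, reserved5;               /* offset 76 */
    uint16_t ss, reserved6;               /* offset 80 */
    uint16_t ds, reserved7;               /* offset 84 */
    uint16_t fs, reserved8;               /* offset 88 */
    uint16_t gs, reserved9;               /* offset 92 */
    uint16_t ldt_selector, reserved10;    /* offset 96 */
    uint16_t trap;                        /* offset 100, bit 0 = T flag */
    uint16_t io_map_base;                 /* offset 102 */
};

/* The part of the TSS the processor reads during a task switch is 104 bytes. */
static_assert(sizeof(struct tss32) == 104, "32-bit TSS must be 104 bytes");
static_assert(offsetof(struct tss32, cr3) == 28, "CR3 field at offset 28");
static_assert(offsetof(struct tss32, trap) == 100, "T flag word at offset 100");
```

The static assertions fail at compile time if a compiler lays the structure out differently, which is preferable to discovering a mismatch during a task switch.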
• CR3 control register field: Contains the base physical address of the page directory to be used by the task. Control register CR3 is also known as the page-directory base register (PDBR).
• Privilege level-0, -1, and -2 stack pointer fields: These stack pointers consist of a logical address made up of the segment selector for the stack segment (SS0, SS1, and SS2) and an offset into the stack (ESP0, ESP1, and ESP2). Note that the values in these fields are static for a particular task; whereas, the SS and ESP values will change if stack switching occurs within the task.
• T (debug trap) flag (byte 100, bit 0): When set, the T flag causes the processor to raise a debug exception when a task switch to this task occurs (refer to Section 15.3.1.5., Task-Switch Exception Condition, in Chapter 15, Debugging and Performance Monitoring).
• I/O map base address field: Contains a 16-bit offset from the base of the TSS to the I/O permission bit map and interrupt redirection bit map. When present, these maps are stored in the TSS at higher addresses. The I/O map base address points to the beginning of the I/O permission bit map and the end of the interrupt redirection bit map. Refer to Chapter 9, Input/Output, in the Intel Architecture Software Developer's Manual, Volume 1, for more information about the I/O permission bit map. Refer to Section 16.3., Interrupt and Exception Handling in Virtual-8086 Mode in Chapter 16, 8086 Emulation for a detailed description of the interrupt redirection bit map.

If paging is used, care should be taken to avoid placing a page boundary within the part of the TSS that the processor reads during a task switch (the first 104 bytes). If a page boundary is placed within this part of the TSS, the pages on either side of the boundary must be present at the same time and contiguous in physical memory.

The reason for this restriction is that when accessing a TSS during a task switch, the processor reads and writes into the first 104 bytes of each TSS from contiguous physical addresses beginning with the physical address of the first byte of the TSS. It may not perform address translations at a page boundary if one occurs within this area. So, after the TSS access begins, if a part of the 104 bytes is not both present and physically contiguous, the processor will access incorrect TSS information, without generating a page-fault exception. The reading of this incorrect information will generally lead to an unrecoverable exception later in the task switch process.

Also, if paging is used, the pages corresponding to the previous task's TSS, the current task's TSS, and the descriptor table entries for each should be marked as read/write. The task switch will be carried out faster if the pages containing these structures are also present in memory before the task switch is initiated.
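System software can check the page-boundary restriction described above when it allocates a TSS. The sketch below assumes 4-KByte pages (the text does not state a page size) and uses hypothetical names; it reports whether the first 104 bytes of a TSS at a given linear base address straddle a page boundary.

```c
#include <stdbool.h>
#include <stdint.h>

#define TSS_PAGE_SIZE    4096u  /* assumption: 4-KByte pages */
#define TSS_SWITCH_BYTES 104u   /* bytes the processor accesses on a task switch */

/* Returns true if the first 104 bytes of the TSS span two pages, in which
   case both pages must be present and physically contiguous. */
static bool tss_crosses_page(uint32_t tss_linear_base)
{
    uint32_t offset_in_page = tss_linear_base % TSS_PAGE_SIZE;
    return offset_in_page + TSS_SWITCH_BYTES > TSS_PAGE_SIZE;
}
```

An allocator that aligns every TSS on a 128-byte boundary would never trip this check for the 104-byte portion, which is one simple way to sidestep the restriction.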
6.2.2. TSS Descriptor
The TSS, like all other segments, is defined by a segment descriptor. Figure 6-3 shows the format of a TSS descriptor. TSS descriptors may only be placed in the GDT; they cannot be placed in an LDT or the IDT. An attempt to access a TSS using a segment selector with its TI flag set (which indicates the current LDT) causes a general-protection exception (#GP) to be generated. A general-protection exception is also generated if an attempt is made to load a segment selector for a TSS into a segment register.

The busy flag (B) in the type field indicates whether the task is busy. A busy task is currently running or is suspended. A type field with a value of 1001B indicates an inactive task; a value of 1011B indicates a busy task. Tasks are not recursive. The processor uses the busy flag to detect an attempt to call a task whose execution has been interrupted. To ensure that only one busy flag is associated with a task, each TSS should have only one TSS descriptor that points to it.
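The TI check mentioned above depends on how a segment selector is split into fields. The sketch below assumes the standard selector layout described in Chapter 3 (index in bits 15:3, TI in bit 2, RPL in bits 1:0), which is not restated in this section; the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of a 16-bit segment selector (assumed layout from Chapter 3). */
struct selector_fields {
    unsigned index;  /* descriptor table index, bits 15:3 */
    unsigned ti;     /* table indicator, bit 2: 0 = GDT, 1 = current LDT */
    unsigned rpl;    /* requested privilege level, bits 1:0 */
};

static struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = (unsigned)sel >> 3;
    f.ti    = ((unsigned)sel >> 2) & 1u;
    f.rpl   = (unsigned)sel & 3u;
    return f;
}

/* A selector can name a TSS descriptor only if it references the GDT.
   A selector with TI = 1 would cause #GP when used to access a TSS. */
static bool selector_may_reference_tss(uint16_t sel)
{
    return decode_selector(sel).ti == 0u;
}
```

For example, selector 0028H decodes to GDT index 5 with RPL 0 and passes the check, while 002CH differs only in the TI bit and is rejected.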
TSS Descriptor

Second doubleword (byte offset 4):
Bits 31:24   Base 31:24
Bit 23       G
Bits 22:21   0
Bit 20       AVL
Bits 19:16   Limit 19:16
Bit 15       P
Bits 14:13   DPL
Bit 12       0
Bits 11:8    Type (1 0 B 1)
Bits 7:0     Base 23:16

First doubleword (byte offset 0):
Bits 31:16   Base Address 15:00
Bits 15:0    Segment Limit 15:00

AVL     Available for use by system software
B       Busy flag
BASE    Segment Base Address
DPL     Descriptor Privilege Level
G       Granularity
LIMIT   Segment Limit
P       Segment Present
TYPE    Segment Type

Figure 6-3. TSS Descriptor
The base, limit, and DPL fields and the granularity and present flags have functions similar to their use in data-segment descriptors (refer to Section 3.4.3., Segment Descriptors in Chapter 3, Protected-Mode Memory Management). The limit field must have a value equal to or greater than 67H (for a 32-bit TSS), one byte less than the minimum size of a TSS. Attempting to switch to a task whose TSS descriptor has a limit less than 67H generates an invalid-TSS exception (#TS). A larger limit is required if an I/O permission bit map is included in the TSS. An even larger limit would be required if the operating system stores additional data in the TSS. The processor does not check for a limit greater than 67H on a task switch; however, it does when accessing the I/O permission bit map or interrupt redirection bit map. Any program or procedure with access to a TSS descriptor (that is, whose CPL is numerically equal to or less than the DPL of the TSS descriptor) can dispatch the task with a call or a jump. In most systems, the DPLs of TSS descriptors should be set to values less than 3, so that only privileged software can perform task switching. However, in multitasking applications, DPLs for some TSS descriptors can be set to 3 to allow task switching at the application (or user) privilege level.
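As a worked illustration of the descriptor fields and the 67H limit rule, the following sketch packs a TSS descriptor into its two doublewords and reads back the busy flag and limit. It assumes byte granularity (G = 0) and the bit positions of Figure 6-3; the helper names are hypothetical and the code is not a complete GDT setup routine.

```c
#include <stdbool.h>
#include <stdint.h>

#define TSS_TYPE_AVAILABLE 0x9u   /* 1001B: inactive task */
#define TSS_TYPE_BUSY      0xBu   /* 1011B: busy task */
#define TSS32_MIN_LIMIT    0x67u  /* one less than the 104-byte minimum size */

struct descriptor { uint32_t low; uint32_t high; };

/* Build a present, not-busy TSS descriptor with G = 0 and AVL = 0. */
static struct descriptor make_tss_descriptor(uint32_t base, uint32_t limit,
                                             unsigned dpl)
{
    struct descriptor d;
    d.low  = ((base & 0xFFFFu) << 16) | (limit & 0xFFFFu);
    d.high = (base & 0xFF000000u)          /* Base 31:24 */
           | (limit & 0x000F0000u)         /* Limit 19:16 */
           | (1u << 15)                    /* P = 1 */
           | ((dpl & 3u) << 13)            /* DPL */
           | (TSS_TYPE_AVAILABLE << 8)     /* Type = 1001B */
           | ((base >> 16) & 0xFFu);       /* Base 23:16 */
    return d;
}

/* The busy flag is bit 1 of the type field, i.e. bit 9 of the high doubleword. */
static bool tss_is_busy(struct descriptor d)
{
    return ((d.high >> 9) & 1u) != 0u;
}

static struct descriptor tss_mark_busy(struct descriptor d)
{
    d.high |= 1u << 9;
    return d;
}

/* Limit in bytes, valid only under the G = 0 assumption. */
static uint32_t tss_limit(struct descriptor d)
{
    return (d.high & 0x000F0000u) | (d.low & 0xFFFFu);
}

/* A 32-bit TSS needs a limit of at least 67H to avoid #TS on a task switch. */
static bool tss_limit_valid(struct descriptor d)
{
    return tss_limit(d) >= TSS32_MIN_LIMIT;
}
```

A TSS that also carries an I/O permission bit map would pass a correspondingly larger limit to the same helper.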
6.2.3. Task Register
The task register holds the 16-bit segment selector and the entire segment descriptor (32-bit base address, 16-bit segment limit, and descriptor attributes) for the TSS of the current task (refer to Figure 2-4 in Chapter 2, System Architecture Overview). This information is copied from the TSS descriptor in the GDT for the current task. Figure 6-4 shows the path the processor uses to access the TSS, using the information in the task register.

The task register has both a visible part (that can be read and changed by software) and an invisible part (that is maintained by the processor and is inaccessible by software). The segment selector in the visible portion points to a TSS descriptor in the GDT. The processor uses the invisible portion of the task register to cache the segment descriptor for the TSS. Caching these values in a register makes execution of the task more efficient, because the processor does not need to fetch these values from memory to reference the TSS of the current task.

The LTR (load task register) and STR (store task register) instructions load and read the visible portion of the task register. The LTR instruction loads a segment selector (source operand) into the task register that points to a TSS descriptor in the GDT, and then loads the invisible portion of the task register with information from the TSS descriptor. This instruction is a privileged instruction that may be executed only when the CPL is 0. The LTR instruction generally is used during system initialization to put an initial value in the task register. Afterwards, the contents of the task register are changed implicitly when a task switch occurs.

The STR (store task register) instruction stores the visible portion of the task register in a general-purpose register or memory. This instruction can be executed by code running at any privilege level, to identify the currently running task; however, it is normally used only by operating system software.
On power up or reset of the processor, the segment selector and base address are set to the default value of 0 and the limit is set to FFFFH.
6.2.4. Task-Gate Descriptor
A task-gate descriptor provides an indirect, protected reference to a task. Figure 6-5 shows the format of a task-gate descriptor. A task-gate descriptor can be placed in the GDT, an LDT, or the IDT. The TSS segment selector field in a task-gate descriptor points to a TSS descriptor in the GDT. The RPL in this segment selector is not used. The DPL of a task-gate descriptor controls access to the TSS descriptor during a task switch. When a program or procedure makes a call or jump to a task through a task gate, the CPL and the RPL field of the gate selector pointing to the task gate must be less than or equal to the DPL of the task-gate descriptor. (Note that when a task gate is used, the DPL of the destination TSS descriptor is not used.)
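The privilege rule for a call or jump through a task gate reduces to two comparisons. The sketch below is a simplified model with hypothetical names. The same comparison applies when a CALL or JMP references a TSS descriptor directly, with the TSS descriptor's DPL as the target DPL.

```c
#include <stdbool.h>

/* Privilege check for a CALL or JMP that dispatches a task.
   target_dpl is the DPL of the descriptor the selector references:
   the task gate's DPL when a gate is used (the DPL of the destination
   TSS descriptor is then ignored), or the TSS descriptor's DPL otherwise. */
static bool task_switch_privilege_ok(unsigned cpl, unsigned selector_rpl,
                                     unsigned target_dpl)
{
    return cpl <= target_dpl && selector_rpl <= target_dpl;
}
```

Because privilege levels are numerically inverted (0 is most privileged), a gate with DPL 3 admits callers at any level, while a gate with DPL 0 admits only level-0 code using a selector whose RPL is 0.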
Figure 6-4. Task Register
(Diagram not reproduced. Elements shown: the task register with its visible part (selector) and invisible part (base address and segment limit), the TSS descriptor in the GDT, and the TSS.)
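Second doubleword (byte offset 4):
Bits 31:16   Reserved
Bit 15       P
Bits 14:13   DPL
Bit 12       0
Bits 11:8    Type (0 1 0 1)
Bits 7:0     Reserved

First doubleword (byte offset 0):
Bits 31:16   TSS Segment Selector
Bits 15:0    Reserved

DPL     Descriptor Privilege Level
P       Segment Present
TYPE    Segment Type

Figure 6-5. Task-Gate Descriptor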
A task can be accessed either through a task-gate descriptor or a TSS descriptor. Both of these structures are provided to satisfy the following needs:
• The need for a task to have only one busy flag. Because the busy flag for a task is stored in the TSS descriptor, each task should have only one TSS descriptor. There may, however, be several task gates that reference the same TSS descriptor.
• The need to provide selective access to tasks. Task gates fill this need, because they can reside in an LDT and can have a DPL that is different from the TSS descriptor's DPL. A program or procedure that does not have sufficient privilege to access the TSS descriptor for a task in the GDT (which usually has a DPL of 0) may be allowed access to the task through a task gate with a higher DPL. Task gates give the operating system greater latitude for limiting access to specific tasks.
• The need for an interrupt or exception to be handled by an independent task. Task gates may also reside in the IDT, which allows interrupts and exceptions to be handled by handler tasks. When an interrupt or exception vector points to a task gate, the processor switches to the specified task.
Figure 6-6 illustrates how a task gate in an LDT, a task gate in the GDT, and a task gate in the IDT can all point to the same task.
6.3. TASK SWITCHING
The processor transfers execution to another task in any of four cases:
• The current program, task, or procedure executes a JMP or CALL instruction to a TSS descriptor in the GDT.
• The current program, task, or procedure executes a JMP or CALL instruction to a task-gate descriptor in the GDT or the current LDT.
• An interrupt or exception vector points to a task-gate descriptor in the IDT.
• The current task executes an IRET when the NT flag in the EFLAGS register is set.

The JMP, CALL, and IRET instructions, as well as interrupts and exceptions, are all generalized mechanisms for redirecting a program. The referencing of a TSS descriptor or a task gate (when calling or jumping to a task) or the state of the NT flag (when executing an IRET instruction) determines whether a task switch occurs.

The processor performs the following operations when switching to a new task:
1. Obtains the TSS segment selector for the new task as the operand of the JMP or CALL instruction, from a task gate, or from the previous task link field (for a task switch initiated with an IRET instruction).
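Figure 6-6. Task Gates Referencing the Same Task
(Diagram not reproduced. Elements shown: a task gate in the LDT, a task gate in the GDT, and a task gate in the IDT, all referencing the same TSS descriptor in the GDT, which points to the TSS.)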
2. Checks that the current (old) task is allowed to switch to the new task. Data-access privilege rules apply to JMP and CALL instructions. The CPL of the current (old) task and the RPL of the segment selector for the new task must be less than or equal to the DPL of the TSS descriptor or task gate being referenced. Exceptions, interrupts (except for interrupts generated by the INT n instruction), and the IRET instruction are permitted to switch tasks regardless of the DPL of the destination task-gate or TSS descriptor. For interrupts generated by the INT n instruction, the DPL is checked.
3. Checks that the TSS descriptor of the new task is marked present and has a valid limit (greater than or equal to 67H).
4. Checks that the new task is available (call, jump, exception, or interrupt) or busy (IRET return).
5. Checks that the current (old) TSS, new TSS, and all segment descriptors used in the task switch are paged into system memory.
6. If the task switch was initiated with a JMP or IRET instruction, the processor clears the busy (B) flag in the current (old) task's TSS descriptor; if initiated with a CALL instruction, an exception, or an interrupt, the busy (B) flag is left set. (Refer to Table 6-2.)
7. If the task switch was initiated with an IRET instruction, the processor clears the NT flag in a temporarily saved image of the EFLAGS register; if initiated with a CALL or JMP instruction, an exception, or an interrupt, the NT flag is left unchanged in the saved EFLAGS image.
8. Saves the state of the current (old) task in the current task's TSS. The processor finds the base address of the current TSS in the task register and then copies the states of the following registers into the current TSS: all the general-purpose registers, segment selectors from the segment registers, the temporarily saved image of the EFLAGS register, and the instruction pointer register (EIP).
NOTE
At this point, if all checks and saves have been carried out successfully, the processor commits to the task switch. If an unrecoverable error occurs in steps 1 through 8, the processor does not complete the task switch and ensures that the processor is returned to its state prior to the execution of the instruction that initiated the task switch. If an unrecoverable error occurs after the commit point (in steps 9 through 14), the processor completes the task switch (without performing additional access and segment availability checks) and generates the appropriate exception prior to beginning execution of the new task. If exceptions occur after the commit point, the exception handler must finish the task switch itself before allowing the processor to begin executing the task. Refer to Chapter 5, Interrupt and Exception Handling for more information about the effect of exceptions on a task when they occur after the commit point of a task switch.

9. If the task switch was initiated with a CALL instruction, an exception, or an interrupt, the processor sets the NT flag in the EFLAGS image stored in the new task's TSS; if initiated with an IRET instruction, the processor restores the NT flag from the EFLAGS image stored on the stack. If initiated with a JMP instruction, the NT flag is left unchanged. (Refer to Table 6-2.)
10. If the task switch was initiated with a CALL instruction, JMP instruction, an exception, or an interrupt, the processor sets the busy (B) flag in the new task's TSS descriptor; if initiated with an IRET instruction, the busy (B) flag is left set.
11. Sets the TS flag in the control register CR0 image stored in the new task's TSS.
12. Loads the task register with the segment selector and descriptor for the new task's TSS.
13. Loads the new task's state from its TSS into the processor. Any errors associated with the loading and qualification of segment descriptors in this step occur in the context of the new task. The task state information that is loaded here includes the LDTR register, the PDBR (control register CR3), the EFLAGS register, the EIP register, the general-purpose registers, and the segment descriptor parts of the segment registers.
14. Begins executing the new task. (To an exception handler, the first instruction of the new task appears not to have been executed.)

The state of the currently executing task is always saved when a successful task switch occurs. If the task is resumed, execution starts with the instruction pointed to by the saved EIP value, and the registers are restored to the values they held when the task was suspended.

When switching tasks, the new task does not inherit its privilege level from the suspended task. The new task begins executing at the privilege level specified in the CPL field of the CS register, which is loaded from the TSS. Because tasks are isolated by their separate address spaces and TSSs and because privilege rules control access to a TSS, software does not need to perform explicit privilege checks on a task switch.

Table 6-1 shows the exception conditions that the processor checks for when switching tasks. It also shows the exception that is generated for each check if an error is detected and the segment that the error code references. (The order of the checks in the table is the order used in the P6 family processors. The exact order is model specific and may be different for other Intel Architecture processors.) Exception handlers designed to handle these exceptions may be subject to recursive calls if they attempt to reload the segment selector that generated the exception. The cause of the exception (or the first of multiple causes) should be fixed before reloading the selector.
Table 6-1. Exception Conditions Checked During a Task Switch

Condition Checked | Exception (1) | Error Code Reference (2)
------------------|---------------|-------------------------
Segment selector for a TSS descriptor references the GDT and is within the limits of the table. | #GP | New Task's TSS
TSS descriptor is present in memory. | #NP | New Task's TSS
TSS descriptor is not busy (for task switch initiated by a call, interrupt, or exception). | #GP (for JMP, CALL, INT) | Task's back-link TSS
TSS descriptor is not busy (for task switch initiated by an IRET instruction). | #TS (for IRET) | New Task's TSS
TSS segment limit greater than or equal to 108 (for 32-bit TSS) or 44 (for 16-bit TSS). | #TS | New Task's TSS
Registers are loaded from the values in the TSS. | |
LDT segment selector of new task is valid (3). | #TS | New Task's LDT
Code segment DPL matches segment selector RPL. | #TS | New Code Segment
SS segment selector is valid (3). | #TS | New Stack Segment
Stack segment is present in memory. | #SF | New Stack Segment
Stack segment DPL matches CPL. | #TS | New Stack Segment
LDT of new task is present in memory. | #TS | New Task's LDT
CS segment selector is valid (3). | #TS | New Code Segment
Code segment is present in memory. | #NP | New Code Segment
Stack segment DPL matches selector RPL. | #TS | New Stack Segment
DS, ES, FS, and GS segment selectors are valid (3). | #TS | New Data Segment
DS, ES, FS, and GS segments are readable. | #TS | New Data Segment
DS, ES, FS, and GS segments are present in memory. | #NP | New Data Segment
DS, ES, FS, and GS segment DPL greater than or equal to CPL (unless these are conforming segments). | #TS | New Data Segment

NOTES:
1. #NP is segment-not-present exception, #GP is general-protection exception, #TS is invalid-TSS exception, and #SF is stack-fault exception.
2. The error code contains an index to the segment descriptor referenced in this column.
3. A segment selector is valid if it is in a compatible type of table (GDT or LDT), occupies an address within the table's segment limit, and refers to a compatible type of descriptor (for example, a segment selector in the CS register only is valid when it points to a code-segment descriptor).
The TS (task switched) flag in the control register CR0 is set every time a task switch occurs. System software uses the TS flag to coordinate the actions of the floating-point unit when generating floating-point exceptions with the rest of the processor. The TS flag indicates that the context of the floating-point unit may be different from that of the current task. Refer to Section 2.5., Control Registers in Chapter 2, System Architecture Overview for a detailed description of the function and use of the TS flag.
6.4. TASK LINKING
The previous task link field of the TSS (sometimes called the backlink) and the NT flag in the EFLAGS register are used to return execution to the previous task. The NT flag indicates whether the currently executing task is nested within the execution of another task, and the previous task link field of the current task's TSS holds the TSS selector for the higher-level task in the nesting hierarchy, if there is one (refer to Figure 6-7). When a CALL instruction, an interrupt, or an exception causes a task switch, the processor copies the segment selector for the current TSS into the previous task link field of the TSS for the new task, and then sets the NT flag in the EFLAGS register. The NT flag indicates that the previous task link field of the TSS has been loaded with a saved TSS segment selector. If software uses an IRET instruction to suspend the new task, the processor uses the value in the previous task link field and the NT flag to return to the previous task; that is, if the NT flag is set, the processor performs a task switch to the task specified in the previous task link field.
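The interplay of the previous task link field and the NT flag can be traced with a small software model. This is a sketch under simplifying assumptions: tasks are identified by array index rather than by TSS selector, only the NT flag and the link field are modeled (busy flags, privilege checks, and JMP-initiated switches are left out), and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_TASKS 8

/* Per-task state kept in the TSS for the purposes of this model. */
struct task {
    uint16_t prev_task_link;  /* stands in for the saved TSS selector */
    bool     nt;              /* NT bit in the task's saved EFLAGS image */
};

struct cpu {
    struct task tasks[MAX_TASKS];
    int  current;             /* index of the running task */
    bool nt;                  /* live EFLAGS.NT */
};

/* Task switch caused by CALL, an interrupt, or an exception. */
static void task_call(struct cpu *c, int target)
{
    c->tasks[c->current].nt = c->nt;                       /* save old task's NT */
    c->tasks[target].prev_task_link = (uint16_t)c->current; /* link back */
    c->current = target;
    c->nt = true;                                          /* new task is nested */
}

/* IRET: switches tasks only when NT is set. Returns true on a task switch. */
static bool task_iret(struct cpu *c)
{
    int target;

    if (!c->nt)
        return false;                      /* ordinary return, no task switch */
    target = c->tasks[c->current].prev_task_link;
    c->tasks[c->current].nt = false;       /* NT cleared in the old task's image */
    c->current = target;
    c->nt = c->tasks[target].nt;           /* NT restored from the new task's TSS */
    return true;
}

/* Usage sketch: task 0 calls task 1, which calls task 2; two IRETs unwind
   the chain. Returns the final task index, or -1 if the model misbehaves. */
static int nested_chain_demo(void)
{
    struct cpu c = {0};

    task_call(&c, 1);
    task_call(&c, 2);
    if (!c.nt || c.tasks[2].prev_task_link != 1)
        return -1;
    task_iret(&c);                          /* back to task 1, still nested */
    task_iret(&c);                          /* back to task 0, NT now clear */
    return task_iret(&c) ? -1 : c.current;  /* a third IRET must not switch */
}
```

In the demo, the third IRET finds NT clear in the top-level task and therefore performs no task switch, which mirrors the role of the NT flag as the indicator that the link field holds a valid saved selector.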
NOTE
When a JMP instruction causes a task switch, the new task is not nested; that is, the NT flag is set to 0 and the previous task link field is not used. A JMP instruction is used to dispatch a new task when nesting is not desired.
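Figure 6-7. Nested Tasks
(Diagram not reproduced. Elements shown: a top-level task TSS with NT=0, a nested task TSS with NT=1, a more deeply nested task TSS with NT=1, and the currently executing task with EFLAGS NT=1; each nested TSS has a previous task link field, and the task register references the currently executing task.)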
Table 6-2 summarizes the uses of the busy flag (in the TSS segment descriptor), the NT flag, the previous task link field, and TS flag (in control register CR0) during a task switch.

Note that the NT flag may be modified by software executing at any privilege level. It is possible for a program to set its NT flag and execute an IRET instruction, which would have the effect of invoking the task specified in the previous link field of the current task's TSS. To keep spurious task switches from succeeding, the operating system should initialize the previous task link field for every TSS it creates to 0.
Table 6-2. Effect of a Task Switch on Busy Flag, NT Flag, Previous Task Link Field, and TS Flag
Flag or Field | Effect of JMP Instruction | Effect of CALL Instruction or Interrupt | Effect of IRET Instruction
Busy (B) flag of new task. | Flag is set. Must have been clear before. | Flag is set. Must have been clear before. | No change. Must have been set.
Busy flag of old task. | Flag is cleared. | No change. Flag is currently set. | Flag is cleared.
NT flag of new task. | No change. | Flag is set. | Restored to value from TSS of new task.
NT flag of old task. | No change. | No change. | Flag is cleared.
Previous task link field of new task. | No change. | Loaded with selector for old task's TSS. | No change.
Previous task link field of old task. | No change. | No change. | No change.
TS flag in control register CR0. | Flag is set. | Flag is set. | Flag is set.
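The effects listed in Table 6-2 can be traced with a small software model. The following is only a sketch that follows the table as printed: the Task structure, the task_switch function, and the use of a NULL return to stand for "no task switch took place" are illustrative assumptions, not a real TSS layout or processor interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one task; this is not the real TSS layout. */
typedef struct Task {
    bool busy;          /* B flag in the task's TSS descriptor */
    bool nt;            /* NT flag as saved in the task's TSS image of EFLAGS */
    struct Task *prev;  /* previous task link (NULL models a selector of 0) */
} Task;

typedef enum { SWITCH_JMP, SWITCH_CALL, SWITCH_IRET } SwitchKind;

static bool cr0_ts;     /* models the TS flag in control register CR0 */

/*
 * Applies the Table 6-2 effects and returns the new current task.
 * Returns NULL when the model refuses the switch: the target is already
 * busy (#GP in hardware), or IRET is used while NT is clear or the link
 * does not name a busy task.
 * For SWITCH_IRET the target argument is ignored; the link is used.
 */
static Task *task_switch(Task *old, Task *target, SwitchKind kind)
{
    if (kind == SWITCH_IRET) {
        if (!old->nt || old->prev == NULL || !old->prev->busy)
            return NULL;
        target = old->prev;   /* new task's busy flag: no change */
        old->busy = false;    /* busy flag of old task is cleared */
        old->nt = false;      /* NT flag of old task is cleared */
        /* NT of the new task comes from its own saved state (target->nt). */
    } else {
        if (target->busy)
            return NULL;      /* busy flag must have been clear before */
        target->busy = true;
        if (kind == SWITCH_JMP) {
            old->busy = false;    /* old task leaves the chain */
        } else {
            target->nt = true;    /* CALL or interrupt: nest the new task */
            target->prev = old;   /* link back to the old task's TSS */
        }
    }
    cr0_ts = true;            /* TS is set by every task switch */
    return target;
}
```

In this model a CALL from task A to task B leaves A busy, so a later CALL back to A is refused, which is the recursion guard that the busy flag provides.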
6.4.1. Use of Busy Flag To Prevent Recursive Task Switching
A TSS allows only one context to be saved for a task; therefore, once a task is called (dispatched), a recursive (or re-entrant) call to the task would cause the current state of the task to be lost. The busy flag in the TSS segment descriptor is provided to prevent re-entrant task switching and subsequent loss of task state information. The processor manages the busy flag as follows:
1. When dispatching a task, the processor sets the busy flag of the new task.
2. If, during a task switch, the current task is placed in a nested chain (the task switch is being generated by a CALL instruction, an interrupt, or an exception), the busy flag for the current task remains set.
3. When switching to the new task (initiated by a CALL instruction, interrupt, or exception), the processor generates a general-protection exception (#GP) if the busy flag of the new task is already set. (If the task switch is initiated with an IRET instruction, the exception is not raised because the processor expects the busy flag to be set.)
4. When a task is terminated by a jump to a new task (initiated with a JMP instruction in the task code) or by an IRET instruction in the task code, the processor clears the busy flag, returning the task to the not busy state.
In this manner the processor prevents recursive task switching by preventing a task from switching to itself or to any task in a nested chain of tasks. The chain of nested suspended tasks may grow to any length, due to multiple calls, interrupts, or exceptions. The busy flag prevents a task from being invoked if it is in this chain.
The busy flag may be used in multiprocessor configurations, because the processor follows a LOCK protocol (on the bus or in the cache) when it sets or clears the busy flag. This lock keeps two processors from invoking the same task at the same time. (Refer to Section 7.1.2.1., Automatic Locking, in Chapter 7, Multiple-Processor Management, for more information about setting the busy flag in multiprocessor applications.)
6.4.2. Modifying Task Linkages
In a uniprocessor system, in situations where it is necessary to remove a task from a chain of linked tasks, use the following procedure to remove the task:
1. Disable interrupts.
2. Change the previous task link field in the TSS of the pre-empting task (the task that suspended the task to be removed). It is assumed that the pre-empting task is the next task (newer task) in the chain from the task to be removed. The previous task link field should be changed to point to the TSS of the next oldest task or to an even older task in the chain.
3. Clear the busy (B) flag in the TSS segment descriptor for the task being removed from the chain. If more than one task is being removed from the chain, the busy flag for each task being removed must be cleared.
4. Enable interrupts.
In a multiprocessing system, additional synchronization and serialization operations must be added to this procedure to insure that the TSS and its segment descriptor are both locked when the previous task link field is changed and the busy flag is cleared.
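The uniprocessor procedure above can be sketched on a simplified model of the task chain. The Task structure and the disable_interrupts and enable_interrupts helpers are hypothetical stand-ins (on real hardware the latter would be the CLI and STI instructions); the sketch removes exactly one task and covers none of the extra synchronization a multiprocessing system needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a task in a nested chain; not a real TSS. */
typedef struct Task {
    bool busy;          /* B flag in the TSS descriptor */
    struct Task *prev;  /* previous task link */
} Task;

/* Stand-ins for CLI and STI; they only record the state in this model. */
static bool interrupts_enabled = true;
static void disable_interrupts(void) { interrupts_enabled = false; }
static void enable_interrupts(void)  { interrupts_enabled = true; }

/*
 * Removes victim from the chain. preemptor must be the newer task whose
 * previous task link currently points at victim. Returns false and
 * changes nothing if that assumption does not hold.
 */
static bool unlink_task(Task *preemptor, Task *victim)
{
    if (preemptor->prev != victim)
        return false;
    disable_interrupts();            /* step 1 */
    preemptor->prev = victim->prev;  /* step 2: link to the next oldest task */
    victim->busy = false;            /* step 3: clear the busy flag */
    enable_interrupts();             /* step 4 */
    return true;
}
```

After the call the removed task is no longer busy, so it can be dispatched again without raising the general-protection exception described in Section 6.4.1.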
6.5. TASK ADDRESS SPACE
The address space for a task consists of the segments that the task can access. These segments include the code, data, stack, and system segments referenced in the TSS and any other segments accessed by the task code. These segments are mapped into the processor's linear address space, which is in turn mapped into the processor's physical address space (either directly or through paging). The LDT segment field in the TSS can be used to give each task its own LDT. Giving a task its own LDT allows the task address space to be isolated from other tasks by placing the segment descriptors for all the segments associated with the task in the task's LDT. It also is possible for several tasks to use the same LDT. This is a simple and memory-efficient way to allow some tasks to communicate with or control each other, without dropping the protection barriers for the entire system. Because all tasks have access to the GDT, it also is possible to create shared segments accessed through segment descriptors in this table. If paging is enabled, the CR3 register (PDBR) field in the TSS allows each task to also have its own set of page tables for mapping linear addresses to physical addresses. Alternatively, several tasks can share the same set of page tables.
6.5.1. Mapping Tasks to the Linear and Physical Address Spaces
Tasks can be mapped to the linear address space and physical address space in either of two ways:
• One linear-to-physical address space mapping is shared among all tasks. When paging is not enabled, this is the only choice. Without paging, all linear addresses map to the same physical addresses. When paging is enabled, this form of linear-to-physical address space mapping is obtained by using one page directory for all tasks. The linear address space may exceed the available physical space if demand-paged virtual memory is supported.
• Each task has its own linear address space that is mapped to the physical address space. This form of mapping is accomplished by using a different page directory for each task. Because the PDBR (control register CR3) is loaded on each task switch, each task may have a different page directory.
The linear address spaces of different tasks may map to completely distinct physical addresses. If the entries of different page directories point to different page tables and the page tables point to different pages of physical memory, then the tasks do not share any physical addresses.
With either method of mapping task linear address spaces, the TSSs for all tasks must lie in a shared area of the physical space, which is accessible to all tasks. This mapping is required so that the mapping of TSS addresses does not change while the processor is reading and updating the TSSs during a task switch. The linear address space mapped by the GDT also should be mapped to a shared area of the physical space; otherwise, the purpose of the GDT is defeated. Figure 6-8 shows how the linear address spaces of two tasks can overlap in the physical space by sharing page tables.
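[Diagram: the Task A TSS and the Task B TSS each hold a PDBR that points to that task's page directory. Each page directory contains PDEs that point to page tables; the PTEs of the private page tables map Task A pages or Task B pages, and both directories also point to a shared page table (Shared PT) whose PTEs map the shared pages.]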
Figure 6-8. Overlapping Linear-to-Physical Mappings
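The overlap shown in Figure 6-8 can be modeled with two page directories that point to one common page table. This is a simplified sketch: the PageTable and PageDirectory types, the use of frame number 0 and a NULL pointer to mean "not present", and the example frame numbers are all assumptions made for illustration. Real page-directory and page-table entries carry present, accessed, and dirty flags that are omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRIES 1024u

/* Simplified entries: a PTE is a frame number, a PDE is a pointer. */
typedef struct { uint32_t frame[ENTRIES]; } PageTable;
typedef struct { PageTable *table[ENTRIES]; } PageDirectory;

/*
 * Translates a 32-bit linear address through a two-level structure with
 * 4-KByte pages. Returns 0 when the address is not mapped in this model.
 */
static uint32_t translate(const PageDirectory *pd, uint32_t linear)
{
    const PageTable *pt = pd->table[linear >> 22];         /* bits 31..22 */
    if (pt == NULL)
        return 0;
    uint32_t frame = pt->frame[(linear >> 12) & 0x3FFu];   /* bits 21..12 */
    if (frame == 0)
        return 0;
    return (frame << 12) | (linear & 0xFFFu);              /* bits 11..0 */
}

static PageTable pt_a, pt_b, pt_shared;
static PageDirectory pd_a, pd_b;   /* one page directory per task */

/* Builds two address spaces that overlap only through pt_shared. */
static void build_example(void)
{
    pt_a.frame[0] = 0x100;        /* Task A private page */
    pt_b.frame[0] = 0x200;        /* Task B private page */
    pt_shared.frame[0] = 0x300;   /* page visible to both tasks */
    pd_a.table[0] = &pt_a;
    pd_a.table[1] = &pt_shared;
    pd_b.table[0] = &pt_b;
    pd_b.table[1] = &pt_shared;
}
```

With this setup the same linear address in the first 4 MBytes resolves to different physical addresses for the two tasks, while any linear address covered by the shared page table resolves to the same physical address for both.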
6.5.2. Task Logical Address Space
To allow the sharing of data among tasks, use any of the following techniques to create shared logical-to-physical address-space mappings for data segments:
• Through the segment descriptors in the GDT. All tasks must have access to the segment descriptors in the GDT. If some segment descriptors in the GDT point to segments in the linear-address space that are mapped into an area of the physical-address space common to all tasks, then all tasks can share the data and code in those segments.
• Through a shared LDT. Two or more tasks can use the same LDT if the LDT fields in their TSSs point to the same LDT. If some segment descriptors in a shared LDT point to segments that are mapped to a common area of the physical address space, the data and code in those segments can be shared among the tasks that share the LDT. This method of sharing is more selective than sharing through the GDT, because the sharing can be limited to specific tasks. Other tasks in the system may have different LDTs that do not give them access to the shared segments.
• Through segment descriptors in distinct LDTs that are mapped to common addresses in the linear address space. If this common area of the linear address space is mapped to the same area of the physical address space for each task, these segment descriptors permit the tasks to share segments. Such segment descriptors are commonly called aliases. This method of sharing is even more selective than those listed above, because other segment descriptors in the LDTs may point to independent linear addresses which are not shared.
6.6. 16-BIT TASK-STATE SEGMENT (TSS)
The 32-bit Intel Architecture processors also recognize a 16-bit TSS format like the one used in Intel 286 processors (refer to Figure 6-9). It is supported for compatibility with software written to run on these earlier Intel Architecture processors. The following additional information is important to know about the 16-bit TSS.
• Do not use a 16-bit TSS to implement a virtual-8086 task.
• The valid segment limit for a 16-bit TSS is 2CH.
• The 16-bit TSS does not contain a field for the base address of the page directory, which is loaded into control register CR3. Therefore, a separate set of page tables for each task is not supported for 16-bit tasks. If a 16-bit task is dispatched, the page-table structure for the previous task is used.
• The I/O base address is not included in the 16-bit TSS, so none of the functions of the I/O map are supported.
• When task state is saved in a 16-bit TSS, the upper 16 bits of the EFLAGS register and the EIP register are lost.
• When the general-purpose registers are loaded or saved from a 16-bit TSS, the upper 16 bits of the registers are modified and not maintained.
Byte offset | Field (bits 15 through 0)
42 | Task LDT Selector
40 | DS Selector
38 | SS Selector
36 | CS Selector
34 | ES Selector
32 | DI
30 | SI
28 | BP
26 | SP
24 | BX
22 | DX
20 | CX
18 | AX
16 | FLAG Word
14 | IP (Entry Point)
12 | SS2
10 | SP2
8 | SS1
6 | SP1
4 | SS0
2 | SP0
0 | Previous Task Link
Figure 6-9. 16-Bit TSS Format
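The layout in Figure 6-9 maps naturally onto a structure of 16-bit fields. The sketch below is illustrative only: the type name Tss16, its field names, and the save_state16 helper are assumptions, and the helper models nothing more than the truncation of EIP and EFLAGS to their lower 16 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the 16-bit TSS; offsets follow Figure 6-9. */
typedef struct {
    uint16_t prev_task_link;  /* offset 0 */
    uint16_t sp0, ss0;        /* offsets 2, 4 */
    uint16_t sp1, ss1;        /* offsets 6, 8 */
    uint16_t sp2, ss2;        /* offsets 10, 12 */
    uint16_t ip;              /* offset 14, entry point */
    uint16_t flags;           /* offset 16, FLAG word */
    uint16_t ax, cx, dx, bx;  /* offsets 18, 20, 22, 24 */
    uint16_t sp, bp, si, di;  /* offsets 26, 28, 30, 32 */
    uint16_t es, cs, ss, ds;  /* offsets 34, 36, 38, 40 */
    uint16_t ldt_selector;    /* offset 42 */
} Tss16;

/* Compile-time checks that the structure matches the figure. */
static_assert(offsetof(Tss16, ip) == 14, "IP is at offset 14");
static_assert(offsetof(Tss16, flags) == 16, "FLAG word is at offset 16");
static_assert(offsetof(Tss16, ldt_selector) == 42, "LDT selector is at offset 42");
static_assert(sizeof(Tss16) == 0x2C, "16-bit TSS occupies 2CH bytes");

/* Saving 32-bit state into the 16-bit format keeps only the low halves. */
static void save_state16(Tss16 *tss, uint32_t eip, uint32_t eflags)
{
    tss->ip = (uint16_t)eip;        /* upper 16 bits of EIP are lost */
    tss->flags = (uint16_t)eflags;  /* upper 16 bits of EFLAGS are lost */
}
```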
CHAPTER 7 MULTIPLE-PROCESSOR MANAGEMENT

The Intel Architecture provides several mechanisms for managing and improving the performance of multiple processors connected to the same system bus. These mechanisms include:
• Bus locking and/or cache coherency management for performing atomic operations on system memory.
• Serializing instructions. (These instructions apply only to the Pentium and P6 family processors.)
• Advanced programmable interrupt controller (APIC) located on the processor chip. (The APIC architecture was introduced into the Intel Architecture with the Pentium processor.)
• A secondary (level 2, L2) cache. For the P6 family processors, the L2 cache is included in the processor package and is tightly coupled to the processor. For the Pentium and Intel486 processors, pins are provided to support an external L2 cache.
These mechanisms are particularly useful in symmetric-multiprocessing systems; however, they can also be used in applications where an Intel Architecture processor and a special-purpose processor (such as a communications, graphics, or video processor) share the system bus. The main goals of these multiprocessing mechanisms are as follows:
• To maintain system memory coherency. When two or more processors are attempting simultaneously to access the same address in system memory, some communication mechanism or memory access protocol must be available to promote data coherency and, in some instances, to allow one processor to temporarily lock a memory location.
• To maintain cache consistency. When one processor accesses data cached in another processor, it must not receive incorrect data. If it modifies data, all other processors that access that data must receive the modified data.
• To allow predictable ordering of writes to memory. In some circumstances, it is important that memory writes be observed externally in precisely the same order as programmed.
• To distribute interrupt handling among a group of processors. When several processors are operating in a system in parallel, it is useful to have a centralized mechanism for receiving interrupts and distributing them to available processors for servicing.
The Intel Architecture's caching mechanism and cache consistency are discussed in Chapter 9, Memory Cache Control. Bus and memory locking, serializing instructions, memory ordering, and the processor's internal APIC are discussed in the following sections.
7.1. LOCKED ATOMIC OPERATIONS
The 32-bit Intel Architecture processors support locked atomic operations on locations in system memory. These operations are typically used to manage shared data structures (such as semaphores, segment descriptors, system segments, or page tables) in which two or more processors may try simultaneously to modify the same field or flag. The processor uses three interdependent mechanisms for carrying out locked atomic operations:
• Guaranteed atomic operations.
• Bus locking, using the LOCK# signal and the LOCK instruction prefix.
• Cache coherency protocols that ensure that atomic operations can be carried out on cached data structures (cache lock). This mechanism is present in the P6 family processors.
These mechanisms are interdependent in the following ways. Certain basic memory transactions (such as reading or writing a byte in system memory) are always guaranteed to be handled atomically. That is, once started, the processor guarantees that the operation will be completed before another processor or bus agent is allowed access to the memory location. The processor also supports bus locking for performing selected memory operations (such as a read-modify-write operation in a shared area of memory) that typically need to be handled atomically, but are not automatically handled this way. Because frequently used memory locations are often cached in a processor's L1 or L2 caches, atomic operations can often be carried out inside a processor's caches without asserting the bus lock. Here the processor's cache coherency protocols ensure that other processors that are caching the same memory locations are managed properly while atomic operations are performed on cached memory locations.
Note that the mechanisms for handling locked atomic operations have evolved as the complexity of Intel Architecture processors has evolved. As such, more recent Intel Architecture processors (such as the P6 family processors) provide a more refined locking mechanism than earlier Intel Architecture processors, as is described in the following sections.
7.1.1. Guaranteed Atomic Operations
The Intel386, Intel486, Pentium, and P6 family processors guarantee that the following basic memory operations will always be carried out atomically:
• Reading or writing a byte.
• Reading or writing a word aligned on a 16-bit boundary.
• Reading or writing a doubleword aligned on a 32-bit boundary.
The P6 family processors guarantee that the following additional memory operations will always be carried out atomically:
• Reading or writing a quadword aligned on a 64-bit boundary. (This operation is also guaranteed on the Pentium processor.)
• 16-bit accesses to uncached memory locations that fit within a 32-bit data bus.
• 16-, 32-, and 64-bit accesses to cached memory that fit within a 32-byte cache line.
Accesses to cacheable memory that are split across bus widths, cache lines, and page boundaries are not guaranteed to be atomic by the Intel486, Pentium, or P6 family processors. The P6 family processors provide bus control signals that permit external memory subsystems to make split accesses atomic; however, nonaligned data accesses will seriously impact the performance of the processor and should be avoided where possible.
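The alignment-based guarantees above can be expressed as a simple predicate. This is a sketch only: the CpuFamily enumeration and the guaranteed_atomic function are hypothetical names, and the function deliberately covers just the alignment rules. It does not model the additional P6 family cases for accesses that fit within the data bus or a cache line, so a false result means "not covered by the alignment rules" rather than "never atomic".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { CPU_INTEL386, CPU_INTEL486, CPU_PENTIUM, CPU_P6 } CpuFamily;

/*
 * Returns true when an unlocked read or write of size bytes at addr is
 * guaranteed atomic by the alignment rules of Section 7.1.1.
 */
static bool guaranteed_atomic(CpuFamily cpu, uintptr_t addr, unsigned size)
{
    switch (size) {
    case 1:  /* a byte is always atomic */
        return true;
    case 2:  /* word aligned on a 16-bit boundary */
        return addr % 2 == 0;
    case 4:  /* doubleword aligned on a 32-bit boundary */
        return addr % 4 == 0;
    case 8:  /* aligned quadword: Pentium and P6 family only */
        return (cpu == CPU_PENTIUM || cpu == CPU_P6) && addr % 8 == 0;
    default:
        return false;
    }
}
```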
7.1.2. Bus Locking
Intel Architecture processors provide a LOCK# signal that is asserted automatically during certain critical memory operations to lock the system bus. While this output signal is asserted, requests from other processors or bus agents for control of the bus are blocked. Software can specify other occasions when the LOCK semantics are to be followed by prepending the LOCK prefix to an instruction. In the case of the Intel386, Intel486, and Pentium processors, explicitly locked instructions will result in the assertion of the LOCK# signal. It is the responsibility of the hardware designer to make the LOCK# signal available in system hardware to control memory accesses among processors. For the P6 family processors, if the memory area being accessed is cached internally in the processor, the LOCK# signal is generally not asserted; instead, locking is only applied to the processor's caches (refer to Section 7.1.4., Effects of a LOCK Operation on Internal Processor Caches).
7.1.2.1. AUTOMATIC LOCKING
The operations on which the processor automatically follows the LOCK semantics are as follows:
• When executing an XCHG instruction that references memory.
• When setting the B (busy) flag of a TSS descriptor. The processor tests and sets the busy flag in the type field of the TSS descriptor when switching to a task. To ensure that two processors do not switch to the same task simultaneously, the processor follows the LOCK semantics while testing and setting this flag.
• When updating segment descriptors. When loading a segment descriptor, the processor will set the accessed flag in the segment descriptor if the flag is clear. During this operation, the processor follows the LOCK semantics so that the descriptor will not be modified by another processor while it is being updated. For this action to be effective, operating-system procedures that update descriptors should use the following steps:
  - Use a locked operation to modify the access-rights byte to indicate that the segment descriptor is not-present, and specify a value for the type field that indicates that the descriptor is being updated.
  - Update the fields of the segment descriptor. (This operation may require several memory accesses; therefore, locked operations cannot be used.)
  - Use a locked operation to modify the access-rights byte to indicate that the segment descriptor is valid and present.
  Note that the Intel386 processor always updates the accessed flag in the segment descriptor, whether it is clear or not. The P6 family, Pentium, and Intel486 processors only update this flag if it is not already set.
• When updating page-directory and page-table entries. When updating page-directory and page-table entries, the processor uses locked cycles to set the accessed and dirty flags in the page-directory and page-table entries.
• Acknowledging interrupts. After an interrupt request, an interrupt controller may use the data bus to send the interrupt vector for the interrupt to the processor. The processor follows the LOCK semantics during this time to ensure that no other data appears on the data bus when the interrupt vector is being transmitted.
7.1.2.2. SOFTWARE CONTROLLED BUS LOCKING
To explicitly force the LOCK semantics, software can use the LOCK prefix with the following instructions when they are used to modify a memory location. An invalid-opcode exception (#UD) is generated when the LOCK prefix is used with any other instruction or when no write operation is made to memory (that is, when the destination operand is in a register).
• The bit test and modify instructions (BTS, BTR, and BTC).
• The exchange instructions (XADD, CMPXCHG, and CMPXCHG8B).
• The LOCK prefix is automatically assumed for the XCHG instruction.
• The following single-operand arithmetic and logical instructions: INC, DEC, NOT, and NEG.
• The following two-operand arithmetic and logical instructions: ADD, ADC, SUB, SBB, AND, OR, and XOR.
A locked instruction is guaranteed to lock only the area of memory defined by the destination operand, but may be interpreted by the system as a lock for a larger memory area. Software should access semaphores (shared memory used for signaling between multiple processors) using identical addresses and operand lengths. For example, if one processor accesses a semaphore using a word access, other processors should not access the semaphore using a byte access. The integrity of a bus lock is not affected by the alignment of the memory field. The LOCK semantics are followed for as many bus cycles as necessary to update the entire operand. However, it is recommended that locked accesses be aligned on their natural boundaries for better system performance:
• Any boundary for an 8-bit access (locked or otherwise).
• 16-bit boundary for locked word accesses.
• 32-bit boundary for locked doubleword access.
• 64-bit boundary for locked quadword access.
Locked operations are atomic with respect to all other memory operations and all externally visible events. Only instruction fetch and page table accesses can pass locked instructions. Locked instructions can be used to synchronize data written by one processor and read by another processor. For the P6 family processors, locked operations serialize all outstanding load and store operations (that is, wait for them to complete). Locked instructions should not be used to insure that data written can be fetched as instructions.
NOTE
The locked instructions for the current versions of the Intel486, Pentium, and P6 family processors will allow data written to be fetched as instructions. However, Intel recommends that developers who require the use of self-modifying code use a different synchronizing mechanism, described in the following sections.
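In a high-level language, locked read-modify-write operations on a semaphore are usually reached through the language's atomic operations rather than by writing the LOCK prefix directly. The sketch below uses the C11 <stdatomic.h> interface; the Semaphore type and its three functions are hypothetical names. That a compiler targeting the Intel Architecture lowers these calls to LOCK-prefixed instructions such as CMPXCHG and XADD is common practice but remains an assumption about the compiler, not something the C standard promises. Every access goes through the same atomic_int object, which keeps the address and operand length identical as recommended above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counting semaphore shared between processors. */
typedef struct { atomic_int count; } Semaphore;

static void sem_init(Semaphore *s, int initial)
{
    atomic_init(&s->count, initial);
}

/* Takes one unit if any is available; never blocks. */
static bool sem_try_acquire(Semaphore *s)
{
    int seen = atomic_load(&s->count);
    while (seen > 0) {
        /* Atomic compare-and-swap; on failure seen is refreshed. */
        if (atomic_compare_exchange_weak(&s->count, &seen, seen - 1))
            return true;
    }
    return false;
}

/* Returns one unit with an atomic add. */
static void sem_release(Semaphore *s)
{
    atomic_fetch_add(&s->count, 1);
}
```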
7.1.3. Handling Self- and Cross-Modifying Code
The act of a processor writing data into a currently executing code segment with the intent of executing that data as code is called self-modifying code. Intel Architecture processors exhibit model-specific behavior when executing self-modified code, depending upon how far ahead of the current execution pointer the code has been modified. As processor architectures become more complex and start to speculatively execute code ahead of the retirement point (as in the P6 family processors), the rules regarding which code should execute, pre- or post-modification, become blurred. To write self-modifying code and ensure that it is compliant with current and future Intel Architectures, one of the following two coding options should be chosen.
(* OPTION 1 *)
Store modified code (as data) into code segment;
Jump to new code or an intermediate location;
Execute new code;
(* OPTION 2 *)
Store modified code (as data) into code segment;
Execute a serializing instruction; (* For example, CPUID instruction *)
Execute new code;
(The use of one of these options is not required for programs intended to run on the Pentium or Intel486 processors, but is recommended to ensure compatibility with the P6 family processors.) It should be noted that self-modifying code will execute at a lower level of performance than non-self-modifying or normal code. The degree of the performance deterioration will depend upon the frequency of modification and specific characteristics of the code.
The act of one processor writing data into the currently executing code segment of a second processor with the intent of having the second processor execute that data as code is called cross-modifying code. As with self-modifying code, Intel Architecture processors exhibit model-specific behavior when executing cross-modifying code, depending upon how far ahead of the executing processor's current execution pointer the code has been modified. To write cross-modifying code and ensure that it is compliant with current and future Intel Architectures, the following processor synchronization algorithm should be implemented.
; Action of Modifying Processor
Store modified code (as data) into code segment;
Memory_Flag ← 1;
; Action of Executing Processor
WHILE (Memory_Flag ≠ 1)
    Wait for code to update;
ELIHW;
Execute serializing instruction; (* For example, CPUID instruction *)
Begin executing modified code;
(The use of this option is not required for programs intended to run on the Intel486 processor, but is recommended to ensure compatibility with the Pentium and P6 family processors.) Like self-modifying code, cross-modifying code will execute at a lower level of performance than non-cross-modifying (normal) code, depending upon the frequency of modification and specific characteristics of the code.
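The flag handshake in the cross-modifying algorithm above can be sketched with C11 atomics. This models only the ordering between the store of the new bytes and the flag, not the execution of the bytes: code_area is an ordinary data buffer standing in for the code segment, serialize is an empty placeholder where a serializing instruction such as CPUID would be executed, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

static uint8_t code_area[16];   /* stands in for the shared code segment */
static atomic_int memory_flag;  /* Memory_Flag from the algorithm; starts at 0 */

/* Action of the modifying processor. */
static void modifying_processor(const uint8_t *new_code, size_t len)
{
    memcpy(code_area, new_code, len);  /* store modified code (as data) */
    /* Release ordering makes the bytes visible before the flag. */
    atomic_store_explicit(&memory_flag, 1, memory_order_release);
}

/* Placeholder only: real code would execute a serializing instruction. */
static void serialize(void) { }

/* Action of the executing processor; returns where the new code lives. */
static const uint8_t *executing_processor(void)
{
    while (atomic_load_explicit(&memory_flag, memory_order_acquire) != 1) {
        /* wait for code to update */
    }
    serialize();
    return code_area;  /* a real implementation would now jump here */
}
```

In a real system the two functions would run on different processors; calling them in sequence on one processor exercises the same flag protocol without the concurrency.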
7.1.4. Effects of a LOCK Operation on Internal Processor Caches
For the Intel486 and Pentium processors, the LOCK# signal is always asserted on the bus during a LOCK operation, even if the area of memory being locked is cached in the processor. For the P6 family processors, if the area of memory being locked during a LOCK operation is cached in the processor that is performing the LOCK operation as write-back memory and is completely contained in a cache line, the processor may not assert the LOCK# signal on the bus. Instead, it will modify the memory location internally and allow its cache coherency mechanism to insure that the operation is carried out atomically. This operation is called cache locking. The cache coherency mechanism automatically prevents two or more processors that have cached the same area of memory from simultaneously modifying data in that area.
7.2. MEMORY ORDERING
The term memory ordering refers to the order in which the processor issues reads (loads) and writes (stores) out onto the bus to system memory. The Intel Architecture supports several memory ordering models depending on the implementation of the architecture. For example, the Intel386 processor enforces program ordering (generally referred to as strong ordering), where reads and writes are issued on the system bus in the order they occur in the instruction stream under all circumstances. To allow optimizing of instruction execution, the Intel Architecture allows departures from the strong-ordering model, called processor ordering, in P6 family processors. These processor-ordering variations allow performance-enhancing operations such as allowing reads to go ahead of writes by buffering writes. The goal of any of these variations is to increase instruction execution speeds, while maintaining memory coherency, even in multiple-processor systems. The following sections describe the memory ordering models used by the Intel486, Pentium, and P6 family processors.
7.2.1. Memory Ordering in the Pentium and Intel486 Processors
The Pentium and Intel486 processors follow the processor-ordered memory model; however, they operate as strongly-ordered processors under most circumstances. Reads and writes always appear in programmed order at the system bus, except for the following situation where processor ordering is exhibited. Read misses are permitted to go ahead of buffered writes on the system bus when all the buffered writes are cache hits and, therefore, are not directed to the same address being accessed by the read miss. In the case of I/O operations, both reads and writes always appear in programmed order. Software intended to operate correctly in processor-ordered processors (such as the P6 family processors) should not depend on the relatively strong ordering of the Pentium or Intel486 processors. Instead, it should ensure that accesses to shared variables that are intended to control concurrent execution among processors are explicitly required to obey program ordering through the use of appropriate locking or serializing operations (refer to Section 7.2.4., Strengthening or Weakening the Memory Ordering Model).
7.2.2. Memory Ordering in the P6 Family Processors
The P6 family processors also use a processor-ordered memory ordering model that can be further defined as write ordered with store-buffer forwarding. This model can be characterized as follows. In a single-processor system for memory regions defined as write-back cacheable, the following ordering rules apply:
1. Reads can be carried out speculatively and in any order.
2. Reads can pass buffered writes, but the processor is self-consistent.
3. Writes to memory are always carried out in program order.
4. Writes can be buffered.
5. Writes are not performed speculatively; they are only performed for instructions that have actually been retired.
6. Data from buffered writes can be forwarded to waiting reads within the processor.
7. Reads or writes cannot pass (be carried out ahead of) I/O instructions, locked instructions, or serializing instructions.
The second rule allows a read to pass a write. However, if the write is to the same memory location as the read, the processor's internal snooping mechanism will detect the conflict and update the already cached read before the processor executes the instruction that uses the value. The sixth rule constitutes an exception to an otherwise write ordered model. In a multiple-processor system, the following ordering rules apply:
• Individual processors use the same ordering rules as in a single-processor system.
• Writes by a single processor are observed in the same order by all processors.
• Writes from the individual processors on the system bus are globally observed and are NOT ordered with respect to each other.
The latter rule can be clarified by the example in Figure 7-1. Consider three processors in a system and each processor performs three writes, one to each of three defined locations (A, B, and C). Individually, the processors perform the writes in the same program order, but because of bus arbitration and other memory access mechanisms, the order that the three processors write the individual memory locations can differ each time the respective code sequences are executed on the processors. The final values in location A, B, and C would possibly vary on each execution of the write sequence.
Order of Writes From Individual Processors (each processor is guaranteed to perform writes in program order):
Processor #1: Write A.1, Write B.1, Write C.1
Processor #2: Write A.2, Write B.2, Write C.2
Processor #3: Write A.3, Write B.3, Write C.3
Example of Order of Actual Writes From All Processors to Memory (writes are in order with respect to individual processors):
Write A.1, Write B.1, Write A.2, Write A.3, Write C.1, Write B.2, Write C.2, Write B.3, Write C.3
Writes from all processors are not guaranteed to occur in a particular order.
Figure 7-1. Example of Write Ordering in Multiple-Processor Systems
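The property illustrated by Figure 7-1 can be checked mechanically: an interleaving is permitted as long as each processor's own writes appear in program order, and the final value of a location depends on whichever processor wrote it last. The sketch below assumes the figure's setup of three processors that each write locations A, B, and C in that order; the Write type and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One write on the bus; "Write A.1" is { 1, 'A' }. */
typedef struct { int processor; char location; } Write;

/*
 * Returns true if, for each of processors 1 to 3, the writes in seq
 * appear in that processor's program order (A, then B, then C).
 */
static bool respects_program_order(const Write *seq, size_t n)
{
    char last[4] = { 0, 0, 0, 0 };  /* last location written, per processor */
    for (size_t i = 0; i < n; i++) {
        int p = seq[i].processor;
        if (p < 1 || p > 3)
            return false;
        if (seq[i].location <= last[p])
            return false;           /* went backwards or repeated a write */
        last[p] = seq[i].location;
    }
    return true;
}

/* Returns the processor whose write to location came last (0 if none). */
static int final_writer(const Write *seq, size_t n, char location)
{
    int who = 0;
    for (size_t i = 0; i < n; i++) {
        if (seq[i].location == location)
            who = seq[i].processor;
    }
    return who;
}
```

For the example interleaving in the figure, processor #3 happens to write each of A, B, and C last; a different run could leave a different processor's value in any of the three locations.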
The processor-ordering model described in this section is virtually identical to that used by the Pentium and Intel486 processors. The only enhancements in the P6 family processors are:
• Added support for speculative reads.
• Store-buffer forwarding, when a read passes a write to the same memory location.
• Out of order store from long string store and string move operations (refer to Section 7.2.3., Out of Order Stores From String Operations in P6 Family Processors below).
7.2.3.
Out of Order Stores From String Operations in P6 Family Processors
The P6 family processors modify the processors operation during the string store operations (initiated with the MOVS and STOS instructions) to maximize performance. Once the fast string operations initial conditions are met (as described below), the processor will essentially operate on, from an external perspective, the string in a cache line by cache line mode. This results in the processor looping on issuing a cache-line read for the source address and an invalidation on the external bus for the destination address, knowing that all bytes in the destination cache line will be modified, for the length of the string. In this mode interrupts will only be accepted by the processor on cache line boundaries. It is possible in this mode that the destination line invalidations, and therefore stores, will be issued on the external bus out of order. Code dependent upon sequential store ordering should not use the string operations for the entire data structure to be stored. Data and semaphores should be separated. Order dependent code should use a discrete semaphore uniquely stored to after any string operations to allow correctly ordered data to be seen by all processors. Initial conditions for fast string operations:
• Source and destination addresses must be 8-byte aligned.
• String operation must be performed in ascending address order.
• The initial operation counter (ECX) must be equal to or greater than 64.
• Source and destination must not overlap by less than a cache line (32 bytes).
• The memory type for both source and destination addresses must be either WB or WC.
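The initial conditions above can be expressed as a single predicate. This Python sketch is a reading aid, not a model of the hardware: the function name and parameters are hypothetical, and the overlap rule is interpreted here as "source and destination start addresses differ by at least one cache line", which is an assumption about the intended meaning.

```python
CACHE_LINE_BYTES = 32  # P6 family cache-line size given in the text


def fast_string_eligible(src, dst, ecx, ascending, src_type, dst_type):
    """Return True if all listed initial conditions hold (hypothetical helper)."""
    aligned = src % 8 == 0 and dst % 8 == 0
    long_enough = ecx >= 64
    # Assumed reading of the overlap rule: start addresses at least a line apart.
    far_apart = abs(dst - src) >= CACHE_LINE_BYTES
    cacheable = src_type in ("WB", "WC") and dst_type in ("WB", "WC")
    return aligned and ascending and long_enough and far_apart and cacheable
```

For example, a copy between two 8-byte-aligned WB buffers with ECX equal to 64 satisfies the predicate, while the same copy with either buffer in UC memory does not.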
7.2.4.
Strengthening or Weakening the Memory Ordering Model
The Intel Architecture provides several mechanisms for strengthening or weakening the memory ordering model to handle special programming situations. These mechanisms include:
The I/O instructions, locking instructions, the LOCK prefix, and serializing instructions force stronger ordering on the processor. The memory type range registers (MTRRs) can be used to strengthen or weaken memory ordering for specific area of physical memory (refer to Section 9.12., Memory Type
Range Registers (MTRRs), in Chapter 9, Memory Cache Control). MTRRs are available only in the P6 family processors.
These mechanisms can be used as follows.
Memory mapped devices and other I/O devices on the bus are often sensitive to the order of writes to their I/O buffers. I/O instructions (the IN and OUT instructions) can be used to impose strong write ordering on such accesses as follows. Prior to executing an I/O instruction, the processor waits for all previous instructions in the program to complete and for all buffered writes to drain to memory. Only instruction fetches and page table walks can pass I/O instructions. Execution of subsequent instructions does not begin until the processor determines that the I/O instruction has been completed.
Synchronization mechanisms in multiple-processor systems may depend upon a strong memory-ordering model. Here, a program can use a locking instruction such as the XCHG instruction or the LOCK prefix to ensure that a read-modify-write operation on memory is carried out atomically. Locking operations typically operate like I/O operations in that they wait for all previous instructions to complete and for all buffered writes to drain to memory (refer to Section 7.1.2., Bus Locking).
Program synchronization can also be carried out with serializing instructions (refer to Section 7.4., Serializing Instructions). These instructions are typically used at critical procedure or task boundaries to force completion of all previous instructions before a jump to a new section of code or a context switch occurs. Like the I/O and locking instructions, the processor waits until all previous instructions have been completed and all buffered writes have been drained to memory before executing the serializing instruction.
The MTRRs were introduced in the P6 family processors to define the cache characteristics for specified areas of physical memory.
The following are two examples of how memory types set up with MTRRs can be used to strengthen or weaken memory ordering for the P6 family processors:
The uncached (UC) memory type forces a strong-ordering model on memory accesses. Here, all reads and writes to the UC memory region appear on the bus and out-of-order or speculative accesses are not performed. This memory type can be applied to an address range dedicated to memory mapped I/O devices to force strong memory ordering. For areas of memory where weak ordering is acceptable, the write back (WB) memory type can be chosen. Here, reads can be performed speculatively and writes can be buffered and combined. For this type of memory, cache locking is performed on atomic (locked) operations that do not split across cache lines, which helps to reduce the performance penalty associated with the use of the typical synchronization instructions, such as XCHG, that lock the bus during the entire read-modify-write operation. With the WB memory type, the XCHG instruction locks the cache instead of the bus if the memory access is contained within a cache line.
It is recommended that software written to run on P6 family processors assume the processor-ordering model or a weaker memory-ordering model. The P6 family processors do not implement a strong memory-ordering model, except when using the UC memory type. Despite the fact that P6 family processors support processor ordering, Intel does not guarantee that future processors will support this model. To make software portable to future processors, it is recommended that operating systems provide critical region and resource control constructs and APIs (application program interfaces) based on I/O, locking, and/or serializing instructions, and that these be used to synchronize access to shared areas of memory in multiple-processor systems. Also, software should not depend on processor ordering in situations where the system hardware does not support this memory-ordering model.
7.3.
PROPAGATION OF PAGE TABLE ENTRY CHANGES TO MULTIPLE PROCESSORS
In a multiprocessor system, when one processor changes a page table entry or mapping, the changes must also be propagated to all the other processors. This process is also known as TLB Shootdown. Propagation may be done by memory-based semaphores and/or interprocessor interrupts between processors. One naive but algorithmically correct TLB Shootdown sequence for the Intel Architecture is:
1. Begin barrier: Stop all processors. Cause all but one to HALT or stop in a spinloop.
2. Let the active processor change the PTE(s).
3. Let all processors invalidate the PTE(s) modified in their TLBs.
4. End barrier: Resume all processors.
Alternate, performance-optimized TLB Shootdown algorithms may be developed; however, care must be taken by the developers to ensure that either:
• The differing TLB mappings are not actually used on different processors during the update process.
OR
• The operating system is prepared to deal with the case where processor(s) is/are using the stale mapping during the update process.
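The four-step naive sequence can be simulated in user space to make the barrier structure concrete. The sketch below is only an analogy: Python threads stand in for processors, dictionaries stand in for the page table and the per-processor TLBs, and all names (tlb_shootdown, page_table, tlbs) are hypothetical. It does not touch real paging structures.

```python
import threading


def tlb_shootdown(num_cpus, page_table, tlbs, vpage, new_frame):
    """Simulate the naive sequence: barrier, update PTE, invalidate, resume."""
    barrier = threading.Barrier(num_cpus)

    def cpu(cpu_id):
        barrier.wait()                      # 1. begin barrier: every CPU has stopped
        if cpu_id == 0:
            page_table[vpage] = new_frame   # 2. the one active CPU changes the PTE
        barrier.wait()                      #    the others wait until the change is done
        tlbs[cpu_id].pop(vpage, None)       # 3. each CPU invalidates its own TLB entry
        barrier.wait()                      # 4. end barrier: all CPUs resume together

    threads = [threading.Thread(target=cpu, args=(i,)) for i in range(num_cpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because no simulated processor runs past the final barrier until every TLB has dropped the old entry, no stale mapping can be used after the update, which is the first of the two guarantees listed above.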
7.4.
SERIALIZING INSTRUCTIONS
The Intel Architecture defines several serializing instructions. These instructions force the processor to complete all modifications to flags, registers, and memory by previous instructions and to drain all buffered writes to memory before the next instruction is fetched and executed. For example, when a MOV to control register instruction is used to load a new value into control register CR0 to enable protected mode, the processor must perform a serializing operation before it enters protected mode. This serializing operation ensures that all operations that were started while the processor was in real-address mode are completed before the switch to protected mode is made.
The concept of serializing instructions was introduced into the Intel Architecture with the Pentium processor to support parallel instruction execution. Serializing instructions have no meaning for the Intel486 and earlier processors that do not implement parallel instruction execution.
It is important to note that execution of serializing instructions on P6 family processors constrains speculative execution, because the results of speculatively executed instructions are discarded. The following instructions are serializing instructions:
• Privileged serializing instructions: MOV (to control register), MOV (to debug register), WRMSR, INVD, INVLPG, WBINVD, LGDT, LLDT, LIDT, and LTR.
• Nonprivileged serializing instructions: CPUID, IRET, and RSM.
The CPUID instruction can be executed at any privilege level to serialize instruction execution with no effect on program flow, except that the EAX, EBX, ECX, and EDX registers are modified. Nothing can pass a serializing instruction, and serializing instructions cannot pass any other instruction (read, write, instruction fetch, or I/O). When the processor serializes instruction execution, it ensures that all pending memory transactions are completed, including writes stored in its store buffer, before it executes the next instruction. The following additional information is worth noting regarding serializing instructions:
• The processor does not write back the contents of modified data in its data cache to external memory when it serializes instruction execution. Software can force modified data to be written back by executing the WBINVD instruction, which is a serializing instruction. It should be noted that frequent use of the WBINVD instruction will seriously reduce system performance.
• When an instruction is executed that enables or disables paging (that is, changes the PG flag in control register CR0), the instruction should be followed by a jump instruction. The target instruction of the jump instruction is fetched with the new setting of the PG flag (that is, paging is enabled or disabled), but the jump instruction itself is fetched with the previous setting. The P6 family processors do not require the jump operation following the move to register CR0 (because any use of the MOV instruction in a P6 family processor to write to CR0 is completely serializing). However, to maintain backwards and forward compatibility with code written to run on other Intel Architecture processors, it is recommended that the jump operation be performed.
• Whenever an instruction is executed to change the contents of CR3 while paging is enabled, the next instruction is fetched using the translation tables that correspond to the new value of CR3. Therefore the next instruction and the sequentially following instructions should have a mapping based upon the new value of CR3. (Global entries in the TLBs are not invalidated; refer to Section 9.10., Invalidating the Translation Lookaside Buffers (TLBs), Chapter 9, Memory Cache Control.)
• The Pentium and P6 family processors use branch-prediction techniques to improve performance by prefetching the destination of a branch instruction before the branch instruction is executed. Consequently, instruction execution is not deterministically serialized when a branch instruction is executed.
7.5.
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC)
The Advanced Programmable Interrupt Controller (APIC), referred to in the following sections as the local APIC, was introduced into the Intel Architecture with the Pentium processor (beginning with the 735/90 and 815/100 models) and is included in all P6 family processors. The local APIC performs two main functions for the processor:
• It processes local external interrupts that the processor receives at its interrupt pins and local internal interrupts that software generates.
• In multiple-processor systems, it communicates with an external I/O APIC chip. The external I/O APIC receives external interrupt events from the system and interprocessor interrupts from the processors on the system bus and distributes them to the processors on the system bus. The I/O APIC is part of Intel's system chip set.
Figure 7-2 shows the relationship of the local APICs on the processors in a multiple-processor (MP) system and the I/O APIC. The local APIC controls the dispatching of interrupts (to its associated processor) that it receives either locally or from the I/O APIC. It provides facilities for queuing, nesting and masking of interrupts. It handles the interrupt delivery protocol with its local processor and accesses to APIC registers, and also manages interprocessor interrupts and remote APIC register reads. A timer on the local APIC allows local generation of interrupts, and local interrupt pins permit local reception of processor-specific interrupts. The local APIC can be disabled and used in conjunction with a standard 8259A-style interrupt controller. (Disabling the local APIC can be done in hardware for the Pentium processors or in software for the P6 family processors.)
The I/O APIC is responsible for receiving interrupts generated by I/O devices and distributing them among the local APICs by means of the APIC Bus. The I/O APIC manages interrupts using either static or dynamic distribution schemes. Dynamic distribution of interrupts allows routing of interrupts to the lowest priority processors. It also handles the distribution of interprocessor interrupts and system-wide control functions such as NMI, INIT, SMI and start-up-interprocessor interrupts. Individual pins on the I/O APIC can be programmed to generate a specific, prioritized interrupt vector when asserted. The I/O APIC also has a virtual wire mode that allows it to cooperate with an external 8259A in the system.
The APIC in the Pentium and P6 family processors is an architectural subset of the Intel 82489DX external APIC. The differences are described in Section 7.5.19., Software Visible Differences Between the Local APIC and the 82489DX.
The following sections focus on the local APIC and its implementation in the P6 family processors. Contact Intel for information on the I/O APIC.
[Figure 7-2 shows Processor #1, Processor #2, and Processor #3, each containing a CPU and a local APIC that receives local interrupts. The local APICs are connected by the APIC bus to the I/O APIC in the I/O chip set, which receives external interrupts.]
Figure 7-2. I/O APIC and Local APICs in Multiple-Processor Systems
7.5.1.
Presence of APIC
Beginning with the P6 family processors, the presence or absence of an on-chip APIC can be detected using the CPUID instruction. When the CPUID instruction is executed, bit 9 of the feature flags returned in the EDX register indicates the presence (set) or absence (clear) of an on-chip local APIC.
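The feature-flag test amounts to examining one bit. Python cannot execute CPUID directly, so this minimal sketch assumes the EDX feature flags have already been obtained by some other means; the function name and the sample values are hypothetical.

```python
APIC_FEATURE_BIT = 9  # bit 9 of the EDX feature flags, per the text above


def has_on_chip_apic(edx_feature_flags):
    """Return True if bit 9 of the given EDX feature-flag value is set."""
    return bool((edx_feature_flags >> APIC_FEATURE_BIT) & 1)
```

A value such as 0x200 (only bit 9 set) reports a local APIC as present, while 0x1FF (bits 0 through 8 set) does not.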
7.5.2.
Enabling or Disabling the Local APIC
For the P6 family processors, a flag (the E flag, bit 11) in the APIC_BASE_MSR register permits the local APIC to be explicitly enabled or disabled. Refer to Section 7.5.8., Relocation of the APIC Registers Base Address for a description of this flag. For the Pentium processor, the APICEN pin (which is shared with the PICD1 pin) is used during reset to enable or disable the local APIC.
7.5.3.
APIC Bus
All I/O APIC and local APICs communicate through the APIC bus (a 3-line inter-APIC bus). Two of the lines are open-drain (wired-OR) and are used for data transmission; the third line is a clock. The bus and its messages are invisible to software and are not classed as architectural (that is, the APIC bus and message format may change in future implementations without having any effect on software compatibility).
7.5.4.
Valid Interrupts
The local and I/O APICs support 240 distinct vectors in the range of 16 to 255. Interrupt priority is implied by its vector, according to the following relationship:
priority = vector / 16
Here the quotient is rounded down to the nearest integer. One is the lowest priority and 15 is the highest. Vectors 16 through 31 are reserved for exclusive use by the processor. The remaining vectors are for general use. The processor's local APIC includes an in-service entry and a holding entry for each priority level. To avoid losing interrupts, software should allocate no more than 2 interrupt vectors per priority.
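As a quick illustration of the vector-to-priority relationship, the following sketch computes the priority class with integer division. The function name is hypothetical, and rejecting vectors outside 16 to 255 is a choice made for this example only.

```python
def interrupt_priority(vector):
    """Return the priority (1 to 15) implied by an APIC interrupt vector."""
    if not 16 <= vector <= 255:
        raise ValueError("APIC vectors must be in the range 16 to 255")
    return vector // 16  # integer division: 16 consecutive vectors share a priority
```

Vectors 16 through 31 all map to priority 1, the lowest, and vectors 240 through 255 map to priority 15, the highest.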
7.5.5.
Interrupt Sources
The local APIC can receive interrupts from the following sources:
• Interrupt pins on the processor chip, driven by locally connected I/O devices.
• A bus message from the I/O APIC, originated by an I/O device connected to the I/O APIC.
• A bus message from another processor's local APIC, originated as an interprocessor interrupt.
• The local APIC's programmable timer or the error register, through the self-interrupt generating mechanism.
• Software, through the self-interrupt generating mechanism.
• (P6 family processors.) The performance-monitoring counters.
The local APIC services the I/O APIC and interprocessor interrupts according to the information included in the bus message (such as vector, trigger type, interrupt destination, etc.). Interpretation of the processor's interrupt pins and the timer-generated interrupts is programmable, by means of the local vector table (LVT). To generate an interprocessor interrupt, the source processor programs its interrupt command register (ICR). The programming of the ICR causes generation of a corresponding interrupt bus message. Refer to Section 7.5.11., Local Vector Table and Section 7.5.12., Interprocessor and Self-Interrupts for detailed information on programming the LVT and ICR, respectively.
7.5.6.
Bus Arbitration Overview
Being connected on a common bus (the APIC bus), the local and I/O APICs have to arbitrate for permission to send a message on the APIC bus. Logically, the APIC bus is a wired-OR connection, enabling more than one local APIC to send messages simultaneously. Each APIC issues its arbitration priority at the beginning of each message, and one winner is collectively selected following an arbitration round. At any given time, a local APIC's arbitration priority is a unique value from 0 to 15. The arbitration priority of each local APIC is dynamically modified after each successfully transmitted message to preserve fairness. Refer to Section 7.5.16., APIC Bus Arbitration Mechanism and Protocol for a detailed discussion of bus arbitration.
Section 7.5.3., APIC Bus describes the existing arbitration protocols and bus message formats, while Section 7.5.12., Interprocessor and Self-Interrupts describes the INIT level deassert message, used to resynchronize all local APICs' arbitration IDs. Note that except for startup (refer to Section 7.5.11., Local Vector Table), all bus messages failing during delivery are automatically retried. Software should avoid situations in which interrupt messages may be ignored by disabled or nonexistent target local APICs and are therefore resent repeatedly.
7.5.7.
The Local APIC Block Diagram
Figure 7-3 gives a functional block diagram for the local APIC. Software interacts with the local APIC by reading and writing its registers. The registers are memory-mapped to the processor's physical address space, and for each processor they have an identical address space of 4 KBytes starting at address FEE00000H. (Refer to Section 7.5.8., Relocation of the APIC Registers Base Address for information on relocating the APIC registers' base address for the P6 family processors.)
NOTE
For P6 family processors, the APIC handles all memory accesses to addresses within the 4-KByte APIC register space and no external bus cycles are produced. For the Pentium processors with an on-chip APIC, bus cycles are produced for accesses to the 4-KByte APIC register space. Thus, for software intended to run on Pentium processors, system software should explicitly not map the APIC register space to regular system memory. Doing so can result in an invalid opcode exception (#UD) being generated or unpredictable execution. The 4-KByte APIC register address space should be mapped as uncacheable (UC), refer to Section 9, Memory Cache Control, in Chapter 9, Memory Cache Control.
[Figure 7-3 is a functional block diagram of the local APIC. Blocks shown: DATA/ADDR interface; version register; timer (current count, initial count, and divide configuration registers); local vector table (timer, local interrupts 0 and 1, performance-monitoring counters*, error); EOI register; task priority register; prioritizer; TMR, ISR, and IRR registers (software transparent registers); interrupt command register; arbitration ID register; APIC ID, logical destination, and destination format registers; processor priority; vector decode; acceptance logic; and APIC bus send/receive logic connected to the APIC serial bus. Signals shown include INTA, EXTINT, INTR, LINT0/1, INIT, NMI, and SMI.
* Available only in P6 family processors.]
Figure 7-3. Local APIC Structure
Within the 4-KByte APIC register area, the register address allocation scheme is shown in Table 7-1. Register offsets are aligned on 128-bit boundaries. All registers must be accessed using 32-bit loads and stores. Wider registers (64-bit or 256-bit) are defined and accessed as independent multiple 32-bit registers. If a LOCK prefix is used with a MOV instruction that accesses the APIC address space, the prefix is ignored; that is, a locking operation does not take place.
Table 7-1. Local APIC Register Address Map

Address                            Register Name                           Software Read/Write
FEE0 0000H                         Reserved
FEE0 0010H                         Reserved
FEE0 0020H                         Local APIC ID Register                  Read/Write
FEE0 0030H                         Local APIC Version Register             Read only
FEE0 0040H                         Reserved
FEE0 0050H                         Reserved
FEE0 0060H                         Reserved
FEE0 0070H                         Reserved
FEE0 0080H                         Task Priority Register                  Read/Write
FEE0 0090H                         Arbitration Priority Register           Read only
FEE0 00A0H                         Processor Priority Register             Read only
FEE0 00B0H                         EOI Register                            Write only
FEE0 00C0H                         Reserved
FEE0 00D0H                         Logical Destination Register            Read/Write
FEE0 00E0H                         Destination Format Register             Bits 0-27 Read only. Bits 28-31 Read/Write
FEE0 00F0H                         Spurious-Interrupt Vector Register      Bits 0-3 Read only. Bits 4-9 Read/Write
FEE0 0100H through FEE0 0170H      ISR 0-255                               Read only
FEE0 0180H through FEE0 01F0H      TMR 0-255                               Read only
FEE0 0200H through FEE0 0270H      IRR 0-255                               Read only
FEE0 0280H                         Error Status Register                   Read only
FEE0 0290H through FEE0 02F0H      Reserved
FEE0 0300H                         Interrupt Command Reg. 0-31             Read/Write
FEE0 0310H                         Interrupt Command Reg. 32-63            Read/Write
FEE0 0320H                         Local Vector Table (Timer)              Read/Write
FEE0 0330H                         Reserved
FEE0 0340H                         Performance Counter LVT (Note 1)        Read/Write
FEE0 0350H                         Local Vector Table (LINT0)              Read/Write
FEE0 0360H                         Local Vector Table (LINT1)              Read/Write
FEE0 0370H                         Local Vector Table (Error) (Note 2)     Read/Write
FEE0 0380H                         Initial Count Register for Timer        Read/Write
FEE0 0390H                         Current Count Register for Timer        Read only
FEE0 03A0H through FEE0 03D0H      Reserved
FEE0 03E0H                         Timer Divide Configuration Register     Read/Write
FEE0 03F0H                         Reserved

NOTES:
1. Introduced into the APIC Architecture in the Pentium Pro processor.
2. Introduced into the APIC Architecture in the Pentium processor.
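Because every register sits at a fixed offset from the APIC base, an address is simply base plus offset. The sketch below copies a handful of offsets from Table 7-1 into a dictionary; the dictionary keys and the function name are hypothetical labels chosen for this example, and no memory is actually accessed.

```python
APIC_DEFAULT_BASE = 0xFEE00000  # default base address given in the text

# A few offsets taken from Table 7-1 (key names are illustrative only).
REGISTER_OFFSETS = {
    "APIC_ID": 0x020,
    "VERSION": 0x030,
    "TASK_PRIORITY": 0x080,
    "EOI": 0x0B0,
    "SPURIOUS_VECTOR": 0x0F0,
    "ICR_LOW": 0x300,
    "ICR_HIGH": 0x310,
    "LVT_TIMER": 0x320,
    "TIMER_INITIAL_COUNT": 0x380,
    "TIMER_CURRENT_COUNT": 0x390,
}


def register_address(name, base=APIC_DEFAULT_BASE):
    """Return the physical address of a local APIC register."""
    return base + REGISTER_OFFSETS[name]
```

Every offset in the table is a multiple of 16 bytes, which matches the statement that register offsets are aligned on 128-bit boundaries. Passing a different base models a relocated register block.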
7.5.8.
Relocation of the APIC Registers Base Address
The P6 family processors permit the starting address of the APIC registers to be relocated from FEE00000H to another physical address. This extension of the APIC architecture is provided to help resolve conflicts with memory maps of existing systems. The P6 family processors also provide the ability to enable or disable the local APIC.
An alternate APIC base address is specified through the APIC_BASE_MSR register. This MSR is located at MSR address 27 (1BH). Figure 7-4 shows the encoding of the bits in this register. This register also provides the flag for enabling or disabling the local APIC. The functions of the bits in the APIC_BASE_MSR register are as follows:
BSP flag, bit 8
Indicates if the processor is the bootstrap processor (BSP), determined during the MP initialization (refer to Section 7.7., Multiple-Processor (MP) Initialization Protocol). Following a power-up or reset, this flag is clear for all the processors in the system except the single BSP.
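[Figure 7-4 shows the layout of the APIC_BASE_MSR. Bits 63-36: reserved. Bits 35-12: APIC Base (base physical address). Bit 11: E (APIC enable/disable). Bits 10-9: reserved. Bit 8: BSP (processor is BSP). Bits 7-0: reserved.]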
Figure 7-4. APIC_BASE_MSR
E (APIC Enabled) flag, bit 11
Permits the local APIC to be enabled (set) or disabled (clear). Following a power-up or reset, this flag is set, enabling the local APIC. When this flag is clear, the processor is functionally equivalent to an Intel Architecture processor without an on-chip APIC (for example, an Intel486 processor). This flag is implementation dependent and is not guaranteed to be available or available at the same location in future Intel Architecture processors.
APIC Base field, bits 12 through 35
Specifies the base address of the APIC registers. This 24-bit value is extended by 12 bits at the low end to form the base address, which automatically aligns the address on a 4-KByte boundary. Following a power-up or reset, this field is set to FEE00000H.
Bits 0 through 7, bits 9 and 10, and bits 36 through 63 in the APIC_BASE_MSR register are reserved.
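The bit assignments described for the APIC_BASE_MSR can be unpacked with shifts and masks. This sketch assumes the 64-bit MSR value has already been read (for example by privileged code using RDMSR) and is passed in as a Python integer; the function name and dictionary keys are hypothetical.

```python
def decode_apic_base_msr(value):
    """Split an APIC_BASE_MSR value into the fields described in the text."""
    return {
        "bsp": bool((value >> 8) & 1),        # bit 8: processor is the BSP
        "enabled": bool((value >> 11) & 1),   # bit 11: local APIC enabled
        # Bits 12-35 hold a 24-bit field; shifting it back left by 12 bits
        # yields the 4-KByte-aligned base physical address.
        "base": ((value >> 12) & 0xFFFFFF) << 12,
    }
```

For instance, the value 0xFEE00900 decodes to an enabled APIC on the bootstrap processor with the default base address FEE00000H.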
7.5.9.
Interrupt Destination and APIC ID
The destination of an interrupt can be one, all, or a subset of the processors in the system. The sender specifies the destination of an interrupt in one of two destination modes: physical or logical.
7.5.9.1.
PHYSICAL DESTINATION MODE
In physical destination mode, the destination processor is specified by its local APIC ID. This ID is matched against the local APIC's actual physical ID, which is stored in the local APIC ID register (refer to Figure 7-5). Either a single destination (the ID is 0 through 14) or a broadcast to all (the ID is 15) can be specified in physical destination mode. Note that in this mode, up to 15 local APICs can be individually addressed. An ID of all 1s denotes a broadcast to all local APICs. The APIC ID register is loaded at power up by sampling configuration data that is driven onto pins of the processor. For the P6 family processors, pins A11# and A12# and pins BR0# through BR3# are sampled; for the Pentium processor, pins BE0# through BE3# are sampled. The ID portion can be read and modified by software.
[Figure 7-5 shows the layout of the local APIC ID register. Bits 31-28: reserved. Bits 27-24: APIC ID. Bits 23-0: reserved. Address: 0FEE0 0020H. Value after reset: 0000 0000H.]
Figure 7-5. Local APIC ID Register
7.5.9.2.
LOGICAL DESTINATION MODE
In logical destination mode, message destinations are specified using an 8-bit message destination address (MDA). The MDA is compared against the 8-bit logical APIC ID field of the APIC logical destination register (LDR), refer to Figure 7-6.
[Figure 7-6 shows the layout of the LDR. Bits 31-24: logical APIC ID. Bits 23-0: reserved. Address: 0FEE0 00D0H. Value after reset: 0000 0000H.]
Figure 7-6. Logical Destination Register (LDR)
The destination format register (DFR) defines the interpretation of the logical destination information (refer to Figure 7-7). The DFR can be programmed for flat model or cluster model interrupt delivery modes.
[Figure 7-7 shows the layout of the DFR. Bits 31-28: Model. Bits 27-0: reserved (all 1s). Address: 0FEE0 00E0H. Value after reset: FFFF FFFFH.]
Figure 7-7. Destination Format Register (DFR)
7.5.9.3.
FLAT MODEL
For the flat model, bits 28 through 31 of the DFR must be programmed to 1111. The MDA is interpreted as a decoded address. This scheme allows the specification of arbitrary groups of local APICs simply by setting each APIC's bit to 1 in the corresponding LDR. In the flat model, up to 8 local APICs can coexist in the system. Broadcast to all APICs is achieved by setting all 8 bits of the MDA to ones.
7.5.9.4.
CLUSTER MODEL
For the cluster model, the DFR bits 28 through 31 should be programmed to 0000. In this model, there are two basic connection schemes: flat cluster and hierarchical cluster. In the flat cluster connection model, all clusters are assumed to be connected on a single APIC bus. Bits 28 through 31 of the MDA contain the encoded address of the destination cluster. These bits are compared with bits 28 through 31 of the LDR to determine if the local APIC is part of the cluster. Bits 24 through 27 of the MDA are compared with bits 24 through 27 of the LDR to identify individual local APIC units within the cluster. Arbitrary sets of processors within a cluster can be specified by writing the target cluster address in bits 28 through 31 of the MDA and setting selected bits in bits 24 through 27 of the MDA, corresponding to the chosen members of the cluster. In this mode, 15 clusters (with cluster addresses of 0 through 14) each having 4 processors can be specified in the message. The APIC arbitration ID, however, supports only 15 agents, and hence the total number of processors supported in this mode is limited to 15.
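The two logical matching schemes can be contrasted in a few lines. This sketch treats the 8-bit MDA and the 8-bit logical APIC ID as plain integers, with the high nibble standing for bits 28 through 31 and the low nibble for bits 24 through 27 of the LDR. The function names are hypothetical, and the all-ones broadcast case of the cluster model is deliberately not handled.

```python
def flat_match(mda, logical_id):
    """Flat model: the MDA is a bit mask; any shared set bit is a match."""
    return (mda & logical_id & 0xFF) != 0


def cluster_match(mda, logical_id):
    """Flat cluster model: high nibble is an encoded cluster address,
    low nibble is a bit mask of members within that cluster."""
    same_cluster = (mda >> 4) & 0xF == (logical_id >> 4) & 0xF
    member_selected = (mda & logical_id & 0x0F) != 0
    return same_cluster and member_selected
```

In the flat model an MDA of 0b00000101 reaches the APICs whose logical IDs are 0b00000001 and 0b00000100. In the cluster model an MDA of 0x23 reaches members 0 and 1 of cluster 2 only.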
Broadcast to all local APICs is achieved by setting all destination bits to one. This guarantees a match on all clusters, and selects all APICs in each cluster. In the hierarchical cluster connection model, an arbitrary hierarchical network can be created by connecting different flat clusters via independent APIC buses. This scheme requires a cluster manager within each cluster, responsible for handling message passing between APIC buses. One cluster contains up to 4 agents. Thus 15 cluster managers, each with 4 agents, can form a network of up to 60 APIC agents. Note that hierarchical APIC networks require a special cluster manager device, which is not part of the local or the I/O APIC units.
7.5.9.5.
ARBITRATION PRIORITY
Each local APIC is given an arbitration priority of 0 to 15 upon reset. The I/O APIC uses this priority during arbitration rounds to determine which local APIC should be allowed to transmit a message on the APIC bus when multiple local APICs are issuing messages. The local APIC with the highest arbitration priority wins access to the APIC bus. Upon completion of an arbitration round, the winning local APIC lowers its arbitration priority to 0 and the losing local APICs each raise theirs by 1. In this manner, the I/O APIC distributes message bus-cycles among the contesting local APICs. The current arbitration priority for a local APIC is stored in a 4-bit, software-transparent arbitration ID (Arb ID) register. During reset, this register is initialized to the APIC ID number (stored in the local APIC ID register). The INIT-deassert command resynchronizes the arbitration priorities of the local APICs by resetting the Arb ID register of each agent to its current APIC ID value.
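A single arbitration round, as described above, can be sketched as follows. This is a simplified model under stated assumptions: only the APICs that contend in the round are updated, every contender other than the winner counts as a loser, and the names (arbitration_round, priorities, contenders) are hypothetical. The detailed protocol is in the section on APIC bus arbitration.

```python
def arbitration_round(priorities, contenders):
    """Return (winner, updated priorities) for one simplified round.

    priorities: dict mapping APIC name -> current arbitration priority
    contenders: APIC names that are trying to send a message this round
    """
    winner = max(contenders, key=lambda apic: priorities[apic])
    updated = dict(priorities)  # leave the caller's dictionary untouched
    for apic in contenders:
        # The winner drops to 0; each loser moves up by 1.
        updated[apic] = 0 if apic == winner else priorities[apic] + 1
    return winner, updated
```

Repeating the round with the same contenders rotates the winner, which is how bus cycles end up shared among the contesting local APICs.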
7.5.10. Interrupt Distribution Mechanisms

The APIC supports two mechanisms for selecting the destination processor for an interrupt: static and dynamic.
Static distribution is used to access a specific processor in the network. Using this mechanism, the interrupt is unconditionally delivered to all local APICs that match the destination information supplied with the interrupt. The following delivery modes fall into the static distribution category: fixed, SMI, NMI, EXTINT, and start-up.
Dynamic distribution assigns incoming interrupts to the lowest priority processor, which is generally the least busy processor. It can be programmed in the LVT for local interrupt delivery or the ICR for bus messages. Using dynamic distribution, only the lowest priority delivery mode is allowed. From all processors listed in the destination, the processor selected is the one whose current arbitration priority is the lowest. The latter is specified in the arbitration priority register (APR); refer to Section 7.5.13.4., Arbitration Priority Register (APR). If more than one processor shares the lowest priority, the processor with the highest arbitration priority (the unique value in the Arb ID register) is selected.
In lowest priority mode, if a focus processor exists, it may accept the interrupt, regardless of its priority. A processor is said to be the focus of an interrupt if it is currently servicing that interrupt or if it has a pending request for that interrupt.
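The dynamic (lowest priority) selection rule reduces to a two-key comparison: lowest APR value first, then highest Arb ID as the tie-breaker. The sketch below assumes each candidate is described by an (APR value, Arb ID) pair; the function name and data layout are hypothetical, and the focus-processor exception is not modelled.

```python
def select_lowest_priority(candidates):
    """Pick the destination among processors listed in the message.

    candidates: dict mapping processor name -> (apr_value, arb_id)
    """
    # Sort key: smallest APR wins; among equals, the largest Arb ID wins,
    # so the Arb ID is negated to fit a single min() call.
    return min(candidates, key=lambda name: (candidates[name][0], -candidates[name][1]))
```

With candidates p0 = (3, 0), p1 = (1, 1), and p2 = (1, 2), processors p1 and p2 share the lowest APR value, and p2 is selected because its Arb ID is higher.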
7.5.11. Local Vector Table

The local APIC contains a local vector table (LVT), specifying interrupt delivery and status information for the local interrupts. The information contained in this table includes the interrupt's associated vector, delivery mode, status bits and other data as shown in Figure 7-8. The LVT incorporates five 32-bit entries: one for the timer, one each for the two local interrupt (LINT0 and LINT1) pins, one for the error interrupt, and (in the P6 family processors) one for the performance-monitoring counter interrupt.
The fields in the LVT are as follows:
Vector
Interrupt vector number.
Delivery Mode
Defined only for local interrupt entries 1 and 2 and the performance-monitoring counter. The timer and the error status register (ESR) generate only edge triggered maskable hardware interrupts to the local processor. The delivery mode field does not exist for the timer and error interrupts. The performance-monitoring counter LVT may be programmed with a Delivery Mode equal to Fixed or NMI only. Note that certain delivery modes will only operate as intended when used in conjunction with a specific Trigger Mode. The allowable delivery modes are as follows:
000 (Fixed)
Delivers the interrupt, received on the local interrupt pin, to this processor as specified in the corresponding LVT entry. The trigger mode can be edge or level. Note, if the processor is not used in conjunction with an I/O APIC, the fixed delivery mode may be software programmed for an edge-triggered interrupt, but the P6 family processors implementation will always operate in a level-triggered mode.
100 (NMI)
Delivers the interrupt, received on the local interrupt pin, to this processor as an NMI interrupt. The vector information is ignored. The NMI interrupt is treated as edge-triggered, even if programmed otherwise. Note that the NMI may be masked. It is the software's responsibility to program the LVT mask bit according to the desired behavior of NMI.
111 (ExtINT)
Delivers the interrupt, received on the local interrupt pin, to this processor and responds as if the interrupt originated in an externally connected (8259A-compatible) interrupt controller. A special INTA bus cycle corresponding to ExtINT is routed to the external controller. The latter is expected to supply the vector information. When the delivery mode is ExtINT, the trigger mode is level-triggered, regardless of how the APIC triggering mode is programmed. The APIC architecture supports only one ExtINT source in a system, usually contained in the compatibility bridge.
[Figure 7-8 layout. Timer entry (address FEE0 0320H): bit 17 Timer Mode (0: One-shot, 1: Periodic); bit 16 Mask (0: Not Masked, 1: Masked); bit 12 Delivery Status (0: Idle, 1: Send Pending); bits 7:0 Vector. LINT0 entry (address FEE0 0350H) and LINT1 entry (address FEE0 0360H): bit 16 Mask; bit 15 Trigger Mode (0: Edge, 1: Level); bit 14 Remote IRR; bit 13 Interrupt Input Pin Polarity; bit 12 Delivery Status; bits 10:8 Delivery Mode (000: Fixed, 100: NMI, 111: ExtINT; all other combinations are reserved); bits 7:0 Vector. ERROR entry: address FEE0 0370H. PCINT entry: address FEE0 0340H. All other bits are reserved. Value after reset for each entry: 0001 0000H.]
Figure 7-8. Local Vector Table (LVT)
Delivery Status (read only): Holds the current status of interrupt delivery. Two states are defined:
0 (Idle): There is currently no activity for this interrupt, or the previous interrupt from this source has completed.
1 (Send Pending): Indicates that the interrupt transmission has started, but has not yet been completely accepted.
Interrupt Input Pin Polarity: Specifies the polarity of the corresponding interrupt pin: (0) active high or (1) active low.
Remote Interrupt Request Register (IRR) Bit: Used for level triggered interrupts only; its meaning is undefined for edge triggered interrupts. For level triggered interrupts, the bit is set when the logic of the local APIC accepts the interrupt. The remote IRR bit is reset when an EOI command is received from the processor.
Trigger Mode: Selects the trigger mode for the local interrupt pins when the delivery mode is Fixed: (0) edge sensitive and (1) level sensitive. When the delivery mode is NMI, the trigger mode is always edge sensitive; when the delivery mode is ExtINT, the trigger mode is always level sensitive. The timer and error interrupts are always treated as edge sensitive.
Mask: Interrupt mask: (0) enables reception of the interrupt and (1) inhibits reception of the interrupt.
Timer Mode: Selects the timer mode: (0) one-shot and (1) periodic (refer to Section 7.5.18., Timer).
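The LVT field layout can be made concrete with a short sketch that packs field values into a 32-bit entry. The function names are hypothetical, and the bit positions (vector in bits 7:0, delivery mode in bits 10:8, polarity in bit 13, trigger mode in bit 15, mask in bit 16, timer mode in bit 17) are an assumption taken from the bit numbering of Figure 7-8; the code only computes the value and does not touch hardware.

```python
# Delivery mode encodings allowed in an LVT entry (others are reserved).
FIXED, NMI, EXTINT = 0b000, 0b100, 0b111

def lvt_lint_entry(vector, delivery_mode=FIXED, level_triggered=False,
                   active_low=False, masked=False):
    """Compose a LINT0/LINT1 LVT entry value (illustrative helper)."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must fit in 8 bits")
    if delivery_mode not in (FIXED, NMI, EXTINT):
        raise ValueError("reserved delivery mode")
    return (vector
            | delivery_mode << 8          # bits 10:8
            | int(active_low) << 13       # pin polarity
            | int(level_triggered) << 15  # trigger mode
            | int(masked) << 16)          # mask

def lvt_timer_entry(vector, periodic=False, masked=False):
    """Compose the timer LVT entry value; it has no delivery mode field."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must fit in 8 bits")
    return vector | int(masked) << 16 | int(periodic) << 17
```

Under these assumptions, a masked entry with vector 0 reproduces the documented reset value 0001 0000H.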
7.5.12. Interprocessor and Self-Interrupts

A processor generates interprocessor interrupts by writing into the interrupt command register (ICR) of its local APIC (refer to Figure 7-9). The processor may use the ICR for self interrupts or for interrupting other processors (for example, to forward device interrupts originally accepted by it to other processors for service). In addition, special interprocessor interrupts (IPIs), such as the start-up IPI message, can only be delivered using the ICR mechanism. ICR-based interrupts are treated as edge triggered even if programmed otherwise. Note that not all combinations of options for ICR generated interrupts are valid (refer to Table 7-2).
[Figure 7-9 layout. Bits 63:56 Destination Field; bits 55:32 Reserved; bits 31:20 Reserved; bits 19:18 Destination Shorthand (00: Dest. Field, 01: Self, 10: All Incl. Self, 11: All Excl. Self); bits 17:16 Reserved; bit 15 Trigger Mode (0: Edge, 1: Level); bit 14 Level (0: De-assert, 1: Assert); bit 13 Reserved; bit 12 Delivery Status (0: Idle, 1: Send Pending); bit 11 Destination Mode (0: Physical, 1: Logical); bits 10:8 Delivery Mode (000: Fixed, 001: Lowest Priority, 010: SMI, 011: Reserved, 100: NMI, 101: INIT, 110: Start Up, 111: Reserved); bits 7:0 Vector. Address: FEE0 0310H. Value after reset: 0H.]
Figure 7-9. Interrupt Command Register (ICR)
All fields of the ICR are read-write by software with the exception of the delivery status field, which is read-only. Writing to the 32-bit word that contains the interrupt vector causes the interrupt message to be sent. The ICR consists of the following fields.
Vector: The vector identifying the interrupt being sent. The local APIC register addresses are summarized in Table 7-1.
Delivery Mode: Specifies how the APICs listed in the destination field should act upon reception of the interrupt. Note that all interprocessor interrupts behave as edge triggered interrupts (except for INIT level de-assert message) even if they are programmed as level triggered interrupts.
000 (Fixed): Deliver the interrupt to all processors listed in the destination field according to the information provided in the ICR. The fixed interrupt is treated as an edge-triggered interrupt even if programmed otherwise.
001 (Lowest Priority): Same as fixed mode, except that the interrupt is delivered to the processor executing at the lowest priority among the set of processors listed in the destination.
010 (SMI): Only the edge trigger mode is allowed. The vector field must be programmed to 00B.
011 (Reserved)
100 (NMI): Delivers the interrupt as an NMI interrupt to all processors listed in the destination field. The vector information is ignored. NMI is treated as an edge triggered interrupt even if programmed otherwise.
101 (INIT): Delivers the interrupt as an INIT signal to all processors listed in the destination field. As a result, all addressed APICs will assume their INIT state. As in the case of NMI, the vector information is ignored, and INIT is treated as an edge triggered interrupt even if programmed otherwise.
101 (INIT Level De-assert): (The trigger mode must also be set to 1 and level mode to 0.) Sends a synchronization message to all APIC agents to set their arbitration IDs to the values of their APIC IDs. Note that the INIT interrupt is sent to all agents, regardless of the destination field value. However, at least one valid destination processor should be specified. For future compatibility, the software is requested to use a broadcast-to-all (all-incl-self shorthand, as described below).
110 (Start-Up): Sends a special message between processors in a multiple-processor system. For details refer to the Pentium Pro Family Developer's Manual, Volume 1. The Vector information contains the start-up address for the multiple-processor boot-up protocol. Start-up is treated as an edge triggered interrupt even if programmed otherwise. Note that interrupts are not automatically retried by the source APIC upon failure in delivery of the message. It is up to the software to decide whether a retry is needed in the case of failure, and issue a retry message accordingly.
Destination Mode: Selects either (0) physical or (1) logical destination mode.
Delivery Status: Indicates the delivery status:
0 (Idle): There is currently no activity for this interrupt, or the previous interrupt from this source has completed.
1 (Send Pending): Indicates that the interrupt transmission has started, but has not yet been completely accepted.
Level: For INIT level de-assert delivery mode the level is 0. For all other modes the level is 1.
Trigger Mode: Used for the INIT level de-assert delivery mode only.
Destination Shorthand: Indicates whether a shorthand notation is used to specify the destination of the interrupt and, if so, which shorthand is used. Destination shorthands do not use the 8-bit destination field, and can be sent by software using a single write to the lower 32-bit part of the APIC interrupt command register. Shorthands are defined for the following cases: software self interrupt, interrupt to all processors in the system including the sender, interrupts to all processors in the system excluding the sender.
00 (destination field, no shorthand): The destination is specified in bits 56 through 63 of the ICR.
01 (self): The current APIC is the single destination of the interrupt. This is useful for software self interrupts. The destination field is ignored. Refer to Table 7-2 for description of supported modes. Note that self interrupts do not generate bus messages.
10 (all including self): The interrupt is sent to all processors in the system including the processor sending the interrupt. The APIC will broadcast a message with the destination field set to FH. Refer to Table 7-2 for description of supported modes.
11 (all excluding self): The interrupt is sent to all processors in the system with the exception of the processor sending the interrupt. The APIC will broadcast a message using the physical destination mode and destination field set to FH.
Destination: This field is only used when the destination shorthand field is set to dest field. If the destination mode is physical, then bits 56 through 59 contain the APIC ID. In logical destination mode, the interpretation of the 8-bit destination field depends on the DFR and LDR of the local APIC Units.
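As an illustration of the ICR fields described above, the following sketch packs them into the two 32-bit halves of the register. The function name and parameter names are hypothetical, and the bit positions are an assumption based on the bit numbering of Figure 7-9. Because writing the word that contains the vector sends the message, software would normally write the upper half (destination) first and the lower half last.

```python
def icr_words(vector, delivery_mode=0b000, logical=False, level_assert=True,
              level_triggered=False, shorthand=0b00, destination=0):
    """Return (high, low) 32-bit ICR values (illustrative helper only)."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must fit in 8 bits")
    if not 0 <= delivery_mode <= 0b111 or not 0 <= shorthand <= 0b11:
        raise ValueError("field value out of range")
    if not 0 <= destination <= 0xFF:
        raise ValueError("destination must fit in 8 bits")
    low = (vector
           | delivery_mode << 8          # bits 10:8
           | int(logical) << 11          # destination mode
           | int(level_assert) << 14     # level: 1 = assert
           | int(level_triggered) << 15  # trigger mode
           | shorthand << 18)            # bits 19:18
    high = destination << 24             # ICR bits 63:56
    return high, low
```

For instance, an INIT level de-assert broadcast with the all-including-self shorthand would use delivery mode 101, level 0 and trigger mode 1.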
Table 7-2 shows the valid combinations for the fields in the interrupt command register (ICR).
Table 7-2. Valid Combinations for the APIC Interrupt Command Register
Destination Shorthand   Valid/Invalid      Trigger Mode   Delivery Mode                                      Destination Mode
Dest. Field             Valid              Edge           Fixed, Lowest Priority, NMI, SMI, INIT, Start-Up   Physical or Logical
Dest. Field             Note 1             Level          Fixed, Lowest Priority, NMI                        Physical or Logical
Dest. Field             Note 2             Level          INIT                                               Physical or Logical
x (Note 4)              Invalid (Note 3)   Level          SMI, Start-Up                                      x
Self                    Valid              Edge           Fixed                                              x
Self                    Note 1             Level          Fixed                                              x
Self                    Invalid (Note 3)   x              Lowest Priority, NMI, INIT, SMI, Start-Up          x
All inc Self            Valid              Edge           Fixed                                              x
All inc Self            Note 1             Level          Fixed                                              x
All inc Self            Invalid (Note 3)   x              Lowest Priority, NMI, INIT, SMI, Start-Up          x
All excl Self           Valid              Edge           Fixed, Lowest Priority, NMI, INIT, SMI, Start-Up   x
All excl Self           Note 1             Level          Fixed, Lowest Priority, NMI                        x
All excl Self           Invalid (Note 3)   Level          SMI, Start-Up                                      x
All excl Self           Note 2             Level          INIT                                               x
NOTES:
1. Valid. Treated as edge triggered if Level = 1 (assert), otherwise ignored.
2. Valid. Treated as edge triggered when Level = 1 (assert); when Level = 0 (deassert), treated as INIT Level Deassert message. Only INIT level deassert messages are allowed to have level = deassert. For all other messages the level must be assert.
3. Invalid. The behavior of the APIC is undefined.
4. x = Don't care.
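The rows of Table 7-2 reduce to a few rules, which the following sketch encodes as a lookup. The function name and the string labels for shorthands, trigger modes and results are hypothetical; the logic is one reading of the table and assumes the delivery mode names are spelled as in the table.

```python
DELIVERY_MODES = ("Fixed", "Lowest Priority", "NMI", "SMI", "INIT", "Start-Up")

def classify_icr(shorthand, trigger, mode):
    """Classify an ICR combination per Table 7-2 (illustrative helper).

    shorthand: "dest", "self", "all_incl" or "all_excl"
    trigger:   "edge" or "level"
    Returns "valid", "note 1", "note 2" or "invalid".
    """
    if shorthand not in ("dest", "self", "all_incl", "all_excl"):
        raise ValueError("unknown shorthand")
    if trigger not in ("edge", "level") or mode not in DELIVERY_MODES:
        raise ValueError("unknown trigger or delivery mode")
    if shorthand in ("self", "all_incl"):
        # Only Fixed delivery is defined for these two shorthands.
        if mode != "Fixed":
            return "invalid"
        return "valid" if trigger == "edge" else "note 1"
    # Destination field and all-excluding-self follow the same rows.
    if trigger == "edge":
        return "valid"
    if mode in ("SMI", "Start-Up"):
        return "invalid"
    return "note 2" if mode == "INIT" else "note 1"
```

A caller could use such a check to reject combinations whose behavior the table marks as undefined before composing an ICR value.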
7.5.13. Interrupt Acceptance

Three 256-bit read-only registers (the IRR, ISR, and TMR registers) are involved in the interrupt acceptance logic (refer to Figure 7-10). The 256 bits represent the 256 possible vectors. Because vectors 0 through 15 are reserved, so are bits 0 through 15 in these registers. The functions of the three registers are as follows:
TMR (trigger mode register): Upon acceptance of an interrupt, the corresponding TMR bit is cleared for edge triggered interrupts and set for level interrupts. If the TMR bit is set, the local APIC sends an EOI message to all I/O APICs as a result of software issuing an EOI command (refer to Section 7.5.13.6., End-Of-Interrupt (EOI) for a description of the EOI register).
[Figure 7-10 layout. Each register spans bits 255:0; bits 15:0 are reserved. Addresses: IRR FEE0 0200H - FEE0 0270H; ISR FEE0 0100H - FEE0 0170H; TMR FEE0 0180H - FEE0 01F0H. Value after reset: 0H.]
Figure 7-10. IRR, ISR and TMR Registers
IRR (interrupt request register): Contains the active interrupt requests that have been accepted, but not yet dispensed by the current local APIC. A bit in IRR is set when the APIC accepts the interrupt. The IRR bit is cleared, and a corresponding ISR bit is set when the INTA cycle is issued.
ISR (in-service register): Marks the interrupts that have been delivered to the processor, but have not been fully serviced yet, as an EOI has not yet been received from the processor. The ISR reflects the current state of the processor interrupt queue. The ISR bit for the highest priority IRR is set during the INTA cycle. During the EOI cycle, the highest priority ISR bit is cleared, and if the corresponding TMR bit was set, an EOI message is sent to all I/O APICs.
7.5.13.1. INTERRUPT ACCEPTANCE DECISION FLOW CHART
The process that the APIC uses to accept an interrupt is shown in the flow chart in Figure 7-11. The response of the local APIC to the start-up IPI is explained in the Pentium Pro Family Developer's Manual, Volume 1.
[Figure 7-11 flow chart. Nodes in reading order: Wait to Receive Bus Message; Belong to Destination? (No: Discard Message); Is it NMI/SMI/INIT/ExtINT? (Yes: Accept Message); Delivery Mode? (Fixed or Lowest Priority). Fixed branch nodes: Is Interrupt Slot Available?; Is Status a Retry?; Set Status to Retry; Discard Message; Accept Message. Lowest Priority branch nodes: Am I Focus? (Yes: Accept Message); Other Focus? (Yes: Discard Message); Is Interrupt Slot Available? (No: Set Status to Retry); Arbitrate; Am I Winner? (Yes: Accept Message).]
Figure 7-11. Interrupt Acceptance Flow Chart for the Local APIC
7.5.13.2. TASK PRIORITY REGISTER
The task priority register (TPR) provides a priority threshold mechanism for interrupting the processor (refer to Figure 7-12). Only interrupts whose priority is higher than that specified in the TPR will be serviced. Other interrupts are recorded and are serviced as soon as the TPR value is decreased enough to allow that. This enables the operating system to temporarily block specific interrupts (generally low priority) from disturbing the execution of high-priority tasks. The priority threshold mechanism is not applicable for delivery modes excluding the vector information (that is, for ExtINT, NMI, SMI, INIT, INIT-Deassert, and Start-Up delivery modes).
[Figure 7-12 layout. Bits 31:8 Reserved; bits 7:0 Task Priority. Address: FEE0 0080H. Value after reset: 0H.]
Figure 7-12. Task Priority Register (TPR)
The Task Priority is specified in the TPR. The 4 most-significant bits of the task priority correspond to the 16 interrupt priorities, while the 4 least-significant bits correspond to the sub-class priority. The TPR value is generally denoted as x:y, where x is the main priority and y provides more precision within a given priority class. When the x-value of the TPR is 15, the APIC will not accept any interrupts.
7.5.13.3. PROCESSOR PRIORITY REGISTER (PPR)
The processor priority register (PPR) is used to determine whether a pending interrupt can be dispensed to the processor. Its value is computed as follows:
IF TPR[7:4] >= ISRV[7:4]
THEN PPR[7:0] = TPR[7:0]
ELSE PPR[7:4] = ISRV[7:4] AND PPR[3:0] = 0
Where ISRV is the vector of the highest priority ISR bit set, or zero if no ISR bit is set. The PPR format is identical to that of the TPR. The PPR address is FEE0 00A0H, and its value after reset is zero.
7.5.13.4. ARBITRATION PRIORITY REGISTER (APR)
The arbitration priority register (APR) holds the current lowest priority of the processor, a value used during lowest priority arbitration (refer to Section 7.5.16., APIC Bus Arbitration Mechanism and Protocol). The APR format is identical to that of the TPR. The APR value is computed as follows:
IF (TPR[7:4] >= IRRV[7:4]) AND (TPR[7:4] > ISRV[7:4])
THEN APR[7:0] = TPR[7:0]
ELSE APR[7:4] = max(TPR[7:4] AND ISRV[7:4], IRRV[7:4]), APR[3:0] = 0.
Here, IRRV is the interrupt vector with the highest priority IRR bit set, or zero if no IRR bit is set. The APR address is FEE0 0090H, and its value after reset is 0.
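The PPR rule from Section 7.5.13.3 can be sketched in a few lines. The function names are hypothetical. The dispensing test in can_dispense (an interrupt is dispensed only if its priority class, vector bits 7:4, exceeds PPR bits 7:4) is an assumption added for illustration; the text above only states that the PPR is used for this decision.

```python
def compute_ppr(tpr, isrv):
    """Processor priority from the TPR and the highest in-service vector.

    isrv is the vector of the highest priority ISR bit set, or 0 if none.
    """
    if (tpr >> 4) >= (isrv >> 4):
        return tpr & 0xFF          # PPR[7:0] = TPR[7:0]
    return isrv & 0xF0             # PPR[7:4] = ISRV[7:4], PPR[3:0] = 0

def can_dispense(vector, ppr):
    """Assumed rule: the vector's priority class must exceed PPR[7:4]."""
    return (vector >> 4) > (ppr >> 4)
```

With a TPR of 20H and vector 51H in service, the PPR becomes 50H, so a pending vector 61H could be dispensed while 45H would wait.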
7.5.13.5. SPURIOUS INTERRUPT
A special situation may occur when a processor raises its task priority to be greater than or equal to the level of the interrupt for which the processor INTR signal is currently being asserted. If at the time the INTA cycle is issued, the interrupt that was to be dispensed has become masked (programmed by software), the local APIC will return a spurious-interrupt vector to the processor. Dispensing the spurious-interrupt vector does not affect the ISR, so the handler for this vector should return without an EOI.
7.5.13.6. END-OF-INTERRUPT (EOI)
During the interrupt service routine, software should indicate acceptance of lowest-priority, fixed, timer, and error interrupts by writing an arbitrary value into its local APIC end-of-interrupt (EOI) register (refer to Figure 7-13). This indicates to the local APIC that it can issue the next interrupt, regardless of whether the current interrupt service has been terminated or not. Note that interrupts whose priority is higher than that currently in service do not wait for the EOI command corresponding to the interrupt in service.
[Figure 7-13 layout. 32-bit register (bits 31:0). Address: FEE0 00B0H. Value after reset: 0H.]
Figure 7-13. EOI Register
Upon receiving end-of-interrupt, the APIC clears the highest priority bit in the ISR and selects the next highest priority interrupt for posting to the CPU. If the terminated interrupt was a level-triggered interrupt, the local APIC sends an end-of-interrupt message to all I/O APICs. Note that the EOI command is supplied for the above two interrupt delivery modes regardless of the interrupt source (that is, as a result of either the I/O APIC interrupts or those issued on local pins or using the ICR). For future compatibility, the software is requested to issue the end-of-interrupt command by writing a value of 0H into the EOI register.
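The bookkeeping that Sections 7.5.13 and 7.5.13.6 describe for the IRR, ISR and TMR can be modeled in software. The class below is a simplified sketch with hypothetical names: it ignores the TPR/PPR threshold and interrupt slots, and it assumes that the highest set bit number is the highest priority vector.

```python
class LocalApicModel:
    """Toy model of IRR/ISR/TMR updates; each register is a 256-bit int."""

    def __init__(self):
        self.irr = 0
        self.isr = 0
        self.tmr = 0

    def accept(self, vector, level_triggered=False):
        """APIC accepts an interrupt: set IRR bit, record trigger mode."""
        if not 16 <= vector <= 255:
            raise ValueError("vectors 0 through 15 are reserved")
        bit = 1 << vector
        self.irr |= bit
        if level_triggered:
            self.tmr |= bit
        else:
            self.tmr &= ~bit

    def inta(self):
        """INTA cycle: move the highest pending vector from IRR to ISR."""
        if not self.irr:
            return None
        vector = self.irr.bit_length() - 1
        self.irr &= ~(1 << vector)
        self.isr |= 1 << vector
        return vector

    def eoi(self):
        """EOI: clear the highest ISR bit.

        Returns (vector, send_eoi_to_io_apics), or None if nothing is
        in service. The flag is True when the TMR bit was set.
        """
        if not self.isr:
            return None
        vector = self.isr.bit_length() - 1
        self.isr &= ~(1 << vector)
        return vector, bool((self.tmr >> vector) & 1)
```

In this model, a level-triggered vector 52H accepted alongside an edge-triggered vector 31H is dispensed first, and its EOI is the only one that would be forwarded to the I/O APICs.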
7.5.14. Local APIC State

In P6 family processors, all local APICs are initialized in a software-disabled state after power-up. A software-disabled local APIC unit responds only to self-interrupts and to INIT, NMI, SMI, and start-up messages arriving on the APIC Bus. The operation of local APICs during the disabled state is as follows:
• For the INIT, NMI, SMI, and start-up messages, the APIC behaves normally, as if fully enabled.
• Pending interrupts in the IRR and ISR registers are held and require masking or handling by the CPU.
• A disabled local APIC does not affect the sending of APIC messages. It is software's responsibility to avoid issuing ICR commands if no sending of interrupts is desired.
• Disabling a local APIC does not affect the message in progress. The local APIC will complete the reception/transmission of the current message and then enter the disabled state.
• A disabled local APIC automatically sets all mask bits in the LVT entries. Attempts to reset these bits in the local vector table are ignored.
• A software-disabled local APIC listens to all bus messages in order to keep its arbitration ID synchronized with the rest of the system, in the event that it is re-enabled.
For the Pentium processor, the local APIC is enabled and disabled through a hardware mechanism. (Refer to the Pentium Processor Data Book for a description of this mechanism.)
7.5.14.1. SPURIOUS-INTERRUPT VECTOR REGISTER
Software can enable or disable a local APIC at any time by programming bit 8 of the spurious-interrupt vector register (SVR); refer to Figure 7-14. The functions of the fields in the SVR are as follows:
[Figure 7-14 layout. Bits 31:10 Reserved; bit 9 Focus Processor Checking (0: Enabled, 1: Disabled); bit 8 APIC Enabled (0: APIC SW Disabled, 1: APIC SW Enabled); bits 7:4 Spurious Vector (programmable); bits 3:0 hardwired to 1111. Address: FEE0 00F0H. Value after reset: 0000 00FFH.]
Figure 7-14. Spurious-Interrupt Vector Register (SVR)
Spurious Vector: Released during an INTA cycle when all pending interrupts are masked or when no interrupt is pending. Bits 4 through 7 of this field are programmable by software, and bits 0 through 3 are hardwired to logical ones. Software writes to bits 0 through 3 have no effect.
APIC Enable: Allows software to enable (1) or disable (0) the local APIC. To bypass the APIC completely, use the APIC_BASE_MSR in Figure 7-4.
Focus Processor Checking: Determines if focus processor checking is enabled during lowest priority delivery: (0) enabled and (1) disabled.
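A short sketch shows how the SVR fields combine into a register value. The function name and parameters are hypothetical, and the bit positions (enable in bit 8, focus processor checking in bit 9) are taken from the field descriptions and Figure 7-14; the helper only computes a value.

```python
def svr_value(vector_high_nibble, apic_enabled=True, focus_checking=True):
    """Compose a spurious-interrupt vector register value (illustrative)."""
    if not 0 <= vector_high_nibble <= 0xF:
        raise ValueError("only bits 7:4 of the vector are programmable")
    vector = vector_high_nibble << 4 | 0xF      # bits 3:0 are hardwired to 1
    return (vector
            | int(apic_enabled) << 8            # 1 = software enabled
            | int(not focus_checking) << 9)     # 0 = checking enabled
```

With the high nibble set to FH and the APIC disabled, the helper reproduces the documented reset value 0000 00FFH.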
7.5.14.2. LOCAL APIC INITIALIZATION
On a hardware reset, the processor and its local APIC are initialized simultaneously. For the P6 family processors, the local APIC obtains its initial physical ID from system hardware at the falling edge of the RESET# signal by sampling 6 lines on the system bus (the BR[3:0] and cluster ID[1:0] lines) and storing this value into the APIC ID register; for the Pentium processor, four lines are sampled (BE0# through BE3#). Refer to the Pentium Pro & Pentium II Processors Data Book and the Pentium Processor Data Book for descriptions of this mechanism.
7.5.14.3. LOCAL APIC STATE AFTER POWER-UP RESET
The state of local APIC registers and state machines after a power-up reset is as follows:
• The following registers are all reset to 0: the IRR, ISR, TMR, ICR, LDR, and TPR registers; the holding registers; the timer initial count and timer current count registers; the remote register; and the divide configuration register.
• The DFR register is reset to all 1s.
• The LVT register entries are reset to 0 except for the mask bits, which are set to 1s.
• The local APIC version register is not affected.
• The local APIC ID and Arb ID registers are loaded from processor input pins (the Arb ID register is set to the APIC ID value for the local APIC).
• All internal state machines are reset.
• APIC is software disabled (that is, bit 8 of the SVR register is set to 0).
• The spurious-interrupt vector register is initialized to FFH.
7.5.14.4. LOCAL APIC STATE AFTER AN INIT RESET
An INIT reset of the processor can be initiated in either of two ways:
• By asserting the processor's INIT# pin.
• By sending the processor an INIT IPI (sending an APIC bus-based interrupt with the delivery mode set to INIT).
Upon receiving an INIT via either of these two mechanisms, the processor responds by beginning the initialization process of the processor core and the local APIC. The state of the local APIC following an INIT reset is the same as it is after a power-up reset, except that the APIC ID and Arb ID registers are not affected.
7.5.14.5. LOCAL APIC STATE AFTER INIT-DEASSERT MESSAGE
An INIT-deassert message has no effect on the state of the APIC, other than to reload the arbitration ID register with the value in the APIC ID register.
7.5.15. Local APIC Version Register

The local APIC contains a hardwired version register, which software can use to identify the APIC version (refer to Figure 7-15). In addition, the version register specifies the size of the LVT used in the specific implementation. The fields in the local APIC version register are as follows:
Version: The version numbers of the local APIC or an external 82489DX APIC controller:
1XH: Local APIC.
0XH: 82489DX.
20H through FFH: Reserved.
Max LVT Entry: Shows the number of the highest order LVT entry. For the P6 family processors, having 5 LVT entries, the Max LVT number is 4; for the Pentium processor, having 4 LVT entries, the Max LVT number is 3.
[Figure 7-15 layout. Bits 31:24 Reserved; bits 23:16 Max. LVT Entry; bits 15:8 Reserved; bits 7:0 Version. Value after reset: 000N 00VVH (V = Version, N = # of LVT entries). Address: FEE0 0030H.]
Figure 7-15. Local APIC Version Register
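Decoding the version register is a matter of extracting two byte fields. The sketch below uses a hypothetical function name and return layout, and assumes the Version field occupies bits 7:0 and the Max LVT Entry field bits 23:16, as in the figure.

```python
def decode_version(value):
    """Split a local APIC version register value into its fields."""
    version = value & 0xFF
    max_lvt = (value >> 16) & 0xFF
    family = version >> 4
    if family == 0x1:
        kind = "local APIC"
    elif family == 0x0:
        kind = "82489DX"
    else:
        kind = "reserved"          # 20H through FFH
    # Max LVT Entry is the highest entry number, so entries = value + 1.
    return {"version": version, "kind": kind, "lvt_entries": max_lvt + 1}
```

A value such as 0004 0011H would therefore decode as an integrated local APIC with five LVT entries, matching the P6 family description above.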
7.5.16. APIC Bus Arbitration Mechanism and Protocol

Because only one message can be sent at a time on the APIC bus, the I/O APIC and local APICs employ a rotating priority arbitration protocol to gain permission to send a message on the APIC bus. One or more APICs may start sending their messages simultaneously. At the beginning of every message, each APIC presents the type of the message it is sending and its current arbitration priority on the APIC bus. This information is used for arbitration. After each arbitration cycle (within an arbitration round), only the potential winners keep driving the bus. By the time all arbitration cycles are completed, there will be only one APIC left driving the bus. Once a winner is selected, it is granted exclusive use of the bus, and will continue driving the bus to send its actual message.
After each successfully transmitted message, all APICs increase their arbitration priority by 1. The previous winner (that is, the one that has just successfully transmitted its message) assumes a priority of 0 (lowest). An agent whose arbitration priority was 15 (highest) during arbitration, but did not send a message, adopts the previous winner's arbitration priority, incremented by 1.
Note that the arbitration protocol described above is slightly different if one of the APICs issues a special End-Of-Interrupt (EOI). This high-priority message is granted the bus regardless of its sender's arbitration priority, unless more than one APIC issues an EOI message simultaneously. In the latter case, the APICs sending the EOI messages arbitrate using their arbitration priorities.
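The rotation of arbitration priorities after a successful transmission can be sketched as follows. The function name and the dictionary of agents are hypothetical; the model covers only the normal (non-EOI) case described above.

```python
def rotate_arbitration(priorities, winner):
    """Return updated arbitration priorities after `winner` has sent.

    priorities: dict mapping an agent name to its unique priority (0..15).
    """
    winner_old = priorities[winner]
    updated = {}
    for agent, prio in priorities.items():
        if agent == winner:
            updated[agent] = 0                 # winner drops to lowest
        elif prio == 15:
            updated[agent] = winner_old + 1    # adopts winner's old value + 1
        else:
            updated[agent] = prio + 1          # everyone else moves up by 1
    return updated
```

The update keeps the priorities unique: the value the winner would have reached by incrementing is taken over by the agent that was at 15.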
If the APICs are set up to use lowest priority arbitration (refer to Section 7.5.10., Interrupt Distribution Mechanisms) and multiple APICs are currently executing at the lowest priority (the value in the APR register), the arbitration priorities (unique values in the Arb ID register) are used to break ties. All 8 bits of the APR are used for the lowest priority arbitration.
7.5.16.1. BUS MESSAGE FORMATS
The APICs use three types of messages: EOI message, short message, and non-focused lowest priority message. The purpose of each type of message and its format are described below.
EOI Message. Local APICs send 14-cycle EOI messages to the I/O APIC to indicate that a level triggered interrupt has been accepted by the processor. This interrupt, in turn, is a result of software writing into the EOI register of the local APIC. Table 7-3 shows the cycles in an EOI message. The checksum is computed for cycles 6 through 9. It is a cumulative sum of the 2-bit (Bit1:Bit0) logical data values. The carry out of all but the last addition is added to the sum. If any APIC computes a different checksum than the one appearing on the bus in cycle 10, it signals an error, driving 11 on the APIC bus during cycle 12. In this case, the APICs disregard the message. The sending APIC will receive an appropriate error indication (refer to Section 7.5.17., Error Handling) and resend the message. The status cycles are defined in Table 7-6.
Short Message. Short messages (21-cycles) are used for sending fixed, NMI, SMI, INIT, start-up, ExtINT and lowest-priority-with-focus interrupts. Table 7-4 shows the cycles in a short message.
Table 7-3. EOI Message (14 Cycles)
Cycle   Bit1     Bit0   Description
1       1        1      11 = EOI
2       ArbID3   0      Arbitration ID bits 3 through 0
3       ArbID2   0
4       ArbID1   0
5       ArbID0   0
6       V7       V6     Interrupt vector V7 - V0
7       V5       V4
8       V3       V2
9       V1       V0
10      C        C      Checksum for cycles 6 - 9
11      0        0
12      A        A      Status Cycle 0
13      A1       A1     Status Cycle 1
14      0        0      Idle
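The checksum rule for the EOI message can be illustrated with a small sketch. This is one reading of the description (a running 2-bit sum in which the carry out of every addition except the last is added back in); the function names are hypothetical, and the exact hardware behavior should be checked against the bus specification.

```python
def vector_cycles(vector):
    """Split an 8-bit vector into the 2-bit values of cycles 6 through 9."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must fit in 8 bits")
    return [(vector >> shift) & 0b11 for shift in (6, 4, 2, 0)]

def apic_checksum(values):
    """Cumulative 2-bit sum with carry wrap-around on all but the last add."""
    total = 0
    last = len(values) - 1
    for i, value in enumerate(values):
        total += value & 0b11
        carry = total >> 2
        total &= 0b11
        if carry and i != last:
            total += 1      # at most 2 + 1, so it cannot overflow again
    return total
```

A receiver would compare the result of apic_checksum(vector_cycles(v)) with the 2-bit value seen in cycle 10 and signal an error on a mismatch.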
If the physical delivery mode is being used, then cycles 15 and 16 represent the APIC ID and cycles 13 and 14 are considered don't care by the receiver. If the logical delivery mode is being used, then cycles 13 through 16 are the 8-bit logical destination field. For shorthands of all-incl-self and all-excl-self, the physical delivery mode and an arbitration priority of 15 (D0:D3 = 1111) are used. The agent sending the message is the only one required to distinguish between the two cases. It does so using internal information.
When using lowest priority delivery with an existing focus processor, the focus processor identifies itself by driving 10 during cycle 19 and accepts the interrupt. This is an indication to other APICs to terminate arbitration. If the focus processor has not been found, the short message is extended on-the-fly to the non-focused lowest-priority message. Note that except for the EOI message, messages generating a checksum or an acceptance error (refer to Section 7.5.17., Error Handling) terminate after cycle 21.
Table 7-4. Short Message (21 Cycles)
Cycle   Bit1     Bit0   Description
1       0        1      0 1 = normal
2       ArbID3   0      Arbitration ID bits 3 through 0
3       ArbID2   0
4       ArbID1   0
5       ArbID0   0
6       DM       M2     DM = Destination Mode
7       M1       M0     M2-M0 = Delivery mode
8       L        TM     L = Level, TM = Trigger Mode
9       V7       V6     V7-V0 = Interrupt Vector
10      V5       V4
11      V3       V2
12      V1       V0
13      D7       D6     D7-D0 = Destination
14      D5       D4
15      D3       D2
16      D1       D0
17      C        C      Checksum for cycles 6-16
18      0        0
19      A        A      Status cycle 0
20      A1       A1     Status cycle 1
21      0        0      Idle
Nonfocused Lowest Priority Message. These 34-cycle messages (refer to Table 7-5) are used in the lowest priority delivery mode when a focus processor is not present. Cycles 1 through 20 are the same as for the short message. If during the status cycle (cycle 19) the state of the (A:A) flags is 10B, a focus processor has been identified, and the short message format is used (refer to Table 7-4). If the (A:A) flags are set to 00B, lowest priority arbitration is started and the 34 cycles of the nonfocused lowest priority message are completed. For other combinations of status flags, refer to Section 7.5.16.2., APIC Bus Status Cycles.
Table 7-5. Nonfocused Lowest Priority Message (34 Cycles)
Cycle   Bit1     Bit0   Description
1       0        1      0 1 = normal
2       ArbID3   0      Arbitration ID bits 3 through 0
3       ArbID2   0
4       ArbID1   0
5       ArbID0   0
6       DM       M2     DM = Destination mode
7       M1       M0     M2-M0 = Delivery mode
8       L        TM     L = Level, TM = Trigger Mode
9       V7       V6     V7-V0 = Interrupt Vector
10      V5       V4
11      V3       V2
12      V1       V0
13      D7       D6     D7-D0 = Destination
14      D5       D4
15      D3       D2
16      D1       D0
17      C        C      Checksum for cycles 6-16
18      0        0
19      A        A      Status cycle 0
20      A1       A1     Status cycle 1
21      P7       0      P7 - P0 = Inverted Processor Priority
22      P6       0
23      P5       0
24      P4       0
25      P3       0
26      P2       0
27      P1       0
28      P0       0
29      ArbID3   0      Arbitration ID 3 - 0
30      ArbID2   0
31      ArbID1   0
32      ArbID0   0
33      A2       A2     Status Cycle
34      0        0      Idle
Cycles 21 through 28 are used to arbitrate for the lowest priority processor. The processors participating in the arbitration drive their inverted processor priority on the bus. Only the local APICs having free interrupt slots participate in the lowest priority arbitration. If no such APIC exists, the message will be rejected, requiring it to be tried at a later time. Cycles 29 through 32 are also used for arbitration in case two or more processors have the same lowest priority. In the lowest priority delivery mode, all combinations of errors in cycle 33 (A2:A2) will set the accept error bit in the error status register (refer to Figure 7-16). Arbitration priority update is performed in cycle 20, and is not affected by errors detected in cycle 33. Only the local APIC that wins in the lowest priority arbitration drives cycle 33. An error in cycle 33 will force the sender to resend the message.
7.5.16.2. APIC BUS STATUS CYCLES
Certain cycles within an APIC bus message are status cycles. During these cycles the status flags (A:A) and (A1:A1) are examined. Table 7-6 shows how these status flags are interpreted, depending on the current delivery mode and existence of a focus processor.
Table 7-6. APIC Bus Status Cycles Interpretation
Delivery Mode                      A Status              A1 Status          A2 Status    Update ArbID and Cycle#   Message Length   Retry
EOI                                00: CS_OK             10: Accept         XX:          Yes, 13                   14 Cycle         No
EOI                                00: CS_OK             11: Retry          XX:          Yes, 13                   14 Cycle         Yes
EOI                                00: CS_OK             0X: Accept Error   XX:          No                        14 Cycle         Yes
EOI                                11: CS_Error          XX:                XX:          No                        14 Cycle         Yes
EOI                                10: Error             XX:                XX:          No                        14 Cycle         Yes
EOI                                01: Error             XX:                XX:          No                        14 Cycle         Yes
Fixed                              00: CS_OK             10: Accept         XX:          Yes, 20                   21 Cycle         No
Fixed                              00: CS_OK             11: Retry          XX:          Yes, 20                   21 Cycle         Yes
Fixed                              00: CS_OK             0X: Accept Error   XX:          No                        21 Cycle         Yes
Fixed                              11: CS_Error          XX:                XX:          No                        21 Cycle         Yes
Fixed                              10: Error             XX:                XX:          No                        21 Cycle         Yes
Fixed                              01: Error             XX:                XX:          No                        21 Cycle         Yes
NMI, SMI, INIT, ExtINT, Start-Up   00: CS_OK             10: Accept         XX:          Yes, 20                   21 Cycle         No
NMI, SMI, INIT, ExtINT, Start-Up   00: CS_OK             11: Retry          XX:          Yes, 20                   21 Cycle         Yes
NMI, SMI, INIT, ExtINT, Start-Up   00: CS_OK             0X: Accept Error   XX:          No                        21 Cycle         Yes
NMI, SMI, INIT, ExtINT, Start-Up   11: CS_Error          XX:                XX:          No                        21 Cycle         Yes
NMI, SMI, INIT, ExtINT, Start-Up   10: Error             XX:                XX:          No                        21 Cycle         Yes
NMI, SMI, INIT, ExtINT, Start-Up   01: Error             XX:                XX:          No                        21 Cycle         Yes
Lowest                             00: CS_OK, NoFocus    11: Do Lowest      10: Accept   Yes, 20                   34 Cycle         No
Lowest                             00: CS_OK, NoFocus    11: Do Lowest      11: Error    Yes, 20                   34 Cycle         Yes
Lowest                             00: CS_OK, NoFocus    11: Do Lowest      0X: Error    Yes, 20                   34 Cycle         Yes
Lowest                             00: CS_OK, NoFocus    10: End and Retry  XX:          Yes, 20                   34 Cycle         Yes
Lowest                             00: CS_OK, NoFocus    0X: Error          XX:          No                        34 Cycle         Yes
Lowest                             10: CS_OK, Focus      XX:                XX:          Yes, 20                   34 Cycle         No
Lowest                             11: CS_Error          XX:                XX:          No                        21 Cycle         Yes
Lowest                             01: Error             XX:                XX:          No                        21 Cycle         Yes
7.5.17. Error Handling

The local APIC sets flags in the error status register (ESR) to record all the errors that it detects (refer to Figure 7-16). The ESR is a read/write register and is reset after being written to by the processor. A write to the ESR must be done just prior to reading the ESR to allow the register to be updated. An error interrupt is generated when one of the error bits is set. Error bits are cumulative. The ESR must be cleared by software after unmasking of the error interrupt entry in the LVT is performed (by executing back-to-back writes). If the software, however, wishes to handle errors set in the register prior to unmasking, it should write and then read the ESR prior to or immediately after the unmasking.
Bits 31:8: Reserved
Bit 7: Illegal Register Address
Bit 6: Receive Illegal Vector
Bit 5: Send Illegal Vector
Bit 4: Reserved
Bit 3: Receive Accept Error
Bit 2: Send Accept Error
Bit 1: Receive CS Error
Bit 0: Send CS Error
Address: FEE0 0280H
Value after reset: 0H
Figure 7-16. Error Status Register (ESR)
The functions of the ESR flags are as follows:
Send CS Error: Set when the local APIC detects a check sum error for a message that was sent by it.
Receive CS Error: Set when the local APIC detects a check sum error for a message that was received by it.
Send Accept Error: Set when the local APIC detects that a message it sent was not accepted by any APIC on the bus.
Receive Accept Error: Set when the local APIC detects that the message it received was not accepted by any APIC on the bus, including itself.
Send Illegal Vector: Set when the local APIC detects an illegal vector in the message that it is sending on the bus.
Receive Illegal Vector: Set when the local APIC detects an illegal vector in the message it received, including an illegal vector code in the local vector table interrupts and self-interrupts from ICR.
Illegal Reg. Address (P6 Family Processors Only): Set when the processor is trying to access a register that is not implemented in the P6 family processors' local APIC register address space; that is, within FEE00000H (the APICBase MSR) through FEE003FFH (the APICBase MSR plus 4K Bytes).
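The flag layout of Figure 7-16 can be illustrated with a small decoder. This is only a sketch in Python that interprets an ESR value already read by privileged code; the table name ESR_FLAGS and the function decode_esr are hypothetical names introduced for the example, not part of any Intel interface.

```python
# Bit positions follow Figure 7-16; ESR_FLAGS and decode_esr are
# illustrative names, not an Intel-defined API.
ESR_FLAGS = {
    0: "Send CS Error",
    1: "Receive CS Error",
    2: "Send Accept Error",
    3: "Receive Accept Error",
    5: "Send Illegal Vector",
    6: "Receive Illegal Vector",
    7: "Illegal Register Address",
}


def decode_esr(value):
    """Return the names of the error flags set in an ESR value.

    Bit 4 and bits 31:8 are reserved and are ignored here.
    """
    return [name for bit, name in sorted(ESR_FLAGS.items()) if value & (1 << bit)]


print(decode_esr(0x41))  # ['Send CS Error', 'Receive Illegal Vector']
```

Because the error bits are cumulative, a single read may report several flags at once, which is why the sketch returns a list rather than a single name.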
7.5.18. Timer
The local APIC unit contains a 32-bit programmable timer for use by the local processor. This timer is configured through the timer register in the local vector table (refer to Figure 7-8). The time base is derived from the processor's bus clock, divided by a value specified in the divide configuration register (refer to Figure 7-17). After reset, the timer is initialized to zero. The timer supports one-shot and periodic modes. The timer can be configured to interrupt the local processor with an arbitrary vector.
Bits 31:4: Reserved
Bit 2: 0
Divide Value (bits 0, 1 and 3):
000: Divide by 2
001: Divide by 4
010: Divide by 8
011: Divide by 16
100: Divide by 32
101: Divide by 64
110: Divide by 128
111: Divide by 1
Address: FEE0 03E0H
Value after reset: 0H
Figure 7-17. Divide Configuration Register
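Because the divide value is split across bits 0, 1, and 3 of the register, a short decoder may make the encoding easier to follow. The sketch below is in Python and assumes that bit 3 is the most significant bit of the three-bit code, as the ordering "bits 0, 1 and 3" in Figure 7-17 suggests; timer_divisor and DIVIDE_BY are hypothetical names.

```python
# Three-bit divide code -> divisor, as listed in Figure 7-17.
DIVIDE_BY = {
    0b000: 2, 0b001: 4, 0b010: 8, 0b011: 16,
    0b100: 32, 0b101: 64, 0b110: 128, 0b111: 1,
}


def timer_divisor(dcr):
    """Return the bus-clock divisor selected by a divide configuration value.

    Bits 0 and 1 form the low two bits of the code and bit 3 forms the
    high bit; bit 2 is not part of the code and is skipped.
    """
    code = (dcr & 0b11) | (((dcr >> 3) & 1) << 2)
    return DIVIDE_BY[code]


print(timer_divisor(0x0))  # 2
print(timer_divisor(0xB))  # 1
```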
The timer is started by programming its initial-count register (refer to Figure 7-18). The initial count value is copied into the current-count register and count-down begins. After the timer reaches zero in one-shot mode, an interrupt is generated and the timer remains at its 0 value until reprogrammed. In periodic mode, the current-count register is automatically reloaded from the initial-count register when the count reaches 0 and the count-down is repeated. If the initial-count register is written during the count-down, counting restarts using the new value. The initial-count register is read-write by software, while the current-count register is read only.
Bits 31:0: Initial Count (address FEE0 0380H)
Bits 31:0: Current Count (address FEE0 0390H)
Value after reset: 0H
Figure 7-18. Initial Count and Current Count Registers
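The one-shot and periodic behavior described above can be modeled in a few lines. The following Python sketch is a behavioral model only, not hardware-accurate timing; the class ApicTimerModel and its methods are hypothetical names, and one call to tick() stands for one tick of the divided bus clock.

```python
class ApicTimerModel:
    """Behavioral model of the initial-count and current-count registers."""

    def __init__(self, periodic=False):
        self.periodic = periodic
        self.initial_count = 0
        self.current_count = 0
        self.interrupts = 0  # number of timer interrupts generated so far

    def write_initial_count(self, value):
        # Writing the initial count starts (or restarts) the count-down
        # from the new value.
        self.initial_count = value & 0xFFFFFFFF
        self.current_count = self.initial_count

    def tick(self):
        """Advance the model by one divided bus-clock tick."""
        if self.current_count == 0:
            return  # timer is stopped: never started, or one-shot expired
        self.current_count -= 1
        if self.current_count == 0:
            self.interrupts += 1
            if self.periodic:
                # Periodic mode reloads from the initial-count register.
                self.current_count = self.initial_count

    def run(self, ticks):
        for _ in range(ticks):
            self.tick()


timer = ApicTimerModel(periodic=True)
timer.write_initial_count(3)
timer.run(6)
print(timer.interrupts)  # 2
```

In one-shot mode the same sequence would produce a single interrupt and leave the current count at 0 until the initial-count register is written again.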
7.5.19. Software Visible Differences Between the Local APIC and the 82489DX
The following local APIC features differ in their definitions from the 82489DX features:
• When the local APIC is disabled, its internal registers are not cleared. Instead, setting the mask bits in the local vector table to disable the local APIC merely causes it to cease accepting the bus messages except for INIT, SMI, NMI, and start-up. In the 82489DX, when the local unit is disabled by resetting bit 8 of the spurious vector register, all the internal registers including the IRR, ISR and TMR are cleared and the mask bits in the local vector tables are set to logical ones. In the disabled mode, the 82489DX local unit will accept only the reset deassert message.
• In the local APIC, NMI and INIT (except for INIT deassert) are always treated as edge triggered interrupts, even if programmed otherwise. In the 82489DX these interrupts are always level triggered.
• In the local APIC, interrupts generated through ICR messages are always treated as edge triggered (except INIT Deassert). In the 82489DX, the ICR can be used to generate either edge or level triggered interrupts.
• The logical destination register supports 8 bits in the local APIC, whereas it supports 32 bits in the 82489DX.
• The APIC ID register is 4 bits wide for the local APIC and 8 bits wide for the 82489DX.
• The remote read delivery mode provided in the 82489DX is not supported in the Intel Architecture local APIC.
7.5.20. Performance Related Differences between the Local APIC and the 82489DX
For the 82489DX, in the lowest priority mode, all the target local APICs specified by the destination field participate in the lowest priority arbitration. For the local APIC in the Pentium and P6 family processors, only those local APICs which have free interrupt slots participate in the lowest priority arbitration.
7.5.21. New Features Incorporated in the Pentium and P6 Family Processors Local APIC
The local APIC in the Pentium and P6 family processors has the following new features not found in the 82489DX.
• The local APIC supports cluster addressing in logical destination mode.
• Focus processor checking can be enabled/disabled in the local APIC.
• Interrupt input signal polarity can be programmed in the local APIC.
• The local APIC supports SMI through the ICR and I/O redirection table.
• The local APIC incorporates an error status register to log and report errors to the processor.
In the P6 family processors, the local APIC incorporates an additional local vector table entry to handle performance monitoring counter interrupts.
7.6. DUAL-PROCESSOR (DP) INITIALIZATION PROTOCOL
The Pentium processor contains an internal dual-processing (DP) mechanism that permits two processors to be initialized and configured for tightly coupled symmetric multiprocessing (SMP). The DP initialization protocol supports the controlled booting and configuration of the two Pentium processors. When configuration has been completed, the two Pentium processors can share the processing load for the system and share the handling of interrupts received from the system's I/O APIC. The Pentium DP initialization protocol defines two processors:
• Primary processor (also called the bootstrap processor, BSP): This processor boots itself, configures the APIC environment, and starts the second processor.
• Secondary processor (also called the dual processor, DP): This processor boots itself, then waits for a startup signal from the primary processor. Upon receiving the startup signal, it completes its configuration.
Appendix C, "Dual-Processor (DP) Bootup Sequence Example (Specific to Pentium Processors)," gives an example (with code) of the bootup sequence for two Pentium processors operating in a DP configuration.
Appendix E, "Programming the LINT0 and LINT1 Inputs," describes (with code) how to program the LINT[0:1] pins of the processor's local APICs after a dual-processor configuration has been completed.
7.7. MULTIPLE-PROCESSOR (MP) INITIALIZATION PROTOCOL
The Intel Architecture (beginning with the Pentium Pro processors) defines a multiple-processor (MP) initialization protocol, for use with both single- and multiple-processor systems. (Here, "multiple processors" is defined as two or more processors.) The primary goals of this protocol are as follows:
• To permit sequential or controlled booting of multiple processors (from 2 to 4) with no dedicated system hardware. The initialization algorithm is not limited to 4 processors; it can support from 1 to 15 processors in a multiclustered system when the APIC busses are tied together. Larger systems are not supported.
• To be able to initiate the MP protocol without the need for a dedicated signal or BSP.
• To provide fault tolerance. No single processor is geographically designated the BSP. The BSP is determined dynamically during initialization.
The following sections describe an MP initialization protocol. Appendix D, "Multiple-Processor (MP) Bootup Sequence Example (Specific to P6 Family Processors)," gives an example (with code) of the bootup sequence for two P6 family processors operating in an MP configuration. Appendix E, "Programming the LINT0 and LINT1 Inputs," describes (with code) how to program the LINT[0:1] pins of the processor's local APICs after an MP configuration has been completed.
7.7.1. MP Initialization Protocol Requirements and Restrictions
The MP protocol imposes the following requirements and restrictions on the system:
• An APIC clock (APICLK) must be provided on all systems based on the P6 family processors (excluding mobile processors and modules).
• All interrupt mechanisms must be disabled for the duration of the MP protocol algorithm, including the window of time between the assertion of INIT# or receipt of an INIT IPI by the application processors and the receipt of a STARTUP IPI by the application processors. That is, requests generated by interrupting devices must not be seen by the local APIC unit (on board the processor) until the completion of the algorithm. Failure to disable the interrupt mechanisms may result in processor shutdown.
• The MP protocol should be initiated only after a hardware reset. After completion of the protocol algorithm, a flag is set in the APIC base MSR of the BSP (APIC_BASE.BSP) to indicate that it is the BSP. This flag is cleared for all other processors. If a processor or the complete system is subject to an INIT sequence (either through the INIT# pin or an INIT IPI), then the MP protocol is not re-executed. Instead, each processor examines its BSP flag to determine whether the processor should boot or wait for a STARTUP IPI.
7.7.2. MP Protocol Nomenclature
The MP initialization protocol defines two classes of processors:
• The bootstrap processor (BSP): This primary processor is dynamically selected by the MP initialization algorithm. After the BSP has been selected, it configures the APIC environment and starts the secondary processors, under software control.
• Application processors (APs): These secondary processors are the remainder of the processors in an MP system that were not selected as the BSP. The APs complete a minimal self-configuration, then wait for a startup signal from the BSP. Upon receiving a startup signal, an AP completes its configuration.
Table 7-7 describes the interrupt-style abbreviations that will be used throughout the remaining description of the MP initialization protocol. These IPIs do not define new interrupt messages. They are messages that are special only by virtue of the time that they exist (that is, before the RESET sequence is complete).
Table 7-7. Types of Boot Phase IPIs
Message Type | Abbreviation | Description
Boot InterProcessor Interrupt | BIPI | An APIC serial bus message that Symmetric Multiprocessing (SMP) agents use to dynamically determine a BSP after reset.
Final Boot InterProcessor Interrupt | FIPI | An APIC serial bus message that the BSP issues before it fetches from the reset vector. This message has the lowest priority of all boot phase IPIs. When a BSP sees an FIPI that it issued, it fetches the reset vector because no other boot phase IPIs can follow an FIPI.
Startup InterProcessor Interrupt | SIPI | Used to send a new reset vector to an Application Processor (non-BSP) processor in an MP system.
Table 7-8 describes the various fields of each boot phase IPI.
Table 7-8. Boot Phase IPI Message Format
Type | Destination Field | Destination Shorthand | Trigger Mode | Level | Destination Mode | Delivery Mode | Vector (Hex)
BIPI | Not used | All including self | Edge | Deassert | Don't Care | Fixed (000) | 40 to 4E*
FIPI | Not used | All including self | Edge | Deassert | Don't Care | Fixed (000) | 10 to 1E
SIPI | Used | All allowed | Edge | Assert | Physical or Logical | StartUp (110) | 00 to FF
NOTE: * For all P6 family processors.
For BIPI and FIPI messages, the lower 4 bits of the vector field are equal to the APIC ID of the processor issuing the message. The upper 4 bits of the vector field of a BIPI or FIPI can be thought of as the "generation ID" of the message. All processors that run symmetric to a P6 family processor will have a generation ID of 0100B or 4H. BIPIs in a system based on the P6 family processors will therefore use vector values ranging from 40H to 4EH (4FH cannot be used because FH is not a valid APIC ID).
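The vector encoding just described can be shown with a short Python sketch. The function names decode_boot_vector and bipi_vector are hypothetical and exist only for this illustration; the default generation ID of 4H is the P6 family value given in the text.

```python
def decode_boot_vector(vector):
    """Split a BIPI or FIPI vector into (generation ID, APIC ID)."""
    generation_id = (vector >> 4) & 0xF  # upper 4 bits
    apic_id = vector & 0xF               # lower 4 bits
    return generation_id, apic_id


def bipi_vector(apic_id, generation_id=0x4):
    """Build the BIPI vector for a processor with the given APIC ID."""
    if not 0 <= apic_id <= 0xE:
        raise ValueError("APIC ID must be 0H through EH; FH is not valid")
    return (generation_id << 4) | apic_id


print(hex(bipi_vector(0xE)))     # 0x4e
print(decode_boot_vector(0x42))  # (4, 2)
```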
7.7.3. Error Detection During the MP Initialization Protocol
Errors may occur on the APIC bus during the MP initialization phase. These errors may be transient or permanent and can be caused by a variety of failure mechanisms (for example, broken traces, soft errors during bus usage, etc.). All serial bus related errors will result in an APIC checksum or acceptance error. The occurrence of an APIC error causes a processor shutdown.
7.7.4. Error Handling During the MP Initialization Protocol
The MP initialization protocol makes the following assumptions:
• If any errors are detected on the APIC bus during execution of the MP initialization protocol, all processors will shut down.
• In a system that conforms to Intel Architecture guidelines, a likely error (broken trace, check sum error during transmission) will result in no more than one processor booting.
• The MP initialization protocol will be executed by processors even if they fail their BIST sequences.
7.7.5. MP Initialization Protocol Algorithm
The MP initialization protocol uses the message passing capabilities of the processor's local APIC to dynamically determine a bootstrap processor (BSP). The algorithm used essentially implements a "race for the flag" mechanism using the APIC bus for atomicity. The MP initialization algorithm is based on the fact that one and only one message is allowed to exist on the APIC bus at a given time and that once the message is issued, it will complete (APIC messages are atomic). Another feature of the APIC architecture that is used in the initialization algorithm is the existence of a round-robin priority mechanism between all agents that use the APIC bus.
The MP initialization protocol algorithm performs the following operations in an SMP system (refer to Figure 7-19):
1. After completing their internal BISTs, all processors start their MP initialization protocol sequence by issuing BIPIs to "all including self" (at time t=0). The four least significant bits of the vector field of the IPI contain each processor's APIC ID. The APIC hardware observes the BNR# (block next request) pin to guarantee that the initial BIPI is not issued on the APIC bus until the BIST sequence is completed for all processors in the system.
2. When the first BIPI completes (at time t=1), the APIC hardware (in each processor) propagates an interrupt to the processor core to indicate the arrival of the BIPI.
3. The processor compares the four least significant bits of the BIPI's vector field to the processor's APIC ID. A match indicates that the processor should be the BSP and continue the initialization sequence. If the APIC ID fails to match the BIPI's vector field, the processor is essentially the "loser" or not the BSP. The processor then becomes an application processor and should enter a "wait for SIPI" loop.
4. The winner (the BSP) issues an FIPI. The FIPI is issued to "all including self" and is guaranteed to be the last IPI on the APIC bus during the initialization sequence. This is due to the fact that the round-robin priority mechanism forces the winning APIC agent's (the BSP's) arbitration priority to 0. The FIPI is therefore issued by a priority 0 agent and has to wait until all other agents have issued their BIPIs. When the BSP receives the FIPI that it issued (t=5), it will start fetching code at the reset vector (Intel Architecture address).
[Figure: P6 family processors A, B, C, and D attached to the system (CPU) bus and to the APIC bus. Serial bus activity on the APIC bus between t=0 and t=5: BIPI.A, BIPI.B, BIPI.C, BIPI.D, FIPI.]
Figure 7-19. SMP System
5. All application processors (non-BSP processors) remain in a halted state and can only be woken up by SIPIs issued by another processor. (Note that an AP in the startup IPI loop will also respond to BINIT and snoops.)
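The steps above can be sketched as a small simulation. The Python below is a simplified model under stated assumptions: the order in which BIPIs win the APIC bus is supplied by the caller rather than derived from real arbitration, and the function name run_mp_init is hypothetical. It shows only that the first BIPI to complete identifies the BSP and that the FIPI is the last message on the bus.

```python
def run_mp_init(arbitration_order):
    """Model BSP selection for a list of APIC IDs.

    arbitration_order gives the APIC IDs in the order their BIPIs
    complete on the APIC bus (an input to this model, not computed).
    Returns (bsp_apic_id, ap_apic_ids, bus_log).
    """
    if not arbitration_order:
        raise ValueError("at least one processor is required")
    bus_log = []
    bsp = None
    for apic_id in arbitration_order:
        vector = 0x40 | apic_id  # BIPI: generation ID 4H, low nibble = sender
        bus_log.append(("BIPI", vector))
        if bsp is None:
            # Every processor compares the first completed BIPI with its
            # own APIC ID; only the sender matches and becomes the BSP.
            bsp = vector & 0xF
    # The BSP's FIPI waits until all other agents have issued their BIPIs.
    bus_log.append(("FIPI", 0x10 | bsp))
    aps = [apic_id for apic_id in arbitration_order if apic_id != bsp]
    return bsp, aps, bus_log


bsp, aps, log = run_mp_init([2, 0, 3, 1])
print(bsp, aps)  # 2 [0, 3, 1]
```

In this model the processors in aps would remain halted in the "wait for SIPI" loop until the BSP sends each of them a SIPI.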
CHAPTER 8 PROCESSOR MANAGEMENT AND INITIALIZATION

This chapter describes the facilities provided for managing processor-wide functions and for initializing the processor. The subjects covered include: processor initialization, FPU initialization, processor configuration, feature determination, mode switching, the MSRs (in the Pentium and P6 family processors), and the MTRRs (in the P6 family processors).
8.1. INITIALIZATION OVERVIEW
Following power-up or an assertion of the RESET# pin, each processor on the system bus performs a hardware initialization of the processor (known as a hardware reset) and an optional built-in self-test (BIST). A hardware reset sets each processor's registers to a known state and places the processor in real-address mode. It also invalidates the internal caches, translation lookaside buffers (TLBs) and the branch target buffer (BTB). At this point, the action taken depends on the processor family:
• P6 family processors: All the processors on the system bus (including a single processor in a uniprocessor system) execute the multiple processor (MP) initialization protocol across the APIC bus. The processor that is selected through this protocol as the bootstrap processor (BSP) then immediately starts executing software-initialization code in the current code segment beginning at the offset in the EIP register. The application (non-BSP) processors (AP) go into a halt state while the BSP is executing initialization code. Refer to Section 7.7., "Multiple-Processor (MP) Initialization Protocol" in Chapter 7, "Multiple-Processor Management" for more details. Note that in a uniprocessor system, the single P6 family processor automatically becomes the BSP.
• Pentium processors: In either a single- or dual-processor system, a single Pentium processor is always pre-designated as the primary processor. Following a reset, the primary processor behaves as follows in both single- and dual-processor systems. Using the dual-processor (DP) ready initialization protocol, the primary processor immediately starts executing software-initialization code in the current code segment beginning at the offset in the EIP register. The secondary processor (if there is one) goes into a halt state. (Refer to Section 7.6., "Dual-Processor (DP) Initialization Protocol" in Chapter 7, "Multiple-Processor Management" for more details.)
• Intel486 processor: The primary processor (or single processor in a uniprocessor system) immediately starts executing software-initialization code in the current code segment beginning at the offset in the EIP register. (The Intel486 does not automatically execute a DP or MP initialization protocol to determine which processor is the primary processor.)
The software-initialization code performs all system-specific initialization of the BSP or primary processor and the system logic.
At this point, for MP (or DP) systems, the BSP (or primary) processor wakes up each AP (or secondary) processor to enable those processors to execute self-configuration code. When all processors are initialized, configured, and synchronized, the BSP or primary processor begins executing an initial operating-system or executive task.
The floating-point unit (FPU) is also initialized to a known state during hardware reset. FPU software initialization code can then be executed to perform operations such as setting the precision of the FPU and the exception masks. No special initialization of the FPU is required to switch operating modes.
Asserting the INIT# pin on the processor invokes a similar response to a hardware reset. The major difference is that during an INIT, the internal caches, MSRs, MTRRs, and FPU state are left unchanged (although the TLBs and BTB are invalidated as with a hardware reset). An INIT provides a method for switching from protected to real-address mode while maintaining the contents of the internal caches.
8.1.1. Processor State After Reset
Table 8-1 shows the state of the flags and other registers following power-up for the Pentium Pro, Pentium, and Intel486 processors. The state of control register CR0 is 60000010H (refer to Figure 8-1), which places the processor in real-address mode with paging disabled.
8.1.2. Processor Built-In Self-Test (BIST)
Hardware may request that the BIST be performed at power-up. The EAX register is cleared (0H) if the processor passes the BIST. A nonzero value in the EAX register after the BIST indicates that a processor fault was detected. If the BIST is not requested, the EAX register contains 0H after a hardware reset. The overhead for performing a BIST varies between processor families. For example, the BIST takes approximately 5.5 million processor clock periods to execute on the Pentium Pro processor. (This clock count is model-specific, and Intel reserves the right to change the exact number of periods, for any of the Intel Architecture processors, without notification.)
Table 8-1. 32-Bit Intel Architecture Processor States Following Power-up, Reset, or INIT
Register | P6 Family Processors | Pentium Processor | Intel486 Processor
EFLAGS (1) | 00000002H | 00000002H | 00000002H
EIP | 0000FFF0H | 0000FFF0H | 0000FFF0H
CR0 | 60000010H (2) | 60000010H (2) | 60000010H (2)
CR2, CR3, CR4 | 00000000H | 00000000H | 00000000H
MXCSR | Pentium III processor only: Pwr up or Reset: 1F80H; FINIT/FNINIT: Unchanged | NA | NA
CS | Selector = F000H; Base = FFFF0000H; Limit = FFFFH; AR = Present, R/W, Accessed | Selector = F000H; Base = FFFF0000H; Limit = FFFFH; AR = Present, R/W, Accessed | Selector = F000H; Base = FFFF0000H; Limit = FFFFH; AR = Present, R/W, Accessed
SS, DS, ES, FS, GS | Selector = 0000H; Base = 00000000H; Limit = FFFFH; AR = Present, R/W, Accessed | Selector = 0000H; Base = 00000000H; Limit = FFFFH; AR = Present, R/W, Accessed | Selector = 0000H; Base = 00000000H; Limit = FFFFH; AR = Present, R/W, Accessed
EDX | 000006xxH | 000005xxH | 000004xxH
EAX | 0 (3) | 0 (3) | 0 (3)
EBX, ECX, ESI, EDI, EBP, ESP | 00000000H | 00000000H | 00000000H
MM0 through MM7 (4) | Pentium Pro processor: NA. Pentium II and Pentium III processor: Pwr up or Reset: 0000000000000000H; FINIT/FNINIT: Unchanged | Pwr up or Reset: 0000000000000000H; FINIT/FNINIT: Unchanged | NA
XMM0 through XMM7 (5) | Pentium III processor only: Pwr up or Reset: 0000000000000000H; FINIT/FNINIT: Unchanged | NA | NA
ST0 through ST7 (4) | Pwr up or Reset: +0.0; FINIT/FNINIT: Unchanged | Pwr up or Reset: +0.0; FINIT/FNINIT: Unchanged | Pwr up or Reset: +0.0; FINIT/FNINIT: Unchanged
FPU Control Word (4) | Pwr up or Reset: 0040H; FINIT/FNINIT: 037FH | Pwr up or Reset: 0040H; FINIT/FNINIT: 037FH | Pwr up or Reset: 0040H; FINIT/FNINIT: 037FH
FPU Status Word (4) | Pwr up or Reset: 0000H; FINIT/FNINIT: 0000H | Pwr up or Reset: 0000H; FINIT/FNINIT: 0000H | Pwr up or Reset: 0000H; FINIT/FNINIT: 0000H
FPU Tag Word (4) | Pwr up or Reset: 5555H; FINIT/FNINIT: FFFFH | Pwr up or Reset: 5555H; FINIT/FNINIT: FFFFH | Pwr up or Reset: 5555H; FINIT/FNINIT: FFFFH
FPU Data Operand and CS Seg. Selectors (4) | Pwr up or Reset: 0000H; FINIT/FNINIT: 0000H | Pwr up or Reset: 0000H; FINIT/FNINIT: 0000H | Pwr up or Reset: 0000H; FINIT/FNINIT: 0000H
FPU Data Operand and Inst. Pointers (4) | Pwr up or Reset: 00000000H; FINIT/FNINIT: 00000000H | Pwr up or Reset: 00000000H; FINIT/FNINIT: 00000000H | Pwr up or Reset: 00000000H; FINIT/FNINIT: 00000000H
GDTR, IDTR | Base = 00000000H; Limit = FFFFH; AR = Present, R/W | Base = 00000000H; Limit = FFFFH; AR = Present, R/W | Base = 00000000H; Limit = FFFFH; AR = Present, R/W
LDTR, Task Register | Selector = 0000H; Base = 00000000H; Limit = FFFFH; AR = Present, R/W | Selector = 0000H; Base = 00000000H; Limit = FFFFH; AR = Present, R/W | Selector = 0000H; Base = 00000000H; Limit = FFFFH; AR = Present, R/W
DR0, DR1, DR2, DR3 | 00000000H | 00000000H | 00000000H
DR6 | FFFF0FF0H | FFFF0FF0H | FFFF1FF0H
DR7 | 00000400H | 00000400H | 00000000H
Time-Stamp Counter | Power up or Reset: 0H; INIT: Unchanged | Power up or Reset: 0H; INIT: Unchanged | Not Implemented
Perf. Counters and Event Select | Power up or Reset: 0H; INIT: Unchanged | Power up or Reset: 0H; INIT: Unchanged | Not Implemented
All Other MSRs | Pwr up or Reset: Undefined; INIT: Unchanged | Pwr up or Reset: Undefined; INIT: Unchanged | Not Implemented
Data and Code Cache, TLBs | Invalid | Invalid | Invalid
Fixed MTRRs | Pwr up or Reset: Disabled; INIT: Unchanged | Not Implemented | Not Implemented
Variable MTRRs | Pwr up or Reset: Disabled; INIT: Unchanged | Not Implemented | Not Implemented
Machine-Check Architecture | Pwr up or Reset: Undefined; INIT: Unchanged | Not Implemented | Not Implemented
APIC | Pwr up or Reset: Enabled; INIT: Unchanged | Pwr up or Reset: Enabled; INIT: Unchanged | Not Implemented
NOTES:
1. The 10 most-significant bits of the EFLAGS register are undefined following a reset. Software should not depend on the states of any of these bits.
2. The CD and NW flags are unchanged, bit 4 is set to 1, all other bits are cleared.
3. If Built-In Self-Test (BIST) is invoked on power up or reset, EAX is 0 only if all tests passed. (BIST cannot be invoked during an INIT.)
4. The state of the FPU state and MMX registers is not changed by the execution of an INIT.
5. Available in the Pentium III processor and Pentium III Xeon processor only. The state of the SIMD floating-point registers is not changed by the execution of an INIT.
Bit 31 (PG) = 0: Paging disabled
Bit 30 (CD) = 1: Caching disabled
Bit 29 (NW) = 1: Not write-through disabled
Bits 28:19: Reserved
Bit 18 (AM) = 0: Alignment check disabled
Bit 17: Reserved
Bit 16 (WP) = 0: Write-protect disabled
Bits 15:6: Reserved
Bit 5 (NE) = 0: External FPU error reporting
Bit 4 = 1: (Not used)
Bit 3 (TS) = 0: No task switch
Bit 2 (EM) = 0: FPU instructions not trapped
Bit 1 (MP) = 0: WAIT/FWAIT instructions not trapped
Bit 0 (PE) = 0: Real-address mode
Figure 8-1. Contents of CR0 Register after Reset
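As a check on the reset value 60000010H, the following Python sketch decodes a CR0 value into its named flags. The bit positions are those of Figure 8-1; the name ET for bit 4 is the conventional name for that flag and is an addition here, and decode_cr0 is a hypothetical helper.

```python
# Flag name -> bit position in CR0 (positions as in Figure 8-1;
# "ET" for bit 4 is supplied by this sketch, not by the figure).
CR0_FLAGS = {
    "PE": 0, "MP": 1, "EM": 2, "TS": 3, "ET": 4, "NE": 5,
    "WP": 16, "AM": 18, "NW": 29, "CD": 30, "PG": 31,
}


def decode_cr0(value):
    """Return a dict mapping each CR0 flag name to 0 or 1."""
    return {name: (value >> bit) & 1 for name, bit in CR0_FLAGS.items()}


reset_state = decode_cr0(0x60000010)
print(sorted(name for name, bit in reset_state.items() if bit))
# ['CD', 'ET', 'NW']
```

With PE and PG both clear, the decoded state matches the description above: real-address mode with paging disabled.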
8.1.3. Model and Stepping Information
Following a hardware reset, the EDX register contains component identification and revision information (refer to Figure 8-2). The device ID field is set to the value 6H, 5H, 4H, or 3H to indicate a Pentium Pro, Pentium, Intel486, or Intel386 processor, respectively. Different values may be returned for the various members of these Intel Architecture families. For example, the Intel386 SX processor returns 23H in the device ID field. Binary object code can be made compatible with other Intel processors by using this number to select the correct initialization software.
EDX bits 31:14: Reserved
EDX bits 13:12: Processor Type
EDX bits 11:8: Family (0110B for the Pentium Pro Processor Family)
EDX bits 7:4: Model (Beginning with 0001B)
EDX bits 3:0: Stepping ID
Figure 8-2. Processor Type and Signature in the EDX Register after Reset
The stepping ID field contains a unique identifier for the processor's stepping ID or revision level. The upper word of EDX is reserved following reset.
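The field boundaries of Figure 8-2 can be applied to an EDX value with a few shifts and masks. This Python sketch assumes the value has already been captured by initialization code; decode_signature is a hypothetical name, and the sample value 00000619H is made up for illustration.

```python
def decode_signature(edx):
    """Split the post-reset EDX value into its signature fields."""
    return {
        "type": (edx >> 12) & 0x3,    # bits 13:12, processor type
        "family": (edx >> 8) & 0xF,   # bits 11:8
        "model": (edx >> 4) & 0xF,    # bits 7:4
        "stepping": edx & 0xF,        # bits 3:0
    }


# Hypothetical sample: family 6H (Pentium Pro), model 1H, stepping 9H.
print(decode_signature(0x00000619))
# {'type': 0, 'family': 6, 'model': 1, 'stepping': 9}
```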
8.1.4. First Instruction Executed
The first instruction that is fetched and executed following a hardware reset is located at physical address FFFFFFF0H. This address is 16 bytes below the processor's uppermost physical address. The EPROM containing the software-initialization code must be located at this address.
The address FFFFFFF0H is beyond the 1-MByte addressable range of the processor while in real-address mode. The processor is initialized to this starting address as follows. The CS register has two parts: the visible segment selector part and the hidden base address part. In real-address mode, the base address is normally formed by shifting the 16-bit segment selector value 4 bits to the left to produce a 20-bit base address. However, during a hardware reset, the segment selector in the CS register is loaded with F000H and the base address is loaded with FFFF0000H. The starting address is thus formed by adding the base address to the value in the EIP register (that is, FFFF0000H + FFF0H = FFFFFFF0H).
The first time the CS register is loaded with a new value after a hardware reset, the processor will follow the normal rule for address translation in real-address mode (that is, [CS base address = CS segment selector * 16]). To ensure that the base address in the CS register remains unchanged until the EPROM-based software-initialization code is completed, the code must not contain a far jump or far call or allow an interrupt to occur (which would cause the CS selector value to be changed).
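The address arithmetic above can be worked through in a short Python sketch. The helper names linear_address and real_mode_base are hypothetical; the second calculation shows what the address would become if CS were reloaded with F000H by a far jump, which is why the initialization code must avoid one.

```python
def linear_address(cs_base, eip):
    """Add the hidden CS base to EIP, wrapping at 32 bits."""
    return (cs_base + eip) & 0xFFFFFFFF


def real_mode_base(selector):
    """Normal real-address-mode rule: base = selector * 16."""
    return (selector & 0xFFFF) << 4


# After reset: hidden base FFFF0000H, EIP 0000FFF0H.
print(hex(linear_address(0xFFFF0000, 0xFFF0)))            # 0xfffffff0

# After CS is reloaded with F000H, the normal rule applies.
print(hex(linear_address(real_mode_base(0xF000), 0xFFF0)))  # 0xffff0
```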
8.2. FPU INITIALIZATION
Software-initialization code can determine whether the processor contains or is attached to an FPU by using the CPUID instruction. The code must then initialize the FPU and set flags in control register CR0 to reflect the state of the FPU environment.
A hardware reset places the Pentium processor FPU in the state shown in Table 8-1. This state is different from the state the processor is placed in when executing an FINIT or FNINIT instruction (also shown in Table 8-1). If the FPU is to be used, the software-initialization code should execute an FINIT/FNINIT instruction following a hardware reset. These instructions tag all data registers as empty, mask all the floating-point exceptions (control word 037FH), set the TOP-of-stack value to 0, and select the default rounding and precision controls setting (round to nearest and 64-bit precision). If the processor is reset by asserting the INIT# pin, the FPU state is not changed.
8.2.1. Configuring the FPU Environment
Initialization code must load the appropriate values into the MP, EM, and NE flags of control register CR0. These bits are cleared on hardware reset of the processor. Table 8-2 shows the suggested settings for these flags, depending on the Intel Architecture processor being initialized. Initialization code can test for the type of processor present before setting or clearing these flags.
Table 8-2. Recommended Settings of EM and MP Flags on Intel Architecture Processors
EM | MP | NE | Intel Architecture Processor
1 | 0 | 1 | Intel486 SX, Intel386 DX, and Intel386 SX processors only, without the presence of a math coprocessor.
0 | 1 | 1 or 0* | Pentium Pro, Pentium, Intel486 DX, and Intel 487 SX processors, and also Intel386 DX and Intel386 SX processors when a companion math coprocessor is present.
NOTE: * The setting of the NE flag depends on the operating system being used.
The EM flag determines whether floating-point instructions are executed by the FPU (EM is cleared) or generate a device-not-available exception (#NM) so that an exception handler can emulate the floating-point operation (EM = 1). Ordinarily, the EM flag is cleared when an FPU or math coprocessor is present and set if they are not present. If the EM flag is set and no FPU, math coprocessor, or floating-point emulator is present, the system will hang when a floating-point instruction is executed.
The MP flag determines whether WAIT/FWAIT instructions react to the setting of the TS flag. If the MP flag is clear, WAIT/FWAIT instructions ignore the setting of the TS flag; if the MP flag is set, they will generate a device-not-available exception (#NM) if the TS flag is set. Generally, the MP flag should be set for processors with an integrated FPU and clear for processors without an integrated FPU and without a math coprocessor present. However, an operating system can choose to save the floating-point context at every context switch, in which case there would be no need to set the MP bit. Table 2-1 in Chapter 2, "System Architecture Overview" shows the actions taken for floating-point and WAIT/FWAIT instructions based on the settings of the EM, MP, and TS flags.
The NE flag determines whether unmasked floating-point exceptions are handled by generating a floating-point error exception internally (NE is set, native mode) or through an external interrupt (NE is cleared). In systems where an external interrupt controller is used to invoke numeric exception handlers (such as MS-DOS-based systems), the NE bit should be cleared.
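The interaction of the EM, MP, and TS flags can be summarized as two small decision functions. This Python sketch is illustrative only: the function names are hypothetical, and the rule that a set TS flag also causes #NM on a floating-point instruction when EM is clear comes from the general architecture (the Table 2-1 referenced above) rather than from this section.

```python
def fp_instruction_action(em, ts):
    """Action taken when a floating-point instruction is encountered."""
    # EM set: trap so a handler can emulate the instruction.
    # TS set: trap so the OS can restore the FPU context first.
    return "#NM" if (em or ts) else "execute"


def wait_instruction_action(mp, ts):
    """Action taken when a WAIT/FWAIT instruction is encountered."""
    # WAIT/FWAIT reacts to TS only when MP is set.
    return "#NM" if (mp and ts) else "execute"


print(fp_instruction_action(em=1, ts=0))    # #NM
print(wait_instruction_action(mp=0, ts=1))  # execute
```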
8.2.2.
Setting the Processor for FPU Software Emulation
Setting the EM flag causes the processor to generate a device-not-available exception (#NM) and trap to a software exception handler whenever it encounters a floating-point instruction. (Table 8-2 shows when it is appropriate to use this flag.) Setting this flag has two functions:
• It allows floating-point code to run on an Intel processor that neither has an integrated FPU nor is connected to an external math coprocessor, by using a floating-point emulator.
• It allows floating-point code to be executed using a special or nonstandard floating-point emulator, selected for a particular application, regardless of whether an FPU or math coprocessor is present.
To emulate floating-point instructions, the EM, MP, and NE flags in control register CR0 should be set as shown in Table 8-3.
Table 8-3. Software Emulation Settings of EM, MP, and NE Flags
CR0 Bit    Value
EM         1
MP         0
NE         1
Regardless of the value of the EM bit, the Intel486 SX processor generates a device-not-available exception (#NM) upon encountering any floating-point instruction.
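As a minimal sketch of the Table 8-3 settings, the following Python function applies them to a 32-bit CR0 image. The function name is hypothetical; the bit positions (MP = bit 1, EM = bit 2, NE = bit 5) are the architectural CR0 assignments. On real hardware the resulting image would be written back with a MOV CR0 instruction at privilege level 0.

```python
CR0_MP = 1 << 1  # monitor coprocessor flag
CR0_EM = 1 << 2  # emulation flag
CR0_NE = 1 << 5  # numeric error flag


def cr0_for_emulation(cr0):
    """Return a CR0 image with EM = 1, MP = 0, and NE = 1 (Table 8-3)."""
    cr0 |= CR0_EM | CR0_NE   # set EM and NE
    cr0 &= ~CR0_MP           # clear MP
    return cr0 & 0xFFFFFFFF  # keep the image 32 bits wide
```

All other CR0 bits pass through unchanged, so the function can be applied to whatever image software has already prepared.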
8.3. CACHE ENABLING
The Intel Architecture processors (beginning with the Intel486 processor) contain internal instruction and data caches. These caches are enabled by clearing the CD and NW flags in control register CR0. (They are set during a hardware reset.) Because all internal cache lines are invalid following reset initialization, it is not necessary to invalidate the cache before enabling caching. Any external caches may require initialization and invalidation using a system-specific initialization and invalidation code sequence.
Depending on the hardware and operating system or executive requirements, additional configuration of the processor's caching facilities will probably be required. Beginning with the Intel486 processor, page-level caching can be controlled with the PCD and PWT flags in page-directory and page-table entries. For P6 family processors, the memory type range registers (MTRRs) control the caching characteristics of the regions of physical memory. (For the Intel486 and Pentium processors, external hardware can be used to control the caching characteristics of regions of physical memory.) Refer to Chapter 9, Memory Cache Control, for detailed information on configuration of the caching facilities in the P6 family processors and system memory.
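The flag manipulation involved in enabling the internal caches can be sketched as follows. This Python fragment is a model, not initialization code: the function name is hypothetical, and the reset value 60000010H used in the usage note is the documented CR0 state after a hardware reset (CD, NW, and ET set).

```python
CR0_NW = 1 << 29  # not write-through flag
CR0_CD = 1 << 30  # cache disable flag


def enable_internal_caches(cr0):
    """Return a CR0 image with the CD and NW flags cleared."""
    return cr0 & ~(CR0_CD | CR0_NW) & 0xFFFFFFFF


def caches_enabled(cr0):
    """True when both CD and NW are clear in the given CR0 image."""
    return (cr0 & (CR0_CD | CR0_NW)) == 0
```

Applied to the reset value 60000010H, the function yields 00000010H, leaving only the ET flag set.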
8.4. MODEL-SPECIFIC REGISTERS (MSRS)
The P6 family processors and Pentium processors contain model-specific registers (MSRs). These registers are by definition implementation specific; that is, they are not guaranteed to be supported on future Intel Architecture processors and/or to have the same functions. The MSRs are provided to control a variety of hardware- and software-related features, including:
• The performance-monitoring counters (refer to Section 15.6., Performance-Monitoring Counters, in Chapter 15, Debugging and Performance Monitoring).
• (P6 family processors only.) Debug extensions (refer to Section 15.4., Last Branch, Interrupt, and Exception Recording, in Chapter 15, Debugging and Performance Monitoring).
• (P6 family processors only.) The machine-check exception capability and its accompanying machine-check architecture (refer to Chapter 13, Machine-Check Architecture).
• (P6 family processors only.) The MTRRs (refer to Section 9.12., Memory Type Range Registers (MTRRs), in Chapter 9, Memory Cache Control).
The MSRs can be read and written to using the RDMSR and WRMSR instructions, respectively.
When performing software initialization of a Pentium Pro or Pentium processor, many of the MSRs will need to be initialized to set up things like performance-monitoring events, run-time machine checks, and memory types for physical memory.
Systems configured to implement FRC mode must write all of the processor's internal MSRs to deterministic values before performing either a read or read-modify-write operation using these registers. The following is a list of MSRs that are not initialized by the processor's reset sequences:
• All fixed and variable MTRRs.
• All Machine Check Architecture (MCA) status registers.
• Microcode update signature register.
• All L2 cache initialization MSRs.
The list of available performance-monitoring counters for the Pentium Pro and Pentium processors is given in Appendix A, Performance-Monitoring Events, and the list of available MSRs for the Pentium Pro processor is given in Appendix B, Model-Specific Registers. The references earlier in this section show where the functions of the various groups of MSRs are described in this manual.
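RDMSR and WRMSR select an MSR by the index in ECX and transfer its 64-bit contents through the EDX:EAX register pair. The following Python sketch models only that register convention. The MsrFile class and its method names are hypothetical stand-ins for the hardware, and the model raises KeyError where a real processor would generate a general-protection exception for an unimplemented MSR.

```python
def split_msr_value(value):
    """Split a 64-bit MSR value into (EDX, EAX), high half first."""
    return (value >> 32) & 0xFFFFFFFF, value & 0xFFFFFFFF


def join_msr_value(edx, eax):
    """Combine EDX (high) and EAX (low) into one 64-bit MSR value."""
    return ((edx & 0xFFFFFFFF) << 32) | (eax & 0xFFFFFFFF)


class MsrFile:
    """Hypothetical in-memory stand-in for a processor's MSRs."""

    def __init__(self):
        self._regs = {}

    def wrmsr(self, ecx, edx, eax):
        # ECX selects the MSR; EDX:EAX supply the 64-bit value.
        self._regs[ecx] = join_msr_value(edx, eax)

    def rdmsr(self, ecx):
        # Returns (EDX, EAX); KeyError models an unimplemented MSR.
        return split_msr_value(self._regs[ecx])
```

Writing every MSR through such an interface before the first read mirrors the FRC-mode requirement above: no register is read until software has given it a deterministic value.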
8.5. MEMORY TYPE RANGE REGISTERS (MTRRS)
Memory type range registers (MTRRs) were introduced into the Intel Architecture with the Pentium Pro processor. They allow the type of caching (or no caching) to be specified in system memory for selected physical address ranges. They allow memory accesses to be optimized for various types of memory such as RAM, ROM, frame buffer memory, and memory-mapped I/O devices.
Initializing the MTRRs is normally handled by the software initialization code or BIOS and is not an operating system or executive function. At the very least, all the MTRRs must be cleared to 0, which selects the uncached (UC) memory type. Refer to Section 9.12., Memory Type Range Registers (MTRRs), in Chapter 9, Memory Cache Control, for detailed information on the MTRRs.
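To illustrate why an all-zero value selects the UC memory type, the following Python sketch decodes an image of the MTRR default-type register. It assumes the layout described in Section 9.12. (memory type in bits 7:0, fixed-range enable in bit 10, MTRR enable in bit 11) and the P6 family type encodings; the function name is hypothetical.

```python
# Memory type encodings used by the P6 family MTRRs.
MEMORY_TYPES = {0: "UC", 1: "WC", 4: "WT", 5: "WP", 6: "WB"}


def decode_mtrr_def_type(value):
    """Return (default_type, fixed_enabled, mtrrs_enabled) for a register image."""
    default_type = MEMORY_TYPES.get(value & 0xFF, "reserved")
    fixed_enabled = bool(value & (1 << 10))
    mtrrs_enabled = bool(value & (1 << 11))
    return default_type, fixed_enabled, mtrrs_enabled
```

A cleared register decodes to the UC type with both enable flags off, which matches the conservative starting point described above.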
8.6. SOFTWARE INITIALIZATION FOR REAL-ADDRESS MODE OPERATION
Following a hardware reset (either through a power-up or the assertion of the RESET# pin) the processor is placed in real-address mode and begins executing software initialization code from physical address FFFFFFF0H. Software initialization code must first set up the necessary data structures for handling basic system functions, such as a real-mode IDT for handling interrupts and exceptions. If the processor is to remain in real-address mode, software must then load additional operating-system or executive code modules and data structures to allow reliable execution of application programs in real-address mode. If the processor is going to operate in protected mode, software must load the necessary data structures to operate in protected mode and then switch to protected mode. The protected-mode data structures that must be loaded are described in Section 8.7., Software Initialization for Protected-Mode Operation.
8.6.1. Real-Address Mode IDT
In real-address mode, the only system data structure that must be loaded into memory is the IDT (also called the interrupt vector table). By default, the address of the base of the IDT is physical address 0H. This address can be changed by using the LIDT instruction to change the base address value in the IDTR. Software initialization code needs to load interrupt- and exception-handler pointers into the IDT before interrupts can be enabled.
The actual interrupt- and exception-handler code can be contained either in EPROM or RAM; however, the code must be located within the 1-MByte addressable range of the processor in real-address mode. If the handler code is to be stored in RAM, it must be loaded along with the IDT.
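The layout of the real-address mode IDT can be sketched as follows. Each entry is a 4-byte far pointer (16-bit offset followed by 16-bit segment), so vector n is located at IDT base + 4 * n. The Python function names are hypothetical, and the handler address is computed as segment * 16 + offset without modeling address wraparound.

```python
def ivt_entry_address(vector, idt_base=0):
    """Physical address of the 4-byte entry for the given vector number."""
    return idt_base + 4 * vector


def make_ivt_entry(segment, offset):
    """Encode a handler far pointer as stored in memory (offset first)."""
    return offset.to_bytes(2, "little") + segment.to_bytes(2, "little")


def handler_physical_address(entry):
    """Decode a 4-byte entry and return the handler's physical address."""
    offset = int.from_bytes(entry[0:2], "little")
    segment = int.from_bytes(entry[2:4], "little")
    return (segment << 4) + offset
```

For example, the NMI handler pointer (vector 2) occupies bytes 8 through 11 of a table based at physical address 0H.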
8.6.2. NMI Interrupt Handling
The NMI interrupt is always enabled (except when multiple NMIs are nested). If the IDT and the NMI interrupt handler need to be loaded into RAM, there will be a period of time following hardware reset when an NMI interrupt cannot be handled. During this time, hardware must provide a mechanism to prevent an NMI interrupt from halting code execution until the IDT and the necessary NMI handler software are loaded.
Here are two examples of how NMIs can be handled during the initial states of processor initialization:
• A simple IDT and NMI interrupt handler can be provided in EPROM. This allows an NMI interrupt to be handled immediately after reset initialization.
• The system hardware can provide a mechanism to enable and disable NMIs by passing the NMI# signal through an AND gate controlled by a flag in an I/O port. Hardware can clear the flag when the processor is reset, and software can set the flag when it is ready to handle NMI interrupts.
8.7. SOFTWARE INITIALIZATION FOR PROTECTED-MODE OPERATION
The processor is placed in real-address mode following a hardware reset. At this point in the initialization process, some basic data structures and code modules must be loaded into physical memory to support further initialization of the processor, as described in Section 8.6., Software Initialization for Real-Address Mode Operation. Before the processor can be switched to protected mode, the software initialization code must load a minimum number of protected mode data structures and code modules into memory to support reliable operation of the processor in protected mode. These data structures include the following:
• A protected-mode IDT.
• A GDT.
• A TSS.
• (Optional.) An LDT.
• If paging is to be used, at least one page directory and one page table.
• A code segment that contains the code to be executed when the processor switches to protected mode.
• One or more code modules that contain the necessary interrupt and exception handlers.
Software initialization code must also initialize the following system registers before the processor can be switched to protected mode:
• The GDTR.
• (Optional.) The IDTR. This register can also be initialized immediately after switching to protected mode, prior to enabling interrupts.
• Control registers CR1 through CR4.
• (Pentium Pro processor only.) The memory type range registers (MTRRs).
With these data structures, code modules, and system registers initialized, the processor can be switched to protected mode by loading control register CR0 with a value that sets the PE flag (bit 0).
8.7.1. Protected-Mode System Data Structures
The contents of the protected-mode system data structures loaded into memory during software initialization depend largely on the type of memory management the protected-mode operating system or executive is going to support: flat, flat with paging, segmented, or segmented with paging.
To implement a flat memory model without paging, software initialization code must at a minimum load a GDT with one code and one data-segment descriptor. A null descriptor in the first GDT entry is also required. The stack can be placed in a normal read/write data segment, so no dedicated descriptor for the stack is required.
A flat memory model with paging also requires a page directory and at least one page table (unless all pages are 4 MBytes, in which case only a page directory is required). Refer to Section 8.7.3., Initializing Paging.
Before the GDT can be used, the base address and limit for the GDT must be loaded into the GDTR register using an LGDT instruction.
A multisegmented model may require additional segments for the operating system, as well as segments and LDTs for each application program. LDTs require segment descriptors in the GDT. Some operating systems allocate new segments and LDTs as they are needed. This provides maximum flexibility for handling a dynamic programming environment. However, many operating systems use a single LDT for all tasks, allocating GDT entries in advance. An embedded system, such as a process controller, might pre-allocate a fixed number of segments and LDTs for a fixed number of application programs. This would be a simple and efficient way to structure the software environment of a real-time system.
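The minimum GDT for a flat model without paging (null, code, and data descriptors) can be illustrated by packing the descriptors as 64-bit values. This Python sketch is an assumption-laden model: the function names are hypothetical, the access bytes 9AH (execute/read code) and 92H (read/write data) and the flag nibble CH (G = 1, D/B = 1) are typical choices for a flat 4-GByte segment, and a real system may choose differently.

```python
def encode_descriptor(base, limit, access, flags):
    """Pack a segment descriptor into a 64-bit integer.

    limit is the 20-bit limit, access is descriptor byte 5, and flags is
    the high nibble of byte 6 (G, D/B, reserved, AVL).
    """
    desc = limit & 0xFFFF                   # limit 15:0
    desc |= (base & 0xFFFFFF) << 16         # base 23:0
    desc |= (access & 0xFF) << 40           # P, DPL, S, type
    desc |= ((limit >> 16) & 0xF) << 48     # limit 19:16
    desc |= (flags & 0xF) << 52             # G, D/B, 0, AVL
    desc |= ((base >> 24) & 0xFF) << 56     # base 31:24
    return desc


FLAT_GDT = [
    0,                                          # entry 0: required null descriptor
    encode_descriptor(0, 0xFFFFF, 0x9A, 0xC),   # entry 1: code, base 0, 4 GBytes
    encode_descriptor(0, 0xFFFFF, 0x92, 0xC),   # entry 2: data and stack
]


def gdtr_limit(gdt):
    """Limit value for LGDT: table size in bytes minus one."""
    return 8 * len(gdt) - 1
```

The data descriptor produced here, 00CF9200 0000FFFFH, has the same two dwords that the STARTUP.ASM listing in Section 8.9.2. names LINEAR_PROTO_HI and LINEAR_PROTO_LO.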
8.7.2. Initializing Protected-Mode Exceptions and Interrupts
Software initialization code must at a minimum load a protected-mode IDT with a gate descriptor for each exception vector that the processor can generate. If interrupt or trap gates are used, the gate descriptors can all point to the same code segment, which contains the necessary exception handlers. If task gates are used, one TSS and accompanying code, data, and task segments are required for each exception handler called with a task gate.
If hardware allows interrupts to be generated, gate descriptors must be provided in the IDT for one or more interrupt handlers.
Before the IDT can be used, the base address and limit for the IDT must be loaded into the IDTR register using an LIDT instruction. This operation is typically carried out immediately after switching to protected mode.
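A protected-mode IDT built from interrupt gates that all point into one code segment can be sketched as follows. The Python function names and the handler addresses are hypothetical; the sketch assumes 32-bit interrupt gates (type/attribute byte 8EH: present, DPL 0) and treats vectors 0 through 31 as the exception range.

```python
def encode_interrupt_gate(offset, selector, type_attr=0x8E):
    """Pack an interrupt or trap gate into a 64-bit integer."""
    gate = offset & 0xFFFF                   # handler offset 15:0
    gate |= (selector & 0xFFFF) << 16        # code-segment selector
    gate |= (type_attr & 0xFF) << 40         # P, DPL, gate type
    gate |= ((offset >> 16) & 0xFFFF) << 48  # handler offset 31:16
    return gate


def build_exception_idt(handler_offsets, code_selector):
    """Return one gate per handler offset, all in the same code segment."""
    return [encode_interrupt_gate(off, code_selector) for off in handler_offsets]


def idtr_limit(idt):
    """Limit value for LIDT: table size in bytes minus one."""
    return 8 * len(idt) - 1
```

With 32 gates the limit loaded by LIDT is 255, and passing 8FH as type_attr would produce trap gates instead of interrupt gates.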
8.7.3. Initializing Paging
Paging is controlled by the PG flag in control register CR0. When this flag is clear (its state following a hardware reset), the paging mechanism is turned off; when it is set, paging is enabled. Before setting the PG flag, the following data structures and registers must be initialized:
• Software must load at least one page directory and one page table into physical memory. The page table can be eliminated if the page directory contains a directory entry pointing to itself (here, the page directory and page table reside in the same page), or if only 4-MByte pages are used.
• Control register CR3 (also called the PDBR register) is loaded with the physical base address of the page directory.
• (Optional) Software may provide one set of code and data descriptors in the GDT or in an LDT for supervisor mode and another set for user mode.
With this paging initialization complete, paging is enabled and the processor is switched to protected mode at the same time by loading control register CR0 with an image in which the PG and PE flags are set. (Paging cannot be enabled before the processor is switched to protected mode.)
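The paging structures described above can be modeled with one page directory and one page table that identity-map the first 4 MBytes. This Python sketch is illustrative only: the function names and the page-table address 2000H are hypothetical, and only the P (bit 0) and R/W (bit 1) flags of each 4-KByte entry are modeled.

```python
PAGE_SIZE = 4096
ENTRY_PRESENT = 1 << 0    # P flag
ENTRY_WRITABLE = 1 << 1   # R/W flag
CR0_PE = 1 << 0
CR0_PG = 1 << 31


def build_identity_map(page_table_addr):
    """Return (page_directory, page_table) mapping linear 0..4 MBytes 1:1."""
    if page_table_addr % PAGE_SIZE:
        raise ValueError("page table must be 4-KByte aligned")
    page_table = [(i * PAGE_SIZE) | ENTRY_PRESENT | ENTRY_WRITABLE
                  for i in range(1024)]
    page_directory = [0] * 1024
    page_directory[0] = page_table_addr | ENTRY_PRESENT | ENTRY_WRITABLE
    return page_directory, page_table


def translate(linear, page_directory, tables):
    """Walk the two-level structure; tables maps physical address -> table."""
    pde = page_directory[(linear >> 22) & 0x3FF]
    if not pde & ENTRY_PRESENT:
        raise LookupError("page fault: directory entry not present")
    pte = tables[pde & 0xFFFFF000][(linear >> 12) & 0x3FF]
    if not pte & ENTRY_PRESENT:
        raise LookupError("page fault: table entry not present")
    return (pte & 0xFFFFF000) | (linear & 0xFFF)


def cr0_with_paging(cr0):
    """CR0 image with both PG and PE set, as loaded by one MOV CR0."""
    return (cr0 | CR0_PG | CR0_PE) & 0xFFFFFFFF
```

In this model, CR3 would hold the physical address of page_directory, and a linear address inside the first 4 MBytes translates to itself.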
8.7.4. Initializing Multitasking
If the multitasking mechanism is not going to be used and changes between privilege levels are not allowed, it is not necessary to load a TSS into memory or to initialize the task register.
If the multitasking mechanism is going to be used and/or changes between privilege levels are allowed, software initialization code must load at least one TSS and an accompanying TSS descriptor. (A TSS is required to change privilege levels because pointers to the privileged-level 0, 1, and 2 stack segments and the stack pointers for these stacks are obtained from the TSS.) TSS descriptors must not be marked as busy when they are created; they should be marked busy by the processor only as a side-effect of performing a task switch. As with descriptors for LDTs, TSS descriptors reside in the GDT.
After the processor has switched to protected mode, the LTR instruction can be used to load a segment selector for a TSS descriptor into the task register. This instruction marks the TSS descriptor as busy, but does not perform a task switch. The processor can, however, use the TSS to locate pointers to privilege-level 0, 1, and 2 stacks. The segment selector for the TSS must be loaded before software performs its first task switch in protected mode, because a task switch copies the current task state into the TSS.
After the LTR instruction has been executed, further operations on the task register are performed by task switching. As with other segments and LDTs, TSSs and TSS descriptors can be either pre-allocated or allocated as needed.
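The busy-flag behavior described above can be modeled on a TSS descriptor held as a 64-bit value. In this Python sketch the function names are hypothetical; it assumes a 32-bit TSS with access byte 89H (present, DPL 0, type 1001B = available), a minimum limit of 67H, and the busy flag in bit 1 of the type field (descriptor bit 41). A ValueError stands in for the general-protection exception raised when LTR references a busy TSS.

```python
TSS_BUSY = 1 << 41  # bit 1 of the type field in the access byte


def encode_tss_descriptor(base, limit=0x67):
    """Pack an available (not busy) 32-bit TSS descriptor."""
    desc = limit & 0xFFFF
    desc |= (base & 0xFFFFFF) << 16
    desc |= 0x89 << 40                      # P = 1, DPL = 0, type = 1001B
    desc |= ((limit >> 16) & 0xF) << 48
    desc |= ((base >> 24) & 0xFF) << 56
    return desc


def ltr(gdt, selector):
    """Model LTR: mark the selected TSS descriptor busy, no task switch."""
    index = selector >> 3
    if gdt[index] & TSS_BUSY:
        raise ValueError("#GP: TSS descriptor is already busy")
    gdt[index] |= TSS_BUSY
    return selector
```

Creating the descriptor with type 1001B and letting the model of LTR set the busy flag follows the rule that software never marks a TSS busy itself.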
8.8. MODE SWITCHING
To use the processor in protected mode, a mode switch must be performed from real-address mode. Once in protected mode, software generally does not need to return to real-address mode. To run software written for real-address mode (8086 mode), it is generally more convenient to use virtual-8086 mode than to switch back to real-address mode.
8.8.1. Switching to Protected Mode
Before switching to protected mode, a minimum set of system data structures and code modules must be loaded into memory, as described in Section 8.7., Software Initialization for Protected-Mode Operation. Once these tables are created, software initialization code can switch into protected mode.
Protected mode is entered by executing a MOV CR0 instruction that sets the PE flag in the CR0 register. (In the same instruction, the PG flag in register CR0 can be set to enable paging.) Execution in protected mode begins with a CPL of 0.
The 32-bit Intel Architecture processors have slightly different requirements for switching to protected mode. To ensure upwards and downwards code compatibility with all 32-bit Intel Architecture processors, it is recommended that the following steps be performed:
1. Disable interrupts. A CLI instruction disables maskable hardware interrupts. NMI interrupts can be disabled with external circuitry. (Software must guarantee that no exceptions or interrupts are generated during the mode switching operation.)
2. Execute the LGDT instruction to load the GDTR register with the base address of the GDT.
3. Execute a MOV CR0 instruction that sets the PE flag (and optionally the PG flag) in control register CR0.
4. Immediately following the MOV CR0 instruction, execute a far JMP or far CALL instruction. (This operation is typically a far jump or call to the next instruction in the instruction stream.) The JMP or CALL instruction immediately after the MOV CR0 instruction changes the flow of execution and serializes the processor. If paging is enabled, the code for the MOV CR0 instruction and the JMP or CALL instruction must come from a page that is identity mapped (that is, the linear address before the jump is the same as the physical address after paging and protected mode is enabled). The target instruction for the JMP or CALL instruction does not need to be identity mapped.
5. If a local descriptor table is going to be used, execute the LLDT instruction to load the segment selector for the LDT in the LDTR register.
6. Execute the LTR instruction to load the task register with a segment selector to the initial protected-mode task or to a writable area of memory that can be used to store TSS information on a task switch.
7. After entering protected mode, the segment registers continue to hold the contents they had in real-address mode. The JMP or CALL instruction in step 4 resets the CS register. Perform one of the following operations to update the contents of the remaining segment registers.
   • Reload segment registers DS, SS, ES, FS, and GS. If the ES, FS, and/or GS registers are not going to be used, load them with a null selector.
   • Perform a JMP or CALL instruction to a new task, which automatically resets the values of the segment registers and branches to a new code segment.
8. Execute the LIDT instruction to load the IDTR register with the address and limit of the protected-mode IDT.
9. Execute the STI instruction to enable maskable hardware interrupts and perform the necessary hardware operation to enable NMI interrupts.
Random failures can occur if other instructions exist between steps 3 and 4 above. Failures will be readily seen in some situations, such as when instructions that reference memory are inserted between steps 3 and 4 while in System Management mode.
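The ordering constraints in the steps above can be expressed as a small checker. The following Python sketch is a hypothetical review aid, not part of any Intel tool: the step labels and the function name are invented for illustration, "FAR JMP" stands for the far JMP or far CALL of step 4, and "RELOAD SEGMENTS" stands for either option in step 7.

```python
RECOMMENDED_ORDER = [
    "CLI", "LGDT", "MOV CR0", "FAR JMP", "LLDT",
    "LTR", "RELOAD SEGMENTS", "LIDT", "STI",
]
OPTIONAL_STEPS = {"LLDT"}  # needed only when an LDT is going to be used


def check_switch_sequence(steps):
    """Return a list of problems found in a proposed step sequence."""
    problems = []
    known = [s for s in steps if s in RECOMMENDED_ORDER]
    for step in RECOMMENDED_ORDER:
        if step not in OPTIONAL_STEPS and step not in known:
            problems.append("missing step: " + step)
    positions = [RECOMMENDED_ORDER.index(s) for s in known]
    if positions != sorted(positions):
        problems.append("steps are out of order")
    if "MOV CR0" in steps:
        i = steps.index("MOV CR0")
        # Nothing may be placed between steps 3 and 4.
        if i + 1 >= len(steps) or steps[i + 1] != "FAR JMP":
            problems.append("far JMP must immediately follow MOV CR0")
    return problems
```

Any entry that is not one of the labels above is treated as an unrelated instruction, so placing one between "MOV CR0" and "FAR JMP" is reported as a violation of the rule about steps 3 and 4.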
8.8.2. Switching Back to Real-Address Mode
The processor switches back to real-address mode if software clears the PE bit in the CR0 register with a MOV CR0 instruction. A procedure that re-enters real-address mode should perform the following steps:
1. Disable interrupts. A CLI instruction disables maskable hardware interrupts. NMI interrupts can be disabled with external circuitry.
2. If paging is enabled, perform the following operations:
   • Transfer program control to linear addresses that are identity mapped to physical addresses (that is, linear addresses equal physical addresses).
   • Ensure that the GDT and IDT are in identity mapped pages.
   • Clear the PG bit in the CR0 register.
   • Move 0H into the CR3 register to flush the TLB.
3. Transfer program control to a readable segment that has a limit of 64 KBytes (FFFFH). This operation loads the CS register with the segment limit required in real-address mode.
4. Load segment registers SS, DS, ES, FS, and GS with a selector for a descriptor containing the following values, which are appropriate for real-address mode:
   • Limit = 64 KBytes (0FFFFH)
   • Byte granular (G = 0)
   • Expand up (E = 0)
   • Writable (W = 1)
   • Present (P = 1)
   • Base = any value
   The segment registers must be loaded with nonnull segment selectors or the segment registers will be unusable in real-address mode. Note that if the segment registers are not reloaded, execution continues using the descriptor attributes loaded during protected mode.
5. Execute an LIDT instruction to point to a real-address mode interrupt table that is within the 1-MByte real-address mode address range.
6. Clear the PE flag in the CR0 register to switch to real-address mode.
7. Execute a far JMP instruction to jump to a real-address mode program. This operation flushes the instruction queue and loads the appropriate base and access rights values in the CS register.
8. Load the SS, DS, ES, FS, and GS registers as needed by the real-address mode code. If any of the registers are not going to be used in real-address mode, write 0s to them.
9. Execute the STI instruction to enable maskable hardware interrupts and perform the necessary hardware operation to enable NMI interrupts.
NOTE
All the code that is executed in steps 1 through 9 must be in a single page and the linear addresses in that page must be identity mapped to physical addresses.
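The descriptor values required in step 4 can be checked mechanically. This Python sketch builds and validates a data-segment descriptor with those attributes; the function names are hypothetical, and the access byte 92H (P = 1, DPL = 0, S = 1, data, E = 0, W = 1) is one encoding that satisfies the listed values.

```python
def real_mode_data_descriptor(base=0):
    """Descriptor with limit FFFFH, G = 0, E = 0, W = 1, P = 1."""
    desc = 0xFFFF                           # limit 15:0 (limit 19:16 stays 0)
    desc |= (base & 0xFFFFFF) << 16         # base 23:0
    desc |= 0x92 << 40                      # P = 1, S = 1, data, W = 1
    desc |= ((base >> 24) & 0xFF) << 56     # base 31:24; G and D/B stay 0
    return desc


def is_real_mode_compatible(desc):
    """Check a data-segment descriptor against the values in step 4."""
    limit = (desc & 0xFFFF) | (((desc >> 48) & 0xF) << 16)
    access = (desc >> 40) & 0xFF
    granular = (desc >> 55) & 1
    present = (access >> 7) & 1
    is_code = (access >> 3) & 1
    expand_down = (access >> 2) & 1
    writable = (access >> 1) & 1
    return (limit == 0xFFFF and granular == 0 and present == 1
            and is_code == 0 and expand_down == 0 and writable == 1)
```

A flat 4-GByte data descriptor fails this check, since its limit exceeds FFFFH and its G flag is set; this is why the segment registers are reloaded before the PE flag is cleared.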
8.9. INITIALIZATION AND MODE SWITCHING EXAMPLE
This section provides an initialization and mode switching example that can be incorporated into an application. This code was originally written to initialize the Intel386 processor, but it will execute successfully on the Pentium Pro, Pentium, and Intel486 processors. The code in this example is intended to reside in EPROM and to run following a hardware reset of the processor. The function of the code is to do the following:
• Establish a basic real-address mode operating environment.
• Load the necessary protected-mode system data structures into RAM.
• Load the system registers with the necessary pointers to the data structures and the appropriate flag settings for protected-mode operation.
• Switch the processor to protected mode.
Figure 8-3 shows the physical memory layout for the processor following a hardware reset and the starting point of this example. The EPROM that contains the initialization code resides at the upper end of the processor's physical memory address range, starting at address FFFFFFFFH and going down from there. The address of the first instruction to be executed is at FFFFFFF0H, the default starting address for the processor following a hardware reset.
The main steps carried out in this example are summarized in Table 8-4. The source listing for the example (with the filename STARTUP.ASM) is given in Example 8-1. The line numbers given in Table 8-4 refer to the source listing.
The following are some additional notes concerning this example:
• When the processor is switched into protected mode, the original code segment base-address value of FFFF0000H (located in the hidden part of the CS register) is retained and execution continues from the current offset in the EIP register. The processor will thus continue to execute code in the EPROM until a far jump or call is made to a new code segment, at which time the base address in the CS register will be changed.
• Maskable hardware interrupts are disabled after a hardware reset and should remain disabled until the necessary interrupt handlers have been installed. The NMI interrupt is not disabled following a reset. The NMI# pin must thus be inhibited from being asserted until an NMI handler has been loaded and made available to the processor.
• The use of a temporary GDT allows simple transfer of tables from the EPROM to anywhere in the RAM area. A GDT entry is constructed with its base pointing to address 0 and a limit of 4 GBytes. When the DS and ES registers are loaded with this descriptor, the temporary GDT is no longer needed and can be replaced by the application GDT.
• This code loads one TSS and no LDTs. If more TSSs exist in the application, they must be loaded into RAM. If there are LDTs they may be loaded as well.
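[Figure 8-3: memory map. The 64K EPROM occupies FFFF 0000H through FFFF FFFFH. After reset, CS.BASE + EIP points to FFFF FFF0H (EIP = 0000 FFF0H, CS.BASE = FFFF 0000H). DS.BASE = 0H, ES.BASE = 0H, SS.BASE = 0H, and ESP = 0H, so SP, DS, SS, and ES address the bottom of memory.]
Figure 8-3. Processor State After Reset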
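The address arithmetic behind Figure 8-3 can be reproduced directly. In this Python sketch the function names are hypothetical, and the visible CS selector value F000H is an assumption taken from the architecture's documented reset state; the figure itself gives only CS.BASE and EIP.

```python
def linear_address(segment_base, offset):
    """Linear address formed from a segment base and an offset (32-bit wrap)."""
    return (segment_base + offset) & 0xFFFFFFFF


def real_mode_base(selector):
    """Base that a normal real-address mode segment load would produce."""
    return selector << 4


RESET_CS_BASE = 0xFFFF0000      # hidden base in CS after reset (Figure 8-3)
RESET_EIP = 0x0000FFF0          # EIP after reset (Figure 8-3)
RESET_CS_SELECTOR = 0xF000      # assumed visible CS selector after reset
```

The first fetch therefore comes from FFFFFFF0H, 16 bytes below the top of the EPROM, even though a selector of F000H would normally yield a base of F0000H. The hidden base keeps its reset value until CS is reloaded, which is why the startup code can keep running from the EPROM after the PE flag is set.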
Table 8-4. Main Initialization Steps in STARTUP.ASM Source Listing
STARTUP.ASM Line Numbers
From   To     Description
157    157    Jump (short) to the entry code in the EPROM
162    169    Construct a temporary GDT in RAM with one entry: 0 - null; 1 - R/W data segment, base = 0, limit = 4 GBytes
171    172    Load the GDTR to point to the temporary GDT
174    177    Load CR0 with PE flag set to switch to protected mode
179    181    Jump near to clear real mode instruction queue
184    186    Load DS, ES registers with GDT[1] descriptor, so both point to the entire physical memory space
188    195    Perform specific board initialization that is imposed by the new protected mode
196    218    Copy the application's GDT from ROM into RAM
220    238    Copy the application's IDT from ROM into RAM
241    243    Load application's GDTR
244    245    Load application's IDTR
247    261    Copy the application's TSS from ROM into RAM
263    267    Update TSS descriptor and other aliases in GDT (GDT alias or IDT alias)
277    277    Load the task register (without task switch) using LTR instruction
282    286    Load SS, ESP with the value found in the application's TSS
287    287    Push EFLAGS value found in the application's TSS
288    288    Push CS value found in the application's TSS
289    289    Push EIP value found in the application's TSS
290    293    Load DS, ES with the value found in the application's TSS
296    296    Perform IRET; pop the above values and enter the application code
8.9.1. Assembler Usage
In this example, the Intel assembler ASM386 and build tools BLD386 are used to assemble and build the initialization code module. The following assumptions are used when using the Intel ASM386 and BLD386 tools.
• The ASM386 will generate the right operand size opcodes according to the code-segment attribute. The attribute is assigned either by the ASM386 invocation controls or in the code-segment definition.
• If a code segment that is going to run in real-address mode is defined, it must be set to a USE 16 attribute. If a 32-bit operand is used in an instruction in this code segment (for example, MOV EAX, EBX), the assembler automatically generates an operand prefix for the instruction that forces the processor to execute a 32-bit operation, even though its default code-segment attribute is 16-bit.
• Intel's ASM386 assembler allows specific use of the 16- or 32-bit instructions, for example, LGDTW, LGDTD, IRETD. If the generic instruction LGDT is used, the default-segment attribute will be used to generate the right opcode.
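The effect of the code-segment attribute on the generated bytes can be illustrated for the MOV EAX, EBX example. This Python sketch is not a description of ASM386 internals: the function name is hypothetical, and it uses the 89 /r form of MOV, which is one valid encoding (an assembler may equally choose the 8B /r form).

```python
REG32 = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3,
         "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}
OPERAND_SIZE_PREFIX = 0x66


def encode_mov_r32_r32(dst, src, use16):
    """Encode MOV dst, src for 32-bit registers with opcode 89 /r.

    In a USE16 segment the operand-size prefix is added so that the
    processor performs a 32-bit operation.
    """
    modrm = 0xC0 | (REG32[src] << 3) | REG32[dst]  # mod = 11, reg = src, r/m = dst
    body = bytes([0x89, modrm])
    if use16:
        return bytes([OPERAND_SIZE_PREFIX]) + body
    return body
```

In a USE16 segment the instruction is emitted as 66 89 D8; in a USE32 segment the prefix is omitted. The DB 66H lines that precede LGDT and LIDT in the STARTUP.ASM listing insert the same prefix by hand.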
8.9.2. STARTUP.ASM Listing
The source code listing to move the processor into protected mode is provided in Example 8-1. This listing does not include any opcode and offset information.
Example 8-1. STARTUP.ASM

MS-DOS* 5.0(045-N) 386(TM) MACRO ASSEMBLER STARTUP 09:44:51 08/19/92 PAGE 1

MS-DOS 5.0(045-N) 386(TM) MACRO ASSEMBLER V4.0, ASSEMBLY OF MODULE STARTUP
OBJECT MODULE PLACED IN startup.obj
ASSEMBLER INVOKED BY: f:\386tools\ASM386.EXE startup.a58 pw (132 )

LINE    SOURCE
1       NAME    STARTUP
2
3       ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
4       ;
5       ;   ASSUMPTIONS:
6       ;
7       ;     1.  The bottom 64K of memory is ram, and can be used for
8       ;         scratch space by this module.
9       ;
10      ;     2.  The system has sufficient free usable ram to copy the
11      ;         initial GDT, IDT, and TSS
12      ;
13      ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
14
15      ; configuration data - must match with build definition
16
17      CS_BASE         EQU     0FFFF0000H
18
19      ; CS_BASE is the linear address of the segment STARTUP_CODE
20      ; - this is specified in the build language file
21
22      RAM_START       EQU     400H
23
24      ; RAM_START is the start of free, usable ram in the linear
25      ; memory space. The GDT, IDT, and initial TSS will be
26      ; copied above this space, and a small data segment will be
27      ; discarded at this linear address. The 32-bit word at
28      ; RAM_START will contain the linear address of the first
29      ; free byte above the copied tables - this may be useful if
30      ; a memory manager is used.
31
32      TSS_INDEX       EQU     10
33
34      ; TSS_INDEX is the index of the TSS of the first task to
35      ; run after startup
36
37
38      ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
39
40      ; ------------------------- STRUCTURES and EQU ---------------
41      ; structures for system data
42
43      ; TSS structure
44      TASK_STATE STRUC
45          link            DW ?
46          link_h          DW ?
47          ESP0            DD ?
48          SS0             DW ?
49          SS0_h           DW ?
50          ESP1            DD ?
51          SS1             DW ?
52          SS1_h           DW ?
53          ESP2            DD ?
54          SS2             DW ?
55          SS2_h           DW ?
56          CR3_reg         DD ?
57          EIP_reg         DD ?
58          EFLAGS_reg      DD ?
59          EAX_reg         DD ?
60          ECX_reg         DD ?
61          EDX_reg         DD ?
62          EBX_reg         DD ?
63          ESP_reg         DD ?
64          EBP_reg         DD ?
65          ESI_reg         DD ?
66          EDI_reg         DD ?
67          ES_reg          DW ?
68          ES_h            DW ?
69          CS_reg          DW ?
70          CS_h            DW ?
71          SS_reg          DW ?
72          SS_h            DW ?
73          DS_reg          DW ?
74          DS_h            DW ?
75          FS_reg          DW ?
76          FS_h            DW ?
77          GS_reg          DW ?
78          GS_h            DW ?
79          LDT_reg         DW ?
80          LDT_h           DW ?
81          TRAP_reg        DW ?
82          IO_map_base     DW ?
83      TASK_STATE ENDS
84
85      ; basic structure of a descriptor
86      DESC STRUC
87          lim_0_15        DW ?
88          bas_0_15        DW ?
89          bas_16_23       DB ?
90          access          DB ?
91          gran            DB ?
92          bas_24_31       DB ?
93      DESC ENDS
94
95      ; structure for use with LGDT and LIDT instructions
96      TABLE_REG STRUC
97          table_lim       DW ?
98          table_linear    DD ?
99      TABLE_REG ENDS
100
101     ; offset of GDT and IDT descriptors in builder generated GDT
102     GDT_DESC_OFF    EQU     1*SIZE(DESC)
103     IDT_DESC_OFF    EQU     2*SIZE(DESC)
104
105     ; equates for building temporary GDT in RAM
106     LINEAR_SEL      EQU     1*SIZE (DESC)
107     LINEAR_PROTO_LO EQU     00000FFFFH  ; LINEAR_ALIAS
108     LINEAR_PROTO_HI EQU     000CF9200H
109
110     ; Protection Enable Bit in CR0
111     PE_BIT          EQU     1B
112
113     ; ------------------------------------------------------------
114
115     ; ------------------------- DATA SEGMENT----------------------
116
117     ; Initially, this data segment starts at linear 0, according
118     ; to the processor's power-up state.
119
120     STARTUP_DATA    SEGMENT RW
121
122     free_mem_linear_base    LABEL   DWORD
123     TEMP_GDT                LABEL   BYTE    ; must be first in segment
124     TEMP_GDT_NULL_DESC      DESC    <>
125     TEMP_GDT_LINEAR_DESC    DESC    <>
126
127     ; scratch areas for LGDT and LIDT instructions
128     TEMP_GDT_SCRATCH        TABLE_REG       <>
129     APP_GDT_RAM             TABLE_REG       <>
130     APP_IDT_RAM             TABLE_REG       <>
131             ; align end_data
132     fill    DW      ?
133
134     ; last thing in this segment - should be on a dword boundary
135     end_data        LABEL   BYTE
136
137     STARTUP_DATA    ENDS
138     ; ------------------------------------------------------------
139
140
141     ; ------------------------- CODE SEGMENT----------------------
142     STARTUP_CODE SEGMENT ER PUBLIC USE16
143
144     ; filled in by builder
145             PUBLIC  GDT_EPROM
146     GDT_EPROM       TABLE_REG       <>
147
148     ; filled in by builder
149             PUBLIC  IDT_EPROM
150     IDT_EPROM       TABLE_REG       <>
151
152     ; entry point into startup code - the bootstrap will vector
153     ; here with a near JMP generated by the builder. This
154     ; label must be in the top 64K of linear memory.
155
156             PUBLIC  STARTUP
157     STARTUP:
158
159     ; DS,ES address the bottom 64K of flat linear memory
160             ASSUME  DS:STARTUP_DATA, ES:STARTUP_DATA
161     ; See Figure 8-4
162     ; load GDTR with temporary GDT
163             LEA     EBX,TEMP_GDT        ; build the TEMP_GDT in low ram,
164             MOV     DWORD PTR [EBX],0   ; where we can address
165             MOV     DWORD PTR [EBX]+4,0
166             MOV     DWORD PTR [EBX]+8, LINEAR_PROTO_LO
167             MOV     DWORD PTR [EBX]+12, LINEAR_PROTO_HI
168             MOV     TEMP_GDT_scratch.table_linear,EBX
169             MOV     TEMP_GDT_scratch.table_lim,15
170
171             DB 66H                      ; execute a 32 bit LGDT
172             LGDT    TEMP_GDT_scratch
173
174     ; enter protected mode
175             MOV     EBX,CR0
176             OR      EBX,PE_BIT
177             MOV     CR0,EBX
178
179     ; clear prefetch queue
180             JMP     CLEAR_LABEL
181     CLEAR_LABEL:
182
183     ; make DS and ES address 4G of linear memory
184             MOV     CX,LINEAR_SEL
185             MOV     DS,CX
186             MOV     ES,CX
187
188     ; do board specific initialization
189             ;
190             ;
191             ; ......
192             ;
193
194
195     ; See Figure 8-5
196     ; copy EPROM GDT to ram at:
197     ;               RAM_START + size (STARTUP_DATA)
198             MOV     EAX,RAM_START
199             ADD     EAX,OFFSET (end_data)
200             MOV     EBX,RAM_START
201             MOV     ECX, CS_BASE
202             ADD     ECX, OFFSET (GDT_EPROM)
203             MOV     ESI, [ECX].table_linear
204             MOV     EDI,EAX
205             MOVZX   ECX, [ECX].table_lim
206             MOV     APP_GDT_ram[EBX].table_lim,CX
207             INC     ECX
208             MOV     EDX,EAX
209             MOV     APP_GDT_ram[EBX].table_linear,EAX
210             ADD     EAX,ECX
211     REP MOVS        BYTE PTR ES:[EDI],BYTE PTR DS:[ESI]
212
213             ; fixup GDT base in descriptor
214             MOV     ECX,EDX
215             MOV     [EDX].bas_0_15+GDT_DESC_OFF,CX
216             ROR     ECX,16
217             MOV     [EDX].bas_16_23+GDT_DESC_OFF,CL
218             MOV     [EDX].bas_24_31+GDT_DESC_OFF,CH
219
220             ; copy EPROM IDT to ram at:
221             ; RAM_START+size(STARTUP_DATA)+SIZE (EPROM GDT)
222             MOV     ECX, CS_BASE
223             ADD     ECX, OFFSET (IDT_EPROM)
224             MOV     ESI, [ECX].table_linear
225             MOV     EDI,EAX
226             MOVZX   ECX, [ECX].table_lim
227             MOV     APP_IDT_ram[EBX].table_lim,CX
228             INC     ECX
229             MOV     APP_IDT_ram[EBX].table_linear,EAX
230             MOV     EBX,EAX
231             ADD     EAX,ECX
232     REP MOVS        BYTE PTR ES:[EDI],BYTE PTR DS:[ESI]
233
234             ; fixup IDT pointer in GDT
235             MOV     [EDX].bas_0_15+IDT_DESC_OFF,BX
236             ROR     EBX,16
237             MOV     [EDX].bas_16_23+IDT_DESC_OFF,BL
238             MOV     [EDX].bas_24_31+IDT_DESC_OFF,BH
239
240             ; load GDTR and IDTR
241             MOV     EBX,RAM_START
242             DB 66H                      ; execute a 32 bit LGDT
243             LGDT    APP_GDT_ram[EBX]
244             DB 66H                      ; execute a 32 bit LIDT
245             LIDT    APP_IDT_ram[EBX]
246
247             ; move the TSS
248             MOV     EDI,EAX
249             MOV     EBX,TSS_INDEX*SIZE(DESC)
250             MOV     ECX,GDT_DESC_OFF    ;build linear address for TSS
251             MOV     GS,CX
252             MOV     DH,GS:[EBX].bas_24_31
253             MOV     DL,GS:[EBX].bas_16_23
254             ROL     EDX,16
255             MOV     DX,GS:[EBX].bas_0_15
256             MOV     ESI,EDX
257             LSL     ECX,EBX
258             INC     ECX
259             MOV     EDX,EAX
260             ADD     EAX,ECX
261     REP MOVS        BYTE PTR ES:[EDI],BYTE PTR DS:[ESI]
262
263             ; fixup TSS pointer
264             MOV     GS:[EBX].bas_0_15,DX
265             ROL     EDX,16
266             MOV     GS:[EBX].bas_24_31,DH
267             MOV     GS:[EBX].bas_16_23,DL
268             ROL     EDX,16
269             ;save start of free ram at linear location RAMSTART
270             MOV     free_mem_linear_base+RAM_START,EAX
271
272     ;assume no LDT used in the initial task - if necessary,
273     ;code to move the LDT could be added, and should resemble
274     ;that used to move the TSS
275
276     ; load task register
277             LTR     BX      ; No task switch, only descriptor loading
278     ; See Figure 8-6
279     ; load minimal set of registers necessary to simulate task
280     ; switch
281
282
283             MOV     AX,[EDX].SS_reg     ; start loading registers
284             MOV     EDI,[EDX].ESP_reg
285             MOV     SS,AX
286             MOV     ESP,EDI             ; stack now valid
287             PUSH    DWORD PTR [EDX].EFLAGS_reg
288             PUSH    DWORD PTR [EDX].CS_reg
289             PUSH    DWORD PTR [EDX].EIP_reg
290             MOV     AX,[EDX].DS_reg
291             MOV     BX,[EDX].ES_reg
292             MOV     DS,AX       ; DS and ES no longer linear memory
293             MOV     ES,BX
294
295             ; simulate far jump to initial task
296             IRETD
297
298     STARTUP_CODE    ENDS
*** WARNING #377 IN 298, (PASS 2) SEGMENT CONTAINS PRIVILEGED INSTRUCTION(S)
299
300     END STARTUP, DS:STARTUP_DATA, SS:STARTUP_DATA
301
302
ASSEMBLY COMPLETE, 1 WARNING, NO ERRORS.
[Figure 8-4: memory map from 0 to FFFF FFFFH (4GB). Near FFFF 0000H, START: [CS.BASE+EIP] performs: jump near start, construct TEMP_GDT, LGDT, move to protected mode. In low memory, GDT_SCRATCH holds the base and limit of TEMP_GDT, which contains GDT[0] and GDT[1] (Base=0, Limit=4G). DS, ES = GDT[1].]
Figure 8-4. Constructing Temporary GDT and Switching to Protected Mode (Lines 162-172 of List File)
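[Figure 8-5: memory map with the TSS, IDT and GDT in ROM near FFFF FFFFH. The startup code moves the GDT, IDT and TSS from ROM to RAM, fixes the aliases, and executes LTR. The RAM copies (GDT RAM, IDT RAM, TSS RAM) are placed upward from RAM_START.]
Figure 8-5. Moving the GDT, IDT and TSS from ROM to RAM (Lines 196-261 of List File)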
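[Figure 8-6: the startup code performs SS = TSS.SS, ESP = TSS.ESP, PUSH TSS.EFLAG, PUSH TSS.CS, PUSH TSS.EIP, ES = TSS.ES, DS = TSS.DS, IRET. The TSS RAM copy supplies EIP, EFLAGS, ESP, ES, CS, SS and DS. The memory map shows GDT RAM (containing the IDT alias and GDT alias), IDT RAM and TSS RAM upward from RAM_START.]
Figure 8-6. Task Switching (Lines 282-296 of List File)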
8.9.3. MAIN.ASM Source Code
The file MAIN.ASM shown in Example 8-2 defines the data and stack segments for this application. It can be replaced by a main module task written in a high-level language; that task is invoked by the IRET instruction executed by STARTUP.ASM.
Example 8-2. MAIN.ASM

NAME main_module
data SEGMENT RW
        dw 1000 dup(?)
DATA ENDS
stack stackseg 800
CODE SEGMENT ER use32 PUBLIC
main_start:
        nop
        nop
        nop
CODE ENDS
END main_start, ds:data, ss:stack
8.9.4. Supporting Files
The batch file shown in Example 8-3 can be used to assemble the source code files STARTUP.ASM and MAIN.ASM and build the final application.
Example 8-3. Batch File to Assemble and Build the Application

ASM386 STARTUP.ASM
ASM386 MAIN.ASM
BLD386 STARTUP.OBJ, MAIN.OBJ buildfile(EPROM.BLD) bootstrap(STARTUP) Bootload
BLD386 performs several operations in this example:
- It allocates physical memory locations to segments and tables.
- It generates tables using the build file and the input files.
- It links object files and resolves references.
- It generates a boot-loadable file to be programmed into the EPROM.
Example 8-4 shows the build file used as an input to BLD386 to perform the above functions.
Example 8-4. Build File

INIT_BLD_EXAMPLE;

SEGMENT
        *SEGMENTS(DPL = 0)
      , startup.startup_code(BASE = 0FFFF0000H)
      ;

TASK
        BOOT_TASK(OBJECT = startup, INITIAL,DPL = 0, NOT INTENABLED)
      , PROTECTED_MODE_TASK(OBJECT = main_module,DPL = 0, NOT INTENABLED)
      ;

TABLE
        GDT (
            LOCATION = GDT_EPROM
          , ENTRY = (
                10: PROTECTED_MODE_TASK
              , startup.startup_code
              , startup.startup_data
              , main_module.data
              , main_module.code
              , main_module.stack
            )
        ),
        IDT (
            LOCATION = IDT_EPROM
        );

MEMORY
        (
            RESERVE = (0..3FFFH
                        -- Area for the GDT, IDT, TSS copied from ROM
                      , 60000H..0FFFEFFFFH)
          , RANGE = (ROM_AREA = ROM (0FFFF0000H..0FFFFFFFFH))
                        -- Eprom size 64K
          , RANGE = (RAM_AREA = RAM (4000H..05FFFFH))
        );

END
Table 8-5 shows the relationship of each build item with an ASM source file.
Table 8-5. Relationship Between BLD Item and ASM Source File

Item: Bootstrap
    ASM386 and Startup.A58: public startup / startup:
    BLD386 Controls and BLD file: bootstrap start(startup)
    Effect: Near jump at 0FFFFFFF0H to start

Item: GDT location
    ASM386 and Startup.A58: public GDT_EPROM / GDT_EPROM TABLE_REG <>
    BLD386 Controls and BLD file: TABLE GDT(location = GDT_EPROM)
    Effect: The location of the GDT will be programmed into the GDT_EPROM location

Item: IDT location
    ASM386 and Startup.A58: public IDT_EPROM / IDT_EPROM TABLE_REG <>
    BLD386 Controls and BLD file: TABLE IDT(location = IDT_EPROM)
    Effect: The location of the IDT will be programmed into the IDT_EPROM location

Item: RAM start
    ASM386 and Startup.A58: RAM_START equ 400H
    BLD386 Controls and BLD file: memory (reserve = (0..3FFFH))
    Effect: RAM_START is used as the ram destination for moving the tables. It must be excluded from the application's segment area.

Item: Location of the application TSS in the GDT
    ASM386 and Startup.A58: TSS_INDEX EQU 10
    BLD386 Controls and BLD file: TABLE GDT( ENTRY=( 10: PROTECTED_MODE_TASK))
    Effect: Put the descriptor of the application TSS in GDT entry 10

Item: EPROM size and location
    ASM386 and Startup.A58: size and location of the initialization code
    BLD386 Controls and BLD file: SEGMENT startup.code (base= 0FFFF0000H) ...memory (RANGE( ROM_AREA = ROM(x..y))
    Effect: Initialization code size must be less than 64K and resides at upper most 64K of the 4GB memory space.
8.10. P6 FAMILY MICROCODE UPDATE FEATURE

P6 family processors have the capability to correct specific errata through the loading of an Intel-supplied data block. This data block is referred to as a microcode update. This chapter describes the underlying mechanisms the BIOS needs to provide in order to utilize this feature during system initialization. It also describes a specification that provides for incorporating future releases of the microcode update into a system BIOS.

Intel considers the combination of a particular silicon revision and the microcode update as the equivalent stepping of the processor. Intel does not validate processors without the microcode update loaded. Intel completes a full-stepping level validation and testing for new releases of microcode updates.

A microcode update is used to correct specific errata in the processor. The BIOS, which incorporates an update loader, is responsible for loading the appropriate update on all processors during system initialization (refer to Figure 8-7). There are effectively two steps to this process. The first is to incorporate the necessary microcode updates into the BIOS; the second is to actually load the appropriate microcode update into the processor.
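[Figure 8-7: a new update is added to the update blocks held in the BIOS; the update loader in the BIOS loads the appropriate update block into the P6 family CPU.]
Figure 8-7. Integrating Processor Specific Updates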
8.10.1. Microcode Update

A microcode update consists of an Intel-supplied binary that contains a descriptive header and data. No executable code resides within the update. This section describes the update and the structure of its data format.

Each microcode update is tailored for a particular stepping of a P6 family processor. It is designed such that a mismatch between a stepping of the processor and the update will result in a failure to load. Thus, a given microcode update is associated with a particular type, family, model, and stepping of the processor as returned by the CPUID instruction. In addition, the intended processor platform type must be determined to properly target the microcode update.

The intended processor platform type is determined by reading a model-specific register, MSR (17h) (refer to Table 8-6), within the P6 family processor. This is a 64-bit register that may be read using the RDMSR instruction (refer to Section 3.2., Instruction Reference, Chapter 3, Instruction Set Reference, Volume 1 of the Programmer's Reference Manual). The three platform ID bits, when read as a binary coded decimal (BCD) number, indicate the bit position in the microcode update header's Processor Flags field that is associated with the installed processor.
Register Name: BBL_CR_OVRD
MSR Address: 017h
Access: Read Only

BBL_CR_OVRD is a 64-bit register accessed only when referenced as a Qword through a RDMSR instruction.

Table 8-6. P6 Family Processor MSR Register Components

Bit      Descriptions
63:53    Reserved
52:50    Platform ID bits (RO). The field gives information concerning the intended platform for the processor.
             52 51 50
              0  0  0   Processor Flag 0 (See Processor Flags in Microcode Update Header)
              0  0  1   Processor Flag 1
              0  1  0   Processor Flag 2
              0  1  1   Processor Flag 3
              1  0  0   Processor Flag 4
              1  0  1   Processor Flag 5
              1  1  0   Processor Flag 6
              1  1  1   Processor Flag 7
49:0     Reserved
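The mapping in Table 8-6 from the three platform ID bits to a single bit of the Processor Flags field can be sketched as follows. This is an illustration in Python rather than BIOS code; the function names are hypothetical, and the 64-bit register value is passed in as an integer because RDMSR itself cannot be issued from this sketch.

```python
PLATFORM_ID_SHIFT = 50   # platform ID occupies bits 52:50 of BBL_CR_OVRD (MSR 17h)
PLATFORM_ID_MASK = 0x7   # three bits wide


def platform_id(msr_17h: int) -> int:
    """Return the 3-bit platform ID from a 64-bit BBL_CR_OVRD value."""
    return (msr_17h >> PLATFORM_ID_SHIFT) & PLATFORM_ID_MASK


def processor_flag_mask(msr_17h: int) -> int:
    """Return the one-hot Processor Flags mask selected by the platform ID."""
    return 1 << platform_id(msr_17h)
```

For example, platform ID bits 1 0 1 select Processor Flag 5, which is mask 20h in the lower 8 bits of the Processor Flags field.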
The microcode update is a data block that is exactly 2048 bytes in length. The initial 48 bytes of the update contain a header with information used to identify the update. The update header and its reserved fields are interpreted by software based upon the header version. The initial version of the header is 00000001h. An encoding scheme also guards against tampering with the update data and provides a means for determining the authenticity of any given update. Table 8-7 defines each of the fields and Figure 8-8 shows the format of the microcode update data block.
Table 8-7. Microcode Update Encoding Format

Field Name         Offset (in bytes)   Length (in bytes)

Header Version     0                   4
    Version number of the update header.

Update Revision    4                   4
    Unique version number for the update, the basis for the update signature provided by the processor to indicate the current update functioning within the processor. Used by the BIOS to authenticate the update and verify that it is loaded successfully by the processor. The value in this field cannot be used for processor stepping identification alone.

Date               8                   4
    Date of the update creation in binary format: mmddyyyy (e.g. 07/18/98 is 07181998h).

Processor          12                  4
    Processor type, family, model, and stepping of processor that requires this particular update revision (e.g., 00000650h). Each microcode update is designed specifically for a given processor type, family, model, and stepping of processor. The BIOS uses the Processor field in conjunction with the CPUID instruction to determine whether or not an update is appropriate to load on a processor. The information encoded within this field exactly corresponds to the bit representations returned by the CPUID instruction.

Checksum           16                  4
    Checksum of update data and header. Used to verify the integrity of the update header and data. Checksum is correct when the summation of the 512 double words of the update results in the value zero.

Loader Revision    20                  4
    Version number of the loader program needed to correctly load this update. The initial version is 00000001h.

Processor Flags    24                  4
    Platform type information is encoded in the lower 8 bits of this 4-byte field. Each bit represents a particular platform type for a given CPUID. The BIOS uses the Processor Flags field in conjunction with the platform ID bits in MSR (17h) to determine whether or not an update is appropriate to load on a processor.

Reserved           28                  20
    Reserved Fields for future expansion.

Update Data        48                  2000
    Update data.
[Figure 8-8: layout of the update block as 32-bit rows (bit positions 32, 24, 16, 8, 0), from top to bottom: Update Data (2000 Bytes); Reserved (20 Bytes); Processor Flags (Reserved: 24; P7, P6, P5, P4, P3, P2, P1: 1 each); Loader Revision; Checksum; Processor (Reserved: 18; ProcType: 2; Family: 4; Model: 4; Stepping: 4); Date (Month: 8; Day: 8; Year: 16); Update Revision; Header Revision.]
Figure 8-8. Format of the Microcode Update Data Block
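As a reading aid for Table 8-7 and Figure 8-8, the sketch below unpacks the first seven header fields of a 2048-byte update block and splits the Date and Processor fields into their bit fields. It is a Python illustration, not part of the specification; the names UpdateHeader, parse_header, decode_date and decode_processor are hypothetical, and little-endian dword storage is assumed because the block is consumed by an Intel Architecture processor.

```python
import struct
from typing import NamedTuple

UPDATE_SIZE = 2048   # whole update block, header plus data
HEADER_SIZE = 48     # update data starts at this offset


class UpdateHeader(NamedTuple):
    header_version: int
    update_revision: int
    date: int
    processor: int
    checksum: int
    loader_revision: int
    processor_flags: int


def parse_header(update: bytes) -> UpdateHeader:
    """Unpack the seven dwords at offsets 0..27 (offsets 28..47 are reserved)."""
    if len(update) != UPDATE_SIZE:
        raise ValueError("update block must be exactly 2048 bytes")
    return UpdateHeader(*struct.unpack_from("<7I", update, 0))


def decode_date(date: int) -> tuple:
    """Split mmddyyyy (e.g. 07181998h) into month, day and year digit strings."""
    return (f"{date >> 24:02X}", f"{(date >> 16) & 0xFF:02X}", f"{date & 0xFFFF:04X}")


def decode_processor(sig: int) -> dict:
    """Split the Processor field into the bit fields shown in Figure 8-8."""
    return {
        "type": (sig >> 12) & 0x3,
        "family": (sig >> 8) & 0xF,
        "model": (sig >> 4) & 0xF,
        "stepping": sig & 0xF,
    }
```

With the example values from Table 8-7, decode_date(0x07181998) yields month 07, day 18, year 1998, and decode_processor(0x650) yields family 6, model 5, stepping 0.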
8.10.2. Microcode Update Loader

This section describes the update loader used to load a microcode update into a P6 family processor. It also discusses the requirements placed upon the BIOS to ensure proper loading of an update. The update loader contains the minimal instructions needed to load an update. The specific instruction sequence that is required to load an update is dependent upon the loader revision field contained within the update header. The revision of the update loader is expected to change very infrequently, potentially only when new processor models are introduced.
The code below represents the update loader with a loader revision of 00000001h:

        mov     ecx,79h                 ; MSR to write in ECX
        xor     eax,eax                 ; clear EAX
        xor     ebx,ebx                 ; clear EBX
        mov     ax,cs                   ; Segment of microcode update
        shl     eax,4
        mov     bx,offset Update        ; Offset of microcode update
        add     eax,ebx                 ; Linear Address of Update in EAX
        add     eax,48d                 ; Offset of the Update Data within the Update
        xor     edx,edx                 ; Zero in EDX
        WRMSR                           ; microcode update trigger
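The address arithmetic performed by the loader above can be checked with a short Python sketch. The function names are hypothetical. The second helper assumes the standard 64-kilobyte real-mode segment size and only tests whether a 2048-byte block starting at a given offset stays inside one segment.

```python
HEADER_SIZE = 48         # update data begins 48 bytes into the update block
UPDATE_SIZE = 2048       # size of the whole update block
SEGMENT_SIZE = 0x10000   # a real-mode segment spans 64 KB


def update_data_linear_address(cs: int, offset: int) -> int:
    """Mirror the loader: (CS << 4) + offset of Update + 48."""
    return ((cs & 0xFFFF) << 4) + (offset & 0xFFFF) + HEADER_SIZE


def fits_in_segment(offset: int) -> bool:
    """True if the 2048-byte block ends at or before the end of the segment."""
    return 0 <= offset and offset + UPDATE_SIZE <= SEGMENT_SIZE
```

For a BIOS code segment of F000h and an update at offset 1000h, the value placed in EAX before WRMSR would be F1030h.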
8.10.2.1. UPDATE LOADING PROCEDURE
The simple loader previously described assumes that Update is the address of a microcode update (header and data) embedded within the code segment of the BIOS. It also assumes that the processor is operating in real mode. The data may reside anywhere in memory that is accessible by the processor within its current operating mode (real, protected). Before the BIOS executes the microcode update trigger (WRMSR) instruction the following must be true:
- EAX contains the linear address of the start of the update data
- EDX contains zero
- ECX contains 79h

Other requirements to keep in mind are:

- The microcode update must be loaded to the processor early on in the POST, and always prior to the initialization of the P6 family processor's L2 cache controller.
- If the update is loaded while the processor is in real mode, then the update data may not cross a segment boundary.
- If the update is loaded while the processor is in real mode, then the update data may not exceed a segment limit.
- If paging is enabled, pages that are currently present must map the update data.
- The microcode update data does not require any particular byte or word boundary alignment.

8.10.2.2. HARD RESETS IN UPDATE LOADING
The effects of a loaded update are cleared from the processor upon a hard reset. Therefore, each time a hard reset is asserted during the BIOS POST, the update must be reloaded on all processors that observed the reset. The effects of a loaded update are, however, maintained across a processor INIT. There are no side effects caused by loading an update into a processor multiple times.
8.10.2.3. UPDATE IN A MULTIPROCESSOR SYSTEM

A multiprocessor (MP) system requires loading each processor with update data appropriate for its CPUID and platform ID bits. The BIOS is responsible for ensuring that this requirement is met, and that the loader is located in a module that is executed by all processors in the system. If a system design permits multiple steppings of P6 family processors to exist concurrently, then the BIOS must verify each individual processor against the update header information to ensure appropriate loading. Given these considerations, it is most practical to load the update during MP initialization.

8.10.2.4. UPDATE LOADER ENHANCEMENTS
The update loader presented in Section 8.10.2.1., Update Loading Procedure is a minimal implementation that can be enhanced to provide additional functionality and features. Some potential enhancements are described below:
- The BIOS can incorporate multiple updates to support multiple steppings of the P6 family processor. This feature provides for operating in a mixed stepping environment on an MP system and enables a user to upgrade to a later version of the processor. In this case, modify the loader to check the CPUID and platform ID bits of the processor that it is running on against the available headers before loading a particular update. The number of updates is only limited by the available space in the BIOS.
- A loader can load the update and test the processor to determine if the update was loaded correctly. This can be done as described in Section 8.10.3., Update Signature and Verification.
- A loader can verify the integrity of the update data by summing the double words of the update and checking that the result is zero, and can reject the update if it is not.
- A loader can provide power-on messages indicating successful loading of an update.
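The checksum enhancement in the list above amounts to a 32-bit sum over the 512 double words of the block. The following Python sketch shows that check; checksum_ok is a hypothetical name, and little-endian dwords are assumed.

```python
import struct

UPDATE_SIZE = 2048  # 512 double words


def checksum_ok(update: bytes) -> bool:
    """True when the 512 dwords of the update sum to zero modulo 2**32."""
    if len(update) != UPDATE_SIZE:
        return False
    dwords = struct.unpack("<512I", update)
    return (sum(dwords) & 0xFFFFFFFF) == 0
```

A loader built along these lines would skip any block for which the function returns False instead of triggering the update.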
8.10.3. Update Signature and Verification

The P6 family processor provides capabilities to verify the authenticity of a particular update and to identify the current update revision. This section describes the model-specific extensions of the processor that support this feature. The update verification method below assumes that the BIOS will only verify an update that is more recent than the revision currently loaded into the processor.

The CPUID instruction returns a value in a model specific register in addition to its usual register return values. The semantics of the CPUID instruction cause it to deposit an update ID value in the 64-bit model-specific register (MSR) at address 08Bh. If no update is present in the processor, the value in the MSR remains unmodified. Normally a zero value is preloaded into the MSR by software before executing the CPUID instruction. If the MSR still contains zero after executing CPUID, this indicates that no update is present.

The update ID value returned in the EDX register after a RDMSR instruction indicates the revision of the update loaded in the processor. This value, in combination with the normal CPUID value returned in the EAX register, uniquely identifies a particular update. The signature ID can be directly compared with the update revision field in the microcode update header for verification of a correct update load. No consecutive updates released for a given stepping of the P6 family processor may share the same signature. Updates for different steppings are differentiated by the CPUID value.

8.10.3.1. DETERMINING THE SIGNATURE
An update that is successfully loaded into the processor provides a signature that matches the update revision of the currently functioning update. This signature is available any time after the actual update has been loaded, and requesting this signature does not have any negative impact upon any currently loaded update. The procedure for determining this signature is:
        mov     ecx, 08Bh               ;Model Specific Register to Read in ECX
        xor     eax,eax                 ;clear EAX
        xor     edx,edx                 ;clear EDX
        WRMSR                           ;Load 0 to MSR at 8Bh
        mov     eax,1
        CPUID
        mov     ecx, 08BH               ;Model Specific Register to Read
        RDMSR                           ;Read Model Specific Register

If there is an update currently active in the processor, its update revision is returned in the EDX register after the RDMSR instruction has completed.

8.10.3.2. AUTHENTICATING THE UPDATE
An update may be authenticated by the BIOS using the signature primitive, described above, with the following algorithm:
Z = Update revision from the update header to be authenticated;
X = Current Update Signature from MSR 8Bh;
If (Z > X) Then
    Load Update that is to be authenticated;
    Y = New Signature from MSR 8Bh;
    If (Z == Y) then Success
    Else Fail
Else Fail
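The control flow of this algorithm can be exercised with a small Python model. The two callables stand in for reading the signature from MSR 8Bh and for triggering the update load; they and the FakeProcessor class are hypothetical test doubles, not an interface defined by this specification.

```python
from typing import Callable


def authenticate(update_revision: int,
                 read_signature: Callable[[], int],
                 load_update: Callable[[], None]) -> bool:
    """Return True only if the update is newer and the processor accepts it."""
    z = update_revision
    x = read_signature()      # current signature; zero means no update loaded
    if z <= x:
        return False          # not newer than what is loaded: Fail
    load_update()
    y = read_signature()      # signature after the load attempt
    return z == y


class FakeProcessor:
    """Stand-in for a CPU that keeps a signature and may reject updates."""

    def __init__(self, current: int = 0, accepts: bool = True):
        self.signature = current
        self.accepts = accepts

    def read_signature(self) -> int:
        return self.signature

    def load(self, revision: int) -> None:
        if self.accepts:
            self.signature = revision
```

For instance, a FakeProcessor holding revision 3 authenticates an update with revision 5, while one holding revision 7 fails the first comparison and is left untouched.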
The algorithm requires that the BIOS only authenticate updates that contain a numerically larger revision than the currently loaded revision, where Current Signature (X) < New Update Revision (Z). A processor with no update loaded should be considered to have a revision equal to zero. This authentication procedure relies upon the decoding provided by the processor to verify an update from a potentially hostile source. As an example, this mechanism in conjunction with other safeguards provides security for dynamically incorporating field updates into the BIOS.
8.10.4. P6 Family Processor Microcode Update Specifications

This section describes the interface that an application can use to dynamically integrate processor-specific updates into the system BIOS. In this discussion, the application is referred to as the calling program or caller.

The real mode INT15 call specification described here is an Intel extension to an OEM BIOS. This extension allows an application to read and modify the contents of the microcode update data in NVRAM. The update loader, which is part of the system BIOS, cannot be updated by the interface. All of the functions defined in the specification must be implemented for a system to be considered compliant with the specification. The INT15 functions are accessible only from real mode.

8.10.4.1. RESPONSIBILITIES OF THE BIOS
If a BIOS passes the presence test (INT 15h, AX=0D042h, BL=0h) it must implement all of the sub-functions defined in the INT 15h, AX=0D042h specification. There are no optional functions. The BIOS must load the appropriate update for each processor during system initialization.

A header version of an update block containing the value 0FFFFFFFFh indicates that the update block is unused and available for storing a new update.

The BIOS is responsible for providing a 2048 byte region of non-volatile storage (NVRAM) for each potential processor stepping within a system. This storage unit is referred to as an update block. The BIOS for a single processor system need only provide one update block to store the microcode update data. The BIOS for a multiple processor capable system needs to provide one update block for each unique processor stepping supported by the OEM's system. The BIOS is responsible for managing the NVRAM update blocks. This includes garbage collection, such as removing update blocks that exist in NVRAM for which a corresponding processor does not exist in the system. This specification only provides the mechanism for ensuring security, the uniqueness of an entry, and that stale entries are not loaded. The actual update block management is implementation specific on a per-BIOS basis. As an example, the BIOS may use update blocks sequentially in ascending order with CPU signatures sorted versus the first available block. In addition, garbage collection may be implemented as a setup option to clear all NVRAM slots or as BIOS code that searches and eliminates unused entries during boot.

The following algorithm describes the steps performed during BIOS initialization used to load the updates into the processor(s). It assumes that the BIOS ensures that no update contained within NVRAM has a header version or loader version that does not match one currently supported by the BIOS and that the update block contains a correct checksum. It also assumes that the BIOS ensures that at most one update exists for each processor stepping and that older update revisions are not allowed to overwrite more recent ones. These requirements are checked by the BIOS during the execution of the write update function of this interface. The BIOS sequentially scans through all of the update blocks in NVRAM starting with index 0. The BIOS scans until it finds an update where the processor fields in the header match the family, model, and stepping as well as the platform ID bits of the current processor.
For each processor in the system
{
    Determine the ProcType, Family, Model and Stepping via CPUID;
    Determine the Platform ID Bits by reading the BBL_CR_OVRD[52:50] MSR;
    for (I = UpdateBlock 0; I < NumOfUpdates; I++)
    {
        If ((UpdateHeader.Processor == ProcType, Family, Model and Stepping) &&
            (UpdateHeader.ProcessorFlags == Platform ID Bits))
        {
            Load UpdateHeader.UpdateData into the Processor;
            Verify that update was correctly loaded into the processor
            Go on to next processor
            Break;
        }
    }
}
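The scan for one processor can be sketched in Python as below. The function name and the tuple layout of the headers are hypothetical. The sketch converts the three-bit platform ID into a single bit before comparing it with the bit-encoded Processor Flags field, and it tests that bit with a mask rather than strict equality, on the assumption that one update header may set more than one platform bit.

```python
from typing import List, Optional, Tuple


def select_update(cpuid_signature: int,
                  platform_id: int,
                  headers: List[Tuple[int, int]]) -> Optional[int]:
    """Return the index of the first matching update block, or None.

    headers holds (processor, processor_flags) pairs read from each
    update block header, in NVRAM order starting with index 0.
    """
    flag = 1 << (platform_id & 0x7)   # translate platform ID to one flag bit
    for index, (processor, processor_flags) in enumerate(headers):
        if processor == cpuid_signature and (processor_flags & flag):
            return index
    return None
```

A BIOS following the algorithm would call this once per processor, load the block at the returned index, and then verify the load.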
Programmer's Note: The platform ID bits in the BBL_CR_OVRD MSR are encoded as a three-bit binary coded decimal field. The platform ID bits in the microcode update header are individually bit encoded. The algorithm must do a translation from one format to the other prior to doing the comparison.

When performing the INT 15h, 0D042h functions, the BIOS must assume that the caller has no knowledge about platform specific requirements. It is the responsibility of the BIOS calls to manage all chipset and platform specific prerequisites for managing the NVRAM device. When writing the update data via the write update sub-function, the BIOS must maintain implementation specific data requirements, such as the update of NVRAM checksum. The BIOS should also attempt to verify the success of write operations on the storage device used to record the update.

8.10.4.2. RESPONSIBILITIES OF THE CALLING PROGRAM
This section of the document lists the responsibilities of the calling program using the interface specifications to load microcode update(s) into BIOS NVRAM.

The calling program should call the INT 15h, 0D042h functions from a pure real mode program and should be executing on a system that is running in pure real mode. The caller should issue the presence test function (sub function 0) and verify the signature and return codes of that function. It is important that the calling program provides the required scratch RAM buffers for the BIOS and the proper stack size as specified in the interface definition.

The calling program should read any update data that already exists in the BIOS in order to make decisions about the appropriateness of loading the update. The BIOS refuses to overwrite a newer update with an older version. The update header contains information about version and processor specifics for the calling program to make an intelligent decision about loading.

There can be no ambiguous updates. The BIOS refuses to allow multiple updates for the same CPUID to exist at the same time. The BIOS also refuses to load an update for a processor that does not exist in the system.

The calling application should implement a verify function that is run after the update write function successfully completes. This function reads back the update and verifies that the BIOS returned an image identical to the one that was written. The following pseudo-code represents a calling program.
INT 15 D042 Calling Program Pseudo-code

//
// We must be in real mode
//
If the system is not in Real mode then Exit
//
// Detect the presence of Genuine Intel processor(s) that can be updated (CPUID)
//
If no Intel processors exist that can be updated then Exit
//
// Detect the presence of the Intel microcode update extensions
//
If the BIOS fails the PresenceTest then Exit
//
// If the APIC is enabled, see if any other processors are out there
//
Read APICBaseMSR
If APIC enabled
{
    Send Broadcast Message to all processors except self via APIC;
    Have all processors execute CPUID and record Type, Family, Model, Stepping
    Have all processors read BBL_CR_OVRD[52:50] and record platform ID bits
    If current processor is not updatable then Exit
}
//
// Determine the number of unique update slots needed for this system
//
NumSlots = 0;
For each processor
{
    If ((this is a unique processor stepping) and
        (we have an update in the database for this processor))
    {
        Checksum the update from the database;
        If Checksum fails then Exit;
        Increment NumSlots;
    }
}
//
// Do we have enough update slots for all CPUs?
//
If there are more unique processor steppings than update slots provided by the BIOS then Exit
//
// Do we need any update slots at all? If not, then we're all done
//
If (NumSlots == 0) then Exit
//
// Record updates for processors in NVRAM.
//
For (I=0; I<NumSlots; I++)
{
    //
    // Load each Update
    //
    Issue the WriteUpdate function
    If (STORAGE_FULL) returned
    {
        Display Error -- BIOS is not managing NVRAM appropriately
        exit
    }
    If (INVALID_REVISION) returned
    {
        Display Message: More recent update already loaded in NVRAM for this stepping
        continue;
    }
    If any other error returned
    {
        Display Diagnostic
        exit
    }
    //
    // Verify the update was loaded correctly
    //
    Issue the ReadUpdate function
    If an error occurred
    {
        Display Diagnostic
        exit
    }
    //
    // Compare the Update read to that written
    //
    if (Update read != Update written)
    {
        Display Diagnostic
        exit
    }
}
//
// Enable Update Loading, and inform user
//
Issue the ControlUpdate function with Task=Enable.
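The slot-counting step of the pseudo-code can be illustrated in Python as follows. The function name is hypothetical, and so is the choice to identify a unique processor stepping by the pair (CPUID signature, platform ID); the pseudo-code itself does not define the key.

```python
from typing import Dict, Iterable, Tuple

ProcessorKey = Tuple[int, int]   # (CPUID signature, platform ID), an assumed key


def count_slots(processors: Iterable[ProcessorKey],
                database: Dict[ProcessorKey, bytes]) -> Tuple[int, int]:
    """Return (unique steppings found, NumSlots needed).

    NumSlots counts only unique steppings for which the caller's
    database holds an update.
    """
    unique = set(processors)
    num_slots = sum(1 for key in unique if key in database)
    return len(unique), num_slots
```

A caller would compare the first value with the update count returned by the presence test, and exit early when the second value is zero.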
8.10.4.3. MICROCODE UPDATE FUNCTIONS
Table 8-8 defines the current P6 family Processor microcode update functions.
Table 8-8. Microcode Update Functions

Microcode Update Function   Function Number   Required/Optional   Description
Presence test               00h               Required            Returns information about the supported functions.
Write update data           01h               Required            Writes one of the update data areas (slots).
Update control              02h               Required            Globally controls the loading of updates.
Read update data            03h               Required            Reads one of the update data areas (slots).
8.10.4.4. INT 15H-BASED INTERFACE
Intel recommends that a BIOS interface be provided that allows additional microcode updates to be added to the system flash. The INT15 interface is an Intel-defined method for doing this.

The program that calls this interface is responsible for providing three 64-kilobyte RAM areas for BIOS use during calls to the read and write functions. These RAM scratch pads can be used by the BIOS for any purpose, but only for the duration of the function call. The calling routine places real mode segments pointing to the RAM blocks in the CX, DX and SI registers. Calls to functions in this interface must be made with a minimum of 32 kilobytes of stack available to the BIOS.

In general, each function returns with CF cleared and AH contains the returned status. The general return codes and other constant definitions are listed in Section 8.10.4.5., Return Codes.

The OEM Error (AL) is provided for the OEM to return additional error information specific to the platform. If the BIOS provides no additional information about the error, the OEM Error must be set to SUCCESS. The OEM Error field is undefined if AH contains either SUCCESS (00) or NOT_IMPLEMENTED (86h). In all other cases it must be set with either SUCCESS or a value meaningful to the OEM.

The following text details the functions provided by the INT15h-based interface.
Function 00h - Presence Test

This function verifies that the BIOS has implemented the required microcode update functions. Table 8-9 lists the parameters and return codes for the function.

Table 8-9. Parameters for the Presence Test

Input:
    AX     Function Code       0D042h
    BL     Sub-function        00h - Presence Test

Output:
    CF     Carry Flag          Carry Set - Failure - AH Contains Status. Carry Clear - All return values are valid.
    AH     Return Code
    AL     OEM Error           Additional OEM Information.
    EBX    Signature Part 1    INTE - Part one of the signature.
    ECX    Signature Part 2    LPEP - Part two of the signature.
    EDX    Loader Version      Version number of the microcode update loader.
    SI     Update Count        Number of update blocks the system can record in NVRAM.

Return Codes: (See Table 8-8 for code definitions)
    SUCCESS            Function completed successfully.
    NOT_IMPLEMENTED    Function not implemented.
In order to assure that the BIOS function is present, the caller must verify the Carry Flag, the Return Code, and the 64-bit signature. Each update block is exactly 2048 bytes in length. The update count reflects the number of update blocks available for storage within non-volatile RAM. The update count must return with a value greater than or equal to the number of unique processor steppings currently installed within the system. The loader version number refers to the revision of the update loader program that is included in the system BIOS image.
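The three checks named above (Carry Flag, Return Code, 64-bit signature) can be sketched in Python as a predicate over the register values returned by the call. The function name is hypothetical. The packing of the characters INTE and LPEP into EBX and ECX, with the first character in the most significant byte, is an assumption of this sketch; the table does not state the byte order.

```python
SUCCESS = 0x00           # return code for success, from the interface text
NOT_IMPLEMENTED = 0x86   # return code when the function is absent

# Assumed encoding: first character in the most significant byte.
SIGNATURE_PART1 = int.from_bytes(b"INTE", "big")
SIGNATURE_PART2 = int.from_bytes(b"LPEP", "big")


def presence_test_ok(carry: bool, ah: int, ebx: int, ecx: int) -> bool:
    """True when CF is clear, AH is SUCCESS and both signature halves match."""
    return (not carry
            and ah == SUCCESS
            and ebx == SIGNATURE_PART1
            and ecx == SIGNATURE_PART2)
```

Only after this predicate holds should a caller trust the loader version in EDX and the update count in SI.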
Function 01h - Write Microcode Update Data

This function integrates a new microcode update into the BIOS storage device. Table 8-10 lists the parameters and return codes for the function.

Table 8-10. Parameters for the Write Update Data Function

Input:
    AX       Function Code     0D042h
    BL       Sub-function      01h - Write Update
    ES:DI    Update Address    Real Mode pointer to the Intel Update structure. This buffer is 2048 bytes in length.
    CX       Scratch Pad1      Real Mode Segment address of 64 kilobytes of RAM Block.
    DX       Scratch Pad2      Real Mode Segment address of 64 kilobytes of RAM Block.
    SI       Scratch Pad3      Real Mode Segment address of 64 kilobytes of RAM Block.
    SS:SP    Stack pointer     32 kilobytes of Stack Minimum.

Output:
    CF       Carry Flag        Carry Set - Failure - AH Contains Status. Carry Clear - All return values are valid.
    AH       Return Code       Status of the Call
    AL       OEM Error         Additional OEM Information.

Return Codes: (See Table 8-8 for code definitions)
    SUCCESS              Function completed successfully.
    WRITE_FAILURE        A failure because of the inability to write the storage device.
    ERASE_FAILURE        A failure because of the inability to erase the storage device.
    READ_FAILURE         A failure because of the inability to read the storage device.
    STORAGE_FULL         The BIOS non-volatile storage area is unable to accommodate the update because all available update blocks are filled with updates that are needed for processors in the system.
    CPU_NOT_PRESENT      The processor stepping does not currently exist in the system.
    INVALID_HEADER       The update header contains a header or loader version that is not recognized by the BIOS.
    INVALID_HEADER_CS    The update does not checksum correctly.
    SECURITY_FAILURE     The processor rejected the update.
    INVALID_REVISION     The same or more recent revision of the update exists in the storage device.
The BIOS is responsible for selecting an appropriate update block in the non-volatile storage for storing the new update. The BIOS is also responsible for ensuring the integrity of the information provided by the caller, including authenticating the proposed update before incorporating it into storage.
Before writing the update block into NVRAM, the BIOS should ensure that the update structure meets the following criteria in the following order:
1. The update header version should be equal to an update header version recognized by the BIOS.
2. The update loader version in the update header should be equal to the update loader version contained within the BIOS image.
3. The update block should checksum to zero. This checksum is computed as a 32-bit summation of all 512 double words in the structure, including the header.

The BIOS selects an update block in non-volatile storage for storing the candidate update. The BIOS can select any available update block as long as it guarantees that only a single update exists for any given processor stepping in non-volatile storage. If the update block selected already contains an update, the following additional criteria apply to overwrite it:
- The processor signature in the proposed update should be equal to the processor signature in the header of the current update in NVRAM (CPUID + platform ID bits).
- The update revision in the proposed update should be greater than the update revision in the header of the current update in NVRAM.
If no unused update blocks are available and the above criteria are not met, the BIOS can overwrite an update block for a processor stepping that is no longer present in the system. This can be done by scanning the update blocks and comparing the processor steppings, identified in the MP Specification table, to the processor steppings that currently exist in the system.

Finally, before storing the proposed update into NVRAM, the BIOS should verify the authenticity of the update via the mechanism described in Section 8.10.2., Microcode Update Loader. This includes loading the update into the current processor, executing the CPUID instruction, reading MSR 08Bh, and comparing a calculated value with the update revision in the proposed update header for equality.

When performing the write update function, the BIOS should record the entire update, including the header and the update data. When writing an update, the original contents may be overwritten, assuming the above criteria have been met. It is the responsibility of the BIOS to ensure that more recent updates are not overwritten through the use of this BIOS call, and that only a single update exists within the NVRAM for any processor stepping. Figure 8-9 shows the process the BIOS follows to choose an update block and ensure the integrity of the data when it stores the new microcode update.
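The checksum criterion above (a 32-bit summation of all 512 doublewords that must equal zero) can be illustrated with a short Python sketch. This is not BIOS code: the function names are hypothetical, the block contents are synthetic, and little-endian doubleword order is assumed because the structure is consumed by Intel processors.

```python
import struct

UPDATE_BLOCK_SIZE = 2048                    # bytes per update block (from the text)
DWORDS_PER_BLOCK = UPDATE_BLOCK_SIZE // 4   # 512 doublewords


def update_checksum(block: bytes) -> int:
    """Return the 32-bit sum of all 512 doublewords (assumed little-endian)."""
    if len(block) != UPDATE_BLOCK_SIZE:
        raise ValueError("update block must be exactly 2048 bytes")
    dwords = struct.unpack("<%dI" % DWORDS_PER_BLOCK, block)
    return sum(dwords) & 0xFFFFFFFF


def checksum_is_valid(block: bytes) -> bool:
    """A block passes when the 32-bit sum wraps around to zero."""
    return update_checksum(block) == 0


# Build a synthetic block: filler data plus one balancing doubleword at the end.
filler = bytes(range(256)) * 8
partial = update_checksum(filler[:-4] + b"\x00" * 4)
fixup = (-partial) & 0xFFFFFFFF
good_block = filler[:-4] + struct.pack("<I", fixup)

print(checksum_is_valid(good_block))   # True
```

Placing the balancing doubleword at the end is only a convenience of this example; the text does not say where a real update stores its checksum field.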
Write Microcode Update
1. Does the update match a CPU in the system? If no, return CPU_NOT_PRESENT.
2. Is the update header version valid? If no, return INVALID_HEADER.
3. Does the loader revision match the BIOS's loader? If no, return INVALID_HEADER.
4. Does the update checksum correctly? If no, return INVALID_HEADER_CS.
5. Is an update matching the CPU already in NVRAM?
   - If no: is space available in NVRAM? If no, return STORAGE_FULL.
   - If yes: is the update revision newer than the NVRAM update? If no, return INVALID_REVISION.
6. Does the update pass the authenticity test? If no, return SECURITY_FAILURE.
7. Update the NVRAM record and return SUCCESS.
Figure 8-9. Write Operation Flow Chart
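The decision sequence of Figure 8-9 can be sketched as a single Python function. This is an illustrative model only: the WriteRequest fields are hypothetical stand-ins for checks the BIOS performs, and the option of reclaiming a block that belongs to a stepping no longer present is folded into the free_block_available flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WriteRequest:
    cpu_present: bool               # update matches a CPU in the system
    header_version_ok: bool         # header version recognized by the BIOS
    loader_revision_ok: bool        # loader revision matches the BIOS loader
    checksum_ok: bool               # block sums to zero
    nvram_revision: Optional[int]   # revision already stored for this CPU, or None
    new_revision: int               # revision of the proposed update
    free_block_available: bool      # an unused (or reclaimable) block exists
    authentic: bool                 # processor accepted the update


def write_update_status(req: WriteRequest) -> str:
    """Return the status name that the flow chart would produce."""
    if not req.cpu_present:
        return "CPU_NOT_PRESENT"
    if not req.header_version_ok or not req.loader_revision_ok:
        return "INVALID_HEADER"
    if not req.checksum_ok:
        return "INVALID_HEADER_CS"
    if req.nvram_revision is None:
        if not req.free_block_available:
            return "STORAGE_FULL"
    elif req.new_revision <= req.nvram_revision:
        return "INVALID_REVISION"
    if not req.authentic:
        return "SECURITY_FAILURE"
    return "SUCCESS"


same_revision = WriteRequest(True, True, True, True,
                             nvram_revision=3, new_revision=3,
                             free_block_available=True, authentic=True)
print(write_update_status(same_revision))   # INVALID_REVISION
```

The checks run in the same order as the chart, so the first failing test determines the returned status.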
Function 02h - Microcode Update Control
This function enables loading of binary updates into the processor. Table 8-11 lists the parameters and return codes for the function.
Table 8-11. Parameters for the Control Update Sub-function
Input:
  AX      Function Code    0D042h
  BL      Sub-function     02h - Control Update
  BH      Task             See Description.
  CX      Scratch Pad1     Real Mode Segment of 64 kilobytes of RAM Block.
  DX      Scratch Pad2     Real Mode Segment of 64 kilobytes of RAM Block.
  SI      Scratch Pad3     Real Mode Segment of 64 kilobytes of RAM Block.
  SS:SP   Stack pointer    32 kilobytes of Stack Minimum.
Output:
  CF      Carry Flag       Carry Set - Failure - AH contains Status. Carry Clear - All return values are valid.
  AH      Return Code      Status of the Call.
  AL      OEM Error        Additional OEM Information.
  BL      Update Status    Either Enable or Disable indicator.
Return Codes: (See Table 8-14 for code definitions)
  SUCCESS        Function completed successfully.
  READ_FAILURE   A failure because of the inability to read the storage device.
This control is provided on a global basis for all updates and processors. The caller can determine the current status of update loading (enabled or disabled) without changing the state. The function does not allow the caller to disable loading of binary updates, as this poses a security risk. The caller specifies the requested operation by placing one of the values from Table 8-12 in the BH register. After successfully completing this function, the BL register contains either the enable or the disable designator. Note that if the function fails, the update status return value is undefined.
Table 8-12. Mnemonic Values
Mnemonic   Value   Meaning
Enable     1       Enable the Update loading at initialization time
Query      2       Determine the current state of the update control without changing its status.
The READ_FAILURE error code returned by this function has meaning only if the control function is implemented in the BIOS NVRAM. The state of this feature (enabled/disabled) can also be implemented using CMOS RAM bits where READ failure errors cannot occur.
Function 03h - Read Microcode Update Data
This function reads a currently installed microcode update from the BIOS storage into a caller-provided RAM buffer. Table 8-13 lists the parameters and return codes for the function.
Table 8-13. Parameters for the Read Microcode Update Data Function
Input:
  AX      Function Code    0D042h
  BL      Sub-function     03h - Read Update
  ES:DI   Buffer Address   Real Mode pointer to the Intel Update structure that will be written with the binary data.
  ECX     Scratch Pad1     Real Mode Segment address of 64 kilobytes of RAM Block (lower 16 bits).
  ECX     Scratch Pad2     Real Mode Segment address of 64 kilobytes of RAM Block (upper 16 bits).
  DX      Scratch Pad3     Real Mode Segment address of 64 kilobytes of RAM Block.
  SS:SP   Stack pointer    32 kilobytes of Stack Minimum.
  SI      Update Number    The index number of the update block to be read. This value is zero based and must be less than the update count returned from the presence test function.
Output:
  CF      Carry Flag       Carry Set - Failure - AH contains Status. Carry Clear - All return values are valid.
  AH      Return Code      Status of the Call.
  AL      OEM Error        Additional OEM Information.
Return Codes: (See Table 8-14 for code definitions)
  SUCCESS              Function completed successfully.
  READ_FAILURE         A failure because of the inability to read the storage device.
  UPDATE_NUM_INVALID   Update number exceeds the maximum number of update blocks implemented by the BIOS.
The read function enables the caller to read any update data that already exists in a BIOS and make decisions about the addition of new updates. As a result of a successful call, the BIOS copies exactly 2048 bytes into the location pointed to by ES:DI, with the contents of the update block represented by update number. An update block is considered unused and available for storing a new update if its header version contains the value 0FFFFFFFFh after return from this function call. The actual implementation of NVRAM storage management is not specified here and is BIOS dependent. As an example, the actual data value used to represent an empty block by the BIOS may be zero, rather than
0FFFFFFFFh. The BIOS is responsible for translating this information into the header provided by this function.

8.10.4.5. RETURN CODES
After the call has been made, the return codes listed in Table 8-14 are available in the AH register.
Table 8-14. Return Code Definitions
Return Code          Value   Description
SUCCESS              00h     Function completed successfully
NOT_IMPLEMENTED      86h     Function not implemented
ERASE_FAILURE        90h     A failure because of the inability to erase the storage device
WRITE_FAILURE        91h     A failure because of the inability to write the storage device
READ_FAILURE         92h     A failure because of the inability to read the storage device
STORAGE_FULL         93h     The BIOS non-volatile storage area is unable to accommodate the update because all available update blocks are filled with updates that are needed for processors in the system
CPU_NOT_PRESENT      94h     The processor stepping does not currently exist in the system
INVALID_HEADER       95h     The update header contains a header or loader version that is not recognized by the BIOS
INVALID_HEADER_CS    96h     The update does not checksum correctly
SECURITY_FAILURE     97h     The update was rejected by the processor
INVALID_REVISION     98h     The same or more recent revision of the update exists in the storage device
UPDATE_NUM_INVALID   99h     The update number exceeds the maximum number of update blocks implemented by the BIOS
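A caller that has captured the carry flag and the AH register after the call might decode the result as in the following Python sketch. The numeric values come from the return code table above; the function name and the way the register values reach Python are hypothetical.

```python
# Return code values as listed in the return code definitions table.
RETURN_CODES = {
    0x00: "SUCCESS",
    0x86: "NOT_IMPLEMENTED",
    0x90: "ERASE_FAILURE",
    0x91: "WRITE_FAILURE",
    0x92: "READ_FAILURE",
    0x93: "STORAGE_FULL",
    0x94: "CPU_NOT_PRESENT",
    0x95: "INVALID_HEADER",
    0x96: "INVALID_HEADER_CS",
    0x97: "SECURITY_FAILURE",
    0x98: "INVALID_REVISION",
    0x99: "UPDATE_NUM_INVALID",
}


def describe_status(carry_flag: bool, ah: int) -> str:
    """Combine the carry flag and AH into a readable status string."""
    name = RETURN_CODES.get(ah & 0xFF, "UNKNOWN_%02Xh" % (ah & 0xFF))
    # Carry set means failure and AH holds the status; carry clear means
    # all return values are valid.
    prefix = "failure" if carry_flag else "ok"
    return "%s: %s" % (prefix, name)


print(describe_status(False, 0x00))   # ok: SUCCESS
print(describe_status(True, 0x93))    # failure: STORAGE_FULL
```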
CHAPTER 9 MEMORY CACHE CONTROL

This chapter describes the Intel Architecture's memory cache and cache control mechanisms, the TLBs, and the write buffer. It also describes the memory type range registers (MTRRs) found in the P6 family processors and how they are used to control caching of physical memory locations.
9.1. INTERNAL CACHES, TLBS, AND BUFFERS
The Intel Architecture supports caches, translation lookaside buffers (TLBs), and write buffers for temporary on-chip (and external) storage of instructions and data (see Figure 9-1). Table 9-1 shows the characteristics of these caches and buffers for the P6 family, Pentium, and Intel486 processors. The sizes and characteristics of these units are machine specific and may change in future versions of the processor. The CPUID instruction returns the sizes and characteristics of the caches and buffers for the processor on which the instruction is executed. For more information, see CPUID - CPU Identification in Chapter 3 of the Intel Architecture Software Developer's Manual, Volume 2.
[Block diagram showing: Physical Memory; System Bus (External); Bus Interface Unit; L2 Cache (notes 2, 3); Cache Bus; Instruction Fetch Unit; Instruction TLBs; Instruction Cache (L1, note 1); Data TLBs; Data Cache Unit (L1, note 1); Write Buffer.]
NOTES:
1. For the Intel486 processor, the L1 Cache is a unified instruction and data cache.
2. For the Pentium and Intel486 processors, the L2 Cache is external to the processor package and there is no cache bus (that is, the L2 cache interfaces with the system bus).
3. For the Pentium Pro, Pentium II and Pentium III processors, the L2 Cache is internal to the processor package and there is a separate cache bus.
Figure 9-1. Intel Architecture Caches
The Intel Architecture defines two separate caches: the level 1 (L1) cache and the level 2 (L2) cache (see Figure 9-1). The L1 cache is closely coupled to the instruction fetch unit and execution units of the processor. For the Pentium and P6 family processors, the L1 cache is divided into two sections: one dedicated to caching instructions and one to caching data. For the Intel486 processor, the L1 cache is a unified instruction and data cache.
Table 9-1. Characteristics of the Caches, TLBs, and Write Buffer in Intel Architecture Processors
Cache or Buffer / Characteristics

L1 Instruction Cache (note 1)
- P6 family and Pentium processors: 8 or 16 KBytes, 4-way set associative, 32-byte cache line size; 2-way set associative for earlier Pentium processors.
- Intel486 processor: 8 or 16 KBytes, 4-way set associative, 16-byte cache line size, instruction and data cache combined.

L1 Data Cache (note 1)
- P6 family processors: 16 KBytes, 4-way set associative, 32-byte cache line size; 8 KBytes, 2-way set associative for earlier P6 family processors.
- Pentium processors: 16 KBytes, 4-way set associative, 32-byte cache line size; 8 KBytes, 2-way set associative for earlier Pentium processors.
- Intel486 processor: (see L1 instruction cache).

L2 Unified Cache (notes 2, 3)
- P6 family processors: 128 KBytes, 256 KBytes, 512 KBytes, 1 MByte, or 2 MByte, 4-way set associative, 32-byte cache line size.
- Pentium processor: System specific, typically 256 or 512 KBytes, 4-way set associative, 32-byte cache line size.
- Intel486 processor: System specific.

Instruction TLB (4-KByte Pages) (note 1)
- P6 family processors: 32 entries, 4-way set associative.
- Pentium processor: 32 entries, 4-way set associative; fully set associative for Pentium processors with MMX technology.
- Intel486 processor: 32 entries, 4-way set associative, instruction and data TLB combined.

Data TLB (4-KByte Pages) (note 1)
- Pentium and P6 family processors: 64 entries, 4-way set associative; fully set associative for Pentium processors with MMX technology.
- Intel486 processor: (see Instruction TLB).

Instruction TLB (Large Pages)
- P6 family processors: 2 entries, fully associative.
- Pentium processor: Uses same TLB as used for 4-KByte pages.
- Intel486 processor: None (large pages not supported).

Data TLB (Large Pages)
- P6 family processors: 8 entries, 4-way set associative.
- Pentium processor: 8 entries, 4-way set associative; uses same TLB as used for 4-KByte pages in Pentium processors with MMX technology.
- Intel486 processor: None (large pages not supported).

Write Buffer
- P6 family processors: 12 entries.
- Pentium processor: 2 buffers, 1 entry each (Pentium processors with MMX technology have 4 buffers for 4 entries).
- Intel486 processor: 4 entries.

NOTES:
1. In the Intel486 processor, the L1 cache is a unified instruction and data cache, and the TLB is a unified instruction and data TLB.
2. In the Intel486 and Pentium processors, the L2 cache is external to the processor package and optional.
3. In the Pentium Pro, Pentium II, and Pentium III processors, the L2 cache is internal to the processor package.
The L2 cache is a unified cache for storage of both instructions and data. It is closely coupled to the L1 cache through the processor's cache bus (for the P6 family processors) or the system bus (for the Pentium and Intel486 processors).

The cache lines for the P6 family and Pentium processors' L1 and L2 caches are 32 bytes wide. The processor always reads a cache line from system memory beginning on a 32-byte boundary. (A 32-byte aligned cache line begins at an address with its 5 least-significant bits clear.) A cache line can be filled from memory with a 4-transfer burst transaction. The caches do not support partially-filled cache lines, so caching even a single doubleword requires caching an entire line. (The cache line size for the Intel486 processor is 16 bytes.)

The L1 and L2 caches are available in all execution modes. Using these caches greatly improves the performance of the processor both in single- and multiple-processor systems. Caching can also be used in system management mode (SMM); however, it must be handled carefully. For more information, see Section 12.4.2., SMRAM Caching, in Chapter 12, System Management Mode (SMM).

The TLBs store the most recently used page-directory and page-table entries. They speed up memory accesses when paging is enabled by reducing the number of memory accesses that are required to read the page tables stored in system memory. The TLBs are divided into four groups: instruction TLBs for 4-KByte pages, data TLBs for 4-KByte pages, instruction TLBs for large pages (2-MByte or 4-MByte pages), and data TLBs for large pages. (Only 4-KByte pages are supported for Intel386 and Intel486 processors.) The TLBs are normally active only in protected mode with paging enabled. When paging is disabled or the processor is in real-address mode, the TLBs maintain their contents until explicitly or implicitly flushed. For more information, see Section 9.10., Invalidating the Translation Lookaside Buffers (TLBs).
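The 32-byte alignment rule for cache lines described above (a line begins at an address whose 5 least-significant bits are clear) can be worked through with a small Python sketch. The function names and the sample addresses are chosen for illustration only.

```python
LINE_SIZE = 32   # cache line size in bytes for P6 family and Pentium processors


def line_base(addr: int) -> int:
    """Clear the 5 least-significant bits to find the start of the line."""
    return addr & ~(LINE_SIZE - 1)


def lines_touched(addr: int, length: int) -> int:
    """Count the cache lines covered by an access of `length` bytes at `addr`."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = line_base(addr)
    last = line_base(addr + length - 1)
    return (last - first) // LINE_SIZE + 1


print(hex(line_base(0x1234)))      # 0x1220
print(lines_touched(0x1234, 4))    # 1
print(lines_touched(0x123E, 4))    # 2
```

The last call shows a doubleword that straddles a 32-byte boundary: because partially filled lines are not supported, caching it involves two whole lines.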
The write buffer is associated with the processor's instruction execution units. It allows writes to system memory and/or the internal caches to be saved and in some cases combined to optimize the processor's bus accesses. The write buffer is always enabled in all execution modes.

The processor's caches are for the most part transparent to software. When enabled, instructions and data flow through these caches without the need for explicit software control. However, knowledge of the behavior of these caches may be useful in optimizing software performance. For example, knowledge of cache dimensions and replacement algorithms gives an indication of how large a data structure can be operated on at once without causing cache thrashing.

In multiprocessor systems, maintenance of cache consistency may, in rare circumstances, require intervention by system software. For these rare cases, the processor provides privileged cache control instructions for use in flushing caches.
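As a rough illustration of how cache dimensions relate to thrashing, the following Python sketch derives the number of sets from the size, associativity, and line size listed in Table 9-1. The set-index mapping (line number modulo the number of sets) is the conventional one and is an assumption here; the text does not specify the mapping or the replacement algorithm.

```python
def cache_geometry(size_bytes: int, ways: int, line_size: int):
    """Return (number of sets, bytes spanned by one way)."""
    sets = size_bytes // (ways * line_size)
    way_span = sets * line_size
    return sets, way_span


def set_index(addr: int, sets: int, line_size: int) -> int:
    # Assumed mapping: line number modulo the number of sets.
    return (addr // line_size) % sets


# 16-KByte, 4-way set associative cache with 32-byte lines.
sets, way_span = cache_geometry(16 * 1024, ways=4, line_size=32)
print(sets, way_span)   # 128 4096

# Under the assumed mapping, addresses 4096 bytes apart share a set.
print(set_index(0x0000, sets, 32), set_index(0x1000, sets, 32))   # 0 0
```

Under these assumptions, a loop that repeatedly touches more than four lines spaced 4096 bytes apart would compete for the four ways of a single set, which is one way cache thrashing can arise.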
9.2. CACHING TERMINOLOGY
The Intel Architecture (beginning with the Pentium processor) uses the MESI (modified, exclusive, shared, invalid) cache protocol to maintain consistency with internal caches and caches in other processors. For more information, see Section 9.4., Cache Control Protocol. (The Intel486 processor uses an implementation-defined caching protocol that operates in a similar manner to the MESI protocol.)
When the processor recognizes that an operand being read from memory is cacheable, the processor reads an entire cache line into the appropriate cache (L1, L2, or both). This operation is called a cache line fill. If the memory location containing that operand is still cached the next time the processor attempts to access the operand, the processor can read the operand from the cache instead of going back to memory. This operation is called a cache hit.

When the processor attempts to write an operand to a cacheable area of memory, it first checks if a cache line for that memory location exists in the cache. If a valid cache line does exist, the processor (depending on the write policy currently in force) can write the operand into the cache instead of writing it out to system memory. This operation is called a write hit. If a write misses the cache (that is, a valid cache line is not present for the area of memory being written to), the processor performs a cache line fill (write allocation). Then it writes the operand into the cache line and (depending on the write policy currently in force) can also write it out to memory. If the operand is to be written out to memory, it is written first into the write buffer, and then written from the write buffer to memory when the system bus is available. (Note that for the Intel486 and Pentium processors, write misses do not result in a cache line fill; they always result in a write to memory. For these processors, only read misses result in cache line fills.)

When operating in a multiple-processor system, Intel Architecture processors (beginning with the Intel486 processor) have the ability to snoop other processors' accesses to system memory and to their internal caches. They use this snooping ability to keep their internal caches consistent both with system memory and with the caches in other processors on the bus. For example, in the Pentium and P6 family processors, if through snooping one processor detects that another processor intends to write to a memory location that it currently has cached in shared state, the snooping processor will invalidate its cache line, forcing it to perform a cache line fill the next time it accesses the same memory location.

Beginning with the P6 family processors, if a processor detects (through snooping) that another processor is trying to access a memory location that it has modified in its cache, but has not yet written back to system memory, the snooping processor will signal the other processor (by means of the HITM# signal) that the cache line is held in modified state and will perform an implicit write-back of the modified data. The implicit write-back is transferred directly to the initial requesting processor and snooped by the memory controller to assure that system memory has been updated. Here, the processor with the valid data may pass the data to the other processors without actually writing it to system memory; however, it is the responsibility of the memory controller to snoop this operation and update memory.
9.3. METHODS OF CACHING AVAILABLE
The processor allows any area of system memory to be cached in the L1 and L2 caches. Within individual pages or regions of system memory, it also allows the type of caching (also called memory type) to be specified, using a variety of system flags and registers. For more information, see Section 9.5., Cache Control. The caching methods currently defined for the Intel Architecture are as follows. (Table 9-2 lists which types of caching are available on specific Intel Architecture processors.)
Uncacheable (UC) - System memory locations are not cached. All reads and writes appear on the system bus and are executed in program order, without reordering. No speculative
memory accesses, page-table walks, or prefetches of speculated branch targets are made. This type of cache-control is useful for memory-mapped I/O devices. When used with normal RAM, it greatly reduces processor performance.
Table 9-2. Methods of Caching Available in P6 Family, Pentium, and Intel486 Processors
Caching Method         P6 Family Processors   Pentium Processor   Intel486 Processor
Uncacheable (UC)       Yes                    Yes                 Yes
Write Combining (WC)   Yes (note 1)           No                  No
Write Through (WT)     Yes                    Yes (note 2)        Yes (note 2)
Write Back (WB)        Yes                    Yes (note 2)        No
Write Protected (WP)   Yes (note 1)           No                  No
NOTES:
1. Requires programming of MTRRs to implement.
2. Speculative reads not supported.
Write Combining (WC) - System memory locations are not cached (as with uncacheable memory) and coherency is not enforced by the processor's bus coherency protocol. Speculative reads are allowed. Writes may be delayed and combined in the write buffer to reduce memory accesses. If the WC buffer is partially filled, the writes may be delayed until the next occurrence of a buffer or processor serialization event (for example, CPUID execution, a read or write to uncached memory, interrupt occurrence, or LOCKed instruction execution). This type of cache-control is appropriate for video frame buffers, where the order of writes is unimportant as long as the writes update memory so they can be seen on the graphics display. See Section 9.3.1., Buffering of Write Combining Memory Locations, for more information about caching the WC memory type. The preferred method of ensuring that delayed writes reach memory is to use the new SFENCE (store fence) instruction introduced in the Pentium III processor. The SFENCE instruction ensures weakly ordered writes are written to memory in order; that is, it serializes only the store operations.

Write-through (WT) - Writes and reads to and from system memory are cached. Reads come from cache lines on cache hits; read misses cause cache fills. Speculative reads are allowed. All writes are written to a cache line (when possible) and through to system memory. When writing through to memory, invalid cache lines are never filled, and valid cache lines are either filled or invalidated. Write combining is allowed. This type of cache-control is appropriate for frame buffers or when there are devices on the system bus that access system memory, but do not perform snooping of memory accesses. It enforces coherency between caches in the processors and system memory.

Write-back (WB) - Writes and reads to and from system memory are cached. Reads come from cache lines on cache hits; read misses cause cache fills. Speculative reads are allowed. Write misses cause cache line fills (in the P6 family processors), and writes are performed entirely in the cache, when possible. Write combining is allowed. The write-back memory type reduces bus traffic by eliminating many unnecessary writes to system memory. Writes to a cache line are not immediately forwarded to system memory; instead,
they are accumulated in the cache. The modified cache lines are written to system memory later, when a write-back operation is performed. Write-back operations are triggered when cache lines need to be deallocated, such as when new cache lines are being allocated in a cache that is already full. They also are triggered by the mechanisms used to maintain cache consistency. This type of cache-control provides the best performance, but it requires that all devices that access system memory on the system bus be able to snoop memory accesses to ensure system memory and cache coherency.
Write protected (WP) - Reads come from cache lines when possible, and read misses cause cache fills. Writes are propagated to the system bus and cause corresponding cache lines on all processors on the bus to be invalidated. Speculative reads are allowed. This caching option is available in the P6 family processors by programming the MTRRs (see Table 9-5).
9.3.1. Buffering of Write Combining Memory Locations
Writes to WC memory are not cached in the typical sense of the word cached. They are retained in an internal buffer that is separate from the internal L1 and L2 caches. The buffer is not snooped and thus does not provide data coherency. The write buffering is done to allow software a small window of time to supply more modified data to the buffer while remaining as nonintrusive to software as possible.

The size of the buffer is not architecturally defined; however, the Pentium Pro and Pentium II processors implement a single concurrent 32-byte buffer. The size of this buffer was chosen for implementation convenience. In the Pentium III processor there are 4 write combine buffers, each the same size as in the Pentium Pro and Pentium II processors. Buffer size and quantity may change in future generations of the P6 family processors, so software should not rely upon the current 32-byte WC buffer size, the existence of a single concurrent buffer, or the 4 buffers in the Pentium III processor.

The WC buffering of writes also causes data to be collapsed (for example, multiple writes to the same location will leave the last data written in the location and the other writes will be lost). For the Pentium Pro and Pentium II processors, once software writes to a region of memory that is addressed outside of the range of the current 32-byte buffer, the data in the buffer is automatically forwarded to the system bus and written to memory. Therefore, software that writes more than one 32-byte buffer's worth of data will ensure that the data from the first buffer's address range is forwarded to memory. The last buffer written in the sequence may be delayed by the processor longer unless the buffers are deliberately emptied. Software developers should not rely on the fact that there is only one active WC buffer at a time. Software developers creating software that is sensitive to data being delayed must deliberately empty the WC buffers and not assume the hardware will.
Once the processor has started to move data into the WC buffer, it will make a bus transaction style decision based on how much of the buffer contains valid data. If the buffer is full (for example, all 32 bytes are valid) the processor will execute a burst write transaction on the bus that will result in all 32 bytes being transmitted on the data bus in a single transaction. If one or more of the WC buffer's bytes are invalid (for example, have not been written by software) then the processor will start to move the data to memory using partial write transactions on the system bus. There will be a maximum of 4 partial write transactions for one WC buffer of data sent to memory. Once data in the WC buffer has started to be propagated to memory, the data is
subject to the weak ordering semantics of its definition. Ordering is not maintained between the successive allocation/deallocation of WC buffers (for example, writes to WC buffer 1 followed by writes to WC buffer 2 may appear as buffer 2 followed by buffer 1 on the system bus). When a WC buffer is propagated to memory as partial writes there is no guaranteed ordering between successive partial writes (for example, a partial write for chunk 2 may appear on the bus before the partial write for chunk 1 or vice versa). The only elements of WC propagation to the system bus that are guaranteed are those provided by transaction atomicity. For the P6 family processors, a completely full WC buffer will always be propagated as a single burst transaction using any of the chunk orders. In a WC buffer propagation where the data will be propagated as partials, all data contained in the same chunk (0 mod 8 aligned) will be propagated simultaneously.
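The choice between a burst write and partial writes described above can be modelled with a short Python sketch. It is a simplified model, not a description of the hardware: the 8-byte chunk size is inferred from the "0 mod 8 aligned" wording and the limit of 4 partial writes per 32-byte buffer, and the function name is hypothetical.

```python
from typing import Iterable, List

BUFFER_SIZE = 32   # bytes in one WC buffer (Pentium Pro and Pentium II)
CHUNK_SIZE = 8     # inferred: 4 chunks of 8 bytes per buffer


def wc_flush_plan(valid_offsets: Iterable[int]) -> List[str]:
    """List the bus transactions needed to empty one WC buffer."""
    valid = set(valid_offsets)
    if any(off < 0 or off >= BUFFER_SIZE for off in valid):
        raise ValueError("offsets must lie within the 32-byte buffer")
    if len(valid) == BUFFER_SIZE:
        # Every byte is valid: one burst transaction carries the whole buffer.
        return ["burst write, 32 bytes"]
    # Otherwise one partial write per chunk that holds any valid byte.
    # The list is sorted for readability only; the order in which partial
    # writes appear on the bus is not guaranteed.
    chunks = sorted({off // CHUNK_SIZE for off in valid})
    return ["partial write, chunk %d" % c for c in chunks]


print(wc_flush_plan(range(32)))      # ['burst write, 32 bytes']
print(wc_flush_plan([0, 1, 2, 9]))   # ['partial write, chunk 0', 'partial write, chunk 1']
```

In this model a buffer with even one unwritten byte falls back to partial writes, which is why filling all 32 bytes before moving on is attractive for frame-buffer style code.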
9.3.2. Choosing a Memory Type
The simplest system memory model does not use memory-mapped I/O with read or write side effects, does not include a frame buffer, and uses the write-back memory type for all memory. An I/O agent can perform direct memory access (DMA) to write-back memory and the cache protocol maintains cache coherency.

A system can use uncacheable memory for other memory-mapped I/O, and should always use uncacheable memory for memory-mapped I/O with read side effects. Dual-ported memory can be considered a write side effect, making relatively prompt writes desirable, because those writes cannot be observed at the other port until they reach the memory agent. A system can use uncacheable, write-through, or write-combining memory for frame buffers or dual-ported memory that contains pixel values displayed on a screen. Frame buffer memory is typically large (a few megabytes) and is usually written more than it is read by the processor. Using uncacheable memory for a frame buffer generates very large amounts of bus traffic, because operations on the entire buffer are implemented using partial writes rather than line writes. Using write-through memory for a frame buffer can displace almost all other useful cached lines in the processor's L2 cache and L1 data cache. Therefore, systems should use write-combining memory for frame buffers whenever possible.

Software can use page-level cache control to assign appropriate effective memory types when software will not access data structures in ways that benefit from write-back caching. For example, software may read a large data structure once and not access the structure again until the structure is rewritten by another agent. Such a large data structure should be marked as uncacheable, or reading it will evict cached lines that the processor will be referencing again. A similar example would be a write-only data structure that is written to (to export the data to another agent), but never read by software.
Such a structure can be marked as uncacheable, because software never reads the values that it writes (though as uncacheable memory, it will be written using partial writes, while as write-back memory, it will be written using line writes, which may not occur until the other agent reads the structure and triggers implicit write-backs). On the Pentium III processor, new capabilities exist that may allow the programmer to perform similar functions with the prefetch and streaming store instructions. For more information on these instructions, see Section 3.2., Instruction Reference in Chapter 3, Instruction Set Reference.
9.4. CACHE CONTROL PROTOCOL
The following section describes the cache control protocol currently defined for the Intel Architecture processors. This protocol is used by the P6 family and Pentium processors. The Intel486 processor uses an implementation-defined protocol that does not support the MESI four-state protocol, but instead uses a two-state protocol with valid and invalid states defined.

In the L1 data cache and the P6 family processors' L2 cache, the MESI (modified, exclusive, shared, invalid) cache protocol maintains consistency with caches of other processors. The L1 data cache and the L2 cache have two MESI status flags per cache line. Each line can thus be marked as being in one of the states defined in Table 9-3. In general, the operation of the MESI protocol is transparent to programs.

The L1 instruction cache implements only the SI part of the MESI protocol, because the instruction cache is not writable. The instruction cache monitors changes in the data cache to maintain consistency between the caches when instructions are modified. See Section 9.7., Self-Modifying Code, for more information on the implications of caching instructions.
Table 9-3. MESI Cache Line States
Cache Line State   This cache line is valid?   The memory copy is...   Copies exist in caches of other processors?   A write to this line...
M (Modified)       Yes                         out of date             No                                            does not go to bus
E (Exclusive)      Yes                         valid                   No                                            does not go to bus
S (Shared)         Yes                         valid                   Maybe                                         causes the processor to gain exclusive ownership of the line
I (Invalid)        No                          -                       Maybe                                         goes directly to bus
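The rows of Table 9-3 can be read as a small lookup structure. The following Python sketch is an illustration only: the names Mesi, TABLE_9_3, and write_stays_in_cache are hypothetical and are not part of any Intel interface. It transcribes the table and answers one question from its last column, namely whether a write to a line in a given state is satisfied without going to the bus.

```python
from enum import Enum


class Mesi(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


# One entry per state, transcribed from Table 9-3:
# (line valid?, memory copy, copies in other caches?, effect of a write)
TABLE_9_3 = {
    Mesi.MODIFIED: (True, "out of date", "No", "does not go to bus"),
    Mesi.EXCLUSIVE: (True, "valid", "No", "does not go to bus"),
    Mesi.SHARED: (True, "valid", "Maybe",
                  "causes the processor to gain exclusive ownership of the line"),
    Mesi.INVALID: (False, None, "Maybe", "goes directly to bus"),
}


def write_stays_in_cache(state):
    """True when Table 9-3 says a write to this line does not go to the bus."""
    return TABLE_9_3[state][3] == "does not go to bus"
```

Only the M and E states satisfy a write entirely within the cache, which is why write-back caching reduces bus traffic for lines that no other processor holds.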
9.5. CACHE CONTROL
The current Intel Architecture provides the following cache-control mechanisms for use in enabling caching and/or restricting caching to various pages or regions in memory (see Figure 9-2):
CD flag, bit 30 of control register CR0. Controls caching of system memory locations. For more information, see Section 2.5., Control Registers, in Chapter 2, System Architecture Overview. If the CD flag is clear, caching is enabled for the whole of system memory, but may be restricted for individual pages or regions of memory by other cache-control mechanisms. When the CD flag is set, caching is restricted in the L1 and L2 caches for the P6 family processors and prevented for the Pentium and Intel486 processors (see note below). With the CD flag set, however, the caches will still respond to snoop traffic. Caches should be explicitly flushed to ensure memory coherency. For highest processor performance, both the CD and the NW flags in control register CR0 should be cleared. Table 9-4 shows the interaction of the CD and NW flags.
NOTE
The effect of setting the CD flag is somewhat different for the P6 family, Pentium, and Intel486 processors (see Table 9-4). To ensure memory coherency after the CD flag is set, the caches should be explicitly flushed. For more information, see Section 9.5.2., Preventing Caching. Setting the CD flag for the P6 family processors modifies cache line fill and update behavior. Also for the P6 family processors, setting the CD flag does not force strict ordering of memory accesses unless the MTRRs are disabled and/or all memory is referenced as uncached. For more information, see Section 7.2.4., Strengthening or Weakening the Memory Ordering Model, in Chapter 7, Multiple-Processor Management.
[Figure 9-2 diagram. It relates the following controls to physical memory (up to FFFFFFFFH, see note 2), the write buffer, and the TLBs:
- CR4: the PGE flag enables global pages designated with the G flag.
- CR3: the PCD and PWT flags control caching of the page directory.
- Page-directory or page-table entry: the PCD and PWT flags control page-level caching; the G flag (see note 1) controls page-level flushing of TLBs.
- CR0: the CD and NW flags control overall caching of system memory.
- MTRRs (see note 3): control caching of selected regions of physical memory. Memory types allowed: Uncacheable (UC), Write-Protected (WP), Write-Combining (WC), Write-Through (WT), Write-Back (WB).]
1. G flag only available in P6 family processors.
2. If 36-bit physical addressing is being used, the maximum physical address size is FFFFFFFFFH.
3. MTRRs available only in P6 family processors; similar control available in Pentium processor with KEN# and WB/WT# pins, and in Intel486 processor.
Figure 9-2. Cache-Control Mechanisms Available in the Intel Architecture Processors
Table 9-4. Cache Operating Modes

CD = 0, NW = 0. Caching and Read/Write Policy: Normal highest performance cache operation.
- Read hits access the cache; read misses may cause replacement.
- Write hits update the cache.
- (Pentium and P6 family processors.) Only writes to shared lines and write misses update system memory.
- (P6 family processors.) Write misses cause cache line fills; write hits can change shared lines to exclusive under control of the MTRRs.
- (Pentium processor.) Write misses do not cause cache line fills; write hits can change shared lines to exclusive under control of WB/WT#.
- (Intel486 processor.) All writes update system memory; write misses do not cause cache line fills.
- Invalidation is allowed.
- External snoop traffic is supported.
CD = 0, NW = 1. Invalid setting. A general-protection exception (#GP) with an error code of 0 is generated.
CD = 1, NW = 0. Memory coherency is maintained.
- Read hits access the cache; read misses do not cause replacement.
- Write hits update the cache.
- (Pentium and P6 family processors.) Only writes to shared lines and write misses update system memory.
- (Intel486 processor.) All writes update system memory.
- (Pentium processor.) Write hits can change shared lines to exclusive under control of the WB/WT#.
- (P6 family processors.) Strict memory ordering is not enforced unless the MTRRs are disabled and/or all memory is referenced as uncached. For more information, see Section 7.2.4., Strengthening or Weakening the Memory Ordering Model.
- Invalidation is allowed.
- External snoop traffic is supported.
CD = 1, NW = 1. Memory coherency is not maintained. This is the state of the processor after a power up or reset.
- Read hits access the cache; read misses do not cause replacement.
- Write hits update the cache.
- (Pentium and P6 family processors.) Write hits change exclusive lines to modified.
- (Pentium and P6 family processors.) Shared lines remain shared after write hit.
- Write misses access memory.
- (P6 family processors.) Strict memory ordering is not enforced unless the MTRRs are disabled and/or all memory is referenced as uncached. For more information, see Section 7.2.4., Strengthening or Weakening the Memory Ordering Model.
- Invalidation is inhibited when snooping; but is allowed with INVD and WBINVD instructions.
- External snoop traffic is supported.
L1 Yes Yes Yes Yes Yes Yes Yes Yes NA Yes Yes NA L21 Yes Yes Yes Yes
Yes Yes Yes Yes Yes Yes
Yes Yes Yes
Yes
Yes Yes
Yes Yes
Yes No
Yes Yes
NOTE: 1. The P6 family processors are the only Intel Architecture processors that contain an integrated L2 cache. The L2 column in this table is definitive for the P6 family processors. It is intended to represent what could be implemented in a Pentium processor based system with a platform specific write-back L2 cache.
NW flag, bit 29 of control register CR0. Controls the write policy for system memory locations. For more information, see Section 2.5., Control Registers, in Chapter 2, System Architecture Overview. If the NW and CD flags are clear, write-back is enabled for the whole of system memory (write-through for the Intel486 processor), but may be restricted for individual pages or regions of memory by other cache-control mechanisms. Table 9-4 shows how the other combinations of CD and NW flags affect caching.
NOTE
For the Pentium processor, when the L1 cache is disabled (the CD and NW flags in control register CR0 are set), external snoops are accepted in DP (dual-processor) systems and inhibited in uniprocessor systems. When snoops are inhibited, address parity is not checked and APCHK# is not asserted for a corrupt address; however, when snoops are accepted, address parity is checked and APCHK# is asserted for corrupt addresses.
PCD flag in the page-directory and page-table entries. Controls caching for individual page tables and pages, respectively. For more information, see Section 3.6.4., Page-Directory and Page-Table Entries, in Chapter 3, Protected-Mode Memory Management. This flag only has effect when paging is enabled and the CD flag in control register CR0 is clear. The PCD flag enables caching of the page table or page when clear and prevents caching when set.
PWT flag in the page-directory and page-table entries. Controls the write policy for individual page tables and pages, respectively. For more information, see Section 3.6.4., Page-Directory and Page-Table Entries, in Chapter 3, Protected-Mode Memory Management. This flag only has effect when paging is enabled and the NW flag in control register CR0 is clear. The PWT flag enables write-back caching of the page table or page when clear and write-through caching when set.
PCD and PWT flags in control register CR3. Control the global caching and write policy for the page directory. For more information, see Section 2.5., Control Registers, in Chapter 2, System Architecture Overview. The PCD flag enables caching of the page directory when clear and prevents caching when set. The PWT flag enables write-back caching of the page directory when clear and write-through caching when set. These flags do not affect the caching and write policy for individual page tables. These flags only have effect when paging is enabled and the CD flag in control register CR0 is clear.
G (global) flag in the page-directory and page-table entries (introduced to the Intel Architecture in the P6 family processors). Controls the flushing of TLB entries for individual pages. See Section 3.7., Translation Lookaside Buffers (TLBs), in Chapter 3, Protected-Mode Memory Management, for more information about this flag.
PGE (page global enable) flag in control register CR4. Enables the establishment of global pages with the G flag. See Section 3.7., Translation Lookaside Buffers (TLBs), in Chapter 3, Protected-Mode Memory Management, for more information about this flag.
Memory type range registers (MTRRs) (introduced in the P6 family processors). Control the type of caching used in specific regions of physical memory. Any of the caching types described in Section 9.3., Methods of Caching Available, can be selected. See Section 9.12., Memory Type Range Registers (MTRRs), for a detailed description of the MTRRs.
KEN# and WB/WT# pins on the Pentium processor and KEN# pin alone on the Intel486 processor. These pins allow external hardware to control the caching method used for specific areas of memory. They perform similar (but not identical) functions to the MTRRs in the P6 family processors.
PCD and PWT pins on the Pentium and Intel486 processors. These pins (which are associated with the PCD and PWT flags in control register CR3 and in the page-directory and page-table entries) permit caching in an external L2 cache to be controlled on a page-by-page basis, consistent with the control exercised on the L1 cache of these processors. The P6 family processors do not provide these pins because the L2 cache is internal to the chip package.
9.5.1. Precedence of Cache Controls (P6 Family Processor)
In the P6 family processors, the cache control flags and MTRRs operate hierarchically for restricting caching. That is, if the CD flag is set, caching is prevented globally (see Table 9-4). If the CD flag is clear, either the PCD flags and/or the MTRRs can be used to restrict caching. If there is an overlap of page-level caching control and MTRR caching control, the mechanism that prevents caching has precedence. For example, if an MTRR makes a region of system memory uncacheable, a PCD flag cannot be used to enable caching for a page in that region. The converse is also true; that is, if the PCD flag is set, an MTRR cannot be used to make a region of system memory cacheable.
In cases where there is an overlap in the assignment of the write-back and write-through caching policies to a page and a region of memory, the write-through policy takes precedence. The write-combining policy (which can only be assigned through an MTRR) takes precedence over either write-through or write-back.
Table 9-5 describes the mapping from MTRR memory types and page-level caching attributes to effective memory types, when normal caching is in effect (the CD and NW flags in control register CR0 are clear). Combinations that appear in gray are implementation-defined and may be implemented differently on future Intel Architecture processors. System designers are encouraged to avoid these implementation-defined combinations.
When normal caching is in effect, the effective memory type is determined using the following rules:
1. If the PCD and PWT attributes for the page are both 0, then the effective memory type is identical to the MTRR-defined memory type.
2. If the PCD flag is set, then the effective memory type is UC.
3. If the PCD flag is clear and the PWT flag is set, the effective memory type is WT for the WB memory type and the MTRR-defined memory type for all other memory types.
4. Setting the PCD and PWT flags to opposite values is considered model-specific for the WP and WC memory types and architecturally-defined for the WB, WT, and UC memory types.
Table 9-5. Effective Memory Type Depending on MTRR, PCD, and PWT Settings
MTRR Memory Type   PCD Value   PWT Value   Effective Memory Type
UC                 X           X           UC
WC                 0           0           WC
WC                 0           1           WC *
WC                 1           0           WC *
WC                 1           1           UC
WT                 0           X           WT
WT                 1           X           UC
WP                 0           0           WP
WP                 0           1           WP *
WP                 1           0           WC *
WP                 1           1           UC
WB                 0           0           WB
WB                 0           1           WT
WB                 1           X           UC
NOTE: This table assumes that the CD and NW flags in register CR0 are set to 0. The effective memory types in the grey areas (the rows marked * here, where PCD and PWT have opposite values for the WC and WP memory types) are implementation defined and may be different in future Intel Architecture processors.
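Table 9-5 lends itself to a direct table lookup. The following Python sketch is illustrative only (the names TABLE_9_5 and effective_memory_type are hypothetical); it transcribes the table rows, with None standing for the "X" (don't care) entries, and assumes normal caching is in effect (CD and NW clear).

```python
# Rows of Table 9-5 as (MTRR type, PCD, PWT, effective type).
# None stands for the "X" (don't care) entries in the table.
TABLE_9_5 = [
    ("UC", None, None, "UC"),
    ("WC", 0, 0, "WC"),
    ("WC", 0, 1, "WC"),   # implementation-defined combination
    ("WC", 1, 0, "WC"),   # implementation-defined combination
    ("WC", 1, 1, "UC"),
    ("WT", 0, None, "WT"),
    ("WT", 1, None, "UC"),
    ("WP", 0, 0, "WP"),
    ("WP", 0, 1, "WP"),   # implementation-defined combination
    ("WP", 1, 0, "WC"),   # implementation-defined combination
    ("WP", 1, 1, "UC"),
    ("WB", 0, 0, "WB"),
    ("WB", 0, 1, "WT"),
    ("WB", 1, None, "UC"),
]


def effective_memory_type(mtrr_type, pcd, pwt):
    """Return the effective memory type for an MTRR type and page flags."""
    for row_type, row_pcd, row_pwt, effective in TABLE_9_5:
        if (row_type == mtrr_type
                and row_pcd in (None, pcd)
                and row_pwt in (None, pwt)):
            return effective
    raise ValueError("no Table 9-5 row for %r, PCD=%r, PWT=%r"
                     % (mtrr_type, pcd, pwt))
```

For example, effective_memory_type("WB", 0, 1) yields "WT", which matches rule 3: with PCD clear and PWT set, a WB region becomes write-through.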
9.5.2. Preventing Caching
To prevent the L1 and L2 caches from performing caching operations after they have been enabled and have received cache fills, perform the following steps:
1. Enter the no-fill cache mode. (Set the CD flag in control register CR0 to 1 and the NW flag to 0.)
2. Flush all caches using the WBINVD instruction.
3. Disable the MTRRs and set the default memory type to uncached or set all MTRRs for the uncached memory type. For more information, see the discussion of the TYPE field and the E flag in Section 9.12.2.1., MTRRdefType Register.
The caches must be flushed (step 2) after the CD flag is set to ensure system memory coherency. If the caches are not flushed in step 2, cache hits on reads will still occur and data will be read from valid cache lines.
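The CR0 arithmetic behind step 1 can be sketched as follows. This Python fragment only computes register values; it does not touch hardware, and the names CR0_CD, CR0_NW, cache_mode, and enter_no_fill are hypothetical. Steps 2 and 3 (WBINVD and the MTRR update) require privileged instructions and appear only as comments.

```python
CR0_NW = 1 << 29   # NW flag, bit 29 of CR0
CR0_CD = 1 << 30   # CD flag, bit 30 of CR0


def cache_mode(cr0):
    """Classify a CR0 value by the CD/NW combinations of Table 9-4."""
    cd = bool(cr0 & CR0_CD)
    nw = bool(cr0 & CR0_NW)
    if not cd and not nw:
        return "normal caching"
    if not cd and nw:
        return "invalid setting (#GP)"
    if cd and not nw:
        return "no-fill, memory coherency maintained"
    return "no-fill, memory coherency not maintained"


def enter_no_fill(cr0):
    """Step 1: return CR0 with CD set to 1 and NW cleared to 0."""
    return (cr0 | CR0_CD) & ~CR0_NW & 0xFFFFFFFF

# Step 2 would then execute WBINVD, and step 3 would disable the MTRRs
# or set them to the uncached type; neither is modeled here.
```

Starting from a CR0 value of 80000011H (normal caching), enter_no_fill returns C0000011H; only bits 30 and 29 are affected.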
9.6. CACHE MANAGEMENT INSTRUCTIONS
The INVD and WBINVD instructions are used to invalidate the contents of the L1 and L2 caches. The INVD instruction invalidates all internal cache entries, then generates a special-function bus cycle that indicates that external caches also should be invalidated. The INVD instruction should be used with care. It does not force a write-back of modified cache lines; therefore, data stored in the caches and not written back to system memory will be lost. Unless there is a specific requirement or benefit to invalidating the caches without writing back the modified lines (such as during testing or fault recovery, where cache coherency with main memory is not a concern), software should use the WBINVD instruction.
The WBINVD instruction first writes back any modified lines in all the internal caches, then invalidates the contents of both L1 and L2 caches. It ensures that cache coherency with main memory is maintained regardless of the write policy in effect (that is, write-through or write-back). Following this operation, the WBINVD instruction generates one (P6 family processors) or two (Pentium and Intel486 processors) special-function bus cycles to indicate to external cache controllers that write-back of modified data followed by invalidation of external caches should occur.
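The difference between the two instructions can be illustrated with a toy model. The Python class below is a deliberately simplified sketch, not a description of the real cache hardware: ToyCache and its methods are hypothetical names, the cache is fully associative with no capacity limit, and every write is treated as a write-back write hit.

```python
class ToyCache:
    """Toy write-back cache in front of a dict that stands for memory."""

    def __init__(self, memory):
        self.memory = memory   # address -> value
        self.lines = {}        # address -> (value, modified?)

    def read(self, address):
        if address not in self.lines:
            self.lines[address] = (self.memory[address], False)
        return self.lines[address][0]

    def write(self, address, value):
        # Write-back policy: update the cache only, mark the line modified.
        self.lines[address] = (value, True)

    def invd(self):
        # Invalidate without write-back: modified data is lost.
        self.lines.clear()

    def wbinvd(self):
        # Write back modified lines first, then invalidate.
        for address, (value, modified) in self.lines.items():
            if modified:
                self.memory[address] = value
        self.lines.clear()
```

After a write followed by invd(), a later read returns the old memory value; after a write followed by wbinvd(), memory holds the new value.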
9.7. SELF-MODIFYING CODE
A write to a memory location in a code segment that is currently cached in the processor causes the associated cache line (or lines) to be invalidated. This check is based on the physical address of the instruction. In addition, the P6 family and Pentium processors check whether a write to a code segment may modify an instruction that has been prefetched for execution. If the write affects a prefetched instruction, the prefetch queue is invalidated. This latter check is based on the linear address of the instruction. In practice, the check on linear addresses should not create compatibility problems among Intel Architecture processors. Applications that include self-modifying code use the same linear address for modifying and fetching the instruction. Systems software, such as a debugger, that might possibly modify an instruction using a different linear address than that used to fetch the instruction, will execute a serializing operation, such as a CPUID instruction, before the modified instruction is executed, which will automatically resynchronize the instruction cache and prefetch queue. See Section 7.1.3., Handling Self- and Cross-Modifying Code, in Chapter 7, Multiple-Processor Management, for more information about the use of self-modifying code. For Intel486 processors, a write to an instruction in the cache will modify it in both the cache and memory, but if the instruction was prefetched before the write, the old version of the instruction could be the one executed. To prevent the old instruction from being executed, flush the instruction prefetch unit by coding a jump instruction immediately after any write that modifies an instruction.
9.8. IMPLICIT CACHING (P6 FAMILY PROCESSORS)
Implicit caching occurs when a memory element is made potentially cacheable, although the element may never have been accessed in the normal von Neumann sequence. Implicit caching occurs on the P6 family processors due to aggressive prefetching, branch prediction, and TLB miss handling. Implicit caching is an extension of the behavior of existing Intel386, Intel486, and Pentium processor systems, since software running on these processor families also has not been able to deterministically predict the behavior of instruction prefetch.
To avoid problems related to implicit caching, the operating system must explicitly invalidate the cache when changes are made to cacheable data that the cache coherency mechanism does not automatically handle. This includes writes to dual-ported or physically aliased memory boards that are not detected by the snooping mechanisms of the processor, and changes to page-table entries in memory.
The code in Example 9-1 shows the effect of implicit caching on page-table entries. The linear address F000H points to physical location B000H (the page-table entry for F000H contains the value B000H), and the page-table entry for linear address F000H is PTE_F000.
Example 9-1. Effect of Implicit Caching on Page-Table Entries
    mov EAX, CR3          ; Invalidate the TLB
    mov CR3, EAX          ; by copying CR3 to itself
    mov PTE_F000, A000H   ; Change F000H to point to A000H
    mov EBX, [F000H]
Because of speculative execution in the P6 family processors, the last MOV instruction performed would place the value at physical location B000H into EBX, rather than the value at the new physical address A000H. This situation is remedied by placing a TLB invalidation between the load and the store.
9.9. EXPLICIT CACHING
The Pentium III processor introduced a new instruction designed to provide some control over caching of data. The prefetch instruction is a hint to the processor that the data requested by the prefetch instruction should be read into cache, even though it is not needed yet. The processor assumes it will be needed soon.
Explicit caching occurs when the application program executes a prefetch instruction. The programmer must be judicious in the use of the prefetch instruction. Overuse can lead to resource conflicts and hence reduce the performance of an application. For more detailed information on the proper use of the prefetch instruction, refer to Chapter 6, Optimizing Cache Utilization for Pentium III Processors, in the Intel Architecture Optimization Reference Manual (Order Number 245127-001).
Prefetch can be used to read data into the cache prior to the application actually requiring it. This helps to reduce the long latency typically associated with reading data from memory and causing the processor to stall. It is important to remember that prefetch is only a hint to the processor to fetch the data now or as soon as possible. It will be used soon.
The prefetch instruction has different variations that allow the programmer to control into which cache level the data will be read. For more information on the variations of the prefetch instruction refer to Section 9.5.3.1., Cacheability Hint Instructions, Chapter 9, Programming with the Streaming SIMD Extensions, of the Intel Architecture Software Developer's Manual, Volume 2.
9.10. INVALIDATING THE TRANSLATION LOOKASIDE BUFFERS (TLBS)

The processor updates its address translation caches (TLBs) transparently to software. Several mechanisms are available, however, that allow software and hardware to invalidate the TLBs either explicitly or as a side effect of another operation. The INVLPG instruction invalidates the TLB for a specific page. This instruction is the most efficient in cases where software only needs to invalidate a specific page, because it improves performance over invalidating the whole TLB. This instruction is not affected by the state of the G flag in a page-directory or page-table entry. The following operations invalidate all TLB entries except global entries. (A global entry is one for which the G (global) flag is set in its corresponding page-directory or page-table entry. The global flag was introduced into the Intel Architecture in the P6 family processors, see Section 9.5., Cache Control.)
- Writing to control register CR3.
- A task switch that changes control register CR3.
The following operations invalidate all TLB entries, irrespective of the setting of the G flag:
- Asserting or de-asserting the FLUSH# pin.
- (P6 family processors only.) Writing to an MTRR (with a WRMSR instruction).
- Writing to control register CR0 to modify the PG or PE flag.
- (P6 family processors only.) Writing to control register CR4 to modify the PSE, PGE, or PAE flag.
See Section 3.7., Translation Lookaside Buffers (TLBs), in Chapter 3, Protected-Mode Memory Management, for additional information about the TLBs.
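The three kinds of invalidation described in this section can be contrasted with a toy model. The Python class below is a simplified sketch under stated assumptions: ToyTlb and its method names are hypothetical, the TLB is modeled as an unbounded dictionary, and only the invalidation rules from the text are represented.

```python
class ToyTlb:
    """Toy TLB: linear page -> (physical page, G flag)."""

    def __init__(self):
        self.entries = {}

    def fill(self, linear_page, physical_page, is_global=False):
        self.entries[linear_page] = (physical_page, is_global)

    def invlpg(self, linear_page):
        # INVLPG removes one page's entry regardless of its G flag.
        self.entries.pop(linear_page, None)

    def write_cr3(self):
        # Writing CR3 invalidates every entry except global ones.
        self.entries = {page: entry
                        for page, entry in self.entries.items()
                        if entry[1]}

    def flush_all(self):
        # For example, modifying the PG flag in CR0: all entries go,
        # irrespective of the G flag.
        self.entries.clear()
```

A global entry survives write_cr3() but not invlpg() on its page and not flush_all(), which mirrors the distinction drawn between the two lists above.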
9.11. WRITE BUFFER

Intel Architecture processors temporarily store each write (store) to memory in a write buffer. The write buffer improves processor performance by allowing the processor to continue executing instructions without having to wait until a write to memory and/or to a cache is complete. It also allows writes to be delayed for more efficient use of memory-access bus cycles.
In general, the existence of the write buffer is transparent to software, even in systems that use multiple processors. The processor ensures that write operations are always carried out in program order. It also ensures that the contents of the write buffer are always drained to memory in the following situations:
- When an exception or interrupt is generated.
- (P6 family processors only.) When a serializing instruction is executed.
- When an I/O instruction is executed.
- When a LOCK operation is performed.
- (P6 family processors only.) When a BINIT operation is performed.
- (Pentium III processors only.) When using SFENCE to order stores.
The discussion of write ordering in Section 7.2., Memory Ordering, in Chapter 7, Multiple-Processor Management, gives a detailed description of the operation of the write buffer.
9.12. MEMORY TYPE RANGE REGISTERS (MTRRS)

The following section pertains only to the P6 family processors. The memory type range registers (MTRRs) provide a mechanism for associating the memory types with physical-address ranges in system memory. For more information, see Section 9.3., Methods of Caching Available. They allow the processor to optimize operations for different types of memory such as RAM, ROM, frame-buffer memory, and memory-mapped I/O devices. They also simplify system hardware design by eliminating the memory control pins used for this function on earlier Intel Architecture processors and the external logic needed to drive them.
The MTRR mechanism allows up to 96 memory ranges to be defined in physical memory, and it defines a set of model-specific registers (MSRs) for specifying the type of memory that is contained in each range. Table 9-6 shows the memory types that can be specified and their properties; Figure 9-3 shows the mapping of physical memory with MTRRs. See Section 9.3., Methods of Caching Available, for a more detailed description of each memory type.
Following a hardware reset, a P6 family processor disables all the fixed and variable MTRRs, which in effect makes all of physical memory uncacheable. Initialization software should then set the MTRRs to a specific, system-defined memory map. Typically, the BIOS (basic input/output system) software configures the MTRRs. The operating system or executive is then free to modify the memory map using the normal page-level cacheability attributes.
In a multiprocessor system, different P6 family processors MUST use the identical MTRR memory map so that software has a consistent view of memory, independent of the processor executing a program.
Table 9-6. MTRR Memory Types and Their Properties

Mnemonic               Encoding in MTRR       Cacheable in L1 and L2 Caches   Writeback Cacheable   Allows Speculative Reads   Memory Ordering Model
Uncacheable (UC)       0                      No                              No                    No                         Strong Ordering
Write Combining (WC)   1                      No                              No                    Yes                        Weak Ordering
Write-through (WT)     4                      Yes                             No                    Yes                        Speculative Processor Ordering
Write-protected (WP)   5                      Yes for reads, no for writes    No                    Yes                        Speculative Processor Ordering
Writeback (WB)         6                      Yes                             Yes                   Yes                        Speculative Processor Ordering
Reserved Encodings*    2, 3, 7 through 255
NOTE:
* Using these encodings results in a general-protection exception (#GP) being generated.
[Figure 9-3 diagram: map of physical memory from 0 to FFFFFFFFH.
- 0H to 7FFFFH (512 KBytes): 8 fixed ranges (64 KBytes each).
- 80000H to BFFFFH (256 KBytes): 16 fixed ranges (16 KBytes each).
- C0000H to FFFFFH (256 KBytes): 64 fixed ranges (4 KBytes each).
- 100000H to FFFFFFFFH: 8 variable ranges (from 4 KBytes to maximum size of physical memory).
Address ranges not mapped by an MTRR are set to a default type.]
Figure 9-3. Mapping Physical Memory With MTRRs
9.12.1. MTRR Feature Identification

The availability of the MTRR feature is model-specific. Software can determine if MTRRs are supported on a processor by executing the CPUID instruction and reading the state of the MTRR flag (bit 12) in the feature information register (EDX). If the MTRR flag is set (indicating that the processor implements MTRRs), additional information about MTRRs can be obtained from the 64-bit MTRRcap register. The MTRRcap register is a read-only MSR that can be read with the RDMSR instruction. Figure 9-4 shows the contents of the MTRRcap register. The functions of the flags and field in this register are as follows:
VCNT (variable range registers count) field, bits 0 through 7. Indicates the number of variable ranges implemented on the processor. The P6 family processors have eight pairs of MTRRs for setting up eight variable ranges.
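[Figure 9-4 diagram: bit layout of the MTRRcap register.
- Bits 63 through 11: Reserved.
- Bit 10: WC, write-combining memory type supported.
- Bit 9: Reserved.
- Bit 8: FIX, fixed range registers supported.
- Bits 7 through 0: VCNT, number of variable range registers.]
Figure 9-4. MTRRcap Register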
FIX (fixed range registers supported) flag, bit 8. Fixed range MTRRs (MTRRfix64K_00000 through MTRRfix4K_F8000) are supported when set; no fixed range registers are supported when clear.
WC (write combining) flag, bit 10. The write-combining (WC) memory type is supported when set; the WC type is not supported when clear.
Bit 9 and bits 11 through 63 in the MTRRcap register are reserved. If software attempts to write to the MTRRcap register, a general-protection exception (#GP) is generated. For the P6 family processors, the MTRRcap register always contains the value 508H.
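The field layout above can be checked against the 508H value quoted for the P6 family. The following Python sketch only decodes integer values that software would obtain from CPUID and RDMSR; the function names cpuid_has_mtrr and decode_mtrrcap are hypothetical.

```python
def cpuid_has_mtrr(edx):
    """MTRR flag: bit 12 of the CPUID feature information in EDX."""
    return bool((edx >> 12) & 1)


def decode_mtrrcap(value):
    """Split an MTRRcap value into the VCNT, FIX, and WC fields."""
    return {
        "VCNT": value & 0xFF,             # bits 0 through 7
        "FIX": bool((value >> 8) & 1),    # bit 8
        "WC": bool((value >> 10) & 1),    # bit 10
    }
```

Decoding 508H gives VCNT = 8 with both FIX and WC set, which agrees with the eight variable-range register pairs described for the P6 family processors.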
9.12.2. Setting Memory Ranges with MTRRs

The memory ranges and the types of memory specified in each range are set by three groups of registers: the MTRRdefType register, the fixed-range MTRRs, and the variable range MTRRs. These registers can be read and written to using the RDMSR and WRMSR instructions, respectively. The MTRRcap register indicates the availability of these registers on the processor. For more information, see Section 9.12.1., MTRR Feature Identification.
9.12.2.1. MTRRDEFTYPE REGISTER
The MTRRdefType register (see Figure 9-5) sets the default properties of the regions of physical memory that are not encompassed by MTRRs. For more information, see Section 9.4., Cache Control Protocol. The functions of the flags and field in this register are as follows:
Type field, bits 0 through 7. Indicates the default memory type used for those physical memory address ranges that do not have a memory type specified for them by an MTRR. See Table 9-6 for the encoding of this field. If the MTRRs are disabled, this field defines the memory type for all of physical memory. The legal values for this field are 0, 1, 4, 5, and 6. All other values result in a general-protection exception (#GP) being generated.
Intel recommends the use of the UC (uncached) memory type for all physical memory addresses where memory does not exist. To assign the UC type to nonexistent memory locations, it can either be specified as the default type in the Type field or be explicitly assigned with the fixed and variable MTRRs.
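[Figure 9-5 diagram: bit layout of the MTRRdefType register.
- Bits 63 through 12: Reserved.
- Bit 11: E, MTRR enable/disable.
- Bit 10: FE, fixed-range MTRRs enable/disable.
- Bits 9 and 8: Reserved.
- Bits 7 through 0: Type, default memory type.]
Figure 9-5. MTRRdefType Register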
FE (fixed MTRRs enabled) flag, bit 10. Fixed-range MTRRs are enabled when set; fixed-range MTRRs are disabled when clear. When the fixed-range MTRRs are enabled, they take priority over the variable-range MTRRs when overlaps in ranges occur. If the fixed-range MTRRs are disabled, the variable-range MTRRs can still be used and can map the range ordinarily covered by the fixed-range MTRRs.
E (MTRRs enabled) flag, bit 11. MTRRs are enabled when set; all MTRRs are disabled when clear, and the UC memory type is applied to all of physical memory. When this flag is set, the FE flag can disable the fixed-range MTRRs; when the flag is clear, the FE flag has no effect. When the E flag is set, the type specified in the default memory type field is used for areas of memory not already mapped by either a fixed or variable MTRR.
Bits 8 and 9, and bits 12 through 63, in the MTRRdefType register are reserved; the processor generates a general-protection exception (#GP) if software attempts to write nonzero values to them.
9.12.2.2. FIXED RANGE MTRRS
The fixed memory ranges are mapped with 11 fixed-range registers of 64 bits each. Each of these registers is divided into 8-bit fields that are used to specify the memory type for each of the sub-ranges the register controls. Table 9-7 shows the relationship between the fixed physical-address ranges and the corresponding fields of the fixed-range MTRRs; Table 9-6 shows the encoding of these fields:
- Register MTRRfix64K_00000. Maps the 512-KByte address range from 0H to 7FFFFH. This range is divided into eight 64-KByte sub-ranges.
- Registers MTRRfix16K_80000 and MTRRfix16K_A0000. Map the two 128-KByte address ranges from 80000H to BFFFFH. This range is divided into sixteen 16-KByte sub-ranges, 8 ranges per register.
- Registers MTRRfix4K_C0000 through MTRRfix4K_F8000. Map eight 32-KByte address ranges from C0000H to FFFFFH. This range is divided into sixty-four 4-KByte sub-ranges, 8 ranges per register.
See the Pentium Pro BIOS Writer's Guide for examples of assigning memory types with fixed-range MTRRs.
Table 9-7. Address Mapping for Fixed-Range MTRRs
Address Range (hexadecimal), by bit field of each register
Register           63-56        55-48        47-40        39-32        31-24        23-16        15-8         7-0
MTRRfix64K_00000   70000-7FFFF  60000-6FFFF  50000-5FFFF  40000-4FFFF  30000-3FFFF  20000-2FFFF  10000-1FFFF  00000-0FFFF
MTRRfix16K_80000   9C000-9FFFF  98000-9BFFF  94000-97FFF  90000-93FFF  8C000-8FFFF  88000-8BFFF  84000-87FFF  80000-83FFF
MTRRfix16K_A0000   BC000-BFFFF  B8000-BBFFF  B4000-B7FFF  B0000-B3FFF  AC000-AFFFF  A8000-ABFFF  A4000-A7FFF  A0000-A3FFF
MTRRfix4K_C0000    C7000-C7FFF  C6000-C6FFF  C5000-C5FFF  C4000-C4FFF  C3000-C3FFF  C2000-C2FFF  C1000-C1FFF  C0000-C0FFF
MTRRfix4K_C8000    CF000-CFFFF  CE000-CEFFF  CD000-CDFFF  CC000-CCFFF  CB000-CBFFF  CA000-CAFFF  C9000-C9FFF  C8000-C8FFF
MTRRfix4K_D0000    D7000-D7FFF  D6000-D6FFF  D5000-D5FFF  D4000-D4FFF  D3000-D3FFF  D2000-D2FFF  D1000-D1FFF  D0000-D0FFF
MTRRfix4K_D8000    DF000-DFFFF  DE000-DEFFF  DD000-DDFFF  DC000-DCFFF  DB000-DBFFF  DA000-DAFFF  D9000-D9FFF  D8000-D8FFF
MTRRfix4K_E0000    E7000-E7FFF  E6000-E6FFF  E5000-E5FFF  E4000-E4FFF  E3000-E3FFF  E2000-E2FFF  E1000-E1FFF  E0000-E0FFF
MTRRfix4K_E8000    EF000-EFFFF  EE000-EEFFF  ED000-EDFFF  EC000-ECFFF  EB000-EBFFF  EA000-EAFFF  E9000-E9FFF  E8000-E8FFF
MTRRfix4K_F0000    F7000-F7FFF  F6000-F6FFF  F5000-F5FFF  F4000-F4FFF  F3000-F3FFF  F2000-F2FFF  F1000-F1FFF  F0000-F0FFF
MTRRfix4K_F8000    FF000-FFFFF  FE000-FEFFF  FD000-FDFFF  FC000-FCFFF  FB000-FBFFF  FA000-FAFFF  F9000-F9FFF  F8000-F8FFF
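The mapping in Table 9-7 follows a regular pattern: each register covers eight equal sub-ranges, with the lowest sub-range in bits 7 through 0. The Python sketch below derives the register and bit field for a physical address below 1 MByte. The names FIXED_MTRRS, fixed_mtrr_field, and field_type are hypothetical, and the register values passed to field_type are assumed to have been read elsewhere (for example with RDMSR).

```python
# (register name, base address, size of each of its 8 sub-ranges)
FIXED_MTRRS = [
    ("MTRRfix64K_00000", 0x00000, 0x10000),
    ("MTRRfix16K_80000", 0x80000, 0x04000),
    ("MTRRfix16K_A0000", 0xA0000, 0x04000),
] + [
    ("MTRRfix4K_%05X" % base, base, 0x01000)
    for base in range(0xC0000, 0x100000, 0x8000)
]


def fixed_mtrr_field(address):
    """Return (register, high bit, low bit) for an address, or None."""
    for name, base, size in FIXED_MTRRS:
        if base <= address < base + 8 * size:
            index = (address - base) // size
            return name, 8 * index + 7, 8 * index
    return None   # at or above 100000H: not covered by fixed ranges


def field_type(register_value, low_bit):
    """Extract the 8-bit memory type field starting at low_bit."""
    return (register_value >> low_bit) & 0xFF
```

For instance, address B8000H falls in bits 55 through 48 of MTRRfix16K_A0000, in agreement with the table row for that register.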
9.12.2.3. VARIABLE RANGE MTRRS
The P6 family processors permit software to specify the memory type for eight variable-size address ranges, using a pair of MTRRs for each range. The first of each pair (MTRRphysBasen) defines the base address and memory type for the range, and the second (MTRRphysMaskn) contains a mask that is used to determine the address range. The n suffix indicates register pairs 0 through 7. Figure 9-6 shows the flags and fields in these registers. The functions of the flags and fields in these registers are as follows:
Type field, bits 0 through 7 Specifies the memory type for the range. See Table 9-6 for the encoding of this field.
MTRRphysBasen Register
Bits 63-36: Reserved. Bits 35-12: PhysBase (base address of range). Bits 11-8: Reserved. Bits 7-0: Type (memory type for range).
MTRRphysMaskn Register
Bits 63-36: Reserved. Bits 35-12: PhysMask (sets range mask). Bit 11: V (valid). Bits 10-0: Reserved.
Figure 9-6. MTRRphysBasen and MTRRphysMaskn Variable-Range Register Pair
PhysBase field, bits 12 through 35 Specifies the base address of the address range. This 24-bit value is extended by 12 bits at the low end to form the base address, which automatically aligns the address on a 4-KByte boundary.
PhysMask field, bits 12 through 35 Specifies a 24-bit mask that determines the range of the region being mapped, according to the following relationship:
    Address_Within_Range AND PhysMask = PhysBase AND PhysMask
This 24-bit value is extended by 12 bits at the low end to form the mask value. See Section 9.12.3., Example Base and Mask Calculations, for more information and some examples of base address and mask computations.
V (valid) flag, bit 11 Enables the register pair when set; disables the register pair when clear.
All other bits in the MTRRphysBasen and MTRRphysMaskn registers are reserved; the processor generates a general-protection exception (#GP) if software attempts to write to them.
Overlapping variable MTRR ranges are not supported generically. However, two variable ranges are allowed to overlap if either of the following conditions is met:
If both of them are UC (uncached).
If one range is of type UC and the other is of type WB (write back).
In both cases above, the effective type for the overlapping region is UC. The processor's behavior is undefined for all other cases of overlapping variable ranges. A variable range can overlap a fixed range (provided the fixed-range MTRRs are enabled). Here, the memory type specified in the fixed-range register overrides the one specified in the variable-range register pair.
NOTE
Some mask values can result in discontinuous ranges. In a discontinuous range, the area not mapped by the mask value is set to the default memory type. Intel does not encourage the use of discontinuous ranges, because they could require physical memory to be present throughout the entire 4-GByte physical memory map. If memory is not provided for the complete memory map, the behaviour of the processor is undefined.
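The PhysBase/PhysMask relationship given above can be sketched in C as follows. This is only an illustration: the function name is hypothetical, and its arguments are the 24-bit field values (physical address bits 12 through 35), not full register images.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of any Intel-defined interface.
 * phys_base and phys_mask are the 24-bit PhysBase and PhysMask field
 * values; phys_addr is a full physical address (up to 36 bits). */
bool mtrr_range_matches(uint64_t phys_addr, uint32_t phys_base, uint32_t phys_mask)
{
    /* Drop the low 12 bits so the address lines up with the fields. */
    uint32_t addr_field = (uint32_t)((phys_addr >> 12) & 0xFFFFFFu);

    /* Address_Within_Range AND PhysMask = PhysBase AND PhysMask */
    return (addr_field & phys_mask) == (phys_base & phys_mask);
}
```

With PhysBase = 000200H and PhysMask = FFFE00H, the test succeeds for every address from 200000H through 3FFFFFH and fails for 1FFFFFH and 400000H.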
9.12.3. Example Base and Mask Calculations

The base and mask values entered into the variable-range MTRR pairs are 24-bit values that the processor extends to 36 bits. For example, to enter a base address of 2 MBytes (200000H) to the MTRRphysBase3 register, the 12 least-significant bits are truncated and the value 000200H is entered into the PhysBase field. The same operation must be performed on mask values. For instance, to map the address range from 200000H to 3FFFFFH (2 MBytes to 4 MBytes), a mask value of FFFE00000H is required. Here again, the 12 least-significant bits of this mask value are truncated, so that the value entered in the PhysMask field of the MTRRphysMask3 register is FFFE00H. This mask is chosen so that when any address in the 200000H to 3FFFFFH range is ANDed with the mask value it will return the same value as when the base address is ANDed with the mask value (which is 200000H).
To map the address range from 400000H to 7FFFFFH (4 MBytes to 8 MBytes), a base value of 000400H is entered in the PhysBase field and a mask value of FFFC00H is entered in the PhysMask field.
Here is a real-life example of setting up the MTRRs for an entire system. Assume that the system has the following characteristics:
96 MBytes of system memory is mapped as write-back memory (WB) for highest system performance.
A custom 4-MByte I/O card is mapped to uncached memory (UC) at a base address of 64 MBytes. This restriction forces the 96 MBytes of system memory to be addressed from 0 to 64 MBytes and from 68 MBytes to 100 MBytes, leaving a 4-MByte hole for the I/O card.
An 8-MByte graphics card is mapped to write-combining memory (WC) beginning at address A0000000H.
The BIOS area from 15 MBytes to 16 MBytes is mapped to UC memory.
The following settings for the MTRRs will yield the proper mapping of the physical address space for this system configuration. The x0_0x notation is used below to add clarity to the large numbers represented.
MTRRPhysBase0 = 0000_0000_0000_0006h    Caches 0-64 MB as WB cache type.
MTRRPhysMask0 = 0000_000F_FC00_0800h
MTRRPhysBase1 = 0000_0000_0400_0006h    Caches 64-96 MB as WB cache type.
MTRRPhysMask1 = 0000_000F_FE00_0800h
MTRRPhysBase2 = 0000_0000_0600_0006h    Caches 96-100 MB as WB cache type.
MTRRPhysMask2 = 0000_000F_FFC0_0800h
MTRRPhysBase3 = 0000_0000_0400_0000h    Caches 64-68 MB as UC cache type.
MTRRPhysMask3 = 0000_000F_FFC0_0800h
MTRRPhysBase4 = 0000_0000_00F0_0000h    Caches 15-16 MB as UC cache type.
MTRRPhysMask4 = 0000_000F_FFF0_0800h
MTRRPhysBase5 = 0000_0000_A000_0001h    Caches A0000000h-A0800000h as WC type.
MTRRPhysMask5 = 0000_000F_FF80_0800h
This MTRR setup uses the ability to overlap any two memory ranges (as long as the ranges are mapped to WB and UC memory types) to minimize the number of MTRR registers that are required to configure the memory environment. This setup also fulfills the requirement that two register pairs are left for operating system usage.
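The register values listed above can be reproduced with a few lines of C. The sketch below is an illustration only; the function and macro names are hypothetical, and the size passed to mtrr_phys_mask() is assumed to be a power of 2 of at least 4 KBytes. The type encodings (UC = 0, WC = 1, WB = 6) are the ones visible in the low byte of the MTRRPhysBase values in the example.

```c
#include <stdint.h>

#define MTRR_ADDR_BITS 0x0000000FFFFFF000ULL /* bits 12 through 35 */
#define MTRR_VALID     (1ULL << 11)          /* V flag in MTRRphysMaskn */

enum { MT_UC = 0, MT_WC = 1, MT_WB = 6 };    /* Type field encodings */

/* Hypothetical helper: build an MTRRphysBasen image from a base and type. */
uint64_t mtrr_phys_base(uint64_t base, unsigned type)
{
    return (base & MTRR_ADDR_BITS) | (type & 0xFFu);
}

/* Hypothetical helper: build an MTRRphysMaskn image (V flag set) for a
 * range of the given size; size is assumed to be a power of 2 >= 4096. */
uint64_t mtrr_phys_mask(uint64_t size)
{
    return (~(size - 1) & MTRR_ADDR_BITS) | MTRR_VALID;
}
```

For instance, mtrr_phys_mask(64 MBytes) gives 0000_000F_FC00_0800h (MTRRPhysMask0), and mtrr_phys_base(A0000000h, MT_WC) gives 0000_0000_A000_0001h (MTRRPhysBase5).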
9.12.4. Range Size and Alignment Requirement

The range that is to be mapped to a variable-range MTRR must meet the following power of 2 size and alignment rules:
1. The minimum range size is 4 KBytes, and the base address of this range must be on at least a 4-KByte boundary.
2. For ranges greater than 4 KBytes, each range must be of length 2^n and its base address must be aligned on a 2^n boundary, where n is a value equal to or greater than 12. The base-address alignment value cannot be less than its length. For example, an 8-KByte range cannot be aligned on a 4-KByte boundary. It must be aligned on at least an 8-KByte boundary.
9.12.4.1. MTRR PRECEDENCES
If the MTRRs are not enabled (they are enabled by setting the E flag in the MTRRdefType register), then all memory accesses are of the UC memory type. If the MTRRs are enabled, then the memory type used for a memory access is determined as follows:
1. If the physical address falls within the first 1 MByte of physical memory and fixed MTRRs are enabled, the processor uses the memory type stored for the appropriate fixed-range MTRR.
2. Otherwise, the processor attempts to match the physical address with a memory type range set with a pair of variable-range MTRRs:
a. If one variable memory range matches, the processor uses the memory type stored in the MTRRphysBasen register for that range.
b. If two or more variable memory ranges match and the memory types are identical, then that memory type is used.
c. If two or more variable memory ranges match and one of the memory types is UC, the UC memory type is used.
d. If two or more variable memory ranges match and the memory types are WT and WB, the WT memory type is used.
e. If two or more variable memory ranges match and the memory types are other than UC and WB, the behaviour of the processor is undefined.
3. If no fixed or variable memory range matches, the processor uses the default memory type.
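Rules 2a through 2e and rule 3 can be illustrated with a small C function. This is a sketch under stated assumptions: the function name and the MT_UNDEFINED marker are hypothetical, the caller is assumed to have already collected the Type field of every variable range that matched the address, and the fixed-range check of rule 1 is left out.

```c
#include <stddef.h>

enum { MT_UC = 0, MT_WC = 1, MT_WT = 4, MT_WP = 5, MT_WB = 6,
       MT_UNDEFINED = -1 /* hypothetical marker for undefined behaviour */ };

/* types[] holds the memory type of each matching variable range;
 * default_type is the Type field of MTRRdefType. */
int resolve_variable_types(const int *types, size_t count, int default_type)
{
    int all_same = 1, any_uc = 0, only_wt_wb = 1;

    if (count == 0)
        return default_type;            /* rule 3: nothing matched */

    for (size_t i = 0; i < count; i++) {
        if (types[i] != types[0])
            all_same = 0;
        if (types[i] == MT_UC)
            any_uc = 1;
        if (types[i] != MT_WT && types[i] != MT_WB)
            only_wt_wb = 0;
    }

    if (all_same)
        return types[0];                /* rules 2a and 2b */
    if (any_uc)
        return MT_UC;                   /* rule 2c */
    if (only_wt_wb)
        return MT_WT;                   /* rule 2d */
    return MT_UNDEFINED;                /* rule 2e */
}
```

A WB range overlapped by a UC range therefore resolves to UC, which is the property the system example in Section 9.12.3 relies on.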
9.12.5. MTRR Initialization

On a hardware reset, a P6 family processor clears the valid flags in the variable-range MTRRs and clears the E flag in the MTRRdefType register to disable all MTRRs. All other bits in the MTRRs are undefined. Prior to initializing the MTRRs, software (normally the system BIOS) must initialize all fixed-range and variable-range MTRR register fields to 0. Software can then initialize the MTRRs according to the types of memory known to it, including memory on devices that it auto-configures. This initialization is expected to occur prior to booting the operating system. See Section 9.12.8., Multiple-Processor Considerations, for information on initializing MTRRs in multiple-processor systems.
9.12.6. Remapping Memory Types

A system designer may re-map memory types to tune performance or because a future processor may not implement all memory types supported by the P6 family processors. The following rules support coherent memory-type re-mappings:
1. A memory type should not be mapped into another memory type that has a weaker memory ordering model. For example, the uncacheable type cannot be mapped into any other type, and the write-back, write-through, and write-protected types cannot be mapped into the weakly ordered write-combining type.
2. A memory type that does not delay writes should not be mapped into a memory type that does delay writes, because applications of such a memory type may rely on its write-through behavior. Accordingly, the write-back type cannot be mapped into the write-through type.
3. A memory type that views write data as not necessarily stored and read back by a subsequent read, such as the write-protected type, can only be mapped to another type with the same behaviour (and there are no others for the P6 family processors) or to the uncacheable type.
In many specific cases, a system designer can have additional information about how a memory type is used, allowing additional mappings. For example, write-through memory with no associated write side effects can be mapped into write-back memory.
9.12.7. MTRR Maintenance Programming Interface

The operating system maintains the MTRRs after booting and sets up or changes the memory types for memory-mapped devices. The operating system should provide a driver and application programming interface (API) to access and set the MTRRs. The function calls MemTypeGet() and MemTypeSet() define this interface.
9.12.7.1. MEMTYPEGET() FUNCTION
The MemTypeGet() function returns the memory type of the physical memory range specified by the parameters base and size. The base address is the starting physical address and the size is the number of bytes for the memory range. The function automatically aligns the base address and size to 4-KByte boundaries. Pseudocode for the MemTypeGet() function is given in Example 9-2.
Example 9-2. MemTypeGet() Pseudocode

#define MIXED_TYPES -1 /* MIXED_TYPES < 0 || MIXED_TYPES > 256 */

IF CPU_FEATURES.MTRR /* processor supports MTRRs */
    THEN
        Align BASE and SIZE to 4-KByte boundary;
        IF (BASE + SIZE) wrap 64-GByte address space
            THEN return INVALID;
        FI;
        IF MTRRdefType.E = 0
            THEN return UC;
        FI;
        FirstType ← Get4KMemType (BASE);
        /* Obtains memory type for first 4-KByte range */
        /* See Get4KMemType (4KByteRange) in Example 9-3 */
        FOR each additional 4-KByte range specified in SIZE
            NextType ← Get4KMemType (4KByteRange);
            IF NextType ≠ FirstType
                THEN return MIXED_TYPES;
            FI;
        ROF;
        return FirstType;
    ELSE return UNSUPPORTED;
FI;
If the processor does not support MTRRs, the function returns UNSUPPORTED. If the MTRRs are not enabled, then the UC memory type is returned. If more than one memory type corresponds to the specified range, a status of MIXED_TYPES is returned. Otherwise, the memory type defined for the range (UC, WC, WT, WB, or WP) is returned.
The pseudocode for the Get4KMemType() function in Example 9-3 obtains the memory type for a single 4-KByte range at a given physical address. The sample code determines whether a PHY_ADDRESS falls within a fixed range by comparing the address with the known fixed ranges: 0 to 7FFFFH (64-KByte regions), 80000H to BFFFFH (16-KByte regions), and C0000H to FFFFFH (4-KByte regions). If an address falls within one of these ranges, the appropriate bits within one of its MTRRs determine the memory type.

Example 9-3. Get4KMemType() Pseudocode

IF MTRRcap.FIX AND MTRRdefType.FE /* fixed registers enabled */
    THEN
        IF PHY_ADDRESS is within a fixed range
            THEN return MTRRfixed.Type;
        FI;
FI;
FOR each variable-range MTRR in MTRRcap.VCNT
    IF MTRRphysMask.V = 0
        THEN continue;
    FI;
    IF (PHY_ADDRESS AND MTRRphysMask.Mask) = (MTRRphysBase.Base AND MTRRphysMask.Mask)
        THEN return MTRRphysBase.Type;
    FI;
ROF;
return MTRRdefType.Type;
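For readers who prefer compilable code, the two pseudocode examples can be approximated in C as shown below. This is a simplified sketch, not the manual's interface: the mtrr_state structure is a hypothetical software snapshot of the registers (real code would read the MSRs at ring 0), the fixed-range lookup and the address-wrap check are omitted, and base and mask are the 24-bit field values.

```c
#include <stdint.h>

#define MIXED_TYPES (-1)
#define MT_UC 0

/* Hypothetical snapshot of the variable-range MTRR state. */
struct mtrr_var { uint32_t base; uint32_t mask; uint8_t type; uint8_t valid; };
struct mtrr_state {
    int enabled;              /* E flag of MTRRdefType */
    uint8_t default_type;     /* Type field of MTRRdefType */
    unsigned var_count;       /* VCNT field of MTRRcap */
    struct mtrr_var var[8];
};

/* Memory type of the 4-KByte range containing phys_addr (variable ranges only). */
int get_4k_mem_type(const struct mtrr_state *s, uint64_t phys_addr)
{
    uint32_t field = (uint32_t)((phys_addr >> 12) & 0xFFFFFFu);

    for (unsigned i = 0; i < s->var_count; i++) {
        const struct mtrr_var *v = &s->var[i];
        if (!v->valid)
            continue;
        if ((field & v->mask) == (v->base & v->mask))
            return v->type;   /* first match wins, as in Example 9-3 */
    }
    return s->default_type;
}

/* Memory type of [base, base + size), or MIXED_TYPES if it is not uniform. */
int mem_type_get(const struct mtrr_state *s, uint64_t base, uint64_t size)
{
    if (!s->enabled)
        return MT_UC;

    uint64_t first = base & ~0xFFFULL;                    /* align down */
    uint64_t end = (base + size + 0xFFFULL) & ~0xFFFULL;  /* align up */
    int first_type = get_4k_mem_type(s, first);

    for (uint64_t a = first + 0x1000; a < end; a += 0x1000) {
        if (get_4k_mem_type(s, a) != first_type)
            return MIXED_TYPES;
    }
    return first_type;
}
```

With a single valid pair describing 200000H to 3FFFFFH as WB and a UC default, a query for that whole range returns WB, while a query that straddles 400000H returns MIXED_TYPES.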
9.12.7.2. MEMTYPESET() FUNCTION
The MemTypeSet() function in Example 9-4 sets an MTRR for the physical memory range specified by the parameters base and size to the type specified by type. The base address and size are multiples of 4 KBytes and the size is not 0.
Example 9-4. MemTypeSet Pseudocode

IF CPU_FEATURES.MTRR (* processor supports MTRRs *)
    THEN
        IF BASE and SIZE are not 4-KByte aligned or size is 0
            THEN return INVALID;
        FI;
        IF (BASE + SIZE) wrap 4-GByte address space
            THEN return INVALID;
        FI;
        IF TYPE is invalid for P6 family processors
            THEN return UNSUPPORTED;
        FI;
        IF TYPE is WC and not supported
            THEN return UNSUPPORTED;
        FI;
        IF MTRRcap.FIX is set AND range can be mapped using a fixed-range MTRR
            THEN
                pre_mtrr_change();
                update affected MTRR;
                post_mtrr_change();
            ELSE (* try to map using a variable MTRR pair *)
                IF MTRRcap.VCNT = 0
                    THEN return UNSUPPORTED;
                FI;
                IF conflicts with current variable ranges
                    THEN return RANGE_OVERLAP;
                FI;
                IF no MTRRs available
                    THEN return VAR_NOT_AVAILABLE;
                FI;
                IF BASE and SIZE do not meet the power of 2 requirements for variable MTRRs
                    THEN return INVALID_VAR_REQUEST;
                FI;
                pre_mtrr_change();
                Update affected MTRRs;
                post_mtrr_change();
        FI;
FI;

pre_mtrr_change()
BEGIN
    disable interrupts;
    Save current value of CR4;
    disable and flush caches;
    flush TLBs;
    disable MTRRs;
    IF multiprocessing
        THEN maintain consistency through IPIs;
    FI;
END

post_mtrr_change()
BEGIN
    flush caches and TLBs;
    enable MTRRs;
    enable caches;
    restore value of CR4;
    enable interrupts;
END
The physical address to variable range mapping algorithm in the MemTypeSet function detects conflicts with current variable range registers by cycling through them and determining whether the physical address in question matches any of the current ranges. During this scan, the algorithm can detect whether any current variable ranges overlap and can be concatenated into a single range.
The pre_mtrr_change() function disables interrupts prior to changing the MTRRs, to avoid executing code with a partially valid MTRR setup. The algorithm disables caching by setting the CD flag and clearing the NW flag in control register CR0. The caches are invalidated using the WBINVD instruction. The algorithm disables the page global flag (PGE) in control register CR4, if necessary, then flushes all TLB entries by updating control register CR3. Finally, it disables MTRRs by clearing the E flag in the MTRRdefType register.
After the memory type is updated, the post_mtrr_change() function re-enables the MTRRs and again invalidates the caches and TLBs. This second invalidation is required because of the processor's aggressive prefetch of both instructions and data. The algorithm restores interrupts and re-enables caching by clearing the CD flag. An operating system can batch multiple MTRR updates so that only a single pair of cache invalidations occurs.
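The "power of 2 requirements" test that MemTypeSet applies before using a variable MTRR pair (the rules of Section 9.12.4) might be written as the following C predicate. The function name is hypothetical and the sketch checks only size and alignment, not overlap with existing ranges.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if [base, base + size) can be described by
 * a single variable-range MTRR pair. */
bool mtrr_var_range_ok(uint64_t base, uint64_t size)
{
    if (size < 4096)
        return false;                   /* minimum range size is 4 KBytes */
    if ((size & (size - 1)) != 0)
        return false;                   /* length must be a power of 2 */
    return (base & (size - 1)) == 0;    /* base aligned on a size boundary */
}
```

An 8-KByte range at a base of 2000H passes, while the same range at 1000H fails because it is aligned only on a 4-KByte boundary. A range that fails this test has to be split into several power-of-2 pieces, as the 96 MBytes of system memory are in the example of Section 9.12.3.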
9.12.8. Multiple-Processor Considerations

In multiple-processor systems, the operating system must maintain MTRR consistency between all the processors in the system. The P6 family processors provide no hardware support to maintain this consistency. In general, all processors must have the same MTRR values.
This requirement implies that when the operating system initializes a multiple-processor system, it must load the MTRRs of the boot processor while the E flag in register MTRRdefType is 0. The operating system then directs other processors to load their MTRRs with the same memory map. After all the processors have loaded their MTRRs, the operating system signals them to enable their MTRRs. Barrier synchronization is used to prevent further memory accesses until all processors indicate that the MTRRs are enabled. This synchronization is likely to be a shootdown style algorithm, with shared variables and interprocessor interrupts.
Any change to the value of the MTRRs in a multiple-processor system requires the operating system to repeat the loading and enabling process to maintain consistency, using the following procedure:
1. Broadcast to all processors to execute the following code sequence.
2. Disable interrupts.
3. Wait for all processors to reach this point.
4. Enter the no-fill cache mode. (Set the CD flag in control register CR0 to 1 and the NW flag to 0.)
5. Flush all caches using the WBINVD instruction.
6. Clear the PGE flag in control register CR4 (if set).
7. Flush all TLBs. (Execute a MOV from control register CR3 to another register and then a MOV from that register back to CR3.)
8. Disable all range registers (by clearing the E flag in register MTRRdefType). If only variable ranges are being modified, software may clear the valid bits for the affected register pairs instead.
9. Update the MTRRs.
10. Enable all range registers (by setting the E flag in register MTRRdefType). If only variable-range registers were modified and their individual valid bits were cleared, then set the valid bits for the affected ranges instead.
11. Flush all caches and all TLBs a second time. (The TLB flush is required for P6 family processors. Executing the WBINVD instruction is not needed when using P6 family processors, but it may be needed in future systems.)
12. Enter the normal cache mode to re-enable caching. (Set the CD and NW flags in control register CR0 to 0.)
13. Set PGE flag in control register CR4, if previously cleared.
14. Wait for all processors to reach this point.
15. Enable interrupts.
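The register manipulations in steps 4, 6, 8, 10, and 12 reduce to setting and clearing single bits. The C sketch below shows only that bit arithmetic on register images; the function names are hypothetical, and actually reading or writing CR0, CR4, and the MTRRdefType MSR requires privileged instructions that are not shown. The bit positions assumed are CD = bit 30 and NW = bit 29 of CR0, PGE = bit 7 of CR4, and E = bit 11 of MTRRdefType.

```c
#include <stdint.h>

#define CR0_NW         (1u << 29)   /* not write-through */
#define CR0_CD         (1u << 30)   /* cache disable */
#define CR4_PGE        (1u << 7)    /* page global enable */
#define MTRRDEFTYPE_E  (1ULL << 11) /* MTRR enable */

/* Step 4: no-fill cache mode (CD = 1, NW = 0). */
uint32_t cr0_enter_no_fill(uint32_t cr0) { return (cr0 | CR0_CD) & ~CR0_NW; }

/* Step 12: normal cache mode (CD = 0, NW = 0). */
uint32_t cr0_enter_normal(uint32_t cr0) { return cr0 & ~(CR0_CD | CR0_NW); }

/* Step 6: clear PGE; the caller keeps the old value for step 13. */
uint32_t cr4_clear_pge(uint32_t cr4) { return cr4 & ~CR4_PGE; }

/* Steps 8 and 10: disable and re-enable all range registers. */
uint64_t mtrr_def_type_disable(uint64_t v) { return v & ~MTRRDEFTYPE_E; }
uint64_t mtrr_def_type_enable(uint64_t v)  { return v | MTRRDEFTYPE_E; }
```

Keeping these as pure functions on saved register images makes it straightforward to restore the original CR4 value in step 13.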
9.12.9. Large Page Size Considerations

The MTRRs provide memory typing for a limited number of regions that have a 4-KByte granularity (the same granularity as 4-KByte pages). The memory type for a given page is cached in the processor's TLBs. When using large pages (2 or 4 MBytes), a single page-table entry covers multiple 4-KByte granules, each with a single memory type. Because the memory type for a large page is cached in the TLB, the processor can behave in an undefined manner if a large page is mapped to a region of memory that MTRRs have mapped with multiple memory types.
Undefined behavior can be avoided by ensuring that all MTRR memory-type ranges within a large page are of the same type. If a large page maps to a region of memory containing different MTRR-defined memory types, the PCD and PWT flags in the page-table entry should be set for the most conservative memory type for that range. For example, a large page used for memory-mapped I/O and regular memory is mapped as UC memory. Alternatively, the operating system can map the region using multiple 4-KByte pages, each with its own memory type.
The requirement that all 4-KByte ranges in a large page are of the same memory type implies that large pages with different memory types may suffer a performance penalty, since they must be marked with the lowest common denominator memory type.
The P6 family processors provide special support for the physical memory range from 0 to 4 MBytes, which is potentially mapped by both the fixed and variable MTRRs. This support is invoked when a P6 family processor detects a large page overlapping the first 1 MByte of this memory range with a memory type that conflicts with the fixed MTRRs. Here, the processor maps the memory range as multiple 4-KByte pages within the TLB. This operation ensures correct behavior at the cost of performance. To avoid this performance penalty, operating-system software should reserve the large page option for regions of memory at addresses greater than or equal to 4 MBytes.
9.13. PAGE ATTRIBUTE TABLE (PAT)

The Page Attribute Table (PAT) is an extension to Intel's 32-bit processor virtual memory architecture for certain P6 family processors. Specifically, the PAT is an extension of the page-table format, which allows the specification of memory types to regions of physical memory based on linear address mappings. The PAT provides the equivalent functionality of an unlimited number of Memory Type Range Registers (MTRRs). Using the PAT in conjunction with the MTRRs of the P6 family of processors extends the memory type information present in the current Intel Architecture page-table format. It combines the extendable and programmable qualities of the MTRRs with the flexibility of the page tables, allowing operating systems or applications to select the best memory type for their needs. The ability to apply the best memory type in a flexible way enables higher levels of performance.
NOTE
In multiple processor systems, the operating system(s) must maintain MTRR consistency between all the processors in the system. The P6 family processors provide no hardware support for maintaining this consistency. In general, all processors must have the same MTRR values.
9.13.1. Background
The P6 family of processors supports the assignment of specific memory types to physical addresses. Memory type support is provided through the use of Memory Type Range Registers (MTRRs). Currently there are two interacting mechanisms that work together to set the effective memory type: the MTRRs and the page tables. Refer to the Intel Architecture Software Developer's Manual, Volume 3: System Programming Guide.
The MTRRs define the memory types for physical address ranges. MTRRs have specific alignment and length requirements for the memory regions they describe. Therefore, they are useful for statically describing memory types for physical ranges, and are typically set up by the system BIOS. However, they are incapable of describing memory types for the dynamic, linearly addressed data structures of programs. The MTRRs are an expandable and programmable way to encode memory types, but are inflexible because they can only apply those memory types to physical address ranges.
The page tables allow memory types to be assigned dynamically to linearly addressed pages of memory. This gives the operating system the maximum amount of flexibility in applying memory types to any data structure. However, the page tables only offer three of the five basic P6 processor family memory type encodings: Write-back (WB), Write-through (WT) and Uncached (UC). The PAT extends the existing page-table format to enable the specification of additional memory types.
9.13.2. Detecting Support for the PAT Feature

The page attribute table (PAT) feature is detected by an operating system through the use of the CPUID instruction. Specifically, the operating system executes the CPUID instruction with the value 1 in the EAX register, and then determines support for the feature by inspecting bit 16 of the EDX register return value. If the PAT is supported, an operating system is permitted to utilize the model specific register (MSR) specified for programming the PAT, as well as make use of the PAT-index bit (PATi), which was formerly a reserved bit in the page tables. Note that there is not a separate flag or control bit in any of the control registers that enables the use of this feature. The PAT is always enabled on all processors that support it, and the table lookup always occurs whenever paging is enabled and for all paging modes (e.g., PSE, PAE).
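The feature test described above is a single bit test on the EDX value returned by CPUID with EAX = 1. The C sketch below shows only that test; the function name is hypothetical, and how the EDX value is obtained (inline assembly or a compiler-specific intrinsic) is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_EDX_PAT (1u << 16)   /* bit 16 of EDX for CPUID with EAX = 1 */

/* Hypothetical helper: edx is the EDX value returned by CPUID(EAX = 1). */
bool cpuid_edx_has_pat(uint32_t edx)
{
    return (edx & CPUID1_EDX_PAT) != 0;
}
```

Because no control-register flag enables the PAT, this check is the only gate an operating system needs before it programs the PAT MSR or uses the PATi bit.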
9.13.3. Technical Description of the PAT

The Page Attribute Table is a Model Specific Register (MSR) at address 277H (for information about the MSRs, refer to Appendix B, Model-Specific Registers). The model specific register address for the PAT is defined and will remain at the same address on future Intel processors that support this feature. Figure 9-7 shows the format of the 64-bit register containing the PAT.
Bits 63-59: Rsvd. Bits 58-56: PA7. Bits 55-51: Rsvd. Bits 50-48: PA6. Bits 47-43: Rsvd. Bits 42-40: PA5. Bits 39-35: Rsvd. Bits 34-32: PA4.
Bits 31-27: Rsvd. Bits 26-24: PA3. Bits 23-19: Rsvd. Bits 18-16: PA2. Bits 15-11: Rsvd. Bits 10-8: PA1. Bits 7-3: Rsvd. Bits 2-0: PA0.
NOTES:
1. PA0-7 = Specifies the eight page attribute locations contained within the PAT
2. Rsvd = Most significant bits for each Page Attribute are reserved for future expansion
Figure 9-7. Page Attribute Table Model Specific Register
Each of the eight page attribute fields can contain any of the available memory type encodings, or indexes, as specified in Table 9-1.
9.13.4. Accessing the PAT

Access to the memory types that have been programmed into the PAT register fields is accomplished with a 3-bit index consisting of the PATi, PCD, and PWT bits. Table 9-8 shows how the PAT register fields are indexed. The last column of the table shows which memory type the processor assigns to each PAT field at processor reset and initialization. These initial values provide complete backward compatibility with previous Intel processors and existing software that use the previously existing page-table memory types and MTRRs.
Table 9-8. PAT Indexing and Values After Reset
PATi(1)  PCD  PWT  PAT Entry  Memory Type at Reset
0        0    0    0          WB
0        0    1    1          WT
0        1    0    2          UC-(2)
0        1    1    3          UC(3)
1        0    0    4          WB
1        0    1    5          WT
1        1    0    6          UC-(2)
1        1    1    7          UC(3)
NOTES:
1. PATi bit is defined as bit 7 for 4 KB PTEs, bit 12 for PDEs mapping 2 MB/4 MB pages.
2. UC- is the page encoding PCD, PWT = 10 on P6 family processors that do not support this feature. UC- in the page table is overridden by WC in the MTRRs.
3. UC is the page encoding PCD, PWT = 11 on P6 family processors that do not support this feature. UC in the page-table overrides WC in the MTRRs.
In P6 family processors that do not support the PAT, the PCD and PWT bits are used to determine the page-table memory types of a given physical page. The PAT feature redefines these two bits and combines them with a newly defined PAT-index bit (PATi) in the page-directory and page-table entries. These three bits create an index into the 8-entry Page Attribute Table. The memory type from the PAT is used in place of PCD and PWT for computing the effective memory type. The bit used for PATi differs depending upon the level of the paging hierarchy. PATi is bit 7 for page-table entries, and bit 12 for page-directory entries that map to large pages. Reserved bit faults are disabled for nonzero values for PATi, but remain present for all other reserved bits. This is true for 4 KB/2 MB pages when PAE is enabled. The PAT index scheme for each level of the paging hierarchy is shown in Figure 9-8.
Page-Directory Base Register (CR3): PCD (bit 4), PWT (bit 3)
Page-Directory Pointer Table Entry: PCD (bit 4), PWT (bit 3)
4 KB Page-Directory Entry: PCD (bit 4), PWT (bit 3)
For the three formats above, PCD and PWT provide a 2-bit index into the PAT, allowing use of the first 4 entries.
2 MB/4 MB Page-Directory Entry: PATi (bit 12), PCD (bit 4), PWT (bit 3)
4 KB Page-Table Entry: PATi (bit 7), PCD (bit 4), PWT (bit 3)
For the two formats above, PATi, PCD, and PWT provide a 3-bit index into the PAT, allowing use of all 8 entries.
Figure 9-8. Page Attribute Table Index Scheme for Paging Hierarchy
NOTE: This figure only shows the format of the lower 32 bits of the PDE, PDEPTR, and PTEs when in PAE mode. Refer to Figure 3-21 from Chapter 3, Protected-Mode Memory Management of the Intel Architecture Software Developer's Manual, Volume 3: System Programming Guide. Additionally, the formats shown in this figure are not meant to accurately represent the entire structure, but only the labeled bits.
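The indexing scheme can be sketched in C as follows. This is an illustration with hypothetical function names; it applies to entries that map a page (a 4 KB page-table entry or a 2 MB/4 MB page-directory entry) and works on the lower 32 bits of the entry. The reset value shown is an assumption derived from the reset memory types in Table 9-8 combined with the encodings WB = 6, WT = 4, UC- = 7, and UC = 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* PA0..PA3 = WB, WT, UC-, UC, repeated in PA4..PA7 (derived, see lead-in). */
#define PAT_RESET_VALUE 0x0007040600070406ULL

/* 3-bit PAT index from the low 32 bits of an entry that maps a page.
 * PATi is bit 12 for a 2 MB/4 MB PDE and bit 7 for a 4 KB PTE. */
unsigned pat_index(uint32_t entry, bool maps_large_page)
{
    unsigned pwt  = (entry >> 3) & 1u;
    unsigned pcd  = (entry >> 4) & 1u;
    unsigned pati = (entry >> (maps_large_page ? 12 : 7)) & 1u;

    return (pati << 2) | (pcd << 1) | pwt;
}

/* Memory type encoding held in field PAn of a PAT register image. */
unsigned pat_entry_type(uint64_t pat_msr, unsigned index)
{
    return (unsigned)((pat_msr >> (8 * (index & 7u))) & 0x7u);
}
```

For CR3, page-directory-pointer-table entries, and page-directory entries that point to a page table, only PCD and PWT take part, so the index is limited to 0 through 3.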
Figure 9-8 shows that the PAT bit is not defined in CR3, the Page-Directory-Pointer Tables when PAE is enabled, or the Page Directory when it does not describe a large page. In these cases, only PCD and PWT are used to index into the PAT, limiting the operating system to using only the first 4 entries of PAT for describing the memory attributes of the paging hierarchy. Note that all 8 PAT entries are available for describing a 4 KB/2 MB/4 MB page. The memory type as now defined by PAT interacts with the MTRR memory type to determine the effective memory type as outlined in Table 9-9. Compare this to Table 9-5.
Table 9-9. Effective Memory Type Depending on MTRRs and PAT

PAT Memory Type   MTRR Memory Type   Effective Memory Type
UC-               WB, WT             UC_PAGE
                  WC                 WC
                  UC                 UC_MTRR
                  WP                 Undefined
UC                WB, WT, WP, WC     UC_PAGE
                  UC                 UC_MTRR
WC                X                  WC
WT                WB, WT             WT
                  UC                 UC_MTRR
                  WC                 Undefined
                  WP                 Undefined
WP                WB, WP             WP
                  UC                 UC_MTRR
                  WC, WT             Undefined
WB                WB                 WB
                  UC                 UC_MTRR
                  WC                 WC
                  WT                 WT
                  WP                 WP
NOTES:
This table assumes that the CD and NW flags in register CR0 are set to 0. If CR0.CD = 1, then the effective memory type returned is UC, regardless of what is indicated in the table. However, this does not force strict ordering. To ensure strict ordering, the MTRRs also must be disabled.
The effective memory types in the gray areas are implementation dependent and may be different between implementations of Intel Architecture processors.
UC_MTRR indicates that the UC attribute came from the MTRRs and the processor(s) are not required to snoop their caches since the data could never have been cached. This is preferred for performance reasons.
UC_PAGE indicates that the UC attribute came from the page tables and processors are required to check their caches because the data may be cached due to page aliasing, which is not recommended.
UC- is the page encoding PCD, PWT = 10 on P6 family processors that do not support this feature. UC- in the PTE/PDE is overridden by WC in the MTRRs.
UC is the page encoding PCD, PWT = 11 on P6 family processors that do not support this feature. UC in the PTE/PDE overrides WC in the MTRRs.
Whenever the MTRRs are disabled, via bit 11 (E) in the MTRRdefType register, the effective memory type is UC for all memory ranges. An operating system can program the PAT and select the 8 most useful attribute combinations. The PAT allows an operating system to offer performance-enhancing memory types to applications.
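The combinations in Table 9-9 can be expressed as a lookup function. The C sketch below is an illustration under stated assumptions: the function name and the MT_UNDEFINED marker are hypothetical, UC_PAGE and UC_MTRR are both reported simply as UC, and CR0.CD = 0 with the MTRRs enabled is assumed.

```c
enum { MT_UC = 0, MT_WC = 1, MT_WT = 4, MT_WP = 5, MT_WB = 6, MT_UC_MINUS = 7,
       MT_UNDEFINED = -1 /* hypothetical marker for "Undefined" rows */ };

/* Effective memory type for a PAT type and an MTRR type (UC, WC, WT, WP, WB). */
int effective_mem_type(int pat, int mtrr)
{
    switch (pat) {
    case MT_WC:
        return MT_WC;                   /* MTRR type is "don't care" (X) */
    case MT_UC:
        return MT_UC;                   /* UC_PAGE or UC_MTRR */
    case MT_UC_MINUS:
        if (mtrr == MT_WC)
            return MT_WC;               /* UC- is overridden by WC */
        if (mtrr == MT_WP)
            return MT_UNDEFINED;
        return MT_UC;                   /* WB, WT -> UC_PAGE; UC -> UC_MTRR */
    case MT_WT:
        if (mtrr == MT_UC)
            return MT_UC;
        if (mtrr == MT_WB || mtrr == MT_WT)
            return MT_WT;
        return MT_UNDEFINED;            /* WC, WP */
    case MT_WP:
        if (mtrr == MT_UC)
            return MT_UC;
        if (mtrr == MT_WB || mtrr == MT_WP)
            return MT_WP;
        return MT_UNDEFINED;            /* WC, WT */
    case MT_WB:
        return mtrr;                    /* the MTRR type passes through */
    default:
        return MT_UNDEFINED;            /* reserved PAT encoding */
    }
}
```

The function makes the difference between the two uncached page types visible: effective_mem_type(MT_UC_MINUS, MT_WC) yields WC, while effective_mem_type(MT_UC, MT_WC) yields UC.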
The page attribute for addresses containing a page directory or page table supports only the first four entries in the PAT, since a PAT-index bit is not defined for these mappings. The page attribute is determined by using the two-bit value specified by PCD and PWT in CR3 (for page directory) or the page-directory entry (for page tables). The same applies to Page-Directory-Pointer Tables when PAE is enabled.
9.13.5. Programming the PAT

The Page Attribute Table is read/write accessible to software operating at ring 0 through the use of the RDMSR and WRMSR instructions. Accesses are directed to the PAT through use of model specific register address 277H. Refer to Figure 9-7 for the format of the 64-bit register containing the PAT. The PAT implementation on processors that support the feature defines only the 3 least significant bits for page attributes. These bits are used to specify the memory type with the same encoding as used for the P6 family MTRRs as shown in Table 9-6. Processors that support the PAT feature modify those encodings slightly, in that encoding 0 is UC and encoding 7 is UC-, as indicated in Table 9-10. Encoding 7 remains undefined for the fixed and variable MTRRs, and any attempt to write an undefined memory type encoding continues to generate a GP fault. Attempting to write an undefined memory type encoding into the PAT generates a GP fault.
Table 9-10. PAT Memory Types and Their Properties
Mnemonic              Encoding     Cacheable                     Writeback Cacheable  Allows Speculative Reads  Memory Ordering Model
Uncacheable (UC)      0            No                            No                   No                        Strong Ordering
Write Combining (WC)  1            No                            No                   Yes                       Weak Ordering
Write-through (WT)    4            Yes                           No                   Yes                       Speculative Processor Ordering
Write-protect (WP)    5            Yes for reads, no for writes  No                   Yes                       Speculative Processor Ordering
Write-back (WB)       6            Yes                           Yes                  Yes                       Speculative Processor Ordering
Uncached (UC-)        7            No                            No                   No                        Strong Ordered, but can be overridden by WC in the MTRRs
Reserved              2, 3, 8-255
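Since writing an undefined encoding into the PAT generates a GP fault, an operating system may want to validate a new value before issuing WRMSR. The C sketch below shows one way to do that on a 64-bit register image; the function names are hypothetical, and the WRMSR to address 277H itself is a privileged operation that is not shown. Each 8-bit field is checked against the encodings in Table 9-10, which also rejects any set reserved bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Defined encodings are 0, 1, 4, 5, 6, 7; 2, 3, and 8-255 are reserved. */
bool pat_encoding_valid(unsigned enc)
{
    return enc == 0 || enc == 1 || (enc >= 4 && enc <= 7);
}

/* True if every PAn field of the register image holds a defined encoding. */
bool pat_value_valid(uint64_t pat)
{
    for (unsigned i = 0; i < 8; i++) {
        unsigned field = (unsigned)((pat >> (8 * i)) & 0xFFu);
        if (!pat_encoding_valid(field))
            return false;
    }
    return true;
}

/* Return a copy of the register image with field PAn (n = index) replaced. */
uint64_t pat_set_entry(uint64_t pat, unsigned index, unsigned enc)
{
    unsigned shift = 8 * (index & 7u);

    return (pat & ~(0xFFULL << shift)) | ((uint64_t)(enc & 0xFFu) << shift);
}
```

For example, an operating system that wants write-combining available through the page tables could replace PA4 with encoding 1 and confirm with pat_value_valid() that the result is acceptable before writing it on every processor.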
The operating system is responsible for ensuring that changes to a PAT entry occur in a manner that maintains the consistency of the processor caches and translation lookaside buffers (TLB). This is accomplished by following the procedure as specified in the Intel Architecture Software
9-38
Developers Manual, Volume 3: System Programming Guide, for changing the value of an MTRR. It involves a specific sequence of operations that includes flushing the processor(s) caches and TLBs. An operating system must ensure that the PAT of all processors in a multiprocessing system have the same values. The PAT allows any memory type to be specified in the page tables, and therefore it is possible to have a single physical page mapped by two different linear addresses with differing memory types. This practice is strongly discouraged by Intel and should be avoided as it may lead to undefined results. In particular, a WC page must never be aliased to a cacheable page because WC writes may not check the processor caches. When remapping a page that was previously mapped as a cacheable memory type to a WC page, an operating system can avoid this type of aliasing by:
1. Removing the previous mapping to a cacheable memory type in the page tables; that is, making the entries not present.
2. Flushing the TLBs of processors that may have used the mapping, even speculatively.
3. Creating a new mapping to the same physical address with a new memory type, for instance, WC.
4. Flushing the caches on all processors that may have used the mapping previously.
Operating systems that use a Page Directory as a Page Table and enable Page Size Extensions must carefully scrutinize the use of the PATi index bit for the 4 KB Page-Table Entries. The PATi index bit for a PTE (bit 7) corresponds to the page size bit in a PDE. Therefore, the operating system can only utilize PAT entries PA0-3 when setting the caching type for a page table that is also used as a page directory. If the operating system attempts to use PAT entries PA4-7 when using this memory as a page table, it effectively sets the PS bit for the access to this memory as a page directory.
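The restriction to PA0-3 follows from how the PAT entry is selected. The sketch below assumes the usual selection rule, in which the PATi bit (bit 7), the PCD bit (bit 4), and the PWT bit (bit 3) of a 4 KB page-table entry form a 3-bit index; the function names are hypothetical and used only for illustration.

```python
def pat_index(pte):
    """Return which PAT entry (0..7) a 4 KB page-table entry selects."""
    pati = (pte >> 7) & 1   # PAT index bit; the same bit position is PS in a PDE
    pcd = (pte >> 4) & 1    # page-level cache disable
    pwt = (pte >> 3) & 1    # page-level write-through
    return (pati << 2) | (pcd << 1) | pwt


def safe_as_page_directory(pte):
    """True if the entry can also be read as a PDE without PS appearing set."""
    return pat_index(pte) <= 3
```

Any entry that selects PA4 through PA7 necessarily has bit 7 set, which is why the same memory, read as a page directory, would appear to have the PS bit set.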
CHAPTER 10 MMX TECHNOLOGY SYSTEM PROGRAMMING

This chapter describes those features of the MMX technology that must be considered when designing or enhancing an operating system to support MMX technology. It covers MMX instruction set emulation, the MMX state, aliasing of MMX registers, saving MMX state, task and context switching considerations, exception handling, and debugging.
10.1. EMULATION OF THE MMX INSTRUCTION SET

The Intel Architecture does not support emulation of the MMX technology, as it does for floating-point instructions. The EM flag in control register CR0 (provided to invoke emulation of floating-point instructions) cannot be used for MMX technology emulation. If an MMX instruction is executed when the EM flag is set, an invalid opcode exception (#UD) is generated.
10.2. THE MMX STATE AND MMX REGISTER ALIASING

The MMX state consists of eight 64-bit registers (MM0 through MM7). These registers are aliased to the 64-bit mantissas (bits 0 through 63) of floating-point registers R0 through R7 (see Figure 10-1). Note that the MMX registers are mapped to the physical locations of the floating-point registers (R0 through R7), not to the relative locations of the registers in the floating-point register stack (ST0 through ST7). As a result, the MMX register mapping is fixed and is not affected by the value in the Top Of Stack (TOS) field in the floating-point status word (bits 11 through 13).
When a value is written into an MMX register using an MMX instruction, the value also appears in the corresponding floating-point register in bits 0 through 63. Likewise, when a floating-point value is written into a floating-point register by a floating-point instruction, the mantissa of that value also appears in the corresponding MMX register.
The execution of MMX instructions has several side effects on the FPU state contained in the floating-point registers, the FPU tag word, and the FPU status word. These side effects are as follows:
- When an MMX instruction writes a value into an MMX register, at the same time, bits 64 through 79 of the corresponding floating-point register (the exponent field and the sign bit) are set to all 1s.
- When an MMX instruction (other than the EMMS instruction) is executed, each of the tag fields in the FPU tag word is set to 00B (valid). (See also Section 10.2.1., Effect of MMX and Floating-Point Instructions on the FPU Tag Word.)
- When the EMMS instruction is executed, each tag field in the FPU tag word is set to 11B (empty).
- Each time an MMX instruction is executed, the TOS value is set to 000B.
[Figure: the FPU tag register with all eight tags at 00B; floating-point registers R0 through R7 (bits 79-64 and the mantissa in bits 63-0); the TOS field (bits 13-11 of the FPU status register) at 000B; and MMX registers MM0 through MM7 (bits 63-0) overlaid on the mantissas, with MM0 at TOS = 0.]
Figure 10-1. Mapping of MMX Registers to Floating-Point Registers
Execution of MMX instructions does not affect the other bits in the FPU status word (bits 0 through 10 and bits 14 and 15) or the contents of the other FPU registers that comprise the FPU state (the FPU control word, instruction pointer, data pointer, or opcode registers). Table 10-1 summarizes the effects of the MMX instructions on the FPU state.
Table 10-1. Effects of MMX Instructions on FPU State

MMX Instruction Type | FPU Tag Word | TOS Field of FPU Status Word | Other FPU Registers | Exponent Bits and Sign Bit of Rn | Mantissa of Rn
Read from MMn register | All tags set to 00B (Valid) | 000B | Unchanged | Unchanged | Unchanged
Write to MMn register | All tags set to 00B (Valid) | 000B | Unchanged | Set to all 1s | Overwritten with MMX data
EMMS | All fields set to 11B (Empty) | 000B | Unchanged | Unchanged | Unchanged
NOTE: MMn refers to one MMX register; Rn refers to the corresponding floating-point register.
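The rows of Table 10-1 can be read as a small state machine. The following is a toy model for illustration only; the class `FpuModel` and its method names are hypothetical, and each 80-bit register is represented as a plain Python integer.

```python
MANTISSA_MASK = 2**64 - 1


class FpuModel:
    """Toy model of the FPU state touched by MMX instructions (Table 10-1)."""

    def __init__(self):
        self.regs = [0] * 8       # R0..R7 as 80-bit integers
        self.tags = [0b11] * 8    # 11B = empty
        self.tos = 0              # TOS field of the FPU status word

    def _mark_valid(self):
        # Every MMX instruction except EMMS sets all tags to 00B and TOS to 000B.
        self.tags = [0b00] * 8
        self.tos = 0

    def mmx_read(self, n):
        self._mark_valid()
        return self.regs[n] & MANTISSA_MASK

    def mmx_write(self, n, value):
        self._mark_valid()
        # Bits 64-79 (exponent and sign) become all 1s; bits 0-63 take the data.
        self.regs[n] = (0xFFFF << 64) | (value & MANTISSA_MASK)

    def emms(self):
        # EMMS marks every register empty and also leaves TOS at 000B.
        self.tags = [0b11] * 8
        self.tos = 0
```

Writing MM3 in this model leaves R3 with 0xFFFF in bits 64 through 79, which is why a floating-point instruction that later reads R3 would not see a meaningful number unless the FPU state is reinitialized or restored first.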
10.2.1. Effect of MMX and Floating-Point Instructions on the FPU Tag Word
Table 10-2 summarizes the effect of MMX and floating-point instructions on the tags in the FPU tag word and the corresponding tags in an image of the tag word stored in memory.
Table 10-2. Effect of the MMX and Floating-Point Instructions on the FPU Tag Word
Instruction Type | Instruction | FPU Tag Word | Image of FPU Tag Word Stored in Memory
MMX Instruction | All (except EMMS) | All tags are set to 00B (valid). | Not affected.
MMX Instruction | EMMS | All tags are set to 11B (empty). | Not affected.
Floating-Point Instruction | All (except FXSAVE/FSAVE, FSTENV, FXRSTOR/FRSTOR, FLDENV) | Tag for modified floating-point register is set to 00B or 11B. | Not affected.
Floating-Point Instruction | FXSAVE/FSAVE, FSTENV | Tags and register values are read and interpreted; then all tags are set to 11B. | Tags are set according to the actual values in the floating-point registers; that is, empty registers are marked 11B and valid registers are marked 00B (nonzero), 01B (zero), or 10B (special).
Floating-Point Instruction | FXRSTOR/FRSTOR, FLDENV | All tags marked 11B in memory are set to 11B; all other tags are set according to the value in the corresponding floating-point register: 00B (nonzero), 01B (zero), or 10B (special). | Tags are read and interpreted, but not modified.
The values in the fields of the FPU tag word do not affect the contents of the MMX registers or the execution of MMX instructions. However, the MMX instructions do modify the contents of the FPU tag word, as described in Section 10.2., The MMX State and MMX Register Aliasing. These modifications may affect the operation of the FPU when executing floating-point instructions, if the FPU state is not initialized or restored prior to beginning floating-point instruction execution.
Note that the FXSAVE/FSAVE and FSTENV instructions (which save FPU state information) read the FPU tag register and the contents of each of the floating-point registers, determine the actual tag values for each register (empty, nonzero, zero, or special), and store the updated tag word in memory. After executing these instructions, all the tags in the FPU tag word are set to empty (11B). Likewise, the EMMS instruction clears MMX state from the MMX/floating-point registers by setting all the tags in the FPU tag word to 11B.
10.3. SAVING AND RESTORING THE MMX STATE AND REGISTERS

The recommended method of saving and restoring the MMX technology state is as follows:
- Execute an FXSAVE/FSAVE/FNSAVE instruction to write the entire state of the MMX/FPU, the SIMD floating-point registers, and the SIMD floating-point MXCSR to memory.
- Execute an FXRSTOR/FRSTOR instruction to read the entire saved state of the MMX/FPU, the SIMD floating-point registers, and the SIMD floating-point MXCSR from memory into the FPU registers, the aliased MMX registers, the SIMD floating-point registers, and the SIMD floating-point MXCSR.
This save and restore method is required for operating systems (refer to Section 10.4., Designing Operating System Task and Context Switching Facilities). Applications can in some cases save and restore only the MMX registers, in the following way:
- Execute eight MOVQ instructions to write the contents of the MMX registers MM0 through MM7 to memory. An EMMS instruction may then (optionally) be executed to clear the MMX state in the FPU.
- Execute eight MOVQ instructions to read the saved contents of the MMX registers from memory into the MM0 through MM7 registers.
NOTE
Intel does not support scanning the FPU tag word and then only saving valid entries.
10.4. DESIGNING OPERATING SYSTEM TASK AND CONTEXT SWITCHING FACILITIES

When switching from one task or context to another, it is often necessary to save the MMX state (just as it is often necessary to save the state of the FPU). As a general rule, if the existing task switching code for an operating system includes facilities for saving the state of the FPU, these facilities can also be relied upon to save the MMX state, without rewriting the task switch code. This reliance is possible because the MMX state is aliased to the FPU state (refer to Section 10.2., The MMX State and MMX Register Aliasing). When designing new MMX (and/or FPU) state saving facilities for an operating system, several approaches are available:
- The operating system can require that applications (which will be run as tasks) take responsibility for saving the state of the MMX/FPU prior to a task suspension during a task switch and for restoring the MMX/FPU state when the task is resumed. The application can use either of the state saving and restoring techniques given in Section 10.3., Saving and Restoring the MMX State and Registers. This approach to saving MMX/FPU state is appropriate for cooperative multitasking operating systems, where the application has control over (or is able to determine) when a task switch is about to occur and can save state prior to the task switch.
- The operating system can take the responsibility for automatically saving the MMX/FPU state as part of the task switch process (using an FXSAVE/FSAVE instruction) and automatically restoring the MMX/FPU state when a suspended task is resumed (using an FXRSTOR/FRSTOR instruction). Here, the MMX/FPU state must be saved as part of the task state. This approach is appropriate for preemptive multitasking operating systems, where the application cannot know when it is going to be preempted and cannot prepare in advance for task switching. The operating system is responsible for saving and restoring the task and MMX/FPU state when necessary.
- The operating system can take the responsibility for saving the MMX/FPU state as part of the task switch process, but delay the saving of the MMX/FPU state until an MMX or floating-point instruction is actually executed by the new task. Using this approach, the MMX/FPU state is saved only if an MMX or floating-point instruction needs to be executed in the new task. (Refer to Section 10.4.1., Using the TS Flag in Control Register CR0 to Control MMX/FPU State Saving, for more information on this MMX/FPU state saving technique.)
10.4.1. Using the TS Flag in Control Register CR0 to Control MMX/FPU State Saving
Saving the MMX/FPU state using the FXSAVE/FSAVE instruction is a relatively high-overhead operation. If a task being switched to will not access the FPU (by executing an MMX or a floating-point instruction), this overhead can be avoided by not automatically saving the MMX/FPU state on a task switch.
The TS flag in control register CR0 is provided to allow the operating system to delay saving the MMX/FPU state until the FPU is actually accessed in the new task. When this flag is set, the processor monitors the instruction stream for MMX or floating-point instructions. When the processor detects an MMX or floating-point instruction, it raises a device-not-available exception (#NM) prior to executing the instruction. The device-not-available exception handler can then be used to save the MMX/FPU state for the previous task (using an FXSAVE/FSAVE instruction) and load the MMX/FPU state for the current task (using an FXRSTOR/FRSTOR instruction). If the task never encounters an MMX or floating-point instruction, the device-not-available exception will not be raised and the MMX/FPU state will not be saved unnecessarily.
The TS flag can be set either explicitly (by executing a MOV instruction to control register CR0) or implicitly (using the processor's native task switching mechanism). When the native task switching mechanism is used, the processor automatically sets the TS flag on a task switch. After the device-not-available handler has saved the MMX/FPU state, it should execute the CLTS instruction to clear the TS flag in CR0.
Figure 10-2 gives an example of an operating system that implements MMX/FPU state saving using the TS flag. In this example, task A is the currently running task and task B is the task being switched to.
[Figure: task A and task B at the application level, each with its own MMX/FPU state save area in the operating system. The operating system task switching code and the MMX/FPU state owner variable sit between them. When CR0.TS = 1 and a floating-point or MMX instruction is encountered in task B, the device-not-available exception handler saves task A's MMX/FPU state and loads task B's MMX/FPU state.]
Figure 10-2. Example of MMX/FPU State Saving During an Operating System-Controlled Task Switch
The operating system maintains an MMX/FPU save area for each task and defines a variable (MMX/FPUStateOwner) that indicates which task owns the MMX/FPU state. In this example, task A is the current MMX/FPU state owner. On a task switch, the operating system task switching code must execute the following pseudocode to set the TS flag according to which task is the current MMX/FPU state owner. If the new task (task B in this example) is not the current MMX/FPU state owner, the TS flag is set to 1; otherwise, it is set to 0.
    IF Task_Being_Switched_To ≠ MMX/FPUStateOwner
        THEN CR0.TS ← 1;
        ELSE CR0.TS ← 0;
    FI;
If a new task attempts to use an MMX or floating-point instruction while the TS flag is set to 1, a device-not-available exception (#NM) is generated and the device-not-available exception handler executes the following pseudo-code.
    CR0.TS ← 0;
    FSAVE "To MMX/FPU State Save Area for Current MMX/FPU State Owner";
    FRSTOR "MMX/FPU State From Current Task's MMX/FPU State Save Area";
    MMX/FPUStateOwner ← Current_Task;
This handler code performs the following tasks:
- Clears the TS flag.
- Saves the MMX/FPU state in the state save area for the current MMX/FPU state owner.
- Restores the MMX/FPU state from the new task's MMX/FPU state save area.
- Updates the current MMX/FPU state owner to be the current task.
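The task-switch rule and the handler steps above can be exercised together in a small simulation. This is a sketch under simplifying assumptions: the class `LazyFpuSwitcher` and its members are hypothetical, a single integer stands in for the whole MMX/FPU register state, and a task with no save area yet is given a state of 0.

```python
class LazyFpuSwitcher:
    """Toy model of deferred MMX/FPU state saving driven by CR0.TS."""

    def __init__(self, first_task):
        self.current = first_task
        self.owner = first_task   # MMX/FPUStateOwner
        self.ts = 0               # CR0.TS
        self.hw_state = 0         # stands in for the live MMX/FPU registers
        self.save_area = {}       # one save area per task
        self.saves = 0            # number of FSAVE-style saves performed

    def task_switch(self, new_task):
        # Set TS only if the incoming task does not own the live state.
        self.current = new_task
        self.ts = 1 if new_task != self.owner else 0

    def use_fpu(self, new_value=None):
        """Execute an MMX or floating-point instruction in the current task."""
        if self.ts:
            self._device_not_available()   # #NM is raised before the instruction runs
        if new_value is not None:
            self.hw_state = new_value
        return self.hw_state

    def _device_not_available(self):
        self.ts = 0                                            # clear TS (CLTS)
        self.save_area[self.owner] = self.hw_state             # FSAVE for the old owner
        self.saves += 1
        self.hw_state = self.save_area.get(self.current, 0)    # FRSTOR for the current task
        self.owner = self.current
```

In this model, switching from task A to task B and back without task B touching the FPU performs no save at all, which is the overhead the TS flag is meant to avoid.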
10.5. EXCEPTIONS THAT CAN OCCUR WHEN EXECUTING MMX INSTRUCTIONS

MMX instructions do not generate floating-point exceptions, nor do they affect the processor's status flags in the EFLAGS register or the FPU status word. The following exceptions can be generated during the execution of an MMX instruction:
- Exceptions during memory accesses:
    - Stack-segment fault (#SS).
    - General protection (#GP).
    - Page fault (#PF).
    - Alignment check (#AC), if alignment checking is enabled.
- System exceptions:
    - Invalid opcode (#UD), if the EM flag in control register CR0 is set when an MMX instruction is executed. (Refer to Section 10.1., Emulation of the MMX Instruction Set.)
    - Device not available (#NM), if an MMX instruction is executed when the TS flag in control register CR0 is set. (Refer to Section 10.4.1., Using the TS Flag in Control Register CR0 to Control MMX/FPU State Saving.)
- Floating-point error (#MF). (Refer to Section 10.5.1., Effect of MMX Instructions on Pending Floating-Point Exceptions.)
Other exceptions can occur indirectly due to the faulty execution of the exception handlers for the above exceptions. For example, if a stack-segment fault (#SS) occurs due to MMX instructions, the interrupt gate for the stack-segment fault can direct the processor to an invalid TSS, causing an invalid TSS exception (#TS) to be generated.
10.5.1. Effect of MMX Instructions on Pending Floating-Point Exceptions

If a floating-point exception is pending and the processor encounters an MMX instruction, the processor generates a floating-point error (#MF) prior to executing the MMX instruction, to allow the exception to be handled by the floating-point error exception handler. While the handler is executing, the FPU state is maintained and is visible to the handler. Upon returning from the exception handler, the MMX instruction is executed, which will alter the FPU state, as described in Section 10.2., The MMX State and MMX Register Aliasing.
10.6. DEBUGGING
The debug facilities of the Intel Architecture operate in the same manner when executing MMX instructions as when executing other Intel Architecture instructions. These facilities enable debuggers to debug MMX technology code.
To correctly interpret the contents of the MMX or FPU registers from the FXSAVE/FSAVE image in memory, a debugger needs to take account of the relationship between the floating-point registers' logical locations relative to TOS and the MMX registers' physical locations. In the floating-point context, STn refers to a floating-point register at location n relative to the TOS. However, the tags in the FPU tag word are associated with the physical locations of the floating-point registers (R0 through R7). The MMX registers always refer to the physical locations of the registers (with MM0 through MM7 being mapped to R0 through R7).
In Figure 10-3, the inner circle refers to the physical location of the floating-point and MMX registers. The outer circle refers to the floating-point registers' relative location to the current TOS.
[Figure: two register wheels, Case A (TOS = 0) and Case B (TOS = 2). Outer circle = FP registers' logical location relative to TOS (ST0 through ST7, moving with FP push and FP pop). Inner circle = FPU tags = MMX registers' location = FP registers' physical location (MM0 (R0) through MM7).]
Figure 10-3. Mapping of MMX Registers to Floating-Point (FP) Registers
When the TOS equals 0 (case A in Figure 10-3), ST0 points to the physical location R0 on the floating-point stack. MM0 maps to ST0, MM1 maps to ST1, and so on. When the TOS equals 2 (case B in Figure 10-3), ST0 points to the physical location R2. MM0 maps to ST6, MM1 maps to ST7, MM2 maps to ST0, and so on.
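Both cases follow from modulo-8 arithmetic on the TOS value, which a debugger could apply when interpreting a saved image. The helper names below are hypothetical and the sketch covers only the index calculation.

```python
def st_index_for_mmx(n, tos):
    """Return i such that MMn (physical register Rn) is addressed as STi."""
    return (n - tos) % 8


def physical_for_st(i, tos):
    """Return the physical register number that STi refers to."""
    return (tos + i) % 8
```

With TOS = 2, `st_index_for_mmx(0, 2)` is 6 and `st_index_for_mmx(2, 2)` is 0, matching case B.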
CHAPTER 11 STREAMING SIMD EXTENSIONS SYSTEM PROGRAMMING

This chapter describes those features of the Streaming SIMD Extensions that must be considered when designing or enhancing an operating system to support the Pentium III processor. It covers extensions emulation, the new SIMD floating-point architectural state, similarities to MMX technology, task and context switching considerations, exception handling, and debugging.
11.1. EMULATION OF THE STREAMING SIMD EXTENSIONS

The Intel Architecture does not support emulation of the Streaming SIMD Extensions, as it does for floating-point instructions. The EM flag in control register CR0 (provided to invoke emulation of floating-point instructions) cannot be used for Streaming SIMD Extensions emulation. If a Streaming SIMD Extensions instruction is executed when the EM flag is set (CR0.EM = 1), an invalid opcode exception (#UD, INT 6) is generated instead of a device-not-available exception (#NM, INT 7).
11.2. MMX STATE AND STREAMING SIMD EXTENSIONS

The SIMD-integer instructions of the Streaming SIMD Extensions use the same registers as the MMX technology instructions. In addition, they have been implemented so that the same rules that apply to MMX technology instructions also apply to them. Hence everything in Chapter 10 relating to MMX technology and system programming is applicable to the SIMD-integer instructions in the Streaming SIMD Extensions.
11.3. NEW PENTIUM III PROCESSOR REGISTERS

The Pentium III Processor introduced a set of 128-bit general-purpose registers. These registers are directly addressable and can be used to hold data only. In addition, the Pentium III Processor also introduced a new control/status register (MXCSR) that is used to flag exceptions resulting from computations involving the SIMD floating-point registers, mask/unmask exceptions, and control the rounding and flush-to-zero modes. These registers are described more completely in the following sections.
11.3.1. SIMD Floating-point Registers
Streaming SIMD Extensions provides eight 128-bit general-purpose registers, each of which can be directly addressed. These registers are new state, and require support from the operating system to use them. The SIMD floating-point registers can hold packed 128-bit data. The SIMD floating-point instructions access the SIMD floating-point registers directly using the register names XMM0 to XMM7 (Table 11-1). These registers can be used to perform calculations on data. They cannot be used to address memory; addressing is accomplished by using the integer registers and existing IA addressing modes. The contents of SIMD floating-point registers are cleared upon reset. There is a new control/status register MXCSR which is used to mask/unmask numerical exception handling, to set rounding modes, to set the flush-to-zero mode, and to view status flags.
Table 11-1. SIMD Floating-point Register Set
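Bits 127-96 | Bits 95-64 | Bits 63-32 | Bits 31-0 | Register
 | | | | XMM0
 | | | | XMM1
 | | | | XMM2
 | | | | XMM3
 | | | | XMM4
 | | | | XMM5
 | | | | XMM6
 | | | | XMM7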
11.3.2. SIMD Floating-point Control/Status Registers
The control/status register is used to enable masked/unmasked numerical exception handling, to set rounding modes, to set the flush-to-zero mode, and to view status flags. The contents of this register can be loaded with the LDMXCSR and FXRSTOR instructions and stored in memory with the STMXCSR and FXSAVE instructions. Figure 11-1 shows the format and encoding of the fields in the MXCSR.
Bit(s) | Field
31-16 | Reserved
15 | FZ
14-13 | RC
12 | PM
11 | UM
10 | OM
9 | ZM
8 | DM
7 | IM
6 | Reserved
5 | PE
4 | UE
3 | OE
2 | ZE
1 | DE
0 | IE
Figure 11-1. Streaming SIMD Extensions Control/Status Register Format
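The layout in Figure 11-1 can be sketched as a decoder. The function name `decode_mxcsr` and the dictionary it returns are hypothetical; the reset value 1F80H used here is derived from the reset behavior this section describes (all six mask bits set, all other defined bits clear), and a `ValueError` stands in for the general protection exception raised when reserved bits are written with a nonzero value.

```python
MXCSR_RESET = 0x1F80          # bits 12-7 set, everything else clear
MXCSR_RESERVED = 0xFFFF0040   # bits 31-16 and bit 6

FLAG_NAMES = ["IE", "DE", "ZE", "OE", "UE", "PE"]   # bits 0-5
MASK_NAMES = ["IM", "DM", "ZM", "OM", "UM", "PM"]   # bits 7-12
ROUNDING = {0b00: "nearest", 0b01: "down", 0b10: "up", 0b11: "toward zero"}


def decode_mxcsr(value):
    """Split an MXCSR value into its flag, mask, RC, and FZ fields."""
    if value & MXCSR_RESERVED:
        raise ValueError("reserved MXCSR bits set (hardware raises #GP)")
    return {
        "flags": [n for bit, n in enumerate(FLAG_NAMES) if (value >> bit) & 1],
        "masked": [n for bit, n in enumerate(MASK_NAMES, start=7) if (value >> bit) & 1],
        "rounding": ROUNDING[(value >> 13) & 0b11],
        "flush_to_zero": bool((value >> 15) & 1),
    }
```

Decoding `MXCSR_RESET` reports no exception flags, all six exceptions masked, round to nearest, and flush-to-zero disabled.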
- Bits 5-0 indicate whether a Streaming SIMD Extensions numerical exception has been detected. They are sticky flags, and can be cleared by using the LDMXCSR instruction to write zeroes to these fields. If an LDMXCSR instruction clears a mask bit and sets the corresponding exception flag bit, an exception will not be generated because of this change. This type of exception will occur only upon the next Streaming SIMD Extensions instruction to cause it. Streaming SIMD Extensions use only one exception flag for each exception. There is no provision for individual exception reporting within a packed data type. In situations where multiple identical exceptions occur within the same instruction, the associated exception flag is updated and indicates that at least one of these conditions happened. These flags are cleared upon reset.
- Bits 12-7 configure numerical exception masking; an exception type is masked if the corresponding bit is set and it is unmasked if the bit is clear. These bits are set upon reset, meaning that all numerical exceptions are masked.
- Bits 14-13 encode the rounding control, which provides for the common round to nearest mode, as well as directed rounding and true chop (refer to Section 11.3.2.1., Rounding Control Field). The rounding control is set to round to nearest upon reset.
- Bit 15 (FZ) is used to turn on the flush-to-zero mode (refer to Section 11.3.2.2., Flush-to-Zero). This bit is cleared upon reset, disabling the flush-to-zero mode.
- The other bits of MXCSR (bits 31-16 and bit 6) are defined as reserved and cleared; attempting to write a non-zero value to these bits, using either the FXRSTOR or LDMXCSR instructions, will result in a general protection exception.
11.3.2.1. ROUNDING CONTROL FIELD
The rounding control (RC) field of MXCSR (bits 13 and 14) controls how the results of floating-point instructions are rounded. Four rounding modes are supported: round to nearest, round up, round down, and round toward zero (see Table 11-2). Round to nearest is the default rounding mode and is suitable for most applications. It provides the most accurate and statistically unbiased estimate of the true result.

Rounding Mode | RC Field Setting | Description
Round to nearest (even) | 00B | Rounded result is the closest to the infinitely precise result. If two values are equally close, the result is the even value (that is, the one with the least-significant bit of zero).
Round down (toward −∞) | 01B | Rounded result is closest to, but no greater than the infinitely precise result.
Round up (toward +∞) | 10B | Rounded result is closest to, but no less than the infinitely precise result.
Round toward zero (truncate) | 11B | Rounded result is closest to, but no greater in absolute value than the infinitely precise result.
The round up and round down modes are termed directed rounding and can be used to implement interval arithmetic. Interval arithmetic is used to determine upper and lower bounds for the true result of a multistep computation, when the intermediate results of the computation are subject to rounding.
The round toward zero mode (sometimes called the chop mode) is commonly used when performing integer arithmetic with the processor.
Whenever possible, the processor produces an infinitely precise result. However, it is often the case that the infinitely precise result of an arithmetic or store operation cannot be encoded exactly in the format of the destination operand. For example, the following value (a) has a 24-bit fraction. The least-significant bit of this fraction (the final 1) cannot be encoded exactly in the single-real format (which has only a 23-bit fraction):
    (a) 1.0001 0000 1000 0011 1001 0111 E₂ 101
To round this result (a), the processor first selects two representable fractions b and c that most closely bracket a in value (b < a < c).
    (b) 1.0001 0000 1000 0011 1001 011 E₂ 101
    (c) 1.0001 0000 1000 0011 1001 100 E₂ 101
The processor then sets the result to b or to c according to the rounding mode selected in the RC field. Rounding introduces an error in a result that is less than one unit in the last place to which the result is rounded.
The rounded result is called the inexact result. When the processor produces an inexact result, the floating-point precision (inexact) flag (PE) is set in MXCSR.
When the infinitely precise result is between the largest positive finite value allowed in a particular format and +∞, the processor rounds the result as shown in Table 11-3.
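The choice between b and c can be reproduced with integer arithmetic on the significand. This is a minimal sketch, not a full floating-point implementation: the function `round_fraction` is hypothetical, the significand is passed as an integer that includes the leading 1, and a carry that would require renormalizing the exponent is not handled.

```python
def round_fraction(bits, extra, mode, negative=False):
    """Drop the low `extra` bits of the magnitude `bits` under an RC mode.

    mode: 0b00 nearest (even), 0b01 down (toward -inf), 0b10 up (toward +inf),
    0b11 toward zero. `negative` is the sign of the value being rounded.
    Returns (rounded_bits, inexact).
    """
    low = bits >> extra                  # candidate b: magnitude truncated
    rem = bits & ((1 << extra) - 1)      # the discarded bits
    if rem == 0:
        return low, False                # exactly representable, no rounding
    half = 1 << (extra - 1)
    if mode == 0b00:
        # Nearest; on an exact tie pick the candidate whose last bit is 0.
        bump = rem > half or (rem == half and bool(low & 1))
    elif mode == 0b01:
        bump = negative                  # toward -inf grows negative magnitudes
    elif mode == 0b10:
        bump = not negative              # toward +inf grows positive magnitudes
    else:
        bump = False                     # truncate
    return low + (1 if bump else 0), True
```

Applied to value (a) with one extra bit, round to nearest and round up both select (c), while round down and round toward zero select (b). Round to nearest selects (c) here because (a) lies exactly halfway between the two candidates and (c) is the one with a least-significant bit of zero.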
Table 11-3. Rounding of Positive Numbers Greater than the Maximum Positive Finite Value
Rounding Mode | Result
Rounding to nearest (even) | +∞
Rounding down (toward −∞) | Maximum, positive finite value
Rounding up (toward +∞) | +∞
Rounding toward zero (Truncate) | Maximum, positive finite value
When the infinitely precise result is between the largest negative finite value allowed in a particular format and −∞, the processor rounds the result as shown in Table 11-4.
Table 11-4. Rounding of Negative Numbers Smaller than the Maximum Negative Finite Value
Rounding Mode | Result
Rounding to nearest (even) | −∞
Rounding toward zero (Truncate) | Maximum, negative finite value
Rounding up (toward +∞) | Maximum, negative finite value
Rounding down (toward −∞) | −∞
The rounding modes have no effect on comparison operations, operations that produce exact results, or operations that produce NaN results.
11.3.2.2. FLUSH-TO-ZERO
Turning on the Flush-To-Zero mode has the following effects when tiny results occur (i.e. when the infinitely precise result rounded to the destination precision with an unbounded exponent, is smaller in absolute value than the smallest normal number that can be represented; this is similar to the underflow condition when underflow traps are unmasked):
- Zero results are returned with the sign of the true result.
- Precision and underflow exception flags are set.
The IEEE mandated masked response to underflow is to deliver the denormalized result (i.e., gradual underflow); consequently, the flush-to-zero mode is not compatible with IEEE Standard 754. It is provided primarily for performance reasons. At the cost of a slight precision loss, faster execution can be achieved for applications where underflow is common. Underflow for flush-to-zero is defined to occur when the exponent for a computed result, prior to denormalization scaling, falls in the denormal range; this is regardless of whether a loss of accuracy has occurred. Unmasking the underflow exception takes precedence over flush-to-zero mode; this means that an exception handler will be invoked for a Streaming SIMD Extensions instruction that generates an underflow condition while this exception is unmasked, regardless of whether flush-to-zero is enabled.
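The interaction between the FZ bit and the underflow mask can be sketched as follows. This is a simplified model with hypothetical names: the input is treated as a result already rounded to single precision with an unbounded exponent, `ArithmeticError` stands in for invoking the underflow exception handler, and flag handling for the gradual-underflow path is deliberately left out.

```python
import math

SMALLEST_NORMAL_SINGLE = 2.0 ** -126   # smallest normal single-precision magnitude


def flush_to_zero(result, fz, underflow_masked=True):
    """Return (value, flags_set) for a computed result under the FZ bit."""
    tiny = result != 0.0 and abs(result) < SMALLEST_NORMAL_SINGLE
    if not tiny:
        return result, set()
    if not underflow_masked:
        # An unmasked underflow takes precedence over flush-to-zero.
        raise ArithmeticError("unmasked underflow: exception handler invoked")
    if fz:
        # Zero with the sign of the true result; PE and UE are both set.
        return math.copysign(0.0, result), {"UE", "PE"}
    return result, set()   # gradual underflow; its flag updates are not modeled
```

With `fz=True`, a tiny negative result such as -1e-40 comes back as negative zero with both flags set, while the same call with `underflow_masked=False` raises instead of flushing.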
11.4. ENABLING STREAMING SIMD EXTENSIONS SUPPORT

This section describes the interface of the Intel Architecture Streaming SIMD Extensions with the operating system.
11.4.1. Enabling Streaming SIMD Extensions Support
Certain steps must be taken in both the application and the OS to check whether the CPU supports Streaming SIMD Extensions and associated unmasked exceptions. This section describes this process, which is conducted using the bits described in Table 11-5 and Table 11-6.
If the OS wants to use FXSAVE/FXRSTOR, it first checks CPUID.FXSR to determine whether the CPU supports these instructions. If the CPU does support FXSAVE/FXRSTOR, the OS can set CR4.OSFXSR without faulting and enable context-switching code that uses FXSAVE/FXRSTOR instead of FSAVE/FRSTOR. At this point, if the OS also supports unmasked SIMD floating-point exceptions, it should check CPUID.XMM to see whether this is a Streaming SIMD Extensions-enabled processor. If CPUID.XMM is set, the OS can set CR4.OSXMMEXCPT without faulting.
The process by which an application detects the existence of Streaming SIMD Extensions is discussed in Section 9.5.1., Detecting Support for Streaming SIMD Extensions Using the CPUID Instruction, in Chapter 9, Programming with the Streaming SIMD Extensions, of the Intel Architecture Software Developer's Manual, Volume 1. For additional information and examples, see AP-900, Identifying Support for Streaming SIMD Extensions in the Processor and Operating System.
Table 11-5. CPUID Bits for Streaming SIMD Extensions Support
CPUID bit (EAX = 1) | Meaning
FXSR (EDX bit 24) | If set, CPU supports FXSAVE/FXRSTOR. The OS can read this bit to determine if it can use FXSAVE/FXRSTOR in place of FSAVE/FRSTOR for context switches.
XMM (EDX bit 25) | If set, the Streaming SIMD Extensions set is supported by the processor.
Table 11-6. CR4 Bits for Streaming SIMD Extensions Support

CR4 bit | Meaning
OSFXSR (bit 9) | Defaults to clear. If both the CPU and the OS support FXSAVE/FXRSTOR for use during context switches, then the OS will set this bit.
OSXMMEXCPT (bit 10) | Defaults to clear. The OS will set this bit if it supports unmasked SIMD floating-point exceptions.
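The checks described in Section 11.4.1 combine the bits of Table 11-5 and Table 11-6 in a fixed order. The sketch below takes the EDX value returned by CPUID (EAX = 1) as an input rather than executing CPUID, and the function name and its second parameter are hypothetical.

```python
CPUID_FXSR = 1 << 24        # EDX bit 24: FXSAVE/FXRSTOR supported
CPUID_XMM = 1 << 25         # EDX bit 25: Streaming SIMD Extensions supported
CR4_OSFXSR = 1 << 9
CR4_OSXMMEXCPT = 1 << 10


def cr4_bits_to_set(cpuid_edx, os_handles_simd_exceptions):
    """Return the CR4 bits an OS could set, given the CPUID feature flags."""
    bits = 0
    if cpuid_edx & CPUID_FXSR:
        bits |= CR4_OSFXSR
        # OSXMMEXCPT is considered only after FXSR support is confirmed.
        if os_handles_simd_exceptions and (cpuid_edx & CPUID_XMM):
            bits |= CR4_OSXMMEXCPT
    return bits
```

On a processor reporting both feature bits, an OS that handles unmasked SIMD floating-point exceptions would set both CR4 bits (600H); one that does not would set only OSFXSR (200H).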
11.4.2. Device Not Available (DNA) Exceptions
Streaming SIMD Extensions will cause a DNA exception (#NM) if the processor attempts to execute a SIMD floating-point instruction while CR0.TS is set. If CPUID.XMM is clear, execution of any Streaming SIMD Extensions instruction will cause an invalid opcode fault regardless of the state of CR0.TS.
11.4.3. FXSAVE/FXRSTOR as a Replacement for FSAVE/FRSTOR

The FXSAVE and FXRSTOR instructions are designed to be a replacement for FSAVE/FRSTOR, to be used by the OS for context switches. These have been optimized to be faster than FSAVE/FRSTOR, while still saving/restoring the additional SIMD floating-point state. To meet this goal, FXSAVE differs from FSAVE in that it does not cause an FINIT to be performed, nor does FXSAVE initialize the SIMD floating-point registers in any way. While FXSAVE/FXRSTOR does save/restore the x87-FP state, FSAVE/FRSTOR does not affect the SIMD floating-point state. This allows for FXSAVE/FXRSTOR and FSAVE/FRSTOR to be nested. State saved with FXSAVE and restored with FRSTOR (and vice versa) will result in incorrect restoration of state in the processor. FXSAVE will not save the SIMD floating-point state (SIMD floating-point registers and MXCSR register) if the CR4.OSFXSR bit is not set.
11.4.4. Numeric Error flag and IGNNE#

Streaming SIMD Extensions ignore CR0.NE (treating it as if it were always set) and the IGNNE# pin, and always use the vector 19 software exception for error reporting.
11.5. SAVING AND RESTORING THE STREAMING SIMD EXTENSIONS STATE

The recommended method of saving and restoring the Streaming SIMD Extensions state is as follows:
• Execute an FXSAVE instruction to write the entire state of the MMX/FPU, the SIMD floating-point registers, and the SIMD floating-point MXCSR to memory.
• Execute an FXRSTOR instruction to read the entire saved state of the MMX/FPU, the SIMD floating-point registers, and the SIMD floating-point MXCSR from memory into the FPU registers and the aliased MMX registers.
This save and restore method is required for operating systems (see Section 11.6., Designing Operating System Task and Context Switching Facilities). Applications can in some cases save and restore only the SIMD floating-point registers, in the following way:
• Execute eight MOVAPS instructions to write the contents of the SIMD floating-point registers XMM0 through XMM7 to memory.
• Execute a STMXCSR instruction to save the MXCSR register to memory.
• Execute eight MOVAPS instructions to read the saved contents of the SIMD floating-point registers from memory into the XMM0 through XMM7 registers.
• Execute a LDMXCSR instruction to read the saved contents of the MXCSR register from memory into the MXCSR register.
11.6. DESIGNING OPERATING SYSTEM TASK AND CONTEXT SWITCHING FACILITIES

When switching from one task or context to another, it is often necessary to save the SIMD floating-point state (just as it is often necessary to save the state of the FPU). As mentioned in the previous chapter, the MMX state is aliased on the FPU state. The SIMD floating-point registers in the Pentium III processor introduce a new state. When designing new SIMD floating-point state saving facilities for an operating system, several approaches are available:
• The operating system can require that applications (which will be run as tasks) take responsibility for saving the SIMD floating-point state prior to a task suspension during a task switch and for restoring the SIMD floating-point state when the task is resumed. The application can use either of the state saving and restoring techniques given in Section 11.5., Saving and Restoring the Streaming SIMD Extensions State. This approach to saving the SIMD floating-point state is appropriate for cooperative multitasking operating systems, where the application has control over (or is able to determine) when a task switch is about to occur and can save state prior to the task switch.
• The operating system can take the responsibility for automatically saving the SIMD floating-point state as part of the task switch process (using an FXSAVE instruction) and automatically restoring the SIMD floating-point state when a suspended task is resumed (using an FXRSTOR instruction). Here, the SIMD floating-point state must be saved as part of the task state. This approach is appropriate for preemptive multitasking operating systems, where the application cannot know when it is going to be preempted and cannot prepare in advance for task switching. The operating system is responsible for saving and restoring the task and SIMD floating-point state when necessary.
• The operating system can take the responsibility for saving the SIMD floating-point state as part of the task switch process, but delay the saving of the SIMD floating-point state until a Streaming SIMD Extensions instruction is actually executed by the new task. Using this approach, the SIMD floating-point state is saved only if a Streaming SIMD Extensions instruction needs to be executed in the new task. (See Section 11.6.1., Using the TS Flag in Control Register CR0 to Control SIMD Floating-Point State Saving, for more information on this SIMD floating-point state saving technique.)
11.6.1. Using the TS Flag in Control Register CR0 to Control SIMD Floating-Point State Saving
Saving the SIMD floating-point state using the FXSAVE instruction is not as high-overhead an operation as FSAVE. However, an operating system may still choose to delay saving the SIMD floating-point state to avoid this overhead. If a task being switched to will not access the SIMD floating-point registers (by executing a Streaming SIMD Extensions instruction), this overhead can be avoided by not automatically saving the SIMD floating-point state on a task switch.
The TS flag in control register CR0 is provided to allow the operating system to delay saving the SIMD floating-point state until the SIMD floating-point registers are actually accessed in the new task. When this flag is set, the processor monitors the instruction stream for Streaming SIMD Extensions instructions. When the processor detects a Streaming SIMD Extensions instruction, it raises a device-not-available exception (#NM) prior to executing the instruction. The device-not-available exception handler can then be used to save the SIMD floating-point state for the previous task (using an FXSAVE instruction) and load the SIMD floating-point state for the current task (using an FXRSTOR instruction). If the task never encounters a Streaming SIMD Extensions instruction, the device-not-available exception will not be raised and the SIMD floating-point state will not be saved unnecessarily.
The TS flag can be set either explicitly (by executing a MOV instruction to control register CR0) or implicitly (using the processor's native task switching mechanism). When the native task switching mechanism is used, the processor automatically sets the TS flag on a task switch. After the device-not-available handler has saved the SIMD floating-point state, it should execute the CLTS instruction to clear the TS flag in CR0.
Figure 11-2 gives an example of an operating system that implements SIMD floating-point state saving using the TS flag. In this example, task A is the currently running task and task B is the task being switched to.
[Figure: the task A and task B applications, the operating system task switching code, the device-not-available exception handler, the SIMD floating-point state owner, and the task A and task B SIMD floating-point state save areas. When CR0.TS=1 and a Streaming SIMD Extensions instruction is encountered, the handler saves the task A SIMD floating-point state and loads the task B SIMD floating-point state.]
Figure 11-2. Example of SIMD Floating-Point State Saving During an Operating System-Controlled Task Switch
The operating system maintains a SIMD floating-point save area for each task and defines a variable (SIMD-fpStateOwner) that indicates which task owns the SIMD floating-point state. In this example, task A is the current SIMD floating-point state owner. On a task switch, the operating system task switching code must execute the following pseudocode to set the TS flag according to the current SIMD floating-point state owner. If the new task (task B in this example) is not the current SIMD floating-point state owner, the TS flag is set to 1; otherwise, it is set to 0.
IF Task_Being_Switched_To ≠ SIMD-fpStateOwner
    THEN
        CR0.TS ← 1;
    ELSE
        CR0.TS ← 0;
FI;
If a new task attempts to use a Streaming SIMD Extensions instruction while the TS flag is set to 1, a device-not-available exception (#NM) is generated and the device-not-available exception handler executes the following pseudo-code.
CR0.TS ← 0;
FXSAVE To SIMD floating-point State Save Area for Current SIMD Floating-point State Owner;
FXRSTOR SIMD floating-point State From Current Task's SIMD Floating-point State Save Area;
SIMD-fpStateOwner ← Current_Task;
This handler code performs the following tasks:
• Clears the TS flag.
• Saves the SIMD floating-point state in the state save area for the current SIMD floating-point state owner.
• Restores the SIMD floating-point state from the new task's SIMD floating-point state save area.
• Updates the current SIMD floating-point state owner to be the current task.
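The task-switch rule and the handler steps above can be modeled in portable C. The following is a simulation sketch only: struct sim, its fields, and the three functions are hypothetical names chosen for this example, memcpy stands in for FXSAVE and FXRSTOR, and a plain integer stands in for CR0.TS. Real code would run at privilege level 0 and use the actual instructions.

```c
#include <stdint.h>
#include <string.h>

#define NUM_TASKS      2
#define FX_AREA_BYTES  512   /* size of an FXSAVE image */

/* Simulated machine and OS bookkeeping; all names are illustrative. */
struct sim {
    int     ts;                                   /* models CR0.TS */
    int     state_owner;                          /* models SIMD-fpStateOwner */
    int     current_task;
    uint8_t live_state[FX_AREA_BYTES];            /* models the register state */
    uint8_t save_area[NUM_TASKS][FX_AREA_BYTES];  /* one save area per task */
    int     saves;                                /* counts simulated FXSAVEs */
};

/* Task switching code: set TS unless the new task already owns the state. */
void task_switch(struct sim *s, int new_task)
{
    s->current_task = new_task;
    s->ts = (new_task != s->state_owner) ? 1 : 0;
}

/* Device-not-available (#NM) handler, following the four steps above. */
void nm_handler(struct sim *s)
{
    s->ts = 0;                                             /* CLTS */
    memcpy(s->save_area[s->state_owner], s->live_state,
           FX_AREA_BYTES);                                 /* stands in for FXSAVE */
    s->saves++;
    memcpy(s->live_state, s->save_area[s->current_task],
           FX_AREA_BYTES);                                 /* stands in for FXRSTOR */
    s->state_owner = s->current_task;
}

/* A task executes a Streaming SIMD Extensions instruction. */
void execute_sse_instruction(struct sim *s, uint8_t value)
{
    if (s->ts)
        nm_handler(s);          /* #NM is raised before the instruction runs */
    s->live_state[0] = value;   /* stand-in for the instruction's effect */
}
```

In this model, switching to a task and back again without that task executing a Streaming SIMD Extensions instruction leaves the saves counter unchanged, which is the overhead the delayed-save approach is meant to avoid.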
11.7. EXCEPTIONS THAT CAN OCCUR WHEN EXECUTING STREAMING SIMD EXTENSIONS INSTRUCTIONS
Streaming SIMD Extensions can generate two kinds of exceptions:
• Non-numeric exceptions
• Numeric exceptions
Streaming SIMD Extensions can generate the same type of memory access exceptions as the Intel Architecture instructions do. Some examples are: page fault, segment not present, and limit violations. Existing exception handlers can handle these types of exceptions without any code modification. The SIMD floating-point PREFETCH instruction hints will not generate any kind of exception and instead will be ignored.
Streaming SIMD Extensions can generate the same six numeric exceptions that x87-FP instructions can generate. All Streaming SIMD Extensions numeric exceptions are reported independently of x87-FP numeric exceptions. Independent masking and unmasking of Streaming SIMD Extensions numeric exceptions is achieved by setting/resetting specific bits in the MXCSR register.
The application must ensure that the OS can support unmasked SIMD floating-point exceptions before unmasking them. For more details, refer to Section 9.5.1., Detecting Support for Streaming SIMD Extensions Using the CPUID Instruction, in Chapter 9, Programming with the Streaming SIMD Extensions, of the Intel Architecture Software Developer's Manual, Volume 1, and to AP-900, Identifying Support for Streaming SIMD Extensions in the Processor and Operating System. If an application unmasks exceptions using either FXRSTOR or LDMXCSR without the required OS support being enabled, then an invalid opcode fault, instead of a SIMD floating-point exception, will be generated on the first faulting SIMD floating-point instruction.
11.7.1. SIMD Floating-point Non-Numeric Exceptions
Exceptions during memory accesses:
• Invalid opcode (#UD).
• Stack exception (#SS).
• General protection (#GP).
• Page fault (#PF).
• Alignment check (#AC), if alignment checking is enabled.
System exceptions:
• Invalid Opcode (#UD), if the EM flag in control register CR0 is set, the CPUID.XMM bit is not set, or the CR4.OSFXSR bit is not set, when a Streaming SIMD Extensions instruction is executed (see Section 11.1., Emulation of the Streaming SIMD Extensions).
• Device not available (#NM), if a Streaming SIMD Extensions instruction is executed when the TS flag in control register CR0 is set. (See Section 11.6.1., Using the TS Flag in Control Register CR0 to Control SIMD Floating-Point State Saving.)
Other exceptions can occur indirectly due to the faulty execution of the exception handlers for the above exceptions. For example, if a stack-segment fault (#SS) occurs due to Streaming SIMD Extensions instructions, the interrupt gate for the stack-segment fault can direct the processor to an invalid TSS, causing an invalid TSS exception (#TS) to be generated.
Table 11-7 lists the causes for Interrupt 6 and Interrupt 7 with Streaming SIMD Extensions.
Table 11-7. Streaming SIMD Extensions Faults
CR0.EM | CR4.OSFXSR | CPUID.XMM | CR0.TS | EXCEPTION
1 | - | - | - | #UD Interrupt 6
- | 0 | - | - | #UD Interrupt 6
- | - | 0 | - | #UD Interrupt 6
0 | 1 | 1 | 1 | #NM Interrupt 7
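The conditions for Interrupt 6 and Interrupt 7 given in this section can be written as a small decision function. This is a sketch for illustration: the enum and the function name sse_fault_for are hypothetical, and the four inputs are plain booleans rather than values read from CR0, CR4, or CPUID.

```c
#include <stdbool.h>

/* Enumerator values are the interrupt vector numbers used in Table 11-7. */
enum sse_fault {
    SSE_FAULT_NONE = 0,
    SSE_FAULT_UD   = 6,   /* invalid opcode */
    SSE_FAULT_NM   = 7    /* device not available */
};

/*
 * Hypothetical helper: which system exception, if any, is raised when a
 * Streaming SIMD Extensions instruction is executed. Any #UD condition
 * wins regardless of the state of CR0.TS.
 */
enum sse_fault sse_fault_for(bool cr0_em, bool cr4_osfxsr,
                             bool cpuid_xmm, bool cr0_ts)
{
    if (cr0_em || !cr4_osfxsr || !cpuid_xmm)
        return SSE_FAULT_UD;
    if (cr0_ts)
        return SSE_FAULT_NM;
    return SSE_FAULT_NONE;
}
```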
11.7.2. SIMD Floating-point Numeric Exceptions

There are six classes of numeric exception conditions that can occur while executing Streaming SIMD Extensions:
• Invalid operation (#I)
• Divide-by-zero (#Z)
• Denormal operand (#D)
• Numeric overflow (#O)
• Numeric underflow (#U)
• Inexact result (Precision) (#P)
Invalid, Divide-by-zero and Denormal exceptions are pre-computation exceptions, i.e., they are detected before any arithmetic operation occurs. Underflow, Overflow and Precision exceptions are post-computation exceptions. When numeric exceptions occur, a processor supporting Streaming SIMD Extensions takes one of two possible courses of action:
• The processor can handle the exception by itself, producing the most reasonable result and allowing numeric program execution to continue undisturbed (i.e., masked exception response).
• A software exception handler can be invoked to handle the exception (i.e., unmasked exception response).
Each of the six exception conditions described above has corresponding flag and mask bits in the MXCSR. If an exception is masked (the corresponding mask bit in MXCSR = 1), the processor takes an appropriate default action and continues with the computation. If the exception is unmasked (mask bit = 0) and the OS supports SIMD floating-point exceptions (i.e., CR4.OSXMMEXCPT = 1), a software exception handler is invoked immediately through SIMD floating-point exception interrupt vector 19. If the exception is unmasked (mask bit = 0) and the OS does not support SIMD floating-point exceptions (i.e., CR4.OSXMMEXCPT = 0), an invalid opcode exception is signaled instead of a SIMD floating-point exception.
Note that because SIMD floating-point exceptions are precise and occur immediately, the situation does not arise where an x87-FP instruction, an FWAIT instruction, or another Streaming SIMD Extensions instruction will catch a pending unmasked SIMD floating-point exception.
11.7.2.1. EXCEPTION PRIORITY
The processor handles exceptions according to a predetermined precedence. When a sub-operand of a packed instruction generates two or more exception conditions, the exception precedence sometimes results in the higher-priority exception being handled and the lower-priority exceptions being ignored. For example, dividing an SNaN by zero could potentially signal an invalid-arithmetic-operand exception (due to the SNaN operand) and a divide-by-zero exception. Here, if both exceptions are masked, the processor handles the higher-priority exception only (the invalid-arithmetic-operand exception), returning the quiet version of the SNaN to the destination. The prioritization policy also applies for unmasked exceptions; if both invalid and divide-by-zero are unmasked for the previous example, only the invalid flag will be set.
Prioritization of exceptions is performed only on an individual sub-operand basis, and not between sub-operands; for example, an invalid exception generated by one sub-operand will not prevent the reporting of a divide-by-zero exception generated by another sub-operand.
The precedence for SIMD floating-point numeric exceptions is as follows:
1. Invalid operation exception due to NaN operands (refer to Table 11-8).
2. QNaN operand. Though this is not an exception, the handling of a QNaN operand has precedence over lower-priority exceptions. For example, a QNaN divided by zero results in a QNaN, not a zero-divide exception.
3. Any other invalid operation exception not mentioned above or a divide-by-zero exception (refer to Table 11-8).
4. Denormal operand exception. If masked, then instruction execution continues, and a lower-priority exception can occur as well.
5. Numeric overflow and underflow exceptions possibly in conjunction with the inexact result exception.
6. Inexact result exception.
11.7.2.2. AUTOMATIC MASKED EXCEPTION HANDLING
If the processor detects an exception condition for a masked exception (an exception with its mask bit set), it delivers a predefined (default) response and continues executing instructions. The masked (default) responses to exceptions have been chosen to deliver a reasonable result for each exception condition and are generally satisfactory for most application code. By masking or unmasking specific floating-point exceptions in the MXCSR, programmers can delegate responsibility for most exceptions to the processor and reserve the most severe exception conditions for software exception handlers. Because the exception flags are sticky, they provide a cumulative record of the exceptions that have occurred since they were last cleared. A programmer can thus mask all exceptions, run a calculation, and then inspect the exception flags to see if any exceptions were detected during the calculation. Note that when exceptions are masked, the processor may detect multiple exceptions in a single instruction, because:
• It continues executing the instruction after performing its masked response; for example, the processor could detect a denormalized operand, perform its masked response to this exception, and then detect an underflow.
• Exceptions may occur naturally in pairs, such as numeric underflow and inexact result (precision).
• Packed instructions can produce independent exceptions for each pair of operands.
Updating of exception flags is generated by a logical-OR of exception conditions for all sub-operand computations, where the OR is done independently for each type of exception; for packed computations this means 4 sub-operands and for scalar computations this means 1 sub-operand (the lowest one).
11.7.2.3. SOFTWARE EXCEPTION HANDLING - UNMASKED EXCEPTIONS
An application must ensure that the operating system supports unmasked exceptions before unmasking any of the exceptions in the MXCSR (refer to Section 9.5.1., Detecting Support for Streaming SIMD Extensions Using the CPUID Instruction, in Chapter 9, Programming with the Streaming SIMD Extensions, Volume 1 of the Programmer's Reference Manual).
If the processor detects a condition for an unmasked SIMD floating-point application exception, a software handler is invoked immediately at the end of the excepting instruction. The handler is invoked through the SIMD floating-point exception interrupt (vector 19), irrespective of the state of the CR0.NE flag. If an exception is unmasked, but SIMD floating-point unmasked exceptions are not enabled (CR4.OSXMMEXCPT = 0), an invalid opcode fault is generated. However, the corresponding exception bit will still be set in the MXCSR, as it would be if CR4.OSXMMEXCPT = 1, since the invalid opcode handler or the user needs to determine the cause of the exception.
A typical action of the exception handler is to store x87-FP and SIMD floating-point state information in memory (with the FXSAVE/FXRSTOR instructions) so that it can evaluate the exception and formulate an appropriate response. Other typical exception handler actions can include:
• Examine stored x87-FP and SIMD floating-point state information (control/status) to determine the nature of the error.
• Take action to correct the condition that caused the error.
• Clear the exception bits in the x87-FP status word (FSW) or the SIMD floating-point control register (MXCSR).
• Return to the interrupted program and resume normal execution.
In lieu of writing recovery procedures, the exception handler can do one or more of the following:
• Increment in software an exception counter for later display or printing.
• Print or display diagnostic information (such as the SIMD floating-point register state).
• Halt further program execution.
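The way a SIMD floating-point numeric exception is delivered, as described in this section, depends on only two inputs: the exception's mask bit in MXCSR and CR4.OSXMMEXCPT. A minimal C sketch of that rule follows; the enum and function names are hypothetical and chosen for this example. CR0.NE is deliberately not a parameter, because the handler is invoked through vector 19 irrespective of its state.

```c
#include <stdbool.h>

enum simd_fp_response {
    RESPONSE_MASKED_DEFAULT,   /* processor supplies the default result */
    RESPONSE_VECTOR_19,        /* SIMD floating-point exception handler */
    RESPONSE_INVALID_OPCODE    /* invalid opcode fault instead */
};

/*
 * Hypothetical helper: selects the response to a detected numeric
 * exception condition from its MXCSR mask bit and CR4.OSXMMEXCPT.
 * In all three cases the exception flag in MXCSR is still set.
 */
enum simd_fp_response simd_fp_response_for(bool exception_masked,
                                           bool cr4_osxmmexcpt)
{
    if (exception_masked)
        return RESPONSE_MASKED_DEFAULT;
    return cr4_osxmmexcpt ? RESPONSE_VECTOR_19 : RESPONSE_INVALID_OPCODE;
}
```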
When an unmasked exception occurs, the processor will not alter the contents of the source register operands prior to invoking the unmasked handler. Similarly, the integer EFLAGS will also not be modified if an unmasked exception occurs while executing the COMISS or UCOMISS instructions. Exception flags will be updated according to the following rules:
• Updating of exception flags is generated by a logical-OR of exception conditions for all sub-operand computations, where the OR is done independently for each type of exception; for packed computations this means 4 sub-operands and for scalar computations this means 1 sub-operand (the lowest one).
• In the case of only masked exception conditions, all flags will be updated.
• In the case of an unmasked pre-computation type of exception condition (e.g., denormal input), all flags relating to all pre-computation conditions (masked or unmasked) will be updated, and no subsequent computation is performed (i.e., no post-computation condition can occur if there is an unmasked pre-computation condition).
• In the case of an unmasked post-computation exception condition, all flags relating to all post-computation conditions (masked or unmasked) will be updated; all pre-computation conditions, which must be masked-only, will also be reported.
11.7.2.4. INTERACTION WITH X87 NUMERIC EXCEPTIONS
The Streaming SIMD Extensions control/status register was separated from its x87-FP counterparts to allow for maximum flexibility. Consequently, the Streaming SIMD Extensions architecture is independent of the x87-FP architecture, but has the following implications for x87-FP applications that call Streaming SIMD Extensions-enabled libraries:
• The x87-FP rounding mode specified in FCW will not apply to calls in a Streaming SIMD Extensions library (unless the rounding control in MXCSR is explicitly set to the same mode).
• x87-FP exception observability may not apply to a Streaming SIMD Extensions library:
  - An application that expects to catch x87-FP exceptions that occur in an x87-FP library will not be notified if an exception occurs in a Streaming SIMD Extensions library, unless the exception masks enabled in FCW have also been enabled in MXCSR.
  - An application will not be able to unmask exceptions after returning from a Streaming SIMD Extensions library call to detect if an error occurred. A SIMD floating-point exception flag that is already set when the corresponding exception is unmasked will not generate a fault; only the next occurrence of that exception will generate an unmasked fault.
  - An application which checks FSW to determine if any masked exception flags were set during an x87-FP library call will also need to check MXCSR in order to observe a similar occurrence of a masked exception within a Streaming SIMD Extensions library.
11.7.3. SIMD Floating-point Numeric Exception Conditions and Masked/Unmasked Responses
The following sections describe the various conditions that cause a SIMD floating-point numeric exception to be generated and the masked response of the processor when these conditions are detected.
11.7.3.1. INVALID OPERATION EXCEPTION (#IA)
The invalid operation exception occurs in response to an invalid arithmetic operand, or to an invalid combination of operands. If the invalid operation exception is masked, the processor sets the IE flag in MXCSR and returns the single-precision QNaN indefinite value or another QNaN value (derived from a NaN input operand) to the destination operand. This value overwrites the destination register specified by the instruction. If the invalid operation exception is not masked, the processor sets the IE flag in MXCSR and an exception handler is invoked (see Section 11.7.2.3., Software Exception Handling - Unmasked Exceptions) and the operands remain unchanged.
The processor can detect a variety of invalid arithmetic operations that can be coded in a program. These operations generally indicate a programming error, such as dividing ∞ by ∞. Table 11-8 lists the SIMD floating-point invalid arithmetic operations that the processor detects. This group includes the invalid operations defined in IEEE Std. 854.
The flag (IE) for this exception is bit 0 of MXCSR, and the mask bit (IM) is bit 7 of MXCSR. The invalid operation exception is not affected by the flush-to-zero mode.
Table 11-8. Invalid Arithmetic Operations and the Masked Responses to Them
Condition | Masked Response
ADDPS/ADDSS/DIVPS/DIVSS/MULPS/MULSS/SUBPS/SUBSS with a SNaN operand. | Return the Signaling NaN converted to a quiet NaN; refer to Table 7-18, in Chapter 7, Floating-Point Unit, for more details; set #IA flag.
CMPPS/CMPSS with QNaN/SNaN operands (QNaN applies only for predicates "lt", "le", "nlt", "nle"). | Return a mask of all 0s for predicates "eq", "lt", "le", and "ord", and a mask of all 1s for predicates "neq", "nlt", "nle", and "unord"; set #IA flag.
COMISS with QNaN/SNaN operand(s). | Set EFLAGS values to not comparable; set #IA flag.
UCOMISS with SNaN operand(s). | Set EFLAGS values to not comparable; set #IA flag.
SQRTPS/SQRTSS with SNaN operand(s). | Return the SNaN converted to a QNaN; set #IA flag.
Addition of opposite signed infinities or subtraction of like-signed infinities. | Return the QNaN Indefinite; set #IA flag.
Multiplication of infinity by zero. | Return the QNaN Indefinite; set #IA flag.
Divide of (0/0) or (∞/∞). | Return the QNaN Indefinite; set #IA flag.
SQRTPS/SQRTSS of negative operands (except negative zero). | Return the QNaN Indefinite; set #IA flag.
Conversion to integer when the source register is a NaN, Infinity or exceeds the representable range. | Return the Integer Indefinite; set #IA flag.
NOTE:
RCPPS/RCPSS/RSQRTPS/RSQRTSS with QNaN/SNaN operand(s) do not raise an invalid exception. They return either the SNaN operand converted to QNaN, or the original QNaN operand. RSQRTPS/RSQRTSS with negative operands (but not for negative zero) do not raise an invalid exception, and return QNaN Indefinite.
11.7.3.2. DIVISION-BY-ZERO EXCEPTION (#Z)
The processor reports a divide-by-zero exception whenever an instruction attempts to divide a finite non-zero operand by 0. This is possible with DIVPS, DIVSS. The masked response for DIVPS, DIVSS is to set the ZE flag in MXCSR and return an infinity signed with the exclusive OR of the signs of the operands. If the divide-by-zero exception is not masked, the ZE flag is set, a software exception handler is invoked (see Section 11.7.2.3., Software Exception Handling - Unmasked Exceptions) and the source operands remain unchanged. Note that the response for RCPPS, RSQRTPS, RCPSS and RSQRTSS is to return an infinity of the same sign as the operand. These instructions do not set any exception flags and thus are not affected by the exception masks. The flag (ZE) for the divide-by-zero exception is bit 2 of MXCSR, and the mask bit (ZM) is bit 9 of MXCSR. The divide-by-zero exception is not affected by the flush-to-zero mode.
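The masked response for DIVPS and DIVSS described above (set ZE, return an infinity whose sign is the exclusive OR of the operand signs) can be modeled on bit patterns. The sketch below is a software model, not processor code: the function names are hypothetical, the MXCSR is a plain integer passed by pointer, and it assumes that float is the 32-bit IEEE single-precision format.

```c
#include <stdint.h>
#include <string.h>

#define MXCSR_ZE  (1u << 2)   /* divide-by-zero flag, bit 2 of MXCSR */

/* Reinterpret a float's bits (assumes 32-bit IEEE single precision). */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float bits_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/*
 * Hypothetical model of the masked divide-by-zero response: the caller
 * has already established that the dividend is finite and non-zero and
 * that the divisor is +0 or -0. Sets ZE in the simulated MXCSR and
 * returns an infinity signed with the XOR of the operand signs.
 */
float masked_divide_by_zero(float dividend, float zero_divisor, uint32_t *mxcsr)
{
    uint32_t sign = (float_bits(dividend) ^ float_bits(zero_divisor)) & 0x80000000u;

    *mxcsr |= MXCSR_ZE;
    return bits_float(sign | 0x7F800000u);   /* single-precision infinity */
}
```

Working on the raw sign bit makes the negative-zero divisor case explicit, which a comparison such as divisor < 0 would miss.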
11.7.3.3. DENORMAL OPERAND EXCEPTION (#D)
The processor signals the denormal operand exception if an arithmetic instruction attempts to operate on a denormal operand. When a denormal operand exception occurs and the exception is masked, the processor sets the DE flag in MXCSR, then proceeds with the instruction. Operating on denormal numbers will produce results at least as good as, and often better than, what can be obtained when denormal numbers are flushed to zero. Programmers can mask this exception so that a computation may proceed, then analyze any loss of accuracy when the final result is delivered.
When a denormal operand exception occurs and the exception is not masked, the processor sets the DE bit in MXCSR and a software exception handler is invoked (see Section 11.7.2.3., Software Exception Handling - Unmasked Exceptions). The source operands remain unchanged. When denormal operands have reduced significance due to loss of low-order bits, it may be advisable to not operate on them. Precluding denormal operands from computations can be accomplished by an exception handler that responds to unmasked denormal operand exceptions.
Note that the response for RCPPS, RSQRTPS, RCPSS and RSQRTSS is to return an infinity of the same sign as the operand. These instructions do not set any exception flags and thus are not affected by the exception masks. Conversion instructions (CVTPI2PS, CVTPS2PI, CVTTPS2PI, CVTSI2SS, CVTSS2SI, CVTTSS2SI) do not signal denormal exceptions.
The flag (DE) for this exception is bit 1 of MXCSR, and the mask bit (DM) is bit 8 of MXCSR. The denormal operand exception is not affected by the flush-to-zero mode.
11.7.3.4. NUMERIC OVERFLOW EXCEPTION (#O)
The processor reports a floating-point numeric overflow exception whenever the result of an instruction rounded to the destination precision with unbounded exponent exceeds the largest allowable finite value that will fit into the destination operand. This is possible with ADDPS, ADDSS, SUBPS, SUBSS, MULPS, MULSS, DIVPS, DIVSS. When a numeric overflow exception occurs and the exception is masked, the processor sets the MXCSR.OE and MXCSR.PE flags and returns one of the values shown in Table 11-9 according to the current rounding mode of the processor (see Section 11.3.2.1., Rounding Control Field). When a numeric overflow exception occurs and the exception is unmasked, the operands are left unaltered and a software exception handler is invoked (see Section 11.7.2.3., Software Exception Handling - Unmasked Exceptions). The MXCSR.OE flag is set; the MXCSR.PE flag is only set if a loss of accuracy has occurred in addition to overflow when rounding the result to the destination precision, with unbounded exponent. The flag (OE) for the numeric overflow exception is bit 3 of MXCSR, and the mask bit (OM) is bit 10 of MXCSR. The numeric overflow exception is not affected by the flush-to-zero mode.
Note that the overflow status flag is not set by RCPPS/RCPSS, since these instructions are combinatorial and are not affected by exception masks.
Table 11-9. Masked Responses to Numeric Overflow

Rounding Mode | Sign of True Result | Result
To nearest | + | +∞
To nearest | - | -∞
Toward -∞ | + | Largest finite positive number
Toward -∞ | - | -∞
Toward +∞ | + | +∞
Toward +∞ | - | Largest finite negative number
Toward zero | + | Largest finite positive number
Toward zero | - | Largest finite negative number
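The masked overflow response follows the usual IEEE rule: the result is an infinity when the rounding mode rounds away from zero for that sign, and the largest finite number of that sign otherwise. A C sketch of the selection follows. The enum and function names are hypothetical, the enumerator values are arbitrary rather than the MXCSR rounding control encoding, and float is assumed to be IEEE single precision so that FLT_MAX is the largest finite value.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Illustrative names; values do not represent the MXCSR encoding. */
enum rounding_mode { ROUND_NEAREST, ROUND_DOWN, ROUND_UP, ROUND_TOWARD_ZERO };

/*
 * Hypothetical model of the masked overflow result for a true result of
 * the given sign under the given rounding mode.
 */
float masked_overflow_result(enum rounding_mode mode, bool negative)
{
    bool to_infinity;

    switch (mode) {
    case ROUND_NEAREST: to_infinity = true;      break;
    case ROUND_DOWN:    to_infinity = negative;  break;  /* toward -infinity */
    case ROUND_UP:      to_infinity = !negative; break;  /* toward +infinity */
    default:            to_infinity = false;     break;  /* toward zero */
    }
    if (to_infinity)
        return negative ? -INFINITY : INFINITY;
    return negative ? -FLT_MAX : FLT_MAX;
}
```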
11.7.3.5. NUMERIC UNDERFLOW EXCEPTION (#U)
The processor might report a floating-point numeric underflow exception whenever the rounded result of an arithmetic instruction is tiny; that is, the result rounded to the destination precision with unbounded exponent is less than the smallest possible normalized, finite value that will fit into the destination operand. The Underflow exception can occur in the execution of the instructions ADDPS, ADDSS, SUBPS, SUBSS, MULPS, MULSS, DIVPS and DIVSS. Two related events contribute to underflow:
• Creation of a tiny result which, because it is so small, may cause some other exception later (such as overflow upon division).
• Creation of an inexact result; i.e., the delivered result differs from what would have been computed were both the exponent and precision unbounded.
Which of these events triggers the underflow exception depends on whether the underflow exception is masked:
• Underflow exceptions masked: The underflow exception is signaled when the result is both tiny and inexact.
• Underflow exceptions not masked: The underflow exception is signaled when the result is tiny, regardless of inexactness.
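The two cases above reduce to a one-line predicate. The following C sketch uses a hypothetical function name and takes the three conditions as booleans; it does not compute tininess or inexactness itself.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: returns true when the underflow exception is
 * signaled. With underflow masked, the result must be both tiny and
 * inexact; with underflow unmasked, a tiny result is sufficient.
 */
bool underflow_signaled(bool underflow_masked, bool result_tiny,
                        bool result_inexact)
{
    if (underflow_masked)
        return result_tiny && result_inexact;
    return result_tiny;
}
```

One consequence visible in the predicate is that a tiny but exact result signals underflow only when the exception is unmasked.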
The response to an underflow exception also depends on whether the exception is masked:
• Masked response: The result is normal, denormal or zero. The precision exception is also triggered. The UE and PE flags are set in MXCSR.
• Unmasked response: The UE flag is set in MXCSR. If the original computation generated an imprecise mantissa, the inexact (#P) status flag PE will also be set in the MXCSR. In either case (result imprecise or not), the underflow (#U) status flag is set, the operands are left unaltered, and a software exception handler is invoked (see Section 11.7.2.3., Software Exception Handling - Unmasked Exceptions).
If underflow is masked and flush-to-zero mode is enabled, an underflow condition will set the underflow (#U) and inexact (#P) status flags UE and PE in MXCSR and a correctly signed zero result will be returned; this will avoid the performance penalty associated with generating a denormalized result. If underflow is unmasked, the flush-to-zero mode is ignored and an underflow condition will be handled as described above.
Note that the underflow status flag is not set by RCPPS/RCPSS, since these instructions are combinatorial and are not affected by exception masks. The flag (UE) for the numeric underflow exception is bit 4 of MXCSR and the mask bit (UM) is bit 11 of MXCSR.
11.7.3.6. INEXACT RESULT (PRECISION) EXCEPTION (#P)
The inexact result exception (also called the precision exception) occurs if the result of an operation is not exactly representable in the destination format. For example, the fraction 1/3 cannot be precisely represented in binary form. This exception occurs frequently and indicates that some (normally acceptable) accuracy has been lost. The exception is supported for applications that need to perform exact arithmetic only. Because the rounded result is generally satisfactory for most applications, this exception is commonly masked. If the inexact result exception is masked when an inexact result condition occurs and a numeric overflow or underflow condition has not occurred, the processor sets the inexact (#P) status flag (PE flag) and stores the rounded result in the destination operand. The current rounding mode determines the method used to round the result (refer to Section 11.3.2.1., Rounding Control Field). If the inexact result exception is not masked when an inexact result occurs and numeric overflow or underflow has not occurred, the operands are left unaltered, the PE flag is set in MXCSR, the inexact (#P) status flag is set, and a software exception handler is invoked (see Section 11.7.2.3., Software Exception Handling - Unmasked Exceptions). If an inexact result occurs in conjunction with numeric overflow or underflow, one of the following operations is carried out:
• If an inexact result occurs along with masked overflow or underflow, the OE or UE flag and the PE flag are set in MXCSR and the result is stored as described for the overflow or underflow exceptions (see Section 11.7.3.4., Numeric Overflow Exception (#O), or Section 11.7.3.5., Numeric Underflow Exception (#U)). If the inexact result exception is unmasked, the processor also invokes the software exception handler.
• If an inexact result occurs along with unmasked overflow or underflow, the OE or UE flag and the PE flag are set and the software exception handler is invoked.
Note that the inexact result flag is not set by RCPPS, RSQRTPS, RCPSS and RSQRTSS, since these instructions are combinatorial and are not affected by the exception masks. The inexact result exception flag (PE) is bit 5 of MXCSR, and the mask bit (PM) is bit 12 of MXCSR.
In flush-to-zero mode, the inexact result exception is reported along with the underflow exception (the latter must be masked).
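The flag and mask bit positions given in this section can be checked with a short sketch. The Python below is illustrative only: the function and dictionary names are hypothetical and not part of any Intel interface. It decodes the UE, PE, UM, and PM bits of an MXCSR value using the bit positions stated above.

```python
# Bit positions as stated in the text: UE = bit 4, PE = bit 5,
# UM = bit 11, PM = bit 12. All names here are illustrative.
MXCSR_BITS = {"UE": 4, "PE": 5, "UM": 11, "PM": 12}

def decode_mxcsr(mxcsr):
    """Return a dict mapping each flag or mask name to 0 or 1."""
    return {name: (mxcsr >> bit) & 1 for name, bit in MXCSR_BITS.items()}

def inexact_would_trap(mxcsr):
    """True if the precision exception is unmasked (PM = 0), so an
    inexact-result condition would invoke the software handler."""
    return decode_mxcsr(mxcsr)["PM"] == 0

# 1F80H has all six exception mask bits (bits 7 through 12) set
# and no status flags set.
print(decode_mxcsr(0x1F80))
```

With every mask bit set, as in the sample value, an inexact result only sets the PE flag and the rounded result is stored.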
11.7.4. Effect of Streaming SIMD Extensions Instructions on Pending Floating-Point Exceptions
Unlike MMX instructions which will generate a floating-point error (#MF) prior to executing the MMX instruction, execution of a Streaming SIMD Extensions instruction does not generate a floating-point error (#MF) prior to executing the instruction. Hence they will not catch pending x87 floating-point exceptions. In addition, they will not cause assertion of FERR# (independent of the value of CR0.NE) and they ignore the assertion/de-assertion of IGNNE#.
11.8. DEBUGGING
The debug facilities of the Intel Architecture operate in the same manner when executing Streaming SIMD Extensions as when executing other Intel Architecture instructions. These facilities enable debuggers to debug code utilizing these instructions. To correctly interpret the contents of the Pentium III processor registers from the FXSAVE image in memory, a debugger needs to take account of the relationship between the floating-point registers' logical locations relative to TOS and the MMX registers' physical locations (refer to Section 10.6., Debugging, Chapter 10, MMX Technology System Programming). In addition, it needs to have knowledge of the SIMD floating-point registers and the state save data area used by the FXSAVE instruction. Comparisons of the Streaming SIMD Extensions and x87 results can be performed within the Pentium III processor at the internal single precision format and/or externally at the memory single precision format. The internal format comparison is required to allow the partitioning of the data space to reduce test time.
CHAPTER 12 SYSTEM MANAGEMENT MODE (SMM)

This chapter describes the Intel Architecture's System Management Mode (SMM) architecture. SMM was introduced into the Intel Architecture in the Intel386 SL processor (a mobile specialized version of the Intel386 processor). It is also available in the Intel486 processors (beginning with the Intel486 SL and Intel486 enhanced versions) and in the Intel Pentium and P6 family processors. For a detailed description of the hardware that supports SMM, refer to the developer's manuals for each of the Intel Architecture processors.
12.1. SYSTEM MANAGEMENT MODE OVERVIEW

SMM is a special-purpose operating mode provided for handling system-wide functions like power management, system hardware control, or proprietary OEM-designed code. It is intended for use only by system firmware, not by applications software or general-purpose systems software. The main benefit of SMM is that it offers a distinct and easily isolated processor environment that operates transparently to the operating system or executive and software applications.
When SMM is invoked through a system management interrupt (SMI), the processor saves the current state of the processor (the processor's context), then switches to a separate operating environment contained in system management RAM (SMRAM). While in SMM, the processor executes SMI handler code to perform operations such as powering down unused disk drives or monitors, executing proprietary code, or placing the whole system in a suspended state. When the SMI handler has completed its operations, it executes a resume (RSM) instruction. This instruction causes the processor to reload the saved context of the processor, switch back to protected or real mode, and resume executing the interrupted application or operating-system program or task.
The following SMM mechanisms make it transparent to applications programs and operating systems:
• The only way to enter SMM is by means of an SMI.
• The processor executes SMM code in a separate address space (SMRAM) that can be made inaccessible from the other operating modes.
• Upon entering SMM, the processor saves the context of the interrupted program or task.
• All interrupts normally handled by the operating system are disabled upon entry into SMM.
• The RSM instruction can be executed only in SMM.
SMM is similar to real-address mode in that there are no privilege levels or address mapping. An SMM program can address up to 4 GBytes of memory and can execute all I/O and applicable system instructions. Refer to Section 12.5., SMI Handler Execution Environment for more information about the SMM execution environment.
NOTE
The physical address extension (PAE) mechanism available in the P6 family processors is not supported when a processor is in SMM.
12.2. SYSTEM MANAGEMENT INTERRUPT (SMI)

The only way to enter SMM is by signaling an SMI through the SMI# pin on the processor or through an SMI message received through the APIC bus. The SMI is a nonmaskable external interrupt that operates independently from the processor's interrupt- and exception-handling mechanism and the local APIC. The SMI takes precedence over an NMI and a maskable interrupt. SMM is nonreentrant; that is, the SMI is disabled while the processor is in SMM.
NOTE
In the P6 family processors, when a processor that is designated as the application processor during an MP initialization protocol is waiting for a startup IPI, it is in a mode where SMIs are masked.
12.3. SWITCHING BETWEEN SMM AND THE OTHER PROCESSOR OPERATING MODES
Figure 2-2 in Chapter 2, System Architecture Overview shows how the processor moves between SMM and the other processor operating modes (protected, real-address, and virtual-8086). Signaling an SMI while the processor is in real-address, protected, or virtual-8086 modes always causes the processor to switch to SMM. Upon execution of the RSM instruction, the processor always returns to the mode it was in when the SMI occurred.
12.3.1. Entering SMM

The processor always handles an SMI on an architecturally defined interruptible point in program execution (which is commonly at an Intel Architecture instruction boundary). When the processor receives an SMI, it waits for all instructions to retire and for all stores to complete. The processor then saves its current context in SMRAM (refer to Section 12.4., SMRAM), enters SMM, and begins to execute the SMI handler.
Upon entering SMM, the processor signals external hardware that SMM handling has begun. The signaling mechanism used is implementation dependent. For the P6 family processors, an SMI acknowledge transaction is generated on the system bus and the multiplexed status signal EXF4 is asserted each time a bus transaction is generated while the processor is in SMM. For the Pentium and Intel486 processors, the SMIACT# pin is asserted.
An SMI has a greater priority than debug exceptions and external interrupts. Thus, if an NMI, maskable hardware interrupt, or a debug exception occurs at an instruction boundary along with an SMI, only the SMI is handled. Subsequent SMI requests are not acknowledged while the processor is in SMM. The first SMI interrupt request that occurs while the processor is in SMM (that is, after SMM has been acknowledged to external hardware) is latched and serviced when the processor exits SMM with the RSM instruction. The processor will latch only one SMI while in SMM.
Refer to Section 12.5., SMI Handler Execution Environment for a detailed description of the execution environment when in SMM.
12.3.1.1. EXITING FROM SMM
The only way to exit SMM is to execute the RSM instruction. The RSM instruction is only available to the SMI handler; if the processor is not in SMM, attempts to execute the RSM instruction result in an invalid-opcode exception (#UD) being generated. The RSM instruction restores the processor's context by loading the state save image from SMRAM back into the processor's registers. The processor then returns an SMIACK transaction on the system bus and returns program control back to the interrupted program.
Upon successful completion of the RSM instruction, the processor signals external hardware that SMM has been exited. For the P6 family processors, an SMI acknowledge transaction is generated on the system bus and the multiplexed status signal EXF4 is no longer generated on bus cycles. For the Pentium and Intel486 processors, the SMIACT# pin is deasserted.
If the processor detects invalid state information saved in the SMRAM, it enters the shutdown state and generates a special bus cycle to indicate it has entered shutdown state. Shutdown happens only in the following situations:
• A reserved bit in control register CR4 is set to 1 on a write to CR4. This error should not happen unless SMI handler code modifies reserved areas of the SMRAM saved state map (refer to Section 12.4.1., SMRAM State Save Map). Note that CR4 is not distinctly part of the saved state map.
• An illegal combination of bits is written to control register CR0, in particular PG set to 1 and PE set to 0, or NW set to 1 and CD set to 0.
• (For the Pentium and Intel486 processors only.) If the address stored in the SMBASE register when an RSM instruction is executed is not aligned on a 32-KByte boundary. This restriction does not apply to the P6 family processors.
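The CR0 and SMBASE conditions listed above can be expressed as a small validity check. This is a minimal sketch, not a description of the processor's internal logic: the helper name and its `p6_family` parameter are hypothetical, the CR0 bit positions (PE = bit 0, NW = bit 29, CD = bit 30, PG = bit 31) are taken from the standard CR0 layout, and the CR4 reserved-bit check is omitted because the set of reserved bits depends on the processor.

```python
# CR0 flag masks: PE = bit 0, NW = bit 29, CD = bit 30, PG = bit 31.
CR0_PE, CR0_NW, CR0_CD, CR0_PG = 1 << 0, 1 << 29, 1 << 30, 1 << 31

def rsm_shutdown_reasons(cr0, smbase, p6_family=False):
    """List which of the conditions described in the text the values violate.

    Hypothetical helper: reserved CR4 bits are not checked here.
    """
    reasons = []
    if (cr0 & CR0_PG) and not (cr0 & CR0_PE):
        reasons.append("CR0: PG=1 with PE=0")
    if (cr0 & CR0_NW) and not (cr0 & CR0_CD):
        reasons.append("CR0: NW=1 with CD=0")
    # The 32-KByte alignment rule applies to Pentium and Intel486 only.
    if not p6_family and smbase % 0x8000 != 0:
        reasons.append("SMBASE not 32-KByte aligned")
    return reasons

print(rsm_shutdown_reasons(CR0_PG, 0x30000))
```

An empty list means none of the checked conditions applies; it does not guarantee that the rest of the saved state is valid.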
In shutdown state, the processor stops executing instructions until a RESET#, INIT# or NMI# is asserted. The processor also recognizes the FLUSH# signal while in the shutdown state. In addition, the Pentium processor recognizes the SMI# signal while in shutdown state, but the P6 family and Intel486 processors do not. (It is not recommended that the SMI# pin be asserted on a Pentium processor to bring the processor out of shutdown state, because the action of the processor in this circumstance is not well defined.) If the processor is in the HALT state when the SMI is received, the processor handles the return from SMM slightly differently (refer to Section 12.10., Auto HALT Restart). Also, the SMBASE address can be changed on a return from SMM (refer to Section 12.11., SMBASE Relocation).
12.4. SMRAM
While in SMM, the processor executes code and stores data in the SMRAM space. The SMRAM space is mapped to the physical address space of the processor and can be up to 4 GBytes in size. The processor uses this space to save the context of the processor and to store the SMI handler code, data and stack. It can also be used to store system management information (such as the system configuration and specific information about powered-down devices) and OEM-specific information.
The default SMRAM size is 64 KBytes beginning at a base physical address in physical memory called the SMBASE (refer to Figure 12-1). The SMBASE default value following a hardware reset is 30000H. The processor looks for the first instruction of the SMI handler at the address [SMBASE + 8000H]. It stores the processor's state in the area from [SMBASE + FE00H] to [SMBASE + FFFFH]. Refer to Section 12.4.1., SMRAM State Save Map for a description of the mapping of the state save area.
The system logic is minimally required to decode the physical address range for the SMRAM from [SMBASE + 8000H] to [SMBASE + FFFFH]. A larger area can be decoded if needed. The size of this SMRAM can be between 32 KBytes and 4 GBytes.
The location of the SMRAM can be changed by changing the SMBASE value (refer to Section 12.11., SMBASE Relocation). It should be noted that all processors in a multiple-processor system are initialized with the same SMBASE value (30000H). Initialization software must sequentially place each processor in SMM and change its SMBASE so that it does not overlap those of other processors.
The actual physical location of the SMRAM can be in system memory or in a separate RAM memory. The processor generates an SMI acknowledge transaction (P6 family processors) or asserts the SMIACT# pin (Pentium and Intel486 processors) when the processor receives an SMI (refer to Section 12.3.1., Entering SMM).
System logic can use the SMI acknowledge transaction or the assertion of the SMIACT# pin to decode accesses to the SMRAM and redirect them (if desired) to specific SMRAM memory. If a separate RAM memory is used for SMRAM, system logic should provide a programmable method of mapping the SMRAM into system memory space when the processor is not in SMM. This mechanism will enable start-up procedures to initialize the SMRAM space (that is, load the SMI handler) before executing the SMI handler during SMM.
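The address arithmetic described in this section can be worked through in a few lines. The following Python is a sketch under stated assumptions: the function names are hypothetical, and the overlap test only compares the minimally decoded range [SMBASE + 8000H, SMBASE + FFFFH] of two processors, which is one conservative reading of the requirement that SMBASE values not overlap.

```python
DEFAULT_SMBASE = 0x30000  # SMBASE value following a hardware reset

def smram_layout(smbase=DEFAULT_SMBASE):
    """Return the key physical addresses of the default SMRAM layout."""
    return {
        "handler_entry": smbase + 0x8000,    # first SMI handler instruction
        "state_save_low": smbase + 0xFE00,   # bottom of the state save area
        "state_save_high": smbase + 0xFFFF,  # top of the state save area
    }

def decoded_ranges_overlap(smbase_a, smbase_b):
    """True if the minimally decoded SMRAM ranges of two processors overlap."""
    lo_a, hi_a = smbase_a + 0x8000, smbase_a + 0xFFFF
    lo_b, hi_b = smbase_b + 0x8000, smbase_b + 0xFFFF
    return lo_a <= hi_b and lo_b <= hi_a

for name, address in smram_layout().items():
    print(f"{name}: {address:05X}H")
```

For the default SMBASE, the handler entry point is 38000H and the state save area spans 3FE00H through 3FFFFH. Two processors left at the default value overlap completely, which is why initialization software has to relocate them one at a time.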
[Figure: SMRAM layout. SMBASE marks the bottom of SMRAM, the SMI handler entry point is at SMBASE + 8000H, and the state save area starts at SMBASE + FFFFH.]
Figure 12-1. SMRAM Usage
12.4.1. SMRAM State Save Map

When the processor initially enters SMM, it writes its state to the state save area of the SMRAM. The state save area begins at [SMBASE + 8000H + 7FFFH] and extends down to [SMBASE + 8000H + 7E00H]. Table 12-1 shows the state save map. The offset in column 1 is relative to the SMBASE value plus 8000H. Reserved spaces should not be used by software. Some of the registers in the SMRAM state save area (marked YES in column 3) may be read and changed by the SMI handler, with the changed values restored to the processor registers by the RSM instruction. Some register images are read-only, and must not be modified (modifying these registers will result in unpredictable behavior). An SMI handler should not rely on any values stored in an area that is marked as reserved.
Table 12-1. SMRAM State Save Map

Offset (Added to SMBASE + 8000H)   Register                                      Writable?
7FFCH                              CR0                                           No
7FF8H                              CR3                                           No
7FF4H                              EFLAGS                                        Yes
7FF0H                              EIP                                           Yes
7FECH                              EDI                                           Yes
7FE8H                              ESI                                           Yes
7FE4H                              EBP                                           Yes
7FE0H                              ESP                                           Yes
7FDCH                              EBX                                           Yes
7FD8H                              EDX                                           Yes
7FD4H                              ECX                                           Yes
7FD0H                              EAX                                           Yes
7FCCH                              DR6                                           No
7FC8H                              DR7                                           No
7FC4H                              TR*                                           No
7FC0H                              LDT Base*                                     No
7FBCH                              GS*                                           No
7FB8H                              FS*                                           No
7FB4H                              DS*                                           No
7FB0H                              SS*                                           No
7FACH                              CS*                                           No
7FA8H                              ES*                                           No
7FA7H - 7F04H                      Reserved                                      No
7F02H                              Auto HALT Restart Field (Word)                Yes
7F00H                              I/O Instruction Restart Field (Word)          Yes
7EFCH                              SMM Revision Identifier Field (Doubleword)    No
7EF8H                              SMBASE Field (Doubleword)                     Yes
7EF7H - 7E00H                      Reserved                                      No

NOTE: * Upper two bytes are reserved.
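To make the offset arithmetic of Table 12-1 concrete, the sketch below turns the table into a lookup and computes the physical address of a field as SMBASE + 8000H + offset. The dictionary and function names are hypothetical; the offsets and writability values are copied from the table, and reserved ranges are left out.

```python
# (offset relative to SMBASE + 8000H, writable?) per Table 12-1.
STATE_SAVE_MAP = {
    "CR0": (0x7FFC, False), "CR3": (0x7FF8, False),
    "EFLAGS": (0x7FF4, True), "EIP": (0x7FF0, True),
    "EDI": (0x7FEC, True), "ESI": (0x7FE8, True),
    "EBP": (0x7FE4, True), "ESP": (0x7FE0, True),
    "EBX": (0x7FDC, True), "EDX": (0x7FD8, True),
    "ECX": (0x7FD4, True), "EAX": (0x7FD0, True),
    "DR6": (0x7FCC, False), "DR7": (0x7FC8, False),
    "TR": (0x7FC4, False), "LDT Base": (0x7FC0, False),
    "GS": (0x7FBC, False), "FS": (0x7FB8, False),
    "DS": (0x7FB4, False), "SS": (0x7FB0, False),
    "CS": (0x7FAC, False), "ES": (0x7FA8, False),
    "Auto HALT Restart": (0x7F02, True),
    "I/O Instruction Restart": (0x7F00, True),
    "SMM Revision Identifier": (0x7EFC, False),
    "SMBASE": (0x7EF8, True),
}

def field_address(name, smbase=0x30000):
    """Physical address of a state save field for a given SMBASE."""
    offset, _writable = STATE_SAVE_MAP[name]
    return smbase + 0x8000 + offset

def is_writable(name):
    """True if the SMI handler may modify the saved image of this field."""
    return STATE_SAVE_MAP[name][1]

print(f"EAX image at {field_address('EAX'):05X}H")
```

With the default SMBASE of 30000H, the saved EAX image sits at 3FFD0H and the SMBASE field at 3FEF8H.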
The following registers are saved (but not readable) and restored upon exiting SMM:
• Control register CR4 (CR4 is set to 0 while in the SMM handler).
• The hidden segment descriptor information stored in segment registers CS, DS, ES, FS, GS, and SS.
If an SMI request is issued for the purpose of powering down the processor, the values of all reserved locations in the SMM state save must be saved to nonvolatile memory. The following state is not automatically saved and restored following an SMI and the RSM instruction, respectively:

• Debug registers DR0 through DR3.
• The FPU registers.
• The MTRRs.
• Control register CR2.
• The model-specific registers (for the P6 family and Pentium processors) or test registers TR3 through TR7 (for the Pentium and Intel486 processors).
• The state of the trap controller.
• The machine-check architecture registers.
• The APIC internal interrupt state (ISR, IRR, etc.).
• The microcode update state.
If an SMI is used to power down the processor, a power-on reset will be required before returning to SMM, which will reset much of this state back to its default values. So an SMI handler that is going to trigger power down should first read the registers listed above directly, and save them (along with the rest of RAM) to nonvolatile storage. After the power-on reset, the continuation of the SMI handler should restore these values, along with the rest of the system's state. Anytime the SMI handler changes these registers in the processor, it must also save and restore them.
NOTE
A small subset of the MSRs (such as the time-stamp counter and the performance-monitoring counters) are not arbitrarily writable and therefore cannot be saved and restored. SMM-based power-down and restoration should only be performed with operating systems that do not use or rely on the values of these registers. Operating system developers should be aware of this fact and ensure that their operating-system assisted power-down and restoration software is immune to unexpected changes in these register values.
12.4.2. SMRAM Caching

An Intel Architecture processor supporting SMM does not unconditionally write back and invalidate its cache before entering SMM. Therefore, if SMRAM is in a location that is shadowed by any existing system memory that is visible to the application or operating system, then it is necessary for the system to flush the cache upon entering SMM. This may be accomplished by asserting the FLUSH# pin at the same time as the request to enter SMM. The priorities of the FLUSH# pin and the SMI# are such that the FLUSH# will be serviced first. To guarantee this behavior, the processor requires that the following constraints on the interaction of SMI# and FLUSH# be met. In a system where the FLUSH# pin and SMI# pins are synchronous and the set up and hold times are met, then the FLUSH# and SMI# pins may be asserted in the same clock. In asynchronous systems, the FLUSH# pin must be asserted at least one clock before the SMI# pin to guarantee that the FLUSH# pin is serviced first. Note that in Pentium processor systems that use the FLUSH# pin to write back and invalidate cache contents before entering SMM, the processor will prefetch at least one cache line in between when the Flush Acknowledge cycle is run, and the subsequent recognition of SMI# and the assertion of SMIACT#. It is the obligation of the system to ensure that these lines are not cached by returning KEN# inactive to the Pentium processor.
Intel Architecture processors do not write back or invalidate their internal caches upon leaving SMM. For this reason, references to the SMRAM area must not be cached if any part of the SMRAM shadows (overlays) non-SMRAM memory; that is, system DRAM or video RAM. It is the obligation of the system to ensure that all memory references to overlapped areas are uncached; that is, the KEN# pin is sampled inactive during all references to the SMRAM area for the Pentium processor. The WBINVD instruction should be used to ensure cache coherency at the end of a cached SMM execution in systems that have a protected SMM memory region provided by the chipset. The P6 family of processors have no external equivalent of the KEN# pin. All memory accesses are typed via the MTRRs. It is not practical therefore to have memory access to a certain address be cached in one access and not cached in another. Intel does not recommend the caching of SMM space in any overlapping memory environment on the P6 family of processors.
12.5. SMI HANDLER EXECUTION ENVIRONMENT

After saving the current context of the processor, the processor initializes its core registers to the values shown in Table 12-2. Upon entering SMM, the PE and PG flags in control register CR0 are cleared, which places the processor in an environment similar to real-address mode. The differences between the SMM execution environment and the real-address mode execution environment are as follows:
• The addressable SMRAM address space ranges from 0 to FFFFFFFFH (4 GBytes). (The physical address extension (enabled with the PAE flag in control register CR4) is not supported in SMM.)
• The normal 64-KByte segment limit for real-address mode is increased to 4 GBytes.
• The default operand and address sizes are set to 16 bits, which restricts the addressable SMRAM address space to the 1-MByte real-address mode limit for native real-address-mode code. However, operand-size and address-size override prefixes can be used to access the address space beyond 1 MByte.
• Near jumps and calls can be made to anywhere in the 4-GByte address space if a 32-bit operand-size override prefix is used. Due to the real-address-mode style of base-address formation, a far call or jump cannot transfer control to a segment with a base address of more than 20 bits (1 MByte). However, since the segment limit in SMM is 4 GBytes, offsets into a segment that go beyond the 1-MByte limit are allowed when using 32-bit operand-size override prefixes. Any program control transfer that does not have a 32-bit operand-size override prefix truncates the EIP value to the 16 low-order bits.
Table 12-2. Processor Register Initialization in SMM

Register                          Contents
General-purpose registers         Undefined
EFLAGS                            00000002H
EIP                               00008000H
CS selector                       SMM Base shifted right 4 bits (default 3000H)
CS base                           SMM Base (default 30000H)
DS, ES, FS, GS, SS Selectors      0000H
DS, ES, FS, GS, SS Bases          000000000H
DS, ES, FS, GS, SS Limits         0FFFFFFFFH
CR0                               PE, EM, TS and PG flags set to 0; others unmodified
DR6                               Undefined
DR7                               00000400H
Data and the stack can be located anywhere in the 4-GByte address space, but can be accessed only with a 32-bit address-size override if they are located above 1 MByte. As with the code segment, the base address for a data or stack segment cannot be more than 20 bits.
The value in segment register CS is automatically set to the default of 30000H for the SMBASE shifted 4 bits to the right; that is, 3000H. The EIP register is set to 8000H. When the EIP value is added to the shifted CS value (the SMBASE), the resulting linear address points to the first instruction of the SMI handler.
The other segment registers (DS, SS, ES, FS, and GS) are cleared to 0 and their segment limits are set to 4 GBytes. In this state, the SMRAM address space may be treated as a single flat 4-GByte linear address space. If a segment register is loaded with a 16-bit value, that value is then shifted left by 4 bits and loaded into the segment base (hidden part of the segment register). The limits and attributes are not modified.
Maskable hardware interrupts, exceptions, NMI interrupts, SMI interrupts, A20M interrupts, single-step traps, breakpoint traps, and INIT operations are inhibited when the processor enters SMM. Maskable hardware interrupts, exceptions, single-step traps, and breakpoint traps can be enabled in SMM if the SMM execution environment provides and initializes an interrupt table and the necessary interrupt and exception handlers (refer to Section 12.6., Exceptions and Interrupts Within SMM).
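The relationship between SMBASE, the CS selector, and the handler entry point can be checked numerically. The sketch below uses hypothetical function names and assumes an SMBASE below 1 MByte, so that SMBASE shifted right by 4 bits fits in a 16-bit selector; it does not model what the selector holds for higher SMBASE values.

```python
def smm_entry_state(smbase=0x30000):
    """CS selector, CS base, and EIP on entry to SMM (per Table 12-2).

    Assumes smbase < 1 MByte so the shifted value fits in 16 bits.
    """
    cs_selector = smbase >> 4
    cs_base = smbase
    eip = 0x8000
    return cs_selector, cs_base, eip

def real_mode_base(selector):
    """Base formed when a 16-bit value is loaded into a segment register."""
    return (selector & 0xFFFF) << 4

def linear_address(base, offset):
    """Linear address of base + offset within the 4-GByte space."""
    return (base + offset) & 0xFFFFFFFF

selector, base, eip = smm_entry_state()
print(f"CS={selector:04X}H base={base:05X}H entry={linear_address(base, eip):05X}H")
```

Because a loaded selector is only 16 bits wide, the largest base that software can form this way is FFFF0H, which is the 20-bit limit mentioned in the text.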
12.6. EXCEPTIONS AND INTERRUPTS WITHIN SMM

When the processor enters SMM, all hardware interrupts are disabled in the following manner:
• The IF flag in the EFLAGS register is cleared, which inhibits maskable hardware interrupts from being generated.
• The TF flag in the EFLAGS register is cleared, which disables single-step traps.
• Debug register DR7 is cleared, which disables breakpoint traps. (This action prevents a debugger from accidentally breaking into an SMM handler if a debug breakpoint is set in normal address space that overlays code or data in SMRAM.)
• NMI, SMI, and A20M interrupts are blocked by internal SMM logic. (Refer to Section 12.7., NMI Handling While in SMM for further information about how NMIs are handled in SMM.)
Software-invoked interrupts and exceptions can still occur, and maskable hardware interrupts can be enabled by setting the IF flag. Intel recommends that SMM code be written so that it does not invoke software interrupts (with the INT n, INTO, INT 3, or BOUND instructions) or generate exceptions.
If the SMM handler requires interrupt and exception handling, an SMM interrupt table and the necessary exception and interrupt handlers must be created and initialized from within SMM. Until the interrupt table is correctly initialized (using the LIDT instruction), exceptions and software interrupts will result in unpredictable processor behavior.
The following restrictions apply when designing SMM interrupt and exception-handling facilities:
• The interrupt table should be located at linear address 0 and must contain real-address mode style interrupt vectors (4 bytes containing CS and IP).
• Due to the real-address mode style of base address formation, an interrupt or exception cannot transfer control to a segment with a base address of more than 20 bits.
• An interrupt or exception cannot transfer control to a segment offset of more than 16 bits (64 KBytes).
• When an exception or interrupt occurs, only the 16 least-significant bits of the return address (EIP) are pushed onto the stack. If the offset of the interrupted procedure is greater than 64 KBytes, it is not possible for the interrupt/exception handler to return control to that procedure. (One solution to this problem is for a handler to adjust the return address on the stack.)
• The SMBASE relocation feature affects the way the processor will return from an interrupt or exception generated while the SMI handler is executing. For example, if the SMBASE is relocated to above 1 MByte, but the exception handlers are below 1 MByte, a normal return to the SMI handler is not possible. One solution is to provide the exception handler with a mechanism for calculating a return address above 1 MByte from the 16-bit return address on the stack, then use a 32-bit far call to return to the interrupted procedure.
• If an SMI handler needs access to the debug trap facilities, it must ensure that an SMM-accessible debug handler is available and save the current contents of debug registers DR0 through DR3 (for later restoration). Debug registers DR0 through DR3 and DR7 must then be initialized with the appropriate values.
• If an SMI handler needs access to the single-step mechanism, it must ensure that an SMM-accessible single-step handler is available, and then set the TF flag in the EFLAGS register.
• If the SMI design requires the processor to respond to maskable hardware interrupts or software-generated interrupts while in SMM, it must ensure that SMM-accessible interrupt handlers are available and then set the IF flag in the EFLAGS register (using the STI instruction). Software interrupts are not blocked upon entry to SMM, so they do not need to be enabled.
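The vector format and the 16-bit return-address limit described above can be illustrated with a short sketch. It assumes the usual real-address mode vector layout (a 16-bit IP in the low word followed by a 16-bit CS in the high word, little-endian); the function names and the sample table contents are hypothetical.

```python
import struct

def vector_target(table, n):
    """Linear address of the handler for vector n.

    `table` is a bytes image of the interrupt table that starts at
    linear address 0; each entry is 4 bytes (IP, then CS).
    """
    ip, cs = struct.unpack_from("<HH", table, n * 4)
    return (cs << 4) + ip  # real-address mode base formation

def pushed_return_offset(eip):
    """Only the 16 least-significant bits of EIP are pushed on the stack."""
    return eip & 0xFFFF

# Sample table: vectors 0 to 2 are zero, vector 3 holds CS:IP = 3000H:0100H.
sample = bytes(12) + struct.pack("<HH", 0x0100, 0x3000)
print(f"vector 3 -> {vector_target(sample, 3):05X}H")
print(f"EIP 12345H is pushed as {pushed_return_offset(0x12345):04X}H")
```

The second line of output shows why a handler interrupted above 64 KBytes cannot be resumed directly: the upper bits of the return address never reach the stack.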
12.7. NMI HANDLING WHILE IN SMM

NMI interrupts are blocked upon entry to the SMI handler. If an NMI request occurs during the SMI handler, it is latched and serviced after the processor exits SMM. Only one NMI request will be latched during the SMI handler. If an NMI request is pending when the processor executes the RSM instruction, the NMI is serviced before the next instruction of the interrupted code sequence.
Although NMI requests are blocked when the CPU enters SMM, they may be enabled through software by executing an IRET/IRETD instruction. If the SMM handler requires the use of NMI interrupts, it should invoke a dummy interrupt service routine for the purpose of executing an IRET/IRETD instruction. Once an IRET/IRETD instruction is executed, NMI interrupt requests are serviced in the same real mode manner in which they are handled outside of SMM.
A special case can occur if an SMI handler nests inside an NMI handler and then another NMI occurs. During NMI interrupt handling, NMI interrupts are disabled, so normally NMI interrupts are serviced and completed with an IRET instruction one at a time. When the processor enters SMM while executing an NMI handler, the processor saves the SMRAM state save map but does not save the attribute to keep NMI interrupts disabled. Potentially, an NMI could be latched (while in SMM or upon exit) and serviced upon exit of SMM even though the previous NMI handler has still not completed. One or more NMIs could thus be nested inside the first NMI handler. The NMI interrupt handler should take this possibility into consideration.
Also, for the Pentium processor, exceptions that invoke a trap or fault handler will enable NMI interrupts from inside of SMM. This behavior is implementation specific for the Pentium processor and is not part of the Intel Architecture.
12.8. SAVING THE FPU STATE WHILE IN SMM

In some instances (for example, prior to powering down system memory when entering a 0-volt suspend state), it is necessary to save the state of the FPU while in SMM. Care should be taken when performing this operation to ensure that relevant FPU state information is not lost. The safest way to perform this task is to place the processor in 32-bit protected mode before saving the FPU state. The reason for this is as follows.
The FSAVE instruction saves the FPU context in any of four different formats, depending on which mode the processor is in when FSAVE is executed (refer to Figures 7-13 through 7-16 in the Intel Architecture Software Developer's Manual, Volume 1). When in SMM, by default, the 16-bit real-address mode format is used (shown in Figure 7-16). If an SMI interrupt occurs while the processor is in a mode other than 16-bit real-address mode, FSAVE and FRSTOR will be unable to save and restore all the relevant FPU information, and this situation may result in a malfunction when the interrupted program is resumed. To avoid this problem, the processor should be in 32-bit protected mode when executing the FSAVE and FRSTOR instructions.
The following guidelines should be used when going into protected mode from an SMI handler to save and restore the FPU state:
• Use the CPUID instruction to ensure that the processor contains an FPU.
• Create a 32-bit code segment in SMRAM space that contains procedures or routines to save and restore the FPU using the FSAVE and FRSTOR instructions, respectively. A GDT with an appropriate code-segment descriptor (D bit is set to 1) for the 32-bit code segment must also be placed in SMRAM.
• Write a procedure or routine that can be called by the SMI handler to save and restore the FPU state. This procedure should do the following:
  • Place the processor in 32-bit protected mode as described in Section 8.8.1., Switching to Protected Mode in Chapter 8, Processor Management and Initialization.
  • Execute a far JMP to the 32-bit code segment that contains the FPU save and restore procedures.
  • Place the processor back in 16-bit real-address mode before returning to the SMI handler (refer to Section 8.8.2., Switching Back to Real-Address Mode in Chapter 8, Processor Management and Initialization).
The SMI handler may continue to execute in protected mode after the FPU state has been saved and return safely to the interrupted program from protected mode. However, it is recommended that the handler execute primarily in 16- or 32-bit real-address mode.
12.9. SMM REVISION IDENTIFIER

The SMM revision identifier field is used to indicate the version of SMM and the SMM extensions that are supported by the processor (refer to Figure 12-2). The SMM revision identifier is written during SMM entry and can be examined in SMRAM space at offset 7EFCH. The lower word of the SMM revision identifier refers to the version of the base SMM architecture.
[Figure: doubleword at register offset 7EFCH. Bits 31-18: reserved. Bit 17: SMBASE relocation. Bit 16: I/O instruction restart. Bits 15-0: SMM revision identifier.]
Figure 12-2. SMM Revision Identifier
The upper word of the SMM revision identifier refers to the extensions available. If the I/O instruction restart flag (bit 16) is set, the processor supports the I/O instruction restart (refer to Section 12.12., I/O Instruction Restart); if the SMBASE relocation flag (bit 17) is set, SMRAM base address relocation is supported (refer to Section 12.11., SMBASE Relocation).
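A handler that reads the doubleword at offset 7EFCH could test the extension flags as in the sketch below. The function name is hypothetical, and the value passed in the example is a made-up sample rather than the identifier of any particular processor.

```python
def decode_smm_revision(value):
    """Split the SMM revision identifier doubleword into its fields."""
    return {
        "revision": value & 0xFFFF,                        # bits 15-0
        "io_instruction_restart": bool(value & (1 << 16)),  # bit 16
        "smbase_relocation": bool(value & (1 << 17)),       # bit 17
    }

# Made-up sample value with both extension flags set.
print(decode_smm_revision(0x00030002))
```

Checking bit 17 before writing the SMBASE field, and bit 16 before using the I/O instruction restart field, keeps the handler from relying on extensions the processor does not report.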
12.10. AUTO HALT RESTART

If the processor is in a HALT state (due to the prior execution of a HLT instruction) when it receives an SMI, the processor records the fact in the auto HALT restart flag in the saved processor state (refer to Figure 12-3). (This flag is located at offset 7F02H and bit 0 in the state save area of the SMRAM.) If the processor sets the auto HALT restart flag upon entering SMM (indicating that the SMI occurred when the processor was in the HALT state), the SMI handler has two options:
- It can leave the auto HALT restart flag set, which instructs the RSM instruction to return program control to the HLT instruction. This option in effect causes the processor to reenter the HALT state after handling the SMI. (This is the default operation.)
- It can clear the auto HALT restart flag, which instructs the RSM instruction to return program control to the instruction following the HLT instruction.
Figure 12-3. Auto HALT Restart Field (register offset 7F02H). Bits 15-1: reserved; bit 0: auto HALT restart.
These options are summarized in Table 12-3. Note that if the processor was not in a HALT state when the SMI was received (the auto HALT restart flag is cleared), setting the flag to 1 will cause unpredictable behavior when the RSM instruction is executed.
Table 12-3. Auto HALT Restart Flag Values
Value of Flag After    Value of Flag When    Action of Processor When Exiting SMM
Entry to SMM           Exiting SMM
0                      0                     Returns to next instruction in interrupted program or task
0                      1                     Unpredictable
1                      0                     Returns to next instruction after HLT instruction
1                      1                     Returns to HALT state
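The four cases of Table 12-3 can be expressed as a small C sketch. The enum and function names are illustrative assumptions, not part of the architecture; the second helper shows one way a handler might avoid the unpredictable combination.

```c
#include <stdbool.h>
#include <stdint.h>

#define AUTO_HALT_RESTART 0x0001u  /* bit 0 of the field at offset 7F02H */

/* Outcomes listed in Table 12-3 (the names are this sketch's own). */
enum rsm_action {
    RSM_NEXT_INSTRUCTION,  /* flag 0 on entry, 0 on exit */
    RSM_UNPREDICTABLE,     /* flag 0 on entry, 1 on exit */
    RSM_AFTER_HLT,         /* flag 1 on entry, 0 on exit */
    RSM_REENTER_HALT       /* flag 1 on entry, 1 on exit */
};

/* Map the entry and exit values of the field to the processor's action. */
static enum rsm_action auto_halt_action(uint16_t on_entry, uint16_t on_exit)
{
    bool entry_set = (on_entry & AUTO_HALT_RESTART) != 0;
    bool exit_set  = (on_exit & AUTO_HALT_RESTART) != 0;

    if (!entry_set)
        return exit_set ? RSM_UNPREDICTABLE : RSM_NEXT_INSTRUCTION;
    return exit_set ? RSM_REENTER_HALT : RSM_AFTER_HLT;
}

/* Value to write back before RSM; never sets a flag that was clear on entry. */
static uint16_t auto_halt_exit_value(uint16_t on_entry, bool resume_after_hlt)
{
    if ((on_entry & AUTO_HALT_RESTART) == 0)
        return on_entry;  /* processor was not halted: leave the flag clear */
    return resume_after_hlt ? (uint16_t)(on_entry & ~AUTO_HALT_RESTART)
                            : on_entry;
}
```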
If the HLT instruction is restarted, the processor will generate a memory access to fetch the HLT instruction (if it is not in the internal cache), and execute a HLT bus transaction. This behavior results in multiple HLT bus transactions for the same HLT instruction.
12.10.1. Executing the HLT Instruction in SMM

The HLT instruction should not be executed during SMM, unless interrupts have been enabled by setting the IF flag in the EFLAGS register. If the processor is halted in SMM, the only event that can remove the processor from this state is a maskable hardware interrupt or a hardware reset.
12.11. SMBASE RELOCATION

The default base address for the SMRAM is 30000H. This value is contained in an internal processor register called the SMBASE register. The operating system or executive can relocate the SMRAM by setting the SMBASE field in the saved state map (at offset 7EF8H) to a new value (refer to Figure 12-4). The RSM instruction reloads the internal SMBASE register with the value in the SMBASE field each time it exits SMM. All subsequent SMI requests will use the new SMBASE value to find the starting address for the SMI handler (at SMBASE + 8000H) and the SMRAM state save area (from SMBASE + FE00H to SMBASE + FFFFH). (The processor resets the value in its internal SMBASE register to 30000H on a RESET, but does not change it on an INIT.) In multiple-processor systems, initialization software must adjust the SMBASE value for each processor so that the SMRAM state save areas for each processor do not overlap. (For Pentium and Intel486 processors, the SMBASE values must be aligned on a 32-KByte boundary or the processor will enter shutdown state during the execution of a RSM instruction.)
Figure 12-4. SMBASE Relocation Field (register offset 7EF8H). Bits 31-0: SMM base.
If the SMBASE relocation flag in the SMM revision identifier field is set, it indicates the ability to relocate the SMBASE (refer to Section 12.9., SMM Revision Identifier).
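The address arithmetic of this section can be checked with a short C sketch. It uses only the offsets given in the text (8000H, FE00H, FFFFH) and the 32-KByte alignment rule; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SMBASE_DEFAULT     0x30000u  /* SMBASE value after RESET */
#define SMI_HANDLER_OFFSET 0x8000u   /* handler entry point: SMBASE + 8000H */
#define STATE_SAVE_START   0xFE00u   /* state save area: SMBASE + FE00H ... */
#define STATE_SAVE_END     0xFFFFu   /* ... through SMBASE + FFFFH */

/* 64-bit results so that a large SMBASE cannot wrap around. */
static uint64_t smi_handler_entry(uint32_t smbase)
{
    return (uint64_t)smbase + SMI_HANDLER_OFFSET;
}

static uint64_t state_save_start(uint32_t smbase)
{
    return (uint64_t)smbase + STATE_SAVE_START;
}

static uint64_t state_save_end(uint32_t smbase)
{
    return (uint64_t)smbase + STATE_SAVE_END;
}

/* Alignment required on Pentium and Intel486 processors. */
static bool smbase_is_32k_aligned(uint32_t smbase)
{
    return (smbase & 0x7FFFu) == 0;
}

/* True if the state save areas of two processors share any byte. */
static bool state_save_areas_overlap(uint32_t smbase_a, uint32_t smbase_b)
{
    return state_save_start(smbase_a) <= state_save_end(smbase_b) &&
           state_save_start(smbase_b) <= state_save_end(smbase_a);
}
```

Initialization software for a multiple-processor system could run a check like `state_save_areas_overlap` over each pair of planned SMBASE values before relocating.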
12.11.1. Relocating SMRAM to an Address Above 1 MByte

In SMM, the segment base registers can only be updated by changing the value in the segment registers. The segment registers contain only 16 bits, which allows only 20 bits to be used for a segment base address (the segment register is shifted left 4 bits to determine the segment base address). If SMRAM is relocated to an address above 1 MByte, software operating in real-address mode can no longer initialize the segment registers to point to the SMRAM base address (SMBASE). The SMRAM can still be accessed by using 32-bit address-size override prefixes to generate an offset to the correct address. For example, if the SMBASE has been relocated to FFFFFFH (immediately below the 16-MByte boundary) and the DS, ES, FS, and GS registers are still initialized to 0H, data in SMRAM can be accessed by using 32-bit displacement registers, as in the following example:
    mov esi,00FFxxxxH    ; 64K segment immediately below 16M
    mov ax,ds:[esi]
A stack located above the 1-MByte boundary can be accessed in the same manner.
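The limitation described in this section follows from the segment arithmetic, which the C sketch below illustrates. The function names are hypothetical, and 00FF0000H is used as an assumed stand-in for 00FFxxxxH, whose low digits the text leaves unspecified.

```c
#include <stdint.h>

/* Segment base in real-address mode: the 16-bit segment value shifted left 4 bits. */
static uint32_t real_mode_segment_base(uint16_t segment)
{
    return (uint32_t)segment << 4;
}

/* Address formed from a segment and a 32-bit offset (address-size override). */
static uint32_t linear_address(uint16_t segment, uint32_t offset32)
{
    return real_mode_segment_base(segment) + offset32;
}
```

The largest base that a segment register can produce is `real_mode_segment_base(0xFFFF)`, which is FFFF0H and therefore below 1 MByte. With DS left at 0H, `linear_address(0, 0x00FF0000)` shows how a 32-bit offset alone reaches the relocated SMRAM.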
12.12. I/O INSTRUCTION RESTART

If the I/O instruction restart flag in the SMM revision identifier field is set (refer to Section 12.9., SMM Revision Identifier), the I/O instruction restart mechanism is present on the processor. This mechanism allows an interrupted I/O instruction to be re-executed upon returning from SMM. For example, if an I/O instruction is used to access a powered-down I/O device, a chip set supporting this device can intercept the access and respond by asserting SMI#. This action invokes the SMI handler to power up the device. Upon returning from the SMI handler, the I/O instruction restart mechanism can be used to re-execute the I/O instruction that caused the SMI.
The I/O instruction restart field (at offset 7F00H in the SMM state-save area, refer to Figure 12-5) controls I/O instruction restart. When an RSM instruction is executed, if this field contains the value FFH, then the EIP register is modified to point to the I/O instruction that received the SMI request. The processor will then automatically re-execute the I/O instruction that the SMI trapped. (The processor saves the necessary machine state to ensure that re-execution of the instruction is handled coherently.)
Figure 12-5. I/O Instruction Restart Field (register offset 7F00H). Bits 15-0: I/O instruction restart field.
If the I/O instruction restart field contains the value 00H when the RSM instruction is executed, then the processor begins program execution with the instruction following the I/O instruction. (When a repeat prefix is being used, the next instruction may be the next I/O instruction in the repeat loop.) Not re-executing the interrupted I/O instruction is the default behavior; the processor automatically initializes the I/O instruction restart field to 00H upon entering SMM. Table 12-4 summarizes the states of the I/O instruction restart field.
Table 12-4. I/O Instruction Restart Field Values
Value of Flag After    Value of Flag When    Action of Processor When Exiting SMM
Entry to SMM           Exiting SMM
00H                    00H                   Does not re-execute trapped I/O instruction.
00H                    FFH                   Re-executes trapped I/O instruction.
Note that the I/O instruction restart mechanism does not indicate the cause of the SMI. It is the responsibility of the SMI handler to examine the state of the processor to determine the cause of the SMI and to determine if an I/O instruction was interrupted and should be restarted upon exiting SMM. If an SMI interrupt is signaled on a non-I/O instruction boundary, setting the I/O instruction restart field to FFH prior to executing the RSM instruction will likely result in a program error.
12.12.1. Back-to-Back SMI Interrupts When I/O Instruction Restart Is Being Used
If an SMI interrupt is signaled while the processor is servicing an SMI interrupt that occurred on an I/O instruction boundary, the processor will service the new SMI request before restarting the originally interrupted I/O instruction. If the I/O instruction restart field is set to FFH prior to returning from the second SMI handler, the EIP will point to an address different from the originally interrupted I/O instruction, which will likely lead to a program error. To avoid this situation, the SMI handler must be able to recognize the occurrence of back-to-back SMI interrupts when I/O instruction restart is being used and ensure that the handler sets the I/O instruction restart field to 00H prior to returning from the second invocation of the SMI handler.
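The rules of Sections 12.12. and 12.12.1. can be condensed into one decision, sketched here in C. How a handler determines that the SMI arrived on an I/O instruction boundary, or that it is a back-to-back invocation, is platform specific; the three boolean inputs and the function name are assumptions of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define IO_RESTART_SKIP      0x00u  /* resume after the I/O instruction (default) */
#define IO_RESTART_REEXECUTE 0xFFu  /* re-execute the trapped I/O instruction */

/* Value to store in the I/O instruction restart field (offset 7F00H) before RSM. */
static uint16_t io_restart_exit_value(bool smi_on_io_boundary,
                                      bool back_to_back_invocation,
                                      bool want_reexecute)
{
    /* FFH is only safe for the first SMI taken on an I/O instruction boundary. */
    if (!smi_on_io_boundary || back_to_back_invocation)
        return IO_RESTART_SKIP;
    return want_reexecute ? IO_RESTART_REEXECUTE : IO_RESTART_SKIP;
}
```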
12.13. SMM MULTIPLE-PROCESSOR CONSIDERATIONS

The following should be noted when designing multiple-processor systems:
- Any processor in a multiprocessor system can respond to an SMI.
- Each processor needs its own SMRAM space. This space can be in system memory or in a separate RAM.
- The SMRAMs for different processors can be overlapped in the same memory space. The only stipulation is that each processor needs its own state save area and its own dynamic data storage area. (Also, for the Pentium and Intel486 processors, the SMBASE address must be located on a 32-KByte boundary.) Code and static data can be shared among processors. Overlapping SMRAM spaces can be done more efficiently with the P6 family processors because they do not require that the SMBASE address be on a 32-KByte boundary.
- The SMI handler will need to initialize the SMBASE for each processor.
- Processors can respond to local SMIs through their SMI# pins or to SMIs received through the APIC interface. The APIC interface can distribute SMIs to different processors.
- Two or more processors can be executing in SMM at the same time.
- When operating Pentium processors in dual processing (DP) mode, the SMIACT# pin is driven only by the MRM processor and should be sampled with ADS#. For additional details, refer to Chapter 14 of the Pentium Processor Family User's Manual, Volume 1.
SMM is not re-entrant, because the SMRAM State Save Map is fixed relative to the SMBASE. If there is a need to support two or more processors in SMM mode at the same time then each processor should have dedicated SMRAM spaces. This can be done by using the SMBASE Relocation feature (refer to Section 12.11., SMBASE Relocation).
CHAPTER 13 MACHINE-CHECK ARCHITECTURE

This chapter describes the P6 family's machine-check architecture and machine-check exception mechanism. Refer to Chapter 5, Interrupt and Exception Handling for more information on the machine-check exception. A brief description of the Pentium processor's machine-check capability is also given.
13.1. MACHINE-CHECK EXCEPTIONS AND ARCHITECTURE

The P6 family of processors implements a machine-check architecture that provides a mechanism for detecting and reporting hardware (machine) errors, such as system bus errors, ECC errors, parity errors, cache errors, and TLB errors. It consists of a set of model-specific registers (MSRs) that are used to set up machine checking and additional banks of MSRs for recording the errors that are detected. The processor signals the detection of a machine-check error by generating a machine-check exception (#MC). A machine-check exception is generally an abort class exception. The implementation of the machine-check architecture does not ordinarily permit the processor to be restarted reliably after generating a machine-check exception; however, the machine-check-exception handler can collect information about the machine-check error from the machine-check MSRs.
13.2. COMPATIBILITY WITH PENTIUM PROCESSOR

The P6 family processors support and extend the machine-check exception mechanism used in the Pentium processor. The Pentium processor reports the following machine-check errors:
- Data parity errors during read cycles.
- Unsuccessful completion of a bus cycle.
These errors are reported through the P5_MC_TYPE and P5_MC_ADDR MSRs, which are implementation specific for the Pentium processor. These MSRs can be read with the RDMSR instruction. Refer to Table B-1 in Appendix B, Model-Specific Registers for the register addresses for these MSRs. The machine-check error reporting mechanism that the Pentium processors use is similar to that used in the P6 family processors. That is, when an error is detected, it is recorded in the P5_MC_TYPE and P5_MC_ADDR MSRs and then the processor generates a machine-check exception (#MC). Refer to Section 13.3.3., Mapping of the Pentium Processor Machine-Check Errors to the P6 Family Machine-Check Architecture and Section 13.7.2., Pentium Processor Machine-Check Exception Handling for information on compatibility between machine-check code written to run on the Pentium processors and code written to run on P6 family processors.
13.3. MACHINE-CHECK MSRS

The machine-check MSRs in the P6 family processors consist of a set of global control and status registers and several error-reporting register banks (refer to Figure 13-1). Each error-reporting bank is associated with a specific hardware unit (or group of hardware units) within the processor. The RDMSR and WRMSR instructions are used to read and write these registers.
Figure 13-1. Machine-Check MSRs. Global control registers (64 bits each): MCG_CAP, MCG_STATUS, and MCG_CTL (MCG_CTL is not present in the Pentium Pro processor). Error-reporting bank registers (64 bits each, one set for each hardware unit): MCi_CTL, MCi_STATUS, MCi_ADDR, and MCi_MISC.
13.3.1. Machine-Check Global Control MSRs

The machine-check global control registers include the MCG_CAP, MCG_STATUS, and MCG_CTL MSRs. Refer to Appendix B, Model-Specific Registers for the addresses of these registers.
13.3.1.1. MCG_CAP MSR
The MCG_CAP MSR is a read-only register that provides information about the machine-check architecture implementation in the processor (refer to Figure 13-2). It contains the following field and flag:
Count field, bits 0 through 7
    Indicates the number of hardware unit error-reporting banks available in a particular processor implementation.
MCG_CTL_P (register present) flag, bit 8
    Indicates that the MCG_CTL register is present when set, and absent when clear.
Bits 9 through 63 are reserved. The effect of writing to the MCG_CAP register is undefined. Figure 13-2 shows the bit fields of MCG_CAP.
Figure 13-2. MCG_CAP Register. Bits 63-9: reserved; bit 8: MCG_CTL_P (MCG_CTL register present); bits 7-0: Count (number of reporting banks).
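As a minimal sketch, the two MCG_CAP fields can be extracted in C as shown below. The code assumes the 64-bit register value has already been obtained with RDMSR; the function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Count field, bits 0 through 7: number of error-reporting banks. */
static unsigned mcg_cap_bank_count(uint64_t mcg_cap)
{
    return (unsigned)(mcg_cap & 0xFFu);
}

/* MCG_CTL_P flag, bit 8: set when the MCG_CTL register is present. */
static bool mcg_cap_has_mcg_ctl(uint64_t mcg_cap)
{
    return ((mcg_cap >> 8) & 1u) != 0;
}
```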
13.3.1.2. MCG_STATUS MSR
The MCG_STATUS MSR describes the current state of the processor after a machine-check exception has occurred (refer to Figure 13-3). This register contains the following flags:
RIPV (restart IP valid) flag, bit 0
    Indicates (when set) that program execution can be restarted reliably at the instruction pointed to by the instruction pointer pushed on the stack when the machine-check exception is generated. When clear, the program cannot be reliably restarted at the pushed instruction pointer.
EIPV (error IP valid) flag, bit 1
    Indicates (when set) that the instruction pointed to by the instruction pointer pushed onto the stack when the machine-check exception is generated is directly associated with the error. When this flag is cleared, the instruction pointed to may not be associated with the error.
MCIP (machine check in progress) flag, bit 2
    Indicates (when set) that a machine-check exception was generated. Software can set or clear this flag. The occurrence of a second machine-check event while MCIP is set will cause the processor to enter a shutdown state.
Bits 3 through 63 in the MCG_STATUS register are reserved.
Figure 13-3. MCG_STATUS Register. Bits 63-3: reserved; bit 2: MCIP (machine check in progress flag); bit 1: EIPV (error IP valid flag); bit 0: RIPV (restart IP valid flag).
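The three MCG_STATUS flags lend themselves to simple mask tests. The following C sketch assumes the register value has already been read with RDMSR; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCG_STATUS_RIPV (1ull << 0)  /* restart IP valid */
#define MCG_STATUS_EIPV (1ull << 1)  /* error IP valid */
#define MCG_STATUS_MCIP (1ull << 2)  /* machine check in progress */

/* True if execution can be restarted reliably at the pushed instruction pointer. */
static bool mcg_restart_is_reliable(uint64_t status)
{
    return (status & MCG_STATUS_RIPV) != 0;
}

/* True if the pushed instruction pointer is directly associated with the error. */
static bool mcg_pushed_ip_is_error_ip(uint64_t status)
{
    return (status & MCG_STATUS_EIPV) != 0;
}

/* True while a machine-check exception is being handled. */
static bool mcg_check_in_progress(uint64_t status)
{
    return (status & MCG_STATUS_MCIP) != 0;
}

/* Value to write back once handling is complete: same flags with MCIP cleared. */
static uint64_t mcg_clear_mcip(uint64_t status)
{
    return status & ~MCG_STATUS_MCIP;
}
```

Because a second machine-check event while MCIP is set causes shutdown, a handler would normally clear MCIP only as its final step.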
13.3.1.3. MCG_CTL MSR
The MCG_CTL register is present if the capability flag MCG_CTL_P is set in the MCG_CAP register. The MCG_CTL register controls the reporting of machine-check exceptions. If present (MCG_CTL_P flag in the MCG_CAP register is set), writing all 1s to this register enables all machine-check features and writing all 0s disables all machine-check features. All other values are undefined and/or implementation specific.
13.3.2. Error-Reporting Register Banks

Each error-reporting register bank can contain an MCi_CTL, MCi_STATUS, MCi_ADDR, and MCi_MISC MSR. The P6 family processors provide five banks of error-reporting registers. The first error-reporting register (MC0_CTL) always starts at address 400H. Refer to Table B-1 in Appendix B, Model-Specific Registers for the addresses of the other error-reporting registers.
13.3.2.1. MCi_CTL MSR
The MCi_CTL MSR controls error reporting for specific errors produced by a particular hardware unit (or group of hardware units). Each of the 64 flags (EEj) represents a potential error. Setting an EEj flag enables reporting of the associated error and clearing it disables reporting of the error. Writing the 64-bit value FFFFFFFFFFFFFFFFH to an MCi_CTL register enables logging of all errors. The processor does not write changes to bits that are not implemented. Figure 13-4 shows the bit fields of MCi_CTL.
NOTE
Operating system or executive software must not modify the contents of the MC0_CTL register. The MC0_CTL register is internally aliased to the EBL_CR_POWERON register and as such controls system-specific error handling features. These features are platform specific. System-specific firmware (the BIOS) is responsible for the appropriate initialization of MC0_CTL. The P6 family processors only allow the writing of all 1s or all 0s to the MCi_CTL registers.
Figure 13-4. MCi_CTL Register. Bits 63-0: EE63 through EE00, where EEj is the error reporting enable flag (j is 00 through 63).
13.3.2.2. MCi_STATUS MSR
The MCi_STATUS MSR contains information related to a machine-check error if its VAL (valid) flag is set (refer to Figure 13-5). Software is responsible for clearing the MCi_STATUS register by writing it with all 0s; writing 1s to this register will cause a general-protection exception to be generated. The flags and fields in this register are as follows:
MCA (machine-check architecture) error code field, bits 0 through 15
    Specifies the machine-check architecture-defined error code for the machine-check error condition detected. The machine-check architecture-defined error codes are guaranteed to be the same for all Intel Architecture processors that implement the machine-check architecture. Refer to Section 13.6., Interpreting the MCA Error Codes for information on machine-check error codes.
Figure 13-5. MCi_STATUS Register. Bit 63: VAL (MCi_STATUS register valid); bit 62: OVER (error overflow); bit 61: UC (uncorrected error); bit 60: EN (error enabled); bit 59: MISCV (MCi_MISC register valid); bit 58: ADDRV (MCi_ADDR register valid); bit 57: PCC (processor context corrupt); bits 56-32: other information; bits 31-16: model-specific error code; bits 15-0: MCA error code.
Model-specific error code field, bits 16 through 31
    Specifies the model-specific error code that uniquely identifies the machine-check error condition detected. The model-specific error codes may differ among Intel Architecture processors for the same machine-check error condition.
Other information field, bits 32 through 56
    The functions of the bits in this field are implementation specific and are not part of the machine-check architecture. Software that is intended to be portable among Intel Architecture processors should not rely on the values in this field.
PCC (processor context corrupt) flag, bit 57
    Indicates (when set) that the state of the processor might have been corrupted by the error condition detected and that reliable restarting of the processor may not be possible. When clear, this flag indicates that the error did not affect the processor's state.
ADDRV (MCi_ADDR register valid) flag, bit 58
    Indicates (when set) that the MCi_ADDR register contains the address where the error occurred (refer to Section 13.3.2.3., MCi_ADDR MSR). When clear, this flag indicates that the MCi_ADDR register does not contain the address where the error occurred. Do not read these registers if they are not implemented in the processor.
MISCV (MCi_MISC register valid) flag, bit 59
    Indicates (when set) that the MCi_MISC register contains additional information regarding the error. When clear, this flag indicates that the MCi_MISC register does not contain additional information regarding the error. Do not read these registers if they are not implemented in the processor.
EN (error enabled) flag, bit 60
    Indicates (when set) that the error was enabled by the associated EEj bit of the MCi_CTL register.
UC (error uncorrected) flag, bit 61
    Indicates (when set) that the processor did not or was not able to correct the error condition. When clear, this flag indicates that the processor was able to correct the error condition.
OVER (machine check overflow) flag, bit 62
    Indicates (when set) that a machine-check error occurred while the results of a previous error were still in the error-reporting register bank (that is, the VAL bit was already set in the MCi_STATUS register). The processor sets the OVER flag and software is responsible for clearing it. Enabled errors are written over disabled errors, and uncorrected errors are written over corrected errors. Uncorrected errors are not written over previous valid uncorrected errors.
VAL (MCi_STATUS register valid) flag, bit 63
    Indicates (when set) that the information within the MCi_STATUS register is valid. When this flag is set, the processor follows the rules given for the OVER flag in the MCi_STATUS register when overwriting previously valid entries. The processor sets the VAL flag and software is responsible for clearing it.
13.3.2.3. MCi_ADDR MSR
The MCi_ADDR MSR contains the address of the code or data memory location that produced the machine-check error if the ADDRV flag in the MCi_STATUS register is set (refer to Section 13.3.2.2., MCi_STATUS MSR). The address returned is either a 32-bit offset into a segment, a 32-bit linear address, or a 36-bit physical address, depending upon the type of error encountered. Bits 36 through 63 of this register are reserved for future address expansion and are always read as zeros.
Figure 13-6. Machine-Check Bank Address Register. Bits 63-36: reserved; bits 35-0: address.
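The MCi_STATUS layout of Section 13.3.2.2. and the ADDRV rule for MCi_ADDR can be combined in a short C sketch. It assumes both register values were already read with RDMSR; the structure and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of an MCi_STATUS value. */
struct mci_status {
    uint16_t mca_code;    /* bits 15-0 */
    uint16_t model_code;  /* bits 31-16 */
    uint32_t other_info;  /* bits 56-32, implementation specific */
    bool pcc;             /* bit 57 */
    bool addrv;           /* bit 58 */
    bool miscv;           /* bit 59 */
    bool en;              /* bit 60 */
    bool uc;              /* bit 61 */
    bool over;            /* bit 62 */
    bool val;             /* bit 63 */
};

static struct mci_status mci_decode_status(uint64_t raw)
{
    struct mci_status s;
    s.mca_code   = (uint16_t)(raw & 0xFFFFu);
    s.model_code = (uint16_t)((raw >> 16) & 0xFFFFu);
    s.other_info = (uint32_t)((raw >> 32) & 0x1FFFFFFu);  /* 25 bits */
    s.pcc   = ((raw >> 57) & 1u) != 0;
    s.addrv = ((raw >> 58) & 1u) != 0;
    s.miscv = ((raw >> 59) & 1u) != 0;
    s.en    = ((raw >> 60) & 1u) != 0;
    s.uc    = ((raw >> 61) & 1u) != 0;
    s.over  = ((raw >> 62) & 1u) != 0;
    s.val   = ((raw >> 63) & 1u) != 0;
    return s;
}

/* Stores bits 35-0 of MCi_ADDR and returns true only when VAL and ADDRV are set. */
static bool mci_error_address(uint64_t status_raw, uint64_t addr_raw,
                              uint64_t *address)
{
    struct mci_status s = mci_decode_status(status_raw);
    if (!s.val || !s.addrv)
        return false;
    *address = addr_raw & 0xFFFFFFFFFull;  /* bits 63-36 are reserved */
    return true;
}
```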
13.3.2.4. MCi_MISC MSR
The MCi_MISC MSR contains additional information describing the machine-check error if the MISCV flag in the MCi_STATUS register is set. This register is not implemented in any of the error-reporting register banks for the P6 family processors.
13.3.3. Mapping of the Pentium Processor Machine-Check Errors to the P6 Family Machine-Check Architecture
The Pentium processor reports machine-check errors using two registers: P5_MC_TYPE and P5_MC_ADDR. The P6 family processors map these registers into the MCi_STATUS and MCi_ADDR registers of the error-reporting register bank that reports on the type of external bus errors reported in the P5_MC_TYPE and P5_MC_ADDR registers. The information in these registers can then be accessed in either of two ways:
- By reading the MCi_STATUS and MCi_ADDR registers as part of a generalized machine-check exception handler written for a P6 family processor.
- By reading the P5_MC_TYPE and P5_MC_ADDR registers with the RDMSR instruction.
The second access capability permits a machine-check exception handler written to run on a Pentium processor to be run on a P6 family processor. There is a limitation in that information returned by the P6 family processor will be encoded differently than it is for the Pentium processor. To run the Pentium processor machine-check exception handler on a P6 family processor, it must be rewritten to interpret the P5_MC_TYPE register encodings correctly.
13.4. MACHINE-CHECK AVAILABILITY

The machine-check architecture and machine-check exception (#MC) are model-specific features. Software can execute the CPUID instruction to determine whether a processor implements these features. Following the execution of the CPUID instruction, the settings of the MCA flag (bit 14) and MCE flag (bit 7) in the EDX register indicate whether the processor implements the machine-check architecture and machine-check exception, respectively.
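A minimal C sketch of the availability check follows. It takes the EDX value returned by CPUID (the feature flags returned for EAX = 1) as a parameter, so that the bit tests can be shown without executing the instruction; the function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_EDX_MCE (1u << 7)   /* machine-check exception supported */
#define CPUID_EDX_MCA (1u << 14)  /* machine-check architecture supported */

static bool cpu_has_mce(uint32_t cpuid_edx)
{
    return (cpuid_edx & CPUID_EDX_MCE) != 0;
}

static bool cpu_has_mca(uint32_t cpuid_edx)
{
    return (cpuid_edx & CPUID_EDX_MCA) != 0;
}
```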
13.5. MACHINE-CHECK INITIALIZATION

To use the processor's machine-check architecture, software must initialize the processor to activate the machine-check exception and the error-reporting mechanism. Example 13-1 gives pseudocode for performing this initialization. This pseudocode checks for the existence of the machine-check architecture and exception on the processor, then enables the machine-check exception and the error-reporting register banks. The pseudocode assumes that the machine-check exception (#MC) handler has been installed on the system. This initialization procedure is compatible with the Pentium and P6 family processors.
Example 13-1. Machine-Check Initialization Pseudocode
EXECUTE the CPUID instruction;
READ bits 7 (MCE) and 14 (MCA) of the EDX register;
IF CPU supports MCE
THEN
    IF CPU supports MCA
    THEN
        IF MCG_CAP.MCG_CTL_P = 1 (* MCG_CTL register is present *)
        THEN
            Set MCG_CTL register to all 1s; (* enables all MCA features *)
        FI;
        COUNT ← MCG_CAP.Count; (* determine number of error-reporting banks supported *)
        FOR error-reporting banks (1 through COUNT - 1)
        DO
            Set MCi_CTL register to all 1s; (* enables logging of all errors except for the MC0_CTL register *)
        OD
        FOR error-reporting banks (0 through COUNT - 1)
        DO
            Set MCi_STATUS register to all 0s; (* clears all errors *)
        OD
    FI;
    Set the MCE flag (bit 6) in CR4 register to enable machine-check exceptions;
FI;
The processor can write valid information (such as an ECC error) into the MCi_STATUS registers while it is being powered up. As part of the initialization of the MCE exception handler, software might examine all the MCi_STATUS registers and log the contents of them, then rewrite them all to zeros. This procedure is not included in the initialization pseudocode in Example 13-1.
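The flow of Example 13-1 can be exercised in user mode with a simulated MSR space, as in the C sketch below. Several points are assumptions of the sketch rather than statements of the manual: the array `sim_msr` and its accessors stand in for RDMSR and WRMSR; the global register addresses 179H (MCG_CAP) and 17BH (MCG_CTL) should be confirmed against Appendix B; each bank is assumed to occupy four consecutive MSRs starting at 400H, whereas Table B-1 is the authority and a real processor may deviate; and the banks are taken to be numbered 0 through Count - 1.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_MCG_CAP 0x179u
#define MSR_MCG_CTL 0x17Bu
#define MSR_MC0_CTL 0x400u

/* Simulated MSR space so the sketch runs without privileged instructions. */
#define SIM_MSR_COUNT 0x800u
static uint64_t sim_msr[SIM_MSR_COUNT];

static uint64_t rdmsr_sim(uint32_t msr) { return sim_msr[msr]; }
static void wrmsr_sim(uint32_t msr, uint64_t value) { sim_msr[msr] = value; }

/* ASSUMPTION: four consecutive MSRs per bank (CTL, STATUS, ADDR, MISC). */
static uint32_t mc_ctl_msr(unsigned bank) { return MSR_MC0_CTL + 4u * bank; }
static uint32_t mc_status_msr(unsigned bank) { return mc_ctl_msr(bank) + 1u; }

/*
 * Follows Example 13-1. Returns true when the caller should go on to set
 * the MCE flag (bit 6) in CR4, which this user-mode sketch cannot do.
 */
static bool machine_check_init(uint32_t cpuid_edx)
{
    bool has_mce = (cpuid_edx & (1u << 7)) != 0;
    bool has_mca = (cpuid_edx & (1u << 14)) != 0;
    unsigned bank, count;
    uint64_t cap;

    if (!has_mce)
        return false;

    if (has_mca) {
        cap = rdmsr_sim(MSR_MCG_CAP);
        if (cap & (1u << 8))                        /* MCG_CTL_P */
            wrmsr_sim(MSR_MCG_CTL, ~0ull);          /* enable all MCA features */

        count = (unsigned)(cap & 0xFFu);
        for (bank = 1; bank < count; bank++)        /* MC0_CTL is left to the BIOS */
            wrmsr_sim(mc_ctl_msr(bank), ~0ull);
        for (bank = 0; bank < count; bank++)        /* clear all logged errors */
            wrmsr_sim(mc_status_msr(bank), 0);
    }
    return true;
}
```

A fuller version would log any MCi_STATUS register whose VAL flag is set before clearing it, as suggested in the paragraph above.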
13.6. INTERPRETING THE MCA ERROR CODES

When the processor detects a machine-check error condition, it writes a 16-bit error code in the MCA Error Code field of one of the MCi_STATUS registers and sets the VAL (valid) flag in that register. The processor may also write a 16-bit Model-specific Error Code in the MCi_STATUS register depending on the implementation of the machine-check architecture of the processor. The MCA error codes are architecturally defined for Intel Architecture processors; however, the specific MCi_STATUS register that a code is written into is model specific. To determine the cause of a machine-check exception, the machine-check exception handler must read the VAL flag for each MCi_STATUS register, and, if the flag is set, then read the MCA error code field of the register. It is the encoding of the MCACOD value that determines the type of error being reported and not the register bank reporting it. There are two types of MCA error codes: simple error codes and compound error codes.
13.6.1. Simple Error Codes

Table 13-1 shows the simple error codes. These unique codes indicate global error information.
Table 13-1. Simple Error Codes
Error Code                    Binary Encoding        Meaning
No Error                      0000 0000 0000 0000    No error has been reported to this bank of error-reporting registers.
Unclassified                  0000 0000 0000 0001    This error has not been classified into the MCA error classes.
Microcode ROM Parity Error    0000 0000 0000 0010    Parity error in internal microcode ROM.
External Error                0000 0000 0000 0011    The BINIT# from another processor caused this processor to enter machine check.
FRC Error                     0000 0000 0000 0100    FRC (functional redundancy check) master/slave error.
Internal Unclassified         0000 01xx xxxx xxxx    Internal unclassified errors.
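The encodings of Table 13-1 can be recognized with a switch and one mask test, as in this C sketch. The enum and function names are hypothetical; any value that matches none of the simple encodings is reported as not simple and would be examined as a compound error code.

```c
#include <stdint.h>

/* Names are this sketch's own; the values follow Table 13-1. */
enum mca_simple {
    MCA_NO_ERROR,
    MCA_UNCLASSIFIED,
    MCA_MICROCODE_ROM_PARITY,
    MCA_EXTERNAL,
    MCA_FRC,
    MCA_INTERNAL_UNCLASSIFIED,
    MCA_NOT_SIMPLE
};

static enum mca_simple mca_classify_simple(uint16_t code)
{
    switch (code) {
    case 0x0000: return MCA_NO_ERROR;
    case 0x0001: return MCA_UNCLASSIFIED;
    case 0x0002: return MCA_MICROCODE_ROM_PARITY;
    case 0x0003: return MCA_EXTERNAL;
    case 0x0004: return MCA_FRC;
    default:     break;
    }
    if ((code & 0xFC00u) == 0x0400u)  /* 0000 01xx xxxx xxxx */
        return MCA_INTERNAL_UNCLASSIFIED;
    return MCA_NOT_SIMPLE;
}
```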
13.6.2. Compound Error Codes

The compound error codes describe errors related to the TLBs, memory, caches, bus and interconnect logic. A set of sub-fields is common to all of the compound error encodings. These subfields describe the type of access, level in the memory hierarchy, and type of request. Table 13-2 shows the general form of the compound error codes. The interpretation column indicates the name of a compound error. The name is constructed by substituting mnemonics from Tables 13-2 through 13-6 for the sub-field names given within curly braces. For example, the error code ICACHEL1_RD_ERR is constructed from the form:
{TT}CACHE{LL}_{RRRR}_ERR
where {TT} is replaced by I, {LL} is replaced by L1, and {RRRR} is replaced by RD. The 2-bit TT sub-field (refer to Table 13-3) indicates the type of transaction (data, instruction, or generic). It applies to the TLB, cache, and interconnect error conditions. The generic type is reported when the processor cannot determine the transaction type.
Table 13-2. General Forms of Compound Error Codes
Type                           Form                   Interpretation
TLB Errors                     0000 0000 0001 TTLL    {TT}TLB{LL}_ERR
Memory Hierarchy Errors        0000 0001 RRRR TTLL    {TT}CACHE{LL}_{RRRR}_ERR
Bus and Interconnect Errors    0000 1PPT RRRR IILL    BUS{LL}_{PP}_{RRRR}_{II}_{T}_ERR
Table 13-3. Encoding for TT (Transaction Type) Sub-Field

Transaction Type    Mnemonic    Binary Encoding
Instruction         I           00
Data                D           01
Generic             G           10
The 2-bit LL sub-field (refer to Table 13-4) indicates the level in the memory hierarchy where the error occurred (level 0, level 1, level 2, or generic). The LL sub-field also applies to the TLB, cache, and interconnect error conditions. The P6 family processors support two levels in the cache hierarchy and one level in the TLBs. Again, the generic type is reported when the processor cannot determine the hierarchy level.
Table 13-4. Level Encoding for LL (Memory Hierarchy Level) Sub-Field
Hierarchy Level    Mnemonic    Binary Encoding
Level 0            L0          00
Level 1            L1          01
Level 2            L2          10
Generic            LG          11
The 4-bit RRRR sub-field (refer to Table 13-5) indicates the type of action associated with the error. Actions include read and write operations, prefetches, cache evictions, and snoops. Generic error is returned when the type of error cannot be determined. Generic read and generic write are returned when the processor cannot determine the type of instruction or data request that caused the error. Eviction and Snoop requests apply only to the caches. All of the other requests apply to TLBs, caches and interconnects.
Table 13-5. Encoding of Request (RRRR) Sub-Field
Request Type         Mnemonic    Binary Encoding
Generic Error        ERR         0000
Generic Read         RD          0001
Generic Write        WR          0010
Data Read            DRD         0011
Data Write           DWR         0100
Instruction Fetch    IRD         0101
Prefetch             PREFETCH    0110
Eviction             EVICT       0111
Snoop                SNOOP       1000
The bus and interconnect errors are defined with the 2-bit PP (participation), 1-bit T (time-out), and 2-bit II (memory or I/O) sub-fields, in addition to the LL and RRRR sub-fields (refer to Table 13-6). The bus error conditions are implementation dependent and related to the type of bus implemented by the processor. Likewise, the interconnect error conditions are predicated on a specific implementation-dependent interconnect model that describes the connections between the different levels of the storage hierarchy. The type of bus is implementation dependent, and as such is not specified in this document. A bus or interconnect transaction consists of a request involving an address and a response.
Table 13-6. Encodings of PP, T, and II Sub-Fields
Sub-Field             Transaction                                      Mnemonic     Binary Encoding
PP (Participation)    Local processor originated request               SRC          00
                      Local processor responded to request             RES          01
                      Local processor observed error as third party    OBS          10
                      Generic                                                       11
T (Time-out)          Request timed out                                TIMEOUT      1
                      Request did not time out                         NOTIMEOUT    0
II (Memory or I/O)    Memory Access                                    M            00
                      Reserved                                                      01
                      I/O                                              IO           10
                      Other transaction                                             11
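The three compound forms of Table 13-2 can be told apart by their fixed high-order bits, after which the sub-fields are extracted by shifting and masking. The C sketch below is one possible decoder; the type and function names are hypothetical, and "?" is a placeholder of this sketch for encodings that the tables do not list.

```c
#include <stdint.h>
#include <stdio.h>

enum mca_kind { MCA_KIND_OTHER, MCA_KIND_TLB, MCA_KIND_CACHE, MCA_KIND_BUS };

/* Hypothetical decoded view; fields unused by a form are left at 0. */
struct mca_compound {
    enum mca_kind kind;
    unsigned ll;    /* bits 1-0 */
    unsigned tt;    /* bits 3-2, TLB and cache forms */
    unsigned ii;    /* bits 3-2, bus form */
    unsigned rrrr;  /* bits 7-4, cache and bus forms */
    unsigned t;     /* bit 8, bus form */
    unsigned pp;    /* bits 10-9, bus form */
};

static struct mca_compound mca_decode_compound(uint16_t code)
{
    struct mca_compound c = { MCA_KIND_OTHER, 0, 0, 0, 0, 0, 0 };

    c.ll = code & 0x3u;
    if ((code & 0xFFF0u) == 0x0010u) {         /* 0000 0000 0001 TTLL */
        c.kind = MCA_KIND_TLB;
        c.tt = (code >> 2) & 0x3u;
    } else if ((code & 0xFF00u) == 0x0100u) {  /* 0000 0001 RRRR TTLL */
        c.kind = MCA_KIND_CACHE;
        c.tt = (code >> 2) & 0x3u;
        c.rrrr = (code >> 4) & 0xFu;
    } else if ((code & 0xF800u) == 0x0800u) {  /* 0000 1PPT RRRR IILL */
        c.kind = MCA_KIND_BUS;
        c.ii = (code >> 2) & 0x3u;
        c.rrrr = (code >> 4) & 0xFu;
        c.t = (code >> 8) & 0x1u;
        c.pp = (code >> 9) & 0x3u;
    }
    return c;
}

static const char *const TT_NAME[4] = { "I", "D", "G", "?" };
static const char *const LL_NAME[4] = { "L0", "L1", "L2", "LG" };
static const char *const RRRR_NAME[16] = {
    "ERR", "RD", "WR", "DRD", "DWR", "IRD", "PREFETCH", "EVICT", "SNOOP"
};  /* remaining entries are NULL: encodings not listed in Table 13-5 */

/* Builds {TT}CACHE{LL}_{RRRR}_ERR; returns the name length, or -1 if not a cache code. */
static int mca_cache_error_name(uint16_t code, char *buf, size_t len)
{
    struct mca_compound c = mca_decode_compound(code);
    const char *request;

    if (c.kind != MCA_KIND_CACHE)
        return -1;
    request = RRRR_NAME[c.rrrr] ? RRRR_NAME[c.rrrr] : "?";
    return snprintf(buf, len, "%sCACHE%s_%s_ERR",
                    TT_NAME[c.tt], LL_NAME[c.ll], request);
}
```

For the encoding 0000 0001 0001 0001 (0111H), `mca_cache_error_name` yields ICACHEL1_RD_ERR, the example name used in Section 13.6.2.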
13.6.3. Interpreting the Machine-Check Error Codes for External Bus Errors
Table 13-7 gives additional information for interpreting the MCA error code, model-specific error code, and other information error code fields for machine-check errors that occur on the external bus. This information can be used to design a machine-check exception handler for the processor that offers greater granularity for the external bus errors.
Table 13-7. Encoding of the MCi_STATUS Register for External Bus Errors
Bit No.  Bit Function                              Bit Description
0-1      MCA Error Code                            Undefined.
2-3      MCA Error Code                            Bit 2 is set to 1 if the access was a special cycle. Bit 3 is set to 1 if the access was a special cycle OR an I/O cycle.
4-7      MCA Error Code                            00WR; W = 1 for writes, R = 1 for reads.
8-9      MCA Error Code                            Undefined.
10       MCA Error Code                            Set to 0 for all EBL errors. Set to 1 for internal watch-dog timer time-out. For a watch-dog timer time-out, all the MCACOD bits except this bit are set to 0. A watch-dog timer time-out only occurs if the BINIT driver is enabled.
11       MCA Error Code                            Set to 1 for EBL errors. Set to 0 for internal watch-dog timer time-out.
12-15    MCA Error Code                            Reserved.
16-18    Model-Specific Error Code                 Reserved.
19-24    Model-Specific Error Code                 000000 for BQ_DCU_READ_TYPE error.
                                                   000010 for BQ_IFU_DEMAND_TYPE error.
                                                   000011 for BQ_IFU_DEMAND_NC_TYPE error.
                                                   000100 for BQ_DCU_RFO_TYPE error.
                                                   000101 for BQ_DCU_RFO_LOCK_TYPE error.
                                                   000110 for BQ_DCU_ITOM_TYPE error.
                                                   001000 for BQ_DCU_WB_TYPE error.
                                                   001010 for BQ_DCU_WCEVICT_TYPE error.
                                                   001011 for BQ_DCU_WCLINE_TYPE error.
                                                   001100 for BQ_DCU_BTM_TYPE error.
                                                   001101 for BQ_DCU_INTACK_TYPE error.
                                                   001110 for BQ_DCU_INVALL2_TYPE error.
                                                   001111 for BQ_DCU_FLUSHL2_TYPE error.
                                                   010000 for BQ_DCU_PART_RD_TYPE error.
                                                   010010 for BQ_DCU_PART_WR_TYPE error.
                                                   010100 for BQ_DCU_SPEC_CYC_TYPE error.
                                                   011000 for BQ_DCU_IO_RD_TYPE error.
                                                   011001 for BQ_DCU_IO_WR_TYPE error.
                                                   011100 for BQ_DCU_LOCK_RD_TYPE error.
                                                   011110 for BQ_DCU_SPLOCK_RD_TYPE error.
                                                   011101 for BQ_DCU_LOCK_WR_TYPE error.
27-25    Model-Specific Error Code                 000 for BQ_ERR_HARD_TYPE error.
                                                   001 for BQ_ERR_DOUBLE_TYPE error.
                                                   010 for BQ_ERR_AERR2_TYPE error.
                                                   100 for BQ_ERR_SINGLE_TYPE error.
                                                   101 for BQ_ERR_AERR1_TYPE error.
28       Model-Specific Error Code                 1 if FRC error is active.
29       Model-Specific Error Code                 1 if BERR is driven.
30       Model-Specific Error Code                 1 if BINIT is driven for this processor.
31       Model-Specific Error Code                 Reserved.
32-34    Other Information                         Reserved.
35       Other Information BINIT                   1 if BINIT is received from external bus.
36       Other Information RESPONSE PARITY ERROR   This bit is asserted in the MCi_STATUS register if this component has received a parity error on the RS[2:0]# pins for a response transaction. The RS signals are checked by the RSP# external pin.
37       Other Information BUS BINIT               This bit is asserted in the MCi_STATUS register if this component has received a hard error response on a split transaction (one access that has needed to be split across the 64-bit external bus interface into two accesses).
38       Other Information TIMEOUT BINIT           This bit is asserted in the MCi_STATUS register if this component has experienced a ROB time-out, which indicates that no microinstruction has been retired for a predetermined period of time. A ROB time-out occurs when the 15-bit ROB time-out counter carries a 1 out of its high order bit. The timer is cleared when a microinstruction retires, an exception is detected by the core processor, RESET is asserted, or when a ROB BINIT occurs. The ROB time-out counter is prescaled by the 8-bit PIC timer which is a divide by 128 of the bus clock (the bus clock is 1:2, 1:3, or 1:4 of the core clock). When a carry out of the 8-bit PIC timer occurs, the ROB counter counts up by one. While this bit is asserted, it cannot be overwritten by another error.
39-41    Other Information                         Reserved.
42       Other Information HARD ERROR              This bit is asserted in the MCi_STATUS register if this component has initiated a bus transaction which has received a hard error response. While this bit is asserted, it cannot be overwritten.
43       Other Information IERR                    This bit is asserted in the MCi_STATUS register if this component has experienced a failure that causes the IERR pin to be asserted. While this bit is asserted, it cannot be overwritten.
44       Other Information AERR                    This bit is asserted in the MCi_STATUS register if this component has initiated 2 failing bus transactions which have failed due to Address Parity Errors (AERR asserted). While this bit is asserted, it cannot be overwritten.
45       Other Information UECC                    Uncorrectable ECC error bit is asserted in the MCi_STATUS register for uncorrected ECC errors. While this bit is asserted, the ECC syndrome field will not be overwritten.
46       Other Information CECC                    The correctable ECC error bit is asserted in the MCi_STATUS register for corrected ECC errors.
47-54    Other Information SYNDROME                The ECC syndrome field in the MCi_STATUS register contains the 8-bit ECC syndrome only if the error was a correctable/uncorrectable ECC error, and there wasn't a previous valid ECC error syndrome logged in the MCi_STATUS register. A previous valid ECC error in MCi_STATUS is indicated by MCi_STATUS.bit45 (uncorrectable error occurred) being asserted. After processing an ECC error, machine-check handling software should clear MCi_STATUS.bit45 so that future ECC error syndromes can be logged.
55-56    Other Information                         Reserved.
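A handler that wants the finer granularity described in Table 13-7 could extract the model-specific fields by bit position. The following C fragment is a sketch only: the structure, field, and function names are hypothetical, and it covers just a few of the fields listed in the table.

```c
#include <stdint.h>

/* Illustrative view of selected Table 13-7 fields of a 64-bit
 * MCi_STATUS value logged for an external bus error. */
struct ebl_status {
    unsigned request_type;   /* bits 19-24: BQ_*_TYPE request encoding */
    unsigned error_type;     /* bits 27-25: BQ_ERR_*_TYPE encoding */
    unsigned binit_received; /* bit 35 */
    unsigned uecc;           /* bit 45: uncorrectable ECC error */
    unsigned cecc;           /* bit 46: correctable ECC error */
    unsigned syndrome;       /* bits 47-54: ECC syndrome */
};

/* Extracts `width` bits of v starting at bit position `lo`. */
static unsigned field(uint64_t v, unsigned lo, unsigned width)
{
    return (unsigned)((v >> lo) & ((UINT64_C(1) << width) - 1));
}

struct ebl_status decode_ebl_status(uint64_t status)
{
    struct ebl_status s;
    s.request_type   = field(status, 19, 6);
    s.error_type     = field(status, 25, 3);
    s.binit_received = field(status, 35, 1);
    s.uecc           = field(status, 45, 1);
    s.cecc           = field(status, 46, 1);
    s.syndrome       = field(status, 47, 8);
    return s;
}
```

For example, a request_type of 011001B (19H) would correspond to the BQ_DCU_IO_WR_TYPE entry in the table.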
13.7. GUIDELINES FOR WRITING MACHINE-CHECK SOFTWARE

The machine-check architecture and error logging can be used in two different ways:
- To detect machine errors during normal instruction execution, using the machine-check exception (#MC).
- To periodically check and log machine errors.
To use the machine-check exception, the operating system or executive software must provide a machine-check exception handler. This handler can be designed specifically for P6 family processors or be a portable handler that also handles Pentium processor machine-check errors. A special program or utility is required to log machine errors. Guidelines for writing a machine-check exception handler or a machine-error logging utility are given in the following sections.
13.7.1. Machine-Check Exception Handler

The machine-check exception (#MC) corresponds to vector 18. To service machine-check exceptions, a trap gate must be added to the IDT, and the pointer in the trap gate must point to a machine-check exception handler. Two approaches can be taken to designing the exception handler:
- The handler can merely log all the machine status and error information, then call a debugger or shut down the system.
- The handler can analyze the reported error information and, in some cases, attempt to correct the error and restart the processor.
Virtually all the machine-check conditions detected with the P6 family processors cannot be recovered from (they result in abort-type exceptions). The logging of status and error information is therefore a baseline implementation. Refer to Section 13.7., Guidelines for Writing Machine-Check Software for more information on logging errors. For future P6 family processor implementations, where recovery may be possible, the following things should be considered when writing a machine-check exception handler:
- To determine the nature of the error, the handler must read each of the error-reporting register banks. The count field in the MCG_CAP register gives the number of register banks. The first register of register bank 0 is at address 400H.
- The VAL (valid) flag in each MCi_STATUS register indicates whether the error information in the register is valid. If this flag is clear, the registers in that bank do not contain valid error information and do not need to be checked.
- To write a portable exception handler, only the MCA error code field in the MCi_STATUS register should be checked. Refer to Section 13.6., Interpreting the MCA Error Codes for information that can be used to write an algorithm to interpret this field.
- The PCC and OVER flags in each MCi_STATUS register indicate whether recovery from the error is possible. If either of these flags is set, recovery is not possible. The OVER flag indicates that two or more machine-check errors occurred. When recovery is not possible, the handler typically records the error information and signals an abort to the operating system.
- Corrected errors will have been corrected automatically by the processor. The UC flag in each MCi_STATUS register indicates whether the processor automatically corrected the error.
- The RIPV flag in the MCG_STATUS register indicates whether the program can be restarted at the instruction pointed to by the instruction pointer pushed on the stack when the exception was generated. If this flag is clear, the processor may still be able to be restarted (for debugging purposes), but not without loss of program continuity.
- For unrecoverable errors, the EIPV flag in the MCG_STATUS register indicates whether the instruction pointed to by the instruction pointer pushed on the stack when the exception was generated is related to the error. If this flag is clear, the pushed instruction may not be related to the error.
- The MCIP flag in the MCG_STATUS register indicates whether a machine-check exception was generated. Before returning from the machine-check exception handler, software should clear this flag so that it can be used reliably by an error logging utility. The MCIP flag also detects recursion. The machine-check architecture does not support recursion. When the processor detects machine-check recursion, it enters the shutdown state.
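The flag checks in the list above can be condensed into a small decision function. The C sketch below is one possible reading, not a definitive handler. It assumes the flag positions given in the register descriptions earlier in this chapter (MCG_STATUS: RIPV in bit 0; MCi_STATUS: VAL in bit 63, OVER in bit 62, UC in bit 61, PCC in bit 57). The verdict names and the order of the tests are illustrative choices.

```c
#include <stdint.h>

/* Assumed flag positions (from the register descriptions, outside this list). */
#define MCG_STATUS_RIPV (UINT64_C(1) << 0)
#define MCI_STATUS_VAL  (UINT64_C(1) << 63)
#define MCI_STATUS_OVER (UINT64_C(1) << 62)
#define MCI_STATUS_UC   (UINT64_C(1) << 61)
#define MCI_STATUS_PCC  (UINT64_C(1) << 57)

/* Hypothetical verdicts for one error-reporting bank. */
enum mc_verdict { MC_BANK_EMPTY, MC_CORRECTED, MC_RESTARTABLE, MC_FATAL };

enum mc_verdict classify_bank(uint64_t mcg_status, uint64_t mci_status)
{
    if (!(mci_status & MCI_STATUS_VAL))
        return MC_BANK_EMPTY;   /* VAL clear: nothing valid in this bank */
    if (mci_status & (MCI_STATUS_PCC | MCI_STATUS_OVER))
        return MC_FATAL;        /* context corrupt or an error was lost */
    if (!(mci_status & MCI_STATUS_UC))
        return MC_CORRECTED;    /* UC clear: the processor corrected it */
    if (mcg_status & MCG_STATUS_RIPV)
        return MC_RESTARTABLE;  /* pushed instruction pointer can be resumed */
    return MC_FATAL;
}
```

A handler would apply this to every bank and treat the whole event as unrecoverable if any bank yields MC_FATAL.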
Example 13-2 gives typical steps carried out by a machine-check exception handler:
Example 13-2. Machine-Check Exception Handler Pseudocode
IF CPU supports MCE
    THEN
        IF CPU supports MCA
            THEN
                call errorlogging routine; (* returns restartability *)
        FI;
    ELSE (* Pentium(R) processor compatible *)
        READ P5_MC_ADDR;
        READ P5_MC_TYPE;
        report RESTARTABILITY to console;
FI;
IF error is not restartable
    THEN
        report RESTARTABILITY to console;
        abort system;
FI;
CLEAR MCIP flag in MCG_STATUS;
13.7.2. Pentium Processor Machine-Check Exception Handling

To make the machine-check exception handler portable to the Pentium and P6 family processors, checks can be made (using the CPUID instruction) to determine the processor type. Then based on the processor type, machine-check exceptions can be handled specifically for Pentium or P6 family processors. When machine-check exceptions are enabled for the Pentium processor (MCE flag is set in control register CR4), the machine-check exception handler uses the RDMSR instruction to read the error type from the P5_MC_TYPE register and the machine check address from the P5_MC_ADDR register. The handler then normally reports these register values to the system console before aborting execution (refer to Example 13-2).
13.7.3. Logging Correctable Machine-Check Errors

If a machine-check error is correctable, the processor does not generate a machine-check exception for it. To detect correctable machine-check errors, a utility program must be written that reads each of the machine-check error-reporting register banks and logs the results in an accounting file or data structure. This utility can be implemented in either of the following ways:
- A system daemon that polls the register banks on an infrequent basis, such as hourly or daily.
- A user-initiated application that polls the register banks and records the exceptions. Here, the actual polling service is provided by an operating-system driver or through the system call interface.
Example 13-3 gives pseudocode for an error logging utility.

Example 13-3. Machine-Check Error Logging Pseudocode
Assume that execution is restartable;
IF the processor supports MCA
    THEN
        FOR each bank of machine-check registers
            DO
                READ MCi_STATUS;
                IF VAL flag in MCi_STATUS = 1
                    THEN
                        IF ADDRV flag in MCi_STATUS = 1
                            THEN READ MCi_ADDR;
                        FI;
                        IF MISCV flag in MCi_STATUS = 1
                            THEN READ MCi_MISC;
                        FI;
                        IF MCIP flag in MCG_STATUS = 1
                            (* Machine-check exception is in progress *)
                            AND PCC flag in MCi_STATUS = 1
                            AND RIPV flag in MCG_STATUS = 0
                            (* execution is not restartable *)
                            THEN
                                RESTARTABILITY = FALSE;
                                return RESTARTABILITY to calling procedure;
                        FI;
                        Save time-stamp counter and processor ID;
                        Set MCi_STATUS to all 0s;
                        Execute serializing instruction (i.e., CPUID);
                FI;
            OD;
FI;
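The logging loop in Example 13-3 can be exercised in ordinary user-mode C if the privileged RDMSR and WRMSR instructions are replaced by a simulated register file. The sketch below does exactly that, so it illustrates the control flow only. The MSR addresses (MCG_CAP at 179H, MCG_STATUS at 17AH, bank i starting at 400H + 4*i) and flag positions are assumed from the register descriptions earlier in this chapter; rdmsr(), wrmsr(), fake_msr, and struct mc_record are hypothetical stand-ins. Saving the time-stamp counter and processor ID and executing a serializing instruction are omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define MSR_MCG_CAP    0x179u
#define MSR_MCG_STATUS 0x17Au
#define MSR_MC0_CTL    0x400u   /* assumed: bank i occupies 0x400+4*i .. +3 */

#define MCG_RIPV  (UINT64_C(1) << 0)
#define MCG_MCIP  (UINT64_C(1) << 2)
#define MCI_VAL   (UINT64_C(1) << 63)
#define MCI_MISCV (UINT64_C(1) << 59)
#define MCI_ADDRV (UINT64_C(1) << 58)
#define MCI_PCC   (UINT64_C(1) << 57)

/* Simulated MSR space standing in for the privileged instructions. */
static uint64_t fake_msr[0x800];
static uint64_t rdmsr(uint32_t addr) { return fake_msr[addr]; }
static void wrmsr(uint32_t addr, uint64_t value) { fake_msr[addr] = value; }

struct mc_record { unsigned bank; uint64_t status, addr, misc; };

/* Copies every valid bank into out[] (at most max entries), clears the
 * logged banks, and reports through *restartable whether execution can
 * continue. Returns the number of records written. */
size_t log_machine_checks(struct mc_record *out, size_t max, int *restartable)
{
    unsigned banks = (unsigned)(rdmsr(MSR_MCG_CAP) & 0xFFu);
    uint64_t mcg = rdmsr(MSR_MCG_STATUS);
    size_t n = 0;
    unsigned i;

    *restartable = 1;                       /* assume restartable */
    for (i = 0; i < banks && n < max; i++) {
        uint32_t base = MSR_MC0_CTL + 4u * i;
        uint64_t status = rdmsr(base + 1);  /* MCi_STATUS */
        struct mc_record rec;

        if (!(status & MCI_VAL))
            continue;                       /* nothing logged in this bank */
        rec.bank = i;
        rec.status = status;
        rec.addr = (status & MCI_ADDRV) ? rdmsr(base + 2) : 0;
        rec.misc = (status & MCI_MISCV) ? rdmsr(base + 3) : 0;
        out[n++] = rec;

        if ((mcg & MCG_MCIP) && (status & MCI_PCC) && !(mcg & MCG_RIPV)) {
            *restartable = 0;               /* exception in progress, not restartable */
            return n;
        }
        wrmsr(base + 1, 0);                 /* set MCi_STATUS to all 0s */
    }
    return n;
}
```

In a real system the two accessor functions would be supplied by an operating-system driver, since RDMSR and WRMSR execute only at privilege level 0.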
If the processor supports the machine-check architecture, the utility reads through the banks of error-reporting registers looking for valid register entries, and then saves the values of the MCi_STATUS, MCi_ADDR, MCi_MISC and MCG_STATUS registers for each bank that is valid. The routine minimizes processing time by recording the raw data into a system data structure or file, reducing the overhead associated with polling. User utilities analyze the collected data in an off-line environment.
When the MCIP flag is set in the MCG_STATUS register, a machine-check exception is in progress and the machine-check exception handler has called the exception logging routine. Once the logging process has been completed, the exception-handling routine must determine whether execution can be restarted, which is usually possible when damage has not occurred (the PCC flag is clear in the MCi_STATUS register) and when the processor can guarantee that execution is restartable (the RIPV flag is set in the MCG_STATUS register). If execution cannot be restarted, the system is not recoverable and the exception-handling routine should signal the console appropriately before returning the error status to the operating system kernel for subsequent shutdown.
The machine-check architecture allows buffering of exceptions from a given error-reporting bank, although the P6 family processors do not implement this feature. The error logging routine should provide compatibility with future processors by reading each hardware error-reporting bank's MCi_STATUS register and then writing 0s to clear the OVER and VAL flags in this register. The error logging utility should re-read the MCi_STATUS register for the bank, ensuring that the valid bit is clear. The processor will write the next error into the register bank and set the VAL flag.
Additional information that should be stored by the exception-logging routine includes the processor's time-stamp counter value, which provides a mechanism to indicate the frequency of exceptions. A multiprocessing operating system stores the identity of the processor node incurring the exception using a unique identifier, such as the processor's APIC ID (refer to Section 7.5.9., Interrupt Destination and APIC ID).
The basic algorithm given in Example 13-3 can be modified to provide more robust recovery techniques. For example, software has the flexibility to attempt recovery using information unavailable to the hardware. Specifically, the machine-check exception handler can, after logging, carefully analyze the error-reporting registers when the error-logging routine reports an error that does not allow execution to be restarted.
These recovery techniques can use external bus related model-specific information provided with the error report to localize the source of the error within the system and determine the appropriate recovery strategy.
CHAPTER 14 CODE OPTIMIZATION

This chapter describes the more important code optimization techniques for Intel Architecture processors with and without MMX technology, as well as with and without Streaming SIMD Extensions. The chapter begins with general code-optimization guidelines and continues with a brief overview of the more important blended techniques for optimizing integer, MMX technology, floating-point, and SIMD floating-point code. A comprehensive discussion of code optimization techniques can be found in the Intel Architecture Optimization Manual, Order Number 242816.
14.1. CODE OPTIMIZATION GUIDELINES

This section contains general guidelines for optimizing applications code, as well as specific guidelines for optimizing MMX, floating-point, and SIMD floating-point code. Developers creating applications that use MMX and/or floating-point instructions should apply the first set of guidelines in addition to the MMX and/or floating-point code optimization guidelines. Developers creating applications that use SIMD floating-point code should apply the first set of guidelines, as well as the MMX and/or floating-point code optimization guidelines, in addition to the SIMD floating-point code optimization guidelines.
14.1.1. General Code Optimization Guidelines

Use the following guidelines to optimize code to run efficiently across several families of Intel Architecture processors:
- Use a current generation compiler that produces optimized code to insure that efficient code is generated from the start of code development.
- Write code that can be optimized by the compiler. For example:
    - Minimize the use of global variables, pointers, and complex control flow statements.
    - Do not use the register modifier.
    - Use the const modifier.
    - Do not defeat the typing system.
    - Do not make indirect calls.
    - Use minimum sizes for integer and floating-point data types, to enable SIMD parallelism.
- Pay attention to the branch prediction algorithm for the target processor. This optimization is particularly important for P6 family processors. Code that optimizes branch predictability will spend fewer clocks fetching instructions.
- Take advantage of the SIMD capabilities of MMX technology and Streaming SIMD Extensions.
- Avoid partial register stalls.
- Align all data.
- Organize code to minimize instruction cache misses and optimize instruction prefetches.
- Schedule code to maximize pairing on Pentium processors.
- Avoid prefixed opcodes other than 0FH.
- When possible, load and store data to the same area of memory using the same data sizes and address alignments; that is, avoid small loads after large stores to the same area of memory, and avoid large loads after small stores to the same area of memory.
- Use software pipelining.
- Always pair CALL and RET (return) instructions.
- Avoid self-modifying code.
- Do not place data in the code segment.
- Calculate store addresses as soon as possible.
- Avoid instructions that contain 4 or more micro-ops or instructions that are more than 7 bytes long. If possible, use instructions that require 1 micro-op.
- Cleanse partial registers before calling callee-save procedures.
14.1.2. Guidelines for Optimizing MMX Code

Use the following guidelines to optimize MMX code:
- Do not intermix MMX instructions and floating-point instructions.
- Use the opcode reg, mem instruction format whenever possible. This format helps to free registers and reduce clocks without generating unnecessary loads.
- Put an EMMS instruction at the end of all MMX code sections that you know will transition to floating-point code.
- Optimize data cache bandwidth to MMX registers.
14.1.3. Guidelines for Optimizing Floating-Point Code
Use the following guidelines to optimize floating-point code:
- Understand how the compiler handles floating-point code. Look at the assembly dump and see what transforms are already performed on the program.
- Study the loop nests in the application that dominate the execution time.
- Determine why the compiler is not creating the fastest code. For example, look for dependences that can be resolved by rearranging code.
- Look for and correct situations known to cause slow execution of floating-point code, such as:
    - Large memory bandwidth requirements.
    - Poor cache locality.
    - Long-latency floating-point arithmetic operations.
- Do not use more precision than is necessary. Single precision (32 bits) is faster on some operations and consumes only half the memory space of double precision (64 bits) or double extended (80 bits).
- Use a library that provides fast floating-point to integer routines. Many library routines do more work than is necessary.
- Insure whenever possible that computations stay in range. Out of range numbers cause very high overhead.
- Schedule code in assembly language using the FXCH instruction. When possible, unroll loops and pipeline code.
- Perform transformations to improve memory access patterns. Use loop fusion or compression to keep as much of the computation in the cache as possible.
- Break dependency chains.
14.1.4. Guidelines for Optimizing SIMD Floating-point Code

Generally, it is important to understand and balance port utilization to create efficient SIMD floating-point code. Use the following guidelines to optimize SIMD floating-point code:
- Balance the limitations of the architecture.
- Schedule instructions to resolve dependencies.
- Schedule utilization of the triple/quadruple rule (port 0, port 1, port 2, 3, and 4).
- Group instructions that utilize the same registers as closely as possible. Take into consideration the resolution of true dependencies.
- Intermix SIMD-fp operations that utilize port 0 and port 1.
- Do not issue consecutive instructions that utilize the same port.
- Use the reciprocal instructions followed by iteration for increased accuracy. These instructions yield reduced accuracy but execute much faster. If reduced accuracy is acceptable, use them with no iteration. If near full accuracy is needed, use a Newton-Raphson iteration. If full accuracy is needed, then use divide and square root which provide more accuracy, but slow down performance.
- Exceptions: mask exceptions to achieve higher performance. Unmasked exceptions may cause a reduction in the retirement rate.
- Utilize the flush-to-zero mode for higher performance to avoid the penalty of dealing with denormals and underflows.
- Incorporate the prefetch instruction whenever possible (for details, refer to Chapter 6, Optimizing Cache Utilization for Pentium III processors).
- Try to emulate conditional moves by masked compares and logicals instead of using conditional jumps.
- Utilize MMX technology instructions if the computations can be done in SIMD-integer or for shuffling data or copying data that is not used later in SIMD floating-point computations.
- If the algorithm requires extended precision, then conversion to SIMD floating-point code is not advised because the SIMD floating-point instructions are single-precision.
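The Newton-Raphson refinement mentioned in the guidelines can be illustrated in scalar C. For the reciprocal of a, one step is x1 = x0 * (2 - a * x0), which roughly doubles the number of correct bits in the estimate x0. The sketch below does not use the SIMD reciprocal instructions themselves: approx_recip() is a hypothetical stand-in that imitates a reduced-accuracy estimate by discarding the low 12 mantissa bits of an exact single-precision quotient.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for a fast, reduced-accuracy reciprocal estimate. It is NOT the
 * hardware instruction: it computes 1/a exactly and then zeroes the low
 * 12 mantissa bits to imitate a low-precision result. */
float approx_recip(float a)
{
    float r = 1.0f / a;
    uint32_t bits;

    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFF000u;   /* keep sign, exponent, and top 11 mantissa bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* One Newton-Raphson step for the reciprocal of a: x1 = x0 * (2 - a * x0). */
float refine_recip(float a, float x0)
{
    return x0 * (2.0f - a * x0);
}
```

Calling refine_recip(a, approx_recip(a)) costs two multiplies and one subtract on top of the estimate, which is the trade-off the guideline describes between the unrefined estimate and a full-accuracy divide.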
14.2. BRANCH PREDICTION OPTIMIZATION

The P6 family and Pentium processors provide dynamic branch prediction using the branch target buffers (BTBs) on the processors. Understanding the flow of branches and improving the predictability of branches can increase code execution speed significantly.
14.2.1. Branch Prediction Rules

Three elements of dynamic branch prediction are important to understand:
- If the instruction address is not in the BTB, execution is predicted to continue without branching (fall through).
- Predicted taken branches have a 1 clock delay.
- The BTB stores a four-bit history of branch predictions on Pentium Pro processors, the Pentium II processor family, and the Pentium III processor. The Pentium II and Pentium III processors' BTB pattern matches on the direction of the last four branches to dynamically predict whether a branch will be taken.
During the process of instruction prefetch, the instruction address of a conditional instruction is checked with the entries in the BTB. When the address is not in the BTB, execution is predicted to fall through to the next instruction. On P6 family processors, branches that do not have a history in the BTB are predicted using a static prediction algorithm. The static prediction algorithm does the following:
- Predicts unconditional branches to be taken.
- Predicts backward conditional branches to be taken. This rule is suitable for loops.
- Predicts forward conditional branches to be not taken.
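The three static prediction rules can be written as a small C function, shown here as a sketch for reasoning about code layout rather than a model of the hardware. The enum and function names are hypothetical, and treating a branch whose target equals its own address as backward is an assumption of this sketch.

```c
#include <stdint.h>

enum branch_kind { BRANCH_UNCONDITIONAL, BRANCH_CONDITIONAL };

/* Returns 1 if the static rules predict the branch taken, 0 otherwise. */
int static_predict_taken(enum branch_kind kind,
                         uint32_t branch_addr, uint32_t target_addr)
{
    if (kind == BRANCH_UNCONDITIONAL)
        return 1;                       /* unconditional: predicted taken */
    return target_addr <= branch_addr;  /* backward: taken; forward: not taken */
}
```

A loop-closing conditional jump has a target below its own address and is therefore predicted taken, which is why the rules suit loops.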
14.2.2. Optimizing Branch Predictions in Code

To optimize branch predictions in an application code, apply the following techniques:
- Reduce or eliminate branches (see Section 14.2.3., Eliminating and Reducing the Number of Branches).
- Insure that each CALL instruction has a matching RET instruction. The P6 family of processors have a return stack buffer that keeps track of the target address of the next RET instruction. Do not use pops and jumps to return from a CALL instruction; always use the RET instruction.
- Do not intermingle data with instructions in a code segment. Unconditional jumps, when not in the BTB, are predicted to be not taken. If data follows an unconditional branch, the data might be fetched, causing the loss of instruction fetch cycles and valuable instruction cache space. When data must be stored in the code segment, move it to the end where it will not be in the instruction fetch stream.
- Unroll all very short loops. Loops that execute for less than 2 clocks waste loop overhead.
- Write code to follow the static prediction algorithm. The static prediction algorithm follows the natural flow of program code. Following this algorithm reduces the number of branch mispredictions.
14.2.3. Eliminating and Reducing the Number of Branches
Eliminating branches improves processor performance by:
- Removing the possibility of branch mispredictions.
- Reducing the number of BTB entries required.
Branches can be eliminated by using the SETcc instruction, or by using the P6 family processors' conditional move (CMOVcc or FCMOVcc) instructions. The following C code example selects one of two constants depending on a comparison of A and B:
/* C Code */
ebx = (A < B) ? C1 : C2;
This code conditionally compares the values A and B. If the condition is true, EBX is set to C1; otherwise it is set to C2. The assembly-language equivalent of the C code is shown in the example below:
; Assembly Code
    cmp A, B            ; condition
    jge L30             ; conditional branch
    mov ebx, CONST1
    jmp L31             ; unconditional branch
L30:
    mov ebx, CONST2
L31:
By replacing the JGE instruction as shown in the previous example with a SETcc instruction, the EBX register is set to either C1 or C2. This code can be optimized to eliminate the branches as shown in the following code:
    xor   ebx, ebx                  ; clear ebx
    cmp   A, B
    setge bl                        ; ebx = 0 or 1; use SETGE or the complement condition (SETL),
                                    ; chosen so that ebx = 1 selects the smaller constant
    dec   ebx                       ; ebx = 00...00 or 11...11
    and   ebx, ABS(CONST1-CONST2)   ; ebx = 0 or ABS(CONST1-CONST2)
    add   ebx, MIN(CONST1,CONST2)   ; ebx = CONST1 or CONST2
The optimized code sets register EBX to 0 then compares A and B. If A is greater than or equal to B then EBX is set to 1. EBX is then decremented and ANDed with the absolute difference of the constant values. This sets EBX to either 0 or the difference of the values. By adding the minimum of the two constants the correct value is written to EBX. When CONST1 or CONST2 is equal to zero, the last instruction can be deleted as the correct value already has been written to EBX. When ABS(CONST1-CONST2) is one of {2,3,5,9}, the following example applies:
    xor   ebx, ebx
    cmp   A, B
    setge bl                                    ; or the complement condition, chosen so that
                                                ; ebx = 1 selects the larger constant
    lea   ebx, [ebx*D+ebx+MIN(CONST1,CONST2)]   ; ebx = smaller constant + (0 or D+1)
where D stands for ABS(CONST1 - CONST2) - 1.
A second way to remove branches on P6 family processors is to use the new CMOVcc and FCMOVcc instructions. The following example shows how to use the CMOVcc instruction to eliminate the branch from a test and branch instruction sequence. If the test sets the equal flag then the value in register EBX will be moved to register EAX. This branch is data dependent, and is representative of an unpredictable branch.
    test ecx, ecx
    jne  1h
    mov  eax, ebx
1h:
To change the code, the JNE and the MOV instructions are combined into one CMOVcc instruction, which checks the equal flag. The optimized code is shown below:
    test  ecx, ecx      ; test the flags
    cmove eax, ebx      ; if the equal flag is set, move ebx to eax
1h:
The label 1h: is no longer needed unless it is the target of another branch instruction. The CMOVcc and FCMOVcc instructions generate invalid-opcode exceptions when executed on previous generation Intel Architecture processors. Therefore, use the CPUID instruction to check feature bit 15 of the EDX register, which when set indicates presence of the CMOVcc family of instructions. Do not use the family and model codes returned by CPUID to test for the presence of specific features.
Additional information on branch optimization can be found in the Intel Architecture Optimization Manual.
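The mask-and-add idea behind the SETcc sequence in this section can also be expressed in C, shown here as a sketch. The function name select_lt is hypothetical. Unlike the assembly sequence, this variant ANDs with (c1 - c2) and adds back c2, relying on unsigned wraparound so that the relative order of the two constants does not matter. Whether a compiler emits branch-free code for it is not guaranteed.

```c
#include <stdint.h>

/* Branch-free equivalent of: (a < b) ? c1 : c2 */
uint32_t select_lt(int32_t a, int32_t b, uint32_t c1, uint32_t c2)
{
    /* (a >= b) is 1 or 0; subtracting 1 gives 0 or 0xFFFFFFFF,
     * mirroring SETGE followed by DEC. */
    uint32_t mask = (uint32_t)(a >= b) - 1u;

    /* mask selects either 0 or (c1 - c2); adding c2 yields c2 or c1. */
    return (mask & (c1 - c2)) + c2;
}
```

For example, select_lt(1, 2, 10, 20) takes the a < b path and yields 10, while select_lt(2, 1, 10, 20) yields 20.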
14.3. REDUCING PARTIAL REGISTER STALLS ON P6 FAMILY PROCESSORS

On P6 family processors, when a large (32-bit) general-purpose register is read immediately after a small register (8- or 16-bit) that is contained in the large register has been written, the read is stalled until the write retires (a minimum of 7 clocks). Consider the example below:
    MOV AX, 8
    ADD ECX, EAX        ; Partial stall occurs on access of
                        ; the EAX register
Here, the first instruction moves the value 8 into the small register AX. The next instruction accesses the large register EAX. This code sequence results in a partial register stall. Pentium and Intel486 processors do not generate this stall. Table 14-1 lists the groups of small registers and their corresponding large register for which a partial register stall can occur. For example, writing to register BL, BH, or BX and subsequently reading register EBX will result in a stall.
Table 14-1. Small and Large General-Purpose Register Pairs
Small Registers      Large Registers
AL  AH  AX           EAX
BL  BH  BX           EBX
CL  CH  CX           ECX
DL  DH  DX           EDX
SP                   ESP
BP                   EBP
DI                   EDI
SI                   ESI
Because the P6 family processors can execute code out of order, the instructions need not be immediately adjacent for the stall to occur. The following example also contains a partial stall:
    MOV AL, 8
    MOV EDX, 0x40
    MOV EDI, new_value
    ADD EDX, EAX        ; Partial stall occurs on access of
                        ; the EAX register
In addition, any micro-ops that follow the stalled micro-op will also wait until the clock cycle after the stalled micro-op continues through the pipe. In general, to avoid stalls, do not read a large register after writing a small register that is contained in the large register. Special cases of writing and reading corresponding small and large registers have been implemented in the P6 family processors to simplify the blending of code across processor generations. The special cases include the XOR and SUB instructions when using EAX, EBX, ECX, EDX, EBP, ESP, EDI and ESI as shown in the following examples:
    xor  eax, eax
    movb al, mem8
    add  eax, mem32     ; no partial stall

    xor  eax, eax
    movw ax, mem16
    add  eax, mem32     ; no partial stall

    sub  ax, ax
    movb al, mem8
    add  ax, mem16      ; no partial stall

    sub  eax, eax
    movb al, mem8
    or   ax, mem16      ; no partial stall

    xor  ah, ah
    movb al, mem8
    sub  ax, mem16      ; no partial stall
In general, when implementing this sequence, always write all zeros to the large register then write to the lower half of the register.
14.4. ALIGNMENT RULES AND GUIDELINES

The following section gives rules and guidelines for aligning code and data for optimum code execution speed.
14.4.1. Alignment Penalties
The following are common penalties for accesses to misaligned data or code:
- On a Pentium processor, a misaligned access costs 3 clocks.
- On a P6 family processor, a misaligned access that crosses a cache line boundary costs 6 to 9 clocks.
- On a P6 family processor, unaligned accesses that cause a data cache split stall the processor. A data cache split is a memory access that crosses a 32-byte cache line boundary.
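Whether a given access would be a data cache split can be checked arithmetically. The C sketch below assumes the 32-byte cache line size stated above; the function and macro names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 32u   /* line size stated for P6 family and Pentium */

/* Returns 1 if an access of `size` bytes starting at `addr` touches two
 * cache lines (a data cache split), 0 otherwise. */
int crosses_cache_line(uintptr_t addr, size_t size)
{
    if (size == 0)
        return 0;
    /* Compare the line index of the first and last byte accessed. */
    return (addr / CACHE_LINE_BYTES) != ((addr + size - 1) / CACHE_LINE_BYTES);
}
```

An 8-byte access at offset 28 within a line covers bytes 28 through 35 and therefore splits, while the same access at offset 24 does not.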
For best performance, make sure that data structures and arrays greater than 32 bytes are 32-byte aligned, and that access patterns to data structures and arrays do not break the alignment rules.
14.4.2. Code Alignment
The P6 family and Pentium processors have a cache line size of 32 bytes. Since the prefetch buffers fetch on 16-byte boundaries, code alignment has a direct impact on prefetch buffer efficiency. For optimal performance across the Intel Architecture family, it is recommended that:
- A loop entry label should be 16-byte aligned when it is less than 8 bytes away from that boundary.
- A label that follows a conditional branch should not be aligned.
- A label that follows an unconditional branch or function call should be 16-byte aligned when it is less than 8 bytes away from that boundary.
14.4.3. Data Alignment
A misaligned access in the data cache or on the bus costs at least 3 extra clocks on the Pentium processor. A misaligned access in the data cache, which crosses a cache line boundary, costs 9 to 12 clocks on the P6 family processors. It is recommended that data be aligned on the following boundaries for optimum code execution on all processors:
- Align 8-bit data on any boundary.
- Align 16-bit data to be contained within an aligned 4-byte word.
- Align 32-bit data on any boundary that is a multiple of 4.
- Align 64-bit data on any boundary that is a multiple of 8.
- Align 80-bit data on a 128-bit boundary (that is, any boundary that is a multiple of 16 bytes).
- Align 128-bit SIMD floating-point data on a 128-bit boundary (that is, any boundary that is a multiple of 16 bytes).
14.4.3.1. ALIGNMENT OF DATA STRUCTURES AND ARRAYS GREATER THAN 32 BYTES
A 32-byte or greater data structure or array should be aligned such that the beginning of each structure or array element is aligned on a 32-byte boundary, and such that each structure or array element does not cross a 32-byte cache line boundary.
14.4.3.2. ALIGNMENT OF DATA IN MEMORY AND ON THE STACK
On the Pentium processor, accessing 64-bit variables that are not 8-byte aligned will cost an extra 3 clocks. On the P6 family processors, accessing a 64-bit variable that is not 8-byte aligned may cause a data cache split. Some commercial compilers do not align double precision variables on 8-byte boundaries. In such cases, the following techniques can be used to force optimum alignment of data:
• Use static variables instead of dynamic (stack) variables.
• Use in-line assembly code that explicitly aligns data.
• In C code, use malloc to explicitly allocate variables.
The following sections describe these techniques.
Static Variables
When a compiler allocates stack space for a dynamic variable, it may not align the variable (see Figure 14-1). However, in most cases, when the compiler allocates space in memory for static variables, the variables are aligned.
static float a;
float b;
static float c;
Stack: b
Memory: a, c
Figure 14-1. Stack and Memory Layout of Static Variables
Alignment Using Assembly Language
Use in-line assembly code to explicitly align variables. The following example aligns the stack to 64 bits.
; procedure prologue
    push ebp
    mov  ebp, esp
    and  ebp, -8
    sub  esp, 12
; procedure epilogue
    add  esp, 12
    pop  ebp
    ret
Dynamic Allocation Using MALLOC
When using dynamic allocation, check that the compiler aligns doubleword or quadword values on 8-byte boundaries. If the compiler does not implement this alignment, then use the following technique to align doublewords and quadwords for optimum code execution:
1. Allocate memory equal to the size of the array or structure plus 4 bytes.
2. Use bitwise AND to make sure that the array is aligned, for example:
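#include <stdint.h>
#include <stdlib.h>

double a[5];
double *p, *newp;
p = (double *)malloc((sizeof(double) * 5) + 4);
/* round up to an 8-byte boundary; assumes malloc returns memory that is at least 4-byte aligned */
newp = (double *)(((uintptr_t)p + 4) & ~(uintptr_t)7);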
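The allocation technique above can be wrapped in a small helper. The following is a minimal sketch, not taken from the manual: the names aligned_block and alloc_doubles_aligned8 are hypothetical, and the helper pads by 7 bytes rather than 4 so that it does not depend on any assumption about the alignment malloc itself provides. It keeps the original pointer because only that pointer may be passed to free.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper type: keeps both pointers, because only the pointer
 * returned by malloc may later be passed to free(). */
typedef struct {
    void   *raw;      /* pointer returned by malloc; pass this to free() */
    double *aligned;  /* first 8-byte-aligned address inside the block   */
} aligned_block;

/* Allocate room for `count` doubles starting on an 8-byte boundary.
 * Over-allocates by 7 bytes so that rounding up always stays inside
 * the block, whatever alignment malloc provides. */
static aligned_block alloc_doubles_aligned8(size_t count)
{
    aligned_block blk = { NULL, NULL };
    blk.raw = malloc(count * sizeof(double) + 7);
    if (blk.raw != NULL) {
        uintptr_t addr = ((uintptr_t)blk.raw + 7) & ~(uintptr_t)7;
        blk.aligned = (double *)addr;
    }
    return blk;
}
```

A caller would use blk.aligned as the array and release the memory with free(blk.raw).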
14.5. INSTRUCTION SCHEDULING OVERVIEW

On all Intel Architecture processors, the scheduling of (arrangement of) instructions in the instruction stream can have a significant effect on the execution speed of the processor. For example, when executing code on a Pentium or later Intel Architecture processor, two 1-clock instructions that do not have register or data dependencies between them can generally be executed in parallel (in a single clock) if they are paired, that is, placed adjacent to one another in the instruction stream. Likewise, a long-latency instruction such as a floating-point instruction can often be executed in parallel with a sequence of 1-clock integer instructions or shorter latency floating-point instructions if the instructions are scheduled appropriately in the instruction stream.
The following sections describe two aspects of scheduling that can provide improved performance in Intel Architecture processors: pairing and pipelining. Pairing is generally used to optimize the execution of integer and MMX instructions; pipelining is generally used to optimize the execution of MMX and floating-point instructions.
14.5.1. Instruction Pairing Guidelines

The microarchitecture for the Pentium family of processors (with and without MMX technology) contains two instruction execution pipelines: the U-pipe and the V-pipe. These pipelines are capable of executing two Intel Architecture instructions in parallel (during the same clock or clocks) if the two instructions are pairable. Pairable instructions are those instructions that, when they appear adjacent to one another in the instruction stream, will normally be executed in parallel. By ordering a code sequence so that pairable instructions occur sequentially whenever possible, code can be optimized to take advantage of the Pentium processor's two-pipe microarchitecture.
NOTE
Pairing of instructions improves Pentium processor performance significantly. It does not slow and sometimes improves the performance of P6 family processors.
The following subsections describe the Pentium processor pairing rules for integer, MMX, and floating-point instructions. The pairing rules are grouped into types, as follows:
• General pairing rules.
• Integer instruction pairing rules.
• MMX instruction pairing rules.
• Floating-point instruction pairing rules.
14.5.1.1. GENERAL PAIRING RULES
The following are general rules for instruction pairing in code written to run on Pentium processors:
• Unpairable instructions are always executed in the U-pipe.
• For paired instructions to execute in parallel, the first instruction of the pair must fall on an instruction boundary that forces the instruction to be executed in the U-pipe. The following placements of an instruction in the instruction stream will force an instruction to be executed in the U-pipe:
  - If the first instruction of a pair of pairable instructions is the first instruction in a block of code, the first instruction will be executed in the U-pipe and the second of the pair will be executed in the V-pipe, resulting in parallel execution of the two instructions.
  - If the first instruction of a pair of pairable instructions follows an unpairable instruction in the instruction stream, the first of the pairable instructions will be executed in the U-pipe and the second of the pair in the V-pipe, resulting in parallel execution.
  - After one pair of instructions has been executed in parallel, subsequent pairs will also be executed in parallel until an unpairable instruction is encountered.
• Parallel execution of paired instructions will not occur if:
  - The next two instructions are not pairable instructions.
  - The next two instructions have some type of register contention (implicit or explicit). There are some special exceptions (see "Special Pairs," in Section 14.5.1.2., "Integer Pairing Rules") to this rule where register contention can occur with pairing.
  - The instructions are not both in the instruction cache. An exception to this that permits pairing is if the first instruction is a one-byte instruction.
  - The processor is operating in single-step mode.
• Instructions that have data dependencies should be separated by at least one other instruction.
• Pentium processors without MMX technology do not execute a set of paired instructions if either instruction is longer than 7 bytes; Pentium processors with MMX technology do not execute a set of paired instructions if the first instruction is longer than 11 bytes or the second instruction is longer than 7 bytes. Prefixes are not counted.
• On Pentium processors without MMX technology, prefixed instructions are pairable only in the U-pipe. On Pentium processors with MMX technology, instructions with 0FH, 66H or 67H prefixes are also pairable in the V-pipe. For this and the previous rule, stalls at the entrance to the instruction FIFO, on Pentium processors with MMX technology, will prevent pairing.
• Floating-point instructions are not pairable with MMX instructions.
14.5.1.2. INTEGER PAIRING RULES
Table 14-2 shows the integer instructions that can be paired. The table is divided into two halves: one for the U-pipe and one for the V-pipe. Any instruction in the U-pipe list can be paired with any instruction in the V-pipe list, and vice versa.
Table 14-2. Pairable Integer Instructions

Integer Instruction Pairable in U-Pipe    Integer Instruction Pairable in V-Pipe
MOV reg, reg                              MOV reg, reg
MOV reg, mem                              MOV reg, mem
MOV mem, reg                              MOV mem, reg
MOV reg, imm                              MOV reg, imm
MOV mem, imm                              MOV mem, imm
MOV eax, mem                              MOV eax, mem
MOV mem, eax                              MOV mem, eax
ALU reg, reg                              ALU reg, reg
ALU reg, imm                              ALU reg, imm
ALU mem, imm                              ALU mem, imm
ALU eax, imm                              ALU eax, imm
ALU mem, reg                              ALU mem, reg
ALU reg, mem                              ALU reg, mem
INC/DEC reg                               INC/DEC reg
INC/DEC mem                               INC/DEC mem
LEA reg, mem                              LEA reg, mem
PUSH reg                                  PUSH reg
PUSH imm                                  PUSH imm
POP reg                                   POP reg
NOP                                       NOP
TEST reg, r/m                             TEST reg, r/m
TEST acc, imm                             TEST acc, imm
SHIFT/ROT by 1                            JMP near
SHIFT by imm                              Jcc near
                                          0F Jcc
                                          CALL near

NOTES:
ALU: Arithmetic or logical instruction such as ADD, SUB, or AND. In general, most simple ALU instructions are pairable.
imm: Immediate.
reg: Register.
mem: Memory location.
r/m: Register or memory location.
acc: Accumulator (EAX or AX register).
General Integer-Instruction Pairability Rules
The following are general rules for pairability of integer instructions. These rules summarize the pairing of instructions in Table 14-2.
• NP Instructions: The following integer instructions cannot be paired:
  - The shift and rotate instructions with a shift count in the CL register.
  - Long-arithmetic instructions, such as MUL and DIV.
  - Extended instructions, such as RET, ENTER, PUSHA, MOVS, STOS, and LOOPNZ.
  - Inter-segment instructions, such as PUSH sreg and CALL far.
• UV Instructions: The following instructions can be paired when issued to the U- or V-pipes:
  - Most 8/32-bit ALU operations, such as ADD, INC, and XOR.
  - All 8/32-bit compare instructions, such as CMP and TEST.
  - All 8/32-bit stack operations using registers, such as PUSH reg and POP reg.
• PU Instructions: The following instructions, when issued to the U-pipe, can be paired with a suitable instruction in the V-pipe. These instructions never execute in the V-pipe.
  - Carry and borrow instructions, such as ADC and SBB.
  - Prefixed instructions.
  - Shift with immediate instructions.
• PV Instructions: The following instructions, when issued to the V-pipe, can be paired with a suitable instruction in the U-pipe. The simple control transfer instructions, such as the CALL near, JMP near, or Jcc instructions, can execute in either the U-pipe or the V-pipe, but they can be paired with other instructions only when they are in the V-pipe. Since these instructions change the instruction pointer (EIP), they cannot pair in the U-pipe since the next instruction may not be adjacent. The PV instructions include both Jcc short and Jcc near (which have a 0FH prefix) versions of the Jcc instruction.
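The four pairability classes reduce to a simple test on the two adjacent instructions. The sketch below is illustrative only: the enum and function names are hypothetical, and it considers the class rule alone, not register contention, instruction length, or the other restrictions described in this section.

```c
#include <stdbool.h>

/* Pairability classes named in the rules above. */
typedef enum { PAIR_NP, PAIR_UV, PAIR_PU, PAIR_PV } pair_class;

/* True if an instruction of class c may be the U-pipe half of a pair. */
static bool can_lead_in_u(pair_class c)
{
    return c == PAIR_UV || c == PAIR_PU;
}

/* True if an instruction of class c may be the V-pipe half of a pair. */
static bool can_follow_in_v(pair_class c)
{
    return c == PAIR_UV || c == PAIR_PV;
}

/* Class check for two adjacent instructions: `first` would issue to the
 * U-pipe and `second` to the V-pipe. */
static bool classes_allow_pairing(pair_class first, pair_class second)
{
    return can_lead_in_u(first) && can_follow_in_v(second);
}
```

For instance, an ADC (PU) followed by a Jcc (PV) passes this check, while a Jcc followed by an ADD does not, because a PV instruction cannot be the U-pipe half of a pair.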
Unpairability Due to Register Dependencies
Instruction pairing is also affected by instruction operands. The following instruction pairings will not result in parallel execution because of register contention. Exceptions to these rules are given in "Special Pairs," in Section 14.5.1.2., "Integer Pairing Rules."
• Flow Dependence: The first instruction writes to a register that the second one reads from, as in the following example:
    mov eax, 8
    mov [ebp], eax
• Output Dependence: Both instructions write to the same register, as in the following example:
    mov eax, 8
    mov eax, [ebp]
This output dependence limitation does not apply to a pair of instructions that write to the EFLAGS register (for example, two ALU operations that change the condition codes). The condition code after the paired instructions execute will have the condition from the V-pipe instruction.
Note that a pair of instructions in which the first reads a register and the second writes to the same register (anti-dependence) may be paired, as in the following example:
    mov eax, ebx
    mov ebx, [ebp]
For purposes of determining register contention, a reference to a byte or word register is treated as a reference to the containing 32-bit register. Therefore, the following instruction pair does not execute in parallel because of output dependencies on the contents of the EAX register.
    mov al, 1
    mov ah, 0
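The register-contention rules above can be modeled with one bit per 32-bit register. This is a minimal sketch under stated assumptions: the reg_use type, the R_* constants, and the function names are hypothetical, and the model deliberately ignores the EFLAGS exemption and the special pairs, which relax these rules.

```c
#include <stdbool.h>
#include <stdint.h>

/* One bit per 32-bit general-purpose register. Byte and word registers
 * (AL, AH, AX, ...) use the bit of the containing 32-bit register. */
enum {
    R_EAX = 1 << 0, R_EBX = 1 << 1, R_ECX = 1 << 2, R_EDX = 1 << 3,
    R_ESI = 1 << 4, R_EDI = 1 << 5, R_EBP = 1 << 6, R_ESP = 1 << 7
};

typedef struct {
    uint8_t reads;   /* registers the instruction reads  */
    uint8_t writes;  /* registers the instruction writes */
} reg_use;

/* Flow dependence: the first instruction writes a register the second reads. */
static bool flow_dependence(reg_use first, reg_use second)
{
    return (first.writes & second.reads) != 0;
}

/* Output dependence: both instructions write the same register. */
static bool output_dependence(reg_use first, reg_use second)
{
    return (first.writes & second.writes) != 0;
}

/* Anti-dependence (first reads, second writes) does not block pairing. */
static bool registers_allow_pairing(reg_use first, reg_use second)
{
    return !flow_dependence(first, second) && !output_dependence(first, second);
}
```

Encoding the four examples above in this model gives the same verdicts as the text: only the anti-dependent pair (mov eax, ebx followed by mov ebx, [ebp]) is allowed.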
Special Pairs
Some integer instructions can be paired in spite of the previously described general integer-instruction rules. These special pairs overcome register dependencies, and most involve implicit reads/writes to the ESP register or implicit writes to the condition codes:
• Stack Pointer.
    push reg/imm ; push reg/imm
    push reg/imm ; call
    pop reg      ; pop reg
• Condition Codes.
    cmp ; jcc
    add ; jne
Note that the special pairs that consist of PUSH/POP instructions may have only immediate or register operands, not memory operands.
Restrictions On Pair Execution
Some integer-instruction pairs may be issued simultaneously but will not execute in parallel:
• Data-Cache Conflict: If both instructions access the same data-cache memory bank, then the second request (V-pipe) must wait for the first request to complete. A bank conflict occurs when bits 2 through 4 of the two physical addresses are the same. A bank conflict results in a 1-clock penalty on the V-pipe instruction.
• Inter-Pipe Concurrency: Parallel execution of integer instruction pairs preserves memory-access ordering. A multiclock instruction in the U-pipe will execute alone until its last memory access.
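The bank-conflict condition in the first item can be written as a direct comparison of address bits. The following is a small sketch; the function names are hypothetical, and the addresses are assumed to be 32-bit physical addresses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 2 through 4 of a physical address select the data-cache bank. */
static unsigned cache_bank(uint32_t phys_addr)
{
    return (phys_addr >> 2) & 0x7u;
}

/* True when the U-pipe and V-pipe accesses fall in the same bank. */
static bool bank_conflict(uint32_t addr_u, uint32_t addr_v)
{
    return cache_bank(addr_u) == cache_bank(addr_v);
}
```

Two accesses that are a multiple of 32 bytes apart therefore always conflict, while accesses to adjacent doublewords do not.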
As an example of inter-pipe concurrency, the following instructions add the contents of the register and the value at the memory location, then put the result in the register. An add with a memory operand takes 2 clocks to execute. The first clock loads the value from the data cache, and the second clock performs the addition. Since there is only one memory access in the U-pipe instruction, the add in the V-pipe can start in the same clock.
    add eax, mem1     add ebx, mem2     ; 1
    (add)             (add)             ; 2   2-cycle
The following instructions add the contents of the register to the memory location and store the result at the memory location. An add with a memory result takes 3 clocks to execute. The first clock loads the value, the second performs the addition, and the third stores the result. When paired, the last clock of the U-pipe instruction overlaps with the first clock of the V-pipe instruction execution.
    add mem1, eax                       ; 1
    (add)                               ; 2
    (add)             add mem2, ebx     ; 3
                      (add)             ; 4
                      (add)             ; 5
No other instructions may begin execution until the instructions already executing have completed.
To expose the opportunities for scheduling and pairing, it is better to issue a sequence of simple instructions rather than a complex instruction that takes the same number of clocks. The simple instruction sequence can take advantage of more issue slots. The load/store style code generation requires more registers and increases code size. This impacts Intel486 processor performance, although only as a second order effect. To compensate for the extra registers needed, extra effort should be put into register allocation and instruction scheduling so that extra registers are only used when parallelism increases.
14.5.1.3. MMX INSTRUCTION PAIRING GUIDELINES
This section specifies guidelines and restrictions for pairing MMX instructions with each other and with integer instructions.
Pairing Two MMX Instructions
The following restrictions apply when pairing two MMX instructions:
• Two MMX instructions that both use the MMX shifter unit (pack, unpack, and shift instructions) are not pairable because there is only one MMX shifter unit. Shift operations may be issued in either the U-pipe or the V-pipe, but cannot be executed in both pipes in the same clock.
• Two MMX instructions that both use the MMX multiplier unit (PMULL, PMULH, PMADD type instructions) are not pairable because there is only one MMX multiplier unit. Multiply operations may be issued in either the U-pipe or the V-pipe, but cannot be executed in both pipes in the same clock.
• MMX instructions that access either memory or a general-purpose register can be issued in the U-pipe only. Do not schedule these instructions to the V-pipe as they will wait and be issued in the next pair of instructions (and to the U-pipe).
• The MMX destination register of the U-pipe instruction should not match the source or destination register of the V-pipe instruction (dependency check).
• The EMMS instruction is not pairable with other instructions.
• If either the TS flag or the EM flag in control register CR0 is set, MMX instructions cannot be executed in the V-pipe.
Pairing an Integer Instruction in the U-Pipe With an MMX Instruction in the V-Pipe
Use the following guidelines for pairing an integer instruction in the U-pipe and an MMX instruction in the V-pipe:
• The MMX instruction is not the first MMX instruction following a floating-point instruction.
• The V-pipe MMX instruction does not access either memory or a general-purpose register.
• The U-pipe integer instruction is a pairable U-pipe integer instruction (see Table 14-2).
Pairing an MMX Instruction in the U-Pipe with an Integer Instruction in the V-Pipe
Use the following guidelines for pairing an MMX instruction in the U-pipe and an integer instruction in the V-pipe:
• The U-pipe MMX instruction does not access either memory or a general-purpose register.
• The V-pipe instruction is a pairable integer V-pipe instruction (see Table 14-2).
14.5.2. Pipelining Guidelines

The term pipelining refers to the practice of scheduling instructions in the instruction stream to reduce processor stalls due to register, data, or data-cache dependencies. The effect of pipelining on code execution is highly dependent on the family of Intel Architecture processors the code is intended to run on. Pipelining can greatly increase the performance of code written to run on the Pentium family of processors. It is less important for code written to run on the P6 family processors, because the dynamic execution model that these processors use does a significant amount of pipelining automatically.
The following subsections describe general pipelining guidelines for MMX and floating-point instructions. These guidelines yield significant improvements in execution speed for code running on the Pentium processors and may yield additional improvements in execution speed on the P6 family processors. Specific pipelining guidelines for the P6 family processors are given in Section 14.5.3., "Scheduling Rules for P6 Family Processors."
14.5.2.1. MMX INSTRUCTION PIPELINING GUIDELINES
All MMX instructions can be pipelined on P6 family and Pentium (with MMX technology) processors, including the multiply instructions. All MMX instructions take a single clock to execute except the MMX multiply instructions, which take 3 clocks. Since MMX multiply instructions take 3 clocks to execute, the result of a multiply instruction can be used only by other instructions issued 3 clocks later. For this reason, avoid scheduling a dependent instruction in the 2 instruction pairs following the multiply.
The store of a register after writing the register must wait for 2 clocks after the update of the register. Scheduling the store 2 clocks after the update avoids a pipeline stall.
14.5.2.2. FLOATING-POINT PIPELINING GUIDELINES
Many of the floating-point instructions have a latency greater than 1 clock; therefore, on Pentium processors the next floating-point instruction cannot access the result until the first operation has finished execution. To hide this latency, instructions should be inserted between the pair that causes the pipe stall. These instructions can be integer instructions or floating-point instructions that will not cause a new stall themselves. The number of instructions that should be inserted depends on the length of the latency. Because of the out-of-order execution capability of the P6 family processors, stalls will not necessarily occur on an instruction or micro-op basis. However, if an instruction has a very long latency such as an FDIV, then scheduling can improve the throughput of the overall application. The following sections list considerations for floating-point pipelining on Pentium processors.
Pairing of Floating-Point Instructions
In a Pentium processor, pairing floating-point instructions with one another (with one exception) does not result in a performance enhancement because the processor has only one floating-point unit (FPU). However, some floating-point instructions can be paired with integer instructions or the FXCH instruction to improve execution times. The following are some general pairing rules and restrictions for floating-point instructions:
• All floating-point instructions can be executed in the U-pipe and paired with suitable instructions (generally integer instructions) in the V-pipe.
• The only floating-point instruction that can be executed in the V-pipe is the FXCH instruction. The FXCH instruction, if executed in the V-pipe, can be paired with another floating-point instruction executing in the U-pipe.
• The floating-point instructions FSCALE, FLDCW, and FST cannot be paired with any instruction (integer instruction or the FXCH instruction).
Using Integer Instructions to Hide Latencies and Schedule Floating-Point Instructions
When a floating-point instruction depends on the result of the immediately preceding instruction, and that instruction is also a floating-point instruction, performance can be improved by placing one or more integer instructions between the two floating-point instructions. This is true even if the integer instructions perform loop control. The following example restructures a loop in this manner:
for (i=0; i<Size; i++)
    array1 [i] += array2 [i];

; assume eax=Size-1, esi=array1, edi=array2
                                        ; PENTIUM(R) PROCESSOR CLOCKS
LoopEntryPoint:
    fld   real4 ptr [esi+eax*4]         ; 2 - AGI
    fadd  real4 ptr [edi+eax*4]         ; 1
    fstp  real4 ptr [esi+eax*4]         ; 5 - waits for fadd
    dec   eax                           ; 1
    jnz   LoopEntryPoint

; assume eax=Size-1, esi=array1, edi=array2
    jmp   LoopEntryPoint
    Align 16
TopOfLoop:
    fstp  real4 ptr [esi+eax*4+4]       ; 4 - waits for fadd + AGI
LoopEntryPoint:
    fld   real4 ptr [esi+eax*4]         ; 1
    fadd  real4 ptr [edi+eax*4]         ; 1
    dec   eax                           ; 1
    jnz   TopOfLoop
;
    fstp  real4 ptr [esi+eax*4+4]
By moving the integer instructions between the FADDS and FSTPS instructions, the integer instructions can be executed while the FADDS instruction is completing in the floating-point unit and before the FSTPS instruction begins execution. Note that this new loop structure requires a separate entry point for the first iteration because the loop needs to begin with the FLDS instruction. Also, there needs to be an additional FSTPS instruction after the conditional jump to finish the final loop iteration.
Hiding the One-Clock Latency of a Floating-Point Store
A floating-point store must wait an extra clock for its floating-point operand. After an FLD, an FST must wait 1 clock, as shown in the following example:
    fld  mem1       ; 1 fld takes 1 clock
                    ; 2 fst waits, schedule something here
    fst  mem2       ; 3,4 fst takes 2 clocks
After the common arithmetic operations, FMUL and FADD, which normally have a latency of 3 clocks, FST waits an extra clock for a total of 4 (see following example).
    fadd mem1       ; 1 add takes 3 clocks
                    ; 2 add, schedule something here
                    ; 3 add, schedule something here
                    ; 4 fst waits, schedule something here
    fst  mem2       ; 5,6 fst takes 2 clocks
Other instructions such as FADDP and FSUBRP also exhibit this type of latency. In the next example, the store is not dependent on the previous load:
    fld  mem1       ; 1
    fld  mem2       ; 2
    fxch st(1)      ; 2
    fst  mem3       ; 3 stores values loaded from mem1
Here, a register may be used immediately after it has been loaded (with FLD):
    fld  mem1       ; 1
    fadd mem2       ; 2,3,4
Use of a register by a floating-point operation immediately after it has been written by another FADD, FSUB, or FMUL causes a 2-clock delay. If instructions are inserted between these two, then latency and a potential stall can be hidden.
Additionally, there are multiclock floating-point instructions (FDIV and FSQRT) that execute in the floating-point unit pipe (the U-pipe). While these instructions execute in the floating-point unit pipe, integer instructions can be executed in parallel. Emitting a number of integer instructions after such an instruction will keep the integer execution units busy (the exact number of instructions depends on the floating-point instruction's clock count).
Integer instructions generally overlap with the floating-point operations except when the last floating-point operation was FXCH. In this case there is a 1 clock delay:
    U-pipe          V-pipe
    fadd            fxch            ; 1
                                    ; 2 fxch delay
    mov eax, 1      inc edx
Integer and Floating-Point Multiply
The integer multiply operations, the MUL and IMUL instructions, are executed by the FPU's multiply unit. Therefore, for the Pentium processor, these instructions cannot be executed in parallel with a floating-point instruction. This restriction does not apply to the P6 family processors, because these processors have two internal FPU execution units.
A floating-point multiply instruction (FMUL) delays for 1 clock if the immediately preceding clock executed an FMUL or an FMUL-FXCH pair. The multiplier can only accept a new pair of operands every other clock.
Floating-Point Operations with Integer Operands
Floating-point operations that take integer operands (the FIADD or FISUB instruction) should be avoided. These instructions should be split into two instructions: the FILD instruction and a floating-point operation. The number of clocks before another instruction can be issued (throughput) for FIADD is 4, while for FILD and simple floating-point operations it is 1, as shown in the example below:
    Complex Instructions        Better for Potential Overlap
    fiadd [ebp]     ; 4         fild  [ebp]     ; 1
                                faddp st(1)     ; 2
Using the FILD and FADDP instructions in place of FIADD yields 2 free clocks for executing other instructions.
FSTSW Instruction
The FSTSW instruction that usually appears after a floating-point comparison instruction (FCOM, FCOMP, FCOMPP) delays for 3 clocks. Other instructions may be inserted after the comparison instruction to hide this latency. On the P6 family processors the FCMOVcc instruction can be used instead.
Transcendental Instructions
Transcendental instructions execute in the U-pipe and nothing can be overlapped with them, so an integer instruction following a transcendental instruction will wait until the previous instruction completes.
Transcendental instructions execute on the Pentium processor (and later Intel Architecture processors) much faster than the software emulations of these instructions found in most math libraries. Therefore, it may be worthwhile in-lining transcendental instructions in place of math library calls to transcendental functions. Software emulations of transcendental instructions will execute faster than the equivalent instructions only if accuracy is sacrificed.
FXCH Guidelines
The FXCH instruction costs no extra clocks on the Pentium processor when all of the following conditions occur, allowing the instruction to execute in the V-pipe in parallel with another floating-point instruction executing in the U-pipe:
• A floating-point instruction follows the FXCH instruction.
• A floating-point instruction from the following list immediately precedes the FXCH instruction: FADD, FSUB, FMUL, FLD, FCOM, FUCOM, FCHS, FTST, FABS, or FDIV.
• An FXCH instruction has already been executed. This is because the instruction boundaries in the cache are marked the first time the instruction is executed, so pairing only happens the second time this instruction is executed from the cache.
When the above conditions are true, the instruction is almost free and can be used to access elements in the deeper levels of the floating-point stack instead of storing them and then loading them again.
14.5.3. Scheduling Rules for P6 Family Processors

The P6 family processors have 3 decoders that translate Intel Architecture macro instructions into micro operations (micro-ops, also called uops). The decoder limitations are as follows:
• The first decoder (decoder 0) can decode instructions up to 7 bytes in length and with up to 4 micro-ops in one clock cycle.
• The second two decoders (decoders 1 and 2) can decode instructions that are 1 micro-op instructions, and these instructions will also be decoded in one clock cycle.
• Three macro instructions in an instruction sequence that fall into this envelope will be decoded in one clock cycle.
• Macro instructions outside this envelope will be decoded through decoder 0 alone. While decoder 0 is decoding a long macro instruction, decoders 1 and 2 (second and third decoders) are quiescent.
Appendix C of the Intel Architecture Optimization Manual lists all Intel macro-instructions and the decoders on which they can be decoded.
The macro instructions entering the decoder travel through the pipe in order; therefore, if a macro instruction will not fit in the next available decoder then the instruction must wait until the next clock to be decoded. It is possible to schedule instructions for the decoder such that the instructions in the in-order pipeline are less likely to be stalled. Consider the following examples:
• If the next available decoder for a multi-micro-op instruction is not decoder 0, the multi-micro-op instruction will wait for decoder 0 to be available, usually in the next clock, leaving the other decoders empty during the current clock. Hence, the following two instructions will take 2 clocks to decode.
    add eax, ecx        ; 1 uop instruction (decoder 0)
    add edx, [ebx]      ; 2 uop instruction (stall 1 cycle wait till
                        ; decoder 0 is available)
• During the beginning of the decoding clock, if two consecutive instructions are more than 1 micro-op, decoder 0 will decode one instruction and the next instruction will not be decoded until the next clock.
    add eax, [ebx]      ; 2 uop instruction (decoder 0)
    mov ecx, [eax]      ; 2 uop instruction (stall 1 cycle to wait until
                        ; decoder 0 is available)
    add ebx, 8          ; 1 uop instruction (decoder 1)
Instructions of the opcode reg, mem form produce two micro-ops: the load from memory and the operation micro-op. Scheduling for the decoder template (4-1-1) can improve the decoding throughput of your application. In general, the opcode reg, mem forms of instructions are used to reduce register pressure in code that is not memory bound, and when the data is in the cache. Use simple instructions for improved speed on the Pentium and P6 family processors. The following rules should be observed while using the opcode reg, mem instruction on Pentium processors with MMX technology:
• Schedule for minimal stalls in the Pentium processor pipe. Use as many simple instructions as possible. Generally, 32-bit assembly code that is well optimized for the Pentium processor pipeline will execute well on the P6 family processors.
• When scheduling for Pentium processors, keep in mind the primary stall conditions and decoder (4-1-1) template on the P6 family processors, as shown in the example below.
    pmaddw mm6, [ebx]   ; 2 uops instruction (decoder 0)
    paddd  mm7, mm6     ; 1 uop instruction (decoder 1)
    add    ebx, 8       ; 1 uop instruction (decoder 2)
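The 4-1-1 decoder template can be explored with a small model that counts decode clocks for a sequence of per-instruction micro-op counts. This is a simplified sketch, not a description of the actual hardware: the function name decode_clocks is hypothetical, every instruction is assumed to produce 1 to 4 micro-ops, and the 7-byte length limit, prefixes, and fetch effects are ignored.

```c
#include <stddef.h>

/* Count the clocks needed to decode a sequence under the 4-1-1 template.
 * uops[i] is the micro-op count of instruction i (assumed to be 1..4).
 * Instructions are taken strictly in program order. */
static unsigned decode_clocks(const unsigned *uops, size_t n)
{
    unsigned clocks = 0;
    size_t i = 0;

    while (i < n) {
        clocks++;
        i++;  /* decoder 0 accepts the next instruction, whatever its size */

        /* decoders 1 and 2 accept only single-micro-op instructions */
        for (int d = 1; d <= 2 && i < n && uops[i] == 1; d++)
            i++;
    }
    return clocks;
}
```

Under this model the three sequences shown in this section, with micro-op counts {1, 2}, {2, 2, 1}, and {2, 1, 1}, decode in 2, 2, and 1 clocks respectively, which agrees with the comments in the listings.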
14.6. ACCESSING MEMORY

The following subsections describe optimizations that can be obtained when scheduling instructions that access memory.
14.6.1. Using MMX Instructions That Access Memory
An MMX instruction may have two register operands (opcode reg, reg) or one register and one memory operand (opcode reg, mem), where opcode represents the instruction opcode, reg represents the register, and mem represents memory. The opcode reg, mem instructions are useful in some cases to reduce register pressure, increase the number of operations per clock, and reduce code size.
The following discussion assumes that the memory operand is present in the data cache. If it is not, then the resulting penalty is usually large enough to obviate the scheduling effects discussed in this section.
In Pentium processors with MMX technology, the opcode reg, mem MMX instructions do not have longer latency than the opcode reg, reg instructions (assuming a cache hit). They do have more limited pairing opportunities, however. In the Pentium II and Pentium III processors, the opcode reg, mem MMX instructions translate into two micro-ops, as opposed to one micro-op for the opcode reg, reg instructions. Thus, they tend to limit decoding bandwidth and occupy more resources than the opcode reg, reg instructions.
The recommended usage of the opcode reg, reg instructions depends on whether the MMX code is memory-bound (that is, execution speed is limited by memory accesses). As a rule of thumb, an MMX code sequence is considered to be memory-bound if the following inequality holds:
\[
\frac{\text{Instructions}}{2} < \text{MemoryAccesses} + \frac{\text{NonMMXInstructions}}{2}
\]
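The rule of thumb can be evaluated directly from three counts taken over a code sequence. The sketch below is illustrative: the function name and parameter names are hypothetical, and the inequality is multiplied through by 2 so that it can be tested in integer arithmetic.

```c
#include <stdbool.h>

/* Returns true when the sequence counts as memory-bound under the rule
 * of thumb: Instructions/2 < MemoryAccesses + NonMMXInstructions/2. */
static bool mmx_code_is_memory_bound(unsigned instructions,
                                     unsigned memory_accesses,
                                     unsigned non_mmx_instructions)
{
    /* Both sides multiplied by 2 to avoid fractional halves. */
    return instructions < 2u * memory_accesses + non_mmx_instructions;
}
```

For example, a sequence of 10 instructions with 4 memory accesses and 4 non-MMX instructions satisfies the inequality (5 < 6), whereas 20 instructions with the same memory and non-MMX counts does not (10 < 6 is false).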
For memory-bound MMX code, Intel recommends merging loads whenever the same memory address is used more than once to reduce memory accesses. For example, the following code sequence can be speeded up by using a MOVQ instruction in place of the opcode reg, mem forms of the MMX instructions:
    OPCODE MM0, [address A]
    OPCODE MM1, [address A]

    ; optimized by use of a MOVQ instruction and opcode reg, reg forms
    ; of the MMX(TM) instructions

    MOVQ   MM2, [address A]
    OPCODE MM0, MM2
    OPCODE MM1, MM2
Another alternative is to incorporate the prefetch instruction introduced in the Pentium III processor. Prefetching the data preloads the cache prior to actually needing the data. Proper use of prefetch can improve performance if the application is not memory bandwidth bound or the data does not already fit into cache. For more information on proper usage of the prefetch instruction, see the Intel Architecture Optimization Manual, order number 245127-001.

For MMX code that is not memory-bound, load merging is recommended only if the same memory address is used more than twice. Where load merging is not possible, usage of the opcode reg, mem instructions is recommended to minimize instruction count and code size. For example, the following code sequence can be shortened by removing the MOVQ instruction and using an opcode reg, mem form of the MMX instruction:
    MOVQ   mm0, [address A]
    OPCODE mm1, mm0

    ; optimized by removing the MOVQ instruction and using an
    ; opcode reg, mem form of the MMX(TM) instructions

    OPCODE mm1, [address A]
In many cases, a MOVQ reg, reg and opcode reg, mem can be replaced by a MOVQ reg, mem and the opcode reg, reg. This should be done where possible, since it saves one micro-op on the Pentium II and Pentium III processors. The following example is one where the opcode is a symmetric operation:
    MOVQ   mm1, mm0             (1 micro-op)
    OPCODE mm1, [address A]     (2 micro-ops)
One clock can be saved by rewriting the code as follows:

    MOVQ   mm1, [address A]     (1 micro-op)
    OPCODE mm1, mm0             (1 micro-op)
14.6.2. Partial Memory Accesses With MMX Instructions
The MMX registers allow large quantities of data to be moved without stalling the processor. Instead of loading single array values that are 8-, 16-, or 32-bits long, the values can be loaded in a single quadword, with the structure or array pointer being incremented accordingly. Any data that will be manipulated by MMX instructions should be loaded using either:
• The MMX instruction that loads a 64-bit operand (for example, MOVQ MM0, m64), or
• The register-memory form of any MMX instruction that operates on a quadword memory operand (for example, PMADDWD MM0, m64).
All data in MMX registers should be stored using the MMX instruction that stores a 64-bit operand (for example, MOVQ m64, MM0).

The goal of these recommendations is twofold. First, the loading and storing of data in MMX registers is more efficient using the larger quadword data block sizes. Second, using quadword data block sizes helps to avoid the mixing of 8-, 16-, or 32-bit load and store operations with 64-bit MMX load and store operations on the same data. This, in turn, prevents situations in which small loads follow large stores to the same area of memory, or large loads follow small stores to the same area of memory. The Pentium II and Pentium III processors will stall in these situations.
Consider the following examples. The first example illustrates the effects of a large load after a series of small stores to the same area of memory (beginning at memory address mem). The large load will stall the processor:
    MOV  mem, eax        ; store dword to address "mem"
    MOV  mem + 4, ebx    ; store dword to address "mem + 4"
     :
     :
    MOVQ mm0, mem        ; load qword at address "mem", stalls
The MOVQ instruction in this example must wait for the stores to write memory before it can access all the data it requires. This stall can also occur with other data types (for example, when bytes or words are stored and then words or doublewords are read from the same area of memory). By changing the code sequence as follows, the processor can access the data without delay:
    MOVD  mm1, ebx       ; build data into a qword first before storing it to memory
    MOVD  mm2, eax
    PSLLQ mm1, 32
    POR   mm1, mm2
    MOVQ  mem, mm1       ; store SIMD variable to "mem" as a qword
     :
     :
    MOVQ  mm0, mem       ; load qword SIMD variable "mem", no stall
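The packing step in the rewritten sequence can be mirrored in portable C. The sketch below only illustrates the arithmetic performed by the MOVD, PSLLQ, and POR instructions; the function name build_qword is an assumption for this example, and a compiler is not guaranteed to emit MMX code for it.

```c
#include <stdint.h>

/* Combine two doublewords into one quadword so that a single 64-bit
 * store can replace two 32-bit stores. `low` plays the role of EAX
 * (the dword destined for "mem") and `high` the role of EBX (the dword
 * destined for "mem + 4"). */
uint64_t build_qword(uint32_t low, uint32_t high)
{
    uint64_t q = (uint64_t)high;  /* MOVD  mm1, ebx                 */
    q <<= 32;                     /* PSLLQ mm1, 32                  */
    q |= (uint64_t)low;           /* MOVD mm2, eax ; POR mm1, mm2   */
    return q;
}
```

On a little-endian Intel Architecture processor, storing the returned value at "mem" places low at "mem" and high at "mem + 4", which matches the two dword stores of the original sequence.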
The second example illustrates the effect of a series of small loads after a large store to the same area of memory (beginning at memory address mem). Here, the small loads will stall the processor:
    MOVQ mem, mm0        ; store qword to address "mem"
     :
     :
    MOV  bx, mem + 2     ; load word at address "mem + 2" stalls
    MOV  cx, mem + 4     ; load word at address "mem + 4" stalls
The word loads must wait for the MOVQ instruction to write to memory before they can access the data they require. This stall can also occur with other data types (for example, when doublewords or words are stored and then words or bytes are read from the same area of memory). Changing the code sequence as follows allows the processor to access the data without a stall:
    MOVQ  mem, mm0       ; store qword to address "mem"
     :
     :
    MOVQ  mm1, mem       ; load qword at address "mem"
    MOVD  eax, mm1       ; transfer "mem + 2" to ax from
                         ; MMX(TM) register not memory
    PSRLQ mm1, 32
    SHR   eax, 16
    MOVD  ebx, mm1       ; transfer "mem + 4" to bx from
    AND   ebx, 0ffffh    ; MMX register, not memory
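The extraction of the two words from the quadword can likewise be sketched in C. The function names are assumptions chosen for this example; the code shows only the shift-and-mask arithmetic, with the corresponding instructions noted in comments.

```c
#include <stdint.h>

/* Word at byte offset 2 of the quadword (bits 16..31):
 * MOVD eax, mm1 ; SHR eax, 16 */
uint16_t word_at_offset_2(uint64_t q)
{
    uint32_t low = (uint32_t)q;           /* low doubleword of the qword */
    return (uint16_t)(low >> 16);
}

/* Word at byte offset 4 of the quadword (bits 32..47):
 * PSRLQ mm1, 32 ; MOVD ebx, mm1 ; AND ebx, 0ffffh */
uint16_t word_at_offset_4(uint64_t q)
{
    uint32_t high = (uint32_t)(q >> 32);  /* high doubleword of the qword */
    return (uint16_t)(high & 0xffffu);
}
```

Both values are taken from a register-sized copy of the quadword, so no narrow load from the recently stored memory area is needed.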
These transformations, in general, increase the number of instructions required to perform the desired operation. For the Pentium II and Pentium III processors, the performance penalty due to the increased number of instructions is more than offset by the number of clocks saved. For the Pentium processor with MMX technology, however, the increased number of instructions can negatively impact performance. For this reason, careful and efficient coding of these transformations is necessary to minimize any potential negative impact to Pentium processor performance.

14.6.3. Write Allocation Effects
P6 family processors have a write allocate by read-for-ownership cache, whereas the Pentium processor has a no-write-allocate; write through on write miss cache. On P6 family processors, when a write occurs and the write misses the cache, the entire 32-byte cache line is fetched. On the Pentium processor, when the same write miss occurs, the write is simply sent out to memory. Write allocate is generally advantageous, since sequential stores are merged into burst writes, and the data remains in the cache for use by later loads. This is why P6 family processors adopted this write strategy, and why some Pentium processor system designs implement it for the L2 cache. Write allocate can be a disadvantage in code where:
• Just one piece of a cache line is written.
• The entire cache line is not read.
• Strides are larger than the 32-byte cache line.
• Writes to a large number of addresses (greater than 8000).
When a large number of writes occur within an application, and both the stride is longer than the 32-byte cache line and the array is large, every store on a P6 family processor will cause an entire cache line to be fetched. In addition, this fetch will probably replace one (sometimes two) dirty cache line(s). The result is that every store causes an additional cache line fetch and slows down the execution of the program. When many writes occur in a program, the performance decrease can be significant.

The following Sieve of Eratosthenes example program demonstrates these cache effects. In this example, a large array is stepped through in increasing strides while a single element of the array is overwritten with zero at each step.
NOTE
This is a very simplistic example used only to demonstrate cache effects. Many other optimizations are possible in this code.
    boolean array[max];

    for(i=2;i<max;i++) {
        array[i] = 1;
    }

    for(i=2;i<max;i++) {
        if( array[i] ) {
            for(j=i+i;j<max;j+=i) {
                array[j] = 0; /* here we assign memory to 0, causing the
                                 cache line fetch within the j loop */
            }
        }
    }
Two optimizations are available for this specific example:
Optimization 1: The boolean type in this example is a char array. Here, it may well be better to make the boolean array into an array of bits, thereby reducing the size of the array, which in turn reduces the number of cache line fetches. The array is packed so that read-modify-writes are done (since the cache protocol makes every read into a read-modify-write). Unfortunately, in this example, the vast majority of strides are greater than 256 bits (one cache line of bits), so the performance increase is not significant.

Optimization 2: Another optimization is to check if the value is already zero before writing (as shown in the following example), thereby reducing the number of writes to memory (dirty cache lines).
    boolean array[max];

    for(i=2;i<max;i++) {
        array[i] = 1;
    }

    for(i=2;i<max;i++) {
        if( array[i] ) {
            for(j=i+i;j<max;j+=i) {
                if( array[j] != 0 ) { /* check to see if value is already 0 */
                    array[j] = 0;
                }
            }
        }
    }
The external bus activity is reduced by half because most of the time in the Sieve program the data is already zero. By checking first, you need only 1 burst bus cycle for the read and you save the burst bus cycle for every line you do not write. The actual write back of the modified line is no longer needed, therefore saving the extra cycles.
NOTE
This operation benefits the P6 family processors, but it may not enhance the performance of Pentium processors. As such, it should not be considered generic.
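The effect of the check-before-write optimization on the number of stores can be observed with a small self-contained program fragment. The following C sketch is an illustration under stated assumptions: the array size SIEVE_MAX, the function names, and the store counter are all introduced for this example, and the counter measures stores issued by the inner loop, not bus cycles or cache line fetches.

```c
#include <stddef.h>
#include <string.h>

#define SIEVE_MAX 1000  /* assumed array size for this sketch */

/* Runs the sieve over array[0..SIEVE_MAX-1]. When check_first is nonzero,
 * an element is written only if it is not already zero. Returns the
 * number of stores performed by the inner loop. */
unsigned long sieve_count_stores(char array[SIEVE_MAX], int check_first)
{
    unsigned long stores = 0;
    size_t i, j;

    memset(array, 1, SIEVE_MAX);
    array[0] = 0;               /* 0 and 1 are not prime */
    array[1] = 0;

    for (i = 2; i < SIEVE_MAX; i++) {
        if (array[i]) {
            for (j = i + i; j < SIEVE_MAX; j += i) {
                if (!check_first || array[j] != 0) {
                    array[j] = 0;
                    stores++;
                }
            }
        }
    }
    return stores;
}

/* Counts the entries that are still marked as prime. */
unsigned long sieve_count_primes(const char array[SIEVE_MAX])
{
    unsigned long count = 0;
    size_t i;

    for (i = 0; i < SIEVE_MAX; i++) {
        if (array[i]) {
            count++;
        }
    }
    return count;
}
```

Calling sieve_count_stores once with check_first set to 0 and once with it set to 1 produces the same final array, but the checked variant writes each composite entry only once, so it issues fewer stores.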
14.7. ADDRESSING MODES AND REGISTER USAGE

On the Pentium processor, when a register is used as the base component, an additional clock is used if that register is the destination of the immediately preceding instruction (assuming all instructions are already in the prefetch queue). For example:
    add esi, eax         ; esi is destination register
    mov eax, [esi]       ; esi is base, 1 clock penalty
Since the Pentium processor has two integer pipelines, a register used as the base or index component of an effective address calculation (in either pipe) causes an additional clock if that register is the destination of either instruction from the immediately preceding clock (see Figure 14-2). This effect is known as Address Generation Interlock (AGI). To avoid the AGI, the instructions should be separated by at least 1 clock by placing other instructions between them. The MMX registers cannot be used as base or index registers, so the AGI does not apply for MMX register destinations. No penalty occurs in the P6 family processors for the AGI condition.
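[Figure 14-2. Pipeline Example of AGI Stall. The figure shows the pipeline stages PF, D1, D2, E, and WB for consecutive instructions, with an AGI penalty clock inserted between the D2 and E stages of the stalled instruction.]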
Note that some instructions have implicit reads/writes to registers. Instructions that generate addresses implicitly through ESP (such as PUSH, POP, RET, CALL) also suffer from the AGI penalty, as shown in the following example:
    sub  esp, 24
                         ; 1 clock cycle stall
    push ebx
    mov  esp, ebp
                         ; 1 clock cycle stall
    pop  ebp
The PUSH and POP instructions also implicitly write to the ESP register. These writes, however, do not cause an AGI when the next instruction addresses through the ESP register. Pentium processors rename the ESP register from PUSH and POP instructions to avoid the AGI penalty (see the following example):
    push edi
    mov  ebx, [esp]      ; no stall
On Pentium processors, instructions that include both an immediate and a displacement field are pairable in the U-pipe. When it is necessary to use constants, it is usually more efficient to use immediate data instead of loading the constant into a register first. If the same immediate data is used more than once, however, it is faster to load the constant in a register and then use the register multiple times, as illustrated in the following example:
    mov result, 555             ; 555 is immediate, result is
                                ; displacement
    mov word ptr [esp+4], 1     ; 1 is immediate, 4 is displacement
Since MMX instructions have 2-byte opcodes (0FH opcode map), any MMX instruction that uses base or index addressing with a 4-byte displacement to access memory will have a length of 8 bytes. Instructions over 7 bytes can slow macro instruction decoding and should be avoided where possible. It is often possible to reduce the size of such instructions by adding the immediate value to the value in the base or index register, thus removing the immediate field.
14.8. INSTRUCTION LENGTH

On Pentium processors, instructions greater than 7 bytes in length cannot be executed in the V-pipe. In addition, two instructions cannot be pushed into the instruction FIFO unless both are 7 bytes or less in length. If only one instruction is pushed into the instruction FIFO, pairing will not occur unless the instruction FIFO already contains at least one instruction. In code where pairing is very high (as is often the case in MMX code) or after a mispredicted branch, the instruction FIFO may be empty, leading to a loss of pairing whenever the instruction length is over 7 bytes. In addition, the P6 family processors can only decode one instruction at a time when an instruction is longer than 7 bytes. So, for best performance on all Intel processors, use simple instructions that are less than 8 bytes in length.
14.9. PREFIXED OPCODES

On the Pentium processor, an instruction with a prefix is pairable in the U-pipe (PU) if the instruction (without the prefix) is pairable in both pipes (UV) or in the U-pipe (PU). The prefixes are issued to the U-pipe and get decoded in 1 clock for each prefix and then the instruction is issued to the U-pipe and may be paired. For the P6 family and Pentium processors, the prefixes that should be avoided for optimum code execution speeds are:
• Lock.
• Segment override.
• Address size.
• Operand size.
• 2-byte opcode map (0FH) prefix. An exception is the Streaming SIMD Extensions instructions introduced with the Pentium III processor. The first byte of these instructions is 0FH. It is not used as a prefix.
On Pentium processors with MMX technology, a prefix on an instruction can delay the parsing and inhibit pairing of instructions. The following list highlights the effects of instruction prefixes on the Pentium processor instruction FIFO:
• There is no penalty on 0FH-prefix instructions.
• An instruction with a 66H or 67H prefix takes 1 clock for prefix detection, another clock for length calculation, and another clock to enter the instruction FIFO (3 clocks total). It must be the first instruction to enter the instruction FIFO, and a second instruction can be pushed with it.
• Instructions with other prefixes (not 0FH, 66H, or 67H) take 1 additional clock to detect each prefix. These instructions are pushed into the instruction FIFO only as the first instruction. An instruction with two prefixes will take 3 clocks to be pushed into the instruction FIFO (2 clocks for the prefixes and 1 clock for the instruction). A second instruction can be pushed with the first into the instruction FIFO in the same clock.
The impact on performance exists only when the instruction FIFO does not hold at least two entries. As long as the decoder (D1 stage) has two instructions to decode there is no penalty. The instruction FIFO will quickly become empty if the instructions are pulled from the instruction FIFO at the rate of two per clock. So, if the instructions just before the prefixed instruction suffer from a performance loss (for example, no pairing, stalls due to cache misses, misalignments, etc.), then the performance penalty of the prefixed instruction may be masked. On the P6 family processors, instructions longer than 7 bytes in length limit the number of instructions decoded in each clock. Prefixes add 1 to 2 bytes to the length of an instruction, possibly limiting the decoder.
It is recommended that, whenever possible, prefixed instructions not be used or that they be scheduled behind instructions which themselves stall the pipe for some other reason.
14.10. INTEGER INSTRUCTION SELECTION AND OPTIMIZATIONS

This section describes both instruction sequences to avoid and sequences to use when generating optimal assembly code. The information applies to the P6 family processors and the Pentium processors with and without MMX technology.
LEA Instruction. The LEA instruction can be used in the following situations to optimize code execution:

• The LEA instruction may be used sometimes as a three/four operand addition instruction (for example, LEA ECX, [EAX+EBX+4+a]).
• In many cases, an LEA instruction or a sequence of LEA, ADD, SUB and SHIFT instructions may be used to replace constant multiply instructions. For the P6 family processors the constant multiply is faster relative to other instructions than on the Pentium processor; therefore, the trade-off between the two options occurs sooner. It is recommended that the integer multiply instruction be used in code designed for P6 family processor execution.
• The above technique can also be used to avoid copying a register when both operands to an ADD instruction are still needed after the ADD, since the LEA instruction need not overwrite its operands.

The disadvantage of the LEA instruction is that it increases the possibility of an AGI stall with previous instructions. LEA is useful for multiplications by 2, 4, and 8 (shifts of 1, 2, and 3 bits) because on the Pentium processor, LEA can execute in either the U- or V-pipe, but the shift can only execute in the U-pipe. On the P6 family processors, both the LEA and SHIFT instructions are single micro-op instructions that execute in 1 clock.
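The arithmetic behind replacing a constant multiply with LEA-style address computations can be sketched in C. The function names below are assumptions for illustration, and the comments name the instruction forms each expression corresponds to; whether a compiler actually selects LEA for these expressions depends on the compiler and target.

```c
#include <stdint.h>

/* x * 5, as computed by LEA reg, [x + x*4]. */
uint32_t times5(uint32_t x)
{
    return x + (x << 2);
}

/* x * 9, as computed by LEA reg, [x + x*8]. */
uint32_t times9(uint32_t x)
{
    return x + (x << 3);
}

/* x * 10, as computed by LEA tmp, [x + x*4] followed by ADD tmp, tmp. */
uint32_t times10(uint32_t x)
{
    uint32_t tmp = x + (x << 2);
    return tmp + tmp;
}

/* Three-operand addition, as computed by LEA ecx, [eax + ebx + 4].
 * Neither source operand is overwritten. */
uint32_t add_with_displacement(uint32_t eax, uint32_t ebx)
{
    return eax + ebx + 4u;
}
```

Unsigned 32-bit arithmetic is used so that the results wrap modulo 2^32, as the register computations do.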
Complex Instructions. For greater execution speed, avoid using complex instructions (for example, LOOP, ENTER, or LEAVE). Use sequences of simple instructions instead to accomplish the function of a complex instruction.

Zero-Extension of Short Integers. On the Pentium processor, the MOVZX instruction has a prefix and takes 3 clocks to execute, totaling 4 clocks. It is recommended that the following sequence be used instead of the MOVZX instruction:
    xor eax, eax
    mov al, mem
If this code occurs within a loop, it may be possible to pull the XOR instruction out of the loop if the only assignment to EAX is the MOV AL, MEM. This has greater importance for the Pentium processor since the MOVZX is not pairable and the new sequence may be paired with adjacent instructions. In order to avoid a partial register stall on the P6 family processors, special hardware has been implemented that allows this code sequence to execute without a stall. Even so, the MOVZX instruction is a better choice for the P6 family processors than the alternative sequences.
PUSH Mem. The PUSH mem instruction takes 4 clocks for the Intel486 processor. It is recommended that the following sequence be used in place of a PUSH mem instruction because it takes only 2 clocks for the Intel486 processor and increases pairing opportunity for the Pentium processor.
    mov  reg, mem
    push reg
Short Opcodes. Use 1-byte-long instructions as much as possible. This will reduce code size and help increase instruction density in the instruction cache. The most common example is using the INC and DEC instructions rather than adding or subtracting the constant 1 with an ADD or SUB instruction. Another common example is using the PUSH and POP instructions instead of the equivalent sequence.

8/16 Bit Operands. With 8-bit operands, try to use the byte opcodes, rather than using 32-bit operations on sign and zero extended bytes. Prefixes for operand size override apply to 16-bit operands, not to 8-bit operands. Sign extension is usually quite expensive. Often, the semantics can be maintained by zero extending 16-bit operands. Specifically, the C code in the following example does not need sign extension, nor does it need prefixes for operand size overrides.
    static short int a, b;
    if (a==b) {
        ...
    }
Code for comparing these 16-bit operands might be:

    U Pipe              V Pipe
    xor  eax, eax       xor ebx, ebx    ; 1
    movw ax, [a]                        ; 2 (prefix) + 1
    movw bx, [b]                        ; 4 (prefix) + 1
    cmp  eax, ebx                       ; 6
Of course, this can only be done under certain circumstances, but the circumstances tend to be quite common. This would not work if the compare was for greater than, less than, greater than or equal, and so on, or if the values in EAX or EBX were to be used in another operation where sign extension was required. The P6 family processors provide special support for the XOR reg, reg instruction where both operands point to the same register, recognizing that clearing a register does not depend on the old value of the register. Additionally, special support is provided for the above specific code sequence to avoid the partial stall.
The following straightforward method may be slower on Pentium processors.

    movsx eax, a         ; 1 prefix + 3
    movsx ebx, b         ; 5
    cmp   ebx, eax       ; 9
However, the P6 family processors have improved the performance of the MOVZX instructions to reduce the prevalence of partial stalls. Code written specifically for the P6 family processors should use the MOVZX instructions.
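The claim that zero extension preserves equality comparisons of 16-bit signed values, but not ordering comparisons, can be checked with a short C sketch. The function names are assumptions introduced for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Equality test on zero-extended copies of two signed 16-bit values,
 * mirroring XOR reg, reg followed by a 16-bit load into the low word. */
bool equal_zero_extended(int16_t a, int16_t b)
{
    uint32_t ea = (uint16_t)a;   /* zero-extend 16 -> 32 bits */
    uint32_t eb = (uint16_t)b;
    return ea == eb;
}

/* Ordering test on the same zero-extended copies. This does NOT agree
 * with a signed comparison when exactly one operand is negative. */
bool less_zero_extended(int16_t a, int16_t b)
{
    uint32_t ea = (uint16_t)a;
    uint32_t eb = (uint16_t)b;
    return ea < eb;
}
```

With a = -1 and b = 1, equal_zero_extended agrees with a == b, but less_zero_extended reports false even though -1 < 1, because -1 zero-extends to 0FFFFH. This is why the technique is limited to equality and inequality tests.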
Compares. Use the TEST instruction when comparing a value in a register with 0. TEST essentially ANDs the operands together without writing to a destination register. If a value is ANDed with itself and the result sets the zero condition flag, the value was zero. TEST is preferred over an AND instruction because AND writes the result register which may subsequently cause an AGI or an artificial output dependence on the P6 family processors. TEST is better than CMP .., 0 because the instruction size is smaller. Use the TEST instruction when comparing the result of a boolean AND with an immediate constant for equality or inequality if the register is EAX (if (avar & 8) { }). On the Pentium processor, the TEST instruction is a 1 clock pairable instruction when the form is TEST EAX, imm or TEST reg, reg. Other forms of TEST take 2 clocks and do not pair.
Address Calculations. Pull address calculations into load and store instructions. Internally, memory reference instructions can have 4 operands: a relocatable load-time constant, an immediate constant, a base register, and a scaled index register. (In the segmented model, a segment register may constitute an additional operand in the linear address calculation.) In many cases, several integer instructions can be eliminated by fully using the operands of memory references.

Clearing a Register. The preferred sequence to move zero to a register is XOR reg, reg. This sequence saves code space but sets the condition codes. In contexts where the condition codes must be preserved, use MOV reg, 0.

Integer Divide. Typically, an integer divide is preceded by a CDQ instruction. (Divide instructions use EDX:EAX as the dividend and CDQ sets up EDX.) It is better to copy EAX into EDX, then right shift EDX 31 places to sign extend. On the Pentium processor, the copy/shift takes the same number of clocks as CDQ, but the copy/shift scheme allows two other instructions to execute at the same time. If the value is known to be positive, use XOR EDX, EDX. On the P6 family processors, the CDQ instruction is faster, because CDQ is a single micro-op instruction as opposed to two instructions for the copy/shift sequence.
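The value that the copy/shift sequence leaves in EDX can be expressed in C as follows. This is a sketch of the arithmetic only; the function name sign_fill is an assumption, and unsigned operations are used because right-shifting a negative signed value is implementation-defined in C.

```c
#include <stdint.h>

/* Returns the value EDX must hold before a signed divide of EAX:
 * all ones when bit 31 of EAX is set (EAX negative), zero otherwise.
 * This matches the result of CDQ, or of copying EAX into EDX and
 * arithmetically shifting EDX right by 31 places. */
uint32_t sign_fill(uint32_t eax)
{
    return 0u - (eax >> 31);
}
```

When the dividend is known to be non-negative, the result is always zero, which corresponds to the XOR EDX, EDX shortcut mentioned above.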
Prolog Sequences. Be careful to avoid AGIs in the procedure and function prolog sequences due to register ESP. Since PUSH can pair with other PUSH instructions, saving callee-saved registers on entry to functions should use these instructions. If possible, load parameters before decrementing ESP.
In routines that do not call other routines (leaf routines), use ESP as the base register to free up EBP. If you are not using the 32-bit flat model, remember that EBP cannot be used as a general purpose base register because it references the stack segment.
Avoid Compares with Immediate Zero. Often when a value is compared with zero, the operation producing the value sets condition codes that can be tested directly by a Jcc instruction. The most notable exceptions are the MOV and LEA instructions. In these cases, use the TEST instruction.

Epilog Sequence. If only 4 bytes were allocated in the stack frame for the current function, instead of incrementing the stack pointer by 4, use POP instructions to prevent AGIs. For the Pentium processor, use two pops for eight bytes.
CHAPTER 15 DEBUGGING AND PERFORMANCE MONITORING

The Intel Architecture provides extensive debugging facilities for use in debugging code and monitoring code execution and processor performance. These facilities are valuable for debugging applications software, system software, and multitasking operating systems.

The debugging support is accessed through the debug registers (DR0 through DR7) and two model-specific registers (MSRs). The debug registers of the Intel Architecture processors hold the addresses of memory and I/O locations, called breakpoints. Breakpoints are user-selected locations in a program, a data-storage area in memory, or specific I/O ports where a programmer or system designer wishes to halt execution of a program and examine the state of the processor by invoking debugger software. A debug exception (#DB) is generated when a memory or I/O access is made to one of these breakpoint addresses. A breakpoint is specified for a particular form of memory or I/O access, such as a memory read and/or write operation or an I/O read and/or write operation. The debug registers support both instruction breakpoints and data breakpoints.

The MSRs (which were introduced into the Intel Architecture in the P6 family processors) monitor branches, interrupts, and exceptions and record the addresses of the last branch, interrupt or exception taken and the last branch taken before an interrupt or exception.
15.1. OVERVIEW OF THE DEBUGGING SUPPORT FACILITIES

The following processor facilities support debugging and performance monitoring:
• Debug exception (#DB): Transfers program control to a debugger procedure or task when a debug event occurs.
• Breakpoint exception (#BP): Transfers program control to a debugger procedure or task when an INT 3 instruction is executed.
• Breakpoint-address registers (DR0 through DR3): Specify the addresses of up to 4 breakpoints.
• Debug status register (DR6): Reports the conditions that were in effect when a debug or breakpoint exception was generated.
• Debug control register (DR7): Specifies the forms of memory or I/O access that cause breakpoints to be generated.
• DebugCtlMSR register: Enables last branch, interrupt, and exception recording; taken branch traps; the breakpoint reporting pins; and trace messages.
• LastBranchToIP and LastBranchFromIP MSRs: Specify the source and destination addresses of the last branch, interrupt, or exception taken. The address saved is the offset in the code segment of the branch (source) or target (destination) instruction.
• LastExceptionToIP and LastExceptionFromIP MSRs: Specify the source and destination addresses of the last branch that was taken prior to an exception or interrupt being generated. The address saved is the offset in the code segment of the branch (source) or target (destination) instruction.
• T (trap) flag, TSS: Generates a debug exception (#DB) when an attempt is made to switch to a task with the T flag set in its TSS.
• RF (resume) flag, EFLAGS register: Suppresses multiple exceptions to the same instruction.
• TF (trap) flag, EFLAGS register: Generates a debug exception (#DB) after every execution of an instruction.
• Breakpoint instruction (INT 3): Generates a breakpoint exception (#BP), which transfers program control to the debugger procedure or task. This instruction is an alternative way to set code breakpoints. It is especially useful when more than four breakpoints are desired, or when breakpoints are being placed in the source code.
These facilities allow a debugger to be called either as a separate task or as a procedure in the context of the current program or task. The following conditions can be used to invoke the debugger:
• Task switch to a specific task.
• Execution of the breakpoint instruction.
• Execution of any instruction.
• Execution of an instruction at a specified address.
• Read or write of a byte, word, or doubleword at a specified memory address.
• Write to a byte, word, or doubleword at a specified memory address.
• Input of a byte, word, or doubleword at a specified I/O address.
• Output of a byte, word, or doubleword at a specified I/O address.
• Attempt to change the contents of a debug register.
15.2. DEBUG REGISTERS

The eight debug registers (refer to Figure 15-1) control the debug operation of the processor. These registers can be written to and read using the move to or from debug register form of the MOV instruction. A debug register may be the source or destination operand for one of these instructions. The debug registers are privileged resources; a MOV instruction that accesses these registers can only be executed in real-address mode, in SMM, or in protected mode at a CPL of 0. An attempt to read or write the debug registers from any other privilege level generates a general-protection exception (#GP).
[Figure 15-1. Debug Registers. The figure shows the bit layouts of the 32-bit registers DR0 through DR7: DR0 through DR3 hold the breakpoint 0 through 3 linear addresses; DR4 and DR5 are reserved; DR6 holds the B0 through B3, BD, BS, and BT flags; DR7 holds the L0 through L3, G0 through G3, LE, GE, and GD flags and the R/W0 through R/W3 and LEN0 through LEN3 fields. Reserved bits must not be defined.]
The primary function of the debug registers is to set up and monitor from 1 to 4 breakpoints, numbered 0 through 3. For each breakpoint, the following information can be specified and detected with the debug registers:
• The linear address where the breakpoint is to occur.
• The length of the breakpoint location (1, 2, or 4 bytes).
• The operation that must be performed at the address for a debug exception to be generated.
• Whether the breakpoint is enabled.
• Whether the breakpoint condition was present when the debug exception was generated.
The following paragraphs describe the functions of flags and fields in the debug registers.
15.2.1. Debug Address Registers (DR0-DR3)

Each of the four debug-address registers (DR0 through DR3) holds the 32-bit linear address of a breakpoint (refer to Figure 15-1). Breakpoint comparisons are made before physical address translation occurs. Each breakpoint condition is specified further by the contents of debug register DR7.
15.2.2. Debug Registers DR4 and DR5

Debug registers DR4 and DR5 are reserved when debug extensions are enabled (when the DE flag in control register CR4 is set), and attempts to reference the DR4 and DR5 registers cause an invalid-opcode exception (#UD) to be generated. When debug extensions are not enabled (when the DE flag is clear), these registers are aliased to debug registers DR6 and DR7.
15.2.3. Debug Status Register (DR6)

The debug status register (DR6) reports the debug conditions that were sampled at the time the last debug exception was generated (refer to Figure 15-1). Updates to this register only occur when an exception is generated. The flags in this register show the following information:

B0 through B3 (breakpoint condition detected) flags (bits 0 through 3)
    Indicate (when set) that the associated breakpoint condition was met when a debug exception was generated. These flags are set if the condition described for each breakpoint by the LENn and R/Wn fields in debug control register DR7 is true. They are set even if the breakpoint is not enabled by the Ln and Gn flags in register DR7.

BD (debug register access detected) flag (bit 13)
    Indicates that the next instruction in the instruction stream will access one of the debug registers (DR0 through DR7). This flag is enabled when the GD (general detect) flag in debug control register DR7 is set. Refer to Section 15.2.4., Debug Control Register (DR7) for further explanation of the purpose of this flag.
BS (single step) flag (bit 14)
    Indicates (when set) that the debug exception was triggered by the single-step execution mode (enabled with the TF flag in the EFLAGS register). The single-step mode is the highest-priority debug exception. When the BS flag is set, any of the other debug status bits also may be set.

BT (task switch) flag (bit 15)
    Indicates (when set) that the debug exception resulted from a task switch where the T flag (debug trap flag) in the TSS of the target task was set (refer to Section 6.2.1., Task-State Segment (TSS), in Chapter 6, Task Management, for the format of a TSS). There is no flag in debug control register DR7 to enable or disable this exception; the T flag of the TSS is the only enabling flag.

Note that the contents of the DR6 register are never cleared by the processor. To avoid any confusion in identifying debug exceptions, the debug handler should clear the register before returning to the interrupted program or task.
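The bit positions described above can be collected into a small decoder. The following C sketch is illustrative only: the structure, macro, and function names are assumptions for this example, and the DR6 value itself must be obtained by privileged code (for example, in a debug exception handler), which is outside the scope of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define DR6_B(n)  (1u << (n))   /* B0 through B3: bits 0 through 3 */
#define DR6_BD    (1u << 13)    /* debug register access detected  */
#define DR6_BS    (1u << 14)    /* single step                     */
#define DR6_BT    (1u << 15)    /* task switch                     */

/* Hypothetical container for the decoded status flags. */
struct dr6_status {
    bool b[4];   /* breakpoint condition detected, breakpoints 0..3 */
    bool bd;
    bool bs;
    bool bt;
};

/* Decodes a raw DR6 value into its individual flags. */
struct dr6_status dr6_decode(uint32_t dr6)
{
    struct dr6_status s;
    unsigned n;

    for (n = 0; n < 4; n++) {
        s.b[n] = (dr6 & DR6_B(n)) != 0;
    }
    s.bd = (dr6 & DR6_BD) != 0;
    s.bs = (dr6 & DR6_BS) != 0;
    s.bt = (dr6 & DR6_BT) != 0;
    return s;
}
```

Because several flags may be set at once (for example, BS together with one of B0 through B3), a handler would normally examine every field of the decoded structure before clearing DR6.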
15.2.4. Debug Control Register (DR7)

The debug control register (DR7) enables or disables breakpoints and sets breakpoint conditions (refer to Figure 15-1). The flags and fields in this register control the following things:

L0 through L3 (local breakpoint enable) flags (bits 0, 2, 4, and 6)
Enable (when set) the breakpoint condition for the associated breakpoint for the current task. When a breakpoint condition is detected and its associated Ln flag is set, a debug exception is generated. The processor automatically clears these flags on every task switch to avoid unwanted breakpoint conditions in the new task.

G0 through G3 (global breakpoint enable) flags (bits 1, 3, 5, and 7)
Enable (when set) the breakpoint condition for the associated breakpoint for all tasks. When a breakpoint condition is detected and its associated Gn flag is set, a debug exception is generated. The processor does not clear these flags on a task switch, allowing a breakpoint to be enabled for all tasks.

LE and GE (local and global exact breakpoint enable) flags (bits 8 and 9)
(Not supported in the P6 family processors.) When set, these flags cause the processor to detect the exact instruction that caused a data breakpoint condition. For backward and forward compatibility with other Intel Architecture processors, Intel recommends that the LE and GE flags be set to 1 if exact breakpoints are required.

GD (general detect enable) flag (bit 13)
Enables (when set) debug-register protection, which causes a debug exception to be generated prior to any MOV instruction that accesses a debug register. When such a condition is detected, the BD flag in debug status register DR6 is set prior to generating the exception. This condition is provided to support in-circuit emulators. (When the emulator needs to access the debug registers, emulator software can set the GD flag to prevent interference from the program
currently executing on the processor.) The processor clears the GD flag upon entering the debug exception handler, to allow the handler access to the debug registers.

R/W0 through R/W3 (read/write) fields (bits 16, 17, 20, 21, 24, 25, 28, and 29)
Specify the breakpoint condition for the corresponding breakpoint. The DE (debug extensions) flag in control register CR4 determines how the bits in the R/Wn fields are interpreted. When the DE flag is set, the processor interprets these bits as follows:
00 - Break on instruction execution only.
01 - Break on data writes only.
10 - Break on I/O reads or writes.
11 - Break on data reads or writes but not instruction fetches.
When the DE flag is clear, the processor interprets the R/Wn bits the same as for the Intel386 and Intel486 processors, which is as follows:
00 - Break on instruction execution only.
01 - Break on data writes only.
10 - Undefined.
11 - Break on data reads or writes but not instruction fetches.

LEN0 through LEN3 (Length) fields (bits 18, 19, 22, 23, 26, 27, 30, and 31)
Specify the size of the memory location at the address specified in the corresponding breakpoint address register (DR0 through DR3). These fields are interpreted as follows:
00 - 1-byte length
01 - 2-byte length
10 - Undefined
11 - 4-byte length
If the corresponding R/Wn field in register DR7 is 00 (instruction execution), then the LENn field should also be 00. The effect of using any other length is undefined. Refer to Section 15.2.5., Breakpoint Field Recognition for further information on the use of these fields.
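The DR7 field positions described above can be packed with a few shifts. The following C sketch is a minimal illustration only: the enumerator names and the function dr7_set_breakpoint are hypothetical, and the sketch only computes a value; loading it into DR7 would still require a MOV to the debug register at privilege level 0.

```c
#include <stdint.h>

/* R/Wn encodings from the text (10 is I/O only when CR4.DE is set). */
enum { DR7_RW_EXEC = 0, DR7_RW_WRITE = 1, DR7_RW_IO = 2, DR7_RW_READWRITE = 3 };

/* LENn encodings from the text (10 is undefined and is not offered here). */
enum { DR7_LEN_1 = 0, DR7_LEN_2 = 1, DR7_LEN_4 = 3 };

/* Return a copy of dr7 with breakpoint n (0..3) reconfigured.
 * Ln is bit 2n, Gn is bit 2n+1, R/Wn starts at bit 16+4n, and
 * LENn starts at bit 18+4n. Hypothetical helper for illustration. */
uint32_t dr7_set_breakpoint(uint32_t dr7, unsigned n, unsigned rw,
                            unsigned len, int local, int global)
{
    unsigned field_shift = 16u + 4u * n;

    dr7 &= ~(0xFu << field_shift);   /* clear R/Wn and LENn */
    dr7 &= ~(0x3u << (2u * n));      /* clear Ln and Gn */

    dr7 |= (uint32_t)(rw & 3u) << field_shift;
    dr7 |= (uint32_t)(len & 3u) << (field_shift + 2u);
    if (local)
        dr7 |= 1u << (2u * n);
    if (global)
        dr7 |= 1u << (2u * n + 1u);
    return dr7;
}
```

For example, dr7_set_breakpoint(0, 0, DR7_RW_WRITE, DR7_LEN_4, 1, 0) describes a locally enabled 4-byte write breakpoint on DR0. For an instruction breakpoint the caller would pass DR7_RW_EXEC together with DR7_LEN_1, since the text states that any other length is undefined.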
15.2.5. Breakpoint Field Recognition

The breakpoint address registers (debug registers DR0 through DR3) and the LENn fields for each breakpoint define a range of sequential byte addresses for a data or I/O breakpoint. The LENn fields permit specification of a 1-, 2-, or 4-byte range beginning at the linear address specified in the corresponding debug register (DRn). Two-byte ranges must be aligned on word boundaries and 4-byte ranges must be aligned on doubleword boundaries. I/O breakpoint addresses are zero extended from 16 to 32 bits for purposes of comparison with the breakpoint address in the selected debug register. These requirements are enforced by the processor; it uses the LENn field bits to mask the lower address bits in the debug registers. Unaligned data or I/O breakpoint addresses do not yield the expected results.
A data breakpoint for reading or writing data is triggered if any of the bytes participating in an access is within the range defined by a breakpoint address register and its LENn field. Table 15-1 gives an example setup of the debug registers and the data accesses that would subsequently trap or not trap on the breakpoints.
Table 15-1. Breakpointing Examples

Debug Register Setup

Debug Register   R/Wn                     Breakpoint Address   LENn
DR0              R/W0 = 11 (Read/Write)   A0001H               LEN0 = 00 (1 byte)
DR1              R/W1 = 01 (Write)        A0002H               LEN1 = 00 (1 byte)
DR2              R/W2 = 11 (Read/Write)   B0002H               LEN2 = 01 (2 bytes)
DR3              R/W3 = 01 (Write)        C0000H               LEN3 = 11 (4 bytes)

Data Accesses

Operation                           Address   Access Length (In Bytes)
Data operations that trap
- Read or write                     A0001H    1
- Read or write                     A0001H    2
- Write                             A0002H    1
- Write                             A0002H    2
- Read or write                     B0001H    4
- Read or write                     B0002H    1
- Read or write                     B0002H    2
- Write                             C0000H    4
- Write                             C0001H    2
- Write                             C0003H    1
Data operations that do not trap
- Read or write                     A0000H    1
- Read                              A0002H    1
- Read or write                     A0003H    4
- Read or write                     B0000H    2
- Read                              C0000H    2
- Read or write                     C0004H    4
A data breakpoint for an unaligned operand can be constructed using two breakpoints, where each breakpoint is byte-aligned, and the two breakpoints together cover the operand. These breakpoints generate exceptions only for the operand, not for any neighboring bytes. Instruction breakpoint addresses must have a length specification of 1 byte (the LENn field is set to 00). The behavior of code breakpoints for other operand sizes is undefined. The processor recognizes an instruction breakpoint address only when it points to the first byte of an instruction. If the instruction has any prefixes, the breakpoint address must point to the first prefix.
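The matching rule for data breakpoints (any byte of the access falling inside the aligned range selected by DRn and LENn) can be modeled in a few lines of C. This is a sketch of the rule as stated in this section, not a description of the processor's internal logic; the type data_breakpoint, the array table_15_1_setup, and both function names are hypothetical. The array reproduces the register setup of Table 15-1 so that the rows of the table can be checked against the model.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one data breakpoint. */
typedef struct {
    uint32_t address;  /* linear address held in DRn */
    unsigned rw;       /* R/Wn field: 1 = write, 3 = read or write */
    unsigned len;      /* LENn field: 0 = 1 byte, 1 = 2 bytes, 3 = 4 bytes */
} data_breakpoint;

/* Register setup taken from Table 15-1. */
const data_breakpoint table_15_1_setup[4] = {
    { 0xA0001u, 3u, 0u },  /* DR0: read/write, 1 byte  */
    { 0xA0002u, 1u, 0u },  /* DR1: write,      1 byte  */
    { 0xB0002u, 3u, 1u },  /* DR2: read/write, 2 bytes */
    { 0xC0000u, 1u, 3u },  /* DR3: write,      4 bytes */
};

/* True if an access of 'size' bytes at 'addr' touches the breakpoint range. */
bool data_access_traps(const data_breakpoint *bp, uint32_t addr,
                       unsigned size, bool is_write)
{
    /* The valid LENn encodings (0, 1, 3) equal the range length minus one,
     * which is also the mask applied to the low address bits. */
    uint64_t mask = bp->len;
    uint64_t first = (uint64_t)bp->address & ~mask;
    uint64_t last = first + mask;
    uint64_t access_last = (uint64_t)addr + size - 1u;

    if (bp->rw != 1u && bp->rw != 3u)
        return false;                 /* not a data breakpoint */
    if (bp->rw == 1u && !is_write)
        return false;                 /* write-only breakpoint, read access */
    return addr <= last && access_last >= first;
}

/* True if any of the 'count' breakpoints would trap the access. */
bool any_breakpoint_traps(const data_breakpoint *bps, size_t count,
                          uint32_t addr, unsigned size, bool is_write)
{
    for (size_t i = 0; i < count; i++)
        if (data_access_traps(&bps[i], addr, size, is_write))
            return true;
    return false;
}
```

With this model, a 4-byte read at B0001H traps because it overlaps the 2-byte range at B0002H, while a 2-byte read at C0000H does not trap because DR3 is set up for writes only.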
15.3. DEBUG EXCEPTIONS

The Intel Architecture processors dedicate two interrupt vectors to handling debug exceptions: vector 1 (debug exception, #DB) and vector 3 (breakpoint exception, #BP). The following
sections describe how these exceptions are generated and typical exception handler operations for handling these exceptions.
15.3.1. Debug Exception (#DB) - Interrupt Vector 1

The debug-exception handler is usually a debugger program or is part of a larger software system. The processor generates a debug exception for any of several conditions. The debugger can check flags in the DR6 and DR7 registers to determine which condition caused the exception and which other conditions might also apply. Table 15-2 shows the states of these flags following the generation of each kind of breakpoint condition.
Table 15-2. Debug Exception Conditions
Debug or Breakpoint Condition              DR6 Flags Tested            DR7 Flags Tested   Exception Class
Single-step trap                           BS = 1                                         Trap
Instruction breakpoint, at addresses       Bn = 1 and                  R/Wn = 0           Fault
  defined by DRn and LENn                    (Gn or Ln = 1)
Data write breakpoint, at addresses        Bn = 1 and                  R/Wn = 1           Trap
  defined by DRn and LENn                    (Gn or Ln = 1)
I/O read or write breakpoint, at           Bn = 1 and                  R/Wn = 2           Trap
  addresses defined by DRn and LENn          (Gn or Ln = 1)
Data read or write (but not instruction    Bn = 1 and                  R/Wn = 3           Trap
  fetches), at addresses defined by          (Gn or Ln = 1)
  DRn and LENn
General detect fault, resulting from an    BD = 1                                         Fault
  attempt to modify debug registers
  (usually in conjunction with
  in-circuit emulation)
Task switch                                BT = 1                                         Trap
Instruction-breakpoint and general-detect conditions (refer to Section 15.3.1.3., General-Detect Exception Condition) result in faults; other debug-exception conditions result in traps. The debug exception may report either or both at one time. The following sections describe each class of debug exception. Refer to Section 5.12., Exception and Interrupt Reference in Chapter 5, Interrupt and Exception Handling for additional information about this exception.
15.3.1.1. INSTRUCTION-BREAKPOINT EXCEPTION CONDITION
The processor reports an instruction breakpoint when it attempts to execute an instruction at an address specified in a breakpoint-address register (DR0 through DR3) that has been set up to detect instruction execution (R/W flag is set to 0). Upon reporting the instruction breakpoint, the processor generates a fault-class debug exception (#DB) before it executes the target instruction
for the breakpoint. Instruction breakpoints are the highest priority debug exceptions and are guaranteed to be serviced before any other exceptions that may be detected during the decoding or execution of an instruction. Because the debug exception for an instruction breakpoint is generated before the instruction is executed, if the instruction breakpoint is not removed by the exception handler, the processor will detect the instruction breakpoint again when the instruction is restarted and generate another debug exception. To prevent looping on an instruction breakpoint, the Intel Architecture provides the RF flag (resume flag) in the EFLAGS register (refer to Section 2.3., System Flags and Fields in the EFLAGS Register in Chapter 2, System Architecture Overview). When the RF flag is set, the processor ignores instruction breakpoints. All Intel Architecture processors manage the RF flag as follows. The processor sets the RF flag automatically prior to calling an exception handler for any fault-class exception except a debug exception that was generated in response to an instruction breakpoint. For debug exceptions resulting from instruction breakpoints, the processor does not set the RF flag prior to calling the debug exception handler. The debug exception handler then has the option of disabling the instruction breakpoint or setting the RF flag in the EFLAGS image on the stack. If the RF flag in the EFLAGS image is set when the processor returns from the exception handler, it is copied into the RF flag in the EFLAGS register by the IRETD or task switch instruction that causes the return. The processor then ignores instruction breakpoints for the duration of the next instruction. (Note that the POPF, POPFD, and IRET instructions do not transfer the RF image into the EFLAGS register.) 
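The handler's second option, setting the RF flag in the EFLAGS image on the stack, can be sketched in C as follows. The function name resume_eflags and the idea of passing DR6 and DR7 as plain values are assumptions made for this illustration; a real handler would obtain those values with privileged MOV instructions and would modify the image in its own stack frame. The RF flag occupies bit 16 of EFLAGS.

```c
#include <stdint.h>
#include <stdbool.h>

#define EFLAGS_RF (1u << 16)  /* resume flag */

/* Return the EFLAGS image to restore on IRETD. RF is set when some
 * breakpoint n was hit (Bn set in DR6), is enabled (Ln or Gn set in DR7),
 * and is an instruction breakpoint (R/Wn == 00 in DR7), so that the
 * restarted instruction does not fault on the same breakpoint again. */
uint32_t resume_eflags(uint32_t eflags_image, uint32_t dr6, uint32_t dr7)
{
    for (unsigned n = 0; n < 4; n++) {
        bool hit = ((dr6 >> n) & 1u) != 0;
        bool enabled = ((dr7 >> (2u * n)) & 3u) != 0;
        unsigned rw = (dr7 >> (16u + 4u * n)) & 3u;

        if (hit && enabled && rw == 0u)
            return eflags_image | EFLAGS_RF;
    }
    return eflags_image;
}
```

The enabled check is included because the Bn flags in DR6 are set even for breakpoints that are not enabled in DR7.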
Setting the RF flag does not prevent other types of debug-exception conditions (such as I/O or data breakpoints) from being detected, nor does it prevent nondebug exceptions from being generated. After the instruction is successfully executed, the processor clears the RF flag in the EFLAGS register, except after an IRETD instruction or after a JMP, CALL, or INT n instruction that causes a task switch. (Note that the processor also does not set the RF flag when calling exception or interrupt handlers for trap-class exceptions, for hardware interrupts, or for software-generated interrupts.) For the Pentium processor, when an instruction breakpoint coincides with another fault-type exception (such as a page fault), the processor may generate one spurious debug exception after the second exception has been handled, even though the debug exception handler set the RF flag in the EFLAGS image. To prevent this spurious exception with Pentium processors, all fault-class exception handlers should set the RF flag in the EFLAGS image.
15.3.1.2. DATA MEMORY AND I/O BREAKPOINT EXCEPTION CONDITIONS
Data memory and I/O breakpoints are reported when the processor attempts to access a memory or I/O address specified in a breakpoint-address register (DR0 through DR3) that has been set up to detect data or I/O accesses (R/W flag is set to 1, 2, or 3). The processor generates the exception after it executes the instruction that made the access, so these breakpoint conditions cause a trap-class exception to be generated. Because data breakpoints are traps, the original data is overwritten before the trap exception is generated. If a debugger needs to save the contents of a write breakpoint location, it should save the original contents before setting the breakpoint. The handler can report the saved value after the breakpoint is triggered. The address in the debug registers can be used to locate the new value stored by the instruction that triggered the breakpoint.
The Intel486 and later Intel Architecture processors ignore the GE and LE flags in DR7. In the Intel386 processor, exact data breakpoint matching does not occur unless it is enabled by setting the LE and/or the GE flags. The P6 family processors, however, are unable to report data breakpoints exactly for the REP MOVS and REP STOS instructions until the completion of the iteration after the iteration in which the breakpoint occurred. For repeated INS and OUTS instructions that generate an I/O-breakpoint debug exception, the processor generates the exception after the completion of the first iteration. Repeated INS and OUTS instructions generate an I/O-breakpoint debug exception after the iteration in which the memory address breakpoint location is accessed.
15.3.1.3. GENERAL-DETECT EXCEPTION CONDITION
When the GD flag in DR7 is set, the general-detect debug exception occurs when a program attempts to access any of the debug registers (DR0 through DR7) at the same time they are being used by another application, such as an emulator or debugger. This additional protection feature guarantees full control over the debug registers when required. The debug exception handler can detect this condition by checking the state of the BD flag of the DR6 register. The processor generates the exception before it executes the MOV instruction that accesses a debug register, which causes a fault-class exception to be generated.
15.3.1.4. SINGLE-STEP EXCEPTION CONDITION
The processor generates a single-step debug exception if (while an instruction is being executed) it detects that the TF flag in the EFLAGS register is set. The exception is a trap-class exception, because the exception is generated after the instruction is executed. (Note that the processor does not generate this exception after an instruction that sets the TF flag. For example, if the POPF instruction is used to set the TF flag, a single-step trap does not occur until after the instruction that follows the POPF instruction.) The processor clears the TF flag before calling the exception handler. If the TF flag was set in a TSS at the time of a task switch, the exception occurs after the first instruction is executed in the new task. The TF flag normally is not cleared by privilege changes inside a task. The INT n and INTO instructions, however, do clear this flag. Therefore, software debuggers that single-step code must recognize and emulate INT n or INTO instructions rather than executing them directly. To maintain protection, the operating system should check the CPL after any single-step trap to see if single stepping should continue at the current privilege level. The interrupt priorities guarantee that, if an external interrupt occurs, single stepping stops. When both an external interrupt and a single-step interrupt occur together, the single-step interrupt is processed first. This operation clears the TF flag. After saving the return address or switching tasks, the external interrupt input is examined before the first instruction of the single-step handler executes. If the external interrupt is still pending, then it is serviced. The external interrupt handler does not run in single-step mode. To single step an interrupt handler, set a breakpoint inside the handler and then set the TF flag.
15.3.1.5. TASK-SWITCH EXCEPTION CONDITION
The processor generates a debug exception after a task switch if the T flag of the new task's TSS is set. This exception is generated after program control has passed to the new task, and after the first instruction of that task is executed. The exception handler can detect this condition by examining the BT flag of the DR6 register. Note that, if the debug exception handler is a task, the T bit of its TSS should not be set. Failure to observe this rule will put the processor in a loop.
15.3.2. Breakpoint Exception (#BP) - Interrupt Vector 3

The breakpoint exception (interrupt 3) is caused by execution of an INT 3 instruction (refer to Section 5.12., Exception and Interrupt Reference in Chapter 5, Interrupt and Exception Handling). Debuggers use breakpoint exceptions in the same way that they use the breakpoint registers; that is, as a mechanism for suspending program execution to examine registers and memory locations. With earlier Intel Architecture processors, breakpoint exceptions are used extensively for setting instruction breakpoints. With the Intel386 and later Intel Architecture processors, it is more convenient to set breakpoints with the breakpoint-address registers (DR0 through DR3). However, the breakpoint exception still is useful for breakpointing debuggers, because the breakpoint exception can call a separate exception handler. The breakpoint exception is also useful when it is necessary to set more breakpoints than there are debug registers or when breakpoints are being placed in the source code of a program under development.
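A debugger that needs more breakpoints than there are debug registers typically plants the one-byte INT 3 opcode (CCH) over the first byte of the target instruction and restores the saved byte later. The C sketch below shows only this bookkeeping on an ordinary byte buffer; the type soft_breakpoint and both function names are hypothetical, and patching the code of a real running program would additionally require write access to its code pages.

```c
#include <stdint.h>
#include <stddef.h>

#define INT3_OPCODE 0xCCu  /* one-byte encoding of INT 3 */

/* Hypothetical record of one planted software breakpoint. */
typedef struct {
    size_t offset;   /* offset of the patched byte within the code buffer */
    uint8_t saved;   /* original first byte of the instruction */
    int active;      /* nonzero while INT 3 is planted */
} soft_breakpoint;

/* Replace the byte at 'offset' with INT 3. Returns 0 on success, -1 on error. */
int soft_bp_insert(uint8_t *code, size_t code_len, size_t offset,
                   soft_breakpoint *bp)
{
    if (offset >= code_len)
        return -1;
    bp->offset = offset;
    bp->saved = code[offset];
    bp->active = 1;
    code[offset] = INT3_OPCODE;
    return 0;
}

/* Restore the original byte. Returns 0 on success, -1 if nothing is planted. */
int soft_bp_remove(uint8_t *code, size_t code_len, soft_breakpoint *bp)
{
    if (!bp->active || bp->offset >= code_len)
        return -1;
    code[bp->offset] = bp->saved;
    bp->active = 0;
    return 0;
}
```

The offset passed to soft_bp_insert is assumed to be the first byte of an instruction; planting the opcode in the middle of an instruction would corrupt it instead of breaking on it.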
15.4. LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING

The P6 family processors provide five MSRs for recording the last branch, interrupt, or exception taken by the processor: DebugCtlMSR, LastBranchToIP, LastBranchFromIP, LastExceptionToIP, and LastExceptionFromIP. These registers can be used to set breakpoints on branches, interrupts, and exceptions, and to single-step from one branch to the next.
15.4.1. DebugCtlMSR Register

The DebugCtlMSR register enables last branch, interrupt, and exception recording; taken branch breakpoints; the breakpoint reporting pins; and trace messages. This register can be written to using the WRMSR instruction, when operating at privilege level 0 or when in real-address mode. A protected-mode operating system procedure is required to provide user access to this register. Figure 15-2 shows the flags in the DebugCtlMSR register. The functions of these flags are as follows:

LBR (last branch/interrupt/exception) flag (bit 0)
When set, the processor records the source and target addresses for the last branch and the last exception or interrupt taken by the processor prior to a debug exception being generated. The processor clears this flag whenever a debug exception, such as an instruction or data breakpoint or single-step trap, occurs.
Figure 15-2. DebugCtlMSR Register
[Bit layout: bits 31 through 7, reserved; bit 6, TR (trace messages enable); bits 5 through 2, PB3 through PB0 (performance monitoring/breakpoint pins); bit 1, BTF (single-step on branches); bit 0, LBR (last branch/interrupt/exception).]
BTF (single-step on branches) flag (bit 1)
When set, the processor treats the TF flag in the EFLAGS register as a "single-step on branches" flag rather than a "single-step on instructions" flag. This mechanism allows single-stepping the processor on taken branches. Software must set both the BTF and TF flag to enable debug breakpoints on branches; the processor clears both flags whenever a debug exception occurs.

PBi (performance monitoring/breakpoint pins) flags (bits 2 through 5)
When these flags are set, the performance monitoring/breakpoint pins on the processor (BP0#, BP1#, BP2#, and BP3#) report breakpoint matches in the corresponding breakpoint-address registers (DR0 through DR3). The processor asserts then deasserts the corresponding BPi# pin when a breakpoint match occurs. When a PBi flag is clear, the performance monitoring/breakpoint pins report performance events. Processor execution is not affected by reporting performance events.

TR (trace message enable) flag (bit 6)
When set, trace messages are enabled. Thereafter, when the processor detects a branch, exception, or interrupt, it sends the "to" and "from" addresses out on the system bus as part of a branch trace message. A debugging device that is monitoring the system bus can read these messages and synchronize operations with branch, exception, and interrupt events. Setting this flag greatly reduces the performance of the processor. When trace messages are enabled, the values stored in the LastBranchToIP, LastBranchFromIP, LastExceptionToIP, and LastExceptionFromIP MSRs are undefined. Note that the "from" addresses sent out on the system bus may differ from those stored in the LastBranchFromIP or LastExceptionFromIP MSRs. The "from" address sent out on the bus is always the next instruction in the instruction stream following a successfully completed instruction.
For example, if a branch completes successfully, the address stored in the LastBranchFromIP MSR is the address of the branch instruction, but the address sent out on the bus in the trace message is the address of the instruction
following the branch instruction. If the processor faults on the branch, the address stored in the LastBranchFromIP MSR is again the address of the branch instruction and that same address is sent out on the bus.
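The flag positions listed for the DebugCtlMSR register can be captured as C bit masks. The sketch below is illustrative only: the macro and function names are hypothetical, the MSR address is deliberately omitted, and the code manipulates a plain 32-bit value rather than issuing RDMSR or WRMSR. The first function models the clearing of the LBR and BTF flags that the text attributes to the processor on a debug exception; the second shows what a handler would compute before writing the register back.

```c
#include <stdint.h>

/* DebugCtlMSR flag positions as described in the text. */
#define DEBUGCTL_LBR   (1u << 0)          /* last branch/interrupt/exception */
#define DEBUGCTL_BTF   (1u << 1)          /* single-step on branches */
#define DEBUGCTL_PB(i) (1u << (2 + (i)))  /* PB0..PB3, bits 2 through 5 */
#define DEBUGCTL_TR    (1u << 6)          /* trace messages enable */

/* Model of the value left in the register after a debug exception:
 * the processor clears LBR and BTF and leaves the other flags alone. */
uint32_t debugctl_after_debug_exception(uint32_t value)
{
    return value & ~(DEBUGCTL_LBR | DEBUGCTL_BTF);
}

/* Value a debug handler would write back to re-enable recording and/or
 * control-flow single stepping before resuming the program. */
uint32_t debugctl_rearm(uint32_t value, int want_lbr, int want_btf)
{
    if (want_lbr)
        value |= DEBUGCTL_LBR;
    if (want_btf)
        value |= DEBUGCTL_BTF;
    return value;
}
```

Setting BTF alone is not sufficient for single-stepping on branches; as stated above, the TF flag in EFLAGS must be set as well.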
15.4.2. Last Branch and Last Exception MSRs

The LastBranchToIP and LastBranchFromIP MSRs are 32-bit registers for recording the instruction pointers for the last branch, interrupt, or exception that the processor took prior to a debug exception being generated (refer to Figure 15-2). When a branch occurs, the processor loads the address of the branch instruction into the LastBranchFromIP MSR and loads the target address for the branch into the LastBranchToIP MSR. When an interrupt or exception occurs (other than a debug exception), the address of the instruction that was interrupted by the exception or interrupt is loaded into the LastBranchFromIP MSR and the address of the exception or interrupt handler that is called is loaded into the LastBranchToIP MSR. The LastExceptionToIP and LastExceptionFromIP MSRs (also 32-bit registers) record the instruction pointers for the last branch that the processor took prior to an exception or interrupt being generated. When an exception or interrupt occurs, the contents of the LastBranchToIP and LastBranchFromIP MSRs are copied into these registers before the "to" and "from" addresses of the exception or interrupt are recorded in the LastBranchToIP and LastBranchFromIP MSRs. These registers can be read using the RDMSR instruction.
15.4.3. Monitoring Branches, Exceptions, and Interrupts

When the LBR flag in the DebugCtlMSR register is set, the processor automatically begins recording branches that it takes, exceptions that are generated (except for debug exceptions), and interrupts that are serviced. Each time a branch, exception, or interrupt occurs, the processor records the "to" and "from" instruction pointers in the LastBranchToIP and LastBranchFromIP MSRs. In addition, for interrupts and exceptions, the processor copies the contents of the LastBranchToIP and LastBranchFromIP MSRs into the LastExceptionToIP and LastExceptionFromIP MSRs prior to recording the "to" and "from" addresses of the interrupt or exception. When the processor generates a debug exception (#DB), it automatically clears the LBR flag before executing the exception handler, but does not touch the last branch and last exception MSRs. The addresses for the last branch, interrupt, or exception taken are thus retained in the LastBranchToIP and LastBranchFromIP MSRs and the addresses of the last branch prior to an interrupt or exception are retained in the LastExceptionToIP and LastExceptionFromIP MSRs. The debugger can use the last branch, interrupt, and/or exception addresses in combination with code-segment selectors retrieved from the stack to reset breakpoints in the breakpoint-address registers (DR0 through DR3), allowing a backward trace from the manifestation of a particular bug toward its source. Because the instruction pointers recorded in the LastBranchToIP, LastBranchFromIP, LastExceptionToIP, and LastExceptionFromIP MSRs are offsets into a code segment, software must determine the segment base address of the code segment associated with
the control transfer to calculate the linear address to be placed in the breakpoint-address registers. The segment base address can be determined by reading the segment selector for the code segment from the stack and using it to locate the segment descriptor for the segment in the GDT or LDT. The segment base address can then be read from the segment descriptor. Before resuming program execution from a debug-exception handler, the handler should set the LBR flag again to re-enable last branch and last exception/interrupt recording.
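The address calculation described above (segment base plus recorded offset) can be sketched in C once the 8-byte segment descriptor has been fetched from the GDT or LDT. The descriptor layout used here is the standard Intel Architecture one, which is documented elsewhere in this manual rather than in this chapter: base bits 15:0 sit in bits 31:16 of the low doubleword, base bits 23:16 in bits 7:0 of the high doubleword, and base bits 31:24 in bits 31:24 of the high doubleword. The function names are hypothetical.

```c
#include <stdint.h>

/* Extract the 32-bit segment base from an 8-byte segment descriptor,
 * passed as a 64-bit value with the low doubleword in bits 31:0. */
uint32_t descriptor_base(uint64_t descriptor)
{
    uint32_t low = (uint32_t)descriptor;
    uint32_t high = (uint32_t)(descriptor >> 32);

    return (low >> 16)               /* base 15:0  */
         | ((high & 0xFFu) << 16)    /* base 23:16 */
         | (high & 0xFF000000u);     /* base 31:24 */
}

/* Linear address for a breakpoint-address register: segment base plus the
 * offset read from one of the last branch or last exception MSRs. */
uint32_t breakpoint_linear_address(uint64_t descriptor, uint32_t offset)
{
    return descriptor_base(descriptor) + offset;
}
```

In a flat memory model the code-segment base is 0, so the recorded offset and the linear address coincide and the descriptor lookup has no effect on the result.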
15.4.4. Single-Stepping on Branches, Exceptions, and Interrupts

When the BTF flag in the DebugCtlMSR register and the TF flag in the EFLAGS register are both set, the processor generates a single-step debug exception the next time it takes a branch, generates an exception, or services an interrupt. This mechanism allows the debugger to single-step on control transfers caused by branches, exceptions, or interrupts. This control-flow single stepping helps isolate a bug to a particular block of code before instruction single-stepping further narrows the search. If the BTF flag is set when the processor generates a debug exception, the processor clears the flag along with the TF flag. The debugger must reset the BTF flag before resuming program execution to continue control-flow single stepping.
15.4.5. Initializing Last Branch or Last Exception/Interrupt Recording

The LastBranchToIP, LastBranchFromIP, LastExceptionToIP, and LastExceptionFromIP MSRs are enabled by setting the LBR flag in the DebugCtlMSR register. Control-flow single stepping is enabled by setting the BTF flag in the DebugCtlMSR register. The processor clears both the LBR and the BTF flags whenever a debug exception is generated. To re-enable these mechanisms, the debug-exception handler must thus explicitly set these flags before returning to the interrupted program.
15.5. TIME-STAMP COUNTER

The Intel Architecture (beginning with the Pentium processor) defines a time-stamp counter mechanism that can be used to monitor and identify the relative time of occurrence of processor events. The time-stamp counter architecture includes an instruction for reading the time-stamp counter (RDTSC), a feature bit (TSC flag) that can be read with the CPUID instruction, a time-stamp counter disable bit (TSD flag) in control register CR4, and a model-specific time-stamp counter. Following execution of the CPUID instruction, the TSC flag in register EDX (bit 4) indicates (when set) that the time-stamp counter is present in a particular Intel Architecture processor implementation. (Refer to "CPUID - CPU Identification" in Chapter 3 of the Intel Architecture Software Developer's Manual, Volume 2.) The time-stamp counter (as implemented in the Pentium and P6 family processors) is a 64-bit counter that is set to 0 following the hardware reset of the processor. Following reset, the counter
is incremented every processor clock cycle, even when the processor is halted by the HLT instruction or the external STPCLK# pin. The RDTSC instruction reads the time-stamp counter and is guaranteed to return a monotonically increasing unique value whenever executed, except for 64-bit counter wraparound. Intel guarantees, architecturally, that the time-stamp counter frequency and configuration will be such that it will not wrap around within 10 years after being reset to 0. The period for counter wrap is several thousands of years in the Pentium and P6 family processors. Normally, the RDTSC instruction can be executed by programs and procedures running at any privilege level and in virtual-8086 mode. The TSD flag in control register CR4 (bit 2) allows use of this instruction to be restricted to only programs and procedures running at privilege level 0. A secure operating system would set the TSD flag during system initialization to disable user access to the time-stamp counter. An operating system that disables user access to the time-stamp counter should emulate the instruction through a user-accessible programming interface. The RDTSC instruction is not serializing or ordered with other instructions. Thus, it does not necessarily wait until all previous instructions have been executed before reading the counter. Similarly, subsequent instructions may begin execution before the RDTSC instruction operation is performed. The RDMSR and WRMSR instructions can read and write the time-stamp counter, respectively, as a model-specific register (TSC). The ability to read and write the time-stamp counter with the RDMSR and WRMSR instructions is not an architectural feature, and may not be supported by future Intel Architecture processors. Writing to the time-stamp counter with the WRMSR instruction resets the count. Only the low-order 32 bits of the time-stamp counter can be written to; the high-order 32 bits are 0 extended (cleared to all 0s).
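A few of the numeric facts in this section can be checked with ordinary C arithmetic. The sketch below does not execute RDTSC or CPUID itself; it assumes the caller has already obtained the register values (RDTSC returns the counter in the EDX:EAX register pair). All function names are hypothetical, and the 200 MHz clock used in the usage note is only an example figure.

```c
#include <stdint.h>

/* Combine the two halves returned by RDTSC (high half in EDX, low in EAX). */
uint64_t tsc_combine(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* TSC feature flag: bit 4 of EDX after CPUID, as stated in the text. */
int cpuid_edx_has_tsc(uint32_t cpuid_edx)
{
    return (int)((cpuid_edx >> 4) & 1u);
}

/* TSD flag: bit 2 of CR4; when set, RDTSC is restricted to privilege level 0. */
int cr4_tsd_set(uint32_t cr4)
{
    return (int)((cr4 >> 2) & 1u);
}

/* Years until a 64-bit counter that increments once per clock wraps around,
 * for a processor clock of 'hz' cycles per second. */
double tsc_wrap_years(double hz)
{
    const double counts = 18446744073709551616.0;      /* 2^64 */
    const double seconds_per_year = 365.25 * 24.0 * 3600.0;
    return counts / hz / seconds_per_year;
}
```

For a 200 MHz clock, tsc_wrap_years(200e6) evaluates to roughly 2,900 years, which is consistent with the statement that the wrap period is several thousands of years on these processors.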
15.6. PERFORMANCE-MONITORING COUNTERS

The Pentium processor introduced model-specific performance-monitoring counters to the Intel Architecture. These counters permit processor performance parameters to be monitored and measured. The information obtained from these counters can then be used for tuning system and compiler performance. In the Intel P6 family of processors, the performance-monitoring counter mechanism was modified and enhanced to permit a wider variety of events to be monitored and to allow greater control over the selection of the events to be monitored. The following sections describe the performance-monitoring counter mechanism in the Pentium and P6 family processors.
15.6.1. P6 Family Processor Performance-Monitoring Counters

The P6 family processors provide two 40-bit performance counters, allowing two types of events to be monitored simultaneously. These counters can either count events or measure duration. When counting events, a counter is incremented each time a specified event takes place or a specified number of events takes place. When measuring duration, a counter counts the
number of processor clocks that occur while a specified condition is true. The counters can count events or measure durations that occur at any privilege level. Table A-1 in Appendix A, Performance-Monitoring Events lists the events that can be counted with the P6 family performance monitoring counters. The performance-monitoring counters are supported by four MSRs: the performance event select MSRs (PerfEvtSel0 and PerfEvtSel1) and the performance counter MSRs (PerfCtr0 and PerfCtr1). These registers can be read from and written to using the RDMSR and WRMSR instructions, respectively. They can be accessed using these instructions only when operating at privilege level 0. The PerfCtr0 and PerfCtr1 MSRs can be read from any privilege level using the RDPMC (read performance-monitoring counters) instruction.
NOTE
The PerfEvtSel0, PerfEvtSel1, PerfCtr0, and PerfCtr1 MSRs and the events listed in Table A-1 in Appendix A, Performance-Monitoring Events are model-specific for P6 family processors. They are not guaranteed to be available in future Intel Architecture processors.
15.6.1.1. PERFEVTSEL0 AND PERFEVTSEL1 MSRS
The PerfEvtSel0 and PerfEvtSel1 MSRs control the operation of the performance-monitoring counters, with one register used to set up each counter. They specify the events to be counted, how they should be counted, and the privilege levels at which counting should take place. Figure 15-3 shows the flags and fields in these MSRs. The functions of the flags and fields in the PerfEvtSel0 and PerfEvtSel1 MSRs are as follows:

Event select field (bits 0 through 7)
Selects the event to be monitored (refer to Table A-1 in Appendix A, Performance-Monitoring Events for a list of events and their 8-bit codes).

Unit mask field (bits 8 through 15)
Further qualifies the event selected in the event select field. For example, for some cache events, the mask is used as a MESI-protocol qualifier of cache states (refer to Table A-1 in Appendix A, Performance-Monitoring Events).

USR (user mode) flag (bit 16)
Specifies that events are counted only when the processor is operating at privilege levels 1, 2 or 3. This flag can be used in conjunction with the OS flag.

OS (operating system mode) flag (bit 17)
Specifies that events are counted only when the processor is operating at privilege level 0. This flag can be used in conjunction with the USR flag.
[Figure 15-3 bit layout: Counter Mask (bits 31:24), INV invert counter mask (bit 23), EN enable counters (bit 22; only available in PerfEvtSel0), reserved (bit 21), INT APIC interrupt enable (bit 20), PC pin control (bit 19), E edge detect (bit 18), OS operating system mode (bit 17), USR user mode (bit 16), Unit Mask (bits 15:8), Event Select (bits 7:0).]
Figure 15-3. PerfEvtSel0 and PerfEvtSel1 MSRs
E (edge detect) flag (bit 18)
    Enables (when set) edge detection of events. The processor counts the number of deasserted-to-asserted transitions of any condition that can be expressed by the other fields. The mechanism is limited in that it does not permit back-to-back assertions to be distinguished. This mechanism allows software to measure not only the fraction of time spent in a particular state, but also the average length of time spent in such a state (for example, the time spent waiting for an interrupt to be serviced).
PC (pin control) flag (bit 19)
    When set, the processor toggles the PMi pins and increments the counter when performance-monitoring events occur; when clear, the processor toggles the PMi pins when the counter overflows. The toggling of a pin is defined as assertion of the pin for a single bus clock followed by deassertion.
INT (APIC interrupt enable) flag (bit 20)
    When set, the processor generates an exception through its local APIC on counter overflow.
EN (enable counters) flag (bit 22)
    This flag is only present in the PerfEvtSel0 MSR. When set, performance counting is enabled in both performance-monitoring counters; when clear, both counters are disabled.
INV (invert) flag (bit 23)
    Inverts the result of the counter-mask comparison when set, so that both greater than and less than comparisons can be made.
Counter mask field (bits 24 through 31)
    When nonzero, the processor compares this mask to the number of events counted during a single cycle. If the event count is greater than or equal to this mask, the counter is incremented by one. Otherwise the counter is not incremented. This mask can be used to count events only if multiple occurrences happen per clock (for example, two or more instructions retired per clock). If the counter-mask field is 0, then the counter is incremented each cycle by the number of events that occurred that cycle.

15.6.1.2. PERFCTR0 AND PERFCTR1 MSRS
The performance-counter MSRs (PerfCtr0 and PerfCtr1) contain the event or duration counts for the selected events being counted. The RDPMC instruction can be used by programs or procedures running at any privilege level and in virtual-8086 mode to read these counters. The PCE flag in control register CR4 (bit 8) allows the use of this instruction to be restricted to only programs and procedures running at privilege level 0.

The RDPMC instruction is not serializing or ordered with other instructions. Thus, it does not necessarily wait until all previous instructions have been executed before reading the counter. Similarly, subsequent instructions may begin execution before the RDPMC instruction operation is performed.

Only the operating system, executing at privilege level 0, can directly manipulate the performance counters, using the RDMSR and WRMSR instructions. A secure operating system would clear the PCE flag during system initialization to disable direct user access to the performance-monitoring counters, but provide a user-accessible programming interface that emulates the RDPMC instruction.

The WRMSR instruction cannot arbitrarily write to the performance-monitoring counter MSRs (PerfCtr0 and PerfCtr1). Instead, the low-order 32 bits of each MSR may be written with any value, and the high-order 8 bits are sign-extended according to the value of bit 31. This operation allows writing both positive and negative values to the performance counters.

15.6.1.3. STARTING AND STOPPING THE PERFORMANCE-MONITORING COUNTERS
The performance-monitoring counters are started by writing valid setup information in the PerfEvtSel0 and/or PerfEvtSel1 MSRs and setting the enable counters flag in the PerfEvtSel0 MSR. If the setup is valid, the counters begin counting following the execution of a WRMSR instruction that sets the enable counters flag. The counters can be stopped by clearing the enable counters flag or by clearing all the bits in the PerfEvtSel0 and PerfEvtSel1 MSRs. Counter 1 alone can be stopped by clearing the PerfEvtSel1 MSR.

15.6.1.4. EVENT AND TIME-STAMP MONITORING SOFTWARE
To use the performance-monitoring counters and time-stamp counter, the operating system needs to provide an event-monitoring device driver. This driver should include procedures for handling the following operations:
• Feature checking.
• Initialize and start counters.
• Stop counters.
• Read the event counters.
• Read the time-stamp counter.
The event monitor feature determination procedure must determine whether the current processor supports the performance-monitoring counters and time-stamp counter. This procedure compares the family and model of the processor returned by the CPUID instruction with those of processors known to support performance monitoring. (The Pentium and P6 family processors support performance counters.) The procedure also checks the MSR and TSC flags returned to register EDX by the CPUID instruction to determine if the MSRs and the RDTSC instruction are supported. The initialize and start counters procedure sets the PerfEvtSel0 and/or PerfEvtSel1 MSRs for the events to be counted and the method used to count them and initializes the counter MSRs (PerfCtr0 and PerfCtr1) to starting counts. The stop counters procedure stops the performance counters. (Refer to Section 15.6.1.3., Starting and Stopping the Performance-Monitoring Counters for more information about starting and stopping the counters.) The read counters procedure reads the values in the PerfCtr0 and PerfCtr1 MSRs, and a read time-stamp counter procedure reads the time-stamp counter. These procedures would be provided in lieu of enabling the RDTSC and RDPMC instructions that allow application code to read the counters.
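As a rough illustration of the value that an initialize-and-start procedure might write to a PerfEvtSel MSR, the following Python sketch packs the fields described in Section 15.6.1.1 and models the counter-mask rule. The helper names (`perfevtsel`, `counter_increment`) and the event code `0x12` are hypothetical placeholders, not values taken from Table A-1; treating INV as a simple inversion of the greater-than-or-equal comparison, and ignoring INV when the counter mask is 0, are assumptions of this sketch.

```python
# Flag bit positions as given in the PerfEvtSel0/PerfEvtSel1 field descriptions.
USR = 1 << 16   # count at privilege levels 1, 2, or 3
OS = 1 << 17    # count at privilege level 0
EDGE = 1 << 18  # edge detect
PC = 1 << 19    # pin control
INT = 1 << 20   # APIC interrupt enable
EN = 1 << 22    # enable counters (present in PerfEvtSel0 only)
INV = 1 << 23   # invert the counter-mask comparison
_ALL_FLAGS = USR | OS | EDGE | PC | INT | EN | INV


def perfevtsel(event, unit_mask=0, flags=0, counter_mask=0):
    """Pack the fields into a 32-bit PerfEvtSel value (hypothetical helper)."""
    for name, value in (("event", event), ("unit_mask", unit_mask),
                        ("counter_mask", counter_mask)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    if flags & ~_ALL_FLAGS:
        raise ValueError("flags contains reserved or unknown bits")
    return event | (unit_mask << 8) | flags | (counter_mask << 24)


def counter_increment(events_this_cycle, counter_mask=0, inv=False):
    """Amount added to the counter in one cycle under the counter-mask rule."""
    if counter_mask == 0:
        # Mask of 0: add the number of events that occurred this cycle.
        return events_this_cycle
    condition = events_this_cycle >= counter_mask
    if inv:
        # Assumed reading of INV: flip the result of the comparison.
        condition = not condition
    return 1 if condition else 0


EXAMPLE_EVENT = 0x12  # arbitrary placeholder, not a code from Table A-1
setup = perfevtsel(EXAMPLE_EVENT, unit_mask=0x0F,
                   flags=USR | OS | EN, counter_mask=2)
print(hex(setup))
```

Keeping the flags as named bit constants makes it harder to set the reserved bit 21 by accident, since the helper rejects any bit outside the documented set.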
15.6.2. Monitoring Counter Overflow

The P6 family processors provide the option of generating a local APIC interrupt when a performance-monitoring counter overflows. This mechanism is enabled by setting the interrupt enable flag in either the PerfEvtSel0 or the PerfEvtSel1 MSR. The primary use of this option is for statistical performance sampling. To use this option, the operating system should do the following things on the processor for which performance events are required to be monitored:
• Provide an interrupt vector for handling the counter-overflow interrupt.
• Initialize the APIC PERF local vector entry to enable handling of performance-monitor counter overflow events.
• Provide an entry in the IDT that points to a stub exception handler that returns without executing any instructions.
• Provide an event monitor driver that provides the actual interrupt handler and modifies the reserved IDT entry to point to its interrupt routine.
When interrupted by a counter overflow, the interrupt handler needs to perform the following actions:
• Save the instruction pointer (EIP register), code-segment selector, TSS segment selector, counter values and other relevant information at the time of the interrupt.
• Reset the counter to its initial setting and return from the interrupt.
An event monitor application utility or another application program can read the information collected for analysis of the performance of the profiled application.
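For statistical sampling, the "initial setting" of a counter is typically a negative count, so that the counter overflows after a chosen number of events. The Python sketch below works through that arithmetic using the WRMSR sign-extension rule from Section 15.6.1.2. The function names are hypothetical, and the sketch assumes that overflow means a carry out of bit 39 of the 40-bit counter.

```python
COUNTER_BITS = 40  # 32 writable low-order bits plus 8 sign-extended high-order bits


def counter_after_wrmsr(low32):
    """40-bit counter contents after WRMSR writes low32 (bits 39:32 copy bit 31)."""
    if not 0 <= low32 <= 0xFFFFFFFF:
        raise ValueError("low32 must be a 32-bit value")
    if low32 & 0x80000000:
        return low32 | (0xFF << 32)
    return low32


def sampling_preset(period):
    """Low 32 bits to write so that the counter overflows after `period` events."""
    if not 1 <= period <= 0x80000000:
        raise ValueError("period must be between 1 and 2**31")
    return (-period) & 0xFFFFFFFF


def events_until_overflow(counter40):
    """Increments needed before a carry out of bit 39 (assumed overflow point)."""
    return (1 << COUNTER_BITS) - counter40


preset = sampling_preset(1000)
counter = counter_after_wrmsr(preset)
print(hex(preset), hex(counter), events_until_overflow(counter))
```

An interrupt handler following this scheme would write the same preset again before returning, so that each sample is separated by the same number of events.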
15.6.3. Pentium Processor Performance-Monitoring Counters

The Pentium processor provides two 40-bit performance counters, which can be used either to count events or measure duration. The performance-monitoring counters are supported by three MSRs: the control and event select MSR (CESR) and the performance counter MSRs (CTR0 and CTR1). These registers can be read from and written to using the RDMSR and WRMSR instructions, respectively. They can be accessed using these instructions only when operating at privilege level 0. Each counter has an associated external pin (PM0/BP0 and PM1/BP1), which can be used to indicate the state of the counter to external hardware.
NOTE
The CESR, CTR0, and CTR1 MSRs and the events listed in Table A-2 in Appendix A, Performance-Monitoring Events are model-specific for the Pentium processor.

15.6.3.1. CONTROL AND EVENT SELECT REGISTER (CESR)
The 32-bit control and event select MSR (CESR) is used to control the operation of performance-monitoring counters CTR0 and CTR1 and their associated pins (refer to Figure 15-4). To control each counter, the CESR register contains a 6-bit event select field (ES0 and ES1), a pin control flag (PC0 and PC1), and a 3-bit counter control field (CC0 and CC1). The functions of these fields are as follows:

ES0 and ES1 (event select) fields (bits 0 through 5, bits 16 through 21)
    Selects (by entering an event code in the field) up to two events to be monitored. Refer to Table A-2 in Appendix A, Performance-Monitoring Events for a list of available event codes.
CC0 and CC1 (counter control) fields (bits 6 through 8, bits 22 through 24)
    Controls the operation of the counter. The possible control codes are as follows:

    CCn   Meaning
    000   Count nothing (counter disabled)
    001   Count the selected event while CPL is 0, 1, or 2
    010   Count the selected event while CPL is 3
    011   Count the selected event regardless of CPL
    100   Count nothing (counter disabled)
    101   Count clocks (duration) while CPL is 0, 1, or 2
    110   Count clocks (duration) while CPL is 3
    111   Count clocks (duration) regardless of CPL
Note that the highest order bit selects between counting events and counting clocks (duration); the middle bit enables counting when the CPL is 3; and the low-order bit enables counting when the CPL is 0, 1, or 2.
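The bit-by-bit reading of the counter control field given above can be checked against the table of control codes with a short Python sketch. The function name `decode_cc` and the dictionary keys are hypothetical names chosen for illustration.

```python
def decode_cc(cc):
    """Decode a 3-bit CCn counter control code into its three one-bit meanings."""
    if not 0 <= cc <= 0b111:
        raise ValueError("CCn is a 3-bit field")
    cpl_0_1_2 = bool(cc & 0b001)  # low-order bit: count while CPL is 0, 1, or 2
    cpl_3 = bool(cc & 0b010)      # middle bit: count while CPL is 3
    clocks = bool(cc & 0b100)     # highest-order bit: clocks rather than events
    return {
        "enabled": cpl_0_1_2 or cpl_3,  # codes 000 and 100 count nothing
        "counts": "clocks" if clocks else "events",
        "cpl_0_1_2": cpl_0_1_2,
        "cpl_3": cpl_3,
    }


for code in range(8):
    print(format(code, "03b"), decode_cc(code))
```

Codes 000 and 100 both decode as disabled because neither privilege-level bit is set, which matches the two "Count nothing" rows of the table.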
[Figure 15-4 bit layout: reserved (bits 31:26), PC1 pin control 1 (bit 25), CC1 counter control 1 (bits 24:22), ES1 event select 1 (bits 21:16), reserved (bits 15:10), PC0 pin control 0 (bit 9), CC0 counter control 0 (bits 8:6), ES0 event select 0 (bits 5:0).]
Figure 15-4. CESR MSR (Pentium Processor Only)
PC0 and PC1 (pin control) flags (bit 9, bit 25)
    Selects the function of the external performance-monitoring counter pin (PM0/BP0 and PM1/BP1). Setting one of these flags to 1 causes the processor to assert its associated pin when the counter has overflowed; setting the flag to 0 causes the pin to be asserted when the counter has been incremented. These flags permit the pins to be individually programmed to indicate the overflow or incremented condition. Note that the external signaling of the event on the pins will lag the internal event by a few clocks as the signals are latched and buffered.

While a counter need not be stopped to sample its contents, it must be stopped and cleared or preset before switching to a new event. It is not possible to set one counter separately. If only one event needs to be changed, the CESR register must be read, the appropriate bits modified, and all bits must then be written back to CESR. At reset, all bits in the CESR register are cleared.

15.6.3.2. USE OF THE PERFORMANCE-MONITORING PINS
When the performance-monitor pins PM0/BP0 and/or PM1/BP1 are configured to indicate when the performance-monitor counter has incremented and an occurrence event is being counted, the associated pin is asserted (high) each time the event occurs. When a duration event is being counted, the associated PM pin is asserted for the entire duration of the event. When the performance-monitor pins are configured to indicate when the counter has overflowed, the associated PM pin is not asserted until the counter has overflowed.

When the PM0/BP0 and/or PM1/BP1 pins are configured to signal that a counter has incremented, it should be noted that although the counters may increment by 1 or 2 in a single clock, the pins can only indicate that the event occurred. Moreover, since the internal clock frequency may be higher than the external clock frequency, a single external clock may correspond to multiple internal clocks.

A "count up to" function may be provided when the event pin is programmed to signal an overflow of the counter. Because the counters are 40 bits, a carry out of bit 39 indicates an overflow. A counter may be preset to a specific value less than 2^40 - 1. After the counter has been enabled and the prescribed number of events has transpired, the counter will overflow. Approximately 5 clocks later, the overflow is indicated externally and appropriate action, such as signaling an interrupt, may then be taken.

The PM0/BP0 and PM1/BP1 pins also serve to indicate breakpoint matches during in-circuit emulation, during which time the counter increment or overflow function of these pins is not available. After RESET, the PM0/BP0 and PM1/BP1 pins are configured for performance monitoring; however, a hardware debugger may reconfigure these pins to indicate breakpoint matches.

15.6.3.3. EVENTS COUNTED
The events that the performance-monitoring counters can be set to count and record in the CTR0 and CTR1 MSRs are divided into two categories: occurrences and duration. Occurrence events are counted each time the event takes place. If the PM0/BP0 or PM1/BP1 pins are configured to indicate when a counter increments, they are asserted each clock the counter increments. Note that if an event can happen twice in one clock, the counter increments by 2; however, the pins are asserted only once. For duration events, the counter counts the total number of clocks that the condition is true. When configured to indicate when a counter increments, the PM0/BP0 and/or PM1/BP1 pins are asserted for the duration of the event. Table A-2 in Appendix A, Performance-Monitoring Events lists the events that can be counted with the Pentium processor performance-monitoring counters.
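Because both Pentium counters are programmed through the single CESR register, changing the setup of one counter is a read-modify-write operation, as Section 15.6.3.1 describes. The Python sketch below models only the bit manipulation, on a plain integer standing in for the register contents. The helper name `update_cesr` and the event codes `0x16` and `0x12` are hypothetical placeholders, not entries from Table A-2.

```python
FIELD_MASK = 0x3FF  # ES (6 bits) + CC (3 bits) + PC (1 bit) for one counter


def update_cesr(cesr, counter, es, cc, pc):
    """Return a new CESR value with the fields of one counter replaced.

    counter selects CTR0 (0) or CTR1 (1); the other counter's bits are kept.
    """
    if counter not in (0, 1):
        raise ValueError("counter must be 0 or 1")
    if not (0 <= es <= 0x3F and 0 <= cc <= 0x7 and pc in (0, 1)):
        raise ValueError("field value out of range")
    shift = 16 * counter  # the counter 1 fields sit 16 bits above counter 0
    fields = es | (cc << 6) | (pc << 9)
    kept = cesr & ~(FIELD_MASK << shift) & 0xFFFFFFFF
    return kept | (fields << shift)


cesr = 0                                                    # all bits clear at reset
cesr = update_cesr(cesr, 0, es=0x16, cc=0b011, pc=0)        # events, any CPL
cesr = update_cesr(cesr, 1, es=0x12, cc=0b111, pc=1)        # clocks, any CPL
print(hex(cesr))
```

In a real driver the starting value would come from RDMSR and the result would be written back with WRMSR at privilege level 0; those steps are outside what this sketch can model.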
CHAPTER 16 8086 EMULATION

Intel Architecture processors (beginning with the Intel386 processor) provide two ways to execute new or legacy programs that are assembled and/or compiled to run on an Intel 8086 processor:
• Real-address mode.
• Virtual-8086 mode.
Figure 2-2 in Chapter 2, System Architecture Overview shows the relationship of these operating modes to protected mode and system management mode (SMM).

When the processor is powered up or reset, it is placed in the real-address mode. This operating mode almost exactly duplicates the execution environment of the Intel 8086 processor, with some extensions. Virtually any program assembled and/or compiled to run on an Intel 8086 processor will run on an Intel Architecture processor in this mode.

When running in protected mode, the processor can be switched to virtual-8086 mode to run 8086 programs. This mode also duplicates the execution environment of the Intel 8086 processor, with extensions. In virtual-8086 mode, an 8086 program runs as a separate protected-mode task. Legacy 8086 programs are thus able to run under an operating system (such as Microsoft Windows*) that takes advantage of protected mode and to use protected-mode facilities, such as the protected-mode interrupt- and exception-handling facilities. Protected-mode multitasking permits multiple virtual-8086 mode tasks (with each task running a separate 8086 program) to be run on the processor along with other nonvirtual-8086 mode tasks.

This chapter describes both the basic real-address mode execution environment and the virtual-8086-mode execution environment, available on the Intel Architecture processors beginning with the Intel386 processor.
16.1. REAL-ADDRESS MODE

The Intel Architecture's real-address mode runs programs written for the Intel 8086, Intel 8088, Intel 80186, and Intel 80188 processors, or for the real-address mode of the Intel 286, Intel386, Intel486, Pentium, Pentium Pro, Pentium II, and P6-family processors. The execution environment of the processor in real-address mode is designed to duplicate the execution environment of the Intel 8086 processor. To an 8086 program, a processor operating in real-address mode behaves like a high-speed 8086 processor. The principal features of this architecture are defined in Chapter 3, Basic Execution Environment, of the Intel Architecture Software Developer's Manual, Volume 1. The following is a summary of the core features of the real-address mode execution environment as would be seen by a program written for the 8086:
• The processor supports a nominal 1-MByte physical address space (refer to Section 16.1.1., Address Translation in Real-Address Mode for specific details). This address space is divided into segments, each of which can be up to 64 KBytes in length. The base of a segment is specified with a 16-bit segment selector, which is shifted left by 4 bits (that is, extended with four low-order zero bits) to form a 20-bit offset from address 0 in the address space. An operand within a segment is addressed with a 16-bit offset from the base of the segment. A physical address is thus formed by adding the offset to the 20-bit segment base (refer to Section 16.1.1., Address Translation in Real-Address Mode).
• All operands in native 8086 code are 8-bit or 16-bit values. (Operand size override prefixes can be used to access 32-bit operands.)
• Eight 16-bit general-purpose registers are provided: AX, BX, CX, DX, SP, BP, SI, and DI. The extended 32-bit registers (EAX, EBX, ECX, EDX, ESP, EBP, ESI, and EDI) are accessible to programs that explicitly perform a size override operation.
• Four segment registers are provided: CS, DS, SS, and ES. (The FS and GS registers are accessible to programs that explicitly access them.) The CS register contains the segment selector for the code segment; the DS and ES registers contain segment selectors for data segments; and the SS register contains the segment selector for the stack segment.
• The 8086 16-bit instruction pointer (IP) is mapped to the lower 16 bits of the EIP register. Note that this register is a 32-bit register and unintentional address wrapping may occur.
• The 16-bit FLAGS register contains status and control flags. (This register is mapped to the 16 least significant bits of the 32-bit EFLAGS register.)
• All of the Intel 8086 instructions are supported (refer to Section 16.1.3., Instructions Supported in Real-Address Mode).
• A single, 16-bit-wide stack is provided for handling procedure calls and invocations of interrupt and exception handlers. This stack is contained in the stack segment identified with the SS register. The SP (stack pointer) register contains an offset into the stack segment. The stack grows down (toward lower segment offsets) from the stack pointer. The BP (base pointer) register also contains an offset into the stack segment that can be used as a pointer to a parameter list. When a CALL instruction is executed, the processor pushes the current instruction pointer (the 16 least-significant bits of the EIP register and, on far calls, the current value of the CS register) onto the stack. On a return, initiated with a RET instruction, the processor pops the saved instruction pointer from the stack into the EIP register (and CS register on far returns). When an implicit call to an interrupt or exception handler is executed, the processor pushes the EIP, CS, and EFLAGS (low-order 16 bits only) registers onto the stack. On a return from an interrupt or exception handler, initiated with an IRET instruction, the processor pops the saved instruction pointer and EFLAGS image from the stack into the EIP, CS, and EFLAGS registers.
• A single interrupt table, called the interrupt vector table or interrupt table, is provided for handling interrupts and exceptions (refer to Figure 16-2). The interrupt table (which has 4-byte entries) takes the place of the interrupt descriptor table (IDT, with 8-byte entries) used when handling protected-mode interrupts and exceptions. Interrupt and exception vector numbers provide an index to entries in the interrupt table. Each entry provides a pointer (called a vector) to an interrupt- or exception-handling procedure. Refer to Section 16.1.4., Interrupt and Exception Handling for more details. It is possible for software to relocate the IDT by means of the LIDT instruction on Intel Architecture processors beginning with the Intel386 processor.
• The floating-point unit (FPU) is active and available to execute FPU instructions in real-address mode. Programs written to run on the Intel 8087 and Intel 287 math coprocessors can be run in real-address mode without modification.
The following extensions to the Intel 8086 execution environment are available in the Intel Architecture's real-address mode. If backwards compatibility to Intel 286 and Intel 8086 processors is required, these features should not be used in new programs written to run in real-address mode.
• Two additional segment registers (FS and GS) are available.
• Many of the integer and system instructions that have been added to P6-family processors can be executed in real-address mode (refer to Section 16.1.3., Instructions Supported in Real-Address Mode).
• The 32-bit operand prefix can be used in real-address mode programs to execute the 32-bit forms of instructions. This prefix also allows real-address mode programs to use the processor's 32-bit general-purpose registers.
• The 32-bit address prefix can be used in real-address mode programs, allowing 32-bit offsets.
The following sections describe address formation, registers, available instructions, and interrupt and exception handling in real-address mode. For information on I/O in real-address mode, refer to Chapter 9, Input/Output, in the Intel Architecture Software Developer's Manual, Volume 1.
16.1.1. Address Translation in Real-Address Mode

In real-address mode, the processor does not interpret segment selectors as indexes into a descriptor table; instead, it uses them directly to form linear addresses as the 8086 processor does. It shifts the segment selector left by 4 bits to form a 20-bit base address (refer to Figure 16-1). The offset into a segment is added to the base address to create a linear address that maps directly to the physical address space.

When using 8086-style address translation, it is possible to specify addresses larger than 1 MByte. For example, with a segment selector value of FFFFH and an offset of FFFFH, the linear (and physical) address would be 10FFEFH (just under 1 MByte plus 64 KBytes). The 8086 processor, which can form addresses only up to 20 bits long, truncates the high-order bit, thereby wrapping this address to FFEFH. When operating in real-address mode, however, the processor does not truncate such an address and uses it as a physical address. (Note, however, that for Intel Architecture processors beginning with the Intel486 processor, the A20M# signal can be used in real-address mode to mask address line A20, thereby mimicking the 20-bit wrap-around behavior of the 8086 processor.) Care should be taken to ensure that A20M#-based address wrapping is handled correctly in multiprocessor-based systems.
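The address arithmetic described above, including the FFFFH:FFFFH example, can be reproduced with a few lines of Python. The function name `linear_address` and the `a20_masked` parameter are hypothetical; the parameter is only a simple model of the effect of the A20M# signal on address bit 20.

```python
def linear_address(selector, offset, a20_masked=False):
    """Form a real-address mode linear address from a 16-bit selector and offset."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("segment selector must be a 16-bit value")
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("offset must be a 16-bit value")
    linear = (selector << 4) + offset  # 20-bit base plus 16-bit offset
    if a20_masked:
        # Force address bit 20 to zero, which wraps addresses above 1 MByte.
        linear &= ~(1 << 20)
    return linear


print(hex(linear_address(0xFFFF, 0xFFFF)))
print(hex(linear_address(0xFFFF, 0xFFFF, a20_masked=True)))
```

With the mask off the result is the 21-bit address from the text, and with the mask on the same selector and offset land at the wrapped address that an 8086 would produce.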
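[Figure 16-1 diagram: the 16-bit segment selector occupies bits 19 through 4 of the base, with bits 3 through 0 set to zero; the 16-bit effective address occupies bits 15 through 0 of the offset, with bits 19 through 16 set to zero; base + offset = 20-bit linear address.]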
Figure 16-1. Real-Address Mode Address Translation
The Intel Architecture processors beginning with the Intel386 processor can generate 32-bit offsets using an address override prefix; however, in real-address mode, the value of a 32-bit offset may not exceed FFFFH without causing an exception. For full compatibility with Intel 286 real-address mode, pseudo-protection faults (interrupt 12 or 13) occur if a 32-bit offset is generated outside the range 0 through FFFFH.
16.1.2. Registers Supported in Real-Address Mode

The register set available in real-address mode includes all the registers defined for the 8086 processor plus the new registers introduced in P6-family processors, such as the FS and GS segment registers, the debug registers, the control registers, and the floating-point unit registers. The 32-bit operand prefix allows a real-address mode program to use the 32-bit general-purpose registers (EAX, EBX, ECX, EDX, ESP, EBP, ESI, and EDI).
16.1.3. Instructions Supported in Real-Address Mode

The following instructions make up the core instruction set for the 8086 processor. If backwards compatibility to the Intel 286 and Intel 8086 processors is required, only these instructions should be used in a new program written to run in real-address mode.

• Move (MOV) instructions that move operands between general-purpose registers, segment registers, and between memory and general-purpose registers.
• The exchange (XCHG) instruction.
• Load segment register instructions LDS and LES.
• Arithmetic instructions ADD, ADC, SUB, SBB, MUL, IMUL, DIV, IDIV, INC, DEC, CMP, and NEG.
• Logical instructions AND, OR, XOR, and NOT.
• Decimal instructions DAA, DAS, AAA, AAS, AAM, and AAD.
• Stack instructions PUSH and POP (to general-purpose registers and segment registers).
• Type conversion instructions CWD, CDQ, CBW, and CWDE.
• Shift and rotate instructions SAL, SHL, SHR, SAR, ROL, ROR, RCL, and RCR.
• TEST instruction.
• Control instructions JMP, Jcc, CALL, RET, LOOP, LOOPE, and LOOPNE.
• Interrupt instructions INT n, INTO, and IRET.
• EFLAGS control instructions STC, CLC, CMC, CLD, STD, LAHF, SAHF, PUSHF, and POPF.
• I/O instructions IN, INS, OUT, and OUTS.
• Load effective address (LEA) instruction, and translate (XLATB) instruction.
• LOCK prefix.
• Repeat prefixes REP, REPE, REPZ, REPNE, and REPNZ.
• Processor halt (HLT) instruction.
• No operation (NOP) instruction.
The following instructions, added to P6-family processors (some in the Intel 286 processor and the remainder in the Intel386 processor), can be executed in real-address mode, if backwards compatibility to the Intel 8086 processor is not required.
• Move (MOV) instructions that operate on the control and debug registers.
• Load segment register instructions LSS, LFS, and LGS.
• Generalized multiply instructions and multiply immediate data.
• Shift and rotate by immediate counts.
• Stack instructions PUSHA, PUSHAD, POPA and POPAD, and PUSH immediate data.
• Move with sign extension (MOVSX) and move with zero extension (MOVZX) instructions.
• Long-displacement Jcc instructions.
• Exchange instructions CMPXCHG, CMPXCHG8B, and XADD.
• String instructions MOVS, CMPS, SCAS, LODS, and STOS.
• Bit test and bit scan instructions BT, BTS, BTR, BTC, BSF, and BSR; the byte-set-on-condition instruction SETcc; and the byte swap (BSWAP) instruction.
• Double shift instructions SHLD and SHRD.
• EFLAGS control instructions PUSHF and POPF.
• ENTER and LEAVE control instructions.
• BOUND instruction.
• CPU identification (CPUID) instruction.
• System instructions CLTS, INVD, WBINVD, INVLPG, LGDT, SGDT, LIDT, SIDT, LMSW, SMSW, RDMSR, WRMSR, RDTSC, and RDPMC.
Execution of any of the other Intel Architecture instructions (not given in the previous two lists) in real-address mode results in an invalid-opcode exception (#UD) being generated.
16.1.4. Interrupt and Exception Handling

When operating in real-address mode, software must provide interrupt and exception-handling facilities that are separate from those provided in protected mode. Even during the early stages of processor initialization when the processor is still in real-address mode, elementary real-address mode interrupt and exception-handling facilities must be provided to ensure reliable operation of the processor, or the initialization code must ensure that no interrupts or exceptions will occur.

The Intel Architecture processors handle interrupts and exceptions in real-address mode similar to the way they handle them in protected mode. When a processor receives an interrupt or generates an exception, it uses the vector number of the interrupt or exception as an index into the interrupt table. (In protected mode, the interrupt table is called the interrupt descriptor table (IDT), but in real-address mode, the table is usually called the interrupt vector table, or simply the interrupt table.) The entry in the interrupt vector table provides a pointer to an interrupt- or exception-handler procedure. (The pointer consists of a segment selector for a code segment and a 16-bit offset into the segment.) The processor performs the following actions to make an implicit call to the selected handler:

1. Pushes the current values of the CS and EIP registers onto the stack. (Only the 16 least-significant bits of the EIP register are pushed.)
2. Pushes the low-order 16 bits of the EFLAGS register onto the stack.
3. Clears the IF flag in the EFLAGS register to disable interrupts.
4. Clears the TF, RF, and AC flags in the EFLAGS register.
5. Transfers program control to the location specified in the interrupt vector table.

An IRET instruction at the end of the handler procedure reverses these steps to return program control to the interrupted program. Exceptions do not return error codes in real-address mode.

The interrupt vector table is an array of 4-byte entries (refer to Figure 16-2). Each entry consists of a far pointer to a handler procedure, made up of a segment selector and an offset. The processor scales the interrupt or exception vector by 4 to obtain an offset into the interrupt table. Following reset, the base of the interrupt vector table is located at physical address 0 and its limit is set to 3FFH. In the Intel 8086 processor, the base address and limit of the interrupt vector table cannot be changed. In the P6-family processors, the base address and limit of the interrupt vector table are contained in the IDTR register and can be changed using the LIDT instruction. (For backward compatibility to Intel 8086 processors, the default base address and limit of the interrupt vector table should not be changed.)
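The table lookup described above (scale the vector by 4, then read a far pointer) can be sketched in Python against a byte array standing in for the first 1 KByte of memory. The function names and the handler address F000H:1234H placed at vector 21H are hypothetical example values; the entry layout (16-bit offset in the low word, 16-bit segment selector in the high word) follows Figure 16-2.

```python
import struct

IVT_ENTRY_SIZE = 4  # each entry is a far pointer: 16-bit offset, 16-bit selector


def ivt_entry_address(vector, base=0):
    """Address of the table entry for a vector (vector scaled by 4)."""
    if not 0 <= vector <= 255:
        raise ValueError("vector must be in the range 0 through 255")
    return base + IVT_ENTRY_SIZE * vector


def read_vector(memory, vector, base=0):
    """Return (segment selector, offset) stored in the entry for `vector`."""
    offset, selector = struct.unpack_from("<HH", memory, ivt_entry_address(vector, base))
    return selector, offset


def handler_linear_address(memory, vector, base=0):
    """Linear address of the handler, using real-address mode translation."""
    selector, offset = read_vector(memory, vector, base)
    return (selector << 4) + offset


memory = bytearray(0x400)  # default table: base 0, limit 3FFH
struct.pack_into("<HH", memory, ivt_entry_address(0x21), 0x1234, 0xF000)
print(read_vector(memory, 0x21), hex(handler_linear_address(memory, 0x21)))
```

The last byte of the entry for vector 255 falls at 3FFH, which is why the default limit of the table is 3FFH.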
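[Figure 16-2 diagram: the interrupt vector table begins at the base address held in the IDTR register. Each 4-byte entry holds a 16-bit offset in bytes 0 and 1 and a 16-bit segment selector in bytes 2 and 3. Entry 0 (interrupt vector 0*) is at table offset 0, entry 1 at offset 4, entry 2 at offset 8, entry 3 at offset 12, and so on up to entry 255.]
* Interrupt vector number 0 selects entry 0 (called interrupt vector 0) in the interrupt vector table. Interrupt vector 0 in turn points to the start of the interrupt handler for interrupt 0.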
Figure 16-2. Interrupt Vector Table in Real-Address Mode
Table 16-1 shows the interrupt and exception vectors that can be generated in real-address mode and virtual-8086 mode, and in the Intel 8086 processor. Refer to Chapter 5, Interrupt and Exception Handling for a description of the exception conditions.
Table 16-1. Real-Address Mode Exceptions and Interrupts

Vector No.  Description                                Real-Address Mode  Virtual-8086 Mode  Intel 8086 Processor
0           Divide Error (#DE)                         Yes                Yes                Yes
1           Debug Exception (#DB)                      Yes                Yes                No
2           NMI Interrupt                              Yes                Yes                Yes
3           Breakpoint (#BP)                           Yes                Yes                Yes
4           Overflow (#OF)                             Yes                Yes                Yes
5           BOUND Range Exceeded (#BR)                 Yes                Yes                Reserved
6           Invalid Opcode (#UD)                       Yes                Yes                Reserved
7           Device Not Available (#NM)                 Yes                Yes                Reserved
8           Double Fault (#DF)                         Yes                Yes                Reserved
9           (Intel reserved. Do not use.)              Reserved           Reserved           Reserved
10          Invalid TSS (#TS)                          Reserved           Yes                Reserved
11          Segment Not Present (#NP)                  Reserved           Yes                Reserved
12          Stack Fault (#SS)                          Yes                Yes                Reserved
13          General Protection (#GP)*                  Yes                Yes                Reserved
14          Page Fault (#PF)                           Reserved           Yes                Reserved
15          (Intel reserved. Do not use.)              Reserved           Reserved           Reserved
16          Floating-Point Error (#MF)                 Yes                Yes                Reserved
17          Alignment Check (#AC)                      Reserved           Yes                Reserved
18          Machine Check (#MC)                        Yes                Yes                Reserved
19          SIMD Floating-Point Numeric Error (#XF)    Yes                Yes                Reserved
20-31       (Intel reserved. Do not use.)              Reserved           Reserved           Reserved
32-255      User Defined Interrupts                    Yes                Yes                Yes

NOTE: * In the real-address mode, vector 13 is the segment overrun exception. In protected and virtual-8086 modes, this exception covers all general-protection error conditions, including traps to the virtual-8086 monitor from virtual-8086 mode.
16.2. VIRTUAL-8086 MODE

Virtual-8086 mode is actually a special type of task that runs in protected mode. When the operating system or executive switches to a virtual-8086-mode task, the processor emulates an Intel 8086 processor. The execution environment of the processor while in the 8086-emulation state is the same as is described in Section 16.1., Real-Address Mode for real-address mode, including the extensions. The major difference between the two modes is that in virtual-8086 mode the 8086 emulator uses some protected-mode services (such as the protected-mode interrupt and exception-handling and paging facilities).

As in real-address mode, any new or legacy program that has been assembled and/or compiled to run on an Intel 8086 processor will run in a virtual-8086-mode task. Several 8086 programs can also be run as virtual-8086-mode tasks concurrently with normal protected-mode tasks, using the processor's multitasking facilities.
16.2.1. Enabling Virtual-8086 Mode

The processor runs in virtual-8086 mode when the VM (virtual machine) flag in the EFLAGS register is set. This flag can only be set when the processor switches to a new protected-mode task or resumes virtual-8086 mode via an IRET instruction. System software cannot change the state of the VM flag directly in the EFLAGS register (for example, by using the POPFD instruction). Instead, it changes the flag in the image of the EFLAGS register stored in the TSS or on the stack following a call to an interrupt- or exception-handler procedure. For example, software sets the VM flag in the EFLAGS image in the TSS when first creating a virtual-8086 task. The processor tests the VM flag under three general conditions:
• When loading segment registers, to determine whether to use 8086-style address translation.
• When decoding instructions, to determine which instructions are not supported in virtual-8086 mode and which instructions are sensitive to IOPL.
• When checking privileged instructions, on page accesses, or when performing other permission checks. (Virtual-8086 mode always executes at CPL 3.)
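The flag manipulation described above can be illustrated with a minimal sketch. It assumes the architectural bit positions of the EFLAGS register (VM is bit 17; the IOPL field occupies bits 12 and 13); the function names are hypothetical and chosen only for this example. The functions operate on an EFLAGS image, such as the one stored in a TSS, not on the live register.

```python
# EFLAGS bit positions: IOPL is a 2-bit field at bits 12-13, VM is bit 17.
EFLAGS_IOPL_SHIFT = 12
EFLAGS_IOPL_MASK = 0x3 << EFLAGS_IOPL_SHIFT
EFLAGS_VM = 1 << 17


def is_virtual_8086(eflags_image: int) -> bool:
    """Return True if the VM flag is set in an EFLAGS image."""
    return bool(eflags_image & EFLAGS_VM)


def iopl(eflags_image: int) -> int:
    """Extract the I/O privilege level (0 through 3) from an EFLAGS image."""
    return (eflags_image & EFLAGS_IOPL_MASK) >> EFLAGS_IOPL_SHIFT


def make_v86_eflags(base_flags: int, iopl_level: int) -> int:
    """Build the EFLAGS image for a new virtual-8086 task (hypothetical helper).

    Replaces the IOPL field of base_flags with iopl_level and sets VM.
    """
    if not 0 <= iopl_level <= 3:
        raise ValueError("IOPL must be in the range 0..3")
    image = base_flags & ~EFLAGS_IOPL_MASK
    return image | (iopl_level << EFLAGS_IOPL_SHIFT) | EFLAGS_VM
```

System software would store an image like the one returned by make_v86_eflags in the TSS of the new task; the processor then tests the VM bit of the loaded value under the conditions listed above.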
16.2.2. Structure of a Virtual-8086 Task

A virtual-8086-mode task consists of the following items:
• A 32-bit TSS for the task.
• The 8086 program.
• A virtual-8086 monitor.
• 8086 operating-system services.
The TSS of the new task must be a 32-bit TSS, not a 16-bit TSS, because the 16-bit TSS does not load the most-significant word of the EFLAGS register, which contains the VM flag. All TSSs, stacks, data, and code used to handle exceptions when in virtual-8086 mode must also be 32-bit segments.
The processor enters virtual-8086 mode to run the 8086 program and returns to protected mode to run the virtual-8086 monitor.
The virtual-8086 monitor is a 32-bit protected-mode code module that runs at a CPL of 0. The monitor consists of initialization, interrupt- and exception-handling, and I/O emulation procedures that emulate a personal computer or other 8086-based platform. Typically, the monitor is either part of or closely associated with the protected-mode general-protection (#GP) exception handler, which also runs at a CPL of 0. As with any protected-mode code module, code-segment descriptors for the virtual-8086 monitor must exist in the GDT or in the task's LDT. The virtual-8086 monitor also may need data-segment descriptors so it can examine the IDT or other parts of the 8086 program in the first 1 MByte of the address space. The linear addresses above 10FFEFH are available for the monitor, the operating system, and other system software.
The 8086 operating-system services consist of a kernel and/or operating-system procedures that the 8086 program makes calls to. These services can be implemented in either of the following two ways:
• They can be included in the 8086 program. This approach is desirable for either of the following reasons:
  - The 8086 program code modifies the 8086 operating-system services.
  - There is not sufficient development time to merge the 8086 operating-system services into the main operating system or executive.
• They can be implemented or emulated in the virtual-8086 monitor. This approach is desirable for any of the following reasons:
  - The 8086 operating-system procedures can be more easily coordinated among several virtual-8086 tasks.
  - Memory can be saved by not duplicating 8086 operating-system procedure code for several virtual-8086 tasks.
  - The 8086 operating-system procedures can be easily emulated by calls to the main operating system or executive.
The approach chosen for implementing the 8086 operating-system services may result in different virtual-8086-mode tasks using different 8086 operating-system services.
16.2.3. Paging of Virtual-8086 Tasks

Even though a program running in virtual-8086 mode can use only 20-bit linear addresses, the processor converts these addresses into 32-bit linear addresses before mapping them to the physical address space. If paging is being used, the 8086 address space for a program running in virtual-8086 mode can be paged and located in a set of pages in physical address space. If paging is used, it is transparent to the program running in virtual-8086 mode just as it is for any task running on the processor. Paging is not necessary for a single virtual-8086-mode task, but paging is useful or necessary in the following situations:
• When running multiple virtual-8086-mode tasks. Here, paging allows the lower 1 MByte of the linear address space for each virtual-8086-mode task to be mapped to a different physical address location.
• When emulating the 8086 address wraparound that occurs at 1 MByte. When using 8086-style address translation, it is possible to specify addresses larger than 1 MByte. These addresses automatically wrap around in the Intel 8086 processor (refer to Section 16.1.1., "Address Translation in Real-Address Mode"). If any 8086 programs depend on address wraparound, the same effect can be achieved in a virtual-8086-mode task by mapping the linear addresses between 100000H and 110000H and the linear addresses between 0 and 10000H to the same physical addresses.
• When sharing the 8086 operating-system services or ROM code that is common to several 8086 programs running as different virtual-8086-mode tasks.
• When redirecting or trapping references to memory-mapped I/O devices.
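The wraparound case above can be made concrete with a short sketch. It assumes 4-KByte pages and, purely for illustration, an identity mapping for the first 64 KBytes of physical memory; a real monitor would choose physical pages per task. The dictionary standing in for the page tables and all function names are hypothetical.

```python
ONE_MBYTE = 0x100000
PAGE_SIZE = 0x1000  # 4-KByte pages (assumption for this sketch)


def linear_address(segment: int, offset: int) -> int:
    """8086-style translation: (segment * 16) + offset, without wraparound.

    The largest result is FFFFH:FFFFH = 10FFEFH, which lies above 1 MByte.
    """
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


def wrapped_8086_address(segment: int, offset: int) -> int:
    """Address an Intel 8086 would generate: the result wraps at 1 MByte."""
    return linear_address(segment, offset) & (ONE_MBYTE - 1)


def build_wraparound_alias() -> dict:
    """Map linear pages 100000H..10FFFFH onto the same physical pages as
    linear pages 0..FFFFH. Keys are linear page bases, values are physical
    page bases (identity-mapped here only as an illustration)."""
    mapping = {}
    for page in range(0, 0x10000, PAGE_SIZE):
        mapping[page] = page
        mapping[ONE_MBYTE + page] = page
    return mapping


def translate(mapping: dict, linear: int) -> int:
    """Translate a linear address through the illustrative page mapping."""
    base = linear & ~(PAGE_SIZE - 1)
    return mapping[base] | (linear & (PAGE_SIZE - 1))
```

With this alias in place, the address FFFFH:FFFFH translates to the same physical location that an 8086 would reach after wrapping, so programs that depend on wraparound see the behavior they expect.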
16.2.4. Protection within a Virtual-8086 Task

Protection is not enforced between the segments of an 8086 program. Either of the following techniques can be used to protect the system software running in a virtual-8086-mode task from the 8086 program:
• Reserve the first 1 MByte plus 64 KBytes of each task's linear address space for the 8086 program. An 8086 processor task cannot generate addresses outside this range.
• Use the U/S flag of page-table entries to protect the virtual-8086 monitor and other system software in the virtual-8086 mode task space. When the processor is in virtual-8086 mode, the CPL is 3. Therefore, an 8086 processor program has only user privileges. If the pages of the virtual-8086 monitor have supervisor privilege, they cannot be accessed by the 8086 program.
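Both techniques can be expressed as a single check, sketched below. The sketch assumes the architectural page-table-entry flag positions (P is bit 0 and U/S is bit 2, where a set U/S flag marks a user page); it ignores the R/W flag and every other paging detail. The function name is hypothetical.

```python
PTE_PRESENT = 1 << 0   # P flag
PTE_USER = 1 << 2      # U/S flag: 1 = user page, 0 = supervisor page
V86_MAX_LINEAR = 0x10FFEF  # FFFFH:FFFFH, the highest address 8086-style translation can form


def v86_can_access(pte: int, linear: int) -> bool:
    """Return True if code running in virtual-8086 mode (CPL 3) can touch
    the page described by pte at the given linear address.

    Simplified model: only the address range, P, and U/S are considered.
    """
    if linear > V86_MAX_LINEAR:
        # 8086-style address translation cannot generate this address.
        return False
    return bool(pte & PTE_PRESENT) and bool(pte & PTE_USER)
```

Marking the monitor's pages as supervisor (U/S clear) therefore keeps them out of reach of the 8086 program even if they are placed below 10FFEFH.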
16.2.5. Entering Virtual-8086 Mode

Figure 16-3 summarizes the methods of entering and leaving virtual-8086 mode. The processor switches to virtual-8086 mode in either of the following situations:
• Task switch when the VM flag is set to 1 in the EFLAGS register image stored in the TSS for the task. Here the task switch can be initiated in either of two ways:
  - A CALL or JMP instruction.
  - An IRET instruction, where the NT flag in the EFLAGS image is set to 1.
• Return from a protected-mode interrupt or exception handler when the VM flag is set to 1 in the EFLAGS register image on the stack.
[Figure: state diagram. Real-address mode (real mode code) and protected mode are linked by PE=1 and by PE=0 or RESET. Within protected mode, the diagram shows protected-mode tasks, protected-mode interrupt and exception handlers, and the virtual-8086 monitor (reached by CALL, left by RET). Virtual-8086 mode tasks (8086 programs) are entered with VM=1 and left with VM=0; a RESET in virtual-8086 mode leads to real-address mode. The numbered transitions are explained in the notes below.]
NOTES:
1. Task switch carried out in either of two ways:
   - CALL or JMP where the VM flag in the EFLAGS image is 1.
   - IRET where VM is 1 and NT is 1.
2. Hardware interrupt or exception; software interrupt (INT n) when IOPL is 3.
3. General-protection exception caused by software interrupt (INT n), IRET, POPF, PUSHF, IN, or OUT when IOPL is less than 3.
4. Normal return from protected-mode interrupt or exception handler.
5. A return from the 8086 monitor to redirect an interrupt or exception back to an interrupt or exception handler in the 8086 program running in virtual-8086 mode.
6. Internal redirection of a software interrupt (INT n) when VME is 1, IOPL is less than 3, and the redirection bit is 1.
Figure 16-3. Entering and Leaving Virtual-8086 Mode
When a task switch is used to enter virtual-8086 mode, the TSS for the virtual-8086-mode task must be a 32-bit TSS. (If the new TSS is a 16-bit TSS, the upper word of the EFLAGS register is not in the TSS, causing the processor to clear the VM flag when it loads the EFLAGS register.) The processor updates the VM flag prior to loading the segment registers from their images in the new TSS. The new setting of the VM flag determines whether the processor interprets the contents of the segment registers as 8086-style segment selectors or protected-mode segment selectors. When the VM flag is set, the segment registers are loaded from the TSS, using 8086-style address translation to form base addresses.
Refer to Section 16.3., "Interrupt and Exception Handling in Virtual-8086 Mode," for information on entering virtual-8086 mode on a return from an interrupt or exception handler.
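The reason a 16-bit TSS cannot start a virtual-8086 task follows directly from the position of the VM flag (bit 17, in the upper word of EFLAGS). The sketch below models only that one aspect of a task switch; the function names are hypothetical and nothing else about TSS loading is represented.

```python
EFLAGS_VM = 1 << 17  # VM flag lies in the upper word of EFLAGS


def eflags_loaded_from_tss(saved_image: int, tss_is_32bit: bool) -> int:
    """Model the EFLAGS value the processor loads on a task switch.

    A 16-bit TSS stores only the low word (FLAGS), so the upper word,
    including VM, is loaded as zero.
    """
    return saved_image if tss_is_32bit else saved_image & 0xFFFF


def task_enters_v86(saved_image: int, tss_is_32bit: bool) -> bool:
    """True if the task switch leaves the processor in virtual-8086 mode."""
    return bool(eflags_loaded_from_tss(saved_image, tss_is_32bit) & EFLAGS_VM)
```

For the same saved image, only the 32-bit TSS case preserves the VM flag, which is why the text above requires a 32-bit TSS.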
16.2.6. Leaving Virtual-8086 Mode

The processor can leave virtual-8086 mode only through an interrupt or exception. The following are situations where an interrupt or exception will lead to the processor leaving virtual-8086 mode (refer to Figure 16-3):
• The processor services a hardware interrupt generated to signal the suspension of execution of the virtual-8086 application. This hardware interrupt may be generated by a timer or other external mechanism. Upon receiving the hardware interrupt, the processor enters protected mode and switches to a protected-mode (or another virtual-8086 mode) task either through a task gate in the protected-mode IDT or through a trap or interrupt gate that points to a handler that initiates a task switch. A task switch from a virtual-8086 task to another task loads the EFLAGS register from the TSS of the new task. The value of the VM flag in the new EFLAGS determines if the new task executes in virtual-8086 mode or not.
• The processor services an exception caused by code executing in the virtual-8086 task or services a hardware interrupt that belongs to the virtual-8086 task. Here, the processor enters protected mode and services the exception or hardware interrupt through the protected-mode IDT (normally through an interrupt or trap gate) and the protected-mode exception and interrupt handlers. The processor may handle the exception or interrupt within the context of the virtual-8086 task and return to virtual-8086 mode on a return from the handler procedure. The processor may also execute a task switch and handle the exception or interrupt in the context of another task.
• The processor services a software interrupt generated by code executing in the virtual-8086 task (such as a software interrupt to call an MS-DOS operating system routine). The processor provides several methods of handling these software interrupts, which are discussed in detail in Section 16.3.3., "Class 3: Software Interrupt Handling in Virtual-8086 Mode." Most of them involve the processor entering protected mode, often by means of a general-protection (#GP) exception. In protected mode, the processor can send the interrupt to the virtual-8086 monitor for handling and/or redirect the interrupt back to the application program running in the virtual-8086 mode task for handling. Intel Architecture processors that incorporate the virtual mode extension (enabled with the VME flag in control register CR4) are capable of redirecting software-generated interrupts back to the program's interrupt handlers without leaving virtual-8086 mode. Refer to Section 16.3.3.4., "Method 5: Software Interrupt Handling," for more information on this mechanism.
• A hardware reset initiated by asserting the RESET or INIT pin is a special kind of interrupt. When a RESET or INIT is signaled while the processor is in virtual-8086 mode, the processor leaves virtual-8086 mode and enters real-address mode.
• Execution of the HLT instruction in virtual-8086 mode will cause a general-protection (#GP) fault, which the protected-mode handler generally sends to the virtual-8086 monitor. The virtual-8086 monitor then determines the correct execution sequence after verifying that it was entered as a result of a HLT execution.
Refer to Section 16.3., "Interrupt and Exception Handling in Virtual-8086 Mode," for information on leaving virtual-8086 mode to handle an interrupt or exception generated in virtual-8086 mode.
16.2.7. Sensitive Instructions

When an Intel Architecture processor is running in virtual-8086 mode, the CLI, STI, PUSHF, POPF, INT n, and IRET instructions are sensitive to IOPL. The IN, INS, OUT, and OUTS instructions, which are sensitive to IOPL in protected mode, are not sensitive in virtual-8086 mode. The CPL is always 3 while running in virtual-8086 mode; if the IOPL is less than 3, an attempt to use the IOPL-sensitive instructions listed above triggers a general-protection exception (#GP). These instructions are sensitive to IOPL to give the virtual-8086 monitor a chance to emulate the facilities they affect.
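The rule in this section reduces to a small predicate, sketched below under the assumption that the virtual mode extensions are not enabled (with VME set, some of these instructions behave differently, as Section 16.3 describes). Mnemonics are passed as plain strings, with "INT" standing for INT n; the names are hypothetical.

```python
# Instructions that are sensitive to IOPL in virtual-8086 mode.
IOPL_SENSITIVE_V86 = frozenset({"CLI", "STI", "PUSHF", "POPF", "INT", "IRET"})


def traps_to_monitor(mnemonic: str, iopl: int) -> bool:
    """Return True if executing the instruction in virtual-8086 mode raises
    a general-protection exception (#GP), assuming VME is not enabled.

    The CPL is always 3 in virtual-8086 mode, so the test is simply
    whether IOPL is less than 3.
    """
    return mnemonic.upper() in IOPL_SENSITIVE_V86 and iopl < 3
```

IN, INS, OUT, and OUTS never match here, since they are not IOPL-sensitive in this mode.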
16.2.8. Virtual-8086 Mode I/O

Many 8086 programs written for nonmultitasking systems directly access I/O ports. This practice may cause problems in a multitasking environment. If more than one program accesses the same port, they may interfere with each other. Most multitasking systems require application programs to access I/O ports through the operating system. This results in simplified, centralized control. The processor provides I/O protection for creating I/O that is compatible with the environment and transparent to 8086 programs. Designers may take any of several possible approaches to protecting I/O ports:
• Protect the I/O address space and generate exceptions for all attempts to perform I/O directly.
• Let the 8086 program perform I/O directly.
• Generate exceptions on attempts to access specific I/O ports.
• Generate exceptions on attempts to access specific memory-mapped I/O ports.
The method of controlling access to I/O ports depends upon whether they are I/O-port mapped or memory mapped.
16.2.8.1. I/O-PORT-MAPPED I/O
The I/O permission bit map in the TSS can be used to generate exceptions on attempts to access specific I/O port addresses. The I/O permission bit map of each virtual-8086-mode task determines which I/O addresses generate exceptions for that task. Because each task may have a different I/O permission bit map, the addresses that generate exceptions for one task may be different from the addresses for another task. This differs from protected mode in which, if the CPL is less than or equal to the IOPL, I/O access is allowed without checking the I/O permission bit map. Refer to Chapter 9, "Input/Output," in the Intel Architecture Software Developer's Manual, Volume 1, for more information about the I/O permission bit map.
16.2.8.2. MEMORY-MAPPED I/O
In systems which use memory-mapped I/O, the paging facilities of the processor can be used to generate exceptions for attempts to access I/O ports. The virtual-8086 monitor may use paging to control memory-mapped I/O in these ways:
• Map part of the linear address space of each task that needs to perform I/O to the physical address space where I/O ports are placed. By putting the I/O ports at different addresses (in different pages), the paging mechanism can enforce isolation between tasks.
• Map part of the linear address space to pages that are not-present. This generates an exception whenever a task attempts to perform I/O to those pages. System software then can interpret the I/O operation being attempted.
Software emulation of the I/O space may require too much operating system intervention under some conditions. In these cases, it may be possible to generate an exception for only the first attempt to access I/O. The system software then may determine whether a program can be given exclusive control of I/O temporarily; if so, the protection of the I/O space may be lifted and the program allowed to run at full speed.
16.2.8.3. SPECIAL I/O BUFFERS
Buffers of intelligent controllers (for example, a bit-mapped frame buffer) also can be emulated using page mapping. The linear space for the buffer can be mapped to a different physical space for each virtual-8086-mode task. The virtual-8086 monitor then can control which virtual buffer to copy onto the real buffer in the physical address space.
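For the I/O-port-mapped case of Section 16.2.8.1, the per-task check can be sketched as follows. The sketch assumes the bit map layout in which port n corresponds to bit (n mod 8) of byte (n div 8), a set bit denies access, and ports not covered by the map are treated as denied. The bit map is passed as a plain byte string rather than being read from a TSS, and the function name is hypothetical.

```python
def io_access_faults(bitmap: bytes, port: int, width: int = 1) -> bool:
    """Return True if a virtual-8086-mode access to `width` consecutive
    ports starting at `port` raises a general-protection exception.

    In virtual-8086 mode the bit map is always consulted. Every port
    touched by the access must have its bit clear for the access to
    proceed.
    """
    for p in range(port, port + width):
        byte_index, bit = divmod(p, 8)
        if byte_index >= len(bitmap):
            return True  # port lies beyond the map: treated as denied
        if (bitmap[byte_index] >> bit) & 1:
            return True  # bit set: access to this port is denied
    return False
```

Because each task has its own bit map, a monitor could let one task reach a port directly while trapping and emulating the same port for another.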
16.3. INTERRUPT AND EXCEPTION HANDLING IN VIRTUAL-8086 MODE

When the processor receives an interrupt or detects an exception condition while in virtual-8086 mode, it invokes an interrupt or exception handler, just as it does in protected or real-address mode. The interrupt or exception handler that is invoked and the mechanism used to invoke it depend on the class of interrupt or exception that has been detected or generated and the state of various system flags and fields.
In virtual-8086 mode, the interrupts and exceptions are divided into three classes for the purposes of handling:
• Class 1: All processor-generated exceptions and all hardware interrupts, including the NMI interrupt and the hardware interrupts sent to the processor's external interrupt delivery pins. All class 1 exceptions and interrupts are handled by the protected-mode exception and interrupt handlers.
• Class 2: Special case for maskable hardware interrupts (refer to Section 5.1.1.2., "Maskable Hardware Interrupts," in Chapter 5, "Interrupt and Exception Handling") when the virtual mode extensions are enabled.
• Class 3: All software-generated interrupts, that is, interrupts generated with the INT n instruction (refer to footnote 1).
The method the processor uses to handle class 2 and 3 interrupts depends on the setting of the following flags and fields:
• IOPL field (bits 12 and 13 in the EFLAGS register): Controls how class 3 software interrupts are handled when the processor is in virtual-8086 mode (refer to Section 2.3., "System Flags and Fields in the EFLAGS Register," in Chapter 2, "System Architecture Overview"). This field also controls the enabling of the VIF and VIP flags in the EFLAGS register when the VME flag is set. The VIF and VIP flags are provided to assist in the handling of class 2 maskable hardware interrupts.
• VME flag (bit 0 in control register CR4): Enables the virtual mode extension for the processor when set (refer to Section 2.5., "Control Registers," in Chapter 2, "System Architecture Overview").
• Software interrupt redirection bit map (32 bytes in the TSS, refer to Figure 16-5): Contains 256 flags that indicate how class 3 software interrupts should be handled when they occur in virtual-8086 mode. A software interrupt can be directed either to the interrupt and exception handlers in the currently running 8086 program or to the protected-mode interrupt and exception handlers.
• The virtual interrupt flag (VIF) and virtual interrupt pending flag (VIP) in the EFLAGS register: Provide virtual interrupt support for the handling of class 2 maskable hardware interrupts (refer to Section 16.3.2., "Class 2: Maskable Hardware Interrupt Handling in Virtual-8086 Mode Using the Virtual Interrupt Mechanism").
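Looking up a vector in the software interrupt redirection bit map is a simple bit extraction, sketched below. The sketch assumes that vector n corresponds to bit (n mod 8) of byte (n div 8) within the 32-byte map; it deliberately does not interpret the flag, since how its value selects a handler is covered in Section 16.3.3. The function name is hypothetical and the map is passed as a byte string rather than read from a TSS.

```python
def redirection_flag(bitmap: bytes, vector: int) -> int:
    """Return the redirection flag (0 or 1) for a software interrupt vector.

    bitmap is the 32-byte software interrupt redirection bit map, which
    holds one flag for each of the 256 interrupt vectors.
    """
    if len(bitmap) != 32:
        raise ValueError("the redirection bit map is 32 bytes long")
    if not 0 <= vector <= 255:
        raise ValueError("interrupt vectors range from 0 to 255")
    return (bitmap[vector >> 3] >> (vector & 7)) & 1
```

For example, with only bit 1 of byte 4 set, the flag for vector 21H is 1 and the flag for every other vector is 0.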
NOTE
The VME flag, software interrupt redirection bit map, and VIF and VIP flags are only available in Intel Architecture processors that support the virtual mode extensions. These extensions were introduced in the Intel Architecture with the Pentium processor.
The following sections describe the actions that the processor takes and the possible actions of interrupt and exception handlers for the three classes of interrupts described in the previous paragraphs. These sections describe three possible types of interrupt and exception handlers:
• Protected-mode interrupt and exception handlers: These are the handlers that the processor calls through the protected-mode IDT.
• Virtual-8086 monitor interrupt and exception handlers: These handlers are resident in the virtual-8086 monitor, and they are commonly accessed through a general-protection exception (#GP, interrupt 13) that is directed to the protected-mode general-protection exception handler.
• 8086 program interrupt and exception handlers: These handlers are part of the 8086 program that is running in virtual-8086 mode.
Footnote 1. The INT 3 instruction is a special case (refer to the description of the INT n instruction in Chapter 3, "Instruction Set Reference," of the Intel Architecture Software Developer's Manual, Volume 2).
The following sections describe how these handlers are used, depending on the selected class and method of interrupt and exception handling.
16.3.1. Class 1: Hardware Interrupt and Exception Handling in Virtual-8086 Mode

In virtual-8086 mode, the Pentium and P6 family processors handle hardware interrupts and exceptions in the same manner as they are handled by the Intel486 and Intel386 processors. They invoke the protected-mode interrupt or exception handler that the interrupt or exception vector points to in the IDT. Here, the IDT entry must contain either a 32-bit trap or interrupt gate or a task gate. The following sections describe various ways that a virtual-8086 mode interrupt or exception can be handled after the protected-mode handler has been invoked.
Refer to Section 16.3.2., "Class 2: Maskable Hardware Interrupt Handling in Virtual-8086 Mode Using the Virtual Interrupt Mechanism," for a description of the virtual interrupt mechanism that is available for handling maskable hardware interrupts while in virtual-8086 mode. When this mechanism is either not available or not enabled, maskable hardware interrupts are handled in the same manner as exceptions, as described in the following sections.
16.3.1.1. HANDLING AN INTERRUPT OR EXCEPTION THROUGH A PROTECTED-MODE TRAP OR INTERRUPT GATE
When an interrupt or exception vector points to a 32-bit trap or interrupt gate in the IDT, the gate must in turn point to a nonconforming, privilege-level 0, code segment. When accessing this code segment, the processor performs the following steps.
1. Switches to 32-bit protected mode and privilege level 0.
2. Saves the state of the processor on the privilege-level 0 stack. The states of the EIP, CS, EFLAGS, ESP, SS, ES, DS, FS, and GS registers are saved (refer to Figure 16-4).
3. Clears the segment registers. Saving the DS, ES, FS, and GS registers on the stack and then clearing the registers lets the interrupt or exception handler safely save and restore these registers regardless of the type of segment selectors they contain (protected-mode or 8086-style). The interrupt and exception handlers, which may be called in the context of either a protected-mode task or a virtual-8086-mode task, can use the same code sequences for saving and restoring the registers for any task. Clearing these registers before execution of the IRET instruction does not cause a trap in the interrupt handler. Interrupt procedures that expect values in the segment registers or that return values in the segment registers must use the register images saved on the stack for privilege level 0.
4. Clears the VM flag in the EFLAGS register.
5. Begins executing the selected interrupt or exception handler.
Without Error Code            With Error Code

                <- ESP from TSS               <- ESP from TSS
Old GS                        Old GS
Old FS                        Old FS
Old DS                        Old DS
Old ES                        Old ES
Old SS                        Old SS
Old ESP                       Old ESP
Old EFLAGS                    Old EFLAGS
Old CS                        Old CS
Old EIP         <- New ESP    Old EIP
                              Error Code      <- New ESP

(Each entry is 32 bits wide. The upper 16 bits of each entry that holds a segment register are unused.)
Figure 16-4. Privilege Level 0 Stack After Interrupt or Exception in Virtual-8086 Mode
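A handler can read this frame as a sequence of little-endian doublewords starting at the new ESP. The sketch below is one way to decode it, assuming the layout shown in Figure 16-4; the stack contents are passed in as a byte string, and the function and field names are hypothetical.

```python
import struct

# Doublewords in ascending address order, starting at the new ESP
# (after the error code, if one was pushed).
FRAME_FIELDS = ("eip", "cs", "eflags", "esp", "ss", "es", "ds", "fs", "gs")
SEGMENT_FIELDS = ("cs", "ss", "es", "ds", "fs", "gs")
EFLAGS_VM = 1 << 17


def parse_v86_frame(stack: bytes, has_error_code: bool = False) -> dict:
    """Decode the privilege-level 0 stack frame into a dictionary."""
    count = len(FRAME_FIELDS) + (1 if has_error_code else 0)
    values = struct.unpack_from("<%dI" % count, stack)
    frame = {}
    if has_error_code:
        frame["error_code"] = values[0]
        values = values[1:]
    frame.update(zip(FRAME_FIELDS, values))
    for name in SEGMENT_FIELDS:
        frame[name] &= 0xFFFF  # upper word of a segment-register slot is unused
    return frame


def interrupted_in_v86(frame: dict) -> bool:
    """True if the saved EFLAGS image has the VM flag set."""
    return bool(frame["eflags"] & EFLAGS_VM)
```

The interrupted_in_v86 test corresponds to the check of the VM flag on the stack that handlers use to decide whether the interrupted code was an 8086 program.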
If the trap or interrupt gate references a procedure in a conforming segment or in a segment at a privilege level other than 0, the processor generates a general-protection exception (#GP). Here, the error code is the segment selector of the code segment to which a call was attempted. Interrupt and exception handlers can examine the VM flag on the stack to determine if the interrupted procedure was running in virtual-8086 mode. If so, the interrupt or exception can be handled in one of three ways:
• The protected-mode interrupt or exception handler that was called can handle the interrupt or exception.
• The protected-mode interrupt or exception handler can call the virtual-8086 monitor to handle the interrupt or exception.
• The virtual-8086 monitor (if called) can in turn pass control back to the 8086 program's interrupt and exception handler.
If the interrupt or exception is handled with a protected-mode handler, the handler can return to the interrupted program in virtual-8086 mode by executing an IRET instruction. This instruction loads the EFLAGS and segment registers from the images saved in the privilege level 0 stack (refer to Figure 16-4). A set VM flag in the EFLAGS image causes the processor to switch back to virtual-8086 mode. The CPL at the time the IRET instruction is executed must be 0, otherwise the processor does not change the state of the VM flag.
The virtual-8086 monitor runs at privilege level 0, like the protected-mode interrupt and exception handlers. It is commonly closely tied to the protected-mode general-protection exception (#GP, vector 13) handler. If the protected-mode interrupt or exception handler calls the virtual-8086 monitor to handle the interrupt or exception, the return from the virtual-8086 monitor to the interrupted virtual-8086 mode program requires two return instructions: a RET instruction to return to the protected-mode handler and an IRET instruction to return to the interrupted program.
The virtual-8086 monitor has the option of directing the interrupt and exception back to an interrupt or exception handler that is part of the interrupted 8086 program, as described in Section 16.3.1.2., "Handling an Interrupt or Exception With an 8086 Program Interrupt or Exception Handler."
16.3.1.2. HANDLING AN INTERRUPT OR EXCEPTION WITH AN 8086 PROGRAM INTERRUPT OR EXCEPTION HANDLER
Because it was designed to run on an 8086 processor, an 8086 program running in a virtual-8086-mode task contains an 8086-style interrupt vector table, which starts at linear address 0. If the virtual-8086 monitor correctly directs an interrupt or exception vector back to the virtual-8086-mode task it came from, the handlers in the 8086 program can handle the interrupt or exception. The virtual-8086 monitor must carry out the following steps to send an interrupt or exception back to the 8086 program:
1. Use the 8086 interrupt vector to locate the appropriate handler procedure in the 8086 program interrupt table.
2. Store the EFLAGS (low-order 16 bits only), CS and EIP values of the 8086 program on the privilege-level 3 stack. This is the stack that the virtual-8086-mode task is using. (The 8086 handler may use or modify this information.)
3. Change the return link on the privilege-level 0 stack to point to the privilege-level 3 handler procedure.
4. Execute an IRET instruction to pass control to the 8086 program handler.
5. When the IRET instruction from the privilege-level 3 handler triggers a general-protection exception (#GP) and thus effectively again calls the virtual-8086 monitor, restore the return link on the privilege-level 0 stack to point to the original, interrupted, privilege-level 3 procedure.
6. Copy the low-order 16 bits of the EFLAGS image from the privilege-level 3 stack to the privilege-level 0 stack (because some 8086 handlers modify these flags to return information to the code that caused the interrupt).
7. Execute an IRET instruction to pass control back to the interrupted 8086 program.
Note that if an operating system intends to support all 8086 MS-DOS-based programs, it is necessary to use the actual 8086 interrupt and exception handlers supplied with the program. The reason for this is that some programs modify their own interrupt vector table to substitute (or hook in series) their own specialized interrupt and exception handlers.
16.3.1.3. HANDLING AN INTERRUPT OR EXCEPTION THROUGH A TASK GATE
When an interrupt or exception vector points to a task gate in the IDT, the processor performs a task switch to the selected interrupt- or exception-handling task. The following actions are carried out as part of this task switch:
1. The EFLAGS register with the VM flag set is saved in the current TSS.
2. The link field in the TSS of the called task is loaded with the segment selector of the TSS for the interrupted virtual-8086-mode task.
3. The EFLAGS register is loaded from the image in the new TSS, which clears the VM flag and causes the processor to switch to protected mode.
4. The NT flag in the EFLAGS register is set.
5. The processor begins executing the selected interrupt- or exception-handler task.
When an IRET instruction is executed in the handler task and the NT flag in the EFLAGS register is set, the processor switches from a protected-mode interrupt- or exception-handler task back to a virtual-8086-mode task. Here, the EFLAGS and segment registers are loaded from images saved in the TSS for the virtual-8086-mode task. If the VM flag is set in the EFLAGS image, the processor switches back to virtual-8086 mode on the task switch. The CPL at the time the IRET instruction is executed must be 0, otherwise the processor does not change the state of the VM flag.
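Returning briefly to the trap- or interrupt-gate case: the first three monitor steps of Section 16.3.1.2 (locating the 8086 handler, building an 8086-style return frame on the program's stack, and redirecting the return link) can be sketched as follows. The sketch assumes the standard 8086 interrupt vector table layout (a 4-byte entry per vector holding a 16-bit offset followed by a 16-bit segment) and models the task's memory as a byte array indexed by linear address. The frame dictionary with keys eip, cs, eflags, esp, and ss, and the function name, are hypothetical. Details a real monitor must handle, such as adjusting flag bits in the saved image, are omitted.

```python
import struct


def reflect_to_8086_handler(memory: bytearray, frame: dict, vector: int) -> None:
    """Redirect a saved privilege-level 0 return frame to the 8086
    program's own handler for `vector`.

    memory models the virtual-8086 task's linear address space; frame
    holds the saved register images from the privilege-level 0 stack.
    """
    # Step 1: read the handler address from the 8086 interrupt vector table.
    handler_ip, handler_cs = struct.unpack_from("<HH", memory, vector * 4)

    # Step 2: push FLAGS, CS, and IP (16 bits each) on the 8086 program's stack.
    stack_base = (frame["ss"] & 0xFFFF) << 4
    sp = frame["esp"] & 0xFFFF
    for value in (frame["eflags"], frame["cs"], frame["eip"]):
        sp = (sp - 2) & 0xFFFF
        struct.pack_into("<H", memory, stack_base + sp, value & 0xFFFF)
    frame["esp"] = sp

    # Step 3: make the return link point at the 8086 handler, so that the
    # monitor's IRET (step 4) transfers control there.
    frame["cs"] = handler_cs
    frame["eip"] = handler_ip
```

When the 8086 handler later executes its own IRET and the monitor regains control (step 5), the original return address can be recovered from the three words pushed here.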
16.3.2. Class 2: Maskable Hardware Interrupt Handling in Virtual-8086 Mode Using the Virtual Interrupt Mechanism
Maskable hardware interrupts are those interrupts that are delivered through the INTR# pin or through an interrupt request to the local APIC (refer to Section 5.1.1.2., Maskable Hardware Interrupts, in Chapter 5, Interrupt and Exception Handling). These interrupts can be inhibited (masked) from interrupting an executing program or task by clearing the IF flag in the EFLAGS register. When the VME flag in control register CR4 is set and the IOPL field in the EFLAGS register is less than 3, two additional flags are activated in the EFLAGS register:
• VIF (virtual interrupt) flag, bit 19 of the EFLAGS register.
• VIP (virtual interrupt pending) flag, bit 20 of the EFLAGS register.
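The activation condition for these two flags can be written as a small predicate. The sketch below uses the bit positions given in this chapter (VME is bit 0 of CR4, IOPL occupies bits 12 and 13 of EFLAGS, VIF is bit 19, VIP is bit 20); the function name is hypothetical, and the register values are passed in as integers.

```python
CR4_VME = 1 << 0         # virtual mode extensions enable flag in CR4
EFLAGS_IOPL_SHIFT = 12   # IOPL field occupies bits 12-13 of EFLAGS
EFLAGS_VIF = 1 << 19     # virtual interrupt flag
EFLAGS_VIP = 1 << 20     # virtual interrupt pending flag


def virtual_interrupt_flags_active(cr4: int, eflags: int) -> bool:
    """True if VIF and VIP are active for a virtual-8086 task: the VME
    flag is set and the IOPL field is less than 3."""
    iopl = (eflags >> EFLAGS_IOPL_SHIFT) & 0x3
    return bool(cr4 & CR4_VME) and iopl < 3
```

When the predicate is false, the VIF and VIP bits have no effect and maskable hardware interrupts are handled as class 1 interrupts.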
These flags provide the virtual-8086 monitor with more efficient control over handling maskable hardware interrupts that occur during virtual-8086 mode tasks. They also reduce interrupt-handling overhead, by eliminating the need for all IF related operations (such as PUSHF, POPF, CLI, and STI instructions) to trap to the virtual-8086 monitor. The purpose and use of these flags are as follows.
NOTE
The VIF and VIP flags are only available in Intel Architecture processors that support the virtual mode extensions. These extensions were introduced in the Intel Architecture with the Pentium processor. When this mechanism is either not available or not enabled, maskable hardware interrupts are handled as class 1 interrupts. Here, if VIF and VIP flags are needed, the virtual-8086 monitor can implement them in software.
Existing 8086 programs commonly set and clear the IF flag in the EFLAGS register to enable and disable maskable hardware interrupts, respectively; for example, to disable interrupts while handling another interrupt or an exception. This practice works well in single-task environments, but can cause problems in multitasking and multiple-processor environments, where it is often desirable to prevent an application program from having direct control over the handling of hardware interrupts. When using earlier Intel Architecture processors, this problem was often solved by creating a virtual IF flag in software. The Intel Architecture processors (beginning with the Pentium processor) provide hardware support for this virtual IF flag through the VIF and VIP flags.
The VIF flag is a virtualized version of the IF flag, which an application program running from within a virtual-8086 task can use to control the handling of maskable hardware interrupts. When the VIF flag is enabled, the CLI and STI instructions operate on the VIF flag instead of the IF flag. When an 8086 program executes the CLI instruction, the processor clears the VIF flag to request that the virtual-8086 monitor inhibit maskable hardware interrupts from interrupting program execution; when it executes the STI instruction, the processor sets the VIF flag, requesting that the virtual-8086 monitor enable maskable hardware interrupts for the 8086 program. The IF flag itself, managed by the operating system, always controls whether maskable hardware interrupts are actually enabled.
Also, if under these circumstances an 8086 program tries to read or change the IF flag using the PUSHF or POPF instructions, the processor operates on the VIF flag instead, leaving IF unchanged.
The VIP flag provides software a means of recording the existence of a deferred (or pending) maskable hardware interrupt. This flag is read by the processor but never explicitly written by the processor; it can only be written by software.
If the IF flag is set and the VIF and VIP flags are enabled, and the processor receives a maskable hardware interrupt (interrupt vector 0 through 255), the processor performs and the interrupt handler software should perform the following operations:
1. The processor invokes the protected-mode interrupt handler for the interrupt received, as described in the following steps. These steps are almost identical to those described for method 1 interrupt and exception handling in Section 16.3.1.1., Handling an Interrupt or Exception Through a Protected-Mode Trap or Interrupt Gate:
   a. Switches to 32-bit protected mode and privilege level 0.
   b. Saves the state of the processor on the privilege-level 0 stack. The states of the EIP, CS, EFLAGS, ESP, SS, ES, DS, FS, and GS registers are saved (refer to Figure 16-4). In the EFLAGS image on the stack, the IOPL field is set to 3 and the VIF flag is copied to the IF flag.
   c. Clears the segment registers.
   d. Clears the VM flag in the EFLAGS register.
   e. Begins executing the selected protected-mode interrupt handler.
2. The recommended action of the protected-mode interrupt handler is to read the VM flag from the EFLAGS image on the stack. If this flag is set, the handler makes a call to the virtual-8086 monitor.
3. The virtual-8086 monitor should read the VIF flag in the EFLAGS register. If the VIF flag is clear, the virtual-8086 monitor sets the VIP flag in the EFLAGS image on the stack to indicate that there is a deferred interrupt pending and returns to the protected-mode handler. If the VIF flag is set, the virtual-8086 monitor can handle the interrupt if it belongs to the 8086 program running in the interrupted virtual-8086 task; otherwise, it can call the protected-mode interrupt handler to handle the interrupt.
4. The protected-mode handler executes a return to the program executing in virtual-8086 mode.
5. Upon returning to virtual-8086 mode, the processor continues execution of the 8086 program.
When the 8086 program is ready to receive maskable hardware interrupts, it executes the STI instruction to set the VIF flag (enabling maskable hardware interrupts). Prior to setting the VIF flag, the processor automatically checks the VIP flag and does one of the following, depending on the state of the flag:
- If the VIP flag is clear (indicating no pending interrupts), the processor sets the VIF flag.
- If the VIP flag is set (indicating a pending interrupt), the processor generates a general-protection exception (#GP).
The recommended action of the protected-mode general-protection exception handler is to then call the virtual-8086 monitor and let it handle the pending interrupt. After handling the pending interrupt, the typical action of the virtual-8086 monitor is to clear the VIP flag and set the VIF flag in the EFLAGS image on the stack, and then execute a return to virtual-8086 mode. The next time the processor receives a maskable hardware interrupt, it will then handle it as described in steps 1 through 5 earlier in this section.
If the processor finds that both the VIF and VIP flags are set at the beginning of an instruction, it generates a general-protection exception. This action allows the virtual-8086 monitor to handle the pending interrupt for the virtual-8086 mode task for which the VIF flag is enabled. Note that this situation can only occur immediately following execution of a POPF or IRET instruction or upon entering a virtual-8086 mode task through a task switch.
Note that the states of the VIF and VIP flags are not modified in real-address mode or during transitions between real-address and protected modes.
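The CLI/STI and deferral behavior described in this section can be sketched as a small C model. This is an illustrative simulation only, not processor or monitor code: the function names are hypothetical, a single 32-bit value stands in for both the EFLAGS register and its image on the stack, and only the flag positions (IF = bit 9, VIF = bit 19, VIP = bit 20) are taken from the architecture.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_IF  (1u << 9)   /* interrupt enable flag   */
#define EFLAGS_VIF (1u << 19)  /* virtual interrupt flag  */
#define EFLAGS_VIP (1u << 20)  /* virtual interrupt pending */

/* CLI executed by the 8086 program: clears VIF, never touches IF. */
void cli_virtual(uint32_t *eflags) {
    *eflags &= ~EFLAGS_VIF;
}

/* STI executed by the 8086 program. Returns true when the processor
 * would raise #GP because an interrupt is pending (VIP set); in that
 * case VIF is left unchanged. */
bool sti_virtual(uint32_t *eflags) {
    if (*eflags & EFLAGS_VIP)
        return true;
    *eflags |= EFLAGS_VIF;
    return false;
}

/* Monitor, step 3: returns true if the interrupt may be delivered now,
 * otherwise records it as pending by setting VIP (hypothetical helper). */
bool monitor_on_hw_interrupt(uint32_t *eflags_image) {
    if ((*eflags_image & EFLAGS_VIF) == 0) {
        *eflags_image |= EFLAGS_VIP;
        return false;
    }
    return true;
}

/* Monitor, after servicing the deferred interrupt: clear VIP, set VIF. */
void monitor_after_pending(uint32_t *eflags_image) {
    *eflags_image &= ~EFLAGS_VIP;
    *eflags_image |= EFLAGS_VIF;
}
```

In this model, a CLI followed by a hardware interrupt leaves VIP set, so the next STI reports #GP; once the monitor has cleared VIP and set VIF, STI completes normally.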
NOTE
The virtual interrupt mechanism described in this section is also available for use in protected mode; refer to Section 16.4., Protected-Mode Virtual Interrupts.
16.3.3. Class 3 - Software Interrupt Handling in Virtual-8086 Mode

When the processor receives a software interrupt (an interrupt generated with the INT n instruction) while in virtual-8086 mode, it can use any of six different methods to handle the interrupt. The method selected depends on the settings of the VME flag in control register CR4, the IOPL field in the EFLAGS register, and the software interrupt redirection bit map in the TSS. Table 16-2 lists the six methods of handling software interrupts in virtual-8086 mode and the respective settings of the VME flag, IOPL field, and the bits in the interrupt redirection bit map for each method. The table also summarizes the various actions the processor takes for each method.
The VME flag enables the virtual mode extensions for the Pentium and P6-family processors. When this flag is clear, the processor responds to interrupts and exceptions in virtual-8086 mode in the same manner as an Intel386 or Intel486 processor does. When this flag is set, the virtual mode extension provides the following enhancements to virtual-8086 mode:
- Speeds up the handling of software-generated interrupts in virtual-8086 mode by allowing the processor to bypass the virtual-8086 monitor and redirect software interrupts back to the interrupt handlers that are part of the currently running 8086 program.
- Supports virtual interrupts for software written to run on the 8086 processor.
The IOPL value interacts with the VME flag and the bits in the interrupt redirection bit map to determine how specific software interrupts should be handled.
The software interrupt redirection bit map (refer to Figure 16-5) is a 32-byte field in the TSS. This map is located directly below the I/O permission bit map in the TSS. Each bit in the interrupt redirection bit map is mapped to an interrupt vector. Bit 0 in the interrupt redirection bit map (which maps to vector zero in the interrupt table) is located at the I/O base map address in the TSS minus 32 bytes. When a bit in this bit map is set, it indicates that the associated software interrupt (interrupt generated with an INT n instruction) should be handled through the protected-mode IDT and interrupt and exception handlers. When a bit in this bit map is clear, the processor redirects the associated software interrupt back to the interrupt table in the 8086 program (located at linear address 0 in the program's address space).
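As a rough illustration of the bit map lookup, the following C sketch tests the redirection bit for a given vector. The function name and parameters are hypothetical, the TSS is treated as a plain byte array, and the bit order within each byte (vector n at bit n mod 8 of byte n / 8, least-significant bit first) is an assumption consistent with bit 0 residing at the lowest address of the map.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REDIR_MAP_BYTES 32u  /* 256 vectors, one bit each */

/* Returns true if the redirection bit for `vector` is set, meaning the
 * software interrupt goes to the protected-mode handlers; false means it
 * is redirected back to the 8086 program's own handler.
 * `tss` points at the start of the TSS and `io_map_base` is the I/O map
 * base offset stored in it (assumed to be at least 32). */
bool redir_bit_is_set(const uint8_t *tss, uint16_t io_map_base, uint8_t vector) {
    /* The 32-byte map ends where the I/O permission bit map begins. */
    size_t map_start = (size_t)io_map_base - REDIR_MAP_BYTES;
    uint8_t byte = tss[map_start + vector / 8u];
    return ((byte >> (vector % 8u)) & 1u) != 0;
}
```

A monitor that wants INT 21H serviced by protected-mode code while leaving every other vector with the 8086 program would, under these assumptions, set only bit 1 of byte 4 in the map.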
NOTE
The software interrupt redirection bit map does not affect hardware-generated interrupts and exceptions. Hardware-generated interrupts and exceptions are always handled by the protected-mode interrupt and exception handlers.
Table 16-2. Software Interrupt Handling Methods While in Virtual-8086 Mode
Method  VME  IOPL  Bit in Redir. Bitmap*  Processor Action
1       0    3     X                      Interrupt directed to a protected-mode interrupt handler:
                                          - Clears VM and TF flags
                                          - If serviced through interrupt gate, clears IF flag
                                          - Switches to privilege-level 0 stack
                                          - Pushes GS, FS, DS and ES onto privilege-level 0 stack
                                          - Clears GS, FS, DS and ES to 0
                                          - Pushes SS, ESP, EFLAGS, CS and EIP of interrupted task onto privilege-level 0 stack
                                          - Sets CS and EIP from interrupt gate
2       0    <3    X                      Interrupt directed to protected-mode general-protection exception (#GP) handler.
3       1    <3    1                      Interrupt directed to a protected-mode general-protection exception (#GP) handler; VIF and VIP flag support for handling class 2 maskable hardware interrupts.
4       1    3     1                      Interrupt directed to protected-mode interrupt handler: (refer to method 1 processor action).
5       1    3     0                      Interrupt redirected to 8086 program interrupt handler:
                                          - Pushes EFLAGS with NT cleared and IOPL set to 0
                                          - Pushes CS and EIP (lower 16 bits only)
                                          - Clears IF flag
                                          - Clears TF flag
                                          - Loads CS and EIP (lower 16 bits only) from selected entry in the interrupt vector table of the current virtual-8086 task
6       1    <3    0                      Interrupt redirected to 8086 program interrupt handler; VIF and VIP flag support for handling class 2 maskable hardware interrupts:
                                          - Pushes EFLAGS with IOPL set to 3 and VIF copied to IF
                                          - Pushes CS and EIP (lower 16 bits only)
                                          - Clears the VIF flag
                                          - Clears TF flag
                                          - Loads CS and EIP (lower 16 bits only) from selected entry in the interrupt vector table of the current virtual-8086 task
NOTE: * When set to 0, software interrupt is redirected back to the 8086 program interrupt handler; when set to 1, interrupt is directed to protected-mode handler. X means the bit is ignored.
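The selection logic in Table 16-2 can be expressed compactly. The following C sketch is only a restatement of the table; the function name is hypothetical and the inputs are assumed to have been read already from CR4.VME, EFLAGS.IOPL, and the redirection bit map.

```c
#include <stdbool.h>

/* Returns the Table 16-2 method number (1 through 6) used for an INT n
 * executed in virtual-8086 mode.
 *   vme       - VME flag in CR4
 *   iopl      - IOPL field of EFLAGS (0..3)
 *   redir_bit - bit for vector n in the redirection bit map
 *               (ignored when vme is false) */
int v86_soft_int_method(bool vme, unsigned iopl, bool redir_bit) {
    if (!vme)
        return (iopl == 3) ? 1 : 2;   /* Intel386/Intel486 behavior */
    if (iopl == 3)
        return redir_bit ? 4 : 5;     /* protected-mode handler vs. redirect */
    return redir_bit ? 3 : 6;         /* #GP vs. redirect with VIF/VIP support */
}
```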
[Figure 16-5 shows the Task-State Segment (TSS) as rows of bits 31 through 0. The row at offset 64H holds the I/O Map Base field, which locates the I/O Permission Bit Map; the 32-byte Software Interrupt Redirection Bit Map lies directly below the I/O Permission Bit Map. Figure notes: the last byte of the bit map must be followed by a byte with all bits set; the I/O base map must not exceed DFFFH.]
Figure 16-5. Software Interrupt Redirection Bit Map in TSS
Redirecting software interrupts back to the 8086 program potentially speeds up interrupt handling because a switch back and forth between virtual-8086 mode and protected mode is not required. This latter interrupt-handling technique is particularly useful for 8086 operating systems (such as MS-DOS) that use the INT n instruction to call operating system procedures.
The CPUID instruction can be used to verify that the virtual mode extension is implemented on the processor. Bit 1 of the feature flags register (EDX) indicates the availability of the virtual mode extension (refer to CPUID - CPU Identification in Chapter 3 of the Intel Architecture Software Developer's Manual, Volume 2).
The following sections describe the six methods (or mechanisms) for handling software interrupts in virtual-8086 mode. Refer to Section 16.3.2., Class 2 - Maskable Hardware Interrupt Handling in Virtual-8086 Mode Using the Virtual Interrupt Mechanism for a description of the use of the VIF and VIP flags in the EFLAGS register for handling maskable hardware interrupts.
16.3.3.1. METHOD 1: SOFTWARE INTERRUPT HANDLING
When the VME flag in control register CR4 is clear and the IOPL field is 3, a Pentium or P6-family processor handles software interrupts in the same manner as they are handled by an Intel386 or Intel486 processor. It executes an implicit call to the interrupt handler in the protected-mode IDT pointed to by the interrupt vector. Refer to Section 16.3.1., Class 1 - Hardware Interrupt and Exception Handling in Virtual-8086 Mode for a complete description of this mechanism and its possible uses.
16.3.3.2. METHODS 2 AND 3: SOFTWARE INTERRUPT HANDLING
When a software interrupt occurs in virtual-8086 mode and the method 2 or 3 conditions are present, the processor generates a general-protection exception (#GP).
Method 2 is enabled when the VME flag is set to 0 and the IOPL value is less than 3. Here the IOPL value is used to bypass the protected-mode interrupt handlers and cause any software interrupt that occurs in virtual-8086 mode to be treated as a protected-mode general-protection exception (#GP). The general-protection exception handler calls the virtual-8086 monitor, which can then emulate an 8086-program interrupt handler or pass control back to the 8086 program's handler, as described in Section 16.3.1.2., Handling an Interrupt or Exception With an 8086 Program Interrupt or Exception Handler.
Method 3 is enabled when the VME flag is set to 1, the IOPL value is less than 3, and the corresponding bit for the software interrupt in the software interrupt redirection bit map is set to 1. Here, the processor performs the same operation as it does for method 2 software interrupt handling. If the corresponding bit for the software interrupt in the software interrupt redirection bit map is set to 0, the interrupt is handled using method 6 (refer to Section 16.3.3.5., Method 6: Software Interrupt Handling).
16.3.3.3. METHOD 4: SOFTWARE INTERRUPT HANDLING
Method 4 handling is enabled when the VME flag is set to 1, the IOPL value is 3, and the bit for the interrupt vector in the redirection bit map is set to 1. Method 4 software interrupt handling allows method 1 style handling when the virtual mode extension is enabled; that is, the interrupt is directed to a protected-mode handler (refer to Section 16.3.3.1., Method 1: Software Interrupt Handling).
16.3.3.4. METHOD 5: SOFTWARE INTERRUPT HANDLING
Method 5 software interrupt handling provides a streamlined method of redirecting software interrupts (invoked with the INT n instruction) that occur in virtual-8086 mode back to the 8086 program's interrupt vector table and its interrupt handlers. Method 5 handling is enabled when the VME flag is set to 1, the IOPL value is 3, and the bit for the interrupt vector in the redirection bit map is set to 0.
The processor performs the following actions to make an implicit call to the selected 8086 program interrupt handler:
1. Pushes the low-order 16 bits of the EFLAGS register onto the stack with the NT and IOPL bits cleared.
2. Pushes the current values of the CS and EIP registers onto the current stack. (Only the 16 least-significant bits of the EIP register are pushed and no stack switch occurs.)
3. Clears the IF flag in the EFLAGS register to disable interrupts.
4. Clears the TF flag in the EFLAGS register.
5. Locates the 8086 program interrupt vector table at linear address 0 for the 8086-mode task.
6. Loads the CS and EIP registers with values from the interrupt vector table entry pointed to by the interrupt vector number. Only the 16 low-order bits of the EIP are loaded and the 16 high-order bits are set to 0. The interrupt vector table is assumed to be at linear address 0 of the current virtual-8086 task.
7. Begins executing the selected interrupt handler.
An IRET instruction at the end of the handler procedure reverses these steps to return program control to the interrupted 8086 program.
Note that with method 5 handling, a mode switch from virtual-8086 mode to protected mode does not occur. The processor remains in virtual-8086 mode throughout the interrupt-handling operation.
The method 5 handling actions are virtually identical to the actions the processor takes when handling software interrupts in real-address mode. The benefit of using method 5 handling to access the 8086 program handlers is that it avoids the overhead of methods 2 and 3 handling, which requires first going to the virtual-8086 monitor, then to the 8086 program handler, then back again to the virtual-8086 monitor, before returning to the interrupted 8086 program (refer to Section 16.3.1.2., Handling an Interrupt or Exception With an 8086 Program Interrupt or Exception Handler).
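The method 5 steps can be followed in a small C simulation. This is a sketch under stated assumptions, not a description of processor internals: the V86Cpu structure, the flat `mem` array standing in for the task's linear address space, and the function names are all hypothetical. The interrupt vector table layout (4 bytes per entry, offset word followed by segment word, starting at linear address 0) is the standard 8086 layout.

```c
#include <stdint.h>

#define FL_TF   (1u << 8)
#define FL_IF   (1u << 9)
#define FL_IOPL (3u << 12)
#define FL_NT   (1u << 14)

/* Hypothetical register subset for a virtual-8086 task. */
typedef struct {
    uint32_t eflags;
    uint32_t eip;
    uint32_t esp;     /* only the low 16 bits (SP) are used */
    uint16_t cs;
    uint16_t ss;
} V86Cpu;

/* Push one 16-bit word at SS:SP (linear = SS * 16 + SP), little-endian. */
static void push16(V86Cpu *c, uint8_t *mem, uint16_t value) {
    uint16_t sp = (uint16_t)(c->esp - 2u);
    uint32_t lin = ((uint32_t)c->ss << 4) + sp;
    mem[lin] = (uint8_t)(value & 0xFFu);
    mem[lin + 1] = (uint8_t)(value >> 8);
    c->esp = (c->esp & 0xFFFF0000u) | sp;
}

/* Steps 1 through 6 of method 5 for software interrupt `vector`. */
void method5_int(V86Cpu *c, uint8_t *mem, uint8_t vector) {
    /* 1. Push low 16 bits of EFLAGS with NT and IOPL cleared. */
    uint16_t flags = (uint16_t)(c->eflags & 0xFFFFu);
    flags &= (uint16_t)~(FL_NT | FL_IOPL);
    push16(c, mem, flags);
    /* 2. Push CS and the low 16 bits of EIP; no stack switch. */
    push16(c, mem, c->cs);
    push16(c, mem, (uint16_t)c->eip);
    /* 3 and 4. Clear IF and TF. */
    c->eflags &= ~(FL_IF | FL_TF);
    /* 5 and 6. Load CS:IP from the vector table entry at linear 0. */
    uint32_t entry = (uint32_t)vector * 4u;
    uint16_t off = (uint16_t)(mem[entry] | (mem[entry + 1] << 8));
    uint16_t seg = (uint16_t)(mem[entry + 2] | (mem[entry + 3] << 8));
    c->cs = seg;
    c->eip = off;   /* high-order 16 bits of EIP become 0 */
}
```

After the call, the stack holds FLAGS, CS, and IP in the same order a real-address mode INT would leave them, which is why an ordinary IRET in the 8086 handler can undo the sequence.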
NOTE
Methods 1 and 4 handling can handle a software interrupt in a virtual-8086 task with a regular protected-mode handler, but this approach requires all virtual-8086 tasks to use the same software interrupt handlers, which generally does not give sufficient latitude to the programs running in the virtual-8086 tasks, particularly MS-DOS programs.
16.3.3.5. METHOD 6: SOFTWARE INTERRUPT HANDLING
Method 6 handling is enabled when the VME flag is set to 1, the IOPL value is less than 3, and the bit for the interrupt or exception vector in the redirection bit map is set to 0. With method 6 interrupt handling, software interrupts are handled in the same manner as was described for method 5 handling (refer to Section 16.3.3.4., Method 5: Software Interrupt Handling).
Method 6 differs from method 5 in that with the IOPL value set to less than 3, the VIF and VIP flags in the EFLAGS register are enabled, providing virtual interrupt support for handling class 2 maskable hardware interrupts (refer to Section 16.3.2., Class 2 - Maskable Hardware Interrupt Handling in Virtual-8086 Mode Using the Virtual Interrupt Mechanism). These flags provide the virtual-8086 monitor with an efficient means of handling maskable hardware interrupts that occur during a virtual-8086 mode task. Also, because the IOPL value is less than 3 and the VIF flag is enabled, the information pushed on the stack by the processor when invoking the interrupt handler is slightly different between methods 5 and 6 (refer to Table 16-2).
16.4. PROTECTED-MODE VIRTUAL INTERRUPTS

The Intel Architecture processors (beginning with the Pentium processor) also support the VIF and VIP flags in the EFLAGS register in protected mode; this support is enabled by setting the PVI (protected-mode virtual interrupt) flag in the CR4 register. Setting the PVI flag allows applications running at privilege level 3 to execute the CLI and STI instructions without causing a general-protection exception (#GP) or affecting hardware interrupts.
When the PVI flag is set to 1, the CPL is 3, and the IOPL is less than 3, the STI and CLI instructions set and clear the VIF flag in the EFLAGS register, leaving IF unaffected. In this mode of operation, an application running in protected mode and at a CPL of 3 can inhibit interrupts in the same manner as is described in Section 16.3.2., Class 2 - Maskable Hardware Interrupt Handling in Virtual-8086 Mode Using the Virtual Interrupt Mechanism for a virtual-8086 mode task. When the application executes the CLI instruction, the processor clears the VIF flag. If the processor receives a maskable hardware interrupt when the VIF flag is clear, the processor invokes the protected-mode interrupt handler. This handler checks the state of the VIF flag in the EFLAGS register. If the VIF flag is clear (indicating that the active task does not want to have interrupts handled now), the handler sets the VIP flag in the EFLAGS image on the stack and returns to the privilege-level 3 application, which continues program execution. When the application executes a STI instruction to set the VIF flag, the processor automatically invokes the general-protection exception handler, which can then handle the pending interrupt. After handling the pending interrupt, the handler typically sets the VIF flag and clears the VIP flag in the EFLAGS image on the stack and executes a return to the application program. The next time the processor receives a maskable hardware interrupt, the processor will handle it in the normal manner for interrupts received while the processor is operating at a CPL of 3.
As with the virtual mode extension (enabled with the VME flag in the CR4 register), the protected-mode virtual interrupt extension only affects maskable hardware interrupts (interrupt vectors 32 through 255). NMI interrupts and exceptions are handled in the normal manner.
When protected-mode virtual interrupts are disabled (that is, when the PVI flag in control register CR4 is set to 0, the CPL is less than 3, or the IOPL value is 3), then the CLI and STI instructions execute in a manner compatible with the Intel486 processor. That is, if the CPL is greater (less privileged) than the I/O privilege level (IOPL), a general-protection exception occurs. If the IOPL value is 3, CLI and STI clear or set the IF flag, respectively.
PUSHF, POPF, and IRET execute as they do on the Intel486 processor, regardless of whether protected-mode virtual interrupts are enabled.
It is only possible to enter virtual-8086 mode through a task switch or the execution of an IRET instruction, and it is only possible to leave virtual-8086 mode by faulting to a protected-mode interrupt handler (typically the general-protection exception handler, which in turn calls the virtual-8086 monitor). In both cases, the EFLAGS register is saved and restored. This is not true, however, in protected mode when the PVI flag is set and the processor is not in virtual-8086 mode. Here, it is possible to call a procedure at a different privilege level, in which case the EFLAGS register is not saved or modified. However, the states of VIF and VIP flags are never examined by the processor when the CPL is not 3.
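The conditions under which CLI and STI act on VIF, act on IF, or fault in protected mode can be summarized in a short C model. This is an illustrative sketch only; the function and type names are hypothetical, and the caller is assumed to supply the PVI flag from CR4, the CPL, and the IOPL.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_IF  (1u << 9)
#define EFLAGS_VIF (1u << 19)
#define EFLAGS_VIP (1u << 20)

typedef enum { OP_CLI, OP_STI } IntOp;

/* Models CLI/STI in protected mode (not virtual-8086 mode).
 * Returns true if the instruction would raise #GP; otherwise updates
 * *eflags and returns false. */
bool pm_cli_sti(IntOp op, bool pvi, unsigned cpl, unsigned iopl,
                uint32_t *eflags) {
    if (pvi && cpl == 3 && iopl < 3) {
        /* Protected-mode virtual interrupts: operate on VIF only. */
        if (op == OP_CLI) {
            *eflags &= ~EFLAGS_VIF;
            return false;
        }
        if (*eflags & EFLAGS_VIP)
            return true;            /* pending interrupt: #GP on STI */
        *eflags |= EFLAGS_VIF;
        return false;
    }
    /* Intel486-compatible behavior. */
    if (cpl > iopl)
        return true;                /* insufficient privilege: #GP */
    if (op == OP_CLI)
        *eflags &= ~EFLAGS_IF;
    else
        *eflags |= EFLAGS_IF;
    return false;
}
```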
CHAPTER 17 MIXING 16-BIT AND 32-BIT CODE

Program modules written to run on Intel Architecture processors can be either 16-bit modules or 32-bit modules. Table 17-1 shows the characteristics of 16-bit and 32-bit modules.
Table 17-1. Characteristics of 16-Bit and 32-Bit Program Modules
Characteristic                                            16-Bit Program Modules   32-Bit Program Modules
Segment Size                                              0 to 64 KBytes           0 to 4 GBytes
Operand Sizes                                             8 bits and 16 bits       8 bits and 32 bits
Pointer Offset Size (Address Size)                        16 bits                  32 bits
Stack Pointer Size                                        16 bits                  32 bits
Control Transfers Allowed to Code Segments of This Size   16 bits                  32 bits
The Intel Architecture processors function most efficiently when executing 32-bit program modules. They can, however, also execute 16-bit program modules, in any of the following ways:
- In real-address mode.
- In virtual-8086 mode.
- In system management mode (SMM).
- As a protected-mode task, when the code, data, and stack segments for the task are all configured as 16-bit segments.
- By integrating 16-bit and 32-bit segments into a single protected-mode task.
- By integrating 16-bit operations into 32-bit code segments.
Real-address mode, virtual-8086 mode, and SMM are native 16-bit modes. A legacy program assembled and/or compiled to run on an Intel 8086 or Intel 286 processor should run in real-address mode or virtual-8086 mode without modification. Sixteen-bit program modules can also be written to run in real-address mode for handling system initialization or to run in SMM for handling system management functions. Refer to Chapter 16, 8086 Emulation for detailed information on real-address mode and virtual-8086 mode; refer to Chapter 12, System Management Mode (SMM) for information on SMM.
This chapter describes how to integrate 16-bit program modules with 32-bit program modules when operating in protected mode and how to mix 16-bit and 32-bit code within 32-bit code segments.
17.1. DEFINING 16-BIT AND 32-BIT PROGRAM MODULES

The following Intel Architecture mechanisms are used to distinguish between and support 16-bit and 32-bit segments and operations:
- The D (default operand and address size) flag in code-segment descriptors.
- The B (default stack size) flag in stack-segment descriptors.
- 16-bit and 32-bit call gates, interrupt gates, and trap gates.
- Operand-size and address-size instruction prefixes.
- 16-bit and 32-bit general-purpose registers.
The D flag in a code-segment descriptor determines the default operand-size and address-size for the instructions of a code segment. (In real-address mode and virtual-8086 mode, which do not use segment descriptors, the default is 16 bits.) A code segment with its D flag set is a 32-bit segment; a code segment with its D flag clear is a 16-bit segment.
The B flag in the stack-segment descriptor specifies the size of the stack pointer (the 32-bit ESP register or the 16-bit SP register) used by the processor for implicit stack references. The B flag for all data descriptors also controls the upper address range for expand-down segments.
When transferring program control to another code segment through a call gate, interrupt gate, or trap gate, the operand size used during the transfer is determined by the type of gate used (16-bit or 32-bit), not by the D flag or prefix of the transfer instruction. The gate type determines how return information is saved on the stack (or stacks).
For most efficient and trouble-free operation of the processor, 32-bit programs or tasks should have the D flag in the code-segment descriptor and the B flag in the stack-segment descriptor set, and 16-bit programs or tasks should have these flags clear. Program control transfers from 16-bit segments to 32-bit segments (and vice versa) are handled most efficiently through call, interrupt, or trap gates.
Instruction prefixes can be used to override the default operand size and address size of a code segment. These prefixes can be used in real-address mode as well as in protected mode and virtual-8086 mode. An operand-size or address-size prefix only changes the size for the duration of the instruction.
17.2. MIXING 16-BIT AND 32-BIT OPERATIONS WITHIN A CODE SEGMENT

The following two instruction prefixes allow mixing of 32-bit and 16-bit operations within one segment:
- The operand-size prefix (66H)
- The address-size prefix (67H)
These prefixes reverse the default size selected by the D flag in the code-segment descriptor. For example, the processor can interpret the (MOV mem, reg) instruction in any of four ways:
In a 32-bit code segment:
- Moves 32 bits from a 32-bit register to memory using a 32-bit effective address.
- If preceded by an operand-size prefix, moves 16 bits from a 16-bit register to memory using a 32-bit effective address.
- If preceded by an address-size prefix, moves 32 bits from a 32-bit register to memory using a 16-bit effective address.
- If preceded by both an address-size prefix and an operand-size prefix, moves 16 bits from a 16-bit register to memory using a 16-bit effective address.
In a 16-bit code segment:
- Moves 16 bits from a 16-bit register to memory using a 16-bit effective address.
- If preceded by an operand-size prefix, moves 32 bits from a 32-bit register to memory using a 16-bit effective address.
- If preceded by an address-size prefix, moves 16 bits from a 16-bit register to memory using a 32-bit effective address.
- If preceded by both an address-size prefix and an operand-size prefix, moves 32 bits from a 32-bit register to memory using a 32-bit effective address.
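The eight cases listed above reduce to one rule: each prefix toggles the corresponding default chosen by the D flag. A minimal C sketch of that rule follows; the type and function names are hypothetical.

```c
#include <stdbool.h>

typedef struct {
    int operand_bits;   /* 16 or 32 */
    int address_bits;   /* 16 or 32 */
} EffSize;

/* d_flag:    D flag of the code segment (true = 32-bit default)
 * prefix_66: operand-size prefix (66H) present on the instruction
 * prefix_67: address-size prefix (67H) present on the instruction */
EffSize effective_sizes(bool d_flag, bool prefix_66, bool prefix_67) {
    EffSize s;
    /* A prefix reverses the default, so the result is 32 bits exactly
     * when the D flag and the prefix disagree. */
    s.operand_bits = (d_flag != prefix_66) ? 32 : 16;
    s.address_bits = (d_flag != prefix_67) ? 32 : 16;
    return s;
}
```

For example, in a 16-bit segment (D flag clear) with only the 66H prefix present, the sketch yields a 32-bit operand and a 16-bit effective address, matching the second case of the 16-bit list.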
The previous examples show that any instruction can generate any combination of operand size and address size regardless of whether the instruction is in a 16- or 32-bit segment. The choice of the 16- or 32-bit default for a code segment is normally based on the following criteria:
- Performance: Always use 32-bit code segments when possible. They run much faster than 16-bit code segments on P6 family processors, and somewhat faster on earlier Intel Architecture processors.
- The operating system the code segment will be running on: If the operating system is a 16-bit operating system, it may not support 32-bit program modules.
- Mode of operation: If the code segment is being designed to run in real-address mode, virtual-8086 mode, or SMM, it must be a 16-bit code segment.
- Backward compatibility to earlier Intel Architecture processors: If a code segment must be able to run on an Intel 8086 or Intel 286 processor, it must be a 16-bit code segment.
17.3. SHARING DATA AMONG MIXED-SIZE CODE SEGMENTS

Data segments can be accessed from both 16-bit and 32-bit code segments. When a data segment that is larger than 64 KBytes is to be shared among 16- and 32-bit code segments, the data that is to be accessed from the 16-bit code segments must be located within the first 64 KBytes of the data segment. The reason for this is that 16-bit pointers by definition can only point to the first 64 KBytes of a segment.
A stack that spans less than 64 KBytes can be shared by both 16- and 32-bit code segments. This class of stacks includes:
- Stacks in expand-up segments with the G (granularity) and B (big) flags in the stack-segment descriptor clear.
- Stacks in expand-down segments with the G and B flags clear.
- Stacks in expand-up segments with the G flag set and the B flag clear and where the stack is contained completely within the lower 64 KBytes. (Offsets greater than FFFFH can be used for data, other than the stack, which is not shared.)
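The three classes of shareable stacks listed above can be encoded as a single predicate. The following C sketch is only a restatement of that list; the function name and the `stack_in_low_64k` parameter (whether the stack lies entirely within the lower 64 KBytes of the segment) are hypothetical.

```c
#include <stdbool.h>

/* Returns true if a stack segment with the given descriptor settings
 * falls into one of the classes that can be shared by 16-bit and
 * 32-bit code segments.
 *   expand_down      - segment is an expand-down segment
 *   g_flag, b_flag   - G (granularity) and B (big) descriptor flags
 *   stack_in_low_64k - stack is contained within offsets 0..FFFFH */
bool stack_is_shareable(bool expand_down, bool g_flag, bool b_flag,
                        bool stack_in_low_64k) {
    if (b_flag)
        return false;        /* every listed class has B clear */
    if (!g_flag)
        return true;         /* expand-up or expand-down, G and B clear */
    /* G set: only expand-up segments whose stack stays below 64 KBytes. */
    return !expand_down && stack_in_low_64k;
}
```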
Refer to Section 3.4.3., Segment Descriptors in Chapter 3, Protected-Mode Memory Management for a description of the G and B flags and the expand-down stack type.
The B flag cannot, in general, be used to change the size of stack used by a 16-bit code segment. This flag controls the size of the stack pointer only for implicit stack references such as those caused by interrupts, exceptions, and the PUSH, POP, CALL, and RET instructions. It does not control explicit stack references, such as accesses to parameters or local variables. A 16-bit code segment can use a 32-bit stack only if the code is modified so that all explicit references to the stack are preceded by the 32-bit address-size prefix, causing those references to use 32-bit addressing, and all explicit writes to the stack pointer are preceded by a 32-bit operand-size prefix.
In 32-bit, expand-down segments, all offsets may be greater than 64 KBytes; therefore, 16-bit code cannot use this kind of stack segment unless the code segment is modified to use 32-bit addressing.
17.4. TRANSFERRING CONTROL AMONG MIXED-SIZE CODE SEGMENTS

There are three ways for a procedure in a 16-bit code segment to safely make a call to a 32-bit code segment:
- Make the call through a 32-bit call gate.
- Make a 16-bit call to a 32-bit interface procedure. The interface procedure then makes a 32-bit call to the intended destination.
- Modify the 16-bit procedure, inserting an operand-size prefix before the call, to change it to a 32-bit call.
Likewise, there are three ways for a procedure in a 32-bit code segment to safely make a call to a 16-bit code segment:
- Make the call through a 16-bit call gate. Here, the EIP value at the CALL instruction cannot exceed FFFFH.
- Make a 32-bit call to a 16-bit interface procedure. The interface procedure then makes a 16-bit call to the intended destination.
- Modify the 32-bit procedure, inserting an operand-size prefix before the call, changing it to a 16-bit call. Be certain that the return offset does not exceed FFFFH.
These methods of transferring program control overcome the following architectural limitations imposed on calls between 16-bit and 32-bit code segments:
- Pointers from 16-bit code segments (which by default can only be 16 bits) cannot be used to address data or code located beyond FFFFH in a 32-bit segment.
- The operand-size attributes for a CALL and its companion RETURN instruction must be the same to maintain stack coherency. This is also true for implicit calls to interrupt and exception handlers and their companion IRET instructions.
- A 32-bit parameter (particularly a pointer parameter) greater than FFFFH cannot be squeezed into a 16-bit parameter location on a stack.
- The size of the stack pointer (SP or ESP) changes when switching between 16-bit and 32-bit code segments.
These limitations are discussed in greater detail in the following sections.
17.4.1. Code-Segment Pointer Size

For control-transfer instructions that use a pointer to identify the next instruction (that is, those that do not use gates), the operand-size attribute determines the size of the offset portion of the pointer. The implications of this rule are as follows:
- A JMP, CALL, or RET instruction from a 32-bit segment to a 16-bit segment is always possible using a 32-bit operand size, providing the 32-bit pointer does not exceed FFFFH.
- A JMP, CALL, or RET instruction from a 16-bit segment to a 32-bit segment cannot address a destination greater than FFFFH, unless the instruction is given an operand-size prefix.
Refer to Section 17.4.5., Writing Interface Procedures for an interface procedure that can transfer program control from 16-bit segments to destinations in 32-bit segments beyond FFFFH.
17.4.2. Stack Management for Control Transfer

Because the stack is managed differently for 16-bit procedure calls than for 32-bit calls, the operand-size attribute of the RET instruction must match that of the CALL instruction (refer to Figure 17-1). On a 16-bit call, the processor pushes the contents of the 16-bit IP register and (for calls between privilege levels) the 16-bit SP register. The matching RET instruction must also use a 16-bit operand size to pop these 16-bit values from the stack into the 16-bit registers. A 32-bit CALL instruction pushes the contents of the 32-bit EIP register and (for inter-privilege-level calls) the 32-bit ESP register. Here, the matching RET instruction must use a 32-bit operand size to pop these 32-bit values from the stack into the 32-bit registers. If the two parts of a CALL/RET instruction pair do not have matching operand sizes, the stack will not be managed correctly and the values of the instruction pointer and stack pointer will not be restored to correct values.
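The effect of a mismatched CALL/RET pair can be seen in a toy C model of a far call without a privilege transition. This is a simplified sketch, not a model of the full processor behavior: the Stack structure and function names are hypothetical, parameters are omitted, and CS is assumed to occupy one operand-size slot on the stack (a 4-byte slot for a 32-bit call).

```c
#include <stdint.h>

/* Hypothetical toy stack: 64 bytes, growing toward lower addresses. */
typedef struct {
    uint8_t  mem[64];
    uint32_t sp;        /* index of the current top of stack */
} Stack;

static void push(Stack *s, uint32_t value, int bits) {
    int bytes = bits / 8;
    s->sp -= (uint32_t)bytes;
    for (int i = 0; i < bytes; i++)
        s->mem[s->sp + (uint32_t)i] = (uint8_t)(value >> (8 * i));
}

static uint32_t pop(Stack *s, int bits) {
    int bytes = bits / 8;
    uint32_t value = 0;
    for (int i = 0; i < bytes; i++)
        value |= (uint32_t)s->mem[s->sp + (uint32_t)i] << (8 * i);
    s->sp += (uint32_t)bytes;
    return value;
}

/* Far CALL: pushes CS, then IP (opsize 16) or EIP (opsize 32). */
void far_call(Stack *s, uint16_t cs, uint32_t eip, int opsize) {
    push(s, cs, opsize);
    push(s, (opsize == 16) ? (eip & 0xFFFFu) : eip, opsize);
}

/* Far RET: pops (E)IP, then CS, using the RET's own operand size. */
void far_ret(Stack *s, uint16_t *cs, uint32_t *eip, int opsize) {
    *eip = pop(s, opsize);
    *cs = (uint16_t)pop(s, opsize);
}
```

In this model, a 32-bit call from CS:EIP = 001BH:00401000H followed by a 16-bit return pops 1000H as the instruction pointer and 0040H (the upper half of the saved EIP) as CS, and leaves the stack pointer 4 bytes short of its original value.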
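[Figure not reproduced. Panels: "Without Privilege Transition" and "With Privilege Transition", each showing a 32-bit-wide stack (bits 31 to 0) after a 16-bit call and after a 32-bit call. Without privilege transition the stack holds PARM 2, PARM 1, CS, and IP (16-bit call) or EIP (32-bit call). With privilege transition the stack holds SS, SP (16-bit call) or ESP (32-bit call), PARM 2, PARM 1, CS, and IP or EIP. Unused upper halves of slots are marked Undefined.]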
Figure 17-1. Stack after Far 16- and 32-Bit Calls
While executing 32-bit code, if a call is made to a 16-bit code segment which is at the same or a more privileged level (that is, the DPL of the called code segment is less than or equal to the CPL of the calling code segment) through a 16-bit call gate, then the upper 16 bits of the ESP register may be unreliable upon returning to the 32-bit code segment (that is, after executing a RET in the 16-bit code segment).
When the CALL instruction and its matching RET instruction are in code segments that have D flags with the same values (that is, both are 32-bit code segments or both are 16-bit code segments), the default settings may be used. When the CALL instruction and its matching RET instruction are in segments which have different D-flag settings, an operand-size prefix must be used.
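The effect of mismatched operand sizes can be seen by tracking the stack pointer across a far CALL and its RET. The following Python sketch models only the case without a privilege transition and without parameters, where the call pushes CS and IP (or EIP) in two slots of the operand size; the function names are hypothetical.

```python
def far_call_frame_bytes(operand_size: int) -> int:
    """Bytes pushed by a far CALL with no privilege transition.

    A 16-bit call pushes CS and IP in two 2-byte slots; a 32-bit call
    pushes CS and EIP in two 4-byte slots.
    """
    if operand_size not in (16, 32):
        raise ValueError("operand size must be 16 or 32")
    return 2 * (operand_size // 8)


def sp_after_call_and_ret(sp: int, call_size: int, ret_size: int) -> int:
    """Stack pointer after a far CALL followed by a far RET."""
    sp_in_callee = sp - far_call_frame_bytes(call_size)
    return sp_in_callee + far_call_frame_bytes(ret_size)
```

When the sizes match, the stack pointer returns to its starting value. A 32-bit CALL paired with a 16-bit RET leaves it 4 bytes too low, and the return address popped is not the one that was pushed.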
17.4.2.1. CONTROLLING THE OPERAND-SIZE ATTRIBUTE FOR A CALL
Three things can determine the operand-size of a call:
- The D flag in the segment descriptor for the calling code segment.
- An operand-size instruction prefix.
- The type of call gate (16-bit or 32-bit), if a call is made through a call gate.
When a call is made with a pointer (rather than a call gate), the D flag for the calling code segment determines the operand-size for the CALL instruction. This operand-size attribute can be overridden by prepending an operand-size prefix to the CALL instruction. So, for example, if the D flag for a code segment is set for 16 bits and the operand-size prefix is used with a CALL instruction, the processor will cause the information stored on the stack to be stored in 32-bit format. If the call is to a 32-bit code segment, the instructions in that code segment will be able to read the stack coherently. Also, a RET instruction from the 32-bit code segment without an operand-size prefix will maintain stack coherency with the 16-bit code segment being returned to.
When a CALL instruction references a call-gate descriptor, the type of call is determined by the type of call gate (16-bit or 32-bit). The offset to the destination in the code segment being called is taken from the gate descriptor; therefore, if a 32-bit call gate is used, a procedure in a 16-bit code segment can call a procedure located more than 64 Kbytes from the base of a 32-bit code segment, because a 32-bit call gate uses a 32-bit offset.
Note that regardless of the operand size of the call and how it is determined, the size of the stack pointer used (SP or ESP) is always controlled by the B flag in the stack-segment descriptor currently in use (that is, when B is clear, SP is used, and when B is set, ESP is used).
An unmodified 16-bit code segment that has run successfully on an 8086 processor or in real mode on a P6 family processor will have its D flag clear and will not use operand-size override prefixes. As a result, all CALL instructions in this code segment will use the 16-bit operand-size attribute. Procedures in these code segments can be modified to safely call procedures in 32-bit code segments in either of two ways:
- Relink the CALL instruction to point to 32-bit call gates (refer to Section 17.4.2.2., "Passing Parameters With a Gate").
- Add a 32-bit operand-size prefix to each CALL instruction.
17.4.2.2. PASSING PARAMETERS WITH A GATE
When referencing 32-bit gates with 16-bit procedures, it is important to consider the number of parameters passed in each procedure call. The count field of the gate descriptor specifies the size of the parameter string to copy from the current stack to the stack of a more privileged (numerically lower privilege level) procedure. The count field of a 16-bit gate specifies the number of 16-bit words to be copied, whereas the count field of a 32-bit gate specifies the number of 32-bit doublewords to be copied. The count field for a 32-bit gate must thus be half the number of words being placed on the stack by a 16-bit procedure. Also, the 16-bit procedure must use an even number of words as parameters.
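The relationship between the two count fields is simple arithmetic, sketched below in Python. The function name is hypothetical; the sketch only encodes the two rules stated above (halve the word count, and reject an odd word count).

```python
def dword_count_for_32bit_gate(word_count: int) -> int:
    """Count-field value for a 32-bit gate called by a 16-bit procedure.

    word_count is the number of 16-bit words the 16-bit caller pushes
    as parameters. A 32-bit gate copies 32-bit doublewords, so the
    count is half the word count, and the word count must be even.
    """
    if word_count < 0:
        raise ValueError("word count cannot be negative")
    if word_count % 2 != 0:
        raise ValueError("a 16-bit caller must pass an even number of words")
    return word_count // 2
```

For example, a 16-bit procedure that pushes six words of parameters needs a 32-bit gate whose count field is 3.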
17.4.3. Interrupt Control Transfers

A program-control transfer caused by an exception or interrupt is always carried out through an interrupt or trap gate (located in the IDT). Here, the type of the gate (16-bit or 32-bit) determines the operand-size attribute used in the implicit call to the exception or interrupt handler procedure in another code segment. A 32-bit interrupt or trap gate provides a safe interface to a 32-bit exception or interrupt handler when the exception or interrupt occurs in either a 32-bit or a 16-bit code segment. It is sometimes impractical, however, to place exception or interrupt handlers in 16-bit code segments, because only 16-bit return addresses are saved on the stack. If an exception or interrupt occurs in a 32-bit code segment when the EIP was greater than FFFFH, the 16-bit handler procedure cannot provide the correct return address.
17.4.4. Parameter Translation

When segment offsets or pointers (which contain segment offsets) are passed as parameters between 16-bit and 32-bit procedures, some translation is required. If a 32-bit procedure passes a pointer to data located beyond 64 KBytes to a 16-bit procedure, the 16-bit procedure cannot use it. Except for this limitation, interface code can perform any format conversion between 32-bit and 16-bit pointers that may be needed. Parameters passed by value between 32-bit and 16-bit code also may require translation between 32-bit and 16-bit formats. The form of the translation is application-dependent.
17.4.5. Writing Interface Procedures

Placing interface code between 32-bit and 16-bit procedures can be the solution to the following interface problems:
- Allowing procedures in 16-bit code segments to call procedures with offsets greater than FFFFH in 32-bit code segments.
- Matching operand-size attributes between companion CALL and RET instructions.
- Translating parameters (data), including managing parameter strings with a variable count or an odd number of 16-bit words.
- The possible invalidation of the upper bits of the ESP register.
The interface procedure is simplified where these rules are followed:
1. The interface procedure must reside in a 32-bit code segment (the D flag for the code-segment descriptor is set).
2. All procedures that may be called by 16-bit procedures must have offsets not greater than FFFFH.
3. All return addresses saved by 16-bit procedures must have offsets not greater than FFFFH.
The interface procedure becomes more complex if any of these rules are violated. For example, if a 16-bit procedure calls a 32-bit procedure with an entry point beyond FFFFH, the interface procedure will need to provide the offset to the entry point. The mapping between 16- and 32-bit addresses is only performed automatically when a call gate is used, because the gate descriptor for a call gate contains a 32-bit address. When a call gate is not used, the interface code must provide the 32-bit address. The structure of the interface procedure depends on the types of calls it is going to support, as follows:
- Calls from 16-bit procedures to 32-bit procedures. Calls to the interface procedure from a 16-bit code segment are made with 16-bit CALL instructions (by default, because the D flag for the calling code-segment descriptor is clear), and 16-bit operand-size prefixes are used with RET instructions to return from the interface procedure to the calling procedure. Calls from the interface procedure to 32-bit procedures are performed with 32-bit CALL instructions (by default, because the D flag for the interface procedure's code segment is set), and returns from the called procedures to the interface procedure are performed with 32-bit RET instructions (also by default).
- Calls from 32-bit procedures to 16-bit procedures. Calls to the interface procedure from a 32-bit code segment are made with 32-bit CALL instructions (by default), and returns to the calling procedure from the interface procedure are made with 32-bit RET instructions (also by default). Calls from the interface procedure to 16-bit procedures require the CALL instructions to have operand-size prefixes, and returns from the called procedures to the interface procedure are performed with 16-bit RET instructions (by default).
CHAPTER 18 INTEL ARCHITECTURE COMPATIBILITY

All Intel Architecture processors are binary compatible. Compatibility means that, within certain limited constraints, programs that execute on previous generations of Intel Architecture processors will produce identical results when executed on later Intel Architecture processors. The compatibility constraints and any implementation differences between the Intel Architecture processors are described in this chapter.
Each new Intel Architecture processor has enhanced the software-visible architecture from that found in earlier Intel Architecture processors. Those enhancements have been defined with consideration for compatibility with previous and future processors. This chapter also summarizes the compatibility considerations for those extensions.
18.1. INTEL ARCHITECTURE FAMILIES AND CATEGORIES

Intel Architecture processors are referred to in several different ways in this chapter, depending on the type of compatibility information being related, as described in the following:
- Intel Architecture Processors: All the Intel processors based on the Intel Architecture, which include the 8086/88, Intel 286, Intel386, Intel486, Pentium, and P6 family processors.
- 32-bit Processors: All the Intel Architecture processors that use a 32-bit architecture, which include the Intel386, Intel486, Pentium, and P6 family processors.
- 16-bit Processors: All the Intel Architecture processors that use a 16-bit architecture, which include the 8086/88 and Intel 286 processors.
- P6 Family Processors: All the Intel Architecture processors that are based on the P6 family micro-architecture, which include the Pentium Pro, Pentium II, Pentium III, and future P6 family processors.
18.2. RESERVED BITS

Throughout this manual, certain bits are marked as reserved in many register and memory layout descriptions. When bits are marked as undefined or reserved, it is essential for compatibility with future processors that software treat these bits as having a future, though unknown, effect. Software should follow these guidelines in dealing with reserved bits:
- Do not depend on the states of any reserved bits when testing the values of registers or memory locations that contain such bits. Mask out the reserved bits before testing.
- Do not depend on the states of any reserved bits when storing them to memory or to a register.
- Do not depend on the ability to retain information written into any reserved bits.
- When loading a register, always load the reserved bits with the values indicated in the documentation, if any, or reload them with values previously read from the same register.
Software written for existing Intel Architecture processors that handles reserved bits correctly will port to future Intel Architecture processors without generating protection exceptions.
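The guidelines above amount to two habits: mask reserved bits out before testing, and write reserved bits back exactly as they were read. The Python sketch below illustrates both on a 32-bit value. The function names are hypothetical, and the reserved-bit mask is supplied by the caller because it differs for every register.

```python
MASK32 = 0xFFFFFFFF


def defined_bits(value: int, reserved_mask: int) -> int:
    """Return value with all reserved bits masked out, ready for testing."""
    return value & ~reserved_mask & MASK32


def updated_register(old_value: int, reserved_mask: int, new_bits: int) -> int:
    """Read-modify-write update of a register image.

    Reserved bits keep the values previously read from the register;
    only the defined (non-reserved) bits are taken from new_bits.
    """
    preserved = old_value & reserved_mask
    changed = new_bits & ~reserved_mask & MASK32
    return preserved | changed
```

In this model, any reserved bits the caller sets in new_bits are ignored rather than written.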
18.3. ENABLING NEW FUNCTIONS AND MODES

Most of the new control functions defined for the P6 family and Pentium processors are enabled by new mode flags in the control registers (primarily register CR4). This register is undefined for Intel Architecture processors earlier than the Pentium processor. Attempting to access this register with an Intel486 or earlier Intel Architecture processor results in an invalid-opcode exception (#UD). Consequently, programs that execute correctly on the Intel486 or earlier Intel Architecture processor cannot erroneously enable these functions. Attempting to set a reserved bit in register CR4 to a value other than its original value results in a general-protection exception (#GP). So, programs that execute on the P6 family and Pentium processors cannot erroneously enable functions that may be implemented in future Intel Architecture processors.
The P6 family and Pentium processors do not check for attempts to set reserved bits in model-specific registers. It is the obligation of the software writer to enforce this discipline. These reserved bits may be used in future Intel processors.
18.4. DETECTING THE PRESENCE OF NEW FEATURES THROUGH SOFTWARE

Software can check for the presence of new architectural features and extensions in either of two ways:
- Test for the presence of the feature or extension. Software can test for the presence of new flags in the EFLAGS register and control registers. If these flags are reserved (meaning not present in the processor executing the test), an exception is generated. Likewise, software can attempt to execute a new instruction, which results in an invalid-opcode exception (#UD) being generated if it is not supported.
- Execute the CPUID instruction. The CPUID instruction (added to the Intel Architecture in the Pentium processor) indicates the presence of new features directly.
Refer to Chapter 10, "Processor Identification and Feature Determination", in the Intel Architecture Software Developer's Manual, Volume 1, for detailed information on detecting new processor features and extensions.
18.5. MMX TECHNOLOGY

The Pentium processor with MMX technology introduced the MMX technology and a set of MMX instructions to the Intel Architecture. The MMX instructions are summarized in Chapter 6, "Instruction Set Summary", in the Intel Architecture Software Developer's Manual, Volume 1, and are described in detail in Chapter 3 in the Intel Architecture Software Developer's Manual, Volume 2. The MMX technology and MMX instructions are also included in the Pentium II and Pentium III processors.
18.6. STREAMING SIMD EXTENSIONS

The Pentium III processor introduced the Streaming SIMD Extensions. This is a set of new instructions added to enhance performance of several classes of applications. The Streaming SIMD Extensions are summarized in Chapter 6, "Instruction Set Summary", in the Intel Architecture Software Developer's Manual, Volume 1, and are described in detail in Chapter 3 in the Intel Architecture Software Developer's Manual, Volume 2. Several of these new instructions operate in the same register space as the MMX instructions. When using these instructions, the rules that apply to MMX technology programming apply to this subset of the new instructions as well.
18.7. NEW INSTRUCTIONS IN THE PENTIUM AND LATER INTEL ARCHITECTURE PROCESSORS
Table 18-1 identifies the instructions introduced into the Intel Architecture in the Pentium and later Intel Architecture processors.
Table 18-1. New Instructions in the Pentium and Later Intel Architecture Processors

Instruction | CPUID Identification Bits | Introduced In
Streaming SIMD Extensions | EDX, Bit 25 | Pentium III processor
SYSENTER/SYSEXIT (fast system call) | EDX, Bit 11 | Pentium II processor
FXSAVE/FXRSTOR (fast save/restore) | EDX, Bit 24 | Pentium II processor
CMOVcc (conditional move) | EDX, Bit 15 | Pentium Pro processor
FCMOVcc (floating-point conditional move) | EDX, Bits 0 and 15 | Pentium Pro processor
FCOMI (floating-point compare and set EFLAGS) | EDX, Bits 0 and 15 | Pentium Pro processor
RDPMC (read performance monitoring counters) | EAX, Bits 8-11, set to 6H; refer to Note 1 | Pentium Pro processor
UD2 (undefined) | EAX, Bits 8-11, set to 6H | Pentium Pro processor
CMPXCHG8B (compare and exchange 8 bytes) | EDX, Bit 8 | Pentium processor
CPUID (CPU identification) | None; refer to Note 2 | Pentium processor
RDTSC (read time-stamp counter) | EDX, Bit 4 | Pentium processor
RDMSR (read model-specific register) | EDX, Bit 5 | Pentium processor
WRMSR (write model-specific register) | EDX, Bit 5 | Pentium processor
MMX Instructions | EDX, Bit 23 | Pentium processor

NOTES:
1. The RDPMC instruction was introduced in the P6 family of processors and added to later model Pentium processors. This instruction is model specific in nature and not architectural.
2. The CPUID instruction is available in all Pentium and P6 family processors and in later models of the Intel486 processors. The ability to set and clear the ID flag (bit 21) in the EFLAGS register indicates the availability of the CPUID instruction.
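As an illustration of how the EDX bits in Table 18-1 might be used, the following Python sketch decodes a feature value that is assumed to have been obtained already from the CPUID instruction (the feature flags are returned for CPUID function 1). The dictionary and function names are hypothetical, and the sketch does not execute CPUID itself.

```python
# EDX bit positions taken from Table 18-1. Entries identified by the
# family field in EAX (RDPMC, UD2) are handled separately below.
EDX_FEATURE_BITS = {
    "Streaming SIMD Extensions": (25,),
    "SYSENTER/SYSEXIT": (11,),
    "FXSAVE/FXRSTOR": (24,),
    "CMOVcc": (15,),
    "FCMOVcc": (0, 15),
    "FCOMI": (0, 15),
    "CMPXCHG8B": (8,),
    "RDTSC": (4,),
    "RDMSR": (5,),
    "WRMSR": (5,),
    "MMX instructions": (23,),
}


def supported_instructions(edx: int) -> list:
    """Names from Table 18-1 whose identification bits are all set in edx."""
    return [
        name
        for name, bits in EDX_FEATURE_BITS.items()
        if all((edx >> bit) & 1 for bit in bits)
    ]


def family_field(eax: int) -> int:
    """Bits 8-11 of EAX; Table 18-1 uses the value 6H for RDPMC and UD2."""
    return (eax >> 8) & 0xF
```

Note that FCMOVcc and FCOMI require both bit 0 and bit 15, so a value with only bit 15 set reports CMOVcc alone.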
18.7.1. Instructions Added Prior to the Pentium Processor

The following instructions were added in the Intel486 processor:
- BSWAP (byte swap) instruction.
- XADD (exchange and add) instruction.
- CMPXCHG (compare and exchange) instruction.
- INVD (invalidate cache) instruction.
- WBINVD (write-back and invalidate cache) instruction.
- INVLPG (invalidate TLB entry) instruction.
The following instructions were added in the Intel386 processor:
- LSS, LFS, and LGS (load SS, FS, and GS registers).
- Long-displacement conditional jumps.
- Single-bit instructions.
- Bit scan instructions.
- Double-shift instructions.
- Byte set on condition instruction.
- Move with sign/zero extension.
- Generalized multiply instruction.
- MOV to and from control registers.
- MOV to and from test registers (now obsolete).
- MOV to and from debug registers.
- RSM (resume from SMM). This instruction was introduced in the Intel386 SL and Intel486 SL processors.
The following instructions were added in the Intel 387 math coprocessor:
- FPREM1.
- FUCOM, FUCOMP, and FUCOMPP.
18.8. OBSOLETE INSTRUCTIONS

The MOV to and from test registers instructions were removed from the Pentium and future Intel Architecture processors. Execution of these instructions generates an invalid-opcode exception (#UD).
18.9. UNDEFINED OPCODES

All new instructions defined for Intel Architecture processors use binary encodings that were reserved on earlier-generation processors. Attempting to execute a reserved opcode always results in an invalid-opcode (#UD) exception being generated. Consequently, programs that execute correctly on earlier-generation processors cannot erroneously execute these instructions and thereby produce unexpected results when executed on later Intel Architecture processors.
18.10. NEW FLAGS IN THE EFLAGS REGISTER

The section titled "EFLAGS Register" in Chapter 3 of the Intel Architecture Software Developer's Manual, Volume 1, shows the configuration of flags in the EFLAGS register for the P6 family processors. No new flags have been added to this register in the P6 family processors. The flags added to this register in the Pentium and Intel486 processors are described in the following sections.
The following flags were added to the EFLAGS register in the Pentium processor:
- VIF (virtual interrupt flag), bit 19.
- VIP (virtual interrupt pending), bit 20.
- ID (identification flag), bit 21.
The AC flag (bit 18) was added to the EFLAGS register in the Intel486 processor.
18.10.1. Using EFLAGS Flags to Distinguish Between 32-Bit Intel Architecture Processors
The following bits in the EFLAGS register can be used to differentiate between the 32-bit Intel Architecture processors:
- Bit 18 (the AC flag) can be used to distinguish an Intel386 processor from the P6 family, Pentium, and Intel486 processors. Since it is not implemented on the Intel386 processor, it will always be clear.
- Bit 21 (the ID flag) indicates whether an application can execute the CPUID instruction. The ability to set and clear this bit indicates that the processor is a P6 family or Pentium processor. The CPUID instruction can then be used to determine which processor.
- Bits 19 (the VIF flag) and 20 (the VIP flag) will always be zero on processors that do not support virtual mode extensions, which includes all 32-bit processors prior to the Pentium processor.
Refer to Chapter 10, "Processor Identification and Feature Determination", in the Intel Architecture Software Developer's Manual, Volume 1, for more information on identifying processors.
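The decision procedure described in this section can be sketched as follows. The Python function below takes as input a mask of the EFLAGS bits that software found it could both set and clear. How that mask is obtained (with PUSHFD/POPFD sequences) is outside the sketch, and the function name and return strings are hypothetical.

```python
AC_BIT = 1 << 18  # alignment check flag, added in the Intel486 processor
ID_BIT = 1 << 21  # identification flag, signals CPUID availability


def classify_by_eflags(toggleable_mask: int) -> str:
    """Coarse processor classification from toggleable EFLAGS bits."""
    if not toggleable_mask & AC_BIT:
        # AC is not implemented on the Intel386 and always reads as clear.
        return "Intel386"
    if not toggleable_mask & ID_BIT:
        return "Intel486 without CPUID"
    return "CPUID available"
```

When the result is "CPUID available", the CPUID instruction should be used for any finer distinction.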
18.11. STACK OPERATIONS
This section identifies the differences in stack implementation between the various Intel Architecture processors.
18.11.1. PUSH SP
The P6 family, Pentium, Intel486, Intel386, and Intel 286 processors push a different value on the stack for a PUSH SP instruction than the 8086 processor. These later processors push the value of the SP register before it is decremented as part of the push operation; the 8086 processor pushes the value of the SP register after it is decremented. If the value pushed is important, replace PUSH SP instructions with the following three instructions:
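    PUSH BP         ; save BP; SP is decremented
    MOV  BP, SP     ; BP = value of SP after the decrement
    XCHG BP, [BP]   ; stack top receives that SP value; BP is restored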
This code functions as the 8086 processor PUSH SP instruction on the P6 family, Pentium, Intel486, Intel386, and Intel 286 processors.
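The difference, and the effect of the three-instruction replacement, can be checked with a small simulation. The Python sketch below models the stack as a dictionary from addresses to 16-bit words; all function names are hypothetical and segmentation is ignored.

```python
def push_sp_8086(sp: int, stack: dict) -> int:
    """8086 behavior: the value stored is SP after the decrement."""
    sp = (sp - 2) & 0xFFFF
    stack[sp] = sp
    return sp


def push_sp_286_and_later(sp: int, stack: dict) -> int:
    """Intel 286 and later: the value stored is SP before the decrement."""
    original = sp
    sp = (sp - 2) & 0xFFFF
    stack[sp] = original
    return sp


def push_sp_replacement(sp: int, bp: int, stack: dict) -> tuple:
    """PUSH BP / MOV BP, SP / XCHG BP, [BP]; returns the new (sp, bp)."""
    sp = (sp - 2) & 0xFFFF          # PUSH BP
    stack[sp] = bp
    bp = sp                          # MOV BP, SP
    addr = bp                        # XCHG BP, [BP]
    bp, stack[addr] = stack[addr], addr
    return sp, bp
```

Starting from SP = 0100H, the 8086 form and the replacement sequence both leave 00FEH at the top of the stack, while the later processors' PUSH SP leaves 0100H.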
18.11.2. EFLAGS Pushed on the Stack

The setting of the stored values of bits 12 through 15 (which includes the IOPL field and the NT flag) in the EFLAGS register by the PUSHF instruction, by interrupts, and by exceptions is different with the 32-bit Intel Architecture processors than with the 8086 and Intel 286 processors. The differences are as follows:
- 8086 processor: bits 12 through 15 are always set.
- Intel 286 processor: bits 12 through 15 are always cleared in real-address mode.
- 32-bit processors in real-address mode: bit 15 (reserved) is always cleared, and bits 12 through 14 have the last value loaded into them.
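The three cases can be written as a small lookup, shown here as a Python sketch. The processor labels and the function name are hypothetical; last_loaded stands for the flags value most recently loaded by software.

```python
def stored_bits_12_to_15(processor: str, last_loaded: int) -> int:
    """Bits 12-15 of the flags image stored by PUSHF, interrupts, or exceptions.

    The result is returned in place (mask F000H), for real-address mode.
    """
    if processor == "8086":
        return 0xF000                      # always set
    if processor == "286":
        return 0x0000                      # always cleared
    if processor == "32-bit":
        return last_loaded & 0x7000        # bit 15 cleared, 12-14 retained
    raise ValueError("unknown processor label")
```

Software that tries to load ones into bits 12 through 15 and then reads the stored image back therefore sees F000H, 0000H, or 7000H respectively.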
18.12. FPU
This section addresses the issues that must be faced when porting floating-point software designed to run on earlier Intel Architecture processors and math coprocessors to a Pentium or P6 family processor with integrated FPU. To software, a P6 family processor looks very much like a Pentium processor. Floating-point software which runs on a Pentium or Intel486 DX processor, or on an Intel486 SX processor/Intel 487 SX math coprocessor system or an Intel386 processor/Intel 387 math coprocessor system, will run with at most minor modifications on a P6 family processor. To port code directly from an Intel 286 processor/Intel 287 math coprocessor system or an Intel 8086 processor/8087 math coprocessor system to the Pentium and P6 family processors, certain additional issues must be addressed.
In the following sections, the term 32-bit Intel Architecture FPUs refers to the P6 family, Pentium, and Intel486 DX processors, and to the Intel 487 SX and Intel 387 math coprocessors; the term 16-bit Intel Architecture math coprocessors refers to the Intel 287 and 8087 math coprocessors.
18.12.1. Control Register CR0 Flags

The ET, NE, and MP flags in control register CR0 control the interface between the integer unit of an Intel Architecture processor and either its internal FPU or an external math coprocessor. The effects of these flags in the various Intel Architecture processors are described in the following paragraphs.
The ET (extension type) flag (bit 4 of the CR0 register) is used in the Intel386 processor to indicate whether the math coprocessor in the system is an Intel 287 math coprocessor (flag is clear) or an Intel 387 DX math coprocessor (flag is set). This bit is hardwired to 1 in the P6 family, Pentium, and Intel486 processors.
The NE (numeric exception) flag (bit 5 of the CR0 register) is used in the P6 family, Pentium, and Intel486 processors to determine whether unmasked floating-point exceptions are reported internally through interrupt vector 16 (flag is set) or externally through an external interrupt (flag is clear). On a hardware reset, the NE flag is initialized to 0, so software using the automatic internal error-reporting mechanism must set this flag to 1. This flag is nonexistent on the Intel386 processor.
As on the Intel 286 and Intel386 processors, the MP (monitor coprocessor) flag (bit 1 of register CR0) determines whether the WAIT/FWAIT instructions or waiting-type floating-point instructions trap when the context of the FPU is different from that of the currently-executing task. If the MP and TS flags are set, then a WAIT/FWAIT instruction and waiting instructions will cause a device-not-available exception (interrupt vector 7). The MP flag is used on the Intel 286 and Intel386 processors to support the use of a WAIT/FWAIT instruction to wait on a device other than a math coprocessor. The device reports its status through the BUSY# pin. Since the P6 family, Pentium, and Intel486 processors do not have such a pin, the MP flag has no relevant use and should be set to 1 for normal operation.
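A short Python sketch of how these flags might be decoded from a CR0 image follows. The bit positions for MP, ET, and NE come from the text above; the TS position (bit 3) is not stated in this section and is taken from the CR0 layout described elsewhere in the manual. The function names are hypothetical.

```python
CR0_MP = 1 << 1  # monitor coprocessor
CR0_TS = 1 << 3  # task switched (position not given in this section)
CR0_ET = 1 << 4  # extension type
CR0_NE = 1 << 5  # numeric exception


def wait_causes_device_not_available(cr0: int) -> bool:
    """True if WAIT/FWAIT raises interrupt vector 7 (MP and TS both set)."""
    return bool(cr0 & CR0_MP) and bool(cr0 & CR0_TS)


def fp_error_reporting(cr0: int) -> str:
    """How unmasked floating-point exceptions are reported (Intel486 and later)."""
    return "internal, vector 16" if cr0 & CR0_NE else "external interrupt"
```

After a hardware reset NE is 0, so this model reports "external interrupt" until software sets the flag.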
18.12.2. FPU Status Word

This section identifies differences in the FPU status word for the different Intel Architecture processors and math coprocessors, the reason for the differences, and their impact on software.
18.12.2.1. CONDITION CODE FLAGS (C0 THROUGH C3)
The following information pertains to differences in the use of the condition code flags (C0 through C3) located in bits 8, 9, 10, and 14 of the FPU status word.
- After execution of an FINIT instruction or a hardware reset on a 32-bit Intel Architecture FPU, the condition code flags are set to 0. The same operations on a 16-bit Intel Architecture math coprocessor leave these flags intact (they contain their prior value). This difference in operation has no impact on software and provides a consistent state after reset.
- Transcendental instruction results in the core range of the P6 family and Pentium processors may differ from the Intel486 DX processor and Intel 487 SX math coprocessor by 2 to 3 units in the last place (ulps) (refer to "Transcendental Instruction Accuracy" in Chapter 7 of the Intel Architecture Software Developer's Manual, Volume 1). As a result, the value saved in the C1 flag may also differ.
- After an incomplete FPREM/FPREM1 instruction, the C0, C1, and C3 flags are set to 0 on the 32-bit Intel Architecture FPUs. After the same operation on a 16-bit Intel Architecture math coprocessor, these flags are left intact.
- On the 32-bit Intel Architecture FPUs, the C2 flag serves as an incomplete flag for the FPTAN instruction. On the 16-bit Intel Architecture math coprocessors, the C2 flag is undefined for the FPTAN instruction. This difference has no impact on software, because Intel 287 or 8087 programs do not check C2 after an FPTAN instruction. The use of this flag on later processors allows fast checking of operand range.
18.12.2.2. STACK FAULT FLAG
When unmasked stack overflow or underflow occurs on a 32-bit Intel Architecture FPU, the IE flag (bit 0) and the SF flag (bit 6) of the FPU status word are set to indicate a stack fault and condition code flag C1 is set or cleared to indicate overflow or underflow, respectively. When unmasked stack overflow or underflow occurs on a 16-bit Intel Architecture math coprocessor, only the IE flag is set. Bit 6 is reserved on these processors. The addition of the SF flag on a 32-bit Intel Architecture FPU has no impact on software. Existing exception handlers need not change, but may be upgraded to take advantage of the additional information.
18.12.3. FPU Control Word

Only affine closure is supported for infinity control on a 32-bit Intel Architecture FPU. The infinity control flag (bit 12 of the FPU control word) remains programmable on these processors, but has no effect. This change was made to conform to IEEE Standard 754. On a 16-bit Intel Architecture math coprocessor, both affine and projective closures are supported, as determined by the setting of bit 12. After a hardware reset, the default value of bit 12 is projective. Software that requires projective infinity arithmetic may give different results.
18.12.4. FPU Tag Word

When loading the tag word of a 32-bit Intel Architecture FPU, using an FLDENV, FRSTOR, or FXRSTOR (Pentium III processor only) instruction, the processor examines the incoming tag and classifies the location only as empty or nonempty. Thus, tag values of 00, 01, and 10 are interpreted by the processor to indicate a nonempty location. The tag value of 11 is interpreted by the processor to indicate an empty location. Subsequent operations on a nonempty register always examine the value in the register, not the value in its tag. The FSTENV, FSAVE, and FXSAVE (Pentium III processor only) instructions examine the nonempty registers and put the correct values in the tags before storing the tag word.
The corresponding tag for a 16-bit Intel Architecture math coprocessor is checked before each register access to determine the class of operand in the register; the tag is updated after every change to a register so that the tag always reflects the most recent status of the register. Software can load a tag with a value that disagrees with the contents of a register (for example, the register contains a valid value, but the tag says special). Here, the 16-bit Intel Architecture math coprocessors honor the tag and do not examine the register.
Software written to run on a 16-bit Intel Architecture math coprocessor may not operate correctly on a 32-bit Intel Architecture FPU, if it uses FLDENV, FRSTOR, or FXRSTOR (Pentium III processor only) to change tags to values (other than to empty) that are different from actual register contents.
The encoding in the tag word for the 32-bit Intel Architecture FPUs for unsupported data formats (including pseudo-zero and unnormal) is special (10B), to comply with the IEEE Standard 754. The encoding in the 16-bit Intel Architecture math coprocessors for pseudo-zero and unnormal is valid (00B) and the encoding for other unsupported data formats is special (10B). Code that recognizes the pseudo-zero or unnormal format as valid must therefore be changed if it is ported to a 32-bit Intel Architecture FPU.
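The two interpretations of a loaded tag word can be contrasted with a short Python sketch. It assumes the usual tag word layout, in which the 2-bit tag for register i occupies bits 2i+1 and 2i, and the encoding 01B for zero; neither detail is stated in this section. The function names are hypothetical.

```python
TAG_NAMES = {0b00: "valid", 0b01: "zero", 0b10: "special", 0b11: "empty"}


def split_tags(tag_word: int) -> list:
    """The eight 2-bit tags in a 16-bit tag word, register 0 first."""
    return [(tag_word >> (2 * i)) & 0b11 for i in range(8)]


def view_on_32bit_fpu(tag_word: int) -> list:
    """On load, a 32-bit FPU distinguishes only empty from nonempty."""
    return ["empty" if tag == 0b11 else "nonempty" for tag in split_tags(tag_word)]


def view_on_16bit_coprocessor(tag_word: int) -> list:
    """A 16-bit math coprocessor honors the full tag value."""
    return [TAG_NAMES[tag] for tag in split_tags(tag_word)]
```

For the tag word FFF8H, the first two registers are tagged valid and special. The 16-bit model reports them that way, while the 32-bit model reports both as nonempty and would examine the register contents instead.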
18.12.5. Data Types

This section discusses the differences in data types for the various Intel Architecture FPUs and math coprocessors.
18.12.5.1. NaNs
The 32-bit Intel Architecture FPUs distinguish between signaling NaNs (SNaNs) and quiet NaNs (QNaNs). These FPUs only generate QNaNs and normally do not generate an exception upon encountering a QNaN. An invalid operation exception (#I) is generated only upon encountering a SNaN, except for the FCOM, FIST, and FBSTP instructions, which also generate an invalid operation exception for QNaNs. This behavior matches the IEEE Standard 754.
The 16-bit Intel Architecture math coprocessors only generate one kind of NaN (the equivalent of a QNaN), but they raise an invalid operation exception upon encountering any kind of NaN.
When porting software written to run on a 16-bit Intel Architecture math coprocessor to a 32-bit Intel Architecture FPU, uninitialized memory locations that contain QNaNs should be changed to SNaNs to cause the FPU or math coprocessor to fault when uninitialized memory locations are referenced.
18.12.5.2. PSEUDO-ZERO, PSEUDO-NaN, PSEUDO-INFINITY, AND UNNORMAL FORMATS
The 32-bit Intel Architecture FPUs neither generate nor support the pseudo-zero, pseudo-NaN, pseudo-infinity, and unnormal formats. Whenever they encounter them in an arithmetic operation, they raise an invalid operation exception. The 16-bit Intel Architecture math coprocessors define and support special handling for these formats. Support for these formats was dropped to conform with the IEEE Standard 754.
This change should not impact software ported from 16-bit Intel Architecture math coprocessors to 32-bit Intel Architecture FPUs. The 32-bit Intel Architecture FPUs do not generate these formats, and therefore will not encounter them unless software explicitly loads them in the data registers. The only effect may be in how software handles the tags in the tag word (refer to Section 18.12.4., "FPU Tag Word").
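To make the formats named in Sections 18.12.5.1 and 18.12.5.2 concrete, the following Python sketch classifies an 80-bit extended-precision encoding. It assumes the standard extended format layout (1 sign bit, 15 exponent bits, an explicit integer bit, and a 63-bit fraction) and the conventional definitions of the pseudo and unnormal encodings; these are not spelled out in this chapter, so treat the sketch as illustrative. The function name is hypothetical.

```python
def classify_extended(sign_exponent: int, significand: int) -> str:
    """Classify an extended-precision value.

    sign_exponent is the upper 16 bits (sign plus 15-bit biased exponent);
    significand is the lower 64 bits (integer bit plus 63-bit fraction).
    """
    exponent = sign_exponent & 0x7FFF
    integer_bit = (significand >> 63) & 1
    fraction = significand & ((1 << 63) - 1)

    if exponent == 0x7FFF:
        if integer_bit == 0:
            return "pseudo-infinity" if fraction == 0 else "pseudo-NaN"
        if fraction == 0:
            return "infinity"
        # The most significant fraction bit separates quiet from signaling.
        return "QNaN" if (fraction >> 62) & 1 else "SNaN"
    if exponent == 0:
        return "zero or denormal"
    if integer_bit:
        return "normal"
    return "pseudo-zero" if fraction == 0 else "unnormal"
```

In this model the pseudo-zero, pseudo-NaN, pseudo-infinity, and unnormal results correspond to the encodings for which a 32-bit FPU raises an invalid operation exception.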
18.12.6. Floating-Point Exceptions

This section identifies the implementation differences in exception handling for floating-point instructions in the various Intel Architecture FPUs and math coprocessors.
18.12.6.1. DENORMAL OPERAND EXCEPTION (#D)
When the denormal operand exception is masked, the 32-bit Intel Architecture FPUs automatically normalize denormalized numbers when possible; whereas, the 16-bit Intel Architecture math coprocessors return a denormal result. A program written to run on a 16-bit Intel Architecture math coprocessor that uses the denormal exception solely to normalize denormalized operands is redundant when run on the 32-bit Intel Architecture FPUs. If such a program is run on 32-bit Intel Architecture FPUs, performance can be improved by masking the denormal exception. Floating-point programs run faster when the FPU performs normalization of denormalized operands.
The denormal operand exception is not raised for transcendental instructions and the FXTRACT instruction on the 16-bit Intel Architecture math coprocessors. This exception is raised for these instructions on the 32-bit Intel Architecture FPUs. The exception handlers ported to these latter processors need to be changed only if the handlers give special treatment to different opcodes.
18.12.6.2. NUMERIC OVERFLOW EXCEPTION (#O)
On the 32-bit Intel Architecture FPUs, when the numeric overflow exception is masked and the rounding mode is set to chop (toward 0), the result is the largest positive or smallest negative number. The 16-bit Intel Architecture math coprocessors do not signal the overflow exception when the masked response is not ∞; that is, they signal overflow only when the rounding control is not set to round to 0. If rounding is set to chop (toward 0), the result is positive or negative ∞. Under the most common rounding modes, this difference has no impact on existing software. If rounding is toward 0 (chop), a program on a 32-bit Intel Architecture FPU produces, under overflow conditions, a result that is different in the least significant bit of the significand, compared to the result on a 16-bit Intel Architecture math coprocessor. The reason for this difference is IEEE Standard 754 compatibility. When the overflow exception is not masked, the precision exception is flagged on the 32-bit Intel Architecture FPUs. When the result is stored in the stack, the significand is rounded according to the precision control (PC) field of the FPU control word or according to the opcode.
On the 16-bit Intel Architecture math coprocessors, the precision exception is not flagged and the significand is not rounded. The impact on existing software is that if the result is stored on the stack, a program running on a 32-bit Intel Architecture FPU produces a different result under overflow conditions than on a 16-bit Intel Architecture math coprocessor. The difference is apparent only to the exception handler. This difference is for IEEE Standard 754 compatibility.
18.12.6.3. NUMERIC UNDERFLOW EXCEPTION (#U)
When the underflow exception is masked on the 32-bit Intel Architecture FPUs, the underflow exception is signaled only when the result is tiny and denormalization also results in a loss of accuracy. When the underflow exception is unmasked and the instruction is supposed to store the result on the stack, the significand is rounded to the appropriate precision (according to the PC flag in the FPU control word, for those instructions controlled by PC, otherwise to extended precision), after adjusting the exponent. When the underflow exception is masked on the 16-bit Intel Architecture math coprocessors and rounding is toward 0, the underflow exception flag is raised on a tiny result, regardless of loss of accuracy. When the underflow exception is not masked and the destination is the stack, the significand is not rounded, but instead is left as is. When the underflow exception is masked, this difference has no impact on existing software. The underflow exception occurs less often when rounding is toward 0. When the underflow exception is not masked, a program running on a 32-bit Intel Architecture FPU produces a different result during underflow conditions than on a 16-bit Intel Architecture math coprocessor if the result is stored on the stack. The difference is only in the least significant bit of the significand and is apparent only to the exception handler.
18.12.6.4. EXCEPTION PRECEDENCE
There is no difference in the precedence of the denormal operand exception on the 32-bit Intel Architecture FPUs, whether it be masked or not. When the denormal operand exception is not masked on the 16-bit Intel Architecture math coprocessors, it takes precedence over all other exceptions. This difference causes no impact on existing software, but some unneeded normalization of denormalized operands is prevented on the Intel486 processor and Intel 387 math coprocessor.
18.12.6.5. CS AND EIP FOR FPU EXCEPTIONS
On the 32-bit Intel Architecture FPUs, the values from the CS and EIP registers saved for floating-point exceptions point to any prefixes that come before the floating-point instruction. On the 8087 math coprocessor, the saved CS and IP registers point to the floating-point instruction.
18.12.6.6. FPU ERROR SIGNALS
The floating-point error signals to the P6 family, Pentium, and Intel486 processors do not pass through an interrupt controller; an INT# signal from an Intel 387, Intel 287, or 8087 math coprocessor does. If an 8086 processor uses another exception for the 8087 interrupt, both exception vectors should call the floating-point-error exception handler. Some instructions in a floating-point-error exception handler may need to be deleted if they use the interrupt controller. The P6 family, Pentium, and Intel486 processors have signals that, with the addition of external logic, support reporting for emulation of the interrupt mechanism used in many personal computers. On the P6 family, Pentium, and Intel486 processors, an undefined floating-point opcode will cause an invalid-opcode exception (#UD, interrupt vector 6). Undefined floating-point opcodes, like legal floating-point opcodes, cause a device not available exception (#NM, interrupt vector 7) when either the TS or EM flag in control register CR0 is set. The P6 family, Pentium, and Intel486 processors do not check for floating-point error conditions on encountering an undefined floating-point opcode.
18.12.6.7. ASSERTION OF THE FERR# PIN
When using the MS-DOS compatibility mode for handling floating-point exceptions, the FERR# pin must be connected to an input to an external interrupt controller. An external interrupt is then generated when the FERR# output drives the input to the interrupt controller and the interrupt controller in turn drives the INTR pin on the processor. For the P6 family and Intel386 processors, an unmasked floating-point exception always causes the FERR# pin to be asserted upon completion of the instruction that caused the exception. For the Pentium and Intel486 processors, an unmasked floating-point exception may cause the FERR# pin to be asserted either at the end of the instruction causing the exception or immediately before execution of the next floating-point instruction. (Note that the next floating-point instruction would not be executed until the pending unmasked exception has been handled.) Refer to Appendix D in the Intel Architecture Software Developer's Manual, Volume 1, for a complete description of the required mechanism for handling floating-point exceptions using the MS-DOS compatibility mode.
18.12.6.8. INVALID OPERATION EXCEPTION ON DENORMALS
An invalid operation exception is not generated on the 32-bit Intel Architecture FPUs upon encountering a denormal value when executing an FSQRT, FDIV, or FPREM instruction or upon conversion to BCD or to integer. The operation proceeds by first normalizing the value. On the 16-bit Intel Architecture math coprocessors, the invalid operation exception is generated in this situation. This difference has no impact on existing software. Software running on the 32-bit Intel Architecture FPUs continues to execute in cases where the 16-bit Intel Architecture math coprocessors trap. The reason for this change was to eliminate an exception from being raised.
18.12.6.9. ALIGNMENT CHECK EXCEPTIONS (#AC)
If alignment checking is enabled, a misaligned data operand on the P6 family, Pentium, and Intel486 processors causes an alignment check exception (#AC) when a program or procedure is running at privilege-level 3, except for the stack portion of the FSAVE/FNSAVE/FXSAVE and FRSTOR/FXRSTOR instructions.
18.12.6.10. SEGMENT NOT PRESENT EXCEPTION DURING FLDENV
On the Intel486 processor, when a segment not present exception (#NP) occurs in the middle of an FLDENV instruction, it can happen that part of the environment is loaded and part not. In such cases, the FPU control word is left with a value of 007FH. The P6 family and Pentium processors ensure the internal state is correct at all times by attempting to read the first and last bytes of the environment before updating the internal state.
18.12.6.11. DEVICE NOT AVAILABLE EXCEPTION (#NM)
The device-not-available exception (#NM, interrupt 7) will occur in the P6 family, Pentium, and Intel486 processors as described in Section 2.5., Control Registers in Chapter 2, System Architecture Overview, and Section 5.12., Exception and Interrupt Reference in Chapter 5, Interrupt and Exception Handling.
18.12.6.12. COPROCESSOR SEGMENT OVERRUN EXCEPTION
The coprocessor segment overrun exception (interrupt 9) does not occur in the P6 family, Pentium, and Intel486 processors. In situations where the Intel 387 math coprocessor would cause an interrupt 9, the P6 family, Pentium, and Intel486 processors simply abort the instruction. To avoid undetected segment overruns, it is recommended that the floating-point save area be placed in the same page as the TSS. This placement will prevent the FPU environment from being lost if a page fault occurs during the execution of an FLDENV, FRSTOR, or FXRSTOR instruction while the operating system is performing a task switch.
18.12.6.13. GENERAL PROTECTION EXCEPTION (#GP)
A general-protection exception (#GP, interrupt 13) occurs if the starting address of a floating-point operand falls outside a segment's size. An exception handler should be included to report these programming errors.
18.12.6.14. FLOATING-POINT ERROR EXCEPTION (#MF)
In real mode and protected mode (not including virtual-8086 mode), interrupt vector 16 must point to the floating-point exception handler.
In virtual 8086 mode, the virtual-8086 monitor can be programmed to accommodate a different location of the interrupt vector for floating-point exceptions.
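The masked overflow response discussed in Section 18.12.6.2 can be illustrated with a small model. The following is a minimal Python sketch of the IEEE Standard 754 default result that the 32-bit FPUs deliver, using double precision as a stand-in for the FPU's extended format. The function name and the rounding-mode labels are hypothetical names chosen for this illustration.

```python
import math
import sys

# Rounding-mode labels are names chosen for this sketch, not FPU mnemonics.
NEAREST, DOWN, UP, CHOP = "nearest", "down", "up", "chop"

def masked_overflow_result(sign, rounding):
    """Return the IEEE 754 default overflow result for a result of the given sign.

    sign is +1 or -1. Double precision stands in for the extended format.
    """
    largest = sys.float_info.max
    if rounding == NEAREST:
        return sign * math.inf
    if rounding == CHOP:
        # Toward 0: the largest finite magnitude, never infinity.
        return sign * largest
    if rounding == UP:
        return math.inf if sign > 0 else -largest
    if rounding == DOWN:
        return largest if sign > 0 else -math.inf
    raise ValueError(f"unknown rounding mode: {rounding}")
```

Under chop, the model returns the largest finite magnitude, which is the behavior the section attributes to the 32-bit FPUs; the 16-bit math coprocessors are described as returning positive or negative ∞ in that case.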
18.12.7. Changes to Floating-Point Instructions

This section identifies the differences in floating-point instructions for the various Intel FPU and math coprocessor architectures, the reason for the differences, and their impact on software.
18.12.7.1. FDIV, FPREM, AND FSQRT INSTRUCTIONS
The 32-bit Intel Architecture FPUs support operations on denormalized operands and, when detected, an underflow exception can occur, for compatibility with the IEEE Standard 754. The 16-bit Intel Architecture math coprocessors do not operate on denormalized operands or return underflow results. Instead, they generate an invalid operation exception when they detect an underflow condition. An existing underflow exception handler will require change only if it gives different treatment to different opcodes. Also, it is possible that fewer invalid operation exceptions will occur.
18.12.7.2. FSCALE INSTRUCTION
With the 32-bit Intel Architecture FPUs, the range of the scaling operand is not restricted. If (0 < | ST(1) | < 1), the scaling factor is 0; therefore, ST(0) remains unchanged. If the rounded result is not exact or if there was a loss of accuracy (masked underflow), the precision exception is signaled. With the 16-bit Intel Architecture math coprocessors, the range of the scaling operand is restricted. If (0 < | ST(1) | < 1), the result is undefined and no exception is signaled. The impact of this difference on existing software is that different results are delivered on the 32-bit FPUs and the 16-bit math coprocessors when (0 < | ST(1) | < 1).
18.12.7.3. FPREM1 INSTRUCTION
The 32-bit Intel Architecture FPUs compute a partial remainder according to the IEEE Standard 754. This instruction does not exist on the 16-bit Intel Architecture math coprocessors. The availability of the FPREM1 instruction has no impact on existing software.
18.12.7.4. FPREM INSTRUCTION
On the 32-bit Intel Architecture FPUs, the condition code flags C0, C3, C1 in the status word correctly reflect the three low-order bits of the quotient following execution of the FPREM instruction. On the 16-bit Intel Architecture math coprocessors, the quotient bits are incorrect when performing a reduction of (64N + M) when (N ≥ 1) and M is 1 or 2. This difference does not affect existing software; software that works around the bug should not be affected.
18.12.7.5. FUCOM, FUCOMP, AND FUCOMPP INSTRUCTIONS
When executing the FUCOM, FUCOMP, and FUCOMPP instructions, the 32-bit Intel Architecture FPUs perform unordered compare according to IEEE Standard 754. These instructions do not exist on the 16-bit Intel Architecture math coprocessors. The availability of these new instructions has no impact on existing software.
18.12.7.6. FPTAN INSTRUCTION
On the 32-bit Intel Architecture FPUs, the range of the operand for the FPTAN instruction is much less restricted (| ST(0) | < 2^63) than on earlier math coprocessors. The instruction reduces the operand internally using an internal π/4 constant that is more accurate. The range of the operand is restricted to (| ST(0) | < π/4) on the 16-bit Intel Architecture math coprocessors; the operand must be reduced to this range using FPREM. This change has no impact on existing software.
18.12.7.7. STACK OVERFLOW
On the 32-bit Intel Architecture FPUs, if an FPU stack overflow occurs when the invalid operation exception is masked, the FPU returns the real, integer, or BCD-integer indefinite value to the destination operand, depending on the instruction being executed. On the 16-bit Intel Architecture math coprocessors, the original operand remains unchanged following a stack overflow, but it is loaded into register ST(1). This difference has no impact on existing software.
18.12.7.8. FSIN, FCOS, AND FSINCOS INSTRUCTIONS
On the 32-bit Intel Architecture FPUs, these instructions perform three common trigonometric functions. These instructions do not exist on the 16-bit Intel Architecture math coprocessors. The availability of these instructions has no impact on existing software, but using them provides a performance upgrade.
18.12.7.9. FPATAN INSTRUCTION
On the 32-bit Intel Architecture FPUs, the range of operands for the FPATAN instruction is unrestricted. On the 16-bit Intel Architecture math coprocessors, the absolute value of the operand in register ST(0) must be smaller than the absolute value of the operand in register ST(1). This difference has impact on existing software.
18.12.7.10. F2XM1 INSTRUCTION
The 32-bit Intel Architecture FPUs support a wider range of operands (−1 < ST(0) < +1) for the F2XM1 instruction. The supported operand range for the 16-bit Intel Architecture math coprocessors is (0 ≤ ST(0) ≤ 0.5). This difference has no impact on existing software.
18.12.7.11. FLD INSTRUCTION
On the 32-bit Intel Architecture FPUs, when using the FLD instruction to load an extended-real value, a denormal operand exception is not generated because the instruction is not arithmetic. The 16-bit Intel Architecture math coprocessors do report a denormal operand exception in this situation. This difference does not affect existing software. On the 32-bit Intel Architecture FPUs, loading a denormal value that is in single- or double-real format causes the value to be converted to extended-real format. Loading a denormal value on the 16-bit Intel Architecture math coprocessors causes the value to be converted to an unnormal. If the next instruction is FXTRACT or FXAM, the 32-bit Intel Architecture FPUs will give a different result than the 16-bit Intel Architecture math coprocessors. This change was made for IEEE Standard 754 compatibility.
On the 32-bit Intel Architecture FPUs, loading an SNaN that is in single- or double-real format causes the FPU to generate an invalid operation exception. The 16-bit Intel Architecture math coprocessors do not raise an exception when loading a signaling NaN. The invalid operation exception handler for 16-bit math coprocessor software needs to be updated to handle this condition when porting software to 32-bit FPUs. This change was made for IEEE Standard 754 compatibility.
18.12.7.12. FXTRACT INSTRUCTION
On the 32-bit Intel Architecture FPUs, if the operand is 0 for the FXTRACT instruction, the divide-by-zero exception is reported and −∞ is delivered to register ST(1). If the operand is +∞, no exception is reported. If the operand is 0 on the 16-bit Intel Architecture math coprocessors, 0 is delivered to register ST(1) and no exception is reported. If the operand is +∞, the invalid operation exception is reported. These differences have no impact on existing software. Software usually bypasses 0 and ∞. This change is due to the IEEE 754 recommendation to fully support the logb function.
18.12.7.13. LOAD CONSTANT INSTRUCTIONS
On 32-bit Intel Architecture FPUs, rounding control is in effect for the load constant instructions. Rounding control is not in effect for the 16-bit Intel Architecture math coprocessors. Results for the FLDPI, FLDLN2, FLDLG2, and FLDL2E instructions are the same as for the 16-bit Intel Architecture math coprocessors when rounding control is set to round to nearest or round to +∞. They are the same for the FLDL2T instruction when rounding control is set to round to nearest, round to −∞, or round to zero. Results are different from the 16-bit Intel Architecture math coprocessors in the least significant bit of the mantissa if rounding control is set to round to −∞ or round to 0 for the FLDPI, FLDLN2, FLDLG2, and FLDL2E instructions; they are different for the FLDL2T instruction if round to +∞ is specified. These changes were implemented for compatibility with IEEE 754 recommendations.
18.12.7.14. FSETPM INSTRUCTION
With the 32-bit Intel Architecture FPUs, the FSETPM instruction is treated as NOP (no operation). This instruction informs the Intel 287 math coprocessor that the processor is in protected mode. This change has no impact on existing software. The 32-bit Intel Architecture FPUs handle all addressing and exception-pointer information, whether in protected mode or not.
18.12.7.15. FXAM INSTRUCTION
With the 32-bit Intel Architecture FPUs, if the FPU encounters an empty register when executing the FXAM instruction, it will generate combinations of C0 through C3 equal to 1101 or 1111. The 16-bit Intel Architecture math coprocessors may generate these combinations, among others. This difference has no impact on existing software; it provides a performance upgrade to provide repeatable results.
18.12.7.16. FSAVE AND FSTENV INSTRUCTIONS
With the 32-bit Intel Architecture FPUs, the address of a memory operand pointer stored by FSAVE or FSTENV is undefined if the previous floating-point instruction did not refer to memory.
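Two of the behaviors described earlier in Section 18.12.7 lend themselves to a short model: the FSCALE scaling factor (Section 18.12.7.2) and the FPREM quotient bits (Section 18.12.7.4). The following is a minimal Python sketch of the 32-bit FPU behavior, using doubles in place of the extended format. The function names are hypothetical, the sketch assumes FPREM completes the reduction in one step (the real instruction may produce a partial remainder when the operand exponents differ widely), and the mapping of quotient bits Q2, Q1, Q0 to C0, C3, C1 follows the architecture's condition-code definition rather than anything stated in this section.

```python
import math
from fractions import Fraction

def fscale(st0, st1):
    """Scale st0 by 2 raised to st1 truncated toward zero."""
    # For 0 < |st1| < 1 the truncated scaling factor is 0, so st0 is unchanged.
    return math.ldexp(st0, math.trunc(st1))

def fprem(st0, st1):
    """Return (remainder, c0, c3, c1), assuming a complete reduction."""
    # Exact quotient, truncated toward zero as FPREM does.
    quotient = abs(math.trunc(Fraction(st0) / Fraction(st1)))
    remainder = math.fmod(st0, st1)  # sign follows the dividend
    c0 = (quotient >> 2) & 1  # Q2
    c3 = (quotient >> 1) & 1  # Q1
    c1 = quotient & 1         # Q0
    return remainder, c0, c3, c1
```

For example, `fscale(3.0, 0.75)` leaves the value at 3.0, and `fprem(23.0, 4.0)` has quotient 5 (binary 101), so C0 and C1 are set and C3 is clear.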
18.12.8. Transcendental Instructions

The floating-point results of the P6 family and Pentium processors for transcendental instructions in the core range may differ from the Intel486 processors by about 2 or 3 ulps (refer to Transcendental Instruction Accuracy in Chapter 7 of the Intel Architecture Software Developer's Manual, Volume 1). Condition code flag C1 of the status word may differ as a result. The exact threshold for underflow and overflow will vary by a few ulps. The P6 family and Pentium processors' results will have a worst case error of less than 1 ulp when rounding to the nearest-even and less than 1.5 ulps when rounding in other modes. The transcendental instructions are guaranteed to be monotonic, with respect to the input operands, throughout the domain supported by the instruction. Transcendental instructions may generate different results in the round-up flag (C1) on the 32-bit Intel Architecture FPUs. The round-up flag is undefined for these instructions on the 16-bit Intel Architecture math coprocessors. This difference has no impact on existing software.
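To make the "2 or 3 ulps" figure concrete, the following minimal Python sketch measures the distance between two results in units in the last place. It assumes Python 3.9 or later (for `math.ulp` and `math.nextafter`) and works on doubles, whereas the FPU's extended format has a 64-bit significand; the function name is hypothetical.

```python
import math

def ulp_distance(a, b):
    """Distance between two finite doubles, measured in ulps of a."""
    return abs(a - b) / math.ulp(a)

# Two hypothetical results for the same operation that differ by 3 ulps.
reference = 1.0
other = reference + 3 * math.ulp(reference)
difference = ulp_distance(reference, other)
```

A test harness that compares transcendental results across processor generations could use a tolerance expressed this way instead of demanding bit-for-bit equality.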
18.12.9. Obsolete Instructions

The 8087 math coprocessor instructions FENI and FDISI and the Intel 287 math coprocessor instruction FSETPM are treated as integer NOP instructions in the 32-bit Intel Architecture FPUs. If these opcodes are detected in the instruction stream, no specific operation is performed and no internal states are affected.
18.12.10. WAIT/FWAIT Prefix Differences

On the Intel486 processor, when a WAIT/FWAIT instruction precedes a floating-point instruction (one which itself automatically synchronizes with the previous floating-point instruction), the WAIT/FWAIT instruction is treated as a no-op. Pending floating-point exceptions from a previous floating-point instruction are processed not on the WAIT/FWAIT instruction but on the floating-point instruction following the WAIT/FWAIT instruction. In such a case, the report of a floating-point exception may appear one instruction later on the Intel486 processor than on a P6 family or Pentium FPU, or on an Intel 387 math coprocessor.
18.12.11. Operands Split Across Segments and/or Pages

On the P6 family, Pentium, and Intel486 processor FPUs, when the first half of an operand to be written is inside a page or segment and the second half is outside, a memory fault can cause the first half to be stored but not the second half. In this situation, the Intel 387 math coprocessor stores nothing.
18.12.12. FPU Instruction Synchronization

On the 32-bit Intel Architecture FPUs, all floating-point instructions are automatically synchronized; that is, the processor automatically waits until the previous floating-point instruction has completed before completing the next floating-point instruction. No explicit WAIT/FWAIT instructions are required to assure this synchronization. For the 8087 math coprocessors, explicit waits are required before each floating-point instruction to ensure synchronization. Although 8087 programs having explicit WAIT instructions execute perfectly on the 32-bit Intel Architecture processors without reassembly, these WAIT instructions are unnecessary.
18.13. SERIALIZING INSTRUCTIONS

Certain instructions have been defined to serialize instruction execution to ensure that modifications to flags, registers, and memory are completed before the next instruction is executed (or, in P6 family processor terminology, "committed to machine state"). Because the P6 family processors use branch-prediction and out-of-order execution techniques to improve performance, instruction execution is not generally serialized until the results of an executed instruction are committed to machine state (refer to Chapter 2, Introduction to the Intel Architecture, in the Intel Architecture Software Developer's Manual, Volume 1). As a result, at places in a program or task where it is critical to have execution completed for all previous instructions before executing the next instruction (for example, at a branch, at the end of a procedure, or in multiprocessor-dependent code), it is useful to add a serializing instruction. Refer to Section 7.4., Serializing Instructions in Chapter 7, Multiple-Processor Management for more information on serializing instructions.
18.14. FPU AND MATH COPROCESSOR INITIALIZATION

Table 8-1 in Chapter 8, Processor Management and Initialization shows the states of the FPUs in the P6 family, Pentium, Intel486 processors and of the Intel 387 math coprocessor and Intel 287 coprocessor following a power-up, reset, or INIT, or following the execution of an FINIT/FNINIT instruction. The following is some additional compatibility information concerning the initialization of Intel Architecture FPUs and math coprocessors.
18.14.1. Intel 387 and Intel 287 Math Coprocessor Initialization

Following an Intel386 processor reset, the processor identifies its coprocessor type (Intel 287 or Intel 387 DX math coprocessor) by sampling its ERROR# input some time after the falling edge of RESET# signal and before execution of the first floating-point instruction. The Intel 287 coprocessor keeps its ERROR# output in inactive state after hardware reset; the Intel 387 coprocessor keeps its ERROR# output in active state after hardware reset. Upon hardware reset or execution of the FINIT/FNINIT instruction, the Intel 387 math coprocessor signals an error condition. The P6 family, Pentium, and Intel486 processors, like the Intel 287 coprocessor, do not.
18.14.2. Intel486 SX Processor and Intel 487 SX Math Coprocessor Initialization

When initializing an Intel486 SX processor and an Intel 487 SX math coprocessor, the initialization routine should check for the presence of the math coprocessor and should set the FPU-related flags (EM, MP, and NE) in control register CR0 accordingly (refer to Section 2.5., Control Registers in Chapter 2, System Architecture Overview for a complete description of these flags). Table 18-1 gives the recommended settings for these flags when the math coprocessor is present. The FSTCW instruction will give a value of FFFFH for the Intel486 SX microprocessor and 037FH for the Intel 487 SX math coprocessor.
Table 18-1. Recommended Values of the FP Related Bits for Intel486 SX Microprocessor/Intel 487 SX Math Coprocessor System
CR0 Flags   Intel486 SX Processor Only   Intel 487 SX Math Coprocessor Present
EM          1                            0
MP          0                            1
NE          1                            0, for MS-DOS systems
                                         1, for user-defined exception handler
The EM and MP flags in register CR0 are interpreted as shown in Table 18-2.
Table 18-2. EM and MP Flag Interpretation
EM   MP   Interpretation
0    0    Floating-point instructions are passed to FPU; WAIT/FWAIT and other waiting-type instructions ignore TS.
0    1    Floating-point instructions are passed to FPU; WAIT/FWAIT and other waiting-type instructions test TS.
1    0    Floating-point instructions trap to emulator; WAIT/FWAIT and other waiting-type instructions ignore TS.
1    1    Floating-point instructions trap to emulator; WAIT/FWAIT and other waiting-type instructions test TS.
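The interpretation in Table 18-2 can be restated as a small model. The following is a minimal Python sketch, with hypothetical function names, of how the EM, MP, and TS flags in CR0 combine. The #NM conditions encoded here (EM or TS set for a floating-point instruction, MP and TS both set for WAIT/FWAIT) reflect the general architectural rule and are an assumption beyond what the table itself states.

```python
def em_mp_interpretation(em, mp):
    """Return (destination of floating-point instructions, whether WAIT/FWAIT tests TS)."""
    destination = "emulator" if em else "FPU"
    return destination, bool(mp)

def fp_instruction_raises_nm(em, ts):
    """A floating-point instruction raises #NM when EM or TS is set."""
    return bool(em or ts)

def wait_raises_nm(mp, ts):
    """WAIT/FWAIT raises #NM only when MP and TS are both set."""
    return bool(mp and ts)
```

With the recommended Intel486 SX-only settings (EM = 1, MP = 0), every floating-point instruction traps to the emulator while WAIT/FWAIT never traps, regardless of TS.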
Following is an example code sequence to initialize the system and check for the presence of Intel486 SX processor/Intel 487 SX math coprocessor.
fninit
fstcw mem_loc
mov ax, mem_loc
cmp ax, 037fh
jz Intel487_SX_Math_CoProcessor_present   ;ax=037fh
jmp Intel486_SX_microprocessor_present    ;ax=ffffh
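The comparison against 037FH in the sequence above works because 037FH is the control word that FNINIT leaves in a real FPU: all six exceptions masked, 64-bit precision, round to nearest. The following minimal Python sketch decodes a control word to show this. The field positions follow the FPU control word layout; the function and table names are hypothetical.

```python
EXCEPTION_MASKS = ("IM", "DM", "ZM", "OM", "UM", "PM")  # bits 0 through 5
PRECISION_BITS = {0b00: 24, 0b10: 53, 0b11: 64}          # encoding 01 is reserved
ROUNDING = {0b00: "nearest", 0b01: "down", 0b10: "up", 0b11: "chop"}

def decode_control_word(cw):
    """Return (masked exceptions, significand precision in bits, rounding mode)."""
    masked = [name for bit, name in enumerate(EXCEPTION_MASKS) if cw & (1 << bit)]
    precision = PRECISION_BITS.get((cw >> 8) & 0b11)  # PC field, bits 8-9
    rounding = ROUNDING[(cw >> 10) & 0b11]            # RC field, bits 10-11
    return masked, precision, rounding

def math_coprocessor_present(cw):
    """Mirror the cmp/jz test: 037FH means the FNINIT default was stored."""
    return cw == 0x037F
```

A stored value of FFFFH, by contrast, means no FPU responded to FSTCW, so the test falls through to the Intel486 SX-only path.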
If the Intel 487 SX math coprocessor is not present, the following code can be run to set the CR0 register for the Intel486 SX processor.
mov eax, cr0
and eax, 0fffffffdh   ;make MP=0
or eax, 0024h         ;make EM=1, NE=1
mov cr0, eax
This initialization will cause any floating-point instruction to generate a device not available exception (#NM), interrupt 7. The software emulation will then take control to execute these instructions. This code is not required if an Intel 487 SX math coprocessor is present in the system. In that case, the typical initialization routine for the Intel486 SX microprocessor will be adequate. Also, when designing an Intel486 SX processor based system with an Intel 487 SX math coprocessor, timing loops should be independent of clock speed and clocks per instruction. One way to attain this is to implement these loops in hardware and not in software (for example, BIOS).
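The mask values used in the CR0 sequence above can be checked against the flag positions (MP is bit 1, EM is bit 2, NE is bit 5). The following minimal Python sketch reproduces the AND/OR arithmetic on an ordinary integer; it does not touch any real control register, and the function name is hypothetical.

```python
CR0_MP = 1 << 1  # monitor coprocessor
CR0_EM = 1 << 2  # emulation
CR0_NE = 1 << 5  # numeric error

def cr0_for_486sx_without_487sx(cr0):
    """Apply the same masks as the assembly sequence to a 32-bit CR0 value."""
    cr0 &= 0xFFFFFFFD  # clear MP
    cr0 |= 0x0024      # set EM and NE
    return cr0
```

The two constants are simply the complement of the MP bit within 32 bits and the OR of the EM and NE bits, so all other CR0 flags pass through unchanged.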
18.15. CONTROL REGISTERS

The following sections identify the new control registers and control register flags and fields that were introduced to the 32-bit Intel Architecture in various processor families. Refer to Figure 2-5 in Chapter 2, System Architecture Overview for the location of these flags and fields in the control registers. The Pentium III processor introduced one new control flag in control register CR4:
OSXMMEXCPT (bit 10)    The OS will set this bit if it supports unmasked SIMD floating-point exceptions.
The Pentium II processor introduced one new control flag in control register CR4:
OSFXSR (bit 9)    The OS supports saving and restoring the Pentium III processor state during context switches.
The Pentium Pro processor introduced three new control flags in control register CR4:
PAE (bit 5)    Physical address extension. Enables paging mechanism to reference 36-bit physical addresses when set; restricts physical addresses to 32 bits when clear (refer to Section 18.16.1.1., Physical Memory Addressing Extension in Chapter 18, Intel Architecture Compatibility).
PGE (bit 7)    Page global enable. Inhibits flushing of frequently-used or shared pages on task switches (refer to Section 18.16.1.2., Global Pages in Chapter 18, Intel Architecture Compatibility).
PCE (bit 8)    Performance-monitoring counter enable. Enables execution of the RDPMC instruction at any protection level.
The content of CR4 is 0H following a hardware reset.
Control register CR4 was introduced in the Pentium processor. This register contains flags that enable certain new extensions provided in the Pentium processor:
VME    Virtual-8086 mode extensions. Enables support for a virtual interrupt flag in virtual-8086 mode (refer to Section 16.3., Interrupt and Exception Handling in Virtual-8086 Mode in Chapter 16, 8086 Emulation).
PVI    Protected-mode virtual interrupts. Enables support for a virtual interrupt flag in protected mode (refer to Section 16.4., Protected-Mode Virtual Interrupts in Chapter 16, 8086 Emulation).
TSD    Time-stamp disable. Restricts the execution of the RDTSC instruction to procedures running at privilege level 0.
DE     Debugging extensions. Causes an undefined opcode (#UD) exception to be generated when debug registers DR4 and DR5 are referenced, for improved performance (refer to Section 15.2.2., Debug Registers DR4 and DR5 in Chapter 15, Debugging and Performance Monitoring).
PSE    Page size extensions. Enables 4-MByte pages when set (refer to Section 3.6.1., Paging Options in Chapter 3, Protected-Mode Memory Management).
MCE    Machine-check enable. Enables the machine-check exception, allowing exception handling for certain hardware error conditions (refer to Chapter 13, Machine-Check Architecture).
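The CR4 flags listed in this section can be collected into one lookup. The following minimal Python sketch decodes a CR4 value into flag names. Bit numbers for PAE, PGE, PCE, OSFXSR, and OSXMMEXCPT are given in this section and PSE is identified as bit 4 in Section 18.16.1.3; the positions used for VME, PVI, TSD, DE, and MCE are taken from the architecture's CR4 layout and are an assumption as far as this section is concerned. The names `CR4_FLAGS` and `decode_cr4` are hypothetical.

```python
CR4_FLAGS = {
    0: "VME", 1: "PVI", 2: "TSD", 3: "DE", 4: "PSE",  # Pentium processor
    5: "PAE", 6: "MCE", 7: "PGE", 8: "PCE",           # MCE: Pentium; others: Pentium Pro
    9: "OSFXSR",                                       # Pentium II processor
    10: "OSXMMEXCPT",                                  # Pentium III processor
}

def decode_cr4(value):
    """Return the names of the CR4 flags set in value, lowest bit first."""
    return [name for bit, name in sorted(CR4_FLAGS.items()) if value & (1 << bit)]
```

Because CR4 is 0H after a hardware reset, `decode_cr4(0)` returns an empty list; an operating system then sets only the flags whose features it supports.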
The Intel486 processor introduced five new flags in control register CR0:
NE    Numeric error. Enables the normal mechanism for reporting floating-point numeric errors.
WP    Write protect. Write-protects user-level pages against supervisor-mode accesses.
AM    Alignment mask. Controls whether alignment checking is performed. Operates in conjunction with the AC (Alignment Check) flag.
NW    Not write-through. Enables write-throughs and cache invalidation cycles when clear and disables invalidation cycles and write-throughs that hit in the cache when set.
CD    Cache disable. Enables the internal cache when clear and disables the cache when set.
The Intel486 processor introduced two new flags in control register CR3:
PCD    Page-level cache disable. The state of this flag is driven on the PCD# pin during bus cycles that are not paged, such as interrupt acknowledge cycles, when paging is enabled. The PCD# pin is used to control caching in an external cache on a cycle-by-cycle basis.
PWT    Page-level write-through. The state of this flag is driven on the PWT# pin during bus cycles that are not paged, such as interrupt acknowledge cycles, when paging is enabled. The PWT# pin is used to control write through in an external cache on a cycle-by-cycle basis.
18.16. MEMORY MANAGEMENT FACILITIES

The following sections describe the new memory management facilities available in the various Intel Architecture processors and some compatibility differences.
18.16.1. New Memory Management Control Flags

The Pentium Pro processor introduced three new memory management features: physical memory addressing extension, the global bit in page-table entries, and general support for larger page sizes. These features are only available when operating in protected mode.
18.16.1.1. PHYSICAL MEMORY ADDRESSING EXTENSION
The new PAE (physical address extension) flag in control register CR4, bit 5, enables 4 additional address lines on the processor, allowing 36-bit physical addresses. This option can only be used when paging is enabled, using a new page-table mechanism provided to support the larger physical address range (refer to Section 3.8., Physical Address Extension in Chapter 3, Protected-Mode Memory Management).
18.16.1.2. GLOBAL PAGES
The new PGE (page global enable) flag in control register CR4, bit 7, provides a mechanism for preventing frequently used pages from being flushed from the translation lookaside buffer (TLB). When this flag is set, frequently used pages (such as pages containing kernel procedures or common data tables) can be marked global by setting the global flag in a page-directory or page-table entry. On a task switch or a write to control register CR3 (which normally causes the TLBs to be flushed), the entries in the TLB marked global are not flushed. Marking pages global in this manner prevents unnecessary reloading of the TLB due to TLB misses on frequently used pages. Refer to Section 3.7., Translation Lookaside Buffers (TLBs) in Chapter 3, Protected-Mode Memory Management for a detailed description of this mechanism.
18.16.1.3. LARGER PAGE SIZES
The P6 family processors support large page sizes. This facility is enabled with the PSE (page size extension) flag in control register CR4, bit 4. When this flag is set, the processor supports either 4-KByte or 4-MByte page sizes when normal paging is used and 4-KByte and 2-MByte page sizes when the physical address extension is used. Refer to Section 3.6.1., Paging Options in Chapter 3, Protected-Mode Memory Management for more information about large page sizes.
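The three CR4 flags described in Section 18.16.1 can be modeled with a few bit masks. The following Python sketch is illustrative only: the bit positions (PSE bit 4, PAE bit 5, PGE bit 7) come from the text above, while the function names are hypothetical and the page-size rule simply restates the paragraph above rather than any particular processor implementation.

```python
# Bit positions taken from the text of Section 18.16.1.
CR4_PSE = 1 << 4  # page size extension
CR4_PAE = 1 << 5  # physical address extension
CR4_PGE = 1 << 7  # page global enable

KBYTE = 1024
MBYTE = 1024 * 1024


def supported_page_sizes(cr4):
    """Return the page sizes (in bytes) described in the text for a CR4 value.

    Hypothetical helper: with PSE clear only 4-KByte pages are listed;
    with PSE set the large page is 2 MBytes under PAE, 4 MBytes otherwise.
    """
    sizes = [4 * KBYTE]
    if cr4 & CR4_PSE:
        sizes.append(2 * MBYTE if cr4 & CR4_PAE else 4 * MBYTE)
    return sizes


def physical_address_bits(cr4):
    """PAE enables 4 additional address lines: 36 bits instead of 32."""
    return 36 if cr4 & CR4_PAE else 32


def global_pages_enabled(cr4):
    """True when the PGE flag allows TLB entries to be marked global."""
    return bool(cr4 & CR4_PGE)
```

For example, `supported_page_sizes(CR4_PSE | CR4_PAE)` lists the 4-KByte and 2-MByte sizes, matching the physical-address-extension case in the text.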
18.16.2. CD and NW Cache Control Flags

The CD and NW flags in control register CR0 were introduced in the Intel486 processor. In the P6 family and Pentium processors, these flags are used to implement a writeback strategy for the data cache; in the Intel486 processor, they implement a write-through strategy. Refer to Table 9-4, in Chapter 9, Memory Cache Control for a comparison of these bits on the P6 family, Pentium, and Intel486 processors. For complete information on caching, refer to Chapter 9, Memory Cache Control.
18.16.3. Descriptor Types and Contents

Operating-system code that manages space in descriptor tables often contains an invalid value in the access-rights field of descriptor-table entries to identify unused entries. Access rights values of 80H and 00H remain invalid for the P6 family, Pentium, Intel486, Intel386, and Intel 286 processors. Other values that were invalid on the Intel 286 processor may be valid on the 32-bit processors because uses for these bits have been defined.
18.16.4. Changes in Segment Descriptor Loads

On the Intel386 processor, loading a segment descriptor always causes a locked read and write to set the accessed bit of the descriptor. On the P6 family, Pentium, and Intel486 processors, the locked read and write occur only if the bit is not already set.
18.17. DEBUG FACILITIES

The P6 family and Pentium processors include extensions to the Intel486 processor debugging support for breakpoints. To use the new breakpoint features, it is necessary to set the DE flag in control register CR4.
18.17.1. Differences in Debug Register DR6

It is not possible to write a 1 to reserved bit 12 in debug status register DR6 on the P6 family and Pentium processors; however, it is possible to write a 1 in this bit on the Intel486 processor. Refer to Table 8-1 in Chapter 8, Processor Management and Initialization for the different setting of this register following a power-up or hardware reset.
18.17.2. Differences in Debug Register DR7

The P6 family and Pentium processors determine the type of breakpoint access from the R/W0 through R/W3 fields in debug control register DR7 as follows:
00  Break on instruction execution only.
01  Break on data writes only.
10  Undefined if the DE flag in control register CR4 is cleared; break on I/O reads or writes but not instruction fetches if the DE flag in control register CR4 is set.
11  Break on data reads or writes but not instruction fetches.
On the P6 family and Pentium processors, reserved bits 11, 12, 14 and 15 are hard-wired to 0. On the Intel486 processor, however, bit 12 can be set. Refer to Table 8-1 in Chapter 8, Processor Management and Initialization for the different settings of this register following a power-up or hardware reset.
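The R/W encoding above can be sketched as a small decoder. This Python example is a minimal illustration, not a debugger implementation. It assumes the standard DR7 layout in which the 2-bit R/Wn field for breakpoint n starts at bit 16 + 4n (the bit positions are not stated in this section), and the function names are hypothetical.

```python
def rw_field(dr7, n):
    """Extract the 2-bit R/Wn field for breakpoint n (0 through 3).

    Assumes the R/Wn field occupies bits 16+4n and 17+4n of DR7.
    """
    if not 0 <= n <= 3:
        raise ValueError("breakpoint number must be 0..3")
    return (dr7 >> (16 + 4 * n)) & 0b11


def breakpoint_condition(rw, de):
    """Map an R/W field value to the break condition listed in the text.

    'de' is the state of the DE flag in control register CR4.
    """
    if rw == 0b00:
        return "instruction execution"
    if rw == 0b01:
        return "data writes"
    if rw == 0b10:
        return "I/O reads or writes" if de else "undefined"
    return "data reads or writes"
```

Passing the DE flag explicitly keeps the one encoding whose meaning depends on CR4 (10B) visible at the call site.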
18.17.3. Debug Registers DR4 and DR5

Although the DR4 and DR5 registers are documented as reserved, previous generations of processors aliased references to these registers to debug registers DR6 and DR7, respectively. When debug extensions are not enabled (the DE flag in control register CR4 is cleared), the P6 family and Pentium processors remain compatible with existing software by allowing these aliased references. When debug extensions are enabled (the DE flag is set), attempts to reference registers DR4 or DR5 will result in an invalid-opcode exception (#UD).
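The aliasing rule can be summarized in a few lines. The Python sketch below is a model of the behavior described above, under the assumption that a raised exception stands in for the invalid-opcode exception (#UD); the names `InvalidOpcode` and `resolve_debug_register` are hypothetical.

```python
class InvalidOpcode(Exception):
    """Stands in for the invalid-opcode exception (#UD) in this model."""


def resolve_debug_register(index, de):
    """Return the debug register actually accessed for DR<index>.

    With the DE flag clear, DR4 and DR5 alias DR6 and DR7.
    With the DE flag set, referencing DR4 or DR5 raises #UD.
    """
    if index in (4, 5):
        if de:
            raise InvalidOpcode("#UD: DR%d referenced with CR4.DE set" % index)
        return index + 2
    return index
```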
18.17.4. Recognition of Breakpoints

For the Pentium processor, it is recommended that debuggers execute the LGDT instruction before returning to the program being debugged to ensure that breakpoints are detected. This operation does not need to be performed on the P6 family, Intel486, or Intel386 processors.
18.18. TEST REGISTERS

The implementation of test registers on the Intel486 processor used for testing the cache and TLB has been redesigned using MSRs on the P6 family and Pentium processors. (Note that MSRs used for this function are different on the P6 family and Pentium processors.) The MOV to and from test register instructions generate invalid-opcode exceptions (#UD) on the P6 family processors.
18.19. EXCEPTIONS AND/OR EXCEPTION CONDITIONS

This section describes the new exceptions and exception conditions added to the 32-bit Intel Architecture processors and implementation differences in existing exception handling. Refer to Chapter 5, Interrupt and Exception Handling for a detailed description of the Intel Architecture exceptions. The Pentium III processor introduced new state with the SIMD floating-point registers. Computations involving data in these registers can produce exceptions. A new control/status register is used to determine which exception or exceptions have occurred. When an exception associated with the SIMD floating-point registers occurs, an interrupt is generated.
• Streaming SIMD Extensions exception (#XF, interrupt 19): New exceptions associated with the SIMD floating-point registers and resulting computations.
No new exceptions were added to the Pentium II and Pentium Pro processors. The set of available exceptions is the same as for the Pentium processor. However, the following exception condition was added to the Intel Architecture with the Pentium Pro processor:
• Machine-check exception (#MC, interrupt 18): New exception conditions. Many exception conditions have been added to the machine-check exception and a new architecture has been added for handling and reporting on hardware errors. Refer to Chapter 13, Machine-Check Architecture for a detailed description of the new conditions.
The following exceptions and/or exception conditions were added to the Intel Architecture with the Pentium processor:
• Machine-check exception (#MC, interrupt 18): New exception. This exception reports parity and other hardware errors. It is a model-specific exception and may not be implemented, or may be implemented differently, in future processors. The MCE flag in control register CR4 enables the machine-check exception. When this bit is clear (which it is at reset), the processor inhibits generation of the machine-check exception.
• General-protection exception (#GP, interrupt 13): New exception condition added. An attempt to write a 1 to a reserved bit position of a special register causes a general-protection exception to be generated.
• Page-fault exception (#PF, interrupt 14): New exception condition added. When a 1 is detected in any of the reserved bit positions of a page-table entry, page-directory entry, or page-directory pointer during address translation, a page-fault exception is generated.
The following exception was added to the Intel486 processor:
• Alignment-check exception (#AC, interrupt 17): New exception. Reports unaligned memory references when alignment checking is being performed.
The following exceptions and/or exception conditions were added to the Intel386 processor:
• Divide-error exception (#DE, interrupt 0):
  Change in exception handling. Divide-error exceptions on the Intel386 processors always leave the saved CS:IP value pointing to the instruction that failed. On the 8086 processor, the CS:IP value points to the next instruction.
  Change in exception handling. The Intel386 processors can generate the largest negative number as a quotient for the IDIV instruction (80H and 8000H). The 8086 processor generates a divide-error exception instead.
• Invalid-opcode exception (#UD, interrupt 6): New exception condition added. Improper use of the LOCK instruction prefix can generate an invalid-opcode exception.
• Page-fault exception (#PF, interrupt 14): New exception condition added. If paging is enabled in a 16-bit program, a page-fault exception can be generated as follows. Paging can be used in a system with 16-bit tasks if all tasks use the same page directory. Because there is no place in a 16-bit TSS to store the PDBR register, switching to a 16-bit task does not change the value of the PDBR register. Tasks ported from the Intel 286 processor should be given 32-bit TSSs so they can make full use of paging.
• General-protection exception (#GP, interrupt 13): New exception condition added. The Intel386 processor sets a limit of 15 bytes on instruction length. The only way to violate this limit is by putting redundant prefixes before an instruction. A general-protection exception is generated if the limit on instruction length is violated. The 8086 processor has no instruction length limit.
18.19.1. Machine-Check Architecture

The Pentium Pro processor introduced a new architecture to the Intel Architecture for handling and reporting on machine-check exceptions. This machine-check architecture (described in detail in Chapter 13, Machine-Check Architecture) greatly expands the ability of the processor to report on internal hardware errors.
18.19.2. Priority of Exceptions

Exception priorities are broken down into several major categories:
1. Traps on the previous instruction
2. External interrupts
3. Faults on fetching the next instruction
4. Faults in decoding the next instruction
5. Faults on executing an instruction
There are no changes in the priority of these major categories between the different processors; however, the priority of exceptions within these categories is implementation dependent and may change from processor to processor.
18.20. INTERRUPTS
The following differences in handling interrupts are found among the Intel Architecture processors.
18.20.1. Interrupt Propagation Delay

External hardware interrupts may be recognized on different instruction boundaries on the P6 family, Pentium, Intel486, and Intel386 processors, due to the superscalar designs of the P6 family and Pentium processors. Therefore, the EIP pushed onto the stack when servicing an interrupt may be different for the P6 family, Pentium, Intel486, and Intel386 processors.
18.20.2. NMI Interrupts

After an NMI interrupt is recognized by the P6 family, Pentium, Intel486, Intel386, and Intel 286 processors, the NMI interrupt is masked until the first IRET instruction is executed, unlike the 8086 processor.
18.20.3. IDT Limit

The LIDT instruction can be used to set a limit on the size of the IDT. A double-fault exception (#DF) is generated if an interrupt or exception attempts to read a vector beyond the limit. Shutdown then occurs on the 32-bit Intel Architecture processors if the double-fault handler vector is beyond the limit. (The 8086 processor does not have a shutdown mode nor a limit.)
18.21. TASK SWITCHING AND TSS

This section identifies the implementation differences of task switching, additions to the TSS and the handling of TSSs and TSS segment selectors.
18.21.1. P6 Family and Pentium Processor TSS

When the virtual mode extensions are enabled (by setting the VME flag in control register CR4), the TSS in the P6 family and Pentium processors contains an interrupt redirection bit map, which is used in virtual-8086 mode to redirect interrupts back to an 8086 program.
18.21.2. TSS Selector Writes

During task state saves, the Intel486 processor writes 2-byte segment selectors into a 32-bit TSS, leaving the upper 16 bits undefined. For performance reasons, the P6 family and Pentium processors write 4-byte segment selectors into the TSS, with the upper 2 bytes being 0. For compatibility reasons, code should not depend on the value of the upper 16 bits of the selector in the TSS.
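Because the upper 16 bits of a selector slot in a 32-bit TSS are undefined on the Intel486 processor and zero on the P6 family and Pentium processors, portable code reads only the low word. A minimal Python sketch of that masking follows; the function name is hypothetical and the 32-bit slot value is assumed to have been read from the TSS already.

```python
def selector_from_tss_slot(slot):
    """Return the 16-bit segment selector held in a 32-bit TSS slot.

    The upper 16 bits are ignored because their value is
    processor dependent (undefined on Intel486, zero on later processors).
    """
    return slot & 0xFFFF
```

The same value is obtained whether the slot was written as 2 bytes over stale data or as 4 bytes with the upper half zeroed.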
18.21.3. Order of Reads/Writes to the TSS

The order of reads and writes into the TSS is processor dependent. The P6 family and Pentium processors may generate different page-fault addresses in control register CR2 in the same TSS area than the Intel486 and Intel386 processors, if a TSS crosses a page boundary (which is not recommended).
18.21.4. Using a 16-Bit TSS with 32-Bit Constructs

Task switches using 16-bit TSSs should be used only for pure 16-bit code. Any new code written using 32-bit constructs (operands, addressing, or the upper word of the EFLAGS register) should use only 32-bit TSSs. This is due to the fact that the 32-bit processors do not save the upper 16 bits of EFLAGS to a 16-bit TSS. A task switch back to a 16-bit task that was executing in virtual mode will never re-enable the virtual mode, as this flag was not saved in the upper half of the EFLAGS value in the TSS. Therefore, it is strongly recommended that any code using 32-bit constructs use a 32-bit TSS to ensure correct behavior in a multitasking environment.
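The loss of the virtual-mode flag can be shown with simple masking. This Python sketch models only the truncation described above. It assumes the VM flag is bit 17 of EFLAGS (a position not stated in this section), and the helper names are hypothetical.

```python
EFLAGS_VM = 1 << 17  # virtual-8086 mode flag (assumed bit position)


def flags_saved_in_tss(eflags, tss_bits):
    """Return the flags value that survives a save to a 16- or 32-bit TSS."""
    if tss_bits == 16:
        return eflags & 0xFFFF      # upper 16 bits of EFLAGS are not saved
    if tss_bits == 32:
        return eflags & 0xFFFFFFFF
    raise ValueError("tss_bits must be 16 or 32")


def vm_flag_survives(eflags, tss_bits):
    """True if the VM flag is still set after a save and restore via the TSS."""
    return bool(flags_saved_in_tss(eflags, tss_bits) & EFLAGS_VM)
```

With `eflags = EFLAGS_VM | 0x0202`, the 16-bit TSS keeps only 0202H, so a task switch back to the task cannot re-enable virtual mode.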
18.21.5. Differences in I/O Map Base Addresses

The Intel486 processor considers the TSS segment to be a 16-bit segment and wraps around the 64K boundary. Any I/O accesses check for permission to access this I/O address at the I/O base address plus the I/O offset. If the I/O map base address exceeds the specified limit of 0DFFFH, an I/O access will wrap around and obtain the permission for the I/O address at an incorrect location within the TSS. A TSS limit violation does not occur in this situation on the Intel486 processor. However, the P6 family and Pentium processors consider the TSS to be a 32-bit segment and a limit violation occurs when the I/O base address plus the I/O offset is greater than the TSS limit. By following the recommended specification for the I/O base address to be less than 0DFFFH, the Intel486 processor will not wrap around and access incorrect locations within the TSS for I/O port validation and the P6 family and Pentium processors will not experience general-protection exceptions (#GP). Figure 18-1 demonstrates the different areas accessed by the Intel486 and the P6 family and Pentium processors.
[Figure: two TSS segment diagrams, each spanning 0H to FFFFH with the I/O map base address at FFFFH. Intel486 processor: an I/O access at port 10H checks the bitmap at I/O map base address FFFFH + 10H; because wraparound occurs, the check is made at offset FH from the beginning of the TSS segment. P6 family and Pentium processors: an I/O access at port 10H checks the bitmap at FFFFH + 10H, which exceeds the segment limit; wraparound does not occur and a general-protection exception (#GP) occurs.]
Figure 18-1. I/O Map Base Address Differences
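The difference shown in Figure 18-1 can be reproduced with a short model. The Python sketch below follows the simplified arithmetic of the figure (I/O map base address plus I/O offset) and is not a full model of I/O permission bitmap lookup. The processor labels, the exception class, and the function name are hypothetical.

```python
class GeneralProtectionFault(Exception):
    """Stands in for the general-protection exception (#GP) in this model."""


def io_bitmap_check_offset(io_map_base, io_offset, tss_limit, cpu):
    """Return the TSS offset examined for an I/O permission check.

    cpu == "intel486": the TSS is treated as a 16-bit segment, so the
    sum wraps around the 64K boundary and no limit violation occurs.
    Any other label ("pentium", "p6"): the TSS is treated as a 32-bit
    segment and a sum beyond the TSS limit raises #GP.
    """
    target = io_map_base + io_offset
    if cpu == "intel486":
        return target & 0xFFFF
    if target > tss_limit:
        raise GeneralProtectionFault("#GP: offset %XH beyond TSS limit" % target)
    return target
```

With a base of FFFFH and an offset of 10H, the Intel486 model returns FH, as in the figure, while the other model raises the exception. Keeping the base below 0DFFFH, as recommended above, makes both models return the same offset.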
18.22. CACHE MANAGEMENT

The P6 family processors include two levels of internal caches: L1 (level 1) and L2 (level 2). The L1 cache is divided into an instruction cache and a data cache; the L2 cache is a general-purpose cache. Refer to Section 9.1., Internal Caches, TLBs, and Buffers, in Chapter 9, Memory Cache Control, for a description of these caches. (Note that although the Pentium II processor L2 cache is physically located on a separate chip in the cassette, it is considered an internal cache.)
The Pentium processor includes separate level 1 instruction and data caches. The data cache supports a writeback (or alternatively write-through, on a line-by-line basis) policy for memory updates. Refer to the Pentium Processor Data Book for more information about the organization and operation of the Pentium processor caches.
The Intel486 processor includes a single level 1 cache for both instructions and data.
The meaning of the CD and NW flags in control register CR0 has been redefined for the P6 family and Pentium processors. For these processors, the recommended value (00B) enables writeback for the data cache of the Pentium processor and for the L1 data cache and L2 cache of the P6 family processors. In the Intel486 processor, setting these flags to (00B) enables write-through for the cache.
External system hardware can force the Pentium processor to disable caching or to use the write-through cache policy should that be required. Refer to the Pentium Processor Data Book for more information about hardware control of the Pentium processor caches. In the P6 family processors, the MTRRs can be used to override the CD and NW flags (refer to Table 9-6, in Chapter 9, Memory Cache Control).
The P6 family and Pentium processors support page-level cache management in the same manner as the Intel486 processor by using the PCD and PWT flags in control register CR3, the page-directory entries, and the page-table entries. The Intel486 processor, however, is not affected by the state of the PWT flag since the internal cache of the Intel486 processor is a write-through cache.
18.22.1. Self-Modifying Code with Cache Enabled

On the Intel486 processor, a write to an instruction in the cache will modify it in both the cache and memory. If the instruction was prefetched before the write, however, the old version of the instruction could be the one executed. To prevent this problem, it is necessary to flush the instruction prefetch unit of the Intel486 processor by coding a jump instruction immediately after any write that modifies an instruction. The P6 family and Pentium processors, however, check whether a write may modify an instruction that has been prefetched for execution. This check is based on the linear address of the instruction. If the linear address of an instruction is found to be present in the prefetch queue, the P6 family and Pentium processors flush the prefetch queue, eliminating the need to code a jump instruction after any writes that modify an instruction. Because the linear address of the write is checked against the linear address of the instructions that have been prefetched, special care must be taken for self-modifying code to work correctly when the physical addresses of the instruction and the written data are the same, but the linear addresses differ. In such cases, it is necessary to execute a serializing operation to flush the prefetch queue after the write and before executing the modified instruction. Refer to Section 7.4., Serializing Instructions in Chapter 7, Multiple-Processor Management for more information on serializing instructions.
NOTE
The check on linear addresses described above is not in practice a concern for compatibility. Applications that include self-modifying code use the same linear address for modifying and fetching the instruction. System software, such as a debugger, that might possibly modify an instruction using a different linear address than that used to fetch the instruction must execute a serializing operation, such as IRET, before the modified instruction is executed.
18.23. PAGING
This section identifies enhancements made to the paging mechanism and implementation differences in the paging mechanism for various Intel Architecture processors.
18.23.1. Large Pages

The Pentium processor extended the memory management/paging facilities of the Intel Architecture to allow large (4-MByte) page sizes (refer to Section 3.6.1., Paging Options in Chapter 3, Protected-Mode Memory Management). The initial P6 family processor (the Pentium Pro processor) added a 2-MByte page size to the Intel Architecture in conjunction with the physical address extension (PAE) feature (refer to Section 3.8., Physical Address Extension in Chapter 3, Protected-Mode Memory Management). The availability of large pages on any Intel Architecture processor can be determined via feature bit 3 (PSE) of register EDX after the CPUID instruction has been executed with an argument of 1. Intel processors that do not support the CPUID instruction do not support page size enhancements. (Refer to CPUID: CPU Identification in Chapter 3, Instruction Set Reference, of the Intel Architecture Software Developer's Manual, Volume 2, and AP-485, Intel Processor Identification and the CPUID Instruction, for more information on the CPUID instruction.)
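The feature-bit test described above amounts to a single mask. The Python sketch below assumes the EDX value returned by CPUID with an argument of 1 has already been obtained by some other means (Python cannot execute CPUID directly); the function name is hypothetical.

```python
CPUID_EDX_PSE = 1 << 3  # feature bit 3 of EDX, as given in the text


def supports_large_pages(cpuid1_edx):
    """True if the PSE feature bit is set in the EDX value from CPUID(1)."""
    return bool(cpuid1_edx & CPUID_EDX_PSE)
```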
18.23.2. PCD and PWT Flags

The PCD and PWT flags were introduced to the Intel Architecture in the Intel486 processor to control the caching of pages:
• PCD (page-level cache disable) flag: Controls caching on a page-by-page basis.
• PWT (page-level write-through) flag: Controls the write-through/writeback caching policy on a page-by-page basis. Since the internal cache of the Intel486 processor is a write-through cache, it is not affected by the state of the PWT flag.
18.23.3. Enabling and Disabling Paging

Paging is enabled and disabled by loading a value into control register CR0 that modifies the PG flag. For backward and forward compatibility with all Intel Architecture processors, Intel recommends that the following operations be performed when enabling or disabling paging:
1. Execute a MOV CR0, REG instruction to either set (enable paging) or clear (disable paging) the PG flag.
2. Execute a near JMP instruction.
The sequence bounded by the MOV and JMP instructions should be identity mapped (that is, the instructions should reside on a page whose linear and physical addresses are identical). For the P6 family processors, the MOV CR0, REG instruction is serializing, so the jump operation is not required. However, for backwards compatibility, the JMP instruction should still be included.
18.24. STACK OPERATIONS

This section identifies the differences in the stack mechanism for the various Intel Architecture processors.
18.24.1. Selector Pushes and Pops

When pushing a segment selector onto the stack, the Intel486 processor writes 2 bytes onto 4-byte stacks and decrements ESP by 4. The P6 family and Pentium processors write 4 bytes, with the upper 2 bytes being zeros. When popping a segment selector from the stack, the Intel486 processor reads only 2 bytes. The P6 family and Pentium processors read 4 bytes and discard the upper 2 bytes. This operation may have an effect if the ESP is close to the stack-segment limit. On the P6 family and Pentium processors, stack location at ESP plus 4 may be above the stack limit, in which case a stack fault exception (#SS) will be generated. On the Intel486 processor, stack location at ESP plus 2 may be less than the stack limit and no exception is generated. For a POP-to-memory instruction that meets the following conditions:
• The stack segment size is 16-bit
• Any 32-bit addressing form with the SIB byte specifying ESP as the base register
• The initial stack pointer is FFFCh (32-bit operand) or FFFEh (16-bit operand) and will wrap around to 0h as a result of the POP operation
the result of the memory write is specific to the processor family. For example, in Pentium II and Pentium Pro processors, the result of the memory write is SS:0h plus any scaled index and displacement. In Pentium and Pentium Pro processors, the result of the memory write may be either a stack fault (real mode or protected mode with stack segment size of 64Kbyte), or write to SS:10000h plus any scaled index and displacement (protected mode and stack segment size exceeds 64Kbyte).
18.24.2. Error Code Pushes

The Intel486 processor implements the error code pushed on the stack as a 16-bit value. When pushed onto a 32-bit stack, the Intel486 processor only pushes 2 bytes and updates ESP by 4. On the P6 family and Pentium processors, the error code is a full 32 bits with the upper 16 bits set to zero. The P6 family and Pentium processors, therefore, push 4 bytes and update ESP by 4. Any code that relies on the state of the upper 16 bits may produce inconsistent results.
18.24.3. Fault Handling Effects on the Stack

During the handling of certain instructions, such as CALL and PUSHA, faults may occur in different sequences for the different processors. For example, during far calls, the Intel486 processor pushes the old CS and EIP before a possible branch fault is resolved. A branch fault is a fault from a branch instruction occurring from a segment limit or access rights violation. If a branch fault is taken, the Intel486 and P6 family processors will have corrupted memory below the stack pointer. However, the ESP register is backed up to make the instruction restartable. The P6 family processors issue the branch before the pushes. Therefore, if a branch fault does occur, these processors do not corrupt memory below the stack pointer. This implementation difference, however, does not constitute a compatibility problem, as only values at or above the stack pointer are considered to be valid.
18.24.4. Interlevel RET/IRET From a 16-Bit Interrupt or Call Gate

If a call or interrupt is made from a 32-bit stack environment through a 16-bit gate, only 16 bits of the old ESP can be pushed onto the stack. On the subsequent RET/IRET, the 16-bit ESP is popped but the full 32-bit ESP is updated since control is being resumed in a 32-bit stack environment. The Intel486 processor writes the SS selector into the upper 16 bits of ESP. The P6 family and Pentium processors write zeros into the upper 16 bits.
18.25. MIXING 16- AND 32-BIT SEGMENTS

The features of the 16-bit Intel 286 processor are an object-code compatible subset of those of the 32-bit Intel Architecture processors. The D (default operation size) flag in segment descriptors indicates whether the processor treats a code or data segment as a 16-bit or 32-bit segment; the B (default stack size) flag in segment descriptors indicates whether the processor treats a stack segment as a 16-bit or 32-bit segment.
The segment descriptors used by the Intel 286 processor are supported by the 32-bit Intel Architecture processors if the Intel-reserved word (highest word) of the descriptor is clear. On the 32-bit Intel Architecture processors, this word includes the upper bits of the base address and the segment limit.
The segment descriptors for data segments, code segments, local descriptor tables (there are no descriptors for global descriptor tables), and task gates are the same for the 16- and 32-bit processors. Other 16-bit descriptors (TSS segment, call gate, interrupt gate, and trap gate) are supported by the 32-bit processors. The 32-bit processors also have descriptors for TSS segments, call gates, interrupt gates, and trap gates that support the 32-bit architecture. Both kinds of descriptors can be used in the same system.
For those segment descriptors common to both 16- and 32-bit processors, clear bits in the reserved word cause the 32-bit processors to interpret these descriptors exactly as an Intel 286 processor does, that is:
• Base Address: The upper 8 bits of the 32-bit base address are clear, which limits base addresses to 24 bits.
• Limit: The upper 4 bits of the limit field are clear, restricting the value of the limit field to 64 Kbytes.
• Granularity bit: The G (granularity) flag is clear, indicating the value of the 16-bit limit is interpreted in units of 1 byte.
• Big bit: In a data-segment descriptor, the B flag is clear in the segment descriptor used by the 32-bit processors, indicating the segment is no larger than 64 Kbytes.
• Default bit: In a code-segment descriptor, the D flag is clear, indicating 16-bit addressing and operands are the default. In a stack-segment descriptor, the D flag is clear, indicating use of the SP register (instead of the ESP register) and a 64-Kbyte maximum segment limit.
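The effect of a clear reserved word can be illustrated by decoding a descriptor held as a 64-bit integer. The Python sketch below is an assumption-laden illustration: it relies on the standard 8-byte segment-descriptor layout (limit in bits 0 to 15, base in bits 16 to 39, access rights in bits 40 to 47, and the highest word in bits 48 to 63), which is not spelled out in this section, and the function names are hypothetical.

```python
def is_286_style(descriptor):
    """True if the highest word (bits 48 to 63) of the descriptor is clear."""
    return ((descriptor >> 48) & 0xFFFF) == 0


def decode_286_style(descriptor):
    """Return (base, limit, access_rights) for a 286-style descriptor.

    With the highest word clear, the base is limited to 24 bits and the
    limit to 16 bits with byte granularity, as listed in the text.
    """
    if not is_286_style(descriptor):
        raise ValueError("highest word is not clear: 32-bit descriptor")
    limit = descriptor & 0xFFFF
    base = (descriptor >> 16) & 0xFFFFFF
    access_rights = (descriptor >> 40) & 0xFF
    return base, limit, access_rights
```

For instance, a descriptor built as `(0x93 << 40) | (0x123456 << 16) | 0xFFFF` decodes to base 123456H, limit FFFFH, and access rights 93H.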
For information on mixing 16- and 32-bit code in applications, refer to Chapter 17, Mixing 16-Bit and 32-Bit Code.
18.26. SEGMENT AND ADDRESS WRAPAROUND

This section discusses differences in segment and address wraparound between the P6 family, Pentium, Intel486, Intel386, Intel 286, and 8086 processors.
18.26.1. Segment Wraparound

On the 8086 processor, an attempt to access a memory operand that crosses offset 65,535 or 0FFFFH or offset 0 (for example, moving a word to offset 65,535 or pushing a word when the stack pointer is set to 1) causes the offset to wrap around modulo 65,536 or 010000H. With the Intel 286 processor, any base and offset combination that addresses beyond 16 MBytes wraps around to the first MByte of the address space. The P6 family, Pentium, Intel486, and Intel386 processors in real-address mode generate an exception in these cases:
• A general-protection exception (#GP) if the segment is a data segment (that is, if the CS, DS, ES, FS, or GS register is being used to address the segment).
• A stack-fault exception (#SS) if the segment is a stack segment (that is, if the SS register is being used).
An exception to this behavior occurs when a stack access is data aligned, and the stack pointer is pointing to the last aligned piece of data that size at the top of the stack (ESP is FFFFFFFCH). When this data is popped, no segment limit violation occurs and the stack pointer will wrap around to 0.
The address space of the P6 family, Pentium, and Intel486 processors may wrap around at 1 MByte in real-address mode. An external A20M# pin forces wraparound if enabled. On Intel 8086 processors, it is possible to specify addresses greater than 1 MByte. For example, with a selector value FFFFH and an offset of FFFFH, the effective address would be 10FFEFH (1 MByte plus 65519 bytes). The 8086 processor, which can form addresses up to 20 bits long, truncates the uppermost bit, which wraps this address to FFEFH. However, the P6 family, Pentium, and Intel486 processors do not truncate this bit if A20M# is not enabled.
If a stack operation wraps around the address limit, shutdown occurs. (The 8086 processor does not have a shutdown mode nor a limit.)
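The address arithmetic in the example above can be checked with a short calculation. This Python sketch models only real-address-mode address formation (segment value times 16 plus offset) and the optional truncation to 20 bits; the function and parameter names are hypothetical.

```python
def real_mode_address(segment, offset, wrap_at_1_mbyte):
    """Form a real-address-mode address from a 16-bit segment and offset.

    wrap_at_1_mbyte=True models the 8086 (or A20M# enabled): the result
    is truncated to 20 bits. False models A20M# not enabled, where the
    21st address bit is kept.
    """
    address = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    if wrap_at_1_mbyte:
        address &= 0xFFFFF
    return address
```

With segment FFFFH and offset FFFFH, the untruncated result is 10FFEFH and the truncated result is FFEFH, matching the values in the text.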
18.27. WRITE BUFFERS AND MEMORY ORDERING

The Pentium Pro and Pentium II processors provide a write buffer for temporary storage of writes (stores) to memory (refer to Section 9.11., Write Buffer, in Chapter 9, Memory Cache Control). The Pentium III processor has 4 write buffers. Writes stored in the write buffer(s) are always written to memory in program order, with the exception of fast string store operations (refer to Section 7.2.3., Out of Order Stores From String Operations in P6 Family Processors in Chapter 7, Multiple-Processor Management). The Pentium processor has two write buffers, one corresponding to each of the pipelines. Writes in these buffers are always written to memory in the order they were generated by the processor core. It should be noted that only memory writes are buffered and I/O writes are not. The P6 family, Pentium, and Intel486 processors do not synchronize the completion of memory writes on the bus and instruction execution after a write. An I/O, locked, or serializing instruction needs to be executed to synchronize writes with the next instruction (refer to Section 7.4., Serializing Instructions in Chapter 7, Multiple-Processor Management). The P6 family processors use processor ordering to maintain consistency in the order that data is read (loaded) and written (stored) in a program and the order the processor actually carries out the reads and writes. With this type of ordering, reads can be carried out speculatively and in any order, reads can pass buffered writes, and writes to memory are always carried out in program order. (Refer to Section 7.2., Memory Ordering in Chapter 7, Multiple-Processor Management for more information about processor ordering.) The Pentium III processor introduced a new instruction to serialize writes and make them globally visible. Memory ordering issues can arise between a producer and a consumer of data. 
The SFENCE instruction provides a performance-efficient way of ensuring ordering between routines that produce weakly-ordered results and routines that consume this data. No re-ordering of reads occurs on the Pentium processor, except under the condition noted in Section 7.2.1., Memory Ordering in the Pentium and Intel486 Processors in Chapter 7, Multiple-Processor Management, and in the following paragraph describing the Intel486 processor. Specifically, the write buffers are flushed before the IN instruction is executed. No reads (as a result of cache miss) are reordered around previously generated writes sitting in the write buffers. The implication of this is that the write buffers will be flushed or emptied before a subsequent bus cycle is run on the external bus. On both the Intel486 and Pentium processors, under certain conditions, a memory read will go onto the external bus before the pending memory writes in the buffer even though the writes occurred earlier in the program execution. A memory read will only be reordered in front of all writes pending in the buffers if all writes pending in the buffers are cache hits and the read is a cache miss. Under these conditions, the Intel486 and Pentium processors will not read from an external memory location that needs to be updated by one of the pending writes. During a locked bus cycle, the Intel486 processor will always access external memory; it will never look for the location in the on-chip cache. All data pending in the Intel486 processor's write buffers will be written to memory before a locked cycle is allowed to proceed to the external bus. Thus, the locked bus cycle can be used for eliminating the possibility of reordering read cycles on the Intel486 processor. The Pentium processor does check its cache on a read-modify-write access and, if the cache line has been modified, writes the contents back to memory before locking the bus. The P6 family processors write to their cache on a read-modify-write operation (if the access does not split across a cache line) and do not write back to system memory. If the access does split across a cache line, the processor locks the bus and accesses system memory. I/O reads are never reordered in front of buffered memory writes on an Intel Architecture processor. This ensures an update of all memory locations before reading the status from an I/O device.
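The read-reordering condition stated above for the Intel486 and Pentium processors can be written as a simple predicate. The following is only an illustrative sketch: the structure pending_write and the function read_may_pass_writes are hypothetical names introduced here, and the model ignores everything except the two cache-hit conditions named in the text.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one write waiting in a write buffer. */
struct pending_write {
    bool cache_hit;   /* true if the write hit the on-chip cache */
};

/* Returns true if a memory read may go onto the external bus ahead of
   all pending writes: the read must be a cache miss and every pending
   write must be a cache hit. With no pending writes there is nothing
   to pass, so the function returns false. */
bool read_may_pass_writes(bool read_is_cache_miss,
                          const struct pending_write *writes,
                          size_t count)
{
    if (!read_is_cache_miss || count == 0)
        return false;
    for (size_t i = 0; i < count; i++) {
        if (!writes[i].cache_hit)
            return false;   /* one write miss blocks the reordering */
    }
    return true;
}
```

Because the pending writes are all cache hits in the reordered case, the read cannot target an external location that one of those writes still needs to update, which matches the guarantee given in the text.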
18.28. BUS LOCKING

The Intel 286 processor performs bus locking differently than the Intel P6 family, Pentium, Intel486, and Intel386 processors. Programs that use forms of memory locking specific to the Intel 286 processor may not run properly when run on later processors. A locked instruction is guaranteed to lock only the area of memory defined by the destination operand, but may lock a larger memory area. For example, typical 8086 and Intel 286 configurations lock the entire physical memory space. Programmers should not depend on this. On the Intel 286 processor, the LOCK prefix is sensitive to IOPL. If the CPL is greater than the IOPL, a general-protection exception (#GP) is generated. On the Intel386 DX, Intel486, Pentium, and P6 family processors, no check against IOPL is performed. The Pentium processor automatically asserts the LOCK# signal when acknowledging external interrupts. After signaling an interrupt request, an external interrupt controller may use the data bus to send the interrupt vector to the processor. After receiving the interrupt request signal, the processor asserts LOCK# to ensure that no other data appears on the data bus until the interrupt vector is received. This bus locking does not occur on the P6 family processors.
18.29. BUS HOLD

Unlike the 8086 and Intel 286 processors, but like the Intel386 and Intel486 processors, the P6 family and Pentium processors respond to requests for control of the bus from other potential bus masters, such as DMA controllers, between transfers of parts of an unaligned operand, such as two words which form a doubleword. Unlike the Intel386 processor, the P6 family, Pentium and Intel486 processors respond to bus hold during reset initialization.
18.30. TWO WAYS TO RUN INTEL 286 PROCESSOR TASKS

When porting 16-bit programs to run on 32-bit Intel Architecture processors, there are two approaches to consider:
Porting an entire 16-bit software system to a 32-bit processor, complete with the old operating system, loader, and system builder. Here, all tasks will have 16-bit TSSs. The 32-bit processor is being used as if it were a faster version of the 16-bit processor.
Porting selected 16-bit applications to run in a 32-bit processor environment with a 32-bit operating system, loader, and system builder. Here, the TSSs used to represent 286 tasks should be changed to 32-bit TSSs. It is possible to mix 16-bit and 32-bit TSSs, but the benefits are small and the problems are great. All tasks in a 32-bit software system should have 32-bit TSSs. It is not necessary to change the 16-bit object modules themselves; TSSs are usually constructed by the operating system, by the loader, or by the system builder. Refer to Chapter 17, Mixing 16-Bit and 32-Bit Code for more detailed information about mixing 16-bit and 32-bit code.
Because the 32-bit processors use the contents of the reserved word of 16-bit segment descriptors, 16-bit programs that place values in this word may not run correctly on the 32-bit processors.
18.31. MODEL-SPECIFIC EXTENSIONS TO THE INTEL ARCHITECTURE

Certain extensions to the Intel Architecture are specific to a processor or family of Intel Architecture processors and may not be implemented or implemented in the same way in future processors. The following sections describe these model-specific extensions. The CPUID instruction indicates the availability of some of the model-specific features.
18.31.1. Model-Specific Registers

The Pentium processor introduced a set of model-specific registers (MSRs) for use in controlling hardware functions and performance monitoring. To access these MSRs, two new instructions were added to the Intel Architecture: read MSR (RDMSR) and write MSR (WRMSR). The MSRs in the Pentium processor are not guaranteed to be duplicated or provided in the next generation Intel Architecture processors. The P6 family processors greatly increased the number of MSRs available to software. Refer to Appendix B, Model-Specific Registers for a complete list of the available MSRs. The new registers control the debug extensions, the performance counters, the machine-check exception capability, the machine-check architecture, and the MTRRs. These registers are accessible using the RDMSR and WRMSR instructions. Specific information on some of these new MSRs is provided in the following sections. As with the Pentium processor MSR, the P6 family processor MSRs are not guaranteed to be duplicated or provided in the next generation Intel Architecture processors.
18.31.2. RDMSR and WRMSR Instructions

The RDMSR (read model-specific register) and WRMSR (write model-specific register) instructions recognize a much larger number of model-specific registers in the P6 family processors. (Refer to "RDMSR: Read from Model Specific Register" and "WRMSR: Write to Model Specific Register" in Chapter 3 of the Intel Architecture Software Developer's Manual, Volume 2, for more information about these instructions.)
18.31.3. Memory Type Range Registers

Memory type range registers (MTRRs) are a new feature introduced into the Intel Architecture in the Pentium Pro processor. MTRRs allow the processor to optimize memory operations for different types of memory, such as RAM, ROM, frame buffer memory, and memory-mapped I/O. MTRRs are MSRs that contain an internal map of how physical address ranges are mapped to various types of memory. The processor uses this internal memory map to determine the cacheability of various physical memory locations and the optimal method of accessing memory locations. For example, if a memory location is specified in an MTRR as write-through memory, the processor handles accesses to this location as follows: it reads data from that location in lines and caches the read data, and it maps all writes to that location to the bus and updates the cache to maintain cache coherency. In mapping the physical address space with MTRRs, the processor recognizes five types of memory: uncacheable (UC); uncacheable, speculatable, write-combining (USWC); write-through (WT); write-protected (WP); and writeback (WB). Earlier Intel Architecture processors (such as the Intel486 and Pentium processors) used the KEN# (cache enable) pin and external logic to maintain an external memory map and signal cacheable accesses to the processor. The MTRR mechanism simplifies hardware designs by eliminating the KEN# pin and the external logic required to drive it. Refer to Chapter 8, Processor Management and Initialization and Appendix B, Model-Specific Registers for more information on the MTRRs.
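The five memory types named above can be captured in a small enumeration. The sketch below is illustrative only: the numeric encodings are an assumption taken from the MTRR type-field definition (they are not given in this section), and the names mtrr_type and mtrr_reads_cacheable are hypothetical.

```c
#include <stdbool.h>

/* The five MTRR memory types. The numeric values are an assumption
   (the type-field encodings); this section does not list them. */
enum mtrr_type {
    MTRR_UC   = 0,  /* uncacheable */
    MTRR_USWC = 1,  /* uncacheable, speculatable, write-combining */
    MTRR_WT   = 4,  /* write-through */
    MTRR_WP   = 5,  /* write-protected */
    MTRR_WB   = 6   /* writeback */
};

/* Returns true if reads from a region of this type may be satisfied
   from (and allocated into) the cache. */
bool mtrr_reads_cacheable(enum mtrr_type type)
{
    switch (type) {
    case MTRR_WT:
    case MTRR_WP:
    case MTRR_WB:
        return true;
    default:
        return false;   /* UC and USWC reads are not cached */
    }
}
```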
18.31.4. Machine-Check Exception and Architecture

The Pentium processor introduced a new exception called the machine-check exception (#MC, interrupt 18). This exception is used to detect hardware-related errors, such as a parity error on a read cycle. The P6 family processors extend the types of errors that can be detected and that generate a machine-check exception. They also provide a new machine-check architecture for recording information about a machine-check error and provide extended recovery capability. The machine-check architecture provides several banks of reporting registers for recording machine-check errors. Each bank of registers is associated with a specific hardware unit in the processor. The primary focus of the machine checks is on bus and interconnect operations; however, checks are also made of translation lookaside buffer (TLB) and cache operations. The machine-check architecture can correct some errors automatically and allow for reliable restart of instruction execution. It also collects sufficient information for software to use in correcting other machine errors not corrected by hardware. Refer to Chapter 13, Machine-Check Architecture for more information on the machine-check exception and the machine-check architecture.
18.31.5. Performance-Monitoring Counters

The P6 family and Pentium processors provide two performance-monitoring counters for use in monitoring internal hardware operations. These counters are event counters that can be programmed to count a variety of different types of events, such as the number of instructions decoded, number of interrupts received, or number of cache loads. Appendix A, Performance-Monitoring Events lists all the events that can be counted (Table A-1 for the P6 family processors and Table A-2 for the Pentium processors). The counters are set up, started, and stopped using two MSRs and the RDMSR and WRMSR instructions. For the P6 family processors, the current count for a particular counter can be read using the new RDPMC instruction. The performance-monitoring counters are useful for debugging programs, optimizing code, diagnosing system failures, or refining hardware designs. Refer to Chapter 15, Debugging and Performance Monitoring for more information on these counters.
APPENDIX A PERFORMANCE-MONITORING EVENTS

This appendix contains a list of the performance-monitoring events that can be monitored with the Intel Architecture processors. In the Intel Architecture processors, the ability to monitor performance events and the events that can be monitored are model specific. Section A.1., P6 Family Processor Performance-Monitoring Events lists and describes the events that can be monitored with the P6 family of processors. Section A.2., Pentium Processor Performance-Monitoring Events lists and describes the events that can be monitored with Pentium processors.
A.1. P6 FAMILY PROCESSOR PERFORMANCE-MONITORING EVENTS

Table A-1 lists the events that can be counted with the performance-monitoring counters and read with the RDPMC instruction for the P6 family of processors. The unit column gives the microarchitecture or bus unit that produces the event; the event number column gives the hexadecimal number identifying the event; the mnemonic event name column gives the name of the event; the unit mask column gives the unit mask required (if any); the description column describes the event; and the comments column gives additional information about the event. These performance-monitoring events are intended to be used as guides for performance tuning. The counter values reported are not guaranteed to be absolutely accurate and should be used as a relative guide for tuning. Known discrepancies are documented where applicable. Some performance events are model specific. Those added in later generations of the P6 family processors are listed in this table. Performance events are not architecturally guaranteed in future versions of the P6 family processors. All performance event encodings not listed in Table A-1 are reserved and their use will result in undefined counter results. Refer to the end of the table for notes related to certain entries in the table.
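As a rough illustration of how an event number and unit mask from Table A-1 combine into a value for the PerfEvtSel0 or PerfEvtSel1 register, consider the sketch below. The field positions (event select in bits 7:0, unit mask in bits 15:8, and the USR, OS, and EN flag bits) are assumptions taken from the register description in Chapter 15, not from this appendix; only the PC bit position (bit 19) is stated in Table A-1. The macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed PerfEvtSel flag bits (see Chapter 15 for the layout). */
#define PERFEVTSEL_USR (1u << 16)  /* count while CPL is 1, 2, or 3 */
#define PERFEVTSEL_OS  (1u << 17)  /* count while CPL is 0 */
#define PERFEVTSEL_PC  (1u << 19)  /* pin control (bit 19, per Table A-1) */
#define PERFEVTSEL_EN  (1u << 22)  /* enable counters (assumed PerfEvtSel0 only) */

/* Combines an event number and unit mask from Table A-1 with flag bits. */
uint32_t perfevtsel_encode(uint8_t event_num, uint8_t unit_mask,
                           uint32_t flags)
{
    return (uint32_t)event_num | ((uint32_t)unit_mask << 8) | flags;
}
```

For example, DATA_MEM_REFS (event 43H, unit mask 00H) counted at all privilege levels would be encoded as perfevtsel_encode(0x43, 0x00, PERFEVTSEL_USR | PERFEVTSEL_OS | PERFEVTSEL_EN). Writing the value to the MSR requires the privileged WRMSR instruction and is not shown.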
Table A-1. Events That Can Be Counted with the P6 Family Performance-Monitoring Counters

Unit: Data Cache Unit (DCU)

43H  DATA_MEM_REFS  Unit Mask: 00H
Description: All loads from any memory type. All stores to any memory type. Each part of a split is counted separately. The internal logic counts not only memory loads and stores, but also internal retries. Note: 80-bit floating-point accesses are double counted, since they are decomposed into a 16-bit exponent load and a 64-bit mantissa load. Memory accesses are only counted when they are actually performed (such as a load that gets squashed because a previous cache miss is outstanding to the same address, and which finally gets performed, is only counted once). Does not include I/O accesses, or other nonmemory accesses.

45H  DCU_LINES_IN  Unit Mask: 00H
Description: Total lines allocated in the DCU.

46H  DCU_M_LINES_IN  Unit Mask: 00H
Description: Number of M state lines allocated in the DCU.

47H  DCU_M_LINES_OUT  Unit Mask: 00H
Description: Number of M state lines evicted from the DCU. This includes evictions via snoop HITM, intervention or replacement.

48H  DCU_MISS_OUTSTANDING  Unit Mask: 00H
Description: Weighted number of cycles while a DCU miss is outstanding, incremented by the number of outstanding cache misses at any particular time. Cacheable read requests only are considered. Uncacheable requests are excluded. Read-for-ownerships are counted, as well as line fills, invalidates, and stores.
Comments: An access that also misses the L2 is shortchanged by 2 cycles (i.e., if counts N cycles, should be N+2 cycles). Subsequent loads to the same cache line will not result in any additional counts. Count value not precise, but still useful.

Unit: Instruction Fetch Unit (IFU)

80H  IFU_IFETCH  Unit Mask: 00H
Description: Number of instruction fetches, both cacheable and noncacheable, including UC fetches.

81H  IFU_IFETCH_MISS  Unit Mask: 00H
Description: Number of instruction fetch misses. All instruction fetches that do not hit the IFU (i.e., that produce memory requests). Includes UC accesses.

85H  ITLB_MISS  Unit Mask: 00H
Description: Number of ITLB misses.
86H  IFU_MEM_STALL  Unit Mask: 00H
Description: Number of cycles instruction fetch is stalled, for any reason. Includes IFU cache misses, ITLB misses, ITLB faults, and other minor stalls.

87H  ILD_STALL  Unit Mask: 00H
Description: Number of cycles that the instruction length decoder is stalled.

Unit: L2 Cache (see Note 1)

28H  L2_IFETCH  Unit Mask: MESI 0FH
Description: Number of L2 instruction fetches. This event indicates that a normal instruction fetch was received by the L2. The count includes only L2 cacheable instruction fetches; it does not include UC instruction fetches. It does not include ITLB miss accesses.

29H  L2_LD  Unit Mask: MESI 0FH
Description: Number of L2 data loads. This event indicates that a normal, unlocked, load memory access was received by the L2. It includes only L2 cacheable memory accesses; it does not include I/O accesses, other nonmemory accesses, or memory accesses such as UC/WT memory accesses. It does include L2 cacheable TLB miss memory accesses.

2AH  L2_ST  Unit Mask: MESI 0FH
Description: Number of L2 data stores. This event indicates that a normal, unlocked, store memory access was received by the L2. Specifically, it indicates that the DCU sent a read-for-ownership request to the L2. It also includes Invalid to Modified requests sent by the DCU to the L2. It includes only L2 cacheable memory accesses; it does not include I/O accesses, other nonmemory accesses, or memory accesses such as UC/WT memory accesses. It includes TLB miss memory accesses.

24H  L2_LINES_IN  Unit Mask: 00H
Description: Number of lines allocated in the L2.
26H  L2_LINES_OUT  Unit Mask: 00H
Description: Number of lines removed from the L2 for any reason.

25H  L2_M_LINES_INM  Unit Mask: 00H
Description: Number of modified lines allocated in the L2.

27H  L2_M_LINES_OUTM  Unit Mask: 00H
Description: Number of modified lines removed from the L2 for any reason.

2EH  L2_RQSTS  Unit Mask: MESI 0FH
Description: Total number of L2 requests.

21H  L2_ADS  Unit Mask: 00H
Description: Number of L2 address strobes.

22H  L2_DBUS_BUSY  Unit Mask: 00H
Description: Number of cycles during which the L2 cache data bus was busy.

23H  L2_DBUS_BUSY_RD  Unit Mask: 00H
Description: Number of cycles during which the data bus was busy transferring read data from L2 to the processor.

Unit: External Bus Logic (EBL) (see Note 2)

62H  BUS_DRDY_CLOCKS  Unit Mask: 00H (Self), 20H (Any)
Description: Number of clocks during which DRDY# is asserted. Utilization of the external system data bus during data transfers.
Comments: Unit Mask = 00H counts bus clocks when the processor is driving DRDY#. Unit Mask = 20H counts in processor clocks when any agent is driving DRDY#.

63H  BUS_LOCK_CLOCKS  Unit Mask: 00H (Self), 20H (Any)
Description: Number of clocks during which LOCK# is asserted on the external system bus (see Note 3).
Comments: Always counts in processor clocks.

60H  BUS_REQ_OUTSTANDING  Unit Mask: 00H (Self)
Description: Number of bus requests outstanding. This counter is incremented by the number of cacheable read bus requests outstanding in any given cycle.
Comments: Counts only DCU full-line cacheable reads, not RFOs, writes, instruction fetches, or anything else. Counts waiting for bus to complete (last data chunk received).

65H  BUS_TRAN_BRD  Unit Mask: 00H (Self), 20H (Any)
Description: Number of burst read transactions.

66H  BUS_TRAN_RFO  Unit Mask: 00H (Self), 20H (Any)
Description: Number of completed read for ownership transactions.

67H  BUS_TRANS_WB  Unit Mask: 00H (Self), 20H (Any)
Description: Number of completed write back transactions.

68H  BUS_TRAN_IFETCH  Unit Mask: 00H (Self), 20H (Any)
Description: Number of completed instruction fetch transactions.
69H  BUS_TRAN_INVAL  Unit Mask: 00H (Self), 20H (Any)
Description: Number of completed invalidate transactions.

6AH  BUS_TRAN_PWR  Unit Mask: 00H (Self), 20H (Any)
Description: Number of completed partial write transactions.

6BH  BUS_TRANS_P  Unit Mask: 00H (Self), 20H (Any)
Description: Number of completed partial transactions.

6CH  BUS_TRANS_IO  Unit Mask: 00H (Self), 20H (Any)
Description: Number of completed I/O transactions.

6DH  BUS_TRAN_DEF  Unit Mask: 00H (Self), 20H (Any)
Description: Number of completed deferred transactions.

6EH  BUS_TRAN_BURST  Unit Mask: 00H (Self), 20H (Any)
Description: Number of completed burst transactions.

70H  BUS_TRAN_ANY  Unit Mask: 00H (Self), 20H (Any)
Description: Number of all completed bus transactions. Address bus utilization can be calculated knowing the minimum address bus occupancy. Includes special cycles, etc.

6FH  BUS_TRAN_MEM  Unit Mask: 00H (Self), 20H (Any)
Description: Number of completed memory transactions.

64H  BUS_DATA_RCV  Unit Mask: 00H (Self)
Description: Number of bus clock cycles during which this processor is receiving data.

61H  BUS_BNR_DRV  Unit Mask: 00H (Self)
Description: Number of bus clock cycles during which this processor is driving the BNR# pin.
7AH  BUS_HIT_DRV  Unit Mask: 00H (Self)
Description: Number of bus clock cycles during which this processor is driving the HIT# pin.
Comments: Includes cycles due to snoop stalls. The event counts correctly, but the BPMi pins function as follows based on the setting of the PC bits (bit 19 in the PerfEvtSel0 and PerfEvtSel1 registers): If the core-clock-to-bus-clock ratio is 2:1 or 3:1, and a PC bit is set, the BPMi pins will be asserted for a single clock when the counters overflow. If the PC bit is clear, the processor toggles the BPMi pins when the counter overflows. If the clock ratio is not 2:1 or 3:1, the BPMi pins will not function for these performance-monitoring counter events.

7BH  BUS_HITM_DRV  Unit Mask: 00H (Self)
Description: Number of bus clock cycles during which this processor is driving the HITM# pin.
Comments: Includes cycles due to snoop stalls. The event counts correctly, but the BPMi pins function as follows based on the setting of the PC bits (bit 19 in the PerfEvtSel0 and PerfEvtSel1 registers): If the core-clock-to-bus-clock ratio is 2:1 or 3:1, and a PC bit is set, the BPMi pins will be asserted for a single clock when the counters overflow. If the PC bit is clear, the processor toggles the BPMi pins when the counter overflows. If the clock ratio is not 2:1 or 3:1, the BPMi pins will not function for these performance-monitoring counter events.

7EH  BUS_SNOOP_STALL  Unit Mask: 00H (Self)
Description: Number of clock cycles during which the bus is snoop stalled.
Unit: Floating-Point Unit

C1H  FLOPS  Unit Mask: 00H
Description: Number of computational floating-point operations retired. Excludes floating-point computational operations that cause traps or assists. Includes floating-point computational operations executed by the assist handler. Includes internal sub-operations for complex floating-point instructions like transcendentals. Excludes floating-point loads and stores.
Comments: Counter 0 only.

10H  FP_COMP_OPS_EXE  Unit Mask: 00H
Description: Number of computational floating-point operations executed. The number of FADD, FSUB, FCOM, FMULs, integer MULs and IMULs, FDIVs, FPREMs, FSQRTS, integer DIVs, and IDIVs. Note that this is not the number of cycles, but the number of operations. This event does not distinguish an FADD used in the middle of a transcendental flow from a separate FADD instruction.
Comments: Counter 0 only.

11H  FP_ASSIST  Unit Mask: 00H
Description: Number of floating-point exception cases handled by microcode.
Comments: Counter 1 only. This event includes counts due to speculative execution.

12H  MUL  Unit Mask: 00H
Description: Number of multiplies. Note: Includes integer as well as FP multiplies and is speculative.
Comments: Counter 1 only.

13H  DIV  Unit Mask: 00H
Description: Number of divides. Note: Includes integer as well as FP divides and is speculative.
Comments: Counter 1 only.

14H  CYCLES_DIV_BUSY  Unit Mask: 00H
Description: Number of cycles during which the divider is busy, and cannot accept new divides. Note: Includes integer and FP divides, FPREM, FPSQRT, etc., and is speculative.
Comments: Counter 0 only.
Unit: Memory Ordering

03H  LD_BLOCKS  Unit Mask: 00H
Description: Number of store buffer blocks. Includes counts caused by preceding stores whose addresses are unknown, preceding stores whose addresses are known but whose data is unknown, and preceding stores that conflict with the load but which incompletely overlap the load.

04H  SB_DRAINS  Unit Mask: 00H
Description: Number of store buffer drain cycles. Incremented every cycle the store buffer is draining. Draining is caused by serializing operations like CPUID, synchronizing operations like XCHG, interrupt acknowledgment, as well as other conditions (such as cache flushing).

05H  MISALIGN_MEM_REF  Unit Mask: 00H
Description: Number of misaligned data memory references. Incremented by 1 every cycle, during which either the processor's load or store pipeline dispatches a misaligned uop. Counting is performed if it is the first or second half, or if it is blocked, squashed, or missed. Note: In this context, misaligned means crossing a 64-bit boundary.
Comments: It should be noted that MISALIGN_MEM_REF is only an approximation to the true number of misaligned memory references. The value returned is roughly proportional to the number of misaligned memory accesses, i.e., the size of the problem.

07H  EMON_KNI_PREF_DISPATCHED  Unit Mask: 00H, 01H, 02H, 03H
Description: Number of Streaming SIMD extensions prefetch/weakly-ordered instructions dispatched (speculative prefetches are included in counting). 0: prefetch NTA. 1: prefetch T1. 2: prefetch T2. 3: weakly ordered stores.
Comments: Counters 0 and 1. Pentium III processor only.

4BH  EMON_KNI_PREF_MISS  Unit Mask: 00H, 01H, 02H, 03H
Description: Number of prefetch/weakly-ordered instructions that miss all caches. 0: prefetch NTA. 1: prefetch T1. 2: prefetch T2. 3: weakly ordered stores.
Unit: Instruction Decoding and Retirement

C0H  INST_RETIRED  Unit Mask: 00H
Description: Number of instructions retired.
Comments: A hardware interrupt received during/after the last iteration of the REP STOS flow causes the counter to undercount by 1 instruction.

C2H  UOPS_RETIRED  Unit Mask: 00H
Description: Number of UOPs retired.

D0H  INST_DECODED  Unit Mask: 00H
Description: Number of instructions decoded.

D8H  EMON_KNI_INST_RETIRED  Unit Mask: 00H, 01H
Description: Number of Streaming SIMD extensions retired. 0: packed and scalar. 1: scalar.
Comments: Counters 0 and 1. Pentium III processor only.

D9H  EMON_KNI_COMP_INST_RET  Unit Mask: 00H, 01H
Description: Number of Streaming SIMD extensions computation instructions retired. 0: packed and scalar. 1: scalar.

Unit: Interrupts

C8H  HW_INT_RX  Unit Mask: 00H
Description: Number of hardware interrupts received.

C6H  CYCLES_INT_MASKED  Unit Mask: 00H
Description: Number of processor cycles for which interrupts are disabled.

C7H  CYCLES_INT_PENDING_AND_MASKED  Unit Mask: 00H
Description: Number of processor cycles for which interrupts are disabled and interrupts are pending.

Unit: Branches

C4H  BR_INST_RETIRED  Unit Mask: 00H
Description: Number of branch instructions retired.

C5H  BR_MISS_PRED_RETIRED  Unit Mask: 00H
Description: Number of mispredicted branches retired.

C9H  BR_TAKEN_RETIRED  Unit Mask: 00H
Description: Number of taken branches retired.

CAH  BR_MISS_PRED_TAKEN_RET  Unit Mask: 00H
Description: Number of taken mispredicted branches retired.

E0H  BR_INST_DECODED  Unit Mask: 00H
Description: Number of branch instructions decoded.

E2H  BTB_MISSES  Unit Mask: 00H
Description: Number of branches for which the BTB did not produce a prediction.

E4H  BR_BOGUS  Unit Mask: 00H
Description: Number of bogus branches.

E6H  BACLEARS  Unit Mask: 00H
Description: Number of times BACLEAR is asserted. This is the number of times that a static branch prediction was made, in which the branch decoder decided to make a branch prediction because the BTB did not.
Unit: Stalls

A2H  RESOURCE_STALLS  Unit Mask: 00H
Description: Incremented by 1 during every cycle for which there is a resource related stall. Includes register renaming buffer entries, memory buffer entries. Does not include stalls due to bus queue full, too many cache misses, etc. In addition to resource related stalls, this event counts some other events. Includes stalls arising during branch misprediction recovery, such as if retirement of the mispredicted branch is delayed and stalls arising while store buffer is draining from synchronizing operations.

D2H  PARTIAL_RAT_STALLS  Unit Mask: 00H
Description: Number of cycles or events for partial stalls. Note: Includes flag partial stalls.

Unit: Segment Register Loads

06H  SEGMENT_REG_LOADS  Unit Mask: 00H
Description: Number of segment register loads.

Unit: Clocks

79H  CPU_CLK_UNHALTED  Unit Mask: 00H
Description: Number of cycles during which the processor is not halted.

Unit: MMX Unit

B0H  MMX_INSTR_EXEC  Unit Mask: 00H
Description: Number of MMX Instructions Executed.
Comments: Available in Intel Celeron, Pentium II and Pentium II Xeon processors only. Does not account for MOVQ and MOVD stores from register to memory.

B1H  MMX_SAT_INSTR_EXEC  Unit Mask: 00H
Description: Number of MMX Saturating Instructions Executed.
Comments: Available in Pentium II & Pentium III processors only.

B2H  MMX_UOPS_EXEC  Unit Mask: 0FH
Description: Number of MMX UOPS Executed.
Comments: Available in Pentium II & Pentium III processors only.

B3H  MMX_INSTR_TYPE_EXEC
Unit Mask 01H: MMX packed multiply instructions executed.
Unit Mask 02H: MMX packed shift instructions executed.
Unit Mask 04H: MMX pack operation instructions executed.
Unit Mask 08H: MMX unpack operation instructions executed.
Unit Mask 10H: MMX packed logical instructions executed.
Unit Mask 20H: MMX packed arithmetic instructions executed.
Comments: Available in Pentium II & Pentium III processors only.
CCH  FP_MMX_TRANS
Unit Mask 00H: Transitions from MMX instruction to floating-point instructions.
Unit Mask 01H: Transitions from floating-point instructions to MMX instructions.
Comments: Available in Pentium II & Pentium III processors only.

CDH  MMX_ASSIST  Unit Mask: 00H
Description: Number of MMX Assists (that is, the number of EMMS instructions executed).
Comments: Available in Pentium II & Pentium III processors only.

CEH  MMX_INSTR_RET  Unit Mask: 00H
Description: Number of MMX Instructions Retired.
Comments: Available in Pentium II processor only.

Unit: Segment Register Renaming

D4H  SEG_RENAME_STALLS
Description: Number of Segment Register Renaming Stalls:
Unit Mask 01H: Segment register ES
Unit Mask 02H: Segment register DS
Unit Mask 04H: Segment register FS
Unit Mask 08H: Segment register GS
Unit Mask 0FH: Segment registers ES + DS + FS + GS
Comments: Available in Pentium II & Pentium III processors only.

D5H  SEG_REG_RENAMES
Description: Number of Segment Register Renames:
Unit Mask 01H: Segment register ES
Unit Mask 02H: Segment register DS
Unit Mask 04H: Segment register FS
Unit Mask 08H: Segment register GS
Unit Mask 0FH: Segment registers ES + DS + FS + GS

D6H  RET_SEG_RENAMES  Unit Mask: 00H
Description: Number of segment register rename events retired.
NOTES:
1. Several L2 cache events, where noted, can be further qualified using the Unit Mask (UMSK) field in the PerfEvtSel0 and PerfEvtSel1 registers. The lower 4 bits of the Unit Mask field are used in conjunction with L2 events to indicate the cache state or cache states involved. The P6 family processors identify cache states using the MESI protocol and consequently each bit in the Unit Mask field represents one of the four states: UMSK[3] = M (8H) state, UMSK[2] = E (4H) state, UMSK[1] = S (2H) state, and UMSK[0] = I (1H) state. UMSK[3:0] = MESI (FH) should be used to collect data for all states; UMSK = 0H, for the applicable events, will result in nothing being counted.
2. All of the external bus logic (EBL) events, except where noted, can be further qualified using the Unit Mask (UMSK) field in the PerfEvtSel0 and PerfEvtSel1 registers. Bit 5 of the UMSK field is used in conjunction with the EBL events to indicate whether the processor should count transactions that are self-generated (UMSK[5] = 0) or transactions that result from any processor on the bus (UMSK[5] = 1).
3. L2 cache locks, so it is possible to have a zero count.
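The unit mask bits described in Notes 1 and 2 can be assembled as shown in the following sketch. The bit values come directly from the notes; the macro and function names are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* L2 cache-state bits in UMSK[3:0] (Note 1). */
#define UMSK_L2_M    0x08u  /* Modified */
#define UMSK_L2_E    0x04u  /* Exclusive */
#define UMSK_L2_S    0x02u  /* Shared */
#define UMSK_L2_I    0x01u  /* Invalid */
#define UMSK_L2_MESI 0x0Fu  /* all four states */

/* EBL qualifier in UMSK[5] (Note 2). */
#define UMSK_EBL_SELF 0x00u /* self-generated transactions */
#define UMSK_EBL_ANY  0x20u /* transactions from any processor on the bus */

/* Builds the unit mask for an L2 event from the selected cache states.
   Selecting no state yields 0H, which counts nothing (Note 1). */
uint8_t l2_unit_mask(bool m, bool e, bool s, bool i)
{
    return (uint8_t)((m ? UMSK_L2_M : 0u) | (e ? UMSK_L2_E : 0u) |
                     (s ? UMSK_L2_S : 0u) | (i ? UMSK_L2_I : 0u));
}

/* Builds the unit mask for an EBL event. */
uint8_t ebl_unit_mask(bool any_agent)
{
    return (uint8_t)(any_agent ? UMSK_EBL_ANY : UMSK_EBL_SELF);
}
```

For instance, counting L2_LD only for lines in the M or E state would use l2_unit_mask(true, true, false, false), which gives 0CH.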
A.2. PENTIUM PROCESSOR PERFORMANCE-MONITORING EVENTS

Table A-2 lists the events that can be counted with the performance-monitoring counters for the Pentium processor. The Event Number column gives the hexadecimal code that identifies the event and that is entered in the ES0 or ES1 (event select) fields of the CESR MSR. The Mnemonic Event Name column gives the name of the event, and the Description and Comments columns give detailed descriptions of the events. Most events can be counted with either counter 0 or counter 1; however, some events can be counted only with counter 0 or only with counter 1 (as noted).
NOTE
The events in the table that are shaded are implemented only in the Pentium processor with MMX technology.
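The following sketch illustrates how two event numbers from Table A-2 might be placed in the ES0 and ES1 fields of the CESR MSR. The field positions used here (ES0 in bits 5:0, CC0 in bits 8:6, ES1 in bits 21:16, CC1 in bits 24:22) are assumptions based on the CESR description in Chapter 15 and are not stated in this appendix. The function name cesr_encode and its counter-control parameters are hypothetical; the meaning of the counter-control values is not covered here.

```c
#include <stdint.h>

/* Assumed CESR field positions (see Chapter 15). */
#define CESR_ES0_SHIFT 0   /* event select for counter 0, 6 bits */
#define CESR_CC0_SHIFT 6   /* counter control for counter 0, 3 bits */
#define CESR_ES1_SHIFT 16  /* event select for counter 1, 6 bits */
#define CESR_CC1_SHIFT 22  /* counter control for counter 1, 3 bits */

/* Packs two event numbers and two 3-bit counter-control values into a
   CESR image. Out-of-range bits in the arguments are masked off. */
uint32_t cesr_encode(uint8_t es0, uint8_t cc0, uint8_t es1, uint8_t cc1)
{
    return ((uint32_t)(es0 & 0x3Fu) << CESR_ES0_SHIFT) |
           ((uint32_t)(cc0 & 0x07u) << CESR_CC0_SHIFT) |
           ((uint32_t)(es1 & 0x3Fu) << CESR_ES1_SHIFT) |
           ((uint32_t)(cc1 & 0x07u) << CESR_CC1_SHIFT);
}
```

Selecting DATA_READ (00H) on counter 0 and DATA_WRITE (01H) on counter 1 would be cesr_encode(0x00, cc, 0x01, cc) for a chosen counter-control value cc. Loading the result into the MSR requires the privileged WRMSR instruction and is not shown.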
Table A-2. Events That Can Be Counted with the Pentium Processor Performance-Monitoring Counters

00H  DATA_READ
Description: Number of memory data reads (internal data cache hit and miss combined).
Comments: Split cycle reads are counted individually. Data Memory Reads that are part of TLB miss processing are not included. These events may occur at a maximum of two per clock. I/O is not included.

01H  DATA_WRITE
Description: Number of memory data writes (internal data cache hit and miss combined), I/O is not included.
Comments: Split cycle writes are counted individually. These events may occur at a maximum of two per clock. I/O is not included.

02H  DATA_TLB_MISS
Description: Number of misses to the data cache translation look-aside buffer.

03H  DATA_READ_MISS
Description: Number of memory read accesses that miss the internal data cache whether or not the access is cacheable or noncacheable.
Comments: Additional reads to the same cache line after the first BRDY# of the burst line fill is returned but before the final (fourth) BRDY# has been returned, will not cause the counter to be incremented additional times. Data accesses that are part of TLB miss processing are not included. Accesses directed to I/O space are not included.

04H  DATA WRITE MISS
Description: Number of memory write accesses that miss the internal data cache whether or not the access is cacheable or noncacheable.
Comments: Data accesses that are part of TLB miss processing are not included. Accesses directed to I/O space are not included.
05H  WRITE_HIT_TO_M-_OR_E-STATE_LINES
Description: Number of write hits to exclusive or modified lines in the data cache.
Comments: These are the writes that may be held up if EWBE# is inactive. These events may occur a maximum of two per clock.

06H  DATA_CACHE_LINES_WRITTEN_BACK
Description: Number of dirty lines (all) that are written back, regardless of the cause.
Comments: Replacements and internal and external snoops can all cause writeback and are counted.

07H  EXTERNAL_SNOOPS
Description: Number of accepted external snoops whether they hit in the code cache or data cache or neither.
Comments: Assertions of EADS# outside of the sampling interval are not counted, and no internal snoops are counted.

08H  EXTERNAL_DATA_CACHE_SNOOP_HITS
Description: Number of external snoops to the data cache.
Comments: Snoop hits to a valid line in either the data cache, the data line fill buffer, or one of the write back buffers are all counted as hits.

09H  MEMORY ACCESSES IN BOTH PIPES
Description: Number of data memory reads or writes that are paired in both pipes of the pipeline.
Comments: These accesses are not necessarily run in parallel due to cache misses, bank conflicts, etc.

0AH  BANK CONFLICTS
Description: Number of actual bank conflicts.

0BH  MISALIGNED DATA MEMORY OR I/O REFERENCES
Description: Number of memory or I/O reads or writes that are misaligned.
Comments: A 2- or 4-byte access is misaligned when it crosses a 4-byte boundary; an 8-byte access is misaligned when it crosses an 8-byte boundary. Ten byte accesses are treated as two separate accesses of 8 and 2 bytes each.

0CH  CODE READ
Description: Number of instruction reads whether the read is cacheable or noncacheable.
Comments: Individual 8-byte noncacheable instruction reads are counted.

0DH  CODE TLB MISS
Description: Number of instruction reads that miss the code TLB whether the read is cacheable or noncacheable.
Comments: Individual 8-byte noncacheable instruction reads are counted.

0EH  CODE CACHE MISS
Description: Number of instruction reads that miss the internal code cache whether the read is cacheable or noncacheable.
Comments: Individual 8-byte noncacheable instruction reads are counted.
0FH  ANY SEGMENT REGISTER LOADED
     Description: Number of writes into any segment register in real or protected mode including the LDTR, GDTR, IDTR, and TR.
     Comments: Segment loads are caused by explicit segment register load instructions, far control transfers, and task switches. Far control transfers and task switches causing a privilege level change will signal this event twice. Note that interrupts and exceptions may initiate a far control transfer.

10H  Reserved

11H  Reserved

12H  Branches
     Description: Number of taken and not taken branches, including conditional branches, jumps, calls, returns, software interrupts, and interrupt returns.
     Comments: Also counted as taken branches are serializing instructions, VERR and VERW instructions, some segment descriptor loads, hardware interrupts (including FLUSH#), and programmatic exceptions that invoke a trap or fault handler. The pipe is not necessarily flushed. The number of branches actually executed is measured, not the number of predicted branches.

13H  BTB_HITS
     Description: Number of BTB hits that occur.
     Comments: Hits are counted only for those instructions that are actually executed.

14H  TAKEN_BRANCH_OR_BTB_HIT
     Description: Number of taken branches or BTB hits that occur.
     Comments: This event type is a logical OR of taken branches and BTB hits. It represents an event that may cause a hit in the BTB. Specifically, it is either a candidate for a space in the BTB or it is already in the BTB.

15H  PIPELINE FLUSHES
     Description: Number of pipeline flushes that occur. Pipeline flushes are caused by BTB misses on taken branches, mispredictions, exceptions, interrupts, and some segment descriptor loads.
     Comments: The counter will not be incremented for serializing instructions (serializing instructions cause the prefetch queue to be flushed but will not trigger the Pipeline Flushed event counter) and software interrupts (software interrupts do not flush the pipeline).
16H  INSTRUCTIONS_EXECUTED
     Description: Number of instructions executed (up to two per clock).
     Comments: Invocations of a fault handler are considered instructions. All hardware and software interrupts and exceptions will also cause the count to be incremented. Repeat-prefixed string instructions will only increment this counter once, even though the repeat loop executes the same instruction multiple times until the loop criteria are satisfied. This applies to all the repeat string instruction prefixes (i.e., REP, REPE, REPZ, REPNE, and REPNZ). This counter will also only increment once per HLT instruction executed, regardless of how many cycles the processor remains in the HALT state.

17H  INSTRUCTIONS_EXECUTED_V PIPE
     Description: Number of instructions executed in the V-pipe. It indicates the number of instructions that were paired.
     Comments: This event is the same as the 16H event except it only counts the number of instructions actually executed in the V-pipe.

18H  BUS_CYCLE_DURATION
     Description: Number of clocks while a bus cycle is in progress. This event measures bus use.
     Comments: The count includes HLDA, AHOLD, and BOFF# clocks.

19H  WRITE_BUFFER_FULL_STALL_DURATION
     Description: Number of clocks while the pipeline is stalled due to full write buffers.
     Comments: Full write buffers stall data memory read misses, data memory write misses, and data memory write hits to S-state lines. Stalls on I/O accesses are not included.

1AH  WAITING_FOR_DATA_MEMORY_READ_STALL_DURATION
     Description: Number of clocks while the pipeline is stalled while waiting for data memory reads.
     Comments: Data TLB Miss processing is also included in the count. The pipeline stalls while a data memory read is in progress including attempts to read that are not bypassed while a line is being filled.

1BH  STALL ON WRITE TO AN E- OR M-STATE LINE
     Description: Number of stalls on writes to E- or M-state lines.

1CH  LOCKED BUS CYCLE
     Description: Number of locked bus cycles that occur as the result of the LOCK prefix or LOCK instruction, page-table updates, and descriptor table updates.
     Comments: Only the read portion of the locked read-modify-write is counted. Split locked cycles (SCYC active) count as two separate accesses. Cycles restarted due to BOFF# are not recounted.
1DH  I/O READ OR WRITE CYCLE
     Description: Number of bus cycles directed to I/O space.
     Comments: Misaligned I/O accesses will generate two bus cycles. Bus cycles restarted due to BOFF# are not re-counted.

1EH  NONCACHEABLE_MEMORY_READS
     Description: Number of noncacheable instruction or data memory read bus cycles. Count includes read cycles caused by TLB misses, but does not include read cycles to I/O space.
     Comments: Cycles restarted due to BOFF# are not re-counted.

1FH  PIPELINE_AGI_STALLS
     Description: Number of address generation interlock (AGI) stalls. An AGI occurring in both the U- and V-pipelines in the same clock signals this event twice.
     Comments: An AGI occurs when the instruction in the execute stage of either the U- or V-pipeline is writing to either the index or base address register of an instruction in the D2 (address generation) stage of either the U- or V-pipeline.

20H  Reserved

21H  Reserved

22H  FLOPS
     Description: Number of floating-point operations that occur.
     Comments: Floating-point adds, subtracts, multiplies, divides, remainders, and square roots are counted. The transcendental instructions consist of multiple adds and multiplies and will signal this event multiple times. Instructions generating the divide-by-zero, negative square root, special operand, or stack exceptions will not be counted. Instructions generating all other floating-point exceptions will be counted. The integer multiply instructions and other instructions which use the FPU will be counted.
23H  BREAKPOINT MATCH ON DR0 REGISTER
     Description: Number of matches on register DR0 breakpoint.
     Comments: The counter is incremented regardless of whether the breakpoints are enabled. However, if breakpoints are not enabled, code breakpoint matches will not be checked for instructions executed in the V-pipe and will not cause this counter to be incremented. (They are checked on instructions executed in the U-pipe only when breakpoints are not enabled.) These events correspond to the signals driven on the BP[3:0] pins. Refer to Chapter 15, Debugging and Performance Monitoring, for more information.

24H  BREAKPOINT MATCH ON DR1 REGISTER
     Description: Number of matches on register DR1 breakpoint.
     Comments: Refer to comment for 23H event.

25H  BREAKPOINT MATCH ON DR2 REGISTER
     Description: Number of matches on register DR2 breakpoint.
     Comments: Refer to comment for 23H event.

26H  BREAKPOINT MATCH ON DR3 REGISTER
     Description: Number of matches on register DR3 breakpoint.
     Comments: Refer to comment for 23H event.

27H  HARDWARE INTERRUPTS
     Description: Number of taken INTR and NMI interrupts.

28H  DATA_READ_OR_WRITE
     Description: Number of memory data reads and/or writes (internal data cache hit and miss combined).
     Comments: Split cycle reads and writes are counted individually. Data Memory Reads that are part of TLB miss processing are not included. These events may occur at a maximum of two per clock. I/O is not included.

29H  DATA_READ_MISS OR_WRITE MISS
     Description: Number of memory read and/or write accesses that miss the internal data cache whether or not the access is cacheable or noncacheable.
     Comments: Additional reads to the same cache line after the first BRDY# of the burst line fill is returned, but before the final (fourth) BRDY# has been returned, will not cause the counter to be incremented additional times. Data accesses that are part of TLB miss processing are not included. Accesses directed to I/O space are not included.
2AH  BUS_OWNERSHIP_LATENCY (Counter 0)
     Description: The time from LRM bus ownership request to bus ownership granted (that is, the time from the earlier of a PBREQ (0), PHITM# or HITM# assertion to a PBGNT assertion).
     Comments: The ratio of the 2AH events counted on counter 0 and counter 1 is the average stall time due to bus ownership conflict.

2AH  BUS OWNERSHIP TRANSFERS (Counter 1)
     Description: The number of bus ownership transfers (that is, the number of PBREQ (0) assertions).
     Comments: The ratio of the 2AH events counted on counter 0 and counter 1 is the average stall time due to bus ownership conflict.

2BH  MMX_INSTRUCTIONS_EXECUTED_U-PIPE (Counter 0)
     Description: Number of MMX instructions executed in the U-pipe.

2BH  MMX_INSTRUCTIONS_EXECUTED_V-PIPE (Counter 1)
     Description: Number of MMX instructions executed in the V-pipe.

2CH  CACHE_M-STATE_LINE_SHARING (Counter 0)
     Description: Number of times a processor identified a hit to a modified line due to a memory access in the other processor (PHITM (O)).
     Comments: If the average memory latencies of the system are known, this event enables the user to count the Write Backs on PHITM(O) penalty and the Latency on Hit Modified(I) penalty.

2CH  CACHE_LINE_SHARING (Counter 1)
     Description: Number of shared data lines in the L1 cache (PHIT (O)).

2DH  EMMS_INSTRUCTIONS_EXECUTED (Counter 0)
     Description: Number of EMMS instructions executed.

2DH  TRANSITIONS_BETWEEN_MMX_AND_FP_INSTRUCTIONS (Counter 1)
     Description: Number of transitions between MMX and floating-point instructions or vice versa. An even count indicates the processor is in MMX state. An odd count indicates it is in FP state.
     Comments: This event counts the first floating-point instruction following an MMX instruction or first MMX instruction following a floating-point instruction. The count may be used to estimate the penalty in transitions between floating-point state and MMX state.

2EH  BUS_UTILIZATION_DUE_TO_PROCESSOR_ACTIVITY (Counter 0)
     Description: Number of clocks the bus is busy due to the processor's own activity, i.e., the bus activity that is caused by the processor.
2EH  WRITES_TO_NONCACHEABLE_MEMORY (Counter 1)
     Description: Number of write accesses to noncacheable memory.
     Comments: The count includes write cycles caused by TLB misses and I/O write cycles. Cycles restarted due to BOFF# are not re-counted.

2FH  SATURATING_MMX_INSTRUCTIONS_EXECUTED (Counter 0)
     Description: Number of saturating MMX instructions executed, independently of whether they actually saturated.

2FH  SATURATIONS_PERFORMED (Counter 1)
     Description: Number of MMX instructions that used saturating arithmetic and that at least one of its results actually saturated.
     Comments: If an MMX instruction operating on 4 doublewords saturated in three out of the four results, the counter will be incremented by one only.

30H  NUMBER_OF_CYCLES_NOT_IN_HALT_STATE (Counter 0)
     Description: Number of cycles the processor is not idle due to HLT instruction.
     Comments: This event will enable the user to calculate net CPI. Note that during the time that the processor is executing the HLT instruction, the Time-Stamp Counter is not disabled. Since this event is controlled by the Counter Controls CC0, CC1 it can be used to calculate the CPI at CPL=3, which the TSC cannot provide.

30H  DATA_CACHE_TLB_MISS_STALL_DURATION (Counter 1)
     Description: Number of clocks the pipeline is stalled due to a data cache translation look-aside buffer (TLB) miss.

31H  MMX_INSTRUCTION_DATA_READS (Counter 0)
     Description: Number of MMX instruction data reads.

31H  MMX_INSTRUCTION_DATA_READ_MISSES (Counter 1)
     Description: Number of MMX instruction data read misses.

32H  FLOATING_POINT_STALLS_DURATION (Counter 0)
     Description: Number of clocks while pipe is stalled due to a floating-point freeze.

32H  TAKEN_BRANCHES (Counter 1)
     Description: Number of taken branches.

33H  D1_STARVATION_AND_FIFO_IS_EMPTY (Counter 0)
     Description: Number of times D1 stage cannot issue ANY instructions since the FIFO buffer is empty.
     Comments: The D1 stage can issue 0, 1, or 2 instructions per clock if those are available in an instructions FIFO buffer.
33H  D1_STARVATION_AND_ONLY_ONE_INSTRUCTION_IN_FIFO (Counter 1)
     Description: Number of times the D1 stage issues just a single instruction since the FIFO buffer had just one instruction ready.
     Comments: The D1 stage can issue 0, 1, or 2 instructions per clock if those are available in an instructions FIFO buffer. When combined with the previously defined events, Instruction Executed (16H) and Instruction Executed in the V-pipe (17H), this event enables the user to calculate the number of times pairing rules prevented issuing of two instructions.

34H  MMX_INSTRUCTION_DATA_WRITES (Counter 0)
     Description: Number of data writes caused by MMX instructions.

34H  MMX_INSTRUCTION_DATA_WRITE_MISSES (Counter 1)
     Description: Number of data write misses caused by MMX instructions.

35H  PIPELINE_FLUSHES_DUE_TO_WRONG_BRANCH_PREDICTIONS (Counter 0)
     Description: Number of pipeline flushes due to wrong branch predictions resolved in either the E-stage or the WB-stage.
     Comments: The count includes any pipeline flush due to a branch that the pipeline did not follow correctly. It includes cases where a branch was not in the BTB, cases where a branch was in the BTB but was mispredicted, and cases where a branch was correctly predicted but to the wrong address. Branches are resolved in either the Execute stage (E-stage) or the Writeback stage (WB-stage). In the latter case, the misprediction penalty is larger by one clock. The difference between the 35H event count in counter 0 and counter 1 is the number of E-stage resolved branches.

35H  PIPELINE_FLUSHES_DUE_TO_WRONG_BRANCH_PREDICTIONS_RESOLVED_IN_WB-STAGE (Counter 1)
     Description: Number of pipeline flushes due to wrong branch predictions resolved in the WB-stage.
     Comments: Refer to note for event 35H (Counter 0).

36H  MISALIGNED_DATA_MEMORY_REFERENCE_ON_MMX_INSTRUCTIONS (Counter 0)
     Description: Number of misaligned data memory references when executing MMX instructions.
36H  PIPELINE_ISTALL_FOR_MMX_INSTRUCTION_DATA_MEMORY_READS (Counter 1)
     Description: Number of clocks during pipeline stalls caused by waits for MMX instruction data memory reads.

37H  MISPREDICTED_OR_UNPREDICTED_RETURNS (Counter 0)
     Description: Number of returns predicted incorrectly or not predicted at all.
     Comments: The count is the difference between the total number of executed returns and the number of returns that were correctly predicted. Only RET instructions are counted (for example, IRET instructions are not counted).

37H  PREDICTED_RETURNS (Counter 1)
     Description: Number of predicted returns (whether they are predicted correctly or incorrectly).
     Comments: Only RET instructions are counted (for example, IRET instructions are not counted).

38H  MMX_MULTIPLY_UNIT_INTERLOCK (Counter 0)
     Description: Number of clocks the pipe is stalled since the destination of previous MMX multiply instruction is not ready yet.
     Comments: The counter will not be incremented if there is another cause for a stall. For each occurrence of a multiply interlock this event will be counted twice (if the stalled instruction comes on the next clock after the multiply) or once (if the stalled instruction comes two clocks after the multiply).

38H  MOVD/MOVQ_STORE_STALL_DUE_TO_PREVIOUS_MMX_OPERATION (Counter 1)
     Description: Number of clocks a MOVD/MOVQ instruction store is stalled in D2 stage due to a previous MMX operation with a destination to be used in the store instruction.

39H  RETURNS (Counter 0)
     Description: Number of returns executed.
     Comments: Only RET instructions are counted; IRET instructions are not counted. Any exception taken on a RET instruction and any interrupt recognized by the processor on the instruction boundary prior to the execution of the RET instruction will also cause this counter to be incremented.

39H  Reserved

3AH  BTB_FALSE_ENTRIES (Counter 0)
     Description: Number of false entries in the Branch Target Buffer.
     Comments: False entries are causes for misprediction other than a wrong prediction.
3AH  BTB_MISS_PREDICTION_ON_NOT-TAKEN_BRANCH (Counter 1)
     Description: Number of times the BTB predicted a not-taken branch as taken.

3BH  FULL_WRITE_BUFFER_STALL_DURATION_WHILE_EXECUTING_MMX_INSTRUCTIONS (Counter 0)
     Description: Number of clocks while the pipeline is stalled due to full write buffers while executing MMX instructions.

3BH  STALL_ON_MMX_INSTRUCTION_WRITE_TO_E-_OR_M-STATE_LINE (Counter 1)
     Description: Number of clocks during stalls on MMX instructions writing to E- or M-state lines.
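
Several comments in Table A-2 describe values derived from raw counts rather than counted directly. The following is a minimal C sketch of three of them, written as plain arithmetic on counts that are assumed to have been read from the counters already. The function names are illustrative and do not come from the manual; the rules they encode are the ones stated for events 2AH, 35H, and 0BH.

```c
#include <stdint.h>

/* Event 2AH: counter 0 holds bus-ownership latency in clocks, counter 1
   holds the number of ownership transfers. Their ratio is the average
   stall time per transfer. Returns 0.0 when no transfer was counted. */
static double avg_bus_ownership_latency(uint64_t latency_clocks,
                                        uint64_t transfers)
{
    return transfers ? (double)latency_clocks / (double)transfers : 0.0;
}

/* Event 35H: counter 0 counts flushes resolved in the E-stage or the
   WB-stage, counter 1 counts only those resolved in the WB-stage.
   The difference is the number resolved in the E-stage. */
static uint64_t e_stage_flushes(uint64_t all_flushes, uint64_t wb_flushes)
{
    return all_flushes - wb_flushes;
}

/* Event 0BH rule: a 2- or 4-byte access is misaligned when it crosses a
   4-byte boundary; an 8-byte access when it crosses an 8-byte boundary.
   Returns 1 if misaligned, 0 otherwise. */
static int is_misaligned(uint32_t addr, unsigned size)
{
    unsigned boundary = (size == 8) ? 8u : 4u;
    return (addr % boundary) + size > boundary;
}
```

For example, a 2-byte access at address 3 spans bytes 3 and 4 and so crosses a 4-byte boundary, while the same access at address 2 does not. Ten-byte accesses are not modeled here; per the table they would be checked as an 8-byte access followed by a 2-byte access.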
B
Model-Specific Registers
APPENDIX B MODEL-SPECIFIC REGISTERS

Table B-1 lists the model-specific registers (MSRs) that can be read with the RDMSR and written with the WRMSR instructions. Register addresses are given in both hexadecimal and decimal; the register name is the mnemonic register name; the bit description describes individual bits in registers.
NOTE
The registers with addresses 0H, 1H, 10H, 11H, 12H, and 13H in Table B-1 are available only in the Pentium processor. Code that accesses registers 0H, 1H, and 10H will run on a P6 family processor without generating exceptions; however, code that accesses registers 11H, 12H, and 13H will generate exceptions on a P6 family processor. The MSRs in this table that are shaded are available only in the Pentium II and later processors in the P6 family.
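
The compatibility rule in this note can be written down as a small lookup. The sketch below is only an illustration in C of the six addresses the note names; the enum and function names are hypothetical, and the function does not execute RDMSR or WRMSR.

```c
#include <stdint.h>

/* Outcome of accessing one of the Pentium-processor MSR addresses named
   in the note when the code runs on a P6 family processor. */
enum p6_access {
    P6_OK,               /* 0H, 1H, 10H: runs without an exception  */
    P6_EXCEPTION,        /* 11H, 12H, 13H: generates an exception   */
    P6_NOT_IN_NOTE       /* any other address: not covered by note  */
};

static enum p6_access pentium_msr_on_p6(uint32_t msr)
{
    switch (msr) {
    case 0x00: case 0x01: case 0x10:
        return P6_OK;
    case 0x11: case 0x12: case 0x13:
        return P6_EXCEPTION;
    default:
        return P6_NOT_IN_NOTE;
    }
}
```

Software that must run on both processor families could consult such a table before touching CESR, CTR0, or CTR1 (11H through 13H).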
Table B-1. Model-Specific Registers (MSRs)
Register Address
Hex    Dec    Register Name / Bits                    Bit Description
0H     0      P5_MC_ADDR (Pentium Processor Only)
1H     1      P5_MC_TYPE (Pentium Processor Only)
10H    16     TSC
11H    17     CESR (Pentium Processor Only)
12H    18     CTR0 (Pentium Processor Only)
13H    19     CTR1 (Pentium Processor Only)
1BH    27     APICBASE
              7:0                                     Reserved
              8                                       Boot Strap Processor indicator Bit. BSP = 1
              10:9                                    Reserved
              11                                      APIC Global Enable Bit - Permanent til reset. Enabled = 1, Disabled = 0
              31:12                                   APIC Base Address

              63:32                                   Reserved
2AH    42     EBL_CR_POWERON
              0                                       Reserved (see Note 1)
              1                                       Data Error Checking Enable. 1 = Enabled, 0 = Disabled. Read/Write
              2                                       Response Error Checking Enable / FRCERR Observation Enable. 1 = Enabled, 0 = Disabled. Read/Write
              3                                       AERR# Drive Enable. 1 = Enabled, 0 = Disabled. Read/Write
              4                                       BERR# Enable for initiator bus requests. 1 = Enabled, 0 = Disabled. Read/Write
              5                                       Reserved
              6                                       BERR# Driver Enable for initiator internal errors. 1 = Enabled, 0 = Disabled. Read/Write
              7                                       BINIT# Driver Enable. 1 = Enabled, 0 = Disabled. Read/Write
              8                                       Output Tri-state Enabled. 1 = Enabled, 0 = Disabled. Read
              9                                       Execute BIST. 1 = Enabled, 0 = Disabled. Read
              10                                      AERR# Observation Enabled. 1 = Enabled, 0 = Disabled. Read
              11                                      Reserved

              12                                      BINIT# Observation Enabled. 1 = Enabled, 0 = Disabled. Read
              13                                      In Order Queue Depth. 1 = 1, 0 = 8. Read
              14                                      1Mbyte Power on Reset Vector. 1 = 1Mbyte, 0 = 4Gbytes. Read Only
              15                                      FRC Mode Enable. 1 = Enabled, 0 = Disabled. Read Only
              17:16                                   APIC Cluster ID. Read
              19:18                                   Reserved
              21:20                                   Symmetric Arbitration ID. Read
              24:22                                   Clock Frequency Ratio. Read
              25                                      Reserved
              26                                      Low Power Mode Enable. Read/Write
              63:27                                   Reserved (see Note 1)
33H    51     TEST_CTL                                Test Control Register
              29:0                                    Reserved
              30                                      Streaming Buffer Disable
              31                                      Disable LOCK# assertion for split locked access
79H    121    BIOS_UPDT_TRIG                          BIOS Update Trigger Register
88H    136    BBL_CR_D0[63:0]                         Chunk 0 data register D[63:0]: used to write to and read from the L2
89H    137    BBL_CR_D1[63:0]                         Chunk 1 data register D[63:0]: used to write to and read from the L2
8AH    138    BBL_CR_D2[63:0]                         Chunk 2 data register D[63:0]: used to write to and read from the L2

8BH    139    BIOS_SIGN/BBL_CR_D3[63:0]               BIOS Update Signature Register or Chunk 3 data register D[63:0]: used to write to and read from the L2 depending on the usage model
C1H    193    PERFCTR0
C2H    194    PERFCTR1
FEH    254    MTRRcap
116H   278    BBL_CR_ADDR[63:0]                       Address register: used to send specified address (A31-A3) to L2 during cache initialization accesses
              BBL_CR_ADDR[63:32]                      Reserved
              BBL_CR_ADDR[31:3]                       Address bits [35:3]
              BBL_CR_ADDR[2:0]                        Reserved. Set to 0.
118H   280    BBL_CR_DECC[63:0]                       Data ECC register D[7:0]: used to write ECC and read ECC to/from L2
119H   281    BBL_CR_CTL                              Control register: used to program L2 commands to be issued via the cache configuration access mechanism. Also receives the L2 lookup response
              BBL_CR_CTL[63:22]                       Reserved
              BBL_CR_CTL[21]                          Processor number (see Note 2). Disable = 1, Enable = 0
              BBL_CR_CTL[20:19]                       Reserved
              BBL_CR_CTL[18]                          User supplied ECC
              BBL_CR_CTL[17]                          Reserved
              BBL_CR_CTL[16]                          L2 Hit
              BBL_CR_CTL[15:14]                       Reserved
              BBL_CR_CTL[13:12]                       State from L2: Modified - 11, Exclusive - 10, Shared - 01, Invalid - 00
              BBL_CR_CTL[11:10]                       Way from L2: Way 0 - 00, Way 1 - 01, Way 2 - 10, Way 3 - 11
              BBL_CR_CTL[9:8]                         Way to L2
              BBL_CR_CTL[7]                           Reserved
              BBL_CR_CTL[6:5]                         State to L2
              BBL_CR_CTL[4:0]                         L2 Command:
                01100                                 Data Read w/ LRU update (RLU)
                01110                                 Tag Read w/ Data Read (TRR)
                01111                                 Tag Inquire (TI)
                00010                                 L2 Control Register Read (CR)
                00011                                 L2 Control Register Write (CW)
                010 + MESI encode                     Tag Write w/ Data Read (TWR)
                111 + MESI encode                     Tag Write w/ Data Write (TWW)
                100 + MESI encode                     Tag Write (TW)
11AH   282    BBL_CR_TRIG                             Trigger register: used to initiate a cache configuration access. Write only with Data = 0.
11BH   283    BBL_CR_BUSY                             Busy register: indicates when a cache configuration access L2 command is in progress. D[0] = 1 = BUSY

11EH   286    BBL_CR_CTL3                             Control register 3: used to configure the L2 Cache
              BBL_CR_CTL3[63:26]                      Reserved
              BBL_CR_CTL3[25]                         Cache bus fraction (read only)
              BBL_CR_CTL3[24]                         Reserved
              BBL_CR_CTL3[23]                         L2 Hardware Disable (read only)
              BBL_CR_CTL3[22:20]                      L2 Physical Address Range support:
                111                                   64Gbytes
                110                                   32Gbytes
                101                                   16Gbytes
                100                                   8Gbytes
                011                                   4Gbytes
                010                                   2Gbytes
                001                                   1Gbytes
                000                                   512Mbytes
              BBL_CR_CTL3[19]                         Reserved
              BBL_CR_CTL3[18]                         Cache State error checking enable (read/write)
              BBL_CR_CTL3[17:13]                      Cache size per bank (read/write):
                00001                                 256Kbytes
                00010                                 512Kbytes
                00100                                 1Mbyte
                01000                                 2Mbyte
                10000                                 4Mbytes
              BBL_CR_CTL3[12:11]                      Number of L2 banks (read only)
              BBL_CR_CTL3[10:9]                       L2 Associativity (read only):
                00                                    Direct Mapped
                01                                    2 Way
                10                                    4 Way
                11                                    Reserved
              BBL_CR_CTL3[8]                          L2 Enabled (read/write)
              BBL_CR_CTL3[7]                          CRTN Parity Check Enable (read/write)
              BBL_CR_CTL3[6]                          Address Parity Check Enable (read/write)
              BBL_CR_CTL3[5]                          ECC Check Enable (read/write)
              BBL_CR_CTL3[4:1]                        L2 Cache Latency (read/write)
              BBL_CR_CTL3[0]                          L2 Configured (read/write)
179H   377    MCG_CAP
17AH   378    MCG_STATUS
17BH   379    MCG_CTL
186H   390    EVNTSEL0
              7:0                                     Event Select (Refer to Performance Counter section for a list of event encodings)
              15:8                                    UMASK: Unit Mask Register. Set to 0 to enable all count options
              16                                      USER: Controls the counting of events at Privilege levels of 1, 2, and 3

              17                                      OS: Controls the counting of events at Privilege level of 0
              18                                      E: Occurrence/Duration Mode Select. 1 = Occurrence, 0 = Duration
              19                                      PC: Enables the signaling of performance counter overflow via BP0 pin
              20                                      INT: Enables the signaling of counter overflow via input to APIC. 1 = Enable, 0 = Disable
              22                                      ENABLE: Enables the counting of performance events in both counters. 1 = Enable, 0 = Disable
              23                                      INV: Inverts the result of the CMASK condition. 1 = Inverted, 0 = Non-Inverted
              31:24                                   CMASK: Counter Mask
187H   391    EVNTSEL1
              7:0                                     Event Select (Refer to Performance Counter section for a list of event encodings)
              15:8                                    UMASK: Unit Mask Register. Set to 0 to enable all count options
              16                                      USER: Controls the counting of events at Privilege levels of 1, 2, and 3
              17                                      OS: Controls the counting of events at Privilege level of 0
              18                                      E: Occurrence/Duration Mode Select. 1 = Occurrence, 0 = Duration
              19                                      PC: Enables the signaling of performance counter overflow via BP0 pin

              20                                      INT: Enables the signaling of counter overflow via input to APIC. 1 = Enable, 0 = Disable
              23                                      INV: Inverts the result of the CMASK condition. 1 = Inverted, 0 = Non-Inverted
              31:24                                   CMASK: Counter Mask
1D9H   473    DEBUGCTLMSR
              0                                       Enable/Disable Last Branch Records
              1                                       Branch Trap Flag
              2                                       Performance Monitoring/Break Point Pins
              3                                       Performance Monitoring/Break Point Pins
              4                                       Performance Monitoring/Break Point Pins
              5                                       Performance Monitoring/Break Point Pins
              6                                       Enable/Disable Execution Trace Messages
              13:7                                    Reserved
              14                                      Enable/Disable Execution Trace Messages
              15                                      Enable/Disable Execution Trace Messages
1DBH   475    LASTBRANCHFROMIP
1DCH   476    LASTBRANCHTOIP
1DDH   477    LASTINTFROMIP
1DEH   478    LASTINTTOIP
1E0H   480    ROB_CR_BKUPTMPDR6
              1:0                                     Reserved
              2                                       Fast String Enable bit. Default is enabled
200H   512    MTRRphysBase0
201H   513    MTRRphysMask0
202H   514    MTRRphysBase1
203H   515    MTRRphysMask1
204H   516    MTRRphysBase2
205H   517    MTRRphysMask2

206H   518    MTRRphysBase3
207H   519    MTRRphysMask3
208H   520    MTRRphysBase4
209H   521    MTRRphysMask4
20AH   522    MTRRphysBase5
20BH   523    MTRRphysMask5
20CH   524    MTRRphysBase6
20DH   525    MTRRphysMask6
20EH   526    MTRRphysBase7
20FH   527    MTRRphysMask7
250H   592    MTRRfix64K_00000
258H   600    MTRRfix16K_80000
259H   601    MTRRfix16K_A0000
268H   616    MTRRfix4K_C0000
269H   617    MTRRfix4K_C8000
26AH   618    MTRRfix4K_D0000
26BH   619    MTRRfix4K_D8000
26CH   620    MTRRfix4K_E0000
26DH   621    MTRRfix4K_E8000
26EH   622    MTRRfix4K_F0000
26FH   623    MTRRfix4K_F8000
2FFH   767    MTRRdefType
              2:0                                     Default memory type
              10                                      Fixed MTRR enable
              11                                      MTRR Enable
400H   1024   MC0_CTL
401H   1025   MC0_STATUS
              63                                      MC_STATUS_V
              62                                      MC_STATUS_O
              61                                      MC_STATUS_UC
              60                                      MC_STATUS_EN
              59                                      MC_STATUS_MISCV

Register Address Hex Dec 58 57 31:16 15:0 402H 403H 404H 405H 406H 407H 408H 409H 40AH 40BH 40CH 40DH 40EH 40FH 410H 411H 412H 413H 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 MC0_ADDR MC0_MISC MC1_CTL MC1_STATUS MC1_ADDR MC1_MISC MC2_CTL MC2_STATUS MC2_ADDR MC2_MISC MC4_CTL MC4_STATUS MC4_ADDR MC4_MISC MC3_CTL MC3_STATUS MC3_ADDR MC3_MISC Defined in MCA architecture but not implemented in the P6 family processors Bit definitions same as MC0_STATUS Bit definitions same as MC0_STATUS Defined in MCA architecture but not implemented in P6 Family processors Defined in MCA architecture but not implemented in the P6 family processors Defined in MCA architecture but not implemented in the P6 family processors Bit definitions same as MC0_STATUS Defined in MCA architecture but not implemented in the P6 family processors Bit definitions same as MC0_STATUS Defined in MCA architecture but not implemented in the P6 family processors Register Name Bit Description MC_STATUS_ADDRV MC_STATUS_DAM MC_STATUS_MCACOD MC_STATUS_MSCOD
NOTES:
1. Bit 0 of this register has been redefined several times, and is no longer used in Pentium Pro processors.
2. The processor number feature may be disabled by setting bit 21 of the BBL_CR_CTL MSR (model-specific register address 119H) to 1. Once set, bit 21 of the BBL_CR_CTL may not be cleared. This bit is write-once. The processor number feature will be disabled until the processor is reset.
3. The Pentium III processor will prevent FSB frequency overclocking with a new shutdown mechanism. If the FSB frequency selected is greater than the internal FSB frequency, the processor will shut down. If the FSB frequency selected is less than the internal FSB frequency, the BIOS may choose to use bit 11 to implement its own shutdown policy.
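
The bit layouts in Table B-1 translate directly into shift-and-mask code. The following C sketch, offered only as an illustration, decodes the APICBASE fields (MSR 1BH) and packs a value for EVNTSEL0 (MSR 186H) from the bit positions listed in the table. It works on plain integers and assumes the 64-bit register value has already been obtained with RDMSR; the helper names are hypothetical, and the event number in the usage note is an arbitrary placeholder rather than a real event encoding.

```c
#include <stdint.h>

/* APICBASE (MSR 1BH): bit 8 = BSP indicator, bit 11 = APIC global
   enable, bits 31:12 = APIC base address. */
static int apicbase_is_bsp(uint64_t v)      { return (int)((v >> 8) & 1u); }
static int apicbase_is_enabled(uint64_t v)  { return (int)((v >> 11) & 1u); }
static uint32_t apicbase_address(uint64_t v)
{
    return (uint32_t)(v & 0xFFFFF000u);   /* keep bits 31:12 only */
}

/* EVNTSEL0 (MSR 186H): bits 7:0 event select, 15:8 unit mask,
   16 USER, 17 OS, 22 ENABLE. Remaining fields are left at 0. */
static uint32_t evntsel_encode(uint8_t event, uint8_t umask,
                               int user, int os, int enable)
{
    return (uint32_t)event
         | ((uint32_t)umask << 8)
         | ((user   ? 1u : 0u) << 16)
         | ((os     ? 1u : 0u) << 17)
         | ((enable ? 1u : 0u) << 22);
}
```

With these helpers, an APICBASE value of 0FEE00900H decodes as BSP = 1, enabled = 1, base address 0FEE00000H, and `evntsel_encode(0x12, 0, 1, 1, 1)` yields 00430012H.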
C
Dual-Processor Bootup Sequence Example
(Specific to Pentium Processors)
APPENDIX C DUAL-PROCESSOR (DP) BOOTUP SEQUENCE EXAMPLE (SPECIFIC TO PENTIUM PROCESSORS)

The following example shows the DP protocol for booting two Pentium processors (a primary processor and a secondary processor) in a DP system and initializing their APICs. For dual-processor systems based on Pentium processors, the APIC ID of the primary processor is always 0. The following constants and data definitions are used in the accompanying code examples. They are based on the addresses of the APIC registers as defined in Table 7-1 in Chapter 7.

    ICR_LOW       EQU 0FEE00300H
    ICR_HI        EQU 0FEE00310H
    SVR           EQU 0FEE000F0H
    APIC_ID       EQU 0FEE00020H
    LVT3          EQU 0FEE00370H
    APIC_ENABLED  EQU 100H
    BOOT_ID       DD ?            ; doubleword, since it is loaded and stored through EAX
    SECOND_ID     DD ?            ; ID of the secondary (upgrade) processor
C.1. PRIMARY PROCESSOR'S SEQUENCE OF EVENTS

1. The primary processor boots at the Intel Architecture address and executes until it is ready to activate the secondary processor.
2. Initialization software should execute the CPUID instruction to determine if the primary processor is a GenuineIntel processor. The values of EAX and EDX should be saved into a configuration RAM space for use later.
   If the type field (in the EAX register following CPUID instruction execution) is 01B in bits 13 and 12, respectively, the processor is a future Pentium OverDrive processor and the Pentium processor (735/90, 815/100, 1000/120, 1110/133) has been put to sleep. This means the system is a uniprocessor system and normal AT system configuration can continue. Go to step 14 to configure the APIC.
   If the type field is 00B, the processor is the primary processor and detection of the secondary processor is required. Continue with steps 3 through 13.
3. The following operation can be used to detect the secondary processor: Set a timer before sending the start-up IPI to the secondary processor. The secondary processor's initialization routine should write a value into memory indicating its presence. The primary processor can then use the timer expiration to check if something has been written into memory. If the timer expires and nothing has been written into memory, the secondary processor is not present or some error has occurred.
4. Load start-up code for the secondary processor to execute into a 4-KByte page in the lower 1 MByte of memory.
5. Switch to protected mode (to access APIC address space above 1 MByte).
6. Determine the Pentium processor's APIC ID from the local APIC ID register (default is 0):

    MOV ESI, APIC_ID        ; address of local APIC ID register
    MOV EAX, [ESI]
    AND EAX, 0F000000H      ; zero out all other bits except APIC ID
    MOV BOOT_ID, EAX        ; save in memory

   Save the ID in the configuration RAM (optional).
7. Determine APIC ID of the secondary processor and save it in the configuration RAM (optional).

    MOV EAX, BOOT_ID
    XOR EAX, 1000000H       ; toggle lower bit of ID field (bit 24)
    MOV SECOND_ID, EAX

8. Convert the base address of the 4-KByte page for the secondary processor's bootup code into an 8-bit vector. The 8-bit vector defines the address of a 4-KByte page in the real-address mode address space (1-MByte space). For example, a vector of 0BDH specifies a start-up memory address of 000BD000H.
   Use steps 9 and 10 to use the LVT APIC error handling entry to deal with unsuccessful delivery of the start-up IPI.
9. Enable the local APIC by writing to the spurious vector register (SVR). This is required to do APIC error handling via the local vector table.

    MOV ESI, SVR            ; address of SVR
    MOV EAX, [ESI]
    OR  EAX, APIC_ENABLED   ; set bit 8 to enable (0 on reset)
    MOV [ESI], EAX
10. Program LVT3 (APIC error interrupt vector) of the local vector table with an 8-bit vector for handling APIC errors.

    MOV ESI, LVT3
    MOV EAX, [ESI]
    AND EAX, 0FFFFFF00H     ; clear out previous vector
    OR  EAX, 000000xxH      ; xx is the 8-bit vector for APIC error handling
    MOV [ESI], EAX

11. Write APIC ICRH with address of the secondary processor's APIC.

    MOV ESI, ICR_HI         ; address of ICR high dword
    MOV EAX, [ESI]          ; get high dword of ICR
    AND EAX, 0F0FFFFFFH     ; zero out ID bits
    OR  EAX, SECOND_ID      ; write ID into appropriate bits - don't
                            ; affect reserved bits
    MOV [ESI], EAX          ; write upgrade ID to destination field

12. Set the timer with an appropriate value (~100 milliseconds).
13. Write APIC ICRL to send a start-up IPI message to the secondary processor via the APIC.

    MOV ESI, ICR_LOW        ; write address of ICR low dword
    MOV EAX, [ESI]          ; get low dword of ICR
    AND EAX, 0FFF0F800H     ; zero out delivery mode and vector fields
    OR  EAX, 000006xxH      ; 6 selects delivery mode 110 (StartUp IPI)
                            ; xx should be vector of 4kb page as
                            ; computed in Step 8.
    MOV [ESI], EAX

14. Configure the APIC as appropriate.
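
The register values assembled in steps 7, 8, 11, and 13 are ordinary mask-and-merge arithmetic, which can be checked off-target. The C sketch below mirrors those four steps on plain 32-bit integers. It is an illustration only: the function and macro names are hypothetical, nothing here touches the APIC, and the masks are the ones used in the listings above.

```c
#include <stdint.h>

#define APIC_ID_MASK  0x0F000000u   /* APIC ID field, bits 27:24      */
#define APIC_ID_BIT0  0x01000000u   /* lowest bit of ID field, bit 24 */

/* Step 7: the secondary ID is the primary ID with bit 24 toggled. */
static uint32_t second_id(uint32_t boot_id)
{
    return boot_id ^ APIC_ID_BIT0;
}

/* Step 8: base address of a 4-KByte page below 1 MByte -> 8-bit vector. */
static uint8_t startup_vector(uint32_t page_addr)
{
    return (uint8_t)(page_addr >> 12);
}

/* Step 11: merge the destination ID into the ICR high dword without
   disturbing the bits outside the ID field. */
static uint32_t icr_high(uint32_t old, uint32_t dest_id)
{
    return (old & 0xF0FFFFFFu) | (dest_id & APIC_ID_MASK);
}

/* Step 13: clear the delivery mode and vector fields, then select
   delivery mode 110 (start-up IPI) and insert the vector. */
static uint32_t icr_low_startup(uint32_t old, uint8_t vector)
{
    return (old & 0xFFF0F800u) | 0x00000600u | vector;
}
```

Using the manual's own example, `startup_vector(0x000BD000)` gives 0BDH, and merging that vector into an ICR low dword of 0 gives 000006BDH.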
C.2. SECONDARY PROCESSOR'S SEQUENCE OF EVENTS FOLLOWING RECEIPT OF START-UP IPI

If the secondary processor's APIC is to be used for symmetric multiprocessing, the secondary processor must undertake the following steps:
1. Switch to protected mode to access the APIC addresses.
2. Initialize its local APIC by writing to bit 8 of the SVR register and programming its LVT3 for error handling.
3. Configure the APIC as appropriate.
4. Enable interrupts.
5. (Optional.) Execute the CPUID instruction and write the results into the configuration RAM.
6. Do either of the following:
   - Execute a HALT instruction and wait for an IPI from the operating system.
   - Continue execution.
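
The presence check described in step 3 of Section C.1 pairs with the secondary processor's start-up routine: the secondary writes a value into memory, and the primary looks for it until its timer expires. The C sketch below models only that handshake logic. The presence value, the function names, and the use of a loop count as a stand-in for the timer are all assumptions made for illustration; a real implementation would clear the flag before sending the start-up IPI and would wait on a hardware timer.

```c
#include <stdint.h>

#define PRESENCE_VALUE 0x5AA5u   /* hypothetical "secondary is here" marker */

/* Secondary side: called early in its start-up code. */
static void secondary_announce(volatile uint32_t *flag)
{
    *flag = PRESENCE_VALUE;
}

/* Primary side: poll the flag for up to timeout_ticks iterations.
   Returns 1 if the secondary checked in, 0 if the "timer" expired
   with nothing written (secondary absent or an error occurred). */
static int secondary_detected(const volatile uint32_t *flag,
                              unsigned timeout_ticks)
{
    for (unsigned t = 0; t < timeout_ticks; t++) {
        if (*flag == PRESENCE_VALUE)
            return 1;
        /* a real implementation would wait one timer tick here */
    }
    return *flag == PRESENCE_VALUE;   /* final check at expiry */
}
```

If `secondary_detected` returns 0, the primary can treat the system as a uniprocessor and skip any further configuration that depends on the secondary.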
APPENDIX D MULTIPLE-PROCESSOR (MP) BOOTUP SEQUENCE EXAMPLE (SPECIFIC TO P6 FAMILY PROCESSORS)

The following example illustrates the use of the MP protocol to boot two P6 family processors in a multiple-processor (MP) system and initialize their APICs. The primary processor (the processor that won the race for the flag) is called the bootstrap processor (BSP) and the secondary processor is called the application processor (AP).
The following constants and data definitions are used in the accompanying code examples. They are based on the addresses of the APIC registers as defined in Table 7-1 in Chapter 7.
ICR_LOW       EQU 0FEE00300H
ICR_HI        EQU 0FEE00310H
SVR           EQU 0FEE000F0H
APIC_ID       EQU 0FEE00020H
LVT3          EQU 0FEE00370H
APIC_ENABLED  EQU 100H
BOOT_ID       DD ?
SECOND_ID     DD ?
D.1. BSP'S SEQUENCE OF EVENTS

1. The BSP boots at the Intel Architecture address and executes until it is ready to activate the AP.
2. Initialization software should execute the CPUID instruction to determine if the BSP is a GenuineIntel processor. The values of EAX and EDX should be saved into a configuration RAM space for use later.
3. The following operation can be used to detect the AP:
   - Set a timer before sending the start-up IPI to the AP.
   - The AP's initialization routine should write a value into memory indicating its presence.
   - The BSP can then use the timer expiration to check if something has been written into memory. If the timer expires and nothing has been written into memory, the AP is not present or some error has occurred.
4. Load start-up code for the AP to execute into a 4-KByte page in the lower 1 MByte of memory.
5. Switch to protected mode (to access APIC address space above 1 MByte) or change the APIC base to less than 1 MByte and ensure it is mapped to an uncached (UC) memory type.
6. Determine the BSP's APIC ID from the local APIC ID register (default is 0):
MOV ESI, APIC_ID    ; address of local APIC ID register
MOV EAX, [ESI]
AND EAX, 0F000000H  ; zero out all other bits except APIC ID
MOV BOOT_ID, EAX    ; save in memory
Save the ID in the configuration RAM (optional).
7. Determine the APIC ID of the AP and save it in the configuration RAM (optional).
MOV EAX, BOOT_ID
XOR EAX, 01000000H  ; toggle lower bit of ID field (bit 24)
MOV SECOND_ID, EAX
8. Convert the base address of the 4-KByte page for the AP's bootup code into an 8-bit vector. The 8-bit vector defines the address of a 4-KByte page in the real-address mode address space (1-MByte space). For example, a vector of 0BDH specifies a start-up memory address of 000BD000H.
Steps 9 and 10 set up the LVT APIC error handling entry to deal with unsuccessful delivery of the start-up IPI.
9. Enable the local APIC by writing to the spurious vector register (SVR). This is required to do APIC error handling via the local vector table.
10. Program LVT3 (APIC error interrupt vector) of the local vector table with an 8-bit vector for handling APIC errors.
MOV ESI, LVT3
MOV EAX, [ESI]
AND EAX, 0FFFFFF00H  ; clear out previous vector
OR  EAX, 000000xxH   ; xx is the 8-bit vector for APIC error handling
MOV [ESI], EAX
11. Write APIC ICRH with address of the AP's APIC.

MOV ESI, ICR_HI      ; address of ICR high dword
MOV EAX, [ESI]       ; get high dword of ICR
AND EAX, 0F0FFFFFFH  ; zero out ID bits
OR  EAX, SECOND_ID   ; write ID into appropriate bits - don't
                     ; affect reserved bits
MOV [ESI], EAX       ; write upgrade ID to destination field
12. Initialize the memory location into which the AP will write to signal its presence.
13. Set the timer with an appropriate value (~100 milliseconds).
14. Write APIC ICRL to send a start-up IPI message to the AP via the APIC.
MOV ESI, ICR_LOW     ; write address of ICR low dword
MOV EAX, [ESI]       ; get low dword of ICR
AND EAX, 0FFF0F800H  ; zero out delivery mode and vector fields
OR  EAX, 000006xxH   ; 6 selects delivery mode 110 (StartUp IPI)
                     ; xx should be vector of 4kb page as
                     ; computed in Step 8.
MOV [ESI], EAX
15. Wait for the timer interrupt or an AP signal appearing in memory.
16. If necessary, reconfigure the APIC and continue with the remaining system diagnostics as appropriate.
D.2. AP'S SEQUENCE OF EVENTS FOLLOWING RECEIPT OF START-UP IPI

If the AP's APIC is to be used for symmetric multiprocessing, the AP must undertake the following steps:
1. Switch to protected mode to access the APIC addresses.
2. Initialize its local APIC by writing to bit 8 of the SVR register and programming its LVT3 for error handling.
3. Configure the APIC as appropriate.
4. Enable interrupts.
5. (Optional) Execute the CPUID instruction and write the results into the configuration RAM.
6. Write into the memory location that is being used to signal to the BSP that the AP is executing.
7. Do either of the following:
   - Continue execution (that is, self-configuration, MP Specification Configuration table completion).
   - Execute a HLT instruction and wait for an IPI from the operating system.
APPENDIX E PROGRAMMING THE LINT0 AND LINT1 INPUTS

The following procedure describes how to program the LINT0 and LINT1 local APIC pins on a processor after multiple processors have been booted and initialized (as described in Appendix C and Appendix D). In this example, LINT0 is programmed to be the ExtINT pin and LINT1 is programmed to be the NMI pin.
E.1. CONSTANTS
The following constants are defined:
LVT1  EQU 0FEE00350H
LVT2  EQU 0FEE00360H
LVT3  EQU 0FEE00370H
SVR   EQU 0FEE000F0H
E.2. LINT[0:1] PINS PROGRAMMING PROCEDURE

Use the following to program the LINT[1:0] pins:
1. Mask 8259 interrupts.
2. Enable APIC via SVR (spurious vector register) if not already enabled.
3. Program LVT1 as an ExtINT, which delivers the signal to the INTR signal of all processor cores listed in the destination as an interrupt that originated in an externally connected interrupt controller.
MOV ESI, LVT1
MOV EAX, [ESI]
AND EAX, 0FFFE58FFH  ; mask off bits 8-10, 13, 15 and 16
OR  EAX, 700H        ; Bit 16=0 for not masked, Bit 15=0 for edge
                     ; triggered, Bit 13=0 for high active input
                     ; polarity, Bits 8-10 are 111b for ExtINT
MOV [ESI], EAX       ; Write to LVT1
4. Program LVT2 as NMI, which delivers the signal on the NMI signal of all processor cores listed in the destination.
MOV ESI, LVT2
MOV EAX, [ESI]
AND EAX, 0FFFE58FFH  ; mask off bits 8-10, 13, 15 and 16
OR  EAX, 00000400H   ; Bit 16=0 for not masked, Bit 15=0 edge
                     ; triggered, Bit 13=0 for high active input
                     ; polarity, Bits 8-10 are 100b for NMI
MOV [ESI], EAX       ; Write to LVT2
;Unmask 8259 interrupts and allow NMI.
F
F2XM1 instruction. . . . . . . . . . . . . . . . . . . . . 18-16 Fast string operations . . . . . . . . . . . . . . . . . . . . 7-9
Faults description of . . . . . . . . . . . . . . . . . . . . . . . .5-4 restarting a program or task after . . . . . . . . .5-7 FCMOVcc instructions . . . . . . . . . . . . . . . . . . .18-3 FCOMI instruction . . . . . . . . . . . . . . . . . . . . . .18-3 FCOMIP instruction . . . . . . . . . . . . . . . . . . . . .18-3 FCOS instruction . . . . . . . . . . . . . . . . . . . . . .18-16 FDISI instruction (obsolete) . . . . . . . . . . . . . .18-18 FDIV instruction . . . . . . . . . . . . . . . . . 18-13, 18-15 FE (fixed MTRRs enabled) flag, MTRRdefType register . . . . . . . . . . . . . . . . . . . . . . .9-22 Feature determination, of processor . . . . . . . .18-2 Feature information, processor . . . . . . . . . . . .18-2 FENI instruction (obsolete). . . . . . . . . . . . . . .18-18 FINIT/FNINIT instructions . . . . . . . . . . 18-8, 18-19 FIX (fixed range registers supported) flag, MTRRcap register . . . . . . . . . . . . . .9-21 Fixed-range MTRRs description of . . . . . . . . . . . . . . . . . . . . . . .9-22 mapping to physical memory . . . . . . . . . . .9-23 Flat model, local APIC . . . . . . . . . . . . . . . . . . .7-21 Flat segmentation model . . . . . . . . . . . . . . 3-3, 3-4 FLD instruction . . . . . . . . . . . . . . . . . . . . . . . .18-16 FLDENV instruction . . . . . . . . . . . . . . . . . . . .18-14 FLDL2E instruction. . . . . . . . . . . . . . . . . . . . .18-17 FLDL2T instruction. . . . . . . . . . . . . . . . . . . . .18-17 FLDLG2 instruction . . . . . . . . . . . . . . . . . . . .18-17 FLDLN2 instruction . . . . . . . . . . . . . . . . . . . .18-17 FLDPI instruction . . . . . . . . . . . . . . . . . . . . . .18-17 Floating-point error exception (#MF) . . 5-48, 5-53, 18-14 Floating-point exceptions denormal operand exception . . . . 11-19, 18-11 division-by-zero. . . . . . . . . . . . . . . . . . . . .11-18 exception conditions . . . . . . . . . . . . . . . . .11-16 exception priority. . . . . . . . . . . . . . . . . . . 
.11-13 inexact result (precision). . . . . . . . . . . . . .11-21 invalid arithmetic operand. . . . . . . . . . . . .11-17 invalid operation . . . . . . . . . . . . . . . . . . . .18-17 numeric overflow. . . . . . . . . . . . . . 11-19, 18-11 numeric underflow . . . . . . . . . . . . 11-20, 18-12 saved CS and EIP values . . . . . . . . . . . . .18-12 software handling . . . . . . . . . . . . . . . . . . .11-15 stack underflow. . . . . . . . . . . . . . . . . . . . .11-17 FLUSH# pin . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-2 Focus processor, local APIC . . . . . . . . . . . . . .7-22 FPATAN instruction . . . . . . . . . . . . . . . . . . . .18-16 FPREM instruction . . . . . . . . . . 18-9, 18-13, 18-15 FPREM1 instruction . . . . . . . . . . . . . . . 18-9, 18-15 FPTAN instruction . . . . . . . . . . . . . . . . 18-9, 18-15 FPU compatibility with Intel Architecture FPUs and math coprocessors . . . . . . . . . . . . . . . .18-7 configuring the FPU environment . . . . . . . . .8-6 device-not-available exception . . . . . . . . . .5-30 error signals . . . . . . . . . . . . . . . . . 18-12, 18-13 floating-point error exception . . . . . . . . . . .5-48 initialization . . . . . . . . . . . . . . . . . . . . . . . . . .8-6
instruction synchronization . . . . . . . . . . . 18-19 setting up for software emulation of FPU functions . . . . . . . . . . . . . . . . . . . . . . . . 8-8 using in SMM . . . . . . . . . . . . . . . . . . . . . 12-11 FPU control word compatibility, Intel Architecture processors 18-9 RC field . . . . . . . . . . . . . . . . . . . . . . .11-3, 11-4 FPU status word condition code flags . . . . . . . . . . . . . . . . . 18-8 OE flag . . . . . . . . . . . . . . . . . . . . . . . . . . 11-19 FPU tag word . . . . . . . . . . . . . . . . . . . . . . . . . 18-9 FRSTOR instruction . . . . . . . . . . . . . .18-13, 18-14 FSAVE/FNSAVE instructions . . . . . . .18-13, 18-18 FSCALE instruction . . . . . . . . . . . . . . . . . . . 18-15 FSIN instruction . . . . . . . . . . . . . . . . . . . . . . 18-16 FSINCOS instruction . . . . . . . . . . . . . . . . . . 18-16 FSQRT instruction . . . . . . . . . . . . . . .18-13, 18-15 FSTENV/FNSTENV instructions . . . . . . . . . 18-18 FTAN instruction. . . . . . . . . . . . . . . . . . . . . . . 18-9 FUCOM instruction . . . . . . . . . . . . . . . . . . . . 18-15 FUCOMI instruction . . . . . . . . . . . . . . . . . . . . 18-3 FUCOMIP instruction . . . . . . . . . . . . . . . . . . . 18-3 FUCOMP instruction. . . . . . . . . . . . . . . . . . . 18-15 FUCOMPP instruction . . . . . . . . . . . . . . . . . 18-15 FWAIT instruction . . . . . . . . . . . . . . . . . . . . . . 5-30 FXAM instruction . . . . . . . . . . . . . . . .18-16, 18-17 FXTRACT instruction . . . . . . . 18-11, 18-16, 18-17
G
G (global) flag page-directory entries . . . . . . . . . . . .9-12, 9-17 page-table entries . . . . . . . . . . . . . . .9-12, 9-17 page-table entry . . . . . . . . . . . . . . . . . . . . 3-27 G (granularity) flag, segment descriptor 3-10, 3-12, 4-2, 4-5 G0-G3 (global breakpoint enable) flags, DR7 register . . . . . . . . . . . . . . . . . . 15-5 Gate descriptors call gates . . . . . . . . . . . . . . . . . . . . . . . . . . 4-16 description of. . . . . . . . . . . . . . . . . . . . . . . 4-16 Gates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3 GD (general detect enable) flag, DR7 register . . . . . . . . . . . . .15-5, 15-10 GDT description of. . . . . . . . . . . . . . . . . . . .2-3, 3-17 index into with index field of segment selector . . . . . . . . . . . . . . . . . . . . . . . . . 3-7 initializing. . . . . . . . . . . . . . . . . . . . . . . . . . 8-12 pointers to exception and interrupt handlers . . . . . . . . . . . . . . . . . . . . . . . . 5-15 segment descriptors in . . . . . . . . . . . . . . . . 3-9 selecting with TI (table indicator) flag of segment selector . . . . . . . . . . . . . . . . . . . . . . . . . 3-8 task switching . . . . . . . . . . . . . . . . . . . . . . 6-10 task-gate descriptor. . . . . . . . . . . . . . . . . . . 6-8 TSS descriptors. . . . . . . . . . . . . . . . . . . . . . 6-6
use in address translation. . . . . . . . . . . . . . .3-7 GDTR register description of . . . . . . . . . . . . . . 2-3, 2-10, 3-17 introduction to . . . . . . . . . . . . . . . . . . . . . . . .2-5 limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-5 loading during initialization . . . . . . . . . . . . .8-12 storing . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-18 GE (global exact breakpoint enable) flag, DR7 register. . . . . . . . . . . . . 15-5, 15-10 General-detect exception condition . . . . . . . .15-10 General-protection exception (#GP) 3-14, 4-7, 4-8, 4-14, 4-15, 5-17, 5-41, 6-7, 15-2, 18-14, 18-26, 18-27, 18-35, 18-37 General-purpose registers saved in TSS . . . . . . . . . . . . . . . . . . . . . . . .6-4 Global descriptor table register (see GDTR) Global descriptor table (see GDT)
H
HALT state . . . . . . . . . . . . . . . . . . . . . . . . . . .12-13 relationship to SMI interrupt . . . . . . . . . . . .12-3 Hardware reset description of . . . . . . . . . . . . . . . . . . . . . . . .8-1 processor state after reset . . . . . . . . . . . . . .8-2 state of MTRRs following . . . . . . . . . . . . . .9-18 value of SMBASE following . . . . . . . . . . . .12-4 Hexadecimal numbers . . . . . . . . . . . . . . . . . . . .1-7 HITM# line . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-5 HLT instruction . . . 2-22, 4-25, 5-33, 12-13, 12-14, 15-15
I
ID (identification) flag, EFLAGS register 2-10, 18-6 IDIV instruction. . . . . . . . . . . . . . . . . . . 5-22, 18-26 IDT calling interrupt- and exception-handlers from . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-15 changing base and limit in real-address mode . . . . . . . . . . . . . . . . . . . . . . . . . . .16-6 description of . . . . . . . . . . . . . . . . . . . . . . .5-11 handling NMI interrupts during initialization . . . . . . . . . . . . . . . . . . . . . .8-11 initializing, for protected-mode operation . .8-12 initializing, for real-address mode operation . . . . . . . . . . . . . . . . . . . . . . . .8-10 introduction to . . . . . . . . . . . . . . . . . . . . . . . .2-4 limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .18-28 structure in real-address mode . . . . . . . . . .16-7 task switching . . . . . . . . . . . . . . . . . . . . . . .6-10 task-gate descriptor . . . . . . . . . . . . . . . . . . .6-8 types of descriptors allowed . . . . . . . . . . . .5-13 use in real-address mode . . . . . . . . . . . . . .16-6 IDTR register description of . . . . . . . . . . . . . . . . . . 2-11, 5-13 introduction to . . . . . . . . . . . . . . . . . . . . . . . .2-4 limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-5 loading in real-address mode . . . . . . . . . . .16-6 storing . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-18
IE (invalid operation exception) flag, FPU status word . . . . . . . . . . . . . . . . . . . 18-9 IEEE 754 and 854 standards for floating-point arithmetic . . . . . . . . . . . . . . .18-9, 18-10 IF (interrupt enable) flag, EFLAGS register . . . 2-8, 5-8, 5-15, 5-18, 12-10, 16-6, 16-26 IN instruction. . . . . . . . . . . . . . . . . . . . .7-10, 18-36 INC instruction . . . . . . . . . . . . . . . . . . . . . . . . . 7-4 Index field, segment selector . . . . . . . . . . . . . . 3-7 Inexact Result (Precision) Exception . . . . . . 11-21 Inexact result (precision) exception (#P) . . . 11-21 Inexact result, FPU . . . . . . . . . . . . . . . . . . . . . 11-4 INIT interrupt. . . . . . . . . . . . . . . . . . . . . . . . . . 7-13 Initial-count register, local APIC . . . . . . . . . . . 7-44 Initialization built-in self-test (BIST). . . . . . . . . . . . . .8-1, 8-2 CS register state following . . . . . . . . . . . . . 8-6 dual-processor (DP) bootup sequence for Pentium processors . . . . . . . . . . . . . . . . C-1 EIP register state following . . . . . . . . . . . . . 8-6 example. . . . . . . . . . . . . . . . . . . . . . . . . . . 8-16 first instruction executed . . . . . . . . . . . . . . . 8-6 FPU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-6 hardware reset . . . . . . . . . . . . . . . . . . . . . . 8-1 IDT, protected mode . . . . . . . . . . . . . . . . . 8-12 IDT, real-address mode . . . . . . . . . . . . . . 8-10 Intel486 SX processor and Intel 487 SX math coprocessor . . . . . . . . . . . . . . . . . . . 18-20 local APIC . . . . . . . . . . . . . . . . . . . . . . . . . 7-35 location of software-initialization code. . . . . 8-6 model and stepping information . . . . . . . . . 8-5 multiple-processor (MP) bootup sequence for P6 family processors . . . . . . . . . . . . . . . D-1 multitasking environment . . . . . . . . . . . . . 8-13 overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-1 paging . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . 8-12 processor state after reset . . . . . . . . . . . . . 8-2 protected mode . . . . . . . . . . . . . . . . . . . . . 8-11 real-address mode . . . . . . . . . . . . . . . . . . 8-10 RESET# pin . . . . . . . . . . . . . . . . . . . . . . . . 8-1 setting up exception- and interrupt-handling facilities . . . . . . . . . . . . . . . . . . . . . . . . 8-12 INIT# pin . . . . . . . . . . . . . . . . . . . . . . . . . . .5-2, 8-2 INIT# signal . . . . . . . . . . . . . . . . . . . . . . . . . . 2-22 INS instruction . . . . . . . . . . . . . . . . . . . . . . . 15-10 Instruction operands . . . . . . . . . . . . . . . . . . . . . 1-7 Instruction set new instructions . . . . . . . . . . . . . . . . . . . . 18-3 obsolete instructions . . . . . . . . . . . . . . . . . 18-5 Instruction-breakpoint exception condition . . . 15-8 Instructions privileged. . . . . . . . . . . . . . . . . . . . . . . . . . 4-25 serializing . . . . . . . . . . . . . . . . . . . . . . . . 18-19 supported in real-address mode . . . . . . . . 16-4 system. . . . . . . . . . . . . . . . . . . . . . . . .2-6, 2-18 INT 3 instruction . . . . . . . . . . . . . . . . . . .5-25, 15-2 INT instruction . . . . . . . . . . . . . . . . . . . . . . . . 4-12 INT n instruction . . . . . . . . . . . . . . . . . 3-9, 5-1, 5-3
INT (APIC interrupt enable) flag, PerfEvtSel0 and PerfEvtSel1 MSRs (P6 family processors) 15-17 INT3 instruction . . . . . . . . . . . . . . . . . . . . . 3-9, 5-3 Intel 287 math coprocessor . . . . . . . . . . . . . . .18-7 Intel 387 math coprocessor system . . . . . . . . .18-7 Intel 487 SX math coprocessor . . . . . . 18-7, 18-20 Intel 8086 processor. . . . . . . . . . . . . . . . . . . . .18-7 Intel Architecture compatibility . . . . . . . . . . . . . . . . . . . . . . . .18-1 processors . . . . . . . . . . . . . . . . . . . . . . . . .18-1 Intel286 processor . . . . . . . . . . . . . . . . . . . . . .18-7 Intel386 DX processor . . . . . . . . . . . . . . . . . . .18-7 Intel486 DX processor . . . . . . . . . . . . . . . . . . .18-7 Intel486 SX processor . . . . . . . . . . . . . 18-7, 18-20 Interprivilege level calls call mechanism . . . . . . . . . . . . . . . . . . . . . .4-17 stack switching . . . . . . . . . . . . . . . . . . . . . .4-21 Interrupt command register (ICR), local APIC .7-25 Interrupt gates 16-bit, interlevel return from . . . . . . . . . . .18-34 clearing IF flag . . . . . . . . . . . . . . . . . . 5-9, 5-18 difference between interrupt and trap gates . . 5-18 for 16-bit and 32-bit code modules . . . . . . .17-2 handling a virtual-8086 mode interrupt or exception through . . . . . . . . . . . . . . . .16-17 in IDT . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-13 introduction to . . . . . . . . . . . . . . . . . . . . 2-3, 2-4 layout of . . . . . . . . . . . . . . . . . . . . . . . . . . .5-13 Interrupt handler calling . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-15 defined . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-1 flag usage by handler procedure . . . . . . . .5-18 procedures . . . . . . . . . . . . . . . . . . . . . . . . .5-15 protection of handler procedures . . . . . . . .5-17 task . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-18, 6-3 Interrupt redirection bit map field (in TSS) . . 
.16-16 Interrupts acceptance, local APIC. . . . . . . . . . . . . . . .7-30 APIC priority levels . . . . . . . . . . . . . . . . . . .7-15 automatic bus locking when acknowledging. . . . . . . . . . . . . . . . . . .18-37 control transfers between 16- and 32-bit code modules. . . . . . . . . . . . . . . . . . . . . . . . .17-8 description of . . . . . . . . . . . . . . . . . . . . 2-4, 5-1 distribution mechanism, local APIC . . . . . .7-22 enabling and disabling . . . . . . . . . . . . . . . . .5-8 handler mechanism . . . . . . . . . . . . . . . . . .5-15 handler procedures. . . . . . . . . . . . . . . . . . .5-15 handling . . . . . . . . . . . . . . . . . . . . . . . . . . .5-15 handling in real-address mode . . . . . . . . . .16-6 handling in SMM . . . . . . . . . . . . . . . . . . . .12-10 handling in virtual-8086 mode. . . . . . . . . .16-15 handling multiple NMIs . . . . . . . . . . . . . . . . .5-8 handling through a task gate in virtual-8086 mode . . . . . . . . . . . . . . . . . . . . . . . . . .16-20
handling through a trap or interrupt gate in virtual-8086 mode . . . . . . . . . . . . . . . 16-17 IDT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-11 IDTR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-11 initializing for protected-mode operation . . 8-12 interrupt descriptor table register (see IDTR) interrupt descriptor table (see IDT) local APIC . . . . . . . . . . . . . . . . . . . . . . . . . 7-13 local APIC sources . . . . . . . . . . . . . . . . . . 7-15 maskable hardware interrupts. . . . . . .2-8, 7-23 masking maskable hardware interrupts . . . 5-8 masking when switching stack segments . 5-10 overview of . . . . . . . . . . . . . . . . . . . . . . . . . 5-1 priorities among simultaneous exceptions and interrupts . . . . . . . . . . . . . . . . . . . . . . . 5-10 propagation delay . . . . . . . . . . . . . . . . . . 18-27 restarting a task or program . . . . . . . . . . . . 5-7 software. . . . . . . . . . . . . . . . . . . . . . . . . . . 5-55 summary of . . . . . . . . . . . . . . . . . . . . . . . . . 5-6 user defined . . . . . . . . . . . . . . . . . . . .5-4, 5-55 valid APIC interrupts . . . . . . . . . . . . . . . . . 7-15 vectors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-4 INTn instruction . . . . . . . . . . . . . . . . . . . . . . 15-10 INTO instruction . . . . . . . . . . 3-9, 5-3, 5-26, 15-10 INTR# pin . . . . . . . . . . . . . . . . . . . . . . . . . .5-2, 5-8 Invalid arithmetic operand exception (#IA), FPU description of. . . . . . . . . . . . . . . . . . . . . . 11-17 Invalid opcode exception (#UD) . 5-28, 12-3, 15-4, 18-6, 18-13 Invalid operation exception. . . . . . . . . . . . . . 11-17 Invalid operation exception, FPU . . . .18-13, 18-17 Invalid TSS exception (#TS). . . . . . . . . . .5-35, 6-7 Invalid-opcode exception (#UD) . . . . .18-25, 18-26 INVD instruction . . . . 2-21, 4-25, 7-12, 9-15, 18-5 INVLPG instruction . . . . . . . 
2-21, 4-25, 7-12, 18-5 IOPL (I/O privilege level) field, EFLAGS register description of. . . . . . . . . . . . . . . . . . . . . . . . 2-8 restoring on return from exception or interrupt handler. . . . . . . . . . . . . . . . . . . . . . . . . . 5-15 sensitive instructions in virtual-8086 mode . . . . . . . . . . . . . . . . . . . . . . . . . 16-14 IRET instruction . . 3-9, 5-8, 5-9, 5-15, 5-18, 6-10, 6-12, 7-12, 16-6, 16-27 IRETD instruction . . . . . . . . . . . . . . . . . . . . . . 7-12 IRR (interrupt request register), local APIC . . 7-30 ISR (in-service register), local APIC . . . . . . . . 7-30 I/O breakpoint exception conditions . . . . . . . . 15-9 in virtual-8086 mode . . . . . . . . . . . . . . . . 16-14 instruction restart flag, SMM revision identifier field . . . . . . . . . . . . . . . . . . . . .12-15, 12-16 instructions, restarting following an SMI interrupt . . . . . . . . . . . . . . . . . . . . . . . 12-15 I/O permission bit map, TSS . . . . . . . . . . . . 6-6 map base address field, TSS . . . . . . . . . . . 6-6 I/O APIC bus arbitration . . . . . . . . . . . . . . . . . . . . . . 7-15 description of. . . . . . . . . . . . . . . . . . . . . . . 7-13
external interrupts . . . . . . . . . . . . . . . . . . . . .5-2 interrupt sources . . . . . . . . . . . . . . . . . . . . .7-15 relationship of local APIC to I/O APIC . . . .7-14 valid interrupts . . . . . . . . . . . . . . . . . . . . . .7-15
J
JMP instruction. . 3-9, 4-12, 4-13, 4-17, 6-3, 6-10, 6-12
K
KEN# pin . . . . . . . . . . . . . . . . . . . . . . . . . . . .18-39
L
L0-L3 (local breakpoint enable) flags, DR7 register. . . . . . . . . . . . . . . . . . .15-5 L1 (level 1) cache description of . . . . . . . . . . . . . . . . . . . . . . . .9-2 disabling . . . . . . . 9-4, 9-5, 9-8, 9-9, 9-15, 9-19 introduction of . . . . . . . . . . . . . . . . . . . . . .18-30 MESI cache protocol. . . . . . . . . . . . . . . . . . .9-9 L2 (level 2) cache description of . . . . . . . . . . . . . . . . . . . . . . . .9-2 disabling . . . . . . . 9-4, 9-5, 9-8, 9-9, 9-15, 9-19 introduction of . . . . . . . . . . . . . . . . . . . . . .18-30 MESI cache protocol. . . . . . . . . . . . . . . . . . .9-9 LAR instruction. . . . . . . . . . . . . . . . . . . . 2-20, 4-26 Larger page sizes introduction of . . . . . . . . . . . . . . . . . . . . . .18-32 support for. . . . . . . . . . . . . . . . . . . . . . . . .18-23 Last branch, interrupt, and exception recording description of . . . . . . . . . . . . . . . . . . . . . .15-11 initialization . . . . . . . . . . . . . . . . . . . . . . . .15-14 LastBranchFromIP MSR . . . . . 15-1, 15-13, 15-14 LastBranchToIP MSR . . . . . . . 15-1, 15-13, 15-14 LastExceptionFromIP MSR . . . 15-2, 15-13, 15-14 LastExceptionToIP MSR . . . . . 15-2, 15-13, 15-14 LBR (last branch/interrupt/exception) flag, DebugCtlMSR register . . . 15-11, 15-13, 15-14 LDR (logical destination register), local APIC .7-20 LDS instruction. . . . . . . . . . . . . . . . . . . . . 3-9, 4-10 LDT associated with a task. . . . . . . . . . . . . . . . . .6-3 description of . . . . . . . . . . . . . . . . . . . . . . .3-18 index into with index field of segment selector . . . . . . . . . . . . . . . . . . . . . . . . . .3-7 introduction to . . . . . . . . . . . . . . . . . . . . . . . .2-3 pointer to in TSS . . . . . . . . . . . . . . . . . . . . . .6-5 pointers to exception and interrupt handlers. . . . . . . . . . . . . . . . . . . . . . . . .5-15 segment descriptors in . . . . . . . . . . . . . . . . .3-9 segment selector field, TSS . . . . . . . . . 
. . .6-17 selecting with TI (table indicator) flag of segment selector . . . . . . . . . . . . . . . . . . . . . . . . . .3-8 setting up during initialization . . . . . . . . . . .8-12 task switching . . . . . . . . . . . . . . . . . . . . . . .6-10
task-gate descriptor. . . . . . . . . . . . . . . . . . . 6-8 use in address translation . . . . . . . . . . . . . . 3-7 LDTR register description of. . . . . . . . . . . . . . . . . . .2-11, 3-18 introduction to . . . . . . . . . . . . . . . . . . . .2-3, 2-5 limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-5 storing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-18 LE (local exact breakpoint enable) flag, DR7 register . . . . . . . . . . . . .15-5, 15-10 LEN0-LEN3 (Length) fields, DR7 register . . . 15-6 LES instruction . . . . . . . . . . . . . . . . 3-9, 4-10, 5-28 LFS instruction . . . . . . . . . . . . . . . . . . . . .3-9, 4-10 LGDT instruction. . . 2-20, 4-25, 7-12, 8-12, 18-25 LGS instruction . . . . . . . . . . . . . . . . . . . . .3-9, 4-10 LIDT instruction 2-20, 4-25, 5-13, 7-12, 8-10, 16-6, 18-28 Limit checking description of. . . . . . . . . . . . . . . . . . . . . . . . 4-5 pointer offsets are within limits . . . . . . . . . 4-28 Limit field, segment descriptor . . . . . . . . . .4-2, 4-5 Linear address description of. . . . . . . . . . . . . . . . . . . . . . . . 3-6 introduction to . . . . . . . . . . . . . . . . . . . . . . . 2-5 Linear address space . . . . . . . . . . . . . . . . . . . . 3-6 defined . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-1 of task . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-17 Link (to previous task) field, TSS . . . . . . . . . . 5-19 Linking tasks mechanism . . . . . . . . . . . . . . . . . . . . . . . . 6-14 modifying task linkages . . . . . . . . . . . . . . . 6-16 LINT pins function of . . . . . . . . . . . . . . . . . . . . . . . . . . 5-2 programming . . . . . . . . . . . . . . . . . . . . . . . . E-1 LLDT instruction . . . . . . . . . . . . . . 2-20, 4-25, 7-12 LMSW instruction . . . . . . . . . . . . . . . . . .2-20, 4-25 Local APIC APIC_BASE_MSR . . . . . . . . . . . . . . . . . . 7-19 APR (arbitration priority register). . . . . . . .
7-32 arbitration priority . . . . . . . . . . . . . . . . . . . 7-22 block diagram . . . . . . . . . . . . . . . . . . . . . . 7-16 bus arbitration . . . . . . . . . . . . . . . . . . . . . . 7-15 cluster model. . . . . . . . . . . . . . . . . . . . . . . 7-21 current-count register . . . . . . . . . . . . . . . . 7-44 description of. . . . . . . . . . . . . . . . . . . . . . . 7-13 DFR (destination format register) . . . . . . . 7-21 divide configuration register . . . . . . . . . . . 7-43 enabling or disabling . . . . . . . . . . . . . . . . . 7-19 EOI (end-of-interrupt register) . . . . . . . . . . 7-33 ESR (error status register) . . . . . . . . . . . . 7-42 external interrupts . . . . . . . . . . . . . . . . . . . . 5-2 flat model. . . . . . . . . . . . . . . . . . . . . . . . . . 7-21 focus processor. . . . . . . . . . . . . . . . . . . . . 7-22 ID. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-20 identifying BSP . . . . . . . . . . . . . . . . . . . . . 7-19 indicating performance-monitoring counter overflow . . . . . . . . . . . . . . . . . . . . . . . 15-19 initial-count register . . . . . . . . . . . . . . . . . . 7-44 initialization . . . . . . . . . . . . . . . . . . . . . . . . 7-35
interrupt acceptance . . . . . . . . . . . . . . . . . .7-30 interrupt acceptance decision flow chart. . .7-30 interrupt command register (ICR) . . . . . . . .7-25 interrupt destination . . . . . . . . . . . . . . . . . .7-20 interrupt distribution mechanism. . . . . . . . .7-22 interrupt sources . . . . . . . . . . . . . . . . . . . . .7-15 IRR (interrupt request register) . . . . . . . . . .7-30 ISR (in-service register) . . . . . . . . . . . . . . .7-30 LDR (logical destination register) . . . . . . . .7-20 local vector table (LVT). . . . . . . . . . . . . . . .7-23 logical destination mode . . . . . . . . . . . . . . .7-20 LVT (local-APIC version register) . . . . . . . .7-36 MDA (message destination address) . . . . .7-20 new features incorporated in the Pentium Pro processor. . . . . . . . . . . . . . . . . . . . . . . .7-45 physical destination mode . . . . . . . . . . . . .7-20 PPR (processor priority register) . . . . . . . .7-32 register address map . . . . . . . . . . . . . . . . .7-18 relationship of local APIC to I/O APIC . . . .7-14 relocating base address . . . . . . . . . . . . . . .7-19 serial bus . . . . . . . . . . . . . . . . . . . . . . . . . . .5-2 SMI interrupt . . . . . . . . . . . . . . . . . . . . . . . .12-2 software visible differences between the local APIC on a Pentium Pro processor and the 82489DX . . . . . . . . . . . . . . . . . . . . . . . .7-44 spurious interrupt . . . . . . . . . . . . . . . . . . . .7-33 state after a software (INIT) reset . . . . . . . .7-35 state after INIT-deassert message . . . . . . .7-35 state after power-up reset. . . . . . . . . . . . . .7-35 state of . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-33 SVR (spurious-interrupt vector register) . . .7-34 timer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-43 TMR (trigger mode register) . . . . . . . . . . . .7-30 TPR (task priority register) . . . . . . . . . . . . .7-31 valid interrupts . . . . . . . . . . . . . . . . . . . . . 
.7-15 Local APIC version register . . . . . . . . . . . . . . .7-36 Local descriptor table register (see LDTR) Local descriptor table (see LDT) Local vector table (LVT), local APIC . . . . . . . .7-23 LOCK prefix . 2-22, 5-28, 7-2, 7-3, 7-4, 7-9, 18-37 Locked (atomic) operations automatic bus locking . . . . . . . . . . . . . . . . . .7-3 bus locking . . . . . . . . . . . . . . . . . . . . . . . . . .7-3 effects of a locked operation on internal processor caches . . . . . . . . . . . . . . . . . .7-6 loading a segment descriptor . . . . . . . . . .18-24 on Intel Architecture processors . . . . . . . .18-37 overview of . . . . . . . . . . . . . . . . . . . . . . . . . .7-2 software-controlled bus locking . . . . . . . . . .7-4 LOCK# signal . . . . . . . . . . . 2-22, 7-2, 7-3, 7-4, 7-6 Logical address space, of task. . . . . . . . . . . . .6-18 Logical address, description of. . . . . . . . . . . . . .3-6 Logical destination mode, local APIC. . . . . . . .7-20 LSL instruction . . . . . . . . . . . . . . . . . . . . 2-20, 4-28 LSS instruction . . . . . . . . . . . . . . . . . . . . . 3-9, 4-10 LTR instruction . . . . . . . 2-20, 4-25, 6-8, 7-12, 8-13 LVT (local vector table), local APIC . . . . . . . . .7-23
M
Machine-check architecture availability of machine-check architecture and exception . . . . . . . . . . . . . . . . . . . . . . . 13-7 compatibility with Pentium processor implementation . . . . . . . . . . . . . . . . . . 13-1 error codes, compound . . . . . . . . . . . . . . . 13-9 error codes, interpreting . . . . . . . . . . . . . . 13-8 error codes, simple . . . . . . . . . . . . . . . . . . 13-9 error-reporting MSRs . . . . . . . . . . . . . . . . 13-4 first introduced. . . . . . . . . . . . . . . . . . . . . 18-27 global MSRs . . . . . . . . . . . . . . . . . . . . . . . 13-2 guidelines for writing machine-check software . . . . . . . . . . . . . . . . . . . . . . . 13-14 initialization of . . . . . . . . . . . . . . . . . . . . . . 13-7 introduction of in Intel Architecture processors . . . . . . . . . . . . . . . . . . . . . 18-39 logging correctable machine-check errors 13-16 machine-check error codes, external bus errors . . . . . . . . . . . . . . . . . . . . . . . . . 13-11 machine-check exception handler. . . . . . 13-14 MCG_CAP MSR . . . . . . . . . . . . . . . . . . . . 13-2 MCG_CTL MSR . . . . . . . . . . . . . . . . . . . . 13-4 MCi_ADDR MSRs. . . . . . . . . . . . . . . . . . . 13-6 MCi_CTL MSRs . . . . . . . . . . . . . . . . . . . . 13-4 MCi_MISC MSRs . . . . . . . . . . . . . . . . . . . 13-7 MCi_STATUS MSRs. . . . . . . . . . . . . . . . . 13-5 MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-2 overview . . . . . . . . . . . . . . . . . . . . . . . . . . 13-1 P5_MC_ADDR MSR . . . . . . . . . . . . . . . . . 13-7 P5_MC_TYPE MSR . . . . . . . . . . . . . . . . . 13-7 Pentium processor machine-check exception handling . . . . . . . . . . . . . . . . . . . . . . . 13-16 Pentium processor style error reporting . . 13-7 Machine-check exception (#MC) 5-52, 13-1, 13-7, 13-14, 18-26, 18-39 Maskable hardware interrupts delivered with local APIC . . . . . . . . . . . . . 7-23 description of. . . . . . . . . . . . . . . . . . . . . . . . 
5-2 handling with virtual interrupt mechanism 16-20 masking. . . . . . . . . . . . . . . . . . . . . . . . .2-8, 5-8 Masked responses to denormal operand exception. . . . . . . . 11-19 to FPU stack overflow or underflow exception . . . . . . . . . . . . . . . . . . . . . . 11-17 to inexact result (precision) exception. . . 11-21 to numeric overflow exception. . . . . . . . . 11-20 MCA (machine-check architecture) flag, CPUID instruction . . . . . . . . . . . . . . . . . . . . 13-7 MCE (machine-check enable) flag, CR4 control register . . . . . . . . . . . . . . . . .2-17, 18-22 MCE (machine-check exception) flag, CPUID instruction . . . . . . . . . . . . . . . . . . . . 13-7 MCG_CAP MSR. . . . . . . . . . . . . . . . . .13-2, 13-15 MCG_CTL MSR . . . . . . . . . . . . . . . . . . . . . . . 13-4 MCG_STATUS MSR . . . . . . . . . . . . .13-15, 13-17 MCi_ADDR MSRs . . . . . . . . . . . . . . . . . . . . 13-17 MCi_CTL MSRs . . . . . . . . . . . . . . . . . . . . . . . 13-4
MCi_MISC MSRs . . . . . . . . . . . . . . . . . 13-7, 13-17 MCi_STATUS MSRs . . . . . . . . 13-5, 13-15, 13-17 MDA (message destination address), local APIC. . . . . . . . . . . . . . . . . . . . . . . . .7-20 Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-1 Memory management introduction to . . . . . . . . . . . . . . . . . . . . . . . .2-5 overview . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-1 paging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-1 segmentation . . . . . . . . . . . . . . . . . . . . . . . .3-1 Memory ordering in Intel Architecture processors . . . . . . . .18-36 overview . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-6 processor ordering . . . . . . . . . . . . . . . . . . . .7-6 snooping mechanism . . . . . . . . . . . . . . . . . .7-8 write forwarding . . . . . . . . . . . . . . . . . . . . . .7-8 write ordering . . . . . . . . . . . . . . . . . . . . . . . .7-6 Memory type range registers (see MTRRs) Memory types caching methods, defined. . . . . . . . . . . . . . .9-5 choosing . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-8 MTRR types . . . . . . . . . . . . . . . . . . . . . . . .9-19 UC (uncacheable). . . . . . . . . . . . . . . . . . . . .9-5 WB (write back) . . . . . . . . . . . . . . . . . . . . . .9-6 WC (write combining) . . . . . . . . . . . . . . . . . .9-6 WP (write protected) . . . . . . . . . . . . . . . . . . .9-7 WT (write through) . . . . . . . . . . . . . . . . . . . .9-6 MemTypeGet() function . . . . . . . . . . . . . . . . . .9-28 MemTypeSet() function . . . . . . . . . . . . . . . . . .9-29 MESI cache protocol described . . . . . . . . . . . . . . . . . . . . . . . 9-4, 9-9 Mixing 16-bit and 32-bit code on Intel Architecture processors . . . . . . . .18-34 overview . . . . . . . . . . . . . . . . . . . . . . . . . . .17-1 MMX instructions pairing guidelines . . . . . . . . . . . . . . . . . . 
.14-17 Mode switching between real-address and protected mode 8-13 example . . . . . . . . . . . . . . . . . . . . . . . . . . .8-16 to SMM . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-2 Model and stepping information, following processor initialization or reset . . . . .8-5 Model-specific registers (see MSRs) MOV instruction . . . . . . . . . . . . . . . . . . . . 3-9, 4-10 MOV (control registers) instructions. . . 2-20, 4-25, 7-12, 8-14 MOV (debug registers) instructions . . . 2-21, 4-25, 7-12, 15-10 MP (monitor coprocessor) flag, CR0 control register 2-16, 5-30, 8-6, 8-8 MP (monitor coprocessor) flag, CR0 register. .18-8 MSRs description of . . . . . . . . . . . . . . . . . . . . . . . .8-8 introduction of in Intel Architecture processors 18-38 introduction to . . . . . . . . . . . . . . . . . . . . . . . .2-5 machine-check architecture . . . . . . . . . . . .13-2 reading and writing . . . . . . . . . . . . . . . . . . .2-23
MTRR flag, EDX feature information register . 9-20 MTRRcap register . . . . . . . . . . . . . . . . . . . . . 9-20 MTRRdefType register . . . . . . . . . . . . . . . . . . 9-21 MTRRfix16K_80000 and MTRRfix16K_A0000 (fixed range) MTRRs . . . . . . . . . . . 9-23 MTRRfix4K_C0000 and MTRRfix4K_F8000 (fixed range) MTRRs . . . . . . . . . . . . . . . . 9-23 MTRRfix64K_00000 (fixed range) MTRR. . . . 9-22 MTRRphysBasen (variable range) MTRRs . . 9-23 MTRRphysMaskn (variable range) MTRRs . . 9-23 MTRRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-9 address mapping for fixed-range MTRRs . 9-23 cache control. . . . . . . . . . . . . . . . . . . . . . . 9-12 description of. . . . . . . . . . . . . . . . . . . .8-9, 9-18 enabling caching . . . . . . . . . . . . . . . . . . . . . 8-8 example of base and mask calculations . . 9-25 feature identification . . . . . . . . . . . . . . . . . 9-20 fixed-range registers . . . . . . . . . . . . . . . . . 9-22 initialization of . . . . . . . . . . . . . . . . . . . . . . 9-27 introduction of in Intel Architecture processors . . . . . . . . . . . . . . . . . . . . 18-39 large page size considerations . . . . . . . . . 9-32 mapping physical memory with . . . . . . . . . 9-20 memory types and their properties . . . . . . 9-19 MemTypeGet() function . . . . . . . . . . . . . . 9-28 MemTypeSet() function. . . . . . . . . . . . . . . 9-29 MTRRcap register . . . . . . . . . . . . . . . . . . . 9-20 MTRRdefType register . . . . . . . . . . . . . . . 9-21 multiple-processor considerations. . . . . . . 9-31 precedence of cache controls . . . . . . . . . . 9-13 precedences . . . . . . . . . . . . . . . . . . . . . . . 9-26 programming interface . . . . . . . . . . . . . . . 9-28 remapping memory types . . . . . . . . . . . . . 9-27 setting memory ranges . . . . . . . . . . . . . . . 9-21 state of following a hardware reset . . . . . . 9-18 variable-range registers . . . . . . . . . . . . . . 
9-23 Multiple-processor initialization MP protocol . . . . . . . . . . . . . . . . . . . .7-45, 7-46 procedure . . . . . . . . . . . . . . . . . . . . . . . . . 7-48 Multiple-processor management bus locking . . . . . . . . . . . . . . . . . . . . . . . . . 7-3 guaranteed atomic operations. . . . . . . . . . . 7-2 interprocessor and self-interrupts . . . . . . . 7-25 local APIC . . . . . . . . . . . . . . . . . . . . . . . . . 7-13 memory ordering . . . . . . . . . . . . . . . . . . . . . 7-6 MP protocol . . . . . . . . . . . . . . . . . . . .7-45, 7-46 overview of . . . . . . . . . . . . . . . . . . . . . . . . . 7-1 SMM considerations . . . . . . . . . . . . . . . . 12-17 Multiple-processor system MP protocol . . . . . . . . . . . . . . . . . . . .7-45, 7-46 relationship of local and I/O APICs . . . . . . 7-14 Multisegment model . . . . . . . . . . . . . . . . . . . . . 3-5 Multitasking initialization for . . . . . . . . . . . . . . . . . . . . . 8-13 linking tasks. . . . . . . . . . . . . . . . . . . . . . . . 6-14 mechanism, description of . . . . . . . . . . . . . 6-3 overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-1 setting up TSS. . . . . . . . . . . . . . . . . . . . . . 8-13
setting up TSS descriptor . . . . . . . . . . . . . .8-13
N
NaN compatibility, Intel Architecture processors . . . 18-10 NE (numeric error) flag, CR0 control register. 2-14, 5-48, 8-6, 8-8, 18-22 NE (numeric error) flag, CR0 register . . . . . . .18-8 NEG instruction . . . . . . . . . . . . . . . . . . . . . . . . .7-4 NMI interrupt . . . . . . . . . . . . . . . . . . . . . 2-22, 7-13 description of . . . . . . . . . . . . . . . . . . . . . . . .5-2 handling during initialization . . . . . . . . . . . .8-10 handling in SMM . . . . . . . . . . . . . . . . . . . .12-10 handling multiple NMIs . . . . . . . . . . . . . . . . .5-8 masking . . . . . . . . . . . . . . . . . . . . . . . . . .18-28 receiving when processor is shutdown . . . .5-33 reference information . . . . . . . . . . . . . . . . .5-24 vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-4 NMI# pin. . . . . . . . . . . . . . . . . . . . . . . . . . 5-2, 5-24 Nonconforming code segments accessing . . . . . . . . . . . . . . . . . . . . . . . . . .4-14 C (conforming) flag . . . . . . . . . . . . . . . . . . .4-13 description of . . . . . . . . . . . . . . . . . . . . . . .3-14 Nonmaskable interrupt (see NMI) NOT instruction . . . . . . . . . . . . . . . . . . . . . . . . .7-4 Notation bit and byte order . . . . . . . . . . . . . . . . . . . . .1-6 exceptions. . . . . . . . . . . . . . . . . . . . . . . . . . .1-8 hexadecimal and binary numbers. . . . . . . . .1-7 instruction operands . . . . . . . . . . . . . . . . . . .1-7 reserved bits . . . . . . . . . . . . . . . . . . . . . . . . .1-6 segmented addressing . . . . . . . . . . . . . . . . .1-7 Notational conventions. . . . . . . . . . . . . . . . . . . .1-5 NT (nested task) flag, EFLAGS register. 2-9, 6-10, 6-12, 6-14 Null segment selector, checking for . . . . . . . . . .4-7 Numeric overflow exception (#O). . . . 11-19, 18-11 Numeric underflow exception (#U). . . 11-20, 18-12 NV (invert) flag, PerfEvtSel0 MSR (P6 family processors) 15-17 NW (not writethrough) flag, CR0 control register . . . . . . . . . 
2-13, 8-8, 9-11, 9-12, 9-14, 9-31, 9-32 NW (not write-through) flag, CR0 control register . . . . . . . . . . 18-22, 18-23, 18-30
Operands operand-size prefix . . . . . . . . . . . . . . . . . . 17-2 OR instruction. . . . . . . . . . . . . . . . . . . . . . . . . . 7-4 OS (operating system mode) flag, PerfEvtSel0 and PerfEvtSel1 MSRs (P6 family processors). . . . . . . . . . . . . . . . . . 15-16 OUT instruction. . . . . . . . . . . . . . . . . . . . . . . . 7-10 OUTS instruction . . . . . . . . . . . . . . . . . . . . . 15-10 Overflow exception (#OF). . . . . . . . . . . . . . . . 5-26 Overflow, FPU stack. . . . . . . . . . . . . . . . . . . 11-17
P
P (present) flag page-directory entry . . . . . . . . . . . . . . . . . 5-44 page-table entry . . . . . . . . . . . . . . . .3-25, 5-44 P (segment-present) flag, segment descriptor 3-12 P5_MC_ADDR MSR . . . . . . . . . . . . . .13-7, 13-16 P5_MC_TYPE MSR . . . . . . . . . . . . . . .13-7, 13-16 P6 family processors description of. . . . . . . . . . . . . . . . . . . . . . . . 1-1 list of events counted with performance-monitoring counters . . . . . A-1 PAE (physical address extension) flag, CR4 control register . 2-17, 3-19, 3-29, 18-21, 18-23 Page base address field, page-table entry . . . 3-25 Page directory base address. . . . . . . . . . . . . . . . . . . . . . . 3-23 base address (PDBR) . . . . . . . . . . . . . . . . . 6-6 description of. . . . . . . . . . . . . . . . . . . . . . . 3-20 introduction to . . . . . . . . . . . . . . . . . . . . . . . 2-5 overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-2 setting up during initialization . . . . . . . . . . 8-13 Page frame (see Page) Page tables description of. . . . . . . . . . . . . . . . . . . . . . . 3-20 introduction to . . . . . . . . . . . . . . . . . . . . . . . 2-5 overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-2 setting up during initialization . . . . . . . . . . 8-13 Page-directory entries automatic bus locking while updating . . . . . 7-4 caching in TLBs. . . . . . . . . . . . . . . . . . . . . . 9-4 page-table base address field . . . . . . . . . . 3-25 R/W (read/write) flag . . . . . . . . . . 4-2, 4-3, 4-32 structure of . . . . . . . . . . . . . . . . . . . . . . . . 3-23 U/S (user/supervisor) flag . . . . . . 4-2, 4-3, 4-31 Page-directory-pointer (PDPTR) table . . . . . . 3-30 Page-fault exception (#PF). . . . . 3-18, 5-44, 18-26 Pages description of. . . . . . . . . . . . . . . . . . . . . . . 3-20 disabling protection of . . . . . . . . . . . . . . . . . 4-2 enabling protection of . . . . . . . . . . . . . . . . . 4-2 introduction to . . . . . . . . . . . . . 
. . . . . . . . . . 2-5 overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-2 PG flag, CR0 control register . . . . . . . . . . . 4-2 Pages, split . . . . . . . . . . . . . . . . . . . . . . . . . . 18-18 Page-table base address field, page-directory entry . . . . . . . . . . . . . . . . . . . . . . . . 3-25
O
Obsolete instructions . . . . . . . . . . . . . . 18-5, 18-18 OE (numeric overflow exception) flag, FPU status word . . . . . . . . . . . . . . . . . . 11-18, 11-19 OF flag, EFLAGS register . . . . . . . . . . . . . . . .5-26 Opcodes undefined . . . . . . . . . . . . . . . . . . . . . . . . . .18-6 Operand instruction . . . . . . . . . . . . . . . . . . . . . . . . . . .1-7
Page-table entries automatic bus locking while updating . . . . . .7-4 caching in TLBs . . . . . . . . . . . . . . . . . . . . . .9-4 effect of implicit caching on. . . . . . . . . . . . .9-16 page base address field . . . . . . . . . . . . . . .3-25 R/W (read/write) flag. . . . . . . . . . 4-2, 4-3, 4-32 structure of . . . . . . . . . . . . . . . . . . . . . . . . .3-23 U/S (user/supervisor) flag . . . . . . 4-2, 4-3, 4-31 Paging combining segment and page-level protection. . . . . . . . . . . . . . . . . . . . . . . .4-33 combining with segmentation . . . . . . . . . . . .3-6 defined . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-1 initializing . . . . . . . . . . . . . . . . . . . . . . . . . .8-12 introduction to . . . . . . . . . . . . . . . . . . . . . . . .2-5 large page size MTRR considerations . . . .9-32 linear address translation (4-KByte pages).3-20 linear address translation (4-MByte pages) 3-21 mapping segments to pages. . . . . . . . . . . .3-39 mixing 4-KByte and 4-MByte pages . . . . . .3-22 page boundaries regarding TSS. . . . . . . . . .6-6 page-fault exception . . . . . . . . . . . . . . . . . .5-44 page-level protection . . . . . . . . . . . . . 4-2, 4-30 page-level protection flags . . . . . . . . . . . . .4-31 virtual-8086 tasks . . . . . . . . . . . . . . . . . . .16-10 Parameter passing, between 16- and 32-bit call gates 17-7 translation, between 16- and 32-bit code segments. . . . . . . . . . . . . . . . . . . . . . . .17-8 PBi (performance monitoring/breakpoint pins) flags, DebugCtlMSR register . . . . . . . . . .15-12 PC (pin control) flag, PerfEvtSel0 and PerfEvtSel1 MSRs (P6 family processors) . . . .15-17 PC0 and PC1 (pin control) fields, CESR MSR (Pentium processor). . . . . . . . . . . .15-21 PCD (page-level cache disable) flag CR3 control register . 2-16, 9-12, 18-22, 18-31 page-directory entries . . . 8-8, 9-12, 9-13, 9-32 page-table entries . 
3-26, 8-8, 9-12, 9-13, 9-32, 18-32 PCE (performance-monitoring counter enable) flag, CR4 control register . . 2-18, 4-25, 18-21 PCE (performance-monitoring counter enable) flag, CR4 control register (P6 family processors) . . . . . . . . . . . . . . . . . .15-18 PDBR (see CR3 control register) PE (inexact result exception) flag, FPU status word . . . . . . . . . . . . . . . . . . . 11-4, 11-21 PE (protection enable) flag, CR0 control register . . . . . 2-16, 4-2, 8-13, 8-14, 12-8 Pentium Pro processor. . . . . . . . . . . . . . . . . . . .1-1 Pentium processors . . . . . . . . . . . . . . . . . . . . .18-7 list of events counted with performance-monitoring counters . . . . A-12 performance-monitoring counters. . . . . . .15-20 PerfCtr0 and PerfCtr1 MSRs (P6 family processors) . . . . . . . . . . . . . . . . . .15-16
PerfCtr0 MSR and PerfCtr1 MSRs (P6 family processors). . . . . . . . . . . . . . . . . . 15-18 PerfEvtSel0 and PerfEvtSel1 MSRs (P6 family processors). . . . . . . . . . . . . . . . . . 15-16 Performance-monitoring counters description of. . . . . . . . . . . . . . . . . . . . . . 15-15 events that can be counted (P6 family processors) . . . . . . . . . . . . . . . . . . . . . . A-1 events that can be counted (Pentium processors) . . . . . . . . . . . . . . . 15-22, A-12 introduction of in Intel Architecture processors . . . . . . . . . . . . . . . . . . . . . 18-40 monitoring counter overflow (P6 family processors) . . . . . . . . . . . . . . . . . . . . 15-19 overflow, monitoring (P6 family processors) . . . . . . . . . . . . . . . . . . . . 15-19 overview of . . . . . . . . . . . . . . . . . . . . . . . . . 2-6 P6 family processors. . . . . . . . . . . . . . . . 15-15 Pentium II processor . . . . . . . . . . . . . . . . 15-15 Pentium Pro processor . . . . . . . . . . . . . . 15-15 Pentium processor . . . . . . . . . . . . . . . . . 15-20 reading . . . . . . . . . . . . . . . . . . . . . .2-22, 15-18 setting up (P6 family processors) . . . . . . 15-16 software drivers for . . . . . . . . . . . . . . . . . 15-18 starting and stopping. . . . . . . . . . . . . . . . 15-18 Performance-monitoring events list of events . . . . . . . . . . . . . . . . . . . . . . . . A-1 PG (paging) flag, CR0 control register . 2-13, 3-19, 3-26, 4-2, 8-13, 8-14, 12-8, 18-32 PGE (page global enable) flag, CR4 control register . . . . . . 2-17, 3-27, 18-21, 18-23 PhysBase field, MTRRphysBasen register. . . 9-24 Physical address extension access full extended physical address space . . . . . . . . . . . . . . . . . . . . . . . . . . 3-32 description of. . . . . . . . . . . . . . . . . . . . . . . 3-29 page-directory entries . . . . . . . . . . . . . . . . 3-33 page-table entries . . . . . . . . . . . . . . . . . . . 3-33 Physical address space defined . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . 3-1 description of. . . . . . . . . . . . . . . . . . . . . . . . 3-6 mapped to a task. . . . . . . . . . . . . . . . . . . . 6-17 Physical addressing . . . . . . . . . . . . . . . . . . . . . 2-5 Physical destination mode, local APIC . . . . . . 7-20 Physical memory mapping of with fixed-range MTRRs . . . . . 9-23 mapping of with variable-range MTRRs . . 9-23 PhysMask, MTRRphysMaskn register . . . . . . 9-24 PM0/BP0 and PM1/BP1 (performance-monitor) pins (Pentium processor) . 15-20, 15-21, 15-22 Pointers code-segment pointer size . . . . . . . . . . . . 17-5 limit checking. . . . . . . . . . . . . . . . . . . . . . . 4-28 validation . . . . . . . . . . . . . . . . . . . . . . . . . . 4-25 POP instruction. . . . . . . . . . . . . . . . . . . . . . . . . 3-9 POPF instruction . . . . . . . . . . . . . . . . . .5-9, 15-10 PPR (processor priority register), local APIC . 7-32
Previous task link field, TSS. . . . . . 6-4, 6-14, 6-16 Priority levels, APIC interrupts . . . . . . . . . . . . .7-15 Privilege levels checking when accessing data segments . .4-9 checking, for call gates . . . . . . . . . . . . . . . .4-17 checking, when transferring program control between code segments . . . . . . . . . . . .4-12 description of . . . . . . . . . . . . . . . . . . . . . . . .4-8 protection rings . . . . . . . . . . . . . . . . . . . . . . .4-9 Privileged instructions . . . . . . . . . . . . . . . . . . .4-25 Processor identification earlier Intel architecture processors . . . . . .9-33 Processor management initialization . . . . . . . . . . . . . . . . . . . . . . . . . .8-1 local APIC . . . . . . . . . . . . . . . . . . . . . . . . . .7-13 overview of . . . . . . . . . . . . . . . . . . . . . . . . . .7-1 snooping mechanism . . . . . . . . . . . . . . . . . .7-8 processor number . . . . . . . . . . . . . . . . . . B-4, B-9 Processor ordering, description of . . . . . . . . . . .7-7 Protected mode IDT initialization . . . . . . . . . . . . . . . . . . . . .8-12 initialization for . . . . . . . . . . . . . . . . . . . . . .8-11 mixing 16-bit and 32-bit code modules . . . .17-2 mode switching . . . . . . . . . . . . . . . . . . . . . .8-13 PE flag, CR0 register . . . . . . . . . . . . . . . . . .4-2 switching to . . . . . . . . . . . . . . . . . . . . . 4-2, 8-14 system data structures required during initialization . . . . . . . . . . . . . . . . . 8-11, 8-12 Protection combining segment and page-level protection. . . . . . . . . . . . . . . . . . . . . . . .4-33 disabling . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-2 enabling . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-2 flags used for page-level protection . . . . . . .4-2 flags used for segment-level protection . . . .4-2 of exception- and interrupt-handler procedures 5-17 overview of . . . . . . . . . . . . . . . . . . . . . . . . . .4-1 page level . . . . . . . . . 
. . . . . . . . . . . . . 4-2, 4-32 page level, overriding . . . . . . . . . . . . . . . . .4-32 page level, overview . . . . . . . . . . . . . . . . . .4-30 page-level protection flags . . . . . . . . . . . . .4-31 read/write, page level . . . . . . . . . . . . . . . . .4-32 segment level . . . . . . . . . . . . . . . . . . . . . . . .4-2 user/supervisor type . . . . . . . . . . . . . . . . . .4-31 Protection rings . . . . . . . . . . . . . . . . . . . . . . . . .4-9 PS (page size) flag, page-table entry. . . . . . . .3-27 PSE (page size extension) flag, CR4 control register . . . 2-17, 3-19, 3-21, 3-22, 9-17, 18-22, 18-23 Pseudo-infinity . . . . . . . . . . . . . . . . . . . . . . . .18-10 Pseudo-NaN. . . . . . . . . . . . . . . . . . . . . . . . . .18-10 Pseudo-zero. . . . . . . . . . . . . . . . . . . . . . . . . .18-10 PUSH instruction . . . . . . . . . . . . . . . . . . . . . . .18-7 PUSHF instruction . . . . . . . . . . . . . . . . . . 5-9, 18-7 PVI (protected-mode virtual interrupts) flag, CR4 control register . . . . . . . . . . . 2-17 , 18-22 PWT (page-level write-through) flag
CR3 control register . 2-16, 9-12, 18-22, 18-31 page-directory entries . . . . . . . . 8-8, 9-12, 9-32 page-table entries . . . . . 8-8, 9-12, 9-32, 18-32 page-table entry . . . . . . . . . . . . . . . . . . . . 3-26
Q
QNaN compatibility, Intel Architecture processors . . . . . . . . . . . . . . . . . . . . . 18-10
R
RC (rounding control) field, FPU control word . . . . . . . . . . . . . . . . . . . .11-3, 11-4 RDMSR instruction2-23, 4-25, 9-20, 15-13, 15-15, 15-16, 15-18, 15-20, 18-4, 18-38 RDPMC instruction2-22, 4-25, 15-16, 15-18, 18-3, 18-21, 18-40 RDTSC instruction . . . . . . 2-22, 4-25, 15-15, 18-4 Read/write protection, page level . . . . . . . . . . . . . . . . 4-32 rights, checking . . . . . . . . . . . . . . . . . . . . . 4-27 Real-address mode 8086 emulation . . . . . . . . . . . . . . . . . . . . . 16-1 address translation in . . . . . . . . . . . . . . . . 16-3 description of. . . . . . . . . . . . . . . . . . . . . . . 16-1 exceptions and interrupts . . . . . . . . . . . . . 16-8 IDT initialization. . . . . . . . . . . . . . . . . . . . . 8-10 IDT, changing base and limit of. . . . . . . . . 16-6 IDT, structure of . . . . . . . . . . . . . . . . . . . . 16-7 IDT, use of. . . . . . . . . . . . . . . . . . . . . . . . . 16-6 initialization . . . . . . . . . . . . . . . . . . . . . . . . 8-10 instructions supported . . . . . . . . . . . . . . . . 16-4 interrupt and exception handling . . . . . . . . 16-6 mode switching . . . . . . . . . . . . . . . . . . . . . 8-13 native 16-bit mode. . . . . . . . . . . . . . . . . . . 17-1 overview of . . . . . . . . . . . . . . . . . . . . . . . . 16-1 registers supported . . . . . . . . . . . . . . . . . . 16-4 switching to . . . . . . . . . . . . . . . . . . . . . . . . 8-15 Related literature . . . . . . . . . . . . . . . . . . . . . . . 1-9 Requested privilege level (see RPL) Reserved bits . . . . . . . . . . . . . . . . . . . . . .1-6, 18-1 RESET# pin . . . . . . . . . . . . . . . . . . . . . .5-2, 18-19 RESET# signal . . . . . . . . . . . . . . . . . . . . . . . . 2-22 Reset, hardware receiving when processor is shutdown . . . 5-33 Restarting program or task, following an exception or interrupt . . . . . . . . . . . . . . . . . . . . 5-7 Restricting addressable domain . . . . . . . . . . . 4-31 RET instruction . . . . . . . 
. . . 4-12, 4-13, 4-23, 17-7 Returning from a called procedure . . . . . . . . . . . . . . 4-23 from an interrupt or exception handler . . . 5-15 RF (resume) flag, EFLAGS register . 2-9, 5-9, 15-2 Rounding control, RC field of FPU control word . . . . 11-3 modes, FPU . . . . . . . . . . . . . . . . . . .11-3, 11-4
results, FPU . . . . . . . . . . . . . . . . . . . . . . . .11-5 RPL description of . . . . . . . . . . . . . . . . . . . . 3-8, 4-9 field, segment selector . . . . . . . . . . . . . . . . .4-2 RSM instruction . . . . 2-22, 7-12, 12-1, 12-2, 12-3, 12-11, 12-16, 18-5 R/S# pin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-2 R/W (read/write) flag page-directory entry . . . . . . . . . . 4-2, 4-3, 4-32 page-table entry . . . . . . . . 3-26, 4-2, 4-3, 4-32 R/W0-R/W3 (read/write) fields, DR7 register . 15-6, 18-24
S
S (descriptor type) flag, segment descriptor . 3-11, 3-13, 4-2, 4-6 SBB instruction. . . . . . . . . . . . . . . . . . . . . . . . . .7-4 Segment descriptors access rights. . . . . . . . . . . . . . . . . . . . . . . .4-26 access rights, invalid values . . . . . . . . . . .18-24 automatic bus locking while updating . . . . . .7-3 base address fields. . . . . . . . . . . . . . . . . . .3-11 code type . . . . . . . . . . . . . . . . . . . . . . . . . . .4-3 data type . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-3 description of . . . . . . . . . . . . . . . . . . . . 2-3, 3-9 DPL (descriptor privilege level) field . . 3-12, 4-2 D/B (default operation size/default stack pointer size and/or upper bound) flag . . . . 3-12, 4-5 E (expansion direction) flag . . . . . . . . . 4-2, 4-5 G (granularity) flag . . . . . . . . . . . 3-12, 4-2, 4-5 limit field . . . . . . . . . . . . . . . . . . . . . . . . 4-2, 4-5 loading . . . . . . . . . . . . . . . . . . . . . . . . . . .18-24 P (segment-present) flag . . . . . . . . . . . . . .3-12 S (descriptor type) flag . . . 3-11, 3-13, 4-2, 4-6 segment limit field . . . . . . . . . . . . . . . . . . . .3-10 system type. . . . . . . . . . . . . . . . . . . . . . . . . .4-3 tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-16 TSS descriptor . . . . . . . . . . . . . . . . . . . . . . .6-6 type field . . . . . . . . . . . . . . 3-11, 3-13, 4-2, 4-6 type field, encoding. . . . . . . . . . . . . . 3-14, 3-15 when P (segment-present) flag is clear . . .3-13 Segment limit checking . . . . . . . . . . . . . . . . . . . . . . . . . . .2-20 field, segment descriptor. . . . . . . . . . . . . . .3-10 Segment not present exception (#NP) . . . . . . .3-12 Segment registers description of . . . . . . . . . . . . . . . . . . . . . . . .3-8 saved in TSS . . . . . . . . . . . . . . . . . . . . . . . .6-4 Segment selectors description of . . . . . . . . . . . . . . . . . . . . . . . .3-7 index field . . . . . . . . . . . . . . 
. . . . . . . . . . . . .3-7 null . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-7 RPL field . . . . . . . . . . . . . . . . . . . . . . . . 3-8, 4-2 TI (table indicator) flag . . . . . . . . . . . . . . . . .3-8 Segmented addressing . . . . . . . . . . . . . . . . . . .1-7 Segment-not-present exception (#NP). . . . . . .5-37 Segments
basic flat model . . . . . . . . . . . . . . . . . . . . . . 3-3 code type. . . . . . . . . . . . . . . . . . . . . . . . . . 3-13 combining segment and page-level protection . . . . . . . . . . . . . . . . . . . . . . . 4-33 combining with paging. . . . . . . . . . . . . . . . . 3-6 data type . . . . . . . . . . . . . . . . . . . . . . . . . . 3-13 defined . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-1 disabling protection of . . . . . . . . . . . . . . . . . 4-2 enabling protection of . . . . . . . . . . . . . . . . . 4-2 mapping to pages . . . . . . . . . . . . . . . . . . . 3-39 multisegment usage model . . . . . . . . . . . . . 3-5 protected flat model. . . . . . . . . . . . . . . . . . . 3-4 segment-level protection . . . . . . . . . . . . . . . 4-2 segment-not-present exception. . . . . . . . . 5-37 system. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3 types, checking access rights . . . . . . . . . . 4-26 typing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-6 using . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-3 wraparound . . . . . . . . . . . . . . . . . . . . . . . 18-35 Self-interrupts, local APIC . . . . . . . . . . . . . . . 7-25 Self-modifying code, effect on caches . . . . . . 9-15 Serializing instructions . . . . . . . . . . . . .7-11 , 18-19 SF (stack fault) flag, FPU status word . . . . . . 18-9 SGDT instruction . . . . . . . . . . . . . . . . . .2-20, 3-18 Shutdown resulting from double fault. . . . . . . . . . . . . 5-33 resulting from out of IDT limit condition. . . 5-33 SIDT instruction . . . . . . . . . . . . . . 2-20, 3-18, 5-13 Single-stepping breakpoint exception condition . . . . . . . . 15-10 on branches . . . . . . . . . . . . . . . . . . . . . . 15-14 on exceptions . . . . . . . . . . . . . . . . . . . . . 15-14 on interrupts . . . . . . . . . . . . . . . . . . . . . . 15-14 TF (trap) flag, EFLAGS register . . . . . . . 15-10 SLDT instruction . . . . . . . . . . . . . . . . . . 
. . . . . 2-20 SLTR instruction . . . . . . . . . . . . . . . . . . . . . . . 3-18 SMBASE default value . . . . . . . . . . . . . . . . . . . . . . . 12-4 relocation of. . . . . . . . . . . . . . . . . . . . . . . 12-14 SMI handler description of. . . . . . . . . . . . . . . . . . . . . . . 12-1 execution environment for. . . . . . . . . . . . . 12-8 exiting from . . . . . . . . . . . . . . . . . . . . . . . . 12-3 location in SMRAM . . . . . . . . . . . . . . . . . . 12-4 SMI interrupt . . . . . . . . . . . . . . . . . . . . . .2-22, 7-13 description of. . . . . . . . . . . . . . . . . . .12-1, 12-2 priority . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-2 switching to SMM . . . . . . . . . . . . . . . . . . . 12-2 SMI# pin . . . . . . . . . . . . . . . . . . . . 5-2, 12-2, 12-15 SMM auto halt restart . . . . . . . . . . . . . . . . . . . . 12-13 executing the HLT instruction in . . . . . . . 12-14 exiting from . . . . . . . . . . . . . . . . . . . . . . . . 12-3 handling exceptions and interrupts . . . . . 12-10 I/O instruction restart. . . . . . . . . . . . . . . . 12-15 native 16-bit mode. . . . . . . . . . . . . . . . . . . 17-1 overview of . . . . . . . . . . . . . . . . . . . . . . . . 12-1 revision identifier . . . . . . . . . . . . . . . . . . . 12-12
revision identifier field . . . . . . . . . . . . . . . .12-12 switching to . . . . . . . . . . . . . . . . . . . . . . . . .12-2 switching to from other operating modes . .12-2 using FPU in . . . . . . . . . . . . . . . . . . . . . . .12-11 SMRAM caching . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-7 description of . . . . . . . . . . . . . . . . . . . . . . .12-1 state save map . . . . . . . . . . . . . . . . . . . . . .12-5 structure of . . . . . . . . . . . . . . . . . . . . . . . . .12-4 SMSW instruction. . . . . . . . . . . . . . . . . . . . . . .2-20 SNaN compatibility, Intel Architecture processors. . . . . . . . . . . . . . . . 18-10, 18-17 Snooping mechanism. . . . . . . . . . . . . . . . . 7-8, 9-5 Software interrupts . . . . . . . . . . . . . . . . . . . . . . .5-3 Software-controlled bus locking . . . . . . . . . . . . .7-4 Split pages . . . . . . . . . . . . . . . . . . . . . . . . . . .18-18 Spurious interrupt, local APIC . . . . . . . . . . . . .7-33 SS register, saving on call to exception or interrupt handler . . . . . . . . . . . . . . . . . . . . . . .5-15 Stack fault exception (#SS) . . . . . . . . . . . . . . .5-39 Stack fault, FPU . . . . . . . . . . . . . . . . . . 18-9, 18-16 Stack overflow exception, FPU . . . . . . . . . . .11-17 Stack pointers privilege level 0, 1, and 2 stacks. . . . . . . . . .6-6 size of . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-12 Stack segments privilege level checks when loading the SS register . . . . . . . . . . . . . . . . . . . . . . . . .4-12 size of stack pointer . . . . . . . . . . . . . . . . . .3-12 Stack switching inter-privilege level calls . . . . . . . . . . . . . . .4-21 masking exceptions and interrupts when switching stacks . . . . . . . . . . . . . . . . . .5-10 on call to exception or interrupt handler . . .5-15 Stack underflow exception, FPU . . . . . . . . . .11-17 Stack-fault exception (#SS) . . . . . . . . . . . . . .18-35 Stacks error code pushes. . . . . . . . . 
. . . . . . . . . .18-33 faults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-39 for privilege levels 0, 1, and 2 . . . . . . . . . . .4-21 interlevel RET/IRET from a 16-bit interrupt or call gate . . . . . . . . . . . . . . . . . . . . . . . .18-34 management of control transfers for 16- and 32-bit procedure calls . . . . . . . . . . . . . .17-5 operation on pushes and pops . . . . . . . . .18-33 pointers to in TSS . . . . . . . . . . . . . . . . . . . . .6-6 stack switching . . . . . . . . . . . . . . . . . . . . . .4-21 usage on call to exception or interrupt handler . . . . . . . . . . . . . . . . . . . . . . . .18-33 Stepping information, following processor initialization or reset . . . . . . . . . . . . . .8-5 STI instruction . . . . . . . . . . . . . . . . . . . . . . . . . .5-9 STPCLK# pin . . . . . . . . . . . . . . . . . . . . . 5-2, 15-15 STR instruction. . . . . . . . . . . . . . . . . . . . . 3-18, 6-8 STRT instruction . . . . . . . . . . . . . . . . . . . . . . .2-20 SUB instruction . . . . . . . . . . . . . . . . . . . . . . . . .7-4 Supervisor mode
description of. . . . . . . . . . . . . . . . . . . . . . . 4-31 U/S (user/supervisor) flag . . . . . . . . . . . . . 4-31 SVR (spurious-interrupt vector register), local APIC . . . . . . . . . . . . . . . . . . . . . . . . 7-34 System architecture . . . . . . . . . . . . . . . . . . . . . . . . . 2-1 instructions . . . . . . . . . . . . . . . . . . . . .2-6, 2-18 registers, introduction to . . . . . . . . . . . . . . . 2-5 segment descriptor, layout of . . . . . . . . . . . 4-3 System-management mode (see SMM)
T
T (debug trap) flag, TSS . . . . . . . . . . . . . .6-6, 15-2 Task gates descriptor . . . . . . . . . . . . . . . . . . . . . . . . . . 6-8 executing a task . . . . . . . . . . . . . . . . . . . . . 6-3 handling a virtual-8086 mode interrupt or exception through . . . . . . . . . . . . . . . 16-20 in IDT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-13 introduction to . . . . . . . . . . . . . . . . . . . .2-3, 2-4 layout of. . . . . . . . . . . . . . . . . . . . . . . . . . . 5-13 referencing of TSS descriptor . . . . . . . . . . 5-19 Task management . . . . . . . . . . . . . . . . . . . . . . 6-1 data structures . . . . . . . . . . . . . . . . . . . . . . 6-4 mechanism, description of . . . . . . . . . . . . . 6-3 Task register. . . . . . . . . . . . . . . . . . . . . . . . . . 3-18 description of. . . . . . . . . . . . . . . . 2-11, 6-1, 6-8 initializing. . . . . . . . . . . . . . . . . . . . . . . . . . 8-13 introduction to . . . . . . . . . . . . . . . . . . . . . . . 2-5 Task switching description of. . . . . . . . . . . . . . . . . . . . . . . . 6-3 exception condition . . . . . . . . . . . . . . . . . 15-11 operation . . . . . . . . . . . . . . . . . . . . . . . . . . 6-10 preventing recursive task switching . . . . . 6-16 T (debug trap) flag. . . . . . . . . . . . . . . . . . . . 6-6 Tasks address space. . . . . . . . . . . . . . . . . . . . . . 6-17 description of. . . . . . . . . . . . . . . . . . . . . . . . 6-1 exception-handler task . . . . . . . . . . . . . . . 5-15 executing. . . . . . . . . . . . . . . . . . . . . . . . . . . 6-3 Intel 286 processor tasks . . . . . . . . . . . . 18-37 interrupt-handler task . . . . . . . . . . . . . . . . 5-15 interrupts and exceptions . . . . . . . . . . . . . 5-18 linking . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-14 logical address space . . . . . . . . . . . . . . . . 6-18 management . . . . . . . . . . . . . . . . . . . . . . . . 6-1 mapping to linear and physical address spaces . . 
. . . . . . . . . . . . . . . . . . . . . . . 6-17 restart following an exception or interrupt . . 5-7 state (context) . . . . . . . . . . . . . . . . . . . .6-2, 6-3 structure . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-1 switching . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-3 task management data structures. . . . . . . . 6-4 Task-state segment (see TSS) Test registers . . . . . . . . . . . . . . . . . . . . . . . . 18-25 TF (trap) flag, EFLAGS register . 2-8, 5-18, 12-10, 15-2, 15-10, 15-12, 15-14, 16-6, 16-26
TI (table indicator) flag, segment selector . . . . .3-8 Timer, local APIC . . . . . . . . . . . . . . . . . . . . . . .7-43 Time-stamp counter description of . . . . . . . . . . . . . . . . . . . . . .15-14 reading . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-22 software drivers for . . . . . . . . . . . . . . . . . .15-18 TLBs description of . . . . . . . . . . . . . . . 3-19, 9-1, 9-4 flushing . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-17 invalidating (flushing) . . . . . . . . . . . . . . . . .2-21 relationship to PGE flag . . . . . . . . . 3-27, 18-23 relationship to PSE flag . . . . . . . . . . 3-22, 9-17 TMR (Trigger Mode Register), local APIC . . . .7-30 TPR (task priority register), local APIC . . . . . .7-31 TR (trace message enable) flag, DebugCtlMSR register . . . . . . . . . . . . . . . . . . . . . .15-12 Transcendental instruction accuracy . . 18-9, 18-18 Translation lookaside buffer (see TLB) Trap gates difference between interrupt and trap gates . . . . . . . . . . . . . . . . . . . . . . . . . . .5-18 for 16-bit and 32-bit code modules . . . . . . .17-2 handling a virtual-8086 mode interrupt or exception through . . . . . . . . . . . . . . . .16-17 in IDT . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-13 introduction to . . . . . . . . . . . . . . . . . . . . 2-3, 2-4 layout of . . . . . . . . . . . . . . . . . . . . . . . . . . .5-13 Traps description of . . . . . . . . . . . . . . . . . . . . . . . .5-5 restarting a program or task after . . . . . . . . .5-7 TS (task switched) flag, CR0 control register . . . . . . . . . . . . . 2-14, 5-30, 6-12 TSD (time-stamp counter disable) flag, CR4 control register .2-17, 4-25, 15-15, 15-18, 18-22 TSS 16-bit TSS, structure of. . . . . . . . . . . . . . . .6-19 32-bit TSS, structure of. . . . . . . . . . . . . . . . .6-4 CR3 control register (PDBR) . . . . . . . 6-6, 6-17 description of . . . . . . . . . . . . 2-3, 2-4, 6-1, 6-4 EFLAGS register. . . . . . . . . . . . . . 
. . . . . . . .6-4 EIP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-4 executing a task . . . . . . . . . . . . . . . . . . . . . .6-3 floating-point save area . . . . . . . . . . . . . .18-14 general-purpose registers. . . . . . . . . . . . . . .6-4 initialization for multitasking . . . . . . . . . . . .8-13 invalid TSS exception . . . . . . . . . . . . . . . . .5-35 I/O map base address field. . . . . . . . 6-6, 18-29 I/O permission bit map . . . . . . . . . . . . . . . . .6-6 LDT segment selector field . . . . . . . . . 6-5, 6-17 link field. . . . . . . . . . . . . . . . . . . . . . . . . . . .5-19 order of reads/writes to . . . . . . . . . . . . . . .18-28 page-directory base address (PDBR). . . . .3-23 pointed to by task-gate descriptor. . . . . . . . .6-8 previous task link field. . . . . . . . 6-4, 6-14, 6-16 privilege-level 0, 1, and 2 stacks. . . . . . . . .4-21 referenced by task gate . . . . . . . . . . . . . . .5-19 segment registers . . . . . . . . . . . . . . . . . . . . .6-4 T (debug trap) flag . . . . . . . . . . . . . . . . . . . .6-6
task register. . . . . . . . . . . . . . . . . . . . . . . . . 6-8 using 16-bit TSSs in a 32-bit environment 18-29 virtual-mode extensions . . . . . . . . . . . . . 18-28 TSS descriptor B (busy) flag . . . . . . . . . . . . . . . . . . . . . . . . 6-7 initialization for multitasking . . . . . . . . . . . 8-13 structure of . . . . . . . . . . . . . . . . . . . . . . . . . 6-6 TSS segment selector field, task-gate descriptor . . . . . . . . . . . . . . 6-8 writes. . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-28 Type checking . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-6 field, MTRRdefType register . . . . . . . . . . . 9-21 field, MTRRphysBasen register . . . . . . . . 9-24 field, segment descriptor .3-11, 3-13, 3-15, 4-2, 4-6 of segment . . . . . . . . . . . . . . . . . . . . . . . . . 4-6
U
UD2 instruction . . . . . . . . . . . . . . . . . . . .5-28, 18-3 UE (numeric underflow exception) flag, FPU status word . . . . . . . . . . . . . . . . . . . . . . . 11-21 Uncached (UC) memory type description of. . . . . . . . . . . . . . . . . . . . . . . . 9-5 effect on memory ordering . . . . . . . . . . . . 7-10 use of . . . . . . . . . . . . . . . . . . . . . . . . . .8-9, 9-8 Undefined opcodes. . . . . . . . . . . . . . . . . . . . . . . . . . . 18-6 Underflow, FPU stack. . . . . . . . . . . . . . . . . . 11-17 Unit mask field, PerfEvtSel0 and PerfEvtSel1 MSRs (P6 family processors) . . . . . . . . . 15-17 Un-normal number . . . . . . . . . . . . . . . . . . . . 18-10 User mode description of. . . . . . . . . . . . . . . . . . . . . . . 4-31 U/S (user/supervisor) flag . . . . . . . . . . . . . 4-31 User-defined interrupts . . . . . . . . . . . . . . .5-4, 5-55 USR (user mode) flag, PerfEvtSel0 and PerfEvtSel1 MSRs (P6 family processors). . . . . . . . . . . . . . . . . . 15-16 U/S (user/supervisor) flag page-directory entry . . . . . . . . . . 4-2, 4-3, 4-31 page-table entries . . . . . . . . . . . . . . . . . . 16-11 page-table entry . . . . . . . . 3-26, 4-2, 4-3, 4-31
V
V (valid) flag, MTRRphysMaskn register . . . . 9-24 Variable-range MTRRs, description of . . . . . . 9-23 VCNT (variable range registers count) field, MTRRcap register . . . . . . . . . . . . . 9-20 Vectors exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . 5-4 interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-4 reserved . . . . . . . . . . . . . . . . . . . . . . . . . . 7-15 VERR instruction . . . . . . . . . . . . . . . . . .2-20, 4-27 VERW instruction . . . . . . . . . . . . . . . . . .2-20, 4-27 VIF flag, EFLAGS register . . . . . . . . . . . . . . . 18-6
VIF (virtual interrupt) flag, EFLAGS register . .2-10 VIP (virtual interrupt pending) flag, EFLAGS register . . . . . . . . . . . . . . . . . . 2-10, 18-6 Virtual memory . . . . . . . . . . . . . . . . . . . . . . 2-5, 3-1 Virtual-8086 mode 8086 emulation . . . . . . . . . . . . . . . . . . . . . .16-1 description of . . . . . . . . . . . . . . . . . . . . . . .16-9 emulating 8086 operating system calls. . .16-25 enabling . . . . . . . . . . . . . . . . . . . . . . . . . . .16-9 entering. . . . . . . . . . . . . . . . . . . . . . . . . . .16-11 exception and interrupt handling, overview . . . . . . . . . . . . . . . . . . . . . . .16-15 exceptions and interrupts, handling through a task gate . . . . . . . . . . . . . . . . . . . . . . .16-19 exceptions and interrupts, handling through a trap or interrupt gate . . . . . . . . . . . . . .16-17 handling exceptions and interrupts through a task gate . . . . . . . . . . . . . . . . . . . . . . .16-20 IOPL sensitive instructions . . . . . . . . . . . .16-14 I/O-port-mapped I/O . . . . . . . . . . . . . . . . .16-15 leaving . . . . . . . . . . . . . . . . . . . . . . . . . . .16-13 memory mapped I/O . . . . . . . . . . . . . . . . .16-15 native 16-bit mode . . . . . . . . . . . . . . . . . . .17-1 overview of . . . . . . . . . . . . . . . . . . . . . . . . .16-1 paging of virtual-8086 tasks . . . . . . . . . . .16-10 protection within a virtual-8086 task . . . . .16-11 special I/O buffers. . . . . . . . . . . . . . . . . . .16-15 structure of a virtual-8086 task . . . . . . . . . .16-9 virtual I/O . . . . . . . . . . . . . . . . . . . . . . . . .16-14 Virtual-8086 tasks paging of . . . . . . . . . . . . . . . . . . . . . . . . . .16-10 protection within . . . . . . . . . . . . . . . . . . . .16-11 structure of . . . . . . . . . . . . . . . . . . . . . . . . .16-9 VM (virtual-8086 mode) flag, EFLAGS register .2-9 VME (virtual-8086 mode extensions) flag, CR4 control register . . . . . . . . . . . 2-17, 18-22
W
WAIT instruction. . . . . . . . . . . . . . . . . . . . . . . .5-30 WAIT/FWAIT instructions. . . . . 18-8, 18-18, 18-19 WB (write back) memory type . . . . . . . . . . 9-6, 9-8 WBINVD instruction . . 2-21, 4-25, 7-12, 9-15, 18-5 WC (write combining) flag, MTRRcap register. . . . . . . . . . . . . . . .9-21 memory type . . . . . . . . . . . . . . . . . . . . . 9-6, 9-8 WP (write protected) memory type. . . . . . . . . . .9-7 WP (write protect) flag, CR0 control register . 2-14, 4-32, 18-22 Write forwarding . . . . . . . . . . . . . . . . . . . . . . . . . . .7-8 hit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-5 Write back (WB) memory type . . . . . . . . . . . . .7-10 Write buffer description of . . . . . . . . . . . . . . . . . . . . . . . .9-4 in Intel Architecture processors . . . . . . . .18-36 operation of. . . . . . . . . . . . . . . . . . . . . . . . .9-17 Write-back caching. . . . . . . . . . . . . . . . . . . . . . .9-5
WRMSR instruction 2-22, 2-23, 4-25, 7-12, 15-11, 15-15, 15-16, 15-18, 15-20, 18-4, 18-38 WT (write through) memory type . . . . . . . .9-6, 9-8
X
XADD instruction . . . . . . . . . . . . . . . . . . .7-4, 18-5 XCHG instruction . . . . . . . . . . . . . . . 7-3, 7-4, 7-10 XOR instruction . . . . . . . . . . . . . . . . . . . . . . . . 7-4
Z
ZF flag, EFLAGS register . . . . . . . . . . . . . . . . 4-27
Getting Started
Microsoft Corporation
Filename: LMATUTTL.DOC Project: Template: FRONTWA1.DOT Author: Mike Eddy Last Saved By: Mike Eddy Revision #: 6 Page: 1 of 1 Printed: 10/02/00 04:07 PM
Microsoft MASM
Assembly-Language Development System Version 6.1 For MS-DOS and Windows Operating Systems
Information in this document is subject to change without notice. Companies, names, and data used in examples herein are fictitious unless otherwise noted. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without the express written permission of Microsoft Corporation.
© 1992 Microsoft Corporation. All rights reserved.
Microsoft, MS, MS-DOS, XENIX, CodeView, and QuickC are registered trademarks and Windows and Windows NT are trademarks of Microsoft Corporation in the USA and other countries. U.S. Patent No. 4955066.
IBM is a registered trademark of International Business Machines Corporation.
Intel is a registered trademark of Intel Corporation.
Printed in the United States of America.
Document No. DB35753-1292
Contents
Chapter 1  Microsoft Macro Assembler (MASM) Overview . . . 1
    System Requirements . . . 2
    Package Contents . . . 2
    Product Components . . . 2
    New MASM Features . . . 3
    Document Conventions . . . 4
Chapter 2  Installing and Using MASM . . . 5
    Using Setup . . . 5
    Reviewing Installation Settings . . . 7
        System Files . . . 9
        Installing MASM for Use With Other Programming Languages . . . 11
    Running MASM . . . 13
        Running MASM from the MS-DOS Command Line . . . 13
        Running MASM Within the Windows Operating System . . . 13
    Getting More Information . . . 14
Chapter 3  Configuring Your System . . . 15
    Understanding System Configuration Terminology . . . 15
    Choosing a Development Environment . . . 19
    Revising System Files . . . 19
        Modifying Your AUTOEXEC.BAT File . . . 20
        Modifying Your CONFIG.SYS File . . . 23
        Modifying Your .PIF Files . . . 24
        Modifying Your SYSTEM.INI File . . . 25
        Modifying Your TOOLS.INI File . . . 25
        Using Your DOSXNT.EXE File . . . 26
    Increasing System Speed . . . 26
        Optimizing Disk Access Time . . . 26
        Using SMARTDRV.EXE . . . 27
        Using RAMDRIVE.SYS . . . 31
    Optimizing Available Memory . . . 31
        Understanding Memory Requirements . . . 32
        Determining Memory Availability . . . 32
        Freeing Conventional Memory . . . 33
        Enabling Extended Memory with HIMEM.SYS . . . 33
        Freeing Extended Memory . . . 34
        Freeing Expanded Memory . . . 35
        Using EMM386.EXE as an Expanded Memory Emulator . . . 35
        Using EMM386.EXE to Manage Upper Memory . . . 36
        Other DPMI Servers . . . 39
    Optimization Summary . . . 39
Chapter 1
Microsoft Macro Assembler (MASM) Overview
This chapter describes the features of MASM version 6.1. The following topics are included:
• System Requirements
• Package Contents
• Product Components
• New MASM Features
• Document Conventions
System Requirements
MASM version 6.1 requires the following system configuration:
• An IBM Personal Computer or 100 percent compatible, running MS-DOS version 3.3 or later
• An 80386 or later processor
• 4 megabytes of available memory (RAM)
• One hard-disk drive with a minimum of 5 megabytes of free space. (Depending on the options you select, you may need up to 9 megabytes of disk space. The SETUP program will ask what components you want installed and then check to see if your system has enough disk space to install all the components you selected.)
• One 1.2 megabyte, 5.25-inch floppy disk drive, or one 1.44 megabyte, 3.5-inch floppy disk drive. (For information on the 720K MASM disk set, see Package Contents.)
Package Contents
Your MASM version 6.1 package should include the items listed below. If any pieces are missing, contact the retailer from whom you purchased the product.
• Registration card. There are many advantages to being a registered owner of MASM, including notification of future software releases and easy access to customer assistance. Please take the time to fill out and mail the registration card now. If you are already a registered owner (from an earlier version of MASM) and have upgraded, your upgrade kit will not include a registration card.
• Disks. Disk 1 of the MASM disk set contains a file named PACKING.TXT that lists the name, location, and a brief description of each disk file in the MASM package. Most files on the disks are compressed; the SETUP program decompresses files as they are installed. MASM is distributed on five 5.25-inch high-density disks or four 3.5-inch high-density disks. If you need 3.5-inch 720K disks to install MASM, please send the media order card contained in the MASM package, or call Microsoft Customer Service (1-800-426-9400).
• Books. Your package should contain the following books:
    • Getting Started (this book). Getting Started includes information on system requirements, tells you how to set up the software, and provides instructions on optimizing your system for use with MASM.
    • Environment and Tools. This book describes how to use the Programmer's WorkBench (PWB), the CodeView (CV) Debugger, and all the other utilities included with your MASM package.
    • Programmer's Guide. This advanced programming text describes the enhanced features and technical details of MASM version 6.1.
    • Macro Assembler Reference. This quick-reference book lists the utilities along with a brief description of their command-line options, directives, symbols and operators, and include-file macro names. Complete information on processor and coprocessor instructions is also included.
Product Components
MASM version 6.1 includes all the components you need to develop Assembly Language programs for the MS-DOS and Windows operating systems. The following components are included:
• ML Assembler version 6.1
• Programmer's WorkBench version 2
• CodeView version 4 Debugger
• The latest versions of LINK, LIB, IMPLIB, NMAKE, BSCMAKE, CREF, H2INC, EXEHDR, CVPACK, SBRPACK, HELPMAKE, RM, UNDEL, and EXP utilities
• On-line Help for the assembler and all utilities
• Sample code
• Readme documentation for information unavailable at the time of printing
New MASM Features

MASM version 6.1 includes the following new features:
• MASM can run as a 32-bit application under MS-DOS version 3.3 and later, and the Windows operating system version 3.x (see the Programmer's Guide).
• The new command-line option sequence /Fl /Sc causes the assembler to show instruction timings in the listing file (see the MASM Reference).
• The assembler accepts @file in the command line to specify a response file to extend the command line (see the MASM Reference).
• The new /COFF command-line option causes the assembler to convert .OBJ output to COFF (see the MASM Reference).
• Updated versions of the programming utilities are included (see Environment and Tools).
• Compatibility with MASM version 5.10 is improved (see Appendix A in the Programmer's Guide).
• Windows-based DLLs can be created without the Microsoft Windows Software Development Kit (see the Programmer's Guide).
• Sample code is included for writing DLLs that are called from the Windows operating system version 3.x (see the Programmer's Guide).
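As a rough illustration of two of the command-line features listed above, the lines below show how they might be typed at the MS-DOS prompt or placed in a batch file. HELLO.ASM and HELLO.RSP are hypothetical filenames, and the response file is assumed to hold the same options and filenames you would otherwise type after ML. Treat this as a sketch and check the MASM Reference for the exact syntax.

```bat
REM Assemble HELLO.ASM and write a listing file (/Fl) that
REM includes instruction timings (/Sc)
ML /Fl /Sc HELLO.ASM

REM Take the options and filenames from a response file instead of
REM typing them (HELLO.RSP is a hypothetical name)
ML @HELLO.RSP
```

A response file is mainly useful when the list of options and source files no longer fits on one command line.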
Document Conventions
The MASM document set uses the following conventions:
Example              Description
COPY TEST.ASM C:     Uppercase letters represent MS-DOS commands and filenames.
INVOKE               Boldface letters indicate standard features of the MASM language: keywords, operators, and standard library functions.
expression           Words in italics indicate placeholders for information you must supply, such as a filename. Italics are also occasionally used for emphasis in the text.
ML /Zi HELLO.ASM     This typeface is used for example programs, program fragments, and the names of user-defined functions and variables. It also indicates user input and screen output.
SHIFT                Small capital letters denote names of keys on the keyboard. A plus sign ( + ) indicates a combination of keys. For example, SHIFT+F5 tells you to hold down the SHIFT key while pressing the F5 key.
"bookmark"           The first time a new term is defined, it is enclosed in quotation marks. Since some knowledge of programming is assumed, common terms such as "memory" or "branch" are not defined.
Chapter 2
Installing and Using MASM
This chapter describes the MASM version 6.1 installation. It includes detailed information about:
• Using SETUP
• Reviewing Installation Settings
• Running MASM
• Getting More Information
Before running SETUP, back up the distribution disks and make sure you have enough disk space (see System Requirements, page 1). For information on configuring your system after you have installed MASM, see Chapter 3 of this book.
Using Setup
To install Microsoft MASM version 6.1, run the SETUP.EXE program (located on Disk 1 of your installation set). The SETUP program performs all tasks necessary for installing the MASM components. You must run SETUP to install MASM, as the files on the distribution disks are compressed. SETUP decompresses the files and copies them to your hard disk. SETUP runs under MS-DOS and under the Windows operating system version 3.x. You can use SETUP to perform the following:
• View the documentation notes, README.TXT, packing list, and information for users of MASM version 5.10.
• Preview the installation prompts and their defaults before installing any files.
• Install the Macro Assembler using the defaults or while modifying the loading options.
• Copy individual files from the distribution disks.
You can use the interactive installation, which is the default, to set the following options:
• Load utilities for use with the Windows operating system (default = yes).
• Load the Programmer's WorkBench (default = yes).
• Configure PWB with BRIEF-compatible environment commands (default = no).
• Load MASM.EXE for MASM version 5.10 compatibility (default = yes).
• Copy Help files, README.TXT, and other documentation files (default = yes).
• Copy the sample programs (default = no).
  Note  If you plan on using the Tutorials in the Programmer's Guide, you should load the sample programs.
• Copy a mouse driver (default = yes).
• Select the drive where you want the MASM files to reside (default = highest drive letter).
• Select the directories for the MASM component files you choose to install. These include:
    • Executable files (default = C:\MASM61\BIN)
    • Library files (default = C:\MASM61\LIB)
    • Include files (default = C:\MASM61\INCLUDE)
    • Initialization files (default = C:\MASM61\INIT)
    • Help files (default = C:\MASM61\HELP)
    • Sample files (default = C:\MASM61\SAMPLES)
  Note  If you are using more than one Microsoft programming language, you may want to direct the files to directories other than the defaults to avoid duplicating utilities. For more information on installing MASM in a multilanguage environment, see Installing MASM for Use With Other Programming Languages on page 11.
• Check your TMP environment variable and available disk space before installing any files. If the TMP environment variable is not set when you run SETUP, SETUP will propose C:\MASM61\TMP as the temporary directory on your hard disk to use during installation. This directory will not be deleted when installation is complete. SETUP will place the line
SET TMP=C:\MASM61\TMP
in the NEW-VARS.BAT file if a TMP environment variable is not defined. If there is a problem during installation, SETUP will report the error and terminate without loading MASM. For information on how SETUP uses your system's TMP environment variable, see page 21.
You can run SETUP from either the MS-DOS command line or within the Windows operating system version 3.x.
Running SETUP from the MS-DOS Command Line
1. Insert the disk labeled Disk 1 in the appropriate disk drive.
2. At the MS-DOS command prompt, type
DRIVE:\ENTER
where DRIVE is the disk drive into which you just put Disk 1.
3. Type SETUP and press ENTER to begin installation.
4. Press ENTER again to display the Main Menu screen in Figure 2.1.
Running SETUP from the Windows Operating System
1. Insert the disk labeled Disk 1 in the appropriate disk drive.
2. Open the File Manager and view the contents of Disk 1.
3. Run the SETUP.EXE file by double-clicking on it with the mouse, or by selecting it and pressing ENTER.
4. Press ENTER to display the main menu.
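The TMP handling described before these procedures can be reproduced by hand. The fragment below is a minimal sketch of a batch file (call it CHKTMP.BAT, a hypothetical name) that gives TMP the directory SETUP proposes only when the variable is not already defined. It assumes the default C:\MASM61\TMP directory already exists. The %TMP% substitution is expanded inside batch files, so run these lines from a batch file, not by typing them at the prompt.

```bat
@ECHO OFF
REM If TMP is empty or undefined, use the directory that SETUP proposes
IF "%TMP%"=="" SET TMP=C:\MASM61\TMP

REM Show the value now in effect
ECHO TMP is set to %TMP%
```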
Reviewing Installation Settings

After SETUP is complete, the SETUP Main Menu screen, shown in Figure 2.1, appears. To review the default settings for installing MASM, choose Run SETUP Without Installing Any Files from the SETUP Main Menu. SETUP will go through the prompting screens and then return to the Main Menu screen.
Figure 2.1 SETUP Main Menu screen.
Follow the instructions on the screen. Press ENTER to proceed with a selection you have made. Use ARROW KEYS to make a selection. Press F1 for information on a selection. Press CTRL+C to quit SETUP. If you are running SETUP interactively, the Confirm Your Choices screen shown in Figure 2.2 appears when you have viewed all the prompts. This screen allows you to change any of the installation selections you made.
Figure 2.2 SETUP Confirm Your Choices Screen
To change a setting or to access more information on each menu item, use the ARROW KEYS to make the selection, then press ENTER. To accept all the settings, select No Changes and press ENTER.
After SETUP is complete, the Environment Settings screen shown in Figure 2.3 appears.
Figure 2.3 SETUP Environment Settings screen
System Files
SETUP does not modify your system files. Instead, SETUP copies recommended updates that are named NEW-VARS.BAT, NEW-CONF.SYS, and NEW-SYS.INI to your \MASM61\BIN subdirectory. Depending on your system configuration, some of the settings in these files are necessary for MASM to run on your system.
NEW-CONF.SYS is a sample CONFIG.SYS file containing system commands for your CONFIG.SYS file. If your CONFIG.SYS file has the same commands, make sure they are set to values equal to or greater than those in NEW-CONF.SYS.
NEW-VARS.BAT is a batch file that sets the MASM environment variables. You can run NEW-VARS.BAT from the command line or merge it with your existing AUTOEXEC.BAT file.
If you chose to install the Windows operating system utilities during SETUP, the SYSTEM.INI Settings screen shown in Figure 2.4 will be displayed.
Figure 2.4 SYSTEM.INI Settings screen
NEW-SYS.INI contains commands that should exist in the [386Enh] section of your Microsoft Windows SYSTEM.INI file. It also lists any [386Enh] section lines you should delete to complete the update.
The Sample PWB Settings screen shown in Figure 2.5 is displayed next. It tells you the location of the TOOLS.PRE file that contains the default PWB settings. You can rename this file TOOLS.INI, or merge the information in it with your existing TOOLS.INI file.
Figure 2.5 Sample PWB Settings screen
The Memory Utilities screen shown in Figure 2.6 is displayed next. It tells you the location of HIMEM.SYS, RAMDRIVE.SYS and SMARTDRV.EXE. For more information about these memory utilities, see Chapter 3 of this book.
Figure 2.6 Memory Utilities screen
Pressing ENTER displays SETUP's Main Menu screen again. From here you may exit SETUP, view documentation files, or run the installation process again. You may run SETUP again at any time to load any files you chose to exclude during this installation.
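One way to apply the NEW-VARS.BAT file described under System Files, without copying its lines by hand, is to call it from AUTOEXEC.BAT. The fragment below is only a sketch: it assumes the default C:\MASM61\BIN directory, and the use of CALL is an illustration, not a step that SETUP performs or recommends.

```bat
REM Fragment for the end of AUTOEXEC.BAT.
REM CALL runs NEW-VARS.BAT and then returns, so any lines that
REM follow in AUTOEXEC.BAT are still executed.
CALL C:\MASM61\BIN\NEW-VARS.BAT
```

Calling the file keeps the MASM settings in one place, which makes them easy to remove or replace later.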
Installing MASM for Use With Other Programming Languages

If you will be using MASM with other Microsoft programming languages, such as Microsoft C/C++, you may already have versions of PWB, CV and the other programming utilities loaded. You can install MASM in one of three ways, depending on your working requirements and available hard disk space:
• Install MASM in your existing MASM tree structure, if one exists. This updates all identically named and placed files in the MASM tree. The SETUP program defaults to this option. If a copy of MASM version 5.10 is located in the \MASM directory, it is renamed to OLDMASM.EXE.
• Install MASM in your high-level language tree structure. This also updates identically named and placed files in the high-level language tree. For example, if you have Microsoft C/C++ version 7.0 on drive D:, the root of your C/C++ tree is \C700. To install MASM in your \C700 tree, you would:
    • Select Install the Microsoft Macro Assembler from the Main Menu.
    • Specify C700 instead of MASM in each of the target directory prompts.
• Install MASM in an independent tree. MASM will have its own complete tree structure, so any identically named utilities or files that exist in both the new MASM tree and your high-level language (or previous MASM version) trees are preserved. If you select this option and want to use the newest versions of the Programmer's WorkBench, CodeView, and other programming utilities loaded during MASM setup, make sure the directory that contains your executable MASM files (default \MASM61\BIN) comes before other language \BIN directories in your MS-DOS PATH statement.
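The PATH ordering requirement for an independent tree can be checked mechanically. The following Python sketch is illustrative only and is not part of MASM or SETUP: the function name bin_dir_comes_first and the sample directories are hypothetical, and it assumes PATH entries are separated by semicolons and compared without regard to case.

```python
def bin_dir_comes_first(path_value, masm_bin, other_bins):
    """Return True if masm_bin is on the path ahead of every listed directory."""
    # Split the PATH value on semicolons; normalize case and trailing backslashes.
    entries = [e.strip().upper().rstrip("\\") for e in path_value.split(";") if e.strip()]
    target = masm_bin.upper().rstrip("\\")
    if target not in entries:
        return False  # the MASM directory is not on the path at all
    masm_index = entries.index(target)
    for other in other_bins:
        other = other.upper().rstrip("\\")
        # Any competing \BIN directory found earlier in the path wins the search.
        if other in entries and entries.index(other) < masm_index:
            return False
    return True


# Hypothetical PATH value for a system that also has C/C++ 7.0 on drive D:.
sample_path = r"C:\DOS;C:\MASM61\BIN;D:\C700\BIN"
print(bin_dir_comes_first(sample_path, r"C:\MASM61\BIN", [r"D:\C700\BIN"]))
```

If the function reports False, moving the MASM \BIN directory toward the front of the PATH statement is the fix the text describes.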
If SETUP detects files that are named and located identically to files it is about to install, it checks the time/date stamps of those files. If an existing file is newer than the file SETUP is about to install, and the existing file was supplied by another Microsoft language product (other than MASM), you are warned and given three options:
• Copy new files over old files. This deletes the old files from your hard disk.
• Do not copy new files, and keep the older versions. The new versions are not copied to your hard disk.
• Exit SETUP. If you exit SETUP you can save your old files in another location.
There are two exceptions:
• If you choose to install the new MASM.EXE utility, your old MASM.EXE is renamed OLDMASM.EXE and left in the same directory.
• You are warned if SETUP is about to overwrite any version of LINK.EXE that is different from the one SETUP is about to install.
If you will be using Microsoft FORTRAN, BASIC, or C/C++ with MASM, you need to activate the appropriate PWB extensions (PWBFORT.XXT for FORTRAN, PWBBASIC.XXT for BASIC, and PWBC.XXT for C/C++). These are located in the C:\MASM61\BIN directory. To activate an extension, change the .XXT extension to .MXT. Language extensions provided with earlier versions of PWB are not compatible with PWB version 2.0, so you must use the new .XXT files if you want to use a language extension.
Note Any extensions you wrote for a previous version of PWB must be rebuilt for PWB version 2.0. Building custom PWB extensions for MASM 6.1 requires the Microsoft C/C++ Version 7 programming set. For more information on PWB extensions, see the Programmer's Guide.
Running MASM
You can run MASM from the MS-DOS command line, or in an MS-DOS application window within the Windows operating system version 3.x. The configuration procedure differs slightly between the two platforms. (For information on using MASM after it is running, see the Programmer's Guide.)
Running MASM from the MS-DOS Command Line

If you plan to run MASM from the MS-DOS command line, make sure of three things:
• Your computer has booted with a CONFIG.SYS file that includes the commands listed in NEW-CONF.SYS.
• The environment variables listed in NEW-VARS.BAT are set.
• The MS-DOS extender file DOSXNT.EXE is in the path or current directory.
During SETUP, NEW-CONF.SYS, NEW-VARS.BAT, and DOSXNT.EXE are copied to the directory you specify for executable files (default \MASM61\BIN).
Running MASM Within the Windows Operating System

If you plan to run MASM within the Windows operating system version 3.x, you may want to add some of the MASM utilities to a program group in the Program Manager. MASM.GRP is copied to the \BIN directory you specify during SETUP for your executable files.
Note Make sure the MASM61\BIN directory is in the current path before you add MASM.GRP to your Program Manager. You may need to exit the Windows operating system to verify the current path. If the directory MASM61\BIN is not part of the current path, you will have to add the MASM.GRP program items and icons individually.
Adding the MASM Program Group
1. Open the Program Manager.
2. From the File menu, choose New.
3. Select Program Group.
4. Choose the OK button.
5. Type DRIVE:\MASM61\BIN\MASM.GRP and press ENTER.
   DRIVE is the MASM-resident drive. (If you specified a different directory for your executable files during the SETUP program, type that as the path for MASM.GRP instead.)
The Program Manager adds a new MASM Program Group. It has five program items: Programmer's WorkBench, MASM 6.1 Reference, CodeView, MS-DOS CodeView, and WXServer.
Once you have added your MASM program group and added the statements from NEW-SYS.INI to your SYSTEM.INI file, you must exit the Windows operating system to save your changes. Restart the Windows operating system to use the items in your new MASM program group.
For more information on adding program items and groups to your Windows operating system, see your Windows operating system User's Guide. For information on using PWB, CV or WX Server, see Environment and Tools.
Getting More Information

While SETUP is running, press F1 during any screen to access more information on the highlighted option.
For information on particular components, commonly asked questions, or information not available at the time of printing, use SETUP's Main Menu to view the README.TXT file.
If you have checked these sources as well as information found elsewhere in the documentation set, and you need to contact Microsoft Product Support Services, see the information on contacting Microsoft located in the front of Environment and Tools.
Chapter 3
Configuring Your System
This chapter describes how to configure your system for optimal use of MASM and explains the recommended modifications to your system files (CONFIG.SYS, AUTOEXEC.BAT, and SYSTEM.INI). This chapter also provides information on conventional memory, extended memory, expanded memory, and memory managers. This will help you:
• make more memory available for MASM and other programs
• optimize the speed at which your programs run
• use the memory in your system more efficiently
You may need to experiment with the described techniques to find the right optimization for your system.
Note MS-DOS version 5.0 or later provides many new features that make memory configuration easier. Many of the recommendations made in this chapter require your system to have these new MS-DOS features. If you have not upgraded to MS-DOS version 5.0 or later, you may want to do so before configuring your system files for MASM.
Understanding System Configuration Terminology

This section defines terms that can help you understand the configuration information in this chapter. Figure 3.1 shows the relationship between the different memory areas.
Filename: LMATUC03.DOC Project: Masm 6.10 Configuration Template: WEDGEA1.DOT Author: Stacy Schoolfield Last Saved By: Mike Eddy Revision #: 88 Page: 15 of 1 Printed: 10/02/00 04:06 PM
Figure 3.1 Memory Locations
Conventional (Real) Memory The first 640K of memory in a computer using an Intel-compatible processor. All MS-DOS systems have conventional memory. All application programs can use conventional memory without additional memory-management programs.
Extended Memory Memory above the first 1 MB of memory on systems with 80286 or higher processors. Most 80386 computers come with some extended memory. Extended memory requires an extended memory manager, such as HIMEM.SYS, to prevent programs from using the same area of extended memory at the same time. You need nearly 3.5 megabytes of extended memory to run MASM on your system (4 megabytes of total system memory).
High Memory Area (HMA) The first 64K of extended memory. Systems using MS-DOS version 5.0 or later can load MS-DOS into the HMA. This will free about 50K of conventional memory.
Expanded Memory An area of memory accessible to programs that can access memory above 640K. Expanded memory is divided into 16K segments called pages. When a program requests information from expanded memory, an expanded memory manager maps or copies the appropriate page to an area called a page frame in upper memory. Since an expanded memory manager allows programs access to a limited amount of information at one time, expanded memory can be slower for programs to use than extended memory. Expanded memory requires special drivers such as EMM386.EXE. EMM386.EXE can also use extended memory to emulate expanded memory on 80386 and 80486 systems.
Upper Memory Also referred to as High MS-DOS Memory. The 384K of memory above the 640K of conventional memory in most systems. Parts of this area not used by your system are called upper memory blocks (UMBs). If your system has an 80386 or 80486 processor and extended memory, MS-DOS can be loaded into UMBs, so more conventional memory is free for programs. MS-DOS version 5.0 or later has commands that enable you to store certain device drivers and programs in upper memory. This memory cannot be accessed by user programs.
Device Driver A program that MS-DOS uses to control devices such as the keyboard, mouse, monitor, disk drives, and physical memory. Memory managers are device drivers. Device drivers are loaded into memory by statements in your CONFIG.SYS file.
Memory Manager A program that provides access to a particular type of memory. For programs to use extended memory, expanded memory, or upper memory, your system must have a memory manager. MASM provides two memory managers, HIMEM.SYS and EMM386.EXE, that can be installed on your system. Although memory managers take up some space in conventional memory, they provide access to extended memory, expanded memory, and upper memory.
HIMEM.SYS A memory manager that provides access to extended memory. HIMEM.SYS is required for MASM.
EMM386.EXE A device driver provided by MASM to control expanded memory and provide access to upper memory. The EMM386.EXE memory manager can also use extended memory to emulate expanded memory (see page 35).
SMARTDRV.EXE A device driver provided by MASM for systems running MS-DOS (version 4.x and later) that enables faster disk access. SMARTDRV.EXE creates a disk cache in extended or expanded memory. (SMARTDRV.EXE replaces the earlier version of this program, SMARTDRV.SYS.)
Double Buffering A SMARTDRV option that provides compatibility for hard disk controllers that cannot work with virtual memory.
Disk Cache An area in extended or expanded memory that SMARTDRV.EXE uses to store information it reads from the hard disk. This speeds up disk access because the next information the application requests may already be available in memory.
RAMDRIVE.SYS A device driver provided by MASM for systems running MS-DOS (version 4.x and later) that reduces disk access. RAMDRIVE.SYS creates a virtual disk drive in RAM to emulate a physical disk drive.
DOS Protected Mode Interface (DPMI) A published specification for handling MS-DOS calls in protected mode programs. The Microsoft Windows operating system version 3.x provides DPMI services and is therefore called a DPMI server.
MS-DOS Extensions to the DPMI These extensions provide additional functionality not required by the DPMI specification. The Windows operating system version 3.x provides this additional functionality.
Virtual Control Program Interface (VCPI) Defines how multiple programs can run in protected mode on MS-DOS.
MS-DOS-Extended Programs Programs that have a protected mode MS-DOS extender bound into the executable file. This allows the program to use extended memory and to use real-mode interrupt services in protected mode.
WX Server WXSRVR.EXE works with WX.EXE, a real-mode MS-DOS program, to allow Windows-based programs to be invoked from within a Windows operating system MS-DOS application window.
Choosing a Development Environment

The MASM assembler components require extended memory and an XMS memory manager, such as HIMEM.SYS. MASM supports the DPMI and VCPI specifications, and will use any available DPMI and VCPI allocated memory. However, neither DPMI nor VCPI is required to run MASM.
You may run MASM within MS-DOS, or within an MS-DOS application window under the Windows operating system. If you are using the Windows operating system, version 3.x, you can have multiple MS-DOS application windows operating simultaneously. For example, you can use one MS-DOS window for editing and another for compiling.
Revising System Files

You will need to modify your CONFIG.SYS and AUTOEXEC.BAT files to use some of the MASM features. Perform the following procedures to update these system files safely:
• Make a backup copy of these files before modifying them.
• To disable a statement without deleting it, insert the word REM in front of an AUTOEXEC.BAT or CONFIG.SYS statement. (Note that use of REM in a CONFIG.SYS file generates a warning message in versions of MS-DOS prior to 4.0.)
• Make a system disk in case a change to your CONFIG.SYS file makes it impossible to restart from your hard disk.
Note When you finish making changes to the files, you must restart your system to enable the changes.
The information in this section applies to a system running MS-DOS version 5.0 or later and the Windows operating system version 3.x. If you are not using a Microsoft memory manager, follow the manufacturer's instructions.
If your system is running MS-DOS version 3.x or version 4.x, you cannot load MS-DOS into high memory, because the DEVICEHIGH command is not available for CONFIG.SYS files, and the LOADHIGH command is not available for the AUTOEXEC.BAT file in these versions of MS-DOS. Since upper memory cannot be accessed, less memory is available for the MASM components.
Modifying Your AUTOEXEC.BAT File

The AUTOEXEC.BAT file contains settings or definitions for environment variables. Several environment variables need to be defined so the MASM components can work together optimally. The NEW-VARS.BAT file includes most of the SET statements you need to run MASM.
Required Environment Variables

The following environment variables are required by MASM:
INIT Specifies the directory where the TOOLS.INI and CURRENT.STS initialization files are located. If INIT is set, the Programmer's WorkBench (PWB) looks for TOOLS.INI and CURRENT.STS in the directory specified by INIT. If you do not set INIT, PWB and the CodeView debugger create a CURRENT.STS file in every directory from which PWB or CodeView is invoked, making it unlikely that the correct status file will be loaded the next time PWB or CodeView is run. Only one directory should be specified for the INIT variable.
PATH Specifies the search path for finding executable files.
TMP Operating-system environment variable that specifies the directory for temporary files. Only one directory can be specified in the TMP variable. Utilities that use the TMP environment variable are NMAKE, LINK, and PWB.
HELPFILES Required for using Help with PWB, CodeView, and QuickHelp. Specifies the list of directories where Help files (.HLP) are located, and the filenames of specific .HLP files. Wildcard characters are allowed in the HELPFILES variable to indicate more than one .HLP file.
Note Do not place Microsoft Windows-based Help files in the HELPFILES directory. They are not compatible with MASM help.
Optional Environment Variables

The following environment variables are optional for MASM:
LIB Specifies the list of directories, separated by semicolons, where library files (*.LIB) are located.
INCLUDE Specifies the list of directories, separated by semicolons, where INCLUDE files (.INC files for MASM, .H files for C/C++) are located.
MASM Specifies additional options for the MASM 5.10 compatibility driver.
ML Specifies additional options for the assembler.
LINK Specifies additional options for the linker.
You can use environment variables in makefiles. NMAKE uses a set of macro definitions equivalent to the setting of each environment variable when it runs. The macro definitions can be redefined without changing the value of the environment variable. An environment variable can also be defined in a makefile if it has not already been defined. You can view the current settings for macros by specifying the /p option for NMAKE. For more information on NMAKE and its options, see Environment and Tools.
Using the TMP and TEMP Environment Variables

Microsoft programming languages and the utilities included with them use the environment variable TMP. This operating system environment variable is typically set in the AUTOEXEC.BAT file and is assigned to the drive and directory used for temporary file storage. For example, PWB needs at least 1 MB free in the directory specified by the TMP environment variable because that directory is the location for PWB's virtual memory file.
While Microsoft development tools use the TMP environment variable, Microsoft applications, such as Microsoft Word, use an environment variable called TEMP. Both variables are set to a drive and directory for temporary file storage. Since compilers and other development tools use more temporary disk space than applications, you may want to assign TMP to your hard disk. Applications generally use less temporary disk space, allowing you to set TEMP equal to a small RAM drive.
Always set TMP to an existing subdirectory, as in the following example:
SET TMP=C:\TMP
Remember the following when using the TMP environment variable:
• If the TMP environment variable is not set when you run SETUP, SETUP will prompt you for the path to store temporary files (default=\MASM61\TMP). If the path you give it doesn't exist, SETUP will create the directory. If you specify a drive in the path that doesn't exist, SETUP will create and use \MASM61\TMP. This temporary directory is not deleted when SETUP is finished installing MASM. After installation, SETUP will place the line
  SET TMP=\MASM61\TMP
  in the NEW-VARS.BAT file if a TMP environment variable is not set when it is run.
• If your TMP environment variable points to a location on a network drive, make sure that the directory is not write protected.
• Make sure the directory pointed to by your TMP environment variable exists. If you set the TMP environment variable to a non-existing directory, SETUP will use the root directory of the current drive for storing temporary files. This can cause a problem, since MS-DOS limits the number of files that can be created in a root directory.
For more information on how MS-DOS and the Windows operating system use temporary files and how to change file settings, see your MS-DOS or Windows operating system documentation.
Setting Environment Space

The memory area created by the operating system to store environment variables and their values is called the environment space. If the assembler can't find files like include files, or if you receive this error message when rebooting:
Out of environment space
the available environment space is insufficient to hold the definitions of the environment variables. The default environment size is 256 bytes. The size of the current environment is 256 bytes or the amount of actual memory used by the environment variables rounded up to the next 16 bytes. The limits are 160 to 32,768 bytes.
The current environment is the actual memory being used for the environment variables, not the size specified with the /e option of the SHELL command in the CONFIG.SYS file. (The size of the current environment may be less than the size specified with the SHELL command.) For correct operation, set the value for the environment size to at least 1024 with this statement:
SHELL = C:\DOS\COMMAND.COM /e:1024 /p
Note Use the SHELL options carefully. It's a good idea to have a system disk with working CONFIG.SYS and AUTOEXEC.BAT files when experimenting with your system.
The SHELL command specifies the name and location of the command interpreter you want MS-DOS to use and sets the environment space to 1024 bytes. The default is 256 bytes. The /p parameter tells MS-DOS to make its associated command interpreter permanent so that you cannot type EXIT to stop the command interpreter. It also tells MS-DOS to run your AUTOEXEC.BAT file when it carries out the SHELL command. See your MS-DOS documentation for more information on the SHELL command.
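To judge whether a set of variables is likely to fit in a given environment size, the space they occupy can be estimated. The Python sketch below is an illustration under stated assumptions, not a tool supplied with MASM: it assumes each variable is stored as NAME=value followed by one zero byte, with one extra zero byte ending the block, and it applies the 16-byte rounding and the 160 to 32,768 byte limits described in this section. The function names and sample values are hypothetical.

```python
def environment_bytes(variables):
    """Estimate the bytes an MS-DOS environment block needs for these variables."""
    # Assumed layout: "NAME=value" plus one terminating zero byte per entry,
    # and one additional zero byte that marks the end of the block.
    used = sum(len(name) + 1 + len(value) + 1 for name, value in variables.items()) + 1
    # Round up to a multiple of 16 bytes, as the text describes.
    return -(-used // 16) * 16


def fits(variables, e_size=256):
    """Return True if the variables fit in an environment of e_size bytes."""
    # The text gives 160 and 32,768 bytes as the limits for the environment size.
    if not 160 <= e_size <= 32768:
        raise ValueError("environment size must be between 160 and 32768 bytes")
    return environment_bytes(variables) <= e_size


# Hypothetical settings of the kind NEW-VARS.BAT might contain.
sample = {
    "PATH": r"C:\DOS;C:\MASM61\BIN",
    "TMP": r"C:\TMP",
    "INIT": r"C:\MASM61\INIT",
}
print(environment_bytes(sample), fits(sample, 256), fits(sample, 1024))
```

An estimate that approaches the configured size suggests raising the /e value on the SHELL line before adding more SET statements.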
Modifying Your CONFIG.SYS File

The amount of memory available and the configuration for MS-DOS and Windows operating system are controlled by your CONFIG.SYS file, your SYSTEM.INI file, and your .PIF files. If you are not familiar with these files, see your MS-DOS and Windows operating system, version 3.x documentation. SETUP does not modify your current system files. It writes all suggested changes to the NEW-CONF.SYS, NEW-VARS.BAT, and NEW-SYS.INI files in the C:\MASM61\BIN subdirectory. For more information about these files, see page 9. Note Depending on your existing system configuration, some of these changes are necessary for MASM to run on your system. SETUP adds the BUFFERS, FILES, and DEVICE commands to your NEW-CONF.SYS file or modifies the values set for these commands.
BUFFERS
The BUFFERS command in your CONFIG.SYS file specifies the number of buffers that MS-DOS reserves for file transfers. The greater the number of buffers (up to about 50), the faster your system runs. However, increasing the number of buffers past a certain value causes your system to use more memory without increasing speed. Each buffer requires 532 bytes of memory.
Table 3.1 shows the recommended number of buffers in relation to hard disk size.
Table 3.1 Recommended Number of Buffers
Hard-Disk Size        Number of Buffers
Less than 40 MB       20
40-79 MB              30
80-119 MB             40
More than 120 MB      50
If you have SMARTDRV.EXE installed, set BUFFERS to 10. SMARTDRV.EXE provides much of the increase in system speed that setting BUFFERS to a higher value would accomplish.
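The recommendations in Table 3.1 and the 532-byte cost per buffer can be combined into a quick calculation. The following Python sketch is illustrative only; the function names are hypothetical, and because the table does not list a disk of exactly 120 MB, the sketch assumes it falls in the largest category.

```python
BUFFER_BYTES = 532  # memory used by each buffer, per the text


def recommended_buffers(disk_mb, smartdrv_installed=False):
    """Return the BUFFERS value suggested for a hard disk of disk_mb megabytes."""
    if smartdrv_installed:
        return 10  # the text recommends 10 when SMARTDRV.EXE is installed
    if disk_mb < 40:
        return 20
    if disk_mb < 80:
        return 30
    if disk_mb < 120:
        return 40
    return 50  # assumption: exactly 120 MB is treated like "more than 120 MB"


def buffer_memory(buffers):
    """Return the bytes of memory consumed by the given number of buffers."""
    return buffers * BUFFER_BYTES


for size in (20, 60, 100, 200):
    n = recommended_buffers(size)
    print(size, "MB disk: BUFFERS =", n, "uses", buffer_memory(n), "bytes")
```

The calculation makes the trade-off visible: each step up in Table 3.1 costs a further 5,320 bytes of memory.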
FILES
The FILES command sets the number of files MS-DOS can access at the same time. Each file uses 48 bytes of conventional memory. If the number of FILES is set too low, compilations may fail because include files cannot be opened. The default value is 8.
If you will be running MASM in a Windows operating system MS-DOS application window, use the setting recommended in your Windows operating system User's Guide. If you will be running MASM from the MS-DOS command line, this value will depend on the size of your programs. Start with a minimum of 20, and increase it to a higher value if necessary.
DEVICE, DEVICEHIGH
The DEVICE command loads a device driver. The DEVICEHIGH command loads a device driver into upper memory on MS-DOS version 5.0 or later. DEVICEHIGH also requires adding the DOS=UMB command and DEVICE statements for HIMEM.SYS and EMM386.EXE.
Setting DOS=UMB is necessary if you want to load programs and device drivers into upper memory. This command tells MS-DOS to maintain a link to upper memory. HIMEM.SYS must be installed before you specify the DOS command.
Modifying Your .PIF Files

The .PIF file sets the amount of expanded memory and extended memory used by a program that is not Windows-based and enables background execution by default. None of the MASM programs depend on expanded memory, and the Windows operating system uses only extended memory. Therefore, the PWB .PIF file provided with MASM sets expanded memory to 0 and extended memory to -1. Setting extended memory to -1 makes available as much extended memory as possible. The .PIF files for an MS-DOS session should also use these settings, unless you have utilities that require expanded memory.
If you have a memory card that can be configured as expanded or extended memory, set it to extended memory and let the software emulate expanded memory if needed. If you have a memory card that provides expanded memory only, set extended memory to -1. Expanded memory should be set to the amount of expanded memory actually available, either from a memory card or from EMM386.EXE.
If you develop programs under the Windows operating system version 3.x, you can allow background execution with certain exceptions. When profiling or doing timing-dependent tasks, you will want exclusive execution. If you find that background compiles interfere with your foreground work, increase the priority of foreground scheduling by changing the value for Background Priority for Multitasking Options in the PWB .PIF file.
You can also minimize the number of extensions that PWB loads by editing the .PIF file. You can specify the /DA option to suppress automatic loading of all PWB extensions, or you can put a question mark (?) in the Optional Parameter section of the PWB .PIF file. This causes PWB to prompt you for command-line options each time it loads. For more information, see "About Projects" and "About PWB Extensions" from the PWB Table of Contents in the Microsoft Advisor Help system.
Modifying Your SYSTEM.INI File

During installation, SETUP copies the following three lines into the file C:\MASM61\BIN\NEW-SYS.INI. These statements must be added to the [386Enh] section of your SYSTEM.INI file:
DEVICE=C:\MASM61\BIN\DOSXNT.386
DEVICE=C:\MASM61\BIN\CVW1.386
DEVICE=C:\MASM61\BIN\VMB.386
The VMB.386 device line is added only if you choose to have SETUP install PWB. This line is used only by WX.EXE.
Modifying Your TOOLS.INI File

PWB and CodeView both use the TOOLS.INI file. Customization settings for PWB are located in TOOLS.INI, and CodeView looks for information in the [CV] and [CVW] sections of the TOOLS.INI file to do remote debugging. TOOLS.INI needs to be located in the directory pointed to by your INIT environment variable. If you don't have a TOOLS.INI file, copy (do not rename) TOOLS.PRE to TOOLS.INI. SETUP loads TOOLS.PRE to the C:\MASM61\INIT directory.
To load only a subset of the PWB extensions, move the extensions that are not in the load set to a directory that is not on the path. Then you can load these extensions only when necessary. You can modify your TOOLS.INI to load an individual PWB extension selectively. For more information, see the "About TOOLS.INI" entry from the PWB Table of Contents in the Microsoft Advisor Help system, or see Chapter 6, "Customizing PWB," in Environment and Tools.
A TOOLS.INI file for the system that is the target side of a remote debugging session is also useful. For more information on remote debugging, see Chapter 10, "Special Topics," in Environment and Tools.
Using Your DOSXNT.EXE File

DOSXNT.EXE is the MS-DOS extender that allows you to run the MASM assembler. You must make sure that DOSXNT.EXE is located somewhere in the path or in the current working directory (default \MASM61\BIN).
Increasing System Speed

There are several ways to improve system speed and the speed of the programs you use. The following sections explain how to increase system speed by optimizing disk access time and by using the SMARTDRV.EXE disk cache program.
Optimizing Disk Access Time

To decrease disk access time, arrange the directories in the PATH statement of your AUTOEXEC.BAT file so that file searches are as efficient as possible. Executable files that you use often should be in directories at the beginning of your PATH statement. Removing unnecessary files from your hard disk can also decrease disk access time. To optimize disk access time, perform the following:
• Use the CHKDSK /f command to recover lost disk space and then delete the files CHKDSK creates. Do not use CHKDSK when the Windows operating system is running. See your MS-DOS documentation before using CHKDSK.
• Delete obsolete Help files from previous installations of other Microsoft language products, especially UTILERR.HLP and CVW.HLP. See Chapter 21, "Using Help," in Environment and Tools for more information.
• Have SETUP install only the MASM options you need.
Using SMARTDRV.EXE
MASM includes SMARTDRV.EXE version 4.0, a sophisticated block-oriented disk cache program that significantly improves compilation and link times. SMARTDRV.EXE is not required by MASM, but can reduce the amount of time your computer spends reading data from your hard disk. SMARTDRV.EXE replaces the older version, SMARTDRV.SYS, and is compatible with all versions of the Windows operating system version 3.x.
SMARTDRV.EXE sets aside expanded or extended memory as a cache. SMARTDRV.EXE uses this disk cache to store the information read from the hard disk. When an application attempts to read additional information from the hard disk, the SMARTDRV.EXE program supplies the information directly from its cache instead.
If you are using RAMDRIVE.SYS to create one or more RAM drives, and are limiting the memory assigned to SMARTDRV.EXE as a result, you can increase system speed by reassigning some or all of the memory from the RAM drive and adding it to the memory available to SMARTDRV.EXE.
Install SMARTDRV.EXE by placing the following line in your AUTOEXEC.BAT file:
C:\MASM61\BIN\SMARTDRV.EXE
SMARTDRV.EXE automatically loads itself into high memory under MS-DOS version 5.0 if EMM386.EXE is loaded and upper memory blocks are available as a result of a DOS=UMB or DOS=HIGH,UMB command in your CONFIG.SYS file. SMARTDRV.EXE can also be loaded into HMA with third-party memory managers such as 386-Max.
SETUP also checks your CONFIG.SYS file for a DEVICE statement for SMARTDRV.SYS. If SETUP finds a DEVICE statement for SMARTDRV.SYS, it places the following DEVICE statement for SMARTDRV.EXE into your NEW-CONF.SYS file:
DEVICE=C:\MASM61\BIN\SMARTDRV.EXE /DOUBLE_BUFFER
Using Double Buffering

If you have a SCSI (Small Computer System Interface) hard disk controller, you may need to use the double buffering feature of SMARTDRV.EXE. Double buffering provides compatibility for hard disk controllers that cannot work with virtual memory.
For double buffering to be enabled, SMARTDRV.EXE must be specified in both your AUTOEXEC.BAT file and CONFIG.SYS file. The double buffer driver is installed in CONFIG.SYS, and the cache component of SMARTDRV.EXE is installed during execution of AUTOEXEC.BAT. To enable SMARTDRV's double-buffering option, add the following line to the end of your CONFIG.SYS file:
DEVICE=C:\MASM61\BIN\SMARTDRV.EXE /DOUBLE_BUFFER
Some disk controllers do not need double buffering, so using this option when you do not need it results in some penalty in performance. SETUP does not determine if your system needs double buffering. Therefore, once your system is running with SMARTDRV.EXE, type:
SMARTDRV ENTER
at the command-line prompt. SMARTDRV.EXE displays disk cache information as illustrated in the following example:
Microsoft SMARTDrive Disk Cache version 4.0
Copyright 1991,1992 Microsoft Corp.

Cache size: 1,048,576 bytes
Cache size while running the Windows operating system: 1,048,576 bytes

            Disk Caching Status
drive   read cache   write cache   buffering
--------------------------------------------
  A:       yes          no            no
  B:       yes          no            no
  C:       yes          yes           yes
  D:       yes          yes

For help, type Smartdrv /?.
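The buffering column of a status display like the sample can be checked mechanically against the rule given in this section: the CONFIG.SYS DEVICE statement is unnecessary only if every line in the buffering column reads no. The Python sketch below is illustrative and not part of MASM. The function name is hypothetical, the text is assumed to have been captured from the SMARTDRV command, and a drive row with no value in the buffering column is simply skipped rather than interpreted.

```python
def needs_double_buffering(status_text):
    """Return True if any drive row reports 'yes' in the buffering column."""
    for line in status_text.splitlines():
        fields = line.split()
        # Drive rows look like "C: yes yes yes": drive, read cache,
        # write cache, buffering. Other lines do not start with "X:".
        is_drive_row = bool(fields) and len(fields[0]) == 2 and fields[0].endswith(":")
        if is_drive_row and len(fields) >= 4 and fields[3].lower() == "yes":
            return True
    return False


# Drive rows copied from a status display of the kind shown in this section.
sample = """\
drive   read cache   write cache   buffering
--------------------------------------------
  A:       yes          no            no
  B:       yes          no            no
  C:       yes          yes           yes
"""
print(needs_double_buffering(sample))
```

A result of True means at least one drive relies on double buffering, so the DEVICE statement for SMARTDRV.EXE should stay in CONFIG.SYS.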
Note If every line in the Buffering column is No, you do not need the DEVICE statement for SMARTDRV.EXE in your CONFIG.SYS file.
SMARTDRV.EXE always copies data to your hard disk when an application calls the MS-DOS reset disk function. If you want to force data to be written to disk, use the /C command-line option. If you use a non-Microsoft utility to reboot your machine from a batch file, you should make sure you have
SMARTDRV /C
in the batch file prior to the reboot command. Failure to include this line can result in loss of data.
You can use command-line options to control the size of the cache element (/E) and the size of the read-ahead buffer (/B). The read-ahead buffer is additional information that SMARTDRV.EXE reads when the application reads information from the hard disk. The size must be specified in bytes, and the element size must be one of the following: 1024, 2048, 4096, or 8192. The read-ahead buffer must be a multiple of the element size, cannot be less than the element size, and cannot exceed 32768. The defaults are 8192 for the element size and 16384 for the read-ahead buffer. Because these will occupy conventional or upper memory, making them larger reduces the available memory for MS-DOS applications.
You can start the SMARTDRV.EXE program by typing SMARTDRV at the MS-DOS prompt before you start the Windows operating system, or by placing a command line in your AUTOEXEC.BAT file. The syntax is:
[[drive:]][[path]]SMARTDRV.EXE [[ [[drive[[+|-]] ]]...]] [[/E:ElementSize]] [[InitCacheSize [[WinCacheSize]] ]] [[/B:BufferSize]] [[/C]] [[/R]] [[/L]] [[/Q]] [[/S]] [[/?]]
The following list describes the command-line options available for SMARTDRV.EXE:
Option
    Description

drive
    Specify the letter of the disk drive for which you want to control disk caching. If you don't specify a drive letter, floppy disk drive read operations are cached but write operations are not, hard disk drive read and write operations are cached, and CD-ROM and network drives are ignored. You can specify multiple disk drives.

path
    Specify the location of the SMARTDRV.EXE file.

+|-
    Enable (+) or disable (-) disk caching. Use the plus (+) and minus (-) signs to override the default settings. If you specify a drive letter without a plus or minus sign, read operations are cached and write operations are not. If you specify a drive letter followed by a plus sign (+), read and write operations are both cached. If you specify a drive letter followed by a minus sign (-), neither read nor write operations are cached.

/E:ElementSize
    Specify in bytes the amount of the disk cache SMARTDRV.EXE moves at a time. This must be greater than or equal to 1, and a power of 2. The default value is 8K.

InitCacheSize
    Specify the size in kilobytes of the disk cache when SMARTDRV.EXE starts (before the Windows operating system is running). The size of the disk cache affects SMARTDRV.EXE's efficiency. In general, the larger the disk cache, the less often SMARTDRV.EXE needs to read information from the disk, which speeds up your system's performance. If you do not specify an InitCacheSize value, SMARTDRV.EXE sets the value according to how much memory your system has (see Table 3.2).

WinCacheSize
    Limit the amount (in kilobytes) the Windows operating system can reduce the disk cache size. The Windows operating system reduces the size of the disk cache to recover memory for its own use. The Windows operating system and SMARTDRV.EXE cooperate to provide optimum use of your system memory. When you exit the Windows operating system, it restores the disk cache to its normal size. The default value depends on how much available memory your system has (see Table 3.2). If you specify a value for InitCacheSize that is smaller than the value specified for WinCacheSize, InitCacheSize is set to the same size as WinCacheSize.

/B:BufferSize
    Specify the size of the read-ahead buffer. A read-ahead buffer is additional information that SMARTDRV.EXE reads when an application reads information from the hard disk. The next time the application is to read information from that file, it can read it from memory instead. The default size of the buffer is 16K. Its value can be any multiple of ElementSize.

/C
    Write all cached information from memory to the hard disk. SMARTDRV.EXE writes information from memory to the hard disk when other disk activity has slowed. You might use this option if you are going to turn off your computer, and you want to make sure all information has been written to the hard disk.

/R
    Clear the contents of the existing disk cache and restart SMARTDRV.EXE.

/L
    Prevent SMARTDRV.EXE from loading into upper memory blocks (UMBs), even if there are UMBs available. You can use this option if you are using MS-DOS version 5.0 or later and UMBs are enabled.

/Q
    Prevent the display of SMARTDRV.EXE information on your screen.

/S
    Display additional information about the status of SMARTDRV.EXE.

/?
    Display online Help about the SMARTDRV.EXE command and options.
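The size rules given for the cache element (/E) and the read-ahead buffer (/B) can be checked before a command line is placed in AUTOEXEC.BAT. The following Python sketch is only an illustration of those stated rules; the function and constant names are hypothetical, and it follows the list of four permitted element sizes given with the syntax description rather than the looser "power of 2" wording in the option list.

```python
# Limits quoted in the text; the names used here are illustrative only.
VALID_ELEMENT_SIZES = (1024, 2048, 4096, 8192)  # bytes
MAX_READ_AHEAD = 32768                          # bytes


def check_smartdrv_sizes(element_size=8192, buffer_size=16384):
    """Return a list of rule violations; an empty list means the sizes are acceptable."""
    if element_size not in VALID_ELEMENT_SIZES:
        # Stop here: the remaining rules are stated relative to a valid element size.
        return ["element size must be 1024, 2048, 4096, or 8192"]
    problems = []
    if buffer_size < element_size:
        problems.append("read-ahead buffer cannot be less than the element size")
    if buffer_size % element_size != 0:
        problems.append("read-ahead buffer must be a multiple of the element size")
    if buffer_size > MAX_READ_AHEAD:
        problems.append("read-ahead buffer cannot exceed 32768")
    return problems


def smartdrv_command(element_size=8192, buffer_size=16384):
    """Build a SMARTDRV command line that uses the /E and /B options."""
    problems = check_smartdrv_sizes(element_size, buffer_size)
    if problems:
        raise ValueError("; ".join(problems))
    return f"SMARTDRV /E:{element_size} /B:{buffer_size}"


print(smartdrv_command(4096, 16384))
print(check_smartdrv_sizes(4096, 6000))
```

The defaults of the two functions are the documented defaults (8192 and 16384), so calling check_smartdrv_sizes() with no arguments reports no problems.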
Table 3.2 shows the default values for InitCacheSize and WinCacheSize, depending on the amount of available extended memory on your computer.
Table 3.2 Default Values for InitCacheSize and WinCacheSize

Extended Memory    InitCacheSize          WinCacheSize
Up to 1 MB         All extended memory    Zero (no caching)
Up to 2 MB         1 MB                   256K
Up to 4 MB         1 MB                   512K
Up to 6 MB         2 MB                   1 MB
6 MB or more       2 MB                   2 MB
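The defaults in Table 3.2 amount to a simple lookup on the amount of extended memory. The Python sketch below is a hedged illustration of that lookup, not a description of how SMARTDRV.EXE computes its defaults. The names are hypothetical, and it assumes that "Up to N MB" means strictly less than N MB, so that exactly 6 MB falls in the "6 MB or more" row; the table itself does not say how the boundaries are treated.

```python
# Rows of Table 3.2 as (exclusive upper bound in K, InitCacheSize in K, WinCacheSize in K).
# None for InitCacheSize stands for "all extended memory".
TABLE_3_2 = [
    (1024, None, 0),
    (2048, 1024, 256),
    (4096, 1024, 512),
    (6144, 2048, 1024),
]
SIX_MB_OR_MORE = (2048, 2048)


def default_cache_sizes(extended_kb):
    """Return (InitCacheSize, WinCacheSize) in K for the given extended memory in K."""
    for upper_bound_kb, init_kb, win_kb in TABLE_3_2:
        if extended_kb < upper_bound_kb:  # assumption: "Up to" excludes the bound itself
            return (extended_kb if init_kb is None else init_kb, win_kb)
    return SIX_MB_OR_MORE


for kb in (512, 3072, 8192):
    print(kb, default_cache_sizes(kb))
```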
Note Do not put the SMARTDRV.EXE disk cache in the expanded memory provided by EMM386.EXE. EMM386.EXE uses extended memory to emulate expanded memory that other programs can use. Although SMARTDRV.EXE can use this emulated expanded memory for its cache, it may not make your program run as quickly as it would using extended memory. Also, earlier versions of SMARTDRV.EXE allowed you to use the /a switch to direct SMARTDRV to use expanded memory. This function is no longer valid with the current version.

The optimal disk cache size for SMARTDRV.EXE depends on the programs you run and your system configuration. You should experiment to find the best disk cache size for your system, after you have saved a copy of your CONFIG.SYS file. For more information, see your Windows operating system User's Guide.
Using RAMDRIVE.SYS
If you don't want to use SMARTDRV.EXE, or if you have a large amount of memory, use RAMDRIVE.SYS to create a disk partition in RAM. To use it for assembler temporary files, set the TMP environment variable to the drive and directory of the RAM disk. The minimum recommended size for a RAM disk is 1 MB.
Optimizing Available Memory

You may need to experiment to find the right memory layout to ensure that all your MS-DOS applications have enough memory to run. The optimal memory layout may change according to the task to be performed. Therefore, you may need to change your CONFIG.SYS file, depending on what you need to
optimize. This section provides general information for making more memory available on your system when necessary.
Understanding Memory Requirements

Your CONFIG.SYS file includes statements that define how your system uses memory. This section describes the memory requirements for MASM components.

To run CodeView, LINK, ML, or CVPACK, you need an XMS, DPMI, or VCPI server. HIMEM.SYS provides XMS services; EMM386.EXE provides VCPI services; the Windows operating system version 3.x in enhanced mode provides DPMI services. CodeView requires expanded memory when running as a VCPI application. If neither DPMI nor VCPI service is available, CodeView uses extended memory provided by HIMEM.SYS.

If PWB cannot allocate enough memory, it may be unable to load all the extensions you selected, list boxes may come up empty, and performance may be slower than usual. Try to make as much conventional and extended memory available as possible when running PWB. PWB will use expanded memory if it is available, but will run faster using extended memory.
Determining Memory Availability

To determine the size and type of memory in your system, use the MS-DOS MEM command to display the amount of used and free memory, list allocated and free memory areas, and list programs that are loaded. (The MEM command is available in MS-DOS version 4.x or later.) Type MEM at the command-line prompt when the Windows operating system is not running:
MEM /c | more
The MEM command does not report the contents of upper memory if you have the Windows operating system loaded. Use the /c option to tell MS-DOS to display the status of programs loaded into conventional memory and upper memory. (The /c option is only available in MS-DOS version 5.0 or later.)

MS-DOS displays three columns of information about the programs currently using system memory: Name, Size in Decimal, and Size in Hex. See your MS-DOS documentation for more information about the MEM command.
You can use the CHKDSK command to check the amount of free conventional memory if you are using an MS-DOS version earlier than version 4.0. You can also examine system configuration with the MSD.EXE utility provided with MASM. See MSD.TXT in C:\MASM61\BIN for information about running MSD.EXE.
Freeing Conventional Memory

If you have MS-DOS version 5.0 or later on your system and have followed recommendations in the section on HIMEM.SYS, you should have MS-DOS running in extended memory. The MS-DOS (version 5.0 or later only) SETUP program installs MS-DOS so that it runs in the first 64K of extended memory, called the high memory area (HMA). HIMEM.SYS or another XMS driver must be loaded before you can load MS-DOS into the high memory area. There are several other ways to free conventional memory:
• Run device drivers and other memory-resident programs in upper memory (see pages 17 and 24).
• Don't start unnecessary memory-resident programs. Learn the purpose of each statement in your CONFIG.SYS and AUTOEXEC.BAT files so you know what programs are loaded (see your MS-DOS documentation).
• Include DEVICE commands in CONFIG.SYS only for device drivers you really need.
• If your system has expanded-memory hardware, include a DEVICE command for the expanded-memory manager that comes with the memory board.
• Configure memory as extended, not expanded, if possible.
In some situations involving the /Zi option, MASM requires 500K of free conventional memory for a successful assembly. This can be resolved by assembling and linking in separate steps (that is, by using NMAKE). For more information on the /Zi option, see the Programmer's Guide. For more information on using NMAKE, see Environment and Tools.
Enabling Extended Memory with HIMEM.SYS

HIMEM.SYS, a memory manager provided in the MASM package, is required by MASM version 6.1. This program provides access to extended memory and ensures that two programs do not use the same part of extended memory at the same time. HIMEM.SYS conforms to the Lotus/Intel/Microsoft (LIM) Extended Memory Standard (XMS), version 2.0.
If you do not have a memory manager installed, SETUP adds the following statement to your NEW-CONF.SYS file and copies HIMEM.SYS to your C:\MASM61\BIN subdirectory:
DEVICE=C:\MASM61\BIN\HIMEM.SYS
The DEVICE command for HIMEM.SYS enables the use of extended memory. This command must be included in your CONFIG.SYS file before any commands that start device drivers, or any programs that use extended memory such as RAMDRIVE.SYS and EMM386.EXE.

Note Some systems require the use of the HIMEM.SYS /m switch. Check your computer's operations guide and MS-DOS manual for details.

To access the maximum amount of conventional memory, use high memory and upper memory as much as possible. To move MS-DOS out of conventional memory and to enable access to upper memory (if you have MS-DOS version 5.0 or later), your CONFIG.SYS file needs the following statements:

DEVICE=C:\MASM61\BIN\HIMEM.SYS
DOS=HIGH,UMB

This loads MS-DOS into the high memory area (HMA), enables the use of upper memory, and lets you load device drivers and TSRs into upper memory, thus keeping a large amount of conventional memory available.

Note If you are not using a Microsoft memory manager, you may need to remove HIMEM.SYS from your system. Follow the manufacturer's instructions.
Freeing Extended Memory

If you are having difficulty running a program that requires additional extended memory, use the MEM command or MSD.EXE to make sure your system has enough extended memory to run the program. See MSD.TXT for a description of MSD.EXE. To minimize use of extended memory:
• Make sure your CONFIG.SYS and AUTOEXEC.BAT files are not loading unnecessary programs or device drivers that are using extended memory. To verify what a particular device driver does, see your MS-DOS or Windows operating system documentation.
• Reduce the amount of extended memory you allocate for each device driver by modifying the DEVICE command for each.
Freeing Expanded Memory

If a program does not run because there is not enough expanded memory, first make sure that your system contains as much physical expanded memory as the program needs. Also, make sure that it has enough extended memory to emulate expanded memory by checking your memory layout with the MEM command. Then check that your CONFIG.SYS and AUTOEXEC.BAT files aren't starting unnecessary programs that use expanded memory. If you want to free more expanded memory, try the following:

• Use EMM386.EXE to provide more expanded memory.
• Check that the devices loaded with the DEVICE command in your CONFIG.SYS file aren't using all of your expanded memory. Then reduce the amount of expanded memory being allocated, or disable unnecessary DEVICE commands.
Using EMM386.EXE as an Expanded Memory Emulator

The EMM386.EXE device driver included with MASM can use extended memory to emulate expanded memory on 80386 and 80486 systems. EMM386.EXE can also function as an upper memory manager (see page 36). EMM386.EXE can be run under MS-DOS version 3.x or later. Expanded memory boards and managers conform to the Lotus/Intel/Microsoft (LIM) Expanded Memory Standard (EMS) version 3.2 or 4.0.

Note Do not use EMM386.EXE with an expanded memory manager from another manufacturer.

EMM386.EXE provides expanded memory for systems that have only extended memory. EMM386.EXE requires about 80K of extended memory, in addition to the memory used to emulate expanded memory. Thus expanded memory is provided at the cost of extended memory. The typical NEW-CONF.SYS statement for EMM386.EXE is:
DEVICE=C:\MASM61\BIN\EMM386.EXE NOEMS
If you specify the NOEMS option, CodeView and CVPACK will not run outside of the Windows operating system. If you want to run MS-DOS-extended programs
both within and outside of the Windows operating system, change the DEVICE statement in your CONFIG.SYS file as follows:
DEVICE=C:\MASM61\BIN\EMM386.EXE 2048 RAM
This statement installs EMM386.EXE with an allocation of 2048K of memory. Make sure this command comes after the DEVICE command for HIMEM.SYS and before any commands for device drivers that use the high memory area (device drivers loaded with the DEVICEHIGH command).

If you are currently using EMM386.SYS, an older version of this memory manager, replace it with EMM386.EXE and adjust the filename in your CONFIG.SYS file.

You can exclude certain blocks or ranges of memory from the memory you allocate to EMM386.EXE. This prevents EMM386.EXE from using ranges of memory reserved for device drivers (such as hard disk cards, net cards, or video display) that use memory above 1 MB. Use the x= option for this. The following example excludes the memory for a hard card or a net card. (The exact memory locations will vary, depending on your computer and the memory cards you have installed.)
DEVICE=C:\MASM61\BIN\EMM386.EXE RAM X=C000-CDFF X=D800-DFFF
Make sure that the memory area you specify is the correct range of addresses for your system before enabling this statement in your CONFIG.SYS file.

Note Use EMM386.EXE options carefully. Always save a copy of your CONFIG.SYS file when experimenting with the x= option.

You can use the MEM command to determine the hex locations and sizes of the device drivers you have loaded. This will enable you to fine-tune the X= setting for EMM386.EXE. (For information on the MEM command, see page 32 or your MS-DOS Reference.)
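When fine-tuning an X= range, it can help to know how much memory a given range covers. The Python sketch below is a hedged illustration: it assumes that the X= values are real-mode segment addresses, each standing for one 16-byte paragraph, which this chapter does not state explicitly. The function name is hypothetical.

```python
def excluded_bytes(range_text):
    """Return the size in bytes of an X= range such as 'C000-CDFF'.

    Assumption: both ends are segment addresses, so each value covers 16 bytes
    and the range is inclusive at both ends.
    """
    start_text, end_text = range_text.split("-")
    start, end = int(start_text, 16), int(end_text, 16)
    if end < start:
        raise ValueError("range end is below range start")
    return (end - start + 1) * 16


for text in ("C000-CDFF", "D800-DFFF"):
    size = excluded_bytes(text)
    print(f"X={text} excludes {size} bytes ({size // 1024}K)")
```

Under that assumption, the two ranges in the example statement cover 56K and 32K respectively.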
Using EMM386.EXE to Manage Upper Memory

Upper memory is the region of your 80386 or 80486 computer's memory that is used by the system. Parts of upper memory that are not used are called upper memory blocks (UMBs). You can use UMBs for running device drivers and other memory-resident programs that make more conventional memory available for running programs. The MS-DOS upper memory manager is
EMM386.EXE. You need to use MS-DOS 4.x or higher for EMM386.EXE to access upper memory.

Some programs cannot be moved to upper memory. These include HIMEM.SYS, EMM386.EXE, and MS-DOS system data. Programs such as DOSKEY, SHARE, FASTOPEN, RAMDRIVE.SYS, console and other device drivers are good choices for loading into upper memory. You can also load your TSRs into high memory if your MS-DOS version supports it.

Note The only way to find out if a program can run in upper memory is to try it. Some programs do not run properly in upper memory. If the program does not execute correctly, or if the system locks up, run it in conventional memory.

To run programs in upper memory, you must include the following commands in your CONFIG.SYS file for loading HIMEM.SYS and EMM386.EXE:

DEVICE=C:\MASM61\BIN\HIMEM.SYS
DEVICE=C:\MASM61\BIN\EMM386.EXE NOEMS
The DEVICE command for EMM386.EXE installs EMM386.EXE as an upper memory manager. The NOEMS option tells MS-DOS to run EMM386.EXE to manage upper memory only. Since NOEMS prevents EMM386.EXE from emulating expanded memory, use NOEMS only if your programs do not require expanded memory. The NOEMS option for EMM386.EXE is the most efficient setting for the Windows operating system, but MS-DOS-extended programs in MASM (CodeView, CVPACK, and LINK) fail if you run them from MS-DOS without the Windows operating system. Use this command:
DEVICE=C:\DOS\EMM386.EXE RAM
if you want to use EMM386.EXE both as the upper memory manager and to emulate expanded memory.

Note The Microsoft Windows operating system will be unable to allocate expanded memory to programs that need it if you specify the NOEMS option when installing EMM386.EXE. If you use such programs, use the RAM option (or no options) instead.
Put the DEVICE command for HIMEM.SYS before the DEVICE command for EMM386.EXE. The DEVICE commands for HIMEM.SYS and EMM386.EXE must appear before any other DEVICE commands. If MS-DOS runs in the high memory area, your CONFIG.SYS file will have DOS=HIGH,UMB instead of DOS=UMB.

To load programs into upper memory, check your memory layout by executing the MEM command. At the end of the output from the MEM /c command, note the size in the line "Largest available upper memory block." Then look in the Conventional Memory section of the output and find the largest device driver or program that will fit into that upper memory block (UMB). Change the command in the CONFIG.SYS file for that device driver from DEVICE to DEVICEHIGH. For memory-resident programs, change the command in the AUTOEXEC.BAT file so that it begins with LOADHIGH. Do this for one program at a time. You must restart your system each time.

If you get an error with one of the programs you have loaded into upper memory, or the program or device driver is still running in conventional memory after you restart your system, it may be that the largest UMB is not large enough. Some programs require more memory when they are loaded than when they are running.

Try using the SIZE= option with the DEVICEHIGH command (see the information in your MS-DOS documentation on the DEVICE command). Modify the DEVICEHIGH command in your CONFIG.SYS file to specify the hexadecimal size of the driver from the Size in Hex column of the MEM output, and restart your computer. For example, if the information in the Size in Hex column from the MEM command output for MOUSE.SYS is 39E0, you would put this statement in your CONFIG.SYS:
DEVICEHIGH SIZE=39E0 C:\WIN3\MOUSE.SYS
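The Size in Hex value used in the statement above can be converted to decimal to compare it against the "Largest available upper memory block" figure. The following Python sketch is only an illustration; the function names and the sample UMB sizes are hypothetical, and because some programs need more memory while loading than while running, a passing comparison is a first approximation rather than a guarantee.

```python
def devicehigh_line(driver_path, size_hex):
    """Build a DEVICEHIGH statement in the form shown above."""
    int(size_hex, 16)  # raises ValueError if the text is not hexadecimal
    return f"DEVICEHIGH SIZE={size_hex.upper()} {driver_path}"


def fits_in_umb(size_hex, largest_umb_bytes):
    """Compare a Size in Hex value with the largest free UMB, given in bytes."""
    return int(size_hex, 16) <= largest_umb_bytes


print(int("39E0", 16))                        # size of the driver in bytes
print(devicehigh_line(r"C:\WIN3\MOUSE.SYS", "39E0"))
print(fits_in_umb("39E0", 16 * 1024))         # hypothetical 16K UMB
```

For the MOUSE.SYS example, 39E0 hexadecimal is 14816 bytes, so the driver would fit into a hypothetical 16K UMB but not into an 8K one.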
The SIZE= option takes effect only if needed. If using the SIZE= option doesn't allow your program to run, or if your system locks up during startup or when running the program, it is likely that the program cannot run in upper memory. Change the DEVICEHIGH command to DEVICE and remove LOADHIGH commands one at a time until the program works correctly.

Some hardware programs might attempt to use upper memory after EMM386.EXE has determined this memory is available for running device drivers and programs. To avoid this, you can use the x= option when you load EMM386.EXE. This option prevents EMM386.EXE from allocating a specified range of upper memory for its use. For example, to prevent EMM386.EXE
from using the addresses D800h through DFFFh for UMB, you can include the following command in your CONFIG.SYS file:
DEVICE=C:\DOS\EMM386.EXE NOEMS x=D800-DFFF
If you think your computer is set up correctly to run device drivers and programs in upper memory, but nothing appears there when you use the MEM /c command, check the following:
• Make sure you are not running the Windows operating system version 3.x in 386-Enhanced mode when you execute the MEM command. The MEM command does not report the contents of upper memory when you are running the Windows operating system.
• Your CONFIG.SYS file must contain the DOS=UMB or DOS=HIGH,UMB command.
• The DEVICE command for EMM386.EXE in your CONFIG.SYS file must contain the NOEMS or RAM option. RAM is the default.
• Your CONFIG.SYS file must contain a DEVICEHIGH command, or your AUTOEXEC.BAT file must contain the LOADHIGH command for each program you want to run in upper memory.
• The DEVICE command for HIMEM.SYS must appear before the DEVICE command for EMM386.EXE; the DEVICE command for EMM386.EXE must appear before any DEVICEHIGH command in your CONFIG.SYS file.
Once programs are working successfully in upper memory, you can experiment to find the most efficient way to use available memory. In general, load device drivers and programs in order of size, from largest to smallest. Do this because MS-DOS loads each program into the largest remaining UMB, even if that program would fit into a smaller UMB. The optimal loading order depends on the sizes of the programs you are loading and the sizes of the available UMBs.
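The effect of loading order can be seen in a small simulation of the rule just described. The Python sketch below is a simplified model, not a description of the MS-DOS loader: the UMB and program sizes are invented, the names are hypothetical, and it assumes that a program that does not fit into the largest remaining UMB stays in conventional memory and that the space left over in a UMB remains usable.

```python
def place_in_umbs(umb_sizes, programs):
    """Place programs, in the order given, into the largest remaining UMB.

    umb_sizes: list of free UMB sizes (K); programs: list of (name, size in K).
    Returns (names loaded into upper memory, names left in conventional memory).
    """
    free = list(umb_sizes)
    in_upper, in_conventional = [], []
    for name, size in programs:
        # Index of the largest remaining block, or None if there are no UMBs.
        largest = max(range(len(free)), key=free.__getitem__, default=None)
        if largest is not None and size <= free[largest]:
            free[largest] -= size
            in_upper.append(name)
        else:
            in_conventional.append(name)
    return in_upper, in_conventional


umbs = [40, 16]                                # hypothetical UMB sizes in K
progs = [("A", 30), ("B", 12), ("C", 10)]      # hypothetical program sizes in K

largest_first = sorted(progs, key=lambda p: p[1], reverse=True)
smallest_first = sorted(progs, key=lambda p: p[1])
print(place_in_umbs(umbs, largest_first))
print(place_in_umbs(umbs, smallest_first))
```

With these sample sizes, loading from largest to smallest fits all three programs into upper memory, while loading from smallest to largest uses up the 40K block first and leaves the 30K program in conventional memory.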
Other DPMI Servers

Though MASM version 6.1 will make use of any memory allocated by a DPMI or VCPI server, it does not require the presence of a DPMI server. MASM does require an XMS server, such as HIMEM.SYS.
Optimization Summary
Table 3.3, Summary of Optimization Methods, lists the different optimization methods and how they are used.
Table 3.3 Summary of Optimizing Methods

Method: Use HIMEM.SYS.
When to Use: Required for MASM version 6.1.
Memory Used: Conventional

Method: Run MS-DOS in extended memory.
When to Use: If your system has extended memory.
Memory Used: High memory (HMA)

Method: Use EMM386.EXE as an expanded memory emulator.
When to Use: If your system has extended memory and your programs need expanded memory.
Memory Used: Extended and conventional

Method: Make sure CONFIG.SYS and AUTOEXEC.BAT are not loading unnecessary programs or device drivers.
When to Use: To free memory.
Memory Used: Depends on the programs you remove

Method: Run device drivers and programs in upper memory.
When to Use: To free memory.
Memory Used: Upper memory

Method: Use SMARTDRV.EXE. (Don't use with a secondary buffer cache.)
When to Use: If your system has extended or expanded memory that isn't needed by programs.
Memory Used: All programs

Method: Use RAMDRIVE.SYS.
When to Use: If your system has extended or expanded memory and you use programs that RAMDRIVE.SYS optimizes.
Memory Used: Programs that use temporary files or programs that you run often
Reference
Microsoft MASM
Assembly-Language Development System Version 6.1 For MS-DOS and Windows Operating System
1987, 1991, 1992 Microsoft Corporation. All rights reserved.
Microsoft, MS, MS-DOS, XENIX, CodeView, and QuickC are registered trademarks and Windows and Windows NT are trademarks of Microsoft Corporation in the USA and other countries.
U.S. Patent No. 4955066
IBM is a registered trademark of International Business Machines Corporation.
Intel is a registered trademark and 386, 387, 486 are trademarks of Intel Corporation.

Timings and encodings in this manual are used with permission of Intel and come from the following publications:

Intel Corporation, iAPX 86, 88, 186, and 188 User's Manual, Programmer's Reference. Santa Clara, Calif. 1985.
Intel Corporation, iAPX 286 Programmer's Reference Manual including the iAPX 286 Numeric Supplement. Santa Clara, Calif. 1985.
Intel Corporation. 80386 Programmer's Reference Manual. Santa Clara, Calif. 1986.
Intel Corporation. 80387 80-bit CHMOS III Numeric Processor Extension. Santa Clara, Calif. 1987.
Intel Corporation. i486 Microprocessor Data Sheet. Santa Clara, Calif. 1989.
Document No. DB35749-1292 Printed in the United States of America.
Contents
Introduction
  Document Conventions
Chapter 1 Tools
  Microsoft CodeView Debugger
  CVPACK
  EXEHDR
  EXP
  HELPMAKE
  H2INC
  IMPLIB
  LIB
  LINK
  MASM
  ML
  NMAKE
  PWB (Programmer's WorkBench)
  PWBRMAKE
  QuickHelp
  RM
  UNDEL
Chapter 2 Directives
  Topical Cross-reference for Directives
  Directives
Chapter 3 Symbols and Operators
  Topical Cross-reference for Symbols
  Topical Cross-reference for Operators
  Predefined Symbols
  Operators
  Run-Time Operators
Chapter 4 Processor
  Topical Cross-reference for Processor Instructions
  Interpreting Processor Instructions
    Flags
    Clock Speeds
      Timings on the 8088 and 8086 Processors
      Timings on the 80286-80486 Processors
  Interpreting Encodings
  Interpreting 80386/486 Encoding Extensions
    16-Bit Encoding
    32-Bit Encoding
      Address-Size Prefix
      Operand-Size Prefix
      Encoding Differences for 32-Bit Operations
      Scaled Index Base Byte
  Instructions
    AAA ASCII Adjust After Addition
    AAD ASCII Adjust Before Division
    AAM ASCII Adjust After Multiply
    AAS ASCII Adjust After Subtraction
    ADC Add With Carry
    ADD Add
    AND Logical AND
    ARPL Adjust Requested Privilege Level
    BOUND Check Array Bounds
    BSF/BSR Bit Scan
    BSWAP Byte Swap
    BT/BTC/BTR/BTS Bit Tests
    CALL Call Procedure
    CBW Convert Byte to Word
    CDQ Convert Double to Quad
    CLC Clear Carry Flag
    CLD Clear Direction Flag
    CLI Clear Interrupt Flag
    CLTS Clear Task Switched Flag
    CMC Complement Carry Flag
    CMP Compare Two Operands
    CMPS/CMPSB/CMPSW/CMPSD Compare String
    CMPXCHG Compare and Exchange
    CWD Convert Word to Double
    CWDE Convert Word to Extended Double
    DAA Decimal Adjust After Addition
    DAS Decimal Adjust After Subtraction
    DEC Decrement
DIV Unsigned Divide
ENTER Make Stack Frame
HLT Halt
IDIV Signed Divide
IMUL Signed Multiply
IN Input from Port
INC Increment
INS/INSB/INSW/INSD Input from Port to String
INT Interrupt
INTO Interrupt on Overflow
INVD Invalidate Data Cache
INVLPG Invalidate TLB Entry
IRET/IRETD Interrupt Return
Jcondition Jump Conditionally
JCXZ/JECXZ Jump if CX is Zero
JMP Jump Unconditionally
LAHF Load Flags into AH Register
LAR Load Access Rights
LDS/LES/LFS/LGS/LSS Load Far Pointer
LEA Load Effective Address
LEAVE High Level Procedure Exit
LES/LFS/LGS Load Far Pointer to Extra Segment
LGDT/LIDT/LLDT Load Descriptor Table
LMSW Load Machine Status Word
LOCK Lock the Bus
LODS/LODSB/LODSW/LODSD Load Accumulator from String
LOOP/LOOPW/LOOPD Loop
LOOPcondition/LOOPconditionW/LOOPconditionD Loop Conditionally
LSL Load Segment Limit
LSS Load Far Pointer to Stack Segment
LTR Load Task Register
MOV Move Data
MOV Move to/from Special Registers
MOVS/MOVSB/MOVSW/MOVSD Move String Data
MOVSX Move with Sign-Extend
MOVZX Move with Zero-Extend
MUL Unsigned Multiply
NEG Two's Complement Negation
NOP No Operation
NOT One's Complement Negation
OR Inclusive OR
OUT Output to Port
OUTS/OUTSB/OUTSW/OUTSD Output String to Port
POP Pop
POPA/POPAD Pop All
POPF/POPFD Pop Flags
PUSH/PUSHW/PUSHD Push
PUSHA/PUSHAD Push All
PUSHF/PUSHFD Push Flags
RCL/RCR/ROL/ROR Rotate
REP Repeat String
REPcondition Repeat String Conditionally
RET/RETN/RETF Return from Procedure
ROL/ROR Rotate
SAHF Store AH into Flags
SAL/SAR Shift
SBB Subtract with Borrow
SCAS/SCASB/SCASW/SCASD Scan String Flags
SETcondition Set Conditionally
SGDT/SIDT/SLDT Store Descriptor Table
SHL/SHR/SAL/SAR Shift
SHLD/SHRD Double Precision Shift
SMSW Store Machine Status Word
STC Set Carry Flag
STD Set Direction Flag
STI Set Interrupt Flag
STOS/STOSB/STOSW/STOSD Store String Data
STR Store Task Register
SUB Subtract
TEST Logical Compare
VERR/VERW Verify Read or Write
WAIT Wait
WBINVD Write Back and Invalidate Data Cache
XADD Exchange and Add
XCHG Exchange
XLAT/XLATB Translate
XOR Exclusive OR

Chapter 5 Coprocessor
Topical Cross-reference for Coprocessor Instructions
Interpreting Coprocessor Instructions
F2XM1 2^X - 1
FABS Absolute Value
FADD/FADDP/FIADD Add
FBLD Load BCD
FBSTP Store BCD and Pop
FCHS Change Sign
FCLEX/FNCLEX Clear Exceptions
FCOM/FCOMP/FCOMPP/FICOM/FICOMP Compare
FCOS Cosine
FDECSTP Decrement Stack Pointer
FDISI/FNDISI Disable Interrupts
FDIV/FDIVP/FIDIV Divide
FDIVR/FDIVRP/FIDIVR Divide Reversed
FENI/FNENI Enable Interrupts
FFREE Free Register
FIADD/FISUB/FISUBR/FIMUL/FIDIV/FIDIVR Integer Arithmetic
FICOM/FICOMP Compare Integer
FILD Load Integer
FINCSTP Increment Stack Pointer
FINIT/FNINIT Initialize Coprocessor
FIST/FISTP Store Integer
FLD/FILD/FBLD Load
FLD1/FLDZ/FLDPI/FLDL2E/FLDL2T/FLDLG2/FLDLN2 Load Constant
FLDCW Load Control Word
FLDENV/FLDENVW/FLDENVD Load Environment State
FMUL/FMULP/FIMUL Multiply
FNinstruction No-Wait Instructions
FNOP No Operation
FPATAN Partial Arctangent
FPREM Partial Remainder
FPREM1 Partial Remainder (IEEE Compatible)
FPTAN Partial Tangent
FRNDINT Round to Integer
FRSTOR/FRSTORW/FRSTORD Restore Saved State
FSAVE/FSAVEW/FSAVED/FNSAVE/FNSAVEW/FNSAVED Save Coprocessor State
FSCALE Scale
FSETPM Set Protected Mode
FSIN Sine
FSINCOS Sine and Cosine
FSQRT Square Root
FST/FSTP/FIST/FISTP/FBSTP Store
FSTCW/FNSTCW Store Control Word
FSTENV/FSTENVW/FSTENVD/FNSTENV/FNSTENVW/FNSTENVD Store Environment State
FSTSW/FNSTSW Store Status Word
FSUB/FSUBP/FISUB Subtract
FSUBR/FSUBRP/FISUBR Subtract Reversed
FTST Test for Zero
FUCOM/FUCOMP/FUCOMPP Unordered Compare
FWAIT Wait
FXAM Examine
FXCH Exchange Registers
FXTRACT Extract Exponent and Significand
FYL2X Y log2(X)
FYL2XP1 Y log2(X+1)

Chapter 6 Macros
Introduction
BIOS.INC
CMACROS.INC, CMACROS.NEW
MS-DOS.INC
MACROS.INC
PROLOGUE.INC
WIN.INC

Chapter 7 Tables
ASCII Codes
Key Codes
MS-DOS Program Segment Prefix (PSP)
Color Display Attributes
Hexadecimal-Binary-Decimal Conversion
Introduction
This Microsoft Macro Assembler Reference lists all MASM instructions, directives, statements, and operators. It also serves as a quick reference to the Programmer's WorkBench commands and to the commands for Microsoft utilities such as LINK and LIB. This book documents features of MASM version 6.1 and is part of a complete MASM documentation set. Other titles in the set are:

Getting Started
Explains how to perform all the tasks necessary to install and begin running MASM 6.1 on your system.

Environment and Tools
Describes the development tools that are included with MASM 6.1: the Programmer's WorkBench, CodeView debugger, LINK, EXEHDR, NMAKE, LIB, and other tools and utilities. A detailed tutorial on the Programmer's WorkBench teaches the basics of creating and debugging MASM code in this full-featured programming environment. A complete list of utilities and error messages generated by ML is also included.

Programmer's Guide
Provides information for experienced assembly-language programmers on the features of the MASM 6.1 language. The appendixes cover the differences between MASM 5.1, MASM 6.0, and MASM 6.1, and the Backus-Naur Form for grammar notation to use in determining the syntax for any MASM language component.
Filename: LMARFINT.DOC Template: MSGRIDA1.DOT Revision #: 16 Page: 9 of 2
Project: Author: Terri Sharkey Last Saved By: Launi Lockard Printed: 10/02/00 04:14 PM
The following document conventions are used throughout this book:

Example                  Description
SAMPLE2.ASM              Uppercase letters indicate filenames, segment names, registers and terms used at the command line.
KEY TERMS                Bold type indicates text that must be typed exactly as shown. This includes assembly-language instructions, directives, symbols, operators, and keywords in other languages.
placeholders             Italics indicate variable information supplied by the user.
Examples                 This typeface indicates example programs, user input, and screen output.
[[optional items]]       Double brackets indicate that the enclosed item is optional.
{choice1 | choice2}      Braces and a vertical bar indicate a choice between two or more items. You must choose one of the items unless double square brackets surround the braces.
Repeating elements...    Three dots following an item indicate that you may type more items having the same form.
SHIFT+F1                 Small capital letters indicate key names.
Chapter 1 Tools
CodeView
CVPACK
EXEHDR
EXP
HELPMAKE
H2INC
IMPLIB
LIB
LINK
MASM
ML
NMAKE
PWB
PWBRMAKE
QuickHelp
RM
UNDEL
Filename: LMARFC01.DOC Project: MASM 6.1 Template: MSGRIDA1.DOT Author: Mike Eddy Last Saved By: Launi Lockard Revision #: 54 Page: 1 of 1 Printed: 10/02/00 04:13 PM
Microsoft CodeView Debugger

The Microsoft CodeView debugger runs the assembled or compiled program while simultaneously displaying the program source code, program variables, memory locations, processor registers, and other pertinent information.

Syntax
CV [[options]] executablefile [[arguments]]
CVW [[options]] executablefile [[arguments]]

Options
Option          Action
/2              Permits the use of two monitors.
/8              Uses 8514/a as Windows display, and VGA as debugger display (CVW only).
/25             Starts in 25-line mode.
/43             Starts in 43-line mode.
/50             Starts in 50-line mode.
/B              Starts in black-and-white mode.
/Ccommands      Executes commands on startup.
/F              Exchanges screens by flipping between video pages (CV only).
/G              Eliminates refresh snow on CGA monitors.
/I[[0 | 1]]     Turns nonmaskable-interrupt and 8259-interrupt trapping on (/I1) or off (/I0).
/Ldllfile       Loads DLL dllfile for debugging (CVW only).
/K              Disables installation of keyboard monitors for the program being debugged (CV only).
/M              Disables CodeView use of the mouse. Use this option when debugging an application that supports the mouse.
/N[[0 | 1]]     /N0 tells CodeView to trap nonmaskable interrupts; /N1 tells it not to trap.
/R              Enables 80386/486 debug registers (CV only).
/S              Exchanges screens by changing buffers (primarily for use with graphics programs) (CV only).
/TSF            Toggles TOOLS.INI entry to read/not read the CURRENT.STS file.

Environment Variables
Variable        Description
HELPFILES       Specifies path of help files or list of help filenames.
INIT            Specifies path for TOOLS.INI and CURRENT.STS files.
CVPACK
The CVPACK utility reduces the size of an executable file that contains CodeView debugging information.

Syntax
CVPACK [[options]] exefile

Options
Option          Action
/HELP           Calls QuickHelp for help on CVPACK.
/P              Packs the file to the smallest possible size.
/?              Displays a summary of CVPACK command-line syntax.
EXEHDR
The EXEHDR utility displays and modifies the contents of an executable-file header.

Syntax
EXEHDR [[options]] filenames

Options
Option          Action
/HEA:number     Option name: /HEA[[P]]. Sets the heap allocation field to number bytes for segmented-executable files.
/HEL            Option name: /HEL[[P]]. Calls QuickHelp for help on EXEHDR.
/MA:number      Option name: /MA[[X]]. Sets the maximum memory allocation to number paragraphs for DOS executable files.
/MI:number      Option name: /MI[[N]]. Sets the minimum memory allocation to number paragraphs for DOS executable files.
/NE             Option name: /NE[[WFILES]]. Enables support for HPFS.
/NO             Option name: /NO[[LOGO]]. Suppresses the EXEHDR copyright message.
/PM:type        Option name: /PM[[TYPE]]. Sets the application type for Microsoft Windows, where type is one of the following: PM (or WINDOWAPI), VIO (or WINDOWCOMPAT), or NOVIO (or NOTWINDOWCOMPAT).
/R              Option name: /R[[ESETERROR]]. Clears the error bit in the header of a Windows executable file.
/S:number       Option name: /S[[TACK]]. Sets the stack allocation to number bytes.
/V              Option name: /V[[ERBOSE]]. Provides more information about segmented-executable files, including the default flags in the segment table, all run-time relocations, and additional fields from the header.
/?              Option name: /?. Displays a summary of EXEHDR command-line syntax.
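
The /MA and /MI options take their sizes in paragraphs rather than bytes. As a rough sketch of the arithmetic, assuming the 16-byte paragraph size given in the description of the LINK /CP option, the following Python helper converts a byte count into a paragraph count. The function name is illustrative only and is not part of any Microsoft tool.

```python
PARAGRAPH_BYTES = 16  # assumed paragraph size, taken from the LINK /CP description


def bytes_to_paragraphs(byte_count: int) -> int:
    """Return the smallest number of 16-byte paragraphs that holds byte_count bytes."""
    if byte_count < 0:
        raise ValueError("byte_count must be non-negative")
    # Round up so that a partial paragraph still counts as a whole one.
    return (byte_count + PARAGRAPH_BYTES - 1) // PARAGRAPH_BYTES


print(bytes_to_paragraphs(4096))  # 256
print(bytes_to_paragraphs(100))   # 7
```

Under that assumption, an allocation of 4096 bytes corresponds to a number value of 256 paragraphs.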
EXP
The EXP utility deletes all files in the hidden DELETED subdirectory of the current or specified directory. EXP is used with RM and UNDEL to manage backup files.

Syntax
EXP [[options]] [[directories]]

Options
Option          Action
/HELP           Calls QuickHelp for help on EXP.
/Q              Suppresses display of deleted files.
/R              Recurses into subdirectories of the current or specified directory.
/?              Displays a summary of EXP command-line syntax.
HELPMAKE
The HELPMAKE utility creates help files and customizes the help files supplied with Microsoft language products.

Syntax
HELPMAKE {/E[[n]] | /D[[c]] | /H | /?} [[options]] sourcefiles

Options
Option          Action
/Ac             Specifies c as an application-specific control character for the help database, marking a line that contains special information for internal use by the application.
/C              Indicates that the context strings are case sensitive so that at run time all searches for help topics are case sensitive.
/D              Fully decodes the help database.
/DS             Splits the concatenated, compressed help database into its components, using their original names. No decompression occurs.
/DU             Decompresses the database and removes all screen formatting and cross-references.
/E[[n]]         Creates (encodes) a help database from a specified text file (or files). The optional n indicates the amount of compression to take place. The value of n can range from 0 to 15.
/H[[ELP]]       Calls the QuickHelp utility. If HELPMAKE cannot find QuickHelp or the help file, it displays a summary of HELPMAKE command-line syntax.
/Kfilename      Specifies a file containing word-separator characters. This file must contain a single line of characters that separate words. ASCII characters from 0 to 32 (including the space) and character 127 are always separators. If the /K option is not specified, the following characters are also considered separators: !#&( )*+-,/:;<=>?@[\]^_`{\}~
/L              Locks the generated file so that it cannot be decoded by HELPMAKE at a later time.
/NOLOGO         Suppresses the HELPMAKE copyright message.
/O outfile      Specifies outfile as the name of the help database. The name outfile is optional with the /D option.
/Sn             Specifies the type of input file, according to the following values for n:
                /S1  Rich Text Format
                /S2  QuickHelp Format
                /S3  Minimally Formatted ASCII
/T              During encoding, translates dot commands to application-specific commands. During decoding, translates application commands to dot commands. The /T option forces /A:.
/V[[n]]         Sets the verbosity of the diagnostic and informational output, depending on the value of n. The value of n can range from 0 to 6.
/Wwidth         Sets the fixed width of the resulting help text in number of characters. The value of width can range from 11 to 255.
/?              Displays a summary of HELPMAKE command-line syntax.
H2INC
The H2INC utility converts C header (.H) files into MASM-compatible include (.INC) files. It translates declarations and prototypes, but does not translate code.

Syntax
H2INC [[options]] filename.H

Options
Option*           Action
/C                Passes comments in the .H file to the .INC file.
/Fa[[filename]]   Specifies that the output file contain only equivalent MASM statements. This is the default.
/Fc[[filename]]   Specifies that the output file contain equivalent MASM statements plus original C statements converted to comment lines.
/HELP             Calls QuickHelp for help on H2INC.
/Ht               Enables generation of text equates. By default, text items are not translated.
/Mn               Instructs H2INC to explicitly declare the distances for all pointers and functions.
/Ni               Suppresses the expansion of nested include files.
/Zn string        Adds string to all names generated by H2INC. Used to eliminate name conflicts with other H2INC-generated include files.
/Zu               Makes all structure and union tag names unique.
/?                Displays a summary of H2INC command-line syntax.

*H2INC also supports the following options from Microsoft C, version 6.0 and higher: /AC, /AH, /AL, /AM, /AS, /AT, /D, /F, /Fi, /G0, /G1, /G2, /G3, /G4, /Gc, /Gd, /Gr, /I, /J, /Tc, /U, /u, /W0, /W1, /W2, /W3, /W4, /X, /Za, /Zc, /Ze, /Zp1, /Zp2, /Zp4.

Environment Variables
Variable          Description
CL                Specifies default command-line options.
H2INC             Specifies default command-line options. Appended after the CL environment variable.
INCLUDE           Specifies search path for include files.
IMPLIB
The IMPLIB utility creates import libraries used by LINK to link dynamic-link libraries with applications.

Syntax
IMPLIB [[options]] implibname {dllfile... | deffile...}

Options
Option          Action
/H              Option name: /H[[ELP]]. Calls QuickHelp for help on IMPLIB.
/NOI            Option name: /NOI[[GNORECASE]]. Preserves case for entry names in DLLs.
/NOL            Option name: /NOL[[OGO]]. Suppresses the IMPLIB copyright message.
/?              Option name: /?. Displays a summary of IMPLIB command-line syntax.
LIB
The LIB utility helps create, organize, and maintain run-time libraries.

Syntax
LIB inlibrary [[options]] [[commands]] [[, [[listfile]] [[, [[outlibrary]] ]] ]] [[;]]

Options
Option          Action
/H              Option name: /H[[ELP]]. Calls QuickHelp for help on LIB.
/I              Option name: /I[[GNORECASE]]. Tells LIB to ignore case when comparing symbols (the default). Use to combine a library marked /NOI with an unmarked library to create a new case-insensitive library.
/NOE            Option name: /NOE[[XTDICTIONARY]]. Prevents LIB from creating an extended dictionary.
/NOI            Option name: /NOI[[GNORECASE]]. Tells LIB to preserve case when comparing symbols. When combining libraries, if any library is marked /NOI, the output library is case sensitive, unless /IGN is specified.
/NOL            Option name: /NOL[[OGO]]. Suppresses the LIB copyright message.
/P:number       Option name: /P[[AGESIZE]]. Specifies the page size (in bytes) of a new library or changes the page size of an existing library. The default for a new library is 16.
/?              Option name: /?. Displays a summary of LIB command-line syntax.

Commands
Operator        Action
+name           Appends an object file or library file.
-name           Deletes a module.
-+name          Replaces a module by deleting it and appending an object file with the same name.
*name           Copies a module to a new object file.
-*name          Moves a module out of the library by copying it to a new object file and then deleting it.
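
To make the effect of the five LIB command operators (+, -, -+, *, -*) concrete, here is a small Python sketch that applies them to an in-memory set of module names. It is a toy model under stated assumptions, not a description of how LIB is implemented: the function name and the module names are hypothetical, and real LIB works on library and object files on disk.

```python
def apply_lib_commands(modules, commands):
    """Apply LIB-style command operators to a set of module names.

    Returns (library, extracted), where extracted lists the modules that
    would be copied out to new object files. Toy model only.
    """
    library = set(modules)
    extracted = []
    for command in commands:
        # Test the two-character operators first so that "-+" and "-*"
        # are not mistaken for a plain "-" (delete).
        if command.startswith("-+"):
            name = command[2:]
            library.discard(name)  # delete the old module...
            library.add(name)      # ...then append the object file of the same name
        elif command.startswith("-*"):
            name = command[2:]
            if name in library:
                extracted.append(name)  # copy to a new object file
                library.remove(name)    # then delete from the library
        elif command.startswith("+"):
            library.add(command[1:])
        elif command.startswith("-"):
            library.discard(command[1:])
        elif command.startswith("*"):
            name = command[1:]
            if name in library:
                extracted.append(name)  # copy only; the library is unchanged
        else:
            raise ValueError(f"unknown command: {command!r}")
    return library, extracted


library, extracted = apply_lib_commands({"heap", "sort"}, ["+cursor", "-heap", "*sort"])
print(sorted(library), extracted)
```

In this model a copy (*) leaves the library untouched, while a move (-*) both extracts the module and removes it.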
LINK
The LINK utility combines object files into a single executable file or dynamic-link library.

Syntax
LINK objfiles [[, [[exefile]] [[, [[mapfile]] [[, [[libraries]] [[, [[deffile]] ]] ]] ]] ]] [[;]]

Options
Option                  Action
/A:size                 Option name: /A[[LIGNMENT]]. Directs LINK to align segment data in a segmented-executable file along the boundaries specified by size bytes, where size must be a power of two.
/B                      Option name: /B[[ATCH]]. Suppresses prompts for library or object files not found.
/CO                     Option name: /CO[[DEVIEW]]. Adds symbolic data and line numbers needed by the Microsoft CodeView debugger. This option is incompatible with the /EXEPACK option.
/CP:number              Option name: /CP[[ARMAXALLOC]]. Sets the program's maximum memory allocation to number of 16-byte paragraphs.
/DO                     Option name: /DO[[SSEG]]. Orders segments in the default order used by Microsoft high-level languages.
/DS                     Option name: /DS[[ALLOCATE]]. Directs LINK to load all data starting at the high end of the data segment. The /DSALLOC option is for assembly-language programs that create MS-DOS .EXE files.
/E                      Option name: /E[[XEPACK]]. Packs the executable file. The /EXEPACK option is incompatible with /INCR and /CO. Do not use /EXEPACK on a Windows-based application.
/F                      Option name: /F[[ARCALLTRANSLATION]]. Optimizes far calls. The /FARCALL option is automatically on when using /TINY. The /PACKC option is not recommended with /FARCALL when linking a Windows-based program.
/HE                     Option name: /HE[[LP]]. Calls QuickHelp for help on LINK.
/HI                     Option name: /HI[[GH]]. Places the executable file as high in memory as possible. Use /HIGH with the /DSALLOC option. This option is for assembly-language programs that create MS-DOS .EXE files.
/INC                    Option name: /INC[[REMENTAL]]. Prepares for incremental linking with ILINK. This option is incompatible with /EXEPACK and /TINY.
/INF                    Option name: /INF[[ORMATION]]. Displays to the standard output the phase of linking and names of object files being linked.
/LI                     Option name: /LI[[NENUMBERS]]. Adds source file line numbers and associated addresses to the map file. The object file must be created with line numbers. This option creates a map file even if mapfile is not specified.
/M                      Option name: /M[[AP]]. Adds public symbols to the map file.
/NOD[[:libraryname]]    Option name: /NOD[[EFAULTLIBRARYSEARCH]]. Ignores the specified default library. Specify without libraryname to ignore all default libraries.
/NOE                    Option name: /NOE[[XTDICTIONARY]]. Prevents LINK from searching extended dictionaries in libraries. Use /NOE when redefinition of a symbol causes error L2044.
/NOF                    Option name: /NOF[[ARCALLTRANSLATION]]. Turns off far-call optimization.
/NOI                    Option name: /NOI[[GNORECASE]]. Preserves case in identifiers.
/NOL                    Option name: /NOL[[OGO]]. Suppresses the LINK copyright message.
/NON                    Option name: /NON[[ULLSDOSSEG]]. Orders segments as with the /DOSSEG option, but with no additional bytes at the beginning of the _TEXT segment (if defined). This option overrides /DOSSEG.
/NOP                    Option name: /NOP[[ACKCODE]]. Turns off code segment packing.
/PACKC[[:number]]       Option name: /PACKC[[ODE]]. Packs neighboring code segments together. Specify number bytes to set the maximum size for physical segments formed by /PACKC.
/PACKD[[:number]]       Option name: /PACKD[[ATA]]. Packs neighboring data segments together. Specify number bytes to set the maximum size for physical segments formed by /PACKD. This option is for Windows only.
/PAU                    Option name: /PAU[[SE]]. Pauses during the link session for disk changes.
/PM:type                Option name: /PM[[TYPE]]. Specifies the type of Windows-based application where type is one of the following: PM (or WINDOWAPI), VIO (or WINDOWCOMPAT), or NOVIO (or NOTWINDOWCOMPAT).
/ST:number              Option name: /ST[[ACK]]. Sets the stack size to number bytes, from 1 byte to 64K.
/T                      Option name: /T[[INY]]. Creates a tiny-model MS-DOS program with a .COM extension instead of .EXE. Incompatible with /INCR.
/?                      Option name: /?. Displays a summary of LINK command-line syntax.
Note: Several rarely used options not listed here are described in Help.
Environment Variables
Variable              Description
INIT                  Specifies path for the TOOLS.INI file.
LIB                   Specifies search path for library files.
LINK                  Specifies default command-line options.
TMP                   Specifies path for the VM.TMP file.
MASM
The MASM program converts command-line options from MASM style to ML style, adds options to maximize compatibility, and calls ML.EXE.
Note: MASM.EXE is provided to maintain compatibility with old makefiles. For new makefiles, use the more powerful ML driver.
Syntax
MASM [[options]] sourcefile [[, [[objectfile]] [[, [[listingfile]] [[, [[crossreferencefile]] ]] ]] ]] [[;]]
Options
Option                Action
/A                    Orders segments alphabetically. Results in a warning. Ignored.
/B                    Sets internal buffer size. Ignored.
/C                    Creates a cross-reference file. Translated to /FR.
/D                    Creates a Pass 1 listing. Translated to /Fl /Sf.
/Dsymbol[[=value]]    Defines a symbol. Unchanged.
/E                    Emulates floating-point instructions. Translated to /FPi.
/H                    Lists command-line arguments. Translated to /help.
/HELP                 Calls QuickHelp for help on the MASM driver.
/I pathname           Specifies an include path. Unchanged.
/L                    Creates a normal listing. Translated to /Fl.
/LA                   Lists all. Translated to /Fl and /Sa.
/ML                   Treats names as case sensitive. Translated to /Cp.
/MU                   Converts names to uppercase. Translated to /Cu.
/MX                   Preserves case on nonlocal names. Translated to /Cx.
/N                    Suppresses table in listing file. Translated to /Sn.
/P                    Checks for impure code. Use OPTION READONLY. Ignored.
/S                    Orders segments sequentially. Results in a warning. Ignored.
/T                    Enables terse assembly. Translated to /NOLOGO.
/V                    Enables verbose assembly. Ignored.
/Wlevel               Sets warning level, where level = 0, 1, or 2.
/X                    Lists false conditionals. Translated to /Sx.
/Z                    Displays error lines on screen. Ignored.
/ZD                   Generates line numbers for CodeView. Translated to /Zd.
/ZI                   Generates symbols for CodeView. Translated to /Zi.
Environment Variables
Variable              Description
INCLUDE               Specifies default path for .INC files.
MASM                  Specifies default command-line options.
TMP                   Specifies path for temporary files.
ML
The ML program assembles and links one or more assembly-language source files. The command-line options are case sensitive.
Syntax
ML [[options]] filename [[ [[options]] filename]]... [[/link linkoptions]]
Options
Option                Action
/AT                   Enables tiny-memory-model support. Enables error messages for code constructs that violate the requirements for .COM format files. Note that this is not equivalent to the .MODEL TINY directive.
/Bl filename          Selects an alternate linker.
/c                    Assembles only. Does not link.
/Cp                   Preserves case of all user identifiers.
/Cu                   Maps all identifiers to uppercase (default).
/Cx                   Preserves case in public and extern symbols.
/Dsymbol[[=value]]    Defines a text macro with the given name. If value is missing, it is blank. Multiple tokens separated by spaces must be enclosed in quotation marks.
/EP                   Generates a preprocessed source listing (sent to STDOUT). See /Sf.
/F hexnum             Sets stack size to hexnum bytes (this is the same as /link /STACK:number). The value must be expressed in hexadecimal notation. There must be a space between /F and hexnum.
/Fefilename           Names the executable file.
/Fl[[filename]]       Generates an assembled code listing. See /Sf.
/Fm[[filename]]       Creates a linker map file.
/Fofilename           Names an object file.
/FPi                  Generates emulator fixups for floating-point arithmetic (mixed-language only).
/Fr[[filename]]       Generates a Source Browser .SBR file.
/FR[[filename]]       Generates an extended form of a Source Browser .SBR file.
/Gc                   Specifies use of FORTRAN- or Pascal-style function calling and naming conventions. Same as OPTION LANGUAGE:PASCAL.
/Gd                   Specifies use of C-style function calling and naming conventions. Same as OPTION LANGUAGE:C.
/H number             Restricts external names to number significant characters. The default is 31 characters.
/help                 Calls QuickHelp for help on ML.
/I pathname           Sets path for include file. A maximum of 10 /I options is allowed.
/nologo               Suppresses messages for successful assembly.
/Sa                   Turns on listing of all available information.
/Sc                   Adds instruction timings to listing file.
/Sf                   Adds first-pass listing to listing file.
/Sg                   Turns on listing of assembly-generated code.
/Sl width             Sets the line width of source listing in characters per line. Range is 60 to 255 or 0. Default is 0. Same as PAGE width.
/Sn                   Turns off symbol table when producing a listing.
/Sp length            Sets the page length of source listing in lines per page. Range is 10 to 255 or 0. Default is 0. Same as PAGE length.
/Ss text              Specifies text for source listing. Same as SUBTITLE text.
/St text              Specifies title for source listing. Same as TITLE text.
/Sx                   Turns on false conditionals in listing.
/Ta filename          Assembles source file whose name does not end with the .ASM extension.
/w                    Same as /W0.
/Wlevel               Sets the warning level, where level = 0, 1, 2, or 3.
/WX                   Returns an error code if warnings are generated.
/Zd                   Generates line-number information in object file.
/Zf                   Makes all symbols public.
/Zi                   Generates CodeView information in object file.
/Zm                   Enables M510 option for maximum compatibility with MASM 5.1.
/Zp[[alignment]]      Packs structures on the specified byte boundary. The alignment may be 1, 2, or 4.
/Zs                   Performs a syntax check only.
/?                    Displays a summary of ML command-line syntax.
QuickAssembler Support
For compatibility with QuickAssembler makefiles, ML recognizes these options:

Option                Action
/a                    Orders segments alphabetically in QuickAssembler. MASM 6.1 uses the .ALPHA directive for alphabetical ordering and ignores /a.
/Cl                   Equivalent to /Cp.
/Ez                   Prints the source for error lines to the screen. MASM 6.1 ignores this option.
/P1                   Performs one-pass assembly. MASM 6.1 ignores this option.
/P2                   Performs two-pass assembly. MASM 6.1 ignores this option.
/s                    Orders segments sequentially. MASM 6.1 uses the .SEQ directive for sequential ordering and ignores /s.
/Sq                   Equivalent to /Sl0 /Sp0.
Environment Variables
Variable              Description
INCLUDE               Specifies search path for include files.
ML                    Specifies default command-line options.
TMP                   Specifies path for temporary files.
NMAKE
The NMAKE utility automates the process of compiling and linking project files.
Syntax
NMAKE [[options]] [[macros]] [[targets]]
Options
Option                Action
/A                    Executes all commands even if targets are not out-of-date.
/C                    Suppresses the NMAKE copyright message and prevents nonfatal error or warning messages from being displayed.
/D                    Displays the modification time of each file when the times of targets and dependents are checked.
/E                    Causes environment variables to override macro definitions within description files.
/F filename           Specifies filename as the name of the description file to use. If a dash (-) is entered instead of a filename, NMAKE reads the description file from the standard input device. If /F is not specified, NMAKE uses MAKEFILE as the description file. If MAKEFILE does not exist, NMAKE builds command-line targets using inference rules.
/HELP                 Calls QuickHelp for help on NMAKE.
/I                    Ignores exit codes from commands in the description file. NMAKE continues executing the rest of the description file despite the errors.
/N                    Displays but does not execute commands from the description file.
/NOLOGO               Suppresses the NMAKE copyright message.
/P                    Displays all macro definitions, inference rules, target descriptions, and the .SUFFIXES list.
/Q                    Checks modification times of command-line targets (or first target in the description file if no command-line targets are specified). NMAKE returns a zero exit code if all such targets are up-to-date and a nonzero exit code if any target is out-of-date. Only preprocessing commands in the description file are executed.
/R                    Ignores inference rules and macros that are predefined or defined in the TOOLS.INI file.
/S                    Suppresses display of commands as they are executed.
/T                    Changes modification times of command-line targets (or first target in the description file if no command-line targets are specified) to the current time. Only preprocessing commands in the description file are executed. The contents of target files are not modified.
/X filename           Sends all error output to filename, which can be either a file or a device. If a dash (-) is entered instead of a filename, the error output is sent to the standard output device.
/Z                    Internal option for use by the Microsoft Programmer's WorkBench (PWB).
/?                    Displays a summary of NMAKE command-line syntax.
Environment Variable
Variable              Description
INIT                  Specifies path for TOOLS.INI file, which may contain macros, inference rules, and description blocks.
PWB (Programmer's WorkBench)

The Microsoft Programmer's WorkBench (PWB) provides an integrated environment for developing programs in assembly language. The command-line options are case sensitive.
Syntax
PWB [[options]] [[files]]
Options
Option                Action
/D[[init]]            Prevents PWB from examining initialization files, where init is one or more of the following characters:
                        A  Disable autoload extensions (including language-specific extensions and Help).
                        S  Ignore CURRENT.STS.
                        T  Ignore TOOLS.INI.
                      If the /D option does not include an init character, it is equivalent to specifying /DAST (all files and extensions ignored).
/e cmdstr             Executes the command or sequence of commands at start-up. The entire cmdstr argument must be placed in double quotation marks if it contains a space. If cmdstr contains literal double quotation marks, place a backslash (\) in front of each double quotation mark. To include a literal backslash in the command string, use double backslashes (\\).
/m mark               Moves the cursor to the specified mark instead of moving it to the last known position. The mark can be a line number.
/P[[init]]            Specifies a program list for PWB to read, where init can be:
                        Ffile  Read a foreign program list (one not created using PWB).
                        L      Read the last program list. Use this option to start PWB in the same state you left it.
                        Pfile  Read a PWB program list.
/r                    Starts PWB in no-edit mode. Functions that modify files are disallowed.
[[/t]] file...        Loads the specified file at startup. The file specification can contain wildcards. If multiple files are specified, PWB loads only the first file. When the Exit function is invoked, PWB saves the current file and loads the next file in the list. Files specified with /t are temporary; PWB does not add them to the file history on the File menu. No other options can follow /t on the command line. Each temporary file must be specified in a separate /t option.
/?                    Displays a summary of PWB command-line syntax.
Environment Variables
Variable              Description
HELPFILES             Specifies path of help files or list of help filenames.
INIT                  Specifies path for TOOLS.INI and CURRENT.STS files.
TMP                   Specifies path for temporary files.
PWBRMAKE
PWBRMAKE converts the .SBR files created by the assembler into database .BSC files that can be read by the Microsoft Programmer's WorkBench (PWB) Source Browser. The command-line options are case sensitive.
Syntax
PWBRMAKE [[options]] sbrfiles
Options
Option                Action
/Ei filename          Excludes the contents of the specified include files from the database. To specify multiple filenames, separate them with spaces and enclose the list in parentheses.
/Ei (filename...)
/Em                   Excludes symbols in the body of macros. Use /Em to include only macro names.
/Es                   Excludes from the database every include file specified with an absolute path or found in an absolute path specified in the INCLUDE environment variable.
/HELP                 Calls QuickHelp for help on PWBRMAKE.
/Iu                   Includes unreferenced symbols.
/n                    Forces a nonincremental build and prevents truncation of .SBR files.
/o filename           Specifies a name for the database file.
/v                    Displays verbose output.
/?                    Displays a summary of PWBRMAKE command-line syntax.
QuickHelp
The QuickHelp utility displays Help files. All MASM reserved words and error messages can be used for topic.
Syntax
QH [[options]] [[topic]]
Options
Option                Action
/d filename           Specifies either a specific database name or a path where the databases are found.
/lnumber              Specifies the number of lines the QuickHelp window should occupy.
/mnumber              Changes the screen mode to display the specified number of lines, where number is in the range 25 to 60.
/p filename           Sets the name of the paste file.
/pa[[filename]]       Specifies that pasting operations are appended to the current paste file (rather than overwriting the file).
/q                    Prevents the version box from being displayed when QuickHelp is installed as a keyboard monitor.
/r command            Specifies the command that QuickHelp should execute when the right mouse button is pressed. The command can be one of the following letters:
                        l  Display last topic
                        i  Display history of help topics
                        w  Hide window
                        b  Display previous topic
                        e  Find next topic
                        t  Display contents
/s                    Specifies that clicking the mouse above or below the scroll box causes QuickHelp to scroll by lines rather than pages.
/t name               Directs QuickHelp to copy the specified section of the given topic to the current paste file and exit. The name may be:
                        All      Paste the entire topic
                        Syntax   Paste the syntax only
                        Example  Paste the example only
                      If the topic is not found, QuickHelp returns an exit code of 1.
/u                    Specifies that QuickHelp is being run by a utility. If the topic specified on the command line is not found, QuickHelp immediately exits with an exit code of 3.
Environment Variables
Variable              Description
HELPFILES             Specifies path of help files or list of help filenames.
QH                    Specifies default command-line options.
TMP                   Specifies directory of default paste file.
RM
The RM utility moves a file to a hidden DELETED subdirectory of the directory containing the file. Use the UNDEL utility to recover the file and the EXP utility to mark the hidden file for deletion.
Syntax
RM [[options]] [[files]]
Options
Option                Action
/F                    Deletes read-only files without prompting.
/HELP                 Calls QuickHelp for help on RM.
/I                    Inquires for permission before removing each file.
/K                    Keeps read-only files without prompting.
/R directory          Recurses into subdirectories of the specified directory.
/?                    Displays a summary of RM command-line syntax.
UNDEL
The UNDEL utility moves a file from a hidden DELETED subdirectory to the parent directory. UNDEL is used along with EXP and RM to manage backup files.
Syntax
UNDEL [[{option | files}]]
Options
Option                Action
/HELP                 Calls QuickHelp for help on UNDEL.
/?                    Displays a summary of UNDEL command-line syntax.
CHAPTER 2
Directives
Topical Cross-reference for Directives
Directives
Filename: LMARFC02.DOC Project: Template: MSGRIDA1.DOT Author: Mike Eddy Last Saved By: Launi Lockard Revision #: 24 Page: 21 of 1 Printed: 10/02/00 04:14 PM
Reference
Topical Cross-reference for Directives

Code Labels
ALIGN LABEL EVEN ORG
Conditional Assembly
ELSE ENDIF IFB/IFNB IFE ELSEIF IF IFDEF/IFNDEF IFIDN/IFIDNI ELSEIF2 IF2 IFDIF/IFDIFI
Conditional Control Flow

.BREAK .CONTINUE .ELSE .ELSEIF .ENDIF .ENDW .IF .REPEAT .UNTIL/.UNTILCXZ .WHILE
Conditional Error
.ERR .ERRDEF .ERRIDN/.ERRIDNI .ERRNZ .ERR2 .ERRDIF/.ERRDIFI .ERRNB .ERRB .ERRE .ERRNDEF
Data Allocation
ALIGN EVEN ORG REAL8 WORD/SWORD BYTE/SBYTE FWORD QWORD REAL10 DWORD/SDWORD LABEL REAL4 TBYTE
Equates
= EQU TEXTEQU
Listing Control
.CREF .LISTIF .NOCREF .NOLISTMACRO .TFCOND .LIST .LISTMACRO .NOLIST PAGE TITLE .LISTALL .LISTMACROALL .NOLISTIF SUBTITLE
Macros
ENDM LOCAL EXITM MACRO GOTO PURGE
Miscellaneous
ASSUME END OPTION .RADIX COMMENT INCLUDE POPCONTEXT ECHO INCLUDELIB PUSHCONTEXT
Procedures
ENDP PROTO INVOKE USES PROC
Processor
.186 .287 .387 .8086 .286 .386 .486 .8087 .286P .386P .486P .NO87
Repeat Blocks
ENDM GOTO FOR REPEAT FORC WHILE
Scope
COMM INCLUDELIB EXTERN PUBLIC EXTERNDEF
Segment
.ALPHA END SEGMENT ASSUME ENDS .SEQ .DOSSEG GROUP
Simplified Segment
.CODE .DATA? .FARDATA .STACK .CONST .DOSSEG .FARDATA? .STARTUP .DATA .EXIT .MODEL
String
CATSTR SIZESTR INSTR SUBSTR
Structure and Record

ENDS TYPEDEF RECORD UNION STRUCT
Directives
name = expression
Assigns the numeric value of expression to name. The symbol may be redefined later.

.186
Enables assembly of instructions for the 80186 processor; disables assembly of instructions introduced with later processors. Also enables 8087 instructions.

.286
Enables assembly of nonprivileged instructions for the 80286 processor; disables assembly of instructions introduced with later processors. Also enables 80287 instructions.

.286P
Enables assembly of all instructions (including privileged) for the 80286 processor; disables assembly of instructions introduced with later processors. Also enables 80287 instructions.

.287
Enables assembly of instructions for the 80287 coprocessor; disables assembly of instructions introduced with later coprocessors.

.386
Enables assembly of nonprivileged instructions for the 80386 processor; disables assembly of instructions introduced with later processors. Also enables 80387 instructions.

.386P
Enables assembly of all instructions (including privileged) for the 80386 processor; disables assembly of instructions introduced with later processors. Also enables 80387 instructions.

.387
Enables assembly of instructions for the 80387 coprocessor.

.486
Enables assembly of nonprivileged instructions for the 80486 processor.

.486P
Enables assembly of all instructions (including privileged) for the 80486 processor.

.8086
Enables assembly of 8086 instructions (and the identical 8088 instructions); disables assembly of instructions introduced with later processors. Also enables 8087 instructions. This is the default mode for processors.

.8087
Enables assembly of 8087 instructions; disables assembly of instructions introduced with later coprocessors. This is the default mode for coprocessors.

ALIGN [[number]]
Aligns the next variable or instruction on a byte that is a multiple of number.

.ALPHA
Orders segments alphabetically.

ASSUME segregister:name [[, segregister:name]]...
ASSUME dataregister:type [[, dataregister:type]]...
ASSUME register:ERROR [[, register:ERROR]]...
ASSUME [[register:]] NOTHING [[, register:NOTHING]]...
Enables error-checking for register values. After an ASSUME is put into effect, the assembler watches for changes to the values of the given registers. ERROR generates an error if the register is used. NOTHING removes register error-checking. You can combine different kinds of assumptions in one statement.

.BREAK [[.IF condition]]
Generates code to terminate a .WHILE or .REPEAT block if condition is true.

[[name]] BYTE initializer [[, initializer]]...
Allocates and optionally initializes a byte of storage for each initializer. Can also be used as a type specifier anywhere a type is legal.

name CATSTR [[textitem1 [[, textitem2]]...]]
Concatenates text items. Each text item can be a literal string, a constant preceded by a %, or the string returned by a macro function.

.CODE [[name]]
When used with .MODEL, indicates the start of a code segment called name (the default segment name is _TEXT for tiny, small, compact, and flat models, or module_TEXT for other models).

COMM definition [[, definition]]...
Creates a communal variable with the attributes specified in definition. Each definition has the following form:
[[langtype]] [[NEAR | FAR]] label:type[[:count]]
The label is the name of the variable. The type can be any type specifier (BYTE, WORD, and so on) or an integer specifying the number of bytes. The count specifies the number of data objects (one is the default).

COMMENT delimiter [[text]]
[[text]]
[[text]] delimiter [[text]]
Treats all text between or on the same line as the delimiters as a comment.

.CONST
When used with .MODEL, starts a constant data segment (with segment name CONST). This segment has the read-only attribute.

.CONTINUE [[.IF condition]]
Generates code to jump to the top of a .WHILE or .REPEAT block if condition is true.

.CREF
Enables listing of symbols in the symbol portion of the symbol table and browser file.

.DATA
When used with .MODEL, starts a near data segment for initialized data (segment name _DATA).

.DATA?
When used with .MODEL, starts a near data segment for uninitialized data (segment name _BSS).

.DOSSEG
Orders the segments according to the MS-DOS segment convention: CODE first, then segments not in DGROUP, and then segments in DGROUP. The segments in DGROUP follow this order: segments not in BSS or STACK, then BSS segments, and finally STACK segments. Primarily used for ensuring CodeView support in MASM stand-alone programs. Same as DOSSEG.

DOSSEG
Identical to .DOSSEG, which is the preferred form.

DB
Can be used to define data like BYTE.

DD
Can be used to define data like DWORD.

DF
Can be used to define data like FWORD.

DQ
Can be used to define data like QWORD.

DT
Can be used to define data like TBYTE.

DW
Can be used to define data like WORD.

[[name]] DWORD initializer [[, initializer]]...
Allocates and optionally initializes a doubleword (4 bytes) of storage for each initializer. Can also be used as a type specifier anywhere a type is legal.

ECHO message
Displays message to the standard output device (by default, the screen). Same as %OUT.

.ELSE
See .IF.

ELSE
Marks the beginning of an alternate block within a conditional block. See IF.

ELSEIF
Combines ELSE and IF into one statement. See IF.

ELSEIF2
ELSEIF block evaluated on every assembly pass if OPTION:SETIF2 is TRUE.

END [[address]]
Marks the end of a module and, optionally, sets the program entry point to address.

.ENDIF
See .IF.

ENDIF
See IF.

ENDM
Terminates a macro or repeat block. See MACRO, FOR, FORC, REPEAT, or WHILE.

name ENDP
Marks the end of procedure name previously begun with PROC. See PROC.

name ENDS
Marks the end of segment, structure, or union name previously begun with SEGMENT, STRUCT, UNION, or a simplified segment directive.

.ENDW
See .WHILE.

name EQU expression
Assigns numeric value of expression to name. The name cannot be redefined later.

name EQU <text>
Assigns specified text to name. The name can be assigned a different text later. See TEXTEQU.

.ERR [[message]]
Generates an error.
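
The difference between the = directive and the two forms of EQU described in this chapter can be illustrated with a short data-only module. This is a minimal sketch, assuming the small memory model; the names count, LIMIT, prompt, table, and msg are hypothetical and chosen only for illustration.

```asm
        .MODEL  small
        .DATA
count   =       0                   ; "=" symbol: numeric, may be redefined
count   =       count + 1           ; redefinition is allowed; count is now 1
LIMIT   EQU     100                 ; numeric EQU: cannot be redefined later
prompt  EQU     <"Enter: ">         ; text EQU: may be assigned different text later
table   WORD    LIMIT DUP (count)   ; 100 words, each initialized to 1
msg     BYTE    prompt              ; same as: msg BYTE "Enter: "
        END
```

A second line such as LIMIT EQU 200 would be expected to produce a redefinition error, whereas reassigning count or prompt would not.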

.ERR2 [[message]]
.ERR block evaluated on every assembly pass if OPTION:SETIF2 is TRUE.

.ERRB <textitem> [[, message]]
Generates an error if textitem is blank.

.ERRDEF name [[, message]]
Generates an error if name is a previously defined label, variable, or symbol.

.ERRDIF[[I]] <textitem1>, <textitem2> [[, message]]
Generates an error if the text items are different. If I is given, the comparison is case insensitive.

.ERRE expression [[, message]]
Generates an error if expression is false (0).

.ERRIDN[[I]] <textitem1>, <textitem2> [[, message]]
Generates an error if the text items are identical. If I is given, the comparison is case insensitive.

.ERRNB <textitem> [[, message]]
Generates an error if textitem is not blank.

.ERRNDEF name [[, message]]
Generates an error if name has not been defined.

.ERRNZ expression [[, message]]
Generates an error if expression is true (nonzero).

EVEN
Aligns the next variable or instruction on an even byte.

.EXIT [[expression]]
Generates termination code. Returns optional expression to shell.

EXITM [[textitem]]
Terminates expansion of the current repeat or macro block and begins assembly of the next statement outside the block. In a macro function, textitem is the value returned.

EXTERN [[langtype]] name [[(altid)]] :type [[, [[langtype]] name [[(altid)]] :type]]...
Defines one or more external variables, labels, or symbols called name whose type is type. The type can be ABS, which imports name as a constant. Same as EXTRN.

EXTERNDEF [[langtype]] name:type [[, [[langtype]] name:type]]...
Defines one or more external variables, labels, or symbols called name whose type is type. If name is defined in the module, it is treated as PUBLIC. If name is referenced in the module, it is treated as EXTERN. If name is not referenced, it is ignored. The type can be ABS, which imports name as a constant. Normally used in include files.

EXTRN
See EXTERN.

.FARDATA [[name]]
When used with .MODEL, starts a far data segment for initialized data (segment name FAR_DATA or name).

.FARDATA? [[name]]
When used with .MODEL, starts a far data segment for uninitialized data (segment name FAR_BSS or name).

FOR parameter [[:REQ | :=default]], <argument [[, argument]]...>
statements
ENDM
Marks a block that will be repeated once for each argument, with the current argument replacing parameter on each repetition. Same as IRP.

FORC parameter, <string>
statements
ENDM
Marks a block that will be repeated once for each character in string, with the current character replacing parameter on each repetition. Same as IRPC.

[[name]] FWORD initializer [[, initializer]]...
Allocates and optionally initializes 6 bytes of storage for each initializer. Also can be used as a type specifier anywhere a type is legal.

GOTO macrolabel
Transfers assembly to the line marked :macrolabel. GOTO is permitted only inside MACRO, FOR, FORC, REPEAT, and WHILE blocks. The label must be the only directive on the line and must be preceded by a leading colon.

name GROUP segment [[, segment]]...
Adds the specified segments to the group called name.

.IF condition1
statements
[[.ELSEIF condition2
statements]]
[[.ELSE
statements]]
.ENDIF
Generates code that tests condition1 (for example, AX > 7) and executes the statements if that condition is true. If an .ELSE follows, its statements are executed if the original condition was false. Note that the conditions are evaluated at run time.
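
As an illustration of the run-time .IF block described above, the following is a minimal sketch of a small-model MS-DOS program. The register choices and the values loaded into BX are assumptions made for the example; only the directives themselves come from this reference.

```asm
        .MODEL  small
        .STACK  100h
        .CODE
        .STARTUP
        mov     ax, 5           ; sample value to classify
        .IF     ax > 7          ; compared at run time, not at assembly time
        mov     bx, 2
        .ELSEIF ax == 7
        mov     bx, 1
        .ELSE
        mov     bx, 0           ; this branch runs, because 5 is below 7
        .ENDIF
        .EXIT   0               ; return exit code 0 to the shell
        END
```

Assembling with the /Sg option of ML adds the generated compare and jump instructions to the listing, which is a convenient way to see what the assembler emits for each condition.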

IF expression1
ifstatements
[[ELSEIF expression2
elseifstatements]]
[[ELSE
elsestatements]]
ENDIF
Grants assembly of ifstatements if expression1 is true (nonzero) or elseifstatements if expression1 is false (0) and expression2 is true. The following directives may be substituted for ELSEIF: ELSEIFB, ELSEIFDEF, ELSEIFDIF, ELSEIFDIFI, ELSEIFE, ELSEIFIDN, ELSEIFIDNI, ELSEIFNB, and ELSEIFNDEF. Optionally, assembles elsestatements if the previous expression is false. Note that the expressions are evaluated at assembly time.

IF2 expression
IF block is evaluated on every assembly pass if OPTION:SETIF2 is TRUE. See IF for complete syntax.

IFB textitem
Grants assembly if textitem is blank. See IF for complete syntax.

IFDEF name
Grants assembly if name is a previously defined label, variable, or symbol. See IF for complete syntax.

IFDIF[[I]] textitem1, textitem2
Grants assembly if the text items are different. If I is given, the comparison is case insensitive. See IF for complete syntax.

IFE expression
Grants assembly if expression is false (0). See IF for complete syntax.

IFIDN[[I]] textitem1, textitem2
Grants assembly if the text items are identical. If I is given, the comparison is case insensitive. See IF for complete syntax.

IFNB textitem
Grants assembly if textitem is not blank. See IF for complete syntax.

IFNDEF name
Grants assembly if name has not been defined. See IF for complete syntax.

INCLUDE filename
Inserts source code from the source file given by filename into the current source file during assembly. The filename must be enclosed in angle brackets if it includes a backslash, semicolon, greater-than symbol, less-than symbol, single quotation mark, or double quotation mark.

INCLUDELIB libraryname
Informs the linker that the current module should be linked with libraryname. The libraryname must be enclosed in angle brackets if it includes a backslash, semicolon, greater-than symbol, less-than symbol, single quotation mark, or double quotation mark.

name INSTR [[position,]] textitem1, textitem2
Finds the first occurrence of textitem2 in textitem1. The starting position is optional. Each text item can be a literal string, a constant preceded by a %, or the string returned by a macro function.
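
The assembly-time conditionals (IF, IFNDEF, and related directives) select which source lines are assembled at all, in contrast to the run-time .IF block. The sketch below assumes two hypothetical symbols, DEBUG and BUFSIZE; the idea that BUFSIZE might instead be supplied on the ML command line with /DBUFSIZE=value is an assumption about how such a module would typically be built.

```asm
DEBUG   EQU     1               ; hypothetical build switch: nonzero selects the debug text

        .MODEL  small
        .DATA
IF DEBUG
msg     BYTE    "debug build", 13, 10, "$"
ELSE
msg     BYTE    "release build", 13, 10, "$"
ENDIF

IFNDEF BUFSIZE
BUFSIZE EQU     128             ; default used only when BUFSIZE is not already defined
ENDIF
buffer  BYTE    BUFSIZE DUP (?) ; buffer size fixed at assembly time
        END
```

Only one of the two msg definitions reaches the object file, so the name is never defined twice.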

INVOKE expression [[, arguments]]
Calls the procedure at the address given by expression, passing the arguments on the stack or in registers according to the standard calling conventions of the language type. Each argument passed to the procedure may be an expression, a register pair, or an address expression (an expression preceded by ADDR).

IRP
See FOR.

IRPC
See FORC.

name LABEL type
Creates a new label by assigning the current location-counter value and the given type to name.

name LABEL [[NEAR | FAR | PROC]] PTR [[type]]
Creates a new label by assigning the current location-counter value and the given type to name.

.LALL
See .LISTMACROALL.

.LFCOND
See .LISTIF.

.LIST
Starts listing of statements. This is the default.

.LISTALL
Starts listing of all statements. Equivalent to the combination of .LIST, .LISTIF, and .LISTMACROALL.

.LISTIF
Starts listing of statements in false conditional blocks. Same as .LFCOND.

.LISTMACRO
Starts listing of macro expansion statements that generate code or data. This is the default. Same as .XALL.

.LISTMACROALL
Starts listing of all statements in macros. Same as .LALL.

LOCAL localname [[, localname]]...
Within a macro, LOCAL defines labels that are unique to each instance of the macro.

LOCAL label [[ [count] ]] [[:type]] [[, label [[ [count] ]] [[:type]]]]...
Within a procedure definition (PROC), LOCAL creates stack-based variables that exist for the duration of the procedure. The label may be a simple variable or an array containing count elements.

name MACRO [[parameter [[:REQ | :=default | :VARARG]]]]...
statements
ENDM [[value]]
Marks a macro block called name and establishes parameter placeholders for arguments passed when the macro is called. A macro function returns value to the calling statement.

.MODEL memorymodel [[, langtype]] [[, stackoption]]
Initializes the program memory model. The memorymodel can be TINY, SMALL, COMPACT, MEDIUM, LARGE, HUGE, or FLAT. The langtype can be C, BASIC, FORTRAN, PASCAL, SYSCALL, or STDCALL. The stackoption can be NEARSTACK or FARSTACK.

NAME modulename
Ignored.

.NO87
Disallows assembly of all floating-point instructions.

.NOCREF [[name[[, name]]...]]
Suppresses listing of symbols in the symbol table and browser file. If names are specified, only the given names are suppressed. Same as .XCREF.

.NOLIST
Suppresses program listing. Same as .XLIST.

.NOLISTIF
Suppresses listing of conditional blocks whose condition evaluates to false (0). This is the default. Same as .SFCOND.

.NOLISTMACRO
Suppresses listing of macro expansions. Same as .SALL.

OPTION optionlist
Enables and disables features of the assembler. Available options include CASEMAP, DOTNAME, NODOTNAME, EMULATOR, NOEMULATOR, EPILOGUE, EXPR16, EXPR32, LANGUAGE, LJMP, NOLJMP, M510, NOM510, NOKEYWORD, NOSIGNEXTEND, OFFSET, OLDMACROS, NOOLDMACROS, OLDSTRUCTS, NOOLDSTRUCTS, PROC, PROLOGUE, READONLY, NOREADONLY, SCOPED, NOSCOPED, SEGMENT, and SETIF2.

ORG expression
Sets the location counter to expression.

%OUT
See ECHO.

PAGE [[[[length]], width]]
Sets line length and character width of the program listing. If no arguments are given, generates a page break.
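
The :REQ and :=default parameter qualifiers in the MACRO syntax can be seen in a short example. This is a minimal sketch; the macro name LoadReg and its parameters reg and val are hypothetical.

```asm
; Hypothetical macro: load a register with a value that defaults to 0.
LoadReg MACRO   reg:REQ, val:=<0>
        mov     reg, val
        ENDM

        .MODEL  small
        .CODE
        LoadReg ax, 5           ; expands to: mov ax, 5
        LoadReg bx              ; val omitted, so this expands to: mov bx, 0
        END
```

Calling LoadReg with no arguments at all would be expected to produce an assembly error, because reg is marked :REQ.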
PAGE +
Increments the section number and resets the page number to 1.

POPCONTEXT context
Restores part or all of the current context (saved by the PUSHCONTEXT directive). The context can be ASSUMES, RADIX, LISTING, CPU, or ALL.

label PROC [[distance]] [[langtype]] [[visibility]] [[<prologuearg>]] [[USES reglist]] [[, parameter [[:tag]]]]...
    statements
label ENDP
Marks start and end of a procedure block called label. The statements in the block can be called with the CALL instruction or INVOKE directive.

label PROTO [[distance]] [[langtype]] [[, [[parameter]]:tag]]...
Prototypes a function.

PUBLIC [[langtype]] name [[, [[langtype]] name]]...
Makes each variable, label, or absolute symbol specified as name available to all other modules in the program.

PURGE macroname [[, macroname]]...
Deletes the specified macros from memory.

PUSHCONTEXT context
Saves part or all of the current context: segment register assumes, radix value, listing and cref flags, or processor/coprocessor values. The context can be ASSUMES, RADIX, LISTING, CPU, or ALL.

[[name]] QWORD initializer [[, initializer]]...
Allocates and optionally initializes 8 bytes of storage for each initializer. Also can be used as a type specifier anywhere a type is legal.

.RADIX expression
Sets the default radix, in the range 2 to 16, to the value of expression.

name REAL4 initializer [[, initializer]]...
Allocates and optionally initializes a single-precision (4-byte) floating-point number for each initializer.

name REAL8 initializer [[, initializer]]...
Allocates and optionally initializes a double-precision (8-byte) floating-point number for each initializer.

name REAL10 initializer [[, initializer]]...
Allocates and optionally initializes a 10-byte floating-point number for each initializer.
recordname RECORD fieldname:width [[= expression]] [[, fieldname:width [[= expression]]]]...
Declares a record type consisting of the specified fields. The fieldname names the field, width specifies the number of bits, and expression gives its initial value.

.REPEAT
    statements
.UNTIL condition
Generates code that repeats execution of the block of statements until condition becomes true. .UNTILCXZ, which becomes true when CX is zero, may be substituted for .UNTIL. The condition is optional with .UNTILCXZ.

REPEAT expression
    statements
ENDM
Marks a block that is to be repeated expression times. Same as REPT.

REPT
See REPEAT.

.SALL
See .NOLISTMACRO.

name SBYTE initializer [[, initializer]]...
Allocates and optionally initializes a signed byte of storage for each initializer. Can also be used as a type specifier anywhere a type is legal.

name SDWORD initializer [[, initializer]]...
Allocates and optionally initializes a signed doubleword (4 bytes) of storage for each initializer. Also can be used as a type specifier anywhere a type is legal.

name SEGMENT [[READONLY]] [[align]] [[combine]] [[use]] [['class']]
    statements
name ENDS
Defines a program segment called name having segment attributes align (BYTE, WORD, DWORD, PARA, PAGE), combine (PUBLIC, STACK, COMMON, MEMORY, AT address, PRIVATE), use (USE16, USE32, FLAT), and class.

.SEQ
Orders segments sequentially (the default order).

.SFCOND
See .NOLISTIF.

name SIZESTR textitem
Finds the size of a text item.
.STACK [[size]]
When used with .MODEL, defines a stack segment (with segment name STACK). The optional size specifies the number of bytes for the stack (default 1,024). The .STACK directive automatically closes the stack statement.

.STARTUP
Generates program start-up code.

STRUC
See STRUCT.

name STRUCT [[alignment]] [[, NONUNIQUE]]
    fielddeclarations
name ENDS
Declares a structure type having the specified fielddeclarations. Each field must be a valid data definition. Same as STRUC.

name SUBSTR textitem, position [[, length]]
Returns a substring of textitem, starting at position. The textitem can be a literal string, a constant preceded by a %, or the string returned by a macro function.

SUBTITLE text
Defines the listing subtitle. Same as SUBTTL.

SUBTTL
See SUBTITLE.

name SWORD initializer [[, initializer]]...
Allocates and optionally initializes a signed word (2 bytes) of storage for each initializer. Can also be used as a type specifier anywhere a type is legal.

[[name]] TBYTE initializer [[, initializer]]...
Allocates and optionally initializes 10 bytes of storage for each initializer. Can also be used as a type specifier anywhere a type is legal.

name TEXTEQU [[textitem]]
Assigns textitem to name. The textitem can be a literal string, a constant preceded by a %, or the string returned by a macro function.

.TFCOND
Toggles listing of false conditional blocks.

TITLE text
Defines the program listing title.

name TYPEDEF type
Defines a new type called name, which is equivalent to type.
name UNION [[alignment]] [[, NONUNIQUE]]
    fielddeclarations
[[name]] ENDS
Declares a union of one or more data types. The fielddeclarations must be valid data definitions. Omit the ENDS name label on nested UNION definitions.

.UNTIL
See .REPEAT.

.UNTILCXZ
See .REPEAT.

.WHILE condition
    statements
.ENDW
Generates code that executes the block of statements while condition remains true.

WHILE expression
    statements
ENDM
Repeats assembly of block statements as long as expression remains true.

[[name]] WORD initializer [[, initializer]]...
Allocates and optionally initializes a word (2 bytes) of storage for each initializer. Can also be used as a type specifier anywhere a type is legal.

.XALL
See .LISTMACRO.

.XCREF
See .NOCREF.

.XLIST
See .NOLIST.
CHAPTER 3
Symbols and Operators
Topical Cross-reference for Symbols
Topical Cross-reference for Operators
Predefined Symbols
Operators
Run-Time Operators
Topical Cross-reference for Symbols

Date and Time Information
@Date @Time
Environment Information
@Cpu @Environ @Interface @Version
File Information
@FileCur @FileName @Line
Macro Functions
@CatStr @InStr @SizeStr @SubStr
Miscellaneous
$ @B ? @F @@:
Segment Information
@code @data @fardata? @WordSize @CodeSize @DataSize @Model @CurSeg @fardata @stack
Topical Cross-reference for Operators

Arithmetic
* + - . / [] MOD
Control Flow
! && == || != < > & <= >=
Logical and Shift

AND SHL NOT SHR OR XOR
Macro
! ;; % <> &
Miscellaneous
:: DUP SIGN? ; OVERFLOW? ZERO? : CARRY? PARITY?
Record
MASK WIDTH
Relational
EQ LE GE LT GT NE
Segment
: LROFFSET OFFSET SEG
Type
HIGH LENGTHOF OPATTR SIZE TYPE HIGHWORD LOW PTR SIZEOF LENGTH LOWWORD SHORT THIS
Predefined Symbols
$
The current value of the location counter.

?
In data declarations, a value that the assembler allocates but does not initialize.

@@:
Defines a code label recognizable only between label1 and label2, where label1 is either start of code or the previous @@: label, and label2 is either end of code or the next @@: label. See @B and @F.

@B
The location of the previous @@: label.

@CatStr( string1 [[, string2...]] )
Macro function that concatenates one or more strings. Returns a string.

@code
The name of the code segment (text macro).

@CodeSize
0 for TINY, SMALL, COMPACT, and FLAT models, and 1 for MEDIUM, LARGE, and HUGE models (numeric equate).

@Cpu
A bit mask specifying the processor mode (numeric equate).

@CurSeg
The name of the current segment (text macro).

@data
The name of the default data group. Evaluates to DGROUP for all models except FLAT. Evaluates to FLAT under the FLAT memory model (text macro).

@DataSize
0 for TINY, SMALL, MEDIUM, and FLAT models, 1 for COMPACT and LARGE models, and 2 for HUGE model (numeric equate).

@Date
The system date in the format mm/dd/yy (text macro).

@Environ( envvar )
Value of environment variable envvar (macro function).

@F
The location of the next @@: label.

@fardata
The name of the segment defined by the .FARDATA directive (text macro).
@fardata?
The name of the segment defined by the .FARDATA? directive (text macro).

@FileCur
The name of the current file (text macro).

@FileName
The base name of the main file being assembled (text macro).

@InStr( [[position]], string1, string2 )
Macro function that finds the first occurrence of string2 in string1, beginning at position within string1. If position does not appear, search begins at start of string1. Returns a position integer or 0 if string2 is not found.

@Interface
Information about the language parameters (numeric equate).

@Line
The source line number in the current file (numeric equate).

@Model
1 for TINY model, 2 for SMALL model, 3 for COMPACT model, 4 for MEDIUM model, 5 for LARGE model, 6 for HUGE model, and 7 for FLAT model (numeric equate).

@SizeStr( string )
Macro function that returns the length of the given string. Returns an integer.

@SubStr( string, position [[, length]] )
Macro function that returns a substring starting at position.

@stack
DGROUP for near stacks or STACK for far stacks (text macro).

@Time
The system time in 24-hour hh:mm:ss format (text macro).

@Version
610 in MASM 6.1 (text macro).

@WordSize
Two for a 16-bit segment or 4 for a 32-bit segment (numeric equate).
Operators
expression1 + expression2
Returns expression1 plus expression2.

expression1 - expression2
Returns expression1 minus expression2.

expression1 * expression2
Returns expression1 times expression2.
expression1 / expression2
Returns expression1 divided by expression2.

-expression
Reverses the sign of expression.

expression1 [expression2]
Returns expression1 plus [expression2].

segment: expression
Overrides the default segment of expression with segment. The segment can be a segment register, group name, segment name, or segment expression. The expression must be a constant.

expression. field [[. field]]...
Returns expression plus the offset of field within its structure or union.

[register]. field [[. field]]...
Returns value at the location pointed to by register plus the offset of field within its structure or union.

<text>
Treats text as a single literal element.

"text"
Treats "text" as a string.

'text'
Treats 'text' as a string.

!character
Treats character as a literal character rather than as an operator or symbol.

;text
Treats text as a comment.

;;text
Treats text as a comment in a macro that appears only in the macro definition. The listing does not show text where the macro is expanded.

%expression
Treats the value of expression in a macro argument as text.

&parameter&
Replaces parameter with its corresponding argument value.

ABS
See the EXTERNDEF directive.

ADDR
See the INVOKE directive.

expression1 AND expression2
Returns the result of a bitwise AND operation for expression1 and expression2.
count DUP (initialvalue [[, initialvalue]]...)
Specifies count number of declarations of initialvalue.

expression1 EQ expression2
Returns true (1) if expression1 equals expression2, or returns false (0) if it does not.

expression1 GE expression2
Returns true (1) if expression1 is greater-than-or-equal-to expression2, or returns false (0) if it is not.

expression1 GT expression2
Returns true (1) if expression1 is greater than expression2, or returns false (0) if it is not.

HIGH expression
Returns the high byte of expression.

HIGHWORD expression
Returns the high word of expression.

expression1 LE expression2
Returns true (1) if expression1 is less than or equal to expression2, or returns false (0) if it is not.

LENGTH variable
Returns the number of data items in variable created by the first initializer.

LENGTHOF variable
Returns the number of data objects in variable.

LOW expression
Returns the low byte of expression.

LOWWORD expression
Returns the low word of expression.

LROFFSET expression
Returns the offset of expression. Same as OFFSET, but it generates a loader resolved offset, which allows Windows to relocate code segments.

expression1 LT expression2
Returns true (1) if expression1 is less than expression2, or returns false (0) if it is not.

MASK {recordfieldname | record}
Returns a bit mask in which the bits in recordfieldname or record are set and all other bits are cleared.

expression1 MOD expression2
Returns the integer value of the remainder (modulo) when dividing expression1 by expression2.
expression1 NE expression2
Returns true (1) if expression1 does not equal expression2, or returns false (0) if it does.

NOT expression
Returns expression with all bits reversed.

OFFSET expression
Returns the offset of expression.

OPATTR expression
Returns a word defining the mode and scope of expression. The low byte is identical to the byte returned by .TYPE. The high byte contains additional information.

expression1 OR expression2
Returns the result of a bitwise OR operation for expression1 and expression2.

type PTR expression
Forces the expression to be treated as having the specified type.

[[distance]] PTR type
Specifies a pointer to type.

SEG expression
Returns the segment of expression.

expression SHL count
Returns the result of shifting the bits of expression left count number of bits.

SHORT label
Sets the type of label to short. All jumps to label must be short (within the range -128 to +127 bytes from the jump instruction to label).

expression SHR count
Returns the result of shifting the bits of expression right count number of bits.

SIZE variable
Returns the number of bytes in variable allocated by the first initializer.

SIZEOF {variable | type}
Returns the number of bytes in variable or type.

THIS type
Returns an operand of specified type whose offset and segment values are equal to the current location-counter value.

.TYPE expression
See OPATTR.

TYPE expression
Returns the type of expression.
WIDTH {recordfieldname | record}
Returns the width in bits of the current recordfieldname or record.

expression1 XOR expression2
Returns the result of a bitwise XOR operation for expression1 and expression2.
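The byte- and word-extraction operators and the SHORT range rule can be mimicked outside the assembler. The following Python sketch is only an illustration: the function names are hypothetical, and it assumes that HIGH and LOW act on a 16-bit constant and that HIGHWORD and LOWWORD act on a 32-bit constant.

```python
# Hypothetical Python stand-ins for a few assembly-time operators.
# Assumption: HIGH/LOW are applied to 16-bit constants and
# HIGHWORD/LOWWORD to 32-bit constants.

def low(expr):
    """LOW expression: the low byte."""
    return expr & 0xFF

def high(expr):
    """HIGH expression: the high byte of a 16-bit value."""
    return (expr >> 8) & 0xFF

def lowword(expr):
    """LOWWORD expression: the low word of a 32-bit value."""
    return expr & 0xFFFF

def highword(expr):
    """HIGHWORD expression: the high word of a 32-bit value."""
    return (expr >> 16) & 0xFFFF

def fits_short(distance):
    """True if a jump distance is within the SHORT range (-128 to +127)."""
    return -128 <= distance <= 127

print(hex(high(0x1234)), hex(low(0x1234)))
print(hex(highword(0x12345678)), hex(lowword(0x12345678)))
print(fits_short(127), fits_short(128))
```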
Run-Time Operators
The following operators are used only within .IF, .WHILE, or .REPEAT blocks and are evaluated at run time, not at assembly time:
expression1 == expression2    Is equal to.
expression1 != expression2    Is not equal to.
expression1 > expression2     Is greater than.
expression1 >= expression2    Is greater than or equal to.
expression1 < expression2     Is less than.
expression1 <= expression2    Is less than or equal to.
expression1 || expression2    Logical OR.
expression1 && expression2    Logical AND.
expression1 & expression2     Bitwise AND.
!expression                   Logical negation.
CARRY?                        Status of carry flag.
OVERFLOW?                     Status of overflow flag.
PARITY?                       Status of parity flag.
SIGN?                         Status of sign flag.
ZERO?                         Status of zero flag.
CHAPTER 4
Processor
Topical Cross-reference for Processor Instructions
Interpreting Processor Instructions
    Flags
    Syntax
    Examples
    Clock Speeds
        Timings on the 8088 and 8086 Processors
        Timings on the 80286-80486 Processors
Interpreting Encodings
Interpreting 80386-80486 Encoding Extensions
    16-bit Encoding
    32-bit Encoding
        Address-Size Prefix
        Operand-Size Prefix
        Encoding Differences for 32-Bit Operations
        Scaled Index Base Byte
Instructions
Filename: LMARFC04.DOC Project: Template: MSGRIDA1.DOT Author: Mike Eddy Last Saved By: Mike Eddy Revision #: 67 Page: 49 of 1 Printed: 10/02/00 04:15 PM
Topical Cross-reference for Processor Instructions

Arithmetic
ADC DIV INC SBB ADD IDIV MUL SUB DEC IMUL NEG XADD#
BCD Conversion
AAA AAS AAD DAA AAM DAS
Bit Operations
AND BT BTS RCL ROR SHLD XOR BSF BTC NOT RCR SAR SHR BSR BTR OR ROL SHL/SAL SHRD
Compare
BT BTS CMPXCHG# BTC CMP TEST BTR CMPS
Conditional Set
SETA/SETNBE SETAE/SETNB SETB/SETNAE SETBE/SETNA SETC SETE/SETZ SETG/SETNLE SETGE/SETNL SETL/SETNGE SETLE/SETNG SETNC SETNE/SETNZ SETNO SETNP/SETPO SETNS SETO SETP/SETPE SETS

* 80186-80486 only. # 80486 only. Some unmarked entries in these lists are available only on the 80286-80486 or only on the 80386-80486.
Conditional Transfer
BOUND* JAE/JNB JC JG/JNLE JLE/JNG JNO JO INTO JB/JNAE JCXZ/JECXZ JGE/JNL JNC JNP/JPO JP/JPE JA/JNBE JBE/JNA JE/JZ JL/JNGE JNE/JNZ JNS JS
Data Transfer
BSWAP# LEA MOV MOVZX XCHG CMPXCHG# LFS/LGS/LSS MOVS STOS XLAT/XLATB LDS/LES LODS MOVSX XADD#
Flag
CLC CMC PUSHF STD CLD LAHF SAHF STI CLI POPF STC
Input/Output
IN OUT INS* OUTS*
Loop
JCXZ/JECXZ LOOP LOOPE/LOOPZ LOOPNE/LOOPNZ
Process Control
ARPL CLTS LAR LGDT/LIDT/LLDT LMSW LSL LTR SGDT/SIDT/SLDT SMSW STR VERR VERW MOV special INVD# INVLPG# WBINVD#
Processor Control
HLT NOP LOCK WAIT
Stack
PUSH PUSHAD* POPA* LEAVE* PUSHF POP POPAD* PUSHA* POPF ENTER*
String
MOVS SCAS OUTS* REPNE/REPNZ LODS CMPS REP STOS INS* REPE/REPZ
Type Conversion
CBW CWDE BSWAP# CWD CDQ
Unconditional Transfer
CALL INT IRET JMP RET RETN/RETF
Interpreting Processor Instructions

The following sections explain the format of instructions for the 8086, 8088, 80286, 80386, and 80486 processors. Those instructions begin on page 64.
Flags
Only the flags common to all processors are shown. If none of the flags is affected by the instruction, the flag line says No change. If flags can be affected, a two-line entry is shown. The first line shows flag abbreviations as follows:
Abbreviation    Flag
O               Overflow
D               Direction
I               Interrupt
T               Trap
S               Sign
Z               Zero
A               Auxiliary carry
P               Parity
C               Carry
The second line has codes indicating how the flag can be affected:
Code     Effect
1        Sets the flag
0        Clears the flag
?        May change the flag, but the value is not predictable
blank    No effect on the flag
±        Modifies according to the rules associated with the flag
Syntax
Each encoding variation may have different syntaxes corresponding to different addressing modes. The following abbreviations are used:
reg         A general-purpose register of any size.
segreg      One of the segment registers: DS, ES, SS, or CS (also FS or GS on the 80386-80486).
accum       An accumulator register of any size: AL or AX (also EAX on the 80386-80486).
mem         A direct or indirect memory operand of any size.
label       A labeled memory location in the code segment.
src,dest    A source or destination memory operand used in a string operation.
immed       A constant operand.
In some cases abbreviations have numeric suffixes to specify that the operand must be a particular size. For example, reg16 means that only a 16-bit (word) register is accepted.
Examples
One or more examples are shown for each syntax. Their position is not related to the clock speeds in the right column.
Clock Speeds
Column 3 shows the clock speeds for each processor. Sometimes an instruction may have more than one clock speed. Multiple speeds are separated by commas. If several speeds are part of an expression, they are enclosed in parentheses. The following abbreviations are used to specify variations:
EA        Effective address. This applies only to the 8088 and 8086 processors, as described in the next section.
b,w,d     Byte, word, or doubleword operands.
pm        Protected mode.
n         Iterations. Repeated instructions may have a base number of clocks plus a number of clocks for each iteration. For example, 8+4n means 8 clocks plus 4 clocks for each iteration.
noj       No jump. For conditional jump instructions, noj indicates the speed if the condition is false and the jump is not taken.
m         Next instruction components. Some control transfer instructions take different times depending on the length of the next instruction executed. On the 8088 and 8086, m is never a factor. On the 80286, m is the number of bytes in the instruction. On the 80386-80486, m is the number of components. Each byte of encoding is a component, and the displacement and data are separate components.
W88,88    8088 exceptions. See Timings on the 8088 and 8086 Processors, following.
Clocks can be converted to nanoseconds by dividing 1 microsecond (1,000 nanoseconds) by the number of megahertz (MHz) at which the processor is running. For example, on a processor running at 8 MHz, 1 clock takes 125 nanoseconds (1,000 nanoseconds / 8). The clock counts are for best-case timings. Actual timings vary depending on wait states, alignment of the instruction, the status of the prefetch queue, and other factors.
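The clock-to-nanosecond conversion can be sketched as a small Python helper. The function name clocks_to_ns is hypothetical, and the result is a best-case figure only, since wait states and prefetch effects are ignored.

```python
def clocks_to_ns(clocks, mhz):
    """Convert a best-case clock count to nanoseconds at a clock rate in MHz."""
    ns_per_clock = 1000.0 / mhz  # 1 microsecond = 1,000 ns, divided by the MHz rate
    return clocks * ns_per_clock

# One clock on an 8 MHz processor.
print(clocks_to_ns(1, 8))

# A repeated instruction listed as 8+4n, with n = 10 iterations, at 8 MHz.
n = 10
print(clocks_to_ns(8 + 4 * n, 8))
```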
Timings on the 8088 and 8086 Processors

Because of its 8-bit data bus, the 8088 always requires two fetches to get a 16-bit operand. Therefore, instructions that work on 16-bit memory operands take longer on the 8088 than on the 8086. Separate 8088 timings are shown in parentheses following the main timing. For example, 9 (W88=13) means that the 8086 with any operands or the 8088 with byte operands takes 9 clocks, but the 8088 with word operands takes 13 clocks. Similarly, 16 (88=24) means that the 8086 takes 16 clocks, but the 8088 takes 24 clocks.
On the 8088 and 8086, the effective address (EA) value must be added for instructions that operate on memory operands. A displacement is any direct memory or constant operand, or any combination of the two. The following shows the number of clocks to add for the effective address:
Components                                    EA Clocks    Examples
Displacement                                  6            mov ax,stuff
                                                           mov ax,stuff+2
Base or index                                 5            mov ax,[bx]
                                                           mov ax,[di]
Displacement plus base or index               9            mov ax,[bp+8]
                                                           mov ax,stuff[di]
Base plus index (BP+DI, BX+SI)                7            mov ax,[bx+si]
                                                           mov ax,[bp+di]
Base plus index (BP+SI, BX+DI)                8            mov ax,[bx+di]
                                                           mov ax,[bp+si]
Base plus index plus displacement             11           mov ax,stuff[bx+si]
(BP+DI+disp, BX+SI+disp)                                   mov ax,[bp+di+8]
Base plus index plus displacement             12           mov ax,stuff[bx+di]
(BP+SI+disp, BX+DI+disp)                                   mov ax,[bp+si+20]
Segment override                              EA+2         mov ax,es:stuff
                                                           mov ax,ds:[bp+10]
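The effective-address table can be expressed as a lookup function. This Python sketch is illustrative only; ea_clocks and its parameters are hypothetical names, and the function returns just the EA clocks to add, not the full instruction timing.

```python
# Base/index pairs that take 7 clocks without a displacement (11 with one).
FAST_PAIRS = {("bp", "di"), ("bx", "si")}

def ea_clocks(base=None, index=None, disp=False, seg_override=False):
    """Return the 8086/8088 effective-address clocks for a memory operand.

    base is "bx" or "bp", index is "si" or "di", disp is True when the
    operand has a displacement, seg_override adds the 2-clock penalty.
    """
    if base and index:
        clocks = 7 if (base, index) in FAST_PAIRS else 8
        if disp:
            clocks += 4  # 7 -> 11 and 8 -> 12, as in the table
    elif base or index:
        clocks = 9 if disp else 5
    elif disp:
        clocks = 6
    else:
        raise ValueError("operand has no base, index, or displacement")
    return clocks + (2 if seg_override else 0)

print(ea_clocks(disp=True))                            # mov ax,stuff
print(ea_clocks(base="bx", index="di", disp=True))     # mov ax,stuff[bx+di]
print(ea_clocks(base="bp", disp=True, seg_override=True))  # mov ax,ds:[bp+10]
```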
Timings on the 80286-80486 Processors

On the 80286-80486 processors, the effective address calculation is handled by hardware and is therefore not a factor in clock calculations except in one case. If a memory operand includes all three possible elements (a displacement, a base register, and an index register), then add one clock. On the 80486, the extra clock is not always used. Examples are shown in the following.
mov ax,[bx+di]         ;No extra
mov ax,array[bx+di]    ;One extra
mov ax,[bx+di+6]       ;One extra
Note 80186 and 80188 timings are different from 8088, 8086, and 80286 timings. They are not shown in this manual. Timings are also not shown for protected-mode transfers through gates or for the virtual 8086 mode available on the 80386-80486 processors.
Interpreting Encodings
Encodings are shown for each variation of the instruction. This section describes encoding for all processors except the 80386-80486. The encodings take the form of boxes filled with 0s and 1s for bits that are constant for the instruction variation, and abbreviations (in italics) for the following variable bits or bitfields:
d      Direction bit. If set, do memory to register; the reg field is the destination. If clear, do register to memory or register to register; the reg field is the source.
a      Accumulator direction bit. If set, move accumulator register to memory. If clear, move memory to accumulator register.
w      Word/byte bit. If set, use 16-bit or 32-bit operands. If clear, use 8-bit operands.
s      Sign bit. If set, sign-extend 8-bit immediate data to 16 bits.
mod    Mode. This 2-bit field gives the register/memory mode with displacement. The possible values are shown below:
       mod    Meaning
       00     This value can have two meanings: If r/m is 110, a direct memory operand is used. If r/m is not 110, the displacement is 0 and an indirect memory operand is used. The operand must be based, indexed, or based indexed.
       01     An indirect memory operand is used with an 8-bit displacement.
       10     An indirect memory operand is used with a 16-bit displacement.
       11     A two-register instruction is used; the reg field specifies the destination and the r/m field specifies the source.
reg    Register. This 3-bit field specifies one of the general-purpose registers:
       reg    16/32-bit if w=1    8-bit if w=0
       000    AX/EAX              AL
       001    CX/ECX              CL
       010    DX/EDX              DL
       011    BX/EBX              BL
       100    SP/ESP              AH
       101    BP/EBP              CH
       110    SI/ESI              DH
       111    DI/EDI              BH
       The reg field is sometimes used to specify encoding information rather than a register.
sreg   Segment register. This field specifies one of the segment registers:
       sreg   Register
       000    ES
       001    CS
       010    SS
       011    DS
       100    FS
       101    GS
r/m    Register/memory. This 3-bit field specifies a register or memory operand. If the mod field is 11, r/m specifies the source register using the reg field codes. Otherwise, the field has one of the following values:
       r/m    Operand Address
       000    DS:[BX+SI+disp]
       001    DS:[BX+DI+disp]
       010    SS:[BP+SI+disp]
       011    SS:[BP+DI+disp]
       100    DS:[SI+disp]
       101    DS:[DI+disp]
       110    SS:[BP+disp]*
       111    DS:[BX+disp]
       * If mod is 00 and r/m is 110, then the operand is treated as a direct memory operand. This means that the operand [BP] is encoded as [BP+0] rather than having a short form like other register indirect operands. Encoding [BX] takes one byte, but encoding [BP] takes two.
disp   Displacement. These bytes give the offset for memory operands. The possible lengths (in bytes) are shown in parentheses.
data   Data. These bytes give the actual value for constant values. The possible lengths (in bytes) are shown in parentheses.
If a memory operand has a segment override, the entire instruction has one of the following bytes as a prefix:
Prefix            Segment
00101110 (2Eh)    CS
00111110 (3Eh)    DS
00100110 (26h)    ES
00110110 (36h)    SS
01100100 (64h)    FS
01100101 (65h)    GS
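The mod, reg, and r/m fields described above pack into a single byte as mod (2 bits), reg (3 bits), r/m (3 bits). The Python sketch below illustrates packing and decoding for 16-bit addressing only; the function names and the "DS:[disp16]" notation for a direct memory operand are assumptions made for the illustration.

```python
# 16-bit r/m operand forms, indexed by the r/m field value.
RM16 = ["DS:[BX+SI+disp]", "DS:[BX+DI+disp]", "SS:[BP+SI+disp]", "SS:[BP+DI+disp]",
        "DS:[SI+disp]", "DS:[DI+disp]", "SS:[BP+disp]", "DS:[BX+disp]"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]

def pack_modrm(mod, reg, rm):
    """Pack the three fields into the mod-reg-r/m byte."""
    return (mod << 6) | (reg << 3) | rm

def describe_modrm(byte, w=1):
    """Split a mod-reg-r/m byte and describe the r/m operand (16-bit addressing)."""
    mod, reg, rm = byte >> 6, (byte >> 3) & 7, byte & 7
    if mod == 0b11:
        operand = (REG16 if w else REG8)[rm]      # register operand
    elif mod == 0b00 and rm == 0b110:
        operand = "DS:[disp16]"                   # direct memory operand
    else:
        size = {0b00: "", 0b01: "+disp8", 0b10: "+disp16"}[mod]
        operand = RM16[rm].replace("+disp", size)
    return mod, reg, operand

print(hex(pack_modrm(0b10, 0b000, 0b001)))
print(describe_modrm(0x81))
print(describe_modrm(0x46))
```

Note how 0x46 (mod 01, r/m 110) decodes to a BP-based operand with an 8-bit displacement, which is why [BP] needs a displacement byte while [BX] does not.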
Example
As an example, assume you want to calculate the encoding for the following statement (where warray is a 16-bit variable):
add warray[bx+di], -3
First look up the encoding for the immediate-to-memory syntax of the ADD instruction:
100000sw mod,000,r/m disp (0, 1, or 2) data (0, 1, or 2)
Since the destination is a word operand, the w bit is set. The 8-bit immediate data must be sign-extended to 16 bits to fit into the operand, so the s bit is also set. The first byte of the instruction is therefore 10000011 (83h).
Since the memory operand can be anywhere in the segment, it must have a 16-bit offset (displacement). Therefore the mod field is 10. The reg field is 000, as shown in the encoding. The r/m coding for [bx+di+disp] is 001. The second byte is 10000001 (81h).
The next two bytes are the offset of warray. The low byte of the offset is stored first and the high byte second. For this example, assume that warray is located at offset 10EFh.
The last byte of the instruction is used to store the 8-bit immediate value -3 (FDh). This value is encoded as 8 bits (but sign-extended to 16 bits by the processor).
The encoding is shown here in hexadecimal:
83 81 EF 10 FD
You can confirm this by assembling the instruction and looking at the resulting assembly listing.
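The same steps can be checked with a short Python sketch. The function name encode_add_bx_di is hypothetical, and the sketch hard-codes the choices made in the walkthrough (s and w set, mod 10, r/m 001) rather than implementing a general assembler.

```python
def encode_add_bx_di(disp16, imm8):
    """Encode `add word ptr disp16[bx+di], imm8` using the 100000sw form."""
    s, w = 1, 1
    opcode = 0b10000000 | (s << 1) | w           # 100000sw -> 83h
    modrm = (0b10 << 6) | (0b000 << 3) | 0b001   # mod=10, reg=000, r/m=001 -> 81h
    return (bytes([opcode, modrm])
            + disp16.to_bytes(2, "little")             # low byte first
            + imm8.to_bytes(1, "little", signed=True)) # -3 -> FDh

code = encode_add_bx_di(0x10EF, -3)
print(code.hex(" ").upper())
```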
Interpreting 80386-80486 Encoding Extensions

This book shows 80386-80486 encodings for instructions that are available only on the 80386-80486 processors. For other instructions, encodings are shown only for the 16-bit subset available on all processors. This section tells how to convert the 80286 encodings shown in the book to 80386-80486 encodings that use extensions such as 32-bit registers and memory operands. The extended 80386-80486 encodings differ in that they can have additional prefix bytes, a Scaled Index Base (SIB) byte, and 32-bit displacement and immediate bytes. Use of these elements is closely tied to the segment word size. The use type of the code segment determines whether the instructions are processed in 32-bit mode (USE32) or 16-bit mode (USE16). Current versions of MS-DOS and Microsoft Windows use 16-bit mode only. Windows NT uses 32-bit mode. The bytes that can appear in an instruction encoding are:
16-Bit Encoding
Opcode (1-2)    mod-reg-r/m (0-1)    disp (0-2)    immed (0-2)
32-Bit Encoding
Address-Size (67h) (0-1)    Operand-Size (66h) (0-1)    Opcode (1-2)    mod-reg-r/m (0-1)    Scaled Index Base (0-1)    disp (0-4)    immed (0-4)
Additional bytes may be added for a segment prefix, a repeat prefix, or the LOCK prefix.
Address-Size Prefix
The address-size prefix determines the segment word size of the operation. It can override the default size for calculating the displacement of memory addresses. The address prefix byte is 67h. The assembler automatically inserts this byte where appropriate. In 32-bit mode (USE32 or FLAT code segment), displacements are calculated as 32-bit addresses. The effective address-size prefix must be used for any instructions that must calculate addresses as 16-bit displacements. In 16-bit mode, the defaults are reversed. The prefix must be used to specify calculation of 32-bit displacements.
Operand-Size Prefix
The operand-size prefix determines the size of operands. It can override the default size of registers or memory operands. The operand-size prefix byte is 66h. The assembler automatically inserts this byte where appropriate. In 32-bit mode, the default sizes for operands are 8 bits and 32 bits (depending on the w bit). Most instructions that use 16-bit operands in this mode must have the operand-size prefix. In 16-bit mode, the default sizes are 8 bits and 16 bits. The prefix must be used for any instructions that use 32-bit operands. Some instructions use 16-bit operands, regardless of mode.
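The prefix rules for both sizes reduce to a comparison against the segment word size. The Python sketch below is a simplification under stated assumptions: size_prefixes is a hypothetical name, and the function ignores the instructions that always use 16-bit operands regardless of mode.

```python
ADDRESS_SIZE_PREFIX = 0x67
OPERAND_SIZE_PREFIX = 0x66

def size_prefixes(segment_bits, operand_bits, address_bits):
    """Return the size prefixes needed, in the order they appear in the encoding.

    segment_bits is 16 (USE16) or 32 (USE32/FLAT); operand_bits is 8, 16,
    or 32; address_bits is 16 or 32.
    """
    prefixes = []
    if address_bits != segment_bits:
        prefixes.append(ADDRESS_SIZE_PREFIX)
    # 8-bit operands are selected by the w bit, never by a prefix.
    if operand_bits != 8 and operand_bits != segment_bits:
        prefixes.append(OPERAND_SIZE_PREFIX)
    return prefixes

# 16-bit data addressed through 32-bit registers in a USE16 segment.
print([hex(p) for p in size_prefixes(16, 16, 32)])
# 16-bit data in a USE32 segment.
print([hex(p) for p in size_prefixes(32, 16, 32)])
```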
Encoding Differences for 32-Bit Operations

When 32-bit operations are performed, the meaning of certain bits or fields is different from their meaning in 16-bit operations. The changes may affect default operations in 32-bit mode, or 16-bit mode operations in which the address-size prefix or the operand-size prefix is used. The following fields may have a different meaning for 32-bit operations from their meaning as described in the Interpreting Encodings section:
w      Word/byte bit. If set, use 32-bit operands. If clear, use 8-bit operands.
s      Sign bit. If set, sign-extend 8-bit and 16-bit immediate data to 32 bits.
mod    Mode. This field indicates the register/memory mode. The value 11 still indicates a register-to-register operation with r/m containing the code for a 32-bit source register. However, other codes have different meanings as shown in the tables in the next section.
reg    Register. The codes for 16-bit registers are extended to 32-bit registers. For example, if the reg field is 000, EAX is used instead of AX. Use of 8-bit registers is unchanged.
sreg   Segment register. The 80386 has the following additional segment registers:
       sreg   Register
       100    FS
       101    GS
r/m    Register/memory. If the r/m field is used for the source register, 32-bit registers are used as for the reg field. If the field is used for memory operands, the meaning is completely different from the meaning used for 16-bit operations, as shown in the tables in the next section.
disp   Displacement. This field is 4 bytes for 32-bit addresses.
data   Data. Immediate data can be up to 4 bytes.
Scaled Index Base Byte

Many 80386-80486 extended memory operands are too complex to be represented by a single mod-reg-r/m byte. For these operands, a value of 100 in the r/m field signals the presence of a second encoding byte called the Scaled Index Base (SIB) byte. The SIB byte is made up of the following fields:

ss   index   base

ss      Scaling Field. This two-bit field specifies one of the following scaling factors:

        ss      Scale
        00      1
        01      2
        10      4
        11      8

index   Index Register. This three-bit field specifies one of the following index registers:

        index   Register
        000     EAX
        001     ECX
        010     EDX
        011     EBX
        100     no index
        101     EBP
        110     ESI
        111     EDI

        Note ESP cannot be an index register. If the index field is 100, the ss field must be 00.

base    Base Register. This 3-bit field combines with the mod field to specify the base register and the displacement. Note that the base field only specifies the base when the r/m field is 100. Otherwise, the r/m field specifies the base.

The possible combinations of the mod, r/m, scale, index, and base fields are as follows:

[Table of mod, r/m, scale, index, and base combinations not recoverable from this copy.]
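The SIB field layout described above can be sketched in a few lines of Python. This is an illustrative model only, not part of MASM; the names `pack_sib` and `unpack_sib` are hypothetical helpers introduced for this sketch.

```python
# Hypothetical helpers: pack and unpack a Scaled Index Base (SIB) byte.
# Layout assumed from the field list above: ss (bits 7-6), index (bits 5-3), base (bits 2-0).
SCALE_TO_SS = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}

def pack_sib(scale, index, base):
    """Return the SIB byte for a scale factor and 3-bit index/base register codes."""
    if not (0 <= index <= 7 and 0 <= base <= 7):
        raise ValueError("index and base are 3-bit register codes")
    if index == 0b100 and scale != 1:
        raise ValueError("index 100 (no index) requires ss = 00")
    return (SCALE_TO_SS[scale] << 6) | (index << 3) | base

def unpack_sib(sib):
    """Return (scale, index, base) for a SIB byte."""
    return 1 << (sib >> 6), (sib >> 3) & 0b111, sib & 0b111

# [eax + ecx*2]: scale 2, index ECX (001), base EAX (000)
print(hex(pack_sib(2, 0b001, 0b000)))
```

Packing scale 2, index ECX, and base EAX gives the same SIB value (48h) that the worked example in this section derives by hand.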
If a memory operand has a segment override, the entire instruction has one of the prefixes discussed in the preceding section, Interpreting Encodings, or one of the following prefixes for the segment registers available only on the 80386-80486:

Prefix            Segment
01100100 (64h)    FS
01100101 (65h)    GS
Example
Assume you want to calculate the encoding for the following statement (where warray is a 16-bit variable). Assume that the instruction is used in 16-bit mode.

add warray[eax+ecx*2], -3

First look up the encoding for the immediate-to-memory syntax of the ADD instruction:

100000sw   mod,000,r/m   disp (0, 1, or 2)   data (1 or 2)

This encoding must be expanded to account for 80386-80486 extensions. Note that the instruction operates on 16-bit data in a 16-bit mode program. Therefore, the operand-size prefix is not needed. However, the instruction does use 32-bit registers to calculate a 32-bit effective address. Thus the first byte of the encoding must be the effective address-size prefix, 01100111 (67h).

The opcode byte is the same (83h) as for the 80286 example described in the Interpreting Encodings section.

The mod-reg-r/m byte must specify a based indexed operand with a scaling factor of two. This operand cannot be specified with a single byte, so the encoding must also use the SIB byte. The value 100 in the r/m field specifies an SIB byte. The reg field is 000, as shown in the encoding. The mod field is 10 for operands that have base and scaled index registers and a 32-bit displacement. The combined mod, reg, and r/m fields for the second byte are 10000100 (84h).

The SIB byte is next. The scaling factor is 2, so the ss field is 01. The index register is ECX, so the index field is 001. The base register is EAX, so the base field is 000. The SIB byte is 01001000 (48h).

The next 4 bytes are the offset of warray. The low bytes are stored first. For this example, assume that warray is located at offset 10EFh. This offset only requires 2 bytes, but 4 must be supplied because of the addressing mode. A 32-bit address can be safely used in 16-bit mode as long as the upper word is 0.

The last byte of the instruction is used to store the 8-bit immediate value -3 (FDh). The encoding is shown here in hexadecimal:

67 83 84 48 EF 10 00 00 FD
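The byte-by-byte derivation above can be checked with a short Python sketch. It is a minimal model under the assumptions stated in the example (16-bit mode, 8-bit sign-extended immediate, 32-bit displacement stored low byte first); the function name `encode_add_mem_imm8` is hypothetical and covers only this one addressing form.

```python
# Hypothetical sketch: assemble "add warray[eax+ecx*2], -3" for 16-bit mode.
def encode_add_mem_imm8(disp32, scale, index, base, imm8):
    prefix = 0x67                                 # address-size prefix (32-bit addressing in 16-bit mode)
    opcode = 0b10000011                           # 100000sw with s=1 (sign-extend imm8), w=1 (word)
    modrm = (0b10 << 6) | (0b000 << 3) | 0b100    # mod=10 (disp32), reg=000 (ADD), r/m=100 (SIB follows)
    ss = {1: 0, 2: 1, 4: 2, 8: 3}[scale]
    sib = (ss << 6) | (index << 3) | base
    return (bytes([prefix, opcode, modrm, sib])
            + disp32.to_bytes(4, "little")        # displacement, low byte first
            + bytes([imm8 & 0xFF]))               # two's-complement 8-bit immediate

# warray at offset 10EFh, index ECX (001), base EAX (000)
code = encode_add_mem_imm8(0x10EF, 2, 0b001, 0b000, -3)
print(" ".join(f"{b:02X}" for b in code))
```

Because the displacement is emitted little-endian, the offset bytes appear as EF 10 followed by the two zero bytes of the upper word.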
Instructions
This section provides an alphabetical reference to the instructions for the 8086, 8088, 80286, 80386, and 80486 processors.
AAA ASCII Adjust After Addition

Adjusts the result of an addition to a decimal digit (0-9). The previous addition instruction should place its 8-bit sum in AL. If the sum is greater than 9h, AH is incremented and the carry and auxiliary carry flags are set. Otherwise, the carry and auxiliary carry flags are cleared.

Flags
O D I T S Z A P C
?       ? ? ± ? ±

Encoding
00110111

Syntax    Examples    CPU      Clock Cycles
AAA       aaa         88/86    8
                      286      3
                      386      4
                      486      3
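As an illustration of the adjustment, the following Python sketch models AAA on register values. It is an assumption-laden model rather than MASM code: it follows the usual Intel definition (adjust when the low nibble of AL exceeds 9 or the auxiliary carry flag is set), and the function name `aaa` and its return tuple are hypothetical.

```python
# Hypothetical model of AAA: returns (AL, AH, carry) after the adjustment.
def aaa(al, ah, af=False):
    if (al & 0x0F) > 9 or af:
        al = (al + 6) & 0xFF       # push the low nibble back into the 0-9 range
        ah = (ah + 1) & 0xFF       # carry one decimal digit into AH
        carry = True               # CF and AF are set
    else:
        carry = False              # CF and AF are cleared
    return al & 0x0F, ah, carry    # the high nibble of AL is cleared

# ASCII '8' + ASCII '9' = 38h + 39h = 71h, with auxiliary carry set by the addition
print(aaa(0x71, 0, af=True))
```

For the ASCII digits '8' and '9', the model yields AH = 1 and AL = 7, the unpacked BCD form of 17.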
AAD ASCII Adjust Before Division

Converts unpacked BCD digits in AH (most significant digit) and AL (least significant digit) to a binary number in AX. This instruction is often used to prepare an unpacked BCD number in AX for division by an unpacked BCD digit in an 8-bit register.

Flags
O D I T S Z A P C
?       ± ± ? ± ?

Encoding
11010101   00001010

Syntax    Examples    CPU
AAD       aad         88/86, 286, 386, 486
AAM ASCII Adjust After Multiply

Converts an 8-bit binary number less than 100 decimal in AL to an unpacked BCD number in AX. The most significant digit goes in AH and the least significant in AL. This instruction is often used to adjust the product after a MUL instruction that multiplies unpacked BCD digits in AH and AL. It is also used to adjust the quotient after a DIV instruction that divides a binary number less than 100 decimal in AX by an unpacked BCD number.

Flags
O D I T S Z A P C
?       ± ± ? ± ?

Encoding
11010100   00001010

Syntax    Examples    CPU
AAM       aam         88/86, 286, 386, 486
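AAM and AAD are inverse conversions between a binary byte and two unpacked BCD digits. The Python sketch below models both under the default base of 10 implied by the encodings above; the function names and return conventions are hypothetical.

```python
# Hypothetical models of AAM and AAD with the default divisor/multiplier of 10.
def aam(al):
    """Split a binary value below 100 in AL into unpacked BCD digits (AH, AL)."""
    return divmod(al & 0xFF, 10)            # AH = tens digit, AL = ones digit

def aad(ah, al):
    """Combine unpacked BCD digits in AH:AL into a binary value (AH, AL)."""
    return 0, (ah * 10 + al) & 0xFF         # AH is cleared, AL holds the binary number

# 7 * 9 = 63 decimal (3Fh) becomes digits 6 and 3, and back again
print(aam(63), aad(6, 3))
```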
AAS ASCII Adjust After Subtraction

Adjusts the result of a subtraction to a decimal digit (0-9). The previous subtraction instruction should place its 8-bit result in AL. If the result is greater than 9h, AH is decremented and the carry and auxiliary carry flags are set. Otherwise, the carry and auxiliary carry flags are cleared.

Flags
O D I T S Z A P C
?       ? ? ± ? ±

Encoding
00111111

Syntax    Examples    CPU
AAS       aas         88/86, 286, 386, 486
ADC Add with Carry

Adds the source operand, the destination operand, and the value of the carry flag. The result is assigned to the destination operand. This instruction is used to add the more significant portions of numbers that must be added in multiple registers.

Flags
O D I T S Z A P C
±       ± ± ± ± ±

Encoding
000100dw   mod,reg,r/m   disp (0, 1, or 2)

Syntax           Examples                    CPU      Clock Cycles
ADC reg,reg      adc dx,cx                   88/86    3
                                             286      2
                                             386      2
                                             486      1
ADC mem,reg      adc WORD PTR m32[2],dx      88/86    16+EA (W88=24+EA)
                                             286      7
                                             386      7
                                             486      3
ADC reg,mem      adc dx,WORD PTR m32[2]      88/86    9+EA (W88=13+EA)
                                             286      7
                                             386      6
                                             486      2

Encoding
100000sw   mod,010,r/m   disp (0, 1, or 2)   data (1 or 2)

Syntax           Examples                    CPU      Clock Cycles
ADC reg,immed    adc dx,12                   88/86    4
                                             286      3
                                             386      2
                                             486      1
ADC mem,immed    adc WORD PTR m32[2],16      88/86    17+EA (W88=23+EA)
                                             286      7
                                             386      7
                                             486      3

Encoding
0001010w   data (1 or 2)

Syntax           Examples                    CPU
ADC accum,immed  adc ax,5                    88/86, 286, 386, 486
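The multiword use of ADC can be modeled in Python: an ordinary add handles the low words and produces a carry, and an add-with-carry folds that carry into the high words. This is a sketch only; `add16` and `add32_via_16` are hypothetical names, and the 16-bit word size is an assumption matching the examples above.

```python
# Hypothetical model: a 32-bit addition performed as ADD (low words) then ADC (high words).
def add16(a, b, carry_in=0):
    """Add two 16-bit values plus a carry; return (16-bit result, carry out)."""
    total = (a & 0xFFFF) + (b & 0xFFFF) + carry_in
    return total & 0xFFFF, total >> 16

def add32_via_16(x, y):
    lo, carry = add16(x & 0xFFFF, y & 0xFFFF)       # models: add ax, WORD PTR m32[0]
    hi, carry = add16(x >> 16, y >> 16, carry)      # models: adc dx, WORD PTR m32[2]
    return (hi << 16) | lo, carry

print(hex(add32_via_16(0x0001FFFF, 0x00000001)[0]))
```

The carry out of the low-word addition is what makes the high word advance from 0001h to 0002h in this example.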
ADD Add
Adds the source and destination operands and puts the sum in the destination operand.

Flags
O D I T S Z A P C
±       ± ± ± ± ±

Encoding
000000dw   mod,reg,r/m   disp (0, 1, or 2)

Syntax           Examples
ADD reg,reg      add ax,bx
ADD mem,reg      add total,cx
                 add array[bx+di],dx
ADD reg,mem      add cx,incr
                 add dx,[bp+6]

Encoding
100000sw   mod,000,r/m   disp (0, 1, or 2)   data (1 or 2)

Syntax           Examples
ADD reg,immed    add bx,6
ADD mem,immed    add amount,27
                 add pointers[bx][si],6

Encoding
0000010w   data (1 or 2)

Syntax           Examples                    CPU
ADD accum,immed  add ax,10                   88/86, 286, 386, 486
AND Logical AND

Performs a bitwise AND operation on the source and destination operands and stores the result in the destination operand. For each bit position in the operands, if both bits are set, the corresponding bit of the result is set. Otherwise, the corresponding bit of the result is cleared.

Flags
O D I T S Z A P C
0       ± ± ? ± 0

Encoding
001000dw   mod,reg,r/m   disp (0, 1, or 2)

Syntax           Examples
AND reg,reg      and dx,bx
AND mem,reg      and bitmask,bx
                 and [bp+2],dx
AND reg,mem      and bx,masker
                 and dx,marray[bx+di]

Encoding
100000sw   mod,100,r/m   disp (0, 1, or 2)   data (1 or 2)

Syntax           Examples                    CPU      Clock Cycles
AND reg,immed    and dx,0F7h                 88/86    4
                                             286      3
                                             386      2
                                             486      1
AND mem,immed    and masker,1001b            88/86    17+EA (W88=24+EA)
                                             286      7
                                             386      7
                                             486      3

Encoding
0010010w   data (1 or 2)

Syntax           Examples                    CPU
AND accum,immed  and ax,0B6h                 88/86, 286, 386, 486
ARPL Adjust Requested Privilege Level

80286-80486 Protected Only

Verifies that the destination Requested Privilege Level (RPL) field (bits 0 and 1 of a selector value) is not less than the source RPL field. If it is less, ARPL adjusts the destination RPL to match the source RPL. The destination operand should be a 16-bit memory or register operand containing the value of a selector. The source operand should be a 16-bit register containing the test value. The zero flag is set if the destination is adjusted; otherwise, the flag is cleared. ARPL is useful only in 80286-80486 protected mode. See Intel documentation for details on selectors and privilege levels.

Flags
O D I T S Z A P C
          ±

Encoding
01100011   mod,reg,r/m   disp (0, 1, or 2)

Syntax           Examples              CPU      Clock Cycles
ARPL reg,reg     arpl ax,cx            286      10
                                       386      20
                                       486      9
ARPL mem,reg     arpl selector,dx      286      11
                                       386      21
                                       486      9
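A small Python sketch can make the RPL adjustment concrete. It assumes the behavior described in Intel documentation (the destination RPL is raised to the source RPL when it is numerically lower, and the zero flag reports whether that happened); the function name `arpl` and its return tuple are hypothetical.

```python
# Hypothetical model of ARPL on 16-bit selector values.
RPL_MASK = 0b11   # the RPL occupies bits 0 and 1 of a selector

def arpl(dest, src):
    """Return (new destination selector, zero flag)."""
    if (dest & RPL_MASK) < (src & RPL_MASK):
        # Destination claims more privilege than the source allows: raise its RPL.
        return (dest & ~RPL_MASK & 0xFFFF) | (src & RPL_MASK), True
    return dest, False

# Selector 0008h (RPL 0) checked against a caller selector with RPL 3
print(tuple(map(hex, arpl(0x0008, 0x0013)[:1])), arpl(0x0008, 0x0013)[1])
```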
BOUND Check Array Bounds

80286-80486 Only

Verifies that a signed index value is within the bounds of an array. The destination operand can be any 16-bit register containing the index to be checked. The source operand must then be a 32-bit memory operand in which the low and high words contain the starting and ending values, respectively, of the array. (On the 80386-80486 processors, the destination operand can be a 32-bit register; in this case, the source operand must be a 64-bit operand made up of 32-bit bounds.) If the index value is less than the first bound or greater than the last bound, an interrupt 5 is generated. The instruction pointer pushed by the interrupt (and returned by IRET) points to the BOUND instruction rather than to the next instruction.

Flags
No change

Encoding
01100010   mod,reg,r/m   disp (2)

Syntax                 Examples            CPU      Clock Cycles
BOUND reg16,mem32      bound di,base-4     286      noj=13
BOUND reg32,mem64*                         386      noj=10
                                           486      noj=7

* 80386-80486 only.
See INT for timings if interrupt 5 is called.
BSF/BSR Bit Scan

80386-80486 Only

Scans an operand to find the first set bit. If a set bit is found, the zero flag is cleared and the destination operand is loaded with the bit index of the first set bit encountered. If no set bit is found, the zero flag is set. BSF (Bit Scan Forward) scans from bit 0 to the most significant bit. BSR (Bit Scan Reverse) scans from the most significant bit of an operand to bit 0.

Flags
O D I T S Z A P C
          ±

Encoding
00001111   10111100   mod,reg,r/m   disp (0, 1, 2, or 4)

Syntax              Examples            CPU      Clock Cycles
BSF reg16,reg16     bsf cx,bx           386      10+3n*
BSF reg32,reg32                         486      6-42†
BSF reg16,mem16     bsf ecx,bitmask     386      10+3n*
BSF reg32,mem32                         486      7-43‡

Encoding
00001111   10111101   mod,reg,r/m   disp (0, 1, 2, or 4)

Syntax              Examples            CPU      Clock Cycles
BSR reg16,reg16     bsr cx,dx           386      10+3n*
BSR reg32,reg32                         486      103-3n#
BSR reg16,mem16     bsr eax,bitmask     386      10+3n*
BSR reg32,mem32                         486      104-3n#

* n = bit position from 0 to 31. Clocks = 6 if second operand equals 0.
† Clocks = 8 + 4 for each byte scanned + 3 for each nibble scanned + 3 for each bit scanned in last nibble, or 6 if second operand equals 0.
‡ Same as footnote above, but add 1 clock.
# n = bit position from 0 to 31. Clocks = 7 if second operand equals 0.
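The two scan directions can be illustrated with a Python model. It is a sketch, not MASM code: the names `bsf` and `bsr` are hypothetical Python functions, and returning `None` stands in for "zero flag set, destination not loaded".

```python
# Hypothetical models of BSF and BSR on a non-negative integer operand.
def bsf(value):
    """Index of the lowest set bit, or None when the operand is 0 (zero flag set)."""
    if value == 0:
        return None
    return (value & -value).bit_length() - 1   # isolate the lowest set bit

def bsr(value):
    """Index of the highest set bit, or None when the operand is 0 (zero flag set)."""
    if value == 0:
        return None
    return value.bit_length() - 1

print(bsf(0b101000), bsr(0b101000))
```

For the operand 101000b, the forward scan stops at bit 3 and the reverse scan stops at bit 5.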
BSWAP Byte Swap

80486 Only

Takes a single 32-bit register as operand and exchanges the first byte with the fourth, and the second byte with the third. This instruction does not alter any bit values within the bytes and is useful for quickly translating between 8086-family byte storage and storage schemes in which the high byte is stored first.

Flags
No change

Encoding
00001111   11001 reg

Syntax          Examples       CPU      Clock Cycles
BSWAP reg32     bswap eax      486      1
                bswap ebx
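The byte exchange performed by BSWAP is equivalent to reinterpreting a 32-bit value with the opposite byte order. A minimal Python sketch, with the hypothetical name `bswap32`:

```python
# Hypothetical model of BSWAP on a 32-bit value.
def bswap32(value):
    """Exchange byte 0 with byte 3 and byte 1 with byte 2."""
    return int.from_bytes((value & 0xFFFFFFFF).to_bytes(4, "little"), "big")

print(hex(bswap32(0x12345678)))
```

Applying the function twice returns the original value, which is why the same instruction converts in both directions.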
BT/BTC/BTR/BTS Bit Tests

80386-80486 Only

Copies the value of a specified bit into the carry flag, where it can be tested by a JC or JNC instruction. The destination operand specifies the value in which the bit is located; the source operand specifies the bit position. BT simply copies the bit to the flag. BTC copies the bit and complements (toggles) it in the destination. BTR copies the bit and resets (clears) it in the destination. BTS copies the bit and sets it in the destination.

Flags
O D I T S Z A P C
                ±

Encoding
00001111   10111010   mod,BBB*,r/m   disp (0, 1, 2, or 4)   data (1)

Syntax               Examples                     CPU      Clock Cycles
BT reg16,immed8      bt ax,4                      386      3
                                                  486      3
BTC reg16,immed8     bts ax,4                     386      6
BTR reg16,immed8     btr bx,17                    486      6
BTS reg16,immed8     btc edi,4
BT mem16,immed8      btr DWORD PTR [si],27        386      6
                     btc color[di],4              486      3
BTC mem16,immed8     btc DWORD PTR [bx],27        386      8
BTR mem16,immed8     btc maskit,4                 486      8
BTS mem16,immed8     btr color[di],4

Encoding
00001111   10BBB011*   mod,reg,r/m   disp (0, 1, 2, or 4)

Syntax               Examples                     CPU      Clock Cycles
BT reg16,reg16       bt ax,bx                     386      3
                                                  486      3
BTC reg16,reg16      btc eax,ebx                  386      6
BTR reg16,reg16      bts bx,ax                    486      6
BTS reg16,reg16      btr cx,di
BT mem16,reg16       bt [bx],dx                   386, 486
BTC mem16,reg16      bts flags[bx],cx             386, 486
BTR mem16,reg16      btr rotate,cx
BTS mem16,reg16      btc [bp+8],si

* BBB is 100 for BT, 111 for BTC, 110 for BTR, and 101 for BTS.
Operands also can be 32 bits (reg32 and mem32).
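The four bit-test variants differ only in what they do to the tested bit after copying it to the carry flag. The Python sketch below models register operands; the function name `bit_test`, its `op` strings, and the modulo applied to the bit position (an assumption taken from Intel's description of register operands) are not part of the MASM text.

```python
# Hypothetical model of BT/BTS/BTR/BTC on a register operand.
def bit_test(value, bit, op="bt", width=16):
    """Return (new value, carry flag) for the given operation."""
    bit %= width                       # assumed: register forms use the position modulo operand size
    mask = 1 << bit
    carry = 1 if value & mask else 0   # all four variants copy the bit to the carry flag
    if op == "bts":
        value |= mask                  # set the bit
    elif op == "btr":
        value &= ~mask                 # reset (clear) the bit
    elif op == "btc":
        value ^= mask                  # complement (toggle) the bit
    elif op != "bt":
        raise ValueError("op must be bt, bts, btr, or btc")
    return value & ((1 << width) - 1), carry

print(bit_test(0, 4, "bts"), bit_test(0b10000, 4, "btr"))
```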
CALL Call Procedure

Calls a procedure. The instruction pushes the address of the next instruction onto the stack and jumps to the address specified by the operand. For NEAR calls, the offset (IP) is pushed and the new offset is loaded into IP. For FAR calls, the segment (CS) is pushed and the new segment is loaded into CS. Then the offset (IP) is pushed and the new offset is loaded into IP. A subsequent RET instruction can pop the address so that execution continues with the instruction following the call.

Flags
No change

Encoding
11101000   disp (2)

Syntax          Examples                  CPU      Clock Cycles
CALL label      call upcase               88/86    19 (88=23)
                                          286      7+m
                                          386      7+m
                                          486      3

Encoding
10011010   disp (4)

Syntax          Examples                  CPU      Clock Cycles
CALL label      call FAR PTR job          88/86    28 (88=36)
                call distant              286      13+m,pm=26+m*
                                          386      17+m,pm=34+m*
                                          486      18,pm=20*

Encoding
11111111   mod,010,r/m

Syntax          Examples                  CPU      Clock Cycles
CALL reg        call ax                   88/86    16 (88=20)
                                          286      7+m
                                          386      7+m
                                          486      5
CALL mem16      call pointer              88/86    21+EA (88=29+EA)
CALL mem32†     call [bx]                 286      11+m
                                          386      10+m
                                          486      5

Encoding
11111111   mod,011,r/m

Syntax          Examples                  CPU      Clock Cycles
CALL mem32      call far_table[di]        88/86    37+EA (88=53+EA)
CALL mem48†     call DWORD PTR [bx]       286      16+m,pm=29+m*
                                          386      22+m,pm=38+m*
                                          486      17,pm=20*

* Timings for calls through call and task gates are not shown, since they are used primarily in operating systems.
† 80386-80486 32-bit addressing mode only.
CBW Convert Byte to Word

Converts a signed byte in AL to a signed word in AX by extending the sign bit of AL into all bits of AH.

Flags
No change

Encoding
10011000*

Syntax    Examples    CPU
CBW       cbw         88/86, 286, 386, 486

* CBW and CWDE have the same encoding with two exceptions: in 32-bit mode, CBW is preceded by the operand-size byte (66h) but CWDE is not; in 16-bit mode, CWDE is preceded by the operand-size byte but CBW is not.
CDQ Convert Double to Quad

80386-80486 Only

Converts the signed doubleword in EAX to a signed quadword in the EDX:EAX register pair by extending the sign bit of EAX into all bits of EDX.

Flags
No change

Encoding
10011001*

Syntax    Examples    CPU      Clock Cycles
CDQ       cdq         386      2
                      486      3

* CWD and CDQ have the same encoding with two exceptions: in 32-bit mode, CWD is preceded by the operand-size byte (66h) but CDQ is not; in 16-bit mode, CDQ is preceded by the operand-size byte but CWD is not.
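The sign-extension conversions (CBW, CWD, CWDE, CDQ) all copy the top bit of the narrow value into every bit of the wider half. A Python sketch of the idea follows; `sign_extend`, `cbw`, and `cdq` are hypothetical function names, and register values are modeled as unsigned integers.

```python
# Hypothetical model of sign extension as performed by CBW and CDQ.
def sign_extend(value, from_bits, to_bits):
    """Widen an unsigned bit pattern, replicating its sign bit into the new upper bits."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):                            # sign bit set
        value |= ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
    return value

def cbw(al):
    """AL -> AX."""
    return sign_extend(al, 8, 16)

def cdq(eax):
    """EAX -> (EDX, EAX)."""
    full = sign_extend(eax, 32, 64)
    return full >> 32, full & 0xFFFFFFFF

print(hex(cbw(0x80)), tuple(hex(r) for r in cdq(0x80000000)))
```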
CLC Clear Carry Flag

Clears the carry flag.

Flags
O D I T S Z A P C
                0

Encoding
11111000

Syntax    Examples    CPU
CLC       clc         88/86, 286, 386, 486
CLD Clear Direction Flag

Clears the direction flag. All subsequent string instructions will process up (from low addresses to high addresses) by increasing the appropriate index registers.

Flags
O D I T S Z A P C
  0

Encoding
11111100

Syntax    Examples    CPU
CLD       cld         88/86, 286, 386, 486
CLI Clear Interrupt Flag

Clears the interrupt flag. When the interrupt flag is cleared, maskable interrupts are not recognized until the flag is set again with the STI instruction. In protected mode, CLI clears the flag only if the current task's privilege level is less than or equal to the value of the IOPL flag. Otherwise, a general-protection fault occurs.

Flags
O D I T S Z A P C
    0

Encoding
11111010

Syntax    Examples    CPU
CLI       cli         88/86, 286, 386, 486
CLTS Clear Task-Switched Flag

80286-80486 Privileged Only

Clears the task-switched flag in the Machine Status Word (MSW) of the 80286, or the CR0 register of the 80386-80486. This instruction can be used only in system software executing at privilege level 0. See Intel documentation for details on the task-switched flag and other privileged-mode concepts.

Flags
No change

Encoding
00001111   00000110

Syntax    Examples    CPU      Clock Cycles
CLTS      clts        286      2
                      386      5
                      486      7
CMC Complement Carry Flag

Complements (toggles) the carry flag.

Flags
O D I T S Z A P C
                ±

Encoding
11110101

Syntax    Examples    CPU
CMC       cmc         88/86, 286, 386, 486
CMP Compare Two Operands

Compares two operands as a test for a subsequent conditional-jump or set instruction. CMP does this by subtracting the source operand from the destination operand and setting the flags according to the result. CMP is the same as the SUB instruction, except that the result is not stored.

Flags
O D I T S Z A P C
±       ± ± ± ± ±

Encoding
001110dw   mod,reg,r/m   disp (0, 1, or 2)

Syntax           Examples
CMP reg,reg      cmp di,bx
                 cmp dl,cl
CMP mem,reg      cmp maximum,dx
                 cmp array[si],bl
CMP reg,mem      cmp dx,minimum
                 cmp bh,array[si]

Encoding
100000sw   mod,111,r/m   disp (0, 1, or 2)   data (1 or 2)

Syntax           Examples
CMP reg,immed    cmp bx,24
CMP mem,immed    cmp WORD PTR [di],4
                 cmp tester,4000

Encoding
0011110w   data (1 or 2)

Syntax           Examples                    CPU
CMP accum,immed  cmp ax,1000                 88/86, 286, 386, 486

CMPS/CMPSB/CMPSW/CMPSD Compare String

Compares two strings. DS:SI must point to the source string and ES:DI must point to the destination string (even if operands are given). For each comparison, the destination element is subtracted from the source element and the flags are updated to reflect the result (although the result is not stored). DI and SI are adjusted according to the size of the operands and the status of the direction flag. They are increased if the direction flag has been cleared with CLD, or decreased if the direction flag has been set with STD.

If the CMPS form of the instruction is used, operands must be provided to indicate the size of the data elements to be processed. A segment override can be given for the source (but not for the destination). If CMPSB (bytes), CMPSW (words), or CMPSD (doublewords on the 80386-80486 only) is used, the instruction determines the size of the data elements to be processed.

CMPS and its variations are normally used with repeat prefixes. REPNE (or REPNZ) is used to find the first match between two strings. REPE (or REPZ) is used to find the first mismatch. Before the comparison, CX should contain the maximum number of elements to compare. After a REPNE CMPS, the zero flag is clear if no match was found. After a REPE CMPS, the zero flag is set if no mismatch was found.

When the instruction finishes, ES:DI and DS:SI point to the element that follows (if the direction flag is clear) or precedes (if the direction flag is set) the match or mismatch. If CX decrements to 0, ES:DI and DS:SI point to the element that follows or precedes the last comparison. The zero flag is set or clear according to the result of the last comparison, not according to the value of CX.

Flags
O D I T S Z A P C
±       ± ± ± ± ±

Encoding
1010011w

Syntax                                        Examples                  CPU      Clock Cycles
CMPS [[segreg:]] src, [[ES:]] dest            cmps source,es:dest       88/86    22 (W88=30)
CMPSB [[ [[segreg:]] src, [[ES:]] dest ]]     repne cmpsw               286      8
CMPSW [[ [[segreg:]] src, [[ES:]] dest ]]     repe cmpsb                386      10
CMPSD [[ [[segreg:]] src, [[ES:]] dest ]]     repne cmpsd               486      8
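The interaction between REPE, CX, and the pointer registers is easier to see in a small model. The Python sketch below simulates REPE CMPSB with the direction flag clear; the function name `repe_cmpsb` and its return tuple are hypothetical, and the strings are modeled as byte sequences indexed by SI and DI.

```python
# Hypothetical model of REPE CMPSB (direction flag clear).
def repe_cmpsb(src, dst, count):
    """Return (SI, DI, CX, ZF) after the repeated compare; ZF is None if CX was 0."""
    si = di = 0
    cx = count
    zf = None                      # no comparison has set the flag yet
    while cx != 0:
        zf = src[si] == dst[di]    # compare one element, setting the zero flag
        si += 1                    # pointers advance past the compared element
        di += 1
        cx -= 1
        if not zf:                 # REPE stops at the first mismatch
            break
    return si, di, cx, zf

print(repe_cmpsb(b"hello", b"help!", 5))
```

In this run the first mismatch is at index 3, so SI and DI end at 4 (the element that follows the mismatch) and CX still holds 1.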
CMPXCHG Compare and Exchange

80486 Only

Compares the destination operand to the accumulator (AL, AX, or EAX). If equal, the source operand is copied to the destination. Otherwise, the destination is copied to the accumulator. The instruction sets flags according to the result of the comparison.

Flags
O D I T S Z A P C
±       ± ± ± ± ±

Encoding
00001111   1011000b   mod,reg,r/m   disp (0, 1, or 2)

Syntax              Examples                 CPU      Clock Cycles
CMPXCHG mem,reg     cmpxchg warr[bx],cx      486      7-10
                    cmpxchg string,bl
CMPXCHG reg,reg     cmpxchg dl,cl            486      6
                    cmpxchg bx,dx
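The compare-and-exchange rule described above can be written as a three-value function. This Python sketch is illustrative only; the name `cmpxchg` and the ordering of the returned tuple are assumptions made for the example.

```python
# Hypothetical model of CMPXCHG.
def cmpxchg(dest, accum, src):
    """Return (new destination, new accumulator, zero flag)."""
    if dest == accum:
        return src, accum, True    # equal: source is copied to the destination
    return dest, dest, False       # not equal: destination is copied to the accumulator

print(cmpxchg(5, 5, 9), cmpxchg(7, 5, 9))
```

The second call shows the failure path: the destination is unchanged and the accumulator is refreshed with its current value, ready for a retry.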
CWD Convert Word to Double

Converts the signed word in AX to a signed doubleword in the DX:AX register pair by extending the sign bit of AX into all bits of DX.

Flags
No change

Encoding
10011001*

Syntax    Examples    CPU
CWD       cwd         88/86, 286, 386, 486

* CWD and CDQ have the same encoding with two exceptions: in 32-bit mode, CWD is preceded by the operand-size byte (66h) but CDQ is not; in 16-bit mode, CDQ is preceded by the operand-size byte but CWD is not.
CWDE Convert Word to Extended Double

80386-80486 Only

Converts a signed word in AX to a signed doubleword in EAX by extending the sign bit of AX into all bits of EAX.

Flags
No change

Encoding
10011000*

Syntax    Examples    CPU      Clock Cycles
CWDE      cwde        386      3
                      486      3

* CBW and CWDE have the same encoding with two exceptions: in 32-bit mode, CBW is preceded by the operand-size byte (66h) but CWDE is not; in 16-bit mode, CWDE is preceded by the operand-size byte but CBW is not.
DAA Decimal Adjust After Addition

Adjusts the result of an addition to a packed BCD number (less than 100 decimal). The previous addition instruction should place its 8-bit binary sum in AL. DAA converts this binary sum to packed BCD format with the least significant decimal digit in the lower four bits and the most significant digit in the upper four bits. If the sum is greater than 99h after adjustment, the carry and auxiliary carry flags are set. Otherwise, the carry and auxiliary carry flags are cleared.

Flags
O D I T S Z A P C
?       ± ± ± ± ±

Encoding
00100111

Syntax    Examples    CPU
DAA       daa         88/86, 286, 386, 486
DAS Decimal Adjust After Subtraction

Adjusts the result of a subtraction to a packed BCD number (less than 100 decimal). The previous subtraction instruction should place its 8-bit binary result in AL. DAS converts this binary result to packed BCD format with the least significant decimal digit in the lower four bits and the most significant digit in the upper four bits. If the result is greater than 99h after adjustment, the carry and auxiliary carry flags are set. Otherwise, the carry and auxiliary carry flags are cleared.

Flags
O D I T S Z A P C
?       ± ± ± ± ±

Encoding
00101111

Syntax    Examples    CPU
DAS       das         88/86, 286, 386, 486
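To show how a decimal adjustment turns a binary sum into packed BCD, the Python sketch below models an 8-bit ADD followed by DAA. It follows the adjustment rule given in Intel documentation (add 6 when the low nibble overflows, add 60h when the high nibble overflows), which is an assumption beyond the summary above; `daa` and `bcd_add` are hypothetical names.

```python
# Hypothetical model of DAA following an 8-bit addition of packed BCD bytes.
def daa(al, cf=False, af=False):
    """Return (AL, carry flag, auxiliary carry flag) after the adjustment."""
    old_al, old_cf = al, cf
    cf = False
    if (al & 0x0F) > 9 or af:
        al += 0x06                 # fix the low decimal digit
        af = True
    else:
        af = False
    if old_al > 0x99 or old_cf:
        al += 0x60                 # fix the high decimal digit
        cf = True
    return al & 0xFF, cf, af

def bcd_add(a, b):
    """Model 'add al, b' followed by 'daa' for two packed BCD bytes."""
    raw = a + b
    af = ((a & 0x0F) + (b & 0x0F)) > 0x0F    # carry out of bit 3
    return daa(raw & 0xFF, raw > 0xFF, af)

print(tuple(hex(v) if isinstance(v, int) and not isinstance(v, bool) else v
            for v in bcd_add(0x38, 0x45)))
```

Adding the packed values 38h and 45h gives the binary sum 7Dh, which the adjustment converts to 83h, the packed BCD form of 83.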
DEC Decrement
Subtracts 1 from the destination operand. Because the operand is treated as an unsigned integer, the DEC instruction does not affect the carry flag. To detect any effects on the carry flag, use the SUB instruction.

Flags
O D I T S Z A P C
±       ± ± ± ±

Encoding
1111111w   mod,001,r/m   disp (0, 1, or 2)

Syntax        Examples
DEC reg8      dec cl
DEC mem       dec counter

Encoding
01001 reg

Syntax        Examples       CPU
DEC reg16     dec ax         88/86, 286, 386, 486
DEC reg32*

* 80386-80486 only.
DIV Unsigned Divide

Divides an implied destination operand by a specified source operand. Both operands are treated as unsigned numbers. If the source (divisor) is 16 bits wide, the implied destination (dividend) is the DX:AX register pair. The quotient goes into AX and the remainder into DX. If the source is 8 bits wide, the implied destination operand is AX. The quotient goes into AL and the remainder into AH. On the 80386-80486, if the source is 32 bits wide, the implied destination is the EDX:EAX register pair. The quotient goes into EAX and the remainder into EDX.

Flags
O D I T S Z A P C
?       ? ? ? ? ?

Encoding
1111011w   mod,110,r/m   disp (0, 1, or 2)

Syntax      Examples       CPU      Clock Cycles
DIV reg     div cx         88/86    b=80-90,w=144-162
            div dl         286      b=14,w=22
                           386      b=14,w=22,d=38
                           486      b=16,w=24,d=40
DIV mem     div [bx]       88/86    (b=86-96,w=150-168)+EA*
            div fsize      286      b=17,w=25
                           386      b=17,w=25,d=41
                           486      b=16,w=24,d=40

* Word memory operands on the 8088 take (158-176)+EA clocks.
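The 16-bit form of DIV can be modeled as a division of the 32-bit DX:AX pair. The sketch below is illustrative Python; the name `div16` is hypothetical, and the two error cases (zero divisor, quotient too large for AX) reflect the divide-error interrupt described in Intel documentation rather than anything stated in this entry.

```python
# Hypothetical model of "div r/m16": DX:AX divided by a 16-bit divisor.
def div16(dx, ax, divisor):
    """Return (quotient for AX, remainder for DX)."""
    if divisor == 0:
        raise ZeroDivisionError("divide error (interrupt 0)")
    dividend = ((dx & 0xFFFF) << 16) | (ax & 0xFFFF)
    quotient, remainder = divmod(dividend, divisor)
    if quotient > 0xFFFF:
        raise OverflowError("quotient does not fit in AX (interrupt 0)")
    return quotient, remainder

# 00010005h / 0010h
print(tuple(hex(v) for v in div16(0x0001, 0x0005, 0x0010)))
```

A common pitfall this model makes visible: leaving stale data in DX before a word division either changes the result or makes the quotient overflow AX.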
ENTER Make Stack Frame

80286-80486 Only

Creates a stack frame for a procedure that receives parameters passed on the stack. When immed16 is 0, ENTER is equivalent to push bp, followed by mov bp,sp. The first operand of the ENTER instruction specifies the number of bytes to reserve for local variables. The second operand specifies the nesting level for the procedure. The nesting level should be 0 for languages that do not allow access to local variables of higher-level procedures (such as C, Basic, and FORTRAN). See the complementary instruction LEAVE for a method of exiting from a procedure.

Flags
No change

Encoding
11001000   data (2)   data (1)

Syntax                   Examples       CPU      Clock Cycles
ENTER immed16,0          enter 4,0      286      11
                                        386      10
                                        486      14
ENTER immed16,1          enter 0,1      286      15
                                        386      12
                                        486      17
ENTER immed16,immed8     enter 6,4      286      12+4(n-1)
                                        386      15+4(n-1)
                                        486      17+3n
HLT Halt
Stops CPU execution until an interrupt restarts execution at the instruction following HLT. In protected mode, this instruction works only in privileged mode.

Flags
No change

Encoding
11110100

Syntax    Examples    CPU
HLT       hlt         88/86, 286, 386, 486
IDIV Signed Divide

Divides an implied destination operand by a specified source operand. Both operands are treated as signed numbers. If the source (divisor) is 16 bits wide, the implied destination (dividend) is the DX:AX register pair. The quotient goes into AX and the remainder into DX. If the source is 8 bits wide, the implied destination is AX. The quotient goes into AL and the remainder into AH. On the 80386-80486, if the source is 32 bits wide, the implied destination is the EDX:EAX register pair. The quotient goes into EAX and the remainder into EDX.

Flags
O D I T S Z A P C
?       ? ? ? ? ?

Encoding
1111011w   mod,111,r/m   disp (0, 1, or 2)

Syntax       Examples       CPU      Clock Cycles
IDIV reg     idiv bx        88/86    b=101-112,w=165-184
             idiv dl        286      b=17,w=25
                            386      b=19,w=27,d=43
                            486      b=19,w=27,d=43
IDIV mem     idiv itemp     88/86    (b=107-118,w=171-190)+EA*
                            286      b=20,w=28
                            386      b=22,w=30,d=46
                            486      b=20,w=28,d=44

* Word memory operands on the 8088 take (175-194)+EA clocks.
IMUL Signed Multiply

Multiplies an implied destination operand by a specified source operand. Both operands are treated as signed numbers. If a single 16-bit operand is given, the implied destination is AX and the product goes into the DX:AX register pair. If a single 8-bit operand is given, the implied destination is AL and the product goes into AX. On the 80386-80486, if the operand is 32 bits wide, the implied destination is EAX and the product goes into the EDX:EAX register pair. The carry and overflow flags are set if the upper half of the product (DX for 16-bit operands, AH for 8-bit operands, or EDX for 32-bit operands) contains significant bits, that is, if it is not simply the sign extension of the lower half.

Two additional syntaxes are available on the 80186-80486 processors. In the two-operand form, a 16-bit register gives one of the factors and serves as the destination for the result; a source constant specifies the other factor. In the three-operand form, the first operand is a 16-bit register where the result will be stored, the second is a 16-bit register or memory operand containing one of the factors, and the third is a constant representing the other factor. With both variations, the overflow and carry flags are set if the result is too large to fit into the 16-bit destination register. Since the low 16 bits of the product are the same for both signed and unsigned multiplication, these syntaxes can be used for either signed or unsigned numbers. On the 80386-80486, the operands can be either 16 or 32 bits wide.

A fourth syntax is available on the 80386-80486. Both the source and destination operands can be given specifically. The source can be any 16- or 32-bit memory operand or general-purpose register. The destination can be any general-purpose register of the same size. The overflow and carry flags are set if the product does not fit in the destination.

Flags
O D I T S Z A P C
±       ? ? ? ? ±

Encoding
1111011w   mod,101,r/m   disp (0, 1, or 2)

Syntax       Examples        CPU      Clock Cycles
IMUL reg     imul dx         88/86    b=80-98,w=128-154
                             286      b=13,w=21
                             386      b=9-14,w=9-22,d=9-38*
                             486      b=13-18,w=13-26,d=13-42
IMUL mem     imul factor     88/86    (b=86-104,w=134-160)+EA†
                             286      b=16,w=24
                             386      b=12-17,w=12-25,d=12-41*
                             486      b=13-18,w=13-26,d=13-42

* The 80386-80486 processors have an early-out multiplication algorithm. Therefore, multiplying an 8-bit or 16-bit value in EAX takes the same time as multiplying the value in AL or AX.
† Word memory operands on the 8088 take (138-164)+EA clocks.

Encoding
011010s1   mod,reg,r/m   disp (0, 1, or 2)   data (1 or 2)

Syntax                       Examples            CPU      Clock Cycles
IMUL reg16,immed             imul cx,25          286      21
IMUL reg32,immed*                                386      b=9-14,w=9-22,d=9-38†
                                                 486      b=13-18,w=13-26,d=13-42
IMUL reg16,reg16,immed       imul dx,ax,18       286      21
IMUL reg32,reg32,immed*                          386      b=9-14,w=9-22,d=9-38†
                                                 486      b=13-18,w=13-26,d=13-42
IMUL reg16,mem16,immed       imul bx,[si],60     286      24
IMUL reg32,mem32,immed*                          386      b=12-17,w=12-25,d=12-41†
                                                 486      b=13-18,w=13-26,d=13-42

Encoding
00001111   10101111   mod,reg,r/m   disp (0, 1, or 2)

Syntax                       Examples            CPU      Clock Cycles
IMUL reg16,reg16             imul cx,ax          386      w=9-22,d=9-38
IMUL reg32,reg32*                                486      b=13-18,w=13-26,d=13-42
IMUL reg16,mem16             imul dx,[si]        386      w=12-25,d=12-41
IMUL reg32,mem32*                                486      b=13-18,w=13-26,d=13-42

* 80386-80486 only.
† The variations depend on the source constant size; destination size is not a factor.
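The remark that the low 16 bits of a product are the same for signed and unsigned multiplication can be checked numerically. The Python sketch below is illustrative; `to_signed16`, `imul16_low`, and `mul16_low` are hypothetical helper names, and the overflow test models the "result too large to fit into the 16-bit destination" rule for the two- and three-operand forms.

```python
# Hypothetical check: low 16 bits of signed and unsigned products agree.
def to_signed16(value):
    """Interpret a 16-bit pattern as a two's-complement signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def imul16_low(a, b):
    """Return (low 16 bits of the signed product, overflow/carry flag)."""
    product = to_signed16(a) * to_signed16(b)
    overflow = not (-0x8000 <= product <= 0x7FFF)   # does not fit in 16 signed bits
    return product & 0xFFFF, overflow

def mul16_low(a, b):
    """Low 16 bits of the unsigned product."""
    return ((a & 0xFFFF) * (b & 0xFFFF)) & 0xFFFF

# FFFEh is -2 when signed and 65534 when unsigned; both products end in FFFAh
print(hex(imul16_low(0xFFFE, 3)[0]), hex(mul16_low(0xFFFE, 3)))
```

Only the flags differ between the two interpretations: 300 * 200 = 60000 fits in 16 unsigned bits but sets the overflow and carry flags as a signed result.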
IN Input from Port

Transfers a byte or word (or doubleword on the 80386-80486) from a port to the accumulator register. The port address is specified by the source operand, which can be DX or an 8-bit constant. Constants can be used only for port numbers up to 255; use DX for higher port numbers. In protected mode, a general-protection fault occurs if IN is used when the current privilege level is greater than the value of the IOPL flag.

Flags
No change

Encoding
1110010w   data (1)

Syntax             Examples       CPU      Clock Cycles
IN accum,immed     in ax,60h      88/86    10 (W88=14)
                                  286      5
                                  386      12,pm=6,26*
                                  486      14,pm=9,29*

Encoding
1110110w

Syntax             Examples       CPU      Clock Cycles
IN accum,DX        in ax,dx       88/86    8 (W88=12)
                   in al,dx       286      5
                                  386      13,pm=7,27*
                                  486      14,pm=8,28*

* First protected-mode timing: CPL <= IOPL. Second timing: CPL > IOPL.
Takes 27 clocks in virtual 8086 mode.
INC Increment
Adds 1 to the destination operand. Because the operand is treated as an unsigned integer, the INC instruction does not affect the carry flag. If a carry must be detected, use the ADD instruction.

Flags
O D I T S Z A P C
±       ± ± ± ±

Encoding
1111111w   mod,000,r/m   disp (0, 1, or 2)

Syntax        Examples
INC reg8      inc cl
INC mem       inc vpage

Encoding
01000 reg

Syntax        Examples       CPU
INC reg16     inc bx         88/86, 286, 386, 486
INC reg32*

* 80386-80486 only.
INS/INSB/INSW/INSD Input from Port to String

80286-80486 Only
Receives a string from a port. The string is considered the destination and must be pointed to by ES:DI (even if an operand is given). The input port is specified in DX. For each element received, DI is adjusted according to the size of the operand and the status of the direction flag. DI is increased if the direction flag has been cleared with CLD or decreased if the direction flag has been set with STD.
If the INS form of the instruction is used, a destination operand must be provided to indicate the size of the data elements to be processed, and DX must be specified as the source operand containing the port number. A segment override is not allowed. If INSB (bytes), INSW (words), or INSD (doublewords on the 80386-80486 only) is used, the instruction determines the size of the data elements to be received.
INS and its variations are normally used with the REP prefix. Before the repeated instruction is executed, CX should contain the number of elements to be received. In protected mode, a general-protection fault occurs if INS is used when the current privilege level is greater than the value of the IOPL flag.
Flags No change
Encoding 0110110w
Syntax
INS [[ES:]]dest,DX
INSB [[[[ES:]]dest,DX]]
INSW [[[[ES:]]dest,DX]]
INSD [[[[ES:]]dest,DX]]
Examples
ins es:instr,dx
rep insb
rep insw
rep insd
CPU 88/86 286 386 486
Clock Cycles 5 15,pm=9,29* 17,pm=10,32*
* First protected-mode timing: CPL <= IOPL. Second timing: CPL > IOPL.
INT Interrupt
Generates a software interrupt. An 8-bit constant operand (0 to 255) specifies the interrupt procedure to be called. The call is made by indexing the interrupt number into the Interrupt Vector Table (IVT) starting at segment 0, offset 0. In real mode, the IVT contains 4-byte pointers to interrupt procedures. In privileged mode, the IVT contains 8-byte pointers. When an interrupt is called in real mode, the flags, CS, and IP are pushed onto the stack (in that order), and the trap and interrupt flags are cleared. STI can be used to restore interrupts. See Intel documentation and the documentation for your operating system for details on using and defining interrupts in privileged mode. To return from an interrupt, use the IRET instruction.
Flags O D I T S Z A P C (I and T are cleared to 0)
Encoding 11001101 data (1)
Syntax INT immed8
Examples int 25h
CPU 88/86 286 386 486
Clock Cycles 51 (88=71) 23+m,pm=(40,78)+m* 37,pm=59,99* 30,pm=44,71*
Encoding
11001100
Syntax INT 3 Examples
int 3
CPU 88/86 286 386 486
Clock Cycles 52 (88=72) 23+m,pm=(40,78)+m* 33,pm=59,99* 26,pm=44,71*
* The first protected-mode timing is for interrupts to the same privilege level. The second is for interrupts to a higher privilege level. Timings for interrupts through task gates are not shown.
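The real-mode sequence described for INT (index the IVT, push the flags, CS, and IP, clear the trap and interrupt flags) can be sketched as a model. This is Python rather than MASM, and `ivt_entry_address` and `real_mode_int` are hypothetical names. The sketch assumes memory is a flat byte array starting at physical address 0, that `ip` is already the return address, and that the stack can be represented as a simple list.

```python
IF_MASK = 0x0200  # interrupt flag, bit 9 of the flags register
TF_MASK = 0x0100  # trap flag, bit 8 of the flags register


def ivt_entry_address(interrupt_number):
    """Physical address of a real-mode IVT entry (table starts at 0000:0000)."""
    if not 0 <= interrupt_number <= 255:
        raise ValueError("interrupt number must fit in 8 bits")
    return interrupt_number * 4  # each real-mode entry is a 4-byte far pointer


def real_mode_int(memory, stack, flags, cs, ip, interrupt_number):
    """Model INT in real mode. Returns the new (flags, cs, ip)."""
    stack.extend([flags, cs, ip])                 # pushed in that order
    flags &= ~(IF_MASK | TF_MASK) & 0xFFFF        # clear IF and TF
    base = ivt_entry_address(interrupt_number)
    new_ip = memory[base] | (memory[base + 1] << 8)      # offset word first
    new_cs = memory[base + 2] | (memory[base + 3] << 8)  # segment word second
    return flags, new_cs, new_ip
```

For example, the entry for Interrupt 21h sits at offset 84h in this model, because 21h multiplied by 4 is 84h.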
INTO Interrupt on Overflow

Generates Interrupt 4 if the overflow flag is set. The default MS-DOS behavior for Interrupt 4 is to return without taking any action. For INTO to have any effect, you must define an interrupt procedure for Interrupt 4.
Flags O D I T S Z A P C
Encoding 11001110
Syntax INTO
Examples into
CPU 88/86 286 386 486
Clock Cycles 53 (88=73),noj =4 24+m,noj =3,pm=(40, 78)+m* 35,noj =3,pm=59,99* 28,noj =3,pm=46,73*
* The first protected-mode timing is for interrupts to the same privilege level. The second is for interrupts to a higher privilege level. Timings for interrupts through task gates are not shown.
INVD Invalidate Data Cache

80486 Only
Empties contents of the current data cache without writing changes to memory. Proper use of this instruction requires knowledge of how contents are placed in the cache. INVD is intended primarily for system programming. See Intel documentation for details.
Flags No change
Encoding 00001111 00001000
Syntax INVD
Examples invd
CPU 88/86 286 386 486
Clock Cycles 4
INVLPG Invalidate TLB Entry

80486 Only
Invalidates an entry in the Translation Lookaside Buffer (TLB), used by the demand-paging mechanism in virtual-memory operating systems. The instruction takes a single memory operand and calculates the effective address of the operand, including the segment address. If the resulting address is mapped by any entry in the TLB, this entry is removed. Proper use of INVLPG requires understanding the hardware-supported demand-paging mechanism. INVLPG is intended primarily for system programming. See Intel documentation for details.
Flags No change
Encoding 00001111 00000001 mod, reg, r/m disp (2)
Syntax INVLPG
Examples
invlpg pointer[bx]
invlpg es:entry
CPU 88/86 286 386 486
Clock Cycles 12*
* 11 clocks if address is not mapped by any TLB entry.
IRET/IRETD Interrupt Return

Returns control from an interrupt procedure to the interrupted code. In real mode, the IRET instruction pops IP, CS, and the flags (in that order) and resumes execution. See Intel documentation for details on IRET operation in privileged mode. On the 80386-80486, the IRETD instruction should be used to pop a 32-bit instruction pointer when returning from an interrupt called from a 32-bit segment. The F suffix prevents epilogue code from being generated when ending a PROC block. Use it to terminate interrupt service procedures.
Flags O D I T S Z A P C
Encoding 11001111
Syntax IRET IRETD* IRETF IRETDF*
* 80386-80486 only.
The first protected-mode timing is for interrupts to the same privilege level within a task. The second is for interrupts to a higher privilege level within a task. Timings for interrupts through task gates are not shown.
Examples
iret
CPU 88/86 286 386 486
Clock Cycles 32 (88=44) 17+m,pm=(31,55)+m 22,pm=38,82 15,pm=20,36
Jcondition Jump Conditionally

Transfers execution to the specified label if the flags condition is true. The condition is tested by checking the flags shown in the table on the following page. If condition is false, no jump is taken and program execution continues at the next instruction. On the 8086-80286 processors, the label given as the operand must be short (between -128 and +127 bytes from the instruction following the jump).* The 80386-80486 processors allow near jumps (-32,768 to +32,767 bytes). On the 80386-80486, the assembler generates the shortest jump possible, unless the jump size is explicitly specified. When the 80386-80486 processors are in FLAT memory model, short jumps range from -128 to +127 bytes and near jumps range from -2 to +2 gigabytes. There are no far jumps.
Flags No change
Encoding 0111cond disp (1)
Syntax Jcondition label
Examples
jg bigger
jo SHORT too_big
jpe p_even
CPU 88/86 286 386 486
Clock Cycles 16,noj=4 7+m,noj=3 7+m,noj=3 3,noj=1
Encoding 00001111 1000cond disp (2)
Syntax Jcondition label
Examples
je next
jnae lesser
js negative
CPU 88/86 286 386 486
Clock Cycles 7+m,noj=3 3,noj=1
* If a source file for an 8086-80286 program contains a conditional jump outside the range of -128 to +127 bytes, the assembler emits a level 3 warning and generates two instructions (including an unconditional jump) that are the equivalent of the desired instruction. This behavior can be enabled and disabled with the OPTION LJMP and OPTION NOLJMP directives.
Near labels are only available on the 80386-80486. They are the default.
Jump Conditions
Opcode*    Mnemonic  Flags Checked     Description
size 0010  JB/JNAE   CF=1              Jump if below/not above or equal (unsigned comparisons)
size 0011  JAE/JNB   CF=0              Jump if above or equal/not below (unsigned comparisons)
size 0110  JBE/JNA   CF=1 or ZF=1      Jump if below or equal/not above (unsigned comparisons)
size 0111  JA/JNBE   CF=0 and ZF=0     Jump if above/not below or equal (unsigned comparisons)
size 0100  JE/JZ     ZF=1              Jump if equal (zero)
size 0101  JNE/JNZ   ZF=0              Jump if not equal (not zero)
size 1100  JL/JNGE   SF≠OF             Jump if less/not greater or equal (signed comparisons)
size 1101  JGE/JNL   SF=OF             Jump if greater or equal/not less (signed comparisons)
size 1110  JLE/JNG   ZF=1 or SF≠OF     Jump if less or equal/not greater (signed comparisons)
size 1111  JG/JNLE   ZF=0 and SF=OF    Jump if greater/not less or equal (signed comparisons)
size 1000  JS        SF=1              Jump if sign
size 1001  JNS       SF=0              Jump if not sign
size 0010  JC        CF=1              Jump if carry
size 0011  JNC       CF=0              Jump if not carry
size 0000  JO        OF=1              Jump if overflow
size 0001  JNO       OF=0              Jump if not overflow
size 1010  JP/JPE    PF=1              Jump if parity/parity even
size 1011  JNP/JPO   PF=0              Jump if no parity/parity odd
* The size bits are 0111 for short jumps or 1000 for 80386-80486 near jumps.
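The difference between the unsigned conditions (above/below) and the signed conditions (greater/less) in the table can be illustrated with a small model. The sketch below is Python, not MASM. `flags_after_cmp16`, `CONDITIONS`, and `jump_taken` are hypothetical names, the parity conditions are left out, and the flags are assumed to come from a 16-bit CMP.

```python
def flags_after_cmp16(a, b):
    """Flags set by a 16-bit CMP a,b, which computes a - b and discards it."""
    a &= 0xFFFF
    b &= 0xFFFF
    result = (a - b) & 0xFFFF
    return {
        "CF": int(a < b),                       # unsigned borrow
        "ZF": int(result == 0),
        "SF": result >> 15,                     # sign bit of the result
        "OF": ((a ^ b) & (a ^ result)) >> 15,   # signed overflow
    }


# One entry per mnemonic group in the table (synonyms omitted for brevity).
CONDITIONS = {
    "JB": lambda f: f["CF"] == 1,
    "JAE": lambda f: f["CF"] == 0,
    "JBE": lambda f: f["CF"] == 1 or f["ZF"] == 1,
    "JA": lambda f: f["CF"] == 0 and f["ZF"] == 0,
    "JE": lambda f: f["ZF"] == 1,
    "JNE": lambda f: f["ZF"] == 0,
    "JL": lambda f: f["SF"] != f["OF"],
    "JGE": lambda f: f["SF"] == f["OF"],
    "JLE": lambda f: f["ZF"] == 1 or f["SF"] != f["OF"],
    "JG": lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],
    "JS": lambda f: f["SF"] == 1,
    "JNS": lambda f: f["SF"] == 0,
    "JO": lambda f: f["OF"] == 1,
    "JNO": lambda f: f["OF"] == 0,
}


def jump_taken(mnemonic, flags):
    """True if the conditional jump would be taken for the given flags."""
    return CONDITIONS[mnemonic.upper()](flags)


# 0FFFFh compared with 1: above as unsigned (65,535), less as signed (-1).
flags = flags_after_cmp16(0xFFFF, 1)
```

With these flags, the model takes JA and JL but not JG or JB, which is why the choice of mnemonic depends on whether the operands are meant as signed or unsigned values.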
JCXZ/JECXZ Jump if CX is Zero

Transfers program execution to the specified label if CX is 0. On the 80386-80486, JECXZ can be used to jump if ECX is 0. If the count register is not 0, execution continues at the next instruction. The label given as the operand must be short (between -128 and +127 bytes from the instruction following the jump).
Flags No change
Encoding 11100011 disp (1)
Syntax JCXZ label JECXZ label*
Examples jcxz not_found
CPU 88/86 286 386 486
Clock Cycles 18,noj=6 8+m,noj=4 9+m,noj=5 8,noj=5
* 80386-80486 only.
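The short-label restriction follows from the single displacement byte, which is measured from the instruction following the jump. The Python sketch below (a model, not MASM) shows the arithmetic. `short_displacement` is a hypothetical helper, and the default instruction length of 2 bytes assumes one opcode byte plus one displacement byte with no prefixes.

```python
def short_displacement(instruction_address, target_address, instruction_length=2):
    """Return the 8-bit displacement byte for a short jump, or raise ValueError.

    The displacement is relative to the instruction following the jump.
    """
    next_instruction = instruction_address + instruction_length
    displacement = target_address - next_instruction
    if not -128 <= displacement <= 127:
        raise ValueError("target is out of short-jump range")
    return displacement & 0xFF  # two's complement byte as it would be encoded
```

In this model a jump to its own address encodes as 0FEh (that is, -2), and the farthest forward target is 127 bytes past the next instruction.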
JMP Jump Unconditionally

Transfers program execution to the address specified by the destination operand. Jumps are near (between -32,768 and +32,767 bytes from the instruction following the jump), or short (between -128 and +127 bytes), or far (in a different code segment). Unless a distance is explicitly specified, the assembler selects the shortest possible jump. With near and short jumps, the operand specifies a new IP address. With far jumps, the operand specifies new IP and CS addresses. When the 80386-80486 processors are in FLAT memory model, short jumps range from -128 to +127 bytes and near jumps range from -2 to +2 gigabytes.
Flags No change
Encoding 11101011 disp (1)
Syntax JMP label
Examples jmp SHORT exit
CPU 88/86 286 386 486
Clock Cycles 15 7+m 7+m 3
Encoding 11101001 disp (2*)
Syntax JMP label
Examples
jmp close
jmp NEAR PTR distant
CPU 88/86 286 386 486
Clock Cycles 15 7+m 7+m 3
Encoding 11101010 disp (4*)
Syntax JMP label
Examples
jmp FAR PTR close
jmp distant
CPU 88/86 286 386 486
Clock Cycles 15 11+m,pm=23+m 12+m,pm=27+m 17,pm=19
Encoding 11111111 mod,100,r/m disp (0 or 2)
Syntax JMP reg16 JMP mem32
Examples jmp ax
CPU 88/86 286 386 486
Clock Cycles 11 7+m 7+m 5
Syntax JMP mem16 JMP mem32
Examples
jmp WORD PTR [bx]
jmp table[di]
jmp DWORD PTR [si]
CPU 88/86 286 386 486
Clock Cycles 18+EA 11+m 10+m 5
Encoding 11111111 mod,101,r/m disp (4*)
Syntax JMP mem32 JMP mem48
Examples
jmp fpointer[si]
jmp DWORD PTR [bx]
jmp FWORD PTR [di]
CPU 88/86 286 386 486
Clock Cycles 24+EA 15+m,pm=26+m 12+m,pm=27+m 13,pm=18
* On the 80386-80486, the displacement can be 4 bytes for near jumps or 6 bytes for far jumps.
Timings for jumps through call or task gates are not shown, since they are normally used only in operating systems.
80386-80486 only. You can use DWORD PTR to specify near register-indirect jumps or FWORD PTR to specify far register-indirect jumps.
LAHF Load Flags into AH Register

Transfers bits 0 to 7 of the flags register to AH. This includes the carry, parity, auxiliary carry, zero, and sign flags, but not the trap, interrupt, direction, or overflow flags.
Flags No change
Encoding 10011111
Syntax LAHF
Examples lahf
CPU 88/86 286 386 486
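The layout of the byte that LAHF places in AH can be sketched as follows. This is a Python model, not MASM. `lahf` and `decode_ah` are hypothetical names, and the bit positions are the standard flag positions in the low byte of the flags register (carry in bit 0, parity in bit 2, auxiliary carry in bit 4, zero in bit 6, sign in bit 7).

```python
FLAG_BITS = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7}


def lahf(flags_register):
    """Return the value LAHF would place in AH: bits 0 to 7 of the flags."""
    return flags_register & 0xFF


def decode_ah(ah):
    """Split the AH value into the five flags it carries."""
    return {name: (ah >> bit) & 1 for name, bit in FLAG_BITS.items()}


# Overflow (bit 11) and direction (bit 10) are set here but do not reach AH.
ah = lahf(0x0CC1)
flags = decode_ah(ah)
```

The upper byte of the flags register, which holds the trap, interrupt, direction, and overflow flags, is simply masked off in this model.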
LAR Load Access Rights

80286-80486 Protected Only
Loads the access rights of a selector into a specified register. The source operand must be a register or memory operand containing a selector. The destination operand must be a register that will receive the access rights if the selector is valid and visible at the current privilege level. The zero flag is set if the access rights are transferred, or cleared if they are not. See Intel documentation for details on selectors, access rights, and other privileged-mode concepts.
Flags O D I T S Z A P C
Encoding 00001111 00000010 mod, reg, r/m disp (0, 1, 2, or 4)
Syntax LAR reg16,reg16 LAR reg32,reg32*
Examples lar ax,bx
CPU 88/86 286 386 486
Clock Cycles 14 15 11
Syntax LAR reg16,mem16 LAR reg32,mem32*
Examples lar cx,selector
CPU 88/86 286 386 486
Clock Cycles 16 16 11
* 80386-80486 only.
LDS/LES/LFS/LGS/LSS Load Far Pointer

Reads and stores the far pointer specified by the source memory operand. The instruction moves the pointer's segment value into DS, ES, FS, GS, or SS (depending on the instruction). Then it moves the pointer's offset value into the destination operand. The LDS and LES instructions are available on all processors. The LFS, LGS, and LSS instructions are available only on the 80386-80486.
Flags No change
Encoding 11000101 mod, reg, r/m disp (2)
Syntax LDS reg,mem
Examples lds si,fpointer
CPU 88/86 286 386 486
Clock Cycles 16+EA (88=24+EA) 7,pm=21 7,pm=22 6,pm=12
Encoding 11000100 mod, reg, r/m disp (2)
Syntax LES reg,mem
Examples les di,fpointer
CPU 88/86 286 386 486
Clock Cycles 16+EA (88=24+EA) 7,pm=21 7,pm=22 6,pm=12
Encoding 00001111 10110100 mod, reg, r/m disp (2 or 4)
Syntax LFS reg,mem
Examples lfs edi,fpointer
CPU 88/86 286 386 486
Clock Cycles 7,pm=25 6,pm=12
Encoding 00001111 10110101 mod, reg, r/m disp (2 or 4)
Syntax LGS reg,mem
Examples lgs bx,fpointer
Encoding 00001111 10110010 mod, reg, r/m disp (2 or 4)
Syntax LSS reg,mem
Examples lss bp,fpointer
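How a 16-bit far pointer is split into its two halves can be sketched with a short model. The code is Python rather than MASM. `load_far_pointer` and `real_mode_linear_address` are hypothetical names, and the sketch assumes a 4-byte pointer stored offset word first, segment word second, as it would be in a 16-bit segment.

```python
def load_far_pointer(memory, address):
    """Read a 4-byte far pointer. Returns (segment, offset).

    Models LDS/LES with a 16-bit destination: the low word is the offset
    that goes to the destination register, the high word is the segment.
    """
    offset = memory[address] | (memory[address + 1] << 8)
    segment = memory[address + 2] | (memory[address + 3] << 8)
    return segment, offset


def real_mode_linear_address(segment, offset):
    """Real-mode address formed from a segment:offset pair (20-bit wrap)."""
    return ((segment << 4) + offset) & 0xFFFFF


# A pointer stored as the bytes 34h 12h 00h B8h refers to B800:1234.
memory = bytes([0x34, 0x12, 0x00, 0xB8])
segment, offset = load_far_pointer(memory, 0)
```

The 20-bit wrap in `real_mode_linear_address` is an assumption that matches the 8086 address bus; later processors may behave differently depending on system configuration.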
LEA Load Effective Address

Calculates the effective address (offset) of the source memory operand and stores the result in the destination register. If the source operand is a direct memory address, the assembler encodes the instruction in the more efficient MOV reg,immediate form (equivalent to MOV reg, OFFSET mem).
Flags No change
Encoding 10001101 mod, reg, r/m disp (2)
Syntax LEA reg16,mem LEA reg32,mem*
Examples lea bx,npointer
CPU 88/86 286 386 486
Clock Cycles 2+EA 3 2 1
* 80386-80486 only.
2 if index register used.
LEAVE High Level Procedure Exit

Terminates the stack frame of a procedure. LEAVE reverses the action of a previous ENTER instruction by restoring SP and BP to the values they had before the procedure stack frame was initialized. LEAVE is equivalent to mov sp,bp, followed by pop bp.
Flags No change
Encoding 11001001
Syntax LEAVE
Examples leave
CPU 88/86 286 386 486
Clock Cycles 5 4 5
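The equivalence between LEAVE and the pair mov sp,bp / pop bp can be traced with a small model. The sketch is Python, not MASM. The register dictionary, the `mem` dictionary standing in for the stack segment, and the helper names are all hypothetical, and `enter` models only the simplest case of ENTER with a nesting level of 0.

```python
def push16(regs, mem, value):
    """Push a word: decrement SP by 2, then store at the new top of stack."""
    regs["sp"] = (regs["sp"] - 2) & 0xFFFF
    mem[regs["sp"]] = value


def pop16(regs, mem):
    """Pop a word: read the top of stack, then increment SP by 2."""
    value = mem[regs["sp"]]
    regs["sp"] = (regs["sp"] + 2) & 0xFFFF
    return value


def enter(regs, mem, local_bytes):
    """Model ENTER local_bytes,0: push bp / mov bp,sp / sub sp,local_bytes."""
    push16(regs, mem, regs["bp"])
    regs["bp"] = regs["sp"]
    regs["sp"] = (regs["sp"] - local_bytes) & 0xFFFF


def leave(regs, mem):
    """Model LEAVE: mov sp,bp followed by pop bp."""
    regs["sp"] = regs["bp"]
    regs["bp"] = pop16(regs, mem)


regs = {"sp": 0x0100, "bp": 0x0200}
mem = {}
enter(regs, mem, 8)   # frame with 8 bytes of locals
leave(regs, mem)      # SP and BP return to their values before enter
```

Because LEAVE reloads SP from BP, the model restores the frame correctly no matter how much the procedure body moved SP in between.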
LES/LFS/LGS Load Far Pointer to Extra Segment

See LDS.
LGDT/LIDT/LLDT Load Descriptor Table

Loads a value from an operand into a descriptor table register. LGDT loads into the Global Descriptor Table, LIDT into the Interrupt Descriptor Table, and LLDT into the Local Descriptor Table. These instructions are available only in privileged mode. See Intel documentation for details on descriptor tables and other protected-mode concepts.
Flags No change
Encoding 00001111 00000001 mod, 010,r/m disp (2)
Syntax LGDT mem48
Examples lgdt descriptor
CPU 88/86 286 386 486
Clock Cycles 11 11 11
Encoding 00001111 00000001 mod, 011,r/m disp (2)
Syntax LIDT mem48
Examples lidt descriptor
CPU 88/86 286 386 486
Clock Cycles 12 11 11
Encoding 00001111 00000000 mod, 010,r/m disp (0, 1, or 2)
Syntax LLDT reg16
Examples lldt ax
CPU 88/86 286 386 486
Clock Cycles 17 20 11
Syntax LLDT mem16
Examples lldt selector
CPU 88/86 286 386 486
Clock Cycles 19 24 11
LMSW Load Machine Status Word

80286-80486 Privileged Only
Loads a value from a memory operand into the Machine Status Word (MSW). This instruction is available only in privileged mode. See Intel documentation for details on the MSW and other protected-mode concepts.
Flags No change
Encoding 00001111 00000001 mod, 110,r/m disp (0, 1, or 2)
Syntax LMSW reg16
Examples lmsw ax
CPU 88/86 286 386 486
Clock Cycles 3 10 13
Syntax LMSW mem16
Examples lmsw machine
CPU 88/86 286 386 486
Clock Cycles 6 13 13
LOCK Lock the Bus

Locks out other processors during execution of the next instruction. This instruction is a prefix. It must precede an instruction that accesses a memory location that another processor might attempt to access at the same time. See Intel documentation for details on multiprocessor environments.
Flags No change
Encoding 11110000
Syntax LOCK instruction
Examples lock xchg ax,sem
CPU 88/86 286 386 486
LODS/LODSB/LODSW/LODSD Load Accumulator from String
Loads the accumulator register with an element from a string in memory. DS:SI must point to the source element, even if an operand is given. For each source element loaded, SI is adjusted according to the size of the operand and the status of the direction flag. SI is incremented if the direction flag has been cleared with CLD or decremented if the direction flag has been set with STD.
If the LODS form of the instruction is used, an operand must be provided to indicate the size of the data elements to be processed. A segment override can be given. If LODSB (bytes), LODSW (words), or LODSD (doublewords on the 80386-80486 only) is used, the instruction determines the size of the data elements to be processed and whether the element will be loaded to AL, AX, or EAX.
LODS and its variations are not used with repeat prefixes, since there is no reason to repeatedly load memory values to a register.
Flags No change
Encoding 1010110w
Syntax
LODS [[segreg:]]src
LODSB [[[[segreg:]]src]]
LODSW [[[[segreg:]]src]]
LODSD [[[[segreg:]]src]]
Examples
lods es:source
lodsw
CPU 88/86 286 386 486
Clock Cycles 12 (W88=16) 5 5 5
LOOP/LOOPW/LOOPD Loop
Loops repeatedly to a specified label. LOOP decrements CX (without changing any flags) and, if the result is not 0, transfers execution to the address specified by the operand. On the 80386-80486, LOOP uses the 16-bit CX in 16-bit mode and the 32-bit ECX in 32-bit mode. The default can be overridden with LOOPW (CX) or LOOPD (ECX). If CX is 0 after being decremented, execution continues at the next instruction. The operand must specify a short label (between -128 and +127 bytes from the instruction following the LOOP instruction).
Flags No change
Encoding 11100010 disp (1)
Syntax LOOP label LOOPW label* LOOPD label*
Examples loop wend
CPU 88/86 286 386 486
Clock Cycles 17,noj=5 8+m,noj=4 11+m 7,noj=6
* 80386-80486 only.
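Because LOOP decrements CX before testing it, the number of passes depends on the starting count in a way that is easy to get wrong at zero. The Python sketch below is a model, not MASM. `loop_iterations` is a hypothetical name, and the 16-bit wraparound assumes the CX form of the instruction.

```python
def loop_iterations(cx):
    """Count how many times a loop body ending in LOOP runs for a starting CX."""
    cx &= 0xFFFF
    iterations = 0
    while True:
        iterations += 1          # the loop body executes once
        cx = (cx - 1) & 0xFFFF   # LOOP decrements CX with 16-bit wraparound
        if cx == 0:              # falls through when CX reaches 0
            break
    return iterations
```

In this model a starting count of 0 wraps to 0FFFFh on the first decrement and the body runs 65,536 times, which is one reason a JCXZ check is often placed ahead of such a loop.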
LOOPcondition/LOOPconditionW/LOOPconditionD Loop Conditionally

Loops repeatedly to a specified label if condition is met and if CX is not 0. On the 80386-80486, these instructions use the 16-bit CX in 16-bit mode and the 32-bit ECX in 32-bit mode. This default can be overridden with the W (CX) or D (ECX) forms of the instruction. The instruction decrements CX (without changing any flags) and tests whether the zero flag was set by a previous instruction (such as CMP). With LOOPE and LOOPZ (they are synonyms), execution is transferred to the label if the zero flag is set and CX is not 0. With LOOPNE and LOOPNZ (they are synonyms), execution is transferred to the label if the zero flag is cleared and CX is not 0. Execution continues at the next instruction if the condition is not met. Before entering the loop, CX should be set to the maximum number of repetitions desired.
Flags No change
Encoding 11100001 disp (1)
Syntax LOOPE label LOOPEW label* LOOPED label* LOOPZ label LOOPZW label* LOOPZD label*
Examples loopz again
CPU 88/86 286 386 486
Clock Cycles 18,noj=6 8+m,noj=4 11+m 9,noj=6
Encoding 11100000 disp (1)
Syntax LOOPNE label LOOPNEW label* LOOPNED label* LOOPNZ label LOOPNZW label* LOOPNZD label*
Examples loopnz for_next
CPU 88/86 286 386 486
Clock Cycles 19,noj=5 8,noj=4 11+m 9,noj=6
* 80386-80486 only.
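A common use of LOOPNE is a bounded search: compare each element, and keep looping while no match has been found and the count has not run out. The Python sketch below models that pattern. It is not MASM, `loopne_search` is a hypothetical name, and the early return for an empty list stands in for a JCXZ guard before the loop.

```python
def loopne_search(data, target):
    """Model a CMP / LOOPNE search. Returns the matching index or None."""
    cx = len(data)              # maximum number of repetitions
    if cx == 0:
        return None             # stands in for a JCXZ guard before the loop
    index = -1
    while True:
        index += 1
        zero_flag = data[index] == target   # CMP sets ZF on a match
        cx -= 1                             # LOOPNE decrements CX, flags untouched
        if zero_flag or cx == 0:            # loop ends on a match or when CX is 0
            break
    # CX alone cannot tell the two exits apart: a match on the final
    # element also leaves CX at 0, so the zero flag is what gets tested.
    return index if zero_flag else None
```

The design point the model illustrates is that code following the loop would branch on the zero flag (for example with JE or JNE) rather than on CX.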
LSL Load Segment Limit

80286-80486 Protected Only
Loads the segment limit of a selector into a specified register. The source operand must be a register or memory operand containing a selector. The destination operand must be a register that will receive the segment limit if the selector is valid and visible at the current privilege level. The zero flag is set if the segment limit is transferred, or cleared if it is not. See Intel documentation for details on selectors, segment limits, and other protected-mode concepts.
Flags O D I T S Z A P C
Encoding 00001111 00000011 mod, reg, r/m disp (0, 1, or 2)
Syntax LSL reg16,reg16 LSL reg32,reg32*
Examples lsl ax,bx
CPU 88/86 286 386 486
Clock Cycles 14 20,25 10
Syntax LSL reg16,mem16 LSL reg32,mem32*
Examples lsl cx,seg_lim
CPU 88/86 286 386 486
Clock Cycles 16 21,26 10
* 80386-80486 only.
The first value is for byte granular; the second is for page granular.
LSS Load Far Pointer to Stack Segment

See LDS.
LTR Load Task Register

80286-80486 Protected Only
Loads a value from the specified operand to the current task register. LTR is available only in privileged mode. See Intel documentation for details on task registers and other protected-mode concepts.
Flags No change
Encoding 00001111 00000000 mod, 011,r/m disp (0, 1, or 2)
Syntax LTR reg16
Examples ltr ax
CPU 88/86 286 386 486
Clock Cycles 17 23 20
Syntax LTR mem16
Examples ltr task
CPU 88/86 286 386 486
Clock Cycles 19 27 20
MOV Move Data

Moves the value in the source operand to the destination operand. If the destination operand is SS, interrupts are disabled until the next instruction is executed (except on early versions of the 8088 and 8086).
Flags No change
Encoding 100010dw mod, reg, r/m disp (0, 1, or 2)
Syntax MOV reg,reg
Examples
mov dh,bh
mov dx,cx
mov bp,sp
Syntax MOV mem,reg
Examples
mov array[di],bx
mov count,cx
Syntax MOV reg,mem
Examples
mov bx,pointer
mov dx,matrix[bx+di]
Encoding 1100011w mod, 000,r/m disp (0, 1, or 2) data (1 or 2)
Syntax MOV mem,immed
Examples
mov [bx],15
mov color,7
CPU 88/86 286 386 486
Clock Cycles 10+EA (W88=14+EA) 3 2 1
Encoding 1011w reg data (1 or 2)
Syntax MOV reg,immed
Examples
mov cx,256
mov dx,OFFSET string
CPU 88/86 286 386 486
Encoding 101000aw disp (2)
Syntax MOV mem,accum
Examples mov total,ax
CPU 88/86 286 386 486
Clock Cycles 10 (W88=14) 3 2 1
Syntax MOV accum,mem
Examples mov al,string
CPU 88/86 286 386 486
Clock Cycles 10 (W88=14) 5 4 1
Encoding 100011d0 mod,sreg, r/m disp (0, 1, or 2)
Syntax MOV segreg,reg16
Examples mov ds,ax
CPU 88/86 286 386 486
Clock Cycles 2 2,pm=17 2,pm=18 3,pm=9
Syntax MOV segreg,mem16
Examples mov es,psp
CPU 88/86 286 386 486
Clock Cycles 8+EA (88=12+EA) 5,pm=19 5,pm=19 3,pm=9
Syntax MOV reg16,segreg
Examples mov ax,ds
CPU 88/86 286 386 486
Clock Cycles 2 2 2 3
Syntax MOV mem16,segreg
Examples mov stack_save,ss
CPU 88/86 286 386 486
Clock Cycles 9+EA (88=13+EA) 3 2 3
MOV Move to/from Special Registers

80386-80486 Only
Moves a value between a special register and a 32-bit general-purpose register. The special registers include the control registers CR0, CR2, and CR3; the debug registers DR0, DR1, DR2, DR3, DR6, and DR7; and the test registers TR6 and TR7. On the 80486, the test registers TR3, TR4, and TR5 are also available. See Intel documentation for details on special registers.
Flags O D I T S Z A P C (O, S, Z, A, P, and C are undefined)
Encoding 00001111 001000d0 11, reg*, r/m
Syntax MOV reg32,controlreg
Examples mov eax,cr2
CPU 88/86 286 386 486
Clock Cycles 6 4
Syntax MOV controlreg,reg32
Examples mov cr0,ebx
CPU 88/86 286 386 486
Clock Cycles CR0=10,CR2=4,CR3=5 4,CR0=16
Encoding 00001111 001000d1 11, reg*, r/m
Syntax MOV reg32,debugreg
Examples mov edx,dr3
CPU 88/86 286 386 486
Clock Cycles DR0-3=22,DR6-7=14 10
Syntax MOV debugreg,reg32
Examples mov dr0,ecx
CPU 88/86 286 386 486
Clock Cycles DR0-3=22,DR6-7=16 11
Encoding 00001111 001001d0 11, reg*, r/m
Syntax MOV reg32,testreg
Examples mov edx,tr6
CPU 88/86 286 386 486
Clock Cycles 12 4,TR3=3
Syntax MOV testreg,reg32
Examples mov tr7,eax
CPU 88/86 286 386 486
Clock Cycles 12 4,TR3=6
* The reg field contains the register number of the special register (for example, 000 for CR0, 011 for DR3, or 111 for TR7).
MOVS/MOVSB/MOVSW/MOVSD Move String Data

Moves a string from one area of memory to another. DS:SI must point to the source string and ES:DI to the destination address, even if operands are given. For each element moved, DI and SI are adjusted according to the size of the operands and the status of the direction flag. They are increased if the direction flag has been cleared with CLD, or decreased if the direction flag has been set with STD.
If the MOVS form of the instruction is used, operands must be provided to indicate the size of the data elements to be processed. A segment override can be given for the source operand (but not for the destination). If MOVSB (bytes), MOVSW (words), or MOVSD (doublewords on the 80386-80486 only) is used, the instruction determines the size of the data elements to be processed.
MOVS and its variations are normally used with the REP prefix.
Flags No change
Encoding 1010010w
Syntax
MOVS [[ES:]]dest,[[segreg:]]src
MOVSB [[[[ES:]]dest,[[segreg:]]src]]
MOVSW [[[[ES:]]dest,[[segreg:]]src]]
MOVSD [[[[ES:]]dest,[[segreg:]]src]]
Examples
rep movsb
movs dest,es:source
CPU 88/86 286 386 486
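The effect of the direction flag on a repeated byte move can be traced with a short model. The sketch is Python, not MASM. `rep_movsb` is a hypothetical name, and it assumes DS and ES refer to the same segment so that a single `bytearray` can stand in for memory.

```python
def rep_movsb(memory, si, di, cx, direction_flag=False):
    """Model REP MOVSB. Returns the final (si, di, cx).

    direction_flag False corresponds to CLD (addresses increase);
    True corresponds to STD (addresses decrease).
    """
    step = -1 if direction_flag else 1
    while cx != 0:
        memory[di] = memory[si]      # copy one byte from DS:SI to ES:DI
        si = (si + step) & 0xFFFF    # both pointers move by the element size
        di = (di + step) & 0xFFFF
        cx -= 1
    return si, di, cx


# Overlapping move of "abcd" two bytes higher: copy from the last byte
# backward (STD) so the source is not overwritten before it is read.
buffer = bytearray(b"abcdef")
rep_movsb(buffer, si=3, di=5, cx=4, direction_flag=True)
```

Running the same overlapping move forward in this model would overwrite source bytes before they are read, which is the usual reason for choosing STD when the destination lies above the source.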
MOVSX Move with Sign-Extend

80386-80486 Only
Moves and sign-extends the value of the source operand to the destination register. MOVSX is used to copy a signed 8-bit or 16-bit source operand to a larger 16-bit or 32-bit destination register.
Flags No change
Encoding 00001111 1011111w mod, reg, r/m disp (0, 1, 2, or 4)
Syntax MOVSX reg,reg
Examples
movsx eax,bx
movsx ecx,bl
movsx bx,al
CPU 88/86 286 386 486
Clock Cycles 3 3
Syntax MOVSX reg,mem
Examples
movsx cx,bsign
movsx edx,wsign
movsx eax,bsign
CPU 88/86 286 386 486
Clock Cycles 6 3
MOVZX Move with Zero-Extend

80386-80486 Only
Moves and zero-extends the value of the source operand to the destination register. MOVZX is used to copy an unsigned 8-bit or 16-bit source operand to a larger 16-bit or 32-bit destination register.
Flags No change
Encoding 00001111 1011011w mod, reg, r/m disp (0, 1, 2, or 4)
Syntax MOVZX reg,reg
Examples
movzx eax,bx
movzx ecx,bl
movzx bx,al
CPU 88/86 286 386 486
Clock Cycles 3 3
Syntax MOVZX reg,mem
Examples
movzx cx,bunsign
movzx edx,wunsign
movzx eax,bunsign
CPU 88/86 286 386 486
Clock Cycles 6 3
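The difference between sign extension (MOVSX) and zero extension (MOVZX) is easiest to see on a value whose top bit is set. The following Python sketch is a model rather than MASM, and `movsx` and `movzx` are hypothetical helpers that take the operand widths in bits as explicit arguments.

```python
def movzx(value, src_bits, dst_bits):
    """Zero-extend: the upper bits of the destination are filled with 0."""
    assert src_bits < dst_bits
    return value & ((1 << src_bits) - 1)


def movsx(value, src_bits, dst_bits):
    """Sign-extend: the upper bits are filled with copies of the sign bit."""
    assert src_bits < dst_bits
    value &= (1 << src_bits) - 1
    if value >> (src_bits - 1):           # sign bit set: value is negative
        value -= 1 << src_bits            # reinterpret as a signed quantity
    return value & ((1 << dst_bits) - 1)  # two's complement at the new width


# 80h is -128 as a signed byte and 128 as an unsigned byte.
signed_word = movsx(0x80, 8, 16)
unsigned_word = movzx(0x80, 8, 16)
```

In this model the byte 80h becomes 0FF80h under sign extension and 0080h under zero extension, so the choice of instruction follows from whether the source is meant as a signed or unsigned quantity.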
MUL Unsigned Multiply

Multiplies an implied destination operand by a specified source operand. Both operands are treated as unsigned numbers. If a single 16-bit operand is given, the implied destination is AX and the product goes into the DX:AX register pair. If a single 8-bit operand is given, the implied destination is AL and the product goes into AX. On the 80386-80486, if a 32-bit operand is given, the implied destination is EAX and the product goes into the EDX:EAX register pair. The carry and overflow flags are set if DX is not 0 for 16-bit operands or if AH is not 0 for 8-bit operands.
Flags O D I T S Z A P C (S, Z, A, and P are undefined)
Encoding 1111011w mod, 100, r/m disp (0, 1, or 2)
Syntax MUL reg
Examples
mul bx
mul dl
CPU 88/86 286 386 486
Clock Cycles b=70-77,w=118-133 b=13,w=21 b=9-14,w=9-22,d=9-38* b=13-18,w=13-26,d=13-42
Syntax MUL mem
Examples
mul factor
mul WORD PTR [bx]
CPU 88/86 286 386 486
Clock Cycles (b=76-83,w=124-139)+EA b=16,w=24 b=12-17,w=12-25,d=12-41* b=13-18,w=13-26,d=13-42
* The 80386-80486 processors have an early-out multiplication algorithm. Therefore, multiplying an 8-bit or 16-bit value in EAX takes the same time as multiplying the value in AL or AX.
Word memory operands on the 8088 take (128-143)+EA clocks.
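Where the product lands and when the carry and overflow flags are set can be sketched with a small model. This is Python, not MASM. `mul8` and `mul16` are hypothetical names, and a single boolean stands in for both the carry and overflow flags, which MUL sets together.

```python
def mul8(al, operand):
    """Model MUL with an 8-bit operand. Returns (ax, carry_overflow)."""
    product = (al & 0xFF) * (operand & 0xFF)
    ah = product >> 8
    return product, ah != 0          # flags set when AH is not 0


def mul16(ax, operand):
    """Model MUL with a 16-bit operand. Returns (dx, ax, carry_overflow)."""
    product = (ax & 0xFFFF) * (operand & 0xFFFF)
    dx = product >> 16               # high word of the product
    new_ax = product & 0xFFFF        # low word of the product
    return dx, new_ax, dx != 0       # flags set when DX is not 0


# 1000h * 10h = 10000h: the result spills into DX, so the flags are set.
dx, ax, spilled = mul16(0x1000, 0x10)
```

The flag therefore acts as a cheap test for whether the upper half of the product can be ignored.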
NEG Two's Complement Negation

Replaces the operand with its two's complement. NEG does this by subtracting the operand from 0. If the operand is 0, the carry flag is cleared. Otherwise, the carry flag is set. If the operand contains the maximum possible negative value (-128 for 8-bit operands or -32,768 for 16-bit operands), the value does not change, but the overflow and carry flags are set.
Flags O D I T S Z A P C
Encoding 1111011w mod, 011, r/m disp (0, 1, or 2)
Syntax NEG reg
Examples neg ax
Syntax NEG mem
Examples neg balance
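The three cases described for NEG (zero, an ordinary value, and the most negative value) can be checked with a short model. The sketch is Python rather than MASM, and `neg` is a hypothetical helper that takes the operand width in bits as a parameter.

```python
def neg(value, bits=16):
    """Model NEG. Returns (result, carry, overflow)."""
    mask = (1 << bits) - 1
    value &= mask
    result = (0 - value) & mask           # subtract the operand from 0
    carry = value != 0                    # cleared only when the operand is 0
    overflow = value == 1 << (bits - 1)   # most negative value cannot be negated
    return result, carry, overflow
```

For the 8-bit operand 80h (-128), the model returns the operand unchanged with both flags set, matching the special case in the description.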
NOP No Operation
Performs no operation. NOP can be used for timing delays or alignment.
Flags No change
Encoding 10010000*
Syntax NOP
Examples nop
CPU 88/86 286 386 486
* The encoding is the same as XCHG AX,AX.
NOT One's Complement Negation

Toggles each bit of the operand by clearing set bits and setting cleared bits.
Flags No change
Encoding 1111011w mod, 010, r/m disp (0, 1, or 2)
Syntax NOT reg
Examples not ax
Syntax NOT mem
Examples not masker
OR Inclusive OR
Performs a bitwise OR operation on the source and destination operands and stores the result to the destination operand. For each bit position in the operands, if either or both bits are set, the corresponding bit of the result is set. Otherwise, the corresponding bit of the result is cleared.
Flags O D I T S Z A P C (O and C are cleared to 0; A is undefined)
Encoding 000010dw mod, reg, r/m disp (0, 1, or 2)
Syntax OR reg,reg
Examples or ax,dx
Syntax OR mem,reg
Examples
or bits,dx
or [bp+6],cx
Syntax OR reg,mem
Examples
or bx,masker
or dx,color[di]
Encoding 100000sw mod,001, r/m disp (0, 1, or 2) data (1 or 2)
Syntax OR reg,immed
Examples or dx,110110b
CPU 88/86 286 386 486
Clock Cycles 4 3 2 1
Syntax OR mem,immed
Examples or flag_rec,8
CPU 88/86 286 386 486
Clock Cycles (b=17,w=25)+EA 7 7 3
Encoding 0000110w data (1 or 2)
Syntax OR accum,immed
Examples or ax,40h
CPU 88/86 286 386 486
OUT Output to Port

Transfers a byte or word (or a doubleword on the 80386-80486) to a port from the accumulator register. The port address is specified by the destination operand, which can be DX or an 8-bit constant. In protected mode, a general-protection fault occurs if OUT is used when the current privilege level is greater than the value of the IOPL flag.
Flags No change
Encoding 1110011w data (1)
Syntax OUT immed8,accum
Examples out 60h,al
CPU 88/86 286 386 486
Clock Cycles 10 (88=14) 3 10,pm=4,24* 16,pm=11,31*
Encoding 1110111w
Syntax OUT DX,accum
Examples
out dx,ax
out dx,al
CPU 88/86 286 386 486
Clock Cycles 8 (88=12) 3 11,pm=5,25* 16,pm=10,30*
* First protected-mode timing: CPL <= IOPL. Second timing: CPL > IOPL.
OUTS/OUTSB/OUTSW/OUTSD Output String to Port

80186-80486 Only
Sends a string to a port. The string is considered the source and must be pointed to by DS:SI (even if an operand is given). The output port is specified in DX. For each element sent, SI is adjusted according to the size of the operand and the status of the direction flag. SI is increased if the direction flag has been cleared with CLD, or decreased if the direction flag has been set with STD.
If the OUTS form of the instruction is used, an operand must be provided to indicate the size of data elements to be sent. A segment override can be given. If OUTSB (bytes), OUTSW (words), or OUTSD (doublewords on the 80386-80486 only) is used, the instruction determines the size of the data elements to be sent.
OUTS and its variations are normally used with the REP prefix. Before the instruction is executed, CX should contain the number of elements to send. In protected mode, a general-protection fault occurs if OUTS is used when the current privilege level is greater than the value of the IOPL flag.
Flags No change
Encoding 0110111w
Syntax
OUTS DX,[[segreg:]]src
OUTSB [[DX,[[segreg:]]src]]
OUTSW [[DX,[[segreg:]]src]]
OUTSD [[DX,[[segreg:]]src]]
Examples
rep outs dx,buffer
outsb
rep outsw
CPU 88/86 286 386 486
Clock Cycles 5 14,pm=8,28* 17,pm=10,32*
* First protected-mode timing: CPL <= IOPL. Second timing: CPL > IOPL.
POP Pop
Pops the top of the stack into the destination operand. The value at SS:SP is copied to the destination operand and SP is increased by 2. The destination operand can be a memory location, a general-purpose 16-bit register, or any segment register except CS. Use RET to pop CS. On the 80386-80486, 32-bit values can be popped by giving a 32-bit operand. ESP is increased by 4 for 32-bit pops.
Flags No change
Encoding 01011 reg
Syntax POP reg16 POP reg32*
Examples pop cx
CPU 88/86 286 386 486
Clock Cycles 8 (88=12) 5 4 1
Encoding 10001111 mod,000,r/m disp (2)
Syntax POP mem16 POP mem32*
Examples pop param
CPU 88/86 286 386 486
Clock Cycles 17+EA (88=25+EA) 5 5 6
Encoding 000,sreg,111
Syntax POP segreg
Examples
pop es
pop ds
pop ss
CPU 88/86 286 386 486
Clock Cycles 8 (88=12) 5,pm=20 7,pm=21 3,pm=9
Encoding 00001111 10,sreg,001
Syntax POP segreg*
Examples
pop fs
pop gs
CPU 88/86 286 386 486
Clock Cycles 7,pm=21 3,pm=9
* 80386-80486 only.
POPA/POPAD Pop All

80186-80486 Only
Pops the top 16 bytes on the stack into the eight general-purpose registers. The registers are popped in the following order: DI, SI, BP, SP, BX, DX, CX, AX. The value for the SP register is actually discarded rather than copied to SP. POPA always pops into 16-bit registers. On the 80386-80486, use POPAD to pop into 32-bit registers.
Flags No change
Encoding 01100001
Syntax POPA POPAD*
Examples popa
CPU 88/86 286 386 486
Clock Cycles 19 24 9
* 80386-80486 only.
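The pop order and the discarded SP slot can be traced by pairing POPA with its counterpart PUSHA, which this reference describes as pushing AX, CX, DX, BX, SP, BP, SI, DI with the SP value taken before the instruction. The sketch below is a Python model, not MASM. The register dictionary, the `stack` dictionary keyed by address, and the function names are hypothetical.

```python
PUSHA_ORDER = ("ax", "cx", "dx", "bx", "sp", "bp", "si", "di")


def pusha(regs, stack):
    """Model PUSHA: push eight 16-bit registers; SP slot holds the original SP."""
    original_sp = regs["sp"]
    for name in PUSHA_ORDER:
        value = original_sp if name == "sp" else regs[name]
        regs["sp"] = (regs["sp"] - 2) & 0xFFFF
        stack[regs["sp"]] = value


def popa(regs, stack):
    """Model POPA: pop in reverse order; the stored SP value is discarded."""
    for name in reversed(PUSHA_ORDER):
        value = stack[regs["sp"]]
        regs["sp"] = (regs["sp"] + 2) & 0xFFFF
        if name != "sp":            # skip the slot instead of loading SP
            regs[name] = value
```

After a matching `pusha` and `popa` in this model, SP returns to its starting value because 16 bytes are popped, not because the saved SP is reloaded.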
POPF/POPFD Pop Flags

Pops the value on the top of the stack into the flags register. POPF always pops into the 16-bit flags register. On the 80386-80486, use POPFD to pop into the 32-bit flags register.
Flags O D I T S Z A P C
Encoding 10011101
Syntax POPF POPFD*
Examples popf
CPU 88/86 286 386 486
Clock Cycles 8 (88=12) 5 5 9,pm=6
* 80386-80486 only.
PUSH/PUSHW/PUSHD Push
Pushes the source operand onto the stack. SP is decreased by 2 and the source value is copied to SS:SP. The operand can be a memory location, a general-purpose 16-bit register, or a segment register. On the 80186-80486 processors, the operand can also be a constant. On the 80386-80486, 32-bit values can be pushed by specifying a 32-bit operand. ESP is decreased by 4 for 32-bit pushes.

On the 8088 and 8086, PUSH SP saves the value of SP after the push. On the 80186-80486 processors, PUSH SP saves the value of SP before the push. The PUSHW and PUSHD instructions push a word (2 bytes) and a doubleword (4 bytes), respectively.
Flags No change
Encoding
01010 reg
Syntax
PUSH reg16
PUSH reg32*
PUSHW reg16
PUSHD reg32*
Examples
push dx
CPU 88/86 286 386 486
Clock Cycles 11 (88=15) 3 2 1
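The PUSH SP difference between processor generations is easy to get wrong, so a small model may help. The following Python sketch is illustrative only: the function name `push_sp` and the CPU name strings are assumptions made for this example, and it models only the 16-bit case described above.

```python
def push_sp(sp, cpu):
    """Model a 16-bit PUSH SP. Returns (new_sp, word_written_to_stack)."""
    new_sp = (sp - 2) & 0xFFFF      # SP is decreased by 2
    if cpu in ("8088", "8086"):
        pushed = new_sp             # 8088/8086: value of SP after the push
    else:
        pushed = sp                 # 80186-80486: value of SP before the push
    return new_sp, pushed


# Same starting SP, different word stored on the stack.
old_style = push_sp(0x0100, "8086")
new_style = push_sp(0x0100, "80286")
```

Code that must run on both families should not rely on the stored value unless it first detects which behavior the processor has.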
Encoding
11111111 mod,110,r/m disp (2)
Syntax
PUSH mem16
PUSH mem32*
Examples
push [di]
push fcount
CPU 88/86 286 386 486
Clock Cycles 16+EA (88=24+EA) 5 5 4
Encoding
000,sreg,110
Syntax
PUSH segreg
PUSHW segreg
PUSHD segreg*
Examples
push es
push ss
push cs
CPU 88/86 286 386 486
Clock Cycles 10 (88=14) 3 2 3
Encoding
00001111 10,sreg,000
Syntax
PUSH segreg
PUSHW segreg
PUSHD segreg*
Examples
push fs
push gs
CPU 88/86 286 386 486
Clock Cycles - - 2 3
Encoding
011010s0 data (1 or 2)
Syntax
PUSH immed
PUSHW immed
PUSHD immed*
Examples
push 'a'
push 15000
CPU 88/86 286 386 486
Clock Cycles - 3 2 1
* 80386-80486 only.
PUSHA/PUSHAD Push All

80186-80486 Only

Pushes the eight general-purpose registers onto the stack. The registers are pushed in the following order: AX, CX, DX, BX, SP, BP, SI, DI. The value pushed for SP is the value before the instruction. PUSHA always pushes 16-bit registers. On the 80386-80486, use PUSHAD to push 32-bit registers.
Flags No change
Encoding
01100000
Syntax
PUSHA
PUSHAD*
Examples
pusha
CPU 88/86 286 386 486
* 80386-80486 only.
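The push and pop orders of PUSHA and POPA mirror each other, with the SP slot written by PUSHA and skipped by POPA. The Python sketch below models the 16-bit forms under simplifying assumptions: the stack is a plain dictionary keyed by offset, segments are ignored, and the names `pusha`, `popa`, and `PUSHA_ORDER` are made up for this illustration.

```python
PUSHA_ORDER = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def pusha(regs, stack):
    """Push the eight 16-bit registers; the SP slot holds SP before the instruction."""
    sp_before = regs["SP"]
    for name in PUSHA_ORDER:
        value = sp_before if name == "SP" else regs[name]
        regs["SP"] = (regs["SP"] - 2) & 0xFFFF
        stack[regs["SP"]] = value


def popa(regs, stack):
    """Pop in the order DI, SI, BP, SP, BX, DX, CX, AX; the SP word is discarded."""
    for name in reversed(PUSHA_ORDER):
        value = stack[regs["SP"]]
        regs["SP"] = (regs["SP"] + 2) & 0xFFFF
        if name != "SP":
            regs[name] = value


regs = {"AX": 1, "CX": 2, "DX": 3, "BX": 4, "SP": 0x100, "BP": 5, "SI": 6, "DI": 7}
stack = {}
pusha(regs, stack)   # SP drops by 16 bytes
popa(regs, stack)    # registers and SP return to their starting values
```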
PUSHF/PUSHFD Push Flags

Pushes the flags register onto the stack. PUSHF always pushes the 16-bit flags register. On the 80386-80486, use PUSHFD to push the 32-bit flags register.
Flags No change
Encoding
10011100
Syntax
PUSHF
PUSHFD*
Examples
pushf
CPU 88/86 286 386 486
Clock Cycles 10 (88=14) 3 4 4,pm=3
* 80386-80486 only.
RCL/RCR/ROL/ROR Rotate

Rotates the bits in the destination operand the number of times specified in the source operand. RCL and ROL rotate the bits left; RCR and ROR rotate right.

ROL and ROR rotate the number of bits in the operand. For each rotation, the leftmost or rightmost bit is copied to the carry flag as well as rotated. RCL and RCR rotate through the carry flag. The carry flag becomes an extension of the operand so that a 9-bit rotation is done for 8-bit operands, or a 17-bit rotation for 16-bit operands.

On the 8088 and 8086, the source operand can be either CL or 1. On the 80186-80486, the source operand can be CL or an 8-bit constant. On the 80186-80486, rotate counts larger than 31 are masked off, but on the 8088 and 8086, larger rotate counts are performed despite the inefficiency involved. The overflow flag is modified only by single-bit variations of the instruction; for multiple-bit variations, the overflow flag is undefined.
Flags O D I T S Z A P C
Encoding
1101000w mod,TTT*,r/m disp (0, 1, or 2)
Syntax
ROL reg,1
ROR reg,1
Examples
ror ax,1
rol dl,1
CPU 88/86 286 386 486
Clock Cycles 2 2 3 3
Syntax
RCL reg,1
RCR reg,1
Examples
rcl dx,1
rcr bl,1
CPU 88/86 286 386 486
Clock Cycles 2 2 9 3
Syntax
ROL mem,1
ROR mem,1
Examples
ror bits,1
rol WORD PTR [bx],1
CPU 88/86 286 386 486
Clock Cycles 15+EA (W88=23+EA) 7 7 4
Syntax
RCL mem,1
RCR mem,1
Examples
rcl WORD PTR [si],1
rcr WORD PTR m32[0],1
CPU 88/86 286 386 486
Clock Cycles 15+EA (W88=23+EA) 7 10 4
Encoding
1101001w mod,TTT*,r/m disp (0, 1, or 2)
Syntax
ROL reg,CL
ROR reg,CL
Examples
ror ax,cl
rol dx,cl
CPU 88/86 286 386 486
Clock Cycles 8+4n 5+n 3 3
Syntax
RCL reg,CL
RCR reg,CL
Examples
rcl dx,cl
rcr bl,cl
CPU 88/86 286 386 486
Clock Cycles 8+4n 5+n 9 8-30
Syntax
ROL mem,CL
ROR mem,CL
Examples
ror color,cl
rol WORD PTR [bp+6],cl
CPU 88/86 286 386 486
Clock Cycles 20+EA+4n (W88=28+EA+4n) 8+n 7 4
Syntax
RCL mem,CL
RCR mem,CL
Examples
rcr WORD PTR [bx+di],cl
rcl masker,cl
CPU 88/86 286 386 486
Clock Cycles 20+EA+4n (W88=28+EA+4n) 8+n 10 9-31
Encoding
1100000w mod,TTT*,r/m disp (0, 1, or 2) data (1)
Syntax
ROL reg,immed8
ROR reg,immed8
Examples
rol ax,13
ror bl,3
CPU 88/86 286 386 486
Clock Cycles - 5+n 3 2
Syntax
RCL reg,immed8
RCR reg,immed8
Examples
rcl bx,5
rcr si,9
CPU 88/86 286 386 486
Clock Cycles - 5+n 9 8-30
Syntax
ROL mem,immed8
ROR mem,immed8
Examples
rol BYTE PTR [bx],10
ror bits,6
CPU 88/86 286 386 486
Clock Cycles - 8+n 7 4
Syntax
RCL mem,immed8
RCR mem,immed8
Examples
rcl WORD PTR [bp+8],
rcr masker,3
CPU 88/86 286 386 486
Clock Cycles - 8+n 10 9-31
* TTT represents one of the following bit codes: 000 for ROL, 001 for ROR, 010 for RCL, or 011 for RCR.
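The distinction between rotating within the operand (ROL/ROR) and rotating through the carry flag (RCL/RCR) can be checked with a small model. The Python sketch below is an illustration under stated assumptions: the function name `rotate` and its parameters are hypothetical, the overflow flag is not modeled, and the count is masked to 5 bits as described for the 80186-80486.

```python
def rotate(op, value, count, width=16, carry=0):
    """Model ROL, ROR, RCL, or RCR one bit at a time. Returns (result, carry)."""
    count &= 31                          # 80186-80486: counts above 31 are masked off
    mask = (1 << width) - 1
    top = width - 1
    value &= mask
    for _ in range(count):
        if op == "ROL":
            carry = (value >> top) & 1   # bit leaving the left end goes to CF and to bit 0
            value = ((value << 1) | carry) & mask
        elif op == "ROR":
            carry = value & 1            # bit leaving the right end goes to CF and to the top bit
            value = (value >> 1) | (carry << top)
        elif op == "RCL":
            new_carry = (value >> top) & 1
            value = ((value << 1) | carry) & mask   # old CF enters at bit 0
            carry = new_carry
        elif op == "RCR":
            new_carry = value & 1
            value = (value >> 1) | (carry << top)   # old CF enters at the top bit
            carry = new_carry
        else:
            raise ValueError("unknown rotate mnemonic: " + op)
    return value, carry


# An 8-bit RCL is a 9-bit rotation, so nine single-bit steps restore the original state.
restored = rotate("RCL", 0xA5, 9, width=8, carry=1)
```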
REP Repeat String

Repeats a string instruction the number of times indicated by CX. First, CX is compared to 0; if it equals 0, execution proceeds to the next instruction. Otherwise, CX is decremented, the string instruction is performed, and the loop continues. REP is used with MOVS and STOS. REP also can be used with INS and OUTS on the 80186-80486 processors. On all processors except the 80386-80486, combining a repeat prefix with a segment override can cause errors if an interrupt occurs.
Flags No change
Encoding
11110011 1010010w
Syntax
REP MOVS dest,src
REP MOVSB [[dest,src]]
REP MOVSW [[dest,src]]
REP MOVSD [[dest,src]]*
Examples
rep movs source,dest
rep movsw
CPU 88/86 286 386 486
Clock Cycles 9+17n (W88=9+25n) 5+4n 7+4n 12+3n#
Encoding
11110011 1010101w
Syntax
REP STOS dest
REP STOSB [[dest]]
REP STOSW [[dest]]
REP STOSD [[dest]]*
Examples
rep stosb
rep stos dest
CPU 88/86 286 386 486
Clock Cycles 9+10n (W88=9+14n) 4+3n 5+5n 7+4n
Encoding
11110011 1010110w
Syntax
REP LODS dest
REP LODSB [[dest]]
REP LODSW [[dest]]
REP LODSD [[dest]]*
Examples
rep lodsb
rep lods dest
CPU 88/86 286 386 486
Clock Cycles 7+4n
Encoding
11110011 0110110w
Syntax
REP INS dest,DX
REP INSB [[dest,DX]]
REP INSW [[dest,DX]]
REP INSD [[dest,DX]]*
Examples
rep insb
rep ins dest,dx
CPU 88/86 286 386 486
Clock Cycles - 5+4n 13+6n,pm=(7,27)+6n 16+8n,pm=(10,30)+8n
Encoding
11110011 0110111w
Syntax
REP OUTS DX,src
REP OUTSB [[src]]
REP OUTSW [[src]]
REP OUTSD [[src]]*
Examples
rep outs dx,source
rep outsw
CPU 88/86 286 386 486
Clock Cycles - 5+4n 12+5n,pm=(6,26)+5n 17+5n,pm=(11,31)+5n
* 80386-80486 only.
# 5 if n = 0, 13 if n = 1.
5 if n = 0.
First protected-mode timing: CPL ≤ IOPL. Second timing: CPL > IOPL.
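The REP loop (test CX, decrement, perform the string instruction, repeat) can be sketched in a few lines. The Python model below of REP MOVSB is illustrative only: it assumes a flat `bytearray` in place of the DS:SI and ES:DI segments, and the name `rep_movsb` and its parameters are invented for this example.

```python
def rep_movsb(memory, si, di, cx, df=0):
    """Model REP MOVSB on a flat byte buffer. Returns the final (si, di, cx)."""
    step = -1 if df else 1       # direction flag set means addresses decrease
    while cx != 0:               # first, CX is compared to 0
        cx -= 1                  # otherwise CX is decremented
        memory[di] = memory[si]  # one MOVSB step
        si += step
        di += step
    return si, di, cx


memory = bytearray(b"ABCD....")
final = rep_movsb(memory, si=0, di=4, cx=4)   # copies "ABCD" over the dots
```

Because CX is tested before the first iteration, a count of 0 leaves memory and the index registers untouched.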
REPcondition Repeat String Conditionally

Repeats a string instruction as long as condition is true and the maximum count has not been reached. REPE and REPZ (they are synonyms) repeat while the zero flag is set. REPNE and REPNZ (they are synonyms) repeat while the zero flag is cleared. The conditional-repeat prefixes should only be used with SCAS and CMPS, since these are the only string instructions that modify the zero flag.

Before executing the instruction, CX should be set to the maximum allowable number of repetitions. First, CX is compared to 0; if it equals 0, execution proceeds to the next instruction. Otherwise, CX is decremented, the string instruction is performed, and the loop continues. On all processors except the 80386-80486, combining a repeat prefix with a segment override may cause errors if an interrupt occurs during a string operation.
Flags O D I T S Z A P C
Encoding
11110011 1010011w
Syntax
REPE CMPS src,dest
REPE CMPSB [[src,dest]]
REPE CMPSW [[src,dest]]
REPE CMPSD [[src,dest]]*
Examples
repz cmpsb
repe cmps src,dest
CPU 88/86 286 386 486
Encoding
11110011 1010111w
Syntax
REPE SCAS dest
REPE SCASB [[dest]]
REPE SCASW [[dest]]
REPE SCASD [[dest]]*
Examples
repe scas dest
repz scasw
CPU 88/86 286 386 486
Encoding
11110010 1010011w
Syntax
REPNE CMPS src,dest
REPNE CMPSB [[src,dest]]
REPNE CMPSW [[src,dest]]
REPNE CMPSD [[src,dest]]*
Examples
repne cmpsw
repnz cmps src,dest
CPU 88/86 286 386 486
Encoding
11110010 1010111w
Syntax
REPNE SCAS dest
REPNE SCASB [[dest]]
REPNE SCASW [[dest]]
REPNE SCASD [[dest]]*
Examples
repne scas dest
repnz scasb
CPU 88/86 286 386 486
Clock Cycles 9+15n (W88=9+19n) 5+8n 5+8n 7+5n#
* 80386-80486 only.
# 5 if n=0.
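A conditional repeat ends either when CX reaches 0 or when the zero flag stops matching the condition. The following Python sketch models REPE CMPSB under simplifying assumptions: both strings live in one flat `bytearray`, the direction flag is clear, only the zero flag is tracked, and the function name `repe_cmpsb` is hypothetical.

```python
def repe_cmpsb(memory, si, di, cx, zf=True):
    """Model REPE CMPSB with the direction flag clear. Returns (si, di, cx, zf)."""
    while cx != 0:                        # CX is checked before each repetition
        cx -= 1
        zf = memory[si] == memory[di]     # CMPSB sets ZF when the two bytes are equal
        si += 1
        di += 1
        if not zf:                        # REPE stops at the first mismatch
            break
    return si, di, cx, zf


memory = bytearray(b"HELLOHELPO")
# Compare the 5 bytes at offset 0 with the 5 bytes at offset 5.
outcome = repe_cmpsb(memory, si=0, di=5, cx=5)
```

After a mismatch, SI and DI already point one element past the bytes that differed, so the mismatching pair is at SI-1 and DI-1.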
RET/RETN/RETF Return from Procedure

Returns from a procedure by transferring control to an address popped from the top of the stack. A constant operand can be given indicating the number of additional bytes to release. The constant is normally used to adjust the stack for arguments pushed before the procedure was called.

The size of a return (near or far) is the size of the procedure in which the RET is defined with the PROC directive. RETN can be used to specify a near return; RETF can specify a far return. A near return pops a word into IP. A far return pops a word into IP and then pops a word into CS. After the return, the number of bytes given in the operand (if any) is added to SP.
Flags No change
Encoding
11000011
Syntax
RET
RETN
Examples
ret
retn
CPU 88/86 286 386 486
Clock Cycles 16 (88=20) 11+m 10+m 5
Encoding
11000010 data (2)
Syntax
RET immed16
RETN immed16
Examples
ret 2
retn 8
CPU 88/86 286 386 486
Clock Cycles 20 (88=24) 11+m 10+m 5
Encoding
11001011
Syntax
RET
RETF
Examples
ret
retf
CPU 88/86 286 386 486
Clock Cycles 26 (88=34) 15+m,pm=25+m,55* 18+m,pm=32+m,62* 13,pm=18,33*
Encoding
11001010 data (2)
Syntax
RET immed16
RETF immed16
Examples
ret 8
retf 32
CPU 88/86 286 386 486
Clock Cycles 25 (88=33) 15+m,pm=25+m,55* 18+m,pm=32+m,68* 14,pm=17,33*
* The first protected-mode timing is for a return to the same privilege level; the second is for a return to a lesser privilege level.
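The stack arithmetic of near and far returns, including the optional byte count, can be sketched as follows. This Python model is an illustration, not processor documentation: it assumes 16-bit real-mode behavior, represents the stack as a dictionary of words keyed by offset, and uses the made-up names `ret_near` and `ret_far`.

```python
def ret_near(stack, sp, imm16=0):
    """Near return: pop IP, then release imm16 extra bytes. Returns (ip, sp)."""
    ip = stack[sp]
    sp = (sp + 2 + imm16) & 0xFFFF
    return ip, sp


def ret_far(stack, sp, imm16=0):
    """Far return: pop IP, then CS, then release imm16 extra bytes. Returns (cs, ip, sp)."""
    ip = stack[sp]
    cs = stack[(sp + 2) & 0xFFFF]
    sp = (sp + 4 + imm16) & 0xFFFF
    return cs, ip, sp


# The caller pushed two word arguments (4 bytes) before a near call, so the
# procedure ends with "ret 4" to discard them.
stack = {0x00F8: 0x1234, 0x00FA: 0x2000}
ip, sp = ret_near(stack, 0x00F8, imm16=4)
```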
ROL/ROR Rotate
See RCL/RCR.
SAHF Store AH into Flags

Transfers AH into bits 0 to 7 of the flags register. This includes the carry, parity, auxiliary carry, zero, and sign flags, but not the trap, interrupt, direction, or overflow flags.
Flags O D I T S Z A P C
Encoding
10011110
Syntax
SAHF
Examples
sahf
CPU 88/86 286 386 486
SAL/SAR Shift
See SHL/SHR/SAL/SAR.
SBB Subtract with Borrow

Adds the carry flag to the second operand, then subtracts that value from the first operand. The result is assigned to the first operand. SBB is used to subtract the more significant portions of numbers that must be processed in multiple registers; SUB handles the least significant portion and leaves any borrow in the carry flag.
Flags O D I T S Z A P C
Encoding
000110dw mod,reg,r/m disp (0, 1, or 2)
Syntax
SBB reg,reg
Examples
sbb dx,cx
Syntax
SBB mem,reg
Examples
sbb WORD PTR m32[2],dx
Syntax
SBB reg,mem
Examples
sbb dx,WORD PTR m32[2]
Encoding
100000sw mod,011,r/m disp (0, 1, or 2) data (1 or 2)
Syntax
SBB reg,immed
Examples
sbb dx,45
Syntax
SBB mem,immed
Examples
sbb WORD PTR m32[2],40
Encoding
0001110w data (1 or 2)
Syntax
SBB accum,immed
Examples
sbb ax,320
CPU 88/86 286 386 486
Clock Cycles 4 3 2 1
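A typical use of SBB is a 32-bit subtraction carried out in two 16-bit steps: a plain subtract on the low words, then a subtract with borrow on the high words. The Python sketch below models that sequence; the helper names `sub16` and `sub32_in_two_steps` are assumptions made for this illustration, and only the carry (borrow) flag is tracked.

```python
def sub16(a, b, borrow_in=0):
    """16-bit subtract with optional borrow. Returns (result, carry_out)."""
    diff = a - b - borrow_in
    return diff & 0xFFFF, 1 if diff < 0 else 0


def sub32_in_two_steps(a, b):
    """Subtract two 32-bit values using a low-word SUB and a high-word SBB."""
    low, carry = sub16(a & 0xFFFF, b & 0xFFFF)                    # sub ax,cx
    high, carry = sub16((a >> 16) & 0xFFFF, (b >> 16) & 0xFFFF,   # sbb dx,bx
                        carry)
    return (high << 16) | low, carry


# 0x00010000 - 1 needs a borrow from the high word.
difference, carry = sub32_in_two_steps(0x00010000, 0x00000001)
```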
SCAS/SCASB/SCASW/SCASD Scan String Flags

Scans a string to find a value specified in the accumulator register. The string to be scanned is considered the destination. ES:DI must point to that string, even if an operand is specified. For each element, the destination element is subtracted from the accumulator value and the flags are updated to reflect the result (although the result is not stored). DI is adjusted according to the size of the operands and the status of the direction flag. DI is increased if the direction flag has been cleared with CLD, or decreased if the direction flag has been set with STD.

If the SCAS form of the instruction is used, an operand must be provided to indicate the size of the data elements to be processed. No segment override is allowed. If SCASB (bytes), SCASW (words), or SCASD (doublewords on the 80386-80486 only) is used, the instruction determines the size of the data elements to be processed and whether the element scanned for is in AL, AX, or EAX.

SCAS and its variations are normally used with repeat prefixes. REPNE (or REPNZ) is used to find the first element in a string that matches the value in the accumulator register. REPE (or REPZ) is used to find the first mismatch. Before the scan, CX should contain the maximum number of elements to scan. After a REPNE SCAS, the zero flag is clear if the string does not contain the accumulator value. After a REPE SCAS, the zero flag is set if the string contains nothing but the accumulator value.

When the instruction finishes, ES:DI points to the element that follows (if the direction flag is clear) or precedes (if the direction flag is set) the match or mismatch. If CX decrements to 0, ES:DI points to the element that follows or precedes the last comparison. The zero flag is set or clear according to the result of the last comparison, not according to the value of CX.
Flags O D I T S Z A P C
Encoding
1010111w
Syntax
SCAS [[ES:]]dest
SCASB [[[[ES:]]dest]]
SCASW [[[[ES:]]dest]]
SCASD [[[[ES:]]dest]]*
* 80386-80486 only
Examples
repne scasw
repe scasb
scas es:destin
CPU 88/86 286 386 486
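Where DI and CX end up after a REPNE SCASB is the detail most often needed in practice. The Python sketch below models the search under simplifying assumptions: a flat byte buffer stands in for ES:DI, the direction flag is clear, only the zero flag is tracked, and the name `repne_scasb` is hypothetical.

```python
def repne_scasb(memory, di, cx, al, zf=False):
    """Model REPNE SCASB with the direction flag clear. Returns (di, cx, zf)."""
    while cx != 0:
        cx -= 1
        zf = memory[di] == al   # ZF is set when the element equals AL
        di += 1                 # DI always advances past the element just compared
        if zf:                  # REPNE stops at the first match
            break
    return di, cx, zf


memory = bytearray(b"ABCDE")
di, cx, zf = repne_scasb(memory, di=0, cx=5, al=ord("C"))
match_offset = di - 1 if zf else None   # DI points to the element after the match
```

The zero flag, not CX, tells whether the value was found: a match on the very last element also leaves CX at 0.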
SETcondition Set Conditionally

80386-80486 Only

Sets the byte specified in the operand to 1 if condition is true or to 0 if condition is false. The condition is tested by checking the flags shown in the Set Conditions table. The instruction is used to set Boolean flags conditionally.
Flags No change
Encoding
00001111 1001cond mod,000,r/m
Syntax
SETcondition reg8
Examples
setc dh
setz al
setae bl
CPU 88/86 286 386 486
Clock Cycles - - 4 true=4,false=3
Syntax
SETcondition mem8
Examples
seto BYTE PTR [ebx]
setle flag
sete Booleans[di]
CPU 88/86 286 386 486
Clock Cycles - - 5 true=3,false=4

Set Conditions
Opcode     Mnemonic       Flags Checked       Description
10010010   SETB/SETNAE    CF=1                Set if below/not above or equal (unsigned comparisons)
10010011   SETAE/SETNB    CF=0                Set if above or equal/not below (unsigned comparisons)
10010110   SETBE/SETNA    CF=1 or ZF=1        Set if below or equal/not above (unsigned comparisons)
10010111   SETA/SETNBE    CF=0 and ZF=0       Set if above/not below or equal (unsigned comparisons)
10010100   SETE/SETZ      ZF=1                Set if equal/zero
10010101   SETNE/SETNZ    ZF=0                Set if not equal/not zero
10011100   SETL/SETNGE    SF≠OF               Set if less/not greater or equal (signed comparisons)
10011101   SETGE/SETNL    SF=OF               Set if greater or equal/not less (signed comparisons)
10011110   SETLE/SETNG    ZF=1 or SF≠OF       Set if less or equal/not greater (signed comparisons)
10011111   SETG/SETNLE    ZF=0 and SF=OF      Set if greater/not less or equal (signed comparisons)
10011000   SETS           SF=1                Set if sign
10011001   SETNS          SF=0                Set if not sign
10010010   SETC           CF=1                Set if carry
10010011   SETNC          CF=0                Set if not carry
10010000   SETO           OF=1                Set if overflow
10010001   SETNO          OF=0                Set if not overflow
10011010   SETP/SETPE     PF=1                Set if parity/parity even
10011011   SETNP/SETPO    PF=0                Set if no parity/parity odd
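The flag tests in the Set Conditions table translate directly into Boolean expressions. The Python sketch below encodes the comparison-related rows and returns the byte value a SETcondition instruction would store; the dictionary `CONDITIONS`, the function `setcc`, and the flag dictionary layout are assumptions made for this illustration.

```python
CONDITIONS = {
    "SETB":  lambda f: f["CF"] == 1,
    "SETAE": lambda f: f["CF"] == 0,
    "SETBE": lambda f: f["CF"] == 1 or f["ZF"] == 1,
    "SETA":  lambda f: f["CF"] == 0 and f["ZF"] == 0,
    "SETE":  lambda f: f["ZF"] == 1,
    "SETNE": lambda f: f["ZF"] == 0,
    "SETL":  lambda f: f["SF"] != f["OF"],
    "SETGE": lambda f: f["SF"] == f["OF"],
    "SETLE": lambda f: f["ZF"] == 1 or f["SF"] != f["OF"],
    "SETG":  lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],
}


def setcc(mnemonic, flags):
    """Return the byte (1 or 0) that the given SETcondition would store."""
    return 1 if CONDITIONS[mnemonic](flags) else 0


# Flags as left by comparing 3 with 5 (3 - 5 borrows and gives a negative result).
flags_after_cmp = {"CF": 1, "ZF": 0, "SF": 1, "OF": 0}
below = setcc("SETB", flags_after_cmp)   # unsigned view of the comparison
less = setcc("SETL", flags_after_cmp)    # signed view of the comparison
```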
SGDT/SIDT/SLDT Store Descriptor Table

80286-80486 Only

Stores a descriptor table register into a specified operand. SGDT stores the Global Descriptor Table register; SIDT, the Interrupt Descriptor Table register; and SLDT, the Local Descriptor Table register. These instructions are generally useful only in privileged mode. See Intel documentation for details on descriptor tables and other protected-mode concepts.
Flags No change
Encoding
00001111 00000001 mod,000,r/m disp (2)
Syntax
SGDT mem48
Examples
sgdt descriptor
CPU 88/86 286 386 486
Clock Cycles - 11 9 10
Encoding
00001111 00000001 mod,001,r/m disp (2)
Syntax
SIDT mem48
Examples
sidt descriptor
CPU 88/86 286 386 486
Clock Cycles - 12 9 10
Encoding
00001111 00000000 mod,000,r/m disp (0, 1, or 2)
Syntax
SLDT reg16
Examples
sldt ax
CPU 88/86 286 386 486
Clock Cycles - 2 2 2
Syntax
SLDT mem16
Examples
sldt selector
CPU 88/86 286 386 486
Clock Cycles - 3 2 3
SHL/SHR/SAL/SAR Shift

Shifts the bits in the destination operand the number of times specified by the source operand. SAL and SHL shift the bits left; SAR and SHR shift right.

With SHL, SAL, and SHR, the bit shifted off the end of the operand is copied into the carry flag, and the leftmost or rightmost bit opened by the shift is set to 0. With SAR, the bit shifted off the end of the operand is copied into the carry flag, and the leftmost bit opened by the shift retains its previous value (thus preserving the sign of the operand). SAL and SHL are synonyms.

On the 8088 and 8086, the source operand can be either CL or 1. On the 80186-80486 processors, the source operand can be CL or an 8-bit constant. On the 80186-80486 processors, shift counts larger than 31 are masked off, but on the 8088 and 8086, larger shift counts are performed despite the inefficiency. Only single-bit variations of the instruction modify the overflow flag; for multiple-bit variations, the overflow flag is undefined.
Flags O D I T S Z A P C (A=?)
Encoding
1101000w mod,TTT*,r/m disp (0, 1, or 2)
Syntax
SAR reg,1
Examples
sar di,1
sar cl,1
CPU 88/86 286 386 486
Clock Cycles 2 2 3 3
Syntax
SAL reg,1
SHL reg,1
SHR reg,1
Examples
shr dh,1
shl si,1
sal bx,1
CPU 88/86 286 386 486
Clock Cycles 2 2 3 3
Syntax
SAR mem,1
Examples
sar count,1
CPU 88/86 286 386 486
Clock Cycles 15+EA (W88=23+EA) 7 7 4
Syntax
SAL mem,1
SHL mem,1
SHR mem,1
Examples
sal WORD PTR m32[0],1
shl index,1
shr unsign[di],1
CPU 88/86 286 386 486
Clock Cycles 15+EA (W88=23+EA) 7 7 4
Encoding
1101001w mod,TTT*,r/m disp (0, 1, or 2)
Syntax
SAR reg,CL
Examples
sar bx,cl
sar dx,cl
CPU 88/86 286 386 486
Clock Cycles 8+4n 5+n 3 3
Syntax
SAL reg,CL
SHL reg,CL
SHR reg,CL
Examples
shr dx,cl
shl di,cl
sal ah,cl
CPU 88/86 286 386 486
Clock Cycles 8+4n 5+n 3 3
Syntax
SAR mem,CL
Examples
sar sign,cl
sar WORD PTR [bp+8],cl
CPU 88/86 286 386 486
Clock Cycles 20+EA+4n (W88=28+EA+4n) 8+n 7 4
Syntax
SAL mem,CL
SHL mem,CL
SHR mem,CL
Examples
shr WORD PTR m32[2],cl
sal BYTE PTR [di],cl
shl index,cl
CPU 88/86 286 386 486
Clock Cycles 20+EA+4n (W88=28+EA+4n) 8+n 7 4
Encoding
1100000w mod,TTT*,r/m disp (0, 1, or 2) data (1)
Syntax
SAR reg,immed8
Examples
sar bx,5
sar cl,5
CPU 88/86 286 386 486
Clock Cycles - 5+n 3 2
Syntax
SAL reg,immed8
SHL reg,immed8
SHR reg,immed8
Examples
sal cx,6
shl di,2
shr bx,8
CPU 88/86 286 386 486
Clock Cycles - 5+n 3 2
Syntax
SAR mem,immed8
Examples
sar sign_count,3
sar WORD PTR [bx],5
CPU 88/86 286 386 486
Clock Cycles - 8+n 7 4
Syntax
SAL mem,immed8
SHL mem,immed8
SHR mem,immed8
Examples
shr mem16,11
shl unsign,4
sal array[bx+di],14
CPU 88/86 286 386 486
Clock Cycles - 8+n 7 4
* TTT represents one of the following bit codes: 100 for SHL or SAL, 101 for SHR, or 111 for SAR.
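The difference between a logical right shift (SHR) and an arithmetic right shift (SAR) shows up only in the bit that enters at the left. The Python sketch below steps one bit at a time so that the carry flag matches the last bit shifted out; the function name `shift` and its parameters are hypothetical, the overflow flag is not modeled, and the count is masked to 5 bits as described for the 80186-80486.

```python
def shift(op, value, count, width=16, carry=0):
    """Model SHL/SAL, SHR, or SAR. Returns (result, carry)."""
    count &= 31                               # 80186-80486: counts above 31 are masked off
    mask = (1 << width) - 1
    sign_bit = 1 << (width - 1)
    value &= mask
    for _ in range(count):
        if op in ("SHL", "SAL"):
            carry = 1 if value & sign_bit else 0
            value = (value << 1) & mask       # a 0 enters at the right
        elif op == "SHR":
            carry = value & 1
            value >>= 1                       # a 0 enters at the left
        elif op == "SAR":
            carry = value & 1
            value = (value >> 1) | (value & sign_bit)   # the sign bit is kept
        else:
            raise ValueError("unknown shift mnemonic: " + op)
    return value, carry


logical = shift("SHR", 0x8001, 1)      # top bit becomes 0
arithmetic = shift("SAR", 0x8001, 1)   # top bit stays 1
```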

SHLD/SHRD Double Precision Shift

80386-80486 Only

Shifts the bits of the second operand into the first operand. The number of bits shifted is specified by the third operand. SHLD shifts the first operand to the left by the number of positions specified in the count. The positions opened by the shift are filled by the most significant bits of the second operand. SHRD shifts the first operand to the right by the number of positions specified in the count. The positions opened by the shift are filled by the least significant bits of the second operand. The count operand can be either CL or an 8-bit constant. If a shift count larger than 31 is given, it is adjusted by using the remainder (modulo) of a division by 32.
Flags O D I T S Z A P C (O=?, A=?)
Encoding
00001111 10100100 mod,reg,r/m disp (0, 1, or 2) data (1)
Syntax
SHLD reg16,reg16,immed8
SHLD reg32,reg32,immed8
Examples
shld ax,dx,10
CPU 88/86 286 386 486
Syntax
SHLD mem16,reg16,immed8
SHLD mem32,reg32,immed8
Examples
shld bits,cx,5
CPU 88/86 286 386 486
Encoding
00001111 10101100 mod,reg,r/m disp (0, 1, or 2) data (1)
Syntax
SHRD reg16,reg16,immed8
SHRD reg32,reg32,immed8
Examples
shrd cx,si,3
CPU 88/86 286 386 486
Syntax
SHRD mem16,reg16,immed8
SHRD mem32,reg32,immed8
Examples
shrd [di],dx,13
CPU 88/86 286 386 486
Encoding
00001111 10100101 mod,reg,r/m disp (0, 1, or 2)
Syntax
SHLD reg16,reg16,CL
SHLD reg32,reg32,CL
Examples
shld ax,dx,cl
CPU 88/86 286 386 486
Clock Cycles - - 3 3
Syntax
SHLD mem16,reg16,CL
SHLD mem32,reg32,CL
Examples
shld masker,ax,cl
CPU 88/86 286 386 486
Clock Cycles - - 7 4
Encoding
00001111 10101101 mod,reg,r/m disp (0, 1, or 2)
Syntax
SHRD reg16,reg16,CL
SHRD reg32,reg32,CL
Examples
shrd bx,dx,cl
CPU 88/86 286 386 486
Clock Cycles - - 3 3
Syntax
SHRD mem16,reg16,CL
SHRD mem32,reg32,CL
Examples
shrd [bx],dx,cl
CPU 88/86 286 386 486
Clock Cycles - - 7 4
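A double-precision shift can be pictured as shifting the first operand while bits from the second operand slide into the vacated positions. The Python sketch below models that for counts no larger than the operand width; the names `shld` and `shrd` mirror the mnemonics but the functions themselves are illustrative, flags are not modeled, and rejecting larger counts with `ValueError` is a choice made for this example rather than processor behavior.

```python
def _check(count, width):
    count %= 32                      # counts above 31 are reduced modulo 32
    if count > width:
        raise ValueError("count larger than operand width is not modeled")
    return count


def shld(dest, src, count, width=16):
    """Shift dest left; vacated low bits come from the most significant bits of src."""
    count = _check(count, width)
    mask = (1 << width) - 1
    return ((dest << count) | ((src & mask) >> (width - count))) & mask


def shrd(dest, src, count, width=16):
    """Shift dest right; vacated high bits come from the least significant bits of src."""
    count = _check(count, width)
    mask = (1 << width) - 1
    return (((dest & mask) >> count) | (src << (width - count))) & mask


left = shld(0x1234, 0xABCD, 4)    # the top nibble of src enters at the right
right = shrd(0x1234, 0xABCD, 4)   # the bottom nibble of src enters at the left
```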
SMSW Store Machine Status Word

80286-80486 Only

Stores the Machine Status Word (MSW) into a specified operand. SMSW is generally useful only in protected mode. See Intel documentation for details on the MSW and other protected-mode concepts.
Flags No change
Encoding
00001111 00000001 mod,100,r/m disp (0, 1, or 2)
Syntax
SMSW reg16
Examples
smsw ax
CPU 88/86 286 386 486
Clock Cycles - 2 2 2
Syntax
SMSW mem16
Examples
smsw machine
CPU 88/86 286 386 486
Clock Cycles - 3 3 3
STC Set Carry Flag

Sets the carry flag.
Flags O D I T S Z A P C (C=1)
Encoding
11111001
Syntax
STC
Examples
stc
CPU 88/86 286 386 486
STD Set Direction Flag

Sets the direction flag. All subsequent string instructions will process down (from high addresses to low addresses).
Flags O D I T S Z A P C (D=1)
Encoding
11111101
Syntax
STD
Examples
std
CPU 88/86 286 386 486
STI Set Interrupt Flag

Sets the interrupt flag. When the interrupt flag is set, maskable interrupts are recognized. If interrupts were disabled by a previous CLI instruction, pending interrupts will not be executed immediately; they will be executed after the instruction following STI.
Flags O D I T S Z A P C (I=1)
Encoding
11111011
Syntax
STI
Examples
sti
CPU 88/86 286 386 486
STOS/STOSB/STOSW/STOSD Store String Data

Stores the value of the accumulator in a string. The string is the destination and must be pointed to by ES:DI, even if an operand is given. For each element stored, DI is adjusted according to the size of the operand and the status of the direction flag. DI is incremented if the direction flag has been cleared with CLD or decremented if the direction flag has been set with STD.

If the STOS form of the instruction is used, an operand must be provided to indicate the size of the data elements to be processed. No segment override is allowed. If STOSB (bytes), STOSW (words), or STOSD (doublewords on the 80386-80486 only) is used, the instruction determines the size of the data elements to be processed and whether the element comes from AL, AX, or EAX.

STOS and its variations are often used with the REP prefix to fill a string with a repeated value. Before the repeated instruction is executed, CX should contain the number of elements to store.
Flags No change
Encoding
1010101w
Syntax
STOS [[ES:]]dest
STOSB [[[[ES:]]dest]]
STOSW [[[[ES:]]dest]]
STOSD [[[[ES:]]dest]]*
* 80386-80486 only
Examples
stos es:dstring
rep stosw
rep stosb
CPU 88/86 286 386 486
STR Store Task Register

80286-80486 Only

Stores the current task register to the specified operand. This instruction is generally useful only in privileged mode. See Intel documentation for details on task registers and other protected-mode concepts.
Flags No change
Encoding
00001111 00000000 mod,001,reg disp (0, 1, or 2)
Syntax
STR reg16
Examples
str cx
CPU 88/86 286 386 486
Clock Cycles - 2 2 2
Syntax
STR mem16
Examples
str taskreg
CPU 88/86 286 386 486
Clock Cycles - 3 2 3
SUB Subtract
Subtracts the source operand from the destination operand and stores the result in the destination operand.
Flags O D I T S Z A P C
Encoding
001010dw mod,reg,r/m disp (0, 1, or 2)
Syntax
SUB reg,reg
Examples
sub ax,bx
sub bh,dh
Syntax
SUB mem,reg
Examples
sub tally,bx
sub array[di],bl
Syntax
SUB reg,mem
Examples
sub cx,discard
sub al,[bx]
CPU 88/86 286 386 486
Clock Cycles 9+EA (W88=13+EA) 7 7 2
Encoding
100000sw mod,101,r/m disp (0, 1, or 2) data (1 or 2)
Syntax
SUB reg,immed
Examples
sub dx,45
sub bl,7
Syntax
SUB mem,immed
Examples
sub total,4000
sub BYTE PTR [bx+di],2
Encoding
0010110w data (1 or 2)
Syntax
SUB accum,immed
Examples
sub ax,32000
CPU 88/86 286 386 486
TEST Logical Compare

Tests specified bits of an operand and sets the flags for a subsequent conditional jump or set instruction. One of the operands contains the value to be tested. The other contains a bit mask indicating the bits to be tested. TEST works by doing a bitwise AND operation on the source and destination operands. The flags are modified according to the result, but the destination operand is not changed. This instruction is the same as the AND instruction, except the result is not stored.
Flags O D I T S Z A P C (O=0, A=?, C=0)
Encoding
1000010w mod,reg,r/m disp (0, 1, or 2)
Syntax
TEST reg,reg
Examples
test dx,bx
test bl,ch
Syntax
TEST mem,reg
TEST reg,mem*
Examples
test dx,flags
test bl,bitarray[bx]
Encoding
1111011w mod,000,r/m disp (0, 1, or 2) data (1 or 2)
Syntax
TEST reg,immed
Examples
test cx,30h
test cl,1011b
CPU 88/86 286 386 486
Clock Cycles 5 3 2 1
Syntax
TEST mem,immed
Examples
test masker,1
test BYTE PTR [bx],03h
CPU 88/86 286 386 486
Clock Cycles 11+EA 6 5 2
Encoding
1010100w data (1 or 2)
Syntax
TEST accum,immed
Examples
test ax,90h
CPU 88/86 286 386 486
* MASM transposes TEST reg, mem; that is, it is encoded as TEST mem, reg.
VERR/VERW Verify Read or Write

80286-80486 Protected Only

Verifies that a specified segment selector is valid and can be read or written to at the current privilege level. VERR verifies that the selector is readable. VERW verifies that the selector can be written to. If the segment is verified, the zero flag is set. Otherwise, the zero flag is cleared.
Flags O D I T S Z A P C
Encoding
00001111 00000000 mod,100,r/m disp (0, 1, or 2)
Syntax
VERR reg16
Examples
verr ax
CPU 88/86 286 386 486
Clock Cycles - 14 10 11
Syntax
VERR mem16
Examples
verr selector
CPU 88/86 286 386 486
Clock Cycles - 16 11 11
Encoding
00001111 00000000 mod,101,r/m disp (0, 1, or 2)
Syntax
VERW reg16
Examples
verw cx
CPU 88/86 286 386 486
Clock Cycles - 14 15 11
Syntax
VERW mem16
Examples
verw selector
CPU 88/86 286 386 486
Clock Cycles - 16 16 11
WAIT Wait
Suspends processor execution until the processor receives a signal that a coprocessor has finished a simultaneous operation. It should be used to prevent a coprocessor instruction from modifying a memory location that is being modified simultaneously by a processor instruction. WAIT is the same as the coprocessor FWAIT instruction.
Flags No change
Encoding
10011011
Syntax
WAIT
Examples
wait
CPU 88/86 286 386 486
WBINVD Write Back and Invalidate Data Cache

80486 Only

Empties the contents of the current data cache after writing changes to memory. Proper use of this instruction requires knowledge of how contents are placed in the cache. WBINVD is intended primarily for system programming. See Intel documentation for details.
Flags No change
Encoding
00001111 00001001
Syntax
WBINVD
Examples
wbinvd
CPU 88/86 286 386 486
Clock Cycles - - - 5
XADD Exchange and Add

80486 Only

Adds the source and destination operands and stores the sum in the destination; simultaneously, the original value of the destination is moved to the source. The instruction sets flags according to the result of the addition.
Flags O D I T S Z A P C
Encoding
00001111 1100000w mod,reg,r/m disp (0, 1, or 2)
Syntax
XADD mem,reg
Examples
xadd warr[bx],ax
xadd string,bl
CPU 88/86 286 386 486
Clock Cycles - - - 4
Syntax
XADD reg,reg
Examples
xadd dl,al
xadd bx,dx
CPU 88/86 286 386 486
Clock Cycles - - - 3
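The two simultaneous effects of XADD (sum into the destination, old destination into the source) can be written as a single function that returns both new values. This Python sketch is illustrative only; the name `xadd` mirrors the mnemonic, the width parameter is an assumption of the model, and flags are not computed.

```python
def xadd(dest, src, width=16):
    """Model XADD dest,src. Returns (new_dest, new_src)."""
    mask = (1 << width) - 1
    total = (dest + src) & mask     # the sum goes to the destination
    return total, dest & mask       # the original destination goes to the source


# A fetch-and-add: the counter grows by 1 and the caller gets its previous value.
counter = 41
counter, previous = xadd(counter, 1)
```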
XCHG Exchange
Exchanges the values of the source and destination operands.
Flags No change
Encoding
1000011w mod,reg,r/m disp (0, 1, or 2)
Syntax
XCHG reg,reg
Examples
xchg cx,dx
xchg bl,dh
xchg al,ah
Syntax
XCHG reg,mem
XCHG mem,reg
Examples
xchg [bx],ax
xchg bx,pointer
Encoding
10010 reg
Syntax
XCHG accum,reg16*
XCHG reg16,accum*
Examples
xchg ax,cx
xchg cx,ax
CPU 88/86 286 386 486
* On the 80386-80486, the accumulator may also be exchanged with a 32-bit register.
XLAT/XLATB Translate
Translates a value from one coding system to another by looking up the value to be translated in a table stored in memory. Before the instruction is executed, BX should point to a table in memory and AL should contain the unsigned position of the value to be translated from the table. After the instruction, AL contains the table value at the specified position. No operand is required, but one can be given to specify a segment override. DS is assumed unless a segment override is given. XLATB is a synonym for XLAT. Either version allows an operand, but neither requires one.
Flags No change
Encoding
11010111
Syntax
XLAT [[[[segreg:]]mem]]
XLATB [[[[segreg:]]mem]]
Examples
xlat
xlatb es:table
CPU 88/86 286 386 486
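The lookup that XLAT performs is a single indexed byte read: AL is replaced by the byte at offset BX plus AL. The Python sketch below models this on a flat byte buffer standing in for the DS segment; the function name `xlat` and the sample hexadecimal-digit table are assumptions made for this illustration.

```python
def xlat(memory, bx, al):
    """Return the new AL: the table byte at offset BX + AL (AL is unsigned)."""
    return memory[(bx + (al & 0xFF)) & 0xFFFF]


# A 16-entry table at offset 16 that maps a nibble to its ASCII hex digit.
memory = bytearray(64)
memory[16:32] = b"0123456789ABCDEF"
digit = xlat(memory, bx=16, al=0x0A)
```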
XOR Exclusive OR
Performs a bitwise exclusive OR operation on the source and destination operands and stores the result in the destination. For each bit position in the operands, if both bits are set or if both bits are cleared, the corresponding bit of the result is cleared. Otherwise, the corresponding bit of the result is set.
Flags O D I T S Z A P C (O=0, A=?, C=0)
Encoding
001100dw mod,reg,r/m disp (0, 1, or 2)
Syntax
XOR reg,reg
Examples
xor cx,bx
xor ah,al
Syntax
XOR mem,reg
Examples
xor [bp+10],cx
xor masked,bx
Syntax
XOR reg,mem
Examples
xor cx,flags
xor bl,bitarray[di]
Encoding
100000sw mod,110,r/m disp (0, 1, or 2) data (1 or 2)
Syntax
XOR reg,immed
Examples
xor bx,10h
xor bl,1
Syntax
XOR mem,immed
Examples
xor Boolean,1
xor switches[bx],101b
Encoding
0011010w data (1 or 2)
Syntax
XOR accum,immed
Examples
xor ax,01010101b
CPU 88/86 286 386 486
CHAPTER 5
Coprocessor
Topical Cross-reference for Coprocessor Instructions
Interpreting Coprocessor Instructions
    Syntax
    Examples
    Clock Speeds
    Instruction Size
Architecture
Topical Cross-reference for Coprocessor Instructions

Arithmetic
FABS FADD/FIADD FADDP FCHS FDIV/FIDIV FDIVP FDIVR/FIDIVR FDIVRP FMUL/FIMUL FMULP FPREM FPREM1‡ FRNDINT FSCALE FSQRT FSUB/FISUB FSUBP FSUBR/FISUBR FSUBRP FXTRACT
Compare
FCOM/FICOM FCOMP/FICOMP FCOMPP FSTSW/FNSTSW FTST FUCOM‡ FUCOMP‡ FUCOMPP‡ FXAM
Load
FLD/FILD/FBLD FLDCW FLDENV FRSTOR FXCH
Load Constant
FLD1 FLDL2E FLDL2T FLDLG2 FLDLN2 FLDPI FLDZ
Processor Control
FCLEX/FNCLEX FDECSTP FDISI/FNDISI* FENI/FNENI* FFREE FINCSTP FINIT/FNINIT FLDCW FNOP FRSTOR FSAVE/FNSAVE FSETPM† FSTCW/FNSTCW FSTENV/FNSTENV FSTSW/FNSTSW FWAIT
Store Data
FSAVE/FNSAVE FST/FIST FSTCW/FNSTCW FSTENV/FNSTENV FSTP/FISTP/FBSTP FSTSW/FNSTSW
Transcendental
F2XM1 FCOS‡ FPATAN FPREM FPREM1‡ FPTAN FSIN‡ FSINCOS‡ FYL2X FYL2XP1
* 8087 only.
† 80287 only.
‡ 80387-80486 only.
Interpreting Coprocessor Instructions

This section provides an alphabetical reference to instructions of the 8087, 80287, and 80387 coprocessors. The format is the same as the processor instructions except that encodings are not provided. Differences are noted in the following. The 80486 has the coprocessor built in. This one chip executes all the instructions listed in the previous section and this section.
Syntax
Syntaxes in Column 1 use the following abbreviations for operand types:
Syntax     Operand
reg        A coprocessor stack register
memreal    A direct or indirect memory operand storing a real number
memint     A direct or indirect memory operand storing a binary integer
membcd     A direct or indirect memory operand storing a BCD number
Examples
The position of the examples in Column 2 is not related to the clock speeds in Column 3.
Clock Speeds
Column 3 shows the clock speeds for each processor. Sometimes an instruction may have more than one possible clock speed. The following abbreviations are used to specify variations:
Abbreviation  Description
EA            Effective address. This applies only to the 8087. See the Processor Section, Timings on the 8088 and 8086 Processors, for an explanation of effective address timings.
s,l,t         Short real, long real, and 10-byte temporary real.
w,d,q         Word, doubleword, and quadword binary integer.
to,fr         To or from stack top. On the 80387 and 80486, the to clocks represent timings when ST is the destination. The fr clocks represent timings when ST is the source.
Instruction Size
The instruction size is always 2 bytes for instructions that do not access memory. For instructions that do access memory, the size is 4 bytes on the 8087 and 80287. On the 80387 and 80486, the size for instructions that access memory is 4 bytes in 16-bit mode, or 6 bytes in 32-bit mode.
On the 8087, each instruction must be preceded by the WAIT (also called FWAIT) instruction, thereby increasing the instruction's size by 1 byte. The assembler inserts WAIT automatically by default, or with the .8087 directive.
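The size rules above reduce to a short decision procedure. The Python sketch below encodes them as stated in this section; the function name `coprocessor_instruction_size`, the CPU name strings, and the `bits` parameter are assumptions made for this example, and the extra byte on the 8087 reflects the WAIT that precedes each instruction.

```python
def coprocessor_instruction_size(accesses_memory, cpu="8087", bits=16):
    """Return the size in bytes of a coprocessor instruction, per the rules above."""
    if not accesses_memory:
        size = 2
    elif cpu in ("80387", "80486") and bits == 32:
        size = 6
    else:
        size = 4
    if cpu == "8087":
        size += 1            # the preceding WAIT adds 1 byte
    return size


register_form = coprocessor_instruction_size(False, cpu="80287")
memory_form_32 = coprocessor_instruction_size(True, cpu="80387", bits=32)
```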
Architecture
The 8087, 80287, and 80387 coprocessors, along with the 80486, have several common elements of architecture. All have a register stack made up of eight 80-bit data registers. These can contain floating-point numbers in the temporary real format. The coprocessors also have 14 bytes of control registers. Figure 5.1 shows the format of registers.
Fig. 5.1
Coprocessor Registers
The most important control registers are the control word and the status word. Figure 5.2 shows the format of these registers.
Fig. 5.2
Control Word and Status Word
F2XM1 2^X - 1
Calculates Y = 2^X - 1. X is taken from ST. The result, Y, is returned in ST. X must be in the range 0 ≤ X ≤ 0.5 on the 8087/287, or in the range -1.0 ≤ X ≤ +1.0 on the 80387-80486.
Syntax: F2XM1
Examples: f2xm1
Clock Cycles: 87: 310-630; 287: 310-630; 387: 211-476; 486: 140-279
FABS Absolute Value

Converts the element in ST to its absolute value.
Syntax FABS Examples
fabs
CPU 87 287 387 486
FADD/FADDP/FIADD Add
Adds the source to the destination and returns the sum in the destination. If two register operands are specified, one must be ST. If a memory operand is specified, the sum replaces the value in ST. Memory operands can be 32- or 64-bit real numbers or 16- or 32-bit integers. If no operand is specified, ST is added to ST(1) and the stack is popped, returning the sum in ST. For FADDP, the source must be ST; the sum is returned in the destination and ST is popped.

Syntax: FADD [[reg,reg]]
Examples: fadd st,st(2); fadd st(5),st; fadd
Clock Cycles: 87: 70-100; 287: 70-100; 387: to=23-31, fr=26-34; 486: 8-20

Syntax: FADDP reg,ST
Examples: faddp st(6),st
Clock Cycles: 87: 75-105; 287: 75-105; 387: 23-31; 486: 8-20

Syntax: FADD memreal
Examples: fadd QWORD PTR [bx]; fadd shortreal
Clock Cycles: 87: (s=90-120,l=95-125)+EA; 287: s=90-120,l=95-125; 387: s=24-32,l=29-37; 486: 8-20

Syntax: FIADD memint
Examples: fiadd int16; fiadd warray[di]; fiadd double
Clock Cycles: 87: (w=102-137,d=108-143)+EA; 287: w=102-137,d=108-143; 387: w=71-85,d=57-72; 486: w=20-35,d=19-32
FBLD Load BCD

See FLD.
FBSTP Store BCD and Pop

See FST.
FCHS Change Sign

Reverses the sign of the value in ST.
Syntax FCHS Examples
fchs
Clock Cycles: 87: 10-17; 287: 10-17; 387: 24-25; 486: 6
FCLEX/FNCLEX Clear Exceptions

Clears all exception flags, the busy flag, and bit 7 in the status word. Bit 7 is the interrupt-request flag on the 8087, and the error-status flag on the 80287, 80387, and 80486. The instruction has wait and no-wait versions.
Syntax FCLEX FNCLEX Examples
fclex
Clock Cycles*: 87: 2-8; 287: 2-8; 387: 11; 486: 7
* These timings reflect the no-wait version of the instruction. The wait version may take additional clock cycles.
FCOM/FCOMP/FCOMPP/FICOM/FICOMP Compare
Compares the specified source operand to ST and sets the condition codes of the status word according to the result. The instruction subtracts the source operand from ST without changing either operand. Memory operands can be 32- or 64-bit real numbers or 16- or 32-bit integers. If no operand is specified or if two pops are specified, ST is compared to ST(1) and the stack is popped. If one pop is specified with an operand, the operand is compared to ST. If one of the operands is a NAN, an invalid-operation exception occurs (see FUCOM for an alternative method of comparing on the 80387-80486).
Syntax: FCOM [[reg]]
Examples: fcom; fcom st(2)
Clock Cycles: 87: 40-50; 287: 40-50; 387: 24; 486: 4

Syntax: FCOMP [[reg]]
Examples: fcomp; fcomp st(7)
Clock Cycles: 87: 42-52; 287: 42-52; 387: 26; 486: 4

Syntax: FCOMPP
Examples: fcompp
Clock Cycles: 87: 45-55; 287: 45-55; 387: 26; 486: 5

Syntax: FCOM memreal
Examples: fcom shortreals[di]; fcom longreal
Clock Cycles: 87: (s=60-70,l=65-75)+EA; 287: s=60-70,l=65-75; 387: s=26,l=31; 486: 4

Syntax: FCOMP memreal
Examples: fcomp longreal; fcomp shorts[di]
Clock Cycles: 87: (s=63-73,l=67-77)+EA; 287: s=63-73,l=67-77; 387: s=26,l=31; 486: 4

Syntax: FICOM memint
Examples: ficom double; ficom warray[di]
Clock Cycles: 87: (w=72-86,d=78-91)+EA; 287: w=72-86,d=78-91; 387: w=71-75,d=56-63; 486: w=16-20,d=15-17

Syntax: FICOMP memint
Examples: ficomp [bp+6]; ficomp WORD PTR darray[di]
Clock Cycles: 87: (w=74-88,d=80-93)+EA; 287: w=74-88,d=80-93; 387: w=71-75,d=56-63; 486: w=16-20,d=15-17
Condition Codes for FCOM
C3  C2  C1  C0  Meaning
0   0   ?   0   ST > source
0   0   ?   1   ST < source
1   0   ?   0   ST = source
1   1   ?   1   ST is not comparable to source
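As an illustration of how the condition-code table might be read back from a stored status word, here is a minimal C sketch. It assumes the standard status-word layout (C0 in bit 8, C2 in bit 10, C3 in bit 14); the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Status-word bit masks for the condition codes used by FCOM. */
#define SW_C0 0x0100u   /* bit 8  */
#define SW_C2 0x0400u   /* bit 10 */
#define SW_C3 0x4000u   /* bit 14 */

enum fcom_result { ST_GREATER, ST_LESS, ST_EQUAL, ST_UNORDERED, ST_UNLISTED };

/* Maps a status word (as stored by FSTSW) to the meaning in the table.
   C1 is ignored because the table marks it as undefined. */
enum fcom_result decode_fcom(uint16_t sw)
{
    switch (sw & (SW_C3 | SW_C2 | SW_C0)) {
    case 0:                       return ST_GREATER;
    case SW_C0:                   return ST_LESS;
    case SW_C3:                   return ST_EQUAL;
    case SW_C3 | SW_C2 | SW_C0:   return ST_UNORDERED;
    default:                      return ST_UNLISTED;  /* not in the table */
    }
}
```

The same bits are what the common assembly sequence of FSTSW AX followed by SAHF exposes as CF (C0), PF (C2), and ZF (C3).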
FCOS Cosine
80387-80486 Only
Replaces a value in radians in ST with its cosine. If |ST| < 2^63, the C2 bit of the status word is cleared and the cosine is calculated. Otherwise, C2 is set and no calculation is performed. ST can be reduced to the required range with FPREM or FPREM1.
Syntax FCOS Examples
fcos
Clock Cycles: 387: 123-772*; 486: 257-354†
* For operands with an absolute value greater than π/4, up to 76 additional clocks may be required.
† For operands with an absolute value greater than π/4, add n clocks, where n = operand/(π/4).
FDECSTP Decrement Stack Pointer

Decrements the stack-top pointer in the status word. No tags or registers are changed, and no data is transferred. If the stack pointer is 0, FDECSTP changes it to 7.
Syntax FDECSTP Examples
fdecstp
CPU 87 287 387 486
FDISI/FNDISI Disable Interrupts

8087 Only Disables interrupts by setting the interrupt-enable mask in the control word. This instruction has wait and no-wait versions. Since the 80287, 80387, and 80486 do not have an interrupt-enable mask, the instruction is recognized but ignored on these coprocessors.
Syntax FDISI FNDISI Examples
fdisi
CPU 87 287 387 486
FDIV/FDIVP/FIDIV Divide
Divides the destination by the source and returns the quotient in the destination. If two register operands are specified, one must be ST. If a memory operand is specified, the quotient replaces the value in ST. Memory operands can be 32- or 64-bit real numbers or 16- or 32-bit integers. If no operand is specified, ST(1) is divided by ST and the stack is popped, returning the result in ST. For FDIVP, the source must be ST; the quotient is returned in the destination register and ST is popped.
Syntax: FDIV [[reg,reg]]
Examples: fdiv st,st(2); fdiv st(5),st
Clock Cycles: 87: 193-203; 287: 193-203; 387: to=88, fr=91; 486: 73

Syntax: FDIVP reg,ST
Examples: fdivp st(6),st
Clock Cycles: 87: 197-207; 287: 197-207; 387: 91; 486: 73

Syntax: FDIV memreal
Examples: fdiv DWORD PTR [bx]; fdiv shortreal[di]; fdiv longreal
Clock Cycles: 87: (s=215-225,l=220-230)+EA; 287: s=215-225,l=220-230; 387: s=89,l=94; 486: 73

Syntax: FIDIV memint
Examples: fidiv int16; fidiv warray[di]; fidiv double
Clock Cycles: 87: (w=224-238,d=230-243)+EA; 287: w=224-238,d=230-243; 387: w=136-140,d=120-127; 486: w=85-89,d=84-86
FDIVR/FDIVRP/FIDIVR Divide Reversed

Divides the source by the destination and returns the quotient in the destination. If two register operands are specified, one must be ST. If a memory operand is specified, the quotient replaces the value in ST. Memory operands can be 32- or 64-bit real numbers or 16- or 32-bit integers. If no operand is specified, ST is divided by ST(1) and the stack is popped, returning the result in ST. For FDIVRP, the source must be ST; the quotient is returned in the destination register and ST is popped.
Syntax: FDIVR [[reg,reg]]
Examples: fdivr st,st(2); fdivr st(5),st; fdivr
Clock Cycles: 87: 194-204; 287: 194-204; 387: to=88, fr=91; 486: 73

Syntax: FDIVRP reg,ST
Examples: fdivrp st(6),st
Clock Cycles: 87: 198-208; 287: 198-208; 387: 91; 486: 73

Syntax: FDIVR memreal
Examples: fdivr longreal; fdivr shortreal[di]
Clock Cycles: 87: (s=216-226,l=221-231)+EA; 287: s=216-226,l=221-231; 387: s=89,l=94; 486: 73

Syntax: FIDIVR memint
Examples: fidivr double; fidivr warray[di]
Clock Cycles: 87: (w=225-239,d=231-245)+EA; 287: w=225-239,d=231-245; 387: w=135-141,d=121-128; 486: w=85-89,d=84-86
FENI/FNENI Enable Interrupts

8087 Only Enables interrupts by clearing the interrupt-enable mask in the control word. This instruction has wait and no-wait versions. Since the 80287, 80387, and 80486 do not have interrupt-enable masks, the instruction is recognized but ignored on these coprocessors.
Syntax FENI FNENI Examples
feni
CPU 87 287 387 486
FFREE Free Register

Changes the specified register's tag to empty without changing the contents of the register.
Syntax FFREE ST(i) Examples
ffree st(3)
CPU 87 287 387 486
FIADD/FISUB/FISUBR/ FIMUL/FIDIV/FIDIVR Integer Arithmetic

See FADD, FSUB, FSUBR, FMUL, FDIV, and FDIVR.
FICOM/FICOMP Compare Integer

See FCOM.
FILD Load Integer

See FLD.
FINCSTP Increment Stack Pointer

Increments the stack-top pointer in the status word. No tags or registers are changed, and no data is transferred. If the stack pointer is 7, FINCSTP changes it to 0.
Syntax FINCSTP Examples
fincstp
CPU 87 287 387 486
FINIT/FNINIT Initialize Coprocessor

Initializes the coprocessor and resets all the registers and flags to their default values. The instruction has wait and no-wait versions. On the 80387-80486, the condition codes of the status word are cleared. On the 8087/287, they are unchanged.
Syntax FINIT FNINIT Examples
finit
CPU 87 287 387 486
FIST/FISTP Store Integer

See FST.
FLD/FILD/FBLD Load
Pushes the specified operand onto the stack. All memory operands are automatically converted to temporary-real numbers before being loaded. Memory operands can be 32-, 64-, or 80-bit real numbers or 16-, 32-, or 64-bit integers.
Syntax: FLD reg
Examples: fld st(3)
Clock Cycles: 87: 17-22; 287: 17-22; 387: 14; 486: 4

Syntax: FLD memreal
Examples: fld longreal; fld shortarray[bx+di]; fld tempreal
Clock Cycles: 87: (s=38-56,l=40-60,t=53-65)+EA; 287: s=38-56,l=40-60,t=53-65; 387: s=20,l=25,t=44; 486: s=3,l=3,t=6

Syntax: FILD memint
Examples: fild mem16; fild DWORD PTR [bx]; fild quads[si]
Clock Cycles: 87: (w=46-54,d=52-60,q=60-68)+EA; 287: w=46-54,d=52-60,q=60-68; 387: w=61-65,d=45-52,q=56-67; 486: w=13-16,d=9-12,q=10-18

Syntax: FBLD membcd
Examples: fbld packbcd
Clock Cycles: 87: (290-310)+EA; 287: 290-310; 387: 266-275; 486: 70-103
FLD1/FLDZ/FLDPI/FLDL2E/FLDL2T/FLDLG2/FLDLN2 Load Constant
Pushes a constant onto the stack. The following constants can be loaded, shown with the example and clock cycles for each instruction:

Instruction  Constant   Example  87     287    387  486
FLD1         +1.0       fld1     15-21  15-21  24   4
FLDZ         +0.0       fldz     11-17  11-17  20   4
FLDPI        π          fldpi    16-22  16-22  40   8
FLDL2E       log2(e)    fldl2e   15-21  15-21  40   8
FLDL2T       log2(10)   fldl2t   16-22  16-22  40   8
FLDLG2       log10(2)   fldlg2   18-24  18-24  41   8
FLDLN2       loge(2)    fldln2   17-23  17-23  41   8
FLDCW Load Control Word

Loads the specified word into the coprocessor control word. The format of the control word is shown in the Interpreting Coprocessor Instructions section.
Syntax FLDCW mem16 Examples
fldcw ctrlword
Clock Cycles: 87: (7-14)+EA; 287: 7-14; 387: 19; 486: 4
FLDENV/FLDENVW/FLDENVD Load Environment State

Loads the 14-byte coprocessor environment state from a specified memory location. The environment includes the control word, status word, tag word, instruction pointer, and operand pointer. On the 80387-80486 in 32-bit mode, the environment state is 28 bytes.
Syntax: FLDENV mem; FLDENVW mem*; FLDENVD mem*
Examples: fldenv [bp+10]
Clock Cycles: 87: (35-45)+EA; 287: 35-45; 387: 71; 486: 44,pm=34
* 80387-80486 only.
FMUL/FMULP/FIMUL Multiply
Multiplies the source by the destination and returns the product in the destination. If two register operands are specified, one must be ST. If a memory operand is specified, the product replaces the value in ST. Memory operands can be 32- or 64-bit real numbers or 16- or 32-bit integers. If no operand is specified, ST(1) is multiplied by ST and the stack is popped, returning the product in ST. For FMULP, the source must be ST; the product is returned in the destination register and ST is popped.
Syntax: FMUL [[reg,reg]]
Examples: fmul st,st(2); fmul st(5),st; fmul
Clock Cycles: 87: 130-145 (90-105)*; 287: 130-145 (90-105)*; 387: to=46-54 (49), fr=29-57 (52); 486: 16

Syntax: FMULP reg,ST
Examples: fmulp st(6),st
Clock Cycles: 87: 134-148 (94-108)*; 287: 134-148 (94-108)*; 387: 29-57 (52); 486: 16

Syntax: FMUL memreal
Examples: fmul DWORD PTR [bx]; fmul shortreal[di+3]; fmul longreal
Clock Cycles: 87: (s=110-125,l=154-168)+EA; 287: s=110-125,l=154-168; 387: s=27-35,l=32-57; 486: s=11,l=14

Syntax: FIMUL memint
Examples: fimul int16; fimul warray[di]; fimul double
Clock Cycles: 87: (w=124-138,d=130-144)+EA; 287: w=124-138,d=130-144; 387: w=76-87,d=61-82; 486: w=23-27,d=22-24

* The clocks in parentheses show times for short values, that is, those with 40 trailing zeros in their fraction because they were loaded from a short-real memory operand.
In the 80387 timings, the clocks in parentheses show typical speeds.
For FMUL memreal, if the register operand is a short value (having 40 trailing zeros in its fraction because it was loaded from a short-real memory operand), then the timing is (112-126)+EA on the 8087 or 112-126 on the 80287.
FNinstruction No-Wait Instructions

Instructions that have no-wait versions include FCLEX, FDISI, FENI, FINIT, FSAVE, FSTCW, FSTENV, and FSTSW. Wait versions of instructions check for unmasked numeric errors; no-wait versions do not. When the .8087 directive is used, the assembler puts a WAIT instruction before the wait versions and a NOP instruction before the no-wait versions.
FNOP No Operation
Performs no operation. FNOP can be used for timing delays or alignment.
Syntax FNOP Examples
fnop
CPU 87 287 387 486
FPATAN Partial Arctangent

Finds the partial arctangent by calculating Z = ARCTAN(Y / X). X is taken from ST and Y from ST(1). On the 8087/287, Y and X must be in the range 0 ≤ Y < X < ∞. On the 80387-80486, there is no restriction on X and Y. X is popped from the stack and Z replaces Y in ST.
Syntax FPATAN Examples
fpatan
Clock Cycles: 87: 250-800; 287: 250-800; 387: 314-487; 486: 218-303
FPREM Partial Remainder

Calculates the remainder of ST divided by ST(1), returning the result in ST. The remainder retains the same sign as the original dividend. The calculation uses the following formula:
remainder = ST - ST(1) * quotient
The quotient is the exact value obtained by chopping ST / ST(1) toward 0. The instruction is normally used in a loop that repeats until the reduction is complete, as indicated by the condition codes of the status word.
Syntax FPREM Examples
fprem
Clock Cycles: 87: 15-190; 287: 15-190; 387: 74-155; 486: 70-138
Condition Codes for FPREM and FPREM1
C3  C2  C1  C0  Meaning
?   1   ?   ?   Incomplete reduction
0   0   0   0   quotient MOD 8 = 0
0   0   0   1   quotient MOD 8 = 4
0   0   1   0   quotient MOD 8 = 1
0   0   1   1   quotient MOD 8 = 5
1   0   0   0   quotient MOD 8 = 2
1   0   0   1   quotient MOD 8 = 6
1   0   1   0   quotient MOD 8 = 3
1   0   1   1   quotient MOD 8 = 7
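Reading the table row by row, the three low quotient bits appear to be spread across C0 (value 4), C3 (value 2), and C1 (value 1). The C sketch below decodes them from a stored status word. It assumes the standard bit positions (C0 bit 8, C1 bit 9, C2 bit 10, C3 bit 14), and the function name is hypothetical.

```c
#include <stdint.h>

#define SW_C0 0x0100u   /* bit 8  */
#define SW_C1 0x0200u   /* bit 9  */
#define SW_C2 0x0400u   /* bit 10 */
#define SW_C3 0x4000u   /* bit 14 */

/* Returns quotient MOD 8 after FPREM or FPREM1, or -1 when C2 reports
   that the reduction is incomplete and the instruction must be repeated. */
int fprem_quotient_mod8(uint16_t sw)
{
    if (sw & SW_C2)
        return -1;
    return ((sw & SW_C0) ? 4 : 0)
         | ((sw & SW_C3) ? 2 : 0)
         | ((sw & SW_C1) ? 1 : 0);
}
```

A reduction loop would repeat the instruction for as long as this helper returns -1.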
FPREM1 Partial Remainder (IEEE Compatible)

80387-80486 Only
Calculates the remainder of ST divided by ST(1), returning the result in ST. Because the quotient is rounded to the nearest integer rather than chopped, the sign of the remainder can differ from that of the original dividend. The calculation uses the following formula:
remainder = ST - ST(1) * quotient
The quotient is the integer nearest to the exact value of ST / ST(1). When two integers are equally close to the given value, the even integer is used. The instruction is normally used in a loop that repeats until the reduction is complete, as indicated by the condition codes of the status word. See FPREM for the possible condition codes.
Syntax FPREM1 Examples
fprem1
Clock Cycles: 387: 95-185; 486: 72-167
FPTAN Partial Tangent

Finds the partial tangent by calculating Y / X = TAN(Z). Z is taken from ST. Z must be in the range 0 ≤ Z ≤ π/4 on the 8087/287. On the 80387-80486, |Z| must be less than 2^63. The result is the ratio Y / X. Y replaces Z, and X is pushed into ST. Thus, Y is returned in ST(1) and X in ST.
Syntax FPTAN Examples
fptan
Clock Cycles: 87: 30-540; 287: 30-540; 387: 191-497*; 486: 200-273
FRNDINT Round to Integer

Rounds ST from a real number to an integer. The rounding control (RC) field of the control word specifies the rounding method, as shown in the introduction to this section.
Syntax FRNDINT Examples
frndint
Clock Cycles: 87: 16-50; 287: 16-50; 387: 66-80; 486: 21-30
FRSTOR/FRSTORW/FRSTORD Restore Saved State

Restores the 94-byte coprocessor state to the coprocessor from the specified memory location. In 32-bit mode on the 80387-80486, the coprocessor state takes 108 bytes.
Syntax: FRSTOR mem; FRSTORW mem*; FRSTORD mem*
Examples: frstor [bp-94]
Clock Cycles: 87: (197-207)+EA; 287: †; 387: 308; 486: 131,pm=120
* 80387-80486 only.
† Clock counts are not meaningful in determining overall execution time of this instruction. Timing is determined by operand transfers.
FSAVE/FSAVEW/FSAVED/FNSAVE/ FNSAVEW/FNSAVED Save Coprocessor State

Stores the 94-byte coprocessor state to the specified memory location. In 32-bit mode on the 80387-80486, the coprocessor state takes 108 bytes. This instruction has wait and no-wait versions. After the save, the coprocessor is initialized as if FINIT had been executed.
Syntax: FSAVE mem; FSAVEW mem*; FSAVED mem*; FNSAVE mem; FNSAVEW mem*; FNSAVED mem*
Examples: fsave [bp-94]; fsave cobuffer
Clock Cycles‡: 87: (197-207)+EA; 287: †; 387: 375-376; 486: 154,pm=143
* 80387-80486 only.
† Clock counts are not meaningful in determining overall execution time of this instruction. Timing is determined by operand transfers.
‡ These timings reflect the no-wait version of the instruction. The wait version may take additional clock cycles.
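The 94- and 108-byte figures follow from the sizes given elsewhere in this section: the environment (14 bytes, or 28 bytes in 32-bit mode) plus the eight 80-bit data registers. A short C check of that arithmetic, with hypothetical names:

```c
#include <stddef.h>

enum { FPU_DATA_REGS = 8, FPU_REG_BYTES = 10 };   /* eight 80-bit registers */

/* Bytes used by the environment state (FSTENV/FLDENV). */
size_t fpu_env_bytes(int mode32)
{
    return mode32 ? 28 : 14;
}

/* Bytes used by the full state (FSAVE/FRSTOR): environment plus registers. */
size_t fpu_state_bytes(int mode32)
{
    return fpu_env_bytes(mode32) + FPU_DATA_REGS * FPU_REG_BYTES;
}
```

A buffer passed to FSAVE therefore needs at least fpu_state_bytes(0) = 94 bytes in 16-bit code.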
FSCALE Scale
Scales by powers of 2 by calculating the function Y = Y * 2^X. X is the scaling factor taken from ST(1), and Y is the value to be scaled from ST. The scaled result replaces the value in ST. The scaling factor remains in ST(1). If the scaling factor is not an integer, it will be truncated toward zero before the scaling. On the 8087/287, if X is not in the range -2^15 ≤ X < 2^15 or if X is in the range 0 < X < 1, the result will be undefined. The 80387-80486 have no restrictions on the range of operands.
Syntax FSCALE Examples
fscale
Clock Cycles: 87: 32-38; 287: 32-38; 387: 67-86; 486: 30-32
FSETPM Set Protected Mode

80287 Only
Sets the 80287 to protected mode. The instruction and operand pointers are in the protected-mode format after this instruction. On the 80387-80486, FSETPM is recognized but interpreted as FNOP, since the 80386/486 processors handle addressing identically in real and protected mode.
Syntax FSETPM Examples
fsetpm
CPU 87 287 387 486
FSIN Sine
80387-80486 Only
Replaces a value in radians in ST with its sine. If |ST| < 2^63, the C2 bit of the status word is cleared and the sine is calculated. Otherwise, C2 is set and no calculation is performed. ST can be reduced to the required range with FPREM or FPREM1.
Syntax FSIN Examples
fsin
CPU 87 287 387 486
FSINCOS Sine and Cosine

80387-80486 Only
Computes the sine and cosine of a radian value in ST. The sine replaces the value in ST, and then the cosine is pushed onto the stack. If |ST| < 2^63, the C2 bit of the status word is cleared and the sine and cosine are calculated. Otherwise, C2 is set and no calculation is performed. ST can be reduced to the required range with FPREM or FPREM1.
Syntax FSINCOS Examples
fsincos
CPU 87 287 387 486
FSQRT Square Root

Replaces the value of ST with its square root. (The square root of 0 is 0.)
Syntax FSQRT Examples
fsqrt
Clock Cycles: 87: 180-186; 287: 180-186; 387: 122-129; 486: 83-87
FST/FSTP/FIST/FISTP/FBSTP Store
Stores the value in ST to the specified memory location or register. Temporary-real values in registers are converted to the appropriate integer, BCD, or floating-point format as they are stored. With FSTP, FISTP, and FBSTP, the ST register value is popped off the stack. Memory operands can be 32-, 64-, or 80-bit real numbers for FSTP or 16-, 32-, or 64-bit integers for FISTP.

Syntax: FST reg
Examples: fst st(6); fst st
Clock Cycles: 87: 15-22; 287: 15-22; 387: 11; 486: 3

Syntax: FSTP reg
Examples: fstp st; fstp st(3)
Clock Cycles: 87: 17-24; 287: 17-24; 387: 12; 486: 3

Syntax: FST memreal
Examples: fst shortreal; fst longs[bx]
Clock Cycles: 87: (s=84-90,l=96-104)+EA; 287: s=84-90,l=96-104; 387: s=44,l=45; 486: s=7,l=8

Syntax: FSTP memreal
Examples: fstp longreal; fstp tempreals[bx]
Clock Cycles: 87: (s=86-92,l=98-106,t=52-58)+EA; 287: s=86-92,l=98-106,t=52-58; 387: s=44,l=45,t=53; 486: s=7,l=8,t=6

Syntax: FIST memint
Examples: fist int16; fist doubles[8]
Clock Cycles: 87: (w=80-90,d=82-92)+EA; 287: w=80-90,d=82-92; 387: w=82-95,d=79-93; 486: w=29-34,d=28-34

Syntax: FISTP memint
Examples: fistp longint; fistp doubles[bx]
Clock Cycles: 87: (w=82-92,d=84-94,q=94-105)+EA; 287: w=82-92,d=84-94,q=94-105; 387: w=82-95,d=79-93,q=80-97; 486: 29-34

Syntax: FBSTP membcd
Examples: fbstp bcds[bx]
Clock Cycles: 87: (520-540)+EA; 287: 520-540; 387: 512-534; 486: 172-176
FSTCW/FNSTCW Store Control Word

Stores the control word to a specified 16-bit memory operand. This instruction has wait and no-wait versions.
Syntax FSTCW mem16 FNSTCW mem16 Examples
fstcw ctrlword
Clock Cycles*: 87: 12-18; 287: 12-18; 387: 15; 486: 3
FSTENV/FSTENVW/FSTENVD/FNSTENV/FNSTENVW/ FNSTENVD Store Environment State

Stores the 14-byte coprocessor environment state to a specified memory location. The environment state includes the control word, status word, tag word, instruction pointer, and operand pointer. On the 80387-80486 in 32-bit mode, the environment state is 28 bytes.
Syntax: FSTENV mem; FSTENVW mem*; FSTENVD mem*; FNSTENV mem; FNSTENVW mem*; FNSTENVD mem*
Examples: fstenv [bp-14]
Clock Cycles†: 87: (40-50)+EA; 287: 40-50; 387: 103-104; 486: 67,pm=56
* 80387-80486 only.
† These timings reflect the no-wait version of the instruction. The wait version may take additional clock cycles.
FSTSW/FNSTSW Store Status Word

Stores the status word to a specified 16-bit memory operand. On the 80287, 80387, and 80486, the status word can also be stored to the processor's AX register. This instruction has wait and no-wait versions.

Syntax: FSTSW mem16; FNSTSW mem16
Examples: fstsw statword
Clock Cycles*: 87: 12-18; 287: 12-18; 387: 15; 486: 3

Syntax: FSTSW AX; FNSTSW AX
Examples: fstsw ax
Clock Cycles*: 287: 10-16; 387: 13; 486: 3
FSUB/FSUBP/FISUB Subtract
Subtracts the source operand from the destination operand and returns the difference in the destination operand. If two register operands are specified, one must be ST. If a memory operand is specified, the result replaces the value in ST. Memory operands can be 32- or 64-bit real numbers or 16- or 32-bit integers. If no operand is specified, ST is subtracted from ST(1) and the stack is popped, returning the difference in ST. For FSUBP, the source must be ST; the difference (destination minus source) is returned in the destination register and ST is popped.
Syntax: FSUB [[reg,reg]]
Examples: fsub st,st(2); fsub st(5),st; fsub
Clock Cycles: 87: 70-100; 287: 70-100; 387: to=29-37, fr=26-34; 486: 8-20

Syntax: FSUBP reg,ST
Examples: fsubp st(6),st
Clock Cycles: 87: 75-105; 287: 75-105; 387: 26-34; 486: 8-20

Syntax: FSUB memreal
Examples: fsub longreal; fsub shortreals[di]
Clock Cycles: 87: (s=90-120,l=95-125)+EA; 287: s=90-120,l=95-125; 387: s=24-32,l=28-36; 486: 8-20

Syntax: FISUB memint
Examples: fisub double; fisub warray[di]
Clock Cycles: 87: (w=102-137,d=108-143)+EA; 287: w=102-137,d=108-143; 387: w=71-83,d=57-82; 486: w=20-35,d=19-32
FSUBR/FSUBRP/FISUBR Subtract Reversed

Subtracts the destination operand from the source operand and returns the result in the destination operand. If two register operands are specified, one must be ST. If a memory operand is specified, the result replaces the value in ST. Memory operands can be 32- or 64-bit real numbers or 16- or 32-bit integers. If no operand is specified, ST(1) is subtracted from ST and the stack is popped, returning the difference in ST. For FSUBRP, the source must be ST; the difference (source minus destination) is returned in the destination register and ST is popped.
Syntax: FSUBR [[reg,reg]]
Examples: fsubr st,st(2); fsubr st(5),st; fsubr
Clock Cycles: 87: 70-100; 287: 70-100; 387: to=29-37, fr=26-34; 486: 8-20

Syntax: FSUBRP reg,ST
Examples: fsubrp st(6),st
Clock Cycles: 87: 75-105; 287: 75-105; 387: 26-34; 486: 8-20

Syntax: FSUBR memreal
Examples: fsubr QWORD PTR [bx]; fsubr shortreal[di]; fsubr longreal
Clock Cycles: 87: (s=90-120,l=95-125)+EA; 287: s=90-120,l=95-125; 387: s=25-33,l=29-37; 486: 8-20

Syntax: FISUBR memint
Examples: fisubr int16; fisubr warray[di]; fisubr double
Clock Cycles: 87: (w=103-139,d=109-144)+EA; 287: w=103-139,d=109-144; 387: w=72-84,d=58-83; 486: w=20-55,d=19-32
FTST Test for Zero

Compares ST with +0.0 and sets the condition codes of the status word according to the result.
Syntax FTST Examples
ftst
CPU 87 287 387 486
Condition Codes for FTST
C3  C2  C1  C0  Meaning
0   0   ?   0   ST is positive
0   0   ?   1   ST is negative
1   0   ?   0   ST is 0
1   1   ?   1   ST is not comparable (NAN or projective infinity)
FUCOM/FUCOMP/FUCOMPP Unordered Compare

80387-80486 Only
Compares the specified source to ST and sets the condition codes of the status word according to the result. The instruction subtracts the source operand from ST without changing either operand. Memory operands are not allowed. If no operand is specified or if two pops are specified, ST is compared to ST(1). If one pop is specified with an operand, the given register is compared to ST.
Unlike FCOM, FUCOM does not cause an invalid-operation exception if one of the operands is NAN. Instead, the condition codes are set to unordered.

Syntax: FUCOM [[reg]]
Examples: fucom; fucom st(2)
Clock Cycles: 387: 24; 486: 4

Syntax: FUCOMP [[reg]]
Examples: fucomp; fucomp st(7)
Clock Cycles: 387: 26; 486: 4

Syntax: FUCOMPP
Examples: fucompp
Clock Cycles: 387: 26; 486: 5

Condition Codes for FUCOM
C3  C2  C1  C0  Meaning
0   0   ?   0   ST > source
0   0   ?   1   ST < source
1   0   ?   0   ST = source
1   1   ?   1   Unordered
FWAIT Wait
Suspends execution of the processor until the coprocessor is finished executing. This is an alternate mnemonic for the processor WAIT instruction.
Syntax FWAIT Examples
fwait
CPU 87 287 387 486
FXAM Examine
Reports the contents of ST in the condition flags of the status word.
Syntax FXAM Examples
fxam
Clock Cycles: 87: 12-23; 287: 12-23; 387: 30-38; 486: 8
Condition Codes for FXAM
C3  C2  C1  C0  Meaning
0   0   0   0   + Unnormal*
0   0   0   1   + NAN
0   0   1   0   - Unnormal*
0   0   1   1   - NAN
0   1   0   0   + Normal
0   1   0   1   + Infinity
0   1   1   0   - Normal
0   1   1   1   - Infinity
1   0   0   0   +0
1   0   0   1   Empty
1   0   1   0   -0
1   0   1   1   Empty
1   1   0   0   + Denormal
1   1   0   1   Empty*
1   1   1   0   - Denormal
1   1   1   1   Empty*
* Not used on the 80387-80486. Unnormals are not supported by the 80387-80486. Also, the 80387-80486 use two codes instead of four to identify empty registers.
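Because the four condition codes form a 4-bit index in the order C3, C2, C1, C0, the FXAM table lends itself to a lookup. The C sketch below assumes the standard status-word positions (C0 bit 8, C1 bit 9, C2 bit 10, C3 bit 14) and uses a hypothetical function name; the strings follow the table, with C1 acting as the sign.

```c
#include <stdint.h>

/* Returns the FXAM classification for a stored status word. */
const char *fxam_class(uint16_t sw)
{
    static const char *const names[16] = {
        "+Unnormal", "+NAN",      "-Unnormal", "-NAN",
        "+Normal",   "+Infinity", "-Normal",   "-Infinity",
        "+0",        "Empty",     "-0",        "Empty",
        "+Denormal", "Empty",     "-Denormal", "Empty"
    };
    unsigned idx = (((sw >> 14) & 1u) << 3)   /* C3 */
                 | (((sw >> 10) & 1u) << 2)   /* C2 */
                 | (((sw >> 9)  & 1u) << 1)   /* C1 */
                 |  ((sw >> 8)  & 1u);        /* C0 */
    return names[idx];
}
```

A status word with only C2 set (0400h), for instance, classifies ST as a positive normal number.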
FXCH Exchange Registers

Exchanges the specified (destination) register and ST. If no operand is specified, ST and ST(1) are exchanged.
Syntax FXCH [ [reg] ] Examples
fxch fxch st(3)
CPU 87 287 387 486
FXTRACT Extract Exponent and Significand

Extracts the exponent and significand (mantissa) fields of ST. The exponent replaces the value in ST, and then the significand is pushed onto the stack.
Syntax FXTRACT Examples
fxtract
Clock Cycles: 87: 27-55; 287: 27-55; 387: 70-76; 486: 16-20
FYL2X Y log2(X)
Calculates Z = Y log2(X). X is taken from ST and Y from ST(1). The stack is popped, and the result, Z, replaces Y in ST. X must be in the range 0 < X < ∞ and Y in the range -∞ < Y < ∞.
Syntax FYL2X Examples
fyl2x
Clock Cycles: 87: 900-1100; 287: 900-1100; 387: 120-538; 486: 196-329
FYL2XP1 Y log2(X+1)
Calculates Z = Y log2(X + 1). X is taken from ST and Y from ST(1). The stack is popped once, and the result, Z, replaces Y in ST. X must be in the range 0 < |X| < (1 - (√2 / 2)). Y must be in the range -∞ < Y < ∞.
Syntax FYL2XP1 Examples
fyl2xp1
Clock Cycles: 87: 700-1000; 287: 700-1000; 387: 257-547; 486: 171-326
CHAPTER 6
Macros
Introduction
BIOS.INC
CMACROS.INC, CMACROS.NEW
MS-DOS.INC
MACROS.INC
PROLOGUE.INC
WIN.INC
Filename: LMARFC06.DOC Project: MASM Reference Template: MSGRIDA1.DOT Author: a.c. birdsong Last Saved By: Launi Lockard Revision #: 26 Page: 179 of 1 Printe d: 10/02/00 04:16 PM
Introduction
Each of the INCLUDE files is listed with the names of the macros it contains. Macros listed take the form:
<macroname>MACRO[[ <variables[[:=<default value>]], ..>]]
Some variables are listed as name:req. In these cases, req indicates that macroname cannot be called without the variable name supplied. For specific information on the macros themselves, see the contents of the commented *.INC file.
BIOS.INC
@Cls MACRO pagenum
@GetCharAtr MACRO pagenum
@GetCsr MACRO pagenum
@GetMode MACRO
@PutChar MACRO chr, atrib, pagenum, loops
@PutCharAtr MACRO chr, atrib, pagenum, loops
@Scroll MACRO distance:REQ, atrib:=<07h>, upcol, uprow, dncol, dnrow
@SetColor MACRO color
@SetCsrPos MACRO column, row, pagenum
@SetCsrSize MACRO first, last
@SetMode MACRO mode
@SetPage MACRO pagenum
@SetPalette MACRO color
CMACROS.INC, CMACROS.NEW
These two include files contain the same macros. Use CMACROS.NEW for programs written in MASM 6.0 and later. Use CMACROS.INC for programs written in MASM 5.1 or earlier, or if you have problems with CMACROS.NEW.
@reverse MACRO list
arg MACRO args
assumes MACRO s,ln
callcrt MACRO funcname
cBegin MACRO pname
cEnd MACRO pname
cEpilog MACRO procname, flags, cbParms, cbLocals, reglist, userparms
cProc MACRO pname:REQ, attribs, autoSave
cPrologue MACRO procname, flags, cbParms, cbLocals, reglist, userparms
createSeg MACRO segName, logName, aalign, combine, class, grp
cRet MACRO
defGrp MACRO foo:vararg
errn$ MACRO l,x
errnz MACRO x
externA MACRO names:req, langtype
externB MACRO names:req, langtype
externCP MACRO n,c
externD MACRO names:req, langtype
externDP MACRO n,c
externFP MACRO names:req, langtype
externNP MACRO names:req, langtype
externP MACRO n,c
externQ MACRO names:req, langtype
externT MACRO names:req, langtype
externW MACRO names:req, langtype
farPtr MACRO n,s,o
globalB MACRO name:req, initVal:=<?>, repCount, langType
globalCP MACRO n,i,s,c
globalD MACRO name:req, initVal:=<?>, repCount, langType
globalDP MACRO n,i,s,c
globalQ MACRO name:req, initVal:=<?>, repCount, langType
globalT MACRO name:req, initVal:=<?>, repCount, langType
globalW MACRO name:req, initVal:=<?>, repCount, langType
labelB MACRO names:req,langType
labelCP MACRO n,c
labelD MACRO names:req,langType
labelDP MACRO n,c
labelFP MACRO names:req,langType
labelNP MACRO names:req,langType
labelP MACRO n,c
labelQ MACRO names:req,langType
labelT MACRO names:req,langType
labelW MACRO names:req,langType
lbl MACRO names:req
localB MACRO name
localCP MACRO n
localD MACRO name
localDP MACRO n
localQ MACRO name
localT MACRO name
localV MACRO name,a
localW MACRO name
logName&_assumes MACRO s
logName&_sbegin MACRO
n MACRO
outif MACRO name:req, defval:=<0>, onmsg, offmsg
parmB MACRO names:req
parmCP MACRO n
parmD MACRO names:req
parmDP MACRO n
parmQ MACRO names:req
parmR MACRO n,r,r2
parmT MACRO names:req
parmW MACRO names:req
regPtr MACRO n,s,o
save MACRO r
sBegin MACRO name:req
sEnd MACRO name
setDefLangType MACRO overLangType
staticB MACRO name:req, initVal:=<?>, repCount
staticCP MACRO name:req, i, s
staticD MACRO name:req, initVal:=<?>, repCount
staticDP MACRO name:req, i, s
staticI MACRO name:req, initVal:=<?>, repCount
staticQ MACRO name:req, initVal:=<?>, repCount
staticT MACRO name:req, initVal:=<?>, repCount
staticW MACRO name:req, initVal:=<?>, repCount
MS-DOS.INC
NPVOID TYPEDEF NEAR PTR
FPVOID TYPEDEF FAR PTR
FILE_INFO STRUCT
@ChDir MACRO path:REQ, segmnt
@ChkDrv MACRO drive
@CloseFile MACRO handle:REQ
@DelFile MACRO path:REQ, segmnt
@Exit MACRO return
@FreeBlock MACRO segmnt
@GetBlock MACRO graphs:REQ, retry:=<0>
@GetChar MACRO ech:=<1>, cc:=<1>, clear:=<0>
@GetDate MACRO
@GetDir MACRO buffer:REQ, drive, segmnt
@GetDrv MACRO
@GetDTA MACRO
@GetFileSize MACRO handle:REQ
@GetFirst MACRO path:REQ, atrib, segmnt
@GetInt MACRO interrupt:REQ
@GetNext MACRO
@GetStr MACRO ofset:REQ, terminator, limit, segmnt
@GetTime MACRO
@GetVer MACRO
@MakeFile MACRO path:REQ, atrib:=<0>, segmnt, kind
@MkDir MACRO path:REQ, segmnt
@ModBlock MACRO graphs:REQ, segmnt
@MoveFile MACRO old:REQ, new:REQ, segold, segnew
@MovePtrAbs MACRO handle:REQ, distance
@MovePtrRel MACRO handle:REQ, distance
@OpenFile MACRO path:REQ, access:=<0>, segmnt
@PrtChar MACRO chr:VARARG
@Read MACRO ofset:REQ, bytes:REQ, handle:=<0>, segmnt
@RmDir MACRO path:REQ, segmnt
@SetDate MACRO month:REQ, day:REQ, year:REQ
@SetDrv MACRO drive:REQ
@SetDTA MACRO buffer:REQ, segmnt
@SetInt MACRO interrupt:REQ, vector:REQ, segmnt
@SetTime MACRO hour:REQ, minutes:REQ, seconds:REQ, hundredths:REQ
@ShowChar MACRO chr:VARARG
@ShowStr MACRO ofset:REQ, segmnt
@TSR MACRO paragraphs:REQ, return
@Write MACRO ofset:REQ, bytes:REQ, handle:=<1>, segmnt
MACROS.INC
@ArgCount MACRO arglist:VARARG
@ArgI MACRO index:REQ, arglist:VARARG
@ArgRev MACRO arglist
@PopAll MACRO
@PushAll MACRO
@RestoreRegs MACRO
@SaveRegs MACRO regs:VARARG
echof MACRO format:REQ, args:VARARG
pushc MACRO op
PROLOGUE.INC
cEpilogue MACRO szProcName, flags, cbParams, cbLocals, rgRegs, rgUserParams
cPrologue MACRO szProcName, flags, cbParams, cbLocals, rgRegs, rgUserParams
WIN.INC
The include file WIN.INC is WINDOWS.H processed by H2INC, and slightly modified to reduce unnecessary warnings.
CHAPTER 7
Tables
ASCII Chart
Key Codes
MS-DOS Program Segment Prefix (PSP)
Color Display Attributes
Hexadecimal-Binary-Decimal Conversion
ASCII Codes
Key Codes
MS-DOS Program Segment Prefix (PSP)
1 Opcode for INT 20h instruction (CDh 20h)
2 Segment of first allocatable address following the program (used for memory allocation)
3 Reserved or used by MS-DOS
4 Opcode for far call to MS-DOS function dispatcher
5 Vector for terminate routine
6 Vector for CTRL+C handler routine
7 Vector for error handler routine
8 Segment address of program's environment block
9 Opcode for MS-DOS INT 21h and far return (you can do a far call to this address to execute MS-DOS calls)
10 First command-line argument (formatted as uppercase 11-character filename)
11 Second command-line argument (formatted as uppercase 11-character filename)
12 Number of bytes in command-line argument
13 Unformatted command line and/or default Disk Transfer Area (DTA)
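The numbered fields above can be located in a 256-byte PSP image with a short sketch such as the following. It is written in Python purely for illustration. The byte offsets are an assumption taken from the commonly documented MS-DOS PSP layout, since the list itself does not state them, and the names PSP_OFFSETS and parse_psp are hypothetical.

```python
import struct

# Byte offsets of the PSP fields, keyed to the numbered items in the list.
# Assumption: these follow the commonly documented MS-DOS PSP layout;
# the list itself gives no offsets.
PSP_OFFSETS = {
    "int20_opcode":      0x00,  # item 1
    "next_free_segment": 0x02,  # item 2
    "dos_far_call":      0x05,  # item 4
    "terminate_vector":  0x0A,  # item 5
    "ctrl_c_vector":     0x0E,  # item 6
    "error_vector":      0x12,  # item 7
    "environment_seg":   0x2C,  # item 8
    "int21_retf":        0x50,  # item 9
    "first_fcb":         0x5C,  # item 10
    "second_fcb":        0x6C,  # item 11
    "cmd_length":        0x80,  # item 12
    "cmd_tail":          0x81,  # item 13
}

def parse_psp(psp):
    """Decode a few fields from a 256-byte PSP image (hypothetical helper)."""
    if len(psp) < 256:
        raise ValueError("a PSP is 256 bytes long")
    # Segment values are 16-bit little-endian words.
    (next_free,) = struct.unpack_from("<H", psp, PSP_OFFSETS["next_free_segment"])
    (env_seg,) = struct.unpack_from("<H", psp, PSP_OFFSETS["environment_seg"])
    # The length byte counts the characters of the unformatted command line.
    length = psp[PSP_OFFSETS["cmd_length"]]
    start = PSP_OFFSETS["cmd_tail"]
    tail = bytes(psp[start:start + length]).decode("ascii", errors="replace")
    return {
        "int20_opcode": bytes(psp[0:2]),
        "next_free_segment": next_free,
        "environment_seg": env_seg,
        "command_tail": tail,
    }
```

A program would normally read these fields directly through the PSP segment it receives at startup; the sketch only shows where each item sits relative to the start of the PSP.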
Color Display Attributes

Background                          Foreground
Bits       Num  Color               Bits*      Num  Color
F R G B                             I R G B
0 0 0 0    0    Black               0 0 0 0    0    Black
0 0 0 1    1    Blue                0 0 0 1    1    Blue
0 0 1 0    2    Green               0 0 1 0    2    Green
0 0 1 1    3    Cyan                0 0 1 1    3    Cyan
0 1 0 0    4    Red                 0 1 0 0    4    Red
0 1 0 1    5    Magenta             0 1 0 1    5    Magenta
0 1 1 0    6    Brown               0 1 1 0    6    Brown
0 1 1 1    7    White               0 1 1 1    7    White
1 0 0 0    8    Black blink         1 0 0 0    8    Dark gray
1 0 0 1    9    Blue blink          1 0 0 1    9    Light blue
1 0 1 0    A    Green blink         1 0 1 0    A    Light green
1 0 1 1    B    Cyan blink          1 0 1 1    B    Light cyan
1 1 0 0    C    Red blink           1 1 0 0    C    Light red
1 1 0 1    D    Magenta blink       1 1 0 1    D    Light magenta
1 1 1 0    E    Brown blink         1 1 1 0    E    Yellow
1 1 1 1    F    White blink         1 1 1 1    F    Bright white

F  Flashing bit
R  Red bit
G  Green bit
B  Blue bit
I  Intensity bit

* On monochrome monitors, the blue bit is set and the red and green bits are cleared (001) for underline; all color bits are set (111) for normal text.
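As a rough illustration of how the two halves of the table combine, the sketch below packs a background Num and a foreground Num into one attribute byte. It assumes the usual PC text-mode layout, in which the background bits (F R G B) occupy the high four bits and the foreground bits (I R G B) the low four; the table does not state this placement, and the function names are hypothetical. Python is used only for illustration.

```python
def make_attribute(foreground, background):
    """Combine two Num values from the table into one attribute byte.

    Assumption: background (F R G B) is the high nibble and
    foreground (I R G B) is the low nibble.
    """
    if not 0 <= foreground <= 0xF or not 0 <= background <= 0xF:
        raise ValueError("each Num value must be in the range 0-F")
    return (background << 4) | foreground

def split_attribute(attribute):
    """Break an attribute byte back into its flag and color bits."""
    return {
        "flashing":   bool(attribute & 0x80),    # F bit
        "background": (attribute >> 4) & 0x07,   # background R G B
        "intensity":  bool(attribute & 0x08),    # I bit
        "foreground": attribute & 0x07,          # foreground R G B
    }
```

Under that assumption, white text (7) on a blue background (1) gives 17h, and bright white (F) on blinking red (C) gives CFh.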
Hexadecimal-Binary-Decimal Conversion
Hex     Binary  Decimal     Decimal     Decimal     Decimal
Number  Number  Digit 000X  Digit 00X0  Digit 0X00  Digit X000
0       0000    0           0           0           0
1       0001    1           16          256         4,096
2       0010    2           32          512         8,192
3       0011    3           48          768         12,288
4       0100    4           64          1,024       16,384
5       0101    5           80          1,280       20,480
6       0110    6           96          1,536       24,576
7       0111    7           112         1,792       28,672
8       1000    8           128         2,048       32,768
9       1001    9           144         2,304       36,864
A       1010    10          160         2,560       40,960
B       1011    11          176         2,816       45,056
C       1100    12          192         3,072       49,152
D       1101    13          208         3,328       53,248
E       1110    14          224         3,584       57,344
F       1111    15          240         3,840       61,440
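One way to read the conversion table is to look up each hexadecimal digit in the column for its position and add the results. The Python sketch below mirrors that procedure for numbers of up to four digits; the names COLUMNS and hex_to_decimal are hypothetical and chosen only for this illustration.

```python
HEX_DIGITS = "0123456789ABCDEF"

# COLUMNS[0] is the 000X column, COLUMNS[1] the 00X0 column, and so on.
# Each column holds the digit value multiplied by 16 raised to the position.
COLUMNS = [[digit * 16 ** position for digit in range(16)]
           for position in range(4)]

def hex_to_decimal(hex_number):
    """Convert a 1- to 4-digit hexadecimal string by summing table entries."""
    digits = hex_number.upper()
    if not 1 <= len(digits) <= 4:
        raise ValueError("the table covers one to four hexadecimal digits")
    total = 0
    # Walk from the rightmost digit (000X column) to the leftmost.
    for position, digit in enumerate(reversed(digits)):
        total += COLUMNS[position][HEX_DIGITS.index(digit)]
    return total
```

For example, 3FA2h is 12,288 + 3,840 + 160 + 2 = 16,290.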
Programmer's Guide
Microsoft MASM
Information in this document is subject to change without notice. Companies, names, and data used in examples herein are fictitious unless otherwise noted. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without the express written permission of Microsoft Corporation.
Microsoft, MS, MS-DOS, XENIX, CodeView, and QuickC are registered trademarks and Microsoft QuickBasic, QuickPascal, Windows, and Windows NT are trademarks of Microsoft Corporation in the USA and other countries.
U.S. Patent No. 4,955,066
Hercules is a registered trademark of Hercules Computer Technology. IBM, PS/2, and OS/2 are registered trademarks of International Business Machines Corporation. Intel is a registered trademark of Intel Corporation. NEC and V25 are registered trademarks and V35 is a trademark of NEC Corporation.
Document No. DB35747-1292 Printed in the United States of America.
Contents
Introduction
  New and Extended Features in MASM 6.1
  MASM Features New Since Version 5.1
  MASM Features New Since Version 6.0
  ML and MASM Command Lines
  Compatibility with Earlier Versions of MASM
  A Word About Instruction Timings
  Books for Further Reading
  Document Conventions
  Getting Assistance and Reporting Problems
Chapter 1 Understanding Global Concepts
  The Processing Environment
  8086-Based Processors
  Operating Systems
  Segmented Architecture
  Segment Protection
  Segmented Addressing
  Segment Arithmetic
  Language Components of MASM
  Reserved Words
  Identifiers
  Predefined Symbols
  Integer Constants and Constant Expressions
  Operators
  Data Types
  Registers
  Statements
  The Assembly Process
  Generating and Running Executable Programs
  Using the OPTION Directive
  Conditional Directives
Chapter 2 Organizing Segments
  Physical Memory Segments
  Logical Segments
  Using Simplified Segment Directives
  Defining Basic Attributes with .MODEL
  Specifying a Processor and Coprocessor
  Creating a Stack
  Creating Data Segments
  Creating Code Segments
  Starting and Ending Code with .STARTUP and .EXIT
  Using Full Segment Definitions
  Defining Segments with the SEGMENT Directive
  Controlling the Segment Order
  Setting the ASSUME Directive for Segment Registers
  Defining Segment Groups
Chapter 3 Using Addresses and Pointers
  Programming Segmented Addresses
  Initializing Default Segment Registers
  Near and Far Addresses
  Operands
  Register Operands
  Immediate Operands
  Direct Memory Operands
  Indirect Memory Operands
  The Program Stack
  Saving Operands on the Stack
  Saving Flags on the Stack
  Saving Registers on the Stack (80186-80486 Only)
  Accessing Data with Pointers and Addresses
  Defining Pointer Types with TYPEDEF
  Defining Register Types with ASSUME
  Basic Pointer and Address Operations
Chapter 4 Defining and Using Simple Data Types
  Declaring Integer Variables
  Allocating Memory for Integer Variables
  Data Initialization
  Working with Simple Variables
  Copying Data
  Adding and Subtracting Integers
  Multiplying and Dividing Integers
  Manipulating Numbers at the Bit Level
  Logical Instructions
  Shifting and Rotating Bits
  Multiplying and Dividing with Shift Instructions
Chapter 5 Defining and Using Complex Data Types
  Arrays and Strings
  Declaring and Referencing Arrays
  Declaring and Initializing Strings
  Processing Strings
  Structures and Unions
  Declaring Structure and Union Types
  Defining Structure and Union Variables
  Referencing Structures, Unions, and Fields
  Nested Structures and Unions
  Records
  Declaring Record Types
  Defining Record Variables
  Record Operators
Chapter 6 Using Floating-Point and Binary Coded Decimal Numbers
  Using Floating-Point Numbers
  Declaring Floating-Point Variables and Constants
  Storing Numbers in Floating-Point Format
  Using a Math Coprocessor
  Coprocessor Architecture
  Instruction and Operand Formats
  Coordinating Memory Access
  Using Coprocessor Instructions
  Using An Emulator Library
  Using Binary Coded Decimal Numbers
  Defining BCD Constants and Variables
  BCD Calculations on a Coprocessor
  BCD Calculations on the Main Processor
Chapter 7 Controlling Program Flow
  Jumps
  Unconditional Jumps
  Conditional Jumps
  Loops
  Loop-Generating Directives
  Writing Loop Conditions
  Procedures
  Defining Procedures
  Passing Arguments on the Stack
  Declaring Parameters with the PROC Directive
  Using Local Variables
  Creating Local Variables Automatically
  Declaring Procedure Prototypes
  Calling Procedures with INVOKE
  Generating Prologue and Epilogue Code
  MS-DOS Interrupts
  Calling MS-DOS and ROM-BIOS Interrupts
  Replacing an Interrupt Routine
Chapter 8 Sharing Data and Procedures Among Modules and Libraries
  Selecting Data-Sharing Methods
  Sharing Symbols with Include Files
  Organizing Modules
  Declaring Symbols Public and External
  Positioning External Declarations
  Using Alternatives to Include Files
  PUBLIC and EXTERN
  Other Alternatives
  Developing Libraries
  Associating Libraries with Modules
  Using EXTERN with Library Routines
Chapter 9 Using Macros
  Text Macros
  Macro Procedures
  Creating Macro Procedures
  Passing Arguments to Macros
  Specifying Required and Default Parameters
  Defining Local Symbols in Macros
  Assembly-Time Variables and Macro Operators
  Text Delimiters and the Literal-Character Operator
  Expansion Operator
  Substitution Operator
  Defining Repeat Blocks with Loop Directives
  REPEAT Loops
  WHILE Loops
  FOR Loops and Variable-Length Parameters
  FORC Loops
  String Directives and Predefined Functions
  Returning Values with Macro Functions
  Returning Values with EXITM
  Using Macro Functions with Variable-Length Parameter Lists
  Expansion Operator in Macro Functions
  Advanced Macro Techniques
  Defining Macros within Macros
  Testing for Argument Type and Environment
  Using Recursive Macros
Chapter 10 Writing a Dynamic-Link Library For Windows
  Overview of DLLs
  Loading a DLL
  Building a DLL
  DLL Code
  DLL Data
  DLL Stack
  DLL Extension Names
  Summary
  Example of a DLL: SYSINFO
  Entry Routine for SYSINFO
  Expanding SYSINFO
Chapter 11 Writing Memory-Resident Software
  Terminate-and-Stay-Resident Programs
  Structure of a TSR
  Passive TSRs
  Active TSRs
  Interrupt Handlers in Active TSRs
  Auditing Hardware Events for TSR Requests
  Monitoring System Status
  Determining Whether to Invoke the TSR
  Example of a Simple TSR: ALARM
  Using MS-DOS in Active TSRs
  Understanding MS-DOS Stacks
  Determining MS-DOS Activity
  Interrupting MS-DOS Functions
  Monitoring the Critical Error Flag
  Preventing Interference
  Trapping Errors
  Preserving an Existing Condition
  Preserving Existing Data
  Communicating Through the Multiplex Interrupt
  The Multiplex Handler
  Using the Multiplex Interrupt Under MS-DOS Version 2.x
  Deinstalling a TSR
  Example of an Advanced TSR: SNAP
  Building SNAP.EXE
  Outline of SNAP
Chapter 12 Mixed-Language Programming
  Naming and Calling Conventions
  Naming Conventions
  The C Calling Convention
  The Pascal Calling Convention
  The STDCALL and SYSCALL Calling Conventions
  Writing an Assembly Procedure For a Mixed-Language Program
  The MASM/High-Level-Language Interface
  The C/MASM Interface
  The C++/MASM Interface
  The FORTRAN/MASM Interface
  The Basic/MASM Interface
Chapter 13 Writing 32-Bit Applications
  32-Bit Memory Addressing
  MASM Directives for 32-Bit Programming
  Sample Program
Appendixes
Appendix A Differences Between MASM 6.1 and 5.1
  New Features of Version 6.1
  The Assembler, Environment, and Utilities
  Segment Management
  Data Types
  Procedures, Loops, and Jumps
  Simplifying Multiple-Module Projects
  Expanded State Control
  New Processor Instructions
  Renamed Directives
  Macro Enhancements
  MASM 6.1 Programming Practices
  Compatibility Between MASM 5.1 and 6.1
  Rewriting Code for Compatibility
  Using the OPTION Directive
  Changes to Instruction Encodings
Appendix B BNF Grammar
Appendix C Generating and Reading Assembly Listings
  Generating Listing Files
  Precedence of Command-Line Options and Listing Directives
  Reading the Listing File
  Generated Code
  Error Messages
  Symbols and Abbreviations
  Reading Tables in a Listing File
Appendix D MASM Reserved Words
  Operands and Symbols
  Special Operands for the 80386/486
  Predefined Symbols
  Registers
  Operators and Directives
  Processor Instructions
  8086/8088 Processor Instructions
  80186 Processor Instructions
  80286 Processor Instructions
  80286 and 80386 Privileged-Mode Instructions
  80386 Processor Instructions
  80486 Processor Instructions
  Instruction Prefixes
  Coprocessor Instructions
  8087 Coprocessor Instructions
  80287 Privileged-Mode Instruction
  80387 Instructions
Appendix E Default Segment Names
Glossary
Index
Figures and Tables

Figures
  1.1 Segment Allocation
  1.2 Calculating Physical Addresses
  1.3 Registers for 8088-80286 Processors
  1.4 Extended Registers for the 80386/486 Processors
  1.5 Flags for 8088-80486 Processors
  3.1 Stack Status Before and After Pushes and Pops
  4.1 Integer Formats
  4.2 Shifts and Rotates
  6.1 Encoding for Real Numbers in IEEE Format
  6.2 Coprocessor Data Registers
  6.3 Status of the Register Stack
  6.4 Status of the Register Stack and Memory Locations
  6.5 Status of the Previously Initialized Register Stack
  6.6 Status of the Already Initialized Register Stack
  6.7 Status of the Register Stack: Main Memory and Coprocessor
  6.8 Coprocessor Control Registers
  6.9 Coprocessor and Processor Control Flags
  7.1 Program Arguments on the Stack
  7.2 Local Variables on the Stack
  7.3 Operation of Interrupts
  8.1 Using EXTERNDEF for Variables
  8.2 Using PROTO and INVOKE
  8.3 Using PUBLIC and EXTERN
  11.1 Time Line of Interaction Between Interrupt Handlers for a Typical TSR
  11.2 Flowchart for SNAP.EXE: Installation Phase
  11.3 Flowchart for SNAP.EXE Resident Phase
  11.4 Flowchart for SNAP.EXE Deinstallation Phase
  12.1 C String Format
  12.2 C Stack Frame
  12.3 FORTRAN String Frame
  12.4 FORTRAN Stack Frame
  12.5 Basic String Descriptor Format
  12.6 Basic Stack Frame
  B.1 BNF Definition of the TYPEDEF Directive
Tables
1.1 8086 Family of Processors
1.2 The MS-DOS and Windows Operating Systems Compared
1.3 Operator Precedence
2.1 Attributes of Memory Models
3.1 Indirect Addressing with 16-Bit Registers
4.1 Division Operations
5.1 Requirements for String Instructions
6.1 Ranges of Floating-Point Variables
6.2 Coprocessor Operand Formats
6.3 Control-Flag Settings After Comparison or Test
7.1 Conditional Jumps Based on Comparisons of Two Values
9.1 MASM Macro Operators
11.1 MS-DOS Internal Stacks
12.1 Naming and Calling Conventions
12.2 Register Conventions for Simple Return Values
A.1 Requirements for String Instructions
C.1 Options for Generating or Modifying Listing Files
C.2 Symbols and Abbreviations in Listings
C.3 Symbols in Timing Column
Introduction
The Microsoft Macro Assembler Programmer's Guide provides the information you need to write and debug assembly-language programs with the Microsoft Macro Assembler (MASM), version 6.1. This book documents enhanced features of the language and the programming environment for MASM 6.1.

This Programmer's Guide is written for experienced programmers who know assembly language and are familiar with an assembler. The book does not teach the basics of assembly language; it does explain Microsoft-specific features. If you want to learn or review the basics of assembly language, refer to "Books for Further Reading" in this introduction.

This book teaches you how to write efficient code with the new and advanced features of MASM. Getting Started explains how to set up MASM 6.1. Environment and Tools introduces the integrated development environment called the Programmer's WorkBench (PWB). It also includes a detailed reference to Microsoft tools and utilities such as Microsoft CodeView, LINK, and NMAKE. The Microsoft Macro Assembler Reference provides a full listing of all MASM instructions, directives, statements, and operators, and it serves as a quick reference to utility commands.

For more information on these same topics, see the online Microsoft Advisor, which is a complete reference to Macro Assembler language topics, to the utilities, and to PWB. You should be able to find most of the information you need in the Microsoft Advisor.
New and Extended Features in MASM 6.1

MASM 6.1 continues the break with tradition established by version 6.0. It incorporates conveniences of high-level languages while offering all the traditional advantages of assembly-language programming. For example, MASM 6.1 includes the Programmer's WorkBench, which provides the same integrated software development environment enjoyed by programmers of Microsoft high-level languages such as C and Basic. From within PWB you can edit, build, debug, or run a program. You can perform most of these operations with either menu selections or keyboard commands. You can also customize PWB to suit your individual programming and editing requirements and preferences.
Filename: LMAPGINT.DOC Project: Template: MSGRIDA1.DOT Author: Terri Sharkey Last Saved By: Ruth L Silverio Revision #: 33 Page: 13 of 1 Printed: 10/02/00 04:20 PM
MASM Features New Since Version 5.1

MASM 6.1 includes several features designed to make programming more efficient and productive. The following list briefly describes how MASM 6.1 improves on the language features of the popular version 5.1.
- MASM 6.1 has many enhancements related to types. You can now use the same type specifiers in initializations as in other contexts (BYTE instead of DB). You can also define your own types, including pointer types, with the new TYPEDEF directive. See Chapter 3, "Using Addresses and Pointers," and Chapter 4, "Defining and Using Simple Data Types."
- The syntax for defining and using structures and records has been enhanced since version 5.1. You can also define unions with the new UNION directive. See Chapter 5, "Defining and Using Complex Data Types."
- MASM now generates complete CodeView information for all types. See Chapter 3, "Using Addresses and Pointers," and Chapter 4, "Defining and Using Simple Data Types."
- New control-flow directives let you use high-level language constructs such as loops and if-then-else blocks defined with .REPEAT and .UNTIL (or .UNTILCXZ); .WHILE and .ENDW; and .IF, .ELSE, and .ELSEIF. The assembler generates the appropriate code to implement the control structure. See Chapter 7, "Controlling Program Flow."
- MASM now has more powerful features for defining and calling procedures. The extended PROC syntax for generating stack frames has been enhanced since version 5.1. You can also use the PROTO directive to prototype a procedure, which you can then call with the INVOKE directive. INVOKE automatically generates code to pass arguments (converting them to a related type, if appropriate) and makes the call according to the specified calling convention. See Chapter 7, "Controlling Program Flow."
- MASM optimizes jumps by automatically determining the most efficient coding for a jump and then generating the appropriate code. See Chapter 7, "Controlling Program Flow."
- Maintaining multiple-module programs is easier in MASM 6.1 than in version 5.1. The EXTERNDEF and PROTO directives make it easy to maintain all global definitions in include files shared by all the source modules of a project. See Chapter 8, "Sharing Data and Procedures Among Modules and Libraries."
The assembler has many new macro features that make complex macros clearer and easier to write:

- You can specify default values for macro arguments or mark arguments as required. And with the VARARG keyword, one parameter can accept a variable number of arguments.
- You can implement loops inside of macros in various ways. For example, the new WHILE directive expands the statements in a macro body while an expression is not zero.
- You can define macro functions, which return text macros. Several predefined text macros are also provided for processing strings.
- Macro operators and other features related to processing text macros and macro arguments have been enhanced.

For more information on all these macro features, see Chapter 9, "Using Macros."
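As a rough illustration of several features from the list above (type specifiers, TYPEDEF, control-flow directives, PROTO and INVOKE, and a VARARG macro), the following sketch sums the numbers 3, 2, and 1. It is not taken from this guide: the names PBYTE, addup, pushregs, count, total, and pcount are hypothetical, and the small memory model with the C convention is assumed only for the example.

```asm
        .MODEL  small, c
        .STACK  256

PBYTE   TYPEDEF PTR BYTE                ; user-defined pointer type (hypothetical name)
addup   PROTO   arg1:WORD, arg2:WORD    ; prototype used by INVOKE below

pushregs MACRO  regs:VARARG             ; one parameter accepts any number of arguments
        FOR     arg, <regs>
        push    arg
        ENDM
        ENDM

        .DATA
count   BYTE    3                       ; BYTE instead of DB
total   WORD    0                       ; WORD instead of DW
pcount  PBYTE   count                   ; near pointer to count (small model assumed)

        .CODE
        .STARTUP
        pushregs bx, cx                 ; expands to: push bx / push cx
        mov     bx, pcount
        mov     cl, [bx]                ; cl = 3
        xor     ch, ch
        .WHILE  cx != 0                 ; assembler generates the compare and jumps
        INVOKE  addup, total, cx        ; assembler generates the pushes, call, and cleanup
        mov     total, ax
        dec     cx
        .ENDW                           ; total = 3 + 2 + 1 = 6
        pop     cx
        pop     bx
        .EXIT   0

addup   PROC    arg1:WORD, arg2:WORD    ; stack frame generated by the assembler
        mov     ax, arg1
        add     ax, arg2
        ret
addup   ENDP
        END
```

The prototype lets the assembler check the number and type of the arguments at each INVOKE, which is one reason the guide recommends keeping prototypes in shared include files.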
MASM 6.1 has other improved capabilities, such as:

- The .STARTUP and .EXIT directives automatically generate appropriate startup and exit code for your assembly-language programs. See Chapter 2, "Organizing Segments."
- MASM 6.1 supports flat memory model, available with the new Microsoft Windows NT operating system. Flat model allows segments as large as 4 gigabytes instead of 64K (kilobytes). Offsets are 32 bits instead of 16 bits. See Chapter 2, "Organizing Segments."
- The program H2INC.EXE converts C include files to MASM include files and translates data structures and declarations. See Chapter 20 in Environment and Tools.
- MASM 6.1 provides a library of assembly routines that let you create a terminate-and-stay-resident program (TSR) in a high-level language.
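To show what .STARTUP and .EXIT save you from writing by hand, here is a minimal sketch of a complete MS-DOS program. The message text and the name msg are hypothetical, and the small memory model is an assumption of the example rather than a requirement.

```asm
        .MODEL  small
        .STACK
        .DATA
msg     BYTE    "Hello from MASM", 13, 10, "$"  ; "$" terminates a string for function 09h

        .CODE
        .STARTUP                ; assembler generates the startup code (for example, loading DS)
        mov     dx, OFFSET msg
        mov     ah, 09h         ; MS-DOS function 09h: display the string at DS:DX
        int     21h
        .EXIT   0               ; assembler generates the MS-DOS terminate call, return code 0
        END                     ; no start label needed; .STARTUP marks the entry point
```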
MASM 6.1 includes many other minor new features as well as extensive support for features of earlier versions of MASM. For a complete list of enhancements, refer to Appendix A, "Differences Between MASM 6.1 and 5.1." The cross-references in Appendix A guide you to the chapters where the new features are described in detail.
MASM Features New Since Version 6.0

MASM 6.1 offers several new features:
- ML now runs in 32-bit protected mode under MS-DOS, giving it direct access to extended memory for assembling very large source files.
- A collection of tools lets you write a dynamic-link library (DLL) for the Microsoft Windows operating system without the Windows Software Development Kit. The LIBW.LIB library provides access to all functions in the Windows application programming interface (API), so your DLL can display menus, dialog boxes, and scroll bars. Chapter 10, "Writing a Dynamic-Link Library for Windows," shows you how.
- Program listings now show instruction timings. The number of required processor cycles appears adjacent to each instruction in the listing, based on the selected processor. For an example listing and instructions on how to use this feature, see Appendix C, "Generating and Reading Assembly Listings."
- All utilities have been updated for version 6.1.
- Documentation is clearer and better arranged, with a new Environment and Tools reference book.
- Version 6.1 generates debugging information for CodeView version 4.0 and later.
- MASM 6.1 provides even greater compatibility with version 5.1 than does MASM 6.0. Many programs written with version 5.1 will assemble unchanged under MASM 6.1.
ML and MASM Command Lines

MASM 6.1 provides an updated version of the command-line driver, ML, introduced in version 6.0. ML is more powerful and flexible than the MASM driver of version 5.1. ML assembles and links with one command. It recognizes all the old MASM driver command syntax, however, to support existing batch files and makefiles that use MASM command lines.

Note: The name MASM has traditionally referred to the Microsoft Macro Assembler. It is used in that context throughout this book. However, MASM also refers to MASM.EXE, which has been replaced by ML.EXE. In MASM 6.1, MASM.EXE is a small utility that translates command-line options to those accepted by ML.EXE, and then calls ML.EXE. The distinction between ML.EXE and MASM.EXE is made whenever necessary. Otherwise, MASM refers to the assembler and its features.
Compatibility with Earlier Versions of MASM

MASM 6.1 is fully compatible with version 6.0 and, in many cases, with version 5.1. Code written for MASM 5.1 will often assemble correctly without modification under MASM 6.1. However, MASM 6.1 provides the OPTION directive to let you selectively modify the assembly process. In particular, you can use the M510 argument with OPTION or the /Zm command-line option to set most features to be compatible with version 5.1 code.

For information about obsolete features that will not assemble correctly under MASM 6.1, see Appendix A, "Differences Between MASM 6.1 and 5.1." The appendix also explains how to update code to use the new features.
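As a small sketch of the compatibility switch described above, a module written for version 5.1 might begin as follows. The variable name value is hypothetical, and placing OPTION M510 at the top of the source file is an assumption of this example; Appendix A remains the authority on how the option interacts with other directives.

```asm
        OPTION  M510            ; request MASM 5.1 compatibility for this module
                                ; (the alternative is to assemble with ML /Zm)
        .MODEL  small
        .DATA
value   DW      0               ; version 5.1 style data definition
        .CODE
        END
```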
A Word About Instruction Timings

As an assembly-language programmer, whether novice or expert, you are probably interested in producing lightning-fast code. After all, one of the main reasons to program in assembly is to take advantage of its ability to streamline execution speeds to the limit of the processor. This book will help you write efficient and fast programs.

When discussing the speed of individual instructions, the chapters in this book often speak of timing, which is the number of processor cycles required to carry out an instruction. The Reference lists instruction timings for processors in the 8086 family.

It is tempting to use timing as the only criterion when judging an instruction's actual execution speed, but the world within the processor is not so simple. The clock for instruction timing does not begin ticking until the processor has read and begins to execute an instruction.

When you read about instruction timings (in this book or any other), keep in mind that other factors also influence the real speed of an instruction: the instruction's size, whether it resides in cache memory, whether it accesses memory, its position in the processor's prefetch queue, and the processor type. These factors make it impossible to say precisely how fast an instruction executes.

Accept the references to timing in this book as guidelines, but use these simple rules to write fast code:
- Whenever possible, use registers rather than constant values, and constant values rather than memory.
- Minimize changes in program flow.
- Smaller is often better. For example, the instructions

        dec     bx
        sub     bx, 1

  accomplish the same thing and have the same timings on 80386/486 processors. But the first instruction is 2 bytes smaller than the second (1 byte versus 3 bytes in 16-bit code), and so may reach the processor faster.
- When possible, use the string instructions described in Chapter 5, "Defining and Using Complex Data Types."
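The size comparison in the list above can be checked in an assembly listing. The sketch below annotates each instruction with the bytes it is expected to produce, assuming a 16-bit code segment and an assembler that chooses the short sign-extended immediate form for SUB; under those assumptions the encodings are 1 byte and 3 bytes.

```asm
        .MODEL  small
        .CODE
        dec     bx              ; 4Bh              (1 byte)
        sub     bx, 1           ; 83h 0EBh 01h     (3 bytes)
        END
```

One difference worth keeping in mind when substituting one for the other: DEC leaves the carry flag unchanged, while SUB sets it according to the result, so the two are interchangeable only when the code that follows does not test the carry flag.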
Books for Further Reading

The following books may help you learn to program in assembly language or write specialized programs. These books are listed only for your convenience. Microsoft makes no specific recommendations concerning any of these books.
Books About Programming in Assembly Language

Abrash, Michael. Zen of Assembly Language. Glenview, IL: Scott, Foresman and Co., 1990. Out of print.
Duntemann, Jeff. Assembly Language from Square One: For the PC AT and Compatibles. Glenview, IL: Scott, Foresman and Co., 1990. Out of print.
Fernandez, Judi N., and Ruth Ashley. Assembly Language Programming for the 80386. New York: McGraw-Hill, 1990.
Miller, Alan R. DOS Assembly Language Programming. San Francisco: SYBEX, 1988. Out of print.
Scanlon, Leo J. 80286 Assembly Language Programming on MS-DOS Computers. New York: Brady Communications, 1986. Out of print.
Turley, James L. Advanced 80386 Programming Techniques. Berkeley, CA: Osborne McGraw-Hill, 1988.
Books About MS-DOS and BIOS

"Terminate-and-Stay-Resident Utilities." MS-DOS Encyclopedia. Redmond, WA: Microsoft Press, 1989.
Duncan, Ray. Advanced MS-DOS Programming: The Microsoft Guide for Assembly Language and C Programmers. 2d ed. Redmond, WA: Microsoft Press, 1988.
Duncan, Ray. Extending DOS: Programmer's Guide to Protected-Mode DOS. Reading, MA: Addison-Wesley, 1991.
Jourdain, Robert. Programmer's Problem Solver for the IBM PC, XT and AT. New York: Brady Communications, 1985. Out of print.
Microsoft MS-DOS Programmer's Reference. Redmond, WA: Microsoft Press, 1991.
Norton, Peter and Richard Wilton. The New Peter Norton Programmer's Guide to the IBM PC and PS/2. Redmond, WA: Microsoft Press, 1988.
Wilton, Richard. Programmer's Guide to PC & PS/2 Video Systems: Maximum Video Performance from the EGA, VGA, HGC, and MCGA. Redmond, WA: Microsoft Press, 1987. Out of print.
Books and Articles About Windows

Kauler, Barry. Windows Assembly Language & Systems Programming: Object-Oriented & Systems Programming in Assembly Language for Windows 3.0 and 3.1. New York, NY: Prentice Hall, 1993.
Klein, Mike. Windows Programmer's Guide to DLLs & Memory Management. Carmel, IN: Sams, 1992.
Petzold, Charles. Programming Windows. 3d ed. Redmond, WA: Microsoft Press, 1992.
Petzold, Charles. "Environments." PC Magazine. New York, NY: Ziff-Davis Publishing Company, June 1990-1992.
Programmer's Reference. 4 vols. Microsoft Windows Software Development Kit (SDK). Redmond, WA: Microsoft Press, 1992.
Books About Other Topics

Nelson, Ross P. The 80386/80486 Programming Guide. 2d ed. Redmond, WA: Microsoft Press, 1991.
Startz, Richard. 8087/80287/80387 for the IBM PC and Compatibles: Applications and Programming with Intel's Math Coprocessors. Bowie, MD: Robert J. Brady Co., 1988. Out of print.
The following document conventions are used throughout this manual:
Example of Convention    Description
SAMPLE2.ASM              Uppercase letters indicate filenames, segment names, registers, and terms used at the command level.
.MODEL                   Boldface type indicates assembly-language directives, instructions, type specifiers, and predefined macros, as well as keywords in other programming languages.
placeholder              Italic letters indicate placeholders for information you must supply, such as a filename. Italics are used occasionally for emphasis in the text.
target                   This font is used to indicate example programs, user input, and screen output.
;                        A semicolon in the first column of an example signals illegal code. A semicolon also marks a comment.
SHIFT                    Small capital letters signify names of keys on the keyboard. Notice that a plus (+) indicates a combination of keys. For example, CTRL+E means to hold down the CTRL key while pressing the E key.
[[argument]]             Items inside double square brackets are optional.
{register|memory}        Braces and a vertical bar indicate a choice between two or more items. You must choose one of the items unless double square brackets surround the braces.
...                      A horizontal ellipsis (...) following an item indicates that more items having the same form may appear.
Program                  A vertical ellipsis tells you that part of a program has been intentionally omitted.
.
.
.
Fragment
Getting Assistance and Reporting Problems

If you need help or think you have discovered a problem in the software, please provide the following information to help us locate the source of the problem:
- The version of MS-DOS or Windows you run.
- Your system configuration: the type of machine you use, its total memory, and its total free memory at assembler execution time, as well as any other information you think might be useful.
- The command line you used for the assembler, linker, or other MASM tool that was running when the problem occurred.
- Any object files or libraries you linked with if the problem occurred at link time.
If your program is very large, reduce it to the smallest possible program that still produces the problem. Note the circumstances of the error and notify Microsoft Corporation by following the instructions in the section "Microsoft Support Services" in the introduction to Environment and Tools.

If you have comments or suggestions regarding any of the books accompanying this product, please indicate them on the Document Feedback page at the back of this book and send it to Microsoft.

If you have not yet registered your copy of the Macro Assembler, you should fill out and return the Registration Card. This enables Microsoft to keep you informed of updates and other information about the assembler.
CHAPTER 1
Understanding Global Concepts
With the development of the Microsoft Macro Assembler (MASM) version 6.1, you now have more options available to you for approaching a programming task. This chapter explains the general concepts of programming in assembly language, beginning with the environment and a review of the components you need to work in the assembler environment. Even if you are familiar with previous versions of MASM, you should examine this chapter for information on new terms and features.

The first section of this chapter reviews available processors and operating systems and how they work together. The section also discusses segmented architecture and how it affects a protected-mode operating environment such as Windows.

The second section describes some of the language components of MASM that are common to most programs, such as reserved words, constant expressions, operators, and registers. The remainder of this book was written with the assumption that you understand the information presented in this section.

The last section summarizes the assembly process, from assembling a program through running it. You can affect this process by the way you develop your code. Finally, this section explores how you can change the assembly process with the OPTION directive and conditional assembly.
The Processing Environment

The processing environment for MASM 6.1 includes the processor on which your programs run, the operating system your programs use, and the aspects of the segmented architecture that influence the choice of programming models. This section summarizes these elements of the environment and how they affect your programming choices.
8086-Based Processors
The 8086 family of processors uses segments to control data and code. The later 8086-based processors have larger instruction sets and more memory capacity, but they still support the same segmented architecture. Knowing the differences between the various 8086-based processors can help you select the appropriate target processor for your programs. The instruction set of the 8086 processor is upwardly compatible with its successors. To write code that runs on the widest number of machines, select the 8086 instruction set. By using the instruction set of a more advanced processor, you increase the capabilities and efficiency of your program, but you also reduce the number of systems on which the program can run. Table 1.1 lists modes, memory, and segment size of processors on which your application may need to run. Each processor is discussed in more detail following.
Table 1.1  8086 Family of Processors

Processor      Available Modes       Addressable Memory    Segment Size
8086/8088      Real                  1 megabyte            16 bits
80186/80188    Real                  1 megabyte            16 bits
80286          Real and Protected    16 megabytes          16 bits
80386          Real and Protected    4 gigabytes           16 or 32 bits
80486          Real and Protected    4 gigabytes           16 or 32 bits
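The trade-off between portability and capability shows up directly in source code through the processor directives. The fragment below is a sketch, not an excerpt from this guide: it multiplies a register by 16 three ways, each accepted only once the corresponding instruction set has been selected. Placing .MODEL before .386 is assumed here so that the segments stay 16-bit.

```asm
        .MODEL  small
        .CODE
        .8086                   ; accept only 8086/8088 instructions (the default)
        mov     cl, 4
        shl     ax, cl          ; 8086/8088: shift counts other than 1 must be in CL
        .286
        shl     ax, 4           ; 80186 and later: immediate shift count
        .386
        shl     eax, 4          ; 80386 and later: 32-bit register
        END
```

Code assembled after .286 or .386 may use instructions that fail on earlier processors, so a program meant to run everywhere should stay within the .8086 portion.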
Processor Modes
Real mode allows only one process to run at a time. The mode gets its name from the fact that addresses in real mode always correspond to real locations in memory. The MS-DOS operating system runs in real mode. Windows 3.1 operates only in protected mode, but runs MS-DOS programs in real mode or in a simulation of real mode called virtual-86 mode.

In protected mode, more than one process can be active at any one time. The operating system protects memory belonging to one process from access by another process; hence the name protected mode. Protected-mode addresses do not correspond directly to physical memory. Under protected-mode operating systems, the processor allocates and manages memory dynamically. Additional privileged instructions initialize protected mode and control multiple processes. For more information, see "Operating Systems," following.
8086 and 8088

The 8086 is faster than the 8088 because of its 16-bit data bus; the 8088 has only an 8-bit data bus. The 16-bit data bus allows you to use EVEN and ALIGN on an 8086 processor to word-align data and thus improve data-handling efficiency. Memory addresses on the 8086 and 8088 refer to actual physical addresses.
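A brief sketch of word alignment with EVEN and ALIGN follows. The variable names are hypothetical, and the comments assume that the data segment itself begins on a word boundary, as it does with the simplified segment directives in 16-bit code.

```asm
        .MODEL  small
        .DATA
flag    BYTE    1               ; one byte, so the next free address is odd
        EVEN                    ; pad to the next even address
count   WORD    0               ; word-aligned: can be fetched over the 16-bit bus in one access
state   BYTE    0
        ALIGN   2               ; same effect as EVEN
limit   WORD    100
        END
```

On an 8088 the padding costs a byte without a speed benefit, since every word access takes two 8-bit transfers regardless of alignment.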
80186 and 80188

These two processors are identical to the 8086 and 8088 except that new instructions have been added and several old instructions have been optimized. These processors run significantly faster than the 8086.
80286
The 80286 processor adds some instructions to control protected mode, and it runs faster. It also provides protected-mode services, allowing the operating system to run multiple processes at the same time. The 80286 is the minimum for running Windows 3.1 and 16-bit versions of OS/2.
80386
Unlike its predecessors, the 80386 processor can handle both 16-bit and 32-bit data. It supports the entire instruction set of the 80286, and adds several new instructions as well. Software written for the 80286 runs unchanged on the 80386, but is faster because the chip operates at higher speeds. The 80386 implements many new hardware-level features, including paged memory, multiple virtual 8086 processes, addressing of up to 4 gigabytes of memory, and specialized debugging registers. Thirty-two-bit operating systems such as Windows NT and OS/2 2.0 can run only on an 80386 or higher processor.
80486
The 80486 processor is an enhanced version of the 80386, with instruction pipelining that executes many instructions two to three times faster. The chip incorporates both a math coprocessor and an 8K (kilobyte) memory cache. (The math coprocessor is disabled on a variation of the chip called the 80486SX.) The 80486 includes new instructions and is fully compatible with 80386 software.
8087, 80287, and 80387

These math coprocessors work concurrently with the 8086 family of processors. Performing floating-point calculations with math coprocessors is up to 100 times faster than emulating the calculations with integer instructions. Although there are technical and performance differences among the three coprocessors, the main difference to the applications programmer is that the 80287 and 80387 can operate in protected mode. The 80387 also has several new instructions. The 80486 does not use any of these coprocessors; its floating-point processor is built in and is functionally equivalent to the 80387.
Operating Systems
With MASM, you can create programs that run under MS-DOS, Windows, or Windows NT (or all three, in some cases). For example, ML.EXE can produce executable files that run in any of the target environments, regardless of the programmer's environment. For information on building programs for different environments, see "Building and Running Programs" in Help for PWB.

MS-DOS and Windows 3.1 provide different processing modes. MS-DOS runs in the single-process real mode. Windows 3.1 operates in protected mode, allowing multiple processes to run simultaneously. Although Windows requires another operating system for loading and file services, it provides many functions normally associated with an operating system. When an application requests an MS-DOS service, Windows often provides the service without invoking MS-DOS. For consistency, this book refers to Windows as an operating system.

MS-DOS and Windows (in protected mode) differ primarily in system access methods, size of addressable memory, and segment selection. Table 1.2 summarizes these differences.
Table 1.2  The MS-DOS and Windows Operating Systems Compared

Operating System                System Access                     Available Active Processes    Addressable Memory    Contents of Segment Register    Word Length
MS-DOS and Windows real mode    Direct to hardware and OS call    One                           1 megabyte            Actual address                  16 bits
Windows virtual-86 mode         Operating system call             Multiple                      1 megabyte            Segment selectors               16 bits
Windows protected mode          Operating system call             Multiple                      16 megabytes          Segment selectors               16 bits
Windows NT                      Operating system call             Multiple                      512 megabytes         Segment selectors               32 bits
MS-DOS
In real-mode programming, you can access system functions by calling MS-DOS, calling the basic input/output system (BIOS), or directly addressing hardware. Access is through MS-DOS Interrupt 21h.
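As a short sketch of how a real-mode program requests MS-DOS services, the fragment below loads a function number into AH, places any arguments in the registers that function expects, and issues Interrupt 21h. The two functions shown (02h and 30h) were chosen only for illustration.

```asm
        mov     ah, 02h         ; function 02h: character output
        mov     dl, 'A'         ; DL = character to display
        int     21h             ; call MS-DOS

        mov     ah, 30h         ; function 30h: get MS-DOS version
        int     21h             ; returns AL = major version, AH = minor version
```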
Windows
As you can see in Table 1.2, protected mode allows for much larger data structures than real mode, since addressable memory extends to 16 megabytes. In protected mode, segment registers contain selector values rather than actual segment addresses. These selectors cannot be calculated by the program; they must be obtained by calling the operating system. Programs that attempt to calculate segment values or to address memory directly do not work in protected mode.

Protected mode uses privilege levels to maintain system integrity and security. Programs cannot access data or code that is in a higher privilege level. Some instructions that directly access ports or affect interrupts (such as CLI, STI, IN, and OUT) are available at privilege levels normally used only by systems programmers.

Windows protected mode provides each application with up to 16 megabytes of virtual memory, even on computers that have less physical memory. The term virtual memory refers to the operating system's ability to use a swap area on the hard disk as an extension of real memory. When a Windows application requires more memory than is available, Windows writes sections of occupied memory to the swap area, thus freeing those sections for other use. It then provides the memory to the application that made the memory request. When the owner of the swapped data regains control, Windows restores the data from disk to memory, swapping out other memory if required.
Windows NT
Windows NT uses the so-called flat model of 80386/486 processors. This model places the processor's entire address space within one 32-bit segment. The section "Defining Basic Attributes with .MODEL" in Chapter 2 explains how to use the flat model. In flat model, your program can (in theory) access up to 4 gigabytes of virtual memory. Since code, data, and stack reside in the same segment, each segment register can hold the same value, which need never change.
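A minimal sketch of a flat-model module appears below. The names value and getval are hypothetical, and stdcall is used only as an example language type; the point is that the code refers to data by a 32-bit offset and never loads a segment register.

```asm
        .386                    ; flat model requires an 80386 or later
        .MODEL  flat, stdcall
        .DATA
value   DWORD   12345678h
        .CODE
getval  PROC
        mov     eax, value      ; 32-bit near offset; no segment register is loaded
        ret
getval  ENDP
        END
```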
Segmented Architecture
The 8086 family of processors employs a segmented architecture; that is, each address is represented as a segment and an offset. Segmented addresses affect many aspects of assembly-language programming, especially addresses and pointers.

Segmented architecture was originally designed to enable a 16-bit processor to access an address space larger than 64K. (The section "Segmented Addressing," later in this chapter, explains how the processor uses both the segment and offset to create addresses larger than 64K.) MS-DOS is an example of an operating system that uses segmented architecture on a 16-bit processor.
With the advent of protected-mode processors such as the 80286, segmented architecture gained a second purpose. Segments can separate different blocks of code and data to protect them from undesirable interactions. Windows takes advantage of the protection features of the 16-bit segments on the 80286. Segmented architecture went through another significant change with the release of 32-bit processors, starting with the 80386. These processors are compatible with the older 16-bit processors, but allow flat model 32-bit offset values up to 4 gigabytes. Offset values of this magnitude remove the memory limitations of segmented architecture. The Windows NT operating system uses 32-bit addressing.
Segment Protection
Segmented architecture is an important part of the Windows memory-protection scheme. In a multitasking operating system in which numerous programs can run simultaneously, programs cannot access the code and data of another process without permission. In MS-DOS, the data and code segments are usually allocated adjacent to each other, as shown in Figure 1.1. In Windows, the data and code segments can be anywhere in memory. The programmer knows nothing about, and has no control over, their location. The operating system can even move the segments to a new memory location or to disk while the program is running.
Figure 1.1  Segment Allocation
Segment protection makes software development easier and more reliable in Windows than in MS-DOS, because Windows immediately detects illegal memory accesses. The operating system intercepts illegal memory accesses, terminates the program, and displays a message. This makes it easier for you to track down and fix the bug.

Because it runs in real mode, MS-DOS contains no mechanism for detecting an improper memory access. A program that overwrites data not belonging to it may continue to run and even terminate correctly. The error may not surface until later, when MS-DOS or another program reads the corrupted memory.
Segmented Addressing
Segmented addressing refers to the internal mechanism that combines a segment value and an offset value to form a complete memory address. The two parts of an address are represented as

        segment:offset

The segment portion always consists of a 16-bit value. The offset portion is a 16-bit value in 16-bit mode or a 32-bit value in 32-bit mode.

In real mode, the segment value is a physical address that has an arithmetic relationship to the offset value. The segment and offset together create a 20-bit physical address (explained in the next section). Although 20-bit addresses can access up to 1 megabyte of memory, the BIOS and operating system on Industry Standard Architecture (IBM PC/AT and compatible) computers use part of this memory, leaving the remainder available for programs.
Segment Arithmetic
Manipulating segment and offset addresses directly in real-mode programming is called segment arithmetic. Programs that perform segment arithmetic are not portable to protected-mode operating systems, in which addresses do not correspond to a known segment and offset. To perform segment arithmetic successfully, it helps to understand how the processor combines a 16-bit segment and a 16-bit offset to form a 20-bit linear address. In effect, the segment selects a 64K region of memory, and the offset selects the byte within that region. Here's how it works:
1. The processor shifts the segment address to the left by four binary places, producing a 20-bit address ending in four zeros. This operation has the effect of multiplying the segment address by 16.
2. The processor adds this 20-bit segment address to the 16-bit offset address. The offset address is not shifted.
3. The processor uses the resulting 20-bit address, called the physical address, to access an actual location in the 1-megabyte address space.
Figure 1.2 illustrates this process.
Figure 1.2
Calculating Physical Addresses
A 20-bit physical address can be specified by up to 4,096 equivalent segment:offset addresses. For example, the addresses 0000:F800, 0F00:0800, and 0F80:0000 all refer to the same physical address 0F800.
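The shift-and-add calculation can be modeled outside the assembler. The following Python sketch is only an illustration of the arithmetic, not MASM code; the function names physical_address and equivalent_pairs are hypothetical, and the 20-bit mask assumes an address bus that simply discards any carry out of bit 19.

```python
def physical_address(segment: int, offset: int) -> int:
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Shift the segment left four bits (multiply by 16), add the unshifted
    # offset, and keep 20 bits (assumption: carries past bit 19 are dropped).
    return ((segment << 4) + offset) & 0xFFFFF


def equivalent_pairs(address: int) -> list[tuple[int, int]]:
    """List every segment:offset pair that reaches `address` without wraparound."""
    pairs = []
    for segment in range(0x10000):
        offset = address - (segment << 4)
        if 0 <= offset <= 0xFFFF:
            pairs.append((segment, offset))
    return pairs


# The three equivalent addresses from the text all map to 0F800.
for seg, off in [(0x0000, 0xF800), (0x0F00, 0x0800), (0x0F80, 0x0000)]:
    print(f"{seg:04X}:{off:04X} -> {physical_address(seg, off):05X}")
```

Counting the pairs shows why 4,096 is an upper bound: an address at or above 0FFFFh has 4,096 equivalents in this model, while a lower address such as 0F800h has fewer, because no segment value can be negative.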
Language Components of MASM

Programming with MASM requires that you understand the MASM concepts of reserved words, identifiers, predefined symbols, constants, expressions, operators, data types, registers, and statements. This section defines important terms and provides lists that summarize these topics. For detailed information, see Help or the Reference.
Reserved Words
A reserved word has a special meaning fixed by the language. You can use it only under certain conditions. Reserved words in MASM include:
• Instructions, which correspond to operations the processor can execute.
• Directives, which give commands to the assembler.
• Attributes, which provide a value for a field, such as segment alignment.
• Operators, which are used in expressions.
• Predefined symbols, which return information to your program.
MASM reserved words are not case sensitive except for predefined symbols (see "Predefined Symbols," later in this chapter). The assembler generates an error if you use a reserved word as a variable, code label, or other identifier within your source code. However, if you need to use a reserved word for another purpose, the OPTION NOKEYWORD directive can selectively disable a word's status as a reserved word. For example, to remove the STR instruction, the MASK operator, and the NAME directive from the set of words MASM recognizes as reserved, use this statement in the code segment of your program before the first reference to STR, MASK, or NAME:
OPTION NOKEYWORD:<STR MASK NAME>
The section Using the OPTION Directive, later in this chapter, discusses the OPTION directive. Appendix D provides a complete list of MASM reserved words. With the /Zm command-line option or OPTION M510 in effect, MASM does not reserve any operators or instructions that do not apply to the current CPU mode. For example, you can use the symbol ENTER when assembling under the default CPU mode but not under .286 mode, since the 80186/486 processors recognize ENTER as an instruction. The USE32, FLAT, FAR32, and NEAR32 segment types and the 80386/486 register names are not keywords with processors other than the 80386/486.
Identifiers
An identifier is a name that you invent and attach to a definition. Identifiers can be symbols representing variables, constants, procedure names, code labels, segment names, and user-defined data types such as structures, unions, records, and types defined with TYPEDEF. Identifiers longer than 247 characters generate an error. Certain restrictions limit the names you can use for identifiers. Follow these rules to define a name for an identifier:
• The first character of the identifier can be an alphabetic character (A-Z) or any of these four characters: @ _ $ ?
• The other characters in the identifier can be any of the characters listed above or a decimal digit (0-9).
Avoid starting an identifier with the at sign (@), because MASM 6.1 predefines some special symbols starting with @ (see "Predefined Symbols," following).
Beginning an identifier with @ may also cause conflicts with future versions of the Macro Assembler. The symbol, and thus the identifier, is visible as long as it remains within scope. (For more information about visibility and scope, see "Sharing Symbols with Include Files" in Chapter 8.)
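The naming rules above are simple enough to express as a pattern. The following Python sketch is an illustration only; is_valid_identifier is a hypothetical helper, not part of MASM. It checks just the character and length rules stated in this section and does not consult the reserved-word list or model OPTION DOTNAME.

```python
import re

MAX_IDENTIFIER_LENGTH = 247  # longer identifiers generate an error

# First character: a letter or one of @ _ $ ?
# Remaining characters: any of those, or a decimal digit.
_IDENTIFIER = re.compile(r"[A-Za-z@_$?][A-Za-z0-9@_$?]*")


def is_valid_identifier(name: str) -> bool:
    """Return True if `name` satisfies the character and length rules."""
    if len(name) > MAX_IDENTIFIER_LENGTH:
        return False
    return _IDENTIFIER.fullmatch(name) is not None


for candidate in ["count", "_tmp1", "?flag", "1st", "my-var"]:
    print(candidate, is_valid_identifier(candidate))
```

A name such as @count passes this check even though the text advises against a leading @; the advice is a convention to avoid clashes with predefined symbols, not a syntax rule.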
Predefined Symbols
The assembler includes a number of predefined symbols (also called predefined equates). You can use these symbol names at any point in your code to represent the equate value. For example, the predefined equate @FileName represents the base name of the current file. If the current source file is TASK.ASM, the value of @FileName is TASK. The MASM predefined symbols are listed according to the kinds of information they provide. Case is important only if the /Cp option is used. (For additional details, see Help on ML command-line options.) The predefined symbols for segment information include:
Symbol       Description
@code        Returns the name of the code segment.
@CodeSize    Returns an integer representing the default code distance.
@CurSeg      Returns the name of the current segment.
@data        Expands to DGROUP.
@DataSize    Returns an integer representing the default data distance.
@fardata     Returns the name of the segment defined by the .FARDATA directive.
@fardata?    Returns the name of the segment defined by the .FARDATA? directive.
@Model       Returns the selected memory model.
@stack       Expands to DGROUP for near stacks or STACK for far stacks. (See "Creating a Stack" in Chapter 2.)
@WordSize    Provides the size attribute of the current segment.
The predefined symbols for environment information include:

Symbol       Description
@Cpu         Contains a bit mask specifying the processor mode.
@Environ     Returns values of environment variables during assembly.
@Interface   Contains information about the language parameters.
@Version     Represents the text equivalent of the MASM version number. In MASM 6.1, this expands to 610.
The predefined symbols for date and time information include:

Symbol       Description
@Date        Supplies the current system date during assembly.
@Time        Supplies the current system time during assembly.
The predefined symbols for file information include:

Symbol       Description
@FileCur     Names the current file (base and suffix).
@FileName    Names the base name of the main file being assembled as it appears on the command line.
@Line        Gives the source line number in the current file.
The predefined symbols for macro string manipulation include:

Symbol       Description
@CatStr      Returns concatenation of two strings.
@InStr       Returns the starting position of a string within another string.
@SizeStr     Returns the length of a given string.
@SubStr      Returns substring from a given string.
Integer Constants and Constant Expressions

An integer constant is a series of one or more numerals followed by an optional radix specifier. For example, in these statements
        mov     ax, 25
        mov     bx, 0B3h
the numbers 25 and 0B3h are integer constants. The h appended to 0B3 is a radix specifier. The specifiers are:
• y for binary (or b if the default radix is not hexadecimal)
• o or q for octal
• t for decimal (or d if the default radix is not hexadecimal)
• h for hexadecimal
Radix specifiers can be either uppercase or lowercase letters; sample code in this book is in lowercase. If you do not specify a radix, the assembler interprets the integer according to the current radix. The default radix is decimal, but you can change the default with the .RADIX directive.
Hexadecimal numbers must always start with a decimal digit (0-9). If necessary, add a leading zero to distinguish between symbols and hexadecimal numbers that start with a letter. For example, MASM interprets ABCh as an identifier. The hexadecimal digits A through F can be either uppercase or lowercase letters. Sample code in this book is in uppercase letters.
Constant expressions contain integer constants and (optionally) operators such as shift, logical, and arithmetic operators. The assembler evaluates constant expressions at assembly time. (In addition to constants, expressions can contain labels, types, registers, and their attributes.) Constant expressions do not change value during program execution.
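The radix rules can be summarized in a short model. The following Python sketch is an illustration of the rules as described in this section, not the assembler's actual parser; parse_integer_constant is a hypothetical name, and the default_radix parameter stands in for the effect of the .RADIX directive.

```python
def parse_integer_constant(text: str, default_radix: int = 10) -> int:
    """Interpret `text` as an integer constant using the radix rules above."""
    if not text or text[0] not in "0123456789":
        # A token such as ABCh is read as an identifier, not a number.
        raise ValueError(f"{text!r} does not start with a decimal digit")

    suffixes = {"y": 2, "o": 8, "q": 8, "t": 10, "h": 16}
    if default_radix != 16:
        # b and d are radix specifiers only when they cannot be hex digits.
        suffixes.update({"b": 2, "d": 10})

    last = text[-1].lower()
    if last in suffixes:
        digits, radix = text[:-1], suffixes[last]
    else:
        digits, radix = text, default_radix

    value = 0
    for ch in digits:
        digit = int(ch, 36)  # '0'-'9' -> 0-9, letters -> 10-35
        if digit >= radix:
            raise ValueError(f"{ch!r} is not a valid digit in radix {radix}")
        value = value * radix + digit
    return value


print(parse_integer_constant("25"))     # decimal by default
print(parse_integer_constant("0B3h"))   # hexadecimal, leading zero required
print(parse_integer_constant("1010y"))  # binary
```

With default_radix set to 16, a token such as 101b is read as the hexadecimal number 101B rather than as binary 101, which is why y and t exist as unambiguous specifiers.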
Symbolic Integer Constants

You can define symbolic integer constants with either of the data assignment directives, EQU or the equal sign (=). These directives assign values to symbols during assembly, not during program execution. Symbolic constants are used to assign names to constant values. You can use a symbol with an assigned value in place of an immediate operand. For example, instead of referring in your code to keyboard scan codes with numbers such as 30 or 48, you can create more recognizable symbols:
SCAN_A  EQU     30
SCAN_B  EQU     48
then use the appropriate symbol in your program rather than the number. Using symbolic constants instead of undescriptive numbers makes your code more readable and easier to maintain. The assembler does not allocate data storage when you use either EQU or =. It simply replaces each occurrence of the symbol with the value of the expression.
The directives EQU and = have slightly different purposes. Integers defined with the = directive can be redefined with another value in your source code, but those defined with EQU cannot. Once you've defined a symbolic constant with the EQU directive, attempting to redefine it generates an error. The syntax is:
symbol EQU expression
The symbol is a unique name of your choice, except for words reserved by MASM. The expression can be an integer, a constant expression, a one- or two-character string constant (four-character on the 80386/486), or an expression that evaluates to an address. Symbolic constants let you change a constant value used throughout your source code by merely altering expression in the definition. This removes the potential for error and saves you the inconvenience of having to find and replace each occurrence of the constant in your program.
The following example shows the correct use of EQU to define symbolic integers.
column  EQU     80              ; Constant - 80
row     EQU     25              ; Constant - 25
screen  EQU     column * row    ; Constant - 2000
line    EQU     row             ; Constant - 25
        .DATA
        .CODE
        .
        .
        .
        mov     cx, column
        mov     bx, line
The value of a symbol defined with the = directive can be different at different places in the source code. However, a constant value is assigned during assembly for each use, and that value does not change at run time. The syntax for the = directive is: symbol = expression
Size of Constants
The default word size for MASM 6.1 expressions is 32 bits. This behavior can be modified using OPTION EXPR16 or OPTION M510. Both of these options set the expression word size to 16 bits, but OPTION M510 affects other assembler behavior as well (see Appendix A). It is illegal to change the expression word size once it has been set with OPTION M510, OPTION EXPR16, or OPTION EXPR32. However, you can repeat the same directive in your source code as often as you wish. You can place the same directive in every include file, for example.
Operators
Operators are used in expressions. The value of the expression is determined at assembly time and does not change when the program runs. Operators should not be confused with processor instructions. The reserved word ADD is an instruction; the plus sign (+) is an operator. For example, Amount+2 illustrates a valid use of the plus operator (+). It tells the assembler to add 2 to the constant value Amount, which might be a value or an address. Contrast this operation, which occurs at assembly time, with the processor's ADD instruction. ADD tells the processor at run time to add two numbers and store the result.
The assembler evaluates expressions that contain more than one operator according to the following rules:
• Operations in parentheses are performed before adjacent operations.
• Binary operations of highest precedence are performed first.
• Operations of equal precedence are performed from left to right.
• Unary operations of equal precedence are performed right to left.
Table 1.3 lists the order of precedence for all operators. Operators on the same line have equal precedence.
Table 1.3 Operator Precedence

Precedence   Operators
1            ( ), [ ]
2            LENGTH, SIZE, WIDTH, MASK, LENGTHOF, SIZEOF
3            . (structure-field-name operator)
4            : (segment-override operator), PTR
5            LROFFSET, OFFSET, SEG, THIS, TYPE
6            HIGH, HIGHWORD, LOW, LOWWORD
7            +, - (unary)
8            *, /, MOD, SHL, SHR
9            +, - (binary)
10           EQ, NE, LT, LE, GT, GE
11           NOT
12           AND
13           OR, XOR
14           OPATTR, SHORT, .TYPE
Data Types
A data type describes a set of values. A variable of a given type can have any of a set of values within the range specified for that type. The intrinsic types for MASM 6.1 are BYTE, SBYTE, WORD, SWORD, DWORD, SDWORD, FWORD, QWORD, and TBYTE. These types define integers and binary coded decimals (BCDs), as discussed in Chapter 6. The signed data types SBYTE, SWORD, and SDWORD work in conjunction with directives such as INVOKE (for calling procedures) and .IF (introduced in Chapter 7). The REAL4, REAL8, and REAL10 directives define floating-point types. (See Chapter 6.)
Versions of MASM prior to 6.0 had separate directives for types and initializers. For example, BYTE is a type and DB is the corresponding initializer. The distinction does not apply in MASM 6.1. You can use any type (intrinsic or user-defined) as an initializer. MASM does not have specific types for arrays and strings. However, you can treat a sequence of data units as arrays, and character or byte sequences as strings. (See Arrays and Strings in Chapter 5.) Types can also have attributes such as langtype and distance (NEAR and FAR). For information on these attributes, see Declaring Parameters with the PROC Directive in Chapter 7. You can also define your own types with STRUCT, UNION, and RECORD. The types have fields that contain string or numeric data, or records that contain bits. These data types are similar to the user-defined data types in high-level languages such as C, Pascal, and FORTRAN. (See Chapter 5, Defining and Using Complex Data Types.) You can define new types, including pointer types, with the TYPEDEF directive. TYPEDEF assigns a qualifiedtype (explained in the following) to a typename of your choice. This lets you build new types with descriptive names of your choosing, making your programs more readable. For example, the following statement makes the symbol CHAR a synonym for the intrinsic type BYTE:
CHAR TYPEDEF BYTE
The qualifiedtype is any type or pointer to a type of the form:
[[distance]] PTR [[qualifiedtype]]
where distance is NEAR, FAR, or any distance modifier. (For more information on distance, see "Declaring Parameters with the PROC Directive" in Chapter 7.) The qualifiedtype can also be any type previously defined with TYPEDEF. For example, if you use TYPEDEF to create an alias for BYTE (say, CHAR, as in the preceding example), you can use CHAR as a qualifiedtype when defining the pointer type PCHAR, like this:
CHAR    TYPEDEF BYTE
PCHAR   TYPEDEF PTR CHAR
The typename CHAR in the first line becomes a qualifiedtype in the second line. Use of the TYPEDEF directive to define pointers is explained in Accessing Data with Pointers and Addresses in Chapter 3.
Since distance and qualifiedtype are optional syntax elements, you can use variables of type PTR or FAR PTR. You can also define procedure prototypes with qualifiedtype. For more information about procedure prototypes, see Declaring Procedure Prototypes in Chapter 7. These rules govern the use of qualifiedtype:
• The only component of a qualifiedtype definition that can be forward-referenced is a structure or union type identifier.
• If you do not specify distance, the assembler assumes a distance that corresponds to the memory model. The assumed distance is NEAR for tiny, small, and medium models, and FAR for other models.
• If you do not specify a memory model with .MODEL, the assembler assumes SMALL model (and therefore NEAR pointers).
You can use a qualifiedtype in seven places:

Use                                                Example
In procedure arguments                             proc1 PROC pMsg:PTR BYTE
In prototype arguments                             proc2 PROTO pMsg:FAR PTR WORD
With local variables declared inside procedures    LOCAL pMsg:PTR
With the LABEL directive                           TempMsg LABEL PTR WORD
With the EXTERN and EXTERNDEF directives           EXTERN pMsg:FAR PTR BYTE
                                                   EXTERNDEF MyProc:PROTO
With the COMM directive                            COMM var1:WORD:3
With the TYPEDEF directive                         PBYTE TYPEDEF PTR BYTE
                                                   PFUNC TYPEDEF PROTO MyProc
Defining Pointer Types with TYPEDEF in Chapter 3 shows ways to write a TYPEDEF type for a qualifiedtype. Attributes such as NEAR and FAR can also apply to a qualifiedtype. You can determine an accurate definition for TYPEDEF and qualifiedtype from the BNF grammar definitions given in Appendix B. The BNF grammar defines each component of the syntax for any directive, showing the recursive properties of components such as qualifiedtype.
Registers
All processors in the 8086 family have the same base set of 16-bit registers. Each processor can treat certain registers as two separate 8-bit registers. The 80386/486 processors have extended 32-bit registers. To maintain compatibility with their predecessors, 80386/486 processors can access their registers as 16-bit or, where appropriate, as 8-bit values. Figure 1.3 shows the registers common to all the 8086-based processors. Each register has its own special uses and limitations.
Figure 1.3
Registers for 8088-80286 Processors
80386/486 Only
The 80386/486 processors use the same 8-bit and 16-bit registers used by the rest of the 8086 family. All of these registers can be further extended to 32 bits, except segment registers, which always occupy 16 bits. The extended register names begin with the letter E. For example, the 32-bit extension of AX is EAX. The 80386/486 processors have two additional segment registers, FS and GS. Figure 1.4 shows the extended registers of the 80386/486.
Figure 1.4
Extended Registers for the 80386/486 Processors
Segment Registers
At run time, all addresses are relative to one of four segment registers: CS, DS, SS, or ES. (The 80386/486 processors add two more: FS and GS.) These registers, their segments, and their purposes include:
Register and Segment   Purpose
CS (Code Segment)      Contains processor instructions and their immediate operands.
DS (Data Segment)      Normally contains data allocated by the program.
SS (Stack Segment)     Contains the program stack for use by PUSH, POP, CALL, and RET.
ES (Extra Segment)     References secondary data segment. Used by string instructions.
FS, GS                 Provides extra segments on the 80386/486.
General-Purpose Registers
The AX, DX, CX, BX, BP, DI, and SI registers are 16-bit general-purpose registers, used for temporary data storage. Since the processor accesses registers more quickly than it accesses memory, you can make your programs run faster by keeping the most frequently used data in registers.
The 8086-based processors do not perform memory-to-memory operations. For example, the processor cannot directly copy a variable from one location in memory to another. You must first copy from memory to a register, then from the register to the new memory location. Similarly, to add two variables in memory, you must first copy one variable to a register, then add the contents of the register to the other variable in memory.
The processor can access four of the general registers (AX, DX, CX, and BX) either as two 8-bit registers or as a single 16-bit register. The AH, DH, CH, and BH registers represent the high-order 8 bits of the corresponding registers. Similarly, AL, DL, CL, and BL represent the low-order 8 bits of the registers.
The 80386/486 processors can extend all the general registers to 32 bits, though as Figure 1.4 shows, you cannot treat the upper 16 bits as a separate register as you can the lower 16 bits. To use EAX as an example, you can directly reference the low byte as AL, the next lowest byte as AH, and the low word as AX. To access the high word of EAX, however, you must first shift the upper 16 bits into the lower 16 bits.
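The relationship between EAX and its named parts is a matter of masking and shifting. The following Python sketch models it for illustration; split_eax is a hypothetical name, and the "high word" entry has no register name of its own, which is exactly why a shift is needed to reach it.

```python
def split_eax(eax: int) -> dict[str, int]:
    """Show which bits of a 32-bit EAX value each register name refers to."""
    eax &= 0xFFFFFFFF  # keep the model to 32 bits
    return {
        "AX": eax & 0xFFFF,        # low word, directly addressable
        "AH": (eax >> 8) & 0xFF,   # second-lowest byte
        "AL": eax & 0xFF,          # low byte
        "high word": eax >> 16,    # reachable only by shifting it down
    }


parts = split_eax(0x12345678)
for name, value in parts.items():
    print(f"{name:>9}: {value:X}")
```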
Special-Purpose Registers
The 8086 family of processors has two additional registers, SP and IP, whose values are changed automatically by the processor.
SP (Stack Pointer)
The SP register points to the current location within the stack segment. Pushing a value onto the stack decreases the value of SP by two; popping from the stack increases the value of SP by two. Thirty-two-bit operands on 80386/486 processors increase or decrease SP by four instead of two. The CALL and INT instructions store the return address on the stack and reduce SP accordingly. Return instructions retrieve the stored address from the stack and reset SP to its value before the call. SP can also be adjusted with instructions such as ADD. The program stack is described in detail in Chapter 3.
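The way SP moves can be traced with a small model. The following Python sketch is an illustration only; the class name StackPointerModel and the starting value 0100h are hypothetical, and the model tracks only the pointer, not the values stored on the stack.

```python
class StackPointerModel:
    """Track how pushes and pops change a 16-bit stack pointer."""

    def __init__(self, sp: int = 0x0100):  # starting value is an assumption
        self.sp = sp & 0xFFFF

    def push(self, operand_bits: int = 16) -> None:
        # The stack grows downward: a push decreases SP by the operand size.
        self.sp = (self.sp - operand_bits // 8) & 0xFFFF

    def pop(self, operand_bits: int = 16) -> None:
        # A pop increases SP by the operand size.
        self.sp = (self.sp + operand_bits // 8) & 0xFFFF


stack = StackPointerModel()
stack.push()      # 16-bit operand: SP drops by two
stack.push(32)    # 32-bit operand: SP drops by four
print(f"{stack.sp:04X}")
```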
IP (Instruction Pointer)
The IP register always contains the address of the next instruction to be executed. You cannot directly access or change the instruction pointer. However, instructions that control program flow (such as calls, jumps, loops, and interrupts) automatically change the instruction pointer.
Flags Register
The 16 bits in the flags register control the execution of certain instructions and reflect the current status of the processor. In 80386/486 processors, the flags register is extended to 32 bits. Some bits are undefined, so there are actually 9 flags for real mode, 11 flags (including a 2-bit flag) for 80286 protected mode, 13 for the 80386, and 14 for the 80486. The extended flags register of the 80386/486 is sometimes called Eflags. Figure 1.5 shows the bits of the 32-bit flags register for the 80386/486. Earlier 8086-family processors use only the lower word. The unmarked bits are reserved for processor use, and should not be modified.
Figure 1.5 Flags for 8088-80486 Processors
In the following descriptions and throughout this book, "set" means a bit value of 1, and "cleared" means the bit value is 0. The nine flags common to all 8086-family processors, starting with the low-order flags, include:
Flag               Description
Carry              Set if an operation generates a carry to or a borrow from a destination operand.
Parity             Set if the low-order bits of the result of an operation contain an even number of set bits.
Auxiliary Carry    Set if an operation generates a carry to or a borrow from the low-order 4 bits of an operand. This flag is used for binary coded decimal (BCD) arithmetic.
Zero               Set if the result of an operation is 0.
Sign               Equal to the high-order bit of the result of an operation (0 is positive, 1 is negative).
Trap               If set, the processor generates a single-step interrupt after each instruction. A debugging program can use this feature to execute a program one instruction at a time.
Interrupt Enable   If set, interrupts are recognized and acted on as they are received. The bit can be cleared to turn off interrupt processing temporarily.
Direction          If set, string operations process down from high addresses to low addresses. If cleared, string operations process up from low addresses to high addresses.
Overflow           Set if the result of an operation is too large or small to fit in the destination operand.
Although all flags serve a purpose, most programs require only the carry, zero, sign, and direction flags.
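The arithmetic flags can be worked out by hand for any addition. The following Python sketch models a 16-bit add for illustration; add16_flags is a hypothetical name, and the model assumes that "low-order bits" in the parity description means the low-order 8 bits of the result. The trap, interrupt enable, and direction flags are control flags and are not affected by arithmetic, so they are left out.

```python
def add16_flags(a: int, b: int) -> tuple[int, dict[str, int]]:
    """Add two 16-bit values and report the resulting status flags."""
    a &= 0xFFFF
    b &= 0xFFFF
    total = a + b
    result = total & 0xFFFF
    flags = {
        "carry": int(total > 0xFFFF),
        # Assumption: parity looks at the low byte only.
        "parity": int(bin(result & 0xFF).count("1") % 2 == 0),
        "auxiliary_carry": int((a & 0xF) + (b & 0xF) > 0xF),
        "zero": int(result == 0),
        "sign": result >> 15,
        # Signed overflow: both operands differ in sign from the result.
        "overflow": int(((a ^ result) & (b ^ result) & 0x8000) != 0),
    }
    return result, flags


result, flags = add16_flags(0x7FFF, 0x0001)
print(f"{result:04X}", flags)
```

Adding 1 to 7FFFh shows the difference between carry and overflow: the unsigned result fits in 16 bits, so carry stays clear, but the signed result does not, so overflow and sign are both set.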
Statements
Statements are the line-by-line components of source files. Each MASM statement specifies an instruction or directive for the assembler. Statements have up to four fields, as shown here:
[[name:]] [[operation]] [[operands]] [[;comment]]
The following list explains each field:
Field       Purpose
name        Labels the statement, so that instructions elsewhere in the program can refer to the statement by name. The name field can label a variable, type, segment, or code location.
operation   Defines the action of the statement. This field contains either an instruction or an assembler directive.
operands    Lists one or more items on which the instruction or directive operates.
comment     Provides a comment for the programmer. Comments are for documentation only; they are ignored by the assembler.
The following line contains all four fields:

mainlp: mov ax, 7 ; Load AX with the value 7
Here, mainlp is the label, mov is the operation, and ax and 7 are the operands, separated by a comma. The comment follows the semicolon. All fields are optional, although certain directives and instructions require an entry in the name or operand field. Some instructions and directives place restrictions on the choice of operands. By default, MASM is not case sensitive. Each field (except the comment field) must be separated from other fields by white-space characters (spaces or tabs). MASM also requires code labels to be followed by a colon, operands to be separated by commas, and comments to be preceded by a semicolon. A logical line can contain up to 512 characters and occupy one or more physical lines. To extend a logical line into two or more physical lines, put the backslash character (\) as the last non-whitespace character before the comment or end of the line. You can place a comment after the backslash as shown in this example:
.IF     (x > 0)         \       ; X must be positive
     && (ax > x)        \       ; Result from function must be > x
     && (cx == 0)               ; Check loop counter, too
        mov     dx, 20h
.ENDIF
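The continuation rule can be pictured as a preprocessing step that joins physical lines into logical lines. The following Python sketch is an illustration only, not how the assembler is implemented; join_logical_lines is a hypothetical name, and the model deliberately ignores semicolons inside string literals and does not enforce the 512-character limit.

```python
def join_logical_lines(physical_lines: list[str]) -> list[str]:
    """Join physical lines that end in a backslash into logical lines."""
    logical = []
    parts = []
    for raw in physical_lines:
        # Drop the comment (simplification: a ';' always starts a comment).
        code = raw.split(";", 1)[0].strip()
        if code.endswith("\\"):
            # Backslash is the last non-whitespace character: continue.
            parts.append(code[:-1].strip())
        else:
            parts.append(code)
            logical.append(" ".join(p for p in parts if p))
            parts = []
    if parts:  # a trailing continuation with nothing after it
        logical.append(" ".join(p for p in parts if p))
    return logical


source = [
    ".IF (x > 0) \\ ; X must be positive",
    "&& (ax > x) \\ ; Result from function must be > x",
    "&& (cx == 0) ; Check loop counter, too",
    "mov dx, 20h",
    ".ENDIF",
]
for line in join_logical_lines(source):
    print(line)
```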
Multiline comments can also be specified with the COMMENT directive. The assembler ignores all text and code between the delimiters or on the same line as the delimiters. This example illustrates the use of COMMENT.
COMMENT ^               The assembler
                        ignores this text
^       mov     ax, 1   and this code
The Assembly Process

Creating and running an executable file involves four steps:
1. Assembling the source code into an object file
2. Linking the object file with other modules or libraries into an executable program
3. Loading the program into memory
4. Running the program
Once you have written your assembly-language program, MASM provides several options for assembling it. The OPTION directive has several different arguments that let you control the way MASM assembles your programs. Conditional assembly allows you to create one source file that can generate a variety of programs, depending on the status of various conditional-assembly statements.
Generating and Running Executable Programs

This section briefly lists all the actions that take place during each of the assembly steps. You can change the behavior of some of these actions in various ways, such as using macros instead of procedures, or using the OPTION directive or conditional assembly. The other chapters in this book include specific programming methods; this section simply gives you an overview.
Assembling
The ML.EXE program does two things to create an executable program. First, it assembles the source code into an intermediate object file. Second, it calls the linker, LINK.EXE, which links the object files and libraries into an executable program. At assembly time, the assembler:
• Evaluates conditional-assembly directives, assembling if the conditions are true.
• Expands macros and macro functions.
• Evaluates constant expressions such as MYFLAG AND 80H, substituting the calculated value for the expression.
• Encodes instructions and nonaddress operands. For example, mov cx, 13 can be encoded at assembly time because the instruction does not access memory.
• Saves memory offsets as offsets from their segments.
• Places segments and segment attributes in the object file.
• Saves placeholders for offsets and segments (relocatable addresses).
• Outputs a listing if requested.
• Passes messages (such as INCLUDELIB and .DOSSEG) directly to the linker.
For information about conditional assembly, see Conditional Directives in this chapter; for macros, see Chapter 9. Further details about segments and offsets are included in Chapters 2 and 3. Assembly listings are explained in Appendix C.
Linking
Once your source code is assembled, the resulting object file is passed to the linker. At this point, the linker may combine several object files into an executable program. The linker:
• Combines segments according to the instructions in the object files, rearranging the positions of segments that share the same class or group.
• Fills in placeholders for offsets (relocatable addresses).
• Writes relocations for segments into the header of .EXE files (but not .COM files).
• Writes the result as an executable program file.
Classes and groups are defined in Defining Segment Groups in Chapter 2. Segments and offsets are explained in Chapter 3, Using Addresses and Pointers.
Loading
After loading the executable file into memory, the operating system:
• Creates the program segment prefix (PSP) header in memory.
• Allocates memory for the program, based on the values in the PSP.
• Loads the program.
• Calculates the correct values for absolute addresses from the relocation table.
• Loads the segment registers SS, CS, DS, and ES with values that point to the proper areas of memory.
For information about segment registers, the instruction pointer (IP), and the stack pointer (SP), see Registers earlier in this chapter. For more information on the PSP see Help or an MS-DOS reference.
Running
To run your program, MS-DOS jumps to the program's first instruction. Some program operations, such as resolving indirect memory operands, cannot be handled until the program runs. For a description of indirect references, see "Indirect Operands" in Chapter 7.
Using the OPTION Directive

The OPTION directive lets you modify global aspects of the assembly process. With OPTION, you can change command-line options and default arguments. These changes affect only statements that follow the OPTION keyword.
For example, you may have MASM code in which the first character of a variable, macro, structure, or field name is a dot (.). Since a leading dot causes MASM 6.1 to generate an error, you can use this statement in your program:
OPTION DOTNAME
This enables the use of the dot for the first character. Changes made with OPTION override any corresponding command-line option. For example, suppose you compile a module with this command line (which enables M510 compatibility):
ML /Zm TEST.ASM
The assembler disables M510 compatibility options for all code following this statement:
OPTION NOM510
The following lists explain each of the arguments for the OPTION directive. Where appropriate, an underline identifies the default argument. If you wish to place more than one OPTION statement on a line, separate them by commas. Options for M510 compatibility include:
Argument                     Description
CASEMAP: maptype             CASEMAP:NONE (or /Cx) causes internal symbol recognition to be case sensitive and causes the case of identifiers in the .OBJ file to be the same as specified in the EXTERNDEF, PUBLIC, or COMM statement. The default is CASEMAP:NOTPUBLIC (or /Cp). It specifies case insensitivity for internal symbol recognition and the same behavior as CASEMAP:NONE for case of identifiers in .OBJ files. CASEMAP:ALL (/Cu) specifies case insensitivity for identifiers and converts all identifier names to uppercase.
DOTNAME | NODOTNAME          Enables the use of the dot (.) as the leading character in variable, macro, structure, union, and member names.
M510 | NOM510                Sets all features to be compatible with MASM version 5.1, disabling the SCOPED argument and enabling OLDMACROS, DOTNAME, and OLDSTRUCTS. OPTION M510 conditionally sets other arguments for the OPTION directive. For more information on using OPTION M510, see Appendix A.
OLDMACROS | NOOLDMACROS      Enables the version 5.1 treatment of macros. MASM 6.1 treats macros differently.
OLDSTRUCTS | NOOLDSTRUCTS    Enables compatibility with MASM 5.1 for treatment of structure members. See Chapter 5 for information on structures.
SCOPED | NOSCOPED            Guarantees that all labels inside procedures are local to the procedure when SCOPED (the default) is enabled.
SETIF2: TRUE | FALSE         If TRUE, .ERR2 statements and IF2 and ELSEIF2 conditional blocks are evaluated on every pass. If FALSE, they are not evaluated. If SETIF2 is not specified (or implied), .ERR2, IF2, and ELSEIF2 expressions cause an error. Both the /Zm command-line argument and OPTION M510 imply SETIF2:TRUE.
Options for procedure use include:

Argument
    Description

LANGUAGE: langtype
    Specifies the default language type (C, PASCAL, FORTRAN, BASIC, SYSCALL, or STDCALL) to be used with PROC, EXTERN, and PUBLIC. This use of the OPTION directive overrides the .MODEL directive but is normally used when .MODEL is not given.

EPILOGUE: macroname
    Instructs the assembler to call macroname to generate a user-defined epilogue instead of the standard epilogue code when a RET instruction is encountered. See Chapter 7.

PROLOGUE: macroname
    Instructs the assembler to call macroname to generate a user-defined prologue instead of generating the standard prologue code. See Chapter 7.

PROC: visibility
    Lets you explicitly set the default visibility as PUBLIC, EXPORT, or PRIVATE.

Other options include:

Argument
    Description

EXPR16 | EXPR32
    Sets the expression word size to 16 or 32 bits. The default is 32 bits. The M510 argument to the OPTION directive sets the word size to 16 bits. Once set with the OPTION directive, the expression word size cannot be changed.

EMULATOR | NOEMULATOR
    Controls the generation of floating-point instructions. The NOEMULATOR option generates the coprocessor instructions directly. The EMULATOR option generates instructions with special fixup records for the linker so that the Microsoft floating-point emulator, supplied with other Microsoft languages, can be used. It produces the same result as setting the /Fpi command-line option. You can set this option only once per module.

LJMP | NOLJMP
    Enables automatic conditional-jump lengthening. For information about conditional-jump lengthening, see Chapter 7.

NOKEYWORD:<keywordlist>
    Disables the specified reserved words. For an example of the syntax for this argument, see Reserved Words in this chapter.

NOSIGNEXTEND
    Overrides the default sign-extended opcodes for the AND, OR, and XOR instructions and generates the larger non-sign-extended forms of these instructions. Provided for compatibility with NEC V25 and NEC V35 controllers.

OFFSET: offsettype
    Determines the result of OFFSET operator fixups. SEGMENT sets the defaults for fixups to be segment-relative (compatible with MASM 5.1). GROUP, the default, generates fixups relative to the group (if the label is in a group). FLAT causes fixups to be relative to a flat frame. (The .386 mode must be enabled to use FLAT.) See Appendix A.

READONLY | NOREADONLY
    Enables checking for instructions that modify code segments, thereby guaranteeing that read-only code segments are not modified. Same as the /p command-line option of MASM 5.1, except that it affects only segments with at least one assembly instruction, not all segments. The argument is useful for protected mode programs, where code segments must remain read-only.

SEGMENT: segSize
    Allows global default segment size to be set. Also determines the default address size for external symbols defined outside any segment. The segSize can be USE16, USE32, or FLAT.
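
As a rough illustration of the arguments listed above, the following sketch shows a few OPTION statements as they might appear near the top of a module. The particular combination is an assumption chosen for the example, not a recommended configuration.

```asm
; Illustrative only: a handful of OPTION statements near the top of a module.
        OPTION  CASEMAP:NONE            ; Case-sensitive symbols (same effect as /Cx)
        OPTION  PROC:PRIVATE            ; Procedures are private unless made PUBLIC
        OPTION  NOKEYWORD:<STR>         ; Release the reserved word STR for use as a name
        OPTION  LJMP, OFFSET:GROUP      ; Several arguments on one line, comma-separated
```

Each statement affects only the source lines that follow it, so statements like these normally precede the code they are meant to govern.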
Conditional Directives
MASM 6.1 provides conditional-assembly directives and conditional-error directives. Conditional-assembly directives let you test for a specified condition and assemble a block of statements if the condition is true. Conditional-error directives allow you to test for a specified condition and generate an assembly error if the condition is true. Both kinds of conditional directives test assembly-time conditions, not run-time conditions. You can test only expressions that evaluate to constants during assembly. For a list of the predefined symbols often used in conditional assembly, see Predefined Symbols, earlier in this chapter.
Conditional-Assembly Directives
The IF and ENDIF directives enclose the conditional statements. The optional ELSEIF and ELSE blocks follow the IF directive. There are many forms of the IF and ELSE directives. Help provides a complete list. The following statements show the syntax for the IF directives. The syntax for the other conditional-assembly directives follows the same form.

IF expression1
    ifstatements
[[ELSEIF expression2
    elseifstatements]]
[[ELSE
    elsestatements]]
ENDIF

The statements within an IF block can be any valid instructions, including other conditional blocks, which in turn can contain any number of ELSEIF blocks. ENDIF ends the block.

MASM assembles the statements following the IF directive only if the corresponding condition is true. If the condition is not true and the block contains an ELSEIF directive, the assembler checks to see if the corresponding condition is true. If so, it assembles the statements following the ELSEIF directive. If no IF or ELSEIF conditions are satisfied, the assembler processes only the statements following the ELSE directive.

For example, you may want to assemble a line of code only if your program defines a particular variable. In this example,
IFDEF   buffer
buff    BYTE    buffer DUP(?)
ENDIF
the assembler allocates buff only if buffer has been previously defined.
MASM 6.1 provides the directives IF1, IF2, ELSEIF1, and ELSEIF2 to grant assembly only on pass one or pass two. To use these directives, you must either enable 5.1 compatibility (with the /Zm command-line switch or OPTION M510) or set OPTION SETIF2:TRUE, as described in the previous section.

The following list summarizes the conditional-assembly directives:
The Directive              Grants Assembly If
IF expression              expression is true (nonzero)
IFE expression             expression is false (zero)
IFDEF name                 name has been previously defined
IFNDEF name                name has not been previously defined
IFB argument*              argument is blank
IFNB argument*             argument is not blank
IFIDN[I] arg1, arg2*       arg1 equals arg2
IFDIF[I] arg1, arg2*       arg1 does not equal arg2

The optional I suffix (IFIDNI and IFDIFI) makes comparisons insensitive to differences in case.

* Used only in macros.
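
The following fragment is a minimal sketch of two of the directives listed above. The names DEBUGLEVEL, bufsize, and SaveReg are hypothetical and exist only for this illustration.

```asm
; DEBUGLEVEL is a hypothetical equate; it could also be supplied
; from the ML command line.
DEBUGLEVEL EQU 2

IF DEBUGLEVEL EQ 0
bufsize EQU     128
ELSEIF DEBUGLEVEL EQ 1
bufsize EQU     256
ELSE
bufsize EQU     512                     ; Assembled here, since DEBUGLEVEL is 2
ENDIF

; Hypothetical macro: pushes the named register, or AX if none is given.
SaveReg MACRO   reg
    IFB <reg>
        push    ax
    ELSE
        push    reg
    ENDIF
ENDM
```

Because every test is resolved during assembly, only one branch of each block contributes code or symbols to the object file.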
Conditional-Error Directives
You can use conditional-error directives to debug programs and check for assembly-time errors. By inserting a conditional-error directive at a key point in your code, you can test assembly-time conditions at that point. You can also use conditional-error directives to test for boundary conditions in macros. Like other severe errors, those generated by conditional-error directives cause the assembler to return a nonzero exit code. If MASM encounters a severe error during assembly, it does not generate the object module. For example, the .ERRNDEF directive produces an error if the program has not defined a given label. In the following example, .ERRNDEF makes sure a label called publevel actually exists.
.ERRNDEF publevel
IF      publevel LE 2
PUBLIC  var1, var2
ELSE
PUBLIC  var1, var2, var3
ENDIF
The conditional-error directives use the syntax given in the previous section. The following list summarizes the conditional-error directives. Note their close correspondence with the previous list of conditional-assembly directives.
The Directive              Generates an Error
.ERR                       Unconditionally where it occurs in the source file. Usually placed within a conditional-assembly block.
.ERRE expression           If expression is false (zero).
.ERRNZ expression          If expression is true (nonzero).
.ERRDEF name               If name has been defined.
.ERRNDEF name              If name has not been defined.
.ERRB argument*            If argument is blank.
.ERRNB argument*           If argument is not blank.
.ERRIDN[I] arg1, arg2*     If arg1 equals arg2.
.ERRDIF[I] arg1, arg2*     If arg1 does not equal arg2.

The optional I suffix (.ERRIDNI and .ERRDIFI) makes comparisons insensitive to case.

* Used only in macros.
Two special conditional-error directives, .ERR1 and .ERR2, generate an error only on pass one or pass two. To use these directives, you must either enable 5.1 compatibility (with the /Zm command-line switch or OPTION M510) or set OPTION SETIF2:TRUE, as described in the previous section.
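
As a hedged sketch of how the conditional-error directives might guard assembly-time assumptions, consider the fragment below. TABLESIZE and the AllocBytes macro are hypothetical names introduced for the example.

```asm
; TABLESIZE is a hypothetical equate checked at assembly time.
TABLESIZE EQU   64
        .ERRE   TABLESIZE LE 256        ; Error if the expression is false (zero)
        .ERRNZ  TABLESIZE MOD 2         ; Error if the expression is nonzero (odd size)

; Hypothetical macro that refuses to expand without an argument.
AllocBytes MACRO count
        .ERRB   <count>                 ; Error if count is blank
        BYTE    count DUP (?)
ENDM
```

With the value shown, neither check fires. Changing TABLESIZE to 300 or to 63 would stop the assembly before an object module is produced.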
Chapter 2
Organizing Segments
Understanding segments is an essential part of programming in assembly language. In the family of 8086-based processors, the term segment has two meanings:
• A block of memory of discrete size, called a physical segment. The number of bytes in a physical memory segment is 64K for 16-bit processors or 4 gigabytes for 32-bit processors.
• A variable-sized block of memory, called a logical segment, occupied by a program's code or data.
As you read this chapter, the distinction between the two definitions will become clear. The adjectives physical and logical are not often used when speaking of segments. The beginning programmer is left to infer from context which definition applies. Fortunately, this is not difficult, and a distinction is often not required.

This chapter begins with a close look at physical memory segments. This lays the foundation for understanding logical segments, which form the subject of most of the following sections.

The section "Using Simplified Segment Directives" explains how to begin, end, and organize segments. It also explains how to access far data and code with simplified segment directives.

The next section, "Using Full Segment Definitions," describes how to order, combine, and divide segments, and how to use the SEGMENT directive to define full segments. It also explains how to create a segment group so that you can use one segment address to access all the data.

Most of the information in this chapter also applies to writing modules to be called from other programs. Exceptions are noted when they apply. For more information about multiple-module programming, see Chapter 8, "Sharing Data and Procedures Among Modules and Libraries."
Physical Memory Segments

As explained in Chapter 1, a physical segment can begin only at memory locations evenly divisible by 16, including address 0. Intel calls such locations paragraphs. You can easily recognize a paragraph location because its hexadecimal address always ends with 0, as in 10000h or 2EA70h. The 8086/286 processors allow segments 64K in size, the largest range a 16-bit offset can address. The 80386/486 processors still adhere to the 64K limit when running in real mode. In protected mode, however, they use 32-bit registers that can hold addresses up to 4 gigabytes.

Segmented architecture presents certain hurdles for the assembly-language programmer. For small programs, the limitations lose importance. Code and data each occupy less than 64K and reside in individual segments. A simple offset locates each variable or instruction within a segment.

Larger programs, however, must contend with problems of segmented memory areas. If data occupies two or more segments, the program must specify both segment and offset to access a variable. When the data forms a continuous stream across segments (such as the text in a word processor's workspace), the problems become more acute. Whenever it adds or deletes text in the first segment, the word processor must seamlessly move data back and forth over the boundaries of each following segment.

The problem of segment boundaries disappears in the so-called flat address space of 32-bit protected mode. Although segments still exist, they easily hold all the code and data of the largest programs. Even a very large program becomes in effect a small application, able to reach all code and data with a single offset address.
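
The paragraph rule described at the start of this section can be worked through with assembly-time constants. The sketch below assumes real mode, where the processor forms a physical address by multiplying the segment value by 16 and adding the offset. The names and values are made up for the arithmetic.

```asm
; Illustrative constants only; real-mode addressing is assumed.
segval  EQU     2EA7h                   ; Segment value as held in a segment register
offval  EQU     0123h                   ; 16-bit offset within the segment
physadr EQU     (segval * 16) + offval  ; 2EA70h + 0123h = 2EB93h
```

Multiplying by 16 appends a hexadecimal zero to the segment value, which is why every segment starts at a paragraph address such as 2EA70h.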
Logical Segments
Logical segments contain the three components of a program: code, data, and stack. MASM organizes the three parts for you so they occupy physical segments of memory. The segment registers CS, DS, and SS contain the addresses of the physical memory segments where the logical segments reside. You can define segments in two ways: with simplified segment directives and with full segment definitions. You can also use both kinds of segment definitions in the same program. Simplified segment directives hide many of the details of segment definition and assume the same conventions used by Microsoft high-level languages. (See the following section, Using Simplified Segment Directives.) The simplified segment directives generate necessary code, specify segment attributes, and arrange segment order.
Full segment definitions require more complex syntax but provide more complete control over how the assembler generates segments. (See Using Full Segment Definitions later in this chapter.) If you use full segment definitions, you must write code to handle all the tasks performed automatically by the simplified segment directives.
Using Simplified Segment Directives

Structuring a MASM program using simplified segments requires use of several directives to assign standard names, alignment, and attributes to the segments in your program. These directives define the segments in such a way that linking with Microsoft high-level languages is easy.

The simplified segment directives are .MODEL, .CODE, .CONST, .DATA, .DATA?, .FARDATA, .FARDATA?, .STACK, .STARTUP, and .EXIT. The following sections discuss these directives and the arguments they take.

MASM programs consist of modules made up of segments. Every program written only in MASM has one main module, where program execution begins. This main module can contain code, data, or stack segments defined with all of the simplified segment directives. Any additional modules should contain only code and data segments. Every module that uses simplified segments must, however, begin with the .MODEL directive.

The following example shows the structure of a main module using simplified segment directives. It uses the default processor (8086) and the default stack distance (NEARSTACK). Additional modules linked to this main program would use only the .MODEL, .CODE, and .DATA directives and the END statement.
; This is the structure of a main module
; using simplified segment directives

        .MODEL  small, c        ; This statement is required before you
                                ; can use other simplified segment directives
        .STACK                  ; Use default 1-kilobyte stack
        .DATA                   ; Begin data segment
                                ; Place data declarations here
        .CODE                   ; Begin code segment
        .STARTUP                ; Generate start-up code
                                ; Place instructions here
        .EXIT                   ; Generate exit code
        END
The .DATA and .CODE statements do not require any separate statements to define the end of a segment. They close the preceding segment and then open a new segment. The .STACK directive opens and closes the stack segment but does not close the current segment. The END statement closes the last segment and marks the end of the source code. It must be at the end of every module.
Defining Basic Attributes with .MODEL

The .MODEL directive defines the attributes that affect the entire module: memory model, default calling and naming conventions, operating system, and stack type. This directive enables use of simplified segments and controls the name of the code segment and the default distance for procedures.

You must place .MODEL in your source file before any other simplified segment directive. The syntax is:

.MODEL memorymodel [[, modeloptions]]

The memorymodel field is required and must appear immediately after the .MODEL directive. The use of modeloptions, which define the other attributes, is optional. The modeloptions must be separated by commas. You can also use equates passed from the ML command line to define the modeloptions.

The following list summarizes the memorymodel field and the modeloptions fields, which specify language and stack distance:
Field
    Description

Memory model
    TINY, SMALL, COMPACT, MEDIUM, LARGE, HUGE, or FLAT. Determines size of code and data pointers. This field is required.

Language
    C, BASIC, FORTRAN, PASCAL, SYSCALL, or STDCALL. Sets calling and naming conventions for procedures and public symbols.

Stack distance
    NEARSTACK or FARSTACK. Specifying NEARSTACK groups the stack segment into a single physical segment (DGROUP) along with data. SS is assumed to equal DS. FARSTACK does not group the stack with DGROUP; thus SS does not equal DS.

You can use no more than one reserved word from each field. The following examples show how you can combine various fields:
        .MODEL  small                   ; Small memory model
        .MODEL  large, c, farstack      ; Large memory model,
                                        ; C conventions,
                                        ; separate stack
        .MODEL  medium, pascal          ; Medium memory model,
                                        ; Pascal conventions,
                                        ; near stack (default)
The next four sections give more detail on each field.
Defining the Memory Model

MASM supports the standard memory models used by Microsoft high-level languages: tiny, small, medium, compact, large, huge, and flat. You specify the memory model with attributes of the same name placed after the .MODEL directive. With the exception of the flat model, which requires instructions specific to the 80386/486, your choice of a memory model does not limit the kind of instructions you can write. The memory model does, however, control segment defaults and determine whether data and code are near or far by default, as indicated in the following table.
Table 2.1 Attributes of Memory Models

Memory Model   Default Code   Default Data   Operating System   Data and Code Combined
Tiny           Near           Near           MS-DOS             Yes
Small          Near           Near           MS-DOS, Windows    No
Medium         Far            Near           MS-DOS, Windows    No
Compact        Near           Far            MS-DOS, Windows    No
Large          Far            Far            MS-DOS, Windows    No
Huge           Far            Far            MS-DOS, Windows    No
Flat           Near           Near           Windows NT         Yes

When writing assembler modules for a high-level language, you should use the same memory model as the calling language. Choose the smallest memory model available that can contain your data and code, since near references operate more efficiently than far references. The predefined symbol @Model returns the memory model, encoding memory models as integers 1 through 7. For more information on predefined symbols, see Predefined Symbols in Chapter 1. For an example of how to use them, see Help.
The seven memory models supported by MASM 6.1 fall into three groups, described in the following paragraphs.
Small, Medium, Compact, Large, and Huge Models

The traditional memory models recognized by many languages are small, medium, compact, large, and huge. Small model supports one data segment and one code segment. All data and code are near by default. Large model supports multiple code and multiple data segments. All data and code are far by default. Medium and compact models are in-between. Medium model supports multiple code and single data segments; compact model supports multiple data segments and a single code segment. Huge model implies individual data items larger than a single segment, but the implementation of huge data items must be coded by the programmer. Since the assembler provides no direct support for this feature, huge model is essentially the same as large model. In each of these models, you can override the default. For example, you can make large data items far in small model, or internal procedures near in large model.
Tiny Model
Tiny-model programs run only under MS-DOS. Tiny model places all data and code in a single segment. Therefore, the total program file size can occupy no more than 64K. The default is near for code and static data items; you cannot override this default. However, you can allocate far data dynamically at run time using MS-DOS memory allocation services. Tiny model produces MS-DOS .COM files. Specifying .MODEL tiny automatically sends the /TINY argument to the linker. Therefore, the /AT argument is not necessary with .MODEL tiny. However, /AT does not insert a .MODEL directive. It only verifies that there are no base or pointer fixups, and sends /TINY to the linker.
Flat Model
The flat memory model is a nonsegmented configuration available in 32-bit operating systems. It is similar to tiny model in that all code and data go in a single 32-bit segment. To write a flat model program, specify the .386 or .486 directive before .MODEL FLAT. All data and code (including system resources) are in a single 32-bit segment. The operating system automatically initializes segment registers at load time; you need to modify them only when mixing 16-bit and 32-bit segments in a single application. CS, DS, ES, and SS all occupy the supergroup FLAT. Addresses and pointers passed to system services are always 32-bit near addresses and pointers.
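
A flat-model module might be laid out as in the following sketch. The STDCALL language type, the variable, and the procedure name are assumptions made for the illustration; the ordering of .386 before .MODEL follows the requirement stated above.

```asm
; Sketch of a flat-model module; counter and AddOne are hypothetical names.
        .386                            ; Processor directive precedes .MODEL flat
        .MODEL  flat, stdcall
        .DATA
counter DWORD   0
        .CODE
AddOne  PROC
        inc     counter                 ; 32-bit near reference, no segment register loads
        mov     eax, counter
        ret
AddOne  ENDP
        END
```

The module never touches a segment register, since the operating system has already initialized them at load time.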
Choosing the Language Convention

The language option facilitates compatibility with high-level languages by determining the internal encoding for external and public symbol names, the code generated for procedure initialization and cleanup, and the order that arguments are passed to a procedure with INVOKE.

The PASCAL, BASIC, and FORTRAN conventions are identical. C and SYSCALL have the same calling convention but different naming conventions. Functions in the Windows API use the Pascal calling convention.

Procedure definitions (PROC) and high-level procedure calls (INVOKE) automatically generate code consistent with the calling convention of the specified language. The PROC, INVOKE, PUBLIC, and EXTERN directives all use the naming convention of the language. These directives follow the default language conventions from the .MODEL directive unless you specifically override the default. Use of these directives is explained in Controlling Program Flow, Chapter 7. You can also use the OPTION directive to set the language type. (See Using the OPTION Directive in Chapter 1.) If you do not specify a language type in any of the .MODEL, OPTION, EXTERN, PROC, INVOKE, or PROTO statements, the assembler generates an error.

The predefined symbol @Interface provides information about the language parameters. For a description of the bit flags, see Help.

For more information on calling and naming conventions, see Chapter 12, Mixed-Language Programming. For information about writing procedures and prototypes, see Chapter 7, Controlling Program Flow. For information on multiple-module programming, refer to Chapter 8, Sharing Data and Procedures Among Modules and Libraries.
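
The effect of the language type on PROC and INVOKE can be sketched with a small MS-DOS module. AddTwo and its parameters are hypothetical names; the module assumes the small model and the C convention.

```asm
; Sketch only: AddTwo, x, and y are hypothetical names.
        .MODEL  small, c                ; C naming and calling conventions by default
        .STACK
AddTwo  PROTO   x:WORD, y:WORD          ; Prototype inherits the C language type

        .CODE
        .STARTUP
        INVOKE  AddTwo, 1, 2            ; Pushes 2, then 1, calls _AddTwo, then
                                        ; removes the arguments from the stack
        .EXIT

AddTwo  PROC    x:WORD, y:WORD
        mov     ax, x
        add     ax, y                   ; Result is returned in AX
        ret
AddTwo  ENDP
        END
```

Changing the language type to pascal would reverse the push order, move stack cleanup into the procedure, and change the public name, without any change to the INVOKE line itself.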
Setting the Stack Distance

The NEARSTACK keyword places the stack segment in the group DGROUP along with the data segment. The .STARTUP directive then generates code to adjust SS:SP so that SS (Stack Segment register) holds the same address as DS (Data Segment register). If you do not use .STARTUP, you must make this adjustment or your program may fail to run. (For information about startup code, see Starting and Ending Code with .STARTUP and .EXIT, later in this chapter.) In this case, you can use DS to access stack items (including parameters and local variables) and SS to access near data. Furthermore, since stack items share the same segment address as near data, you can reliably pass near pointers to stack items.

The FARSTACK setting gives the stack a segment of its own. That is, SS does not equal DS. The default stack type, NEARSTACK, is a convenient setting for most programs. Use FARSTACK for special cases such as memory-resident programs and dynamic-link libraries (discussed in Chapters 10 and 11) when you cannot assume that the caller's stack is near. You can use the predefined symbol @Stack to determine if the stack location is DGROUP (for near stacks) or STACK (for far stacks).
Specifying a Processor and Coprocessor

MASM supports a set of directives for selecting processors and coprocessors. Once you select a processor, you must use only the instruction set for that processor. The default is the 8086 processor. If you always want your code to run on this processor, you do not need to add any processor directives.

To enable a different processor mode and the additional instructions available on that processor, use the directives .186, .286, .386, and .486. The instruction timings on a listing (see Appendix C, Generating and Reading Assembly Listings) correspond to whichever processor directive you select.

The .286P, .386P, and .486P directives enable the instructions available only at higher privilege levels in addition to the normal instruction set for the given processor. Generally, you don't need privileged instructions unless you are writing operating-systems code or device drivers.

In addition to enabling different instruction sets, the processor directives also affect the behavior of extended language features. For example, the INVOKE directive pushes arguments onto the stack. If the .286 directive is in effect, INVOKE takes advantage of operations possible only on 80286 and later processors.

Use the directives .8087 (the default), .287, .387, and .NO87 to select a math coprocessor instruction set. The .NO87 directive turns off assembly of all coprocessor instructions. Note that .486 also enables assembly of all coprocessor instructions because the 80486 processor has a complete set of coprocessor registers and instructions built into the chip. The processor directives imply the corresponding coprocessor directive. The coprocessor directives are provided to override the defaults.
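
The following fragment is a small sketch of how a processor directive changes what the assembler accepts. It assumes a small-model MS-DOS module and uses the shift instruction only as a convenient example.

```asm
; Sketch only: shows one instruction form gated by the processor directive.
        .MODEL  small
        .CODE
        shl     bx, 1                   ; 8086 (default): shift count must be 1 or CL
        .286                            ; Enable 80286 instructions from here on
        shl     bx, 4                   ; Immediate shift counts require .186 or later
        .8086                           ; Return to the default instruction set
```

Code assembled under .286 no longer runs on an 8086, so a directive like this is a statement about the target machine as much as about the source.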
Creating a Stack
The stack is the section of memory used for pushing or popping registers and storing the return address when a subroutine is called. The stack often holds temporary and local variables. If your main module is written in a high-level language, that language handles the details of creating a stack. Use the .STACK directive only when you write a main module in assembly language.
The .STACK directive creates a stack segment. By default, the assembler allocates 1K of memory for the stack. This size is sufficient for most small programs.
To create a stack of a size other than the default size, give .STACK a single numeric argument indicating stack size in bytes:
.STACK 2048 ; Use 2K stack
For a description of how stack memory is used with procedure calls and local variables, see Chapter 7, Controlling Program Flow.
Creating Data Segments

Programs can contain both near and far data. In general, you should place important and frequently used data in the near data area, where data access is faster. This area can get crowded, however, because in 16-bit operating systems the total amount of all near data in all modules cannot exceed 64K. Therefore, you may want to place infrequently used or particularly large data items in a far data segment.

The .DATA, .DATA?, .CONST, .FARDATA, and .FARDATA? directives create data segments. You can access the various segments within DGROUP without reloading segment registers (see Defining Segment Groups, later in this chapter). These five directives also prevent instructions from appearing in data segments by assuming CS to ERROR.
Near Data Segments

The .DATA directive creates a near data segment. This segment contains the frequently used data for your program. It can occupy up to 64K in MS-DOS or 512 megabytes under flat model in Windows NT. It is placed in a special group identified as DGROUP, which is also limited to 64K.

When you use .MODEL, the assembler automatically defines DGROUP for your near data segment. The segments in DGROUP form near data, which can normally be accessed directly through DS or SS. You can also define the .DATA? and .CONST segments that go into DGROUP unless you are using flat model.

Although all of these segments (along with the stack) are eventually grouped together and handled as data segments, .DATA? and .CONST enhance compatibility with Microsoft high-level languages. In Microsoft languages, .CONST is used to define constant data such as strings and floating-point numbers that must be stored in memory. The .DATA? segment is used for storing uninitialized variables. You can follow this convention if you want. If you use C startup code, .DATA? is initialized to 0.

You can use @data to determine the group of the data segment and @DataSize to determine the size of the memory model set by the .MODEL directive. The predefined symbols @WordSize and @CurSeg return the size attribute and name of the current segment, respectively. See Predefined Symbols in Chapter 1.
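
The three near data directives might be used together as in the sketch below. The variable names and values are hypothetical, and the code fragment assumes DS already points to DGROUP (for instance, after .STARTUP).

```asm
; Sketch only: count, buffer, and prompt are hypothetical names.
        .MODEL  small
        .DATA                           ; Initialized near data
count   WORD    10
        .DATA?                          ; Uninitialized near data
buffer  BYTE    512 DUP (?)
        .CONST                          ; Constant data
prompt  BYTE    "Enter a value: ", 0

        .CODE
        mov     ax, count               ; All three segments belong to DGROUP,
        mov     bx, OFFSET prompt       ; so one DS value reaches each of them
        mov     buffer[0], al
```

Keeping uninitialized data in .DATA? follows the high-level language convention described above; the assembler does not require it.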
Far Data Segments

The compact, large, and huge memory models use far data addresses by default. With these memory models, however, you can still construct data segments using .DATA, .DATA?, and .CONST. The effect of these directives does not change from one memory model to the next. They always contribute segments to the default data area, DGROUP, which has a total limit of 64K.

When you use .FARDATA or .FARDATA? in the small and medium memory models, the assembler creates far data segments FAR_DATA and FAR_BSS, respectively. You can access variables with:
        mov     ax, SEG farvar2
        mov     ds, ax
For more information on far data, see Near and Far Addresses in Chapter 3.
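
As a variation on the access sequence shown above, the following sketch declares a far variable and reads it through ES with an explicit segment override, which leaves DS pointing at near data. The variable name is hypothetical, and the choice of ES is an assumption made for the example.

```asm
; Sketch only: farvar is a hypothetical variable.
        .MODEL  small
        .FARDATA                        ; Far data segment, outside DGROUP
farvar  WORD    1234h

        .CODE
        mov     ax, SEG farvar          ; Segment address of the far data segment
        mov     es, ax                  ; Load ES so DS can keep addressing DGROUP
        mov     bx, es:farvar           ; Explicit override selects ES for the access
```

The cost of far data is visible here: two extra instructions load a segment register before the variable can be reached.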
Creating Code Segments

Whether you are writing a main module or a module to be called from another module, you can have both near and far code segments. This section explains how to use near and far code segments and how to use the directives and predefined equates that relate to code segments.
Near Code Segments

The small memory model is often the best choice for assembly programs that are not linked to modules in other languages, especially if you do not need more than 64K of code. This memory model defaults to near (two-byte) addresses for code and data, which makes the program run faster and use less memory. When you use .MODEL and simplified segment directives, the .CODE directive in your program instructs the assembler to start a code segment. The next segment directive closes the previous segment; the END directive at the end of your program closes remaining segments. The example at the beginning of Using Simplified Segment Directives, earlier in this chapter, shows how to do this. You can use the predefined symbol @CodeSize to determine whether code pointers default to NEAR or FAR.
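
One possible use of @CodeSize is to size a code pointer so that it matches the model, as in this sketch. It assumes the fragment follows a .MODEL directive; CODEPTR and handler are hypothetical names.

```asm
; Sketch only: CODEPTR and handler are hypothetical names.
IF @CodeSize
CODEPTR TYPEDEF FAR PTR                 ; Medium, large, huge: far code pointers
ELSE
CODEPTR TYPEDEF NEAR PTR                ; Tiny, small, compact: near code pointers
ENDIF

        .DATA
handler CODEPTR ?                       ; Storage sized to the model's default
```

The same source can then be assembled under different memory models without editing the pointer declarations.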
Far Code Segments

When you need more than 64K of code, use the medium, large, or huge memory model to create far segments. The medium, large, and huge memory models use far code addresses by default. In the larger memory models, the assembler creates a different code segment for each module. If you use multiple code segments in the small, compact, or tiny model, the linker combines the .CODE segments for all modules into one segment.

For far code segments, the assembler names each code segment MODNAME_TEXT, in which MODNAME is the name of the module. With near code, the assembler names every code segment _TEXT, causing the linker to concatenate these segments into one. You can override the default name by providing an argument after .CODE. (For a complete list of segment names generated by MASM, see Appendix E, Default Segment Names.)

With far code, a single module can contain multiple code segments. The .CODE directive takes an optional text argument that names the segment. For instance, the following example creates two distinct code segments, FIRST_TEXT and SECOND_TEXT.
        .CODE   FIRST
        .
        .                       ; First set of instructions here
        .
        .CODE   SECOND
        .
        .                       ; Second set of instructions here
        .
Whenever the processor executes a far call or jump, it loads CS with the new segment address. No special action is necessary other than making sure that you use far calls and jumps. See Near and Far Addresses in Chapter 3.

Note: The assembler always assumes that the CS register contains the address of the current code segment or group.
Starting and Ending Code with .STARTUP and .EXIT

The easiest way to begin and end an MS-DOS program is to use the .STARTUP and .EXIT directives in the main module. The main module contains the starting point and usually the termination point. You do not need these directives in a module called by another module. These directives make MS-DOS programs easy to maintain. They automatically generate code appropriate to the stack distance specified with .MODEL . However, they do not apply to flat-model programs written for 32-bit operating systems. Thus, you should not use .STARTUP or .EXIT in programs written for Windows NT.
To start a program, place the .STARTUP directive where you want execution to begin. Usually, this location immediately follows the .CODE directive:
        .CODE
        .STARTUP
        .
        .                       ; Place executable code here
        .
        .EXIT
        END
Note that .EXIT generates executable code, while END does not. The END directive informs the assembler that it has reached the end of the module. All modules must end with the END directive whether you use simplified or full segments. If you do not use .STARTUP, you must give the starting address as an argument to the END directive. For example, the following fragment shows how to identify a program's starting instruction with the label start:
        .CODE
start:
        .
        .                       ; Place executable code here
        .
        END     start
Only the END directive for the module with the starting instruction should have an argument. When .STARTUP is present, the assembler ignores any argument to END. For the default NEARSTACK attribute, .STARTUP points DS to DGROUP and sets SS:SP relative to DGROUP, generating the following code:
@Startup:
        mov     dx, DGROUP
        mov     ds, dx
        mov     bx, ss
        sub     bx, dx
        shl     bx, 1           ; If .286 or higher, this is
        shl     bx, 1           ;   shortened to shl bx, 4
        shl     bx, 1
        shl     bx, 1
        cli                     ; Not necessary in .286 or higher
        mov     ss, dx
        add     sp, bx
        sti                     ; Not necessary in .286 or higher
        .
        .
        .
        END     @Startup
An MS-DOS program with the FARSTACK attribute does not need to adjust SS:SP, so .STARTUP just initializes DS, like this:
@Startup:
        mov     dx, DGROUP
        mov     ds, dx
        .
        .
        .
        END     @Startup
When the program terminates, you can return an exit code to the operating system. Applications that check exit codes usually assume that an exit code of 0 means no problem occurred, and that an exit code of 1 means an error terminated the program. The .EXIT directive accepts a 1-byte exit code as its optional argument:
.EXIT 1 ; Return exit code 1
.EXIT generates the following code that returns control to MS-DOS, thus terminating the program. The return value, which can be a constant, memory reference, or 1-byte register, goes into AL:
        mov     al, value
        mov     ah, 04Ch
        int     21h
If your program does not specify a return value, .EXIT returns whatever value happens to be in AL.
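As a minimal sketch of how .STARTUP and .EXIT fit into a complete program, the following small-model module displays a string and returns exit code 0. The label msg and the string contents are illustrative assumptions, not part of the original text; MS-DOS Function 09h (display string) is used only to give the program something to do.

```asm
        .MODEL  small               ; Near code and near data
        .STACK  100h                ; Reserve a 256-byte stack
        .DATA
msg     BYTE    "Hello", 13, 10, "$"    ; "$" terminates the string for Function 09h

        .CODE
        .STARTUP                    ; Entry point; generates the DS and SS:SP setup
        mov     dx, OFFSET msg      ; DS:DX points to the string
        mov     ah, 09h             ; MS-DOS Function 09h: display string
        int     21h
        .EXIT   0                   ; Generates mov al, 0 / mov ah, 04Ch / int 21h
        END                         ; No start label is needed with .STARTUP
```

Because .STARTUP marks the entry point, END takes no argument here. Replacing .EXIT 0 with .EXIT alone would return whatever value is in AL after the last interrupt call.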
Using Full Segment Definitions

If you need complete control over segments, you can fully define the segments in your program. This section explains segment definitions, including how to order segments and how to define the segment types. If you write a program under MS-DOS without .MODEL and .STARTUP, you must initialize registers yourself and use the END directive to indicate the starting address. The Windows operating system does not require you to initialize registers, as described in Chapter 3. For a description of typical startup code, see Controlling the Segment Order, later in this chapter.
Defining Segments with the SEGMENT Directive

A defined segment begins with the SEGMENT directive and ends with the ENDS directive:
name SEGMENT [[align]] [[READONLY]] [[combine]] [[use]] [[class]]
statements
name ENDS
The name defines the name of the segment. Within a module, all segment definitions with the same name are treated as though they reference the same segment. The linker also combines identically named segments from different modules unless the combine type is PRIVATE. In addition, segments can be nested. The optional types that follow the SEGMENT directive give the linker and the assembler instructions on how to set up and combine segments. The optional types, which are explained in detail in the following sections, include:
Type                     Description
align                    Defines the memory boundary on which a new segment begins.
READONLY                 Tells the assembler to report an error if it detects an instruction modifying any item in a READONLY segment.
combine                  Determines how the linker combines segments from different modules when building executable files.
use (80386/486 only)     Determines the size of a segment. USE16 indicates that offsets in the segment are 16 bits wide. USE32 indicates 32-bit offsets.
class                    Provides a class name for the segment. The linker automatically groups segments of the same class in memory.
Types can be specified in any order. You can specify only one attribute from each of these fields; for example, you cannot have two different align types.
You can close a segment and reopen it later with another SEGMENT directive. When you reopen a segment, you need only give the segment name. You cannot change the attributes of a segment once you have defined it.
Note: The PAGE align type and the PUBLIC combine type are distinct from the PAGE and PUBLIC directives. The assembler distinguishes them by means of context.
Aligning Segments
The optional align type in the SEGMENT directive defines the range of memory addresses from which a starting address for the segment can be selected. The align type can be any of the following:
Align Type    Starting Address
BYTE          Next available byte address.
WORD          Next available word address.
DWORD         Next available doubleword address.
PARA          Next available paragraph address (16 bytes per paragraph). Default.
PAGE          Next available page address (256 bytes per page).
The linker uses the alignment information to determine the relative starting address for each segment. The operating system calculates the actual starting address when the program is loaded.
Making Segments Read-Only

The optional READONLY attribute is helpful when creating read-only code segments for protected mode, or when writing code to be placed in read-only memory (ROM). It protects against illegal self-modifying code. The READONLY attribute causes the assembler to check for instructions that modify the segment and to generate an error if it finds any. The assembler generates an error if you attempt to write directly to a read-only segment.
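The following fragment is a sketch of how READONLY might be used, assuming a hypothetical segment ROMTAB and variable limit that do not appear in the original text. The write is left commented out so that the fragment assembles; removing the comment marker should cause the assembler to report an error for that line.

```asm
ROMTAB  SEGMENT WORD READONLY PUBLIC 'CONST'
limit   WORD    100                 ; Constant intended for ROM
ROMTAB  ENDS

_TEXT   SEGMENT WORD PUBLIC 'CODE'
        ASSUME  cs:_TEXT, ds:ROMTAB
        mov     ax, limit           ; Reading from the segment is allowed
;       mov     limit, ax           ; A direct write such as this one is what
                                    ;   READONLY tells the assembler to reject
_TEXT   ENDS
        END
```

The check applies at assembly time only. It does not stop a program from writing to the segment indirectly at run time, for example through a pointer in a register.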
Combining Segments
The optional combine type in the SEGMENT directive defines how the linker combines segments having the same name but appearing in different modules.
The combine type controls linker behavior, not assembler behavior. The combine types, which are described in full detail in Help, include:
Combine Type    Linker Action
PRIVATE         Does not combine the segment with segments from other modules, even if they have the same name. Default.
PUBLIC          Concatenates all segments having the same name to form a single, contiguous segment.
STACK           Concatenates all segments having the same name and causes the operating system to set SS:00 to the bottom and SS:SP to the top of the resulting segment. Data initialization is unreliable, as discussed following.
COMMON          Overlaps segments. The length of the resulting area is the length of the largest of the combined segments. Data initialization is unreliable, as discussed following.
MEMORY          Used as a synonym for the PUBLIC combine type.
AT address      Assumes address as the segment location. An AT segment cannot contain any code or initialized data, but is useful for defining structures or variables that correspond to specific far memory locations, such as a screen buffer or low memory. You cannot use the AT combine type in protected-mode programs.
Do not place initialized data in STACK or COMMON segments. With these combine types, the linker overlays initialized data for each module at the beginning of the segment. The last module containing initialized data writes over any data from other modules.
Note: Normally, you should provide at least one stack segment (having STACK combine type) in a program. If no stack segment is declared, LINK displays a warning message. You can ignore this message if you have a specific reason for not declaring a stack segment. For example, you would not have a separate stack segment in an MS-DOS tiny model (.COM) program, nor would you need a separate stack in a DLL that uses the caller's stack.
Setting Segment Word Sizes (80386/486 Only)

The use type in the SEGMENT directive specifies the segment word size on the 80386/486 processors. Segment word size determines the default operand and address size of all items in a segment. The size attribute can be USE16, USE32, or FLAT. If you specify the .386 or .486 directive before the .MODEL directive, USE32 is the default. This attribute specifies that items in the segment are addressed with a 32-bit offset rather than a
16-bit offset. If .MODEL precedes the .386 or .486 directive, USE16 is the default. To make USE32 the default, put .386 or .486 before .MODEL. You can override the USE32 default with the USE16 attribute, or vice versa.
Note: Programs written for MS-DOS must not specify USE32. Mixing 16-bit and 32-bit segments in the same program is possible but usually applies only to systems programming.
Setting Segment Order with Class Type

The optional class type in the SEGMENT directive helps control segment ordering. Two segments with the same name are not combined if their class is different. The linker arranges segments so that all segments identified with a given class type are next to each other in the executable file. However, within a particular class, the linker arranges segments in the order encountered. The .ALPHA, .SEQ, or .DOSSEG directive determines this order in each .OBJ file. The most common use of a class type is to place all code segments first in the executable file.
Controlling the Segment Order

The assembler normally positions segments in the object file in the order in which they appear in source code. The linker, in turn, processes object files in the order in which they appear on the command line. Within each object file, the linker outputs segments in the order they appear, subject to any group, class, and .DOSSEG requirements. You can usually ignore segment ordering. However, it is important whenever you want certain segments to appear at the beginning or end of a program or when you make assumptions about which segments are next to each other in memory. For tiny model (.COM) programs, code segments must appear first in the executable file, because execution must start at the address 100h.
Segment Order Directives

You can control the order in which segments appear in the executable program with three directives. The default, .SEQ, arranges segments in the order in which you declare them. The .ALPHA directive specifies alphabetical segment ordering within a module. .ALPHA is provided for compatibility with early versions of the IBM assembler. If you have trouble running code from older books on assembly language, try using .ALPHA. The .DOSSEG directive specifies the MS-DOS segment-ordering convention. It places segments in the standard order required by Microsoft languages. Do not use .DOSSEG in a module to be called from another module.
The .DOSSEG directive orders segments as follows:
1. Code segments
2. Data segments, in this order:
   a. Segments not in class BSS or STACK
   b. Class BSS segments
   c. Class STACK segments
When you declare two or more segments to be in the same class, the linker automatically makes them contiguous. This rule overrides the segment-ordering directives. (For more about segment classes, see Setting Segment Order with Class Type in the previous section.)
Linker Control
Most of the segment-ordering techniques (class names, .ALPHA, and .SEQ) control the order in which the assembler outputs segments. Usually, you are more interested in the order in which segments appear in the executable file. The linker controls this order. The linker processes object files in the order in which they appear on the command line. Within each module, it then outputs segments in the order given in the object file. If the first module defines segments DSEG and STACK and the second module defines CSEG, then CSEG is output last. If you want to place CSEG first, there are two ways to do so.
The simpler method is to use .DOSSEG. This directive is output as a special record in the object file, and it tells the linker to use the Microsoft segment-ordering convention. This convention overrides command-line order of object files, and it places all segments of class 'CODE' first. (See Defining Segments with the SEGMENT Directive, previous.)
The other method is to define all the segments as early as possible (in an include file, for example, or in the first module). These definitions can be dummy segments, that is, segments with no content. The linker observes the segment ordering given, then later combines the empty segments with segments in other modules that have the same name.
For example, you might include the following at the start of the first module of your program or in an include file:
_TEXT   SEGMENT WORD PUBLIC 'CODE'
_TEXT   ENDS
_DATA   SEGMENT WORD PUBLIC 'DATA'
_DATA   ENDS
CONST   SEGMENT WORD PUBLIC 'CONST'
CONST   ENDS
STACK   SEGMENT PARA STACK 'STACK'
STACK   ENDS
Later in the program, the order in which you write _TEXT, _DATA, or other segments does not matter because the ultimate order is controlled by the segment order defined in the include file.
Setting the ASSUME Directive for Segment Registers

Many of the assembler instructions assume a default segment. For example, JMP assumes the segment associated with the CS register, PUSH and POP assume the segment associated with the SS register, and MOV instructions assume the segment associated with the DS register. When the assembler needs to reference an address, it must know what segment contains the address. It finds this by using the default segment or group addresses assigned with the ASSUME directive. The syntax is:
ASSUME segregister : seglocation [[, segregister : seglocation]]
ASSUME dataregister : qualifiedtype [[, dataregister : qualifiedtype]]
ASSUME register : ERROR [[, register : ERROR]]
ASSUME [[register :]] NOTHING [[, register : NOTHING]]
ASSUME register : FLAT [[, register : FLAT]]
The seglocation must be the name of the segment or group that is to be associated with segregister. Subsequent instructions that assume a default register for referencing labels or variables automatically assume that if the default segment is segregister, the label or variable is in the seglocation. MASM 6.1 automatically gives CS the address of the current code segment. Therefore, you do not need to include
ASSUME CS : MY_CODE
at the beginning of your program if you want the current segment associated with CS.
Note: Using the ASSUME directive to tell the assembler which segment to associate with a segment register is not the same as telling the processor. The ASSUME directive affects only assembly-time assumptions. You may need to use instructions to change run-time conditions. Initializing segment registers at run time is discussed in Informing the Assembler About Segment Values, Chapter 3.
The ASSUME directive can define a segment for each of the segment registers. The segregister can be CS, DS, ES, or SS (and FS and GS on the 80386/486). The seglocation must be one of the following:
- The name of a segment defined in the source file with the SEGMENT directive.
- The name of a group defined in the source file with the GROUP directive.
- The keyword NOTHING, ERROR, or FLAT.
- A SEG expression (see Immediate Operands in Chapter 3).
- A string equate (text macro) that evaluates to a segment or group name (but not a string equate that evaluates to a SEG expression).
It is legal to combine assumes to FLAT with assumes to specific segments. Combinations might be necessary in operating-system code that handles both 16- and 32-bit segments. The keyword NOTHING cancels the current segment assumptions. For example, the statement ASSUME NOTHING cancels all register assumptions made by previous ASSUME statements. Usually, a single ASSUME statement defines all four segment registers at the start of the source file. However, you can use the ASSUME directive at any point to change segment assumptions. Using the ASSUME directive to change segment assumptions is often equivalent to changing assumptions with the segment-override operator (:). See Direct Memory Operands in Chapter 3. The segment-override operator is more convenient for one-time overrides. The ASSUME directive may be more convenient if previous assumptions must be overridden for a sequence of instructions. However, in either case, your program must explicitly load a segment register with a segment address before accessing data within the segment. ASSUME only tells the assembler to assume that the register is correctly initialized; it does not by itself generate any code to load the register.
You can also prevent the use of a register with:

ASSUME SegRegister : ERROR
The assembler generates an ASSUME CS:ERROR when you use simplified directives to create data segments, effectively preventing instructions or code labels from appearing in a data segment. For more information about ASSUME, refer to Defining Register Types with ASSUME in Chapter 3.
Defining Segment Groups

A group is a collection of segments totalling not more than 64K in 16-bit mode. A program addresses a code or data item in the group relative to the beginning of the group. A group lets you develop separate logical segments for different kinds of data and then combine these into one segment (a group) for all the data. Using a group can save you from having to continually reload segment registers to access different segments. As a result, the program uses fewer instructions and runs faster. The most common example of a group is the specially named group for near data, DGROUP. In the Microsoft segment model, several segments (_DATA, _BSS, CONST, and STACK) are combined into a single group called DGROUP. Microsoft high-level languages place all near data segments in this group. (By default, the stack is placed here, too.) The .MODEL directive automatically defines DGROUP. The DS register normally points to the beginning of the group, giving you relatively fast access to all data in DGROUP. The syntax of the group directive is:
name GROUP segment [[, segment]]...
The name labels the group. It can refer to a group that was previously defined. This feature lets you add segments to a group one at a time. For example, if MYGROUP was previously defined to include ASEG and BSEG, then the statement
MYGROUP GROUP CSEG
is perfectly legal. It simply adds CSEG to the group MYGROUP; ASEG and BSEG are not removed. Each segment can be any valid segment name (including a segment defined later in source code), with one restriction: a segment cannot belong to more than one group.
The GROUP directive does not affect the order in which segments of a group are loaded. You can place any number of 16-bit segments in a group as long as the total size does not exceed 65,536 bytes. If the processor is in 32-bit mode, the maximum size is 4 gigabytes. You need to make sure that non-grouped segments do not get placed between grouped segments in such a way that the size of the group exceeds 64K or 4 gigabytes. You also cannot place a 16-bit and a 32-bit segment in the same group.
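The following sketch shows one way the MYGROUP example could be written out in full. The variables avar, bvar, and cvar and the procedure sum3 are hypothetical names introduced for illustration. The point is that DS is loaded once with the group address, after which data in all three segments is addressed relative to the start of the group.

```asm
ASEG    SEGMENT WORD PUBLIC 'DATA'
avar    WORD    1
ASEG    ENDS

BSEG    SEGMENT WORD PUBLIC 'DATA'
bvar    WORD    2
BSEG    ENDS

CSEG    SEGMENT WORD PUBLIC 'DATA'
cvar    WORD    3
CSEG    ENDS

MYGROUP GROUP   ASEG, BSEG          ; Define the group with two segments
MYGROUP GROUP   CSEG                ; Add CSEG; ASEG and BSEG remain members

_TEXT   SEGMENT WORD PUBLIC 'CODE'
        ASSUME  cs:_TEXT, ds:MYGROUP
sum3    PROC    NEAR
        push    ds                  ; Preserve the caller's DS
        mov     ax, MYGROUP         ; Load the group address once
        mov     ds, ax
        mov     ax, avar            ; All three variables are reachable
        add     ax, bvar            ;   through DS without reloading it
        add     ax, cvar            ; AX = avar + bvar + cvar
        pop     ds
        ret
sum3    ENDP
_TEXT   ENDS
        END
```

Without the group, each of the three segments would need its own segment register load (or a segment override) before its variable could be read.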
Chapter 3
Using Addresses and Pointers
MASM applications running in real mode require segmented addresses to access code and data. The address of the code or data in a segment is relative to a segment address in a segment register. You can also use pointers to access data in assembly language programs. (A pointer is a variable that contains an address as its value.) The first section of this chapter describes how to initialize default segment registers to access near and far addresses. The next section describes how to access code and data. It also describes related operators, syntax, and displacements. The discussion of memory operands lays the foundation for the third section, which describes the stack. The fourth section of this chapter explains how to use the TYPEDEF directive to declare pointers and the ASSUME directive to give the assembler information about registers containing pointers. This section also shows you how to do typical pointer operations and how to write code that works for pointer variables in any memory model.
Programming Segmented Addresses

Before you use segmented addresses in your programs, you need to initialize the segment registers. The initialization process depends on the registers used and on your choice of simplified segment directives or full segment definitions. The simplified segment directives (introduced in Chapter 2) handle most of the initialization process for you. This section explains how to inform the assembler and the processor of segment addresses, and how to access the near and far code and data in those segments.
Initializing Default Segment Registers

The segmented architecture of the 8086-family of processors does not require that you specify two addresses every time you access memory. As explained in Chapter 2, Organizing Segments, the 8086 family of processors uses a system
of default segment registers to simplify access to the most commonly used data and code. The segment registers DS, SS, and CS are normally initialized to default segments at the beginning of a program. If you write the main module in a high-level language, the compiler initializes the segment registers. If you write the main module in assembly language, you must initialize the segment registers yourself. Follow these steps to initialize segments:
1. Tell the assembler which segment is associated with a register. The assembler must know the default segments at assembly time.
2. Tell the processor which segment is associated with a register by writing the necessary code to load the correct segment value into the segment register on the processor.
These steps are discussed separately in the following sections.
Informing the Assembler About Segment Values

The first step in initializing segments is to tell the assembler which segment to associate with a register. You do this with the ASSUME directive. If you use simplified segment directives, the assembler automatically generates the appropriate ASSUME statements. If you use full segment definitions, you must code the ASSUME statements for registers other than CS yourself. (ASSUME can also be used on general-purpose registers, as explained in Defining Register Types with ASSUME later in this chapter.) The .STARTUP directive generates startup code that sets DS equal to SS (unless you specify FARSTACK), allowing default data to be accessed through either SS or DS. This can improve efficiency in the code generated by compilers. The DS equals SS convention may not work with certain applications, such as memory-resident programs in MS-DOS and Windows dynamic-link libraries (see Chapter 10). The code generated for .STARTUP is shown in Starting and Ending Code with .STARTUP and .EXIT in Chapter 2. You can use similar code to set DS equal to SS in programs using full segment definitions. Here is an example of ASSUME using full segment definitions:
ASSUME cs:_TEXT, ds:DGROUP, ss:DGROUP
This example is equivalent to the ASSUME statement generated with simplified segment directives in small model with NEARSTACK. Note that DS and SS are part of the same segment group. It is also possible to have different segments for data and code, and to use ASSUME to set ES, as shown here:

ASSUME cs:MYCODE, ds:MYDATA, ss:MYSTACK, es:OTHER
Correct use of the ASSUME statement can help find addressing errors. With .CODE, the assembler assumes CS is the current segment. When you use the simplified segment directives .DATA, .DATA?, .CONST, .FARDATA , or .FARDATA?, the assembler automatically assumes CS is the ERROR segment. This prevents instructions from appearing in these segments. If you use full segment definitions, you can accomplish the same by placing ASSUME CS:ERROR in a data segment. With simple or full segments, you can cancel the control of an ASSUME statement by assuming NOTHING. You can cancel the previous assumption for ES with the following statement:
ASSUME es:NOTHING
Prior to the .MODEL statement (or in its absence), the assembler sets the ASSUME statement for DS, ES, and SS to the current segment.
Informing the Processor About Segment Values

The second and final step in initializing segments is to inform the processor of segment values at run time. How segment values are initialized at run time differs for each segment register and depends on the operating system and on your use of simplified segment directives or full segment definitions.
Specifying a Starting Address

A program's starting address determines where execution begins. After the operating system loads a program, it simply jumps to the starting address, giving processor control to the program. The true starting address is known only to the loader; the linker determines only the offset of the address within an undetermined code segment. That is why a normal application is often referred to as relocatable code: it runs regardless of where the loader places it in memory. The offset of the starting address depends on the program type. Programs with an .EXE extension contain a header from which the loader reads the offset and combines it with a segment to form the starting address. Programs with a .COM extension (tiny model) have no such header, so by convention the loader jumps to the first byte of the program. In either case, the .STARTUP directive identifies where execution begins, provided you use simplified segment directives. For an .EXE program, place .STARTUP immediately before the instruction where you want execution to start. In a .COM program, place .STARTUP before the first assembly instruction in your source code.
If you use full segment directives or prefer not to use .STARTUP, you must identify the starting instruction in two steps:
1. Label the starting instruction.
2. Provide the same label in the END directive.
These steps tell the linker where execution begins in the program. The following example illustrates the two steps for a tiny model program:
_TEXT   SEGMENT WORD PUBLIC 'CODE'
        ORG     100h            ; Use this declaration for .COM files only
start:  .                       ; First instruction here
        .
        .
_TEXT   ENDS
        END     start           ; Name of starting label
Notice the ORG statement in this example. This statement is mandatory in a tiny model program without the .STARTUP directive. It places the first instruction at offset 100h in the code segment to create space for a 256-byte (100h) data area called the Program Segment Prefix (PSP). The operating system takes care of initializing the PSP, so you need only make sure the area exists. (For a description of what data resides in the PSP, refer to the Tables chapter in the Reference.)
Initializing DS
The DS register is automatically initialized to the correct value (DGROUP) if you use .STARTUP or if you are writing a program for Windows. If you do not use .STARTUP with MS-DOS, you must initialize DS using the following instructions:
        mov     ax, DGROUP
        mov     ds, ax
The initialization requires two instructions because the segment name is a constant and the assembler does not allow a constant to be loaded directly to a segment register. The previous example loads DGROUP, but you can load any valid segment or group.
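Putting the pieces together, a complete .EXE program that uses full segment definitions might look like the sketch below. It assumes a hypothetical message msg and starting label start. DGROUP is defined explicitly with GROUP because .MODEL is not used, and the program both tells the assembler (ASSUME) and tells the processor (the two MOV instructions) where DS points.

```asm
DGROUP  GROUP   _DATA               ; Define DGROUP by hand (no .MODEL here)

_DATA   SEGMENT WORD PUBLIC 'DATA'
msg     BYTE    "Hello", 13, 10, "$"
_DATA   ENDS

STACK   SEGMENT PARA STACK 'STACK'  ; STACK combine type: loader sets SS:SP
        BYTE    100h DUP (?)        ; Uninitialized, as STACK segments should be
STACK   ENDS

_TEXT   SEGMENT WORD PUBLIC 'CODE'
        ASSUME  cs:_TEXT, ds:DGROUP ; Step 1: inform the assembler
start:  mov     ax, DGROUP          ; Step 2: inform the processor
        mov     ds, ax
        mov     dx, OFFSET msg      ; DS:DX points to the string
        mov     ah, 09h             ; MS-DOS Function 09h: display string
        int     21h
        mov     ax, 4C00h           ; MS-DOS Function 4Ch, exit code 0
        int     21h
_TEXT   ENDS
        END     start               ; Starting address given to END
```

Omitting the two MOV instructions would still assemble, because ASSUME generates no code, but DS would then hold the loader's initial value rather than the address of DGROUP.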
Initializing SS and SP
The SS and SP registers are initialized automatically if you use the .STACK directive with simplified segments or if you define a segment that has the STACK combine type with full segment definitions. Using the .STACK directive initializes SS to the stack segment. If you want SS to be equal to DS, use .STARTUP or its equivalent. (See Combining Segments, page 45.) For an .EXE file, the stack address is encoded into the executable header and resolved
at load time. For a .COM file, the loader sets SS equal to CS and initializes SP to 0FFFEh. If your program does not access far data, you do not need to initialize the ES register. If you choose to initialize, use the same technique as for the DS register. You can initialize SS to a far stack in the same way.
Near and Far Addresses

Addresses that have an implied segment name or segment registers associated with them are called near addresses. Addresses that have an explicit segment associated with them are called far addresses. The assembler handles near and far code automatically, as described in the following sections. You must specify how to handle far data. The Microsoft segment model puts all near data and the stack in a group called DGROUP. Near code is put in a segment called _TEXT. Each module's far code or far data is placed in a separate segment. This convention is described in Controlling the Segment Order in Chapter 2. The assembler cannot determine the address for some program components; these are said to be relocatable. The assembler generates a fixup record and the linker provides the address once it has determined the location of all segments. Usually a relocatable operand references a label, but there are exceptions. Examples in the next two sections include information about relocating near and far data.
Near Code
Control transfers within near code do not require changes to segment registers. The processor automatically handles changes to the offset in the IP register when control-flow instructions such as JMP, CALL, and RET are used. The statement
call nearproc ; Change code offset
changes the IP register to the new address but leaves the segment unchanged. When the procedure returns, the processor resets IP to the offset of the next instruction after the CALL instruction.
Far Code
The processor automatically handles segment register changes when dealing with far code. The statement
call farproc ; Change code segment and offset
automatically moves the segment and offset of the farproc procedure to the CS and IP registers. When the procedure returns, the processor sets CS to the
original code segment and sets IP to the offset of the next instruction after the call.
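As an illustrative sketch, the two kinds of call can be placed side by side. The procedure bodies here are empty placeholders, and the names nearproc and farproc are taken from the statements above. The distance given on the PROC directive determines both how the procedure is called and how RET is assembled.

```asm
        .MODEL  small
        .STACK  100h
        .CODE
nearproc PROC   NEAR
        ret                         ; Near return: pops IP only
nearproc ENDP

farproc PROC    FAR
        ret                         ; Assembled as a far return: pops IP and CS
farproc ENDP

        .STARTUP
        call    nearproc            ; Return offset is pushed; CS is unchanged
        call    farproc             ; Return segment and offset are both pushed
        .EXIT   0
        END
```

The calling code is identical in both cases. The assembler selects the form of each call from the declared distance of its target.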
Near Data
A program can access near data directly, because a segment register already holds the correct segment for the data item. The term near data is often used to refer to the data in the DGROUP group. After the first initialization of the DS and SS registers, these registers normally point into DGROUP. If you modify the contents of either of these registers during the execution of the program, you must reload the register with DGROUP's address before referencing any DGROUP data. The processor assumes all memory references are relative to the segment in the DS register, with the exception of references using BP or SP. The processor associates these registers with the SS register. (You can override these assumptions with the segment override operator, described in Direct Memory Operands, on page 62.) The following lines illustrate how the processor accesses either the DS or SS segments, depending on whether the pointer operand contains BP or SP. Note that the distinction loses significance when DS and SS are equal.
nearvar WORD    0
        .
        .
        .
        mov     ax, nearvar     ; Reads from DS:[nearvar]
        mov     di, [bx]        ; Reads from DS:[bx]
        mov     [di], cx        ; Writes to  DS:[di]
        mov     [bp+6], ax      ; Writes to  SS:[bp+6]
        mov     bx, [bp]        ; Reads from SS:[bp]
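The SS default for BP-based operands is what makes stack arguments easy to reach. The sketch below assumes a hypothetical near procedure addone that takes one word argument on the stack. After the near call and the PUSH BP, the saved BP is at [bp], the return offset is at [bp+2], and the argument is at [bp+4].

```asm
        .MODEL  small
        .STACK  100h
        .CODE
addone  PROC    NEAR
        push    bp                  ; Save the caller's BP
        mov     bp, sp              ; BP now addresses the stack frame
        mov     ax, [bp+4]          ; Reads from SS:[bp+4], the word argument
        inc     ax                  ; Result is returned in AX
        pop     bp
        ret     2                   ; Return and discard the 2-byte argument
addone  ENDP

        .STARTUP
        mov     ax, 41
        push    ax                  ; Pass the argument on the stack
        call    addone              ; AX holds the argument plus 1 on return
        .EXIT   0
        END
```

No segment override is written on [bp+4]. Because the operand uses BP, the processor reads through SS, which is where the argument was pushed.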
Far Data
To read or modify a far address, a segment register must point to the segment of the data. This requires two steps. First load the segment register (normally either ES or DS) with the correct value, and then (optionally) set an assume of the segment register to the segment of the address.
Note: Flat model does not require far addresses. By default, all addressing is relative to the initial values of the segment registers. Therefore, this section on far addressing does not apply to flat model programs.
One method commonly used to access far data is to initialize the ES segment register. This example shows two ways to do this:

; First method
        mov     ax, SEG farvar      ; Load segment of the
        mov     es, ax              ;   far address into ES
        mov     ax, es:farvar       ; Provide an explicit segment
                                    ;   override on the addressing

; Second method
        mov     ax, SEG farvar2     ; Load the segment of the
        mov     es, ax              ;   far address into ES
        ASSUME  ES:SEG farvar2      ; Tell the assembler that ES points
                                    ;   to the segment containing farvar2
        mov     ax, farvar2         ; The assembler provides the ES
                                    ;   override since it knows that
                                    ;   the label is addressable
After loading the segment of the address into the ES segment register, you can explicitly override the segment register so that the addressing is correct (method 1) or allow the assembler to insert the override for you (method 2). The assembler uses ASSUME statements to determine which segment register can be used to address a segment of memory. To use the segment override operator, the left operand must be a segment register, not a segment name. (For more information on segment overrides, see "Direct Memory Operands" on page 62.)

If an instruction needs a segment override, the resulting code is slightly larger and slower, since the override must be encoded into the instruction. However, the resulting code may still be smaller than the code for multiple loads of the default segment register for the instruction.

The DS, SS, FS, and GS segment registers (FS and GS are available only on the 80386/486 processors) may also be used for addressing through other segments.

If a program uses ES to access far data, it need not restore ES when finished (unless the program uses flat model). However, some compilers require that you restore ES before returning to a module written in a high-level language.

To access far data through DS, first set DS to the far segment and then restore the original DS when finished. Use the ASSUME directive to let the assembler know that DS no longer points to the default data segment, as shown here:
        push    ds                  ; Save original segment
        mov     ax, SEG fararray    ; Move segment into data register
        mov     ds, ax              ; Initialize segment register
        ASSUME  ds:SEG fararray     ; Tell assembler where data is
        mov     ax, fararray[0]     ; Set DX:AX = dword variable
        mov     dx, fararray[2]     ;   fararray
        .
        .
        .
        pop     ds                  ; Restore segment
        ASSUME  ds:@DATA            ;   and default assumption
"Direct Memory Operands," on page 62, describes an alternative method for accessing far data. The technique of resetting DS as shown in the previous example is best for a lengthy series of far data references. The segment override method described in "Direct Memory Operands" serves best when accessing only one or two far variables.

If your program changes DS to access far data, it should restore DS when finished. This allows procedures to assume that DS is the segment for near data. Many compilers, including Microsoft compilers, use this convention.
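The explicit-override method can be sketched as a complete program. This is a minimal illustration, assuming a small-model MS-DOS program assembled with MASM 6.1; the names farvar and nearcopy are hypothetical and do not come from the guide.

```asm
        .MODEL  small
        .STACK  100h

        .FARDATA                    ; Far data segment, outside DGROUP
farvar  WORD    1234h               ; Hypothetical far variable

        .DATA
nearcopy WORD   0                   ; Hypothetical near variable in DGROUP

        .CODE
        .STARTUP                    ; Sets DS and SS to DGROUP
        mov     ax, SEG farvar      ; Load segment of the far variable
        mov     es, ax              ;   into ES
        mov     ax, es:farvar       ; Read far data with explicit override
        mov     nearcopy, ax        ; Store through DS (near data)
        .EXIT
        END
```

Because only ES changes, DS keeps pointing to DGROUP and nothing has to be restored afterward.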
Operands
With few exceptions, assembly language instructions work on sources of data called operands. In a listing of assembly code (such as the examples in this book), operands appear in the operand field immediately to the right of the instructions. This section describes the four kinds of instruction operands: register, immediate, direct memory, and indirect memory. Some instructions, such as POPF and STI, have implied operands which do not appear in the operand field. Otherwise, an implied operand is just as real as one stated explicitly. Certain other instructions such as NOP and WAIT deserve special mention. These instructions affect only processor control and do not require an operand. The following four types of operands are described in the rest of this section:
Operand Type       Addressing Mode
Register           An 8-bit or 16-bit register on the 8086-80486; can also be 32-bit on the 80386/486.
Immediate          A constant value contained in the instruction itself.
Direct memory      A fixed location in memory.
Indirect memory    A memory location determined at run time by using the address stored in one or two registers.
Instructions that take two or more operands always work right to left. The right operand is the source operand. It specifies data that will be read, but not changed, in the operation. The left operand is the destination operand. It specifies the data that will be acted on and possibly changed by the instruction.
Register Operands
Register operands refer to data stored in registers. The following examples show typical register operands:
        mov     bx, 10      ; Load constant to BX
        add     ax, bx      ; Add BX to AX
        jmp     di          ; Jump to the address in DI
An offset stored in a base or index register often serves as a pointer into memory. You can store an offset in one of the base or index registers, then use the register as an indirect memory operand. (See "Indirect Memory Operands," following.) For example:
        mov     [bx], dl    ; Store DL in indirect memory operand
        inc     bx          ; Increment register operand
        mov     [bx], dl    ; Store DL in new indirect memory operand
This example stores the value in DL in 2 consecutive bytes of memory, starting at the location pointed to by BX. Any instruction that changes the register value also changes which data item the register points to.
Immediate Operands
An immediate operand is a constant or the result of a constant expression. The assembler encodes immediate values into the instruction at assembly time. Here are some typical examples showing immediate operands:
        mov     cx, 20          ; Load constant to register
        add     var, 1Fh        ; Add hex constant to variable
        sub     bx, 25 * 80     ; Subtract constant expression
Immediate data is never permitted in the destination operand. If the source operand is immediate, the destination operand must be either a register or a memory operand to provide a place to store the result of the operation. Immediate expressions often involve the useful OFFSET and SEG operators, described in the following paragraphs.
The OFFSET Operator

An address constant is a special type of immediate operand that consists of an offset or segment value. The OFFSET operator returns the offset of a memory location, as shown here:
mov bx, OFFSET var ; Load offset address
For information on differences between MASM 5.1 behavior and MASM 6.1 behavior related to OFFSET, see Appendix A.
Since data in different modules may belong to a single segment, the assembler cannot know for each module the true offsets within a segment. Thus, the offset for var, although an immediate value, is not determined until link time.
The SEG Operator

The SEG operator returns the segment of a memory location:
        mov     ax, SEG farvar  ; Load segment address
        mov     es, ax
The actual value of a particular segment is not known until the program is loaded into memory. For .EXE programs, the linker makes a list in the program's header of all locations in which the SEG operator appears. The loader reads this list and fills in the required segment address at each location. Since .COM programs have no header, the assembler does not allow relocatable segment expressions in tiny model programs.

The SEG operator returns a variable's "frame" if it appears in the instruction. The frame is the value of the segment, group, or segment override of a nonexternal variable. For example, the instruction
mov ax, SEG DGROUP:var
places in AX the value of DGROUP, where var is located. If you do not include a frame, SEG returns the value of the variable's group if one exists. If the variable is not defined in a group, SEG returns the variable's segment address. This behavior can be changed with the /Zm command-line option or with the OPTION OFFSET:SEGMENT statement. (See Appendix A, "Differences between MASM 6.1 and 5.1.") "Using the OPTION Directive" in Chapter 1 introduces the OPTION directive.
Direct Memory Operands

A direct memory operand specifies the data at a given address. The instruction acts on the contents of the address, not the address itself. Except when size is implied by another operand, you must specify the size of a direct memory operand so the instruction accesses the correct amount of memory. The following example shows how to explicitly specify data size with the BYTE directive:
        .DATA?              ; Segment for uninitialized data
var     BYTE    ?           ; Reserve one byte, labeled "var"
        .CODE
        .
        .
        .
        mov     var, al     ; Copy AL to byte at var
Any location in memory can be a direct memory operand as long as a size is specified (or implied) and the location is fixed. The data at the address can change, but the address cannot. By default, instructions that use direct memory addressing use the DS register. You can create an expression that points to a memory location using any of the following operators:
Operator Name       Symbol
Plus                +
Minus               -
Index               [ ]
Structure member    .
Segment override    :
These operators are discussed in more detail in the following section.
Plus, Minus, and Index

The plus and index operators perform in exactly the same way when applied to direct memory operands. For example, both the following statements move the second word value from an array into the AX register:
        mov     ax, array[2]
        mov     ax, array+2
The index operator can contain any direct memory operand. The following statements are equivalent:
        mov     ax, var
        mov     ax, [var]
Some programmers prefer to enclose the operand in brackets to show that the contents, not the address, are used.
The minus operator behaves as you would expect. Both the following instructions retrieve the value located at the word preceding array:
        mov     ax, array[-2]
        mov     ax, array-2
Structure Field
The structure operator (.) references a particular element of a structure, or "field," to use C terminology:
mov bx, structvar.field1
The address of the structure operand is the sum of the offsets of structvar and field1. For more information about structures, see "Structures and Unions" in Chapter 5.
Segment Override
The segment override operator (:) specifies a segment portion of the address that is different from the default segment. When used with instructions, this operator can apply to segment registers or segment names:
mov ax, es:farvar ; Use segment override
The assembler will not generate a segment override if the default segment is explicitly provided. Thus, the following two statements assemble in exactly the same way:
        mov     [bx], ax
        mov     ds:[bx], ax
A segment name override or the segment override operator identifies the operand as an address expression.
        mov     WORD PTR FARSEG:0, ax   ; Segment name override
        mov     WORD PTR es:100h, ax    ; Legal and equivalent
        mov     WORD PTR es:[100h], ax  ;   expressions
        mov     WORD PTR [100h], ax     ; Illegal, not an address
As the example shows, a constant expression cannot be an address expression unless it has a segment override.
Indirect Memory Operands

Like direct memory operands, indirect memory operands specify the contents of a given address. However, the processor calculates the address at run time by referring to the contents of registers. Since values in the registers can change at run time, indirect memory operands provide dynamic access to memory.
Indirect memory operands make possible run-time operations such as pointer indirection and dynamic indexing of array elements, including indexing of multidimensional arrays. Strict rules govern which registers you can use for indirect memory operands under 16-bit versions of the 8086-based processors. The rules change significantly for 32-bit processors starting with the 80386. However, the new rules apply only to code that does not need to be compatible with earlier processors. This section covers features of indirect operands in either mode. The specific 16-bit rules and 32-bit rules are then explained separately.
Indirect Operands with 16- and 32-Bit Registers

Some rules and options for indirect memory operands always apply, regardless of the size of the register. For example, you must always specify the register and operand size for indirect memory operands. But you can use various syntaxes to indicate an indirect memory operand. This section describes the rules that apply to both 16-bit and 32-bit register modes.
Specifying Indirect Memory Operands

The index operator specifies the register or registers for indirect operands. The processor uses the data pointed to by the register. For example, the following instruction moves into AX the word value at the address in DS:BX.
mov ax, WORD PTR [bx]
When you specify more than one register, the processor adds the contents of the two registers together to determine the effective address (the address of the data to operate on):
mov ax, [bx+si]
Specifying Displacements
You can specify an address displacement, which is a constant value added to the effective address. A direct memory specifier is the most common displacement:
mov ax, table[si]
In this relocatable expression, the displacement table is the base address of an array; SI holds an index to an array element. The SI value is calculated at run time, often in a loop. The element loaded into AX depends on the value of SI at the time the instruction executes.
Each displacement can be an address or numeric constant. If there is more than one displacement, the assembler totals them at assembly time and encodes the total displacement. For example, in the statement
table   WORD    100 DUP (0)
        .
        .
        .
        mov     ax, table[bx][di]+6
both table and 6 are displacements. The assembler adds the value of 6 to table to get the total displacement. However, the statement
mov ax, mem1[si] + mem2
is not legal, because it attempts to use a single command to join the contents of two different addresses.
Specifying Operand Size

You must give the size of an indirect memory operand in one of three ways:
• By the variable's declared size
• With the PTR operator
• Implied by the size of the other operand
The following lines illustrate all three methods. Assume the size of the table array is WORD, as declared earlier.
        mov     table[bx], 0        ; 2 bytes - from size of table
        mov     BYTE PTR table, 0   ; 1 byte  - specified by BYTE
        mov     ax, [bx]            ; 2 bytes - implied by AX
Syntax Options
The assembler allows a variety of syntaxes for indirect memory operands. However, all registers must be inside brackets. You can enclose each register in its own pair of brackets, or you can place the registers in the same pair of brackets separated by a plus operator (+). All the following variations are legal and assemble the same way:
        mov     ax, table[bx][di]
        mov     ax, table[di][bx]
        mov     ax, table[bx+di]
        mov     ax, [table+bx+di]
        mov     ax, [bx][di]+table
All of these statements move the value in table indexed by BX+DI into AX.
Scaling Indexes
The value of index registers pointing into arrays must often be adjusted for zero-based arrays and scaled according to the size of the array items. For a word array, the item number must be multiplied by two (shifted left by one place). When using 16-bit registers, you must scale with separate instructions, as shown here:
        mov     bx, 5           ; Get sixth element (adjust for 0)
        shl     bx, 1           ; Scale by two (word size)
        inc     wtable[bx]      ; Increment sixth element in table
When using 32-bit registers on the 80386/486 processor, you can include scaling in the operand, as described in "Indirect Memory Operands with 32-Bit Registers," following.
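The separate scaling step is typically repeated inside a loop. The following sketch sums a word array with 16-bit registers, assuming a small-model MS-DOS program assembled with MASM 6.1; the names total and sumloop and the array contents are hypothetical.

```asm
        .MODEL  small
        .STACK  100h

        .DATA
wtable  WORD    10, 20, 30, 40      ; Hypothetical word array
total   WORD    0

        .CODE
        .STARTUP
        xor     ax, ax              ; AX = running sum
        xor     si, si              ; SI = element number (zero-based)
sumloop:
        mov     bx, si              ; Copy element number
        shl     bx, 1               ; Scale by two (word size)
        add     ax, wtable[bx]      ; Add element to the sum
        inc     si                  ; Next element
        cmp     si, LENGTHOF wtable ; Past the last element?
        jb      sumloop             ; No - keep going
        mov     total, ax           ; 10 + 20 + 30 + 40 = 100
        .EXIT
        END
```

Keeping the element number in SI and the scaled offset in BX means the loop counter does not have to be unscaled for the comparison.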
Accessing Structure Elements

The structure member operator can be used in indirect memory operands to access structure elements. In this example, the structure member operator loads the year field of the fourth element of the students array into AL:
STUDENT  STRUCT
  grade  WORD    ?
  name   BYTE    20 DUP (?)
  year   BYTE    ?
STUDENT  ENDS

students STUDENT < >
         .                              ; Assume array is initialized
         .
         mov     bx, OFFSET students    ; Point to array of students
         mov     ax, 4                  ; Get fourth element
         mov     di, SIZE STUDENT       ; Get size of STUDENT
         mul     di                     ; Multiply size times
         mov     di, ax                 ;   elements to point DI
                                        ;   to current element
         mov     al, (STUDENT PTR[bx+di]).year
For more information on MASM structures, see "Structures and Unions" in Chapter 5.
Indirect Memory Operands with 16-Bit Registers

For 8086-based computers and MS-DOS, you must follow the strict indexing rules established for the 8086 processor. Only four registers are allowed (BP, BX, SI, and DI), and those only in certain combinations.
BP and BX are base registers. SI and DI are index registers. You can use either a base or an index register by itself. But if you combine two registers, one must be a base and one an index. Here are legal and illegal forms:
        mov     ax, [bx+di]     ; Legal
        mov     ax, [bx+si]     ; Legal
        mov     ax, [bp+di]     ; Legal
        mov     ax, [bp+si]     ; Legal
;       mov     ax, [bx+bp]     ; Illegal - two base registers
;       mov     ax, [di+si]     ; Illegal - two index registers
Table 3.1 shows the register modes in which you can specify indirect memory operands.
Table 3.1    Indirect Addressing with 16-Bit Registers

Mode                                Syntax                  Effective Address
Register indirect                   [BX]                    Contents of register
                                    [BP]
                                    [DI]
                                    [SI]
Base or index                       displacement[BX]        Contents of register plus displacement
                                    displacement[BP]
                                    displacement[DI]
                                    displacement[SI]
Base plus index                     [BX][DI]                Contents of base register plus contents of index register
                                    [BP][DI]
                                    [BX][SI]
                                    [BP][SI]
Base plus index with displacement   displacement[BX][DI]    Sum of base register, index register, and displacement
                                    displacement[BP][DI]
                                    displacement[BX][SI]
                                    displacement[BP][SI]
Different combinations of registers and displacements have different timings, as shown in Reference.
Indirect Memory Operands with 32-Bit Registers

You can write instructions for the 80386/486 processor using either 16-bit or 32-bit segments. Indirect memory operands are different in each case. In 16-bit real mode, the 80386/486 operates the same way as earlier 8086-based processors, with one difference: you can use 32-bit registers. If the 80386/486 processor is enabled (with the .386 or .486 directive), 32-bit general-purpose registers are available with either 16-bit or 32-bit segments. Thirty-two-bit registers eliminate many of the limitations of 16-bit indirect memory operands. You can use 80386/486 features to make your MS-DOS programs run faster and more efficiently if you are willing to sacrifice compatibility with earlier processors.

In 32-bit mode, an offset address can be up to 4 gigabytes. (Segments are still represented in 16 bits.) This effectively eliminates size restrictions on each segment, since few programs need 4 gigabytes of memory. Windows NT uses 32-bit mode and flat model, which spans all segments. XENIX 386 uses 32-bit mode with multiple segments.
80386/486 Enhancements
On the 80386/486, the processor allows you to use any general-purpose 32-bit register as a base or index register, except ESP, which can be a base but not an index. However, you cannot combine 16-bit and 32-bit registers. Several examples are shown here:
        add     edx, [eax]              ; Add double
        mov     dl, [esp+10]            ; Copy byte from stack
        dec     WORD PTR [edx][eax]     ; Decrement word
        cmp     ax, array[ebx][ecx]     ; Compare word from array
        jmp     FWORD PTR table[ecx]    ; Jump into pointer table
Scaling Factors
With 80386/486 registers, the index register can have a scaling factor of 1, 2, 4, or 8. Any register except ESP can be the index register and can have a scaling factor. To specify the scaling factor, use the multiplication operator (*) adjacent to the register. You can use scaling to index into arrays with different sizes of elements. For example, the scaling factor is 1 for byte arrays (no scaling needed), 2 for word arrays, 4 for doubleword arrays, and 8 for quadword arrays. There is no performance penalty for using a scaling factor. Scaling is illustrated in the following examples:
        mov     eax, darray[edx*4]      ; Load double of double array
        mov     eax, [esi*8][edi]       ; Load double of quad array
        mov     ax, wtbl[ecx+2][edx*2]  ; Load word of word array
Scaling is also necessary on earlier processors, but it must be done with separate instructions before the indirect memory operand is used, as described in "Indirect Memory Operands with 16-Bit Registers," previous.

The default segment register is SS if the base register is EBP or ESP. However, if EBP is scaled, the processor treats it as an index register with a value relative to DS, not SS.
All other base registers are relative to DS. If two registers are used, only one can have a scaling factor. The register with the scaling factor is defined as the index register. The other register is defined as the base. If scaling is not used, the first register is the base. If only one register is used, it is considered the base for deciding the default segment unless it is scaled. The following examples illustrate how to determine the base register:
        mov     eax, [edx][ebp*4]   ; EDX base (not scaled - seg DS)
        mov     eax, [edx*1][ebp]   ; EBP base (not scaled - seg SS)
        mov     eax, [edx][ebp]     ; EDX base (first - seg DS)
        mov     eax, [ebp][edx]     ; EBP base (first - seg SS)
        mov     eax, [ebp]          ; EBP base (only - seg SS)
        mov     eax, [ebp*2]        ; EBP*2 index (seg DS)
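A scaled index removes the separate shift instruction from an array loop. The sketch below sums a doubleword array from a 16-bit MS-DOS program, assuming MASM 6.1 and an 80386 or later processor. Placing .386 after .MODEL is intended to keep the segments 16-bit. The names total and sumloop and the array contents are hypothetical.

```asm
        .MODEL  small
        .386                        ; Enable 32-bit registers; placed after
                                    ;   .MODEL so segments stay 16-bit
        .STACK  100h

        .DATA
darray  DWORD   100000, 200000, 300000  ; Hypothetical doubleword array
total   DWORD   0

        .CODE
        .STARTUP
        xor     eax, eax            ; EAX = running sum
        xor     edx, edx            ; EDX = element number; upper half
                                    ;   must be 0 in a 16-bit segment
sumloop:
        add     eax, darray[edx*4]  ; Scale by 4 inside the operand
        inc     edx                 ; Next element
        cmp     edx, LENGTHOF darray
        jb      sumloop
        mov     total, eax          ; 100000 + 200000 + 300000 = 600000
        .EXIT
        END
```

EDX is the only register in the operand and it is scaled, so it acts as an index relative to DS, which is where darray lives.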
Mixing 16-Bit and 32-Bit Registers

Assembly statements can mix 16-bit and 32-bit registers. For example, the following statement is legal for 16-bit and 32-bit segments:
mov eax, [bx]
This statement moves the 32-bit value pointed to by BX into the EAX register. Although BX is a 16-bit pointer, it can still point into a 32-bit segment. However, the following statement is never legal, since you cannot use the CX register as a 16-bit pointer:
; mov eax, [cx] ; illegal
Operands that mix 16-bit and 32-bit registers are also illegal:
; mov eax, [ebx+si] ; illegal
The following statement is legal in either 16-bit or 32-bit mode:

mov bx, [eax]
This statement moves the 16-bit value pointed to by EAX into the BX register. This works in 32-bit mode. However, in 16-bit mode, moving a 32-bit pointer into a 16-bit segment is illegal. If EAX contains a 16-bit value (the top half of the 32-bit register is 0), the statement works. However, if the top half of the EAX register is not 0, the operand points into a part of the segment that doesn't exist, generating an error. If you use 32-bit registers as indexes in 16-bit mode, you must make sure that the index registers contain valid 16-bit addresses.
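One way to guarantee a valid 16-bit address in a 32-bit register is to zero-extend a 16-bit offset into it. This is a minimal sketch, assuming a 16-bit small-model MS-DOS program assembled with MASM 6.1 for the 80386 or later; the names value and copy are hypothetical.

```asm
        .MODEL  small
        .386                        ; After .MODEL: 16-bit segments
        .STACK  100h

        .DATA
value   WORD    1234h               ; Hypothetical data item
copy    WORD    0

        .CODE
        .STARTUP
        mov     bx, OFFSET value    ; 16-bit offset within DGROUP
        movzx   eax, bx             ; Zero-extend: top half of EAX = 0
        mov     dx, [eax]           ; Legal: EAX holds a valid 16-bit
                                    ;   address, relative to DS
        mov     copy, dx
        .EXIT
        END
```

MOVZX clears the upper 16 bits in the same instruction that loads the offset, so no stale value can remain in the top half of EAX.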
The Program Stack

The preceding discussion on memory operands lays the groundwork for understanding the important data area known as the stack.
A stack is an area of memory for storing data temporarily. Unlike other segments that store data starting from low memory, the stack stores data starting from high memory. Data is always pushed onto, or popped from, the top of the stack.

The stack gets its name from its similarity to the spring-loaded plate holders in cafeterias. You add and remove plates from only the top of the stack. To retrieve the third plate, you must remove (that is, pop) the first two plates. Stacks are often referred to as LIFO buffers, from their last-in-first-out operation.

A stack is an essential part of any nontrivial program. A program continually uses its stack to temporarily store return addresses, procedure arguments, memory data, flags, or registers. The SP register serves as an indirect memory operand to the top of the stack.

At first, the stack is an uninitialized segment of a finite size. As your program adds data to the stack, the stack grows downward from high memory to low memory. When you remove items from the stack, it shrinks upward from low to high memory.
Saving Operands on the Stack

The PUSH instruction stores a 2-byte operand on the stack. The POP instruction retrieves the most recently pushed value. When a value is pushed onto the stack, the processor decreases the SP (Stack Pointer) register by 2. On 8086-based processors, the SP register always points to the top of the stack. The PUSH and POP instructions use the SP register to keep track of the current position. When a value is popped off the stack, the processor increases the SP register by 2.

Since the stack always contains word values, the SP register changes in multiples of two. When a PUSH or POP instruction executes in a 32-bit code segment (one with USE32 use type), the processor transfers a 4-byte value, and ESP changes in multiples of four.

Note  The 8086 and 8088 processors differ from later Intel processors in how they push and pop the SP register. If you give the statement push sp with the 8086 or 8088, the word pushed is the word in SP after the push operation.
Figure 3.1 illustrates how pushes and pops change the SP register.
Figure 3.1    Stack Status Before and After Pushes and Pops
On the 8086, PUSH and POP take only registers or memory expressions as their operands. The other processors allow an immediate value to be an operand for PUSH. For example, the following statement is legal on the 80186-80486 processors:
push 7 ; 3 clocks on 80286
That statement is faster than these equivalent statements, which are required on the 8088 or 8086:
        mov     ax, 7       ; 2 clocks plus
        push    ax          ; 3 clocks on 80286
Words are popped off the stack in reverse order: the last item pushed is the first popped. To return the stack to its original status, you do the same number of
pops as pushes. You can add the correct number of bytes to the SP register if you want to restore the stack without using the values on it. (Because the stack grows downward, adding to SP discards items.)

To reference operands on the stack, remember that the values pointed to by the BP (Base Pointer) and SP registers are relative to the SS (Stack Segment) register. The BP register is often used to point to the base of a frame of reference (a stack frame) within the stack. This example shows how you can access values on the stack using indirect memory operands with BP as the base register.
        push    bp              ; Save current value of BP
        mov     bp, sp          ; Set stack frame
        push    ax              ; Push first;  SP = BP - 2
        push    bx              ; Push second; SP = BP - 4
        push    cx              ; Push third;  SP = BP - 6
        .
        .
        .
        mov     ax, [bp-6]      ; Put third word in AX
        mov     bx, [bp-4]      ; Put second word in BX
        mov     cx, [bp-2]      ; Put first word in CX
        .
        .
        .
        add     sp, 6           ; Restore stack pointer
                                ;   (two bytes per push)
        pop     bp              ; Restore BP
If you often use these stack values in your program, you may want to give them labels. For example, you can use TEXTEQU to create a label such as count TEXTEQU <[bp-6]>. Now you can replace the mov ax, [bp-6] statement in the previous example with mov ax, count. For more information about the TEXTEQU directive, see "Text Macros" in Chapter 9.
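The labeling technique can be sketched in a complete program. This is an illustration only, assuming a small-model MS-DOS program assembled with MASM 6.1; the text macro names val1 and val2 and the variable sum are hypothetical.

```asm
        .MODEL  small
        .STACK  100h

val1    TEXTEQU <[bp-2]>            ; Hypothetical label for first push
val2    TEXTEQU <[bp-4]>            ; Hypothetical label for second push

        .DATA
sum     WORD    0

        .CODE
        .STARTUP
        push    bp                  ; Save current value of BP
        mov     bp, sp              ; Set stack frame
        mov     ax, 10
        push    ax                  ; val1; SP = BP - 2
        mov     ax, 20
        push    ax                  ; val2; SP = BP - 4
        mov     ax, val1            ; Expands to mov ax, [bp-2]
        add     ax, val2            ; Expands to add ax, [bp-4]
        mov     sum, ax             ; 10 + 20 = 30
        add     sp, 4               ; Discard two words (stack grows down)
        pop     bp                  ; Restore BP
        .EXIT
        END
```

The operand size comes from AX in both references, so the text macros need no PTR operator here.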
Saving Flags on the Stack

Your program can push flags onto the stack and pop them off again with the PUSHF and POPF instructions. These instructions save and then restore the status of the flags. You can also use them within a procedure to save and restore the flag status of the caller. The 32-bit versions of these instructions are PUSHFD and POPFD. This example saves the flags register before calling the systask procedure:
        pushf
        call    systask
        popf
If you do not need to store the entire flags register, you can use the LAHF instruction to manually load and store the status of the lower byte of the flag register in the AH register. SAHF restores the value.
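A short sketch of LAHF and SAHF follows, assuming a small-model MS-DOS program assembled with MASM 6.1; the names same and done are hypothetical. Only the low byte of the flags register (SF, ZF, AF, PF, and CF) travels through AH, so the overflow flag is not preserved by this technique.

```asm
        .MODEL  small
        .STACK  100h

        .DATA
same    BYTE    0                   ; Hypothetical result flag

        .CODE
        .STARTUP
        mov     ax, 5
        cmp     ax, 5               ; Equal, so ZF is set
        lahf                        ; AH = low byte of flags (AX no
                                    ;   longer needed as data)
        mov     bx, 1
        add     bx, 1               ; Result is 2, so ZF is cleared
        sahf                        ; Restore SF, ZF, AF, PF, CF from AH
        jne     done                ; ZF is set again; jump not taken
        mov     same, 1             ; Record that the compare was equal
done:
        .EXIT
        END
```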
Saving Registers on the Stack (80186-80486 Only)

Starting with the 80186 processor, the PUSHA and POPA instructions push or pop all the general-purpose registers with only one instruction. These instructions save the status of all registers before a procedure call and restore them after the return. Using PUSHA and POPA is significantly faster and takes fewer bytes of code than pushing and popping each register individually. The processor pushes the registers in the following order: AX, CX, DX, BX, SP, BP, SI, and DI. The SP word pushed is the value before the first register is pushed. The processor pops the registers in the opposite order. The 32-bit versions of these instructions are PUSHAD and POPAD.
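The following sketch wraps a call in PUSHA and POPA, assuming a small-model MS-DOS program assembled with MASM 6.1 for the 80186 or later. The procedure scramble is hypothetical and exists only to overwrite some registers.

```asm
        .MODEL  small
        .186                        ; PUSHA and POPA need 80186 or later
        .STACK  100h

        .CODE
scramble PROC   NEAR                ; Hypothetical routine that
        xor     ax, ax              ;   overwrites several registers
        mov     cx, 0FFFFh
        mov     si, ax
        ret
scramble ENDP

        .STARTUP
        mov     cx, 10              ; Value the caller wants to keep
        pusha                       ; Save AX, CX, DX, BX, SP, BP, SI, DI
        call    scramble
        popa                        ; Restore them; CX is 10 again
        .EXIT
        END
```

Because POPA restores AX as well, this pattern suits procedures that do not return a value in a register.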
Accessing Data with Pointers and Addresses

A pointer is simply a variable that contains an address of some other variable. The address in the pointer points to the other object. Pointers are useful when transferring a large data object (such as an array) to a procedure. The caller places only the pointer on the stack, which the called procedure uses to locate the array. This eliminates the impractical step of having to pass the entire array back and forth through the stack. There is a difference between a far address and a far pointer. A far address is the address of a variable located in a far data segment. A far pointer is a variable that contains the segment address and offset of some other data. Like any other variable, a pointer can be located in either the default (near) data segment or in a far segment. Previous versions of MASM allow pointer variables but provide little support for them. In previous versions, any address loaded into a variable can be considered a pointer, as in the following statements:
Var     BYTE    0       ; Variable
npVar   WORD    Var     ; Near pointer to variable
fpVar   DWORD   Var     ; Far pointer to variable
If a variable is initialized with the name of another variable, the initialized variable is a pointer, as shown in this example. However, in previous versions of MASM, the CodeView debugger recognizes npVar and fpVar as word and doubleword variables. CodeView does not treat them as pointers, nor does it recognize the type of data they point to (bytes, in the example).
The TYPEDEF directive and enhanced capabilities of ASSUME (introduced in MASM 6.0) make it easier to manage pointers in registers and variables. The rest of this chapter describes these directives and how they apply to basic pointer operations.
Defining Pointer Types with TYPEDEF

The TYPEDEF directive can define types for pointer variables. A type so defined is considered the same as the intrinsic types provided by the assembler and can be used in the same contexts. When used to define pointers, the syntax for TYPEDEF is:

typename TYPEDEF [[distance]] PTR qualifiedtype

The typename is the name assigned to the new type. The distance can be NEAR, FAR, or any distance modifier. The qualifiedtype can be any intrinsic or previously defined MASM type, or a type previously defined with TYPEDEF. (For a full definition of qualifiedtype, see "Data Types" in Chapter 1.)

Here are some examples of user-defined types:
PBYTE   TYPEDEF      PTR BYTE   ; Pointer to bytes
NPBYTE  TYPEDEF NEAR PTR BYTE   ; Near pointer to bytes
FPBYTE  TYPEDEF  FAR PTR BYTE   ; Far pointer to bytes
PWORD   TYPEDEF      PTR WORD   ; Pointer to words
NPWORD  TYPEDEF NEAR PTR WORD   ; Near pointer to words
FPWORD  TYPEDEF  FAR PTR WORD   ; Far pointer to words

PPBYTE  TYPEDEF      PTR PBYTE  ; Pointer to pointer to bytes
                                ;   (in C, an array of strings)
PVOID   TYPEDEF      PTR        ; Pointer to any type of data

PERSON  STRUCT                  ; Structure type
  name  BYTE    20 DUP (?)
  num   WORD    ?
PERSON  ENDS
PPERSON TYPEDEF      PTR PERSON ; Pointer to structure type
The distance of a pointer can be set specifically or determined automatically by the memory model (set by .MODEL) and the segment size (16 or 32 bits). If you don't use .MODEL, near pointers are the default.

In 16-bit mode, a near pointer is 2 bytes that contain the offset of the object pointed to. A far pointer requires 4 bytes, and contains both the segment and offset. In 32-bit mode, a near pointer is 4 bytes and a far pointer is 6 bytes, since segments are still word values in 32-bit mode. If you specify the distance with NEAR or FAR, the assembler uses the default distance of the current segment size. You can use NEAR16, NEAR32, FAR16, and FAR32 to override the defaults set by the current segment size. In flat model, NEAR is the default.

You can declare pointer variables with a pointer type created with TYPEDEF. Here are some examples using these pointer types.
; Type declarations
Array   WORD    25 DUP (0)
Msg     BYTE    "This is a string", 0
pMsg    PBYTE   Msg             ; Pointer to string
pArray  PWORD   Array           ; Pointer to word array
npMsg   NPBYTE  Msg             ; Near pointer to string
npArray NPWORD  Array           ; Near pointer to word array
fpArray FPWORD  Array           ; Far pointer to word array
fpMsg   FPBYTE  Msg             ; Far pointer to string

S1      BYTE    "first", 0      ; Some strings
S2      BYTE    "second", 0
S3      BYTE    "third", 0
pS123   PBYTE   S1, S2, S3, 0   ; Array of pointers to strings
ppS123  PPBYTE  pS123           ; A pointer to pointers to strings

Andy    PERSON  <>              ; Structure variable
pAndy   PPERSON Andy            ; Pointer to structure variable

; Procedure prototype
EXTERN  ptrArray:PBYTE          ; External variable
Sort    PROTO   pArray:PBYTE    ; Parameter for prototype

; Parameter for procedure
Sort    PROC    pArray:PBYTE
        LOCAL   pTmp:PBYTE      ; Local variable
        .
        .
        .
        ret
Sort    ENDP
Once defined, pointer types can be used in any context where intrinsic types are allowed.
Defining Register Types with ASSUME

You can use the ASSUME directive with general-purpose registers to specify that a register is a pointer to a certain size of object. For example:
        ASSUME  bx:PTR WORD     ; Assume BX is now a word pointer
        inc     [bx]            ; Increment word pointed to by BX
        add     bx, 2           ; Point to next word
        mov     [bx], 0         ; Word pointed to by BX = 0
        .                       ; Other pointer operations with BX
        .
        .
        ASSUME  bx:NOTHING      ; Cancel assumption
In this example, BX is specified as a pointer to a word. After a sequence of using BX as a pointer, the assumption is canceled by assuming NOTHING. Without the assumption to PTR WORD, many instructions need a size specifier. The INC and MOV statements from the previous examples would have to be written like this to specify the sizes of the memory operands:
        inc     WORD PTR [bx]
        mov     WORD PTR [bx], 0
When you have used ASSUME, attempts to use the register for other purposes generate assembly errors. In this example, while the PTR WORD assumption is in effect, any use of BX inconsistent with its ASSUME declaration generates an error. For example,
; mov al, [bx] ; Can't move word to byte register
You can also use the PTR operator to override defaults:

mov al, BYTE PTR [bx] ; Legal
Similarly, you can use ASSUME to prevent the use of a register as a pointer, or even to disable a register:
        ASSUME  bx:WORD, dx:ERROR
;       mov     al, [bx]        ; Error - BX is an integer, not a pointer
;       mov     ax, dx          ; Error - DX disabled
For information on using ASSUME with segment registers, refer to "Setting the ASSUME Directive for Segment Registers" in Chapter 2.
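As a minimal sketch of a register assumption used inside a loop, the following program increments every element of a word array through BX. The names Values and next are illustrative only, and the 16-bit small-model layout is an assumption rather than something this section requires.

```masm
        .MODEL  small
        .STACK  100h
        .DATA
Values  WORD    10, 20, 30, 40          ; Hypothetical sample data
        .CODE
        .STARTUP
        mov     bx, OFFSET Values       ; BX = address of first element
        mov     cx, LENGTHOF Values     ; CX = number of elements (4)
        ASSUME  bx:PTR WORD             ; BX now points to words
next:   inc     [bx]                    ; No WORD PTR needed here
        add     bx, 2                   ; Point to next word
        loop    next                    ; Repeat for each element
        ASSUME  bx:NOTHING              ; Cancel assumption
        .EXIT
        END
```

Without the ASSUME statement, the INC instruction would need an explicit WORD PTR, because [bx] alone carries no size.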
Basic Pointer and Address Operations

A program can perform the following basic operations with pointers and addresses:
- Initialize a pointer variable by storing an address in it.
- Load an address into registers, directly or from a pointer.
The sections in the rest of this chapter describe variations of these tasks with pointers and addresses. The examples are used with the assumption that you have previously defined the following pointer types with the TYPEDEF directive:
PBYTE   TYPEDEF      PTR BYTE   ; Pointer to bytes
NPBYTE  TYPEDEF NEAR PTR BYTE   ; Near pointer to bytes
FPBYTE  TYPEDEF FAR  PTR BYTE   ; Far pointer to bytes
Initializing Pointer Variables

If the value of a pointer is known at assembly time, the assembler can initialize it automatically so that no processing time is wasted on the task at run time. The following example shows how to do this, placing the address of Msg in the pointer pMsg.

Msg     BYTE    "String", 0
pMsg    PBYTE   Msg
If a pointer variable can be conditionally defined to one of several constant addresses, initialization must be delayed until run time. The technique is different for near pointers than for far pointers, as shown here:
Msg1    BYTE    "String1"
Msg2    BYTE    "String2"
npMsg   NPBYTE  ?
fpMsg   FPBYTE  ?
        .
        .
        .
        mov     npMsg, OFFSET Msg1              ; Load near pointer
        mov     WORD PTR fpMsg[0], OFFSET Msg2  ; Load far offset
        mov     WORD PTR fpMsg[2], SEG Msg2     ; Load far segment
If you know that the segment for a far pointer is in a register, you can load it directly:
        mov     WORD PTR fpMsg[2], ds   ; Load segment of
                                        ;   far pointer
Dynamic Addresses
Often a pointer must point to a dynamic address, meaning the address depends on a run-time condition. Typical situations include memory allocated by MS-DOS (see "Interrupt 21h Function 48h" in Help) and addresses found by the SCAS or CMPS instructions (see "Processing Strings" in Chapter 5). The following illustrates the technique for saving dynamic addresses:
; Dynamically allocated buffer
fpBuf   FPBYTE  0               ; Initialize so offset will be zero
        .
        .
        .
        mov     ah, 48h         ; Allocate memory
        mov     bx, 10h         ; Request 16 paragraphs
        int     21h             ; Call DOS
        jc      error           ; Return segment in AX
        mov     WORD PTR fpBuf[2], ax   ; Load segment
        .                               ;   (offset is already 0)
        .
        .
error:                          ; Handle error
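A saved dynamic address is typically needed again when the block is released. The fragment below is a sketch that assumes fpBuf was filled in by Function 48h as shown above. It uses DOS Interrupt 21h Function 49h, which frees the block whose segment is in ES and sets the carry flag on failure. The label freeError is hypothetical.

```masm
fpBuf   FPBYTE  0                       ; Far pointer filled in by Function 48h
        .
        .
        .
        mov     es, WORD PTR fpBuf[2]   ; ES = segment of allocated block
        mov     ah, 49h                 ; DOS Function 49h: free memory
        int     21h                     ; Call DOS
        jc      freeError               ; Carry set if the call failed
        .
        .
        .
freeError:                              ; Handle error (hypothetical label)
```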
Copying Pointers
Sometimes one pointer variable must be initialized by copying from another. Here are two ways to copy a far pointer:
fpBuf1  FPBYTE  ?
fpBuf2  FPBYTE  ?
        .
        .
        .
; Copy through registers is faster, but requires a spare register
        mov     ax, WORD PTR fpBuf1[0]
        mov     WORD PTR fpBuf2[0], ax
        mov     ax, WORD PTR fpBuf1[2]
        mov     WORD PTR fpBuf2[2], ax

; Copy through stack is slower, but does not use a register
        push    WORD PTR fpBuf1[0]
        push    WORD PTR fpBuf1[2]
        pop     WORD PTR fpBuf2[2]
        pop     WORD PTR fpBuf2[0]
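On an 80386 or later processor, a 4-byte far pointer fits in one extended register, so the copy can be done with a single load and a single store. The following sketch assumes a 16-bit small-model program; Buffer is a hypothetical target added only so that fpBuf1 has something to point to.

```masm
        .MODEL  small
        .386                            ; After .MODEL, so segments stay 16-bit
        .STACK  100h
        .DATA
Buffer  BYTE    16 DUP (0)              ; Hypothetical target
fpBuf1  FPBYTE  Buffer                  ; 4-byte far pointer (16-bit mode)
fpBuf2  FPBYTE  ?
        .CODE
        .STARTUP
        mov     eax, DWORD PTR fpBuf1   ; Offset and segment in one load
        mov     DWORD PTR fpBuf2, eax   ; Store both words at once
        .EXIT
        END
```

The DWORD PTR override is used because the variables are declared as far pointers rather than as doublewords.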
Pointers as Arguments
Most high-level-language procedures and library functions accept arguments passed on the stack. "Passing Arguments on the Stack" in Chapter 7 covers this subject in detail. A pointer is passed in the same way as any other variable, as this fragment shows:

; Push a far pointer (segment always pushed first)
        push    WORD PTR fpMsg[2]       ; Push segment
        push    WORD PTR fpMsg[0]       ; Push offset
Pushing an address has the same result as pushing a pointer to the address:
; Push a far address as a far pointer
        mov     ax, SEG fVar    ; Load and push segment
        push    ax
        mov     ax, OFFSET fVar ; Load and push offset
        push    ax
On the 80186 and later processors, you can push a constant in one step:
        push    SEG fVar        ; Push segment
        push    OFFSET fVar     ; Push offset
Loading Addresses into Registers

Loading a near address into a register (or a far address into a pair of registers) is a common task in assembly-language programming. To reference data pointed to by a pointer, your program must first place the pointer into a register or pair of registers. Load far addresses as segment:offset pairs. The following pairs have specific uses:
Segment:Offset Pair     Standard Use

DS:SI                   Source for string operations
ES:DI                   Destination for string operations
DS:DX                   Input for certain DOS functions
ES:BX                   Output from certain DOS functions
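As one illustration of the DS:DX convention, DOS Interrupt 21h Function 09h displays the string whose address is in DS:DX and stops at the first dollar sign. The sketch below assumes a small-model program, in which .STARTUP has already loaded DS; the name Msg and its text are illustrative.

```masm
        .MODEL  small
        .STACK  100h
        .DATA
Msg     BYTE    "Hello", 13, 10, "$"    ; Function 09h stops at "$"
        .CODE
        .STARTUP                        ; Sets DS to the data segment
        mov     dx, OFFSET Msg          ; DS:DX = address of Msg
        mov     ah, 09h                 ; DOS Function 09h: display string
        int     21h                     ; Call DOS
        .EXIT
        END
```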
Addresses from Data Segments

For near addresses, you need only load the offset; the segment is assumed as SS for stack-based data and as DS for other data. You must load both segment and offset for far pointers.
Here is an example of loading an address into DS:BX from a near data segment:
        .DATA
Msg     BYTE    "String"
        .
        .
        .
        mov     bx, OFFSET Msg  ; Load address to BX
                                ;   (DS already loaded)
Far data can be loaded like this:

        .FARDATA
Msg     BYTE    "String"
        .
        .
        .
        mov     ax, SEG Msg     ; Load address to ES:BX
        mov     es, ax
        mov     bx, OFFSET Msg
You can also read a far address from a pointer in one step, using the LES and LDS instructions described next.
Far Pointers
The LES and LDS instructions load a far pointer into a segment:offset register pair. The instructions copy the pointer's low word (the offset) into the given register, and the high word (the segment) into either ES or DS. The following example shows how to load a far pointer into ES:DI:
OutBuf  BYTE    20 DUP (0)
fpOut   FPBYTE  OutBuf
        .
        .
        .
        les     di, fpOut       ; Load far pointer into ES:DI
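LDS works the same way but replaces DS, so code that still needs its own data segment afterward usually saves and restores DS around the access. The following is a sketch under that assumption; InBuf and fpIn are illustrative names.

```masm
        .MODEL  small
        .STACK  100h
        .DATA
InBuf   BYTE    "Sample", 0             ; Hypothetical source string
fpIn    FPBYTE  InBuf                   ; Far pointer to the string
        .CODE
        .STARTUP
        push    ds                      ; Save DS; LDS overwrites it
        lds     si, fpIn                ; SI = offset (low word),
                                        ;   DS = segment (high word)
        mov     al, [si]                ; AL = first byte of the string
        pop     ds                      ; Restore DS
        .EXIT
        END
```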
Stack Variables
The technique for loading the address of a stack variable is significantly different from the technique for loading near addresses. You may need to put the correct segment value into ES for string operations. The following example illustrates how to load the address of a local (stack) variable to ES:DI:
Task    PROC
        LOCAL   Arg[4]:BYTE

        push    ss              ; Since it's stack-based, segment is SS
        pop     es              ; Copy SS to ES
        lea     di, Arg         ; Load offset to DI
The local variable in this case actually evaluates to SS:[BP-4]. This is an offset from the stack frame (described in "Passing Arguments on the Stack," Chapter 7). Since you cannot use the OFFSET operator to get the offset of an indirect memory operand, you must use the LEA (Load Effective Address) instruction.
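Once ES:DI holds the address of the local variable, string instructions can operate on it directly. The sketch below clears the 4-byte local array with STOSB. The C language type in .MODEL, the USES list, and the fill value are assumptions made for this illustration.

```masm
        .MODEL  small, c
        .STACK  100h
        .CODE
Task    PROC    USES es di
        LOCAL   Arg[4]:BYTE

        push    ss                      ; Stack-based, so segment is SS
        pop     es                      ; Copy SS to ES
        lea     di, Arg                 ; ES:DI = address of Arg
        sub     al, al                  ; Fill value: 0
        mov     cx, 4                   ; Arg is 4 bytes long
        cld                             ; Fill toward higher addresses
        rep     stosb                   ; Clear the local array
        ret
Task    ENDP

        .STARTUP
        call    Task
        .EXIT
        END
```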
Direct Memory Operands

To get the address of a direct memory operand, use either the LEA instruction or the MOV instruction with OFFSET. Though both methods have the same effect, the MOV instruction produces smaller and faster code, as shown in this example:
        lea     si, Msg         ; Four byte instruction
        mov     si, OFFSET Msg  ; Three byte equivalent
Copying Between Segment Pairs

Copying from one register pair to another is complicated by the fact that you cannot copy one segment register directly to another. Two copying methods are shown here. Timings are for the 8088 processor.
; Copy DS:SI to ES:DI, generating smaller code
        push    ds              ; 1 byte, 14 clocks
        pop     es              ; 1 byte, 12 clocks
        mov     di, si          ; 2 bytes, 2 clocks

; Copy DS:SI to ES:DI, generating faster code
        mov     di, ds          ; 2 bytes, 2 clocks
        mov     es, di          ; 2 bytes, 2 clocks
        mov     di, si          ; 2 bytes, 2 clocks
Model-Independent Techniques
Often you may want to write code that is memory-model independent. If you are writing libraries that must be available for different memory models, you can use conditional assembly to handle different sizes of pointers, including pointer variables that have no specified distance. The predefined symbols @DataSize and @Model let you test the current assumptions. The predefined symbol @DataSize tests the pointer size for the current memory model:
Msg1    BYTE    "String1"
pMsg    PBYTE   ?
        .
        .
        .
        IF      @DataSize                       ; @DataSize > 0 for far
        mov     WORD PTR pMsg[0], OFFSET Msg1   ; Load far offset
        mov     WORD PTR pMsg[2], SEG Msg1      ; Load far segment
        ELSE                                    ; @DataSize = 0 for near
        mov     pMsg, OFFSET Msg1               ; Load near pointer
        ENDIF
In the following example, a procedure receives as an argument a pointer to a word variable. The code inside the procedure uses @DataSize to determine whether the current memory model supports far or near data. It loads and processes the data accordingly:
; Procedure that receives an argument by reference
mul8    PROC    arg:PTR WORD

        IF      @DataSize
        les     bx, arg         ; Load far pointer to ES:BX
        mov     ax, es:[bx]     ; Load the data pointed to
        ELSE
        mov     bx, arg         ; Load near pointer to BX (assume DS)
        mov     ax, [bx]        ; Load the data pointed to
        ENDIF
        shl     ax, 1           ; Multiply by 8
        shl     ax, 1
        shl     ax, 1
        ret
mul8    ENDP
If you have many routines, writing the conditionals for each case can be tedious. The following conditional statements automatically generate the proper instructions and segment overrides.
; Equates for conditional handling of pointers
        IF      @DataSize
lesIF   TEXTEQU <les>
ldsIF   TEXTEQU <lds>
esIF    TEXTEQU <es:>
        ELSE
lesIF   TEXTEQU <mov>
ldsIF   TEXTEQU <mov>
esIF    TEXTEQU <>
        ENDIF
Once you define these conditionals, you can use them to simplify code that must handle several types of pointers. This next example rewrites the above mul8 procedure to use conditional code.
mul8    PROC    arg:PTR WORD

        lesIF   bx, arg         ; Load pointer to BX or ES:BX
        mov     ax, esIF [bx]   ; Load the data from [BX] or ES:[BX]
        shl     ax, 1           ; Multiply by 8
        shl     ax, 1
        shl     ax, 1
        ret
mul8    ENDP
The conditional statements from these examples can be defined once in an include file and used whenever you need to handle pointers.
Chapter 4
Defining and Using Simple Data Types
This chapter covers the concepts essential for working with simple data types in assembly-language programs. The first section shows how to declare integer variables. The second section describes basic operations including moving, loading, and sign-extending numbers, as well as calculating. The last section describes how to do various operations with numbers at the bit level, such as using bitwise logical instructions and shifting and rotating bits.

The complex data types introduced in the next chapter (arrays, strings, structures, unions, and records) use many of the operations illustrated in this chapter. Floating-point operations require a different set of instructions and techniques. These are covered in Chapter 6, "Using Floating-Point and Binary Coded Decimal Numbers."
Declaring Integer Variables

An integer is a whole number, such as 4 or 4,444. Unlike the real numbers discussed in Chapter 6, integers have no fractional part. You can initialize integer variables in several ways with the data allocation directives. This section explains how to use the SIZEOF and TYPE operators to provide information to the assembler about the types in your program. For information on symbolic integer constants, see "Integer Constants and Constant Expressions" in Chapter 1.
Allocating Memory for Integer Variables

When you declare an integer variable by assigning a label to a data allocation directive, the assembler allocates memory space for the integer. The variable's name becomes a label for the memory space. The syntax is:

[[name]] directive initializer
Project: Author: Ruth L Silverio Last Saved By: Ruth L Silverio Printed: 10/02/00 04:23 PM
The following directives indicate the integer's size and value range:

Directive                          Description of Initializers

BYTE, DB (byte)                    Allocates unsigned numbers from 0 to 255.
SBYTE (signed byte)                Allocates signed numbers from -128 to +127.
WORD, DW (word = 2 bytes)          Allocates unsigned numbers from 0 to 65,535 (64K).
SWORD (signed word)                Allocates signed numbers from -32,768 to +32,767.
DWORD, DD (doubleword = 4 bytes)   Allocates unsigned numbers from 0 to
                                   4,294,967,295 (4 gigabytes).
SDWORD (signed doubleword)         Allocates signed numbers from -2,147,483,648
                                   to +2,147,483,647.
FWORD, DF (farword = 6 bytes)      Allocates 6-byte (48-bit) integers. These values
                                   are normally used only as pointer variables on
                                   the 80386/486 processors.
QWORD, DQ (quadword = 8 bytes)     Allocates 8-byte integers used with 8087-family
                                   coprocessor instructions.
TBYTE, DT (10 bytes)               Allocates 10-byte (80-bit) integers if the
                                   initializer has a radix specifying the base of
                                   the number.
See Chapter 6 for information on the REAL4, REAL8, and REAL10 directives that allocate real numbers. The SIZEOF and TYPE operators, when applied to a type, return the size of an integer of that type. The size attribute associated with each data type is:
Data Type          Bytes

BYTE, SBYTE        1
WORD, SWORD        2
DWORD, SDWORD      4
FWORD              6
QWORD              8
TBYTE              10
The data types SBYTE, SWORD, and SDWORD tell the assembler to treat the initializers as signed data. It is important to use these signed types with high-level constructs such as .IF, .WHILE, and .REPEAT, and with PROTO and INVOKE directives. For descriptions of these directives, see the sections "Loop-Generating Directives," "Declaring Procedure Prototypes," and "Calling Procedures with INVOKE" in Chapter 7.
The assembler stores integers with the least significant bytes lowest in memory. Note that assembler listings and most debuggers show the bytes of a word in the opposite order, high byte first. Figure 4.1 illustrates the integer formats.
Figure 4.1 Integer Formats
Although the TYPEDEF directive's primary purpose is to define pointer variables (see "Defining Pointer Types with TYPEDEF" in Chapter 3), you can also use TYPEDEF to create an alias for any integer type. For example, these declarations

char    TYPEDEF SBYTE
long    TYPEDEF DWORD
float   TYPEDEF REAL4
double  TYPEDEF REAL8
allow you to use char, long, float, or double in your programs if you prefer the C data labels.
Data Initialization
You can initialize variables when you declare them with constants or expressions that evaluate to constants. The assembler generates an error if you specify an initial value too large for the variable type. A ? in place of an initializer indicates you do not require the assembler to initialize the variable. The assembler allocates the space but does not write in it. Use ? for buffer areas or variables your program will initialize at run time.
You can declare and initialize variables in one step with the data directives, as these examples show.
integer    BYTE    16           ; Initialize byte to 16
negint     SBYTE   -16          ; Initialize signed byte to -16
expression WORD    4*3          ; Initialize word to 12
signedexp  SWORD   4*3          ; Initialize signed word to 12
empty      QWORD   ?            ; Allocate uninitialized long int
           BYTE    1,2,3,4,5,6  ; Initialize six unnamed bytes
long       DWORD   4294967295   ; Initialize doubleword to
                                ;   4,294,967,295
longnum    SDWORD  -2147433648  ; Initialize signed doubleword
                                ;   to -2,147,433,648
tb         TBYTE   2345t        ; Initialize 10-byte binary number

For information on arrays and on using the DUP operator to allocate initializer lists, see "Arrays and Strings" in Chapter 5.
Working with Simple Variables

Once you have declared integer variables in your program, you can copy, move, and sign-extend them in your MASM code. This section shows how to do these operations as well as how to add, subtract, multiply, and divide numbers and do bit-level manipulations with logical, shift, and rotate instructions.

Since MASM instructions require operands to be the same size, you may need to operate on data in a size other than that originally declared. You can do this with the PTR operator. For example, you can use the PTR operator to access the high-order word of a DWORD-size variable. The syntax for the PTR operator is

type PTR expression

where the PTR operator forces expression to be treated as having the type specified. An example of this use is
        .DATA
num     DWORD   0
        .CODE
        mov     ax, WORD PTR num[0] ; Loads a word-size value from
        mov     dx, WORD PTR num[2] ;   a doubleword variable
Copying Data
The primary instructions for moving data from operand to operand and loading them into registers are MOV (Move), XCHG (Exchange), CWD (Convert Word to Double), and CBW (Convert Byte to Word).
Moving Data
The most common method of moving data, the MOV instruction, is essentially a copy instruction, since it always copies the source operand to the destination operand without affecting the source. After a MOV instruction, the source and destination operands contain the same value. The following example illustrates the MOV instruction. As explained in "General-Purpose Registers," Chapter 1, you cannot move a value from one location in memory to another in a single operation.
; Immediate value moves
        mov     ax, 7           ; Immediate to register
        mov     mem, 7          ; Immediate to memory direct
        mov     mem[bx], 7      ; Immediate to memory indirect

; Register moves
        mov     mem, ax         ; Register to memory direct
        mov     mem[bx], ax     ; Register to memory indirect
        mov     ax, bx          ; Register to register
        mov     ds, ax          ; General register to segment register

; Direct memory moves
        mov     ax, mem         ; Memory direct to register
        mov     ds, mem         ; Memory to segment register

; Indirect memory moves
        mov     ax, mem[bx]     ; Memory indirect to register
        mov     ds, mem[bx]     ; Memory indirect to segment register

; Segment register moves
        mov     mem, ds         ; Segment register to memory
        mov     mem[bx], ds     ; Segment register to memory indirect
        mov     ax, ds          ; Segment register to general register
The following example shows several common types of moves that require two instructions.
; Move immediate to segment register
        mov     ax, DGROUP      ; Load AX with immediate value
        mov     ds, ax          ; Copy AX to segment register

; Move memory to memory
        mov     ax, mem1        ; Load AX with memory value
        mov     mem2, ax        ; Copy AX to other memory

; Move segment register to segment register
        mov     ax, ds          ; Load AX with segment register
        mov     es, ax          ; Copy AX to segment register
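When no general-purpose register is free, a word can also be copied from memory to memory through the stack, at the cost of two memory accesses per instruction. The following is a minimal sketch; the variable names and the value 1234h are illustrative.

```masm
        .MODEL  small
        .STACK  100h
        .DATA
mem1    WORD    1234h                   ; Hypothetical source
mem2    WORD    ?                       ; Hypothetical destination
        .CODE
        .STARTUP
        push    mem1                    ; Copy mem1 to the stack
        pop     mem2                    ; mem2 = 1234h, no register used
        .EXIT
        END
```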
The MOVSX and MOVZX instructions for the 80386/486 processors extend and copy values in one step. See "Extending Signed and Unsigned Integers," following.
Exchanging Integers
The XCHG (Exchange) instruction exchanges the data in the source and destination operands. You can exchange data between registers or between registers and memory, but not from memory to memory:
        xchg    ax, bx          ; Put AX in BX and BX in AX
        xchg    memory, ax      ; Put "memory" in AX and AX in "memory"
        xchg    mem1, mem2      ; Illegal - can't exchange memory locations
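Two memory locations can still be swapped by routing one of them through a register. The sketch below uses AX as the intermediate; the variable names and values are illustrative.

```masm
        .MODEL  small
        .STACK  100h
        .DATA
mem1    WORD    1111h                   ; Hypothetical values to swap
mem2    WORD    2222h
        .CODE
        .STARTUP
        mov     ax, mem1                ; AX = 1111h
        xchg    ax, mem2                ; mem2 = 1111h, AX = 2222h
        mov     mem1, ax                ; mem1 = 2222h
        .EXIT
        END
```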
Extending Signed and Unsigned Integers

Since moving data between registers of different sizes is illegal, you must sign-extend integers to convert signed data to a larger size. Sign-extending means copying the sign bit of the unextended operand to all bits of the operand's next larger size. This widens the operand while maintaining its sign and value. 8086-based processors provide four instructions specifically for sign-extending. The four instructions act only on the accumulator register (AL, AX, or EAX), as shown in the following list.
Instruction                                     Sign-extend

CBW (convert byte to word)                      AL to AX
CWD (convert word to doubleword)                AX to DX:AX
CWDE (convert word to doubleword extended)*     AX to EAX
CDQ (convert doubleword to quadword)*           EAX to EDX:EAX
*Requires an extended register and applies only to 80386/486 processors.
On the 80386/486 processors, the CWDE instruction converts a signed 16-bit value in AX to a signed 32-bit value in EAX. The CDQ instruction converts a signed 32-bit value in EAX to a signed 64-bit value in the EDX:EAX register pair. This example converts signed integers using CBW, CWD, CWDE, and CDQ.
        .DATA
mem8    SBYTE   -5
mem16   SWORD   +5
mem32   SDWORD  -5
        .CODE
        .
        .
        .
        mov     al, mem8        ; Load 8-bit -5 (FBh)
        cbw                     ; Convert to 16-bit -5 (FFFBh) in AX
        mov     ax, mem16       ; Load 16-bit +5
        cwd                     ; Convert to 32-bit +5 (0000:0005h) in DX:AX
        mov     ax, mem16       ; Load 16-bit +5
        cwde                    ; Convert to 32-bit +5 (00000005h) in EAX
        mov     eax, mem32      ; Load 32-bit -5 (FFFFFFFBh)
        cdq                     ; Convert to 64-bit -5
                                ;   (FFFFFFFF:FFFFFFFBh) in EDX:EAX
These four instructions efficiently convert unsigned values as well, provided the sign bit is zero. This example, for instance, correctly widens mem16 whether you treat the variable as signed or unsigned.

The processor does not differentiate between signed and unsigned values. For instance, the value of mem8 in the previous example is literally 251 (0FBh) to the processor. It ignores the human convention of treating the highest bit as an indicator of sign. The processor can ignore the distinction between signed and unsigned numbers because binary arithmetic works the same in either case.

If you add 7 to mem8, for example, the result is 258 (102h), a value too large to fit into a single byte. The byte-sized mem8 can accommodate only the least-significant digits of the result (02h), and so receives the value of 2. The result is the same whether we treat mem8 as a signed value (-5) or unsigned value (251).

This overview illustrates how the programmer, not the processor, must keep track of which values are signed or unsigned, and treat them accordingly. If AL=127 (01111111y), the instruction CBW sets AX=127 because the sign bit is zero. If AL=128 (10000000y), however, the sign bit is 1. CBW thus sets AX=65,408 (FF80h), which may not be what you had in mind if you assumed AL originally held an unsigned value.

To widen unsigned values, explicitly set the higher register to zero, as shown in the following example:
        .DATA
mem8    BYTE    251
mem16   WORD    251
        .CODE
        .
        .
        .
        mov     al, mem8        ; Load 251 (FBh) from 8-bit memory
        sub     ah, ah          ; Zero upper half (AH)

        mov     ax, mem16       ; Load 251 (FBh) from 16-bit memory
        sub     dx, dx          ; Zero upper half (DX)

        sub     eax, eax        ; Zero entire extended register (EAX)
        mov     ax, mem16       ; Load 251 (FBh) from 16-bit memory
The 80386/486 processors provide instructions that move and extend a value to a larger data size in a single step. MOVSX moves a signed value into a register and sign-extends it. MOVZX moves an unsigned value into a register and zero-extends it.

; 80386/486 instructions
        movzx   dx, bl          ; Load unsigned 8-bit value into
                                ;   16-bit register and zero-extend
These special 80386/486 instructions usually execute much faster than the equivalent 8086/286 instructions.
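To compare the two instructions side by side, the following sketch extends the same byte both ways. The bit pattern 0FBh reads as 251 when treated as unsigned and as -5 when treated as signed. The program layout (small model with 16-bit segments) is an assumption for illustration.

```masm
        .MODEL  small
        .386                            ; After .MODEL, so segments stay 16-bit
        .STACK  100h
        .CODE
        .STARTUP
        mov     bl, 0FBh                ; 251 unsigned, -5 signed
        movsx   ax, bl                  ; AX = 0FFFBh (-5)
        movzx   dx, bl                  ; DX = 00FBh (251)
        movsx   ecx, bl                 ; ECX = 0FFFFFFFBh (-5)
        .EXIT
        END
```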
Adding and Subtracting Integers

You can use the ADD, ADC, INC, SUB, SBB, and DEC instructions for adding, incrementing, subtracting, and decrementing values in single registers. You can also combine them to handle larger values that require two registers for storage.
Adding and Subtracting Integers Directly

The ADD, INC (Increment), SUB, and DEC (Decrement) instructions operate on 8- and 16-bit values on the 8086-80286 processors, and on 8-, 16-, and 32-bit values on the 80386/486 processors. They can be combined with the ADC and SBB instructions to work on 32-bit values on the 8086 and 64-bit values on the 80386/486 processors. (See "Adding and Subtracting in Multiple Registers," following.)
These instructions have two requirements:

1. If there are two operands, only one operand can be a memory operand.
2. If there are two operands, both must be the same size.

To meet the second requirement, you can use the PTR operator to force an operand to the size required. (See "Working with Simple Variables," previous.) For example, if Buffer is an array of bytes and BX points to an element of the array, you can add a word from Buffer with
add ax, WORD PTR Buffer[bx] ; Add word from byte array
The next example shows 8-bit signed and unsigned addition and subtraction.
        .DATA
mem8    BYTE    39
        .CODE
; Addition
                                ;                      signed   unsigned
        mov     al, 26          ; Start with register     26       26
        inc     al              ; Increment                1        1
        add     al, 76          ; Add immediate           76     + 76
                                ;                       ----     ----
                                ;                        103      103
        add     al, mem8        ; Add memory              39     + 39
                                ;                       ----     ----
        mov     ah, al          ; Copy to AH            -114      142
                                ;                  +overflow
        add     al, ah          ; Add register                    142
                                ;                                ----
                                ;                                  28+carry

; Subtraction (the figures below assume mem8 holds 122 at this point)
                                ;                      signed   unsigned
        mov     al, 95          ; Load register           95       95
        dec     al              ; Decrement               -1       -1
        sub     al, 23          ; Subtract immediate     -23      -23
                                ;                       ----     ----
                                ;                         71       71
        sub     al, mem8        ; Subtract memory       -122     -122
                                ;                       ----     ----
                                ;                        -51      205+sign

        mov     ah, 119         ; Load register          119
        sub     al, ah          ;   and subtract         -51
                                ;                       ----
                                ;                         86+overflow
The INC and DEC instructions do not affect the carry flag, so you cannot use it to detect a carry or borrow after them.

When the sum of 8-bit signed operands exceeds 127, the processor sets the overflow flag. (The overflow flag is also set if both operands are negative and the sum is less than -128.) Placing a JO (Jump on Overflow) or INTO (Interrupt on Overflow) instruction in your program at this point can transfer control to error-recovery statements. When the sum exceeds 255, the processor sets the carry flag. A JC (Jump on Carry) instruction at this point can transfer control to error-recovery statements.

In the previous subtraction example, the processor sets the sign flag if the result goes below 0. At this point, you can use a JS (Jump on Sign) instruction to transfer control to error-recovery statements. Jump instructions are described in the "Jumps" section in Chapter 7.
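The difference between the two flags shows up when a sum fits in an unsigned byte but not in a signed one. In the following sketch, 100 + 100 = 200 stays below 256, so the carry flag is clear, but exceeds 127, so the overflow flag is set. The names result and tooBig are hypothetical.

```masm
        .MODEL  small
        .STACK  100h
        .DATA
result  BYTE    ?                       ; Hypothetical destination
        .CODE
        .STARTUP
        mov     al, 100
        add     al, 100                 ; 200 (0C8h): fits unsigned, not signed
        jc      tooBig                  ; Not taken: 200 <= 255, CF clear
        jo      tooBig                  ; Taken: 200 > 127, OF set
        mov     result, al              ; Reached only if neither flag is set
tooBig:                                 ; Error recovery would go here
        .EXIT
        END
```

A program that treats the operands as unsigned would test only the carry flag; one that treats them as signed would test only the overflow flag.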
Adding and Subtracting in Multiple Registers

You can add and subtract numbers larger than the register size on your processor with the ADC (Add with Carry) and SBB (Subtract with Borrow) instructions. If the operations prior to an ADC or SBB instruction do not set the carry flag, these instructions are identical to ADD and SUB. When you operate on large values in more than one register, use ADD and SUB for the least significant part of the number and ADC or SBB for the most significant part. The following example illustrates multiple-register addition and subtraction. You can also use this technique with 64-bit operands on the 80386/486 processors.
        .DATA
mem32   DWORD   316423
mem32a  DWORD   316423
mem32b  DWORD   156739
        .CODE
        .
        .
        .
; Addition
        mov     ax, 43981               ; Load immediate     43981
        sub     dx, dx                  ;   into DX:AX
        add     ax, WORD PTR mem32[0]   ; Add to both     + 316423
        adc     dx, WORD PTR mem32[2]   ;   memory words    ------
                                        ; Result in DX:AX   360404

; Subtraction
        mov     ax, WORD PTR mem32a[0]  ; Load mem32        316423
        mov     dx, WORD PTR mem32a[2]  ;   into DX:AX
        sub     ax, WORD PTR mem32b[0]  ; Subtract low    - 156739
        sbb     dx, WORD PTR mem32b[2]  ;   then high       ------
                                        ; Result in DX:AX   159684
For 32-bit registers on the 80386/486 processors, only two steps are necessary. If your program needs to be assembled for more than one processor, you can assemble the statements conditionally, as shown in this example:
        .DATA
mem32   DWORD   316423
mem32a  DWORD   316423
mem32b  DWORD   156739
p386    TEXTEQU (@Cpu AND 08h)
        .CODE
        .
        .
        .
; Addition
        IF      p386
        mov     eax, 43981      ; Load immediate
        add     eax, mem32      ; Result in EAX
        ELSE
        .
        .                       ; do steps in previous example
        .
        ENDIF

; Subtraction
        IF      p386
        mov     eax, mem32a     ; Load memory
        sub     eax, mem32b     ; Result in EAX
        ELSE
        .
        .                       ; do steps in previous example
        .
        ENDIF
Since the status of the carry flag affects the results of calculations with ADC and SBB, be sure to turn off the carry flag with the CLC (Clear Carry Flag) instruction or use ADD or SUB for the first calculation, when appropriate.
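The same ADD and ADC pairing extends to 64-bit operands on the 80386/486 processors. The sketch below stores the 64-bit value as two doublewords, low half first, and picks operands so that the low halves produce a carry into the high halves. The name mem64 and its value are illustrative.

```masm
        .MODEL  small
        .386                            ; After .MODEL, so segments stay 16-bit
        .STACK  100h
        .DATA
mem64   DWORD   0FFFFFFFFh, 0           ; 64-bit value, low doubleword first
        .CODE
        .STARTUP
        mov     eax, 1                  ; Load immediate 1
        sub     edx, edx                ;   into EDX:EAX
        add     eax, mem64[0]           ; Low: 1 + 0FFFFFFFFh = 0, carry set
        adc     edx, mem64[4]           ; High: 0 + 0 + carry = 1
                                        ; EDX:EAX = 00000001:00000000h
        .EXIT
        END
```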
Multiplying and Dividing Integers

The 8086 family of processors uses different multiplication and division instructions for signed and unsigned integers. Multiplication and division instructions also have special requirements depending on the size of the operands and the processor the code runs on.
Using Multiplication Instructions

The MUL instruction multiplies unsigned numbers. IMUL multiplies signed numbers. For both instructions, one factor must be in the accumulator register (AL for 8-bit numbers, AX for 16-bit numbers, EAX for 32-bit numbers). The other factor can be in any single register or memory operand. The result overwrites the contents of the accumulator register.

Multiplying two 8-bit numbers produces a 16-bit result returned in AX. Multiplying two 16-bit operands yields a 32-bit result in DX:AX. The 80386/486 processors handle 64-bit products in the same way in the EDX:EAX pair.

This example illustrates 8-bit unsigned and 16-bit signed multiplication.
        .DATA
mem16   SWORD   -30000
        .CODE
        .
        .
        .
; 8-bit unsigned multiply
        mov     al, 23          ; Load AL                    23
        mov     bl, 24          ; Load BL                  * 24
        mul     bl              ; Multiply BL             -----
                                ; Product in AX             552
                                ;   overflow and carry set

; 16-bit signed multiply
        mov     ax, 50          ; Load AX                    50
                                ;                        -30000
        imul    mem16           ; Multiply memory         -----
                                ; Product in DX:AX     -1500000
                                ;   overflow and carry set
A nonzero number in the upper half of the result (AH for byte, DX or EDX for word) sets the overflow and carry flags.

On the 80186-80486 processors, the IMUL instruction supports three additional operand combinations. The first syntax option allows for 16-bit multipliers producing a 16-bit product or 32-bit multipliers for 32-bit products on the 80386/486. The result overwrites the destination. The syntax for this operation is:

IMUL register16, immediate

The second syntax option specifies three operands for IMUL. The first operand must be a 16-bit register operand, the second a 16-bit memory (or register) operand, and the third a 16-bit immediate operand. IMUL multiplies the memory (or register) and immediate operands and stores the product in the register operand with this syntax:

IMUL register16, {memory16 | register16}, immediate
For the 80386/486 only, a third option for IMUL allows an additional operand for multiplication of a register value by a register or memory value. The syntax is:

IMUL register, {register | memory}

The destination can be any 16-bit or 32-bit register. The source must be the same size as the destination. In all of these options, products too large to fit in 16 or 32 bits set the overflow and carry flags. The following examples show these three options for IMUL.
        imul    dx, 456         ; Multiply DX times 456 on 80186-80486
        imul    ax, [bx],6      ; Multiply the value pointed to by BX
                                ;   by 6 and put the result in AX

        imul    dx, ax          ; Multiply DX times AX on 80386
        imul    ax, [bx]        ; Multiply AX by the value pointed to
                                ;   by BX on 80386
The IMUL instruction with multiple operands can be used for either signed or unsigned multiplication, since the 16-bit product is the same in either case. To get a 32-bit result, you must use the single-operand version of MUL or IMUL.
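As a small worked case of the immediate forms, the following sketch multiplies the same register value by two constants. The .286 directive is an assumption made so that the assembler accepts the 80186-and-later forms; the values are illustrative and small enough that neither 16-bit product overflows.

```masm
        .MODEL  small
        .286                            ; Immediate forms need 80186 or later
        .STACK  100h
        .CODE
        .STARTUP
        mov     bx, 7
        imul    ax, bx, 6               ; AX = 7 * 6 = 42
        imul    bx, 100                 ; BX = 7 * 100 = 700
        .EXIT
        END
```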
Using Division Instructions

The DIV instruction divides unsigned numbers, and IDIV divides signed numbers. Both return a quotient and a remainder. Table 4.1 summarizes the division operations. The dividend is the number to be divided, and the divisor is the number to divide by. The quotient is the result. The divisor can be in any register or memory location except the registers where the quotient and remainder are returned.
Table 4.1 Division Operations

Size of Operand             Dividend Register   Size of Divisor   Quotient   Remainder

16 bits                     AX                  8 bits            AL         AH
32 bits                     DX:AX               16 bits           AX         DX
64 bits (80386 and 80486)   EDX:EAX             32 bits           EAX        EDX
Unsigned division does not require careful attention to flags. The following examples illustrate signed division, which can be more complex.
        .DATA
mem16   SWORD   -2000
mem32   SDWORD  500000
        .CODE
        .
        .
        .
; Divide 16-bit unsigned by 8-bit
        mov     ax, 700                 ; Load dividend          700
        mov     bl, 36                  ; Load divisor     DIV    36
        div     bl                      ; Divide BL           ------
                                        ; Quotient in AL          19
                                        ; Remainder in AH             16

; Divide 32-bit signed by 16-bit
        mov     ax, WORD PTR mem32[0]   ; Load into DX:AX     500000
        mov     dx, WORD PTR mem32[2]   ;                  DIV  -2000
        idiv    mem16                   ; Divide memory       ------
                                        ; Quotient in AX        -250
                                        ; Remainder in DX              0

; Divide 16-bit signed by 16-bit
        mov     ax, WORD PTR mem16      ; Load into AX         -2000
        cwd                             ; Extend to DX:AX
        mov     bx, -421                ;                  DIV   -421
        idiv    bx                      ; Divide by BX         -----
                                        ; Quotient in AX           4
                                        ; Remainder in DX           -316
If the dividend and divisor are the same size, sign-extend or zero-extend the dividend so that it is the length expected by the division instruction. See Extending Signed and Unsigned Integers, earlier in this chapter.
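The quotients and remainders in the examples above can be reproduced with a short Python model of how IDIV reports its results. It is a sketch only: the function name idiv is hypothetical, and the model ignores register widths and the divide-error interrupt raised when a quotient does not fit.

```python
def idiv(dividend, divisor):
    """Model of IDIV's result convention.

    The quotient is truncated toward zero and the remainder takes the
    sign of the dividend. (Python's own // and % round toward negative
    infinity, so they are not used directly on signed values here.)
    """
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    remainder = dividend - quotient * divisor
    return quotient, remainder
```

With this model, idiv(700, 36) gives (19, 16), idiv(500000, -2000) gives (-250, 0), and idiv(-2000, -421) gives (4, -316), matching the comments in the listing.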
Manipulating Numbers at the Bit Level

The instructions introduced so far in this chapter access numbers at the byte or word level. The logical, shift, and rotate instructions described in this section access individual bits in a number. You can use logical instructions to evaluate characters and do other text and screen operations. The shift and rotate instructions do similar tasks by shifting and rotating bits through registers. This section reviews some applications of these bit-level operations.
The logical instructions AND, OR, and XOR compare bits in two operands. Based on the results of the comparisons, the instructions alter bits in the first (destination) operand. The logical instruction NOT also changes bits, but operates on a single operand. The following list summarizes these four logical instructions. The list makes reference to the destination bit, meaning the bit in the destination operand. The terms both bits and either bit refer to the corresponding bits in the source and destination operands. These instructions include:
Instruction   Sets Destination Bit If           Clears Destination Bit If
AND           Both bits set                     Either or both bits clear
OR            Either or both bits set           Both bits clear
XOR           Either bit (but not both) set     Both bits set or both clear
NOT           Destination bit clear             Destination bit set
Note Do not confuse logical instructions with the logical operators, which perform these operations at assembly time, not run time. Although the names are the same, the assembler recognizes the difference. The following example shows the result of the AND, OR, XOR, and NOT instructions operating on a value in the AX register and in a mask. A mask is any number with a pattern of bits set for an intended operation.
        mov     ax, 035h        ; Load value                  00110101
        and     ax, 0FBh        ; Clear bit 2             AND 11111011
                                ;                             --------
                                ; Value is now 31h            00110001
        or      ax, 016h        ; Set bits 4,2,1          OR  00010110
                                ;                             --------
                                ; Value is now 37h            00110111
        xor     ax, 0ADh        ; Toggle bits 7,5,3,2,0   XOR 10101101
                                ;                             --------
                                ; Value is now 9Ah            10011010
        not     ax              ; Low byte is now 65h         01100101
                                ;   (NOT also sets all of AH)
The AND instruction clears unmasked bits (that is, bits not protected by 1 in the mask). To mask off certain bits in an operand and clear the others, use an appropriate masking value in the source operand. The bits of the mask should be 0 for any bit positions you want to clear and 1 for any bit positions you want to remain unchanged.
The OR instruction forces specific bits to 1 regardless of their current settings. The bits of the mask should be 1 for any bit positions you want to set and 0 for any bit positions you want to remain unchanged.
The XOR instruction toggles the value of specific bits on and off (that is, reverses them from their current settings). This instruction sets a bit to 1 if the corresponding bits are different or to 0 if they are the same. The bits of the mask should be 1 for any bit positions you want to toggle and 0 for any bit positions you want to remain unchanged.
The following examples show an application for each of these instructions. The code illustrating the AND instruction converts a 'y' or 'n' read from the keyboard to uppercase, since bit 5 is always clear in uppercase letters. In the example for OR, the first statement is faster and uses fewer bytes than cmp bx, 0. When the operands for XOR are identical, each bit cancels itself, producing 0.
;AND example - converts characters to uppercase
        mov     ah, 7           ; Get character without echo
        int     21h
        and     al, 11011111y   ; Convert to uppercase by clearing bit 5
        cmp     al, 'Y'         ; Is it Y?
        je      yes             ; If so, do Yes actions
        .                       ; Else do No actions
        .
yes:    .

;OR example - compares operand to 0
        or      bx, bx          ; Compare to 0
        jg      positive        ; BX is positive
        jl      negative        ; BX is negative
                                ; else BX is zero

;XOR example - sets a register to 0
        xor     cx, cx          ; 2 bytes, 3 clocks on 8088
        sub     cx, cx          ; 2 bytes, 3 clocks on 8088
        mov     cx, 0           ; 3 bytes, 4 clocks on 8088
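The mask arithmetic used in this section can be replayed in Python, whose &, |, ^ and ~ operators act on bits the same way. The sketch below is an illustration, not part of the original listings; the variable names and the to_upper helper are assumptions, and results are confined to one byte to match the 8-bit patterns shown in the comments.

```python
value = 0x35                      # 00110101

after_and = value & 0xFB          # clear bit 2            -> 31h
after_or = after_and | 0x16       # set bits 4, 2, 1       -> 37h
after_xor = after_or ^ 0xAD       # toggle bits 7,5,3,2,0  -> 9Ah
after_not = ~after_xor & 0xFF     # invert, keep low byte  -> 65h


def to_upper(ch):
    """Clear bit 5 of an ASCII letter, as 'and al, 11011111y' does."""
    return chr(ord(ch) & 0b11011111)


def is_zero_by_xor(x):
    """A value XORed with itself is always 0."""
    return (x ^ x) == 0
```

Note that to_upper is only meaningful for ASCII letters; applied to other characters, clearing bit 5 produces an unrelated character.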
On the 80386/486 processors, the BSF (Bit Scan Forward) and the BSR (Bit Scan Reverse) instructions perform operations like those of the logical instructions. They scan the contents of a register to find the first-set or last-set bit. You can use BSF or BSR to find the position of a set bit in a mask or to check if a register value is 0.
Shifting and Rotating Bits

The 8086-based processors provide a complete set of instructions for shifting and rotating bits. Shift instructions move bits a specified number of places to the
right or left. The last bit in the direction of the shift goes into the carry flag, and the first bit is filled with 0 or with the previous value of the first bit. Rotate instructions also move bits a specified number of places to the right or left. For each bit rotated, the last bit in the direction of the rotate operation moves into the first bit position at the other end of the operand. With some variations, the carry bit is used as an additional bit of the operand. Figure 4.2 illustrates the eight variations of shift and rotate instructions for 8-bit operands. Notice that SHL and SAL are identical.
Figure 4.2 Shifts and Rotates
All shift instructions use the same format. Before the instruction executes, the destination operand contains the value to be shifted; after the instruction executes, it contains the shifted operand. The source operand contains the number of bits to shift or rotate. It can be the immediate value 1 or the CL register. The 8088 and 8086 processors do not accept any other values or registers with these instructions. Starting with the 80186 processor, you can use 8-bit immediate values larger than 1 as the source operand for shift or rotate instructions, as shown here:
shr bx, 4 ; 9 clocks, 3 bytes on 80286
The following statements are equivalent if the program must run on the 8088 or 8086 processor:
        mov     cl, 4           ; 2 clocks, 3 bytes on 80286
        shr     bx, cl          ; 9 clocks, 2 bytes on 80286
                                ; 11 clocks, 5 bytes total
Masks for logical instructions can be shifted to new bit positions. For example, an operand that masks off a bit or group of bits can be shifted to move the mask to a different position, allowing you to mask off a different bit each time the mask is used. This technique, illustrated in the following example, is useful only if the mask value is unknown until run time.
        .DATA
masker  BYTE    00000010y       ; Mask that may change at run time
        .CODE
        .
        .
        .
        mov     cl, 2           ; Rotate two at a time
        mov     bl, 57h         ; Load value to be changed    01010111y
        rol     masker, cl      ; Rotate two to left          00001000y
        or      bl, masker      ; Turn on masked values       ---------
                                ; New value is 05Fh           01011111y
        rol     masker, cl      ; Rotate two more             00100000y
        or      bl, masker      ; Turn on masked values       ---------
                                ; New value is 07Fh           01111111y
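The rotating-mask listing can be traced with a Python model of an 8-bit ROL. This is a sketch under the assumption that only the byte result matters (the carry flag is not modeled); rol8 is a hypothetical helper name.

```python
def rol8(value, count):
    """Rotate an 8-bit value left by count bits (carry flag not modeled)."""
    count %= 8
    value &= 0xFF
    return ((value << count) | (value >> (8 - count))) & 0xFF


masker = 0b00000010
bl = 0x57

masker = rol8(masker, 2)    # 00001000
bl |= masker                # 5Fh
first_result = bl

masker = rol8(masker, 2)    # 00100000
bl |= masker                # 7Fh
second_result = bl
```

Because ROL wraps bits around, a mask rotated past bit 7 reappears at bit 0; for example, rol8(0x80, 2) yields 02h.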
Multiplying and Dividing with Shift Instructions

You can use the shift and rotate instructions (SHR, SHL, SAR, and SAL) for multiplication and division. Shifting a value right by one bit has the effect of dividing by two; shifting left by 1 bit has the effect of multiplying by two. You can take advantage of shifts to do fast multiplication and division by powers of
two. For example, shifting left twice multiplies by four, shifting left three times multiplies by eight, and so on. Use SHR (Shift Right) to divide unsigned numbers. You can use SAR (Shift Arithmetic Right) to divide signed numbers, but SAR rounds negative numbers down, whereas IDIV always rounds negative numbers up (toward 0). Division using SAR must adjust for this difference. Multiplication by shifting is the same for signed and unsigned numbers, so you can use either SAL or SHL. Multiply and divide instructions are relatively slow, particularly on the 8088 and 8086 processors. When multiplying or dividing by a power of two, use shifts to speed operations by a factor of 10 or more. For example, these statements take only four clocks on an 8088 or 8086 processor:
        sub     ah, ah          ; Clear AH
        shl     ax, 1           ; Multiply byte in AL by 2
The following statements produce the same results, but take between 74 and 81 clocks on the 8088 or 8086 processors. The same statements take 15 clocks on the 80286 and between 11 and 16 clocks on the 80386. (For a discussion about instruction timings, see A Word on Instruction Timings in the Introduction.)
        mov     bl, 2           ; Multiply byte in AL by 2
        mul     bl
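The rounding difference between SAR and IDIV mentioned earlier in this section can be seen in a short Python sketch. Python's >> on a negative integer is an arithmetic shift, so it rounds toward negative infinity like SAR. The adjustment shown (adding divisor minus 1 to negative values before shifting) is one common way to match IDIV; it is offered here as an assumption, not as the method the manual prescribes. Both function names are hypothetical.

```python
def sar_divide(value, shift):
    """SAR alone: divides by 2**shift, rounding toward negative infinity."""
    return value >> shift


def sar_divide_like_idiv(value, shift):
    """Bias negative values by (2**shift - 1) first, so that the shifted
    result is truncated toward zero, as IDIV would report it."""
    if value < 0:
        value += (1 << shift) - 1
    return value >> shift
```

For -7 divided by 2, sar_divide(-7, 1) yields -4 while sar_divide_like_idiv(-7, 1) yields -3, the quotient IDIV would produce. Positive values are unaffected by the adjustment.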
As the following macro shows, it's possible to multiply by any number (in this case, 10) without resorting to the MUL instruction. However, such a procedure is no more than an interesting arithmetic exercise, since the additional code almost certainly takes more time to execute than a single MUL. You should consider using shifts in your program only when multiplying or dividing by a power of two.
mul_10  MACRO   factor          ; Factor must be unsigned
        mov     ax, factor      ; Load into AX
        shl     ax, 1           ; AX = factor * 2
        mov     bx, ax          ; Save copy in BX
        shl     ax, 1           ; AX = factor * 4
        shl     ax, 1           ; AX = factor * 8
        add     ax, bx          ; AX = (factor * 8) + (factor * 2)
        ENDM                    ; AX = factor * 10
Here's another macro that divides by 512. In contrast to the previous example, this macro uses little code and operates faster than an equivalent DIV instruction.
div_512 MACRO   dividend        ; Dividend must be unsigned
        mov     ax, dividend    ; Load into AX
        shr     ax, 1           ; AX = dividend / 2 (unsigned)
        xchg    al, ah          ; XCHG is like rotate right 8
                                ; AL = (dividend / 2) / 256
        cbw                     ; Clear upper byte
        ENDM                    ; AX = (dividend / 512)
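Both macros can be stepped through in Python to confirm the arithmetic. The functions below mirror the macro instructions one for one on a simulated 16-bit AX; they are a sketch, and the Python names merely echo the macro names.

```python
def mul_10(factor):
    """Instruction-by-instruction model of the mul_10 macro."""
    ax = factor & 0xFFFF
    ax = (ax << 1) & 0xFFFF     # shl ax, 1   -> factor * 2
    bx = ax                     # mov bx, ax
    ax = (ax << 1) & 0xFFFF     # shl ax, 1   -> factor * 4
    ax = (ax << 1) & 0xFFFF     # shl ax, 1   -> factor * 8
    ax = (ax + bx) & 0xFFFF     # add ax, bx  -> factor * 10
    return ax


def div_512(dividend):
    """Instruction-by-instruction model of the div_512 macro."""
    ax = (dividend & 0xFFFF) >> 1       # shr ax, 1
    al, ah = ax & 0xFF, ax >> 8
    al, ah = ah, al                     # xchg al, ah
    ah = 0xFF if al & 0x80 else 0x00    # cbw: sign-extend AL into AH
    return (ah << 8) | al
```

CBW clears AH here only because the preceding SHR guarantees that bit 7 of the new AL is 0. The model also shows the limit of mul_10: a product above 65535 wraps, so mul_10(7000) returns 4464 rather than 70000.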
If you need to shift a value that is too large to fit in one register, you can shift each part separately. The RCR (Rotate through Carry Right) and RCL (Rotate through Carry Left) instructions carry values from the first register to the second by passing the leftmost or rightmost bit through the carry flag. This example shifts a multiword value.
        .DATA
mem32   DWORD   500000
        .CODE
; Divide 32-bit unsigned by 16
        mov     cx, 4                   ; Shift right 4          500000
again:  shr     WORD PTR mem32[2], 1    ; Shift into carry   DIV     16
        rcr     WORD PTR mem32[0], 1    ; Rotate carry in        ------
        loop    again                   ;                         31250
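The way the carry flag links the two word-sized shifts can be modeled in Python. This sketch keeps the high and low words as separate 16-bit integers; shift_right_32 is a hypothetical name, and the carry left behind by RCR (the low word's old bit 0) is not tracked because the loop never uses it.

```python
def shift_right_32(high, low, count):
    """Shift a 32-bit value held as two 16-bit words right by count bits."""
    for _ in range(count):
        carry = high & 1                    # SHR high word: bit 0 -> carry
        high >>= 1
        low = (carry << 15) | (low >> 1)    # RCR low word: carry -> bit 15
    return high, low


high_word, low_word = divmod(500000, 0x10000)   # 0007h, 0A120h
result_high, result_low = shift_right_32(high_word, low_word, 4)
result = (result_high << 16) | result_low
```

The result is 31250, which agrees with 500000 divided by 16.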
Since the carry flag is treated as part of the operand (it's like using a 9-bit or 17-bit operand), the flag value before the operation is crucial. The carry flag can be adjusted by a previous instruction, but you can also set or clear the flag directly with the CLC (Clear Carry Flag), CMC (Complement Carry Flag), and STC (Set Carry Flag) instructions.
On the 80386 and 80486 processors, an alternate method for multiplying quickly by constants takes advantage of the LEA (Load Effective Address) instruction and the scaling of indirect memory operands. By using a 32-bit value as both the index and the base register in an indirect memory operand, you can multiply by the constants 2, 3, 4, 5, 8, and 9 more quickly than you can by using the MUL instruction. LEA calculates the offset of the source operand and stores it into the destination register, EBX, as this example shows:
        lea     ebx, [eax*2]        ; EBX = 2 * EAX
        lea     ebx, [eax*2+eax]    ; EBX = 3 * EAX
        lea     ebx, [eax*4]        ; EBX = 4 * EAX
        lea     ebx, [eax*4+eax]    ; EBX = 5 * EAX
        lea     ebx, [eax*8]        ; EBX = 8 * EAX
        lea     ebx, [eax*8+eax]    ; EBX = 9 * EAX
Scaling of 80386 indirect memory operands is reviewed in Indirect Memory Operands with 32-Bit Registers in Chapter 3. LEA is introduced in Loading Addresses into Registers in Chapter 3.
The next chapter deals with more complex data types: arrays, strings, structures, unions, and records. Many of the operations presented in this chapter can also be applied to the data structures covered in Chapter 5, "Defining and Using Complex Data Types."
CHAPTER 5
Defining and Using Complex Data Types
With the complex data types available in MASM 6.1 (arrays, strings, records, structures, and unions), you can access data as a unit or as individual elements that make up a unit. The individual elements of complex data types are often the integer types discussed in Chapter 4, "Defining and Using Simple Data Types."
"Arrays and Strings" reviews how to declare, reference, and initialize arrays and strings. This section summarizes the general steps needed to process arrays and strings and describes the MASM instructions for moving, comparing, searching, loading, and storing.
"Structures and Unions" covers similar information for structures and unions: how to declare structure and union types, how to define structure and union variables, and how to reference structures and unions and their fields.
"Records" explains how to declare record types, define record variables, and use record operators.
Arrays and Strings

An array is a sequential collection of variables, all of the same size and type, called "elements." A string is an array of characters. For example, in the string "ABC", each letter is an element. You can access the elements in an array or string relative to the first element. This section explains how to handle arrays and strings in your programs.
Declaring and Referencing Arrays

Array elements occupy memory contiguously, so a program references each element relative to the start of the array. To declare an array, supply a label name, the element type, and a series of initializing values or ? placeholders. The following examples declare the arrays warray and xarray:
warray  WORD    1, 2, 3, 4
xarray  DWORD   0FFFFFFFFh, 789ABCDEh
Initializer lists of array declarations can span multiple lines. The first initializer must appear on the same line as the data type, all entries must be initialized, and, if you want the array to continue to the new line, the line must end with a comma. These examples show legal multiple-line array declarations:
big       BYTE    21, 22, 23, 24, 25,
                  26, 27, 28

somelist  WORD    10,
                  20,
                  30
If you do not use the LENGTHOF and SIZEOF operators discussed later in this section, an array may span more than one logical line, although a separate type declaration is needed on each logical line:
var1    BYTE    10, 20, 30
        BYTE    40, 50, 60
        BYTE    70, 80, 90
The DUP Operator

You can also declare an array with the DUP operator. This operator works with any of the data allocation directives described in "Allocating Memory for Integer Variables" in Chapter 4. In the syntax
    count DUP (initialvalue [[, initialvalue]]...)
the count value sets the number of times to repeat all values within the parentheses. The initialvalue can be an integer, character constant, or another DUP operator, and must always appear within parentheses. For example, the statement
barray BYTE 5 DUP (1)
allocates the integer 1 five times for a total of 5 bytes. The following examples show various ways to allocate data elements with the DUP operator:
array   DWORD   10 DUP (1)                      ; 10 doublewords
                                                ;   initialized to 1
buffer  BYTE    256 DUP (?)                     ; 256-byte buffer

masks   BYTE    20 DUP (040h, 020h, 04h, 02h)   ; 80-byte buffer
                                                ;   with bit masks
three_d DWORD   5 DUP (5 DUP (5 DUP (0)))       ; 125 doublewords
                                                ;   initialized to 0
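The element counts in these comments follow from how DUP repeats its whole parenthesized list, including any nested DUP. A small Python sketch makes the expansion explicit; dup is a hypothetical helper that returns the flattened list of initial values, with nested results spliced in place.

```python
def dup(count, *values):
    """Expand 'count DUP (values...)' into a flat list of initializers.

    A nested dup() result (a list) is spliced into the sequence before
    the whole sequence is repeated count times.
    """
    once = []
    for item in values:
        if isinstance(item, list):
            once.extend(item)
        else:
            once.append(item)
    return once * count


array = dup(10, 1)                          # 10 elements
masks = dup(20, 0x40, 0x20, 0x04, 0x02)     # 80 elements
three_d = dup(5, dup(5, dup(5, 0)))         # 125 elements
```

The model counts elements only; the number of bytes allocated is the element count times the size implied by the directive (for example, 125 doublewords occupy 500 bytes).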
Referencing Arrays
Each element in an array is referenced with an index number, beginning with zero. The array index appears in brackets after the array name, as in
array[9]
Assembly-language indexes differ from indexes in high-level languages, where the index number always corresponds to the element's position. In C, for example, array[9] references the array's tenth element, regardless of whether each element is 1 byte or 8 bytes in size. In assembly language, an element's index refers to the number of bytes between the element and the start of the array. This distinction can be ignored for arrays of byte-sized elements, since an element's position number matches its index. For example, defining the array
prime BYTE 1, 3, 5, 7, 11, 13, 17
gives a value of 1 to prime[0], a value of 3 to prime[1], and so forth. However, in arrays with elements larger than 1 byte, index numbers (except zero) do not correspond to an element's position. You must multiply an element's position by its size to determine the element's index. Thus, for the array
wprime WORD 1, 3, 5, 7, 11, 13, 17
wprime[4] represents the third element (5), which is 4 bytes from the beginning of the array. Similarly, the expression wprime[6] represents the fourth element (7) and wprime[10] represents the sixth element (13).
The following example determines an index at run time. It multiplies the position by two (the size of a word element) by shifting it left:
        mov     si, cx          ; CX holds position number
        shl     si, 1           ; Scale for word referencing
        mov     ax, wprime[si]  ; Move element into AX
The offset required to access an array element can be calculated with the following formula:
    nth element of array = array[(n-1) * size of element]
Referencing an array element by distance rather than position is not difficult to master, and is actually very consistent with how assembly language works. Recall that a variable name is a symbol that represents the contents of a particular address in memory. Thus, if the array wprime begins at address DS:2400h, the reference wprime[6] means to the processor "the word value contained in the DS segment at offset 2400h-plus-6-bytes."
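Byte-offset indexing can be illustrated in Python by packing the wprime values into a little-endian byte string and reading words at explicit byte offsets. This is a sketch: word_at and nth_element are hypothetical helpers, and the struct format "<H" (unsigned 16-bit, little-endian) stands in for a WORD in 8086 memory.

```python
import struct

WORD_SIZE = 2

# The wprime array laid out as it would be in memory: seven 16-bit words.
wprime = struct.pack("<7H", 1, 3, 5, 7, 11, 13, 17)


def word_at(data, byte_index):
    """Read the word that starts byte_index bytes into the array."""
    return struct.unpack_from("<H", data, byte_index)[0]


def nth_element(data, n):
    """Apply the formula: nth element = array[(n-1) * size of element]."""
    return word_at(data, (n - 1) * WORD_SIZE)
```

Here word_at(wprime, 4) returns 5 and word_at(wprime, 10) returns 13, as the text describes. An odd index does not select an element at all: word_at(wprime, 9) reads the high byte of one word and the low byte of the next, giving 0D00h.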
As described in Direct Memory Operands, Chapter 3, you can substitute the plus operator (+) for brackets, as in:
wprime[9]
wprime+9
Since brackets simply add a number to an address, you don't need them when referencing the first element. Thus, wprime and wprime[0] both refer to the first element of the array wprime.
If your program runs only on an 80186 processor or higher, you can use the BOUND instruction to verify that an index value is within the bounds of an array. For a description of BOUND, see the Reference.
LENGTHOF, SIZEOF, and TYPE for Arrays

When applied to arrays, the LENGTHOF, SIZEOF, and TYPE operators return information about the length and size of the array and about the type of the initializers. The LENGTHOF operator returns the number of elements in the array. The SIZEOF operator returns the number of bytes used by the initializers in the array definition. TYPE returns the size of the elements of the array. The following examples illustrate these operators:
array   WORD    40 DUP (5)

larray  EQU     LENGTHOF array      ; 40 elements
sarray  EQU     SIZEOF   array      ; 80 bytes
tarray  EQU     TYPE     array      ; 2 bytes per element

num     DWORD   4, 5, 6, 7, 8, 9, 10, 11

lnum    EQU     LENGTHOF num        ; 8 elements
snum    EQU     SIZEOF   num        ; 32 bytes
tnum    EQU     TYPE     num        ; 4 bytes per element

warray  WORD    40 DUP (40 DUP (5))

len     EQU     LENGTHOF warray     ; 1600 elements
siz     EQU     SIZEOF   warray     ; 3200 bytes
typ     EQU     TYPE     warray     ; 2 bytes per element
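The three operators are related by SIZEOF = LENGTHOF * TYPE for arrays like these. The Python sketch below recomputes the values in the comments; the describe function and its dictionary keys are illustrative only.

```python
def describe(element_size, element_count):
    """Return the values the three operators would report for an array."""
    return {
        "LENGTHOF": element_count,
        "SIZEOF": element_size * element_count,
        "TYPE": element_size,
    }


array_info = describe(2, 40)          # WORD  40 DUP (5)
num_info = describe(4, 8)             # DWORD with 8 initializers
warray_info = describe(2, 40 * 40)    # WORD  40 DUP (40 DUP (5))
```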
Declaring and Initializing Strings

A string is an array of characters. Initializing a string like "Hello, there" allocates and initializes 1 byte for each character in the string. An initialized string can be no longer than 255 characters.
For data directives other than BYTE, a string may initialize only the first element. The initializer value must fit into the specified size and conform to the expression word size in effect (see Integer Constants and Constant Expressions in Chapter 1), as shown in these examples:
wstr    WORD    "OK"
dstr    DWORD   "DATA"          ; Legal under EXPR32 only
As with arrays, string initializers can span multiple lines. The line must end with a comma if you want the string to continue to the next line.
str1    BYTE    "This is a long string that does not ",
                "fit on one line."
You can also have an array of pointers to strings.

PBYTE   TYPEDEF PTR BYTE
        .DATA
msg1    BYTE    "Operation completed successfully."
msg2    BYTE    "Unknown command"
msg3    BYTE    "File not found"
pmsg    PBYTE   msg1            ; pmsg is an array
        PBYTE   msg2            ;   of pointers to
        PBYTE   msg3            ;   above messages
Strings must be enclosed in single (') or double (") quotation marks. To put a single quotation mark inside a string enclosed by single quotation marks, use two single quotation marks. Likewise, if you need quotation marks inside a string enclosed by double quotation marks, use two sets. These examples show the various uses of quotation marks:
char    BYTE    'a'
message BYTE    "That's the message."           ; That's the message.
warn    BYTE    'Can''t find file.'             ; Can't find file.
string  BYTE    "This ""value"" not found."     ; This "value" not found.
You can always use single quotation marks inside a string enclosed by double quotation marks, as the initialization for message shows, and vice versa.
The ? Initializer
You do not have to initialize an array. The ? operator lets you allocate space for the array without placing specific values in it. Object files contain records for initialized data. Unspecified space left in the object file means that no records contain initialized data for that address. The actual values stored in arrays allocated with ? depend on certain conditions. The ? initializer is treated as a zero in a DUP statement that contains initializers in addition to the ? initializer. If the ? initializer does not appear in a DUP statement, or if the DUP statement contains only ? initializers, the assembler leaves the allocated space unspecified.
LENGTHOF, SIZEOF, and TYPE for Strings

Because strings are simply arrays of byte elements, the LENGTHOF, SIZEOF, and TYPE operators behave as you would expect, as illustrated in this example:
msg     BYTE    "This string extends ",
                "over three ",
                "lines."

lmsg    EQU     LENGTHOF msg    ; 37 elements
smsg    EQU     SIZEOF   msg    ; 37 bytes
tmsg    EQU     TYPE     msg    ; 1 byte per element
Processing Strings
The 8086-family instruction set has seven string instructions for fast and efficient processing of entire strings and arrays. The term "string" in "string instructions" refers to a sequence of elements, not just character strings. These instructions work directly only on arrays of bytes and words on the 8086-80486 processors, and on arrays of bytes, words, and doublewords on the 80386/486 processors. Processing larger elements must be done indirectly with loops. The following list gives capsule descriptions of the five instructions discussed in this section.
Instruction   Description
MOVS          Copies a string from one location to another
STOS          Stores contents of the accumulator register to a string
CMPS          Compares one string with another
LODS          Loads values from a string to the accumulator register
SCAS          Scans a string for a specified value
All of these instructions use registers in a similar way and have a similar syntax. Most are used with the repeat instruction prefixes REP, REPE (or REPZ), and REPNE (or REPNZ). REPZ is a synonym for REPE (Repeat While Equal) and REPNZ is a synonym for REPNE (Repeat While Not Equal). This section first explains the general procedures for using all string instructions. It then illustrates each instruction with an example.
Overview of String Instructions

The string instructions have specific requirements for the location of strings and the use of registers. To operate on any string, follow these three steps:

1. Set the direction flag to indicate the direction in which you want to process the string. The STD instruction sets the flag, while CLD clears it.
   If the direction flag is clear, the string is processed upward (from low addresses to high addresses, which is from left to right through the string). If the direction flag is set, the string is processed downward (from high addresses to low addresses, or from right to left). Under MS-DOS, the direction flag is normally clear if your program has not changed it.
2. Load the number of iterations for the string instruction into the CX register.
   If you want to process 100 elements in a string, move 100 into CX. If you wish the string instruction to terminate conditionally (for example, during a search when a match is found), load the maximum number of iterations that can be performed without an error.
3. Load the starting offset address of the source string into DS:SI and the starting address of the destination string into ES:DI. Some string instructions take only a destination or source, not both (see Table 5.1).
   Normally, the segment address of the source string should be DS, but you can use a segment override to specify a different segment for the source operand. You cannot override the segment address for the destination string. Therefore, you may need to change the value of ES. For information on changing segment registers, see "Programming Segmented Addresses" in Chapter 3.

Note Although you can use a segment override on the source operand, a segment override combined with a repeat prefix can cause problems in certain situations on all processors except the 80386/486. If an interrupt occurs during the string operation, the segment override is lost and the rest of the string operation processes incorrectly. Segment overrides can be used safely when interrupts are turned off or with the 80386/486 processors.

You can adapt these steps to the requirements of any particular string operation.
The syntax for the string instructions is:
    [[prefix]] CMPS [[segmentregister:]] source, [[ES:]] destination
    LODS [[segmentregister:]] source
    [[prefix]] MOVS [[ES:]] destination, [[segmentregister:]] source
    [[prefix]] SCAS [[ES:]] destination
    [[prefix]] STOS [[ES:]] destination
Some instructions have special forms for byte, word, or doubleword operands. If you use the form of the instruction that ends in B (BYTE), W (WORD), or D (DWORD) with LODS, SCAS, and STOS, the assembler knows whether the element is in the AL, AX, or EAX register. Therefore, these instruction forms do not require operands.
Table 5.1 lists each string instruction with the type of repeat prefix it uses and indicates whether the instruction works on a source, a destination, or both.
Table 5.1 Requirements for String Instructions

Instruction   Repeat Prefix   Source/Destination   Register Pair
MOVS          REP             Both                 DS:SI, ES:DI
SCAS          REPE/REPNE      Destination          ES:DI
CMPS          REPE/REPNE      Both                 DS:SI, ES:DI
LODS          None            Source               DS:SI
STOS          REP             Destination          ES:DI
INS           REP             Destination          ES:DI
OUTS          REP             Source               DS:SI
The repeat prefix causes the instruction that follows it to repeat for the number of times specified in the count register or until a condition becomes true. After each iteration, the instruction increments or decrements SI and DI so that it points to the next array element. The direction flag determines whether SI and DI are incremented (flag clear) or decremented (flag set). The size of the instruction determines whether SI and DI are altered by 1, 2, or 4 bytes each time. Each prefix governs the number of repetitions as follows:
Prefix          Description
REP             Repeats instruction CX times
REPE, REPZ      Repeats instruction maximum CX times while values are equal
REPNE, REPNZ    Repeats instruction maximum CX times while values are not equal
The prefixes apply to only one string instruction at a time. To repeat a block of instructions, use a loop construction. (See "Loops" in Chapter 7.)
At run time, if a string instruction is preceded by a repeat sequence, the processor:

1. Checks the CX register and exits if CX is 0.
2. Performs the string operation once.
3. Increases SI and/or DI if the direction flag is clear. Decreases SI and/or DI if the direction flag is set. The amount of increase or decrease is 1 for byte operations, 2 for word operations, and 4 for doubleword operations.
4. Decrements CX without modifying the flags.
5. Checks the zero flag (for SCAS or CMPS) if the REPE or REPNE prefix is used. If the repeat condition holds, loops back to step 1. Otherwise, the loop ends and execution proceeds to the next instruction.

When the repeat loop ends, SI (or DI) points to the position following a match (when using SCAS or CMPS), so you need to decrement or increment DI or SI to point to the element where the last match occurred.
Although string instructions (except LODS) are used most often with repeat prefixes, they can also be used by themselves. In these cases, the SI and/or DI registers are adjusted as specified by the direction flag and the size of operands.
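The five run-time steps can be traced with a Python model of one prefixed instruction, REPNE SCASB. The sketch treats memory as a bytes object and registers as plain integers; the function name, its parameters, and its return tuple are assumptions made for illustration.

```python
def repne_scasb(memory, di, cx, al, direction_flag=False, zero_flag=False):
    """Model of REPNE SCASB. Returns (di, cx, zero_flag) afterwards.

    zero_flag is passed in because a repeat count of 0 leaves the
    flags exactly as they were before the instruction.
    """
    step = -1 if direction_flag else 1
    while cx != 0:                      # step 1: exit if CX is 0
        zero_flag = memory[di] == al    # step 2: compare AL with ES:[DI]
        di += step                      # step 3: adjust DI by 1 (byte op)
        cx -= 1                         # step 4: decrement CX, flags kept
        if zero_flag:                   # step 5: REPNE stops on a match
            break
    return di, cx, zero_flag
```

Scanning b"abcxdef" for 'x' from offset 0 with CX = 7 returns (4, 3, True): DI has already moved one past the match at offset 3, which is why the text says DI must be adjusted back to reach the matching element.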
Using String Instructions

To use the 8086-family string instructions, follow the steps outlined in the previous section. Examples in this section illustrate each instruction. You can also use the techniques in this section with structures and unions, since arrays and strings can be fields in structures and unions. (See the section Structures and Unions, following.)
Moving Array Data

The MOVS instruction copies data from one area of memory to another. To move data, first load the count, source and destination addresses into the appropriate registers. Then use REP with the MOVS instruction.
        .MODEL  small
        .DATA
source  BYTE    10 DUP ('0123456789')
destin  BYTE    100 DUP (?)
        .CODE
        mov     ax, @data               ; Load same segment
        mov     ds, ax                  ;   to both DS
        mov     es, ax                  ;   and ES
        .
        .
        .
        cld                             ; Work upward
        mov     cx, LENGTHOF source     ; Set iteration count to 100
        mov     si, OFFSET source       ; Load address of source
        mov     di, OFFSET destin       ; Load address of destination
        rep     movsb                   ; Move 100 bytes
Filling Arrays
The STOS instruction stores a specified value in each position of a string. The string is the destination, so it must be pointed to by ES:DI. The value to store must be in the accumulator.
The next example stores the character 'a' in each byte of a 100-byte string, filling the entire string with aaaa.... Notice how the code stores 50 words rather than
100 bytes. This makes the fill operation faster by reducing the number of iterations. To fill an odd number of bytes, you need to adjust for the last byte.
        .MODEL  small, C
        .DATA
destin  BYTE    100 DUP (?)
ldestin EQU     (LENGTHOF destin) / 2
        .CODE
        .                               ; Assume ES = DS
        .
        .
        cld                             ; Work upward
        mov     ax, 'aa'                ; Load character to fill
        mov     cx, ldestin             ; Load length of string
        mov     di, OFFSET destin       ; Load address of destination
        rep     stosw                   ; Store 'aa' into array
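The adjustment for an odd number of bytes is not shown in the listing. One way to picture it, sketched here in Python under the assumption that a single byte store (the equivalent of STOSB) follows the word stores, is:

```python
def fill(length, ch):
    """Fill a buffer of 'length' bytes with byte value ch, two at a time.

    The loop stands in for REP STOSW with CX = length / 2; the final
    'if' stands in for one extra STOSB when length is odd.
    """
    buffer = bytearray(length)
    word = bytes([ch, ch])
    di = 0
    for _ in range(length // 2):    # rep stosw
        buffer[di:di + 2] = word
        di += 2
    if length % 2:                  # leftover byte: stosb
        buffer[di] = ch
        di += 1
    return bytes(buffer)
```

fill(100, ord('a')) performs 50 word stores and no byte store; fill(7, ord('a')) performs 3 word stores and 1 byte store.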
Comparing Arrays
The CMPS instruction compares two strings and points to the address after which a match or nonmatch occurs. If the values are the same, the zero flag is set. Either string can be considered the destination or the source unless a segment override is used. This example using CMPSB assumes that the strings are in different segments. Each segment register (DS and ES) must be loaded with the appropriate segment address.
        .MODEL  large, C
        .DATA
string1 BYTE    "The quick brown fox jumps over the lazy dog"
        .FARDATA
string2 BYTE    "The quick brown dog jumps over the lazy fox"
lstring EQU     LENGTHOF string2
        .CODE
        mov     ax, @data               ; Load data segment
        mov     ds, ax                  ;   into DS
        mov     ax, @fardata            ; Load far data segment
        mov     es, ax                  ;   into ES
        .
        .
        .
        cld                             ; Work upward
        mov     cx, lstring             ; Load length of string
        mov     si, OFFSET string1      ; Load offset of string1
        mov     di, OFFSET string2      ; Load offset of string2
        repe    cmpsb                   ; Compare
        je      allmatch                ; Jump if all match
        .
        .
        .
allmatch:                               ; Special case for all match
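To see where REPE CMPSB stops for these two strings, the comparison can be modeled in Python. The sketch uses zero-based offsets in place of SI and DI and ignores segments; repe_cmpsb is a hypothetical function name.

```python
def repe_cmpsb(source, destination, cx, zero_flag=False):
    """Model of REPE CMPSB working upward from offset 0.

    Returns (si, cx, zero_flag). SI and DI advance together, so only
    SI is reported.
    """
    si = di = 0
    while cx != 0:
        zero_flag = source[si] == destination[di]   # compare one byte
        si += 1
        di += 1
        cx -= 1
        if not zero_flag:       # REPE repeats only while bytes are equal
            break
    return si, cx, zero_flag


string1 = b"The quick brown fox jumps over the lazy dog"
string2 = b"The quick brown dog jumps over the lazy fox"
si, cx, zero_flag = repe_cmpsb(string1, string2, len(string2))
```

The strings first differ at offset 16 ('f' against 'd'), so the model ends with si equal to 17, 26 iterations left in cx, and the zero flag clear; the je allmatch branch is therefore not taken.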
Loading Data from Arrays

The LODS instruction loads a value from a string into the accumulator register. This instruction is not used with a repeat instruction prefix, since continually reloading the accumulator serves no purpose. The code in this example loads, processes, and displays each byte in a string.

        .DATA
info    BYTE    0, 1, 2, 3, 4, 5, 6, 7, 8, 9
linfo   WORD    LENGTHOF info
        .CODE
        .
        .
        .
        cld                     ; Work upward
        mov     cx, linfo       ; Load length
        mov     si, OFFSET info ; Load offset of source
        mov     ah, 2           ; Display character function
get:    lodsb                   ; Get a character
        add     al, '0'         ; Convert to ASCII
        mov     dl, al          ; Move to DL
        int     21h             ; Call DOS to display character
        loop    get             ; Repeat
Searching Arrays
The SCAS instruction compares the value pointed to by ES:DI with the value in the accumulator. If both values are the same, it sets the zero flag. A repeat prefix lets SCAS work on an entire string, scanning (from which SCAS gets its name) for a particular value called the target. REPNE SCAS sets the zero flag if it finds the target value in the array. REPE SCAS sets the zero flag if the scanned array contains nothing but the target value.
This example assumes that ES is not the same as DS and that the address of the string is stored in a pointer variable. The LES instruction loads the far address of the string into ES:DI.
        .DATA
string  BYTE    "The quick brown fox jumps over the lazy dog"
pstring PBYTE   string              ; Far pointer to string
lstring EQU     LENGTHOF string     ; Length of string
        .CODE
        .
        .
        .
        cld                         ; Work upward
        mov     cx, lstring         ; Load length of string
        les     di, pstring         ; Load address of string
        mov     al, 'z'             ; Load character to find
        repne   scasb               ; Search
        jne     notfound            ; Jump if not found
        .                           ; ES:DI points to character
        .                           ;   after first 'z'
        .
notfound:                           ; Special case for not found
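Because ES:DI is left pointing one byte past the match, a program usually has to step back by one to reach the target itself. The sketch below shows one way to turn the result of the search into a zero-based index. It assumes a small-model DOS program in which the string lives in the default data segment, so ES is copied from DS rather than loaded with LES.

```asm
        .MODEL  small
        .STACK  100h
        .DATA
string  BYTE    "The quick brown fox jumps over the lazy dog"
lstring EQU     LENGTHOF string
        .CODE
        .STARTUP
        push    ds
        pop     es                  ; SCAS always scans through ES:DI
        cld                         ; Work upward
        mov     cx, lstring         ; Load length of string
        mov     di, OFFSET string   ; Load offset of string
        mov     al, 'z'             ; Load character to find
        repne   scasb               ; Search
        jne     notfound            ; Jump if not found
        dec     di                  ; ES:DI now points at the 'z' itself
        mov     ax, di
        sub     ax, OFFSET string   ; AX = zero-based index of the 'z'
notfound:
        .EXIT
        END
```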
Translating Data in Byte Arrays

The XLAT (Translate) instruction copies a byte from an array of bytes into the AL register. The instruction takes its name from its ability to translate an element's number into the element itself. For example, given the number 7, XLAT returns byte #7 from the array. The array may hold byte-sized integers or, very often, a table or list of characters. The syntax for XLAT is:

XLAT[[B]] [[[[segment:]]memory]]

The optional B suffix (for "byte") reflects the size of data the instruction handles. Both XLAT and XLATB assemble to exactly the same machine code.

To use XLAT, place the offset of the start of the array in the BX register and the desired index value in AL. Array indexes always begin with 0 in assembly language. To retrieve the first byte of the array, set AL to 0; to retrieve the second byte, set AL to 1, and so forth. XLAT returns the byte element in AL, overwriting the index number.

By default, the DS register contains the segment of the table, but you can use a segment override to specify a different segment. You need not give an operand except when specifying a segment override. (For information about the segment override operator, see "Direct Memory Operands" in Chapter 3.)
This example illustrates XLAT by looking up hexadecimal characters in a list. The code converts an eight-bit binary number to a string representing a hexadecimal number.
                                    ; Table of hexadecimal digits
hex     BYTE    "0123456789ABCDEF"
convert BYTE    "You pressed the key with ASCII code "
key     BYTE    ?,?,"h",13,10,"$"
        .CODE
        .
        .
        .
        mov     ah, 8               ; Get a key in AL
        int     21h                 ; Call DOS
        mov     bx, OFFSET hex      ; Load table address
        mov     ah, al              ; Save a copy in high byte
        and     al, 00001111y       ; Mask out top character
        xlat                        ; Translate
        mov     key[1], al          ; Store the character
        mov     cl, 12              ; Load shift count
        shr     ax, cl              ; Shift high char into position
        xlat                        ; Translate
        mov     key, al             ; Store the character
        mov     dx, OFFSET convert  ; Load message
        mov     ah, 9               ; Display string function
        int     21h                 ; Call DOS
Although AL cannot contain an index value greater than 255, you can use XLAT with arrays containing more than 256 elements. Simply treat each 256-byte block of the array as a smaller sub-array. For example, to retrieve the 260th element of an array (index 259), add 256 to BX and set AL=3 (260-256-1).
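The block technique can be sketched as follows. The 512-byte table and its marker character are hypothetical and exist only so that the lookup has something recognizable to return; the program assumes a small-model DOS environment.

```asm
        .MODEL  small
        .STACK  100h
        .DATA
table   BYTE    512 DUP ('.')       ; Hypothetical table of two 256-byte blocks
        .CODE
        .STARTUP
        mov     table[259], '!'     ; Mark the 260th element (index 259)
        mov     bx, OFFSET table    ; Load table address
        add     bx, 256             ; BX = start of second 256-byte block
        mov     al, 3               ; Index within block: 260 - 256 - 1
        xlat                        ; AL = table[259]
        mov     dl, al              ; Move to DL
        mov     ah, 2               ; DOS display-character function
        int     21h                 ; Display the marked element
        .EXIT
        END
```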
Structures and Unions

A structure is a group of possibly dissimilar data types and variables that can be accessed as a unit or by any of its components. The fields within the structure can have different sizes and data types. Unions are identical to structures, except that the fields of a union overlap in memory, which allows you to define different data formats for the same memory space. Unions can store different types of data depending on the situation. They also can store data as one data type and retrieve it as another data type. Whereas each field in a structure has an offset relative to the first byte of the structure, all the fields in a union start at the same offset. The size of a structure is the sum of its components; the size of a union is the length of the longest field.
A MASM structure is similar to a struct in the C language, a STRUCTURE in FORTRAN, and a RECORD in Pascal. Unions in MASM are similar to unions in C and FORTRAN, and to variant records in Pascal.

Follow these steps when using structures and unions:

1. Declare a structure (or union) type.
2. Define one or more variables having that type.
3. Reference the fields directly or indirectly with the field (dot) operator.

You can use the entire structure or union variable or just the individual fields as operands in assembler statements. This section explains the allocating, initializing, and nesting of structures and unions.

MASM 6.1 extends the functionality of structures and also makes some changes to MASM 5.1 behavior. If you prefer, you can retain MASM 5.1 behavior by specifying OPTION OLDSTRUCTS in your program.
Declaring Structure and Union Types

When you declare a structure or union type, you create a template for data. The template states the sizes and, optionally, the initial values in the structure or union, but allocates no memory.

The STRUCT keyword marks the beginning of a type declaration for a structure. (STRUCT and STRUC are synonyms.) The format for STRUCT and UNION type declarations is:

name {STRUCT | UNION} [[alignment]] [[,NONUNIQUE ]]
fielddeclarations
name ENDS

The fielddeclarations is a series of one or more variable declarations. You can declare default initial values individually or with the DUP operator. (See "Defining Structure and Union Variables," following.) "Referencing Structures, Unions, and Fields," later in this chapter, explains the NONUNIQUE keyword. You can nest structures and unions, as explained in "Nested Structures and Unions," also later in this chapter.
Initializing Fields
If you provide initializers for the fields of a structure or union when you declare the type, these initializers become the default value for the fields when you define a variable of that type. Defining Structure and Union Variables, following, explains default initializers.
When you initialize the fields of a union type, the type and value of the first field become the default value and type for the union. In this example of an initialized union declaration, the default type for the union is DWORD:
DWB     UNION
d       DWORD   00FFh
w       WORD    ?
b       BYTE    ?
DWB     ENDS
If the size of the first member is less than the size of the union, the assembler initializes the rest of the union to zeros. When initializing strings in a type, make sure the initial values are long enough to accommodate the largest possible string.
Field Names
Structure and union field names must be unique within a nesting level because they represent the offset from the beginning of the structure to the corresponding field. A label elsewhere in the code may have the same name as a structure field, but a text macro cannot. Also, field names between structures need not be unique. Field names must be unique if you place OPTION M510 or OPTION OLDSTRUCTS in your code or use the /Zm option from the command line, since versions of MASM prior to 6.0 require unique field names. (See Appendix A.)
Alignment Value and Offsets for Structures

Data access to structures is faster on aligned fields than on unaligned fields. Therefore, alignment gains speed at the cost of space. Alignment improves access on 16-bit and 32-bit processors but makes no difference in programs executing on an 8-bit 8088 processor. The way the assembler aligns structure fields determines the amount of space required to store a variable of that type. Each field in a structure has an offset relative to 0. If you specify an alignment in the structure declaration (or with the /Zpn command-line option), the offset for each field may be modified by the alignment (or n). The only values accepted for alignment are 1, 2, and 4. The default is 1. If the type declaration includes an alignment, each field is aligned to either the field's size or the alignment value, whichever is less. If the field size in bytes is greater than the alignment value, the field is padded so that its offset is evenly divisible by the alignment value. Otherwise, the field is padded so that its offset is evenly divisible by the field size.
Any padding required to reach the correct offset for the field is added prior to allocating the field. The padding consists of zeros and always precedes the aligned field. The size of the structure must also be evenly divisible by the structure alignment value, so zeros may be added at the end of the structure. If neither the alignment nor the /Zp command-line option is used, the offset is incremented by the size of each data directive. This is the same as a default alignment equal to 1. The alignment specified in the type declaration overrides the /Zp command-line option. These examples show how the assembler determines offsets:
STUDENT2 STRUCT 2                   ; Alignment value is 2
score    WORD   1                   ; Offset = 0
id       BYTE   2                   ; Offset = 2 (1 byte padding added)
year     DWORD  3                   ; Offset = 4
sname    BYTE   4                   ; Offset = 8 (1 byte padding added)
STUDENT2 ENDS
One byte of padding is added at the end of the first byte-sized field. Otherwise, the offset of the year field would be 3, which is not divisible by the alignment value of 2. The size of this structure is now 9 bytes. Since 9 is not evenly divisible by 2, 1 byte of padding is added at the end of student2.
STUDENT4 STRUCT 4                   ; Alignment value is 4
sname    BYTE   1                   ; Offset = 0 (1 byte padding added)
score    WORD   10 DUP (100)        ; Offset = 2
year     BYTE   2                   ; Offset = 22 (1 byte padding added
                                    ;   so offset of next field
                                    ;   is divisible by 4)
id       DWORD  3                   ; Offset = 24
STUDENT4 ENDS
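One way to confirm the offsets the assembler actually chose is to subtract the address of a structure variable from the address of one of its fields. The following sketch does this for the STUDENT2 type shown earlier. The variable name pupil is hypothetical, and the values named in the comments are the ones predicted by the rules in this section rather than verified output.

```asm
        .MODEL  small
        .STACK  100h

STUDENT2 STRUCT 2                   ; Alignment value is 2
score    WORD   1
id       BYTE   2
year     DWORD  3
sname    BYTE   4
STUDENT2 ENDS

        .DATA
pupil   STUDENT2 <>                 ; Hypothetical variable using the defaults

        .CODE
        .STARTUP
        mov     ax, OFFSET pupil.year   ; Address of the year field
        sub     ax, OFFSET pupil        ; AX = offset of year (4 predicted)
        mov     bx, OFFSET pupil.sname  ; Address of the sname field
        sub     bx, OFFSET pupil        ; BX = offset of sname (8 predicted)
        mov     cx, SIZEOF STUDENT2     ; CX = size including end padding
        .EXIT
        END
```

An assembly listing gives the same information without running the program.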
The alignment value affects the alignment of structure variables, so adding an alignment value affects memory usage. This feature provides compatibility with structures in Microsoft C. MASM 6.1 provides an improved H2INC utility, which C programmers can use to translate C structures to assembly. (See Environment and Tools, Chapter 20.) The ALIGN, EVEN, and ORG directives can modify how field offsets are placed during structure definition. The EVEN and ALIGN directives insert padding bytes to round the field offset up to the specified alignment boundary. The ORG directive changes the offset of the next field to a given value, either positive or negative. If you use ORG when declaring a structure, you cannot define a structure of that type. ORG is useful when accessing existing data structures, such as a stack frame created by a high-level language.
Defining Structure and Union Variables

Once you have declared a structure or union type, you can define variables of that type. For each variable defined, memory is allocated in the current segment in the format declared by the type. The syntax for defining a structure or union variable is:

[[name]] typename < [[initializer [[,initializer]]...]] >
[[name]] typename { [[initializer [[,initializer]]...]] }
[[name]] typename constant DUP ({ [[initializer [[,initializer]]...]] })

The name is the label assigned to the variable. If you do not provide a name, the assembler allocates space for the variable but does not give it a symbolic name. The typename is the name of a previously declared structure or union type.

You can give an initializer for each field. Each initializer must correspond in type with the field defined in the type declaration. For unions, the type of the initializer must be the same as the type for the first field. An initialization list can also use the DUP operator.

The list of initializers can be broken only after a comma unless you end the line with a continuation character (\). The last curly brace or angle bracket must appear on the same line as the last initializer. You can also use the line continuation character to extend a line as shown in the Item4 declaration that follows. Angle brackets and curly braces can be intermixed in an initialization as long as they match. This example illustrates the options for initializing lists in structures of type ITEMS:
ITEMS   STRUCT
Iname   BYTE    'Item Name'
Inum    WORD    ?
        UNION   ITYPE               ; UNION keyword appears first
oldtype BYTE    0                   ;   when nested in structure.
newtype WORD    ?                   ;   (See "Nested Structures
        ENDS                        ;   and Unions," following ).
ITEMS   ENDS
        .
        .
        .
        .DATA
Item1   ITEMS   < >                 ; Accepts default initializers
Item2   ITEMS   { }                 ; Accepts default initializers
Item3   ITEMS   <'Bolts', 126>      ; Overrides default value of first
                                    ;   2 fields; use default of
                                    ;   the third field
Item4   ITEMS   { \
                'Bolts',            ; Item name
                126 \               ; Part number
                }
The example defines (that is, allocates space for) four structures of the ITEMS type. The structures are named Item1 through Item4. Each definition requires the angle brackets or curly braces even when not initialized. If you initialize more than one field, separate the values with commas, as shown in Item3 and Item4. You need not initialize all fields in a structure. If a field is blank, the assembler uses the structure's initial value given for that field in the declaration. If there is no default value, the field value is left unspecified. For nested structures or unions, however, these are equivalent:
Item5   ITEMS   {'Bolts', , }
Item6   ITEMS   {'Bolts', , { } }
A variable and an array of union type WB look like this:

WB      UNION
w       WORD    ?
b       BYTE    ?
WB      ENDS

num     WB      {0Fh}                       ; Store 0Fh
array   WB      (40 / SIZEOF WB) DUP ({2})  ; Allocates and
                                            ;   initializes 20 unions
Arrays as Field Initializers

The size of the initializer determines the length of the array that can override the contents of a field in a variable definition. The override cannot contain more elements than the default. Specifying fewer override array elements changes the first n values of the default where n is the number of values in the override. The rest of the array elements take their default values from the initializer.
Strings as Field Initializers

If the override is shorter, the assembler pads the override with spaces to equal the length of the initializer. If the initializer is a string and the override value is not a string, the override value must be enclosed in angle brackets or curly braces. A string can override any member of type BYTE (or SBYTE). You need not enclose the string in angle brackets or curly braces unless mixed with other override methods. If a structure has an initialized string field or an array of bytes, any new string assigned to a variable of the field that is smaller than the default is padded with spaces. The assembler adds four spaces at the end of 'Bolts' in the variables of type ITEMS previously shown. The Iname field in the ITEMS structure cannot contain a field initializer longer than 'Item Name'.
Structures as Field Initializers

Initializers for structure variables must be enclosed in curly braces or angle brackets, but you can specify overrides with fewer elements than the defaults. This example illustrates the use of default values with structures as field initializers:
DISKDRIVES STRUCT
a1      BYTE    ?
b1      BYTE    ?
c1      BYTE    ?
DISKDRIVES ENDS

INFO    STRUCT
buffer  BYTE    100 DUP (?)
crlf    BYTE    13, 10
query   BYTE    'Filename: '        ; String <= can override
endmark BYTE    36
drives  DISKDRIVES <0, 1, 1>
INFO    ENDS

info1   INFO    { , , 'Dir' }

; Next line illegal since name in query field is too long:
; info2 INFO    {"TESTFILE", , "DirectoryName"}

lotsof  INFO    { , , 'file1', , {0,0,0} },
                { , , 'file2', , {0,0,1} },
                { , , 'file3', , {0,0,2} }
The following diagram shows how the assembler stores info1.
The initialization for drives gives default values for all three fields of the structure. The fields left blank in info1 use the default values for those fields. The info2 declaration is illegal because DirectoryName is longer than the initial string for that field.
Arrays of Structures and Unions

You can define an array of structures using the DUP operator (see Declaring and Referencing Arrays, earlier in this chapter) or by creating a list of structures. For example, you can define an array of structure variables like this:

Item7 ITEMS 30 DUP ({,,{10}})
The Item7 array defined here has 30 elements of type ITEMS, with the third field of each element (the union) initialized to 10. You can also list array elements as shown in the following example.
Item8   ITEMS   {'Bolts', 126, 10},
                {'Pliers',139, 10},
                {'Saws',  414, 10}
Redeclaring a Structure
The assembler generates an error when you declare a structure more than once unless the following are the same:
- Field names
- Offsets of named fields
- Initialization lists
- Field alignment value
LENGTHOF, SIZEOF, and TYPE for Structures

The size of a structure determined by SIZEOF is the offset of the last field, plus the size of the last field, plus any padding required for proper alignment. (For information about alignment, see Declaring Structure and Union Types, earlier in this chapter.)
This example, using the preceding data declarations, shows how to use the LENGTHOF, SIZEOF, and TYPE operators with structures.
INFO    STRUCT
buffer  BYTE    100 DUP (?)
crlf    BYTE    13, 10
query   BYTE    'Filename: '
endmark BYTE    36
drives  DISKDRIVES <0, 1, 1>
INFO    ENDS

info1   INFO    { , , 'Dir' }
lotsof  INFO    { , , 'file1', , {0,0,0} },
                { , , 'file2', , {0,0,1} },
                { , , 'file3', , {0,0,2} }

sinfo1  EQU     SIZEOF   info1      ; 116 = number of bytes in
                                    ;   initializers
linfo1  EQU     LENGTHOF info1      ; 1 = number of items
tinfo1  EQU     TYPE     info1      ; 116 = same as size

slotsof EQU     SIZEOF   lotsof     ; 116 * 3 = number of bytes in
                                    ;   initializers
llotsof EQU     LENGTHOF lotsof     ; 3 = number of items
tlotsof EQU     TYPE     lotsof     ; 116 = same as size for structure
                                    ;   of type INFO
LENGTHOF, SIZEOF, and TYPE for Unions

The size of a union determined by SIZEOF is the size of the longest field plus any padding required. The length of a union variable determined by LENGTHOF equals the number of initializers defined inside angle brackets or curly braces. TYPE returns a value indicating the type of the longest field.
DWB     UNION
d       DWORD   ?
w       WORD    ?
b       BYTE    ?
DWB     ENDS

num     DWB     {0FFFFh}
array   DWB     (100 / SIZEOF DWB) DUP ({0})

snum    EQU     SIZEOF   num        ; = 4
lnum    EQU     LENGTHOF num        ; = 1
tnum    EQU     TYPE     num        ; = 4
sarray  EQU     SIZEOF   array      ; = 100 (4*25)
larray  EQU     LENGTHOF array      ; = 25
tarray  EQU     TYPE     array      ; = 4
Referencing Structures, Unions, and Fields

Like other variables, structure variables can be accessed by name. You can access fields within structure variables with this syntax:

variable.field

References to fields must always be fully qualified, with the structure or union names and the dot operator preceding the field name. The assembler requires that you use the dot operator only with structure fields, not as an alternative to the plus operator; nor can you use the plus operator as an alternative to the dot operator. The following example shows several ways to reference the fields of a structure of type DATE.
DATE    STRUCT                      ; Defines structure type
month   BYTE    ?
day     BYTE    ?
year    WORD    ?
DATE    ENDS

yesterday DATE  {1, 20, 1993}       ; Declare structure
                                    ;   variable
        .
        .
        .
        mov     al, yesterday.day           ; Use structure variables
        mov     bx, OFFSET yesterday        ; Load structure address
        mov     al, (DATE PTR [bx]).month   ; Use as indirect operand
        mov     al, [bx].date.month         ; This is necessary only if
                                            ;   month is already a field
                                            ;   in a different structure
Under OPTION M510 or OPTION OLDSTRUCTS, unique structure names do not need to be qualified. However, if the NONUNIQUE keyword appears in a structure definition, all fields of the structure must be fully qualified when referenced, even if the OPTION OLDSTRUCTS directive appears in the code. Also, you must qualify all references to a field. (For information on the OPTION directive, see Chapter 1.) Even if the initialized union is the size of a WORD or DWORD, members of structures or unions are accessible only through the field names.
In the following example, the two MOV statements show how you can access the elements of an array of unions.
WB      UNION
w       WORD    ?
b       BYTE    ?
WB      ENDS

array   WB      (100 / SIZEOF WB) DUP ({0})

        mov     array[12].w, 40h
        mov     array[32].b, 2
As the preceding code illustrates, you can use unions to access the same data in more than one form. One application of structures and unions is to simplify the task of reinitializing a far pointer. For a far pointer declared as
FPWORD  TYPEDEF FAR PTR WORD
        .DATA
WordPtr FPWORD  ?
you must follow these steps to point WordPtr to a word value named ThisWord in the current data segment.
        mov     WORD PTR WordPtr[2], ds
        mov     WORD PTR WordPtr, OFFSET ThisWord
The preceding method requires that you remember whether the segment or the offset is stored first. However, if your program declares a union like this:
uptr    UNION
dwptr   FPWORD  0
        STRUCT
offs    WORD    0
segm    WORD    0
        ENDS
uptr    ENDS
You can initialize a far pointer with these steps:

        .DATA
WrdPtr2 uptr    <>
        .
        .
        .
        mov     WrdPtr2.segm, ds
        mov     WrdPtr2.offs, OFFSET ThisWord
This code moves the segment and the offset into the pointer and then moves the pointer into a register with the other field of the union. Although this technique does not reduce the code size, it avoids confusion about the order for loading the segment and offset.
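The following sketch puts the pieces together in one small-model DOS program, including the final step of loading the pointer into a register pair through the dwptr field. The initial value of ThisWord and the choice of ES:BX as the destination are assumptions made for illustration.

```asm
        .MODEL  small
        .STACK  100h

FPWORD  TYPEDEF FAR PTR WORD

uptr    UNION
dwptr   FPWORD  0                   ; Whole far pointer
        STRUCT
offs    WORD    0                   ; Offset half (stored first)
segm    WORD    0                   ; Segment half
        ENDS
uptr    ENDS

        .DATA
ThisWord WORD   1234h               ; Hypothetical word to point at
WrdPtr2 uptr    <>

        .CODE
        .STARTUP
        mov     WrdPtr2.segm, ds                ; Store segment
        mov     WrdPtr2.offs, OFFSET ThisWord   ; Store offset
        les     bx, WrdPtr2.dwptr               ; Load ES:BX from the pointer
        mov     ax, es:[bx]                     ; AX = contents of ThisWord
        .EXIT
        END
```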
Nested Structures and Unions

You can nest structures and unions in several ways. This section explains how to refer to the fields in a nested structure or union. The following example illustrates the four techniques for nesting, and how to reference the fields. Note the syntax for nested structures. The techniques are reviewed following the example.
ITEMS     STRUCT
Inum      WORD    ?
Iname     BYTE    'Item Name'
ITEMS     ENDS

INVENTORY STRUCT
UpDate    WORD    ?
oldItem   ITEMS   { \                 ; Named variable of
                  100, 'AF8' \        ;   existing structure
                  }
          ITEMS   { ?, '94C' }        ; Unnamed variable of
                                      ;   existing type
          STRUCT  ups                 ; Named nested structure
source    WORD    ?
shipmode  BYTE    ?
          ENDS
          STRUCT                      ; Unnamed nested structure
f1        WORD    ?
f2        WORD    ?
          ENDS
INVENTORY ENDS

          .DATA
yearly    INVENTORY { }
; Referencing each type of data in the yearly structure:

        mov     ax, yearly.oldItem.Inum
        mov     yearly.ups.shipmode, 'A'
        mov     yearly.Inum, 'C'
        mov     ax, yearly.f1
To nest structures and unions, you can use any of these techniques:
- The field of a structure or union can be a named variable of an existing structure or union type, as in the oldItem field. Because INVENTORY contains two structures of type ITEMS, the field names in oldItem are not unique. Therefore, you must use the full field names when referencing those fields, as in the statement

        mov     ax, yearly.oldItem.Inum

- To declare a named structure or union inside another structure or union, give the STRUCT or UNION keyword first and then define a label for it. Fields of the nested structure or union must always be qualified:

        mov     yearly.ups.shipmode, 'A'

- As shown in the Items field of Inventory, you also can use unnamed variables of existing structures or unions inside another structure or union. In these cases, you can reference fields directly:

        mov     yearly.Inum, 'C'
        mov     ax, yearly.f1
Records
Records are similar to structures, except that fields in records are bit strings. Each bit field in a record variable can be used separately in constant operands or expressions. The processor cannot access bits individually at run time, but it can access bit fields with instructions that manipulate bits. Records are bytes, words, or doublewords in which the individual bits or groups of bits are considered fields.

In general, the three steps for using record variables are the same as those for using other complex data types:

1. Declare a record type.
2. Define one or more variables having the record type.
3. Reference record variables using shifts and masks.

Once it is defined, you can use the record variable as an operand in assembler statements.
This section explains the record declaration syntax and the use of the MASK and WIDTH operators. It also shows some applications of record variables and constants.
Declaring Record Types

A record type creates a template for data with the sizes and, optionally, the initial values for bit fields in the record. It does not allocate memory space for the record. The RECORD directive declares a record type for an 8-bit, 16-bit, or 32-bit record that contains one or more bit fields. The maximum size is based on the expression word size. See OPTION EXPR16 and OPTION EXPR32 in Chapter 1. The syntax is:

recordname RECORD field [[, field]]...

The field declares the name, width, and initial value for the field. The syntax for each field is:

fieldname:width[[=expression]]

Global labels, macro names, and record field names must all be unique, but record field names can have the same names as structure field names. Width is the number of bits in the field, and expression is a constant giving the initial (or default) value for the field. Record definitions can span more than one line if the continued lines end with commas. If expression is given, it declares the initial value for the field. The assembler generates an error message if an initial value is too large for the width of its field.

The first field in the declaration always goes into the most significant bits of the record. Subsequent fields are placed to the right in the succeeding bits. If the fields do not total exactly 8, 16, or 32 bits as appropriate, the entire record is shifted right, so the last bit of the last field is the lowest bit of the record. Unused bits in the high end of the record are initialized to 0.

The following example creates a byte record type COLOR having four fields: blink, back, intense, and fore. The contents of the record type are shown after the example. Since no initial values are given, all bits are set to 0. Note that this is only a template maintained by the assembler. It allocates no space in the data segment.
COLOR RECORD blink:1, back:3, intense:1, fore:3
The next example creates a record type CW that has six fields. Each record declared with this type occupies 16 bits of memory. Initial (default) values are given for each field. You can use them when declaring data for the record. The bit diagram after the example shows the contents of the record type.
CW RECORD r1:3=0, ic:1=0, rc:2=0, pc:2=3, r2:2=1, masks:6=63
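Working through the field widths from the most significant bit down (r1 in bits 15-13, ic in bit 12, rc in bits 11-10, pc in bits 9-8, r2 in bits 7-6, masks in bits 5-0) gives a default value of 0000 0011 0111 1111y, or 037Fh. The following sketch defines two variables of this type to show the defaults and a single-field override. The variable names ctrl1 and ctrl2 are hypothetical.

```asm
        .MODEL  small
        .STACK  100h

CW      RECORD  r1:3=0, ic:1=0, rc:2=0, pc:2=3, r2:2=1, masks:6=63

        .DATA
ctrl1   CW      <>                  ; All defaults:  0000 0011 0111 1111y = 037Fh
ctrl2   CW      <, , 1>             ; rc = 1 only:   0000 0111 0111 1111y = 077Fh

        .CODE
        .STARTUP
        mov     ax, ctrl1           ; AX = 037Fh
        mov     bx, ctrl2           ; BX = 077Fh
        .EXIT
        END
```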
Defining Record Variables

Once you have declared a record type, you can define record variables of that type. For each variable, the assembler allocates memory in the format declared by the type. The syntax is:

[[name]] recordname <[[initializer [[,initializer]]...]] >
[[name]] recordname { [[initializer [[,initializer]]...]] }
[[name]] recordname constant DUP ( [[initializer [[,initializer]]...]] )

The recordname is the name of a record type previously declared with the RECORD directive. A fieldlist for each field in the record can be a list of integers, character constants, or expressions that correspond to a value compatible with the size of the field. You must include curly braces or angle brackets even when you do not specify an initial value.

If you use the DUP operator (see "Declaring and Referencing Arrays," earlier in this chapter) to initialize multiple record variables, only the angle brackets and any initial values need to be enclosed in parentheses. For example, you can define an array of record variables with
xmas COLOR 50 DUP ( <1, 2, 0, 4> )
You do not have to initialize all fields in a record. If an initial value is blank, the assembler automatically stores the default initial value of the field. If there is no default value, the assembler clears each bit in the field. The definition in the following example creates a variable named warning whose type is given by the record type COLOR. The initial values of the fields in the variable are set to the values given in the record definition. The initial values override any default record values given in the declaration.
COLOR   RECORD  blink:1,back:3,intense:1,fore:3 ; Record
                                                ;   declaration
warning COLOR   <1, 0, 1, 4>                    ; Record
                                                ;   definition
LENGTHOF, SIZEOF, and TYPE with Records

The SIZEOF and TYPE operators applied to a record name return the number of bytes used by the record. SIZEOF returns the number of bytes a record variable occupies. You cannot use LENGTHOF with a record declaration, but you can use it with defined record variables. LENGTHOF returns the number of records in an array of records, or 1 for a single record variable. The following example illustrates these points.
; Record definition
; 9 bits stored in 2 bytes
RGBCOLOR RECORD red:3, green:3, blue:3

        mov     ax, RGBCOLOR            ; Equivalent to "mov ax, 01FFh"
        mov     ax, LENGTHOF RGBCOLOR   ; Illegal since LENGTHOF can
                                        ;   apply only to data label
        mov     ax, SIZEOF RGBCOLOR     ; Equivalent to "mov ax, 2"
        mov     ax, TYPE RGBCOLOR       ; Equivalent to "mov ax, 2"
; Record instance
; 8 bits stored in 1 byte
RGBCOLOR2 RECORD red:3, green:3, blue:2
rgb     RGBCOLOR2 <1, 1, 1>         ; Initialize to 00100101y

        mov     ax, RGBCOLOR2       ; Equivalent to
                                    ;   "mov ax, 00FFh"
        mov     ax, LENGTHOF rgb    ; Equivalent to "mov ax, 1"
        mov     ax, SIZEOF rgb      ; Equivalent to "mov ax, 1"
        mov     ax, TYPE rgb        ; Equivalent to "mov ax, 1"
Record Operators
The WIDTH operator (used only with records) returns the width in bits of a record or record field. The MASK operator returns a bit mask for the bit positions occupied by the given record field. A bit in the mask contains a 1 if that bit corresponds to a bit field. The following example shows how to use MASK and WIDTH.
        .DATA
COLOR   RECORD  blink:1, back:3, intense:1, fore:3
message COLOR   <1, 5, 1, 1>
wblink  EQU     WIDTH blink         ; "wblink"  = 1
wback   EQU     WIDTH back          ; "wback"   = 3
wintens EQU     WIDTH intense       ; "wintens" = 1
wfore   EQU     WIDTH fore          ; "wfore"   = 3
wcolor  EQU     WIDTH COLOR         ; "wcolor"  = 8
        .CODE
        .
        .
        .
        mov     ah, message         ; Load initial   1101 1001
        and     ah, NOT MASK back   ; Turn off   AND 1000 1111
                                    ;   "back"       ---------
                                    ;                1000 1001
        or      ah, MASK blink      ; Turn on     OR 1000 0000
                                    ;   "blink"      ---------
                                    ;                1000 1001
        xor     ah, MASK intense    ; Toggle     XOR 0000 1000
                                    ;   "intense"    ---------
                                    ;                1000 0001

        IF      (WIDTH COLOR) GT 8  ; If color is 16 bit, load
        mov     ax, message         ;   into 16-bit register
        ELSE                        ; else
        mov     al, message         ;   load into low 8-bit register
        xor     ah, ah              ;   and clear high 8-bits
        ENDIF
The example continues by illustrating several ways in which record fields can serve as operands and expressions:
; Rotate "back" of "message" without changing other values

        mov     al, message         ; Load value from memory
        mov     ah, al              ; Save a copy for work      1101 1001=ah/al
        and     al, NOT MASK back   ; Mask out old bits     AND 1000 1111=mask
                                    ;   to save old message     ---------
                                    ;                           1000 1001=al
        mov     cl, back            ; Load bit position
        shr     ah, cl              ; Shift to right            0000 1101=ah
        inc     ah                  ; Increment                 0000 1110=ah
        shl     ah, cl              ; Shift left again          1110 0000=ah
        and     ah, MASK back       ; Mask off extra bits   AND 0111 0000=mask
                                    ;   to get new message      ---------
                                    ;                           0110 0000 ah
        or      ah, al              ; Combine old and new    OR 1000 1001 al
                                    ;                           ---------
        mov     message, ah         ; Write back to memory      1110 1001 ah
Record variables are often used with the logical operators to perform logical operations on the bit fields of the record, as in the previous example using the MASK operator.
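A common companion to these operations is reading a single field out of a record as an ordinary number. The sketch below isolates the back field with MASK and then shifts it down using the field name as the shift count. It assumes a small-model DOS program and reuses the COLOR record and message variable from the preceding example.

```asm
        .MODEL  small
        .STACK  100h

COLOR   RECORD  blink:1, back:3, intense:1, fore:3

        .DATA
message COLOR   <1, 5, 1, 1>        ; 1101 1001y

        .CODE
        .STARTUP
        mov     al, message         ; Load record        1101 1001
        and     al, MASK back       ; Keep "back" only   0101 0000
        mov     cl, back            ; Field name gives the shift count (4)
        shr     al, cl              ; Right-justify      0000 0101 = 5
        .EXIT
        END
```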
CHAPTER 6
Using Floating-Point and Binary Coded Decimal Numbers
MASM requires different techniques for handling floating-point (real) numbers and binary coded decimal (BCD) numbers than for handling integers. You have two choices for working with real numbers: a math coprocessor or emulation routines. Math coprocessors (the 8087, 80287, and 80387 chips) work with the main processor to handle real-number calculations. The 80486 processor performs floating-point operations directly. All information in this chapter pertaining to the 80387 coprocessor applies to the 80486DX processor as well. It does not apply to the 80486SX, which does not provide an on-chip coprocessor.

This chapter begins with a summary of the directives and formats of floating-point data that you need to allocate memory storage and initialize variables before you can work with floating-point numbers. The chapter then explains how to use a math coprocessor for floating-point operations. It covers:
- The architecture of the registers.
- The operands for the coprocessor instruction formats.
- The coordination of coprocessor and main processor memory access.
- The basic groups of coprocessor instructions for loading and storing data, doing arithmetic calculations, and controlling program flow.
The next main section describes emulation libraries. The emulation routines provided with all Microsoft high-level languages enable you to use coprocessor instructions as though your computer had a math coprocessor. However, some coprocessor instructions are not handled by emulation, as this section explains. Finally, because math coprocessor and emulation routines can also operate on BCD numbers, this chapter includes the instruction set for these numbers.
Using Floating-Point Numbers

Before using floating-point data in your program, you need to allocate the memory storage for the data. You can then initialize variables either as real numbers in decimal form or as encoded hexadecimals. The assembler stores allocated data in IEEE format, using 4, 8, or 10 bytes according to the directive. This section covers floating-point declarations and floating-point data formats.
Declaring Floating-Point Variables and Constants

You can allocate real constants using the REAL4, REAL8, and REAL10 directives. These directives allocate the following floating-point numbers:
Directive   Size
REAL4       Short (32-bit) real numbers
REAL8       Long (64-bit) real numbers
REAL10      10-byte (80-bit) real numbers and BCD numbers
Table 6.1 lists the possible ranges for floating-point variables. The number of significant digits can vary in an arithmetic operation as the least-significant digit may be lost through rounding errors. This occurs regularly for short and long real numbers, so you should assume the lesser value of significant digits shown in Table 6.1. Ten-byte real numbers are much less susceptible to rounding errors for reasons described in the next section. However, under certain circumstances, 10-byte real operations can have a precision of only 18 digits.
Table 6.1 Ranges of Floating-Point Variables
Data Type      Bits   Significant Digits   Approximate Range
Short real     32     6-7                  1.18 x 10^-38 to 3.40 x 10^38
Long real      64     15-16                2.23 x 10^-308 to 1.79 x 10^308
10-byte real   80     19                   3.37 x 10^-4932 to 1.18 x 10^4932
With versions of MASM prior to 6.0, the DD, DQ, and DT directives could allocate real constants. MASM 6.1 still supports these directives, but the variables are integers rather than floating-point values. Although this makes no difference in the assembly code, CodeView displays the values incorrectly.
You can specify floating-point constants either as decimal constants or as encoded hexadecimal constants. You can express decimal real-number constants in the form:
[[+ | -]] integer[[fraction]][[E[[+ | -]]exponent]]
For example, the numbers 2.523E1 and -3.6E-2 are written in the correct decimal format. You can use these numbers as initializers for real-number variables.
The assembler always evaluates digits of real numbers as base 10. It converts real-number constants given in decimal format to a binary format. The sign, exponent, and decimal part of the real number are encoded as bit fields within the number.
You can also specify the encoded format directly with hexadecimal digits (0-9 plus A-F). The number must begin with a decimal digit (0-9) and end with the real-number designator (R). It cannot be signed. For example, the hexadecimal number 3F800000r can serve as an initializer for a doubleword-sized variable.
The maximum range of exponent values and the number of digits required in the hexadecimal number depend on the directive. The number of digits for encoded numbers used with REAL4, REAL8, and REAL10 must be 8, 16, and 20 digits, respectively. If the number has a leading zero, the number must be 9, 17, or 21 digits.
Examples of decimal constant and hexadecimal specifications are shown here:
; Real numbers
short       REAL4   25.23                   ; IEEE format
double      REAL8   2.523E1                 ; IEEE format
tenbyte     REAL10  2523.0E-2               ; 10-byte real format

; Encoded as hexadecimals
ieeeshort   REAL4   3F800000r               ; 1.0 as IEEE short
ieeedouble  REAL8   3FF0000000000000r       ; 1.0 as IEEE long
temporary   REAL10  3FFF8000000000000000r   ; 1.0 as 10-byte real
The section "Storing Numbers in Floating-Point Format," following, explains the IEEE formats, that is, the way the assembler actually stores the data.
Pascal or C programmers may prefer to create language-specific TYPEDEF declarations, as illustrated in this example:
; C-language specific
float        TYPEDEF REAL4
double       TYPEDEF REAL8
long_double  TYPEDEF REAL10

; Pascal-language specific
SINGLE       TYPEDEF REAL4
DOUBLE       TYPEDEF REAL8
EXTENDED     TYPEDEF REAL10
For applications of TYPEDEF, see "Defining Pointer Types with TYPEDEF," page 75.
Storing Numbers in Floating-Point Format

The assembler stores floating-point variables in the IEEE format. MASM 6.1 does not support .MSFLOAT and Microsoft binary format, which are available in version 5.1 and earlier. Figure 6.1 illustrates the IEEE format for encoding short (4-byte), long (8-byte), and 10-byte real numbers. Although this figure places the most significant bit first for illustration, low bytes actually appear first in memory.
Figure 6.1 Encoding for Real Numbers in IEEE Format
The following list explains how the parts of a real number are stored in the IEEE format. Each item in the list refers to an item in Figure 6.1.
• Sign bit (0 for positive or 1 for negative) in the upper bit of the first byte.
• Exponent in the next bits in sequence (8 bits for a short real number, 11 bits for a long real number, and 15 bits for a 10-byte real number).
• The integer part of the significand in bit 63 for the 10-byte real format. By absorbing carry values, this bit allows 10-byte real operations to preserve precision to 19 digits. The integer part is always 1 in short and long real numbers; consequently, these formats do not provide a bit for the integer, since there is no point in storing it.
• Decimal part of the significand in the remaining bits. The length is 23 bits for short real numbers, 52 bits for long real numbers, and 63 bits for 10-byte real numbers.
The exponent field represents a multiplier 2^n. To accommodate negative exponents (such as 2^-6), the value in the exponent field is biased; that is, the actual exponent is determined by subtracting the appropriate bias value from the value in the exponent field. For example, the bias for short real numbers is 127. If the value in the exponent field is 130, the exponent represents a value of 2^(130-127), or 2^3. The bias for long real numbers is 1,023. The bias for 10-byte real numbers is 16,383.
Once you have declared floating-point data for your program, you can use coprocessor or emulator instructions to access the data. The next section focuses on the coprocessor architecture, instructions, and operands required for floating-point operations.
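As a worked instance of the biased exponent, consider the value 8.0, which is 1.0 x 2^3. The fragment below is a sketch, not one of the manual's own examples, and the variable names are hypothetical. Each encoded constant should describe the same value as the decimal form, with the exponent field set to 3 plus the bias for that size. The last line illustrates the leading-zero rule for an encoded constant whose first hexadecimal digit is a letter.

```asm
        .MODEL  small
        .DATA
; 8.0 = 1.0 x 2^3, so the exponent field holds 3 + bias
eight4d REAL4   8.0                     ; Decimal form
eight4h REAL4   41000000r               ; Exponent field 130 (127 + 3)
eight8h REAL8   4020000000000000r       ; Exponent field 1026 (1023 + 3)
eight10 REAL10  40028000000000000000r   ; Exponent field 16386 (16383 + 3),
                                        ;   integer bit (bit 63) set
; -2.0 = -1.0 x 2^1: sign bit 1, exponent field 128.
; The encoding C0000000 starts with a letter, so a
; leading 0 is added and the constant has 9 digits.
negtwo  REAL4   0C0000000r
        END
```

Comparing the bytes generated for eight4d and eight4h in an assembly listing is one way to check an encoding worked out by hand.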
Using a Math Coprocessor

When used with real numbers, packed BCD numbers, or long integers, coprocessors (the 8087, 80287, 80387, and 80486) calculate many times faster than the 8086-based processors. The coprocessor handles data with its own registers. The organization of these registers can be one of the four formats for using operands explained in "Instruction and Operand Formats," later in this section.
This section describes how the coprocessor transfers data to and from the coprocessor, coordinates processor and coprocessor operations, and controls program flow.
Coprocessor Architecture
The coprocessor accesses memory as the CPU does, but it has its own data and control registers: eight data registers organized as a stack and seven control registers similar to the 8086 flag registers. The coprocessor's instruction set provides direct access to these registers.
The eight 80-bit data registers of the 8087-based coprocessors are organized as a stack, although they need not be used as a stack. As data items are pushed into the top register, previous data items move into higher-numbered registers, which are lower on the stack. Register 0 is the top of the stack; register 7 is the bottom. The syntax for specifying registers is:
ST [[(number)]]
The number must be a digit between 0 and 7 or a constant expression that evaluates to a number from 0 to 7. ST is another way to refer to ST(0).
All coprocessor data is stored in registers in the 10-byte real format. The registers and the register format are shown in Figure 6.2.
Figure 6.2 Coprocessor Data Registers
Internally, all calculations are done on numbers of the same type. Since 10-byte real numbers have the greatest precision, lower-precision numbers are guaranteed not to lose precision as a result of calculations. The instructions that transfer values between the main memory and the coprocessor automatically convert numbers to and from the 10-byte real format.
Instruction and Operand Formats

Because of the stack organization of registers, you can consider registers either as elements on a stack or as registers much like 8086-family registers. Table 6.2 lists the four main groups of coprocessor instructions and the general syntax for each. The names given to the instruction format reflect the way the instruction uses the coprocessor registers. The instruction operands are placed in the coprocessor data registers before the instruction executes.
Table 6.2 Coprocessor Operand Formats
Instruction Format   Syntax                      Implied Operands   Example
Classical stack      Finstruction                ST, ST(1)          fadd
Memory               Finstruction memory         ST                 fadd memloc
Register             Finstruction ST(num), ST                       fadd st(5), st
                     Finstruction ST, ST(num)                       fadd st, st(3)
Register pop         FinstructionP ST(num), ST                      faddp st(4), st
You can easily recognize coprocessor instructions because, unlike all 8086-family instruction mnemonics, they start with the letter F. Coprocessor instructions can never have immediate operands and, with the exception of the FSTSW instruction, they cannot have processor registers as operands.
Classical-Stack Format
Instructions in the classical-stack format treat the coprocessor registers like items on a stack, hence the name. Items are pushed onto or popped off the top elements of the stack. Since only the top item can be accessed on a traditional stack, there is no need to specify operands. The first (top) register (and the second, if the instruction needs two operands) is always assumed.
ST (the top of the stack) is the source operand in coprocessor arithmetic operations. ST(1), the second register, is the destination. The result of the operation replaces the destination operand, and the source is popped off the stack. This leaves the result at the top of the stack.
The following example illustrates the classical-stack format; Figure 6.3 shows the status of the register stack after each instruction.
        fld1                    ; Push 1 into first position
        fldpi                   ; Push pi into first position
        fadd                    ; Add pi and 1 and pop
Figure 6.3 Status of the Register Stack
Memory Format
Instructions that use the memory format, such as data transfer instructions, also treat coprocessor registers like items on a stack. However, with this format, items are pushed from memory onto the top element of the stack, or popped from the top element to memory. You must specify the memory operand.
Some instructions that use the memory format specify how a memory operand is to be interpreted: as an integer (I) or as a binary coded decimal (B). The letter I or B follows the initial F in the syntax. For example, FILD interprets its operand as an integer and FBLD interprets its operand as a BCD number. If the instruction name does not include a type letter, the instruction works on real numbers.
You can also use memory operands in calculation instructions that operate on two values (see "Using Coprocessor Instructions," later in this section). The memory operand is always the source. The stack top (ST) is always the implied destination.
The result of the operation replaces the destination without changing its stack position, as shown in this example and in Figure 6.4:
        .DATA
m1      REAL4   1.0
m2      REAL4   2.0
        .CODE
        .
        .
        .
        fld     m1              ; Push m1 into first position
        fld     m2              ; Push m2 into first position
        fadd    m1              ; Add m1 to first position
        fstp    m1              ; Pop first position into m1
        fst     m2              ; Copy first position to m2
Figure 6.4 Status of the Register Stack and Memory Locations
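The type letter described above can be seen in a short sketch that mixes an integer operand with a real operand. This is an illustrative program under assumed names (count, scale, and result are hypothetical), not an example from the manual. FILD converts the 16-bit integer to the 10-byte real format as it is pushed, FMUL takes a short real memory operand as its source, and FISTP converts the stack top back to an integer as it stores and pops.

```asm
        .MODEL  small
        .STACK
        .DATA
count   SWORD   12              ; 16-bit integer in memory
scale   REAL4   2.5             ; Short real in memory
result  SDWORD  ?               ; 32-bit integer result
        .CODE
        .STARTUP
        fild    count           ; Push 12, converted to 10-byte real
        fmul    scale           ; ST = 12 * 2.5 (memory operand is source)
        fistp   result          ; Convert ST to integer, store, and pop
        fwait                   ; Let the store finish before using result
        .EXIT
        END
```

In each instruction the size of the memory variable (SWORD, REAL4, SDWORD) tells the assembler which form of the instruction to encode.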
Register Format
Instructions that use the register format treat coprocessor registers as registers rather than as stack elements. Instructions that use this format require two register operands; one of them must be the stack top (ST). In the register format, specify all operands by name. The first operand is the destination; its value is replaced with the result of the operation. The second operand is the source; it is not affected by the operation. The stack positions of the operands do not change.
The only instructions that use the register operand format are the FXCH instruction and arithmetic instructions for calculations on two values. With the FXCH instruction, the stack top is implied and need not be specified, as shown in this example and in Figure 6.5:
        fadd    st(1), st       ; Add second position to first;
                                ;   result goes in second position
        fadd    st, st(2)       ; Add first position to third;
                                ;   result goes in first position
        fxch    st(1)           ; Exchange first and second positions
Figure 6.5 Status of the Previously Initialized Register Stack
Register-Pop Format
The register-pop format treats coprocessor registers as a modified stack. The source register must always be the stack top. Specify the destination with the register's name. Instructions with this format place the result of the operation into the destination operand, and the top pops off the stack. The register-pop format is used only for instructions for calculations on two values, as in this example and in Figure 6.6:
        faddp   st(2), st       ; Add first and third positions and pop;
                                ;   first position destroyed;
                                ;   third moves to second and holds result
Figure 6.6 Status of the Already Initialized Register Stack
Coordinating Memory Access

The math coprocessor and main processor work simultaneously. However, since the coprocessor cannot handle device input or output, data originates in the main processor.
The main processor and the coprocessor have their own registers, which are separate and inaccessible to each other. They exchange data through memory, since memory is available to both.
When using the coprocessor, follow these three steps:
1. Load data from memory to coprocessor registers.
2. Process the data.
3. Store the data from coprocessor registers back to memory.
Step 2, processing the data, can occur while the main processor is handling other tasks. Steps 1 and 3 must be coordinated with the main processor so that the processor and coprocessor do not try to access the same memory at the same time; otherwise, problems of coordinating memory access can occur. Since the processor and coprocessor work independently, they may not finish working on memory in the order in which you give instructions. The two potential timing conflicts that can occur are handled in different ways.
One timing conflict results from a coprocessor instruction following a processor instruction. The processor may have to wait until the coprocessor finishes if the next processor instruction requires the result of the coprocessor's calculation. You do not have to write your code to avoid this conflict, however. The assembler coordinates this timing automatically for the 8088 and 8086 processors, and the processor coordinates it automatically on the 80186-80486 processors. This is the case shown in the first example that follows.
Another conflict results from a processor instruction that accesses memory following a coprocessor instruction that accesses the same memory. The processor can try to load a variable that is still being used by the coprocessor. You need careful synchronization to control the timing, and this synchronization is not automatic on the 8087 coprocessor.
For code to run correctly on the 8087, you must include WAIT or FWAIT (mnemonics for the same instruction) to ensure that the coprocessor finishes before the processor begins, as shown in the second example. In this situation, the assembler does not generate the FWAIT instruction automatically.
; Processor instruction first - No wait needed
        mov     WORD PTR mem32[0], ax   ; Load memory
        mov     WORD PTR mem32[2], dx
        fild    mem32                   ; Load to register

; Coprocessor instruction first - Wait needed (for 8087)
        fist    mem32                   ; Store to memory
        fwait                           ; Wait until coprocessor
                                        ;   is done
        mov     ax, WORD PTR mem32[0]   ; Move to register
        mov     dx, WORD PTR mem32[2]
When generating code for the 8087 coprocessor, the assembler automatically inserts a WAIT instruction before the coprocessor instruction. However, if you use the .286 or .386 directive, the assembler assumes that the coprocessor instructions are for the 80287 or 80387 and does not insert the WAIT instruction. If your code does not need to run on an 8086 or 8088 processor, you can make your programs smaller and more efficient by using the .286 or .386 directive.
Using Coprocessor Instructions

The 8087 family of coprocessors has separate instructions for each of the following operations:
• Loading and storing data
• Doing arithmetic calculations
• Controlling program flow
The following sections explain the available instructions and show how to use them for each of these operations. For general syntax information, see "Instruction and Operand Formats," earlier in this section.
Loading and Storing Data

Data-transfer instructions copy data between main memory and the coprocessor registers or between different coprocessor registers. Two principles govern data transfers:
• The choice of instruction determines whether a value in memory is considered an integer, a BCD number, or a real number. The value is always considered a 10-byte real number once transferred to the coprocessor.
• The size of the operand determines the size of a value in memory. Values in the coprocessor always take up 10 bytes.
You can transfer data to stack registers using load commands. These commands push data onto the stack from memory or from coprocessor registers. Store commands remove data. Some store commands pop data off the register stack into memory or coprocessor registers; others simply copy the data without changing it on the stack.
If you use constants as operands, you cannot load them directly into coprocessor registers. You must allocate memory and initialize a variable to a constant value. That variable can then be loaded by using one of the load instructions in the following list.
The math coprocessor offers a few special instructions for loading certain constants. You can load 0, 1, pi, and several common logarithmic values directly. Using these instructions is faster and often more precise than loading the values from initialized variables.
All instructions that load constants have the stack top as the implied destination operand. The constant to be loaded is the implied source operand.
The coprocessor data area, or parts of it, can also be moved to memory and later loaded back. You may want to do this to save the current state of the coprocessor before executing a procedure. After the procedure ends, restore the previous status. Saving coprocessor data is also useful when you want to modify coprocessor behavior by writing certain data to main memory, operating on the data with 8086-family instructions, and then loading it back to the coprocessor data area.
Use the following instructions for transferring numbers to and from registers:
Instruction(s)            Description
FLD, FST, FSTP            Loads and stores real numbers
FILD, FIST, FISTP         Loads and stores binary integers
FBLD                      Loads BCD
FBSTP                     Stores BCD
FXCH                      Exchanges register values
FLDZ                      Pushes 0 into ST
FLD1                      Pushes 1 into ST
FLDPI                     Pushes the value of pi into ST
FLDCW mem2byte            Loads the control word into the coprocessor
F[[N]]STCW mem2byte       Stores the control word in memory
FLDENV mem14byte          Loads environment from memory
F[[N]]STENV mem14byte     Stores environment in memory
FRSTOR mem94byte          Restores state from memory
F[[N]]SAVE mem94byte      Saves state in memory
FLDL2E                    Pushes the value of log2(e) into ST
FLDL2T                    Pushes log2(10) into ST
FLDLG2                    Pushes log10(2) into ST
FLDLN2                    Pushes loge(2) into ST
The following example and Figure 6.7 illustrate some of these instructions:
        .DATA
m1      REAL4   1.0
m2      REAL4   2.0
        .CODE
        fld     m1              ; Push m1 into first item
        fld     st(2)           ; Push third item into first
        fst     m2              ; Copy first item to m2
        fxch    st(2)           ; Exchange first and third items
        fstp    m1              ; Pop first item into m1
Figure 6.7 Status of the Register Stack: Main Memory and Coprocessor
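The rule that constants cannot be immediate operands can be sketched as follows. The program is a hypothetical illustration (the names half, radius, and area are assumptions) that computes half the area of a circle. The value pi comes from the built-in FLDPI instruction, while 0.5 has to be declared as an initialized variable and used as a memory operand.

```asm
        .MODEL  small
        .STACK
        .DATA
half    REAL4   0.5             ; Constant must live in memory
radius  REAL4   3.0
area    REAL4   ?
        .CODE
        .STARTUP
        fldpi                   ; ST = pi (built-in constant)
        fld     radius          ; ST = r, ST(1) = pi
        fmul    st, st          ; ST = r * r
        fmul                    ; ST = pi * r * r (classical stack, pops)
        fmul    half            ; ST = ST * 0.5; an immediate 0.5
                                ;   would not be accepted
        fstp    area            ; Store result and pop
        fwait                   ; Let the store finish
        .EXIT
        END
```

Loading pi with FLDPI rather than from a REAL4 variable keeps the full 10-byte precision of the constant.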
Doing Arithmetic Calculations

Most of the coprocessor instructions for arithmetic operations have several forms, depending on the operand used. You do not need to specify the operand type in the instruction if both operands are stack registers, since register values are always 10-byte real numbers. In most of the arithmetic instructions listed here, the result replaces the destination register. The instructions include:
Instruction     Description
FADD            Adds the source and destination
FSUB            Subtracts the source from the destination
FSUBR           Subtracts the destination from the source
FMUL            Multiplies the source and the destination
FDIV            Divides the destination by the source
FDIVR           Divides the source by the destination
FABS            Sets the sign of ST to positive
FCHS            Reverses the sign of ST
FRNDINT         Rounds ST to an integer
FSQRT           Replaces the contents of ST with its square root
FSCALE          Multiplies the stack-top value by 2 to the power contained in ST(1)
FPREM           Calculates the remainder of ST divided by ST(1)
FXTRACT         Breaks a number down into its exponent and mantissa and pushes the mantissa onto the register stack
F2XM1           Calculates 2^x - 1
FYL2X           Calculates Y * log2 X
FYL2XP1         Calculates Y * log2 (X+1)
FPTAN           Calculates the tangent of the value in ST
FPATAN          Calculates the arctangent of the ratio Y/X
F[[N]]INIT      Resets the coprocessor and restores all the default conditions in the control and status words
F[[N]]CLEX      Clears all exception flags and the busy flag of the status word
FINCSTP         Adds 1 to the stack pointer in the status word
FDECSTP         Subtracts 1 from the stack pointer in the status word
FFREE           Marks the specified register as empty
80387 Only
FSIN            Calculates the sine of the value in ST
FCOS            Calculates the cosine of the value in ST
FSINCOS         Calculates the sine and cosine of the value in ST
FPREM1          Calculates the partial remainder by performing modulo division on the top two stack registers
The following example illustrates several arithmetic instructions. The code solves quadratic equations, but does no error checking and fails for some values because it attempts to find the square root of a negative number. Both Help and the MATH.ASM sample file show a complete version of this procedure. The complete form uses the FTST (Test for Zero) instruction to check for a negative number or 0 before calculating the square root.
        .DATA
a       REAL4   3.0
b       REAL4   7.0
cc      REAL4   2.0
posx    REAL4   0.0
negx    REAL4   0.0

        .CODE
        .
        .
        .
; Solve quadratic equation - no error checking
; The formula is: (-b +/- squareroot(b^2 - 4ac)) / (2a)
        fld1                    ; Get constants 2 and 4
        fadd    st,st           ; 2 at bottom
        fld     st              ; Copy it
        fmul    a               ; = 2a

        fmul    st(1),st        ; = 4a
        fxch                    ; Exchange
        fmul    cc              ; = 4ac

        fld     b               ; Load b
        fmul    st,st           ; = b^2
        fsubr                   ; = b^2 - 4ac
                                ; Negative value here produces error
        fsqrt                   ; = square root(b^2 - 4ac)
        fld     b               ; Load b
        fchs                    ; Make it negative
        fxch                    ; Exchange

        fld     st              ; Copy square root
        fadd    st,st(2)        ; Plus version = -b + root(b^2 - 4ac)
        fxch                    ; Exchange
        fsubp   st(2),st        ; Minus version = -b - root(b^2 - 4ac)

        fdiv    st,st(2)        ; Divide plus version
        fstp    posx            ; Store it
        fdivr                   ; Divide minus version
        fstp    negx            ; Store it
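The kind of check that FTST makes possible can be sketched in isolation. The following program is not the MATH.ASM version; it is a minimal illustration with hypothetical names (value, root, status) that tests a single number before taking its square root. The status-word transfer it relies on (FSTSW, FWAIT, SAHF) is described in the next section, "Controlling Program Flow."

```asm
        .MODEL  small
        .STACK
        .DATA
value   REAL4   -4.0            ; Number to test
root    REAL4   0.0             ; Left at 0.0 if value is negative
status  WORD    ?
        .CODE
        .STARTUP
        fld     value           ; ST = value
        ftst                    ; Compare ST with 0
        fstsw   status          ; Store status word in memory
        fwait                   ; Make sure coprocessor is done
        mov     ax, status
        sahf                    ; C0 goes to the carry flag
        jc      negative        ; C0 set: ST < 0 (or not comparable)
        fsqrt                   ; Safe: ST >= 0
        fstp    root            ; Store root and pop
        jmp     done
negative:
        fstp    st              ; Discard the value
done:
        fwait
        .EXIT
        END
```

In the quadratic routine the same test would go between FSUBR and FSQRT, with the error branch also cleaning up the other values left on the register stack.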
Controlling Program Flow

The math coprocessor has several instructions that set control flags in the status word. The 8087-family control flags can be used with conditional jumps to direct program flow in the same way that 8086-family flags are used. Since the coprocessor does not have jump instructions, you must transfer the status word to memory so that the flags can be used by 8086-family instructions. An easy way to use the status word with conditional jumps is to move its upper byte into the lower byte of the processor flags, as shown in this example:
        fstsw   mem16           ; Store status word in memory
        fwait                   ; Make sure coprocessor is done
        mov     ax, mem16       ; Move to AX
        sahf                    ; Store upper byte (AH) in flags
The SAHF (Store AH into Flags) instruction in this example transfers AH into the low bits of the flags register. You can save several steps by loading the status word directly to AX on the 80287 with the FSTSW and FNSTSW instructions. This is the only case in which data can be transferred directly between processor and coprocessor registers, as shown in this example:
fstsw ax
The coprocessor control flags and their relationship to the status word are described in "Control Registers," following.
The 8087-family coprocessors provide several instructions for comparing operands and testing control flags. All these instructions compare the stack top (ST) to a source operand, which may either be specified or implied as ST(1).
The compare instructions affect the C3, C2, and C0 control flags, but not the C1 flag. Table 6.3 shows the flag settings for each possible result of a comparison or test.
Table 6.3 Control-Flag Settings After Comparison or Test
After FCOM        After FTST                         C3   C2   C0
ST > source       ST is positive                     0    0    0
ST < source       ST is negative                     0    0    1
ST = source       ST is 0                            1    0    0
Not comparable    ST is NAN or projective infinity   1    1    1
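Because SAHF places C3 in the zero flag, C2 in the parity flag, and C0 in the carry flag, the rows of Table 6.3 line up with the unsigned conditional jumps of the main processor. The sketch below, with hypothetical variable names, stores the larger of two short reals. The parity test comes first because the "not comparable" result sets all three flags.

```asm
        .MODEL  small
        .STACK
        .DATA
x       REAL4   1.5
y       REAL4   2.5
larger  REAL4   ?
status  WORD    ?
        .CODE
        .STARTUP
        fld     x               ; ST = x
        fcom    y               ; Compare ST with y; sets C3, C2, C0
        fstsw   status          ; Store status word in memory
        fwait                   ; Make sure coprocessor is done
        mov     ax, status
        sahf                    ; C3 -> zero, C2 -> parity, C0 -> carry
        jp      nocomp          ; C2 set: not comparable
        jae     store           ; C0 clear: x >= y, keep x
        fstp    st              ; x < y: discard x
        fld     y               ;   and load y instead
store:  fstp    larger          ; Store the larger value and pop
        jmp     done
nocomp: fstp    st              ; Discard x; larger is left unset
done:   fwait
        .EXIT
        END
```

After SAHF, JA corresponds to ST > source, JB to ST < source, and JE to ST = source.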
Variations on the compare instructions allow you to pop the stack once or twice and to compare integers and zero. For each instruction, the stack top is always the implied destination operand. If you do not give an operand, ST(1) is the implied source. With some compare instructions, you can specify the source as a memory or register operand.
All instructions summarized in the following list have implied operands: either ST as a single-destination operand or ST as the destination and ST(1) as the source. Some instructions have a wait version and a no-wait version. The no-wait versions have N as the second letter. The instructions for comparing and testing flags include:
Instruction                   Description
FCOM                          Compares the stack top to the source. The source and destination are unaffected by the comparison.
FTST                          Compares ST to 0.
FCOMP                         Compares the stack top to the source and then pops the stack.
FUCOM, FUCOMP, FUCOMPP        Compares the source to ST and sets the condition codes of the status word according to the result (80386/486 only).
F[[N]]STSW mem2byte           Stores the status word in memory.
FXAM                          Sets the value of the control flags based on the type of the number in ST.
FPREM                         Finds a correct remainder for large operands. It uses the C2 flag to indicate whether the remainder returned is partial (C2 is set) or complete (C2 is clear). If the bit is set, the operation should be repeated. It also returns the least-significant three bits of the quotient in C0, C3, and C1.
FNOP                          Copies the stack top onto itself, thus padding the executable file and taking up processing time without having any effect on registers or memory.
FDISI, FNDISI, FENI, FNENI    Enables or disables interrupts (8087 only).
FSETPM                        Sets protected mode. Requires a .286P or .386P directive (80287, 80387, and 80486 only).
The following example illustrates some of these instructions. Notice how conditional blocks are used to enhance 80287 code.
        .DATA
down    REAL4   10.35           ; Sides of a rectangle
across  REAL4   13.07
diamtr  REAL4   12.93           ; Diameter of a circle
status  WORD    ?
P287    EQU     (@Cpu AND 00111y)
        .CODE
        .
        .
        .
; Get area of rectangle
        fld     across          ; Load one side
        fmul    down            ; Multiply by the other

; Get area of circle: Area = PI * (D/2)^2
        fld1                    ; Load one and
        fadd    st, st          ;   double it to get constant 2
        fdivr   diamtr          ; Divide diameter to get radius
        fmul    st, st          ; Square radius
        fldpi                   ; Load pi
        fmul                    ; Multiply it

; Compare area of circle and rectangle
        fcompp                  ; Compare and throw both away
        IF      p287
        fstsw   ax              ; (For 287+, skip memory)
        ELSE
        fnstsw  status          ; Load from coprocessor to memory
        mov     ax, status      ; Transfer memory to register
        ENDIF
        sahf                    ; Transfer AH to flags register
        jp      nocomp          ; If parity set, can't compare
        jz      same            ; If zero set, they're the same
        jc      rectangle       ; If carry set, rectangle is bigger
        jmp     circle          ;   else circle is bigger

nocomp:                         ; Error handler
        .
        .
        .
same:                           ; Both equal
        .
        .
        .
rectangle:                      ; Rectangle bigger
        .
        .
        .
circle:                         ; Circle bigger
Additional instructions for the 80387/486 are FLDENVD and FLDENVW for loading the environment; FNSTENVD, FNSTENVW, FSTENVD, and FSTENVW for storing the environment state; FNSAVED, FNSAVEW, FSAVED, and FSAVEW for saving the coprocessor state; and FRSTORD and FRSTORW for restoring the coprocessor state.
The size of the code segment, not the operand size, determines the number of bytes loaded or stored with these instructions. The instructions ending with W store the 16-bit form of the control register data, and the instructions ending with D store the 32-bit form. For example, in 16-bit mode FSAVEW saves the 16-bit control register data. If you need to store the 32-bit form of the control register data, use FSAVED.
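Saving and restoring the coprocessor state around a procedure, as suggested earlier in "Loading and Storing Data," might look like the following sketch. It uses the plain FSAVE and FRSTOR forms with the 94-byte save area listed for 16-bit code. The procedure name work and the variables fpstate and result are hypothetical.

```asm
        .MODEL  small
        .STACK
        .DATA
fpstate BYTE    94 DUP (?)      ; 94-byte save area (16-bit form)
result  REAL4   ?
        .CODE
        .STARTUP
        fldpi                   ; Caller keeps pi on the register stack
        call    work
        fstp    result          ; pi is at the stack top again
        fwait
        .EXIT

work    PROC
        fsave   fpstate         ; Save registers, control and status
                                ;   words; coprocessor is reinitialized
        fwait                   ; Make sure the save is complete
        fld1                    ; The register stack is now empty, so
        fadd    st, st          ;   all eight registers are available
        fstp    st              ; Discard the scratch value
        frstor  fpstate         ; Restore the caller's state
        ret
work    ENDP
        END
```

A procedure that only needs a few registers and leaves the control word alone can skip the save, since FSAVE and FRSTOR are comparatively slow.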
Control Registers
Some of the flags of the seven 16-bit control registers control coprocessor operations, while others maintain the current status of the coprocessor. In this sense, they are much like the 8086-family flags registers (see Figure 6.8).
Figure 6.8 Coprocessor Control Registers
The status word register is the only commonly used control register. (The others are used mostly by systems programmers.) The format of the status word register is shown in Figure 6.9, which shows how the coprocessor control flags align with the processor flags. C3 overwrites the zero flag, C2 overwrites the parity flag, and C0 overwrites the carry flag. C1 overwrites an undefined bit, so it cannot be used directly with conditional jumps, although you can use the TEST instruction to check C1 in memory or in a register. The status word register also overwrites the sign and auxiliary-carry flags, so you cannot count on their being unchanged after the operation.
Figure 6.9 Coprocessor and Processor Control Flags
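Checking C1 with the TEST instruction can be sketched as follows. The example relies on two facts from the Intel coprocessor documentation rather than from this chapter: C1 is bit 9 of the status word, and FXAM sets C1 to the sign of the value in ST. The variable names are hypothetical.

```asm
        .MODEL  small
        .STACK
        .DATA
value   REAL4   -1.0
status  WORD    ?
isneg   BYTE    0               ; Set to 1 if value is negative
        .CODE
        .STARTUP
        fld     value           ; ST = value
        fxam                    ; Examine ST; C1 = sign of ST
        fstsw   status          ; Store status word in memory
        fwait                   ; Make sure coprocessor is done
        test    status, 0200h   ; Isolate C1 (bit 9 of status word)
        jz      positive        ; C1 clear: sign is positive
        mov     isneg, 1        ; C1 set: sign is negative
positive:
        fstp    st              ; Discard the value
        fwait
        .EXIT
        END
```

If the status word has been moved to AX instead, the equivalent check is a TEST of AH against 02h.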
Using An Emulator Library

If you do not have a math coprocessor or an 80486 processor, you can do most floating-point operations by writing assembly-language procedures and accessing an emulator from a high-level language. All Microsoft high-level languages come with emulator libraries for all memory models.
To use emulator functions, first write your assembly-language procedure using coprocessor instructions. Then assemble the module with the /FPi option and link it with your high-level language modules. You can enter options in the Programmer's WorkBench (PWB) environment, or you can use the OPTION EMULATOR directive in your source code.
In emulation mode, the assembler generates instructions for the linker that the Microsoft emulator can use. The form of the OPTION directive in the following example tells the assembler to use emulation mode. This option (introduced in Chapter 1) can be defined only once in a module.
OPTION EMULATOR
You can use emulator functions in a stand-alone assembler program by assembling with the /Cx command-line option and linking with the appropriate emulator library. The following fragment outlines a small-model program that contains floating-point instructions served by an emulator:
        .MODEL  small, c
        OPTION  EMULATOR
        .
        .
        .
        PUBLIC  main
        .CODE
main:                           ; Program entry point must
        .STARTUP                ;   have name 'main'
        .
        fadd    st, st          ; Floating-point instructions
        fldpi                   ;   emulated
Emulator libraries do not allow for all of the coprocessor instructions. The following floating-point instructions are not emulated: FBLD FBSTP FCOS FDECSTP FINCSTP FINIT FLDENV FNOP FPREM1 FRSTOR FRSTORW FRSTORD FSAVE FSAVEW FSAVED FSETPM FSIN FSINCOS FSTENV FUCOM FUCOMP FUCOMPP FXTRACT
For information about writing assembly-language procedures for high-level languages, see Chapter 12, "Mixed-Language Programming."
Using Binary Coded Decimal Numbers

Binary coded decimal (BCD) numbers allow calculations on large numbers without rounding errors. This characteristic makes BCD numbers a common choice for monetary calculations. Although BCDs can represent integers of any precision, the 8087-based coprocessors accommodate BCD numbers only in the range ±999,999,999,999,999,999.
This section explains how to define BCD numbers, how to access them with a math coprocessor or emulator, and how to perform simple BCD calculations on the main processor.
Defining BCD Constants and Variables

Unpacked BCD numbers are made up of bytes containing a single decimal digit in the lower 4 bits of each byte. Packed BCD numbers are made up of bytes containing two decimal digits: one in the upper 4 bits and one in the lower 4 bits. The leftmost byte holds the sign in its high bit (0 for positive, 1 for negative).
Packed BCD numbers are encoded in the 8087 coprocessor's packed BCD format. They can be up to 18 digits long, packed two digits per byte. The assembler zero-pads BCDs initialized with fewer than 18 digits. Digit 20 holds the sign bit, and digit 19 is reserved.
When you define an integer constant with the TBYTE directive and the current radix is decimal (t), the assembler interprets the number as a packed BCD number. The syntax for specifying packed BCDs is the same as for other integers.
pos1    TBYTE   1234567890      ; Encoded as 00000000001234567890h
neg1    TBYTE   -1234567890     ; Encoded as 80000000001234567890h
Unpacked BCD numbers are stored one digit to a byte, with the value in the lower 4 bits. They can be defined using the BYTE directive. For example, an unpacked BCD number could be defined and initialized as follows:
unpackedr   BYTE    1,5,8,2,5,2,9   ; Initialized to 9,252,851
unpackedf   BYTE    9,2,5,2,8,5,1   ; Initialized to 9,252,851
As these two lines show, you can arrange digits backward or forward, depending on how you write the calculation routines that handle the numbers.
BCD Calculations on a Coprocessor

As the previous section explains, BCDs differ from other numbers only in the way a program stores them in memory. Internally, a math coprocessor does not distinguish BCD integers from any other type. The coprocessor can load, calculate, and store packed BCD integers up to 18 digits long. The coprocessor instruction
fbld bcd1
pushes the packed BCD number at bcd1 onto the coprocessor stack. When your code completes calculations on the number, place the result back into memory in BCD format with the instruction
fbstp bcd1
which also pops the value from the top of the coprocessor stack.
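The load, calculate, and store sequence described above might look like the following sketch. The variable names price, tax, and total are hypothetical, and the fragment assumes a real coprocessor, because FBLD and FBSTP appear in the list of instructions that the emulator does not support.

```asm
        .DATA
price   TBYTE   1999            ; Hypothetical packed BCD operands
tax     TBYTE   160
total   TBYTE   ?               ; Receives the packed BCD result
        .CODE
        .
        .
        .
        fbld    price           ; ST = 1999
        fbld    tax             ; ST = 160, ST(1) = 1999
        faddp   st(1), st       ; Add and pop: ST = 2159
        fbstp   total           ; Store 2159 as packed BCD and pop
```

Between FBLD and FBSTP the values are ordinary coprocessor numbers, so any coprocessor arithmetic instruction can operate on them.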
BCD Calculations on the Main Processor

The 8086-family processors can perform simple arithmetic operations on BCD integers, but only one digit at a time. The main processor, like the coprocessor, operates internally on the number's binary value. It requires additional code to translate the binary result back into BCD format.
The main processor provides instructions specifically designed to translate to and from BCD format. These instructions are called ASCII-adjust and decimal-adjust instructions. They get their names from Intel mnemonics that use the term ASCII to refer to unpacked BCD numbers and decimal to refer to packed BCD numbers.
Unpacked BCD Numbers

When a calculation using two one-digit values produces a two-digit result, the instructions AAA, AAS, and AAM leave the low-order digit in AL and the high-order digit in AH. If the digit in AL needs to carry to or borrow from the digit in AH, the instructions set the carry and auxiliary carry flags. The four ASCII-adjust instructions for unpacked BCDs are:
Instruction   Description
AAA           Adjusts after an addition operation.
AAS           Adjusts after a subtraction operation.
AAM           Adjusts after a multiplication operation. Always use with MUL, not with IMUL.
AAD           Adjusts before a division operation. Unlike other BCD instructions, AAD
              converts a BCD value to a binary value before the operation. After the
              operation, use AAM to adjust the quotient. The remainder is lost. If you
              need the remainder, save it in another register before adjusting the
              quotient. Then move it back to AL and adjust if necessary.
For processor arithmetic on unpacked BCD numbers, you must do the 8-bit arithmetic calculations on each digit separately, and assign the result to the AL register. After each operation, use the corresponding BCD instruction to adjust the result. The ASCII-adjust instructions do not take an operand and always work on the value in the AL register.
The following examples show how to use each of these instructions in BCD addition, subtraction, multiplication, and division.
; To add 9 and 3 as BCDs:
        mov     ax, 9           ; Load 9
        mov     bx, 3           ;   and 3 as unpacked BCDs
        add     al, bl          ; Add 09h and 03h to get 0Ch
        aaa                     ; Adjust 0Ch in AL to 02h,
                                ;   increment AH to 01h, set carry
                                ; Result 12 (unpacked BCD in AX)

; To subtract 4 from 13:
        mov     ax, 103h        ; Load 13
        mov     bx, 4           ;   and 4 as unpacked BCDs
        sub     al, bl          ; Subtract 4 from 3 to get FFh (-1)
        aas                     ; Adjust 0FFh in AL to 9,
                                ;   decrement AH to 0, set carry
                                ; Result 9 (unpacked BCD in AX)

; To multiply 9 times 3:
        mov     ax, 903h        ; Load 9 and 3 as unpacked BCDs
        mul     ah              ; Multiply 9 and 3 to get 1Bh
        aam                     ; Adjust 1Bh in AL
                                ;   to get 27 (unpacked BCD in AX)

; To divide 25 by 2:
        mov     ax, 205h        ; Load 25
        mov     bl, 2           ;   and 2 as unpacked BCDs
        aad                     ; Adjust 0205h in AX
                                ;   to get 19h in AX
        div     bl              ; Divide by 2 to get
                                ;   quotient 0Ch in AL
                                ;   remainder 1 in AH
        aam                     ; Adjust 0Ch in AL
                                ;   to 12 (unpacked BCD in AX)
                                ;   (remainder destroyed)
If you process multidigit BCD numbers in loops, each digit is processed and adjusted in turn.
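A minimal sketch of such a loop follows. It adds two four-digit unpacked BCD numbers stored with the least significant digit first. The names num1, num2, and sum and the operand values are hypothetical, chosen only for illustration.

```asm
        .DATA
num1    BYTE    9, 2, 5, 2      ; 2,529, least significant digit first
num2    BYTE    4, 8, 7, 1      ; 1,784, least significant digit first
sum     BYTE    5 DUP (0)       ; Four digits plus a possible carry digit
        .CODE
        .
        .
        .
        mov     cx, 4           ; Number of digits to add
        sub     bx, bx          ; Zero the index (also clears carry)
next:   mov     al, num1[bx]    ; Get a digit
        adc     al, num2[bx]    ; Add digit plus carry from previous digit
        aaa                     ; Adjust: digit in AL, decimal carry in CF
        mov     sum[bx], al     ; Store the result digit
        inc     bx              ; INC does not change the carry flag
        loop    next            ; LOOP does not change any flags
        mov     al, 0
        adc     al, 0           ; Pick up the final carry
        mov     sum[bx], al     ; sum = 3, 1, 3, 4, 0 (4,313)
```

The loop depends on MOV, INC, and LOOP leaving the carry flag alone, so the decimal carry that AAA produces survives until the next ADC. AAA also increments AH on a carry, which this sketch ignores.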
Packed BCD Numbers

Packed BCD numbers are made up of bytes containing two decimal digits: one in the upper 4 bits and one in the lower 4 bits. The 8086-family processors provide instructions for adjusting packed BCD numbers after addition and subtraction. You must write your own routines to adjust for multiplication and division.
For processor calculations on packed BCD numbers, you must do the 8-bit arithmetic calculations on each byte separately, placing the result in the AL register. After each operation, use the corresponding decimal-adjust instruction to adjust the result. The decimal-adjust instructions do not take an operand and always work on the value in the AL register. The 8086-family processors provide the instructions DAA (Decimal Adjust after Addition) and DAS (Decimal Adjust after Subtraction) for adjusting packed BCD numbers after addition and subtraction. These examples use DAA and DAS to add and subtract BCDs.
; To add 88 and 33:
        mov     ax, 8833h       ; Load 88 and 33 as packed BCDs
        add     al, ah          ; Add 88 and 33 to get 0BBh
        daa                     ; Adjust 0BBh to 121 (packed BCD:)
                                ;   1 in carry and 21 in AL

; To subtract 38 from 83:
        mov     ax, 3883h       ; Load 83 and 38 as packed BCDs
        sub     al, ah          ; Subtract 38 from 83 to get 04Bh
        das                     ; Adjust 04Bh to 45 (packed BCD:)
                                ;   0 in carry and 45 in AL
Unlike the ASCII-adjust instructions, the decimal-adjust instructions never affect AH. The processor sets the auxiliary carry flag if the digit in the lower 4 bits carries to or borrows from the digit in the upper 4 bits, and it sets the carry flag if the digit in the upper 4 bits needs to carry to or borrow from another byte.
Multidigit BCD numbers are usually processed in loops. Each byte is processed and adjusted in turn.
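One way to write such a loop is sketched below for two four-digit packed BCD numbers stored with the low-order byte first. The names pk1, pk2, and pksum and the operand values are hypothetical.

```asm
        .DATA
pk1     BYTE    29h, 25h        ; 2,529 packed, low-order byte first
pk2     BYTE    84h, 17h        ; 1,784 packed, low-order byte first
pksum   BYTE    3 DUP (0)       ; Two bytes plus a possible carry byte
        .CODE
        .
        .
        .
        mov     cx, 2           ; Number of bytes to add
        sub     bx, bx          ; Zero the index (also clears carry)
next:   mov     al, pk1[bx]     ; Get two digits
        adc     al, pk2[bx]     ; Add them plus carry from previous byte
        daa                     ; Adjust: two digits in AL, carry in CF
        mov     pksum[bx], al   ; Store the result byte
        inc     bx              ; INC does not change the carry flag
        loop    next            ; LOOP does not change any flags
        mov     al, 0
        adc     al, 0           ; Pick up the final carry
        mov     pksum[bx], al   ; pksum = 13h, 43h, 00h (4,313)
```

The carry flag set by DAA feeds the ADC for the next byte, which is why the instructions between them must not disturb it.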
CHAPTER 7
Controlling Program Flow
Very few programs execute all lines sequentially from .STARTUP to .EXIT. Rather, complex program logic and efficiency dictate that you control the flow of your program by jumping from one point to another, repeating an action until a condition is reached, and passing control to and from procedures. This chapter describes various ways for controlling program flow and several features that simplify coding program-control constructs.
The first section covers jumps from one point in the program to another. It explains how MASM 6.1 optimizes both unconditional and conditional jumps under certain circumstances, so that you do not have to specify every attribute. The section also describes instructions you can use to test conditional jumps.
The next section describes loop structures that repeat actions or evaluate conditions. It discusses MASM directives, such as .WHILE and .REPEAT, that generate appropriate compare, loop, and jump instructions for you, and the .IF, .ELSE, and .ELSEIF directives that generate jump instructions.
The Procedures section in this chapter explains how to write an assembly-language procedure. It covers the extended functionality for PROC, a PROTO directive that lets you write procedure prototypes similar to those used in C, an INVOKE directive that automates parameter passing, and options for the stack-frame setup inside procedures.
The last section explains how to pass program control to an interrupt routine.
Jumps
Jumps are the most direct way to change program control from one location to another. At the processor level, jumps work by changing the value of the IP (Instruction Pointer) register to a target offset and, for far jumps, by changing the CS register to a new segment address. Jump instructions fall into only two categories: conditional and unconditional.
Unconditional Jumps
The JMP instruction transfers control unconditionally to another instruction. The single operand of JMP contains the address of the target instruction. Unconditional jumps skip over code that should not be executed, as shown here:
; Handle one case
label1: .
        .
        .
        jmp     continue

; Handle second case
label2: .
        .
        .
        jmp     continue
        .
        .
        .
continue:
The distance of the target from the jump instruction and the size of the operand determine the assembler's encoding of the instruction. The longer the distance, the more bytes the assembler uses to code the instruction. In versions of MASM prior to 6.0, unconditional NEAR jumps sometimes generated inefficient code, but MASM can now optimize unconditional jumps.
Jump Optimizing
The assembler determines the smallest encoding possible for the direct unconditional jump. MASM does not require a distance operator, so you do not have to determine the correct distance of the jump. If you specify a distance, it overrides any assembler optimization. If the specified distance falls short of the target address, the assembler generates an error. If the specified distance is longer than the jump requires, the assembler encodes the given distance and does not optimize it. The assembler optimizes jumps when the following conditions are met:
- You do not specify SHORT, NEAR, FAR, NEAR16, NEAR32, FAR16, FAR32, or PROC as the distance of the target.
- The target of the jump is not external and is in the same segment as the jump instruction. If the target is in a different segment (but in the same group), it is treated as though it were external.
If these two conditions are met, MASM uses the instruction, distance, and size of the operand to determine how to optimize the encoding for the jump. No syntax changes are necessary.
Note: This information about jump optimizing also applies to conditional jumps on the 80386/486.
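As a hedged illustration of these rules, the following fragment contrasts an optimized jump with one whose distance is given explicitly. The label near_by is hypothetical, and the byte counts in the comments assume a 16-bit segment.

```asm
        jmp     near_by             ; No distance operator: the assembler
                                    ;   chooses the smallest encoding, a
                                    ;   2-byte short jump if near_by is
                                    ;   within short range
        jmp     NEAR PTR near_by    ; Distance specified: no optimization,
                                    ;   so a 3-byte near jump is encoded
        .
        .
        .
near_by:
```

Leaving the distance operator off costs nothing when the target is far away, because the assembler falls back to the longer encoding on its own.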
Indirect Operands
An indirect operand provides a pointer to the target address, rather than the address itself. A pointer is a variable that contains an address. The processor distinguishes indirect (pointer) operands from direct (address) operands by the instruction's context. You can specify the pointer's size with the WORD, DWORD, or FWORD attributes. Default sizes are based on .MODEL and the default segment size.
        jmp     [bx]            ; Uses .MODEL and segment size defaults
        jmp     WORD PTR [bx]   ; A NEAR16 indirect jump
If the indirect operand is a register, the jump is always a NEAR16 jump for a 16-bit register, and NEAR32 for a 32-bit register:
        jmp     bx              ; NEAR16 jump
        jmp     ebx             ; NEAR32 jump
A DWORD indirect operand, however, is ambiguous to the assembler.

        jmp     DWORD PTR [var] ; A NEAR32 jump in a 32-bit segment;
                                ;   a FAR16 jump in a 16-bit segment
In this case, your code must clear the ambiguity with the NEAR32 or FAR16 keywords. The following example shows how to use TYPEDEF to define NEAR32 and FAR16 pointer types.
NFP     TYPEDEF PTR NEAR32
FFP     TYPEDEF PTR FAR16
        jmp     NFP PTR [var]   ; NEAR32 indirect jump
        jmp     FFP PTR [var]   ; FAR16 indirect jump
You can use an unconditional jump as a form of conditional jump by specifying the address in a register or indirect memory operand. Also, you can use indirect memory operands to construct jump tables that work like C switch statements, Pascal CASE statements, or Basic ON GOTO, ON GOSUB, or SELECT CASE statements, as shown in the following example.
NPVOID  TYPEDEF NEAR PTR
        .DATA
ctl_tbl NPVOID  extended,       ; Null key (extended code)
                ctrla,          ; Address of CONTROL-A key routine
                ctrlb           ; Address of CONTROL-B key routine
        .CODE
        .
        .
        .
        mov     ah, 8h          ; Get a key
        int     21h
        cbw                     ; Stretch AL into AX
        mov     bx, ax          ; Copy
        shl     bx, 1           ; Convert to address
        jmp     ctl_tbl[bx]     ; Jump to key routine

extended:
        mov     ah, 8h          ; Get second key of extended key
        int     21h
        .                       ; Use another jump table
        .                       ;   for extended keys
        .
        jmp     next
ctrla:  .                       ; CONTROL-A code here
        .
        .
        jmp     next
ctrlb:  .                       ; CONTROL-B code here
        .
        .
        jmp     next
        .
        .
next:   .                       ; Continue
In this instance, the indirect memory operands point to addresses of routines for handling different keystrokes.
Conditional Jumps
The most common way to transfer control in assembly language is to use a conditional jump. This is a two-step process:
1. First test the condition.
2. Then jump if the condition is true or continue if it is false.
All conditional jumps except two (JCXZ and JECXZ) use the processor flags for their criteria. Thus, any statement that sets or clears a flag can serve as a test basis for a conditional jump. The jump statement can be any one of 30 conditional-jump instructions. A conditional-jump instruction takes a single operand containing the target address. You cannot use a pointer value as a target as you can with unconditional jumps.
Jumping Based on the CX Register

JCXZ and JECXZ are special conditional jumps that do not consult the processor flags. Instead, as their names imply, these instructions cause a jump only if the CX or ECX register is zero. The use of JCXZ and JECXZ with program loops is covered in the next section, Loops.
Jumping Based on the Processor Flags

The remaining conditional jumps in the processor's repertoire all depend on the status of the flags register. As the following list shows, several conditional jumps have two or three names: JE (Jump if Equal) and JZ (Jump if Zero), for example. Shared names assemble to exactly the same machine instruction, so you may choose whichever mnemonic seems more appropriate. Jumps that depend on the status of the flags register include:
Instruction     Jumps if
JC/JB/JNAE      Carry flag is set
JNC/JNB/JAE     Carry flag is clear
JBE/JNA         Either carry or zero flag is set
JA/JNBE         Carry and zero flag are clear
JE/JZ           Zero flag is set
JNE/JNZ         Zero flag is clear
JL/JNGE         Sign flag ≠ overflow flag
JGE/JNL         Sign flag = overflow flag
JLE/JNG         Zero flag is set or sign ≠ overflow
JG/JNLE         Zero flag is clear and sign = overflow
JS              Sign flag is set
JNS             Sign flag is clear
JO              Overflow flag is set
JNO             Overflow flag is clear
JP/JPE          Parity flag is set (even parity)
JNP/JPO         Parity flag is clear (odd parity)
The last two jumps in the list, JPE (Jump if Parity Even) and JPO (Jump if Parity Odd), are useful only for communications programs. The processor sets the parity flag if an operation produces a result with an even number of set bits. A communications program can compare the flag against the parity bit received through the serial port to test for transmission errors.
The conditional jumps in the preceding list can follow any instruction that changes the processor flags, as these examples show:
; Uses JO to handle overflow condition
        add     ax, bx          ; Add two values
        jo      overflow        ; If value too large, adjust

; Uses JNZ to check for zero as the result of subtraction
        sub     ax, bx          ; Subtract
        mov     cx, Count       ; First, initialize CX
        jnz     skip            ; If the result is not zero, continue
        call    zhandler        ; Else do special case
As the second example shows, the jump does not have to immediately follow the instruction that alters the flags. Since MOV does not change the flags, it can appear between the SUB instruction and the dependent jump.
There are three categories of conditional jumps:
- Comparison of two values
- Individual bit settings in a value
- Whether a value is zero or nonzero
Jumps Based on Comparison of Two Values

The CMP instruction is the most common way to test for conditional jumps. It compares two values without changing either, then sets or clears the processor flags according to the results of the comparison. Internally, the CMP instruction is the same as the SUB instruction, except that CMP does not change the destination operand. Both set flags according to the result of the subtraction.
You can compare signed or unsigned values, but you must choose the subsequent conditional jump to reflect the correct value type. For example, JL (Jump if Less Than) and JB (Jump if Below) may seem conceptually similar, but a failure to understand the difference between them can result in program bugs. Table 7.1 shows the correct conditional jumps for comparisons of signed and unsigned values. The table shows the zero, carry, sign, and overflow flags as ZF, CF, SF, and OF, respectively.
Table 7.1  Conditional Jumps Based on Comparisons of Two Values

Signed Comparisons                      Unsigned Comparisons
Instruction   Jump if True              Instruction   Jump if True
JE            ZF = 1                    JE            ZF = 1
JNE           ZF = 0                    JNE           ZF = 0
JG/JNLE       ZF = 0 and SF = OF        JA/JNBE       CF = 0 and ZF = 0
JLE/JNG       ZF = 1 or SF ≠ OF         JBE/JNA       CF = 1 or ZF = 1
JL/JNGE       SF ≠ OF                   JB/JNAE       CF = 1
JGE/JNL       SF = OF                   JAE/JNB       CF = 0
The mnemonic names of jumps always refer to the comparison of CMP's first operand (destination) with the second operand (source). For instance, in this example, JG tests whether the first operand is greater than the second.
        cmp     ax, bx          ; Compare AX and BX
        jg      next1           ; Equivalent to: If ( AX > BX ) goto next1
        jl      next2           ; Equivalent to: If ( AX < BX ) goto next2
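The difference between the signed and unsigned jumps in Table 7.1 shows up when the same bit pattern is compared both ways. In this sketch the labels below and less are hypothetical, and the flag values in the comments follow from subtracting 0001h from 0FFFFh.

```asm
        mov     ax, 0FFFFh      ; 65,535 if unsigned, -1 if signed
        mov     bx, 1

; Unsigned interpretation
        cmp     ax, bx          ; CF = 0, ZF = 0, SF = 1, OF = 0
        jb      below           ; Not taken: CF = 0, so 65,535 is not below 1

; Signed interpretation
        cmp     ax, bx          ; Same flags as before
        jl      less            ; Taken: SF ≠ OF, so -1 is less than 1
```

CMP sets the flags the same way in both cases. Only the choice of jump decides whether the operands are treated as signed or unsigned.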
Jumps Based on Bit Settings

The individual bit settings in a single value can also serve as the criteria for a conditional jump. The TEST instruction tests whether specific bits in an operand are on or off (set or clear), and sets the zero flag accordingly.
The TEST instruction is the same as the AND instruction, except that TEST changes neither operand. The following example shows an application of TEST.
        .DATA
bits    BYTE    ?
        .CODE
        .
        .
        .
; If bit 2 or bit 4 is set, then call task_a
                                ; Assume "bits" is 0D3h        11010011
        test    bits, 10100y    ; If 2 or 4 is set         AND 00010100
        jz      skip1           ;                              --------
        call    task_a          ; Then call task_a             00010000
skip1:                          ; Jump not taken (result is nonzero)
        .
        .
        .
; If bits 2 and 4 are clear, then call task_b
                                ; Assume "bits" is 0E9h        11101001
        test    bits, 10100y    ; If 2 and 4 are clear     AND 00010100
        jnz     skip2           ;                              --------
        call    task_b          ; Then call task_b             00000000
skip2:                          ; Jump not taken (result is zero)
The source operand for TEST is often a mask in which the test bits are the only bits set. The destination operand contains the value to be tested. If all the bits set in the mask are clear in the destination operand, TEST sets the zero flag. If any of the bits set in the mask are also set in the destination operand, TEST clears the zero flag.
The 80386/486 processors provide additional bit-testing instructions. The BT (Bit Test) series of instructions copy a specified bit from the destination operand to the carry flag. A JC or JNC can then route program flow depending on the result. For variations on the BT instruction, see the Reference.
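A minimal sketch of the BT approach follows. The label bit_clear is hypothetical, and the fragment assumes that 80386 instructions have been enabled with the .386 directive.

```asm
        .386                    ; BT requires an 80386 or later
        .
        .
        .
        bt      ax, 4           ; Copy bit 4 of AX to the carry flag
        jnc     bit_clear       ; Carry clear: bit 4 was 0
        .                       ; Bit 4 was 1: handle that case here
        .
        .
bit_clear:
```

Unlike TEST, which takes a mask and can check several bits at once, BT takes a bit number and checks one bit.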
Jumps Based on a Value of Zero

A program often needs to jump based on whether a particular register contains a value of zero. We've seen how the JCXZ instruction jumps depending on the value in the CX register. You can test for zero in other data registers nearly as efficiently with the OR instruction. A program can OR a register with itself without changing the register's contents, then act on the resulting flags. For example, the following code tests whether BX is zero:
        or      bx, bx          ; Is BX = 0?
        jz      is_zero         ; Jump if so
This code is functionally equivalent to:

        cmp     bx, 0           ; Is BX = 0?
        je      is_zero         ; Jump if so
but produces smaller and faster code, since it does not use an immediate number as an operand. The same technique also lets you test a register's sign bit:
        or      dx, dx          ; Is DX sign bit set?
        js      sign_set        ; Jump if so
Jump Extending
Unlike an unconditional jump, a conditional jump cannot reference a label more than 128 bytes away. For example, the following statement is valid as long as target is within a distance of 128 bytes:
; Jump to target less than 128 bytes away
        jz      target          ; If previous operation resulted
                                ;   in zero, jump to target
However, if target is too distant, the following sequence is necessary to enable a longer jump. Note this sequence is logically equivalent to the preceding example:
; Jumps to distant targets previously required two steps
        jnz     skip            ; If previous operation result is
                                ;   NOT zero, jump to "skip"
        jmp     target          ; Otherwise, jump to target
skip:
MASM can automate jump-extending for you. If you target a conditional jump to a label farther than 128 bytes away, MASM rewrites the instruction with an unconditional jump, which ensures that the jump can reach its target. If target lies within a 128-byte range, the assembler encodes the instruction jz target as is. Otherwise, MASM generates two substitute instructions:
        jne     $ + 2 + (length in bytes of the next instruction)
        jmp     NEAR PTR target
The assembler generates this same code sequence if you specify the distance with NEAR PTR, FAR PTR, or SHORT. Therefore,
jz NEAR PTR target
becomes
        jne     $ + 5
        jmp     NEAR PTR target
even if target is less than 128 bytes away.
MASM enables automatic jump expansion by default, but you can turn it off with the NOLJMP form of the OPTION directive. For information about the OPTION directive, see page 24.
If the assembler generates code to extend a conditional jump, it issues a level 3 warning saying that the conditional jump has been lengthened. You can set the warning level to 1 for development and to level 3 for a final optimizing pass to see if you can shorten jumps by reorganizing.
If you specify the distance for the jump and the target is out of range for that distance, a "Jump out of Range" error results.
Since the JCXZ and JECXZ instructions do not have logical negations, expansion of the jump instruction to handle targets with unspecified distances cannot be performed for those instructions. Therefore, the distance must always be short.
The size and distance of the target operand determine the encoding for conditional or unconditional jumps to externals or targets in different segments. The jump-extending and optimization features do not apply in this case.
Note: Conditional jumps on the 80386 and 80486 processors can be to targets up to 32K away, so jump extension occurs only for targets greater than that distance.
Anonymous Labels
When you code jumps in assembly language, you must invent many label names. One alternative to continually thinking up new label names is to use anonymous labels, which you can use anywhere in your program. But because anonymous labels do not provide meaningful names, they are best used for jumping over only a few lines of code. You should mark major divisions of a program with actual named labels.
Use two at signs (@@) followed by a colon (:) as an anonymous label. To jump to the nearest preceding anonymous label, use @B (back) in the jump instruction's operand field; to jump to the nearest following anonymous label, use @F (forward) in the operand field.
The jump in the following example targets an anonymous label:

        jge     @F
        .
        .
        .
@@:
The items @B and @F always refer to the nearest occurrences of @@:, so there is never any conflict between different anonymous labels.
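The following sketch, which is not part of the original example, uses both forms with two anonymous labels in a row. It takes the absolute value of AX and then runs a short countdown. The register contents are hypothetical.

```asm
        or      ax, ax          ; Set flags from AX
        jns     @F              ; Nonnegative: skip to the next @@:
        neg     ax              ; Negative: make it positive
@@:     mov     cx, 3           ; First anonymous label (target of @F)
@@:     dec     cx              ; Second anonymous label
        jnz     @B              ; Back to the nearest preceding @@:,
                                ;   which is the second one
```

Each @F or @B is resolved against the closest @@: in the given direction, so the two labels do not interfere with each other.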
Decision Directives
The high-level structures you can use for decision-making are the .IF, .ELSEIF, and .ELSE statements. These directives generate conditional jumps. The expression following the .IF directive is evaluated, and if true, the following instructions are executed until the next .ENDIF, .ELSE, or .ELSEIF directive is reached. The .ELSE statements execute if the expression is false. Using the .ELSEIF directive puts a new expression inside the alternative part of the original .IF statement to be evaluated. The syntax is:
        .IF condition1
            statements
        [[.ELSEIF condition2
            statements]]
        [[.ELSE
            statements]]
        .ENDIF
The decision structure
        .IF     cx == 20
        mov     dx, 20
        .ELSE
        mov     dx, 30
        .ENDIF
generates this code:

                        .IF     cx == 20
0017  83 F9 14     *    cmp     cx, 014h
001A  75 05        *    jne     @C0001
001C  BA 0014           mov     dx, 20
                        .ELSE
001F  EB 03        *    jmp     @C0003
0021               *@C0001:
0021  BA 001E           mov     dx, 30
                        .ENDIF
0024               *@C0003:
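The .ELSEIF form chains several tests in the same way. As a hypothetical illustration that is not taken from the original text, this sketch converts an ASCII hexadecimal digit in AL to its binary value and uses 0FFh as an arbitrary error marker.

```asm
        .IF     (al >= '0') && (al <= '9')
        sub     al, '0'         ; '0'..'9' become 0..9
        .ELSEIF (al >= 'A') && (al <= 'F')
        sub     al, 'A' - 10    ; 'A'..'F' become 10..15
        .ELSE
        mov     al, 0FFh        ; Not a hex digit (arbitrary marker)
        .ENDIF
```

Only the first block whose condition is true executes. The .ELSE block runs when none of the conditions hold.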
Loops
Loops repeat an action until a termination condition is reached. This condition can be a counter or the result of an expression's evaluation. MASM 6.1 offers many ways to set up loops in your programs. The following list compares MASM loop structures:
Instructions / Action

LOOP
        Automatically decrements CX. When CX = 0, the loop ends. The top of the loop cannot be greater than 128 bytes from the LOOP instruction. (This is true for all LOOP instructions.)
LOOPE/LOOPZ, LOOPNE/LOOPNZ
        Loops while equal or not equal. Checks both CX and the state of the zero flag. LOOPZ ends when either CX=0 or the zero flag is clear, whichever occurs first. LOOPNZ ends when either CX=0 or the zero flag is set, whichever occurs first. LOOPE and LOOPZ assemble to the same machine instruction, as do LOOPNE and LOOPNZ. Use whichever mnemonic best fits the context of your loop. Set CX to a number out of range if you don't want a count to control the loop.
JCXZ, JECXZ
        Branches to a label only if CX = 0 or ECX = 0. Unlike other conditional-jump instructions, which can jump to either a near or a short label under the 80386 or 80486, JCXZ and JECXZ always jump to a short label.
Conditional jumps
        Acts only if certain conditions met. Necessary if several conditions must be tested. See "Conditional Jumps," page 164.
The following examples illustrate these loop constructions.

; The LOOP instruction: For 200 to 0 do task
        mov     cx, 200         ; Set counter
next:   .                       ; Do the task here
        .
        .
        loop    next            ; Do again
                                ; Continue after loop

; The LOOPNE instruction: While AX is not 'Y', do task
        mov     cx, 256         ; Set count too high to interfere
wend:   .                       ; But don't do more than 256 times
        .                       ; Some statements that change AX
        .
        cmp     al, 'Y'         ; Is it Y or too many times?
        loopne  wend            ; No? Repeat
                                ; Yes? Continue
The JCXZ and JECXZ instructions provide an efficient way to avoid executing loops when the loop counter CX is zero. For example, consider the following loop:

        mov     cx, LoopCount   ; Load loop counter
next:   .                       ; Iterate loop CX times
        .
        .
        loop    next            ; Do again
If LoopCount is zero, CX decrements to -1 on the first pass. It then must decrement 65,535 more times before reaching 0. Use a JCXZ to avoid this problem:
        mov     cx, LoopCount   ; Load loop counter
        jcxz    done            ; Skip loop if count is 0
next:   .                       ; Else iterate loop CX times
        .
        .
        loop    next            ; Do again
done:                           ; Continue after loop
Loop-Generating Directives
The high-level control structures generate loop structures for you. These directives are similar to the while and repeat loops of C or Pascal, and can make your assembly programs easier to code and to read. The assembler generates the appropriate assembly code. These directives are summarized as follows:
Directives / Action

.WHILE ... .ENDW
        The statements between .WHILE condition and .ENDW execute while the condition is true.
.REPEAT ... .UNTIL
        The loop executes at least once and continues until the condition given after .UNTIL is true. Generates conditional jumps.
.REPEAT ... .UNTILCXZ
        Compares label to an expression and generates appropriate loop instructions.
.BREAK
        End a .REPEAT or a .WHILE loop unconditionally.
.CONTINUE
        Jump unconditionally past any remaining code to bottom of loop.
These constructs work much as they do in a high-level language such as C or Pascal. Keep in mind the following points:
- These directives generate appropriate processor instructions. They are not new instructions.
- They require proper use of signed and unsigned data declarations.
These directives cause a set of instructions to execute based on the evaluation of some condition. This condition can be an expression that evaluates to a signed or unsigned value, an expression using the logical operators of C (&&, ||, or !), or the state of a flag. For more information about expression operators, see page 178.
The evaluation of the condition requires the assembler to know if the operands in the condition are signed or unsigned. To state explicitly that a named memory location contains a signed integer, use the signed data allocation directives SBYTE, SWORD, and SDWORD.
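The effect of a signed declaration can be sketched as follows. The variable scount is hypothetical. Because it is declared with SWORD, the assembler is expected to generate a signed conditional jump for the comparison.

```asm
        .DATA
scount  SWORD   -5              ; Hypothetical signed variable
        .CODE
        .
        .
        .
        .WHILE  scount < 0      ; Signed comparison
        inc     scount          ; Runs five times: -5, -4, -3, -2, -1
        .ENDW
```

If scount were declared with WORD instead, the same condition would be evaluated as an unsigned comparison, and an unsigned value is never less than zero.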
.WHILE Loops
As with while loops in C or Pascal, the test condition for .WHILE is checked before the statements inside the loop execute. If the test condition is false, the loop does not execute. While the condition is true, the statements inside the loop repeat.
Use the .ENDW directive to mark the end of the .WHILE loop. When the condition becomes false, program execution begins at the first statement following the .ENDW directive. The .WHILE directive generates appropriate compare and jump statements. The syntax is:
        .WHILE condition
            statements
        .ENDW
For example, this loop copies the contents of one buffer to another until a '$' character (marking the end of the string) is found:
        .DATA
buf1    BYTE    "This is a string",'$'
buf2    BYTE    100 DUP (?)
        .CODE
        sub     bx, bx                  ; Zero out bx
        .WHILE  (buf1[bx] != '$')
        mov     al, buf1[bx]            ; Get a character
        mov     buf2[bx], al            ; Move it to buffer 2
        inc     bx                      ; Count forward
        .ENDW
.REPEAT Loops
MASM's .REPEAT directive allows for loop constructions like the do loop of C and the REPEAT loop of Pascal. The loop executes until the condition following the .UNTIL (or .UNTILCXZ) directive becomes true. Since the condition is checked at the end of the loop, the loop always executes at least once. The .REPEAT directive generates conditional jumps. The syntax is:
        .REPEAT
            statements
        .UNTIL condition

        .REPEAT
            statements
        .UNTILCXZ [[condition]]
where condition can also be expr1 == expr2 or expr1 != expr2. When two expressions are compared, expr2 can be an immediate expression, a register, or (if expr1 is a register) a memory location.
For example, the following code fills a buffer with characters typed at the keyboard. The loop ends when the ENTER key (character 13) is pressed:
        .DATA
buffer  BYTE    100 DUP (0)
        .CODE
        sub     bx, bx              ; Zero out bx
        .REPEAT
        mov     ah, 01h
        int     21h                 ; Get a key
        mov     buffer[bx], al      ; Put it in the buffer
        inc     bx                  ; Increment the count
        .UNTIL  (al == 13)          ; Continue until al is 13
The .UNTIL directive generates conditional jumps, but the .UNTILCXZ directive generates a LOOP instruction, as shown by the listing file code for these examples. In a listing file, assembler-generated code is preceded by an asterisk.
        ASSUME  bx:PTR SomeStruct

        .REPEAT
*@C0001:
        inc     ax
        .UNTIL  ax==6
*       cmp     ax, 006h
*       jne     @C0001

        .REPEAT
*@C0003:
        mov     ax, 1
        .UNTILCXZ
*       loop    @C0003

        .REPEAT
*@C0004:
        .UNTILCXZ [bx].field != 6
*       cmp     [bx].field, 006h
*       loope   @C0004
.BREAK and .CONTINUE Directives

The .BREAK and .CONTINUE directives interrupt the normal flow of a .REPEAT or .WHILE loop. These directives allow an optional .IF clause for conditional breaks. The syntax is:
        .BREAK [[.IF condition]]
        .CONTINUE [[.IF condition]]
Note that .ENDIF is not used with the .IF forms of .BREAK and .CONTINUE in this context. The .BREAK and .CONTINUE directives work the same way as the break and continue statements in C. After .BREAK, execution continues at the instruction following the .UNTIL, .UNTILCXZ, or .ENDW of the nearest enclosing loop.
Instead of ending the loop execution as .BREAK does, .CONTINUE causes loop execution to jump directly to the code that evaluates the loop condition of the nearest enclosing loop.
The following loop accepts only the keys in the range '0' to '9' and terminates when you press ENTER.

        .WHILE  1                       ; Loop forever
        mov     ah, 08h                 ; Get key without echo
        int     21h
        .BREAK  .IF al == 13            ; If ENTER, break out of the loop
        .CONTINUE .IF (al < '0') || (al > '9')
                                        ; If not a digit, continue looping
        mov     dl, al                  ; Save the character for processing
        mov     ah, 02h                 ; Output the character
        int     21h
        .ENDW
If you assemble the preceding source code with the /Fl and /Sg command-line options and then view the results in the listing file, you will see this code:
                        .WHILE 1
 0017               *@C0001:
 0017  B4 08            mov     ah, 08h
 0019  CD 21            int     21h
                        .BREAK .IF al == 13
 001B  3C 0D        *   cmp     al, 00Dh
 001D  74 10        *   je      @C0002
                        .CONTINUE .IF (al < '0') || (al > '9')
 001F  3C 30        *   cmp     al, '0'
 0021  72 F4        *   jb      @C0001
 0023  3C 39        *   cmp     al, '9'
 0025  77 F0        *   ja      @C0001
 0027  8A D0            mov     dl, al
 0029  B4 02            mov     ah, 02h
 002B  CD 21            int     21h
                        .ENDW
 002D  EB E8        *   jmp     @C0001
 002F               *@C0002:
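The second byte of each generated jump in this listing is a signed 8-bit displacement, counted from the instruction that follows the jump. The arithmetic can be checked with a short Python sketch. The helper name short_jump_target is an assumption made for illustration; the addresses and bytes are read from the listing.

```python
def short_jump_target(address, displacement_byte):
    """Return the target of a 2-byte short jump located at `address`."""
    # Interpret the displacement byte as a signed 8-bit value.
    if displacement_byte >= 0x80:
        disp = displacement_byte - 0x100
    else:
        disp = displacement_byte
    # The displacement is relative to the instruction after the jump.
    return (address + 2 + disp) & 0xFFFF

# (address, displacement byte) pairs read from the listing
listing_jumps = {
    "je @C0002":  (0x001D, 0x10),
    "jb @C0001":  (0x0021, 0xF4),
    "ja @C0001":  (0x0025, 0xF0),
    "jmp @C0001": (0x002D, 0xE8),
}
targets = {name: short_jump_target(addr, disp)
           for name, (addr, disp) in listing_jumps.items()}
```

The forward jump for .BREAK resolves to 002Fh (the label @C0002), and the three backward jumps all resolve to 0017h (the label @C0001), which matches the label addresses in the listing.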
The high-level control structures can be nested. That is, .REPEAT or .WHILE loops can contain .REPEAT or .WHILE loops as well as .IF statements. If the code generated by a .WHILE loop, .REPEAT loop, or .IF statement generates a conditional or unconditional jump, MASM encodes the jump using the jump extension and jump optimization techniques described in Unconditional Jumps, page 162, and Conditional Jumps, page 164.
Writing Loop Conditions

You can express the conditions of the .IF, .REPEAT, and .WHILE directives using relational operators, and you can express the attributes of the operand with the PTR operator. To write loop conditions, you also need to know how the assembler evaluates the operators and operands in the condition. This section explains the operators, attributes, precedence level, and expression evaluation order for the conditions used with loop-generating directives.
Expression Operators
The binary relational operators in MASM 6.1 are the same binary operators used in C. These operators generate MASM compare, test, and conditional jump instructions. High-level control instructions include:
Operator    Meaning
==          Equal
!=          Not equal
>           Greater than
>=          Greater than or equal to
<           Less than
<=          Less than or equal to
&           Bit test
!           Logical NOT
&&          Logical AND
||          Logical OR
A condition without operators (other than !) tests for nonzero as it does in C. For example, .WHILE (x) is the same as .WHILE (x != 0), and .WHILE (!x) is the same as .WHILE (x == 0). You can also use the flag names (ZERO?, CARRY?, OVERFLOW?, SIGN?, and PARITY?) as operands in conditions with the high-level control structures. For example, in .WHILE (CARRY?), the value of the carry flag determines the outcome of the condition.
Signed and Unsigned Operands

Expression operators generate unsigned jumps by default. However, if either side of the operation is signed, the assembler considers the entire operation signed.
You can use the PTR operator to tell the assembler that a particular operand in a register or constant is a signed number, as in these examples:
        .WHILE  SWORD PTR [bx] <= 0
        .IF     SWORD PTR mem1 > 0
Without the PTR operator, the assembler would treat the value that BX points to as an unsigned value. You can also specify the size attributes of operands in memory locations with SBYTE, SWORD, and SDWORD, for use with .IF, .WHILE, and .REPEAT.
        .DATA
mem1    SBYTE   ?
mem2    WORD    ?
        .IF     mem1 > 0
        .WHILE  mem2 < bx
        .WHILE  SWORD PTR ax < count
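The difference matters because the same bit pattern compares differently under the two interpretations. The Python sketch below shows a 16-bit word read both ways; the helper names as_unsigned16 and as_signed16 are illustrative assumptions, not MASM identifiers.

```python
def as_unsigned16(value):
    """Read a 16-bit pattern as an unsigned number (0..65535)."""
    return value & 0xFFFF

def as_signed16(value):
    """Read a 16-bit pattern as a two's-complement number (-32768..32767)."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

word = 0xFFFF                                   # the same bits in both cases
unsigned_le_zero = as_unsigned16(word) <= 0     # unsigned view: 65535 <= 0
signed_le_zero = as_signed16(word) <= 0         # signed view: -1 <= 0
```

With the word 0FFFFh, a condition such as .WHILE SWORD PTR [bx] <= 0 is true, while the same test on an unsigned operand is false.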
Precedence Level
As with C, you can concatenate conditions with the && operator for AND, the || operator for OR, and the ! operator for negate. The precedence level is !, &&, and ||, with ! having the highest priority. As in high-level languages, operators of equal precedence are evaluated left to right.
Expression Evaluation
The assembler evaluates conditions created with high-level control structures according to short-circuit evaluation. If the evaluation of a particular condition automatically determines the final result (such as a condition that evaluates to false in a compound statement concatenated with AND), the evaluation does not continue. For example, in this .WHILE statement,
.WHILE (ax > 0) && (WORD PTR [bx] == 0)
the assembler evaluates the first condition. If this condition is false (that is, if AX is less than or equal to 0), the evaluation is finished. The second condition is not checked and the loop does not execute, because a compound condition containing && requires both expressions to be true for the entire condition to be true.
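Python's and operator short-circuits in the same way, so the behavior can be sketched outside the assembler. In the model below, the function while_condition and the read_word callback are hypothetical stand-ins for the AX test and the WORD PTR [bx] access.

```python
def while_condition(ax, read_word):
    """Model (ax > 0) && (WORD PTR [bx] == 0) with short-circuit evaluation."""
    if not ax > 0:              # first condition false: the result is decided
        return False
    return read_word() == 0     # reached only when the first condition is true

reads = []                      # records every access to the memory operand

def read_word():
    reads.append("read")
    return 0

first = while_condition(0, read_word)    # AX is 0: memory is never read
reads_after_first = len(reads)
second = while_condition(5, read_word)   # AX is 5: memory is read once
```

The practical consequence is that the second operand may safely depend on the first, for example a memory access that is valid only when the register test succeeds.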
Procedures
Organizing your code into procedures that execute specific tasks divides large programs into manageable units, allows for separate testing, and makes code more efficient for repetitive tasks. Assembly-language procedures are similar to functions, subroutines, and procedures in high-level languages such as C, FORTRAN, and Pascal. Two instructions control the use of assembly-language procedures. CALL pushes the return address onto the stack and transfers control to a procedure, and RET pops the return address off the stack and returns control to that location. The PROC and ENDP directives mark the beginning and end of a procedure. Additionally, PROC can automatically:
- Preserve register values that should not change but that the procedure might otherwise alter.
- Set up a local stack pointer, so that you can access parameters and local variables placed on the stack.
- Adjust the stack when the procedure ends.
Defining Procedures
Procedures require a label at the start of the procedure and a RET instruction at the end. Procedures are normally defined by using the PROC directive at the start of the procedure and the ENDP directive at the end. The RET instruction normally is placed immediately before the ENDP directive. The assembler makes sure the distance of the RET instruction matches the distance defined by the PROC directive. The basic syntax for PROC is:

label PROC [[NEAR | FAR]]
.
.
.
RET [[constant]]
label ENDP

The CALL instruction pushes the address of the next instruction in your code onto the stack and passes control to a specified address. The syntax is:

CALL {label | register | memory}

The operand contains a value calculated at run time. Since that operand can be a register, direct memory operand, or indirect memory operand, you can write call tables similar to the example code on page 164.
Calls can be near or far. Near calls push only the offset portion of the calling address and therefore must target a procedure within the same segment or group. You can specify the type for the target operand. If you do not, MASM uses the declared distance (NEAR or FAR) for operands that are labels and for the size of register or memory operands. The assembler then encodes the call appropriately, as it does with unconditional jumps. (See previous Unconditional Jumps and Conditional Jumps.) MASM optimizes a call to a far non-external label when the label is in the current segment by generating the code for a near call, saving one byte. You can define procedures without PROC and ENDP, but if you do, you must make sure that the size of the CALL matches the size of the RET. You can specify the RET instruction as RETN (Return Near) or RETF (Return Far) to override the default size:
        call    NEAR PTR task   ; Call is declared near
        .                       ; Return comes to here
        .
        .
task:                           ; Procedure begins with near label
        .                       ; Instructions go here
        .
        .
        retn                    ; Return declared near
The syntax for RETN and RETF is:

label: | label LABEL NEAR
statements
RETN [[constant]]

label LABEL FAR
statements
RETF [[constant]]

The RET instruction (and its RETF and RETN variations) allows an optional constant operand that specifies a number of bytes to be added to the value of the SP register after the return. This operand adjusts for arguments passed to the procedure before the call, as shown in the example in Using Local Variables, following. When you define procedures without PROC and ENDP, you must make sure that calls have the same size as corresponding returns. For example, RETF pops two words off the stack. If a NEAR call is made to a procedure with a far return, the popped value is meaningless, and the stack status may cause the execution to return to a random memory location, resulting in program failure.
An extended PROC syntax automates many of the details of accessing arguments and saving registers. See Declaring Parameters with the PROC Directive, later in this chapter.
Passing Arguments on the Stack

Each time you call a procedure, you may want it to operate on different data. This data, called arguments, can be passed to the procedure in various ways. Although you can pass arguments to a procedure in registers or in variables, the most common method is the stack. Microsoft languages have specific conventions for passing arguments. These conventions for assembly-language modules shared with modules from high-level languages are explained in Chapter 12, Mixed-Language Programming. This section describes how a procedure accesses the arguments passed to it on the stack. Each argument is accessed as an offset from BP. However, if you use the PROC directive to declare parameters, the assembler calculates these offsets for you and lets you refer to parameters by name. The next section, Declaring Parameters with the PROC Directive, explains how to use PROC this way. This example shows how to pass arguments to a procedure. The procedure expects to find those arguments on the stack. As this example shows, arguments must be accessed as offsets of BP.
; C-style procedure call and definition

        mov     ax, 10          ; Load and
        push    ax              ;   push constant as third argument
        push    arg2            ; Push memory as second argument
        push    cx              ; Push register as first argument
        call    addup           ; Call the procedure
        add     sp, 6           ; Destroy the pushed arguments
        .                       ;   (equivalent to three pops)
        .
        .
addup   PROC    NEAR            ; Return address for near call
                                ;   takes two bytes
        push    bp              ; Save base pointer - takes two bytes
                                ;   so arguments start at fourth byte
        mov     bp, sp          ; Load stack into base pointer
        mov     ax, [bp+4]      ; Get first argument from
                                ;   fourth byte above pointer
        add     ax, [bp+6]      ; Add second argument from
                                ;   sixth byte above pointer
        add     ax, [bp+8]      ; Add third argument from
                                ;   eighth byte above pointer
        pop     bp              ; Restore BP
        ret                     ; Return result in AX
addup   ENDP
Figure 7.1 shows the stack condition at key points in the process.
Figure 7.1   Procedure Arguments on the Stack
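The offsets used by addup can also be traced by hand. The following Python sketch models the stack as word-sized slots and replays the call step by step. It is a model under stated assumptions, not generated code: the class Stack16, the function call_addup, the starting SP of 0100h, and the placeholder values for the return address and the caller's BP are all invented for the illustration.

```python
class Stack16:
    """Tiny model of a 16-bit stack: word slots addressed by byte offset."""

    def __init__(self, sp=0x0100):
        self.sp = sp
        self.mem = {}

    def push(self, word):
        self.sp -= 2                    # the stack grows downward
        self.mem[self.sp] = word & 0xFFFF

    def pop(self):
        word = self.mem[self.sp]
        self.sp += 2
        return word

def call_addup(first, second, third):
    s = Stack16()
    start_sp = s.sp
    s.push(third)                       # push ax   (third argument)
    s.push(second)                      # push arg2 (second argument)
    s.push(first)                       # push cx   (first argument)
    s.push(0x1234)                      # call addup: near return address (placeholder)
    old_bp = 0xBEEF                     # caller's BP (placeholder)
    s.push(old_bp)                      # push bp
    bp = s.sp                           # mov bp, sp
    ax = s.mem[bp + 4]                  # mov ax, [bp+4]
    ax = (ax + s.mem[bp + 6]) & 0xFFFF  # add ax, [bp+6]
    ax = (ax + s.mem[bp + 8]) & 0xFFFF  # add ax, [bp+8]
    bp = s.pop()                        # pop bp
    s.pop()                             # ret: pop the return address
    s.sp += 6                           # add sp, 6: caller discards the arguments
    return ax, s.sp == start_sp, bp == old_bp
```

A call such as call_addup(1, 2, 10) returns the sum together with two checks: that SP is back at its starting value and that BP was restored.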
Starting with the 80186 processor, the ENTER and LEAVE instructions simplify the stack setup and restore instructions at the beginning and end of procedures. However, ENTER uses a lot of time. It is necessary only with nested, statically-scoped procedures. Thus, a Pascal compiler may sometimes generate ENTER. The LEAVE instruction, on the other hand, is an efficient way to do the stack cleanup. LEAVE reverses the effect of the last ENTER instruction by restoring BP and SP to their values before the procedure call.
Declaring Parameters with the PROC Directive

With the PROC directive, you can specify registers to be saved, define parameters to the procedure, and assign symbol names to parameters (rather than as offsets from BP). This section describes how to use the PROC directive to automate the parameter-accessing techniques described in the last section. For example, the following diagram shows a valid PROC statement for a procedure called from C. It takes two parameters, var1 and arg1, and uses (and must save) the DI and SI registers:
The syntax for PROC is:

label PROC [[attributes]] [[USES reglist]] [[, ]] [[parameter[[:tag]]... ]]

The parts of the PROC directive include:
Argument      Description
label         The name of the procedure.
attributes    Any of several attributes of the procedure, including the distance, langtype, and visibility of the procedure. The syntax for attributes is given on the following page.
reglist       A list of registers following the USES keyword that the procedure uses, and that should be saved on entry. Registers in the list must be separated by blanks or tabs, not by commas. The assembler generates prologue code to push these registers onto the stack. When you exit, the assembler generates epilogue code to pop the saved register values off the stack.
parameter     The list of parameters passed to the procedure on the stack. The list can have a variable number of parameters. See the discussion following for the syntax of parameter. This list can be longer than one line if the continued line ends with a comma.
This diagram shows a valid PROC definition that uses several attributes:
Attributes
The syntax for the attributes field is:

[[distance]] [[langtype]] [[visibility]] [[<prologuearg>]]

The explanations for these options include:
Argument      Description
distance      Controls the form of the RET instruction generated. Can be NEAR or FAR. If distance is not specified, it is determined from the model declared with the .MODEL directive. NEAR distance is assumed for TINY, SMALL, COMPACT, and FLAT. The assembler assumes FAR distance for MEDIUM, LARGE, and HUGE. For 80386/486 programming with 16- and 32-bit segments, you can specify NEAR16, NEAR32, FAR16, or FAR32.
langtype      Determines the calling convention used to access parameters and restore the stack. The BASIC, FORTRAN, and PASCAL langtypes convert procedure names to uppercase, place the last parameter in the parameter list lowest on the stack, and generate a RET num instruction to end the procedure. The RET adjusts the stack upward by num, which represents the number of bytes in the argument list. This step, called cleaning the stack, returns the stack pointer SP to the value it had before the caller pushed any arguments. The C and STDCALL langtypes prefix an underscore to the procedure name when the procedure's scope is PUBLIC or EXPORT and place the first parameter lowest on the stack. SYSCALL is equivalent to the C calling convention with no underscore prefixed to the procedure's name. STDCALL uses caller stack cleanup when :VARARG is specified; otherwise the called routine must clean up the stack (see Chapter 12).
visibility    Indicates whether the procedure is available to other modules. The visibility can be PRIVATE, PUBLIC, or EXPORT. A procedure name is PUBLIC unless it is explicitly declared as PRIVATE. If the visibility is EXPORT, the linker places the procedure's name in the export table for segmented executables. EXPORT also enables PUBLIC visibility. You can explicitly set the default visibility with the OPTION directive. OPTION PROC:PUBLIC sets the default to public. For more information, see Chapter 1, Using the Option Directive.
prologuearg   Specifies the arguments that affect the generation of prologue and epilogue code (the code MASM generates when it encounters a PROC directive or the end of a procedure). For an explanation of prologue and epilogue code, see Generating Prologue and Epilogue Code, later in this chapter.
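The effect of distance and langtype on parameter access can be summarized as a small calculation. The Python sketch below assumes the standard frame (push bp followed by mov bp, sp) in a 16-bit segment with word-sized parameters; the function name parameter_offsets is an illustrative assumption, and the model ignores name decoration and stack cleanup.

```python
def parameter_offsets(param_names, langtype, distance="NEAR", word_size=2):
    """Return {name: offset from BP} for word-sized stack parameters."""
    # Saved BP takes 2 bytes; the return address takes 2 bytes for NEAR
    # (offset only) or 4 bytes for FAR (segment and offset).
    first_offset = 2 + (2 if distance == "NEAR" else 4)
    if langtype in ("C", "SYSCALL", "STDCALL"):
        order = list(param_names)             # first parameter lowest on the stack
    elif langtype in ("BASIC", "FORTRAN", "PASCAL"):
        order = list(reversed(param_names))   # last parameter lowest on the stack
    else:
        raise ValueError("unknown langtype: " + langtype)
    return {name: first_offset + i * word_size
            for i, name in enumerate(order)}
```

For a NEAR C procedure with the parameters arg1, arg2, and count, this gives offsets 4, 6, and 8, the same offsets the hand-coded addup procedure uses. With the PASCAL langtype the order is reversed, and a FAR procedure shifts every offset up by two bytes.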
Parameters
The comma that separates parameters from reglist is optional if both fields appear on the same line. If parameters appears on a separate line, you must end the reglist field with a comma. In the syntax:

parmname [[:tag]]

parmname is the name of the parameter. The tag can be the qualifiedtype or the keyword VARARG. However, only the last parameter in a list of parameters can use the VARARG keyword. The qualifiedtype is discussed in Data Types, Chapter 1. An example showing how to reference VARARG parameters appears later in this section. You can nest procedures if they do not have parameters or USES register lists. This diagram shows a procedure definition with one parameter definition.
The procedure presented in Passing Arguments on the Stack, page 182, is here rewritten using the extended PROC functionality. Prior to the procedure call, you must push the arguments onto the stack unless you use INVOKE. (See Calling Procedures with INVOKE, later in this chapter.)
addup   PROC    NEAR C,
        arg1:WORD, arg2:WORD, count:WORD
        mov     ax, arg1
        add     ax, count
        add     ax, arg2
        ret
addup   ENDP
If the arguments for a procedure are pointers, the assembler does not generate any code to get the value or values that the pointers reference; your program must still explicitly treat the argument as a pointer. (For more information about using pointers, see Chapter 3, Using Addresses and Pointers.)
In the following example, even though the procedure declares the parameters as near pointers, you must code two MOV instructions to get the values of the parameters. The first MOV gets the address of the parameters, and the second MOV gets the parameter.
; Call from C as a FUNCTION returning an integer

        .MODEL  medium, c
        .CODE
myadd   PROC    arg1:NEAR PTR WORD, arg2:NEAR PTR WORD

        mov     bx, arg1        ; Load first argument
        mov     ax, [bx]
        mov     bx, arg2        ; Add second argument
        add     ax, [bx]
        ret
myadd   ENDP
You can use conditional-assembly directives to make sure your pointer parameters are loaded correctly for the memory model. For example, the following version of myadd treats the parameters as FAR parameters, if necessary.
        .MODEL  medium, c       ; Could be any model
        .CODE
myadd   PROC    arg1:PTR WORD, arg2:PTR WORD

        IF      @DataSize
        les     bx, arg1        ; Far parameters
        mov     ax, es:[bx]
        les     bx, arg2
        add     ax, es:[bx]
        ELSE
        mov     bx, arg1        ; Near parameters
        mov     ax, [bx]
        mov     bx, arg2
        add     ax, [bx]
        ENDIF

        ret
myadd   ENDP
Using VARARG
In the PROC statement, you can append the :VARARG keyword to the last parameter to indicate that the procedure accepts a variable number of arguments. However, :VARARG applies only to the C, SYSCALL, or STDCALL calling conventions (see Chapter 12). A symbol must precede :VARARG so the procedure can access arguments as offsets from the given variable name, as this example illustrates:
addup3  PROTO   NEAR C, argcount:WORD, arg1:VARARG

        invoke  addup3, 3, 5, 2, 4

addup3  PROC    NEAR C, argcount:WORD, arg1:VARARG
        sub     ax, ax          ; Clear work register
        sub     si, si

        .WHILE  argcount > 0    ; Argcount has number of arguments
        add     ax, arg1[si]    ; Arg1 has the first argument
        dec     argcount        ; Point to next argument
        inc     si
        inc     si
        .ENDW

        ret                     ; Total is in AX
addup3  ENDP
You can pass non-default-sized pointers in the VARARG portion of the parameter list by separately passing the segment portion and the offset portion of the address.

Note   When you use the extended PROC features and the assembler encounters a RET instruction, it automatically generates instructions to pop saved registers, remove local variables from the stack, and, if necessary, remove parameters. It generates this code for each RET instruction it encounters. You can reduce code size by having only one return and jumping to it from various locations.
Using Local Variables

In high-level languages, local variables are visible only within a procedure. In Microsoft languages, these variables are usually stored on the stack. In assembly-language programs, you can also have local variables. These variables should not be confused with labels or variable names that are local to a module, as described in Chapter 8, Sharing Data and Procedures Among Modules and Libraries.
This section outlines the standard methods for creating local variables. The next section shows how to use the LOCAL directive to make the assembler
automatically generate local variables. When you use this directive, the assembler generates the same instructions as those demonstrated in this section but handles some of the details for you. If your procedure has relatively few variables, you can usually write the most efficient code by placing these values in registers. Use local (stack) data when you have a large amount of temporary data for the procedure. To use a local variable, you must save stack space for it at the start of the procedure. A procedure can then reference the variable by its position in the stack. At the end of the procedure, you must clean the stack by restoring the stack pointer. This effectively throws away all local variables and regains the stack space they occupied. This example subtracts 2 bytes from the SP register to make room for a local word variable, then accesses the variable as [bp-2].
        push    ax              ; Push one argument
        call    task            ; Call
        .
        .
        .
task    PROC    NEAR
        push    bp              ; Save base pointer
        mov     bp, sp          ; Load stack into base pointer
        sub     sp, 2           ; Save two bytes for local variable
        .
        .
        .
        mov     WORD PTR [bp-2], 3  ; Initialize local variable
        add     ax, [bp-2]          ; Add local variable to AX
        sub     [bp+4], ax          ; Subtract local from argument
        .                           ; Use [bp-2] and [bp+4] in
        .                           ;   other operations
        .
        mov     sp, bp          ; Clear local variables
        pop     bp              ; Restore base
        ret     2               ; Return result in AX and pop
                                ;   two bytes to clear parameter
task    ENDP
Notice the instruction mov sp,bp at the end of the procedure restores the original value of SP. The statement is required only if the value of SP changes inside the procedure (usually by allocating local variables). The argument passed to the procedure is removed with the RET instruction. Contrast this to the example in Passing Arguments on the Stack, page 182, in which the calling code adjusts the stack for the argument.
Figure 7.2 shows the stack at key points in the process.
Figure 7.2   Local Variables on the Stack
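The same sequence can be replayed in a short Python model to confirm where the local variable and the argument sit and that the stack is balanced after ret 2. This is a sketch under assumptions: the function name run_task, the starting SP of 0100h, and the placeholder values for the return address and the caller's BP are invented for the illustration.

```python
def run_task(ax):
    """Trace the hand-coded task procedure.

    Returns (AX on return, final value of the argument slot, stack balanced).
    """
    mem = {}
    sp = start_sp = 0x0100

    def push(word):
        nonlocal sp
        sp -= 2                         # the stack grows downward
        mem[sp] = word & 0xFFFF

    push(ax)                            # push ax: the one argument
    push(0x1234)                        # call task: near return address (placeholder)
    push(0xBEEF)                        # push bp: caller's BP (placeholder)
    bp = sp                             # mov bp, sp
    sp -= 2                             # sub sp, 2: reserve one local word
    mem[bp - 2] = 3                     # mov WORD PTR [bp-2], 3
    ax = (ax + mem[bp - 2]) & 0xFFFF    # add ax, [bp-2]
    mem[bp + 4] = (mem[bp + 4] - ax) & 0xFFFF   # sub [bp+4], ax
    argument_slot = mem[bp + 4]
    sp = bp                             # mov sp, bp: discard the local
    sp += 2                             # pop bp
    sp += 2                             # ret 2: pop the return address ...
    sp += 2                             # ... then add 2 to SP for the argument
    return ax, argument_slot, sp == start_sp
```

Calling run_task(10) leaves 13 in AX and 0FFFDh (10 minus 13, as a 16-bit value) in the argument slot at [bp+4], with SP back at its starting value.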
Creating Local Variables Automatically

MASM's LOCAL directive automates the process for creating local variables on the stack. LOCAL frees you from having to count stack words, and it makes your code easier to write and maintain. This section illustrates the advantages of creating temporary data with the LOCAL directive. To use the LOCAL directive, list the variables you want to create, giving a type for each one. The assembler calculates how much space is required on the stack. It also generates instructions to properly decrement SP (as described in the previous section) and to reset SP when you return from the procedure. When you create local variables this way, your source code can refer to each local variable by name rather than as an offset of the stack pointer. Moreover,
the assembler generates debugging information for each local variable. If you have programmed before in a high-level language that allows scoping, local variables will seem familiar. For example, a C compiler sets up variables with automatic storage class in the same way as the LOCAL directive. We can simplify the procedure in the previous section with the following code:
task    PROC    NEAR arg:WORD
        LOCAL   loc:WORD
        .
        .
        .
        mov     loc, 3          ; Initialize local variable
        add     ax, loc         ; Add local variable to AX
        sub     arg, ax         ; Subtract local from argument
        .                       ; Use "loc" and "arg" in other operations
        .
        .
        ret
task    ENDP
The LOCAL directive must be on the line immediately following the PROC statement with the following syntax:

LOCAL vardef [[, vardef]]...

Each vardef defines a local variable. A local variable definition has this form:

label[[[count]]][[:qualifiedtype]]

These are the parameters in local variable definitions:
Argument        Description
label           The name given to the local variable. You can use this name to access the variable.
count           The number of elements of this name and type to allocate on the stack. You can allocate a simple array on the stack with count. The brackets around count are required. If this field is omitted, one data object is assumed.
qualifiedtype   A simple MASM type or a type defined with other types and attributes. For more information, see Data Types in Chapter 1.
If the number of local variables exceeds one line, you can place a comma at the end of the first line and continue the list on the next line. Alternatively, you can use several consecutive LOCAL directives.
The assembler does not initialize local variables. Your program must include code to perform any necessary initializations. For example, the following code fragment sets up a local array and initializes it to zero:
arraysz EQU     20

aproc   PROC    USES di
        LOCAL   var1[arraysz]:WORD, var2:WORD
        .
        .
        .
; Initialize local array to zero
        push    ss
        pop     es              ; Set ES=SS
        lea     di, var1        ; ES:DI now points to array
        mov     cx, arraysz     ; Load count
        sub     ax, ax
        rep     stosw           ; Store zeros
; Use the array...
        .
        .
        .
        ret
aproc   ENDP
Even though you can reference stack variables by name, the assembler treats them as offsets of BP, and they are not visible outside the procedure. In the following procedure, array is a local variable.
index   EQU     10
test    PROC    NEAR
        LOCAL   array[index]:WORD
        .
        .
        .
        mov     bx, index
        mov     array[bx], 5    ; Not legal!
The second MOV statement may appear to be legal, but since array is an offset of BP, this statement is the same as
; mov [bp + bx + arrayoffset], 5 ; Not legal!
BP and BX can be added only to SI and DI. This example would be legal, however, if the index value were moved to SI or DI. This type of error in your program can be difficult to find unless you keep in mind that local variables in procedures are offsets of BP.
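The rule behind this restriction is that a 16-bit memory operand can combine at most one base register (BX or BP) with at most one index register (SI or DI), plus an optional displacement. The Python sketch below checks a register combination against that rule. The function name is_valid_16bit_address is an illustrative assumption, and the model covers 16-bit addressing only, not the 32-bit addressing modes of the 80386 and later.

```python
VALID_BASE = {"bx", "bp"}
VALID_INDEX = {"si", "di"}

def is_valid_16bit_address(registers):
    """Return True if the registers can form one 16-bit memory operand.

    `registers` lists the registers in the effective address, for example
    ["bp", "bx"]. A displacement may always be added, so it is not modelled.
    """
    regs = [r.lower() for r in registers]
    if len(regs) > 2 or len(set(regs)) != len(regs):
        return False
    if any(r not in VALID_BASE | VALID_INDEX for r in regs):
        return False
    bases = [r for r in regs if r in VALID_BASE]
    indexes = [r for r in regs if r in VALID_INDEX]
    # At most one base register and at most one index register.
    return len(bases) <= 1 and len(indexes) <= 1
```

Under this model ["bp", "bx"] is rejected, which is the case in the example, while ["bp", "si"] and ["bp", "di"] are accepted, which is why moving the index value to SI or DI makes the statement legal.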
Declaring Procedure Prototypes

MASM provides the INVOKE directive to handle many of the details important to procedure calls, such as pushing parameters according to the correct calling conventions. To use INVOKE, the procedure called must have been declared previously with a PROC statement, an EXTERNDEF (or EXTERN) statement, or a TYPEDEF. You can also place a prototype defined with PROTO before the INVOKE if the procedure type does not appear before the INVOKE. Procedure prototypes defined with PROTO inform the assembler of types and numbers of arguments so the assembler can check for errors and provide automatic conversions when INVOKE calls the procedure. Declaring procedure prototypes is good programming practice, but is optional. Prototypes in MASM perform the same function as prototypes in C and other high-level languages. A procedure prototype includes the procedure name, the types, and (optionally) the names of all parameters the procedure expects. Prototypes usually are placed at the beginning of an assembly program or in a separate include file so the assembler encounters the prototype before the actual procedure. Prototypes enable the assembler to check for unmatched parameters and are especially useful for procedures called from other modules and other languages. If you write routines for a library, you may want to put prototypes into an include file for all the procedures used in that library. For more information about using include files, see Chapter 8, Sharing Data and Procedures among Modules and Libraries. The PROTO directive provides one way to define a procedure prototype. The syntax for a prototype definition is the same as for a procedure declaration (see Declaring Parameters with the PROC Directive, earlier in this chapter), except that you do not include the list of registers, prologuearg list, or the scope of the procedure. Also, the PROTO keyword precedes the langtype and distance attributes. The attributes (like C and FAR) are optional. 
However, if they are not specified, the defaults are based on any .MODEL or OPTION LANGUAGE statement. The names of the parameters are also optional, but you must list parameter types. A label preceding :VARARG is also optional in the prototype but not in the PROC statement. If a PROTO and a PROC for the same function appear in the same module, they must match in attribute, number of parameters, and parameter types. The easiest way to create prototypes with PROTO is to write your procedure and then copy the first line (the line that contains the PROC keyword) to a location in your program that follows the data declarations. Change PROC to PROTO and remove the USES reglist, the prologuearg field, and the visibility field. It
is important that the prototype follow the declarations for any types used in it to avoid any forward references used by the parameters in the prototype. The following example illustrates how to define and then declare two typical procedures. In both prototype and declaration, the comma before the argument list is optional only when the list does not appear on a separate line:
; Procedure prototypes.

addup   PROTO   NEAR C argcount:WORD, arg2:WORD, arg3:WORD
myproc  PROTO   FAR C, argcount:WORD, arg2:VARARG

; Procedure declarations

addup   PROC    NEAR C, argcount:WORD, arg2:WORD, arg3:WORD
        .
        .
        .
myproc  PROC    FAR C PUBLIC <callcount> USES di si,
                argcount:WORD,
                arg2:VARARG
When you call a procedure with INVOKE, the assembler checks the arguments given by INVOKE against the parameters expected by the procedure. If the data types of the arguments do not match, MASM reports an error or converts the type to the expected type. These conversions are explained in the next section.
Calling Procedures with INVOKE

INVOKE generates a sequence of instructions that push arguments and call a procedure. This helps maintain code if arguments or langtype for a procedure are changed. INVOKE generates procedure calls and automatically:
- Converts arguments to the expected types.
- Pushes arguments on the stack in the correct order.
- Cleans the stack when the procedure returns.
If arguments do not match in number or if the type is not one the assembler can convert, an error results. If the procedure uses VARARG, INVOKE can pass a number of arguments different from the number in the parameter list without generating an error or warning. Any additional arguments must be at the end of the INVOKE argument list. All other arguments must match those in the prototype parameter list.
The syntax for INVOKE is:

INVOKE expression [[, arguments]]

where expression can be the procedure's label or an indirect reference to a procedure, and arguments can be an expression, a register pair, or an expression preceded with ADDR. (The ADDR operator is discussed later in this chapter.) Procedures with these prototypes
addup   PROTO   NEAR C argcount:WORD, arg2:WORD, arg3:WORD
myproc  PROTO   FAR C, argcount:WORD, arg2:VARARG
and these procedure declarations

addup   PROC    NEAR C, argcount:WORD, arg2:WORD, arg3:WORD
        .
        .
        .
myproc  PROC    FAR C PUBLIC <callcount> USES di si,
                argcount:WORD,
                arg2:VARARG
can be called with INVOKE statements like this:

        INVOKE  addup, ax, x, y
        INVOKE  myproc, bx, cx, 100, 10
The assembler can convert some arguments and parameter type combinations so that the correct type can be passed. The signed or unsigned qualities of the arguments in the INVOKE statements determine how the assembler converts them to the types expected by the procedure. The addup procedure, for example, expects parameters of type WORD, but the arguments passed by INVOKE to the addup procedure can be any of these types:
- BYTE, SBYTE, WORD, or SWORD
- An expression whose type is specified with the PTR operator to be one of those types
- An 8-bit or 16-bit register
- An immediate expression in the range -32K to +64K
- A NEAR PTR
If the type is smaller than that expected by the procedure, MASM widens the argument to match.
Widening Arguments
For INVOKE to correctly handle type conversions, you must use the signed data types for any signed assignments. MASM widens an argument to match the type expected by a procedure's parameters in these cases:
Type Passed      Type Expected
BYTE, SBYTE      WORD, SWORD, DWORD, SDWORD
WORD, SWORD      DWORD, SDWORD
The assembler can extend a segment if far data is expected, and it can convert the type given in the list to the types expected. If the assembler cannot convert the type, however, it generates an error.
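The reason the signed types matter is that widening an unsigned value and widening a signed value produce different results for the same byte. The Python sketch below models the 8-bit to 16-bit case only; the function name widen_byte_to_word is an illustrative assumption.

```python
def widen_byte_to_word(value, signed):
    """Widen an 8-bit value to 16 bits.

    signed=False models zero extension (the high byte is cleared);
    signed=True models sign extension, as the CBW instruction does.
    """
    value &= 0xFF
    if signed and value & 0x80:
        return 0xFF00 | value       # copy the sign bit through the high byte
    return value                    # high byte is zero
```

A BYTE holding 0F0h is passed as 00F0h (240), while an SBYTE holding the same bits is passed as 0FFF0h (-16). Declaring a negative quantity as BYTE therefore delivers the wrong value to a procedure that expects an SWORD.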
Detecting Errors
If the assembler needs to widen an argument, it first copies the value to AL or AX. It widens an unsigned value by placing a zero in the higher register area, and widens a signed value with a CBW, CWD, or CWDE instruction as required. Similarly, the assembler copies a constant argument value into AL or AX when the .8086 directive is in effect. You can see these generated instructions in the listing file when you include the /Sg command-line option. Using the accumulator register to widen or copy an argument may lead to an error if you attempt to pass AX as another argument. For example, consider the following INVOKE statement for a procedure with the C calling convention
INVOKE myprocA, ax, cx, 100, arg
where arg is a BYTE variable and myprocA expects four arguments of type WORD. The assembler widens and then pushes arg like this:
        mov     al, DGROUP:arg
        xor     ah, ah
        push    ax
The generated code thus overwrites the last argument (AX) passed to the procedure. The assembler generates an error in this case, requiring you to rewrite the INVOKE statement. To summarize, the INVOKE directive overwrites AX and perhaps DX when widening arguments. It also uses AX to push constants on the 8088 and 8086. If you use these registers (or EAX and EDX on an 80386/486) to pass arguments, they may be overwritten. The assembler's error detection prevents this from ever becoming a run-time bug, but AX and DX should remain your last choice for holding arguments.
Invoking Far Addresses

You can pass a FAR pointer in a segment::offset pair, as shown in the following. Note the use of double colons to separate the register pair. The registers could be any other register pair, including a pair that an MS-DOS call uses to return values.
FPWORD      TYPEDEF FAR PTR WORD
SomeProc    PROTO   var1:DWORD, var2:WORD, var3:WORD
pfaritem    FPWORD  faritem
            .
            .
            .
            les     bx, pfaritem
            INVOKE  SomeProc, ES::BX, arg1, arg2
However, INVOKE cannot combine into a single address one argument for the segment and one for the offset.
Passing an Address
You can use the ADDR operator to pass the address of an expression to a procedure that expects a NEAR or FAR pointer. This example generates code to pass a far pointer (to arg1) to the procedure proc1.
PBYTE       TYPEDEF FAR PTR BYTE
arg1        BYTE    "This is a string"
proc1       PROTO   NEAR C fparg:PBYTE
            .
            .
            .
            INVOKE  proc1, ADDR arg1
For information on defining pointers with TYPEDEF, see "Defining Pointer Types with TYPEDEF" in Chapter 3.
Invoking Procedures Indirectly

You can make an indirect procedure call such as call [bx + si] by using a pointer to a function prototype with TYPEDEF, as shown in this example:
FUNCPROTO   TYPEDEF PROTO NEAR ARG1:WORD
FUNCPTR     TYPEDEF PTR FUNCPROTO

            .DATA
pfunc       FUNCPTR OFFSET proc1, OFFSET proc2

            .CODE
            .
            .
            .
            mov     bx, OFFSET pfunc            ; BX points to table
            mov     si, Num                     ; Num contains 0 or 2
            INVOKE  FUNCPTR PTR [bx+si], arg1   ; Call proc1 if Num=0
                                                ;   or proc2 if Num=2
You can also use ASSUME to accomplish the same task. The following ASSUME statement associates the type FUNCPTR with the BX register.
            ASSUME  BX:FUNCPTR
            mov     bx, OFFSET pfunc
            mov     si, Num
            INVOKE  [bx+si], arg1
Checking the Code Generated

Code generated by the INVOKE directive may vary depending on the processor mode and calling conventions in effect. You can check your listing files to see the code generated by the INVOKE directive if you use the /Sg command-line option.
Generating Prologue and Epilogue Code

When you use the PROC directive with its extended syntax and argument list, the assembler automatically generates the prologue and epilogue code in your procedure. Prologue code is generated at the start of the procedure. It sets up a stack pointer so you can access parameters from within the procedure. It also saves space on the stack for local variables, initializes registers such as DS, and pushes registers that the procedure uses. Similarly, epilogue code is the code at the end of the procedure that pops registers and returns from the procedure.
The assembler automatically generates the prologue code when it encounters the first instruction or label after the PROC directive. This means you cannot label the prologue for the purpose of jumping to it. The assembler generates the epilogue code when it encounters a RET or IRET instruction. Using the assembler-generated prologue and epilogue code saves time and decreases the number of repetitive lines of code in your procedures. The generated prologue or epilogue code depends on the:
• Local variables defined.
• Arguments passed to the procedure.
• Current processor selected (affects epilogue code only).
• Current calling convention.
• Options passed in the prologuearg of the PROC directive.
• Registers being saved.
The prologuearg list contains options specifying how to generate the prologue or epilogue code. The next section explains how to use these options, gives the standard prologue and epilogue code, and explains the techniques for defining your own prologue and epilogue code.
Using Automatic Prologue and Epilogue Code

The standard prologue and epilogue code handles parameters and local variables. If a procedure does not have any parameters or local variables, the prologue and epilogue code that sets up and restores a stack pointer is omitted, unless FORCEFRAME is included in the prologuearg list. (FORCEFRAME is discussed later in this section.) Prologue and epilogue code also generates a push and pop for each register in the register list.

The prologue code consists of three steps:
1. Point BP to top of stack.
2. Make space on stack for local variables.
3. Save registers the procedure must preserve.
The epilogue cancels these three steps in reverse order, then cleans the stack, if necessary, with a RET num instruction. For example, the procedure declaration
myproc  PROC    NEAR PASCAL USES di si, arg1:WORD, arg2:WORD, arg3:WORD
        LOCAL   local1:WORD, local2:WORD
generates the following prologue code:

        push    bp          ; Step 1:
        mov     bp, sp      ;   point BP to stack top
        sub     sp, 4       ; Step 2: space for 2 local words
        push    di          ; Step 3:
        push    si          ;   save registers listed in USES
The corresponding epilogue code looks like this:

        pop     si          ; Undo Step 3
        pop     di
        mov     sp, bp      ; Undo Step 2
        pop     bp          ; Undo Step 1
        ret     6           ; Clean stack of pushed arguments
Notice the RET 6 instruction cleans the stack of the three word-sized arguments. The instruction appears in the epilogue because the procedure does not use the C calling convention. If myproc used C conventions, the epilogue would end with a RET instruction without an operand.

The assembler generates standard epilogue code when it encounters a RET instruction without an operand. It does not generate an epilogue if RET has a nonzero operand. To suppress generation of a standard epilogue, use RETN or RETF with or without an operand, or use RET 0.

The standard prologue and epilogue code recognizes two operands passed in the prologuearg list, LOADDS and FORCEFRAME. These operands modify the prologue code. Specifying LOADDS saves and initializes DS. Specifying FORCEFRAME as an argument generates a stack frame even if no arguments are sent to the procedure and no local variables are declared. If your procedure has any parameters or locals, you do not need to specify FORCEFRAME.
For example, adding LOADDS to the argument list for myproc creates this prologue:
        push    bp          ; Step 1:
        mov     bp, sp      ;   point BP to stack top
        sub     sp, 4       ; Step 2: space for 2 locals
        push    ds          ; Save DS and point it
        mov     ax, DGROUP  ;   to DGROUP, as
        mov     ds, ax      ;   instructed by LOADDS
        push    di          ; Step 3:
        push    si          ;   save registers listed in USES
The epilogue code restores DS:

        pop     si          ; Undo Step 3
        pop     di
        pop     ds          ; Restore DS
        mov     sp, bp      ; Undo Step 2
        pop     bp          ; Undo Step 1
        ret     6           ; Clean stack of pushed arguments
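For reference, a declaration that requests the LOADDS prologue and epilogue might look like the following. This is a sketch only: it assumes the prologuearg is written in angle brackets among the PROC attributes, ahead of USES, and it reuses the names from the myproc example.

```masm
; Sketch: same procedure as before, with LOADDS as the prologuearg
; (placement of <LOADDS> before USES is an assumption about the PROC syntax)
myproc  PROC    NEAR PASCAL <LOADDS> USES di si, arg1:WORD, arg2:WORD, arg3:WORD
        LOCAL   local1:WORD, local2:WORD
```

Assembling with the /Sg option and reading the listing is the surest way to confirm the prologue the assembler actually generates for a given declaration.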
User-Defined Prologue and Epilogue Code

If you want a different set of instructions for prologue and epilogue code in your procedures, you can write macros that run in place of the standard prologue and epilogue code. For example, while you are debugging your procedures, you may want to include a stack check or track the number of times a procedure is called. You can write your own prologue code to do these things whenever a procedure executes. Different prologue code may also be necessary if you are writing applications for Windows. User-defined prologue macros will respond correctly if you specify FORCEFRAME in the prologuearg of a procedure.

To write your own prologue or epilogue code, the OPTION directive must appear in your program. It disables automatic prologue and epilogue code generation. When you specify

OPTION PROLOGUE : macroname
OPTION EPILOGUE : macroname

the assembler calls the macro specified in the OPTION directive instead of generating the standard prologue and epilogue code. The prologue macro must be a macro function, and the epilogue macro must be a macro procedure.
The assembler expects your prologue or epilogue macro to have this form:

macroname MACRO procname, \
                flag, \
                parmbytes, \
                localbytes, \
                <reglist>, \
                userparms

Your macro must have formal parameters to match all the actual arguments passed. The arguments passed to your macro include:
Argument     Description
procname     The name of the procedure.
flag         A 16-bit flag containing the following information:

             Bit = Value    Description
             Bit 0, 1, 2    For calling conventions (000=unspecified language type, 001=C, 010=SYSCALL, 011=STDCALL, 100=PASCAL, 101=FORTRAN, 110=BASIC).
             Bit 3          Undefined (not necessarily zero).
             Bit 4          Set if the caller restores the stack (use RET, not RETn).
             Bit 5          Set if procedure is FAR.
             Bit 6          Set if procedure is PRIVATE.
             Bit 7          Set if procedure is EXPORT.
             Bit 8          Set if the epilogue is generated as a result of an IRET instruction and cleared if the epilogue is generated as a result of a RET instruction.
             Bits 9-15      Undefined (not necessarily zero).

parmbytes    The accumulated count in bytes of all parameters given in the PROC statement.
localbytes   The count in bytes of all locals defined with the LOCAL directive.
reglist      A list of the registers following the USES operator in the procedure declaration. Enclose this list with angle brackets (< >) and separate each item with commas. Reverse the list for epilogues.
userparms    Any argument you want to pass to the macro. The prologuearg (if there is one) specified in the PROC directive is passed to this argument.
Your macro function must return the parmbytes parameter. However, if the prologue places other values on the stack after pushing BP and these values are not referenced by any of the local variables, the exit value must be the number of bytes for procedure locals plus any space between BP and the locals. Therefore, parmbytes is not always equal to the bytes occupied by the locals.
The following macro is an example of a user-defined prologue that counts the number of times a procedure is called.
ProfilePro  MACRO   procname,       \
                    flag,           \
                    bytecount,      \
                    numlocals,      \
                    regs,           \
                    macroargs

            .DATA
procname&count  WORD 0
            .CODE
            inc     procname&count  ; Accumulates count of times the
                                    ;   procedure is called
            push    bp
            mov     bp, sp
                                    ; Other BP operations
            IFNB    <regs>
            FOR     r, regs
            push    r
            ENDM
            ENDIF
            EXITM   %bytecount
            ENDM
Your program must also include this statement before calling any procedures that use the prologue:
OPTION PROLOGUE:ProfilePro
If you define either a prologue or an epilogue macro, the assembler uses the standard prologue or epilogue code for the one you do not define. The form of the code generated depends on the .MODEL and PROC options used. If you want to revert to the standard prologue or epilogue code, use PROLOGUEDEF or EPILOGUEDEF as the macroname in the OPTION statement.
OPTION EPILOGUE:EPILOGUEDEF
You can completely suppress prologue or epilogue generation with

OPTION PROLOGUE:None
OPTION EPILOGUE:None
In this case, no user-defined macro is called, and the assembler does not generate a default code sequence. This state remains in effect until the next OPTION PROLOGUE or OPTION EPILOGUE is encountered.
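With generation suppressed, any stack frame is your responsibility. The following sketch shows a hand-written frame bracketed by the OPTION statements and a return to the standard code afterward. The procedure name addone and its single WORD stack argument are hypothetical, and the offsets assume a 16-bit NEAR procedure.

```masm
        OPTION  PROLOGUE:NONE           ; No generated prologue from here on
        OPTION  EPILOGUE:NONE           ; No generated epilogue either

addone  PROC    NEAR                    ; Hypothetical: one WORD argument on the stack
        push    bp                      ; Hand-written frame setup
        mov     bp, sp
        mov     ax, [bp+4]              ; Argument lies above saved BP and return IP
        inc     ax                      ; Return argument + 1 in AX
        pop     bp                      ; Hand-written frame teardown
        ret     2                       ; Callee removes its one WORD argument
addone  ENDP

        OPTION  PROLOGUE:PROLOGUEDEF    ; Back to the standard prologue
        OPTION  EPILOGUE:EPILOGUEDEF    ; Back to the standard epilogue
```

Restoring PROLOGUEDEF and EPILOGUEDEF right after the procedure keeps the suppression from silently applying to procedures defined later in the module.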
For additional information about writing macros, see Chapter 9, "Using Macros." The PROLOGUE.INC file provided on the MASM 6.1 distribution disks can create the prologue and epilogue sequences for the Microsoft C professional development system.
MS-DOS Interrupts
In addition to jumps, loops, and procedures that alter program execution, interrupt routines transfer execution to a different location. In this case, control goes to an interrupt routine. You can write your own interrupt routines, either to replace an existing routine or to use an undefined interrupt number. For example, you may want to replace an MS-DOS interrupt handler, such as the Critical Error (Interrupt 24h) and CONTROL+C (Interrupt 23h) handlers. The BOUND instruction checks array bounds and calls Interrupt 5 when an error occurs. If you use this instruction, you need to write an interrupt handler for it. This section summarizes the following:
• How to call interrupts
• How the processor handles interrupts
• How to redefine an existing interrupt routine
The example routine in this section handles addition or multiplication overflow and illustrates the steps necessary for writing an interrupt routine. For additional information about MS-DOS and BIOS interrupts, see Chapter 11, Writing Memory-Resident Software.
Calling MS-DOS and ROM-BIOS Interrupts

Interrupts provide a way to access MS-DOS and ROM-BIOS from assembly language. They are called with the INT instruction, which takes an immediate value between 0 and 255 as its only operand. MS-DOS and ROM-BIOS interrupt routines accept data through registers. For instance, most MS-DOS routines (and many BIOS routines) require a function number in the AH register. Many handler routines also return values in registers. To use an interrupt, you must know what data the handler routine expects and what data, if any, it returns. For information, consult Help or one of the other references mentioned in the Introduction.
The following fragment illustrates a simple call to MS-DOS Function 9, which displays the string msg on the screen:
            .DATA
msg         BYTE    "This writes to the screen$"
            .CODE
            mov     ax, SEG msg     ; Necessary only if DS does not
            mov     ds, ax          ;   already point to data segment
            mov     dx, offset msg  ; DS:DX points to msg
            mov     ah, 09h         ; Request Function 9
            int     21h
When the INT instruction executes, the processor:
1. Looks up the address of the interrupt routine in the Interrupt Vector Table. This table starts at the lowest point in memory (segment 0, offset 0) and consists of a series of far pointers called vectors. Each vector comprises a 4-byte address (segment:offset) pointing to an interrupt handler routine. The table sequence implies the number of the interrupt the vector references: the first vector points to the Interrupt 0 handler, the second vector to the Interrupt 1 handler, and so forth. Thus, the vector at 0000:i*4 holds the address of the handler routine for Interrupt i.
2. Pushes the flags register, then clears the trap flag (TF) and interrupt enable flag (IF).
3. Pushes the current code segment (CS) and the current instruction pointer (IP), in that order. (The instruction pointer addresses the instruction following the INT statement.) As with a CALL, this ensures control returns to the next logical position in the program.
4. Jumps to the address of the interrupt routine, as specified in the Interrupt Vector Table.
5. Executes the code of the interrupt routine until it encounters an IRET instruction.
6. Pops the instruction pointer, code segment, and flags.
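The vector layout described in the first step can be illustrated by reading a vector directly from the table. This is a sketch for real-mode code only, using Interrupt 4 as an arbitrary example. In a real program, MS-DOS Function 35h (Get Interrupt Vector) is the safer way to obtain the same address.

```masm
        xor     ax, ax                  ; ES = 0000h, segment of the vector table
        mov     es, ax
        mov     bx, 4 * 4               ; Interrupt 4: vector at offset i*4 = 16
        mov     ax, es:[bx]             ; Low word of vector: handler offset
        mov     dx, es:[bx+2]           ; High word of vector: handler segment
                                        ; DX:AX now holds the Interrupt 4 handler address
```

Each vector stores the offset in its low word and the segment in its high word, which is why an LDS or LES instruction can load a vector directly into a segment:offset register pair.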
Figure 7.3 illustrates how interrupts work.
Figure 7.3  Operation of Interrupts
Replacing an Interrupt Routine

To replace an existing interrupt routine, your program must:
• Provide a new routine to handle the interrupt.
• Replace the old routine's address in the Interrupt Vector Table with the address of your new routine.
• Restore the old address to the vector table before your program ends.
You can write an interrupt routine as a procedure by using the PROC and ENDP directives. The routine should always be defined as FAR and should end with an IRET instruction instead of a RET instruction.
Note You can use the full extended PROC syntax (described in "Declaring Parameters with the PROC Directive," earlier in this chapter) to write interrupt procedures. However, you should not make interrupt procedures NEAR or specify arguments for them. You can use the USES keyword, however, to correctly generate code to save and restore a register list in interrupt procedures.

The IRET instruction in MASM 6.1 has two forms that suppress epilogue code. This allows an interrupt to have local variables or use a user-defined prologue. IRETF pops a FAR16 return address, and IRETFD pops a FAR32 return address.

The following example shows how to replace the handler for Interrupt 4. Once registered in the Interrupt Vector Table, the new routine takes control when the processor encounters either an INT 4 instruction or its special variation INTO (Interrupt on Overflow). INTO is a conditional instruction that acts only when the overflow flag is set. With INTO after a numerical calculation, your code can automatically route control to a handler routine if the calculation results in a numerical overflow. By default, the routine for Interrupt 4 simply consists of an IRET, so it returns without doing anything. Using INTO is an alternative to using JO (Jump on Overflow) to jump to another set of instructions.

The following example program first executes INT 21h to invoke MS-DOS Function 35h (Get Interrupt Vector). This function returns the existing vector for Interrupt 4. The program stores the vector, then invokes MS-DOS Function 25h (Set Interrupt Vector) to place the address of the ovrflow procedure in the Interrupt Vector Table. From this point on, ovrflow gains control whenever the processor executes INTO while the overflow flag is set. The new routine displays a message and returns with AX and DX set to 0.
            .MODEL  LARGE, C
FPFUNC      TYPEDEF FAR PTR
            .DATA
msg         BYTE    "Overflow - result set to 0",13,10,'$'
vector      FPFUNC  ?
            .CODE
            .STARTUP

            mov     ax, 3504h               ; Load Interrupt 4 and call DOS
            int     21h                     ;   Get Interrupt Vector
            mov     WORD PTR vector[2],es   ; Save segment
            mov     WORD PTR vector[0],bx   ;   and offset

            push    ds                      ; Save DS
            mov     ax, cs                  ; Load segment of new routine
            mov     ds, ax
            mov     dx, OFFSET ovrflow      ; Load offset of new routine
            mov     ax, 2504h               ; Load Interrupt 4 and call DOS
            int     21h                     ;   Set Interrupt Vector
            pop     ds                      ; Restore
            .
            .
            .
            add     ax, bx                  ; Do arithmetic
            into                            ; Call Interrupt 4 if overflow
            .
            .
            .
            lds     dx, vector              ; Load original address
            mov     ax, 2504h               ; Restore it to vector table
            int     21h                     ;   with DOS set vector function
            mov     ax, 4C00h               ; Terminate function
            int     21h

ovrflow     PROC    FAR
            sti                             ; Enable interrupts
                                            ;   (turned off by INT)
            mov     ah, 09h                 ; Display string function
            mov     dx, OFFSET msg          ; Load address
            int     21h                     ; Call DOS
            sub     ax, ax                  ; Set AX to 0
            cwd                             ; Set DX to 0
            iret                            ; Return
ovrflow     ENDP
            END
Before the program ends, it again uses MS-DOS Function 25h to reset the original Interrupt 4 vector back into the Interrupt Vector Table. This reestablishes the original routine as the handler for Interrupt 4.

The first instruction of the ovrflow routine warrants further discussion. When the processor encounters an INT instruction, it clears the interrupt flag before branching to the specified interrupt handler routine. The interrupt flag serves a crucial role in smoothing the processor's tasks, but must not be abused. When clear, the flag inhibits hardware interrupts such as the keyboard or system timer. It should be left clear only briefly and only when absolutely necessary. Unless you have a compelling reason to leave the flag clear, always include an STI (Set Interrupt Flag) instruction at the beginning of your interrupt handler routine to reenable hardware interrupts.

CLI (Clear Interrupt Flag) and its corollary STI are designed to protect small sections of time-dependent code from interruptions by the hardware. If you use CLI in your program, be sure to include a matching STI instruction as well. The sample interrupt handlers in Chapter 11, "Writing Memory-Resident Software," illustrate how to use these important instructions.
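As a small illustration of a matched CLI and STI pair, the sketch below updates a two-word value that an interrupt handler might read at any moment. The DWORD variable ticks is hypothetical, and DS is assumed to address its segment.

```masm
        cli                             ; Hold off hardware interrupts
        mov     WORD PTR ticks[0], ax   ; Update low word
        mov     WORD PTR ticks[2], dx   ;   and high word as one unit
        sti                             ; Reenable interrupts immediately
```

Only the two stores sit between CLI and STI, so hardware interrupts are inhibited for as short a time as possible.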
CHAPTER 8
Sharing Data and Procedures Among Modules and Libraries
To use symbols and procedures in more than one module, the assembler must be able to recognize the shared data as global to all the modules where they are used. MASM provides techniques to simplify data-sharing and give a high-level interface to multiple-module programming. With these techniques, you can place shared symbols in include files. This makes the data declarations in the file available to all modules that use the include file. This chapter explains the two data-sharing methods MASM 6.1 offers. The first method simplifies data sharing between modules with include files. The second does not involve include files. Instead, this method allows modules to share procedures and data items using the PUBLIC and EXTERN directives. The last section of this chapter explains how to create program libraries and access their routines.
Selecting Data-Sharing Methods

If data defined in one module is to be used in other modules of a program, you must declare the data public and external. MASM provides several ways to do this:
• Declare a symbol public with the PUBLIC directive in the module where it is defined. This makes the symbol available to other modules. You must also place an EXTERN statement for that symbol in all other modules that refer to the public symbol. This statement informs the assembler that the symbol is external; that is, defined in another module.
• Declare the data communal with the COMM directive. However, communal variables have limitations. You cannot depend on their location in memory because they are allocated by the linker, and they cannot be initialized.
• The EXTERNDEF directive declares a symbol either public or external, as appropriate. EXTERNDEF simplifies the declarations for global (public and external) variables and encourages the use of include files.
The next section provides further details on using include files. For more information on PUBLIC and EXTERN, see Using Alternatives to Include Files, page 219.
Sharing Symbols with Include Files

Include files can contain any valid MASM statement, but typically consist of type and symbol declarations. The assembler inserts the contents of the include file into a module at the location of the INCLUDE directive. Include files are optional, but can simplify project organization by eliminating the need to insert common declarations into all modules of a program. An alternative to using include files is described in "Using Alternatives to Include Files," page 219.

This section explains how to organize symbol definitions and the declarations that make them global (available to all modules); how to make both variables and procedures public with EXTERNDEF, PROTO, and COMM; and where to place these directives in the modules and include files.
Organizing Modules
This section summarizes the organization of declarations and definitions in modules and include files and the use of the INCLUDE directive.
Include Files
Type declarations that need to be identical in every module should be placed in an include file. This ensures consistency and saves time when you update programs. Include files should contain only symbol declarations and any other declarations that are resolved at assembly time. (For a list of assembly-time operations, see "Generating and Running Executable Programs" in Chapter 1.) If more than one module accesses the include file, the file cannot contain statements that define and allocate memory for symbols. Otherwise, the assembler would attempt to allocate the same symbol more than once.

Note An include file used in two or more modules should not allocate data variables.
Modules
An INCLUDE statement is usually placed before data and code segments in your modules. When the assembler encounters an INCLUDE directive, it opens the specified file and assembles all its statements. The assembler then returns to the original module and continues the assembly.
The INCLUDE directive takes the form:

INCLUDE filename

where filename is the full name of the include file. For example, the following declaration inserts the contents of the include file SCREEN.INC in your program:
INCLUDE SCREEN.INC
The filename in the INCLUDE directive must be fully specified; no extensions are assumed. If a full pathname is not given, the assembler first searches the directory of the source file containing the INCLUDE directive. If the include file is not in the source file directory, the assembler searches the paths specified in the assembler's command-line option /I, or in PWB's Include Paths field in the MASM Option dialog box (accessed from the Option menu). The /I option takes this form:

/I path

You can include more than one /I option on the command line. The assembler then searches for include files within each specified path in the order given. If none of these directories contains the include file, the assembler finally searches in the paths specified in the INCLUDE environment variable. If the include file still cannot be found, an assembly error occurs. (The /x command-line option tells the assembler to ignore the INCLUDE environment variable when searching for include files.)

An include file may specify another include file. The assembler processes the second include file before returning to the first. Your program can nest include files this way as deeply as the amount of free memory allows.
Include Files or Modules

You can use the EQU directive to create named constants that cannot be redefined in your program. (For information about the EQU directive, see Integer Constants and Constant Expressions, page 11.) Placing a constant defined with EQU in an include file makes it available to all modules that use that include file. Placing TYPEDEF, STRUCT, UNION, and RECORD definitions in an include file guarantees consistency in type definitions. If required, the variable instances derived from these definitions can be made public among the modules with EXTERNDEF declarations (see the next section). Macros, including macros defined with TEXTEQU, must be placed in include files to make them visible in other modules.
If you elect to use full segment definitions with, or instead of, simplified definitions, you can force a consistent segment order in all files by defining segments in an include file. This technique is explained in Controlling the Segment Order, page 47.
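To make the guidelines above concrete, here is a sketch of an include file that holds only assembly-time declarations. The file name GLOBALS.INC and every symbol in it are hypothetical. Note that nothing in the file allocates storage, so any number of modules can include it.

```masm
; GLOBALS.INC (hypothetical): declarations only, no storage allocated
MAXROWS     EQU     25                  ; Named constant that cannot be redefined
PBYTE       TYPEDEF PTR BYTE            ; Pointer type shared by all modules
POINT       STRUCT                      ; Structure layout shared by all modules
  row       BYTE    ?
  col       BYTE    ?
POINT       ENDS
EXTERNDEF   cursor:POINT                ; Variable defined in exactly one module
```

Exactly one module would then define the variable itself, for example with `cursor POINT <0, 0>` in its data segment, while every other module simply includes the file and refers to cursor.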
Declaring Symbols Public and External

It is sometimes useful to make certain procedures and variables (such as status flags) global to all program modules. Global variables are freely accessible within all routines; you do not have to explicitly pass them to the routines that need them. This section describes how to make variables and procedures global using the EXTERNDEF, PROTO, or COMM declarations within include files.

When a procedure is defined in one module and called in another module, it must be declared public in the defining module and external in the calling module(s). MASM offers three ways to declare a procedure public and external:
• Use the PUBLIC directive in the defining module and EXTERN in all other modules that reference the procedure. The PUBLIC and EXTERN directives are explained on page 220.
• Declare the procedure with EXTERNDEF.
• Prototype the procedure with the PROTO directive.
Using EXTERNDEF
MASM treats EXTERNDEF as a public declaration in the defining module, and as an external declaration in the referencing module(s). You can use the EXTERNDEF statement in your include file to make a variable common to two or more modules. EXTERNDEF works with all types of variables, including arrays, structures, unions, and records. It also works with procedures. As a result, a single include file can contain an EXTERNDEF declaration that works in both the defining module and any referencing module. It is ignored in modules that neither define nor reference the variable. Therefore, an include file for a library which is used in multiple .EXE files does not force the definition of a symbol as EXTERN does.

The EXTERNDEF statement takes this form:

EXTERNDEF [[langtype]] name:qualifiedtype

The name is the variable's identifier. The qualifiedtype is explained in detail in "Data Types," page 14. The optional langtype specifier sets the naming conventions for the name it precedes. It overrides any language specified in the .MODEL directive. The specifier can be C, SYSCALL, STDCALL, PASCAL, FORTRAN, or BASIC. For information on selecting the appropriate langtype type, see "Naming and Calling Conventions," page 308.
The following diagram shows the statements that declare an array, make it public, and use it in another module.
Figure 8.1  Using EXTERNDEF for Variables
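In text form, the arrangement of statements across files might look like the following sketch. The file name ARRAY.INC and the array name scores are hypothetical, and the .MODEL and other housekeeping directives each module needs are omitted for brevity.

```masm
; --- ARRAY.INC (hypothetical include file) ---
EXTERNDEF   scores:WORD             ; Acts as PUBLIC or EXTERN as needed

; --- Defining module ---
            INCLUDE ARRAY.INC
            .DATA
scores      WORD    100 DUP (0)     ; The definition makes scores public

; --- Referencing module ---
            INCLUDE ARRAY.INC
            .CODE
            mov     ax, scores[2]   ; Second element (byte offset 2)
```

Both modules include the same file, so the declared type of scores cannot drift out of step between them.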
The file position of EXTERNDEF directives is important. For more information, see "Positioning External Declarations," following.

You can also make procedures visible by using EXTERNDEF without PROTO inside an include file. This method treats the procedure name as a simple identifier, without the parameter list, so you forgo the assembler's ability to check for the correct parameters during assembly. Use EXTERNDEF with procedures in the same way as variables:
EXTERNDEF MyProc:FAR ; Declare far procedure external
You can also use EXTERNDEF to make a code label global between modules so that one module can reference a label in another module. Give the label global scope with the double colon operator, like this:
EXTERNDEF codelabel:NEAR
            .
            .
            .
codelabel::
Another module can reference codelabel like this:

EXTERNDEF codelabel:NEAR
            .
            .
            .
            jmp     codelabel
Using PROTO
This section describes how to prototype a procedure with the PROTO directive. PROTO automatically issues an EXTERNDEF for the procedure unless the PROC statement declares the procedure PRIVATE. Defining a prototype enables type-checking for the procedure arguments.

Follow these steps to create an interface for a procedure defined in one module and called from other modules:
1. Place the PROTO declaration in the include file.
2. Define the procedure with PROC in one module. The PROC directive declares the procedure PUBLIC by default.
3. Call the procedure with the INVOKE statement (or with CALL). Make sure that all calling modules access the include file.

For descriptions, syntax, and examples of PROTO, PROC, and INVOKE, see Chapter 7, "Controlling Program Flow."

The following example illustrates these three steps. In the example, a PROTO statement defines the far procedure CopyFile, which uses the C parameter-passing and naming conventions, and takes the arguments filename and numberlines. The diagram following the example shows the file placement for these statements. This definition goes into the include file:
CopyFile PROTO FAR C filename:BYTE, numberlines:WORD
The procedure definition for CopyFile is:

CopyFile PROC FAR C USES cx, filename:BYTE, numberlines:WORD
To call the CopyFile procedure, you can use this INVOKE statement:
INVOKE CopyFile, NameVar, 200
Figure 8.2  Using PROTO and INVOKE
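In text form, the file placement for the three CopyFile statements might be sketched as follows. The include file name COPY.INC is hypothetical, the procedure body is elided, and the .MODEL and data declarations each module needs are omitted.

```masm
; --- COPY.INC (hypothetical include file) ---
CopyFile    PROTO   FAR C filename:BYTE, numberlines:WORD

; --- Defining module ---
            INCLUDE COPY.INC
            .CODE
CopyFile    PROC    FAR C USES cx, filename:BYTE, numberlines:WORD
            ; ... procedure body goes here ...
            ret
CopyFile    ENDP

; --- Calling module ---
            INCLUDE COPY.INC
            .CODE
            INVOKE  CopyFile, NameVar, 200
```

Because the defining module also includes COPY.INC, the assembler can compare the PROC statement against the prototype as well as checking each INVOKE.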
Using COMM
Another way to share variables among modules is to add the COMM (communal) declaration to your include file. Since communal variables are allocated by the linker and cannot be initialized, you cannot depend on their location or sequence.

Communal variables are supported by MASM primarily for compatibility with communal variables in Microsoft C. Communal variables are not used in any other Microsoft language, and they are not compatible with C++ and some other languages.

COMM declares a data variable external and instructs the linker to allocate the variable if it has not been explicitly defined in a module. The memory space for communal variables may not be assigned until load time, so using communal variables may reduce the size of your executable file. The COMM declaration has the syntax:

COMM [[langtype]] [[NEAR | FAR]] label:type[[:count]]

The label is the name of the variable. The langtype sets the naming conventions for the name it precedes. It overrides any language specified in the .MODEL directive.
If NEAR or FAR is not specified, the variable determines the default from the current memory model (NEAR for TINY, SMALL, COMPACT, and FLAT; FAR for MEDIUM, LARGE, and HUGE). If you do not provide a memory model with the .MODEL directive, you must specify a distance when accessing a communal variable, like this:
            mov     ax, NEAR PTR CommNear
            mov     bx, FAR PTR CommFar
The type can be a constant expression, but it is usually a type such as BYTE, WORD, or DWORD, or a structure, union, or record. If you first declare the type with TYPEDEF, CodeView can provide type information. The count is the number of elements. If no count is given, one element is assumed. The following example creates the far variable DataBlock, which is a 1,024-element array of uninitialized signed doublewords:
COMM FAR DataBlock:SDWORD:1024
Note C variables declared outside functions (except static variables) are communal unless explicitly initialized; they are the same as assembly-language communal variables. If you are writing assembly-language modules for C, you can declare the same communal variables in both C and MASM include files. However, communal variables in C do not have to be declared communal in assembler. The linker will match the EXTERN, PUBLIC, and COMM statements for the variable. EXTERNDEF (explained in the previous section) is more flexible than COMM because you can initialize variables defined with it, and your code can rely on the position and sequence of the defined data.
Positioning External Declarations

Although LINK determines the actual address of an external symbol, the assembler assumes a default segment for the symbol, based on the location of the external directive in the source code. You should therefore position EXTERN and EXTERNDEF directives according to these rules:
- If you know which segment defines an external symbol, put the EXTERN statement in that segment.
- If you know the group but not the segment, position the EXTERN statement outside any segment and reference the variable with the group name. For example, if var1 is in DGROUP, reference the variable as
        mov     DGROUP:var1, 10
- If you know nothing about the location of an external variable, put the EXTERN statement outside any segment. You can use the SEG directive to access the external variable like this:
        mov     ax, SEG var1
        mov     es, ax
        mov     ax, es:var1
- If the symbol is an absolute symbol or a far code label, you can declare it external anywhere in the source code.
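The first and last rules have no example above, so here is a hedged sketch of them. The names maxrows, total, Report, and Tally are hypothetical, and the sketch assumes that total and Report really are defined in the default data and code segments of another small-model module.

```asm
        .MODEL  small

        EXTERN  maxrows:ABS             ; Absolute symbol: may be declared anywhere

        .DATA
        EXTERN  total:WORD              ; Assumed to be defined in this data segment

        .CODE
        EXTERN  Report:NEAR             ; Assumed to be defined in this code segment

Tally   PROC
        mov     ax, total               ; Assembler uses the segment of the EXTERN
        add     ax, maxrows             ; Absolute symbol used as a constant
        call    Report
        ret
Tally   ENDP
        END
```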
Always close any segments opened in include files so that external declarations following an include statement are not incorrectly placed inside a segment. If you want to be certain an external definition lies outside a segment, you can use @CurSeg. The @CurSeg predefined symbol returns a blank if the definition is not in a segment. For example,
        .DATA
        .
        .
        .
@CurSeg ENDS                            ; Close segment
        EXTERNDEF var:WORD
For information about predefined symbols such as @CurSeg, see Predefined Symbols, page 10.
Using Alternatives to Include Files

If your project uses only two modules (or if it is written with a version of MASM prior to 6.0), you may want to continue using PUBLIC in the defining module and EXTERN in the referencing module, and not create an include file for the project. The EXTERN directive can be used in an include file, but the include file containing EXTERN cannot be added to the module that contains the corresponding PUBLIC directive for that symbol. This section assumes that you are not using include files.
PUBLIC and EXTERN

The PUBLIC and EXTERN directives are less flexible than EXTERNDEF and PROTO because they are module-specific: PUBLIC must appear in the defining module and EXTERN must appear in the calling modules. This section shows how to use PUBLIC and EXTERN. Information on where to place the external declarations in your file is in Positioning External Declarations, previous.
The PUBLIC directive makes a name visible outside the module in which it is defined. This gives other program modules access to that identifier. The EXTERN directive performs the complementary function. It tells the assembler that a name referenced within a particular module is actually defined and declared public in another module that will be specified at link time.
A PUBLIC directive can appear anywhere in a file. Its syntax is:
PUBLIC [[langtype]] name[[, [[langtype]] name]]...
The name must be the name of an identifier defined within the current source file. Only code labels, data labels, procedures, and numeric equates can be declared public. If you specify the langtype field here, it overrides the language specified by .MODEL. The langtype field can be C, SYSCALL, STDCALL, PASCAL, FORTRAN, or BASIC. For more information on specifying langtype types, see Declaring Parameters with the PROC Directive, page 184, and Naming and Calling Conventions, page 308.
The EXTERN directive tells the assembler that an identifier is external, that is, defined in some other module that will be supplied at link time. Its syntax is:
EXTERN [[langtype]] name:{ABS | qualifiedtype}
Data Types, page 14, describes qualifiedtype. You can use the ABS (absolute) keyword only with external numeric constants. ABS causes the identifier to be imported as a relocatable unsized constant. This identifier can then be used anywhere a constant can be used. If the identifier is not found in another module at link time, the linker generates an error.
In the following example, the procedure BuildTable and the variable Var are declared public. The procedure uses the Pascal naming and data-passing conventions:
Figure 8.3 Using PUBLIC and EXTERN
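A minimal sketch of such a pair of modules, written for illustration rather than taken from Figure 8.3, might look like the following. The file names TABLE.ASM and MAIN.ASM and the body of BuildTable are assumptions; only the names BuildTable and Var and the Pascal convention come from the text.

```asm
; TABLE.ASM (hypothetical defining module)
        .MODEL  small, pascal
        PUBLIC  BuildTable, Var         ; Make both names visible to other modules
        .DATA
Var     WORD    0
        .CODE
BuildTable PROC
        mov     Var, 1                  ; Placeholder body for illustration
        ret
BuildTable ENDP
        END

; MAIN.ASM (hypothetical referencing module, a separate source file)
        .MODEL  small, pascal
        .STACK
        .DATA
        EXTERN  Var:WORD                ; Declared in the segment that defines it
        .CODE
        EXTERN  BuildTable:NEAR
        .STARTUP
        call    BuildTable
        mov     ax, Var
        .EXIT
        END
```

Each EXTERN sits in the segment where the symbol is defined, following the positioning rules given earlier in this chapter.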
Other Alternatives
You can also use the directives discussed earlier (EXTERNDEF, PROTO, and COMM) without the include file. In this case, place the declarations to make a symbol global in the same module where the symbol is defined. You might want to use this technique if you are linking only a few modules that have very little data in common.
Developing Libraries
As you create reusable procedures, you can place them in a library file for convenient access. Although you can put any routine into a library, each library file, recognizable by its .LIB extension, usually contains related routines. For example, you might place string-manipulation functions in one library, matrix calculations in another, and port communications in another. Do not place communal variables (defined with the COMM directive) in a library. A library consists of combined object modules, each created from a single source file. The object module is the smallest independent unit in a library. If you link with one symbol in a module, the linker adds the entire module to your program, but not the entire library.
Associating Libraries with Modules

You can choose either of two methods for associating your libraries with the modules that use them: you can use the INCLUDELIB directive inside your source files, or link the modules from the command line.
To associate a specified library with your object code, use INCLUDELIB. You can add this directive to the source file to specify the libraries you want linked, rather than specifying them in the LINK command line. The INCLUDELIB syntax is:
INCLUDELIB libraryname
The libraryname can be a file name or a complete path specification. If you do not specify an extension, .LIB is assumed. The libraryname is placed in the comment record of the object file. LINK reads this record and links with the specified library file. For example, the statement
INCLUDELIB GRAPHICS
passes a message from the assembler to the linker telling LINK to use library routines from the file GRAPHICS.LIB. If you place this statement in the source file DRAW.ASM and GRAPHICS.LIB is in the same directory, you can assemble and link the program with the following command:
ML DRAW.ASM
Without the INCLUDELIB directive, you must link the program DRAW.ASM with either of the following commands:
ML DRAW.ASM GRAPHICS.LIB
ML DRAW /link GRAPHICS
If you want to assemble and link separately, type

ML /c DRAW.ASM
LINK DRAW,,,GRAPHICS
If you do not specify a complete path in the INCLUDELIB statement or at the command line, LINK searches for the library file in the following order:
1. In the current directory.
2. In any directories in the library field of the LINK command line.
3. In any directories specified by the LIB environment variable.
The LIB.EXE utility helps you create, organize, and maintain run-time libraries. Refer to Environment and Tools for instructions on LIB.EXE.
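To show where the directive sits in a source file, here is a hedged sketch of what DRAW.ASM might contain. The routine name DrawBox is hypothetical; it stands for any procedure assumed to be supplied by GRAPHICS.LIB.

```asm
; DRAW.ASM: sketch of a source file that names its own library
        .MODEL  small, c
        INCLUDELIB GRAPHICS             ; No extension given, so .LIB is assumed
        .STACK
        .CODE
        EXTERN  DrawBox:NEAR            ; Hypothetical routine in GRAPHICS.LIB
        .STARTUP
        call    DrawBox
        .EXIT
        END
```

With the directive in place, the single command ML DRAW.ASM is enough, provided LINK can find GRAPHICS.LIB in its search order.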
Using EXTERN with Library Routines

In some cases, EXTERN helps you limit the size of your executable file by specifying in the syntax an alternative name for a procedure. You would use this form of the EXTERN directive when declaring a procedure or symbol that may not need to be used. The syntax looks like this:
EXTERN [[langtype]] name [[ (altname) ]] :qualifiedtype
The addition of the altname to the syntax provides the name of an alternate procedure that the linker uses to resolve the external reference if the procedure given by name is not needed. Both name and altname must have the same qualifiedtype.
When the linker encounters an external definition for a procedure that gives an altname, the linker finishes processing that module before it links the object module that contains the procedure given by name. If none of the linked modules references any symbol in the object module that defines name, the linker uses altname to satisfy the external reference. This saves space because the library object module is not brought in.
For example, assume that the contents of STARTUP.ASM include these statements:
EXTERN  init(dummy):PROC
        .
        .
        .
dummy   PROC                            ; A procedure definition containing no
        .                               ; executable code
        .
        .
        ret
dummy   ENDP
        .
        .
        .
        call    init                    ; Defined in FLOAT.OBJ
In this example, the reference to the routine init (defined in FLOAT.OBJ) does not force the module FLOAT.OBJ to be linked into the executable file. If another reference causes FLOAT.OBJ to be linked into the executable file, then init will refer to the init label in FLOAT.OBJ. If there are no references that force linkage with FLOAT.OBJ, the linker will use the alternate name given for init (dummy).
CHAPTER 9
Using Macros
A macro is a symbolic name you give to a series of characters (a text macro) or to one or more statements (a macro procedure or function). As the assembler evaluates each line of your program, it scans the source code for names of previously defined macros. When it finds one, it substitutes the macro text for the macro name. In this way, you can avoid writing the same code several places in your program. This chapter describes the following types of macros:
- Text macros, which expand to text within a source statement.
- Macro procedures, which expand to one or more complete statements and can optionally take parameters.
- Repeat blocks, which generate a group of statements a specified number of times or until a specified condition becomes true.
- Macro functions, which look like macro procedures and can be used like text macros but which also return a value.
- Predefined macro functions and string directives, which perform string operations.
This chapter explains how to use macros for simple code substitutions and how to write sophisticated macros with parameter lists and repeat loops. It also describes how to use these features in conjunction with local symbols, macro operators, and predefined macro functions.
Text Macros
You can give a sequence of characters a symbolic name and then use the name in place of the text later in the source code. The named text is called a text macro. The TEXTEQU directive defines a text macro, as these syntax forms show:
name TEXTEQU <text>
name TEXTEQU macroId | textmacro
name TEXTEQU %constExpr
In the previous lines, text is a sequence of characters enclosed in angle brackets, macroId is a previously defined macro function, textmacro is a previously defined text macro, and %constExpr is an expression that evaluates to text. Here are some examples:
msg     TEXTEQU <Some text>             ; Text assigned to symbol
string  TEXTEQU msg                     ; Text macro assigned to symbol
msg     TEXTEQU <Some other text>       ; New text assigned to symbol
value   TEXTEQU %(3 + num)              ; Text representation of resolved
                                        ; expression assigned to symbol
The first line assigns text to the symbol msg. The second line equates the text of the msg text macro with a new text macro called string. The third line assigns new text to msg. Although msg has new text, string retains its original text value. The fourth line assigns 7 to value if num equals 4. If a text macro expands to another text macro (or macro function, as discussed on page 248), the resulting text macro will expand recursively. Text macros are useful for naming strings of text that do not evaluate to integers. For example, you might use a text macro to name a floating-point constant or a bracketed expression. Here are some practical examples:
pi      TEXTEQU <3.1416>                ; Floating point constant
WPT     TEXTEQU <WORD PTR>              ; Sequence of key words
arg1    TEXTEQU <[bp+4]>                ; Bracketed expression
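As a brief sketch of how these three text macros might then be used, the fragment below places each one in a statement and notes the text the assembler sees after substitution. The variable name piValue and the surrounding statements are assumptions added for illustration.

```asm
        .MODEL  small

pi      TEXTEQU <3.1416>                ; Floating point constant
WPT     TEXTEQU <WORD PTR>              ; Sequence of key words
arg1    TEXTEQU <[bp+4]>                ; Bracketed expression

        .DATA
piValue REAL4   pi                      ; Assembled as: REAL4 3.1416

        .CODE
        mov     ax, arg1                ; Assembled as: mov ax, [bp+4]
        mov     WPT [bx], 0             ; Assembled as: mov WORD PTR [bx], 0
        END
```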
Macro Procedures
If your program must perform the same task many times, you can avoid repeatedly typing the same statements each time by writing a macro procedure. Think of macro procedures (commonly called macros) as text-processing mechanisms that automatically generate repeated text.
This section uses the term macro procedure rather than macro when necessary to distinguish between a macro procedure and a macro function. Macro functions are described in Returning Values with Macro Functions. Conforming to common usage, this chapter occasionally speaks of "calling a macro," a term that deserves further scrutiny. It's natural to think of a program calling a macro procedure in the same way it calls a normal subroutine procedure, because they seem to perform identically. However, a macro is simply a representative for real code. Wherever a macro name appears in your program, so in reality does all the code the macro represents. A macro does not cause the processor to vector off to a new location as does a normal procedure. Thus, the expression "calling a macro" may imply the effect, but does not accurately describe what actually occurs.
Creating Macro Procedures

You can define a macro procedure without parameters by placing the desired statements between the MACRO and ENDM directives:
name MACRO
statements
ENDM
For example, suppose you want a program to beep when it encounters certain errors. You could define a beep macro as follows:
beep    MACRO
        mov     ah, 2                   ;; Select DOS Print Char function
        mov     dl, 7                   ;; Select ASCII 7 (bell)
        int     21h                     ;; Call DOS
        ENDM
The double semicolons mark the beginning of macro comments. Macro comments appear in a listing file only at the macro's initial definition, not at the point where the macro is referenced and expanded. Listings are usually easier to read if the comments aren't repeatedly expanded. However, regular comments (those with a single semicolon) are listed in macro expansions. See Appendix C for listing files and examples of how macros are expanded in listings. Once you define a macro, you can call it anywhere in the program by using the macro's name as a statement. The following example calls the beep macro two times if an error flag has been set.
        .IF     error                   ; If error flag is true
        beep                            ; execute macro two times
        beep
        .ENDIF
During assembly, the instructions in the macro replace the macro reference. The listing file shows:
                                .IF     error
0017  80 3E 0000 R 00      *    cmp     error, 000h
001C  74 0C                *    je      @C0001
                                beep
001E  B4 02             1       mov     ah, 2
0020  B2 07             1       mov     dl, 7
0022  CD 21             1       int     21h
                                beep
0024  B4 02             1       mov     ah, 2
0026  B2 07             1       mov     dl, 7
0028  CD 21             1       int     21h
                                .ENDIF
002A                       *@C0001:
Contrast this with the results of defining beep as a procedure using the PROC directive and then calling it with the CALL instruction. Many such tasks can be handled as either a macro or a procedure. In deciding which method to use, you must choose between speed and size. For repetitive tasks, a procedure produces smaller code, because the instructions physically appear only once in the assembled program. However, each call to the procedure involves the additional overhead of a CALL and RET instruction. Macros do not require a change in program flow and so execute faster, but generate the same code multiple times rather than just once.
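For comparison, here is a minimal sketch of the procedure version. The name BeepProc and the definition of the error flag as a byte variable are assumptions made for illustration.

```asm
        .MODEL  small
        .STACK
        .DATA
error   BYTE    1                       ; Hypothetical error flag

        .CODE
BeepProc PROC
        mov     ah, 2                   ; Select DOS Print Char function
        mov     dl, 7                   ; Select ASCII 7 (bell)
        int     21h                     ; Call DOS
        ret
BeepProc ENDP

        .STARTUP
        .IF     error                   ; If error flag is true
        call    BeepProc                ; The three instructions exist once;
        call    BeepProc                ; each use adds a CALL and a RET
        .ENDIF
        .EXIT
        END
```

Here the body is assembled only once no matter how many calls are made, whereas the macro listing above repeats the same six bytes for every reference.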
Passing Arguments to Macros

By defining parameters for macros, you can define a general task and then execute variations of it by passing different arguments each time you call the macro. The complete syntax for a macro procedure includes a parameter list:
name MACRO parameterlist
statements
ENDM
The parameterlist can contain any number of parameters. Use commas to separate each parameter in the list. You cannot use reserved words as parameter names unless you disable the keyword with OPTION NOKEYWORD. You must also set the compatibility mode with OPTION M510 or the /Zm command-line option.
To pass arguments to a macro, place the arguments after the macro name when you call the macro:
macroname arglist
The assembler treats as one item all text between matching quotation marks in an arglist. The beep macro introduced in the previous section used the MS-DOS interrupt to write only the bell character (ASCII 7). We can rewrite the macro with a parameter that accepts any character:
writechar MACRO char
        mov     ah, 2                   ;; Select DOS Print Char function
        mov     dl, char                ;; Select ASCII char
        int     21h                     ;; Call DOS
        ENDM
Whenever it expands the macro, the assembler replaces each instance of char with the given argument value. The rewritten macro now writes any character to the screen, not just ASCII 7:
        writechar 7                     ; Causes computer to beep
        writechar 'A'                   ; Writes A to screen
If you pass more arguments than there are parameters, the additional arguments generate a warning (unless you use the VARARG keyword; see page 242). If you pass fewer arguments than the macro procedure expects, the assembler assigns empty strings to the remaining parameters (unless you have specified default values). This may cause errors. For example, a reference to the writechar macro with no argument results in the following line:
mov dl,
The assembler generates an error for the expanded statement but not for the macro definition or the macro call. You can make macros more flexible by leaving off arguments or adding additional arguments. The next section tells some of the ways your macros can handle missing or extra arguments.
Specifying Required and Default Parameters

Macro parameters can have special attributes to make them more flexible and improve error handling. You can make parameters required, give them default values, or vary their number. Variable parameters are used almost exclusively with the FOR directive, so are covered in FOR Loops and Variable-Length Parameters, later in this chapter.
The syntax for a required parameter is:
parameter:REQ
For example, you can rewrite the writechar macro to require the char parameter:
writechar MACRO char:REQ
        mov     ah, 2                   ;; Select DOS Print Char function
        mov     dl, char                ;; Select ASCII char
        int     21h                     ;; Call DOS
        ENDM
If the call does not include a matching argument, the assembler reports the error in the line that contains the macro reference. REQ can thus improve error reporting.
You can also accommodate missing parameters by specifying a default value, like this:
parameter:=textvalue
Suppose that you often use writechar to beep by printing ASCII 7. The following macro definition uses an equal sign to tell the assembler to assume the parameter char is 7 unless you specify otherwise:
writechar MACRO char:=<7>
        mov     ah, 2                   ;; Select DOS Print Char function
        mov     dl, char                ;; Select ASCII char
        int     21h                     ;; Call DOS
        ENDM
If a reference to this macro does not include the argument char, the assembler fills in the blank with the default value of 7 and the macro beeps when called. Enclose the default parameter value in angle brackets so the assembler recognizes the supplied value as a text value. This is explained in detail in Text Delimiters and the Literal-Character Operator, later in this chapter. Missing arguments can also be handled with the IFB, IFNB, .ERRB, and .ERRNB directives. They are described in the section Conditional Directives in chapter 1 and in Help. Here is a slightly more complex macro that uses some of these techniques:

Scroll  MACRO   distance:REQ, attrib:=<7>, tcol, trow, bcol, brow
        IFNB    <tcol>                  ;; Ignore arguments if blank
        mov     cl, tcol
        ENDIF
        IFNB    <trow>
        mov     ch, trow
        ENDIF
        IFNB    <bcol>
        mov     dl, bcol
        ENDIF
        IFNB    <brow>
        mov     dh, brow
        ENDIF
        IFDIFI  <attrib>, <bh>          ;; Don't move BH onto itself
        mov     bh, attrib
        ENDIF
        IF      distance LE 0           ;; Negative scrolls up, positive down
        mov     ax, 0600h + (-(distance) AND 0FFh)
        ELSE
        mov     ax, 0700h + (distance AND 0FFh)
        ENDIF
        int     10h
        ENDM
In this macro, the distance parameter is required. The attrib parameter has a default value of 7 (white on black), but the macro also tests to make sure the corresponding argument isn't BH, since it would be inefficient (though legal) to load a register onto itself. The IFNB directive is used to test for blank arguments. These are ignored to allow the user to manipulate rows and columns directly in registers CX and DX at run time. The following shows two valid ways to call the macro:
; Assume DL and CL already loaded
        dec     dh                      ; Decrement top row
        inc     ch                      ; Increment bottom row
        Scroll  -3                      ; Scroll white on black dynamic
                                        ; window up three lines
        Scroll  5, 17h, 2, 2, 14, 12    ; Scroll white on blue constant
                                        ; window down five lines
This macro can generate completely different code, depending on its arguments. In this sense, it is not comparable to a procedure, which always has the same code regardless of arguments.
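The IFB and .ERRB directives mentioned earlier offer two more ways to handle a missing argument. The sketch below applies them to the writechar macro from the previous section. The error message text and the macro name writeOrBeep are hypothetical, introduced only for this illustration.

```asm
; Variant 1: generate an assembly error if the argument is blank
writechar MACRO char
        .ERRB   <char>, <writechar needs a character argument>
        mov     ah, 2                   ;; Select DOS Print Char function
        mov     dl, char                ;; Select ASCII char
        int     21h                     ;; Call DOS
        ENDM

; Variant 2 (hypothetical name): fall back to the bell if the argument is blank
writeOrBeep MACRO char
        mov     ah, 2                   ;; Select DOS Print Char function
        IFB     <char>
        mov     dl, 7                   ;; No argument: use ASCII 7 (bell)
        ELSE
        mov     dl, char                ;; Argument given: use it
        ENDIF
        int     21h                     ;; Call DOS
        ENDM
```

The first variant behaves much like char:REQ but lets you choose the message. The second reproduces the effect of a default value while leaving room for code that differs between the two cases.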
Defining Local Symbols in Macros

You can make a symbol local to a macro by identifying it at the start of the macro with the LOCAL directive. Any identifier may be declared local. You can choose whether you want numeric equates and text macros to be local or global. If a symbol will be used only inside a particular macro, you can declare it local so that the name will be available for other declarations outside the macro. You must declare as local any labels within a macro, since a label can occur only once in the source. The LOCAL directive makes a special instance of the label each time the macro appears. This prevents redefinition of the label when expanding the macro. It also allows you to reuse the label elsewhere in your code. You must declare all local symbols immediately following the MACRO statement (although blank lines and comments may precede the local symbol). Separate each symbol with a comma. You can attach comments to the LOCAL statement and list multiple LOCAL statements in the macro. Here is an example macro that declares local labels:
power   MACRO   factor:REQ, exponent:REQ
        LOCAL   again, gotzero          ;; Local symbols
        sub     dx, dx                  ;; Clear top
        mov     ax, 1                   ;; Multiply by one on first loop
        mov     cx, exponent            ;; Load count
        jcxz    gotzero                 ;; Done if zero exponent
        mov     bx, factor              ;; Load factor
again:
        mul     bx                      ;; Multiply factor times exponent
        loop    again                   ;; Result in AX
gotzero:
        ENDM
If the labels again and gotzero were not declared local, the macro would work the first time it is called, but it would generate redefinition errors on subsequent calls. MASM implements local labels by generating different names for them each time the macro is called. You can see this in listing files. The labels in the power macro might be expanded to ??0000 and ??0001 on the first call and to ??0002 and ??0003 on the second.
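To make the renaming concrete, the sketch below repeats the power macro, calls it twice, and shows in comments what the first expansion might look like. The generated names are only the ones suggested above; the exact numbers depend on how many local symbols the assembler has already created.

```asm
        .MODEL  small

power   MACRO   factor:REQ, exponent:REQ
        LOCAL   again, gotzero          ;; Local symbols
        sub     dx, dx                  ;; Clear top
        mov     ax, 1                   ;; Multiply by one on first loop
        mov     cx, exponent            ;; Load count
        jcxz    gotzero                 ;; Done if zero exponent
        mov     bx, factor              ;; Load factor
again:
        mul     bx                      ;; Multiply AX by the factor
        loop    again                   ;; Result in AX
gotzero:
        ENDM

        .CODE
        power   2, 8                    ; First call: leaves 256 in AX
; Possible expansion of the first call:
;       sub     dx, dx
;       mov     ax, 1
;       mov     cx, 8
;       jcxz    ??0001
;       mov     bx, 2
; ??0000:
;       mul     bx
;       loop    ??0000
; ??0001:

        power   3, 4                    ; Second call: labels might become
                                        ; ??0002 and ??0003
        END
```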
You should avoid using anonymous labels in macros (see Anonymous Labels in Chapter 7). Although legal, they can produce unwanted results if you expand a macro near another anonymous label. For example, consider what happens in the following:
Update  MACRO   arg1
@@:     .
        .
        .
        loop    @B
        ENDM
        .
        .
        .
        jcxz    @F
        Update  ax
@@:
Expanding Update places another anonymous label between the jump and its target. The line
jcxz @F
consequently jumps to the start of the loop rather than over the loop, which is exactly the opposite of what the programmer intended.
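One way to avoid the problem, sketched here under the assumption that the loop needs only a single label, is to replace the anonymous label inside the macro with a LOCAL one. The label name top and the inc instruction standing in for the elided loop body are hypothetical.

```asm
Update  MACRO   arg1
        LOCAL   top                     ;; Unique label for each expansion
top:
        inc     arg1                    ;; Placeholder for the elided loop body
        loop    top
        ENDM

        jcxz    @F                      ; Skip the loop when CX is zero
        Update  ax
@@:                                     ; Still the nearest anonymous label
```

Because the expansion no longer contains an anonymous label, jcxz @F reaches the label after the macro reference as intended.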
Assembly-Time Variables and Macro Operators

In writing macros, you will often assign and modify values assigned to symbols. Think of these symbols as assembly-time variables. Like memory variables, they are symbols that represent values. But since macros are processed at assembly time, any symbol modified in a macro must be resolved as a constant by the end of assembly. The three kinds of assembly-time variables are:
- Macro parameters
- Text macros
- Macro functions
When the assembler expands a macro, it processes the symbols in the order shown here. MASM first replaces macro parameters with the text of their actual arguments, then expands text macros.
Macro parameters are similar to procedure parameters in some ways, but they also have important differences. In a procedure, a parameter has a type and a memory location. Its value can be modified within the procedure. In a macro, a parameter is a placeholder for the argument text. The value can only be assigned to another symbol or used directly; it cannot be modified. The macro may interpret the argument text it receives either as a numeric value or as a text value. It is important to understand the difference between text values and numeric values. Numeric values can be processed with arithmetic operators and assigned to numeric equates. Text values can be processed with macro functions and assigned to text macros. Macro operators are often helpful when processing assembly-time variables. Table 9.1 shows the macro operators that MASM provides.
Table 9.1 MASM Macro Operators
Symbol  Name                        Description
<>      Text Delimiters             Opens and closes a literal string.
!       Literal-Character Operator  Treats the next character as a literal character, even if it would normally have another meaning.
%       Expansion Operator          Causes the assembler to expand a constant expression or text macro.
&       Substitution Operator       Tells the assembler to replace a macro parameter or text macro name with its actual value.
The next sections explain these operators in detail.
Text Delimiters and the Literal-Character Operator

The angle brackets (< >) are text delimiters. A text value is usually delimited when assigning a text macro. You can do this with TEXTEQU, as previously shown, or with the SUBSTR and CATSTR directives discussed in String Directives and Predefined Functions, later in this chapter. By delimiting the text of macro arguments, you can pass text that includes spaces, commas, semicolons, and other special characters. The following example expands a macro called work in two different ways:
        work    <1, 2, 3, 4, 5>         ; Passes one argument with 13 chars,
                                        ; including commas and spaces
        work    1, 2, 3, 4, 5           ; Passes five arguments, each
                                        ; with 1 character
The literal-character operator (!) lets you include angle brackets as part of a delimited text value, so the assembler does not interpret them as delimiters. The assembler treats the character following ! literally rather than as a special character, like this:
errstr TEXTEQU <Expression !> 255> ; errstr = Expression > 255
Text delimiters also have a special use with the FOR directive, as explained in FOR Loops and Variable-Length Parameters, later in this chapter.
Expansion Operator
The expansion operator (%) expands text macros or converts constant expressions into their text representations. It performs these tasks differently in different contexts, as discussed in the following.
Converting Numeric Expressions to Text

The expansion operator can convert numbers to text. The operator forces immediate evaluation of a constant expression and replaces it with a text value consisting of the digits of the result. The digits are generated in the current radix (default decimal). This application of the expansion operator is useful when defining a text macro, as the following lines show. Notice how you can enclose expressions with parentheses to make them more readable:
a       TEXTEQU <3 + 4>                 ; a = 3 + 4
b       TEXTEQU %3 + 4                  ; b = 7
c       TEXTEQU %(3 + 4)                ; c = 7
When assigning text macros, you can use numeric equates in the constant expressions, but not text macros:
num     EQU     4                       ; num = 4
numstr  TEXTEQU <4>                     ; numstr = <4>
a       TEXTEQU %3 + num                ; a = <7>
b       TEXTEQU %3 + numstr             ; b = <7>
The expansion operator gives you flexibility when passing arguments to macros. It lets you pass a computed value rather than the literal text of an expression. The following example illustrates by defining a macro
work    MACRO   arg
        mov     ax, arg * 4
        ENDM
which accepts different arguments:

        work    2 + 3                   ; Passes 2 + 3
                                        ; Code: mov ax, 2 + (3 * 4)
        work    %2 + 3                  ; Passes 5
                                        ; Code: mov ax, 5 * 4
        work    2 + num                 ; Passes 2 + num
        work    %2 + num                ; Passes 6
        work    2 + numstr              ; Passes 2 + numstr
        work    %2 + numstr             ; Passes 6
You must consider operator precedence when using the expansion operator. Parentheses inside the macro can force evaluation in a desired order:
work    MACRO   arg
        mov     ax, (arg) * 4
        ENDM

        work    2 + 3                   ; Code: mov ax, (2 + 3) * 4
        work    %2 + 3                  ; Code: mov ax, (5) * 4
Several other uses for the expansion operator are reviewed in Returning Values with Macro Functions, later in this chapter.
Expansion Operator as First Character on a Line

The expansion operator has a different meaning when used as the first character on a line. In this case, it instructs the assembler to expand any text macros and macro functions it finds on the rest of the line. This feature makes it possible to use text macros with directives such as ECHO, TITLE, and SUBTITLE, which take an argument consisting of a single text value. For instance, ECHO displays its argument to the standard output device during assembly. Such expansion can be useful for debugging macros and expressions, but the requirement that its argument be a single text value may have unexpected results. Consider this example:
ECHO Bytes per element: %(SIZEOF array / LENGTHOF array)
Instead of evaluating the expression, this line echoes it:

Bytes per element: %(SIZEOF array / LENGTHOF array)
However, you can achieve the desired result by assigning the text of the expression to a text macro and then using the expansion operator at the beginning of the line to force expansion of the text macro.

temp    TEXTEQU %(SIZEOF array / LENGTHOF array)
%       ECHO    Bytes per element: temp
Note that you cannot get the same results simply by putting the % at the beginning of the first echo line, because % expands only text macros, not numeric equates or constant expressions. Here are more examples of the expansion operator at the start of a line:
; Assume memmod, lang, and os specified with /D option
%       SUBTITLE Model: memmod  Language: lang  Operating System: os

; Assume num defined earlier
tnum    TEXTEQU %num
%       .ERRE   num LE 255, <Failed because tnum !> 255>
Substitution Operator
References to a parameter within a macro can sometimes be ambiguous. In such cases, the assembler may not expand the argument as you intend. The substitution operator (&) lets you identify unambiguously any parameter within a macro. As an example, consider the following macro:
errgen  MACRO   num, msg
        PUBLIC  errnum
errnum  BYTE    "Error num: msg"
        ENDM
This macro is open to several interpretations:

- Is errnum a distinct word or the word err next to the parameter num?
- Should num and msg within the string be treated literally as part of the string or as arguments?
In each case, the assembler chooses the most literal interpretation. That is, it treats errnum as a distinct word, and num and msg as literal parts of the string. The substitution operator can force different interpretations. If we rewrite the macro with the & operator, it looks like this:
errgen  MACRO   num, msg
        PUBLIC  err&num
err&num BYTE    "Error &num: &msg"
        ENDM
When called with the following arguments,

errgen 5, <Unreadable disk>
the macro now generates this code:

        PUBLIC  err5
err5    BYTE    "Error 5: Unreadable disk"
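As a short sketch of why this is useful, the same macro can build several uniquely named public messages in a row. The second call and its message text are hypothetical, added only to show that each reference produces its own symbol.

```asm
        .MODEL  small

errgen  MACRO   num, msg
        PUBLIC  err&num
err&num BYTE    "Error &num: &msg"
        ENDM

        .DATA
        errgen  5, <Unreadable disk>    ; Defines err5
        errgen  6, <Drive not ready>    ; Defines err6 (hypothetical message)
        END
```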
When it encounters the & operator, the assembler interprets subsequent text as a parameter name until the next & or until the next separator character (such as a space, tab, or comma). Thus, the assembler correctly parses the expression err&num because num is delimited by & and a space. The expression could also be written as err&num&, which again unambiguously identifies num as a parameter. The rule also works in reverse. You can delimit a parameter reference with & at the end rather than at the beginning. For example, if num is 5, the expression num&12 resolves to 512. The assembler processes substitution operators from left to right. This can have unexpected results when you are pasting together two macro parameters. For example, if arg1 has the value var and arg2 has the value 3, you could paste them together with this statement:
&arg1&&arg2& BYTE "Text"
Eliminating extra substitution operators, you might expect the following to be equivalent:
&arg1&arg2 BYTE "Text"
However, this actually produces the symbol vararg2, because in processing from left to right, the assembler associates both the first and the second & symbols with the first parameter. The assembler replaces &arg1& by var, producing vararg2. The arg2 is never evaluated. The correct abbreviation is:
    arg1&&arg2      BYTE    "Text"
which produces the desired symbol var3. The symbol arg1&&arg2 is replaced by var&arg2, which is replaced by var3. The substitution operator is also necessary if you want to substitute a text macro inside quotes. For example,

    arg     TEXTEQU <hello>
    %echo   This is a string "&arg"     ; Produces: This is a string "hello"
    %echo   This is a string "arg"      ; Produces: This is a string "arg"
You can also use the substitution operator in lines beginning with the expansion operator (%) symbol, even outside macros (see page 236). It may be necessary to use the substitution operator to paste text macro names to adjacent characters or symbol names, as shown here:
    text    TEXTEQU <var>
    value   TEXTEQU %5
    %       ECHO    textvalue is text&&value
This echoes the message

textvalue is var5
Macro substitution always occurs before evaluation of the high-level control structures. The assembler may therefore mistake a bit-test operator (&) in your macro for a substitution operator. You can guarantee the assembler correctly recognizes a bit-test operator by enclosing its operands in parentheses, as shown here:
    test    MACRO   x
            .IF     ax==&x          ; &x substituted with parameter value
            mov     ax, 10
            .ELSEIF ax&(x)          ; & is bitwise AND
            mov     ax, 20
            .ENDIF
            ENDM
The rules for using the substitution operator have changed significantly since MASM 5.1, making macro behavior more consistent and flexible. If you have macros written for MASM 5.1 or earlier, you can specify the old behavior by using OLDMACROS or M510 with the OPTION directive (see page 24).
Defining Repeat Blocks with Loop Directives

A repeat block is an unnamed macro defined with a loop directive. The loop directive generates the statements inside the repeat block a specified number of times or until a given condition becomes true. MASM provides several loop directives, which let you specify the number of loop iterations in different ways. Some loop directives can also accept arguments for each iteration. Although the number of iterations is usually specified in the directive, you can use the EXITM directive to exit the loop early.
Repeat blocks can be used outside macros, but they frequently appear inside macro definitions to perform some repeated operation in the macro. Since repeat blocks are macros themselves, they end with the ENDM directive.
This section explains the following four loop directives: REPEAT, WHILE, FOR, and FORC. In versions of MASM prior to 6.0, REPEAT was called REPT, FOR was called IRP, and FORC was called IRPC. MASM 6.1 recognizes the old names. The assembler evaluates repeat blocks on the first pass only. You should therefore avoid using address spans as loop counters, as in this example:
REPEAT (OFFSET label1 - OFFSET label2) ; Don't do this!
Since the distance between two labels may change on subsequent assembly passes as the assembler optimizes code, you should not assume that address spans remain constant between passes.

Note: The REPEAT and WHILE directives should not be confused with the .REPEAT and .WHILE directives (see "Loop-Generating Directives" in Chapter 7), which generate loop and jump instructions for run-time program control.
REPEAT Loops
REPEAT is the simplest loop directive. It specifies the number of times to generate the statements inside the macro. The syntax is:

    REPEAT constexpr
    statements
    ENDM

The constexpr can be a constant or a constant expression, and must contain no forward references. Since the repeat block expands at assembly time, the number of iterations must be known then.

Here is an example of a repeat block used to generate data. It initializes an array containing sequential ASCII values for all uppercase letters.
    alpha   LABEL   BYTE            ; Name the data generated
    letter  =       'A'             ; Initialize counter
    REPEAT  26                      ;; Repeat for each letter
        BYTE    letter              ;; Allocate ASCII code for letter
        letter  = letter + 1        ;; Increment counter
    ENDM
Here is another use of REPEAT, this time inside a macro:

    beep    MACRO   iter:=<3>
            mov     ah, 2           ;; Character output function
            mov     dl, 7           ;; Bell character
            REPEAT  iter            ;; Repeat number specified by macro
                int 21h             ;; Call DOS
            ENDM
            ENDM
WHILE Loops
The WHILE directive is similar to REPEAT, but the loop continues as long as a given condition is true. The syntax is:

    WHILE expression
    statements
    ENDM

The expression must be a value that can be calculated at assembly time. Normally, the expression uses relational operators, but it can be any expression that evaluates to zero (false) or nonzero (true). Usually, the condition changes during the evaluation of the macro so that the loop won't attempt to generate an infinite amount of code. However, you can use the EXITM directive to break out of the loop.

The following repeat block uses the WHILE directive to allocate variables initialized to calculated values. This is a common technique for generating lookup tables. (A lookup table is any list of precalculated results, such as a table of interest payments or trigonometric values or logarithms. Programs optimized for speed often use lookup tables, since calculating a value often takes more time than looking it up in a table.)
    cubes   LABEL   BYTE                    ;; Name the data generated
    root    =       1                       ;; Initialize root
    cube    =       root * root * root      ;; Calculate first cube
    WHILE   cube LE 32767                   ;; Repeat until result too large
        WORD    cube                        ;; Allocate cube
        root    = root + 1                  ;; Calculate next root and cube
        cube    = root * root * root
    ENDM
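As a cross-check on what this WHILE block generates, the same arithmetic can be modeled outside the assembler. The following is a Python sketch, not MASM code; the helper name `cube_table` is hypothetical and exists only for this illustration.

```python
def cube_table(limit=32767):
    """Mirror the WHILE block: collect cubes while cube <= limit."""
    table = []
    root = 1
    cube = root * root * root
    while cube <= limit:          # WHILE cube LE 32767
        table.append(cube)        # WORD cube
        root += 1                 # root = root + 1
        cube = root * root * root
    return table


cubes = cube_table()
print(len(cubes), cubes[:4], cubes[-1])
```

Under this model the block allocates 31 words (the cubes of 1 through 31), because the cube of 32 is 32768 and fails the LE 32767 test.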
FOR Loops and Variable-Length Parameters

With the FOR directive you can iterate through a list of arguments, working on each of them in turn. It has the following syntax:

    FOR parameter, <argumentlist>
    statements
    ENDM

The parameter is a placeholder that represents the name of each argument inside the FOR block. The argument list must contain comma-separated arguments and must always be enclosed in angle brackets. Here's an example of a FOR block:
    series  LABEL   BYTE
    FOR     arg, <1,2,3,4,5,6,7,8,9,10>
        BYTE    arg DUP (arg)
    ENDM
On the first iteration, the arg parameter is replaced with the first argument, the value 1. On the second iteration, arg is replaced with 2. The result is an array with the first byte initialized to 1, the next 2 bytes initialized to 2, the next 3 bytes initialized to 3, and so on. The argument list is given specifically in this example, but in some cases the list must be generated as a text macro. The value of the text macro must include the angle brackets.
    arglist TEXTEQU <!<3,6,9!>>     ; Generate list as text macro
    %FOR    arg, arglist
        .                           ; Do something to arg
        .
        .
    ENDM
Note the use of the literal character operator (!) to identify angle brackets as characters, not delimiters. See "Text Delimiters (< >) and the Literal-Character Operator," earlier in this chapter.

The FOR directive also provides a convenient way to process macros with a variable number of arguments. To do this, add VARARG to the last parameter to indicate that a single named parameter will have the actual value of all additional arguments. For example, the following macro definition includes the three possible parameter attributes: required, default, and variable.
work MACRO rarg:REQ, darg:=<5>, varg:VARARG
The variable argument must always be last. If this macro is called with the statement
work 4, , 6, 7, a, b
the first argument is received as the value 4, the second is replaced by the default value 5, and the last four are received as the single argument <6, 7, a, b>. This is the same format expected by the FOR directive. The FOR directive discards leading spaces but recognizes trailing spaces. The following macro illustrates variable arguments:
    show    MACRO   chr:VARARG
            mov     ah, 02h
            FOR     arg, <chr>
                mov     dl, arg
                int     21h
            ENDM
            ENDM
When called with

    show    'O', 'K', 13, 10
the macro displays each of the specified characters one at a time. The parameter in a FOR loop can have the required or default attribute. You can modify the show macro to make blank arguments generate errors:
    show    MACRO   chr:VARARG
            mov     ah, 02h
            FOR     arg:REQ, <chr>
                mov     dl, arg
                int     21h
            ENDM
            ENDM
The macro now generates an error if called with

    show    'O',, 'K', 13, 10
Another approach would be to use a default argument:

    show    MACRO   chr:VARARG
            mov     ah, 02h
            FOR     arg:=<' '>, <chr>
                mov     dl, arg
                int     21h
            ENDM
            ENDM
Now calling the macro with

    show    'O',, 'K', 13, 10
inserts the default character, a space, for the blank argument.
FORC Loops
The FORC directive is similar to FOR, but takes a string of text rather than a list of arguments. The statements are assembled once for each character (including spaces) in the string, substituting a different character for the parameter each time through. The syntax looks like this:

    FORC parameter, <text>
    statements
    ENDM

The text must be enclosed in angle brackets. The following example illustrates FORC:
    FORC    arg, <ABCDEFGHIJKLMNOPQRSTUVWXYZ>
        BYTE    '&arg'              ;; Allocate uppercase letter
        BYTE    '&arg' + 20h        ;; Allocate lowercase letter
        BYTE    '&arg' - 40h        ;; Allocate ordinal of letter
    ENDM
Notice that the substitution operator must be used inside the quotation marks to make sure that arg is expanded to a character rather than treated as a literal string. With versions of MASM earlier than 6.0, FORC is often used for complex parsing tasks. A long sentence can be examined character by character. Each character is then either thrown away or pasted onto a token string, depending on whether it is a separator character. The new predefined macro functions and string processing directives discussed in the following section are usually more efficient for these tasks.
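To make the three values allocated per character concrete, the FORC example can be modeled in Python. This is only a sketch of the arithmetic, not MASM code; `forc_bytes` is a hypothetical helper name.

```python
def forc_bytes(text):
    """For each character, list the three byte values the FORC block allocates."""
    out = []
    for ch in text:               # FORC visits every character, spaces included
        code = ord(ch)
        out.append(code)          # uppercase letter
        out.append(code + 0x20)   # lowercase letter (+ 20h)
        out.append(code - 0x40)   # ordinal of letter (- 40h)
    return out


data = forc_bytes("ABC")
print(data)
```

For the letter A (41h), the model yields 41h, 61h (the code for a), and 1 (its position in the alphabet).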
String Directives and Predefined Functions

The assembler provides four directives for manipulating text:
    Directive   Description
    SUBSTR      Assigns part of string to a new symbol.
    INSTR       Searches for one string within another.
    SIZESTR     Determines the size of a string.
    CATSTR      Concatenates one or more strings to a single string.
These directives assign a processed value to a text macro or numeric equate. For example, the following lines
    num     =       7
    newstr  CATSTR  <3 + >, %num, < = > , %3 + num      ; "3 + 7 = 10"
assign the string "3 + 7 = 10" to newstr. CATSTR and SUBSTR assign text in the same way as the TEXTEQU directive. SIZESTR and INSTR assign a number in the same way as the = operator. The four string directives take only text values as arguments. Use the expansion operator (%) when you need to make sure that constants and numeric equates expand to text, as shown in the preceding lines.

Each of the string directives has a corresponding predefined macro function version: @SubStr, @InStr, @SizeStr, and @CatStr. Macro functions are similar to the string directives, but you must enclose their arguments in parentheses. Macro functions return text values and can appear in any context where text is expected. The following section, "Returning Values with Macro Functions," tells how to write your own macro functions. The following example is equivalent to the previous CATSTR example:
    num     =       7
    newstr  TEXTEQU @CatStr( <3 + >, %num, < = > , %3 + num )
Macro functions are often more convenient than their directive counterparts because you can use a macro function as an argument to a string directive or to another macro function. Unlike string directives, predefined macro function names are case sensitive when you use the /Cp command-line option. Each string directive and predefined function acts on a string, which can be any textItem. The textItem can be text enclosed in angle brackets (< >), the name of a text macro, or a constant expression preceded by % (as in %constExpr). Refer to Appendix B, BNF Grammar, for a list of types that textItem can represent.
The following sections summarize the syntax for each of the string directives and functions. The explanations focus on the directives, but the functions work the same except where noted.
SUBSTR
    name SUBSTR string, start[[, length]]
    @SubStr( string, start[[, length]] )

The SUBSTR directive assigns a substring from a given string to the symbol name. The start parameter specifies the position in string, beginning with 1, to start the substring. The length gives the length of the substring. If you do not specify length, SUBSTR returns the remainder of the string, including the start character.
INSTR
    name INSTR [[start,]] string, substring
    @InStr( [[start]], string, substring )

The INSTR directive searches a specified string for an occurrence of substring and assigns its position number to name. The search is case sensitive. The start parameter is the position in string to start the search for substring. If you do not specify start, it is assumed to be position 1, the start of the string. If INSTR does not find substring, it assigns position 0 to name.

The INSTR directive assigns the position value to name as if it were a numeric equate. In contrast, the @InStr function returns the value as a string of digits in the current radix.

The @InStr function has a slightly different syntax than the INSTR directive. You can omit the first argument and its associated comma from the directive. You can leave the first argument blank with the function, but a blank function argument must still have a comma. For example,
pos INSTR <person>, <son>
is the same as
pos = @InStr( , <person>, <son> )
You can also assign the return value to a text macro, like this:
strpos TEXTEQU @InStr( , <person>, <son> )
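The position rules described for SUBSTR and INSTR (positions counted from 1, a result of 0 when nothing is found) can be checked with a small Python model. This is a sketch under those stated rules only; the functions `substr` and `instr` are hypothetical stand-ins, and error cases such as a start position outside the string are not modeled.

```python
def substr(string, start, length=None):
    """Model of SUBSTR: positions start at 1; no length means 'to the end'."""
    begin = start - 1
    if length is None:
        return string[begin:]
    return string[begin:begin + length]


def instr(string, substring, start=1):
    """Model of INSTR: 1-based position of substring, or 0 if it is absent."""
    # str.find is case sensitive and returns -1 on failure, so adding 1
    # maps "not found" to 0 and a 0-based index to a 1-based position.
    return string.find(substring, start - 1) + 1


pos = instr("person", "son")
print(pos, substr("person", pos), substr("person", 1, 3))
```

In this model, searching person for son gives position 4, which matches the value both the directive and the function forms above assign.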
SIZESTR
    name SIZESTR string
    @SizeStr( string )

The SIZESTR directive assigns the number of characters in string to name. An empty string returns a length of zero. The SIZESTR directive assigns the size value to a name as if it were a numeric equate. The @SizeStr function returns the value as a string of digits in the current radix.
CATSTR
    name CATSTR string[, string]...
    @CatStr( string[, string]... )

The CATSTR directive concatenates a list of text values into a single text value and assigns it to name. TEXTEQU is technically a synonym for CATSTR. TEXTEQU is normally used for single-string assignments, while CATSTR is used for multistring concatenations.

The following example pushes and pops one set of registers, illustrating several uses of string directives and functions:

    ; SaveRegs - Macro to generate a push instruction for each
    ; register in argument list. Saves each register name in the
    ; regpushed text macro.
    regpushed TEXTEQU <>                    ;; Initialize empty string

    SaveRegs MACRO regs:VARARG
        LOCAL reg
        FOR reg, <regs>                     ;; Push each register
            push reg                        ;; and add it to the list
            regpushed CATSTR <reg>, <,>, regpushed
        ENDM                                ;; Strip off last comma
        regpushed CATSTR <!<>, regpushed    ;; Mark start of list with <
        regpushed SUBSTR regpushed, 1, @SizeStr( regpushed )
        regpushed CATSTR regpushed, <!>>    ;; Mark end with >
    ENDM

    ; RestoreRegs - Macro to generate a pop instruction for registers
    ; saved by the SaveRegs macro. Restores one group of registers.
    RestoreRegs MACRO
        LOCAL reg
        %FOR reg, regpushed                 ;; Pop each register
            pop reg
        ENDM
    ENDM
Notice how the SaveRegs macro saves its result in the regpushed text macro for later use by the RestoreRegs macro. In this case, a text macro is used as a global variable. By contrast, the reg text macro is used only in RestoreRegs. It is declared LOCAL so it won't take the name reg from the global name space. The MACROS.INC file provided with MASM 6.1 includes expanded versions of these same two macros.
Returning Values with Macro Functions

A macro function is a named group of statements that returns a value. When calling a macro function, you must enclose its argument list in parentheses, even if the list is empty. The function always returns text. MASM 6.1 provides several predefined macro functions for common tasks. The predefined macros include @Environ (see page 10) and the string functions @SizeStr, @CatStr, @SubStr, and @InStr (discussed in the preceding section). You define macro functions in exactly the same way as macro procedures, except that a macro function always returns a value through the EXITM directive. Here is an example:
    DEFINED MACRO   symbol:REQ
        IFDEF symbol
            EXITM <-1>              ;; True
        ELSE
            EXITM <0>               ;; False
        ENDIF
    ENDM
This macro works like the defined operator in the C language. You can use it to test the defined state of several different symbols with a single statement, as shown here:
    IF DEFINED( DOS ) AND NOT DEFINED( XENIX )
        ;; Do something
    ENDIF
Notice that the macro returns integer values as strings of digits, but the IF statement evaluates numeric values or expressions. There is no conflict because the assembler sees the value returned by the macro function exactly as if the user had typed the values directly into the program:
IF -1 AND NOT 0
Returning Values with EXITM

The return value must be text, a text equate name, or the result of another macro function. A macro function must first convert a numeric value such as a constant, a numeric equate, or the result of a numeric expression before returning it. The macro function can use angle brackets or the expansion operator (%) to convert numbers to text. The DEFINED macro, for instance, could have returned its value as
EXITM %-1
Here is another example of a macro function that uses the WHILE directive to calculate factorials:

    factorial MACRO num:REQ
        LOCAL i, factor
        factor = num
        i = 1
        WHILE factor GT 1
            i = i * factor
            factor = factor - 1
        ENDM
        EXITM %i
    ENDM
The integer result of the calculation is changed to a text string with the expansion operator (%). The factorial macro can define data, as shown here:
var WORD factorial( 4 )
This statement initializes var with the number 24 (the factorial of 4).
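The loop inside the factorial macro can be traced with a short Python model. This is a sketch for checking the arithmetic, not MASM code; it returns text to mirror EXITM %i, and the range check against a 16-bit WORD is an added illustration rather than something the macro itself performs.

```python
def factorial(num):
    """Mirror the macro: multiply i by factor while factor is greater than 1."""
    factor = num
    i = 1
    while factor > 1:             # WHILE factor GT 1
        i = i * factor
        factor = factor - 1
    return str(i)                 # EXITM %i returns text, not a number


result = factorial(4)
# Largest argument whose result still fits in an unsigned 16-bit WORD (65535).
largest_for_word = max(n for n in range(1, 13) if int(factorial(n)) <= 0xFFFF)
print(result, largest_for_word)
```

The model gives 24 for an argument of 4. It also suggests that when the result initializes a WORD, arguments above 8 produce values too large for the field, so a real macro might need a range check.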
Using Macro Functions with Variable-Length Parameter Lists

You can use the FOR directive to handle macro parameters with the VARARG attribute. FOR Loops and Variable-Length Parameters, page 242, explains how to do this in simple cases where the variable parameters are handled sequentially, from first to last. However, you may sometimes need to process the parameters in reverse order or nonsequentially. Macro functions make these techniques possible. For example, the following macro function determines the number of arguments in a VARARG parameter:
    @ArgCount MACRO arglist:VARARG
        LOCAL count
        count = 0
        FOR arg, <arglist>
            count = count + 1       ;; Count the arguments
        ENDM
        EXITM %count
    ENDM
You can use @ArgCount inside a macro that has a VARARG parameter, as shown here:
    work    MACRO   args:VARARG
    %       ECHO    Number of arguments is: @ArgCount( args )
            ENDM
Another useful task might be to select an item from an argument list using an index to indicate the item. The following macro simplifies this.
    @ArgI   MACRO   index:REQ, arglist:VARARG
            LOCAL   count, retstr
            retstr  TEXTEQU <>              ;; Initialize return string
            count   = 0                     ;; Initialize count
            FOR     arg, <arglist>
                count = count + 1
                IF count EQ index           ;; Item is found
                    retstr TEXTEQU <arg>    ;; Set return string
                    EXITM                   ;;   and exit loop
                ENDIF
            ENDM
            EXITM   retstr                  ;; Exit function
            ENDM
You can use @ArgI like this:

    work    MACRO   args:VARARG
    %       ECHO    Third argument is: @ArgI( 3, args )
            ENDM
Finally, you might need to process arguments in reverse order. The following macro returns a new argument list in reverse order.
    @ArgRev MACRO   arglist:REQ
            LOCAL   txt, arg
            txt     TEXTEQU <>
    %       FOR     arg, <arglist>
                txt CATSTR <arg>, <,>, txt  ;; Paste each onto list
            ENDM
                                            ;; Remove terminating comma
            txt     SUBSTR  txt, 1, @SizeStr( %txt ) - 1
            txt     CATSTR  <!<>, txt, <!>> ;; Add angle brackets
            EXITM   txt
            ENDM
Here is an example showing @ArgRev in use:

    work    MACRO   args:VARARG
    %       FOR     arg, @ArgRev( <args> )  ;; Process in reverse order
                ECHO    arg
            ENDM
            ENDM
These three macro functions appear in the MACROS.INC include file, located on one of the MASM distribution disks.
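The intended results of the three macro functions can be summarized with a Python model. This is a simplified sketch, not MASM code: the names `arg_count`, `arg_i`, and `arg_rev` are hypothetical, arguments are passed as already-separated strings, and the assembler's handling of spaces around arguments is not modeled.

```python
def arg_count(*args):
    """Model of @ArgCount: the number of arguments, returned as text."""
    return str(len(args))


def arg_i(index, *args):
    """Model of @ArgI: the index-th argument (1-based), or empty text."""
    if 1 <= index <= len(args):
        return args[index - 1]
    return ""


def arg_rev(*args):
    """Model of @ArgRev: the arguments in reverse order inside angle brackets."""
    return "<" + ",".join(reversed(args)) + ">"


print(arg_count("ax", "bx", "cx"))
print(arg_i(3, "ax", "bx", "cx"))
print(arg_rev("ax", "bx", "cx"))
```

Each function returns text, as macro functions do. The reversed list carries its own angle brackets, which is the form the FOR directive expects for its argument list.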
Expansion Operator in Macro Functions

This list summarizes the behavior of the expansion operator (%) with macro functions.
- If a macro function is preceded by a %, it will be expanded. However, if it expands to a text macro or a macro function call, it will not expand further.
- If you use a macro function call as an argument for another macro function call, a % is not needed.
- If a macro function is called inside angle brackets and is preceded by %, it will be expanded.
Advanced Macro Techniques

The concept of replacing macro names with predefined macro text is simple in theory, but it has many implications and complications. Here is a brief summary of some advanced techniques you can use in macros.
Defining Macros within Macros

Macros can define other macros, a technique called nesting macros. MASM expands macros as it encounters them, so nested macros are always processed in nesting order. You cannot reference a nested macro directly in your program, since the assembler begins expansion from the outer macro. In effect, a nested macro is local to the macro that defines it. Only the amount of available memory limits the number of macros a program can nest.
The following example demonstrates how one macro can define another. The macro takes as an argument the name of a shift or rotate instruction, then creates another macro that simplifies the instruction for 8088/86 processors.
    shifts  MACRO   opname                  ;; Macro generates macros
    opname&s MACRO  operand:REQ, rotates:=<1>
            IF rotates LE 2                 ;; One at a time is faster
                REPEAT rotates              ;;   for 2 or less
                    opname operand, 1
                ENDM
            ELSE                            ;; Using CL is faster for
                mov     cl, rotates         ;;   more than 2
                opname  operand, cl
            ENDIF
            ENDM
            ENDM
Recall that the 8086 processor allows only 1 or CL as an operand for shift and rotate instructions. Expanding shifts generates a macro for the shift instruction that uses whichever operand is more efficient. You create the entire series of macros, one for each shift instruction, like this:
    ; Call macro repeatedly to make new macros
    shifts  ror             ; Generates rors
    shifts  rol             ; Generates rols
    shifts  shr             ; Generates shrs
    shifts  shl             ; Generates shls
    shifts  rcl             ; Generates rcls
    shifts  rcr             ; Generates rcrs
    shifts  sal             ; Generates sals
    shifts  sar             ; Generates sars
Then use the new macros as replacements for shift instructions, like this:
    shrs    ax, 5
    rols    bx, 3
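A macro that defines another macro behaves much like a function that returns a function. The Python sketch below models that relationship and lists the instructions each generated macro would emit. It is not MASM code; `shift_macro` is a hypothetical name, and the model assumes the repeat count inside the generated macro is the rotates parameter.

```python
def shift_macro(opname):
    """Model of shifts: return a function standing in for the generated macro."""
    def generated(operand, rotates=1):
        if rotates <= 2:
            # Two or fewer: repeat the single-step form of the instruction.
            return [f"{opname} {operand}, 1"] * rotates
        # More than two: load the count into CL and shift once.
        return [f"mov cl, {rotates}", f"{opname} {operand}, cl"]
    return generated


shrs = shift_macro("shr")
rols = shift_macro("rol")
print(shrs("ax", 5))
print(rols("bx", 2))
```

With a count of 5 the model emits a load of CL followed by one shift through CL; with a count of 2 it emits two single-step rotates.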
Testing for Argument Type and Environment

Macros can expand conditional blocks of code by testing for argument type with the OPATTR operator. OPATTR returns a single word constant that indicates the type and scope of an expression, like this:

    OPATTR expression

If expression is not valid or is forward-referenced, OPATTR returns a 0. Otherwise, the return value incorporates the bit flags shown in the table below.
OPATTR serves as an enhanced version of the .TYPE operator, which returns only the low byte (bits 0-7) shown in the table. Bits 11-15 of the return value are undefined.
    Bit     Set If expression
    0       References a code label
    1       Is a memory variable or has a relocatable data label
    2       Is an immediate value
    3       Uses direct memory addressing
    4       Is a register value
    5       References no undefined symbols and is without error
    6       Is relative to SS
    7       References an external label
    8-10    Has the following language type:
              000 - No language type
              001 - C
              010 - SYSCALL
              011 - STDCALL
              100 - Pascal
              101 - FORTRAN
              110 - Basic
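The bit assignments listed above can be exercised with a small decoder. The following Python sketch is illustrative only: `decode_opattr` and the two lookup tables are hypothetical names, and the sample value passed to it is made up for the demonstration rather than taken from assembler output.

```python
OPATTR_FLAGS = {
    0: "code label",
    1: "memory variable or relocatable data label",
    2: "immediate value",
    3: "direct memory addressing",
    4: "register value",
    5: "defined and without error",
    6: "relative to SS",
    7: "external label",
}

LANGUAGES = {0: "none", 1: "C", 2: "SYSCALL", 3: "STDCALL",
             4: "Pascal", 5: "FORTRAN", 6: "Basic"}


def decode_opattr(value):
    """Split a value into its set flags (bits 0-7) and language type (bits 8-10)."""
    flags = [name for bit, name in OPATTR_FLAGS.items() if value & (1 << bit)]
    language = LANGUAGES.get((value >> 8) & 0b111, "undefined")
    return flags, language


flags, language = decode_opattr(0b00100100)   # hypothetical sample value
print(flags, language)
```

Masking with a single bit, as the decoder does, is the same test a macro performs when it writes an expression such as (OPATTR (arg)) AND 00000100y.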
A macro can use OPATTR to determine if an argument is a constant, a register, or a memory operand. With this information, the macro can conditionally generate the most efficient code depending on argument type. For example, given a constant argument, a macro can test it for 0. Depending on the argument's value, the code can select the most effective method to load the value into a register:
    IF CONST
        mov     bx, CONST   ; If CONST > 0, move into BX
    ELSE
        sub     bx, bx      ; More efficient if CONST = 0
    ENDIF
The second method is faster than the first, yet has the same result (with the byproduct of changing the processor flags). The following macro illustrates some techniques using OPATTR by loading an address into a specified offset register:
    load    MACRO   reg:REQ, adr:REQ
            IF (OPATTR (adr)) AND 00010000y         ;; Register
                IFDIFI reg, adr                     ;; Don't load register
                    mov reg, adr                    ;;   onto itself
                ENDIF
            ELSEIF (OPATTR (adr)) AND 00000100y
                mov reg, adr                        ;; Constant
            ELSEIF (TYPE (adr) EQ BYTE) OR (TYPE (adr) EQ SBYTE)
                mov reg, OFFSET adr                 ;; Bytes
            ELSEIF (SIZE (TYPE (adr))) EQ 2
                mov reg, adr                        ;; Near pointer
            ELSEIF (SIZE (TYPE (adr))) EQ 4
                mov reg, WORD PTR adr[0]            ;; Far pointer
                mov ds, WORD PTR adr[2]
            ELSE
                .ERR <Illegal argument>
            ENDIF
            ENDM
A macro also can generate different code depending on the assembly environment. The predefined text macro @Cpu returns a flag for processor type. The following example uses the more efficient constant variation of the PUSH instruction if the processor is an 80186 or higher.
    IF  @Cpu AND 00000010y
        pushc   MACRO   op              ;; 80186 or higher
                push    op
                ENDM
    ELSE
        pushc   MACRO   op              ;; 8088/8086
                mov     ax, op
                push    ax
                ENDM
    ENDIF
Another macro can now use pushc rather than conditionally testing for processor type itself. Although either case produces the same code, using pushc assembles faster because the environment is checked only once.
You can test the language and operating system using the @Interface text macro. The memory model can be tested with the @Model, @DataSize, or @CodeSize text macros. You can save the contexts inside macros with PUSHCONTEXT and POPCONTEXT. The options for these keywords are:
    Option      Description
    ASSUMES     Saves segment register information
    RADIX       Saves current default radix
    LISTING     Saves listing and CREF information
    CPU         Saves current CPU and processor
    ALL         All of the above
Using Recursive Macros

Macros can call themselves. In MASM 5.1 and earlier, recursion is an important technique for handling variable arguments. MASM 6.1 handles variable arguments much more cleanly with the FOR directive and the VARARG attribute, as described in "FOR Loops and Variable-Length Parameters," earlier in this chapter. However, recursion is still available and may be useful for some macros.
CHAPTER 10
Writing a Dynamic-Link Library For Windows
The Windows operating system relies heavily on service routines and data contained in special libraries called dynamic-link libraries, or DLLs for short. Most of what Windows comprises, from the collections of screen fonts to the routines that handle the graphical interface, is provided by DLLs. MASM 6.1 contains tools that you can use to write DLLs in assembly language. This chapter shows you how.

DLLs do not run under MS-DOS. The information in this chapter applies only to Windows, drawing in part on the chapter "Writing a Module-Definition File" in Environment and Tools. The acronym API, which appears throughout this chapter, refers to the application programming interface that Windows provides for programs. For documentation of API functions, see the Programmer's Reference, Volume 2 of the Windows Software Development Kit (SDK).

The first section of this chapter gives an overview of DLLs and their similarities to normal libraries. The next section explores the parts of a DLL and the rules you must follow to create one. The third section applies this information to an example DLL.
Overview of DLLs
A dynamic-link library is similar to a normal run-time library. Both types of libraries contain a collection of compiled procedures, which serve one or more calling modules. To link a normal library, the linker copies the required functions from the library file (which usually has a .LIB extension) and combines them with other modules to form an executable program in .EXE format. This process is called static linking.

In dynamic linking, the library functions are not copied to an .EXE file. Instead, they reside in a separate file in executable form, ready to serve any calling program, called a client. When the first client requires the library, Windows takes care of loading the functions into memory and establishing linkage. If subsequent clients also need the library, Windows dynamically links them with the proper library functions already in memory.

Filename: LMAPGC10.DOC Project: Template: MSGRIDA1.DOT Author: Mike Eddy Last Saved By: Ruth L Silverio Revision #: 62 Page: 257 of 1 Printed: 10/02/00 04:22 PM
Loading a DLL
How Windows loads a DLL affects the client rather than the DLL itself. Accordingly, this section focuses on how to set up a client program to use a DLL. Since the client can itself be a DLL, this is information a DLL programmer should know. However, MASM 6.1 does not provide all the tools required to create a stand-alone program for Windows. To create such a program, called an application, you must use tools in the Windows SDK. Windows provides two methods for loading a dynamic-link library into memory:
    Method              Description
    Implicit loading    Windows loads the DLL along with the first client program and links it before the client begins execution.
    Explicit loading    Windows does not load the DLL until the first client explicitly requests it during execution.
When you write a DLL, you do not need to know beforehand which of the two methods will be used to load the library. The loading method is determined by how the client is written, not the DLL.
Implicit Loading
The implicit method of loading a DLL offers the advantage of simplicity. The client requires no extra programming effort and can call the library functions as if they were normal run-time functions. However, implicit loading carries two constraints:
- The name of the library file must have a .DLL extension.
- You must either list all DLL functions the client calls in the IMPORTS section of the client's module-definition file, or link the client with an import library.
An import library contains no executable code. It consists of only the names and locations of exported functions in a DLL. The linker uses the locations in the import library to resolve references to DLL functions in the client and to build an executable header. For example, the file LIBW.LIB provided with MASM 6.1 is the import library for the DLL files that contain the Windows API functions. The IMPLIB utility described in Environment and Tools creates an import library. Run IMPLIB from the MS-DOS command line like this:
    IMPLIB implibfile dllfile

where implibfile is the name of the import library you want to create from the DLL file dllfile. Once you have created an import library from a DLL, link it with a client program that relies on implicit loading, but does not list imported functions in its module-definition file. Continuing the preceding example, here's the link step for a client program that calls library procedures in the DLL dllfile:

    LINK client.OBJ, client.EXE, , implibfile, client.DEF

This simplified example creates the client program client.EXE, linking it with the import library implibfile, which in turn was created from the DLL file dllfile. To summarize implicit loading, a client program must either
- List DLL functions in the IMPORTS section of its module-definition file, or
- Link with an import library created from the DLL.
Implicit loading is best when a client always requires at least one procedure in the library, since Windows automatically loads the library with the client. If the client does not always require the library service, or if the client must choose at run time between several libraries, you should use explicit loading, discussed next.
Explicit Loading
To explicitly load a DLL, the client does not require linking with an import library, nor must the DLL file have an extension of .DLL. Explicit loading involves three steps in which the client calls Windows API functions:

1. The client calls LoadLibrary to load the DLL.
2. The client calls GetProcAddress to obtain the address of each DLL function it requires.
3. When finished with the DLL, the client calls FreeLibrary to unload the DLL from memory.

The following example fragment shows how a client written in assembly language explicitly loads a DLL called SYSINFO.DLL and calls the DLL function GetSysDate.
    INCLUDE windows.inc

    .DATA
    hInstance   HINSTANCE   0
    szDLL       BYTE        'SYSINFO.DLL', 0
    szDate      BYTE        'GetSysDate', 0
    lpProc      DWORD       0

    .CODE
        .
        .
        .
        INVOKE  LoadLibrary, ADDR szDLL             ; Load SYSINFO.DLL
        mov     hInstance, ax                       ; Save instance count
        INVOKE  GetProcAddress, ax, ADDR szDate     ; Get and save
        mov     WORD PTR lpProc, ax                 ;   far address of
        mov     WORD PTR lpProc[2], dx              ;   GetSysDate
        call    lpProc                              ; Call GetSysDate
        .
        .
        .
        INVOKE  FreeLibrary, hInstance              ; Unload SYSINFO.DLL
For simplicity, the above example contains no error-checking code. An actual program should check all values returned from the API functions. The explicit method of loading a DLL requires more programming effort in the client program. However, the method allows the client to control which (if any) dynamic-link libraries to load at run time.
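The checks that the example omits could be sketched as follows. This is a minimal illustration rather than code from the MASM samples. It reuses the variables declared in the example, and it assumes the Windows 3.x conventions that LoadLibrary returns a value below 32 when it fails and that GetProcAddress returns a null far pointer (DX:AX = 0) when it cannot find the function.

```asm
        INVOKE  LoadLibrary, ADDR szDLL         ; Try to load SYSINFO.DLL
        .IF     ax >= 32                        ; Assumed: values below 32 are error codes
        mov     hInstance, ax                   ; Save instance handle
        INVOKE  GetProcAddress, ax, ADDR szDate ; Look up GetSysDate
        mov     WORD PTR lpProc[0], ax          ; Save offset
        mov     WORD PTR lpProc[2], dx          ;   and segment
        or      ax, dx                          ; Zero result means function not found
        .IF     !zero?
        call    lpProc                          ; Address is valid: call GetSysDate
        .ENDIF
        INVOKE  FreeLibrary, hInstance          ; Unload the DLL in either case
        .ENDIF
```

The sketch simply skips the call when a step fails. A real client would normally also report the failure to the user.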
Searching for a DLL File

To load a DLL, whether implicitly or explicitly, Windows searches for the DLL file in the following directories in the order shown:

1. The current directory
2. The Windows directory, which contains WIN.COM
3. The Windows system directory, which contains system files such as GDI.EXE
4. The directory where the client program resides (except Windows 3.0 and earlier)
5. Directories listed in the PATH environment string
6. Directories mapped in a network

If Windows does not locate the DLL in any of these directories, it prompts the user with a message box.
Building a DLL
A DLL has additional programming requirements beyond those for a normal run-time library. This section describes the requirements pertaining to the library's code, data, and stack. It also discusses the effects of the library's extension name.
DLL Code
The code in a DLL consists of exported and nonexported functions. Exported functions, listed in the EXPORTS section of the module-definition file, are public routines serving clients. Nonexported functions provide private, internal support for the exported procedures. They are not visible to a client.

Under Windows, an exported library routine must appear to the caller as a far procedure. Your DLL routines can use any calling convention you wish, provided the caller assumes the same convention.

You can think of dynamic-link code as code for a normal run-time library with the following additions:

• An entry procedure
• A termination procedure
• Special prologue and epilogue code
Entry Procedure
A DLL, like any Windows-based program, must have an entry procedure. Windows calls the entry procedure only once when it first loads the DLL, passing the following information in registers:

• DS contains the library's data segment address.
• DI holds the library's instance handle.
• CX holds the library's heap size in bytes.
Note  Windows API functions destroy all registers except DI, SI, BP, DS, and the stack pointer. To preserve the contents of other registers, your program must save the registers before an API call and restore them afterwards.

The information passed in these registers corresponds to the data provided to an application. Since a DLL has only one occurrence in memory, called an "instance," the value in DI is not usually important. However, a DLL can use its instance handle to obtain resources from its own executable file.

The entry procedure does not need to record the address of the data segment. Windows automatically ensures that each exported routine in the DLL has access to the library's data segment, as explained in "Prologue and Epilogue Code," later in this chapter.

The heap size contained in CX reflects the value provided in the HEAPSIZE statement of the module-definition file. You need not make an accurate guess in the HEAPSIZE statement about the library's heap requirements, provided you specify a moveable data segment. With a moveable segment, Windows automatically allocates more heap when needed. However, Windows can provide no more heap in a fixed data segment than the amount specified in the HEAPSIZE statement. In any case, a library's total heap cannot exceed 64K, less the amount of static data. Static data and heap reside in the same segment.

Windows does not automatically deallocate unneeded heap while the DLL is in memory. Therefore, you should not set an unnecessarily large value in the HEAPSIZE statement, since doing so wastes memory.

The entry procedure calls the Windows API function LocalInit to allocate the heap. The library must create a heap before its routines call any heap functions, such as LocalAlloc. The following example illustrates these steps:
    DLLEntry PROC FAR PASCAL PUBLIC         ; Entry point for DLL

            jcxz    @F                      ; If no heap, skip
            INVOKE  LocalInit, ds, 0, cx    ; Else set up the heap
            .IF     ( ax )                  ; If successful,
            INVOKE  UnlockSegment, -1       ;   unlock the data segment
    @@:     call    LibMain                 ; Call DLL's data init routine
            mov     ax, TRUE                ; Return AX = 1 if okay,
            .ENDIF                          ;   else if LocalInit error,
            ret                             ;   return AX = 0

    DLLEntry ENDP
This example code is taken from the DLLENTRY.ASM module, contained in the LIB subdirectory on one of the MASM 6.1 distribution disks. After allocating the heap, the procedure calls the library's initialization procedure, called LibMain in this case. LibMain initializes the library's static data (if required), then returns to DLLEntry, which returns to Windows. If Windows receives a return value of 0 (FALSE) from DLLEntry, it unloads the library and displays an error message.

The process is similar to the way MS-DOS loads a terminate-and-stay-resident program (TSR), described in the next chapter. Both the DLL and TSR return control immediately to the operating system, then wait passively in memory to be called. The following section explains how a DLL gains control when Windows unloads it from memory.
Termination Procedure
Windows maintains a DLL in memory until the last client program terminates or explicitly unloads the library. When unloading a DLL, Windows first calls the library's termination procedure. This allows the DLL to return resources and do any necessary cleanup operations before Windows unloads the library from memory.

Libraries that have registered window procedures with RegisterClass need not call UnregisterClass to remove the class registration. Windows does this automatically when it unloads the library.

You must name the library's termination procedure WEP (for Windows Exit Procedure) and list it in the EXPORTS section of the library's module-definition file. To ensure immediate operation, provide an ordinal number and use the RESIDENTNAME keyword, as described in the chapter "Creating Module-Definition Files" in Environment and Tools. This keeps the name WEP in the Windows-resident name table at all times.

Besides its name, the code for WEP should also remain constantly in memory. To ensure this, place WEP in its own code segment and set the segment's attributes as PRELOAD FIXED in the SEGMENTS statement of the module-definition file. Thus, your DLL code should use a memory model that allows multiple code segments, such as medium model. Since a termination procedure is usually short, keeping it resident in memory does not burden the operating system.

The termination procedure accepts a single parameter, which can have one of two values. These values are assigned to the following symbolic constants in the WINDOWS.INC file located in the LIB subdirectory:

• WEP_SYSTEM_EXIT (value 1) indicates Windows is shutting down.
• WEP_FREE_DLL (value 0) indicates the library's last client has terminated or has called FreeLibrary, and Windows is unloading the DLL.

The following fragment provides an outline for a typical termination procedure:
    WEP     PROC FAR PASCAL EXPORT wExitCode:WORD

            Prolog                                  ; Prologue macro, discussed below
            .IF     wExitCode == WEP_FREE_DLL       ; Get ready to unload
            .
            .
            .
            .ELSEIF wExitCode == WEP_SYSTEM_EXIT    ; Windows is shutting down
            .
            .
            .
            .ENDIF                                  ; If neither value, take no action
            mov     ax, TRUE                        ; Always return AX = 1
            Epilog                                  ; Epilogue code, discussed below
            ret

    WEP     ENDP
Usually, the WEP procedure takes the same actions regardless of the parameter value, since in either case Windows will unload the DLL.

Under Windows 3.0, the WEP procedure receives stack space of about 256 bytes. This allows the procedure to unhook interrupts, but little else. Any other action, such as calling an API function, usually results in an unrecoverable application error because of stack overflow. Later versions of Windows provide at least 4K of stack to the WEP procedure, allowing it to call many API functions.

However, WEP should not send or post a message to a client, because the client may already be terminated. The WEP procedure should also not attempt file I/O, since only application processes, not DLLs, can own files. When control reaches WEP, the client may no longer exist and its files are closed.
Prologue and Epilogue Code

Exported procedures in a Windows-based program require special epilogue and prologue code. (For a definition of these terms, see "Generating Prologue and Epilogue Code" in Chapter 7.) The SAMPLES subdirectory on one of the MASM 6.1 distribution disks contains macros you can use for far procedures in your Windows-based programs. Here's a listing of the prologue macro:
    Prolog  MACRO
            mov     ax, ds          ; Must be 1st, since Windows overwrites
            nop                     ; Placeholder for 3rd byte
            inc     bp              ; Push odd BP. Not required, but allows
            push    bp              ;   CodeView to recognize frame
            mov     bp, sp          ; Set up stack frame to access params
            push    ds              ; Save DS
            mov     ds, ax          ; Point DS to DLL's data segment
            ENDM
The instruction
inc bp
marks the beginning of the stack frame with an odd number. This allows real-mode Windows to locate segment addresses on the stack and update the addresses when it moves or discards the corresponding segments. In protected mode, selector values do not change when segments are moved, so marking the stack frame is not required. However, certain debugging applications, such as Microsoft CodeView for Windows and the Microsoft Windows 80386 Debugger (both documented in Programming Tools of the SDK), search for a marked frame to determine if the frame belongs to a far procedure. Without the mark, these debuggers give meaningless information when backtracing through the stack. Therefore, you should include the INC BP instruction for Windows-based programs that may run in real mode or that require debugging with a Microsoft debugger.

Another characteristic of the prologue macro may seem puzzling at first glance. The macro moves DS into AX, then AX back into DS. This sequence of instructions lets Windows selectively overwrite the prologue code in far procedures. When Windows loads a program, it compares the names of far procedures with the list of exported procedures in the module-definition file. For procedures that do not appear on the list, Windows leaves their prologue code untouched. However, Windows overwrites the first 3 bytes of all exported procedures with
mov ax, DGROUP
where DGROUP represents the selector value for the library's data segment. This explains why the prologue macro reserves the third byte with a NOP instruction. The 1-byte instruction serves as padding to provide a 3-byte area for Windows to overwrite.

The epilogue code returns BP to normal, like this:

    Epilog  MACRO
            pop     ds              ; Recover original DS
            pop     bp              ;   and BP+1
            dec     bp              ; Reset to original BP
            ENDM
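As a rough illustration of how the two macros fit together, an exported routine might be laid out as follows. The procedure name GetAnswer and the variable Answer are hypothetical. The sketch also assumes a routine with no parameters or local variables, so that the assembler is not expected to add frame code of its own around the macros.

```asm
        .DATA
Answer  WORD    42                      ; Hypothetical static data in the DLL's segment

        .CODE
GetAnswer PROC FAR                      ; Hypothetical routine; list it under EXPORTS
        Prolog                          ; mov ax, ds / nop / inc bp / push bp / ...
        mov     ax, Answer              ; DS now addresses the DLL's data segment
        Epilog                          ; pop ds / pop bp / dec bp
        ret                             ; Far return to the client
GetAnswer ENDP
```

The return value is loaded into AX only after Prolog has run, since the macro itself uses AX to carry the data segment address into DS.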
DLL Data
A DLL can have its own local data segment up to 64K. Besides static data, the segment contains the heap from which a procedure can allocate memory through the LocalAlloc API function. You should minimize static data in a DLL to reserve as much memory as possible for temporary allocations. Furthermore, all procedures in the DLL draw from the same heap space. If more than one procedure in the library accesses the heap, a procedure should not hold allocated space unnecessarily at the expense of the other procedures.

A Windows-based program must reserve a "task header" in the first 16 bytes of its data segment. If you link your program with a C run-time function, the C startup code automatically allocates the task header. Otherwise, you must explicitly reserve and initialize the header with zeros. The sample program described in "Example of a DLL: SYSINFO," later in this chapter, shows how to allocate a task header.
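The advice about not holding heap space might look like this in practice. This is only a sketch. It assumes that the Windows 3.x functions LocalAlloc and LocalFree and the constant LMEM_FIXED are declared in WINDOWS.INC, that the entry procedure has already called LocalInit, and that the handle of a fixed block can be used directly as its near address. The 64-byte size is arbitrary.

```asm
        INVOKE  LocalAlloc, LMEM_FIXED, 64  ; Request 64 bytes from the DLL's heap
        .IF     ( ax )                      ; AX = 0 means the request failed
        mov     bx, ax                      ; Assumed: fixed handle is the near address
        mov     BYTE PTR [bx], 0            ; Use the block (here, store an empty string)
        INVOKE  LocalFree, bx               ; Release it as soon as it is no longer needed
        .ENDIF
```

Freeing the block in the same routine that allocated it keeps the shared heap available to the library's other procedures.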
DLL Stack
A DLL does not declare a stack segment and does not allocate stack space. A client program calls a library's exported procedure through a simple far call, and the stack does not change. The procedure is, in effect, part of the calling program, and therefore uses the caller's stack.

This simple arrangement differs from that used in small and medium models, in which many C run-time functions accept near pointers as arguments. Such functions assume the pointer is relative to the current data segment. In applications, the call works even if the argument points to a local variable on the stack, since DS and SS contain the same segment address. However, in a DLL, DS and SS point to different segments. Under small and medium models, a library procedure must always pass pointers to static variables located in the data segment, not to local variables on the stack.

When you write a DLL, include the FARSTACK keyword with the .MODEL directive, like this:
.MODEL small, pascal, farstack
This informs the assembler that SS points to a segment other than DGROUP. With full segment definitions, also add the line:
ASSUME DS:DGROUP, SS:NOTHING
DLL Extension Names

You can name an explicitly-loaded DLL file with any extension. The many files in your Windows directory with extensions such as .DRV and .FON are almost certainly DLLs. Many DLLs have an .EXE extension, though they are not true executable files. A library with an .EXE extension should always include stub code, specified by the STUB statement in the module-definition file. The stub code activates when run under MS-DOS, usually displaying a message to inform the user that the program requires Windows. Without the stub code, the system hangs if a user attempts to run a DLL with an .EXE extension. Do not name a DLL with a .COM extension, since MS-DOS will give control to the first byte of the program header. The header does not contain executable instructions, and the system will hang even if the DLL has stub code.
Summary
Following is a summary of the previous information in this chapter.

• A dynamic-link library has only one instance; that is, it can load only once during a Windows session.
• A single DLL can service calls from many client programs. Windows takes care of linkage between the DLL and each client.
• Windows loads a DLL either implicitly (along with the first client) or explicitly (when the first client calls LoadLibrary). It unloads the DLL when the last client either terminates or calls FreeLibrary.
• A client calls a DLL routine as a simple far procedure. The routine can use any calling convention.
• Windows ensures that the first instruction in a DLL procedure moves the address of the library's data segment into AX. You must provide the proper prologue code to allow space for this 3-byte instruction and to copy AX to DS.
• All procedures in a DLL have access to a single common data segment. The segment contains both static variables and heap space, and cannot exceed 64K.
• A DLL procedure uses the caller's stack.
• All exported procedures in a DLL must appear in the EXPORTS list in the library's module-definition file.
Example of a DLL: SYSINFO

Like any library, a DLL should be as small and fast as possible, which is a good argument for writing it in assembly language. This section describes an example library called SYSINFO, written entirely in assembly language. The following text applies previous information in this chapter to an actual DLL.

SYSINFO contains three callable procedures. The acronym ASCIIZ refers to a string of ASCII characters terminated with a zero. The callable procedures are:

Procedure     Description
GetSysTime    Returns a far pointer to a 12-byte ASCIIZ string containing the current time in hh:mm:ss format.
GetSysDate    Returns a far pointer to an ASCIIZ string containing the current date in any of six languages.
GetSysInfo    Returns a far pointer to a structure containing the following system data:
              • ASCIIZ string of Windows version
              • ASCIIZ string of MS-DOS version
              • Current keyboard status
              • Current video mode
              • Math coprocessor flag
              • Processor type
              • ASCIIZ string of ROM-BIOS release date
To see SYSINFO in action, follow the steps below. The file SYSDATA.EXE resides in the SAMPLES\WINDLL subdirectory of MASM if you requested example files when installing MASM. Otherwise, you must first install the file with the MASM 6.1 SETUP utility.

• Create SYSINFO.DLL as described in the following section and place it in the SAMPLES\WINDLL subdirectory for MASM 6.1.
• From the Windows File Manager, make the SAMPLES\WINDLL subdirectory the current directory.
• In the Program Manager, choose Run from the File menu and type SYSDATA to run the example program SYSDATA.EXE. This program calls the routines in SYSINFO.DLL and displays the returned data.
Entry Routine for SYSINFO

SYSINFO links with the DLLENTRY module, which serves as the library's entry point when Windows first loads the program. For a listing and description of DLLENTRY.ASM, see the previous section, "Entry Procedure." DLLENTRY replaces the LIBENTRY module provided with the Windows SDK, but unlocks the data segment after calling the API function LocalInit. LIBENTRY does not unlock the segment. DLLENTRY saves some space over LIBENTRY, because it does not pass any arguments to LibMain.

The LibMain procedure handles the library's initialization tasks. You can name the procedure whatever you want, provided you make the same change in DLLENTRY.ASM and reassemble both modules. You can even combine DLLENTRY with LibMain to form one procedure, like this:

    DLLInit PROC FAR PASCAL PUBLIC          ; Entry point for DLL

            jcxz    @F                      ; If no heap, skip
            INVOKE  LocalInit, ds, 0, cx    ; Else set up the heap
            .IF     ( ax )                  ; If successful, unlock the
            INVOKE  UnlockSegment, -1       ;   data segment
    @@:
            .                               ; Initialize DLL data. This
            .                               ;   replaces the call to the
            .                               ;   LibMain procedure.
            mov     ax, TRUE                ; Return AX = 1 if okay,
            .ENDIF                          ;   else if LocalInit error,
            ret                             ;   return AX = 0

    DLLInit ENDP
            END     DLLInit
Whatever you call your combined procedure (DLLInit in the preceding example), place the name on the END statement as shown. This identifies the procedure as the one that first executes when Windows loads the DLL. SYSINFO accommodates several international languages. Currently, SYSINFO recognizes English, French, Spanish, German, Italian, and Swedish, but you can easily extend the code to include other languages. LibMain calls GetProfileString to determine the current language, then initializes the variable indx accordingly. The variable indirectly points to an array of strings containing days and months in different languages. The GetSysDate procedure uses these strings to create a full date in the correct language.
Static Data
SYSINFO stores the strings in its static data segment. This data remains in memory along with the library's code. All procedures have equal access to the data segment.

Because the library does not call any C run-time functions, it explicitly allocates the low paragraph of the data segment with the variable TaskHead. This 16-byte area serves as the required Windows task header, described in "DLL Data," earlier in this chapter.
Module-Definition File
The library's module-definition file, named SYSINFO.DEF, looks like this:

    LIBRARY         SYSINFO
    DESCRIPTION     'Sample assembly-language DLL'
    EXETYPE         WINDOWS
    CODE            PRELOAD MOVEABLE DISCARDABLE
    DATA            PRELOAD MOVEABLE SINGLE
    SEGMENTS        CODE2           PRELOAD FIXED
    EXPORTS         WEP             @1  RESIDENTNAME
                    GetSysTime      @2
                    GetSysDate      @3
                    GetSysInfo      @4
Note the following points about the module-definition file:

• The LIBRARY statement identifies SYSINFO as a dynamic-link library.
• SYSINFO places its termination procedure WEP in a separate code segment, called CODE2, which the SEGMENTS statement declares as FIXED. This keeps the WEP routine fixed in memory, while all other code remains moveable.
• The EXPORTS section lists all procedures the library exports, including WEP.
• None of the library's procedures require heap space, so SYSINFO.DEF includes no HEAPSIZE statement.
Assembling and Linking SYSINFO

The following listing shows the description file for SYSINFO:
    sysinfo.obj: sysinfo.asm dll.inc
        ML /c /W3 sysinfo.asm

    dllentry.obj: dllentry.asm dll.inc
        ML /c /W3 dllentry.asm

    sysinfo.dll: dllentry.obj sysinfo.obj
        LINK dllentry sysinfo, sysinfo.dll,, libw.lib mnocrtdw.lib, sysinfo.def
To create SYSINFO.DLL, run the NMAKE utility described in Chapter 16 of Environment and Tools. Alternatively, assemble and link SYSINFO by entering the three command lines shown in the preceding listing, which does not require running NMAKE.

SYSINFO links with the library modules MNOCRTDW.LIB and LIBW.LIB. The first supplies the required Windows startup code for a medium-model DLL that does not use any C run-time functions. LIBW.LIB is the Windows import library, which contains no executable code. The import library provides linkage information for the Windows API functions referenced in the DLL. Windows establishes the final links when it loads the program.
Expanding SYSINFO
SYSINFO is an example of how to write an assembly-language DLL without overwhelming detail. It has plenty of room for expansion and improvements. The following list may give you some ideas:
• To create a heap area for the library, add the line HEAPSIZE value to the module-definition file, where value is an approximate guess for the amount of heap required in bytes. The DLLEntry procedure automatically allocates the indicated amount of heap. Keep the data segment moveable, because Windows then provides more heap space if required by the DLL procedures.
• If you want to add a procedure that calls C run-time functions, you must replace MNOCRTDW.LIB with MDLLCW.LIB, which is supplied with the Windows SDK. The MDLLCW.LIB library contains the run-time functions for medium-model DLLs.
• Each time the GetSysInfo procedure is called, it retrieves the version number of MS-DOS and Windows, gets the processor type, checks for a coprocessor, and reads the ROM-BIOS release date. Since this information does not change throughout a Windows session, it would be handled more efficiently in the LibMain procedure, which executes only once. The code is currently placed in GetSysInfo for the sake of clarity at the expense of efficiency.
• SYSINFO is not a true international program. You can easily add more languages, extending the days and months arrays accordingly. Moreover, for the sake of simplicity, the GetSysDate procedure arranges the date with an American bias. For example, in many parts of the world, the date numeral appears before the month rather than after. If you use SYSINFO in your own applications, you should include code in LibMain to determine the correct date format with additional calls to GetProfileString. You can find more information on how to do this in Chapter 18 of the Microsoft Windows Programmer's Reference, Volume 1, supplied with the Windows SDK.
CHAPTER 11
Writing Memory-Resident Software
Through its memory-management system, MS-DOS allows a program to remain resident in memory after terminating. The resident program can later regain control of the processor to perform tasks such as background printing or "popping up" a calculator on the screen. Such a program is commonly called a TSR, from the terminate-and-stay-resident function it uses to return to MS-DOS.

This chapter explains the techniques of writing memory-resident software. The first two sections present introductory material. Following sections describe important MS-DOS and BIOS interrupts and focus on how to write safe, compatible, memory-resident software. Two example programs illustrate the techniques described in the chapter. The MASM 6.1 disks contain complete source code for the two example TSR programs.
Terminate-and-Stay-Resident Programs
MS-DOS maintains a pointer to the beginning of unused memory. Programs load into memory at this position and terminate execution by returning control to MS-DOS. Normally, the pointer remains unchanged, allowing MS-DOS to reuse the same memory when loading other programs.

A terminating program can, however, prevent other programs from loading on top of it. These programs exit to MS-DOS through the terminate-and-stay-resident function, which resets the free-memory pointer to a higher position. This leaves the program resident in a protected block of memory, even though it is no longer running.
The terminate-and-stay-resident function (Function 31h) is one of the MS-DOS services invoked through Interrupt 21h. The following fragment shows how a TSR program terminates through Function 31h and remains resident in a 1000h-byte block of memory:

            mov     ah, 31h         ; Request DOS Function 31h
            mov     al, err         ; Set return code
            mov     dx, 100h        ; Reserve 100h paragraphs
                                    ;   (1000h bytes)
            int     21h             ; Terminate-and-stay-resident
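The paragraph count passed in DX is usually computed rather than hard-coded. The following sketch assumes a tiny-model (.COM) program, in which offsets are measured from the start of the program segment prefix. It also assumes a hypothetical label, InstallCode, that marks the first byte of the installation section, which need not remain resident.

```asm
        mov     dx, OFFSET InstallCode  ; Bytes to keep, counted from the PSP
        add     dx, 15                  ; Round up to a whole paragraph
        mov     cl, 4
        shr     dx, cl                  ; Bytes / 16 = paragraphs
        mov     ax, 3100h               ; Function 31h, return code 0
        int     21h                     ; Terminate-and-stay-resident
```

Placing the installation code after the resident code lets the memory it occupies be released when the program terminates.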
Note  In current versions of MS-DOS, Interrupt 27h also provides a terminate-and-stay-resident service. However, Microsoft cannot guarantee future support for Interrupt 27h and does not recommend its use.
Structure of a TSR
TSRs consist of two distinct parts that execute at different times. The first part is the installation section, which executes only once, when MS-DOS loads the program. The installation code performs any initialization tasks required by the TSR and then exits through the terminate-and-stay-resident function.

The second part of the TSR, called the resident section, consists of code and data left in memory after termination. Though often identified with the TSR itself, the resident section makes up only part of the entire program.

The TSR's resident code must be able to regain control of the processor and execute after the program has terminated. Methods of executing a TSR are classified as either passive or active.
Passive TSRs
The simplest way to execute a TSR is to transfer control to it explicitly from another program. Because the TSR in this case does not solicit processor control, it is said to be passive. If the calling program can determine the TSR's memory address, it can grant control via a far jump or call. More commonly, a program activates a passive TSR through a software interrupt. The installation section of the TSR writes the address of its resident code to the proper position in the interrupt vector table (see "MS-DOS Interrupts" in Chapter 7). Any subsequent program can then execute the TSR by calling the interrupt.

Passive TSRs often replace existing software interrupts. For example, a passive TSR might replace Interrupt 10h, the BIOS video service. By intercepting calls that read or write to the screen, the TSR can access the video buffer directly, increasing display speed.
Passive TSRs allow limited access since they can be invoked only from another program. They have the advantage of executing within the context of the calling program, and thus run no risk of interfering with another process. Such a risk does exist with active TSRs.
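The installation section of a passive TSR might hook a software interrupt as sketched below, using the MS-DOS Get Interrupt Vector (Function 35h) and Set Interrupt Vector (Function 25h) services. The choice of Interrupt 60h (assumed here to be unused) and the names OldVector and Resident are made up for illustration. The fragment also assumes that DS holds the segment of the resident code.

```asm
OldVector DWORD ?                       ; Hypothetical: address of previous handler
        .
        .
        .
        mov     ax, 3560h               ; Function 35h: get vector for Interrupt 60h
        int     21h                     ; Returns ES:BX = current handler
        mov     WORD PTR OldVector[0], bx
        mov     WORD PTR OldVector[2], es

        mov     dx, OFFSET Resident     ; DS:DX = address of resident entry point
        mov     ax, 2560h               ; Function 25h: set vector for Interrupt 60h
        int     21h
```

Saving the previous vector lets the TSR chain to the routine it replaces, or restore the vector if the TSR is later removed.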
Active TSRs
The second method of executing a TSR involves signaling it through some hardware event, such as a predetermined sequence of keystrokes. This type of TSR is active because it must continually search for its startup signal. The advantage of active TSRs lies in their accessibility. They can take control from any running application, execute, and return, all on demand. An active TSR, however, must not seize processor control blindly. It must contain additional code that determines the proper moment at which to execute. The extra code consists of one or more routines called interrupt handlers, described in the following section.
Interrupt Handlers in Active TSRs

The memory-resident portion of an active TSR consists of two parts. One part contains the body of the TSR: the code and data that perform the program's main tasks. The other part contains the TSR's interrupt handlers.

An interrupt handler is a routine that takes control when a specific interrupt occurs. Although sometimes called an "interrupt service routine," a TSR's handler usually does not service the interrupt. Instead, it passes control to the original interrupt routine, which does the actual interrupt servicing. (See the section "Replacing an Interrupt Routine" in Chapter 7 for information on how to write an interrupt handler.)

Collectively, interrupt handlers ensure that a TSR operates compatibly with the rest of the system. Individually, each handler fulfills one or more of the following functions:

• Auditing hardware events that may signal a request for the TSR
• Monitoring system status
• Determining whether a request for the TSR should be honored, based on current system status
Auditing Hardware Events for TSR Requests

Active TSRs commonly use a special keystroke sequence or the timer as a request signal. A TSR invoked through one of these channels must be equipped with handlers that audit keyboard or timer events.
A keyboard handler receives control at every keystroke. It examines each key, searching for the proper signal or "hot key." Generally, a keyboard handler should not attempt to call the TSR directly when it detects the hot key. If the TSR cannot safely interrupt the current process at that moment, the keyboard handler is forced to exit to allow the process to continue. Since the handler cannot regain control until the next keystroke, the user has to press the hot key repeatedly until the handler can comply with the request.

Instead, the handler should merely set a request flag when it detects a hot-key signal and then exit normally. Examples in the following paragraphs illustrate this technique.

For computers other than MCA (IBM PS/2 and compatible), an active TSR audits keystrokes through a handler for Interrupt 09, the keyboard interrupt:

    Keybrd  PROC    FAR
            sti                             ; Interrupts are okay
            push    ax                      ; Save AX register
            in      al, 60h                 ; AL = key scan code
            call    CheckHotKey             ; Check for hot key
            .IF     carry?                  ; If hot key pressed,
            mov     cs:TsrRequestFlag, TRUE ;   raise flag and
            .                               ;   set up for exit
            .
            .
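One way the handler might finish is to restore AX and pass control to the original Interrupt 09 routine, leaving the keystroke to be serviced normally. The sketch below makes that simplifying assumption (a full handler may instead remove the hot key so the foreground program never sees it) and uses a hypothetical variable, OldKeybrd, that holds the address of the original handler.

```asm
OldKeybrd DWORD ?                       ; Hypothetical: original Interrupt 09 vector

Keybrd  PROC    FAR
        sti                             ; Interrupts are okay
        push    ax                      ; Save AX register
        in      al, 60h                 ; AL = key scan code
        call    CheckHotKey             ; Check for hot key
        .IF     carry?                  ; If hot key pressed,
        mov     cs:TsrRequestFlag, TRUE ;   raise the request flag only
        .ENDIF
        pop     ax                      ; Recover AX register
        jmp     cs:OldKeybrd            ; Let original routine service the key
Keybrd  ENDP
```

Because the handler jumps to the original routine rather than calling it, that routine's IRET returns directly to the interrupted program.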
A TSR running on a PS/2 computer cannot reliably read key scan codes using this method. Instead, the TSR must search for its hot key through a handler for Interrupt 15h (Miscellaneous System Services). The handler determines the current keypress from the AL register when AH equals 4Fh, as shown here:
    MiscServ PROC   FAR
            sti                             ; Interrupts okay
            .IF     ah == 4Fh               ; If Keyboard Intercept Service:
            call    CheckHotKey             ; Check for hot key
            .IF     carry?                  ; If hot key pressed,
            mov     cs:TsrRequestFlag, TRUE ;   raise flag and
            .                               ;   set up for exit
            .
            .
The example program on page 293 shows how a TSR tests for a PS/2 machine and then sets up a handler for either Interrupt 09 or Interrupt 15h to audit keystrokes. Setting a request flag in the keyboard handler allows other code, such as the timer handler (Interrupt 08), to recognize a request for the TSR. The timer handler gains control at every timer interrupt, which occurs an average of 18.2 times per second.
The following fragment shows how a timer handler tests the request flag and continually polls until it can safely execute the TSR.
    NewTimer PROC   FAR
            .
            .
            .
            cmp     TsrRequestFlag, FALSE   ; Has TSR been requested?
            .IF     !zero?                  ; If so, can system be
            call    CheckSystem             ;   interrupted safely?
            .IF     carry?                  ; If so,
            call    ActivateTsr             ;   activate TSR
            .
            .
            .
Monitoring System Status

A TSR that uses a hardware device such as the video or disk must not interrupt while the device is active. A TSR monitors a device by handling the device's interrupt. Each interrupt handler simply sets a flag to indicate the device is in use, and then clears the flag when the interrupt finishes. The following shows a typical monitor handler:

    NewHandler PROC FAR
            mov     cs:ActiveFlag, TRUE     ; Set active flag
            pushf                           ; Simulate interrupt by pushing flags,
            call    OldHandler              ;   then far-calling original routine
            mov     cs:ActiveFlag, FALSE    ; Clear active flag
            iret                            ; Return from interrupt
    NewHandler ENDP
Only hardware used by the TSR requires monitoring. For example, a TSR that performs disk input/output (I/O) must monitor disk use through Interrupt 13h. The disk handler sets an active flag that prevents the TSR from executing during a read or write operation. Otherwise, the TSR's own I/O would move the disk head. This would cause the suspended disk operation to continue with the head incorrectly positioned when the TSR returned control to the interrupted program.

In the same way, an active TSR that displays to the screen must monitor calls to Interrupt 10h. The Interrupt 10h BIOS routine does not protect critical sections of code that program the video controller. The TSR must therefore ensure it does not interrupt such nonreentrant operations.

The activities of the operating system also affect the system status. With few exceptions, MS-DOS functions are not reentrant and must not be interrupted. However, monitoring MS-DOS is somewhat more complicated than monitoring hardware. This subject is discussed in "Using MS-DOS in Active TSRs," later in this chapter.

Figure 11.1 illustrates the process described so far. It shows a time line for a typical TSR signaled from the keyboard. When the keyboard handler detects the proper hot key, it sets a request flag called TsrRequestFlag. Thereafter, the timer handler continually checks the system status until it can safely call the TSR.
Figure 11.1   Time Line of Interactions Between Interrupt Handlers for a Typical TSR
The following comments describe the chain of events depicted in Figure 11.1. Each comment refers to one of the numbered pointers in the figure.

1. At time = t, the timer handler activates. It finds the flag TsrRequestFlag clear, indicating the user has not requested the TSR. The handler terminates without taking further action. Notice that Interrupt 13h is currently processing a disk I/O operation.
2. Before the next timer interrupt, the keyboard handler detects the hot key, signaling a request for the TSR. The keyboard handler sets TsrRequestFlag and returns.
3. At time = t + 1/18 second, the timer handler again activates and finds TsrRequestFlag set. The handler checks other active flags to determine if the TSR can safely execute. Since Interrupt 13h has not yet completed its disk operation, the timer handler finds DiskActiveFlag set. The handler therefore terminates without activating the TSR.
4. At time = t + 2/18 second, the timer handler again finds TsrRequestFlag set and repeats its scan of the active flags. DiskActiveFlag is now clear, but in the interim, Interrupt 10h has activated as indicated by the flag VideoActiveFlag. The timer handler accordingly terminates without activating the TSR.
5. At time = t + 3/18 second, the timer handler repeats the process. This time it finds all active flags clear, indicating the TSR can safely execute. The timer handler calls the TSR, which sets its own active flag to ensure it will not interrupt itself if requested again.
6. The timer and other interrupts continue to function normally while the TSR executes.

The timer itself can serve as the startup signal if the TSR executes periodically. Screen clocks that continuously show seconds and minutes are examples of TSRs that use the timer this way. ALARM.ASM, a program described in the next section, shows another example of a timer-driven TSR.
Determining Whether to Invoke the TSR

Once a handler receives a request signal for the TSR, it checks the various active flags maintained by the handlers that monitor system status. If any of the flags are set, the handler ignores the request and exits. If the flags are clear, the handler invokes the TSR, usually through a near or far call. Figure 11.1 illustrates how a timer handler detects a request and then periodically scans various active flags until all the flags are clear. A TSR that changes stacks must not interrupt itself. Otherwise, the second execution would overwrite the stack data belonging to the first. A TSR prevents this by setting its own active flag before executing, as shown in Figure 11.1. A handler must check this flag along with the other active flags when determining whether the TSR can safely execute.
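The flag scan described above might be sketched as follows. This is only an illustration, not the routine used by the example programs: the name CheckSystem is borrowed from the earlier timer fragment, DiskActiveFlag and VideoActiveFlag are the flags named in the Figure 11.1 discussion, and TsrActiveFlag and the value of FALSE are assumptions. The fragment is meant to sit in the resident code segment and returns with the carry flag set when every flag is clear, which matches the `.IF carry?` test in the timer fragment.

```asm
FALSE           EQU     0               ; Assumed value for a clear flag

; Assumed flag bytes, each kept in the resident code segment
DiskActiveFlag  BYTE    FALSE           ; Set by Interrupt 13h monitor
VideoActiveFlag BYTE    FALSE           ; Set by Interrupt 10h monitor
TsrActiveFlag   BYTE    FALSE           ; Set by the TSR while it executes

;* CheckSystem - Returns carry set if no active flag is set.

CheckSystem PROC NEAR USES ax
        mov     al, cs:DiskActiveFlag   ; Combine all active flags
        or      al, cs:VideoActiveFlag
        or      al, cs:TsrActiveFlag    ; AL = 0 only if all are clear
        cmp     al, 1                   ; Carry set only if AL = 0
        ret                             ; POP AX in epilogue leaves
CheckSystem ENDP                        ;   the carry flag unchanged
```

Combining the flags with OR keeps the test independent of whichever nonzero value the handlers store for TRUE.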
Example of a Simple TSR: ALARM

This section presents a simple alarm clock TSR that demonstrates some of the material covered so far. The program accepts an argument from the command line that specifies the alarm setting in military form, such as 1635 for 4:35 P.M. For simplicity, the argument must consist of four digits, including leading zeros. To set the alarm at 7:45 A.M., for example, enter the command:
ALARM 0745
The installation section of the program begins with the Install procedure. Install computes the number of five-second intervals that must elapse before the alarm sounds and stores this number in the word CountDown. The procedure then obtains the vector for Interrupt 08 (timer) through MS-DOS Function 35h and stores it in the far pointer OldTimer. Function 25h replaces the vector with the far address of the new timer handler NewTimer.

Once installed, the new timer handler executes at every timer interrupt. These interrupts occur about 18.2 times per second, or 91 times every five seconds. Each time it executes, NewTimer subtracts one from a secondary counter called tick_91. By counting 91 timer ticks, tick_91 measures a period of five seconds. When tick_91 reaches zero, it is reset to 91 and CountDown is decremented by one. When CountDown reaches zero, the alarm sounds.
;* ALARM.ASM - A simple memory-resident program that beeps the speaker
;* at a prearranged time.  Can be loaded more than once for multiple
;* alarm settings.  During installation, ALARM establishes a handler
;* for the timer interrupt (Interrupt 08).  It then terminates through
;* the terminate-and-stay-resident function (Function 31h).  After the
;* alarm sounds, the resident portion of the program retires by setting
;* a flag that prevents further processing in the handler.

        .MODEL  tiny                    ; Create ALARM.COM
        .STACK

        .CODE

        ORG     5Dh                     ; Location of time argument in PSP,
CountDown       LABEL   WORD            ;   converted to number of 5-second
                                        ;   intervals to elapse
        .STARTUP
        jmp     Install                 ; Jump over data and resident code

; Data must be in code segment so it won't be thrown away with Install code.

OldTimer        DWORD   ?               ; Address of original timer routine
tick_91         BYTE    91              ; Counts 91 clock ticks (5 seconds)
TimerActiveFlag BYTE    0               ; Active flag for timer handler

;* NewTimer - Handler routine for timer interrupt (Interrupt 08).
;* Decrements CountDown every 5 seconds.  No other action is taken
;* until CountDown reaches 0, at which time the speaker sounds.

NewTimer PROC   FAR

        .IF     cs:TimerActiveFlag != 0 ; If timer busy or retired,
        jmp     cs:OldTimer             ;   jump to original timer routine
        .ENDIF
        inc     cs:TimerActiveFlag      ; Set active flag
        pushf                           ; Simulate interrupt by pushing flags,
        call    cs:OldTimer             ;   then far-calling original routine
        sti                             ; Enable interrupts
        push    ds                      ; Preserve DS register
        push    cs                      ; Point DS to current segment for
        pop     ds                      ;   further memory access
        dec     tick_91                 ; Count down for 91 ticks
        .IF     zero?                   ; If 91 ticks have elapsed,
        mov     tick_91, 91             ;   reset secondary counter and
        dec     CountDown               ;   subtract one 5-second interval
        .IF     zero?                   ; If CountDown drained,
        call    Sound                   ;   sound speaker
        inc     TimerActiveFlag         ; Alarm has sounded--inc flag
        .ENDIF                          ;   again so it remains set
        .ENDIF

        dec     TimerActiveFlag         ; Decrement active flag
        pop     ds                      ; Recover DS
        iret                            ; Return from interrupt handler

NewTimer ENDP

;* Sound - Sounds speaker with the following tone and duration:

BEEP_TONE       EQU     440             ; Beep tone in hertz
BEEP_DURATION   EQU     6               ; Number of clocks during beep,
                                        ;   where 18 clocks = approx 1 second

Sound   PROC    USES ax bx cx dx es     ; Save registers used in this routine
        mov     al, 0B6h                ; Initialize channel 2 of
        out     43h, al                 ;   timer chip
        mov     dx, 12h                 ; Divide 1,193,180 hertz
        mov     ax, 34DCh               ;   (clock frequency) by
        mov     bx, BEEP_TONE           ;   desired frequency
        div     bx                      ; Result is timer clock count
        out     42h, al                 ; Low byte of count to timer
        mov     al, ah
        out     42h, al                 ; High byte of count to timer
        in      al, 61h                 ; Read value from port 61h
        or      al, 3                   ; Set first two bits
        out     61h, al                 ; Turn speaker on

; Pause for specified number of clock ticks

        mov     dx, BEEP_DURATION       ; Beep duration in clock ticks
        sub     cx, cx                  ; CX:DX = tick count for pause
        mov     es, cx                  ; Point ES to low memory data
        add     dx, es:[46Ch]           ; Add current tick count to CX:DX
        adc     cx, es:[46Eh]           ; Result is target count in CX:DX
        .REPEAT
        mov     bx, es:[46Ch]           ; Now repeatedly poll clock
        mov     ax, es:[46Eh]           ;   count until the target
        sub     bx, dx                  ;   time is reached
        sbb     ax, cx
        .UNTIL  !carry?

        in      al, 61h                 ; When time elapses, get port value
        xor     al, 3                   ; Kill bits 0-1 to turn
        out     61h, al                 ;   speaker off
        ret

Sound   ENDP

;* Install - Converts ASCII argument to valid binary number, replaces
;* NewTimer as the interrupt handler for the timer, then makes program
;* memory-resident by exiting through Function 31h.
;*
;* This procedure marks the end of the TSR's resident section and the
;* beginning of the installation section.  When ALARM terminates through
;* Function 31h, the above code and data remain resident in memory.  The
;* memory occupied by the following code is returned to DOS.

Install PROC

; Time argument is in hhmm military format.  Converts ASCII digits to
; number of minutes since midnight, then converts current time to number
; of minutes since midnight.  Difference is number of minutes to elapse
; until alarm sounds.  Converts to seconds-to-elapse, divides by 5 seconds,
; and stores result in word CountDown.

DEFAULT_TIME    EQU     3600            ; Default alarm setting = 1 hour
                                        ;   (in seconds) from present time
        mov     ax, DEFAULT_TIME
        cwd                             ; DX:AX = default time in seconds
        .IF     BYTE PTR CountDown != ' ' ; If not blank argument,
        xor     CountDown[0], '00'      ;   convert 4 bytes of ASCII
        xor     CountDown[2], '00'      ;   argument to binary

        mov     al, 10                  ; Multiply 1st hour digit by 10
        mul     BYTE PTR CountDown[0]   ;   and add to 2nd hour digit
        add     al, BYTE PTR CountDown[1]
        mov     bh, al                  ; BH = hour for alarm to go off
        mov     al, 10                  ; Repeat procedure for minutes
        mul     BYTE PTR CountDown[2]   ; Multiply 1st minute digit by 10
        add     al, BYTE PTR CountDown[3] ;   and add to 2nd minute digit
        mov     bl, al                  ; BL = minute for alarm to go off
        mov     ah, 2Ch                 ; Request Function 2Ch
        int     21h                     ; Get Time (CX = current hour/min)
        mov     dl, dh
        sub     dh, dh
        push    dx                      ; Save DX = current seconds

        mov     al, 60                  ; Multiply current hour by 60
        mul     ch                      ;   to convert to minutes
        sub     ch, ch
        add     cx, ax                  ; Add current minutes to result
                                        ; CX = minutes since midnight
        mov     al, 60                  ; Multiply alarm hour by 60
        mul     bh                      ;   to convert to minutes
        sub     bh, bh
        add     ax, bx                  ; AX = number of minutes since
                                        ;   midnight for alarm setting
        sub     ax, cx                  ; AX = time in minutes to elapse
                                        ;   before alarm sounds
        .IF     carry?                  ; If alarm time is tomorrow,
        add     ax, 24 * 60             ;   add minutes in a day
        .ENDIF

        mov     bx, 60
        mul     bx                      ; DX:AX = minutes-to-elapse-times-60
        pop     bx                      ; Recover current seconds
        sub     ax, bx                  ; DX:AX = seconds to elapse before
        sbb     dx, 0                   ;   alarm activates
        .IF     carry?                  ; If negative,
        mov     ax, 5                   ;   assume 5 seconds
        cwd
        .ENDIF
        .ENDIF

        mov     bx, 5                   ; Divide result by 5 seconds
        div     bx                      ; AX = number of 5-second intervals
        mov     CountDown, ax           ;   to elapse before alarm sounds

        mov     ax, 3508h               ; Request Function 35h
        int     21h                     ; Get Vector for timer (Interrupt 08)
        mov     WORD PTR OldTimer[0], bx ; Store address of original
        mov     WORD PTR OldTimer[2], es ;   timer interrupt
        mov     ax, 2508h               ; Request Function 25h
        mov     dx, OFFSET NewTimer     ; DS:DX points to new timer handler
        int     21h                     ; Set Vector with address of NewTimer

        mov     dx, OFFSET Install      ; DX = bytes in resident section
        mov     cl, 4
        shr     dx, cl                  ; Convert to number of paragraphs
        inc     dx                      ;   plus one
        mov     ax, 3100h               ; Request Function 31h, error code=0
        int     21h                     ; Terminate-and-stay-resident

Install ENDP
        END
Note the following points about ALARM:

• The constant BEEP_TONE specifies the alarm tone. Practical values for the tone range from approximately 100 to 4,000 hertz.
• The Install procedure marks the beginning of the installation section of the program. Execution begins here when ALARM.COM is loaded. A TSR generally places its installation code after the resident section. This allows the terminating TSR to include the installation code with the rest of the memory it returns to MS-DOS. Since the installation section executes only once, the TSR can discard it after becoming resident.
• You can install ALARM any number of times in quick succession, each time with a new alarm setting. The timer handler does not restore the original vector for Interrupt 08 after the alarm sounds. In effect, the multiple installations remain daisy-chained in memory. The address in OldTimer for one installation is the address of NewTimer in the preceding installation.
• Until a system reboot, NewTimer remains in place as the Interrupt 08 handler, even after the alarm sounds. To save unnecessary activity, the byte TimerActiveFlag remains set after the alarm sounds. This forces an immediate jump to the original handler for all subsequent executions of NewTimer.
• NewTimer and Sound alter registers DS, AX, BX, CX, DX, and ES. To preserve the original values in these registers, the procedures first push them onto the stack and then restore the original values before exiting. This ensures that the process interrupted by NewTimer continues with valid registers after NewTimer returns.
• ALARM requires little stack space. It assumes the current stack is adequate and makes no attempt to set up a new one. More sophisticated TSRs, however, should as a matter of course provide their own stacks to ensure adequate stack depth. The example program presented in "Example of an Advanced TSR: SNAP," later in this chapter, demonstrates this safety measure.
Using MS-DOS in Active TSRs

This section explains how to write active TSRs that can safely call MS-DOS functions. The material explores the problems imposed by the nonreentrant nature of MS-DOS and explains how a TSR can resolve those problems. The solution consists of four parts:
• Understanding how MS-DOS uses stacks
• Determining when MS-DOS is active
• Determining whether a TSR can safely interrupt an active MS-DOS function
• Monitoring the Critical Error flag
Understanding MS-DOS Stacks

MS-DOS functions set up their own stacks, which makes them nonreentrant. If a TSR interrupts an MS-DOS function and then executes another function that sets up the same stack, the second function will overwrite everything placed on the stack by the first function. The problem occurs when the second function returns and the first is left with unusable stack data. A TSR that calls an MS-DOS function must not interrupt any function that uses the same stack.

MS-DOS versions 2.0 and later use three internal stacks: an I/O stack, a disk stack, and an auxiliary stack. The current stack depends on the MS-DOS function. Functions 01 through 0Ch set up the I/O stack. Functions higher than 0Ch (with few exceptions) use the disk stack, as do Interrupts 25h and 26h. MS-DOS normally uses the auxiliary stack only when it executes Interrupt 24h (Critical Error Handler).
Determining MS-DOS Activity

A TSR's handlers can determine when MS-DOS is active by consulting a 1-byte flag called the InDos flag. Every MS-DOS function sets this flag upon entry and clears it upon termination. During installation, a TSR locates the flag through Function 34h (Get Address of InDos Flag), which returns the address as ES:BX. The installation portion then stores the address so the handlers can later find the flag without again calling Function 34h.

Theoretically, a TSR can wait to execute until the InDos flag is clear, thus sidestepping the entire issue of interrupting MS-DOS. However, several low-order functions such as Function 0Ah (Get Buffered Keyboard Input) wait idly for an expected keystroke before they terminate. If a TSR were allowed to execute only after MS-DOS returned, it too would have to wait for the terminating event.
The solution lies in determining when the low-order functions 01 through 0Ch are active. MS-DOS provides another service for this purpose: Interrupt 28h, the Idle Interrupt.
Interrupting MS-DOS Functions

MS-DOS continually calls Interrupt 28h from the low-order polling functions as they wait for keyboard input. This signal says that MS-DOS is idle and that a TSR may interrupt provided it does not overwrite the I/O stack. Put another way, a TSR can safely interrupt MS-DOS Functions 01 through 0Ch provided it does not call them.

An active TSR that calls MS-DOS must monitor Interrupt 28h with a handler. When the handler gains control, it checks the TSR request flag. If the flag indicates the TSR has been requested and if system hardware is inactive, the handler executes the TSR. Since control must eventually return to the idle MS-DOS function, which has stored data on the I/O stack, the TSR in this case must not call any MS-DOS function that also uses the I/O stack. Table 11.1 shows which functions set up the I/O stack for various versions of MS-DOS.
Table 11.1   MS-DOS Internal Stacks

Function     Critical Error flag   MS-DOS 2.x   MS-DOS 3.0   MS-DOS 3.1+
-----------  --------------------  -----------  -----------  -----------
01-0Ch       Clear                 I/O*         I/O          I/O
             Set                   Aux*         Aux          Aux
33h          Clear                 Disk*        Disk         Caller*
             Set                   Disk         Disk         Caller
50h-51h      Clear                 I/O          Caller       Caller
             Set                   Aux          Caller       Caller
59h          Clear                 n/a*         I/O          Disk
             Set                   n/a          Aux          Disk
5D0Ah        Clear                 n/a          n/a          Disk
             Set                   n/a          n/a          Disk
62h          Clear                 n/a          Caller       Caller
             Set                   n/a          Caller       Caller
All others   Clear                 Disk         Disk         Disk
             Set                   Disk         Disk         Disk

* I/O = I/O stack, Aux = auxiliary stack, Disk = disk stack, Caller = caller's stack, n/a = function not available.
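An Interrupt 28h handler of the kind described before Table 11.1 might look like the following sketch. It is an outline under stated assumptions rather than the handler from the example programs: OldIdle, NewIdle, and CheckHardware are hypothetical names, TsrRequestFlag and ActivateTsr are taken from the earlier timer fragment, and CheckHardware is assumed to return with carry set when the monitored hardware is inactive.

```asm
FALSE           EQU     0               ; Assumed value for a clear flag

OldIdle         DWORD   ?               ; Original Interrupt 28h vector,
                                        ;   stored during installation
TsrRequestFlag  BYTE    FALSE           ; Set by the hot-key handler

;* NewIdle - Handler routine for the Idle Interrupt (Interrupt 28h).

NewIdle PROC    FAR
        pushf                           ; Simulate interrupt by pushing flags,
        call    cs:OldIdle              ;   then far-calling original routine
        cmp     cs:TsrRequestFlag, FALSE ; Has TSR been requested?
        je      @F                      ; If not, exit
        call    CheckHardware           ; If so, is hardware inactive?
        jnc     @F                      ; If not, exit
        call    ActivateTsr             ; TSR runs on behalf of an idle
                                        ;   MS-DOS function, so it must not
                                        ;   call functions that use the
                                        ;   I/O stack (see Table 11.1)
@@:     iret                            ; Return to idle MS-DOS function
NewIdle ENDP
```

The handler deliberately does not test the InDos flag. Interrupt 28h arrives only while MS-DOS is inside one of its polling functions, so the flag is expected to be set at this point.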
TSRs that perform tasks of long or indefinite duration should themselves call Interrupt 28h. For example, a TSR that polls for keyboard input should include an INT 28h instruction in the polling loop, as shown here:
poll:   int     28h                     ; Signal idle state
        mov     ah, 1
        int     16h                     ; Key waiting?
        jz      poll                    ; If not (zero flag set), repeat polling loop
        sub     ah, ah
        int     16h                     ; Otherwise, get key
This courtesy gives other TSRs a chance to execute if the InDos flag happens to be set.
Monitoring the Critical Error Flag

MS-DOS sets the Critical Error flag to a nonzero value when it detects a critical error. It then invokes Interrupt 24h (Critical Error Handler) and clears the flag when Interrupt 24h returns. MS-DOS functions higher than 0Ch are illegal during critical error processing. Therefore, a TSR that calls MS-DOS must not execute while the Critical Error flag is set. MS-DOS versions 3.1 and later locate the Critical Error flag in the byte preceding the InDos flag. A single call to Function 34h (Get Address of InDos Flag) thus effectively returns the addresses of both flags. For earlier versions of MS-DOS or for the compatibility version of MS-DOS in OS/2 1.x, a TSR must call Function 34h and then scan the segment returned in the ES register for one of the two following sequences of instructions:
; Sequence of instructions in DOS Versions 2.0 - 3.0
        cmp     ss:[CriticalErrorFlag], 0
        jne     @F
        int     28h

; Sequence of instructions in DOS compatibility version for OS/2 1.x
        test    [CriticalErrorFlag], 0FFh
        jnz     @F
        push    ss:[ ? ]
        int     28h
The question mark inside brackets in the preceding PUSH statement indicates that the operand for the PUSH instruction can be any legal operand. In either version of MS-DOS, the operand field in the first instruction gives the flag's offset. The value in ES determines the segment address. "Example of an Advanced TSR: SNAP," later in the chapter, presents a program that shows how to locate the Critical Error flag with this technique.
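For MS-DOS versions 3.1 and later, where the Critical Error flag sits in the byte preceding the InDos flag, locating and testing both flags might be sketched as below. The names InDosAddr, CritErrAddr, and CheckDos are hypothetical, and the installation part assumes DS addresses the segment that holds the two pointers. The scanning technique needed for earlier versions is not shown.

```asm
; Assumed far pointers kept in the resident code segment
InDosAddr       DWORD   ?               ; Address of InDos flag
CritErrAddr     DWORD   ?               ; Address of Critical Error flag

; Installation code (MS-DOS 3.1 or later assumed)
        mov     ah, 34h                 ; Request Function 34h
        int     21h                     ; Get Address of InDos Flag (ES:BX)
        mov     WORD PTR InDosAddr[0], bx
        mov     WORD PTR InDosAddr[2], es
        dec     bx                      ; Critical Error flag is the byte
        mov     WORD PTR CritErrAddr[0], bx ;   preceding the InDos flag
        mov     WORD PTR CritErrAddr[2], es

;* CheckDos - Resident routine.  Returns carry set if both the InDos
;* flag and the Critical Error flag are clear.

CheckDos PROC   NEAR USES ax bx es
        les     bx, cs:InDosAddr
        mov     al, es:[bx]             ; AL = InDos flag
        les     bx, cs:CritErrAddr
        or      al, es:[bx]             ; AL = 0 only if both are clear
        cmp     al, 1                   ; Carry set only if AL = 0
        ret                             ; Epilogue pops leave carry intact
CheckDos ENDP
```

A timer handler could call a routine like this alongside its hardware checks. An Interrupt 28h handler would still need the Critical Error test but not the InDos test.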
Preventing Interference
This section describes how an active TSR can avoid interfering with the process it interrupts. Interference occurs when a TSR commits an error or performs an action that affects the interrupted process after the TSR returns. Examples of interference range from relatively harmless, such as moving the cursor, to serious, such as overrunning a stack. Although a TSR can interfere with another process in many different ways, protection against interference involves only three steps:

1. Recording a current configuration
2. Changing the configuration so it applies to the TSR
3. Restoring the original configuration before terminating

The example program described on page 293 demonstrates all the noninterference safeguards described in this section. These safeguards by no means exhaust the subject of noninterference. More sophisticated TSRs may require more sophisticated methods. However, noninterference methods generally fall into one of the following categories:

• Trapping errors
• Preserving an existing condition
• Preserving existing data
Trapping Errors
A TSR committing an error that triggers an interrupt must handle the interrupt to trap the error. Otherwise, the existing interrupt routine, which belongs to the underlying process, would attempt to service an error the underlying process did not commit.

For example, a TSR that accepts keyboard input should include handlers for Interrupts 23h and 1Bh to trap keyboard break signals. When MS-DOS detects CTRL+C from the keyboard or input stream, it transfers control to Interrupt 23h (CTRL+C Handler). Similarly, the BIOS keyboard routine calls Interrupt 1Bh (CTRL+BREAK Handler) when it detects a CTRL+BREAK key combination. Both routines normally terminate the current process.

A TSR that calls MS-DOS should also trap critical errors through Interrupt 24h (Critical Error Handler). MS-DOS functions call Interrupt 24h when they encounter certain hardware errors. The TSR must not allow the existing interrupt routine to service the error, since the routine might allow the user to abort service and return control to MS-DOS. This would terminate both the TSR and the underlying process. By handling Interrupt 24h, the TSR retains control if a critical error occurs.

An error-trapping handler differs in two ways from a TSR's other handlers:

1. It is temporary, in service only while the TSR executes. At startup, the TSR copies the handler's address to the interrupt vector table; it then restores the original vector before returning.
2. It provides complete service for the interrupt; it does not pass control on to the original routine.

Error-trapping handlers often set a flag to let the TSR know the error has occurred. For example, a handler for Interrupt 1Bh might set a flag when the user presses CTRL+BREAK. The TSR can check the flag as it polls for keyboard input, as shown here:
BrkHandler PROC FAR                     ; Handler for Interrupt 1Bh
        .
        .
        .
        mov     cs:BreakFlag, TRUE      ; Raise break flag
        iret                            ; Terminate interrupt
BrkHandler ENDP
        .
        .
        .
        mov     BreakFlag, FALSE        ; Initialize break flag
poll:   .
        .
        .
        cmp     BreakFlag, TRUE         ; Keyboard break pressed?
        je      exit                    ; If so, break polling loop
        mov     ah, 1
        int     16h                     ; Key waiting?
        jz      poll                    ; If not (zero flag set), repeat polling loop
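Because an error-trapping handler is temporary, the TSR has to place its address in the vector table at startup and put the original vector back before returning. A minimal sketch for Interrupt 1Bh follows. It assumes DS addresses the TSR's data, that BrkHandler is the handler shown above, and that OldBreak is a hypothetical variable introduced only for this illustration.

```asm
OldBreak        DWORD   ?               ; Assumed storage for original
                                        ;   Interrupt 1Bh vector

; At TSR startup: record the current vector, then install BrkHandler
        mov     ax, 351Bh               ; Request Function 35h
        int     21h                     ; Get Vector for Interrupt 1Bh
        mov     WORD PTR OldBreak[0], bx ; Store address of original
        mov     WORD PTR OldBreak[2], es ;   CTRL+BREAK routine
        mov     dx, OFFSET BrkHandler   ; DS:DX points to temporary handler
        mov     ax, 251Bh               ; Request Function 25h
        int     21h                     ; Set Vector for Interrupt 1Bh
        .
        .
        .
; Before the TSR returns: restore the original vector
        push    ds                      ; Preserve DS
        lds     dx, OldBreak            ; DS:DX = original vector
        mov     ax, 251Bh               ; Request Function 25h
        int     21h                     ; Set Vector for Interrupt 1Bh
        pop     ds                      ; Recover DS
```

Functions 25h and 35h are higher than 0Ch, so the usual restrictions on calling MS-DOS from an active TSR apply to this code as well.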
Preserving an Existing Condition

A TSR and its interrupt handlers must preserve register values so that all registers are returned intact to the interrupted process. This is usually done by pushing the registers onto the stack before changing them, then popping the original values before returning. Setting up a new stack is another important safeguard against interference. A TSR should usually provide its own stack to avoid the possibility of overrunning the current stack. Exceptions to this rule are simple TSRs such as the sample program ALARM that make minimal stack demands.
A TSR that alters the video configuration should return the configuration to its original state upon return. Video configuration includes cursor position, cursor shape, and video mode. The services provided through Interrupt 10h enable a TSR to determine the existing configuration and alter it if necessary.

However, some applications set video parameters by directly programming the video controller. When this happens, BIOS remains unaware of the new configuration and consequently returns inaccurate information to the TSR. Unfortunately, there is no solution to this problem if the controller's data registers provide write-only access and thus cannot be queried directly. For more information on video controllers, refer to Richard Wilton, Programmer's Guide to PC & PS/2 Video Systems. (See "Books for Further Reading" in the Introduction.)
Preserving Existing Data

A TSR requires its own disk transfer area (DTA) if it calls MS-DOS functions that access the DTA. These include file control block functions and Functions 11h, 12h, 4Eh, and 4Fh. The TSR must switch to a new DTA to avoid overwriting the one belonging to the interrupted process. On becoming active, the TSR calls Function 2Fh to obtain the address of the current DTA. The TSR stores the address and then calls Function 1Ah to establish a new DTA. Before returning, the TSR again calls Function 1Ah to restore the address of the original DTA. MS-DOS versions 3.1 and later allow a TSR to preserve extended error information. This prevents the TSR from destroying the original information if it commits an MS-DOS error. The TSR retrieves the current extended error data by calling MS-DOS Function 59h. It then copies registers AX, BX, CX, DX, SI, DI, DS, and ES to an 11-word data structure in the order given. MS-DOS reserves the last three words of the structure, which should each be set to zero. Before returning, the TSR calls Function 5Dh with AL = 0Ah and DS:DX pointing to the data structure. This call restores the extended error data to their original state.
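The DTA switch described above might be sketched as follows. The variables OldDta and TsrDta are hypothetical, the 128-byte size is an assumption matching the default DTA in the PSP, and the code assumes DS addresses the TSR's data. Saving and restoring extended error information is not shown.

```asm
OldDta          DWORD   ?               ; Assumed storage for address of
                                        ;   interrupted process's DTA
TsrDta          BYTE    128 DUP (?)     ; Assumed DTA owned by the TSR

; On becoming active: record current DTA, then switch to the TSR's DTA
        mov     ah, 2Fh                 ; Request Function 2Fh
        int     21h                     ; Get DTA Address (ES:BX)
        mov     WORD PTR OldDta[0], bx  ; Store address of
        mov     WORD PTR OldDta[2], es  ;   original DTA
        mov     dx, OFFSET TsrDta       ; DS:DX points to new DTA
        mov     ah, 1Ah                 ; Request Function 1Ah
        int     21h                     ; Set DTA Address
        .
        .
        .
; Before returning: restore the original DTA
        push    ds                      ; Preserve DS
        lds     dx, OldDta              ; DS:DX = original DTA
        mov     ah, 1Ah                 ; Request Function 1Ah
        int     21h                     ; Set DTA Address
        pop     ds                      ; Recover DS
```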
Communicating Through the Multiplex Interrupt

The Multiplex interrupt (Interrupt 2Fh) provides the Microsoft-approved way for a program to verify the presence of an installed TSR and to exchange information with it. MS-DOS version 2.x uses Interrupt 2Fh only as an interface for the resident print spooler utility PRINT.COM. Later MS-DOS versions standardize calling conventions so that multiple TSRs can share the interrupt.

A TSR chains to the Multiplex interrupt by setting up a handler. The TSR's installation code records the Interrupt 2Fh vector and then replaces it with the address of the new multiplex handler.
The Multiplex Handler

A program communicates with a multiplex handler by calling Interrupt 2Fh with an identity number in the AH register. As each handler in the chain gains control, it compares the value in AH with its own identity number. If the handler finds that it is not the intended recipient of the call, it passes control to the previous handler. The process continues until control reaches the target handler. When the target handler finishes its tasks, it returns via an IRET instruction to terminate the interrupt.

The target handler determines its tasks from the function number in AL. Convention reserves Function 0 as a request for installation status. A multiplex handler must respond to Function 0 by setting AL to 0FFh, to inform the caller of the handler's presence in memory. The handler should also return other information to provide a completely reliable identification. For example, it might return in ES:BX a far pointer to the TSR's copyright notice. This assures the caller it has located the intended TSR and not another TSR that has already claimed the identity number in AH.

Identity numbers range from 192 to 255, since MS-DOS reserves lesser values for its own use. During installation, a TSR must verify the uniqueness of its number. It must not set up a multiplex handler identified by a number already in use. A TSR usually obtains its identity number through one of the following methods:

• The programmer assigns the number in the program.
• The user chooses the number by entering it as an argument in the command line, placing it into an environment variable, or by altering the contents of an initialization file.
• The TSR selects its own number through a process of trial and error.
The last method offers the most flexibility. It finds an identity number not currently in use among the installed multiplex handlers and does not require intervention from the user.

To use this method, a TSR calls Interrupt 2Fh during installation with AH = 192 and AL = 0. If the call returns AL = 0FFh, the program tests other registers to determine if it has found a prior installation of itself. If the test fails, the program resets AL to zero, increments AH to 193, and again calls Interrupt 2Fh. The process repeats with incrementing values in AH until the TSR locates a prior installation of itself (in which case it should abort with an appropriate message to the user) or until AL returns as zero. The TSR can then use the value in AH as its identity number and proceed with installation.

The SNAP.ASM program in this chapter demonstrates how a TSR can use this trial-and-error method to select a unique identity number. During installation, the program calls Interrupt 2Fh to verify that SNAP is not already installed. When deinstalling, the program again calls Interrupt 2Fh to locate the resident TSR in memory. SNAP's multiplex handler services the call and returns the address of the resident code's program-segment prefix. The calling program can then locate the resident code and deinstall it, as explained in "Deinstalling a TSR," following.
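The trial-and-error search for an identity number might be sketched as shown here. This is not the code from SNAP.ASM. IsOurTsr is a hypothetical routine assumed to examine the registers returned by Function 0 and set carry if they identify a prior copy of this TSR; MultiplexId is a hypothetical byte variable; and IdFound and IdNone are hypothetical labels for the two error exits.

```asm
; Installation code: search identity numbers 192 through 255

        mov     ch, 192                 ; CH = first candidate number
IdNext: mov     ah, ch                  ; AH = identity number to test
        sub     al, al                  ; AL = 0 requests installation status
        push    cx                      ; Save CX in case a handler alters it
        int     2Fh                     ; Call Multiplex interrupt
        pop     cx
        or      al, al                  ; Is AL still zero?
        jz      IdFree                  ; If so, number in CH is not in use
        cmp     al, 0FFh                ; Did an installed handler answer?
        jne     IdSkip                  ; If not, try next number
        call    IsOurTsr                ; If so, is it a prior installation
        jc      IdFound                 ;   of this TSR?  Abort if it is
IdSkip: inc     ch                      ; Try next identity number
        jnz     IdNext                  ; Repeat until CH wraps past 255
        jmp     IdNone                  ; No identity number available

IdFree: mov     MultiplexId, ch         ; Record number for the handler
```

The candidate number is kept in CH rather than AH because a responding handler is only obliged to return a value in AL, so the sketch does not rely on AH surviving the call.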
Using the Multiplex Interrupt Under MS-DOS Version 2.x

A TSR can use the Multiplex interrupt under MS-DOS version 2.x, with certain limitations. Under version 2.x, only MS-DOS's print spooler PRINT, itself a TSR program, provides an Interrupt 2Fh service. The Interrupt 2Fh vector remains null until PRINT or another TSR is installed that sets up a multiplex handler.

Therefore, a TSR running under version 2.x must first check the existing Interrupt 2Fh vector before installing a multiplex handler. The TSR locates the current Interrupt 2Fh handler through Function 35h (Get Interrupt Vector). If the function returns a null vector, the TSR's handler will be last in the chain of Interrupt 2Fh handlers. The handler must terminate with an IRET instruction rather than pass control to a nonexistent routine.

PRINT in MS-DOS version 2.x does not pass control to the previous handler. If you intend to run PRINT under version 2.x, the program must be installed before other TSRs that also handle Interrupt 2Fh. This places PRINT's multiplex handler last in the chain of handlers.
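A multiplex handler that follows these conventions, including the null-vector case under version 2.x, might be sketched as follows. The names NewMultiplex, OldMultiplex, MultiplexId, and Signature are hypothetical, the signature text is a placeholder, and only Function 0 is serviced. It is an outline, not SNAP's handler.

```asm
MultiplexId     BYTE    ?               ; Identity number set at installation
OldMultiplex    DWORD   0               ; Previous Interrupt 2Fh vector
                                        ;   (remains 0 if vector was null)
Signature       BYTE    'Placeholder signature text', 0

;* NewMultiplex - Handler routine for Multiplex interrupt (Interrupt 2Fh).

NewMultiplex PROC FAR
        cmp     ah, cs:MultiplexId      ; Is this call meant for this TSR?
        je      MpxOurs                 ; If so, service it
        cmp     WORD PTR cs:OldMultiplex[2], 0 ; If previous vector
        jne     MpxChain                ;   is not null, pass control
        cmp     WORD PTR cs:OldMultiplex[0], 0 ;   to previous handler
        je      MpxExit                 ; Null vector: just return
MpxChain:
        jmp     cs:OldMultiplex         ; Far jump to previous handler

MpxOurs:
        or      al, al                  ; Function 0 (installation status)?
        jnz     MpxExit                 ; Other functions not serviced here
        mov     al, 0FFh                ; AL = 0FFh: handler is present
        push    cs                      ; ES:BX = far pointer to signature
        pop     es                      ;   so caller can confirm identity
        mov     bx, OFFSET Signature
MpxExit:
        iret                            ; Terminate interrupt
NewMultiplex ENDP
```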
Deinstalling a TSR
A TSR should provide a means for the user to remove or deinstall it from memory. Deinstallation returns occupied memory to the system, offering these benefits:
• The freed memory becomes available to subsequent programs that may require additional memory space.
• Deinstallation restores the system to a normal state. Thus, sensitive programs that may be incompatible with TSRs can execute without the presence of installed routines.
A deinstallation program must first locate the TSR in memory, usually by requesting an address from the TSR's multiplex handler. When it has located the TSR, the deinstallation program should then compare addresses in the vector table with the addresses of the TSR's handlers. A mismatch indicates that another TSR has chained a handler to the interrupt routine. In this case, the deinstallation program should deny the request to deinstall. If the addresses of the TSR's handlers match those in the vector table, deinstallation can safely continue.

You can deinstall the TSR with these three steps:

1. Restore to the vector table the original interrupt vectors replaced by the handler addresses.
2. Read the segment address stored at offset 2Ch of the resident TSR's program segment prefix (PSP). This address points to the TSR's environment block, a list of environment variables that MS-DOS copies into memory when it loads a program. Place the block's address in the ES register and call MS-DOS Function 49h (Release Memory Block) to return the block's memory to the operating system.
3. Place the resident PSP segment address in ES and again call Function 49h. This call releases the block of memory occupied by the TSR's code and data.

The example program in the next section demonstrates how to locate a resident TSR through its multiplex handler, and deinstall it from memory.
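The vector check and the three steps above might be sketched as follows for a hypothetical tiny-model TSR that, like ALARM, hooks only Interrupt 08 and keeps the original vector in OldTimer. The routine name Deinstall and its calling convention are assumptions. The sketch also assumes the deinstalling program is the same .COM file as the resident copy, so that OFFSET NewTimer and OldTimer have the same offsets in both.

```asm
;* Deinstall - Entry:  ES = segment of resident PSP
;*             Return: carry set if deinstallation was denied or failed

Deinstall PROC  NEAR
        mov     si, es                  ; SI = resident segment
        mov     ax, 3508h               ; Request Function 35h
        int     21h                     ; Get Vector for Interrupt 08
        mov     ax, es                  ; Does vector still point to the
        cmp     ax, si                  ;   resident copy of NewTimer?
        jne     DeinstDeny              ; If not, another TSR has chained
        cmp     bx, OFFSET NewTimer     ;   to the interrupt, so deny
        jne     DeinstDeny              ;   the request

        push    ds                      ; Step 1: restore original vector
        mov     ds, si                  ; DS = resident segment
        lds     dx, OldTimer            ; DS:DX = original timer routine
        mov     ax, 2508h               ; Request Function 25h
        int     21h                     ; Set Vector for Interrupt 08
        pop     ds

        mov     es, si                  ; Step 2: release environment block
        mov     es, es:[2Ch]            ; ES = segment of environment block
        mov     ah, 49h                 ; Request Function 49h
        int     21h                     ; Release Memory Block
        jc      DeinstDeny

        mov     es, si                  ; Step 3: release code and data
        mov     ah, 49h                 ; Request Function 49h
        int     21h                     ; Release Memory Block
        ret                             ; Carry reflects result of call

DeinstDeny:
        stc                             ; Signal failure
        ret
Deinstall ENDP
```

A TSR with several handlers would repeat the comparison for every vector before restoring any of them, so that a mismatch found late does not leave the system half deinstalled.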
Example of an Advanced TSR: SNAP

This section presents SNAP, a memory-resident program that demonstrates most of the techniques discussed in this chapter. SNAP takes a snapshot of the current screen and copies the text to a specified file. SNAP accommodates screens with various column and line counts, such as CGA's 40-column mode or VGA's 50-line mode. The program ignores graphics screens.

Once installed, SNAP occupies approximately 7.5K of memory. When it detects the ALT+LEFT SHIFT+S key combination, SNAP displays a prompt for a file specification. The user can type a new filename, accept the previous filename by pressing ENTER, or cancel the request by pressing ESC.

SNAP reads text directly from the video buffer and copies it to the specified file. The program sets the file pointer to the end of the file so that text is appended without overwriting previous data. SNAP copies each line only to the last character, ignoring trailing spaces. The program adds a carriage return-linefeed sequence (0D0Ah) to the end of each line. This makes the file accessible to any text editor that can read ASCII files.

To demonstrate how a program accesses resident data through the Multiplex interrupt, SNAP can reset the display attribute of its prompt box. After installing SNAP, run the main program with the /C option to change box colors:
SNAP /Cxx
The argument xx specifies the desired attribute as a two-digit hexadecimal number; for example, 7C for red on white, or 0F for monochrome high intensity. For a list of color and monochrome display attributes, refer to the "Tables" section of the Reference.
SNAP can deinstall itself, provided another TSR has not been loaded after it. Deinstall SNAP by executing the main program with the /D option:
SNAP /D
If SNAP successfully deinstalls, it displays the following message:

TSR deinstalled
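The two-digit /Cxx argument described above is an ordinary text-mode attribute byte. The following C sketch is not taken from the SNAP sources. It only illustrates one way such an argument could be validated and split into its color fields, assuming the standard PC text-mode layout (bits 0-3 foreground, bits 4-6 background). The names parse_attribute, attr_foreground, and attr_background are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: value of one hexadecimal digit, or -1 if invalid. */
static int hex_digit(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = toupper(c);
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Parse exactly two hex digits (for example "7C") into an attribute byte.
   Returns -1 if the text is not exactly two hexadecimal digits. */
int parse_attribute(const char *text)
{
    int hi, lo;

    if (text == NULL || text[0] == '\0' || text[1] == '\0' || text[2] != '\0')
        return -1;
    hi = hex_digit((unsigned char)text[0]);
    lo = hex_digit((unsigned char)text[1]);
    if (hi < 0 || lo < 0)
        return -1;
    return (hi << 4) | lo;
}

/* Assumed standard text-mode layout: low nibble is the foreground color,
   bits 4-6 are the background color. */
int attr_foreground(int attr) { return attr & 0x0F; }
int attr_background(int attr) { return (attr >> 4) & 0x07; }
```

With this sketch, the text "7C" yields the byte 7Ch, whose foreground field is 0Ch and whose background field is 7, matching the "red on white" example above.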
Building SNAP.EXE
SNAP combines four modules: SNAP.ASM, COMMON.ASM, HANDLERS.ASM, and INSTALL.ASM. Source files are located on one of your distribution disks. Each module stores temporary code and data in the segments INSTALLCODE and INSTALLDATA. These segments apply only to SNAP's installation phase; MS-DOS recovers the memory they occupy when the program exits through the terminate-and-stay-resident function. The following briefly describes each module:
• SNAP.ASM contains the TSR's main code and data.
• COMMON.ASM contains procedures used by other example programs.
• HANDLERS.ASM contains interrupt handler routines for Interrupts 08, 09, 10h, 13h, 15h, 28h, and 2Fh. It also provides simple error-trapping handlers for Interrupts 1Bh, 23h, and 24h. Additional routines set up and deinstall the handlers.
• INSTALL.ASM contains an exit routine that calls the terminate-and-stay-resident function and a deinstallation routine that removes the program from memory. The module includes error-checking services and a command-line parser.
This building-block approach allows you to create other TSRs by replacing SNAP.ASM and linking with the HANDLERS and INSTALL object modules. The library of routines accommodates both keyboard-activated and time-activated TSRs. A time-activated TSR is a program that activates at a predetermined time of day, similar to the example program ALARM introduced earlier in this chapter. The header comments for the Install procedure in HANDLERS.ASM explain how to install a time-activated TSR.
You can write new TSRs in assembly language or any high-level language that conforms to the Microsoft conventions for ordering segments. Regardless of the language, the new code must not invoke an MS-DOS function that sets up the I/O stack (see "Interrupting MS-DOS Functions," earlier in this chapter). Code in Microsoft C, for example, must not call getche or kbhit, since these functions in turn call MS-DOS Functions 01 and 0Bh.
Code written in a high-level language must not check for stack overflows. Compiler-generated stack probes do not recognize the new stack setup when the TSR executes, and therefore must be disabled. The example program BELL.C, included on disk with the TSR library routines, demonstrates how to disable stack checking in Microsoft C using the check_stack pragma.
Outline of SNAP
The following sections outline in detail how SNAP works. Each part of the outline covers a specific portion of SNAP's code. Headings refer to earlier sections of this chapter, providing cross-references to SNAP's key procedures. For example, the part of the outline that describes how SNAP searches for its startup signal refers to the section "Auditing Hardware Events for TSR Requests," earlier in this chapter. Figures 11.2 through 11.4 are flowcharts of the SNAP program. Each chart illustrates a separate phase of SNAP's operation, from installation through memory-residency to deinstallation.
Figure 11.2 Flowchart for SNAP.EXE: Installation Phase
Figure 11.3 Flowchart for SNAP.EXE: Resident Phase
Figure 11.4 Flowchart for SNAP.EXE: Deinstallation Phase
Refer to the flowcharts as you read the following outline. They will help you maintain perspective while exploring the details of SNAP's operation. Text in the outline cross-references the charts. Note that information in both the outline and the flowcharts is generic. Except for references to the SNAP procedure, all descriptions in the outline and the flowcharts apply to any TSR created with the HANDLERS and INSTALL modules.
Auditing Hardware Events for TSR Requests

To search for its startup signal, SNAP audits the keyboard with an interrupt handler for either Interrupt 09 (keyboard) or Interrupt 15h (Miscellaneous System Services). The Install procedure determines which of the two interrupts to handle based on the following code:
        .IF     HotScan == 0            ; If valid scan code given:
        mov     ah, HotShift            ; AH = hour to activate
        mov     al, HotMask             ; AL = minute to activate
        call    GetTimeToElapse         ; Get number of 5-second intervals
        mov     CountDown, ax           ;   to elapse before activation

        .ELSE                           ; Force use of KeybrdMonitor as
                                        ;   keyboard handler
        cmp     Version, 031Eh          ; DOS Version 3.3 or higher?
        jb      setup                   ; No? Skip next step

; Test for IBM PS/2 series. If not PS/2, use Keybrd and SkipMiscServ as
; handlers for Interrupts 09 and 15h respectively. If PS/2 system, set up
; KeybrdMonitor as the Interrupt 09 handler. Audit keystrokes with MiscServ
; handler, which searches for the hot key by handling calls to Interrupt 15h
; (Miscellaneous System Services). Refer to Section 11.2.1 for more
; information about keyboard handlers.

        mov     ax, 0C00h               ; Function 0Ch (Get System
        int     15h                     ;   Configuration Parameters)
        sti                             ; Compaq ROM may leave disabled

        jc      setup                   ; If carry set,
        or      ah, ah                  ;   or if AH not 0,
        jnz     setup                   ;   services are not supported

        ; Test bit 4 to see if Intercept is implemented
        test    BYTE PTR es:[bx+5], 00010000y
        jz      setup

        ; If so, set up MiscServ as Interrupt 15h handler
        mov     ax, OFFSET MiscServ
        mov     WORD PTR intMisc.NewHand, ax
        .ENDIF

        ; Set up KeybrdMonitor as Interrupt 09 handler
        mov     ax, OFFSET KeybrdMonitor
        mov     WORD PTR intKeybrd.NewHand, ax
The following describes the code's logic:
• If the program is running under MS-DOS version 3.3 or higher and if Interrupt 15h supports Function 4Fh, set up handler MiscServ to search for the hot key. Handle Interrupt 09 with KeybrdMonitor only to maintain the keyboard active flag.
• Otherwise, set up a handler for Interrupt 09 to search for the hot key. Handle calls to Interrupt 15h with the routine SkipMiscServ, which contains this single instruction:
jmp cs:intMisc.OldHand
The jump immediately passes control to the original Interrupt 15h routine; thus, SkipMiscServ has no effect. It serves only to simplify coding in other parts of the program.
At each keystroke, the keyboard interrupt handler (either Keybrd or MiscServ) calls the procedure CheckHotKey with the scan code of the current key. CheckHotKey compares the scan code and shift status with the bytes HotScan and HotShift. If the current key matches, CheckHotKey returns the carry flag clear to indicate that the user has pressed the hot key. If the keyboard handler finds the carry flag clear, it sets the flag TsrRequestFlag and exits. Otherwise, the handler transfers control to the original interrupt routine to service the interrupt.
The timer handler Clock reads the request flag at every occurrence of the timer interrupt. Clock takes no action if it finds a zero value in TsrRequestFlag. Figures 11.1 and 11.3 depict the relationship between the keyboard and timer handlers.
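The hot-key comparison described above can be modeled in C as follows. This is a minimal sketch, not the CheckHotKey source. It assumes that the shift status is first masked with a HotMask-style byte so that lock keys are ignored, and it represents the carry flag as a return value (0 for clear, 1 for set). The HotKey structure and the ALT_LSHIFT_S constant are hypothetical. The values 1Fh (scan code for S), 08h (ALT), and 02h (LEFT SHIFT) follow the standard BIOS keyboard conventions.

```c
/* Hypothetical description of a hot key. */
typedef struct {
    unsigned char scan;   /* HotScan: scan code of the key              */
    unsigned char shift;  /* HotShift: required shift-key bits          */
    unsigned char mask;   /* assumed HotMask: which shift bits to test  */
} HotKey;

/* ALT+LEFT SHIFT+S: scan code 1Fh; BIOS shift bits ALT = 08h and
   LEFT SHIFT = 02h. The mask 0Fh ignores the lock-key bits. */
const HotKey ALT_LSHIFT_S = { 0x1F, 0x0A, 0x0F };

/* Returns 0 ("carry clear") when the key matches the hot key,
   1 ("carry set") otherwise. */
int check_hot_key(const HotKey *hk, unsigned char scan,
                  unsigned char shift_status)
{
    if (scan == hk->scan && (shift_status & hk->mask) == hk->shift)
        return 0;
    return 1;
}
```

A keyboard handler built on this sketch would set its request flag when check_hot_key returns 0 and otherwise pass the keystroke on to the original routine.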
Monitoring System Status

Because SNAP produces output to both video and disk, it avoids interrupting either video or disk operations. The program uses interrupt handlers Video and DiskIO to monitor Interrupts 10h (video) and 13h (disk). SNAP also avoids interrupting keyboard use. The instructions at the far label KeybrdMonitor serve as the monitor handler for Interrupt 09 (keyboard). The three handlers perform similar functions. Each sets an active flag and then calls the original routine to service the interrupt. When the service routine returns, the handler clears the active flag to indicate that the device is no longer in use.
The BIOS Interrupt 13h routine clears or sets the carry flag to indicate the operations success or failure. DiskIO therefore preserves the flags register when returning, as shown here:
DiskIO  PROC    FAR
        mov     cs:intDiskIO.Flag, TRUE ; Set active flag

; Simulate interrupt by pushing flags and far-calling old
; Int 13h routine
        pushf
        call    cs:intDiskIO.OldHand

; Clear active flag without disturbing flags register
        mov     cs:intDiskIO.Flag, FALSE
        sti                             ; Enable interrupts

; Simulate IRET without popping flags (since services use
; carry flag)
        ret     2
DiskIO  ENDP
The terminating RET 2 instruction discards the original flags from the stack when the handler returns.
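The monitor-handler pattern (set the active flag, call the original routine, clear the flag, pass the status back) can be illustrated in C. The sketch below is a simulation only. The service's return value stands in for the carry flag, and the names Monitor, monitored_call, fake_disk_service, and demo_disk_call are hypothetical.

```c
/* A service routine returns 0 on success or 1 on failure, standing in
   for the carry flag returned by the BIOS. */
typedef int (*Service)(int request);

typedef struct {
    volatile int active;   /* nonzero while the device is being serviced */
    Service old_handler;   /* original service routine                   */
} Monitor;

/* Set the active flag, call the original routine, clear the flag, and
   return the routine's status unchanged. */
int monitored_call(Monitor *m, int request)
{
    int carry;

    m->active = 1;
    carry = m->old_handler(request);
    m->active = 0;
    return carry;
}

/* Demonstration stand-ins (not part of any real handler). */
static Monitor disk_monitor;
static int flag_seen_by_service;

static int fake_disk_service(int request)
{
    flag_seen_by_service = disk_monitor.active;  /* flag is set during the call */
    return request < 0;                          /* "fail" on negative requests */
}

int demo_disk_call(int request)
{
    disk_monitor.old_handler = fake_disk_service;
    return monitored_call(&disk_monitor, request);
}
```

The status returned by the original routine reaches the caller untouched, which is the property that RET 2 provides in the assembly version.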
Determining Whether to Invoke the TSR

The procedure CheckRequest determines whether the TSR:
• Has been requested.
• Can safely interrupt the system.
Each time it executes, the timer handler Clock calls CheckRequest to read the flag TsrRequestFlag. If CheckRequest finds the flag set, it scans other flags maintained by the TSR's interrupt handlers and by MS-DOS. These flags indicate the current system status. As the flowchart in Figure 11.3 shows, CheckRequest calls CheckDos (described in the following sections) to determine the status of the operating system. CheckRequest then calls CheckHardware to check hardware status.
CheckHardware queries the interrupt controller to determine if any device is currently being serviced. It also reads the active flags maintained by the KeybrdMonitor, Video, and DiskIO handlers. If the controller, keyboard, video, and disk are all inactive, CheckHardware clears the carry flag and returns.
CheckRequest indicates system status with the carry flag. If the procedure returns the carry flag set, the caller exits without invoking the TSR. A clear carry signals that the caller can safely execute the TSR.
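The decision sequence described above can be summarized in a short C sketch. It is a simplified model, not the CheckRequest source. All status information is collapsed into hypothetical bit flags, and the carry flag is modeled as a return value (0 for clear, 1 for set).

```c
/* Hypothetical status bits for this sketch. */
enum {
    TSR_REQUESTED   = 0x01,  /* TsrRequestFlag is set                      */
    DOS_BUSY        = 0x02,  /* CheckDos would return the carry flag set   */
    CONTROLLER_BUSY = 0x04,  /* interrupt controller is servicing a device */
    KEYBOARD_ACTIVE = 0x08,  /* flag kept by KeybrdMonitor                 */
    VIDEO_ACTIVE    = 0x10,  /* flag kept by Video                         */
    DISK_ACTIVE     = 0x20   /* flag kept by DiskIO                        */
};

/* Returns 0 ("carry clear") if the controller, keyboard, video, and disk
   are all inactive; returns 1 ("carry set") otherwise. */
int check_hardware(unsigned status)
{
    unsigned busy = CONTROLLER_BUSY | KEYBOARD_ACTIVE |
                    VIDEO_ACTIVE | DISK_ACTIVE;

    return (status & busy) ? 1 : 0;
}

/* Returns 0 only when the TSR has been requested and neither MS-DOS
   nor the hardware is busy. */
int check_request(unsigned status)
{
    if (!(status & TSR_REQUESTED))
        return 1;                    /* nothing to do             */
    if (status & DOS_BUSY)
        return 1;                    /* operating system is busy  */
    return check_hardware(status);   /* finally check hardware    */
}
```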
Determining MS-DOS Activity

As Figure 11.2 shows, the procedure GetDosFlags locates the InDos flag during SNAP's installation phase. GetDosFlags calls Function 34h (Get Address of InDos Flag) and then stores the flag's address in the far pointer InDosAddr. When called from the CheckRequest procedure, CheckDos reads InDos to determine whether the operating system is active. Note that CheckDos reads the flag directly from the address in InDosAddr. It does not call Function 34h to locate the flag, since it has not yet established whether MS-DOS is active. This follows from the general rule that interrupt handlers must not call any MS-DOS function. The next two sections more fully describe the procedure CheckDos.
Interrupting MS-DOS Functions

Figure 11.3 shows that the call to CheckDos can initiate either from Clock (timer handler) or Idle (Interrupt 28h handler). If CheckDos finds the InDos flag set, it reacts in different ways, depending on the caller:
• If called from Clock, CheckDos cannot know which MS-DOS function is active. In this case, it returns the carry flag set, indicating that Clock must deny the request for the TSR.
• If called from Idle, CheckDos assumes that one of the low-order polling functions is active. It therefore clears the carry flag to let the caller know the TSR can safely interrupt the function.
For more information on this topic, see the section "Interrupting MS-DOS Functions," earlier in this chapter.
Monitoring the Critical Error Flag

The procedure GetDosFlags (Figure 11.2) determines the address of the Critical Error flag. The procedure stores the flag's address in the far pointer CritErrAddr. When called from either the Clock or Idle handlers, CheckDos reads the Critical Error flag. A nonzero value in the flag indicates that the Critical Error Handler (Interrupt 24h) is processing a critical error and the TSR must not interrupt. In this case, CheckDos sets the carry flag and returns, causing the caller to exit without executing the TSR.
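Taken together, the rules for the InDos and Critical Error flags amount to a small decision function. The following C sketch restates them. It is not the CheckDos source. The order in which the two flags are tested is an assumption, and the type Caller and the function check_dos are hypothetical names. The carry flag is modeled as a return value (0 for clear, 1 for set).

```c
/* Which handler is asking (hypothetical names). */
typedef enum { FROM_CLOCK, FROM_IDLE } Caller;

/* Returns 1 ("carry set") when the TSR must not interrupt MS-DOS,
   0 ("carry clear") when it may. */
int check_dos(int crit_err_flag, int in_dos_flag, Caller caller)
{
    /* A critical error is in progress: never interrupt. */
    if (crit_err_flag != 0)
        return 1;

    /* MS-DOS is idle: safe for either caller. */
    if (in_dos_flag == 0)
        return 0;

    /* InDos is set. From Idle (Interrupt 28h), assume a low-order
       polling function is active, which can safely be interrupted.
       From Clock, the active function is unknown, so refuse. */
    return (caller == FROM_IDLE) ? 0 : 1;
}
```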
Trapping Errors
As Figure 11.3 shows, Clock and Idle invoke the TSR by calling the procedure Activate. Before calling the main body of the TSR, Activate sets up the following handlers:
Handler Name   For Interrupt                  Receives Control When
CtrlBreak      1Bh (CTRL+BREAK Handler)       CTRL+BREAK sequence entered at keyboard
CtrlC          23h (CTRL+C Handler)           MS-DOS detects a CTRL+C sequence from the keyboard or input stream
CritError      24h (Critical Error Handler)   MS-DOS encounters a critical error
These handlers trap keyboard break signals and critical errors that would otherwise trigger the original handler routines. The CtrlBreak and CtrlC handlers contain a single IRET instruction, thus rendering a keyboard break ineffective. The CritError handler contains the following instructions:
CritError PROC  FAR
        sti
        sub     al, al                  ; Assume DOS 2.x
                                        ; Set AL = 0 for ignore error
        .IF     cs:major != 2           ; If DOS 3.x, set AL = 3
        mov     al, 3                   ;   DOS call fails
        .ENDIF
        iret
CritError ENDP
The return code in AL stops MS-DOS from taking further action when it encounters a critical error. As an added precaution, Activate also calls Function 33h (Get or Set CTRL+BREAK Flag) to determine the current setting of the checking flag. Activate stores the setting, then calls Function 33h again to turn off break checking.
When the TSR's main procedure finishes its work, it returns to Activate, which restores the original setting for the checking flag. It also replaces the original vectors for Interrupts 1Bh, 23h, and 24h. SNAP's error-trapping safeguards enable the TSR to retain control in the event of an error. Pressing CTRL+BREAK or CTRL+C at SNAP's prompt has no effect. If the user specifies a nonexistent drive (a critical error), SNAP merely beeps the speaker and returns normally.
Preserving an Existing Condition

Activate records the stack pointer SS:SP in the doubleword OldStackAddr. The procedure then resets the pointer to the address of a new stack before calling the TSR. Switching stacks ensures that SNAP has adequate stack depth while it executes.
The label NewStack points to the top of the new stack buffer, located in the code segment of the HANDLERS.ASM module. The equate constant STACK_SIZ determines the size of the stack. The include file TSR.INC contains the declaration for STACK_SIZ.
Activate preserves the values in all registers by pushing them onto the new stack. It does not push DS, since that register is already preserved in the Clock or Idle handler.
SNAP does not alter the application's video configuration other than by moving the cursor. Figure 11.3 shows that Activate calls the procedure Snap, which executes Interrupt 10h to determine the current cursor position. Snap stores the row and column in the word OldPos. The procedure restores the cursor to its original location before returning to Activate.
Preserving Existing Data

Because SNAP does not call an MS-DOS function that writes to the DTA, it does not need to preserve the DTA belonging to the interrupted process. However, the code for switching and restoring the DTA is included within IFDEF blocks in the procedure Activate. The equate constant DTA_SIZ, declared in the TSR.INC file, governs the assembly of the blocks as well as the size of the new DTA. It is possible for SNAP to overwrite existing extended error information by committing a file error. The program does not attempt to preserve the original information by calling Functions 59h and 5Dh. In certain rare instances, this may confuse the interrupted process after SNAP returns.
Communicating Through the Multiplex Interrupt

The program uses the Multiplex interrupt (Interrupt 2Fh) to
• Verify that SNAP is installed.
• Select a unique multiplex identity number.
• Locate resident data.
For more information about Interrupt 2Fh, see the section "Communicating through the Multiplex Interrupt," earlier in this chapter.
SNAP accesses Interrupt 2Fh through the procedure CallMultiplex, as shown in Figures 11.2 and 11.4. By searching for a prior installation, CallMultiplex ensures that SNAP is not installed more than once. During deinstallation, CallMultiplex locates data required to deinstall the resident TSR.
The procedure Multiplex serves as SNAP's multiplex handler. When it recognizes its identity number in AH, Multiplex determines its tasks from the function number in the AL register. The handler responds to Function 0 by returning AL equal to 0FFh and ES:DI pointing to an identifier string unique to SNAP.
CallMultiplex searches for the handler by invoking Interrupt 2Fh in a loop, beginning with a trial identity number of 192 in AH. At the start of each iteration of the loop, the procedure sets AL to zero to request presence verification from the multiplex handler. If the handler returns 0FFh in AL, CallMultiplex compares its copy of SNAP's identifier string with the text at memory location ES:DI. A failed match indicates that the multiplex handler servicing the call is not SNAP's handler. In this case, CallMultiplex increments AH and cycles back to the beginning of the loop.
The process repeats until the call to Interrupt 2Fh returns a matching identifier string at ES:DI, or until AL returns as zero. A matching string verifies that SNAP is installed, since its multiplex handler has serviced the call. A return value of zero indicates that SNAP is not installed and that no multiplex handler claims the trial identity number in AH. In this case, SNAP assigns the number to its own handler.
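The search loop can be simulated in C. In the sketch below, a table indexed by identity number stands in for the Interrupt 2Fh call: a NULL entry models a return of zero in AL, and a string models a handler that returns 0FFh with ES:DI pointing to its identifier. The table, the identifier strings, the upper limit of 255 (the largest value AH can hold), and the names MuxResult and call_multiplex are all assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

#define FIRST_ID 192   /* first trial identity number (from the text) */
#define LAST_ID  255   /* assumed limit: AH is an 8-bit register      */

typedef struct {
    int installed;  /* 1 if a handler with a matching identifier was found */
    int id;         /* matching number, first free number, or -1 if none   */
} MuxResult;

/* Simulated multiplex chain: installed[id] is the identifier string of
   the handler claiming that number, or NULL if no handler claims it. */
MuxResult call_multiplex(const char *const installed[256],
                         const char *my_ident)
{
    MuxResult r = { 0, -1 };
    int id;

    for (id = FIRST_ID; id <= LAST_ID; id++) {
        const char *ident = installed[id];

        if (ident == NULL) {              /* "AL returned 0": number is free */
            r.id = id;
            return r;
        }
        if (strcmp(ident, my_ident) == 0) {   /* our handler answered */
            r.installed = 1;
            r.id = id;
            return r;
        }
        /* Some other TSR owns this number: try the next one. */
    }
    return r;                             /* every number is taken */
}

/* Demonstration table with made-up identifier strings. */
static const char *const demo_table[256] = {
    [192] = "OTHER TSR",
    [193] = "SNAP DEMO ID"
};
```

With the demonstration table, a search for "SNAP DEMO ID" reports an existing installation at number 193, while a search for an identifier that is not present reports 194 as the first free number.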
Deinstalling a TSR
During deinstallation, CallMultiplex locates SNAP's multiplex handler as described previously. The handler Multiplex receives the verification request and returns in ES the code segment of the resident program.
Deinstall reads the addresses of the following interrupt handlers from the data structure in the resident code segment:

Handler Name    Description
Clock           Timer handler
Keybrd          Keyboard handler (non-PS/2)
KeybrdMonitor   Keyboard monitor handler (PS/2)
Video           Video monitor handler
DiskIO          Disk monitor handler
SkipMiscServ    Miscellaneous Systems Services handler (non-PS/2)
MiscServ        Miscellaneous Systems Services handler (PS/2)
Idle            MS-DOS Idle handler
Multiplex       Multiplex handler
Deinstall calls MS-DOS Function 35h (Get Interrupt Vector) to retrieve the current vectors for each of the listed interrupts. By comparing each handler address with the corresponding vector, Deinstall ensures that SNAP can be safely deinstalled. Failure in any of the comparisons indicates that another TSR has been installed after SNAP and has set up a handler for the same interrupt. In this case, Deinstall returns an error code, stopping the program with the following message:
Can't deinstall TSR
If all addresses match, Deinstall calls Interrupt 2Fh with SNAP's identity number in AH and AL set to 1. The handler Multiplex responds by returning in ES the address of the resident code's PSP. Deinstall then calls MS-DOS Function 25h (Set Interrupt Vector) to restore the vectors for the original service routines. This is called "unhooking" or "unchaining" the interrupt handlers. After unhooking all of SNAP's interrupt handlers, Deinstall returns with AX pointing to the resident code's PSP. The procedure FreeTsr then calls MS-DOS Function 49h (Release Memory) to return SNAP's memory to the operating system. The program ends with the message
TSR deinstalled
to indicate a successful deinstallation. Deinstalling SNAP does not guarantee more available memory space for the next program. If another TSR loads after SNAP but handles interrupts other than 08, 09, 10h, 13h, 15h, 28h, or 2Fh, SNAP still deinstalls properly. The result is a harmless gap of deallocated memory formerly occupied by SNAP. MS-DOS can use the free memory to store the next program's environment block. However, MS-DOS loads the program itself above the still-resident TSR.
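The "compare, then unhook" rule that Deinstall follows can be sketched in C against a simulated vector table. This is an illustration under stated assumptions, not the Deinstall source. The array stands in for the interrupt vector table that Functions 35h and 25h read and write, the addresses in the demonstration are arbitrary, and the names FarPtr, IntHandler, deinstall, and demo_deinstall are hypothetical.

```c
#include <stddef.h>

typedef struct { unsigned seg, off; } FarPtr;

typedef struct {
    int    number;    /* interrupt number                      */
    FarPtr new_hand;  /* address of the TSR's handler          */
    FarPtr old_hand;  /* original vector saved at installation */
} IntHandler;

static int same(FarPtr a, FarPtr b)
{
    return a.seg == b.seg && a.off == b.off;
}

/* Returns 1 and restores the original vectors if every vector still
   points at the TSR's handler. Returns 0 and changes nothing if any
   vector has been replaced by a later program. */
int deinstall(FarPtr vectors[256], const IntHandler *h, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (!same(vectors[h[i].number], h[i].new_hand))
            return 0;                          /* "Can't deinstall TSR" */
    for (i = 0; i < count; i++)
        vectors[h[i].number] = h[i].old_hand;  /* unhook the handler    */
    return 1;
}

/* Demonstration with arbitrary addresses for Interrupts 08 and 09. */
int demo_deinstall(int later_tsr_hooked_int09)
{
    FarPtr vectors[256] = { { 0, 0 } };
    IntHandler h[2] = {
        { 0x08, { 0x2000, 0x0100 }, { 0xF000, 0x1111 } },
        { 0x09, { 0x2000, 0x0140 }, { 0xF000, 0x2222 } }
    };

    vectors[0x08] = h[0].new_hand;
    vectors[0x09] = h[1].new_hand;
    if (later_tsr_hooked_int09) {
        vectors[0x09].seg = 0x3000;   /* a later TSR took over Int 09 */
        vectors[0x09].off = 0x0010;
    }
    if (!deinstall(vectors, h, 2))
        return 0;
    return same(vectors[0x08], h[0].old_hand) &&
           same(vectors[0x09], h[1].old_hand);
}
```

Checking every vector before changing any of them keeps the table consistent: either all handlers are unhooked or none are.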
Chapter 12
Mixed-Language Programming
Mixed-language programming allows you to combine the unique strengths of Microsoft Basic, C, C++, and FORTRAN with your assembly-language routines. Any one of these languages can call MASM routines, and you can call any of these languages from within your assembly-language programs. This makes virtually all the routines from high-level-language libraries available to a mixed-language program.
MASM 6.1 provides mixed-language features similar to those in high-level languages. For example, you can use the INVOKE directive to call high-level-language procedures, and the assembler handles the argument-passing details for you. You can also use H2INC to translate C header files to MASM include files, as explained in Chapter 20 of Environment and Tools.
The mixed-language features of MASM 6.1 do not make older methods of defining mixed-language interfaces obsolete. In most cases, mixed-language programs written with earlier versions of MASM will assemble and link correctly under MASM 6.1. (For more information, see Appendix A.)
This chapter explains how to write assembly routines that can be called from high-level-language modules and how to call high-level-language routines from MASM. You should already understand the languages you want to combine and should know how to write, compile, and link multiple-module programs with these languages.
This chapter covers only assembly-language interface with C, C++, Basic, and FORTRAN; it does not cover mixed-language programming between high-level languages. The focus here is the Microsoft versions of C, C++, Basic, and FORTRAN, but the same principles apply to other languages and compilers. Many of the techniques used in this chapter are explained in the material in Chapter 7 on writing procedures in assembly language, and in Chapter 8 on multiple-module programming.
The first section of this chapter discusses naming and calling conventions. The next section, "Writing an Assembly Procedure for a Mixed-Language Program," provides a template for writing an assembly-language procedure that can be called from another module written in a high-level language. This represents the essence of mixed-language programming. Assembly language is often used for creating fast secondary routines in a large program written in a high-level language.
The third section describes specific conventions for linking assembly-language procedures with modules in C, C++, Basic, and FORTRAN. These language-specific sections also provide details on how the language manages various data structures so that your MASM programs are compatible with the data from the high-level language.
Naming and Calling Conventions

Each language has its own set of conventions, which fall into two categories:
• The naming convention specifies how or if the compiler or assembler alters the name of an identifier before placing it into an object file.
• The calling convention determines how a language implements a call to a procedure and how the procedure returns to the caller.
MASM supports several different conventions. The assembler uses C convention when you specify a language type (langtype) of C, and Pascal convention for language types PASCAL, BASIC, or FORTRAN. To the assembler, the keywords BASIC, PASCAL, and FORTRAN are synonymous. MASM also supports the SYSCALL and STDCALL conventions, which mix elements of the C and Pascal conventions.
MASM gives you several ways to set the naming and calling conventions in your assembly-language program. Using .MODEL with a langtype sets the default for the module. This can also be done with the OPTION directive. This is equivalent to the /Gc or /Gd option from the command line. Procedure prototypes and declarations can specify a langtype to override the default.
When you write mixed-language routines, the easiest way to ensure convention compatibility is to adopt the conventions of the called procedure's language. However, Microsoft languages can change the naming and calling conventions for different procedures. If your program must call a procedure that uses an argument-passing method different from that of the default language, prototype the procedure first with the desired language type. This tells the assembler to override the conventions of the default language and assume the proper conventions for the prototyped procedure. The "MASM/High-Level-Language Interface" section in this chapter explains how to change the default conventions. The following sections provide more detail on the information summarized in Table 12.1.
Table 12.1 Naming and Calling Conventions
Convention                       C    SYSCALL  STDCALL  BASIC  FORTRAN  PASCAL
Leading underscore               X             X
Capitalize all                                          X      X        X
Arguments pushed left to right                          X      X        X
Arguments pushed right to left   X    X        X
Caller stack cleanup             X    X        *
:VARARG allowed                  X    X        X
* The STDCALL language type uses caller stack cleanup if the :VARARG parameter is used. Otherwise, the called routine must clean up the stack.
Naming Conventions
Naming convention refers to the way a compiler or assembler stores the names of identifiers. The first two rows of Table 12.1 show how each language type affects symbol names. SYSCALL leaves symbol names as they appear in the source code, but C and STDCALL add an underscore prefix. PASCAL, BASIC, and FORTRAN change symbols to all uppercase. The following list describes how these naming conventions affect a variable called BigTime in your source code:
Langtype Specified        Characteristics
SYSCALL                   Leaves the name unmodified. The linker sees the variable as BigTime.
C, STDCALL                The assembler (or compiler) adds a leading underscore to the name, but does not change case. The linker sees the variable as _BigTime.
PASCAL, FORTRAN, BASIC    Converts all names to uppercase. The linker sees the variable as BIGTIME.
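The naming rules listed above are simple enough to express as a small C function. The sketch below models only the rules stated in this section (leading underscore or conversion to uppercase). It is not how the assembler is implemented, and the names LangType and decorate are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    LANG_C, LANG_SYSCALL, LANG_STDCALL,
    LANG_PASCAL, LANG_BASIC, LANG_FORTRAN
} LangType;

/* Write the name as the linker would see it into out (size bytes).
   Returns out, or NULL if the buffer is too small. */
char *decorate(LangType lang, const char *name, char *out, size_t size)
{
    size_t len = strlen(name);
    size_t i;

    switch (lang) {
    case LANG_C:
    case LANG_STDCALL:              /* add a leading underscore, keep case */
        if (len + 2 > size)
            return NULL;
        out[0] = '_';
        memcpy(out + 1, name, len + 1);
        return out;
    case LANG_SYSCALL:              /* leave the name unmodified */
        if (len + 1 > size)
            return NULL;
        memcpy(out, name, len + 1);
        return out;
    default:                        /* PASCAL, BASIC, FORTRAN: uppercase */
        if (len + 1 > size)
            return NULL;
        for (i = 0; i <= len; i++)  /* also copies the terminating 0 */
            out[i] = (char)toupper((unsigned char)name[i]);
        return out;
    }
}
```

For the variable BigTime, this function produces BigTime for SYSCALL, _BigTime for C and STDCALL, and BIGTIME for PASCAL, BASIC, and FORTRAN.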
The C Calling Convention

Specify the C language type for assembly-language procedures called from programs that assume the C calling convention. Note that such programs are not necessarily written in C, since other languages can mimic C conventions.
Argument Passing
With the C calling convention, the caller pushes arguments from right to left as they appear in the caller's argument list. The called procedure returns without removing the arguments from the stack. It is the caller's responsibility to clean the stack after the call, either by popping the arguments or by adding an appropriate value to the stack pointer SP.
Register Preservation
The called routine must return with the original values in BP, SI, DI, DS, and SS. It must also preserve the direction flag.
Varying Number of Arguments

The additional overhead of cleaning the stack after each call has compensations. It frees the caller from having to pass a set number of arguments to the called procedure each time. Because the first argument in the list is always the last one pushed, it is always on the top of the stack. Thus, it has the same address relative to the frame pointer, regardless of how many arguments were actually passed. For example, consider the C library function printf, which accepts different numbers of arguments. A C program calls the function like this:
printf( "Numbers: %f %f %.2f\n", n1, n2, n3 );
printf( "Also: %f", n4 );
The first line passes four arguments (including the string in quotes) and the second line passes only two arguments. Notice that printf has no reliable way of determining how many arguments the caller has pushed. Therefore, the function returns without adjusting the stack. The C calling convention requires the caller to take responsibility for removing the arguments from the stack, since only the caller knows how many arguments it passed.
Use INVOKE to call a C-callable function from your assembly-language program, since INVOKE automatically generates the necessary stack-cleaning code after the call. You must also prototype the function with the VARARG keyword if appropriate, as explained in "Procedures," Chapter 7. Similarly, when you write a C-callable procedure that accepts a varying number of arguments, include VARARG in the procedure's PROC statement.
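From the C side, a function that accepts a varying number of arguments looks like the following sketch, which uses the standard stdarg.h macros. The function sum_doubles is a hypothetical example. Like printf, it cannot discover on its own how many arguments were passed, so the caller supplies the count in the first, fixed argument.

```c
#include <stdarg.h>

/* Add up `count` double arguments. The function has no way to verify
   that the caller really passed that many; it trusts the first argument. */
double sum_doubles(int count, ...)
{
    va_list ap;
    double total = 0.0;
    int i;

    va_start(ap, count);              /* start after the last fixed argument */
    for (i = 0; i < count; i++)
        total += va_arg(ap, double);  /* fetch the next argument */
    va_end(ap);
    return total;
}
```

Calls such as sum_doubles(3, 1.0, 2.0, 3.5) and sum_doubles(1, 4.0) pass different numbers of arguments to the same function, which is why only the caller knows how much stack to release afterward.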
The Pascal Calling Convention

By default, the langtype for FORTRAN, BASIC, and PASCAL selects the Pascal calling convention. This convention pushes arguments left to right so that the last argument is lowest on the stack, and it requires that the called routine remove arguments from the stack.
Argument Passing
Arguments are placed on the stack in the same order in which they appear in the source code. The first argument is highest in memory (because it is also the first argument to be placed on the stack), and the stack grows downward.
Register Preservation
A routine that uses the Pascal calling convention must preserve SI, DI, BP, DS, and SS. For 32-bit code, the EBX, ES, FS, and GS registers must be preserved as well as EBP, ESI, and EDI. The direction flag is also cleared upon entry and must be preserved.
Varying Number of Arguments
Passing a variable number of arguments is not possible with the Pascal calling convention.
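The effect of push order on argument addresses can be checked with a little arithmetic. The C sketch below computes where an argument lies relative to the frame pointer under both orders. It assumes 16-bit code, word-sized arguments, and the usual frame set up by "push bp" followed by "mov bp, sp". The names PushOrder and arg_offset are hypothetical.

```c
typedef enum { ORDER_C, ORDER_PASCAL } PushOrder;

/* Offset from BP of argument `index` (0 = first in the source list) out
   of `count` word-sized arguments. Returns -1 for an invalid index.
   Assumed frame: saved BP at [bp], return address above it (2 bytes for
   a near call, 4 for a far call), then the arguments. */
int arg_offset(PushOrder order, int index, int count, int is_far)
{
    int base = is_far ? 6 : 4;

    if (index < 0 || index >= count)
        return -1;
    if (order == ORDER_C)
        return base + 2 * index;            /* first argument pushed last  */
    return base + 2 * (count - 1 - index);  /* first argument pushed first */
}
```

Under the C order, the first argument of a near procedure is always at [bp+4], however many arguments follow. Under the Pascal order, the first argument's offset depends on the total count (6 for two arguments, 10 for four), so the called routine must know the exact number of arguments.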
The STDCALL and SYSCALL Calling Conventions

A STDCALL procedure adopts the C naming and calling conventions when prototyped with the VARARG keyword. Refer to the section "Declaring Parameters with the PROC Directive" in Chapter 7. Without VARARG, the procedure uses the C naming and Pascal calling conventions. STDCALL provides compatibility with 32-bit versions of Microsoft compilers. As Table 12.1 shows, SYSCALL is identical to the C calling convention, but does not add an underscore prefix to symbols.
Argument Passing
Argument passing order for both STDCALL and SYSCALL is the same as the C calling convention. The caller pushes the arguments from right to left and must remove the parameters from the stack after the call. However, STDCALL requires the called procedure to clean the stack if the procedure does not accept a variable number of arguments.
Register Preservation
Both conventions require the called procedure to preserve the registers BP, SI, DI, DS, and SS. Under STDCALL, the direction flag is clear on entry and must be returned clear.
Varying Number of Arguments
SYSCALL allows a variable number of arguments in the same way as the C calling convention. STDCALL also mimics the C convention when VARARG appears in the called procedure's declaration or definition. It allows a varying number of arguments and requires the caller to clean the stack. If not declared or defined with VARARG, the called procedure does not accept a variable argument list and must clean the stack before it returns.
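The stack-cleanup rules for the four conventions can be collected into one small C function. The sketch below restates only what this section and Table 12.1 say. It is not assembler output, and the names Convention, callee_pop_bytes, and caller_pop_bytes are hypothetical.

```c
typedef enum { CONV_C, CONV_SYSCALL, CONV_STDCALL, CONV_PASCAL } Convention;

/* Number of argument bytes the called procedure removes when it returns.
   Returns -1 for a combination the convention does not allow. */
int callee_pop_bytes(Convention conv, int arg_bytes, int has_vararg)
{
    switch (conv) {
    case CONV_PASCAL:   /* callee always cleans; VARARG is not possible */
        return has_vararg ? -1 : arg_bytes;
    case CONV_STDCALL:  /* callee cleans unless VARARG is used */
        return has_vararg ? 0 : arg_bytes;
    default:            /* C and SYSCALL: the caller cleans */
        return 0;
    }
}

/* Number of argument bytes the caller must remove after the call. */
int caller_pop_bytes(Convention conv, int arg_bytes, int has_vararg)
{
    int callee = callee_pop_bytes(conv, arg_bytes, has_vararg);

    return (callee < 0) ? -1 : arg_bytes - callee;
}
```

For example, a STDCALL procedure taking 8 bytes of fixed arguments removes all 8 bytes itself, while the same procedure declared with VARARG leaves them for the caller.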
Writing an Assembly Procedure For a Mixed-Language Program

MASM 6.1 simplifies the coding required for linking MASM routines to high-level language routines. You can use the PROTO directive to write procedure prototypes, and the INVOKE directive to call external routines. MASM simplifies procedure-related tasks in the following ways:
• The PROTO directive improves error checking on argument types.
• INVOKE pushes arguments onto the stack and converts argument types to types expected when possible. These arguments can be referenced by their parameter label, rather than as offsets of the stack pointer.
• The LOCAL directive following the PROC statement saves places on the stack for local variables. These variables can also be referenced by name, rather than as offsets of the stack pointer.
• PROC sets up the appropriate stack frame according to the processor mode.
• The USES keyword preserves registers given as arguments.
• The C calling conventions specified in the PROC syntax allow for a variable number of arguments to be passed to the procedure.
• The RET keyword adjusts the stack upward by the number of bytes in the argument list, removes local variables from the stack, and pops saved registers.
• The PROC statement lists parameter names and types. The parameters can be referenced by name inside the procedure.
The complete syntax and parameter descriptions for these procedure directives are explained in "Procedures" in Chapter 7. This section provides a template that you can use for writing a MASM routine to be called from a high-level language. The template looks like this:
        Label   PROC    [[distance langtype visibility <prologueargs> USES reglist parmlist]]
                LOCAL   varlist
                .
                .
                .
                RET
        Label   ENDP
Replace the placeholder words with appropriate keywords, registers, or variables as defined by the syntax in "Declaring Parameters with the PROC Directive" in Chapter 7.
The distance (NEAR or FAR) and visibility (PUBLIC, PRIVATE, or EXPORT) that you give in the procedure declaration override the current defaults. In some languages, the model can also be specified with command-line options. The langtype determines the calling convention for accessing arguments and restoring the stack. For information on calling conventions, see "Naming and Calling Conventions" earlier in this chapter.
The types for the parameters listed in the parmlist must be given. Also, if any of the parameters are pointers, the assembler does not generate code to get the value of the pointer references. You must write this code yourself. An example of how to write such code is provided in "Declaring Parameters with the PROC Directive" in Chapter 7.
If you need to code your own stack-frame setup manually, or if you do not want the assembler to generate the standard stack setup and cleanup, see "Passing Arguments on the Stack" and "User-Defined Prologue and Epilogue Code" in Chapter 7.
The MASM/High-Level-Language Interface

Since high-level-language programs require initialization, you must write the main routine of a mixed-language program in the high-level language, or link with the startup code supplied by the high-level-language compiler. This gives the assembly code access to high-level routines or library functions. The next section explains how to link an assembly-language program with C-language startup code.
For procedures with prototypes, INVOKE makes calls from MASM to high-level-language programs, much like procedure or function calls in the high-level language. INVOKE calls procedures and generates the code to push arguments in the order specified by the procedure's calling convention, and to remove arguments from the stack at the end of the procedure. INVOKE can also do type checking and data conversion for the argument types so that the procedure receives compatible data. For explanations of how to write procedure prototypes and several examples of procedure declarations and the corresponding prototypes, see Declaring Procedure Prototypes in Chapter 7.
For programs that mix assembly language and C, the H2INC utility makes it easy to write prototypes and data declarations for the C procedures you want to call from MASM. H2INC translates the C prototypes and declarations into the corresponding MASM prototypes and declarations, which INVOKE can use to call the procedure. The use of H2INC is explained in Chapter 20 in Environment and Tools.
Mixed-language programming also allows the main program or a routine to use external data, that is, data defined in the other module. External data is data that is stored in a set place in memory (unlike dynamic and local data, which is allocated on the stack and heap) and is visible to other modules. External data is shared by all routines. One of the modules must define the static data, which causes the compiler to allocate storage for the data. The other modules that access the data must declare the data as external.
Argument Passing
Each language has its own convention for how an argument is actually passed. If the argument-passing conventions of your routines do not agree, then a called routine receives bad data. Microsoft languages support three different methods for passing an argument:
• Near reference. Passes a variable's near (offset) address, expressed as an offset from the default data segment. This method gives the called routine direct access to the variable itself. Any change the routine makes to the parameter is reflected in the calling routine.
• Far reference. Passes a variable's far (segmented) address. Though slower than passing a near reference, this method is necessary for passing data that lies outside the default data segment. (This is not an issue in Basic unless you have specifically requested far memory.)
• Value. Passes only a copy of the variable, not its address. With this method, the called routine gets a copy of the argument on the stack, but has no access to the original variable. The copy is discarded when the routine returns, and the variable retains its original value.
When you pass arguments between routines written in different languages, you must ensure that the caller and the called routine use the same conventions for passing and receiving arguments. In most cases, you should check the argument-passing defaults used by each language and make any necessary adjustments. Most languages have features that allow you to change argument-passing methods.
A procedure called from any high-level language should preserve the direction flag and the values of BP, SI, DI, SS, and DS. Routines called from MASM must not alter SI, DI, SS, DS, or BP.
Pushing Addresses
Microsoft high-level languages push segment addresses before offsets. This lets the called routine use the LES and LDS instructions to read far addresses from the stack. Furthermore, each word of an argument is placed on the stack in order of significance. Thus, the high word of a long integer is pushed first, followed by the low word.
Array Storage
Most high-level-language compilers store arrays in row-major order. This means that all elements of a row are stored consecutively. The first five elements of an array with four rows and three columns are stored in row-major order as
A[1, 1], A[1, 2], A[1, 3], A[2, 1], A[2, 2]
In column-major order, the column elements are stored consecutively. For example, this same array would be stored in column-major order as
A[1, 1], A[2, 1], A[3, 1], A[4, 1], A[1, 2], A[2, 2]
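The two storage orders can be compared with a short sketch in C. The function names below are hypothetical, and the indices are zero-based, so the element written A[2, 1] in the text corresponds to (row 1, column 0) here.

```c
#include <assert.h>
#include <stddef.h>

/* Zero-based position of element (row, col) in a rows x cols array stored
   in row-major order: all elements of a row are consecutive. */
size_t row_major_offset(size_t row, size_t col, size_t rows, size_t cols)
{
    assert(row < rows && col < cols);
    return row * cols + col;
}

/* Position of the same element in column-major order: all elements of a
   column are consecutive. */
size_t col_major_offset(size_t row, size_t col, size_t rows, size_t cols)
{
    assert(row < rows && col < cols);
    return col * rows + row;
}
```

For the array with four rows and three columns, A[2, 1] is the fourth element in row-major order and the second element in column-major order, which matches the listings above.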
The C/MASM Interface

This section summarizes the characteristics of the interface between MASM and Microsoft C and QuickC compilers. With the default naming and calling convention, the assembler (or compiler) pushes arguments right to left and adds a leading underscore to routine names.
Compatible Data Types

This list shows the 16-bit C data types and equivalent data types in MASM 6.1. For 32-bit C compilers, int and unsigned int are equivalent to the MASM types SDWORD and DWORD, respectively.
C Type                          Equivalent MASM Type
unsigned char                   BYTE
char                            SBYTE
unsigned short, unsigned int    WORD
int, short                      SWORD
unsigned long                   DWORD
long                            SDWORD
float                           REAL4
double                          REAL8
long double                     REAL10
Naming Restrictions
C is case-sensitive and does not convert names to uppercase. Since C normally links with the /NOI command-line option, you should assemble MASM modules with the /Cx or /Cp option to prevent the assembler from converting names to uppercase.
Argument-Passing Defaults
C always passes arrays by reference and all other variables (including structures) by value. C programs in tiny, small, and medium model pass near addresses for arrays, unless another distance is specified. Compact-, large-, and huge-model programs pass far addresses by default. To pass by reference a variable type other than array, use the C-language address-of operator (&). If you need to pass an array by value, declare the array as a structure member and pass a copy of the entire structure. However, this practice is rarely necessary and usually impractical except for very small arrays, since it can make substantial demands on stack space. If your program must maintain an array through a procedure call, create a temporary copy of the array in heap and provide the copy to the procedure by reference.
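As a minimal sketch of these defaults, the following C fragment contrasts an array argument (passed by reference) with an array wrapped in a structure (passed by value). The type and function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>

struct Wrapped { int items[3]; };   /* hypothetical wrapper for a small array */

/* Receives the address of the caller's array, so the write is visible
   to the caller after the call returns. */
void bump_array(int items[], int count)
{
    assert(count > 0);
    items[0] += 1;
}

/* Receives a copy of the whole structure on the stack, so the caller's
   structure is left untouched. */
int bump_copy(struct Wrapped w)
{
    w.items[0] += 1;
    return w.items[0];
}
```

Calling bump_array changes the first element of the caller's array, while calling bump_copy changes only the temporary copy. The cost of that copy grows with the array size, which is why the structure technique suits only very small arrays.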
Changing the Calling Convention

Put _pascal or _fortran in the C function declaration to specify the Pascal calling convention.
Array Storage
Array declarations give the number of elements. A1[a][b] declares a two-dimensional array in C with a rows and b columns. By default, the array's lower bound is zero. Arrays are stored by the compiler in row-major order. By default, passing arrays from C passes a pointer to the first element of the array.
String Format
C stores strings as arrays of bytes and uses a null character as the end-of-string delimiter. For example, consider the string declared as follows:
char msg[] = "string of text";
The string occupies 15 bytes of memory (14 characters plus the terminating null character), as shown in the following figure.
Figure 12.1 C String Format
Since msg is an array of characters, it is passed by reference.
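The byte count can be checked with a small sketch. The helper name c_string_storage is hypothetical. It adds one byte for the null terminator to the character count reported by the standard strlen function.

```c
#include <assert.h>
#include <string.h>

/* Storage needed for a C string: its characters plus the null terminator. */
size_t c_string_storage(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1;
}
```

An assembly routine that receives msg by reference must therefore scan for the zero byte to find the end of the text, because no length is passed with the address.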
External Data
In C, the extern keyword tells the compiler that the data or function is external. You can define a static data object in a C module by defining a data object outside all functions and subroutines. Do not use the static keyword in C with a data object that you want to be public.
Structure Alignment
By default, C uses word alignment (unpacked storage) for all data objects longer than 1 byte. This storage method specifies that occasional bytes may be added as padding, so that word and doubleword objects start on an even boundary. In addition, all nested structures and records start on a word boundary. MASM aligns on byte boundaries by default. When converting .H files with H2INC, you can use the /Zp command-line option to specify structure alignment. If you do not specify the /Zp option, H2INC uses word-alignment. Without H2INC, set the alignment to 2 when declaring the MASM structure, compile the C module with /Zp1, or assemble the MASM module with /Zp2.
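The effect of the alignment setting can be sketched on a current compiler. This example assumes the compiler accepts the #pragma pack extension (as GCC, Clang, and Microsoft compilers do). The structure and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)            /* byte alignment, like the MASM default */
struct Packed1 { uint8_t tag; uint32_t value; };
#pragma pack(pop)

#pragma pack(push, 2)            /* word alignment, like the 16-bit C default */
struct Packed2 { uint8_t tag; uint32_t value; };
#pragma pack(pop)

/* Verifies the field offsets that each alignment setting produces. */
void check_layouts(void)
{
    assert(offsetof(struct Packed1, value) == 1);  /* no padding byte   */
    assert(offsetof(struct Packed2, value) == 2);  /* one padding byte  */
}
```

The single padding byte shifts every later field, so a MASM structure declared with byte alignment would read the wrong bytes from a word-aligned C structure. This is why both modules must agree on the alignment.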
Compiling and Linking

Use the same memory model for both C and MASM.
Returning Values
The assembler returns simple data types in registers. Table 12.2 shows the register conventions for returning simple data types to a C program.
Table 12.2 Register Conventions for Simple Return Values
Data Type                    Registers
char                         AL
short, near, int (16-bit)    AX
short, near, int (32-bit)    EAX
long, far (16-bit)           High-order portion (or segment address) in DX; low-order portion (or offset address) in AX
long, far (32-bit)           High-order portion (or segment address) in EDX; low-order portion (or offset address) in EAX
Procedures using the C calling convention and returning type float or type double store their return values into static variables. In multi-threaded programs, this could mean that the return value may be overwritten. You can avoid this by using the Pascal calling convention for multi-threaded programs so float or double values are passed on the stack. Structures less than 4 bytes long are returned in DX:AX. To return a longer structure from a procedure that uses the C calling convention, you must copy the structure to a global variable and then return a pointer to that variable in the AX register (DX:AX, if you compiled in compact, large, or huge model or if the variable is declared as a far pointer).
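The DX:AX convention for 4-byte values can be modeled in C as follows. This is only an illustration of how a value divides between the two registers. The RegPair type and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Register pair as a 16-bit routine leaves it for a 4-byte return value. */
struct RegPair { uint16_t dx; uint16_t ax; };

/* Rebuilds the 32-bit value the caller sees from DX (high) and AX (low). */
int32_t join_long(struct RegPair r)
{
    return (int32_t)(((uint32_t)r.dx << 16) | r.ax);
}

/* Splits a 32-bit value into the words a MASM routine would load. */
struct RegPair split_long(int32_t value)
{
    struct RegPair r;
    r.dx = (uint16_t)((uint32_t)value >> 16);      /* high-order word */
    r.ax = (uint16_t)((uint32_t)value & 0xFFFFu);  /* low-order word  */
    assert(join_long(r) == value);                 /* round-trip check */
    return r;
}
```

For the value 12345678h, the routine would place 1234h in DX and 5678h in AX before returning.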
Structures, Records, and User-Defined Data Types

You can pass structures, records, and user-defined types as arguments by value or by reference.
Writing Procedure Prototypes

The H2INC utility simplifies the task of writing prototypes for the C functions you want to call from MASM. The C prototype converted by H2INC into a MASM prototype allows INVOKE to correctly call the C function. Here are some examples of C functions and the MASM prototypes created with H2INC.
/* Function Prototype Declarations to Convert with H2INC */
long checktypes (
    char *name,
    unsigned char a,
    int b,
    float d,
    unsigned int *num );
my_func (float fNum, unsigned int x);
extern my_func1 (char *argv[]);
struct videoconfig _far * _far pascal my_func2 (int, scri );
For these C prototypes, H2INC generates this code:

@proto_0    TYPEDEF  PROTO C :PTR SBYTE, :BYTE, :SWORD, :REAL4, :PTR WORD
checktypes  PROTO    @proto_0

@proto_1    TYPEDEF  PROTO C :REAL4, :WORD
my_func     PROTO    @proto_1

@proto_2    TYPEDEF  PROTO C :PTR PTR SBYTE
my_func1    PROTO    @proto_2

@proto_3    TYPEDEF  PROTO FAR PASCAL :SWORD, :scri
my_func2    PROTO    @proto_3
Example
As shown in the following short example, the main module (written in C) calls an assembly routine, Power2.
#include <stdio.h>

extern int Power2( int factor, int power );

void main()
{
    printf( "3 times 2 to the power of 5 is %d\n", Power2( 3, 5 ) );
}
Figure 12.2 shows how functions that observe the C calling convention use the stack frame.
Figure 12.2 C Stack Frame
The MASM module that contains the Power2 routine looks like this:
        .MODEL  small, c
Power2  PROTO C factor:SWORD, power:SWORD
        .CODE
Power2  PROC  C factor:SWORD, power:SWORD
        mov   ax, factor    ; Load Arg1 into AX
        mov   cx, power     ; Load Arg2 into CX
        shl   ax, cl        ; AX = AX * (2 to power of CX)
                            ; Leave return value in AX
        ret
Power2  ENDP
        END
The MASM procedure declaration for the Power2 routine specifies the C langtype and the parameters expected by the procedure. The langtype specifies the calling and naming conventions for the interface between MASM and C. The routine is public by default. When the C module calls Power2, it passes two arguments, 3 and 5 by value.
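For comparison, the arithmetic that Power2 performs can be modeled in C. This sketch is not part of the example program. The name power2_model is hypothetical, and the shift count is restricted to 0 through 15 on the assumption that larger counts are not meaningful for a 16-bit result.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model of the Power2 routine: factor * 2^power, truncated to
   16 bits the way SHL AX, CL leaves its result in AX. */
int16_t power2_model(int16_t factor, int16_t power)
{
    assert(power >= 0 && power < 16);
    return (int16_t)(uint16_t)((uint32_t)(uint16_t)factor << power);
}
```

With the arguments used by the C module, 3 and 5, the model yields 96, the value the main program prints.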
Using the C Startup Code

This section explains how to write an assembly-language program that can call C library functions. It links with the C startup module, which performs the necessary initialization required by the library functions. You must follow these steps when writing such a program:
1. Specify the C convention in the .MODEL statement.
2. Include the following (optional) statement to note linkage with the C startup module:
   EXTERN _acrtused:abs
3. Prototype or declare as external all C functions the program references.
4. Include a public procedure called main in your assembly-language module. The C startup code calls _main (which is why all C programs begin with a main function). This procedure serves as the effective entry point for your program.
5. Omit an entry point in the program's END directive. The C startup code serves as the true entry point when the program runs.
6. Assemble with ML's /Cx switch to preserve the case of nonlocal names.
The following example serves as a template for these steps. The program calls the C run-time function printf to display two variables.
        .MODEL  small, c                ; Step 1: declare C conventions
        EXTERN  _acrtused:abs           ; Step 2: bring in C startup
        .
        .
        .
printf  PROTO   NEAR,                   ; Step 3: prototype
                pstring:NEAR PTR BYTE,  ;         external C
                num1:WORD, num2:VARARG  ;         routines
        .DATA
format  BYTE    '%i %i', 13, 0
        .CODE
main    PROC    PUBLIC                  ; Step 4: C startup calls here
        .
        .
        .
        INVOKE  printf, OFFSET format, ax, bx
        .
        .
        .
main    ENDP
        END                             ; Step 5: no label on END
The C++/MASM Interface

C++ can apply a protocol called a linkage specification to mixed-language procedures. This lets you link C++ code in the same way as C code. All information in the preceding section applies when linking assembly-language and C++ routines through the C linkage specification.
The C linkage specification forces the C++ compiler to adopt C conventions (which are not the same as C++ conventions) for listed routines. Since MASM does not specifically support C++ conventions, set the C linkage specification in your C++ code for all mixed-language routines, as shown here:
extern "C" declaration
where declaration is the prototype of an exported C++ function or an imported assembly-language procedure. You can bracket a list of declarations:
extern "C"
{
    int WriteLine( short attr, char *string );
    void GoExit( int err );
}
or apply the specification to individual prototypes:
extern "C" int  WriteLine( short attr, char *string );
extern "C" void GoExit( int err );
Note the syntax remains the same whether WriteLine and GoExit are exported C++ functions or imported assembly-language routines. The linkage specification applies only to called routines, not to external variables. Use the extern keyword (without the "C") as you normally would when identifying objects external to the C++ module.
The FORTRAN/MASM Interface

This section summarizes the specific details important to calling FORTRAN procedures or receiving arguments from FORTRAN routines that call MASM routines. It includes a sample MASM and FORTRAN module. A FORTRAN procedure follows the Pascal calling convention by default. This convention passes arguments in the order listed, and the called procedure removes the arguments from the stack. The naming convention converts all exported names to uppercase.

This list shows the FORTRAN data types that are equivalent to the MASM 6.1 data types.
FORTRAN Type                    Equivalent MASM Type
CHARACTER*1                     BYTE
INTEGER*1                       SBYTE
INTEGER*2                       SWORD
REAL*4                          REAL4
INTEGER*4                       SDWORD
REAL*8, DOUBLE PRECISION        REAL8
Naming Restrictions
FORTRAN allows 31 characters for identifier names. A digit or an underscore cannot be the first character in an identifier name.
By default, FORTRAN passes arguments by reference as far addresses if the FORTRAN module is compiled in large or huge memory model. It passes them as near addresses if the FORTRAN module is compiled in medium model. Versions of FORTRAN prior to Version 4.0 always require large model.
The FORTRAN compiler passes an argument by value when declared with the VALUE attribute. This declaration can occur either in a FORTRAN INTERFACE block (which determines how to pass an argument) or in a function or subroutine declaration (which determines how to receive an argument). In FORTRAN you can apply the NEAR (or FAR) attribute to reference parameters. These keywords override the default. They have no effect when they specify the same method as the default.

A call to a FORTRAN function or subroutine declared with the PASCAL or C attribute passes all arguments by value in the parameter list (except for parameters declared with the REFERENCE attribute). This change in default passing method applies to function and subroutine definitions as well as to the functions and subroutines described by INTERFACE blocks.
Array Storage
When you declare FORTRAN arrays, you can specify any integer for the lower bound (the default is 1). The FORTRAN compiler stores all arrays in column-major order; that is, the leftmost subscript increments most rapidly. For example, the first seven elements of an array defined as A[3,4] are stored as
A[1,1], A[2,1], A[3,1], A[1,2], A[2,2], A[3,2], A[1,3]
String Format
FORTRAN stores strings as a series of bytes at a fixed location in memory, with no delimiter at the end of the string. When passing a variable-length FORTRAN string to another language, you need to devise a method by which the target routine can find the end of the string. Consider the string declared as
      CHARACTER*14 MSG
      MSG = 'String of text'
The string is stored in 14 bytes of memory like this:
Figure 12.3 FORTRAN String Format
Strings are passed by reference. Although FORTRAN has a method for passing length, the variable-length FORTRAN strings cannot be used in a mixed-language interface because other languages cannot access the temporary variable that FORTRAN uses to communicate string length. However, fixed-length strings can be passed if the FORTRAN INTERFACE statement declares the length of the string in advance.
External Data
FORTRAN routines can directly access external data. In FORTRAN you can declare data to be external by adding the EXTERN attribute to the data declaration. You can also access a FORTRAN variable from MASM if it is declared in a COMMON block. A FORTRAN program can call an external assembly procedure with the use of the INTERFACE statement. However, the INTERFACE statement is not strictly necessary unless you intend to change one of the FORTRAN defaults.
Structure Alignment
By default, FORTRAN uses word alignment (unpacked storage) for all data objects larger than 1 byte. This storage method specifies that occasional bytes may be added as padding, so that word and doubleword objects start on an even boundary. In addition, all nested structures and records start on a word boundary. The MASM default is byte-alignment, so you should specify an alignment of 2 for MASM structures or use the /Zp1 option when compiling in FORTRAN.

Use the same memory model for the MASM and FORTRAN modules.
Returning Values
You must use a special convention to return floating-point values, records, user-defined types, arrays, and values larger than 4 bytes to a FORTRAN module from an assembly procedure. The FORTRAN module creates space in the stack segment to hold the actual return value. When the call to the assembly procedure is made, an extra parameter is passed. This parameter is the last one pushed. The segment address of the return value is contained in SS. In the assembly procedure, put the data for the return value at the location pointed to by the return value offset. Then copy the return-value offset (located at BP + 6) to AX, and copy SS to DX. This is necessary because the calling module expects DX:AX to point to the return value.
Structures, Records, and User-Defined Data Types

The FORTRAN structure variable, defined with the STRUCTURE keyword and declared with the RECORD statement, is equivalent to the Pascal RECORD and the C struct. You can pass structures as arguments by value or by reference (the default). The FORTRAN types COMPLEX*8 and COMPLEX*16 are not directly implemented in MASM. However, you can write structures that are equivalent. The type COMPLEX*8 has two fields, both of which are 4-byte floating-point numbers; the first contains the real component, and the second contains the imaginary component. The type COMPLEX is equivalent to the type COMPLEX*8. The type COMPLEX*16 is similar to COMPLEX*8. The only difference is that each field of the former contains an 8-byte floating-point number. A FORTRAN LOGICAL*2 is stored as a 1-byte indicator value (1=true, 0=false) followed by an unused byte. A FORTRAN LOGICAL*4 is stored as a 1-byte indicator value followed by three unused bytes. The type LOGICAL is equivalent to LOGICAL*4, unless $STORAGE:2 is in effect. To pass or receive a FORTRAN LOGICAL type, declare a MASM structure with the appropriate fields.
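The layouts described above can be written out as structures. The sketch below uses C rather than MASM purely for illustration. The type and function names are hypothetical, and it assumes a host compiler where float is 4 bytes and double is 8 bytes.

```c
#include <assert.h>
#include <stdint.h>

struct Complex8  { float  re; float  im; };            /* COMPLEX*8  */
struct Complex16 { double re; double im; };            /* COMPLEX*16 */
struct Logical2  { uint8_t flag; uint8_t unused[1]; }; /* LOGICAL*2  */
struct Logical4  { uint8_t flag; uint8_t unused[3]; }; /* LOGICAL*4  */

/* Confirms that the host compiler produces the sizes given in the text. */
void check_fortran_layouts(void)
{
    assert(sizeof(struct Complex8)  == 8);
    assert(sizeof(struct Complex16) == 16);
    assert(sizeof(struct Logical2)  == 2);
    assert(sizeof(struct Logical4)  == 4);
}
```

A MASM structure for the same data would declare the fields in the same order: REAL4 or REAL8 pairs for the complex types, and a BYTE indicator followed by unused BYTE fields for the logical types.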

In FORTRAN, you can call routines with a variable number of arguments by including the VARYING attribute in your interface to the routine, along with the C attribute. You must use the C attribute because a variable number of arguments is possible only with the C calling convention. The VARYING attribute prevents FORTRAN from enforcing a matching number of parameters.
Pointers and Addresses

FORTRAN programs can determine near and far addresses with the LOCNEAR and LOCFAR functions. Store the result as INTEGER*2 (with the LOCNEAR function) or as INTEGER*4 (with the LOCFAR function). If you pass the result of LOCNEAR or LOCFAR to another language, be sure to pass by value.
Example
In the following example, the FORTRAN module calls an assembly procedure that calculates A*2^B, where A and B are the first and second parameters, respectively. This is done by shifting the bits in A to the left B times.
      INTERFACE TO INTEGER*2 FUNCTION POWER2(A, B)
      INTEGER*2 A, B
      END

      PROGRAM MAIN
      INTEGER*2 POWER2
      INTEGER*2 A, B
      A = 3
      B = 5
      WRITE (*, *) '3 TIMES 2 TO THE B OR 5 IS ',POWER2(A, B)
      END
To understand the assembly procedure, consider how the parameters are placed on the stack, as illustrated in Figure 12.4.
Figure 12.4 FORTRAN Stack Frame
Figure 12.4 assumes that the FORTRAN module is compiled in large model. If you compile the FORTRAN module in medium model, then each argument is passed as a 2-byte, not 4-byte, address. The return address is 4 bytes long because procedures called from FORTRAN must always be FAR.
The assembler code looks like this:

        .MODEL LARGE, FORTRAN
Power2  PROTO  FORTRAN, pFactor:FAR PTR SWORD, pPower:FAR PTR SWORD
        .CODE
Power2  PROC   FORTRAN, pFactor:FAR PTR SWORD, pPower:FAR PTR SWORD
        les    bx, pFactor   ; ES:BX points to factor
        mov    ax, es:[bx]   ; AX = value of factor
        les    bx, pPower    ; ES:BX points to power
        mov    cx, es:[bx]   ; CX = value of power
        shl    ax, cl        ; Multiply by 2^power
        ret                  ; Return result in AX
Power2  ENDP
        END
The Basic/MASM Interface

This section explains how to call MASM procedures or functions from Basic and how to receive Basic arguments for the MASM procedure. Pascal is the default naming and calling convention, so all lowercase letters are converted to uppercase. Routines defined with the FUNCTION keyword return values, but routines defined with SUB do not. Basic DEF FN functions and GOSUB routines cannot be called from another language. The information provided pertains to Microsoft's Basic and QuickBasic compilers. Differences between the two compilers are noted when necessary.

The following list shows the Basic data types that are equivalent to the MASM 6.1 data types.
Basic Type                      Equivalent MASM Type
STRING*1                        WORD
INTEGER (X%)                    SWORD
SINGLE (X!)                     REAL4
LONG (X&), CURRENCY             SDWORD
DOUBLE (X#)                     REAL8
Naming Conventions
Basic recognizes up to 40 characters of a name. In the object code, Basic also drops any of its reserved characters: %, &, !, #, @, $.
Basic can pass data in several ways and can receive it by value or by near reference. By default, Basic arguments are passed by near reference as 2-byte addresses. To pass a near address, pass only the offset; if you need to pass a far address, pass the segment and offset separately as integer arguments. Pass the segment address first, unless you have specified C compatibility with the CDECL keyword.
Basic passes each argument in a call by far reference when CALLS is used to invoke a routine. You can also use SEG to modify a parameter in a preceding DECLARE statement so that Basic passes that argument by far reference. To pass any other variable type by value, apply the BYVAL keyword to the argument in the DECLARE statement. You cannot pass arrays and user-defined types by value.
DECLARE SUB Test(BYVAL a%, b%, SEG c%)
CALL Test(x%, y%, z%)
CALLS Test(x%, y%, z%)
This CALL statement passes the first argument (a%) by value, the second argument (b%) by near reference, and the third argument (c%) by far reference. The statement
CALLS Test2(x%, y%, z%)
passes each argument by far reference.

Including the CDECL keyword in the Basic DECLARE statement enables the C calling and naming conventions. This also allows a call to a MASM procedure with a varying number of arguments.
Array Storage
The DIM statement sets the number of dimensions for a Basic array and also sets the array's maximum subscript value. In the array declaration DIM x(a,b), the upper bounds (the maximum number of values possible) of the array are a and b. The default lower bound is 0. The default upper bound for an array subscript is 10. The default for array storage in Basic is column-major order, as in FORTRAN. For an array defined as DIM Arr%(3,3), reference the last element as Arr%(3,3). The first five elements of Arr(3,3) are
Arr(0,0), Arr(1,0), Arr(2,0), Arr(0,1), Arr(1,1)
When you pass an array from Basic to a language that stores arrays in row-major order, use the command-line option /R when compiling the Basic module.
Most Microsoft languages permit you to reference arrays directly. Basic uses an array descriptor, however, which is similar in some respects to a Basic string descriptor. The array descriptor is necessary because Basic handles memory allocation for arrays dynamically, and thus may shift the location of the array in memory. A reference to an array in Basic is really a near reference to an array descriptor. Array descriptors are always in DGROUP, even though the data may be in far memory. Array descriptors contain information about type, dimensions, and memory locations of data. You can safely pass arrays to MASM routines only if you follow three rules:
• Pass the array's address by applying the VARPTR function to the first element of the Basic array and passing the result by value. To pass the far address of the array, apply both the VARPTR and VARSEG functions and pass each result by value. The receiving language gets the address of the first element and considers it to be the address of the entire array. It can then access the array with its normal array-indexing syntax.
• The MASM routine that receives the array should not call back to one of the calling program's routines before it has finished processing the array. Changing data within the caller's heap (even data unrelated to the array) may change the array's location in the heap. This would invalidate any further work the called routine performs, since the routine would be operating on the array's old location.
• Basic can pass any member of an array by value. When passing individual array elements, these restrictions do not apply.
You can apply LBOUND and UBOUND to a Basic array to determine lower and upper bounds, and then pass the results to another routine. This way, the size of the array does not need to be determined in advance.
String Format
Basic maintains a 4-byte string descriptor for each string, as shown in the following. The first field of the string descriptor contains a 2-byte integer indicating the length of the actual string text. The second field contains the offset address of this text within the caller's data segment.
Figure 12.5 Basic String Descriptor Format
An assembly-language procedure can store a Basic string descriptor as a simple structure, like this:
DESC    STRUCT
len     WORD    ?       ; Length of string
off     WORD    ?       ; Offset of string
DESC    ENDS

string  BYTE    "This text referenced by a string descriptor"
sdesc   DESC    (LENGTHOF string, string)
Version 7.0 or later of the Microsoft Basic Compiler provides new functions that access string descriptors. These functions simplify the process of sharing Basic string data with routines written in other languages. Earlier versions of Basic offer the LEN (Length) and SADD (String Address) functions, which together obtain the information stored in a string descriptor. LEN returns the length of a string in bytes. SADD returns the offset address of a string in the data segment. The caller must provide both pieces of information so the called procedure can locate and read the entire string. The address returned by SADD is declared as type INTEGER but is actually equivalent to a C near pointer. If you need to pass the far address of a string, use the SSEGADD (String Segment Address) function of Microsoft Basic version 7.0 or later. You can also determine the segment address of the first element with VARSEG .
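The way a called routine uses the length and offset pair can be sketched in C. This is an illustration only: the data segment is simulated with a byte array, and the BasicDesc type and basic_to_c function are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The 4-byte descriptor: 2-byte length, then 2-byte offset. */
struct BasicDesc { uint16_t len; uint16_t off; };

/* Copies the text referenced by a descriptor out of a simulated data
   segment and null-terminates it for C. Returns the characters copied. */
size_t basic_to_c(const uint8_t *segment, size_t segment_size,
                  struct BasicDesc d, char *out, size_t out_size)
{
    assert((size_t)d.off + d.len <= segment_size);  /* text lies in segment */
    assert((size_t)d.len + 1 <= out_size);          /* room for terminator  */
    memcpy(out, segment + d.off, d.len);
    out[d.len] = '\0';
    return d.len;
}
```

Because a Basic string carries no terminator, the routine relies entirely on the length field, which is why the caller must supply both pieces of information.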
External Data
Declaring global data in Basic follows the same two-step process as in other languages:
1. Declare shareable data in Basic with the COMMON statement.
2. Identify the shared variables in your assembly-language procedures with the EXTERN keyword. Place the EXTERN statement outside of a code or data segment when declaring far data.
Structure Alignment
Basic packs user-defined types. For MASM structures to be compatible, select byte-alignment.

Always use medium model in assembly-language procedures linked with Basic modules. If you are listing other libraries on the LINK command line, specify Basic libraries first. (There are differences between the QBX and command-line compilation. See your Basic documentation.)
Returning Values
Basic follows the usual convention of returning values in AX or DX:AX. If the value is not floating point, an array, or a structured type, or if it is less than 4 bytes long, then the 2-byte integers should be returned from the MASM procedure in AX and 4-byte integers should be returned in DX:AX. For all other types, return the near offset in AX.
User-Defined Data Types

The Basic TYPE statement defines structures composed of individual fields. These types are equivalent to the C struct, FORTRAN record (declared with the STRUCTURE keyword), and Pascal Record types. You can use any of the Basic data types except variable-length strings or dynamic arrays in a user-defined type. Once defined, Basic types can be passed only by reference.

You can vary the number of arguments in Basic when you change the calling convention with CDECL. To call a function with a varying number of arguments, you also need to suppress the type checking that normally forces a call to be made with a fixed number of arguments. In Basic, you can remove this type checking by omitting a parameter list from the DECLARE statement.
Pointers and Addresses

VARSEG returns a variable's segment address, and VARPTR returns a variable's offset address. These intrinsic Basic functions enable your program to pass near or far addresses.
Example
This example calls the Power2 procedure in the MASM 6.1 module.
DEFINT A-Z
DECLARE FUNCTION Power2 (A AS INTEGER, B AS INTEGER)
PRINT "3 times 2 to the power of 5 is ";
PRINT Power2(3, 5)
END
The first argument, A, is higher in memory than B because Basic pushes arguments in the same order in which they appear.
Figure 12.6 shows how the arguments are placed on the stack.
Figure 12.6 Basic Stack Frame
The assembly procedure can be written as follows:

        .MODEL  medium
Power2  PROTO   PASCAL, factor:PTR WORD, power:PTR WORD
        .CODE
Power2  PROC    PASCAL, factor:PTR WORD, power:PTR WORD
        mov     bx, WORD PTR factor  ; BX points to factor
        mov     ax, [bx]             ; Load factor into AX
        mov     bx, WORD PTR power   ; BX points to power
        mov     cx, [bx]             ; Load power into CX
        shl     ax, cl               ; AX = AX * (2 to power of CX)
        ret
Power2  ENDP
        END
Note that each parameter must be loaded in a two-step process because the address of each is passed rather than the value. The return address is 4 bytes long because procedures called from Basic must be FAR.
CHAPTER 13
Writing 32-Bit Applications
This chapter is an introduction to 32-bit programming for the 80386. The guidelines in this chapter also apply to the 80486 processor, which is basically a faster 80386 with the equivalent of an 80387 floating-point processor. Since you are already familiar with 16-bit real-mode programming, this chapter covers the differences between 16-bit programming and 32-bit protected-mode programming.
The 80386 processor (and its successors such as the 80486) can run in real mode, virtual-86 mode, and in protected mode. In real and virtual-86 modes, the 80386 can run 8086/8088 programs. In protected mode, it can run 80286 programs. The 386 also extends the features of protected mode to include 32-bit operations and segments larger than 64K.
The MS-DOS operating system directly supports 8086/8088 programs, which it runs either in real mode or virtual-86 mode. Native 32-bit 80386 programs can be run by using a DOS extender, by using the WINMEM32.DLL facility of Microsoft Windows 3.x, or by running a native 32-bit operating system, such as Microsoft Windows NT.
You can use MASM to generate object code (OMF or COFF) for 32-bit programs. To do this, you will need a software development kit such as the Windows SDK for the target environment. Such kits include the linker and other components specific to your chosen operating environment.
32-Bit Memory Addressing

The 80386 has six segment registers. Four of these are familiar to 8086/8088 programmers: CS (Code Segment), SS (Stack Segment), DS (Data Segment), and ES (Extra Segment). The two additional registers, FS and GS, are used as data segment registers.
Memory addresses on 80x86 machines consist of two parts: a segment and an offset. In real-mode programs, the segment is a 16-bit number and the offset is a 16-bit number. Effective addresses are calculated by multiplying the segment by 16 and adding the offset to it.
In protected mode, the segment value is not used directly as a number, but instead is a selector that indexes a table of descriptors. Each descriptor describes a block of memory, including attributes such as the size and location of the block, and the access rights the program has to it (read, write, execute). The effective address is calculated by adding the offset to the base address of the memory block described by the descriptor.
All segment registers are 16 bits wide. The offset in a 32-bit protected-mode program is itself 32 bits wide, which means that a single segment can address up to 4 gigabytes of memory. Because of this large range, there is little need to use segment registers to extend the range of addresses in 32-bit programs.
If all six segment registers are initially set to the same value, then the rest of the program can ignore them and treat the processor as if it used a 32-bit linear address space. This is called 0:32, or flat, addressing. (The full segmented 32-bit addressing mode, in which the segment registers can contain different values, is called 16:32 addressing.) Flat addressing is used by the Windows NT operating system.
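The two address calculations described above can be modeled with a short C sketch. It is a simplified illustration under stated assumptions, not a description of the processor's full behavior: the struct table_entry type and both function names are hypothetical, and only the base and limit of each memory block are modeled (access rights are omitted).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Real mode: 16-bit segment and 16-bit offset combine as segment * 16 + offset. */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Hypothetical, simplified entry in the table that a segment value indexes. */
struct table_entry {
    uint32_t base;    /* start of the memory block */
    uint32_t limit;   /* highest valid offset within the block */
};

/* Protected mode: base of the selected block plus the 32-bit offset. */
uint32_t protected_mode_address(const struct table_entry *table,
                                uint16_t index, uint32_t offset)
{
    assert(table != NULL);
    assert(offset <= table[index].limit);   /* stay inside the block */
    return table[index].base + offset;
}
```

With a base of 0 and a limit of 4 gigabytes minus 1, the protected-mode result equals the offset itself, which is the flat (0:32) case.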
Figure 13.1   32-Bit Register Set
MASM Directives for 32-Bit Programming

If you use the simplified segment directives, a 32-bit program is surprisingly similar to a program for MS-DOS. Here are the differences:
• Supply the .386 directive, which enables the 32-bit programming features of the 386 and its successors. The .386 directive must precede the .MODEL directive.
• For flat-model programming, use the directive
  .MODEL flat, stdcall
  which tells the assembler to assume flat model (0:32) and to use the Windows NT standard calling convention for subroutine calls.
• Precede your data declarations with the .DATA directive.
• Precede your instruction codes with the .CODE directive.
• At the end of the source file, place an END directive.
Sample Program
The following sample is a 32-bit assembly language subroutine, such as might be called from a 32-bit C program written for the Windows NT operating system. The program illustrates the use of a variety of directives to make assembly language easier to read and maintain. Note that with 32-bit flat model programming, there is no longer any need to refer to segment registers, since these are artifacts of segmented addressing.
;* szSearch - An example of 32-bit assembly programming using MASM 6.1
;*
;* Purpose: Search a buffer (rgbSearch) of length cbSearch for the
;*          first occurrence of szTok (null terminated string).
;*
;* Method:  A variation of the Boyer-Moore method
;*          1. Determine length of szTok (n)
;*          2. Set array of flags (rgfInTok) to TRUE for each character
;*             in szTok
;*          3. Set current position of search to rgbSearch (pbCur)
;*          4. Compare current position to szTok by searching backwards
;*             from the nth position. When a comparison fails at
;*             position (m), check to see if the current character
;*             in rgbSearch is in szTok by using rgfInTok. If not,
;*             set pbCur to pbCur+(m)+1 and restart compare. If
;*             pbCur reached, increment pbCur and restart compare.
;*          5. Reset rgfInTok to all 0 for next instantiation of the
;*             routine.

        .386
        .MODEL  flat, stdcall

FALSE   EQU     0
TRUE    EQU     NOT FALSE

        .DATA
; Flags buffer - data initialized to FALSE. We will
; set the appropriate flags to TRUE during initialization
; of szSearch and reset them to FALSE before exit.
rgfInTok        BYTE    256 DUP (FALSE);

        .CODE

PBYTE   TYPEDEF PTR BYTE

szSearch PROC PUBLIC USES esi edi,
        rgbSearch:PBYTE,
        cbSearch:DWORD,
        szTok:PBYTE

; Initialize flags buffer. This tells us if a character is in
; the search token - Note how we use EAX as an index
; register. This can be done with all extended registers.
        mov     esi, szTok
        xor     eax, eax
        .REPEAT
            lodsb
            mov     BYTE PTR rgfInTok[eax], TRUE
        .UNTIL (!AL)

; Save count of szTok bytes in EDX
        mov     edx, esi
        sub     edx, szTok
        dec     edx

; ESI will always point to beginning of szTok
        mov     esi, szTok

; EDI will point to current search position
; and will also contain the return value
        mov     edi, rgbSearch

; Store pointer to end of rgbSearch in EBX
        mov     ebx, edi
        add     ebx, cbSearch
        sub     ebx, edx

; Initialize ECX with length of szTok
        mov     ecx, edx
        .WHILE ( ecx != 0 )
            dec     ecx                 ; Move index to current
            mov     al, [edi+ecx]       ; characters to compare

; If the current byte in the buffer doesn't exist in the
; search token, increment buffer pointer to current position
; +1 and start over. This can skip up to 'EDX'
; bytes and reduce search time.
            .IF !(rgfInTok[eax])
                add     edi, ecx
                inc     edi             ; Initialize ECX with
                mov     ecx, edx        ; length of szTok

; Otherwise, if the characters match, continue on as if
; we have a matching token
            .ELSEIF (al == [esi+ecx])
                .CONTINUE

; Finally, if we have searched all szTok characters,
; and land here, we have a mismatch and we increment
; our pointer into rgbSearch by one and start over.
            .ELSEIF (!ecx)
                inc     edi
                mov     ecx, edx
            .ENDIF

; Verify that we haven't searched beyond the buffer.
            .IF (edi > ebx)
                mov     edi, 0          ; Error value
                .BREAK
            .ENDIF
        .ENDW

; Restore flags in rgfInTok to 0 (for next time).
        mov     esi, szTok
        xor     eax, eax
        .REPEAT
            lodsb
            mov     BYTE PTR rgfInTok[eax], FALSE
        .UNTIL !AL

; Put return value in eax
        mov     eax, edi
        ret
szSearch ENDP
        end
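The search method described in the header comment can also be sketched in C, which may help when checking results from the assembly routine. This is an independent illustration of the stated method (backward comparison with a per-character flag table), not a translation of the listing. The name sz_search and its signature are hypothetical, and the flag table is local rather than a shared buffer that must be reset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the first occurrence of tok in buf[0..len), or NULL. */
const unsigned char *sz_search(const unsigned char *buf, size_t len,
                               const char *tok)
{
    assert(buf != NULL && tok != NULL);

    unsigned char in_tok[256] = {0};   /* flag per byte value: does it occur in tok? */
    size_t n = strlen(tok);
    size_t pos = 0;                    /* current candidate start in buf */
    size_t i;

    if (n == 0 || n > len)
        return NULL;
    for (i = 0; i < n; i++)
        in_tok[(unsigned char)tok[i]] = 1;

    while (pos + n <= len) {
        size_t m = n;                  /* compare backward from the last byte */
        while (m > 0 && buf[pos + m - 1] == (unsigned char)tok[m - 1])
            m--;
        if (m == 0)
            return buf + pos;          /* every byte matched */
        if (!in_tok[buf[pos + m - 1]])
            pos += m;                  /* mismatched byte is not in tok: skip past it */
        else
            pos += 1;                  /* otherwise advance by one and retry */
    }
    return NULL;
}
```

The skip is safe because no match can overlap a buffer byte that never occurs in the token.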
APPENDIX A
Differences Between MASM 6.1 and 5.1
For the many users who come to version 6.1 of the Microsoft Macro Assembler directly from the popular MASM 5.1, this appendix describes the differences between the two versions. Version 6.1 contains significant changes, including:
• An integrated development environment called Programmer's WorkBench (PWB) from which you can write, edit, debug, and execute code.
• Expanded functionality for structures, unions, and type definitions.
• New directives for generating loops and decision statements, and for declaring and calling procedures.
• Simplified methods for applying public attributes to variables and routines in multiple-module programs.
• Enhancements for writing and using macros.
• Flat-model support for Windows NT and new instructions for the 80486 processor.
The OPTION M510 directive (or the /Zm command-line switch) assures nearly complete compatibility between MASM 6.1 and MASM 5.1. However, to take full advantage of the enhancements in MASM 6.1, you will need to rewrite some code written for MASM 5.1.
The first section of this appendix describes the new or enhanced features in MASM 6.1. The second section, "Compatibility Between MASM 5.1 and 6.1," explains how to:
• Minimize the number of required changes with the OPTION directive.
• Rewrite your existing assembly code, if necessary, to take advantage of the assembler's enhancements.
New Features of Version 6.1

This section gives an overview of the new features of MASM 6.1 and provides references to more detailed information elsewhere in the documentation. For full explanations and coding examples, see the documentation listed in the cross-references.
The Assembler, Environment, and Utilities

Most of the executable files provided with MASM 6.1 are new or revised. For a complete list of these files, read the PACKING.TXT file on the distribution disk. The book Getting Started also provides information about setting up the environment, assembler, and Help system.
The Assembler
The macro assembler, named ML.EXE, can assemble and link in one step. Its new 32-bit operation gives ML.EXE the ability to handle much larger source files than MASM 5.1. The command-line options are new. For example, the /Fl and /Sc options generate instruction timings in the listing file. Command-line options are case-sensitive and must be separated by spaces. For backward compatibility with MASM 5.1 makefiles, MASM 6.1 includes the MASM.EXE utility. MASM.EXE translates MASM 5.1 command-line options to the new MASM 6.1 command-line options and calls ML.EXE. See the Reference book for details.
H2INC
H2INC converts C include files to MASM include files. It translates data structures and declarations but does not translate executable code. For more information, see Chapter 20 of Environment and Tools.
NMAKE
NMAKE replaces the MAKE utility. NMAKE provides new functions for evaluating target files and more flexibility with macros and command-line options. For more information, see Environment and Tools.
Integrated Environment
PWB is an integrated development environment for writing, developing, and debugging programs. For information on PWB and the CodeView debugging application, see Environment and Tools.
Online Help
MASM 6.1 incorporates the Microsoft Advisor Help system. Help provides a vast database of online help about all aspects of MASM, including the syntax and timings for processor and coprocessor instructions, directives, command-line options, and support programs such as LINK and PWB. For information on how to set up the help system, see Getting Started. You can invoke the help system from within PWB or from the QuickHelp program (QH).
HELPMAKE
You can use the HELPMAKE utility to create additional help files from ASCII text files, allowing you to customize the online help system. For more information, see Environment and Tools.
Other Programs
MASM 6.1 contains the most recent versions of LINK, LIB, BIND, CodeView, and the mouse driver. The CREF program is not included in MASM 6.1. The Source Browser provides the information that CREF provided under MASM 5.1. For more information on the Source Browser, see Chapter 5 of Environment and Tools or Help.
Segment Management
This section lists the changes and additions to memory-model support and directives that relate to memory model.
Predefined Symbols
The following predefined symbols (also called predefined equates) provide information about simplified segments:
Predefined Symbol   Value
@stack              DGROUP for near stacks, STACK for far stacks
@Interface          Information about language parameters
@Model              Information about the current memory model
@Line               The source line in the current file
@Date               The current date
@FileCur            The current file
@Time               The current time
@Environ            The current environment variables
For more information about predefined symbols, see "Predefined Symbols" in Chapter 1.
Enhancements to the ASSUME Directive

MASM automatically generates ASSUME values for the code segment register (CS). It is no longer necessary to include lines such as
ASSUME CS:MyCodeSegment
in your programs. In addition, the ASSUME directive can include ERROR, FLAT, or register:type. MASM 6.1 issues a warning when you specify ASSUME values for CS other than the current segment or group. For more information, see "Setting the ASSUME Directive for Segment Registers" in Chapter 2 and "Defining Register Types with ASSUME" in Chapter 3.
Relocatable Offsets
For compatibility with applications for Windows, the LROFFSET operator can calculate a relocatable offset, which is resolved by the loader at run time. See Help for details.
Flat Model
MASM 6.1 supports the flat-memory model of Windows NT, which allows segments as large as 4 gigabytes. All other memory models limit segment size to 64K for MS-DOS and Windows. For more information about memory models, see "Defining Basic Attributes with .MODEL" in Chapter 2.
Data Types
MASM 6.1 supports improved data typing. This section summarizes the improved forms of data declarations in MASM 6.1.
Defining Typed Variables

You can now use the type names as directives to define variables. Initializers are unsigned by default. The following example lines are equivalent:
var1    DB      25
var1    BYTE    25
Signed Types
You can use the SBYTE, SWORD, and SDWORD directives to declare signed data. For more information about these directives, see "Allocating Memory for Integer Variables" in Chapter 4.
Floating-Point Types
MASM 6.1 provides the REAL4, REAL8, and REAL10 directives for declaring floating-point variables. For information on these type directives, see "Declaring Floating-Point Variables and Constants" in Chapter 6.
Qualified Types
Type definitions can now include distance and language type attributes. Procedures, procedure prototypes, and external declarations let you specify the type as a qualified type. A complete description of qualified types is provided in the section "Data Types" in Chapter 1.
Structures
Changes to structures since MASM 5.1 include:
• Structures can be nested.
• The names of structure fields need not be unique. As a result, you must qualify references to field names.
• Initialization of structure variables can continue over multiple lines provided the last character in the line before the comment field is a comma.
• Curly braces and angle brackets are equivalent.
For example, this code works in MASM 6.1:

SCORE   STRUCT
team1   BYTE    10 DUP (?)
score1  BYTE    ?
team2   BYTE    10 DUP (?)
score2  BYTE    ?
SCORE   ENDS

first   SCORE   {"BEARS", 20,           ; This comment is allowed.
                 "CUBS", 10 }

        mov     al, [bx].score.team1    ; Field name must be qualified
                                        ; with structure name.
You can use OPTION OLDSTRUCTS or OPTION M510 to enable MASM 5.1 behavior for structures. See "Compatibility Between MASM 5.1 and 6.1," later in this appendix. For more information on structures and unions, see "Structures and Unions" in Chapter 5.
Unions
MASM 6.1 allows the definition of unions with the UNION directive. Unions differ from structures in that all fields within a union occupy the same data space. For more information, see "Structures and Unions" in Chapter 5.
Types Defined with TYPEDEF

The TYPEDEF directive defines a type for use later in the program. It is most useful for defining pointer types. For more information on defining types, see "Data Types" in Chapter 1, and "Defining Pointer Types with TYPEDEF" in Chapter 3.
Names of Identifiers
MASM 6.1 accepts identifier names up to 247 characters long. All characters are significant, whereas under MASM 5.1, names are significant to 31 characters only. For more information on identifiers, see "Identifiers" in Chapter 1.
Multiple-Line Initializers
In MASM 6.1, a comma at the end of a line (except in the comment field) implies that the line continues. For example, the following code is legal in MASM 6.1:
longstring  BYTE    "This string ",
                    "continues over two lines."
bitmasks    BYTE    80h, 40h, 20h, 10h,
                    08h, 04h, 02h, 01h
For more information, see "Statements" in Chapter 1.
Comments in Extended Lines

MASM 5.1 allows a backslash ( \ ) as the line-continuation character if it is the last nonspace character in the line. MASM 6.1 permits a comment to follow the backslash.
Determining Size and Length of Data Labels

The LENGTHOF operator returns the number of data items allocated for a data label. MASM 6.1 also provides the SIZEOF operator. When applied to a type, SIZEOF returns the size attribute of the type expression. When applied to a data label, SIZEOF returns the number of bytes used by the initializer in the label's definition. In this case, SIZEOF for a variable equals the number of bytes in the type multiplied by LENGTHOF for the variable.
MASM 6.1 recognizes the LENGTH and SIZE operators for backward compatibility. For a description of the behavior of SIZE under OPTION M510, see "Length and Size of Labels with OPTION M510," later in this appendix. For obsolete behavior with the LENGTH operator, see also "LENGTH Operator Applied to Record Types," page 356.
For information on LENGTHOF and SIZEOF, see the following sections in Chapter 5: "Declaring and Referencing Arrays," "Declaring and Initializing Strings," "Declaring Structure and Union Variables," and "Defining Record Variables."
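If you know C, the relationship between LENGTHOF and SIZEOF resembles the familiar sizeof idiom for arrays. The following sketch is an analogy only: the macro names LENGTH_OF and SIZE_OF and the array named table are hypothetical, and the array stands in for a MASM declaration of 40 words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rough C analogs: element count and byte count of an array. */
#define LENGTH_OF(a) (sizeof(a) / sizeof((a)[0]))
#define SIZE_OF(a)   (sizeof(a))

/* Stands in for a data label that allocates 40 word-sized items. */
static uint16_t table[40];

size_t table_length(void)
{
    return LENGTH_OF(table);
}

size_t table_size(void)
{
    /* Size of the variable = bytes in the type * number of items. */
    assert(SIZE_OF(table) == sizeof(table[0]) * LENGTH_OF(table));
    return SIZE_OF(table);
}
```

For the 40-word array, the length analog yields 40 and the size analog yields 80.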
HIGHWORD and LOWWORD Operators

These operators return the high and low words for a given 32-bit operand. They are similar to the HIGH and LOW operators of MASM 5.1 except that HIGHWORD and LOWWORD can take only constants as operands, not relocatables (labels).
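As a rough illustration of what the two operators compute for a 32-bit constant, the same split can be written in C with a shift and a mask. The function names here are hypothetical, and the sketch works on run-time values, whereas HIGHWORD and LOWWORD are evaluated by the assembler at assembly time.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 16 bits of a 32-bit operand. */
uint16_t high_word(uint32_t value)
{
    return (uint16_t)(value >> 16);
}

/* Lower 16 bits of a 32-bit operand. */
uint16_t low_word(uint32_t value)
{
    return (uint16_t)(value & 0xFFFFu);
}

/* Rebuild the operand from its halves; the assert documents the round trip. */
uint32_t join_words(uint16_t high, uint16_t low)
{
    uint32_t value = ((uint32_t)high << 16) | low;
    assert(high_word(value) == high && low_word(value) == low);
    return value;
}
```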
PTR and CodeView

Under MASM 5.1, applying the PTR operator to a data initializer determines the size of the data displayed by CodeView. You can still use PTR in this manner in MASM 6.1, but it does not affect CodeView typing. Defining pointers with the TYPEDEF directive allows CodeView to generate correct information. See "Defining Pointer Types with TYPEDEF" in Chapter 3.
Procedures, Loops, and Jumps

With its significant improvements for procedure and jump handling, MASM 6.1 closely resembles high-level language implementations of procedure calls. MASM 6.1 generates the code to correctly handle argument passing, check type compatibility between parameters and arguments, and process a variable number of arguments. MASM 6.1 can also automatically recast jump instructions to correct for insufficient jump distance.
Function Prototypes and Calls

The PROTO directive lets you prototype procedures in the same way as high-level languages. PROTO enables type-checking and type conversion of arguments when calling the procedure with INVOKE. For more information, see "Declaring Procedure Prototypes" in Chapter 7.
The INVOKE directive sets up code to call a procedure and correctly pass arguments according to the prototype. MASM 6.1 also provides the VARARG keyword to pass a variable number of arguments to a procedure with INVOKE. For more information about INVOKE and VARARG, see "Calling Procedures with INVOKE" and "Declaring Parameters with the PROC Directive" in Chapter 7.
The ADDR keyword is new since MASM 5.1. When used with INVOKE, it provides the address of a variable, in the same way as the address-of operator (&) in C. This lets you conveniently pass an argument by reference rather than value. See "Calling Procedures with INVOKE" in Chapter 7.
High-Level Flow-Control Constructions

MASM 6.1 contains several directives that generate code for loops and decisions depending on the status of a conditional statement. The conditions are tested at run time rather than at assembly time. Directives new since MASM 5.1 include .IF, .ELSE, .ELSEIF, .REPEAT, .UNTIL, .UNTILCXZ, .WHILE, and .ENDW. MASM 6.1 also provides the associated .BREAK and .CONTINUE directives for loops and IF statements. For more information, see "Loops" in Chapter 7 and "Decision Directives" on page 171.
Automatic Optimization for Unconditional Jumps

MASM 6.1 automatically determines the smallest encoding for direct unconditional jumps. See "Unconditional Jumps" in Chapter 7.
Automatic Lengthening for Conditional Jumps

If a conditional jump cannot reach its target destination, MASM automatically recasts the code to use an unconditional jump to the target. See "Jump Extending," page 169.
User-Defined Stack Frame Setup and Cleanup

The prologue code generated immediately after a PROC statement sets up the stack for parameters and local variables. The epilogue code handles stack cleanup. MASM 6.1 allows user-defined prologues and epilogues, as described in "Generating Prologue and Epilogue Code" in Chapter 7.
Simplifying Multiple-Module Projects

MASM 6.1 simplifies the sharing of code and data among modules and makes the use of include files more efficient.
EXTERNDEF in Include Files

MASM 5.1 requires that you declare public and external all data and routines used in more than one module. With MASM 6.1, a single EXTERNDEF directive accomplishes the same task. EXTERNDEF lets you put global data declarations within an include file, making the data visible to all source files that include the file. For more information, see "Using EXTERNDEF" in Chapter 8.
Search Order for Include Files

MASM 6.1 searches for include files in the directory of the main source file rather than in the current directory. Similarly, it searches for nested include files in the directory of the include file. You can specify additional paths to search with the /I command-line option. For more information on include files, see "Organizing Modules" in Chapter 8.
Enforcing Case Sensitivity

In MASM 5.1, sensitivity to case is influenced only by command-line options such as /MX, not the language type given with the .MODEL directive. In MASM 6.1, the language type takes precedence over the command-line options in specifying case sensitivity.
Alternate Names for Externals

The syntax for EXTERN allows you to specify an alternate symbol name, which the linker can use to resolve an external reference to an unused symbol. This prevents linkage with unneeded library code, as explained in "Using EXTERN with Library Routines," Chapter 8.
Expanded State Control

Several directives in MASM 6.1 enable or disable various aspects of the assembler control. These include 80486 coprocessor instructions and use of compatibility options.
The OPTION Directive

The new OPTION directive allows you to selectively define the assembler's behavior, including its compatibility with MASM 5.1. See "Using the OPTION Directive" in Chapter 1 and "Compatibility Between MASM 5.1 and 6.1," later in this appendix.
The .NO87 Directive

The .NO87 directive disables all coprocessor instructions. For more information, see Help.
The .486 and .486P Directives

MASM 6.1 can assemble instructions specific to the 80486, enabled with the .486 directive. The .486P directive enables 80486 instructions at the highest privilege level (recommended for systems-level programs only). For more information, see Help.
The PUSHCONTEXT and POPCONTEXT Directives

The directive PUSHCONTEXT saves the assembly environment, and POPCONTEXT restores it. The environment includes the segment register assumes, the radix, the listing and CREF flags, and the current processor and coprocessor. Note that .NOCREF (the MASM 6.1 equivalent to .XCREF) still determines whether information for a given symbol will be added to Browser information and to the symbol table in the listing file. For more information on listing files, see Appendix C or Help.
New Processor Instructions

MASM 6.1 supports these instructions for the 80486 processor:
80486 Instruction   Description
BSWAP               Byte swap
CMPXCHG             Compare and exchange
INVD                Invalidate data cache
INVLPG              Invalidate Translation Lookaside Buffer entry
WBINVD              Write back and invalidate data cache
XADD                Exchange and add
For full descriptions of these instructions, see the Reference or Help.
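The three data-manipulation instructions in this list can be modeled in C to show their effect on operands. The sketch below is illustrative only: the function names are hypothetical, only the 32-bit forms are modeled, flag results other than the zero-flag outcome of CMPXCHG are ignored, and none of the functions is atomic (the real instructions can be combined with the LOCK prefix).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* BSWAP: reverse the byte order of a 32-bit value. */
uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* XADD: dest receives dest + src; src receives the old value of dest. */
void xadd32(uint32_t *dest, uint32_t *src)
{
    assert(dest != NULL && src != NULL);
    uint32_t old = *dest;
    *dest = old + *src;
    *src = old;
}

/* CMPXCHG: if the accumulator equals dest, store src in dest and return 1
   (zero flag set); otherwise load dest into the accumulator and return 0. */
int cmpxchg32(uint32_t *dest, uint32_t *accumulator, uint32_t src)
{
    assert(dest != NULL && accumulator != NULL);
    if (*accumulator == *dest) {
        *dest = src;
        return 1;
    }
    *accumulator = *dest;
    return 0;
}
```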
Renamed Directives
Although MASM 6.1 still supports the old names in MASM 5.1, the following directives have been renamed for language consistency:
MASM 6.1        MASM 5.1
.DOSSEG         DOSSEG
.LISTIF         .LFCOND
.LISTMACRO      .XALL
.LISTMACROALL   .LALL
.NOCREF         .XCREF
.NOLIST         .XLIST
.NOLISTIF       .SFCOND
.NOLISTMACRO    .SALL
ECHO            %OUT
EXTERN          EXTRN
FOR             IRP
FORC            IRPC
REPEAT          REPT
STRUCT          STRUC
SUBTITLE        SUBTTL
Specifying 16-Bit and 32-Bit Instructions

MASM 6.1 supports all instructions that work with the extended 32-bit registers of the 80386/486. For certain instructions, you can override the default operand size with the W (word) and the D (doubleword) suffixes. For details, see the Reference or Help.
Macro Enhancements
There are significant enhancements to macro functions in MASM 6.1. Directives provide for a variable number of arguments, loop constructions, definitions of text equates, and macro functions.
Variable Arguments
MASM 5.1 ignores extra arguments passed to macros. In MASM 6.1, you can pass a variable number of arguments to a macro by appending the VARARG keyword to the last macro parameter in the macro definition. The macro can then reference additional arguments relative to the last declared parameter. This procedure is explained in "Returning Values with Macro Functions" in Chapter 9.
Required and Default Macro Arguments

With MASM 6.1, you can use REQ or the := operator to specify required or default arguments. See "Specifying Required and Default Parameters" in Chapter 9.
New Directives for Macro Loops

Within a macro definition, WHILE repeats assembly as long as a condition remains true. Other macro loop directives, IRP, IRPC, and REPT, have been renamed FOR, FORC, and REPEAT. For more information, see "Defining Repeat Blocks with Loop Directives" in Chapter 9.
Text Macros
The EQU directive retains its old functionality, but MASM 6.1 also incorporates a TEXTEQU directive for defining text macros. TEXTEQU allows greater flexibility than EQU. For example, TEXTEQU can assign to a label the value calculated by a macro function. For more information, see "Text Macros" in Chapter 9.
The GOTO Directive for Macros

Within a macro definition, GOTO transfers assembly to a line labeled with a leading colon (:). For more information on GOTO, see Help.
Macro Functions
At assembly time, macro functions can determine and return a text value using EXITM. Predefined macro string functions concatenate strings, return the size of a string, and return the position of a substring within a string. For information on writing your own macro functions, see "Returning Values with Macro Functions" in Chapter 9.
Predefined Macro Functions

MASM 6.1 provides the following predefined text macro functions:
Symbol      Value Returned
@CatStr     A concatenated string
@InStr      The position of one string within another
@SizeStr    The size of a string
@SubStr     A substring
For more information on predefined macros, see "String Directives and Predefined Functions" in Chapter 9.
MASM 6.1 Programming Practices

MASM 6.1 provides many features that make it easier for you to write assembly code. If you are familiar with MASM 5.1 programming, you may find it helpful to adopt the following list of new programming practices for programming with MASM 6.1. The list summarizes many of the changes covered in the following section, "Compatibility Between MASM 5.1 and 6.1."
• Select identifier names that do not begin with the dot operator (.).
• Use the dot operator (.) only to reference structure fields, and the plus operator (+) when not referencing structures.
• Different structures can have the same field names. However, the assembler does not allow ambiguous references. You must include the structure type when referring to field names common to two or more structures.
• Separate macro arguments with commas, not spaces.
• Avoid adding extra ampersands in macros. For a list of the new rules about using ampersands in macros, see "Substitution Operator" in Chapter 9 and "OPTION OLDMACROS," page 372.
• By default, code labels defined with a colon are local. Place two colons after code labels if you want to reference the label outside the procedure.
Compatibility Between MASM 5.1 and 6.1

MASM 6.1 provides a compatibility mode, making it easy for you to transfer existing MASM 5.1 code to the new version. You invoke the compatibility mode through the OPTION M510 directive or the /Zm command-line switch. This section explains the changes you may need to make to get your MASM 5.1 code to run under MASM 6.1 in compatibility mode.
Rewriting Code for Compatibility

In some cases, MASM 6.1 with OPTION M510 does not support MASM 5.1 behavior. In several cases, this is because bugs in MASM 5.1 were corrected. To update your code to MASM 6.1, use the instructions in this section. This usually requires only minor changes. Many of the topics listed here will not apply to your code. This section discusses topics in order of likelihood, beginning with the most common.
In addition, you may have conflicts between identifier names and new reserved words. OPTION NOKEYWORD resolves errors generated from the use of reserved words as identifiers. See "OPTION NOKEYWORD," page 376, for more information.
Bug Fixes Since MASM 5.1

This section lists the differences between MASM 5.1 and MASM 6.1 due to bug corrections since MASM 5.1.
Invalid Use of LOCK, REPNE, and REPNZ

Except in compatibility mode, MASM 6.1 flags illegal uses of the instruction prefixes LOCK, REPNE, and REPNZ. The error generated for invalid uses of the LOCK, REPNE, and REPNZ prefixes is error A2068:
instruction prefix not allowed
Table A.1 summarizes the correct use of the instruction prefixes. It lists each string instruction with the type of repeat prefix it uses, and indicates whether the instruction works on a source, a destination, or both.
Table A.1   Requirements for String Instructions
Instruction   Repeat Prefix   Source/Destination   Register Pair
MOVS          REP             Both                 DS:SI, ES:DI
SCAS          REPE/REPNE      Destination          ES:DI
CMPS          REPE/REPNE      Both                 DS:SI, ES:DI
LODS          --              Source               DS:SI
STOS          REP             Destination          ES:DI
INS           REP             Destination          ES:DI
OUTS          REP             Source               DS:SI
No Closing Quotation Marks in Macro Arguments

In MASM 5.1, you can use both single and double quotation marks (' and ") to begin strings in macro arguments. The assembler does not generate an error or warning if the string does not end with quotation marks on a macro call. Instead, MASM 5.1 considers the remainder of the line to be part of the macro argument containing the opening quote, as if there were a closing quotation mark at the end of the line. By default, MASM 6.1 now generates error A2046:
missing single or double quotation mark in string
so all single and double quotation marks in macro arguments must be matched. To correct such errors in MASM 6.1, either end the string with a closing quotation mark as shown in the following example, or use the macro escape character (!) to treat the quotation mark literally.
; MASM 5.1 code
MyMacro "all this in one argument

; Default MASM 6.1 code
MyMacro "all this in one argument"
Making a Scoped Label Public

MASM 5.1 considers code labels defined with a single colon inside a procedure to be local to that procedure if the module contains a .MODEL directive with a language type. Although the label is local, MASM 5.1 does not generate an error if it is also declared PUBLIC. MASM 6.1 generates error A2203:
cannot declare scoped code label as PUBLIC
If you want to make a label PUBLIC, it must not be local. You can use the double colon operator to define a non-scoped label, as shown in this example:
PUBLIC  publicLabel
publicLabel::                   ; Non-scoped label MASM 6.1
Byte Form of BT, BTS, BTC, and BTR Instructions

MASM 5.1 allows a byte argument for the 80386 bit-test instructions, but encodes it as a word argument. The byte form is not supported by the processor. MASM 6.1 does not support this behavior and generates error A2024:
invalid operand size for instruction
Rewrite your code to use a word-sized argument.
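As a minimal sketch of such a rewrite, the fragment below declares the flag variable as a word so that BT receives a word-sized operand. The names flags and byteFlags are hypothetical, and the TEST alternative for byte-sized data is a suggestion that does not come from this appendix.

```asm
        .MODEL  small
        .386
        .DATA
flags       WORD    0           ; hypothetical word-sized flag variable
byteFlags   BYTE    0           ; hypothetical byte-sized flag variable
        .CODE
        bt      flags, 3        ; word operand: accepted by MASM 6.1
        test    byteFlags, 08h  ; byte data: test bit 3 with a mask instead
        END
```

Note that BT reports the bit in the carry flag, while TEST reports it through the zero flag, so the conditional jump that follows must change accordingly.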
Default Values for Record Fields

In MASM 5.1, default values for record fields can range down to -2^n, where n is the number of bits in the field. This results in the loss of the sign bit. MASM 6.1 allows a range of -2^(n-1) to 2^n - 1 for default values. Illegal initializers generate error A2071:
initializer too large for specified size
Design Change Issues

MASM 6.1 includes design changes that make the language more consistent. These changes are not affected by the OPTION directive, discussed later in this appendix. Therefore, the changes require revisions in your code. In most cases, the necessary revisions are minor and the circumstances requiring changes are rare.
Operands of Different Size

MASM 5.1 does not require operands to agree in size, as the following code illustrates:
        .DATA?
var1    DB      ?
var2    DB      ?
        .CODE
        .
        .
        .
        mov     var1, ax        ; Copy AX to word at var1
The operands for the MOV instruction do not match in size, yet the instruction assembles correctly. It places the contents of AL into var1 and AH into var2, moving a word of data in one step. If the code defined var1 as a word value, the instruction
mov var1, al
would also assemble correctly, copying AL into the low byte of var1 while leaving the high byte unaffected. Except at warning level 0, MASM 5.1 issues a warning to inform you of the size mismatch, but both scenarios are legal. MASM 6.1 does not accept instructions with operands that do not agree in size. You must specifically coerce the size of the memory operand, like this:

mov BYTE PTR var1, al
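The first scenario, storing a word across two adjacent byte variables, can be coerced in the same way. The following is a sketch that reuses the var1 and var2 definitions from the earlier example:

```asm
        .MODEL  small
        .DATA?
var1    BYTE    ?
var2    BYTE    ?
        .CODE
        mov     WORD PTR var1, ax   ; AL goes to var1, AH goes to var2
        END
```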
Conflicting Structure Declarations

MASM 5.1 allows you to declare two or more structures with the same name. Each declaration replaces the previous declaration. However, the field names from previous declarations still remain in the assembler's list of declared values. MASM 6.1 does not allow conflicting declarations of a structure. It generates errors A2160 through A2165 for each conflicting declaration. The errors note a specific conflict, such as conflicting number of fields, conflicting names of fields, or conflicting initializers.
Forward References to Text Macros Outside of Expressions

MASM 5.1 allows forward references to text macros in specialized cases. MASM 6.1 with OPTION M510 also permits forward references, but only when the text macro is referenced in an expression. To revise your code, place all macro definitions at the beginning of the file.
HIGH and LOW Applied to Relocatable Operands

In some cases, MASM 5.1 accepts HIGH and LOW applied to relocatable memory expressions. For example, MASM 5.1 allows this code sequence:
; MASM 5.1 code
        EXTRN   var1:WORD
var2    DW      0
        mov     al, LOW var1    ; These two instructions yield the
        mov     ah, HIGH var1   ;   same as mov ax, OFFSET var1
However, the instruction

mov ax, LOW var2
is not legal. MASM 6.1 generates error A2105:

HIGH and LOW require immediate operands
The OFFSET operator is required on these operands in MASM 6.1, as shown in the following. Rewrite your code if necessary.
; MASM 6.1 code
        mov     al, LOW OFFSET var1
        mov     ah, HIGH OFFSET var2
OFFSET Applied to Group Names and Indirect Memory Operands

In MASM 6.1, you cannot apply OFFSET to a group name, indirect argument, or procedure argument. Doing so generates error A2098:
invalid operand for OFFSET
LENGTH Operator Applied to Record Types

In MASM 5.1, the LENGTH operator, when applied to a record type, returns the total number of bits in a record definition.
In MASM 6.1, the statement LENGTH recordName returns error A2143:

expected data label
Rewrite your code if necessary. The new SIZEOF operator returns information about records in MASM 6.1. For more information, see Defining Record Variables in Chapter 5.
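If your MASM 5.1 code used LENGTH to obtain the number of bits in a record, one possible replacement is the WIDTH operator, with SIZEOF supplying the size in bytes. The following sketch uses a hypothetical record named COLOR and assumes that WIDTH applied to a record name yields the total number of bits defined:

```asm
COLOR   RECORD  blink:1, back:3, intense:1, fore:3  ; hypothetical 8-bit record

colorBits   EQU     WIDTH COLOR     ; total bits defined in the record
colorBytes  EQU     SIZEOF COLOR    ; bytes occupied by a COLOR variable
```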
Signed Comparison of Hexadecimal Values Using GT, GE, LE, or LT

The rules for two's-complement comparisons have changed. In MASM 5.1, the expression
0FFFFh GT -1
is false because the two's-complement values are equal. However, because hexadecimal numbers are now treated as unsigned, the expression is true in MASM 6.1. To update, rewrite the affected code.
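One way to rewrite such a test is to mask both operands to the width you intend to compare, so that the result no longer depends on how the assembler extends each value. This is only a sketch; it assumes the default 32-bit expression size and a comparison that is meant to be 16 bits wide.

```asm
; Compare the low 16 bits of each value explicitly
IF (0FFFFh AND 0FFFFh) EQ (-1 AND 0FFFFh)
        ECHO    Values match in 16 bits
ENDIF
```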
RET Used with a Constant in Procedures with Epilogues

By default in MASM 6.1, the RET instruction followed by a constant suppresses automatic generation of epilogue code. MASM 5.1 ignores the operand and generates the epilogue. Remove the argument if necessary. See Generating Prologue and Epilogue Code in Chapter 7.
Code Labels at Top of Procedures with Prologues

By default in MASM 5.1, a code label defined on the same line as the first procedure instruction refers to the first byte of the prologue. In MASM 6.1, a code label defined at the beginning of a procedure refers to the first byte of the procedure after the prologue. If you need to label the start of the prologue code, place the label before the PROC statement. For more information, see Generating Prologue and Epilogue Code in Chapter 7.
Use of % as an Identifier Character

MASM 5.1 allows % as an identifier character. This behavior leads to ambiguities when % is used as the expansion operator in macros. Since % is not allowed as a character in MASM 6.1 identifiers, you must change the names of any identifiers containing the % character. For a list of legal identifier characters, see Identifiers in Chapter 1.
ASSUME CS Set to Wrong Value

With MASM 6.1 you do not need to use the ASSUME statement for the CS register. Instead, MASM 6.1 generates an automatic ASSUME statement for the code segment register to the current segment or group, as explained in Setting the ASSUME Directive for Segment Registers in Chapter 2.
Additionally, MASM 6.1 does not allow explicit ASSUME statements for CS that contradict the automatically set ASSUME statement. MASM 5.1 allows CS to be assumed to the current segment, even if that segment is a member of a group. With MASM 6.1, this results in warning A4004:
cannot ASSUME CS
To avoid this warning with MASM 6.1, delete the ASSUME statement for CS.
Code Requiring Two-Pass Assembly

Unlike version 5.1, MASM 6.1 does most of its work on its first pass, then performs as many subsequent passes as necessary. In contrast, MASM 5.1 always assembles in two source passes. As a result, you may need to revise or delete some pass-dependent constructs under MASM 6.1.
Two-Pass Directives
To assure compatibility, MASM 6.1 supports 5.1 directives referring to two passes. These include .ERR1, .ERR2, IF1, IF2, ELSEIF1, and ELSEIF2. For second-pass constructs, you must specify OPTION SETIF2, as discussed in OPTION SETIF2, page 377. Without OPTION SETIF2, the IF2 and .ERR2 directives cause error A2061:
[[ELSE]]IF2/.ERR2 not allowed : single-pass assembler
MASM 6.1 handles first-pass constructs differently. It treats the .ERR1 directive as .ERR, and the IF1 directive as IF. The following examples show you how you can rewrite typical pass-sensitive code for MASM 6.1:
- Declare var external only if not defined in current module:

; MASM 5.1:
IF2
    IFNDEF var
        EXTRN var:far
    ENDIF
ENDIF

; MASM 6.1:
EXTERNDEF var:far

- Include a file of definitions only once to speed assembly:

; MASM 5.1:
IF1
    INCLUDE file1.inc
ENDIF

; MASM 6.1:
INCLUDE FILE1.INC

- Generate a %OUT or .ERR message only once:

; MASM 5.1:
IF2
    %OUT This is my message
ENDIF
IF2
    .ERRNZ A NE B
ENDIF

; MASM 6.1:
ECHO This is my message
.ERRNZ A NE B <ASSERTION FAILURE: A NE B>

- Generate an error if a symbol is not defined but may be forward referenced:

; MASM 5.1:
IF2
    .ERRNDEF var
ENDIF

; MASM 6.1:
.ERRNDEF var
For information on conditional directives, see Conditional Directives, Chapter 1.
IFDEF and IFNDEF with Forward-Referenced Identifiers

If you use a symbol name that has not yet been defined in an IFDEF or IFNDEF expression, MASM 6.1 returns FALSE for the IFDEF expression and TRUE for the IFNDEF expression. When OPTION M510 is enabled, the assembler generates warning A6005:
expression condition may be pass-dependent
To resolve the warning, place the symbol definition before the conditional test.
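A minimal sketch of that ordering follows. The symbol name DEBUG is hypothetical; the point is only that the definition precedes the IFDEF that tests it.

```asm
DEBUG   EQU     1               ; define the symbol first

IFDEF   DEBUG                   ; the test now sees a defined symbol
        ECHO    Assembling with DEBUG defined
ENDIF
```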
Address Spans as Constants

The value of offsets calculated on the first assembly pass may not be the same as those calculated on later passes. Therefore, you should avoid comparisons with an address span, as in the following examples:
IF (OFFSET var1 - OFFSET var2) EQ 10
WHILE dx LT (OFFSET var1 - OFFSET var2)
REPEAT OFFSET var1 - OFFSET var2
However, the DUP operator allows such an expression as its count value. The assembler evaluates the DUP count on every pass, so even expressions involving forward references assemble correctly. You can also use expressions containing span distances with the .ERR directives, since the assembler evaluates these directives after calculating all offsets:
.ERRE OFFSET var1 - OFFSET var2 - 10, <span incorrect>
.TYPE with Forward References

MASM 5.1 evaluates .TYPE on both assembly passes. This means it yields zero on the first pass and nonzero on the second pass, if applied to an expression that forward-references a symbol. MASM 6.1 evaluates .TYPE only on the first assembly pass. As a result, if the operand references a symbol that has not yet been defined, .TYPE yields a value of zero. This means that .TYPE, if used in a conditional-assembly construction, may yield different results in MASM 6.1 than in MASM 5.1.
Obsolete Features No Longer Supported

The following two features are no longer supported by MASM 6.1. Because both are obscure features provided by early versions of the assembler, they probably do not affect your MASM 5.1 code.
The ESC Instruction

MASM 6.1 no longer supports the ESC instruction, which was used to send hand-coded commands to the coprocessor. Because MASM 6.1 recognizes and assembles the full set of coprocessor mnemonics, the ESC instruction is not necessary. Using the ESC instruction generates error A2205:
ESC instruction is obsolete: ignored
To update MASM 5.1 code, use the coprocessor instructions instead of ESC.
The MSFLOAT Binary Format

MASM 6.1 does not support the .MSFLOAT directive, which provided the Microsoft Binary Format (MSB) for floating-point numbers in variable initializers. Using the .MSFLOAT directive generates error A2204:
.MSFLOAT directive is obsolete: ignored
Use IEEE format or, if MSB format is necessary, initialize variables with hexadecimal values. See Storing Numbers in Floating-Point Format in Chapter 6.
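The two alternatives might look like the following sketch. The variable names are hypothetical, and the hexadecimal value assumes the usual 4-byte Microsoft Binary Format layout, in which the high byte holds an exponent biased by 128 (so 1.0 is 0.5 x 2^1, exponent 81h, with a zero stored mantissa).

```asm
        .MODEL  small
        .DATA
ieeeOne REAL4   1.0             ; IEEE single-precision 1.0
msbOne  DWORD   81000000h       ; 1.0 hand-encoded in 4-byte MSB format
        END
```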
Using the OPTION Directive

The OPTION directive lets you control compatibility with MASM 5.1 code. This section explains the differences in MASM 5.1 and MASM 6.1 behavior that the OPTION directive can influence. The OPTION M510 directive (or /Zm command-line option) initiates all aspects of 5.1 compatibility mode. You can select from among specific characteristics of MASM 5.1 behavior with the OPTION arguments discussed in following sections. Each section also explains how to revise your code if you want to remove OPTION directives from your MASM 5.1 code.
Note: If your code includes both .MODEL and OPTION M510, the OPTION M510 statement must appear first.
Wherever this appendix suggests using OPTION M510 in your code, you can set the /Zm command-line option instead.
OPTION M510
This section discusses the M510 argument to the OPTION directive, which selects the MASM 5.1 compatibility mode. In this mode, MASM 6.1 implements MASM 5.1 behavior relating to macros, offsets, scope of code labels, structures, identifier names, identifier case, and other behaviors. The OPTION M510 directive automatically sets the following:
OPTION OLDSTRUCTS       ; MASM 5.1 structures
OPTION OLDMACROS        ; MASM 5.1 macros
OPTION DOTNAME          ; Identifiers may begin with a dot (.)
OPTION SETIF2:TRUE      ; Two-pass code activates on every pass
If you do not have a .386, .386P, .486, or .486P directive in your module, then OPTION M510 adds:
OPTION EXPR16           ; 16-bit expression precision
                        ; See "OPTION EXPR16," following
If you do not have a .MODEL directive in your module, OPTION M510 adds:
OPTION OFFSET:SEGMENT   ; OFFSET operator defaults to
                        ;   segment-relative
                        ; See "OPTION OFFSET," following
If you do not have a .MODEL directive with a language specifier in your module, OPTION M510 also adds:
OPTION NOSCOPED         ; Code labels are not local inside
                        ;   procedures
                        ; See "OPTION NOSCOPED," following
OPTION PROC:PRIVATE     ; Labels defined with PROC are not
                        ;   public by default
                        ; See "OPTION PROC," following
If you want to remove OPTION M510 from your code (or /Zm from the command line), add the OPTION directive arguments to your module according to the conditions stated earlier. There may be compatibility issues affecting your code that are supported under OPTION M510, but are not covered by the other OPTION directive arguments. Once you have modified your source code so it no longer requires behavior supported by OPTION M510, you can replace OPTION M510 with other OPTION directive arguments. These compatibility issues are discussed in following sections. Once you have replaced OPTION M510 with other forms of the OPTION directive and your code works correctly, try removing the OPTION directives, one at a time. Make appropriate source modifications as necessary, until your code uses only MASM 6.1 defaults.
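As an illustration of the first step, a module that has no processor directive and no .MODEL directive would replace OPTION M510 with the full set of arguments listed earlier. This is a sketch for that one case only; a module that does have .386 or .MODEL needs only the subset whose conditions apply.

```asm
; Replaces OPTION M510 in a module with no .386/.486 and no .MODEL directive
OPTION OLDSTRUCTS       ; always set by OPTION M510
OPTION OLDMACROS        ; always set by OPTION M510
OPTION DOTNAME          ; always set by OPTION M510
OPTION SETIF2:TRUE      ; always set by OPTION M510
OPTION EXPR16           ; added because no .386/.386P/.486/.486P is present
OPTION OFFSET:SEGMENT   ; added because no .MODEL is present
OPTION NOSCOPED         ; added because no .MODEL language type is present
OPTION PROC:PRIVATE     ; added because no .MODEL language type is present
```

You can then remove these directives one at a time, revising the source after each removal.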
Reserved Keywords Dependent on CPU Mode with OPTION M510

With OPTION M510, keywords and instructions not available in the current CPU mode (such as ENTER under .8086) are not treated as keywords. This also means the USE32, FLAT, FAR32, and NEAR32 segment types and the 80386/486 registers are not keywords with a processor selection less than .386. If you remove OPTION M510, any reserved word used as an identifier generates a syntax error. You can either rename the identifiers or use OPTION NOKEYWORD. For more information on OPTION NOKEYWORD, see OPTION NOKEYWORD, later in this appendix.
Invalid Use of Instruction Prefixes with OPTION M510

Code without OPTION M510 generates errors for all invalid uses of the instruction prefixes. OPTION M510 suppresses some of these errors to match MASM 5.1 behavior. MASM 5.1 does not check for illegal usage of the instruction prefixes LOCK, REP, REPE, REPZ, REPNE, and REPNZ.
Illegal usage of these prefixes results in error A2068:

instruction prefix not allowed
For more information on these instruction prefixes, see Overview of String Instructions in Chapter 5. See also Bug Fixes from MASM 5.1, earlier in this appendix.
Size of Constant Operands with OPTION M510

In MASM 5.1, a large constant value that can fit only in the processor's default word (4 bytes for .386 and .486, 2 bytes otherwise) is assigned a size attribute of the default word size. The value of the constant affects the number of bytes changed by the instruction. For example,
; Legal only with OPTION M510
        mov     [bx], 0100h
is legal in OPTION M510 mode. Since 0100h cannot fit in a byte, the assembler interprets the value as a word. Without OPTION M510, the assembler never assigns a size automatically. You must state it explicitly with the PTR operator, as shown in the following example:
; Without OPTION M510
        mov     [bx], WORD PTR 0100h
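The size can also be stated on the memory operand instead of the constant. This placement is offered here as an equivalent sketch and is not taken from this appendix:

```asm
        .MODEL  small
        .CODE
        mov     WORD PTR [bx], 0100h    ; store a word at DS:[BX]
        mov     BYTE PTR [bx], 01h      ; store a single byte instead
        END
```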
Code Labels when Defining Data with OPTION M510

MASM 5.1 allows a code label definition in a data definition statement if that statement does not also define a data label. MASM 6.1 also allows such definitions if OPTION M510 is enabled; otherwise it is illegal.
; Legal only with OPTION M510
MyCodeLabel:    DW      0
SEG Operator with OPTION M510

In MASM 5.1, the SEG operator returns a label's segment address unless the frame is explicitly specified, in which case it returns the segment address of the frame. A statement such as SEG DGROUP:var always returns DGROUP, whereas SEG var always returns the segment address of var. OPTION M510 forces this same behavior in MASM 6.1. If you do not use OPTION M510, the behavior of the SEG operator is determined by the OPTION OFFSET directive, as described in OPTION OFFSET, later in this appendix. In MASM 6.1, the value returned by the SEG operator applied to a nonexternal variable depends on compatibility mode:

- Without OPTION M510, SEG returns the address of the frame (the segment, group, or the value assumed to the segment register) if one has been explicitly set.
- With OPTION M510, SEG returns the group if one has been specified. In the absence of a defined group, SEG returns the segment where the variable is defined.
Expression Evaluation with OPTION M510

By default, MASM 6.1 changes the way expressions are evaluated. In MASM 5.1,
var-2[bx]
is parsed as
(var-2)[bx]
Without OPTION M510, you must rewrite the statement, since the assembler parses it as
var-(2[bx])
which generates an error.
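Two possible rewrites are sketched below. The array name var is taken from the example above; its definition and size here are assumptions made only so that the fragment is complete.

```asm
        .MODEL  small
        .DATA
var     WORD    8 DUP (0)       ; hypothetical array
        .CODE
        mov     ax, (var-2)[bx] ; parentheses restore the MASM 5.1 grouping
        mov     ax, var[bx-2]   ; same address, displacement inside the index
        END
```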
Length and Size of Labels with OPTION M510

With OPTION M510, you can apply the LENGTH and SIZE operators to any label. For a code label, SIZE returns a value of 0FFFFh for NEAR and 0FFFEh for FAR. LENGTH always returns a value of 1. For strings, SIZE and LENGTH both return 1. Without OPTION M510, SIZE returns values of 0FF01h, 0FF02h, 0FF04h, 0FF05h, and 0FF06h for SHORT, NEAR16, NEAR32, FAR16, and FAR32 labels, respectively. LENGTH returns 1 except when used with DUP, in which case it returns the outermost count. For arrays initialized with DUP, SIZE returns the length multiplied by the size of the type. The LENGTHOF and SIZEOF operators in MASM 6.1 handle arrays much more consistently. These operators return the number of data items and the number of bytes in an initializer. For a description of SIZEOF and LENGTHOF, see the following sections in Chapter 5: Declaring and Referencing Arrays, Declaring and Initializing Strings, Defining Structure and Union Variables, and Defining Record Variables.
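For example, with a simple array (the name array is hypothetical), the operators return the element count and the byte count:

```asm
        .MODEL  small
        .DATA
array   WORD    10 DUP (0)
        .CODE
        mov     cx, LENGTHOF array  ; 10 data items
        mov     bx, SIZEOF array    ; 20 bytes (10 items * 2 bytes each)
        END
```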
Comparing Types Using EQ and NE with OPTION M510

With OPTION M510, the assembler converts types to a constant value before comparisons with EQ and NE. Code types are converted to values of 0FFFFh (near) or 0FFFEh (far). If OPTION M510 is not enabled, the assembler converts types to constants only when comparing them with constants. Thus, MASM 6.1 recognizes only equivalent qualified types as equal expressions. For existing MASM 5.1 code, these distinctions affect only the use of the TYPE operator in conjunction with EQ and NE. The following example illustrates how the assembler compares types with and without compatibility mode:
MYSTRUCT    STRUC
f1          DB      0
f2          DB      0
MYSTRUCT    ENDS

; With OPTION M510
val = (TYPE MYSTRUCT) EQ WORD   ; True:  2 EQ 2
val = 2 EQ WORD                 ; True:  2 EQ 2
val = WORD EQ WORD              ; True:  2 EQ 2
val = SWORD EQ WORD             ; True:  2 EQ 2

; Without OPTION M510
val = (TYPE MYSTRUCT) EQ WORD   ; False: MyStruct NE WORD
val = 2 EQ WORD                 ; True:  2 EQ 2
val = WORD EQ WORD              ; True:  WORD EQ WORD
val = SWORD EQ WORD             ; False: SWORD NE WORD
Use of Constant and PTR as a Type with OPTION M510

You can use a constant as the left operand to PTR in compatibility mode. Otherwise, you must use a type expression. With OPTION M510, a constant must have a value of 1 (BYTE), 2 (WORD), 4 (DWORD), 6 (FWORD), 8 (QWORD) or 10 (TBYTE). The assembler treats the constant as the parenthesized type. Note that the TYPE operator yields a type expression, but the SIZE operator yields a constant.
; With OPTION M510
MyData  DW      0
        mov     WORD PTR [bx], 10           ; Legal
        mov     (TYPE MyData) PTR [bx], 10  ; Legal
        mov     (SIZE MyData) PTR [bx], 10  ; Legal
        mov     2 ptr [bx], 10              ; Legal

; Without OPTION M510
MyData  WORD    0
        mov     WORD PTR [bx], 10           ; Legal
        mov     (TYPE MyData) PTR [bx], 10  ; Legal
;       mov     (SIZE MyData) PTR [bx], 10  ; Illegal
;       mov     2 PTR [bx], 10              ; Illegal
Structure Type Cast on Expressions with OPTION M510

In compatibility mode, use the PTR operator to type-cast a constant to a structure type. This is most often done in data initializers to affect the CodeView information of the data label. Without OPTION M510, the assembler generates an error.
MYSTRC  STRUC
f1      DB      0
MYSTRC  ENDS
MyPtr   DW      MYSTRC PTR 0    ; Illegal without OPTION M510

In MASM 6.1, the initializer type does not influence CodeView's type information.
Hidden Coercion of OFFSET Expression Size with OPTION M510

When programming for the 80386 or 80486, the size of an OFFSET expression can be 2 bytes for a symbol in a USE16 segment, or 4 bytes for a symbol in a USE32 or FLAT segment. With OPTION M510, you can use a 32-bit OFFSET expression in a 16-bit context. Without OPTION M510, you must use the LOWWORD operator to convert the offset size.
        .386
; With OPTION M510
seg32   SEGMENT USE32
MyLabel WORD    0
seg32   ENDS

seg16   SEGMENT USE16 'code'
                                            ; With OPTION M510:
        mov     ax, OFFSET MyLabel          ; Legal
        mov     ax, LOWWORD OFFSET MyLabel  ; Legal
        mov     eax, OFFSET MyLabel         ; Legal
seg16   ENDS

; Without OPTION M510
seg32   SEGMENT USE32
MyLabel WORD    0
seg32   ENDS

seg16   SEGMENT USE16 'code'
                                            ; Without OPTION M510:
;       mov     ax, OFFSET MyLabel          ; Illegal
        mov     ax, LOWWORD offset MyLabel  ; Legal
        mov     eax, OFFSET MyLabel         ; Legal
seg16   ENDS
Specifying Radixes with OPTION M510

If the current radix in your code is greater than 10 decimal, MASM 6.1 allows the radix specifiers B (binary) and D (decimal) only in compatibility mode. You must change B to Y for binary, and D to T for decimal, since both B and D are legitimate hexadecimal values, making numbers such as 12D ambiguous. If you want to keep B and D as radix specifiers when the current radix is greater than 10, you must specify OPTION M510. For more information about radixes, see Integer Constants and Constant Expressions in Chapter 1.
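The following sketch shows the Y and T specifiers in use under a hexadecimal default radix. The register choices are arbitrary.

```asm
        .MODEL  small
        .CODE
        .RADIX  16              ; default radix is now hexadecimal
        mov     al, 101y        ; Y marks binary: 101 binary
        mov     bx, 12t         ; T marks decimal: 12 decimal
        mov     cx, 12D         ; no specifier is seen: this is hexadecimal 12D
        .RADIX  10              ; restore the decimal default
        END
```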
Naming Conventions with OPTION M510

By default, MASM 5.1 does not write the names of public variables in uppercase to the object file, even when a language type of PASCAL, FORTRAN, or BASIC is specified. Unless you use OPTION M510, these language types in MASM 6.1 write identifier names in uppercase, even with the /Cp or /Cx command-line options. When you link with /NOI, case must match in the object files to resolve externals.
Length Significance of Symbol Names with OPTION M510

With MASM 5.1, only the first 31 characters of a symbol name are considered significant, and only the first 31 characters of a public or external symbol name are placed in the object file. Without OPTION M510, the entire name is considered significant. The maximum number of characters placed in the object file is controlled with the /Hnumber command-line option, with a default of 247 (the maximum length of an identifier in MASM 6.1).
String Defaults in Structure Variables with OPTION M510

In compatibility mode, a constant initializer can override a structure field initialized with a string value. Without OPTION M510, only another string or a list can override a string initializer. To update your code, surround the constant override value with angle brackets or curly braces to indicate a list with one element.
MTSTRUCT    STRUCT
MyString    BYTE    "This is a string"
MTSTRUCT    ENDS

; With OPTION M510
MyInst      MTSTRUCT    <0>

; Without OPTION M510, either of these statements is correct
MyInst      MTSTRUCT    <<0>>
MyInst      MTSTRUCT    {<0>}
Effects of the ? Initializer in Data Definitions with OPTION M510

As described in Declaring and Initializing Strings in Chapter 5, the assembler treats the ? initializer as either zero or as an unspecified value. In compatibility mode, however, the assembler always treats the ? initializer as zero unless it is used with the DUP operator. In this case, the assembler allocates space, but does not initialize it with any value.
Current Address Operator with OPTION M510

In compatibility mode, the current address operator ($) applied to a structure returns the offset of the first byte of the structure. When OPTION M510 is not enabled, $ returns the offset of the current field in the structure.
Segment Association for FAR Externals with OPTION M510

In MASM 5.1, you must place an EXTRN directive for a variable in the same segment that holds the variable. For far data, this often entails opening and closing a segment just to place the EXTRN statement.
MASM 6.1 offers much greater flexibility in where EXTERN and EXTERNDEF statements can appear, as described in Positioning External Declarations in Chapter 8. However, in compatibility mode, MASM 6.1 emulates the behavior of MASM 5.1.
Defining Aliases Using EQU with OPTION M510

In MASM 5.1, you can equate one symbol with another. These equates are called aliases. Unless you specify OPTION M510, MASM 6.1 does not allow aliases defined with EQU. An immediate expression or text must appear as the right operand of an EQU directive. Change aliases to use the TEXTEQU directive, described in Text Macros in Chapter 9. This change may cause an expression to evaluate differently. The following examples illustrate the differences between MASM 5.1 code, MASM 6.1 code with OPTION M510, and MASM 6.1 code without OPTION M510:
; MASM 5.1 code
var1    EQU     3
var2    EQU     var1    ; var2 taken as an alias
                        ; var2 references var1 anywhere var2 is
                        ;   used as a symbol

; MASM 6.1 with OPTION M510
var1    EQU     3
var2    EQU     var1    ; var2 taken as a
                        ;   var2 EQU <var1>
                        ; var2 substituted for var1 whenever
                        ;   text macros substituted

; MASM 6.1 without OPTION M510
var1    EQU     3
var2    EQU     var1    ; Treated as var2 EQU 3
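The TEXTEQU form of the alias might look like the following sketch, which reuses the names var1 and var2 from the examples above:

```asm
        .MODEL  small
var1    EQU     3
var2    TEXTEQU <var1>          ; var2 is a text macro whose text is var1
        .CODE
        mov     ax, var2        ; expands to: mov ax, var1
        END
```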
Difference in Text Macro Expansions with OPTION M510

MASM 6.1 recursively expands text macros used as values, whereas MASM 5.1 simply replaces the text macro with its value. The following example illustrates the difference:
; With OPTION M510
tm1     EQU     <contains tm2>
tm2     EQU     <value>
tm3     CATSTR  tm1             ; == <contains tm2>

; Without OPTION M510
tm3     CATSTR  tm1             ; == <contains value>
Conditional Directives and Missing Operands with OPTION M510

MASM 5.1 considers a missing argument to be a zero. MASM 6.1 requires an argument unless OPTION M510 is enabled.
OPTION OLDSTRUCTS
This section describes changes in MASM 6.1 that apply to structures. With OPTION OLDSTRUCTS or OPTION M510:
- You can use the plus operator (+) in structure field references.
- Labels and structure field names cannot have the same name with OPTION OLDSTRUCTS.
Plus Operator Not Allowed with Structures

By default, each reference to structure member names must use the dot operator (.) to separate the structure variable name from the field name. You cannot use the dot operator as the plus operator (+) or vice versa. To convert your code so that it does not need OPTION OLDSTRUCTS:
- Qualify all structure field references.
- Change all uses of the dot operator (.) that occur outside of structure references to use the plus operator (+).
If you remove OPTION OLDSTRUCTS from your code, the assembler generates errors for all lines requiring change. Using the dot operator in any context other than for a structure field results in error A2166:
structure field expected
Unqualified structure references result in error A2006:

undefined symbol : identifier
The following example illustrates how to change MASM 5.1 code from the old structure references to the new type in MASM 6.1:
; OPTION OLDSTRUCTS (simulates MASM 5.1)
structname      STRUC
  a             BYTE    ?
  b             WORD    ?
structname      ENDS

structinstance  structname <>

        mov     ax, [bx].b              ; This code assembles
        mov     al, structinstance.a    ;   correctly only with
        mov     ax, [bx].4              ;   OPTION OLDSTRUCTS
                                        ;   or OPTION M510

; OPTION NOOLDSTRUCTS (the MASM 6.1 default)
structname      STRUCT
  a             BYTE    ?
  b             WORD    ?
structname      ENDS

structinstance  structname <>

        mov     ax, [bx].structname.b   ; Add qualifying type
        mov     al, structinstance.a    ; No change needed
        mov     ax, [bx]+4              ; Change dot to plus

; Alternative methods in MASM 6.1
; Either this:
        ASSUME  bx:PTR structname
        mov     ax, [bx]
; or this:
        mov     ax, (structname PTR[bx]).b
Duplicate Structure Field Names

With the default, OPTION NOOLDSTRUCTS, labels and structure fields may have the same name. With OPTION OLDSTRUCTS (the MASM 5.1 default), labels and structure fields cannot have the same name. For more information, see Structures and Unions in Chapter 5.
OPTION OLDMACROS
This section describes how MASM 5.1 and 6.1 differ in their handling of macros. Without OPTION OLDMACROS or OPTION M510, MASM 6.1 changes the behavior of macros in several ways. If you want the MASM 5.1 macro behavior, add OPTION OLDMACROS or OPTION M510 to your MASM 5.1 code.
Separating Macro Arguments with Commas

MASM 5.1 allows white spaces or commas to separate arguments to macros. MASM 6.1 with OPTION NOOLDMACROS (the default) requires commas between arguments. For example, in the macro call
MyMacro var1 var2 var3, var4
OPTION OLDMACROS causes the assembler to treat all four items as separate arguments. With OPTION NOOLDMACROS, the assembler treats
var1 var2 var3
as one argument, since the items are not separated with commas. To convert your macro code, replace spaces between macro arguments with a single comma.
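A converted call might look like the following sketch. The macro body and the numeric arguments are hypothetical and serve only to make the fragment complete.

```asm
        .MODEL  small
MyMacro MACRO   a1, a2, a3, a4  ; hypothetical four-parameter macro
        WORD    a1, a2, a3, a4
ENDM
        .DATA
        MyMacro 1, 2, 3, 4      ; commas separate all four arguments
        END
```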
New Behavior with Ampersands in Macros

The default OPTION NOOLDMACROS causes the assembler to interpret ampersands (&) within a macro differently than does MASM 5.1. MASM 5.1 requires one ampersand for each level of macro nesting. OPTION OLDMACROS emulates this behavior. Without OPTION OLDMACROS, MASM 6.1 removes ampersands only once no matter how deeply nested the macro. To update your MASM 5.1 macros, follow this simple rule: replace every sequence of ampersands with a single ampersand. The only exception is when macro parameters immediately precede and follow the ampersand, and both require substitution. In this case, use two ampersands. For a description of the new rules, see Substitution Operator in Chapter 9.
This example shows how to update a MASM 5.1 macro:

; OPTION OLDMACROS (emulates MASM 5.1 behavior)
createNames macro   arg
    irp     tail, <Next, Last>
        irp     num, <1, 2>
            ; Define more names of the form: abcNext1?
            arg&&tail&&&num&&&? label BYTE
        ENDM
    ENDM
ENDM

; OPTION NOOLDMACROS (the MASM 6.1 default)
createNames macro   arg
    for     tail, <Next, Last>      ; FOR is the MASM 6.1
        for     num, <1, 2>         ;   synonym for irp
            ; Define more names of the form: abcNext1?
            arg&&tail&&num&? label BYTE
        ENDM
    ENDM
ENDM
OPTION DOTNAME
MASM 5.1 allows names of identifiers to begin with a period. The MASM 6.1 default is OPTION NODOTNAME. Adding OPTION DOTNAME to your code enables the MASM 5.1 behavior. If you don't want to use this directive in your source code, rename the identifiers whose names begin with a period.
OPTION EXPR16
MASM 5.1 treats expressions as 16-bit words if you do not specify .386 or .386P directives. MASM 6.1 by default treats expressions as 32-bit words, regardless of the CPU type. You can force MASM 6.1 to use the smaller expression size with the OPTION EXPR16 statement. Unless your MASM 5.1 code specifies .386 or .386P, OPTION M510 also sets 16-bit expression size. You can selectively disable this by following OPTION M510 with the OPTION EXPR32 directive, which sets the size back to 32 bits. You cannot have both OPTION EXPR32 and OPTION EXPR16 in your program.
It may not be easy to determine the effect of changing from 16-bit internal expression size to 32-bit size. In most cases, the 32-bit word size does not affect the MASM 5.1 code. However, problems may arise because of differences in intermediate values during evaluation of expressions. You can compare the files for differences by generating listing files with the /Fl and /Sa command-line options with and without OPTION EXPR16.
OPTION OFFSET
The information in this section is relevant only if your MASM 5.1 code does not use the .MODEL directive. With no .MODEL, MASM 5.1 computes offsets from the start of the segment, whereas MASM 6.1 computes offsets from the start of the group. (With .MODEL, MASM 5.1 also computes offsets from the start of the group.)
To force MASM 6.1 to emulate 5.1 behavior, specify either OPTION OFFSET:SEGMENT or OPTION M510. Both directives cause the assembler to compute offsets relative to the segment if you do not include .MODEL. To selectively enable MASM 6.1 behavior, place the directive OPTION OFFSET:GROUP after OPTION M510. In this case, you should ensure each OFFSET statement has a segment override where appropriate. The following example shows how OPTION OFFSET:SEGMENT affects code written for MASM 5.1:
OPTION  OFFSET:SEGMENT
MyGroup GROUP   MySeg
MySeg   SEGMENT 'data'
MyLabel LABEL   BYTE
        DW      OFFSET MyLabel          ; Relative to MySeg
        DW      OFFSET MyGroup:MyLabel  ; Relative to MyGroup
        DW      OFFSET MySeg:MyLabel    ; Relative to MySeg
MySeg   ENDS
In the preceding example, the first OFFSET statement computes the offset of MyLabel relative to MySeg. Without OFFSET:SEGMENT, MASM 6.1 returns the offset relative to MyGroup. To maintain the correct behavior with OFFSET:GROUP, specify a segment override, as shown in the following. The other two OFFSET statements already include overrides, and so do not require modification.
OPTION  OFFSET:GROUP
MyGroup GROUP   MySeg
MySeg   SEGMENT 'data'
MyLabel LABEL   BYTE
        DW      OFFSET MySeg:MyLabel    ; Relative to MySeg
        DW      OFFSET MyGroup:MyLabel  ; Relative to MyGroup
        DW      OFFSET MySeg:MyLabel    ; Relative to MySeg
MySeg   ENDS
When not in compatibility mode, the OPTION OFFSET directive determines whether the SEG operator returns a value relative to the group or segment. With OPTION M510, SEG is always segment-relative by default, regardless of the current value of OPTION OFFSET.
OPTION NOSCOPED
The information in this section applies only if the .MODEL directive in your MASM 5.1 code does not specify a language type. Without a language type, MASM 5.1 assumes code labels in procedures have no scope; that is, the labels are not local to the procedure. When not in compatibility mode, MASM 6.1 always gives scope to code labels, even without a language type. To force MASM 5.1 behavior, specify either OPTION M510 or OPTION NOSCOPED in your code. To selectively enable MASM 6.1 behavior, place the directive OPTION SCOPED after OPTION M510. To determine which labels require change, assemble the module without the OPTION NOSCOPED directive. For each reference to a label that is not local, the assembler generates error A2006:
undefined symbol : identifier
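The following sketch, which is not from the original manual, shows the kind of MASM 5.1-style code this option is meant for. The procedure and label names (First, Second, Shared) are hypothetical, and the module is meant only to be assembled, not run.

```asm
; Sketch only: First, Second, and Shared are illustrative names.
        .MODEL  small           ; no language type, as in the case described above
        OPTION  NOSCOPED        ; code labels in procedures are not local
        .CODE
First   PROC    NEAR
        jmp     Shared          ; refers to a label defined inside another procedure
First   ENDP
Second  PROC    NEAR
Shared: ret
Second  ENDP
        END
```

If the OPTION NOSCOPED line is removed, the text above implies that the reference to Shared in First is reported as an undefined symbol, which identifies the label as one that needs attention.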
OPTION PROC
The information in this section applies only if the .MODEL directive in your MASM 5.1 code does not specify a language type. Without a language type, MASM 5.1 makes procedures private to the module. By default, MASM 6.1 makes procedures public. You can explicitly change the default visibility with OPTION M510 or OPTION PROC:PRIVATE, which make procedures private, or with OPTION PROC:EXPORT, which makes them exported. To selectively enable MASM 6.1 behavior, place the directive OPTION PROC:PUBLIC after OPTION M510. You can override the default by adding the PUBLIC or PRIVATE keyword to selected procedures. The following example shows how to change MASM 5.1 code to keep a procedure private:
; MASM 5.1 (OPTION PROC:PRIVATE)
MyProc  PROC    NEAR

; MASM 6.1 (OPTION PROC:PUBLIC)
MyProc  PROC    NEAR PRIVATE
This is necessary only to avoid naming conflicts between public names in multiple modules or libraries. The symbol table in a listing file shows the visibility (public, private, or export) of each procedure.
OPTION NOKEYWORD
MASM 6.1 has several new keywords that MASM 5.1 does not recognize as reserved. To resolve any conflicts, you can:
• Rename any offending symbols in your code.
• Selectively disable keywords with the OPTION NOKEYWORD directive.
The second option lets you retain the offending symbol names in your code by forcing MASM 6.1 to not recognize them as keywords. For example,
OPTION NOKEYWORD:<INVOKE STRUCT>
removes the keywords INVOKE and STRUCT from the assembler's list of reserved words. However, you cannot then use the keywords in their intended function, since the assembler no longer recognizes them. The following list shows MASM 6.1 reserved words new since MASM 5.1:
.BREAK .CONTINUE .DOSSEG .ELSE .ELSEIF .ENDIF .ENDW .EXIT
.IF .LISTALL .LISTIF .LISTMACRO .LISTMACROALL .NO87 .NOCREF .NOLIST
.NOLISTIF .NOLISTMACRO .REPEAT .STARTUP .UNTIL .UNTILCXZ .WHILE ADDR
ALIAS BSWAP CARRY? CMPXCHG ECHO EXTERN EXTERNDEF FAR16
FAR32 FLAT FLDENVD FLDENVW FNSAVED FNSAVEW FNSTENVD FNSTENVW
FOR FORC FRSTORD FRSTORW FSAVED FSAVEW FSTENVD FSTENVW
GOTO HIGHWORD INVD INVLPG INVOKE IRETDF IRETF LENGTHOF
LOOPD LOOPED LOOPEW LOOPNED LOOPNEW LOOPNZD LOOPNZW LOOPW
LOOPZW LOWWORD LROFFSET NEAR16 NEAR32 OPATTR OPTION OVERFLOW?
PARITY? POPAW POPCONTEXT PROTO PUSHAW PUSHCONTEXT PUSHD PUSHW
REAL10 REAL4 REAL8 REPEAT SBYTE SDWORD SIGN? SIZEOF
STDCALL STRUCT SUBTITLE SWORD SYSCALL TEXTEQU TR3 TR4
TR5 TYPEDEF UNION VARARG WBINVD WHILE XADD ZERO?
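As a minimal sketch of the second approach (not from the original manual), the module below assumes legacy code that happens to use invoke as a data name and struct as a code label. Both names are chosen here purely for illustration.

```asm
; Sketch only: the symbols invoke and struct stand in for legacy MASM 5.1 names.
        OPTION  NOKEYWORD:<INVOKE STRUCT>   ; stop treating these two words as reserved
        .MODEL  small
        .DATA
invoke  DW      0               ; data name that would otherwise collide with INVOKE
        .CODE
struct: ret                     ; code label that would otherwise collide with STRUCT
        END
```

The trade-off noted above still applies: within this module the INVOKE directive and STRUCT declarations are no longer available.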
OPTION SETIF2
By default, MASM 6.1 does not recognize pass-dependent constructs. Both the OPTION M510 and OPTION SETIF2 statements force MASM 6.1 to handle MASM 5.1 constructs that activate on the second assembly pass, such as .ERR2, IF2, and ELSEIF2. Invoke the option like this:
OPTION SETIF2: {TRUE | FALSE}
When set to TRUE, OPTION SETIF2 forces all second-pass constructs to activate on every assembly pass. When set to FALSE, second-pass constructs do not activate on any pass. OPTION M510 implies OPTION SETIF2:TRUE.
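A short sketch of the TRUE setting follows. It is not taken from the original manual, and the message text is made up for illustration.

```asm
; Sketch only: assumes legacy source that relies on a second-pass conditional.
        OPTION  SETIF2:TRUE     ; second-pass constructs activate on every pass
        .MODEL  small
IF2
        %OUT    Assembling the legacy second-pass block
ENDIF
        .CODE
        END
```

Changing the option to SETIF2:FALSE should, per the description above, cause the IF2 block to be skipped on every pass.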
Changes to Instruction Encodings

MASM 6.1 contains changes to the encodings for several instructions. In some cases, the changes help optimize code size.
Coprocessor Instructions
For the 8087 coprocessor, MASM 5.1 adds an extra NOP before the no-wait versions of coprocessor instructions. MASM 6.1 does not. In the rare case that the missing NOP affects timing, insert NOP.
For the 80287 coprocessor or better, MASM 5.1 inserts FWAIT before certain instructions. MASM 6.1 does not prefix any 80287, 80387, or 80486 coprocessor instruction with FWAIT, except for wait forms of instructions that have a no-wait form.
RET Instruction
MASM 5.1 generates a 3-byte encoding for RET, RETN, or RETF instructions with an operand value of zero, unless the operand is an external absolute. In this case, MASM 5.1 ignores the parameter and generates a 1-byte encoding. MASM 6.1 does the opposite. It ignores a zero operand for the return instructions and generates a 1-byte encoding, unless the operand is an external absolute. In this case, MASM 6.1 generates a 3-byte encoding. Thus, you can suppress epilogue code in a procedure but still specify the default size for RET by coding the return as
ret 0
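The sketch below, which is not from the original manual, shows one way this can look in a procedure that has an assembler-generated prologue. The procedure name AddOne and its parameter val are hypothetical, and the hand-written epilogue assumes the standard prologue (push bp / mov bp, sp) with no local variables.

```asm
; Sketch only: AddOne and val are illustrative names.
        .MODEL  small, c
        .CODE
AddOne  PROC    NEAR, val:WORD  ; the assembler generates the standard prologue
        mov     ax, val
        inc     ax
        pop     bp              ; hand-written epilogue replaces the generated one
        ret     0               ; suppresses epilogue code; default-size return
AddOne  ENDP
        END
```

A listing generated with /Fl /Sg shows the assembler-generated prologue lines, which makes it easy to confirm that no epilogue was added before the return.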
Versions 5.1 and 6.1 differ in the way they encode the arithmetic instructions ADC, ADD, AND, CMP, OR, SUB, SBB, and XOR, under the following conditions:
• The first operand is either AX or EAX.
• The second operand is a constant value between 0 and 127.
For the AX register, there is no size or speed difference between the two encodings. For the EAX register, the encoding in MASM 6.1 is 2 bytes smaller. The OPTION NOSIGNEXTEND directive forces the MASM 5.1 behavior for AND, OR, and XOR.
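To see the difference for yourself, a small test module can be assembled twice and the generated bytes compared in the listings. The sketch below is an assumption-laden example rather than text from the manual; the instructions and constants are arbitrary choices that satisfy the two conditions above.

```asm
; Sketch only: assemble with and without the OPTION line and compare listings.
        .MODEL  small
        .386
;       OPTION  NOSIGNEXTEND    ; remove the leading semicolon to request 5.1 encoding for AND, OR, XOR
        .CODE
        and     eax, 7Fh        ; EAX with a constant in the range 0 to 127
        or      ax, 1           ; AX form: no size difference between encodings, per the text
        add     eax, 5          ; ADD is not covered by OPTION NOSIGNEXTEND, per the text
        END
```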
Appendix B
BNF Grammar
This appendix provides a complete description of symbols, operators, and directives for MASM 6.1. It uses the Backus-Naur Form (BNF) for grammar notation. You can use BNF grammar to determine the exact syntax for any language component and find all available options for any MASM command. BNF definitions consist of nonterminals and terminals. Nonterminals are placeholders within a BNF definition, defined elsewhere in the BNF grammar. Terminals are endpoints in a BNF definition, consisting of MASM 6.1 keywords. In this appendix, all nonterminals appear in italic type and all terminals appear in bold type.
BNF Conventions
The conventions use different font attributes for different items in the BNF. The symbols and formats are as follows:
Attribute            Description
nonterminal          Italic type indicates nonterminals.
RESERVED             Terminals in boldface type are literal reserved words and symbols that must be entered as shown. Characters in this context are always case insensitive.
[ [ ] ]              Objects enclosed in double brackets ([ [ ] ]) are optional. The brackets do not actually appear in the source code.
|                    A vertical bar indicates a choice between the items on each side of the bar.
.8086                Underlined items indicate the default option if one is given.
default typeface     Characters in the set described or listed can be used as terminals in MASM statements.
How to Use the BNF Grammar

To illustrate the use of the BNF, Figure B.1 diagrams the definition of the TYPEDEF directive, starting with the nonterminal typedefDir. The entries under each horizontal brace in Figure B.1 are terminals (such as NEAR16, NEAR32, FAR16, and FAR32) or nonterminals (such as qualifier, qualifiedType, distance, and protoSpec) that can be further defined. Each italicized nonterminal in the typedefDir definition is also an entry in the BNF. Three vertical dots indicate a branching definition for a nonterminal that, for the sake of simplicity, this figure does not illustrate. The BNF grammar allows recursive definitions. For example, the grammar uses qualifiedType as a possible definition for qualifiedType, which is also a component of the definition for qualifier.
Figure B.1 BNF Definition of the TYPEDEF Directive

Nonterminal          Definition
;;                   endOfLine | comment
=Dir                 id = immExpr ;;
addOp                + | -
aExpr                term | aExpr && term
altId                id
arbitraryText        charList
asmInstruction       mnemonic [ [ exprList ] ]
assumeDir            ASSUME assumeList ;; | ASSUME NOTHING ;;
assumeList           assumeRegister | assumeList , assumeRegister
assumeReg            register : assumeVal
assumeRegister       assumeSegReg | assumeReg
assumeSegReg         segmentRegister : assumeSegVal
assumeSegVal         frameExpr | NOTHING | ERROR
assumeVal            qualifiedType | NOTHING | ERROR
bcdConst             [ [ sign ] ] decNumber
binaryOp             == | != | >= | <= | > | < | &
bitDef               bitFieldId : bitFieldSize [ [ = constExpr ] ]
bitDefList           bitDef | bitDefList , [ [ ;; ] ] bitDef
bitFieldId           id
bitFieldSize         constExpr
blockStatements      directiveList | .CONTINUE [ [ .IF cExpr ] ] | .BREAK [ [ .IF cExpr ] ]
bool                 TRUE | FALSE
byteRegister         AL | AH | BL | BH | CL | CH | DL | DH
cExpr                aExpr | cExpr || aExpr
character            Any character with ordinal in the range 0-255 except linefeed (10)
charList             character | charList character
className            string
commDecl             [ [ nearfar ] ][ [ langType ] ] id : commType [ [ : constExpr ] ]
commDir              COMM commList ;;
comment              ; text ;;
commentDir           COMMENT delimiter text text delimiter text ;;
commList             commDecl | commList , commDecl
commType             type | constExpr
constant             digits [ [ radixOverride ] ]
constExpr            expr
contextDir           PUSHCONTEXT contextItemList ;; | POPCONTEXT contextItemList ;;
contextItem          ASSUMES | RADIX | LISTING | CPU | ALL
contextItemList      contextItem | contextItemList , contextItem
controlBlock         whileBlock | repeatBlock
controlDir           controlIf | controlBlock
controlElseif        .ELSEIF cExpr ;; directiveList [ [ controlElseif ] ]
controlIf            .IF cExpr ;; directiveList [ [ controlElseif ] ] [ [ .ELSE ;; directiveList ] ] .ENDIF ;;
coprocessor          .8087 | .287 | .387 | .NO87
crefDir              crefOption ;;
crefOption           .CREF | .XCREF [ [ idList ] ] | .NOCREF [ [ idList ] ]
cxzExpr              expr | ! expr | expr == expr | expr != expr
dataDecl             DB | DW | DD | DF | DQ | DT | dataType | typeId
dataDir              [ [ id ] ] dataItem ;;
dataItem             dataDecl scalarInstList | structTag structInstList | typeId structInstList | unionTag structInstList | recordTag recordInstList
dataType             BYTE | SBYTE | WORD | SWORD | DWORD | SDWORD | FWORD | QWORD | TBYTE | REAL4 | REAL8 | REAL10
decdigit             0|1|2|3|4|5|6|7|8|9
decNumber            decdigit | decNumber decdigit
delimiter            Any character except whiteSpaceCharacter
digits               decdigit | digits decdigit | digits hexdigit
directive            generalDir | segmentDef
directiveList        directive | directiveList directive
distance             nearfar | NEAR16 | NEAR32 | FAR16 | FAR32
e01                  e01 orOp e02 | e02
e02                  e02 AND e03 | e03
e03                  NOT e04 | e04
e04                  e04 relOp e05 | e05
e05                  e05 addOp e06 | e06
e06                  e06 mulOp e07 | e06 shiftOp e07 | e07
e07                  e07 addOp e08 | e08
e08                  HIGH e09 | LOW e09 | HIGHWORD e09 | LOWWORD e09 | e09
e09                  OFFSET e10 | SEG e10 | LROFFSET e10 | TYPE e10 | THIS e10 | e09 PTR e10 | e09 : e10 | e10
e10                  e10 . e11 | e10 [ [ expr ] ] | e11
e11                  ( expr ) | [ [ expr ] ] | WIDTH id | MASK id | SIZE sizeArg | SIZEOF sizeArg | LENGTH id | LENGTHOF id | recordConst | string | constant | type | id | $ | segmentRegister | register | ST | ST ( expr )
echoDir              ECHO arbitraryText ;;
                     %OUT arbitraryText ;;
elseifBlock          elseifStatement ;; directiveList [ [ elseifBlock ] ]
elseifStatement      ELSEIF constExpr | ELSEIFE constExpr | ELSEIFB textItem | ELSEIFNB textItem | ELSEIFDEF id | ELSEIFNDEF id | ELSEIFDIF textItem , textItem | ELSEIFDIFI textItem , textItem | ELSEIFIDN textItem , textItem | ELSEIFIDNI textItem , textItem | ELSEIF1 | ELSEIF2
endDir               END [ [ immExpr ] ] ;;
endpDir              procId ENDP ;;
endsDir              id ENDS ;;
equDir               textMacroId EQU equType ;;
equType              immExpr | textLiteral
errorDir             errorOpt ;;
errorOpt             .ERR [ [ textItem ] ] | .ERRE constExpr [ [ optText ] ] | .ERRNZ constExpr [ [ optText ] ] | .ERRB textItem [ [ optText ] ] | .ERRNB textItem [ [ optText ] ] | .ERRDEF id [ [ optText ] ] | .ERRNDEF id [ [ optText ] ] | .ERRDIF textItem , textItem [ [ optText ] ] | .ERRDIFI textItem , textItem [ [ optText ] ] | .ERRIDN textItem , textItem [ [ optText ] ] | .ERRIDNI textItem , textItem [ [ optText ] ] | .ERR1 [ [ textItem ] ] | .ERR2 [ [ textItem ] ]
exitDir              .EXIT [ [ expr ] ] ;;
exitmDir             EXITM | EXITM textItem
exponent             E [ [ sign ] ] decNumber
expr                 SHORT e05 | .TYPE e01 | OPATTR e01 | e01
exprList             expr | exprList , expr
externDef            [ [ langType ] ] id [ [ ( altId ) ] ] : externType
externDir            externKey externList ;;
externKey            EXTRN | EXTERN | EXTERNDEF
externList           externDef | externList , [ [ ;; ] ] externDef
externType           ABS | qualifiedType
fieldAlign           constExpr
fieldInit            [ [ initValue ] ] | structInstance
fieldInitList        fieldInit | fieldInitList , [ [ ;; ] ] fieldInit
fileChar             delimiter
fileCharList         fileChar | fileCharList fileChar
fileSpec             fileCharList | textLiteral
flagName             ZERO? | CARRY? | OVERFLOW? | SIGN? | PARITY?
floatNumber          [ [ sign ] ] decNumber . [ [ decNumber ] ][ [ exponent ] ] | digits R | digits r
forcDir              FORC | IRPC
forDir               FOR | IRP
forParm              id [ [ : forParmType ] ]
forParmType          REQ | = textLiteral
frameExpr            SEG id | DGROUP : id | segmentRegister : id | id
generalDir           modelDir | segOrderDir | nameDir | includeLibDir | commentDir | groupDir | assumeDir | structDir | recordDir | typedefDir | externDir | publicDir | commDir | protoTypeDir | equDir | =Dir | textDir | contextDir | optionDir | processorDir | radixDir | titleDir | pageDir | listDir | crefDir | echoDir | ifDir | errorDir | includeDir | macroDir | macroCall | macroRepeat | purgeDir | macroWhile | macroFor | macroForc | aliasDir
gpRegister           AX | EAX | BX | EBX | CX | ECX | DX | EDX | BP | EBP | SP | ESP | DI | EDI | SI | ESI
groupDir             groupId GROUP segIdList
groupId              id
hexdigit             a|b|c|d|e|f|A|B|C|D|E|F
id                   alpha | id alpha | id decdigit
idList               id | idList , id
ifDir                ifStatement ;; directiveList [ [ elseifBlock ] ] [ [ ELSE ;; directiveList ] ] ENDIF ;;
ifStatement          IF constExpr | IFE constExpr | IFB textItem | IFNB textItem | IFDEF id | IFNDEF id | IFDIF textItem , textItem | IFDIFI textItem , textItem | IFIDN textItem , textItem | IFIDNI textItem , textItem | IF1 | IF2
immExpr              expr
includeDir           INCLUDE fileSpec ;;
includeLibDir        INCLUDELIB fileSpec ;;
initValue            immExpr | string | ? | constExpr DUP ( scalarInstList ) | floatNumber | bcdConst
inSegDir             [ [ labelDef ] ] inSegmentDir
inSegDirList         inSegDir | inSegDirList inSegDir
inSegmentDir         instruction | dataDir | controlDir | startupDir | exitDir | offsetDir | labelDir | procDir [ [ localDirList ] ][ [ inSegDirList ] ] endpDir | invokeDir | generalDir
instrPrefix          REP | REPE | REPZ | REPNE | REPNZ | LOCK
instruction          [ [ instrPrefix ] ] asmInstruction
invokeArg            register :: register | expr | ADDR expr
invokeDir            INVOKE expr [ [,[ [ ;; ] ] invokeList ] ] ;;
invokeList           invokeArg | invokeList , [ [ ;; ] ] invokeArg
keyword              Any reserved word
keywordList          keyword | keyword keywordList
labelDef             id : | id :: | @@:
labelDir             id LABEL qualifiedType ;;
langType             C | PASCAL | FORTRAN | BASIC | SYSCALL | STDCALL
listDir              listOption ;;
listOption           .LIST | .NOLIST | .XLIST | .LISTALL | .LISTIF | .LFCOND | .NOLISTIF | .SFCOND | .TFCOND | .LISTMACROALL | .LALL | .NOLISTMACRO | .SALL | .LISTMACRO | .XALL
localDef             LOCAL idList ;;
localDir             LOCAL parmList ;;
localDirList         localDir | localDirList localDir
localList            localDef | localList localDef
macroArg             % constExpr | % textMacroId | % macroFuncId ( macroArgList ) | string | arbitraryText | < arbitraryText >
macroArgList         macroArg | macroArgList , macroArg
macroBody            [ [ localList ] ] macroStmtList
macroCall            id macroArgList ;; | id ( macroArgList )
macroDir             id MACRO [ [ macroParmList ] ] ;; macroBody ENDM ;;
macroFor             forDir forParm , < macroArgList > ;; macroBody ENDM ;;
macroForc            forcDir id , textLiteral ;; macroBody ENDM ;;
macroFuncId          id
macroId              macroProcId | macroFuncId
macroIdList          macroId | macroIdList , macroId
macroLabel           id
macroParm            id [ [ : parmType ] ]
macroParmList        macroParm | macroParmList , [ [ ;; ] ] macroParm
macroProcId          id
macroRepeat          repeatDir constExpr ;; macroBody ENDM ;;
macroStmt            directive | exitmDir | : macroLabel | GOTO macroLabel
macroStmtList        macroStmt ;; | macroStmtList macroStmt ;;
macroWhile           WHILE constExpr ;; macroBody ENDM ;;
mapType              ALL | NONE | NOTPUBLIC
memOption            TINY | SMALL | MEDIUM | COMPACT | LARGE | HUGE | FLAT
mnemonic             Instruction name
modelDir             .MODEL memOption [ [ , modelOptlist ] ] ;;
modelOpt             langType | stackOption
modelOptlist         modelOpt | modelOptlist , modelOpt
module               [ [ directiveList ] ] endDir
mulOp                * | / | MOD
nameDir              NAME id ;;
nearfar              NEAR | FAR
nestedStruct         structHdr [ [ id ] ] ;; structBody ENDS ;;
offsetDir            offsetDirType ;;
offsetDirType        EVEN | ORG immExpr | ALIGN [ [ constExpr ] ]
offsetType           GROUP | SEGMENT | FLAT
oldRecordFieldList   [ [ constExpr ] ] | oldRecordFieldList , [ [ constExpr ] ]
optionDir            OPTION optionList ;;
optionItem           CASEMAP : mapType | DOTNAME | NODOTNAME | EMULATOR | NOEMULATOR | EPILOGUE : macroId | EXPR16 | EXPR32 | LANGUAGE : langType | LJMP | NOLJMP | M510 | NOM510 | NOKEYWORD : < keywordList > | NOSIGNEXTEND | OFFSET : offsetType | OLDMACROS | NOOLDMACROS | OLDSTRUCTS | NOOLDSTRUCTS | PROC : oVisibility | PROLOGUE : macroId | READONLY | NOREADONLY | SCOPED | NOSCOPED | SEGMENT : segSize | SETIF2 : bool
optionList           optionItem | optionList , [ [ ;; ] ] optionItem
optText              , textItem
orOp                 OR | XOR
oVisibility          PUBLIC | PRIVATE | EXPORT
pageDir              PAGE [ [ pageExpr ] ] ;;
pageExpr             + | [ [ pageLength ] ][ [ , pageWidth ] ]
pageLength           constExpr
pageWidth            constExpr
parm                 parmId [ [ : qualifiedType ] ] | parmId [ [ constExpr ] ][ [ : qualifiedType ] ]
parmId               id
parmList             parm | parmList , [ [ ;; ] ] parm
parmType             REQ | = textLiteral | VARARG
pOptions             [ [ distance ] ][ [ langType ] ][ [ oVisibility ] ]
primary              expr binaryOp expr | flagName | expr
procDir              procId PROC [ [ pOptions ] ][ [ < macroArgList > ] ] [ [ usesRegs ] ][ [ procParmList ] ]
processor            .8086 | .186 | .286 | .286C | .286P | .386 | .386C | .386P | .486 | .486P
processorDir         processor ;; | coprocessor ;;
procId               id
procParmList         [ [,[ [ ;; ] ] parmList ] ] [ [,[ [ ;; ] ] parmId :VARARG] ]
protoArg             [ [ id ] ] : qualifiedType
protoArgList         [ [,[ [ ;; ] ] protoList ] ] [ [,[ [ ;; ] ][ [ id ] ] :VARARG ] ]
protoList            protoArg | protoList , [ [ ;; ] ] protoArg
protoSpec            [ [ distance ] ][ [ langType ] ][ [ protoArgList ] ] | typeId
protoTypeDir         id PROTO protoSpec
pubDef               [ [ langType ] ] id
publicDir            PUBLIC pubList ;;
pubList              pubDef | pubList , [ [ ;; ] ] pubDef
purgeDir             PURGE macroIdList
qualifiedType        type | [ [ distance ] ] PTR [ [ qualifiedType ] ]
qualifier            qualifiedType | PROTO protoSpec
quote                " | '
radixDir             .RADIX constExpr ;;
radixOverride        h|o|q|t|y|H|O|Q|T|Y
recordConst          recordTag { oldRecordFieldList } | recordTag < oldRecordFieldList >
recordDir            recordTag RECORD bitDefList ;;
recordFieldList      [ [ constExpr ] ] | recordFieldList , [ [ ;; ] ][ [ constExpr ] ]
recordInstance       {[ [ ;; ] ] recordFieldList [ [ ;; ] ]} | < oldRecordFieldList > | constExpr DUP ( recordInstance )
recordInstList       recordInstance | recordInstList , [ [ ;; ] ] recordInstance
recordTag            id
register             specialRegister | gpRegister | byteRegister
regList              register | regList register
relOp                EQ | NE | LT | LE | GT | GE
repeatBlock          .REPEAT ;; blockStatements ;; untilDir ;;
repeatDir            REPEAT | REPT
scalarInstList       initValue | scalarInstList , [ [ ;; ] ] initValue
segAlign             BYTE | WORD | DWORD | PARA | PAGE
segAttrib            PUBLIC | STACK | COMMON | MEMORY | AT constExpr | PRIVATE
segDir               .CODE [ [ segId ] ] | .DATA | .DATA? | .CONST | .FARDATA [ [ segId ] ] | .FARDATA? [ [ segId ] ] | .STACK [ [ constExpr ] ]
segId                id
segIdList            segId | segIdList , segId
segmentDef           segmentDir [ [ inSegDirList ] ] endsDir | simpleSegDir [ [ inSegDirList ] ][ [ endsDir ] ]
segmentDir           segId SEGMENT [ [ segOptionList ] ] ;;
segmentRegister      CS | DS | ES | FS | GS | SS
segOption            segAlign | segRO | segAttrib | segSize | className
segOptionList        segOption | segOptionList segOption
segOrderDir          .ALPHA | .SEQ | .DOSSEG | DOSSEG
segRO                READONLY
segSize              USE16 | USE32 | FLAT
shiftOp              SHR | SHL
sign                 - | +
simpleExpr           ( cExpr ) | primary
simpleSegDir         segDir ;;
sizeArg              id | type | e10
specialChars         : | . | [ [ | ] ] | ( | ) | < | > | { | } | + | - | / | * | & | % | ! | ' | \ | = | ; | , | " | whiteSpaceCharacter | endOfLine
specialRegister      CR0 | CR2 | CR3 | DR0 | DR1 | DR2 | DR3 | DR6 | DR7 | TR3 | TR4 | TR5 | TR6 | TR7
stackOption          NEARSTACK | FARSTACK
startupDir           .STARTUP ;;
stext                stringChar | stext stringChar
string               quote [ [ stext ] ] quote
stringChar           quote quote | Any character except quote
structBody           structItem ;; | structBody structItem ;;
structDir            structTag structHdr [ [ fieldAlign ] ] [ [, NONUNIQUE ] ] ;; structBody structTag ENDS ;;
structHdr            STRUC | STRUCT | UNION
structInstance       <[ [ fieldInitList ] ]> | {[ [ ;; ] ][ [ fieldInitList ] ][ [ ;; ] ]} | constExpr DUP ( structInstList )
structInstList       structInstance | structInstList , [ [ ;; ] ] structInstance
structItem           dataDir | generalDir | offsetDir | nestedStruct
structTag            id
term                 simpleExpr | ! simpleExpr
text                 textLiteral | text character | ! character text | character | ! character
textDir              id textMacroDir ;;
textItem             textLiteral | textMacroId | % constExpr
textLen              constExpr
textList             textItem | textList , [ [ ;; ] ] textItem
textLiteral          < text >;;
textMacroDir         CATSTR [ [ textList ] ] | TEXTEQU [ [ textList ] ] | SIZESTR textItem | SUBSTR textItem , textStart [ [ , textLen ] ] | INSTR [ [ textStart , ] ] textItem , textItem
textMacroId          id
textStart            constExpr
titleDir             titleType arbitraryText ;;
titleType            TITLE | SUBTITLE | SUBTTL
type                 structTag | unionTag | recordTag | distance | dataType | typeId
typedefDir           typeId TYPEDEF qualifier
typeId               id
unionTag             id
untilDir             .UNTIL cExpr ;;
                     .UNTILCXZ [ [ cxzExpr ] ] ;;
usesRegs             USES regList
whileBlock           .WHILE cExpr ;; blockStatements ;; .ENDW
whiteSpaceCharacter  ASCII 8, 9, 11-13, 26, 32
Appendix C
Generating and Reading Assembly Listings
A listing file shows precisely how the assembler translates your source file into machine code. The listing documents the assembler's assumptions, memory allocations, and optimizations. MASM creates an assembly listing of your source file whenever you do one of the following:
• Select the appropriate option in PWB.
• Use one of the related source code directives.
• Specify the /Fl option on the MASM command line.
The assembly listing contains both the statements in the source file and the binary code (if any) generated for each statement. The listing also shows the names and values of all labels, variables, and symbols in your file. The assembler creates tables for macros, structures, unions, records, segments, groups, and other symbols, and places the tables at the end of the assembly listing. Only the types of symbols encountered in the program are included. For example, if your program has no macros, the symbol table does not have a macros section.
Generating Listing Files

To generate a listing file from within PWB, follow these steps:
1. From the Options menu, choose MASM Options.
2. In the MASM Options dialog box, choose Set Debug or Release Options.
The dialog box for Set Debug or Release Options lists the choices summarized in Table C.1. This table also shows the equivalent source code directives and command-line options.
Table C.1 Options for Generating or Modifying Listing Files

To generate this information | In PWB (1), select | In source code, enter | From command line, enter
Default listing includes all assembled lines | Generate Listing File | .LIST (default) | /Fl
Turn off all source listings (overrides all listing directives) | Generate Listing File (turn off) | .NOLIST (synonym = .XLIST) | (none)
List all source lines, including false conditionals and generated code | Include All Source Lines | .LISTALL | /Fl /Sa
Show instruction timings | List Instruction Timings | (none) | /Fl /Sc
Show assembler-generated code | List Generated Instructions | (none) | /Fl /Sg
Include false conditionals (2) | List False Conditionals | .LISTIF (synonym = .LFCOND) | /Fl /Sx
Suppress listing of any subsequent conditional blocks whose condition is false | List False Conditionals (turn off) | .NOLISTIF (synonym = .SFCOND) | (none)
Toggle between .LISTIF and .NOLISTIF | (none) | .TFCOND | (none)
Suppress symbol table generation | Generate Symbol Table (turn off the default) | (none) | /Fl /Sn
List all processed macro statements | (none) | .LISTMACROALL (synonym = .LALL) | (none)
List only instructions, data, and segment directives in macros | (none) | .LISTMACRO (default) (synonym = .XALL) | (none)
Turn off all listing during macro expansion | (none) | .NOLISTMACRO (synonym = .SALL) | (none)
Specify title for each page (use only once per file) | (none) | TITLE name | /St name
Specify subtitle for page | (none) | SUBTITLE name | /Ss name
Designate page length and line width, increment section number, or generate page breaks | (none) | PAGE [ [length,width] ][ [+] ] | /Sp length /Sl width
Generate first-pass listing | (none) | (none) | /Ep

(1) Select MASM Options from the Options menu, then choose Set Dialog Options from the MASM Options dialog box.
(2) See Conditional Directives in Chapter 1.
Precedence of Command-Line Options and Listing Directives

Because command-line options and source code directives can specify opposite behavior for the same listing file option, the assembler interprets the commands according to the following precedence levels. (Selecting PWB options is equivalent to specifying /Fl /Sx on the command line.)
• /Sa overrides any source code directives that suppress listing.
• Source code directives override all command-line options except /Sa.
• .NOLIST overrides other listing directives such as .NOLISTIF and .LISTMACROALL.
The /Sx, /Ss, /Sp, and /Sl options set initial values for their respective features. Directives in the source file can override these command-line options.
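One way to observe these precedence rules is with a small test module. The sketch below is not from the original manual; the variable names hidden and shown, and the file name used afterward, are hypothetical.

```asm
; Sketch only: hidden and shown are illustrative names.
        TITLE   Listing precedence sketch
        .MODEL  small
        .DATA
        .NOLIST                 ; suppress listing of the lines that follow
hidden  DW      1234h
        .LIST                   ; resume the default listing
shown   DW      5678h
        .CODE
        END
```

Assuming the module is saved as listdemo.asm, assembling with ML /c /Fl listdemo.asm should honor the .NOLIST directive, while ML /c /Fl /Sa listdemo.asm should list every line, since /Sa overrides directives that suppress listing.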
Reading the Listing File

The first half of the listing shows macros from the include file DOS.MAC, structure declarations, and data. After the .DATA directive, the columns on the left show offsets and initialized byte values within the data segment. Instructions begin after the .CODE directive. The three columns on the left show offsets, instruction timings, and binary code generated by the assembler. The columns on the right list the source statements exactly as they appear in the source file or as expanded by a macro. Various symbols and abbreviations in the middle column provide information about the code, as explained in the following section. The subsequent section, Symbols and Abbreviations, explains the meanings of listing symbols.
Generated Code
The assembler lists the code generated from the statements of a source file. With the /Sc command-line switch, which generates instruction timings, each line has this syntax:
offset [[timing]] [[code]]
The offset is the offset from the beginning of the current code segment. The timing shows the number of cycles the processor needs to execute the instruction. The value of timing reflects the CPU type; for example, specifying the .386 directive produces instruction timings for the 80386 processor. If the statement generates code or data, code shows the numeric value in hexadecimal notation if the value is known at assembly time. If the value is calculated at run time, the assembler indicates what action is necessary to compute the value.
When assembling under the default .8086 directive, timing includes an effective address value if the instruction accesses memory. The 80186/486 processors do not use effective address values. For more information on effective address timing, see the Processor section in the Reference book.
Error Messages
If any errors occur during assembly, each error message and error number appears directly below the statement where the error occurred. An example of an error line and message is:
        mov     ax, [dx][di]
listtst.asm(77): error A2031: must be index or base register
Symbols and Abbreviations

The assembler uses the symbols and abbreviations shown in Table C.2 to indicate addresses that need to be resolved by the linker or values that were generated in a special way. The example in this section illustrates many of these symbols. The example listing was produced using List Generated Instructions and List Instruction Timings in PWB. These options correspond to the ML command-line switches /Fl /Sg /Sc.
Table C.2 Symbols and Abbreviations in Listings

Character    Meaning
C            Line from include file
=            EQU or equal-sign (=) directive
nn[xx]       DUP expression: nn copies of the value xx
----         Segment/group address (linker must resolve)
R            Relocatable address (linker must resolve)
*            Assembler-generated code
E            External address (linker must resolve)
n            Macro-expansion nesting level (+ if more than 9)
|            Operator size override
&            Address size override
nn:          Segment override in statement
nn/          REP or LOCK prefix instruction
Table C.3 explains the five symbols that may follow timing values in your listing. The Reference book will help you determine correct timings for those values marked with a symbol.
Table C.3 Symbols in Timing Column

Symbol       Meaning
m            Add cycles depending on next executed instruction.
n            Add cycles depending on number of iterations or size of data.
p            Different timing value in protected mode.
+            Add cycles depending on operands or combination of the preceding.
,            Separates two values for jump taken and jump not taken.
09/20/00 12:00:00 Page 1 - 1
Microsoft (R) Macro Assembler Version 6.10 listtst.asm
= 0020 = 35 = 32 = 04 <son> )
.MODEL .386 .DOSSEG .STACK INCLUDE C StrDef MACRO C name1 BYTE C BYTE C l&name1 EQU C ENDM C C Display MACRO C mov C mov C int C ENDM num EQU COLOR RECORD value TEXTEQU tnum TEXTEQU strpos TEXTEQU
small, c
256 dos.mac name1, text &text 13d, 10d, '$' LENGTHOF name1
string ah, 09h dx, OFFSET string 21h 20h b:1, r:3=1, i:1=1, f:3=7 %3 + num %num @InStr( , <person>,
PutStr 0004 0000 0001 0002 DATE month day year DATE
PROTO STRUCT BYTE BYTE WORD ENDS
pMsg:PTR BYTE
01 01 0000
1 1 ?

0002 0000 U1 fsize bsize U1 UNION WORD BYTE ENDS .DATA 00000000 1F 01 14 07C9 00 001E [ 0000 ] ddData text today flag buffer DWORD COLOR DATE BYTE WORD ? <> <01, 20, 1993> 0 30 DUP (0)
0028
40 60
0000 0000 0004 0005 0009 000A
0046 46 69 6E 69 65 64 004F 0D 0A 24 = 0009 0052 54 68 69 73 string","0" 73 20 73 74 6E 67
73 68 2E
1 1 1
ending
StrDef BYTE
ending, "Finished." "Finished." 13d, 10d, '$' LENGTHOF ending BYTE "This is a
20 69 61 20 72 69 30
BYTE lending EQU Msg
0063 ---- 0052 R
float FPBYTE FPMSG PBYTE NPWORD PVOID PPBYTE
TYPEDEF TYPEDEF FAR FPBYTE TYPEDEF TYPEDEF NEAR TYPEDEF TYPEDEF .CODE .STARTUP
REAL4 PTR BYTE Msg PTR BYTE PTR WORD PTR PTR PBYTE
0000 0000 0000 0003 0005 0007 0009 000C 000E *@Startup: * mov * mov * mov * sub * shl * mov * add
2 2p 2 2 3 2p 2
B8 8E 8C 2B C1 8E 03
---- R D8 D3 D8 E3 04 D0 E3
ax, ds, bx, bx, bx, ss, sp,
DGROUP ax ss ax 004h ax bx work:NEAR work
0010
7m
E8 0000 E
EXTERNDEF call
404
Programmers Guide
INVOKE PutStr, ADDR msg OFFSET Msg PutStr sp, 00002h mov mov mov mov mov repne mov inc EXTERNDEF call Display mov mov int ax, es, al, cx, @data ax 'c' es:num
0013 0016 0019 001C 001F 0021 0023 0028 002B 002D 0031
2 7m 2 2 2p 2 4 2 7n 4 6
68 0052 R E8 0029 83 C4 02 B8 ---- R 8E C0 B0 63 26: 8B 0E 0020 BF 0052 F2/ AE 66| A1 0000 R 67& FE 03
* * *
push call add
di, 82 scasb eax, ddData BYTE PTR [ebx] morework:NEAR morework ending ah, 09h dx, OFFSET ending 21h
0034
7m
E8 0000 E
0037 0039 003C
2 2 37
B4 09 BA 0046 R CD 21
1 1 1
003E 0040
2 37
B4 4C CD 21
* *
mov int
.EXIT ah, 04Ch 021h
0042 0042 0043 0045 0047 004A 2 4 2 4 4 55 8B B4 8B 8A * *
PutStr push mov
PROC
pMsg:PTR BYTE
bp bp, sp mov ah, mov di, mov dl, mov ax, listtst.asm(77): error A2031: must be index or base EC 02 7E 04 15 .WHILE (dl) @C0001 int inc mov .ENDW
02H pMsg [di] [dx][di] register
004C 0059 0059 005B 005C
7m 37 2 4
EB 10 CD 21 47 8A 15
* jmp *@C0002:
21h di dl, [di]
005E 005E 2 0060 7m,3
0A D2 75 F7
*@C0001: * or dl, dl * jne @C0002 ret

0062 0063 0064 4 10m 5D C3 * * pop bp ret 00000h PutStr ENDP END
Reading Tables in a Listing File

The tables at the end of a listing file list the macros, structures, unions, records, segments, groups, and symbols that appear in a source file. These tables are not printed in the previous sample listing, but are summarized as follows.
Macro Table
Lists all macros in the main file or the include files. Differentiates between macro functions and macro procedures.
Structures and Unions Table

Provides the size in bytes of the structure or union and the offset of each field. The type of each field is also given.
Record Table
Width gives the number of bits of the entire record. Shift provides the offset in bits from the low-order bit of the record to the low-order bit of the field. Width for fields gives the number of bits in the field. Mask gives the maximum value of the field, expressed in hexadecimal notation. Initial gives the initial value supplied for the field.
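As a rough illustration of how these columns relate to source code, the sketch below reuses the COLOR record from the sample listing. The program skeleton (small model, the register choices) is assumed for illustration only; the values in the comments follow from the field widths in the RECORD declaration.

```masm
        .MODEL  small
        .STACK  256

; Fields are listed from high-order to low-order bits:
;   b = bit 7, r = bits 6-4, i = bit 3, f = bits 2-0
COLOR   RECORD  b:1, r:3=1, i:1=1, f:3=7    ; WIDTH COLOR = 8

        .DATA
text    COLOR   <>              ; all defaults: 0001 1111b = 1Fh

        .CODE
        .STARTUP
        mov     al, text        ; al = 1Fh
        and     al, MASK r      ; MASK r = 70h, so al = 10h
        mov     cl, r           ; a field name alone yields its shift count (4)
        shr     al, cl          ; al = 1, the initial value given for r
        .EXIT
        END
```

The MASK and WIDTH operators and the bare field name give the same numbers that the record table reports in its Mask, Width, and Shift columns.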
Type Table
The Size column in this table gives the size of the TYPEDEF type in bytes, and the Attr column gives the base type for the TYPEDEF definition.
Segment and Group Table

Size specifies whether the segment is 16-bit or 32-bit. Length gives the size of the segment in bytes. Align gives the segment alignment (WORD, PARA, and so on). Combine gives the combine type (PUBLIC, STACK, and so on). Class gives the segment's class (CODE, DATA, STACK, or CONST).
Procedures, Parameters, and Locals

Gives the types and offsets from BP of all parameters and locals defined in each procedure, as well as the size and memory location of each procedure.
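A minimal sketch of a procedure that would produce entries in this table is shown here. The procedure name AddOne and the names value and temp are hypothetical. The offsets in the comments are what the standard prologue would be expected to give for a near procedure in small model, consistent with the sample listing, where pMsg is read from [bp+4].

```masm
        .MODEL  small, c
        .STACK  256
        .CODE

AddOne  PROC    value:WORD      ; parameter: expected at [bp+4]
        LOCAL   temp:WORD       ; local: expected at [bp-2]

        mov     ax, value       ; assembler substitutes the BP-relative address
        mov     temp, ax
        inc     temp
        mov     ax, temp        ; return value + 1 in AX
        ret                     ; assembler generates the epilogue
AddOne  ENDP

        END
```

In a near procedure, [bp] holds the caller's BP and [bp+2] the return address, so the first parameter starts at [bp+4]. In a far procedure the return address occupies 4 bytes and the first parameter moves to [bp+6].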
Symbol Table
All symbols (except names for macros, structures, unions, records, and segments) are listed in a symbol table at the end of the listing. The Name column lists the names in alphabetical order. The Type column lists each symbol's type. The length of a multiple-element variable, such as an array or string, is the length of a single element, not the length of the entire variable.
If the symbol represents an absolute value defined with an EQU or equal-sign (=) directive, the Value column shows the symbol's value. The value may be another symbol, a string, or a constant numeric value (in hexadecimal), depending on the type. If the symbol represents a variable or label, the Value column shows the symbol's hexadecimal offset from the beginning of the segment in which it is defined.
The Attr column shows the attributes of the symbol. The attributes include the name of the segment (if any) in which the symbol is defined, the scope of the symbol, and the code length. A symbol's scope is given only if the symbol is defined using the EXTERN and PUBLIC directives. The scope can be external, global, or communal. The Attr column is blank if the symbol has no attribute.
Appendix D
MASM Reserved Words
This appendix lists the reserved words recognized by MASM. They are divided primarily by their use in the language. The primary categories are:
- Operands and symbols
- Registers
- Operators and directives
- Processor instructions
- Coprocessor instructions
Reserved words in MASM 6.1 are reserved under all CPU modes. Words enabled in .8086 mode, the default, can be used in all higher CPU modes. Using words from subcategories such as Special Operands for the 80386 (later in this appendix) requires .386 mode or higher.
You can disable the recognition of any reserved word specified in this appendix by setting the NOKEYWORD option for the OPTION directive. Once disabled, the word can be used in any way as a user-defined symbol (provided the word is a valid identifier). For instance, to remove the STR instruction, the MASK operator, and the NAME directive from the set of words MASM recognizes as reserved, add this statement to your program:
OPTION NOKEYWORD:<STR MASK NAME>
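Once the statement is in effect, the three words can be defined like any other symbol. The following is only a sketch under assumed conditions: the small model, the data values, and the instructions that use the symbols are all hypothetical and chosen only to show each word in an ordinary role.

```masm
        OPTION  NOKEYWORD:<STR MASK NAME>

        .MODEL  small
        .STACK  256

        .DATA
STR     BYTE    "text", 0       ; STR is now an ordinary data label
MASK    EQU     0Fh             ; MASK is now an ordinary numeric constant
NAME    WORD    1234h           ; NAME is now an ordinary variable

        .CODE
        .STARTUP
        mov     bx, OFFSET STR  ; address of the string
        mov     ax, NAME        ; ax = 1234h
        and     ax, MASK        ; ax = 0004h
        .EXIT
        END
```

The trade-off is that, after this statement, the STR instruction, the MASK operator, and the NAME directive are no longer available in their original roles in this module.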
Words in this appendix identified with an asterisk (*) are new since MASM 5.1.
Operands and Symbols

The words on the two lists in this section are the operands to certain directives. They have special meaning to the assembler. The words on the first list are not reserved words. They can be used in every way as normal identifiers, without affecting their use as operands to directives. The assembler interprets their use from context.
Even though the words on the first list are not reserved, they should not be defined to be text macros or text macro functions. If they are, they will not be recognized in their special contexts. The assembler does not give a warning if such a redefinition occurs.
ABS ALL ASSUMES AT CASEMAP* COMMON COMPACT CPU* DOTNAME* EMULATOR* EPILOGUE* ERROR* EXPORT* EXPR16* EXPR32* FARSTACK* FLAT FORCEFRAME HUGE LANGUAGE* LARGE LISTING* LJMP* LOADDS* M510* MEDIUM MEMORY NEARSTACK* NODOTNAME* NOEMULATOR* NOKEYWORD* NOLJMP* NOM510* NONE NONUNIQUE* NOOLDMACROS* NOOLDSTRUCTS* NOREADONLY* NOSCOPED* NOSIGNEXTEND* NOTHING NOTPUBLIC* OLDMACROS* OLDSTRUCTS* OS_DOS* PARA PRIVATE* PROLOGUE* RADIX* READONLY* REQ* SCOPED* SETIF2* SMALL STACK TINY USE16 USE32 USES
These operands are reserved words. Reserved words are not case sensitive.
$ ? @B @F ADDR* BASIC BYTE C CARRY?* DWORD FAR FAR16* FORTRAN FWORD NEAR NEAR16* OVERFLOW?* PARITY?* PASCAL QWORD REAL4* REAL8* REAL10* SBYTE* SDWORD* SIGN?* STDCALL* SWORD* SYSCALL* TBYTE VARARG* WORD ZERO?*
Special Operands for the 80386/486

FLAT* NEAR32* FAR32*
Predefined Symbols
Unlike most MASM reserved words, predefined symbols are case sensitive.
@CatStr* @code @CodeSize @Cpu @CurSeg @data @DataSize @Date* @Environ* @fardata @fardata? @FileCur* @FileName @InStr* @Interface* @Line* @Model* @SizeStr* @stack* @SubStr* @Time* @Version @WordSize
Registers
AH AL AX BH BL BP BX CH CL CR0 CR2 CR3 CS CX DH DI DL DR0 DR1 DR2 DR3 DR6 DR7 DS DX EAX EBP EBX ECX EDI EDX ES ESI ESP FS GS SI SP SS ST TR3* TR4* TR5* TR6 TR7
Operators and Directives

.186 .286 .286C .286P .287 .386 .386C .386P .387 .486* .486P* .8086 .8087 .ALPHA .BREAK* .CODE .CONST .CONTINUE* .CREF .DATA .DATA? .DOSSEG* .ELSE* .ELSEIF* .ENDIF* .ENDW* .ERR .ERR1 .ERR2 .ERRB .ERRDEF .ERRDIF .ERRDIFI .ERRE .ERRIDN .ERRIDNI .ERRNB .ERRNDEF .ERRNZ .EXIT* .FARDATA .FARDATA? .IF* .LALL .LFCOND .LIST .LISTALL* .LISTIF* .LISTMACRO* .LISTMACROALL* .MODEL .NO87* .NOCREF* .NOLIST* .NOLISTIF* .NOLISTMACRO* .RADIX .REPEAT* .SALL .SEQ .SFCOND .STACK .STARTUP* .TFCOND .TYPE .UNTIL* .UNTILCXZ* .WHILE* .XALL .XCREF .XLIST ALIAS* ALIGN ASSUME CATSTR COMM COMMENT DB DD DF DOSSEG DQ DT DUP DW ECHO* ELSE ELSEIF ELSEIF1 ELSEIF2 ELSEIFB ELSEIFDEF ELSEIFDIF ELSEIFDIFI ELSEIFE ELSEIFIDN
ELSEIFIDNI ELSEIFNB ELSEIFNDEF END ENDIF ENDM ENDP ENDS EQ EQU EVEN EXITM EXTERN* EXTERNDEF* EXTRN FOR* FORC* GE GOTO* GROUP GT HIGH HIGHWORD* IF IF1 IF2 IFB IFDEF IFDIF IFDIFI IFE
IFIDN IFIDNI IFNB IFNDEF INCLUDE INCLUDELIB INSTR INVOKE* IRP IRPC LABEL LE LENGTH LENGTHOF* LOCAL LOW LOWWORD* LROFFSET* LT MACRO MASK MOD .MSFLOAT NAME NE OFFSET OPATTR* OPTION* ORG %OUT PAGE
POPCONTEXT* PROC PROTO* PTR PUBLIC PURGE PUSHCONTEXT* RECORD REPEAT* REPT SEG SEGMENT SHORT SIZE SIZEOF* SIZESTR STRUC STRUCT* SUBSTR SUBTITLE* SUBTTL TEXTEQU* THIS TITLE TYPE TYPEDEF* UNION* WHILE* WIDTH
Processor Instructions
Processor instructions are not case sensitive.
8086/8088 Processor Instructions

AAA AAD AAM AAS ADC ADD AND CALL CBW CLC CLD CLI CMC CMP CMPS CMPSB CMPSW CWD DAA DAS DEC DIV ESC HLT IDIV IMUL IN INC INT INTO IRET JA JAE JB JBE JC JCXZ JE JG JGE JL JLE JMP JNA JNAE JNB JNBE JNC JNE JNG JNGE JNL JNLE JNO JNP JNS JNZ JO JP JPE JPO JS JZ LAHF LDS LEA LES LODS LODSB LODSW LOOP LOOPE LOOPEW* LOOPNE LOOPNEW* LOOPNZ LOOPNZW* LOOPW* LOOPZ LOOPZW* MOV MOVS MOVSB MOVSW MUL NEG NOP NOT OR OUT POP POPF PUSH PUSHF RCL RCR
RET RETF RETN ROL ROR SAHF SAL SAR SBB
SCAS SCASB SCASW SHL SHR STC STD STI STOS
STOSB STOSW SUB TEST WAIT XCHG XLAT XLATB XOR
80186 Processor Instructions

BOUND ENTER INS INSB INSW LEAVE OUTS OUTSB OUTSW POPA PUSHA PUSHW*
80286 Processor Instructions

ARPL LAR LSL SGDT SIDT SLDT SMSW STR VERR VERW
80286 and 80386 Privileged-Mode Instructions

CLTS LGDT LIDT LLDT LMSW LTR
80386 Processor Instructions

BSF BSR BT BTC BTR BTS CDQ CMPSD CWDE INSD IRETD IRETDF* IRETF* JECXZ LFS LGS LODSD LOOPD*
LOOPED* LOOPNED* LOOPNZD* LOOPZD* LSS MOVSD MOVSX MOVZX OUTSD POPAD POPFD PUSHAD PUSHD* PUSHFD SCASD SETA
SETAE SETB SETBE SETC SETE SETG SETGE SETL SETLE SETNA SETNAE SETNB SETNBE SETNC SETNE SETNG
SETNGE SETNL SETNLE SETNO SETNP SETNS SETNZ SETO SETP SETPE SETPO SETS SETZ SHLD SHRD STOSD
80486 Processor Instructions

BSWAP* CMPXCHG* INVD* INVLPG* WBINVD* XADD*
Instruction Prefixes
LOCK REP REPE REPNE REPNZ REPZ
Coprocessor Instructions
Coprocessor instructions are not case sensitive.
8087 Coprocessor Instructions

F2XM1 FABS FADD FADDP FBLD FBSTP FCHS FCLEX FCOM
FCOMP FCOMPP FDECSTP FDISI FDIV FDIVP FDIVR FDIVRP FENI FFREE FIADD FICOM FICOMP FIDIV FIDIVR FILD FIMUL FINCSTP FINIT FIST FISTP FISUB FISUBR FLD FLD1
FLDCW FLDENV FLDENVW* FLDL2E FLDL2T FLDLG2 FLDLN2 FLDPI FLDZ FMUL FMULP FNCLEX FNDISI FNENI FNINIT FNOP FNSAVE FNSAVEW* FNSTCW FNSTENV FNSTENVW* FNSTSW FPATAN FPREM FPTAN
FRNDINT FRSTOR FRSTORW* FSAVE FSAVEW* FSCALE FSQRT FST FSTCW FSTENV FSTENVW* FSTP FSTSW FSUB FSUBP FSUBR FSUBRP FTST FWAIT FXAM FXCH FXTRACT FYL2X FYL2XP1
80287 Privileged-Mode Instruction

FSETPM
80387 Instructions
FCOS FLDENVD* FNSAVED* FNSTENVD* FPREM1 FRSTORD* FSAVED* FSIN FSINCOS FSTENVD* FUCOM FUCOMP FUCOMPP
Appendix E
Default Segment Names
If you use simplified segment directives by themselves, you do not need to know the names assigned for each segment. However, it is possible to mix full segment definitions with simplified segment directives, in which case you need to know the segment names. Table E.1 shows the default segment names created by each directive.
If you use .MODEL, a _TEXT segment is always defined, even if all .CODE directives specify a name. The default segment name used as part of far-code segment names is the filename of the module. The default name associated with the .CODE directive can be overridden, as can the default names for .FARDATA and .FARDATA?.
The segment and group table at the end of listings always shows the actual segment names. However, the GROUP and ASSUME statements generated by the .MODEL directive are not shown in listing files. For a program that uses all possible segments, a GROUP statement equivalent to the following would be generated:
DGROUP GROUP _DATA, CONST, _BSS, STACK
For the tiny model, these ASSUME statements would be generated:

ASSUME cs:DGROUP, ds:DGROUP, ss:DGROUP
For small and compact models with NEARSTACK, these ASSUME statements would be generated:
ASSUME cs:_TEXT, ds:DGROUP, ss:DGROUP
For medium, large, and huge models with NEARSTACK, these ASSUME statements would be generated:
ASSUME cs:name_TEXT, ds:DGROUP, ss:DGROUP
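The case where the default names matter can be sketched as follows. This is an illustrative module, not taken from the original text: the variable names total and count are hypothetical, and small model is assumed. The full segment definition uses the name, alignment, combine type, and class that small model assigns to .DATA, so both definitions should contribute to the same _DATA segment in DGROUP.

```masm
        .MODEL  small

; Full segment definition using the default name and attributes
; that small model gives to the .DATA segment.
_DATA   SEGMENT WORD PUBLIC 'DATA'
total   WORD    0
_DATA   ENDS

        .STACK  256

        .DATA                   ; simplified directive: same _DATA segment
count   WORD    5

        .CODE
        .STARTUP                ; loads DS with DGROUP
        mov     ax, count
        add     total, ax       ; both variables are reachable through DS
        .EXIT
        END
```

If the full definition used a different name or class, the linker would treat it as a separate segment outside DGROUP, and the ASSUME statements generated by .MODEL would no longer cover it.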
Table E.1 Default Segments and Types for Standard Memory Models

Model          Directive   Name       Align  Combine  Class       Group
Tiny           .CODE       _TEXT      WORD   PUBLIC   'CODE'      DGROUP
               .FARDATA    FAR_DATA   PARA   PRIVATE  'FAR_DATA'
               .FARDATA?   FAR_BSS    PARA   PRIVATE  'FAR_BSS'
               .DATA       _DATA      WORD   PUBLIC   'DATA'      DGROUP
               .CONST      CONST      WORD   PUBLIC   'CONST'     DGROUP
               .DATA?      _BSS       WORD   PUBLIC   'BSS'       DGROUP

Small          .CODE       _TEXT      WORD   PUBLIC   'CODE'
               .FARDATA    FAR_DATA   PARA   PRIVATE  'FAR_DATA'
               .FARDATA?   FAR_BSS    PARA   PRIVATE  'FAR_BSS'
               .DATA       _DATA      WORD   PUBLIC   'DATA'      DGROUP
               .CONST      CONST      WORD   PUBLIC   'CONST'     DGROUP
               .DATA?      _BSS       WORD   PUBLIC   'BSS'       DGROUP
               .STACK      STACK      PARA   STACK    'STACK'     DGROUP*

Medium         .CODE       name_TEXT  WORD   PUBLIC   'CODE'
               .FARDATA    FAR_DATA   PARA   PRIVATE  'FAR_DATA'
               .FARDATA?   FAR_BSS    PARA   PRIVATE  'FAR_BSS'
               .DATA       _DATA      WORD   PUBLIC   'DATA'      DGROUP
               .CONST      CONST      WORD   PUBLIC   'CONST'     DGROUP
               .DATA?      _BSS       WORD   PUBLIC   'BSS'       DGROUP
               .STACK      STACK      PARA   STACK    'STACK'     DGROUP*

Compact        .CODE       _TEXT      WORD   PUBLIC   'CODE'
               .FARDATA    FAR_DATA   PARA   PRIVATE  'FAR_DATA'
               .FARDATA?   FAR_BSS    PARA   PRIVATE  'FAR_BSS'
               .DATA       _DATA      WORD   PUBLIC   'DATA'      DGROUP
               .CONST      CONST      WORD   PUBLIC   'CONST'     DGROUP
               .DATA?      _BSS       WORD   PUBLIC   'BSS'       DGROUP
               .STACK      STACK      PARA   STACK    'STACK'     DGROUP*

Large or huge  .CODE       name_TEXT  WORD   PUBLIC   'CODE'
               .FARDATA    FAR_DATA   PARA   PRIVATE  'FAR_DATA'
               .FARDATA?   FAR_BSS    PARA   PRIVATE  'FAR_BSS'
               .DATA       _DATA      WORD   PUBLIC   'DATA'      DGROUP
               .CONST      CONST      WORD   PUBLIC   'CONST'     DGROUP
               .DATA?      _BSS       WORD   PUBLIC   'BSS'       DGROUP
               .STACK      STACK      PARA   STACK    'STACK'     DGROUP*

Flat           .CODE       _TEXT      DWORD  PUBLIC   'CODE'
               .FARDATA    _DATA      DWORD  PUBLIC   'DATA'
               .FARDATA?   _BSS       DWORD  PUBLIC   'BSS'
               .DATA       _DATA      DWORD  PUBLIC   'DATA'
               .CONST      CONST      DWORD  PUBLIC   'CONST'
               .DATA?      _BSS       DWORD  PUBLIC   'BSS'
               .STACK      STACK      DWORD  PUBLIC   'STACK'

* unless the stack type is FARSTACK
Glossary
8087, 80287, or 80387 coprocessor Intel chips that perform high-speed floating-point and binary coded decimal number processing. Also called math coprocessors. Floating-point instructions are supported directly by the 80486 processor.
A
address The memory location of a data item or procedure. The expression can represent just the offset (in which case the default segment is assumed), or it can be in segment:offset format.
address constant In an assembly-language instruction, an immediate operand derived by applying the SEG or OFFSET operator to an identifier.
address range A range of memory bounded by two addresses.
addressing modes The various ways a memory address or device I/O address can be generated. See far address, near address.
aggregate types Data types containing more than one element, such as arrays, structures, and unions.
animate A debugging feature in which each line in a running program is highlighted as it executes. The Animate command from the CodeView debugger Run menu turns on animation.
API (application programming interface) A set of system-level routines that can be used in an application program for tasks such as basic input/output and file management. In a graphics-oriented operating environment like Microsoft Windows, high-level support for video graphics output is part of the Windows graphical API.
arg In PWB, a function modifier that introduces an argument or an editing function. The argument may be of any type and is passed to the next function as input. For example, the PWB command Arg textarg Copy passes the text argument textarg to the function Copy.
argument A value passed to a procedure or function. See parameter.
array An ordered set of continuous elements of the same type.
ASCII (American Standard Code for Information Interchange) A widely used coding scheme where 1-byte numeric values represent letters, numbers, symbols, and special characters. There are 256 possible codes. The first 128 codes are standardized; the remaining 128 are special characters defined by the computer manufacturer.
assembler A program that converts a text file containing mnemonically coded microprocessor instructions into the corresponding binary machine code. MASM is an assembler. See compiler.
assembly language A programming language in which each line of source code corresponds to a specific microprocessor instruction. Assembly language gives the programmer full access to the computer's hardware and produces the most compact, fastest executing code. See high-level language.
assembly mode The mode in which the CodeView debugger displays the assembly-language equivalent of the high-level code being executed. CodeView obtains the assembly-language code by disassembling the executable file. See source mode.
B
base address The starting address of a stack frame. Base addresses are usually stored in the BP register.
base name The portion of the filename that precedes the extension. For example, SAMPLE is the base name of the file SAMPLE.ASM.
BCD (binary coded decimal) A way of representing decimal digits where 4 bits of 1 byte are a decimal digit, coded as the equivalent binary number.
binary Referring to the base-2 counting system, whose digits are 0 and 1.
binary expression A Boolean expression consisting of two operands joined by a binary operator and resolving to a binary number.
binary file A file that contains numbers in binary form (as opposed to ASCII characters representing the same numbers). For example, a program file is a binary file.
binary operator A Boolean operator that takes two arguments. The AND and OR operators in assembly language are examples of binary operators.
BIOS (Basic Input/Output System) The software in a computer's ROM which forms a hardware-independent interface between the CPU and its peripherals (for example, keyboard, disk drives, video display, I/O ports).
bit Short for binary digit. The basic unit of binary counting. Logically equivalent to decimal digits, except that bits can have a value of 0 or 1, whereas decimal digits can range from 0 through 9.
breakpoint A user-defined condition that pauses program execution while debugging. CodeView can set breakpoints at a specific line of code, for a specific value of a variable, or for a combination of these two conditions.
buffer A reserved section of memory that holds data temporarily, most often during input/output operations.
byte The smallest unit of measure for computer memory and data storage. One byte consists of 8 bits and can store one 8-bit character (a letter, number, punctuation mark, or other symbol). It can represent unsigned values from 0 to 255 or signed values between -128 and +127.
C
C calling convention The convention that follows the C standard for calling a procedure, that is, pushing arguments onto the stack from right to left (in reverse order from the way they appear in the argument list). The C calling convention permits a variable number of arguments to be passed.
chaining (to an interrupt) Installing an interrupt handler that shares control of an interrupt with other handlers. Control passes from one handler to the next until a handler breaks the chain by terminating through an IRET instruction. See interrupt handler, hooking (an interrupt).
character string See string.
clipboard In PWB, a section of memory that holds text deleted with the Copy, Ldelete, or Sdelete functions. Any text attached to the clipboard deletes text already there. The Paste function inserts text from the clipboard at the current cursor position.
.COM The filename extension for executable files that have a single segment containing both code and data. Tiny model produces .COM files.
combine type The segment-declaration specifier (AT, COMMON, MEMORY, PUBLIC, or STACK) which tells the linker to combine all segments of the same type. Segments without a combine type are private and are placed in separate physical segments.
compact A memory model with multiple data segments but only one code segment.
compiler A program that translates source code into machine language. Usually applied only to high-level languages such as Basic, FORTRAN, or C. See assembler.
constant A value that does not change during program execution. A variable, on the other hand, is a value that can, and usually does, change. See symbolic constant.
constant expression Any expression that evaluates to a constant. It may include integer constants, character constants, floating-point constants, or other constant expressions.
D
debugger A utility program that allows the programmer to execute a program one line at a time and view the contents of registers and memory in order to help locate the source of bugs or other problems. Examples are CodeView and Symdeb.
declaration A construct that associates the name and the attributes of a variable, function, or type. See variable declaration.
default A setting or value that is assumed unless specified otherwise.
definition A construct that initializes and allocates storage for a variable, or that specifies either code labels or the name, formal parameters, body, and return type of a procedure. See type definition.
description file A text file used as input for the NMAKE utility.
device driver A program that transforms I/O requests into the operations necessary to make a specific piece of hardware fulfill that request.
Dialog Command window The window at the bottom of the CodeView screen where dialog commands can be entered, and previously entered dialog commands can be reviewed.
direct memory operand In an assembly-language instruction, a memory operand that refers to the contents of an explicitly specified memory location.
directive An instruction that controls the assembler's state.
displacement In an assembly-language instruction, a constant value added to an effective address. This value often specifies the starting address of a variable, such as an array or multidimensional table.
DLL See dynamic-link library.
double-click To rapidly press and release a mouse button twice while pointing the mouse cursor at an object on the screen.
double precision A real (floating-point) value that occupies 8 bytes of memory (MASM type REAL8). Double-precision values are accurate to 15 or 16 digits.
doubleword A 4-byte word (MASM type DWORD).
drag To move the mouse while pointing at an object and holding down one of the mouse buttons.
dump To display or print the contents of memory in a specified memory range.
dynamic linking The resolution of external references at load time or run time (rather than link time). Dynamic linking allows the called subroutines to be packaged, distributed, and maintained independently of their callers. Windows extends the dynamic-link mechanism to serve as the primary method by which all system and nonsystem services are obtained. See linking.
dynamic-link library (DLL) A library file that contains the executable code for a group of dynamically linked routines.
dynamic-link routine A routine in a dynamic-link library that can be linked at load time or run time.
E
element A single member variable of an array of like variables.
environment block The section of memory containing the MS-DOS environment variables.
errorlevel code See exit code.
.EXE The filename extension for a program that can be loaded and executed by the computer. The small, compact, medium, large, huge, and flat models generate .EXE files. See .COM, tiny.
exit code A code returned by a program to the operating system. This usually indicates whether the program ran successfully.
expanded memory Increased memory available after adding an EMS (Expanded Memory Specification) board to an 8086 or 80286 machine. Expanded memory can be simulated in software. The EMS board can increase memory from 1 megabyte to 8 megabytes by swapping segments of high-end memory into lower memory. Applications must be written to the EMS standard in order to make use of expanded memory. See extended memory.
expression Any valid combination of mathematical or logical variables, constants, strings, and operators that yields a single value.
extended memory Physical memory above 1 megabyte that can be addressed by 80286-80486 machines in protected mode. Adding a memory card adds extended memory. On 80386-based machines, extended memory can be made to simulate expanded memory by using a memory-management program.
extension The part of a filename (of up to three characters) that follows the period (.). An extension is not required but is usually added to differentiate similar files. For example, the source-code file MYPROG.ASM is assembled into the object file MYPROG.OBJ, which is linked to produce the executable file MYPROG.EXE.
external variable A variable declared in one module and referenced in another module.
F
far address A memory location specified with a segment value plus an offset from the start of that segment. Far addresses require 4 bytes: two for the segment and two for the offset. See near address.
field One of the components of a structure, union, or record variable.
fixup The linking process that supplies addresses for procedure calls and variable references.
flags register A register containing information about the status of the CPU and the results of the last arithmetic operation performed by the CPU.
flat A nonsegmented linear address space. Selectors in flat model can address the entire 4 gigabytes of addressable memory space. See segment, selector.
formal parameters The variables that receive values passed to a function when the function is called.
forward declaration A function declaration that establishes the attributes of a symbol so that it can be referenced before it is defined, or called from a different source file.
frame The segment, group, or segment register that specifies the segment portion of an address.
G
General-Protection (GP) fault An error that occurs in protected mode when a program accesses invalid memory locations or accesses valid locations in an invalid way (such as writing into ROM areas).
gigabyte 1,024 megabytes, or 1,073,741,824 bytes.
global See visibility.
global constant A constant available throughout a module. Symbolic constants defined in the module-level code are global constants.
global data segment A data segment that is shared among all instances of a dynamic-link routine; in other words, a single segment that is accessible to all processes that call a particular dynamic-link routine.
global variable A variable that is available (visible) across multiple modules.
granularity The degree to which library procedures can be linked as individual blocks of code. In Microsoft libraries, granularity is at the object-file level. If a single object file containing three procedures is added to a library, all three procedures will be linked with the main program even if only one of them is actually called.
group A collection of individually defined segments that have the same segment base address.
H
handle An arbitrary value that an operating system supplies to a program (or vice versa) so that the program can access system resources, files, peripherals, and so forth, in a controlled fashion.
handler See interrupt handler.
hexadecimal The base-16 numbering system whose digits are 0 through F (the letters A through F represent the decimal numbers 10 through 15). This is often used in computer programming because it is easily converted to and from the binary (base-2) numbering system the computer itself uses.
high-level language A programming language that expresses operations as mathematical or logical relationships, which the language's compiler then converts into machine code. This contrasts with assembly language, in which the program is written directly as a sequence of explicit microprocessor instructions. Basic, C, COBOL, and FORTRAN are examples of high-level languages. See assembly language, compiler.
hooking (an interrupt) Replacing an address in the interrupt vector table with the address of another interrupt handler. See interrupt handler, interrupt vector table, unhooking (an interrupt).
huge A memory model (similar to large model) with more than one code segment and more than one data segment. However, individual data items can be larger than 64K, spanning more than one segment. See large.
I
identifier A name that identifies a register or memory location.
IEEE format A standard created by the Institute of Electrical and Electronics Engineers for representing floating-point numbers, performing math with them, and handling underflow/overflow conditions. The 8087 family of coprocessors and the emulator package implement this format.
immediate expression An expression that evaluates to a number that can be either a component of an address or the entire address.
immediate operand In an assembly-language instruction, a constant operand that is specified at assembly time and stored in the program file as part of the instruction opcode.
import library A pseudo library that contains addresses rather than executable code. The linker reads the addresses from an import library to resolve references to external dynamic-link library routines.
include file A text file with the .INC extension whose contents are inserted into the source-code file and immediately assembled.
indirect memory operand In an assembly-language instruction, a memory operand whose value is treated as an address that points to the location of the desired data. See pointer.
instruction The unit of binary information that a CPU decodes and executes. In assembly language, instruction refers to the mnemonic (such as LDS or SHL) that the assembler converts into machine code.
instruction prefix See prefix.
interrupt A signal to the processor to halt its current operation and immediately transfer control to an interrupt handler. Interrupts are triggered either by hardware, as when the keyboard detects a keypress, or by software, as when a program executes the INT instruction. See interrupt handler.
interrupt handler A routine that receives processor control when a specific interrupt occurs.
interrupt service routine See interrupt handler.
interrupt vector An address that points to an interrupt handler.
interrupt vector table A table maintained by the operating system. It contains addresses (vectors) of current interrupt handlers. When an interrupt occurs, the CPU branches to the address in the table that corresponds to the interrupt's number. See interrupt handler.
K
keyword A word with a special, predefined meaning for the assembler. Keywords cannot be used as identifiers.
kilobyte (K) 1,024 bytes.
L
label A symbol (identifier) representing the address of a code label or data objects. language type The specifier that establishes the naming and calling conventions for a procedure. These are BASIC, C, FORTRAN, PASCAL, STDCALL, and SYSCALL. large A memory model with more than one code segment and more than one data segment, but with no individual data item larger than 64K (a single segment). See huge. library A file that contains modules of compiled code. MS-DOS programs use normal run-time libraries, from which the linker extracts modules and combines them with other object modules to create executable program files. Windows-based programs can use dynamic-link libraries (see), which the operating system loads and links to calling programs. See also import library.
linked list A data structure in which each entry includes a pointer to the location of the adjoining entries. linking In normal static linking, the process in which the linker resolves all external references by searching run-time and user libraries, and then computes absolute offset addresses for these references. Static linking results in a single executable file. In dynamic linking (see), the operating system, rather than the linker, provides the addresses after loading the modules into separate parts of memory. local constant A constant whose scope is limited to a procedure or a module. local variable A variable whose scope is confined to a particular unit of code, such as module-level code, or a procedure. See module-level code. logical device A symbolic name for a device that can be mapped to a physical (actual) device. logical line A complete program statement in source code, including the initial line of code and any extension lines. logical segment A memory area in which a program stores code, data, or stack information. See physical segment. low-level input and output routines Run-time library routines that perform unbuffered, unformatted input/output operations. LSB (least-significant bit) The bit lowest in memory in a binary number.
M
machine code The binary numbers that a microprocessor interprets as program instructions. See instruction. macro A block of text or instructions that has been assigned an identifier. When the assembler sees this identifier in the source code, it substitutes the related text or instructions and assembles them. main module The module containing the point where program execution begins (the program's entry point). See module. math coprocessor See 8087, 80287, or 80387 coprocessor. medium A memory model with multiple code segments but only one data segment. megabyte 1,024 kilobytes or 1,048,576 bytes. member One of the elements of a structure or union; also called a field. memory address A number through which a program can reference a location in memory. memory map A representation of where in memory the computer expects to find certain types of information. memory model A convention for specifying the number and types of code and data segments in a module. See tiny, small, medium, compact, large, huge, and flat. memory operand An operand that specifies a memory location. meta A prefix that modifies the subsequent PWB function. mnemonic A word, abbreviation, or acronym that replaces something too complex to remember or type easily. For example, ADC is the mnemonic for the 8086's add-with-carry instruction. The assembler converts it into machine (binary) code, so it is not necessary to remember or calculate the binary form.
module A discrete group of statements. Every program has at least one module (the main module). In most cases, a module is the same as a source file. module-definition file A text file containing information that the linker uses to create a Windows-based program. module-level code Program statements within any module that are outside procedure definitions. MSB (most-significant bit) The bit farthest to the left in a binary number. It represents 2^(n-1), where n is the number of bits in the number. multitasking operating system An operating system in which two or more programs, processes, or threads can execute simultaneously.
N
naming convention The way the compiler or assembler alters the name of a routine before placing it into an object file. NAN Acronym for not a number. Math coprocessors generate NANs when the result of an operation cannot be represented in IEEE format. For example, if two numbers being multiplied have a product larger than the maximum value permitted, the coprocessor returns a NAN instead of the product. near address A memory location specified by the offset from the start of the value in a segment register. A near address requires only 2 bytes. See far address. nonreentrant See reentrant procedure. null character The ASCII character encoded as the value 0. null pointer A pointer to nothing, expressed as the value 0.
O
.OBJ Default filename extension for an object file. object file A file (normally with the extension .OBJ) produced by assembling source code. It contains relocatable machine code. The linker combines object files with run-time and library code to create an executable file. offset The number of bytes from the beginning of a segment to a particular byte within that segment. opcode The binary number that represents a specific microprocessor instruction. operand A constant or variable value that is manipulated in an expression or instruction. operator One or more symbols that specify how the operand or operands of an expression are manipulated. option A variable that modifies the way a program performs. Options can appear on the command line, or they can be part of an initialization file (such as TOOLS.INI). An option is sometimes called a switch. output screen The CodeView screen that displays program output. Choosing the Output command from the View menu or pressing F4 switches to this screen. overflow An error that occurs when the value assigned to a numeric variable is larger than the allowable limit for that variable's type. overlay A program component loaded into memory from disk only when needed. This technique reduces the amount of free RAM needed to run the program.
P
parameter The name given in a procedure definition to a variable that is passed to the procedure. See argument.
passing by reference Transferring the address of an argument to a procedure. This allows the procedure to modify the argument's value. passing by value Transferring the value (rather than the address) of an argument to a procedure. This prevents the procedure from changing the argument's original value. physical segment The true memory address of a segment, referenced through a segment register. pointer A variable containing the address of another variable. See indirect memory operand. precedence The relative position of an operator in the hierarchy that determines the order in which expression elements are evaluated. preemptive Having the power to take precedence over another event. prefix A keyword (LOCK, REP, REPE, REPNE, REPNZ, or REPZ) that modifies the behavior of an instruction. MASM 6.1 ensures the prefix is compatible with the instruction. private Data items and routines local to the module in which they are defined. They cannot be accessed outside that module. See public. privilege level A hardware-supported feature of the 80286-80486 processors that allows the programmer to specify the exclusivity of a program or process. Programs running at low-numbered privilege levels can access data or resources at higher-numbered privilege levels, but the reverse is not true. This feature reduces the possibility that malfunctioning code will corrupt data or crash the operating system. privileged mode The term applied to privilege level 0. This privilege level should be used only by a protected-mode operating system. Special privileged instructions are enabled by .286P, .386P, and .486P. Privileged mode should not be confused with protected mode.
procedure call An expression that invokes a procedure and passes actual arguments (if any) to the procedure. procedure definition A definition that specifies a procedure's name, its formal parameters, the declarations and statements that define what it does, and (optionally) its return type and storage class. procedure prototype A procedure declaration that includes a list of the names and types of formal parameters following the procedure name. process Generally, any executing program or code unit. This term implies that the program or unit is one of a group of processes executing independently. Program Segment Prefix (PSP) A 256-byte data structure at the base of the memory block allocated to a transient program. It contains data and addresses supplied by MS-DOS that a program can read during execution. protected mode The 80286-80486 operating mode that permits multiple processes to run and not interfere with each other. This feature should not be confused with privileged mode. public Data items and procedures that can be accessed outside the module in which they are defined. See private.
Q
qualifiedtype A user-defined type consisting of an existing MASM type (intrinsic, structure, union, or record), or a previously defined TYPEDEF type, together with its language or distance attributes.
R
radix The base of a number system. The default radix for MASM and CodeView is 10. RAM (random-access memory) Computer memory that can be both written to and read from. RAM data is volatile; it is usually lost when the computer is turned off. Programs are loaded into and executed from RAM. See ROM. real mode The normal operating mode of the 8086 family of processors. Addresses correspond to physical (not mapped) memory locations, and there is no mechanism to keep one application from accessing or modifying the code or data of another. See protected mode. record A MASM variable that consists of a sequence of bit values. reentrant procedure A procedure that can be safely interrupted during execution and restarted from its beginning in response to a call from a preemptive process. After servicing the preemptive call, the procedure continues execution at the point at which it was interrupted. register operand In an assembly-language instruction, an operand that is stored in the register specified by the instruction. register window The optional CodeView window in which the CPU registers and the flag register bits are displayed. registers Memory locations in the processor that temporarily store data, addresses, and processor flags. regular expression A text expression that specifies a pattern of text to be matched (as opposed to matching specific characters). relocatable Not having an absolute address. The assembler does not know where the label, data, or code will be located in memory, so it generates a fixup record. The linker provides the address. return value The value returned by a function. ROM (read-only memory) Computer memory that can only be read from and cannot be modified. ROM data is permanent; it is not lost when the machine is turned off. A computer's ROM often contains BIOS routines and parts of the operating system. See RAM. routine A generic term for a procedure or function. run-time dynamic linking The act of establishing a link when a process is running. See dynamic linking. run-time error A math or logic error that can be detected only when the program runs. Examples of run-time errors are dividing by a variable whose value is zero or calling a DLL function that doesn't exist.
S
scope The range of statements over which a variable or constant can be referenced by name. See global constant, global variable, local constant, local variable. screen swapping A screen-exchange method that uses buffers to store the debugging and output screens. When you request the other screen, the two buffers are exchanged. This method is slower than flipping (the other screen-exchange method), but it works with most adapters and most types of programs. scroll bars The bars that appear at the right side and bottom of a window and some list boxes. Dragging the mouse on the scroll bars allows scrolling through the contents of a window or text box. segment A section of memory, limited to 64K with 16-bit segments or 4 gigabytes with 32-bit segments, containing code or data. Also refers to the starting address of that memory area. sequential mode The mode in CodeView in which no windows are available. Input and output scroll down the screen, and the old output scrolls off the top of the screen when the screen is full. You cannot examine previous commands after they scroll off the top. This mode is required with computers that are not IBM compatible. selector A value that indirectly references a segment address. A protected-mode operating system, such as Windows, assigns selector values to programs, which use them as segment addresses. If a program attempts to use an unassigned selector, it triggers a General-Protection fault (see). shared memory A memory segment that can be accessed simultaneously by more than one process. shell escape A method of gaining access to the operating system without leaving CodeView or losing the current debugging context. It is possible to execute MS-DOS commands, then return to the debugger. sign extended The process of widening an integer (for example, going from a byte to a word, or a word to a doubleword) while retaining its correct value and sign. signed integer An integer value that uses the most-significant bit to represent the value's sign. If the bit is one, the number is negative; if zero, the number is positive. See two's complement, unsigned integer, MSB. single precision A real (floating-point) value that occupies 4 bytes of memory. Single-precision values are accurate to six or seven decimal places. single-tasking environment An environment in which only one program runs at a time. MS-DOS is a single-tasking environment. small A memory model with only one code segment and only one data segment. source file A text file containing symbols that define the program. source mode The mode in which CodeView displays the assembly-language source code that represents the machine code currently being executed. stack An area of memory in which data items are consecutively stored and removed on a last-in, first-out basis. A stack can be used to pass parameters to procedures. stack frame The portion of a stack containing a particular procedure's local variables and parameters. stack probe A short routine called on entry to a function to verify that there is enough room in the program stack to allocate local variables required by the function. stack switching Changing the stack pointers to point to another stack area. stack trace A symbolic representation of the functions that are being executed to reach the current instruction address. As a function is executed, the function address and any function arguments are pushed on the stack. Therefore, tracing the stack shows the active functions and their arguments. standard error The device to which a program can send error messages. The display is normally standard error. standard input The device from which a program reads its input. The keyboard is normally standard input. standard output The device to which a program can send its output. The display is normally standard output. statement A combination of labels, data declarations, directives, or instructions that the assembler can convert into machine code. static linking See linking. status bar The line at the bottom of the PWB or CodeView screen. The status bar displays text position, keyboard status, current context of execution, and other program information. STDCALL A calling convention that uses caller stack cleanup if the VARARG keyword is specified. Otherwise the called routine must clean up the stack. string A contiguous sequence of characters identified with a symbolic name. string literal A string of characters and escape sequences delimited by single quotation marks (' ') or double quotation marks (" "). structure A set of variables that may be of different types, grouped under a single name. structure member One of the elements of a structure. Also called a field. switch See option. symbol A name that identifies a memory location (usually for data). symbolic constant A constant represented by a symbol rather than the constant itself. Symbolic constants are defined with EQU statements. They make a program easier to read and modify. SYSCALL A language type for a procedure. Its conventions are identical to C's, except no underscore is prefixed to the name.
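The "sign extended" entry above describes widening an integer while keeping its value and sign. As a rough illustration only, the following Python sketch models that widening on raw bit patterns; the function name sign_extend and its parameters are hypothetical and are not part of MASM or CodeView.

```python
def sign_extend(pattern, from_bits, to_bits):
    """Widen a from_bits-wide two's-complement bit pattern to to_bits bits.

    Hypothetical helper for illustration; it only models the idea of
    sign extension on Python integers.
    """
    low_mask = (1 << from_bits) - 1
    pattern &= low_mask                      # keep only the narrow value
    sign_bit = 1 << (from_bits - 1)          # most-significant bit of the narrow value
    if pattern & sign_bit:
        # Negative value: fill every new upper bit with 1.
        pattern |= ((1 << to_bits) - 1) & ~low_mask
    return pattern


# Byte 0xFB (-5 as a signed byte) widened to a word keeps the value -5.
print(hex(sign_extend(0xFB, 8, 16)))   # 0xfffb
# Byte 0x7F (+127) widened to a word gains only zero bits.
print(hex(sign_extend(0x7F, 8, 16)))   # 0x7f
```

The upper bits are copies of the original sign bit, which is why the widened pattern still reads as the same signed number.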
T
tag The name assigned to a structure, union, or enumeration type. task See process. text Ordinary, readable characters, including the uppercase and lowercase letters of the alphabet, the numerals 0 through 9, and punctuation marks. text box In PWB, a box where you type information needed to carry out a command. A text box appears within a dialog box. The text box may be blank or contain a default entry. tiny Memory model with a single segment for both code and data. This limits the total program size to 64K. Tiny programs have the filename extension .COM. toggle A function key or menu selection that turns a feature off if it is on, or on if it is off. Used as a verb, toggle means to reverse the status of a feature. TOOLS.INI A file containing initialization information for many of the Microsoft utilities, including PWB. two's complement A form of base-2 notation in which negative numbers are formed by inverting the bit values of the equivalent positive number and adding 1 to the result. type A description of a set of values and a valid set of operations on items of that type. For example, a variable of type BYTE can have any of a set of integer values within the range specified for the type on a particular machine. type checking An operation in which the assembler verifies that the operands of an operator are valid or that the actual arguments in a function call are of the same types as the function definition's parameters. type definition The storage format and attributes for a data unit.
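The two's-complement rule in this section (invert the bits of the positive number, then add 1) can be checked with a few lines of Python. This is a minimal sketch under the assumption of a fixed bit width; the name twos_complement_negate is hypothetical and chosen only for this illustration.

```python
def twos_complement_negate(value, bits=8):
    """Return the bit pattern of -value in a bits-wide two's-complement field.

    Hypothetical helper for illustration: invert every bit, add 1,
    and discard any carry out of the field.
    """
    mask = (1 << bits) - 1
    inverted = ~value & mask        # step 1: invert the bit values
    return (inverted + 1) & mask    # step 2: add 1, keep only `bits` bits


print(hex(twos_complement_negate(5)))       # 0xfb, the byte pattern for -5
print(twos_complement_negate(0xFB))         # 5, negating again restores the value
```

Applying the operation twice returns the original pattern, and negating 0 yields 0 because the carry out of the field is discarded.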
U
unary expression An expression consisting of a single operand preceded or followed by a unary operator. unary operator An operator that acts on a single operand, such as NOT.
underflow An error condition that occurs when a calculation produces a result too small for the computer to represent. unhooking (an interrupt) The act of removing your interrupt handler and restoring the original vector. See hooking (an interrupt). union A set of values (in fields) of different types that occupy the same storage space. unresolved external See unresolved reference. unresolved reference A reference to a global or external variable or function that cannot be found, either in the modules being linked or in the libraries linked with those modules. An unresolved reference causes a fatal link error. unsigned integer An integer in which the most-significant bit serves as part of the number, rather than as an indication of sign. For example, an unsigned byte integer can have a value from 0 to 255. A signed byte integer, which reserves its eighth bit for the sign, can range from -128 to +127. See signed integer, MSB. user-defined type A data type defined by the user. It is usually a structure, union, record, or pointer.
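The difference between the signed and unsigned readings of the same bits can be made concrete with a short Python sketch. It assumes two's-complement representation, as used by the 8086 family; the function names unsigned_range, signed_range, and as_signed are hypothetical and exist only for this example.

```python
def unsigned_range(bits):
    """Smallest and largest values of a bits-wide unsigned integer."""
    return 0, (1 << bits) - 1


def signed_range(bits):
    """Smallest and largest values of a bits-wide two's-complement integer."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def as_signed(pattern, bits):
    """Reinterpret a bits-wide pattern as a signed (two's-complement) value."""
    pattern &= (1 << bits) - 1
    if pattern & (1 << (bits - 1)):      # most-significant bit set: negative
        return pattern - (1 << bits)
    return pattern


print(unsigned_range(8))    # (0, 255)
print(signed_range(8))      # (-128, 127)
print(signed_range(16))     # (-32768, 32767)
print(as_signed(0xFF, 8))   # -1: the same byte reads as 255 when unsigned
```

The bit pattern itself does not change; only the interpretation of the most-significant bit does.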
V
variable declaration A statement that initializes and allocates storage for a variable of a given type. virtual disk A portion of the computer's random-access memory reserved for use as a simulated disk drive. Also called an electronic disk or RAM disk. Unless saved to a physical disk, the contents of a virtual disk are lost when the computer is turned off. virtual memory Memory space allocated on a disk, rather than in RAM. Virtual memory allows large data structures that would not fit in conventional memory, at the expense of slow access. visibility The characteristic of a variable or function that describes the parts of the program in which it can be accessed. An item has global visibility if it can be referenced in every source file constituting the program. Otherwise, it has local visibility.
W
watch window The window in CodeView that displays watch statements and their values. A variable or expression is watchable only while execution is occurring in the section of the program (context) in which the item is defined. window A discrete area of the screen in PWB or CodeView used to display part of a file or to enter statements. window commands Commands that work only in CodeView's window mode. Window commands consist of function keys, mouse selections, CTRL and ALT key combinations, and selections from pop-up menus. window mode The mode in which CodeView displays separate windows, which can change independently. CodeView has mouse support and a wide variety of window commands in window mode. word A data unit containing 16 bits (2 bytes). It can store values from 0 to 65,535 (or -32,768 to +32,767).
Index
! (literal-character operator) 235 != (not equal operator) 178 (double quotation marks) 109, 353 $ (current address operator) 368 % (expansion operator) 235, 248, 357 & (substitution operator) 238, 372 && (logical AND operator) 178 (single quotation mark) 109, 353 ( ) (parentheses) 106 + (plus operator) 63, 66, 352, 370 . (dot operator) 126 . (structure-member operator) 64, 67, 352, 370 .186 directive 38 .286 directive 38 .286P directive 38 .287 directive 38 .386 directive FLAT, with 26, 36 processor mode, specifying 38, 336 segment mode, setting 46, 68 .386P directive 38 .387 directive 38 .486 directive FLAT, with 36 processor mode, specifying 38 segment mode, setting 46, 68 .486P directive 38 .8087 directive 38 : (colon) 22, 352, 354 : (segment-override operator) 50, 59 60, 64 :: (double colon) 197, 215, 352 354 ; (semicolon) 21 ;; (double semicolon) 227 < (less than operator) 178 < > (angle brackets) See Angle brackets = = (equal operator) 178 > (greater than operator) 178 ? (question mark initializer) array elements 109 described 368 variables 87 @ (at sign) 10 @@: (anonymous label) 170 [ ] (brackets) 107 [ ] (index operator) 63 \ (backslash character), MASM code 22 \ (line-continuation character) 121 {} (curly braces) 121, 131 || (logical OR operator) 178 32-bit programming 335 80186 processor 3 80188 processor 3 80286 processor 3 80287 math coprocessor 3, 135 80386 processor 3, 335 80387 math coprocessor 3, 135 80486 processor 3, 135 8086-based processors 2 3 8087 math coprocessor 3, 135 8088 processor 3
A
AAD instruction 160 AAM instruction 160 ABS operand 220 Accessing data with pointers See Pointer variables ADC instruction 92 94 ADD instruction 92 94 ADDR operator 197 Addresses displacement of 65 dynamic 79 effective 65 errors in 54 far 57, 74, 80 near 57, 80 physical 7 registers, loading into 80 relocatable 57 segmented 7 8, 53 Addressing direct registers, used in 62 63 indirect registers, used in 65, 68 scaling operands 70 specifying 60 Aliases 87, 369 ALIGN directive 3 Align types 45 See also individual entries .ALPHA directive 47 AND instruction 27, 99, 100 Angle brackets (< >) default parameters 230 epilogues 202 FOR loops 242
Filename: LMAPGINX.DOC Project: Template: INDEX.DOT Author: Samuel G. Dawson Last Saved By: Ruth L Silverio Revision #: 20 Page: 435 of 1 Printed: 10/02/00 04:20 PM
FORC loops 244
Angle brackets (< >) (continued) macro text delimiters 234 prologues 202 records 131 structures and unions 121 Anonymous label (@@) 170 API (Application Programming Interface) 257 Architecture, segmented 2, 5 Architecture, unsegmented 5 Arguments errors 196 macro 252 mixed-language programs, passing in 314 qualifiedtypes, with 16 stack, on 182 Arrays accessing elements in 105 declaring 105 defined 105 defining 15 DUP, declaring with 106, 124 instructions for processing 110 length of 108 multiple-line declarations for 105 number of bytes in 108 referencing 108, 316 size of elements 108 with DUP operator See DUP operator with SIZEOF operator See SIZEOF operator with TYPE operator See TYPE operator ASCIIZ 267 Assembly actions during 23 conditional See Conditional assembly INCLUDE files 212 language book list xviii mixed-language programs 312 listing files See Listing files two-pass 358 Assembly pointers See Conditional assembly Assembly-time variables 233 ASSUME directive .MODEL, generated with 37 code segments, changing 357 enhancements 344 general-purpose registers 77 segment registers, setting 49 55, 58 59, 357 AT address combine type 46 /AT command-line option, ML 36 At sign (@) 10
OPTION directive 25
B
Backslash character (\) 22 Backus-Naur Form See BNF grammar Base Pointer (BP) register 73 Basic calling conventions 308 310 Basic/MASM programs 328 332 Binary Coded Decimals calculating with 156 160 defining 156 instructions for 156 160 packed 158 unpacked 159 160 Bits mask 99 102 rotating 100 shifting 100 BNF grammar 16, 379 380 BOUND instruction 108, 204 BP (Base Pointer) register 73 Brackets ([ ]) 107 .BREAK directive 173, 176 BSF instruction 100 BSR instruction 100 BYTE align type 45 directive 86
C
C calling convention 309 C++/MASM programs 322 323 C/MASM programs 315 321 CALL instruction 180 Calling conventions 309 Basic 308 310 directives, specifying 37 FORTRAN 308 310 (list) 308 mixed-language programming 308 309 Pascal 310 STDCALL 311 SYSCALL 308 311 CARRY? flag as operand 178 Case sensitivity enforcing 348 macro functions, predefined 245 MASM statements 22 radix specifiers 11 reserved words 9, 407 specifying command-line options, in 25 language type 348
Case sensitivity (continued) symbols, predefined 10 CASEMAP ALL argument, OPTION directive 25 NONE argument, OPTION directive 25 NOTPUBLIC argument, OPTION directive 25 CATSTR directive 245 247 CATSTR, compared with TEXTEQU directive See TEXTEQU directive @CatStr predefined string function 245 247 CBW instruction 90 CDQ instruction 90 CLC instruction 104 Cleaning the stack 185 CLI instruction 5, 209 Client program 257, 266 CMC instruction 104 CMP instruction 166 CMPS instruction 110 114, 353 CMPSB instruction 114 .CODE directive 33, 40 42 Code segment See Segments, code Code, near or far 57 @CodeSize predefined symbol 40 CodeView for Windows 264 Combine types (list) 46 See also individual entries .COM files relocatable segment expression, lacking 62 starting address 56 tiny model, using 36, 46 47 COMM directive 16, 211, 217 218 Command-line driver, ML xvi Command-line options See ML command-line options COMMENT directive 22 Comments extended lines, in 346 macros, in 227 source code 21 22 COMMON combine type 46 Communal variables 217 Compact model See Memory models, compact Compatibility, MASM 5.1 See MASM 5.1 compatibility Conditional assembly assembly behavior, changing 23 conditions, testing for 28 directives 28 pointers 83, 187 Conditional-error directives (table) 29 Conditional jumps 164 170 Conditions, testing for conditional assembly See Conditional assembly Constants defined 11 expressions 12 immediate 61 integer 11 12 size 363 size of 12 symbolic 12 .CONST directive 33, 39 40 .CONTINUE directive 173, 176 Coprocessors architecture 140 144 control registers 156 data format in registers 140 defined 135 described 3, 139 instructions arithmetic 148 150 data transfer 146 described 146 (list) 414 overview 141 program control 151 155 memory access 145 operand formats classical stack 141 memory 142 overview 141 register 143 register-pop 144 specifying 37, 140 status word register 156 steps for using 145 /Cp command-line option, ML 10, 245 @Cpu predefined symbol 254 
Curly braces ({}) records 131 structures and unions 121 Current address operator ($) 368 @CurSeg predefined symbol 39, 219 CWD instruction 90 CWDE instruction 90 /Cx command-line option, ML 158
D
DAA instruction 162 DAS instruction 162 .DATA directive 33, 39 40 .DATA? directive 33, 39 40 @data predefined symbol 39 Data segment See Segments, data @DataSize predefined symbol 39, 83
Data types arrays See Arrays attributes for 15 Binary Coded Decimals 159 defined 14 defining 87 directives 14 floating-point 136 initializers, as 14 integers, allocating memory for 85 86 new features, MASM 6.1 344 qualifiedtypes 15, 214 real 14, 136 signed 14, 86 strings See Strings structures 117 unions 117 user-defined 15 Data, near or far 57, 58 Data-sharing methods 211 Data-sharing methods, multiple-module programs See Multiple-module programs Date, system 11 DB directive 86 DD directive 86 DEC instruction 92 94 DF directive 86 DGROUP group name .MODEL, defined by 34, 39, 51 DS registers, initializing to 56 MS-DOS programs, for 41 42 near data, accessing 57 58 segment 35 37, 51 52, 57 Direct memory operands loading offset of 82 overview 60 64 Directives .286P 38 .287 38 .386 See .386 directive .386P 38 .387 38 .486P 38 .8087 38 .ALPHA 47 ALIGN 3 .BREAK 173, 176 BYTE 86 CATSTR 245 247 .CODE 33, 40 42 COMM 16, 211, 217 218 COMMENT 22 Conditional assembly 28 Conditional error 29, 358
Directives (continued) .CONST 33, 39 40 .CONTINUE 173, 176 .DATA 33, 39 40 .DATA? 33, 39 40 Data declarations, for 87 Data types, for 14 Data-sharing See EXTERN directive DB 86 DD 86, 136 Decision 171 DF 86 .DOSSEG 47 DQ 86, 136 DT 86, 136 DW 86 DWORD 86 ECHO 236 .ELSE 171 ELSE 28 .ELSEIF 171 ELSEIF 28 ELSEIF1 29, 358 ELSEIF2 29, 358 END 33, 56 .ENDIF 171 ENDIF 28 ENDM 227 239 ENDP 180 181, 206 ENDS 44 .ENDW 173 EQU 12, 369 .ERR 30 .ERR1 30, 358 .ERR2 30, 358 .ERRB 30, 231 .ERRDEF 30 .ERRDIF 30 .ERRE 30 .ERRIDN 29 .ERRNB 29, 231 .ERRNDEF 29 .ERRNZ 29 EVEN 3 .EXIT 33, 41 43 EXITM 248 EXTERN See EXTERN directive EXTERNDEF See EXTERNDEF directive FARDATA 33, 39 40 .FARDATA 39 40 .FARDATA? 33, 39 40 Floating-point 136 FOR 242 243, 249 FORC 244
Directives (continued) FWORD 86 GROUP 51 52 .IF 171 IF 28 29 IF1 29, 358 IF2 29, 358 IFB 29, 231 IFDEF 29, 359 IFDIF 29 IFE 29 IFIDN 29 IFNB 29, 231 IFNDEF 29, 359 INCLUDE 212 INCLUDELIB 222 INSTR 245 246 INVOKE See INVOKE directive LABEL 16 LOCAL 188 191, 232 loop-generating 173 .MODEL See .MODEL directive .MSFLOAT 361 Naming conventions 37 .NO87 38, 349 obsolete 361 OPTION See OPTION directive ORG 56 POPCONTEXT 255, 349 PROC 180 184, 193, 206, 312 PUBLIC 185, 211, 220 PUSHCONTEXT 255, 349 QWORD 86 .RADIX 11 REAL4 136 137 REAL8 136 137 REAL10 136 137 RECORD 130 131 Renamed since MASM 5.1 350 .REPEAT 173 177 REPEAT 240 SBYTE 86 SDWORD 86 SEGMENT 44 47 Segment order, controlling 47 .SEQ 47 SIZESTR 245 246 STACK See STACK directive .STARTUP See .STARTUP directive STARTUP See .STARTUP directive STRUCT 118 129 SUBSTR 245 246 SWORD 86 TBYTE 86, 159
ELSE directive 28
Directives (continued) TEXTEQU See TEXTEQU directive UNION 118 119, 122, 125 129 .UNTIL 173 .UNTILCXZ 173 .WHILE 173 177 WHILE 241 WORD 86 Directives: 36 38, 46 Displacement 66 Distance attributes 15 DIV instruction 97 98 Division 97, 102 DLLs client program 257, 266 data segment 265 269 defined 257, 266 example 267 268 extension name 266 heap 261 262, 265 267 IMPLIB utility 258 initialization 261 262, 268 269 loading 258 260 programming requirements 260 261, 267 prologue and epilogue 264 267 stacks in 46, 264 267 summary 266 termination 262 264, 270 Document conventions vii DOS See MS-DOS .DOSSEG directive 47 Dot (.) operator See Structure-member operator DOTNAME argument, OPTION directive 25 Double colon (::) 197, 215 Double quotation marks () 109 Double semicolon (;;) 227 Doublewords 86 DQ directive 86 DT directive 86 DUP operator arrays, with 106, 124 record variables, with 131 structures and unions, with 121 DW directive 86 DWORD align type 45 directive 86 Dynamic -link libraries See DLLs
E
ECHO directive 236 .ELSE directive 171
.ELSEIF directive 171 ELSEIF directive 28 ELSEIF1 directive 358 ELSEIF2 directive 29, 358 EMULATOR argument, OPTION directive 27, 157 Emulator libraries 155 156 END directive 33, 56 .ENDIF directive 171 ENDIF directive 28 ENDM directive 227 239 ENDP directive 180 181, 206 ENDS directive 44 .ENDW directive 173 ENTER instruction 183 Environment target 4 variables INCLUDE 213 LIB 222 returning values of 10 /EP command-line option, ML 342 EPILOGUE argument, OPTION directive 26, 201 203 Epilogue code defined 198 macros 201 202, 264 265 PROC statement, specifying arguments in 185 procedures, with 26 RET instruction 357 standard 199 user-defined 201 EQ operator 365 EQU directive 12, 369 Equal directive (=) 12 Equates, predefined See Predefined symbols .ERR directive 29 .ERR1 directive 30, 358 .ERR2 directive 30, 358 .ERRB directive 29, 231 .ERRDEF directive 29 .ERRDIF directive 29 .ERRE directive 29 .ERRIDN directive 29 .ERRNB directive 29, 231 .ERRNDEF directive 29 .ERRNZ directive 29 Error detection 196 ERROR operand 4950 Errors, argument passing 196 ESC instruction 360 EVEN directive 3 Executable (.EXE) files, controlling size of 223 Exit codes, Windows operating system 263 .EXIT directive 33, 41 43 EXITM directive 248
relocatable segment expression, lacking 62
Expansion operator (%) 235 236, 248, 357 Explicit loading 258 Exponent bias 139 EXPORT operand 185 EXPORTS statement 261, 270 EXPR16 argument, OPTION directive 13, 26, 361, 373 EXPR32 argument, OPTION directive 13, 26, 373 Expression operators 178 Expressions assembly-time evaluation 23 constant 12 loop conditions, evaluating 179 OPTION M510 behavior 364, 373 order of evaluation 14 size 366, 373 word size 13, 26 Extension, filename 266 EXTERN directive data-sharing 211 executable file size, limiting 223 module-specific 220 overview 16 positioning 218 procedure prototypes, declaring 193 External declarations 216 218 External variables 217, 369 EXTERNDEF directive data-sharing 211 overview 16 positioning 218 procedure prototypes, declaring 193 symbols, declaring 214 215
F
Far addresses, invoking 57, 74, 80 81, 197 Far code 57 Far data 58 60 .FARDATA directive 33, 39 40 .FARDATA? directive 39 40 FAR operator 169, 185 Far pointer 74, 80 81 FARSTACK operand example 35 grouping 34 in Windows-based programs 266 MS-DOS program, initializing 43 special cases, setting for 37 Farwords 86 FCOM instruction 153 Fields, statements in 21 22 Files .COM
Files (continued) .COM (continued) starting address 56 tiny model, using 36, 46 47 executable 24 include 212 213, 348 line numbers 11 naming 11 Flags CARRY? 178 operands, as 178 OVERFLOW? 178 PARITY? 178 SIGN? 178 stack, saving on 73 ZERO? 178 Flags register See Registers, flags Flat model See Memory models, flat FLAT operand 46, 49 50 FLD1 instruction 147 FLDZ instruction 147 Floating-point calculations 3 constants decimal form 137 encoded hexadecimal format 137 syntax for defining 136 emulation 157 158 IEEE format 139 instructions arithmetic 148 149 controlling 26 data transfer 147 not emulated (list) 158 program control 152 153, 156 operations 146 values double precision 139 single precision 139 variables IEEE format 138 Microsoft binary format 138 .MSFLOAT format 138 ranges 136 FOR directive 242 243, 249 FORC directive 244 FORCEFRAME operand 200 201 FORTRAN calling convention 308 310 FORTRAN/MASM programs 323 326 /Fpi command-line option, ML 26, 157 Frame 62 FS register 17 FTST instruction 153
Full segment definitions described 32 segment registers, initializing 54 56 using 44 51 Full segment definitions See .STARTUP directive FWORD directive 86 FXCH instruction 144
G
Global variables 211 GROUP directive 51 52 Groups defined 51 DGROUP 51 SEG operator, returned by 62 GS register 17
H
H2INC 318 Heap space 261 262, 265 267 HEAPSIZE statement 261, 271 Help, online See Microsoft Advisor HIGH operator 356 HIGHWORD operator 346 Huge model See Memory models, huge
I
/I command-line option, ML 213 Identifiers ABS, using 220 naming restrictions 9, 346, 353, 357, 368 OPTION DOTNAME 373 OPTION NOKEYWORD 376 IDIV instruction 97 98 IEEE format 139 .IF directive 171 IF directive 28 29 IF1 directive 29, 358 IF2 directive 29, 358 IFB directive 29, 231 IFDEF directive 29, 359 IFDIF directive 29 IFE directive 29 IFIDN directive 29 IFNB directive 29, 231 IFNDEF directive 29, 359 Immediate operands 60 62 IMPLIB utility 258 Implicit loading 258 Import libraries 258
IMPORTS statement 266 IMUL instruction 95 96 IN instruction 5 INC instruction 92 94 INCLUDE directive 212 INCLUDE environment variable 213 Include files assembling 213 nested 213 overview 212, 348 INCLUDELIB directive 222 Index operator ([ ]) 63 Indirect memory operands 60, 64 70 Indirect procedure calls See INVOKE directive Initializers allocating 87 directives for 15 multiple-line 346 Instance 261, 266 INSTR directive 245 246 @InStr predefined string function 245 246 Instruction Pointer (IP) register 20, 57, 161 Instructions ADC 92 94 ADD 92 94 AND 26, 99 100 arithmetic 378 bit-test 354 BOUND 108, 204 BSF 100 BSR 100 CALL 180 CBW 90 CDQ 90 CLC 104 CLI 5, 209 CMC 104 CMP 166 CMPS 110 114, 353 CMPSB 114 conditional-jump 165 167 coprocessor 377 CWD 90 CWDE 90 DAA 162 DAS 162 DEC 92 94 default segments, requiring 49 DIV 97 98 encodings, changes to 377 378 ENTER 183 ESC 360 FCOM 153 FLD1 147 Instructions (continued) FLDZ 147 floating-point See Floating-point instructions FTST 153 FXCH 144 IDIV 97 98 IMUL 95 96 IN 5 INC 92 94 INT 204 205 INTO 207 JCXZ 170 173 JECXZ 170 173 JMP 49, 162 JO 165 jump 165 167, 170, 173 LAHF 73 LDS 81 LEA 82, 104 LEAVE 183 LES 81 (list) 412 LOCK 353, 363 LODS 110 115, 353 logical 99 102 LOOP 172 LOOPE 172 LOOPNE 172 LOOPNZ 172 LOOPZ 172 MOV 49, 82, 89 MOVS 110 113, 353 MOVSX 92 MOVZX 92 MUL 95 96 NOP 377 NOT 99 100 obsolete 360 operands for 60 OR 26, 99 100, 168 OUT 5 POP 49, 71 POPA 74 POPAD 74 POPF 73 POPFD 73 privileged 2, 38 PUSH 49, 71 PUSHA 74 PUSHAD 74 PUSHF 73 PUSHFD 73 RCL 101 104 RCR 101 104
Instructions (continued) REP 110 112, 363 REPE 110 112, 363 REPNE 110 112, 353, 363 REPNZ 110 112, 353, 363 REPZ 110 112, 363 RET 378 RETF 181, 378 RETN 181, 378 ROL 101 104 ROR 101 104 SAL 101 104 SAR 101 104 SBB 92 94 SCAS 110 115, 353 SHL 101 104 SHR 101 104 STC 104 STI 5, 209 STOS 110 113, 353 SUB 92 94 TEST 167 168 timing xvii, 399 400 XCHG 90 XLAT 116 XLATB 116 XOR 26, 99 100 Integers adding 92 94 allocating memory for 85 86 Binary Coded Decimal (BCD) 159 bit operations on 99 constants, defining 11 12 dividing 97 98 exchanging 90 hexadecimal 12 initializing 87 memory format 86 moving 89 multiplying 95 96 operations with 88 popping off stack 71 pushing onto stack 71 radix specifiers for 11 sign-extending 90 signed 86 size of 86 stack 71 subtracting 92 94 translating 116 types, defining 14, 86 value range 86 @Interface predefined symbol 37 Interrupt vector 205
Interrupt-enable flag 205 Interrupts CLI instruction 209 handlers 206 207 INT instruction 204 205 MS-DOS 204, 285 operation 206 overview 204 redefining 207 STI instruction 209 vector table 205 INTO instruction 207 INVOKE directive actions 194 ADDR, invoking 197 arguments, widening 196 error detection 196 far addresses, invoking 197 generated code, checking 198 indirect procedure calls 198 mixed-language programs 312 313 procedures, calling 193 197, 216 type conversions 194 195
J
JCXZ instruction 170 173 JECXZ instruction 170 173 JMP instruction 49, 162 JO instruction 165 Jumps anonymous 170 automatic 169 conditional bit status 167 comparisons 166 extending 26, 169 flag status 165 166 instructions (list) 165 167 overview 164 zero value 168 directives for 171 extension, automatic 26, 169 instructions 165 167 optimization, automatic 162 overview 161 unconditional indirect operands 163 jump tables 163 overview 162
L
LABEL directive 16
Labels anonymous 170 code length 346 OPTION M510 behavior 363 OPTION NOSCOPED 375 procedures, in 357 referencing 352 size 346 visibility 354 LAHF instruction 73 LANGUAGE BASIC argument, OPTION directive 26 C argument, OPTION directive 26 FORTRAN argument, OPTION directive 26 PASCAL argument, OPTION directive 26 STDCALL argument, OPTION directive 26 SYSCALL argument, OPTION directive 26 LANGUAGE argument, OPTION directive 193 Language attributes .MODEL directive, with 34, 37 OPATTR operator 253 OPTION directive, with 26 Large model See Memory models, large LDS instruction 81 LEA instruction 82, 104 LEAVE instruction 183 Length of strings See LENGTHOF operator LENGTH operator 356 357, 364 LENGTHOF operator number of items, returning 110, 124, 132, 346 structures, defining 108 unions, with 125 LES instruction 81 Libraries C run-time 271 emulator 155 156 overview 221 source files, specifying in 222 LIBRARY statement 270 Line-continuation character (\) 121 LINK, command-line options See individual entries Linkage specification 322 323 Linking actions during 24, 45 segment order in 48 Listing files code generated 399 command-line options 397 399 error messages 400 examples 401 generating 397 PWB options 397 399 reading 399, 405
Listing files (continued) symbols used in (list) 400 tables in 405 406 Literal-character operator (!) 235 LJMP argument, OPTION directive 27 LOADDS operand 200 201 Loading local address variables See Local variables Loading, actions during 24 Local addresses, loading See Local variables LOCAL directive 188 191, 232 Local variables creating 188 loading addresses of 82 procedures, in 188 LOCK instruction 353, 363 LODS instruction 110 115, 353 Logical AND 178 Logical instruction 99 100 Logical line 22 Lookup tables 241 LOOP instruction 172 LOOPE instruction 172 LOOPNE instruction 172 LOOPNZ instruction 172 Loops conditions expression evaluation 179 precedence 179 PTR operator in 178 relational operators for (list) 178 signed operands 178 writing 178 controlling 176 directives .REPEAT 173 177 .WHILE 173 177 instructions (list) 172 macros FOR 242 243, 249 FORC 244 REPEAT 240 WHILE 241 LOOPZ instruction 172 LOW operator 356 LOWWORD operator 346, 366 LROFFSET operator 344
M
M510 argument, OPTION directive compatibility with MASM 5.1 26, 353 370 expression word size, setting 13 structures, with 119
Macros arguments commas 352, 372 quotation marks 353 testing 29, 252 variable 242, 249 calling 227 checking argument types with 253 comments (;;) 227 expansion 23 functions defined 248 epilogues 201 EXITM 248 prologues 201 returning values 248 local symbols in 232 loops FOR 242 243, 249 FORC 244 REPEAT 240 WHILE 242 243 MASM 5.1 behavior 25, 356, 372 nested 251 new features 351 operators behavior in macro functions 251 expansion (%) 235 236, 248, 357 (list) 234 literal-character (!) 235 substitution (&) 238, 352, 372 OPTION OLDMACROS 372 parameters default values 230 procedure parameters, compared to 234 required 229 substitution 238 passing arguments to 228, 235 predefined string functions 11 procedures defined 226 functions, compared to 228 recursive 255 redefining 251 text defined 226 forward referencing 356 numeric equates, compared to 234 OPTION M510 behavior 370 syntax 226 VARARG keyword 242, 249, 351 writing 227 Mask defined 99
Mask (continued) logic instructions, with 102 record operators, with 133 MASK operator 133 MASM 5.1 compatibility address fixups 26 macro behavior 25, 356, 372 OPTION directive, specifying 25 overview xvi structures 25 updating code 353 360 MASM utility xvi, 342 Math coprocessor See Coprocessors Medium model See Memory models, medium Memory access 64 allocation 24 virtual 5 MEMORY combine type 46 Memory models attributes 35 compact 36 described 34 determining 10 far code segments 40 far data segments 40 flat 36, 58, 336 huge 36 large 36 medium 36 model-independent code 83 near code segments 40 small 36 specifying in PROC statement 185 tiny 36, 46 47 Memory-resident programs See TSRs Microsoft Advisor xiii, 342 Minus operator (-) 64 Mixed-language programming argument passing 314 assembly procedures 312 Basic/MASM programs 328 332 C prototypes, converting with H2INC 318 C++/MASM programs 322 323 C/MASM programs 315 321 calling conventions Basic 308 310 FORTRAN 308 310 (list) 308 Pascal 310 STDCALL 311 SYSCALL 308 311 column-major order 315
Mixed-language programming (continued) compatible data types Basic (list) 328 C (list) 315 FORTRAN (list) 323 external data 314 FORTRAN/MASM programs 323 326 initialization code 313, 321 INVOKE, using 312 313 naming conventions 308 309 overview 307 register preservation 314 row-major order 315 ML command-line options /AT 36 /Cp 10, 245 described xvi /EP 342 /Fpi 26, 157 /I 213 listing options (list) 397 overview xvi /X 213 /Zm 62, 119 /Zp 119 Mode, real, protected See Real mode; Protected mode .MODEL directive attributes 34 35 DGROUP 51 language types, specifying 26, 308 memory model, defining 35 36 mode default 46 overview 34 positioning 46 simplified segment directives 33 @Model predefined symbol 35, 83 Module-definition file described 270 statements EXPORTS 261, 270 HEAPSIZE 261, 271 IMPORTS 266 LIBRARY 270 STUB 266 Module-specific EXTERN directive See EXTERN directive MOV instruction 49, 82, 89 MOVS instruction 110 113, 353 MOVSX instruction 92 MOVZX instruction 92 MS-DOS interrupts 204, 285 MS-DOS operating system 2 6 MUL instruction 95 96 Multiple-module programs alternatives to include files 219
Multiple-module programs (continued) COMM, using 217 data-sharing methods 211 declaring symbols public and external 214 EXTERN with library routines 223 external declarations, positioning 218 EXTERNDEF, using 214 include files 212 213 libraries 221 222 modules 212 PROTO, using 216 PUBLIC and EXTERN, using 220 sharing symbols with include files 212 Multiplex interrupt 291, 304 Multiplication instructions 95 shift operations 102
N
Naming conventions directives 37 (list) 308 mixed-language programming 308 309 Naming restrictions 9 Naming restrictions, identifiers See Identifiers NE operator 365 Near address 57, 80 NEAR operator 169, 185 NEARSTACK operand ASSUME statement 54 default stack type 37, 42 described 35 New features, MASM 6.1 xiv xv, 342 351 NMAKE 270 .NO87 directive 38, 349 NODOTNAME argument, OPTION directive 25 NOEMULATOR argument, OPTION directive 27 NOKEYWORD argument, OPTION directive 9, 27, 353, 376 NOLJMP argument, OPTION directive 27, 170 NOM510 argument, OPTION directive 25 NONUNIQUE operand 118, 126 NOOLDMACROS argument, OPTION directive 26 NOOLDSTRUCTS argument, OPTION directive 26 NOP instruction 377 NOREADONLY argument, OPTION directive 27 NOSCOPED argument, OPTION directive 26, 362, 375 NOSIGNEXTEND argument, OPTION directive 27, 378 NOT instruction 99 100 NOTHING operand 49 50 Number of items with LENGTHOF operator See LENGTHOF operator Numeric equates, compared to text macros 234
O
OFFSET FLAT argument, OPTION directive 27 GROUP argument, OPTION directive 27 SEGMENT argument, OPTION directive 27, 62 OFFSET operator 61, 82, 356, 374 Offsets accessing data with 74 addresses 7 described 5 7 determining 23 24, 360, 374 fixups for 26 OLDMACROS argument, OPTION directive 25, 239, 361, 372 OLDSTRUCTS argument, OPTION directive MASM 5.1 compatibility 25, 361, 370 372 structures, with 119, 126 Online help See Microsoft Advisor OPATTR operator 252 253 Operands ABS 220 direct memory 60 64 EXPORT 185 FAR 15 FARSTACK See FARSTACK operand FLAT 46, 49 50 FORCEFRAME 244 immediate 60 62 indirect memory 60, 64 70 NEAR 15 PRIVATE READONLY 44 45 registers 61 size 66, 355 USE16 44 46 USE32 44 46 Operating systems (list) 4 .MODEL, specifying with 34 multitasking 6 types See MS-DOS, Windows operating systems Operators ADDR 197 current address ($) 368 dot (.) 126, 352, 370 EQ 365 expansion (%) 235 236, 248, 357 expressions, in 12 13 FAR 169, 185 HIGH 356 HIGHWORD 346 index ([ ]) 63 instructions, compared to 13 LENGTH 356 357, 364
LENGTHOF 346 Operators (continued) LOW 356 LOWWORD 346, 366 LROFFSET 344 macro 251 MASK 133 minus (-) 64 NE 365 NEAR 169, 185 OFFSET 61, 82, See OFFSET operator OPATTR 252 254 plus (+) 63, 66 precedence 14 PTR See PTR operator PTR, example See PTR operator relational 357, 365 relational (list) 178 SEG 50, 62, 363 segment-override (:) 59, 64 SHORT 169 SIZE 364 365 size See PTR operator SIZEOF 86, 346 structure-member (.) 64 67, 126, 352, 370 substitution (&) 238 .TYPE 252, 360 TYPE 86 WIDTH 133 OPTION directive CASEMAP 25 described 23 DOTNAME 25, 361, 373 emulation mode 157 EMULATOR 26, 157 EPILOGUE 26, 201 203 EXPR16 OPTION directive 13, 26, 361, 373 EXPR32 OPTION directive 13, 26, 373 LANGUAGE 26, 193 language types, specifying 308 list of arguments for 25 LJMP 26 M510 See M510 argument, OPTION directive NODOTNAME 25 NOEMULATOR 26 NOKEYWORD See NOKEYWORD argument, OPTION directive NOLJMP 27, 170 NOM510 25 NOOLDMACROS 26 NOOLDSTRUCTS 26 NOREADONLY 27 NOSCOPED 26, 362, 375 NOSIGNEXTEND 27, 378
OPTION directive (continued) OFFSET 26, 62, 362, 374 375 OLDMACROS 25, 237 OLDSTRUCTS See OLDSTRUCTS argument, OPTION directive PROC 185, 375 procedure use 26 PROLOGUE 26, 201 203 READONLY 26 SCOPED 25 SETIF2 25, 29 30 using 25, 361 OR instruction 27, 99 100, 168 ORG directive 56 OUT instruction 5 OVERFLOW? flag as operand 178
P
PAGE align type 45 PARA align type 45 Parentheses [( )] 106 PARITY? flag as operand 178 Pascal convention 310 Physical line 22 Plus operator (+) 66, 352, 370 Pointer variables 74 78 Pointers accessing data with 74 arguments, as 80 copying 79 far 74, 80 81 initializing 78 location 74 operations 78 TYPEDEF, defined with 15, 75 78 types, to 15 Pointers and conditional Assembly See Conditional assembly Pointers defined by TYPEDEF See TYPEDEF directive POP instruction 49, 71 POPA instruction 74 POPAD instruction 74 POPCONTEXT directive 255, 349 POPF instruction 73 POPFD instruction 73 Positioning EXTERN directive See EXTERN directive EXTERNDEF directive See EXTERNDEF directive Precedence operators 14 Predefined equates See Predefined symbols Predefined functions for macros 11 Predefined string functions @CatStr 245 247 @InStr 245 246
Predefined string functions (continued) @SizeStr 245 246 @SubStr 245 246 Predefined symbols 39, 83 @Codesize 40 @Cpu 254 @CurSeg 39, 219 @Data 39 @DataSize 39, 83 @Interface 37 (list) 10, 409 @Model 35, 83 @stack 37 @Wordsize 39 case sensitivity 9 10 new to MASM 6.1 (list) 343 PRIVATE operand 185 Privilege levels 5 Problems, reporting xx PROC EXPORT argument, OPTION directive 25 PRIVATE argument, OPTION directive 25, 362 PUBLIC argument, OPTION directive 25, 185 PROC directive 180 184, 193, 206, 312 PROC statements with visibility See also Visibility PROC with RET instruction See RET instruction Procedure prototypes declaring See EXTERNDEF directive defined with See PROTO directive defined with PROTO directive See PROTO directive writing See PROTO directive Procedures arguments far pointers 197 near addresses 197 passing 182 pointers 80 type conversions 195, 196 CALL instruction 180 calling See INVOKE directive calls indirect 198 optimizing 181 defining 180 epilogues 26 EXTERNDEF directive 214 215 See also EXTERNDEF directive, include files 214 INVOKE directive 193 197, 216 libraries 221 local variables 188 192 See also Local variables Macro See Macros, procedures new features 347
Procedures (continued) OPTION PROC 375 overview 180 parameters declaring 184 186 variable numbers of 186 188, 194 PROC attributes, specifying 185 prologues 26 PROTO directive 193, 214, 216 See also PROTO directive prototypes, writing 193 RET instruction 180 RETF instruction 181 RETN instruction 181 syntax description 184 VARARG keyword 186 188, 194 visibility 25, 375 Processors See also Real mode; Protected mode 8086-based 2 3 .MODEL directive 37 modes, determining 10 target 2 timing xvii, 399 400 Product assistance xx Program Segment Prefix (PSP) 56 Programming, MASM 6.1 practices 352 Programs exiting 41 mixed-language 307 starting 41 PROLOGUE argument, OPTION directive 25, 201 203 Prologue code arguments, specifying 185 code labels in 357 defined 198 macros for 201 203, 264 265 standard 199 user-defined 26, 201 Protected mode described 2 7, 335 flat model 335 read-only segments 45 PROTO directive include files 211, 214 216 procedure prototypes, defined with 193 procedure prototypes, writing 312 Prototypes procedure directives for 193 overview 193 qualifiedtypes, defined with 15 PTR operator example 92
PTR operator (continued) OPTION M510 behavior 365 pointer to type, as 15 signed number, specifying 178 size 66, 88 TYPEDEF, used with 75 PUBLIC combine type 45 PUBLIC directive 185, 211, 220 PUSH instruction 49, 71 PUSHA instruction 74 PUSHAD instruction 74 PUSHCONTEXT directive 255, 349 PUSHF instruction 73 PUSHFD instruction 73
Q
Quadwords 86 Qualifiedtypes BNF grammar 16 defined 15 pointers, defining 75 76 prototypes, as 15 rules for use 15 16 Question mark initializer ( ? ) array elements 109 described 368 variables 87 Quotation marks (' or ") 109 QWORD directive 86
R
.RADIX directive 11 Radix specifiers (list) 11 OPTION M510 behavior 367 RCL instruction 101 104 RCR instruction 101 104 Read-only code 27 READONLY argument, OPTION directive 26 READONLY operand 44 45 Real mode 2, 4, 7 Real numbers See Floating-point REAL4 directive 136 137 REAL8 directive 136 137 REAL10 directive 136 137 RECORD directive 130 131 Records defined 129 field ranges 354 LENGTH operator 357 operators 133 134 RECORD syntax 130 131
Records with SIZEOF operator See SIZEOF operator Records with TYPE operator See TYPE operator Recursive macros 255 Register operands 61 Registers 16-bit 16 17, 67 32-bit 335 base 65 70 coprocessor 140 copying pairs of 82 division (table) 98 Eflags 20 extended 17 flags 20 FS 17 general purpose 19 GS 17 index 65 69 indirect addressing 65 indirect operands 67 68 initializing 44 Instruction Pointer (IP) 20, 57, 161 (list) 409 loading addresses into 80 mixed 16-bit, 32-bit 70 pointers as 77 scaling 67 69 segment See Segment registers Stack Pointer (SP) 19 Stack Segment (SS) 73 stacks, saving on 74 types, defined with ASSUME 77 Relational operators (list) 178 Relocatable addresses 57 expressions 62, 65 REP instruction 110 112, 363 REPE instruction 110 112, 363 Repeat blocks 239 .REPEAT directive 173 REPEAT directive 240 REPNE instruction 110 112, 353, 363 REPNZ instruction 110 112, 353, 363 Reporting problems xx REPZ instruction 110 112 Reserved words described 8, 26 (list) 407 OPTION M510 behavior 362 OPTION NOKEYWORD 376 RET instruction epilogue code, generating 200, 378 instruction encodings, changes to 357 PROC, with 180
RETF instruction 181, 378 RETN instruction 181, 378 ROL instruction 101 104 ROM-BIOS interrupts See Interrupts ROR instruction 101 104 Rotate instructions 101 Routines, interrupt 206
S
SAL instruction 101 104 SAR instruction 101 104 SBB instruction 92 94 SBYTE directive 86 Scaling factor 107 Scaling index registers 67 69 SCAS instruction 110 112, 115, 353 Scope within visibility See also Visibility SCOPED argument, OPTION directive 26 SDWORD directive 86 SEG operator 49, 62, 363 SEGMENT FLAT argument, OPTION directive 27 USE16 argument, OPTION directive 27 USE32 argument, OPTION directive 27 Segment arithmetic 7 SEGMENT directive 44 47 Segment mode, setting See .386 directive; .486 directive Segment registers 32-bit 335 assigning 59, 62 ASSUME directive 49 55, 58 59, 357 changing 57 default 60, 64 described 18 FS 18 GS 18 initializing 43, 54 57 MS-DOS, under 24, 43 near code 57 restoring 59 segment-override operator (:) 50, 59 60, 64 Segment registers initializing See STARTUP directive setting See STACK directive Segment selectors 5 Segment-override operator (:) 50, 59 60, 64 Segmented architecture 2, 5 Segments 32-bit 36, 335 accessing data 74 aligning 44 45 class types 44, 47 48
Segments (continued) code creating 40 far 40 memory model support for 36 near 40 combining 40, 44 46 current 10 data creating 39 default 49, 54 55, 59 far 40 memory model support for 36 near 39 defined 31 described 5 7 determining order of 47 48 determining position of 23 24 determining size of 44 fixups for 26 full segment definitions, defining 32 groups, defining 51 initializing 55 location of 6 naming 40 ordering with the linker 48 protection 6 READONLY 45 simplified segment directives 37 42 size, determining 10 types 44 USE16 44 USE32 44 values 55 word size, setting 46 Selector 335 Semicolon (;), comments 21 .SEQ directive 47 SETIF2 argument, OPTION directive 25, 29 30 Shift instructions 100 SHL instruction 101 104 SHORT operator 169 SHR instruction 101 104 Sign-extending integers 90 SIGN? flag as operand 178 Signed data 14, 91 Signed numbers, specifying See PTR operator Significand 139 Simplified segment directives code segments 41 code, starting and ending 42 data segments 40 described 32 language convention 36
Simplified segment directives (continued) memory model 35 .MODEL, defining with 34 operating system 35 processor 38 segment registers, initializing 54 56 stack 39 stack distance 37 using 33 Single quotation mark (') 109 Size attribute, segments FLAT 46 USE16 46 USE32 46 Size mismatch 355 Size of strings See SIZEOF operator SIZE operator 364, 365 @SizeStr predefined string function 245 246 SIZEOF operator arrays, with 108 described 346 records, with 132 strings, with 110 structures, with 124 types 86 unions, with 125 SIZESTR directive 245 246 Small model See Memory models, small Source code, statements in 21 SP (Stack Pointer) register 19, 71 73 SS (Stack Segment) register 73 STACK combine type 45 .STACK directive described 33 segment registers, setting 56 Stack distance 37 Stack frame 73, 200, 264 265 Stack Pointer (SP) register 19 @stack predefined symbol 37 Stack Segment (SS) register 73 Stacks cleaning 185 creating 38 described 71 distance 37 far 10 FARSTACK 35, 37 in DLLs 264 267 local variables on 188 191 near 10 NEARSTACK 33, 35 37 operations with 72 74 operators 71 passing arguments 182
Stacks (continued) pointer 71 73 POP instructions 71 PUSH instructions 71 saving flags 73 saving registers 74 segment register 18 separate 46 trace 264 .STARTUP directive described 33 initializing segments 54 56 program, starting 41 42 segment address 37 Statements case sensitivity 22 syntax 21 Status flags, saving 73 STC instruction 104 STDCALL calling convention 311, 336 STI instruction 5, 209 STOS instruction 110 113, 353 Strings declaring 109 defined 105 defining 15 initializing 109 instructions processing, for 110 requirements (table) 112, 353 length of 110 multiple-line declarations for 109 overview 111 predefined functions for macros 11 See also Predefined string functions size of 110 type of 110 STRUCT directive 118 129 Structure-member operator (.) 64 67, 126, 352, 370 Structures alignment of fields 118 119 array initializers 122 arrays 124 compatibility with MASM 5.1 25, 118 current address operator ($) 368 default field values 122 defined 117 fields accessing 64, 67, 371 initializing 118 naming 119, 352, 372 initializers, as 123 MASM 5.1 behavior 25, 355, 370 memory allocation 117
Structures (continued) nested 128 129 new features 345 operators 124 OPTION M510 behavior 366 OPTION OLDSTRUCTS 370 redeclaration 124, 355 referencing fields in 126 steps for using 118 string initializers 122, 368 syntax types 118 variables 121 Structures with LENGTHOF operator See LENGTHOF operator Structures with SIZEOF operator See SIZEOF operator Structures with TYPE operator See TYPE operator STUB statement 266 SUB instruction 92 94 Substitution operator (&) 238, 372 SUBSTR directive 245 246 @SubStr predefined string function 245 246 SWORD directive 86 Symbol table, listing files 405 Symbols declaring public and external 214, 220 external 369 naming 346, 368 predefined 9 11 Symbols, declaring by EXTERNDEF directive See EXTERNDEF directive Syntax, MASM 6.1 statements 21 SYSCALL calling convention 308 311 System date 11 System time 11
T
Tables, lookup 241 Target environment 4 TBYTE directive 86, 159 Terminate-and-Stay-Resident programs See TSRs TEST instruction 167 168 Testing for zero 168 Text delimiters See Angle brackets Text macros See Macros, text TEXTEQU directive aliases 369 CATSTR, compared with 247 syntax 226 Time, system 11 Timing (cycle/second) xvii, 399 400 Tiny model See Memory models, tiny Trap flag 205
TSRs active described 275 interrupt handlers in 275 MS-DOS functions, calling 285 MS-DOS functions, interrupting 286, 302 deinstalling 292, 305 described 273 errors, trapping 288 289 examples ALARM.ASM 279 280, 284 SNAP.ASM 293 305 existing data, preserving 290, 303 hardware events, auditing 275 276, 299 interrupt handlers 275 monitoring Critical Error flag 287 system status 277, 300 MS-DOS internal stacks (lists) 286 multiplex interrupt 290, 304 passive 274 Type conversions See INVOKE directive Type of strings See TYPE operator TYPE operator and OPATTR 252 253 arrays, with 108 compatibility 360, 365 records, with 132 string, with 110 structures, with 124 types 86 unions, with 125 TYPEDEF directive aliases, created by 87, 137 BNF, from 380 data types, defining 87 indirect operands, defining 163 pointers, defined by 15, 75 78 procedure declarations 193 procedure prototypes 193 qualifiedtypes 16 TYPEDEF, used with PTR operator See PTR operator Types, data See Data types
U
Unconditional jumps 162 UNION directive 118 119, 122, 125 129 Unions arrays as initializers 122 arrays of 124 defined 117 fields 119, 127 129 memory allocation 117 Unions (continued) nested 128 129 operators 125 referencing fields in 126 steps for using 118 strings as initializers 122 types 118 variables 121, 127 Unpacked BCD numbers 160 Unsegmented architecture 5 Unsigned data 91 .UNTIL directive 173 .UNTILCXZ directive 173 USE16 operand 44 46 USE32 operand 44 46 USES in PROC statement 184 Utilities IMPLIB 258 MASM 342 ML xvi
V
VARARG keyword macros, used in 242, 249, 351 procedures, used with 186 188, 194 Variables assembly-time 233 communal 217 environment 10, 213, 222 external 217, 369 floating-point 136 138 global 211 initializing 87 integers, allocating memory for 85 86 local address, loading 82 naming restrictions 9 Virtual memory 5 Virtual-86 mode 2, 335 Visibility PROC statement 25, 185 scope, within 9
W
WDEB386 debugger 264 WEP (Windows Exit Procedure) 263 264, 270 .WHILE directive 173 WHILE directive 241 WIDTH operator 133 Windows operating system API 257, 262
applications 258, 261 DLLs 261 Windows operating system (continued) exit codes 263 MS-DOS, compared 4 programming for 4 protected mode 2, 6 SDK 268 task header 265, 269 Windows NT 3 5 WORD align type 45 WORD directive 86 Word size default 13, 363, 373 expressions, in 13, 26 @WordSize predefined symbol 39 Words, reserved See Reserved words
X
XCHG instruction 90 /X command-line option, ML 213 XLAT instruction 116 XLATB instruction 116 XOR instruction 27, 99 100
Z
ZERO? flag as operand 178 /Zm command-line option, ML 62, 119 /Zp command-line option, ML 119
Documentation Feedback Microsoft Macro Assembler Version 6.1

Please help us improve our documentation. When you have used MASM 6.1 for a while, please complete and return this form. Use the back of the form for additional suggestions and comments. Suggestions and comments become the property of Microsoft Corporation.
Rate each component of the document set from 1 (never use) to 5 (often use):
Getting Started 1 2 3 4 5
Programmer's Guide 1 2 3 4 5
Environment and Tools 1 2 3 4 5
Reference 1 2 3 4 5
Online Help 1 2 3 4 5
What one thing would you like to see added to or removed from each component?
Getting Started _________________________________________________
Programmer's Guide _________________________________________________
Environment and Tools _________________________________________________
Reference _________________________________________________
Online Help _________________________________________________
How many years of programming experience do you have?
With assembly language
With other programming languages (including application macro languages)
List what stands out most about each component: List the one or two things that you like, dislike, or both. Or, use the space to comment on the rating you gave the component in the previous section.
Getting Started __________________________________________________
Programmer's Guide __________________________________________________
Environment and Tools __________________________________________________
Reference __________________________________________________
Online Help __________________________________________________
Filename: LMAPGDFD.DOC Project: Documentation Feedback Form MASM 6.1 Template: FEEDBACK.DOT Author: Ruth L Silverio Last Saved By: Mike Eddy Revision #: 5 Page: 1 of 1 Printed: 10/02/00 04:21 PM
Name _____________________________________________
Address ___________________________________________
City/State/Zip _______________________________________
Home Phone ( _____ ) ________________________________
Work Phone ( _____ ) ________________________________
May we contact you for additional information about your comments? Yes No
Additional comments:
......................................................................................................................Fold......................................................................................................................
Place stamp here. Post Office will not deliver without proper postage.
Microsoft Corporation Languages MASM 6.1 One Microsoft Way Redmond WA 98052-9953
Microsoft MASM
Filename: LMAETTTL.DOC Project: Template: FRONTA1.DOT Author: Mike Eddy Last Saved By: Mike Eddy Revision #: 4 Page: 1 of 1 Printed: 10/09/00 03:00 PM
Microsoft, MS, MS-DOS, XENIX, CodeView, and QuickC are registered trademarks and Windows and Windows NT are trademarks of Microsoft Corporation in the USA and other countries. U.S. Patent No. 4955066. IBM is a registered trademark of International Business Machines Corporation. Intel is a registered trademark of Intel Corporation. UNIX is a registered trademark of American Telephone and Telegraph Company. BRIEF is a registered trademark of SDC Software Partners II L.P. Printed in the United States of America.
Document No. DB35751-1292
Contents
Contents Overview
Introduction
Part 1 The Programmer's WorkBench
    Chapter 1 Introducing the Programmer's WorkBench
    Chapter 2 Quick Start
    Chapter 3 Managing Multimodule Programs
    Chapter 4 User Interface Details
    Chapter 5 Advanced PWB Techniques
    Chapter 6 Customizing PWB
    Chapter 7 Programmer's WorkBench Reference
Part 2 The CodeView Debugger
    Chapter 8 Getting Started with CodeView
    Chapter 9 The CodeView Environment
    Chapter 10 Special Topics
    Chapter 11 Using Expressions in CodeView
    Chapter 12 CodeView Reference
Part 3 Compiling and Linking
    Chapter 13 Linking Object Files with LINK
    Chapter 14 Creating Module-Definition Files
    Chapter 15 Using EXEHDR
Part 4 Utilities
    Chapter 16 Managing Projects with NMAKE
    Chapter 17 Managing Libraries with LIB
    Chapter 18 Creating Help Files with HELPMAKE
    Chapter 19 Browser Utilities
    Chapter 20 Using Other Utilities
Part 5 Using Help
    Chapter 21 Using Help
Appendixes
    Appendix A Error Messages
    Appendix B Regular Expressions
Glossary
Index
Contents
Introduction
    Scope and Organization of This Book
    Microsoft Support Services
    Support Services Within the United States
    Support Services Worldwide
    Document Conventions
Part 1 The Programmer's WorkBench

Chapter 1 Introducing the Programmer's WorkBench
    What's in Part 1
    Using the Tutorial
    Conventions in the Tutorial
Chapter 2 Quick Start
    The PWB Environment
    The Microsoft Advisor
    Entering Text
    Saving a File
    Indenting Text with PWB
    Opening an Existing File
    Copying, Pasting, and Deleting Text
    Single-Module Builds
    Setting Build Options
    Setting Other Options
    Building the Program
    Fixing Build Errors
    Running the Program
    Debugging the Program
    Using CodeView to Isolate an Error
    Working Through a Program to Debug it
    Examining Memory in the Memory Window
    Where to Go from Here
Chapter 3 Managing Multimodule Programs
    Multimodule Program Example
    Opening the Project
    Contents of a Project
    Dependencies in a Project
    Building a Multimodule Program
    Running the Program
    Project Maintenance
    Using Existing Projects
    Adding and Deleting a Project File
    Changing Assembler and Linker Options
    Changing Options for Individual Modules
    The Program Build Process
    Extending a PWB Project
    Using a Non-PWB Makefile
    Where to Go from Here
Chapter 4 User Interface Details
    Starting PWB
    From the Command Line
    Using the Windows Operating System Program Manager
    Using the Windows Operating System File Manager
    The PWB Screen
    PWB Menus
    File
    Edit
    Search
    Project
    Run
    Options
    Browse
    Window
    Help
    Executing Commands
    Choosing Menu Commands
    Shortcut Keys
    Buttons
    Dialog Boxes
Chapter 5 Advanced PWB Techniques
    Searching with PWB
    Searching by Visual Inspection
    Using the Find Command
    Using Regular Expressions
    Using the Source Browser
    Advanced Browser Database Information
    Executing Functions and Macros
    Executing Functions and Macros by Name
    Writing PWB Macros
    When Is a Macro Useful?
    Recording Macros
    Flow Control Statements
    User Input Statements
Chapter 6 Customizing PWB
    Changing Key Assignments
    Changing Settings
    Customizing Colors
    Adding Commands to the Run Menu
    How PWB Handles Tabs
    PWB Configuration
    Autoloading Extensions
    The TOOLS.INI File
    TOOLS.INI Statement Syntax
    Environment Variables
    Current Status File CURRENT.STS
    Project Status Files
Chapter 7 Programmer's WorkBench Reference
    PWB Command Line
    PWB Menus and Keys
    PWB Default Key Assignments
    Note on Available Keys
    PWB Functions
    Cursor-Movement Commands
    Predefined PWB Macros
    PWB Switches
    Extension Switches
    Filename-Parts Syntax
    Boolean Switch Syntax
    Browser Switches
    Help Switches
Part 2 The CodeView Debugger

Chapter 8 Getting Started with CodeView
    Preparing Programs for Debugging
    General Programming Considerations
    Compiling and Linking
    Debugging Strategies
    Identifying the Bug
    Locating the Bug
    Setting up CodeView
    CodeView Files
    Configuring CodeView with TOOLS.INI
    CodeView TOOLS.INI Entries
    Memory Management and CodeView
    The CodeView Command Line
    Leaving CodeView
    Command-Line Options
    The CURRENT.STS State File
Chapter 9 The CodeView Environment
    The CodeView Display
    The Menu Bar
    The Window Area
    The Status Bar
    CodeView Windows
    How to Use CodeView Windows
    The Source Windows
    The Watch Window
    The Command Window
    The Local Window
    The Register Window
    The 8087 Window
    The Memory Windows
    The Help Window
    CodeView Menus
    The File Menu
    The Edit Menu
    The Search Menu
    The Run Menu
    The Data Menu
    The Options Menu
    The Calls Menu
    The Windows Menu
    The Help Menu
Chapter 10 Special Topics
    Debugging in the Windows Operating System
    Comparing CVW with CV
    Preparing to Run CVW
    Starting a Debugging Session
    CVW Commands
    CVW Debugging Techniques
    Debugging P-Code
    Requirements
    Preparing Programs
    P-Code Debugging Techniques
    P-Code Debugging Limitations
    Remote Debugging
    Requirements
    Remote Monitor Command-Line Syntax
    Starting a Remote Debugging Session
Chapter 11 Using Expressions in CodeView
    Common Elements
    Line Numbers
    Registers
    Addresses
    Address Ranges
    Choosing an Expression Evaluator
    Using the C and C++ Expression Evaluators
    Additional Operators
    Unsupported Operators
    Restrictions and Special Considerations
    The Context Operator
    Numeric Constants
    String Literals
    Symbol Formats
    Using C++ Expressions
    Access Control
    Ambiguous References
    Inheritance
    Constructors, Destructors, and Conversions
    Overloading
    Operator Functions
    Debugging Assembly Language
    Memory Operators
    Register Indirection
    Register Indirection with Displacement
    Address of a Variable
    PTR Operator
    Strings
    Array and Structure Elements
Chapter 12 CodeView Reference
    CodeView Command Overview
    CodeView Command Reference
Part 3 Compiling and Linking

Chapter 13 Linking Object Files with LINK
    New Features
    Overview
    LINK Output Files
    LINK Syntax and Input
    The objfiles Field
    The exefile Field
    The mapfile Field
    The libraries Field
    The deffile Field
    Examples
    Running LINK
    Specifying Input with LINK Prompts
    Specifying Input in a Response File
    LINK Options
    Specifying Options
    The /ALIGN Option
    The /BATCH Option
    The /CO Option
    The /CPARM Option
    The /DOSSEG Option
    The /DSALLOC Option
    The /DYNAMIC Option
    The /EXEPACK Option
    The /FARCALL Option
    The /HELP Option
    The /HIGH Option
    The /INFO Option
    The /LINE Option
    The /MAP Option
    The /NOD Option
    The /NOE Option
    The /NOFARCALL Option
    The /NOGROUP Option
    The /NOI Option
    The /NOLOGO Option
    The /NONULLS Option
    The /NOPACKC Option
    The /NOPACKF Option
    The /OLDOVERLAY Option
    The /ONERROR Option
    The /OV Option
    The /PACKC Option
    The /PACKD Option
    The /PACKF Option
    The /PAUSE Option
    The /PCODE Option
    The /PM Option
    The /Q Option
    The /r Option
    The /SEG Option
    The /STACK Option
    The /TINY Option
    The /W Option
    The /? Option
    Setting Options with the LINK Environment Variable
    Setting the LINK Environment Variable
    Behavior of the LINK Environment Variable
    Clearing the LINK Environment Variable
LINK Temporary Files
LINK Exit Codes

Chapter 14 Creating Module-Definition Files
New Features
MS-DOS Programs
Statements
Overlays
Overview
Module Statements
Syntax Rules
The NAME Statement
The LIBRARY Statement
The DESCRIPTION Statement
The STUB Statement
The APPLOADER Statement
The EXETYPE Statement
The PROTMODE Statement
The REALMODE Statement
The STACKSIZE Statement
The HEAPSIZE Statement
The CODE Statement
The DATA Statement
The SEGMENTS Statement
CODE, DATA, and SEGMENTS Attributes
The OLD Statement
The EXPORTS Statement
The IMPORTS Statement
The FUNCTIONS Statement
The INCLUDE Statement
Reserved Words

Chapter 15 Using EXEHDR
Running EXEHDR
The EXEHDR Command Line
EXEHDR Options
Executable-File Format
EXEHDR Output: MS-DOS Executable File
EXEHDR Output: Segmented-Executable File
DLL Header Differences
Segment Table
Exports Table
EXEHDR Output: Verbose Output
MS-DOS Header Information
New .EXE Header Information
Tables
Relocations
Part 4 Utilities
Chapter 16 Managing Projects with NMAKE
New Features
Overview
Running NMAKE
Command-Line Options
NMAKE Command File
The TOOLS.INI File
Contents of a Makefile
Using Special Characters as Literals
Wildcards
Comments
Long Filenames
Description Blocks
Dependency Line
Targets
Dependents
Commands
Command Syntax
Command Modifiers
Exit Codes from Commands
Filename-Parts Syntax
Inline Files
Macros
User-Defined Macros
Using Macros
Special Macros
Substitution Within Macros
Substitution Within Predefined Macros
Environment-Variable Macros
Inherited Macros
Precedence Among Macro Definitions
Inference Rules
Inference Rule Syntax
Inference Rule Search Paths
User-Defined Inference Rules
Predefined Inference Rules
Inferred Dependents
Precedence Among Inference Rules
Directives
Dot Directives
Preprocessing Directives
Sequence of NMAKE Operations
A Sample NMAKE Makefile
NMAKE Exit Codes

Chapter 17 Managing Libraries with LIB
Overview
Running LIB
The LIB Command Line
LIB Command Prompts
The LIB Response File
Specifying LIB Fields
The Library File
LIB Options
LIB Commands
The Cross-reference Listing
The Output Library
Examples
LIB Exit Codes

Chapter 18 Creating Help Files with HELPMAKE
Overview
Running HELPMAKE
Encoding
Decoding
Getting Help
Other Options
Source File Formats
Elements of a Help Source File
Defining a Topic
Creating Links to Other Topics
Formatting Topic Text
Dot and Colon Commands
Other Help Text Formats
Rich Text Format
Minimally Formatted ASCII
Context Prefixes

Chapter 19 Browser Utilities
Overview of Database Building
Preparing to Build a Database
How BSCMAKE Builds a Database
Methods for Increasing Efficiency
BSCMAKE
System Requirements for BSCMAKE
The BSCMAKE Command Line
BSCMAKE Options
Using a Response File
BSCMAKE Exit Codes
SBRPACK
Overview of SBRPACK
The SBRPACK Command Line
SBRPACK Exit Codes
CREF
Using CREF
Difference from Previous Releases

Chapter 20 Using Other Utilities
CVPACK
Overview of CVPACK
The CVPACK Command Line
CVPACK Exit Codes
H2INC
Basic H2INC Operation
H2INC Syntax and Options
Converting Data and Data Structures
Converting Function Prototypes
Summary of H2INC-Recognized Keywords and Pragmas
IMPLIB
About Import Libraries
The IMPLIB Command Line
Options
RM, UNDEL, and EXP
Overview of the Backup Utilities
The RM Utility
The UNDEL Utility
The EXP Utility
WX/WXServer
Running WX/WXServer
Part 5 Using Help

Chapter 21 Using Help
Structure of the Microsoft Advisor
Navigating Through the Microsoft Advisor
Using the Help Menu
Using the Mouse and the F1 Key
Using Hyperlinks
Using Help Windows and Dialog Boxes
Accessing Different Types of Information
Using Different Help Screens
Using Help in PWB
Opening a Help File
Global Search
Using QuickHelp
Using the /Help Option
Using the QH Command
Managing Help Files
Managing Many Help Files
Appendixes
Appendix A Error Messages
Error Message Lists
BSCMAKE Error Messages
CodeView C/C++ Expression Evaluator Errors
CodeView Error Messages
CVPACK Error Messages
EXEHDR Error Messages
Math Coprocessor Error Messages
H2INC Error Messages
HELPMAKE Error Messages
IMPLIB Error Messages
LIB Error Messages
LINK Error Messages
ML Error Messages
NMAKE Error Messages
PWB Error Messages
SBRPACK Error Messages

Appendix B Regular Expressions
Regular-Expression Summaries
UNIX Regular-Expression Syntax
Tagged Regular Expressions
Tagged Expressions in Build:Message
Justifying Tagged Expressions
Predefined Regular Expressions
Non-UNIX Regular-Expression Syntax
Non-UNIX Matching Method

Glossary
Index
Figures and Tables

Figures
Figure 2.1 PWB Display
Figure 0.1 The SHOW Project
Figure 0.2 The PWB Build Process
Figure 0.3 User Interface Elements
Figure 0.4 Window Elements
Figure 0.5 Status Bar Elements
Figure 0.6 PWB Menu Elements
Figure 0.7 Dialog Box Elements
Figure 0.8 Key Box and Check Box
Figure 0.9 Regular Expression Example
Figure 0.10 Complex Regular Expression Example
Figure 0.11 How PWB Displays Tabs
Figure 0.12 Arranged Windows
Figure 0.13 Vertical Tiling
Figure 0.14 Horizontal Tiling
Figure 0.15 CodeView Display
Figure 0.16 Format for a Segmented-Executable File
Figure 0.17 NMAKE Description Block
Figure 0.18 Microsoft Advisor Global Contents Screen

Tables
Table 0.1 File Menu and Keys
Table 0.2 Edit Menu and Keys
Table 0.3 Search Menu and Keys
Table 0.4 Project Menu and Keys
Table 0.5 Run Menu and Keys
Table 0.6 Browse Menu and Keys
Table 0.7 Window Menu and Keys
Table 0.8 Help Menu and Keys
Table 0.9 PWB Default Key Assignments
Table 0.10 PWB Functions
Table 0.11 Cursor-Movement Commands
Table 0.12 PWB Macros
Table 0.13 PWB Color Names
Table 0.14 PWB Color Values
Table 0.1 CodeView TOOLS.INI Entries
Table 0.2 CodeView Command-Line Options
Table 0.1 Moving Around with the Keyboard
Table 0.1 Registers
Table 0.1 Register Names
Table 0.2 CodeView Command Summary
Table 0.1 Module Statements
Table 0.1 Predefined Inference Rules
Table 0.2 Binary Operators for Preprocessing
Table 0.2 Formatting Attributes
Table 0.3 Dot and Colon Commands
Table 0.4 RTF Formatting Codes
Table 0.5 Microsoft Product Context Prefixes
Table 0.6 Standard h. Contexts
Table A.1 Error Codes Listed by Utility
Table A.2 Error Codes Listed by Error Code Range
Table .1 UNIX Regular-Expression Summary
Table .2 UNIX Predefined Expressions
Table .3 CodeView Regular Expressions
Table .4 Non-UNIX Regular-Expression Summary
Table .5 Non-UNIX Predefined Expressions
Table .6 UNIX Regular-Expression Syntax
Table .7 Predefined Regular Expressions and Definitions
Table .8 Non-UNIX Regular Expression Syntax
Introduction
Microsoft Macro Assembler (MASM) includes a full set of development tools (editor, compiler, linker, debugger, and browser) for writing, compiling, and debugging your programs. You can work within the Microsoft Programmer's WorkBench (PWB) integrated environment, or you can use the tools separately to develop your programs. Environment and Tools describes the following development tools:
• The Programmer's WorkBench (PWB). PWB is a comprehensive tool for application development. Within its environment is everything you need to create, build, browse, and debug your programs. Its macro language gives you control over not only editing but also build operations and other PWB functions.
• The Microsoft CodeView debugger. This is a diagnostic tool for finding errors in your programs. Two versions of CodeView are described: one for MS-DOS and one for Microsoft Windows. Each CodeView version has specialized commands for its operating environment, as well as other commands for examining code and data, setting breakpoints, and controlling your program's execution.
• LINK, the Microsoft Segmented-Executable Linker. The linker combines object files and libraries into an executable file, either an application or a dynamic-link library (DLL).
• EXEHDR, the Microsoft EXE File Header Utility. EXEHDR displays and modifies the contents of an executable-file header.
• NMAKE, the Microsoft Program Maintenance Utility. NMAKE simplifies project maintenance. Once you specify which project files depend on others, you can use NMAKE to automatically execute the commands that will update your project when any file has changed.
• LIB, the Microsoft Library Manager. LIB creates and maintains standard libraries. With LIB, you can create a library file and add, delete, and replace modules.
• HELPMAKE, the Microsoft Help File Maintenance Utility. HELPMAKE creates and maintains Help files. You can use HELPMAKE to create a Help file or to customize the Microsoft Help files.
• BSCMAKE, the Microsoft Browser Database Maintenance Utility, and SBRPACK, the Microsoft Browse Information Compactor. BSCMAKE creates browser files for use with the PWB Source Browser. SBRPACK compresses the files that are used by BSCMAKE.

Filename: LMAETINT.DOC Project: Template: MSGRIDA1.DOT Author: Cris Morris Last Saved By: Mike Eddy Revision #: 19 Page: 21 of 1 Printed: 10/09/00 02:57 PM
Environment and Tools also describes these special-purpose utilities:

• H2INC, the Microsoft C Header Translation Utility. H2INC translates C header files into MASM-compatible include files.
• CVPACK, the Microsoft Debugging Information Compactor. CVPACK compresses the size of debugging information in an executable file.
• IMPLIB, the Microsoft Import Library Manager. IMPLIB creates an import library that resolves external references from a Windows-based application to a DLL.
• RM, the Microsoft File Removal Utility; UNDEL, the Microsoft File Undelete Utility; and EXP, the Microsoft File Expunge Utility. These utilities manage, delete, and recover backup files.
Scope and Organization of This Book

This book has five parts and two appendixes to give you complete information about PWB, CodeView, and the utilities included with MASM.

Part 1 is a brief PWB tutorial and comprehensive reference. The first three chapters introduce PWB and provide a tutorial that describes the features of the integrated environment and how to use them. Chapters 4, 5, and 6 contain detailed information on the interface, advanced PWB techniques, and customization. Chapter 7 contains a complete reference to PWB's default keys and all functions, predefined macros, and switches.

Part 2 provides full information on the Microsoft CodeView debugger. Chapter 8 tells how to prepare programs for debugging, how to start CodeView, and how to customize CodeView's interface and memory usage. Chapter 9 describes the environment, including the CodeView menu commands and the format and use of each CodeView window. Chapter 10 describes techniques for debugging Windows-based programs. Chapter 11 explains how to use expressions, including the C and C++ expression evaluators. Chapter 12 contains a complete reference to CodeView commands.

The chapters in Parts 3 and 4 describe the utilities. These chapters are principally for command-line users. Even if you're using PWB, however, you may find the detailed information in Parts 3 and 4 helpful for a better understanding of how each tool contributes to the program development process.

Part 3 provides information about compiling and linking your program. LINK command-line syntax and options are covered in Chapter 13. The contents and use of module-definition files are explained in Chapter 14. Chapter 15 describes how to use EXEHDR to examine the file header of a program.

Part 4 presents the other utilities. NMAKE, the utility for automating project management, is described in Chapter 16. Chapter 17 covers LIB, used in managing standard libraries. Procedures for using HELPMAKE to create and maintain Help files are in Chapter 18. The tools for creating a browser database are discussed in Chapter 19. Finally, Chapter 20 describes how to use the following special-purpose utilities: H2INC, CVPACK, IMPLIB, RM, UNDEL, and EXP.

Part 5 presents the Microsoft Advisor Help system and the QuickHelp program. It describes the structure of the Help files, how to navigate through the Help system, and how to manage Help files.

The appendixes provide supplementary information. Appendix A describes error messages. Appendix B describes regular expressions for use in PWB and CodeView.
Microsoft Support Services

Microsoft offers a variety of no-charge and fee-based support options to help you get the most from your Microsoft product. For an explanation of these options, please see one of the following sections:
• If you are in the United States, see "Support Services Within the United States."
• If you are outside the United States, see "Support Services Worldwide."
Support Services Within the United States

If you have a question about Microsoft Macro Assembler (MASM), one of the following resources may help you find an answer:
• The index in the product documentation and other printed product documentation.
• Context-sensitive online Help available from the Help menu.
• The README files that come with your product disks. These files provide general information that became available after the books in the product package were published.
• Electronic options such as CompuServe forums or bulletin board systems, if available.
If you cannot find the information you need, you can obtain product support through several methods. In addition, you can locate training and consultation services in your area.

For information about Microsoft incremental fee-based support service options, call Microsoft Inside Sales at (800) 227-4679, Monday through Friday, between 6:30 a.m. and 5:30 p.m. Pacific time.

Note: Microsoft's support services are subject to Microsoft's prices, terms, and conditions in place in each country at the time the services are used.
Microsoft Forums on CompuServe

Microsoft Product Support Services are available on several CompuServe forums. For an introductory CompuServe membership kit specifically for Microsoft users, dial (800) 848-8199 and ask for operator 230. If you are already a CompuServe member, type go microsoft at any ! prompt.
Microsoft Product Support Services

You can reach Microsoft Product Support Services Monday through Friday between 6:00 a.m. and 6:00 p.m. Pacific time.
• For assistance with Microsoft MASM, dial (206) 646-5109.
When you call, you should be at your computer with Microsoft MASM running and the product documentation at hand. Have your file open and be prepared to give the following information:
• The version of Microsoft MASM you are using.
• The type of hardware you are using, including network hardware, if applicable.
• The operating system you are using.
• The exact wording of any messages that appeared on your screen.
• A description of what happened and what you were trying to do when the problem occurred.
• A description of how you tried to solve the problem.
Microsoft Product Support for the Deaf and Hard-of-Hearing

Microsoft Product Support Services are available for the deaf and hard-of-hearing Monday through Friday between 6:00 a.m. and 6:00 p.m. Pacific time. Using a special TDD/TT modem, dial (206) 635-4948.
Product Training and Consultation Services

Within the United States, Microsoft offers the following services for training and consultation:
Authorized Training Centers

Microsoft Authorized Training Centers offer several services for Microsoft product users. These include:
• Customized training for users and trainers.
• Training material development.
• Consulting services.
For information about the training center nearest you, call Microsoft Consumer Sales at (800) 426-9400 Monday through Friday between 6:30 a.m. and 5:30 p.m. Pacific time.
Consultant Referral Service

Microsoft's Consultant Relations Program can refer you to an independent consultant in your area. These consultants are skilled in:
• Macro development and translation.
• Database development.
• Custom interface design.
For information about the consultants in your area, call the Microsoft Consultant Relations Program at (800) 227-4679, extension 56042, Monday through Friday between 6:30 a.m. and 5:30 p.m. Pacific time.
Support Services Worldwide

If you are outside the United States and have a question about Microsoft MASM, Microsoft offers a variety of no-charge and fee-based support options. To solve your problem, you can:
• Consult the index in the product documentation and other printed product documentation.
• Check context-sensitive online Help available from the Help menu.
• Check the README files that come with your product disks. These files provide general information that became available after the books in the product package were published.
• Consult electronic options such as CompuServe forums or bulletin board systems, if available.
If you cannot find a solution, you can receive information on how to obtain product support by contacting the Microsoft subsidiary office that serves your country.

Note: Microsoft's support services are subject to Microsoft's prices, terms, and conditions in place in each country at the time the services are used.
Calling a Microsoft Subsidiary Office

When you call, you should be at your computer with Microsoft MASM running and the product documentation at hand. Have your file open and be prepared to give the following information:
• The version of Microsoft MASM you are using.
• The type of hardware you are using, including network hardware, if applicable.
• The operating system you are using.
• The exact wording of any messages that appeared on your screen.
• A description of what happened and what you were trying to do when the problem occurred.
• A description of how you tried to solve the problem.
Microsoft subsidiary offices and the countries they serve are listed below.
Area / Telephone Numbers

Argentina
  Microsoft de Argentina S.A.
  Phone: (54) (1) 814-0356
  Fax: (54) (1) 814-0372

Australia
  Microsoft Pty. Ltd.
  Phone: (61) (02) 870-2200
  Fax: (02) 805-1108
  Bulletin Board Service: (612) 870-2348
  Technical Support: (61) (02) 870-2131
  Sales Information Centre: (02) 870-2100

Austria
  Microsoft Ges.m.b.H.
  Phone: 0222 - 68 76 07
  Fax: 0222 - 68 16 2710
  Information: 060 - 89 - 247 11 101
  Prices, updates, etc.: 060 - 89 - 3176 1199
  CompuServe: msce (Microsoft Central Europe)
  Technical support:
    Windows, Windows for Workgroups, Microsoft Mail: 0660 - 65 - 10
    Microsoft Excel for Windows, Microsoft Excel for OS/2, PowerPoint for Windows: 0660 - 65 - 11
    Word for MS-DOS, Windows Write: 0660 - 65 - 12
    Word for Windows, Word for OS/2: 0660 - 65 - 13
    Works for MS-DOS, Works for Windows, Publisher, WorksCalc, WorksText: 0660 - 65 - 14
    C PDS, FORTRAN PDS, Pascal, Macro Assembler PDS, QuickC, QuickC for Windows, QuickPascal, QuickAssembler, Profiler: 0660 - 65 - 15
    COBOL PDS, Basic PDS, QuickBASIC, Visual Basic: 0660 - 65 - 16
    MS-DOS: 0660 - 65 - 17
    Macintosh Software: 0660 - 65 - 18
    Project for Windows, Project for MS-DOS, Multiplan, Mouse, Flight Simulator, Paintbrush, Chart: 0660 - 67 - 38
    FoxPro: 0660 - 67 - 61

Baltic States
  See Germany

Belgium
  Microsoft NV
  Phone: 02-7322590
  Fax: 02-7351609
  Technical Support Bulletin Board Service: 02-7350045 (1200/2400/9600 baud, 8 bits, no parity, 1 stop bit, ANSI terminal emulation)
  (Dutch speaking) Technical Support: 02-5133274
  (English speaking) Technical Support: 02-5023432
  (French speaking) Technical Support: 02-5132268
  Technical Support Fax: (31) 2503-24304

Bermuda
  See Venezuela

Bolivia
  See Argentina
Brazil
  Microsoft Informatica Ltda.
  Phone: (55) (11) 530-4455
  Fax: (55) (11) 240-2205
  Technical Support Phone: (55) (11) 533-2922
  Technical Support Fax: (55) (11) 241-1157
  Technical Support Bulletin Board Service: (55) (11) 543-9257

Canada
  Microsoft Canada Inc.
  Phone: 1 (416) 568-0434
  Fax: 1 (416) 568-4689
  Technical Support Phone: 1 (416) 568-3503
  Technical Support Facsimile: 1 (416) 568-4689
  Technical Support Bulletin Board Service: 1 (416) 507-3022

Caribbean Countries
  See Venezuela

Central America
  See Venezuela

Chile
  See Argentina

Colombia
  See Venezuela

Denmark
  Microsoft Denmark AS
  Phone: (45) (44) 89 01 00
  Fax: (45) (44) 68 55 10

Ecuador
  See Venezuela

England
  Microsoft Limited
  Phone: (44) (734) 270000
  Fax: (44) (734) 270002
  Upgrades: (44) (81) 893-8000
  Technical Support:
    Main Line (All Products): (44) (734) 271000
    Windows Direct Support Line: (44) (734) 271001
    Database Direct Support Line: (44) (734) 271126
    MS-DOS 5 Warranty Support: (44) (734) 271900
    MS-DOS 5 Fee Support Line: (44) (891) 315500
    OnLine Service Assistance: (44) (734) 270374
  Bulletin Board Service: (44) (734) 270065 (2400 Baud)
  Fax Information Service: (44) (734) 270080

Finland
  Microsoft OY
  Phone: (358) (0) 525 501
  Fax: (358) (0) 522 955

France
  Microsoft France
  Phone: (33) (1) 69-86-46-46
  Telex: MSPARIS 604322F
  Fax: (33) (1) 64-46-06-60
  Technical Support Phone: (33) (1) 69-86-10-20
  Technical Support Fax: (33) (1) 69-28-00-28
French Polynesia
  See France

Germany
  Microsoft GmbH
  Phone: 089 - 3176-0
  Telex: (17) 89 83 28 MS GMBH D
  Fax: 089 - 3176-1000
  Information: 0130 - 5099
  Prices, updates, etc.: 089 - 3176 1199
  Bulletin board, device drivers, tech notes: BTX: microsoft# or *610808000#
  CompuServe: msce (Microsoft Central Europe)
  Technical support:
    Windows, Windows for Workgroups, Microsoft Mail: 089 - 3176 - 1110
    Microsoft Excel for Windows, Microsoft Excel for OS/2, PowerPoint for Windows: 089 - 3176 - 1120
    Word for MS-DOS, Windows Write: 089 - 3176 - 1130
    Word for Windows, Word for OS/2: 089 - 3176 - 1131
    Works for MS-DOS, Works for Windows, Publisher, WorksCalc, WorksText: 089 - 3176 - 1140
    C PDS, FORTRAN PDS, Pascal, Macro Assembler PDS, QuickC, QuickC for Windows, QuickPascal, QuickAssembler, Profiler: 089 - 3176 - 1150
    COBOL PDS, Basic PDS, QuickBASIC, Visual Basic: 089 - 3176 - 1151
    MS-DOS: 089 - 3176 - 1152
    Macintosh Software: 089 - 3176 - 1160
    Project for Windows, Project for MS-DOS, Multiplan, Mouse, Flight Simulator, Paintbrush, Chart: 089 - 3176 - 1170
    FoxPro: 089 - 3176 - 1180

Hong Kong
  Microsoft Hong Kong Limited
  Technical Support: (852) 804-4222

Ireland
  See England

Israel
  Microsoft Israel Ltd.
  Phone: 972-3-752-7915
  Fax: 972-3-752-7919

Italy
  Microsoft SpA
  Phone: (39) (2) 269121
  Telex: 340321 I
  Fax: (39) (2) 21072020
  Technical Support:
    Microsoft Excel for Windows, Project for Windows, Works for Windows: (39) (2) 26901361
    Word, Works for MS-DOS: (39) (2) 26901362
    Windows, PowerPoint, Publisher, Windows for Workgroups, Works: (39) (2) 26901363
    Basic, COBOL, Visual Basic, MS-DOS-based, Fox Products: (39) (2) 26901364
    C, FORTRAN, Pascal, Macro Assembler (MASM), and SDKs: (39) (2) 26901354
    LAN Manager, SQL Server, Microsoft Mail, Microsoft Mail Gateways: (39) (2) 26901356
Japan
  Microsoft Company Ltd.
  Phone: (81) (3) 3363-1200
  Fax: (81) (3) 3363-1281
  Technical Support:
    MS-DOS-based Applications: (81) (3) 3363-0160
    Windows-based Applications: (81) (3) 3363-5040
    Language Products (Microsoft C, Macro Assembler [MASM], QuickC): (81) (3) 3363-7610
    Language Products (Basic, FORTRAN, Visual Basic, Quick Basic): (81) (3) 3363-0170
    All Products Technical Support Fax: (81) (3) 3363-9901

Korea
  Microsoft CH
  Phone: (82) (2) 552-9505
  Fax: (82) (2) 555-1724
  Technical Support: (82) (2) 563-9230

Liechtenstein
  See Switzerland (German speaking)

Luxemburg
  Microsoft NV
  Phone: (32) 2-7322590
  Fax: (32) 2-7351609
  Technical Support Bulletin Board Service: (31) 2503-34221 (1200/2400/9600 baud, 8 bits, No parity, 1 stop bit, ANSI terminal emulation)
  (Dutch speaking) Technical Support: (31) 2503-77877
  (English speaking) Technical Support: (31) 2503-77853
  (French speaking) Technical Support: (32) 2-5132268
  Technical Support Fax: (31) 2503-24304

México
  Microsoft México, S.A. de C.V.
  Phone: (52) (5) 325-0910
  Fax: (52) (5) 280-0198
  Technical Support: (52) (5) 325-0912
  Sales: (52) (5) 325-0911

Netherlands
  Microsoft BV
  Phone: 02503-13181
  Fax: 02503-37761
  Technical Support Bulletin Board Service: 02503-34221 (1200/2400/9600 baud, 8 bits, No parity, 1 stop bit, ANSI terminal emulation)
  (Dutch speaking) Technical Support: 02503-77877
  (English speaking) Technical Support: 02503-77853
  Technical Support Fax: 02503-24304

New Zealand
  Technology Link Centre
  Phone: 64 (9) 358-3724
  Fax: 64 (9) 358-3726
  Technical Support Applications: 64 (9) 357-5575

Northern Ireland
  See England
Norway
  Microsoft Norway AS
  Phone: (47) (2) 95 06 65
  Fax: (47) (2) 95 06 64
  Technical Support: (47) (2) 18 35 00

Papua New Guinea
  See Australia

Paraguay
  See Argentina

Peru
  See Venezuela

Portugal
  MSFT, Lda.
  Phone: (351) 1 4412205
  Fax: (351) 1 4412101

Puerto Rico
  See Venezuela

Republic of China
  Microsoft Taiwan Corp.
  Phone: (886) (2) 504-3122
  Fax: (886) (2) 504-3121

Republic of Ireland
  See England

Scotland
  See England

Spain
  Microsoft Iberica SRL
  Phone: (34) (1) 804-0000
  Fax: (34) (1) 803-8310
  Technical Support: (34) (1) 803-9960

Sweden
  Microsoft AB
  Phone: (46) (8) 752 56 00
  Fax: (46) (8) 750 51 58
  Technical Support:
    Applications: (46) (8) 752 68 50
    Development and Network products: (46) (8) 752 60 50
    MS-DOS: (46) (071) 21 05 15 (SEK 4.55/min)
  Sales Support: (46) (8) 752 56 30
  Bulletin Board Service: (46) (8) 750 47 42
  Fax Information Service: (46) (8) 752 29 00
Switzerland
  (German speaking) Microsoft AG
  Phone: 01 - 839 61 11
  Fax: 01 - 831 08 69
  Information: 0049 - 89 - 247 11 101
  Prices, updates, etc.: 0049 - 89 - 3176 1199
  CompuServe: msce (Microsoft Central Europe)
  Technical support:
    Windows, Windows for Workgroups, Microsoft Mail: 01 - 342 - 4085
    Microsoft Excel for Windows, Microsoft Excel for OS/2, PowerPoint for Windows: 01 - 342 - 4082
    Word for MS-DOS, Windows Write: 01 - 342 - 4083
    Word for Windows, Word for OS/2: 01 - 342 - 4087
    Works for MS-DOS, Works for Windows, Publisher, WorksCalc, WorksText: 01 - 342 - 4084
    C PDS, FORTRAN PDS, Pascal, Macro Assembler PDS, QuickC, QuickC for Windows, QuickPascal, QuickAssembler, Profiler: 01 - 342 - 4036
    COBOL PDS, Basic PDS, QuickBASIC, Visual Basic: 01 - 342 - 4086
    MS-DOS: 01 - 342 - 2152
    Macintosh Software: 01 - 342 - 4081
    Project for Windows, Project for MS-DOS, Multiplan, Mouse, Flight Simulator, Paintbrush, Chart: 01 - 342 - 0322
    FoxPro: 01 / 342 - 4121
  (French speaking) Microsoft SA, office Nyon
  Phone: 022 - 363 68 11
  Fax: 022 - 363 69 11
  Technical support: 022 - 738 96 88

Uruguay
  See Argentina

Venezuela
  Corporation MS 90 de Venezuela S.A.
  Phone: 0058.2.914739
  Fax: 0058.2.923835

Wales
  See England
This book uses the following typographic conventions:
Examples: README.TXT, COPY, LINK, /CO
Description: Uppercase (capital) letters indicate filenames, MS-DOS commands, and the commands to run the tools. Uppercase is also used for command-line options, unless the option must be lowercase.

Examples: printf, IMPORT
Description: Bold letters indicate keywords, library functions, reserved words, and CodeView commands. Keywords are required unless enclosed in double brackets as explained below.

Examples: expression
Description: Words in italic are placeholders for information that you must supply (for example, a function argument).

Examples: [[option]]
Description: Items inside double square brackets are optional.

Examples: {choice1 | choice2}
Description: Braces and a vertical bar indicate a choice between two or more items. You must choose one of the items unless all the items are also enclosed in double square brackets.

Examples: CL ONE.C TWO.C
Description: This font is used for program examples, user input, program output, and error messages within the text.

Description: Three horizontal dots following an item indicate that more items having the same form may follow.

Examples: while( ) { . . . }
Description: Three vertical dots following a line of code indicate that part of the example program has intentionally been omitted.

Examples: F1, ALT+A
Description: Small capital letters indicate the names of keys and key sequences, such as ENTER and CTRL+C. A plus (+) indicates a combination of keys. For example, CTRL+E means to hold down the CTRL key while pressing the E key. The cursor-movement keys on the numeric keypad are called ARROW keys. Individual ARROW keys are referred to by the direction of the arrow on the top of the key (LEFT, RIGHT, UP, DOWN). Other keys are referred to by the name on the top of the key (PGUP, PGDN).

Examples: Arg Meta Delete (ALT+A ALT+A SHIFT+DEL)
Description: A bold series of names followed by a series of keys indicates a sequence of PWB functions that you can use in a macro definition, type in a dialog box, or execute directly by pressing the sequence of keys. In this book, these keys are the default keys for the corresponding functions. Some functions are not assigned to a key, and the word "Unassigned" appears in the place of a key. In PWB Help, the current key that is assigned to the function is shown.

Examples: "defined term"
Description: Quotation marks usually indicate a new term defined in the text.

Examples: dynamic-link library (DLL)
Description: Acronyms are usually spelled out the first time they are used.
PART 1
The Programmer's WorkBench

Chapter 1 Introducing the Programmer's WorkBench
Chapter 2 Quick Start
Chapter 3 Managing Multimodule Programs
Chapter 4 User Interface Details
Chapter 5 Advanced PWB Techniques
Chapter 6 Customizing PWB
Chapter 7 Programmer's WorkBench Reference
Filename: LMAETP01.DOC Project: Part opening 1 Template: MSGRIDA1.DOT Author: Terri Sharkey Last Saved By: Mike Eddy Revision #: 9 Page: 1 of 1 Printed: 10/09/00 02:56 PM
CHAPTER 1
Introducing the Programmer's WorkBench
The Microsoft Programmer's WorkBench (PWB) is a powerful tool for application development. PWB combines the following features:
• A full-featured programmer's text editor.
• An extensible build engine that allows you to assemble and link your programs using the PWB environment. The build engine can be extended to support any programming tool.
• Error-message browsing. Once a build completes, you can step through the build messages, fixing errors in your source programs.
• A Source Browser. When working with large systems, it is often difficult to remember where program symbols are accessed and defined. The Source Browser maintains a database that allows you to go quickly to where a given variable, function, type, class, or macro is defined or referenced.
• An extensible Help system. The Microsoft Advisor Help system provides a complete reference on using PWB and MASM. You can also write new Help files and seamlessly integrate them into the Help system to document your own library routines or naming conventions.
• A macro language that can control editing functions, program builds, and other PWB operations.
For increased flexibility, you can write extensions to PWB. These extensions can perform tasks that are inconvenient in the PWB macro language. For example, you can write extensions to perform file translations, source-code formatting, text justification, and so on. As with the macro language, PWB extensions have full access to most PWB capabilities. For information about how to write PWB extensions, see the Microsoft Advisor Help system (choose "PWB Extensions" from the main Help table of contents).

PWB comes with extensions for C/C++, Basic, and Fortran, in addition to assembly language, to facilitate mixed-language programming. To install one of these extensions, simply rename its corresponding .XXT file to a .MXT file in the \BIN subdirectory where you installed MASM, as described in Getting Started. Also, because an increasing number of programmers are using C++, the PWB Browser extension supports classes.

Filename: LMAETC01.DOC Template: MSGRIDA1.DOT Revision #: 47 Page: 3 of 1
Project: Environment and Tools Author: Cris Morris Last Saved By: Mike Eddy Printed: 10/09/00 02:48 PM
What's in Part 1
This part of the book introduces you to the fundamentals of PWB. Chapter 2, "Quick Start," shows you how to use the PWB editor and build a simple single-module program from PWB. Chapter 3, "Managing Multimodule Programs," expands upon the information you learned in Chapter 2. It teaches you how to build a more complicated program that consists of several modules. You should be able to work through these two chapters in less than three hours.

As you work through these chapters, you may want to refer to Chapter 4, "User Interface Details," which explains options for starting PWB, briefly describes all of the menu commands, and summarizes how menus and dialog boxes work. The user interface information is presented in one chapter for easy access.

Chapter 5, "Advanced PWB Techniques," shows how to use the PWB search facilities (including searching with regular expressions), how to use the Source Browser, how to execute functions and macros, and how to write PWB macros.

Chapter 6, "Customizing PWB," describes how to redefine key assignments, change PWB settings, add commands to the PWB menu, and use the TOOLS.INI initialization file to store startup and configuration information for PWB.

Chapter 7, "PWB Reference," contains an alphabetical reference to PWB menus, keys, functions, predefined macros, and switches. It contains the essential information you need to know to take the greatest advantage of PWB's richly customizable environment.
Using the Tutorial

You probably want to get right to work with MASM. The tutorial in Chapters 2 and 3 can help you become productive very quickly. To get the most out of this material, here are a few recommendations:
• Follow the steps presented in the tutorial. It is always tempting to explore the system and find out more about the product through independent research. However, just as programming requires an orderly sequence of steps, some aspects of PWB also require sequenced actions.
• If you complete a step and something seems wrong (for example, if your screen doesn't match what is in the book), back up and try to find out what's wrong. Troubleshooting tips will help you take corrective actions.
• When working through this tutorial, consider how you might use these techniques in your own work. PWB is like a full tool chest. You probably won't learn (or even want to learn) all of PWB's capabilities right away. But as time goes on, you'll have uses for many of the tools you don't use immediately.
Conventions in the Tutorial

Procedures described in the course of the tutorial are introduced with headings designated by a triangular symbol. A list of the steps making up the procedure then follows. For example:

To open a file:
1. From the File menu, choose Open. PWB displays the Open File dialog box.
2. In the File List list box, select the file that you want to open.
3. Choose OK.

In procedures, the heading gives you a capsule summary of what the steps will accomplish. Each numbered step is an action you take to complete the procedure. Some steps are followed by an explanation, an illustration, or both.
CHAPTER 2
Quick Start
This chapter gets you started with PWB. You'll learn the basics by building and debugging a C-callable routine that generates a 2-byte pseudo-random number.

Some of the source code that you will be using is included with the sample programs shipped with MASM 6.1. If you chose not to install the sample code when you set up MASM, run SETUP to install it (see Getting Started for more information).

To start PWB in the Windows operating system for this tutorial, double-click the PWB icon in the MASM group. In MS-DOS, type

PWB

at the prompt.

To leave PWB at any time:
From the File menu, choose Exit, or press ALT+F4.
The PWB Environment

If this is the first time you have used PWB, you see the menu bar, the status bar, and an empty desktop (assuming a standard installation). If you have used PWB before, it opens the file you last worked with. PWB uses a windowed environment to present information, get information from you, and allow you to edit programs. The environment has the following components:
• An editor for writing and revising programs
• A build engine, the part of PWB that helps you assemble, link, and execute your programs from within the environment
• A source-code browser
• Commands for program execution and debugging
• The Microsoft Advisor Help system

Filename: LMAET02A.DOC Template: MSGRIDA1.DOT Revision #: 71 Page: 7 of 1
Project: Author: Harold S. Henry Last Saved By: Mike Eddy Printed: 10/09/00 02:38 PM
The browser and the Help system are dynamically loaded extensions to the PWB platform. Microsoft languages and the utilities are also supported in PWB by extensions. Other extensions are available, such as the Microsoft Source Profiler. PWB presents all of these components through menus and dialog boxes. The following figure shows some parts of the PWB interface.
Figure 2.1 PWB Display
Chapter 4, "User Interface Details," contains a thorough description of these elements and the rest of the PWB environment. Refer to this chapter when you need specific information about an unfamiliar interface element.
The Microsoft Advisor

PWB makes programming easier by providing the Microsoft Advisor Help system, which contains comprehensive information about:
• PWB editing functions
• PWB advanced features
• PWB menus and dialog boxes
• CodeView debugger
• Intel 80x86 assembly language
• MASM 6.1 assembler options
• Microsoft utilities (such as NMAKE, LINK, and so on)
The Advisor provides context-sensitive Help and general Help. Context-sensitive Help provides information about the menu, dialog box, or language element at the cursor. To see context-sensitive Help, you can simply point to an item on the screen and press either the right mouse button or the F1 key. PWB displays a Help window showing the requested information. You can also get context-sensitive Help and more general Help by using the Help menu.

To answer questions of a less specific nature, you can access the Contents screen by choosing Contents from the Help menu or by pressing SHIFT+F1. From the Advisor contents, you can access Help on any other subject in the database.

To get started using the Microsoft Advisor:
From the Help menu, choose the Help on Help command. Help on Help teaches you how to use the Microsoft Advisor Help system. For more information on using Help, see Chapter 21.

To close the Help window:
Click the upper-left corner of the Help window (the Close box), press ESC, choose Close from the File menu, or press CTRL+F4.

Note: Click the Close box, choose Close from the File menu, or press CTRL+F4 to close any open window in PWB.
The following sections explain basic editing procedures. If you're already familiar with these, you can skip to "Opening an Existing File" on page 14.
Entering Text
In this section, you'll learn basic PWB procedures by entering a simple C-callable assembly-language routine.
To start a new file:
1. Move the mouse cursor (point) to the File menu on the menu bar and click the left button, or press ALT+F from the keyboard. PWB opens the File menu.
2. Point to the New command and click the left button, or press N to choose New. PWB opens a window with the title Untitled.001.
Pressing the ALT key from the keyboard changes focus to the menu bar, and pressing the highlighted key in a menu name opens that menu. Similarly, within a menu, pressing a key highlighted in one of the commands causes that command to be carried out. Using the keyboard, you can also easily move to the beginning of a file by typing CTRL+HOME, or to the end of a file by typing CTRL+END. Starting with your cursor in the upper-left corner of the edit window, type the following comment line:
; C-CALLABLE PSEUDO-RANDOM NUMBER GENERATOR ROUTINE
Your screen should appear as follows:
Saving a File
Now that you've started entering your program, save your work before proceeding.
To save a file: From the File menu, choose Save, or press SHIFT+F2. PWB displays the Save As dialog box.
This dialog box has several options that you use to pass information to PWB. PWB indicates the active option (in this case, the File Name text box) by highlighting the area in which you can enter text. For more information about dialog boxes, see Chapter 4, "User Interface Details." Because you have not yet saved the file, it still has the name Untitled.001. Type ONEOF.ASM in the File Name text box. Then click OK or press ENTER to save the file (if you want, you can first select the directory where the file will be saved, using the Drives / Dirs list box).
Note Now that you have named your file, choosing Save from the File menu does not bring up a dialog box. Your file is immediately saved to disk.
Indenting Text with PWB

Most assembly-language programmers format their code in several text columns (for example, a label column, an instruction column, a parameter column, and a comment column). You can create these columns differently in PWB than in other text editors. In PWB, you can move the cursor (point) to any position on the screen and start typing text. PWB will take care of inserting whatever new lines, spaces, or tabs are necessary to place the text in the position you are typing. By setting options, you can determine whether PWB will use spaces or tab characters to create the necessary white space (see How PWB Handles Tabs on page 118).
Type the following comment lines to document the routine:

; unsigned int OneOf ( unsigned int range )
; ----------------------------------------------------------------
; Routine uses a linear congruential method to calculate
; a pseudo-random number, treats the number as a fraction
; between 0 and 1, multiplies it times the range,
; truncates the result to an integer, and returns it.
;
; Algorithm: a[i] = ( ( a[i-1] * b ) + 1 ) mod m
;            where b = 4961 and m = 2^16
;
OneOf   PROC    NEAR C PUBLIC USES bx dx, range:WORD
OneOf   ENDP
When you enter assembly-language code, you will often be adding a line indented to the same column as the line above. PWB saves you time by automatically indenting new lines when you press the ENTER key.
- If there is no line or a blank line immediately below the new line, PWB matches the indentation of the line above it.
- If there is a line immediately below the new line, PWB matches the indentation of the line below it.
You'll now type some text after the line containing the PROC NEAR directive.
To insert space for a new line using a mouse:
1. Position the cursor anywhere past the end of the line containing PROC NEAR. Precise positioning of the cursor is not critical because (by default) PWB trims trailing spaces from the end of your lines.
2. Click the left mouse button.
3. Press ENTER to make a new line.
If you are in overtype mode, change to insert mode by pressing the INS key before you press ENTER. In overtype mode, pressing ENTER simply moves the cursor to the beginning of the next line. PWB displays the letter O on the status bar and shows the cursor as an underscore to signal that you are in overtype mode.
To insert the new line using the keyboard:
1. Move the cursor to the line containing the PROC NEAR directive by pressing the UP ARROW key.
2. Press END to move the cursor to the end of the line.
3. Press ENTER to make a new line.
Now type the following lines, using the TAB key to indent and space the instructions:
        mov     ax, 4961        ; Load the constant into AX
        mul     rndPrev         ; and multiply it by the previous value in the series,
        inc     ax              ; add one to the product,
        mov     rndPrev, ax     ; and save it, mod 2^16

        mov     bx, range       ; Now load the range argument,
        mul     bx              ; multiply it times the new number,
        mov     ax, dx          ; and return the high 16 bits
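The arithmetic behind these seven instructions can be written out as follows. This only restates the routine's comments. The starting value of 1 for the previous number in the worked line is an assumption chosen for illustration, since the tutorial does not give the initial value of rndPrev.

```latex
\begin{aligned}
a_i &= (4961 \cdot a_{i-1} + 1) \bmod 2^{16} \\
\mathrm{OneOf}(\mathit{range}) &= \left\lfloor \frac{a_i \cdot \mathit{range}}{2^{16}} \right\rfloor,
  \qquad 0 \le \mathrm{OneOf}(\mathit{range}) < \mathit{range} \\
\text{Assuming } a_{i-1} = 1,\ \mathit{range} = 1234: \quad
a_i &= 4962, \qquad
\left\lfloor \frac{4962 \cdot 1234}{65536} \right\rfloor
  = \left\lfloor \frac{6123108}{65536} \right\rfloor = 93
\end{aligned}
```

A 16-bit mul leaves its 32-bit product in DX:AX, so keeping only AX after the first multiply performs the mod 2^16 step, and copying DX to AX after the second multiply performs the division by 2^16 with truncation.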
Your program now looks like this:
Now that you have finished entering the code for the routine, save the file. From the File menu, choose Save, or press SHIFT+F2. Because you have already named and saved the file once before, PWB simply saves it, without bringing up the Save As dialog box. Note You can turn on automatic file saving by setting the Autosave switch to yes with the Editor Settings command on the Options menu. When Autosave is turned on, PWB automatically saves your file before executing certain commands such as running your program or switching to another file. For example, if you run a program that is not yet stabilized, PWB ensures that your file is stored safely in case you have to reboot.
Opening an Existing File

The remainder of this chapter uses a different file, RND.ASM, which you can now open in PWB. This file contains code to let you test the routine you just entered. It has several errors you will correct as you follow the tutorial.
To open RND.ASM:
1. From the File menu, choose Open (press ALT+F, O). PWB displays the Open File dialog box.
PWB uses *.* as the default filename in the File Name text box. This causes PWB to display all files in the current directory in the File List box. If you know the name of the file you want to open, you can replace the *.* by typing the filename into the File Name text box.
2. If you are not in the directory or drive where the sample programs are located, press TAB twice to move to the Drives/Dirs box, or click inside it. The example file, RND.ASM, is located in the \SAMPLES\PWBTUTOR subdirectory of your main MASM directory, if you accepted the default directory suggested by SETUP.
The current directory is shown directly beneath the File Name text box. Subdirectories of the current directory are listed in the Drives/Dirs box, followed by the available disk drives. Although the box is only large enough to display five entries at a time, you can scroll through the subdirectories or drives to find the one you want by using the DOWN ARROW or PAGE DOWN key, or by using the scroll bar to the right of the box.
A directory entry consisting of two periods (..) indicates the parent directory of the one you are currently in. Selecting the .. directory moves you one level up your directory tree to the directory immediately above the current directory. For example, if you are in the directory C:\MASM\SAMPLES, then the .. directory would be C:\MASM. Using the .. entry helps you walk one step at a time along directory paths.
You'll notice that the cursor is a blinking underline. That means that although you have selected the list box, you haven't yet chosen an item.
3. Use the arrow keys to move to the \SAMPLES\PWBTUTOR subdirectory of your main MASM directory. As you press the arrow keys, you'll notice that the cursor changes to a bar that highlights the whole selection. This is called the selection cursor. The text of the selected item also appears in the File Name box.
4. When you have highlighted the drive or directory you want, press ENTER to move there. Using the mouse, you can simply double-click on a directory or drive entry to move to it, without having to go through an intermediate selection step.
5. Use the TAB key or mouse to move to the File List box.
6. Use the arrow keys to move to RND.ASM, or click on it with the left mouse button.
7. When you have highlighted RND.ASM, press ENTER or choose the OK button to accept your selection and open the file. Just as with the directory or drive entries, you can simply double-click on the filename to open it, bypassing the selection step.
PWB opens RND.ASM for editing.
Copying, Pasting, and Deleting Text

The RND.ASM code contains a placeholder routine named OneOf, which returns 0. You can now delete it and replace it with the random number routine that you created in the previous section of this tutorial. You have already typed in and saved the OneOf routine in a different file. Rather than type it over again, you can copy and paste it using PWB's clipboard (a temporary storage place for text).
To do this, open the Window menu and choose the ONEOF.ASM window (if you no longer have it open, you will need to go back to the directory in which you saved it and open it using the Open command on the File menu).
Next, to copy the routine most conveniently, you will change the way text is selected. Three selection modes are available on the Edit menu:
- Stream mode: by default, the editor starts in stream selection mode, which allows selection to begin at any point, and selects all characters in a stream between the beginning and end positions of the cursor.
- Line mode: selects complete lines of text, starting with the entire line on which the cursor begins, and ending with the entire line on which it ends.
- Box mode: allows you to select a rectangular section of text, one corner of which is the starting position of the cursor, and the opposite corner of which is the ending position of the cursor.
The currently active selection mode is marked with a dot on the Edit menu. Clicking on a mode selects it. You can also change modes while selecting text. Just select text by clicking the left mouse button and dragging the mouse. Then, without releasing the left mouse button, press the right mouse button to toggle among the selection modes. In this case, line selection mode is the most convenient. To change the selection mode: From the Edit menu, choose Line Mode. Next, place the cursor on the top line of text for the routine:
; unsigned int OneOf ( unsigned int range )
To select lines of text using the keyboard: Press SHIFT+DOWN ARROW until the cursor is on the line containing ENDP.
To select lines of text using the mouse: Hold down the left mouse button and drag the cursor to the line containing ENDP.
To copy and paste the text that has been selected:
1. From the Edit menu, choose Copy. This action places the section of text that has been selected into the clipboard. You can also invoke the copy command using the shortcut key combination CTRL+INS.
2. From the Window menu, choose the RND.ASM window.
3. Go to the place where you want to insert the routine (line 51). Press ALT+A, type 51, then press CTRL+M to jump to line 51. This sequence of keystrokes is pronounced "Arg 51 Mark." The PWB function Arg begins an argument (51) that is passed to the Mark function. When you pass a number to Mark, PWB moves the cursor to that line. You can also do this from the menu by typing the line number in the Goto Mark dialog box from the Search menu. The cursor is at the beginning of line 51, exactly where you want to insert the new routine.
4. From the Edit menu, choose Paste, or use the SHIFT+INS shortcut keystroke to paste the contents of the clipboard into that location.
To delete the old placeholder routine:
1. Use the PAGE DOWN key and arrow keys or mouse to move to the first line of the placeholder routine, just below the ENDP line of the inserted routine.
2. Select the six lines of the old routine, using SHIFT+DOWN ARROW or by selecting with the mouse.
3. From the Edit menu, choose Delete, or press DEL. The selected section is deleted.
Important If you select an area of text and then type something or otherwise insert text, PWB replaces the selected text (deletes it and substitutes what you are typing or inserting), without saving it on the clipboard. You can recover the text by choosing Undo at once from the Edit menu. In the example above, if you had selected the six lines of old routine before pasting in the new routine, those lines would have been deleted and replaced by the paste operation.
You have inserted the new routine into RND.ASM. Save the file by choosing Save from the File menu.
Single-Module Builds
The next step is to assemble and link the RND program to see if it works. Assembling and linking the source files is called building the project. It results in an executable file. A project build can also:
- Create and update the browser database.
- Create a Windows-based dynamic-link library (DLL).
- Build a library of routines.
Setting Build Options

Before you build a program, you must tell PWB what kind of file to create by using the commands on the Options menu. Use the commands from the Options menu to specify:
- The run-time support for your program. This is important for mixed-language program development, where you have some source files in assembler and some in another language. With Basic, for example, the run-time support must be Basic's run-time support. The run-time support you choose determines the run-time libraries that are used and the types of target environments that can be supported.
- Project template. The template describes in detail how PWB is to build a project for a specific type of file (.EXE, .COM, .DLL, .LIB) and the operating environment for the target file (MS-DOS, the Windows operating system, and so on).
- Either a debug or release build. Debug options normally specify the inclusion of CodeView debugging information, where release options do not. You may want to generate a different listing file for a debug build than for a release build, or you may not want any listing file for one type of build or the other.
- A build directory. PWB builds your object and executable files in your current directory unless you specify otherwise. (This option is reserved for projects that use explicit project files, which are described in Chapter 3.)
To set the project template for RND.ASM:
1. From the Options menu, choose Set Project Template from the Project Templates cascaded menu. (The actual order of the menu items may differ from the illustration because PWB's extensions can be loaded in any order.) PWB displays the Set Project Template dialog box.
2. Check the Runtime Support list box. It typically has the entries None and Assembler. If you have installed other languages, their names appear as well. Since the RND program does not require run-time support, leave None selected.
3. Move to the Project Templates list box by clicking in the box, pressing the TAB key the appropriate number of times, or by pressing ALT+T.
4. Select DOS EXE.
5. Choose the OK button to set the new project template.
To set the build options for RND.ASM:
1. From the Options menu, choose Build Options. PWB displays the Build Options dialog box.
2. Turn on Use Debug Options by choosing the option button or by pressing ALT+D. This option tells PWB that you are building a debugging version of the program. PWB uses debug options when you build or rebuild until you use the Build Options dialog box to choose Use Release Options.
3. Choose the OK button.
4. From the Options menu, choose Language Options, then choose MASM Options from the secondary menu. PWB displays the Macro Assembler Global Options dialog box.
5. Choose Set Debug Options. PWB displays the Macro Assembler Debug Options dialog box. In the Debug Information box, CodeView should already be selected, indicating that the assembler will generate the information that CodeView needs to correlate assembled code with source code.
6. Select Generate Listing File and Include Instruction Timings. This causes the assembler to create a listing file showing you exactly how it assembled your program, and to include in the listing how many clock cycles each instruction will take to execute.
7. Choose the OK button twice.
PWB saves all the options that you specify. You don't have to respecify them each time you work on your project. The following illustration shows the three sets of options that PWB maintains for each project. Global options are used for every build. Debug options are used when Use Debug Options is turned on in the Build Options dialog box. Release options are used when Use Release Options is turned on.
You can set assembler and linker options for both types of builds (debug and release) by using the Language Options commands and the LINK Options command. The Build Options command then determines which type of build, using which set of options, is actually performed when you assemble a file or rebuild the project. Global options, on the other hand, typically include settings for warning level, memory model, and language variant. These are options that do not change between debug and release versions of a project.
Setting Other Options

The Options menu also contains commands that allow you to describe the desired project build more completely. You don't need to change most of these options to build RND.ASM because the default values supplied by the template will work well. The Options menu contains the following commands:
- MASM Options in the Language Options cascaded menu. These commands let you specify assembler options specific to debug and release builds, and general options common to both types of builds. Using the MASM Global Options dialog box, you can specify memory model, warning level, and so on. If you have more languages installed, their Compiler Options commands also appear in the Language Options cascaded menu.
- LINK Options. This command parallels the Compiler Options commands. You can specify options specific to debug or release builds and general options common to both debug and release builds. Use LINK Options to specify items such as stack size and additional libraries. You can also select different libraries for debug and release builds. This is handy if you have special libraries for debugging and fast libraries for release builds.
- NMAKE Options. This command lets you specify NMAKE command-line options for all builds. This option is particularly useful if you have an existing makefile that was not created by PWB or if you have modified your PWB project makefile. For more information about these subjects, see "Using a Non-PWB Makefile" on page 55.
- CodeView Options. This command allows you to set options for the CodeView debugger.
Building the Program

Now that you've set your options, you can build the program. Note that the sample program contains intentional errors that you will correct.
To start the project build:
1. From the Project menu, choose Build. PWB tells you that your build options have changed and asks if you want to Rebuild All.
2. Choose Yes to rebuild your entire project.
After the build is completed, PWB displays the following dialog box:
You can choose one of several actions in this dialog box:
- View the complete results of the build by opening the Build Results window.
- Run the program if building in MS-DOS. You can run an MS-DOS program right away if the build succeeds. If the build fails, you should fix the errors before you attempt to run the program. To run a successfully built Windows-based program, you must be running under the Windows operating system, and have started the WXServer program before you start PWB.
- Debug the program if building in MS-DOS. If the build succeeds but you already know the program is not producing the intended results, you can debug your MS-DOS program using CodeView. To debug a Windows-based program, you should be running under the Windows operating system, and already have the WXServer running when you start PWB or CodeView.
- Get Help by choosing the Help button or by pressing F1 (as in every PWB dialog box).
- Cancel the dialog box. This returns you to normal editing.
Choose View Results to close the dialog box (press ENTER). PWB displays the results of the build so that you can review the build messages or step through them to view the location of each error. The next section describes how to do this.
Fixing Build Errors

For each build, PWB keeps a complete list of build errors and messages in the Build Results window. The RND.ASM program that you just built contains several errors that you'll identify and fix in this section.
If you want to examine build errors in a specific order, you can do so in the Build Results window by placing the cursor on whatever error you wish to examine, and selecting Goto Error from the Project menu. PWB opens a window onto the appropriate source file and places the cursor on the line at which that error was recognized. When you are finished with each error, selecting the Build Results window from the Window menu will return you to the Build Results window.
In many cases, however, you will want to work through the errors one after another. This is the easiest method for fixing the build errors in RND.ASM.
To fix errors one after another:
1. From the Project menu, choose Next Error, or press SHIFT+F3. PWB positions the cursor on the location of the first error or warning in your program. In this case, a comma is missing after the 10 at the end of the first line of the banr2 data declaration.
2. Correct the first error by inserting a comma immediately after the 10.
3. From the Project menu, choose Next Error, or press SHIFT+F3. PWB moves the cursor to the location of the second error. Here, "Esc" in the string on the line below the cursor is enclosed in double quotes, and the string itself is also enclosed in double quotes. As a result, the assembler interprets the first quote before Esc as the end of the string, and then does not recognize Esc as a valid instruction or directive. This can be fixed by substituting a pair of single quotes for the pair of double quotes either around the string or around Esc.
4. Fix the error by changing the double quotes ("") around Esc to single quotes (''). Because of this error, the data symbol named again was not defined during the first assembly pass, which also meant that the constant lagain could not be evaluated. As a result, two more errors were generated, which can now be ignored.
5. From the Project menu, choose Next Error, or press SHIFT+F3. PWB positions the cursor on the location of the third error, a simple typographical error where the mov instruction was spelled mob.
6. Correct the third error by replacing the b in mob with a v.
Now that all the build errors in RND.ASM have been corrected, save the file by choosing Save from the File menu or by pressing SHIFT+F2.
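The first two errors both involve MASM data declarations. The following is a minimal sketch of the correct forms, not the text of RND.ASM: the names demoBanner, demoPrompt, demoAlt, and lDemoPrompt, the message text, and the memory model are all hypothetical and chosen only for illustration.

```asm
; Hypothetical declarations for illustration; this is not RND.ASM.
            .MODEL  small
            .DATA

; A trailing comma continues an initializer list on the next line.
; Without the comma after the 10, the second line would be read as
; a separate (and invalid) statement.
demoBanner  BYTE    "Random number demo", 13, 10,
                    "Enter a range value", 13, 10, "$"

; Either kind of quote can delimit a string, so use the other kind
; for quotes that belong inside the string.
demoPrompt  BYTE    "Press 'Esc' to quit", 13, 10, "$"
lDemoPrompt EQU     $ - demoPrompt      ; length; needs demoPrompt defined
demoAlt     BYTE    'Press "Esc" to quit', 13, 10, "$"

; The next line shows the faulty form and is left commented out:
; demoBad   BYTE    "Press "Esc" to quit", 13, 10, "$"
            END
```

The length constant shows why one bad string can produce follow-on errors: if the declaration of the string fails, any constant computed from its label cannot be evaluated either.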
Running the Program

The next step is to build and run the program.
To run the program:
1. From the Run menu, choose Execute (be sure that you have saved RND.ASM first). PWB detects that you've changed the source and displays a dialog box with the following options:
2. Choose Build Target to build the program. When the build completes, PWB displays the following dialog box:
3. Choose Run Program to run the finished program.
When you run it, the RND program will start by asking you to supply a range value between 1 and 65,535. Type 1234 and press ENTER. The program will then ask you to confirm that 1,234 is indeed the correct range. When you type y, the program is supposed to display a list of random numbers within that range. Instead, however, the program restarts when you type y. Something is wrong.
To get out of the program and back to PWB, press CTRL+C (in the case of this particular program, you can also use the ESC key to exit when the program asks for confirmation of a range value). Before blanking your program's output, PWB will display the message "Strike any key to continue..." so that you can examine the final state of the screen.
The following sections describe the process of debugging using the Microsoft CodeView debugger. If you're already familiar with CodeView, skip to Chapter 3, "Managing Multimodule Programs."
Debugging the Program

PWB integrates several Microsoft tools to produce a complete development environment. Among those tools are NMAKE, a program maintenance utility, and CodeView, a symbolic debugger. Whenever you build programs using PWB, PWB in turn invokes NMAKE to manage the build process. In the same way, PWB can serve as a gateway to CodeView when you need to debug a program you have built. Earlier, you chose Use Debug Options in the Build Options dialog box. A debug build typically includes the assembler options that generate CodeView information. Therefore, the program is ready to debug with the CodeView debugger.
Using CodeView to Isolate an Error

In addition to the typographical errors that you just corrected, RND.ASM contains a logical error which will prevent it from running properly. You can use CodeView to isolate this error. To start CodeView: From the Run menu, choose Debug. If anything in your program is out of date, PWB asks if you want to build or rebuild the current target. If you modified the source file in any way, PWB considers it out of date relative to the executable file that you built earlier. If this happens, build the program and choose Debug from the Run menu. CodeView now starts, displaying three windows on its main debugging screen.
The first thing to do is set up the CodeView screen so that it best suits your way of working. When you leave CodeView, your setup will be saved in CURRENT.STS. The next time you use CodeView, that setup will be restored when the program starts. The right screen layout depends a lot on your work style, and on the project you are working on. In this case, many of CodeView's more advanced features will not be necessary, so we will set up a simple screen. By default, three windows are initially displayed: locals, source1, and command. Close the locals window, since it will not be needed in debugging RND, open a register window and a memory window, and arrange the windows on the screen.
To close a window using the mouse: Click the upper left corner of the window.
To close a window using the keyboard: Use the F6 key to move into the window that you want to close. Choose Close from the Windows menu, or press CTRL+F4.
To open the Register and Memory windows:
1. From the Windows menu, choose Register, or press ALT+7. The Register window displays the contents of the processor's registers, either in Native (8086) mode, or in 32-bit (80386-80486) mode.
2. At the bottom of the Options menu, click Native if it is not already selected.
3. Choose Memory 1 from the Windows menu, or press ALT+F5. Memory windows display the contents of a specified block of memory, so that you can watch changes as your program runs.
To move and size a window using the mouse:
1. To move a window, place the cursor on its top line, not in a corner. Then drag the window to a new location.
2. To size a window, move the cursor to the lower right corner of the window. Then drag the corner to change the window's size.
To move and size a window using the keyboard:
1. Using the F6 key, shift focus to the window you want to size.
2. Choose Move or Size from the Windows menu.
3. Use the arrow keys to move or size the window.
4. Press ENTER when you are finished.
When you have positioned and sized the windows to your satisfaction, set the source window to show both your source text and the actual instructions assembled by MASM, and set the memory window to stay fully up to date as the program executes.
To display mixed source and assembler output: 1. From the Options menu, choose Source1 Window. CodeView displays the Source1 Window Options dialog box. 2. In the Display box, choose Mixed Source and Assembly. 3. Choose OK.
To set the Memory1 window to be updated frequently: 1. From the Options menu, choose Memory1 Window. CodeView displays the Memory1 Window Options dialog box. 2. Select the Re-evaluate expression always (live) check box. 3. Choose the OK button.
Working Through a Program to Debug it

CodeView has placed you at the program's starting point. The registers are as they would be at that point, and the memory window shows whatever the DS register is pointing to. The instructions that appear at the top of the source window have been created by the .STARTUP directive, as you can see if you scroll up a few lines.
CodeView provides various ways to control and examine the execution of a program. The Step command (F10 key) executes the next instruction in the program, and if that instruction is a call, executes the entire called code up through the return. Trace (F8 key), on the other hand, jumps to the called code and traces through it too, one instruction at a time. You can also run the program up to a given point, or set breakpoints at several points. With RND, we will only need to use a few of the possible debugging tools.
To Step through the program: Use the F10 key to step through the first couple of instructions of the .STARTUP code. You will notice that as each instruction is executed, CodeView briefly displays the program output screen, and updates the Register window to show changes in the registers. As the DS register is loaded, the Memory window displays the data segment of the RND program. Stepping is a slow way to move through the program. In many cases, as with RND, you will want to move quickly to the point where the program failed, to see what the matter was. In RND, everything seemed to be working correctly until you entered y to confirm the range.
To run a program up to a given place: 1. Scroll through the code to the comment line:
; Read in a character from the keyboard
Three lines below the comment is a cmp instruction. 2. Place the cursor on the line containing the cmp instruction, either by using the arrow keys or the mouse.
3. Press the F7 key, or click that line with the right mouse button.
CodeView proceeds to execute the program up to (but not including) that line. The display switches to the output screen where the program shows its introductory message, then requests a range value.
To work through the RND program and find the bug:
1. Type in a range value smaller than 65,535 and press ENTER. The program redisplays the range value and asks for confirmation.
2. Press y. CodeView returns you to the source window in the debugging screen. The succeeding instructions are designed to recognize an ESC or a y, and are presumably failing in some way, causing the program to start from the beginning.
3. Using the F10 key, step through the various cmp instructions. You will find that the code works as expected, recognizes the y, and proceeds.
4. Go on to the next jump or branch. The next possible branch in program execution occurs at the call to OneOf. Although this seems unlikely to be causing the program to start over, it is the next thing to test.
5. Position the cursor on the call instruction, and press either F7 or the right mouse button to execute the program up to that call. So far, so good: the program continues to run as expected.
6. Now press F10 to execute the call itself.
The program now erroneously starts over. We now know that the problem must be located in the OneOf routine.
7. Press CTRL+C, then ENTER to get out of the program.
8. Choose Restart from the Run menu to return to the beginning of the program.
9. As you did before, scroll down to where OneOf is called and execute the program up to just before the call.
10. This time, use F8 to trace through the call. You will notice that CodeView now shifts into the called routine, allowing you to step through the OneOf code instruction by instruction.
11. Step or trace through the OneOf routine, using F10 or F8, and look for the problem. You will discover a simple error of omission: the routine has no ret instruction at the end. As a result, execution continues into the succeeding code, which happens to be the .STARTUP code. Having found the problem, you can leave CodeView and return to PWB.
12. From the File menu, select Exit. CodeView closes, saving your settings for next session.
13. From PWB, insert a new line in the RND.ASM file just before the ENDP line of the OneOf routine.
33
14. Type a ret instruction there.
15. Save the corrected file by choosing Save from the File menu, or by pressing SHIFT+F2. 16. Select Execute from the Run menu to rebuild the program and try it again. This time, it should work without problems.
Examining Memory in the Memory Window

In addition to being able to watch the register contents change as your code runs, CodeView lets you see what happens to locations in memory. For example, you may have noticed that OFFSET lnBuf was assembled as hex E4. By setting the memory window at that address, you can watch what happens in the lnBuf buffer as the program formats a line of output. One way to reach that address, since it is fairly close, is to scroll in the memory window until you get to it. However, this is often impractical. To set the address in the Memory Window: 1. From the Options menu, select Memory1 Window. 2. In place of DS:0 in the Address Expression field of the Memory1 Window Options dialog box, type DS:0x00e4. Now, you can step through cycles of a formatting loop and watch the buffer change.
To step through a formatting loop in RND.EXE: 1. In the source window, scroll to the instruction dec bl around line 150, which completes a formatting cycle for a random number. 2. Press F7 or click in that line with the right mouse button. If you know for sure that dec bl is on line 150, you can move to the Command window and type g @150 followed by the ENTER key. This instructs CodeView to execute the program up through line 150 in the source file.
3. While watching the memory window, press F7 again, or click the dec bl instruction again with the right mouse button. As the loop executes again, you can see the memory area change to reflect the new value being formatted into lnBuf. To switch from CodeView back to PWB: Choose Exit from the CodeView File menu.
Where to Go from Here

Now that you've created, built, and debugged a simple program, you've begun to discover the power of PWB. Chapter 3, "Managing Multimodule Programs," describes how to create and manage projects with more than one source file.
CHAPTER 3
Managing Multimodule Programs
This chapter expands on the work you did in Chapter 2 and explains how to build and maintain multimodule programs using PWB's integrated project-management facilities. PWB offers an efficient way to manage complex projects. You organize and build your project entirely within PWB, using convenient menus and dialog boxes instead of makefiles or batch files. PWB stores the information needed to build and manage your program in two files, the project makefile and the project status file. Together, these are called the project. When you open the project, PWB automatically configures itself to build your program. To move from one project to another, you close one project and open another.
Multimodule Program Example

In this chapter, you'll learn to set up a multimodule project in PWB by building SHOW.EXE, a three-module program. The SHOW program displays text files on character-based screens with MS-DOS. The following modules make up SHOW.EXE:
Module          Function
SHOW.ASM        Program driver; contains the .STARTUP entry point and calls all other procedures.
PAGER.ASM       Contains procedures for paging through a file and writing text to the screen buffer.
SHOWUTIL.ASM    Contains miscellaneous procedures.
The program also contains a common header file SHOW.INC in addition to these three source modules. Figure 3.1 shows the components of SHOW and how they combine to build the executable file.
Filename: LMAETC03.DOC Project: Environment and Tools Template: MSGRIDA1.DOT Author: Harold S. Henry Last Saved By: Mike Eddy Revision #: 64 Page: 35 of 1 Printed: 10/09/00 02:44 PM
Figure 3.1  The SHOW Project
To build SHOW.EXE, you need to assemble the three source files and link them together, having specified the assembler and link options that will produce the kind of file you are trying to make. All this build information is stored in the SHOW project make and status files.
Opening the Project

Start by opening the SHOW project. (If you have not started PWB, do so now.) To create a project: 1. From the Project menu, choose New Project. PWB displays the New Project dialog box.
2. Type show in the Project Name text box.
3. Choose Set Project Template. PWB displays the Set Project Template dialog box.
4. Select the following options:
   - Runtime Support: None.
   - Project Template: DOS EXE.
At this point, the Set Project Template dialog box should appear as follows:
This initial specification tells PWB what kind of executable file you intend to build, and is saved as part of the project. 5. Choose OK to return to the New Project dialog box. In this case, a project makefile, SHOW.MAK, already exists. Since PWB would ordinarily create and save a new makefile at this point, you are now asked whether you want to overwrite the existing file.
6. Choose Yes to overwrite the existing file. PWB saves the new SHOW.MAK and returns to the New Project dialog box. 7. Choose OK. PWB now displays the Edit Project dialog box so that you can add files to your new project.
The next section describes the types of files that can be added to the project. The tutorial then continues by listing the example files to add to the list.
Contents of a Project
A project file list can contain the following files:
- Source code files (.ASM).
- Object files (.OBJ) in special cases.
- Library files (.LIB) for libraries that change.
- Module-definition files (.DEF) for DLLs.
- Resource-assembler source files (.RC) for Microsoft Windows-based programs.
These file types are all that are needed to create most MS-DOS and Windows-based applications. Include files, such as SHOW.INC, need not be listed because PWB automatically adds them when it scans your source files for dependencies. When you select assembler run-time support with a Windows-based project template in the Set Project Template dialog box, PWB automatically specifies standard library files such as LIBW.LIB. Therefore, you need not add standard library files to the project list.
To add the SHOW files to your project:
1. Choose the files you want to add to the project from the File List box. In this case, you'll add SHOW.ASM, PAGER.ASM, and SHOWUTIL.ASM. These files are located in the \MASM\SAMPLES\SHOW directory. If you installed Microsoft MASM 6.1 in a directory other than MASM, adjust the path accordingly.
You can scroll the File List box by clicking the scroll bars or by pressing the arrow keys. 2. For each file, select it and choose Add / Delete to add the file to the Project list box. Or, you can double-click a file to add or remove it from the list. To add all three files at once, you can type *.ASM in the File Name field, press ENTER, and then choose Add All. 3. Choose Save List when you have added all three files. PWB uses the rules in the project template along with the list of files that you just specified to scan the sources for include dependencies and to create the project makefile. This process is described in the next section. Now your project completely describes what you want to build (the project template), the component source files, and the commands used to build the project.
Dependencies in a Project
When you save the project, PWB generates a makefile from the project template, files, and options you specified. This file also contains a list of instructions that are interpreted by NMAKE. In addition, PWB generates the project status file, which saves the project template, the editor state, and the build environment for the project. For more information on the project status file, see "Project Status Files" on page 129. When you build the project, NMAKE examines the build rules in the project makefile. These are rules that specify targets (such as an object or an executable file) and the commands required to build them. For example, a rule for making an .OBJ file from an .ASM file can be expressed as follows:
.asm.obj:
        ML /c $<
To reduce the amount of time builds take, NMAKE assembles or links only the targets that are out-of-date with respect to their corresponding source file. This process is simple if there is a one-to-one correspondence between sources and targets. However, many programs use the INCLUDE directive to include files containing common equates, macros, and other program text. The object files must be made dependent not only on the source file but also on the files that are used by the source file. You don't need to add include (.INC) files to your project. When you save the project, PWB scans your source files looking for INCLUDE directives and builds dependencies on these files. NMAKE will thereafter recompile a source file if you change a file that it includes.
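As a rough sketch of what an include dependency looks like in a makefile, the fragment below pairs the inference rule shown earlier with a dependency line for PAGER.ASM. The file names come from the SHOW example; the layout is an illustration only, and the lines PWB actually writes may differ.

```makefile
# Inference rule: how to build any .OBJ from the .ASM file
# that has the same base name.
.asm.obj:
        ML /c $<

# Dependency line with no commands of its own, so NMAKE falls back
# on the inference rule above. Listing show.inc as a dependent makes
# PAGER.obj out-of-date whenever the include file is newer than it,
# not only when PAGER.ASM itself changes.
PAGER.obj : PAGER.ASM show.inc
```

With a line like this in place, editing show.inc alone is enough to cause PAGER.ASM to be assembled again on the next build.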
Building a Multimodule Program

Now that the project files are complete, you can build the program in the same way you built the single-module program.
To build a multimodule program:
1. You are starting a new project, so you should use debug options for the initial builds. Choose the Use Debug Options button in the Build Options dialog box.
2. From the Project menu, choose Build. PWB displays a dialog box to inform you that build information has changed because you altered the build options.
3. Choose Yes to rebuild your entire project. As the program is built, PWB shows status messages about the progress of the build. When the build completes, a dialog box displays a summary of any errors encountered during the build process.
Note The Next Error command on the Project menu works the same for a multimodule build as for a single-module build. Because errors in a multimodule build can occur in different files, PWB automatically switches to the file that contains the error.
In some cases, you will want to force a complete rebuild of your project by choosing Rebuild All from the Project menu. The difference between Build and Rebuild All is that Build compiles and links only out-of-date targets and Rebuild All compiles all targets, regardless of whether they are current.
Running the Program

Now that your program is built, you can test it from PWB.
To run SHOW:
1. From the Run menu, choose Program Arguments.
2. Type the name of a text file to pass to the SHOW program. The SHOW.ASM source file is a good file to use.
3. Choose OK to set the program arguments. PWB saves the arguments so that you can run or debug the program many times with the same command line.
4. From the Run menu, choose Execute. SHOW will display the first screen of text in the file you passed to it. You can use the arrow keys and PAGE UP and PAGE DOWN to move around in the text file. Press Q and then any key to return to PWB.
You have successfully created a multimodule project, built the program, and run it, all from within the Programmer's WorkBench. You can now leave PWB.
To leave PWB: From the File menu, choose Exit or press ALT+F4. PWB saves your project and returns to the operating-system prompt. If you started PWB from within the Windows operating system, you will return to the Windows operating system.
Creating a PWB project is an important first step. However, most of the time you will be maintaining projects. The next section provides an overview of project maintenance. The tutorial then continues with the SHOW project.
Project Maintenance
Once you have created a project, you may have to change it to reflect the changes in your project organization. You can:
- Add new file-inclusion directives to your source files.
- Add new source, object, or library files.
- Delete obsolete files.
- Move modules within the list.
- Change assembler and linker options.
- Change options for individual modules.
When you add a new INCLUDE directive to a source file, you add a new dependency between files. For the most accurate builds, you need to regenerate include dependencies for the project.
To regenerate include dependencies:
1. From the Project menu, choose Edit Project.
2. Select the Set Include Dependencies check box.
3. Choose Save List. PWB regenerates the include dependencies for the entire project and rewrites the project makefile.
To add new files to an existing project:
1. From the Project menu, choose Edit Project.
2. For each file that you want to add to the project:
   - Select the file from the File List box, or type the name of the file in the File Name text box.
   - Choose the Add / Delete button to add the file.
3. Choose Save List to rewrite the project makefile, set up the dependencies, and add the commands for the new files.
To delete files from a project:
1. From the Project menu, choose Edit Project.
2. For each file that you want to remove from the project:
   - Select the file from the File List box, or type the name of the file in the File Name text box.
   - Choose the Add / Delete button to remove the file from the list.
3. Choose Save List.
With most programming languages, you won't need to move modules within a project. However, some languages or custom projects require files to be in a specific order. If you're programming in Basic, for example, you must place the main module of your program at the top of the list. Unlike other languages, Basic does not define an explicit name where execution begins. Entry to a Basic program is defined by the first file in the list.
To move a file to the top of the project file list: 1. From the Project menu, choose Edit Project. 2. Select the file you want to move to the top of the list. 3. Choose the To Top of List button.
Using Existing Projects

You'll now make modifications to the SHOW project that you just created. During a PWB session, the project you open remains open unless you explicitly change it. If you have not already started PWB, you should do so now. In the Windows operating system, double-click the PWB icon in the MASM program group. If you are not working from within the Windows operating system, you can start PWB and open the SHOW project from the operating-system command line by typing the command:
PWB /PP SHOW
If the SHOW project is the last project you had open in PWB, type the following command:
PWB /PL
You can set up PWB to reopen the last project automatically at startup by choosing Editor Settings from the Options menu, and then by setting the Boolean switch Lastproject to Yes. If you have already started PWB, open the project now. To open the project from within PWB: 1. From the Project menu, choose Open Project. 2. Choose SHOW.MAK from the File List box or type show in the Project Name text box.
3. Choose OK.
When you open the project, PWB restores the project's environment, including:
- The window layout with the window style, size, and position for each window.
- The file history: a list of open files for each window and the last cursor position in each file.
- The last find string.
- The last replace string.
- The options that you used for the last find or find-and-replace operation, such as regular expressions. See "Using Regular Expressions" on page 82 for more information about regular expressions.
- The project template (for example, DOS EXE) and any customizations you have made to the template such as changing the build type or an assembler or linker option.
- The command-line arguments for your program.
Note PWB can save all environment variables, including PATH, INCLUDE, LIB, and HELPFILES, depending on how the envcursave and envprojsave switches are set. For more information, see "Environment Variables" on page 127. Also, if you turn the restorelayout switch off, PWB does not restore the window layout, the find strings and options, or the file history of a project. Instead, PWB keeps the current editor state when opening a project.
Adding and Deleting a Project File

As you develop a project, you will occasionally add new modules. The following example presents the steps needed to add a library file to the SHOW project. Note that this procedure is only an example, and in fact, SHOW does not use or require any library support. To add a file to your project: 1. From the Project menu, choose Edit Project. The file and directory navigation lists in this dialog box work in exactly the same way as those in the Open File dialog box. 2. Choose the parent directory symbol (..) in the Drives / Dirs list box to move up the directory tree to the SAMPLES directory. 3. Choose the parent directory symbol (..) again to move up the directory tree to the MASM directory. 4. Choose the LIB directory in the Drives / Dirs list box to move down the tree into the LIB directory.
Notice that the directory displayed after the label File List reflects the directory change. 5. Make sure the File Name text box contains *.* or *.LIB. 6. Select LIBW.LIB in the File List box. 7. Choose the Add / Delete button to add the file to the project. LIBW.LIB is being used here as an example of how to add a file to your project. In practice, because it is a system library that will not change, there is no reason to add it. However, if you have a library of your own that is being used by your project, you would add it to the project in this way. 8. Since LIBW.LIB is not a source file and cannot have include dependencies, you can clear the Set Include Dependencies check box. If this check box is selected, PWB regenerates the dependencies for all the files in the project. 9. Choose Save List. LIBW.LIB is now part of the project. Since SHOW is not a program designed to run under Microsoft Windows, you should now delete this library from the project again. To delete a file from your project: 1. From the Project menu, choose Edit Project. 2. In the Edit Project dialog box, you can either select LIBW.LIB in the Project list box and then select Add / Delete, or simply double-click on LIBW.LIB in the Project list box to delete it.
Changing Assembler and Linker Options

Up to this point, you have used PWB's default build options for all the examples. These options are sufficient for most cases, but in special cases, you will want to adjust them. When you are debugging a program, you should choose the debug build type. When producing a debug build, the assembler and linker include a good deal of extra information in the program for CodeView to use in debugging. When you are ready to use the program, choose the release build type, so that the extra debugging information is no longer incorporated into the program.
To specify whether a build should be for release or debug:
1. From the Options menu, choose Build Options.
2. Choose Use Debug Options or Use Release Options in the Build Options dialog box.
3. Choose OK.
When you specify a release build, PWB does not change your debug options. For more information on global options, debug options, and release options, see "Setting Build Options" on page 18.
To change assembler options:
1. From the Language Options cascaded menu on the Options menu, choose MASM Options. The Macro Assembler Global Options dialog box contains a number of options that are common to both the release and debug builds.
At the bottom of the dialog box are buttons that set options that are specific to the current type of build (debug or release), and that show the assembler flags corresponding to those options. Default settings were determined when you chose the project template. Note You can choose the Set Debug Options button to view and set the options for debug builds. However, this does not change the type of build that is performed when you build the project. To set the type of build, choose Build Options from the Options menu. 2. Choose Set Debug Options. PWB displays a dialog box in which you can specify debug options.
If you had chosen Set Release Options, PWB would have displayed the same dialog box, so that you could select options for release builds. 3. Choose OK to return to the Macro Assembler Global Options dialog box. 4. Choose OK to save the new assembly options and return to the main PWB screen. To change the linker options: 1. From the Options menu, choose LINK Options. PWB displays the LINK Options dialog box.
2. Choose Additional Global Options to review and select additional global link options. PWB displays the Additional Global Link Options dialog box.
3. Choose OK when you are finished to return to the LINK Options dialog box. 4. Choose Additional Debug Options to review and select additional debug link options. PWB displays the Additional Debug Options dialog box.
5. Choose OK when you are finished to return to the LINK Options dialog box. 6. Choose OK to close the LINK Options dialog box and use any new options you might have set. You are now ready to build your project with any new options you have selected. To build a modified project: From the Project menu, choose Rebuild All.
Changing Options for Individual Modules

Most of the modules in a program can generally be built using the same options. However, you may occasionally want to modify the options for a single module. The example that follows shows how to customize your project to change the assembler options for PAGER.ASM only. To do this, you manually edit the instructions in the project makefile for compiling PAGER.ASM. To open SHOW.MAK for editing: 1. If the SHOW project is open, choose Close Project from the Project menu. This step is important because you cannot edit a PWB makefile for a project that is currently open. 2. Choose the Open command from the File menu and open the SHOW.MAK file in the editor. To customize the assembly of PAGER.ASM: 1. Find the rule for compiling PAGER.ASM:

PAGER.obj : PAGER.ASM show.inc
!IF $(DEBUG)
        $(ASM) /c $(AFLAGS_G) $(AFLAGS_D) /FoPAGER.obj PAGER.ASM
!ELSE
        $(ASM) /c $(AFLAGS_G) $(AFLAGS_R) /FoPAGER.obj PAGER.ASM
!ENDIF
This rule contains a conditional statement with two commands. The first command is for debug builds, and the second command is for release builds. You can edit either one of these commands, or both. They contain the following macros defined earlier in the makefile:
Macro       Definition
ASM         The name of the MASM assembler
AFLAGS_G    Global options for assembly
AFLAGS_D    Debug options for assembly
AFLAGS_R    Release options for assembly
As an example, suppose that PAGER.ASM contained data structures which you want to pack on 32-bit boundaries for the release build only. The /Zp4 flag tells the ML program to pack data structures on 4-byte boundaries. 2. Edit the release build command as follows.
        $(ASM) /c $(AFLAGS_G) $(AFLAGS_R) /Zp4 /FoPAGER.obj PAGER.ASM
Because it is hard to be sure what options the flags macros will invoke, the new option should be placed after them, so that it will supersede any instructions they may contain. Note that both the assembler options, such as /Zp, and NMAKE macros, such as AFLAGS_G, are case sensitive and must appear exactly as shown.
Warning After this modification, PWB still recognizes this makefile as a PWB makefile. However, if you make changes beyond adding options to individual command lines, PWB may no longer recognize the file as a PWB makefile. If this happens, you can delete the makefile and re-create it, or you can use it as a non-PWB makefile. For more information on using non-PWB makefiles, see "Using a Non-PWB Makefile" on page 55.
You could save your changes to the makefile by choosing Save from the File menu, then reopen the project and rebuild SHOW with the custom option you just installed. Because PAGER.ASM does not contain any data declarations, however, the new option has no real purpose or effect.
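Seen in context, the PAGER.ASM rule after such an edit might look like the sketch below. Only the release command changes; the debug command is left as PWB wrote it. The leading comment is part of this illustration only and, given the warning above, should not be typed into a real PWB makefile.

```makefile
# Illustration only: the PAGER.ASM rule with /Zp4 added to the
# release command, placed after the flag macros so that it
# takes precedence over any packing option they contain.
PAGER.obj : PAGER.ASM show.inc
!IF $(DEBUG)
        $(ASM) /c $(AFLAGS_G) $(AFLAGS_D) /FoPAGER.obj PAGER.ASM
!ELSE
        $(ASM) /c $(AFLAGS_G) $(AFLAGS_R) /Zp4 /FoPAGER.obj PAGER.ASM
!ENDIF
```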
The Program Build Process

This section explains the correspondence between projects and makefiles. Normally, the build process is automatic, but you may encounter situations that require customized build options. Read this section to learn how the utilities work with PWB. The following diagram illustrates the PWB build process.
Figure 3.2  The PWB Build Process
When you save your project by choosing the Save List button in the Edit Project dialog box, PWB uses the list of files along with the rules in the selected project template to scan for dependencies and write the project makefile. When you choose the Build or Rebuild All command from the Project menu, PWB releases as much memory as possible and passes the makefile to NMAKE, which builds the project. NMAKE stops at the end of the first build step that produces an error (as opposed to a warning) or at the end of a successful build. In either case, NMAKE returns the results of the build to PWB along with a log of any errors and warnings. For more information about NMAKE, see Chapter 16, "Managing Projects with NMAKE." PWB saves the output of the build for you to view in the Build Results window or to step through when you choose the Next Error (SHIFT+F3), Previous Error (SHIFT+F4), and Goto Error commands on the Project menu. You can run the program, set program arguments, and debug the program by choosing commands in the Run menu. If you have turned on the generation of browser information, PWB builds the browser database when you build the program. Once you have a browser database, you can use the commands in the Browse menu to navigate your program's source files and examine the structure of your program. For more information, see "Using the Source Browser" on page 88.
Extending a PWB Project

Makefiles that are not written by PWB often contain utility targets that are not used in the process of building the project itself. These targets are used to clean up intermediate files, perform backups, process documentation, or automate other tasks related to the project. You can extend a PWB makefile to perform these kinds of tasks by adding new rules. These additional rules must be placed in a special section of the project makefile. In the following example you will add a section that creates a file with information about the project. This file has the same base name as the project and the extension .BLD. It lists the files in the project and the major options used for the build. This example section can be used with any assembly-language PWB project. Use the SHOW project to see how to add a custom section. If you have been following the tutorial, you have already made one custom edit to the SHOW.MAK file.
To add a custom section to the PWB makefile: 1. If the project is open, choose Close Project from the Project menu. This step is crucial because PWB disables modification of the project makefile until the project is closed or a different project is opened. (This restriction does not apply to non-PWB project makefiles.) 2. From the File menu, choose the Open command and open the SHOW.MAK file in the editor. 3. Press CTRL+END to move the cursor to the end of the makefile. 4. Type the following new comment line exactly as shown:
# << User_supplied_information >>
You must put the number sign (#) in column one and type the contents of the line exactly as shown, including capitalization. Failing to type this line accurately will make the project unrecognizable to PWB or will cause PWB to change your custom build information in unexpected ways. You can copy this line from Help rather than typing it in, if you wish. Press ALT+A, type USI, press F1, and then copy the line. Move back to the makefile, and paste the line in at the end.
NMAKE requires space between rules. Therefore, you should separate this line from the lines above it by one blank line. Similarly, you should leave at least one line between the separator and your custom build rules. For more information about NMAKE and the syntax of makefiles, see Chapter 16, "Managing Projects with NMAKE."
This comment line is used by PWB as a separator. Anything above this comment is regarded as belonging to PWB, and you should not edit the information there. The exception is to add options to individual command lines, as described in "Changing Options for Individual Modules" on page 49. Anything in the makefile after the separator is your information, and PWB ignores it. NMAKE, however, processes the entire file.
Now that you have a separator to show PWB where your custom information starts, you can add the custom information. The separator and custom section are included in the following text, and can also be found in the EXTRA.TXT file in the SAMPLES directory:

# << User_supplied_information >>

#
# Example 'user section' for PWB project makefiles,
# used in the PWB Tutorial.
#
# NOTE: This is not a standalone makefile.
# Append this file to makefiles created by PWB.
#
# This user section adds a new target to build a project listing
# that shows the build type, options, and a list of files in the
# project.
#

!IFNDEF PROJ
!ERROR Not a standalone makefile.
!ENDIF

!IF $(DEBUG)
BUILD_TYPE = debug
!ELSE
BUILD_TYPE = release
!ENDIF

# Project files and information-list target
#
$(PROJ).bld : $(PROJFILE)
        @echo <<$(PROJ).bld : Project Build Information
Build Type: $(BUILD_TYPE)
Program Arguments: $(RUNFLAGS)
Project Files
        $(FILES: =^
        )
Assembler Options
        Global: $(AFLAGS_G)
        Debug: $(AFLAGS_D)
        Release: $(AFLAGS_R)
Link Options
        Global: $(LFLAGS_G)
        Debug: $(LFLAGS_D)
        Release: $(LFLAGS_R)
<<KEEP
The custom section of a PWB makefile can use any of the information defined by PWB. This example takes advantage of many macros defined by PWB. For example, the PROJFILE macro, which contains the name of the project makefile, is used as the dependent of the listing file so that the listing is rebuilt whenever the project makefile changes. In addition, this custom section uses many features of NMAKE, including macros, macro substitution, preprocessing directives, and inline files. For more information about NMAKE and makefiles, see Chapter 16, "Managing Projects with NMAKE."
To rebuild using the custom options:
1. Choose Open Project from the Project menu and reopen the SHOW project.
2. From the Project menu, choose Build Target.
3. Type the name of the new target SHOW.BLD in the Target text box, and then choose OK. PWB informs you that the build options have changed and asks if you want to rebuild everything.
4. Choose Yes to confirm that you want to rebuild everything.
The project information file that is created shows the project name, indicates whether the build is a debug or release build, lists the files in the project, and lists the assembler and linker options used for the build.
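The user section can also hold the kind of utility target mentioned at the start of this section. The following is a minimal sketch of a cleanup target, not part of the SHOW sample: the target name clean and the choice of files to delete are assumptions made for illustration. It relies on the PROJ macro that PWB defines.

```makefile
# Hypothetical utility target for the user section of a PWB makefile.
# Place it below the "# << User_supplied_information >>" separator.
#
# "clean" is a pseudotarget: no file of that name is expected to
# exist, so its commands run each time the target is requested.
# The leading dash tells NMAKE to continue even if DEL reports an
# error, for example when there is nothing to delete.
clean :
        -del *.obj
        -del $(PROJ).bld
```

A target like this would be run in the same way as SHOW.BLD above, by typing its name in the Target text box of the Build Target dialog box.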
Using a Non-PWB Makefile

PWB makefiles are highly structured and stylized makefiles that are generated from the rules in the project template and a list of files that you supply. Many projects have existing makefiles that PWB can't read because they do not have this stylized structure. These makefiles are called non-PWB or foreign makefiles. You can still take advantage of many of PWB's project features with non-PWB makefiles. The features that cannot be used are shown as unavailable menu items. Note that a PWB makefile is not required to use the Source Browser; all you need to have is a browser database. For information on building a browser database, see "Building Databases for Non-PWB Projects" on page 94.
To use a non-PWB makefile:
1. From the Project menu, choose Open Project. The Open Project dialog box appears.
2. Select the non-PWB makefile to open.
3. Select the Use as a Non-PWB Makefile check box.
4. Choose OK.
Note A PWB makefile cannot be edited or modified when it is the open project. However, PWB does not disable modification of non-PWB makefiles. You can edit a non-PWB makefile, even when it belongs to the currently open project.
You can now use the Build, Rebuild All, and Build Target commands from the Project menu. The Build and Rebuild All commands work as they do with a PWB makefile by building the first target. However, the Language Options commands and the LINK Options command on the Options menu are unavailable. You set these kinds of options by editing the makefile. When you close a non-PWB project, PWB saves the environment, window layout, and file history just as it does for a PWB project.
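For comparison, a hand-written non-PWB makefile for the SHOW example could be as small as the sketch below. It is not taken from the SHOW sample: the macro names and option values are assumptions chosen for illustration, as is the idea that all three modules include show.inc. Because SHOW.exe is the first target in the file, it is what the Build and Rebuild All commands would produce.

```makefile
# Hypothetical hand-written (non-PWB) makefile for the SHOW example.
# Macro names and option values are assumptions for illustration.

# /c assembles without linking; /Zi adds CodeView information.
AFLAGS = /c /Zi
# /CO asks LINK to keep CodeView information in the executable.
LFLAGS = /CO

# How to build any .OBJ from the .ASM file of the same base name.
.asm.obj:
        ML $(AFLAGS) $<

# First target in the file: built by Build and Rebuild All.
SHOW.exe : SHOW.obj PAGER.obj SHOWUTIL.obj
        LINK $(LFLAGS) SHOW.obj PAGER.obj SHOWUTIL.obj, SHOW.exe;

# Include dependencies, written by hand because PWB does not
# maintain them for a non-PWB makefile.
SHOW.obj     : SHOW.ASM     show.inc
PAGER.obj    : PAGER.ASM    show.inc
SHOWUTIL.obj : SHOWUTIL.ASM show.inc
```

In a makefile like this, changing an assembler or linker option means editing the AFLAGS or LFLAGS line directly, since the corresponding PWB menu commands are unavailable.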
Where to Go from Here

This concludes the PWB tutorial section of this manual. If you wish, you can leave PWB by choosing Exit from the File menu (or by pressing ALT+F4). Chapter 4, "User Interface Details," explains how to start PWB, describes the elements of the user interface, and gives you an overview of the menus. Chapter 5, "Advanced PWB Techniques," explains search techniques (including regular-expression searching), describes how to use the browser, and shows how to write PWB macros. Chapter 6, "Customizing PWB," describes how to change the behavior of PWB to suit your needs. Chapter 7, "PWB Reference," contains an alphabetical reference to PWB menus, keys, functions, predefined macros, and switches.
CHAPTER 4
User Interface Details
This chapter summarizes the PWB user interface. It contains:

- General information on starting PWB.
- Instructions on how to use elements of the PWB screen.
- A description of the indicators on the status bar.
- A summary of every PWB menu command.
- Instructions on how to use menus and dialog boxes.
Starting PWB
You can start PWB in either of the following ways:
- From the Windows operating system Program Manager
- From the operating-system command line
From the Command Line

To start PWB from the command line:
At the operating-system prompt, type:
PWB [[options]] [[filename]]
PWB starts with its default startup sequence. For a complete list of PWB options and their meanings, see "PWB Command Line" on page 131.
Sometimes, you will want to modify the default startup sequence. The following procedures are examples of how you can start PWB to accommodate different circumstances.
Filename: LMAETC04.DOC Project: Environment and Tools Template: MSGRIDA1.DOT Author: Cris Morris Last Saved By: Mike Eddy Revision #: 86 Page: 57 of 1 Printed: 10/09/00 02:48 PM
To start PWB with an existing PWB project:
Type PWB /PP project.mak
PWB opens the specified project and the files that you were working on with the project.

To start PWB with the project you used in your last session:
Type PWB /PL
As with the previous option, the /PL option opens a project and arranges your screen as it was when you left PWB.

To start PWB quickly for editing a file such as CONFIG.SYS:
Type PWB /DAS /t CONFIG.SYS
This command suppresses autoloading of extensions and status files (/DAS). It also tells PWB not to remember CONFIG.SYS for the next PWB session (/t CONFIG.SYS).
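The startup cases described in this section can be collected in a small helper. The sketch below is only an illustration in Python: the function name pwb_command and its parameters are hypothetical, and it uses only the options described in this section (/PP, /PL, /DAS, and /t). It builds the argument list and prints it; it does not run PWB.

```python
def pwb_command(project=None, last_project=False, quick_file=None):
    """Return the PWB command line (as a list of arguments) for one startup case.

    Hypothetical helper for illustration only; it covers just the three
    cases described in this section.
    """
    chosen = sum([project is not None, last_project, quick_file is not None])
    if chosen > 1:
        raise ValueError("choose only one startup case")
    if project is not None:
        # /PP opens an existing PWB project makefile.
        return ["PWB", "/PP", project]
    if last_project:
        # /PL reopens the project used in the last session.
        return ["PWB", "/PL"]
    if quick_file is not None:
        # /DAS suppresses autoloading of extensions and status files;
        # /t keeps the file out of the record for the next session.
        return ["PWB", "/DAS", "/t", quick_file]
    # No options: PWB runs its default startup sequence.
    return ["PWB"]


print(" ".join(pwb_command(project="project.mak")))
print(" ".join(pwb_command(last_project=True)))
print(" ".join(pwb_command(quick_file="CONFIG.SYS")))
```

Keeping the cases mutually exclusive mirrors the way the procedures are presented here, one option set per situation.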
Using the Windows Operating System Program Manager

Microsoft Windows offers features that can enhance program development, particularly if you plan to develop Windows-based applications. You can edit and build your application in an MS-DOS session and then immediately run it under the Windows operating system. See Getting Started for a full description of how to set up Windows operating system icons for MASM in the Windows Program Manager.
To start PWB with Windows, double-click the PWB icon.
You can add a Program Item to the Program Manager for each project you are working on. Use the PIF editor to open PWB.PIF, and then choose Save As on the File menu to create a .PIF file with the same base name as your project. Next, use the Optional Parameters text box to specify the /PF or /PP options and the name of the project makefile.
To run PWB in a window by default, you can change Display Usage in the PIF file to Windowed and (optionally) Execution to Background. Then, choose Project Templates on the Options menu. In the Build Rule edit field of the Customize Project Templates dialog box, type:
macro WXFLAGS "/w"
and select Set Build Rule. Choose OK.
Using the Windows Operating System File Manager

When programming, you are often concentrating on which file or project you want to work on and would prefer that the computer provide the right tool for the job. With the Windows File Manager, you can associate certain types of files with the commands that operate on those files. Therefore, when you double-click the filename in the File Manager, the right tool starts with the correct command-line options.
You can associate project makefiles (.MAK files) with the PWB .PIF file. Double-clicking a project makefile then starts PWB and opens that project, source files and all.
To associate PWB with .MAK files:
1. Select any file in the File Manager with the extension .MAK.
2. From the File menu, choose Associate.
3. Type the command PWB.PIF in the dialog box. (Make sure that your PWB.PIF file specifies a question mark (?) in the Optional Parameters text box.)
Now when you double-click a project makefile, the File Manager automatically starts PWB, and PWB opens that project.
Note Be sure you have set your PATH, INIT, and TMP environment variables prior to starting the Windows operating system so PWB can find all its files.
The PWB Screen

Figure 4.1 shows the PWB display. The table which follows it describes each of the user interface elements.
Figure 4.1 User Interface Elements

Name         Description
Menu bar     Lists available menus.
Menu         Lists PWB commands.
Desktop      Background area.
Icon         Displays a window in compact form.
Window       Contains source code; displays Help, browser results, build results, or error messages.
Scroll bars  Change position in file or list.
Status bar   Shows command buttons for the mouse and shortcut keys; summarizes commands and file and keyboard status.
Figure 4.2 shows a PWB window. The table that follows it describes each of a window's elements.
Figure 4.2 Window Elements

Name                  Description
Window border         Moves window. Drag to move the window.
Close box             Closes the window. Click to close the window.
Window number         Identifies window. Press ALT+number to move to that window.
Window title          Indicates window contents, a filename, or pseudofile title.
Minimize box          Shrinks window to an icon. Click to minimize the window.
Maximize/Restore box  Enlarges window to maximum size or restores window to its original size.
Scroll up arrow       Scrolls up by lines. Click to scroll up.
Page up area          Scrolls up by pages. Click to page up.
Scroll box            Indicates relative position in the file. Drag to change position.
Page down area        Scrolls down by pages. Click to page down.
Scroll down arrow     Scrolls down by lines. Click to scroll down.
Size area             Sizes window. Drag to size the window.
Move bar              Moves window. Drag to move the window.
Figure 4.3 shows the PWB status bar. The table that follows it describes each of the status bar's elements.
Figure 4.3 Status Bar Elements

Name             Description
Message area     Shows command buttons for the mouse and shortcut keys, and summarizes commands.
Status           Indicates current file, editor, and keyboard status, as described in the following table.
Location         Shows the location of the cursor in the file.
Command buttons  Show common commands and shortcut keys. Click the button or press the key to execute the command.
Line             Indicates the line at the cursor. When scanning a file during a search or when loading a file, PWB displays the current line in the line indicator as specified by the Noise switch.
Column           Indicates the column at the cursor.
The status area of the status bar displays one of the following letters to indicate the corresponding status.
Letter  Description
T       File is temporary and is not recorded in the PWB status file.
R       File is no-edit (read-only); modification is disabled.
L       Line endings in the file are linefeed characters only.
M       File is modified.
P       File is a pseudofile.
A       Meta prefix (F9) is active.
X       Macro recording is turned on.
O       Overtype mode is enabled. In insert mode, no indicator appears.
C       CAPS LOCK is on.
N       NUM LOCK is on.
Figure 4.4 shows the Window menu with the PWB Windows cascaded menu pulled down. The table which follows it describes each element of a menu.
Figure 4.4 PWB Menu Elements

Name              Description
Menu              Displays a list of commands.
Menu command      Executes the command. When the command is dimmed, it is unavailable.
Shortcut key      Executes the command directly and bypasses the use of the menu. Press the key to execute the command.
Cascaded menu     Lists a group of related commands. The command for a cascaded menu has a small right arrow after the command. To open a cascaded menu, click the command or move the selection cursor to the command and press the RIGHT ARROW key. To close an open cascaded menu, press the LEFT ARROW key.
Access key        Executes the command. Press the highlighted letter key to execute the command.
Selection cursor  Indicates the selected command. Press the UP ARROW and DOWN ARROW keys to move the selection cursor. Press ENTER to execute the command.
PWB Menus
PWB commands are organized into menus; the menu names appear along the menu bar at the top of the screen. When a menu or command is selected, PWB displays a brief description of the selected menu on the status bar. To get more information about a menu or command, point the mouse cursor to the name and click the right mouse button, or highlight the name by using the arrow keys and then press F1.
File
The File menu provides commands to open, close, and save files. You can switch to any open PWB file or find a specific file on your disk. You can also print a selection, a file, or a list of files.
Command    Description
New        Start a new file
Open       Open an existing file
Find       Locate a file or list of files on disk
Merge      Merge one or more files into the current file
Next       Open the next file in the list of files specified on the command line
Save       Save the current file
Save As    Save the current file with a different name
Save All   Save all modified files
Close      Close the current file
Print      Print a selection, the current file, or a list of files
DOS Shell  Temporarily exit to the operating system
All Files  List all open files in PWB
Exit       Leave PWB
Edit
The Edit menu provides commands to manipulate text, set the selection mode, and record macros.
Command           Description
Undo              Reverse the effect of your recent edit
Redo              Reverse the effect of the last Undo
Repeat            Repeat the last edit
Cut               Delete selected text and copy it to the clipboard
Copy              Copy selected text to the clipboard
Paste             Insert text from the clipboard
Delete            Delete selected text without copying it to the clipboard
Set Anchor        Save the current cursor position
Select To Anchor  Select text from the anchor to the cursor
Stream Mode       Set stream selection mode
Box Mode          Set box selection mode
Line Mode         Set line selection mode
Read Only         Toggle the PWB no-edit state (to prevent accidental modification or to allow modification)
Set Record        Define a macro name and its shortcut key
Record On         Record commands for a macro
Search
The Search menu provides commands to perform single-file and multifile text and regular-expression searches. You can do single-file and multifile find-and-replace operations. You can define and jump to marks or go to specific lines.
Command         Description
Find            Search for an occurrence of a text string or pattern
Replace         Search for a string or pattern and replace it with another
Log             Turn multifile searching on and off
Next Match      Move to the next match
Previous Match  Move to the previous match
Goto Match      Go to the match at the cursor in the Search Results window
Goto Mark       Move to a mark or line number
Define Mark     Set a mark at the cursor
Set Mark File   Open or create a mark file
Project
The Project menu provides commands to open and create projects, build a project or selected targets in the project, and determine the location of build errors and messages.
Command         Description
Compile File    Compile or assemble the current source file
Build           Build the project
Rebuild All     Build all files in the project (even those that have not been modified)
Build Target    Build specific targets in the project
New Project     Create a new project
Open Project    Open an existing project
Edit Project    Change the list of files in the project
Close Project   Remove the current project from memory without changing its contents
Next Error      Move to the next error
Previous Error  Move to the previous error
Goto Error      Move to the error at the cursor in the Build Results window
Run
The Run menu provides commands to set arguments for the project's program, run and debug the program, run operating-system commands, and add or run custom Run menu commands.
Command             Description
Execute             Run the current program
Program Arguments   Specify commands passed to your program for Execute or Debug
Debug               Run CodeView for the current program
Run DOS Command     Perform any single DOS task without exiting PWB
Customize Run Menu  Add commands to the Run menu
The custom commands that you add to the Run menu appear after the Customize Run Menu command.
Options
The Options menu provides commands to set environment variables for use within PWB, customize the look and behavior of PWB, and assign keys to commands. For projects, you can set the build type, customize the project template, and set assembler and utility options.
Command                Description
Environment Variables  View and modify environment variables
Key Assignments        Assign keys that invoke functions and macros
Editor Settings        Change the setting of any PWB switch
Colors                 Change screen colors
Build Options          Specify whether the program is built as a debug or release version; specify a build directory
Project Templates      Cascaded menu of commands for project templates
Language Options       Cascaded menu of compiler options commands
The Project Templates cascaded menu provides the following commands to manage project templates:
Command                         Description
Set Project Template            Change the run-time support and project template
Customize Project Template      Modify the current project template
Save Custom Project Template    Save the current template as a new, custom template
Remove Custom Project Template  Delete custom project templates
The Language Options cascaded menu provides the following commands for setting assembler and compiler options for any other languages that may be installed:
Command       Description
MASM Options  Set assembler options
Note Additional languages, such as Basic and FORTRAN, are listed when their PWB extensions are loaded. To load the Basic extension, rename PWBBASIC.XXT in the BIN subdirectory to PWBBASIC.MXT. For FORTRAN, do the same for PWBFORT.XXT.
The following commands appear when the utilities extension (PWBUTILS) is loaded:
Command           Description
LINK Options      Set linker options for your project
NMAKE Options     Set options for NMAKE when it builds the project
CodeView Options  Set options for CodeView when debugging the project
The following command appears when the browser extension (PWBROWSE) is loaded:
Command         Description
Browse Options  Define the way the Source Browser database is built
Browse
The Browse menu provides the commands for the PWB Source Browser. You can select a browser database. You can jump to specific definitions or symbols in your project and view complex relationships among program symbols. You can also view your program as an outline, function-call tree, or, if you are using Microsoft C++, you can even view it as a class-inheritance tree.
Command               Description
Open Custom           Open a custom browser database, open the project database, or save the current database
Goto Definition       Locate the definition of any symbol in your source code
Goto Reference        Locate the references to any name in the browser database
View Relationship     Query the browser database
List References       Display a list of functions that call each function and show the use of each variable, type, macro, or class
Call Tree (Fwd/Rev)   View which functions call other functions
Function Hierarchy    Display a program outline
Module Outline        Display an outline of program modules
Which Reference?      Display a list of possible references for the ambiguous reference at the cursor
Class Tree (Fwd/Rev)  View the class-inheritance tree (for the C++ language)
Class Hierarchy       View the hierarchy of classes (for the C++ language)
Next                  Find the next definition or reference
Previous              Find the previous definition or reference
Match Case            Define whether or not browse queries are case sensitive
Window
The Window menu provides commands to manipulate and navigate windows in PWB.
Command                  Description
New                      Duplicate the active window
Close                    Close the active window
Close All                Close all windows
Move                     Start window-moving mode for the active window
Size                     Start window-sizing mode for the active window
Restore                  Restore a minimized or maximized window to normal size
Minimize                 Shrink the active window to an icon
Maximize                 Enlarge windows to maximum size
Cascade                  Arrange windows to show all their titles
Tile                     Arrange windows so that none overlap
Arrange                  Organize windows in a useful configuration for viewing Help, source code, and Build Results
PWB Windows              Cascaded menu that lists the following special PWB windows:

    PWB Window       Description
    Build Results    View the results of builds
    Search Results   View the results of logged searches
    Print Results    View the results of print operations
    Record           View, edit, save recorded macros
    Clipboard        View the PWB clipboard
    Help             Access the Help system
    Browser Output   View the results of browser queries

1 window1 ... 5 window5  Move to window n
All Windows              View a list of all open windows
The All Windows command does not appear until the full list of open windows is too long to fit on the menu.
Help
The Help menu contains commands to access the Microsoft Advisor Help system. You can see the index or table of contents for the system, get context-sensitive Help, and perform global plain-text searches in the Help system.
Command         Description
Index           Display a list of available indexes
Contents        Display a table of contents for each component of the Help system
Topic           Display Help about the item or keyword at the cursor
Help on Help    Display information on how to use Help
Next            Display the next Help screen that has the same name as the topic you last viewed
Global Search   Search all open Help files for a string or regular expression
Search Results  View the results of the last global Help search
About           Display the PWB copyright and version number
Executing Commands
PWB commands appear in menus and as buttons. You can execute these commands in two ways:
- With a Microsoft Mouse or any fully compatible pointing device
  You perform mouse operations by clicking: moving the mouse cursor to the specified item and briefly pressing the left mouse button. Double-click by pressing the left button twice, quickly. Always use the left mouse button unless specifically instructed otherwise.
- With the keyboard
Choosing Menu Commands

To choose a menu command with the mouse:
1. Click the menu name to open the menu.
2. Click the command.

To choose a menu command from the keyboard:
1. Press the ALT key to activate the menu bar.
2. Press the highlighted character in the menu name (such as F for File).

An alternative is:
1. Press the ALT key to activate the menu bar.
2. Use the RIGHT ARROW and LEFT ARROW keys to select a menu.
3. Press ENTER to open the menu.
4. Press the highlighted character in the command name (such as S for Save in the File menu), or use the UP ARROW and DOWN ARROW keys to select the command and then press ENTER.

There are several ways to close an open menu without executing a command.
To close a menu without executing a command:
Do one of the following:
- Click outside of the menu.
- Press ESC.
- Press ALT twice.

When a menu command is dimmed (rather than black), it is unavailable. For example, when no windows are open, the Close command on the File menu is unavailable. If a command you want to use is unavailable, you must perform some other action or complete a pending action before you can invoke that command.
Shortcut Keys
Some commands are followed by the names of keys or key combinations. Press the shortcut key to execute the command immediately. You don't have to go through the menu. For example, press SHIFT+F2 to execute the Save command, which saves the current file.
All menu commands with shortcut keys and many other menu commands invoke predefined PWB macros to carry out their action. You can change the key or add new shortcut keys for these commands by assigning a key to the predefined macro. For a complete list of predefined macros and their corresponding menu commands, see "Predefined PWB Macros" on page 207. For more information on assigning keys, see "Changing Key Assignments" on page 109.
Many PWB functions have been assigned to keys besides those listed on the menus. Choose the Key Assignments command on the Options menu to view a list of functions and macros and their assigned keys.
Buttons
You can often execute commands by using buttons or boxes, which are areas of the screen that perform an action when you click them or select them from the keyboard. For example, the rectangle at the upper-left corner of a window is the close box. Clicking this box with the mouse closes the window. A command name surrounded by angle brackets (< >) appearing on the status bar or in a dialog box is a button. The following buttons are on the status bar when you first start PWB:
<General Help> <F1=Help> <Alt=Menu>
The General Help button brings up a screen that explains how to use the Help system. The other two buttons remind you of PWB functions: F1 summons Help, and ALT activates the menu bar. Clicking one of these buttons with the mouse performs the same function as pressing the key. When you have opened more than one window, PWB displays the following buttons:
<F1=Help> <Alt=Menu> <F6=Window>
Click the Window button or press F6 to move to the next window.
When a menu is selected or a dialog box is displayed, an informative message appears on the status bar. While PWB displays this message, no buttons are available and clicking the status bar does nothing.
Dialog Boxes
When a menu command is followed by an ellipsis (...), PWB needs more information before executing the command. You enter this information in a dialog box that appears when you choose the command. Dialog boxes can contain any of the items in Figure 4.5.
Figure 4.5 Dialog Box Elements

Option Button
A button that you select from a list of mutually exclusive choices. Click the one you want, press its highlighted letter, or use the arrow keys to move among the choices.

Text Box
An area in which you can type text. You can move the cursor within the text box by clicking the location with the mouse or by pressing the LEFT ARROW and RIGHT ARROW keys. You can toggle between insert and overtype mode by pressing the INS key. To select text for deletion, click and drag the mouse over the text or press SHIFT plus an arrow key. Press DEL to delete the text, or type new text to replace the highlighted text.

List Box
A box displaying a list of information (such as the contents of the current disk directory). If the number of items exceeds the visible area, click the scroll bar to move through the list or press PGUP, PGDN, or the arrow keys. You can move to the next item in the list that starts with a particular letter by typing that letter.
Combo Box
The combination of a text box and a drop-down list box. You can type the name of an item in the text box or you can select it from the list. To open the list, click the highlighted arrow, or press ALT+DOWN ARROW or ALT+UP ARROW. You can then click the item or press the arrow keys to select the item you want. You can also press the first letter of an item to select it. When you have selected an item, click the highlighted arrow or press ALT+DOWN ARROW or ALT+UP ARROW to close the list.

Command Button
A button within angle brackets (< >) that invokes a command. Choose the OK button to accept the current options, or choose the Cancel button to exit the dialog box without changing the current options. Choose the Help button to see Help on the dialog box. If one of the command buttons in a dialog box is highlighted, press ENTER to execute that command. You can also click a command button to execute that command. If a button contains an ellipsis (...), it indicates that another dialog box will appear when you choose it.

Check Box
An on/off switch. If the box is empty, the option is turned off. If it contains the letter X, the option is turned on. Click the box with the mouse, or press the SPACEBAR or the UP ARROW and DOWN ARROW keys to toggle a check box on and off.

Key Box
A pair of braces ({ }) that allows you to enter a key by pressing the key. The key box is always followed by a text box where you can type the name of the key. When the cursor is in the key box (between the braces), most keys lose their usual meaning, including ESC, F1, and the dialog box access keys. The key you press is interpreted as the key to be specified. Only TAB, SHIFT+TAB, ENTER, and NUMENTER retain their usual meaning. To specify one of these keys, type the name in the text box.
Figure 4.6 Key Box and Check Box
Clicking a dialog-box item either selects it (a text box, for example) or toggles its value (a check box or option button). You can also move among items with the TAB and SHIFT+TAB keys.
Dialog boxes usually contain access keys, identified by highlighted letters. Pressing an access key is equivalent to clicking that item with the mouse or moving to it by pressing TAB or SHIFT+TAB, and then pressing ENTER. Although some access keys are uppercase and others lowercase, dialog boxes are not case sensitive. Therefore, you can type either an uppercase or a lowercase character.
Note When the cursor is in a text box, access keys are interpreted as text. You must press ALT along with the highlighted letter. Pressing ALT is also required in list boxes because typing a letter by itself moves the cursor to the next item that starts with that letter.
CHAPTER 5
Advanced PWB Techniques
This chapter introduces you to some of the powerful features in the Programmer's WorkBench. It is not an exhaustive discussion of all the ways to use PWB. However, it can provide a starting point for you to begin your own investigation of PWB using the information in the Microsoft Advisor and in Chapter 7, "Programmer's WorkBench Reference." This chapter contains:
- Find and search-and-replace techniques, including marks and regular expressions.
- How to use the PWB Source Browser.
- How to execute PWB functions and macros.
- An overview of PWB macros, macro recording, and the macro language.
Searching with PWB

PWB offers the following ways to search your files for information:
- Visually inspecting code, moving with the cursor or the PGUP and PGDN keys. This method is most effective either when you are familiarizing yourself with some old code or after you have switched from CodeView back to PWB and want to examine the local impact of a proposed change.
- Searching with text strings. When you have a specific string in mind (for example, FileName), you can find, in sequence, all the references to the name in your file.
- Searching with regular expressions. Regular expressions describe patterns of text. This is useful when you have a number of similarly named program symbols and you'd like to see them all in succession. For example, Windows API (application programming interface) names are made up of multiple words; the start of each new word is shown as a capital
Filename: LMAET05A.DOC Project: Environment and Tools Template: MSGRIDA1.DOT Author: Harold S. Henry Last Saved By: Mike Eddy Revision #: 89 Page: 77 of 1 Printed: 10/09/00 02:38 PM
letter (for example, GetTextMetrics). With this in mind, you might search for a string optionally starting with spaces and the letters GetText followed by any uppercase letter. This is expressed with a regular expression such as " *GetText[A-Z]" (the expression begins with a space), which means zero or more spaces (using the * operator after a space), followed by GetText, followed by any uppercase letter (using a character class).
- Searching multiple files with text strings or regular expressions. A multifile search is called a "logged search." Instead of finding one match, PWB finds all matches in one operation. You can then browse the results of the search.
- Using the Source Browser. The Source Browser enables you to perform faster and more sophisticated searches than plain text searches because it maintains a complete database of relationships between program symbols. For example, to find all occurrences of FileName in your entire program (regardless of the number of files in the program), you can use the View References command from the Browse menu. The Source Browser has many more capabilities than just finding symbols. It can also produce call trees and perform sophisticated queries on the use-and-definition relationships among functions, variables, and classes in your program.
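The GetText pattern described in the list above can be tried outside PWB. The following sketch uses Python's re module, not PWB's own regular-expression engine; for this particular pattern (a space, the * operator, literal text, and a character class) Python reads the notation the same way as described above. The sample lines are made up for illustration.

```python
import re

# Zero or more spaces, then "GetText", then any uppercase letter.
pattern = re.compile(r" *GetText[A-Z]")

# Hypothetical source lines, used only to exercise the pattern.
sample_lines = [
    "    GetTextMetrics(hdc, &tm);",
    "GetTextExtent(hdc, text, len);",
    "count = gettextlength(s);",      # lowercase: no match
    "GetText(buffer);",               # no uppercase letter after GetText
]

matches = [line for line in sample_lines if pattern.search(line)]
for line in matches:
    print(line)
```

Only the first two sample lines are reported, because the character class requires an uppercase letter immediately after GetText.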
These searching techniques are discussed in detail in the following sections.
Searching by Visual Inspection

If you think you're close to the text you want to see, don't try a fancy search; use the PGUP or PGDN key. It's often faster. One trick you can use to speed up this method of searching is to leave a trail in the form of marks (names associated with file locations).
Using Marks
PWB lets you set named marks at any location in your file by using the Define Mark command from the Search menu or by using the Mark function. You can access these locations by name using the Goto Mark command or the Mark function.
For example, if you are writing code and want to leave certain sections until later, or if you are revising an existing program and don't fully understand all the algorithms, you might leave a mark at each location in the code you want to come back to. That way, you can work on some sections of the program first, and then come back to the marked areas after further research.
To save marks between PWB sessions, create a mark file using the Set Mark File command from the Search menu.
Using the Find Command

The Find command on the Search menu allows you to search a file using text strings and regular expressions. Searching forward uses the Psearch function (assigned to the F3 key), while searching backwards uses the Msearch function (assigned to the F4 key). Find can help you locate any string of text in any file you specify. PWB usually searches the file you are currently editing. However, it can also search a list of files. This is particularly useful for finding all occurrences of a string in an entire project. The function used is called Mgrep. The results of a multifile search are logged, that is, put into the Search Results window. To see the logged results of a search, choose Search Results from the PWB Windows cascaded menu. There are two ways to use the information that PWB puts into Search Results:
- You can look at the matches in sequence by choosing Next Match and Previous Match from the Search menu.
- You can go directly to a specific match by moving the cursor to the match listed in the Search Results window and choosing Goto Match from the Search menu. PWB then jumps to the location of the match.
The Match commands on the Search menu work with the Search Results window in exactly the same way that the Project menu's Next Error, Previous Error, and Goto Error commands work with the Build Results window. These Project menu commands are described in "Fixing Build Errors" on page 23.
To illustrate the logged-search technique, suppose you want to locate all instances of a software interrupt instruction in the SHOW project's source files.
To search all the source files in this project:
1. From the Search menu, choose Find. PWB brings up the Find dialog box.
2. Turn on the Log Search check box.
3. Type int in lowercase.
4. Select the Match Case check box to exclude uppercase or mixed-case occurrences of the word.
5. Choose the Files button. PWB brings up the Search In Selected Files dialog box.
6. Type SHOW*.ASM in the File Name text box. This wildcard specifies all filenames beginning with SHOW and having the .ASM extension.
7. Choose the Add Pattern button to add the wildcard to the file list.
8. In the Drives / Dirs window, select the SAMPLES\SHOW subdirectory under the main MASM directory.
9. Return to the File Name text box by clicking the box or by pressing ALT+F.
10. Type $INCLUDE:dos.inc (with the environment variable, INCLUDE, all in caps). Preceding an environment variable name with a dollar sign causes the contents of that variable to be inserted into the search string. The INCLUDE variable normally contains the path to the directory where general-purpose include files are kept.
11. Press ENTER to add DOS.INC to the file list, or click Add / Delete.
12. Choose OK to start the search.
When PWB finishes the search, it displays the Log Search Complete dialog box.
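Step 10 relies on the $VARIABLE:filename notation. As a rough model of what that notation asks for, the Python sketch below looks for the file in each directory listed in the named environment variable. The lookup order is an assumption made for illustration, not a description of PWB's internals; the function name resolve_env_path is hypothetical, and the semicolon separator follows the usual MS-DOS convention for path lists.

```python
import os
import tempfile


def resolve_env_path(spec, environ):
    """Resolve a "$VAR:filename" spec against a dict of environment variables.

    Returns the first existing path found in the directories listed in VAR,
    or None. Specs that do not start with "$" are returned unchanged.
    """
    if not spec.startswith("$"):
        return spec
    name, sep, filename = spec[1:].partition(":")
    if not sep or not filename:
        raise ValueError("expected $VARIABLE:filename, got %r" % spec)
    # Assumption: directories are separated by semicolons, as in MS-DOS.
    for directory in environ.get(name, "").split(";"):
        if not directory:
            continue
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


# A temporary directory stands in for the include directory.
with tempfile.TemporaryDirectory() as include_dir:
    with open(os.path.join(include_dir, "dos.inc"), "w") as handle:
        handle.write("; placeholder include file\n")
    fake_env = {"INCLUDE": "no_such_dir;" + include_dir}
    found = resolve_env_path("$INCLUDE:dos.inc", fake_env)
    print(found)
```

Passing the environment as a dictionary keeps the sketch independent of the machine it runs on; a real lookup would read the process environment instead.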
From this dialog box you can:

- Choose View Results to open the Search Results window.
- Choose Cancel to close the dialog box.
Choose Cancel now (you will open the Search Results window later).
To go to the first match:
From the Search menu, choose Next Match.
You can step sequentially through all occurrences of the string using the Next Match command. Choose Previous Match to move to the previous occurrence of the string. When you reach the end of Search Results, PWB displays the following message:
End of Search Results
Sometimes, you cannot focus the search narrowly enough to make a sequential scan of Search Results profitable. In this example, you wanted only instances of the software interrupt instruction, but PWB found many more occurrences of int. In these cases, you can examine the results of the search and skip the matches that aren't relevant.
To view the Search Results:
To see all matches from the search, open the Search Results window. You can do this by choosing Search Results from the PWB Windows cascaded menu on the Window menu.
In the Search Results window, PWB displays the file, line, and column where the string was located. It also shows as much of the matching line as will fit in the window.
For example, if the instruction you were looking for is the Interrupt 10h in PAGERR.ASM, you can jump directly to that location.
To jump directly to a match:
1. Put the cursor on the match.
2. From the Search menu, choose Goto Match.
PWB opens the correct file if it is not already open and positions the cursor on the text you located. You can use multifile searching regardless of whether the files that you want to search are open in PWB.
Using Regular Expressions

The PWB searching capabilities that you have used so far are useful when you know the exact text you are looking for. Sometimes, however, you have only part of the information that you want to match (for example, the beginning or end of the string), or you want to find a wider range of information. In such cases, you can use regular expressions.
Regular expressions are a notation for specifying patterns of text, as opposed to exact strings of characters. The notation uses literal characters and metacharacters. Every character that does not have special meaning in the regular-expression syntax is a literal character and matches an occurrence of that character. For example, letters and numbers are literal characters. A metacharacter is an operator or delimiter in the regular-expression syntax. For example, the backslash (\) and the asterisk (*) are metacharacters.
PWB supports two syntaxes for regular expressions: UNIX and non-UNIX. Each syntax has its own set of metacharacters. The UNIX metacharacters are .\[]*+^$. The non-UNIX metacharacters are ?\[]*+^$@#(){}. Because it uses fewer metacharacters, the UNIX form is a little more verbose. However, it is more familiar to programmers who have experience with UNIX tools such as awk and grep. This book uses the UNIX syntax, but any expression that can be written with this syntax can also be written with the non-UNIX syntax.
The regular-expression syntax used by PWB depends on the setting of the Unixre switch (this is a Boolean switch, and UNIX is the default). You can change the Unixre switch by using the Editor Settings dialog box on the Options menu.
Note: PWB switches that take regular expressions always use UNIX syntax. They are independent of the Unixre switch.
Finding Text
In the multifile searching example, you learned how to locate every occurrence of int in the SHOW project. In a large project, finding every int would yield too many matches. To narrow the search, you can use a regular expression. For this example, let's say you want to match any int instruction. You can specify this with a regular expression. The expression below matches text that:
• Begins with the keyword int
• Is followed by white space
• Is followed by one or more hex digits (characters between 0 and 9 or between A and F)
The syntax for this regular expression is shown in Figure 5.1.
Figure 5.1
Regular Expression Example
It illustrates the following important features of regular expressions:
1. Regular expressions can contain literal text. In this example, int is literal text and is matched exactly.
2. Regular expressions can contain predefined regular expressions. Here, \:b is shorthand for a pattern that matches one or more spaces or tabs (that is, white space). For a complete list of predefined regular expressions, see Appendix B.
3. You can use classes of characters in regular expressions. A class matches any one character in the class. For example, the class [0-9a-f] is the class of characters that contains the digits between 0 and 9 and the lowercase letters between a and f. The dash (-) defines a range of characters in a class.
4. The plus sign (+) after the class instructs PWB to look for one or more occurrences of any of the characters in the class.
This is the key to regular expressions. You don't have to know exactly what the interrupt number is; all you have to do is describe what kind of characters make it up. The pattern int\:b[0-9A-F] matches strings such as
int 21h        ; DOS function interrupt
int 3Ah
; Print 25 lines...
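As a rough cross-check outside PWB, the same kind of search can be approximated with Python's re module. This is only a sketch under stated assumptions: Python has no \:b shorthand, so white space is written as [ \t]+; the re.IGNORECASE flag stands in for leaving Match Case turned off; and the function name find_int_matches is made up for illustration.

```python
import re

# Approximation of the PWB pattern: the literal "int", white space,
# then one or more hex digits. [ \t]+ replaces PWB's \:b shorthand.
INT_PATTERN = re.compile(r"int[ \t]+[0-9a-f]+", re.IGNORECASE)


def find_int_matches(lines):
    """Return (line_number, matched_text) pairs, numbering lines from 1."""
    results = []
    for number, line in enumerate(lines, start=1):
        for match in INT_PATTERN.finditer(line):
            results.append((number, match.group(0)))
    return results


if __name__ == "__main__":
    sample = [
        "int 21h        ; DOS function interrupt",
        "int 3Ah",
        "; Print 25 lines...",
    ]
    for number, text in find_int_matches(sample):
        print(number, text)
```

Note that the comment line also matches, because the word Print ends in int and is followed by a number. This is the kind of unwanted hit that a more precise expression can exclude.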
Figure 5.2 shows a more detailed way to write an expression that matches only an int 20h or an int 21h.
Figure 5.2
Complex Regular Expression Example
This expression is more precise than most searches require, but it is useful as an illustration of how to write a complex regular expression. You can interpret this expression as follows:
1. Start at the beginning of the line, which is specified by a caret (^) at the beginning of the regular expression. Using an initial caret is particularly helpful in a situation like this if your file uses space characters rather than tabs. Otherwise, when you begin your search criteria with \:b, the search will return one match for every space character preceding the matching text. For example, if your instruction column is indented eight spaces, searching for \:bmov\:b will return eight copies of every mov instruction, one for each of the preceding space characters. Including the initial caret, however, will result in only one match per line.
2. Skip any label on the line, without matching a comment line. The [^; \t] indicates a class made up of any characters that are not a semicolon, a space, or a tab character. Within brackets, a caret (^) at the beginning of the class indicates an inverse class, that is, one including all characters except the specified ones. Following the class is an asterisk, indicating that zero or more such characters may be present. In general, optional items are specified using the asterisk (*) operator, which indicates that zero or more of the preceding character should be matched. For example, the expression consisting of a space followed by an asterisk ( *) means match zero or more spaces.
3. Skip white space. The predefined UNIX regular expression \:b is equivalent to [ \t]+, which requires that there be at least one space or tab.
4. Look for the int instruction as literal text.
5. Skip white space. The expression [ \t]+ is equivalent to \:b, and requires that there be one or more spaces or tabs.
6. Skip optional 0 digits.
7. Look for a 2 digit as literal text.
8. Look for either a 0 digit or a 1 digit.
9. Look for an uppercase or lowercase h character.
This expression is so exact that it may take longer to write than the time it saves. The key to using regular expressions effectively is determining the minimal characteristics that make the text qualify as a match.
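The nine steps above can be sketched in Python's re syntax. Figure 5.2 is not reproduced here, so the expression below is pieced together from the step descriptions and should be read as an assumption rather than the figure's exact text. The name is_dos_int is hypothetical.

```python
import re

# One sub-pattern per step in the list above:
#   ^            1. start at the beginning of the line
#   [^; \t]*     2. skip an optional label (no semicolon, space, or tab)
#   [ \t]+       3. skip white space (PWB's \:b)
#   int          4. the instruction as literal text
#   [ \t]+       5. skip white space
#   0*           6. skip optional 0 digits
#   2            7. a literal 2
#   [01]         8. either 0 or 1
#   [hH]         9. uppercase or lowercase h
DOS_INT = re.compile(r"^[^; \t]*[ \t]+int[ \t]+0*2[01][hH]")


def is_dos_int(line):
    """Return True if the line holds an int 20h or int 21h instruction."""
    return DOS_INT.match(line) is not None


if __name__ == "__main__":
    for line in ("        int     21h",
                 "exit:   int     20h",
                 "        int     10h",
                 "; int 21h in a comment"):
        print(repr(line), is_dos_int(line))
```

Because step 3 requires at least one space or tab before int, an instruction that starts in the first column is not matched by this form of the expression.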
To find all int 20h and int 21h instructions: 1. From the Search menu, choose Find. 2. In the Find Text box, type ^\:bint\:b2[01]. 3. Select the Regular Expression check box. 4. Choose the Files button. 5. Add the pattern *.ASM and the file $INCLUDE:DOS.INC to the file list. 6. Choose OK to start the search.
When the search is complete, choose View Results. You can see in the Search Results window that PWB matched only the int 20h and int 21h instructions.
Replacing Text
You can use regular expressions when changing text to achieve some extremely powerful results. A regular expression replacement can be a simple one-to-one replacement, or it can use tagged expressions. A tagged expression marks part of the matched text so that you can copy it into the replacement text.
For example, you can manipulate lists of files easily using regular expressions. This exercise shows how to get a clean list of files that is stripped of the size and time-stamp information.
To get a clean list of .ASM files in the current directory:
1. From the File menu, choose New. This gives you a new file for the directory listing.
2. Execute the function sequence Arg Arg !dir *.ASM Paste. The default key sequence for this command is to press ALT+A twice, type !dir *.asm, then press SHIFT+INS.
Arg Arg introduces a text argument to the Paste function with an Arg count of two. The exclamation point (!) designates the text argument to be run as an operating-system command. Without the exclamation point, the text is the name of a file to be merged. If only one Arg is used, PWB inserts the text argument.
PWB runs the DIR command and captures the output. When the DIR command is complete, PWB prompts you to press a key. When you press a key, PWB then inserts the results of the command at the cursor. For more information about this and other forms of the Paste function, see "Paste" in Chapter 7, "Programmer's WorkBench Reference."
3. From the Search menu, choose Replace.
4. In the Find Text box, type \:b\:z \:z-.*$
This pattern means:
• White space, followed by
• A number, followed by
• Exactly one space, followed by
• A number, followed by
• A dash (-), followed by
• Any sequence of characters, then
• End of the line
This string must be tied to the end of the line to prevent the search from finding anything too close to the beginning of the line.
5. Make sure there are no characters in the Replace Text text box.
6. Choose Replace All. PWB prompts you to verify that you want to replace text with an empty string.
7. Choose OK to confirm that you want to perform the empty replacement.
All the file-size, date, and time-stamp information is removed. Because you did not reuse any of the original text in the replacement, this is a simple regular expression replacement. Choose Close from the File menu to discard the text you created in the previous exercise.
A more complicated task is backing up the .ASM files to a directory called LAST, which is assumed to be a subdirectory of the current directory. A batch file makes this easier. You can create such a batch file using regular expressions.
To create a batch file that copies the .ASM files to a subdirectory:
1. Create a list of .ASM files in the current directory as described in the previous example, but do not remove the file sizes, dates, and times.
2. Delete the heading printed by the DIR command.
3. From the Search menu, choose Replace.
4. In the Find Text text box, type ^\([^ ]+\)[ ]+\([^ ]+\).*
5. This expression finds a string that starts at the beginning of the line (^). Placing parts of the expression inside the delimiters \( and \) is called tagging. The first tagged expression (\([^ ]+\)) matches one or more characters that are not spaces. A leading caret in a class means "not." The pattern then matches one or more spaces ([ ]+), followed by the second tagged expression, which matches one or more characters that are not spaces. The remainder of the line is matched by the wildcard (.), which matches any character, and the repeat operator (*). Matching the rest of the line is important because that is how this pattern removes everything after the filename. It discards these portions of the matched text.
6. In the Replace Text text box, type COPY \1.\2 .\\LAST 7. Select Replace All and choose OK to begin the find-and-replace operation. PWB transforms each directory entry into a command to copy the file to the LAST subdirectory.
The word COPY is inserted literally. The text matched in the first tagged expression (the base name) replaces the expression \1. The period is inserted literally. The text matched by the second tagged expression (the filename extension) replaces the expression \2. The space is inserted literally. The text .\\LAST is inserted as .\LAST. Be sure to use two backslashes to indicate a literal backslash; otherwise, PWB expects a reference to a tagged expression such as \1 and displays an error message.
You'll notice that the last two lines of the file are not useful in your batch file. They are the remnants of the summary statistics produced by the DIR command. Delete these two lines and you have a finished batch file.
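Both replacement exercises can be tried outside PWB with a short Python sketch. Several things here are assumptions for illustration: the sample DIR lines (the sizes, dates, and times are invented), the function names, and the translation of the PWB shorthands into Python's re syntax, where tagged expressions become ordinary capture groups written with plain parentheses.

```python
import re

# Hypothetical DIR output in the MS-DOS layout: name, extension,
# size, date, time. The numbers are invented for this sketch.
LISTING = [
    "SHOW     ASM     12345 05-12-92   3:04p",
    "SHOWUTIL ASM      6789 05-12-92   3:05p",
]

# First exercise: white space, a number, exactly one space, a number,
# a dash, then anything up to the end of the line.
STRIP_DETAILS = re.compile(r"[ \t]+[0-9]+ [0-9]+-.*$")

# Second exercise: two tagged runs of non-space characters separated
# by spaces, followed by the rest of the line.
TO_COPY = re.compile(r"^([^ ]+)[ ]+([^ ]+).*")


def clean_names(lines):
    """Remove size, date, and time from each directory line."""
    return [STRIP_DETAILS.sub("", line) for line in lines]


def copy_commands(lines):
    """Turn each directory line into a COPY command for .\\LAST."""
    # As in PWB, \1 and \2 insert the tagged text, and a doubled
    # backslash in the replacement stands for one literal backslash.
    return [TO_COPY.sub(r"COPY \1.\2 .\\LAST", line) for line in lines]


if __name__ == "__main__":
    print("\n".join(clean_names(LISTING)))
    print("\n".join(copy_commands(LISTING)))
```

The replacement string follows the same rule described above: a single backslash before LAST would be read as the start of a group reference or escape, so it is doubled.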
Using the Source Browser

Another search technique is browsing. Browsing uses information generated by the compiler to help you find pieces of code quickly. This section introduces you to some of the capabilities of the Source Browser. The browser is a handy tool for moving about in projects, large and small.
In addition to navigating through your program, you can use the browser to explore the relationships between parts of the project. The browser database contains full information about where each symbol is defined and used and about the relationships among modules, constants, macros, variables, functions, and classes. Note that the browser files can be very large.
Creating a Browser Database

Before you can use the PWB Source Browser, you must build a browser database. PWB helps you maintain this database automatically as a part of a normal project build. To build a browser database: 1. Open the SHOW project using the Open Project command from the Project menu (this project is located in the SAMPLES\SHOW subdirectory of your main MASM directory). 2. From the Options menu, choose Browse Options. PWB displays the Browse Options dialog box.
3. Select the Generate Browse Information check box. 4. Choose OK. The browser changes the project makefile to build the project. It adds compiler options for creating browser information (.SBR files). It includes a BSCMAKE command which combines the .SBR files and creates a browser database (a .BSC file). 5. From the Project menu, choose Rebuild All. Rebuilding the entire project ensures that the database contains up-to-date information for all files in your program.
When the build completes, the following new files are on your disk:
• SHOW.BSC, the browser database.
• SHOWUTIL.SBR, a zero-length placeholder for the SHOWUTIL module.
• PAGER.SBR, a placeholder for PAGER.
• SHOW.SBR, a placeholder for SHOW.
After adding each .SBR file's contribution to the database, BSCMAKE truncates it and leaves the empty .SBR file on disk to provide an up-to-date target for later builds. Leaving these files on the disk ensures that a browser database is not rebuilt unless it is out of date with respect to its source files.
A PWB project is not required to create a browser database (although it is convenient). For information on how to build a browser database for non-PWB projects, see "Building Databases for Non-PWB Projects" on page 94.
Finding Symbol Definitions

When you are working on a program, it's easy to forget where a particular variable, constant, or function is defined. You can use the Find command to locate occurrences of a symbol, but that offers little information about which one is the definition. To make such searches easier, you can choose Goto Definition from the Browse menu to jump directly to the definition of any symbol in your program. The following procedure uses the SHOW project to demonstrate how powerful the browser can be.
To investigate the GetNamePos procedure:
1. From the Window menu, choose Close All.
2. Open SHOW.ASM.
3. Go to line 174 (from the Search menu, choose Goto Mark, type 174, and press ENTER).
4. Move the cursor to the GetNamePos procedure.
5. From the Browse menu, choose Goto Definition. PWB displays the Goto Definition dialog box.
Notice that GetNamePos is highlighted and the defining file's name is displayed in the list box to the right. More than one defining file is listed if a name is defined in several scopes.
6. Choose OK. PWB opens SHOWUTIL.ASM and shows the definition of GetNamePos.
Showing the Call Tree

Often when analyzing an existing program's flow, or when looking for opportunities for optimization, it's useful to refer to a call tree. A call tree is a view of your program that provides, for each function, a list of other functions called.
To generate a call tree of SHOW:
1. From the Browse menu, choose Call Tree. PWB displays the Display Tree dialog box.
2. Choose SHOW.ASM from the Modules list box. Notice that the Functions list box changes to show only the functions in SHOW.ASM.
3. Choose OK to see the call tree.
Three kinds of annotations appear in the call tree:
?      A symbol followed by a question mark is used by your program but not defined in any of the program files in the browse database. These are often library functions.
[n]    The number n between square brackets shows symbols that are used more than once. In the preceding example, GetNamePos is listed (under SHOW.ASM) as:
       GetNamePos[3]
       This means that there are three references to GetNamePos in SHOW.
...    (ellipsis) The ellipsis means that the full information for the function appears elsewhere in the call tree.
Finding Unreferenced Symbols

As you write, rewrite and maintain a program, you will occasionally remove function calls or references to global variables, leaving unused code or data space in your program. Since the browser database contains information about where every function and variable is referenced, you can easily identify ones that are not used. This section shows how to use the Source Browser to find and remove the extra code and data.
The system include files define many more functions than most programs use. Therefore, unreferenced functions in your program are easiest to find when using a browser database that does not contain the system include files. This example begins by building a browser database for SHOW that does not contain information defined by system include files.
To build the SHOW browser database:
1. From the Options menu, choose Browse Options. PWB displays the Browse Options dialog box.
2. In the Browse Options dialog box, select the Generate Browse Information, the Exclude System Include Files, and the Include Unreferenced Symbols check boxes.
3. Choose OK.
Now that the browse options are set, rebuild the project and browser database by choosing Rebuild All from the Project menu. With the updated browser database, you can obtain a list of references for functions and variables.
To get a list of references for functions and variables:
1. From the Browse menu, choose List References. PWB displays the List References dialog box.
2. Select the Functions, Variables and Macros options, and then choose OK. PWB opens the Browser Output window and creates the list of references. Each name is followed by a colon and a list of functions that refer to the name. To find an unreferenced symbol: Search for the regular expression :$ (colon, dollar sign). This pattern specifies a colon at the end of the line. It finds names that are followed by an empty list of references.
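The same colon-at-end-of-line test can be sketched in Python for a reference list that has been saved to text. The listing format used here (a name, a colon, then the referring functions on one line) follows the description above, but the sample entries and the name unreferenced_names are hypothetical.

```python
import re

# A colon at the very end of the line: a name with no references.
UNREFERENCED = re.compile(r":$")


def unreferenced_names(listing_lines):
    """Return the names whose reference list is empty."""
    names = []
    for line in listing_lines:
        match = UNREFERENCED.search(line)
        if match:
            # Everything before the trailing colon is the symbol name.
            names.append(line[:match.start()].strip())
    return names


if __name__ == "__main__":
    # Invented entries; only the "name: references" shape matters.
    sample = [
        "GetNamePos: main",
        "OldHelper:",
        "Pager: main Scroll",
    ]
    print(unreferenced_names(sample))
```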
In the list of references created above for SHOW, a search for this expression will find no matches, since there are no unreferenced symbols. To find all unreferenced items with one search, you can perform a logged search and add only <browse> (the Browser Output pseudofile) to the file list. This is especially useful for large projects.
To go to the definition of an unreferenced symbol in the source:
1. Place the cursor on the symbol in question. From the Browse menu, choose Goto Definition. PWB automatically selects the definition of the symbol under the cursor. However, if the symbol begins with @ or ? or other punctuation characters, the nonalphabetic character is not automatically recognized as part of the symbol name. To include it, mark the entire name before choosing Goto Definition.
2. Choose OK. PWB jumps to the definition of the selected symbol in the appropriate source file, where you can remove the unused function, macro, or variable.
Advanced Browser Database Information

In the previous sections, you learned the basics of building a browser database and some useful applications of the Source Browser. In this section, you will find information on what goes into a browser database and how to estimate the disk requirements to build one. You will also learn how to build a database for non-PWB projects and how to build a single database for related projects.
Estimating .SBR and .BSC File Size

When you build a browser database, you first create an .SBR file for each source file in the project. Each of these files contains the following information:
• The name of the source file and the files it includes.
• Every symbol defined in the source file and the files it includes. These symbols are the names of all functions, types (including the names of all classes, structures, and enumerations and their members), macros (including symbols in the expanded macro), and variables in the file. These symbols also include all parameters and local variables for the functions.
• The location of all symbol definitions in the files.
• The location of all references to every symbol in the files.
• Linkage information.
This is a tremendous amount of information about your program and can therefore occupy a large quantity of disk space. The benefit is that the Source Browser provides fast, sophisticated access to this database of knowledge about your program. For assembler source files, the .SBR file may be between a quarter and a half the size of the preprocessed source file (that is, the source file with comments removed, all files included, and all macros expanded). You might assume that the resulting browser database (.BSC file) is approximately the sum of all the .SBR files. However, the browser database is the union of the information in the component .SBR files. This means that the .BSC file is usually not very large. Much of the information in the .SBR files is defined in include files, which are common to many modules in the project. The union of the .SBR files is relatively small because most of the include-file information is duplicated in each .SBR file. Even for C or C++ programs, which tend to create much larger .BSC files, a good-sized program will seldom require a .BSC file larger than 500K.
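The difference between summing the .SBR files and taking their union can be illustrated with Python sets. This is only a toy model: it counts symbol names rather than bytes, and the module contents below are invented to show the effect of shared include-file symbols.

```python
def size_comparison(sbr_symbols):
    """Return (sum of per-module symbol counts, size of their union).

    sbr_symbols maps a module name to the set of symbol names that
    the module's .SBR file would record.
    """
    total = sum(len(symbols) for symbols in sbr_symbols.values())
    union = set().union(*sbr_symbols.values())
    return total, len(union)


# Hypothetical symbols contributed by a shared include file.
INCLUDE_SYMBOLS = {"DOS_INT", "STDOUT", "BUFSIZE", "PrintMacro"}

# Each module records the include-file symbols plus one of its own.
SAMPLE = {
    "SHOW": INCLUDE_SYMBOLS | {"main"},
    "SHOWUTIL": INCLUDE_SYMBOLS | {"GetNamePos"},
    "PAGER": INCLUDE_SYMBOLS | {"Pager"},
}

if __name__ == "__main__":
    total, merged = size_comparison(SAMPLE)
    print("sum of .SBR entries:", total)
    print("entries after union:", merged)
```

In this toy case the three modules record 15 entries between them, but only 7 distinct symbols remain once the duplicated include-file entries are merged.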
Building Databases for Non-PWB Projects

The simplest way to build a browser database for non-PWB projects is to build the browser database separately from the project. You can use a makefile or a batch file for this purpose. The process requires only two steps: 1. Create an .SBR file for each module. The simplest way to do this is to run the compiler with the options to produce an .SBR file and no other files. For example, the ML command line:
ML /Zs /W0 /Fr *.asm
specifies that the compiler processes all .ASM files in the current directory, checks syntax only (/Zs), and issues no warnings (/W0). Therefore, no object files are produced. However, browser information (.SBR files) is generated (/Fr).
2. Combine the .SBR files into a browser database. The syntax for this command is:
BSCMAKE options /o project.BSC *.sbr
For complete information on BSCMAKE options and syntax, see Chapter 19.
The process of creating a browser database changes little between projects. Therefore, you could use a batch file similar to the following example for many projects:

ECHO OFF
REM Require at least one command-line option
IF %1.==. GOTO USAGE
REM Compile to generate only .SBR files
ML /Zs /W0 /Fr *.asm
REM Build the browser database
BSCMAKE %2 %3 %4 %5 %6 %7 %8 /o%1.BSC *.sbr
GOTO END
:USAGE
REM Print instructions
ECHO -Usage: %0 project [option]...
ECHO    project      Base name of browser database
ECHO    [option]...  List of BSCMAKE options
:END
This batch file assumes that all the project sources are in the current directory. It requires that you specify the name of the browser database and allows BSCMAKE options. You may want to change this file to specify different BSCMAKE or assembler options. If your project's sources are distributed across several directories, you must write a custom batch file or makefile to build the database. For more information on the BSCMAKE utility, see Chapter 19.
To use a custom browser database in PWB:
1. From the Browse menu, choose Open Custom.
2. Choose the Use Custom Database button.
3. Select your custom browser database and choose OK. If you want to save this database name permanently, choose Save Current Database.
4. Choose OK.
The PWB Source Browser opens your custom database. You can now browse your non-PWB project.
If you are using a makefile to build your project, you can choose Open Project from the Project menu and open it as a non-PWB project makefile. If the project makefile has the same base name as the browser database and resides in the same directory, PWB automatically opens the database when you open the project. For more information on using a non-PWB makefile for a project in PWB, see "Using a Non-PWB Makefile" on page 55.
Building Combined Databases

If you have two or more closely related projects, you can combine the browser databases for the projects. For example, if two large programs differ only in one or two modules so that most of the sources are shared between the two projects, it can be useful to browse both projects with a single browser database. To build a combined browser database: 1. Generate the .SBR files for both projects. 2. Pass all of the .SBR files to BSCMAKE to build the combined database. The resulting database is the inclusive-OR of the information in the two projects.
Executing Functions and Macros

The menus and dialog boxes in PWB provide access to almost everything you need to do to develop your projects. You can edit, search, and browse your source files. You can build, run, and debug your project, and you can view Help for the entire system. However, the visible display provides access to only part of the capabilities available in PWB. Behind the menu commands lie functions with many more options than you can access from the menus. Many functions and macros are not assigned to keys by default. The sophisticated PWB user learns how to use the functions and predefined macros to perform the precisely correct action.
Each function has several forms that are invoked with the combinations of the Arg and Meta prefixes. These two functions are used to introduce arguments and modify the action of PWB functions.
Arg (ALT+A)
The fundamental function in PWB. You use Arg to begin selecting text, introduce text and numeric function arguments, or modify the action of functions by increasing the Arg count. To pass a text argument to a function, for example, press ALT+A, and then type the text. The text you type doesn't go into your file. The Text Argument dialog box appears when you type the first letter of the text.
You can then edit the text. PWB displays the current argument count and Meta state in the dialog box. Notice that there is no OK button in this dialog box. Instead of choosing OK, press the key for the function you want to execute with this argument. Choose the Cancel button if you do not want to execute a function.
Meta (F9)
Modifies the action of a function in different ways from the various argument types. It generally toggles an aspect of the function's action. For example, the text-deletion functions usually move the deleted text to the clipboard. However, when modified with Meta, they clear the text without changing the clipboard.
The combination of Arg and Meta greatly increases the number of variations available to each function. For example, the Psearch function can perform different search operations depending on how it is executed. Psearch can:
• Repeat the previous search (Psearch).
• Search for text (Arg text Psearch).
• Perform a case-sensitive text search (Arg Meta text Psearch).
• Search for a regular expression (Arg Arg text Psearch).
• Search for a case-sensitive regular expression (Arg Arg Meta text Psearch).
Because you can reassign keys to your preference, the PWB documentation cannot assume that a specific key executes a given function or macro. Therefore, the PWB documentation gives a sequence of functions or macros by name, followed by the same sequence of actions by key name. In this book, the key is the default key. In PWB Help, the displayed key is the one currently assigned to that function. When no key is assigned, PWB displays unassigned. For example, to insert the definition of a macro at the cursor, you pass the name of the macro to the Tell function and modify Tell's action with the Meta prefix. This sequence of actions is expressed as follows:
• Execute the function sequence Arg Meta macroname Tell (ALT+A F9 macroname CTRL+T).
If the Tell function is assigned to a different key, Help displays that key in place of CTRL+T. Chapter 7, "Programmer's WorkBench Reference," contains complete descriptions of all forms of each function in PWB.
Executing Functions and Macros by Name

The most frequently used functions and macros are assigned to certain keys by default. For example, the Paste function is assigned to SHIFT+ENTER, Linsert is assigned to CTRL+N, and so on. Sometimes, however, you want to use a function or macro that is not assigned to a key. You can always assign a key by using the Key Assignments command or by using the Assign function. However, that is a lot of trouble for something you need only once. PWB allows you to execute a function or macro by name, rather than by pressing a key.
To execute a function or macro by name:
Perform the function sequence Arg function Execute (ALT+A function F7). In other words, press ALT+A (execute the Arg function), type the name of the function or macro, and then press F7 (invoke the Execute function).
The argument to Execute doesn't have to be a single function or macro name. It can be a list of functions and macros. The argument is really a temporary, nameless macro. This means that you can do anything in an argument to Execute that you can do in a macro. PWB follows the rules for macro syntax and execution. You can define labels, test function results, and loop.
Warning: When executed from a macro, PWB functions that display a yes-or-no prompt assume a Yes response. To restore the prompt, use the macro prompt directive (<). For more information, see "Macro Prompt Directives" in PWB Help.
Writing PWB Macros

The Programmer's WorkBench, like other editors designed for programmers, provides a macro language so that you can customize and extend the editor or automate common tasks. You can create macros in one of the following ways:
• By recording actions you perform. The recording mechanism allows you to perform a procedure once, while PWB is recording. After you've recorded it, you can execute the macro to repeat the recorded procedure.
• By manually writing macros. This technique is less automatic but does allow you to write more powerful macros.
These two techniques are not mutually exclusive. You can start by recording a macro that approaches the steps you want to perform, then edit it to expand its functionality or handle different situations.
When Is a Macro Useful?

Macros are useful for automating procedures you perform frequently. You may also write macros that automate tedious one-time tasks. Of course, not every task is a good candidate for automation. It might take longer to write the macro than to do the task by hand. If you don't expect to perform a task often, don't automate it. Also, automated editing procedures introduce an element of risk. You might not foresee situations that your macro can encounter. Incorrect macros can sometimes be destructive. A little experience with macros and some careful testing will enable you to create a good set of macros for your own use.
Recording Macros
Recording actions you perform with the mouse or at the keyboard can be a powerful way to write a macro. You turn on recording and perform the actions that you want the macro to execute. You can concentrate on the task that you want to automate, instead of concentrating on the syntax of the macro language.
For example, if you occasionally reverse characters when you type quickly, a macro to transpose them is useful. Before recording a macro to transpose characters, you should think about what you are going to do while recording the macro. To transpose characters, you will select the character at the cursor, cut it onto the clipboard, move over one character, and then paste the character you cut.
To record a macro that transposes characters:
1. From the Edit menu, choose Set Record. PWB brings up the Set Macro Record dialog box.
2. In the Name text box, type Transpose.
3. Click the mouse in the key box (between the braces { }), or press TAB until the cursor is in the key box.
4. Press CTRL+SHIFT+T (for transpose). PWB automatically fills in the name of the key you pressed.
5. Press TAB to leave the key box, and then choose OK. PWB closes the Set Macro Record dialog box. When you turn on macro recording, PWB records a macro called Transpose and associates it with SHIFT+CTRL+T.
Important The Set Macro Record command does not start the macro recorder. It only specifies the name and key association for the macro you are going to record.
6. From the Edit menu, choose Record On. When you choose Record On, the macro recorder starts. To indicate that the macro recorder is running, PWB displays the letter X on the status bar. Notice that the Project, Options, and Help menus are unavailable while PWB is recording a macro.
7. Select the character at the cursor by holding down the SHIFT key and pressing the RIGHT ARROW key.
8. Press SHIFT+DEL to cut the character onto the clipboard.
9. Press the RIGHT ARROW key to move the cursor to the new location for the character.
10. Press SHIFT+INS to paste the character from the clipboard back into the text.
11. From the Edit menu, choose Record On to stop the macro recorder.
Press SHIFT+CTRL+T to switch the character at the cursor with the character to the right. You can now use the new macro and key assignment for the rest of the PWB session.
To edit the macro:
From the Window menu, choose Record from the PWB Windows cascaded menu. PWB opens the Record window.
The Record window shows the definition of the Transpose macro that you just recorded. You can edit the definition to change the way the macro works. For example, you decide that the macro should reverse the character at the cursor with the character to the left, instead of the character to the right.
To redefine the macro:
1. Change the macro to read as follows:
Transpose:=select left delete left paste
2. Move the cursor to the macro definition.
3. Press ALT+=, the default key for the Assign function. Assigning the macro replaces the previous definition of Transpose with the new definition.
4. Return to the file you were originally viewing.
Up to this point, the macro exists only in memory. To use your recorded macro for subsequent PWB sessions, you must save the definition of the macro to disk.
To save the macro:
1. If the Record window is not open, choose Record from the PWB Windows cascaded menu. PWB opens the Record window.
2. From the File menu, choose Save. PWB inserts the macro definition and the key assignment into your TOOLS.INI file for future sessions.
When you leave PWB, you are prompted to save TOOLS.INI. Your changes are not permanent until you actually save TOOLS.INI.
Flow Control Statements

Recorded macros have the inherent limitation of playing back one fixed sequence of commands. Often you need a macro to execute repeatedly until some condition is satisfied. This requires that you use flow control statements to govern the actions your macro takes. All editor functions return a true or false value. The macro flow control operators that use these values are:
Operator    Meaning
+>label     Branch to label if last function yields TRUE
->label     Branch to label if last function yields FALSE
=>label     Branch unconditionally to label
:>label     Define label
These rudimentary operators are not as sophisticated as a high-level language's IF statement or FOR loop. They are more like an assembly language's conditional jump instruction. However, they provide the essential capabilities needed for writing loops and other conditional constructs.
Flow Control Example

If you frequently perform multiple-window editing, a macro that restores the display to a single window can be helpful. Such a macro requires the following logic:
1. Switch to the next window.
2. If the switch is not successful (meaning that only one window is present), end the macro.
3. If the switch is successful (another window is present), close that window and go back to step one.
This macro will be called CloseWindows and assigned to SHIFT+CTRL+W.
To create the CloseWindows macro:
1. From the File menu, choose All Files. PWB displays the All Files dialog box. Notice that your TOOLS.INI file is in the list of open files, even though you did not explicitly open it. PWB opens TOOLS.INI to load its configuration information (unless you specify /DT on the PWB command line).
2. Select the TOOLS.INI file in the list of open files.
3. Choose OK. PWB opens a window and displays your TOOLS.INI file.
4. Find the section of TOOLS.INI that begins with [pwb]. This is the section where PWB keeps its startup configuration information.
5. In the PWB section, type the following two new lines:
CloseWindows:= :>Loop Openfile -> Meta Window Window =>Loop
CloseWindows: SHIFT+CTRL+W
If you want these definitions to take effect immediately, select both lines and press ALT+= to execute the Assign function. You can also assign the definitions one at a time.
6. Choose Save from the File menu to make this macro and key assignment part of your TOOLS.INI file.
The next time you start PWB, the CloseWindows macro is defined and assigned to the SHIFT+CTRL+W key.
The first line you typed uses the := operator to associate the macro definition with the name CloseWindows. After the operator is the list of functions and macro operators that specify what the macro is to do. The second line is a separate statement that uses the : operator to assign the macro to the SHIFT+CTRL+W key.
The CloseWindows macro works as follows:
1. :>Loop defines a label called Loop. There cannot be a space between the :> operator and the label name.
2. Openfile switches to the window under the active window.
3. The -> operator examines the return value from the Openfile function. If the function returns false because there is no other window, the -> operator exits the macro.
4. The phrase Meta Window closes the active window.
5. Window returns to the window you started from.
6. =>Loop unconditionally transfers control back to the Loop label and starts the sequence again.
When this macro is defined, you can press SHIFT+CTRL+W whenever you want to close all windows except the active window.
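The branching behavior described above can be modeled outside PWB. The following Python sketch is not PWB code and is only a rough model of the four flow control operators: the function names switch and close are hypothetical stand-ins for the Openfile and Meta Window Window phrases, and the rule that +> and => without a label also leave the macro is an assumption (the text confirms it only for ->).

```python
def run_macro(tokens, functions, max_steps=1000):
    """Run a list of macro tokens and return the names of the functions called."""
    # First pass: remember where each :>label is defined.
    labels = {tok[2:]: i for i, tok in enumerate(tokens) if tok.startswith(":>")}
    trace = []
    last = True   # truth value returned by the most recent function
    pc = 0        # index of the next token to execute
    for _ in range(max_steps):
        if pc >= len(tokens):
            return trace
        tok = tokens[pc]
        pc += 1
        if tok.startswith(":>"):
            continue                      # label definition: nothing to execute
        if tok[:2] in ("+>", "->", "=>"):
            taken = (tok[:2] == "=>"
                     or (tok[:2] == "+>" and last)
                     or (tok[:2] == "->" and not last))
            if taken:
                target = tok[2:]
                if not target:
                    return trace          # operator without a label: leave the macro
                pc = labels[target]
            continue
        last = functions[tok]()           # ordinary function: call it, keep its result
        trace.append(tok)
    raise RuntimeError("step limit reached; the macro may loop forever")


windows = ["main.asm", "util.asm", "notes.txt"]   # hypothetical open windows

def switch_window():
    # Succeeds only when there is another window to switch to.
    return len(windows) > 1

def close_window():
    # Close the window that was switched to.
    windows.pop()
    return True

close_windows = [":>Loop", "switch", "->", "close", "=>Loop"]
trace = run_macro(close_windows, {"switch": switch_window, "close": close_window})
print(windows)  # ['main.asm']
```

The step limit is a design choice of the sketch, not a PWB feature: a macro whose loop condition never becomes false would otherwise run forever.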
User Input Statements

PWB macros can prompt for input. This helps you write more general macros. For example, you might keep a history of the changes you make to a file at the top in a format similar to the following:
//** Revision History **
//15-Nov-1991:IAD:Add return value for DoPrint
//31-Oct-1991:IAD:Implement printing primitives
To facilitate entering the revision history in reverse chronological order and to make it easy to keep track of where you were in the source file, you can write a macro to perform the following steps:
1. Set a mark at the cursor for future reference.
2. Insert a revision history header at the beginning of the file if one is not present.
3. Insert the current date.
4. Prompt for initials and insert them just below the header.
5. Prompt for comments and insert them after the initials.
6. Return to the saved position in the file.
Note that while this macro is executing, you can choose the Cancel button in the dialog boxes that prompt for initials and comments. The macro must handle these cases and gracefully back out of the changes to the file.
To enter this macro in TOOLS.INI:
1. Open TOOLS.INI for editing.
2. Type the following macros and key assignment in the [pwb] section of TOOLS.INI:
LineComment:="// "
RevHead:= "** Revision History **"
RevComment:=  \
    Arg Arg "Start" Mark  \
    Begfile Arg RevHead Psearch +>Found  \
    Linsert LineComment RevHead  \
:>Found  \
    Down Linsert Begline LineComment Curdate " ("  \
    Arg "Initials" Prompt ->Quit Paste Endline ") "  \
    Arg "Comment" Prompt ->Quit Paste =>End  \
:>Quit Meta Ldelete  \
:>End Arg "Start" Mark
RevComment:Ctrl+H
There are at least two spaces before the backslash at the end of each line. The backslashes are line-continuation characters. They allow you to write a macro that is more than one line long. In this case, line continuations format the macro in a readable way. To further assist in readability, you can indent the parts of the macro which define the actual keystrokes, as in the preceding example.
3. Choose Save from the File menu to save your changes.
4. To reinitialize PWB, execute the Initialize function by pressing SHIFT+F8. PWB discards all of its current settings and rereads the PWB section of TOOLS.INI. The same effect can be achieved by quitting and restarting PWB.
The following discussion analyzes the workings of the definitions you added to TOOLS.INI. It repeats one or two lines from the text you typed and describes how each line works. You may want to refer to the full definition as you follow along. The first two lines
LineComment:="//"
RevHead:= "** Revision History **"
define two utility macros that are used by the main RevComment macro. They define strings that are used several times in RevComment. The third line

RevComment:= \
declares the name of the macro. The succeeding lines define the action of the RevComment macro.
The first line of the definition

Arg Arg "Start" Mark \
sets a mark named Start at the cursor so that the macro can restore the cursor position after inserting the comments at the beginning of the file. The next line
Begfile Arg RevHead Psearch +>Found \
moves to the beginning of the file (Begfile), then searches forward for the revision-history header. If the header is found, PWB branches to the Found label; otherwise, it executes the next line.
Linsert LineComment RevHead \
If the macro is here, the header was not located in the file. The Linsert function creates a new line, and PWB types the revision-history header. The macro continues with the line:
:>Found \
This line defines the Found label. At this point in the macro, the cursor is on the line with the header. The next lines insert the new revision information, starting with the following line:
Down Linsert Begline LineComment Curdate " (" \
PWB moves the cursor down one line (Down), inserts a new line (Linsert), moves to the beginning of the line (Begline), and calls the LineComment macro to designate the line as a comment. PWB then types the current date (Curdate) and an open parenthesis. The macro prompts for initials:
Arg "Initials" Prompt ->Quit Paste Endline ") " \
The macro uses the Prompt function to get your initials. If you choose the Cancel button, the function returns false, so the macro branches to the label Quit. If you choose the OK button, the text you typed in the dialog box is passed to the Paste function, which inserts the text. The macro moves the cursor to the end of the line (Endline) and types a closing parenthesis. The code on this line explicitly handles the case when you cancel the prompt (the false condition). The phrase ->Quit causes PWB to skip to the label Quit when Prompt returns false. If you use the Prompt function and you do not handle the false condition, a null argument (a text string with zero length) is passed to the next function.
Therefore, a phrase like Arg "Que?" Prompt Paste pastes either the input or nothing, depending on whether you choose the OK or Cancel button. Passing a null argument to Paste is harmless, but some functions require an argument. In these cases, you can use the -> operator to terminate the macro. The RevComment macro uses an explicit label so that it can end the macro without an error when you choose the Cancel button. The next line of the macro is almost the same as the previous line in the macro.
Arg "Comment" Prompt ->Quit Paste =>End \
On this line, if the paste is carried out, an unconditional branch is taken to the label End and passes over the Quit branch, which is defined on the next line.
:>Quit Meta Ldelete \
The Quit branch is taken when you cancel a prompt. The macro has to clean up the text already inserted by the macro. The Meta Ldelete function deletes the incomplete line that would have been the revision-history entry. The next line defines the last step of the macro.
:>End Arg "Start" Mark
The End label defines the entry point for the common cleanup code. This line restores the cursor to the initial position when you invoked the macro. Because this line does not end in a line-continuation character (\), it is the end of the RevComment macro. The last line that you typed is not part of the RevComment macro. It is a separate TOOLS.INI entry.
RevComment:Ctrl+H
This line assigns the CTRL+H key to the RevComment macro. You can polish this macro by adding Arg "Start" Meta Mark to the end of the macro. This phrase deletes the mark. A better alternative is to use the Savecur and Restcur functions instead of named marks. However, this example uses named marks to illustrate how to use them in a macro.
Chapter 6
Customizing PWB
PWB is a completely customizable development environment. You can modify PWB in the following ways:
- Changing mapping of keystrokes to actions.
- Changing default behavior of PWB (for example, how tabs are handled or if PWB automatically saves files).
- Changing the colors of parts of the PWB display.
- Adding new commands to the Run menu.
- Programming new editor actions (macros). Instructions on how to write macros are in "Writing PWB Macros" on page 98.
In addition to the customizations that you can make by using the commands in the Options menu, you can also customize PWB by editing the TOOLS.INI file.
Note Another category of customization that is not covered in this book is how to write PWB extensions. An extension is a dynamically loaded module that can access PWB's internal functions. Extensions can do much more than macros. To learn more about writing PWB extensions, see the Microsoft Advisor Help system (choose PWB Extensions from the main Help table of contents).
Changing Key Assignments

PWB maps actions (functions and macros) to keys. You can assign any of these actions to keys other than the default keys. For example, Exit is a PWB function. Its default key assignment is F8. A BRIEF user may prefer to use ALT+X to leave the editor.
To make ALT+X execute the Exit function:
1. From the Options menu, choose Key Assignments. PWB displays the Key Assignments dialog box.
2. Select Exit in the Macro/Function List box, or type exit in the Macro/Function Name text box.
3. Move the cursor to the New Key box between the braces ({}) by clicking between the braces or by pressing ALT+K.
4. Press ALT+X. PWB types ALT+X in the text box after the braces and displays the name of the macro or function that ALT+X is currently assigned to. With the default settings, you can see that ALT+X is assigned to the Unassigned function. Pressing a key in the key box is a quick way to find out the name of the function assigned to the key.
Note When the cursor is in the key box (between the braces), most keys lose their usual meaning, including ESC, F1, and the dialog box access keys. The key you press is interpreted as the key to be assigned. Only TAB, SHIFT+TAB, ENTER, and NUMENTER retain their usual meaning. To assign one of these keys, type the name of the key in the text box.
5. Press TAB to move the cursor out of the key box.
6. Choose Assign. PWB assigns Exit to the ALT+X key. Note that Exit is still assigned to the F8 key. Functions can be assigned to many keys.
7. Choose OK.
Important To change a key, you must choose the Assign button. The OK button only dismisses the dialog box. It does not perform any other action. This design allows you to assign many keys in one session with the dialog box.
The change remains in effect for the duration of the session.
To make a permanent key assignment:
1. From the Options menu, choose Key Assignments.
2. Choose Save. PWB displays the Save Key Assignments dialog box, which lists all of the unsaved assignments that you have made during the PWB session by using the Key Assignments dialog box.
3. Delete any settings that you do not want to save.
4. Choose OK. PWB writes your new settings into the [PWB] section of TOOLS.INI for subsequent sessions.
When you exit PWB, you are prompted to save TOOLS.INI. Your changes are not permanent until you actually save the file to disk.
If you already know the function name, you can make a quick assignment for the current session by using the Assign function instead of going through the Key Assignments dialog box.
To assign a key using the Assign function:
Execute the function sequence: Arg function:key Assign (ALT+A function:key ALT+=).
For example, to assign Exit to ALT+X:
1. Press ALT+A to execute Arg.
2. Type exit:ALT+X
3. Press ALT+= to execute Assign.
The assignment is in effect for the rest of the PWB session. The key assignments you make by using the Assign function are not listed in the Save Key Assignments dialog box.
To discover the name of the function or macro that is currently assigned to a key, use the Key Assignments dialog box (as previously described) or use the Tell function.
To find a current key assignment using Tell:
1. Press CTRL+T to execute Tell. PWB displays a prompt.
2. Press the key you want to find out about. If you press F10, PWB displays the function assigned to the F10 key (Openfile).
The Tell function has many other uses in addition to displaying key assignments. For more information on Tell, see page 202.
Changing Settings
When you first use PWB, you don't have to specify the tab stops, whether the editor starts in insert or overtype mode, and so on. These settings (called switches) are all covered by defaults. PWB's default behavior can be extensively customized by changing the values of PWB switches. Switches fall into three categories:
- Boolean switches. True/false or on/off switches that can also be specified as yes/no or 0/1. An example of a Boolean switch is Autosave, which governs whether PWB saves a file when you switch to a different one.
- Numeric switches. An example of a numeric switch is Undocount, which determines the maximum number of editing actions you can undo.
- Text switches. Examples of a text switch are Markfile, the name of the file in which to store marks, Tabstops, a list of tab-stop intervals, and Readonly, the operating-system command for PWB to run when saving a read-only file.
To change the setting for Tabstops:
1. From the Options menu, choose Editor Settings. PWB displays the Editor Settings dialog box.
2. Tabstops is a text switch (not a numeric switch as you might expect), so select the Text option button.
3. Select Tabstops in the Switch List box. PWB shows the current setting for Tabstops in the Switch text box at the top of the dialog box.
4. Move to the Switch text box by clicking in the box or by pressing ALT+S. PWB selects only the switch value, instead of the entire text.
5. Type the new setting:
3 4 7 8
This setting defines a tab stop at columns 4, 8, 15, and every eight columns thereafter. At this point, the Editor Settings dialog box should look like:
6. Choose the Set Switch button to change the setting of the Tabstops switch.
7. Choose OK.
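The way the interval list 3 4 7 8 turns into tab stops at columns 4, 8, 15, and every eight columns thereafter can be checked with a short calculation. The Python sketch below is only an illustration of that arithmetic, not part of PWB; it assumes columns are numbered from 1 and that the last interval repeats, which is what the columns quoted in step 5 imply.

```python
from itertools import islice

def tab_stop_columns(intervals):
    """Yield 1-based tab-stop columns for a Tabstops-style interval list."""
    if not intervals:
        raise ValueError("at least one interval is required")
    column = 1                      # assumption: the first column is column 1
    for width in intervals:
        column += width
        yield column
    while True:                     # the last interval repeats indefinitely
        column += intervals[-1]
        yield column

stops = list(islice(tab_stop_columns([3, 4, 7, 8]), 6))
print(stops)  # [4, 8, 15, 23, 31, 39]
```

A single-value setting such as 3 gives evenly spaced stops at columns 4, 7, 10, and so on under the same assumptions.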
Important To change a setting you must choose the Set Switch button. The OK button only dismisses the dialog box. It does not perform any other action. This design allows you to set many switches in one session with the dialog box.
The new tab stops you set are used for the current session. If you want to use this setting permanently, you must choose the Save button in the Editor Settings dialog box. This changes your TOOLS.INI file in the same way as for key assignments.
You can make temporary switch assignments for the current session by using the Assign function. You do this in the same way as for a key assignment by typing Arg switch:value Assign (ALT+A switch:value ALT+=).
You may be curious about the Switch Owner box that you did not use in this example. The Switch Owner is either PWB or a PWB extension such as PWBHELP (the extension that provides the Microsoft Advisor in PWB). Type or select a switch owner to set switches for that extension. Each extension has its own section in TOOLS.INI.
Note When you choose Set Switch, most switch settings take effect immediately. However, changes to the Height switch do not take effect until you choose OK.
Customizing Colors
You can change the color of almost any item in the PWB interface. For a table showing the names and meanings of PWB's color settings, see page 252 in Chapter 7, "Programmer's WorkBench Reference." Some displays show a brilliant green for the left and right triangular symbols surrounding buttons in Help.
To change the light green to light cyan:
1. From the Options menu, choose Colors. PWB displays the Colors dialog box.
2. Select Helpitalic in the Color list box.
3. Select Cyan in the Foreground list box.
4. Choose Set Color.
To verify your change, press F1. The green symbols in Help are now light cyan. While you are viewing Help, you can find out what parts of PWB the rest of the color names determine. To leave Help, choose the Cancel button or press ESC. PWB returns you to the Colors dialog box.
The Bright Fore and Bright Back check boxes determine if the given color is the usual version of the color or the bright version of the color. Bright black, for example, is usually a dark gray color.
If you want to save your new colors for subsequent sessions, choose the Save button. PWB displays the Save Colors dialog box where you can delete modifications that you don't want to save. When you choose OK in the Save Colors dialog box, PWB modifies TOOLS.INI to record your changes.
Adding Commands to the Run Menu

You can add up to six commands to the Run menu to integrate your own utilities into PWB. A command is the name of any executable (.EXE or .COM) file, batch (.BAT) file, or built-in operating-system command such as DIR or COPY. Suppose you use an outline processor to keep project notes. You can start the outline processor from PWB's Run menu.
To add a command to the Run menu:
1. From the Run menu, choose Customize Run Menu.
2. Choose the Add button. PWB displays the Add Custom Run Menu Item dialog box for you to describe your custom menu item.
3. Type Project ~Notes... in the Menu Text box. The tilde (~) before the letter N indicates the highlighted access letter for the menu command. The ellipsis (...) uses the standard convention to indicate that the command will require more information before it is completed. An ellipsis is commonly associated with a dialog box command but can be used in this context as well.
4. Specify the full path to the outlining program, OUTLINE.EXE, in the Path Name text box. (The program name OUTLINE.EXE is for example purposes only. Substitute the name of your own outliner or other program in its place.)
5. Specify the arguments you want to pass to the outliner in the Arguments text box: %|dpfF.log. This example illustrates a powerful feature of PWB: its ability to extract parts of the filename to form a new name for customized menu items. The specification %|dpfF extracts the drive (d), path (p), and base name (f) of the current file. Anything after F is added to the end of the name. For example, if the current file is C:\SOURCE\COUNT.ASM, the argument that PWB passes to the program is C:\SOURCE\COUNT.LOG.
6. In the Help Line text box, type the explanatory message that appears on the status bar when you browse this menu item:
Run the OUTLINE program
7. Choose OK to confirm your entries. PWB adds the command to your Run menu and modifies TOOLS.INI to save the new item. You can now access your outline processor directly from the Run menu.
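The filename extraction used for the Arguments text box in step 5 can be pictured as taking the current filename apart and reassembling selected pieces. The Python sketch below is a rough model of that idea only. It does not parse the %|dpfF syntax itself; the function name expand_file_spec and its parameters are hypothetical, and only the three part letters named in step 5 (d, p, and f) are modeled.

```python
import ntpath   # MS-DOS style path handling, usable on any platform

def expand_file_spec(letters, suffix, current_file):
    """Join the selected parts of current_file and append suffix.

    letters: any of 'd' (drive), 'p' (path), 'f' (base name), in output order.
    suffix:  literal text added to the end of the name.
    """
    drive, rest = ntpath.splitdrive(current_file)
    directory, name = ntpath.split(rest)
    base, _extension = ntpath.splitext(name)
    if directory and not directory.endswith("\\"):
        directory += "\\"           # keep the separator between path and base name
    parts = {"d": drive, "p": directory, "f": base}
    return "".join(parts[letter] for letter in letters) + suffix

print(expand_file_spec("dpf", ".log", r"C:\SOURCE\COUNT.ASM"))  # C:\SOURCE\COUNT.log
```

The sketch appends the suffix exactly as typed, so its result differs from the C:\SOURCE\COUNT.LOG shown in step 5 only in letter case, which MS-DOS filenames ignore.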
Note You can add other text processing or word processing programs to the Run menu. If you change the current file using another program, PWB prompts you to update the file or to ignore the changes made by the other program.
How PWB Handles Tabs

The following functions and switches control how PWB handles tabs:
Name      Type      Description
Realtabs  Switch    Determines if PWB preserves tabs on modified lines
Entab     Switch    The white space translation method
Tabalign  Switch    The alignment of the cursor within a tab field
Filetab   Switch    The width of a tab field
Tabdisp   Switch    The fill-character for displaying tab fields
Tab       Function  Moves the cursor to the next tab stop
Backtab   Function  Moves the cursor to the previous tab stop
Tabstops  Switch    Tab positions for Tab and Backtab
For detailed information on each function and switch, see Help or Chapter 7, "Programmer's WorkBench Reference." For instructions on how to set a switch, see "Changing Settings" on page 112. For instructions on how to assign a function to a key, see "Changing Key Assignments" on page 109. To understand how PWB handles tabs, you need to know only a few facts:
- The Tab (TAB) and Backtab (SHIFT+TAB) cursor-movement functions and the Tabstops switch have nothing to do with tab characters. They affect cursor movement, rather than the handling of tab characters, and are not discussed further here. For more information on these items, see Chapter 7, "Programmer's WorkBench Reference."
- PWB never changes any line in your file unless you explicitly modify it (lines longer than PWB's limit of 250 characters are the exception). Some text editors translate white space (that is, entab or detab) when they read and write the file. PWB does not translate white space when it reads or writes a file. This is to be compatible with source-code control systems that would detect the translated lines as changed lines. PWB translates white space according to the Entab switch only when you modify a line.
- Tabalign has an effect only when Realtabs is set to yes.
- A tab break occurs every Filetab columns. When PWB displays a tab in the file, it fills from the tab character to the next tab break with the Tabdisp character. Figure 6.1 illustrates how PWB displays tabs.
Figure 6.1 How PWB Displays Tabs
- When translating white space, PWB preserves the exact spacing of text as it is displayed on screen.
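The display rule for tab characters (a tab break every Filetab columns, with the gap filled by the Tabdisp character) can be modeled in a few lines. This Python sketch is an illustration of the rule as stated, not PWB's implementation; the parameter names mirror the switches, and the default values are arbitrary choices made for the example.

```python
def display_line(line, filetab=8, tabdisp="."):
    """Return the line as displayed, with each tab filled out to the next tab break."""
    shown = []
    column = 0                                   # 0-based display column
    for ch in line:
        if ch == "\t":
            width = filetab - (column % filetab)  # columns up to the next tab break
            shown.append(tabdisp * width)
            column += width
        else:
            shown.append(ch)
            column += 1
    return "".join(shown)

print(display_line("ab\tcd\te", filetab=4, tabdisp="."))  # ab..cd..e
```

Changing filetab changes only how wide each tab field appears; the line itself still contains the same tab characters.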
To set the width of displayed tabs, change the setting of the Filetab switch. To tell PWB to translate white space on lines that you modify, set the Realtabs switch to no and the Entab switch to a nonzero value, according to the translation method that you want to use. The Entab switch takes one of the following values:
Entab  Translation Method
0      Translate white space to space characters
1      Translate white space outside of quotation-mark pairs to tabs
2      Translate white space to tabs
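Translation methods 0 and 2 can be sketched as a pair of functions that convert between tabs and spaces while keeping every character in the same display column. The Python code below is a simplified model and not PWB's actual algorithm: it assumes a fixed tab width (filetab), it turns any gap that reaches a tab break into a tab, and it does not attempt method 1 (leaving quoted text alone).

```python
def detab(line, filetab=8):
    """Method 0: replace each tab with spaces, keeping text in the same columns."""
    out, column = [], 0
    for ch in line:
        if ch == "\t":
            width = filetab - column % filetab
            out.append(" " * width)
        else:
            width = 1
            out.append(ch)
        column += width
    return "".join(out)

def entab(line, filetab=8):
    """Method 2: rewrite each run of white space using tabs where a tab break is reached."""
    text = detab(line, filetab)       # work on a spaces-only copy; index == column
    out, i = [], 0
    while i < len(text):
        if text[i] != " ":
            out.append(text[i])
            i += 1
            continue
        j = i
        while j < len(text) and text[j] == " ":
            j += 1                    # the run of spaces covers columns i..j-1
        column = i
        while column < j:
            next_break = (column // filetab + 1) * filetab
            if next_break <= j:
                out.append("\t")      # the run reaches the break: one tab covers it
                column = next_break
            else:
                out.append(" " * (j - column))   # leftover spaces short of a break
                column = j
        i = j
    return "".join(out)

print(repr(entab("ab      c", filetab=4)))  # 'ab\t\tc'
```

Because entab starts from the detabbed text, converting its result back with detab reproduces the original spacing, which matches the requirement that translation preserve the spacing shown on screen.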
To preserve white space exactly as you type it, set the Realtabs switch to yes and the Entab switch to 0. When Realtabs is yes, the Tabalign switch comes into effect. When Tabalign is set to yes, PWB automatically repositions the cursor onto the physical tab character in the file, similar to the way a word processor positions the cursor. When Tabalign is set to no, PWB allows the cursor to be anywhere in the tab field.
If you want the TAB key to type a tab character, assign the TAB key to the Graphic function. Note that when a dialog box is displayed, the TAB key always moves to the next option. You can always use the following method to type a tab character, whether you are in a dialog box or an editing window.
To type a literal tab character in your text or in a dialog box:
1. Execute the Quote function (press CTRL+P).
2. Press TAB.
Examples
The following example sets up tabs so that they act the same as in other Microsoft editors, such as QuickC or Word:
realtabs:yes
tabalign:yes
graphic:tab
trailspace:yes
entab:0
The Trailspace switch is needed so that the TAB key will have an effect on otherwise blank lines. To save your file so that it does not include any actual tab characters (ASCII 9), use the following settings:
realtabs:no
entab:0
tabstops:3
The Tabstops value determines the number of spaces inserted for each press of the tab key. Another example of a common tab configuration is one in which the TAB key inserts a tab in insert mode but moves over text to the next tab stop when the editor is in overtype mode. First, use the following tab settings:
realtabs:yes
tabalign:yes
Then insert the following macro into the PWB section of your TOOLS.INI:
;Insert mode and overtype mode tabbing
TabIO:= Insertmode +>over Insertmode "\t" =>  \
:>over Insertmode Tab
TabIO:TAB
For more information on PWB macros, see "Writing PWB Macros" on page 98.
PWB Configuration
PWB keeps track of three kinds of information between sessions in these three files:
File           Information Saved
TOOLS.INI      Configuration and customizations, such as key assignments, colors, and macro definitions
CURRENT.STS    The editing environment used most recently
project.STS    The editing and building environment for a project
TOOLS.INI is described in the next section: "The TOOLS.INI File." For more information about CURRENT.STS, see "Current Status File CURRENT.STS" on page 128, and for more information about the project.STS files, see "Project Status Files" on page 129.
When you start PWB, it reads the TOOLS.INI file, loads PWB extensions, and reads the CURRENT.STS or project status file in the following order:
1. PWB reads the [PWB] section of TOOLS.INI (except when PWB is started using the /D or /DT command-line options). For more information on tagged sections, see "TOOLS.INI Section Tags." If the [PWB] section contains Load switches, PWB loads the specified extension when each switch is encountered. When PWB loads an extension, it also reads the extension's tagged section of TOOLS.INI, if any. For example, when the Help extension is loaded, PWB reads the [PWB-PWBHELP] section of TOOLS.INI.
2. PWB autoloads extensions (except when the /D or /DA option is used to start PWB). The automatic loading of PWB extensions is described in the next section, "Autoloading Extensions."
3. PWB reads the TOOLS.INI operating-system tagged section (except when /D or /DT is used).
4. PWB reads the CURRENT.STS status file (except when /D or /DS is used to start PWB).
5. PWB reads the TOOLS.INI tagged section for the file extension of the current file (except when /D or /DT is used to start PWB).
6. PWB runs the Autostart macro if it is defined in TOOLS.INI (except when /D or /DT is used).
Autoloading Extensions
PWB automatically loads extensions if they follow a specific naming convention and reside in a certain directory. For extensions that follow the convention, it is not necessary to put load statements in TOOLS.INI. PWB searches the directory where the PWB executable file is located for filenames with the following pattern:
PWB*.MXT
PWB loads as many extensions with names of this form as it finds. When PWB loads an extension, it also loads the extension's tagged section of TOOLS.INI. To suppress extension autoloading, use the /DA option on the PWB command line.
Important Do not rename editor extensions. PWB and some extensions may assume the predefined filename.
The TOOLS.INI File

PWB, like other Microsoft tools, stores information in a file called TOOLS.INI. This file retains information about how you want PWB to work under various circumstances. PWB expects to find this file in the directory specified by your INIT environment variable. TOOLS.INI is a text file. You can edit it using PWB or any other text editor. PWB also can store information directly to TOOLS.INI when, for example, you choose the Save Colors button in the Colors dialog box. PWB modifies this file when you save a recorded macro, a changed switch, a new key assignment, a custom browser database, or a custom project template.
TOOLS.INI Section Tags

The TOOLS.INI file is divided into sections, separated by tags. These tags are specified in the form:
[[tagname]]
The tagname is the base name of an executable file, such as NMAKE, CVW, or PWB. The tag defines the start of a TOOLS.INI section that contains settings for the indicated tool. PWB extends this simple syntax to enable you to take different action depending on the operating system or the current file's extension. The extended syntax is:
[[PWB-modifier]]
The modifier can be the base name of a PWB extension, an operating-system identifier, or a filename extension for files that you edit.
Operating-System Tags
The following table lists the operating-system tags for various operating environments. If you are running the Windows operating system, use the tag for the version of MS-DOS that you are running.
Tag          Operating Environment
[PWB-4.0]    MS-DOS versions 4.0 and 4.01
[PWB-5.0]    MS-DOS version 5.0
Be sure to use the correct version number for your operating system.
Filename-Extension Tags
The operating-system tags are read only once at startup. PWB reads the filename-extension tagged sections each time you switch to a file with that extension. For example, suppose that you want the tab stops for MASM files to be every eight columns, and every five columns for text files.
To set tab options based on filename extension:
1. Open your TOOLS.INI file in an editing window.
2. Create a MASM section by typing the tag:
[PWB-.ASM PWB-.INC]
3. Create a text file section by typing the tag:
[PWB-.TXT]
4. Put the appropriate Tabstops, Entab, and Realtabs switches in each section. The lines that begin with a semicolon are comments.
[PWB-.ASM PWB-.INC]
; Set the tab stops for MASM to 8
tabstops : 8
; Translate white space to tabs
entab : 1
realtabs : no
[PWB-.TXT]
; Set the tab stops for text files to 5
tabstops : 5
; Translate white space to spaces
entab : 0
realtabs : no
Depending on whether the current file is a MASM (.ASM or .INC) file or a text (.TXT) file, the tab stops are set at 8 or 5 columns, respectively. PWB reads multiple sections and applies the appropriate settings. You can use this to your advantage by storing all your general settings in the [PWB] section and storing differences in separate tagged sections.
Filename-extension tagged sections are useful for the kinds of files you edit most frequently. However, it's impossible to define settings for every conceivable extension. To handle this case, PWB provides a special extension (..) that means all extensions not defined elsewhere in TOOLS.INI. For example, to set tab stops to 5 for all files except MASM files, modify the preceding example to use the [PWB-..] tag in place of [PWB-.TXT].
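The way a filename extension selects a tagged section, with [PWB-..] as the fallback, can be sketched as follows. This Python example is a model of the rule as described, not PWB's implementation: the function name sections_for_file is hypothetical, tag comparison is assumed to be case-insensitive, and the sketch returns only the extension-specific sections (the main [PWB] section is read separately).

```python
def sections_for_file(filename, tags):
    """Choose the filename-extension sections that apply to a file.

    tags holds section tags without brackets, for example
    "PWB-.ASM PWB-.INC". A section tagged PWB-.. is used only when
    no section names the file's extension.
    """
    dot = filename.rfind(".")
    ext = filename[dot:].upper() if dot != -1 else ""
    exact, fallback = [], []
    for tag in tags:
        names = tag.upper().split()
        if ext and ("PWB-" + ext) in names:
            exact.append(tag)       # section names this extension
        elif "PWB-.." in names:
            fallback.append(tag)    # catch-all section
    return exact or fallback


if __name__ == "__main__":
    tags = ["PWB", "PWB-.ASM PWB-.INC", "PWB-.."]
    print(sections_for_file("MAIN.ASM", tags))
    print(sections_for_file("NOTES.TXT", tags))
```

Here MAIN.ASM selects the shared MASM section, while NOTES.TXT falls through to the [PWB-..] section because no tag names .TXT.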
Note When you choose the Save button in the Key Assignments, Editor Settings, and Colors dialog boxes, and when you save a recorded macro or custom Run menu command, PWB saves the setting in the main section. If the setting is for a PWB extension, it is saved in that extension's tagged section. PWB never modifies or writes settings in a filename-extension or operating-system section.
Named Tags
You can define tagged sections of TOOLS.INI that you load manually. Use manually loaded sections to make special key assignments, to load complex or rarely used macros, or to use a special PWB configuration under a particular circumstance. The syntax for a manually loaded section tag is:
[PWB-name]
Where name is the name of the tagged section. A single section of TOOLS.INI can be given several tag names. These tags have the form:
[PWB-name1 PWB-name2...]
When you want to use the settings defined in one of these named sections, pass the name of the section to the Initialize function (SHIFT+F8).
To read a tagged section of TOOLS.INI:
- Execute Arg name Initialize (ALT+A name SHIFT+F8)
You can use this method to read any tagged section, including the automatically loaded sections.
Note When you execute Initialize with no arguments, PWB clears all the current settings before reading the [PWB] section, including settings that you have made for specific PWB extensions. PWB does not reread the operating-system or other additional sections of TOOLS.INI.
To reread the main section without clearing other settings that you want to remain in effect, label the main PWB section with the tag [PWB PWB-main]. You can then use Arg main Initialize to recover your startup settings, instead of using Initialize with no arguments.
TOOLS.INI Statement Syntax

Within each TOOLS.INI section you place a series of comments or statements. Each statement is a macro definition, key assignment, or switch setting, and must be stated on a single logical line. Statements can be continued across lines by using line continuations.
General Macro Syntax

The general syntax for a macro definition is:
name := definition
PWB does not reserve any names. Therefore, be careful not to redefine a PWB function. For more information about how to write macros, see Writing PWB Macros on page 98.
General Key Syntax

The general syntax for a key assignment is:
name : key
The name is the name of a function or macro, and the key is the name of a key. To see how to write a given key, use the Tell function as described in Changing Key Assignments on page 109.
Note that certain keys have fixed meanings when the cursor is in a dialog box or in the Help window. You can assign one of these keys to a function or macro, but the fixed meaning is used in a dialog box or the Help window. The following keys have fixed meanings:
Key                                            Dialog Box                                   Help Window
ESC                                            Choose Cancel                                Close the Help window
F1                                             See Help on the dialog box (choose Help)     See Help on the current item
TAB                                            Move to the next option                      Move to the next hyperlink
SHIFT+TAB                                      Move to the previous option                  Move to the previous hyperlink
SPACEBAR                                       Toggle the setting of the current option     Activate the current hyperlink
ENTER, SHIFT+ENTER, NUMENTER, SHIFT+NUMENTER   Choose the default action                    Activate the current hyperlink
Note The Windows operating system or a terminate-and-stay-resident (TSR) program may override PWB's use of specific keys. PWB has no knowledge of keys that are reserved by these external processes. PWB lists these keys as available keys in the Key Assignments dialog box and allows you to assign functions to these keys, but you may not be able to use them. See the documentation for your operating environment to see what keys are reserved by the system.
General Switch Syntax

The general syntax for a switch setting is:
switch : value
The exact syntax for the switch value depends on the switch. See Chapter 7, PWB Reference, for more information about each switch.
Line Continuation
All statements in TOOLS.INI must be stated on a single logical line. A logical line can be written on several physical lines by using the TOOLS.INI line-continuation character, the backslash (\). The backslash must be preceded by a space to be treated as a line-continuation character. Precede the backslash by two spaces if you want the concatenated statement to contain a space at that location. If the backslash is preceded by a tab, PWB treats the tab as if it were two spaces. The backslash should be the last character on the line except for spaces or tabs. The backslash in the following statement is not a line continuation.
Qreplace:CTRL+\
However, the backslash at the end of the first line below is a line continuation.
findtag:=Arg Arg "^\\[^\\]+\\]" Psearch ->nf  \
Arg Setwindow => :>nf Arg "no tag" Message
In this example, the backslash is preceded by two spaces. The first space is included to separate ->nf from Arg in the concatenated macro definition. The second space identifies the backslash that follows it as the line-continuation character.
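The continuation rules can be expressed as a short routine that joins physical lines into logical lines. The following Python sketch is one reading of the rules above, not PWB's implementation: the function name join_continued is hypothetical, and the sketch assumes that the continued text resumes at the first character of the next physical line.

```python
def join_continued(lines):
    """Join physical lines into logical lines using the backslash rules."""
    logical = []
    pending = ""
    for raw in lines:
        text = raw.rstrip(" \t")           # blanks may follow the backslash
        body = text[:-1]                   # text in front of the last character
        if text.endswith("\\") and body.endswith("\t"):
            body = body[:-1] + "  "        # a tab counts as two spaces
        if text.endswith("\\") and body.endswith(" "):
            # One space marks the continuation and is dropped; a second
            # space, if present, stays in the joined statement.
            pending += body[:-1]
        else:
            logical.append(pending + text)
            pending = ""
    if pending:
        logical.append(pending)            # file ended on a continued line
    return logical


if __name__ == "__main__":
    physical = [
        "findtag:=Arg Arg Psearch ->nf  \\",
        'Arg Setwindow => :>nf Arg "no tag" Message',
        "Qreplace:CTRL+\\",
    ]
    for statement in join_continued(physical):
        print(statement)
```

The first two physical lines join into one macro definition with a single space between ->nf and Arg, and the Qreplace line is left alone because its backslash is not preceded by a space. The search pattern is omitted from the sample macro to keep the string quoting simple.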
Comments
In the TOOLS.INI file, PWB treats the text that follows a semicolon (;) up to the end of the line as a comment. To specify the beginning of a comment, you must place the semicolon at the beginning of a line or following white space.
For example, the first semicolon in the following statement is part of a command, and the second semicolon begins a comment.
Printcmd:lister -t4 %s -c; ;Print using lister program
In the following example, the first semicolon is a key name, and the second semicolon begins a comment.
Sinsert:CTRL+; ;Stream insertion: CTRL plus semicolon
Semicolons inside a quoted string do not begin a comment.
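The comment rules can be summarized in a small routine. This Python sketch illustrates the rules as stated and is not PWB's parser: the function name strip_comment is hypothetical, and the sketch does not handle escaped quotation marks inside a quoted string.

```python
def strip_comment(line):
    """Remove a TOOLS.INI comment from a line, following the rules above."""
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_quotes = not in_quotes      # entering or leaving a quoted string
        elif ch == ";" and not in_quotes and (i == 0 or line[i - 1] in " \t"):
            return line[:i].rstrip()       # semicolon starts a comment here
    return line.rstrip()


if __name__ == "__main__":
    print(strip_comment("Printcmd:lister -t4 %s -c; ;Print using lister program"))
    print(strip_comment("Sinsert:CTRL+; ;Stream insertion: CTRL plus semicolon"))
```

In both sample statements the first semicolon is kept because it directly follows a non-blank character, and the text from the second semicolon onward is dropped.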
Environment Variables

The INIT environment variable tells PWB where to find the TOOLS.INI file and where to store the CURRENT.STS file. In general, the INIT, TMP, LIB, INCLUDE, HELPFILES, and PATH environment variables must all be properly set for your development environment to work smoothly.
To set the INIT environment variable from the command line:
- Type SET INIT=C:\INIT
The operating-system SET command sets the environment variable to contain the string C:\INIT. This example presumes that you want to store your initialization files in C:\INIT. You could use any other directory. Make sure that the INIT environment variable lists a single directory. Multiple directories in INIT can cause inconsistent behavior.
The following list outlines how the environment works:
- The environment is always inherited from the parent process. The parent is the process that starts the current process. In MS-DOS, the parent is often COMMAND.COM or the Windows operating system.
- Inheritance of environment variables is a one-way process. A child inherits from its parent. You can make changes to the environment in a child (when you use the Environment Variables command in PWB, for example), but they are not passed back to the parent. This means that any changes to environment variables that you make while shelled out are lost when you return to PWB.
- Each MS-DOS session under the Windows operating system inherits its environment from the Windows operating system. Changes made to the environment in one session do not affect any other session.
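One-way inheritance is a general property of process environments and can be demonstrated on a current system. The following Python sketch is an illustration of the principle only, not of PWB or MS-DOS: the function name run_child and the variable name PWB_DEMO are hypothetical. The child receives the value, changes its own copy, and the parent's environment stays as it was.

```python
import os
import subprocess
import sys


def run_child(name, value):
    """Start a child process with name=value and return what the child saw."""
    child_env = dict(os.environ)
    child_env[name] = value
    # The child prints the inherited value, then changes its own copy only.
    code = (
        "import os; print(os.environ[%r]); "
        "os.environ[%r] = 'changed in child'" % (name, name)
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    os.environ.pop("PWB_DEMO", None)
    print("child saw:", run_child("PWB_DEMO", "C:\\INIT"))
    print("set in parent afterwards:", "PWB_DEMO" in os.environ)
```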
The best way to make sure your environment is set properly is to explicitly set it in one of your startup files. These are:
- CONFIG.SYS
- AUTOEXEC.BAT
PWB can save the complete table of environment variables for each project. You can then use the Environment Variables command from the Options menu to change environment variables for individual projects. If you prefer that PWB save the environment variables for all PWB sessions or use the current operating-system environment when it starts up, change the Envcursave and Envprojsave switches. For more information on these switches, see the Programmer's WorkBench Reference on pages 259 and 260.
Current Status File CURRENT.STS

The first time you run PWB or CodeView, it creates a CURRENT.STS (current status) file in your INIT directory. If there is no INIT directory, PWB and CodeView create the file in the current directory. CURRENT.STS keeps track of the following items for PWB:
- Open windows, including their size and position and the list of open files in each window
- Screen height
- Window style
- Find string
- Replace string
- The options used in a find or find-and-replace operation, such as the use of regular expressions
- Optionally, all environment variables
PWB and CodeView share the current location and filename for the active window. When you leave CodeView after a debugging session and return to PWB, PWB positions the cursor at the place where you stopped debugging. For more information on the items that CodeView saves in CURRENT.STS, see The CURRENT.STS State File on page 316. The next time you run PWB, it reads CURRENT.STS and restores the editing environment to what it was when you left PWB. For more information on how PWB uses environment variables, see Environment Variables on page 127. The status files are plain text files. You can load one into an editor and read it. However, you might corrupt the file if you try to modify it. There is no need to modify it because PWB keeps it updated for you. No harm occurs if you delete CURRENT.STS. However, you will have to manually reopen the files you were working on.
Project Status Files

For each project, PWB creates a project status file. PWB stores this file in the project directory and gives it the name project.STS, where project is the base name of the project. Project status files contain the same kind of information that CURRENT.STS contains, except on a per-project basis. This scheme allows PWB to keep track of your screen layout, file history, and environment variables for each project. The project status files also contain the current project template, language and utility options, build directory, and the program's run-time arguments.
The main difference between the two status files is that the CURRENT.STS file supplies default status information that PWB uses when you have not set a project. PWB uses the project's status file when you open that project.
PWB can also save all environment variables, including PATH, INCLUDE, LIB, and HELPFILES, depending on how the Envcursave and Envprojsave switches are set. For more information, see Environment Variables on page 127.
Important While it is harmless to delete CURRENT.STS, you should not delete project status files. They contain important information for building and updating your project. If you delete a project status file, you may need to delete the project makefile and start over.
CHAPTER 7
Programmer's WorkBench Reference
PWB Command Line

Syntax
PWB [[options]] [[/t]] files

Options
Use the following case-insensitive options when starting PWB:

/D[[S|T|A]]...
Prevents PWB from loading the initialization files or PWB extensions, as indicated by the following letters:

Letter   Meaning
S        Disable reading the status file CURRENT.STS
T        Disable reading TOOLS.INI
A        Disable PWB extension autoload

The /D option alone disables loading all the PWB extension and initialization files. See: Autoload.
Note If you start PWB with the /DT option, PWB options that you change during the session cannot be saved.

/PP makefile
Opens the specified PWB project.

/PF makefile
Opens the specified non-PWB project (foreign makefile).

/PL
Resets the last project. Use this option to start PWB in the same state you last left it. You can set this option as the default by setting the Lastproject switch to yes.
Filename: LMAETC07.DOC Project: Environment and Tools Template: MSGRIDA1.DOT Author: Mike Eddy Last Saved By: Mike Eddy Revision #: 5 Page: 131 of 1 Printed: 10/09/00 02:46 PM
/E command
Executes the given command or sequence of commands as a macro upon startup. If command contains a space, command should be enclosed in double quotation marks ("). A single command need not be quoted. If command uses literal quotation marks, place a backslash (\) before each mark. To use a backslash, precede it with another backslash.

/R
PWB starts in no-edit mode. You cannot modify files in this mode. See: Noedit.

/M {mark | line}
PWB starts at the specified location. See: Mark.

[[[[/T]] file]]...
Tells PWB to load the given files on startup. If you specify a single file, PWB loads it. If you specify multiple files, PWB loads the first file; then when you use File Next or the Exit function, PWB loads the next file in the list.
If a /T precedes a filename or wildcard, PWB loads each file as a temporary file. PWB does not include temporary files in the list of files saved between sessions.
Note No other options can follow /T on the PWB command line. You must specify /T for each file you want to be temporary.
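The quoting rules for the /E option can be applied mechanically. The following Python sketch shows one way to prepare a command string. It is illustrative only: the function name quote_for_e is hypothetical, and doubling backslashes in unquoted commands as well is an assumption, because the text does not say whether that rule applies only inside quotation marks.

```python
def quote_for_e(command):
    """Prepare a macro command for use after /E on the PWB command line."""
    # Double each backslash first, then put a backslash before each quote.
    escaped = command.replace("\\", "\\\\").replace('"', '\\"')
    if " " in command:
        return '"' + escaped + '"'   # commands with spaces must be quoted
    return escaped                   # a single command need not be quoted


if __name__ == "__main__":
    print("PWB /E " + quote_for_e("Begfile"))
    print("PWB /E " + quote_for_e('Arg "no tag" Message'))
```

The second call produces a quoted argument in which each literal quotation mark is preceded by a backslash.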
PWB Menus and Keys

Many PWB menu commands activate PWB functions or predefined macros. The menu commands that are attached to functions and macros are listed in the tables that follow. To assign a shortcut key for one of these menu commands, use the Key Assignments command on the Options menu and assign a key to the corresponding function or macro. For details on using the Key Assignments dialog box, see Changing Key Assignments on page 109. Names beginning with an underscore (_pwb...) are macros. Names without an underscore are functions.
Table 7.1 File Menu and Keys
Menu Command    Macro or Function    Default Keys
New             _pwbnewfile          Unassigned
Close           _pwbclosefile        Unassigned
Next            _pwbnextfile         Unassigned
Save            _pwbsavefile         SHIFT+F2
Save All        _pwbsaveall          Unassigned
DOS Shell       _pwbshell            Unassigned
n file          _pwbfilen            Unassigned
Exit            _pwbquit             ALT+F4
Table 7.2 Edit Menu and Keys
Menu Command        Macro or Function    Default Keys
Undo                _pwbundo             Unassigned
Redo                _pwbredo             Unassigned
Repeat              _pwbrepeat           Unassigned
Cut                 Delete               SHIFT+DEL, SHIFT+NUM-
Copy                Copy                 CTRL+INS, SHIFT+NUM*
Paste               Paste                SHIFT+INS, SHIFT+NUM+
Delete              _pwbclear            DEL
Set Anchor          Savecur              Unassigned
Select To Anchor    Selcur               Unassigned
Stream Mode         _pwbstreammode       Unassigned
Box Mode            _pwbboxmode          Unassigned
Line Mode           _pwblinemode         Unassigned
Record On           _pwbrecord           Unassigned
Table 7.3 Search Menu and Keys
Menu Command                    Macro or Function       Default Keys
Log                             _pwblogsearch           Unassigned
Next Match (Logging on)         _pwbnextlogmatch        SHIFT+CTRL+F3
Next Match (Logging off)        _pwbnextmatch           Unassigned
Previous Match (Logging on)     _pwbpreviouslogmatch    SHIFT+CTRL+F4
Previous Match (Logging off)    _pwbpreviousmatch       Unassigned
Goto Match                      _pwbgotomatch           Unassigned
Table 7.4 Project Menu and Keys
Menu Command      Macro or Function    Default Keys
Compile File      _pwbcompile          Unassigned
Build             _pwbbuild            Unassigned
Rebuild All       _pwbrebuild          Unassigned
Close             _pwbcloseproject     Unassigned
Next Error        _pwbnextmsg          SHIFT+F3
Previous Error    _pwbprevmsg          SHIFT+F4
Goto Error        _pwbsetmsg           Unassigned
Table 7.5 Run Menu and Keys
Menu Command    Macro or Function    Default Keys
command1        _pwbuser1            [ALT+Fn]
command2        _pwbuser2            [ALT+Fn]
command3        _pwbuser3            [ALT+Fn]
command4        _pwbuser4            [ALT+Fn]
command5        _pwbuser5            [ALT+Fn]
command6        _pwbuser6            [ALT+Fn]
command7        _pwbuser7            [ALT+Fn]
command8        _pwbuser8            [ALT+Fn]
command9        _pwbuser9            [ALT+Fn]
Table 7.6 Browse Menu and Keys
Menu Command             Macro or Function    Default Keys
Goto Definition          Pwbrowsegotodef      Unassigned
Goto Reference           Pwbrowsegotoref      Unassigned
View Relationship        Pwbrowseviewrel      Unassigned
List References          Pwbrowselistref      Unassigned
Call Tree (Fwd/Rev)      Pwbrowsecalltree     Unassigned
Function Hierarchy       Pwbrowsefuhier       Unassigned
Module Outline           Pwbrowseoutline      Unassigned
Which Reference          Pwbrowsewhref        Unassigned
Class Tree (Fwd/Rev)     Pwbrowsecltree       Unassigned
Class Hierarchy          Pwbrowseclhier       Unassigned
Next                     Pwbrowsenext         CTRL+NUM+
Previous                 Pwbrowseprev         CTRL+NUM-
Table 7.7 Window Menu and Keys
Menu Command    Macro or Function    Default Keys
New             _pwbnewwindow        Unassigned
Close           _pwbclose            CTRL+F4
Close All       _pwbcloseall         Unassigned
Move            _pwbmove             CTRL+F7
Size            _pwbresize           CTRL+F8
Restore         _pwbrestore          CTRL+F5
Minimize        _pwbminimize         CTRL+F9
Maximize        _pwbmaximize         CTRL+F10
Cascade         _pwbcascade          F5
Tile            _pwbtile             SHIFT+F5
Arrange         _pwbarrange          ALT+F5
n file          _pwbwindown          ALT+n
Table 7.8 Help Menu and Keys
Menu Command      Macro or Function     Default Keys
Index             _pwbhelp_index        Unassigned
Contents          _pwbhelp_contents     SHIFT+F1
Topic             _pwbhelp_context      F1
Help on Help      _pwbhelp_general      Unassigned
Next              _pwbhelp_again        Unassigned
Search Results    _pwbhelp_searchres    Unassigned
PWB Default Key Assignments

PWB's default key assignments are shown in Table 7.9. In each position having the text Unassigned, you can assign a function or macro to that key without taking away a default keystroke. You cannot assign keys for positions that are empty. These can usually be expressed in a different way. For example, CTRL+{ is expressed as SHIFT+CTRL+[.
Table 7.9 PWB Default Key Assignments Key
! # $ % & ( *
Plain Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic
SHIFT
ALT
CTRL
CTRL+SHIFT
Graphic Graphic Graphic Graphic
Unassigned Unassigned Unassigned Unassigned Unassigned _pwbwindow1 _pwbwindow2 _pwbwindow3 _pwbwindow4 _pwbwindow5 _pwbwindow6 _pwbwindow7 _pwbwindow8 _pwbwindow9 Unassigned Unassigned Unassigned Assign Unassigned Arg (Browse menu) Unassigned Unassigned
Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Mword Unassigned Ppage Right
Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned
+
, . / 0 1 2 3 4 5 6 7 8 9 : ; < = >
@ A
B C D
Table 7.9 PWB Default Key Assignments (continued) Key
E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ { | } ~ F1 F2
Plain Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic _pwbhelp_context Setfile
SHIFT
ALT
CTRL
CTRL+SHIFT
Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic Graphic _pwbhelp_contents _pwbsavefile
(Edit menu) (File menu) Unassigned (Help menu) Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned (Options menu) (Project menu) Unassigned (Run menu) (Search menu) Unassigned Unassigned Unassigned (Window menu) Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned _pwbhelp_back Unassigned
Up Pword Cdelete Unassigned Unassigned Sinsert Unassigned Replace Mark Linsert Lasttext Quote Unassigned Mpage Left Tell Lastselect Insertmode Mlines Down Ldelete Plines Pbal Qreplace Setwindow Pwbhelpnext Unassigned
Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Record Sethelp Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned Unassigned
Table 7.9 PWB Default Key Assignments (continued)
Key         Plain          SHIFT             ALT            CTRL             CTRL+SHIFT
F3          Psearch        _pwbnextmsg       Unassigned     Compile          _pwbnextlogmatch
F4          Msearch        _pwbprevmsg       _pwbquit       _pwbclose        _pwbpreviouslogmatch
F5          _pwbcascade    _pwbtile          _pwbarrange    _pwbrestore      Unassigned
F6          Selwindow      _pwbprevwindow    Unassigned     Winstyle         Unassigned
F7          Execute        Refresh           Unassigned     _pwbmove         Unassigned
F8          Exit           Initialize        Unassigned     _pwbresize       Unassigned
F9          Meta           Shell             Unassigned     _pwbminimize     Unassigned
F10         Openfile       Unassigned        Unassigned     _pwbmaximize     Unassigned
F11         Unassigned     Unassigned        Unassigned     Unassigned       Unassigned
F12         Unassigned     Unassigned        Unassigned     Unassigned       Unassigned
F13         Unassigned     Unassigned        Unassigned     Unassigned       Unassigned
F14         Unassigned     Unassigned        Unassigned     Unassigned       Unassigned
F15         Unassigned     Unassigned        Unassigned     Unassigned       Unassigned
F16         Unassigned     Unassigned        Unassigned     Unassigned       Unassigned
LEFT        Left           Select            Unassigned     Mword            Select
RIGHT       Right          Select            Unassigned     Pword            Select
UP          Up             Select            Unassigned     Mlines           Unassigned
DOWN        Down           Select            Unassigned     Plines           Unassigned
INS         Insertmode     Paste             Unassigned     Copy             Unassigned
DEL         _pwbclear      Delete            Unassigned     Unassigned       Unassigned
HOME        Begline        Select            Unassigned     Begfile          Select
END         Endline        Select            Unassigned     Endfile          Select
ENTER       Emacsnewl      Newline           Unassigned     Unassigned       Unassigned
BKSP        Emacscdel      Emacscdel         Undo           Unassigned       Undo
ESC         Cancel         Unassigned        Unassigned     Unassigned       Unassigned
GOTO        Home           Unassigned        Unassigned     Unassigned       Unassigned
NUM*        Graphic        Copy              Unassigned     Unassigned       Unassigned
NUM+        Graphic        Paste             Unassigned     Pwbrowsenext     Unassigned
NUM-        Graphic        Delete            Unassigned     Pwbrowseprev     Unassigned
NUM/        Graphic                          Unassigned     Unassigned       Unassigned
NUMENTER    Emacsnewl      Newline           Unassigned     Unassigned       Unassigned
Table 7.9 PWB Default Key Assignments (continued) Key
PGUP PGDN TAB
Plain Mpage Ppage Tab
SHIFT
ALT
CTRL
CTRL+SHIFT
Select Select Backtab
Select Select Unassigned
Note on Available Keys

PWB allows you to assign functions and macros to almost any key combination. However, some keys have a fixed meaning in certain circumstances or operating environments. PWB lists these keys as available keys in the Key Assignments dialog box, and PWB allows you to assign a command to the key. However, when the circumstance holds, or you are running PWB in a specific environment, certain keys have a fixed meaning that overrides any assignment that you make.
Help Window
In the Help window, the following keys have a fixed meaning:
Key               Meaning
ESC               Close the Help window
TAB               Move to next hyperlink
SHIFT+TAB         Move to previous hyperlink
ENTER             Activate current hyperlink
NUMENTER          Activate current hyperlink
SHIFT+ENTER       Activate current hyperlink
SHIFT+NUMENTER    Activate current hyperlink
SPACE             Activate current hyperlink
Dialog Boxes
In dialog boxes, all keys have predetermined meanings. Your assignments have no effect when a dialog box is displayed. In particular, note the following keys:
Key          Meaning
ESC          Choose Cancel
ENTER        Choose the active command button
F1           Choose Help
TAB          Move to the next option or command
SHIFT+TAB    Move to the previous option or command
SPACE        Toggle active option
CTRL+P       When used in a text box, inserts the next key as a literal value. Use this key to type a literal tab character.

The Text Argument dialog box is an exception. All keys except F1 (Help) and ESC (Cancel) have their assigned meaning.
Microsoft Windows
When running PWB with the Windows operating system, some keys are reserved for use by Windows. You can override these reserved keys by setting options in a PIF file.
Key          Default Meaning in the Windows operating system
ALT+ESC      Switch to the next window in the Windows operating system
CTRL+ESC     Switch to the Windows operating system Task Manager
ALT+TAB      Switch to the next application
ALT+SPACE    Activate the current window's system menu
ALT+ENTER    Shift application between full screen and window
PWB Functions
PWB provides a rich variety of editing, searching, and project-management capabilities in the form of functions. Most of PWB's menus and dialog boxes call these functions (or macros that use these functions) to perform their actions. You can write your own macros that use these capabilities in ways that precisely suit your needs. You can also execute every function directly, either by pressing a key or by using the Execute function. Table 7.10 summarizes PWB functions. Most functions can be executed in different ways to perform related actions. Complete details are given in the A-to-Z reference that follows the table.
Table 7.10 PWB Functions
Function         Description                                Keys
Arg              Begin a function argument                  ALT+A
Arrangewindow    Arrange windows or icons                   Unassigned
Assign           Define a macro or assign a key             ALT+=
Backtab          Move to previous tab stop                  SHIFT+TAB
Begfile          Move to beginning of file                  CTRL+HOME
Begline          Move to beginning of line                  HOME
Cancel           Cancel arguments or current operation      ESC
Cancelsearch     Cancel background search                   Unassigned
Cdelete          Delete character                           CTRL+G
Clearmsg         Clear Build Results
Clearsearch      Clear Search Results
Closefile        Close current file
Compile          Compile and build                          CTRL+F3
Copy             Copy selection to the clipboard            CTRL+INS, SHIFT+NUM*
Curdate          Today's date (dd-Mmm-yyyy)
Curday           Day of week (Tue)
Curtime          Current time (hour:minute:second)
Delete           Delete selection                           SHIFT+DEL, SHIFT+NUM-
Down             Move down one line                         CTRL+X, DOWN
Emacscdel        Delete character                           BKSP, SHIFT+BKSP
Emacsnewl        Start a new line                           ENTER, NUMENTER
Endfile          Move to end of file                        CTRL+END
Endline          Move to end of line                        END
Environment      Set or insert environment variable         Unassigned
Execute          Execute macros and functions by name       F7
Exit             Advance to next file or leave PWB          F8
Graphic          Type character                             (many)
Home             Move to window corner                      GOTO
Information      (Obsolete)
Initialize       Reinitialize                               SHIFT+F8
Insert           Insert spaces or lines                     Unassigned
Insertmode       Toggle insert/overtype mode                CTRL+V, INS
Lastselect       Recover last selection                     CTRL+U
Lasttext         Recover last text argument                 CTRL+O
Ldelete          Delete lines                               CTRL+Y
Left             Move left                                  CTRL+S, LEFT
Linsert          Insert lines or indent line                CTRL+N
Logsearch        Toggle search logging                      Unassigned
Table 7.10 PWB Functions (continued)
Function       Description                                  Keys
Mark           Set, clear, or go to a mark or line number   CTRL+M
Maximize       Enlarge window to full size                  Unassigned
Menukey        Activate menu                                ALT
Message        Display a message or refresh the screen      Unassigned
Meta           Modify the action of a function              F9
Mgrep          Search across files for text or pattern
Minimize       Shrink window to an icon
Mlines         Scroll down by lines                         CTRL+UP, CTRL+W
Movewindow     Move window                                  Unassigned
Mpage          Move up one page                             CTRL+R, PGUP
Mpara          Move up one paragraph
Mreplace       Multifile replace with confirmation
Mreplaceall    Multifile replace
Msearch        Search backward for pattern or text          F4
Mword          Move back one word                           CTRL+A, CTRL+LEFT
Newfile        Create a new pseudofile                      Unassigned
Newline        Move to the next line                        SHIFT+ENTER, SHIFT+NUMENTER
Nextmsg        Go to build message location
Nextsearch     Go to search match location
Noedit         Toggle the no-edit restriction
Openfile       Open a new file                              F10
Paste          Insert file or text from clipboard           SHIFT+INS, SHIFT+NUM+
Pbal           Balance paired characters                    CTRL+[
Plines         Scroll up by lines                           CTRL+DOWN, CTRL+Z
Ppage          Move down one page                           CTRL+C, PGDN
Ppara          Move down one paragraph                      Unassigned
Print          Print file or selection                      Unassigned
Project        Set or clear project                         Unassigned
Prompt         Request text argument                        Unassigned
Psearch        Search forward for pattern or text           F3
Pwbhelp        Help topic lookup                            Unassigned
Table 7.10 PWB Functions (continued)
Function            Description                                 Keys
Pwbhelpnext         Relative help topic lookup                  CTRL+F1
Pwbhelpsearch       Global full-text help search                Unassigned
Pwbrowse1stdef      Go to first definition                      Unassigned
Pwbrowse1stref      Go to first reference                       Unassigned
Pwbrowsecalltree    Browse Call Tree (Fwd/Rev)                  Unassigned
Pwbrowseclhier      Browse Class Hierarchy                      Unassigned
Pwbrowsecltree      Browse Class Tree (Fwd/Rev)                 Unassigned
Pwbrowsefuhier      Browse Function Hierarchy                   Unassigned
Pwbrowsegotodef     Browse Goto Definition                      Unassigned
Pwbrowsegotoref     Browse Goto Reference                       Unassigned
Pwbrowselistref     Browse List References                      Unassigned
Pwbrowsenext        Browse Next                                 CTRL+NUM+
Pwbrowseoutline     Browse Module Outline
Pwbrowsepop         Go to previously browsed location
Pwbrowseprev        Browse Previous                             CTRL+NUM-
Pwbrowseviewrel     Browse View Relationship
Pwbrowsewhref       Browse Which Reference?
Pwbwindow           Open a PWB window
Pword               Move forward one word                       CTRL+F, CTRL+RIGHT
Qreplace            Replace with confirmation                   CTRL+\
Quote               Insert literal key                          CTRL+P
Record              Toggle macro recording                      SHIFT+CTRL+R
Refresh             Reread or discard file                      SHIFT+F7
Repeat              Repeat the last editing operation           Unassigned
Replace             Replace pattern or text                     CTRL+L
Resize              Resize window
Restcur             Restore saved position
Right               Move right                                  CTRL+D, RIGHT
Saveall             Save all modified files                     Unassigned
Savecur             Save cursor position                        Unassigned
Sdelete             Delete streams                              Unassigned
Searchall           Highlight occurrences of pattern or text    Unassigned
Selcur              Select to saved position                    Unassigned
Table 7.10 PWB Functions (continued)
Function      Description                                 Keys
Select        Select text                                 SHIFT+PGUP, SHIFT+CTRL+PGUP, SHIFT+PGDN, SHIFT+CTRL+PGDN, SHIFT+END, SHIFT+CTRL+END, SHIFT+HOME, SHIFT+CTRL+HOME, SHIFT+LEFT, SHIFT+CTRL+LEFT, SHIFT+UP, SHIFT+RIGHT, SHIFT+CTRL+RIGHT, SHIFT+DOWN
Selmode       Change selection mode: box                  Unassigned
Selwindow     Move to window                              F6
Setfile       Open or change files                        F2
Sethelp       Opens, closes, and lists help files         SHIFT+CTRL+S
Setwindow     Adjust file in window                       CTRL+]
Shell         Start a shell or run a system command       SHIFT+F9
Sinsert       Insert a stream of blanks or break line     CTRL+J
Tab           Move to the next tab stop                   TAB
Tell          Show key assignment or macro definition     CTRL+T
Unassigned    Remove a function assignment from a key     (All unassigned keys)
Undo          Undo and redo editing operations            ALT+BKSP, SHIFT+CTRL+BKSP
Up            Move up                                     CTRL+E, UP
Usercmd       Execute a custom Run menu command
Window        Move to next or previous window
Winstyle      Add or remove scroll bars                   CTRL+F6
Cursor-Movement Commands
PWB provides the following commands to navigate through text. In addition to the commands in the PWB editor, the Source Browser provides powerful commands to navigate through the source of your programs.
Table 7.11 Cursor-Movement Commands
Cursor Movement                      Command            Keys
Up one line                          Up                 UP
Down one line                        Down               DOWN
Left one column                      Left               LEFT
Right one column                     Right              RIGHT
Upper-left corner of window          Home               HOME
Top of window                        Meta Up            F9 UP
Bottom of window                     Meta Down          F9 DOWN
Leftmost column in window            Meta Left          F9 LEFT
Rightmost column in window           Meta Right         F9 RIGHT
Lower-right corner of window         Meta Home          HOME
Up one window                        Mpage              PGUP
Down one window                      Ppage              PGDN
Column one                           Meta Begline       F9 HOME
One column past window width         Meta Endline       F9 END
Back one word                        Mword              CTRL+LEFT
Forward one word                     Pword              CTRL+RIGHT
Beginning of line                    Begline            HOME
End of line                          Endline            END
Next paragraph                       Ppara
Previous paragraph                   Mpara
End of paragraph                     Meta Ppara         F9 Unassigned
End of previous paragraph            Meta Mpara         F9 Unassigned
Beginning of file                    Begfile            CTRL+HOME
End of file                          Endfile            CTRL+END
To specific line number              Arg number Mark    ALT+A number CTRL+M
Position before last scroll          Arg Mark           ALT+A CTRL+M
Saved position                       Restcur            Unassigned
Named mark                           Arg name Mark      ALT+A name CTRL+M
Scroll window down one line          Mlines             CTRL+UP
Scroll window up one line            Plines             CTRL+DOWN
Scroll window so cursor at top       Arg Plines         ALT+A CTRL+DOWN
Scroll window so cursor at bottom    Arg Mlines         ALT+A CTRL+UP
Scroll window so cursor at home      Arg Setwindow      ALT+A CTRL+]
Arg
Key
ALT+A
Arg
Begin an argument to a function or begin a selection. After you execute Arg, PWB displays Arg[1] on the status bar. Each time you execute Arg, PWB increments the Arg count. PWB functions perform variations of their action depending on the Arg count and the Meta state. You can use the Meta and Arg function prefixes in any order. See: Meta.

To select text or create a function argument:
1. Execute Arg (ALT+A).
2. Execute a cursor-movement function, or hold down the SHIFT key and click the left mouse button.
PWB creates a stream, box, or line selection based on the current selection mode. A selection in each of these modes creates a function argument called streamarg, boxarg, or linearg, respectively.

To create a text argument:
1. Execute Arg (ALT+A).
2. Type the text of the argument.
When you type the first character of the argument, PWB displays the Text Argument dialog box where you can enter the textarg without modifying your file. The Text Argument dialog box does not have an OK button; instead, you execute the function to which you are passing the text argument. Choose Cancel to save the text and do nothing.

To pick up text from a window:
1. Select the text that you want to use in the Text Argument dialog box.
2. Execute Lasttext (CTRL+O).
PWB copies the selected text into the Text Argument dialog box.

To cancel an argument or selection, execute Cancel (ESC).

Returns
The return value of Arg cannot be tested.
See
Cancel, Lastselect, Lasttext, Meta, Prompt
Arrangewindow
Key
Unassigned
Arrangewindow
Cascades all unminimized windows on the desktop. Does not affect minimized windows. See: _pwbcascade.
Arg Arrangewindow (ALT+A Unassigned)
Arranges all unminimized windows on the desktop. Does not affect minimized windows. See: _pwbarrange.
Meta Arrangewindow (F9 Unassigned)
Tiles up to 16 unminimized windows. Does not affect minimized windows. See: _pwbtile.
Meta Arg Arrangewindow (F9 ALT+A Unassigned)
Arranges all icons (minimized windows) on the desktop.
Returns
True    Windows or icons arranged.
False   Nothing to arrange, or more than 16 windows open.
Assign
Key
ALT+=
The Assign function assigns a function to a keystroke, defines a macro, or sets a PWB switch. You can also assign keys and set switches by using the commands in the Options menu. To see the current assignment for a key or the definition of a macro, use Key Assignments on the Options menu or the Tell function (CTRL+T). See: Tell.
Assign
Performs the assignment using the text on the current line. If the line ends with a line continuation, PWB uses the next line, and so on for all continued lines.
Arg Assign (ALT+A ALT+=)
Same as Assign, except that it uses the text starting from the cursor.
Arg textarg Assign (ALT+A textarg ALT+=)
Performs the assignment using the specified textarg.
Arg mark Assign (ALT+A mark ALT+=)
Performs the assignment using the text from the line at the cursor to the specified mark. The mark argument can be either a line number or a previously defined mark name. See: Mark.
Arg boxarg | linearg | streamarg Assign (ALT+A boxarg | linearg | streamarg ALT+=)
Performs the assignment using the selected text. Ignores blank and comment lines.
Returns
True    Assignment successful.
False   Assignment invalid.
Example
To set the Tabstops switch to 8:
1. Execute Arg (ALT+A).
2. Type the following switch assignment:
tabstops:8
3. Execute Assign (ALT+=).
Update
Assign, Arg Assign
With PWB 1.x, Assign and Arg Assign do not recognize line continuations. With PWB 2.00, they use all continued lines for the assignment.
Arg streamarg Assign
With PWB 1.x, a streamarg is not allowed. With PWB 2.00, Assign accepts a streamarg.
Arg ? Assign
With PWB 1.x, this form of the Assign function displays the current assignments for all functions, switches, and macros in the <ASSIGN> Current Assignments and Switch Settings pseudofile. With PWB 2.00, the <ASSIGN> pseudofile does not exist; therefore, this form of the Assign function is obsolete. If you use this command or execute a macro that executes this command, PWB issues the error:
Missing ':' in '?'
PWB is expecting an assignment or definition using the name ?, which is a legal macro name.
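The error quoted above follows from how an assignment line is recognized. The Python sketch below is only an illustration, not PWB's parser: the function name classify_assignment and the labels it returns are hypothetical. It assumes that `:=` introduces a macro definition (as in the macro examples in this chapter) and that a plain `:` separates a switch or function name from its value or key.

```python
def classify_assignment(text):
    """Classify one assignment line by its separator (illustrative only)."""
    name, sep, rest = text.partition(":")
    if not sep:
        # Mirrors the message the text quotes for "Arg ? Assign".
        raise ValueError("Missing ':' in '%s'" % text.strip())
    name = name.strip()
    if rest.startswith("="):
        # name := definition  -> macro definition
        return ("macro", name, rest[1:].strip())
    # name:value -> switch setting or key assignment. Telling the two apart
    # would need PWB's table of function and switch names, omitted here.
    return ("switch-or-key", name, rest.strip())


print(classify_assignment("tabstops:8"))
print(classify_assignment("toggle_begline := Left ->x Meta :>x Begline"))
```

Passing "?" alone raises the error because the text contains no colon, which is why PWB 2.00 treats ? as the start of an incomplete definition.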
Backtab
Key
SHIFT+TAB
Backtab
Moves the cursor to the previous tab stop on the line.
Returns
True    Cursor moved.
False   Cursor is at left margin.
Update
PWB 2.0 supports variable tab stops. PWB 1.x supports only fixed-width tab stops.
See
Tab, Tabstops
Begfile
Key
CTRL+HOME
Begfile
Moves the cursor to the beginning of the file.
Returns
True    Cursor moved.
False   Cursor not moved; the cursor is already at the beginning of the file.
See
Endfile
Begline
Key
HOME
Begline
Places the cursor on the first nonblank character in the line.
Meta Begline (F9 HOME)
Places the cursor in the first character position of the line (column one).
Returns
True    Cursor moved.
False   Cursor not moved; the cursor is already at the destination.
Example
The following macro moves the cursor to column one, then toggles between column one and the first nonblank character of the line.
toggle_begline := Left ->x Meta :>x Begline
The result of the Left function is tested to determine if the cursor is already in column one. If the cursor is in column one, PWB skips the Meta and executes Begline to move to the first nonblank character. If the cursor is not in column one, PWB executes Meta Begline to move there.
Example
This macro mimics the behavior of the BRIEF HOME key:
bhome:= Meta Begline +> Home +> Begfile
The result of Meta Begline (go to column 1 on the line) is tested to determine if the cursor moved. If the cursor moved, the test (+>) succeeds and the macro exits. If the cursor did not move, the cursor is already in column 1, so the macro advances to the home position with Home. If the cursor did not move going to the home position, the macro advances to the beginning of the file with Begfile.
See
Left, Meta
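The control flow of toggle_begline can be traced with a small model. The Python sketch below is not part of PWB; run_macro and its one-line cursor model are hypothetical, and it handles only the words this macro uses (Left, Meta, Begline, `->label`, `:>label`). It assumes that `->x` branches to the label x when the previous function returned false, as the explanation above describes. The bare `+>` and `->` forms used by bhome are not modeled.

```python
def run_macro(words, col, first_nonblank):
    """Run a macro over a one-line cursor model; columns are 1-based.

    Handles only Left, Begline, Meta, ->label and :>label.
    Returns the final cursor column.
    """
    meta = False      # Meta prefix, consumed by the next function
    result = True     # return value of the last function
    i = 0
    while i < len(words):
        w = words[i]
        if w.startswith(":>"):
            pass                                  # label definition: no action
        elif w.startswith("->"):
            if not result:                        # branch when last result was false
                i = words.index(":>" + w[2:])
        elif w == "Meta":
            meta = True
        elif w == "Left":
            result = col > 1                      # false when already in column one
            col = col - 1 if result else col
            meta = False
        elif w == "Begline":
            target = 1 if meta else first_nonblank
            result = col != target                # true only if the cursor moves
            col = target
            meta = False
        else:
            raise ValueError("unsupported word: " + w)
        i += 1
    return col


toggle_begline = "Left ->x Meta :>x Begline".split()
col = run_macro(toggle_begline, col=12, first_nonblank=5)    # anywhere -> column one
print(col)
col = run_macro(toggle_begline, col=col, first_nonblank=5)   # column one -> first nonblank
print(col)
```

Running the macro repeatedly alternates between column one and the first nonblank column, which is the toggle the text describes.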
Cancel
Key
ESC
Cancel
Cancels the current selection, argument, or operation. If a message appears on the status bar, the Cancel function restores the original contents of the status bar. If a dialog box or menu is open, Cancel closes the dialog box or menu and takes no further action. If Help on a dialog box, menu, or message box is being displayed, Cancel closes the Help dialog box.
Returns
Cancel always returns true.
See
Arg
Cancelsearch
Key
Unassigned
Cancelsearch
Cancels a background search. The Search Results window contains the partial results of the aborted search and is not flushed. You can browse matches listed in the Search Results by using the Next Match, Previous Match, and Goto Match commands from the Search menu and by using the Nextsearch function (Unassigned). Cancelsearch applies only to multithreaded environments.
Returns
True    Background search was canceled.
False   No background search in progress.
See
Nextsearch, _pwbnextlogmatch, _pwbpreviouslogmatch, _pwbgotomatch
Cdelete
Key
CTRL+G
Cdelete
Deletes the previous character, excluding line breaks. If the cursor is in column 1, Cdelete moves the cursor to the end of the previous line.
In insert mode, Cdelete deletes the previous character, reducing the line length by 1.
In overtype mode, Cdelete deletes the previous character and replaces it with a space character. If the cursor is beyond the end of the line, the cursor moves to the immediate right of the last character on the line.
Emacscdel is similar to Cdelete. However, in insert mode, Emacscdel deletes line breaks; in overtype mode beyond the end of the line, it does not automatically move to the end of the line.
Returns
True    Cursor moved.
False   Cursor not moved.
See
Delete, Emacscdel, Ldelete, Sdelete
Clearmsg
Key
Unassigned
Clearmsg
Clears the contents of the Build Results window.
Arg Clearmsg (ALT+A Unassigned)
Clears the current set of messages in the Build Results window.
Returns
True    Cleared a message set or the contents of Build Results.
False   The Build Results window is empty.
See
Nextmsg, _pwbnextmsg, _pwbprevmsg, _pwbsetmsg
Clearsearch
Key
Unassigned
Clearsearch
Clears the contents of the Search Results window.
Arg Clearsearch (ALT+A Unassigned)
Clears the current set of matches in the Search Results window.
Returns
True    Cleared a match set or the contents of Search Results.
False   The Search Results window is empty.
See
Clearmsg, Logsearch, _pwbnextlogmatch, _pwbpreviouslogmatch, _pwbgotomatch
Closefile
Key
Unassigned
Closefile
Closes the file in the active window. If no files remain in the window's file history, the window is also closed.
Arg Closefile (ALT+A Unassigned)
Closes the file named by the text at the cursor.
Arg linearg | boxarg | streamarg Closefile (ALT+A linearg | boxarg | streamarg Unassigned)
Closes the file named by the selected text.
Arg textarg Closefile (ALT+A textarg Unassigned)
Closes the specified file.
Returns
True    The file was closed.
False   No file was closed.
See
Refresh, _pwbclosefile
Compile
Key
CTRL+F3
The Compile function compiles and builds targets in the project or runs external commands, capturing the result of the operation in the Build Results window. Under multithreaded environments the commands run in the background.
Arg Compile (ALT+A CTRL+F3)
Compiles the current file. This is equivalent to the Compile File command on the Project menu. Arg Compile fails if no project is open. See: _pwbcompile.
Arg textarg Compile (ALT+A textarg CTRL+F3)
Builds the target specified by textarg. This is equivalent to the Build Target command on the Project menu. Arg textarg Compile fails if no project is open. To build the current project, execute Arg all Compile.
Arg Meta textarg Compile (ALT+A textarg F9 CTRL+F3)
Rebuilds the specified target and its dependents. See: _pwbrebuild. This command is equivalent to specifying the NMAKE /a option. Note that you can also include NMAKE command-line macro definitions in the text you pass to the Compile function.
Arg Meta Compile (ALT+A F9 CTRL+F3)
Aborts the background compile after prompting for confirmation. Also clears the queue of pending background operations (if any).
Arg Arg textarg Compile (ALT+A ALT+A textarg CTRL+F3)
Runs the program or operating-system command specified by textarg. The output is displayed in the Compile Results window. Under multithreaded environments, the program runs in the background, and the Compile Results window is updated as the program executes. Several programs can be queued for background execution.
Do not use this command to execute an interactive program. The program is able to change the display but may not receive input. To run an interactive program, use the Shell function (SHIFT+F9).
Returns
True    Operation successfully initiated.
False   Operation not initiated.
Copy
Keys
CTRL+INS, SHIFT+NUM*
Menu
Edit menu, Copy command
Copy
Copies the current line to the clipboard.
Arg Copy (ALT+A CTRL+INS)
Copies text from the cursor to the end of the line. The text is copied to the clipboard, but the line break is not included.
Arg boxarg | linearg | streamarg Copy (ALT+A boxarg | linearg | streamarg CTRL+INS)
Copies the selected text to the clipboard.
Arg textarg Copy (ALT+A textarg CTRL+INS)
Copies the specified textarg to the clipboard.
Arg mark Copy (ALT+A mark CTRL+INS)
Copies the text from the cursor to the mark. The text is copied to the clipboard. The mark argument can be either a line number or a previously defined mark. See: Mark.
The text is copied as a boxarg or linearg depending on the relative positions of the cursor and the mark. If the cursor and the mark are in the same column, the text is copied as a linearg. If the cursor and the mark are in different columns, the text is copied as a boxarg.
Arg number Copy (ALT+A number CTRL+INS)
Copies the specified number of lines to the clipboard, starting with the current line. For example, Arg 5 Copy copies five lines to the clipboard.
Returns
Copy always returns true.
See
Delete, Ldelete, Sdelete, Paste
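The linearg/boxarg rule for Arg mark Copy can be stated as a short function. This Python sketch is illustrative only: copy_to_mark is a hypothetical name, positions are 0-based (row, col) pairs, and the exact columns a box covers (here left up to, but not including, right) are an assumption the text does not settle.

```python
def copy_to_mark(lines, cursor, mark):
    """Return (argument kind, copied text) for a cursor and a mark.

    cursor and mark are 0-based (row, col) pairs; this is an assumption
    of the sketch, not PWB's own coordinate convention.
    """
    (r1, c1), (r2, c2) = cursor, mark
    top, bottom = min(r1, r2), max(r1, r2)
    if c1 == c2:
        # Same column: whole lines are copied (linearg).
        return "linearg", lines[top:bottom + 1]
    left, right = min(c1, c2), max(c1, c2)
    # Different columns: a rectangular block is copied (boxarg).
    # Assumption: the box spans columns left..right-1.
    return "boxarg", [line[left:right] for line in lines[top:bottom + 1]]


text = ["alpha one", "beta two", "gamma three"]
print(copy_to_mark(text, (0, 0), (1, 0)))
print(copy_to_mark(text, (0, 0), (2, 5)))
```

The same column test decides the argument type for the mark forms of Ldelete and Linsert.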
Curdate
Key
Unassigned
Curdate
Types the current date at the cursor in the format day-month-year, for example: 17-Apr-1999.
Returns
True    Date typed.
False   Typing the date would make the line too long.
See
Curday, Curfile, Curfilenam, Curfileext, Curtime
Curday
Key
Unassigned
Curday
Types the three-letter abbreviation for the current day of the week, as follows: Mon Tue Wed Thu Fri Sat Sun.
Returns
True    Day typed.
False   Typing the day would make the line too long.
See
Curdate, Curfile, Curfilenam, Curfileext, Curtime
Curtime
Key
Unassigned
Curtime
Types the current time in the format hours:minutes:seconds, for example, 17:08:32.
Returns
True    Time typed.
False   Typing the time would make the line too long.
See
Curdate, Curday, Curfile, Curfilenam, Curfileext
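The formats typed by Curdate, Curday, and Curtime can be reproduced outside PWB, for example when generating text that should match them. The Python sketch below is illustrative; the function names are hypothetical. The documentation does not show how single-digit days or hours are padded, so leaving the day unpadded and zero-padding every time field are assumptions.

```python
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def curdate(now):
    """day-month-year, as in 17-Apr-1999 (day left unpadded: an assumption)."""
    return "%d-%s-%d" % (now.day, MONTHS[now.month - 1], now.year)


def curday(now):
    """Three-letter day of the week; datetime.weekday() counts Monday as 0."""
    return DAYS[now.weekday()]


def curtime(now):
    """hours:minutes:seconds, as in 17:08:32 (all fields zero-padded: an assumption)."""
    return "%02d:%02d:%02d" % (now.hour, now.minute, now.second)


stamp = datetime(1999, 4, 17, 17, 8, 32)
print(curdate(stamp), curday(stamp), curtime(stamp))
```

Explicit name tables are used instead of strftime so the output does not depend on the system locale.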
Delete
Keys
SHIFT+DEL, SHIFT+NUM
Menu
Edit menu, Cut command
Delete
Deletes the single character at the cursor, excluding line breaks. It does not copy the deleted character onto the clipboard. Note that the Delete function can delete more than one character, depending on the current selection mode.
Arg Delete (ALT+A SHIFT+DEL)
Deletes from the cursor to the end of the line. The deleted text is copied onto the clipboard. In stream selection mode, the deletion includes the line break and joins the current line to the next line.
Arg boxarg | linearg | streamarg Delete (ALT+A boxarg | linearg | streamarg SHIFT+DEL)
Deletes the selected text. The text is copied onto the clipboard.
Meta ... Delete (F9 ... SHIFT+DEL)
As above but discards the deleted text. The contents of the clipboard are not changed.
Returns
Delete always returns true.
Down
Keys
DOWN, CTRL+X
Down
Moves the cursor down one line. If a selection has been started, it is extended by one line. If this movement results in the cursor moving out of the window, the window is adjusted downward as specified by the Vscroll switch.
Meta Down (F9 DOWN)
Moves the cursor to the bottom of the window without changing the column position.
Returns
True    Cursor moved.
False   Cursor did not move; the cursor is at the destination.
See
Up
Emacscdel
Keys
BKSP, SHIFT+BKSP
Emacscdel
Deletes the previous character. If the cursor is in column 1, Emacscdel moves the cursor to the end of the previous line.
In insert mode, Emacscdel deletes the previous character, reducing the length of the line by 1. If the cursor is in column one, Emacscdel deletes the line break, joining the current line to the previous line.
In overtype mode, Emacscdel deletes the previous character and replaces it with a space character. If the cursor is in column 1, Emacscdel moves the cursor to the end of the previous line and does not delete the line break.
Emacscdel is similar to Cdelete, but Cdelete never deletes line breaks; in overtype mode beyond the end of the line, Cdelete automatically moves to the end of the line.
Returns
True    Cursor moved.
False   Cursor not moved.
See
Cdelete, Delete, Ldelete, Sdelete
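The difference between Cdelete and Emacscdel shows most clearly at column 1 in insert mode. The following Python sketch models only insert mode with the cursor inside the line; cdelete and emacscdel are hypothetical helper names, and columns are 0-based, so column 0 corresponds to "column 1" in the text.

```python
def cdelete(lines, row, col):
    """Cdelete in insert mode. Returns (lines, row, col) after the edit."""
    if col == 0:
        if row > 0:
            # Move to the end of the previous line; the line break is kept.
            return lines, row - 1, len(lines[row - 1])
        return lines, row, col
    line = lines[row]
    # Remove the character to the left of the cursor.
    new = lines[:row] + [line[:col - 1] + line[col:]] + lines[row + 1:]
    return new, row, col - 1


def emacscdel(lines, row, col):
    """Emacscdel in insert mode: like cdelete, but joins lines at column 1."""
    if col == 0 and row > 0:
        prev = lines[row - 1]
        # Delete the line break: the current line is appended to the previous one.
        new = lines[:row - 1] + [prev + lines[row]] + lines[row + 1:]
        return new, row - 1, len(prev)
    return cdelete(lines, row, col)


buf = ["ab", "cd"]
print(cdelete(buf, 1, 0))     # cursor moves up, both lines remain
print(emacscdel(buf, 1, 0))   # lines are joined
```

Away from column 1, both functions behave the same way in this model.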
Emacsnewl
Keys
ENTER, NUMENTER
Emacsnewl
In insert mode, starts a new line. In overtype mode, moves the cursor to the beginning of the next line. PWB automatically positions the cursor on the new line, depending on the setting of the Softcr switch.
Returns
Emacsnewl always returns True.
Update
In PWB 1.x, PWB performs special automatic indentation for C files. In PWB 2.00, language-specific automatic indentation is handled by language extensions if the feature is enabled. Otherwise, PWB uses its default indentation rules.
See
Newline, Softcr, C_Softcr
Endfile
Key
CTRL+END
Endfile
Places the cursor at the end of the file.
Returns
True    Cursor moved.
False   Cursor did not move; the cursor is at the end of the file.
See
Begfile
Endline
Key
END
Endline
Moves the cursor to the immediate right of the last character on the line.
Meta Endline (F9 END)
Moves the cursor to the column that is one column past the active window width.
Returns
True    Cursor moved.
False   Cursor did not move; the cursor is at the destination.
See
Begline, Traildisp, Trailspace
Environment
Key
Unassigned
Environment
Executes the current line as an environment-variable setting. For example, if the current line contains the following text when you execute Environment:
PATH=C:\UTIL;C:\DOS
PWB adds this setting to the current environment table. The effect is the same as the operating-system SET command. PWB uses the new environment variable for the rest of the session (including shells). Depending on the settings of the Envcursave and Envprojsave switches, PWB saves the environment table for PWB sessions and/or projects. See: Envcursave, Envprojsave.
Arg textarg Environment (ALT+A textarg Unassigned)
Executes the argument as an environment-variable setting.
Arg linearg | boxarg Environment (ALT+A linearg | boxarg Unassigned)
Executes each selected line or line fragment as an environment-variable setting.
Meta Environment (F9 Unassigned)
Performs environment-variable substitutions for all variables on the current line, replacing each variable with its value. The syntax for an environment variable is:
$(ENV) | $ENV:
where ENV is the uppercase name of the environment variable.
Arg Meta Environment (ALT+A F9 Unassigned)
Performs environment-variable substitutions (described above) for the text from the cursor to the end of the line.
Arg boxarg | linearg | streamarg Meta Environment (ALT+A boxarg | linearg | streamarg F9 Unassigned)
Performs environment-variable substitutions for the selected text.
Returns
True    Environment variable successfully set or substituted.
False   Syntax error or line too long.
Update
Because the <ENVIRONMENT> pseudofile no longer exists, this form of the Environment function is obsolete; it is replaced by the Environment command on the Options menu.
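The two operations of the Environment function, setting a variable from a NAME=value line and substituting the $(ENV) and $ENV: forms, can be sketched in Python. This is an illustration rather than PWB's implementation: set_variable and substitute are hypothetical names, the environment table is a plain dictionary, and leaving undefined variables unchanged is an assumption because the text does not say what PWB does with them.

```python
import re

# $(ENV) or $ENV: where ENV is an uppercase name.
_VAR = re.compile(r"\$\(([A-Z_][A-Z0-9_]*)\)|\$([A-Z_][A-Z0-9_]*):")


def set_variable(env, line):
    """Apply one NAME=value setting to env; return False on a syntax error."""
    name, sep, value = line.partition("=")
    if not sep or not name.strip():
        return False
    env[name.strip().upper()] = value
    return True


def substitute(env, text):
    """Replace each $(ENV) or $ENV: reference in text with its value."""
    def repl(match):
        name = match.group(1) or match.group(2)
        # Assumption: an undefined variable is left as written.
        return env.get(name, match.group(0))
    return _VAR.sub(repl, text)


table = {}
set_variable(table, r"PATH=C:\UTIL;C:\DOS")
print(substitute(table, "search $(PATH) then $PATH:"))
```

A replacement function is used with re.sub so that backslashes in the value (common in MS-DOS paths) are inserted literally.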
Execute
Key
F7
The Execute function executes PWB functions and macros by name. It allows you to execute commands that are not assigned to a key or execute a sequence of commands in one step. The Execute function executes the commands by the same rules as macros. Function prompts are suppressed, and you can use the macro flow-control and macro prompt directives. You do not need to define a macro to use these features.
Arg Execute (ALT+A F7)
Executes the text from the cursor to the end of the line as a PWB macro.
Arg linearg | textarg Execute (ALT+A linearg | textarg F7)
Executes the specified text as a PWB macro.
Returns
True    Last executed function returned true.
False   Last executed function returned false.
Exit
Key
F8
Exit
If you specified multiple files on the PWB command line, PWB advances to the next file. Otherwise, PWB quits and returns control to the operating system. If the Autosave switch is set to yes, the file is saved if it has been modified. If Autosave is no and the file is modified, PWB prompts for confirmation to save the file.
Meta Exit (F9 F8)
Performs like Exit with the Autosave switch set to no, independent of the current setting of Autosave. If you have changed any files, PWB asks for confirmation to save before exiting.
Arg Exit (ALT+A F8)
Like Exit, except PWB quits immediately without advancing to the next file (if any).
Arg Meta Exit (ALT+A F9 F8)
Like Meta Exit, except PWB quits immediately without advancing to the next file.
Returns
No return value.
See
_pwbquit
Graphic
Keys
Assigned to most alphanumeric and punctuation keys.
Graphic
Types the character corresponding to the key that you pressed.
Returns
True    The character is typed.
False   Typing the character would make the line too long.
See
Assign, Quote
Home
Key
GOTO (numeric-keypad 5)
Home
Places the cursor in the upper-left corner of the window.
Meta Home (F9 GOTO)
Places the cursor in the lower-right corner of the window.
Returns
True    Cursor moved.
False   Cursor not moved; it is already at the destination.
See
Begline, Endline, Left, Right
Initialize
Key
SHIFT+F8
Initialize
Discards all current settings, including extension settings, then reads the statements from the [PWB] section of TOOLS.INI.
Arg Initialize (ALT+A SHIFT+F8)
Reads the statements from a tagged section of TOOLS.INI. The tag name is specified by the continuous string of nonblank characters starting at the cursor.
Arg textarg Initialize (ALT+A textarg SHIFT+F8)
Reads the statements from the TOOLS.INI tagged section specified by textarg.
Example
The section tagged with
[PWB-name]
is initialized by the command
Arg name Initialize
Example
To reload the main section of TOOLS.INI without clearing other settings that you want to remain in effect, label the main section of TOOLS.INI with the tag:
[PWB PWB-main]
then use Arg main Initialize to recover your main settings instead of using Initialize with no arguments.
Returns
True    Initialized tagged section in TOOLS.INI.
False   Did not find tagged section in TOOLS.INI.
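The [PWB PWB-main] example works because one section header can carry more than one tag. The Python sketch below shows how such a lookup could behave; it is not PWB's TOOLS.INI reader. The names section_tag and read_tagged_section are hypothetical, tag matching is assumed to be case-insensitive, and comment handling is omitted.

```python
def section_tag(name=None):
    """Tag read by Initialize (no name) or by Arg name Initialize."""
    return "PWB" if name is None else "PWB-" + name


def read_tagged_section(ini_text, tag):
    """Return the statements of the first section whose header lists tag.

    Returns None when no section carries the tag.
    """
    wanted = tag.lower()
    found = False
    statements = []
    for raw in ini_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            if found:
                break                      # next header ends the section
            # A header such as [PWB PWB-main] lists several tags.
            found = wanted in line[1:-1].lower().split()
        elif found and line:
            statements.append(line)
    return statements if found else None


tools_ini = "[PWB PWB-main]\ntabstops:8\n\n[PWB-work]\ntabstops:4\n"
print(read_tagged_section(tools_ini, section_tag()))
print(read_tagged_section(tools_ini, section_tag("main")))
print(read_tagged_section(tools_ini, section_tag("work")))
```

With the double tag, both Initialize and Arg main Initialize find the same statements, but only the second form leaves other settings in place, as the text explains.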
Information
Update (obsolete) The PWB 1.x Information function and its associated pseudofile <INFORMATION-FILE> are obsolete; they do not exist in PWB 2.00.
Insert
Key
Unassigned
Insert
Inserts a single space character at the cursor, independent of the insert/overtype mode.
Arg Insert (ALT+A Unassigned)
Breaks the line at the cursor.
Arg boxarg | linearg | streamarg Insert (ALT+A boxarg | linearg | streamarg Unassigned)
Inserts space characters into the selected area.
Returns
True    Spaces or line break inserted.
False   Insertion would make a line too long.
Example
If paragraphs in your file consist of a sequence of lines beginning in the same column and are separated from other paragraphs by at least one blank line, the following macro indents a paragraph to the next tab stop:
para_indent:=_pwbboxmode Meta Mpara Down Begline Arg Meta Ppara Up Begline Tab Insert
This macro starts with the predefined PWB macro _pwbboxmode to set box selection mode, then creates a box selection from the beginning of the paragraph to the end, one tab stop wide. The Insert function inserts spaces in the selection.
See
Sinsert, Linsert
Insertmode
Keys
INS, CTRL+V
Insertmode
Toggles between insert mode and overtype mode. If overtype mode is on, the letter O appears on the status bar. The cursor can also change shape, depending on the Cursormode switch. See: Cursormode.
In insert mode, each character you type is inserted at the cursor. This insertion shifts the remainder of the line one position to the right. In overtype mode, the character you type replaces the character at the cursor.
Returns
True    PWB is in insert mode.
False   PWB is in overtype mode.
Lastselect
Key
CTRL+U
Lastselect
Duplicates the last selection. The Arg count and Meta state that were previously in effect are not duplicated; only the selection is. The new Arg count is one, and the Meta state is the current Meta state. To use a higher Arg count, execute Arg (ALT+A). To toggle the Meta state, execute Meta (F9).
The re-created selection uses the same pair of line:column coordinates as the previous selection. Thus, different text can be selected if you have made additions or deletions to the file since the last selection.
See
Arg, Lasttext, Meta
Lasttext
Key
CTRL+O
Lasttext
Displays the last text argument in the Text Argument dialog box. You can edit the text and then execute any PWB function that accepts a text argument, or you can cancel the dialog box. If you edit the text and then cancel the dialog box, PWB retains the modified text. Thus, when you execute Lasttext again, the new text appears in the dialog box.
Arg [[Arg]]... [[Meta]] Lasttext (ALT+A [[ALT+A]]... [[F9]] CTRL+O)
Displays the last text argument in the Text Argument dialog box with the specified Arg count and Meta state.
Arg [[Arg]]... linearg | boxarg | streamarg [[Meta]] Lasttext (ALT+A [[ALT+A]]... linearg | boxarg | streamarg [[F9]] CTRL+O)
Displays the first line of the selection in the Text Argument dialog box with the specified Arg count and Meta state.
Returns
The return value of Lasttext cannot be tested.
Example
The OpenInclude macro that follows opens an include file named in the next #include directive. The macro demonstrates a technique using the Lasttext function to pick up text from the file and modify it without modifying the file or the clipboard.
OpenInclude := \
    Up Meta Begline Arg Arg "^[ \t]*#[ \t]*include" Psearch -> \
    Arg Arg "[<>\"]" Psearch -> Right Savecur Psearch -> \
    Selcur Lasttext Begline "$INCLUDE:" Openfile <n +> \
    Lastselect Openfile <
In the fourth line, Lasttext pulls the selected filename into the Text Argument dialog box. The text argument is modified to prepend $INCLUDE: before passing it to the Openfile function.
Example
In some macro-programming situations, you don't want to use the text immediately. Instead, you need to pick up some text, do some other processing, then use the text. In this situation, use the phrase:
(make selection) Lasttext Cancel ...
This picks up the text, then cancels the Text Argument dialog box. The selected text remains in the Lasttext buffer for later use. To reuse the text, call Lasttext again.
See
Arg, Lastselect, Meta, Prompt
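The text-handling part of OpenInclude can be followed step by step outside PWB. The Python sketch below reuses the macro's two regular expressions to find the next #include directive, take the text between the delimiters, and prepend $INCLUDE:. The function include_argument is hypothetical, it models only the text processing (not Psearch, Savecur, or Openfile), and it assumes that Python's re syntax reads these two simple patterns the same way PWB does.

```python
import re

INCLUDE = re.compile(r'^[ \t]*#[ \t]*include')   # first Psearch pattern
DELIM = re.compile(r'[<>"]')                      # second Psearch pattern


def include_argument(lines, start):
    """Return the text passed to Openfile for the next #include at or after start."""
    for line in lines[start:]:
        directive = INCLUDE.search(line)
        if not directive:
            continue
        opening = DELIM.search(line, directive.end())   # opening < or "
        if not opening:
            return None
        closing = DELIM.search(line, opening.end())     # closing > or "
        if not closing:
            return None
        # The selection between the delimiters, with $INCLUDE: prepended.
        return "$INCLUDE:" + line[opening.end():closing.start()]
    return None


source = ['int x;', '  #  include <stdio.h>', '#include "local.h"']
print(include_argument(source, 0))
print(include_argument(source, 2))
```

As in the macro, neither the source lines nor any clipboard equivalent is modified; only the picked-up text is edited.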
Ldelete
Key
CTRL+Y
Ldelete
Deletes the current line and copies it to the clipboard.
Arg Ldelete (ALT+A CTRL+Y)
Deletes text from the cursor to the end of the line and copies it to the clipboard.
Arg mark Ldelete (ALT+A mark CTRL+Y)
Deletes the text from the line at the cursor to the line specified by mark and copies it to the clipboard. The mark cannot be a line number.
Arg number Ldelete (ALT+A number CTRL+Y)
Deletes the specified number of lines starting from the line at the cursor and copies them to the clipboard.
Arg boxarg | linearg Ldelete (ALT+A boxarg | linearg CTRL+Y)
Deletes the specified text and copies it to the clipboard. The argument is a linearg or boxarg regardless of the current selection mode. The argument is a linearg if the starting and ending points are in the same column.
Meta ... Ldelete (F9 ... CTRL+Y)
As above but discards the deleted text. The clipboard is not changed.
Returns
Ldelete always returns true.
See
Cdelete, Delete, Emacscdel, Sdelete
Left
Keys
LEFT, CTRL+S
Left
Moves the cursor one character to the left. If this movement results in the cursor moving out of the window, the window is adjusted to the left as specified by the Hscroll switch.
Meta Left (F9 LEFT)
Moves the cursor to the first column in the window.
Returns
True    Cursor moved.
False   Cursor not moved; the cursor is in column one.
See
Begline, Down, Endline, Home, Right, Up
Linsert
Key
CTRL+N
Linsert
Inserts one blank line above the current line.
Arg Linsert (ALT+A CTRL+N)
Inserts or deletes blanks at the beginning of a line to move the first nonblank character to the cursor.
Arg boxarg | linearg Linsert (ALT+A boxarg | linearg CTRL+N)
Inserts blanks within the specified area. The argument is a linearg or boxarg regardless of the current selection mode. The argument is a linearg if the starting and ending points are in the same column.
Arg mark Linsert (ALT+A mark CTRL+N)
Like boxarg | linearg, except that the specified area is given by the cursor position and the position of the specified mark. The mark argument must be a named mark; it cannot be a line number. See: Mark.
Returns
Linsert always returns true.
See
Insert, Sinsert
Logsearch
Key
Unassigned
Logsearch
Toggles the search-logging state. The default search-logging mode when PWB starts up is determined by the Enterlogmode switch.
Returns
True    Search logging turned on.
False   Search logging turned off.
Mark
Key
CTRL+M
The Mark function moves the cursor to a mark or specific location, defines marks, and deletes marks. Note that you cannot set a mark at specific text in a PWB window such as Help; PWB marks only the window position. If you want to save marks between sessions, assign a filename to the Markfile switch or use the Set Mark File command on the Search menu.
Mark (CTRL+M)
Moves the cursor to the beginning of the file.
Arg Mark (ALT+A CTRL+M)
Restores the cursor to its location prior to the last window scroll. Use Arg Mark to return to your previous location after a search or other large jump.
Arg number Mark (ALT+A number CTRL+M)
Moves the cursor to the beginning of the line specified by number in the current file. Line numbering starts at 1.
Arg textarg Mark (ALT+A textarg CTRL+M)
Moves the cursor to the specified mark.
Arg Arg textarg Mark (ALT+A ALT+A textarg CTRL+M)
Defines a mark at the cursor position. The name of the mark is specified by textarg.
Arg Arg textarg Meta Mark (ALT+A ALT+A textarg F9 CTRL+M)
Deletes the specified mark. This form of the Mark function always returns true.
Returns
True    Move, definition, or deletion successful.
False   Invalid argument or mark not found.
See
Markfile, Restcur, Savecur, Selcur
Maximize
Key
Unassigned
Maximize
Expands the window to its maximum size. If the window is already maximized, the window is restored. When the window is maximized and scroll bars are turned off by using the Winstyle function, PWB turns off the window borders. This is the "clean screen" look.
Meta Maximize (F9 Unassigned)
Restores the window to its original size.
Returns
True    Window is maximized.
False   Window is restored.
See
Minimize, Winstyle
Menukey
Key
ALT
Menukey
Activates the menu bar. Unlike other PWB functions, Menukey can be assigned to only one key. It cannot be assigned to a combination of keys.
Returns
You cannot test the return value of Menukey.
Message
Key
Unassigned
Message
Clears the status bar.
Arg Message (ALT+A Unassigned)
Displays the text from the cursor to the end of the line on the status bar.
Arg textarg Message (ALT+A textarg Unassigned)
Displays textarg on the status bar.
Meta ... Message (F9 ... Unassigned)
As above and also repaints the screen.
Returns
Message always returns true.
Example
The following macro is useful when writing new macros (the ! is the macro name):
! := Meta Message
With this definition you can place an exclamation point in your macros wherever you want a screen update. If you also want to display a status-bar message at the time of the update, use the phrase:
... Arg "text of message" ! ...
See
Prompt
Meta
Key
F9
Meta
Modifies the action of the function it prefixes. When the Meta state is turned on, the letter A (for Alternate) appears in the status bar. You can use the Meta and Arg function prefixes in any order.
Returns
True    Meta state turned on.
False   Meta state turned off.
See
Arg, Lasttext, Lastselect
Mgrep
Key
Unassigned
The Mgrep function searches all the files listed in the Mgreplist macro. PWB places all matches in the Search Results window. Under multithreaded environments, PWB performs the search in the background. To browse the list of matches, use _pwbnextlogmatch (CTRL+SHIFT+F3), _pwbpreviouslogmatch (CTRL+SHIFT+F4), and the Nextsearch function (Unassigned).
Mgrep (Unassigned)
Searches for the previously searched string or pattern.
Arg Mgrep (ALT+A Unassigned)
Searches for the string specified by the characters from the cursor to the first blank character.
Arg textarg Mgrep (ALT+A textarg Unassigned)
Searches for textarg.
Arg Arg Mgrep (ALT+A ALT+A Unassigned)
Searches for the regular expression specified by the characters from the cursor to the first blank character.
Arg Arg textarg Mgrep (ALT+A ALT+A textarg Unassigned)
Searches for the regular expression specified by textarg.
Meta ... Mgrep (F9 ... Unassigned)
As above except that the value of the Case switch is reversed for the search.
Returns
True    With MS-DOS, indicates that a match was found. With multithreaded environments, indicates that a background search was successfully initiated.
False   No matches, no search pattern specified, search pattern invalid, or search terminated by CTRL+BREAK.
Update
In PWB 2.00, search and build results and their browsing functions are separate. A background build operation and a background search can be performed simultaneously. In PWB 1.x, search and build results appear in the same window, and are browsed with the same commands. A background build operation and a multifile search cannot be performed at the same time in PWB 1.x.
Minimize
Key
Unassigned
Minimize
Shrinks the active window to an icon (a minimized window). If the window is already minimized, restores the window.
Arg Minimize (ALT+A Unassigned)
Minimizes all open windows.
Meta Minimize (F9 Unassigned)
Restores the window to its unminimized state.
Returns
True    Window minimized: the window is an icon.
False   Window restored: the window is not an icon.
See
Maximize
Mlines
Keys
CTRL+UP, CTRL+W
Mlines
Scrolls the window down as specified by the Vscroll switch.
Arg Mlines (ALT+A CTRL+UP)
Scrolls the window so the line at the cursor moves to the bottom of the window.
Arg number Mlines (ALT+A number CTRL+UP)
Scrolls the window down by number lines.
Returns
True    Window scrolled.
False   Invalid argument.
See
Plines
Movewindow
Key
Unassigned
Movewindow
Enters window-moving mode. In window-moving mode, only the following actions are available:
Action                    Key
Move up one row           UP
Move down one row         DOWN
Move left one column      LEFT
Move right one column     RIGHT
Accept the new position   ENTER
Cancel the move           ESC
Arg number Movewindow (ALT+A number Unassigned)
Moves the upper-left corner of the window to the screen row specified by number.
Meta Arg number Movewindow (F9 ALT+A number Unassigned)
Moves the upper-left corner of the window to the screen column specified by number.
Returns
True    Window moved.
False   Window not moved.
Mpage
Keys
PGUP, CTRL+R
Mpage
Moves the cursor backward in the file by one window.
Returns
True    Cursor moved.
False   Cursor not moved.
See
Ppage
Mpara
Key
Unassigned
Mpara
Moves the cursor to the beginning of the first line of the current paragraph. If the cursor is already on the first line of the paragraph, it is moved to the beginning of the first line of the preceding paragraph.
Meta Mpara (F9 Unassigned)
Moves the cursor to the first blank line preceding the current paragraph.
Returns
True    Cursor moved.
False   Cursor not moved; no more paragraphs in the file.
See
Ppara
Mreplace
Key
Unassigned
Mreplace
Performs a find-and-replace operation across multiple files, prompting for the find-and-replacement strings and for confirmation at each occurrence. Mreplace searches all the files listed in the special macro Mgreplist.
Arg Arg Mreplace (ALT+A ALT+A Unassigned)
Performs the same action as Mreplace but uses regular expressions.
Meta ... Mreplace (F9 ... Unassigned)
As above except reverses the sense of the Case switch for the operation.
Returns
True    At least one replacement made.
False   No replacements made or operation aborted.
See
Mgrep, Mreplaceall, Qreplace, Replace
Mreplaceall
Key
Unassigned
Mreplaceall
Performs a find-and-replace operation across multiple files, prompting for the find-and-replacement strings. Mreplaceall searches all the files listed in the special macro Mgreplist.
Arg Arg Mreplaceall (ALT+A ALT+A Unassigned)
Performs the same action as Mreplaceall but uses regular expressions.
Meta ... Mreplaceall (F9 ... Unassigned)
As above except reverses the sense of the Case switch for the operation.
Returns
True    At least one replacement made.
False   No replacements made or operation aborted.
See
Mgrep, Mreplace, Qreplace, Replace
Msearch
Key
F4
Msearch
Searches backward for the previously searched string or pattern.
Arg Msearch (ALT+A F4)
Searches backward for the string specified by the text from the cursor to the first blank character.
Arg textarg Msearch (ALT+A textarg F4)
Searches backward for the specified text.
Arg Arg Msearch (ALT+A ALT+A F4)
Searches backward for the regular expression specified by the text from the cursor to the first blank character.
Arg Arg textarg Msearch (ALT+A ALT+A textarg F4)
Searches backward for the regular expression defined by textarg.
Meta ... Msearch (F9 ... F4)
As above except reverses the sense of the Case switch for the search.
Returns
True    String found.
False   Invalid argument, or string not found.
See
Mgrep, Psearch
Mword
Keys
CTRL+LEFT, CTRL+A
Mword
Moves the cursor to the beginning of the current word, or if the cursor is not in a word or is at the beginning of the word, moves the cursor to the beginning of the previous word. A word is defined by the Word switch.
Meta Mword (F9 CTRL+LEFT)
Moves the cursor to the immediate right of the previous word.
Returns
True    Cursor moved.
False   Cursor not moved; there are no more words in the file.
See
Pword
Newfile
Key
Unassigned
The Newfile function creates a new pseudofile. If the Newwindow switch is set to yes, it opens a new window for the file.
Newfile (Unassigned)
Creates a new untitled pseudofile. The new pseudofile is given a unique name of the form:
<Untitled.nnn>Untitled.nnn
where nnn is a three-digit number starting with 001 at the beginning of each PWB session. The window title shows Untitled.001. Use the pseudofile name <Untitled.001> to refer to the file in a text argument or dialog box.
Arg Newfile (ALT+A Unassigned)
Creates a new pseudofile with the name specified by the text from the cursor to the end of the line. The resulting full pseudofile name is:
"<Text on the line>Text on the line"
Arg textarg Newfile (ALT+A textarg Unassigned)
Creates a new pseudofile with the name specified by textarg. The resulting full pseudofile name is:
"<textarg>textarg"
If you want to use a different short name and window title, use the full name as an argument to the Setfile or Openfile functions. For example, Arg "<temp>Temporary File" Openfile opens a pseudofile in a new window that has the title Temporary File.
Returns
True    Successfully created the pseudofile.
False   Unable to create the pseudofile.
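The naming rule above, a short name in angle brackets followed by the window title, can be sketched in Python. This is only an illustration of the name format described in this entry; the helper names untitled_name and split_pseudofile_name are hypothetical and are not part of PWB.

```python
import re
from typing import Tuple


def untitled_name(n: int) -> str:
    """Build the full name of the n-th untitled pseudofile in a session.

    Follows the documented form <Untitled.nnn>Untitled.nnn, where nnn is
    a three-digit number starting with 001.
    """
    short = f"Untitled.{n:03d}"
    return f"<{short}>{short}"


def split_pseudofile_name(full_name: str) -> Tuple[str, str]:
    """Split '<short>title' into the bracketed short name and the title."""
    match = re.fullmatch(r"(<[^>]*>)(.*)", full_name)
    if match is None:
        raise ValueError(f"not a pseudofile name: {full_name!r}")
    return match.group(1), match.group(2)
```

For the example in this entry, split_pseudofile_name("<temp>Temporary File") yields the short name <temp> and the title Temporary File.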
Newline
Keys
SHIFT+ENTER, SHIFT+NUMENTER
Newline
Moves the cursor to a new line. If the Softcr switch is set to yes, PWB automatically indents to an appropriate position based on the type of file you are editing.
Meta Newline (F9 SHIFT+ENTER)
Moves the cursor to column 1 of the next line.
Returns
Newline always returns true.
Update
In PWB 1.x, PWB performs special automatic indentation for C files. In PWB 2.00, language-specific automatic indentation is handled by language extensions if the feature is enabled. Otherwise, PWB uses its default indentation rules.
See
Emacsnewl
Nextmsg
Key
Unassigned
Nextmsg
Advances to the next message in the Build Results window.
Arg number Nextmsg (ALT+A number Unassigned)
Moves to the nth message in the current set of messages, where n is specified by number. To move relative to the current message, use a signed number. For example, when number is +1, PWB moves to the next message, and when it is -1, PWB moves to the previous message.
Arg Nextmsg (ALT+A Unassigned)
Moves to the next message in the current set of messages that does not refer to the current file.
Meta Nextmsg (F9 Unassigned)
Advances to the next set of messages.
Arg Arg Nextmsg (ALT+A ALT+A Unassigned)
Sets the message at the cursor as the current message. This works only when the cursor is on a message in the Build Results window.
Returns
True    Message found.
False   No more messages found.
Update
In PWB 1.x, Nextmsg also browses the results of searches. In PWB 2.00, search results are browsed with the Nextsearch function.
Meta Nextmsg
In PWB 1.x, deletes the current set of messages and advances to the next set. In PWB 2.00, Meta Nextmsg does not delete the set. To delete sets of messages in PWB 2.00, use the Clearmsg function.
Meta Arg Arg Nextmsg
In PWB 1.x, closes the Compile Results window. In PWB 2.00, it behaves like Arg Arg Nextmsg.
See
Clearmsg
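The difference between an unsigned and a signed number argument to Nextmsg can be illustrated with a short Python sketch. It assumes 1-based message numbering and that an out-of-range target means no message is found; the function name resolve_target is hypothetical and only models the rule described in this entry.

```python
from typing import Optional


def resolve_target(current: int, number: str, count: int) -> Optional[int]:
    """Return the 1-based message index selected by `number`.

    An unsigned number selects the nth message; a number with an explicit
    sign moves relative to the current message. Returns None when the
    target falls outside 1..count (assumed to mean "no message found").
    """
    text = number.strip()
    value = int(text)
    # An explicit leading sign marks the argument as relative.
    target = current + value if text[0] in "+-" else value
    return target if 1 <= target <= count else None
```

With the current message at 3 in a set of 10, the argument "+1" selects message 4, "-1" selects message 3's predecessor, and "7" selects message 7 directly.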
Nextsearch
Key
Unassigned
Nextsearch
Advances to the next match in the Search Results window.
Arg number Nextsearch (ALT+A number Unassigned)
Moves to the nth match in the current set of matches, where n is specified by number. To move relative to the current match, use a signed number. For example, when number is +1, PWB moves to the next match, and when it is -1, PWB moves to the previous match.
Arg Nextsearch (ALT+A Unassigned)
Moves to the next match in the current set of matches that does not refer to the current file.
Meta Nextsearch (F9 Unassigned)
Advances to the next set of matches.
Arg Arg Nextsearch (ALT+A ALT+A Unassigned)
Sets the match at the cursor as the current match. This works only when the cursor is on a match in the Search Results window.
Update
In PWB 1.x, the results of searches are browsed using the Nextmsg function.
See
Clearsearch
Noedit
Key
Unassigned
The Noedit function toggles the no-edit state of PWB or the current file. When the no-edit state is turned on, PWB displays the letter R on the status bar and disallows modification of the file.
Noedit
Toggles the no-edit state. If you started PWB with the /R (read-only) option, Noedit removes the no-edit limitation.
Meta Noedit (F9 Unassigned)
Toggles the no-edit state for the current file. This form of the Noedit command works only for disk files and has no effect on pseudofiles.
If you have the Editreadonly switch set to no, PWB turns on the no-edit state for files that are marked read-only on disk. This function toggles the no-edit state for the file so that you can modify it.
Returns
True    File or PWB in no-edit state; modification disallowed.
False   File or PWB not in no-edit state; modification allowed.
Openfile
Key
F10
The Openfile function opens a file in a new window, ignoring the Newwindow switch.
Arg Openfile (ALT+A F10)
Opens the file at the cursor in a new window. The name of the file is specified by the text from the cursor to the first blank character.
Arg textarg Openfile (ALT+A textarg F10)
Opens the specified file in a new window. If the argument is a wildcard, PWB creates a pseudofile containing a list of files that match the pattern. To open a file from this list, position the cursor at the beginning of the name and use Arg Openfile or Arg Setfile.
Returns
True    File and window successfully opened.
False   No argument specified, or file did not exist and you did not create it.
See
Newfile, Setfile
Paste
Keys
SHIFT+INS, SHIFT+NUM+
Menu
Edit menu, Paste command
Paste (SHIFT+INS)
Copies the contents of the clipboard to the file at the cursor. The text is always inserted independent of the insert/overtype mode. If the clipboard contents were copied to the clipboard as a linearg, PWB inserts the contents of the clipboard above the current line. Otherwise, the contents of the clipboard are inserted at the cursor.
Arg boxarg | linearg | streamarg Paste (ALT+A boxarg | linearg | streamarg SHIFT+INS)
Replaces the selected text with the contents of the clipboard.
Arg Paste (ALT+A SHIFT+INS)
Copies the text from the cursor to the end of the line. The text is copied to the clipboard and inserted at the cursor.
Arg textarg Paste (ALT+A textarg SHIFT+INS)
Copies textarg to the clipboard and inserts it at the cursor.
Arg Arg filename Paste (ALT+A ALT+A filename SHIFT+INS)
Copies the contents of the file specified by filename to the current file above the current line.
Arg Arg !textarg Paste (ALT+A ALT+A !textarg SHIFT+INS)
Runs textarg as an operating-system command, capturing the command's output to standard output. The output is copied to the clipboard and inserted above the current line. You must enter the exclamation mark as shown.
Returns
Paste always returns true except for the following cases.
False   Tried Arg Arg filename Paste and file did not exist, or the pasted text would make a line too long.
Example
The following command copies a sorted copy of the file SAMPLE.TXT to the current file:
Arg Arg !SORT <SAMPLE.TXT Paste (ALT+A ALT+A !SORT <SAMPLE.TXT SHIFT+INS)
Pbal
Key
CTRL+[
Pbal
Scans backward through the file, balancing parentheses (( )) and brackets ([ ]). The first unmatched parenthesis or bracket is highlighted when found. If an unbalanced parenthesis or bracket is found, it is highlighted and the corresponding character is inserted at the cursor. If no unbalanced characters are found, PWB displays a message box. The search does not include the cursor position and looks for more opening brackets or parentheses than closing ones.
Arg Pbal (ALT+A CTRL+[)
Like Pbal except that it scans forward through the file and searches for right brackets or parentheses lacking opening partners.
Meta Pbal (F9 CTRL+[)
Like Pbal but does not insert the unbalanced character. If no unbalanced characters are found, moves to the matching character.
Arg Meta Pbal (ALT+A F9 CTRL+[)
Like Arg Pbal but does not insert the character. If no unbalanced characters are found, moves to the matching character.
Update
In PWB 1.x, the messages appear on the status bar. In PWB 2.00, they appear in a message box.
Returns
True    Balance successful.
False   Invalid argument, or no unbalanced characters found.
See
Infodialog
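The backward scan that Pbal performs can be approximated with the following Python sketch. It is a simplified model, not PWB's implementation: it assumes parentheses and brackets share one nesting counter (mismatched types are not detected), and the name find_unmatched_opener is hypothetical.

```python
from typing import Optional, Tuple

# Opening characters and the closing character each one needs.
PAIRS = {"(": ")", "[": "]"}
CLOSERS = set(PAIRS.values())


def find_unmatched_opener(text: str, cursor: int) -> Optional[Tuple[int, str]]:
    """Scan backward from just before `cursor` for an unmatched opener.

    Returns (index, closing_character_to_insert), or None when every
    opener before the cursor is balanced. The cursor position itself is
    not examined, as the entry above describes.
    """
    pending = 0  # closers seen so far that still need an opener
    for i in range(cursor - 1, -1, -1):
        ch = text[i]
        if ch in CLOSERS:
            pending += 1
        elif ch in PAIRS:
            if pending == 0:
                return i, PAIRS[ch]
            pending -= 1
    return None
```

For the text f(a[1], (b) with the cursor at the end, the sketch reports the parenthesis at index 1 and suggests inserting a closing parenthesis.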
Plines
Keys
CTRL+DOWN, CTRL+Z
Plines
Scrolls the text up as specified by the Vscroll switch.
Arg Plines (ALT+A CTRL+DOWN)
Scrolls the text such that the line at the cursor is moved to the top of the window.
Arg number Plines (ALT+A number CTRL+DOWN)
Scrolls the text up by number lines.
Returns
True    Text scrolled.
False   Invalid argument.
See
Mlines
Ppage
Keys
PGDN, CTRL+C
Ppage
Moves the cursor forward in the file by one window.
Returns
True    Cursor moved.
False   Cursor not moved.
See
Mpage
Ppara
Key
Unassigned
Ppara
Moves the cursor to the beginning of the first line of the next paragraph.
Meta Ppara (F9 Unassigned)
Moves the cursor to the beginning of the first blank line after the current paragraph. If the cursor is not on a paragraph, moves the cursor to the first blank line after the next paragraph.
Returns
True    Cursor moved.
False   Cursor not moved; no more paragraphs in the file.
See
Mpara
Print
Key
Unassigned
The Print function prints files or selections. If the Printcmd switch is set, PWB uses the command line given in the switch. Otherwise, PWB copies the file or selection to PRN. Under multithreaded environments, PWB runs the print command in the background.
Print (Unassigned)
Prints the current file.
Arg textarg Print (ALT+A textarg Unassigned)
Prints all the files listed in textarg. Use a space to separate each name from the preceding name. You can use environment variables to specify paths for the files.
Arg boxarg | linearg | streamarg Print (ALT+A boxarg | linearg | streamarg Unassigned)
Prints the selected text.
Arg Meta Print (ALT+A F9 Unassigned)
Cancels the current background print.
Returns
True    Print successfully submitted.
False   Could not start print job.
Update
In PWB 1.x there is no way to cancel a background print.
Project
Key
Unassigned
Project
Opens the last project.
Arg Project (ALT+A Unassigned)
Opens the project makefile at the cursor as a PWB project. The name of the project is specified by the text from the cursor to the first blank character.
Arg textarg Project (ALT+A textarg Unassigned)
Opens the project makefile specified by textarg as a PWB project.
Arg Arg Project (ALT+A ALT+A Unassigned)
Closes the current project.
Arg Meta Project (ALT+A F9 Unassigned)
Opens the project makefile at the cursor as a non-PWB project (foreign makefile).
Arg textarg Meta Project (ALT+A textarg F9 Unassigned)
Opens the project makefile specified by textarg as a non-PWB project.
Returns
True    A project is open.
False   A project is not open.
See
Lastproject
Prompt
Key Unassigned
The Prompt function displays the Text Argument dialog box where you can enter a text argument. You can use this function interactively, but because it is mainly useful in macros, it is not assigned to a key by default. You usually use Lasttext or Arg to directly enter a text argument.
Prompt
Displays the Text Argument dialog box without a title. See: Lasttext
Arg Prompt (ALT+A Unassigned)
Uses the text of the current line from the cursor to the end of the line as the title.
Arg textarg Prompt (ALT+A textarg Unassigned)
Uses textarg as the title.
Arg boxarg | linearg | streamarg Prompt (ALT+A boxarg | linearg | streamarg Unassigned)
Uses the selected text as the title. If the selection spans more than one line, the title is the first line of the selected text.
Returns
True    Textarg entered; the user chose the OK button.
False   The dialog box was canceled.
Example
With the following macro, PWB prompts for a Help topic:
QueryHelp := Arg "Help Topic to Find:" Prompt -> Pwbhelp
QueryHelp : Ctrl+Q
When you press CTRL+Q, PWB displays a dialog box with the string Help Topic to Find: as the title and waits for a response. PWB passes your response to the Pwbhelp function as if the command Arg textarg Pwbhelp had been executed. If you cancel the dialog box, Prompt returns false and the macro conditional -> terminates the macro without executing Pwbhelp.
See
Assign
Psearch
Key
F3
Psearch
Searches forward for the previously searched string or pattern.
Arg Psearch (ALT+A F3)
Searches forward in the file for the string specified by the text from the cursor to the first blank character.
Arg textarg Psearch (ALT+A textarg F3)
Searches forward for the specified text.
Arg Arg Psearch (ALT+A ALT+A F3)
Searches forward in the file for the regular expression specified by the text from the cursor to the first blank character.
Arg Arg textarg Psearch (ALT+A ALT+A textarg F3)
Searches forward for the regular expression defined by textarg.
Meta ... Psearch (F9 ... F3)
As above but reverses the value of the Case switch for one search.
Returns
True    String found.
False   Invalid argument, or string not found.
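The way the Arg count and the Meta prefix change a search can be modeled roughly in Python. This sketch makes several assumptions: a Case switch value of yes is taken to mean a case-sensitive search, Python's re syntax stands in for PWB's regular-expression syntax (the two differ), and build_matcher is a hypothetical name.

```python
import re


def build_matcher(pattern: str, arg_count: int, meta: bool, case_switch: bool):
    """Compile a matcher the way the search forms above are described.

    arg_count   -- number of Arg prefixes: one means a literal string,
                   two mean a regular expression
    meta        -- True when the Meta prefix is used (reverses Case)
    case_switch -- assumed meaning: True = case-sensitive search
    """
    use_regex = arg_count >= 2
    # Meta reverses the value of the Case switch for this one search.
    case_sensitive = case_switch != meta
    source = pattern if use_regex else re.escape(pattern)
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(source, flags)
```

In this model, one Arg treats a.c as the literal three characters, while two Args let the period match any character; adding Meta flips case sensitivity for that search only.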
Pwbhelp
Key
Unassigned
Pwbhelp
Displays the default Help topic.
Arg Pwbhelp (ALT+A Unassigned)
Displays Help on the topic at the cursor. Equivalent to the macro _pwbhelp_context (F1).
Arg textarg Pwbhelp (ALT+A textarg Unassigned)
Displays Help on the specified text argument.
Arg streamarg Pwbhelp (ALT+A streamarg Unassigned)
Displays Help on the selected text. The selection cannot include more than one line.
Meta Pwbhelp (F9 Unassigned)
Prompts for a key, then displays Help on the function or macro assigned to the key you press. If you press a key that is not assigned to a function or macro, PWB displays help on the Unassigned function. If you press a key that PWB does not recognize, the prompt remains displayed until you press a key that PWB recognizes.
Returns
True    Help topic found.
False   Help topic not found.
Pwbhelpnext
Key
CTRL+F1
Pwbhelpnext
Displays the next physical topic in the current Help database.
Meta Pwbhelpnext (F9 CTRL+F1)
Displays the previous Help topic on the backtrace list. This is the Help topic that you previously viewed. Up to 20 Help topics are retained in the backtrace list. Equivalent to the Back button on the Help screens and the macro _pwbhelp_back (ALT+F1).
Arg Pwbhelpnext (ALT+A CTRL+F1)
Displays the next occurrence of the current Help topic within the Help system. Equivalent to the macro _pwbhelp_again (Unassigned). Use this command when the Help topic appears several times in the set of open Help databases.
Returns
True    Help topic found.
False   Help topic not found.
Pwbhelpsearch
Key
Unassigned
The Pwbhelpsearch function performs a global search of the Help system. The search is case insensitive unless you use the Meta form of Pwbhelpsearch, which uses the setting of the Case switch to determine case sensitivity.
Pwbhelpsearch (Unassigned)
Displays the results of the last global Help search. Equivalent to the predefined macro _pwbhelp_searchres (Unassigned).
Arg Pwbhelpsearch (ALT+A Unassigned)
Searches Help for the word at the cursor.
Arg textarg Pwbhelpsearch (ALT+A textarg Unassigned)
Searches Help for the selected text.
Arg Arg Pwbhelpsearch (ALT+A ALT+A Unassigned)
Searches Help using the regular expression at the cursor.
Arg Arg textarg Pwbhelpsearch (ALT+A ALT+A textarg Unassigned)
Searches Help for the selected regular expression.
Meta ... Pwbhelpsearch (F9 ... Unassigned)
As above except the search is case sensitive if the Case switch is set to yes.
Returns
True    At least one match found.
False   No matches found, or search canceled.
Pwbrowse Functions
Most of the Pwbrowse... functions provided by the PWBROWSE Source Browser extension display one of the Source Browser's dialog boxes. The Source Browser functions attached to Browse menu commands are listed in the following table.
Function            Browse Menu Command     Key
Pwbrowsecalltree    Call Tree (Fwd/Rev)     Unassigned
Pwbrowseclhier      Class Hierarchy         Unassigned
Pwbrowsecltree      Class Tree (Fwd/Rev)    Unassigned
Pwbrowsefuhier      Function Hierarchy      Unassigned
Pwbrowsegotodef     Goto Definition         Unassigned
Pwbrowsegotoref     Goto Reference          Unassigned
Pwbrowselistref     List References         Unassigned
Pwbrowsenext        Next                    CTRL+NUM+
Pwbrowseoutline     Module Outline          Unassigned
Pwbrowseprev        Previous                CTRL+NUM-
Pwbrowseviewrel     View Relationship
Pwbrowsewhref       Which Reference
The browser functions in the following table do not correspond to a Browse menu command.
Function          Description                          Key
Pwbrowse1stdef    Go to 1st definition                 Unassigned
Pwbrowse1stref    Go to 1st reference                  Unassigned
Pwbrowsepop       Go to previously browsed location    Unassigned
Pwbwindow
Key
Unassigned
The Pwbwindow function opens PWB windows. If the specified window is already open, PWB switches to that window.
Arg Pwbwindow (ALT+A Unassigned)
Opens the PWB window with the name at the cursor. The name is specified by the text from the cursor to the first blank character.
Arg textarg Pwbwindow (ALT+A textarg Unassigned)
Opens the specified PWB window.
Arg Meta Pwbwindow (ALT+A F9 Unassigned)
Closes the PWB window specified by the name at the cursor.
Arg textarg Meta Pwbwindow (ALT+A textarg F9 Unassigned)
Closes the specified PWB window.
Returns
True    The specified window was opened.
False   The window could not be opened.
Pword
Keys
CTRL+RIGHT, CTRL+F
Pword
Moves the cursor to the beginning of the next word. A word is defined by the Word switch.
Meta Pword (F9 CTRL+RIGHT)
Moves the cursor to the immediate right of the current word, or if the cursor is not in a word, moves it to the right of the next word.
Returns
True    Cursor moved.
False   Cursor not moved; there are no more words in the file.
See
Mword
Qreplace
Key
CTRL+\
The Qreplace function performs a find-and-replace operation on the current file, prompting for find-and-replacement strings and confirmation at each occurrence.
Qreplace (CTRL+\)
Performs the replacement from the cursor to the end of the file, wrapping around the end of the file if the Searchwrap switch is set to yes.
Arg boxarg | linearg | streamarg Qreplace (ALT+A boxarg | linearg | streamarg CTRL+\)
Performs the replacement over the selected area. Note that PWB does not adjust the selection at each replacement for changes in the length of the text. For boxarg and streamarg, PWB may replace text that was not included in the original selection or miss text included in the original selection.
Arg mark Qreplace (ALT+A mark CTRL+\)
Performs the replacement on text from the cursor to the specified mark. Replaces over text as if it were selected, according to the current selection mode. The mark argument cannot be a line number. See: Mark.
Arg number Qreplace (ALT+A number CTRL+\)
Performs the replacement for the specified number of lines, starting with the line at the cursor.
Arg Arg ... Qreplace (ALT+A ALT+A ... CTRL+\)
As above except using regular expressions.
Meta ... Qreplace (F9 ... CTRL+\)
As above except the sense of the Case switch is reversed for the operation.
Returns
True    At least one replacement was performed.
False   String not found, or invalid pattern.
See
Mreplace, Replace, Searchwrap
Quote
Key
CTRL+P
Quote
Reads one key from the keyboard and types it into the file or dialog box. In a dialog box, the key is always CTRL+P, no matter what function or macro you may have assigned to CTRL+P for the editor. This is useful for typing a character (such as TAB or CTRL+L) whose keystroke is assigned to a PWB function.
Returns
Quote always returns true except in the following case.
False   Character would make line too long.
Record
Key
SHIFT+CTRL+R
The Record function toggles macro recording. While a macro is being recorded, PWB displays the letter X on the status bar, and a bullet appears next to the Record On command from the Edit menu. If a menu command cannot be recorded, it is disabled while recording.
When macro recording is stopped, PWB assigns the recorded commands to the default macro name Playback. During the recording, PWB writes the name of each command to the definition of Playback in the Record window, which can be viewed as it is updated.
Macro recording in PWB does not record changes in cursor position accomplished by clicking the mouse. Use the keyboard if you want to include cursor movements in a macro.
Record (SHIFT+CTRL+R)
Toggles macro recording on and off.
Arg textarg Record (ALT+A textarg SHIFT+CTRL+R)
Turns on recording if it is off and assigns the name specified in the text argument to the recorded macro. Turns off recording if it is turned on.
Meta Record (F9 SHIFT+CTRL+R)
Toggles macro recording. While recording, no editing commands are executed until recording is turned off. Use this form of the function to record a macro without modifying your file.
Arg Record (ALT+A SHIFT+CTRL+R)
Arg Arg textarg Record (ALT+A ALT+A textarg SHIFT+CTRL+R)
Arg Arg Meta Record (ALT+A ALT+A F9 SHIFT+CTRL+R)
As above but if the target macro already exists, the commands are appended to the end of the macro.
Returns
True    Recording turned on.
False   Recording turned off.
Update
In PWB 2.00, more menu commands can be recorded than with PWB 1.x.
Refresh
Key
SHIFT+F7
Refresh
Prompts for confirmation and then rereads the file from disk, discarding its Undo history and all modifications to the file since the file was last saved.
Returns   Condition
True      File reread.
False     Prompt canceled.
Arg Refresh (ALT+A SHIFT+F7)
Prompts for confirmation and then removes the file from the active window and the window's file history. If the active window is the last window that has the file in its history, the file is discarded from memory without saving changes, and the file is closed.
Returns   Condition
True      File removed from the window.
False     Prompt canceled, or bad argument. The file is not removed from the window.
Repeat
Key
Unassigned
Repeat
Repeats the last editing action relative to the current cursor position. The Repeat function considers the following types of operations to be editing actions:
- Typing a contiguous stream of characters without entering a command or moving the cursor
- Deleting text
- Pasting from the clipboard
Repeat does not repeat macros or cursor movements.
Arg number Repeat (ALT+A number Unassigned)
Performs the last action the number of times specified by number.
Returns
True    Action repeated and returned true.
False   Action repeated and returned false, or no action to repeat.
Replace
Key
CTRL+L
The Replace function performs a find-and-replace operation on the current file, prompting for find and replacement strings. Replace substitutes all matches of the search pattern without prompting for confirmation.
Replace (CTRL+L)
Performs the replacement from the cursor to the end of the file, wrapping around the end of the file if the Searchwrap switch is on.
Arg boxarg | linearg | streamarg Replace (ALT+A boxarg | linearg | streamarg CTRL+L)
Performs the replacement over the selected area. Note that PWB does not adjust the selection at each replacement for changes in the length of the text. For boxarg and streamarg, PWB may replace text that was not included in the original selection or miss text included in the original selection.
Arg mark Replace (ALT+A mark CTRL+L)
Performs the replacement on text from the cursor to the specified mark. It searches the range of text as if it were selected, according to the current selection mode. The mark argument cannot be a line number.
Arg number Replace (ALT+A number CTRL+L)
Performs the replacement over the specified number of lines, starting with the current line.
Arg Arg ... Replace (ALT+A ALT+A ... CTRL+L)
As above except using regular expressions.
Meta ... Replace (F9 ... CTRL+L)
As above except the sense of the Case switch is reversed for the operation.
Returns
True    At least one replacement was performed.
False   String not found, or invalid pattern.
See
Qreplace, Searchwrap
Example
To use the Replace function in a macro, use the phrase:
...Replace "pattern" Newline "replacement" Newline +>found...
Enter the replies to the prompts as you would when executing Replace interactively. This example also shows where to place the conditional to test the result of Replace. You can specify special characters in the find-and-replacement strings by using escape sequences similar to those in the C language. Note that backslashes in the macro string must be doubled. To restore the usual prompts, use the phrase:
...Replace <
To use an empty replacement text (replace with nothing), use the following phrase:
...Replace "pattern" Newline " " Cdelete Newline...
If you find that you write many macros with empty replacements, the common phrase can be placed in a macro, as follows:
nothing := " " Cdelete Newline
In addition, macro definitions can be more readable with the following definition:
with := Newline
With these definitions, you can write:

... Replace "pattern" with nothing ...
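When macro definitions like these are generated by a script, the quoting rules are easy to get wrong. The following Python sketch builds the Replace phrase shown in this example, doubling backslashes as the text requires and using the " " Cdelete idiom for an empty replacement. The helper names are hypothetical, and the sketch does not handle quotation marks embedded in the pattern or replacement.

```python
def macro_string(text: str) -> str:
    """Quote text for use in a macro, doubling each backslash."""
    return '"' + text.replace("\\", "\\\\") + '"'


def replace_phrase(pattern: str, replacement: str) -> str:
    """Build the macro phrase that answers both Replace prompts.

    An empty replacement uses the documented idiom: type a space, then
    delete it with Cdelete, so the reply to the prompt is empty.
    """
    if replacement == "":
        reply = '" " Cdelete'
    else:
        reply = macro_string(replacement)
    return f"Replace {macro_string(pattern)} Newline {reply} Newline"
```

For instance, replace_phrase("x", "") produces the phrase Replace "x" Newline " " Cdelete Newline, which matches the empty-replacement form given above.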
Resize
Key
Unassigned
Resize
Enters window-resizing mode. When in window-resizing mode, only the following actions are available:
Action                 Key
Shrink one row
Expand one row
Shrink one column
Expand one column
Accept the new size
Cancel the resize
Arg number Resize (ALT+A number Unassigned)
Resizes the window to number rows high.
Arg number Meta Resize (ALT+A number F9 Unassigned)
Resizes the window to number columns wide.
See
Movewindow
Restcur
Key
Unassigned
Restcur
Moves the cursor to the last position saved with the Savecur function (Unassigned, Set Anchor command, Edit menu). Restcur always clears the saved position.
Returns
True    Position restored.
False   No saved position to restore.
See
Selcur
Right
Keys
RIGHT, CTRL+D
Right
Moves the cursor one character to the right. If this action causes the cursor to move out of the window, PWB adjusts the window to the right according to the Hscroll switch.
Meta Right (F9 RIGHT)
Moves the cursor to the rightmost position in the window.
Returns
True    Cursor on text in the line.
False   Cursor past text on the line.
Example
In a macro, the return value of the Right function can be used to test if the cursor is on text in the line or past the end of the line. The following macro tests the return value to simulate the Endline function:
MyEndline := Begline :>loop Right +>loop
See
Begline, Endfile, Endline, Home, Left
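The control flow of the MyEndline macro can be traced with a small Python model. It is a simplification: columns are 0-based, Begline is reduced to moving to the first column, and Right is assumed to report whether the cursor is on text after it moves. The function names are hypothetical.

```python
from typing import Tuple


def right(line: str, col: int) -> Tuple[int, bool]:
    """Model of Right: move one column and report whether the cursor
    is still on text in the line."""
    new_col = col + 1
    return new_col, new_col < len(line)


def my_endline(line: str) -> int:
    """Model of: MyEndline := Begline :>loop Right +>loop"""
    col = 0  # Begline, simplified to the first column
    while True:  # :>loop
        col, on_text = right(line, col)
        if not on_text:  # +>loop jumps back only when Right returns true
            return col
```

For a nonempty line such as abc, the loop stops at column 3, just past the last character. On an empty line this model still moves one column, so it only approximates Endline.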
Saveall
Key
Unassigned
Saveall
Saves all modified disk files. Pseudofiles are not saved.
Returns
Saveall always returns true.
Savecur
Key
Unassigned
Menu
Edit menu, Set Anchor command
Savecur
Saves the cursor position (sets an anchor).
To restore the cursor to the saved position, use the Restcur function (Unassigned). To select text from the current position to the saved position, use the Select To Anchor command from the Edit menu or the Selcur function (Unassigned).
Returns
Savecur always returns true.
Sdelete
Key
Unassigned
Sdelete
Deletes the character at the cursor. Does not copy the character to the clipboard.
Arg Sdelete (ALT+A Unassigned)
Deletes text from the cursor to the end of the line, including the line break. The deleted text is copied to the clipboard.
Arg streamarg | boxarg | linearg Sdelete (ALT+A streamarg | boxarg | linearg Unassigned)
Deletes the selected stream of text from the starting point of the selection to the cursor and copies it to the clipboard. Always deletes a stream, regardless of the current selection mode.
Meta ... Sdelete (F9 ... Unassigned)
As above but discards the deleted text. The contents of the clipboard are unchanged.
Returns
Sdelete always returns true.
Searchall
Key
Unassigned
Searchall
Highlights all occurrences of the previously searched string or pattern. Moves the cursor to the first occurrence in the file.
Arg Searchall (ALT+A Unassigned)
Highlights all occurrences of the string specified by the text from the cursor to the first blank character.
Arg textarg Searchall (ALT+A textarg Unassigned)
Highlights all occurrences of textarg.
Arg Arg Searchall (ALT+A ALT+A Unassigned)
Highlights all occurrences of the regular expression defined by the characters from the cursor to the first blank character.
Arg streamarg Searchall (ALT+A streamarg Unassigned)
Highlights all occurrences of streamarg.
Arg Arg textarg Searchall (ALT+A ALT+A textarg Unassigned)
Highlights all occurrences of a regular expression defined by textarg.
Meta ... Searchall (F9 ... Unassigned)
As above but reverses the value of the Case switch for one search.
Returns
True    String or pattern found.
False   No matches found.
Selcur
Key: Unassigned
Menu: Edit menu, Select To Anchor command

Selcur
    Selects text from the cursor to the position saved using the Set Anchor command from the Edit menu or the Savecur function (Unassigned). If no position has been saved, Selcur selects text from the cursor to the beginning of the file.

Returns: Selcur always returns true.
Select
Keys: SHIFT+PGUP, SHIFT+CTRL+PGUP, SHIFT+PGDN, SHIFT+CTRL+PGDN, SHIFT+END, SHIFT+CTRL+END, SHIFT+HOME, SHIFT+CTRL+HOME, SHIFT+LEFT, SHIFT+CTRL+LEFT, SHIFT+UP, SHIFT+RIGHT, SHIFT+CTRL+RIGHT, SHIFT+DOWN

Select
    Causes a shifted key to take on the cursor-movement function associated with the unshifted key and begins or extends a selection. To see the key combinations currently assigned to this function, use the Key Assignments command from the Options menu.
Selmode
Key: Unassigned

Selmode
    Advances the selection mode between stream, line, and box modes, starting with the current mode.

Returns
    True    New mode is stream mode.
    False   New mode is box mode or line mode.

See: _pwbstreammode, _pwbboxmode, _pwblinemode
Selwindow
Key: F6

Selwindow
    Moves the focus to the next window.
Arg Selwindow (ALT+A F6)
    Moves the focus to the next unminimized window. Minimized windows (icons) are skipped.
Arg number Selwindow (ALT+A number F6)
    Moves the focus to the specified window.
Meta Selwindow (F9 F6)
    Moves the focus to the previous window.
Arg Meta Selwindow (ALT+A F9 F6)
    Moves the focus to the previous unminimized window.

Returns
    True    Focus moved to another window.
    False   No other windows are open.
Setfile
Key: F2

Setfile
    Switches to the first file in the active window's file history. If there are no files in the file history, PWB displays the message No alternate file. When the Autosave switch is set to yes, PWB saves the current file if it has been modified. Setfile does not honor the Newwindow switch. To open a new window when you open a file, use Openfile.
Arg Setfile (ALT+A F2)
    Switches to the filename that begins at the cursor and ends with the first blank character.
Arg textarg Setfile (ALT+A textarg F2)
    Switches to the file specified by textarg. If the file is not already open, PWB opens it. You can use environment-variable specifiers in the argument. If the argument is a drive or directory name, PWB changes the current drive or directory to the specified one and displays a message to confirm the change. See: Infodialog.
Arg !number Setfile (ALT+A !number F2)
    If the argument has the form !number, PWB switches to the file with that number in the file history. The number can be from 1 to 9, inclusive. See: _pwbfilen.
Arg wildcard Setfile (ALT+A wildcard F2)
    If the argument is a wildcard, PWB creates a pseudofile containing a list of files that match the pattern. To open a file from this list, position the cursor at the beginning of the name and execute Arg Openfile (ALT+A F10) or Arg Setfile (ALT+A F2).
Meta ... Setfile (F9 ... F2)
    As above but does not save the changes to the current file.
Arg Arg Setfile (ALT+A ALT+A F2)
    Saves the current file.
Arg Arg textarg Setfile (ALT+A ALT+A textarg F2)
    Saves the current file under the name specified by textarg.

Returns
    True    File opened successfully.
    False   No alternate file; the specified file does not exist and you did not wish to create it; or the current file needs to be saved and cannot be saved.

See: Newfile
Sethelp
Key: SHIFT+CTRL+S

The Sethelp function opens and closes single Help files. The Sethelp function can also display the current list of open Help files. Sethelp affects only the current PWB session.

Arg Sethelp (ALT+A SHIFT+CTRL+S)
    Opens the Help file specified by the filename at the cursor.
Arg streamarg | textarg Sethelp (ALT+A streamarg | textarg SHIFT+CTRL+S)
    Opens the Help file specified by the selected filename.
Meta ... Sethelp (F9 ALT+A SHIFT+CTRL+S)
    As above except the specified Help file is closed.
Arg ? Sethelp (ALT+A ? SHIFT+CTRL+S)
    Lists all currently open Help files.

Returns
    True    Help file opened or closed, or list of Help files displayed.
    False   The specified file could not be opened or closed, or the list of files could not be displayed.

See: Helpfiles
Setwindow
Key: CTRL+]

Setwindow
    Redisplays the contents of the active window.
Meta Setwindow (F9 CTRL+])
    Redisplays the current line.
Arg Setwindow (ALT+A CTRL+])
    Adjusts the window so that the cursor position becomes the home position (upper-left corner).

Returns: Setwindow always returns true.
Shell
Key: SHIFT+F9

Shell
    Runs an operating-system command shell. To return to PWB, type exit at the operating-system prompt.

    Warning: Do not start terminate-and-stay-resident (TSR) programs in a shell. This causes unpredictable results.
Arg Shell (ALT+A SHIFT+F9)
    Runs the text from the cursor to the end of the line as a command to the shell, and returns to PWB.
Arg boxarg | linearg Shell (ALT+A boxarg | linearg SHIFT+F9)
    Runs each selected line as a separate command to the shell, and returns to PWB.
Arg textarg Shell (ALT+A textarg SHIFT+F9)
    Runs textarg as a command to the shell, and returns to PWB.
Meta ... Shell (F9 ... SHIFT+F9)
    Runs a shell, ignoring the Autosave switch. Modified files are not saved to disk, but they are retained in PWB's virtual memory.

Returns
    True    Shell ran successfully.
    False   Invalid argument, or error starting the operating-system command processor.

See: Askrtn, Restart, Savescreen
Sinsert
Key: CTRL+J

Sinsert
    Inserts a space at the cursor.
Arg Sinsert (ALT+A CTRL+J)
    Inserts a line break at the cursor, splitting the line.
Arg streamarg | linearg | boxarg Sinsert (ALT+A streamarg | linearg | boxarg CTRL+J)
    Inserts a stream of blanks between the starting point of the selection and the cursor. The insertion is always a stream, regardless of the current selection mode.

Returns
    True    Spaces or line break inserted.
    False   Insertion would make a line too long.

Example
    The following macro inserts a stream of spaces up to the next tab stop, regardless of the current selection mode:

    InsertTab := Arg Tab Sinsert

See: Insert, Linsert
Tab
Key: TAB

Tab
    Moves the cursor to the next tab stop. If there are no tab stops to the right of the cursor, the cursor does not move. Tab stops are defined by the Tabstops switch.

Returns
    True    Cursor moved.
    False   Cursor not moved.

Update
    In PWB 1.x, tab stops appear at fixed intervals. In PWB 2.00, tab stops can be at variable or fixed intervals.

See: Backtab
Tell
Key: CTRL+T

Tell
    Displays the message "Press a key to tell about" and waits for a keystroke. After you press a key or combination of keys, Tell brings up the Tell dialog box showing the name of the key and its assigned function in TOOLS.INI key-assignment format. The key-assignment format is:

    function:key

    If the key is not assigned a function, Tell displays unassigned for the function name. See: Unassigned.

    If you press a combination of keys but Tell still shows the "Press a key" prompt (when you press SCROLL LOCK, for example), PWB is unable to recognize that combination of keys and you cannot use it as a key assignment.
Arg Tell (ALT+A CTRL+T)
    Prompts for a key, then displays the name of the function or macro assigned to the key in one of these formats:

    function:key
    macroname:=definition
Arg textarg Tell (ALT+A textarg CTRL+T)
    Displays the definition of the macro named by textarg. If you specify a PWB function, Tell displays:

    function:function
Meta ... Tell (F9 ... CTRL+T)
    As above except Tell types the result into the current file rather than displaying it in a dialog box. This is how to discover the definition of any macro, including PWB macros.

Returns
    True    Assignment displayed or typed.
    False   No assignment for the key or the specified name.

Update
    In PWB 1.x, the prompt and results appear on the status bar; in PWB 2.00, the prompt and results appear in dialog boxes.

Remarks
    Meta Tell is a convenient and reliable way of writing a key assignment when you are configuring PWB. For example, if you want to execute the Curdate function (type today's date) when you press the CTRL, SHIFT, and D keys simultaneously, perform the following steps:

    1. Go to an empty line in the [PWB] section of TOOLS.INI.
    2. Execute Meta Tell (F9 CTRL+T). Tell displays the message: Press a key to tell about.
    3. Press the D, SHIFT, and CTRL keys simultaneously. If you have not already assigned a function to this combination, Tell types:

       unassigned:Shift+Ctrl+D

    4. Select the word unassigned and type curdate.
    5. If you want the assignment to take effect immediately, move the cursor to the line you've just entered and execute the Assign function (ALT+=).

    You can use Meta Arg textarg Tell to recover the definition of a predefined PWB macro or a macro that you have not saved or entered into a file.

See: _pwbusern, Assign, Record
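As a sketch of where the Tell steps above end up, the [PWB] section of TOOLS.INI would contain a line like the one below once step 4 is done. The section tag and the assignment text come from the steps themselves; the comment line assumes that TOOLS.INI treats a line beginning with a semicolon as a comment.

```ini
[PWB]
; Line typed by Meta Tell, after replacing "unassigned" with "curdate"
curdate:Shift+Ctrl+D
```

Executing Assign (ALT+=) on the assignment line, as in step 5, would apply it without restarting PWB.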
Unassigned
Keys: Assigned to all available keys.

Unassigned
    Displays a message for keys that do not have a function assignment. All unassigned keys are actually assigned the Unassigned function. Thus, to remove a function assignment for a key, assign the Unassigned function to the key. The Unassigned function is not useful in macros.

Returns: The Unassigned function always returns false.

See: Assign, Tell
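To illustrate removing an assignment, a line in the function:key format reported by Tell could look like the following. This is only a sketch: SHIFT+CTRL+D is an arbitrary example key, and the semicolon comment line is an assumption about TOOLS.INI comment syntax.

```ini
[PWB]
; Return SHIFT+CTRL+D (example key) to the unassigned state
unassigned:Shift+Ctrl+D
```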
Undo
Keys: ALT+BKSP, SHIFT+CTRL+BKSP

Undo
    Reverses the last editing operation. The maximum number of times this can be performed for each file is set by the Undocount switch.
Meta Undo (F9 ALT+BKSP)
    Performs the operation previously reversed with Undo. This action is often called redo.

Returns
    True    Operation undone or redone.
    False   Nothing to undo or redo.

See: _pwbundo, _pwbredo, Repeat
Up
Keys: UP, CTRL+E

Up
    Moves the cursor up one line. If a selection has been started, it is extended by one line. If this movement results in the cursor moving out of the window, the window is adjusted upward as specified by the Vscroll switch.
Meta Up (F9 UP)
    Moves the cursor to the top of the window without changing the column position.

Returns
    True    Cursor moved.
    False   Cursor not moved; the cursor is already at the destination.

See: Down
Usercmd
Key: Unassigned

The Usercmd function executes a custom command added to the Run menu by using the Customize command from the Run menu or by setting the User switch.

Arg number Usercmd (ALT+A number Unassigned)
    Executes the given custom Run menu command. The number can be in the range 1 to 9.

Returns
    True    Command exists.
    False   Command does not exist, or invalid argument.

See: _pwbusern, Assign, Record
Window
Key: Unassigned

Window
    Switch to the next window.

    Returns
        True    Switched to next window.
        False   No next window to switch to: zero or one window open.
Arg [[Arg]] Window (ALT+A [[ALT+A]] Unassigned)
    Open a new window.

    Returns
        True    Opened a new window.
        False   Window not opened.
Meta Window (F9 Unassigned)
    Close the active window.

    Returns
        True    Window closed.
        False   No open window to close.
Meta Arg Window (ALT+A F9 Unassigned)
    Switch to the previous window.

    Returns
        True    Switched to previous window.
        False   No previous window to switch to: zero or one window open.

Update
    In PWB 1.x, Arg Window and Arg Arg Window split the window at the cursor. In PWB 2.00, these forms of Window open a new window.

See: Selwindow, Setwindow
Winstyle
Key: CTRL+F6

Winstyle
    Advances through the following series of window styles, starting from the current style:

    Horizontal Scroll Bar    Vertical Scroll Bar
    No                       No
    No                       Yes
    Yes                      No
    Yes                      Yes

    When the horizontal scroll bar is not shown, a maximized window does not show its bottom border. Similarly, when the vertical scroll bar is not shown, a maximized window does not show its left and right borders. PWB always displays the title bar. To get the clean-screen look, maximize the window and advance the window style until the borders disappear.

Default
    Set the default window style with the Defwinstyle switch.

Returns
    True    Changed window style.
    False   No windows open.

Update
    The no-border state in PWB 1.x is not available in PWB 2.00. In PWB 2.00, when a window is maximized and no scroll bars are present, PWB displays the window without borders.

See: Maximize
Predefined PWB Macros

PWB predefines a number of macros, most of which correspond to a command in the PWB menus. You can define a shortcut key for a menu command by assigning the key to the corresponding macro. Note that some menu commands such as the Open command from the File menu do not correspond to a macro, and some macros do not correspond to a menu command.
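As a sketch of such a shortcut, a key assignment in the [PWB] section of TOOLS.INI could name the macro in place of a function. The key SHIFT+CTRL+B is a hypothetical choice made for illustration, and the semicolon comment line is an assumption about TOOLS.INI comment syntax.

```ini
[PWB]
; Hypothetical shortcut: run the Edit menu's Box Mode command with SHIFT+CTRL+B
_pwbboxmode:Shift+Ctrl+B
```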
Table 7.12 PWB Macros

Macro                    Description                                   Key
Curfile                  Current file's full path                      Unassigned
Curfileext               Current file's extension                      Unassigned
Curfilenam               Current file's name                           Unassigned
_pwbarrange              Arrange command, Window menu                  ALT+F5
_pwbboxmode              Box Mode command, Edit menu                   Unassigned
_pwbbuild                Build command, Project menu                   Unassigned
_pwbcancelbuild          Cancel Build command, Project menu            Unassigned
_pwbcancelprint          Cancel Print command, File menu               Unassigned
_pwbcancelsearch         Cancel Search command, Search menu            Unassigned
_pwbcascade              Cascade command, Window menu                  F5
_pwbclear                Delete command, Edit menu                     DEL
_pwbclose                Close command, Window menu                    CTRL+F4
_pwbcloseall             Close All command, Window menu                Unassigned
_pwbclosefile            Close command, File menu                      Unassigned
_pwbcloseproject         Close command, Project menu                   Unassigned
_pwbcompile              Compile command, Project menu                 Unassigned
_pwbfilen                n file, File menu                             Unassigned
_pwbgotomatch            Goto Match command, Search menu               Unassigned
_pwbhelp_again           Next command, Help menu                       Unassigned
_pwbhelp_back            Previous Help topic                           ALT+F1
_pwbhelp_contents        Contents command, Help menu                   SHIFT+F1
_pwbhelp_context         Topic command, Help menu                      F1
_pwbhelp_general         Help on Help command, Help menu               Unassigned
_pwbhelp_index           Index command, Help menu                      Unassigned
_pwbhelpnl               Display the message:                          F1 when Help extension not loaded
                         Online Help Not Loaded
_pwbhelp_searchres       Search Results command, Help menu             Unassigned
_pwblinemode             Line Mode command, Edit menu                  Unassigned
_pwblogsearch            Log command, Search menu                      Unassigned
_pwbmaximize             Maximize command, Window menu                 CTRL+F10
_pwbminimize             Minimize command, Window menu                 CTRL+F9
_pwbmove                 Move command, Window menu                     CTRL+F7
_pwbnewfile              New command, File menu                        Unassigned
_pwbnewwindow            New command, Window menu                      Unassigned
_pwbnextfile             Next command, File menu                       Unassigned
_pwbnextlogmatch         Next Match command, Search menu               SHIFT+CTRL+F3
_pwbnextmatch            Next Match command, Search menu               Unassigned
_pwbnextmsg              Next Error command, Project menu              SHIFT+F3
_pwbpreviouslogmatch     Previous Match command, Search menu           SHIFT+CTRL+F4
_pwbpreviousmatch        Previous Match command, Search menu           Unassigned
_pwbprevmsg              Previous Error command, Project menu          SHIFT+F4
_pwbprevwindow           Move to previous window                       SHIFT+F6
_pwbquit                 Exit command, File menu                       ALT+F4
_pwbrebuild              Rebuild All command, Project menu             Unassigned
_pwbrecord               Record command, Edit menu                     Unassigned
_pwbredo                 Redo command, Edit menu                       Unassigned
_pwbrepeat               Repeat command, Edit menu                     Unassigned
_pwbresize               Resize command, Window menu                   CTRL+F8
_pwbrestore              Restore command, Window menu                  CTRL+F5
_pwbsaveall              Save All command, File menu                   Unassigned
_pwbsavefile             Save command, File menu                       SHIFT+F2
_pwbsetmsg               Goto Error command, Project menu
_pwbshell                DOS Shell command, File menu
_pwbstreammode           Stream Mode command, Edit menu
_pwbtile                 Tile command, Window menu                     SHIFT+F5
_pwbundo                 Undo command, Edit menu                       Unassigned
_pwbusern                command n, Run menu                           ALT+Fn
_pwbviewbuildresults     View build results button
_pwbviewsearchresults    View search results button
_pwbwindown              n file, Window menu                           ALT+n
PWB continually redefines the following macros to reflect the current file's name:

Macro         Description
Curfile       Full path
Curfileext    File extension
Curfilenam    File base name

PWB uses the following special-purpose macros:

Macro        Description
Autostart    Executed on startup while reading TOOLS.INI
Mgreplist    List of files for logged searches, multifile replace, Mgrep, and Mreplace
Playback     Default name of recorded macros
Restart      (Obsolete)

By default, these macros are undefined.
Autostart
Key: Unassigned

The special PWB macro Autostart is executed after PWB finishes all initialization at startup. If used, it must be defined in the [PWB] section of TOOLS.INI.

Definition
    By default, Autostart is not defined.
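A minimal sketch of an Autostart definition follows. It reuses the cancel arg "text" message pattern that appears in the predefined _pwbhelpnl macro; the message text is hypothetical, and the semicolon comment line is an assumption about TOOLS.INI comment syntax.

```ini
[PWB]
; Sketch only: display a message once PWB has finished initializing
autostart := cancel arg "PWB initialized" message
```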
Curfile
Key: Unassigned

The Curfile macro types the full path of the current file. This macro is redefined each time you switch to a new file.

Definition
    curfile := "pathname"

Example
    The following macro copies the full path of the current file to the clipboard for later use:

    Path2clip := Arg Curfile Copy

See: Arg, Copy, Curdate, Curday, Curfilenam, Curfileext, Curtime
Curfileext
Key: Unassigned

The Curfileext macro types the filename extension of the current file. This macro is redefined each time you switch to a new file.

Definition
    curfileext := "extension"

Example
    The following macro copies the base name plus the extension of the current file to the clipboard for later use:

    Filename2clip := Arg Curfilenam Curfileext Copy

See: Arg, Copy, Curdate, Curday, Curfile, Curfilenam, Curtime
Curfilenam
Key: Unassigned

The Curfilenam macro types the base name of the current file. This macro is redefined each time you switch to a new file.

Definition
    curfilenam := "basename"

Example
    The following macro copies the base name of the current file to the clipboard for later use:

    Name2clip := Arg Curfilenam Copy

See: Arg, Copy, Curdate, Curday, Curfile, Curfileext, Curtime
Mgreplist
Key: Unassigned

The special PWB macro Mgreplist is used by the Find and Replace commands on the Search menu, Mgrep, Mreplace, and Mreplaceall to specify the list of files to search.

When you create a list of files to search using the Files button in either the Find or Replace dialog box, PWB redefines the Mgreplist macro with the specified list of files. To see the current list of files, choose the Files button in the Replace dialog box. You can change the list in this dialog box, and either choose OK to perform the find-and-replace operation, or choose Cancel to cancel the replace and accept the changes to Mgreplist.

You can also insert the definition of Mgreplist into the current file by using the phrase Arg Meta Mgreplist Tell (ALT+A F9 Mgreplist CTRL+T). You can edit the macro, then redefine it by using the Assign function (ALT+=).

Definition
    Mgreplist:= "list"

    list
        Space-separated list of filenames

    The filenames can use the operating-system wildcards (* and ?) and can use environment-variable specifiers. Note that backslashes (\) must be doubled in the macro string.

See: Assign, Tell, Mgrep, Mreplace, Mreplaceall
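As an illustration of the doubled backslashes, a hand-edited Mgreplist definition might look like the sketch below. The directory and file patterns are hypothetical, and the semicolon comment line is an assumption about TOOLS.INI comment syntax.

```ini
[PWB]
; Hypothetical file list; each backslash is doubled inside the macro string
mgreplist := "C:\\SRC\\*.ASM C:\\SRC\\INC\\*.INC"
```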
Restart
Key: Unassigned

Update
    In PWB 1.x, the special PWB macro Restart is executed whenever PWB returns from a shell, build, or other external operation. In PWB 2.00, the Restart macro is never executed automatically and has no special meaning; it is an ordinary macro.
_pwbarrange
Key: ALT+F5
Menu: Window menu, Arrange command

The _pwbarrange macro arranges all unminimized windows on the desktop. The following illustration shows a typical desktop after execution of _pwbarrange:

Figure 7.1 Arranged Windows

Definition
    _pwbarrange:=cancel arg arrangewindow <

    Cancel
        Establishes a uniform ground state by canceling any selection or argument.
    Arg Arrangewindow
        Arranges all unminimized windows on the desktop.
    <
        Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See: Arrangewindow
_pwbboxmode
Key: Unassigned
Menu: Edit menu, Box Mode command

The _pwbboxmode macro sets the selection mode to box selection mode.

Definition
    _pwbboxmode := :>more selmode ->more selmode

    :>more
        Defines the label more.
    Selmode
        Advances to the next selection mode.
    ->more
        Branches to the label more if Selmode returns false.

    The Selmode function advances the selection mode from box, to stream, to line. Selmode returns true when the mode is stream mode. The macro executes the Selmode function until it returns true (sets stream mode), then advances the selection mode once to set box selection mode.

See: Enterselmode, Selmode
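The loop portion of this definition can be sketched on its own. The macro below keeps executing Selmode until it returns true, which by the description above leaves the editor in stream mode. The name SetStream is hypothetical (the predefined _pwbstreammode macro may be defined differently), and the semicolon comment line is an assumption about TOOLS.INI comment syntax.

```ini
[PWB]
; Hypothetical macro: repeat Selmode until it returns true (stream mode)
SetStream := :>more selmode ->more
```

Appending one or two further selmode calls after the loop gives the box-mode and line-mode variants used by the predefined macros.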
_pwbbuild
Key: Unassigned
Menu: Project menu, Build command

The _pwbbuild macro builds the all target of the current PWB project. The all pseudotarget in a PWB project lists all the targets in the project.

For non-PWB projects, _pwbbuild builds the targets that were last specified by using the Build Target command from the Project menu. PWB redefines _pwbbuild each time you use Build Target. If no target has been specified, NMAKE builds the first target listed in the project makefile.

Definition
    _pwbbuild := cancel arg "all" compile <

    Cancel
        Establishes a uniform ground state by canceling any selection or argument.
    Arg "all" Compile
        Builds the all pseudotarget in the current project.
    <
        Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See: Arg, Cancel, Compile
_pwbcancelbuild
Key: Unassigned
Menu: Project menu, Cancel Build command

The _pwbcancelbuild macro terminates the current background build or compile and flushes any queued build operations.

Definition
    _pwbcancelbuild := cancel arg meta compile

    Cancel
        Establishes a uniform ground state by canceling any selection or argument.
    Arg Meta Compile
        Terminates the background build process.

See: Arg, Cancel, Compile, Meta
_pwbcancelprint
Key: Unassigned

The _pwbcancelprint macro terminates all background print operations.

Definition
    _pwbcancelprint := cancel arg meta print

    Cancel
        Establishes a uniform ground state by canceling any selection or argument.
    Arg Meta Print
        Terminates background print operations.

See: Arg, Cancel, Meta, Print
_pwbcancelsearch
Key: Unassigned
Menu: Search menu, Cancel Search command

The _pwbcancelsearch macro cancels the current background search. PWB performs logged searches in the background under multithreaded environments.

Definition
    _pwbcancelsearch := cancel cancelsearch <

    Cancel
        Establishes a uniform ground state by canceling any selection or argument.
    Cancelsearch
        Cancels the current background search.
    <
        Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See: Cancel, Cancelsearch, Logsearch
_pwbcascade
Key: F5
Menu: Window menu, Cascade command

The _pwbcascade macro arranges all unminimized windows in cascaded fashion so that all of their titles are visible. Up to 16 unminimized windows can be cascaded.

Definition
    _pwbcascade := cancel arrangewindow <

    Cancel
        Establishes a uniform ground state by canceling any selection or argument.
    Arrangewindow
        Cascades all unminimized windows.
    <
        Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See: Arrangewindow, Cancel
_pwbclear
Key: DEL
Menu: Edit menu, Delete command

The _pwbclear macro removes the selected text from the file. If there is no selection, PWB removes the current line. The selection or line is not copied to the clipboard. It can be recovered only by using the Undo command from the Edit menu or Undo (ALT+BKSP).

Definition
    _pwbclear := meta delete

    Meta Delete
        Removes the selection or the current line from the file without modifying the clipboard.

See: Delete, Meta
_pwbcloseall
Key: Unassigned
Menu: Window menu, Close All command

The _pwbcloseall macro closes all open windows.

Definition
    _pwbcloseall := cancel arg arg meta window <

    Cancel
        Establishes a uniform ground state by canceling any selection or argument.
    Arg Arg Meta Window
        Closes all windows.
    <
        Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See: Arg, Cancel, Meta, Window
_pwbclosefile
Key: Unassigned
Menu: File menu, Close command

The _pwbclosefile macro closes the current file. If no files remain in the window's file history, the window is closed.

Definition
    _pwbclosefile := cancel closefile <

    Cancel
        Establishes a uniform ground state by canceling any selection or argument.
    Closefile
        Closes the current file.
    <
        Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See: Cancel, Closefile
_pwbcloseproject
Key: Unassigned
Menu: Project menu, Close Project command

The _pwbcloseproject macro closes the current project.

Definition
    _pwbcloseproject := cancel arg arg project <

    Cancel
        Establishes a uniform ground state by canceling any selection or argument.
    Arg Arg Project
        Closes the current project.
    <
        Restores the function's prompt (if any). By default, function prompts are suppressed within a macro.

See: Arg, Cancel, Project
_pwbcompile
Key: Unassigned
Menu: Project menu, Compile File command

The _pwbcompile macro compiles the current file.

Definition
    _pwbcompile := cancel arg compile <

    Cancel
        Establishes a uniform ground state by canceling any selection or argument.
    Arg Compile
        Compiles the current file.
    <
        Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See: Arg, Cancel, Compile
_pwbgotomatch
Key: Unassigned
Menu: Search menu, Goto Match command

The _pwbgotomatch macro sets the match listed at the current location in the Search Results pseudofile as the current match and moves the cursor to the location specified by that match.

Definition
    _pwbgotomatch := cancel arg arg nextsearch <

    Cancel
        Establishes a uniform ground state by canceling any selection or argument.
    Arg Arg Nextsearch
        Goes to the current match.
    <
        Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See: Arg, Cancel, Nextsearch
_pwbhelpnl
The _pwbhelpnl macro displays a message indicating the Help extension is not loaded. The Help keys are assigned this macro until the Help extension is loaded.

Definition
    _pwbhelpnl := cancel arg "OnLine Help Not Loaded" message

    Cancel
        Establishes a uniform ground state by canceling any selection or argument.
    Arg "OnLine Help Not Loaded" Message
        Displays the message on the status bar.

See: Arg, Cancel, Load, Message
_pwbhelp_again
Key: Unassigned
Menu: Help menu, Next command

The _pwbhelp_again macro displays the next occurrence of the last topic for which you requested Help. If no other occurrences of the topic are defined in the open files, PWB redisplays the current topic. The topic that PWB looks up when you use this command is displayed after the Next command on the Help menu, as follows:

    Next: topic key

    topic
        Topic string used for the command.
    key
        Current key assignment for _pwbhelp_again (if any).

Definition
    _pwbhelp_again:=cancel arg pwbhelp.pwbhelpnext <

    Cancel
        Establishes a uniform ground state by canceling any selection or argument.
    Arg
        Sets the Arg prefix for the Pwbhelpnext function.
    Pwbhelp.
        Specifies that the function is the PWBHELP extension function.
    Pwbhelpnext
        Displays the next occurrence of the previously requested topic.
    <
        Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See: Pwbhelpnext
_pwbhelp_back
Key: ALT+F1

The _pwbhelp_back macro displays the previously viewed Help topic. Up to 20 topics are kept in the Help backtrace list.

Definition
    _pwbhelp_back:=cancel meta pwbhelp.pwbhelpnext <

    Cancel
        Establishes a uniform ground state by canceling any selection or argument.
    Meta
        Sets the meta prefix for the function.
    Pwbhelp.
        Specifies that the function is the PWBHELP extension function.
    Pwbhelpnext
        Displays the previously viewed Help topic.
    <
        Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See: Pwbhelpnext
_pwbhelp_contents
Key: SHIFT+F1
Menu: Help menu, Contents command

The _pwbhelp_contents macro opens the Help window and displays the top-level contents of the Help system. Within the Help system, most Help topics have a Contents button at the top of the window. This button also takes you to the top-level contents.

Definition
    _pwbhelp_contents:=cancel arg "advisor.hlp!h.contents" pwbhelp.pwbhelp <

    Cancel
        Establishes a uniform ground state by canceling any selection or argument.
    Arg "advisor.hlp!h.contents"
        Defines a text argument with the topic name for the general table of contents.
    Pwbhelp.
        Specifies that the function is the PWBHELP extension function.
    Pwbhelp
        Looks up the topic h.contents in the ADVISOR.HLP Help file.
    <
        Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See: Pwbhelp
_pwbhelp_context
Key: F1
Menu: Help menu, Topic command

The _pwbhelp_context macro looks up Help on the topic at the cursor, the current selection, or the specified text argument.

Definition
    _pwbhelp_context:=arg pwbhelp.pwbhelp <

    Arg
        Sets the Arg prefix for the Pwbhelp function.
    Pwbhelp.
        Specifies that the function is the PWBHELP extension function.
    Pwbhelp
        Displays Help on the topic at the cursor. When text is selected, displays Help on the selected text. When you have entered an argument in the Text Argument dialog box, displays Help on the topic specified by the text argument.
    <
        Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See: Pwbhelp
_pwbhelp_general
Key: Unassigned
Menu: Help menu, Help on Help command

The _pwbhelp_general macro opens the Help window and displays information about using the Help system.

Definition
    _pwbhelp_general:=cancel arg "advisor.hlp!h.default" pwbhelp.pwbhelp <

    Cancel
        Establishes a uniform ground state by canceling any selection or argument.
    Arg "advisor.hlp!h.default"
        Defines a text argument with the topic name for default Help.
    Pwbhelp.
        Specifies that the function is the PWBHELP extension function.
    Pwbhelp
        Looks up the topic h.default in the ADVISOR.HLP Help file.
    <
        Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See: Pwbhelp
_pwbhelp_index
Key: Unassigned
Menu: Help menu, Index command

The _pwbhelp_index macro opens the Help window and displays the top-level table of indexes in the Help system. Within the Help system, most Help topics have an Index button at the top of the window. This button also takes you to the top-level table of indexes.

Definition
    _pwbhelp_index:=cancel arg "advisor.hlp!h.index" pwbhelp.pwbhelp <

    Cancel
        Establishes a uniform ground state by canceling any selection or argument.
    Arg "advisor.hlp!h.index"
        Defines a text argument with the topic name for the Help index.
    Pwbhelp.
        Specifies that the function is the PWBHELP extension function.
    Pwbhelp
        Looks up the topic h.index in the ADVISOR.HLP Help file.
    <
        Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See: Pwbhelp
_pwbhelp_searchres
Key: Unassigned
Menu: Help menu, Search Results command

The _pwbhelp_searchres macro opens the Help window and displays the list of matches found during the last global Help search.

Definition
    _pwbhelp_searchres:=cancel pwbhelp.pwbhelpsearch <

    Cancel
        Establishes a uniform ground state by canceling any selection or argument.
    Pwbhelp.
        Specifies that the function is the PWBHELP extension function.
    Pwbhelpsearch
        Opens the Help window and displays the results of the last global Help search.
    <
        Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See: Pwbhelpsearch
_pwblinemode
Key: Unassigned
Menu: Edit menu, Line Mode command

The _pwblinemode macro sets the selection mode to line selection mode.

Definition
    _pwblinemode := :>more selmode ->more selmode selmode

    :>more
        Defines the label more.
    Selmode
        Advances to the next selection mode.
    ->more
        Branches to the label more if Selmode returns false.

    The Selmode function advances the selection mode from box, to stream, to line. Selmode returns true when the mode is stream mode. The macro executes the Selmode function until it returns true (sets stream mode), then advances the selection mode twice to set line selection mode.

See: Enterselmode, Selmode
_pwblogsearch
Key Menu Unassigned Search menu, Log command The _pwblogsearch macro toggles search logging on and off. When search logging is turned on, PWB displays a bullet next to the Log command on the Search menu. The Next Match command executes the _pwbnextlogmatch macro, and the Previous Match command executes the _pwbpreviouslogmatch macro. When search logging is turned off, no bullet appears and the Next Match and Previous Match commands execute _pwbnextmatch and _pwbpreviousmatch. Definition _pwblogsearch := cancel logsearch < Cancel Establishes a uniform ground state by canceling any selection or argument. Logsearch Toggles the logging of search results on and off. < Restores the functions prompt (if any). By default, function prompts are suppressed while a macro is running. See Cancel, Logsearch
_pwbmaximize
Key: CTRL+F10
Menu: Window menu, Maximize command

The _pwbmaximize macro enlarges the active window to its largest possible size, showing only the window, the menu bar, and the status bar on the PWB screen.

Definition
_pwbmaximize := cancel maximize <

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Maximize
    Enlarges the active window to full size.
<
    Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See
Cancel, Minimize
_pwbminimize
Key: CTRL+F9
Menu: Window menu, Minimize command

The _pwbminimize macro minimizes the active window, reducing the window to an icon. To restore a window to its original size, double-click in the box or use the Restore command (CTRL+F5) on the Window menu.

Definition
_pwbminimize := cancel minimize <

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Minimize
    Shrinks the window to an icon.
<
    Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See
Cancel, Maximize, Minimize
_pwbmove
Key: CTRL+F7
Menu: Window menu, Move command

The _pwbmove macro starts window-moving mode for the active window. In window-moving mode, you can only do the following:

Action                     Key
Move up one row
Move down one row
Move left one column
Move right one column
Accept the new position
Cancel the move

To move the window in larger increments, you can use a numeric argument with the Movewindow function.

Definition
_pwbmove := cancel movewindow <

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Movewindow
    Starts window-moving mode.
<
    Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See
Arrangewindow, Cancel, Maximize, Minimize, Resize
_pwbnewfile
Key: Unassigned
Menu: File menu, New command

The _pwbnewfile macro creates a new pseudofile. New pseudofiles are given a unique name of the form:

<Untitled.nnn>Untitled.nnn

where nnn is a three-digit number starting with 001 at the beginning of each PWB session. The window title shows Untitled.nnn. The file may be referred to by the name <Untitled.nnn>.

When the Newwindow switch is set to yes, or the active window is a PWB window, a new window is opened for the file. Otherwise, the file is opened in the active window, and the previous file is placed in the window's file history.

Definition
_pwbnewfile := cancel newfile <

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Newfile
    Creates a new untitled pseudofile.
<
    Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See
Cancel, Setfile
_pwbnewwindow
Key: Unassigned
Menu: Window menu, New command

The _pwbnewwindow macro opens a new window, which shows the current file. The new window has the same complete file history as the original window.

Definition
_pwbnewwindow := cancel arg window

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Arg Window
    Opens a new window on the current file.

See
Arg, Cancel, Window
_pwbnextfile
Key: Unassigned
Menu: File menu, Next command

The _pwbnextfile macro moves to the next file in the list of files specified on the PWB command line. If no more files remain in the list, this macro ends the PWB session.

When the Newwindow switch is set to yes, or the active window is a PWB window, a new window is opened for the file. Otherwise, the file is opened in the active window, and the previous file is placed in the window's file history.

Definition
_pwbnextfile := cancel exit <

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Exit
    Moves to the next file specified on the command line.
<
    Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See
Exit, Askexit, Cancel, PWB Command Line
_pwbnextlogmatch
Key: SHIFT+CTRL+F3
Menu: Search menu, Next Match command

The _pwbnextlogmatch macro advances the cursor to the next match listed in the Search Results pseudofile.

The Next Match command on the Search menu executes this macro when search logging is turned on. When search logging is turned off, Next Match executes the _pwbnextmatch macro.

Definition
_pwbnextlogmatch := cancel nextsearch <

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Nextsearch
    Advances to the next match in Search Results.
<
    Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See
Cancel, Nextsearch
_pwbnextmatch
Key: Unassigned
Menu: Search menu, Next Match command

The _pwbnextmatch macro searches forward in the file using the last search pattern and options. The search options are Match Case, Wrap Around, and Regular Expression.

The Next Match command on the Search menu executes this macro when search logging is turned off. When search logging is turned on, Next Match executes the _pwbnextlogmatch macro.

Definition
_pwbnextmatch := cancel psearch <

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Psearch
    Searches forward in the file for the next occurrence of the last search string or pattern.
<
    Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See
Cancel, Psearch
_pwbnextmsg
Key: SHIFT+F3
Menu: Project menu, Next Error command

The _pwbnextmsg macro moves the cursor to the next message in Build Results.

Definition
_pwbnextmsg := cancel nextmsg <

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Nextmsg
    Goes to the next message in Build Results.
<
    Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See
Cancel, Nextmsg
_pwbpreviouslogmatch
Key: SHIFT+CTRL+F4
Menu: Search menu, Previous Match command

The _pwbpreviouslogmatch macro moves the cursor to the previous match listed in the Search Results pseudofile.

The Previous Match command on the Search menu executes this macro when search logging is turned on. When search logging is turned off, Previous Match executes the _pwbpreviousmatch macro.

Definition
_pwbpreviouslogmatch := cancel arg "-1" nextsearch <

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Arg "-1" Nextsearch
    Moves to the previous match listed in Search Results.
<
    Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See
Arg, Cancel, Nextsearch
_pwbpreviousmatch
Key: Unassigned
Menu: Search menu, Previous Match command

The _pwbpreviousmatch macro searches backward in the file, using the last search pattern and options. The search options are Match Case, Wrap Around, and Regular Expression.

The Previous Match command on the Search menu executes this macro when search logging is turned off. When search logging is turned on, Previous Match executes the _pwbpreviouslogmatch macro.

Definition
_pwbpreviousmatch := cancel msearch <

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Msearch
    Searches backward in the file for the last search string or pattern.
<
    Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See
Cancel, Msearch
_pwbprevmsg
Key: SHIFT+F4
Menu: Project menu, Previous Error command

The _pwbprevmsg macro moves the cursor to the previous message in the Build Results pseudofile.

Definition
_pwbprevmsg := cancel arg "-1" nextmsg <

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Arg "-1" Nextmsg
    Goes to the previous message in Build Results.
<
    Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See
Arg, Cancel, Nextmsg
_pwbprevwindow
Key: SHIFT+F6

The _pwbprevwindow macro moves the focus to the previous window. That is, PWB sets the previously active window as the active window. This action moves among the open windows in the reverse order of Selwindow (F6).

Definition
_pwbprevwindow := cancel meta selwindow <

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Meta Selwindow
    Moves the focus to the previous window.
<
    Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See
Cancel, Meta, Selwindow
_pwbquit
Key: ALT+F4
Menu: File menu, Exit command

The _pwbquit macro leaves PWB immediately. Any specified files on the PWB command line that have not been opened are ignored.

Definition
_pwbquit := cancel arg exit <

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Arg Exit
    Leaves PWB.
<
    Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See
Arg, Askexit, Cancel, Exit, Savescreen
_pwbrebuild
Key: Unassigned
Menu: Project menu, Rebuild All command

The _pwbrebuild macro forces a rebuild of everything in the current project.

For non-PWB projects, _pwbrebuild rebuilds the targets that were last specified by using the Build Target command on the Project menu. PWB redefines _pwbrebuild each time you use Build Target. If no target has been specified, NMAKE rebuilds the first target listed in the project makefile.

Definition
_pwbrebuild := cancel arg meta "all" compile <

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Arg Meta "all" Compile
    Rebuilds the all pseudotarget.
<
    Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See
Arg, Cancel, Compile, Meta
_pwbrecord
Key: Unassigned
Menu: Edit menu, Record On command

The _pwbrecord macro toggles macro recording on and off. If you have not set the recorded macro name and key, PWB displays the Set Macro Record dialog box so you can set them. Execute _pwbrecord again to start recording.

Definition
_pwbrecord := cancel record <

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Record
    Toggles macro recording on and off.
<
    Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See
Cancel, Record
_pwbredo
Key: Unassigned
Menu: Edit menu, Redo command

The _pwbredo macro restores the last modification that was reversed using Edit Undo or Undo (ALT+BKSP).

Definition
_pwbredo := cancel meta undo <

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Meta Undo
    Restores the last undone modification.
<
    Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See
Cancel, Meta, Undo
_pwbrepeat
Key: Unassigned
Menu: Edit menu, Repeat command

The _pwbrepeat macro repeats the last editing operation once.

Definition
_pwbrepeat := cancel repeat <

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Repeat
    Repeats the last operation one time.
<
    Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See
Cancel, Repeat
_pwbresize
Key: CTRL+F8
Menu: Window menu, Size command

The _pwbresize macro starts window-sizing mode for the active window. In window-sizing mode, only the following actions are available:

Action                     Key
Shrink one row
Expand one row
Shrink one column
Expand one column
Accept the new size
Cancel the resize

To size the window in larger increments, you can use the numeric forms of the Resize function.

Definition
_pwbresize := cancel resize <

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Resize
    Starts window-sizing mode.
<
    Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See
Arrangewindow, Cancel, Maximize, Minimize, Movewindow
_pwbrestore
Key: CTRL+F5
Menu: Window menu, Restore command

The _pwbrestore macro restores the active window to its size before it was maximized or minimized.

Definition
_pwbrestore := cancel meta maximize

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Meta Maximize
    Restores the window to its previous size.

See
Cancel, Maximize, Meta, Minimize
_pwbsaveall
Key: Unassigned
Menu: File menu, Save All command

The _pwbsaveall macro saves all modified disk files. Modified pseudofiles are not saved.

Definition
_pwbsaveall := cancel saveall <

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Saveall
    Writes all modified files to disk.
<
    Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See
Cancel, Saveall
_pwbsavefile
Key: SHIFT+F2
Menu: File menu, Save command

The _pwbsavefile macro saves the current file to disk. If the current file is a pseudofile (an untitled file), PWB displays the Save As dialog box so you can give the file a more meaningful name.

Definition
_pwbsavefile := cancel arg arg setfile <

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Arg Arg Setfile
    Writes the current file to disk.
<
    Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See
Arg, Cancel, Setfile
_pwbsetmsg
Key: Unassigned
Menu: Project menu, Goto Error command

The _pwbsetmsg macro sets the message listed at the current location in the Build Results pseudofile as the current message and moves the cursor to the location specified by that message.

Definition
_pwbsetmsg := cancel arg arg nextmsg <

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Arg Arg Nextmsg
    Goes to the current message.
<
    Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See
Arg, Cancel, Nextmsg
_pwbshell
Key: Unassigned
Menu: File menu, DOS Shell command

The _pwbshell macro temporarily leaves PWB, starting a new operating-system shell. To return to PWB, type exit at the operating-system prompt.

Definition
_pwbshell := cancel shell <

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Shell
    Starts an operating-system shell.
<
    Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See
Askrtn, Cancel, Savescreen, Shell
_pwbstreammode
Key: Unassigned
Menu: Edit menu, Stream Mode command

The _pwbstreammode macro sets the selection mode to stream selection mode.

Definition
_pwbstreammode := :>more selmode ->more

:>more
    Defines the label more.
Selmode
    Advances to the next selection mode.
->more
    Branches to the label more if Selmode returns false.

The Selmode function advances the selection mode from box, to stream, to line. Selmode returns true when the mode is stream mode. The macro executes Selmode until it returns true (sets stream selection mode).

See
Enterselmode, Selmode
_pwbtile
Key: SHIFT+F5
Menu: Window menu, Tile command

The _pwbtile macro tiles all unminimized windows on the desktop so that no windows overlap and the desktop is completely covered. Up to 16 unminimized windows can be tiled.

Definition
_pwbtile := cancel meta arrangewindow <

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Meta Arrangewindow
    Tiles all unminimized windows.
<
    Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See
Arrangewindow, Cancel, Meta
_pwbundo
Key: Unassigned
Menu: Edit menu, Undo command

The _pwbundo macro reverses the last modification made to the current file. The maximum number of modifications that can be undone for each file is determined by the Undocount switch.

Definition
_pwbundo := cancel undo <

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Undo
    Reverses the last modification.
<
    Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

See
Cancel, _pwbredo
_pwbusern
Macro        Description                      Key
_pwbuser1    Run custom Run menu command 1    [[ALT+Fn]]
_pwbuser2    .                                [[ALT+Fn]]
_pwbuser3    .                                [[ALT+Fn]]
_pwbuser4    .                                [[ALT+Fn]]
_pwbuser5    .                                [[ALT+Fn]]
_pwbuser6    .                                [[ALT+Fn]]
_pwbuser7    .                                [[ALT+Fn]]
_pwbuser8    .                                [[ALT+Fn]]
_pwbuser9    Run custom Run menu command 9    [[ALT+Fn]]

Menu: Run command

command
    Title of custom Run menu item.

The _pwbusern macros execute custom commands in the Run menu. To add a new command to the Run menu, use the Customize Run Menu command or assign a value to the User switch.

Definition
_pwbusern := cancel arg "n" usercmd <

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Arg "n" Usercmd
    Executes custom Run menu item number n.
<
    Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

Example
_pwbuser1 := cancel arg 1 usercmd <

This macro executes custom Run menu command number 1.

See
Arg, Cancel, Usercmd
_pwbviewbuildresults
Key: Unassigned
Button: The View Results button in the Build Operation Complete dialog box.

The _pwbviewbuildresults macro opens the Build Results window. PWB executes this macro when you choose the View Results button in the Build Operation Complete dialog box.

You can redefine this macro to change the behavior of the View Results button. For example, if you want to move to the first message in the log and arrange windows, add _pwbnextmsg _pwbarrangewindow to the end of the macro definition.

Definition
_pwbviewbuildresults := cancel arg "<COMPILE>" pwbwindow

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Arg "<COMPILE>" Pwbwindow
    Opens the Build Results window.

See
Pwbwindow
_pwbviewsearchresults
Key: Unassigned
Button: The View Results button in the Search Operation Complete dialog box.

The _pwbviewsearchresults macro opens the Search Results window. PWB executes this macro when you choose the View Results button in the Search Operation Complete dialog box.

You can redefine this macro to change the behavior of the View Results button. For example, if you want to move to the first location in the log and arrange windows, add _pwbnextsearch _pwbarrangewindow to the end of the macro definition.

Definition
_pwbviewsearchresults := cancel arg "<SEARCH>" pwbwindow

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Arg "<SEARCH>" Pwbwindow
    Opens the Search Results window.

See
Pwbwindow
_pwbwindown
Macro          Description           Key
_pwbwindow1    Switch to window 1    ALT+1
_pwbwindow2    .                     ALT+2
_pwbwindow3    .                     ALT+3
_pwbwindow4    .                     ALT+4
_pwbwindow5    .                     ALT+5
_pwbwindow6    .                     ALT+6
_pwbwindow7    .                     ALT+7
_pwbwindow8    .                     ALT+8
_pwbwindow9    Switch to window 9    ALT+9

Menu: Window n file

n
    Window number
file
    Current file in the window

The _pwbwindown macros each set a specific numbered window as the active window.

Definition
_pwbwindown := cancel arg "n" selwindow <

Cancel
    Establishes a uniform ground state by canceling any selection or argument.
Arg "n" Selwindow
    Moves to window number n.
<
    Restores the function's prompt (if any). By default, function prompts are suppressed while a macro is running.

Example
_pwbwindow1 := cancel arg "1" selwindow <

This macro sets window number 1 as the active window.

See
Arg, Cancel, Selwindow
PWB Switches
PWB provides the following switches to customize its behavior. You set switches by adding entries to the TOOLS.INI file or by using the Editor Settings, Key Assignments, and Colors commands on the Options menu.
Switch            Description
Askexit           Prompt before leaving PWB
Askrtn            Prompt before returning from a shell
Autoload          Load PWB extensions automatically
Autosave          Save files when switching
Backup            File backup mode
Beep              Issue audible or visible alerts
Build             Rules and definitions for the build process
Case              Make letter case significant in searches
Color             Color of interface elements
Cursormode        Block or underline cursor state
Dblclick          Double-click threshold
Deflang           Default language
Defwinstyle       Default window style
Editreadonly      Allow editing of files marked read-only on disk
Enablealtgr       Enable the ALTGR key on non-US keyboards
Entab             Tab translation mode while editing
Enterinsmode      Enter PWB in insert mode
Enterlogmode      Enter PWB with search logging turned on
Enterselmode      Enter PWB in specified selection mode
Envcursave        Save environment variables for PWB sessions
Envprojsave       Save environment variables for projects
Factor            Auto-repeat factor
Fastfunc          Functions for fast auto-repeat
Filetab           Width of tab characters in the file
Friction          Delay between repetitions of fast functions
Height            Height of the display
Hike              Window adjustment factor
Hscroll           Horizontal scrolling factor
Infodialog        Set of information dialogs displayed
Keepmem           XMS/EMS memory kept during shell, build, and compile
Lastproject       Set the last project on startup
Load              PWB extension to load
Markfile          Name of the current mark file
Mousemode         Mouse configuration; disabled or swapped buttons
Msgdialog         Display a dialog box for build results
Msgflush          Keep only one set of build results
Newwindow         Create a new window when opening a file
Noise             Line counting interval
Printcmd          Command for printing files
Readonly          Command for saving disk read-only files
Realtabs          Preserve tab characters in the file
Restorelayout     Restore the window layout when a project is set
Rmargin           Right margin for word wrap
Savescreen        Preserve the operating-system screen
Searchdialog      Display a dialog box for search results
Searchflush       Keep only one set of search results
Searchwrap        Make searches wrap around the end of the file
Shortnames        Allow access to loaded files by base name
Softcr            Perform automatic indenting
Tabalign          Align the cursor in tab fields
Tabdisp           Character for displaying tab characters
Tabstops          Variable tab stops
Tilemode          Window tiling style
Timersave         Timer interval for saving files
Tmpsav            Number of files kept in file history
Traildisp         Character for displaying trailing spaces
Traillines        Preserve trailing lines
Traillinesdisp    Character for displaying trailing lines
Trailspace        Preserve trailing spaces
Undelcount        Maximum number of file backups
Undocount         Maximum number of edits per file to undo
Unixre            Use UNIX regular-expression syntax
User              Custom Run menu item
Vscroll           Vertical scrolling factor
Width             Width of the display
Word              Definition of a word
Wordwrap          Wrap words as they are entered
Extension Switches
The standard PWB extensions define additional switches to control their behavior. You set these switches in tagged sections of TOOLS.INI specific to that extension.
PWB Extension    Description               TOOLS.INI Section Tag
PWBROWSE.MXT     Source Browser            [PWB-PWBROWSE]
PWBMASM.MXT      Assembly Language         [PWB-PWBMASM]
PWBHELP.MXT      Microsoft Advisor Help    [PWB-PWBHELP]
The PWBROWSE switches are described in Browser Switches on page 286. The PWBHELP switches are described in Help Switches on page 287.
Filename-Parts Syntax
The filename-parts syntax is used by PWB to pass the name of the current file to external programs or operating-system commands. You use this syntax in the Printcmd, Readonly, and User switches.

Syntax

%%
    A literal percent sign (%).
%s
    The fully qualified path of the current file. If the current file is a pseudofile, %s specifies the name of a temporary disk file created for the external command to operate on. The temporary file is destroyed before returning to PWB and is never accessible to the editor.
%|[[d]][[p]][[f]][[e]]F
    Parts of the current filename. The parts of the name are drive, path, filename, and extension.

If the current file is a disk file named:

C:\SCRATCH\TEST.TXT

or the pseudofile:

"<COMPILE>Build Results"

the given syntax yields:

Syntax    Disk File              Pseudofile
%|F       C:\SCRATCH\TEST.TXT    <COMPILE>
%|dF      C:
%|pF      \SCRATCH
%|fF      TEST                   <COMPILE>
%|eF      .TXT
%|pfF     \SCRATCH\TEST          <COMPILE>
%s        C:\SCRATCH\TEST.TXT    C:\TMP\PWB00031.R00
%%        %                      %
The title of a pseudofile cannot be specified with the filename-parts syntax, but it is accessible to macros by using the Curfile predefined macro. Warning The %|F syntax always specifies the name of the current file in the active window. For some commands, such as the command specified in the Readonly switch, this may not be the desired file. Use %s for the Readonly switch.
See
Printcmd, Readonly, User
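The disk-file expansions listed above can be reproduced with a short Python sketch. This is only an illustration of the rules as stated in this section, not PWB's implementation: the function name expand_filename_parts is hypothetical, and the pseudofile case (which needs a temporary file) is not modeled.

```python
import ntpath
import re

# Hypothetical helper: it mirrors only the disk-file behavior described
# in this section. PWB performs the real expansion internally.
_PATTERN = re.compile(r"%(?:(%)|(s)|\|([dpfe]*)F)")


def expand_filename_parts(template, current_file):
    """Expand %%, %s, and %|[d][p][f][e]F for a DOS-style disk file path."""
    drive, rest = ntpath.splitdrive(current_file)
    path, base = ntpath.split(rest)
    name, ext = ntpath.splitext(base)

    def replace(match):
        if match.group(1):                   # %% is a literal percent sign
            return "%"
        if match.group(2):                   # %s is the fully qualified path
            return current_file
        letters = match.group(3) or "dpfe"   # %|F selects every part
        out = ""
        if "d" in letters:
            out += drive
        if "p" in letters:
            out += path
        if "f" in letters:
            # Separate path and filename with a backslash when both appear.
            if "p" in letters and path and not path.endswith("\\"):
                out += "\\"
            out += name
        if "e" in letters:
            out += ext
        return out

    return _PATTERN.sub(replace, template)


if __name__ == "__main__":
    current = r"C:\SCRATCH\TEST.TXT"
    for spec in ("%|F", "%|dF", "%|pF", "%|fF", "%|eF", "%|pfF", "%s", "%%"):
        print(spec, "->", expand_filename_parts(spec, current))
```

Inserting a backslash between the path and filename parts is an assumption chosen so that %|pfF yields \SCRATCH\TEST for the example file.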
Boolean Switch Syntax

You can use either one of the following syntaxes to set Boolean switches in PWB:

Syntax 1
switch : [[ yes | no | on | off | 1 | 0 ]]

switch
    The name of a PWB switch.
yes, on, 1
    Enable the feature controlled by switch.
no, off, 0
    Disable the feature controlled by switch.

Syntax 2
[[no ]]switch :

switch
    Enable the feature controlled by switch.
no switch
    Disable the feature controlled by switch.
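As a rough illustration of Syntax 1, the following Python sketch maps the six accepted values to a Boolean. The function name parse_boolean_switch is hypothetical, and treating the value as case-insensitive is an assumption; the sketch does not model Syntax 2 or PWB's own TOOLS.INI parser.

```python
# Accepted values are taken from Syntax 1; everything else here is an
# illustrative assumption, not PWB's own parsing code.
_TRUE_VALUES = {"yes", "on", "1"}
_FALSE_VALUES = {"no", "off", "0"}


def parse_boolean_switch(entry):
    """Parse a 'switch:value' entry into (switch name, bool)."""
    name, separator, value = entry.partition(":")
    if not separator:
        raise ValueError(f"missing ':' in {entry!r}")
    value = value.strip().lower()   # case-insensitivity is an assumption
    if value in _TRUE_VALUES:
        return name.strip(), True
    if value in _FALSE_VALUES:
        return name.strip(), False
    raise ValueError(f"not a Boolean value: {value!r}")


if __name__ == "__main__":
    print(parse_boolean_switch("Askexit:no"))
    print(parse_boolean_switch("Beep: on"))
```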
Askexit
Type: Boolean

The Askexit switch determines if PWB prompts for confirmation before returning to the operating system.

Syntax
Askexit:{ yes | no }

yes
    Prompt for confirmation before leaving PWB.
no
    Do not prompt before leaving PWB.

Default
Askexit:no

See
Exit
Askrtn
Type: Boolean

The Askrtn switch determines if PWB prompts before returning to PWB after running a shell or external command.

Syntax
Askrtn:{ yes | no }

yes
    Prompt for confirmation before returning to PWB. This setting allows you to review the contents of the operating-system screen before returning to the editor.
no
    Do not prompt before returning to PWB.

Default
Askrtn:yes

See
Shell
Autoload
Type: Boolean

The Autoload switch determines if PWB automatically loads its extensions on startup.

When the Autoload switch is yes, PWB automatically loads extensions whose names begin with PWB and are found in the same directory as PWB.EXE. PWB always loads extensions named in a Load switch.

If you disable automatic extension loading, you can load extensions as you need them by assigning a value to the Load switch as follows:

Arg load:pwbextension Assign (ALT+A load:pwbextension ALT+=).

The pwbextension is the path of the extension's executable file. PWB automatically assumes the filename extension .MXT. You can specify an environment-variable path by using an environment-variable specifier.

Syntax
Autoload:{ yes | no }

yes
    Automatically load PWB extensions on startup.
no
    Do not automatically load PWB extensions on startup. Load only those extensions named in Load switches in TOOLS.INI.

Default
Autoload:yes

Update
PWB 1.x extensions are not compatible with PWB 2.0. They are refused when you request that they be loaded. Old extensions must be recompiled with the new extension-support libraries and header files. In some cases, old extensions must also be modified for use with PWB 2.00. Updated Microsoft PWB 1.x extensions not included with this product are available by contacting Microsoft Product Support Services in the United States or your local Microsoft subsidiary.
Autosave
Type: Boolean

The Autosave switch determines if PWB automatically saves the current file without prompting whenever you move to another file, exit PWB, or execute an external operation such as a shell, build, or compile.

When the Autosave switch is set to no, PWB maintains the contents of files in memory for internal operations, and prompts to save modified files on exit or for external operations such as a build. With this setting, PWB never saves a file unless you explicitly save it.

Syntax
Autosave:{ yes | no }

yes
    Automatically save files.
no
    Do not automatically save files.

Default
Autosave:no

Update
In PWB 1.x, the default value of Autosave is yes.

See
Shell, Timersave
Backup
Type: Text

The Backup switch determines what happens to the old copy of a file before the new version is saved to disk.

Syntax
Backup:[[ undel | bak ]]

(none)
    No backup: PWB overwrites the file.
undel
    PWB moves the old file to a hidden directory so you can retrieve it with the UNDEL utility. The number of copies saved is specified by the Undelcount switch.
bak
    The extension of the previous version of the file is changed to .BAK.

Default
Backup:bak
Beep
Type: Boolean

The Beep switch determines PWB's alerting method. When set to yes, PWB issues an audible sound. When set to no, PWB flashes the menu bar (a visual beep).

Syntax
Beep:{ yes | no }

yes
    Generate an audible beep.
no
    Flash the menu bar.

Default
Beep:yes
Case
Type: Boolean

The Case switch determines if letter case is distinguished in searches. The search functions that use the Case switch have meta forms that temporarily reverse the sense of the Case switch.

The Unixre and Case switches have no effect on the syntax of regular expressions used by the Build or Word switches. These switches always use case-sensitive UNIX regular expressions.

Syntax
Case:{ yes | no }

yes
    Case is significant in searches. Uppercase letters in search patterns do not match lowercase letters in text.
no
    Case is not significant in searches. Uppercase letters match lowercase letters.

Default
Case:no

See
Meta, Mgrep, Mreplace, Msearch, Psearch, Replace
Color
Type: Text

The Color switch specifies the color of various parts of the PWB display.

Syntax
Color:name value

name
    Identifies the part of PWB affected by the color value.
value
    Two hexadecimal digits specifying the background and foreground colors of the indicated item.
Color Names
PWB uses the following color names and default color values for the various parts of the PWB display:
Table 7.13 PWB Color Names

Name               Default Value    Description
Alert              70               Message box
Background         07               (Not visible)
Border             07               Window borders
Builderr           40               Build message line in active window
Buttondown         07               Button while it is down
Desktop            80               Desktop
Dialogaccel        7F               Dialog box accelerator
Dialogaccelbor     7F               Dialog box accelerator border
Dialogbox          70               Dialog box
Disabled           78               Disabled items in menus and dialogs
Elevator           07               Scroll box
Enabled            70               Available items in menus and dialogs
Greyed             78               (Not visible)
Helpbold*          8F               Bold Help text
Helpitalic*        8A               Italic Help text and the characters
Helpnorm*          87               Plain Help text
Helpunderline*     8C               Emphasized Help text
Helpwarning*       70               Current hyperlink
Highlight          1F               Highlighted text; text found by searches
Hilitectrl         07               Highlighted control item
Info               3F               Special information
Itemhilitesel      0F               Highlighted character in selected item
Listbox            70               List box within a dialog box
Location           70               Location indicator in status bar
Menu               70               Menu bar
Menubox            70               Menu
Menuhilite         7F               Highlighted character in menu
Menuhilitesel      0F               Highlighted character in selected menu
Menuselected       07               Selected menu
Message            70               Message area of status bar
Pushbutton         70               Button that is not pressed
Pwbwindowborder    07               PWB window borders
Pwbwindowtext      87               PWB window text
Scratch            07               (Not visible)
Scrollbar          70               Gray area and arrows in scroll bar
Selection          71               Current selection
Shadow             08               Shadows
Status             7F               Indicator area of status bar
Text               17               Text in a window
* Defined by the Help extension. Define these settings in the [PWB-PWBHELP] section of TOOLS.INI.
Color Values
Color values for the Color switch are two hexadecimal digits that specify the color of the item. The first digit specifies the background color and the second digit specifies the foreground color, according to the following table:
Table 7.14 PWB Color Values

Color      Digit    Color             Digit
Black      0        Dark Gray         8
Blue       1        Bright Blue       9
Green      2        Bright Green      A
Cyan       3        Bright Cyan       B
Red        4        Bright Red        C
Magenta    5        Bright Magenta    D
Brown      6        Yellow            E
White      7        Bright White      F
For example, a setting of 3E displays a cyan background (3) and a yellow foreground (E). Note that only color displays support all the colors listed. If you have a monochrome adapter or monochrome monitor, the only available colors are black (0), white (7), and bright white (F). All other colors are displayed as white.
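The digit-to-color mapping can be checked with a small Python sketch. It decodes a two-digit value into its background and foreground color names, following the order given in this section (first digit background, second digit foreground). The function name decode_color_value is hypothetical and is not part of PWB.

```python
# Color names in digit order 0 through F, as listed in Table 7.14.
COLOR_NAMES = [
    "Black", "Blue", "Green", "Cyan", "Red", "Magenta", "Brown", "White",
    "Dark Gray", "Bright Blue", "Bright Green", "Bright Cyan",
    "Bright Red", "Bright Magenta", "Yellow", "Bright White",
]


def decode_color_value(value):
    """Return (background, foreground) names for a two-digit hex value."""
    if len(value) != 2:
        raise ValueError("a color value is exactly two hexadecimal digits")
    background = COLOR_NAMES[int(value[0], 16)]   # first digit
    foreground = COLOR_NAMES[int(value[1], 16)]   # second digit
    return background, foreground


if __name__ == "__main__":
    print(decode_color_value("3E"))   # the example setting from the text
    print(decode_color_value("17"))   # default value of the Text color name
```

Under this reading, the default Text value of 17 decodes to white text on a blue background.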
Cursormode
Type: Numeric

The Cursormode switch determines the shape of the cursor when PWB is in insert and overtype mode, according to the following table:

Cursormode Value    Insert Mode Cursor    Overtype Mode Cursor
0                   Underscore            Underscore
1                   Block                 Block
2                   Block                 Underscore
3                   Underscore            Block

Default
Cursormode:2

See
Status Bar
Dblclick
Type Numeric
The Dblclick switch sets the double-click threshold for the mouse (the maximum time between successive clicks of the mouse button). The units for the Dblclick switch are 1/18 of a second.
Default Dblclick:10
See Mousemode
Deflang
Type Text
The Deflang switch determines the default file extension for file lists in PWB dialog boxes.
Syntax Deflang:language
language One of the following settings:

Setting   Extension
NONE      .*
Asm       .ASM
Basic     .BAS
C         .C
C++       .CPP
CXX       .CXX
COBOL     .CBL
FORTRAN   .FOR
LISP      .LSP
Pascal    .PAS

Default Deflang:NONE
Defwinstyle
Type Numeric
The Defwinstyle switch sets the default window style. The possible values for Defwinstyle are:

Value   Style
1       No scroll bars
3       Vertical scroll bar
5       Horizontal scroll bar
7       Both scroll bars

You can change the active window style by using the Winstyle function (CTRL+F6).
Default Defwinstyle:7
See Maximize
Editreadonly
Type Boolean
The Editreadonly switch determines if PWB allows you to edit a file marked read-only on disk.
Syntax Editreadonly:{ yes | no }
yes Allow modification of files that are marked read-only on disk. When PWB attempts to save the modified file, PWB informs you that the file is marked read-only. It also prompts you to confirm that the command specified by the Readonly switch is to be run. If you decline to run the command, PWB gives you the opportunity to save the file with a different name.
no Disallow modification of read-only files. For files that cannot be modified, PWB displays the letter R on the status bar. You can reenable modification of a read-only file by using the Read Only command on the Edit menu or the Noedit function.
Default Editreadonly:yes
Enablealtgr
Type Boolean
The Enablealtgr switch determines if PWB recognizes the ALTGR key (the right ALT key) on international keyboards as ALTGR (Graphic Alt) or ALT. When ALTGR is enabled, pressing ALTGR+key produces the corresponding graphic character. ALTGR is never recognized as a key name for use in PWB key assignments.
Syntax Enablealtgr:{ yes | no }
yes Recognize the right ALT key as ALTGR.
no Recognize the right ALT key as ALT.
Default Enablealtgr:no
Entab
Type Numeric
The Entab switch controls how PWB converts white space on modified lines. PWB converts white space only on the lines that you modify. When the Realtabs switch is set to yes, tab characters are converted. When set to no, tab characters are not converted. The Entab switch can have the following values:

Value   Meaning
0       Convert all white space to space (ASCII 32) characters.
1       Convert white space outside quoted strings to tabs. A quoted string is any span of characters enclosed by a pair of single quotation marks or a pair of double quotation marks. PWB does not recognize escape sequences because they are language-specific. For well-behaved conversions with this setting, make sure that you use a numeric escape sequence to encode quotation marks in strings or character literals.
2       Convert white space to tabs.

With settings 1 and 2, if the white space being considered for conversion to a tab character occupies an entire tab field or ends at the boundary of a tab field, it is converted to a tab (ASCII 9) character. The width of a tab field is specified by the Filetab switch. In all conversions, PWB maintains the text alignment as it is displayed on screen.
Default Entab:1
See Filetab, Realtabs, Tabalign
Enterinsmode
Type Boolean
The Enterinsmode switch determines if PWB is to start in insert mode or overtype mode. You can toggle the current mode by using the Insertmode function (INS). When the current mode is overtype mode, the letter O appears on the status bar. Depending on the setting of the Cursormode switch, the shape of the cursor reflects the current mode.
Syntax Enterinsmode:{ yes | no }
yes Start PWB in insert mode.
no Start PWB in overtype mode.
Default Enterinsmode:yes
Enterlogmode
Type Boolean
The Enterlogmode switch determines if search logging is turned on or off when PWB starts up. The current search-logging mode can be changed at any time using the Log command on the Search menu or the Logsearch function (Unassigned).
Syntax Enterlogmode:{ yes | no }
yes Start PWB with search logging on.
no Start PWB with search logging off.
Default Enterlogmode:no
Enterselmode
Type Text
The Enterselmode switch determines the selection mode when PWB starts up.
Syntax Enterselmode:{ stream | box | line }
stream Starts PWB in stream selection mode.
box Starts PWB in box selection mode.
line Starts PWB in line selection mode.
Default Enterselmode:stream
See Selmode
Envcursave
Type Boolean
The Envcursave switch determines if PWB saves and restores the current environment table for PWB sessions. You can change environment variables by using the Environment command on the Options menu or the Environment function (Unassigned). If you always want to use the operating-system environment, set both Envcursave and Envprojsave to no.
Syntax Envcursave:{ yes | no }
yes Save and restore environment variables for PWB sessions. Use this setting if you want to use an environment that is specific to PWB. The PWB environment overrides the operating-system environment.
no Do not save environment variables between PWB sessions.
Default Envcursave:no
Update In PWB 1.x, the INCLUDE, LIB, and HELPFILES environment variables were always saved for PWB sessions and projects.
Envprojsave
Type Boolean
The Envprojsave switch determines if PWB saves and restores the environment table for each project. A project's environment overrides both the PWB environment and the external (operating-system) environment. If you always want to use the operating-system environment table, set both Envcursave and Envprojsave to no. You can change environment variables by using the Environment command on the Options menu or the Environment function (Unassigned).
Syntax Envprojsave:{ yes | no }
yes Save environment variables for the project. Use this setting if you want to set project-specific environments.
no Do not save environment variables for the project.
Default Envprojsave:yes
Update In PWB 1.x, the INCLUDE, LIB, and HELPFILES environment variables were always saved for PWB sessions and projects.
Factor
Type Text
The Factor switch, together with the Friction switch, controls how quickly PWB executes a fast function. A fast function is a PWB function whose action repeats as rapidly as possible while you hold down the associated keystroke.
Syntax Factor:{ %percent | -constant } [[count]]
percent Percentage between 0 and 100 to reduce friction.
constant Constant value between 0 and 65,535 to reduce friction.
count Interval between reductions of friction. PWB reduces friction by percent percent or constant every count repetitions of a keystroke, until friction is zero.
Default Factor:%50 10
Example If you hold down the RIGHT ARROW key with the settings:

Right:RIGHT
Fastfunc:Right
Friction:1000
Factor:%75 7

PWB moves the cursor at the current speed until it has moved seven characters to the right. Then PWB changes the friction to 250 (75 percent reduction of the initial friction of 1000). When the cursor has moved 14 characters, the friction changes to 188 (75 percent reduction of the friction of 250). The cursor moves faster the longer you hold down the RIGHT ARROW key.
See Fastfunc
Fastfunc
Type Text
The Fastfunc switch specifies functions whose action is rapidly repeated by PWB as you hold down the associated key combination. The Friction and Factor switches control the repeat speed and acceleration of fast functions.
Syntax Fastfunc:function {on | off}
function PWB function to repeat.
on Enable fast repeat for function.
off Disable fast repeat for function.
Default
Fastfunc:Down on
Fastfunc:Left on
Fastfunc:Mlines on
Fastfunc:Mpage on
Fastfunc:Mpara on
Fastfunc:Mword on
Fastfunc:Plines on
Fastfunc:Ppage on
Fastfunc:Ppara on
Fastfunc:Pword on
Fastfunc:Right on
Fastfunc:Up on
Filetab
Type Numeric
The Filetab switch determines the width of a tab field for displaying tab (ASCII 9) characters in the file. The width of a tab field determines how white space is translated when the Realtabs switch is set to no. The Filetab switch does not affect the cursor-movement functions Tab (TAB) and Backtab (SHIFT+TAB).
Default Filetab:8
See Entab, Realtabs, Tabdisp
Friction
Type Numeric
The Friction switch, together with the Factor switch, controls how quickly PWB executes a fast function. A fast function is a PWB function whose action repeats rapidly when you hold down the associated key. The value of the Friction switch is a decimal number between 0 and 65,535 and specifies the delay between repetitions of a fast function. As the function is repeated, the delay is reduced according to the setting of the Factor switch.
Default Friction:40
See Factor, Fastfunc
Height
Type Numeric
The Height switch determines the number of lines on the PWB screen. The Height switch can have one of these values: 25, 43, 50, or 60. The last setting of this switch is saved and restored across PWB sessions and for each project.
Default Height: first screen height
When you start PWB for the first time, PWB uses the current screen height. Thereafter, PWB restores the previous setting until you explicitly assign a new value to the Height switch. Note that when you change the setting for Height in the Editor Settings dialog box, the change does not take effect until you choose OK. Other switches take effect immediately when you choose Set Switch.
See Assign
Hike
Type Numeric
The Hike switch determines the number of lines from the cursor to the top of the window after you move the cursor out of the window by more than the number of lines specified by the Vscroll switch. The minimum value is 1. When the window occupies less than the full screen, the value is reduced in proportion to the window size.
Default Hike:4
See Hscroll
Hscroll
Type Numeric
The Hscroll switch controls the number of columns that PWB scrolls the text left or right when you move the cursor out of the window. When the window does not occupy the full screen, the amount scrolled is in proportion to the window size. Text is never scrolled in increments greater than the size of the window.
Default Hscroll:10
See Vscroll
Infodialog
Type Numeric
The Infodialog switch determines which information dialog boxes are displayed.
Syntax Infodialog:hh
hh Two hexadecimal digits specifying a set of flags to indicate which information dialog boxes should be displayed. When a bit is on (1), the corresponding dialog box is displayed. When a bit is off (0), the corresponding dialog box is not displayed. To set the value of Infodialog, add up the hexadecimal numbers listed in the table below for the dialog boxes you want to display.

Value   Information Dialog
01      n occurrences found
        n occurrences replaced
02      End of Build Results
        End of Search Results
04      'pattern' not found
08      No unbalanced characters found
10      Changed directory to directory
        Changed drive to drive

Default Infodialog:0F
The default value of Infodialog tells PWB to display all information dialog boxes except for the Changed... dialog boxes.
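The flag arithmetic can be sketched in Python as follows. The grouping of dialog boxes under each flag value is a reading of the table above, and the names INFO_DIALOGS, infodialog_value, and enabled_dialogs are hypothetical helpers, not part of PWB.

```python
# Flag values and the dialog boxes they are assumed to control.
INFO_DIALOGS = {
    0x01: "n occurrences found / n occurrences replaced",
    0x02: "End of Build Results / End of Search Results",
    0x04: "'pattern' not found",
    0x08: "No unbalanced characters found",
    0x10: "Changed directory to directory / Changed drive to drive",
}


def infodialog_value(flags) -> str:
    """Add up the chosen flag values and format them as two hex digits."""
    total = 0
    for flag in flags:
        if flag not in INFO_DIALOGS:
            raise ValueError(f"unknown Infodialog flag: {flag:#04x}")
        total |= flag  # each flag is a distinct bit, so OR equals addition
    return f"{total:02X}"


def enabled_dialogs(setting: str) -> list[str]:
    """List the dialog groups whose bit is on in a setting such as '0F'."""
    bits = int(setting, 16)
    return [name for flag, name in INFO_DIALOGS.items() if bits & flag]


print("Infodialog:" + infodialog_value([0x01, 0x02, 0x04, 0x08]))  # Infodialog:0F
```

Calling enabled_dialogs("0F") returns every group except the Changed... dialog boxes, which matches the description of the default.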
Keepmem
Type Numeric
The Keepmem switch specifies the amount of extended (XMS) memory or expanded (EMS) memory kept by PWB during a shell, compile, build, or other external command. Specify the value in units of kilobytes (1024 bytes). A larger number means that shelling is faster and leaves less memory for tools that use extended or expanded memory. A smaller number means that shelling is slower and leaves more memory for tools. If the number you specify is not large enough, PWB uses no extended or expanded memory.
Default Keepmem:2048
Lastproject
Type Boolean
The Lastproject switch determines if PWB automatically opens the last project on startup. The /PN, /PP, /PL, and /PF command-line options override the setting of the Lastproject switch.
Syntax Lastproject:{ yes | no }
yes On startup, open the last project that was open.
no Do not open the last project on startup.
Default Lastproject:no
See Project
Load
Type Text
The Load switch specifies the filename of a PWB extension to load. When this switch is assigned a value, PWB loads the specified extension. The initialization specified in the extension is performed, and the functions and switches defined by the extension become available in PWB. The extension can be loaded during initialization of a TOOLS.INI section. You can also interactively load an extension by using the Editor Settings command on the Options menu or by using the Assign function to assign a value to the Load switch.
Syntax Load:[[path]]basename[[ .ext]]
path Can be a path or an environment-variable specifier.
basename Base name of the extension executable file.
ext Normally you do not specify a filename extension.
See Autoload
Markfile
Type Text
The Markfile switch specifies the name of the file PWB uses to save marks. When no mark file is open, marks are kept in memory, and they are lost when you exit PWB. When you open a mark file, marks in memory are saved in the mark file, unless a mark file is already open. When a mark file is already open, the marks in memory are saved in the open file. To open a mark file, use the Set Mark File command on the Search menu or assign a value to the Markfile switch by using the Editor Settings command on the Options menu or the Assign function. To close a mark file without opening a new one, assign an empty value to the Markfile switch. That is, use the setting:

Markfile:

To set a permanent mark file that is used for every PWB session, place a Markfile definition in the [PWB] section of TOOLS.INI.
Syntax Markfile: filename
filename The name of the file containing mark definitions.
Default Markfile:
The Markfile switch has no default value and is initially undefined.
See Assign, Mark

Mark File Format

A mark file is a text file containing mark definitions of the form:

markname filename line column

The mark markname is defined as the location given by line and column in the file filename. The markname cannot contain spaces and cannot be a number.
Update With PWB 1.x, when you open a mark file and no mark file is currently open, the marks in memory are lost. With PWB 2.00, the marks are saved in the new mark file.
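A reader for one mark definition might look like the following Python sketch. It assumes that the four fields are separated by white space and that filename contains no spaces; the Mark type and the parse_mark_line function are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Mark(NamedTuple):
    """One mark definition: markname filename line column."""
    name: str
    filename: str
    line: int
    column: int


def parse_mark_line(text: str) -> Mark:
    """Parse a single mark definition, assuming whitespace-separated fields."""
    fields = text.split()
    if len(fields) != 4:
        raise ValueError("expected: markname filename line column")
    name, filename, line, column = fields
    if name.isdigit():
        # The markname cannot be a number.
        raise ValueError("a mark name cannot be a number")
    return Mark(name, filename, int(line), int(column))


print(parse_mark_line("start MAIN.ASM 120 1"))
```

Because the line is split on white space, the rule that a markname cannot contain spaces is what keeps the four fields unambiguous.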
Mousemode
Type Numeric
The Mousemode switch enables or disables the mouse and sets the actions of the left and right mouse buttons.

Value   Description
0       The mouse is disabled and the mouse pointer is not visible.
1       Normal mouse control.
2       Exchanges the actions of the left and right mouse buttons.

Default Mousemode:1
See Dblclick
Msgdialog
Type Boolean
The Msgdialog switch determines if PWB brings up a dialog box summarizing build results or only beeps when a build is complete.
Syntax Msgdialog:{ yes | no }
yes Display a dialog box summarizing build results when a build is complete.
no Beep when a build is complete.
Default Msgdialog:yes
See Beep, Compile, Searchdialog
Msgflush
Type Boolean
The Msgflush switch determines if previous build results are retained in the Build Results window or flushed when a new build is started.
Syntax Msgflush:{ yes | no }
yes Flush previous build results when a new build is started.
no Save previous build results.
Default Msgflush:yes
See Nextmsg, Searchflush
Newwindow
Type Boolean
The Newwindow switch determines if certain PWB functions open a file in a new window or in the active window. The Newwindow switch provides the default state of the New Window check box in the Open File dialog box. This check box does not change the value of the Newwindow switch. When Newwindow is set to yes, PWB behaves like a Multiple Document Interface (MDI) application. That is, when you open a new file, PWB opens a new window for the file, except in certain situations as noted below. When Newwindow is set to no, PWB behaves like PWB 1.x. In this case, PWB opens files into the active window, creating a file history for that window. This mode is useful when working with large numbers of files. Some functions use the Newwindow switch to determine if a new window is created when opening a file. The following functions ignore the Newwindow switch, and either create a new window or open the file into the active window:

Function     Creates a New Window
Mreplace     No
Openfile     Yes
Setfile      No
Nextmsg      No
Nextsearch   No

When the active window is a PWB window, PWB always creates a new window. You cannot open a file into a PWB window.
Syntax Newwindow:{ yes | no }
yes Open a new window when a new file is opened. This setting makes PWB behave like other MDI applications such as Microsoft Word 5.5 and Microsoft Works.
no Open files into the active window, adding the previous file to the window's file history. This setting makes PWB behave like PWB 1.x.
Default Newwindow:yes
See Exit, Mark, Mreplace, Newfile, Nextmsg, Nextsearch, Openfile, Setfile
Noise
Type Numeric
The Noise switch specifies the number of lines counted at a time as PWB traverses a file while reading, writing, or searching. PWB displays the line counter on the right side of the status bar, in the area which usually shows the current line. Set Noise to 0 to turn off the display of scanned lines.
Default Noise:50
Printcmd
Type Text
The Printcmd switch specifies a program or operating system command that PWB starts when you choose the Print command from the File menu or execute the Print function (Unassigned).
Syntax Printcmd: command_line
command_line An operating-system command line.
To pass the filename of the current file, specify %s in the command line. Specify %% to pass a literal percent sign. You can extract parts of the full filename using a special PWB syntax. See Filename-Parts Syntax on page 247.
Default Printcmd:COPY %s PRN
See Print
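The %s and %% substitutions described for Printcmd can be modeled with a short Python sketch. It covers only these two sequences and ignores the Filename-Parts Syntax; the function name expand_command is hypothetical and does not describe how PWB itself is implemented.

```python
import re


def expand_command(template: str, filename: str) -> str:
    """Replace %s with the filename and %% with a literal percent sign."""
    def substitute(match: re.Match) -> str:
        return "%" if match.group(0) == "%%" else filename

    # Scanning left to right means "%%s" yields a literal "%s",
    # not a percent sign followed by the filename.
    return re.sub(r"%%|%s", substitute, template)


print(expand_command("COPY %s PRN", "MAIN.ASM"))  # COPY MAIN.ASM PRN
```

A replacement function is used rather than a replacement string so that backslashes in a path such as C:\SRC\MAIN.ASM are inserted unchanged.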
Readonly
Type Text
The Readonly switch specifies the operating-system command invoked when PWB attempts to write to a read-only file. When PWB attempts to overtype a file that is marked read-only on disk, PWB informs you that the file is read-only. It also prompts you to confirm that the command specified in the Readonly switch is to be run. If you decline to run the Readonly command, PWB gives you the opportunity to save the file with a different name.
Syntax Readonly:[[command]]
command Operating-system command line.
If no command is specified, PWB prompts you to enter a new filename to save the file. To pass the filename of the current file to the command, specify %s in the command line. Specify %% to pass a literal percent sign. You can extract parts of the full path using a special PWB syntax. See Filename-Parts Syntax on page 247. Note that only %s is guaranteed to give the name of the read-only file. The %|F syntax gives the current filename (the file displayed in the active window), even when PWB is saving a different file.
Default Readonly:
The default value specifies that PWB should run no command and should prompt for a different filename.
Example The Readonly switch setting

Readonly:Attrib -r %s

removes the read-only attribute from the file on disk so PWB can overtype it.
See Editreadonly, Noedit
Realtabs
Type Boolean
The Realtabs switch determines if PWB preserves tab (ASCII 9) characters or translates white space according to the Entab switch when a line is modified. Realtabs also determines if the Tabalign switch is in effect.
Syntax Realtabs:{ yes | no }
yes Preserve tab characters when editing a line.
no Translate tab characters when editing a line.
Default Realtabs:yes
See Entab, Filetab, Tabalign
Restorelayout
Type Boolean
The Restorelayout switch determines if PWB restores the saved window layout and file history from the project status file when you open a project or retains the active window layout and file history. This switch provides the default state of the Restore Window Layout check box in the Open Project dialog box.
Syntax Restorelayout:{ yes | no }
yes Restore a project's saved window layout and file history when the project is opened.
no Do not restore the project's windows and file history.
Default Restorelayout:yes
See Project
Rmargin
Type Numeric
The Rmargin switch sets the right margin for word wrapping. It has an effect only when word wrapping is turned on.
Default Rmargin:78
Update In PWB 1.x, Rmargin sets the beginning of a six-character probation zone where typing a space wraps the line. After the zone, typing any character wraps the current word. This behavior is similar to that of a typewriter. PWB 2.00 uses a word-processor style of wrapping. To maintain the same margins as PWB 1.x, increase your Rmargin settings by 6.
See Softcr, Wordwrap
Savescreen
Type Boolean
The Savescreen switch determines if PWB preserves the operating-system screen image and video mode.
Syntax Savescreen:{ yes | no }
yes Save the operating-system screen when starting PWB, and restore it when leaving PWB.
no Do not preserve the operating-system screen. When you leave PWB, the operating-system screen is blank, and the video mode is left in PWB's last video mode.
Default Savescreen:yes
Searchdialog
Type Boolean
The Searchdialog switch determines if PWB brings up a dialog box that summarizes logged search results or only beeps when a logged search is complete. The Searchdialog switch has an effect only while logging search results.
Syntax Searchdialog:{ yes | no }
yes Display a dialog box summarizing search results when a logged search is complete.
no Beep when a logged search is complete.
Default Searchdialog:yes
See Beep, Enterlogmode, Logsearch, Msgdialog
Searchflush
Type Boolean
The Searchflush switch determines if previous logged search results are flushed or retained when you start a new logged search. This switch has an effect only when PWB performs a logged search.
Syntax Searchflush:{ yes | no }
yes Flush the previous search results from the Search Results window when a new search is begun.
no Preserve previous search results in the Search Results window.
Default Searchflush:yes
See Logsearch, Mgrep
Searchwrap
Type Boolean
The Searchwrap switch determines if search commands and replace commands wrap around the ends of a file.
Syntax Searchwrap:{ yes | no }
yes Searches wrap around the beginning and end of the file.
no Searches stop at the beginning and end of the file.
Default Searchwrap:no
See Msearch, Psearch, Replace
Shortnames
Type Boolean
The Shortnames switch determines if currently loaded files can be accessed by their short names (base name only).
Syntax Shortnames:{ yes | no }
yes You can switch to a file currently loaded into PWB by specifying only the base name to the Setfile (F2) or Openfile (F10) functions.
no You must specify the extension as well as the base name to switch to a file.
Default Shortnames:yes
See Openfile, Setfile
Softcr
Type Boolean
The Softcr switch controls indentation of new lines based on the format of surrounding text when you execute the Emacsnewl (ENTER) and Newline (SHIFT+ENTER) functions.
Syntax Softcr:{ yes | no }
yes Indent new lines.
no Do not indent new lines. After executing Emacsnewl or Newline, the cursor is placed in column 1.
Default Softcr:yes
Tabalign
Type Boolean
The Tabalign switch determines the positioning of the cursor when it enters a tab field. A tab field is the area of the screen representing a tab character (ASCII 9) in the file. The width of a tab field is specified by the Filetab switch. The Tabalign switch takes effect only when the Realtabs switch is set to yes.
Syntax Tabalign:{ yes | no }
yes PWB aligns the cursor to the beginning of the tab field when the cursor enters the tab field. The cursor is placed on the actual tab character in the file.
no PWB does not align the cursor within the tab field. You can place the cursor on any column in the tab field. When you type a character at this position, PWB inserts enough leading blanks to ensure that the character appears in the same column.
Default Tabalign:no
Tabdisp
Type Numeric
The Tabdisp switch specifies the decimal ASCII code of the character used to display tab (ASCII 9) characters in your file. If you specify 0 or 255, PWB uses the space (ASCII 32) character. It is sometimes useful to set Tabdisp to the code for a graphic character so that tabs can be distinguished from spaces.
Default Tabdisp:32
The default value 32 specifies the ASCII space character.
See Filetab, Realtabs, Traildisp, Traillinesdisp
Tabstops
Type Text
The Tabstops switch specifies variable tab stops used by the Tab and Backtab functions. Tab moves the cursor to the next tab stop; Backtab moves the cursor to the previous tab stop. Note that the Tabstops switch has no effect on the handling of tab (ASCII 9) characters in a file.
Syntax Tabstops: [[tabwidth]]... repeat
tabwidth The width of a tab stop. You can repeat tabwidth for as many tab stops as will fit on a PWB line (250 characters).
repeat The width of every tab stop after the explicitly listed tab stops. A value of 0 for repeat specifies that there are no tab stops after the list of tabwidth settings. When the cursor is past the last tab stop, the Tab function does nothing.
Default Tabstops:4
Update In PWB 1.x, Tabstops is a numeric switch specifying a single value, equivalent to the repeat value in PWB 2.0. The default PWB 2.00 Tabstops setting mimics the default behavior of PWB 1.x.
Example The Tabstops switch setting

Tabstops:4

sets a tab stop every four columns.
Example The setting

Tabstops:3 4 7 8

sets a tab stop at columns 4, 8, 15, and every eight columns thereafter.
Example The setting

Tabstops:3 4 7 25 25 0

sets a tab stop at columns 4, 8, 15, 40, and 65. When the cursor is past column 65, the Tab function does nothing.
See Backtab, Entab, Filetab, Realtabs, Tab
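The way a Tabstops value maps to columns can be reproduced with a short Python sketch. It assumes that columns are numbered from 1 and that each width is added to the previous stop, which is what the examples for this switch imply; the function name tab_stop_columns is hypothetical.

```python
def tab_stop_columns(setting: str, last_column: int = 250) -> list[int]:
    """Return the tab stop columns for a value such as '3 4 7 8'.

    The final number is the repeat width; any numbers before it are
    explicit tab widths. A repeat of 0 means no stops after the list.
    """
    widths = [int(token) for token in setting.split()]
    if not widths:
        raise ValueError("Tabstops needs at least a repeat value")
    *explicit, repeat = widths

    stops = []
    column = 1  # assumption: the first column of a line is column 1
    for width in explicit:
        column += width
        if column > last_column:
            return stops
        stops.append(column)
    while repeat > 0 and column + repeat <= last_column:
        column += repeat
        stops.append(column)
    return stops


print(tab_stop_columns("3 4 7 25 25 0"))  # [4, 8, 15, 40, 65]
```

With the value 3 4 7 8 the same function yields 4, 8, and 15, followed by a stop every eight columns up to the 250-character line limit.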
Tilemode
Type Numeric
The Tilemode switch specifies the window tiling style. It can take one of the values below:

Value   Tiling Style
0       The first three windows are stacked one above the other.
1       The top two windows are tiled side-by-side.

When four or more windows are open, the tiling is the same in the two styles. In stacked style (Tilemode:0), the top windows are placed one above the other, as shown in gray.

Figure 7.2 Vertical Tiling

In side-by-side style (Tilemode:1), the top two windows are tiled next to each other, as shown in Figure 7.3. This arrangement is good for comparing two files.

Figure 7.3 Horizontal Tiling

Default Tilemode:0
See Arrangewindow
Timersave
Type Numeric
The Timersave switch sets the interval in seconds between automatic file saves. The value must be in the range 0-65,535. Set Timersave to 0 to turn off time-triggered autosave.
Default Timersave:0
See Autosave
Tmpsav
Type Numeric
The Tmpsav switch determines the maximum number of files kept in the file history between sessions. When Tmpsav is 0, PWB lets the file history grow without limit; all files loaded into PWB appear in this list until you delete the CURRENT.STS file or change the value of the Tmpsav switch.
Default Tmpsav:20
Traildisp
Type Numeric
The Traildisp switch specifies the decimal ASCII code for the character used to display trailing spaces on a line. If you specify 0 or 255, PWB uses the space (ASCII 32) character.
Default Traildisp:0
See Traillines, Trailspace, Traillinesdisp
Traillines
Type Boolean
The Traillines switch determines if PWB preserves or removes empty trailing lines in a file when the file is written to disk. You can make trailing lines visible by setting the Traillinesdisp switch to a value other than 0 or 32.
Syntax Traillines:{ yes | no }
yes Preserve trailing blank lines in the file.
no Remove trailing blank lines from the file.
Default Traillines:no
See Traildisp, Trailspace
Traillinesdisp
Type Numeric
The Traillinesdisp switch specifies the decimal ASCII code for the character displayed in the first column of blank lines at the end of the file. If you specify 0 or 255, PWB uses the space (ASCII 32) character.
Default Traillinesdisp:32
See Traillines, Traildisp, Trailspace
Trailspace
Type Boolean
The Trailspace switch determines if PWB preserves or removes trailing spaces from modified lines. You can make trailing spaces visible by setting the Traildisp switch to a value other than 0 or 32.
Syntax Trailspace:{ yes | no }
yes Preserve trailing spaces on lines as they are changed.
no Remove trailing spaces from lines as they are changed.
Default Trailspace:no
See Traillines, Traillinesdisp
Undelcount
Type Numeric
The Undelcount switch determines the maximum number of backup copies of a given file saved by PWB. This switch is used only when the Backup switch is set to undel.
Default Undelcount:32767
Undocount
Type Numeric
The Undocount switch sets the maximum number of edits per file that you can reverse with Undo (ALT+BKSP).
Default Undocount:30
Unixre
Type Boolean
The Unixre switch determines if PWB uses UNIX regular-expression syntax or PWB's non-UNIX regular-expression syntax for search-and-replace commands. The Unixre and Case switches have no effect on the syntax of regular expressions used by the Build and Word switches. These switches always use case-sensitive UNIX regular-expression syntax.
Syntax Unixre:{ yes | no }
yes Use UNIX regular-expression syntax when searching.
no Use non-UNIX regular-expression syntax when searching.
Default Unixre:yes
User
Type Syntax Text The User switch adds a custom menu item to the PWB Run menu. User: title, path, [[arg]], [[out]], [[dir]], [[help]], [[prompt]], [[ask]], [[back]], [[key]] If any argument to the User switch contains spaces, it must be enclosed in double quotation marks. title Menu title for the program to be added. No other command can have the same title. Prefix the character to be highlighted as the access key with a tilde (~) or ampersand (&). If you do not specify an access key, the first letter of the title is used. path Full path of the program. If the program is on the PATH environment variable, you can specify just the filename of the program. arg Command-line arguments for the program. To pass the name of the current file to the program, specify %s in the command line. Default: no arguments. out Name of a file to store program output. If no file is specified and the program is run in the foreground, the current file in PWB receives the output. Default: no output file. dir Current directory for the program. Default: PWBs current directory. help Text that appears on the status bar when the menu item is selected. Default: no help text. prompt Determines if PWB prompts for command-line arguments. The value of arg is the default response. Specify Y to prompt or any other character to run the program without prompting for arguments. Default: no prompt.
293
ask Determines if PWB is to prompt for a keystroke before returning to PWB. Specify Y to prompt or any other character to return to PWB immediately after running the program. Default: return without prompting. back Determines if the program is run in the background under a multithreaded environment. Specify Y to run the program in the background or any other character to run it in the foreground. If you run the program in the background, you must also specify output. Default: run the program in the foreground. key A single digit from 1 to 9, specifying a key from ALT+F1 to ALT+F9 as the shortcut key for the command. Default: no shortcut key. Default Example By default, no custom menu commands are defined. The User switch setting
User : "~Print", XPRINT, "/2 %s", LPT1, , \ "Print the current file with XPRINT", y, n, n, 8
specifies the following custom Run menu command:

Option   Description
title    The menu title is Print with the accelerator P.
path     The XPRINT program is expected to be on the PATH.
arg      The default command line specifies the /2 option and the current filename.
out      The program output is redirected to the LPT1 device.
dir      The XPRINT program is run in the current directory.
help     The Help line is "Print the current file with XPRINT".
prompt   PWB prompts for additional arguments.
ask      PWB doesn't prompt before returning from XPRINT.
back     The XPRINT program is to run in the foreground.
key      ALT+F8 runs the XPRINT program after prompting.
The backslash at the end of the first line of the definition is a TOOLS.INI line continuation.

See
Printcmd, _pwbusern, Usercmd
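The way the ten User arguments line up with the setting can be sketched outside PWB. The following Python fragment is only an illustration: it uses the standard `csv` module as a stand-in for PWB's own parsing (an assumption), the names `FIELDS` and `parse_user_switch` are hypothetical, and the two-line setting from the example is joined into one string.

```python
import csv

# Argument names in the order given by the User switch syntax.
FIELDS = ["title", "path", "arg", "out", "dir",
          "help", "prompt", "ask", "back", "key"]

def parse_user_switch(value):
    """Split a User setting into named arguments (approximation of PWB's rules)."""
    # skipinitialspace lets a quoted argument follow ", " as in the example.
    row = next(csv.reader([value], skipinitialspace=True))
    row += [""] * (len(FIELDS) - len(row))  # omitted trailing arguments stay empty
    return dict(zip(FIELDS, row))

# The example setting, with the TOOLS.INI line continuation already joined.
setting = ('"~Print", XPRINT, "/2 %s", LPT1, , '
           '"Print the current file with XPRINT", y, n, n, 8')
item = parse_user_switch(setting)

# key is a digit 1-9 that selects ALT+F1 through ALT+F9.
shortcut = "ALT+F" + item["key"] if item["key"] else None
```

An empty string for dir corresponds to the omitted argument between the two commas, which leaves the default (PWB's current directory) in effect.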
Vscroll
Type
Numeric

The Vscroll switch controls the number of lines scrolled up or down when you move the cursor out of the window. When the window is smaller than the full screen, the amount scrolled is in proportion to the window size.

The minimum value for Vscroll is 1. Text is never scrolled in increments greater than the size of the window.

The Mlines and Plines functions also scroll according to the value of the Vscroll switch.

Default
Vscroll:1

See
Hscroll
Width
Type
Numeric

The Width switch controls the width of the display. Only an 80-column display is supported.

Default
Width:80

See
Height
Word
Type
Text

Syntax
Word: "regular_expression"

"regular_expression"
    A macro string specifying a UNIX-syntax regular expression that matches a word.

The Word switch specifies a case-sensitive UNIX regular expression that matches a word. The Unixre and Case switches are ignored.

The Word switch accepts a TOOLS.INI macro string. The string can use escape sequences to represent nonprintable ASCII characters. Note that backslashes (\) must be doubled within a macro string.

The Word switch is used by functions that operate on words: Mword, Pword, Pwbhelp, right-clicking the mouse for Help, and double-clicking the mouse to select a word.

Default
Word:"[a-zA-Z0-9_$]+"

The default value mimics the behavior of PWB 1.x.

Examples
The Word switch can be used to change the definition of a word. The following examples show some useful word definitions.

The following setting works the same way as the default setting, except that Pword and Mword stop at the end of a line:
Word:"\\{[a-zA-Z0-9_$]+\\!$\\}"
The default setting of the Word switch matches Microsoft C/C++ identifiers and unsigned integers. To restrict the definition of a word to match the ANSI C standard for identifiers, you would use the setting:
Word:"[a-zA-Z_][a-zA-Z0-9_]*"
Another useful setting is to define a word as a contiguous stream of nonspace characters:

Word:"[^ \t]+"
The following Word setting defines a word as an identifier or unsigned integer, a stream of white space, a stream of other characters, or the beginning or end of the line. This causes the word-movement functions to stop at each boundary, and allows a double-click to select white space.
Word:"\\{[a-zA-Z0-9_$]+\\![ ]+\\![^a-zA-Z0-9_$]+\\!$\\!^\\}"
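The simpler definitions above use only bracket expressions, which behave the same way in most regular-expression engines. As a rough illustration outside PWB, the following Python sketch applies three of them with the standard `re` module. The sample text and the helper name `words` are hypothetical, and PWB's alternation syntax (`\{`, `\!`, `\}`) is not valid in Python, so the settings that use it are not shown.

```python
import re

# Patterns copied from the Word settings above, without macro-string quoting.
DEFAULT_WORD = r"[a-zA-Z0-9_$]+"          # default: identifiers and unsigned integers
ANSI_C_IDENT = r"[a-zA-Z_][a-zA-Z0-9_]*"  # ANSI C identifiers only
NONSPACE_RUN = r"[^ \t]+"                 # any contiguous run of nonspace characters

def words(pattern, line):
    """Return every non-overlapping match of pattern in line, left to right."""
    return re.findall(pattern, line)

sample = "total$ = 2count + _tmp;"  # hypothetical source line

default_words = words(DEFAULT_WORD, sample)
ansi_words = words(ANSI_C_IDENT, sample)
nonspace_words = words(NONSPACE_RUN, sample)
```

With the ANSI C definition, the `$` and the leading digit are no longer part of a word, so `total$` and `2count` are seen as `total` and `count`.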
Wordwrap
Type
Boolean

The Wordwrap switch determines if PWB performs automatic word wrap as you enter text. When word wrapping is turned on and you type a nonspace character past the column specified by Rmargin, PWB brings the current word down to a new line. A word is defined by the Word switch.

Syntax
Wordwrap:{ yes | no }

yes
    Wrap words as you enter text.
no
    Do not wrap.

Default
Wordwrap:no

See
Rmargin
Browser Switches
The PWBBROWSE extension provides the following switches to control the behavior of the Source Browser in PWB.
Browcase
Type
Numeric

The Browcase switch determines the initial case sensitivity of the browser when a database is opened. The browser consults this switch only when it opens the database.

This switch must appear in the [PWB-PWBROWSE] tagged section of TOOLS.INI.

A dot appears next to the Match Case command on the Browse menu when the browser matches case. Choose Match Case to turn case-sensitive browsing on and off. Changing the current state does not affect the value of the Browcase switch.

Syntax
Browcase:{ 0 | 1 | 2 }

0
    Use the case sensitivity stored in the database by BSCMAKE. The default case sensitivity matches the case sensitivity of the source language.
1
    Match case for browse queries.
2
    Ignore case for browse queries.
Default
Browcase:0
Browdbase
Type
Text

The Browdbase switch specifies the browser database to use. When this switch is not set, or the setting is empty, the browser uses the database for the current project (if any). You set this switch by using the Save Current Database command in the Custom Database Management dialog box.

This switch must appear in the [PWB-PWBROWSE] tagged section of TOOLS.INI.

Syntax
Browdbase: database

database
    The full filename of the browser database (.BSC file) to use. When database is not specified, the browser uses the database for the open project.
Help Switches
The PWBHELP extension provides the following switches to control the behavior of the Help system in PWB.
Color (Help Colors)

The PWBHELP extension defines the following Color switches to set the colors for items displayed in the Help window. These switches must appear in the [PWB-PWBHELP] tagged section of TOOLS.INI. When you choose OK in the Save Colors dialog box, PWB automatically writes the new settings to the correctly tagged section of TOOLS.INI.
Name                   Default Value   Description
Color: Helpnorm        87              Plain Help text
Color: Helpbold        8F              Bold Help text
Color: Helpitalic      8A              Italic Help text and the characters
Color: Helpunderline   8C              Emphasized Help text
Color: Helpwarning     70              Current hyperlink
For a complete description of the Color switch, see Color.
Helpautosize
Type
Boolean

The Helpautosize switch determines if PWB displays the Help window according to the size of the current topic or displays Help with its previous size and position.

This switch must appear in the [PWB-PWBHELP] tagged section of TOOLS.INI.

Syntax
Helpautosize: { yes | no }

yes
    When displaying a new topic, automatically resize the Help window to the size of the topic.
no
    Do not automatically resize the Help window. The Help window is displayed with its previous size and position.

Default
Helpautosize:no

Update
In PWB 1.x, the Help window is always automatically resized. In PWB 2.00, the Help window is not resized by default.
Helpfiles
Type
Text

The Helpfiles switch lists Help files or directories containing Help files that PWB should open in addition to the Help files listed in the HELPFILES environment variable.

This switch must appear in the [PWB-PWBHELP] tagged section of TOOLS.INI.

Syntax
Helpfiles: [[file]][[;file]]...

file
    The filename of a Help file to open or the name of a directory. If a directory name is used, all Help files in the directory are opened. Each file can contain wildcards or environment-variable specifiers.

Default
Helpfiles:

By default, PWB uses only the Help files in the current directory and those listed in the HELPFILES environment variable.
Helplist
Type
Boolean

The Helplist switch determines if PWB searches every Help file when you request Help or displays the first occurrence of the topic that it finds.

This switch must appear in the [PWB-PWBHELP] tagged section of TOOLS.INI.

Syntax
Helplist: { yes | no }

yes
    Displays a list of Help files that contain the topic you requested Help on when the topic is defined more than once.
no
    Does not display a list of topics. PWB displays the first Help associated with the requested topic. To see the other Help screens that define the topic, use the Next command on the Help menu.

Default
Helplist: yes
Helpwindow
(obsolete) The PWB 1.x Helpwindow switch is obsolete and does not exist in PWB 2.00. PWB 2.00 always displays Help in the Help window.
Part 2
The CodeView Debugger

Chapter 8    Getting Started with CodeView
Chapter 9    The CodeView Environment
Chapter 10   Special Topics
Chapter 11   Using Expressions in CodeView
Chapter 12   CodeView Reference

Chapter 8
Getting Started with CodeView
Microsoft CodeView is a window-oriented debugging tool that helps you find and correct errors in MASM and Microsoft C/C++ programs. With CodeView, you can examine source-level code and the corresponding compiled code at the same time. You can execute your code in increments and view and modify data in memory as your program runs. Your MASM 6.10 package includes CodeView for MS-DOS (CV.EXE) and CodeView for Windows (CVW.EXE). The names CodeView, CodeView debugger, and the debugger refer to both versions unless the discussion indicates otherwise. This chapter shows you how to:
- Write programs to make debugging easier.
- Formulate a debugging strategy.
- Compile and link your programs to include Microsoft Symbolic Debugging Information.
- Set up the files CodeView needs.
- Configure CodeView with TOOLS.INI.
- Start CodeView and load a program.
- Use the CodeView command-line options.
- Use or disable the CURRENT.STS state file.
Preparing Programs for Debugging

You can use CodeView to debug any MS-DOS or Windows-based executable file produced from MASM or Microsoft C/C++ source code. Compiling means producing object code from source files. All references to compiling also apply to assembling unless stated otherwise.
General Programming Considerations

This section describes programming practices that make debugging with CodeView easier and more efficient.
Multiple Statements on a Line

CodeView treats each source-code line as a unit. For this reason, you cannot trace and set a breakpoint on more than one statement per line. You can change from Source display mode to Mixed or Assembly display mode (see The Source Windows on page 324) and then set breakpoints at individual assembly instructions. If a single statement is broken across multiple lines, you may be able to set breakpoints on only the starting or ending line of the statement.
Macros and Inline Code

Microsoft C, C++, and MASM support macro expansion. Microsoft C and C++ also support inline code generation. These features pose a debugging problem because a macro or an inlined function is expanded where it is used, and CodeView has no information about the source code. This means that you cannot trace or set breakpoints in a macro or inlined function when debugging at the source level. To work around this condition, you can:
- Manually expand the macro to its corresponding source code.
- Rewrite the macro as a function.
- Suppress inline code generation with the /Ob0 compiler option.
You can often rewrite macros as inline functions, then selectively disable inlining with a compiler option or pragma so that you can step and trace the routine. Rewriting macros as inlined functions can have additional benefits such as argument type checking. However, in some cases the best solution for debugging macros or inline code is to use Assembly or Mixed display mode.
Segment Ordering and Naming

For assembly-language programs, you must declare your segments according to the standard Microsoft high-level language format. MASM versions 5.10 and later provide directives to specify the standard segment order and naming.
Programs that Alter the Environment

Programs that run under CodeView can read the environment table, but they cannot permanently change it. When you exit CodeView, changes to the environment are lost.
Programs that Access the Program Segment Prefix

CodeView processes the command line in the program segment prefix (PSP) the same way as the C/C++ run-time library does. Quotation marks are removed, and exactly one space is left between command-line arguments. As a result, a program that accesses the PSP directly cannot expect the command line to appear exactly as typed.

After you compile and link your program into a running executable file, you can begin debugging with CodeView. To take full advantage of CodeView, however, you must compile and link with the options that generate CodeView Symbolic Debugging Information. This book refers to this information as CodeView information, debugging information, or symbolic information. The CodeView information tells CodeView about:
- All program symbols, including locals, globals, and publics
- Data types
- Line numbers
- Segments
- Modules
Without this information, you cannot refer to any source-level names, and you can view the program only in Assembly display mode. When CodeView loads a module that does not contain symbolic information, CodeView starts in Assembly mode and displays the message:
CV0101 Warning: No symbolic information for PROGRAM.EXE
You get this message if you try to debug an executable file that you did not compile and link with CodeView options, if you use a compiler that does not generate CodeView information, or if you link your program with an old version of the linker. If you retain an old linker version and it is first in your path, the proper information may not be generated for CodeView.

You can specify CodeView compiler and linker options from the command line, in a makefile, or from within the Microsoft Programmer's WorkBench (PWB). To compile and link your program with CodeView options from PWB, choose Build Options from the Options menu, and turn on Use Debug Options. By default, all project templates enable the generation of CodeView information for debug builds.
Assembler/Compiler Options
You can specify CodeView options when you assemble a source file of a program you want to debug. Specify the /Zi option on the command line or in a makefile to instruct the assembler to include line-number and complete symbolic information in the object file.

Symbolic information takes up a large amount of space in the executable file and in memory while debugging. If you do not need full symbolic information in some modules, compile those modules with the /Zd option. The /Zd option specifies that only line numbers and public symbols are included in the object file. In such modules you can view the source file and examine and modify global variables, but type information and names with local scope are not available.

In modules that are assembled or compiled with the /Zd option, all names are displayed in their decorated form, and you can refer to them only by their decorated names. The decorated name is the form of the name in the object code produced by the compiler. With full debugging information, CodeView can translate between the source form of the name and the decorated name.

Name decoration encodes additional information into a symbol's name by adding prefixes and suffixes. For example, the C compiler prefixes the names of functions that use the C calling convention with an underscore. You often see decorated names for library routines in disassembly or output from the Examine Symbols (X) command. For more information on decorated names, see "Symbol Formats" on page 385.

All Microsoft high-level language compilers are optimizing compilers that may rearrange and remove source code. As a result, optimizations destroy the correspondence between source lines and generated machine code, which can make debugging especially difficult. While you are debugging, you should disable optimizations with the /Od compiler option. When you finish debugging, you can compile a final version of your program with full optimizations.

Note   The /Od option does not pertain to MASM.
Linker Options
When you are using Microsoft C/C++ or the Microsoft Assembler, you must use the Microsoft Segmented Executable Linker (LINK) version 5.30 or later to generate an executable file with CodeView information. If you include debugging options when you compile, the compiler automatically invokes the linker with the appropriate options. In turn, LINK runs the CVPACK utility, which compresses the symbolic information.

When compiling, you can specify the compile-only (/c) option to disable running LINK. To include debugging information when you link the object modules separately, specify the LINK /CO option. LINK automatically runs CVPACK when you specify /CO.

If you link with the /EXEPACK option, you must execute the program's startup code before setting breakpoints in the program. If you set breakpoints in a packed executable file before the startup code has executed, CodeView behavior is unpredictable.

An executable file that includes debugging information can be executed from the command line like any other program. However, to minimize the size of the final version of the program, compile and link without the CodeView options.

Examples
The following command sequence assembles and links two files:
ML /c /Zi MOD1.ASM
ML /c /Zd MOD2.ASM
LINK /CO MOD1 MOD2
This example produces the object file MOD1.OBJ, which contains line-number and complete symbolic information, and the object file MOD2.OBJ, containing only line-number and public-symbol information. The object files are then linked to produce a smaller file than the file that is produced when both modules are assembled with the /Zi option. The following commands produce a mixed-language executable file:
CL /Zi PROG.CPP
CL /Zi /Od /c /AL SUB1.C
ML /c /Zi /MX SUB2.ASM
LINK /CO PROG SUB1 SUB2
You can use CodeView to trace through C, C++, and MASM source files in the same session.
Debugging Strategies
The process of debugging a program varies from programmer to programmer and program to program. This section offers some guidelines for detecting bugs. If you are familiar with symbolic, source-level debuggers, you can skip this section.
Identifying the Bug

If your program crashes or yields incorrect output, it has a bug. There are times, however, when a program runs correctly with some input but produces incorrect output or crashes with different input. You can assume a bug exists, but finding it may be difficult.
Locating the Bug

You may not need to use CodeView to find bugs in simple programs. For more complex programs, however, using CodeView can save you debugging time and effort.
Setting Breakpoints
When you debug with CodeView, you usually cycle between two activities:
- Running a small part of the program
- Stopping the program to check its status
You use breakpoints to switch between these tasks. CodeView runs your program until it reaches a breakpoint. At that time, CodeView gives you control. You can then enter CodeView commands in the Command window or use the menus and shortcut keys to proceed. To find an error, try the following:
- Set breakpoints around the place you think the bug might be. Execute the program with the Go command so that it runs at full speed until it reaches the area that you suspect harbors the bug. You can then execute the program step by step with the Program Step and Trace commands to see if there is a program execution error.
- Set breakpoints when certain conditions become true. You can, for example, set a breakpoint to check a range of memory starting at DS:00, the base of your program's data. If your program writes to memory using a null pointer, the breakpoint is taken, and you can see what statement or variables within the statement are in error.
Setting Watch Expressions

Watch expressions constantly display the values of variables in the Watch window. By setting a Watch expression, you can see how a variable or an expression changes as your program executes. Try using watch expressions as follows:
- Set a Watch expression on an important variable. Then step through a part of the program where you suspect there is a bug. When you see a variable in the Watch window take on an unexpected value, you know that there is probably a bug in the line you just executed.
- Explore Watch expressions. A bug can appear when your program builds complex data structures. Both the Watch window and the Quick Watch dialog box allow you to explore the data structure by expanding arrays and pointers. Use this feature to make sure the program creates the data structure correctly. As soon as you execute code that destroys the structure, you have probably found a bug.
Arranging Your Display

Your display can be more effective if you arrange your windows so that they display the information you need. You will need at least one Source window. You can open a second Source window to see each assembly-language instruction. You may also need one or more Memory windows to examine ranges of memory in various formats.

You may want to change values in memory. For example, a program that does its own dynamic-memory allocation may need an initialized block of memory. You can edit memory directly in the Memory window or fill the block with zeros using the Memory Fill command. If a certain value is required for a mathematical function, you can type over values displayed in the Memory window or assign the value in the Command window. If you expect a value to appear at a certain location and it does not, you can use the Memory Search command to find it.

Use the Register window to see the CPU registers and the Local and Watch windows to keep track of changing variable values. Open the Calls menu to examine your program's stack to see what routines have been called.

You can set up CodeView's windows to display the information you want to see by using keyboard commands or the commands in the Window menu. For example, when you press SHIFT+F5 or choose Tile from the Windows menu, CodeView arranges all open windows to fill the entire window area. When the windows are tiled, you can press ALT+F5 or choose Arrange from the Windows menu. This allows you to move your open windows with a mouse so that you can view several or all of them at once.
Setting up CodeView
The MASM SETUP program installs all the necessary CodeView files. Make sure that all of the CodeView executable files (.EXE and .DLL files) are in a directory listed in the PATH environment variable.

In addition, SETUP creates TOOLS.PRE in the INIT directory that you specify when you run SETUP. If you do not already have a TOOLS.INI file, rename TOOLS.PRE as TOOLS.INI. This file contains the recommended settings to run CodeView for MS-DOS and CodeView for Windows. For more information on the entries in TOOLS.INI, see "Configuring CodeView with TOOLS.INI" on page 301.

CodeView version 4.0 introduces a new, flexible architecture for the debugger. CodeView is made up of a main executable program, CV.EXE (CodeView for MS-DOS) or CVW.EXE (CodeView for Windows), and a collection of dynamic-link libraries (DLLs). Each DLL implements an aspect of the debugging process. The following table summarizes CodeView's component DLLs:
TOOLS.INI Entry   Component                              Required   Example
Eval              Expression evaluator                   Required   C or C++
Model             Additional nonnative execution model   Optional   P-code
Native            Native execution model                 Required   MS-DOS or Windows
Symbolhandler     Symbol handler                         Required   MS-DOS or Windows
Transport         Transport layer                        Required   Local or remote
This architecture allows for the implementation of such improbable debugging configurations as a Windows operating system-hosted debugger that debugs interpreted Macintosh programs across a network. The existing CVW.EXE could be used with new transport, symbol handling, and execution model DLLs. Instead of creating completely different programs for each combination of host and target, all that is needed is the appropriate set of DLLs.
CodeView Files
CodeView for Windows and CodeView for MS-DOS use several additional files. One of these is the executable program file that you are debugging. CodeView requires one executable (.EXE) file to load for debugging.

program.EXE
    An .EXE-format program to debug. CodeView assumes the .EXE extension when you specify the program to load for debugging.

source.ext
    A program source file. Your program may consist of more than one source file. When CodeView needs to load a source file for a module at startup or when you step into a new module, it searches directories in the following order:

    1. The compiled directory. This is the source-file path specified when you invoke the compiler.
    2. The directory where the program is located.

    If CodeView cannot find the source file in one of these directories, it prompts you for a directory. You can enter a new directory or press ENTER to indicate that you do not want a source file to be loaded for the module. If you do not specify a source file, you can debug only in Assembly mode.

CV.HLP
ADVISOR.HLP
    Help files for CodeView and the Microsoft Advisor. These two files are the minimum set of files required to use Help during a CodeView session. They must be in a directory listed in the HELPFILES environment variable or in the Helpfiles entry of TOOLS.INI. Depending on what programming environment you work in, you may also want to use the various programming language and p-code help files.

TOOLS.INI
    Specifies paths for CodeView .DLL files and other files that CodeView uses. The MASM SETUP program creates the file TOOLS.PRE in the directory specified in your INIT environment variable. If CodeView cannot find the modules it needs in its own directory, it looks for entries in TOOLS.INI that specify paths for the modules it needs. You can include other settings for CodeView in TOOLS.INI.

TOOLHELP.DLL
    System support .DLL for CVW.

Remote debugging requires additional files and a different configuration. The files and configuration required for remote debugging are described in Chapter 10, "Special Topics."
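The two-step source-file search described for source.ext in this section can be sketched as follows. This is an illustration only, not CodeView's implementation: the function name, its parameters, and the caller-supplied `exists` check are hypothetical, and Python's `ntpath` module is used simply to build MS-DOS style paths.

```python
import ntpath

def find_source(name, compiled_dir, program_dir, exists):
    """Look for a source file in the documented order.

    exists is a caller-supplied predicate (hypothetical) that reports
    whether a given path is present.
    """
    # 1. The compiled directory, 2. the directory where the program is located.
    for directory in (compiled_dir, program_dir):
        candidate = ntpath.join(directory, name)
        if exists(candidate):
            return candidate
    # Not found: at this point CodeView prompts for a directory instead.
    return None

# Hypothetical file layout used only to exercise the search order.
present = {r"C:\SRC\MOD1.ASM", r"C:\APP\MOD1.ASM", r"C:\APP\MOD2.ASM"}

first = find_source("MOD1.ASM", r"C:\SRC", r"C:\APP", present.__contains__)
second = find_source("MOD2.ASM", r"C:\SRC", r"C:\APP", present.__contains__)
missing = find_source("MOD3.ASM", r"C:\SRC", r"C:\APP", present.__contains__)
```

When the same file exists in both places, the copy in the compiled directory is chosen because that directory is searched first.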
Configuring CodeView with TOOLS.INI

You can configure CodeView and other Microsoft tools, including the Microsoft Programmer's WorkBench (PWB) and NMAKE, by specifying entries in the TOOLS.INI file. You must have separate sections in TOOLS.INI for each tool. TOOLS.INI sections begin with a tag, which is a line containing the base name of the executable file enclosed in brackets ([ ]). The tag must appear in column one. The CV and CVW section tags look like this:
[CV]
; MS-DOS CodeView entries
. . .
[CVW]
; Windows operating system CodeView entries
. . .
In the TOOLS.INI file, a line beginning with a semicolon (;) is a comment.
CodeView looks for certain entries following the tag. Each entry may be preceded by any number of spaces, but the entire entry must fit on one line. You may want to indent each entry for readability.
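The layout rules above (a tag in column one, comment lines that begin with a semicolon, and one indented entry per line) can be illustrated with a small reader. This is a minimal sketch, not how CodeView parses the file: the function name is hypothetical, tag and entry names are normalized for lookup as an assumption, indented comment lines are also skipped as an assumption, and line continuations are not handled.

```python
def parse_tools_ini(text):
    """Group 'entry:value' lines under the [tag] section they follow."""
    sections = {}
    current = None
    for raw in text.splitlines():
        # A tag must begin in column one, so test the unstripped line.
        if raw.startswith("[") and "]" in raw:
            current = raw[1:raw.index("]")].upper()
            sections.setdefault(current, {})
            continue
        line = raw.strip()  # entries may be preceded by any number of spaces
        if not line or line.startswith(";") or current is None:
            continue        # blank line, comment, or text before the first tag
        # Split at the first colon only; values such as C:\... contain colons.
        name, sep, value = line.partition(":")
        if sep:
            sections[current][name.strip().lower()] = value.strip()
    return sections

sample = """\
[CV]
; MS-DOS CodeView entries
    Helpbuffer:0
    Autostart:OF-;TF;Gmain
[CVW]
    Statefileread:n
"""
config = parse_tools_ini(sample)
```

Splitting each entry at the first colon keeps semicolon-separated values, such as an Autostart command list, in one piece.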
CodeView TOOLS.INI Entries

You may want to specify or change entries in TOOLS.INI to customize CodeView. Table 8.1 summarizes the TOOLS.INI entries.
Table 8.1   CodeView TOOLS.INI Entries

Entry           Description
Autostart       Commands to execute on startup
Color           Screen colors
Cvdllpath       Path to CodeView .DLL files
Eval            Expression evaluator
Helpbuffer      Size of help buffer
Helpfiles       List of help files
Model           Additional execution model (such as p-code)
Native          Native execution model
Printfile       Default name for print command or file
Statefileread   Read or ignore CURRENT.STS state file
Symbolhandler   Symbol handler
Transport       Transport layer
Autostart
The Autostart entry specifies a list of Command-window commands that CodeView executes on startup.

Syntax
Autostart:command[[;command]]...

command
    A command for CodeView to execute at startup. Separate multiple commands with a semicolon (;).

Example
The following entry automatically executes the program's run-time startup code. It specifies that CodeView always starts with the Screen Swap option off and the Trace Speed option set to fast.

Autostart:OF-;TF;Gmain
Color
The Color entry is retained only for compatibility with previous versions of CodeView. You should set screen colors with the Colors command on the Options menu.
Cvdllpath
The Cvdllpath entry specifies the default path for CodeView's dynamic-link libraries (DLLs). CodeView searches this path when it cannot find its DLLs in CodeView's directory or along the PATH environment variable. This entry is recommended.

Syntax
Cvdllpath:path

path
    The path to the CodeView .DLL files.
Eval
The Eval entry specifies an expression evaluator. The expression evaluator looks up symbols, parses, and evaluates expressions that you enter as arguments to CodeView commands. If there is no Eval entry in TOOLS.INI, CodeView loads the C++ expression evaluator by default. CodeView uses the specified expression evaluator when you are debugging modules with source files ending in the specified extensions.

Syntax
Eval:[[path\]]EEhost evaluator.DLL extension...

In the filename, EE is followed directly by the host specifier and then the evaluator specifier, with no spaces (for example, EED1CAN.DLL).

path
    The path to the specified expression evaluator.
host
    The host environment.

    Specifier   Operating Environment
    D1          MS-DOS
    W0          Windows

evaluator
    The source language expression evaluator.

    Specifier   Source Language
    CAN         C or MASM
    CXX         C, C++, or MASM

extension
    A source-file extension. CodeView uses the specified expression evaluator when it loads a source file with the given extension. You can list any number of extensions.
Example
The following example loads both the C and C++ expression evaluators for the MS-DOS CodeView:
Eval:C:\C700\DLL\EED1CAN.DLL .C .ABC .ASM .H
Eval:C:\C700\DLL\EED1CXX.DLL .CPP .CXX .XYZ .HXX
With the entries in this example, when you trace into a module whose source file has the extension .C, .ABC, or .ASM, CodeView uses the C expression evaluator. When you trace into a source file with a .CXX, .CPP, or .XYZ extension, CodeView switches to the C++ expression evaluator.

Note   The C++ expression evaluator is the only expression evaluator provided with MASM 6.10. For most MASM, C, and C++ programs the C++ expression evaluator is sufficient.

You can load expression evaluators after CodeView has started by using the Load command from the Run menu. You can override CodeView's automatic choice of expression evaluator by using the Language command on the Options menu or the USE command in the Command window.

For more information about choosing an appropriate expression evaluator and how to use expressions in CodeView, see Chapter 11, "Using Expressions in CodeView."
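The extension-based choice of evaluator in the Eval example can be sketched as a simple lookup. The following Python fragment is illustrative only: the table mirrors the two Eval entries from the example, the function name is hypothetical, and treating extensions as case-insensitive is an assumption.

```python
import os

# The two Eval entries from the example: evaluator DLL and its extensions.
EVAL_ENTRIES = [
    (r"C:\C700\DLL\EED1CAN.DLL", [".C", ".ABC", ".ASM", ".H"]),
    (r"C:\C700\DLL\EED1CXX.DLL", [".CPP", ".CXX", ".XYZ", ".HXX"]),
]

def evaluator_for(source_file, entries=EVAL_ENTRIES):
    """Return the evaluator DLL whose extension list matches source_file."""
    ext = os.path.splitext(source_file)[1].upper()  # assumption: ignore case
    for dll, extensions in entries:
        if ext in extensions:
            return dll
    # How CodeView treats unlisted extensions is not modeled here.
    return None

c_choice = evaluator_for("SUB1.C")
asm_choice = evaluator_for("SUB2.ASM")
cpp_choice = evaluator_for("PROG.CPP")
other_choice = evaluator_for("NOTES.TXT")
```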
Helpbuffer
The Helpbuffer entry specifies the size of the buffer CodeView uses to decompress help files. You can set Helpbuffer to 0 to disable Help and maximize the amount of memory available for debugging. Otherwise, specify a value between 1 and 256.

Syntax
Helpbuffer:size

size
    The number of kilobytes (K) of memory to use for decompressing help files. The default help buffer size is 24K. Specify 0 to disable help.

The following table shows values you can specify and the actual size of the buffer that is allocated:

Value Specified   Help Buffer Size
1-24              24K
25-128            128K
129-256           256K
The smallest buffer size is 24K, and the largest is 256K.
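The mapping from a Helpbuffer value to the buffer that is actually allocated can be written as a short function. This is a sketch of the rule as stated in this section, with a hypothetical function name; raising an error for values outside the documented range is an assumption about how to treat them, not documented CodeView behavior.

```python
def help_buffer_kb(value):
    """Return the help buffer size in K for a Helpbuffer setting."""
    if value == 0:
        return 0      # Help is disabled and no buffer is needed
    if 1 <= value <= 24:
        return 24     # smallest buffer
    if 25 <= value <= 128:
        return 128
    if 129 <= value <= 256:
        return 256    # largest buffer
    # Assumption: reject anything outside the documented 0 to 256 range.
    raise ValueError("Helpbuffer must be 0 or a value between 1 and 256")

sizes = [help_buffer_kb(v) for v in (0, 1, 24, 25, 128, 129, 256)]
```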
Helpfiles
The Helpfiles entry lists help files for CodeView to load. These files are loaded before any files listed in the HELPFILES environment variable.

Syntax
Helpfiles:file[[;file]]...

file
    A directory or help file. If you list a directory, CodeView loads all files with the .HLP extension in that directory. Separate multiple files or directories with a semicolon (;).
Model
The Model entry specifies an additional execution model that CodeView uses when you are debugging nonnative code such as p-code. The execution model handles tasks specific to the type of executable code that you are debugging.

Syntax
Model:[[path\]]NMhost model.DLL

path
    The path to the specified file.
host
    The host environment must be one of the following:
model
    A nonnative execution model. The p-code execution model (PCD) is required if you plan to debug p-code.

Example
Model:NMD1PCD.DLL
Native
The Native entry specifies the native execution model. This DLL handles tasks that are specific to the machine and operating system on which you are running (the host) and specific to the native code (the target).

Syntax
Native:[[path\]]EMhost target.DLL

path
    The path to the specified native execution model.
host
    The host environment must be one of the following:
target
    The target environment must be one of the following:
Printfile
The Printfile entry lists the default device name or filename used by the Print command on the File menu. This can be a printer port (for example, LPT1 or COM2) or an output file. If Printfile is omitted, CodeView prints to a file named CODEVIEW.LST in the current directory. This entry is ignored by CVW, which does not have the Print command.

Syntax
Printfile:path

path
    The path to the specified output file or the name of a device.
Statefileread
The Statefileread entry tells CodeView to read or ignore the CodeView state file (CURRENT.STS) on startup. You can toggle this setting from the command line using the /TSF (Toggle State File) option. These options have no effect on writing CURRENT.STS. CodeView always saves its state on exit. Syntax Statefileread:[[y | n]] y (yes) CodeView reads CURRENT.STS on startup. n (no) CodeView ignores CURRENT.STS on startup.
Symbolhandler
The Symbolhandler entry specifies a symbol handler. The symbol handler manages the CodeView symbol and type information.
Syntax
Symbolhandler:[[path\]]SHhost.DLL path The path to the symbol handler. host The host environment must be one of the following:
Transport
The Transport entry specifies a transport layer. A transport layer provides the data link for communication between the host and target during debugging. Syntax Transport :path\TLhost transport.DLL [[COM{1|2}:[[rate]]]] path The path to the specified transport layer. host The host environment must be one of the following:
transport Specifies a transport layer.

Specifier    Transport Layer
LOC          Local transport layer
COM          Serial remote transport layer
The optional [[COM{1|2}:[[rate]]]] specifies a communications port and baud rate for remote debugging. No space is allowed between COM and the port number (1 or 2). The default port is COM1. The <rate> can be any number from 50 through 9600. The default rate is 9600. You specify the local transport layer (LOC) when the debugger and the program you are debugging are running on the same machine. With the appropriate transport layer, CodeView can support remote debugging across serial lines or networks. For more information on remote debugging, see Chapter 10.
The following example specifies the transport layer for debugging a program that is running on the same machine. Example
Transport:C:\C700\DLL\TLW0LOC.DLL
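The rules for the optional port and rate in the Transport entry (COM1 or COM2, no space before the port number, a rate from 50 through 9600, defaults of COM1 and 9600) can be checked with a short routine. This Python sketch is illustrative only; parse_port_spec is a hypothetical helper, not a CodeView facility, and accepting lowercase input is an assumption.

```python
def parse_port_spec(spec=""):
    """Parse an optional 'COM{1|2}:[rate]' string into (port, rate)."""
    if not spec:
        return ("COM1", 9600)          # documented defaults
    port, sep, rate_text = spec.upper().partition(":")
    if port not in ("COM1", "COM2") or sep != ":":
        raise ValueError("expected COM1: or COM2:, optionally followed by a rate")
    rate = int(rate_text) if rate_text else 9600   # rate is optional
    if not 50 <= rate <= 9600:
        raise ValueError("rate must be from 50 through 9600")
    return (port, rate)
```

For example, parse_port_spec("COM2:2400") returns ("COM2", 2400), and parse_port_spec("COM1:") falls back to the default rate of 9600.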
Memory Management and CodeView

CodeView for MS-DOS (CV) requires at least 2 megabytes of memory. The memory must be managed by a Virtual Control Program Interface (VCPI) server, DOS Protected-Mode Interface (DPMI) server, or extended memory (XMS) manager. These drivers manage memory at addresses above 1 megabyte on an 80286, 80386, or 80486 machine. CodeView loads itself and the debugging information for the program into high memory. In this way, CodeView uses only approximately 17K of conventional MS-DOS memory. CodeView can use the following memory managers:
• A VCPI server such as EMM386.EXE or EMM386.SYS. With a VCPI server, your program is also able to use EMS memory. To use this memory manager you must have a command in your CONFIG.SYS file such as:
DEVICE=C:\DOS\EMM386.EXE ram
• A DPMI server such as 386max.
• An Extended Memory Specification (XMS) driver such as HIMEM.SYS. To use this memory manager you must have a command in your CONFIG.SYS file such as:
DEVICE=C:\DOS\HIMEM.SYS
For more information about using memory managers, see your memory manager's documentation. When you make new entries in your CONFIG.SYS file, remember to reboot your system so that your changes take effect.
The CodeView Command Line

You can specify CV or CVW options when you start them from the command line. You can also specify commands from within the CodeView environment to modify these startup arguments. Syntax CV[[W]] [[options]] [[program [[arguments]] ]] CV[[W]] @file [[program [[arguments]] ]] W Indicates the Windows operating system version of CodeView.
options One or more options. The CodeView options are described in the "Command-Line Options" section on page 310.
program Program to be debugged. Specifies the name of an executable file to be loaded by the debugger. If you specify program as a filename with no extension, CodeView searches for a file with the extension .EXE. If you do not specify a program, CodeView starts up and displays the Load dialog box where you can specify a program and its command-line arguments.
arguments The program's command-line arguments. All remaining text on the CodeView command line is passed to the program you are debugging as its command line. If the program you are debugging does not accept command-line arguments, you do not need to specify any. Once you've started debugging, you can change the program's command-line arguments.
@file File of command-line arguments. You can also specify arguments in a text file. The file contains a list of arguments, one per line. An argument file lets you specify a large number of arguments without exceeding the operating-system limit on the length of a command line. This is especially useful when starting a session that uses many DLLs.
After CodeView loads its DLLs, processes the debugging information, and loads the source file, the CodeView display appears. If you do not specify a program to debug or CodeView cannot find all of its required DLLs, CodeView prompts for the necessary files. After starting up, CodeView is at the beginning of the program startup code, and you are ready to start debugging. At this point, you can enter an execution command (such as Trace or Program Step) to execute through the startup code to the beginning of your program.
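The rule for a program name given without an extension can be pictured with a few lines of Python. This is a sketch only: resolve_program_name is a hypothetical helper, and it models just the documented .EXE default, not how CodeView actually searches directories for the file.

```python
import ntpath  # MS-DOS style path handling, available on any platform


def resolve_program_name(program):
    """Return the filename to look for, adding .EXE when no extension is given."""
    _root, ext = ntpath.splitext(program)
    return program if ext else program + ".EXE"
```

With this sketch, CALCPR becomes CALCPR.EXE, while a name that already carries an extension is left unchanged.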
Leaving CodeView
To exit CodeView at any time, choose the Exit command from the File menu. You can also press ALT+F4, or type Q (for Quit) in the Command window. At this point, you may want to skip ahead to the next chapter, "The CodeView Environment," for information on CodeView's menus and windows. The rest of this chapter describes each command-line option in detail, then continues with a description of how PWB and CodeView use the CURRENT.STS file.
Command-Line Options
CV and CVW accept some of the same options for debugging. Table 8.2 summarizes the CodeView command-line options.
Table 8.2 CodeView Command-Line Options
Option           CV    CVW   Description
/2               Yes   Yes   Use two displays
/8               No    Yes   Use 8514 and VGA displays
/25, /43, /50    Yes   Yes   Set 25-line, 43-line, or 50-line mode
/B               Yes   Yes   Use black-and-white display
/Ccommands       Yes   Yes   Execute commands
/F               Yes   No    Flip video pages
/G               Yes   Yes   Control snow on CGA displays
/I[0 | 1]        Yes   Yes   Trap NMIs and 8259 interrupts
/Ldll            No    Yes   Load DLL or application symbols
/M               Yes   Yes   Disable mouse
/N[0 | 1]        Yes   Yes   Trap nonmaskable interrupts
/S               Yes   No    Swap video buffers
/TSF             Yes   Yes   Read or ignore state file
/X               No    Yes   Set starting X coordinate (pixels)
/Y               No    Yes   Set starting Y coordinate (pixels)
The remainder of this section describes each option in detail.
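If you build CV or CVW command lines from a script, the support columns of Table 8.2 can be kept in a lookup so that an option one debugger does not accept is caught early. The Python sketch below transcribes the table; the names OPTION_SUPPORT and unsupported_options are hypothetical, options are listed without their arguments, and the last CVW entry in the table is assumed to read Yes.

```python
# Support flags transcribed from Table 8.2: option -> (CV, CVW).
OPTION_SUPPORT = {
    "/2": (True, True),
    "/8": (False, True),
    "/25": (True, True),
    "/43": (True, True),
    "/50": (True, True),
    "/B": (True, True),
    "/C": (True, True),
    "/F": (True, False),
    "/G": (True, True),
    "/I": (True, True),
    "/L": (False, True),
    "/M": (True, True),
    "/N": (True, True),
    "/S": (True, False),
    "/TSF": (True, True),
    "/X": (False, True),
    "/Y": (False, True),   # assumption: the table entry is "Yes"
}


def unsupported_options(debugger, options):
    """Return the options from the list that the given debugger does not accept."""
    column = {"CV": 0, "CVW": 1}[debugger.upper()]
    return [opt for opt in options if not OPTION_SUPPORT[opt.upper()][column]]
```

For example, unsupported_options("CV", ["/2", "/8", "/X"]) returns ["/8", "/X"], because those two options apply only to CVW.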
Use Two Displays (CV, CVW)

Option /2 The /2 option permits the use of two monitors. The program display appears on the default monitor, while CodeView displays on the secondary monitor. You must have two monitors and two adapters to use the /2 option. The secondary display must be a monochrome adapter. If you are debugging a Windows-based application and have an IBM PS/2 with an 8514 primary display and a Video Graphics Adapter (VGA) secondary display, use the /8 option.
Use 8514 and VGA Displays (CVW)

Option /8 If your system is an IBM PS/2, you can configure it with an 8514 as the primary display and a VGA as the secondary display. To use this configuration, specify the /8 (8514) option on the CVW command line. If your VGA monitor is monochrome, use the /B (black-and-white) option as well. The 8514 serves as the Windows operating system screen and the VGA as the debugging screen. By default, the debugging screen operates in 50-line mode in this configuration. If you specify the /8 option, you can specify /25 or /43 for 25-line or 43-line mode on the debugging screen.
Warning Results are unpredictable if you attempt to run non-Windows-based applications or the MS-DOS shell while you are running CVW with the /8 option.
Set Line-Display Mode (CV, CVW)

Options /25 /43 /50 If you have the appropriate display adapter and monitor, you can display 25, 43, or 50 lines when you are running CV, and 25 or 50 lines when you are running CVW. The mode you specify is saved in the CURRENT.STS file so that it is still in effect the next time you run CodeView. CVW.EXE supports 25, 43 and 50 lines on VGA monitors. It does not support 50-line mode on EGA monitors. If you specify a mode that is not supported by your adapter and your monitor, CodeView displays 25 lines. To display 43 or 50 lines on a screen, you must use the OEM fonts supplied with CodeView. There are two OEM files: OEM08.FON for 50-line mode, and OEM10.FON for 43-line mode. To use these fonts, change the OEMFONTS.FON entry in your SYSTEM.INI file. For example, to use 50-line mode, change:
OEMFONTS.FON=VGAOEM.FON
to:
OEMFONTS.FON=C:\MASM\BIN\OEM08.FON
Use Black-and-White Display (CV, CVW)

Option /B When you start CodeView, it checks the kind of display adapter that is installed in your computer. If the debugger detects a monochrome adapter, it displays in black and white; if it finds a color adapter, it displays in color. The /B option tells CodeView to display in black and white even if it detects a color adapter. If you use a monochrome display or laptop computer that simulates a color display, you may want to disable color. These displays may be difficult to read with CodeView's color display. You can also customize CodeView's colors by choosing the Colors command from the Options menu. For more information, see "Colors" on page 345.
Execute Commands (CV, CVW)

Option /Ccommands You type commands in the CodeView Command window. You can also specify Command-window commands when you start CodeView. The /C option allows you to specify one or more CodeView Command-window commands to be executed upon startup. If you specify more than one command, you must separate each one with a semicolon (;). If the commands contain spaces or redirection symbols (< or >), enclose the entire option in double quotation marks ("). Otherwise, the debugger interprets each argument as a separate CodeView command-line argument rather than as a Command-window command. For complete information on CodeView Command-window commands, see Chapter 12, "CodeView Reference."
Examples The following example loads CV with CALCPR as the executable file and /p TST.DAT as the program's command line:
CV /CGmain CALCPR /p TST.DAT
Upon startup, CV executes the high-level language startup code with the command Gmain. Since no space is required between the command (G) and its argument (main), there is no need to enclose the option in double quotation marks. The next example loads CV with CALCPR as the executable file and /p TST.DAT as the program's command line. It starts CodeView with a long list of startup commands.

CV "/C VS &;G signal_lpd;MDA print_buffer L 20" CALCPR /p TST.DAT
CodeView starts with the Source window displaying in Mixed mode (VS &). Then it executes up to the function signal_lpd with the command G signal_lpd. Next, it dumps 20 characters starting at the address of print_buffer with the command MDA print_buffer L 20. Since some of the commands use spaces, the entire /C option is enclosed in quotation marks. In this example, the command directs CV to take Command-window input from the file SCRIPT.TXT rather than from the keyboard:
CV "/C<SCRIPT.TXT" CALCPR TST.DAT
Although the option does not include any spaces, you must enclose it in quotation marks so that the less-than symbol (<) is read by CodeView rather than by the operating-system command processor.
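The quoting rule for /C can be summarized in code: join the commands with semicolons, and wrap the whole option in double quotation marks only when it contains a space or a redirection symbol. The following Python sketch is illustrative; build_c_option is a hypothetical helper for scripts that assemble a CodeView command line, not part of CodeView itself.

```python
def build_c_option(commands):
    """Build a /C option from a list of Command-window commands."""
    joined = ";".join(commands)          # multiple commands are separated by ;
    option = "/C" + joined
    # Quote the entire option if it has spaces or redirection symbols.
    if any(ch in joined for ch in " <>"):
        return '"' + option + '"'
    return option
```

For example, build_c_option(["Gmain"]) returns /CGmain without quotation marks, while build_c_option(["<SCRIPT.TXT"]) returns "/C<SCRIPT.TXT" with them.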
Set Screen-Exchange Method (CV)

Options /F /S CodeView allows you to move between the output screen, which contains your program display output, and the CodeView screen, which contains the debugging display. In MS-DOS, CodeView can perform this screen exchange in two ways: screen flipping or screen swapping. The /F (flipping) and /S (swapping) options allow you to choose the method from the command line. These two methods are:
Flipping Flipping is the default for a computer with a graphics adapter. CodeView uses the graphics adapter's video-display pages to store each screen of text. Flipping is faster than swapping and uses less memory, but it cannot be used with a monochrome adapter or to debug programs that use graphic video modes or the video-display pages. CodeView ignores the /F option if you have a monochrome adapter.
Swapping Swapping is the default for computers with monochrome adapters. It has none of the limitations of flipping, but it is slower than flipping and requires more memory. To swap screens, CodeView creates a buffer in memory and uses it to store the screen that is not displayed. When you request the other screen, CodeView swaps the screen in the display buffer for the one in the storage buffer. When you use screen swapping, the buffer is 16K bytes for all adapters. The amount of memory CodeView uses is increased by the size of the buffer.
Suppress Snow (CV, CVW)

Option /G The /G option suppresses snow that can appear on some CGA displays. Use this option if your CodeView display is unreadable because of snow.
Specify Interrupt Trapping (CV, CVW)

Options /I[[0 | 1]] /N[[0 | 1]] The /I option tells CV whether to handle nonmaskable-interrupt (NMI) and 8259-interrupt trapping. The /N option controls only CodeView's handling of NMIs and does not affect handling of interrupts generated by the 8259 chip. The following table summarizes the options and their effects:
Option      Effect
/I0         Trap NMIs and 8259 interrupts
/I1, /I     Do not trap NMIs or 8259 interrupts
/N0         Trap NMIs
/N1, /N     Do not trap NMIs
You may need to force CodeView to trap interrupts with /I0 on computers that CodeView does not recognize as IBM compatible. Using /I0 enables the CTRL+C and CTRL+BREAK interrupts on such computers.
Load Other Files (CVW)

Option /Ldll /Lexe To load symbolic information from a dynamic-link library (DLL) or from another application, use the /L option when you start CodeView. Specify /L for each DLL or application that you want to debug. When you place a module in a DLL, neither code nor debugging information for that module is stored in an application executable (.EXE) file. Instead, the code and symbols are stored in the library and are not linked to the main program until run time. The same is true for symbols in another application running within Windows. Thus, CVW needs to search the DLL or other application for symbolic information. Because the debugger does not automatically know which libraries to look for, use the /L option to preload the symbolic information.
Example
The following command starts CodeView for Windows:

CVW /LPRIORITY.DLL /LCAPPARSE.DLL PRINTSYS
CVW is used to debug the program PRINTSYS.EXE. CVW loads symbolic information for the dynamic-link libraries PRIORITY.DLL and CAPPARSE.DLL, as well as the file PRINTSYS.EXE.
Disable Mouse (CV, CVW)

Option /M If you have a mouse installed on your system, you can tell CodeView to ignore it by using the /M option. You may need to use this option if you are debugging a program that uses the mouse and there is a usage conflict between the program and CodeView.
Nonmaskable-Interrupt Trapping (CV, CVW)

Option /N For information on the /N option, see Specify Interrupt Trapping on page 314.
Set Screen Swapping (CV)

Option /S The /S option sets the CodeView screen-exchange method to swapping. For complete information on CodeView screen-exchange methods, see "Set Screen-Exchange Method" on page 313.
Toggle State-File Reading

Option /TSF The Toggle State File (/TSF) option either reads or ignores CodeView's state file and color files, depending on the Statefileread entry in the CodeView sections of TOOLS.INI. The /TSF option reverses the effect of the Statefileread entry. The Statefileread entry is set to yes by default. These options have no effect on writing the files. CodeView always saves its state on exit.
The effects of different combinations of Statefileread and /TSF are summarized in the following table:
/TSF             Statefileread      CodeView Result
Specified        y (or omitted)     Do not read files
Specified        n                  Read files
Not specified    y (or omitted)     Read files
Not specified    n                  Do not read files
The state file is CURRENT.STS. The color files are CLRFILE.CV4 for CV and CLRFILE.CVW for CVW.
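Because /TSF simply reverses the Statefileread entry, the four combinations reduce to a single comparison. The Python sketch below is an illustration of that logic only; the function name reads_state_files is hypothetical.

```python
def reads_state_files(statefileread="y", tsf=False):
    """Return True if CodeView reads its state and color files on startup.

    statefileread is the TOOLS.INI entry ("y" or "n"; "y" when omitted).
    tsf is True when /TSF is given on the command line.
    """
    entry_says_read = statefileread.lower() == "y"
    # /TSF toggles the entry, so the files are read when exactly one
    # of the two conditions holds.
    return entry_says_read != tsf
```

For example, reads_state_files("n", tsf=True) returns True: the entry says to ignore the files, and /TSF reverses it.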
Set Startup X and Y Coordinates

Options /X, /Y The window CodeView uses within Windows cannot be moved or sized while Windows is running. You can specify the position of the CodeView window with the /X and /Y options. In the Command Line field of the Program Item Properties dialog box, enter
C:\MASM\BIN\CVW.EXE /X:X /Y:Y
where X and Y are the pixel coordinates for the upper-left corner of the CodeView session window. (The location of your CVW.EXE file may be different.) Note that this still does not allow the CodeView window to be moved to another location on the Program Manager workspace. For more information on specifying command-line options with Windows operating system applications, see your Windows User's Guide.
The CURRENT.STS State File

CodeView and PWB save settings and state information in the CURRENT.STS file. The file contains information about the current state of the two environments. When you restart CodeView or PWB, they read CURRENT.STS and restore their previous state. CodeView uses additional files to save your most recent color settings. These files are CLRFILE.CV4 for CV and CLRFILE.CVW for CVW. CodeView and PWB search for these files in the directory that the INIT environment variable specifies. If no INIT environment variable exists, CodeView and PWB search the current directory. If no state file is found, new CURRENT.STS and CLRFILE.CV4 or CLRFILE.CVW files are created in the INIT directory or the current directory if no INIT variable is set.
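The search rule for the state file (the INIT directory if the variable is set, otherwise the current directory) can be sketched as follows. This Python fragment is illustrative only: state_file_path is a hypothetical helper, it assumes INIT names a single directory, and it treats an empty INIT value as unset.

```python
import ntpath  # MS-DOS style path handling, available on any platform


def state_file_path(environ, current_dir):
    """Return where CURRENT.STS is looked for and, if missing, created."""
    # Assumption: an empty INIT value is treated the same as no INIT variable.
    base = environ.get("INIT") or current_dir
    return ntpath.join(base, "CURRENT.STS")
```

For example, with INIT set to C:\INIT the result is C:\INIT\CURRENT.STS; with no INIT variable and a current directory of C:\WORK, it is C:\WORK\CURRENT.STS.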
Information about CodeView stored in CURRENT.STS includes:

• Window layout
• Breakpoints
• Watch expressions
• Source, Local, and Memory display options
• Global CodeView options such as case sensitivity, screen exchange method, radix, and expression evaluator
You can set CodeView options in TOOLS.INI or on the command line and then modify them during a session. They are saved in CURRENT.STS when you exit CodeView. During each CodeView session, these features are set in the following order:
1. From TOOLS.INI
2. From the CodeView command line
3. From CURRENT.STS
4. During the debugging session
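One way to picture the order in which options are set is as a series of dictionary updates. The Python sketch below is a model, not CodeView's implementation: it assumes that a source applied later overrides a value set by an earlier one, and the setting names in the usage note are hypothetical.

```python
def effective_settings(tools_ini, command_line, state_file, session):
    """Combine settings in the documented order of application.

    Assumption of this sketch: a source applied later overrides any value
    set by a source applied earlier.
    """
    settings = {}
    for source in (tools_ini, command_line, state_file, session):
        settings.update(source)
    return settings
```

For example, if TOOLS.INI sets a hypothetical radix of 10 and the command line sets 16, the model yields 16 unless CURRENT.STS or a command entered during the session changes it again.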
The following items are not saved between sessions:

• The current location (CS:IP).
• The expansion state of watch expressions. All watch expressions and their format specifiers are restored, but they appear in their contracted state.
• Absolute-address breakpoints. Breakpoints set at an absolute segment:offset address are not saved. CodeView saves breakpoints only at specific line numbers or symbols.
• Memory window addresses. Each memory window is restored with its display type and options, but CodeView does not save the starting address. Instead, Memory windows show the start of the data segment (address DS:00).
CHAPTER 9
The CodeView Environment
CodeView provides a powerful environment in which to debug programs and dynamic-link libraries (DLLs). Its rich set of commands helps you track program execution and changing data values. In CodeView you can point-and-click your source code to start and stop execution or modify bytes in memory. You can also use more traditional keyboard commands. You can use function keys to execute common commands, such as tracing and stepping through a program. When you quit CodeView, it remembers your breakpoints, window arrangement, watch expressions, and option settings. This chapter describes the CodeView display, shows you how to use the menu commands, and how to interact with the different types of windows.
The CodeView Display

The CodeView screen is divided into three parts:
• The menu bar across the top of the screen
• The window area between the menu bar and status bar
• The removable status bar across the bottom of the screen
Figure 9.1 shows a typical CodeView screen with several open windows. The figure shows selected elements of the display, which are described in the sections that follow.
Filename: LMAETC09.DOC Project: MASM Environment and Tools Template: MSGRIDA1.DOT Author: Mike Eddy Last Saved By: Mike Eddy Revision #: 44 Page: 319 of 1 Printed: 10/09/00 02:41 PM
Figure 9.1 CodeView Display
The Menu Bar

The menu bar displays the names of the CodeView menus. To open a menu, use one of the following methods:
• Click a menu title with the mouse.
• Press ALT plus the menu title's highlighted letter.
• Press and release ALT, use the arrow keys to select a menu, and then press DOWN ARROW or ENTER to open it.
Each command in a menu has a highlighted letter. To choose that command, press the highlighted letter. Many commands also list a shortcut key that you can press at any time instead of opening a menu and choosing a command. A command that does not apply to a particular situation is dimmed on the menu. When you press the corresponding shortcut key, no action is performed.
The Window Area

Most of your debugging takes place in the window area, where you can open, close, move, size, and overlap the various types of CodeView windows.
Although each window serves a different function for debugging, the windows have a number of common features. The Close, Maximize, Restore, and Minimize boxes work in the same way as they do in PWB. The scroll bars also work the same as in PWB. For information on the window border controls, see Chapter 4, User Interface Details. Only one window can be active at a time. You always use the currently active window, which appears with a highlighted border and a shadow on the screen. The text cursor always appears in the active window.
The Status Bar

The status bar contains information about the active window. It usually includes a row of buttons you can click to execute commands. You can also use the shortcut keys shown on the buttons. To remove the status bar and gain an extra line for the window area, choose Status Bar from the Options menu, or type the OA- command in the Command window. To restore the status bar, choose Status Bar from the Options menu, or type the OA+ command in the Command window. For more information on this command, see the Options command on page 422.
CodeView Windows
CodeView windows organize and display information about your program. This section describes each CodeView window, the information you can display, and how you can change information and enter commands in the Command window. It also explains how to move among the windows and manipulate them.
How to Use CodeView Windows

Each CodeView window has a different function and operates independently of the others. Only one window can be active at a time. Commands you choose from the menus or by using shortcut keys affect the active window. The following list briefly describes each window's function:
Source Displays the source or assembly code for the program you are debugging. You can open a second Source window to view an include file or any ASCII text file.
Command Accepts debugging commands from the keyboard. CodeView displays the results, including error messages, in the Command window. When you enter a command in a dialog box, CodeView displays any resulting errors in a pop-up window.
Watch Displays the values of variables and expressions you select. You can modify the value of watched variables, browse the contents of structures and arrays, and follow pointers through memory.
Local Lists the values of all variables local to the current scope. You can set Local window options to show other scopes. You can modify the values of variables displayed in the Local window.
Memory Displays the contents of memory. You can open a second Memory window to view a different section of memory. You can set Memory window options to select the format and address of displayed memory. You can directly change the displayed memory by typing in the Memory window.
Register Displays the contents of the machine's registers and flags. You can directly edit the values in the registers, and you can toggle flags with a single keystroke or mouse click.
8087 Displays the registers of the hardware math coprocessor or the software emulator.
Help Displays the Microsoft Advisor Help system.
The first time you run CodeView, it displays three windows. The Local window is at the top, the Source window fills the middle of the screen, and the Command window is at the bottom. The Local window is empty until you trace into the main part of the program. You can open or close any CodeView window. However, at least one Source window must remain open. When you exit CodeView, it records which windows are open and how they are positioned, along with their display options. These settings become the default the next time you run CodeView. To open a window, choose a window from the Windows menu. Some operations, such as setting a watch expression or requesting help, open the appropriate window automatically.
You can change how CodeView displays information in the Source, Memory, and Local windows. Choose the appropriate window options command from the Options menu. When the cursor is in one of these windows, you can press CTRL+O to display that window's options dialog box. CodeView automatically updates the windows as you debug your program. To interact with a particular window (such as entering a command or modifying a variable), you must select it. The selected window is the active window. The active window is marked in the following ways:
• The window's frame is highlighted.
• The window casts a shadow over other windows.
• The cursor appears in the window.
• The horizontal and vertical scroll bars move to the window.
To make a window active, click anywhere in the window or in the window frame. You can also press F6 or SHIFT+F6 to cycle through the open windows, making each one active in turn. You can also choose a window from the Windows menu or press ALT plus a window number. In addition, some CodeView commands make a certain window active.
Moving Around in CodeView Windows

To move the cursor to a specific window location, click that location. You can also use the keyboard to move the cursor as shown in Table 9.1.
Table 9.1 Moving Around with the Keyboard
Action                                    Keyboard
Move cursor up, down, left, and right     UP ARROW, DOWN ARROW, LEFT ARROW, RIGHT ARROW
Move cursor left and right by words       CTRL+LEFT, CTRL+RIGHT
Move cursor to beginning of line          HOME
Move cursor to end of line                END
Page up and down                          PAGE UP, PAGE DOWN
Page left and right                       CTRL+PAGE UP, CTRL+PAGE DOWN
Move cursor to beginning of window        CTRL+HOME
Move cursor to end of window              CTRL+END
Move to next window                       F6
Move to previous window                   SHIFT+F6
Restore window                            CTRL+F5
Move window                               CTRL+F7
Size window                               CTRL+F8
Minimize window                           CTRL+F9
Maximize window                           CTRL+F10
Close window                              CTRL+F4
Tile windows                              SHIFT+F5
Arrange windows                           ALT+F5
The Source Windows

The Source windows display the source code. You can open a second Source window to view other source files, header files, the same source file at a different location, or any ASCII text file. To open a Source window, use one of the following methods:
• From the Windows menu, choose Source 1 or Source 2.
• In the Command window, type the View Source (VS) command.
• Press ALT+3 to open Source window 1.
• Press ALT+4 to open Source window 2.
You cannot edit source code in CodeView, but you can temporarily modify the machine code in memory using the Assemble (A) command. For more information on the Assemble command, see page 400. Source windows can display your program code in three modes:
• Source mode shows your source file with numbered lines.
• Assembly mode shows a disassembly of your program's machine code.
• Mixed mode shows each numbered source line followed by a disassembly of the machine code for each line.
Note When you are debugging p-code while Native mode is off, CodeView displays p-code instructions rather than disassembled machine instructions. See the Options command on page 422. For more information on p-code, see Debugging P-code on page 363. CodeView automatically switches to Assembly mode when you trace into routines for which no source is available, such as library or system code. The debugger switches back to the original display mode when you continue tracing into code for which source code is available. For more information on setting display modes, see the View Source command on page 433. For detailed information about the Source window display options, see page 343.
The Watch Window

The Watch window displays the value of program variables or the value of expressions you specify in a high-level language. For each expression or variable, you can change the format of the data that is displayed. You can expand aggregate variables, such as structures and arrays, to show all the elements of an aggregate and contract them to save space in the Watch window. You can follow chains of pointers to display and help debug more complex structures, such as linked lists or binary trees. To open a Watch window, use one of the following methods:
• From the Windows menu, choose Watch.
• In the Command window, type the Add Watch (W?) command followed by the variable or expression name.
• Press ALT+2.
To add expressions to the Watch window, use the Add Watch command from the Data menu or the Quick Watch dialog box (SHIFT+F9). You can also add watch expressions using the Add Watch (W?) and Quick Watch (??) commands. Note Do not edit a string in the Watch window. To change the value of any variable displayed in the Watch window, move the cursor to the value, delete the old value, and type the new value. To change the format in which a variable is displayed or to specify a new format, move the cursor to the end of the variable name and type a new format specifier. To toggle between insert and overtype modes, press the INS key.
Using the Watch Window to View Multi-Level Arrays

You can use the Watch window to view the changing values of a structure or array as you step or trace through your program:
1. Open the Watch window.
2. Add the structure whose elements you want to track to the Watch window with the Add Watch command from the Data menu, or by using the Quick Watch dialog box (SHIFT+F9). The structure name will be added to the Watch window.
3. Using the mouse, double-click anywhere on the structure name in the Watch window to expand it one level. Double-click again on any subsequent levels until the structure is open to the level you want to watch.
4. Step or trace through the code using the F8 or F10 keys. The structure elements will update with each step.
For information on expanding and contracting aggregate types and following pointers, see the Quick Watch command on page 453. For detailed information on specifying and using watch expressions, see the "CodeView Expression Reference" on page 393 and Chapter 11, "Using Expressions in CodeView."
The Command Window

You type CodeView commands in the Command window to execute code, set breakpoints, and perform other debugging tasks. You can use the menus, mouse, and keyboard for many debugging tasks, but you can use some CodeView commands only in the Command window. When you first start the debugger, the Command window is active, and the cursor is at the CodeView prompt (>). To return to the Command window after you make another window active, click the command window, or press ALT+9. Using the Command window is similar to using an operating-system prompt, except that you can scroll back to view previous results and edit or reuse previous commands or parts of commands.
How to Enter Commands and Arguments

You enter commands in the Command window at the CodeView prompt when the Command window is active. Type the command followed by any arguments and press ENTER. Some commands, such as the Assemble (A) command, prompt for an indefinite series of arguments until you enter an empty response. CodeView may display errors, warnings, or other messages in response to commands you enter in the Command window. If a Source window is active and the Command window is open, you can still type Command-window commands. When you begin typing, the cursor moves to the Command window and remains there until you press ENTER. The cursor returns to the Source window, and CodeView executes the command. If you have begun typing but do not want to execute a command, press ESC to clear the text and place the cursor at the prompt. After you press ESC, the Command window becomes active.
Command Format
The format for CodeView commands is as follows:
command [[arguments]] [[;command2]]
The command is the command name, and arguments are control options or expressions that represent values or addresses to be used by the command. The first argument can usually be placed immediately after command with no intervening spaces. Arguments may be separated by spaces or commas, depending on the command. For more information, see Chapter 12, CodeView Reference.
To specify additional commands on the same line, separate each command with a semicolon (;).
Commands are always one, two, or three characters long. They are not case sensitive, so you can use any combination of uppercase and lowercase letters. Arguments to commands may be case sensitive, depending on the command.
Example
The following example shows three commands separated by semicolons:
MDB 100 L 10 ; G .178 ; MDB 100 L 10
The first command (MDB 100 L 10) dumps 10 bytes of memory starting at address 100. The second command (G .178) executes the program up to line 178 in the current module. The third command is the same as the first and is used to see if the executed code changed memory.
Example
This example demonstrates the Comment (*) command:
U extract_velocity ;* Unassemble at this routine
The first command is the Unassemble (U) command, given the argument extract_velocity. The next command is the Comment command. Comment commands are used throughout the CodeView examples in this book.
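The command format above can be illustrated with a short Python sketch. It is not CodeView's own parser: the function names split_commands and is_comment are hypothetical, and the sketch assumes a plain split on semicolons is enough (it ignores any quoting rules CodeView may apply).

```python
def split_commands(line):
    """Split one Command-window line into individual commands.

    Assumption: commands are separated by semicolons and surrounding
    spaces are not significant. Empty pieces are dropped.
    """
    return [part.strip() for part in line.split(";") if part.strip()]


def is_comment(command):
    """Return True if the command is a Comment (*) command."""
    return command.startswith("*")


# The two example lines from the text above.
for line in ("MDB 100 L 10 ; G .178 ; MDB 100 L 10",
             "U extract_velocity ;* Unassemble at this routine"):
    for command in split_commands(line):
        kind = "comment" if is_comment(command) else "command"
        print(f"{kind}: {command}")
```

Treating the comment as just another piece of the split keeps the sketch small; a fuller model would likely treat everything after the asterisk as comment text.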
How to Copy Text for Use with Commands

Copy and paste text instead of retyping. Text that appears in any CodeView window can be copied and used in a command. For example, an address that is displayed in a Memory window or the Register window can be copied and used in a breakpoint command. To copy and use text:
1. Select the text with the mouse or the keyboard.
   To select text with the mouse, move the mouse pointer to the beginning of the desired text, hold down the left mouse button, and drag the mouse. When you have selected the desired text, release the button.
   To select text with the keyboard, move the cursor to the desired text, hold down the SHIFT key, and move the cursor with the ARROW keys.
2. Choose the Copy command from the Edit menu or press CTRL+INS.
3. Move the cursor to the location where you want to use the text and choose the Paste command from the Edit menu, or press SHIFT+INS.
4. Edit the command if desired, and press ENTER to execute the command.
Because all input to CodeView windows is line oriented, you cannot copy more than a single line. If you select more than a single line, the Copy command in the Edit menu is unavailable, and CTRL+INS has no effect. However, you can still select more than one line for use with the Print command on the File menu. For more information about the Print command, see Print on page 333. When editing a command, you can toggle between insert and overtype modes by pressing the INS key.
How to Use the Command Buffer

CodeView keeps the last several screens of commands and output in the Command window. You can scroll the Command window to view the commands you entered earlier in the session. This is particularly useful for viewing the output from commands, such as Memory Dump (MD) or Examine Symbols (X), whose output exceeds the size of the window. The TAB key provides a convenient way to move among the previously entered commands. Press TAB to move the cursor to the beginning of the next command, and press SHIFT+TAB to move to the beginning of the previous command. If the cursor is at the beginning or the end of the command buffer, the cursor wraps around to the other end. To return to the current command prompt, you can press CTRL+END or press TAB repeatedly. You can also reuse any command that appears in the Command window without copying and pasting. Move the cursor to the command or press TAB, edit the command if desired, and press ENTER to execute it. When you press ENTER, CodeView restores the original command, copies the new command to the current prompt, and executes the command. If you make a mistake while editing a command, press ESC to restore the line.
The Local Window

The Local window shows all local variables in the current scope. The Local window is similar to the Watch window, except that the variables that are displayed change as the local scope changes. A variable in the Local window is always shown in its default type format. When you edit in the Local window, you can toggle between insert and overtype modes by pressing the INS key.
You can expand and contract pointers, structures, and arrays the same way you do in the Watch window. You can also change the values of the variables as in the Watch window. The keyboard shortcut to open or switch to the Local window is ALT+1.
You can see the local variables of each active routine in the stack by selecting the routine from the Calls menu. For more information on this feature, see The Calls Menu on page 346. By default, the Local window shows the addresses of the local variables on the left side of the window. You can turn this address display on or off using the Options (O) command. For more information on the Options command, see page 422.
The Register Window

The Register window displays the names and current values of the native CPU registers and flags. When you are debugging p-code, it displays names and values of the p-code registers and flags. You can change the value of any register or flag directly in the Register window.
To open the Register window, choose Register from the Windows menu, press ALT+7, or press F2. You can also view and modify registers by using the Register (R) command. For more information about the Register command, see page 426.
When a register value changes after a program step or trace, CodeView highlights the new value so you can see how your program uses the CPU registers. Depending on the current instruction, the Register window also displays the effective address at the bottom of the window. This display shows the location of an operand in physical memory and its value.
If you are debugging on an 80386 or 80486 machine, you can view and modify the 32-bit registers. To turn on the 32-bit Registers option, choose the 386 command from the Options menu or use the O3+ command.
When you are debugging p-code, CodeView displays the p-code registers: DS, SS, CS, IP, SP, BP, PQ, TH, and TL.
If your program has taken an unexpected turn, you may be able to compensate for the problem and continue debugging if you change the value of a register or a flag. For example, you can change a flag value before a jump or looping instruction to test a different branch of code. You can change the instruction pointer (CS:IP) to jump to any code in your program or to execute code you have assembled elsewhere in memory.
To change the value of any register, move the cursor to the register value you want to change and overtype the old value with the new value. The cursor automatically moves to the next register. Although you cannot change the value of the flag register numerically in the Register window, you can conveniently toggle the values of each flag using either the mouse or the keyboard:
- To toggle a flag with the mouse, double-click the flag.
- To toggle a flag using the keyboard, move the cursor to the flag and press any key except ENTER, TAB, or ESC. After toggling a flag, CodeView moves the cursor to the next flag.
To restore the value of the last flag or register that you changed, choose Undo from the Edit menu or press ALT+BACKSPACE. If you happen to lose the cursor somewhere in the register window, press TAB. The TAB key moves the cursor to the next register or flag that can be changed.
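As an illustration of what toggling a flag means at the bit level, the following Python sketch flips a single bit in a 16-bit flags value. The bit positions are those of the 8086 FLAGS register; the names FLAG_BITS and toggle_flag are hypothetical, and the sketch does not model how CodeView displays or stores flags.

```python
# Bit positions of the 8086 status and control flags in the FLAGS register.
FLAG_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7,
    "TF": 8, "IF": 9, "DF": 10, "OF": 11,
}


def toggle_flag(flags, name):
    """Return the flags value with the named flag inverted (XOR with its mask)."""
    return flags ^ (1 << FLAG_BITS[name])


flags = 0x0202                     # IF set, plus reserved bit 1
flags = toggle_flag(flags, "ZF")   # ZF is now set
print(f"{flags:04X}")              # prints 0242
```

Setting the zero flag this way just before a conditional jump such as JZ is one way to force the other branch, which is the use the text describes.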
The 8087 Window

The 8087 window displays the current status of the math coprocessor's registers and flags. If you are debugging a program that uses the software-emulated coprocessor, the emulated registers are displayed. To open the 8087 window, choose 8087 from the Windows menu or press ALT+8. The display in the 8087 window is the same as the display produced by the 8087 (7) command, except that the window is continually updated to show the current status of the math coprocessor. For more information about the display, see the 8087 command on page 448. If your program uses floating-point libraries provided by several Microsoft languages, or if your program does not use floating-point arithmetic, the 8087 window and 8087 command display the message:
Floating point not loaded
CodeView displays this message until at least one floating-point instruction has been executed.
The Memory Windows

Memory windows display memory in a number of formats. CodeView allows two Memory windows to be open at the same time. You can open a Memory window in several ways:
- From the Windows menu, choose Memory 1 or Memory 2.
- From the Options menu, choose Memory1 Window when no Memory windows are open.
- In the Command window, enter the View Memory (VM) command.
- At the keyboard, press ALT+5 or ALT+6.
By default, memory is displayed as bytes or as the last type specified by a Memory Enter (ME), Memory Dump (MD), or View Memory (VM) command. The byte display shows hexadecimal byte values followed by an ASCII representation of those byte values. For values that are outside the range of printable ASCII characters (decimal 32 to 127), CodeView displays a period (.).
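The byte display described above can be sketched in Python as follows. This is only an approximation of the layout: the function name format_byte_row and the two-space gap between the hexadecimal and ASCII parts are assumptions, and the printable range defaults to the decimal 32 to 127 stated in the text.

```python
def format_byte_row(data, lo=32, hi=127):
    """Format bytes as hex values followed by their ASCII representation.

    Bytes outside the range lo..hi (inclusive) are shown as a period,
    as the text describes. The exact column layout is an assumption.
    """
    hex_part = " ".join(f"{b:02X}" for b in data)
    ascii_part = "".join(chr(b) if lo <= b <= hi else "." for b in data)
    return f"{hex_part}  {ascii_part}"


print(format_byte_row(b"Hi\x00\x01"))   # prints: 48 69 00 01  Hi..
```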
How to Change Memory Display Format

It is not always most convenient to view memory as byte values. If an area of memory contains character strings or floating-point values, you might prefer to view them in a directly readable form.
To change the display format of a Memory window, choose Memory1 Window or Memory2 Window from the Options menu. CodeView displays a dialog box where you can choose from a variety of display options. When the cursor is in a Memory window, you can press CTRL+O to display the corresponding Memory Window Options dialog box. You can also set memory display options using the View Memory (VM) command. For detailed information about the display options, see View Memory on page 431.
To cycle through the display formats, click the <Sh+F3=Mem1Fmt> or <Sh+F3=Mem2 Fmt> buttons on the status bar, or press SHIFT+F3. Pressing CTRL+SHIFT+F3 displays the cycle in reverse order.
When you first open the Memory window, it displays memory starting at address DS:00. To change the starting address, use one of the commands to set Memory window options. You can specify the starting address or enter an expression to use as the starting address. You can also type over the segment:offset addresses shown in the left column of the Memory window to change the displayed addresses. Move the cursor to an address in the window, or repeatedly press TAB until the cursor is on an address, and type a new address.
How to Change Memory Directly

To change the values in memory, overtype the value you want to change. To move quickly from field to field in the Memory window, press TAB. You can change memory by entering new values for the format that is displayed or by typing over the raw bytes in the window. CodeView ignores the input if you press a key that does not make sense for the current format (for example, if you type the letter X in anything but ASCII format). To undo a change to memory, choose Undo from the Edit menu, or press ALT+BACKSPACE.
How to View Memory at a Dynamic Address

Live expressions make it easy for you to watch a dynamic view of an array or pointer in the Memory window. Live means that the starting address of memory in the window changes to reflect the current value of an address expression. To create a live expression, choose the Memory1 Window or Memory2 Window command from the Options menu. In the Memory Window Options dialog box, type in an address expression, then turn on the Re-evaluate Expression Always (Live) option. It is usually more convenient to view an item in the Watch window than in the Memory window. However, some items are more easily viewed using live expressions. For example, you can examine what is currently on top of the stack by entering SS:SP as the live expression.
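The idea of a live expression can be sketched in Python: the window's starting address is recomputed from the current register values each time the program stops. The sketch assumes real-mode 8086 addressing (segment times 16 plus offset, wrapped to 20 bits), which does not apply to protected-mode debugging; the names linear_address and live_start are hypothetical.

```python
def linear_address(segment, offset):
    """Real-mode 8086 address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


def live_start(registers, seg_name="SS", off_name="SP"):
    """Re-evaluate a segment:offset expression against current registers."""
    return linear_address(registers[seg_name], registers[off_name])


regs = {"SS": 0x2000, "SP": 0x0100}
print(f"{live_start(regs):05X}")   # prints 20100

regs["SP"] -= 2                    # a PUSH moves SP down by two bytes
print(f"{live_start(regs):05X}")   # prints 200FE
```

Because the address is recomputed rather than stored, the view follows the top of the stack as the program pushes and pops values.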
The Help Window

In CodeView, you can request Help:
- From the Help menu.
- By pressing F1 when the cursor is on the keyword, menu, or dialog box for which you want Help.
- By clicking the right mouse button on a Help keyword.
- Using the Help (H) command.
- By choosing Help from the Windows menu. You can also press ALT+0 for Help on CodeView windows.
The Microsoft Advisor Help window is displayed whenever you request Help for CodeView. For information about getting the most out of the Microsoft Advisor Help system, see Chapter 21.
CodeView Menus
Many commands that you are likely to use frequently are in the CodeView menus. This section describes the menus and the commands or options in each menu. Not all commands are available in both versions of the CodeView debugger. When applicable, the menu descriptions discuss command availability.
The File Menu

The File menu contains commands to load source files and other ASCII text files into the Source window, print from the active window, start an operating-system shell, and end the debugging session. The following table summarizes the commands on the File menu. Commands marked with an asterisk are not shown in the CVW File menu:
Command        Purpose
Open Source    Opens a source, include, or other text file
Open Module    Opens a source file for a module in the program
Print*         Prints all or part of the active window
DOS Shell*     Goes to the operating-system prompt temporarily
Exit           Exits CodeView
Open Source
The Open Source command displays the Open Source File dialog box. You can select the name of the source file, include file, or other text file to display in the active Source window.
Open Module
The Open Module command displays the Open Module dialog box. This dialog box provides an easy way to load the source file for any module in your program. The dialog box lists the source files that make up the modules in the program you are debugging. Only those modules that include line-number or full symbolic information are listed. CodeView displays the source file you choose in the active Source window.
Print
In CodeView for MS-DOS only, the Print command displays the Print dialog box, which offers several options to write information in the active window to a device or a file. You can print text in the active window in the following ways:
- Window view, which prints text that currently appears in the active window
- Complete window contents, which prints the contents of the active window, including what has scrolled out of the window
To print to a file, specify a filename in the dialog box. To append the printed text to the end of the file, select Append. To overwrite a file that already exists, select Overwrite. If you specify a device instead of a file, you can choose either Append or Overwrite. To print directly to a printer, specify the name of the printer port such as LPT1 or COM2. You must specify a filename or a device name. CodeView reports an error if you omit the name.
DOS Shell
In MS-DOS only, you can use the DOS Shell command to leave CodeView temporarily and go to the operating-system prompt.
When you choose the DOS Shell command, CodeView starts a second copy of the command processor specified by the COMSPEC environment variable. If there is not enough memory to open a new shell, a message appears. Even if you do have enough memory to start a command shell, you may not have enough memory to execute large programs from the shell.
While in the shell, do not start any terminate-and-stay-resident (TSR) programs, such as MSHERC.COM, and do not delete files you are working on during your debugging session. Also, do not delete any files used by CodeView, such as the CURRENT.STS file.
To return to CodeView, type exit at the operating-system prompt to close the shell. For more information about starting a shell, see the Shell Escape command on page 443.
Exit
The Exit command saves the current CodeView environment and returns to the program that called CodeView, such as COMMAND.COM, PWB, or another editor. CodeView saves the window arrangement, watch expressions, option settings, and most breakpoints in the state file, CURRENT.STS. It saves current color settings in CLRFILE.CV4 if you are using CV and in CLRFILE.CVW if you are using CVW. When you start the debugger at a later time, CodeView restores these settings. To prevent CodeView from restoring the information it stores in CURRENT.STS, start the debugger with the /TSF option or use the Statefileread entry in your TOOLS.INI file.
The Edit Menu

The Edit menu contains commands to undo changes to window fields, copy selected text to the clipboard, and paste the contents of the clipboard into a window. For more details on editing in CodeView's windows, see CodeView Windows on page 321. The following table summarizes the commands on the Edit menu:
Command    Purpose
Undo       Reverses the last editing change
Copy       Copies the selected text to the clipboard
Paste      Inserts the contents of the clipboard at the cursor
Undo
The Undo command (ALT+BACKSPACE) reverses the last editing action.
Copy
The Copy command (CTRL+INS) copies selected text to the clipboard. Because input to CodeView is restricted to single lines, you can copy only a single line of text. If you select more than a single line of text, the Copy command is disabled and pressing CTRL+INS has no effect.
Paste
The Paste command (SHIFT+INS) inserts text from the clipboard at the cursor in the Command window.
The Search Menu

The Search menu provides commands to find strings and regular expressions in source files and to locate the definitions of labels and routines. The following table summarizes the commands on the Search menu:
Command             Purpose
Find                Searches for a text string or pattern in the source file
Selected Text       Searches for the selected text in the source file
Repeat Last Find    Repeats the last text search
Label/Function      Searches for a label or function definition in the program
Find
The Find command displays the Find dialog box. In the Find What text box, type the text or pattern you want to find. You can also select text in a window and then choose Find. The text you selected is shown in the dialog box. You can select options in the dialog box to modify the way CodeView searches for text. The following options are available:
Whole Word
CodeView matches the text only when it occurs as a word by itself. For example, when you search for the pattern print with the Whole Word option, CodeView finds print("eeep"), but not error_print("eeep").
Match Case
CodeView matches the text only when each letter in the pattern has the same case as in the source file. For example, the pattern fish matches fish, but not Fish.
Regular Expression
CodeView treats the text as a regular expression. Regular expressions provide a powerful way to specify patterns that match several different sections of text. For more information about regular expressions, see Appendix B.
To search for a regular expression in the active Source window using the Command window, you can type the Search (/) command followed by the string. CodeView searches the file starting at the current position and places the cursor on the next occurrence of the search pattern. If the end of the file is reached without finding a match, CodeView wraps around and continues searching from the beginning of the file.
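The Whole Word and Match Case options and the wrap-around behavior can be sketched in Python with the standard re module. This is only a model of the behavior described above, not CodeView's search code: the function find_next and its parameters are hypothetical, the source file is represented as a list of lines, and Python's word-boundary rules stand in for whatever definition of a word CodeView uses.

```python
import re


def find_next(lines, pattern, start, whole_word=False, match_case=False):
    """Return the index of the next line containing pattern.

    The search begins on the line after `start`, wraps past the end of
    the list to the beginning, and checks `start` itself last. Returns
    None if no line matches.
    """
    expr = re.escape(pattern)              # treat pattern as literal text
    if whole_word:
        expr = r"\b" + expr + r"\b"        # match only as a separate word
    flags = 0 if match_case else re.IGNORECASE
    regex = re.compile(expr, flags)

    count = len(lines)
    for step in range(1, count + 1):
        index = (start + step) % count     # wrap around at end of file
        if regex.search(lines[index]):
            return index
    return None


source = ['error_print("eeep")', 'x = 1', 'print("eeep")', 'Fish']
print(find_next(source, "print", 0, whole_word=True))   # prints 2
print(find_next(source, "print", 2))                    # prints 0 (wrapped)
print(find_next(source, "fish", 0, match_case=True))    # prints None
```

With Whole Word on, the underscore in error_print counts as part of the word, so only the standalone print matches, which agrees with the example in the text.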
Selected Text
The Selected Text command (CTRL+\) searches for the next occurrence of the selected text in the Source window.
Repeat Last Find

The Repeat Last Find command (ALT+/) searches for the next occurrence of the search pattern, including search options, you last specified.
Label/Function
The Label/Function command lets you search the program's symbolic information for the definition of a label or routine. When you choose Label/Function, the Find Label/Function dialog box appears. The currently selected text or the word at the cursor comes up in the Label/Function Name text box. You can search for this name or type in a different label or routine name. When you choose OK, CodeView searches the symbolic information in the program for the name. When the label or routine name is found, CodeView positions the cursor at the name in the source file. To view the current program location after searching, choose the first item in the Calls menu or type the Current Location (.) command in the Command window.
The Run Menu

The Run menu consists of commands to restart the program, animate the program in slow motion, change the program's arguments, load a new program, or configure the modules CodeView is using. The following table summarizes the commands on the Run menu:
Command                  Purpose
Restart                  Restarts the program
Set Runtime Arguments    Changes the program's run-time arguments and restarts the program
Animate                  Executes the program in slow motion
Load                     Loads a new program to debug, sets run-time arguments, and configures CodeView's modules
Restart
The Restart command resets execution to start at the beginning of the program. After you issue the command, CodeView:
- Initializes all program variables.
- Resets the pass counts for all breakpoints.
- Preserves existing breakpoints, watch expressions, and the program's command-line arguments.
You can use Restart any time after execution stops: at a breakpoint, while stepping or tracing, or when your program ends. If your program redefines interrupts, Restart may not work correctly because it does not execute any cleanup or exit-list code in the program. If your program requires this code to be executed, let the program run to the end before restarting, or use the Display Expression (?) command in the Command window to call the cleanup routines. For more information on calling program routines, see Display Expression on page 452.
Set Runtime Arguments

The Set Runtime Arguments command lets you change your program's command-line arguments. When you set new arguments, CodeView restarts the program.
Animate
The Animate command executes your program in slow motion. CodeView highlights each statement in the Source window as your program executes. This allows you to see the flow of execution. To stop animation, press any key. You can set the animation speed with the Trace Speed command from the Options menu or with the Trace Speed (T) Command-window command.
Load
The Load command displays the Load dialog box, which you can use to:

- Load executable (.EXE or .DLL) files.
- Change the program's command-line arguments.
- Specify different CodeView components from those specified in TOOLS.INI, such as a different expression evaluator or the p-code execution model.
Loading Programs or DLLs

To load program or DLL symbols into the debugger, type a filename in the File to Debug text box, or use the mouse or keyboard to select a file from the File List box. Use the Drives/Dirs list box to change to a different drive or directory.
Set Command-Line Arguments

Use the Arguments text box to change the command-line arguments to the program you are debugging or to set entirely new arguments. Type the arguments to your program as you would on the command line.
Configure CodeView Modules

CodeView uses a default setting for an execution model, transport layer, and expression evaluator if any of these is not specified in TOOLS.INI. Choose the Configure button to load different CodeView DLLs. The Configure dialog box lists the DLLs that CodeView has loaded. CodeView loads several DLLs that are required to debug your programs. These DLLs include:
- Expression evaluators for various languages and environments.
- Execution models for various operating systems.
- Execution models for p-code.
- Transport layers.
To load new DLLs, choose the Change buttons on the right side of the dialog box.
The Data Menu

The Data menu provides commands to add and delete watch expressions and breakpoints. Watch expressions allow you to observe how variables change as your program executes and also to expand arrays and dereference pointers. Breakpoints allow you to stop execution of your program to check the values of your variables, determine execution flow, and change how your program executes. For more information about watch expressions, see Chapter 11, Using Expressions in CodeView and the Add Watch Expression command on page 436. The following table summarizes the commands on the Data menu:
Command             Purpose
Add Watch           Adds an expression to the Watch window
Delete Watch        Deletes an expression from the Watch window
Set Breakpoint      Sets a breakpoint in the program
Edit Breakpoints    Modifies or removes existing breakpoints
Quick Watch         Displays a quick view of a variable or expression
Add Watch
The Add Watch command (CTRL+W) displays the Add Watch dialog box, which shows the selected text or the word at the cursor in the Expression text box. You can enter a different expression or add a format specifier to the expression. When you choose OK, the expression is added to the end of the Watch window.
Delete Watch
The Delete Watch command (CTRL+U) displays the Delete Watch dialog box, which displays a list of the watch expressions in the Watch window. Select the expression you want to delete from the list and choose OK. Choose the Clear All button to remove all expressions from the Watch window. You can also delete expressions directly from the Watch window. Use the mouse or the cursor keys to move the cursor to the expression you want removed, and press CTRL+Y.
Set Breakpoint
The Set Breakpoint command displays the Set Breakpoint dialog box, which allows you to select from several kinds of breakpoints and set options for each type. The following list describes the breakpoints you can set:
Break at Location
This is the simplest type of breakpoint. You specify an address or a line number where you want execution to pause. To specify a line number, precede it with a period (.); otherwise, CodeView will interpret it as an address. When your program's execution reaches the breakpoint location, your program stops temporarily, and you can enter CodeView commands.
Break at Location if Expression is True
You specify a location and an expression. Whenever execution reaches that location, CodeView checks the expression. If the expression is true (nonzero), the breakpoint is taken. Otherwise, execution continues.
Break at Location if Expression has Changed
You specify a location and an expression that represents a variable or any portion of memory. To specify a range of memory, enter the length of the range in the Length text box. CodeView checks the variable or the range of memory when execution reaches the breakpoint location. If the value of any byte has changed since the last time CodeView checked, the breakpoint is taken. Otherwise, execution continues.
Break When Expression is True
This breakpoint is taken whenever the expression becomes true. CodeView evaluates the expression after every line or every instruction, instead of only at a certain location. As a result, this type of breakpoint can greatly slow your program's execution.
Break When Expression has Changed
CodeView checks the variable or the range of memory as each line or each instruction is executed. You can also specify a range with the Length text box. This type of breakpoint can also slow your program's execution.
Each breakpoint is numbered, beginning with 0. For each type of breakpoint, you can set several options. If you try to use an option that does not apply to a certain breakpoint, CodeView displays N/A in the edit box and ignores that option. The options are:
Location
Specifies where CodeView should evaluate the breakpoint.
Expression
Specifies an expression that causes a break when it becomes true or a location that is to be watched for changes.
Length
Specifies a range of memory (starting at the address in the Expression text box) to watch for changes.
Pass Count
Specifies the number of times to pass over the breakpoint when it otherwise would be taken. For example, a pass count of 10 tells CodeView to ignore the breakpoint ten times.
Commands
Specifies a list of Command-window commands, separated by semicolons, that are executed when the breakpoint is taken. If several breakpoints with commands are taken, the commands are queued and executed in first-in, first-out order.
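Two of the ideas above, the pass count and the "has changed" check over a range of memory, can be modeled with a small Python sketch. It is not how CodeView is implemented: the class names LocationBreakpoint and ChangeWatch are hypothetical, memory is represented as a bytearray, and the sketch assumes that once the pass count is used up the breakpoint is taken on every later hit.

```python
class LocationBreakpoint:
    """Breakpoint that is passed over `pass_count` times before it is taken."""

    def __init__(self, pass_count=0):
        self.remaining = pass_count

    def should_break(self):
        # Called each time execution reaches the breakpoint location.
        if self.remaining > 0:
            self.remaining -= 1
            return False
        return True


class ChangeWatch:
    """Reports a break when any byte in a memory range has changed
    since the previous check."""

    def __init__(self, memory, start, length=1):
        self.memory = memory
        self.start = start
        self.length = length
        self.snapshot = bytes(memory[start:start + length])

    def should_break(self):
        current = bytes(self.memory[self.start:self.start + self.length])
        changed = current != self.snapshot
        self.snapshot = current          # compare against this value next time
        return changed


bp = LocationBreakpoint(pass_count=10)
hits = [bp.should_break() for _ in range(11)]
print(hits.count(False), hits[-1])       # prints: 10 True

memory = bytearray(8)
watch = ChangeWatch(memory, start=2, length=4)
memory[3] = 1
print(watch.should_break())              # prints True
print(watch.should_break())              # prints False
```

Checking the range on every line or instruction, as the "Break When" types do, means running a comparison like this at each step, which is why the text warns that those breakpoints slow execution.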
As shortcuts, you can also set simple (break at location) breakpoints with the following methods:
- Double-click the line in the Source window.
- Move the cursor to the breakpoint location in the Source window and press F9.
A line with a breakpoint is highlighted. In the Mixed and Assembly modes, an assembly-language comment that displays the breakpoint number appears. For example:
0047:0b30 57 push di ;BK0
In this example, breakpoint number 0 is set at the address 0047:0B30. You can also set breakpoints with the Breakpoint Set (BP) command. See the Breakpoint Set command on page 405.
Edit Breakpoints
The Edit Breakpoints command displays the Edit Breakpoints dialog box, where you can add, remove, change, enable, and disable breakpoints. Select a breakpoint from the list of breakpoints, then choose one of the command buttons on the right side of the dialog box. The list of breakpoints in the Edit Breakpoints dialog box shows the current state of each breakpoint in your program. For more information on the format of the breakpoint list, see the Breakpoint List command on page 405. The command buttons in the Edit Breakpoints dialog box are described in the following table:
Button       Description
Add          Adds a new breakpoint
Remove       Removes the selected breakpoint
Modify       Modifies the selected breakpoint
Enable       Activates a disabled breakpoint
Disable      Disables an active breakpoint
Clear All    Removes all breakpoints
If you choose the Modify button, CodeView displays the Set Breakpoint dialog box with the appropriate options set for the breakpoint you selected. You can then modify the options and set the breakpoint just as you do with the Set Breakpoint command.
When you disable a breakpoint by selecting the Disable button, CodeView does not evaluate the breakpoint. Program execution continues as if the breakpoint were never set. You may encounter several occasions where it is useful to disable a breakpoint. Sometimes a certain breakpoint is not practical when you are debugging a routine nested deeply in your program. You can re-enable the breakpoint later when you really need it. Also, conditional breakpoints are evaluated at every program step and can slow execution. You can disable some conditional breakpoints in areas of your program where you're certain you won't need them.
Quick Watch
The Quick Watch command (SHIFT+F9) displays the Quick Watch dialog box, which shows the variable at the cursor position or the selected expression. The Quick Watch dialog box is similar to the Watch window. However, you mainly use Quick Watch for a quick exploration of the current values in an array or a pointer-based data structure, rather than as a method to constantly display the values. The Quick Watch dialog box automatically expands structures, arrays, and pointers to their first level. You can expand or contract an element just as you can in the Watch window. If the expanded item needs more lines than the Quick Watch dialog box can display, you can scroll the view up and down. Choose the Add Watch button to add a Quick Watch item to the Watch window. Expanded items appear in the Watch window as they are displayed in the Quick Watch dialog box. For complete information on using the Quick Watch dialog box, see the Quick Watch command on page 453.
The Options Menu

The Options menu contains commands to change the default behavior of CodeView commands and the display status of CodeView windows. You can also set display options with various Command-window commands. When the cursor is in one of the Source, Memory, or Local windows, you can press CTRL+O to display the windows Options dialog box. For menu items that are toggles, a bullet appears to the left of the item when the option is turned on. No bullet appears when it is turned off. The following table summarizes the commands on the Options menu:
Command                 Purpose
Source1 Window          Sets Source window 1 display options
Source2 Window          Sets Source window 2 display options
Memory1 Window          Sets Memory window 1 display options
Memory2 Window          Sets Memory window 2 display options
Local Options           Sets Local window display options
Trace Speed             Sets animation speed
Language                Sets the expression evaluator
Horizontal Scrollbars   Toggles horizontal scroll bars on windows
Vertical Scrollbars     Toggles vertical scroll bars on windows
Status Bar              Toggles the status bar display
Colors                  Changes colors of CodeView screen elements
Screen Swap             Toggles screen exchange
Case Sensitivity        Toggles case sensitivity of symbols
32-Bit Registers        Toggles display of 32-bit registers
Native                  Toggles display of p-code or machine code instructions
Source Window
The Source Window command displays the Source Window Options dialog box. In this dialog box, you can set the source display mode and other options for the current Source window. These options are as follows:
Option                           Description
Follow CS:IP thread of control   Keeps the current program location visible in the active Source window.
Source                           Displays the source code for the program.
Mixed Source and Assembly        Displays each source line followed by the disassembly of the code generated for that line.
Assembly                         Displays a disassembly of the machine code in your program.
Tab Length                       Sets the number of spaces to which tab characters expand in the source file.
Show Machine Code                Shows the address and hexadecimal representation of the machine code in Mixed and Assembly modes.
Show Symbolic Name               Shows the symbol name in assembly-language displays instead of the numeric value of the symbol.
The Source Window Options dialog box also contains all the options available with the VS (View Source) command. For information on the VS command, see VS (View Source) on page 433.
Memory Window
The Memory Window command displays the Memory Window Options dialog box, where you can set the starting address and display format of the active Memory window. For details, see The Memory Windows on page 330 and the View Memory command on page 431.
Local Options
You can specify the scope of variables to be displayed in the Local window. When you select Local Options from the Options menu, a dialog box appears in which you can select any combination of lexical, function, module, executable, and global scopes. You can also toggle the display of addresses in the Local window from the Local Options dialog box. When you turn Show Addresses on, the BP-relative address of each local variable is shown in the Local window. Otherwise, the Local window shows only the names of the variables. You can also use the Options (OL) command in the Command window to specify the scope of variables to be displayed in the Local window. For information about the Options command, see page 422.
Trace Speed
The Trace Speed command displays the Trace Speed dialog box, which presents a list of three speeds from which you can select. When you use the Animate command to run your program in slow motion, CodeView pauses execution between each step. The duration of the pause is set with the Trace Speed command. Slow pauses for 1/2 second. Medium pauses for 1/4 second. Fast runs the program as fast as possible while still updating CodeView windows and evaluating breakpoints and watch expressions.
Language
The Language command displays the Language dialog box, which presents a list of the expression evaluators that CodeView has loaded, plus the Auto option. In your TOOLS.INI file, you can configure CodeView to load a number of different expression evaluators. You can also load expression evaluators by choosing Load from the Run menu. Only one expression evaluator can be active at a time. The Auto setting is the default. It tells CodeView to set the expression evaluator automatically based on the extension of the source file you are debugging in the current Source window. For more information on expression evaluators, see Configuring CodeView with TOOLS.INI on page 301. For more information on using expression evaluators, see Chapter 11, Using Expressions in CodeView.
Horizontal Scrollbars
The Horizontal Scrollbars command toggles the horizontal scroll bars on and off. When scroll bars are off, you can drag the bottom window frame, as well as the size box, to resize the window.
Vertical Scrollbars
The Vertical Scrollbars command toggles the vertical scroll bars on and off. When scroll bars are off, you can drag the right window frame, as well as the size box, to resize the window.
Status Bar
The Status Bar command toggles the status bar on and off. When the status bar is off, you gain an extra line of space for windows.
Colors
The Colors command displays a dialog box that lets you change the colors of CodeView screen elements. The Item list box displays all the elements of the debugging screen. The Foreground and Background list boxes show the current color settings for the highlighted element in the Item list box. To change the color of a screen element, choose an element in the Item list box, then choose foreground and background colors. When you are done, click the OK button. Your new color settings take effect as soon as you exit the dialog box. If you make a number of changes and want to go back to your previous color settings, click the Reset button. You can then start changing colors again. To close the dialog box without making any changes, click the Cancel button. To reset to the standard CodeView colors, click the Use Default button. When you specify colors using the Colors command in CodeView, the colors are saved in CLRFILE.CVW if you are using CodeView for the Windows Operating System and in CLRFILE.CV4 if you are using CodeView for DOS. CodeView saves these files in the directory specified by the INIT environment variable or in the current directory if no INIT environment variable is set. These settings become the new default colors.
Screen Swap
The Screen Swap command toggles screen exchange on or off. By default, CodeView switches to your program's output screen whenever you execute code in the program. CodeView uses either screen flipping or screen swapping, depending on the command-line options you used to start the debugger. See Set Screen-Exchange Method on page 313.
If your program sends no output to the screen, you'll probably want to turn Screen Swap off. This setting continuously displays CodeView's screen while your program executes. If Screen Swap is off and your program writes to the screen, a portion of the CodeView display may be overwritten. If this happens, type the Refresh (@) command in the Command window.
Case Sensitivity
The Case Sensitivity command toggles case sensitivity on or off. When Case Sensitivity is on, CodeView treats symbol names as case sensitive (that is, a lowercase letter is different from its corresponding uppercase letter). This option affects only commands that deal with symbols in your program; it does not affect the text-searching commands.
32-Bit Registers
The 32-Bit Registers command toggles 386 mode on and off. When 386 mode is on, a bullet appears next to the command on the menu, and CodeView displays the 32-bit registers in the Register window. In this mode, CodeView can also assemble instructions that use 32-bit registers or memory operands.
Native
When you are debugging a program that uses p-code, you use the Native command to toggle between p-code instructions and native machine instructions. With Native mode on, CodeView displays your program's native CPU instructions. With Native mode off, CodeView displays the instructions in p-code. For more information on debugging p-code, see page 363.
The Calls Menu

The Calls menu shows what routines have been called in your program during debugging. Its contents change to reflect their current status. The current routine is at the top of the menu; the routine that called it appears just below. Routines are listed in the reverse of the order in which they were called. At the bottom of the list is your program's main routine. In C, for example, main appears at the bottom. When you are debugging a Windows-based application, WinMain is at the bottom of the list. The Calls menu is empty until the program enters at least one routine that creates a stack frame. The arguments to each routine are listed in parentheses after the routine name. The menu's width expands to accommodate the widest entry.
Arguments are shown in the current radix, except for pointers, which are always shown in hexadecimal. When you choose a routine from the Calls menu, CodeView displays the source code for that routine and updates the Local window to show the local variables in that routine. The cursor moves to the return location to show the next line or instruction that will be executed when control returns to that routine. To step out of deeply nested code, choose a routine and then press F7.
Choosing a routine from the Calls menu does not affect program execution; it provides you with a convenient way to view a routine's source code and local variables. However, since the cursor is positioned at the return location, you can press F7 to execute through the stack of nested calls to that line. This is especially convenient when you find you've accidentally traced into a deeply nested set of routines that you know to be bug-free. Rather than continue a tedious trace until you work your way out of the stack of routines, you can choose a routine from the Calls menu and press F7. CodeView executes through the nested routines until control returns to the point you chose.
A routine may not be visible in the Calls menu under the following circumstances:
- You have traced only startup or termination routines from the run-time library.
- Routine calls are nested so deeply that not all routines appear on the menu.
- The stack has been corrupted.
- CodeView cannot trace through the stack frame because the BP register is overwritten.
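The ordering and radix rules of the Calls menu can be sketched in a few lines of Python. This is an illustrative model only, not CodeView's implementation: the Frame type, the function names, and the exact text layout (including the 0x prefix on hexadecimal values) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Frame:
    """One active routine: its name and (value, is_pointer) argument pairs."""
    name: str
    args: List[Tuple[int, bool]]


def format_arg(value: int, is_pointer: bool, radix: int = 10) -> str:
    # Pointers are always shown in hexadecimal; other values use the
    # current radix. The "0x" prefix is an assumption of this sketch.
    if is_pointer or radix == 16:
        return f"0x{value:04X}"
    if radix == 8:
        return format(value, "o")
    return str(value)


def calls_menu(call_order: List[Frame], radix: int = 10) -> List[str]:
    """Return menu entries, most recently called routine first."""
    entries = []
    for frame in reversed(call_order):
        shown = ", ".join(format_arg(v, p, radix) for v, p in frame.args)
        entries.append(f"{frame.name}({shown})")
    return entries


# Hypothetical call chain: main called parse, which called lookup.
frames = [
    Frame("main", []),
    Frame("parse", [(3, False), (0x1A2C, True)]),
    Frame("lookup", [(42, False)]),
]
menu = calls_menu(frames)
```

In this model the current routine (lookup) is the first entry and main is the last, which mirrors the top-to-bottom order described above.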
The Windows Menu

The Windows menu contains commands that activate, open, close, tile, arrange, and manipulate CodeView windows. There is also a command to view your program's output screen. If you get lost among your windows, try the Arrange command. A bullet appears to the left of the active window when you open this menu. All the windows are numbered. You can quickly open or switch to a window by pressing ALT plus the window's number. The following table summarizes the commands on the Windows menu and the corresponding shortcut keys:
Command       Shortcut Key   Purpose
Restore       CTRL+F5        Restores the active window to its size and position before it was maximized or minimized
Move          CTRL+F7        Moves the active window using the keyboard
Size          CTRL+F8        Sizes the active window using the keyboard
Minimize      CTRL+F9        Shrinks the active window to an icon
Maximize      CTRL+F10       Enlarges the active window to full screen
Close         CTRL+F4        Closes the active window
Tile          SHIFT+F5       Arranges all open windows to fill the entire window area
Arrange       ALT+F5         Arranges all open windows to an effective layout for debugging
Help          ALT+0          Opens or switches to the Help window
Local         ALT+1          Opens or switches to the Local window
Watch         ALT+2          Opens or switches to the Watch window
Source 1      ALT+3          Opens or switches to Source window 1
Source 2      ALT+4          Opens or switches to Source window 2
Memory 1      ALT+5          Opens or switches to Memory window 1
Memory 2      ALT+6          Opens or switches to Memory window 2
Register      ALT+7          Opens or switches to the Register window
8087          ALT+8          Opens or switches to the 8087 window
Command       ALT+9          Opens or switches to the Command window
View Output   F4             Swaps the CodeView screen for the program's output screen
Source and Memory Windows

You can open as many as two Source and two Memory windows. At least one Source window must be open at all times. To close a window, use the Close command (CTRL+F4).
Help, Local, Watch, Register, 8087, and Command Windows

CodeView can display one of each of these windows. The Register window has an additional shortcut key (F2) you can use to open or close it. When you open the Help window, CodeView displays the last Help screen you viewed. If you have not yet viewed Help during the session, CodeView displays the top-level contents in the Microsoft Advisor.
View Output
To view your program's output screen, choose View Output or press ALT+F4. CodeView displays the output screen until you press a key.
The Help Menu

The Help menu contains commands to access the Microsoft Advisor Help system. When you choose a Help command, CodeView opens the Help window if it is not already open and displays the appropriate part of the Microsoft Advisor. When the Help window is open, you can browse through Help with mouse and keyboard commands. All Microsoft environments provide the same mouse and keyboard commands to access the Microsoft Advisor. For more information on getting the most out of Help, see Chapter 21. The following table summarizes the commands on the Help menu:
Command        Purpose
Index          Displays the table of Microsoft Advisor indexes
Contents       Displays the Microsoft Advisor contents screen
Topic          Displays Help on the current word
Help on Help   Displays Help on using the Microsoft Advisor
About          Displays CodeView copyright and version information
Index
The Index command displays a table of available indexes. Each part of the Help system has its own index.
Contents
The Contents command (SHIFT+F1) displays the contents for the entire Help system. This screen lists the table of contents for each Help system component.
Topic
The Topic command (F1) displays help on the word at the cursor or the selected text. When you open the Help menu, CodeView displays the topic in the menu. When you choose the Topic command, CodeView displays information on the indicated topic in the Help window.
Help on Help
The Help on Help command displays information on the Microsoft Advisor itself. It describes how the system is organized, how the mouse and keyboard commands are used to browse through Help, and how to use the various kinds of buttons you encounter.
About
The About command displays the CodeView copyright and version information in a dialog box.
CHAPTER 10
Special Topics
Debugging in the Windows Operating System

The Microsoft CodeView for the Windows operating system debugger (CVW) is a powerful tool for analyzing the behavior of Microsoft Windows-based programs. With CVW, you can test the execution of your application and examine your application's data. You can isolate problems quickly because you can display any combination of variables, global or local, while you halt or trace your application's execution.
Comparing CVW with CV

The CVW windows, menus, and commands are used in the same way as for CV. See Chapter 9, The CodeView Environment, for details on the format of CodeView windows and how to use the windows and menus. Like the MS-DOS CodeView, CVW allows you to display and modify any program variable, section of addressable memory, or processor register. However, CodeView for Windows differs from CV in the following ways:
- Because the Windows operating system has a special use for the ALT+/ key combination used by CV to repeat a search, CVW uses CTRL+R.
- CVW tracks your application's segments and data as the Windows operating system moves them in memory. Thus, when you refer to an object by name, CVW always supplies the correct address.
CVW also provides six additional Command-window commands for Windows-based program debugging, which are summarized in the following list:
Windows Display Global Heap (WDG)
    Displays memory objects in the global heap.
Windows Display Local Heap (WDL)
    Displays memory objects in the local heap.
Windows Dereference Local Handle (WLH)
    Dereferences a local heap handle to a pointer.
Windows Dereference Global Handle (WGH)
    Dereferences a global heap handle to a pointer.
Windows Display Modules (WDM)
    Displays a list of the application and DLL modules currently loaded in the Windows operating system.
Windows Kill Application (WKA)
    Terminates the task that is currently executing by simulating a fatal error.
For details on using these commands, see "CVW Commands" on page 357.
The following CV features are not available in CVW:
- The Print command from the File menu.
- The DOS Shell command from the File menu and the corresponding Shell (!) Command-window command.
- The Screen Swap command from the Options menu and the corresponding Options (OF) Command-window command.
Preparing to Run CVW

Before beginning a CVW debugging session, you must ensure that your system is configured correctly and that the Windows-based application you are going to debug is compiled and linked with the options that generate CodeView debugging information. For information on setting up your system and configuring CodeView, see "Setting up CodeView" on page 299. For information on preparing programs for use with CodeView, see "General Programming Considerations" on page 294 and "Compiling and Linking" on page 295. The SETUP program installs two files in your MASM\BIN subdirectory: CVW.EXE and CVW1.386. These two files must be in the current path. Also, in order for CodeView to run properly with the Windows operating system, the line:
device = drive:\MASM\BIN\CVW1.386
must appear under the [386Enh] section of your SYSTEM.INI file, where drive is the hard disk drive where MASM resides. The window that CodeView uses cannot be sized or moved as can the windows of other Windows-based applications. You can specify a different starting position for CodeView using the /X and /Y command-line options. For information on the CodeView command-line options, see page 310.
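As a quick way to confirm the SYSTEM.INI requirement, the following Python sketch scans the file text for a device entry naming CVW1.386 inside the [386Enh] section. It is a minimal check under stated assumptions: the function name, the sample contents, and the C: drive letter are hypothetical, and the file is scanned line by line because the section normally holds several device entries, which Python's configparser rejects by default as duplicate keys.

```python
def has_cvw_device_line(ini_text: str) -> bool:
    """Return True if [386Enh] contains a device= entry for CVW1.386."""
    section = None
    for raw in ini_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section != "386enh" or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        # Compare only the file name so any drive and directory match.
        file_name = value.lower().replace("/", "\\").split("\\")[-1]
        if key.lower() == "device" and file_name == "cvw1.386":
            return True
    return False


# Hypothetical SYSTEM.INI contents; the drive letter is an assumption.
SAMPLE = r"""
[boot]
shell=progman.exe

[386Enh]
device=*vpicd
device = C:\MASM\BIN\CVW1.386
"""

found = has_cvw_device_line(SAMPLE)
```

To check a real installation, read the SYSTEM.INI file from the Windows directory and pass its text to the function.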
If You Use the Windows Operating System Version 3.0

If the Windows operating system Version 3.0 is running in standard mode and CodeView is invoked with the /X option or with no parameters, Windows Version 3.0 generates an error when CodeView attempts to switch to protected mode. This conflict occurs only with Windows Version 3.0 running in standard mode. You can avoid it by configuring a PIF file. For CodeView to run under Windows Version 3.0, create a PIF file using the PIF Editor. In the Optional Parameters field, enter only a question mark (?). This instructs Windows 3.0 to prompt for additional options when CodeView is invoked. When the PIF file is run, it prompts for the command line. Specify the appropriate parameter as listed below, followed by the name of the program to be debugged.
Windows Mode    Switch(es)
Real            /D or none
Standard        /D
386 enhanced    /D (or /E if expanded memory is available)
Starting a Debugging Session

Like most Windows-based applications, CVW can be started in several ways. You can double-click the CVW icon and respond to CVW's prompts for arguments, or you can run CVW by using the Run command from the Program Manager File menu. To specify CVW options, choose the Run command from the Program Manager File menu. The Windows operating system displays a dialog box where you can enter the appropriate options for your debugging session. For specific information on CodeView command-line syntax and options, see "The CodeView Command Line" on page 308. You can run CVW to perform the following tasks:
- Debug a single application
- Debug multiple instances of an application
- Debug multiple applications
- Debug dynamic-link libraries (DLLs)
This section describes the methods you use to perform these tasks and summarizes the syntax of the CVW command line for each task.
Starting CVW for a Single Application

After you start CVW from Windows, CodeView displays the Load dialog box.
To start debugging a single application:
1. Type the name of the application in the File to Debug text box. CVW assumes the .EXE filename extension if you do not include an extension for the application name. You can also pick the program that you want to debug by choosing it from the Files List box.
2. If you want to specify command-line arguments, move the cursor to the Arguments text box and type the program's command line.
3. Choose OK. CVW loads the application and displays the source code for the application's WinMain routine.
4. Set breakpoints in the code if you desire.
5. Use the Go (G) command (F5) to begin executing the application.
To avoid the startup dialog boxes:
1. Choose the Run command from the Windows File menu.
2. Type the application name and arguments on the CVW command line. Use the following syntax to start debugging a single application:
CVW [[options]] appname[[.EXE]] [[arguments]]
3. Choose OK.
Starting CVW for Multiple Instances of an Application

The Windows operating system can run multiple instances of an application, which can cause problems. For example, each instance of an application might corrupt the others' data. To help you solve such problems, CVW allows you to debug multiple instances of an application. The breakpoints you set in your application apply to all of the instances. To determine which instance of the application has the focus in CVW, examine the DS register.
To debug multiple instances of an application:
1. Start CVW as usual for one instance of your application.
2. Run additional instances of your application by choosing the Run command from the Windows File menu.
You cannot specify the application name more than once on the CVW command line. Any additional application names are passed as arguments to the first application.
Starting CVW for Multiple Applications

You can debug two or more applications at the same time, such as a dynamic data exchange (DDE) client and server.
To debug several applications at the same time:
1. Start CVW as usual for a single application.
2. Choose Load from the Run menu and choose other applications that you also want to debug.
3. Set breakpoints in either or both applications. You can use the Open Module command from the CVW File menu to display the source code for the different modules. If you know the module and the location or function name, you can use the context operator ({ }) to directly set breakpoints in the other applications.
4. Use the Go (G) command (F5) to start running the first application.
5. Choose the Run command from the Windows File menu to start running the second application.
You can also use the /L option on the CVW command line to load the symbols for additional applications, as shown in this example:
CVW /Lsecond.exe /Lthird.exe first
The /L option and name of each additional application must precede the name of the first application on the command line. You must specify the .EXE filename extension for the additional applications. Repeat the /L option for each application to be included in the debugging session. Once CVW starts, choose the Run command from the Windows File menu to start executing the additional applications.
Note: Global symbols with the same name in several applications (such as WinMain) may not be distinguished. You can use the context operator to specify the exact instance of a symbol.
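The ordering rules for the /L option can be captured in a small Python helper that assembles a CVW command line. This is only a sketch of the rules stated above: the function and parameter names are hypothetical, and the helper builds a string without starting CVW.

```python
from typing import Iterable


def build_cvw_command(first_app: str,
                      extra_apps: Iterable[str] = (),
                      options: Iterable[str] = (),
                      arguments: Iterable[str] = ()) -> str:
    """Build a CVW command line for several applications.

    Each additional application gets its own /L option, every /L option
    precedes the first application's name, and additional applications
    must carry the .EXE extension.
    """
    extra_apps = list(extra_apps)
    for name in extra_apps:
        if not name.lower().endswith(".exe"):
            raise ValueError(f"additional application needs .EXE: {name}")
    parts = ["CVW", *options]
    parts += [f"/L{name}" for name in extra_apps]
    parts.append(first_app)      # first application follows all /L options
    parts += list(arguments)     # anything after it goes to first_app
    return " ".join(parts)


command = build_cvw_command("first", ["second.exe", "third.exe"])
```

Because everything after the first application name is passed to that application as arguments, the helper places all /L options before it.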
Starting CVW for DLLs

You can debug one or more DLLs while debugging an application.
To debug a DLL at the same time as an application:
1. Start CVW as usual for the application.
2. Choose Load from the Run menu and type the name of the DLL.
3. Set breakpoints in the application or DLL. You can use the Open Module command from the CVW File menu to display the source code for the different modules.
4. Use the Go (G) command (F5) to continue executing the application.
You can also use the /L option on the CVW command line to specify the DLLs, as shown in this example:
CVW /Lappdll appname
The /L option must precede the name of the application. Repeat the /L option for each DLL you want to debug.
Debugging the LibEntry DLL Initialization Routine

CVW allows you to debug the LibEntry initialization routine of a DLL. If your application implicitly loads the library, however, a special technique is required to debug the LibEntry routine. An application implicitly loads a DLL if the library routines are imported in the application's module-definition (.DEF) file or if your application imports library routines through an import library when you link the application. An application explicitly loads a DLL by calling the LoadLibrary routine.
If your application implicitly loads the DLL and you specify the application in the Command Line dialog box, Windows automatically loads the DLL and executes the LibEntry routine when it loads the application. This gives you no opportunity to debug the LibEntry routine, since it is executed when the application is loaded and before CVW gains control. To gain control before the LibEntry routine is executed, you must set a breakpoint in the LibEntry routine before the DLL is loaded.
To set this breakpoint:
1. In the CVW Load dialog box, provide the name of a dummy application that does not load the library, instead of the name of your application. The CALC.EXE program is provided for this purpose.
2. Load the DLL by using the Load command from the Run menu.
3. Choose the Open Module command from the CVW File menu and select the module containing the LibEntry routine.
4. Set at least one breakpoint in the LibEntry routine.
5. Use the Go (G) command (F5) to start the dummy application.
6. Run your application using the Run command from the Windows File menu. CVW resumes control when the breakpoint in the LibEntry routine is taken.
You can also specify the dummy application or DLL on the CVW command line. To begin a DLL debugging session from the command line: 1. Type the command line:
CVW /Lmydll winstub
2. After CVW starts, do steps 3 through 6 in the previous procedure to begin debugging.
CVW Commands
CVW recognizes several commands for Windows-based program debugging in addition to the Command-window commands recognized by CV. These commands allow you to inspect objects in the global and local Windows heaps, list the currently loaded modules, trace and set breakpoints on the occurrence of Windows operating system messages, and terminate the currently executing task.
Windows Display Global Heap

The Windows Display Global Heap (WDG) command lists the memory objects in the Windows global heap.
Syntax
WDG [[ghandle]]
If ghandle is specified, WDG displays the first five global memory objects that start at the object specified by ghandle. The ghandle argument must be a valid handle to an object allocated on the global heap. If ghandle is not specified, WDG displays the entire global heap in the Command window. Global memory objects are displayed in the order in which Windows manages them, which is typically not in ascending handle order.
Format
handle address size PDB locks type owner
A field may be absent if it is not defined for the block.
Field     Description
handle    Value of the global memory block handle.
address   Address of the global memory block.
size      Size of the block in bytes.
PDB       Block owner. If present, indicates that the task's Process Descriptor Block is the owner of the block.
locks     Count of locks on the block.
type      The memory-block type.
owner     The block owner's module name.
Windows Display Local Heap

The Windows Display Local Heap (WDL) command displays the entire heap of local Windows operating system memory objects. This command takes no arguments.
Syntax
WDL
Format
handle address size flags locks type heaptype blocktype
A field may be absent if it is not defined for the block.
Field       Description
handle      Value of the global memory block handle
address     Address of the block
size        Size of the block in bytes
flags       The block's flags
locks       Count of locks on the block
type        The type of the handle
heaptype    The type of heap where the block resides
blocktype   The block's type
Windows Display Modules

The Windows Display Modules (WDM) command displays a list of all the DLL and task modules that the Windows operating system has loaded. To see a list of known modules, type the WDM command in the Command window.
Syntax
WDM
Each entry in the list is displayed with the following format:
handle refcount module path
Field      Description
handle     The module handle
refcount   The number of times the module has been loaded
module     The name of the module
path       The path of the module's executable file
Watching Windows Operating System Messages

You can trace the occurrence of a Windows operating system message or an entire class of Windows operating system messages by using the Breakpoint Set (BP) command. You can stop at each message, or you can execute continuously and display the messages in the Command window as they are received. To trace a Windows operating system message or message class, set a breakpoint using the following options:
Syntax
BP winproc /M{msgname|msgclass} [[/D]]
winproc
    Symbol name or address of a window function.
msgname
    The name of a Windows operating system message, such as WM_PAINT. The msgname is case sensitive.
msgclass
    A case-insensitive string of characters that identifies one or more classes of messages to watch. Use the following characters to indicate the class of Windows operating system message:
    Class   Type of Windows Message
    m       Mouse
    w       Window management
    n       Input
    s       System
    i       Initialization
    c       Clipboard
    d       DDE
    z       Nonclient
/D
    When specified, CodeView displays the message in the Command window, but your program continues executing. The message is displayed in a form similar to the following example:
HWND:1c00 wParm:0000 lParm:000000 msg:000F WM_PAINT
For each matching message that is sent to the specified winproc, CVW lists the hexadecimal values of the window handle (HWND), word parameter (wParm), long parameter (lParm), and message (msg) arguments, along with the name of the message. You can also specify a pass count and commands to be executed when the breakpoint is taken. For details on the full Breakpoint (BP) command syntax, see "BP (Breakpoint Set)" on page 405. Note that you can also use the Breakpoint Set command from the Data menu to set all types of breakpoints.
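If you capture the /D message lines from the Command window for later inspection, a short script can split them into fields. The Python sketch below assumes each line follows the layout of the sample line in this section, with the window handle read as hexadecimal 1c00; the function names and the dictionary keys are hypothetical. It also expands a msgclass string into the message classes it selects.

```python
import re

# Class characters accepted by the /M option, as listed in the table above.
MESSAGE_CLASSES = {
    "m": "Mouse",
    "w": "Window management",
    "n": "Input",
    "s": "System",
    "i": "Initialization",
    "c": "Clipboard",
    "d": "DDE",
    "z": "Nonclient",
}

# Layout assumed from the sample line: four hexadecimal fields and a name.
_MESSAGE_LINE = re.compile(
    r"HWND:(?P<hwnd>[0-9A-Fa-f]+)\s+"
    r"wParm:(?P<wparm>[0-9A-Fa-f]+)\s+"
    r"lParm:(?P<lparm>[0-9A-Fa-f]+)\s+"
    r"msg:(?P<msg>[0-9A-Fa-f]+)\s+"
    r"(?P<name>\S+)"
)


def describe_classes(msgclass: str) -> list:
    """Expand a msgclass string such as 'mw' (case-insensitive)."""
    return [MESSAGE_CLASSES[ch] for ch in msgclass.lower()]


def parse_message_line(line: str) -> dict:
    """Convert one /D message line into integers plus the message name."""
    match = _MESSAGE_LINE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unrecognized message line: {line!r}")
    fields = {key: int(match[key], 16)
              for key in ("hwnd", "wparm", "lparm", "msg")}
    fields["name"] = match["name"]
    return fields


parsed = parse_message_line(
    "HWND:1c00 wParm:0000 lParm:000000 msg:000F WM_PAINT")
classes = describe_classes("MW")
```

The numeric fields are converted from hexadecimal so that they can be compared or filtered directly; an unknown class character raises KeyError rather than being silently ignored.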
Windows Kill Application

The Windows Kill Application (WKA) command terminates the currently executing task by simulating a fatal error. Since a fatal error terminates the application without performing any of the normal program exit processing, use WKA with caution. To terminate your application, type the following command in the Command window:
Syntax
WKA
As a result of the simulated fatal error, Windows displays an Unexpected Application Error box. After you close the box, Windows may not release subsequent mouse input messages from the system queue until you press a key. If this happens, the mouse pointer moves on the Windows screen but Windows does not respond to the mouse. After you press any key, Windows responds to the queued mouse events.
The currently executing task is not necessarily your application, so you should use the WKA command only when your application is the currently executing task. You can be sure that your application is the currently executing task when CVW shows the current location at a breakpoint in your application. For more information on using the WKA command, see "Terminating Your Program" on page 362.
CVW Debugging Techniques

Debugging Windows-based programs can be challenging. Objects move around in memory. The thread of execution can be a twisting maze where it is difficult to know what code is executing or to control what code in your program is executed. This section describes the WLH and WGH commands that you use to examine movable memory objects by their handles. It also describes ways to control your application's execution, how to interrupt and resume debugging your application, how to handle abnormal termination from fatal errors and general protection faults, and how to resume debugging your application after a normal termination.
Dereferencing Memory Handles

In a Windows-based application, the LocalLock and GlobalLock routines are used to lock memory handles so that the application can dereference them into near or far pointers. In a debugging session, you may know the memory object's handle. However, you may not know what near or far address the handle references unless you are debugging in an area where the program has just completed a LocalLock or GlobalLock routine call. To get the near and far pointer addresses for unlocked local and global handles, use the WLH and WGH commands. For detailed information on the WLH and WGH commands, see "WLH (Windows Dereference Local Handle)" on page 441 and "WGH (Windows Dereference Global Handle)" on page 439.
Controlling Application Execution

In CVW, all of the CodeView execution commands (Go, Program Step, Trace, and Animate) can be used to control your application's execution. However, you should keep these restrictions in mind while using CVW:
• Attempting to use the Program Step or Trace commands to execute Windows operating system startup code in Assembly mode causes unpredictable results. To step through your application in Assembly mode, first set a breakpoint at the WinMain routine and begin stepping through the program only after the breakpoint is taken.
• Directly calling a Windows-based application procedure or dialog routine in the Watch window, in the Quick Watch dialog box, or with the Display Expression (?) command can have unpredictable results.
The rest of this section describes techniques and special considerations for controlling program execution in CVW.
Interrupting Your Program

There may be times when you want to halt your program immediately. You can interrupt your program by pressing CTRL+ALT+SYSREQ. After you press CTRL+ALT+SYSREQ, CVW gains control and displays code corresponding to the current CS:IP location. You then have the opportunity to examine registers and memory, set breakpoints and watch expressions, and modify variables. To resume execution, use one of the CodeView program execution commands.

You should take care when you interrupt execution. If you interrupt execution while Windows operating system code or other system code is executing, attempting to use the Program Step or Trace commands can produce unpredictable results. When you interrupt execution, it is safest to set breakpoints in your code and then resume continuous execution with the Go command, rather than using the Program Step, Trace, or Animate commands.

For example, an infinite loop in your code presents a special problem. Since you should avoid using the Program Step or Trace commands after interrupting your application, you should try to locate the loop by setting breakpoints in places you suspect are in the loop, then resume continuous execution. When one of these breakpoints is taken, you can be sure that the currently executing code is your application code.
Terminating Your Program

At times (such as when your application is executing an infinite loop), you may have to terminate the application. The Windows Kill Application (WKA) command terminates the currently executing task. Since this task is not necessarily your application, you should use the WKA command only when your application is the currently executing task.

If your application is the currently executing task and is executing a module containing CodeView information, the Source window highlights the current line or instruction. However, if your application contains modules that are compiled without CodeView information, it is more difficult to determine whether the assembly-language code displayed in the Source window belongs to your application or to another task. In this case, use the Windows Display Global Heap (WDG) command with the value in the CS register as the argument. CVW displays a listing that indicates whether the code segment belongs to your application.

If the current code is in your application, you can safely use the WKA command without affecting other tasks. However, the WKA command does not perform all the cleanup tasks associated with the normal termination of a Windows-based application. For example, global objects created during program execution but not destroyed before you terminated the program remain allocated in the system-wide global heap. This reduces the amount of memory available during the rest of the Windows operating system session. For this reason, you should use the WKA command to terminate the application only if you cannot terminate it normally.

You should exercise caution when using the WKA command on an application that loads a DLL. If you terminate the application before it frees the DLL, the DLL remains loaded. If you rebuild the DLL and then run CVW again, the new version of the DLL doesn't get loaded.

Note: The WKA command simulates a fatal error in your application, causing the Windows operating system to display an Unexpected Application Error message box. After you close this message box, Windows may not release subsequent mouse input messages from the system queue until you press a key. If this happens, the mouse pointer moves on the Windows operating system screen, but the Windows operating system does not respond to the mouse. After you press any key, the Windows operating system responds to the queued mouse events.
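The cost of skipping normal exit processing can be illustrated with a toy model. The following C sketch is not Windows code: the fixed-size slot array standing in for the system-wide global heap, and the heap_alloc, heap_available, task_exit_normally, and task_kill names, are assumptions made purely for illustration. It shows that objects owned by a task that is killed, rather than exited normally, stay allocated and reduce what is left for every other task.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (hypothetical, not the Windows API): a system-wide heap whose
   slots outlive the task that allocated them unless explicitly freed. */
#define HEAP_SLOTS 8
#define SLOT_FREE  0           /* task ids are positive; 0 marks a free slot */

static int heap_owner[HEAP_SLOTS];

/* Allocate one slot for `task`; returns the slot index or -1 if full. */
int heap_alloc(int task) {
    assert(task > 0);
    for (int i = 0; i < HEAP_SLOTS; i++) {
        if (heap_owner[i] == SLOT_FREE) {
            heap_owner[i] = task;
            return i;
        }
    }
    return -1;
}

/* Number of slots still available to every task in the session. */
int heap_available(void) {
    int n = 0;
    for (int i = 0; i < HEAP_SLOTS; i++) {
        if (heap_owner[i] == SLOT_FREE) n++;
    }
    return n;
}

/* Normal termination: the task's exit processing frees what it allocated. */
void task_exit_normally(int task) {
    for (int i = 0; i < HEAP_SLOTS; i++) {
        if (heap_owner[i] == task) heap_owner[i] = SLOT_FREE;
    }
}

/* Forced termination: exit processing is skipped, so nothing is freed. */
void task_kill(int task) {
    (void)task;                /* the task's slots stay marked as owned */
}
```

In this model, a task that allocates three slots and is then ended with task_kill leaves heap_available() three lower for the rest of the session, whereas task_exit_normally restores the full count.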


http://web.sau.edu/LillisKevinM/csci240/masmdocs/ [12/27/2002 10:21:00 PM]

Intel Architecture Software Developer's Manual


CHAPTER 1 ABOUT THIS MANUAL

Bit and Byte Order

Figure 1-1. Bit and Byte Order

Reserved Bits and Software Compatibility

Hexadecimal and Binary Numbers

The following books contain additional material related to Intel processors:

CHAPTER 2 INTRODUCTION TO THE INTEL ARCHITECTURE

BRIEF HISTORY OF THE INTEL ARCHITECTURE

INCREASING INTEL ARCHITECTURE PERFORMANCE AND MOORE'S LAW

BRIEF HISTORY OF THE INTEL ARCHITECTURE FLOATING-POINT UNIT

INTRODUCTION TO THE P6 FAMILY PROCESSOR'S ADVANCED MICROARCHITECTURE

[Figure: block diagram. Labels: System Bus, L2 Cache, Cache Bus, Bus Interface Unit, L1 Instruction Cache, Fetch/Decode Unit]

Deep branch prediction. Dynamic data flow analysis. Speculative execution.

DETAILED DESCRIPTION OF THE P6 FAMILY PROCESSOR MICROARCHITECTURE

Figure 2-2. Functional Block Diagram of the P6 Family Processor Microarchitecture. Labels: System Bus (External), L2 Cache, Cache Bus, Instruction Fetch Unit, Instruction Cache (L1), Register Alias Table, Retirement Register File (Intel Arch. Registers), Retirement Unit, Reorder Buffer (Instruction Pool), Data Cache Unit (L1), Internal Data-Results Buses

Instruction Pool (Reorder Buffer)

CHAPTER 3 BASIC EXECUTION ENVIRONMENT

OVERVIEW OF THE BASIC EXECUTION ENVIRONMENT

[Figure: basic execution environment. Labels: address space up to 2^36 - 1; General-Purpose Registers (eight 32-bit registers); six 16-bit registers; EFLAGS Register (32 bits); EIP (Instruction Pointer Register) (32 bits)]
