Professional Documents
Culture Documents
Unit-02Introduction To Computer Architecture - Isa
Unit-02Introduction To Computer Architecture - Isa
Unit-02Introduction To Computer Architecture - Isa
CIS 501 (Martin/Roth): Instruction Set Architectures 1 CIS 501 (Martin/Roth): Instruction Set Architectures 2
CIS 501 (Martin/Roth): Instruction Set Architectures 3 CIS 501 (Martin/Roth): Instruction Set Architectures 4
A Language Analogy for ISAs RISC vs CISC Foreshadowing
• A ISA is analogous to a human language • Recall performance equation:
• Allows communication • (instructions/program) * (cycles/instruction) * (seconds/cycle)
• Language: person to person
• ISA: hardware to software • CISC (Complex Instruction Set Computing)
• Need to speak the same language/ISA • Improve “instructions/program” with “complex” instructions
• Many common aspects • Easy for assembly-level programmers, good code density
• Part of speech: verbs, nouns, adjectives, adverbs, etc.
• Common operations: calculation, control/branch, memory
• RISC (Reduced Instruction Set Computing)
• Many different languages/ISAs, many similarities, many differences
• Improve “cycles/instruction” with many single-cycle instructions
• Different structure
• Increases “instruction/program”, but hopefully not as much
• Both evolve over time
• Help from smart compiler
• Key differences: ISAs must be unambiguous • Perhaps improve clock cycle time (seconds/cycle)
• ISAs are explicitly engineered and extended • via aggressive implementation allowed by simpler instructions
CIS 501 (Martin/Roth): Instruction Set Architectures 5 CIS 501 (Martin/Roth): Instruction Set Architectures 6
CIS 501 (Martin/Roth): Instruction Set Architectures 7 CIS 501 (Martin/Roth): Instruction Set Architectures 8
Human Programmability Compiler Programmability
• What makes an ISA easy for a human to program in? • What makes an ISA easy for a compiler to program in?
• Proximity to a high-level language (HLL) • Low level primitives from which solutions can be synthesized
• Closing the “semantic gap” • Wulf: “primitives not solutions”
• Semantically heavy (CISC-like) insns that capture complete idioms • Computers good at breaking complex structures to simple ones
• “Access array element”, “loop”, “procedure call” • Requires traversal
• Example: SPARC save/restore • Not so good at combining simple structures into complex ones
• Bad example: x86 rep movsb (copy string) • Requires search, pattern matching (why AI is hard)
• Ridiculous example: VAX insque (insert-into-queue) • Easier to synthesize complex insns than to compare them
• “Semantic clash”: what if you have many high-level languages?
• Rules of thumb
• Stranger than fiction • Regularity: “principle of least astonishment”
• People once thought computers would execute language directly • Orthogonality & composability
• Fortunately, never materialized (but keeps coming back around) • One-vs.-all
CIS 501 (Martin/Roth): Instruction Set Architectures 9 CIS 501 (Martin/Roth): Instruction Set Architectures 10
CIS 501 (Martin/Roth): Instruction Set Architectures 11 CIS 501 (Martin/Roth): Instruction Set Architectures 12
Compatibility The Compatibility Trap
• No-one buys new hardware… if it requires new software • Easy compatibility requires forethought
• Intel was the first company to realize this • Temptation: use some ISA extension for 5% performance gain
• ISA must remain compatible, no matter what • Frequent outcome: gain diminishes, disappears, or turns to loss
• x86 one of the worst designed ISAs EVER, but survives – Must continue to support gadget for eternity
• As does IBM’s 360/370 (the first “ISA family”)
• Backward compatibility • Example: register windows (SPARC)
• New processors must support old programs (can’t drop features) • Adds difficulty to out-of-order implementations of SPARC
• Very important • Details shortly
• Forward (upward) compatibility
• Old processors must support new programs (with software help)
• New processors redefine only previously-illegal opcodes
• Allow software to detect support for specific new instructions
• Old processors emulate new instructions in low-level software
CIS 501 (Martin/Roth): Instruction Set Architectures 13 CIS 501 (Martin/Roth): Instruction Set Architectures 14
CIS 501 (Martin/Roth): Instruction Set Architectures 15 CIS 501 (Martin/Roth): Instruction Set Architectures 16
The Sequential Model Format
• Implicit model of all modern ISAs • Length
• Often called VonNeuman, but in ENIAC before • Fixed length
Fetch PC
• Basic feature: the program counter (PC) • Most common is 32 bits
Decode
• Defines total order on dynamic instruction + Simple implementation: compute next PC using only PC
Read Inputs • Next PC is PC++ unless insn says otherwise – Code density: 32 bits to increment a register by 1?
Execute • Order and named storage define computation – x86 can do this in one 8-bit instruction
Write Output • Value flows from insn X to Y via storage A iff…
• Variable length
Next PC
• X names A as output, Y names A as input…
– Complex implementation
• And Y after X in total order
+ Code density
• Processor logically executes loop at left
• Compromise: two lengths
• Instruction execution assumed atomic
• MIPS16 or ARM’s Thumb
• Instruction X finishes before insn X+1 starts
• Encoding
• Alternatives have been proposed… • A few simple encodings simplify decoder implementation
CIS 501 (Martin/Roth): Instruction Set Architectures 17 CIS 501 (Martin/Roth): Instruction Set Architectures 18
CIS 501 (Martin/Roth): Instruction Set Architectures 19 CIS 501 (Martin/Roth): Instruction Set Architectures 20
Operand Model: Accumulator Operand Model: Stack
• Accumulator: implicit single element storage • Stack: TOS implicit in instructions
load B ACC = mem[B] push B stk[TOS++] = mem[B]
add C ACC = ACC + mem[C] push C stk[TOS++] = mem[C]
store A mem[A] = ACC add stk[TOS++] = stk[--TOS] + stk[--TOS]
pop A mem[A] = stk[--TOS]
ACC TOS
MEM MEM
CIS 501 (Martin/Roth): Instruction Set Architectures 21 CIS 501 (Martin/Roth): Instruction Set Architectures 22
MEM
• Upshot: most new ISAs are load-store or hybrids
CIS 501 (Martin/Roth): Instruction Set Architectures 23 CIS 501 (Martin/Roth): Instruction Set Architectures 24
How Many Registers? Register Windows
• Registers faster than memory, have as many as possible? • Register windows: hardware activation records
• No • Sun SPARC (from the RISC I)
• One reason registers are faster is that there are fewer of them • 32 integer registers divided into: 8 global, 8 local, 8 input, 8 output
• Small is fast (hardware truism) • Explicit save/restore instructions
• Another is that they are directly addressed (no address calc) • Global registers fixed
– More of them, means larger specifiers • save: inputs “pushed”, outputs ! inputs, locals zeroed
– Fewer registers per instruction or indirect addressing • restore: locals zeroed, inputs ! outputs, inputs “popped”
• Not everything can be put in registers
• Hardware stack provides few (4) on-chip register frames
• Structures, arrays, anything pointed-to
• Spilled-to/filled-from memory on over/under flow
• Although compilers are getting better at putting more things in
– More registers means more saving/restoring + Automatic parameter passing, caller-saved registers
+ No memory traffic on shallow (<4 deep) call graphs
• Upshot: trend to more registers: 8 (x86)!32 (MIPS) !128 (IA32) – Hidden memory operations (some restores fast, others slow)
• 64-bit x86 has 16 64-bit integer and 16 128-bit FP registers – A nightmare for register renaming (more later)
CIS 501 (Martin/Roth): Instruction Set Architectures 25 CIS 501 (Martin/Roth): Instruction Set Architectures 26
CIS 501 (Martin/Roth): Instruction Set Architectures 27 CIS 501 (Martin/Roth): Instruction Set Architectures 28
Example: MIPS Addressing Modes Two More Addressing Issues
• MIPS implements only displacement • Access alignment: address % size == 0?
• Aligned: load-word @XXXX00, load-half @XXXXX0
• Why? Experiment on VAX (ISA with every mode) found distribution
• Unaligned: load-word @XXXX10, load-half @XXXXX1
• Disp: 61%, reg-ind: 19%, scaled: 11%, mem-ind: 5%, other: 4% • Question: what to do with unaligned accesses (uncommon case)?
• 80% use small displacement or register indirect (displacement 0) • Support in hardware? Makes all accesses slow
• Trap to software routine? Possibility
• Use regular instructions
• I-type instructions: 16-bit displacement • Load, shift, load, shift, and
• Is 16-bits enough? • MIPS? ISA support: unaligned access using two instructions
• Yes? VAX experiment showed 1% accesses use displacement >16 lwl @XXXX10; lwr @XXXX10
CIS 501 (Martin/Roth): Instruction Set Architectures 31 CIS 501 (Martin/Roth): Instruction Set Architectures 32
Control Instructions II MIPS Control Instructions
• Another issue: computing targets • MIPS uses all three
• Option I: PC-relative • PC-relative conditional branches: bne, beq, blez, etc.
• Position-independent within procedure • 16-bit relative offset, <0.1% branches need more
• Used for branches and jumps within a procedure
• Option II: Absolute I-type Op(6) Rs(5)
Rs(5) Rt(5)
Rt(5) Immed(16)
Immed(16)
• Position independent outside procedure
• Absolute jumps unconditional jumps: j
• Used for procedure calls
• Option III: Indirect (target found in register) • 26-bit offset
• Needed for jumping to dynamic targets J-type Op(6) Target(26)
• Used for returns, dynamic procedure calls, switches
• Indirect jumps: jr
• How far do you need to jump?
• Typically not so far within a procedure (they don’t get that big) R-type Op(6) Rs(5)
Rs(5) Rt(5)
Rt(5) Rd(5) Sh(5)
Sh(5) Func(6)
Func(6)
• Further from one procedure to another
CIS 501 (Martin/Roth): Instruction Set Architectures 33 CIS 501 (Martin/Roth): Instruction Set Architectures 34
CIS 501 (Martin/Roth): Instruction Set Architectures 35 CIS 501 (Martin/Roth): Instruction Set Architectures 36
The Setup The RISC Tenets
• Pre 1980 • Single-cycle execution
• Bad compilers • CISC: many multicycle operations
• Complex, high-level ISAs • Hardwired control
• Slow multi-chip micro-programmed implementations • CISC: microcoded multi-cycle operations
• Vicious feedback loop
• Load/store architecture
• Around 1982 • CISC: register-memory and memory-memory
• Advances in VLSI made single-chip microprocessor possible…
• Few memory addressing modes
• Speed by integration, on-chip wires much faster than off-chip
• CISC: many modes
• …but only for very small, very simple ISAs
• Fixed instruction format
• Compilers had to get involved in a big way
• CISC: many formats and lengths
• RISC manifesto: create ISAs that…
• Reliance on compiler optimizations
• Simplify single-chip implementation
• CISC: hand assemble to get good performance
• Facilitate optimizing compilation
CIS 501 (Martin/Roth): Instruction Set Architectures 37 CIS 501 (Martin/Roth): Instruction Set Architectures 38
CIS 501 (Martin/Roth): Instruction Set Architectures 39 CIS 501 (Martin/Roth): Instruction Set Architectures 40
Post-RISC The RISC Debate
• Intel/HP IA64 (Itanium): 2000 • RISC argument [Patterson et al.]
• Fixed length instructions: 128-bit 3-operation bundles • CISC is fundamentally handicapped
• EPIC: explicitly parallel instruction computing • For a given technology, RISC implementation will be better (faster)
• 128 64-bit registers • Current technology enables single-chip RISC
• When it enables single-chip CISC, RISC will be pipelined
• Special instructions: true predication, software speculation
• When it enables pipelined CISC, RISC will have caches
• Every new ISA feature suggested in last two decades
• When it enables CISC with caches, RISC will have next thing...
• More later in course
CIS 501 (Martin/Roth): Instruction Set Architectures 41 CIS 501 (Martin/Roth): Instruction Set Architectures 42
CIS 501 (Martin/Roth): Instruction Set Architectures 43 CIS 501 (Martin/Roth): Instruction Set Architectures 44
Intel’s Trick: RISC Inside Transmeta’s Take: Code Morphing
• 1993: Intel wanted out-of-order execution in Pentium Pro • Code morphing: x86 translation performed in software
• OOO was very hard to do with a coarse grain ISA like x86 • Crusoe/Astro are x86 emulators, no actual x86 hardware anywhere
• Their solution? Translate x86 to RISC uops in hardware • Only “code morphing” translation software written in native ISA
push $eax • Native ISA is invisible to applications, OS, even BIOS
is translated (dynamically in hardware) to • Different Crusoe versions have (slightly) different ISAs: can’t tell
store $eax [$esp-4] • How was it done?
addi $esp,$esp,-4
• Code morphing software resides in boot ROM
• Processor maintains x86 ISA externally for compatibility
• On startup boot ROM hijacks 16MB of main memory
• But executes RISC µISA internally for implementability
• Translator loaded into 512KB, rest is translation cache
• Translation itself is proprietary, but 1.6 uops per x86 insn
• Software starts running in interpreter mode
• Given translator, x86 almost as easy to implement as RISC
• Interpreter profiles to find “hot” regions: procedures, loops
• Result: Intel implemented OOO before any RISC company
• Hot region compiled to native, optimized, cached
• Idea co-opted by other x86 companies: AMD and Transmeta
• Gradually, more and more of application starts running native
• The one company that resisted (Cyrix) couldn’t keep up
CIS 501 (Martin/Roth): Instruction Set Architectures 45 CIS 501 (Martin/Roth): Instruction Set Architectures 46
CIS 501 (Martin/Roth): Instruction Set Architectures 47 CIS 501 (Martin/Roth): Instruction Set Architectures 48
ISA Research Summary
• Compatibility killed ISA research for a while • What makes a good ISA
• But binary translation/emulation has revived it • {Programm|Implement|Compat}-ability
• Current projects • Compatibility is a powerful force
• “ISA for Instruction-Level Distributed Processing” [Kim,Smith] • Compatibility and implementability: µISAs, binary translation
• Multi-level register file exposes local/global communication • Aspects of ISAs
• “DELI: Dynamic Execution Layer Interface” [HP] • CISC and RISC
• An open translation/optimization/caching infrastructure
• “WaveScalar” [Swanson,Shwerin,Oskin]
• Next up: caches
• The vonNeumann alternative
• “DISE: Dynamic Instruction Stream Editor” [Corliss,Lewis,Roth]
• A programmable µISA: µISA/binary-rewriting hybrid
• Local project: http://…/~eclewis/proj/dise/
CIS 501 (Martin/Roth): Instruction Set Architectures 49 CIS 501 (Martin/Roth): Instruction Set Architectures 50