Professional Documents
Culture Documents
02 Isa PDF
02 Isa PDF
02 Isa PDF
Application
OS
CIS 501
Introduction to Computer Architecture
Compiler
CPU
Firmware
I/O
Memory
Digital Circuits
Readings
What Is An ISA?
H+P
Chapter 2
Further reading:
Appendix C (RISC) and Appendix D (x86)
Available from web page
Paper
No
guarantees regarding
How operations are implemented
Which operations are fast and which are slow and when
Which operations take more power and which take less
Allows communication
Language: person to person
ISA: hardware to software
Need to speak the same language/ISA
Many common aspects
Part of speech: verbs, nouns, adjectives, adverbs, etc.
Common operations: calculation, control/branch, memory
Many different languages/ISAs, many similarities, many differences
Different structure
Both evolve over time
Programmability
Programmability
For whom?
Implementability
Easy to design high-performance implementations?
More recently
Easy to design low-power implementations?
Easy to design high-reliability implementations?
Easy to design low-cost implementations?
Compatibility
Easy to maintain programmability (implementability) as languages
and programs (technology) evolves?
x86 (IA32) generations: 8086, 286, 386, 486, Pentium, PentiumII,
PentiumIII, Pentium4,
CIS 501 (Martin/Roth): Instruction Set Architectures
Human Programmability
Compiler Programmability
Rules of thumb
Regularity: principle of least astonishment
Orthogonality & composability
One-vs.-all
CIS 501 (Martin/Roth): Instruction Set Architectures
Implementability
Popular argument
10
11
12
Compatibility
Backward compatibility
New processors must support old programs (cant drop features)
Very important
13
Aspects of ISAs
Compatibilitys friends
VonNeumann model
Backward compatibility
Format
Length and encoding
Operand model
Where (other than memory) are operands stored?
Forward compatibility
Reserve sets of trap & nop opcodes (dont define uses)
Add ISA functionality by overloading traps
Release firmware patch to add to old implementation
Add ISA hints by overloading nops
CIS 501 (Martin/Roth): Instruction Set Architectures
14
Overview only
Read about the rest in the book and appendices
15
16
Format
Length
Read Inputs
Execute
Write Output
Next PC
Encoding
A few simple encodings simplify decoder implementation
Fixed length
Most common is 32 bits
+ Simple implementation: compute next PC using only PC
Code density: 32 bits to increment a register by 1?
x86 can do this in one 8-bit instruction
Variable length
Complex implementation
+ Code density
Compromise: two lengths
MIPS16 or ARMs Thumb
17
18
Length
32-bits
Encoding
3 formats, simple encoding
Q: how many instructions can be encoded? A: 127
Memory only
add B,C,A
R-type
Op(6)
Rs(5)
Rs(5) Rt(5)
Rt(5) Rd(5) Sh(5)
Sh(5) Func(6)
Func(6)
I-type
Op(6)
Rs(5)
Rs(5) Rt(5)
Rt(5)
J-type
Op(6)
Immed(16)
Immed(16)
Target(26)
MEM
CIS 501 (Martin/Roth): Instruction Set Architectures
19
20
load B
add C
store A
ACC = mem[B]
ACC = ACC + mem[C]
mem[A] = ACC
push B
push C
add
pop A
stk[TOS++] = mem[B]
stk[TOS++] = mem[C]
stk[TOS++] = stk[--TOS] + stk[--TOS]
mem[A] = stk[--TOS]
ACC
TOS
MEM
CIS 501 (Martin/Roth): Instruction Set Architectures
21
MEM
CIS 501 (Martin/Roth): Instruction Set Architectures
load B,R1
add C,R1
store R1,A
R1 = mem[B]
R1 = R1 + mem[C]
mem[A] = R1
22
R1 = mem[B]
R2 = mem[C]
R1 = R1 + R2
mem[A] = R1
23
24
Register Windows
No
One reason registers are faster is that there are fewer of them
Small is fast (hardware truism)
Another is that they are directly addressed (no address calc)
More of them, means larger specifiers
Fewer registers per instruction or indirect addressing
Not everything can be put in registers
Structures, arrays, anything pointed-to
Although compilers are getting better at putting more things in
More registers means more saving/restoring
Upshot: trend to more registers: 8 (x86)!32 (MIPS) !128 (IA32)
64-bit x86 has 16 64-bit integer and 16 128-bit FP registers
CIS 501 (Martin/Roth): Instruction Set Architectures
25
+
+
Memory Addressing
Examples
26
Register-Indirect: R1=mem[R2]
Displacement: R1=mem[R2+immed]
Index-base: R1=mem[R2+R3]
Memory-indirect: R1=mem[mem[R2]]
Auto-increment: R1=mem[R2], R2= R2+1
Auto-indexing: R1=mem[R2+immed], R2=R2+immed
Scaled: R1=mem[R2+R3*immed1+immed2]
PC-relative: R1=mem[PC+imm]
28
Op(6)
Rs(5)
Rs(5) Rt(5)
Rt(5)
Immed(16)
Immed(16)
29
Control Instructions
30
31
Why?
More than 80% of branches are (in)equalities or comparisons to 0
OK to take two insns to do remaining branches (MCCF)
32
Control Instructions II
Option I: PC-relative
Position-independent within procedure
Used for branches and jumps within a procedure
Option II: Absolute
Position independent outside procedure
Used for procedure calls
Option III: Indirect (target found in register)
Needed for jumping to dynamic targets
Used for returns, dynamic procedure calls, switches
Rs(5)
Rs(5) Rt(5)
Rt(5)
Immed(16)
Immed(16)
Op(6)
Target(26)
Indirect jumps: jr
Op(6)
33
R-type
Op(6)
Rs(5)
Rs(5) Rt(5)
Rt(5) Rd(5) Sh(5)
Sh(5) Func(6)
Func(6)
MIPS
Implicit return address register is $31
Direct jump-and-link: jal
Indirect jump-and-link: jalr
34
35
36
The Setup
Pre 1980
Single-cycle execution
Bad compilers
Complex, high-level ISAs
Slow multi-chip micro-programmed implementations
Vicious feedback loop
Hardwired control
CISC: microcoded multi-cycle operations
Load/store architecture
Around 1982
Advances in VLSI made single-chip microprocessor possible
Speed by integration, on-chip wires much faster than off-chip
but only for very small, very simple ISAs
Compilers had to get involved in a big way
38
The CISCs
The RISCs
39
32-bit instructions
32 registers
64-bit virtual address space
Fews addressing modes (SPARC and PowerPC have more)
Why so many? Everyone invented their own new ISA
40
Post-RISC
41
IBM put it into its PCs because there was no competing choice
Rest is historical inertia and financial feedback
x86 is most difficult ISA to implement and do it fast but
Because Intel sells the most non-embedded processors
It has the most money
Which it uses to hire more and better engineers
Which it uses to maintain competitive performance
And given equal performance compatibility wins
So Intel sells the most non-embedded processors
AMD as a competitor keeps pressure on x86 performance
42
43
44
OOO was very hard to do with a coarse grain ISA like x86
Their solution? Translate x86 to RISC uops in hardware
push $eax
45
Emulation/Binary Translation
Virtual ISAs
Machine virtualization
47
46
48
ISA Research
Summary
Current projects
ISA for Instruction-Level Distributed Processing [Kim,Smith]
Multi-level register file exposes local/global communication
DELI: Dynamic Execution Layer Interface [HP]
An open translation/optimization/caching infrastructure
WaveScalar [Swanson,Shwerin,Oskin]
The vonNeumann alternative
DISE: Dynamic Instruction Stream Editor [Corliss,Lewis,Roth]
A programmable ISA: ISA/binary-rewriting hybrid
{Programm|Implement|Compat}-ability
Compatibility is a powerful force
Compatibility and implementability: ISAs, binary translation
Aspects of ISAs
CISC and RISC
Next up: caches
49
50