
This equipment is representative of the tabulating system invented and developed by Herman Hollerith
(1860-1929) and built for the U.S. Census Bureau. These machines were first used in compiling the 1890
Census. Hollerith's patents were acquired by the Computing-Tabulating-Recording Co. (which later became
IBM), and this work became the basis of the IBM Punched Card System. The first "tabulator" used simple
clock-like counting devices. When an electrical circuit is closed (through a punched hole in a predetermined
position on the card), each counter is actuated by an electromagnet. The unit's pointer (clock hand) moves
one step each time the magnet is energized. The circuits to the electromagnets are closed by means of a
hand-operated press type card reader. The operator places each card in the reader, pulls down a lever, and
removes the card after each punched hole is counted.
- IBM Archives (http://www-03.ibm.com/ibm/history/exhibits/attic/attic_071.html)
COMP 206:
Computer Architecture and
Implementation

Montek Singh
Jan 27-29, 2009

Lecture 4: Instruction Set Architecture

Approaching an ISA
 Instruction Set Architecture
 Defines set of operations, instruction format, hardware
supported data types, named storage, addressing modes,
sequencing
 Meaning of each instruction is described by RTL on
architected registers and memory

Moving Toward Design
 Given technology constraints, assemble an adequate datapath
  Architected storage mapped to actual storage
  Function units to do all the required operations
  Possible additional storage (e.g., MAR)
  Interconnect to move information among regs and FUs
 Map each instruction to a sequence of RTLs
 Collate sequences into a symbolic controller state transition
diagram (STD)
 Implement controller
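
The design steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the accumulator-style machine, its three opcodes (LOAD, ADD, STORE), and the register names PC, IR, MDR, ACC are hypothetical and not taken from the lecture; only MAR is mentioned on the slide. Each instruction maps to a sequence of RTL steps, and the "controller" simply walks the fetch sequence followed by the sequence selected by the opcode.

```python
# Hypothetical accumulator machine, for illustration only.
# State: PC, IR, MAR, MDR, ACC, and a word-addressed memory M.

# Fetch sequence shared by every instruction: (RTL label, action on state).
FETCH = [
    ("MAR := PC", lambda s: s.update(MAR=s["PC"])),
    ("MDR := M[MAR]", lambda s: s.update(MDR=s["M"][s["MAR"]])),
    ("IR := MDR; PC := PC + 1", lambda s: s.update(IR=s["MDR"], PC=s["PC"] + 1)),
]


def _write_memory(s):
    """RTL step M[MAR] := MDR."""
    s["M"][s["MAR"]] = s["MDR"]


# Execute sequences: one list of RTL steps per (hypothetical) opcode.
EXECUTE = {
    "LOAD": [
        ("MAR := IR.addr", lambda s: s.update(MAR=s["IR"][1])),
        ("MDR := M[MAR]", lambda s: s.update(MDR=s["M"][s["MAR"]])),
        ("ACC := MDR", lambda s: s.update(ACC=s["MDR"])),
    ],
    "ADD": [
        ("MAR := IR.addr", lambda s: s.update(MAR=s["IR"][1])),
        ("MDR := M[MAR]", lambda s: s.update(MDR=s["M"][s["MAR"]])),
        ("ACC := ACC + MDR", lambda s: s.update(ACC=s["ACC"] + s["MDR"])),
    ],
    "STORE": [
        ("MAR := IR.addr", lambda s: s.update(MAR=s["IR"][1])),
        ("MDR := ACC", lambda s: s.update(MDR=s["ACC"])),
        ("M[MAR] := MDR", _write_memory),
    ],
}


def run_instruction(state, trace):
    """Controller: walk the fetch states, then the states chosen by the opcode."""
    for label, action in FETCH:
        action(state)
        trace.append(label)
    opcode = state["IR"][0]
    for label, action in EXECUTE[opcode]:
        action(state)
        trace.append(label)


def run(memory, count):
    """Execute `count` instructions starting at address 0; return state and RTL trace."""
    state = {"PC": 0, "IR": None, "MAR": 0, "MDR": 0, "ACC": 0, "M": memory}
    trace = []
    for _ in range(count):
        run_instruction(state, trace)
    return state, trace
```

Instructions are stored as `(opcode, address)` tuples and data as plain integers, which sidesteps encoding. A real controller would be a hardware state machine; the lists above play the role of the paths through the state transition diagram.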

Datapath vs Control

[Figure: Datapath and Controller, connected by signals and control points]

 Datapath: Storage, FU, interconnect sufficient to perform the desired functions
 Inputs are Control Points
 Outputs are signals (such as overflow, negative, etc)
 Controller: State machine to orchestrate operation on the data path
 Based on desired function and signals

Contents
 Design objectives
 Information representation
 Endian-ness, aligned access
 Organization of Instructions
 Encoding

Instruction Set Design Objective #1
Code size (code density):
 Depends on:
 size of MM/cache
 access time of cache (on-chip/off-chip)
 CPU-MM bandwidth
 Frequently used instructions should be short
 Implies variable-length instructions
 But there are negatives to this

Instruction Set Design Objective #2
Execution speed (performance) :
 Only frequently executed instructions should be included in the instruction set
 Infrequently executed instructions slow down the others
 Complex and long instructions tend to be used infrequently
 Defining hardware-software interface
 Frequently executed instructions should be fast
 Pipelining should be made as easy as possible
 Overlapped execution lowers CPI value
 Single instruction length, simple instruction formats, and few addressing
modes for easy decoding
 Three (register) address instructions decouple CPU and memory

Instruction Set Design Objective #3
Minimize size and complexity of hardware
(ALU/Control)
 Implementing infrequently executed instructions ties down
hardware that is rarely used, and could be used for some
other purpose with greater advantage

Instruction Set Design Objective #4
Instruction set as a programming language
 Needs of a human programmer (less important today)
 Several desirable properties of instruction sets have been recognized and described,
such as orthogonality (each operand can be specified independently of the others)
and consistency (being able to predict the remainder of an architecture given partial
knowledge of the system)
 Needs of an optimizing compiler
 Simple instructions are more suitable for code optimizations
 Optimizing compilers try to find the shortest or fastest code sequence that
implements the semantics of a HLL program. To make code reorganization
tractable, an instruction set is needed that makes:
– the size of each instruction easy to calculate;
– the execution time of each instruction easy to calculate;
– the interactions between instructions easy to figure out.
  ISA features such as complex addressing modes, variable length instructions, and
special-purpose registers provide too many ways of doing the same thing and lead to
combinatorial explosion

Notations for Information Representation

64 bits

8 bytes

2 words

1 doubleword

Q: How do we number these various units of information in a consistent manner?

Example number: 9 6 2 1 7 6 6
Leftmost digit (9): Most Significant Digit (MSD), the “Big End”
Rightmost digit (6): Least Significant Digit (LSD), the “Little End”

“Big End”-ian Numbering:    0 1 2 3 4 5 6
“Little End”-ian Numbering: 6 5 4 3 2 1 0

“On holy wars and a plea for peace”, Danny Cohen, IEEE Computer 14(10), pages 49-54, Oct 1981
Why Is Numbering Important?
 English text is written left-to-right and the characters are numbered left-
to-right
 Numbers can be numbered in two different ways
 Memory locations are numbered (addresses)
 Consequences of numbering
 Data is stored in memory according to byte numbering (the lower-numbered byte goes
into a byte in memory with a smaller address)
 Data is sent through a bit-serial communication channel according to bit numbering (bit 0
goes first, followed by bit 1, etc.)
 When displaying computer representation for humans
 Numbers are written in the usual way (MSD on left, LSD on right)
 Text is written in such a way as to match the numbering of numbers
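
The rule that the lower-numbered byte goes to the smaller memory address can be tried out with a minimal Python sketch. The helper names `store_word` and `load_word` and the sample value are assumptions made for illustration; `int.to_bytes` and `int.from_bytes` are standard Python.

```python
def store_word(memory, address, value, byteorder):
    """Store a 32-bit value at `address`; byteorder is 'big' or 'little'."""
    memory[address:address + 4] = value.to_bytes(4, byteorder)


def load_word(memory, address, byteorder):
    """Load a 32-bit value from `address` using the given byte numbering."""
    return int.from_bytes(memory[address:address + 4], byteorder)


memory = bytearray(8)
value = 0x0A0B0C0D  # arbitrary sample value

# Big Endian numbering: byte 0 is the most significant byte,
# so 0x0A lands at the smallest address.
store_word(memory, 0, value, "big")

# Little Endian numbering: byte 0 is the least significant byte,
# so 0x0D lands at the smallest address.
store_word(memory, 4, value, "little")
```

Reading a word back with the opposite byte order reverses its bytes, which is why the numbering convention has to be agreed on by everything that shares the memory or the communication channel.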

Odds and Ends about Numbering
 The Little Endian notation is compatible with mathematical
conventions of positional notation
 The Little Endian notation has the disadvantage that it
displays English text in reverse
 To overcome this, manuals for Little Endian machines usually display character
strings vertically
 Example machines
 Little Endian: PDP-11, VAX, 80x86
 Big Endian: IBM 370, MIPS, DLX, SPARC
 Mixed: Motorola 68000, Z8000
 Big Endian byte ordering
 Little Endian bit ordering

Alignment of Words in Memory
[Figure: Memory Controller connected to four 8-bit-wide memory banks (00, 01, 10, 11), giving a 32-bit data path]

 CPU accesses a 32-bit word of data starting at byte address x…x00
  Such an address (multiple of 32[b]/8[b/B] = 4[B]) is called word-aligned
  Memory controller is simple and fast, data available in one cycle
 CPU accesses a 32-bit word of data starting at byte address 01111
  Byte addresses are 01111, 10000, 10001, 10010 (misaligned address)
  Doubles the access time of the word
 Requiring aligned addresses results in a simpler memory controller and faster
execution
  Costs some loss of storage, and adds complexity in code generators
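
A small Python sketch of the alignment arithmetic above, assuming the four-bank, 32-bit-wide memory of this slide. The function names are hypothetical. It checks whether an address is aligned, counts how many memory rows an access touches (one row per cycle in this simple model), and shows the padding that aligned allocation costs.

```python
WORD_BYTES = 4  # four 8-bit banks form one 32-bit row


def is_aligned(address, size=WORD_BYTES):
    """An access is aligned when its address is a multiple of its size."""
    return address % size == 0


def rows_touched(address, size=WORD_BYTES):
    """Number of 32-bit memory rows an access of `size` bytes spans."""
    first_row = address // WORD_BYTES
    last_row = (address + size - 1) // WORD_BYTES
    return last_row - first_row + 1


def align_up(address, size=WORD_BYTES):
    """Next aligned address at or after `address`; the gap is lost storage."""
    return (address + size - 1) // size * size
```

With the slide's example, address 01111 covers bytes 01111 through 10010, which sit in two different rows, so `rows_touched(0b01111)` is 2 while an aligned address gives 1.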

Sub-Word Accesses
[Figure: Memory Controller and four 8-bit-wide memory banks (00, 01, 10, 11) connected over a 32-bit bus to the CPU register file (32 bits)]

 Byte operand in register is usually the rightmost byte of register
 Byte may come from any of the four memory banks
  Needs routing/permuting hardware
  Either at memory side of bus (justified bus)
– Byte always travels on rightmost quarter of bus
  Or on CPU side (unjustified bus)
– Bus lanes are extensions of memory bank lanes
  Source of complications in either case
What is Used?

SPEC2000
Organization of an Instruction

Machine instruction
  Syntax
    Length
      Uniform (e.g., MIPS: 4 bytes)
      Variable (e.g., VAX: 1-37 bytes)
    Format
      Opcode
      Operands
        Number: 0 address, 1 address, 2 address, 3 address, implied
        Where (operand locations): Instruction, Register, Memory
        How specified (addressing modes): immediate, absolute, computed
      Specifiers
        1) Length of operands
        2) Shift/rotate: direction, amount
        3) Branch condition
  Semantics
    Processing: Arithmetic, Logical, Shift
    Data movement: Load (from MM), Store (to MM), Move (reg-reg), Move (MM-MM)
    Transfer of control: Unconditional (branch), Conditional (jump), Call, Return
    I/O operations: If I/O is not memory-mapped

Classification by Operands
Stack Accumulator General Purpose Register
Load/Store Reg/Mem Mem/Mem
ALU operations 0 address
1 address 3 address 2 (or 1.5) address 3 address
Explicit operands (1,1) (0,3) (1,2), (1, 3), (2, 2) (3, 3)
Instruction size Short Short 4 bytes 2/4/6 bytes variable
Needs separate Load/Store Load/Store Load/Store Store
Early examples Burroughs PDP-8 CDC 6600 IBM S/360 DEC VAX-11/780
B5000- Intel 8086 IBM S/370
B7500 Motorola 6809
Current examples Transputer All RISC machines IBM 3033, IBM S/390
Amdahl V
Hitachi, Fujitsu
Orthogonality Farthest from Intermediate Closest to
Pipelining Easiest Intermediate Hardest

 Important machines that are difficult to classify
  Intel 80x86
  variable instruction size: 1-17 bytes
  memory can be destination
  uses implied registers
  Motorola 680x0
  Instruction size: 2, 4, 6, 8, 10 bytes
  Two address format only (2, 2)

(m,n) means m memory operands, n total operands

Registers versus Cache
 Similarities
 Both small, fast, and expensive (flip-flops)
 Both used to increase execution speed of CPU
 Both operate based on locality of reference
 Differences
 Registers are visible in ISA; caches are not (except for instructions for invalidation,
prefetch, or flushing)
 Number of registers is fixed by instruction format; size of cache is easily changeable
 Registers have higher BW: 3 words/cycle, and are random-access; caches have lower
BW: 1 word/cycle, and are associative
 Register access time is fixed; cache access time is statistical
 Register allocation is explicit by compiler; cache allocation is automatic
 Registers require fewer bits to address; caches require full memory addresses
 Registers create no I/O problems; caches do

Organization of Registers
 One general-purpose set (all interchangeable, “typeless”)
 One general-purpose set (a few with dedicated uses)
 PDP-11: eight 16-bit registers (R6: stack pointer, R7: PC)
 VAX 11/780: sixteen 32-bit registers (four special-purpose, R14: stack pointer, R15: PC)
 Two sets
 Motorola 68000: eight 32-bit data, eight 32-bit address
 IBM 370: sixteen 32-bit integer, four 64-bit FP
 DLX, MIPS: 31 32-bit integer, 32 32-bit FP
 Three sets
 CDC 6600: eight 18-bit integer, eight 18-bit address, eight 60-bit FP
 Many registers with dedicated use
 Intel 80x86

Addressing Modes
Name | Example | Meaning | When used
Register | add r4, r3 | R[r4] := R[r4]+R[r3] | When value is in register
Immediate | add r4, #3 | R[r4] := R[r4]+3 | For constants
Displacement | add r4, 100(r1) | R[r4] := R[r4]+M[100+R[r1]] | Accessing local variables
Register deferred | add r4, (r1) | R[r4] := R[r4]+M[R[r1]] | Pointer, computed address
Indexed | add r3, (r1+r2) | R[r3] := R[r3]+M[R[r1]+R[r2]] | Array addressing
Direct | add r1, (1001) | R[r1] := R[r1]+M[1001] | Static data
Memory indirect | add r1, @(r3) | R[r1] := R[r1]+M[M[R[r3]]] | Pointer dereferencing
Autoincrement | add r1, (r2)+ | R[r1] := R[r1]+M[R[r2]]; R[r2] := R[r2]+d | Stepping through array
Autodecrement | add r1, -(r2) | R[r2] := R[r2]-d; R[r1] := R[r1]+M[R[r2]] | Stepping through array
Scaled | add r1, 100(r2)[r3] | R[r1] := R[r1]+M[100+R[r2]+d*R[r3]] | Array indexing

R: the register file
M: the memory address space
d: the size of the data item being accessed (1, 2, 4, 8 bytes)

 We can’t directly refer to data values, only their addresses
  Except for immediate operands
 Register deferred and direct addressing modes can be synthesized from
displacement addressing mode
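
The effective-address calculations in the table can be sketched in Python, including how register deferred and direct addressing fall out of displacement addressing. The sketch assumes a register that always reads as zero (here index 0) for the direct case; the function names, register contents, and memory contents are made up for illustration.

```python
def displacement(R, base, disp):
    """Displacement mode: effective address = disp + R[base]."""
    return disp + R[base]


def register_deferred(R, base):
    """Register deferred is displacement with a displacement of 0."""
    return displacement(R, base, 0)


def direct(R, address, zero_reg=0):
    """Direct is displacement off a register that holds zero (assumed: R[0])."""
    return displacement(R, zero_reg, address)


def scaled(R, base, index, disp, d):
    """Scaled mode: disp + R[base] + d * R[index], d = size of the data item."""
    return disp + R[base] + d * R[index]


def autoincrement(R, reg, d):
    """Use R[reg] as the address, then step it forward by d."""
    address = R[reg]
    R[reg] += d
    return address


def autodecrement(R, reg, d):
    """Step R[reg] back by d first, then use it as the address."""
    R[reg] -= d
    return R[reg]


def add_from_memory(R, M, dest, address):
    """R[dest] := R[dest] + M[address], the 'add' used in the table."""
    R[dest] = R[dest] + M[address]
```

For example, with `R = [0, 1000, 8, 3, 5]` and `M = {1100: 7}`, the call `add_from_memory(R, M, 4, displacement(R, 1, 100))` models `add r4, 100(r1)`.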

Frequency of Addressing Modes

Register addressing accounts for ½. The chart shows the other ½.

SPEC2000
Address Displacement Sizes

This type of data would help you decide how much space to
allocate to displacement. Tested on a machine w/ 16 bits of
displacement, so can’t evaluate more.
SPEC2000
Use of Immediate Operands

Length of Immediate Oper.

Max size was 16. HP book says that a study on VAX (32-bit imm.) showed 20-25% were longer than 16
bits.
Control Transfer Instructions
Terminology
 BTA (Branch Target Address): The destination address of the branch
 The BTA is static if it is always the same during execution
 The BTA is dynamic if it can vary during a single execution of a program (procedure
return, O-O dynamic dispatch, switch statements are major examples)
 Branch taken if next instruction to be executed is at address BTA
 Branch not taken if next instruction to be executed is the one following the branch
instruction (“fall-through”)
 Branch outcome: whether the branch is taken or not taken
 Forward branch: BTA > (PC), where (PC) is the address of the branch instruction
 Backward branch: BTA < (PC)
 An unconditional branch is always taken

Code Generation Examples for Branches

    if (x > 0) y += z;
    else y -= z;

        blez r7, L18
        addu r3, r3, r4
        j L33
    L18:
        subu r3, r3, r4
    L33:


    while (a < b) {
        a++; b--; x++;
    }

        j L33
    L34:
        addu r5, r5, 1
        addu r6, r6, -1
        addu r7, r7, 1
    L33:
        slt r2, r5, r6
        bne r2, r0, L34

Register r3 contains y
Register r4 contains z
Register r5 contains a
Register r6 contains b
Register r7 contains x

Classification of Branches

HP terminology | Branch (Conditional) | Jump (Unconditional) | Call (Unconditional) | Return (Unconditional)
HLL equivalent | IF-THEN | GOTO | CALL | RETURN
Relative freq. | 83% | 5% | 6% | 6%
Taken | With probability T | always | always | always
Not taken | With probability 1-T | never | never | never
BTA static | most often (PC-relative) | PC-relative | most frequent | never
BTA dynamic | usually not allowed | BTA in register | BTA in register | always

            Taken    Not Taken
Forward     F&T      F&NT
Backward    B&T      B&NT

Classifying branches into these four groups permits us to compute some of the
dynamic frequencies if some others have been measured.

Rule of thumb: Backward branches tend to be taken,
forward branches tend not to be taken. Why?
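
A minimal Python sketch of the four-way classification, using the forward/backward and taken/not-taken definitions from the terminology slide. The trace format `(pc, bta, next_pc)` and the function names are assumptions made for illustration. The last function shows the point about derived frequencies: since the four classes partition all conditional branches, three measured frequencies determine the fourth.

```python
from collections import Counter

CLASSES = ("F&T", "F&NT", "B&T", "B&NT")


def classify(pc, bta, next_pc):
    """Label one executed branch: forward if BTA > PC, taken if next PC is the BTA."""
    direction = "F" if bta > pc else "B"
    outcome = "T" if next_pc == bta else "NT"
    return f"{direction}&{outcome}"


def class_frequencies(trace):
    """Dynamic frequency of each class over a trace of (pc, bta, next_pc) events."""
    counts = Counter(classify(*event) for event in trace)
    total = sum(counts.values())
    return {name: counts[name] / total for name in CLASSES}


def derive_missing(measured):
    """Given frequencies for three classes, derive the fourth (they sum to 1)."""
    (missing,) = set(CLASSES) - set(measured)
    return missing, 1 - sum(measured.values())
```

A loop-closing branch is a backward branch that is taken on every iteration except the last, which is one way to approach the "Why?" in the rule of thumb.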

Evaluating Branch Conditions

Name | How is condition tested? | Advantages | Disadvantages
Condition code | Special bits set by ALU ops | Sometimes condition is set for free | Extra state, additional constraints on instruction reordering
Condition register | Test arbitrary register with result of comparison | Simple | Uses up a register
Compare and branch | Compare is part of branch | One instruction rather than two | May be too much work per instruction

 Typical set of condition codes (e.g., Motorola 680x0)
  NegativeResult, ZeroResult, ArithmeticOverflow, CarryOut
 Many RISC machines do not use condition codes (e.g., MIPS, Alpha)
 Magnitude comparisons are done with explicit COMPARE instructions that put their
results into named registers
 Some instructions have two variants: one traps on overflow, the other does not
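
As an illustration of the condition-code approach, here is a hedged Python sketch of how the four typical codes could be derived from a 32-bit addition. The function name and the short flag names N, Z, V, C are assumptions; the slide only names the codes NegativeResult, ZeroResult, ArithmeticOverflow, and CarryOut, and real machines differ in which instructions set them.

```python
MASK32 = 0xFFFFFFFF


def add_with_flags(a, b):
    """32-bit add returning (result, flags), flags keyed N, Z, V, C."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    flags = {
        "N": bool(result >> 31),   # NegativeResult: sign bit of the result
        "Z": result == 0,          # ZeroResult
        # ArithmeticOverflow: operands share a sign, result sign differs
        "V": bool(((~(a ^ b) & (a ^ result)) >> 31) & 1),
        "C": full > MASK32,        # CarryOut of bit 31
    }
    return result, flags
```

A conditional branch on such a machine then tests these bits instead of doing its own comparison, which is the "set for free" advantage and also the extra state that constrains instruction reordering.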

Branch Distance

Instruction Encoding

These days encoding is more important for embedded
processors. PowerPC compresses code in memory and
uncompresses it in icache.

“Typical” RISC ISA
 32-bit fixed format instruction (3 formats)
 32 32-bit GPR (R0 contains zero, DP take pair)
 3-address, reg-reg arithmetic instruction
 Single address mode for load/store:
base + displacement
 no indirection
 Simple branch conditions
 Delayed branch

see: SPARC, MIPS, HP PA-RISC, DEC Alpha, IBM PowerPC,
CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3
Example: MIPS

Register-Register

31 26 25 21 20 16 15 11 10 6 5 0
Op Rs1 Rs2 Rd Opx

Register-Immediate
31 26 25 21 20 16 15 0
Op Rs1 Rd immediate

Branch
31 26 25 21 20 16 15 0
Op Rs1 Rs2/Opx immediate

Jump / Call
31 26 25 0
Op target
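
The bit positions above translate directly into shifts and masks. The following Python sketch packs and unpacks the Register-Register and Register-Immediate formats as drawn on this slide. The function names are hypothetical, the slide does not name bits 10..6 (they are kept as an `unnamed` field defaulting to zero), and the field values used to exercise the code are arbitrary.

```python
def _field(value, bits, name):
    """Check that an unsigned field value fits in `bits` bits."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} does not fit in {bits} bits: {value}")
    return value


def encode_rr(op, rs1, rs2, rd, opx, unnamed=0):
    """Register-Register: Op[31:26] Rs1[25:21] Rs2[20:16] Rd[15:11] [10:6] Opx[5:0]."""
    return (
        (_field(op, 6, "op") << 26)
        | (_field(rs1, 5, "rs1") << 21)
        | (_field(rs2, 5, "rs2") << 16)
        | (_field(rd, 5, "rd") << 11)
        | (_field(unnamed, 5, "unnamed") << 6)
        | _field(opx, 6, "opx")
    )


def decode_rr(word):
    """Split a 32-bit Register-Register word back into its fields."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs1": (word >> 21) & 0x1F,
        "rs2": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "unnamed": (word >> 6) & 0x1F,
        "opx": word & 0x3F,
    }


def encode_ri(op, rs1, rd, immediate):
    """Register-Immediate: Op[31:26] Rs1[25:21] Rd[20:16] immediate[15:0]."""
    if not -(1 << 15) <= immediate < (1 << 16):
        raise ValueError(f"immediate does not fit in 16 bits: {immediate}")
    return (
        (_field(op, 6, "op") << 26)
        | (_field(rs1, 5, "rs1") << 21)
        | (_field(rd, 5, "rd") << 16)
        | (immediate & 0xFFFF)
    )


def sign_extend16(value):
    """Interpret the low 16 bits of `value` as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value
```

The 5-bit register fields are what fix the number of addressable registers at 32, and because every field sits at a fixed position, decoding needs no knowledge of the opcode beyond choosing the format.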

Next Time
 Pipelining

  If you’ve never looked at pipelining, read Appendix A,
otherwise skim
