EE182: Computer Org. & Design Handout #05

EE182 Computer Organization & Design

Tom Fountain Stanford University

Lecture 3: Assembly Language Basics

October 5, 2000

Handout (1)

Today's lecture

- Administrative Announcements
- Performance Review
- Languages
- Arithmetic operators
- Registers
- Memory access

- Read Sections 3.1-3.4 (slightly out of order)
- #1 Rule: Keep it interactive!

Fountain/Autumn 00-01 EE182 Lecture #3


Administrative (cont)

E-mail list reminder (again!)

- Send to
- Body should contain subscribe ee182

Section information

- Yesterday, Wednesday 3:15-4:05 in Gates B03
- Available via Stanford Online and on tape in Terman

TA Office Hours

- Greg Larchev
  - Mondays 1:30-3:30 pm Sweet Hall
  - Tuesdays 1:00-3:00 pm Packard 106
- Alex Liu
  - Sundays 7:00-9:00 pm Sweet Hall
  - Thursdays 1:00-3:00 pm Packard 109

Problem Set #1 Reminder

- Problems: 1.50, 2.10-2.12, 2.18-2.23, 2.26-2.29, 2.41, 2.44
- Due Tuesday, 10/10 5:00 pm
- Turn in at lecture or to Gates 227
- May work in groups of up to two
- One late day for the quarter, so use wisely
- Check the FAQ for common questions/answers
- Send E-mail with questions to:


Measuring Time

- The best predictor of performance is frequently execution time

  \[ \text{Performance} = \frac{1}{\text{ExecutionTime}} \]

- To compare, we say X is n times faster than Y

  \[ n = \frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{ExecutionTime}_Y}{\text{ExecutionTime}_X} \]

Cycles Per Instruction (CPI)

- We can use CPI to calculate execution time:

  \[ \text{Execution Time} = \text{Instructions} \times \text{CPI} \times \text{Clock Cycle Time} \]

  \[ \text{Execution Time} = \frac{\text{Instructions} \times \text{CPI}}{\text{Clock Rate}} \]

- Improving performance
  - Increased clock rate
  - Lower CPI
  - Reduced instructions
- Designers have to balance the length of each cycle and the number of cycles required
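A minimal sketch of the execution-time and "n times faster" relations above. The function names and the two machine configurations are made up for illustration and do not come from the lecture.

```python
def execution_time(instructions, cpi, clock_rate_hz):
    """Execution time in seconds: instructions * CPI / clock rate."""
    return instructions * cpi / clock_rate_hz


def speedup(time_x, time_y):
    """How many times faster X is than Y: n = ExecutionTime_Y / ExecutionTime_X."""
    return time_y / time_x


# Two hypothetical machines running the same 1,000,000-instruction program.
time_x = execution_time(1_000_000, cpi=2.0, clock_rate_hz=500e6)  # about 0.004 s
time_y = execution_time(1_000_000, cpi=3.0, clock_rate_hz=250e6)  # about 0.012 s
n = speedup(time_x, time_y)  # X is about 3 times faster than Y
```

Machine X wins here on both terms: a lower CPI and a higher clock rate, while the instruction count is held equal.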

A Language

- Computers "speak" a language
- Programming languages provide a means for symbolically expressing data processing
- Each language has a well defined syntax and grammar

Calculating and Using CPI

- Different classes of instructions usually take different numbers of cycles
- If you know the number of instructions of each class, the total number of clock cycles is

  \[ \text{Clock Cycles} = \sum_{i=1}^{n} (CPI_i \times C_i) \]

  where \(CPI_i\) is the CPI for instruction class \(i\) and \(C_i\) is the count of instructions of that class

- To compute the average CPI use

  \[ CPI = \sum_{i=1}^{n} CPI_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \]
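The two sums above can be sketched as follows. The instruction mix is hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
def total_cycles(mix):
    """Clock cycles = sum over classes of CPI_i * C_i."""
    return sum(cpi * count for cpi, count in mix)


def average_cpi(mix):
    """Average CPI = total cycles / total instruction count."""
    instruction_count = sum(count for _, count in mix)
    return total_cycles(mix) / instruction_count


# Hypothetical mix: (CPI of the class, number of instructions in the class).
mix = [(1, 500), (2, 300), (4, 200)]
cycles = total_cycles(mix)  # 1*500 + 2*300 + 4*200 = 1900
cpi = average_cpi(mix)      # 1900 / 1000 = 1.9
```

Dividing the total cycle count by the total instruction count gives the same value as weighting each class CPI by its share of the instruction count.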

Programming Languages

- There are many programming languages, but they usually fall into two categories
  - High-level languages are usually machine-independent and instructions are often more expressive
    - C, Fortran, Pascal, Basic
  - Low-level languages are usually machine-specific and offer much finer-grained instructions that closely match the machine language of the target processor
    - Assembly languages for MIPS, x86, SGI, HP-PA

Assembly Languages

- Assembly languages are text representations of the machine language
- One statement represents one machine instruction
- Abstraction layer between high-level programs and machine code


Machine Language

- Machine language is the native language of the computer
- The words are called instructions
- The vocabulary is the instruction set
- Bit representation of machine operations to be executed by the hardware
- We will focus on the MIPS instructions
  - Other RISC-based instruction sets are similar
  - Different instruction sets tend to share a lot of commonalities since they function similarly

Fitting Languages Together

    High Level Language Program

        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;

    Assembly Language Program

        lw $15, 0($2)
        lw $16, 4($2)
        sw $16, 0($2)
        sw $15, 4($2)

    Machine Language Program

        0000 1001 1100 0110 1010 1111 0101 1000
        1010 1111 0101 1000 0000 1001 1100 0110
        1100 0110 1010 1111 0101 1000 0000 1001
        0101 1000 0000 1001 1100 0110 1010 1111

    Machine Interpretation

    Control Signal Specification

        High/Low on control lines


Real World Example (SPARC)

    main()
    {
        printf("Hello world!\n");
    }

hello.s (Part 1)

        .file   "hello.c"
    gcc2_compiled.:
    .section    ".rodata"
        .align 8
    .LLC0:
        .asciz  "Hello world!\n"
    .section    ".text"
        .align 4
        .global main

Real World Example (cont)

hello.s (Part 2)

        .type   main,#function
        .proc   04
    main:
        !#PROLOGUE# 0
        save %sp,-112,%sp
        !#PROLOGUE# 1
        sethi %hi(.LLC0),%o1
        or %o1,%lo(.LLC0),%o0
        call printf,0
    .LL1:
        ret
        restore
    .LLfe1:
        .size   main,.LLfe1-main
        .ident  "GCC: (GNU) 2.8.1"

Assembly Instructions

- The basic type of instruction has four components:
  1. Operator name
  2. Place to store result
  3. 1st operand
  4. 2nd operand

        add dst, src1, src2

- Simple, fixed formats make hardware implementation simpler ("simplicity favors regularity")
- On most architectures, there are no restrictions on elements appearing more than once

Arithmetic Operators

- Consider the C operation for addition

        a = b + c;

- Use the add operator in MIPS

        add a, b, c

- Use the sub operator for a = b - c in MIPS

        sub a, b, c

- Since assembly code can be difficult to read, the common practice is to use # for comments


Complex Operations

- What about more complex statements?

        a = b + c + d - e;

- Break into multiple instructions

        add t0, b, c     # t0 = b + c
        add t1, t0, d    # t1 = t0 + d
        sub a, t1, e     # a = t1 - e

- Compilers often use temporary variables when generating code
- Notice all of the comments!

Data Representation

- Bits: 0 or 1
- Bit strings: sequence of bits
  - 8 bits is a byte
  - 16 bits is a half-word
  - 32 bits is a word
  - 64 bits is a double-word
- Characters: one byte, usually using ASCII
- Integer numbers: stored in 2's complement, which we will review in the next chapter
- Floating point: uses a mantissa and exponent (m × 2^e), also covered in the next chapter

Data Storage

- In high-level programs we store data in variables
- In practice, where is this data stored?
- The answer is that it can be stored in many different places
  - Disk
  - Random Access Memory (RAM)
  - Cache (RAM or disk)
  - Registers
- A register is a small high-speed block of memory that holds data

Register Organization

- Register organization is one of the defining aspects of a particular processor architecture
- Three basic mechanisms for operators/operands
  - Accumulator: architecture which uses a single register for one of the sources and the destination (ex. 8088)
  - Stack: operands are pushed and popped (ex. Java)
  - General Purpose: a limited number of registers used to store data for any purpose (ex. most systems today)
- We will focus in this course on general purpose


Accumulator Example

- Consider the code

        a = b + c;

- In an accumulator-based architecture it is

        load addressB
        add addressC
        store addressA

Stack Example

- Consider the code

        a = b + c;

- In Java bytecode it is

        iload_1     # Loads b onto the stack
        iload_2     # Loads c onto the stack
        iadd        # Adds and puts result on stack
        istore_0    # Stores into a
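To make the stack discipline concrete, here is a small sketch of an interpreter for just the three opcodes used above. It is a toy model, not the JVM: the function name is made up, and the mapping of a, b, c to local slots 0, 1, 2 simply follows the slide.

```python
def run_stack_code(code, local_vars):
    """Run a tiny subset of JVM-style integer bytecode on an operand stack."""
    stack = []
    for op in code:
        name, _, index = op.partition("_")
        if name == "iload":      # push local variable number <index>
            stack.append(local_vars[int(index)])
        elif name == "iadd":     # pop two values, push their sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif name == "istore":   # pop the top of the stack into local <index>
            local_vars[int(index)] = stack.pop()
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return local_vars


# Local slots 0, 1, 2 hold a, b, c; the values 5 and 7 are sample data.
result = run_stack_code(["iload_1", "iload_2", "iadd", "istore_0"], [0, 5, 7])
```

Note that `iadd` names no operands at all: both sources and the destination are implied by the top of the stack, which is the defining property of a stack architecture.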


General Purpose Registers

- When using General Purpose Registers (GPRs), data can be accessed in different ways
  - Load-Store (L/S): data is loaded into registers, operated on, and stored back to memory (ex. all RISC instruction sets)
    - Hardware for operands is simple
    - Smaller is faster since clock cycle can be kept fast
    - Emphasis is on efficiency
  - Memory-Memory: operands can use memory addresses as both a source and a destination (ex. Intel)

MIPS Architecture

- MIPS is a load-store architecture
  - Each register is 32 bits long, called a word
  - The MIPS has 32 general purpose registers (some reserved for different purposes)
  - MIPS also has 32 floating point only registers, which we will also discuss later


Register Naming

- Registers 0-31 are named using a $<num>
- By convention, we give them names:
  - $zero contains the hardwired value 0
  - $s0, $s1, ..., $s7 are for saved variables
  - $t0, $t1, ..., $t9 are for temp variables
  - The others will be introduced as we get to them
- Compilers use these conventions to make linking a smooth process
- Unlike variables, there are a fixed number of data registers ("smaller is faster")

Using registers

- Keep data in registers as much as possible
- Always use data still in registers if possible
- Finite number of registers available
  - Spill registers to memory when all registers in use
  - Data must also be stored across procedures (covered next lecture)
- Data is too large to store in registers
  - Need to compute index
- Dynamic memory allocation
  - Dynamically allocated data structures must be loaded one word at a time


Arithmetic Operators: II

- Consider the C operation for addition where the variables are in $s0-$s2 respectively

        a = b + c;

- The add operator using registers

        add $s0, $s1, $s2    # a = b + c

- Use the sub operator for a = b - c in MIPS

        sub $s0, $s1, $s2    # a = b - c

Complex Operations: II

- What about more complex statements?

        a = b + c + d - e;

- Break into multiple instructions

        add $t0, $s1, $s2    # $t0 = b + c
        add $t1, $t0, $s3    # $t1 = $t0 + d
        sub $s0, $t1, $s4    # a = $t1 - e
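One way to check a sequence like this by hand is to step it through a tiny register model. The sketch below is an illustration only: it treats registers as unbounded Python integers, so it ignores 32-bit overflow, and the register contents are sample values.

```python
def run(program, regs):
    """Execute three-operand add/sub instructions on a register dictionary."""
    for op, dst, src1, src2 in program:
        if op == "add":
            regs[dst] = regs[src1] + regs[src2]
        elif op == "sub":
            regs[dst] = regs[src1] - regs[src2]
        else:
            raise ValueError(f"unsupported operator: {op}")
    return regs


# b, c, d, e live in $s1-$s4; the values are arbitrary sample data.
regs = {"$s0": 0, "$s1": 10, "$s2": 20, "$s3": 30, "$s4": 5, "$t0": 0, "$t1": 0}
program = [
    ("add", "$t0", "$s1", "$s2"),  # $t0 = b + c
    ("add", "$t1", "$t0", "$s3"),  # $t1 = $t0 + d
    ("sub", "$s0", "$t1", "$s4"),  # a = $t1 - e
]
run(program, regs)  # afterwards regs["$s0"] holds b + c + d - e = 55
```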


Constant Example

- Often want to be able to add a constant
- Use the addi instruction

        addi dst, src1, immediate

- The immediate is a 16 bit value
- Consider the following C code

        a = a + 1;

- The addi operator

        addi $s0, $s0, 1    # a = a + 1
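Because the immediate field is only 16 bits wide and addi treats it as a signed two's complement value, only constants from -32768 to 32767 fit directly. A rough sketch of that range check follows; the helper names are invented for this example.

```python
IMM_BITS = 16
IMM_MIN = -(1 << (IMM_BITS - 1))     # -32768
IMM_MAX = (1 << (IMM_BITS - 1)) - 1  # 32767


def fits_in_immediate(value):
    """True when `value` can be encoded as a signed 16-bit immediate."""
    return IMM_MIN <= value <= IMM_MAX


def encode_immediate(value):
    """Return the 16-bit two's complement field for `value`."""
    if not fits_in_immediate(value):
        raise ValueError(f"{value} does not fit in a 16-bit immediate")
    return value & 0xFFFF
```

For example, `encode_immediate(1)` yields the field 0x0001 used by `addi $s0, $s0, 1`, while a constant such as 40000 is rejected and would have to be built in a register some other way.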


MIPS Simple Arithmetic

    Instruction          Example            Meaning           Comments
    add                  add $1,$2,$3       $1 = $2 + $3      3 operands; Exceptions
    subtract             sub $1,$2,$3       $1 = $2 - $3      3 operands; Exceptions
    add immediate        addi $1,$2,100     $1 = $2 + 100     + constant; Exceptions
    add unsigned         addu $1,$2,$3      $1 = $2 + $3      3 operands; No exceptions
    subtract unsigned    subu $1,$2,$3      $1 = $2 - $3      3 operands; No exceptions
    add imm. unsigned    addiu $1,$2,100    $1 = $2 + 100     + constant; No exceptions

Putting Data in Registers

- Data transfer instructions are used to move data to and from memory in a load-store architecture
- A load operation moves data from memory to a register and a store operation moves data from a register to memory
- One word at a time is loaded from memory to a register on MIPS using the lw instruction
- Load instructions have three parts
  1. Operator name
  2. Destination register
  3. Base register address and constant offset

        lw dst, offset(base)

- Offset value is signed (the ulw pseudo-instruction is for unaligned words, not unsigned offsets)

Memory Access

- All memory access happens through loads and stores
- Aligned words, halfwords, and bytes
- Floating Point loads and stores for accessing FP registers
- Displacement based addressing

[Figure label: data to load / location to store into]

Loading Data Example

- Consider the example

        a = b + *c;

- Use the lw instruction to load

        lw $t0, 0($s2)       # $t0 = Memory[c]
        add $s0, $s1, $t0    # a = b + *c

Accessing Arrays

- Arrays are really pointers to the base address in memory
- Use offset value to indicate which index
- Remember that addresses are in bytes, so multiply by the size of the element
  - Consider an integer array where A is the base address
  - The data to be accessed is at index 5
  - Then the address from memory is A + 5 * 4
  - Unlike C, assembly does not handle pointer arithmetic for you!

Array Example

- Consider the example

        a = b + c[9];

- Use the lw instruction offset

        lw $t0, 36($s2)      # $t0 = Memory[c[9]]
        add $s0, $s1, $t0    # a = b + c[9]
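The address arithmetic that C hides can be spelled out in a few lines. This is only a sketch: the base address 0x1000 is a made-up value, and 4-byte elements are assumed because the slides use 32-bit integers.

```python
WORD_SIZE = 4  # bytes per 32-bit integer element


def element_offset(index, element_size=WORD_SIZE):
    """Byte offset of element `index` from the start of the array."""
    return index * element_size


def element_address(base, index, element_size=WORD_SIZE):
    """Byte address of element `index` in an array starting at `base`."""
    return base + element_offset(index, element_size)


base_A = 0x1000                    # hypothetical base address of A
addr = element_address(base_A, 5)  # A + 5 * 4 = 0x1014
offset_c9 = element_offset(9)      # 36, the constant in lw $t0, 36($s2)
```

When the index is a compile-time constant, as with c[9], the multiplication is done ahead of time and only the resulting byte offset appears in the instruction.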


Complex Array Example

- Consider the example

        a = b + c[i];

- First find the correct offset

        add $t0, $s3, $s3    # $t0 = 2 * i
        add $t0, $t0, $t0    # $t0 = 4 * i
        add $t1, $s2, $t0    # $t1 = c + 4*i
        lw $t2, 0($t1)       # $t2 = Memory[c[i]]
        add $s0, $s1, $t2    # a = b + c[i]

- Note: We will cover multiply later

Storing Data

- Storing data is just the reverse and the instruction is nearly identical
- Use the sw instruction to copy a word from the source register to an address in memory

        sw src, offset(base)

- Offset value is signed (the usw pseudo-instruction is for unaligned words, not unsigned offsets)
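The variable-index load and its mirror-image store can be traced with a small memory model. Everything here is a simplified sketch: memory is a dictionary from byte address to word, the base address and array contents are invented, and the final store back into c[i] is added purely to show sw; it is not part of the slide's example.

```python
def lw(memory, regs, dst, offset, base):
    """lw dst, offset(base): load the word at address regs[base] + offset."""
    regs[dst] = memory[regs[base] + offset]


def sw(memory, regs, src, offset, base):
    """sw src, offset(base): store regs[src] at address regs[base] + offset."""
    memory[regs[base] + offset] = regs[src]


# Word-granular memory model: maps a byte address to the word stored there.
C_BASE = 0x2000                                       # hypothetical base of c
memory = {C_BASE + 4 * k: 100 + k for k in range(8)}  # c[k] = 100 + k
regs = {"$s0": 0, "$s1": 7, "$s2": C_BASE, "$s3": 3}  # a, b, c, i

regs["$t0"] = regs["$s3"] + regs["$s3"]  # add $t0, $s3, $s3  -> 2 * i
regs["$t0"] = regs["$t0"] + regs["$t0"]  # add $t0, $t0, $t0  -> 4 * i
regs["$t1"] = regs["$s2"] + regs["$t0"]  # add $t1, $s2, $t0  -> c + 4*i
lw(memory, regs, "$t2", 0, "$t1")        # lw  $t2, 0($t1)    -> c[i]
regs["$s0"] = regs["$s1"] + regs["$t2"]  # add $s0, $s1, $t2  -> a = b + c[i]

sw(memory, regs, "$s0", 0, "$t1")        # illustrative store: c[i] = a
```

The two doubling adds stand in for the multiply by 4, which works because the element size is a power of two.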


Storing Data Example

- Consider the example

        *a = b + c;

- Use the sw instruction to store

        add $t0, $s1, $s2    # $t0 = b + c
        sw $t0, 0($s0)       # Memory[a] = b + c

Storing to an Array

- Consider the example

        a[3] = b + c;

- Use the sw instruction offset

        add $t0, $s1, $s2    # $t0 = b + c
        sw $t0, 12($s0)      # Memory[a[3]] = b + c


Complex Array Storage

- Consider the example

        a[i] = b + c;

- Use the sw instruction offset

        add $t0, $s1, $s2    # $t0 = b + c
        add $t1, $s3, $s3    # $t1 = 2 * i
        add $t1, $t1, $t1    # $t1 = 4 * i
        add $t2, $s0, $t1    # $t2 = a + 4*i
        sw $t0, 0($t2)       # Memory[a[i]] = b + c

MIPS Load/Store

    Instruction         Example           Meaning           Comments
    store word          sw $1, 8($2)      Mem[8+$2]=$1      Store word
    store half          sh $1, 6($2)      Mem[6+$2]=$1      Stores only lower 16 bits
    store byte          sb $1, 5($2)      Mem[5+$2]=$1      Stores only lowest byte
    store float         sf $f1, 4($2)     Mem[4+$2]=$f1     Store FP word
    load word           lw $1, 8($2)      $1=Mem[8+$2]      Load word
    load halfword       lh $1, 6($2)      $1=Mem[6+$2]      Load half; sign extend
    load half unsign    lhu $1, 6($2)     $1=Mem[6+$2]      Load half; zero extend
    load byte           lb $1, 5($2)      $1=Mem[5+$2]      Load byte; sign extend
    load byte unsign    lbu $1, 5($2)     $1=Mem[5+$2]      Load byte; zero extend
    load float          lf $f1, 4($2)     $f1=Mem[4+$2]     Load FP register
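The difference between the "sign extend" and "zero extend" rows of the table only shows up when the top bit of the loaded byte or halfword is 1. A short sketch of the two widening rules, with helper names and sample values chosen for illustration:

```python
def zero_extend(value, bits):
    """Keep the low `bits` bits; the upper register bits become 0 (lbu, lhu)."""
    return value & ((1 << bits) - 1)


def sign_extend(value, bits):
    """Replicate the top bit of a `bits`-wide value (lb, lh)."""
    value = zero_extend(value, bits)
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


byte_in_memory = 0xF0
lb_result = sign_extend(byte_in_memory, 8)    # -16
lbu_result = zero_extend(byte_in_memory, 8)   # 240

half_in_memory = 0x8001
lh_result = sign_extend(half_in_memory, 16)   # -32767
lhu_result = zero_extend(half_in_memory, 16)  # 32769
```

For values whose top bit is 0, such as 0x7F, both rules give the same result, so the choice of instruction matters only for data that may be negative.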


Memory Addressing

- Almost all architectures support byte addressing as well as word addressing
- Different architectures have different ways of ordering the bytes within a word, known as the byte order
- Some architectures limit the way data is stored so as to improve efficiency

Byte Ordering

- Two basic ways of ordering bytes
  - Big Endian: the "big end" comes first and the most significant byte (MSB) is at the lowest memory address
  - Little Endian: the "little end" comes first and the least significant byte (LSB) is at the first address (ex. Intel)
- Some systems such as MIPS and PowerPC can do both, but are primarily big endian

    Byte numbers within a word (msb on the left, lsb on the right)

        Little Endian:   3   2   1   0
        Big Endian:      0   1   2   3
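Python's built-in integer conversions make it easy to try both byte orders on the same four bytes. The byte values below are arbitrary sample data, not taken from the lecture.

```python
word_bytes = bytes([0x12, 0x34, 0x56, 0x78])  # bytes at addresses 0, 1, 2, 3

# Reading the same memory as a 32-bit word under each convention.
as_big = int.from_bytes(word_bytes, byteorder="big")        # 0x12345678
as_little = int.from_bytes(word_bytes, byteorder="little")  # 0x78563412

# Going the other way: lay a 32-bit value out in memory in each order.
value = 0x0A0B0C0D
big_layout = value.to_bytes(4, byteorder="big")        # 0A 0B 0C 0D
little_layout = value.to_bytes(4, byteorder="little")  # 0D 0C 0B 0A
```

The bytes in memory never change; only the rule for which address holds the most significant byte does.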


Byte Ordering Example

- Consider the following word (32 bits) of memory

        Memory Address    0     1     2     3
        Contents          AB    CD    00    00

  - Address 0 holds the Big Endian MSB, which is the Little Endian LSB
  - Address 3 holds the Big Endian LSB, which is the Little Endian MSB
- Big Endian interprets as AB CD 00 00 (2882338816)
- Little Endian interprets as 00 00 CD AB (52651)

Alignment Restrictions

- In MIPS, data is required to fall on addresses that are multiples of the data size
- Historically
  - Early machines (IBM 360 in 1964) required alignment
  - Removed in 1970s since hard for programmers
  - RISC reintroduced due to effect on performance

[Figure: a word at byte offsets 0-3, shown Aligned and Not Aligned]
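The alignment rule reduces to a remainder check on the address. The helpers below are an illustrative sketch with made-up names; `align_up` additionally assumes the size is a power of two, which holds for bytes, halfwords, words, and double-words.

```python
def is_aligned(address, size):
    """True when `address` is a multiple of the access size in bytes."""
    return address % size == 0


def align_up(address, size):
    """Smallest aligned address >= `address` (size must be a power of two)."""
    return (address + size - 1) & ~(size - 1)
```

For instance, `is_aligned(8, 4)` holds for a word access, `is_aligned(6, 4)` does not, yet `is_aligned(6, 2)` does because a halfword only needs an even address; `align_up(5, 4)` gives 8.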

