EE182: Computer Org. & Design Handout #05

EE182 Computer Organization & Design
Lecture 3: Assembly Language Basics

Tom Fountain, Stanford University
October 5, 2000

Overview
- Handout (1)
- Today's lecture
  - Administrative Announcements
  - Performance Review
  - Languages
  - Arithmetic operators
  - Registers
  - Memory access
- Read Sections 3.1-3.4 (slightly out of order)
- #1 Rule: Keep it interactive!

Fountain/Autumn 00-01 EE182 Lecture #3 Slide #2
Administrative
- E-mail list reminder (again!)
  - Send to majordomo@lists.stanford.edu
  - Body should contain: subscribe ee182
- Section information
  - Yesterday, Wednesday 3:15-4:05 in Gates B03
  - Available via Stanford Online and on tape in Terman
- TA Office Hours
  - Greg Larchev
    - Mondays 1:30-3:30 pm, Sweet Hall
    - Tuesdays 1:00-3:00 pm, Packard 106
  - Alex Liu
    - Sundays 7:00-9:00 pm, Sweet Hall
    - Thursdays 1:00-3:00 pm, Packard 109

Administrative (cont)
- Problem Set #1 Reminder
  - Problems: 1.50, 2.10-2.12, 2.18-2.23, 2.26-2.29, 2.41, 2.44
  - Due Tuesday, 10/10, 5:00 pm
  - Turn in at lecture or to Gates 227
  - May work in groups of up to two
  - One late day for the quarter, so use wisely
  - Check the FAQ for common questions/answers
  - Send e-mail with questions to: ee182-help@lists.stanford.edu
Measuring Time
- The best predictor of performance is frequently execution time

  $$\text{Performance} = \frac{1}{\text{Execution Time}}$$

- To compare, we say "X is n times faster than Y"

  $$n = \frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution Time}_Y}{\text{Execution Time}_X}$$

Cycles Per Instruction (CPI)
- We can use CPI to calculate execution time:

  $$\text{Execution Time} = \text{Instructions} \times \text{CPI} \times \text{Clock Cycle Time}$$

  $$\text{Execution Time} = \frac{\text{Instructions} \times \text{CPI}}{\text{Clock Rate}}$$

- Improving performance
  - Increased clock rate
  - Lower CPI
  - Reduced instructions
- Designers have to balance the length of each cycle and the number of cycles required
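The execution-time and "n times faster" relations can be checked numerically. The following is a minimal sketch in C; the function names `execution_time` and `speedup` are illustrative and not from the lecture, and the clock rate is assumed to be given in Hz.

```c
#include <assert.h>

/* Execution time in seconds:
 * time = instructions * CPI / clock rate (clock rate in Hz). */
double execution_time(double instructions, double cpi, double clock_rate_hz)
{
    assert(clock_rate_hz > 0.0);   /* a zero clock rate would divide by zero */
    return instructions * cpi / clock_rate_hz;
}

/* How many times faster X is than Y:
 * n = ExecutionTimeY / ExecutionTimeX. */
double speedup(double time_x, double time_y)
{
    assert(time_x > 0.0);
    return time_y / time_x;
}
```

For example, a hypothetical program of 10^9 instructions with a CPI of 2 on a 500 MHz clock takes `execution_time(1e9, 2.0, 500e6)` = 4 seconds.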
Calculating and Using CPI
- Different classes of instructions usually take different numbers of cycles
- If you know the number of instructions of each class of instruction

  $$\text{Clock Cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times C_i)$$

  where CPI_i is the CPI for the class of instructions and C_i is the count of that type of instructions
- To compute the average CPI use

  $$\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}$$

A Language
- Computers "speak" a language
- Programming languages provide a means for symbolically expressing data processing
- Each language has a well-defined syntax and grammar
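The per-class cycle sum and the average CPI can be sketched in C as follows. The function names and the instruction mix used in the usage note are assumptions for illustration, not data from the lecture.

```c
#include <assert.h>
#include <stddef.h>

/* Total clock cycles: sum over instruction classes of CPI_i * C_i. */
double total_cycles(const double *cpi, const double *count, size_t n)
{
    double cycles = 0.0;
    for (size_t i = 0; i < n; i++)
        cycles += cpi[i] * count[i];
    return cycles;
}

/* Average CPI: total cycles divided by the total instruction count,
 * which equals the sum of CPI_i weighted by each class's share. */
double average_cpi(const double *cpi, const double *count, size_t n)
{
    double instructions = 0.0;
    for (size_t i = 0; i < n; i++)
        instructions += count[i];
    assert(instructions > 0.0);    /* need at least one instruction */
    return total_cycles(cpi, count, n) / instructions;
}
```

With a made-up mix of three classes with CPIs 1, 2, and 4 and counts 2, 1, and 1, the total is 8 cycles over 4 instructions, so the average CPI is 2.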
Programming Languages
- There are many programming languages, but they usually fall into two categories
  - High-level languages are usually machine-independent and instructions are often more expressive
    - C, Fortran, Pascal, Basic
  - Low-level languages are usually machine-specific and offer much finer-grained instructions that closely match the machine language of the target processor
    - Assembly languages for MIPS, x86, SGI, HP-PA

Assembly Languages
- Assembly languages are text representations of the machine language
- One statement represents one machine instruction
- Abstraction layer between high-level programs and machine code
Machine Language
- Machine language is the native language of the computer
- The words are called instructions
- The vocabulary is the instruction set
- Bit representation of machine operations to be executed by the hardware
- We will focus on the MIPS instructions
  - Other RISC-based instruction sets are similar
  - Different instruction sets tend to share a lot of commonalities since they function similarly

Fitting Languages Together

    High Level Language Program
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
            |  Compiler
            v
    Assembly Language Program
        lw $15, 0($2)
        lw $16, 4($2)
        sw $16, 0($2)
        sw $15, 4($2)
            |  Assembler
            v
    Machine Language Program
        0000 1001 1100 0110 1010 1111 0101 1000
        1010 1111 0101 1000 0000 1001 1100 0110
        1100 0110 1010 1111 0101 1000 0000 1001
        0101 1000 0000 1001 1100 0110 1010 1111
            |  Machine Interpretation
            v
    Control Signal Specification
        High/Low on control lines
Real World Example (SPARC)

hello.c

    main()
    {
        printf("Hello world!\n");
    }

hello.s (Part 1)

            .file   "hello.c"
    gcc2_compiled.:
    .section        ".rodata"
            .align 8
    .LLC0:
            .asciz  "Hello world!\n"
    .section        ".text"
            .align 4
            .global main

Real World Example (cont)

hello.s (Part 2)

            .type   main,#function
            .proc   04
    main:
            !#PROLOGUE# 0
            save    %sp,-112,%sp
            !#PROLOGUE# 1
            sethi   %hi(.LLC0),%o1
            or      %o1,%lo(.LLC0),%o0
            call    printf,0
            nop
    .LL1:
            ret
            restore
    .LLfe1:
            .size   main,.LLfe1-main
            .ident  "GCC: (GNU) 2.8.1"
Assembly Instructions
- The basic type of instruction has four components:
  1. Operator name
  2. Place to store result
  3. 1st operand
  4. 2nd operand

    add dst, src1, src2

- Simple, fixed formats make hardware implementation simpler ("simplicity favors regularity")
- On most architectures, there are no restrictions on elements appearing more than once

Arithmetic Operators
- Consider the C operation for addition

    a = b + c;

- Use the add operator in MIPS

    add a, b, c

- Use the sub operator for a = b - c in MIPS

    sub a, b, c

- Since assembly code can be difficult to read, the common practice is to use # for comments
Complex Operations
- What about more complex statements?

    a = b + c + d - e;

- Break into multiple instructions

    add t0, b, c     # t0 = b + c
    add t1, t0, d    # t1 = t0 + d
    sub a, t1, e     # a = t1 - e

- Compilers often use temporary variables when generating code
- Notice all of the comments!

Data Representation
- Bits: 0 or 1
- Bit strings: sequence of bits
  - 8 bits is a byte
  - 16 bits is a half-word
  - 32 bits is a word
  - 64 bits is a double-word
- Characters: one byte, usually using ASCII
- Integer numbers: stored in 2's complement, which we will review in the next chapter
- Floating point: uses a mantissa and exponent (m × 2^e), also covered in the next chapter
Data Storage
- In high-level programs we store data in variables
- In practice, where is this data stored?
- The answer is that it can be stored in many different places
  - Disk
  - Random Access Memory (RAM)
  - Cache (RAM or disk)
  - Registers
- A register is a small high-speed block of memory that holds data

Register Organization
- Register organization is one of the defining aspects of a particular processor architecture
- Three basic mechanisms for operators/operands
  - Accumulator: architecture which uses a single register for one of the sources and the destination (ex. 8088)
  - Stack: operands are pushed and popped (ex. Java)
  - General Purpose: a limited number of registers used to store data for any purpose (ex. most systems today)
- We will focus in this course on general purpose
Accumulator Example
- Consider the code

    a = b + c;

- In an accumulator-based architecture it is

    load addressB
    add addressC
    store addressA

Stack Example
- Consider the code

    a = b + c;

- In Java bytecode it is

    iload_1     # Loads b onto the stack
    iload_2     # Loads c onto the stack
    iadd        # Adds and puts result on stack
    istore_0    # Stores into a
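To make the difference between the two operand models concrete, here is a rough C sketch that evaluates a = b + c first with a single accumulator and then with an explicit operand stack. The function names and the fixed stack size are assumptions for illustration; this is a model of the idea, not of any real machine.

```c
#include <assert.h>
#include <stdint.h>

/* Accumulator style: one implicit register is both a source and the
 * destination of the add. */
int32_t accumulator_add(int32_t b, int32_t c)
{
    int32_t acc;
    acc = b;          /* load addressB  */
    acc = acc + c;    /* add addressC   */
    return acc;       /* store addressA */
}

/* Stack style: operands are pushed, the add pops two values and pushes
 * the result, and the final store pops it. */
int32_t stack_add(int32_t b, int32_t c)
{
    int32_t stack[4];
    int top = 0;                     /* index of the next free slot */

    stack[top++] = b;                /* push b */
    stack[top++] = c;                /* push c */

    int32_t rhs = stack[--top];      /* add: pop two operands ... */
    int32_t lhs = stack[--top];
    stack[top++] = lhs + rhs;        /* ... and push their sum */

    assert(top == 1);                /* only the result remains */
    return stack[--top];             /* pop the result into a */
}
```

Both functions return the same value; what differs is where the operands live while the add happens.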
General Purpose Registers
- When using General Purpose Registers (GPRs), data can be accessed in different ways
  - Load-Store (L/S): data is loaded into registers, operated on, and stored back to memory (ex. all RISC instruction sets)
    - Hardware for operands is simple
    - Smaller is faster since clock cycle can be kept fast
    - Emphasis is on efficiency
  - Memory-Memory: operands can use memory addresses as both a source and a destination (ex. Intel)

MIPS Architecture
- MIPS is a load-store architecture
  - Each register is 32 bits long, called a word
  - The MIPS has 32 general purpose registers (some reserved for different purposes)
  - MIPS also has 32 floating point only registers, which we will also discuss later
Register Naming
- Registers 0-31 are named using a $<num>
- By convention, we give them names:
  - $zero contains the hardwired value 0
  - $s0, $s1, ..., $s7 are for save variables
  - $t0, $t1, ..., $t9 are for temp variables
  - The others will be introduced as we get to them
- Compilers use these conventions to make linking a smooth process
- Unlike variables, there are a fixed number of data registers ("smaller is faster")

Using registers
- Goals
  - Keep data in registers as much as possible
  - Always use data still in registers if possible
- Issues
  - Finite number of registers available
    - Spill registers to memory when all registers in use
    - Data must also be stored across procedures (covered next lecture)
  - Arrays
    - Data is too large to store in registers
    - Need to compute index
  - Dynamic memory allocation
    - Dynamically allocated data structures must be loaded one word at a time
Arithmetic Operators: II
- Consider the C operation for addition where the variables are in $s0-$s2 respectively

    a = b + c;

- The add operator using registers

    add $s0, $s1, $s2    # a = b + c

- Use the sub operator for a = b - c in MIPS

    sub $s0, $s1, $s2    # a = b - c

Complex Operations: II
- What about more complex statements?

    a = b + c + d - e;

- Break into multiple instructions

    add $t0, $s1, $s2    # $t0 = b + c
    add $t1, $t0, $s3    # $t1 = $t0 + d
    sub $s0, $t1, $s4    # a = $t1 - e
Constants
- Often want to be able to add a constant
- Use the addi instruction

    addi dst, src1, immediate

- The immediate is a 16 bit value

Constant Example
- Consider the following C code

    a++;

- The addi operator

    addi $s0, $s0, 1    # a = a + 1
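Because the immediate is only 16 bits wide, not every constant fits in a single addi. The sketch below, with hypothetical helper names, assumes the immediate is treated as a signed 16-bit value (range -32768 to 32767) and shows the range check plus the 16-bit field that would be encoded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when the constant can be represented in a signed 16-bit field. */
bool fits_in_imm16(int32_t constant)
{
    return constant >= INT16_MIN && constant <= INT16_MAX;
}

/* The 16-bit field for a constant that fits: the low 16 bits of its
 * two's complement representation. */
uint16_t encode_imm16(int32_t constant)
{
    assert(fits_in_imm16(constant));   /* larger constants need more than one instruction */
    return (uint16_t)constant;         /* conversion to unsigned keeps the low 16 bits */
}
```

For instance, the constant 1 from the a++ example fits, while 32768 does not.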
MIPS Simple Arithmetic

    Instruction          Example           Meaning          Comments
    add                  add $1,$2,$3      $1 = $2 + $3     3 operands; Exceptions
    subtract             sub $1,$2,$3      $1 = $2 - $3     3 operands; Exceptions
    add immediate        addi $1,$2,100    $1 = $2 + 100    + constant; Exceptions
    add unsigned         addu $1,$2,$3     $1 = $2 + $3     3 operands; No exceptions
    subtract unsigned    subu $1,$2,$3     $1 = $2 - $3     3 operands; No exceptions
    add imm unsigned     addiu $1,$2,100   $1 = $2 + 100    + constant; No exceptions
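The "Exceptions" column refers to signed overflow: the signed forms signal it, the unsigned forms simply wrap. A minimal C model of that distinction follows; the function names are illustrative, and the `assert` in `add_checked` merely stands in for the hardware exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when the signed 32-bit sum x + y cannot be represented. */
bool add_overflows(int32_t x, int32_t y)
{
    if (y > 0)
        return x > INT32_MAX - y;   /* sum would exceed the maximum */
    return x < INT32_MIN - y;       /* sum would fall below the minimum */
}

/* add-style behaviour: overflow is treated as an error. */
int32_t add_checked(int32_t x, int32_t y)
{
    assert(!add_overflows(x, y));   /* stands in for the overflow exception */
    return x + y;
}

/* addu-style behaviour: the result wraps modulo 2^32, no error. */
uint32_t add_wrapping(uint32_t x, uint32_t y)
{
    return x + y;                   /* unsigned arithmetic in C wraps */
}
```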
Putting Data in Registers
- Data transfer instructions are used to move data to and from memory in a load-store architecture
- A load operation moves data from memory to a register and a store operation moves data from a register to memory
- One word at a time is loaded from memory to a register on MIPS using the lw instruction
- Load instructions have three parts
  1. Operator name
  2. Destination register
  3. Base register address and constant offset

    lw dst, offset(base)

- Offset value is a signed 16-bit constant (ulw is the assembler's unaligned load, not an unsigned variant)
Memory Access
- All memory access happens through loads and stores
- Aligned words, halfwords, and bytes
- Floating Point loads and stores for accessing FP registers
- Displacement based addressing

  [Figure: displacement addressing. The immediate is added (+) to the base value held in a register, giving the memory address of the data to load or the location to store into.]

Loading Data Example
- Consider the example

    a = b + *c;

- Use the lw instruction to load

    lw $t0, 0($s2)       # $t0 = Memory[c]
    add $s0, $s1, $t0    # a = b + *c
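Displacement addressing amounts to "base register plus signed offset". The following C sketch models a word load on that basis; `effective_address` and `load_word` are hypothetical names, and memory is modelled as a plain array of 32-bit words indexed by byte address divided by 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Effective address = base register value + signed 16-bit offset. */
uint32_t effective_address(uint32_t base, int16_t offset)
{
    return base + (uint32_t)(int32_t)offset;   /* wraps modulo 2^32 */
}

/* Model of a word load from a memory of `words` 32-bit words.
 * The byte address must be word aligned and inside the array. */
uint32_t load_word(const uint32_t *memory, size_t words,
                   uint32_t base, int16_t offset)
{
    uint32_t address = effective_address(base, offset);
    assert(address % 4 == 0);        /* word loads must be aligned */
    assert(address / 4 < words);     /* stay inside the modelled memory */
    return memory[address / 4];
}
```

With a base of 8, an offset of 4 reads the word at byte address 12 and an offset of -8 reads the word at byte address 0.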
Accessing Arrays
- Arrays are really pointers to the base address in memory
- Use offset value to indicate which index
- Remember that addresses are in bytes, so multiply by the size of the element
  - Consider an integer array where A is the base address
  - The data to be accessed is at index 5
  - Then the address from memory is A + 5 * 4
  - Unlike C, assembly does not handle pointer arithmetic for you!

Array Example
- Consider the example

    a = b + c[9];

- Use the lw instruction offset

    lw $t0, 36($s2)      # $t0 = Memory[c[9]]
    add $s0, $s1, $t0    # a = b + c[9]
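The scaling that C hides behind `c[9]` can be written out explicitly. This is a small sketch with an illustrative function name; the base address 0x1000 in the usage note is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Byte address of element `index` in an array that starts at byte
 * address `base` and whose elements are `element_size` bytes each. */
uint32_t element_address(uint32_t base, uint32_t index, uint32_t element_size)
{
    assert(element_size > 0);
    return base + index * element_size;
}
```

For 4-byte integers, index 5 of an array at 0x1000 lives at 0x1000 + 5 * 4 = 0x1014, and index 9 is 36 bytes past the base, which is the 36 used as the lw offset.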
Complex Array Example
- Consider the example

    a = b + c[i];

- First find the correct offset

    add $t0, $s3, $s3    # $t0 = 2 * i
    add $t0, $t0, $t0    # $t0 = 4 * i
    add $t1, $s2, $t0    # $t1 = c + 4*i
    lw $t2, 0($t1)       # $t2 = Memory[c[i]]
    add $s0, $s1, $t2    # a = b + c[i]

- Note: We will cover multiply later

Storing Data
- Storing data is just the reverse and the instruction is nearly identical
- Use the sw instruction to copy a word from the source register to an address in memory

    sw src, offset(base)

- Offset value is a signed 16-bit constant (usw is the assembler's unaligned store, not an unsigned variant)
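The five-instruction sequence for a = b + c[i] can be mirrored line by line in C, including the trick of doubling twice instead of multiplying by 4. The function name is illustrative, and the sketch assumes 4-byte array elements.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* a = b + c[i], one C statement per assembly instruction. */
int32_t add_indexed(int32_t b, const int32_t *c, uint32_t i)
{
    assert(sizeof(int32_t) == 4);   /* the factor of 4 assumes 4-byte words */

    uint32_t t0 = i + i;            /* t0 = 2 * i */
    t0 = t0 + t0;                   /* t0 = 4 * i, the byte offset */

    /* t1 = c + 4*i, computed in bytes rather than in elements */
    const unsigned char *t1 = (const unsigned char *)c + t0;

    int32_t t2;
    memcpy(&t2, t1, sizeof t2);     /* t2 = word at that address, like lw */

    return b + t2;                  /* a = b + c[i] */
}
```

The byte pointer is used deliberately: ordinary C pointer arithmetic on `c` would scale by the element size automatically, which is exactly the step the assembly has to do by hand.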
Storing Data Example
- Consider the example

    *a = b + c;

- Use the sw instruction to store

    add $t0, $s1, $s2    # $t0 = b + c
    sw $t0, 0($s0)       # Memory[s0] = b + c

Storing to an Array
- Consider the example

    a[3] = b + c;

- Use the sw instruction offset

    add $t0, $s1, $s2    # $t0 = b + c
    sw $t0, 12($s0)      # Memory[a[3]] = b + c
Complex Array Storage
- Consider the example

    a[i] = b + c;

- Use the sw instruction offset

    add $t0, $s1, $s2    # $t0 = b + c
    add $t1, $s3, $s3    # $t1 = 2 * i
    add $t1, $t1, $t1    # $t1 = 4 * i
    add $t2, $s0, $t1    # $t2 = a + 4*i
    sw $t0, 0($t2)       # Memory[a[i]] = b + c

MIPS Load/Store

    Instruction          Example           Meaning           Comments
    store word           sw $1, 8($2)      Mem[8+$2]=$1      Store word
    store half           sh $1, 6($2)      Mem[6+$2]=$1      Stores only lower 16 bits
    store byte           sb $1, 5($2)      Mem[5+$2]=$1      Stores only lowest byte
    store float          sf $f1, 4($2)     Mem[4+$2]=$f1     Store FP word
    load word            lw $1, 8($2)      $1=Mem[8+$2]      Load word
    load halfword        lh $1, 6($2)      $1=Mem[6+$2]      Load half; sign extend
    load half unsigned   lhu $1, 6($2)     $1=Mem[6+$2]      Load half; zero extend
    load byte            lb $1, 5($2)      $1=Mem[5+$2]      Load byte; sign extend
    load byte unsigned   lbu $1, 5($2)     $1=Mem[5+$2]      Load byte; zero extend
    load float           lf $f1, 4($2)     $f1=Mem[4+$2]     Load FP register
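The "sign extend" and "zero extend" comments describe how a byte or halfword is widened to fill a 32-bit register. A short C sketch of both follows; the helper names are assumptions, and `bits` is 8 for byte loads and 16 for halfword loads.

```c
#include <assert.h>
#include <stdint.h>

/* Zero extension: keep the low `bits` bits, fill the rest with zeros. */
uint32_t zero_extend(uint32_t field, unsigned bits)
{
    assert(bits >= 1 && bits <= 31);
    uint32_t mask = ((uint32_t)1 << bits) - 1;
    return field & mask;
}

/* Sign extension: copy the top bit of the field into all higher bits. */
int32_t sign_extend(uint32_t field, unsigned bits)
{
    assert(bits >= 1 && bits <= 31);
    uint32_t sign_bit = (uint32_t)1 << (bits - 1);
    field = zero_extend(field, bits);
    /* Flipping the sign bit and subtracting it maps the upper half of
     * the range onto the negative numbers. */
    return (int32_t)(field ^ sign_bit) - (int32_t)sign_bit;
}
```

The byte 0xFF therefore becomes 255 under zero extension and -1 under sign extension.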
Memory Addressing
- Almost all architectures support byte addressing as well as word addressing
- Different architectures have different ways of ordering the bytes within a word, known as the byte order
- Some architectures limit the way data is stored so as to improve efficiency

Byte Ordering
- Two basic ways of ordering bytes
  - Big Endian: the "big end" comes first and the most significant byte (MSB) is at the lowest memory address
  - Little Endian: the "little end" comes first and the least significant byte (LSB) is at the first address (ex. Intel)
  - Some systems such as MIPS and PowerPC can do both, but are primarily big endian

                          msb          lsb
    Little Endian byte:    3   2   1   0
    Big Endian byte:       0   1   2   3
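The two byte orders can be compared by assembling the same four bytes both ways. This is a minimal C sketch with illustrative function names; the array lists the bytes from the lowest address to the highest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Big endian: the byte at the lowest address is the most significant. */
uint32_t read_big_endian(const uint8_t bytes[4])
{
    assert(bytes != NULL);
    return ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] << 8)  |  (uint32_t)bytes[3];
}

/* Little endian: the byte at the lowest address is the least significant. */
uint32_t read_little_endian(const uint8_t bytes[4])
{
    assert(bytes != NULL);
    return  (uint32_t)bytes[0]        | ((uint32_t)bytes[1] << 8) |
           ((uint32_t)bytes[2] << 16) | ((uint32_t)bytes[3] << 24);
}
```

For the bytes AB CD 00 00 at addresses 0 through 3, the big-endian reading is 0xABCD0000 (2882338816) and the little-endian reading is 0x0000CDAB (52651).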
Byte Ordering Example
- Consider the following word (32 bits) of memory

    Memory Address:    0     1     2     3
    Contents:          AB    CD    00    00

  Address 0 holds the Big Endian MSB and the Little Endian LSB; address 3 holds the Big Endian LSB and the Little Endian MSB
- Big Endian interprets as AB CD 00 00 (2882338816)
- Little Endian interprets as 00 00 CD AB (52651)

Alignment Restrictions
- In MIPS, data is required to fall on addresses that are exact multiples of the data size

  [Figure: a word placed against byte addresses 0 1 2 3, shown once aligned and once not aligned]

- Historically
  - Early machines (IBM 360 in 1964) required alignment
  - Removed in 1970s since hard for programmers
  - RISC reintroduced due to effect on performance
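The alignment rule is a simple divisibility test on the address. A sketch in C, assuming data sizes are powers of two (1, 2, 4, or 8 bytes); the function name is illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when `address` is a multiple of `size` bytes. */
bool is_aligned(uint32_t address, uint32_t size)
{
    /* size must be a nonzero power of two for the mask test to be valid */
    assert(size != 0 && (size & (size - 1)) == 0);
    return (address & (size - 1)) == 0;
}
```

A word (4 bytes) is aligned at addresses 0, 4, 8, and so on, but not at address 2; a halfword (2 bytes) at address 2 is aligned.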
