Lecture #1 - Introduction


CS734: Advanced Computer Architecture

Introduction
Reviewing Some Basic Concepts
• Central Processing Unit
• Memory Hierarchy
• Bus / Interconnect
• Moore’s Law
• Instruction Set Architecture
• Amdahl’s Law
• Pipelining
Resources
• Computer Architecture: A Quantitative Approach (4th Edition) by John L. Hennessy and David A. Patterson
Other references
• Computer Organization and Design (3rd Edition) by David A. Patterson and John L. Hennessy (4th Edition if available)
Course Contents
• Fundamentals of Computer Architecture
• Instruction-Level Parallelism and Its
Exploitation
• Limits on Instruction-Level Parallelism
• Multiprocessors and Thread-Level Parallelism
• Memory Hierarchy Design
• Storage Systems
• Others
Assessment
• Final – 50%
• Midterm – 30%
• Assignments and quizzes – 20%
• All material from the slides/book, including assignments and anything suggested for further reading.
Computer Architecture
A Quantitative Approach, Fifth Edition

Chapter 1

Fundamentals of Quantitative Design and Analysis

Copyright © 2012, Elsevier Inc. All rights reserved.
Computer Technology
• Performance improvements:
– Improvements in semiconductor technology
• Feature size, clock speed
– Improvements in computer architectures
• Enabled by HLL compilers, UNIX
• Led to RISC architectures
– Together have enabled:
• Lightweight computers
• Productivity-based managed/interpreted programming languages
Single Processor Performance
[Figure: single-processor performance over time, showing RISC-era growth and the move to multi-processor designs]
Current Trends in Architecture
• Cannot continue to leverage instruction-level parallelism (ILP)
– Single-processor performance improvement ended in 2003
• New models for performance:
– Data-level parallelism (DLP)
– Thread-level parallelism (TLP)
– Request-level parallelism (RLP)
• These require explicit restructuring of the application

Classes of Computers
• Personal Mobile Device (PMD)
– e.g. smart phones, tablet computers
– Emphasis on energy efficiency and real-time
• Desktop Computing
– Emphasis on price-performance
• Servers
– Emphasis on availability, scalability, throughput
• Clusters / Warehouse Scale Computers
– Used for “Software as a Service (SaaS)”
– Emphasis on availability and price-performance
– Sub-class: Supercomputers, emphasis: floating-point performance
and fast internal networks
• Embedded Computers
– Emphasis: price
Parallelism
• Classes of parallelism in applications:
– Data-Level Parallelism (DLP)
– Task-Level Parallelism (TLP)

• Classes of architectural parallelism:


– Instruction-Level Parallelism (ILP)
– Vector architectures/Graphic Processor Units (GPUs)
– Thread-Level Parallelism
– Request-Level Parallelism
Flynn’s Taxonomy
• Single instruction stream, single data stream (SISD)

• Single instruction stream, multiple data streams (SIMD)


– Vector architectures
– Multimedia extensions
– Graphics processor units

• Multiple instruction streams, single data stream (MISD)


– No commercial implementation

• Multiple instruction streams, multiple data streams (MIMD)


– Tightly-coupled MIMD
– Loosely-coupled MIMD
What is Computer Architecture?
• Architecture is those attributes visible to the
programmer
– These attributes have a direct impact on program
execution: instruction set, number of bits used for
data representation, I/O mechanisms, addressing
techniques.
• More broadly, Computer Architecture = instruction set + organization + hardware
Instruction Set Architecture
• Class of ISA
• Memory Addressing: (Byte addressable,
Aligned)
• Addressing Modes
• Types and Sizes of Operands
• Operations
• Control Flow Instructions
• Encoding an ISA
Instruction Set Architecture: Critical Interface

software
----- instruction set -----
hardware

• Properties of a good abstraction


– Lasts through many generations (portability)
– Used in many different ways (generality)
– Provides convenient functionality to higher levels
– Permits an efficient implementation at lower levels
Example: MIPS

Programmable storage (32-bit architecture):
• 2^32 x bytes of memory
• 31 x 32-bit GPRs (r1-r31; r0 = 0)
• 32 x 32-bit FP regs (paired for DP)
• HI, LO, PC

Open questions for any ISA: Data types? Format? Addressing modes?

Arithmetic/logical:
Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU,
AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUI,
SLL, SRL, SRA, SLLV, SRLV, SRAV

Memory access:
LB, LBU, LH, LHU, LW, LWL, LWR
SB, SH, SW, SWL, SWR

Control:
J, JAL, JR, JALR
BEq, BNE, BLEZ, BGTZ, BLTZ, BGEZ, BLTZAL, BGEZAL

32-bit instructions on word boundary.

MIPS instruction set formats:
• Register to register
• Transfers, branches
• Jumps
Reviewing some basic concepts
Two categories of ISAs: RISC vs. CISC

Reduced Instruction Set Computing (RISC):
• Software-centric approach
• Instructions are simple (fixed size)
• A reduced number of instructions
• Instructions on average take very few cycles (single cycle)

Complex Instruction Set Computing (CISC):
• Hardware-centric approach
• Instructions are complex (different sizes)
• A large number of instructions
• Instructions on average take a large number of cycles
MIPS
• Acronym for Microprocessor without Interlocked Pipeline Stages (not to be confused with the performance metric "million instructions per second")
• Developed at Stanford by John L. Hennessy
and his team
• RISC
• Used in embedded devices
• Used very frequently for educational purposes
MIPS Arithmetic Instructions
• Each MIPS arithmetic instruction
– performs only one operation and
– always has exactly three operands (a destination and two sources).
MIPS add
• C code: a = b + c;

• Assembly code (human-friendly machine instructions):
add a, b, c # a is the sum of b and c

• Machine code (hardware-friendly machine instructions):
00000010001100100100000000100000
MIPS add Example from C with Multiple
Operands
• C code: a = b + c + d + e;
• translates into the following assembly code:

add a, b, c
add a, a, d
add a, a, e

• Instructions are simple: a fixed number of operands (unlike C)
• A single line of C code is converted into multiple lines of assembly code
MIPS Subtract

• C code: f = (g + h) – (i + j);
• translates into the following assembly code:

add f, g, h
sub f, f, i
sub f, f, j
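As a sanity check, the three-instruction sequence above can be traced with a tiny register-file sketch (Python used purely as a scratch calculator; the sample values are arbitrary assumptions):

```python
# Trace: add f, g, h ; sub f, f, i ; sub f, f, j
# using sample values to confirm it computes f = (g + h) - (i + j).
regs = {"g": 10, "h": 20, "i": 3, "j": 4}

regs["f"] = regs["g"] + regs["h"]   # add f, g, h
regs["f"] = regs["f"] - regs["i"]   # sub f, f, i
regs["f"] = regs["f"] - regs["j"]   # sub f, f, j

assert regs["f"] == (regs["g"] + regs["h"]) - (regs["i"] + regs["j"])
print(regs["f"])  # 23
```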
Operands

• In C, each "variable" is a location in memory
• In hardware, each memory access is expensive; if variable a is accessed repeatedly, it helps to bring the variable into an on-chip register and operate on the registers
• To simplify the instructions, MIPS requires that each instruction (add, sub) operate only on registers
• Note: the number of operands (variables) in a C program is very large; the number of operands in assembly is fixed, and the number of registers is limited
Registers in MIPS

• The MIPS ISA has 32 registers (x86 has 8 registers)
• Each register is 32 bits wide (modern 64-bit architectures have 64-bit-wide registers)
• A 32-bit entity (4 bytes) is referred to as a word
• To make the code more readable, registers are partitioned as $s0-$s7 (C/Java variables) and $t0-$t9 (temporary variables)... (complete set of registers later)
MIPS Add Using Registers
• C code: a = b + c;

• Assembly code (human-friendly machine instructions):
add $s0, $s1, $s2 # assuming $s0, $s1, $s2 correspond to a, b, c respectively

• Machine code (hardware-friendly machine instructions):
00000010001100100100000000100000
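The 32-bit pattern above splits into the standard R-type fields (opcode / rs / rt / rd / shamt / funct). A throwaway Python decode shows rs = 17 ($s1) and rt = 18 ($s2); note that the rd field in this particular bit pattern decodes to 8, which is $t0 in the usual register numbering, so the destination register paired with it on the slide is worth double-checking:

```python
# Split a 32-bit MIPS R-type word into opcode/rs/rt/rd/shamt/funct fields.
word = int("00000010001100100100000000100000", 2)

opcode = (word >> 26) & 0x3F   # bits 31-26
rs     = (word >> 21) & 0x1F   # bits 25-21: first source register
rt     = (word >> 16) & 0x1F   # bits 20-16: second source register
rd     = (word >> 11) & 0x1F   # bits 15-11: destination register
shamt  = (word >> 6)  & 0x1F   # bits 10-6: shift amount
funct  = word & 0x3F           # bits 5-0: function code (32 = add)

print(opcode, rs, rt, rd, shamt, funct)  # 0 17 18 8 0 32
```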
Memory Operands

• Values must be fetched from memory before instructions (add and sub) can operate on them

Load word (memory -> register):
lw $t0, memory-address

Store word (register -> memory):
sw $t0, memory-address

How is memory-address determined?
Memory Address

• The compiler organizes data in memory... it knows the location of every variable (saved in a table)... it can fill in the appropriate memory address for load/store instructions

int a, b, c, d[10];

[Figure: layout of a, b, c, and d[10] in memory, starting at a base address]
Immediate Operands

• An instruction may require a constant as input
• An immediate instruction uses a constant number as one of the inputs (instead of a register operand)

addi $s0, $zero, 1000 # the program has base address
                      # 1000 and this is saved in $s0
                      # $zero is a register that always
                      # equals zero
addi $s1, $s0, 0      # this is the address of variable a
addi $s2, $s0, 4      # this is the address of variable b
addi $s3, $s0, 8      # this is the address of variable c
addi $s4, $s0, 12     # this is the address of variable d[0]
Memory Instruction Format

• The format of a load instruction:

lw $t0, 8($t3)

– $t0: destination register (any register)
– 8($t3): source address; a constant (8) that is added to the register in brackets ($t3)
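The effective address is just the constant offset plus the contents of the register in brackets. A one-line check, with an assumed value of 1000 in $t3:

```python
# Effective address for lw $t0, 8($t3): offset + base register contents.
t3 = 1000                     # assumed base address held in $t3
offset = 8
effective_address = t3 + offset
print(effective_address)      # 1008
```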
Example

Convert to assembly:

C code: d[3] = d[2] + a;

Assembly: # addi instructions as before
lw $t0, 8($s4)    # d[2] is brought into $t0
lw $t1, 0($s1)    # a is brought into $t1
add $t0, $t0, $t1 # the sum is in $t0
sw $t0, 12($s4)   # $t0 is stored into d[3]
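The four instructions can be traced against a toy byte-addressed memory to confirm they implement d[3] = d[2] + a. The addresses follow the addi setup from the earlier slide (a at 1000, d[0] at 1012); the stored values 5 and 7 are arbitrary assumptions for the trace:

```python
# Toy trace of: lw $t0, 8($s4) ; lw $t1, 0($s1) ; add ; sw $t0, 12($s4)
# Word-sized variables, byte addressing: a at 1000, d[0] at 1012 (assumed).
mem = {1000: 5,          # a
       1012 + 8: 7,      # d[2] (2 words = 8 bytes past d[0])
       1012 + 12: 0}     # d[3]
s1, s4 = 1000, 1012      # $s1 = &a, $s4 = &d[0], as set up by addi

t0 = mem[s4 + 8]         # lw $t0, 8($s4)   -> d[2]
t1 = mem[s1 + 0]         # lw $t1, 0($s1)   -> a
t0 = t0 + t1             # add $t0, $t0, $t1
mem[s4 + 12] = t0        # sw $t0, 12($s4)  -> d[3]

print(mem[1024])  # 12  (d[3] = 7 + 5)
```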
Recap – Numeric Representations

• Decimal: 35 (base 10)
• Binary: 00100011 (base 2)
• Hexadecimal (compact representation): 0x23 or 23hex
• 0-15 (decimal) maps to 0-9, a-f (hex)
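All three notations name the same value; Python's built-in conversions confirm the slide's example (35 decimal = 00100011 binary = 0x23):

```python
# One value, three notations.
assert 0b00100011 == 35          # binary literal
assert 0x23 == 35                # hex literal
binary = format(35, "08b")       # render as 8 binary digits
hexa = hex(35)
print(binary, hexa)  # 00100011 0x23
```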
Control Instructions

• Conditional branch: jump to instruction L1 if register1 equals register2: beq register1, register2, L1
  Similarly, bne and slt (set-on-less-than)
• Unconditional branch:
  j L1
  jr $s0

Convert to assembly:

C code:
if (i == j)
    f = g+h;
else
    f = g-h;

Assembly (i, j, g, h, f in $s3, $s4, $s1, $s2, $s0):
      bne $s3, $s4, Else
      add $s0, $s1, $s2
      j   Exit
Else: sub $s0, $s1, $s2
Exit:
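The translated branch can be traced for both cases (i == j and i != j), mirroring the bne fall-through / jump structure; the register-to-variable mapping follows the slide:

```python
# Emulate: bne $s3, $s4, Else / add / j Exit / Else: sub / Exit:
def branch_example(i, j, g, h):
    if not (i != j):       # bne not taken (i == j): fall through
        f = g + h          # add $s0, $s1, $s2
                           # j Exit
    else:                  # Else:
        f = g - h          # sub $s0, $s1, $s2
    return f               # Exit:

print(branch_example(2, 2, 10, 3))  # 13  (i == j, so f = g + h)
print(branch_example(1, 2, 10, 3))  # 7   (i != j, so f = g - h)
```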
Implementation Overview
Basic MIPS Architecture

• We'll design a simple CPU that executes:
– basic math (add, sub, and, or, slt)
– memory access (lw and sw)
– branch and jump instructions (beq and j)
Implementation Overview

• We need memory
– to store instructions
– to store data
– for now, let's make them separate units
• We need registers, an ALU, and a whole lot of control logic
• CPU operations common to all instructions:
– use the program counter (PC) to pull the instruction out of instruction memory
– read register values
Executing a MIPS Instruction

[Figure: datapath for executing a MIPS instruction]
MIPS Pipeline

• Five stages, one step per stage:
– IF : Instruction fetch from memory
– ID : Instruction decode & register read
– EX : Execute operation or calculate address
– MEM : Access memory operand
– WB : Write result back to register
MIPS Pipeline

[Figure: pipelined datapath showing the five stages]
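With five stages and one instruction entering per cycle, an ideal pipeline finishes n instructions in n + 4 cycles. A small timeline generator (a sketch of the ideal case only, with no hazards) makes the overlap visible:

```python
# Ideal 5-stage pipeline: instruction k occupies stage s during cycle k + s.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def timeline(n_instructions):
    """Return one {cycle: stage} dict per instruction."""
    return [{k + s: stage for s, stage in enumerate(STAGES)}
            for k in range(n_instructions)]

rows = timeline(3)
last_cycle = max(max(r) for r in rows)
print(last_cycle + 1)  # 7 total cycles for 3 instructions (3 + 4)
for k, r in enumerate(rows):
    print(f"I{k}:", " ".join(f"{r.get(c, '-'):>3}" for c in range(last_cycle + 1)))
```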
Pipelining Hazards
• Structural hazards: different instructions in different stages (or the same stage) conflicting for the same resource

• Data hazards: an instruction cannot continue because it needs a value that has not yet been generated by an earlier instruction

• Control hazards: fetch cannot continue because the outcome of an earlier branch is not yet known; a special case of a data hazard, kept as a separate category because it is treated in different ways
Data Hazards

[Figure: pipeline diagram illustrating a data hazard]
Forwarding

• Some data hazard stalls can be eliminated: bypassing (forwarding) routes a result directly from the stage that produces it to the stage that needs it
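The effect of bypassing can be illustrated by counting stalls for a consumer that immediately follows its producer, under the textbook 5-stage assumptions (register file written in the first half of WB and read in the second half of ID; forwarding paths from the EX/MEM and MEM/WB latches into EX). The cycle counts below follow those standard assumptions:

```python
# Stall counts for a dependent instruction immediately after its producer,
# classic 5-stage MIPS assumptions.
def stalls(producer, forwarding):
    if not forwarding:
        return 2      # consumer's ID must wait for the producer's WB
    if producer == "lw":
        return 1      # load value ready only after MEM: one bubble remains
    return 0          # ALU result forwarded from the EX/MEM latch into EX

print(stalls("add", forwarding=False))  # 2
print(stalls("add", forwarding=True))   # 0
print(stalls("lw",  forwarding=True))   # 1  (the load-use hazard)
```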
Control Hazards
• Simple techniques to handle control hazard stalls:
• assume the branch is not taken and start fetching the next instruction; if the branch is taken, hardware is needed to cancel the effect of the wrong-path instruction
• fetch the next instruction (the branch delay slot) and execute it anyway; if that instruction turns out to be on the correct path, useful work was done; if it turns out to be on the wrong path, hopefully program state is not lost
