Lecture #1 - Introduction


CS734: Advanced Computer Architecture

Introduction
Reviewing Some Basic Concepts
• Central Processing Unit
• Memory Hierarchy
• Bus / Interconnect
• Moore’s Law
• Instruction Set Architecture
• Amdahl’s Law
• Pipelining
Resources
• Computer Architecture: A Quantitative Approach (4th Edition) by John L. Hennessy and David A. Patterson
Other references
• Computer Organization and Design (3rd Edition) by David A. Patterson and John L. Hennessy (4th Edition if available)
Course Contents
• Fundamentals of Computer Architecture
• Instruction-Level Parallelism and Its
Exploitation
• Limits on Instruction-Level Parallelism
• Multiprocessors and Thread-Level Parallelism
• Memory Hierarchy Design
• Storage Systems
• Others
Assessment
• Final – 50%
• Midterm – 30%
• Assignments and quizzes – 20%
• All material from the slides/book, including assignments and anything suggested for further reading.
Computer Architecture
A Quantitative Approach, Fifth Edition

Chapter 1

Fundamentals of Quantitative Design and Analysis

Copyright © 2012, Elsevier Inc. All rights reserved.
Computer Technology
• Performance improvements:
– Improvements in semiconductor technology
• Feature size, clock speed
– Improvements in computer architectures
• Enabled by HLL compilers, UNIX
• Led to RISC architectures
– Together have enabled:
• Lightweight computers
• Productivity-based managed/interpreted programming languages
Single Processor Performance
[Figure: single-processor performance over time, showing RISC-era growth and the move to multi-processor designs]
Current Trends in Architecture
• Cannot continue to leverage instruction-level parallelism (ILP)
– Single-processor performance improvement ended in 2003
• New models for performance:
– Data-level parallelism (DLP)
– Thread-level parallelism (TLP)
– Request-level parallelism (RLP)
• These require explicit restructuring of the application

Classes of Computers
• Personal Mobile Device (PMD)
– e.g. smart phones, tablet computers
– Emphasis on energy efficiency and real-time
• Desktop Computing
– Emphasis on price-performance
• Servers
– Emphasis on availability, scalability, throughput
• Clusters / Warehouse Scale Computers
– Used for “Software as a Service (SaaS)”
– Emphasis on availability and price-performance
– Sub-class: Supercomputers, emphasis: floating-point performance
and fast internal networks
• Embedded Computers
– Emphasis: price
Parallelism
• Classes of parallelism in applications:
– Data-Level Parallelism (DLP)
– Task-Level Parallelism (TLP)

• Classes of architectural parallelism:


– Instruction-Level Parallelism (ILP)
– Vector architectures/Graphic Processor Units (GPUs)
– Thread-Level Parallelism
– Request-Level Parallelism
Flynn’s Taxonomy
• Single instruction stream, single data stream (SISD)

• Single instruction stream, multiple data streams (SIMD)


– Vector architectures
– Multimedia extensions
– Graphics processor units

• Multiple instruction streams, single data stream (MISD)


– No commercial implementation

• Multiple instruction streams, multiple data streams (MIMD)


– Tightly-coupled MIMD
– Loosely-coupled MIMD
What is Computer Architecture?
• Architecture is those attributes visible to the
programmer
– These attributes have a direct impact on program
execution: instruction set, number of bits used for
data representation, I/O mechanisms, addressing
techniques.
• More broadly, Computer Architecture = instruction set + organization + hardware
Instruction Set Architecture
• Class of ISA
• Memory Addressing: (Byte addressable,
Aligned)
• Addressing Modes
• Types and Sizes of Operands
• Operations
• Control Flow Instructions
• Encoding an ISA
Instruction Set Architecture: Critical Interface

software
----- instruction set -----
hardware

• Properties of a good abstraction


– Lasts through many generations (portability)
– Used in many different ways (generality)
– Provides convenient functionality to higher levels
– Permits an efficient implementation at lower levels
Example: MIPS

Programmable storage (32-bit architecture):
• 2^32 x bytes of memory
• 31 x 32-bit GPRs (r1-r31; r0 = 0)
• 32 x 32-bit FP regs (paired for DP)
• HI, LO, PC

Open questions for any ISA: Data types? Format? Addressing modes?

Arithmetic/logical:
Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU,
AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUI,
SLL, SRL, SRA, SLLV, SRLV, SRAV

Memory access:
LB, LBU, LH, LHU, LW, LWL, LWR
SB, SH, SW, SWL, SWR

Control:
J, JAL, JR, JALR
BEq, BNE, BLEZ, BGTZ, BLTZ, BGEZ, BLTZAL, BGEZAL

32-bit instructions on word boundary.

MIPS instruction set formats:
• Register to register
• Transfers, branches
• Jumps
Reviewing some basic concepts
Two categories of ISAs: RISC vs. CISC

Reduced Instruction Set Computing (RISC):
• Software-centric approach
• Instructions are simple (fixed size)
• A reduced number of instructions
• Instructions on average take very few cycles (single cycle)

Complex Instruction Set Computing (CISC):
• Hardware-centric approach
• Instructions are complex (different sizes)
• A large number of instructions
• Instructions on average take a large number of cycles
MIPS
• Acronym for Microprocessor without Interlocked Pipeline Stages (not to be confused with the performance metric "million instructions per second")
• Developed at Stanford by John L. Hennessy
and his team
• RISC
• Used in embedded devices
• Used very frequently for educational purposes
MIPS Arithmetic Instructions
• Each MIPS arithmetic instruction
– performs only one operation and
– always has exactly three operands (a destination and two sources).
MIPS add
• C code: a = b + c;

• Assembly code (human-friendly machine instructions):
add a, b, c # a is the sum of b and c

• Machine code (hardware-friendly machine instructions):
00000010001100100100000000100000
MIPS add Example from C with Multiple
Operands
• C code: a = b + c + d + e;
• translates into the following assembly code:

add a, b, c
add a, a, d
add a, a, e

• Instructions are simple: a fixed number of operands (unlike C)
• A single line of C code is converted into multiple lines of assembly code
MIPS Subtract

• C code: f = (g + h) – (i + j);
• translates into the following assembly code:

add f, g, h
sub f, f, i
sub f, f, j
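As a sanity check, the three-instruction sequence above can be traced with a tiny register-file sketch (Python used purely as a scratch calculator; the sample values are arbitrary assumptions):

```python
# Trace: add f, g, h ; sub f, f, i ; sub f, f, j
# using sample values to confirm it computes f = (g + h) - (i + j).
regs = {"g": 10, "h": 20, "i": 3, "j": 4}

regs["f"] = regs["g"] + regs["h"]   # add f, g, h
regs["f"] = regs["f"] - regs["i"]   # sub f, f, i
regs["f"] = regs["f"] - regs["j"]   # sub f, f, j

assert regs["f"] == (regs["g"] + regs["h"]) - (regs["i"] + regs["j"])
print(regs["f"])  # 23
```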
Operands

• In C, each "variable" is a location in memory
• In hardware, each memory access is expensive; if variable a is accessed repeatedly, it helps to bring the variable into an on-chip register and operate on the registers
• To simplify the instructions, MIPS requires that each instruction (add, sub) operate only on registers
• Note: the number of operands (variables) in a C program is very large; the number of operands in assembly is fixed, and the number of registers is limited
Registers in MIPS

• The MIPS ISA has 32 registers (x86 has 8 registers)
• Each register is 32 bits wide (modern 64-bit architectures have 64-bit-wide registers)
• A 32-bit entity (4 bytes) is referred to as a word
• To make the code more readable, registers are partitioned as $s0-$s7 (C/Java variables) and $t0-$t9 (temporary variables)... (complete set of registers later)
MIPS Add Using Registers
• C code: a = b + c;

• Assembly code (human-friendly machine instructions):
add $s0, $s1, $s2 # assuming $s0, $s1, $s2 correspond to a, b, c respectively

• Machine code (hardware-friendly machine instructions):
00000010001100100100000000100000
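The 32-bit pattern above splits into the standard R-type fields (opcode / rs / rt / rd / shamt / funct). A throwaway Python decode shows rs = 17 ($s1) and rt = 18 ($s2); note that the rd field in this particular bit pattern decodes to 8, which is $t0 in the usual register numbering, so the destination register paired with it on the slide is worth double-checking:

```python
# Split a 32-bit MIPS R-type word into opcode/rs/rt/rd/shamt/funct fields.
word = int("00000010001100100100000000100000", 2)

opcode = (word >> 26) & 0x3F   # bits 31-26
rs     = (word >> 21) & 0x1F   # bits 25-21: first source register
rt     = (word >> 16) & 0x1F   # bits 20-16: second source register
rd     = (word >> 11) & 0x1F   # bits 15-11: destination register
shamt  = (word >> 6)  & 0x1F   # bits 10-6: shift amount
funct  = word & 0x3F           # bits 5-0: function code (32 = add)

print(opcode, rs, rt, rd, shamt, funct)  # 0 17 18 8 0 32
```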
Memory Operands

• Values must be fetched from memory before instructions (add and sub) can operate on them

Load word (memory -> register):
lw $t0, memory-address

Store word (register -> memory):
sw $t0, memory-address

How is memory-address determined?
Memory Address

• The compiler organizes data in memory... it knows the location of every variable (saved in a table)... it can fill in the appropriate memory address for load/store instructions

int a, b, c, d[10];

[Figure: layout of a, b, c, and d[10] in memory, starting at a base address]
Immediate Operands

• An instruction may require a constant as input
• An immediate instruction uses a constant number as one of the inputs (instead of a register operand)

addi $s0, $zero, 1000 # the program has base address
                      # 1000 and this is saved in $s0
                      # $zero is a register that always
                      # equals zero
addi $s1, $s0, 0      # this is the address of variable a
addi $s2, $s0, 4      # this is the address of variable b
addi $s3, $s0, 8      # this is the address of variable c
addi $s4, $s0, 12     # this is the address of variable d[0]
Memory Instruction Format

• The format of a load instruction:

lw $t0, 8($t3)

– $t0: destination register (any register)
– 8($t3): source address; a constant (8) that is added to the register in brackets ($t3)
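The effective address is just the constant offset plus the contents of the register in brackets. A one-line check, with an assumed value of 1000 in $t3:

```python
# Effective address for lw $t0, 8($t3): offset + base register contents.
t3 = 1000                     # assumed base address held in $t3
offset = 8
effective_address = t3 + offset
print(effective_address)      # 1008
```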
Example

Convert to assembly:

C code: d[3] = d[2] + a;

Assembly: # addi instructions as before
lw $t0, 8($s4)    # d[2] is brought into $t0
lw $t1, 0($s1)    # a is brought into $t1
add $t0, $t0, $t1 # the sum is in $t0
sw $t0, 12($s4)   # $t0 is stored into d[3]
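The four instructions can be traced against a toy byte-addressed memory to confirm they implement d[3] = d[2] + a. The addresses follow the addi setup from the earlier slide (a at 1000, d[0] at 1012); the stored values 5 and 7 are arbitrary assumptions for the trace:

```python
# Toy trace of: lw $t0, 8($s4) ; lw $t1, 0($s1) ; add ; sw $t0, 12($s4)
# Word-sized variables, byte addressing: a at 1000, d[0] at 1012 (assumed).
mem = {1000: 5,          # a
       1012 + 8: 7,      # d[2] (2 words = 8 bytes past d[0])
       1012 + 12: 0}     # d[3]
s1, s4 = 1000, 1012      # $s1 = &a, $s4 = &d[0], as set up by addi

t0 = mem[s4 + 8]         # lw $t0, 8($s4)   -> d[2]
t1 = mem[s1 + 0]         # lw $t1, 0($s1)   -> a
t0 = t0 + t1             # add $t0, $t0, $t1
mem[s4 + 12] = t0        # sw $t0, 12($s4)  -> d[3]

print(mem[1024])  # 12  (d[3] = 7 + 5)
```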
Recap – Numeric Representations

• Decimal: 35 (base 10)
• Binary: 00100011 (base 2)
• Hexadecimal (compact representation): 0x23 or 23hex
• 0-15 (decimal) maps to 0-9, a-f (hex)
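All three notations name the same value; Python's built-in conversions confirm the slide's example (35 decimal = 00100011 binary = 0x23):

```python
# One value, three notations.
assert 0b00100011 == 35          # binary literal
assert 0x23 == 35                # hex literal
binary = format(35, "08b")       # render as 8 binary digits
hexa = hex(35)
print(binary, hexa)  # 00100011 0x23
```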
Control Instructions

• Conditional branch: jump to instruction L1 if register1 equals register2: beq register1, register2, L1
  Similarly, bne and slt (set-on-less-than)
• Unconditional branch:
  j L1
  jr $s0

Convert to assembly:

C code:
if (i == j)
    f = g+h;
else
    f = g-h;

Assembly (i, j, g, h, f in $s3, $s4, $s1, $s2, $s0):
      bne $s3, $s4, Else
      add $s0, $s1, $s2
      j   Exit
Else: sub $s0, $s1, $s2
Exit:
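The translated branch can be traced for both cases (i == j and i != j), mirroring the bne fall-through / jump structure; the register-to-variable mapping follows the slide:

```python
# Emulate: bne $s3, $s4, Else / add / j Exit / Else: sub / Exit:
def branch_example(i, j, g, h):
    if not (i != j):       # bne not taken (i == j): fall through
        f = g + h          # add $s0, $s1, $s2
                           # j Exit
    else:                  # Else:
        f = g - h          # sub $s0, $s1, $s2
    return f               # Exit:

print(branch_example(2, 2, 10, 3))  # 13  (i == j, so f = g + h)
print(branch_example(1, 2, 10, 3))  # 7   (i != j, so f = g - h)
```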
Implementation Overview
Basic MIPS Architecture

• We'll design a simple CPU that executes:
– basic math (add, sub, and, or, slt)
– memory access (lw and sw)
– branch and jump instructions (beq and j)
Implementation Overview

• We need memory
– to store instructions
– to store data
– for now, let's make them separate units
• We need registers, an ALU, and a whole lot of control logic
• CPU operations common to all instructions:
– use the program counter (PC) to pull the instruction out of instruction memory
– read register values
Executing a MIPS Instruction

[Figure: datapath for executing a MIPS instruction]
MIPS Pipeline

• Five stages, one step per stage:
– IF : Instruction fetch from memory
– ID : Instruction decode & register read
– EX : Execute operation or calculate address
– MEM : Access memory operand
– WB : Write result back to register
MIPS Pipeline

[Figure: pipelined datapath showing the five stages]
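With five stages and one instruction entering per cycle, an ideal pipeline finishes n instructions in n + 4 cycles. A small timeline generator (a sketch of the ideal case only, with no hazards) makes the overlap visible:

```python
# Ideal 5-stage pipeline: instruction k occupies stage s during cycle k + s.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def timeline(n_instructions):
    """Return one {cycle: stage} dict per instruction."""
    return [{k + s: stage for s, stage in enumerate(STAGES)}
            for k in range(n_instructions)]

rows = timeline(3)
last_cycle = max(max(r) for r in rows)
print(last_cycle + 1)  # 7 total cycles for 3 instructions (3 + 4)
for k, r in enumerate(rows):
    print(f"I{k}:", " ".join(f"{r.get(c, '-'):>3}" for c in range(last_cycle + 1)))
```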
Pipelining Hazards
• Structural hazards: different instructions in different stages (or the same stage) conflicting for the same resource

• Data hazards: an instruction cannot continue because it needs a value that has not yet been generated by an earlier instruction

• Control hazards: fetch cannot continue because the outcome of an earlier branch is not yet known; a special case of a data hazard, kept as a separate category because it is treated in different ways
Data Hazards

[Figure: pipeline diagram illustrating a data hazard]
Forwarding

• Some data hazard stalls can be eliminated: bypassing (forwarding) routes a result directly from the stage that produces it to the stage that needs it
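The effect of bypassing can be illustrated by counting stalls for a consumer that immediately follows its producer, under the textbook 5-stage assumptions (register file written in the first half of WB and read in the second half of ID; forwarding paths from the EX/MEM and MEM/WB latches into EX). The cycle counts below follow those standard assumptions:

```python
# Stall counts for a dependent instruction immediately after its producer,
# classic 5-stage MIPS assumptions.
def stalls(producer, forwarding):
    if not forwarding:
        return 2      # consumer's ID must wait for the producer's WB
    if producer == "lw":
        return 1      # load value ready only after MEM: one bubble remains
    return 0          # ALU result forwarded from the EX/MEM latch into EX

print(stalls("add", forwarding=False))  # 2
print(stalls("add", forwarding=True))   # 0
print(stalls("lw",  forwarding=True))   # 1  (the load-use hazard)
```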
Control Hazards
• Simple techniques to handle control hazard stalls:
• assume the branch is not taken and start fetching the next instruction; if the branch is taken, hardware is needed to cancel the effect of the wrong-path instruction
• fetch the next instruction (the branch delay slot) and execute it anyway; if that instruction turns out to be on the correct path, useful work was done; if it turns out to be on the wrong path, hopefully program state is not lost
