Lect 2

von Neumann Architecture Processor Organization Instruction Cycle Instruction Pipelining
CSC 213: Computer Architecture

Lecture 2: Processor Structure and Function
October 22, 2021

Agenda
1 von Neumann Architecture
2 Processor Organization
3 Instruction Cycle
4 Instruction Pipelining

von Neumann Architecture
Modern computers are based on the von Neumann architecture.

Key concepts
Data and instructions are stored in a single read-write memory.
The contents of this memory are addressable by location,
without regard to the type of data contained there.
Execution occurs in a sequential fashion (unless explicitly
modified) from one instruction to the next.
The basic function performed by a computer is execution of a
program, which consists of a set of instructions stored in memory.

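von Neumann Architecture Internals
[Figure: block diagram. The CPU contains the accumulator (AC), the
ALU, the memory buffer register (MBR), the memory address register
(MAR), the program counter (PC), the instruction register (IR) and
the control unit (CU), connected by an internal bus. The CPU is
connected to main memory, and to the I/O devices through the I/O
address register (I/O AR) and the I/O buffer register (I/O BR).]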

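Instruction format
Instruction (16 bits):
bits 0-3: operation code
bits 4-15: address
Instruction argument (16 bits):
bit 0: sign
bits 1-15: size
For example, Move 104 is encoded as 0101 000001101000
(operation code 0101, address 000001101000 = 104).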
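The 16-bit layout above (4-bit operation code, 12-bit address) can be sketched in Python as follows. The helper names `encode` and `decode` are illustrative and not part of the lecture; the opcode value 0101 for Move is taken from the example above.

```python
OPCODE_BITS = 4
ADDRESS_BITS = 12
MOVE = 0b0101  # opcode from the "Move 104" example


def encode(opcode, address):
    """Pack an opcode and an address into one 16-bit instruction word."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode does not fit in 4 bits")
    if not 0 <= address < (1 << ADDRESS_BITS):
        raise ValueError("address does not fit in 12 bits")
    # The opcode occupies the leftmost 4 bits, the address the remaining 12.
    return (opcode << ADDRESS_BITS) | address


def decode(word):
    """Split a 16-bit instruction word into (opcode, address)."""
    return word >> ADDRESS_BITS, word & ((1 << ADDRESS_BITS) - 1)


word = encode(MOVE, 104)
print(format(word, "016b"))  # 0101000001101000
print(decode(word))          # (5, 104)
```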

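Operation
[Flowchart: START, then the fetch cycle (instruction fetching), then
the execution cycle (instruction execution), which either ends at
STOP or continues to the check "Interrupts valid?". NO: return to
the fetch cycle. YES: enter the interrupt cycle (interrupt
execution), then return to the fetch cycle.]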

Instruction fetching cycle
Program counter (PC) stores the address of the next instruction
to acquire (at the beginning this is the so-called entry point)
Processor fetches the instruction from the address pointed to by PC
Value of the PC is increased by 1 (unless something else is
required, such as a jump)
Instruction is loaded into the instruction register (IR)
Processor decodes the instruction and executes the operation it
specifies

instruction fetching cycle (1)
Memory contents used in the example (entry point: address 100):
Address  Content
100      Move 104
101      Add 105
102      Store 106
103      Stop
104      3
105      5
106
MAR ← PC
PC = 100, so the value 100 is copied to MAR and placed on the address bus.

instruction fetching cycle (2)
MBR ← M(MAR)
The CU issues READ on the control bus; memory returns the content of
address 100, Move 104, over the data bus into MBR.

instruction fetching cycle (3)
PC ← PC + 1
PC now holds 101.

instruction fetching cycle (4)
IR ← MBR
IR now holds Move 104.

instruction fetching cycle (5)
CU ← IR
The control unit receives Move 104 for decoding.

Instruction execution cycle
Processor-memory
data transfer between CPU and memory
Processor-input/output
data transfer between CPU and input/output module
Data processing
Arithmetic or logical operations on the data
Control
Change of the instruction execution order (for example, jump)
Combination of the above

instruction execution cycle (1)
Executing Move 104 (IR = Move 104, PC = 101):
MAR ← IR(104)
The address part of the instruction, 104, is placed on the address bus.
MBR ← M(104)
The CU issues READ on the control bus; the value 3 arrives over the data bus.
AC ← MBR
The accumulator now holds 3.

Next instruction fetching cycle
MAR ← PC
PC = 101, so the value 101 is placed on the address bus.
MBR ← M(101)
The CU issues READ on the control bus; Add 105 arrives over the data bus.

Next instruction fetching cycle (2)
IR ← MBR
IR now holds Add 105.
CU ← IR
The control unit receives Add 105 for decoding.
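The fetch and execution steps walked through above can be put together in a small simulation. This is only a sketch: the slides show the register transfers for Move, while the behaviour assumed here for Add (AC ← AC + M(address)) and Store (M(address) ← AC) and the initial value 0 at address 106 are assumptions, and the function name `run` is illustrative.

```python
def run(memory, entry_point):
    """Run the example machine until Stop and return the final registers."""
    pc, ac = entry_point, 0
    while True:
        # Fetch cycle
        mar = pc                    # MAR <- PC
        mbr = memory[mar]           # MBR <- M(MAR)
        pc += 1                     # PC <- PC + 1
        ir = mbr                    # IR <- MBR
        op, *operand = ir.split()   # CU <- IR (decode)

        # Execution cycle
        if op == "Stop":
            break
        mar = int(operand[0])       # MAR <- IR(address)
        if op == "Move":
            mbr = memory[mar]       # MBR <- M(MAR)
            ac = mbr                # AC <- MBR
        elif op == "Add":           # assumed semantics
            mbr = memory[mar]
            ac = ac + mbr
        elif op == "Store":         # assumed semantics
            mbr = ac
            memory[mar] = mbr
        else:
            raise ValueError(f"unknown operation: {op}")
    return {"PC": pc, "AC": ac, "IR": ir}


memory = {
    100: "Move 104", 101: "Add 105", 102: "Store 106", 103: "Stop",
    104: 3, 105: 5, 106: 0,  # 106 is empty on the slide; 0 is assumed
}
state = run(memory, entry_point=100)
print(memory[106])  # 8
print(state)        # {'PC': 104, 'AC': 8, 'IR': 'Stop'}
```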

Interrupts
A mechanism that allows other system components to interrupt
the normal order of execution
Types of interrupts
Program
for example, overflow, divide by zero
Clock-generated
Generated by the internal processor clock
Used for process scheduling
Input/Output
From the I/O controller
Hardware failure
for example, memory parity error

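Examples of interrupts
[Figure: program flow of control, shown twice. Left, without
interrupts: the user program runs segments 1, 2 and 3, separated by
WRITE calls; each WRITE passes control to the I/O program, which runs
segment 4, the I/O instruction, and segment 5. Right, with
interrupts: the user program continues while the I/O operation is in
progress; segments 2 and 3 are split into 2a/2b and 3a/3b at the
points (@) where the interrupt execution program runs.]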

Interrupt cycle
The processor checks periodically whether an interrupt has occurred
This is indicated by the interrupt signal
If no interrupt has occurred, the next instruction is fetched
If an interrupt has occurred:
The executing program is suspended
Its context is saved
Program counter is set to the address of the first instruction of
the interrupt execution program
Interrupt is processed
After that, the previous context is loaded back into the CPU and the
user program resumes from the point at which it was suspended
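The interrupt cycle above can be sketched as a loop that checks an interrupt flag after each instruction. This is a simplification under stated assumptions: the context consists only of the program counter, instructions are plain labels, and the names `run_with_interrupts` and `interrupt_after` are illustrative.

```python
def run_with_interrupts(program, handler, interrupt_after):
    """Return the order in which instructions execute.

    interrupt_after is the index of the user instruction during which
    the interrupt signal is raised (hypothetical parameter).
    """
    trace = []
    pc = 0
    interrupt_pending = False
    while pc < len(program):
        trace.append(program[pc])        # fetch and execute one instruction
        if pc == interrupt_after:
            interrupt_pending = True     # signal raised during execution
        pc += 1
        if interrupt_pending:            # interrupt cycle: check the signal
            saved_context = pc           # save the context (only the PC here)
            for instruction in handler:  # run the interrupt execution program
                trace.append(instruction)
            pc = saved_context           # restore the context and resume
            interrupt_pending = False
    return trace


trace = run_with_interrupts(["u1", "u2", "u3"], ["h1", "h2"], interrupt_after=0)
print(trace)  # ['u1', 'h1', 'h2', 'u2', 'u3']
```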

Multiple interrupts
There are two ways of handling multiple interrupts:
Blocked interrupts
Processor ignores other interrupts while the current interrupt
is processed
Interrupts are queued and after the current interrupt is
processed, the next one (if any) is processed
Interrupts are executed in the sequence they occurred
Priorities
Execution of a low-priority interrupt can be suspended by
a higher-priority interrupt
After execution of the higher-priority interrupt, the execution
of the low-priority interrupt is continued
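The difference between the two policies can be seen in a small time-step simulation. This is a sketch only: the tuple layout (arrival time, name, priority, duration), the convention that a larger number means higher priority, and the function name `simulate` are assumptions made for the example.

```python
def simulate(interrupts, policy, ticks):
    """Return what runs in each time unit: 'user' or an interrupt name.

    interrupts: list of (arrival, name, priority, duration) tuples.
    policy: "sequential" (blocked interrupts) or "priority".
    """
    remaining = {}   # name -> time units still needed
    priority = {}    # name -> priority (larger means more urgent)
    pending = []     # unfinished interrupts in arrival order
    current = None
    timeline = []
    for t in range(ticks):
        for arrival, name, prio, duration in interrupts:
            if arrival == t:
                remaining[name] = duration
                priority[name] = prio
                pending.append(name)
        if policy == "sequential":
            # The running handler is never suspended; others wait in a queue.
            if current is None and pending:
                current = pending[0]
        elif pending:
            # A higher-priority interrupt suspends the running handler.
            current = max(pending, key=lambda n: priority[n])
        if current is None:
            timeline.append("user")
            continue
        timeline.append(current)
        remaining[current] -= 1
        if remaining[current] == 0:
            pending.remove(current)
            current = None
    return timeline


events = [(1, "int1", 1, 3), (2, "int2", 2, 2)]
print(simulate(events, "sequential", 7))
# ['user', 'int1', 'int1', 'int1', 'int2', 'int2', 'user']
print(simulate(events, "priority", 7))
# ['user', 'int1', 'int2', 'int2', 'int1', 'int1', 'user']
```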

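Multiple interrupts execution
[Figure: two diagrams. Sequential execution: the user program is
interrupted by interrupt nr 1, and interrupt nr 2 is handled only
after interrupt nr 1 has finished. Priority execution: interrupt
nr 2 suspends the handling of interrupt nr 1, which continues once
interrupt nr 2 has finished.]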

Processor Organization
Processor Requirements:
Fetch instruction
The processor reads an instruction from memory (register,
cache, main memory)
Interpret instruction
The instruction is decoded to determine what action is required
Fetch data
The execution of an instruction may require reading data from
memory or an I/O module
Process data
The execution of an instruction may require performing some
arithmetic or logical operation on data
Write data
The results of an execution may require writing data to
memory or an I/O module

Basic Elements of Processor (CPU)

CPU internal Organization

Registers
CPU must have some working space (temporary storage): registers
Number and function vary between processor designs
One of the major design decisions
Top level of memory hierarchy
Registers in CPU perform two roles:
User-visible registers: used by the machine or assembly
language programmer to minimize main memory references
Control and status registers: used by the control unit to control
the operation of the processor and by privileged operating
system programs to control the execution of programs

1. User-visible Registers
General Purpose
Data
Address
Condition Codes

General Purpose Registers (1)
May be true general purpose

May be used for data or addressing: e.g. register indirect,
displacement
Data: e.g. accumulator
Addressing: e.g. segment pointers, index registers

Make them general purpose

Increase flexibility and programmer options
Increase instruction size and complexity
Make them specialized
Smaller (faster) instructions
Less flexibility

How Many?
Between 8 and 32
Fewer = more memory references
More does not reduce memory references and takes up processor
real estate
How Big?
Large enough to hold a full address
Large enough to hold a full word
Often possible to combine two data registers

Condition Code Registers
Sets of individual bits

e.g. result of last operation was zero
Can be read (implicitly) by programs
e.g. Jump if zero
Cannot (usually) be set by programs

2. Control and Status Registers
Four registers are essential to instruction execution:

Program counter (PC)
Contains the address of an instruction to be fetched
Instruction register (IR)
Contains the instruction most recently fetched
Memory address register (MAR)
Contains the address of a location in memory
Memory buffer register (MBR)
Contains a word of data to be written to memory or the word
most recently read

Program Status Word (PSW)
Includes Condition Codes

Sign of last result
Zero
Carry
Equal
Overflow
Interrupt enable/disable
Supervisor
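How the condition codes are derived from an ALU result can be sketched for an addition. The 8-bit word size, the function name `add_with_flags`, and the choice to model only the sign, zero, carry and overflow bits are assumptions made for the example; a real PSW layout depends on the processor.

```python
def add_with_flags(a, b, bits=8):
    """Add two unsigned bit patterns and return (result, condition codes)."""
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    a &= mask
    b &= mask
    raw = a + b
    result = raw & mask
    flags = {
        "sign": bool(result & sign_bit),  # leftmost bit of the result
        "zero": result == 0,              # result is zero
        "carry": raw > mask,              # carry out of the leftmost bit
        # Signed overflow: both operands have the same sign and the
        # result has a different one.
        "overflow": bool(~(a ^ b) & (a ^ result) & sign_bit),
    }
    return result, flags


print(add_with_flags(0x7F, 0x01))
# (128, {'sign': True, 'zero': False, 'carry': False, 'overflow': True})
print(add_with_flags(0xFF, 0x01))
# (0, {'sign': False, 'zero': True, 'carry': True, 'overflow': False})
```

A conditional jump such as "Jump if zero" would then read the zero bit implicitly rather than compare the result again.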

Example Register Organizations

Instruction Cycle

Instruction Cycle State Diagram

The Fetch Cycle

The Indirect Cycle

The Interrupt Cycle

Pipelining
As in manufacturing, pipelining the operation of a CPU means to:
break the process into smaller steps, each step handled by a
sub-process
as soon as one sub-process finishes its task, it passes its result
to the next sub-process, then attempts to begin the next task
operating on multiple tasks simultaneously improves
performance

Breaking an Instruction into Cycles
A simple approach is to divide instruction into two stages:

Fetch instruction
Execute instruction
There are times when execution of instruction doesn’t use
main memory
In these cases, use idle bus to fetch next instruction in parallel
with execution.
This is called instruction prefetch

Instruction Prefetch

Improved Performance of Prefetch
Without prefetch:
Instruction 1  Instruction 2  Instruction 3  Instruction 4
fetch exec     fetch exec     fetch exec     fetch exec
With prefetch:
Instruction 1  fetch exec
Instruction 2        fetch exec
Instruction 3              fetch exec
Instruction 4                    fetch exec

Improved Performance of Prefetch (2)
With prefetch, the total number of cycles approaches half of the
total without prefetch as the number of instructions increases
Performance, however, is not doubled:
Fetch is usually shorter than execution
Prefetch more than one instruction?
Any jump or branch means that the prefetched instructions are not
the required instructions
Add more stages to improve performance

Three Cycle Instruction
The average number of cycles per instruction is further reduced
(to approximately a third of the non-pipelined figure) if we break
an instruction into three cycles (fetch/decode/execute).
Without pipelining:
Instruction 1  Instruction 2  Instruction 3  Instruction 4
F D E          F D E          F D E          F D E
With pipelining:
Instruction 1  F D E
Instruction 2    F D E
Instruction 3      F D E
Instruction 4        F D E
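The cycle counts behind these diagrams can be checked with a short calculation. The sketch assumes that every stage takes exactly one cycle and that the pipeline never stalls; the function names are illustrative.

```python
def sequential_cycles(n_instructions, n_stages):
    """Cycles needed when each instruction finishes before the next starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages):
    """Cycles needed when a new instruction enters the pipeline every cycle."""
    # The first instruction needs n_stages cycles; each later one
    # completes one cycle after its predecessor.
    return n_stages + (n_instructions - 1)


# Two stages (fetch/execute with prefetch), four instructions
print(sequential_cycles(4, 2), pipelined_cycles(4, 2))  # 8 5
# Three stages (fetch/decode/execute), four instructions
print(sequential_cycles(4, 3), pipelined_cycles(4, 3))  # 12 6
# For many instructions the ratio approaches the number of stages
print(sequential_cycles(1000, 3) / pipelined_cycles(1000, 3))
```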

Pipelining Strategy
If instruction execution could be broken into more pieces, we
could realize even better performance
Fetch instruction (FI): Read next instruction into buffer
Decode instruction (DI): Determine the opcode
Calculate operands (CO): Find effective address of source
operands
Fetch operands (FO): Get source operands from memory
Execute instruction (EI): Perform indicated operation
Write operand (WO): Store the result
This decomposition produces stages of nearly equal duration

Sample Timing Diagram for Pipeline
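A timing diagram of this kind can be generated with a few lines of code. The sketch assumes that every stage takes one time unit and that no branches or hazards occur; the function name `timing_diagram` is illustrative.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def timing_diagram(n_instructions):
    """Return one text row per instruction; each column is one time unit."""
    total_time = len(STAGES) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        cells = ["  "] * total_time
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage  # instruction i enters stage s at time i + s
        rows.append(f"Instruction {i + 1}: " + " ".join(cells).rstrip())
    return rows


rows = timing_diagram(3)
for row in rows:
    print(row)
```

Each row is shifted one time unit to the right of the previous one, so in any given column up to six instructions are in different stages at once.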

Problems Associated with Previous Timing Diagram
Assumes that each instruction goes through all six stages of the
pipeline
Assumes that FI, FO, and WO, which all access memory, can happen
at the same time
Even with the more detailed decomposition, some stages will
still take more time than others
Conditional branches cause even greater disruption to the pipeline
than with prefetch
Interrupts, like conditional branches, will disrupt the pipeline
CO and FO stages may depend on results of a previous
instruction at a point before the WO stage writes the results

Effect of a Conditional Branch on Instruction Pipeline

Flow of a Six Stage Pipeline

Alternative Pipeline Depiction

More Roadblocks to Realizing Full Speedup
There are two additional factors that frustrate improving
performance using pipelining:
Overhead required between stages such as buffer-to-buffer
transfers
The amount of control logic required to handle memory and
register dependencies and to control the pipeline itself
With each added stage, the hardware needed to support
pipelining requires careful consideration and design

Pipeline Hazards
Resource Hazards
A resource hazard occurs when two or more instructions that
are already in the pipeline need the same resource
The result is that the instructions must be executed in serial
rather than parallel for a portion of the pipeline
A resource hazard is sometimes referred to as a structural
hazard

Resource Hazard

Pipeline Hazards (3)
Data Hazard
A data hazard occurs when there is a conflict in the access of an
operand location
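One common case is a read-after-write conflict: an instruction reads a location that the instruction just before it has not yet written. The sketch below only checks adjacent instructions; the tuple layout (text, destination, sources), the register names and the function name `raw_hazards` are assumptions made for the example.

```python
def raw_hazards(instructions):
    """Find read-after-write conflicts between adjacent instructions.

    instructions: list of (text, destination, sources) tuples.
    Returns a list of (writer, reader, location) tuples.
    """
    hazards = []
    for i in range(1, len(instructions)):
        prev_text, prev_dest, _ = instructions[i - 1]
        text, _, sources = instructions[i]
        if prev_dest in sources:
            # The second instruction needs a value the first has not stored yet.
            hazards.append((prev_text, text, prev_dest))
    return hazards


program = [
    ("ADD EAX, EBX", "EAX", {"EAX", "EBX"}),  # EAX <- EAX + EBX
    ("SUB ECX, EAX", "ECX", {"ECX", "EAX"}),  # ECX <- ECX - EAX
]
print(raw_hazards(program))
# [('ADD EAX, EBX', 'SUB ECX, EAX', 'EAX')]
```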

Pipeline Hazards (4)
Control Hazard
Also known as a branch hazard
Occurs when the pipeline makes the wrong decision on a
branch prediction
Brings instructions into the pipeline that must subsequently
be discarded

Dealing with Branches
A variety of approaches have been used to reduce the
consequences of branches encountered in a pipelined system:
Multiple streams
Prefetch branch target
Loop buffer
Branch prediction
Delayed branching

Multiple Streams
Branch penalty is a result of having two possible paths of
execution
Solution: have two pipelines
Prefetch each branch into a separate pipeline
Once the outcome of the conditional branch is determined, use the
appropriate pipeline
Drawbacks:
Competing for resources: this method leads to bus and
register contention
More streams than pipes: multiple branches lead to further
pipelines being needed

Prefetch Branch Target
When a conditional branch is recognized, the target of the
branch is prefetched, in addition to the instruction following
the branch
Target is then saved until the branch instruction is executed
If the branch is taken, the target has already been prefetched

Loop Buffer
Add a small, very fast memory
Maintained by the fetch stage of the pipeline
Use it to contain the n most recently fetched instructions in
sequence
Before taking a branch, see if the branch target is in the buffer
Similar in concept to a cache dedicated to instructions while
maintaining an order of execution

Loop Buffer (2)

Branch Prediction
There are a number of methods that processors employ to make an
educated guess as to the direction a branch may take.
Static
Predict never taken
Predict always taken
Predict by opcode
Dynamic: depends on execution history
Taken/not taken switch
Branch history table
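A dynamic predictor of the taken/not taken kind can be sketched with a 2-bit saturating counter per branch. The 2-bit width is an assumption (the slide does not give the number of bits), as are the class and function names.

```python
class TwoBitPredictor:
    """Predict one branch from a 2-bit saturating counter."""

    def __init__(self):
        self.counter = 0  # 0 and 1 predict not taken; 2 and 3 predict taken

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        # Move one step towards the actual outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def correct_predictions(outcomes):
    """Count how many outcomes the predictor guesses correctly."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct


# A loop branch taken nine times and then not taken, executed twice
outcomes = ([True] * 9 + [False]) * 2
print(correct_predictions(outcomes), "of", len(outcomes))  # 16 of 20
```

With two bits, the single not-taken outcome at the end of the loop does not flip the prediction, so the second pass through the loop is mispredicted only once.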
