
The CPU & Memory:

Design and
Enhancement

EDITED ON MAR2020
Lesson Outcomes
• Fetch-execute instruction cycle
• CPU architectures
• CPU enhancements (separate fetch/execute units, pipelining, multiple parallel execution units, superscalar processing, multiprocessing)
• Memory enhancements (wide path memory access, memory interleaving, cache memory)

THE CPU AND MEMORY: DESIGN AND ENHANCEMENTS


PART 1: MACHINE CYCLE
Fetch-Execute Instruction Cycle
CPU repeats four basic operations:
1. Fetch - obtain program instruction or
data item from memory
2. Decode - translate instruction into commands
3. Execute - carry out command
4. Store - write result to memory



The 4 steps in the Fetch-Execute Instruction Cycle

Fetch-Execute Instruction Cycle

1. Fetch the next instruction from memory into the instruction register
2. Change the program counter to point to the following instruction
3. Determine the type of instruction just fetched
4. If the instruction uses data in memory, determine where it is
5. Fetch the data, if any, into internal CPU registers
6. Execute the instruction
7. Store the results in the proper place
8. Go to step 1 to begin executing the following instruction
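The eight steps above can be sketched as a tiny simulator loop. This is a hedged illustration, not part of the original notes: the 3-digit encoding (5xx = LOAD addr, 3xx = ADD addr) follows the exercises later in this deck, while 0xx = HALT is an assumption added so the loop can stop.

```python
# Minimal fetch-execute loop using the 3-digit word encoding from
# this deck's exercises (5xx = LOAD addr, 3xx = ADD addr).
# 0xx = HALT is an assumed encoding for illustration.
def run(memory, pc=0):
    a = 0                                # accumulator
    while True:
        ir = memory[pc]                  # 1. fetch instruction into IR
        pc += 1                          # 2. advance the program counter
        opcode, addr = divmod(ir, 100)   # 3. decode opcode and address
        if opcode == 5:                  # 4-6. LOAD: fetch data, execute
            a = memory[addr]
        elif opcode == 3:                # ADD
            a += memory[addr]
        elif opcode == 0:                # HALT (assumed)
            return a
        # 8. loop back and fetch the following instruction

mem = [530, 376, 0] + [0] * 97
mem[30], mem[76] = 777, 210
print(run(mem))   # 777 + 210 = 987
```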



The Little Man Computer (LMC)
FETCH
(1) The LMC reads the address from the location counter.

(2) The LMC walks over to the mailbox that corresponds to the location counter.

(3) The LMC reads the number on the paper. He puts the paper back, in case he needs to read it again later.



EXECUTE (LOAD)

(1) The LMC goes to the mailbox address specified in the instruction he previously fetched.

(2) He reads the number in that mailbox.

(3) He walks over to the calculator and punches the number in.

(4) Finally, he walks to the location counter and clicks it, which gets him ready to fetch the next instruction.
Instruction Cycle
• Registers discussed:
  • A or GP (general purpose): holds data values between instructions
  • Program Counter (PC): determines the next instruction for execution
  • Instruction Register (IR): holds the current instruction while it is being executed
  • MAR and MDR: used for accessing memory
• Every instruction must be fetched from memory before it can be executed!
Instruction Cycle (cont.)
• Two-cycle process, because both instructions and data are in memory
• Fetch
  • Decode or find the instruction, load it from memory into a register, and signal the ALU
• Execute
  • Performs the operation that the instruction requires
  • Moves/transforms data



Instruction Cycle - STEP 1
• Transfer the value in the PC (the address of the current instruction) into the MAR, so that the computer can retrieve the instruction located at that address

PC → MAR

• Result → the instruction is transferred from the specified memory location to the MDR



Instruction Cycle - STEP 2
• Transfer that instruction to the IR

MDR → IR

• Result → the IR will hold the instruction through the rest of the instruction cycle (it will control the particular steps that make up the remainder of the cycle)



STEP 1 + STEP 2

= the FETCH phase of the instruction cycle for every instruction

• The remaining steps are instruction dependent!



LMC vs. CPU Instruction Cycle



Load Fetch/Execute Cycle

1. PC → MAR            Transfer the address from the PC to the MAR
2. MDR → IR            Transfer the instruction to the IR
3. IR(address) → MAR   Address portion of the instruction loaded into the MAR
4. MDR → A             Actual data copied into the accumulator
5. PC + 1 → PC         Program counter incremented
Store Fetch/Execute Cycle

1. PC → MAR            Transfer the address from the PC to the MAR
2. MDR → IR            Transfer the instruction to the IR
3. IR(address) → MAR   Address portion of the instruction loaded into the MAR
4. A → MDR*            Accumulator copies data into the MDR
5. PC + 1 → PC         Program counter incremented

*Notice how step 4 differs for LOAD and STORE



ADD Fetch/Execute Cycle

1. PC → MAR            Transfer the address from the PC to the MAR
2. MDR → IR            Transfer the instruction to the IR
3. IR(address) → MAR   Address portion of the instruction loaded into the MAR
4. A + MDR → A         Contents of the MDR added to the contents of the accumulator
5. PC + 1 → PC         Program counter incremented



LMC Fetch/Execute

SUBTRACT:
PC → MAR
MDR → IR
IR[addr] → MAR
A − MDR → A
PC + 1 → PC

IN:
PC → MAR
MDR → IR
IOR → A
PC + 1 → PC

OUT:
PC → MAR
MDR → IR
A → IOR
PC + 1 → PC

HALT:
PC → MAR
MDR → IR

BRANCH:
PC → MAR
MDR → IR
IR[addr] → PC

BRANCH on Condition:
PC → MAR
MDR → IR
If condition false: PC + 1 → PC
If condition true: IR[addr] → PC
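The BRANCH-on-condition sequence from the table above can be replayed transfer by transfer. This is a sketch with hypothetical details: the helper name and the 8xx "branch" encoding are illustration-only assumptions; the micro-steps themselves are the tabulated ones.

```python
# Replaying the BRANCH-on-condition micro-steps from the table above.
# Registers are plain variables; memory is a list of 3-digit words.
def branch_on_condition(memory, pc, condition):
    mar = pc                 # PC -> MAR
    mdr = memory[mar]        # memory read
    ir = mdr                 # MDR -> IR
    addr = ir % 100          # IR[addr]
    if condition:
        pc = addr            # condition true: IR[addr] -> PC
    else:
        pc = pc + 1          # condition false: PC + 1 -> PC
    return pc

mem = [0] * 100
mem[10] = 845                # hypothetical "branch to address 45" word
print(branch_on_condition(mem, 10, True))    # 45
print(branch_on_condition(mem, 10, False))   # 11
```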
Assume the following values are present just prior to execution of this segment:

Program Counter: 12
Value in Memory Location 12: 530 (LOAD 30)
Value in Memory Location 13: 376 (ADD 76)
Value in Memory Location 30: 777
Value in Memory Location 76: 210

At the end of the fetch step in the first instruction cycle, give the contents of the following:

First instruction (fetch):

PC → MAR    MAR =

MDR → IR    IR =



Assume the following values are present just prior to the execution of this segment:

Program Counter: 45
Value in Memory Location 44: 398 (ADD 98)
Value in Memory Location 45: 599 (LOAD 99)
Value in Memory Location 46: 123
Value in Memory Location 98: 777
Value in Memory Location 99: 210

At the end of each step in the instruction cycle, give the contents of the following:

PC → MAR              MAR =
MDR → IR              IR =
IR[address] → MAR     MAR =
MDR → A               A =
PC + 1 → PC           PC =
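One way to check your answers for the segment above is to replay the five LOAD micro-steps with the given values. This is a sketch using the register model from this deck (PC, MAR, MDR, IR, A as plain variables):

```python
# Trace the LOAD fetch/execute cycle for the exercise values:
# PC = 45, memory[45] = 599 (LOAD 99), memory[99] = 210.
memory = {44: 398, 45: 599, 46: 123, 98: 777, 99: 210}

pc = 45
mar = pc                  # 1. PC -> MAR
mdr = memory[mar]         # memory read
ir = mdr                  # 2. MDR -> IR
mar = ir % 100            # 3. IR[address] -> MAR
mdr = memory[mar]         # memory read
a = mdr                   # 4. MDR -> A
pc = pc + 1               # 5. PC + 1 -> PC

print(mar, ir, a, pc)     # 99 599 210 46
```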



PART 2: CPU ARCHITECTURES
CPU Architectures
CPU architecture = ISA (Instruction Set Architecture)

• A CPU architecture is defined by:
1. Number and types of registers
2. Methods of addressing memory
3. Basic design and layout of the instruction set
• Many CPU architectures have appeared over the years, but only a few remain, as a result of the evolution and expansion of those architectures to include new features with improved design, technology, and implementation



CPU Architectures (cont.)

• Most CPU architectures today are loosely categorized into:
1. CISC – Complex Instruction Set Computers
2. RISC – Reduced Instruction Set Computers



CISC:
• Few general-purpose registers
• Large number of specialized instructions
• Wide variety of addressing techniques
• Instruction words of varying sizes

RISC:
• Many registers
• Limited and simple instruction set
• Register-oriented – limited memory access
• Fixed-length, fixed-format instruction words
• Limited addressing modes



RISC Architecture (cont.)
• Attempts to produce more CPU power by eliminating two major bottlenecks to instruction execution speed:
1. Reducing the number of data memory accesses by using registers more effectively
  • the time to locate and access data in memory is much longer
2. Simplifying the instruction set by eliminating rarely used instructions



CPU Architectures

• However, in modern times the dividing line between CISC and RISC has become increasingly blurred, as many features of each have migrated across it


PART 3: CPU ENHANCEMENT
CPU Performance
The purpose of a computer is to execute programs. The ability of the CPU to execute instructions quickly is an important contributor to performance.
Methods to increase the performance of the CPU:
• Separate fetch and execute units
• Pipelining
• Superscalar processing
• Multiprocessing
Separate Fetch-Execute Units
• Fetch unit
  • Instruction fetch unit
  • Instruction decode unit
    • Determines the opcode
    • Identifies the type of instruction and the operands
  • Several instructions are fetched in parallel and held in a buffer until decoded and executed
• Execute unit
  • Receives instructions from the decode unit
  • The appropriate execution unit services the instruction



Separate Fetch-Execute Units



Instruction Pipelining
• Overlap instructions to speed up processing
  • As each instruction completes a step, the following instruction moves into the stage just vacated
  • Result → a large overall increase in the average number of instructions performed in a given time
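The gain from overlapping can be quantified with a standard back-of-the-envelope formula (textbook-standard, not taken from these slides): with a k-stage pipeline and n instructions, roughly k + (n - 1) cycles are needed instead of k * n, assuming one stage per clock and no stalls.

```python
# Cycle counts for n instructions on a k-stage pipeline,
# assuming one stage per clock and no hazards or stalls.
def unpipelined_cycles(n, k):
    return n * k                 # each instruction runs start to finish alone

def pipelined_cycles(n, k):
    return k + (n - 1)           # fill the pipe once, then one finishes per cycle

n, k = 100, 4
print(unpipelined_cycles(n, k))  # 400
print(pipelined_cycles(n, k))    # 103
print(unpipelined_cycles(n, k) / pipelined_cycles(n, k))  # roughly 3.9x speed-up
```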



Instruction Pipelining



Pipelining Hazards

Structural Hazard:
• An attempt to use the same resource in two different ways at the same time
• E.g., using the same register for a multiplication and a division operation at the same time

Data Hazard:
• An attempt to use data before it is ready
• E.g., the following instruction depends on the result of a prior instruction still in the pipeline

Control Hazard:
• An attempt to make a decision before the condition is evaluated
• E.g., branch instructions
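A data (read-after-write) hazard can be spotted mechanically: an instruction reads a register that an earlier instruction, still in the pipeline, has not yet written back. The toy checker below is an illustration only; the tuple-based instruction format and the `depth` window are assumptions, not from the slides.

```python
# Each instruction is (dest_register, source_registers).
# A RAW hazard exists when an instruction reads a register written
# by one of the previous `depth` instructions still in the pipeline.
def raw_hazards(program, depth=2):
    hazards = []
    for i, (_, sources) in enumerate(program):
        for j in range(max(0, i - depth), i):
            dest = program[j][0]
            if dest in sources:
                hazards.append((j, i, dest))
    return hazards

prog = [
    ("r1", ("r2", "r3")),   # r1 = r2 + r3
    ("r4", ("r1", "r5")),   # r4 = r1 + r5  <- reads r1 before it is ready
    ("r6", ("r7", "r8")),   # independent, no hazard
]
print(raw_hazards(prog))    # [(0, 1, 'r1')]
```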
Superscalar Processing
• Process more than one instruction per clock cycle
• Instructions are processed in parallel, at an average rate of more than one instruction per clock cycle, through multiple execution units



Scalar vs. Superscalar Processing



Multiprocessing
• Increase the power of a computer system by adding more processors
• Two or more CPUs may be interconnected to form a multiprocessing system because:
1. adding additional CPUs is cheap, within limits
2. programs can be divided and the parts executed simultaneously on multiple CPUs
• Two different approaches:


Multiprocessing (cont.)
The term also refers to the ability of a system to support more than one processor, or the ability to allocate tasks between them.
Reasons for using multiprocessing:
◦ Increases the processing power of a system.
◦ Enables parallel processing – programs can be divided into independent pieces and the different parts executed simultaneously on multiple processors.
Multiprocessing (cont.)
Since the execution speed of a CPU is directly related to its clock speed, equivalent processing power can be achieved at much lower clock speeds, reducing power consumption, heat, and stress within the various computer components.

Adding more CPUs is relatively inexpensive.

If one CPU encounters a problem, the other CPUs can continue instruction execution, increasing overall throughput.
Multiprocessing (cont.)
Two different approaches to multiprocessing:
• Asymmetric multiprocessing (master-slave)
• Symmetric multiprocessing (peers)
Multiprocessing systems may also be organized as tightly-coupled or loosely-coupled systems.
1) Tightly-coupled systems
• Connected CPUs share some or all of the system's memory and some I/O devices
• Can divide program execution between CPUs
• Two types:
1. Master-slave multiprocessing
2. Symmetrical multiprocessing (SMP)



1) Tightly-coupled systems (e.g.)



1) Tightly-coupled systems (cont.)
i) Master-slave multiprocessing
• One CPU (the master) manages the system and controls all resources and scheduling
• Only the master can execute the OS
• Simple, BUT low reliability and poor use of resources
ii) Symmetrical multiprocessing (SMP)
• Each CPU has identical access to the OS and to all system resources
• Each performs its own dispatch scheduling
• Difficult to implement, BUT high reliability and a well-balanced workload



Tightly-Coupled System
ASYMMETRIC / MASTER-SLAVE

Master CPU:
• Manages the system
• Controls all resources and scheduling
• Assigns tasks to slave CPUs
Tightly-Coupled System
ASYMMETRIC / MASTER-SLAVE

ADVANTAGES:
• Resources can be dedicated to critical tasks, resulting in more deterministic performance
• Cores spend less time handshaking with each other

DISADVANTAGES:
• Reliability issues – if the master CPU fails, the entire system fails
Tightly-Coupled System
SYMMETRICAL
• Multiple CPUs in a networking device share the same board, memory, I/O, and operating system
• Each CPU has equal access to resources
• Each CPU determines what to run using a standard algorithm
• A single operating system (OS) runs on all processors, which access a single image of the OS in memory
• Any processor can run any type of task
• Processors communicate with each other through shared memory
Tightly-Coupled System
SYMMETRICAL

ADVANTAGES:
• Provides better load balancing and fault tolerance
• Can save money by sharing power supplies, housings, and peripherals
• Can execute programs more quickly and with increased reliability

DISADVANTAGES:
• Resource conflicts – memory, I/O, etc.
• Complex implementation – keeping everything synchronized
• Additional CPU cycles are required to manage the cooperation, so per-CPU efficiency goes down
Symmetric VS Asymmetric
2) Loosely-coupled systems
• Each computer is complete in itself, with its own CPU, memory, and I/O facilities
• Data communications provide the link between the different computers (communication channels)
• Examples: point-to-point, multipoint, clusters, client-server



Example: Point-to-Point



Example: Client-Server Network



PART 4: MEMORY ENHANCEMENT
Memory Enhancements
• Within the instruction cycle, the slowest steps are those that require memory access → memory access needs improvement
• Three different approaches to enhance memory performance:
1. Wide path memory access
2. Memory interleaving
3. Cache memory



Memory Enhancements
Methods to improve memory access:

Wide Path Memory Access:
• Widen the data path to read/write several bytes or words between the CPU and memory on each access
• Retrieve multiple bytes instead of 1 byte at a time

Memory Interleaving:
• Divide memory into parts
• Partition memory into subsections, each with its own address register and data register

Cache Memory:
• Position a small amount of high-speed memory between the CPU and main memory
• Organized into blocks
• Each block provides a small amount of storage
1) Wide path memory access
• Retrieve multiple bytes instead of 1 byte at a time
• Widen the data path so that several bytes or words can be read/written between the CPU and memory with each access
• Achieved by widening the bus data path and using a larger MDR
• A simple technique, widely used
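The gain from a wider path is direct: the number of memory accesses needed to move B bytes drops from B to ceil(B / W) for a W-byte-wide path. An illustrative calculation (the byte counts are made-up examples, not from the slides):

```python
from math import ceil

# Memory accesses needed to transfer `nbytes` over a path `width` bytes wide.
def accesses(nbytes, width):
    return ceil(nbytes / width)

print(accesses(64, 1))   # 64 accesses with a 1-byte path
print(accesses(64, 8))   # 8 accesses with an 8-byte (64-bit) path
```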



2) Memory Interleaving
• Dividing memory into parts → increases the effective rate of memory access
• Makes it possible to access more than one location at a time
• Each part has its own MAR and MDR, and each part is independently accessible
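With low-order interleaving across n banks, consecutive addresses land in different banks, so sequential fetches can overlap. A sketch of the address-to-bank mapping (the n-way low-order scheme is an assumption; the slides do not specify a particular mapping):

```python
# Low-order interleaving: address -> (bank, offset within that bank).
def interleave(address, n_banks=4):
    return address % n_banks, address // n_banks

# Four consecutive addresses hit four different banks,
# so their accesses can proceed in parallel.
for addr in range(4):
    print(addr, "-> bank", interleave(addr)[0])
```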



2) Memory Interleaving (cont.)



3) Cache Memory
• Position a small amount of high-speed memory between the CPU and main storage
• Organized in blocks of 8 or 16 bytes
• Tags: record each block's location in main memory
• Cache controller: hardware that checks the tags
• Cache line: the unit of transfer between storage and cache memory
• Hit ratio: the ratio of hits to total requests
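The hit ratio feeds directly into the effective (average) access time: t_eff = h * t_cache + (1 - h) * t_memory. A small worked sketch; the formula is the standard one, and the timing numbers are made up for illustration:

```python
# Effective access time given a hit ratio and the two access times (in ns).
def effective_access_time(hit_ratio, t_cache, t_memory):
    return hit_ratio * t_cache + (1 - hit_ratio) * t_memory

# Hypothetical numbers: 1 ns cache, 50 ns main memory, 95% hit ratio.
print(effective_access_time(0.95, 1, 50))   # 3.45 ns on average
```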



3) Cache Memory (cont.)
• Synchronizing cache and memory:
1. Write-through – writes data back to main memory immediately upon a change in the cache
2. Write-back – writes to memory are made only when a cache line is actually replaced (faster, but more care is needed)



3) Cache Memory (cont.)
Locality of reference:
• Cache memory works because of the locality of reference principle:
  • Most memory references are confined to a small region of memory at any given time
  • A well-written program spends its time in a small loop, procedure, or function
  • Data is likely to be in an array
  • Variables are stored together
3) Cache Memory (cont.)

