
The CPU & Memory:

Design and
Enhancement

EDITED ON MAR2020
Lesson Outcomes
• Fetch-execute instruction cycle
• CPU architectures
• CPU enhancements (separate fetch/execute units, pipelining, multiple parallel execution units, superscalar processing, multiprocessing)
• Memory enhancements (wide path memory access, memory interleaving, cache memory)

THE CPU AND MEMORY: DESIGN AND ENHANCEMENTS


PART 1: MACHINE CYCLE
Fetch-Execute Instruction Cycle
CPU repeats four basic operations:
1. Fetch - obtain program instruction or
data item from memory
2. Decode - translate instruction into commands
3. Execute - carry out command
4. Store - write result to memory



The 4 steps in the Fetch-Execute Instruction Cycle

Fetch-Execute Instruction Cycle

1. Fetch the next instruction from memory into the instruction register
2. Change the program counter to point to the following instruction
3. Determine the type of instruction just fetched
4. If the instruction uses data in memory, determine where it is
5. Fetch the data, if any, into internal CPU registers
6. Execute the instruction
7. Store the results in the proper place
8. Go to step 1 to begin executing the following instruction
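The eight steps above can be sketched as a tiny simulator loop. This is a hedged illustration, not part of the original notes: the 3-digit encoding (5xx = LOAD addr, 3xx = ADD addr) follows the exercises later in this deck, while 0xx = HALT is an assumption added so the loop can stop.

```python
# Minimal fetch-execute loop using the 3-digit word encoding from
# this deck's exercises (5xx = LOAD addr, 3xx = ADD addr).
# 0xx = HALT is an assumed encoding for illustration.
def run(memory, pc=0):
    a = 0                                # accumulator
    while True:
        ir = memory[pc]                  # 1. fetch instruction into IR
        pc += 1                          # 2. advance the program counter
        opcode, addr = divmod(ir, 100)   # 3. decode opcode and address
        if opcode == 5:                  # 4-6. LOAD: fetch data, execute
            a = memory[addr]
        elif opcode == 3:                # ADD
            a += memory[addr]
        elif opcode == 0:                # HALT (assumed)
            return a
        # 8. loop back and fetch the following instruction

mem = [530, 376, 0] + [0] * 97
mem[30], mem[76] = 777, 210
print(run(mem))   # 777 + 210 = 987
```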



The Little Man Computer (LMC)
FETCH
(1) The LMC reads the address from the location counter.

(2) The LMC walks over to the mailbox that corresponds to the location counter.

(3) The LMC reads the number on the paper. He puts the paper back, in case he needs to read it again later.



EXECUTE (LOAD)

(1) The LMC goes to the mailbox address specified in the instruction he previously fetched.

(2) He reads the number in that mailbox.

(3) He walks over to the calculator and punches the number in.

(4) Finally, he walks to the location counter and clicks it, which gets him ready to fetch the next instruction.
Instruction Cycle
• Registers discussed:
  • A or GP (general purpose): holds data values between instructions
  • Program Counter (PC): determines the next instruction for execution
  • Instruction Register (IR): holds the current instruction while it is being executed
  • MAR and MDR: used for accessing memory
• Every instruction must be fetched from memory before it can be executed!
Instruction Cycle (cont.)
• Two-cycle process, because both instructions and data are in memory
• Fetch
  • Decode or find the instruction, load it from memory into a register, and signal the ALU
• Execute
  • Performs the operation that the instruction requires
  • Moves/transforms data



Instruction Cycle - STEP 1
• Transfer the value in the PC (the address of the current instruction) into the MAR, so that the computer can retrieve the instruction located at that address

PC → MAR

• Result → the instruction is transferred from the specified memory location to the MDR



Instruction Cycle - STEP 2
• Transfer that instruction to the IR

MDR → IR

• Result → the IR will hold the instruction through the rest of the instruction cycle (it will control the particular steps that make up the remainder of the cycle)



STEP 1 + STEP 2

= the FETCH phase of the instruction cycle for every instruction

• The remaining steps are instruction dependent!



LMC vs. CPU Instruction Cycle



Load Fetch/Execute Cycle

1. PC → MAR            Transfer the address from the PC to the MAR
2. MDR → IR            Transfer the instruction to the IR
3. IR(address) → MAR   Address portion of the instruction loaded into the MAR
4. MDR → A             Actual data copied into the accumulator
5. PC + 1 → PC         Program counter incremented
Store Fetch/Execute Cycle

1. PC → MAR            Transfer the address from the PC to the MAR
2. MDR → IR            Transfer the instruction to the IR
3. IR(address) → MAR   Address portion of the instruction loaded into the MAR
4. A → MDR*            Accumulator copies data into the MDR
5. PC + 1 → PC         Program counter incremented

*Notice how step 4 differs for LOAD and STORE



ADD Fetch/Execute Cycle

1. PC → MAR            Transfer the address from the PC to the MAR
2. MDR → IR            Transfer the instruction to the IR
3. IR(address) → MAR   Address portion of the instruction loaded into the MAR
4. A + MDR → A         Contents of the MDR added to the contents of the accumulator
5. PC + 1 → PC         Program counter incremented



LMC Fetch/Execute

SUBTRACT:
PC → MAR
MDR → IR
IR[addr] → MAR
A − MDR → A
PC + 1 → PC

IN:
PC → MAR
MDR → IR
IOR → A
PC + 1 → PC

OUT:
PC → MAR
MDR → IR
A → IOR
PC + 1 → PC

HALT:
PC → MAR
MDR → IR

BRANCH:
PC → MAR
MDR → IR
IR[addr] → PC

BRANCH on Condition:
PC → MAR
MDR → IR
If condition false: PC + 1 → PC
If condition true: IR[addr] → PC
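The BRANCH-on-condition sequence from the table above can be replayed transfer by transfer. This is a sketch with hypothetical details: the helper name and the 8xx "branch" encoding are illustration-only assumptions; the micro-steps themselves are the tabulated ones.

```python
# Replaying the BRANCH-on-condition micro-steps from the table above.
# Registers are plain variables; memory is a list of 3-digit words.
def branch_on_condition(memory, pc, condition):
    mar = pc                 # PC -> MAR
    mdr = memory[mar]        # memory read
    ir = mdr                 # MDR -> IR
    addr = ir % 100          # IR[addr]
    if condition:
        pc = addr            # condition true: IR[addr] -> PC
    else:
        pc = pc + 1          # condition false: PC + 1 -> PC
    return pc

mem = [0] * 100
mem[10] = 845                # hypothetical "branch to address 45" word
print(branch_on_condition(mem, 10, True))    # 45
print(branch_on_condition(mem, 10, False))   # 11
```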
Assume the following values are present just prior to execution of this segment:

Program Counter: 12
Value in Memory Location 12: 530 (LOAD 30)
Value in Memory Location 13: 376 (ADD 76)
Value in Memory Location 30: 777
Value in Memory Location 76: 210

At the end of the fetch step in the first instruction cycle, give the contents of the following:

First instruction (fetch):

PC → MAR    MAR =

MDR → IR    IR =



Assume the following values are present just prior to the execution of this segment:

Program Counter: 45
Value in Memory Location 44: 398 (ADD 98)
Value in Memory Location 45: 599 (LOAD 99)
Value in Memory Location 46: 123
Value in Memory Location 98: 777
Value in Memory Location 99: 210

At the end of each step in the instruction cycle, give the contents of the following:

PC → MAR              MAR =
MDR → IR              IR =
IR[address] → MAR     MAR =
MDR → A               A =
PC + 1 → PC           PC =
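One way to check your answers for the segment above is to replay the five LOAD micro-steps with the given values. This is a sketch using the register model from this deck (PC, MAR, MDR, IR, A as plain variables):

```python
# Trace the LOAD fetch/execute cycle for the exercise values:
# PC = 45, memory[45] = 599 (LOAD 99), memory[99] = 210.
memory = {44: 398, 45: 599, 46: 123, 98: 777, 99: 210}

pc = 45
mar = pc                  # 1. PC -> MAR
mdr = memory[mar]         # memory read
ir = mdr                  # 2. MDR -> IR
mar = ir % 100            # 3. IR[address] -> MAR
mdr = memory[mar]         # memory read
a = mdr                   # 4. MDR -> A
pc = pc + 1               # 5. PC + 1 -> PC

print(mar, ir, a, pc)     # 99 599 210 46
```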



PART 2: CPU ARCHITECTURES
CPU Architectures
CPU architecture = ISA (Instruction Set Architecture)

• A CPU architecture is defined by:
1. Number and types of registers
2. Methods of addressing memory
3. Basic design and layout of the instruction set
• Many CPU architectures have appeared over the years, but only a few remain, as a result of the evolution and expansion of those architectures to include new features with improved design, technology, and implementation



CPU Architectures (cont.)

• Most CPU architectures today are loosely categorized into:
1. CISC – Complex Instruction Set Computers
2. RISC – Reduced Instruction Set Computers



CISC:
• Few general-purpose registers
• Large number of specialized instructions
• Wide variety of addressing techniques
• Instruction words of varying sizes

RISC:
• Many registers
• Limited and simple instruction set
• Register-oriented – limited memory access
• Fixed-length, fixed-format instruction words
• Limited addressing modes



RISC Architecture (cont.)
• Attempts to produce more CPU power by eliminating two major bottlenecks to instruction execution speed:
1. Reducing the number of data memory accesses by using registers more effectively
  • the time to locate and access data in memory is much longer
2. Simplifying the instruction set by eliminating rarely used instructions



CPU Architectures

• However, in modern times the dividing line between CISC and RISC has become increasingly blurred, as many features of each have migrated across it


PART 3: CPU ENHANCEMENT
CPU Performance
The purpose of a computer is to execute programs. The ability of the CPU to execute instructions quickly is an important contributor to performance.
Methods to increase the performance of the CPU:
• Separate fetch and execute units
• Pipelining
• Superscalar processing
• Multiprocessing
Separate Fetch-Execute Units
• Fetch unit
  • Instruction fetch unit
  • Instruction decode unit
    • Determines the opcode
    • Identifies the type of instruction and the operands
  • Several instructions are fetched in parallel and held in a buffer until decoded and executed
• Execute unit
  • Receives instructions from the decode unit
  • The appropriate execution unit services the instruction



Separate Fetch-Execute Units



Instruction Pipelining
• Overlap instructions to speed up processing
  • As each instruction completes a step, the following instruction moves into the stage just vacated
  • Result → a large overall increase in the average number of instructions performed in a given time
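The gain from overlapping can be quantified with a standard back-of-the-envelope formula (textbook-standard, not taken from these slides): with a k-stage pipeline and n instructions, roughly k + (n - 1) cycles are needed instead of k * n, assuming one stage per clock and no stalls.

```python
# Cycle counts for n instructions on a k-stage pipeline,
# assuming one stage per clock and no hazards or stalls.
def unpipelined_cycles(n, k):
    return n * k                 # each instruction runs start to finish alone

def pipelined_cycles(n, k):
    return k + (n - 1)           # fill the pipe once, then one finishes per cycle

n, k = 100, 4
print(unpipelined_cycles(n, k))  # 400
print(pipelined_cycles(n, k))    # 103
print(unpipelined_cycles(n, k) / pipelined_cycles(n, k))  # roughly 3.9x speed-up
```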



Instruction Pipelining



Pipelining Hazards

Structural Hazard:
• An attempt to use the same resource in two different ways at the same time
• E.g., using the same register for a multiplication and a division operation at the same time

Data Hazard:
• An attempt to use data before it is ready
• E.g., the following instruction depends on the result of a prior instruction still in the pipeline

Control Hazard:
• An attempt to make a decision before the condition is evaluated
• E.g., branch instructions
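A data (read-after-write) hazard can be spotted mechanically: an instruction reads a register that an earlier instruction, still in the pipeline, has not yet written back. The toy checker below is an illustration only; the tuple-based instruction format and the `depth` window are assumptions, not from the slides.

```python
# Each instruction is (dest_register, source_registers).
# A RAW hazard exists when an instruction reads a register written
# by one of the previous `depth` instructions still in the pipeline.
def raw_hazards(program, depth=2):
    hazards = []
    for i, (_, sources) in enumerate(program):
        for j in range(max(0, i - depth), i):
            dest = program[j][0]
            if dest in sources:
                hazards.append((j, i, dest))
    return hazards

prog = [
    ("r1", ("r2", "r3")),   # r1 = r2 + r3
    ("r4", ("r1", "r5")),   # r4 = r1 + r5  <- reads r1 before it is ready
    ("r6", ("r7", "r8")),   # independent, no hazard
]
print(raw_hazards(prog))    # [(0, 1, 'r1')]
```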
Superscalar Processing
• Process more than one instruction per clock cycle
• Instructions are processed in parallel, at an average rate of more than one instruction per clock cycle, through multiple execution units



Scalar vs. Superscalar Processing



Multiprocessing
• Increase the power of a computer system by adding more processors
• Two or more CPUs may be interconnected to form a multiprocessing system because:
1. adding additional CPUs is cheap, within limits
2. programs can be divided and the parts executed simultaneously on multiple CPUs
• Two different approaches:


Multiprocessing (cont.)
The term also refers to the ability of a system to support more than one processor, or the ability to allocate tasks between them.
Reasons for using multiprocessing:
◦ Increases the processing power of a system.
◦ Enables parallel processing – programs can be divided into independent pieces and the different parts executed simultaneously on multiple processors.
Multiprocessing (cont.)
Since the execution speed of a CPU is directly related to its clock speed, equivalent processing power can be achieved at much lower clock speeds, reducing power consumption, heat, and stress within the various computer components.

Adding more CPUs is relatively inexpensive.

If one CPU encounters a problem, the other CPUs can continue instruction execution, increasing overall throughput.
Multiprocessing (cont.)
Two different approaches to multiprocessing:
• Asymmetric multiprocessing (master-slave)
• Symmetric multiprocessing (peers)
Multiprocessing systems may also be organized as tightly-coupled or loosely-coupled systems.
1) Tightly-coupled systems
• Connected CPUs share some or all of the system's memory and some I/O devices
• Can divide program execution between CPUs
• Two types:
1. Master-slave multiprocessing
2. Symmetrical multiprocessing (SMP)



1) Tightly-coupled systems (e.g.)



1) Tightly-coupled systems (cont.)
i) Master-slave multiprocessing
• One CPU (the master) manages the system and controls all resources and scheduling
• Only the master can execute the OS
• Simple, BUT low reliability and poor use of resources
ii) Symmetrical multiprocessing (SMP)
• Each CPU has identical access to the OS and to all system resources
• Each performs its own dispatch scheduling
• Difficult to implement, BUT high reliability and a well-balanced workload



Tightly-Coupled System
ASYMMETRIC / MASTER-SLAVE

Master CPU:
• Manages the system
• Controls all resources and scheduling
• Assigns tasks to slave CPUs
Tightly-Coupled System
ASYMMETRIC / MASTER-SLAVE

ADVANTAGES:
• Resources can be dedicated to critical tasks, resulting in more deterministic performance
• Cores spend less time handshaking with each other

DISADVANTAGES:
• Reliability issues – if the master CPU fails, the entire system fails
Tightly-Coupled System
SYMMETRICAL
• Multiple CPUs in a networking device share the same board, memory, I/O, and operating system
• Each CPU has equal access to resources
• Each CPU determines what to run using a standard algorithm
• A single operating system (OS) runs on all processors, which access a single image of the OS in memory
• Any processor can run any type of task
• Processors communicate with each other through shared memory
Tightly-Coupled System
SYMMETRICAL

ADVANTAGES:
• Provides better load balancing and fault tolerance
• Can save money by sharing power supplies, housings, and peripherals
• Can execute programs more quickly and with increased reliability

DISADVANTAGES:
• Resource conflicts – memory, I/O, etc.
• Complex implementation – keeping everything synchronized
• Additional CPU cycles are required to manage the cooperation, so per-CPU efficiency goes down
Symmetric VS Asymmetric
2) Loosely-coupled systems
• Each computer is complete in itself, with its own CPU, memory, and I/O facilities
• Data communications provide the link between the different computers (communication channels)
• Examples: point-to-point, multipoint, clusters, client-server



Example: Point-to-Point



Example: Client-Server Network



PART 4: MEMORY ENHANCEMENT
Memory Enhancements
• Within the instruction cycle, the slowest steps are those that require memory access → memory access needs improvement
• Three different approaches to enhance memory performance:
1. Wide path memory access
2. Memory interleaving
3. Cache memory



Memory Enhancements
Methods to improve memory access:

Wide Path Memory Access:
• Widen the data path to read/write several bytes or words between the CPU and memory on each access
• Retrieve multiple bytes instead of 1 byte at a time

Memory Interleaving:
• Divide memory into parts
• Partition memory into subsections, each with its own address register and data register

Cache Memory:
• Position a small amount of high-speed memory between the CPU and main memory
• Organized into blocks
• Each block provides a small amount of storage
1) Wide path memory access
• Retrieve multiple bytes instead of 1 byte at a time
• Widen the data path so that several bytes or words can be read/written between the CPU and memory with each access
• Achieved by widening the bus data path and using a larger MDR
• A simple technique, widely used
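The gain from a wider path is direct: the number of memory accesses needed to move B bytes drops from B to ceil(B / W) for a W-byte-wide path. An illustrative calculation (the byte counts are made-up examples, not from the slides):

```python
from math import ceil

# Memory accesses needed to transfer `nbytes` over a path `width` bytes wide.
def accesses(nbytes, width):
    return ceil(nbytes / width)

print(accesses(64, 1))   # 64 accesses with a 1-byte path
print(accesses(64, 8))   # 8 accesses with an 8-byte (64-bit) path
```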



2) Memory Interleaving
• Dividing memory into parts → increases the effective rate of memory access
• Makes it possible to access more than one location at a time
• Each part has its own MAR and MDR, and each part is independently accessible
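With low-order interleaving across n banks, consecutive addresses land in different banks, so sequential fetches can overlap. A sketch of the address-to-bank mapping (the n-way low-order scheme is an assumption; the slides do not specify a particular mapping):

```python
# Low-order interleaving: address -> (bank, offset within that bank).
def interleave(address, n_banks=4):
    return address % n_banks, address // n_banks

# Four consecutive addresses hit four different banks,
# so their accesses can proceed in parallel.
for addr in range(4):
    print(addr, "-> bank", interleave(addr)[0])
```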



2) Memory Interleaving (cont.)



3) Cache Memory
• Position a small amount of high-speed memory between the CPU and main storage
• Organized in blocks of 8 or 16 bytes
• Tags: record each block's location in main memory
• Cache controller: hardware that checks the tags
• Cache line: the unit of transfer between storage and cache memory
• Hit ratio: the ratio of hits to total requests
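The hit ratio feeds directly into the effective (average) access time: t_eff = h * t_cache + (1 - h) * t_memory. A small worked sketch; the formula is the standard one, and the timing numbers are made up for illustration:

```python
# Effective access time given a hit ratio and the two access times (in ns).
def effective_access_time(hit_ratio, t_cache, t_memory):
    return hit_ratio * t_cache + (1 - hit_ratio) * t_memory

# Hypothetical numbers: 1 ns cache, 50 ns main memory, 95% hit ratio.
print(effective_access_time(0.95, 1, 50))   # 3.45 ns on average
```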



3) Cache Memory (cont.)
• Synchronizing cache and memory:
1. Write-through – writes data back to main memory immediately upon a change in the cache
2. Write-back – writes to memory are made only when a cache line is actually replaced (faster, but more care is needed)



3) Cache Memory (cont.)
Locality of reference:
• Cache memory works because of the locality of reference principle:
  • Most memory references are confined to a small region of memory at any given time
  • A well-written program spends its time in a small loop, procedure, or function
  • Data is likely to be in an array
  • Variables are stored together
3) Cache Memory (cont.)

