Processor and Memory Organization
Chapter 3

1
Chapter 3 Objectives

• Learn the components common to every modern
computer system.
• Be able to explain how each component contributes to
program execution.
• Understand the factors involved in instruction set
architecture design.
• Gain familiarity with memory addressing modes.
• Understand the concepts of instruction-level pipelining
and its effect on execution performance.
• Understand memory organization.
2
Computer Organization

3
CPU Basics
• What is a Central Processing Unit (CPU)?
– It is the brain of the machine.
• The CPU executes programs by:
– Fetching the next instruction from memory
– Decoding fetched instructions
– Executing (performing) the indicated sequence of operations
• It consists of:
– Control Unit (CU)
– Arithmetic Logic Unit (ALU)
– Registers (high-speed memory)
4
CPU Basics

• Registers hold data, addresses, or control
information that can be readily accessed by the CPU.
– Some registers are “special purpose”.
• They may contain only data, only addresses, or only control information.
• Examples: Program Counter (PC), Instruction Register (IR), etc.
– General purpose registers may hold data, addresses, and control
information at various times.
• The arithmetic logic unit (ALU) carries out logical and
arithmetic operations as directed by the control unit.
• The control unit (CU) determines which actions to carry
out according to the values in a program counter register
and a status register.
5
The Bus

• The CPU shares data with other system components by
way of a data bus.
– A bus is a set of wires that simultaneously convey a single bit
along each line.
• Two types of buses are commonly found in computer
systems:
– Point-to-point buses: connect two specific components
– Multipoint buses: connect a number of devices (shared)

This is a point-to-point
bus configuration:
6
The Bus

• A multipoint bus is shown below.
• Because a multipoint bus is a shared resource,
access to it is controlled through protocols, which
are built into the hardware.

7
The Bus

• The common protocol a multipoint bus uses to synchronize access is
called master-slave.
– Master: a device that is allowed to initiate transfers of information
– Slave: a module that is activated by a master and responds to requests to read and write data
• In a master-slave configuration, concurrent bus master requests must be
arbitrated.
• Four categories of bus arbitration are:
– Daisy chain: Permissions are passed from the highest-priority
device to the lowest.
– Centralized parallel: Each device is directly connected
to an arbitration circuit.
– Distributed using self-selection: Devices decide among
themselves which gets the bus.
– Distributed using collision detection: Any device can try to
use the bus. If its data collides with the data of another device,
it tries again.
8
The Bus
• Typical buses consist of:
– Data lines: convey bits from one device to another
– Control lines: determine the direction of data flow
– Address lines: determine the location of the source or
destination of the data

9
Instruction Set Architecture

11
Instruction Set
• Refers to the operations the hardware recognizes and
performs.
• Instruction sets are differentiated by the following:
• Number of bits per instruction.
• Stack-based or register-based.
• Number of explicit operands per instruction.
• Operand location.
• Types of operations.
• Type and size of operands.

12
Typical Instruction Format
• Representation of an instruction:
• Binary format for hardware (0s and 1s)
• For software – three parts:
• Opcode, operands, results
• A typical instruction contains three parts:
• Opcode – the operation to be performed
• Operands – the value(s) to be used
• Results – where to place the result(s)
• Binary format:
Opcode | Operand 1 | Operand 2 | …

13
Programming with Registers
• Registers are used to hold an operand or the result of an
instruction
• For a particular task, a series of instructions might be required to
move values between memory and the registers
• E.g., to add two integers X and Y and place the result in memory
location M (suppose registers R3 and R6 are available):

mov R3 X    ; load X into R3
mov R6 Y    ; load Y into R6
add R6 R3   ; R6 = R6 + R3
mov M R6    ; store the result in M
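As a sketch of what this sequence does, the four instructions can be mirrored in Python, using dictionaries as hypothetical stand-ins for the register file and memory (the initial values of X and Y are assumptions for illustration):

```python
# Minimal sketch: mirror the mov/add sequence above in Python.
# regs and mem are hypothetical stand-ins for the register file and memory;
# the initial values of X and Y are assumed for illustration.
mem = {"X": 7, "Y": 5, "M": 0}
regs = {}

regs["R3"] = mem["X"]                 # mov R3 X
regs["R6"] = mem["Y"]                 # mov R6 Y
regs["R6"] = regs["R6"] + regs["R3"]  # add R6 R3
mem["M"] = regs["R6"]                 # mov M R6

print(mem["M"])  # 12
```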
14
Instruction Set Architecture (ISA)

In designing an instruction set, consideration is
given to:
• Endianness.
• Instruction length.
– Whether short, long, or variable.
• Number of operands.
• Number of addressable registers.
• Memory organization.
– Whether byte- or word-addressable.
• Addressing modes.
– Choose any or all: direct, indirect, or indexed.
15
Byte ordering, or endianness

• Byte ordering, or endianness, is a major architectural
consideration.
• If we have a two-byte integer, the integer may be stored so
that the least significant byte is followed by the most
significant byte, or vice versa.
– Little endian machines store the least significant byte first (at the lower
address).
– Big endian machines store the most significant byte first (at the lower
address).
• As an example, suppose we have the hexadecimal number 0x12345678.
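As an illustrative sketch of the two byte orders, Python's struct module can pack the four-byte value 0x12345678 with explicit little- and big-endian formats:

```python
import struct

value = 0x12345678  # the four-byte example value from the slide

little = struct.pack("<I", value)  # little endian: least significant byte first
big = struct.pack(">I", value)     # big endian: most significant byte first

print(little.hex())  # 78563412
print(big.hex())     # 12345678
```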

16
How will the CPU store data?

• The next consideration for architecture design
concerns how the CPU will store data.
• We have three choices:
1. A stack architecture
2. An accumulator architecture
3. A general purpose register architecture
• In choosing one over the others, the tradeoff is
between the simplicity (and cost) of the hardware design
and execution speed and ease of use.

18
How will the CPU store data?

• In a stack architecture, instructions and operands
are implicitly taken from the stack.
– A stack cannot be accessed randomly.
• In an accumulator architecture, one operand of
a binary operation is implicitly in the
accumulator (a register).
– The other operand is in memory, creating lots of bus traffic.
• In a general purpose register (GPR) architecture,
registers can be used instead of memory.
– Faster than an accumulator architecture.
– Results in longer instructions.

19
How will the CPU store data?

• Most systems today are GPR systems.
• There are three types:
– Memory-memory, where two or three operands may be in
memory.
– Register-memory, where at least one operand must be in a
register.
– Load-store, where no operands may be in memory.
• The number of operands and the number of
available registers have a direct effect on instruction
length.

20
Instruction Formats

• Stack machines use one- and zero-operand
instructions.
• LOAD and STORE instructions require a single
memory address operand.
– e.g., LOAD X
• Other instructions use operands from the stack
implicitly.
– Binary instructions (e.g., ADD, MULT) use the top two
items on the stack.
• PUSH and POP operations involve only the stack’s
top element.

21
Instruction Formats

• Stack architectures require us to think about
arithmetic expressions a little differently.
• We are accustomed to writing expressions using
infix notation, such as: Z = X + Y.
• Stack arithmetic requires that we use postfix
notation: Z = XY+.
– This is also called reverse Polish notation.

22
Instruction Formats

• The principal advantage of postfix notation is
that parentheses are not used.
• For example, the infix expression,
Z = (X × Y) + (W × U),
becomes:
Z = X Y × W U × +
in postfix notation.

23
Instruction Formats

• In a stack ISA, the postfix expression,
Z = X Y × W U × +
might look like this:

PUSH X
PUSH Y
MULT
PUSH W
PUSH U
MULT
ADD
POP Z

Note: The result of a binary operation is implicitly stored
on the top of the stack!
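As a sketch of how a stack machine evaluates this program, a Python list can play the role of the stack; the operand values below are assumptions chosen only for illustration:

```python
# Sketch: evaluate Z = X Y * W U * + with a Python list as the stack.
# The operand values are assumptions chosen only for illustration.
X, Y, W, U = 2, 3, 4, 5

stack = []
stack.append(X)                          # PUSH X
stack.append(Y)                          # PUSH Y
stack.append(stack.pop() * stack.pop())  # MULT
stack.append(W)                          # PUSH W
stack.append(U)                          # PUSH U
stack.append(stack.pop() * stack.pop())  # MULT
stack.append(stack.pop() + stack.pop())  # ADD
Z = stack.pop()                          # store the top of the stack into Z

print(Z)  # 26, i.e. 2*3 + 4*5
```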

24
Instruction Formats

• In a two-address ISA (e.g., Intel, Motorola), the
infix expression,
Z = X × Y + W × U
might look like this:

LOAD R1,X
MULT R1,Y
LOAD R2,W
MULT R2,U
ADD R1,R2
STORE Z,R1

25
Instruction Formats

• With a three-address ISA (e.g., mainframes),
the infix expression,
Z = X × Y + W × U
might look like this:

MULT R1,X,Y
MULT R2,W,U
ADD Z,R1,R2

26
Instruction types

Instructions fall into several broad categories
that you should be familiar with:
• Data movement.
• Arithmetic.
• Boolean.
• Bit manipulation.
• I/O.
• Control transfer.
• Special purpose.

27
Addressing

• Addressing modes specify where an operand is
located.
• They can specify a constant, a register, or a
memory location.
• The actual location of an operand is its effective
address.

28
Addressing

• Immediate addressing is where the data is part
of the instruction.
• Direct addressing is where the address of the
data is given in the instruction.
• Register addressing is where the data is located
in a register.
• Indirect addressing gives the address of the
address of the data in the instruction.
• Register indirect addressing uses a register to
store the address of the data.

29
Addressing

• Indexed addressing uses a register (implicitly or
explicitly) as an offset, which is added to the
address in the operand to determine the effective
address of the data.
• Based addressing is similar, except that a base
register is used instead of an index register.
• The difference between the two is that an index
register holds an offset relative to the address given
in the instruction, whereas a base register holds a base
address, and the address field represents a
displacement from this base.
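A small sketch of how several of these modes resolve an operand, using Python dictionaries as a hypothetical memory and register file (every address and value here is an assumption for illustration):

```python
# Sketch: resolve one operand field under several addressing modes.
# mem, regs, and all addresses/values are assumptions for illustration.
mem = {0x800: 0x900, 0x900: 0x1000, 0xA00: 0x42}
regs = {"R1": 0x200, "R2": 0x800}
operand = 0x800  # the address field of the instruction

immediate = operand                  # the operand itself is the data
direct = mem[operand]                # mem[0x800] -> 0x900
indirect = mem[mem[operand]]         # mem[mem[0x800]] -> 0x1000
register = regs["R2"]                # the data sits in the register
reg_indirect = mem[regs["R2"]]       # mem[0x800] -> 0x900
indexed = mem[operand + regs["R1"]]  # mem[0x800 + 0x200] -> 0x42
```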
30
Addressing

• For the instruction shown, what value is loaded into
the accumulator for each addressing mode?

31
Addressing

• These are the values loaded into the accumulator
for each addressing mode.

32
Instruction-Level Pipelining

• Some CPUs divide the fetch-decode-execute cycle
into smaller steps.
• These smaller steps can often be executed in parallel
to increase throughput.
– We use parallel hardware units.
• Such parallel execution is called instruction-level
pipelining (ILP).
• To enable high speed, we use parallel hardware units in
a processor, where each unit performs one step.
33
Instruction-Level Pipelining

• Suppose a fetch-decode-execute cycle were broken
into the following smaller steps:
1. Fetch instruction.
2. Decode opcode.
3. Calculate effective address of operands.
4. Fetch operands.
5. Execute instruction.
6. Store result.
• Suppose we have a six-stage pipeline. S1 fetches
the instruction, S2 decodes it, S3 determines the
address of the operands, S4 fetches them, S5
executes the instruction, and S6 stores the result.

34
Instruction-Level Pipelining

• For every clock cycle, one small step is carried out,
and the stages are overlapped.
S1. Fetch instruction.
S2. Decode opcode.
S3. Calculate effective address of operands.
S4. Fetch operands.
S5. Execute.
S6. Store result.
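Under the idealized assumption of one cycle per stage and no stalls, the benefit of this overlap can be sketched as a cycle count: k stages and n instructions need about k + (n − 1) cycles pipelined, versus n × k cycles unpipelined:

```python
# Sketch: ideal pipeline cycle counts (one cycle per stage, no stalls assumed).
def pipelined_cycles(n_instructions, n_stages):
    # The first instruction fills the pipeline (n_stages cycles);
    # after that, one instruction completes per cycle.
    return n_stages + (n_instructions - 1)

def unpipelined_cycles(n_instructions, n_stages):
    return n_instructions * n_stages

n, k = 100, 6  # 100 instructions through the six-stage pipeline above
print(pipelined_cycles(n, k))    # 105
print(unpipelined_cycles(n, k))  # 600
```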
35
How to Decide the Operation Set
• Choosing an operation set represents a
complex trade-off among:
• The cost of the hardware
• The convenience for a programmer
• Engineering considerations
– Chip size
– Power consumption
– Heat dissipation

36
Complex and Reduced Instruction Sets
(CISC and RISC)
• A CISC processor includes a large set (hundreds)
of instructions, many of which perform complex
computation.
– Complex instructions are slow
• A RISC processor includes a minimal set of instructions
(typically < 50)
– To achieve the highest possible speed
– Fixed-size instructions
– Designed to complete one instruction in each clock
cycle
37
Multi-Core Processor
• One integrated circuit that has two or more processors
(called cores)
– Dual-, quad-, hexa-core, etc.
• Implements multiprocessing in a single physical
package
• Uses message-passing or shared-memory inter-core
communication methods
• Several tens of cores may require a Network-on-
Chip (NoC)
– Applies networking theory and methods to on-chip
communication between cores
38
Assignment
• Form a group having a maximum of 6 students
• Research and write a report (maximum 5 pages) on the
features, strengths, and weaknesses of two
common architectures: the Intel architecture,
which is basically a CISC machine, and MIPS, which
is a RISC machine.
• Deadline
– Two weeks from today

39
Memory Organization

40
Access Time
• Registers: a few nanoseconds
• Cache: a small multiple of the register access time
• Main memory: a few tens of nanoseconds
• Big gap – disk access takes at least 10 milliseconds
• Tape access is measured in seconds if storage is
offline

41
Capacity
• Registers: ~128 bytes
• Cache: a few MB
• Main memory: tens to thousands of MB
• Magnetic disks: 100 GB to 1 TB+
• Tapes: usually offline, so limited only by budget

42
Cost per bit
• Cost per bit decreases as we move down the memory
hierarchy (registers → cache → main memory → disk → tape).

43
Memory Addresses
• Computer memory consists of cells, each with a
unique address
◦A cell is the smallest addressable unit
◦Each cell usually consists of 8 bits (1 Byte)
• Bytes are grouped into words
• Many instructions operate on entire words
– 32-bit computers have 32-bit words, made from 4 x 8-bit bytes
– 64-bit computers have 64-bit words made from 8 x 8-bit bytes
– 32-bit machines will have 32-bit registers and instructions for 32-bit
words
– 64-bit machines will have 64-bit registers and instructions for 64-bit
words
44
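On a byte-addressable machine, the byte address of word i is simply i × (word size in bytes), assuming aligned words. A minimal sketch:

```python
# Sketch: byte address of word i on a byte-addressable machine,
# assuming words are aligned to multiples of the word size.
def word_to_byte_address(word_index, word_size_bytes):
    return word_index * word_size_bytes

print(word_to_byte_address(3, 4))  # 12 -- 32-bit (4-byte) words
print(word_to_byte_address(3, 8))  # 24 -- 64-bit (8-byte) words
```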
Problem: CPU Fast, Memory Slow

• After a memory request, the CPU will not get
the word for several cycles.
• Memory speed is always playing catch-up to
processor speed.
• Memory capacity has increased rather than speed.

45
The Root of the Problem:
• Economics
– Fast memory is possible, but to run at full speed, it
needs to be located on the same chip as the CPU
◦Very expensive
◦Limits the size of the memory
• Do we choose:
– A small amount of fast memory?
– A large amount of slow memory?

46
Other problems
• The problem is increased by designers concentrating
on making CPUs faster and memories bigger.
• Memory accessed over a bus is slow.
• There are limits on how big CPUs can be made.
• There are limits on on-chip memory.

47
Cache Memory
• Combine a small amount of fast memory (the cache) with a large
amount of slow memory.
• When a word is referenced, put it and its neighbours into the
cache.
• Programs do not access memory randomly
(the Locality Principle):
– Temporal locality: recently accessed items are likely to be used again.
• Most execution time is spent in loops, where the same instructions are
executed over and over.
– Spatial locality: the next access is likely to be near the last
one.
• Example: for a stored program in memory, instructions are likely to be sequential and
in consecutive memory locations.
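A small sketch of why this pays off: simulating a hypothetical direct-mapped cache (4 lines of 4 words; all parameters are assumptions for illustration), sequential accesses hit far more often than large-stride ones:

```python
# Sketch: count hits in a tiny direct-mapped cache for two access patterns.
# Cache geometry and the access patterns are assumptions for illustration.
BLOCK_WORDS = 4  # words fetched together in one block
NUM_LINES = 4    # cache lines

def count_hits(addresses):
    lines = [None] * NUM_LINES       # which block each line currently holds
    hits = 0
    for addr in addresses:
        block = addr // BLOCK_WORDS  # the memory block containing addr
        line = block % NUM_LINES     # direct-mapped placement
        if lines[line] == block:
            hits += 1                # locality pays off: already cached
        else:
            lines[line] = block      # miss: fetch the block and its neighbours
    return hits

sequential = list(range(32))           # walk consecutive words (good locality)
strided = [i * 16 for i in range(32)]  # jump 16 words each time (poor locality)
print(count_hits(sequential))  # 24: three of every four accesses hit
print(count_hits(strided))     # 0: every access lands on a new block
```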
48
Cache Levels
• Sometimes systems are built with more than one
cache.
• The cache closest to the CPU is called level 1 (L1)
cache; the next is level 2 (L2), etc.
• Level 1 cache is faster than level 2, but
smaller in capacity.
• The CPU always looks for data in level 1
first. If it gets a cache miss, it looks in level 2; if
it also gets a miss there, it goes to main memory.
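The average cost of a reference across such a hierarchy can be sketched with an effective access time calculation; the hit rates and latencies below are illustrative assumptions, not measurements:

```python
# Sketch: effective (average) access time for a two-level cache hierarchy.
# Hit rates and latencies (in ns) are illustrative assumptions.
L1_HIT, L1_TIME = 0.90, 1.0
L2_HIT, L2_TIME = 0.80, 5.0
MEM_TIME = 50.0

# A miss in L1 falls through to L2; a miss in L2 falls through to main memory.
effective = (L1_HIT * L1_TIME
             + (1 - L1_HIT) * (L2_HIT * L2_TIME
                               + (1 - L2_HIT) * MEM_TIME))
print(round(effective, 2))  # 2.3 (ns)
```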
49
End of Chapter 3

50
