
MODULE 1

Computer Architecture: 
Computer architecture is a functional description of the requirements and design
implementation of the various parts of a computer. It deals with the functional behavior of
the computer system, and it comes before computer organization when designing a
computer.
Von Neumann Architecture
Modern computers are based on the stored-program concept introduced by John Von
Neumann.
Based on three key concepts:
• Data and instructions are stored in a single read–write memory.
• The contents of this memory are addressable by location, without regard to the type of data
contained there.
• Execution occurs in a sequential fashion (unless explicitly modified) from one instruction
to the next.

A computer consists of:
• CPU (central processing unit)
• Memory
• I/O
• Interconnection structures among these components (buses)
Program Concept
Hardwired systems are inflexible. General purpose hardware can do different tasks, given correct control
signals. Instead of re-wiring, supply a new set of control signals. Program is a sequence of steps. For
each step, an arithmetic or logical operation is done. For each operation, a different set of control
signals is needed.

Function of Control Unit

For each operation a unique code is provided e.g. ADD, MOVE. A hardware segment accepts the code
and issues the control signals.

Components

The Control Unit and the Arithmetic and Logic Unit constitute the Central Processing Unit. Data and
instructions need to get into the system and results out - Input/output. Temporary storage of code and
results is needed - Main memory.
Instruction Cycle
The instruction cycle has two steps – Fetch and Execute.

Fetch Cycle

• Program Counter (PC) holds the address of the next instruction to be fetched.

• Processor fetches the instruction from the memory location pointed to by the PC.

• The PC is incremented unless told otherwise.

• The instruction is loaded into the Instruction Register (IR).

• Processor interprets the instruction and performs the required actions.

Execute Cycle

It performs four types of operations:

• Processor-memory - data transfer between CPU and main memory

• Processor I/O - Data transfer between CPU and I/O module

• Data processing - Some arithmetic or logical operation on data

• Control - Alteration of the sequence of operations, e.g. jump

• Combination of above
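The fetch-execute loop described above can be sketched in Python. The accumulator-style machine and its opcodes (LOAD, ADD, STORE, HALT) are invented for illustration, not taken from the text:

```python
# Minimal sketch of the instruction cycle for a hypothetical
# accumulator machine. Memory holds both instructions and data
# (the stored-program concept).

LOAD, ADD, STORE, HALT = 1, 2, 3, 0   # invented opcodes

def run(memory):
    pc, acc = 0, 0                    # Program Counter, Accumulator
    while True:
        ir = memory[pc]               # fetch into Instruction Register
        pc += 1                       # PC incremented unless told otherwise
        opcode, operand = ir          # interpret the instruction
        if opcode == LOAD:            # processor-memory transfer
            acc = memory[operand]
        elif opcode == ADD:           # data processing
            acc += memory[operand]
        elif opcode == STORE:         # processor-memory transfer
            memory[operand] = acc
        elif opcode == HALT:          # control: stop the cycle
            return memory

program = {
    0: (LOAD, 10), 1: (ADD, 11), 2: (STORE, 12), 3: (HALT, 0),
    10: 5, 11: 7, 12: 0,              # data words
}
print(run(program)[12])               # 5 + 7 = 12
```

Each loop iteration is one fetch cycle followed by one execute cycle, matching the two-step description above.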

Instruction Cycle State Diagram


Interrupts
A mechanism by which other modules (e.g. I/O) may interrupt the normal sequence of processing.
There are four types:

• Program Interrupt - e.g. overflow, division by zero

• Timer interrupt - Generated by internal processor timer

• I/O Interrupt – which comes from I/O controller to signal the completion of operation or error
condition

• Hardware failure interrupt - e.g. memory parity error, power failure

Program Flow Control

Interrupt Cycle

• Added to the instruction cycle. After each instruction the processor checks for interrupt which is
indicated by an interrupt signal

• If no interrupt, fetch the next instruction

• If interrupt pending:

— Suspend execution of current program

— Save the context

— Set PC to start address of interrupt handler routine

— Process the interrupt


— Restore context and continue interrupted program

Instruction Cycle (with Interrupts) - State Diagram

Multiple Interrupts

They can be handled in two ways.

• Disable interrupts

— Processor will ignore further interrupts while processing one interrupt

— Interrupts remain pending and are checked after first interrupt has been processed

— Interrupts handled in sequence as they occur

• Define priorities

— Low priority interrupts can be interrupted by higher priority interrupts

— When higher priority interrupt has been processed, processor returns to previous
interrupt
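The priority scheme can be sketched as a small simulation. The tick-based timing, the fixed two-tick handler length, and the lower-number-is-higher-priority convention are all assumptions for illustration:

```python
# Hedged sketch of nested interrupt handling with priorities.
# A pending higher-priority interrupt preempts the running handler;
# the preempted handler resumes afterwards, mirroring the nested
# save-context / restore-context sequence described above.

def simulate(arrivals, ticks=10):
    """arrivals maps tick -> (priority, name); each handler takes 2 ticks."""
    stack, pending, log = [], [], []    # stack models nested saved contexts
    for t in range(ticks):
        if t in arrivals:
            pending.append(arrivals[t])
        if pending:
            prio, name = min(pending)   # highest-priority pending interrupt
            # start it only if it outranks the handler currently running
            if not stack or prio < stack[-1][0]:
                pending.remove((prio, name))
                stack.append([prio, name, 2])
                log.append(f"t={t} enter {name}")
        if stack:                       # run the innermost (current) handler
            stack[-1][2] -= 1
            if stack[-1][2] == 0:
                log.append(f"t={t} exit {stack[-1][1]}")
                stack.pop()             # restore the preempted handler's context
    return log

# a timer interrupt (priority 1) arrives while the I/O handler is running
print(simulate({0: (2, "io"), 1: (1, "timer")}))
# ['t=0 enter io', 't=1 enter timer', 't=2 exit timer', 't=3 exit io']
```

The log shows the timer handler nesting inside the I/O handler and the I/O handler resuming once the timer has been serviced.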
Interconnecting Structures
All the functional units must be connected. The collection of paths interconnecting the basic
units/modules is called the interconnection structure. Different types of connections are needed
for different types of units, such as memory, input/output, and the CPU.

Computer Modules

Memory Module

• Typically, a memory module will consist of N words of equal length. Each word is assigned a
unique numerical address (0, 1, . . . , N – 1).

• Receives and sends data

• Receives addresses

• Receives control signals like Read, Write and Timing


Input/Output Module

Similar to memory from computer’s viewpoint

• Output - Receive data from computer and send data to peripheral

• Input - Receive data from peripheral and send data to computer

• Receive control signals from computer

• Send control signals to peripherals e.g. spin disk

• Receive addresses from computer e.g. port number to identify peripheral

• Send interrupt signals to the processor

CPU Module

• Reads instruction and data

• Writes out data (after processing)

• Sends control signals to other units

• Receives & acts on interrupts

Buses
There are a number of possible interconnection systems. Single and multiple BUS structures are most
common. e.g. Control/Address/Data bus (PC), Unibus (DEC-PDP).

A bus is a communication pathway connecting two or more devices. It is usually a broadcast medium.
A bus is often grouped into a number of channels: e.g. a 32-bit data bus is 32 separate single-bit channels.

A bus that connects major computer components is called a system bus. System bus has 3 functional
groups : data, address and control lines. Power distribution lines supply power.

Data Bus

It carries data and instructions among system modules. Width is a key determinant of performance
(32, 64, or 128 bits). Each line can carry only 1 bit at a time, so the number of lines determines
how many bits can be transferred at a time.

Address bus

It is used to identify the source or destination of data, e.g. the CPU needs to read an instruction
(data) from a given location in memory. Bus width determines the maximum memory capacity of the
system, e.g. the 8080 has a 16-bit address bus, giving a 64K address space.
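The width-to-capacity relationship can be checked directly (the 64K figure for the 8080 follows from it):

```python
# An A-bit address bus can select 2**A distinct locations,
# so bus width fixes the maximum addressable memory.

def address_space(bus_width_bits):
    return 2 ** bus_width_bits

print(address_space(16))   # 65536 locations = 64K, as on the 8080
print(address_space(20))   # 1048576 = 1M, as on the 8086
```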

Control Bus
It is used to send and receive control and timing information such as memory read/write signals,
interrupt requests, and clock signals. Physically, buses are parallel lines on circuit boards,
ribbon cables, strip connectors on motherboards (like PCI), or simply a set of wires.

Bus Interconnection Scheme

Physical Realization of Bus Architecture

Single Bus Problems

Lots of devices on one bus leads to: Propagation delays. Long data paths mean that co-ordination of bus
use can adversely affect performance. If aggregate data transfer approaches bus capacity, it causes
bottleneck. Most systems use multiple buses to overcome these problems.

Multiple Bus
Traditional (ISA) (with cache)
An expansion bus interface buffers data transfers between the system bus and the I/O controllers on the expansion
bus. This arrangement allows the system to support a wide variety of I/O devices and at the same time insulate
memory-to-processor traffic from I/O traffic. Network connections include local area networks (LANs) such as a
10-Mbps Ethernet and connections to wide area networks (WANs) such as a
packet-switching network. SCSI (small computer system interface) is itself a type of bus used to support local disk
drives and other peripherals. A serial port could be used to support a printer or scanner. This traditional bus
architecture is reasonably efficient but begins to break down as higher and higher performance is seen in the I/O
devices.

High Performance Bus

There is a local bus that connects the processor to a cache controller, which is in turn connected to a
system bus that supports main memory. The cache controller is integrated into a bridge, or buffering device, that
connects to the high-speed bus. This bus supports connections to high-speed LANs, such as Fast Ethernet at 100
Mbps, video and graphics workstation controllers, as well as interface controllers to local peripheral buses. The
latter is a high-speed bus arrangement specifically designed to support high-capacity I/O devices. Lower-speed
devices are still supported off an expansion bus, with an interface buffering traffic between the expansion bus and
the high-speed bus. The advantage of this arrangement is that the high-speed bus brings high demand devices into
closer integration with the processor and at the same time is independent of the processor.

Elements of Bus Design


Bus Types
• Dedicated
— Separate data & address lines
— High throughput
— Disadv - increased size and cost
• Multiplexed
— Shared lines
— Address valid or data valid control line
— Advantage - fewer lines
— Disadvantages
– More complex control
– Reduction in performance

Bus Arbitration
More than one module controlling the bus. E.g. CPU and DMA controller. Only one module may control
bus at one time. Arbitration may be centralised or distributed.
• Centralised
— A single hardware device controlling bus access, called a bus controller or arbiter
— May be part of CPU or separate
— bus controller or arbiter is responsible for allocating time on bus
• Distributed
— Each module may claim the bus
— Control logic on all modules
— Each module contains access control logic and act together to share the bus
Timing
• It is used for co-ordination of events on bus
• Buses use either synchronous timing or asynchronous timing
• Synchronous
— Control Bus includes clock line in which a clock transmits alternating 1s and 0s
— A single 1-0 is a bus cycle and each event takes a single cycle
— All devices can read clock line and usually sync on leading edge
Synchronous Timing Diagram

Asynchronous Timing – Read Diagram

Asynchronous Timing – Write Diagram

Cache Memory
Characteristics
1. Location
CPU memory - Registers
Internal memory - Cache
External memory – Disk, Tapes
2. Capacity
Internal memory capacity is expressed in words or bytes, the natural unit of organisation.
Word sizes can be 8, 16, or 32 bits. External memory capacity is expressed in bytes.
3. Unit of Transfer
Internal - Usually governed by the width of data bus
External - Usually a block which is much larger than a word

The addressable unit is the smallest location which can be uniquely addressed. If the number
of bits in the address is A, then the number of addressable units is N = 2^A.
4. Access Methods
Sequential
o Start at the beginning & read through in order
o Memory is organized into units of data called records
o Access time depends on location of data
o e.g. tape

Direct
o Individual blocks have unique address
o Access is by jumping to vicinity plus sequential search
o Access time depends on location and previous location
o e.g. disk
Random
o Individual addresses identify locations exactly
o Access time is independent of location or previous access
o e.g. RAM
Associative
o Data is located by a comparison with contents of a portion of the store
o Access time is independent of location or previous access
o e.g. cache
5. Performance
Access time is the time between presenting the address and getting valid data.
Memory cycle time is the time required for the memory to “recover” before the next
access; cycle time is access time + recovery time. Transfer rate is the rate at which
data can be moved.
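These relations can be checked with example timings (the 60 ns and 40 ns figures below are illustrative, not from the text):

```python
# Example figures for one memory access (assumed for illustration):
access_ns, recovery_ns = 60, 40

cycle_ns = access_ns + recovery_ns     # cycle time = access time + recovery time
words_per_second = 1e9 / cycle_ns      # transfer rate, at one word per cycle

print(cycle_ns)                        # 100 ns
print(words_per_second)                # 10000000.0 words/s
```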
6. Physical types
Semiconductor memory – RAM, EPROM
Magnetic surface memory – Disk, Tapes
Optical memory – CD-ROM, DVD-ROM
Magneto optical memory
7. Physical characteristics
Volatile memory – Information is lost when power is switched off
Non volatile memory – Information once recorded remains until deliberately changed. No
electrical power is needed to retain information.
8. Organisation - Physical arrangement of bits into words. e.g. interleaved
Memory Hierarchy
It is possible to build a computer which uses only static RAM. This would be very fast and need no cache.
But this would cost a very large amount.

• Registers

• L1 Cache

• L2 Cache

• Main memory

• Disk cache

• Disk

• Optical

• Tape

Locality of Reference - During the course of the execution of a program, memory references tend to
cluster e.g. loops.

Cache - Small amount of fast memory which sits between normal main memory and CPU. It may be
located on CPU chip or module.
Cache/Main Memory Structure

Cache operation

When the CPU requests contents of memory location, first it checks cache for this data. If the data is
present, get the data from the cache in a fast manner. If the data is not present, read required block
from main memory to cache. Then deliver from cache to CPU. Cache includes tags to identify which
block of main memory is in each cache slot.

Cache Read Operation – Flowchart


Typical Cache Organization

Cache Design
1. Cache Addressing
A logical (virtual) cache sits between the processor and the MMU and stores data
using virtual addresses; a physical cache sits between the MMU and main memory and
stores data using main-memory physical addresses. With a logical cache the processor
accesses the cache directly, not through the MMU, so cache access is faster because it
occurs before MMU address translation. However, because different applications use
the same virtual address space, the cache must be flushed on each context switch.
2. Cache Size
Cost - more cache is expensive. Speed - more cache is faster (up to a point), but
checking the cache for data takes time if it is too big. A balance between size and
cost is needed.
3. Mapping Function
There are fewer cache lines than main memory blocks. An algorithm for mapping main
memory blocks into cache lines determines how the cache is organized.
Three types of mapping
i. Direct
ii. Associative
iii. Set associative
Direct Mapping

Each block of main memory maps to only one cache line, i.e., if a block is in the
cache, it must be in one specific place. The address is in two parts: the least
significant w bits identify a unique word and the most significant s bits specify
one memory block. The MSBs are split into a cache line field r and a tag of s – r
bits (most significant).
The mapping is expressed as i = j modulo m
where i = cache line number, j = main memory block number, m = no of lines in the
cache

Direct Mapping Cache Organization


Address length = (s + w) bits
Number of addressable units = 2^(s+w) words or bytes
Block size = line size = 2^w words or bytes
Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
Number of lines in cache = m = 2^r
Size of the cache = 2^(r+w) words or bytes
Size of tag = (s – r) bits
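The tag/line/word split can be sketched in Python; the field widths used here (s = 14, r = 8, w = 2) are example values, not from the text:

```python
# Split a direct-mapped cache address into tag / line / word fields.

def split_direct(addr, s, r, w):
    word = addr & ((1 << w) - 1)            # least significant w bits
    line = (addr >> w) & ((1 << r) - 1)     # next r bits select the cache line
    tag  = addr >> (w + r)                  # remaining s - r bits
    return tag, line, word

s, r, w = 14, 8, 2                          # assumed example geometry
addr = 0b110101_10011010_11                 # tag | line | word fields
print(split_direct(addr, s, r, w))          # (53, 154, 3)

# the mapping i = j modulo m follows directly:
# block j lands in line j % 2**r
j = addr >> w                               # main memory block number
assert split_direct(addr, s, r, w)[1] == j % (1 << r)
```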
Direct Mapping pros & cons
 Simple
 Inexpensive
 Fixed location for given block - If a program accesses 2 blocks that map to the
same line repeatedly, cache misses are very high
Victim Cache – It is used to lower the miss penalty by remembering what was discarded.
It works on the principle that what is already fetched can be used again with little penalty.
It is fully associative, has 4 to 16 cache lines and is present between direct mapped L1
cache and next memory level.
Associative Mapping
A main memory block can load into any line of cache. Memory address is interpreted as a
tag and a word. Tag field uniquely identifies block of memory. Every line’s tag is
examined for a match. Cache searching gets expensive.
Fully Associative Cache Organization

Address length = (s + w) bits
Number of addressable units = 2^(s+w) words or bytes
Block size = line size = 2^w words or bytes
Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
Number of lines in cache = undetermined
Size of tag = s bits

Set Associative Mapping


The cache is divided into a number of sets, each containing a number of lines:

m = v × k

i = j modulo v

where i = cache set number, j = main memory block number, m = number of lines in the
cache, v = number of sets, k = number of lines in each set.

Mapping from Main Memory to Cache: v Associative


K-Way Set Associative Cache Organization

Address length = (s + w) bits
Number of addressable units = 2^(s+w) words or bytes
Block size = line size = 2^w words or bytes
Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
Number of lines in set = k
Number of sets = v = 2^d
Number of lines in cache = m = kv = k × 2^d
Size of tag = (s – d) bits
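The set-mapping rule can be illustrated in Python (the 16-set, 2-way geometry is an assumed example):

```python
# Block j maps to set i = j modulo v; any of the k lines in
# that set may hold it.

def cache_set(block_number, v):
    return block_number % v

v, k = 2 ** 4, 2            # 16 sets, 2-way: m = v * k = 32 lines
for j in (3, 19, 35):       # blocks 3, 19, 35 all contend for set 3
    print(cache_set(j, v))  # 3 each time; k = 2 lets two of them coexist
```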

4. Replacement Algorithm
Direct mapping - No choice of replacement. Each block only maps to one line, replace
that line.
Associative & Set Associative – The following criteria can be used to determine which
line to replace
• Hardware implemented algorithm which gives maximum speed
• Least Recently Used (LRU) - For 2-way set associative, a USE bit is used
• First in first out (FIFO) - replace block that has been in cache longest
• Least frequently used - replace block which has had fewest hits using COUNTER
• Random - provides only slightly inferior performance to the usage-based algorithms
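A software model of LRU for a single set can be sketched with an OrderedDict; this mirrors the replacement policy, not the hardware USE-bit mechanism mentioned above:

```python
from collections import OrderedDict

class LRUSet:
    """One cache set with k lines and LRU replacement."""
    def __init__(self, k):
        self.k = k                      # associativity: lines in this set
        self.lines = OrderedDict()      # tag -> data, least recent first

    def access(self, tag):
        if tag in self.lines:
            self.lines.move_to_end(tag) # hit: mark most recently used
            return "hit"
        if len(self.lines) >= self.k:   # miss with full set: evict LRU line
            self.lines.popitem(last=False)
        self.lines[tag] = object()      # fetch block from memory (stubbed)
        return "miss"

s = LRUSet(k=2)
print([s.access(t) for t in (1, 2, 1, 3, 2)])
# ['miss', 'miss', 'hit', 'miss', 'miss'] — tag 2 was LRU when 3 arrived
```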
5. Write Policy
Must not overwrite a cache block unless main memory is up to date. Multiple CPUs may
have individual caches. One of two methods is followed:
Write back - Updates initially made in cache only. Update bit/ dirty bit/ use bit for cache
slot is set when update occurs. If block is to be replaced, write to main memory only if
update bit is set. Disadvantage : Other caches get out of sync and I/O must access main
memory only through cache
Write through - All writes go to main memory as well as cache. Multiple CPUs can
monitor main memory traffic to keep local cache up to date. Disadvantage : Lots of
traffic and hence slow writes
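The traffic trade-off between the two policies can be modelled for a single cache slot; the five-write sequence and the class below are an invented example:

```python
# Counting main-memory writes shows why write-through generates
# more traffic than write-back for repeated writes to one block.

class Slot:
    def __init__(self, policy):
        self.policy, self.data, self.dirty = policy, None, False
        self.mem_writes = 0             # traffic to main memory

    def write(self, value):
        self.data = value
        if self.policy == "write-through":
            self.mem_writes += 1        # every write goes to memory too
        else:                           # write-back: just set the dirty bit
            self.dirty = True

    def evict(self):
        if self.policy == "write-back" and self.dirty:
            self.mem_writes += 1        # write to memory only if dirty
            self.dirty = False

for policy in ("write-through", "write-back"):
    slot = Slot(policy)
    for v in range(5):                  # five writes to the same block
        slot.write(v)
    slot.evict()
    print(policy, slot.mem_writes)      # write-through: 5, write-back: 1
```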
6. Line Size
Retrieve not only desired word but a number of adjacent words as well. This increased
block size will increase hit ratio at first due to the principle of locality. Hit ratio will
decrease as block becomes even bigger.
Disadvantages of Larger blocks
a. Reduce number of blocks that fit in cache
b. Data overwritten shortly after being fetched
c. Each additional word is less local so less likely to be needed
No definitive optimum value for the size of the block has been found. 8 to 64 bytes seems
reasonable.
7. Multilevel Caches More than 1 level of cache designated as Level 1, Level 2
High logic density enables caches on chip which has the following advantages
a. Faster than external bus access
b. Frees bus for other transfers
Common to use both on and off chip cache. L1 on chip, L2 off chip in static RAM. L2
access much faster than DRAM or ROM as L2 often uses separate data path
L2 may now be on chip resulting in L3 cache. L3 may be via bus access or now
L3 is also on chip.
Unified v Split Caches

A unified cache is a single cache for data and instructions. A split cache uses two
caches - one for data and one for instructions.

Advantages of unified cache:

– Higher hit rate

– Balances load of instruction and data fetch

– Only one cache to design & implement

Advantages of split cache - Eliminates cache contention between instruction fetch/decode unit and
execution unit which is important in pipelining

Internal Memory
The two basic forms of semiconductor random access memory (RAM) are dynamic RAM (DRAM) and
static RAM (SRAM). SRAM is faster, more expensive, and less dense than DRAM, and is used for cache
memory. DRAM is used for main memory. To compensate for the relatively slow speed of DRAM, a
number of advanced DRAM organizations have been introduced. The two most common are
synchronous DRAM and RamBus DRAM. Both of these involve using the system clock to provide for the
transfer of blocks of data.

The basic element of a semiconductor memory is the memory cell. All semiconductor memory cells
share certain properties:

 They exhibit two stable (or semistable) states, which can be used to represent binary 1
and 0.
 They are capable of being written into (at least once), to set the state.
 They are capable of being read to sense the state.

Memory Cell Operation

The cell has three functional terminals capable of carrying an electrical signal. The select terminal is to
select a memory cell for a read or write operation. The control terminal indicates read or write. For
writing, the third terminal is to provide an electrical signal that sets the state of the cell to 1 or 0. For
reading, that terminal is used for output of the cell’s state.

Semiconductor Memory Types


Memory Type                          Category             Erasure                    Write Mechanism  Volatility
Random-access memory (RAM)           Read-write memory    Electrically, byte-level   Electrically     Volatile
Read-only memory (ROM)               Read-only memory     Not possible               Masks            Nonvolatile
Programmable ROM (PROM)              Read-only memory     Not possible               Electrically     Nonvolatile
Erasable PROM (EPROM)                Read-mostly memory   UV light, chip-level       Electrically     Nonvolatile
Electrically Erasable PROM (EEPROM)  Read-mostly memory   Electrically, byte-level   Electrically     Nonvolatile
Flash memory                         Read-mostly memory   Electrically, block-level  Electrically     Nonvolatile

Semiconductor Main Memory


• Random Access Memory (RAM) - Misnamed as all semiconductor memory is random access.
One distinguishing characteristic of RAM is that it is possible both to read data from the memory
and to write new data into the memory easily and rapidly. The other distinguishing
characteristic of RAM is that it is volatile. The two traditional forms of RAM used in computers
are DRAM and SRAM.

• Dynamic RAM (DRAM) Bits stored as charge in capacitors. Charges leak. Need refreshing even
when powered. Simpler construction. Smaller per bit. Less expensive. Need refresh circuits.
Slower. Used for main memory. Essentially an analogue device - Level of charge determines
value. DRAM is made with cells that store data as charge on capacitors. The presence or absence
of charge in a capacitor is interpreted as a binary 1 or 0. The next diagram is a typical DRAM
structure for an individual cell that stores 1 bit. The address line is activated when the bit value
from this cell is to be read or written. The transistor acts as a switch that is closed (allowing
current to flow) if a voltage is applied to the address line and opened (no current flows) if no
voltage is present on the address line.

DRAM Operation

Address line active when bit is to be read or written- Transistor switch closed (current flows).

 For write operation:

 Voltage signal is applied to the bit line - High voltage for 1, low voltage for 0.

 Then a signal is applied to the address line - Transfers charge to the capacitor.

 For read operation:

 Address line is selected - transistor turns on.

 Charge on capacitor is fed out onto bit line to sense amplifier. Sense amplifier
compares capacitor voltage with reference value to determine if the cell has 0
or 1.

Capacitor charge must be restored.

 Static RAM (SRAM) - Bits stored as on/off switches. No charges to leak. No refreshing
needed when powered. More complex construction. Larger per bit. More expensive. Does
not need refresh circuits. Faster. Used for cache memory. Digital device: Uses flip-flops.

Static RAM (SRAM) Cell Structure


Four transistors (T1, T2, T3, T4) connected in a cross-coupled arrangement produce a stable logic state.

• State 1:

▫ C1 high, C2 low

▫ T1 T4 off, T2 T3 on

• State 0:

▫ C2 high, C1 low

▫ T2 T3 off, T1 T4 on

The address line controls the two transistors T5 and T6, switching them on to allow a read or write operation.

 For write operation:

o The desired bit value is applied to line B, while its complement is applied to line B’.

o This forces the four transistors (T1, T2, T3, T4) into the proper state.

 For a read operation:

o The bit value is read from line B.

A static RAM will hold its data as long as power is supplied to it. Both states are stable as long as the
direct current (dc) voltage is applied. Unlike the DRAM, no refresh is needed to retain (hold) data.

SRAM versus DRAM

▫ Both volatile -Power needed to preserve data.


▫ Dynamic Memory Cell: - Simpler to build, smaller. More dense (smaller cells = more cells per unit
area). Less expensive. Needs refresh circuitry. Favoured for larger memory units.

▫ Static Memory Cell: - Faster. Used for cache memory (both on and off chip).

• Read Only Memory (ROM) - It contains a permanent pattern of data that cannot be changed. A ROM
is nonvolatile. While it is possible to read a ROM, it is not possible to write new data into it.
Important applications of ROMs include microprogramming, library subroutines, systems programs
(BIOS), and function tables.

Types of ROM

 Written during manufacture - Very expensive for small runs.


 Programmable (once): PROM - Needs special equipment to program.
When only a small number of ROMs with a particular memory content is needed, a less
expensive alternative is the programmable ROM (PROM). It is nonvolatile and may be
written into only once. The writing process is performed electrically and may be
performed by a supplier or customer at a time later than the original chip fabrication.
Special equipment is required for the writing or “programming” process. PROMs provide
flexibility and convenience.

 Erasable Programmable (EPROM) -Erased by UV.


It is read and written electrically, as with PROM. It can be altered multiple times and,
like the ROM and PROM, holds its data virtually indefinitely. For comparable amounts
of storage, the EPROM is more expensive than PROM, but it has the advantage of the
multiple update capability.

 Electrically Erasable (EEPROM) -Takes much longer to write than read.


It is a read-mostly memory that can be written into at any time without erasing prior
contents; only the byte or bytes addressed are updated. It combines the advantage of non
volatility with the flexibility of being updatable in place, using ordinary bus control,
address, and data lines. EEPROM is more expensive than EPROM and also is less dense,
supporting fewer bits per chip.

 Flash memory - Erase whole memory electrically.


Like EEPROM, flash memory uses an electrical erasing technology. An entire flash
memory can be erased in one or a few seconds, which is much faster than EPROM. In
addition, it is possible to erase just blocks of memory rather than an entire chip. Like
EPROM, flash memory uses only one transistor per bit, and so achieves a high density.

Advanced DRAM Organization

• Basic DRAM same since first RAM chips

• Enhanced DRAM.

▫ Contains small SRAM as well.

▫ SRAM holds last line read.


• Cache DRAM

▫ Larger SRAM component.

▫ Use as cache or serial buffer

• Synchronous DRAM (SDRAM) - Access is synchronized with an external clock. The address is
presented to the RAM, and the RAM finds the data. Since SDRAM moves data in time with the
system clock, the CPU knows when the data will be ready; it does not have to wait and can do
something else. Burst mode allows SDRAM to set up a stream of data and fire it out in a block.
DDR-SDRAM sends data twice per clock cycle (double data rate). One of the most widely used
forms of DRAM is the synchronous DRAM. SDRAM exchanges data with the processor synchronized
to an external clock signal and running at the full speed of the processor/memory bus without
imposing wait states. The traditional DRAM chip is constrained both by its internal
architecture and by its interface to the processor’s memory bus.

With synchronous access, the DRAM moves data in and out under control of the system clock.
The processor or other master issues the instruction and address information, which is latched
by the DRAM. The DRAM then responds after a set number of clock cycles. Meanwhile, the
master can safely do other tasks while the SDRAM is processing the request. The mode register
specifies the burst length and latency. The SDRAM performs best when it is transferring large
blocks of data serially, such as for applications like word processing, spreadsheets, and
multimedia.
RAMBUS DRAM

Adopted by Intel for Pentium & Itanium. Main competitor to SDRAM. Vertical package – all pins
on one side. Data exchange with the processor over 28 wires < 12 cm long. Bus addresses up to
320 RDRAM chips at 1.6 Gbps. Using asynchronous block protocol. After 480 ns access time,
produces the 1.6 Gbps data rate.

The configuration consists of a controller and a number of RDRAM modules connected via a
common bus. The bus includes 18 data lines (16 actual data, two parity) cycling at twice the
clock rate. There is a separate set of 8 lines (RC) used for address and control signals. There is
also a clock signal that starts at the far end from the controller, propagates to the
controller end, and then loops back. An RDRAM module sends data to the controller
synchronously with the clock to the master, and the controller sends data to an RDRAM
synchronously with the clock signal in the opposite direction. The remaining bus lines
include a reference voltage, ground, and power source.

DDR SDRAM

SDRAM can only send data once per clock. Double-data-rate SDRAM can send data twice per
clock cycle. Once on rising edge and once on falling edge. DDR chips are widely used in desktop
computers and servers.

There have been two generations of improvement to the DDR technology:

DDR2 increases the data transfer rate by increasing the operational frequency of the RAM chip
and by increasing the prefetch buffer from 2 bits to 4 bits per chip. The prefetch buffer is a
memory cache located on the RAM chip. The buffer enables the RAM chip to preposition bits to
be placed on the data bus as rapidly as possible.

DDR3 increases the prefetch buffer size to 8 bits.

DDR SDRAM Read Timing

Simplified DRAM Read Timing

Cache DRAM

Integrates a small SRAM cache (16 Kb) onto a generic DRAM chip. Used as a true cache with
64-bit lines, effective for ordinary random access to memory. Also used as a buffer to support
serial access of a block of data
▫ E.g. refresh bit-mapped screen.

CDRAM can prefetch data from DRAM into SRAM buffer. Subsequent accesses solely to SRAM.
