CA Module 1
Computer Architecture:
Computer architecture is a functional description of the requirements and design
implementation of the various parts of a computer. It deals with the functional behavior of
the computer system, and it comes before computer organization when designing a
computer.
Von Neumann Architecture
Modern computers are based on the stored-program concept introduced by John von
Neumann.
Based on three key concepts:
• Data and instructions are stored in a single read–write memory.
• The contents of this memory are addressable by location, without regard to the type of data
contained there.
• Execution occurs in a sequential fashion (unless explicitly
modified) from one instruction to the next.
A computer consists of
• CPU (central processing unit)
• Memory
• I/O
• Interconnection structures among these components (buses)
Program Concept
Hardwired systems are inflexible. General-purpose hardware can do different tasks, given the correct control
signals: instead of re-wiring, we supply a new set of control signals. A program is a sequence of steps; for
each step an arithmetic or logical operation is performed, and each operation needs a different set of
control signals.
For each operation a unique code is provided, e.g. ADD, MOVE. A hardware segment accepts the code
and issues the corresponding control signals.
Components
The Control Unit and the Arithmetic and Logic Unit constitute the Central Processing Unit. Data and
instructions need to get into the system and results out - Input/output. Temporary storage of code and
results is needed - Main memory.
Instruction Cycle
The instruction cycle has two steps – Fetch and Execute.
Fetch Cycle – the processor reads (fetches) the next instruction from memory.
Execute Cycle – the processor performs the operation specified by the fetched instruction.
The full instruction cycle is the combination of the two.
Interrupt Cycle
• Added to the instruction cycle. After each instruction the processor checks for an interrupt, which is
indicated by an interrupt signal.
• One example is the I/O interrupt, which comes from an I/O controller to signal the completion of an
operation or an error condition.
• If an interrupt is pending, the processor suspends the current program, saves its context, sets the
program counter to the start address of the interrupt handler routine, processes the interrupt, and
then restores the context and resumes the interrupted program.
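The combined fetch–execute–interrupt loop can be sketched in Python (a minimal sketch: the three-instruction machine, the IRQ pseudo-instruction standing in for an I/O controller raising a signal, and the handler are all invented for illustration):

```python
# Minimal sketch of the instruction cycle with an interrupt check.
# The tiny "ISA" (ADD, MOVE, IRQ) and the handler are hypothetical.

def handle_interrupt(source):
    print(f"servicing interrupt from {source}")

def run(program):
    acc = 0            # accumulator
    pc = 0             # program counter
    interrupts = []    # pending interrupt queue

    while pc < len(program):
        op, arg = program[pc]          # fetch cycle
        pc += 1
        if op == "ADD":                # execute cycle
            acc += arg
        elif op == "MOVE":
            acc = arg
        elif op == "IRQ":              # stand-in for an I/O interrupt arriving
            interrupts.append(arg)
        # interrupt cycle: after each instruction, check for a pending
        # interrupt and transfer control to its handler if one exists
        while interrupts:
            handle_interrupt(interrupts.pop(0))
    return acc

print(run([("MOVE", 5), ("IRQ", "disk"), ("ADD", 3)]))  # 8
```

The key point is that the interrupt check happens once per instruction, at the end of the execute cycle, so the interrupted program's state (here just `acc` and `pc`) is always at an instruction boundary when the handler runs.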
Multiple Interrupts
• Disable interrupts
— A second interrupt remains pending and is checked after the first has been processed
• Define priorities
— A higher-priority interrupt may interrupt a lower-priority handler; when the higher-priority
interrupt has been processed, the processor returns to the previous interrupt
Interconnecting Structures
All the functional units must be connected. The collection of paths interconnecting the basic
units/modules is called the interconnection structure. Different types of connection are needed for
different types of units such as memory, input/output and the CPU.
Computer Modules
Memory Module
• Typically, a memory module will consist of N words of equal length. Each word is assigned a
unique numerical address (0, 1, . . . , N – 1).
• Receives addresses, data and control signals; a read returns the word at the addressed location,
and a write stores the word presented on the data lines.
CPU Module
• Reads instructions and data, writes data after processing, uses control signals to control the
overall operation of the system, and receives interrupt signals.
Buses
There are a number of possible interconnection systems. Single and multiple bus structures are most
common, e.g. the control/address/data bus (PC) and the Unibus (DEC PDP).
A bus is a communication pathway connecting two or more devices. It is usually a broadcast medium,
and often groups a number of channels in one bus, e.g. a 32-bit data bus is 32 separate single-bit channels.
A bus that connects major computer components is called a system bus. A system bus has three
functional groups of lines: data, address and control. Power distribution lines supply power.
Data Bus
It carries data and instructions among system modules. Width is a key determinant of performance (32,
64 or 128 bits). Each line can carry only 1 bit at a time, so the number of lines determines how many bits
can be transferred at once.
Address bus
It is used to identify the source or destination of data, e.g. when the CPU needs to read an instruction or
data from a given location in memory. Bus width determines the maximum memory capacity of the
system, e.g. the 8080 has a 16-bit address bus giving a 64K address space.
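The capacity relationship is simply 2 raised to the bus width; a quick check of the 8080 figure:

```python
# Address space grows as 2**width: each extra address line doubles
# the number of addressable locations.
def address_space(width_bits):
    return 2 ** width_bits

print(address_space(16))  # 65536 locations = 64K, as on the 8080
print(address_space(32))  # 4294967296 locations = 4G
```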
Control Bus
It is used to send and receive control and timing information such as memory read/write signals,
interrupt requests and clock signals. Physically, buses are parallel lines on circuit boards, ribbon cables,
strip connectors on motherboards (such as PCI), or simply a set of wires.
Putting many devices on one bus leads to propagation delays. Long data paths mean that coordination of
bus use can adversely affect performance, and if aggregate data transfer approaches bus capacity the bus
becomes a bottleneck. Most systems therefore use multiple buses to overcome these problems.
Multiple Bus
Traditional (ISA) (with cache)
An expansion bus interface buffers data transfers between the system bus and the I/O controllers on the expansion
bus. This arrangement allows the system to support a wide variety of I/O devices and at the same time insulate
memory-to-processor traffic from I/O traffic. Network connections include local area networks (LANs) such as a
10-Mbps Ethernet and connections to wide area networks (WANs) such as a
packet-switching network. SCSI (small computer system interface) is itself a type of bus used to support local disk
drives and other peripherals. A serial port could be used to support a printer or scanner. This traditional bus
architecture is reasonably efficient but begins to break down as higher and higher performance is seen in the I/O
devices.
High-Performance Architecture
There is a local bus that connects the processor to a cache controller, which is in turn connected to a
system bus that supports main memory. The cache controller is integrated into a bridge, or buffering device, that
connects to the high-speed bus. This bus supports connections to high-speed LANs, such as Fast Ethernet at 100
Mbps, video and graphics workstation controllers, as well as interface controllers to local peripheral buses. The
latter is a high-speed bus arrangement specifically designed to support high-capacity I/O devices. Lower-speed
devices are still supported off an expansion bus, with an interface buffering traffic between the expansion bus and
the high-speed bus. The advantage of this arrangement is that the high-speed bus brings high demand devices into
closer integration with the processor and at the same time is independent of the processor.
Bus Arbitration
More than one module can control the bus, e.g. the CPU and a DMA controller, but only one module may
control the bus at any one time. Arbitration may be centralised or distributed.
• Centralised
— A single hardware device, called a bus controller or arbiter, controls bus access
— It may be part of the CPU or a separate device
— The bus controller or arbiter is responsible for allocating time on the bus
• Distributed
— There is no central arbiter; each module may claim the bus
— Control logic is present on all modules
— Each module contains access control logic, and the modules act together to share the bus
Timing
• It is used for co-ordination of events on bus
• Buses use either synchronous timing or asynchronous timing
• Synchronous
— The control bus includes a clock line on which a clock transmits alternating 1s and 0s
— A single 1-0 transmission is a bus cycle, and each event takes a single cycle
— All devices can read the clock line and usually synchronize on the leading edge
Synchronous Timing Diagram
Cache Memory
Characteristics
1. Location
CPU memory - Registers
Internal memory - Cache
External memory – Disk, Tapes
2. Capacity
Internal memory capacity is expressed in words or bytes, the natural unit of organisation.
Word sizes are typically 8, 16 or 32 bits. External memory capacity is expressed in bytes.
3. Unit of Transfer
Internal - Usually governed by the width of data bus
External - Usually a block which is much larger than a word
The addressable unit is the smallest location which can be uniquely addressed. If the
number of bits in an address is A, then the number of addressable units is N = 2^A
4. Access Methods
Sequential
o Start at the beginning & read through in order
o Memory is organized into units of data called records
o Access time depends on location of data
o e.g. tape
Direct
o Individual blocks have unique address
o Access is by jumping to vicinity plus sequential search
o Access time depends on location and previous location
o e.g. disk
Random
o Individual addresses identify locations exactly
o Access time is independent of location or previous access
o e.g. RAM
Associative
o Data is located by a comparison with contents of a portion of the store
o Access time is independent of location or previous access
o e.g. cache
5. Performance
Access time is the time between presenting the address and getting the valid data.
Memory cycle time is the time required for the memory to “recover” before the next
access; cycle time is access time + recovery time. Transfer rate is the rate at which data
can be moved.
6. Physical types
Semiconductor memory – RAM, EPROM
Magnetic surface memory – Disk, Tapes
Optical memory – CD-ROM, DVD-ROM
Magneto optical memory
7. Physical characteristics
Volatile memory – Information is lost when power is switched off
Non volatile memory – Information once recorded remains until deliberately changed. No
electrical power is needed to retain information.
8. Organisation - Physical arrangement of bits into words. e.g. interleaved
Memory Hierarchy
It is possible to build a computer which uses only static RAM. This would be very fast and would need no
cache, but it would cost a very large amount.
• Registers
• L1 Cache
• L2 Cache
• Main memory
• Disk cache
• Disk
• Optical
• Tape
Locality of Reference - During the course of the execution of a program, memory references tend to
cluster e.g. loops.
Cache - Small amount of fast memory which sits between normal main memory and CPU. It may be
located on CPU chip or module.
Cache/Main Memory Structure
Cache operation
When the CPU requests the contents of a memory location, it first checks the cache for this data. If the
data is present, it is delivered quickly from the cache. If not, the required block is read from main
memory into the cache and then delivered from the cache to the CPU. The cache includes tags to identify
which block of main memory is in each cache slot.
Cache Design
1. Cache Addressing
The cache can sit either between the processor and the MMU (logical cache) or
between the MMU and main memory (physical cache). A logical (virtual) cache stores
data using virtual addresses: the processor accesses the cache directly, not through the
MMU, so cache access is faster because it happens before MMU address translation.
However, since different applications use the same virtual address space, the cache
must be flushed on each context switch. A physical cache stores data using main
memory physical addresses.
2. Cache Size
Cost - More cache is more expensive. Speed - More cache is faster, but only up to a
point: checking a large cache for data takes more time. A balance must be struck
between size and cost.
3. Mapping Function
There are fewer cache lines than main memory blocks. An algorithm for mapping main
memory blocks into cache lines determines how the cache is organized.
Three types of mapping
i. Direct
ii. Associative
iii. Set associative
Direct Mapping
Each block of main memory maps to only one cache line, i.e. if a block is in the cache, it
must be in one specific place. The address is in two parts: the least significant w bits
identify a unique word, and the most significant s bits specify one memory block. The
MSBs are split into a cache line field of r bits and a tag of s – r bits (most significant).
The mapping is expressed as i = j modulo m
where i = cache line number, j = main memory block number, m = number of lines in the
cache.
For set-associative mapping the corresponding relations are
i = j modulo v
where i = set number, v = number of sets = 2^d, and the number of lines in the cache is
m = kv = k × 2^d (k lines per set).
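The address split for direct mapping can be sketched as follows (the field widths r and w below are arbitrary example values, not taken from any particular machine):

```python
# Split a memory address into (tag, line, word) fields for a
# direct-mapped cache with 2**r lines and 2**w words per block.
def split_address(addr, r, w):
    word = addr & ((1 << w) - 1)          # least significant w bits
    line = (addr >> w) & ((1 << r) - 1)   # next r bits select the cache line
    tag = addr >> (w + r)                 # remaining s - r bits are the tag
    return tag, line, word

# Example: 8 cache lines (r = 3), 4 words per block (w = 2).
addr = 0b1101_011_10            # tag = 1101, line = 011, word = 10
print(split_address(addr, r=3, w=2))   # (13, 3, 2)

# The line field is exactly i = j modulo m for block number j = addr >> w:
j = addr >> 2                   # main memory block number
print(j % 8)                    # 3, same as the line field
```

Taking the line number from the middle bits is what makes direct mapping cheap: no search is needed, only one tag comparison per access.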
4. Replacement Algorithm
Direct mapping - No choice of replacement: each block maps to only one line, so that
line is replaced.
Associative & Set Associative – the following algorithms, implemented in hardware for
maximum speed, can be used to decide which line to replace:
• Least recently used (LRU) - replace the block unused the longest; for 2-way set
associative a USE bit suffices
• First in first out (FIFO) - replace the block that has been in the cache longest
• Least frequently used (LFU) - replace the block which has had the fewest hits, using a counter
• Random - pick a line at random; simulation shows this gives only slightly inferior performance
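The LRU policy, for example, can be sketched with Python's OrderedDict (the 4-line capacity is an arbitrary illustration, not tied to any real cache geometry):

```python
from collections import OrderedDict

# LRU replacement for a tiny 4-line cache: each access moves the block
# to the most-recently-used end; eviction takes the LRU end when full.
CAPACITY = 4
cache = OrderedDict()

def access(block):
    if block in cache:
        cache.move_to_end(block)         # hit: mark most recently used
        return "hit"
    if len(cache) >= CAPACITY:
        cache.popitem(last=False)        # evict least recently used
    cache[block] = True                  # install the new block
    return "miss"

for b in [1, 2, 3, 4, 1, 5]:             # 5 evicts 2, not 1 (1 was reused)
    print(b, access(b))
print(list(cache))  # [3, 4, 1, 5]
```

Real caches implement the same idea with per-line use bits or age counters rather than a linked structure, but the eviction decisions are identical.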
5. Write Policy
A cache block must not be overwritten unless main memory is up to date, and multiple
CPUs may have individual caches. One of two methods is used:
Write back - Updates are initially made in the cache only. An update (dirty) bit for the
cache slot is set when an update occurs; if the block is to be replaced, it is written to
main memory only if the update bit is set. Disadvantage: other caches can get out of
sync, and I/O must access main memory through the cache.
Write through - All writes go to main memory as well as the cache, so multiple CPUs
can monitor main memory traffic to keep their local caches up to date. Disadvantage:
lots of memory traffic and hence slow writes.
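The difference between the two policies can be sketched as follows (a single-block toy model; real caches track a dirty bit per line in hardware):

```python
# Contrast of the two write policies using a single cached block.
memory = {0: 100}

class WriteThrough:
    def __init__(self):
        self.value = memory[0]
    def write(self, v):
        self.value = v
        memory[0] = v            # every write also goes to main memory

class WriteBack:
    def __init__(self):
        self.value = memory[0]
        self.dirty = False
    def write(self, v):
        self.value = v
        self.dirty = True        # main memory is now stale
    def evict(self):
        if self.dirty:           # write to memory only on replacement
            memory[0] = self.value
            self.dirty = False

wt = WriteThrough()
wt.write(1)
print(memory[0])   # 1: memory updated immediately

memory[0] = 100
wb = WriteBack()
wb.write(2)
print(memory[0])   # 100: memory still stale after the write
wb.evict()
print(memory[0])   # 2: written back only on eviction
```

The stale window shown for write back is exactly why other caches and I/O modules can observe out-of-date values under that policy.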
6. Line Size
Retrieve not only the desired word but a number of adjacent words as well. Increasing
the block size increases the hit ratio at first, due to the principle of locality, but the hit
ratio decreases as the block becomes even bigger.
Disadvantages of Larger blocks
a. Reduce number of blocks that fit in cache
b. Data overwritten shortly after being fetched
c. Each additional word is less local so less likely to be needed
No definitive optimum value for the size of the block has been found. 8 to 64 bytes seems
reasonable.
7. Multilevel Caches More than 1 level of cache designated as Level 1, Level 2
High logic density enables caches on chip which has the following advantages
a. Faster than external bus access
b. Frees bus for other transfers
It is common to use both on- and off-chip cache: L1 on chip, L2 off chip in static RAM. L2
access is much faster than DRAM or ROM access, as L2 often uses a separate data path.
L2 may now also be on chip, which has led to the addition of an L3 cache, originally
accessed over the bus but now often on chip as well.
Unified v Split Caches
A unified cache is a single cache for both data and instructions; a split cache uses two caches, one for
data and one for instructions.
Advantage of a split cache - It eliminates cache contention between the instruction fetch/decode unit
and the execution unit, which is important in pipelining.
Internal Memory
The two basic forms of semiconductor random access memory (RAM) are dynamic RAM (DRAM) and
static RAM (SRAM). SRAM is faster, more expensive, and less dense than DRAM, and is used for cache
memory. DRAM is used for main memory. To compensate for the relatively slow speed of DRAM, a
number of advanced DRAM organizations have been introduced. The two most common are
synchronous DRAM and RamBus DRAM. Both of these involve using the system clock to provide for the
transfer of blocks of data.
The basic element of a semiconductor memory is the memory cell. All semiconductor memory cells
share certain properties:
They exhibit two stable (or semistable) states, which can be used to represent binary 1
and 0.
They are capable of being written into (at least once), to set the state.
They are capable of being read to sense the state.
The cell has three functional terminals capable of carrying an electrical signal. The select terminal is to
select a memory cell for a read or write operation. The control terminal indicates read or write. For
writing, the third terminal is to provide an electrical signal that sets the state of the cell to 1 or 0. For
reading, that terminal is used for output of the cell’s state.
Semiconductor Memory Types

Memory Type              Category             Erasure                     Write Mechanism   Volatility
Random-access memory     Read-write memory    Electrically, byte-level    Electrically      Volatile
(RAM)
Read-only memory (ROM)   Read-only memory     Not possible                Masks             Nonvolatile
Programmable ROM (PROM)  Read-only memory     Not possible                Electrically      Nonvolatile
Erasable PROM (EPROM)    Read-mostly memory   UV light, chip-level        Electrically      Nonvolatile
Electrically Erasable    Read-mostly memory   Electrically, byte-level    Electrically      Nonvolatile
PROM (EEPROM)
Flash memory             Read-mostly memory   Electrically, block-level   Electrically      Nonvolatile
• Dynamic RAM (DRAM) - Bits are stored as charge in capacitors. The charge leaks, so refreshing is
needed even when powered. Simpler construction, smaller per bit, less expensive; needs refresh
circuits; slower; used for main memory. Essentially an analogue device: the level of charge
determines the value. DRAM is made with cells that store data as charge on capacitors. The presence or absence
of charge in a capacitor is interpreted as a binary 1 or 0. The next diagram is a typical DRAM
structure for an individual cell that stores 1 bit. The address line is activated when the bit value
from this cell is to be read or written. The transistor acts as a switch that is closed (allowing
current to flow) if a voltage is applied to the address line and opened (no current flows) if no
voltage is present on the address line.
DRAM Operation
The address line is activated when a bit is to be read or written, closing the transistor switch
(current flows).
Write: a voltage signal is applied to the bit line (high voltage for 1, low voltage for 0); a signal
applied to the address line then transfers the charge to the capacitor.
Read: when the address line is selected, the charge on the capacitor is fed out onto the bit line to
a sense amplifier, which compares the capacitor voltage with a reference value to determine
whether the cell holds a 0 or a 1.
Static RAM (SRAM) - Bits stored as on/off switches. No charges to leak. No refreshing
needed when powered. More complex construction. Larger per bit. More expensive. Does
not need refresh circuits. Faster. Used for cache memory. Digital device: Uses flip-flops.
• State 1:
▫ C1 high, C2 low
▫ T1 T4 off, T2 T3 on
• State 0:
▫ C2 high, C1 low
▫ T2 T3 off, T1 T4 on
The address line controls the two transistors T5 and T6, switching them on to allow a read or write operation.
o The desired bit value is applied to line B, while its complement is applied to line B’.
o This forces the four transistors (T1, T2, T3, T4) into the proper state.
A static RAM will hold its data as long as power is supplied to it. Both states are stable as long as the
direct current (dc) voltage is applied. Unlike the DRAM, no refresh is needed to retain (hold) data.
• Read Only Memory (ROM) It contains a permanent pattern of data that cannot be changed. A ROM is
nonvolatile. While it is possible to read a ROM, it is not possible to write new data into it. An
important application of ROMs is: Microprogramming, Library subroutines, Systems programs (BIOS),
Function tables.
Types of ROM include PROM, EPROM, EEPROM and flash memory.
Advanced DRAM Organization
Several advanced DRAM organizations exist: enhanced DRAM, cache DRAM, synchronous DRAM
(SDRAM), DDR SDRAM (which transfers data twice per clock cycle, i.e. double data rate) and
Rambus DRAM.
Synchronous DRAM (SDRAM)
One of the most widely used forms of DRAM is the synchronous DRAM (SDRAM). SDRAM
exchanges data with the processor synchronized to an external
clock signal and running at the full speed of the processor/memory bus without imposing wait
states. The traditional DRAM chip is constrained both by its internal architecture and by its
interface to the processor’s memory bus.
With synchronous access, the DRAM moves data in and out under control of the system clock.
The processor or other master issues the instruction and address information, which is latched
by the DRAM. The DRAM then responds after a set number of clock cycles. Meanwhile, the
master can safely do other tasks while the SDRAM is processing the request. The mode register
specifies the burst length and latency. The SDRAM performs best when it is transferring large
blocks of data serially, such as for applications like word processing, spreadsheets, and
multimedia.
RAMBUS DRAM
It was adopted by Intel for the Pentium and Itanium and became the main competitor to SDRAM.
Vertical package, with all pins on one side. Data is exchanged with the processor over 28 wires no
more than 12 cm long. The bus can address up to 320 RDRAM chips and is rated at 1.6 GBps,
using an asynchronous block-oriented protocol; after an initial 480 ns access time, the 1.6 GBps
data rate is achieved.
The configuration consists of a controller and a number of RDRAM modules connected via a
common bus. The bus includes 18 data lines (16 actual data, two parity) cycling at twice the
clock rate. There is a separate set of 8 lines (RC) used for address and control signals. There is
also a clock signal that starts at the far end from the controller, propagates to the controller end,
and then loops back. An RDRAM module sends data to the controller synchronously with the
clock to master (the clock as it travels back toward the controller), and the controller sends data
to an RDRAM synchronously with the clock signal in the opposite direction. The remaining bus
lines include a reference voltage, ground, and power source.
DDR SDRAM
SDRAM can only send data once per clock. Double-data-rate SDRAM can send data twice per
clock cycle. Once on rising edge and once on falling edge. DDR chips are widely used in desktop
computers and servers.
DDR2 increases the data transfer rate by increasing the operational frequency of the RAM chip
and by increasing the prefetch buffer from 2 bits to 4 bits per chip. The prefetch buffer is a
memory cache located on the RAM chip. The buffer enables the RAM chip to preposition bits to
be placed on the data bus as rapidly as possible.
Cache DRAM
It integrates a small SRAM cache (16 Kb) onto a generic DRAM chip. The SRAM can be used as a true
cache, consisting of 64-bit lines, which is effective for ordinary random access to memory. It can also be
used as a buffer to support serial access to a block of data: the CDRAM prefetches data from the DRAM
into the SRAM buffer, and subsequent accesses are solely to the SRAM.