Proj CA MITE-F18-015
Floppy Disk
A floppy disk is a magnetic storage medium for computer systems. The floppy disk
is composed of a thin, flexible magnetic disk sealed in a square plastic carrier. In
order to read and write data from a floppy disk, a computer system must have a
floppy disk drive (FDD). A floppy disk is also referred to simply as a floppy. From the early days of personal computing, floppy disks were widely used to distribute software, transfer files, and create backup copies of data. When hard drives were still very expensive, floppy disks were also used to store a computer's operating system.
A number of different types of floppy disks were developed over the years; the disks became physically smaller while their storage capacity increased. However, in the 1990s, other media, including hard disk drives, ZIP drives, optical drives, and USB flash drives, started to replace floppy disks as the primary storage medium.
Hard Disk
Hard disk drives have been the dominant type of storage since the early days of
computers. A hard disk drive consists of a rigid disc made with non-magnetic
material, which is coated with a thin layer of magnetic material. Data is stored by
magnetizing this thin film. The disk spins at a high speed and a magnetic head
mounted on a moving arm is used to read and write data. A typical hard disk drive operates at a speed of 7,200 rpm (revolutions per minute), so you will often see this number as part of the technical specifications of a computer. The spinning of the disk is also the source of the humming noise of a computer, although most modern hard disk drives are fairly quiet.
Magnetic Tape
'Tape is dead! Long live tape!' Were you around in the 80s when cassette tapes
were all the rage? People still say 'mixtape' sometimes when referring to playlists they make or CDs they give each other. Though the cassette tape has
fallen out of favor, it was neither the first nor the last device to use magnetic tape
for storage.
A magnetic tape, in computer terminology, is a storage medium that allows for
data archiving, collection, and backup. At first, the tapes were wound in wheel-
like reels, but then cassettes and cartridges came along, which offered more
protection for the tape inside.
Optical Storage
About that time, optical devices were starting to be marketed. An optical storage device is written and read with a laser. It is durable and can handle temperature fluctuations much better than magnetic media. Because the floppy was so inexpensive at the time, it took several years before optical drives became affordable to the general and small-business consumer.
1997: DVD-ROM
Soon after DVDs were released for video, they were used to store data. On a DVD (digital versatile disc), the pits and lands are shorter, which allows for a capacity of up to 4.3 GB. DVDs also make use of a second data layer between the reflective layer and the substrate, which boosts speed and storage capacity.
2003: Blu-ray
Blu-ray discs look like CDs and DVDs but can store high-definition (HD) video and offer even more capacity. What gives them the ability to store more is the shorter-wavelength blue laser used to read and write them, which also gives the format its name.
Static Random Access Memory (Static RAM or SRAM) is a type of RAM that holds
data in a static form, that is, as long as the memory has power. Unlike dynamic
RAM, it does not need to be refreshed.
SRAM stores a bit of data on four transistors that form two cross-coupled inverters. The two stable states of this circuit represent 0 and 1. During read and write operations, another two access transistors are used to manage access to the memory cell, so storing one memory bit requires six metal-oxide-semiconductor field-effect transistors (MOSFETs). MOSFET-based cells are one of the two types of SRAM chips; the other type uses bipolar junction transistors. Bipolar junction transistors are very fast but consume a lot of energy, so MOSFET is the more common SRAM type.
The term is pronounced "S-RAM", not "sram."
Dynamic random access memory (DRAM) is a type of random-access memory used in computing devices (primarily PCs). DRAM stores each bit of data in a separate capacitor within an integrated circuit. The charge state of each capacitor represents one bit with the value 0 or 1. The capacitor needs to be refreshed often, otherwise the information fades. DRAM has one capacitor and one transistor per bit, as opposed to static random access memory (SRAM), which requires six transistors. The capacitors and transistors that are used are exceptionally small; millions of them fit on a single memory chip.
A memory unit is the collection of storage units or devices taken together. The memory unit stores binary information in the form of bits. Generally, memory/storage is classified into two categories:
Volatile memory: This loses its data when power is switched off.
Non-volatile memory: This is permanent storage and does not lose any data when power is switched off.
Memory Hierarchy
Main Memory
The memory unit that communicates directly with the CPU, auxiliary memory and cache memory is called main memory. It is the central storage unit of the computer system. It is a large and fast memory used to store data during computer operations. Main memory is made up of RAM and ROM, with RAM integrated circuit chips holding the major share.
Auxiliary Memory
Devices that provide backup storage are called auxiliary memory. For
example: Magnetic disks and tapes are commonly used auxiliary devices. Other
devices used as auxiliary memory are magnetic drums, magnetic bubble memory
and optical disks.
It is not directly accessible to the CPU, and is accessed using the Input/Output
channels.
Cache Memory
The data or contents of the main memory that are used again and again by the CPU are stored in the cache memory so that the CPU can access that data in a shorter time.
Whenever the CPU needs to access memory, it first checks the cache memory. If the data is not found in cache memory, the CPU moves on to the main memory. It also transfers a block of recently used data into the cache and keeps deleting old data in the cache to accommodate the new data.
Hit Ratio
The performance of cache memory is measured in terms of a quantity called hit
ratio. When the CPU refers to memory and finds the word in the cache, it is said to produce a hit. If the word is not found in the cache and must instead be fetched from main memory, it counts as a miss.
The ratio of the number of hits to the total CPU references to memory is called hit
ratio.
Hit Ratio = Hit/(Hit + Miss)
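For illustration, here is a tiny Python sketch that computes the hit ratio from assumed hit and miss counts (the counts are made up):

```python
# Hit ratio = hits / (hits + misses)
hits, misses = 950, 50                    # illustrative counts only
hit_ratio = hits / (hits + misses)
print(f"hit ratio = {hit_ratio:.2f}")     # 0.95
```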
Associative Memory
It is also known as content addressable memory (CAM). It is a memory chip in which each bit position can be compared. The content is compared in every bit cell simultaneously, which allows very fast table lookup. Since the entire chip can be compared in parallel, contents can be stored in any location without regard to an addressing scheme. These chips have less storage capacity than regular memory chips.
Semiconductor Memory Types & Technologies
The use of semiconductor memory has grown, and the capacity of these memory devices has increased as the need for larger and larger amounts of storage has grown.
To meet the growing needs for semiconductor memory, there are many types and
technologies that are used. As the demand grows new memory technologies are
being introduced and the existing types and technologies are being further
developed.
Terms like DDR3, DDR4, DDR5 and many more are seen and these refer to
different types of SDRAM semiconductor memory.
In addition to this the semiconductor devices are available in many forms - ICs for
printed board assembly, USB memory cards, Compact Flash cards, SD memory
cards and even solid state hard drives. Semiconductor memory is even
incorporated into many microprocessor chips as on-board memory.
There are two main types or categories that can be used for semiconductor technology. These memory types or categories differentiate the memory according to the way in which it operates:
RAM - Random Access Memory: As the name suggests, RAM or random access memory is a form of semiconductor memory technology that is used for reading and writing data in any order - in other words, as it is required by the processor. It is used for applications such as computer or processor memory, where variables and other data are stored and are required on a random basis. Data is stored and read many times to and from this type of memory.
ROM - Read Only Memory: A ROM is a form of semiconductor memory technology used where the data is written once and then not changed. In view of this, it is used where data needs to be retained permanently, even when the power is removed.
Each of the semiconductor memory technologies outlined below falls into one of these two categories. Each technology offers its own advantages and is used in a particular way, or for a particular application.
There is a large variety of types of ROM and RAM that are available. Often the
overall name for the memory technology includes the initials RAM or ROM and
this gives a guide as to the overall type of format for the memory.
With technology moving forwards apace, not only are the established
technologies moving forwards with SDRAM technology moving from DDR3 to
DDR4 and then to DDR5, but Flash memory used in memory cards is also
developing as are the other technologies.
In addition to this, new memory technologies are arriving on the scene and they
are starting to make an impact in the market, enabling processor circuits to
perform more effectively.
However, these capacitors do not hold their charge indefinitely, and therefore the data needs to be refreshed periodically. As a result of this dynamic refreshing, it gains its name of dynamic RAM. DRAM is the form of semiconductor memory that is often used in equipment including personal computers and workstations, where it forms the main RAM for the computer.
The semiconductor devices are normally available as integrated circuits for use
in PCB assembly in the form of surface mount devices or less frequently now as
leaded components.
The PROM stores its data as a charge on a capacitor. There is a charge storage capacitor for each cell, and this can be read repeatedly as required. However, it is found that after many years the charge may leak away and the data may be lost.
Cache is used because bulk, or main, storage can't keep up with the demands of
the cache clients. Cache shortens data access times, reduces latency and
improves input/output (I/O). Because almost all application workloads depend on
I/O operations, caching improves application performance.
How cache works
When a cache client needs to access data, it first checks the cache. When the
requested data is found in a cache, it's called a cache hit. The percent of attempts
that result in cache hits is known as the cache hit rate or ratio.
If the requested data isn't found in the cache -- a situation known as a cache miss
-- it is pulled from main memory and copied into the cache. How this is done, and
what data is ejected from the cache to make room for the new data, depends on
the caching algorithm or policies the system uses.
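To make that flow concrete, here is a minimal Python sketch of a cache client that checks a small cache first and falls back to main memory on a miss; the dictionary-based cache, its capacity, and the oldest-first eviction rule are illustrative assumptions, not a description of any particular hardware:

```python
# Minimal cache-lookup sketch: check the cache, fall back to main memory on a miss.
main_memory = {addr: addr * 10 for addr in range(100)}   # pretend backing store
cache = {}            # address -> data
CAPACITY = 4          # deliberately tiny cache
hits = misses = 0

def read(addr):
    global hits, misses
    if addr in cache:                  # cache hit
        hits += 1
        return cache[addr]
    misses += 1                        # cache miss: fetch from main memory
    data = main_memory[addr]
    if len(cache) >= CAPACITY:         # make room: evict per some policy
        cache.pop(next(iter(cache)))   # here, simply the oldest inserted entry
    cache[addr] = data                 # copy the new data into the cache
    return data

for a in [1, 2, 1, 3, 4, 5, 1]:
    read(a)
print(f"hit ratio = {hits / (hits + misses):.2f}")
```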
The three different types of mapping used for cache memory are as follows: associative mapping, direct mapping and set-associative mapping.
- Associative mapping: In this type of mapping, the associative memory is used to store both the content and the address of the memory word. This enables the placement of any word at any place in the cache memory. It is considered to be the fastest and the most flexible mapping form.
- Direct mapping: In direct mapping, main memory (RAM) is used to store the data and some of it is copied into the cache. Each address is split into two parts, an index field and a tag field. The cache stores the tag field, while the index field is used to select the cache location. Direct mapping's performance is directly proportional to the hit ratio.
1. Direct Mapping
2. Fully Associative Mapping
3. K-way Set Associative Mapping
1. Direct Mapping
In direct mapping, a particular block of main memory can map only to a particular line of the cache. The line number of the cache to which a particular block can map is given by:
Cache line number = (Main memory block address) modulo (Number of lines in the cache)
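A one-line Python sketch of this rule, assuming the 128-line cache used in the example further below:

```python
NUM_CACHE_LINES = 128                       # assumed cache size
def cache_line(block_number):
    return block_number % NUM_CACHE_LINES   # block k maps to line k mod 128

print(cache_line(5), cache_line(133), cache_line(261))   # 5 5 5 -> all contend for line 5
```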
Mapping Functions
The mapping functions are used to map a particular block of main memory to a
particular block of cache. This mapping function is used to transfer the block from
main memory to cache memory. Three different mapping functions are available:
Direct mapping:
A particular block of main memory can be brought to a particular block of cache
memory. So, it is not flexible.
Associative mapping:
In this mapping function, any block of main memory can potentially reside in any cache block position. This is a much more flexible mapping method.
Block-set-associative mapping:
In this method, blocks of cache are grouped into sets, and the mapping allows a block of main memory to reside in any block of a specific set. From the flexibility point of view, it is in between the other two methods.
All these three mapping methods are explained with the help of an example.
Consider a cache of 4096 (4K) words with a block size of 32 words. Therefore, the
cache is organized as 128 blocks. For 4K words, 12 address bits are required. To select one of the 128 blocks, we need 7 bits of address lines, and to select one word out of 32 words, we need 5 bits of address lines. So the total 12 bits of address is divided into two groups: the lower 5 bits are used to select a word within a block, and the higher 7 bits of address are used to select a block of cache memory.
Let us consider a main memory system consisting of 64K words. The size of the address bus is 16 bits. Since the block size of the cache is 32 words, the main memory is also organized with a block size of 32 words. Therefore, the total number of blocks in main memory is 2048 (2K x 32 words = 64K words). To identify any one block of
2K blocks, we need 11 address lines. Out of 16 address lines of main memory,
lower 5 bits are used to select a word within a block and higher 11 bits are used to
select a block out of 2048 blocks.
Number of blocks in cache memory is 128 and number of blocks in main memory
is 2048, so at any instant of time only 128 blocks out of 2048 blocks can reside in
cache memory. Therefore, we need mapping function to put a particular block of
main memory into appropriate block of cache memory.
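The field widths quoted above follow directly from powers of two; this small Python sketch (using only the sizes given in the example) checks the arithmetic:

```python
import math

cache_words = 4 * 1024        # 4K-word cache
main_words  = 64 * 1024       # 64K-word main memory
block_size  = 32              # words per block

word_bits  = int(math.log2(block_size))                  # 5 bits: word within a block
cache_bits = int(math.log2(cache_words // block_size))   # 7 bits: one of 128 cache blocks
mem_bits   = int(math.log2(main_words // block_size))    # 11 bits: one of 2048 memory blocks

print(word_bits, cache_bits, mem_bits)                            # 5 7 11
print(int(math.log2(cache_words)), int(math.log2(main_words)))    # 12-bit and 16-bit addresses
```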
Direct Mapping Technique:
The simplest way of associating main memory blocks with cache block is the
direct mapping technique. In this technique, block k of main memory maps into
block k modulo m of the cache, where m is the total number of blocks in cache. In
this example, the value of m is 128. In direct mapping technique, one particular
block of main memory can be transferred to a particular block of cache which is
derived by the modulo function.
Since more than one main memory block is mapped onto a given cache block
position, contention may arise for that position. This situation may occur even when the cache is not full. Contention is resolved by allowing the new block to
overwrite the currently resident block. So the replacement algorithm is trivial.
The detailed operation of the direct mapping technique is as follows:
The main memory address is divided into three fields. The field size depends on
the memory capacity and the block size of the cache. In this example, the lower 5 bits of the address are used to identify a word within a block. The next 7 bits are used to select a block out of 128 blocks (which is the capacity of the cache). The remaining 4 bits
are used as a TAG to identify the proper block of main memory that is mapped to
cache.
When a new block is first brought into the cache, the high order 4 bits of the main
memory address are stored in four TAG bits associated with its location in the
cache. When the CPU generates a memory request, the 7-bit block address
determines the corresponding cache block. The TAG field of that block is
compared to the TAG field of the address. If they match, the desired word
specified by the low-order 5 bits of the address is in that block of the cache.
If there is no match, the required word must be accessed from the main memory; that is, the contents of that block of the cache are replaced by the new block specified by the new address generated by the CPU, and correspondingly the TAG bits are also changed to the high-order 4 bits of the address. The whole arrangement for the direct mapping technique is shown in the figure below.
Figure: Associative Mapping Cache
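The following Python sketch shows how a 16-bit address is split into the TAG, block, and word fields described above and how the TAG comparison decides hit or miss in a direct-mapped cache; the data structures and sample addresses are illustrative only:

```python
# Direct mapping: 16-bit address = 4-bit TAG | 7-bit cache block | 5-bit word
tags  = [None] * 128          # TAG stored with each of the 128 cache blocks
valid = [False] * 128

def split(address):
    word  =  address        & 0x1F   # low 5 bits
    block = (address >> 5)  & 0x7F   # next 7 bits
    tag   = (address >> 12) & 0xF    # high 4 bits
    return tag, block, word

def access(address):
    tag, block, word = split(address)
    if valid[block] and tags[block] == tag:
        return "hit"
    valid[block], tags[block] = True, tag    # replace the resident block (trivial policy)
    return "miss"

print(access(0x1234), access(0x1234), access(0xF234))   # miss hit miss (same block, new TAG)
```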
Block-Set-Associative Mapping Technique:
This mapping technique is intermediate to the previous two techniques. Blocks of
the cache are grouped into sets, and the mapping allows a block of main memory
to reside in any block of a specific set. Therefore, the flexibility of associative
mapping is reduced from full freedom to a set of specific blocks. This also reduces the searching overhead, because the search is restricted to the blocks of one set instead of all of the blocks. Also, the contention problem of direct mapping is eased by having a few choices for block replacement.
Consider the same cache memory and main memory organization of the previous
example. Organize the cache with 4 blocks in each set. The TAG field of the associative mapping technique is divided into two groups: one is termed the SET field and the second one the TAG field. Since each set contains 4 blocks, the total number of sets is 32. The main memory address is grouped into three parts: the low-order 5 bits are used to identify a word within a block. Since there are 32 sets in total, the next 5 bits are used to identify the set. The high-order 6 bits are used as TAG bits.
The 5-bit set field of the address determines which set of the cache might contain the desired block. This is similar to the direct mapping technique: in direct mapping the address selects a block, while in block-set-associative mapping it selects a set. The TAG field of the address must then be compared with the TAGs of the four blocks of that set. If a match occurs, then the block is present in the cache; otherwise, the block containing the addressed word must be brought to the cache. This block can be placed only in the corresponding set.
Since there are four blocks in the set, we have to choose appropriately which block is to be replaced if all the blocks are occupied. Since the search is restricted to four blocks only, the searching complexity is reduced. The whole arrangement of the block-set-associative mapping technique is shown in the figure below.
It is clear that if we increase the number of blocks per set, then the number of
bits in SET field is reduced. Due to the increase of blocks per set, complexity of
search is also increased. The extreme condition of 128 blocks per set requires no
set bits and corresponds to the fully associative mapping technique with 11 TAG
bits. The other extreme of one block per set is the direct mapping method.
Figure: Block-set-associative mapping cache with 4 blocks per set
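A Python sketch of the address split and the set-restricted search for this 4-way arrangement; the field widths follow the example above, while the data structures are illustrative:

```python
# Block-set-associative mapping: 16-bit address = 6-bit TAG | 5-bit SET | 5-bit word
NUM_SETS, WAYS = 32, 4
sets = [[None] * WAYS for _ in range(NUM_SETS)]   # each entry holds a TAG (or None)

def access(address):
    set_index = (address >> 5)  & 0x1F    # 5 SET bits
    tag       = (address >> 10) & 0x3F    # 6 TAG bits
    ways = sets[set_index]
    if tag in ways:                       # search only the 4 blocks of this set
        return "hit"
    victim = ways.index(None) if None in ways else 0   # replacement policy plugs in here
    ways[victim] = tag
    return "miss"

print(access(0x1234), access(0x1234))   # miss hit
```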
Replacement Algorithms
When a new block must be brought into the cache and all the positions that it
may occupy are full, a decision must be made as to which of the old blocks is to
be overwritten. In general, a policy is required to keep blocks in the cache when they are likely to be referenced in the near future. However, it is not easy to determine directly which of the blocks in the cache are about to be referenced. The property of locality of reference gives some clue for designing a good replacement policy.
Least Recently Used (LRU) Replacement policy:
Since programs usually stay in localized areas for reasonable periods of time, it can be assumed that there is a high probability that blocks which have been referenced recently will also be referenced in the near future. Therefore, when a block is to be overwritten, it is a good decision to overwrite the one that has gone the longest time without being referenced. This is defined as the least recently used (LRU) block. Keeping track of the LRU block must be done as computation proceeds.
Consider a specific example of a four-block set. It is required to track the LRU
block of this four-block set. A 2-bit counter may be used for each block.
When a hit occurs, that is, when a read request is received for a word that is in the cache, the counter of the block that is referenced is set to 0. All counters whose values were originally lower than the referenced block's are incremented by 1, and all other counters remain unchanged.
When a miss occurs, that is, when a read request is received for a word and the
word is not present in the cache, we have to bring the block to cache.
There are two possibilities in case of a miss:
1. If the set is not full, the counter associated with the new block loaded from
the main memory is set to 0, and the values of all other counters are
incremented by 1.
2. If the set is full and a miss occurs, the block with the counter value 3 is
removed, and the new block is put in its place. The counter value is set to
zero. The other three block counters are incremented by 1.
It is easy to verify that the counter values of the occupied blocks are always distinct. Also, it follows that the highest counter value indicates the least recently used block.
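The 2-bit-counter bookkeeping described above can be sketched in a few lines of Python for a single four-block set; the block names and access sequence are illustrative:

```python
# LRU with 2-bit counters for one 4-block set: the block whose counter is 3
# is the least recently used and is the one replaced on a miss to a full set.
blocks   = [None, None, None, None]   # which memory block occupies each cache block
counters = [0, 0, 0, 0]

def reference(block):
    if block in blocks:                          # hit
        i = blocks.index(block)
        for j in range(4):                       # bump counters originally lower than the hit one
            if counters[j] < counters[i]:
                counters[j] += 1
        counters[i] = 0
    elif None in blocks:                         # miss, set not full
        i = blocks.index(None)
        for j in range(4):
            if blocks[j] is not None:
                counters[j] += 1
        blocks[i], counters[i] = block, 0
    else:                                        # miss, set full: evict the block with counter 3
        i = counters.index(3)
        for j in range(4):
            counters[j] += 1
        blocks[i], counters[i] = block, 0

for b in ["A", "B", "C", "D", "A", "E"]:
    reference(b)
print(blocks)   # 'B', the least recently used block, has been replaced by 'E'
```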
First In First Out (FIFO) replacement policy:
A reasonable rule is to remove the oldest block from a full set when a new block must be brought in. With this technique, no update is required when a hit occurs. When a miss occurs and the set is not full, the new block is put into an empty block and the counter values of the occupied blocks are incremented by one. When a miss occurs and the set is full, the block with the highest counter value is replaced by the new block, its counter is set to 0, and the counter values of all other blocks of that set are incremented by 1. The overhead of this policy is low, since no update is required on a hit.
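FIFO needs even less bookkeeping; a queue captures it directly (an illustrative sketch for one four-block set):

```python
from collections import deque

WAYS = 4
fifo = deque()                       # oldest block at the left

def reference(block):
    if block in fifo:                # hit: nothing to update under FIFO
        return "hit"
    if len(fifo) == WAYS:            # set full: evict the oldest block
        fifo.popleft()
    fifo.append(block)               # newest block goes to the back
    return "miss"

for b in ["A", "B", "C", "D", "A", "E"]:
    reference(b)
print(list(fifo))   # ['B', 'C', 'D', 'E'] -- 'A' was evicted even though it was just used
```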
Random replacement policy:
The simplest algorithm is to choose the block to be overwritten at random.
Interestingly enough, this simple algorithm has been found to be very effective in
practice.
Introduction: Cache Reads
So far, we've traced sequences of memory addresses that work as follows, if you'll
let me anthropomorphize a little bit:
1. The processor asks the memory subsystem, "Hey, do you have the data at
Address XXX?"
2. The L1 cache tackles this question first by checking the valid bit and tag of whatever block(s) could possibly contain Address XXX's data. If it doesn't have the data, the request is passed further down the hierarchy.
3. Eventually, the data makes its way from some other level of the hierarchy to both the processor that requested it and the L1 cache.
4. The L1 cache then stores the new data, possibly replacing some old data in that cache block, on the hypothesis that temporal locality is king and the new data is more likely to be accessed soon than the old data was.
Throughout this process, we make some sneaky implicit assumptions that are
valid for reads but questionable for writes. We will label them Sneaky
Assumptions 1 and 2:
Sneaky assumption 1: Bringing data into the L1 (or L2, or whatever) just
means making a copy of the version in main memory. If we lose this copy,
we still have the data somewhere.
Sneaky assumption 2: If the request is a load, the processor has asked the
memory subsystem for some data. In order to fulfill this request, the
memory subsystem absolutely must go chase that data down, wherever it
is, and bring it back to the processor.
Sneaky assumption 1: Let's think about the data that's being replaced (the
technical term is evicted) when we bring in the new data. If some of the
accesses to the old data were writes, it's at least possible that the version
of the old data in our cache is inconsistent with the versions in lower levels
of the hierarchy. We would want to be sure that the lower levels know
about the changes we made to the data in our cache before just
overwriting that block with other stuff.
Sneaky assumption 2: If the request is a store, the processor is just asking
the memory subsystem to keep track of something -- it doesn't need any
information back from the memory subsystem. So the memory subsystem
has a lot more latitude in how to handle write misses than read misses.
In short, cache writes present both challenges and opportunities that reads don't,
which opens up a new set of design decisions.
Oh no! Now your version of the data at Address XXX is inconsistent with the version in subsequent levels of the memory hierarchy (L2, L3, main memory...)! Since you care about preserving correctness, you have only two real options: write-through or write-back.
With write-through, every time you see a store instruction, that means you need
to initiate a write to L2. In order to be absolutely sure you're consistent with L2 at
all times, you need to wait for this write to complete, which means you need to
pay the access time for L2.
What this means is that a write hit actually acts like a miss, since you'll need to
access L2 (and possibly other levels too, depending on what L2's write policy is
and whether the L2 access is a hit or miss).
Instead of sitting around until the L2 write has fully completed, you add a little bit
of extra storage to L1 called a write buffer. The write buffer's job is to keep track
of all the pending updates to L2, so that L1 can move on with its life.
The bottom line: from a performance perspective, we'll treat a write hit to a
write-through cache like a read hit, as long as the write buffer has available space.
When the write buffer is full, we'll treat it more like a read miss (since we have to
wait to hand the data off to the next level of cache).
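Here is a rough Python sketch of the write-through-plus-write-buffer idea; the buffer size and the placeholder L2 write are assumptions made for illustration:

```python
from collections import deque

WRITE_BUFFER_SIZE = 4
write_buffer = deque()    # pending (address, data) updates destined for L2
l1 = {}                   # L1 cache contents

def store(address, data):
    l1[address] = data                        # update L1 (assume a write hit)
    if len(write_buffer) == WRITE_BUFFER_SIZE:
        drain_one_entry_to_l2()               # buffer full: this store stalls, like a miss
    write_buffer.append((address, data))      # otherwise the store costs about as much as a hit

def drain_one_entry_to_l2():
    address, data = write_buffer.popleft()    # in the background, L2 absorbs the pending writes
    # ... the (slow) L2 write would happen here ...

for a, d in [(1, "x"), (2, "y"), (3, "z")]:
    store(a, d)
print(len(write_buffer))   # 3 pending L2 writes; L1 already holds the new values
```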
As long as we're getting write hits to a particular block, we don't tell L2 anything.
Instead, we just set a bit of L1 metadata (the dirty bit -- technical term!) to
indicate that this block is now inconsistent with the version in L2.
So everything is fun and games as long as our accesses are hits. The problem is
whenever we have a miss -- even if it's a read miss -- and the block that's being
replaced is dirty.
Whenever we have a miss to a dirty block and bring in new data, we actually have
to make two accesses to L2 (and possibly lower levels):
One to let it know about the modified data in the dirty block. We'll treat
this like an L1 miss penalty.
Another to fetch the actual missed data. We'll treat this like a second miss
penalty.
What this means is that some fraction of our misses -- the ones that overwrite
dirty data -- now have this outrageous double miss penalty.
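A sketch of the dirty-bit bookkeeping for a write-back cache, reduced to a single cache block and a simple L2 access counter (both are illustrative simplifications):

```python
# Write-back sketch: writes only set the dirty bit locally; L2 hears about the
# modified data only when the dirty block is evicted.
block = {"tag": None, "data": None, "dirty": False}
L2_ACCESSES = 0

def miss(new_tag, new_data):
    global L2_ACCESSES
    if block["dirty"]:
        L2_ACCESSES += 1          # first access: write the dirty block back to L2
    L2_ACCESSES += 1              # second access: fetch the newly requested block
    block.update(tag=new_tag, data=new_data, dirty=False)

def write_hit(new_data):
    block["data"] = new_data      # no L2 traffic at all...
    block["dirty"] = True         # ...but the block is now inconsistent with L2

miss("A", 0)          # clean miss: 1 L2 access
write_hit(42)         # hit: dirty bit set, no L2 access
miss("B", 7)          # miss to a dirty block: 2 L2 accesses (the double miss penalty)
print(L2_ACCESSES)    # 3
```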
Again, pretend (without loss of generality) that you're an L1 cache. You get a write
request from the processor. Your only obligation to the processor is to make sure
that the subsequent read requests to this address see the new value rather than
the old one. That's it.
If this write request happens to be a hit, you'll handle it according to your write
policy (write-back or write-through), as described above. But what if it's a miss?
As long as someone hears about this data, you're not actually obligated to
personally make room for it in L1. You can just pass it to the next level without
storing it yourself.
(As a side note, it's also possible to refuse to make room for the new data on a
read miss. But that requires you to be pretty smart about which reads you want
to cache and which reads you want to send to the processor without storing in L1.
That's a very interesting question but beyond the scope of this class.)
So you have two basic choices: make room for the new data on a write miss, or
don't.
Write-allocate
A write-allocate cache makes room for the new data on a write miss, just like it
would on a read miss.
Here's the tricky part: if cache blocks are bigger than the amount of data requested, now you have a dilemma. Do you go ask L2 for the data in the rest of the block (which you don't even need yet!), or not? This leads to yet another design decision: fetch-on-write caches go get the rest of the block from the next level, while no-fetch-on-write caches store only the newly written data and keep track of which parts of the block are valid.
In this class, I won't ask you about the sizing or performance of no-fetch-on-write caches. I might ask you conceptual questions about them, though.
No-write-allocate
This is just what it sounds like! If you have a write miss in a no-write-
allocate cache, you simply notify the next level down (similar to a write-through
operation). You don't kick anything out of your own cache.
Generally, write-allocate makes more sense for write-back caches and no-write-
allocate makes more sense for write-through caches, but the other combinations
are possible too.
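A small Python sketch contrasting the two write-miss policies; the dictionary cache and the stand-in L2 write are illustrative assumptions:

```python
l1 = {}   # L1 contents: address -> data

def write_to_l2(address, data):
    pass   # stand-in for the (slower) next-level write

def write_miss(address, data, write_allocate):
    if write_allocate:
        l1[address] = data            # make room and keep the data, as a read miss would
        # (a write-back cache would also mark the block dirty here)
    else:
        write_to_l2(address, data)    # no-write-allocate: notify the next level, keep nothing

write_miss(0x10, 99, write_allocate=True)
write_miss(0x20, 77, write_allocate=False)
print(0x10 in l1, 0x20 in l1)   # True False
```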
Instruction Cycle
From the designer's point of view, the machine instruction set provides the
functional requirements for the CPU: Implementing the CPU is a task that in large
part involves implementing the machine instruction set.
From the user's side, the user who chooses to program in machine language
(actually, in assembly language) becomes aware of the register and memory
structure, the types of data directly supported by the machine, and the
functioning of the ALU.
Elements of an Instruction
Each instruction must have elements that contain the information required by the
CPU for execution. These elements are as follows:
Operation code: Specifies the operation to be performed (for example, ADD or I/O).
Source operand reference: The operation may involve one or more source operands, that is, operands that are inputs for the operation.
Result operand reference: The operation may produce a result.
Next instruction reference: This tells the CPU where to fetch the next instruction after the execution of this instruction is complete.
The next instruction to be fetched is located in main memory or, in the case of a
virtual memory system, in either main memory or secondary memory (disk). In
most cases, the next instruction to be fetched immediately follows the current
instruction. In those cases, there is no explicit reference to the next instruction.
Source and result operands can be in one of three areas:
Main or virtual memory: As with next instruction references, the main or
virtual memory address must be supplied.
CPU register: With rare exceptions, a CPU contains one or more registers
that may be referenced by machine instructions. If only one register exists,
reference to it may be implicit. If more than one register exists, then each
register is assigned a unique number, and the instruction must contain the
number of the desired register.
I/O device: The instruction must specify the I/O module and device for the operation. If memory-mapped I/O is used, this is just another main or virtual memory address.
It is difficult for both the programmer and the reader of textbooks to deal with
binary representations of machine instructions. Thus, it has become common
practice to use a symbolic representation of machine instructions. Opcodes are
represented by abbreviations, called mnemonics, that indicate the operation.
Common examples include
ADD Add
SUB Subtract
MPY Multiply
DIV Divide
LOAD Load data from memory
ADD R, Y
may mean add the value contained in data location Y to the contents of register R.
In this example, Y refers to the address of a location in memory, and R refers to a
particular register. Note that the operation is performed on the contents of a
location, not on its address.
X = X+Y
This statement instructs the computer to add the value stored in Y to the value stored in X and put the result in X. How might this be accomplished with machine instructions? Let us assume that the variables X and Y correspond to locations 513 and 514. If we assume a simple set of machine instructions, this operation could be accomplished with three instructions:
1. Load a register with the contents of memory location 513.
2. Add the contents of memory location 514 to the register.
3. Store the contents of the register in memory location 513.
As can be seen, the single BASIC instruction may require three machine
instructions. This is typical of the relationship between a high-level language and
a machine language. A high-level language expresses operations in a concise
algebraic form, using variables. A machine language expresses operations in a
basic form involving the movement of data to or from registers.
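To make the three-instruction sequence concrete, here is a toy single-register machine in Python executing the translation of X = X + Y; the initial memory contents and mnemonics are illustrative assumptions:

```python
# Toy accumulator machine: X lives at location 513, Y at location 514.
memory = {513: 10, 514: 32}   # illustrative initial values of X and Y
AC = 0                        # the single register (accumulator)

program = [("LOAD", 513), ("ADD", 514), ("STORE", 513)]

for opcode, address in program:
    if opcode == "LOAD":
        AC = memory[address]         # 1. load the register with the contents of 513
    elif opcode == "ADD":
        AC = AC + memory[address]    # 2. add the contents of 514 to the register
    elif opcode == "STORE":
        memory[address] = AC         # 3. store the register back into 513

print(memory[513])   # 42 -- X now holds X + Y
```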
With this simple example to guide us, let us consider the types of instructions that
must be included in a practical computer. A computer should have a set of
instructions that allows the user to formulate any data processing task. Another
way to view it is to consider the capabilities of a high-level programming
language. Any program written in a high-level language must be translated into
machine language to be executed. Thus, the set of machine instructions must be
sufficient to express any of the instructions from a high-level language. With this
in mind we can categorize instruction types as follows:
Number of Addresses
Three addresses:
Example: a = b + c
Two addresses:
Example: a = a + b
The two-address format reduces the space requirement but also introduces
some awkwardness. To avoid altering the value of an operand, a MOVE
instruction is used to move one of the values to a result or temporary
location before performing the operation.
One address:
Example: AC = AC + b (the second operand, the accumulator AC, is implicit)
Zero addresses:
Example: a stack machine evaluates a = b + c with PUSH b, PUSH c, ADD, POP a; all operand references are implicit (the top of the stack)
Fewer addresses:
Fewer addresses per instruction yield shorter, more primitive instructions, but more instructions are needed for a given program.
Multiple-address instructions:
Multiple-address formats generally allow operands to be held in registers. Because register references are faster than memory references, this speeds up execution.
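As an illustration of the zero-address (stack) style mentioned above, this short Python sketch evaluates a = b + c using only push and pop operations; the values are illustrative:

```python
# Zero-address style: the operands are implicitly the top elements of the stack.
stack = []
b, c = 3, 4

stack.append(b)           # PUSH b
stack.append(c)           # PUSH c
rhs = stack.pop()         # ADD: pop two operands...
lhs = stack.pop()
stack.append(lhs + rhs)   # ...and push the result
a = stack.pop()           # POP a

print(a)   # 7
```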
Design Decisions
One of the most interesting, and most analyzed, aspects of computer design is instruction set design. The design of an instruction set is very complex, because it
affects so many aspects of the computer system. The instruction set defines many
of the functions performed by the CPU and thus has a significant effect on the
implementation of the CPU. The instruction set is the programmer's means of
controlling the CPU. Thus, programmer requirements must be considered in
designing the instruction set. The most important design issues include the
following:
Operation repertoire: How many and which operations to provide, and how
complex operations should be
Data types: The various types of data upon which operations are performed
Types of Operands
Addresses
Numbers
Characters
Logical data
Numbers
All machine languages include numeric data types. Even in nonnumeric data processing, there is a need for numbers to act as counters, field widths, and so forth.
An important distinction between numbers used in ordinary mathematics and
numbers stored in a computer is that the latter are limited. Thus, the programmer
is faced with understanding the consequences of rounding, overflow and
underflow.
Floating point
Decimal
Characters
A common form of data is text or character strings. While textual data are most
convenient for human beings, they cannot, in character form, be easily stored or
transmitted by data processing and communications systems. Such systems are
designed for binary data. Thus, a number of codes have been devised by which
characters are represented by a sequence of bits. Perhaps the earliest common
example of this is the Morse code. Today, the most commonly used character code is the International Reference Alphabet (IRA), referred to in the United States as the American Standard Code for Information Interchange (ASCII). IRA is
also widely used outside the United States. Each character in this code is
represented by a unique 7-bit pattern, thus, 128 different characters can be
represented. This is a larger number than is necessary to represent printable
characters, and some of the patterns represent control characters. Some of these
control characters have to do with controlling the printing of characters on a
page. Others are concerned with communications procedures. IRA-encoded
characters are almost always stored and transmitted using 8 bits per character.
The eighth bit may be set to 0 or used as a parity bit for error detection. In the
latter case, the bit is set such that the total number of binary 1s in each octet is
always odd (odd parity) or always even (even parity).
Another code used to encode characters is the Extended Binary Coded Decimal
Interchange Code (EBCDIC). EBCDIC is used on IBM S/390 machines. It is an 8-bit
code. As with IRA, EBCDIC is compatible with packed decimal. In the case of
EBCDIC, the codes 11110000 through 11111001 represent the digits 0 through 9.
Logical Data
Normally, each word or other addressable unit (byte, half-word, and so on) is treated as a single unit of data. It is sometimes useful, however, to consider an n-bit unit as consisting of n 1-bit items of data, each item having the value 0 or 1. When data are viewed this way, they are considered to be logical data.
This view is useful, first, when we wish to store an array of Boolean or binary data items in which each item occupies only one bit, and second, when we wish to manipulate the individual bits of a data item.
Types of Operations
The number of different opcodes varies widely from machine to machine.
However, the same general types of operations are found on all machines. A
useful and typical categorization is the following:
Data transfer
Arithmetic
Logical
Conversion
I/O
System control
Transfer of control
Data transfer
As with all instructions with operands, the mode of addressing for each
operand must be specified.
In terms of CPU action, data transfer operations are perhaps the simplest type. If both source and destination are registers, then the CPU simply causes data to be transferred from one register to another; this is an operation internal to the CPU. If one or both operands are in memory, then the CPU must perform some or all of the following actions:
Example:
Arithmetic
Most machines provide the basic arithmetic operations of add, subtract, multiply,
and divide. These are invariably provided for signed integer (fixed-point) numbers. Often they are also provided for floating-point and packed decimal numbers.
Logical
Some of the basic logical operations that can be performed on Boolean or binary
data are AND, OR, NOT, XOR, …
These logical operations can be applied bitwise to n-bit logical data units. For example, if two registers each hold an n-bit pattern, ANDing them produces a result that has a 1 only in the bit positions where both registers hold a 1.
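A short Python sketch of these operations applied bitwise to two 8-bit register values (the values are illustrative):

```python
r1, r2 = 0b10100101, 0b00001111      # illustrative 8-bit register contents

print(f"{r1 & r2:08b}")    # AND -> 00000101 (keeps bits where r2 has a 1)
print(f"{r1 | r2:08b}")    # OR  -> 10101111 (sets bits where either register has a 1)
print(f"{r1 ^ r2:08b}")    # XOR -> 10101010 (flips the bits selected by r2)
print(f"{~r1 & 0xFF:08b}") # NOT -> 01011010 (complement of r1, kept to 8 bits)
```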
Conversion
Conversion instructions are those that change the format or operate on the format of data. An example is converting from decimal to binary.
Input/Output
System Controls
System control instructions are those that can be executed only while the processor is in a certain privileged state or is executing a program in a special privileged area of memory. Typically, these instructions are reserved for the use of the operating system.
Transfer of control
For all of the operation types discussed so far, the next instruction to be performed is the one that immediately follows, in memory, the current instruction. However, a significant fraction of the instructions in any program
have as their function changing the sequence of instruction execution. For these
instructions, the operation performed by the CPU is to update the program
counter to contain the address of some instruction in memory.
Branch instruction
A branch instruction, also called a jump instruction, has as one of its operands the
address of the next instruction to be executed. Most often, the instruction is a
conditional branch instruction. That is, the branch is made (update program
counter to equal address specified in operand) only if a certain condition is met.
Otherwise, the next instruction in sequence is executed (increment program
counter as usual).
Skip instructions
A skip instruction includes an implied address: typically the skip implies that one instruction is to be skipped, so the implied address is the address of the next instruction plus one instruction length. Skips are usually conditional (for example, increment and skip if zero).
Procedure call instructions
The procedure mechanism involves two basic instructions: a call instruction that
branches from the present location to the procedure, and a return instruction
that returns from the procedure to the place from which it was called. Both of
these are forms of branching instructions.
1. A procedure can be called from more than one location.
2. A procedure call can appear in a procedure. This allows the nesting of procedures to an arbitrary depth.
Because we would like to be able to call a procedure from a variety of points, the
CPU must somehow save the return address so that the return can take place
appropriately. There are three common places for storing the return address:
• Register
• Start of the called procedure
• Top of stack
Addressing Modes
The address field or fields in a typical instruction format are relatively small. We
would like to be able to reference a large range of locations in main memory or
for some systems, virtual memory. To achieve this objective, a variety of
addressing techniques has been employed. They all involve some trade-off
between address range and/or addressing flexibility, on the one hand, and the
number of memory references and/or the complexity of address calculation, on
the other. In this section, we examine the most common addressing techniques:
Immediate
Direct
Indirect
Register
Register indirect
Displacement
Immediate Addressing
With immediate addressing, the operand value is present in the instruction itself; no memory reference is required to fetch the operand.
Direct Addressing
With direct addressing, the address field contains the effective address of the operand; only one memory reference is required and no special calculation.
Indirect Addressing
With direct addressing, the length of the address field is usually less than the
word length, thus limiting the address range. One solution is to have the address
field refer to the address of a word in memory, which in turn contains a full-length
address of the operand. This is known as indirect addressing.
Register Addressing
Register addressing is similar to direct addressing. The only difference is that the
address field refers to a register rather than a main memory address.
The advantages of register addressing are that only a small address field is needed in the instruction and that no time-consuming memory references are required.
The disadvantage of register addressing is that the address space is very limited.
Displacement Addressing
Displacement addressing combines the capabilities of direct addressing and register indirect addressing. The effective address is computed as EA = A + (R), where A is a base value held in the address field and (R) is the contents of a referenced register (or vice versa).
Immediate Mode
In this mode, the operand is specified in the instruction itself. An immediate mode
instruction has an operand field rather than the address field.
For example: ADD 7, which says add 7 to the contents of the accumulator; 7 is the operand here.
Register Mode
In this mode the operand is stored in a register, and this register is present in the CPU. The instruction contains the address of the register where the operand is stored.
Advantages
Shorter instructions and faster instruction fetch.
Faster memory access to the operand(s)
Disadvantages
Very limited address space
Using multiple registers helps performance but it complicates the
instructions.
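The following Python sketch contrasts how the operand is located under each of the modes above; the memory and register contents are purely illustrative:

```python
memory    = {100: 200, 200: 7, 300: 7}
registers = {"R1": 200, "R2": 100, "R3": 7}

operand_immediate = 7                              # Immediate: operand is in the instruction
operand_direct    = memory[300]                    # Direct: address field holds the operand address
operand_indirect  = memory[memory[100]]            # Indirect: address field points to the address
operand_register  = registers["R3"]                # Register: operand is held in a register
operand_reg_indir = memory[registers["R1"]]        # Register indirect: register holds the address
operand_displ     = memory[100 + registers["R2"]]  # Displacement: EA = A + (R)

print(operand_immediate, operand_direct, operand_indirect,
      operand_register, operand_reg_indir, operand_displ)   # 7 7 7 7 7 7
```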
Instruction Cycle
An instruction cycle, also known as the fetch-decode-execute cycle, is the basic operational process of a computer. This process is repeated continuously by the CPU from boot-up to shutdown of the computer.
Following are the steps that occur during an instruction cycle:
1. Fetch: the instruction is read from memory at the address held in the program counter, and the program counter is then incremented.
2. Decode: the instruction is decoded to determine the operation to perform and the operands involved.
3. Execute: any required operands are fetched, the operation is carried out, and the result is stored.
The cycle is then repeated by fetching the next instruction. Thus, in this way the
instruction cycle is repeated continuously.
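A toy Python sketch of the repeating fetch-decode-execute loop; the instruction encoding and memory layout are illustrative assumptions:

```python
# Toy fetch-decode-execute loop over a tiny program held in "memory".
memory = {0: ("LOAD", 10), 1: ("ADD", 11), 2: ("STORE", 12), 3: ("HALT", None),
          10: 5, 11: 6, 12: 0}
pc, ac, running = 0, 0, True

while running:
    instruction = memory[pc]       # FETCH the instruction at the program counter
    pc += 1                        # advance the program counter
    opcode, address = instruction  # DECODE it
    if opcode == "LOAD":           # EXECUTE
        ac = memory[address]
    elif opcode == "ADD":
        ac += memory[address]
    elif opcode == "STORE":
        memory[address] = ac
    elif opcode == "HALT":
        running = False

print(memory[12])   # 11 -- the result, written back to memory
```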