Proj CA MITE-F18-015
Floppy Disk
A floppy disk is a magnetic storage medium for computer systems. The floppy disk
is composed of a thin, flexible magnetic disk sealed in a square plastic carrier. In
order to read and write data from a floppy disk, a computer system must have a
floppy disk drive (FDD). A floppy disk is also referred to simply as a floppy. From the early days of personal computing, floppy disks were widely used to distribute software, transfer files, and create backup copies of data. When hard drives were still very expensive, floppy disks were also used to store a computer's operating system.
A number of different types of floppy disks were developed over the years; the disks became physically smaller while their storage capacity increased. However, in the 1990s, other media, including hard disk drives, ZIP drives, optical drives, and USB flash drives, started to replace floppy disks as the primary storage medium.
Hard Disk
Hard disk drives have been the dominant type of storage since the early days of
computers. A hard disk drive consists of a rigid disc made with non-magnetic
material, which is coated with a thin layer of magnetic material. Data is stored by
magnetizing this thin film. The disk spins at a high speed and a magnetic head
mounted on a moving arm is used to read and write data. A typical hard disk drive operates at a speed of 7,200 rpm (revolutions per minute), so you will often see this number as part of the technical specifications of a computer. The spinning of the disk is also the source of the humming noise of a computer, although most modern hard disk drives are fairly quiet.
Magnetic Tape
'Tape is dead! Long live tape!' Were you around in the 80s when cassette tapes
were all the rage? People still say 'mixtape' sometimes when referring to playlists they make or CDs they give each other. Though the cassette tape has
fallen out of favor, it was neither the first nor the last device to use magnetic tape
for storage.
A magnetic tape, in computer terminology, is a storage medium that allows for
data archiving, collection, and backup. At first, the tapes were wound in wheel-
like reels, but then cassettes and cartridges came along, which offered more
protection for the tape inside.
Optical Storage
About that time, optical devices were starting to be marketed. An optical storage device is written and read with a laser. It is durable and can handle temperature fluctuations much better than magnetic media. Because the floppy was so inexpensive at the time, it took several years before optical drives became affordable to the general and small-business consumer.
1997: DVD-ROM
Soon after DVDs were released for video, they were used to store data. On a DVD (digital versatile disc), the pits and lands are shorter, which allows for a capacity of up to 4.3 GB. DVDs also make use of a second data layer between the reflective layer and the substrate, which boosts speed and storage capacity.
2003: Blu-ray
Blu-ray discs look like CDs and DVDs but can store high-definition (HD) video and offer even more capacity. What gives them the ability to store more is the shorter-wavelength blue laser used to read and write them, which also gives the format its name.
Static Random Access Memory (Static RAM or SRAM) is a type of RAM that holds
data in a static form, that is, as long as the memory has power. Unlike dynamic
RAM, it does not need to be refreshed.
SRAM stores a bit of data on four transistors that form two cross-coupled inverters. The two stable states of this circuit represent 0 and 1. During read and write operations, another two access transistors are used to manage access to the memory cell, so storing one memory bit requires six metal-oxide-semiconductor field-effect transistors (MOSFETs). MOSFET-based cells are one of the two types of SRAM chips; the other type uses bipolar junction transistors. Bipolar junction transistors are very fast but consume a lot of energy, so MOSFET is the more common SRAM type.
The term is pronounced "S-RAM", not "sram."
Dynamic random access memory (DRAM) is a type of random-access memory used in computing devices (primarily PCs). DRAM stores each bit of data in a separate capacitor within an integrated circuit. The charge state of each capacitor represents one bit with the value 0 or 1. The capacitor needs to be refreshed often, otherwise the information fades. DRAM has one capacitor and one transistor per bit, as opposed to static random access memory (SRAM), which requires six transistors. The capacitors and transistors that are used are exceptionally small; millions of them fit on a single memory chip.
A memory unit is the collection of storage units or devices taken together. The memory unit stores binary information in the form of bits. Generally, memory/storage is classified into two categories:
Volatile memory: This loses its data when power is switched off.
Non-volatile memory: This is permanent storage and does not lose any data when power is switched off.
Memory Hierarchy
Main Memory
The memory unit that communicates directly with the CPU, auxiliary memory and cache memory is called main memory. It is the central storage unit of the computer system. It is a large and fast memory used to store data during computer operations. Main memory is made up of RAM and ROM, with RAM integrated circuit chips holding the major share.
Auxiliary Memory
Devices that provide backup storage are called auxiliary memory. For
example: Magnetic disks and tapes are commonly used auxiliary devices. Other
devices used as auxiliary memory are magnetic drums, magnetic bubble memory
and optical disks.
It is not directly accessible to the CPU, and is accessed using the Input/Output
channels.
Cache Memory
The data or contents of the main memory that are used again and again by the CPU are stored in the cache memory so that the CPU can access that data in a shorter time.
Whenever the CPU needs to access memory, it first checks the cache memory. If the data is not found in cache memory, the CPU moves on to the main memory. It also transfers a block of recently used data into the cache and keeps deleting old data in the cache to accommodate the new data.
Hit Ratio
The performance of cache memory is measured in terms of a quantity called hit
ratio. When the CPU refers to memory and finds the word in the cache, it is said to produce a hit. If the word is not found in the cache and must instead be fetched from main memory, it counts as a miss.
The ratio of the number of hits to the total CPU references to memory is called hit
ratio.
Hit Ratio = Hit/(Hit + Miss)
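For illustration, here is a tiny Python sketch that computes the hit ratio from assumed hit and miss counts (the counts are made up):

```python
# Hit ratio = hits / (hits + misses)
hits, misses = 950, 50                    # illustrative counts only
hit_ratio = hits / (hits + misses)
print(f"hit ratio = {hit_ratio:.2f}")     # 0.95
```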
Associative Memory
It is also known as content addressable memory (CAM). It is a memory chip in which each bit position can be compared. The content is compared in every bit cell simultaneously, which allows very fast table lookup. Since the entire chip can be compared in parallel, contents can be stored in any location without regard to an addressing scheme. These chips have less storage capacity than regular memory chips.
Semiconductor Memory Types & Technologies
The use of semiconductor memory has grown, and the capacity of these memory devices has increased as the need for larger and larger amounts of storage has grown.
To meet the growing needs for semiconductor memory, there are many types and
technologies that are used. As the demand grows new memory technologies are
being introduced and the existing types and technologies are being further
developed.
Terms like DDR3, DDR4, DDR5 and many more are seen and these refer to
different types of SDRAM semiconductor memory.
In addition to this the semiconductor devices are available in many forms - ICs for
printed board assembly, USB memory cards, Compact Flash cards, SD memory
cards and even solid state hard drives. Semiconductor memory is even
incorporated into many microprocessor chips as on-board memory.
There are two main types or categories that can be used for semiconductor technology. These memory types or categories differentiate the memory according to the way in which it operates:
RAM - Random Access Memory: As the name suggests, RAM or random access memory is a form of semiconductor memory technology that is used for reading and writing data in any order - in other words, as it is required by the processor. It is used for applications such as computer or processor memory, where variables and other data are stored and are required on a random basis. Data is stored and read many times to and from this type of memory.
ROM - Read Only Memory: A ROM is a form of semiconductor memory technology used where the data is written once and then not changed. In view of this, it is used where data needs to be retained permanently, even when the power is removed.
Each of the semiconductor memory technologies outlined below falls into one of these two categories. Each technology offers its own advantages and is used in a particular way, or for a particular application.
There is a large variety of types of ROM and RAM that are available. Often the
overall name for the memory technology includes the initials RAM or ROM and
this gives a guide as to the overall type of format for the memory.
With technology moving forwards apace, not only are the established
technologies moving forwards with SDRAM technology moving from DDR3 to
DDR4 and then to DDR5, but Flash memory used in memory cards is also
developing as are the other technologies.
In addition to this, new memory technologies are arriving on the scene and they
are starting to make an impact in the market, enabling processor circuits to
perform more effectively.
However, these capacitors do not hold their charge indefinitely, and therefore the data needs to be refreshed periodically. As a result of this dynamic refreshing, it gains its name of dynamic RAM. DRAM is the form of semiconductor memory that is often used in equipment including personal computers and workstations, where it forms the main RAM for the computer.
The semiconductor devices are normally available as integrated circuits for use
in PCB assembly in the form of surface mount devices or less frequently now as
leaded components.
The PROM stores its data as a charge on a capacitor. There is a charge storage capacitor for each cell, and this can be read repeatedly as required. However, it is found that after many years the charge may leak away and the data may be lost.
Cache is used because bulk, or main, storage can't keep up with the demands of
the cache clients. Cache shortens data access times, reduces latency and
improves input/output (I/O). Because almost all application workloads depend on
I/O operations, caching improves application performance.
How cache works
When a cache client needs to access data, it first checks the cache. When the
requested data is found in a cache, it's called a cache hit. The percent of attempts
that result in cache hits is known as the cache hit rate or ratio.
If the requested data isn't found in the cache -- a situation known as a cache miss
-- it is pulled from main memory and copied into the cache. How this is done, and
what data is ejected from the cache to make room for the new data, depends on
the caching algorithm or policies the system uses.
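To make that flow concrete, here is a minimal Python sketch of a cache client that checks a small cache first and falls back to main memory on a miss; the dictionary-based cache, its capacity, and the oldest-first eviction rule are illustrative assumptions, not a description of any particular hardware:

```python
# Minimal cache-lookup sketch: check the cache, fall back to main memory on a miss.
main_memory = {addr: addr * 10 for addr in range(100)}   # pretend backing store
cache = {}            # address -> data
CAPACITY = 4          # deliberately tiny cache
hits = misses = 0

def read(addr):
    global hits, misses
    if addr in cache:                  # cache hit
        hits += 1
        return cache[addr]
    misses += 1                        # cache miss: fetch from main memory
    data = main_memory[addr]
    if len(cache) >= CAPACITY:         # make room: evict per some policy
        cache.pop(next(iter(cache)))   # here, simply the oldest inserted entry
    cache[addr] = data                 # copy the new data into the cache
    return data

for a in [1, 2, 1, 3, 4, 5, 1]:
    read(a)
print(f"hit ratio = {hits / (hits + misses):.2f}")
```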
The three different types of mapping used for cache memory are as follows: associative mapping, direct mapping and set-associative mapping.
- Associative mapping: In this type of mapping, the associative memory is used to store both the content and the address of the memory word. This enables the placement of any word at any place in the cache memory. It is considered to be the fastest and the most flexible mapping form.
- Direct mapping: In direct mapping, main memory (RAM) is used to store the data and some of it is copied into the cache. Each address is split into two parts, an index field and a tag field. The cache stores the tag field, while the index field is used to select the cache location. Direct mapping's performance is directly proportional to the hit ratio.
1. Direct Mapping
2. Fully Associative Mapping
3. K-way Set Associative Mapping
1. Direct Mapping
In direct mapping, a particular block of main memory can map only to a particular line of the cache. The line number of the cache to which a particular block can map is given by:
Cache line number = (Main memory block address) modulo (Number of lines in the cache)
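A one-line Python sketch of this rule, assuming the 128-line cache used in the example further below:

```python
NUM_CACHE_LINES = 128                       # assumed cache size
def cache_line(block_number):
    return block_number % NUM_CACHE_LINES   # block k maps to line k mod 128

print(cache_line(5), cache_line(133), cache_line(261))   # 5 5 5 -> all contend for line 5
```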
Mapping Functions
The mapping functions are used to map a particular block of main memory to a
particular block of cache. This mapping function is used to transfer the block from
main memory to cache memory. Three different mapping functions are available:
Direct mapping:
A particular block of main memory can be brought to a particular block of cache
memory. So, it is not flexible.
Associative mapping:
In this mapping function, any block of main memory can potentially reside in any cache block position. This is a much more flexible mapping method.
Block-set-associative mapping:
In this method, blocks of cache are grouped into sets, and the mapping allows a block of main memory to reside in any block of a specific set. From the flexibility point of view, it is in between the other two methods.
All these three mapping methods are explained with the help of an example.
Consider a cache of 4096 (4K) words with a block size of 32 words. Therefore, the
cache is organized as 128 blocks. For 4K words, 12 address bits are required. To select one of the 128 blocks, we need 7 bits of address lines, and to select one word out of 32 words, we need 5 bits of address lines. So the total 12 bits of address is divided into two groups: the lower 5 bits are used to select a word within a block, and the higher 7 bits of address are used to select a block of cache memory.
Let us consider a main memory system consisting of 64K words. The size of the address bus is 16 bits. Since the block size of the cache is 32 words, the main memory is also organized with a block size of 32 words. Therefore, the total number of blocks in main memory is 2048 (2K x 32 words = 64K words). To identify any one block of
2K blocks, we need 11 address lines. Out of 16 address lines of main memory,
lower 5 bits are used to select a word within a block and higher 11 bits are used to
select a block out of 2048 blocks.
Number of blocks in cache memory is 128 and number of blocks in main memory
is 2048, so at any instant of time only 128 blocks out of 2048 blocks can reside in
cache memory. Therefore, we need mapping function to put a particular block of
main memory into appropriate block of cache memory.
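The field widths quoted above follow directly from powers of two; this small Python sketch (using only the sizes given in the example) checks the arithmetic:

```python
import math

cache_words = 4 * 1024        # 4K-word cache
main_words  = 64 * 1024       # 64K-word main memory
block_size  = 32              # words per block

word_bits  = int(math.log2(block_size))                  # 5 bits: word within a block
cache_bits = int(math.log2(cache_words // block_size))   # 7 bits: one of 128 cache blocks
mem_bits   = int(math.log2(main_words // block_size))    # 11 bits: one of 2048 memory blocks

print(word_bits, cache_bits, mem_bits)                            # 5 7 11
print(int(math.log2(cache_words)), int(math.log2(main_words)))    # 12-bit and 16-bit addresses
```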
Direct Mapping Technique:
The simplest way of associating main memory blocks with cache block is the
direct mapping technique. In this technique, block k of main memory maps into
block k modulo m of the cache, where m is the total number of blocks in cache. In
this example, the value of m is 128. In direct mapping technique, one particular
block of main memory can be transferred to a particular block of cache which is
derived by the modulo function.
Since more than one main memory block is mapped onto a given cache block
position, contention may arise for that position. This situation may occur even when the cache is not full. Contention is resolved by allowing the new block to
overwrite the currently resident block. So the replacement algorithm is trivial.
The detailed operation of the direct mapping technique is as follows:
The main memory address is divided into three fields. The field size depends on
the memory capacity and the block size of the cache. In this example, the lower 5 bits of the address are used to identify a word within a block. The next 7 bits are used to select a block out of 128 blocks (which is the capacity of the cache). The remaining 4 bits
are used as a TAG to identify the proper block of main memory that is mapped to
cache.
When a new block is first brought into the cache, the high order 4 bits of the main
memory address are stored in four TAG bits associated with its location in the
cache. When the CPU generates a memory request, the 7-bit block address
determines the corresponding cache block. The TAG field of that block is
compared to the TAG field of the address. If they match, the desired word
specified by the low-order 5 bits of the address is in that block of the cache.
If there is no match, the required word must be accessed from the main memory; that is, the contents of that block of the cache are replaced by the new block specified by the new address generated by the CPU, and correspondingly the TAG bits are also changed to the high-order 4 bits of the address. The whole arrangement for the direct mapping technique is shown in the figure below.
Figure: Associative Mapping Cache
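The following Python sketch shows how a 16-bit address is split into the TAG, block, and word fields described above and how the TAG comparison decides hit or miss in a direct-mapped cache; the data structures and sample addresses are illustrative only:

```python
# Direct mapping: 16-bit address = 4-bit TAG | 7-bit cache block | 5-bit word
tags  = [None] * 128          # TAG stored with each of the 128 cache blocks
valid = [False] * 128

def split(address):
    word  =  address        & 0x1F   # low 5 bits
    block = (address >> 5)  & 0x7F   # next 7 bits
    tag   = (address >> 12) & 0xF    # high 4 bits
    return tag, block, word

def access(address):
    tag, block, word = split(address)
    if valid[block] and tags[block] == tag:
        return "hit"
    valid[block], tags[block] = True, tag    # replace the resident block (trivial policy)
    return "miss"

print(access(0x1234), access(0x1234), access(0xF234))   # miss hit miss (same block, new TAG)
```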
Block-Set-Associative Mapping Technique:
This mapping technique is intermediate to the previous two techniques. Blocks of
the cache are grouped into sets, and the mapping allows a block of main memory
to reside in any block of a specific set. Therefore, the flexibility of associative
mapping is reduced from full freedom to a set of specific blocks. This also reduces the searching overhead, because the search is restricted to the blocks of one set instead of all of the blocks. Also, the contention problem of direct mapping is eased by having a few choices for block replacement.
Consider the same cache memory and main memory organization of the previous
example. Organize the cache with 4 blocks in each set. The TAG field of the associative mapping technique is divided into two groups: one is termed the SET field and the second one the TAG field. Since each set contains 4 blocks, the total number of sets is 32. The main memory address is grouped into three parts: the low-order 5 bits are used to identify a word within a block. Since there are 32 sets in total, the next 5 bits are used to identify the set. The high-order 6 bits are used as TAG bits.
The 5-bit set field of the address determines which set of the cache might contain the desired block. This is similar to the direct mapping technique: in direct mapping the address selects a block, while in block-set-associative mapping it selects a set. The TAG field of the address must then be compared with the TAGs of the four blocks of that set. If a match occurs, then the block is present in the cache; otherwise, the block containing the addressed word must be brought to the cache. This block can be placed only in the corresponding set.
Since there are four blocks in the set, we have to choose appropriately which block is to be replaced if all the blocks are occupied. Since the search is restricted to four blocks only, the searching complexity is reduced. The whole arrangement of the block-set-associative mapping technique is shown in the figure below.
It is clear that if we increase the number of blocks per set, then the number of
bits in SET field is reduced. Due to the increase of blocks per set, complexity of
search is also increased. The extreme condition of 128 blocks per set requires no
set bits and corresponds to the fully associative mapping technique with 11 TAG
bits. The other extreme of one block per set is the direct mapping method.
Figure: Block-set-associative mapping cache with 4 blocks per set
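A Python sketch of the address split and the set-restricted search for this 4-way arrangement; the field widths follow the example above, while the data structures are illustrative:

```python
# Block-set-associative mapping: 16-bit address = 6-bit TAG | 5-bit SET | 5-bit word
NUM_SETS, WAYS = 32, 4
sets = [[None] * WAYS for _ in range(NUM_SETS)]   # each entry holds a TAG (or None)

def access(address):
    set_index = (address >> 5)  & 0x1F    # 5 SET bits
    tag       = (address >> 10) & 0x3F    # 6 TAG bits
    ways = sets[set_index]
    if tag in ways:                       # search only the 4 blocks of this set
        return "hit"
    victim = ways.index(None) if None in ways else 0   # replacement policy plugs in here
    ways[victim] = tag
    return "miss"

print(access(0x1234), access(0x1234))   # miss hit
```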
Replacement Algorithms
When a new block must be brought into the cache and all the positions that it
may occupy are full, a decision must be made as to which of the old blocks is to
be overwritten. In general, a policy is required to keep blocks in the cache when they are likely to be referenced in the near future. However, it is not easy to determine directly which of the blocks in the cache are about to be referenced. The property of locality of reference gives some clue for designing a good replacement policy.
Least Recently Used (LRU) Replacement policy:
Since programs usually stay in localized areas for reasonable periods of time, it can be assumed that there is a high probability that blocks which have been referenced recently will also be referenced in the near future. Therefore, when a block is to be overwritten, it is a good decision to overwrite the one that has gone the longest time without being referenced. This is defined as the least recently used (LRU) block. Keeping track of the LRU block must be done as computation proceeds.
Consider a specific example of a four-block set. It is required to track the LRU
block of this four-block set. A 2-bit counter may be used for each block.
When a hit occurs, that is, when a read request is received for a word that is in the cache, the counter of the block that is referenced is set to 0. All counters whose values were originally lower than the referenced block's are incremented by 1, and all other counters remain unchanged.
When a miss occurs, that is, when a read request is received for a word and the
word is not present in the cache, we have to bring the block to cache.
There are two possibilities in case of a miss:
1. If the set is not full, the counter associated with the new block loaded from
the main memory is set to 0, and the values of all other counters are
incremented by 1.
2. If the set is full and a miss occurs, the block with the counter value 3 is
removed, and the new block is put in its place. The counter value is set to
zero. The other three block counters are incremented by 1.
It is easy to verify that the counter values of the occupied blocks are always distinct. Also, it follows that the highest counter value indicates the least recently used block.
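The 2-bit-counter bookkeeping described above can be sketched in a few lines of Python for a single four-block set; the block names and access sequence are illustrative:

```python
# LRU with 2-bit counters for one 4-block set: the block whose counter is 3
# is the least recently used and is the one replaced on a miss to a full set.
blocks   = [None, None, None, None]   # which memory block occupies each cache block
counters = [0, 0, 0, 0]

def reference(block):
    if block in blocks:                          # hit
        i = blocks.index(block)
        for j in range(4):                       # bump counters originally lower than the hit one
            if counters[j] < counters[i]:
                counters[j] += 1
        counters[i] = 0
    elif None in blocks:                         # miss, set not full
        i = blocks.index(None)
        for j in range(4):
            if blocks[j] is not None:
                counters[j] += 1
        blocks[i], counters[i] = block, 0
    else:                                        # miss, set full: evict the block with counter 3
        i = counters.index(3)
        for j in range(4):
            counters[j] += 1
        blocks[i], counters[i] = block, 0

for b in ["A", "B", "C", "D", "A", "E"]:
    reference(b)
print(blocks)   # 'B', the least recently used block, has been replaced by 'E'
```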
First In First Out (FIFO) replacement policy:
A reasonable rule is to remove the oldest block from a full set when a new block must be brought in. With this technique, no update is required when a hit occurs. When a miss occurs and the set is not full, the new block is put into an empty block and the counter values of the occupied blocks are incremented by one. When a miss occurs and the set is full, the block with the highest counter value is replaced by the new block, its counter is set to 0, and the counter values of all other blocks of that set are incremented by 1. The overhead of this policy is low, since no update is required on a hit.
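FIFO needs even less bookkeeping; a queue captures it directly (an illustrative sketch for one four-block set):

```python
from collections import deque

WAYS = 4
fifo = deque()                       # oldest block at the left

def reference(block):
    if block in fifo:                # hit: nothing to update under FIFO
        return "hit"
    if len(fifo) == WAYS:            # set full: evict the oldest block
        fifo.popleft()
    fifo.append(block)               # newest block goes to the back
    return "miss"

for b in ["A", "B", "C", "D", "A", "E"]:
    reference(b)
print(list(fifo))   # ['B', 'C', 'D', 'E'] -- 'A' was evicted even though it was just used
```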
Random replacement policy:
The simplest algorithm is to choose the block to be overwritten at random.
Interestingly enough, this simple algorithm has been found to be very effective in
practice.
Introduction: Cache Reads
So far, we've traced sequences of memory addresses that work as follows, if you'll
let me anthropomorphize a little bit:
1. The processor asks the memory subsystem, "Hey, do you have the data at
Address XXX?"
2. The L1 cache tackles this question first by checking the valid bit and tag of whatever block(s) could possibly contain Address XXX's data. If it doesn't have the data, the request is passed further down the hierarchy.
3. Eventually, the data makes its way from some other level of the hierarchy to both the processor that requested it and the L1 cache.
4. The L1 cache then stores the new data, possibly replacing some old data in that cache block, on the hypothesis that temporal locality is king and the new data is more likely to be accessed soon than the old data was.
Throughout this process, we make some sneaky implicit assumptions that are
valid for reads but questionable for writes. We will label them Sneaky
Assumptions 1 and 2:
Sneaky assumption 1: Bringing data into the L1 (or L2, or whatever) just
means making a copy of the version in main memory. If we lose this copy,
we still have the data somewhere.
Sneaky assumption 2: If the request is a load, the processor has asked the
memory subsystem for some data. In order to fulfill this request, the
memory subsystem absolutely must go chase that data down, wherever it
is, and bring it back to the processor.
Sneaky assumption 1: Let's think about the data that's being replaced (the
technical term is evicted) when we bring in the new data. If some of the
accesses to the old data were writes, it's at least possible that the version
of the old data in our cache is inconsistent with the versions in lower levels
of the hierarchy. We would want to be sure that the lower levels know
about the changes we made to the data in our cache before just
overwriting that block with other stuff.
Sneaky assumption 2: If the request is a store, the processor is just asking
the memory subsystem to keep track of something -- it doesn't need any
information back from the memory subsystem. So the memory subsystem
has a lot more latitude in how to handle write misses than read misses.
In short, cache writes present both challenges and opportunities that reads don't,
which opens up a new set of design decisions.
Oh no! Now your version of the data at Address XXX is inconsistent with the version in subsequent levels of the memory hierarchy (L2, L3, main memory...)! Since you care about preserving correctness, you have only two real options: write-through or write-back.
With write-through, every time you see a store instruction, that means you need
to initiate a write to L2. In order to be absolutely sure you're consistent with L2 at
all times, you need to wait for this write to complete, which means you need to
pay the access time for L2.
What this means is that a write hit actually acts like a miss, since you'll need to
access L2 (and possibly other levels too, depending on what L2's write policy is
and whether the L2 access is a hit or miss).
Instead of sitting around until the L2 write has fully completed, you add a little bit
of extra storage to L1 called a write buffer. The write buffer's job is to keep track
of all the pending updates to L2, so that L1 can move on with its life.
The bottom line: from a performance perspective, we'll treat a write hit to a
write-through cache like a read hit, as long as the write buffer has available space.
When the write buffer is full, we'll treat it more like a read miss (since we have to
wait to hand the data off to the next level of cache).
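Here is a rough Python sketch of the write-through-plus-write-buffer idea; the buffer size and the placeholder L2 write are assumptions made for illustration:

```python
from collections import deque

WRITE_BUFFER_SIZE = 4
write_buffer = deque()    # pending (address, data) updates destined for L2
l1 = {}                   # L1 cache contents

def store(address, data):
    l1[address] = data                        # update L1 (assume a write hit)
    if len(write_buffer) == WRITE_BUFFER_SIZE:
        drain_one_entry_to_l2()               # buffer full: this store stalls, like a miss
    write_buffer.append((address, data))      # otherwise the store costs about as much as a hit

def drain_one_entry_to_l2():
    address, data = write_buffer.popleft()    # in the background, L2 absorbs the pending writes
    # ... the (slow) L2 write would happen here ...

for a, d in [(1, "x"), (2, "y"), (3, "z")]:
    store(a, d)
print(len(write_buffer))   # 3 pending L2 writes; L1 already holds the new values
```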
As long as we're getting write hits to a particular block, we don't tell L2 anything.
Instead, we just set a bit of L1 metadata (the dirty bit -- technical term!) to
indicate that this block is now inconsistent with the version in L2.
So everything is fun and games as long as our accesses are hits. The problem is
whenever we have a miss -- even if it's a read miss -- and the block that's being
replaced is dirty.
Whenever we have a miss to a dirty block and bring in new data, we actually have
to make two accesses to L2 (and possibly lower levels):
One to let it know about the modified data in the dirty block. We'll treat
this like an L1 miss penalty.
Another to fetch the actual missed data. We'll treat this like a second miss
penalty.
What this means is that some fraction of our misses -- the ones that overwrite
dirty data -- now have this outrageous double miss penalty.
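A sketch of the dirty-bit bookkeeping for a write-back cache, reduced to a single cache block and a simple L2 access counter (both are illustrative simplifications):

```python
# Write-back sketch: writes only set the dirty bit locally; L2 hears about the
# modified data only when the dirty block is evicted.
block = {"tag": None, "data": None, "dirty": False}
L2_ACCESSES = 0

def miss(new_tag, new_data):
    global L2_ACCESSES
    if block["dirty"]:
        L2_ACCESSES += 1          # first access: write the dirty block back to L2
    L2_ACCESSES += 1              # second access: fetch the newly requested block
    block.update(tag=new_tag, data=new_data, dirty=False)

def write_hit(new_data):
    block["data"] = new_data      # no L2 traffic at all...
    block["dirty"] = True         # ...but the block is now inconsistent with L2

miss("A", 0)          # clean miss: 1 L2 access
write_hit(42)         # hit: dirty bit set, no L2 access
miss("B", 7)          # miss to a dirty block: 2 L2 accesses (the double miss penalty)
print(L2_ACCESSES)    # 3
```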
Again, pretend (without loss of generality) that you're an L1 cache. You get a write
request from the processor. Your only obligation to the processor is to make sure
that the subsequent read requests to this address see the new value rather than
the old one. That's it.
If this write request happens to be a hit, you'll handle it according to your write
policy (write-back or write-through), as described above. But what if it's a miss?
As long as someone hears about this data, you're not actually obligated to
personally make room for it in L1. You can just pass it to the next level without
storing it yourself.
(As a side note, it's also possible to refuse to make room for the new data on a
read miss. But that requires you to be pretty smart about which reads you want
to cache and which reads you want to send to the processor without storing in L1.
That's a very interesting question but beyond the scope of this class.)
So you have two basic choices: make room for the new data on a write miss, or
don't.
Write-allocate
A write-allocate cache makes room for the new data on a write miss, just like it
would on a read miss.
Here's the tricky part: if cache blocks are bigger than the amount of data requested, now you have a dilemma. Do you go ask L2 for the data in the rest of the block (which you don't even need yet!), or not? This leads to yet another design decision: fetch-on-write caches go get the rest of the block from the next level, while no-fetch-on-write caches store only the newly written data and keep track of which parts of the block are valid.
In this class, I won't ask you about the sizing or performance of no-fetch-on-write caches. I might ask you conceptual questions about them, though.
No-write-allocate
This is just what it sounds like! If you have a write miss in a no-write-
allocate cache, you simply notify the next level down (similar to a write-through
operation). You don't kick anything out of your own cache.
Generally, write-allocate makes more sense for write-back caches and no-write-
allocate makes more sense for write-through caches, but the other combinations
are possible too.
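A small Python sketch contrasting the two write-miss policies; the dictionary cache and the stand-in L2 write are illustrative assumptions:

```python
l1 = {}   # L1 contents: address -> data

def write_to_l2(address, data):
    pass   # stand-in for the (slower) next-level write

def write_miss(address, data, write_allocate):
    if write_allocate:
        l1[address] = data            # make room and keep the data, as a read miss would
        # (a write-back cache would also mark the block dirty here)
    else:
        write_to_l2(address, data)    # no-write-allocate: notify the next level, keep nothing

write_miss(0x10, 99, write_allocate=True)
write_miss(0x20, 77, write_allocate=False)
print(0x10 in l1, 0x20 in l1)   # True False
```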
Instruction Cycle
From the designer's point of view, the machine instruction set provides the
functional requirements for the CPU: Implementing the CPU is a task that in large
part involves implementing the machine instruction set.
From the user's side, the user who chooses to program in machine language
(actually, in assembly language) becomes aware of the register and memory
structure, the types of data directly supported by the machine, and the
functioning of the ALU.
Elements of an Instruction
Each instruction must have elements that contain the information required by the
CPU for execution. These elements are as follows:
Operation code: Specifies the operation to be performed (for example, ADD or I/O).
Source operand reference: The operation may involve one or more source operands, that is, operands that are inputs for the operation.
Result operand reference: The operation may produce a result.
Next instruction reference: This tells the CPU where to fetch the next instruction after the execution of this instruction is complete.
The next instruction to be fetched is located in main memory or, in the case of a
virtual memory system, in either main memory or secondary memory (disk). In
most cases, the next instruction to be fetched immediately follows the current
instruction. In those cases, there is no explicit reference to the next instruction.
Source and result operands can be in one of three areas:
Main or virtual memory: As with next instruction references, the main or
virtual memory address must be supplied.
CPU register: With rare exceptions, a CPU contains one or more registers
that may be referenced by machine instructions. If only one register exists,
reference to it may be implicit. If more than one register exists, then each
register is assigned a unique number, and the instruction must contain the
number of the desired register.
I/O device: The instruction must specify the I/O module and device for the operation. If memory-mapped I/O is used, this is just another main or virtual memory address.
It is difficult for both the programmer and the reader of textbooks to deal with
binary representations of machine instructions. Thus, it has become common
practice to use a symbolic representation of machine instructions. Opcodes are
represented by abbreviations, called mnemonics, that indicate the operation.
Common examples include
ADD Add
SUB Subtract
MPY Multiply
DIV Divide
LOAD Load data from memory
ADD R, Y
may mean add the value contained in data location Y to the contents of register R.
In this example, Y refers to the address of a location in memory, and R refers to a
particular register. Note that the operation is performed on the contents of a
location, not on its address.
X = X+Y
This statement instructs the computer to add the value stored in Y to the value stored in X and put the result in X. How might this be accomplished with machine instructions? Let us assume that the variables X and Y correspond to locations 513 and 514. If we assume a simple set of machine instructions, this operation could be accomplished with three instructions:
1. Load a register with the contents of memory location 513.
2. Add the contents of memory location 514 to the register.
3. Store the contents of the register in memory location 513.
As can be seen, the single BASIC instruction may require three machine
instructions. This is typical of the relationship between a high-level language and
a machine language. A high-level language expresses operations in a concise
algebraic form, using variables. A machine language expresses operations in a
basic form involving the movement of data to or from registers.
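To make the three-instruction sequence concrete, here is a toy single-register machine in Python executing the translation of X = X + Y; the initial memory contents and mnemonics are illustrative assumptions:

```python
# Toy accumulator machine: X lives at location 513, Y at location 514.
memory = {513: 10, 514: 32}   # illustrative initial values of X and Y
AC = 0                        # the single register (accumulator)

program = [("LOAD", 513), ("ADD", 514), ("STORE", 513)]

for opcode, address in program:
    if opcode == "LOAD":
        AC = memory[address]         # 1. load the register with the contents of 513
    elif opcode == "ADD":
        AC = AC + memory[address]    # 2. add the contents of 514 to the register
    elif opcode == "STORE":
        memory[address] = AC         # 3. store the register back into 513

print(memory[513])   # 42 -- X now holds X + Y
```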
With this simple example to guide us, let us consider the types of instructions that
must be included in a practical computer. A computer should have a set of
instructions that allows the user to formulate any data processing task. Another
way to view it is to consider the capabilities of a high-level programming
language. Any program written in a high-level language must be translated into
machine language to be executed. Thus, the set of machine instructions must be
sufficient to express any of the instructions from a high-level language. With this
in mind we can categorize instruction types as follows:
Number of Addresses
Three addresses:
Example: a = b + c
Two addresses:
Example: a = a + b
The two-address format reduces the space requirement but also introduces
some awkwardness. To avoid altering the value of an operand, a MOVE
instruction is used to move one of the values to a result or temporary
location before performing the operation.
One address:
Example: AC = AC + b (the second operand, the accumulator AC, is implicit)
Zero addresses:
Example: a stack machine evaluates a = b + c with PUSH b, PUSH c, ADD, POP a; all operand references are implicit (the top of the stack)
Fewer addresses:
Fewer addresses per instruction yield shorter, more primitive instructions, but more instructions are needed for a given program.
Multiple-address instructions:
Multiple-address formats generally allow operands to be held in registers. Because register references are faster than memory references, this speeds up execution.
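As an illustration of the zero-address (stack) style mentioned above, this short Python sketch evaluates a = b + c using only push and pop operations; the values are illustrative:

```python
# Zero-address style: the operands are implicitly the top elements of the stack.
stack = []
b, c = 3, 4

stack.append(b)           # PUSH b
stack.append(c)           # PUSH c
rhs = stack.pop()         # ADD: pop two operands...
lhs = stack.pop()
stack.append(lhs + rhs)   # ...and push the result
a = stack.pop()           # POP a

print(a)   # 7
```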
Design Decisions
One of the most interesting, and most analyzed, aspects of computer design is instruction set design. The design of an instruction set is very complex, because it
affects so many aspects of the computer system. The instruction set defines many
of the functions performed by the CPU and thus has a significant effect on the
implementation of the CPU. The instruction set is the programmer's means of
controlling the CPU. Thus, programmer requirements must be considered in
designing the instruction set. The most important design issues include the
following:
Operation repertoire: How many and which operations to provide, and how
complex operations should be
Data types: The various types of data upon which operations are performed
Types of Operands
Addresses
Numbers
Characters
Logical data
Numbers
All machine languages include numeric data types. Even in nonnumeric data processing, there is a need for numbers to act as counters, field widths, and so forth.
An important distinction between numbers used in ordinary mathematics and
numbers stored in a computer is that the latter are limited. Thus, the programmer
is faced with understanding the consequences of rounding, overflow and
underflow.
Floating point
Decimal
Characters
A common form of data is text or character strings. While textual data are most
convenient for human beings, they cannot, in character form, be easily stored or
transmitted by data processing and communications systems. Such systems are
designed for binary data. Thus, a number of codes have been devised by which
characters are represented by a sequence of bits. Perhaps the earliest common
example of this is the Morse code. Today, the most commonly used character code is the International Reference Alphabet (IRA), referred to in the United States as the American Standard Code for Information Interchange (ASCII). IRA is
also widely used outside the United States. Each character in this code is
represented by a unique 7-bit pattern, thus, 128 different characters can be
represented. This is a larger number than is necessary to represent printable
characters, and some of the patterns represent control characters. Some of these
control characters have to do with controlling the printing of characters on a
page. Others are concerned with communications procedures. IRA-encoded
characters are almost always stored and transmitted using 8 bits per character.
The eighth bit may be set to 0 or used as a parity bit for error detection. In the
latter case, the bit is set such that the total number of binary 1s in each octet is
always odd (odd parity) or always even (even parity).
Another code used to encode characters is the Extended Binary Coded Decimal
Interchange Code (EBCDIC). EBCDIC is used on IBM S/390 machines. It is an 8-bit
code. As with IRA, EBCDIC is compatible with packed decimal. In the case of
EBCDIC, the codes 11110000 through 11111001 represent the digits 0 through 9.
Logical Data
Normally, each word or other addressable unit (byte, half-word, and so on) is treated as a single unit of data. It is sometimes useful, however, to consider an n-bit unit as consisting of n 1-bit items of data, each item having the value 0 or 1. When data are viewed this way, they are considered to be logical data.
This view is useful, first, when we wish to store an array of Boolean or binary data items in which each item occupies only one bit, and second, when we wish to manipulate the individual bits of a data item.
Types of Operations
The number of different opcodes varies widely from machine to machine.
However, the same general types of operations are found on all machines. A
useful and typical categorization is the following:
Data transfer
Arithmetic
Logical
Conversion
I/O
System control
Transfer of control
Data transfer
As with all instructions with operands, the mode of addressing for each
operand must be specified.
In terms of CPU action, data transfer operations are perhaps the simplest type. If both source and destination are registers, then the CPU simply causes data to be transferred from one register to another; this is an operation internal to the CPU. If one or both operands are in memory, then the CPU must perform some or all of the following actions:
Example:
Arithmetic
Most machines provide the basic arithmetic operations of add, subtract, multiply,
and divide. These are invariably provided for signed integer (fixed-point) numbers. Often they are also provided for floating-point and packed decimal numbers.
Logical
Some of the basic logical operations that can be performed on Boolean or binary
data are AND, OR, NOT, XOR, …
These logical operations can be applied bitwise to n-bit logical data units. For example, if two registers each hold an n-bit pattern, ANDing them produces a result that has a 1 only in the bit positions where both registers hold a 1.
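A short Python sketch of these operations applied bitwise to two 8-bit register values (the values are illustrative):

```python
r1, r2 = 0b10100101, 0b00001111      # illustrative 8-bit register contents

print(f"{r1 & r2:08b}")    # AND -> 00000101 (keeps bits where r2 has a 1)
print(f"{r1 | r2:08b}")    # OR  -> 10101111 (sets bits where either register has a 1)
print(f"{r1 ^ r2:08b}")    # XOR -> 10101010 (flips the bits selected by r2)
print(f"{~r1 & 0xFF:08b}") # NOT -> 01011010 (complement of r1, kept to 8 bits)
```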
Conversion
Conversion instructions are those that change the format or operate on the format of data. An example is converting from decimal to binary.
Input/Output
System Controls
System control instructions are those that can be executed only while the processor is in a certain privileged state or is executing a program in a special privileged area of memory. Typically, these instructions are reserved for the use of the operating system.
Transfer of control
For all of the operation types discussed so far, the next instruction to be performed is the one that immediately follows, in memory, the current instruction. However, a significant fraction of the instructions in any program
have as their function changing the sequence of instruction execution. For these
instructions, the operation performed by the CPU is to update the program
counter to contain the address of some instruction in memory.
Branch instruction
A branch instruction, also called a jump instruction, has as one of its operands the
address of the next instruction to be executed. Most often, the instruction is a
conditional branch instruction. That is, the branch is made (update program
counter to equal address specified in operand) only if a certain condition is met.
Otherwise, the next instruction in sequence is executed (increment program
counter as usual).
Skip instructions
A skip instruction includes an implied address: typically the skip implies that one instruction is to be skipped, so the implied address is the address of the next instruction plus one instruction length. Skips are usually conditional (for example, increment and skip if zero).
Procedure call instructions
The procedure mechanism involves two basic instructions: a call instruction that
branches from the present location to the procedure, and a return instruction
that returns from the procedure to the place from which it was called. Both of
these are forms of branching instructions.
1. A procedure can be called from more than one location.
2. A procedure call can appear in a procedure. This allows the nesting of procedures to an arbitrary depth.
Because we would like to be able to call a procedure from a variety of points, the
CPU must somehow save the return address so that the return can take place
appropriately. There are three common places for storing the return address:
• Register
• Start of the called procedure
• Top of stack
Addressing Modes
The address field or fields in a typical instruction format are relatively small. We
would like to be able to reference a large range of locations in main memory or
for some systems, virtual memory. To achieve this objective, a variety of
addressing techniques has been employed. They all involve some trade-off
between address range and/or addressing flexibility, on the one hand, and the
number of memory references and/or the complexity of address calculation, on
the other. In this section, we examine the most common addressing techniques:
Immediate
Direct
Indirect
Register
Register indirect
Displacement
Immediate Addressing
With immediate addressing, the operand value is present in the instruction itself; no memory reference is required to fetch the operand.
Direct Addressing
With direct addressing, the address field contains the effective address of the operand; only one memory reference is required and no special calculation.
Indirect Addressing
With direct addressing, the length of the address field is usually less than the
word length, thus limiting the address range. One solution is to have the address
field refer to the address of a word in memory, which in turn contains a full-length
address of the operand. This is known as indirect addressing.
Register Addressing
Register addressing is similar to direct addressing. The only difference is that the
address field refers to a register rather than a main memory address.
The advantages of register addressing are that only a small address field is needed in the instruction and that no time-consuming memory references are required.
The disadvantage of register addressing is that the address space is very limited.
Displacement Addressing
Displacement addressing combines the capabilities of direct addressing and register indirect addressing. The effective address is computed as EA = A + (R), where A is a base value held in the address field and (R) is the contents of a referenced register (or vice versa).
Immediate Mode
In this mode, the operand is specified in the instruction itself. An immediate mode
instruction has an operand field rather than the address field.
For example: ADD 7, which says add 7 to the contents of the accumulator; 7 is the operand here.
Register Mode
In this mode the operand is stored in a register, and this register is present in the CPU. The instruction contains the address of the register where the operand is stored.
Advantages
Shorter instructions and faster instruction fetch.
Faster memory access to the operand(s)
Disadvantages
Very limited address space
Using multiple registers helps performance but it complicates the
instructions.
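The following Python sketch contrasts how the operand is located under each of the modes above; the memory and register contents are purely illustrative:

```python
memory    = {100: 200, 200: 7, 300: 7}
registers = {"R1": 200, "R2": 100, "R3": 7}

operand_immediate = 7                              # Immediate: operand is in the instruction
operand_direct    = memory[300]                    # Direct: address field holds the operand address
operand_indirect  = memory[memory[100]]            # Indirect: address field points to the address
operand_register  = registers["R3"]                # Register: operand is held in a register
operand_reg_indir = memory[registers["R1"]]        # Register indirect: register holds the address
operand_displ     = memory[100 + registers["R2"]]  # Displacement: EA = A + (R)

print(operand_immediate, operand_direct, operand_indirect,
      operand_register, operand_reg_indir, operand_displ)   # 7 7 7 7 7 7
```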
Instruction Cycle
An instruction cycle, also known as the fetch-decode-execute cycle, is the basic operational process of a computer. This process is repeated continuously by the CPU from boot-up to shutdown of the computer.
Following are the steps that occur during an instruction cycle:
1. Fetch: the instruction is read from memory at the address held in the program counter, and the program counter is then incremented.
2. Decode: the instruction is decoded to determine the operation to perform and the operands involved.
3. Execute: any required operands are fetched, the operation is carried out, and the result is stored.
The cycle is then repeated by fetching the next instruction. Thus, in this way the
instruction cycle is repeated continuously.
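A toy Python sketch of the repeating fetch-decode-execute loop; the instruction encoding and memory layout are illustrative assumptions:

```python
# Toy fetch-decode-execute loop over a tiny program held in "memory".
memory = {0: ("LOAD", 10), 1: ("ADD", 11), 2: ("STORE", 12), 3: ("HALT", None),
          10: 5, 11: 6, 12: 0}
pc, ac, running = 0, 0, True

while running:
    instruction = memory[pc]       # FETCH the instruction at the program counter
    pc += 1                        # advance the program counter
    opcode, address = instruction  # DECODE it
    if opcode == "LOAD":           # EXECUTE
        ac = memory[address]
    elif opcode == "ADD":
        ac += memory[address]
    elif opcode == "STORE":
        memory[address] = ac
    elif opcode == "HALT":
        running = False

print(memory[12])   # 11 -- the result, written back to memory
```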