COMPUTER ARCHITECTURE
UNIT 3
Processors
• The main brain or engine of the PC is the processor (sometimes called the
microprocessor), or Central Processing Unit (CPU). The CPU performs the system's
calculating and processing.
• It is often the most expensive single component in the system, costing up to four times
as much as the motherboard. The invention of the microprocessor is generally credited
to Intel, and PC-compatible systems use Intel or Intel-compatible processors.
• The processor acts like the conductor in an orchestra. It reads program instructions
(commands) from main memory that tell it what to do to accomplish the
work that the user wants, and then executes them.
• The CPU is functionally divided into the control unit, internal registers and the
Arithmetic and Logic unit.
The CPU and Program Execution
• The computer has to read and obey every program, including the operating system itself,
one instruction at a time. The basic operation is the fetch-decode-execute cycle. It is the
sequence whereby each instruction within a program is read into the CPU from program
memory and then decoded and executed.
• Throughout the history of computer development, the limiting factor on performance has
been one of the units involved in the fetch-decode-execute cycle: memory, bus, or CPU.
This affects both the design parameters for computer architecture and the selection of
algorithms for problem solving. For example, 'memory-intensive' methods are chosen when
memory is fast and cheap; otherwise 'compute-intensive' methods are chosen.
• DRAM chips are used in main memory, but they are not as fast as the CPU.
• SRAM chips are faster than DRAM but cost more, so they are used only
in small, fast buffers called memory caches. A memory cache helps reduce the main memory
access delay by holding copies of current instructions and data.
• The processor is responsible for actually executing the instructions that make up
programs and the operating system. Processors are made up of several building
blocks: execution units, register files, and control logic.
• The execution units contain the hardware that executes instructions. This includes
the hardware that fetches and decodes instructions, as well as the arithmetic logic
units that perform the actual computation.
• Many processors contain separate execution units for integer and floating-point
computations because very different hardware is required to handle these two data
types. Also, modern processors use multiple execution units to execute
instructions in parallel to improve performance.
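The fetch-decode-execute cycle described in this section can be sketched as a small loop. This is only an illustration: the toy machine, its single accumulator and the opcode names LOAD, ADD, STORE and HALT are assumptions made for the example, not the instruction set of any real processor.

```python
# Hypothetical toy machine: each instruction is a tuple (opcode, operand).
def run(program, memory):
    """Run a toy program with a fetch-decode-execute loop; return the accumulator."""
    pc = 0    # program counter: index of the next instruction to fetch
    acc = 0   # a single accumulator register
    while pc < len(program):
        instruction = program[pc]       # fetch the instruction
        pc += 1
        opcode, operand = instruction   # decode it into opcode and operand
        if opcode == "LOAD":            # execute it
            acc = memory[operand]
        elif opcode == "ADD":
            acc += memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return acc


memory = [7, 5, 0]
program = [("LOAD", 0), ("ADD", 1), ("STORE", 2), ("HALT", None)]
result = run(program, memory)
print(result, memory)
```

Each pass through the loop handles exactly one instruction, which mirrors the "one instruction at a time" behaviour described above.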
The Register file
• This is a small storage area for data that the processor is using. Values stored in the
register file can be accessed more quickly than values stored in the memory system, and
register files usually support multiple simultaneous accesses. This allows an
operation to read all of its inputs from the register file at the same time.
• The control logic controls the rest of the processor: It determines when
instructions can be executed and what operations are required to execute each
instruction.
• The main function of the microprocessor or CPU is to accept data and program
instructions from input devices, process the data, and transfer
the result either to memory or to an output device.
• All processors are organized into three major sections:
Arithmetic and Logic Unit section (ALU)
Control Unit section (CU)
Registers (Internal Memory) (IR)
• The function of the ALU is to perform arithmetic operations such as addition,
subtraction, division, multiplication and Logical operations such as AND, OR and
NOT.
• The function of the Control Unit is to control I/O devices, generate control signals
for the other components of the computer (such as the Read and Write signals), and
direct instruction execution.
• The registers hold information moved in from memory and pass it
to the ALU for logical and arithmetic operations. It should also be noted that the
functions of the microprocessor and the CPU are the same.
• If the Control Unit, the registers and the ALU are all packaged into one
integrated circuit (IC), the unit is referred to as a microprocessor. Otherwise it is called a CPU.
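As a rough illustration of the ALU operations listed above, the sketch below models an ALU as a function that takes an operation name and one or two operands. The function name alu, the operation names and the default 8-bit width are assumptions chosen for the example. Results are truncated to the register width, as they would be in fixed-width hardware.

```python
def alu(op, a, b=0, width=8):
    """Apply one ALU operation and keep only the low `width` bits of the result."""
    mask = (1 << width) - 1
    operations = {
        "ADD": lambda: a + b,
        "SUB": lambda: a - b,
        "MUL": lambda: a * b,
        "DIV": lambda: a // b,   # integer division; b == 0 raises ZeroDivisionError
        "AND": lambda: a & b,
        "OR": lambda: a | b,
        "NOT": lambda: ~a,       # unary: b is ignored
    }
    if op not in operations:
        raise ValueError(f"unknown ALU operation: {op}")
    return operations[op]() & mask


print(alu("ADD", 3, 4))
print(alu("NOT", 0x0F))
```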
Processor Design
• Processor design is typically divided into two subcategories: instruction set
architecture and processor micro architecture. Instruction set architecture refers to
the design of the set of operations that the processor executes and includes the
choice of programming model, the number of registers, and decisions about how data
is accessed.
• Processor micro architecture describes how instructions are implemented and
includes factors such as how long it takes to execute instructions, how many
instructions may be executed at one time, and how processor modules such as the
register file are designed.
Working definition
• Any aspect of the processor that an assembly-language programmer needs to
know about to write a correct program is part of the instruction set architecture,
and any aspect that affects only performance, not correctness, is part of the micro
architecture.
Instruction Set Architecture
• When most computer programming was done in assembly language, instruction set
architecture was considered the most important part of computer architecture, because
it determined how difficult it was to obtain optimal performance from the system.
• Over the years, instruction set architecture has become less significant, for two
reasons. First, most programming is now done in high-level languages, so the
programmer never interacts with the instruction set.
• Second, consumers have come to expect compatibility between different generations
of a computer system, meaning that they expect programs that ran on their old system to
run on their new system without changes.
• As a result, the instruction set of a new processor is often required to be the same as the
instruction set of the company's previous processor, sometimes with a few additional
instructions, meaning that most of the design effort for a processor goes into
improving the micro architecture to increase performance.
RISC VS. CISC
CISC - Complex Instruction Set Computers
• Generally require fewer instructions than RISC computers to perform a given
computation, so a CISC computer will have higher performance than a RISC
computer that executes instructions at the same rate.
• Programs written for CISC architectures tend to take less space in memory than the
same programs written for RISC architectures.
RISC - Reduced Instruction Set Computers
• The simple instruction sets of RISC architectures often allow them to be
implemented at higher clock rates than CISC architectures, allowing them to
execute more instructions in the same amount of time.
• RISC architectures are load-store architectures, meaning that only load and store
instructions may access the memory system.
Example
• In many CISC architectures, arithmetic and other instructions may read their
inputs from or write their outputs to the memory system, instead of the register file.
A CISC architecture might allow an ADD operation of the form:
ADD (R1), (R2), (R3)
where the parentheses around a register name indicate that the register contains
the address in memory where the operand can be found or the result should be
placed. Thus, the above ADD instruction tells the processor to add the value
contained in the memory location whose address is stored in R2 to the value
contained in the memory location whose address is stored in R3, and store the result
into memory at the address contained in R1.
• A RISC architecture, being a load-store architecture, needs several instructions to
perform the same ADD operation, assuming that the appropriate memory addresses are
present in R1, R2 and R3 at the start of the instruction sequence.
• It will require:
• LD R4, (R2)
• LD R5, (R3)
• ADD R6, R4, R5
• ST (R1), R6
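To make the comparison concrete, the sketch below models both versions of the ADD example in Python. The register file is a dictionary and memory is a list; these data structures and the function names cisc_add and risc_add are assumptions for illustration only. Both functions leave the same result in memory, but the RISC version takes four instructions where the CISC version takes one.

```python
def cisc_add(regs, mem):
    """ADD (R1), (R2), (R3): one instruction that reads and writes memory directly."""
    mem[regs["R1"]] = mem[regs["R2"]] + mem[regs["R3"]]
    return 1  # number of instructions executed


def risc_add(regs, mem):
    """The same computation on a load-store machine."""
    regs["R4"] = mem[regs["R2"]]           # LD  R4, (R2)
    regs["R5"] = mem[regs["R3"]]           # LD  R5, (R3)
    regs["R6"] = regs["R4"] + regs["R5"]   # ADD R6, R4, R5
    mem[regs["R1"]] = regs["R6"]           # ST  (R1), R6
    return 4  # number of instructions executed


# R1, R2 and R3 hold memory addresses, as the example assumes.
cisc_mem = [0, 30, 12]
risc_mem = [0, 30, 12]
cisc_count = cisc_add({"R1": 0, "R2": 1, "R3": 2}, cisc_mem)
risc_count = risc_add({"R1": 0, "R2": 1, "R3": 2}, risc_mem)
print(cisc_mem, cisc_count)
print(risc_mem, risc_count)
```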
Addressing Modes
• An architecture's addressing modes are the set of syntaxes and methods that
instructions use to specify a memory address, either as the target address of a
memory reference or as the address that a branch will jump to.
• Depending on the architecture, some of the addressing modes may be
available only to some of the instructions that reference memory.
• Architectures that allow any instruction that references memory to use any
addressing mode are described as orthogonal, because the choice of addressing
mode is independent of the choice of instruction.
• Three common addressing modes are:
Register Addressing
Label Addressing
Register plus Immediate Addressing
Register Addressing
• In register addressing, an instruction reads the value out of a
register and uses that as the address of the memory reference or
branch target.
Label Addressing
• In label addressing, a branch instruction specifies its destination as a label that is
placed on an instruction elsewhere in the program. Most branch instructions do
not explicitly contain their destination addresses. Instead, the assembler/linker
translates the label into an offset (which can be either positive or negative) from
the location of the branch instruction to the location of its target. In effect, the
branch instruction tells the processor how far away the target instruction is
located.
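The label-to-offset translation performed by the assembler/linker can be sketched as follows. The program representation (a list of label, mnemonic, target tuples) and the mnemonics are hypothetical, and the offset is counted in instructions from the branch itself, which is an assumption: real architectures often count in bytes or from the instruction after the branch.

```python
def resolve_branch_offsets(program):
    """Map the index of each branch instruction to the signed offset of its target.

    `program` is a list of (label or None, mnemonic, target label or None).
    """
    # First pass: record where each label is placed.
    labels = {label: index for index, (label, _, _) in enumerate(program) if label}
    # Second pass: replace each target label with a relative offset.
    offsets = {}
    for index, (_, _, target) in enumerate(program):
        if target is not None:
            offsets[index] = labels[target] - index
    return offsets


program = [
    ("loop", "ADD", None),
    (None, "SUB", None),
    (None, "BNE", "loop"),   # backward branch: negative offset
    (None, "BR", "done"),    # forward branch: positive offset
    (None, "NOP", None),
    ("done", "HALT", None),
]
offsets = resolve_branch_offsets(program)
print(offsets)
```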
Register plus Immediate Addressing
• In register plus immediate addressing, the value of the specified register is added
to the immediate (constant) value specified in the instruction to generate a memory address.
• One problem with all addressing modes that compute their address rather than
taking it straight from a register is that these addressing modes increase the
execution time of instructions that use them, since the processor must perform a
computation before the address can be sent to the memory system.
• In order to provide flexibility in addressing without increasing memory latency,
some architectures provide post-incrementing addressing modes.
• These addressing modes read their address out of the specified register, send that
address to the memory system, and then add the specified immediate to the value
of the register.
• This result is then written back to the register file. Because the address is sent
directly from the register file to the memory system, these instructions execute
more quickly than register plus immediate addressing mode instructions, but they
still reduce the number of instructions required to implement a program as compared
to ISAs that only provide register addressing.
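The difference between these addressing modes can be sketched with three small load functions. The dictionary register file, list memory and function names are assumptions made for the example; the point is only where the addition happens relative to the memory access.

```python
def load_register(regs, mem, base):
    """Register addressing: the register value is the address."""
    return mem[regs[base]]


def load_register_plus_immediate(regs, mem, base, imm):
    """Register plus immediate: an add must finish before memory is accessed."""
    address = regs[base] + imm
    return mem[address]


def load_post_increment(regs, mem, base, imm):
    """Post-increment: access memory first, then update the register."""
    value = mem[regs[base]]   # address goes straight from the register to memory
    regs[base] += imm         # incremented value is written back afterwards
    return value


mem = [10, 20, 30, 40]
regs = {"R1": 1}
print(load_register(regs, mem, "R1"))
print(load_register_plus_immediate(regs, mem, "R1", 2))
print(load_post_increment(regs, mem, "R1", 1), regs["R1"])
```

With post-increment addressing, stepping through an array needs only one instruction per element, because the register is left pointing at the next element after each access.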
Multimedia Vector Instructions
• Many processor families have recently added multimedia vector instructions to
their ISAs. These instructions are intended to improve performance on multimedia
applications, such as video decompression and audio playback. These applications
have several traits that make it possible to significantly improve their performance
with a small number of new instructions:
• First, they perform the same sequence of operations on a large number of
independent data objects, such as 8×8 blocks of compressed pixels. This trait is
often described as data parallelism, because multiple data objects can be
processed at the same time.
• Second, the applications operate on data that is much smaller than the 32-bit or
64-bit data words found in most modern processors.
• Video pixels, which are described by 8-bit red, green, and blue color values, are
an example of this. Each of the color values is generally computed independently,
meaning that 24 bits of a 32-bit ALU are idle during the computation.
• Multimedia vector instructions treat the processor's data word as a collection of
smaller data objects. Thus, instead of operating on a single 32-bit quantity, the data word
is treated as a collection of four 8-bit quantities or two 16-bit quantities.
• Most of the multimedia vector instruction sets can operate on longer data types,
such as 64-bit or 128-bit quantities, allowing more operations to be done in
parallel.
• Many multimedia vector instructions allow the option to operate in saturating
arithmetic mode.
• In saturating arithmetic, computations that overflow the number of bits in their
representation return the maximum value that the representation can hold, and
computations that underflow return 0. For example, adding 0xAA and 0xBC in
8-bit saturating arithmetic has a result of 0xFF instead of 0x66.
• Saturating arithmetic is useful when it is desirable to have a computation be
limited by its maximum value. For example, increasing the amount of red in a pixel
that is already extremely red should result in a pixel that has the maximum
allowable amount of redness, instead of a pixel that has very little redness because
the computation has wrapped around to a small value.
• When a multimedia vector instruction executes, it performs its computation in
parallel on each of the smaller objects within its input word.
• Multimedia vector instructions can significantly improve a processor's
performance on data-parallel applications that operate on small data types by
allowing multiple computations to be performed in parallel.
• The hardware required to implement multimedia operations is typically fairly
small, since much of the hardware used to implement a processor's non-vector
operations can be reused, making these operations attractive to computer architects
who expect their processor to be used for data-parallel applications.
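The sketch below illustrates how a 32-bit word can be treated as four independent 8-bit values, with an optional saturating mode. The function name packed_add_u8 is hypothetical, and the loop only models the result: real hardware would compute all four lanes at the same time.

```python
def packed_add_u8(a, b, saturate=True):
    """Add two 32-bit words as four independent unsigned 8-bit lanes."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        x = (a >> shift) & 0xFF
        y = (b >> shift) & 0xFF
        total = x + y
        if saturate:
            total = min(total, 0xFF)   # clamp at the largest 8-bit value
        else:
            total &= 0xFF              # wrap around, discarding the carry
        result |= total << shift
    return result


print(hex(packed_add_u8(0xAA, 0xBC)))                  # saturating
print(hex(packed_add_u8(0xAA, 0xBC, saturate=False)))  # wrapping
```

The first call reproduces the 0xAA + 0xBC example from the text: the saturating result is 0xFF, while the wrapping result is 0x66. A carry out of one lane never spills into the next lane, which is what distinguishes a packed add from an ordinary 32-bit add.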
Fixed-length vs. variable-length Instruction Encodings
• Instruction set architecture (ISA) encoding is the set of bits that is used to
represent the instructions in the memory of the computer. Generally, we need an
encoding that is both compact and requires little logic to decode, meaning it is
simple for the processor to figure out which instruction is represented by a given
bit pattern in the program. Unfortunately, these two goals are somewhat in conflict.
Fixed-length
• Fixed-length instruction set encodings use the same number of bits to encode each
instruction in the ISA. Fixed-length encodings have the advantage that
they are simple to decode, reducing the amount of decode logic
required and the latency of the decode logic. Also, a processor that uses a
fixed-length ISA encoding can easily predict the location of the next instruction to be
executed (assuming that the current instruction is not a branch). This makes it
easier for the processor to use pipelining of multiple instructions.
Variable-length
• Variable-length instruction set encodings use different numbers of bits to encode the
instructions in the ISA, depending on the number of inputs to the instruction, the
addressing modes used, and other factors.
• Using a variable-length encoding, each instruction takes only as much space in
memory as it requires, although many systems require that all instruction
encodings be an integer number of bytes long.
• Using a variable-length instruction set can reduce the amount of space taken up by
a program, but it greatly increases the complexity of the logic required to decode
instructions, since parts of the instruction, such as the input operands, may be
stored in different bit positions in different instructions.
• Also, the hardware cannot predict the location of the next instruction until the
current instruction has been decoded enough to know how long the current
instruction is.
• Given the pros and cons of fixed-length and variable-length instruction encodings,
fixed-length encodings are more common in recent architectures. Variable-length
encodings are mainly used in architectures where there is a large variance between
the amounts of space required for the longest and shortest instructions in the ISA.
• Examples of this include stack-based architectures, because many operations do
not specify their inputs, and CISC architectures, which often contain a few
instructions that can take a large number of inputs.
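The difference in how the next instruction is located can be sketched as follows. Both encodings here are invented for the illustration: the fixed-length case assumes 4-byte instructions, and the variable-length case assumes that the first byte of every instruction holds its total length in bytes.

```python
def fixed_length_starts(code, size=4):
    """Every instruction start is known in advance: 0, size, 2*size, ..."""
    return list(range(0, len(code), size))


def variable_length_starts(code):
    """Each start is known only after the previous instruction's length is decoded.

    Hypothetical encoding: the first byte of an instruction is its length in bytes.
    """
    starts = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        length = code[pc]          # must decode this much before moving on
        if length == 0:
            raise ValueError("invalid instruction length 0")
        pc += length
    return starts


fixed_code = bytes(12)                                   # three 4-byte instructions
variable_code = bytes([1, 3, 0xAA, 0xBB, 2, 0xCC, 1])    # lengths 1, 3, 2, 1
print(fixed_length_starts(fixed_code))
print(variable_length_starts(variable_code))
```

In the fixed-length case the list of start addresses depends only on the code size, so it could be computed without looking at the instructions at all. In the variable-length case the loop has to read each instruction in turn, which is the sequential dependency the text describes.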
Processor Micro architecture
• Processor micro architecture includes all of the details about how a processor is
implemented. The ISA has a great deal of impact on the micro architecture. An
ISA that contains only simple operations can be implemented using a simple,
straightforward micro architecture, while an ISA containing complex operations
may require a complex micro architecture to implement.
SUMMARY OF PROCESSOR DESIGN
• The architecture is the makeup of the processor. There are two main types of
architecture, or technology, used to design a CPU:
CISC – Complex Instruction Set Computer
RISC – Reduced Instruction Set Computer
CISC technology
• CISC technology was adopted by Intel in 1978, starting with the 8086
microprocessor chip. The 8086 was designed to process 16-bit data words and had no
instructions for floating-point operations. Present-day Pentium processors have 32-bit
and 64-bit words and can process floating-point instructions, yet
Intel designed the Pentium so that it can still execute programs
written for the 8086 processor.
Characteristics of CISC
A large number of instructions
Many addressing modes
Variable-length instructions
Most instructions can manipulate operands in main memory
Control unit is microprogrammed
• It should also be noted that the functions of the microprocessor and the CPU are
the same.
• If the control unit, the registers and the ALU are all packaged into one integrated
circuit (IC), then it is referred to as a microprocessor; otherwise the unit is called
a CPU, as seen in most older systems.
RISC
• Until the mid 1990s, manufacturers designed processors using CISC
technology with large sets of instructions. Because of the setbacks of that
technology, manufacturers decided to adopt RISC technology, which executes
instructions with only
Characteristics of RISC technology
It requires few instructions
All instructions are of the same length
Most instructions are executed in one machine cycle
Control unit is hardwired
Few addressing modes
A large number of registers

Processor Specification
Processors can be identified by two (2) main parameters:
How wide they are
How fast they are, based on their architecture
• The speed of the processor is a fairly simple concept. It is measured in megahertz
(MHz) or gigahertz (GHz), which mean millions or billions of cycles per second. The
faster the better.
• The width of a processor is a little more complicated because there are three main
specifications in a processor that are expressed in width:
Data input/output bus
Internal registers
Memory address bus
Processor speeds and markings / motherboard speed
• Another confusing factor when comparing processor performance is that virtually
all modern processors since the 486DX2 run at some multiple of the motherboard
speed. For example, a Pentium II 333 runs at five times the motherboard
speed of 66 MHz, while a Pentium II 400 runs at four times the motherboard speed of
100 MHz. This clock multiplier determines the clock speed of the processor. Most
modern Pentium motherboards have three or four speed settings.
• If you know the clock multiplier and the motherboard speed, you can work out
the speed of the processor.
• Processor speed = clock multiplier × motherboard speed
• Example 1: a Pentium II 350 with a clock multiplier of 3.5x and a motherboard speed of 100 MHz
Processor speed = 3.5 × 100
= 350 MHz
• Example 2:
Clock multiplier = 4.5x
Motherboard speed = 100 MHz
Processor speed = 4.5 × 100 = 450 MHz
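The calculation in the two examples above can be written as a one-line function. The function and parameter names are hypothetical; "multiplier" here is the figure written as 3.5x or 4.5x in the examples.

```python
def processor_speed_mhz(multiplier, motherboard_mhz):
    """Processor speed is the clock multiplier times the motherboard speed."""
    return multiplier * motherboard_mhz


print(processor_speed_mhz(3.5, 100))   # first example
print(processor_speed_mhz(4.5, 100))   # second example
```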
Data bus
• The most common way to describe a processor is by the width of the processor's
external data bus. This defines the number of data bits that can be moved into or
out of the processor in one cycle.
• A bus is simply a series of connections that carry common signals. Data buses are
bundles of wires (or pins) used to send and receive data.
• The more signals that can be sent at the same time, the more data can be
transmitted in a specific interval and therefore the faster the bus.
• A wider bus is like a highway with more lanes, which allows for greater
throughput. Since each wire carries 1 data bit in a given time interval, the more
wires you have, the more individual bits you can send in the same time interval.
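The relationship between bus width and throughput can be sketched as a simple calculation. This assumes one transfer per bus clock cycle and ignores any protocol overhead, so it gives a theoretical peak rather than a measured figure; the function name is hypothetical.

```python
def peak_bus_bandwidth_bytes_per_second(width_bits, clock_hz):
    """Peak bandwidth, assuming one transfer of `width_bits` per clock cycle."""
    bytes_per_transfer = width_bits // 8
    return bytes_per_transfer * clock_hz


# Doubling the width doubles the peak bandwidth at the same clock rate.
print(peak_bus_bandwidth_bytes_per_second(32, 100_000_000))
print(peak_bus_bandwidth_bytes_per_second(64, 100_000_000))
```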
Internal Registers
• The size of the internal registers indicates how much information the processor can
operate on at one time, and how it moves data around internally within the chip.
The register size is essentially the internal data bus size.
• A register is a holding cell within the processor. For example, the processor can add
numbers in two different registers, storing the result in a third register. The register
size determines the size of data the processor can operate on.
• The register size also describes the type of software, commands and instructions a
chip can run. That is, a processor with 32-bit internal registers can run 32-bit
instructions that process 32-bit chunks of data, but processors with 16-bit
registers cannot.
• More advanced sixth-generation processors such as the Pentium Pro have as many as
six (6) internal pipelines for executing instructions.
Internal Cache
• Most processors have an integrated level 1 (L1) cache and cache controller. This
cache memory is built in and runs at full core speed.
• This cache is an area of very fast memory built into the processor and is used
to hold some of the current working set of code and data.
• Cache memory can be accessed with no wait states because it can fully keep up with
the speed of the processor core.
• Using cache memory reduces a traditional system bottleneck, because system RAM
is often much slower than the CPU. The cache prevents the processor from having to wait
for code and data from much slower main memory, therefore improving performance.
• If the data the processor wants is already in the internal cache, the CPU does not have
to wait. If the data is not in the cache, the CPU must fetch it from the level 2 cache or
from the system bus, meaning directly from main memory.
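The lookup order described above (internal cache first, then level 2 cache, then main memory) can be sketched as follows. Dictionaries stand in for the caches and for main memory, the function name read is hypothetical, and the sketch never evicts anything, which a real cache of limited size would have to do.

```python
def read(address, l1, l2, main_memory):
    """Return (value, source), looking in L1, then L2, then main memory."""
    if address in l1:
        return l1[address], "L1"         # hit: no waiting
    if address in l2:
        value, source = l2[address], "L2"
    else:
        value, source = main_memory[address], "main memory"
        l2[address] = value              # keep a copy in L2
    l1[address] = value                  # keep a copy in L1 for next time
    return value, source


main_memory = {0x100: 42}
l1, l2 = {}, {}
first = read(0x100, l1, l2, main_memory)
second = read(0x100, l1, l2, main_memory)
print(first, second)
```

The first access has to go all the way to main memory; the second access to the same address is served from the internal cache.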
Address bus
• The address bus is the set of wires that carry the addressing information used to
describe the memory location to which the data is being sent, or from which the
data is being retrieved. As with the data bus, each wire in an address bus carries a
single bit of information.
• This single bit is a single digit in the address. The more wires (digits) used in
calculating these addresses, the greater the total number of address locations. The
size (or width) of the address bus indicates the maximum amount of RAM that a
chip can address.
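Because each address wire carries one binary digit, an address bus with n wires can distinguish 2^n locations. The sketch below shows the calculation; the function name is hypothetical, and it assumes each address refers to one byte.

```python
def addressable_locations(address_bits):
    """Number of distinct addresses an address bus of this width can express."""
    return 2 ** address_bits


print(addressable_locations(20))   # 1,048,576 locations (1M)
print(addressable_locations(32))   # 4,294,967,296 locations (4G)
```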
Processor modes
• All Intel processors, from the 386 on up, can run in several modes. Processor modes
refer to the various operating environments and affect the instructions and
capabilities of the chip. The processor mode controls how the processor sees and
manages the system's memory and the tasks that use it.
• The three different modes of operation possible are:
Real Mode
Protected Mode
Virtual Real Mode (Real within Protected Mode)
Real Mode
• All software running in real mode must use only 16-bit instructions and live
within the 20-bit (1 MB) memory architecture. Software of this type is usually
single-tasking, which means that only one program can run at a time.
• There is no built-in protection to keep one program from overwriting another program
or even the operating system in memory, which means that if more than one program is
running, it is possible for one of them to bring the entire system to a crash.
Protected Mode
• In this mode the chip can run an entirely new 32-bit instruction set, and
software running in this mode is protected from overwriting other software in
memory. Such protection helps make the system more crash-proof, as an errant
program cannot very easily damage other programs or the operating system. In
addition, a crashed program can easily be terminated.
Virtual Real Mode (Real within Protected)
• Virtual real mode is essentially a virtual 16-bit real mode environment that runs inside
32-bit protected mode. For example, when you run a DOS prompt window inside Windows
98, you have created a virtual real mode session.
• Because protected mode allows true multitasking, you can actually have several
real mode sessions running, each with its own software running on a virtual PC.
These can all run simultaneously, even while other 32-bit applications are running.
• Note that any program running in a virtual real mode window can access only up to
1 MB of memory.
Processor Features
• Modern processors have several different features. The most notable are:
SMM (Power Management)
Superscalar execution
MMX Technology
Dynamic execution
Dual Independent Bus (DIB) architecture
SMM (Power Management)
• This is power management circuitry introduced by Intel. The circuitry enables
processors to conserve energy and lengthen battery life. It was introduced
in the Intel 486SL processor, which is an enhanced version of the 486DX processor.
• This power management feature was later universalized and incorporated into all
Pentium and later processors.
• This feature set is called SMM, which stands for System Management Mode.
The SMM circuitry is integrated into the physical chip but operates independently to
control the processor's power use based on its activity level.
• It also supports the suspend/resume features that allow for the instant power on and
power off used in laptops. These settings are normally controlled through system
BIOS settings.
Superscalar Execution
• This is a newer processor feature. The processor has multiple internal instruction
execution pipelines, which enable it to execute multiple instructions at the same
time. This technology is usually associated with high-speed, high-output RISC
chips. It is now a standard feature on newer PCs.
MMX Technology
• MMX Technology is named for Multi-Media eXtensions or Matrix Math
eXtensions. It was introduced in the later Pentium processors to improve video
compression/decompression, image manipulation, encryption and I/O processing,
all of which are used in a variety of software today.
• MMX brings two main improvements. The first is very basic: a larger L1
cache.
• The second is 57 new commands or instructions and a new instruction
capability called Single Instruction, Multiple Data (SIMD).
Dynamic Execution.
• It is an innovative combination of three processing techniques designed to help the
processor manipulate data more efficiently. These techniques are multiple branch
prediction, dataflow analysis and speculative execution. Together they let the processor
manipulate data in a more logically ordered fashion rather than simply
processing a list of instructions.
• Dynamic Execution consists of the following:
Multiple branch prediction: predicts the flow of the program through several
branches using special algorithms. The processor can anticipate jumps or
branches in the instruction flow. It uses this to predict where the next instruction
can be found in memory. This is possible because while the processor is fetching
instructions, it is also looking at instructions further ahead in the program.
Dataflow analysis: analyses and schedules instructions to be executed in an
optimal sequence, independent of the original program order. The processor
determines the optimal sequence for processing and executing instructions in
the most efficient manner.
Speculative execution: increases performance by looking ahead of the
program counter and executing instructions that are likely to be needed later.
This technique essentially allows the processor to complete instructions in
advance and then grab the already completed results when necessary.
DIB (Dual Independent Bus) architecture
• It was created to improve processor bus bandwidth and performance. Having two
(dual) independent data I/O buses enables the processor to access data from both
of its buses simultaneously and in parallel, rather than in a singular sequential
manner.
• The second, or backside, bus in a processor with DIB is used for the L2 cache, allowing
it to run at a much greater speed than if it were to share the main processor bus.
• Two buses make up the dual independent bus architecture: the L2 cache bus and
the processor-to-main-memory, or system, bus.
