Computer Architecture CT-3221: By: Solomon S


COMPUTER ARCHITECTURE

CT-3221

By: Solomon S.
1
 TEXT BOOK:
 Computer Organization and Architecture: Designing for Performance, 6th, 7th, 8th, or 9th Ed.
 – William Stallings
CHAPTER 1

INTRODUCTION TO COMPUTER
ARCHITECTURE

3
Computer architecture and computer organization
 In describing computers, a distinction is often made between computer architecture and computer organization, although it is difficult to give precise definitions for these terms.
Computer architecture
 It refers to those attributes of a system visible to a
programmer or, put another way, those attributes that
have a direct impact on the logical execution of a
program.
 Examples of architectural attributes include the
instruction set, the number of bits used to represent
various data types (e.g., numbers, characters), I/O
mechanisms, and techniques for addressing memory.
Cont..
Computer organization
 It refers to the operational units and their
interconnections that realize the architectural
specifications.
Examples of Organizational attributes include those
hardware details transparent to the programmer, such
as
 control signals;
 interfaces between the computer and peripherals;
 and the memory technology used.

5
Cont…

6
Brief History of Computers
KEY POINT :- The evolution of computers has been characterized by increasing
processor speed, decreasing component size, increasing memory size, and
increasing I/O capacity and speed.
1. ENIAC
• Electronic Numerical Integrator And Computer
• Programmed manually by switches
• Eckert and Mauchly
• Decimal (not binary)
• 20 accumulators of 10 digits
Drawbacks of ENIAC:
• 18,000 vacuum tubes
• 15,000 square feet
• 140 kW power consumption
• 5,000 additions per second
• It had to be programmed manually by setting switches and plugging and unplugging cables.
7
Cont…
2. Von Neumann/Turing
• Stored Program concept
• Main memory storing programs and data
• ALU operating on binary data
• Control unit interpreting instructions from memory and
executing
• Input and output equipment operated by control unit
• Princeton Institute for Advanced Studies
- IAS

8
Structure of von Neumann machine

It consists of:
 A main memory, which
stores both data and
instructions
 An arithmetic and logic
unit (ALU) capable of
operating on binary data
 A control unit, which
interprets the instructions
in memory and causes
them to be executed
 Input and output (I/O) equipment operated by the control unit
(Figure: Structure of the IAS computer)
9
IAS - details
• 1000 x 40 bit words
—Binary number
—2 x 20 bit instructions
• Set of registers (storage in CPU)
—Memory Buffer Register
—Memory Address Register
—Instruction Register
—Instruction Buffer Register
—Program Counter
—Accumulator
—Multiplier Quotient

10
Structure of IAS – detail

11
Cont…
3. Transistors
• Replaced vacuum tubes
• Smaller
• Cheaper
• Less heat dissipation
• Made from Silicon (Sand)
• Second generation machines

12
Cont…
4. Microelectronics
• Literally - “small electronics”
• A computer is made up of gates, memory cells
and interconnections
• These can be manufactured on a
semiconductor
• e.g. silicon wafer

13
Generations of Computer-Summary I
 First generation (1946 - 1957)
 Vacuum tubes were larger components and resulted in
first generation computers being quite large in size, taking
up a lot of space in a room.
 Second generation (1958 - 1964)
 Transistors were smaller than vacuum tubes and allowed
computers to be smaller in size, faster in speed, and
cheaper to build.
 Third generation (1964 - 1971)
 Using ICs in computers helped reduce the size of computers even more compared to second-generation computers, as well as make them faster.
14
Cont…
 Fourth generation (1972 - 2010)
 Microprocessors, along with integrated circuits,
helped make it possible for computers to fit easily
on a desk and for the introduction of the laptop.
 Fifth generation (2010 to present)
 AI (artificial intelligence), a technology with many potential applications around the world, increasingly characterizes this generation.

15
Generations of Computer-Summary II
 Vacuum tube - 1946-1957
 Transistor - 1958-1964
 Small scale integration - 1965 on
 Up to 100 devices on a chip
 Medium scale integration - to 1971
 100-3,000 devices on a chip
 Large scale integration - 1971-1977
 3,000 - 100,000 devices on a chip
 Very large scale integration - 1978 -1991
 100,000 - 100,000,000 devices on a chip
 Ultra large scale integration – 1991 -
 Over 100,000,000 devices on a chip
Moore’s Law

o Moore's law is the observation that the number of transistors in a dense integrated circuit doubles about every two years.
• Increased density of components on chip
• Gordon Moore – co-founder of Intel
• Cost of a chip has remained almost unchanged
• Higher packing density means shorter electrical paths,
giving higher performance
• Reduced power and cooling requirements
• Fewer interconnections increase reliability

17
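As a back-of-the-envelope illustration of the doubling rule (an idealized model, not from the slides; the function name and 4004 starting point are assumptions for the example):

```python
# Idealized Moore's-law projection: transistor count doubles every two years
def projected_transistors(base_count, base_year, year):
    # each elapsed 2-year period multiplies the count by 2
    return base_count * 2 ** ((year - base_year) / 2)

# starting from roughly 2,300 transistors (Intel 4004, 1971),
# ten years is five doublings, i.e. a 32x increase
print(round(projected_transistors(2300, 1971, 1981)))  # 73600
```

Real process scaling has not followed this curve exactly, which is why the law is stated as an observation rather than a physical rule.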
Growth in CPU Transistor Count

18
Hardware/Software/Firmware
 Hardware
 Hardware is Physical. It's "Real," Sometimes Breaks, and
Eventually Wears Out.
 Software
 Software Is Virtual. It Can Be Copied, Changed, and
Destroyed.
 Firmware
 Firmware Is Virtual. It's Software Specifically Designed
for a Piece of Hardware.
 Firmware is a software program permanently etched into a hardware device such as a keyboard, hard drive, BIOS chip, or video card.
19
Basics of computer Architecture
 The main components in a typical computer system are the
processor, memory, input/output devices, and the
communication channels that connect them.
 Processor
 The processor is the workhorse of the system; it is the
component that executes a program by performing
arithmetic and logical operations on data.
 In a typical system there will be only one processor, known as the central processing unit, or CPU.
 Modern high performance systems, for example vector
processors and parallel processors, often have more than
one processor.
20
Cont…
 Memory
 Memory is a passive component that simply stores
information until it is requested by another part of the
system.
 During normal operations it feeds instructions and data to
the processor, and at other times it is the source or
destination of data transferred by I/O devices.
 Information in a memory is accessed by its address.

21
Cont…
 Input/output (I/O) device
 The term I/O is used to describe any program, operation, or device that transfers data between a computer and a peripheral device.
 Input/output (I/O) devices transfer information without
altering it between the external world and one or more
internal components.
 I/O devices can be secondary memories, for example disks and tapes, or devices used to communicate directly with users, such as video displays, keyboards, and mice.

22
Cont…
 Communication channels
 The communication channels are what tie the system together.
 Can either be simple links that connect two devices or
more complex switches that interconnect several
components and allow any two of them to communicate at
a given point in time.
 When a switch is configured to allow two devices to exchange information, all other devices that rely on the switch are blocked, i.e., they must wait until the switch can be reconfigured.

23
A stored-program computer
 A computer with a von Neumann architecture stores
program and data in the same memory;
 A computer with a Harvard architecture has separate
memories for storing program and data.
 Stored-program computer is sometimes used as a synonym
for von Neumann architecture.
 The von-Neumann architecture
 Since you cannot access program memory and data
memory simultaneously, the Von Neumann architecture is
susceptible to bottlenecks and system performance is
affected.
24
25
Cont…
 The Harvard architecture
 In this case, there are at least two memory address
spaces to work with, so there is a memory register
for machine instructions and another memory
register for data.
 Computers designed with the Harvard architecture
are able to run a program and access data
independently, and therefore simultaneously.
 Harvard architecture has a strict separation
between data and code.
26
Cont…
 Thus, Harvard architecture is more complicated but separate
pipelines remove the bottleneck that Von Neumann creates.

27
Computer Structures
 Structure: The way in which the components are
interrelated.
 Accumulator based machines
 Stack machine
 General register machines
I. Accumulator based machines
 An accumulator machine, also called a 1-operand machine, or a CPU with accumulator-based architecture, is a kind of CPU that, although it may have several registers, mostly stores the results of calculations in one special register, typically called "the accumulator".
28
Cont…
 Has a sharply limited number of data accumulators (most of the time one).
 Has additional address registers within the CPU.
 The accumulator serves both as a source of one operand and as the destination of arithmetic operations:

Acc + Operand → Acc

Advantage:
• Saves memory, since instructions need to specify only one operand address.
29
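The one-operand style above can be sketched as a minimal Python simulation (the instruction names, memory contents, and layout are illustrative assumptions, not from the slides):

```python
# One-operand (accumulator) machine: every arithmetic instruction
# implicitly uses the accumulator as one source and as the destination.
memory = {"X": 7, "Y": 5}  # hypothetical data memory
acc = 0                    # the single accumulator register

def load(addr):            # LOAD X : Acc <- M[X]
    global acc
    acc = memory[addr]

def add(addr):             # ADD Y : Acc <- Acc + M[Y]
    global acc
    acc = acc + memory[addr]

load("X")
add("Y")
print(acc)  # 12
```

Note how each instruction names only one memory operand; the other operand and the destination are always the accumulator, which is what keeps the instruction encoding short.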
Cont…
II. Stack machine
 The CPU registers are organized as a LIFO stack.
 Operands are "pushed" onto the stack from memory and "popped" off the stack in reverse order.
 LIFO arithmetic operations remove operands from the top of the stack, and the result is placed on the stack, replacing the operands.
 When expression evaluation is completed, the result is "popped" into a memory location to complete the process, i.e. no operand addresses need to be specified during arithmetic operations.
 Referred to as a 0-address machine.
30
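The push/pop evaluation described above can be sketched as a small Python simulation of a 0-address machine evaluating (2 + 3) × 4 (the operation names are illustrative assumptions):

```python
# Simulate a 0-address (stack) machine evaluating (2 + 3) * 4
stack = []

def push(value):           # PUSH: bring an operand from memory onto the stack
    stack.append(value)

def binop(fn):             # arithmetic op: pop two operands, push the result
    b, a = stack.pop(), stack.pop()
    stack.append(fn(a, b))

push(2)
push(3)
binop(lambda a, b: a + b)  # ADD: stack is now [5]
push(4)
binop(lambda a, b: a * b)  # MUL: stack is now [20]
result = stack.pop()       # POP the final result back to memory
print(result)  # 20
```

Notice that ADD and MUL carry no operand addresses at all, which is exactly why such machines are called 0-address machines.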
Cont…
III. General register machine
 Has a set of numbered registers within the CPU.
 e.g. A, B, C, D, E, …
 Unlike the accumulator machine and the stack machine, the registers in a general register machine can be used for almost any purpose.
 All modern machines have a set of general purpose registers.
 Integer registers hold both addresses and data; floating point registers hold only data.
Designing for Performance
The speed of your processor (central processing unit or CPU), the quantity and speed of your memory
(random access memory or RAM), and the capacity and performance of your hard disk are all significant.
Other hardware factors play a part in determining the speed of your computer.

32
Cont…
 The Processor Performance Equation
 Essentially all computers are constructed using a clock
running at a constant rate.
 These discrete time events are called ticks, clock ticks,
clock periods, clocks, cycles, or clock cycles.
 Computer designers refer to the time of a clock period by
its duration (e.g., 1 ns) or by its rate (e.g., 1 GHz).
 CPU time for a program can then be expressed two ways:
(1) CPU time = CPU clock cycles for a program × Clock cycle time

33
Cont…
(2) CPU time = CPU clock cycles for a program / Clock rate
 In addition to the number of clock cycles needed to execute a program, we can also count the number of instructions executed.
 This count is called the instruction path length or instruction count (IC).
 If we know the number of clock cycles and the instruction count, we can calculate the average number of clock cycles per instruction (CPI). Because it is easier to work with, we use CPI.
34
Cont…
o So we can find CPI:
CPI = CPU clock cycles for a program / Instruction count
 By transposing instruction count in the above formula, clock cycles can be defined as IC × CPI.
 This allows us to use CPI in the execution time formula:
CPU time = Instruction count x Cycles per instruction x Clock cycle time
 Expanding the first formula into the units of measurement shows how the pieces fit together:
(Instructions / Program) × (Clock cycles / Instruction) × (Seconds / Clock cycle) = Seconds / Program
35
Cont…
o As this formula demonstrates, processor performance is
dependent upon three characteristics:
 clock cycle (or rate),
 clock cycles per instruction, and
 instruction count.
o Furthermore, CPU time is equally dependent on these
three characteristics:
 A 10% improvement in any one of them leads to a 10%
improvement in CPU time.

36
CPU time Example

Question? - A program is running on a specific machine (CPU) with the following parameters:
a) Total executed instruction count: 10,000,000 instructions
b) Average CPI for the program: 2.5 cycles/instruction
c) CPU clock rate: 200 MHz (clock cycle = C = 5×10⁻⁹ seconds)
What is the execution time for this program?

37
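A minimal Python check of this question using the processor performance equation (the variable names are illustrative, not from the slides):

```python
# CPU time = Instruction count x CPI x Clock cycle time
instruction_count = 10_000_000     # total executed instructions
cpi = 2.5                          # average cycles per instruction
clock_rate_hz = 200e6              # 200 MHz
clock_cycle_s = 1 / clock_rate_hz  # = 5e-9 seconds per cycle

cpu_time_s = instruction_count * cpi * clock_cycle_s
print(cpu_time_s)  # 0.125 seconds, i.e. 125 ms
```

So the program executes 2.5 × 10⁷ clock cycles, each lasting 5 ns, for an execution time of 0.125 seconds.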
Cont…
 Speedup(Over all Speedup) Equation
 Speedup tells us how much faster a task will run using the
computer with the enhancement as opposed to the original
computer.
 Suppose that we can make an enhancement to a computer
that will improve performance when it is used.
 Speedup is the ratio:
SU = Performance for entire task using the enhancement when possible / Performance for entire task without using the enhancement
 Alternatively,
SU = Execution time for entire task without using the enhancement / Execution time for entire task using the enhancement when possible
Cont…
 Amdahl's Law
o Aim - Simply to calculate the performance gain that can be obtained by improving some portion of a computer.
 Reflection of Amdahl's Law
 Amdahl's Law gives us a quick way to find the speedup from some enhancement, which depends on two factors:
1) The fraction of the computation time in the original computer that can be converted to take advantage of the enhancement.
Example: If 20 seconds of the execution time of a program that takes 60 seconds in total can use an enhancement, the fraction is 20/60. This value, which is called Fraction_enhanced, is always less than or equal to 1.
Cont…
2) The improvement gained by the enhanced execution mode; that is, how much faster the task would run if the enhanced mode were used for the entire program. This value is the time of the original mode over the time of the enhanced mode.
Example: If the enhanced mode takes, say, 2 seconds for a portion of the program, while it is 5 seconds in the original mode, the improvement is 5/2. We call this value Speedup_enhanced, which is always greater than 1.
Cont…
 The execution time using the original computer with the enhanced mode will be the time spent using the unenhanced portion of the computer plus the time spent using the enhancement:

Execution time_new = Execution time_old × ((1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)
41
Over all Speedup Example

 Question? - Suppose that we want to enhance the processor used for Web serving. The new processor is 10 times faster on computation in the Web serving application than the original processor. Assuming that the original processor is busy with computation 40% of the time and is waiting for I/O 60% of the time, what is the overall speedup gained by incorporating the enhancement?

42
Cont…
Solution:
SUo = 1 / ((1 − FRenh) + FRenh / SUenh)

Where, FRenh → Fraction Enhanced
SUo → Overall Speedup
SUenh → Speedup Enhanced

43
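Plugging the Web-serving question's numbers (Fraction Enhanced = 0.4, Speedup Enhanced = 10) into Amdahl's Law, a small Python sketch (the function name is illustrative):

```python
# Amdahl's Law: overall speedup from enhancing a fraction of the work
def overall_speedup(fraction_enhanced, speedup_enhanced):
    return 1 / ((1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# Web serving example: 40% of the time is computation, made 10x faster
su = overall_speedup(0.4, 10)
print(round(su, 4))  # 1.5625
```

Even though the computation itself became 10 times faster, the 60% of time spent waiting for I/O limits the overall speedup to about 1.56×, which is the central lesson of Amdahl's Law.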
Cont…
 AMAT Equation
 AMAT stands for Average Memory Access Time. It refers
to the time necessary to perform a memory access on
average.
 The AMAT of a simple system with only a single level of cache may be calculated as:
AMAT = Hit time + Miss rate × Miss penalty
Where,
1. Hit time – the time to access data in the cache. Hit time is one property of a cache, and our hope is that it is less than 1 nanosecond.
Cont…
2. Miss penalty– is the time to replace the block from memory (that is,
the cost of a miss)
3. Miss rate- is simply the fraction of cache accesses that result in a miss
Example of AMAT
Question?
Assume that the hit time is 1 cycle, the miss rate is 5% for an 8 KB data cache, and the miss penalty is 20 cycles. What is the average memory access time (AMAT)?
Solution
Average memory access time = Hit time + Miss rate × Miss penalty = 1 + 0.05 × 20 = 2 cycles
45
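The example above can be checked with a short Python sketch (the function name is illustrative):

```python
def amat(hit_time, miss_rate, miss_penalty):
    # Average Memory Access Time = Hit time + Miss rate x Miss penalty
    return hit_time + miss_rate * miss_penalty

# Example: hit time 1 cycle, miss rate 5%, miss penalty 20 cycles
print(amat(1, 0.05, 20))  # 2.0 cycles
```

The miss penalty is weighted by the miss rate because only the fraction of accesses that miss pays the cost of going to the next memory level.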
Pentium and PowerPC Evolution
1. Pentium Evolution
Definition of Pentium. A family of 32 and 64-bit x86-based CPU chips from Intel. The term
may refer to the chip or to a PC that uses it. During their reign, Pentium chips were the most
widely used CPUs in the world for general-purpose computing.

 Pre x86 series processors
 4004, 4040: 4-bit processors; 8008: 8-bit processor
 8080: first general purpose 8-bit microprocessor, used in the Altair PC.
 8085: binary compatible with the 8080, simpler and less expensive.
 x86 series - 16 bit processors
 8086: a powerful 16-bit machine; supported an instruction queue that pre-fetches a few instructions before they are executed.
 8088: has 8 bit external bus, used in first IBM PC
 80286: enabled addressing a 16 MB memory instead of 1MB.
 80386: 32 bit, support for multitasking
46
Cont…
 x86 series - 32 bit processors
 80486: introduced sophisticated powerful cache and instruction
pipelining. Also offered built in math co-processor, offloading
complex math operations from the main CPU.
 Pentium: introduced super-scalar technique which allows multiple
instructions to execute in parallel.
 Pentium Pro: Increased super-scalar organization with Aggressive
register renaming, branch prediction, data flow analysis and
speculative execution.
 Pentium II: introduced the MMX technology which is designed
specifically to process video, audio and graphics data efficiently.
 Pentium III: provides additional floating point instructions for 3D
graphics s/w.
 Pentium 4: provides further floating point and multimedia
enhancements.
 Core series: the first x86 microprocessor with dual core, the implementation of two processors on a single chip.
Cont…

 Pentium D, Pentium Extreme Edition (EE)
 Core 2 series: The Core 2 extends the architecture to 64 bits. The
Core 2 Quad processor provides 4 processors on a single chip.
 Itanium series:
 64 bit architecture, examples: Itanium I and Itanium II.
 Itanium II has hardware enhancements to increase the speed.

48
Cont…
2. PowerPC Evolution
o In 1975, the 801 minicomputer project by IBM introduced RISC.
o In 1990, IBM introduced the IBM RISC System/6000, which was a RISC-like superscalar machine.
o POWER architecture. The PowerPC family:
 601: Quickly to market. 32-bit machine.
 603: 32-bit, low-end desktop and portable.
o Lower cost and more efficient implementation
 604: Desktop and low-end servers, 32-bit machine.
o Much more advanced superscalar design. Greater performance.
 620: High-end servers, 64-bit architecture.
 740/750: Also known as G3, two levels of cache on chip.
 G4: Increases parallelism and internal speed.
 G5: Improvements in parallelism and internal speed, 64-bit
organization.
49
