
Computer Architecture

Lecture 5
Topics

• Software (System/Application)
• Computer Organization
• Architecture and its Organization
• Basic Architectural Design Principles
• Caching
• Measuring Performance
• Aspects of Computer Performance
Software
• Computer software, or simply software, is a generic term for a collection of data or computer
instructions that tell the computer how to work.
• Most software is written in high-level programming languages, which are easier and more
productive for programmers to use because they are closer to natural languages.

• TYPES OF SOFTWARE
• System software: is software that directly operates the computer hardware, to provide basic
functionality needed by users and other software, and to provide a platform for running
application software.
• Application software: is software that uses the computer system to perform special functions
or provide entertainment functions beyond the basic operation of the computer itself.
Types of System Software
• TYPES OF SYSTEM SOFTWARE

• Operating systems: are essential collections of software that manage resources and provide
common services for other software that runs "on top" of them.
• Device drivers: operate or control a particular type of device that is attached to a computer.
Each device needs at least one corresponding device driver.
• Utilities: are computer programs designed to assist users in the maintenance and care of
their computers. E.g., Disk Defragmenter.
• Malicious software or malware: is software that is developed to harm and disrupt
computers. As such, malware is undesirable.
Types of Application Software
• TYPES OF APPLICATION SOFTWARE

• Office Suites e.g., MS Office Suite etc.


• PC Games e.g., Resident Evil, Need for Speed etc.
• Media Players e.g., KM Player.
• Internet Browsers e.g., Apple Safari, Mozilla Firefox, Google Chrome etc.
• Image Editors e.g., Adobe Photoshop.
• Computer Aided Design (CAD) Software: Autodesk Autocad
• Animation Software: Pixar Renderman
What is Computer Organization?
Organization or Microarchitecture
• Basic components of the CPU (ALU, registers, etc.)
• Memory (levels of the cache hierarchy)
• How they operate
• How they are connected together

• Organization is mostly invisible to the programmer

• Today some components are considered part of the architecture
• Why? Because a programmer can get better performance by knowing the
structure
• for example: the caches, the pipeline structure
Separate Architecture & its Organization

• Many organizations (implementations) for each particular architecture/ISA family:


EXAMPLES
• IBM 360/85, 360/91, 370s
• MIPS R2000, R3000, R10000
• Intel x86, Pentium, Pentium-Pro, Pentium 4
• DEC Alpha 21064, 21164, 21264

• Different points in the cost/performance curve


• Binary compatibility: same software could run on all machines
Different Architectures
• So why have different architectures?
• Different architecture philosophies &
therefore different styles
• support high level language operations: CISC
• support basic primitive operations: RISC
• Different Application Areas: for example,
multimedia instructions are better executed
on RISC
Basic Architectural Design Principles
• Design for the common case
• common cases in Hardware, uncommon cases in Software
• EXAMPLES
• Basic floating point operations in Hardware, Software function for the
cosine routine
• Memory access in Hardware, Trap to Software for a page fault

• Smaller is faster
• Must have a good reason for adding an instruction, register etc.
• memory hierarchy: registers, caches, main memory
• Keep it simple, stupid: simplicity favors smaller designs and shorter
design time
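The first principle above can be sketched in code: handle the common case on a fast path and fall back to a slower routine for the uncommon one, much like a resident memory access in hardware versus a page-fault trap to software. The page table and loader below are hypothetical stand-ins for illustration, not a real OS interface:

```python
# Toy sketch: common case (resident page) on the fast path,
# uncommon case (page fault) handled by slower "software".
resident_pages = {0: b"page0", 1: b"page1"}   # pages already in memory

def load_from_disk(page):
    # Slow path: stands in for the OS page-fault handler.
    return b"page%d" % page

def access(page):
    if page in resident_pages:          # common case: fast
        return resident_pages[page]
    data = load_from_disk(page)         # uncommon case: slow
    resident_pages[page] = data         # now resident for next time
    return data
```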
Assembly Language
• Symbolic form of computer machine language
• Easier to understand in comparison to machine language

• Where assembly language is used in practice:


• Things that aren’t expressible in a high-level language:
• for example:
• Programs that need access to protected registers (I/O)
• Size-critical applications e.g., programs for embedded processors
• time-critical applications e.g., real-time applications

• Why assembly language is not widely used:


• lower programmer productivity
• for example:
• longer coding time, more debugging
• Not portable across architectures
Caching
• Caching (pronounced “cashing”) is the process of storing data in a cache.
• A cache is a temporary storage area that stores data so that future requests for that data can be
served faster.
• CACHE HIT and CACHE MISS
• Cache hits are served by reading data from the cache, which is faster than re-computing a result or
reading from a slower data store; thus, the more requests that can be served from the cache, the
faster the system performs.
• A Cache Hit occurs when the requested data can be found in a cache, while a Cache Miss occurs when
it cannot.
• BENEFITS of CACHING
• The buffering provided by a cache benefits both Response Time and Throughput.
• EXAMPLES of HARDWARE CACHE
• CPU CACHE: Small memories on or close to the CPU can operate faster than the much
larger main memory.
• GPU CACHE: Advanced GPUs support on-board cache for faster and better graphics. For
example, GT200 architecture GPUs did not feature an L2 cache, while the Fermi GPU has 768 KB
cache, the Kepler GPU has 1536 KB cache, and the Maxwell GPU has 2048 KB of cache.
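The hit/miss behaviour described above can be illustrated with a toy direct-mapped cache (a Python sketch, assuming a dictionary as the slower backing store; real caches map blocks, not single addresses):

```python
class DirectMappedCache:
    """Toy direct-mapped cache: each address maps to exactly one slot."""
    def __init__(self, num_slots, backing_store):
        self.slots = [None] * num_slots      # each slot holds (address, value)
        self.backing = backing_store         # the slower data store
        self.hits = self.misses = 0

    def read(self, address):
        slot = address % len(self.slots)
        entry = self.slots[slot]
        if entry is not None and entry[0] == address:
            self.hits += 1                   # cache hit: served from the cache
            return entry[1]
        self.misses += 1                     # cache miss: go to the slow store
        value = self.backing[address]
        self.slots[slot] = (address, value)  # fill the slot for next time
        return value
```

Reading the same address twice gives a miss then a hit; a second address that maps to the same slot evicts the first.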
Caching
• Level 1 (L1) cache, or primary cache, is extremely fast but relatively small, and is usually
embedded in the processor chip as CPU cache.

• Level 2 (L2) cache, or secondary cache, is often more capacious than L1. L2 cache may be
embedded on the CPU, or it can be on a separate chip or coprocessor and have a high-
speed alternative system bus connecting the cache and CPU. That way it doesn't get
slowed by traffic on the main system bus.

• Level 3 (L3) cache is specialized memory, built onto the motherboard or within the CPU
module, developed to improve the performance of L1 and L2. L1 or L2 can be significantly
faster than L3, though L3 is usually about double the speed of main memory (RAM). The L3
cache feeds the L2 cache, which feeds the L1 cache, which feeds the processor.
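The lookup order described above - L1, then L2, then L3, then main memory - can be sketched as follows (a simplified Python illustration; real caches also fill the upper levels on a miss):

```python
def lookup(address, l1, l2, l3, ram):
    # Check each cache level in turn; a hit at any level
    # avoids the slower levels below it.
    for level in (l1, l2, l3):
        if address in level:
            return level[address]
    return ram[address]       # last resort: main memory
```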
Computer Performance and Benchmarks

• In computing, the performance of a computer system refers to the amount of useful work it accomplishes
(i.e. instructions it executes).
• When it comes to computers with high performance however, one or more of the following factors may be
involved:
• Short response time for a given piece of work.
• High throughput (rate of processing work).
• Low utilization of computing resource(s).
• High availability of the computing system or application.
• Fast (or highly compact) data compression and decompression.
• High bandwidth.
• Short data transmission time.

•  Benchmarks are programs that have been developed to test a CPU on all aspects of performance.
• EXAMPLES: SPECint and SPECfp benchmarks developed by Standard Performance Evaluation Corporation.
Performance
• Performance improvements:
– Improvements in semiconductor technology
• Feature size (e.g., the 10 nm process), clock speed
– Improvements in computer architectures
• Led to RISC architectures

– Together have enabled:


• Lightweight computers
• Improved Performance (Response Time/Throughput)
Processor Performance Improvement over Time

[Graph: growth in single-processor performance over time, annotated with the shift to RISC and the later move to multi-processor designs.]

Copyright © 2012, Elsevier Inc. All rights reserved.
Processor Performance Improvement over Time

• As the graph on the previous slide shows, with the advent of multi-processor/multi-core
systems, annual performance growth slowed from 52% per year to 22% per year.

• Multi-core CPUs were designed in response to the limitations single-core CPUs hit under
high-performance conditions such as overclocking.

• Multi-core CPUs feature separate cache memory for each core, so improved performance can
be obtained even at lower clock speeds than single-core CPUs, and overclocking, which can
damage the CPU, is not required.
Current Trends in Architecture
• Cannot continue to leverage Instruction-Level
parallelism (ILP)
– Single processor performance improvement ended in 2003

• New models for performance:


– Data-level parallelism (DLP)
– Thread-level parallelism (TLP)
– Request-level parallelism (RLP)
Classes of Computers
• Personal Mobile Device (PMD)
– e.g. smart phones, tablet computers
– Emphasis on energy efficiency and real-time
• Desktop Computing
– Emphasis on price-performance
• Servers
– Emphasis on availability, scalability, throughput
• Clusters / Warehouse Scale Computers
– Used for “Software as a Service (SaaS)”
– Emphasis on availability and price-performance
– Sub-class: Supercomputers, emphasis: floating-point performance and fast
internal networks
• Embedded Computers
– Emphasis: price
Bandwidth and Latency
• Bandwidth or throughput
– Total work done in a given time
– 10,000-25,000X improvement for processors
– 300-1200X improvement for memory and disks

• Latency or response time


– Time between start and completion of an event
– 30-80X improvement for processors
– 6-8X improvement for memory and disks

Parallelism
• Classes of parallelism in applications:
– Data-Level Parallelism (DLP)
– Task-Level Parallelism (TLP)

• Classes of architectural parallelism:


– Instruction-Level Parallelism (ILP)
– Thread-Level Parallelism (TLP)
– Request-Level Parallelism (RLP)

Classes of Parallelism in Applications
– Data-Level Parallelism (DLP): Data parallelism is parallelization across
multiple processors in parallel computing environments. It focuses on distributing the data
across different nodes, which operate on the data in parallel.

– A common type of Data Parallelism is when you convert a large video file into some
format, in that case, the file is split into two or more parts and each part is assigned to
each core of the CPU for conversion.
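The chunk-splitting pattern in the video example can be sketched as follows (Python; `convert` is a hypothetical stand-in for per-chunk work, and CPython threads illustrate the structure rather than true multi-core speedup):

```python
from concurrent.futures import ThreadPoolExecutor

def convert(chunk):
    # Stand-in for per-chunk work (e.g., transcoding one slice of a video).
    return [frame * 2 for frame in chunk]

def convert_in_parallel(frames, workers=2):
    # Split the data into one chunk per worker, then let each worker
    # operate on its own chunk independently (data-level parallelism).
    size = (len(frames) + workers - 1) // workers
    chunks = [frames[i:i + size] for i in range(0, len(frames), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(convert, chunks)
    return [frame for chunk in results for frame in chunk]
```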

– Task-Level Parallelism (TLP): Task parallelism (also known as function


parallelism and control parallelism) is a form of parallelization of computer code across
multiple processors in parallel computing environments. Task parallelism focuses on
distributing tasks across different processors.

– A common type of task parallelism is pipelining which consists of moving a single set of


data through a series of separate tasks where each task can execute independently of the
others.
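Pipelining as described above can be sketched with generator stages, where each stage performs its own task and could in principle run on its own processor:

```python
def stage1(items):
    for x in items:
        yield x + 1          # first task in the pipeline

def stage2(items):
    for x in items:
        yield x * 2          # second task, consuming stage1's output

# A single set of data flows through the series of separate tasks.
result = list(stage2(stage1([1, 2, 3])))   # [4, 6, 8]
```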
Classes of Architectural Parallelism
Instruction-Level Parallelism (ILP): is a measure of how many of the
instructions in a computer program can be executed simultaneously.

Thread Level Parallelism (TLP): is the parallelism inherent in an application that


runs multiple threads at once. This type of parallelism is found largely in
applications written for servers such as databases. By running many threads at
once, these applications are able to tolerate the high amounts of I/O and
memory system latency their workloads can suffer - while one thread is delayed
waiting for a memory or disk access, other threads can do useful work.

Request Level Parallelism (RLP): is parallelism among largely independent requests. When we
use the term request, we mean that a user is asking for some information, and the servers
respond to many such requests in parallel.
Flynn’s Taxonomy
• Single instruction stream, single data stream (SISD)

• Single instruction stream, multiple data streams (SIMD)


– Vector architectures
– Multimedia extensions
– Graphics processor units

• Multiple instruction streams, single data stream (MISD)


– No commercial implementation

• Multiple instruction streams, multiple data streams (MIMD)


– Tightly-coupled MIMD
– Loosely-coupled MIMD

Flynn’s Taxonomy
• SISD: A sequential computer which exploits no parallelism in either the instruction or data
streams. Single control unit (CU) fetches single instruction stream (IS) from memory. The CU
then generates appropriate control signals to direct single processing element (PE) to operate
on single data stream (DS) i.e., one operation at a time.
• Examples of SISD architecture are the traditional uni-processor machines like older personal
computers in the late 70s and early 80s.

• SIMD: describes computers with multiple processing elements that perform the same


operation on multiple data points simultaneously. Such machines exploit data level
parallelism: there are simultaneous (parallel) computations on data, but only a single process
(instruction) at a given moment.

• SIMD is particularly applicable to common tasks such as adjusting the contrast in a digital
image or adjusting volume of an audio or video file. Most modern CPU designs include SIMD
instructions to improve the performance of multimedia use.
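The contrast-adjustment example can be sketched as one operation applied uniformly to many data elements - the pattern a real SIMD unit would execute with vector instructions rather than this scalar Python loop (the scaling formula is a hypothetical illustration):

```python
def adjust_contrast(pixels, factor):
    # A single operation (scale each pixel about the midpoint 128),
    # applied to every data element: the SIMD pattern.
    return [min(255, max(0, int((p - 128) * factor + 128))) for p in pixels]
```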
Flynn’s Taxonomy
• MISD: In computing, MISD (multiple instruction, single data) is a type of parallel
computing architecture where many functional units perform different operations on the
same data. 

• Pipeline architectures belong to this type.

• MIMD: In computing, MIMD (multiple instruction, multiple data) is a technique employed to


achieve parallelism. Machines using MIMD have a number of processors that
function asynchronously and independently. At any time, different processors may be
executing different instructions on different pieces of data.

• MIMD architectures may be used in a number of application areas such as computer-aided


design/computer-aided manufacturing, simulation, modeling, and as communication
switches.
Measuring Performance
• Typical performance metrics:
– Response time
– Throughput

• Execution time
– Wall clock time: includes all system overheads
– CPU time: only computation time

• Benchmarks
– Kernels (e.g. matrix multiply)
– Toy programs (e.g. sorting)
– Synthetic benchmarks (e.g. Dhrystone)
– Benchmark suites (e.g., Standard Performance Evaluation Corp.'s SPECfp 2006, TPC-C)

[Table: SPEC benchmark names by generation]
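The wall-clock versus CPU-time distinction above can be measured directly (a Python sketch using the standard `time` module; the workload passed in is arbitrary):

```python
import time

def wall_and_cpu_time(work):
    # Wall-clock time includes all system overheads (I/O waits, other
    # processes); CPU time counts only time this process spent computing.
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    return (time.perf_counter() - wall_start,
            time.process_time() - cpu_start)
```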
Aspects of Computer Performance
Aspects of Computer Performance include:

•Availability
•Response Time
•Processing Speed
•Channel Capacity
•Latency
•Bandwidth
•Throughput
•Scalability
•Power Consumption
•Performance Per Watt
•Speedup
•Hardware Acceleration
Aspects of Computer Performance
Availability: Availability of a system is typically measured as a factor of its reliability - as reliability increases, so does
availability (that is, less downtime).
Availability of a system may also be increased by the strategy of focusing on increasing testability and maintainability.

Response Time: Response time is the total amount of time it takes to respond to a request for service. In computing, that
service can be any unit of work from a simple disk IO to loading a complex web page.

The response time is the sum of three numbers:


Service time - How long it takes to do the work requested.
Wait time - How long the request has to wait for requests queued ahead of it before it gets to run.
Transmission time – How long it takes to move the request to the computer doing the work and the response back to the
requestor.
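The three components sum directly; with hypothetical figures of 20 ms service time, 50 ms wait time, and 5 ms transmission time, the response time is 75 ms:

```python
def response_time(service, wait, transmission):
    # Response time = time doing the work + time queued behind earlier
    # requests + time moving the request and the reply.
    return service + wait + transmission
```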

Processing Speed: Instructions per second (IPS) is a measure of a computer's processor speed.


In computer architecture, instructions per cycle (IPC) is one aspect of a processor's performance: the average number
of instructions executed for each clock cycle. 
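IPS follows from IPC and clock rate; for example, a hypothetical processor averaging 2 instructions per cycle at 3 GHz executes 6 billion instructions per second:

```python
def instructions_per_second(ipc, clock_hz):
    # IPS = average instructions per cycle x cycles per second.
    return ipc * clock_hz
```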

Channel Capacity: Channel capacity is the tightest upper bound on the rate of information that can be reliably transmitted
over a communications channel.
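This bound is given by the Shannon-Hartley theorem, C = B * log2(1 + S/N); for example, a hypothetical 3 kHz channel with a signal-to-noise ratio of 1000 supports roughly 30 kbit/s:

```python
import math

def shannon_capacity(bandwidth_hz, snr):
    # Shannon-Hartley theorem: the tightest upper bound on the rate of
    # information reliably transmissible over a noisy channel.
    return bandwidth_hz * math.log2(1 + snr)
```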

Latency: Latency is a time delay between the cause and the effect of some physical change in the system being observed.
Online games are sensitive to latency since fast response times to new events occurring during a game session are
rewarded while slow response times may carry penalties. Lag is the term used to describe latency in gaming.

Network latency in a packet-switched network is measured as either one-way (the time from the source sending a packet
to the destination receiving it), or round-trip delay time (the one-way latency from source to destination plus the one-way
latency from the destination back to the source).
Aspects of Computer Performance
Bandwidth: Bandwidth is a measurement of bit-rate of available data communication resources, expressed in bits per second or
multiples of it (bit/s, kbit/s, Mbit/s, Gbit/s, etc.).

Throughput: The number of tasks processed per unit time.

Scalability: Scalability is the ability of a system, network, or process to handle a growing amount of work in a capable manner or its
ability to be enlarged to accommodate that growth.

Power Consumption: The amount of electricity used by the computer. This becomes especially important for systems with limited
power sources such as solar, batteries, human power.

Performance Per Watt: In computing, performance per watt is a measure of the energy efficiency of a particular computer
architecture or computer hardware. Literally, it measures the rate of computation that can be delivered by a computer for
every watt of power consumed.

Speedup: In computer architecture, speedup is a number that measures the relative performance of two systems processing the same
problem. 
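Speedup is simply the ratio of the two execution times; a task taking 10 s on the old system and 2.5 s on the new one gives a speedup of 4:

```python
def speedup(time_old, time_new):
    # Speedup = execution time on the reference system divided by the
    # time on the improved system; > 1 means the new system is faster.
    return time_old / time_new
```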

Hardware Acceleration: Hardware acceleration is the use of computer hardware specially made to perform some functions more
efficiently than is possible in software running on a general-purpose CPU.
The implementation of computing tasks in hardware to decrease latency and increase throughput is known as hardware acceleration.
Examples of hardware acceleration include graphics acceleration functionality in graphics processing units (GPUs).

ADVANTAGES of Software Acceleration and Hardware Acceleration and their Tradeoff

Typical advantages of software include lower non-recurring engineering costs, heightened portability, and ease of updating
features or patching bugs, at the cost of overhead to compute general operations.

Advantages of hardware include speedup, lower latency, increased parallelism and bandwidth, and better utilization of functional
components available on an integrated circuit, at the cost of functional verification, i.e., the art of verifying the functionality of a
circuit after designing it.
