Professional Documents
Culture Documents
COA CH01 & CH 02 (Autosaved)
COA CH01 & CH 02 (Autosaved)
• Attributes of a system
visible to the programmer • Instruction set, number of bits
• Have a direct impact on used to represent various data
the logical execution of a types, I/O mechanisms,
program Architectural techniques for addressing
Computer attributes memory
Architecture include:
• Hardware details
transparent to the
programmer, control
signals, interfaces Organizational Computer
attributes Organization
between the computer include: • The operational units and
and peripherals, their interconnections
memory technology that realize the
used architectural
specifications
For CS -----> CoA (CoSc2022)
Compiled by Ferhan Gursum
For IT ----> CoA (ITec2022)
+
IBM System 370 Architecture
IBM System/370 architecture
Was introduced in 1970
Included a number of models
Could upgrade to a more expensive, faster model without having to
abandon original software
New models are introduced with improved technology, but retain
the same architecture so that the customer’s software investment is
protected
Architecture has survived to this day as the architecture of IBM’s
mainframe product line
I/O Main
memory
System
Bus
CPU
CPU
Registers ALU
Structure
Internal
Bus
Control
Unit
CONTROL
UNIT
Sequencing
Logic
Control Unit
Registers and
Decoders
Control
Memory
CPUInterconnection
Some mechanism that provides for
communication among the control unit,
ALU, and registers
Core
An individual processing unit on a processor chip
May be equivalent in functionality to a CPU on a single-CPU system
Specialized processing units are also referred to as cores
Processor
A physical piece of silicon containing one or more cores
Is the computer component that interprets and executes instructions
Referred to as a multicore processor if it contains multiple cores
Processor
I/O chips chip
PROCESSOR CHIP
L3 cache L3 cache
CORE
Arithmetic
Instruction and logic Load/
logic unit (ALU) store logic
L2 instruction L2 data
cache cache
Figure 1.3
Motherboard with Two Intel Quad-Core Xeon Processors
For CS -----> CoA (CoSc2022)
Compiled by Ferhan Gursum
For IT ----> CoA (ITec2022)
Figure 1.4
zEnterprise EC12
Processor Unit (PU)
Chip Diagram
For CS CoA (CoSc2022)
Compiled by Ferhan Gursum
For IT CoA (ITec2022)
Figure 1.5
zEnterprise
EC12
Core Layout
For CS -----> CoA (CoSc2022)
Compiled by Ferhan Gursum
For IT ----> CoA (ITec2022)
+
History of Computers
First Generation: Vacuum Tubes
Vacuum tubes were used for digital logic elements and memory
IAS computer
Fundamental design approach was the stored program concept
Attributed to the mathematician John von Neumann
Instructions
and data
Instructions
and data
Instructions
M(0)
M(0)
and data
M(1)
M(1)
M(2)
PC IBR
M(2)
M(3)
M(4) AC: Accumulator register
0 8 20 28 39
opcode (8 bits) address (12 bits) opcode (8 bits) address (12 bits)
Memory address register • Specifies the address in memory of the word to be written from
(MAR) or read into the MBR
Instruction register (IR) • Contains the 8-bit opcode instruction being executed
Accumulator (AC) and • Employed to temporarily hold operands and results of ALU
multiplier quotient (MQ) operations
For CS CoA (CoSc2022)
Compiled by Ferhan Gursum
Start
Yes Is next No
instruction MAR PC
No memory in IBR?
Fetch access
cycle required
MBR M(MAR)
Left
No Yes IBR MBR (20:39)
IR IBR (0:7) IR MBR (20:27) instruction
IR MBR (0:7)
MAR IBR (8:19) MAR MBR (28:39) required?
MAR MBR (8:19)
PC PC + 1
Decode instruction in IR
Execution Yes
Is AC > 0?
cycle
AC MBR AC AC + MBR
Table 1.1
branch 00001110 JUMP M(X,20:39) Take next instruction from right half of M(X)
00001111 JUMP+ M(X,0:19) If number in the accumulator is nonnegative,
take next instruction from left half of M(X)
0 JU If number in the
0 MP accumulator is nonnegative,
Conditional branch 0 + take next instruction from
The IAS
1 M(X right half of M(X)
0 ,20:
0 39)
Instruction Set
0
0
00000101 ADD M(X) Add M(X) to AC; put the result in AC
00000111 ADD |M(X)| Add |M(X)| to AC; put the result in AC
00000110 SUB M(X) Subtract M(X) from AC; put the result in AC
00001000 SUB |M(X)| Subtract |M(X)| from AC; put the remainder
in AC
00001011 MUL M(X) Multiply M(X) by MQ; put most significant
bits of result in AC, put least significant bits
Arithmetic
in MQ
00001100 DIV M(X) Divide AC by M(X); put the quotient in MQ
and the remainder in AC
00010100 LSH Multiply accumulator by 2; i.e., shift left one
bit position
00010101 RSH Divide accumulator by 2; i.e., shift right one
position
00010010 STOR M(X,8:19) Replace left address field at M(X) by 12
Address modify
rightmost bits of AC (Table can be found on
page 17 in the
00010011 STOR M(X,28:39) Replace right address field at M(X) by 12
rightmost bits of AC
textbook.)
Symbolic
Instruction Type Opcode Representation Description
00001010 LOAD MQ Transfer contents of register MQ to the
accumulator AC
00001001 LOAD MQ,M(X) Transfer contents of memory location X to
MQ
00100001 STOR M(X) Transfer contents of accumulator to memory
Data transfer location X
00000001 LOAD M(X) Transfer M(X) to the accumulator
00000010 LOAD –M(X) Transfer –M(X) to the accumulator
00000011 LOAD |M(X)| Transfer absolute value of M(X) to the
accumulator
00000100 LOAD –|M(X)| Transfer –|M(X)| to the accumulator
Unconditional 00001101 JUMP M(X,0:19) Take next instruction from left half of M(X)
branch 00001110 JUMP M(X,20:39) Take next instruction from right half of M(X)
Unconditional 00001101 JUMP M(X,0:19) Take next instruction from left half of M(X)
branch 00001110 JUMP M(X,20:39) Take next instruction from right half of M(X)
00001111 JUMP+ M(X,0:19) If number in the accumulator is nonnegative,
take next instruction from left half of M(X)
0 JU If number in the
0 MP accumulator is nonnegative,
Conditional branch 0 + take next instruction from
1 M(X right half of M(X)
0 ,20:
0 39)
0
0
00000101 ADD M(X) Add M(X) to AC; put the result in AC
00000111 ADD |M(X)| Add |M(X)| to AC; put the result in AC
00000110 SUB M(X) Subtract M(X) from AC; put the result in AC
00001000 SUB |M(X)| Subtract |M(X)| from AC; put the remainder
in AC
00001011 MUL M(X) Multiply M(X) by MQ; put most significant
bits of result in AC, put least significant bits
Arithmetic
in MQ
00001100 DIV M(X) Divide AC by M(X); put the quotient in MQ
and the remainder in AC
00010100 LSH Multiply accumulator by 2; i.e., shift left one
bit position
00010101 RSH Divide accumulator by 2; i.e., shift right one
position
00010010 STOR M(X,8:19) Replace left address field at M(X) by 12
rightmost bits of AC
Address modify
00010011 STOR M(X,28:39) Replace right address field at M(X) by 12
rightmost bits of AC
+ History of Computers
Cheaper
Discrete component
Single, self-contained
transistor
Manufactured separately, packaged in their own containers, and
soldered or wired together onto masonite-like circuit boards
Manufacturing process was expensive and cumbersome
Read
Activate Write
signal
Chip
Gate
Packaged
chip
ed of
rc
or in
ga w
d
st rk
ci
ul l a
at n
te
gr tio
si o
’s
an w
om re
te n
tr rst
in ve
pr o o
In
Fi
M
100 bn
10 bn
1 bn
100 m
10 m
100,000
10.000
1,000
100
10
1
1947 50 55 60 65 70 75 80 85 90 95 2000 05 11
Announced in 1964
Was the success of the decade and cemented IBM as the overwhelmingly dominant computer
vendor
The architecture remains to this day the architecture of IBM’s mainframe computers
Increasing
Increasing
number of I/O
speed
ports
Increasing
Increasing cost
memory size
Omnibus
ULSI
Semiconductor Memory Ultra Large
Scale
Microprocessors Integration
For CS CoA (CoSc2022)
Compiled by Ferhan Gursum
For IT CoA (ITec2022)
Semiconductor Memory
In 1970 Fairchild produced the first relatively capacious semiconductor memory
In 1974 the price per bit of semiconductor memory dropped below the price per bit
of core memory
There has been a continuing and rapid decline in Developments in memory and processor
memory cost accompanied by a corresponding technologies changed the nature of computers in
increase in physical memory density less than a decade
Each generation has provided four times the storage density of the previous generation, accompanied
by declining cost per bit and declining access time
+
Microprocessors
The density of elements on processor chips continued to rise
More and more elements were placed on each chip so that fewer and fewer chips were
needed to construct a single computer processor
Two processor families are the Intel x86 and the ARM architectures
Current x86 offerings represent the results of decades of design
effort on complex instruction set computers (CISCs)
An alternative approach to processor design is the reduced
instruction set computer (RISC)
ARM architecture is used in a wide variety of embedded systems and
is one of the most powerful and best-designed RISC-based systems
on the market
For CS CoA (CoSc2022)
Compiled by Ferhan Gursum
For IT CoA (ITec2022)
Highlights of the Evolution of the
Intel Product Line:
8080 8086 80286 80386 80486
• World’s first • A more powerful • Extension of the • Intel’s first 32-bit • Introduced the
general-purpose 16-bit machine 8086 enabling machine use of much more
microprocessor • Has an addressing a 16- • First Intel sophisticated and
• 8-bit machine, 8- instruction cache, MB memory processor to powerful cache
bit data path to or queue, that instead of just support technology and
memory prefetches a few 1MB multitasking sophisticated
• Was used in the instructions instruction
first personal before they are pipelining
computer (Altair) executed • Also offered a
• The first built-in math
appearance of coprocessor
the x86
architecture
• The 8088 was a
variant of this
processor and
used in IBM’s first
personal
computer
(securing the
success of Intel
Highlights of the Evolution of the
Intel Product Line:
Pentium
• Intel introduced the use of superscalar techniques, which allow multiple instructions to execute in parallel
Pentium Pro
• Continued the move into superscalar organization with aggressive use of register renaming, branch prediction, data flow analysis, and speculative
execution
Pentium II
• Incorporated Intel MMX technology, which is designed specifically to process video, audio, and graphics data efficiently
Pentium III
•Incorporated additional floating-point instructions
•Streaming SIMD Extensions (SSE)
Pentium 4
• Includes additional floating-point and other enhancements for multimedia
Core
• First Intel x86 micro-core
Core 2
• Extends the Core architecture to 64 bits
• Core 2 Quad provides four cores on a single chip
• More recent Core offerings have up to 10 cores per chip
• An important addition to the architecture was the Advanced Vector Extensions instruction set
+
Embedded Systems
The use of electronics and software within a product
Billions of computer systems are produced each year that are embedded
within larger devices
Processor Memory
Human Diagnostic
interface port
A/D D/A
conversion Conversion
Actuators/
Sensors
indicators
It is the fourth generation that is usually thought of as the IoT and it is marked by the use of billions of
embedded devices
For CS CoA (CoSc2022)
Compiled by Ferhan Gursum
For IT CoA (ITec2022)
+ Embedded Application Processors
Operating versus
Systems Dedicated Processors
There are two general approaches to Application processors
Defined by the processor’s ability to execute
developing an embedded operating
complex operating systems
system (OS): General-purpose in nature
Take an existing OS and adapt it for An example is the smartphone – the
embedded system is designed to support
the embedded application numerous apps and perform a wide variety of
functions
Design and implement an OS
intended solely for embedded use Dedicated processor
Is dedicated to one or a small number of
specific tasks required by the host device
Because such an embedded system is
dedicated to a specific task or tasks, the
processor and associated components can be
engineered to reduce size and cost
For CS CoA (CoSc2022)
Compiled by Ferhan Gursum
Processor
Has a processor whose behavior is difficult to observe both by the programmer and the user
Is not programmable once the program logic for the device has been burned into ROM
Often have wireless capability and appear in networked configurations, such as networks of
sensors deployed over a large area
Typically have extreme resource constraints in terms of memory, processor size, time, and
power consumption
For CS CoA (CoSc2022)
Compiled by Ferhan Gursum
For IT CoA (ITec2022)
ARM
Refers to a processor architecture that has evolved from RISC design
principles and is used in embedded systems
Chips are high-speed processors that are known for their small die size and
low power requirements
Cortex-M
• Cortex-M0
Cortex-R • Cortex-M0+
• Cortex-M3
Cortex- • Cortex-M4
A/Cortex-
A50
Security Analog Interfaces Timers &Triggers Parallel I/O Ports Serial Interfaces
Periph Timer/
bus int counter Pin
Hard- reset USART USB
ware A/D D/A Low Real
AES con- con- energy time ctr
General External Low-
verter verter energy
Pulse Watch- purpose Inter- UART
counter dog tmr I/O rupts UART
Peripheral bus
32-bit bus
Voltage Voltage High fre- High freq Flash SRAM Debug DMA
regula- compar- quency RC crystal memory memory inter- control-
tor ator oscillator oscillator 64 kB 64 kB face ler
Microcontroller Chip
ICode SRAM &
interface peripheral I/F
Bus matrix
Debug logic
Memory
DAP protection unit
ARM
NVIC core ETM
Cortex-M3 Core
NVIC ETM Cortex-M3
interface interface
Processor
32-bit ALU
Hardware 32-bit
divider multiplier
Control Thumb
logic decode
Instruction Data
interface interface
One example is the provisioning of high-performance and/or high-reliability networking between the
provider and subscriber
The collection of network capabilities required to access a cloud, including making use of specialized
services over the Internet, linking enterprise data center to a cloud, and using firewalls and other network
security devices at critical points to enforce access security policies
Cloud Storage
Subset of cloud computing
Consists of database storage and database applications hosted remotely on cloud servers
Enables small businesses and individual users to take advantage of data storage that scales with their need
and to take advantage of a variety of database applications without having to buy, maintain, and manage the
storage assets
+ Summary
Basic Concepts and
Chapter 1 Computer Evolution
Organization and architecture
Embedded systems
Structure and function The Internet of things
Brief history of computers Embedded operating systems
The First Generation: Vacuum tubes Application processors versus dedicated
The Second Generation: Transistors processors
The Third Generation: Integrated
Circuits Microprocessors versus microcontrollers
Later generations Embedded versus deeply embedded systems
Performance Issues
Today’s laptops have the computing power of an IBM mainframe from 10 or 15 years ago
Desktop applications that require the great power of today’s microprocessor-based systems include:
Image processing
Three-dimensional rendering
Speech recognition
Videoconferencing
Multimedia authoring
Voice and video annotation of files
Simulation modeling
Businesses are relying on increasingly powerful servers to handle transaction and database processing
and to support massive client/server networks that have replaced the huge mainframe computer
centers of yesteryear
Cloud service providers use massive high-performance banks of servers to satisfy high-volume, high-
transaction-rate applications for a broad spectrum of clients
For CS CoA (CoSc2022)
Compiled by Ferhan Gursum
For IT CoA (ITec2022)
+
Microprocessor Speed
Techniques built into contemporary processors include:
• Processor moves data or instructions
Pipelining into a conceptual pipe with all stages of
the pipe processing simultaneously
• Processor looks ahead in the instruction code
Branch prediction fetched from memory and predicts which
branches, or groups of instructions, are likely
to be processed next
• This is the ability to issue more than
Superscalar execution one instruction in every processor
clock cycle. (In effect, multiple parallel
pipelines are used.)
• Processor analyzes which instructions
Data flow analysis are dependent on each other’s results,
or data, to create an optimized
schedule of instructions
•Using branch prediction and data flow analysis, some
Speculative execution
processors speculatively execute instructions ahead of
their actual appearance in the program execution,
holding the results in temporary locations, keeping
execution engines as busy as possible
Graphics display
Wi-Fi modem
(max speed)
Hard disk
Optical disc
Laser printer
Scanner
Mouse
Keyboard
101 102 103 104 105 106 107 108 109 1010 1011
Data Rate (bps)
RC delay
Speed at which electrons flow limited by resistance and capacitance of metal wires connecting
them
Delay increases as the RC product increases
As components on the chip decrease in size, the wire interconnects become thinner, increasing
resistance
Also, the wires are closer together, increasing capacitance
Memory latency
Memory speeds lag processor speeds
106
Transistors (Thousands)
105 Frequency (MHz)
Power (W)
104 Cores
103
102
10
+
1
0.1
1970 1975 1980 1985 1990 1995 2000 2005 2010
(1 – f)T fT
N
1
1 f 1 T
N
Spedup
f = 0.90
+ f = 0.75
f = 0.5
Number of Processors
Queuing system
If server is idle an item is served immediately, otherwise an arriving item joins a
queue
There can be a single queue for a single server or for multiple servers, or multiple
queues with one being for each of multiple servers
an
co di alog
nv git
er al to
sio
n
Compiler technology X X X
Processor X X
implementation
Cache and memory
X X
hierarchy
• Arithmetic
• Geometric
• Harmonic
MD
AM
(a) GM
HM
MD
AM
(b) GM
HM
MD
AM
(c) GM
HM
MD
AM
(d) GM
HM
MD
AM
(e) GM
HM
MD
AM
(f) GM
HM
MD
AM
(g) GM
HM
0 1 2 3 4 5 6 7 8 9 10 11
(a) Constant (11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11) MD = median
(b) Clustered around a central value (3, 5, 6, 6, 7, 7, 7, 8, 8, 9, 1 1) AM = arithmetic mean
(c) Uniform distribution (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1 1) GM = geometric mean
(d) Large-number bias (1, 4, 4, 7, 7, 9, 9, 10, 10, 1 1, 11) HM = harmonic mean
(e) Small-number bias(1, 1, 2, 2, 3, 3, 5, 5, 8, 8, 1 1)
(f) Upper outlier (11, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
(g) Lower outlier (1, 1 1, 11, 11, 11, 11, 11, 11, 11, 11, 11)
SPEC
An industry consortium
Defines and maintains the best known collection of benchmark
suites aimed at evaluating computer systems
Performance measurements are widely used for comparison and
research purposes
Best known SPEC benchmark suite
+ Industrystandard suite for processor intensive
applications
Fifth
generation of processor intensive suites
from SPEC
Table 2.5
C Compiler Based on gcc Version 3.2,
403.gcc 2.24 1,064 C
generates code for Opteron.
Combinatoria Vehicle scheduling
429.mcf 2.53 327 C l algorithm.
Optimization
Artificial Plays the game of Go, a
445.gobmk 2.91 1,603 C Intelligence simply described but deeply
complex game.
Table 2.6
Simulate Newtonian
Biochemistry /
equations of motion for
435.gromacs 1.98 1,958 C, Fortran Molecular
Dynamics hundreds to millions of
particles.
436.cactusAD 3.32 1,376 C, Fortran Physics / General Solves the Einstein
M Relativity evolution equations.
437.leslie3d 2.61 1,273 Fortran Fluid Dynamics Model fuel injection flows.
Floating-Point
Linear Test cases include railroad
450.soplex 2.32 703 C++ Programming, planning and military airlift
Optimization models.
Benchmarks
453.povray 1.48 940 C++ Image Ray-tracing 3D Image rendering.
Structural Finite element code for
454.calculix 2.29 3,04` C, Fortran linear and nonlinear 3D
Mechanics
structural applications.
459.GemsFDT Computational Solves the Maxwell
2.95 1,320 Fortran
D Electromagnetics equations in 3D.
Quantum chemistry
Quantum
465.tonto 2.73 2,392 Fortran package, adapted for
Chemistry
crystallographic tasks.
Simulates incompressible
470.lbm 3.82 1,500 C Fluid Dynamics
fluids in 3D.
481.wrf 3.10 1,684 C, Fortran Weather Weather forecasting model
482.sphinx3 5.41 2,472 C Speech recognition Speech recognition
software.
+
Terms Used in SPEC Documentation
Get next
program
Run program
three times
Select
median value
Ratio(prog) =
Tref(prog)/TSUT(prog)
End