Microprocessor Systems: Introduction & Historical Review

Microprocessor Systems

Introduction & Historical Review

Dr. Taisir Eldos
Jordan University of Science and Technology
Draft Lectures

“Study the past if you would define the future” - Confucius


µP Systems
★ Digital systems play prominent roles in life: social, commercial, industrial, scientific, etc.
★ Performance, cost, size, power, etc. vary depending on what the systems are designed to do
★ Computing platforms like desktops, laptops & tablets represent a small fraction of the world’s computing; industrial and embedded platforms are far more numerous
★ Artificial Intelligence (AI) touches everything today, and it requires sophisticated high-performance processors to process vast amounts of data, as in autonomous driving
★ Ubiquitous computing is about hardware & software engineering appearing everywhere in life
❖ Handheld computing devices
❖ Internet of Things (IoT) & Internet of Medical Things (IoMT)
★ Telematics is a technology that combines informatics and telecommunications (Internet) for specific applications like fleet management, using data collected by devices like sensors
★ In modern societies, people deal with computing devices more than 70 times in a typical day
❖ Automatic Tellers
❖ Security Gates
❖ Automotives
❖ Phones
❖ Tickets


General Purpose Computer
★ A typical microprocessor system consists of:
❖ Processor and support logic: clock, reset, etc.
❖ Memory, where code is executed and data is stored: primary memory & secondary storage
❖ Input & Output: peripherals that interact with the outside world via an interface
❖ Connections, to connect all parts
❖ Glue logic, to arbitrate the various operations
★ Power is supplied to the various parts at the required voltage
❖ Conventional operating voltages
✦ 12 V, Hard Disk Drives & Liquid Crystal Displays
✦ ±12 V, RS-232 communication links
✦ 5.0 V & 3.3 V, legacy & modern chips
✦ 1.8 V, MultiMedia Cards, Flash storage
❖ Modern operating voltages
✦ 1.5 V, 1.2 V or 1.1 V for memory
✦ 0.7 V to 1.4 V for Central Processing Units
✦ 0.6 V to 1.0 V for Graphics Processing Units
[Figure: processor connected to memory, storage and an interface serving the keyboard, mouse, screen, touch, camera, printer, mic and speaker]


Design Example - Small Computer
★ Define the target class of users, hence the functions, performance & price (as a range)
★ Write down what this system is going to do: the workload
★ Write down the specifications needed to achieve the workload
★ Search for the system components based on the target application
❖ CPU class, complexity, price & performance
❖ MEM class, size, price & performance
❖ I/O components, types & numbers
❖ Form factor, power supply, etc.
★ Analyze trade-offs: cost & performance
★ Fewer components mean
❖ Lower assembly cost
❖ Lower testing cost
❖ Smaller system
❖ More reliability
❖ Lower shipping cost
❖ Lower power consumption


Quality of Design
★ Primary Metrics
❖ Price, which depends on the cost of materials and on design time and effort
❖ Power, important in portable systems for battery life and in large systems for cost
❖ Performance, doing more work in less time is always a requirement
★ Secondary Metrics
❖ Form factor: size, weight, etc.
❖ Operating range: temperature, radiation, etc.
❖ Reliability: Mean Time Between Failures (MTBF)
★ Design Steps:
❖ Project statement: objectives, deliverables, constraints, milestones, etc.
❖ Project specifications: for managers, developers, testers, clients, etc.
❖ Project analysis: cost, assumptions, discrepancies, choices, etc.
❖ Co-design (hardware & software) tradeoffs: maximize performance & minimize cost
❖ Implementation tradeoffs
❖ Construction
❖ Testing
❖ Documentation


The Beginning
★ The Post Office Research Station invested in designing a machine to break German codes in WWII
★ Designed for a specific task (not a programmable computer)
★ Primitive machines: bulky, clumsy and slow
❖ 1,600 vacuum tubes in Mark 1
❖ 2,400 vacuum tubes in Mark 2
★ 9 kW of power consumption
★ No RAM at all! Purpose-specific design
★ Input / Output?
❖ Input was paper tape
❖ Output was indicator lamps
★ Started in 1943
★ Retired in 1960
[Figure: Colossus, 1943, UK]


Programmable Computer
★ The USA built a machine for ballistics, the Electronic Numerical Integrator And Computer (ENIAC), completed in 1945 and retired in 1955
★ Programmable by physical rewiring
★ University of Pennsylvania (Philadelphia)
★ Primitive machines: bulky, clumsy and slow
★ Caused a city brownout when turned on
★ ENIAC was capable of doing:
❖ 5,000 additions per second
❖ 350 multiplications & 38 divisions per second
❖ 10 to 20 FLOPS (by software routines)
❖ Took 70 hours to compute π to 2,037 places
★ Modern Computers (TOP500 List, #1)
❖ 2004, NEC’s Earth Simulator (Japan): 36 TFLOPS
❖ 2012, Cray’s Titan (USA): 18 PFLOPS
❖ 2021, Fugaku (Japan): 440 PFLOPS
❖ 2022, Frontier (USA): 1100 PFLOPS
[Figure: ENIAC, 1946, USA]
[Figure: NEC’s 6,400 CPUs & $500,000,000, next to an M3 Max based personal system, 16.4 TFLOPS & $3,000]


Integrated Circuits
★ In the late 1950s, transistors replaced electronic valves as the switches used to build gates, and computers were made from them as discrete elements, since they are smaller, faster and more reliable
★ Today’s transistors are built using Complementary Metal Oxide Semiconductor (CMOS) technology for density and low power consumption
★ An Integrated Circuit (IC) places the whole circuit, transistors and connections, on a single die, yielding smaller size and more reliability
★ Dies are packaged in what are called chips, which come in various forms and sizes
★ Contacts or pins on the chip provide communication with other chips
❖ Used as Address, Data, Control, Power & Test connections
❖ Started with 16 and went to 18, 40, 64, …, now 6096
★ How does the number of transistors affect performance?
★ How does a wider data bus affect performance?
★ How does a wider address bus affect performance?
★ How does a higher clock frequency affect performance?
★ How does power consumption relate to performance?
★ Why do we have many VCC & GND inputs?
[Figure: CPU chip with VCC, GND, address bus (AB), data bus (DB) and control bus (CB) connections]


1st Generation (1946 - 1958)
★ Designed in 12 months & built in 18 months for the US Army, at $500,000 (about $7 million today)
★ Technology
❖ Processing: electronic valves
❖ Memory: core memory and magnetic tape
★ Specifications
❖ 1000s of vacuum tubes and 1000s of electromagnetic relays
❖ Huge power demand (160 kW), with liquid cooling
❖ Weight of 30 tons and size of 15 x 9 x 2.5 m (like 12 offices)
★ Computing Power
❖ Basic functions: thousands of decimal additions/second
❖ Compute ballistic tables (for the military)
★ Ease of use
❖ Hardwired program for ballistic table computation; changing the task took 3 weeks
❖ Clumsy input/output
★ Reliability
❖ Failed a few times a day, with only a few days without failure


2nd Generation (1958 - 1965)
★ Technology
❖ Discrete transistors, invented in 1947 (used in computers from 1958)
★ Compared to electronic valves
❖ Size: 5 x 5 x 5 mm versus 15 x 15 x 40 mm (50+ times smaller)
❖ Terminal count: 3 versus 5 or 6 (about half)
❖ Power: 5 mW versus 250 mW (50 times lower)
❖ Voltage: 5 V or less versus 120 V (more than 20 times lower)
❖ Frequency: 1 MHz versus 0.1 MHz (10 times higher)
❖ Mean Time Between Failures (at least 10 times better)
★ Today’s transistors are far better: much faster & with lower power consumption
★ Historical review of Silicon
❖ Jöns Berzelius discovered Silicon in 1823 in Sweden
❖ John Bardeen, Walter Brattain & William Shockley invented the transistor at Bell Labs in 1947, and received the Nobel Prize in Physics in 1956
❖ Jack Kilby demonstrated the first integrated circuit in 1958, and Robert Noyce developed the monolithic silicon integrated circuit in 1959
❖ Robert Noyce and Gordon Moore founded Intel Corporation in 1968


3rd Generation (1965 - 1975)
★ Integrated Circuits (ICs), transistors and connections on a single die, invented in 1959
★ Integration levels, based on components per chip
❖ 1962: 10 transistors, Small Scale Integration (SSI)
❖ 1966: 100 transistors, Medium Scale Integration (MSI)
❖ 1969: 1,000 transistors, Large Scale Integration (LSI)
★ 1971 Intel 4004: 2300 transistors, 10 µm, 16-pin
❖ PMOS, 4-bit data & 12-bit address, 740 kHz, 46 instructions
★ 1972 Intel 8008: 3500 transistors, 10 µm, 18-pin
❖ PMOS, 8-bit data, 14-bit address, 800 kHz, 48 instructions
★ 1974 Intel 8080: 6000 transistors, 6 µm, 40-pin
❖ NMOS, 8-bit data, 16-bit address, 2 MHz, 10 times faster than the 8008
★ The 4004 could do 60,000 decimal operations per second
★ To get an idea of its Instructions Per Second (IPS) performance:
❖ Cycles Per Instruction (CPI) = 8
❖ Instructions Per Cycle (IPC) = 1 / CPI = 0.125
❖ Instructions Per Second (IPS) = IPC x F = 0.125 x 740,000 = 92,500 IPS = 92.5 KIPS
[Figure: Intel 4004, Intel 8008 and Intel 8080 chips]
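The IPS arithmetic for the 4004 can be sketched in a few lines of Python. The function name is hypothetical and chosen only for illustration; the clock and CPI values are the ones quoted on the slide.

```python
def instructions_per_second(clock_hz, cycles_per_instruction):
    """IPS = IPC x F, where IPC = 1 / CPI."""
    ipc = 1.0 / cycles_per_instruction
    return ipc * clock_hz


# Intel 4004 figures from the slide: 740 kHz clock, CPI = 8.
ips_4004 = instructions_per_second(740_000, 8)
print(f"{ips_4004:,.0f} IPS = {ips_4004 / 1000:.1f} KIPS")
```

The same function can be reused for any processor whose clock and average CPI are known.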



Modern Systems
★ Integration Levels (Tr for Transistor; K, M & B for Thousand, Million & Billion)
❖ 1975: 10 KTr, Very Large Scale Integration (VLSI)
❖ 1980: 100 KTr, Ultra Large Scale Integration (ULSI)
❖ 1990: 1 MTr, Extremely Large Scale Integration (ELSI)
❖ 2000: 10 MTr, VLSI used for all as a generic name
❖ 2010: 1 BTr
❖ 2013: 10 BTr
❖ 2015: 20 BTr
❖ 2018: 40 BTr
❖ 2022: 60 BTr
❖ 2023: 90 BTr
★ Today, chips are made of multiple dies for higher yields
★ Process, node or fab used to refer to the transistor dimension (10 µm was the beginning)
★ Process today refers to feature size (like channel length), which keeps getting smaller and smaller
❖ A 3 nm process can pack around 300 MTr/mm2; a transistor cell is around 60 nm x 60 nm
❖ The 6 nm diameter copper wires used are 10,000 times thinner than a human hair (60 µm)
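The two size figures above follow from simple unit conversions. The sketch below, with hypothetical helper names, assumes square transistor cells that tile the die with no wasted area, which is a simplification.

```python
import math


def cell_side_nm(transistors_per_mm2):
    """Side of a square cell, assuming cells tile the die perfectly."""
    nm2_per_mm2 = 1e12  # 1 mm = 1e6 nm, so 1 mm2 = 1e12 nm2
    return math.sqrt(nm2_per_mm2 / transistors_per_mm2)


def thinness_ratio(hair_um, wire_nm):
    """How many times thinner the wire is than the hair."""
    return hair_um * 1000 / wire_nm


print(f"cell side: {cell_side_nm(300e6):.1f} nm")
print(f"hair / wire: {thinness_ratio(60, 6):,.0f}")
```

The computed cell side comes out slightly under the rounded 60 nm used on the slide.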


Silicon: Ingots, Wafers, Dies & Chips
★ Earth’s crust is 46% Oxygen, 28% Silicon and 8% Aluminum by weight; sand is mostly Silicon dioxide
★ Chip-grade Silicon impurities (Carbon, Oxygen & others) must not exceed 1 in 10^9
★ Fabrication takes place in sophisticated foundries that cost 5 to 30 billion dollars, and it may cost 660 million dollars to set up the production line for a layout (440 & 220 million for 5 nm & 7 nm)
★ A chip takes 1000 to 2000 steps and 10 to 15 weeks to make; it may have 100 material layers, including 5 to 15 metal layers to route wires (> 100 km, consuming 5% to 15% of the power)
★ A 300 mm raw wafer costs $200 to $400, while a processed one costs $1,000 to $20,000
★ More than 1 trillion chips are made every year & more than half of them come from TSMC
[Figure: Sand & Silicon; Melting & Crystallization; Ingot, 400 kg; Wafers, 15 - 45 cm; Dies ready to cut; A single die; A die with well and pins; Chip packaging]


The Die

★ Intel 80286, 1982: 134 KTr, 7 mm x 7 mm, 2.9 KTr / mm2
★ Intel i7 Octa Core, 2014: 2.6 BTr, 18 mm x 20 mm, 7.2 MTr / mm2


Dies Per Wafer (Yield)
★ Wafers are 7.6, 10, 12, 15, 20, 30, 33 & 45 cm in diameter and 0.1 mm to 0.9 mm thick
★ Defect rates range from 0.05 to 0.1 per cm2; a 30 cm wafer may have 35 to 70 defects
❖ Smaller dies have better yields, and
❖ Larger wafers give better silicon utilization; this led to using 45 cm wafers & multi-die chips
★ The largest monolithic die side was 30 mm, and today’s most effective size is 10 mm to 15 mm
★ Apple’s M1 SoC: 5 nm & 120 mm2 die; a 30 cm wafer makes 450 dies, at a cost of $50 per chip
[Figure: three wafer maps]
❖ 15 cm wafer & 30 mm die: 6 good & 3 defective, silicon utilization = 30%
❖ 20 cm wafer & 30 mm die: 17 good & 7 defective, silicon utilization = 50%
❖ 20 cm wafer & 20 mm die: 54 good & 7 defective, silicon utilization = 70%
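As a rough sketch of the die counting above, a common textbook approximation estimates gross dies per wafer from the wafer diameter and die area, and a simple Poisson model estimates the fraction of defect-free dies. Both formulas and the function names are assumptions made for illustration and are not taken from the slides; real numbers depend on die shape, scribe lines and edge exclusion.

```python
import math


def gross_dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Approximate die count: wafer area / die area, minus an edge-loss term."""
    area_term = math.pi * wafer_diameter_mm ** 2 / (4.0 * die_area_mm2)
    edge_term = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(area_term - edge_term)


def poisson_die_yield(defects_per_cm2, die_area_mm2):
    """Assumed Poisson model: probability that a die has zero defects."""
    die_area_cm2 = die_area_mm2 / 100.0
    return math.exp(-defects_per_cm2 * die_area_cm2)


# Case from the slide: 30 cm wafer, 120 mm2 die, 0.05 to 0.1 defects per cm2.
gross = gross_dies_per_wafer(300, 120)
for defect_rate in (0.05, 0.1):
    good = gross * poisson_die_yield(defect_rate, 120)
    print(f"{defect_rate} defects/cm2: {gross} gross dies, about {good:.0f} good")
```

Under these assumptions the estimate lands somewhat above the 450 dies quoted on the slide, which is plausible since the sketch ignores scribe lines and other losses.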


30 cm Wafer & 3 nm Features
★ To see a 5 nm thick copper wire in a die as a 75 µm thick hair, we need to magnify by 15,000
❖ A 7 mm x 7 mm x 450 µm die x 15,000 = 105 m x 105 m x 7 m (11 dunum area)
❖ A 60 nm x 60 nm transistor cell on the die will look like a dot made by a pen: 0.9 mm
★ The wafer to feature size ratio is 30 cm / 3 nm = 100,000,000
★ With a 7 mm x 7 mm die size, a 30 cm wafer yields around 1,000 dies (with nearly 200 defective)
★ If the wafer is scaled up to the Earth’s diameter at the equator, the 3 nm feature becomes about 13 cm (bread slice size)
★ UV light is applied from a few µm to 1 mm above the wafer; let’s say 0.1 mm
★ Lithography is like a chef flying an aircraft 10 km above the ground, topping 13 cm bread slices on the ground without making any mess …
❖ Jam
❖ Butter
❖ Thyme, etc.
★ CMOS NOT Gate (middle)
❖ 8 features wire pitch
❖ 16 features gate pitch
❖ Billions of gates
[Figure: Wafer and Die (300 km x 300 km at this scale); CMOS NOT Gate (apartment size)]


Chip & Sockets
★ To connect the various components comprising a system (CPU, ROM, RAM, PIO, SIO, PTC, etc.), we need a board on which to place and connect those chips
★ Every chip has pins, leads, balls or contacts to connect the die in the well with the outside
★ Chips sit on sockets, whose mechanical design may improve heat dissipation
★ AMD Genoa, in the LGA6096 package, has 6096 contacts to support 12 memory channels for 96 cores, more than 1 GB of L3 cache, 128 PCIe lanes, 3.7 GHz & 400 W
★ Does this explain why it has 6096 contacts?
★ Today, balls have a 36 µm pitch
[Figure: package types]
❖ Dual In Line Package (DIL / DIP)
❖ Thin Small Outline Package (TSOP)
❖ Plastic Leaded Chip Carrier (PLCC)
❖ Surface Mount Technology (SMT / SMD)
❖ Pin Grid Array (PGA)
❖ Reduced Pin Grid Array (rPGA)
❖ Ball Grid Array (BGA)
❖ Land Grid Array (LGA)


Chip Packaging
★ Flip-Chip is a technique that has been in use for a long time now; dies are placed face down to minimize wiring by having direct contact with pins, balls or lands
★ Extreme shrinking of transistors causes tunneling (a transistor conducts when it should not)
❖ Fabrication gets harder and harder due to the extremely small features
❖ Yield, the percentage of good chips in a wafer, gets lower and lower
★ The fabrication cost per mm2 depends on the process; it doubled when moving from 14 nm to 7 nm
★ Multi-die packages reduce cost, increase yield and allow customizable modular design
❖ 3D Stacking: dies stacked vertically on top of each other; CPUs, GPUs, SDRAM, I/O, etc.
❖ Chiplet (Tiles): dies placed next to each other horizontally, connected via an interposer
❖ 3D Chiplet: imagine a complex 100 cm2 chip squeezed onto 25 cm2 (a 4-story chip)
[Figure: 3D Stacking; Chiplet / Tiles; 3D Chiplet]


Yield: Chiplet vs. Monolithic
★ Yield is the product of wafer yield, die yield, packaging yield and burn-in yield
★ Yield decreases significantly with increasing die area, and may become infeasible …
★ Consider a 200 mm2 chip; the burn-in yield is …
❖ Monolithic (single die), 200 mm2, has a yield of 40%, but …
❖ Chiplet (4 dies x 50 mm2 each) has a far better yield of 70%, which is 75% more
[Figure: yield (10% to 90%) versus chip area (20 to 400) for 4-Chiplet and Monolithic designs]
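One way to see why smaller dies yield better is a Poisson defect model, sketched below. The model, the derived defect density and the function names are assumptions for illustration only; the slide does not state which yield model lies behind its 40% and 70% figures.

```python
import math


def defect_density_per_cm2(observed_yield, die_area_mm2):
    """Invert the assumed Poisson model Y = exp(-D * A) to recover D."""
    return -math.log(observed_yield) / (die_area_mm2 / 100.0)


def die_yield(defects_per_cm2, die_area_mm2):
    """Assumed Poisson model: probability that a die has no defects."""
    return math.exp(-defects_per_cm2 * die_area_mm2 / 100.0)


# Calibrate on the monolithic case from the slide: 200 mm2 at 40% yield.
d0 = defect_density_per_cm2(0.40, 200)
small_die = die_yield(d0, 50)
print(f"implied defect density: {d0:.3f} per cm2")
print(f"yield of one 50 mm2 die: {small_die:.1%}")
```

This idealized model gives a per-die yield a little under 80%, above the 70% on the slide; the slide's figure may also account for packaging and assembly losses that the sketch ignores.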


Enhancing Performance
★ Performance metrics: Clock, IPC, MIPS, GFLOPS, but the absolute metric is the task time
★ The ultimate performance measure is the completion time of a task: a productivity test. Run your app to figure out how good a machine is. To reduce this time:
❖ Wider Data Bus
✦ Early days: 4, 8, 16, 32 and today 64 bits
✦ Means more bits, hence more information, transferred each cycle or unit of time
❖ Wider Address Bus
✦ Early days: 12, 14, 16, 24, 32, 36 and today: 40, 41, …, 48, 50 & 52 bits (4 PB of memory)
✦ A larger address space means accessing more code & data directly in fast memory, without referencing slower storage devices like HDDs
❖ Larger number of registers, hence more high-speed data available (192, not all named)
❖ Larger number of functional units (adders, multipliers, etc.), hence more things done in parallel
❖ Deeper pipelines: 10 to 30 stages in CPUs and 100s in GPUs
❖ Faster clocking: 6 GHz max in the Intel Core i9 with 24 cores consuming 250 W
❖ Larger caches (2 x 64 KB L1, 256 KB L2 & 250 MB L3)
❖ More cores (100 cores) & hence more memory channels (8)
★ Leading to a huge number of transistors, number of contacts, and power consumption


Modern Computing
★ The Central Processing Unit (CPU) consists of:
❖ Arithmetic Unit (AU), to perform arithmetic, logic and shift operations
❖ Register File (RF), to store operands and results temporarily
❖ Control Unit (CU), to dictate what to do in every cycle
★ The CPU is good at general data manipulation and operations on integers and strings, but not at mathematically intensive operations like floating point number crunching
★ The Floating Point Unit (FPU) is a special AU that is good at floating point arithmetic
★ Using an FPU next to the CPU enhances performance, since the two work in parallel
★ Integrating the FPU & the CPU on one die makes it even faster
★ Integrating more CPUs enhances parallelism and yields faster performance
[Figure: CPU and FPU arrangements across four stages: Early days, Later, Integrate, Integrate more (multiple EUs, CPU#1 to CPU#3)]


Modern Computing
★ An EU with split L1 caches (Data & Code) increases performance significantly
★ Adding a larger unified L2 cache improves performance even more by reducing the memory latency, via increasing the hit ratio for the workload
★ That is a typical single core processor
★ We may integrate 2, 4, 6, 8, etc. of them to work in parallel on different programs or threads
★ A shared L3 cache improves the performance even more
★ Multi-core processors require large bandwidth, a large amount of data and instructions to keep the cores busy; hence we use multiple memory channels
[Figure: Single Core (EU, L1 C, L1 D, L2 CD); Dual Core L1/L2; Dual Core L1/L2/L3 with a shared L3 CD]


Architectural Enhancement
★ Technological enhancements like faster clocking do not pay off much anymore
★ Architectural enhancements like packing more functional units require packing more transistors on the chip
★ Fabs could integrate only 200,000 transistors on a chip in the early 80s! Where did they go?
★ Researchers realized that more than half went to the control section, due to the large instruction set and number of addressing modes
★ There was also a need to add more and more on the die, like the Memory Management Unit (MMU). What to do?
★ Make the control unit less complex; this saves transistors for registers, caches and functional units, and the control unit runs fast as it is small and hardwired
★ This led to a new design philosophy
❖ Reduced Instruction Set Computers (RISC)
✦ Hardware centric, easier on the designer & harder on the compiler
✦ Examples: ARM, MIPS, PowerPC, PA-RISC, SUN SPARC, RISC-V, etc.
❖ Complex Instruction Set Computers (CISC)
✦ Software centric, easier on the programmer & harder on the designer
✦ Examples: Intel x86, AMD Ryzen, Motorola MC68000, DEC VAX, etc.


Design Trends - RISC vs. CISC
★ RISC
❖ Small instruction set, typically < 100 (RISC-V has 47; with all addressing modes, nearly 200)
❖ Small number of addressing modes, typically ≤ 5
❖ Uniform instruction length; each takes 1 word
❖ Simple control logic, hence fast and easy on the real estate (< 20%)
★ CISC
❖ Large instruction set, typically > 200 (Intel’s x86 > 1500)
❖ Large number of addressing modes, typically ≥ 8
❖ Variable length instructions, from a few to many bytes (x86: 1 to 15 bytes & VAX 11/780: 1 to 57 bytes)
❖ Complex control logic, which consumes more real estate (> 50%) and has to use microprogramming
★ The two design styles coexist; each has its advantages … and they borrow from each other
★ Some processors have a RISC core or engine and a hardware shell that translates CISC instructions to RISC sequences on the fly, and caches them for fast processing
★ However, RISC is more power efficient (higher ratio of performance to power consumption)
★ Today, 20% to 30% of the die real estate is non-core: on-chip cache, graphics unit, memory controller, clock distribution network, connection fabric, communication links and others
★ Around half of the non-core real estate goes to the level 3 cache, which is shared (in GB now)


Specifications - RISC vs. CISC
★ RISC compilers generate larger binary code: around 25% to 50% more (it may even double)
★ A high level language source code may generate:
❖ 1500 to 1800 instructions for a CISC processor, and
❖ 2000 to 2400 instructions for a RISC processor
★ Not all RISCs or CISCs are equal …
★ Hence, binary files vary in instruction count
❖ Normally, within ±10%
❖ For quite similar ones, within ±5%
★ RISC compilers are hard to design and take longer to run
★ RISC / CISC code segments
❖ A & B are variables
❖ R1 & R2 are registers

CISC Code        RISC Code
AND R1, R2       AND R1, R2
MOVE A, B        LOAD A, R1
                 STORE R1, B
ADD R1, A        LOAD A, R2
                 ADD R1, R2
                 STORE R2, A
ADD A, B         LOAD A, R1
                 LOAD B, R2
                 ADD R1, R2
                 STORE R2, B

★ Example
❖ A source code compiled for a RISC generated 1987 instructions
❖ Assuming 30% fewer instructions when compiled for a CISC: 0.7 x 1987 = 1390.9 ≈ 1400
❖ Estimated to generate within ±10% when compiled for another RISC: 1800 to 2200


State of the Art - Specifications
★ CPUs, general purpose cores
❖ Wide range of instructions for any task
❖ Typically, 10s of cores depending on platform
✦ 2 to 4 cores, Controllers (basic ones use a 32-bit single core)
✦ 4 to 8 cores, Gaming (overclocked & power hungry)
✦ 8 to 12 cores, Laptops (slower & power efficient)
✦ 8 to 12 cores, Desktops (faster & more power consuming)
✦ 12 to 24 cores, Workstations (fast & power hungry)
✦ 24 to 64 cores, Servers (slower, but handle many tasks at once)
✦ 48 to 192 cores, Data Centers
★ GPUs, graphics and compute cores
❖ Special purpose graphics cores & compute-oriented cores for highly parallel tasks
❖ 100s to 1000s of GPU cores on chip, as add-on co-processors
★ NPUs & TPUs, Neural Processing Units & Tensor Processing Units
❖ Specialized cores for machine learning and artificial intelligence
❖ Typically, 10s of cores on die


State of the Art - Components
★ Cache Memory:
❖ L1: 64 KB Code & 64 KB Data per core
❖ L2: 256 KB/512 KB, unified Code/Data per core
❖ L3: 8 MB Laptops, 16 MB Desktops, 64 MB Workstations
❖ L3: 8 x 32 = 256 MB in AMD Epyc Rome & 1.1 GB in AMD Epyc Genoa for Servers
★ Uncore Components
❖ Memory Management Unit: virtual memory and multi-channel interfaces
❖ Thermal Control: maximizes core utilization and performance within the power envelope
★ Clock: operating frequency varies with the number of cores and the workload …
❖ 9.0 GHz under nitrogen cooling for a short period of time (overclocked for several minutes)
❖ 6.0 GHz at low core count, up to 24 cores
❖ 3.2 GHz at high core count, up to 64 cores (with some cores running at 4.2 GHz)
❖ 3.0 GHz at very high core count, up to 192 cores
★ Today, processors have two types of compute cores
❖ Performance Cores (P-Core), high performance for foreground operations (more power)
❖ Efficiency Cores (E-Core), low power to run background operations
[Figure: a 24-core Intel Core i9 costs $700; a 96-core AMD Epyc Genoa costs $3,000]


Why 6 GHz Clock Frequency? Why not 50 GHz?
★ We have been stuck below 6 GHz for a long time now, with less than a 3% increase every year!
★ Why not clock faster for better performance as we used to?
❖ Higher frequency requires higher operating voltage, causing heating: a big challenge
❖ Signals encounter resistance, leading to degradation and causing integrity issues
❖ Higher frequency produces more electromagnetic interference, causing crosstalk
❖ Higher frequency requires faster switching transistors: size and material limitations
❖ Higher frequency requires more precise manufacturing processes and materials
★ 50 GHz clock? Assuming all the above issues are sorted out
❖ Signals travel in the chip, silicon & wires, at 100,000 to 200,000 km/s
❖ This is equivalent to 100 mm/ns to 200 mm/ns; let us assume 100 mm/ns
❖ A clock cycle = 1 / 50 GHz = 0.02 ns, in which the signal travels only 2 mm; the die side must be small enough for the signal to travel back and forth, edge to edge
❖ Hence the die side must not exceed 1 mm
❖ By proportion, the Apple A17 Pro SoC has a 150 to 160 mm2 die (about 13 mm per side) and clocks at 3.78 GHz; the die side must be reduced to (3.78/50) x 13 = 1 mm to run at 50 GHz
★ Such a die size, 1 mm2, can pack only 250 to 300 MTr using a 3 nm process
★ That is hardly enough for a single core, so many cores at 5 GHz is the better choice
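The die-size argument above can be sketched as a short calculation. The function name is hypothetical, and the 100 mm/ns signal speed is the conservative value assumed on the slide.

```python
def max_die_side_mm(clock_ghz, signal_speed_mm_per_ns=100.0):
    """Largest die side a signal can cross and return within one clock cycle."""
    cycle_ns = 1.0 / clock_ghz                   # clock period in nanoseconds
    reach_mm = signal_speed_mm_per_ns * cycle_ns  # distance covered per cycle
    return reach_mm / 2.0                        # back and forth, edge to edge


for ghz in (3.78, 6.0, 50.0):
    print(f"{ghz} GHz: die side up to {max_die_side_mm(ghz):.2f} mm")
```

Under this simple model, real chips get around the limit by pipelining signals across several cycles rather than crossing the whole die in one.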


How fast is 6 GHz ?
★ A man walks in steps of at least 60 cm; a stride (cycle) is 120 cm or 0.0012 km
★ If he ticks at 6 GHz, his speed is 7,200,000 km/s (24 times faster than light in vacuum)
★ Earth round trips to …
❖ Sun: 2 x 150,000,000 / 7,200,000 = 41.7 s
❖ Moon: 2 x 384,400 / 7,200,000 = 0.107 s = 107 ms
★ Circling the Earth takes …
❖ 40,075 / 7,200,000 = 0.00557 s = 5.6 ms (180 rps)
❖ While it takes …
✦ Light, 134 ms (a blink of an eye)
✦ A 2,000 km/h supersonic jet, 20 hours
✦ A man walking at 5 km/h, about 1 year (straight & non-stop)
★ What can a processor do in such a small cycle?
❖ Arithmetic operations on integers and floating point numbers
❖ Logic & shift operations on strings and numbers
[Figure: a stride of a left and a right step, 60 cm each, takes 166.67 ps; the Earth is circled in 5.6 ms and the Moon is reached in 53.4 ms each way]
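The numbers in this analogy can be reproduced with a few lines of Python. The constant and function names are illustrative; the distances are the rounded values used on the slide, and the speed of light value is the standard one.

```python
STRIDE_KM = 0.0012         # one stride = two 60 cm steps
CLOCK_HZ = 6e9             # one stride per clock tick
LIGHT_KM_S = 299_792.458   # speed of light in vacuum

walker_km_s = STRIDE_KM * CLOCK_HZ


def travel_time_s(distance_km, speed_km_s):
    """Time to cover a distance at constant speed."""
    return distance_km / speed_km_s


print(f"speed: {walker_km_s:,.0f} km/s, {walker_km_s / LIGHT_KM_S:.0f}x light")
print(f"Sun round trip: {travel_time_s(2 * 150_000_000, walker_km_s):.1f} s")
print(f"Moon round trip: {travel_time_s(2 * 384_400, walker_km_s) * 1e3:.0f} ms")
print(f"around the Earth: {travel_time_s(40_075, walker_km_s) * 1e3:.1f} ms")
print(f"light around the Earth: {travel_time_s(40_075, LIGHT_KM_S) * 1e3:.0f} ms")
print(f"one clock cycle: {1e12 / CLOCK_HZ:.2f} ps")
```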


State of the Art - Power
★ Power consumption is a significant issue; it varies from class to class & vendor to vendor
❖ Legacy single core processors draw up to 0.3 A @ 5 V, or 1.5 W max
❖ Modern single core processors draw up to 0.4 A @ 1.5 V, or 0.6 W max
❖ Low core count processors draw up to 100 A @ 1.0 V, or 100 W max
❖ High core count processors draw up to 600 A @ 0.6 V, or 360 W max
★ Apple M3 Max has 92 BTr on 420 mm2 and consumes 20 to 90 W (1 nW/transistor max)

Class          Power (W)
Embedded       < 1
Smart Phone    1 to 5
Tablet         5 to 10
Notebook       10 to 20
Laptop         20 to 30
Desktop        40 to 120
Gaming         60 to 120
Workstation    100 to 200
Server         100 to 400


State of the Art - Packaging
★ Consider a server class processor like the 64-core AMD Epyc Rome, a 40 BTr chiplet design:
❖ 8 dies, with 8 cores and 32 MB of L3 cache per die, and
❖ A system controller with 8 memory channels, 128 PCIe lanes, I2C, SATA, USB, RTC, etc.
❖ 0.9 V core voltage and 200 W maximum power consumption at 3.1 GHz
★ Memory: using 288-pin DDR4 SDRAM modules, there have to be 8 x 288 = 2304 pins
★ Graphics, Chipset & I/O: around 1000 pins
★ Power: 200 W at roughly 0.8 V is 200 / 0.8 = 250 A; with 0.5 A bonding wires, this needs 250 / 0.5 = 500 pairs = 1,000 pins
★ The number of pins/contacts exceeds 4000. It in fact has 4094 contacts and uses the SP3 socket
❖ Core dies are fabricated using a 7 nm process (8 dies on the left and right sides)
❖ The system controller is fabricated using 14 nm (rectangle in the middle)
★ Latest: Intel Xeon BGA5903 & AMD Genoa SP5 LGA6096
[Figure: Chip Top; Chiplet; Chip Bottom, 5.8 cm x 7.5 cm]
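The pin estimate above can be written as a small budget calculation. This is only a sketch of the slide's back-of-the-envelope reasoning; the function and parameter names are hypothetical, and the defaults are the rounded figures from the slide, including the 0.8 V used in its power estimate.

```python
def pin_budget(memory_channels=8, pins_per_module=288, io_pins=1000,
               power_w=200.0, supply_v=0.8, amps_per_wire=0.5):
    """Rough contact count: memory pins + I/O pins + power/ground pairs."""
    memory_pins = memory_channels * pins_per_module
    current_a = power_w / supply_v
    # Each supply wire needs a matching ground return, hence pairs.
    power_pins = 2 * round(current_a / amps_per_wire)
    total = memory_pins + io_pins + power_pins
    return memory_pins, io_pins, power_pins, total


memory_pins, io_pins, power_pins, total = pin_budget()
print(f"memory {memory_pins} + I/O {io_pins} + power {power_pins} = {total}")
```

The total comes out close to the 4094 contacts of the actual package, which suggests the estimate is in the right range even though the real pin assignment differs.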


Intel’s processors: 1974 versus 2018 (44 years)
★ Specifications
❖ 8080: 6 thousand transistors, 20 mm2 die using a 6 µm node, 2 MHz, 10 CPI & 1 W
❖ i9: 7 billion transistors, 200 mm2 die using a 14 nm node, 5 GHz, 0.01 CPI & 50 W
★ Transistor Count
❖ The 8080 chip has 6,000 / 20 = 300 Tr per square mm
❖ The i9 chip has 7,000,000,000 / 200 = 35,000,000 Tr per square mm
❖ The i9 is 35,000,000 / 300 = 116,667 times denser
★ CPI Performance
❖ The i9 has 10 / 0.01 = 1,000 times better cycle utilization, on top of a 2,500 times shorter cycle
★ Instructions per Second (IPS) Performance
❖ 8080 is 2 MCPS / 10 CPI = 0.2 MIPS = 200 KIPS
❖ i9 is 5,000 MCPS / 0.01 CPI = 500,000 MIPS. Hence the i9 is 2,500,000 times better
★ Power Efficiency: Million Instructions Per Joule (MIPJ)
❖ 8080 reaches 0.2 MIPS / 1 JPS = 0.2 MIPJ = 200 KIPJ
❖ i9 reaches 500,000 MIPS / 50 JPS = 10,000 MIPJ. Hence it is 50,000 times more efficient
★ Gordon Moore predicted that the number of transistors on a chip doubles every year (later restated as every 1.5 to 2 years)
★ This prediction has gone up and down since 1965; doubling every 2 years over 44 years gives 2^(44/2) = 4 million times (a rough proxy for performance)
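The comparison above can be checked with a short script. The helper names are illustrative, and the chip figures are the rounded ones listed on the slide rather than datasheet values.

```python
def density(transistors, die_mm2):
    """Transistors per square millimetre."""
    return transistors / die_mm2


def ips(clock_hz, cpi):
    """Instructions per second from clock rate and cycles per instruction."""
    return clock_hz / cpi


def instructions_per_joule(clock_hz, cpi, power_w):
    """Power efficiency: instructions per second divided by watts."""
    return ips(clock_hz, cpi) / power_w


density_ratio = density(7e9, 200) / density(6e3, 20)
ips_ratio = ips(5e9, 0.01) / ips(2e6, 10)
efficiency_ratio = (instructions_per_joule(5e9, 0.01, 50)
                    / instructions_per_joule(2e6, 10, 1))
moore_factor = 2 ** (44 // 2)   # doubling every 2 years for 44 years

print(f"density: {density_ratio:,.0f}x")
print(f"IPS: {ips_ratio:,.0f}x")
print(f"instructions per joule: {efficiency_ratio:,.0f}x")
print(f"Moore's law factor: {moore_factor:,}")
```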


Performance Bottleneck - Memory Wall
★ Clock speed is an indication of how fast processors work, but what is done in each cycle?
★ MIPS, as a metric, tells how fast instructions are executed, but how much does each instruction do? Too much, too little? Higher MIPS CPUs may produce less work
★ MFLOPS is a better metric, but it addresses only one aspect: floating point arithmetic capability. Apple’s A17 SoC is rated 2.5 TFLOPS
★ The ultimate performance measure is time; how long it takes a processor to complete a task
★ Benchmarks are collections of programs with a range of activities, used to index computers
★ Benchmarks may focus on integer, floating point, graphics, etc. for specific users
★ A major problem of today’s systems is called the Memory Wall

❖ Annual performance growth of processors is > 30%, while
❖ Annual performance growth of memory is < 10%
★ So, for a workload that spends 70% of its time T processing & 20% in memory accesses, one year of growth gives:
❖ Processing time is reduced from 0.7xT to 0.7xT/1.3 = 0.54xT
❖ Memory access time is reduced from 0.2xT to 0.2xT/1.1 = 0.18xT, and
❖ The remaining 0.1xT is unchanged; typically input/output related
★ System speedup is S = T / (0.54xT + 0.18xT + 0.1xT) = 1.22, or 22% performance growth
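The memory wall estimate is a weighted sum of per-component speedups. A minimal sketch follows, assuming the 70% / 20% / 10% split and the 1.3x / 1.1x / 1.0x yearly factors from the slide; the function name `system_speedup` is hypothetical.

```python
def system_speedup(fractions, factors):
    """Overall speedup when each time fraction is sped up by its own factor.

    fractions: shares of the original run time (should sum to 1.0)
    factors:   speedup applied to each share (1.0 means unchanged)
    """
    old_time = sum(fractions)
    new_time = sum(f / s for f, s in zip(fractions, factors))
    return old_time / new_time


# processing, memory access, input/output
shares = [0.7, 0.2, 0.1]
growth = [1.3, 1.1, 1.0]

s = system_speedup(shares, growth)
print(f"system speedup: {s:.2f}")
```

Changing the memory factor to 1.3 in this sketch shows how much of the processor gain is lost to the slower memory growth.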


Performance Enhancement - Amdahl’s Law
★ If a task has completion time T and a fraction α of it gets enhanced by a factor β, then
❖ Current completion time is T = (1-α)T + αT
❖ New completion time is Tx = (1-α)T + (α/β)T
❖ Speedup is S = T / Tx = 1 / ((1-α) + α/β)

★ What is the impact of doubling the core count? Doubling the clock rate? Doubling both?
★ Assume a workload with a fraction P of parallel activities & a fraction C of computing activities
❖ Core Doubling, programs have some degree of parallelism (core scalability)
✦ Highly Parallel: α = 0.9 yields S = 1 / (0.1 + 0.9 / 2) = 1.8 (80% extra performance)
✦ Highly Serial: α = 0.2 yields S = 1 / (0.8 + 0.2 / 2) = 1.1 (10% extra performance)
❖ Clock Doubling, programs have some degree of number crunching (frequency scalability)
✦ Compute-Bound: α = 0.8 yields S = 1 / (0.2 + 0.8 / 2) = 1.7 (70% extra performance)
✦ Memory-Bound: α = 0.5 yields S = 1 / (0.5 + 0.5 / 2) = 1.3 (30% extra performance)
★ Compute the speedup of a task using 4 times the core count & 50% higher clock frequency
❖ Assume the task spends 70% of its time processing with 80% degree of parallelism
❖ 4 times the core count yields speedup of SN = 1 / (0.2 + 0.8 / 4) = 2.5
❖ 50% faster clocking yields speedup of SF = 1 / (0.3 + 0.7 / 1.5) = 1.3
❖ Total speedup is S = SN x SF = 2.5 x 1.3 = 3.25; while naively expected to be 4 x 1.5 = 6
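Amdahl's Law as stated above fits in a single function. The sketch below replays the slide's cases; the function name `amdahl` is an assumption of this example, and the combined figure follows the slide's approach of multiplying the two speedups.

```python
def amdahl(alpha: float, beta: float) -> float:
    """Speedup when a fraction alpha of the work is enhanced by a factor beta."""
    return 1.0 / ((1.0 - alpha) + alpha / beta)


# Core doubling (beta = 2) for different degrees of parallelism
highly_parallel = amdahl(0.9, 2)
highly_serial = amdahl(0.2, 2)

# Clock doubling (beta = 2) for different compute shares
compute_bound = amdahl(0.8, 2)
memory_bound = amdahl(0.5, 2)

# Combined example: 4x cores (80% parallel) and 1.5x clock (70% processing)
sn = amdahl(0.8, 4)
sf = amdahl(0.7, 1.5)
total = sn * sf  # unrounded; the slide uses SF rounded to 1.3, giving 3.25

for label, value in [("SN", sn), ("SF", sf), ("S", total)]:
    print(f"{label} = {value:.2f}")
```

The small gap between the unrounded product and the slide's 3.25 comes only from rounding SF to 1.3 before multiplying.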


Power Consumption
★ Power consumption varies significantly with manufacturer, platform, target customer, etc.
★ Full load power consumption may reach 10 times the idle state; 1 nW/Transistor max
★ Power efficiency is a hot issue; how much throughput per unit of energy (especially in Data Centers)
★ Single Board Computers (SBC) are small, lightweight subsystems with decent computing performance, memory and general input/output terminals (some are expandable)
❖ Power consumption of 5 to 20 W, native operating system, a version of Linux or Windows
❖ Used as low level computers (desktop) or high level controllers (plant or process control)
★ IoT, IoMT, Embedded systems may consume < 1 W
★ Handheld Devices, < 5 W (Small Battery) & Portable devices, 10 to 20 W (Large Battery)
★ Desktops, 200 to 400 W
★ Workstations & Gaming, 400 to 600 W
★ Servers, 1000 to 2000 W, depends on capacity and attachments
★ SuperComputers & Data Centers are exceptional; 100s of KW to 100s of MW (Millions of cores)
★ Humans, 100 W (50 W sleeping to 1000 W exercising), depends on size, age, style, etc.
❖ 10 to 20 W goes for brain activities
❖ 60 to 75 W goes for metabolism; breathing, heart activities, blood circulation, etc.


Power - Personal Systems
★ Below is a power consumption breakdown, assuming commercial to professional desktops
★ For battery operated systems, the time for a battery to be depleted depends on the power consumption of the various parts; mainly processor, memory, solid state storage
★ An M3 Max MacBook Pro has a 72.4 WH / 11.46 V Battery that lasts for about 10 hours
❖ 72.4 WH / 10 H = 7.24 W; the share of M3 Max is around 4 W
❖ How is that possible if the M3 Max consumes 20 to 90 W? The OS turns parts on & off as needed

Component    Power (W)   Depends on: Vendor, Model, Size, Speed, Workload, etc.
CPU          40 – 100    Cores, Frequency, etc., 4 to 16 CPU cores & 16 to 32 GPU cores
GPU          80 – 300    Cores, Frequency, etc., 1,024 to 16,384 cores in discrete adapters
Chipset      20 – 40     Complexity, Number of chips, etc.
SDRAM        5 – 10      Number, Type, Frequency, etc.
HDD          5 – 10      Number, Size, Speed, Spinning/Idle, etc.
SSD          2 – 4       Number, Size, Technology, etc.
Fan(s)       5 – 15      Number, Size, Speed, etc.
Attachments  ?           Number, Type, etc., USB max 5 W & USB-C max 240 W


Power - Corporate Systems
★ Frontier SuperComputer is #1 in 2023: 1.1 EFLOPS / 23 MW, cost $600 Million
★ Has 606,208 CPU cores, 8,335,360 GPU cores & 700 PB Memory (8,000 pounds on 680 m2)


★ Data Centers use 3% of the world’s power; 50 to 150 MW each & some exceed 600 MW
★ Google’s 35 data centers consume 16 TWH per year (100 countries consume less than this)
★ Google’s largest, in Finland, 681 MW of renewable energy (0.3 of Jordan’s consumption)
★ A rack holds 14, 21 or 42 units (3U/2U/1U), housing Servers, Storage, Switches & UPS
★ A rack can house around 6,000 cores, consuming 10 to 30 KW
★ A data center may have 10 to 10,000 racks
★ Around 40% of the energy goes for cooling
★ Consider a bank data center of 10 racks …
❖ 20 KW / rack computing
❖ 10 KW / rack cooling
❖ 285 Fils / KWH power grid tariff for banks
✦ 10 x (20 + 10) = 300 KW total power
✦ 300 x 24 x 30 = 216,000 KWH/month
✦ 216,000 x 0.285 = 61,560 JD/month
[Figure: rack holding Switch, Servers, Storage & UPS]
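The bank data center estimate can be parameterized so that other rack counts or tariffs are easy to try. This is a rough sketch under the slide's assumptions (constant load, 30-day month, tariff in JD per KWH); the function name `monthly_energy_cost` is hypothetical.

```python
def monthly_energy_cost(racks, compute_kw, cooling_kw, tariff_per_kwh, hours=24 * 30):
    """Return (total power in KW, monthly energy in KWH, monthly cost)."""
    total_kw = racks * (compute_kw + cooling_kw)
    energy_kwh = total_kw * hours
    cost = energy_kwh * tariff_per_kwh
    return total_kw, energy_kwh, cost


# 10 racks, 20 KW computing + 10 KW cooling per rack, 285 Fils = 0.285 JD per KWH
total_kw, energy_kwh, cost_jd = monthly_energy_cost(10, 20, 10, 0.285)
cooling_share = 10 / (20 + 10)  # fraction of the power that goes to cooling

print(f"{total_kw} KW, {energy_kwh:,} KWH/month, {cost_jd:,.0f} JD/month")
print(f"cooling share: {cooling_share:.0%}")
```

With these per-rack figures, cooling takes one third of the power, a little below the 40% rule of thumb quoted above.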


Microprocessor Systems
System Structure

Dr. Taisir Eldos

“Science can amuse and fascinate us all, but it is engineering that changes the world.” - Isaac Asimov
Microprocessor System
★ Microprocessor systems vary greatly in complexity, performance, size, cost, etc.
★ But almost all have the same major components:
❖ CPU, to carry out tasks by running programs, along with Clock, Reset & Real Time Clock
❖ MEM, to store programs and data (input data and results)
❖ I/O, to get information into and out of the system
❖ Mechanical structure to hold components, connections, glue logic, etc. (Motherboard)
[Figure: block diagram; CPU with RST, CLK & RTC, ROM & RAM, and I/P & O/P connected through DEC, ENC & BUF]


Basic Microprocessor System - Description
★ Kernel (Core)
❖ Central Processing Unit (CPU), the heart of any computer system


❖ CPU support: reset signal generator, clock signal generator, real-time clock module
★ Storage (Hierarchy; cache memory, main memory and mass storage)
❖ Solid state memory comes in different flavors (mostly random access and volatile)
✦ Read Only Memory; ROM, PROM, EPROM, EEPROM
✦ Read Write Memory; SRAM, DRAM, SDRAM
❖ Mass storage comes in different flavors (non-volatile)
✦ Electro-Opto-Magneto-Mechanical devices; Hard Disk Drive, Compact Disk, Tape Drive, etc.
✦ Electronic; Solid State Drive (SSD), Secure Digital Cards (SD), Multi-Media Cards (MMC)
★ Peripheral Interface, to get data in and out …
❖ Input: Keyboard, Microphone, Camera, etc. & Pointing devices like Mouse, Touchpad, Joystick, etc.
❖ Output: Monitor, Printer, Speaker, etc.
★ Glue Logic, to connect all parts via buses (sets of wires to transfer Data, Address & Control info)
❖ Buffers, resolve the fan-out problem, hence allow driving more loads in a complex system
❖ Decoders, partition the address space (select one memory or I/O chip for action)
❖ Encoders, resolve conflicts like many devices requesting a service at once


Types of Computers
★ Based on the way we interact with them, computers are either special purpose or general purpose
❖ General purpose computers are designed to support a wide range of applications
❖ Special purpose computers are designed and optimized to carry out specific tasks
★ Both types use Processors, Memory & Storage with varying capabilities, and different kinds of ports to support peripherals. However, they differ in the target application …
★ Special Purpose Computers
❖ Specific applications; Controllers & Embedded Systems
❖ Examples
✦ Simple: Oven, Fridge, Washer, Dryer, Traffic Controller, Automatic Teller Machine, etc.
✦ Complex: Medical Equipment, Airplane Autopilot, Autonomous Driving, Missile Guidance System, Industrial Plants
★ General Purpose Computers
❖ Variety of applications; Scientific, Accounting, Editing, Financial, Games, etc.
❖ Examples
✦ Simple: Portable Computer, Personal Computer, Workstation
✦ Complex: Network, Server, Data Center, Supercomputer


General vs. Special
★ CPU (Raspberry Pi: Quad-Core ARM, 1 MB L2 Cache, 2.5 GHz)
❖ General: 64-bit data, 40 to 50-bit address, many cores, and multiple memory channels


❖ Special: 8/16/32/64-bit data, 16 to 36-bit address, few cores, single memory channel
★ RAM (Raspberry Pi: 2/4/8 GB)
❖ General: ≥ 4 GB to support complex operating system, multi-tasking & large data
❖ Special: ≤ 4 GB to support real time operating system or control program
★ Storage
❖ TeraBytes HDD/SSD
❖ GigaBytes SSD/eMMC/SD/microSD
★ Communication
❖ Display & Camera
❖ Wi-Fi & Bluetooth
❖ Ethernet & USB
❖ PCIe for functionality
★ Special, GPIO for sensors and actuators
★ Home Servers & Plant Controllers, etc.
[Figure: Raspberry Pi board with labeled parts; General Programmable Input Output, Wi-Fi & BT, Display, CPU, RAM, USB-C Power, HDMI 1 & 2, Camera, Ethernet, USB-3 & USB-2]


Most Common Platform - Desktops
★ Microprocessor systems have parts that vary in count, size, performance, power, etc.
★ Workstation MoBo has many sockets for multiple CPUs, GPUs & SDRAM slots


★ Desktop MoBo has many ports for connectivity & expansion slots for functionality
★ Notebook MoBo has limited ports & expansion slots (Zero?)
★ All require power supplies that meet their needs
★ VRM is a DC-DC power supply for the CPU, GPU, MEM
[Figure: Motherboard (MoBo) showing SDRAM, CPU, Voltage Regulator Module (VRM), Ports, Power Supply, Expansion Slots & Add-in Card]


General Microprocessor System - Minimal & Typical
[Figure: minimal system schematic. CPU, ROM, RAM, PIO, SIO & PCT share the data bus (DB) and address bus (AB), with control signals RD*, WE*, OE*, PGM*, RS, CS*, INT*, NMI*, CLK & RST*. A decoder (DEC) enabled by MREQ* decodes A15 & A14 into outputs 0* to 3* for memory chip selects; a second decoder enabled by IOREQ* decodes A7 & A6 into I/O chip selects such as SIOCS* & PCTCS*. PIO, SIO & PCT connect to I/O devices; an RTC is also shown]


Functional Description
★ CPU, Central Processing Unit; the core element in any microprocessor system
★ ROM, Read Only Memory; non-volatile memory to hold code and fixed data


★ RAM, Random Access Memory; read/write memory to hold computing results
★ PIO, Parallel Input Output; subsystem providing data exchange for near devices
★ SIO, Serial Input Output; subsystem providing data exchange for far devices
★ PCT, Programmable Counter Timer; subsystem providing timing signals
★ Support Logic
❖ CLK, Clock; square wave signal necessary for the CPU to function
❖ RST, Reset; power-on pulse to restart the CPU
❖ RTC, Real Time Clock; battery operated calendar subsystem
★ Glue Logic
❖ DEC, Decoding elements; select one of many devices for a transaction
✦ DEC, for memory, enabled by memory access operations
✦ DEC, for input/output, enabled by input/output access operations
❖ IPE, Interrupt Priority Encoder; selects the highest priority device to serve
★ I/O DEV, Input Output Devices via which the computer interacts with the world
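The memory decoder described above can be modeled in software to see how the address space is partitioned. The sketch below assumes a 16-bit address bus and an active-low 2-to-4 decoder fed by A15 & A14 and enabled by MREQ*, as the schematic suggests; which chip sits on which output is not stated in the slides, so the function names and the example addresses are only illustrative.

```python
def decode_2to4(enable_n: int, b: int, a: int) -> list:
    """Active-low 2-to-4 decoder: output k is 0 only when enabled and (b, a) == k."""
    outputs = [1, 1, 1, 1]
    if enable_n == 0:            # E* is active low
        outputs[(b << 1) | a] = 0
    return outputs


def memory_select(address: int, mreq_n: int = 0) -> list:
    """Chip-select lines 0*..3* for a 16-bit address; each output covers 16 KB."""
    a15 = (address >> 15) & 1
    a14 = (address >> 14) & 1
    return decode_2to4(mreq_n, a15, a14)


# Each quarter of the 64 KB space activates a different select line
for addr in (0x0000, 0x4000, 0x8000, 0xC000):
    print(f"{addr:#06x} -> {memory_select(addr)}")

# With MREQ* inactive (1), no memory chip is selected
print(memory_select(0x8000, mreq_n=1))
```

The I/O decoder works the same way, with IOREQ* as the enable and A7 & A6 as the select inputs.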


Special Microprocessor System - Traffic Lights Controller
★ A typical road intersection requires Red, Orange & Green lights to indicate the right to pass
★ Each path; East, West, South & North has:
❖ 8 outputs to switch the lights, via one of four 8-bit latches LAX (LAXe, LAXw, LAXs & LAXn)
❖ 2 inputs from sensors to detect Emergency (E) & Congestion (C), read via a buffer BUF
[Figure: controller schematic. CPU, ROM, RAM & BUF share the data bus (DB) and address bus (AB), with RD*, WE*, OE*, PGM* & CS* control signals; a decoder (DEC) enabled by MRQ* selects the chips and clocks (CK) the four LAX latches; the NORTH path is shown with its E & C sensor inputs; CLK & RST* drive the CPU]


Traffic Light Controller - Functional Description
★ CPU, Central Processing Unit; the core element in any microprocessor system
★ ROM, Read Only Memory; non-volatile memory to hold code and fixed data


★ RAM, Random Access Memory; read/write memory, can be used for logging E & P times
★ LAX, Latch that holds the output state for path lights (4 latches, 1 per path)
★ BUF, Buffer via which we read the sensors (1 buffer, 2 bits per path)
★ Support & Glue Logic
❖ CLK, Clock; square wave signal necessary for the CPU to function
❖ RST, Reset; power-on pulse to restart the CPU
❖ DEC, selects ROM, RAM, BUF, or one LAX of the 4 (Output Latches)
★ BUF inputs come from Emergency sensors (E), for path priority as detected by sonar or Bluetooth receivers (Ambulances or Firetrucks), and Pressure sensors (P), for extra path time when the car queue length exceeds a threshold
★ Outputs of each latch drive high-power transistors to operate the Red & Green lights of the four groups: L-turn, S-lanes, R-turn & Walking lights (U-turn is assumed to go with the L-turn)
★ During the time between deactivating and activating the Red & Green, the Yellow will be turned on automatically, as neither is active in this time, to save a dedicated output
★ A clock of 1 MHz is more than enough to operate such a simple design
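One way to picture the 8 latch outputs per path is as a Red bit and a Green bit for each of the four light groups, with Yellow derived when neither is set. The sketch below is only an illustration: the bit layout (bit 2i = Red, bit 2i+1 = Green) and the names `latch_byte` and `yellow_on` are assumptions, since the slides do not give the actual wiring.

```python
GROUPS = ("L-turn", "S-lanes", "R-turn", "Walk")


def latch_byte(states: dict) -> int:
    """Build the 8-bit latch value from a group -> 'R' / 'G' / 'Y' mapping.

    Assumed layout: bit 2i drives Red and bit 2i+1 drives Green of group i.
    'Y' sets neither bit, so the external logic lights the Yellow lamp.
    """
    value = 0
    for i, group in enumerate(GROUPS):
        state = states[group]
        if state == "R":
            value |= 1 << (2 * i)
        elif state == "G":
            value |= 1 << (2 * i + 1)
        elif state != "Y":
            raise ValueError(f"unknown state {state!r} for {group}")
    return value


def yellow_on(value: int, group_index: int) -> bool:
    """Yellow is on when neither Red nor Green of the group is active."""
    red = (value >> (2 * group_index)) & 1
    green = (value >> (2 * group_index + 1)) & 1
    return red == 0 and green == 0


all_red = latch_byte({g: "R" for g in GROUPS})
changing = latch_byte({"L-turn": "Y", "S-lanes": "R", "R-turn": "R", "Walk": "R"})
print(f"all red: {all_red:#04x}, L-turn changing: {changing:#04x}")
```

In the real controller the CPU would write such a byte to the latch selected by the decoder; the Yellow lamp itself needs no output bit.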


Micro Controller Unit (MCU)
★ MCU integrates CPU, ROM, RAM, I/O, support and glue logic, Communication ports, Counters, Timers, etc., to handle general control tasks
❖ 8-bit, 16-bit and 32-bit processors
❖ 1 KB to 1 MB ROM (PROM or Flash)
❖ 1 KB to 1 MB SRAM
❖ 4, 8, … , 48 pins of General I/O, with one or more alternative functions
★ 4 to 64-pin chips, with 4 or even more functions assigned to each pin. Some chips can even change the pin number of a function (layout)
❖ Low end: 1 to 10 MHz clock, 20 to 30 pins, < 1 W
❖ High end: 20 to 100 MHz clock, 100 to 300 pins, 1 to 5 W
★ Cost ranges from a few cents to a few dollars
★ Designing with MCU vs. CPU has many advantages:
❖ Shorter time to market (mostly ready and tested)
❖ Smaller size (most of the components integrated)
❖ Lower power
❖ Cheaper
[Figure: Arduino Controller with Wi-Fi & Quad Relay Module]


System On Chip (SoC)
★ SoC integrates more components to support specific functions like audio processing, video processing, communications, bus interfaces, etc.
❖ Global Positioning System (GPS)
❖ Global System for Mobile communications (GSM)
❖ Near Field Communication (NFC)
❖ Radio Frequency Identification (RFID)
❖ LiDAR, Barometer, Accelerometer, Gyroscope, Compass, etc.
❖ Biometrics: Heart Rate & ECG, Blood Oxygen, Pressure, Glucose, etc.
❖ Security: Face Recognition, Fingerprint Recognition, etc.
★ Examples of SoC specific functions:
❖ Smartphones & Tablets
✦ An SoC for a smartphone may include graphics, audio, video processing parts
❖ Televisions
✦ Sophisticated video functions, like scaling, upscaling, color processing, etc.
❖ Networks
✦ Switches & Routers use SoCs to handle packet processing and routing fast
★ ARM based chips production alone exceeds 7 Billion chips per year
[Figure: ESP32 SoC]


Form Factors - General
★ Motherboards vary in size to accommodate enough components for the target platform
★ Even personal systems motherboards come in many form factors; desktop, tower, etc.
❖ From a single SoC to multi-socket CPUs
❖ From a single channel SDRAM to multi-channel SDRAM
❖ More ports, even of the same function, like:
✦ Video Graphics Array (VGA), Legacy
✦ Separate Video (S-Video), Legacy
✦ Digital Visual Interface (DVI), Legacy
✦ High Definition Multimedia Interface (HDMI)
✦ miniHDMI
✦ microHDMI
✦ DisplayPort
✦ miniDisplayPort
✦ USB-C
[Figure: Workstation MoBo, Laptop MoBo, PC mini MoBo & Desktop MoBo]


Sample Processors
★ Specifications vary significantly based on vendor and target platforms; smartphone, tablet, laptop, desktop, gaming machine, workstation, server, etc.
★ Below, typical specs for low-end, medium and high-end platforms

              A17 Pro             M3 Max                  AmpereOne
Vendor        Apple               Apple                   Ampere Computing
Target        Handheld devices    Personal Computers      Servers & Data Centers
Process       3 nm                3 nm                    5 nm
Transistors   20 Billion          60 Billion              80 Billion
Power         2 to 8 W            8 to 40 W               150 to 350 W
CPU           6 Cores, 3.8 GHz    16 Cores, 3.6 GHz       192 Cores, 2.8 GHz
GPU           6 Cores, 1.5 GHz    40 Cores, 1.4 GHz       No GPU Cores
NPU           16 cores            32 cores                -
L1            256 KB per core     128 KB per core         16 KB Code + 64 KB Data per core
L2            16 MB               8 MB per core cluster   2 MB per core
L3            -                   64 MB                   64 MB
DRAM          8 GB LPDDR5         128 GB LPDDR5           8 TB DDR5
System        $1000 to $1500      $1500 to $3000          $4000 to $8000


Example: Pro Laptop
★ Apple MacBook Pro models (2023) are based on in-house designed M3 SoCs
★ Flavors: M3, M3 Pro & M3 Max (Maybe Ultra soon)
★ All use a 3 nm process, with M3 Max having 92 BTr on a 420 mm2 die
★ M3 Max system consumes 80 W at full load & 3.5 W nominal; a 70 WH Battery lasts 20 Hours
❖ CPU, 12 Performance Cores; 4.05 GHz, 192 KB Code + 128 KB Data L1 & 32 MB L2 shared
❖ CPU, 4 Efficiency Cores; 2.75 GHz, 128 KB Code + 64 KB Data L1 & 4 MB L2 shared
❖ GPU, 40 Cores; 1.6 GHz, 6400 Compute units delivering 4.26 TFLOPS (FP32)
❖ NPU, 16 Cores; 1.125 GHz yielding 18 TOPS for AI & ML
❖ Unified CPU/GPU 128 GB LPDDR5-6400 SDRAM
✦ 32 Channels x 16-bit each
✦ 6.4 GT/s x 2 Bytes = 12.8 GBps per channel
✦ 12.8 x 32 = 409.6 ≈ 400 GBps unified throughput
★ Integrates other functions
❖ Audio & Video processing
❖ Security & High speed data transfer
❖ Supports up to 8 TB SSD (7.4 GBps)
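The unified memory throughput and the battery life quoted for the M3 Max follow from simple products and ratios. A small sketch, assuming the slide's figures (6400 MT/s, 16-bit channels, 32 channels, 70 WH battery at 3.5 W nominal); the function name is hypothetical and the result is a peak figure, not a measured one.

```python
def channel_bandwidth_gbps(rate_mtps: float, width_bits: int) -> float:
    """Peak bandwidth of one channel in GB/s: transfers per second x bytes per transfer."""
    return rate_mtps * 1e6 * (width_bits / 8) / 1e9


per_channel = channel_bandwidth_gbps(6400, 16)   # one 16-bit LPDDR5-6400 channel
total = per_channel * 32                          # 32 channels in parallel
battery_hours = 70 / 3.5                          # WH divided by nominal watts

print(f"per channel: {per_channel:.1f} GB/s")
print(f"total      : {total:.1f} GB/s")
print(f"battery    : {battery_hours:.0f} hours at nominal load")
```

The exact product is slightly above the rounded 400 GBps figure used on the slide.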


Form Factors - Compact
★ Single Board Computer (SBC) is a small board with an engine capable of running anything from proprietary real time operating systems or light operating systems to full-fledged operating systems like Windows and Linux
★ Integrate processor, memory, mass storage, peripherals, and good connectivity
★ They can be used as computers, controllers, IoT devices, etc.
★ Low power, no fan, system on a tiny board; 15 to 60 cm2
★ Below are a few examples of such systems from Intel, which can be used for processing just like general purpose computers but at varying scale
★ System on Module (SoM), integrates components for specific tasks on a small board
★ From a few dollars to a few hundred dollars; Multi-Core
★ Low end MCUs can get down to a few cents per piece
[Figure: Raspberry Pi Zero 2 W, Raspberry Pi 4 & Latte Panda]


Desktop & Control
★ Systems are packaged based on needs; to fit smartphones, tablets, notebooks, desktops, workstations, servers, stand alone controllers, embedded systems, etc.
★ An example of computer / controller; with many ports for Keyboard, Mouse, Monitor, etc.
★ Raspberry Pi 5 is built around an SoC with 4-core 64-bit ARM running at 2.4 GHz
★ 2 GB, 4 GB & 8 GB LPDDR4 (for few to several tens of dollars)
❖ Gbps Ethernet with PoE support
❖ Dual USB 2, dual USB 3, USB-C (Power)
❖ Dual micro HDMI for 4K displays & Stereo Audio port
❖ Dual Camera support
❖ Dual Band Wi-Fi & Bluetooth
❖ MicroSD card
❖ 40-pin GPIO & PCIe x1 Gen. 2
❖ RTC & 15 W Power Jack
★ Pi OS, Android & Linux support
❖ High end controller, or
❖ Low end computer (heat sink? fan?)


Artificial Intelligence
★ AI applications require supercomputing performance in small size and low power
★ nVIDIA offers SoM based small form factor packages with a range of connectivity; Wi-Fi, Bluetooth, Gb Ethernet, CAN, USB 3, 4K HDMI, 16 GB eMMC, I2C, I2S, SPI, 5 MP Camera interface, A/V encode/decode, tons of GPIO and options
❖ TX2 (15 W, $500, 2 TFLOPS), a multi-core CPU and 256-core GPU, 8 GB LPDDR4, 32 GB eMMC
❖ Jetson Nano (10 W, $99, 472 GFLOPS), a trimmed-down version with 128-core GPU, 4 GB LPDDR4 & 16 GB eMMC
[Figure: nVIDIA Jetson Nano SoM, nVIDIA Jetson Nano SBC & nVIDIA Jetson TX2 SBC]


Edge Computing
★ An AI processing daemon, using nVIDIA Jetson Xavier NX SoMs (System on Module)
❖ CPU: 6 ARM v8.2 Cores, 6 MB L2 / 4 MB L3 Cache (1.9 GHz with 2 cores active & 1.4 GHz with 4 or 6)
❖ GPU: 384 CUDA Cores & 48 Tensor Cores
❖ MEM: 8 GB LPDDR4 128-bit / 51.2 GBps, 16 GB eMMC & microSD and NVMe SSD
★ 84 Tera Operations Per Second (TOPS)
★ Good for Edge Computing
★ Size & Cost:
❖ 4 x $400 + $200 = $1800
❖ 70 W in 1000 cc package
[Figure: Jetson Xavier NX Board (70 mm x 45 mm), Quad Jetson Xavier NX Carrier (110 mm x 110 mm) & Jetson Mate - Cluster Box (120 mm x 120 mm x 8 mm)]


Back End Computing
★ Large Language Models (LLMs) require high performance platforms, as they are compute intensive & data intensive models
★ AMD Instinct integrates a large number of cores and a large amount of memory to match this demand as back end computers in data centers; each box is …
❖ 300 compute cores & 20,000 processor cores
❖ 192 GB & 5.3 TB/s memory
❖ 750 W & 5.2 PFLOPS

Purpose Built Systems
★ Increasing demand for performance led chip makers to look for ways to make chips at an affordable price and in reasonable time; small dies with good yield & short time to market
★ Purpose Built is a way to assemble chips to meet special needs, like Data-Centric applications
❖ Qualcomm Centriq, designed for performance & optimized for power to handle Data Center workloads
❖ NVIDIA Drive Adam System, Quad Orin, 4 x 254 > 1000 TOPS for autonomous driving; a data center on wheels, to process large amounts of data from many sensors & cameras
[Figure: NVIDIA Orin SoC; 254 TOPS, 17 BTr.]


Computing Platforms - Personal
★ Workstations, Notebooks, Tablets & SmartPhones
★ Wearables; Watches, Headsets & Glasses

★ Networks, Personal Area Network (PAN), Internet of Things (IoT) & Cloud services
[Figure: personal devices connected through a ROUTER]

Computing Platforms - Corporate
★ Workstations, Thin Clients, Terminals, Kiosks and other equipment
★ Supercomputers, Minicomputers and/or Servers

★ Clouds, Infrastructures, Platforms & Services; Public, Private & Hybrid Cloud forms
[Figure: On-premise Servers & Cloud Servers connected through ROUTERS & SWITCHES]

Computing Platforms - SuperComputers
★ Supercomputers are powerful machines dedicated to heavy computations like scientific research, simulation, design of complex systems
★ Data Centers serve a huge number of users & applications over the Internet; mostly CPUs
★ Supercomputers serve a limited number of users & applications; mostly GPUs

Computing Platforms - Factories
★ Programmable Logic Controller (PLC) is a modular special purpose automation system
★ Consists of CPU modules, I/O modules, Links, etc.
★ Programmed using special languages, like Ladder Diagram (LD), Instruction List (IL), etc.
★ Used in small control applications and large industrial plants …
❖ Controllers, traffic lights, elevators, automatic doors, car wash, remote monitoring, etc.
❖ Industry, automobile industry, oil and gas industry, equipment industry, food industry, etc.
[Figure: PLC system, Automobile Assembly Line & Silicon Foundry FOUPs]


Microprocessor Systems
System Components

Dr. Taisir Eldos


Clock Signal Generator
★ Every system needs a clock; a square wave signal with some characteristics like frequency, duty cycle, voltage level, fall and rise times, etc.
★ Inverters have an input threshold VTH
❖ VI < VTH ➞ VO = VOH
❖ VI > VTH ➞ VO = VOL
★ Schmitt inverter has a hysteresis property; it has two input threshold voltage levels
❖ VI < VL ➞ VO = VOH
❖ VI > VH ➞ VO = VOL
★ Inverter
❖ VTH = 1.5 V
★ TTL Schmitt Inverter
❖ VL = 0.8 V
❖ VH = 1.7 V
★ CMOS Schmitt Inverter
❖ VL = 1.0 to 2.0 V
❖ VH = 2.4 to 3.2 V
[Figure: transfer curves VO versus VI; the inverter switches at VTH, the Schmitt inverter switches at VL & VH (hysteresis loop), with output levels VOH & VOL]
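The hysteresis of a Schmitt inverter can be modeled as a two-state machine: the output only changes when the input crosses VH going up or VL going down, and it holds its previous value in between. A minimal sketch, using the TTL thresholds listed above; the function name and the sample input sequence are made up for illustration.

```python
def schmitt_inverter(samples, vl, vh, initial_high=True):
    """Return the logic output (True = VOH, False = VOL) for each input sample."""
    out = initial_high
    result = []
    for v in samples:
        if v > vh:        # input above the high threshold: output goes low
            out = False
        elif v < vl:      # input below the low threshold: output goes high
            out = True
        # between VL and VH the output keeps its previous state (hysteresis)
        result.append(out)
    return result


# Input ramps up and then back down through the 0.8 V / 1.7 V window
ramp = [0.5, 1.0, 1.5, 2.0, 1.5, 1.0, 0.5]
outputs = schmitt_inverter(ramp, vl=0.8, vh=1.7)
print(outputs)
```

The same input value of 1.0 V or 1.5 V gives a different output on the way up than on the way down, which is what makes the device immune to small noise around a single threshold.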


Clock Signal Generator
★ Charging Equations: VI is Initial Voltage, VF is Final Voltage & VC is Capacitor Voltage
★ Capacitor Charge …
❖ Equation: VC = VF − (VF − VI) x e^(−T/RC), where VI = VL & VF = VCC
❖ TH = T when VC = VH, hence: TH = RC x ln ((VCC − VL)/(VCC − VH))
★ Capacitor Discharge …
❖ Equation: VC = VF + (VI − VF) x e^(−T/RC), where VI = VH & VF = 0
❖ TL = T when VC = VL, hence: TL = RC x ln (VH/VL)
★ To achieve 50% duty cycle
❖ TH = TL, hence VCC = VH + VL
❖ For example, VL = 1.67 V & VH = 3.33 V (1/3 & 2/3 of 5 V)
★ Consider an inverter with: VL = 1.9 V & VH = 3.1 V
❖ TH = TL = RC x ln (3.1/1.9) = 0.49 RC
❖ T = TL + TH = 0.98 RC ≈ RC
❖ F = 1 / T = 1 / RC
★ With R = 1 KΩ & C = 2 nF
❖ F = 500 KHz & DC = 50%
[Figure: 74HC14 Schmitt inverter with feedback resistor R and capacitor C producing CK; waveforms show VC swinging between VL & VH and CK swinging between 0 & VCC, with Mark time TH & Space time TL]
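The charge and discharge times of the RC oscillator can be evaluated directly from the two logarithmic expressions. The sketch below assumes ideal thresholds and rail-to-rail output, as the slide does; the function name `rc_oscillator` is hypothetical.

```python
import math


def rc_oscillator(r, c, vcc, vl, vh):
    """Return (TH, TL, frequency, duty cycle) of a Schmitt-inverter RC oscillator."""
    th = r * c * math.log((vcc - vl) / (vcc - vh))  # charge from VL up to VH
    tl = r * c * math.log(vh / vl)                  # discharge from VH down to VL
    period = th + tl
    return th, tl, 1.0 / period, th / period


# R = 1 KΩ, C = 2 nF, VCC = 5 V, thresholds 1.9 V and 3.1 V
th, tl, freq, duty = rc_oscillator(1e3, 2e-9, 5.0, 1.9, 3.1)
print(f"TH = {th * 1e6:.3f} us, TL = {tl * 1e6:.3f} us")
print(f"F = {freq / 1e3:.0f} KHz, duty cycle = {duty:.0%}")
```

Without the T ≈ RC rounding, the computed frequency comes out a few percent above 500 KHz; thresholds that do not sum to VCC would also move the duty cycle away from 50%.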


Clock Signal Generator - Pierce Oscillator
★ Crystal (Quartz) is a Silicon Dioxide (SiO2) compound with piezoelectric property
★ With the piezoelectric effect, crystals act like passive tuning forks with negligible damping
★ Accuracy is measured in parts per million (ppm) or parts per billion (ppb). With ± 3.4 ppm …
❖ ± 3.4 x 10^−6 x (30 x 24 x 60 x 60) = ± 8.8 seconds monthly
❖ ± 3.4 x 10^−6 x (365.25 x 24 x 60) = ± 1.8 minutes annually
★ Cut precisely to act like an RLC resonance circuit with very high Q–Factor, 10^6 or better
❖ Precision: 100 to 10 parts per million (ppm) in the temperature range 20 to 70 °C
❖ Stability: 10 to 100 ppb/°C due to heating & 10 to 100 ppb/year due to aging
★ Low frequency crystals are bulky, less precise & hard to manufacture
★ Systems use low frequency clock like 100 MHz as base & components have multipliers
❖ CPUs on-board multipliers produce 1.8 to 5.8 GHz
❖ GPUs on-board multipliers produce 1.0 to 3.2 GHz
❖ Other subsystems may have their own
✦ USB, use 12 MHz to generate 48 MHz, 960 MHz
✦ Sound, 44.1 KHz, 48 KHz, 96 KHz Crystals
✦ Ethernet; 25 MHz for 10/100 Mbps, 125 MHz for 1 Gbps
[Figure: Pierce Oscillator; inverter with 1 MΩ feedback resistor, 1 KΩ series resistor, crystal & 22 pF capacitors]
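Converting a ppm rating into accumulated time error is a single multiplication. The sketch below reproduces the ± 3.4 ppm example; the function names are assumptions of this example, and a constant worst-case frequency offset is assumed over the whole interval.

```python
def drift_seconds(ppm: float, interval_seconds: float) -> float:
    """Worst-case accumulated time error for a clock that is off by `ppm`."""
    return ppm * 1e-6 * interval_seconds


def seconds_until_one_second_off(fractional_error: float) -> float:
    """How long a clock with the given fractional error takes to drift by 1 s."""
    return 1.0 / fractional_error


MONTH = 30 * 24 * 60 * 60          # seconds in a 30-day month
YEAR = 365.25 * 24 * 60 * 60       # seconds in an average year

monthly = drift_seconds(3.4, MONTH)
annual_minutes = drift_seconds(3.4, YEAR) / 60

print(f"monthly drift: {monthly:.1f} s")
print(f"annual drift : {annual_minutes:.1f} min")
print(f"1 ppm clock is 1 s off after {seconds_until_one_second_off(1e-6) / 86400:.1f} days")
```

The second helper gives a quick way to compare oscillator grades by the time they need to accumulate one second of error.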


High Precision Clocks
★ Crystal Oscillator (XO)
❖ 10^−5, 100 to 10 ppm
❖ One second shift per week
❖ Cheap & Small for commercial apps, 7 x 7 x 3 mm
★ Temperature Compensated Crystal Oscillator (TCXO)
❖ 10^−6, 10 to 1 ppm
❖ One second shift per month
❖ Thermal sensor to adjust frequency, 7 x 7 x 3 mm
★ Oven Controlled Crystal Oscillator (OCXO)
❖ 10^−8, 100 to 10 ppb
❖ One second shift per year (instrumentation, airborne systems, etc.)
❖ Temperature is kept at 100 °C for stability, 9 x 9 x 5 cm
★ Laser Controlled Crystal Oscillator (LCXO)
❖ 10^−11, 100 to 10 ppt
❖ One second shift in thousands of years
❖ Cost thousands of dollars, high precision apps, 20 x 20 x 7 mm
[Figure: XO, TCXO, OCXO & LCXO packages]


Atomic Clocks
★ Atomic clocks are crystal oscillators with precision control; Lasers or Photonics
★ Extremely precise, but bulky (closet size) & Costly (Millions of dollars) for:
❖ Timing Standards (TAI is based on 400 clocks in 69 labs around the world)
❖ Positioning, Navigation, Surveying & Air Traffic Control (Satellite Based)
❖ Financial Transactions & Securities Exchange
❖ Communications & Network Synchronization
❖ Climate, Scientific Research & Space Exploration
★ Cesium Fountain Clock (CFC), Cesium @ 9.2 GHz
❖ 3.3 nanoseconds per year
❖ One second shift in 300 Million years
★ Optical Lattice Clock (OLC), Strontium @ 429 THz
❖ 33 picoseconds per year
❖ One second shift in 30 Billion years
★ Optical Lattice Clock (OLC), Ytterbium @ 642 THz
❖ 1 picosecond per year
❖ One second shift in 1 Trillion years
★ Life Expectancy: 10 to 50 years


Reset Signal Generator
★ On power up, the capacitor charges according to: Vc = Vcc x (1 − e^(−T/RC))
★ If the high threshold of the 74HC14 is 3.2 V (which is nearly 63.2% of the 5 V final voltage value), it takes RC seconds to reach; a threshold of 3.6 V takes a bit more, about 1.27 RC
★ The output Vx is active high; it goes high on power up or button release, and goes low after some time that depends on R, C and the threshold voltage of the Schmitt inverter
★ With C = 4.7 µF & Rc = 20 KΩ, we get 20 KΩ x 4.7 µF = 94 ms pulse duration
★ If C max ratings are 12 V / 0.5 A, then Rd > 5 V / 0.5 A = 10 Ω, so 100 Ω gives a safe discharge
★ Rd must be small enough to discharge before the push button is released. If the push button time is 10 ms for example, Rd x C < 10 ms, or else the capacitor starts charging before reaching the low threshold

[Figure: RC reset circuit (Rc, Rd, C, push button PB, 74HC14 with output Vx) and the Vc waveform versus time from 0.00 V to 5.00 V, with the 3.2 V (63.2%) and 3.6 V levels and the pulse width Tc marked]
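The charging equation on this slide can be solved for the time at which the capacitor crosses a given threshold, T = −RC x ln(1 − Vth/Vcc). The sketch below is only an illustration of that algebra; the function name `charge_time` and the choice of thresholds are assumptions made here, not part of the lecture.

```python
import math

def charge_time(r_ohm, c_farad, v_threshold, v_supply=5.0):
    """Time for an RC node charging from 0 V to reach v_threshold (hypothetical helper)."""
    if not 0 <= v_threshold < v_supply:
        raise ValueError("threshold must be below the supply voltage")
    # Solve Vth = Vcc * (1 - exp(-T/RC)) for T
    return -r_ohm * c_farad * math.log(1 - v_threshold / v_supply)

rc = 20e3 * 4.7e-6  # Rc = 20 KΩ and C = 4.7 µF from the slide
print(f"RC = {rc * 1e3:.0f} ms")
for vth in (3.2, 3.6):
    t = charge_time(20e3, 4.7e-6, vth)
    print(f"{vth} V reached after {t * 1e3:.1f} ms ({t / rc:.2f} RC)")
```

The ratio T/RC depends only on Vth/Vcc, so the same function can be reused for any Schmitt trigger threshold.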


Reset Signal - Cold versus Warm
★ In one time constant (τ = RC seconds), the capacitor charges to 63.2% of 5 V, or about 3.2 V
★ Assuming 74HC14 with VL = 1.6 V & VH = 3.2 V thresholds
★ Green curve is VC and Blue curve is VX. Initially VC is 0 hence output is High
❖ Green charges to VH, causing the inverter output to go Low ending the pulse
❖ Green discharges to VL, causing the inverter to start and end another pulse
❖ Red discharges slowly (due to high Rd) and does not get to VL, no reset pulse generated
★ Cold & Warm Reset
❖ Cold reset, when the power supply is turned on; Tc ≈ RC, depends on the VH of the inverter type
❖ Warm reset, when the power supply is on and the push button is pressed; Tw ≤ Tc
★ The 555 timer circuit requires 2 resistors & 2 capacitors to construct a robust pulse generator
❖ Operates at 5 V to 15 V
❖ Robust; 50 years of reputation
❖ Popular; Billion sold every year
❖ Cheap; few cents only
❖ 8-pin chip has single timer
❖ 14-pin chip has dual timer
❖ Available in CMOS too

[Figure: VC and VX versus time between 0.0 V and 5.0 V (VCC), with VH = 3.2 V and VL = 1.6 V marked, the push button press (PBP), and the pulse widths Tc and Tw]


555 Timer Based Reset Generator
★ The 555 timer circuit uses a 3-resistor voltage divider to generate 1.67 V & 3.33 V reference points
★ The output goes high on power up or button release & goes low when C charges to 3.33 V
★ Hence T = 1.1 x R x C because it occurs when 5 x (1 − e^(−T/RC)) = 3.33 V
★ Rx (1 MΩ) initiates the pulse on power up or button release & Cx (10 nF) eliminates noise
❖ Compute the pulse duration with R = 18 KΩ / 5% & C = 6.8 µF / 20%
✦ T = 1.1 x 18 x (1 − 0.05) x 6.8 x (1 − 0.20) = 102.3 ms (Good for 100 ms requirement)
✦ Note that electrolytic capacitors lose value over time due to liquid evaporation
❖ Compute the 10% resistor value that generates 50 ms using C = 4.7 µF / 15%
✦ 50 ms = 1.1 x R x (1 − 0.10) KΩ x 4.7 x (1 − 0.15) µF, hence R = 12.64 KΩ
✦ If only 12 KΩ, 15 KΩ, 27 KΩ are available, pick 15 KΩ (Need at least 12.64 KΩ)

[Figure: LM555 reset circuit on a 5 V supply; pins 1 GND, 2 TRIG, 3 OUT, 4 RST, 5 CONT, 6 THR, 7 DIS, 8 VCC; external parts R, C, Rx, Cx and push button PB; output pulse of width T]
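The two worked examples above follow the same worst-case recipe: take T = 1.1 x R x C with both parts at the low end of their tolerance, then round the resistor up to the next available value. A minimal sketch of that arithmetic follows; the helper names are hypothetical, and the units are chosen so that KΩ x µF gives milliseconds.

```python
def pulse_ms(r_kohm, c_uf, r_tol=0.0, c_tol=0.0):
    """Shortest pulse (ms) for T = 1.1 R C with R and C at their low tolerance limits."""
    return 1.1 * r_kohm * (1 - r_tol) * c_uf * (1 - c_tol)

def min_resistor_kohm(t_ms, c_uf, r_tol, c_tol):
    """Nominal resistance (KΩ) that still guarantees t_ms in the worst case."""
    return t_ms / (1.1 * (1 - r_tol) * c_uf * (1 - c_tol))

def pick_resistor(required_kohm, available_kohm):
    """Smallest available value that is not below the requirement, or None."""
    candidates = [r for r in available_kohm if r >= required_kohm]
    return min(candidates) if candidates else None

print(round(pulse_ms(18, 6.8, 0.05, 0.20), 1))        # first example on the slide
needed = min_resistor_kohm(50, 4.7, 0.10, 0.15)       # second example on the slide
print(round(needed, 2), pick_resistor(needed, [12, 15, 27]))
```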


Central Processing Unit (CPU)
★ DB: 4, 8, 16, 32, 64
★ AB: 12, 14, 16, 20, 24, 32, 36, 40, 41, 42, 43, …, 48, 50 & 52.
★ Address Space: Kilo = 2^10, Mega = 2^20, Giga = 2^30, Tera = 2^40, Peta = 2^50, Exa = 2^60, Zetta = 2^70
❖ 2^10 = 1024 = Ki
❖ 10^3 = 1000 = K
★ CB: 10’s to 100’s of signals
❖ Input: Clock, Reset, Interrupt, Wait, Bus Error, Bus Request, etc.
❖ Output: Clock, Address Strobe, Data Strobe, Read, Write, Halt, etc.
❖ Multiplexed: Input/Output due to pins shortage, like Reset & Halt in the MC68000
★ PB: 5.0, 3.3, 2.0, 1.9, 1.8, 1.5, 1.2 V, …; with dynamic voltage scaling it covers a range:
❖ CPU Cores: 0.7 – 1.3 V (low core count), 0.6 – 1.1 V (high core count) & 1.5 V (Gaming)
❖ CPU Logic: 1.2 – 1.5 V
❖ GPU: 0.6 – 1.0 V
❖ MEM Controller: 1.2 – 1.5 V
❖ I/O Logic: 1.8 – 3.3 V
★ Packages: DIL, PLCC, SMT, PGA, BGA, LGA …
★ Pin count: 16, 18, 28, 40, 48, 52, 64, 100s, 1000s
★ AB & DB of 48 & 64 implies 2^3 x 2^48 = 2^51 = 2 PB

[Figure: CPU block with DB, AB and CB* buses]


Read Only Memory (ROM, PROM, EPROM, EEPROM)
★ DB: 4, 8 and 16
★ AB: 4, 5, 6, …, 20 (More recently …)
★ CB: CS*/CE* (Chip Select/Enable), OE* (Output Enable or Read) and PGM* (for Programming or Write using a programming device)
★ PB: GND, VCC (5.0 V), VPP = 12 V (for programming)
❖ Mask ROM (MROM), hardcoded with data from the factory
❖ Programmable ROM (PROM), blank to program in house using special equipment
❖ Erasable Programmable ROM (EPROM), UV Light erased, and programmed many times
❖ EEPROM, Electrically byte erasable and Flash is the same but block erasable. Why?
★ The 2764 is a 28-pin 8 KB EPROM (8Kx8b), why not 27 pins?
❖ AB = 13 (8 K implies 3 + 10)
❖ DB = 8
❖ CB = 3 (CS*, OE*, PGM*)
❖ PB = 3 (VCC, GND & VPP in some chips)
★ Unused pin! NC, VCC, GND, CS2* …
★ Some chips have dual function pins, like CE*/VPP

[Figure: 2764 block with DB, AB, CS*, OE*, PGM*]


Flash Memory (NOR / NAND)
★ Uses 1 floating gate transistor per cell; to trap and release electrons under high voltage
★ NOR is less dense, random access at the byte/word level, good for code; eXecute In Place (XIP)
★ NAND is denser, sequential, cheaper, good for data; formatted as blocks, pages, etc.
★ A chip consists of: dies, planes, blocks, pages, strings of bits, and requires erase before write
★ Units of erase and write are blocks not bytes
★ Millions of erase/program cycles & decades of retention
★ Standard Bus
❖ DB of 4, 8, 16
❖ AB of 4, … up to 20 or more
❖ CB: CS*, OE* and WE*
★ Serial Peripheral Interface (SPI) for low pin count packages
❖ Serial Clock & Select: SCLK & CS*
❖ Serial Data In & Serial Data Out: SDI & SDO
★ PB: GND and 1.8 V, 3.3 V, 5.0 V (25 V generated internally)
★ Packages: DIL & TSOP with 4, 8, 16, 18, 20 and 42 pins
★ 3D-NAND or V-NAND is reliable, cheap, small and low power

[Figure: serial flash block (SCLK, SDI, SDO, CS*) and parallel flash block (DB, AB, CS*, OE*, WE*)]


Static Random Access Memory (SRAM)
★ Uses 4 to 6 transistors per cell or bit structure (even 8 and 10 for some applications)
★ The misleading “Random Access” used here means accessing any random location takes the same amount of time. This is opposed to Direct Access in which it depends on where the data is stored
★ DB: 4, 8 and 16
★ AB: 8, 9, …, 20 (More recently …)
★ CB: CS*/CE*, OE* and WE*
❖ CS*/CE*: Chip Select/Enable, there can be many active low and active high
❖ OE*: Output Enable, to read data out
❖ WE*: Write, to write data in
★ PB: GND and 5.0 V (Today, many work on much less like 2.0 V)
★ Packages: DIL and SMT with 24, 28, 32, 40 pins (8-pin chips use a serial bus)
★ The 62128 is a 28-pin 16 KB SRAM (16 Kx8b), again not 27!
❖ AB = 14 (16 K implies 4 + 10)
❖ DB = 8
❖ CB = 3 (CS*, OE*, WE*)
❖ PB = 2 (VCC, GND)
★ What is the capacity of a 32-pin 8-bit data SRAM chip? 2^19 = 512 KB

[Figure: 62128 block with DB, AB, CS*, OE*, WE*]


Dynamic Random Access Memory (DRAM)
★ 1 transistor per cell instead of 4; to control a charge on a capacitor, slower but cheaper
★ DB: 1, 2, 4, 8, 16
★ AB: 8, 9, …, 16, 17, 18 (18 x 2 = 36, yields 2^36 = 64 GB; Today: 8 GB x 8 Dies = 64 GB)
★ CB: CS*, OE*, WE*, CAS* and RAS*
❖ CS*/CE*: Chip Select/Enable, there can be many active low and active high
❖ OE*: Output Enable, to read data out
❖ WE*: Write, to write data in
❖ RAS*: Row Address Strobe; latches the upper half to select a page
❖ CAS*: Column Address Strobe; latches the lower half
★ PB: GND and 5.0 V (3.3 V, 2.0 V, 1.2 V, 1.1 V & 1.0 V today)
★ Packages: DIL & SMT with 16, 18, 20 & 40 pins
★ How many pins in 4 GB DRAM, assuming 4-bit wide? 28
❖ Format: 4 GB = 8 G x 4 b, AB = ⌈33/2⌉ = 17; 2 steps only
❖ DB = 4
❖ CB = 5 (CS*, OE*, WE*, CAS*, RAS*)
❖ PB = 2 (VCC, GND), more for high density chips

[Figure: DRAM block with DB, AB, CS*, OE*, WE*, RAS*, CAS*]
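The pin-count questions on the ROM, SRAM and DRAM slides all use the same bookkeeping: address pins plus data pins plus control pins plus power pins, with DRAM sending its address in two halves. The sketch below reproduces that bookkeeping; `select_bits` and `pin_count` are hypothetical names introduced here, and the control and power counts are simply the ones quoted on the slides.

```python
def select_bits(locations):
    """Number of address bits needed to select one of `locations` rows."""
    return max(locations - 1, 0).bit_length()

def pin_count(capacity_bits, width, control, power, multiplexed=False):
    """Pins = address + data + control + power; DRAM multiplexes row and column addresses."""
    bits = select_bits(capacity_bits // width)
    address_pins = (bits + 1) // 2 if multiplexed else bits
    return address_pins + width + control + power

KB, GB = 8 * 2**10, 8 * 2**30   # capacities expressed in bits

print(pin_count(8 * KB, 8, control=3, power=3))                      # 2764 EPROM
print(pin_count(16 * KB, 8, control=3, power=2))                     # 62128 SRAM
print(pin_count(4 * GB, 4, control=5, power=2, multiplexed=True))    # 4 GB x4 DRAM
```

Both memory chips come out at 27 signal pins, which is why the 28-pin packages on the slides have one pin left over.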


Parallel Input Output (PIO)
★ Integrates 2 or 3 ports to be programmed as input or output
★ Ports are Byte, Nibble or Bit programmable, depending on vendor and purpose
★ Used as peripherals, hence use the slow synchronous mode but can use the asynchronous mode too
★ They have a reset input to initialize ports as inputs to avoid damage; there will be a contention that leads to damage if a port is randomly set as output while connected to an input device
★ May have Interrupt Output, to notify the CPU when an action is complete or required
★ Naturally, they have a data bus to communicate with the CPU, few address lines to select a port and read/write control and a chip select (from a decoder)
★ Different vendors have chips with different flavors
❖ Parallel Input Output (PIO); Zilog
❖ Parallel Peripheral Interface (PPI); Intel
❖ Parallel Interface Adaptor (PIA); Motorola

[Figure: PIO block with DB, AB, CS*, OE*, WE*, RST*, INT* on the CPU side and ports PA, PB, PC on the device side]


Serial Input Output (SIO)
★ Integrates one or two serial communication channels, bit rates from 1.2 Kbps to 1.5 Mbps
★ Serial communications used to link terminals or printers to mainframes over 100’s feet using +12/−12 V drivers as opposed to standard parallel 0/+5 V
★ Each channel has Transmit, Receive and Handshaking signals (on the right side)
★ Has interrupt output, to signal events like character received or sent
★ Some chips have FIFO buffer for each direction; 16, 32, …, 128
★ Different vendors have chips with different flavors
❖ Asynchronous Communication Interface Adaptor (ACIA, ACA); Motorola
❖ Universal Asynchronous Receiver Transmitter (UART); National SC
❖ Dual Asynchronous Receiver Transmitter (DART); Zilog
★ Asynchronous serial protocol may use:
❖ 5-wire cable with hardware handshaking
❖ 3-wire cable with software handshaking
★ Chips provide more controls for modems

[Figure: SIO block with DB, AB, CS*, OE*, WE*, RST*, INT* on the CPU side and TxD, RxD, CtS*, RtS*, RxC, TxC on the channel side]


Programmable Counter Timer (PCT)
★ Integrates few counting modules; typically three 16-bit counters, and provides many modes of operation; Pulse generator, Square wave generator, etc.
★ Output of a module can be used as a clock for another to form 32-bit or even 48-bit counters
★ Mostly byte oriented, low operating frequency, used in timing signal generation like periodic interrupt for multitasking
★ Typical chips have three channels or counting elements, called counting modules
★ There can be more chip specific controls in some chips; like Clock, Reset, etc.
★ Different vendors have chips with different flavors
❖ Programmable Interval Timer (PIT); Intel
❖ Programmable Timer Module (PTM); Motorola
❖ Counter Timer Channels (CTC); Zilog
★ Each counting module, M0, M1 and M2 has:
❖ 2 inputs: Clock & Gate (Enable)
❖ 1 output: Out
★ Modules are totally independent, each operates in any mode
★ Any module can generate an interrupt when done
★ Can be cascaded; to make 32-bit or 48-bit modules

[Figure: PCT block with DB, AB, CS*, OE*, WE*, INT* on the CPU side and counting modules M0, M1, M2]


Real Time Clock (RTC)
★ Integrates a clock generator, few bytes of read/write memory and some control logic, along with 40 to 60 bytes of Non-Volatile Storage (NVS) to keep the time & date (for few dollars)
★ The ticking logic gets power from VCC when the system is on, or VBB otherwise
❖ Lithium Battery, small coin like non-rechargeable 3 V / 100s mAH, good for few years
❖ SuperCapacitor (UltraCapacitor), few minutes of charge gives months of operation
★ To generate the 1 Hz ticker with high precision; 10 or 20 ppm, we use
❖ 2^15 = 32,768 Hz, need 15 FFs to divide to get 1 Hz
❖ 2^22 = 4,194,304 Hz, need 22 FFs to divide to get 1 Hz
★ Why those oddball numbers? 32,700 Hz, 32,000 Hz, etc. need complex next state logic
❖ 2^13 = 8,192 Hz? Less FFs compared to 2^15, but less precise
❖ 2^12 = 4,096 Hz? Less FFs, but less precise, more power, bulky, fragile & hum
★ Host reads information over I2C bus for data exchange
★ OUT can be set to show: 1 or 32,768 Hz
★ Some systems get information from other sources:
❖ Radios, Cellular communication, Internet, etc.
❖ Chipset Integrated function, Intel’s Mobos

[Figure: RTC block with crystal pins XTL1 and XTL2, I2C pins SCL and SDA, OUT, and VBB fed by a battery or capacitor]


Glue Logic: Decoders & Encoders
★ Glue logic provides physical support functions for data and control flow in the system
★ Decoders select one out of many devices, memory or input/output chips, for a transaction
★ Decoders can be cascaded to support more chip select controls and have more control over the space. This can be achieved using
❖ DEC, 2-to-4, 3-to-8, etc. are binary decoders in one or two stages normally
❖ ROM or PLA, flexible but slow and needs a programming step which is an added cost
★ Encoders arbitrate events like interrupts; report the code of the highest priority active device request to serve; the lowest is always active to report no request.

[Figure: 74LS139 2-to-4 decoder (E*, A, B, Y0* to Y3*), 74LS138 3-to-8 decoder (E1, E2*, E3*, A, B, C, Y0* to Y7*) and 74LS148 8-to-3 priority encoder (E*, I0* to I7*, A*, B*, C*)]


Glue Logic: Buffers & Latches
★ Buffers are used for two reasons
❖ Physical, to resolve the fan out issues by strengthening the signal power, it is an electronic function although it might be inverting or non-inverting. Buffers allow low driving power devices like CMOS to drive many higher power ones like TTL
❖ Logical, to read specific input to the data bus when enabled, as an input port
★ Address bus is unidirectional and hence needs unidirectional buffers; each 4 have one Enable
★ Data bus needs bidirectional buffers (bus transceivers), hence Enable & Direction controls
★ Latches or Flip-Flops are used as output ports; data on the data bus is written into them by activating the clock, to be read by another party when output is enabled

[Figure: 74LS374 8-bit latch (D, Q, CK, OE*), 74LS244 8-bit buffer (A1, B1, A2, B2, E1*, E2*) and 74LS245 8-bit bus transceiver (A, B, E*, D)]


Programmable Logic Devices
★ Programmable Logic Devices (PLDs) are structures with Thousands to Billions of transistors, that can be programmed to achieve specific functions
★ PLDs are used to implement complex logical expressions and even complex systems
❖ Programmable Logic Arrays (PLA)
❖ Field Programmable Gate Arrays (FPGA)
❖ Application Specific Integrated Circuits (ASIC)
★ PLAs are generally used to implement simple logic expressions while FPGAs and ASICs consist of a huge number of complex blocks and an interconnection network managed by switches, and hence can be used to implement complex systems like controllers or processors
★ PLA structure
❖ AND/OR sections, with programmable connections
❖ Fuses are initially robust (Red) and make a connection; the blown ones (Blue) disconnect
❖ AND sections generate product terms
❖ OR sections generate sum of products
❖ XOR takes the complement of a function
❖ Variables passed true and complemented, and only one can be taken (if any)
❖ To invert a function, AND/NOR connect to 1, else to 0 (exactly one of them)


PLA Structure (6-input & 10-output Example)
[Figure: PLA with inputs A to F, each passed true and complemented; fuses marked B (Blown) or R (Robust)]

★ Product terms: A’BE’F, BCD, AC’E’, A’B’CF
★ Example product term: (A’•1)(B’•1)(1•C)(1•1)(1•1)(1•F) = A’B’CF
★ F1 = A’BE’F + A’B’CF
★ F2 = (BCD + AC’E’)’
★ F10 = ?
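The AND plane, OR plane and output XOR of the example can be mimicked in a few lines to check F1 and F2 for any input combination. This is a behavioural sketch only; the dictionaries `TERMS` and `OUTPUTS` and the function `evaluate` are names made up for this illustration, with each surviving fuse written as a (variable, polarity) pair.

```python
# AND plane: each product term keeps only the literals whose fuses are intact.
# True means the variable is taken in true form, False in complemented form.
TERMS = {
    "P1": {"A": False, "B": True, "E": False, "F": True},   # A'BE'F
    "P2": {"B": True, "C": True, "D": True},                # BCD
    "P3": {"A": True, "C": False, "E": False},              # AC'E'
    "P4": {"A": False, "B": False, "C": True, "F": True},   # A'B'CF
}

# OR plane plus output XOR: (product terms summed, invert flag)
OUTPUTS = {
    "F1": (("P1", "P4"), False),   # F1 = A'BE'F + A'B'CF
    "F2": (("P2", "P3"), True),    # F2 = (BCD + AC'E')'
}

def evaluate(inputs):
    """Evaluate every PLA output for a dict of input values, e.g. {'A': False, ...}."""
    products = {name: all(inputs[var] == pol for var, pol in lits.items())
                for name, lits in TERMS.items()}
    return {out: any(products[t] for t in terms) != invert
            for out, (terms, invert) in OUTPUTS.items()}

sample = dict(A=False, B=False, C=True, D=False, E=False, F=True)
print(evaluate(sample))
```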


Processor Power Module
★ Motherboards host processors, chipsets, support components, and slots of various kinds to host memory modules, storage modules, graphics cards, ports, etc. Each needs a specific voltage
★ Processor Power Modules (PPMs) or Voltage Regulator Modules (VRMs) are local power supplies; step down 5 or 12 V DC to stable reliable DC lower voltage (using buck converters)
★ VRMs consist of: Pulse Generator, MOSFET Switches, Chokes & Capacitors
★ It has a feedback control to adjust the duty cycle, pulse width, to stabilize the output
★ Output based on input identification code VID (5, 6 or 8 bits to specify the required value)
★ Codes may imply: 0.55, 0.56, 0.57, …, or 3.0 V (Some codes reserved for control; shut off)
★ Switches & Chokes make phases; more phases …
❖ Larger currents using cheaper components
❖ Less ripple, noise & heat
★ An 8 + 2 PPM has 8 CPU & 2 MEM phases
★ VRMs may have 16 phases to supply up to 1000 A
❖ A phase may supply 15 to 60 A
❖ A light core requires 2 to 4 W & 5 A
❖ A power core requires 8 to 16 W & 15 A


Power vs. Frequency & Voltage
★ The power consumption is directly proportional to frequency and square of voltage
★ So, P = K x F x V^2; the constant K depends on architecture, design, fab process, complexity, etc.
★ A processor with: F = 2.4 to 3.6 GHz, V = 0.8 to 1.4 V & Pmax = 37 W
❖ 3.0 GHz & 1.4 V
✦ P = 37 x (3.0/3.6) x (1.4/1.4)^2
✦ P ≈ 31 W
❖ 3.0 GHz & 1.1 V
✦ P = 37 x (3.0/3.6) x (1.1/1.4)^2
✦ P ≈ 19 W
★ A processor with:
❖ F: 3.2 to 4.4 GHz (normally, 3.8 GHz)
❖ V: 0.9 to 1.3 V (normally, 1.1 V)
❖ Normally, consuming 15 W
★ Compute
❖ Minimal power, Pmin
❖ Maximum power, Pmax

[Figure: Voltage (V, 0.0 to 2.0) and Power (W, 0 to 50) versus Frequency (GHz, 2.0 to 4.0). Find K: 50 = K x 4 x 1.2 x 1.2, K = 8.7 Ω^−1 GHz^−1]
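The exercise for the second processor can be worked through with the same P = K x F x V^2 relation: fit K at the normal operating point, then evaluate the extremes. The sketch below assumes that minimum power means lowest frequency together with lowest voltage, and maximum power means highest frequency together with highest voltage; that pairing, like the function names, is an assumption made for illustration.

```python
def scaling_constant(p_watt, f_ghz, v_volt):
    """K in P = K * F * V^2, fitted from one known operating point."""
    return p_watt / (f_ghz * v_volt ** 2)

def power(k, f_ghz, v_volt):
    """Power in watts for the given frequency (GHz) and voltage (V)."""
    return k * f_ghz * v_volt ** 2

# Normal operating point from the slide: 15 W at 3.8 GHz and 1.1 V
k = scaling_constant(15, 3.8, 1.1)
p_min = power(k, 3.2, 0.9)   # assumed: lowest F with lowest V
p_max = power(k, 4.4, 1.3)   # assumed: highest F with highest V
print(f"K = {k:.2f}, Pmin = {p_min:.1f} W, Pmax = {p_max:.1f} W")
```

The first processor on the slide follows the same pattern: `power(scaling_constant(37, 3.6, 1.4), 3.0, 1.1)` gives the 19 W figure after rounding.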


Buck Converters
★ Using a voltage divider or zener diode to step down voltage is neither efficient nor flexible
★ A buck converter is an efficient and flexible way to get stable voltage with the required value
★ The sawtooth is the choke current during HS/LS FETs activation (mutually opposite)
★ VOUT = DC x VIN, where DC is the duty cycle. If VIN = 5 V, then:
❖ 25% duty cycle yields VOUT = 1.25 V
❖ 50% duty cycle yields VOUT = 2.5 V
★ 4 A at 2.5 V means 2 A at 5 V (100% efficiency)
★ 4 A at 2.5 V means 2.5 A at 5 V (80% efficiency)

[Figure: buck converter with drivers, HS FET and LS FET; pulse P, VIN and VOUT waveforms versus time]
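The duty-cycle and efficiency figures quoted above come from two relations: VOUT = DC x VIN for an ideal buck stage, and input power = output power / efficiency. A small sketch, with hypothetical helper names, makes the arithmetic explicit.

```python
def buck_output(v_in, duty_cycle):
    """Ideal buck converter output voltage for a duty cycle between 0 and 1."""
    return v_in * duty_cycle

def input_current(v_in, v_out, i_out, efficiency=1.0):
    """Input current needed to deliver i_out at v_out, given the converter efficiency."""
    return (v_out * i_out) / (efficiency * v_in)

print(buck_output(5.0, 0.25), buck_output(5.0, 0.50))
print(input_current(5.0, 2.5, 4.0), input_current(5.0, 2.5, 4.0, efficiency=0.8))
```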


Buck Converter - Multi Phase
★ While output voltage is controlled by the duty cycle, more phases increase current capacity
★ A choke and two control switches, HS FET & LS FET, constitute a phase
★ Pulse controller generates 100 KHz to 10 MHz overlapping or non-overlapping pulses
★ Higher frequency switching produces smoother output but causes electromagnetic interference that requires design care

[Figure: pulse generator driving Phase #1, Phase #2 and Phase #3 from VIN to a common VOUT; pulses P1, P2, P3 versus time; 3-phase non-overlapping with 20% duty cycle, 1 V output using 5 V input]


Microprocessor Systems
Software Model

Dr. Taisir Eldos

“It is soft; you can swallow it” - Terry Eldos


Introduction
★ How to tell a machine to do something? A task ….
❖ Binary, sequence of bytes or words; hard to write, read, understand, comment, etc.
❖ Hex, direct mapping of every nibble Binary to Hex
❖ Symbolic, use mnemonics to represent instructions
❖ Assembly, symbolic language with directives (pseudo-instructions)
★ Assembly enhances reuse and code readability by using labels, comments, etc.
★ Colors indicate matching code in all the levels; black stuff is pseudo; no translation

Line  Label  Code         Comment               Address  Binary     Hex
1     code   org $1200    Place Code at $1200
2            ld a, init   Initialize Counter    $1200    0011 1110  $3E
                                                $1201    0100 0111  $47
3     loop   nop          Delay                 $1202    0000 0000  $00
4            dec a        Decrement Counter     $1203    0011 1101  $3D
5            jp nz, loop  Exit Counter on Zero  $1204    1100 0010  $C2
                                                $1205    0000 0010  $02
                                                $1206    0001 0010  $12
6     init   equ $47      Initial Count
7            end code

The Assembler translates the Code column into the Address, Binary and Hex columns.

Intel 8080 Assembly Code


Memory Organization
★ Memory can be viewed as a matrix; rows of data items of specific width
★ The width in bits times the number of rows is the capacity
★ Wider implies fewer addresses; 32 bits of information can be organized as:
❖ 2 x 16b, Word wide; 1 Address bit
❖ 4 x 8b, Byte wide; 2 Address bits
❖ 8 x 4b, Nibble wide; 3 Address bits
❖ 16 x 2b, Crumb wide; 4 Address bits
❖ 32 x 1b, Bit wide; 5 Address bits

[Figure: the same 32 bits drawn as one long word (LW0), 2 words (W0, W1), 4 bytes (B0 to B3), 8 nibbles (N0 to N7), 16 crumbs (C0 to C15) and 32 bits (B 0 to B 31)]


Memory Organization & Indexing
★ Memory organized as Byte Wide, Word Wide, Long Word Wide, Very Long Word Wide
★ Byte indexing allows addressing individual bytes but needs extra bits to select a byte in a word
★ A 256 GB memory, formatted as 64 G x 32 b requires
❖ Log2 (64 G) = 6 + 30 = 36 address bits to select a word, and
❖ Log2 (32 / 8) = 2 address bits to select a byte within a word
❖ A total of 36 + 2 = 38 address bits; which gives 2 ^ 38 = 256 GB

[Figure: byte indexing for Byte (B), Word (W), Long Word (LW) and Very Long Word (VLW) wide memories; the upper address bits select the row and the lowest 0, 1, 2 or 3 address bits select the byte within it, e.g. M(10011) = $48]
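The 256 GB example splits the address into a word-select field and a byte-select field. The sketch below repeats that calculation for any capacity and word width, and also splits a byte address into its (row, byte) parts; the function names are hypothetical and power-of-two sizes are assumed.

```python
def address_fields(capacity_bytes, word_bits):
    """Return (word-select bits, byte-select bits) for a byte-indexed memory."""
    words = capacity_bytes * 8 // word_bits
    word_select = (words - 1).bit_length()
    byte_select = (word_bits // 8 - 1).bit_length()
    return word_select, byte_select

def split_address(address, word_bits):
    """Split a byte address into (row number, byte offset within the row)."""
    bytes_per_word = word_bits // 8
    return address // bytes_per_word, address % bytes_per_word

GB = 2**30
word_sel, byte_sel = address_fields(256 * GB, 32)
print(word_sel, byte_sel, word_sel + byte_sel)
print(split_address(0b10011, 32))   # row 0b100, byte 0b11
```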


Endianness (Data Mapping)
★ Processor Endianness is how it maps multiple byte register content to byte indexed memory
★ A processor supports either Little Endian (LE) or Big Endian (BE) at any one time
❖ LE: Lower Data maps to Lower Address
❖ BE: Higher Data maps to Lower Address
★ When a 4-byte data is copied to address $1002, it has to be spread over $1002, 3, 4 & 5
❖ LE, B0 goes to $1002
❖ BE, B0 goes to $1005

CPU register: B3 B2 B1 B0

Address  Little Endian  Big Endian
$1002    B0             B3
$1003    B1             B2
$1004    B2             B1
$1005    B3             B0

Little Endian: Low Address ↔ Low Data. Big Endian: Low Address ↔ High Data
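The two mappings can be reproduced with Python's built-in `int.to_bytes`, which accepts the byte order as `"little"` or `"big"`. In this sketch the register value 0xB3B2B1B0 is chosen so that each byte is named after its position on the slide; the helper `store_long` is hypothetical.

```python
def store_long(value, address, byteorder):
    """Return {address: byte} for a 32-bit value stored with the given byte order."""
    data = value.to_bytes(4, byteorder)
    return {address + offset: byte for offset, byte in enumerate(data)}

register = 0xB3B2B1B0   # B3 is the most significant byte, B0 the least
little = store_long(register, 0x1002, "little")
big = store_long(register, 0x1002, "big")

for addr in range(0x1002, 0x1006):
    print(f"${addr:04X}  LE={little[addr]:02X}  BE={big[addr]:02X}")
```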


Endianness & Alignment
★ If registers are multi-byte and memory is byte indexed then there has to be a mapping:
❖ Little Endian (LE), maps lower order data item to lower order memory address
❖ Big Endian (BE), maps lower order data item to higher order memory address
★ The Endian term also applies to:
❖ File formats; when an application stores multi-byte or multi-word data items, and this is why we have standards
❖ Networking & Serial Transmission; least or most significant bit transmitted first in time
★ Examples
❖ LE: ARM & Intel x86 architectures
❖ BE: Motorola 68K & Sun SPARC architectures
❖ LE/BE: MIPS architecture, supports both using different implementations
❖ LE & BE: PowerPC 601 architecture is switchable, one implementation supports both
★ Endianness switching can be done in one of two ways
❖ Either by software, during operation
❖ At start up, using some motherboard setting jumper
★ Alignment is about allowing or disallowing multi-byte data at odd addresses; data fragmentation


Memory Alignment
★ When accessing multiple bytes at byte indexed memory, we need to consider fragmentation.
★ Regarding this, processors are either:
❖ Aligned: does not allow fragmentation; fast but may waste memory (Temporal advantage)
❖ Unaligned: allows fragmentation; compact but may be slower (Spatial advantage)
★ Example
❖ Define the following constants, at Address $1000 (Little Endian): $0123, $45, $6789, $A4, $79, $2483

Each row shows the bytes at Address+1 and Address; a dash indicates a skipped location.

Address  Unaligned  Aligned
1000     01 23      01 23
1002     89 45      −  45
1004     A4 67      67 89
1006     83 79      79 A4
1008        24      24 83

★ Skipped locations are just left alone, and can still be accessed by their addresses
★ Fragmented words require two transactions to access
★ Grouping; words then bytes or bytes then words resolves the spatial issue


Assembler Directives: EQU & ORG
★ Assembler directives or pseudo-instructions are meant to direct the Assembler as to how to assemble the code, like:
❖ EQU, Equate: binds a name to a value
❖ ORG, Origin, location counter: where to place Code and Data in memory
❖ END, Indicates the end of program
★ Consider this piece of code, assuming Big Endian processor

Many  EQU     32
More  EQU     $48
Code  ORG     $001006
      MOVE.B  #Many, D1   ; $123C, $0020
      MOVEQ   #More, D5   ; $7A48

Address  Content
1006     12 3C
1008     00 20
100A     7A 48

★ Instruction MOVE.B #$20, D1 in binary coding is translated to $123C0020; the byte literal takes a full word as the processor is aligned
❖ Instruction word $123C is for the instruction opcode, size, addressing modes
❖ Instruction word $0020 is for the source literal (Many)
★ Instruction MOVEQ #$48, D5 is encoded as $7A48


Assembler Directives - DC & DS
★ Another two important directives are
❖ DC, Define Constant, allocates memory and loads with data at compile time
❖ DS, Define Storage, allocates memory to be used at run time
★ Consider the piece of code (For a Big Endian Processor)

         ORG   $1002
BArray   DS.B  3           ; Allocate 3 bytes
WArray   DS.W  2           ; Allocate 2 words
BData    DC.B  12          ; Allocate a byte and write $0C
         DC.W  12          ; Allocate a word and write $000C
Message  DC.B  “ABC 123”   ; Allocate bytes for ASCII string and write
                           ; $41 for “A”, …, $20 for “ ”, … and $33 for “3”
Marks    ORG   $1018
         DC.L  23, $23     ; Reserve & Write $00000017 & $00000023

Address  Content   (− is allocated storage, X is skipped for alignment)
1000
1002     −  −
1004     −  X
1006     −  −
1008     −  −
100A     0C X
100C     00 0C
100E     41 42
1010     43 20
1012     31 32
1014     33
1016
1018     00 00
101A     00 17
101C     00 00
101E     00 23

★ Labels are used as friendly alternatives to addresses
★ To access the string “ABC 123”, we use Message as pointer


Memory Alignment - Waste
★ The DC directive allocates memory and writes data
★ To reduce memory waste, re-arrange by packing bytes
★ Assemblers can re-arrange (or the programmers)

Original order          Re-arranged (words first)
     ORG  $1000              ORG  $1000
B1   DC.B $13           W1   DC.W $1234
W1   DC.W $1234         W2   DC.W $5678
B2   DC.B $24           W3   DC.W $A1B2
W2   DC.W $5678         W4   DC.W $C3D4
B3   DC.B $35           B1   DC.B $13
W3   DC.W $A1B2         B2   DC.B $24
B4   DC.B $46           B3   DC.B $35
W4   DC.W $C3D4         B4   DC.B $46
B5   DC.B $AB           B5   DC.B $AB

Address  Original  Words/Bytes  Bytes/Words
1000     13 X      12 34        13 24
1002     12 34     56 78        35 46
1004     24 X      A1 B2        AB X
1006     56 78     C3 D4        12 34
1008     35 X      13 24        56 78
100A     A1 B2     35 46        A1 B2
100C     46 X      AB           C3 D4
100E     C3 D4
1010     AB

★ B2 ends up in 3 different places; $1004, $1009, $1001, same data
★ Locations marked by X are skipped for alignment
❖ Xs have no labels, but accessible at run time
❖ Re-arranging, Words/Bytes or Bytes/Words, eliminates or reduces waste
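The three memory maps above can be regenerated by walking a location counter through the list of constants and bumping it to the next even address before every word. The sketch below does that under the assumption that only word-sized items need even addresses; `layout` and the item lists are hypothetical names used for illustration.

```python
def layout(items, start=0x1000):
    """Place (label, size) items in order; words are aligned to even addresses.

    Returns (addresses by label, bytes skipped for alignment, total span in bytes).
    """
    address, placed, skipped = start, {}, 0
    for label, size in items:
        if size > 1 and address % 2:   # a word may not start at an odd address
            address += 1
            skipped += 1
        placed[label] = address
        address += size
    return placed, skipped, address - start

byte_items = [(f"B{i}", 1) for i in range(1, 6)]
word_items = [(f"W{i}", 2) for i in range(1, 5)]
mixed = [item for pair in zip(byte_items, word_items) for item in pair] + [byte_items[-1]]

for name, order in (("original", mixed),
                    ("words/bytes", word_items + byte_items),
                    ("bytes/words", byte_items + word_items)):
    placed, skipped, span = layout(order)
    print(f"{name:12} B2 at ${placed['B2']:04X}, skipped {skipped}, span {span} bytes")
```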


MC68000 Programmer’s Model
★ General Purpose Registers (Light Gray)
❖ Data Registers
✦ 8 x 32-bit registers, D0, D1, …, D7
✦ L, W, B segmentation
❖ Address Registers
✦ 7 x 32-bit registers, named A0, A1, …, A6
✦ L, W segmentation
★ Special Purpose Registers (Dark Gray)
❖ Stack Pointers, no segmentation
✦ 32-bit, User Stack Pointer (A7, USP)
✦ 32-bit, Supervisor Stack Pointer (A7’, SSP)
❖ Program Counter (PC), no segmentation
✦ 32-bit, also called Instruction Pointer (IP)
✦ 24-bit address bus (MSB not used)
❖ 16-bit Status Register (SR)
✦ 8-bit System Byte (SB)
✦ 8-bit Condition Code Register (CCR)

[Figure: register set D0 to D7, A0 to A6, A7 (USP), A7’ (SSP), PC and SR]

SR bits 15 to 0: T − S − − I2 I1 I0 (SB) | − − − X N Z V C (CCR)


CCR or Flags
★ Consider the 4-bit Binary Adder below, to understand what the flags are about
★ The XOR acts like a MUX, and the control input A’/S dictates the operation:
❖ A’/S = 0, XORs pass B along with carry of Cin = 0 to add, hence F = A + B
❖ A’/S = 1, XORs pass B’ along with carry of Cin = 1 to subtract, hence F = A – B
★ Spilling means the outcome does not fit the size, and is expressed through C (assuming inputs are unsigned numbers) & V (assuming inputs are signed numbers)

[Figure: 4-bit adder/subtractor built from four full adders (FA); XOR gates on inputs B3 to B0 controlled by A’/S, which also feeds Cin; inputs A3 to A0, outputs F3 to F0 and flags C, V, N, Z]
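The adder/subtractor can be modelled bit-accurately to see how the four flags arise. In this sketch C is taken as the raw carry out of the last full adder, as drawn in the circuit; a real processor may report it as a borrow after subtraction, so treat that detail, and the function name `add_sub`, as assumptions of this illustration.

```python
def add_sub(a, b, subtract=False, bits=4):
    """Model of the XOR-controlled adder: returns (result, flags) for F = A + B or A - B."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    b_in = ((b ^ mask) if subtract else b) & mask   # XOR gates pass B or B'
    carry_in = 1 if subtract else 0                 # A'/S also drives Cin
    total = (a & mask) + b_in + carry_in
    result = total & mask
    flags = {
        "C": total >> bits,                                      # carry out of the last FA
        "V": int(bool((a ^ result) & (b_in ^ result) & sign)),   # signed overflow
        "N": int(bool(result & sign)),                           # sign bit of the result
        "Z": int(result == 0),                                   # result is zero
    }
    return result, flags

print(add_sub(0b0111, 0b0001))                  # 7 + 1 overflows the signed range
print(add_sub(0b1111, 0b0001))                  # 15 + 1 overflows the unsigned range
print(add_sub(0b0101, 0b0011, subtract=True))   # 5 - 3
```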


Microprocessor Systems
Addressing Modes

Dr. Taisir Eldos


Addressing Modes - Symbols and Notation
★ Addressing modes are the methods by which the instructions access their operands
★ We use Register Transfer Language (RTL) to describe operations at the hardware level
★ Assume that:
❖ <s> is the source operand, and
❖ <d> is the destination operand
★ Then, the ADD & MOVE instructions of the processor are described as in the comment section

    ADD     <s>, <d>      ; d ← d + s    add s to d and store into d
    MOVE    <s>, <d>      ; d ← s        store copy of s into d

★ Here the source operand comes first; some Assemblers use destination first
★ Data Types
❖ $ means Hexadecimal
❖ @ means Octal
❖ % means Binary
❖ ‘ …’ means ASCII
❖ No prefix means Decimal
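
As a quick illustration of the prefixes listed above, the following sketch converts a literal written in that notation into an integer. The function name `parse_literal` is hypothetical, plain straight quotes are assumed for ASCII literals, and packing the first character into the most significant byte is an assumption that matches a big-endian layout.

```python
def parse_literal(text):
    """Convert an assembler-style literal to an integer using its prefix."""
    text = text.strip()
    if text.startswith("$"):
        return int(text[1:], 16)       # hexadecimal
    if text.startswith("@"):
        return int(text[1:], 8)        # octal
    if text.startswith("%"):
        return int(text[1:], 2)        # binary
    if len(text) >= 2 and text[0] == "'" and text[-1] == "'":
        value = 0
        for ch in text[1:-1]:
            value = (value << 8) | ord(ch)   # first character ends up most significant
        return value
    return int(text, 10)               # no prefix: decimal
```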



Addressing Modes …
No.  Addressing Mode  Description
1    Literal          Immediate number
2    Absolute.W       Direct or Absolute Short (Word Address, to sign extend)
3    Absolute.L       Direct or Absolute Long (Longword Address, full address)
4    Di               Data Register Direct
5    Ai               Address Register Direct
6    (Ai)             Address Register Indirect
7    (Ai)+            Address Register Indirect with Post-increment
8    −(Ai)            Address Register Indirect with Pre-decrement
9    (d16, Ai)        Address Register Indirect with Displacement
10   (d8, Ai, Xj)     Address Register Indirect with Displacement & Index (X is D or A)
11   (d16, PC)        Program Counter Relative with Displacement
12   (d8, PC, Xj)     Program Counter Relative with Displacement & Index (X is D or A)
13   Embedded         3-bit / 8-bit immediate number encoded within the instruction
14   Implied          SR, CCR, USP, PC



Immediate or Literal
★ In this mode, the actual operand follows the instruction
★ Allows a constant to be set up when the program is written
★ The # is used to tell the Assembler “it’s immediate”
★ Typical application: setting up control loops and delay counters
★ Example

    MOVE.B  #$83, D3      ; D3(7:0) ← $83
    MOVE.W  #$83, D3      ; D3(15:0) ← $0083
    MOVE.L  #$83, D3      ; D3(31:0) ← $00000083
    MOVE.L  #$1A483, D3   ; D3(31:0) ← $0001A483

    MOVE.B  #$100, D3     ; Syntax error
                          ; Immediate value $100 exceeds the byte capacity
                          ; Must be in the range 0 to 255 or -128 to +127
    MOVE.B  D3, #$83      ; Syntax error
                          ; Immediate addressing mode makes no sense as destination
                          ; The destination must be alterable; register or memory



Absolute or Direct
★ Instruction contains the operand’s address, not its value
★ Long, 32-bit address; with the 24-bit address bus it accesses 16 Mbytes
★ Short, 16-bit signed, sign extended internally
❖ Sign = 0, upper word is 0s; range is $000000 − $007FFF (Lowest 32KB block)
❖ Sign = 1, upper word is 1s; range is $FF8000 − $FFFFFF (Highest 32KB block)
★ If sign extending a word address changes its value, then it has to go long
★ Short takes less space and time, so it is better when the address fits; Assemblers decide

[Figure: memory map drawn as 32KB blocks, from 000000 − 007FFF at the bottom (then 008000 − 00FFFF, 010000 − 017FFF, 018000 − 01FFFF, …, FE0000 − FE7FFF, FE8000 − FEFFFF, FF0000 − FF7FFF) up to FF8000 − FFFFFF at the top.]

    MOVE.L  D3, $17004    ; M($017004) ← D3(31:16); M($017006) ← D3(15:0)
                          ; Two transactions, High order data first (BE), Long Abs
    MOVE.W  D3, $7234     ; M($007234) ← D3(15:0)
                          ; Short fits because SE($7234) = $007234
    MOVE.W  D3, $8234     ; M($008234) ← D3(15:0)
    MOVE.W  D3, $8234.w   ; M($FF8234) ← D3(15:0)
                          ; Sign Extending $8234 yields $FF8234
                          ; So, if the address is $008234 it has to go long
                          ; Otherwise it will be considered $FF8234
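
The short-versus-long decision above comes down to one check: does the address survive sign extension of its low word? The sketch below illustrates that rule; the helper names `sign_extend_16` and `fits_absolute_short` are hypothetical and not taken from any assembler.

```python
def sign_extend_16(word):
    """Sign extend a 16-bit word to 32 bits."""
    word &= 0xFFFF
    return (word | 0xFFFF0000) if (word & 0x8000) else word

def fits_absolute_short(address):
    """True when a 24-bit address is unchanged by sign extending its low word."""
    address &= 0xFFFFFF                 # only 24 address bits reach the bus
    return (sign_extend_16(address) & 0xFFFFFF) == address
```

With these definitions, `fits_absolute_short(0x7234)` and `fits_absolute_short(0xFF8234)` hold, while `0x8234` and `0x17004` need the long form.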



Register Direct
★ Does not involve a memory address, hence it is fast
★ Effective address of operand is the register name
★ The MC68K data path is 32 bits. Hence register-register transfer takes 4 clock cycles regardless
of the size; it is done as a single micro-operation
★ Examples

    MOVE.L  D0, D1        ; D1(31:0) ← D0(31:0)
    MOVE.W  D0, D1        ; D1(15:0) ← D0(15:0)
    MOVE.B  D0, D1        ; D1(7:0) ← D0(7:0)

★ Direct Address Register is not allowed as destination of MOVE
★ A dedicated instruction called MOVEA is used (Assembly restriction, not processor OpCode)

    MOVEA.L D0, A0        ; A0(31:0) ← D0(31:0)
    MOVEA.W D0, A0        ; A0(31:0) ← SE(D0(15:0)), word source is sign extended to 32 bits
    MOVEA.W A1, A0        ; A0(31:0) ← SE(A1(15:0))
    MOVEA.B D0, A0        ; Syntax error, Address registers can not be byte sized
    MOVEA.W D0, D1        ; Syntax error, Only address registers allowed for MOVEA



Address Register Indirect
★ Specified by enclosing the address register in parentheses
★ Fast, address is in the CPU and can be dynamically changed


★ Application: arrays, records, linked lists, etc.
★ Processor state is usually in hexadecimal even without the prefix $
★ Examples, Big Endian processor

    A1 = $1000
    A5 = $1002
    A6 = $1008
    D4 = $31295730

    MOVE.W  (A1), D3      ; D3(15:0) ← M(A1)
                          ; D3(15:0) = $1234 & D3(31:16) unchanged
    MOVE.W  D4, (A5)      ; M(A5) ← D4(15:0)
    MOVE.L  D4, (A6)      ; M(A6) ← D4(31:0)
                          ; Ok to write it this way for the sake of learning, but in the implementation …
                          ; M(A6) ← D4(31:16) then M(A6+2) ← D4(15:0)

    Memory (hexadecimal)
    Address   Contents
    1000      12 34
    1002      57 30
    1004
    1006
    1008      31 29
    100A      57 30
    100C
    100E
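
The three transfers above can be traced with a small byte-addressed, big-endian memory model. This is only a sketch for following the example by hand: the helpers `read` and `write` are hypothetical, and the starting value of D3 is an arbitrary assumption chosen to show that its upper word is preserved.

```python
memory = {0x1000: 0x12, 0x1001: 0x34}    # byte address -> byte value

def read(addr, size):
    """Read `size` bytes starting at addr, most significant byte first."""
    value = 0
    for i in range(size):
        value = (value << 8) | memory.get(addr + i, 0)
    return value

def write(addr, value, size):
    """Write `size` bytes starting at addr, most significant byte first."""
    for i in range(size):
        shift = 8 * (size - 1 - i)
        memory[addr + i] = (value >> shift) & 0xFF

A1, A5, A6 = 0x1000, 0x1002, 0x1008
D3, D4 = 0xAABBCCDD, 0x31295730          # D3's initial value is an assumption

D3 = (D3 & 0xFFFF0000) | read(A1, 2)     # MOVE.W (A1), D3
write(A5, D4 & 0xFFFF, 2)                # MOVE.W D4, (A5)
write(A6, D4, 4)                         # MOVE.L D4, (A6)
```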



Address Register Indirect with Post-increment
★ Auto adjustment provides faster access to structured data items; tables, arrays, etc.
★ Increment by 1 for .B, 2 for .W and 4 for .L instructions, hence less time and space


★ Exception is A7 (USP) and A7’ (SSP), where 2 is used for .B to preserve alignment
★ Note that RTL uses one statement for .L sized memory accesses, but in fact it is done in
two cycles because it’s a word sized data bus

    MOVE.L  (A0)+, D3     ; D3(31:0) ← M(A0); A0 ← A0 + 4
    MOVE.W  (A0)+, D3     ; D3(15:0) ← M(A0); A0 ← A0 + 2
    MOVE.B  (A0)+, D3     ; D3(7:0) ← M(A0); A0 ← A0 + 1

    MOVE.W  D3, (A0)+     ; M(A0) ← D3(15:0); A0 ← A0 + 2

    MOVE.L  (A7)+, D4     ; D4(31:0) ← M(A7); A7 ← A7 + 4
    MOVE.W  (A7)+, D4     ; D4(15:0) ← M(A7); A7 ← A7 + 2
    MOVE.B  (A7)+, D4     ; D4(7:0) ← M(A7); A7 ← A7 + 2
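
The increment rule, including the A7 exception, can be summarized in a few lines. The following is a minimal sketch; the names `step` and `post_increment` and the dictionary used as a register file are illustrative assumptions.

```python
SIZES = {"B": 1, "W": 2, "L": 4}

def step(size, reg):
    """Number of bytes by which the address register is adjusted."""
    if reg == 7 and size == "B":
        return 2                  # the stack pointer stays word aligned
    return SIZES[size]

def post_increment(regs, reg, size):
    """Return the address to access, then advance the register (use, then increment)."""
    addr = regs[reg]
    regs[reg] = (addr + step(size, reg)) & 0xFFFFFFFF
    return addr

regs = {0: 0x2000, 7: 0x6214}     # A0 and A7, example values only
```

Calling `post_increment(regs, 0, "L")` returns $2000 and leaves A0 at $2004, while a byte access through A7 advances it by 2 rather than 1.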

Address Register Indirect with Pre-decrement
★ Auto adjustment, increment or decrement; faster access to structured data items; tables,
arrays, etc.


★ Decrement by 1 for .B, 2 for .W and 4 for .L instructions, hence less time and space
★ Exception is A7 (USP) and A7’ (SSP), where 2 is used for .B to preserve alignment
★ Applications include accessing data structures

    MOVE.L  –(A0), D3     ; A0 ← A0 – 4; D3(31:0) ← M(A0)
    MOVE.W  –(A0), D4     ; A0 ← A0 – 2; D4(15:0) ← M(A0)
    MOVE.B  –(A0), D5     ; A0 ← A0 – 1; D5(7:0) ← M(A0)

    MOVE.W  D4, –(A0)     ; A0 ← A0 – 2; M(A0) ← D4(15:0)

    MOVE.B  D4, –(A7)     ; A7 ← A7 – 2; M(A7) ← D4(7:0)

★ Latency hiding: which of the two modes, –(Ai) and (Ai)+, is faster?
❖ As source, (Ai)+ is faster as we use then increment, but –(Ai) has to decrement first and
has to wait 2 clock cycles before use
❖ As destination, the pre-decrement latency is also hidden, so they are just as fast



Address Register Indirect with Displacement
★ Effective address computed by adding the content of address register to a signed 16-bit word,
d16, which is encoded as part of the instruction


★ Effective Address <ea> is the sum of address register content plus displacement
★ Applications include accessing data structures with records and fields

    MOVE.L  (12, A1), D3      ; D3 ← M(ea) where ea = A1 + $C
    MOVE.W  (– 6, A2), D0     ; D0(15:0) ← M(ea) where ea = A2 – $6
    MOVE.B  D1, ($24, A3)     ; M(ea) ← D1(7:0) where ea = A3 + $24

★ Some Assemblers require the displacement written before the parenthesis; like MOVE.L
12(A1), D3 as opposed to MOVE.L (12, A1), D3
★ Example
❖ If in the above instructions A1 = $123400 and A2 = $123468, then
❖ The effective address of the source in the first instruction is
ea = $00123400 + $0000000C = $0012340C
❖ The effective address of the source in the second instruction is
ea = $00123468 + $FFFFFFFA = $00123462 ($FFFFFFFA is – 6)
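
The two effective addresses worked out above can be checked with a short calculation. This sketch assumes 32-bit wraparound arithmetic; the helper names `sign_extend` and `ea_displacement` are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a signed two's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value

def ea_displacement(an, d16):
    """Effective address for (d16, An): An plus the sign extended 16-bit displacement."""
    return (an + sign_extend(d16, 16)) & 0xFFFFFFFF
```

Here `ea_displacement(0x123400, 12)` gives $0012340C, and passing the encoded word $FFFA (that is, – 6) with A2 = $123468 gives $00123462.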



Address Register Indirect with Displacement & Index
★ Effective address is the sum of three components: the address register, the longword or sign
extended lower order word of the index register, and the offset or displacement, which is
8-bit signed, or d8
★ Most complex addressing mode
★ Good for structures, like the element in row r column c in matrix m

    MOVE.L  (6, A1, D0.W), D3     ; D3 ← M(ea) where ea = A1 + SE(D0(15:0)) + $6
    MOVE.L  D4, ($24, A2, A5)     ; M(ea) ← D4 where ea = A2 + A5 + $24

★ SE means Sign Extend Xj if needed, then compute: ea = d8 + [Ai] + [Xj]
★ Index register is Xj, but only .L or .W is allowed, and the offset is d8
★ Example
❖ For the first instruction, assume: A1 = $1234A6, D0 = $12348812, then
❖ The effective address ea = $1234A6 + $FFFF8812 + $6 = $0011BCBE
❖ Why is the effective address lower than A1? Because the index is negative
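
The worked example above can be reproduced with the sketch below, which adds the address register, the index (sign extended when it is word sized) and the sign extended d8. The function names and the `index_size` parameter are illustrative assumptions.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a signed two's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value

def ea_indexed(an, xj, d8, index_size="W"):
    """Effective address for (d8, An, Xj.W) or (d8, An, Xj.L)."""
    index = sign_extend(xj, 16) if index_size == "W" else sign_extend(xj, 32)
    return (an + index + sign_extend(d8, 8)) & 0xFFFFFFFF
```

With A1 = $1234A6 and D0 = $12348812, `ea_indexed(0x1234A6, 0x12348812, 6, "W")` yields $0011BCBE because only the low word $8812 is used, and it is negative once sign extended.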



Program Counter Relative
★ Special case of register indirect, where PC is used instead of Address registers
❖ Displacement; ea = PC + d16


❖ Displacement & Index: ea = PC + Xj + d8
★ OpCode extension is a word that is d16, or d8 and 5 bits encoding X (D or A), j and W/L

    CPY   MOVE.W  (MSG, PC), D1   ; Copies M(MSG) = $4131 to D1(15:0)
          .                       ; MSG is d16 representing the distance to MSG label
          .                       ; from the updated value of the PC which is CPY + 2
    MSG   DC.B    “A1”

★ Actual distance is encoded to be added to the PC in execution
★ Useful in making relocatable code, i.e. Position Independent Code (PIC) to reside anywhere in
memory
★ Example:
❖ Assume: MOVE.W instruction is at address $1000 & MSG at address $1008
❖ Then the displacement MSG to be encoded is $1008 - $1002 = $6
❖ Then, the instruction encoding is $323A $0006
❖ When it executes: Source ea = $1002 + $6 = $1008
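
The displacement calculation above, and the reason the code is relocatable, can be illustrated as follows. This is a sketch of what an assembler might do, not a description of any particular tool: the function names are hypothetical, and the opcode word $323A is taken from the example above.

```python
def encode_pc_relative(instr_addr, target_addr, opcode=0x323A):
    """Return the opcode word and d16 extension word for (label, PC)."""
    pc = instr_addr + 2                    # PC already points at the extension word
    disp = target_addr - pc
    if not -0x8000 <= disp <= 0x7FFF:
        raise ValueError("target is out of d16 range")
    return [opcode, disp & 0xFFFF]

def pc_relative_ea(instr_addr, d16):
    """Effective address computed at run time from the encoded displacement."""
    disp = d16 - 0x10000 if d16 & 0x8000 else d16
    return (instr_addr + 2 + disp) & 0xFFFFFFFF
```

Encoding the instruction at $1000 with MSG at $1008 gives the words $323A $0006. Moving both to $5000 and $5008 produces exactly the same words, which is what makes the code position independent.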



Stack Operations
★ Stacks are memory sections accessible using dedicated pointers
★ In MC68000, there are two; Supervisor Stack & User Stack

    MOVE.W  D0, –(SP)     ; SP ← SP – 2, M(SP) ← D0(15:0)
    MOVE.W  (SP)+, D0     ; D0(15:0) ← M(SP), SP ← SP + 2

★ Assume the following and run the code segment below
❖ SSP = $8428 & USP = $6214 & S = 1
❖ D1 = $12345678, D2 = $23456789 & D3 = $1A3B5

    MOVE.W  D1, –(SP)     ; [1] SP ← SP – 2, M(SP) ← D1(15:0)
    MOVE.L  D2, –(SP)     ; [2] SP ← SP – 4, M(SP) ← D2(31:16)
                          ; [3] M(SP+2) ← D2(15:0)
    MOVE    #$00, SR      ; Switch State, S = 0
    MOVE.L  D1, –(SP)     ; [4] SP ← SP – 4, M(SP) ← D1(31:16)
                          ; [5] M(SP+2) ← D1(15:0)
    MOVE.W  D2, –(SP)     ; [6] SP ← SP – 2, M(SP) ← D2(15:0)
    MOVE.W  D3, –(SP)     ; [7] SP ← SP – 2, M(SP) ← D3(15:0)

    User stack (initial USP = 6214)
    Address   Contents   Step
    6208
    620A
    620C      A3 B5      7
    620E      67 89      6
    6210      12 34      4
    6212      56 78      5
    6214

    Supervisor stack (initial SSP = 8428)
    Address   Contents   Step
    841C
    841E
    8420
    8422      23 45      2
    8424      67 89      3
    8426      56 78      1
    8428
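
The push sequence above can be replayed with a small model in which SP resolves to SSP or USP depending on the S bit. This is a sketch for checking the memory contents by hand; the class `Cpu` and its method names are hypothetical, and memory is modeled as a dictionary of 16-bit words.

```python
class Cpu:
    def __init__(self, ssp, usp, supervisor):
        self.ssp, self.usp, self.s = ssp, usp, supervisor
        self.mem = {}                       # even address -> 16-bit word

    @property
    def sp(self):
        return self.ssp if self.s else self.usp   # A7 is selected by the S bit

    @sp.setter
    def sp(self, value):
        if self.s:
            self.ssp = value
        else:
            self.usp = value

    def push_word(self, value):             # MOVE.W Dn, -(SP)
        self.sp -= 2
        self.mem[self.sp] = value & 0xFFFF

    def push_long(self, value):             # MOVE.L Dn, -(SP)
        self.sp -= 4
        self.mem[self.sp] = (value >> 16) & 0xFFFF   # high word at the lower address
        self.mem[self.sp + 2] = value & 0xFFFF

cpu = Cpu(ssp=0x8428, usp=0x6214, supervisor=1)
D1, D2, D3 = 0x12345678, 0x23456789, 0x1A3B5
cpu.push_word(D1)      # steps [1]
cpu.push_long(D2)      # steps [2], [3]
cpu.s = 0              # MOVE #$00, SR switches to the user stack
cpu.push_long(D1)      # steps [4], [5]
cpu.push_word(D2)      # step [6]
cpu.push_word(D3)      # step [7]
```

After the run, SSP rests at $8422 and USP at $620C, matching the two stack pictures.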
