Microprocessor Systems: Introduction & Historical Review
Draft Lectures by Dr. Taisir Eldos, Jordan University of Science and Technology
★ Artificial Intelligence (AI) touches everything today, and this requires sophisticated high-performance processors to process vast amounts of data, as in autonomous driving
★ Ubiquitous computing is about hardware & software engineering appearing everywhere in life
❖ Handheld computing devices
★ Telematics is a technology that combines informatics and telecommunications (Internet) for specific applications like fleet management, using data collected by devices like sensors
★ In modern societies, people deal with computing devices more than 70 times in a typical day
❖ Automatic Tellers
❖ Security Gates
❖ Automotives
❖ Phones
❖ Tickets
❖ Memory, where code is executed and data is stored; primary memory & secondary storage
❖ Input & Output; peripherals that interact with the outside world via an interface
❖ Connections; to connect all parts
✦ ±12 V, RS-232 Communication Links
✦ 0.7 V to 1.4 V for Central Processing Units
✦ 0.6 V to 1.0 V for Graphic Processing Units
★ Write down the specifications to achieve the workload
★ Search for the system components based on the target application
❖ CPU class, complexity, price & performance
❖ Form Factor, Power Supply, etc.
❖ Smaller system
❖ More reliability
❖ Less shipping cost
❖ Power, important in portable systems for battery life and in large systems for cost
★ Secondary Metrics
❖ Form factor: Size, Weight, etc.
❖ Reliability: Mean Time Between Failures (MTBF)
★ Design Steps:
❖ Project statement; Objectives, Deliverables, Constraints, Milestones, etc.
❖ Implementation tradeoffs
❖ Construction
❖ Testing
❖ Documentation
COLOSSUS 1943, UK
❖ 2,400 Vacuum Tubes in Mark #2
★ 9 KW of power consumption
★ No RAM at all! Purpose-specific design
★ Input / Output?
❖ Input was paper tape
❖ Output was indicator lamps
★ Started in 1943
★ Retired in 1960
Programmable Computer
★ The USA built a machine for ballistics, the Electronic Numerical Integrator And Computer (ENIAC), completed in 1945 and retired in 1955
★ Programmable by physical rewiring
★ University of Pennsylvania (Philadelphia)
★ Primitive machines; bulky, clumsy and slow
❖ 10 to 20 FLOPS (By software routines)
❖ Started with 16 and went 18, 40, 64, …, now 6096
★ How does the number of transistors affect performance?
★ How does a wider address bus affect performance?
★ How does higher clock frequency affect performance?
★ How does power consumption relate to performance?
★ Why do we have many VCC & GND inputs?
★ Specifications
❖ 1000’s of vacuum tubes and 1000’s of electromagnetic relays
❖ Weight of 30 Tons and 15 x 9 x 2.5 m (like 12 offices)
★ Computing Power
❖ Basic functions; thousands of decimal additions/second
★ Ease of use
❖ Hardwired program for ballistic tables computation, took 3 weeks to change task
❖ Clumsy input/output
★ Reliability
❖ Used to fail a few times a day, with only a few days without failure
★ Compared to electronic valves
❖ Size: 5 x 5 x 5 mm versus 15 x 15 x 40 mm (50+ times smaller)
❖ Terminal count: 3 versus 5 or 6 (about half as many)
❖ Frequency: 1 MHz versus 0.1 MHz (10 times better)
★ Today’s transistors are far better; much faster & lower power consumption
★ Historical review of Silicon
❖ Jons Berzelius discovered Silicon in 1823 in Sweden
❖ John Bardeen, Walter Brattain & William Shockley invented the Transistor at Bell Labs in 1947, and got the Nobel Prize in Physics in 1956
❖ Jack Kilby demonstrated the first Integrated Circuit in 1958, and Robert Noyce developed the first monolithic silicon one in 1959
❖ 1969: 1,000 Transistors, Large Scale Integration (LSI)
★ 1972 Intel 8008: 3500 transistors, 10 µm, 18-pin
❖ NMOS, 8-bit data, 14-bit address, 800 KHz, 48 instructions
★ The 4004 could do 60,000 Decimal Operations Per Second
★ But to have an idea about its Instructions Per Second (IPS) performance:
❖ Cycles Per Instruction (CPI) = 8
❖ Instructions Per Cycle (IPC) = 1 / CPI = 0.125
❖ Instructions Per Second (IPS) = IPC x F = 740,000 x 0.125 = 92,500 IPS = 92.5 KIPS
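The IPS arithmetic above can be sketched in a few lines of Python. The function name is illustrative only; the 740 KHz clock and CPI of 8 are the figures quoted for the 4004.

```python
def instructions_per_second(clock_hz: float, cpi: float) -> float:
    """IPS = F / CPI, which is the same as IPC x F with IPC = 1 / CPI."""
    return clock_hz / cpi


# Intel 4004 figures quoted above: 740 KHz clock, 8 cycles per instruction
ips_4004 = instructions_per_second(740_000, 8)
print(f"{ips_4004 / 1000:.1f} KIPS")
```

The same function applies to any processor once its clock frequency and average CPI are known.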
❖ 1980: 100 KTr, Ultra Large Scale Integration (ULSI)
❖ 2000: 10 MTr, VLSI for all as a generic name
❖ 2010: 1 BTr
❖ 2013: 10 BTr
❖ 2015: 20 BTr
❖ 2018: 40 BTr
❖ 2022: 60 BTr
❖ 2023: 90 BTr
★ Today, chips are made of multiple dies for higher yields
★ Process, node or fab, used to refer to the transistor dimension (10 µm was the beginning)
★ Process today refers to feature size (like channel length), which keeps getting smaller and smaller
❖ 3 nm process can pack around 300 MTr/mm2; transistor cell is around 60 x 60 nm2
❖ 6 nm diameter copper wires used are 10,000 times thinner than human hair (60 µm)
★ A chip takes 1000 to 2000 steps and 10 to 15 weeks to make; it may have 100 material layers, including 5 to 15 metal layers to route wires (> 100 Km, consuming 5% to 15% of the power)
★ A 300 mm raw wafer costs $200 to $400 while a processed one costs $1,000 to $20,000
★ More than 1 Trillion chips are made every year & more than half of them come from TSMC
[Figure: Sand & Silicon; Melting & Crystallization; Ingot, 400 KG; Wafers, 15 - 45 cm; Dies ready to cut; A single die; A die, well and pins; Chip Packaging]
Intel 80286, 1982          Intel i7 Octa Core, 2014
134 KTr, 7 mm x 7 mm       2.6 BTr, 18 mm x 20 mm
2.9 KTr / mm2              7.2 MTr / mm2
❖ Larger wafers have better yields, this led to using 45 cm wafers & multi-die chips
★ Largest monolithic die was 30 mm, and today’s most effective size is 10 mm to 15 mm
★ Apple’s M1 SoC, 5 nm & 120 mm2 die; a 30 cm wafer makes 450 dies, cost is $50 per chip
15 cm wafer & 30 mm die: 6 Good & 3 Defective, Silicon Utilization = 30%
20 cm wafer & 30 mm die: 17 Good & 7 Defective, Silicon Utilization = 50%
20 cm wafer & 20 mm die: 54 Good & 7 Defective, Silicon Utilization = 70%
❖ A transistor cell on the die, 60 nm x 60 nm, will look like a point made by a pen; 0.6 mm
★ With 7 mm x 7 mm die size, the yield of a 30 cm wafer is around 1,000 (nearly 200 defective)
★ If the wafer fits the equatorial line, the 3 nm feature is like 13 cm (bread slice size)
★ UV light sources are a few µm to 1 mm above the wafer; let us say 0.1 mm
❖ Jam
without making any mess …
[Figure: Wafer, Die; 300 Km x 300 Km]
❖ Butter
❖ Billions of Gates
★ Every chip has pins, leads, balls or contacts, to connect the die in the well with the outside
★ Chips sit on sockets, whose mechanical design may improve heat dissipation
★ AMD Genoa, LGA6096 package, has 6096 contacts to support 12 memory channels for 96 cores, more than 1 GB of L3 cache, 128 PCIe lanes, 3.7 GHz & 400 W.
★ Does this explain why 6096 contacts?
★ Today, balls have 36 µm pitch
[Figure: Dual In Line Package, Thin Small Outline Package, Plastic Leaded Chip Carrier, Surface Mount Technology, Pin Grid Array (PGA), Reduced Pin Grid Array (rPGA), Ball Grid Array (BGA), Land Grid Array (LGA)]
❖ Yield, the percentage of good chips in a wafer, gets lower and lower
★ The fabrication cost per mm2 depends on the process; doubled by moving 14 nm to 7 nm
★ Multi-die packages reduce cost, increase yield and provide for customizable modular design
❖ 3D Stacking, dies stacked vertically on top of each other; CPUs, GPUs, SDRAM, I/O, etc.
❖ Chiplet (Tiles), dies placed next to each other horizontally, connected via an interposer
❖ 3D Chiplet, imagine a complex 100 cm2 chip gets squeezed on 25 cm2 (4-story chip)
❖ Chiplet (4 Dies x 50 mm2 each) has a far better yield of 70%, more than 75% more
[Chart: yield (10% to 90%) versus die size (20 to 400), for 4-Chiplet and Monolithic designs]
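Why smaller dies yield better can be illustrated with a simple Poisson defect model. This is a minimal sketch under stated assumptions: the Poisson model and the defect density `D0` are hypothetical choices for illustration, and they are not meant to reproduce the exact percentages quoted above or shown in the chart.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Fraction of good dies under a simple Poisson defect model (an assumption)."""
    return math.exp(-die_area_mm2 * defects_per_mm2)


D0 = 0.005  # hypothetical defect density, in defects per mm2

monolithic = poisson_yield(200.0, D0)  # one 200 mm2 die
chiplet = poisson_yield(50.0, D0)      # each of four 50 mm2 dies
print(f"monolithic: {monolithic:.0%}, per chiplet: {chiplet:.0%}")
```

Under this model a defect spoils only the small die it lands on, so four 50 mm2 chiplets waste less silicon than one 200 mm2 die.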
✦ Early days: 4, 8, 16, 32 and today 64 bits
✦ Means more bits, hence information, transferred each cycle or unit of time
✦ Early days: 12, 14, 16, 24, 32, 36 and today: 40, 41, …, 48, 50, & 52 (4 PB of memory)
✦ Larger address space means accessing more Code & Data directly in fast memory, without referencing slower storage devices like HDD
❖ Larger number of registers, hence more high-speed data available (192, not all named)
❖ Large number of functional units; Adders, Multipliers, etc., hence more things in parallel
❖ Faster clocking; 6 GHz max by Intel Core i9 with 24 cores consuming 250 W
❖ Larger caches (2 x 64 KB L1, 256 KB L2 & 250 MB L3)
❖ More cores (100 Cores) & hence more memory channels (8)
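The link between address bus width and directly addressable memory, such as the 4 PB quoted for 52 bits, can be checked with a short sketch. The helper names are illustrative, and byte-addressable memory with binary units (1 KB = 1024 B) is assumed.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address bus width."""
    return 2 ** address_bits


def pretty_size(num_bytes: int) -> str:
    """Format a power-of-two byte count using binary units (1 KB = 1024 B)."""
    units = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]
    index = 0
    while num_bytes >= 1024 and index < len(units) - 1:
        num_bytes //= 1024
        index += 1
    return f"{num_bytes} {units[index]}"


for bits in (16, 32, 40, 52):
    print(bits, "address bits ->", pretty_size(addressable_bytes(bits)))
```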
❖ Register Files (RF), to store operands and results temporarily
★ CPU is good in general data manipulation and operations on integers and strings, not mathematically intensive operations like floating-point number crunching
★ Floating Point Unit (FPU) is a special AU that is good at floating point arithmetic
★ Using an FPU next to the CPU enhances performance; they work in parallel and more
★ Integrating the FPU & the CPU on a die makes it even faster
★ Integrating more CPUs enhances parallelism and yields faster performance
[Figure: Early days, Later, Integrate, Integrate more; EU, CPU#2, CPU#3]
memory latency, via increasing the hit ratio for the workload
★ That is a typical single core processor
★ We may integrate 2, 4, 6, 8, etc. of them to work in parallel on different programs or threads
[Figure: cores, each with an EU and L1 C & L1 D caches; L2 CD and L3 CD caches]
transistors on the chip
★ Fabs could integrate only 200,000 transistors on a chip in the early 80’s! Where did they go?
★ Researchers realized that more than half went to the control sections, due to the large instruction set and number of addressing modes.
★ Also need to add more and more on die like Memory Management Unit (MMU). What to do?
★ Make the control unit less complex; save transistors for registers, caches, function units, and it runs fast as it is small and hardwired
★ This led to a new design philosophy
❖ Reduced Instruction Set Computers (RISC)
✦ Software centric, easier on the programmer & harder on the designer
✦ Examples: Intel x86, AMD Ryzen, Motorola MC68000, SUN SPARC, etc.
❖ Small number of addressing modes, typically ≤ 5
❖ Simple control logic, hence fast and easy on the real estate (< 20%)
★ CISC
❖ Large instruction set, typically > 200 (Intel’s x86 > 1500)
❖ Large number of addressing modes, typically ≥ 8
❖ Variable length instructions, few to several (x86: 1 to 15 bytes & VAX 11/780: 1 to 57 bytes)
❖ Complex control logic, consumes more real estate and has to use microprogramming (> 50%)
★ The two design styles coexist; each has its advantages … and they borrow from each other
★ Some processors have RISC cores or engines and a hardware shell that translates CISC instructions to RISC sequences on the fly, and caches them for fast processing
★ However, RISC is more power efficient (higher ratio of Performance to Power Consumption)
★ Today, 20% to 30% of the die real estate is non-core; chip cache, graphics unit, memory controller, clock distribution network, connections fabric, communications links and others
★ Around half of the non-core real estate goes for level 3 cache which is shared (in GB now)
★ Not all RISCs or CISCs are equal …
★ Hence, binary files vary in instructions count
❖ Normally, within ±10% close
❖ Quite similar ones, within ±5% close
★ RISC compilers are hard to design and take longer to run
★ RISC / CISC code segments
❖ A & B are variables
❖ R1 & R2 are registers

CISC           RISC
MOVE A, B      LOAD A, R1
               STORE R1, B
ADD R1, A      LOAD A, R2
               ADD R1, R2
               STORE R2, A
ADD A, B       LOAD A, R1
               LOAD B, R2
               ADD R1, R2
               STORE R2, B

★ Example
❖ Assuming 30% fewer instructions when compiled for a CISC; 0.7 x 1987 = 1390.9 ≈ 1400
❖ Estimated to generate ±10% when compiled for another RISC; 1800 to 2200
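The two estimates in the example can be reproduced with a small sketch. The function names are illustrative; the 1987-instruction count, the 30% reduction and the ±10% spread are the figures used above.

```python
def cisc_estimate(risc_count: int, reduction: float = 0.30) -> float:
    """Instruction count expected on a CISC, given a RISC count."""
    return risc_count * (1.0 - reduction)


def risc_range(risc_count: int, spread: float = 0.10) -> tuple:
    """Lower and upper instruction counts expected on another RISC."""
    return risc_count * (1.0 - spread), risc_count * (1.0 + spread)


low, high = risc_range(1987)
print(f"CISC: about {cisc_estimate(1987):.1f} instructions")
print(f"Another RISC: about {low:.1f} to {high:.1f} instructions")
```

The slide rounds these results to 1400 and to 1800 to 2200 respectively.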
❖ Typically, 10s of cores depending on platform
✦ 4 to 8 cores, Gaming (overclocked & power hungry)
✦ 12 to 24 cores, Workstations (fast & power hungry)
✦ 48 to 192 cores, Data Centers
★ NPUs & TPUs, Neural Processing Units & Tensor Processor Units
96-core AMD Epyc Genoa costs $3,000
❖ L2: 256 KB/512 KB, Unified Code/Data per core
❖ L3: 8x32 = 256 MB in AMD Epyc Rome & 1.1 GB in AMD Epyc Genoa for Servers
★ UnCore Components
❖ Memory Management Unit, Virtual Memory and Multi-Channel Interfaces
❖ Thermal Control, Maximize core utilization and performance within the power envelope
★ Clock: operating frequency varies with the number of cores and workload…
❖ 9.0 GHz under nitrogen cooling for short period of time (overclocked for several minutes)
❖ Performance Cores (P-Core), high performance for foreground operations (more power)
❖ Higher frequency produces more electromagnetic interference; causing crosstalk
❖ Higher frequency requires faster switching transistors; size and material limitation
★ 50 GHz clock? Assuming all above issues are sorted out
❖ Signal travels in the chip, silicon & wires, at 100,000 to 200,000 Km/s speed
❖ This is equivalent to 100 mm/ns to 200 mm/ns, let us assume 100 mm/ns
❖ A clock cycle = 1 / 50 GHz = 0.02 ns, the signal travels only 2 mm, die side must be small
❖ By proportion, Apple SoC A17 Pro has 150 to 160 mm2 die and clocks at 3.78 GHz, the die side must be reduced to (3.78/50) x 13 = 1 mm to run at 50 GHz
★ Such die size, 1 mm2, can pack only 250 to 300 MTr using 3 nm process
★ Hardly enough for a single core, so many cores at 5 GHz is better
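The 50 GHz reasoning above can be sketched as follows. The function names are illustrative; the 100 mm/ns signal speed, the 13 mm die side and the 3.78 GHz clock are the values assumed in the bullets.

```python
def travel_mm_per_cycle(clock_ghz: float, speed_mm_per_ns: float = 100.0) -> float:
    """Distance a signal covers in one clock period."""
    period_ns = 1.0 / clock_ghz
    return speed_mm_per_ns * period_ns


def scaled_die_side_mm(side_mm: float, clock_ghz: float, target_ghz: float) -> float:
    """Die side scaled down in proportion to the clock increase."""
    return side_mm * clock_ghz / target_ghz


print(f"Travel per cycle at 50 GHz: {travel_mm_per_cycle(50):.1f} mm")
print(f"Scaled A17 Pro die side: {scaled_die_side_mm(13, 3.78, 50):.2f} mm")
```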
❖ Moon: 2 x 384,400/7,200,000 = 0.107 s = 107 ms
❖ While it takes …
✦ Light, 135 ms (A blink of an eye)
✦ 2,000 Km/h supersonic jet, 20 hours
★ What can a processor do in such a small cycle?
❖ Arithmetic operations on integers and floating point numbers
[Figure: L step & R step, 60 cm each; MOON & EARTH, 5.6 ms, 53.4 ms, 53.4 ms]
❖ Modern single core processors draw up to 0.4 A @ 1.5 V or 0.6 W max
❖ High core count processors draw up to 600 A @ 0.6 V or 360 W max
Smart Phone 1 to 5
Tablet 5 to 10
Notebook 10 to 20
❖ 8 memory channels system controller with 128 PCIe Lanes, I2C, SATA, USB, RTC, etc.
❖ 0.9 V core voltage and 200 W maximum power consumption at 3.1 GHz
★ Memory: using 288-pin SDRAM DDR4 modules, there has to be 8 x 288 = 2304 pins
★ Graphics, Chipset & I/O: around 1000 pins
★ Power: 200 / 0.8 = 250 A, 0.5 A bonding wires, needs 250 / 0.5 = 500 pairs = 1,000 pins
★ Number of pins/contacts exceeds 4000. It in fact has 4094 contacts and uses SP3 socket
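The pin budget above is a rough estimate, and it can be reproduced with a short sketch. The function name is illustrative; the 0.8 V and 0.5 A per bonding wire values are those used in the power line of the estimate.

```python
def power_pins(power_w: float, core_voltage_v: float, amps_per_wire: float) -> int:
    """Supply pins needed: one VCC and one GND pin per bonding-wire pair."""
    current_a = power_w / core_voltage_v
    pairs = current_a / amps_per_wire
    return round(2 * pairs)


memory_pins = 8 * 288                    # 8 channels of 288-pin DDR4 modules
io_pins = 1000                           # graphics, chipset & I/O (rough figure)
supply_pins = power_pins(200, 0.8, 0.5)  # 200 W at 0.8 V, 0.5 A per wire
total_pins = memory_pins + io_pins + supply_pins
print(memory_pins, io_pins, supply_pins, total_pins)
```

The estimate lands slightly above the 4094 contacts of the actual package, which is reasonable for a back-of-the-envelope budget.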
❖ Dies fabricated using 7 nm process (8 dies on the left and right sides)
❖ System Control fabricated using 14 nm (rectangle in the middle)
❖ i9: 7 Billion Transistors, 200 mm2 die size using 14 nm node, 5 GHz, 0.01 CPI & 50 W
★ Transistor Count
❖ 8080 chip has 6,000 / 20 = 300 Tr per square mm
★ CPI Performance
❖ i9 is 10 / 0.01 = 1,000 times better cycle utilization, on top of a 2,500 times shorter cycle
★ Instructions per Second (IPS) Performance
❖ 8080 is 2 MCPS / 10 CPI = 0.2 MIPS = 200 KIPS
❖ i9 is 5,000 MCPS / 0.01 CPI = 500,000 MIPS. Hence i9 is 2,500,000 times better
★ Power Efficiency: Millions Instructions Per Joule (MIPJ)
❖ 8080 reaches 0.2 MIPS / 1 JPS = 0.2 MIPJ = 200 KIPJ
❖ i9 reaches 500,000 MIPS / 50 JPS = 10,000 MIPJ. Hence 50,000 times more efficient
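The 8080 versus i9 comparison can be recomputed with a minimal sketch. The function names are illustrative; clock rates, CPI values and power figures are the ones quoted above (1 W is 1 joule per second).

```python
def mips(clock_mhz: float, cpi: float) -> float:
    """Millions of instructions per second = MCPS / CPI."""
    return clock_mhz / cpi


def mipj(mips_value: float, power_w: float) -> float:
    """Millions of instructions per joule = MIPS / watts."""
    return mips_value / power_w


mips_8080, mips_i9 = mips(2, 10), mips(5000, 0.01)
mipj_8080, mipj_i9 = mipj(mips_8080, 1), mipj(mips_i9, 50)
print(f"IPS ratio: {mips_i9 / mips_8080:,.0f}")
print(f"Efficiency ratio: {mipj_i9 / mipj_8080:,.0f}")
```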
★ Gordon Moore predicted that the number of transistors on a chip doubles every year (later revised to every 1.5 to 2 years)
★ This prediction went up and down since 1965, 2^(44/2) = 4 Million times (sort of performance)
little ? Higher MIPS CPUs may produce less work
MFLOPS is a better metric, but only one aspect is addressed, floating point arithmetic
capability. Apple’s A17 SoC is rated 2.5 TFLOPS
★ The ultimate performance measure is time; how long it takes a processor to complete a task
★ Benchmarks are collections of programs with a range of activities to index computers
★ Benchmarks may focus on integer, floating point, graphics, etc. for specific users
★ A major problem of today’s systems is called the Memory Wall
❖ Annual performance growth of processor is > 30%, while
So, for a workload with 70% of the time processing & 20% of the time memory accesses, we get:
❖ The rest, 0.1xT, is unchanged; typically input/output related
❖ New completion time is Tx = (1-α)T + (α/β)T
★ What is the impact of doubling the core count? doubling the clock rate? and doubling both?
★ Assume a workload with P percentage of parallel activities & C percentage of computing activities
❖ Core Doubling, programs have some degree of parallelism (core scalability)
✦ Highly Parallel: α = 0.9 yields S = 1 / (0.1 + 0.9 / 2) = 1.8 (80% extra performance)
✦ Highly Serial: α = 0.2 yields S = 1 / (0.8 + 0.2 / 2) = 1.1 (10% extra performance)
❖ Clock Doubling, programs have some degree of number crunching (frequency scalability)
★ Compute the speedup of a task using 4 times the core count & 50% higher clocking frequency
❖ Assume the task spends 70% of its time processing with 80% degree of parallelism
❖ 4 times the core count yields speedup of SN = 1 / (0.2 + 0.8 / 4) = 2.5
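The speedup figures above follow from the completion-time formula Tx = (1-α)T + (α/β)T, so S = T / Tx = 1 / ((1-α) + α/β). A minimal sketch, with an illustrative function name; the last call applies the same formula to the 70% processing share with a 50% higher clock, which is an assumption about how the clock part of the exercise is meant to be computed.

```python
def amdahl_speedup(alpha: float, beta: float) -> float:
    """Speedup when a fraction alpha of the time is accelerated beta times."""
    return 1.0 / ((1.0 - alpha) + alpha / beta)


print(round(amdahl_speedup(0.9, 2), 2))    # highly parallel, core count doubled
print(round(amdahl_speedup(0.2, 2), 2))    # highly serial, core count doubled
print(round(amdahl_speedup(0.8, 4), 2))    # 80% parallel, 4 times the cores
print(round(amdahl_speedup(0.7, 1.5), 2))  # 70% processing, clock 50% higher
```

Note how the untouched fraction (1-α) caps the gain: even unlimited cores cannot push the 80% parallel task beyond a speedup of 5.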
★ Power efficiency is a hot issue; how much throughput per unit of energy (in Data Centers)
★ Single Board Computers (SBC) are small size lightweight subsystems with decent computing performance, memory and general input/output terminals (some are expandable)
❖ Power consumption of 5 to 20 W, native operating system, a version of Linux or Windows
❖ Used as low level computers (desktop) or high level controllers (plant or process control)
★ IoT, IoMT, Embedded systems may consume < 1 W
★ Handheld Devices, < 5 W (Small Battery) & Portable devices, 10 to 20 W (Large Battery)
★ Desktops, 200 to 400 W
★ Workstations & Gaming, 400 to 600 W
★ SuperComputers & Data Centers are exceptional; 100s KW to 100s MW (Millions of cores)
★ Humans, 100 W (50 W sleeping to 1000 W exercising), depends on size, age, style, etc.
❖ 10 to 20 W goes for brain activities
consumption of the various parts; mainly processor, memory, solid state storage
An M3 Max MacBook Pro has a 72.4 WH / 11.46 V Battery that lasts for about 10 hours
❖ 72.4 WH / 10 H = 7.24 W; the share of M3 Max is around 4 W
Component     Power (W)   Depends on: Vendor, Model, Size, Speed, Workload, etc.
CPU           40 – 100    Cores, Frequency, etc., 4 to 16 CPU cores & 16 to 32 GPU cores
GPU           80 – 300    Cores, Frequency, etc., 1,024 to 16,384 cores in discrete adapters
Chipset       20 – 40     Complexity, Number of chips, etc.
SSD           2 – 4       Number, Size, Technology, etc.
Fan(s)        5 – 15      Number, Size, Speed, etc.
Attachments   ?           Number, Type, etc., USB max 5 W & USB-C max 240 W
★ Data Centers use 3% of the world’s power; 50 to 150 MW each & some exceed 600 MW
★ Google’s 35 data centers consume 16 TWH per year (100 countries consume less than this)
★ Google’s largest, in Finland, 681 MW of renewable energy (0.3 of Jordan’s consumption)
★ A rack holds 14, 21 or 42 Servers (3U/2U/1U) housing: Servers, Storage, Switches & UPS
★ A rack can house around 6,000 cores, consuming 10 to 30 KW
“Science can amuse and fascinate us all, but it is engineering that changes the world.” - Isaac Asimov
Microprocessor System
★ Microprocessor systems vary greatly in complexity, performance, size, cost, etc.
★ But almost all have the same major components:
❖ CPU, to carry out tasks by running programs, along with Clock, Reset & Real Time Clock
❖ I/O, to get information into and out of the system
[Figure: CPU; RST, CLK, RTC; ROM & RAM]
❖ CPU support: reset signal generator, clock signal generator, real-time clock module
❖ Solid state memory comes in different flavors (Mostly, random access and volatile)
❖ Mass storage comes in different flavors (Non-volatile)
✦ Electro-Opto, Magneto, Mechanical Devices, Hard Disk Drive, Compact Disk, Tape Drive, etc.
✦ Electronic, Solid State Device (SSD), Solid State Cards (SDC), Multi-Media Cards (MMC)
❖ Output: Monitor, Printer, Speaker, etc.
★ Glue Logic, to connect all parts via buses (sets of wires to transfer Data, Address & Control info)
❖ Buffers, resolve the fan-out problem hence allow driving more and more loads in a complex system
❖ Decoders, partition the address space (select one memory or I/O chip for action)
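How a decoder partitions the address space can be sketched in software. This is a hypothetical illustration: a 16-bit address bus is assumed, the top two address lines (A15 & A14) select one of four 16 KB blocks, and the chip assigned to each block is made up for the example.

```python
# Hypothetical memory map: one chip per 16 KB block, selected by A15 & A14
CHIPS = ("ROM", "RAM", "IO", "UNUSED")


def decode(address: int):
    """Return (selected chip, offset inside the chip) for a 16-bit address."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    select = address >> 14    # A15 & A14 form the decoder inputs
    offset = address & 0x3FFF  # A13..A0 go straight to the selected chip
    return CHIPS[select], offset


for addr in (0x0000, 0x4001, 0xBFFF, 0xC000):
    print(hex(addr), decode(addr))
```

Exactly one chip select is active for any address, which is what lets several memory and I/O chips share the same buses.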
❖ Special purpose computers are designed and optimized to carry out specific tasks
★ Both types use Processors, Memory & Storage with varying capabilities, and different kinds of ports to support peripherals. However, they differ in the target application …
Special Purpose Computers
❖ Specific applications; Controllers & Embedded Systems
❖ Examples
✦ Simple: Oven, Fridge, Washer, Dryer, Traffic Controller, Automatic Teller Machine, etc.
✦ Complex: Medical Equipment, Airplane Autopilot, Autonomous Driving, Missile Guidance System, Industrial Plants
General Purpose Computers
❖ Examples
✦ Simple: Portable Computer, Personal Computer, Workstation
❖ Special: 8/16/32/64-bit data, 16 to 36-bit address, few cores, single memory channel
❖ General: ≥ 4 GB to support complex operating system, multi-tasking & large data
Communication
❖ Display & Camera
❖ PCIe for functionality
★ Special, GPIO for sensors and actuators
★ Desktop MoBo has many ports for connectivity & expansion slots for functionality
★ Notebook MoBo has limited ports & expansion slots (Zero?)
★ All require power supplies that meet their needs
★ VRM is a DC-DC power supply for the CPU, GPU, MEM
[Figure: Power Supply, Voltage Regulator Module (VRM), CPU, SDRAM, Ports]
[Figure: system schematic; CPU with CLK, RST*, INT*, NMI* & MREQ*; decoders (DEC) on A15/A14 and A7/A6 with outputs 0* to 3* driving CS*, SIOCS* & PCTCS*; RTC; I/O DEV]
★ RAM, Random Access Memory; read/write memory to hold computing results
★ PIO, Parallel Input Output; subsystem providing data exchange for near devices
★ SIO, Serial Input Output; subsystem providing data exchange for far devices
★ PCT, Programmable Counter Timer; subsystem providing timing signals
★ Support Logic
❖ CLK, Clock; square wave signal necessary for the CPU to function
❖ RTC, Real Time Clock; battery operated calendar subsystem
★ Glue Logic
❖ IPE, Interrupt Priority Encoder; selects the highest priority device to serve
★ I/O DEV, Input Output Devices via which the computer interacts with the world
❖ 2 inputs from sensors to detect Emergency (E) & Congestion (C) read via a buffer BUF
[Figure: traffic controller schematic; CPU, ROM, RAM, BUF & output latches on shared DB & AB buses; RD*, WE*, OE*, PGM* & CS* control lines; DEC; sensor inputs E & C; CK & RST*; NORTH path]
★ RAM, Random Access Memory; read/write memory, can be used for E & P times logging
★ LAX, Latch to hold the output state for path lights (4 latches, 1 per path)
★ BUF, Buffer via which we read the sensors (1 buffer, 2 bits per path)
★ Support & Glue Logic
❖ CLK, Clock; square wave signal necessary for the CPU to function
❖ RST, Reset; power-on pulse to restart the CPU
❖ DEC, selects ROM, RAM, BUF, or one LAX of the 4 (Output Latches)
★ BUF inputs come from Emergency sensors (E), for path priority as detected by sonar or bluetooth receivers (Ambulance or Firetrucks), and Pressure sensors (P), for extra path time when the car queue length exceeds a threshold.
★ Outputs of the latches drive high-power transistors to operate the Red & Green lights of the four groups: L-turn, S-lanes, R-turn & Walking lights (Assumed U-turn goes with the L-turn)
★ During the time between deactivating and activating the Red & Green, the Yellow will be turned on automatically, as neither is active in this time, to save a dedicated output
★ A clock of 1 MHz is more than enough to operate such a simple design
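The trick of deriving the Yellow light from the Red & Green outputs can be modelled in a few lines. This is a sketch of the logic only, not of the actual hardware; the function name and the dictionary layout are illustrative.

```python
def light_outputs(red: bool, green: bool) -> dict:
    """Lamp states for one path; Yellow is on whenever neither Red nor Green is."""
    if red and green:
        raise ValueError("Red and Green must never be active together")
    return {"red": red, "green": green, "yellow": not (red or green)}


# Switching from Green to Red passes through the state where both are off
for red, green in ((False, True), (False, False), (True, False)):
    print(light_outputs(red, green))
```

In hardware this corresponds to a single NOR of the two latch outputs, which is why no dedicated Yellow output bit is needed.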
❖ 1 KB to 1 MB SRAM
❖ High end: 20 to 100 MHz clock, 100 to 300 pins, 1 to 5 W
❖ Lower power
❖ Cheaper
[Figure: Wi-Fi, Quad Relay Module]
❖ Radio Frequency Identification (RFID)
❖ Biometrics: Heart Rate & ECG, Blood Oxygen, Pressure, Glucose, etc.
★ Examples of SoC specific functions:
❖ Smartphones & Tablets
✦ An SoC for a smartphone may include graphics, audio, video processing parts
❖ Networks
✦ Switches & Routers use SoC to handle packet processing and routing fast
★ ARM based chips production alone exceeds 7 Billion chips per year
[Figure: ESP32 SoC]
❖ More ports, even of the same function, like:
✦ Digital Visual Interface DVI, Legacy
✦ miniHDMI
✦ miniDisplayPort
✦ USB-C
[Figure: Laptop MoBo, PC mini MoBo, Desktop MoBo]
              Handheld devices    Personal Computers   Servers & Data Centers
Vendor        Apple               Apple                Ampere Computing
Process       3 nm                3 nm                 5 nm
Power         2 to 8 W            8 to 40 W            150 to 350 W
Transistors   20 Billion          60 Billion           80 Billion
CPU           6 Cores, 3.8 GHz    16 Cores, 3.6 GHz    192 Cores, 2.8 GHz
GPU           6 Cores, 1.5 GHz    40 Cores, 1.4 GHz    No GPU Cores
NPU           16 cores            32 cores
L1            256 KB per core     128 KB per core      16 KB Code + 64 KB Data per core
DRAM          8 GB LPDDR5
All are 3 nm process with M3 Max having 92 BTr on a 420 mm2 die
M3 Max system consumes 80 W full load & 3.5 W system nominal, 70 WH Battery lasts 20 Hours
❖ CPU, 12 Performance Cores; 4.05 GHz, 192 KB Code + 128 KB Data L1 & 32 MB L2 shared
❖ CPU, 4 Efficiency Cores; 2.75 GHz, 128 KB Code + 64 KB Data L1 & 4 MB L2 shared
❖ GPU, 40 Cores; 1.6 GHz, 6400 Compute units delivering 4.26 TFLOPS (FP32)
❖ NPU, 16 Cores; 1.125 GHz yielding 18 TOPS for AI & ML
✦ 32 Channels x 16-bit each
❖ Security & High speed data transfer
general purpose computers but at varying scale
System on Module (SoM), integrates components for specific tasks on a small board.
★ From a few dollars to a few hundred dollars; Multi-Core
★ Low-end MCUs can get down to a few cents per piece
An example of computer / controller; with many ports for Keyboard, Mouse, Monitor, etc.
Raspberry Pi 5 is built around an SoC with 4-core 64-bit ARM running at 2.4 GHz
2 GB, 4 GB & 8 GB LPDDR4 (for few to several tens of dollars)
❖ Dual micro HDMI for 4K displays & Stereo Audio port
❖ Dual Band Wi-Fi & Bluetooth
❖ MicroSD card
GB eMMC
❖ Jetson Nano (10 W, $99, 478 MFLOPS), a slimmed-down version with 128-core GPU, 4 GB LPDDR4 & 16 GB eMMC
[Figure: nVIDIA Jetson Nano SoM, nVIDIA Jetson Nano SBC, nVIDIA Jetson TX2 SBC]
❖ MEM: 8 GB LPDDR4 128-bit / 51.2 GBps, 16 GB eMMC & microSD and NVMe SSD
★ 84 Tera Operations Per Second (TOPS)
★ Good for Edge Computing
★ Size & Cost:
❖ 4 x $400 + $200 = $1800
❖ 70 W in 1000 cc package
[Figure: Jetson Xavier NX Board, 70 mm x 45 mm; Quad Jetson Xavier NX Carrier, 110 mm x 110 mm; Jetson Mate - Cluster Box, 120 mm x 120 mm x 8 mm]
❖ 192 GB & 5.3 TB/s memory
Purpose Built Systems
★ Increasing demand on performance led chip makers to look for ways to make them at
affordable price and in reasonable time; small dies with good yield & short time to market
★ Purpose Built is a way to assemble chips to meet special needs, like Data-Centric applications
❖ Qualcomm Centriq, designed for performance & optimized for power to handle Data Center workloads
❖ NVIDIA Drive Adam System, Quad Orin, 4 x 254 > 1000 TOPS for autonomous driving; a data center on wheels, to process large amounts of data from many sensors & cameras
[Figure: NVIDIA Orin SoC, 254 TOPS, 17 BTr; ROUTER]
Computing Platforms - Corporate
★ Workstations, Thin Clients, Terminals, Kiosks and other equipment
★ Supercomputers, Minicomputers and/or Servers
★ Clouds: Infrastructures, Platforms & Services; Public, Private & Hybrid Cloud forms
Figure: On-premise Servers; Cloud Servers; Routers & Switches
Computing Platforms - SuperComputers
★ Supercomputers are powerful machines dedicated to heavy computations like scientific research, simulation & design of complex systems
★ Data Centers serve a huge number of users & applications over the Internet; mostly CPUs
★ Supercomputers serve a limited number of users & applications; mostly GPUs
Computing Platforms - Factories
★ Programmable Logic Controller (PLC) is a modular special purpose automation system
★ Consists of CPU modules, I/O modules, Links, etc.
★ Programmed using special languages, like Ladder Diagram (LD), Instruction List (IL), etc.
★ Used in small control applications and large industrial plants
❖ Controllers: traffic lights, elevators, automatic doors, car wash, remote monitoring, etc.
❖ Industry: automobile industry, oil and gas industry, equipment industry, food industry, etc.
Figure: PLC system
★ Inverter
❖ VI > VTH ➞ VO = VOL
❖ VTH = 1.5 V
★ Schmitt inverter has a hysteresis property; it has two input threshold voltage levels
❖ VI < VL ➞ VO = VOH
❖ VI > VH ➞ VO = VOL
❖ VH = 2.4 to 3.2 V
Figure: transfer characteristics (VO versus VI) of the inverter (threshold VTH) and the Schmitt inverter (thresholds VL & VH)
★ To achieve 50% duty cycle
❖ TH = TL, hence VCC = VH + VL
❖ VL = 1.67 V & VH = 3.33 V (1/3 & 2/3 of 5 V), or any other pair that sums to VCC
★ Consider an inverter with: VL = 1.9 V & VH = 3.1 V
❖ T = TL + TH = 0.98 RC ≈ RC
❖ F = 1 / T ≈ 1 / RC
★ With R = 1 KΩ & C = 2 nF
❖ F ≈ 500 KHz & DC = 50%
Figure: RC oscillator around a Schmitt inverter; the capacitor voltage VC swings between VL and VH while the clock CK alternates between space (TL) and mark (TH)
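The 0.98 RC figure can be checked with a short Python sketch. It assumes the inverter output swings fully between 0 V and VCC and that the capacitor charges and discharges through the same R; the function name `schmitt_rc_period` is made up for illustration.

```python
import math

def schmitt_rc_period(r_ohm, c_farad, v_low, v_high, vcc=5.0):
    """Return (t_high, t_low) in seconds, assuming the output swings 0 V to VCC."""
    tau = r_ohm * c_farad
    # Output high: the capacitor charges from VL toward VCC until it reaches VH.
    t_high = tau * math.log((vcc - v_low) / (vcc - v_high))
    # Output low: the capacitor discharges from VH toward 0 V until it reaches VL.
    t_low = tau * math.log(v_high / v_low)
    return t_high, t_low

t_high, t_low = schmitt_rc_period(1e3, 2e-9, v_low=1.9, v_high=3.1)
period = t_high + t_low
print(f"T = {period / (1e3 * 2e-9):.2f} RC")
print(f"duty cycle = {100 * t_high / period:.0f}%")
```

Because VL + VH = VCC, both logarithms have the same argument, which is why the duty cycle comes out at 50%.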
★ Cut precisely to act like an RLC resonance circuit with a very high Q-factor, 10^6 or better
❖ Precision: 100 to 10 parts per million (ppm) in the temperature range 20 to 70 °C
❖ ± 3.4 x 10^−6 x (365.25 x 24 x 60) = ± 1.8 minutes annually
❖ Stability: 10 to 100 ppb/°C due to heating & 10 to 100 ppb/year due to aging
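The annual drift arithmetic can be reproduced for any ppm rating with a small Python sketch; the function name is hypothetical and the 3.4 ppm input is the value used on the slide.

```python
def drift_minutes_per_year(ppm):
    """Worst-case clock drift per year, in minutes, for a given ppm error."""
    minutes_per_year = 365.25 * 24 * 60
    return ppm * 1e-6 * minutes_per_year

print(round(drift_minutes_per_year(3.4), 1))  # 1.8
```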
★ Low frequency crystals are bulky, less precise & hard to manufacture
★ Systems use a low frequency clock like 100 MHz as a base & components have multipliers
❖ CPU on-board multipliers produce 1.8 to 5.8 GHz
✦ Sound: 44.1 KHz, 48 KHz, 96 KHz
✦ Ethernet: 25 MHz for 10/100 Mbps, 125 MHz for 1 Gbps
Figure: crystal oscillator circuit (crystal with 22 pF capacitors)
★ Temperature Compensated Crystal Oscillator (TCXO)
❖ 10^−6, 10 to 1 ppm
❖ Thermal sensor to adjust frequency, 7 x 7 x 3 mm
★
❖ Temperature is kept at 100 °C for stability, 9 x 9 x 5 cm
❖ One second shift in thousands of years
❖ Financial Transactions & Securities Exchange
★ Cesium Fountain Clock (CFC), Cesium @ 9.2 GHz
❖ Life expectancy: 10 to 50 years
❖ 3.3 nanoseconds per year
★
❖ One second shift in 300 Million years
★
it takes RC seconds to reach, but a bit more to reach 3.6 V and trigger; like 1.1 RC
The output Vx is active high; it goes high on power up or button release, and down after
some time that depends on the R, C and the threshold voltage of the Schmitt inverter
★ With C = 4.7 µF & Rc = 20 KΩ, we get 20 KΩ x 4.7 µF = 94 ms pulse duration
★ If C max ratings are 12 V / 0.5 A, then Rd ≥ 5 V / 0.5 A = 10 Ω for safe discharge
★ Rd must be small enough to discharge the capacitor before the push button is released. If the push button is held for 10 ms, for example, then Rd x C < 10 ms; otherwise the capacitor starts charging again before reaching the low threshold
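The constraints on Rc and Rd can be collected in one Python sketch. It uses the slide's simple RC estimates (no logarithmic correction for the actual thresholds); the function name and argument names are assumptions for illustration. Note that 5 V / 0.5 A evaluates to 10 Ω.

```python
def reset_circuit_limits(rc_ohm, c_farad, v_supply, i_max_amp, press_time_s):
    """Return (pulse_s, rd_min_ohm, rd_max_ohm) using plain RC estimates."""
    pulse_s = rc_ohm * c_farad          # charge time constant sets the pulse width
    rd_min = v_supply / i_max_amp       # limit the capacitor discharge current
    rd_max = press_time_s / c_farad     # Rd x C must be below the button press time
    return pulse_s, rd_min, rd_max

pulse, rd_min, rd_max = reset_circuit_limits(20e3, 4.7e-6, 5.0, 0.5, 10e-3)
print(f"pulse = {pulse * 1e3:.0f} ms, {rd_min:.0f} ohm <= Rd < {rd_max:.0f} ohm")
```

Any Rd between the two limits discharges the capacitor safely and in time under these assumptions.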
Figure: reset circuit with Rc, Rd & C, and the timing of Vc and Vx between 0.00 V and 5.00 V, marking T and Tc
❖ Green discharges to VL, causing the inverter to start and end another pulse
❖ Red discharges slowly (due to high Rd) and does not get to VL; no reset pulse is generated
❖ Warm reset, when the power supply is on and the push button is pressed; Tw ≤ Tc
★ The 555 timer circuit requires 2 resistors & 2 capacitors to construct a robust pulse generator
❖ Operates at 5 V to 15 V
❖ Robust; 50 years of reputation
❖ 8-pin chip has a single timer
★ Hence T = 1.1 x R x C because it occurs when 5 x (1 − e^(−T/RC)) = 3.33 V
★ Rx (1 MΩ) initiates the pulse on power up or button release & Cx (10 nF) eliminates noise
❖ Compute the pulse duration with R = 18 KΩ / 5% & C = 6.8 µF / 20%
✦ T = 1.1 x 18 x (1− 0.05) x 6.8 x (1 − 0.20) = 102.3 ms (Good for 100 ms requirement)
✦ Note that electrolytic capacitors lose value over time due to liquid evaporation
❖ Compute the 10% resistor value that generates 50 ms using C = 4.7 µF / 15%
✦ If only 12 KΩ, 15 KΩ, 27 KΩ are available, pick 15 KΩ (Need at least 12.64 KΩ)
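Both tolerance exercises follow the same worst-case reasoning: the shortest pulse occurs when R and C are both at the low end of their tolerance. A Python sketch under that assumption (function names are made up; units are KΩ, µF and ms, since KΩ x µF = ms):

```python
def worst_case_pulse_ms(r_kohm, r_tol, c_uf, c_tol):
    """Shortest 555 monostable pulse with R and C at their lowest values."""
    return 1.1 * r_kohm * (1 - r_tol) * c_uf * (1 - c_tol)

def min_resistor_kohm(target_ms, r_tol, c_uf, c_tol):
    """Smallest nominal R that still guarantees target_ms in the worst case."""
    return target_ms / (1.1 * (1 - r_tol) * c_uf * (1 - c_tol))

print(round(worst_case_pulse_ms(18, 0.05, 6.8, 0.20), 1))   # 102.3
needed = min_resistor_kohm(50, 0.10, 4.7, 0.15)
print(round(needed, 2))                                     # 12.64
print(min(r for r in (12, 15, 27) if r >= needed))          # 15
```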
Figure: LM555 pulse generator powered from 5 V; pins TRG, DIS, RST (4), CONT (5), OUT & GND, with push button PB, Cx & C; output pulse of duration T
★ Address Space: Kilo = 2^10, Mega = 2^20, Giga = 2^30, Tera = 2^40, Peta = 2^50, Exa = 2^60, Zetta = 2^70
❖ 2^10 = 1024 = Ki, as opposed to 10^3 = 1000 = K
★ CB: 10's to 100's of signals
❖ Input: Clock, Reset, Interrupt, Wait, Bus Error, Bus Request, etc.
❖ Output: Clock, Address Strobe, Data Strobe, Read, Write, Halt, etc.
❖ Multiplexed: Input/Output due to pin shortage, like Reset & Halt in the MC68000
★ PB: 5.0, 3.3, 2.0, 1.9, 1.8, 1.5, 1.2 V, …; with dynamic voltage scaling it covers a range:
❖ CPU Cores: 0.7 – 1.3 V (low core count), 0.6 – 1.1 V (high core count) & 1.5 V (Gaming)
❖ CPU Logic: 1.2 – 1.5 V
PB: GND, VCC (5.0 V), VPP = 12 V (for programming)
★
❖ Erasable Programmable ROM (EPROM), erased by UV light, and programmed many times
❖ EEPROM is electrically byte erasable, and Flash is the same but block erasable. Why?
★ The 2764 is a 28-pin 8 KB EPROM (8 K x 8 b); why not 27 pins?
❖ AB = 13 (8 K implies 3 + 10)
★ Unused pin! NC, VCC, GND, CS2* …
★ Some chips have dual function pins, like CE*/VPP
Figure: EPROM control pins OE* & PGM*
★ NAND is denser, sequential & cheaper; good for data; formatted as blocks, pages, etc.
★ A chip consists of dies, planes, blocks, pages & strings of bits, and requires erase before write
★ Units of erase and write are blocks, not bytes
★ Millions of erase/program cycles & decades of retention
★ Standard Bus
❖ DB of 4, 8, 16
❖ AB of 4, … up to 20 or more
❖ CB: CS*, OE* and WE*
Figure: serial interface pins SCLK, SDI, SDO & CS*
★
amount of time. This is opposed to Direct Access in which it depends on where the data is stored
DB: 4, 8 and 16
AB: 8, 9, …, 20 (More recently …)
★
❖ OE*: Output Enable, to read data out
★ PB: GND and 5.0 V (Today, many work on much less like 2.0 V)
★ Packages: DIL and SMT with 24, 28, 32, 40 pins (8-pin chips use a serial bus)
★ The 62128 is a 28-pin 16 KB SRAM (16 K x 8 b), again not 27!
❖ AB = 14 (16 K implies 4 + 10)
❖ DB = 8
❖ CB = 3 (CS*, OE*, WE*)
❖ PB = 2 (VCC, GND)
Figure: 62128 block symbol with DB (8), AB, CS*, OE* & WE*
★ What is the capacity of a 32-pin 8-bit data SRAM chip? 2^19 = 512 KB
★ AB: 8, 9, …, 16, 17, 18 (18 x 2 = 36, yields 2^36 = 64 G addresses; today: 8 GB x 8 dies = 64 GB)
★ CB: CS*, OE*, WE*, CAS* and RAS*
❖ CS*/CE*: Chip Select/Enable; there can be many, active low and active high
❖ RAS*: Row Address Strobe; latches the upper half of the address to select a page
★ PB: GND and 5.0 V (3.3 V, 2.0 V, 1.2 V, 1.1 V & 1.0 V today)
★ Packages: DIL & SMT with 16, 18, 20 & 40 pins
★ How many pins in a 4 GB DRAM, assuming it is 4 bits wide? 28
❖ Format: 4 GB = 8 G x 4 b = 2^33 x 4 b, AB = ⌈33/2⌉ = 17; 2 steps only
❖ DB = 4
❖ CB = 5 (CS*, OE*, WE*, CAS*, RAS*)
❖ PB = 2 (VCC, GND), more for high density chips
Figure: DRAM block symbol with DB, AB, CS*, OE*, WE*, RAS* & CAS*
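The pin-count questions for SRAM and DRAM chips follow one recipe: address pins + data pins + control pins + power pins, with the address pins halved (rounded up) when the address is multiplexed in two steps. A minimal Python sketch, with function names chosen here for illustration:

```python
import math

def pin_count(address_bits, data_bits, control_pins, power_pins=2, multiplexed=False):
    """Minimum pins: AB + DB + CB + PB; a multiplexed AB is sent in two halves."""
    ab = math.ceil(address_bits / 2) if multiplexed else address_bits
    return ab + data_bits + control_pins + power_pins

def sram_capacity_bytes(pins, data_bits=8, control_pins=3, power_pins=2):
    """Largest capacity if every remaining pin is an address line."""
    address_bits = pins - data_bits - control_pins - power_pins
    return (2 ** address_bits) * data_bits // 8

print(pin_count(14, 8, 3))                    # 27 for the 62128, one pin short of 28
print(sram_capacity_bytes(32) // 1024)        # 512 (KB)
print(pin_count(33, 4, 5, multiplexed=True))  # 28 for the 4 GB x4 DRAM
```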
★ They have a reset input to initialize ports as inputs to avoid damage; there will be contention that leads to damage if a port is randomly set as output while connected to an input device
★ May have an Interrupt Output, to notify the CPU when an action is complete or required
★ Naturally, they have a data bus to communicate with the CPU, a few address lines to select a port, read/write control and a chip select (from a decoder)
★ Different vendors have chips with different flavors
❖ Parallel Input Output (PIO); Zilog
❖ Programmable Peripheral Interface (PPI); Intel
Figure: parallel interface block symbol with DB, AB, RST*, INT* and ports PA & PB
★ +12/−12 V drivers as opposed to standard parallel 0/+5 V
★ Each channel has Transmit, Receive and Handshaking signals (on the right side)
★ Has an interrupt output, to signal events like data received or sent
★ Some chips have FIFO buffer for each direction; 16, 32, …, 128
★ Different vendors have chips with different flavors
❖ Asynchronous Communications Interface Adapter (ACIA, ACA); Motorola
❖ Dual Asynchronous Receiver Transmitter (DART); Zilog
Figure: serial interface block symbol with DB, AB, CS*, TxD, RxD, RxC & TxC
★ Asynchronous serial protocol may use:
★ Output of a module can be used as a clock for another to form 32-bit or even 48-bit counters
★ Mostly byte oriented with a low operating frequency; used in timing signal generation like a periodic interrupt for multitasking
★ Typical chips have three channels or counting elements, called counter modules
★ There can be more chip specific controls in some chips; like Clock, Reset, etc.
★ Different vendors have chips with different flavors
❖ Programmable Interval Timer (PIT); Intel
❖ Programmable Timer Module (PTM); Motorola
★ Modules are totally independent; each operates in any mode
★ Any module can generate an interrupt when done
Figure: timer block symbol with WE*, INT* & module M2
❖ SuperCapacitor (UltraCapacitor), few minutes of charge gives months of operation
❖ 2^22 = 4,194,304 Hz, needs 22 FFs to divide down to 1 Hz
★ Why those oddball numbers? 32,700 Hz, 32,000 Hz, etc. need complex next state logic
❖ 2^13 = 8,192 Hz? Fewer FFs compared to 2^15, but less precise
❖ 2^12 = 4,096 Hz? Fewer FFs, but less precise, more power, bulky, fragile & hum
Figure: real-time clock circuit with crystal input XTL1, battery & capacitor
❖ DEC, 2-to-4, 3-to-8, etc. are binary decoders in one or two stages normally
❖ ROM or PLA, flexible but slow and needs programming step which is an added cost
★ Encoders arbitrate events like interrupts; they report the code of the highest priority active device request to serve; the lowest is always active to report no request.
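The arbitration rule can be sketched behaviorally in Python. This is not a model of any particular chip (real parts such as the 74LS148 use active-low lines); it only illustrates "highest active request wins, input 0 is tied active so code 0 means no request". The function name is hypothetical.

```python
def priority_encode(requests):
    """Return the index of the highest active request; 0 means nothing pending."""
    lines = list(requests)
    lines[0] = True                      # lowest input is always active
    for code in reversed(range(len(lines))):
        if lines[code]:
            return code

pending = [False] * 8
pending[2] = pending[5] = True
print(priority_encode(pending))          # 5
print(priority_encode([False] * 8))      # 0
```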
Figure: 74LS139 (2-to-4 decoder), 74LS138 (3-to-8 decoder) & 74LS148 (priority encoder), with enables E1, E2*, E3* & E*, select inputs A, B & C, outputs Y0* to Y7* and request inputs I0* to I7*
Glue Logic: Buffers & Latches
★ Buffers are used for two reasons
❖ Physical, to resolve the fan out issues by strengthening the signal power, it is an electronic
★ Address bus is unidirectional and hence needs unidirectional buffers; each 4 have one Enable
★ Data bus needs bidirectional buffers (bus transceivers), hence Enable & Direction controls
★ Latches or Flip-Flops are used as output ports; data on the data bus is written into them by activating the clock, to be read by another party when the output is enabled
Figure: 74LS374 (8-bit D flip-flops with CK & OE*), 74LS244 (8-bit buffer with enables E1* & E2*) & 74LS245 (8-bit bus transceiver with enable E* & direction D)
❖ Field Programmable Gate Arrays (FPGA)
★ PLAs are generally used to implement simple logic expressions, while FPGAs and ASICs consist of a huge number of complex blocks and an interconnection network managed by switches, and hence can be used to implement complex systems like controllers or processors
★ PLA structure
❖ AND/OR sections, with programmable connections
❖ Fuses are initially intact (Red), making a connection; blown fuses (Blue) disconnect
❖ XOR takes the complement of a function
❖ Variables passed true and complemented, and only one can be taken (if any)
= (A’•1)(B’•1)(1•C)(1•1)(1•1)(1•F) = A’B’CF
F2 = (BCD + AC’E’)’
F10 = ?
Figure: PLA product terms A’BE’F, BCD, AC’E’ & A’B’CF
★ It has a feedback control to adjust the duty cycle, pulse width, to stabilize the output
★ Output based on input identification code VID (5, 6 or 8 bits to specify the required value)
★ Codes may imply: 0.55, 0.56, 0.57, …, or 3.0 V (Some codes reserved for control; shut off)
★ Switches & Chokes make phases; more phases …
❖ Larger currents using cheaper components
❖ A light core requires 2 to 4 W & 5 A
✦ P = 37 x (3.0/3.6) x (1.4/1.4)^2
✦ P ≈ 31 W
❖ 3.0 GHz & 1.1 V
✦ P = 37 x (3.0/3.6) x (1.1/1.4)^2
✦ P ≈ 19 W
★ A processor with:
❖ F: 3.2 to 4.4 GHz (normally, 3.8 GHz)
❖ V: 0.9 to 1.3 V (normally, 1.1 V)
❖ Normally, consuming 15 W
★ Compute
❖ Minimal power, Pmin
❖ Maximum power, Pmax
Figure: Voltage (V, 0.0 to 2.0) and Power (W, 0 to 50) versus Frequency (2.0 to 4.0 GHz). Find K: 50 = K x 4 x 1.2 x 1.2, so K = 8.7 Ω^−1 GHz^−1
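All of these numbers come from the same relation, P = K x F x V^2. A short Python sketch (function names are illustrative) reproduces the slide values and can be used to evaluate the Pmin and Pmax exercise from the normal operating point:

```python
def scale_power(p_ref, f_ref, v_ref, f_new, v_new):
    """Scale a known operating point using P = K x F x V^2."""
    return p_ref * (f_new / f_ref) * (v_new / v_ref) ** 2

def solve_k(power_w, f_ghz, volts):
    """Extract K from one (P, F, V) point."""
    return power_w / (f_ghz * volts ** 2)

print(round(scale_power(37, 3.6, 1.4, 3.0, 1.4)))  # 31
print(round(scale_power(37, 3.6, 1.4, 3.0, 1.1)))  # 19
print(round(solve_k(50, 4.0, 1.2), 1))             # 8.7

# Exercise: 15 W at 3.8 GHz & 1.1 V, scaled to the corners of the range.
p_min = scale_power(15, 3.8, 1.1, 3.2, 0.9)
p_max = scale_power(15, 3.8, 1.1, 4.4, 1.3)
print(f"Pmin = {p_min:.1f} W, Pmax = {p_max:.1f} W")
```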
★ The sawtooth is the choke current during HS/LS FET activation (mutually opposite)
★ VOUT = DC x VIN. If VIN = 5 V, then:
❖ 25% duty cycle yields VOUT = 1.25 V
Figure: single-phase converter with HS & LS FETs; pulse P, VIN & VOUT versus time
★ Pulse controller generates 100 KHz to 10 MHz overlapping or non-overlapping pulses
★ Higher frequency switching produces smoother output but causes electromagnetic interference that requires design care
Figure: pulse generator driving Phase #1, #2 & #3 from VIN to VOUT; 3-phase non-overlapping pulses P1, P2 & P3 with 20% duty cycle; 1 V output using 5 V input
❖ Hex, direct mapping of every nibble from Binary to Hex
❖ Assembly, symbolic language with directives (pseudo-instructions)
★ Assembly enhances reuse and code readability by using labels, comments, etc.
★ Colors indicate matching code in all levels; black items are pseudo-instructions with no translation
Line  Label  Instruction  Comment               Address  Binary     Hex
2     code   org $1200    Place Code at $1200
             ld a, init   Initialize Counter    $1200    0011 1110  $3E
                                                $1201    0100 0111  $47
6     init   equ $47      Initial Count
                                                $1205    0000 0010  $02
❖ 4 x 8b, Byte wide; 2 Address bits
❖ 32 x 1b, Bit wide; 5 Address bits
Figure: the same 32 bits of storage viewed at different widths (labels N7 to N0, C8 to C0, B3 to B0, W0 & LW0)
❖ Log2 (32 / 8) = 2 address bits to select a byte within a word
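The address-bit counts follow from the number of addressable units: log2(total bits / unit width). A small Python sketch with a hypothetical helper name, using integer arithmetic only:

```python
def address_bits(total_bits, unit_bits):
    """Address bits needed to select one unit of unit_bits within total_bits."""
    units = total_bits // unit_bits
    if units < 1 or units & (units - 1):
        raise ValueError("number of units must be a power of two")
    return units.bit_length() - 1

print(address_bits(32, 8))   # 2: select a byte within a 32-bit word
print(address_bits(32, 1))   # 5: select a bit
```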
Figure: byte indexing with address bits A1A0 (00, 01, 10, 11) for B, W, LW & VLW organizations; example M(10011) = $48
★ When a 4-byte data item is copied to address $1002, it has to be spread over $1002, $1003, $1004 & $1005
❖ LE, B0 goes to $1002
Figure: CPU register B3 B2 B1 B0 copied to memory $1002 to $1005. Little Endian places B0 at $1002 and B3 at $1005 (Low Address ↔ Low Data); Big Endian places B3 at $1002 and B0 at $1005
★
❖ Big Endian (BE), maps lower order data item to higher order memory address
❖ File formats; when an application stores multi-byte or multi-word data items, and this is why we have standards
❖ Networking & Serial Transmission; least or most significant bit transmitted first in time
★ Examples
❖ LE: ARM & Intel x86 architectures
❖ BE: Motorola 68K & Sun SPARC architectures
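The two byte orders can be observed with Python's standard `struct` module. The value 0xB3B2B1B0 is chosen here only so that each byte names its own position (B3 is the most significant byte); the base address $1002 matches the earlier example.

```python
import struct

value = 0xB3B2B1B0
little = struct.pack("<I", value)   # Little Endian: B0 at the lowest address
big = struct.pack(">I", value)      # Big Endian: B3 at the lowest address

for offset in range(4):
    print(f"${0x1002 + offset:04X}  LE={little[offset]:02X}  BE={big[offset]:02X}")
```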
★
❖ At start up, using some motherboard setting jumper
Alignment is about allowing or disallowing multi-byte data at odd addresses; data fragmentation
❖ Aligned: does not allow fragmentation; fast but may waste memory (Temporal advantage)
★ Example
❖ Define the following constants, starting at address $1002 (per the memory maps)
✦ $45, $6789, $A4, $79, $2483

Address  Not aligned (Little Endian)  Aligned (Little Endian)
         odd byte  even byte          odd byte  even byte
1002     89        45                 −         45
1004     A4        67                 67        89
1006     83        79                 79        A4
1008     −         24                 24        83
★ Skipped locations are just left alone, and can still be accessed by their addresses
★ Fragmented words require two transactions to access
★ Grouping; words then bytes, or bytes then words, resolves the spatial issue
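The difference between the two layouts can be reproduced with a small Python sketch that places items at a running address, optionally skipping a byte so that every word starts at an even address. The function name, the (value, size) tuple format and the start address $1002 are assumptions for illustration; Little Endian order is used as in the example.

```python
def layout(items, start, aligned):
    """Place (value, size_in_bytes) items in memory; return {address: byte}."""
    memory = {}
    address = start
    for value, size in items:
        if aligned and size > 1 and address % 2:
            address += 1                      # skip a byte to reach an even address
        for i, byte in enumerate(value.to_bytes(size, "little")):
            memory[address + i] = byte
        address += size
    return memory

constants = [(0x45, 1), (0x6789, 2), (0xA4, 1), (0x79, 1), (0x2483, 2)]
for name, is_aligned in (("not aligned", False), ("aligned", True)):
    mem = layout(constants, 0x1002, is_aligned)
    print(name, {f"{a:04X}": f"{b:02X}" for a, b in sorted(mem.items())})
```

In the aligned run address $1003 is never written, which is the wasted byte the slide refers to.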
❖ EQU, Equate: binds a name to a value
❖ ORG, Origin, location counter: where to place Code and Data in memory
❖ END, Indicate the end of program
                                              Address  Content
Many    EQU     32
More    EQU     $48
Code    ORG     $001006
        MOVE.B  #Many, D1   ; $123C, $0020    1006     12 3C
                                              1008     00 20
        MOVEQ   #More, D5   ; $7A48           100A     7A 48
★ Instruction MOVE.B #$20, D1 is translated to the binary coding $123C0020
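The two machine codes can be rebuilt from the 68000 bit fields. The Python sketch below covers only these two forms (MOVE.B with an immediate source and a data register destination, and MOVEQ); the function names are made up, and it is not a general assembler.

```python
def encode_move_b_imm(value, dn):
    """MOVE.B #value, Dn -> [opcode word, extension word]."""
    # 00 01 = MOVE byte | Dn, mode 000 = data register direct | mode 111, reg 100 = immediate
    opcode = 0x1000 | (dn << 9) | 0x003C
    return [opcode, value & 0xFF]          # byte immediate sits in the low byte

def encode_moveq(value, dn):
    """MOVEQ #value, Dn -> [opcode word] with the 8-bit data inside the opcode."""
    return [0x7000 | (dn << 9) | (value & 0xFF)]

print([f"${w:04X}" for w in encode_move_b_imm(32, 1)])   # ['$123C', '$0020']
print([f"${w:04X}" for w in encode_moveq(0x48, 5)])      # ['$7A48']
```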
★
❖ DS, Define Storage, allocates memory to be used at run time
        ORG     $1002
BArray  DS.B    3           ; Allocate 3 bytes
WArray  DS.W    2           ; Allocate 2 words
BData   DC.B    12          ; Allocate a byte and write $0C
        DC.W    12          ; Allocate a word and write $000C
Message DC.B    “ABC 123”   ; Allocate bytes for ASCII string and write
                            ; $41 for “A”, …, $20 for “ ”, … and $33 for “3”

Address  Content (− allocated, X skipped)
1002     −  −
1004     −  X
1006     −  −
1008     −  −
100A     0C X
100C     00 0C
100E     41 42
1010     43 20
1012     31 32
★ Labels are used as friendly alternatives to addresses
★ To access the string “ABC 123”, we use Message as pointer
Memory map: 101A: 00 17; 101C: 00 00; 101E: 00 23
Mixed order                 Words then bytes
    ORG   $1000                 ORG   $1000
B1  DC.B  $13               W1  DC.W  $1234
W1  DC.W  $1234             W2  DC.W  $5678
B2  DC.B  $24               W3  DC.W  $A1B2
W2  DC.W  $5678             W4  DC.W  $C3D4
B3  DC.B  $35               B1  DC.B  $13
W3  DC.W  $A1B2             B2  DC.B  $24
B4  DC.B  $46               B3  DC.B  $35
W4  DC.W  $C3D4             B4  DC.B  $46
Figure: memory maps from $1000 for each ordering (X marks a skipped byte); 3 different places; $1004, $1009, $1001; same data
✦ 8 x 32-bit registers, D0, D1,…, D7
✦ L, W, B segmentation
❖ Address Registers
★ Special Purpose Registers (Dark Gray)
❖ Stack Pointers, no segmentation
✦ 32-bit, User Stack Pointer (A7, USP)
❖ 16-bit Status Register (SR)
★ Spilling means the outcome does not fit the size; it is expressed through C (assuming inputs are unsigned numbers) & V (assuming inputs are signed numbers)
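The difference between the two flags shows up when the same bit patterns are added and then read as unsigned or as signed numbers. A minimal Python sketch of an n-bit addition (the function name and the 8-bit default are illustrative):

```python
def add_with_flags(a, b, bits=8):
    """Return (result, C, V) for an n-bit addition."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    total = (a & mask) + (b & mask)
    result = total & mask
    carry = total > mask                                # unsigned result did not fit
    # Signed spill: operands share a sign that differs from the result's sign.
    overflow = bool(~(a ^ b) & (a ^ result) & sign)
    return result, carry, overflow

print(add_with_flags(0x7F, 0x01))   # (128, False, True): 127 + 1 spills as signed
print(add_with_flags(0xFF, 0x01))   # (0, True, False): 255 + 1 spills as unsigned
```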
Figure: 4-bit adder/subtractor built from full adders (FA) with inputs A3 to A0 & B3 to B0, control A’/S, carry-in Cin, outputs F3 to F0 and flags C & Z
❖ <d> is the destination operand
★ Then, the ADD & MOVE instructions of the processor are described as in the comment section
ADD   <s>, <d>   ; d ← d + s   add d to s and store into d
MOVE  <s>, <d>   ; d ← s       store copy of s into d
★ Here the source operand comes first; some Assemblers use destination first
★ Data Types
❖ $ means Hexadecimal
❖ @ means Octal
❖ % means Binary
❖ ‘ …’ means ASCII
#  Syntax      Addressing mode
1  Literal     Immediate number
2  Absolute.W  Direct or Absolute Short (Word Address, to sign extend)
3  Absolute.L  Direct or Absolute Long (Longword Address, full address)
4  Di          Data Register Direct
5  Ai          Address Register Direct
6  (Ai)        Address Register Indirect
7  (Ai)+       Address Register Indirect with Post-increment
8  −(Ai)       Address Register Indirect with Pre-decrement
9  (d16, Ai)   Address Register Indirect with Displacement
★ The # is used to tell the Assembler “it’s immediate”
★ Typical applications: setting up control loops and delay counters
★ Example
MOVE.B  #$83, D3      ; D3(7:0) ← $83
MOVE.W  #$83, D3      ; D3(15:0) ← $0083
MOVE.L  #$83, D3      ; D3(31:0) ← $00000083
MOVE.L  #$1A483, D3   ; D3(31:0) ← $0001A483
★
❖ Sign = 1, upper word is 1s; range is $FF8000 − $FFFFFF (Highest 32 KB block)
★ If sign extending a word address changes its value then it has to go long
★ Short takes less space and time; better if it fits; Assemblers decide
MOVE.L  D3, $17004   ; M($017004) ← D3(31:16); M($017006) ← D3(15:0)
                     ; Two transactions, high order data first (BE), Long Abs
MOVE.W  D3, $7234    ; M($007234) ← D3(15:0)
                     ; Short fits because SE($7234) = $007234
                     ; Sign Extending $8234 yields $FF8234
                     ; So, if the address is $008234 it has to go long
                     ; Otherwise it will be considered $FF8234
Figure: memory map in 32 KB blocks: 000000 − 007FFF, 008000 − 00FFFF, 010000 − 017FFF, …, FE0000 − FE7FFF, FE8000 − FEFFFF, FF0000 − FF7FFF, FF8000 − FFFFFF
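The short-versus-long decision an assembler makes can be sketched in Python: an address fits Absolute Short only if sign extending its low word gives back the same address. The 24-bit address width and the function names are assumptions for illustration.

```python
def sign_extend16(word):
    """Interpret the low 16 bits as a signed value."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word

def fits_absolute_short(address, address_bits=24):
    """True if the sign-extended low word reproduces the full address."""
    mask = (1 << address_bits) - 1
    return (sign_extend16(address) & mask) == (address & mask)

for addr in (0x007234, 0x008234, 0xFF8234, 0x017004):
    mode = "short" if fits_absolute_short(addr) else "long"
    print(f"${addr:06X} -> {mode}")
```

Only the lowest and the highest 32 KB blocks pass the test, which matches the ranges listed above.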
★ Examples
★ Direct Address Register is not allowed as destination of MOVE
★ A dedicated instruction called MOVEA is used (Assembly restriction, not a processor OpCode)
MOVEA.W A1, A0 ; A0(31:0) ← SE(A1(15:0)), the word source is sign extended
★ Application: arrays, records, linked lists, etc.
★ Processor state is usually in hexadecimal even without the prefix $
★ Examples, Big Endian processor
Registers: A1 = $1000, A5 = $1002, A6 = $1008, D4 = $31295730
Memory: 1000: 12 34; 1002: 57 30; 1008: 31 29; 100A: 57 30
MOVE.W (A1), D3 ; D3(15:0) ← M(A1)
★ Exception is A7 (USP) and A7’ (SSP), where 2 is used for .B, to preserve alignment
★ Note that RTL uses one statement for .L sized memory accesses, but in fact it is done in two cycles because it’s a word sized data bus
MOVE.W D3, (A0)+ ; M(A0) ← D3(15:0); A0 ← A0 + 2
Address Register Indirect with Pre-decrement
★ Auto adjustment, increment or decrement; faster access to structured data items; tables,
arrays, etc.
★ Decrement by 1 for .B, 2 for .W and 4 for .L instructions, hence less time and space
★ Exception is A7 (USP) and A7’ (SSP), where 2 is used for .B, to preserve alignment
★ Applications include accessing data structures
★ Latency hiding: which of the two modes, –(Ai) or (Ai)+, is faster?
❖ As source, (Ai)+ is faster as we use then increment, but –(Ai) has to decrement first and has to wait 2 clock cycles before use
❖ As destination, the pre-decrement latency is also hidden; they are just as fast
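The adjustment rules for both modes, including the A7 exception, can be sketched in Python. This models only the address arithmetic, not timing; the register file is a plain list and all names are illustrative.

```python
SIZES = {"B": 1, "W": 2, "L": 4}

def step(size, reg):
    """Bytes to adjust by; A7 moves by 2 even for .B to keep the stack aligned."""
    return 2 if (size == "B" and reg == 7) else SIZES[size]

def post_increment(a, reg, size):
    """(Ai)+ : use the address, then increment the register."""
    ea = a[reg]
    a[reg] += step(size, reg)
    return ea

def pre_decrement(a, reg, size):
    """-(Ai) : decrement the register, then use the address."""
    a[reg] -= step(size, reg)
    return a[reg]

a = [0x1000] * 8
print(hex(post_increment(a, 0, "W")), hex(a[0]))   # 0x1000 0x1002
print(hex(pre_decrement(a, 1, "L")), hex(a[1]))    # 0xffc 0xffc
print(hex(pre_decrement(a, 7, "B")), hex(a[7]))    # 0xffe 0xffe
```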
★ Effective Address <ea> is the sum of the address register content plus the displacement
★ Applications include accessing data structures with records and fields
★ Some Assemblers require the displacement written before the parenthesis; like MOVE.L 12(A1), D3 as opposed to MOVE.L (12, A1), D3
★ Example
❖ If in the above instructions A1 = $123400 and A2 = $123468, then
❖ The effective address of the source in the first instruction is ea = $00123400 + $0000000C = $0012340C
❖ The effective address of the source in the second instruction is ea = $00123468 + $FFFFFFFA = $00123462 ($FFFFFFFA is – 6)
★ bit signed or d8
★ Most complex addressing mode
★ Good for structures, like the element in row r column c in matrix m
★ Example
❖ For the first instruction, assume: A1 = $1234A6, D0 = $12348812, then
❖ The effective address ea = $1234A6 + $FFFF8812 + $6 = $0011BCBE (D0.W is sign extended)
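The effective-address sums for the displacement and index modes can be checked with a short Python sketch. It assumes 32-bit wrap-around arithmetic, a sign-extended d16 or d8, and a word-sized index unless stated otherwise; the function names are made up.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a signed number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def ea_displacement(an, d16):
    """(d16, An): address register plus sign-extended 16-bit displacement."""
    return (an + sign_extend(d16, 16)) & 0xFFFFFFFF

def ea_indexed(an, xn, d8, index_is_long=False):
    """(d8, An, Xn): adds an index register (word or long) and an 8-bit displacement."""
    index = sign_extend(xn, 32) if index_is_long else sign_extend(xn, 16)
    return (an + index + sign_extend(d8, 8)) & 0xFFFFFFFF

print(f"${ea_displacement(0x123400, 0x000C):08X}")       # $0012340C
print(f"${ea_displacement(0x123468, 0xFFFA):08X}")       # $00123462
print(f"${ea_indexed(0x1234A6, 0x12348812, 0x06):08X}")  # $0011BCBE
```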
❖ Displacement & Index: ea = PC + Xj + d8
★ OpCode extension is a word that is d16, or d8 and 5 bits encoding X, j and W/L
CPY   MOVE.W  (MSG, PC), D1   ; Copies M(MSG) = $4131 to D1(15:0)
      .                       ; MSG is d16 representing the distance to MSG label
      .                       ; from the updated value of the PC which is CPY + 2
MSG   DC.B    “A1”
★ Actual distance is encoded to be added to the PC in execution
★ Useful in making relocatable code, i.e. Position Independent Code (PIC), to reside anywhere in memory
★ Example:
❖ Assume: MOVE.W instruction is at address $1000 & MSG at address $1008
❖ Then the displacement MSG to be encoded is $1008 − $1002 = $6
❖ Then, the instruction encoding is $323A $0006
❖ When it executes: Source ea = $1002 + $6 = $1008
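The displacement calculation and the resulting machine code can be sketched in Python. The sketch covers only MOVE.W (d16, PC), Dn and assumes the PC value used is the instruction address + 2, as stated above; function names are illustrative.

```python
def pc_displacement(instruction_address, target_address):
    """d16 = target - (instruction address + 2), as a 16-bit two's complement word."""
    distance = target_address - (instruction_address + 2)
    if not -0x8000 <= distance <= 0x7FFF:
        raise ValueError("target is out of d16 range")
    return distance & 0xFFFF

def encode_move_w_pc_relative(dn, instruction_address, target_address):
    """MOVE.W (d16, PC), Dn -> [opcode word, displacement word]."""
    # 00 11 = MOVE word | Dn, mode 000 | source mode 111, reg 010 = (d16, PC)
    opcode = 0x3000 | (dn << 9) | 0x003A
    return [opcode, pc_displacement(instruction_address, target_address)]

words = encode_move_w_pc_relative(1, 0x1000, 0x1008)
print([f"${w:04X}" for w in words])   # ['$323A', '$0006']
```

Moving the instruction and MSG together by the same offset leaves both words unchanged, which is what makes the code position independent.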
= $1A3B5
MOVE.W  D1, –(SP)   ; [ 1 ] SP ← SP – 2, M(SP) ← D1(15:0)
MOVE.L  D2, –(SP)   ; [ 2 ] SP ← SP – 4, M(SP) ← D2(31:16)
                    ; [ 3 ] M(SP+2) ← D2(15:0)