Professional Documents
Culture Documents
Chapter 1
Chapter 1
Chapter 1
Computer and Assembly Language
CE-475
Real-Time Embedded Systems
Spring 2023
1
Special thanks to Dr. Yifeng Zhu
Embedded Systems
2
Amazon Warehouse
Kiva Robot
3
Why ARM processor
ARM: Advanced RISC Machine
4
iPhone 11
A13 Bionic:
• 64-bit system on chip (SoC)
• 6 ARMv8.3-A cores
• Introduced on Sept. 10, 2019
ifixit.com 5
Amazon Echo (Alexa)
Texas Instruments
DM3725 Digital
Media Processor
ARM Cortex-A8
6
Kindle HD Fire
Texas Instruments
OMAP 4460 dual-core
ARM processor
http://www.ifixit.com 7
Fitbit Flex
ARM Cortex M3
Microcontroller
8
www.ifixit.com
Samsung Galaxy Gear
STMicroelectronics STM32F401B
ARM-Cortex M4 MCU with
128KB Flash
source: ifixit.com
9
Pebble Smartwatch
source: ifixit.com
10
Oculus VR
11
HTC Vive
STMicroelectronics 32F072R8
ARM Cortex-M0
Microcontroller
source: ifixit.com 12
Nest Learning Thermostat
source: ifixit.com
source: ifixit.com
14
ML Model Evolution Problem
Microphone
IMU
Processor
+ Bluetooth
Microphone
IMU
Processor
+ Bluetooth
Temperature Microphone
+ Humidity
IMU
Processor
I/O (USB)
+ Bluetooth
Temperature Microphone
+ Humidity
Tiny Applications?
Embedded
Systems
TinyML
Machine
Learning
Tiny Applications?
Embedded
Systems
TinyML
Machine
Learning
Tiny Applications?
Embedded
Systems
TinyML
Machine
Learning
Let’s Take an Example
Google Assistant
Let’s Take an Example
Okay, Google!
Step 1
Audio input
from microphone (sensor)
input
complete
The Three Basic Steps
Step 1 Step 2
Audio input Process input
from microphone (sensor) translation, then
execute command
input
complete
The Three Basic Steps
input
complete
Input
input
complete
Endpoints Have Sensors, Tons of Sensors
Non-invasive Glucose
Monitoring
Fingerprint + Photoplethysmography
(PPG)
Endpoints Have Sensors, Tons of Sensors
input
complete
Thinking Big
Thinking Big
Thinking Big
BIG
GPU / CPU
561mm2
Thinking Small
BIG
GPU / CPU
561mm2
Thinking Small
BIG
GPU / CPU
561mm2
Thinking Small
BIG
GPU / CPU SMALL
561mm2
Mobile SoC
83mm2
Thinking Tiny
BIG
GPU / CPU SMALL
561mm2
Mobile SoC
83mm2
Thinking Tiny
BIG
GPU / CPU SMALL
561mm2
Mobile SoC
83mm2
Thinking Tiny
BIG
GPU / CPU SMALL
561mm2
Mobile SoC
83mm2
Thinking Tiny
BIG
GPU / CPU SMALL TINY
561mm2
Apple 0778
Mobile SoC 30mm2
83mm2
We’re just getting started.
Thinking Record-breaking
BIG
GPU / CPU SMALL TINY
561mm2
Apple 0778 Kinetis KL03
Mobile SoC 30mm2 3.2mm2
83mm2
Thinking Record-breaking
BIG
GPU / CPU SMALL TINY
561mm2
Apple 0778 Kinetis KL03
Mobile SoC 30mm2 3.2mm2
83mm2
250 Billion
MCUs today
MCU Demand Forecast
Millions of Units
forecasted
Source: IC Insights
MCU Demand Forecast
Millions of Units
forecasted
Source: IC Insights
MCU Pricing Forecast
forecasted
Source: IC Insights
Comparing Power
BIG SMALL
GPU / CPU
140 μW
Syntiant NDP100
3.64W
Apple A12
300W
NVIDIA Tesla K80
Comparing Power
BIG SMALL
GPU / CPU
140 μW
Syntiant NDP100
3.64W
Apple A12
300W
NVIDIA Tesla K80
Comparing Power
BIG SMALL
GPU / CPU
140 μW
Syntiant NDP100
3.64W
Apple A12
300W
NVIDIA Tesla K80
Comparing Power
140 μW
Syntiant NDP100
input
complete
Output
Output
Output
MCUs enable TinyML
LOW LOW
SIZE
POWER COST
MCUs enable TinyML
LOW LOW
SIZE
POWER COST
MCUs enable TinyML
LOW LOW
SIZE
POWER COST
< 140 μW
Syntiant NDP100
MCUs enable TinyML
LOW LOW
SIZE
POWER COST
LOW LOW
SIZE
POWER COST
HIGH DEMAND
What Makes TinyML?
Embedded
Systems
TinyML
Machine
Learning
What Makes TinyML?
Embedded
Systems
TinyML
Machine
Learning
What Makes TinyML?
Embedded
Systems
TinyML
Machine
Learning
Data Address
Memory 8 bits 32 bits
0xFFFFFFFF
Memory is arranged as a series of “locations”
Each location has a unique “address”
Each location holds a byte (byte-addressable)
e.g. the memory location at address 0x080001B0
contains the byte value 0x70, i.e., 112
The number of locations in memory is limited 70 0x080001B0
BC 0x080001AF
e.g. 4 GB of RAM 18 0x080001AE
1 Gigabyte (GB) = 230 bytes 01 0x080001AD
232 locations ➔ 4,294,967,296 locations! A0 0x080001AC
Values stored at each location can represent
either program data or program instructions
e.g. the value 0x70 might be the code used to tell
the processor to add two values together
0x00000000
Memory 83
Computer Architecture
Von-Neumann Harvard
Instructions and data are stored Data and instructions are stored
in the same memory. into separate memories.
84
Computer Architecture
Von-Neumann Harvard
Instructions and data are stored Data and instructions are stored
in the same memory. into separate memories.
85
ARM Cortex-M Series Family
Von-Neumann Harvard
Instructions and data are stored Data and instructions are stored
in the same memory. into separate memories.
86
Levels of Program Code
C Program Assembly Program Machine Program
0010000100000000
int main(void){ 0010000000000000
int i; 1110000000000001
int total = 0; Compile Assemble 0100010000000001
for (i = 0; i < 10; i++) {
0001110001000000
total += i;
}
0010100000001010
while(1); // Dead loop 1101110011111011
} 1011111100000000
1110011111111110
87
Why should we learn Assembly?
Assembly isn’t “just another language”.
Help you understand how does the processor work
Cost-sensitive applications
Embedded devices, where the size of code is limited, wash machine controller, automobile controllers
88
See a Program Runs
C Code
Assembly Code
int main(void){
int a = 0; MOVS r1, #0x00 ; int a = 0
int b = 1; compiler MOVS r2, #0x01 ; int b = 1
int c; ADDS r3, r1, r2 ; c = a + b
c = a + b; MOVS r0, 0x00 ; set return value
return 0; BX lr ; return
}
Machine Code
0010000100000000 2100 ; MOVS r1, #0x00
0010001000000001 2201 ; MOVS r2, #0x01
0001100010001011 188B ; ADDS r3, r1, r2
0010000000000000 2000 ; MOVS r0, #0x00
0100011101110000 4770 ; BX lr
In Binary In Hex
89
Processor Registers
32 bits
Fastest way to read and write
R0 Registers are within the processor chip
R1
A register stores 32-bit value
R2
Low R3 ARM Cortex-M has
Registers
R4 R0-R12: 13 general-purpose registers
R5 R13: Stack pointer (Shadow of MSP or PSP)
General
R6 Purpose
Register
R14: Link register (LR)
R7
R15: Program counter (PC)
R8
Special registers (xPSR, BASEPRI, PRIMASK, etc)
R9
High
32 bits
Registers R10
R11 xPSR
R12 BASEPRI
Special
R13 (SP) R13 (MSP) R13 (PSP) PRIMASK Purpose
Register
R14 (LR) FAULTMASK
R15 (PC) CONTROL
90
Program Execution
Program Counter (PC) is a register that holds the memory address of the next instruction
to be fetched from the memory.
Memory Address
1. Fetch
instruction at
PC address 4770 0x080001B4
2000 0x080001B2
PC 188B 0x080001B0
2201 0x080001AE
3. Execute 2. Decode 2100 0x080001AC
the the
instruction instruction PC = 0x080001B0
Instruction = 188B or
2000188B or 8B180020
91
Three-state pipeline:
Fetch, Decode, Execution
Pipelining allows hardware resources to be fully utilized
One 32-bit instruction or two 16-bit instructions can be fetched.
92
Three-state pipeline:
Fetch, Decode, Execution
Pipelining allows hardware resources to be fully utilized
One 32-bit instruction or two 16-bit instructions can be fetched.
Clock
93
Machine codes are stored in memory
Data Address
r15 pc
0xFFFFFFFF
r14 lr
r13 sp
r12
r11
r10
r9 4770 0x080001B4
r8 ALU 2000 0x080001B2
r7 188B 0x080001B0
r6 2201 0x080001AE
r5 2100 0x080001AC
r4
r3
r2
r1
r0
0x00000000
Registers CPU
Memory 94
Fetch Instruction: pc = 0x08001AC
Decode Instruction: 2100 = MOVS r1, #0x00
Data Address
r15 0x080001AC pc 0xFFFFFFFF
r14 lr
r13 sp
r12
r11
r10
r9 4770 0x080001B4
r8 ALU 2000 0x080001B2
r7 188B 0x080001B0
r6 2201 0x080001AE
r5 2100 0x080001AC
r4
r3
r2
r1
r0
0x00000000
Registers CPU
Memory 95
Execute Instruction:
MOVS r1, #0x00
Data Address
r15 0x080001AC pc 0xFFFFFFFF
r14 lr
r13 sp
r12
r11
r10
r9 4770 0x080001B4
r8 ALU 2000 0x080001B2
r7 188B 0x080001B0
r6 2201 0x080001AE
r5 2100 0x080001AC
r4
r3
r2
r1 0x00000000
r0
0x00000000
Registers CPU
Memory 96
Fetch Next Instruction: pc = pc + 2
Data Address
r15 0x080001AE pc
0xFFFFFFFF
r14 lr
• Thumb-2 consists of a
mix of 16- & 32-bit r13 sp
instructions r12
• In reality, we always
fetch 4 bytes from the
r11
instruction memory r10
(either one 32-bit
r9 4770 0x080001B4
instruction or two 16-
bit instructions) r8 ALU 2000 0x080001B2
• To simplify the demo, r7 188B 0x080001B0
we assume we only 2201
r6 0x080001AE
fetch 2 bytes from the
instruction memory in r5 2100 0x080001AC
this example. r4
r3
r2
r1 0x00000000
r0
0x00000000
Registers CPU
Memory 97
Fetch Next Instruction: pc = pc + 2
Decode & Execute: 2201 = MOVS r2, #0x01
Data Address
r15 0x080001AE pc
0xFFFFFFFF
r14 lr
r13 sp
r12
r11
r10
r9 4770 0x080001B4
r8 ALU 2000 0x080001B2
r7 188B 0x080001B0
r6 2201 0x080001AE
r5 2100 0x080001AC
r4
r3
r2 0x00000001
r1 0x00000000
r0
0x00000000
Registers CPU
Memory 98
Fetch Next Instruction: pc = pc + 2
Decode & Execute: 188B = ADDS r3, r1, r2
Data Address
r15 0x080001B0 pc
0xFFFFFFFF
r14 lr
r13 sp
r12
r11
r10
r9 4770 0x080001B4
r8 ALU 2000 0x080001B2
r7 188B 0x080001B0
r6 2201 0x080001AE
r5 2100 0x080001AC
r4
r3 0x00000001
r2 0x00000001
r1 0x00000000
r0
0x00000000
Registers CPU
Memory 99
Fetch Next Instruction: pc = pc + 2
Decode & Execute: 2000 = MOVS r0, #0x00
Data Address
r15 0x080001B2 pc
0xFFFFFFFF
r14 lr
r13 sp
r12
r11
r10
r9 4770 0x080001B4
r8 ALU 2000 0x080001B2
r7 188B 0x080001B0
r6 2201 0x080001AE
r5 2100 0x080001AC
r4
r3
r2 0x00000001
r1 0x00000000
r0 0x00000000
0x00000000
Registers CPU
Memory 100
Fetch Next Instruction: pc = pc + 2
Decode & Decode: 4770 = BX lr
Data Address
r15 0x080001B4 pc
0xFFFFFFFF
r14 lr
r13 sp
r12
r11
r10
r9 4770 0x080001B4
r8 ALU 2000 0x080001B2
r7 188B 0x080001B0
r6 2201 0x080001AE
r5 2100 0x080001AC
r4
r3
r2 0x00000001
r1 0x00000000
r0 0x00000000
0x00000000
Registers CPU
Memory 101
Realities
In the previous example,
PC is incremented by 2
102
Realities
PC is always incremented by 4.
Each time, 4 bytes are fetched from the instruction memory
It is either two 16-bit instructions or one 32-bit instruction
If bit [15-11] = 11101, 11110, or 11111, then, it is the first half-word of a 32-bit instruction.
Otherwise, it is a 16-bit instruction.
103
Example:
Calculate the Sum of an Array
int main(void){
int i;
total = 0;
for (i = 0; i < 10; i++) {
total += a[i];
}
while(1);
}
104
Example:
Calculate the Sum of an Array
Instruction Data
Memory (Flash) Memory (RAM)
105
Loading Code and Data into Memory
108
Loading Code and Data into Memory
109