
INTRODUCTION

 ARM is a RISC processor.


 It is used for small size and high
performance applications.
 Simple architecture – low power
consumption.

ARM System - On - Chip


Architecture 2
TIMELINE (1/2)
 1985: Acorn Computer Group manufactures the
first commercial RISC microprocessor.
 1990: Acorn and Apple participation leads to the
founding of Advanced RISC Machines (A.R.M.).
 1991: ARM6, First embeddable RISC
microprocessor.
 1992 – 1994: Various companies use ARM (Sharp,
Samsung), while in 1993 ARM7, the first
multimedia microprocessor is introduced.

TIMELINE (2/2)
 1995: Introduction of Thumb and ARM8.
 1996 – 2000: Alcatel, Huindai, Philips, Sony, use
ΑRM, while in 1999 η ARM cooperates with Erickson
for the development of Bluetooth.
 2000 – 2002: ARM’s share of the 32 – bit embedded
RISC microprocessor market is 80%. ARM Developer
Suite is introduced.

THE ARM
ARCHITECTURE
GENERAL INFO (1/2)
AIM: Simple design

 Load – store architecture


 32 bit data bus
 3 addressing modes

GENERAL INFO (2/2)
Simple architecture + Simple instruction set + Code density
Result: Small size, Low power consumption

Registers
 32-bit general-purpose registers (31 in total, 16 visible at any time)
 7 modes of operation
 Each mode has a different set of visible registers and
a different level of CPSR control.

ARM Programming Model
Registers usable in user mode:
r0 – r12, r13, r14, r15 (PC), CPSR
Banked registers, usable in system modes only:
fiq mode: r8_fiq – r14_fiq, SPSR_fiq
svc mode: r13_svc, r14_svc, SPSR_svc
abort mode: r13_abt, r14_abt, SPSR_abt
irq mode: r13_irq, r14_irq, SPSR_irq
undefined mode: r13_und, r14_und, SPSR_und
CPSR

ARM CPSR format


Bits:  31 30 29 28 | 27 – 8 | 7 6 | 5 | 4 – 0
Field: N  Z  C  V  | unused | I F | T | mode

N: Negative
Z: Zero
C: Carry
V: Overflow
Q: Saturation (for enhanced DSP instructions)
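
A minimal sketch of how the fields of the CPSR format above could be extracted from a 32-bit value. The function name `decode_cpsr` and the sample value are illustrative assumptions; the bit positions follow the layout shown above (N, Z, C, V in bits 31 to 28, I, F, T in bits 7 to 5, mode in bits 4 to 0).

```python
def decode_cpsr(cpsr: int) -> dict:
    """Split a 32-bit CPSR value into the fields of the format above."""
    return {
        "N": (cpsr >> 31) & 1,   # negative
        "Z": (cpsr >> 30) & 1,   # zero
        "C": (cpsr >> 29) & 1,   # carry
        "V": (cpsr >> 28) & 1,   # overflow
        "I": (cpsr >> 7) & 1,    # bit 7
        "F": (cpsr >> 6) & 1,    # bit 6
        "T": (cpsr >> 5) & 1,    # bit 5 (Thumb state)
        "mode": cpsr & 0x1F,     # bits 4..0
    }


# Hypothetical sample value: Z and C set, I and F set, mode bits 0b10011.
fields = decode_cpsr(0x600000D3)
print(fields)
```

The sample value is chosen only to exercise every field; it is not taken from the slides.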
Memory Organization
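 Address bus: 32 bits
 1 word = 32 bits
[Figure: memory layout, bit 31 on the left and bit 0 on the right, four byte addresses per row]
Addresses 23 – 20: (not labelled)
Addresses 19 – 16: word16
Addresses 15 – 12: half-word14, half-word12
Addresses 11 – 8: word8
Addresses 7 – 4: byte6 (address 6), half-word4 (addresses 5 – 4)
Addresses 3 – 0: byte3, byte2, byte1, byte0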

Instruction Set
 Three instruction types
 Data processing
 Data transfer
 Control flow

Supervisor mode
 Operations outside user privileges are handled by
the operating system.
 Through “supervisor calls”, a user program enters
supervisor mode, where system functions can be
performed.

I/O System
 ARM handles peripherals as “memory mapped
devices with interrupt support”.
 Interrupts:
 IRQ: normal interrupt
 FIQ: fast interrupt

Exceptions
 Exceptions:
 Interrupts
 Supervisor Call
 Traps
 When an exception takes place:
 The value of PC is copied to r14_exc
 The operating mode changes into the respective
exception mode.
 The PC takes the exception handler vector
address.
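
The three steps above can be sketched as a small state update. This is a simplified model, not a description of the hardware: the names `take_exception`, `VECTORS` and `EXC_MODE` are hypothetical, and details such as saving the CPSR to SPSR_exc and the exact return-address offset are left out. The vector addresses are the standard ARM exception vectors.

```python
# Standard ARM exception vector addresses.
VECTORS = {
    "reset": 0x00, "undefined": 0x04, "swi": 0x08,
    "prefetch_abort": 0x0C, "data_abort": 0x10,
    "irq": 0x18, "fiq": 0x1C,
}

# Mode entered by each exception.
EXC_MODE = {
    "reset": "svc", "undefined": "und", "swi": "svc",
    "prefetch_abort": "abt", "data_abort": "abt",
    "irq": "irq", "fiq": "fiq",
}


def take_exception(state: dict, kind: str) -> dict:
    """Return the processor state after taking exception `kind`."""
    mode = EXC_MODE[kind]
    new = dict(state)
    new["r14_" + mode] = state["pc"]   # 1. PC is copied to r14_exc
    new["mode"] = mode                 # 2. switch to the exception mode
    new["pc"] = VECTORS[kind]          # 3. PC takes the vector address
    return new


after = take_exception({"pc": 0x8004, "mode": "usr"}, "irq")
print(after)
```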
THE ARM
INSTRUCTION SET
Data Processing Instructions (1/2)
 Arithmetic Operations
ADD r0, r1, r2 ; r0:= r1+r2 and don’t update flags
ADDS r0, r1, r2 ; r0:= r1+r2 and update flags
 Logical Operations
AND r0, r1, r2 ; r0:= r1 AND r2
 Register Movement
MOV r0, r2 ; r0:= r2
 Comparison
CMP r1, r2 ; set flags on r1 - r2
Data Processing Instructions (2/2)
 Operands:
 Immediate operands
ADD r3, r3, #1 ; r3:= r3+1
 Shifted register operands:
ADD r3, r2, r1, LSL #3 ; r3:= r2 + 8*r1
 Miscellaneous data processing instructions:
 Multiplication:
MUL r4, r3, r2 ; r4:= r3*r2

Data transfer instructions
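 Load and store instructions:
LDR r0, [r1] ; r0:= mem32[r1]
STR r0, [r1] ; mem32[r1]:= r0
 Offset: LDR r0, [r1,#4] ; r0:= mem32[r1+4]
 Post – indexed: LDR r0, [r1], #16 ; r0:= mem32[r1], then r1:= r1+16
 Auto – indexed: LDR r0, [r1,#16]! ; r0:= mem32[r1+16], r1:= r1+16
 Multiple data transfers:
LDMIA r1, {r0,r2,r5} ; r0:= mem32[r1], r2:= mem32[r1+4], r5:= mem32[r1+8]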

Examples
 PRE:
 r0 = 0x00000000
 r1 = 0x00009000
 mem32[0x00009000] = 0x01010101
 mem32[0x00009004] = 0x02020202
 LDR r0, [r1, #4]!
 POST:
 r0 = 0x02020202
 r1 = 0x00009004

Examples
 PRE:
 r0 = 0x00000000
 r1 = 0x00009000
 mem32[0x00009000] = 0x01010101
 mem32[0x00009004] = 0x02020202
 LDR r0, [r1, #4]
 POST:
 r0 = 0x02020202
 r1 = 0x00009000

Examples
 PRE:
 r0 = 0x00000000
 r1 = 0x00009000
 mem32[0x00009000] = 0x01010101
 mem32[0x00009004] = 0x02020202
 LDR r0, [r1], #4
 POST:
 r0 = 0x01010101
 r1 = 0x00009004
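
The three LDR examples above differ only in when the offset is applied and whether the base register is written back. A minimal sketch that reproduces their PRE and POST states, assuming a dictionary-based register file and memory; the function `ldr` and its `mode` names are hypothetical helpers, not ARM syntax.

```python
def ldr(regs: dict, mem: dict, rd: str, rn: str, offset: int = 0,
        mode: str = "offset") -> dict:
    """Model LDR rd, [rn, #offset] in one of three addressing modes.

    mode = "offset": LDR rd, [rn, #offset]    (base unchanged)
    mode = "pre_wb": LDR rd, [rn, #offset]!   (base updated)
    mode = "post":   LDR rd, [rn], #offset    (load first, then update base)
    """
    regs = dict(regs)                  # do not modify the caller's state
    base = regs[rn]
    addr = base if mode == "post" else base + offset
    regs[rd] = mem[addr]
    if mode in ("pre_wb", "post"):
        regs[rn] = base + offset       # write the new address back
    return regs


pre = {"r0": 0x00000000, "r1": 0x00009000}
mem = {0x00009000: 0x01010101, 0x00009004: 0x02020202}

for mode in ("pre_wb", "offset", "post"):
    post = ldr(pre, mem, "r0", "r1", 4, mode)
    print(mode, hex(post["r0"]), hex(post["r1"]))
```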

Examples
 PRE:
 mem32[0x80018] = 0x03
 mem32[0x80014] = 0x02
 mem32[0x80010] = 0x01
 r0 = 0x00080010
 LDMIA r0!, {r1-r3}
 POST:
 r0 = 0x0008001c
 r1 = 0x00000001
 r2 = 0x00000002
 r3 = 0x00000003

Examples
 PRE:
 mem32[0x8001c] = 0x04
 mem32[0x80018] = 0x03
 mem32[0x80014] = 0x02
 mem32[0x80010] = 0x01
 r0 = 0x00080010
 LDMIB r0!, {r1-r3}
 POST:
 r0 = 0x0008001c
 r1 = 0x00000002
 r2 = 0x00000003
 r3 = 0x00000004
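
The difference between LDMIA (increment after) and LDMIB (increment before) in the two examples above can be sketched as follows. The function `ldm` and its parameters are hypothetical; registers are identified by number and the lowest-numbered register is loaded from the lowest address.

```python
def ldm(regs: dict, mem: dict, rn: int, reglist: list,
        before: bool = False, writeback: bool = True) -> dict:
    """Model LDMIA (before=False) or LDMIB (before=True) with 32-bit words."""
    regs = dict(regs)
    addr = regs[rn]
    for r in sorted(reglist):
        if before:
            addr += 4          # IB: step to the next word, then load
        regs[r] = mem[addr]
        if not before:
            addr += 4          # IA: load, then step to the next word
    if writeback:
        regs[rn] = addr        # the "!" suffix updates the base register
    return regs


mem = {0x80010: 0x01, 0x80014: 0x02, 0x80018: 0x03, 0x8001C: 0x04}
pre = {0: 0x00080010}

ia = ldm(pre, mem, 0, [1, 2, 3], before=False)   # LDMIA r0!, {r1-r3}
ib = ldm(pre, mem, 0, [1, 2, 3], before=True)    # LDMIB r0!, {r1-r3}
print(ia, ib)
```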

Conditional execution
Instructions can be executed
conditionally without branches
CMP r2, r3 ; subtract and set flags
ADDGE r4, r5, r6 ; if r2 >= r3
SUBLT r4, r5, r6 ; else (r2 < r3)
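
A sketch of how the flags set by CMP decide which of the two conditional instructions above executes. The helper names are hypothetical; the flag computation models a 32-bit subtraction, and GE and LT are the signed conditions N == V and N != V.

```python
MASK = 0xFFFFFFFF


def cmp_flags(a: int, b: int) -> tuple:
    """Return (N, Z, C, V) for the 32-bit subtraction a - b, as CMP would."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    n = result >> 31
    z = int(result == 0)
    c = int(a >= b)                           # carry set means no borrow
    v = ((a ^ b) & (a ^ result)) >> 31        # signed overflow
    return n, z, c, v


def add_or_sub(r2: int, r3: int, r5: int, r6: int) -> int:
    """CMP r2, r3 / ADDGE r4, r5, r6 / SUBLT r4, r5, r6; returns r4."""
    n, z, c, v = cmp_flags(r2, r3)
    r4 = 0
    if n == v:                                # GE: signed r2 >= r3
        r4 = (r5 + r6) & MASK
    if n != v:                                # LT: signed r2 < r3
        r4 = (r5 - r6) & MASK
    return r4


print(add_or_sub(5, 3, 10, 4), add_or_sub(3, 5, 10, 4))
```

Exactly one of the two conditions holds for any flag combination, so no branch is needed to choose between the addition and the subtraction.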

Conditional execution mnemonics

Control flow instructions
 Branch instruction: B label
 Conditional branch: BNE label
 Branch and Link: BL label
BL loop ; call, return address saved in r14
… …
loop … …
… …
MOV PC, r14 ; return

Example 1
        AREA    ARMex, CODE, READONLY   ; Name this block of code ARMex
        ENTRY                           ; Mark first instruction to execute
start
        MOV     r0, #10                 ; Set up parameters
        MOV     r1, #3
        ADD     r0, r0, r1              ; r0 = r0 + r1
stop
        MOV     r0, #0x18               ; angel_SWIreason_ReportException
        LDR     r1, =0x20026            ; ADP_Stopped_ApplicationExit
        SWI     0x123456                ; ARM semihosting SWI
        END                             ; Mark end of file

Example 2
        AREA    subrout, CODE, READONLY ; Name this block of code
        ENTRY                           ; Mark first instruction to execute
start   MOV     r0, #10                 ; Set up parameters
        MOV     r1, #3
        BL      doadd                   ; Call subroutine
stop
        MOV     r0, #0x18               ; angel_SWIreason_ReportException
        LDR     r1, =0x20026            ; ADP_Stopped_ApplicationExit
        SWI     0x123456                ; ARM semihosting SWI
doadd
        ADD     r0, r0, r1              ; Subroutine code
        MOV     pc, lr                  ; Return from subroutine
        END                             ; Mark end of file

ARM ORGANIZATION AND
IMPLEMENTATION
3 – Stage Pipeline (ARM7 – 80MHz)
 Fetch
 Decode
 Execute
 Throughput: 1 instruction / cycle
[Figure: ARM7 datapath with address register, PC incrementer, register bank, multiplier, barrel shifter, ALU, data out register, data in register, and instruction decode & control; A bus, B bus and ALU bus; A[31:0] and D[31:0] ports]
5 – stage pipeline (1/2)
 Program execution time:
$T_{prog} = \frac{N_{inst} \times CPI}{f_{clk}}$
 Ways to reduce $T_{prog}$:
 Increase $f_{clk}$: logic simplification
 Reduce CPI: reduce the number of
multicycle instructions.
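
A short worked example of the execution-time relation above (instruction count times CPI, divided by clock frequency). The function name `program_time`, the instruction count and the CPI values are illustrative assumptions, not figures from the slides; only the 80 MHz and 150 MHz clock rates come from the pipeline slides.

```python
def program_time(n_inst: int, cpi: float, f_clk_hz: float) -> float:
    """Execution time in seconds: T_prog = N_inst * CPI / f_clk."""
    return n_inst * cpi / f_clk_hz


# Assumed workload of one million instructions and assumed CPI values.
t_3_stage = program_time(1_000_000, 1.9, 80e6)    # 3-stage pipeline at 80 MHz
t_5_stage = program_time(1_000_000, 1.5, 150e6)   # 5-stage pipeline at 150 MHz

print(f"3-stage: {t_3_stage * 1e3:.2f} ms, 5-stage: {t_5_stage * 1e3:.2f} ms")
```

Both levers appear in the comparison: the higher clock rate shrinks the denominator and the lower CPI shrinks the numerator.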

5 – stage pipeline (ARM9 – 150MHz) (2/2)
 Fetch
 Decode
 Execute
 Buffer / Data
 Write - Back
ARM coprocessor interface
 ARM supports up to 16 coprocessors, which
can be software emulated.
 Each coprocessor has up to 16 general-purpose
registers.
 ARM is a load and store architecture.
 Coprocessors usually handle on – chip
functions, such as cache and memory
management.

ARCHITECTURAL SUPPORT FOR
HIGH – LEVEL LANGUAGES
Floating - point accelerator (1/2)

 For floating-point operations, ARM has the FPE
software emulator and the FPA10 hardware floating-point
accelerator.
 FPA10 includes:
 Coprocessor interface
 Load / store unit
 Register bank (eight 80-bit registers)
 ALU (adder, mult, div)

Floating - point accelerator (2/2)
[Figure: FPA10 organization: data bus, load/store unit, pipeline control, instruction issuer, register bank, coprocessor handshake interface, and arithmetic unit (add, mult, div)]

APCS (1/2)
 APCS (ARM Procedure Call Standard) is a set of
rules governing how procedures are called and how they return.
 Specific use of general purpose registers (r0 –
r3: arguments, r4 – r8: variables, r10: stack limit,
etc.).
 Procedure I/O:

BL Loop

Loop …
MOV pc, lr

APCS (2/2)
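C code:
void f1(int a) {
    f2(a);
}
Assembly code:
f1      LDR     r0, [r13]               ; load argument a from the stack
        STR     r14, [r13, #-4]!        ; push the return address
        STR     r0, [r13, #-4]!         ; push the argument for f2
        BL      f2
        ADD     r13, r13, #4            ; discard the pushed argument
        LDR     r15, [r13], #4          ; pop the return address into the PC
[Figure: stack layout at offsets 0, 4, 8 and 16 from the stack pointer]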
THUMB PROGRAMMER’S
MODEL
General information
 Thumb objective:
Code density.
 Thumb has a 16 – bit instruction set.
 A subset of the ARM instruction set is coded to a
16–bit space
 With appropriate use great benefits can be
achieved in terms of
 Power efficiency
 Enhanced performance

Going in and out of Thumb mode
 Using the BX instruction, in ARM state:
e.g. BX r0
 Instructions are assembled as 16 – bit Thumb
instructions after the appropriate directive (CODE16).
 If r0[0] is 1, the T bit in the CPSR becomes 1
and the PC is set to the address obtained from
the remaining bits of r0.
 Using the BX instruction from Thumb state
(with bit 0 of the register clear), we return to ARM state.
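
The state switch performed by BX can be sketched as below. The function name `bx` is a hypothetical helper; it returns the new PC and the new value of the T bit for a given register value.

```python
def bx(target: int) -> tuple:
    """Model BX Rm: bit 0 of the register selects the state.

    Returns (pc, t_bit). t_bit == 1 means Thumb state, 0 means ARM state.
    """
    t_bit = target & 1                 # r0[0] becomes the T bit
    pc = target & 0xFFFFFFFE           # remaining bits give the address
    return pc, t_bit


print(bx(0x00008001))   # odd address: enter Thumb state at 0x8000
print(bx(0x00008000))   # even address: ARM state at 0x8000
```

This is why Example 3 later in the deck uses `start + 1` as the branch target: the extra 1 only selects Thumb state and is not part of the address.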

The Thumb programmer’s model
 Thumb registers
Lo registers: r0 – r7
Hi registers: r8 – r12
SP (r13), LR (r14), PC (r15)
CPSR
[In the original figure, shaded registers have restricted access]

ARM vs. Thumb (1/3)
 Thumb
 Up to 70% code size reduction
 40% more instructions
 45% faster code with 16-bit memory
 Requires about 30% less external memory
 ARM
 40% faster code when coupled with a 32-bit memory
ARM vs. Thumb (2/3)
 If performance is critical: ARM
 If cost and power consumption are critical: Thumb

ARM and Thumb interaction
 A 32 – bit ARM system can go into Thumb mode
for specific routines, in order to meet power and
memory constraints.
 A 16 – bit system: Can use an on – chip, 32 – bit
memory for ARM state routines, and a 16-bit off
– chip memory and Thumb code for the rest of
the application.

Example 3
        AREA    ThumbSub, CODE, READONLY ; Name this block of code
        ENTRY                           ; Mark first instruction to execute
        CODE32                          ; Subsequent instructions are ARM
header  ADR     r0, start + 1           ; Processor starts in ARM state,
        BX      r0                      ; so small ARM code header used
                                        ; to call Thumb main program
        CODE16                          ; Subsequent instructions are Thumb
start
        MOV     r0, #10                 ; Set up parameters
        MOV     r1, #3
        BL      doadd                   ; Call subroutine
stop
        MOV     r0, #0x18               ; angel_SWIreason_ReportException
        LDR     r1, =0x20026            ; ADP_Stopped_ApplicationExit
        SWI     0xAB                    ; Thumb semihosting SWI
doadd
        ADD     r0, r0, r1              ; Subroutine code
        MOV     pc, lr                  ; Return from subroutine
        END                             ; Mark end of file

Example 4
 Implement the following pseudocode in ARM
and Thumb assembly. Which is more efficient
in terms of execution time and which in terms
of code size?
if r1 > r2 then
    r3 = r4 + r5
    r6 = r4 - r5
else
    r3 = r4 - r5
    r6 = r4 + r5
Example 5
 Write an ARM assembly program that
loads data from memory location 0x40,
sets bits 3 to 5, clears bits 0 to 2 and
leaves the remaining bits unchanged.
 Test it using 0xAD as input data
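
To check a solution to the exercise above, the expected result for the test input can be worked out with a small model. This is a sketch of the bit manipulation only, not the requested ARM program; the name `set_and_clear` and the dictionary used as memory are assumptions. In ARM assembly the two steps would correspond to an ORR with mask 0x38 and a BIC with mask 0x07.

```python
SET_MASK = 0x38      # bits 3 to 5
CLEAR_MASK = 0x07    # bits 0 to 2


def set_and_clear(value: int) -> int:
    """Set bits 3-5, clear bits 0-2, leave all other bits unchanged."""
    value |= SET_MASK                    # like ORR r0, r0, #0x38
    value &= ~CLEAR_MASK & 0xFFFFFFFF    # like BIC r0, r0, #0x07
    return value


memory = {0x40: 0xAD}                    # test input stored at location 0x40
result = set_and_clear(memory[0x40])
print(hex(result))
```

For the input 0xAD (binary 1010 1101) the result is 0xB8 (binary 1011 1000).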

ARCHITECTURAL SUPPORT
FOR SYSTEM
DEVELOPMENT
The ARM memory interface
[Figure: A basic ARM memory system]
AMBA (1/4)
 Advanced Microcontroller Bus Architecture (AMBA)
 Advanced High – Performance Bus (AHB)
 Advanced System Bus (ASB)
 Advanced Peripheral Bus (APB)
 AMBA objectives:
 Technology – independence
 To encourage modular system design

AMBA (2/4)
 A typical AMBA – based system

AMBA (3/4)
 AHB
 Burst transaction
 Split transaction
 Data bus 64 – 128 bit
[Figure: AHB system with a bus arbiter, masters 1 to 3, slaves 1 to 3 and a decoder, connected by address, write data and read data paths]

AMBA (4/4)
 AMBA Design Kit (ADK)
 An environment that assists designers in developing
AMBA-based components and SoC designs.

Signal Processing Support (1/2)

 Piccolo DSP coprocessor.


 Various data memories for maximizing
throughput.

Signal Processing Support (2/2)
 Piccolo
[Figure: Piccolo organization: ARM7TDMI, I cache, input buffer, register bank, output buffer, ALU, mult, decode and control, with AMBA interfaces to the AMBA bus]
MEMORY HIERARCHY
Memory hierarchy
Memory type        Size               Speed
Registers          32 – bit           A few nsec
On – chip cache    8 – 32 kbytes      10 nsec
Off – chip cache   100 – 200 kbytes   10 – 30 nsec
RAM                Mbytes             100 nsec
(Going down the table: larger size, lower speed)
On – chip memory
 Necessary for performance
 Some systems prefer on – chip RAM to an on – chip
cache: it is simpler, cheaper and less power-hungry.

Cache types
 Cache types:
 Unified cache.
 Separate instruction and data caches.
 Performance: hit rate ($h$) – miss rate ($1 - h$)
$t_{av} = h \, t_{cache} + (1 - h) \, t_{main}$
 Compulsory miss: the first time an address is accessed
 Capacity miss: when the cache is full
 Conflict miss: two addresses compete for the same place in
the cache
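
A worked example of the average access time relation above (hit rate times cache access time, plus miss rate times main memory access time). The function name and the 95% hit rate are assumptions for illustration; the 10 nsec and 100 nsec access times are the on-chip cache and RAM figures from the memory hierarchy table.

```python
def average_access_time(hit_rate: float, t_cache: float, t_main: float) -> float:
    """t_av = h * t_cache + (1 - h) * t_main (same time unit in and out)."""
    return hit_rate * t_cache + (1 - hit_rate) * t_main


# Assumed hit rate of 95%, cache at 10 nsec, main memory at 100 nsec.
t_av = average_access_time(0.95, 10.0, 100.0)
print(f"average access time: {t_av:.1f} nsec")
```

With these numbers the average is about 14.5 nsec, so even a 5% miss rate accounts for roughly a third of the average access time.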
Replacement policy -implementation

 Least Recently Used (LRU)


 Least Frequently Used (LFU)
 Data prediction

 Fully-associative
 Direct-mapped
 Set-associative
Direct – mapped cache (1/2)

 A line of data is stored together with a tag of its memory address

Direct – mapped cache (2/2)

 Each memory location has a specific place in the cache.
 Tag and data can be accessed at the
same time.
 The tag RAM is smaller than the data RAM and
has a shorter access time, allowing the
comparison to complete before
accessing the data RAM.
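
A sketch of how a direct-mapped cache could derive the "specific place" for each memory location by splitting the address into tag, index and byte offset. The cache geometry (512 lines of 16 bytes, i.e. 8 kbytes) and the function name `split_address` are assumptions chosen for illustration, not values from the slides.

```python
def split_address(addr: int, line_bytes: int = 16, num_lines: int = 512) -> tuple:
    """Split an address into (tag, index, offset) for a direct-mapped cache.

    line_bytes and num_lines must be powers of two.
    """
    offset_bits = line_bytes.bit_length() - 1      # 16 bytes  -> 4 bits
    index_bits = num_lines.bit_length() - 1        # 512 lines -> 9 bits
    offset = addr & (line_bytes - 1)               # byte within the line
    index = (addr >> offset_bits) & (num_lines - 1)  # which cache line
    tag = addr >> (offset_bits + index_bits)       # stored in the tag RAM
    return tag, index, offset


a = split_address(0x00009004)
b = split_address(0x0000B004)
print(a, b)
```

The two sample addresses map to the same index with different tags, so in this assumed geometry they compete for the same cache line, which is the conflict-miss situation.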
Set associative cache (1/3)
 2 – way set – associative cache
Set associative cache (2/3)
 A set – associative cache has a number of
sets, yielding an n – way set – associative cache.
 Two addresses that would be competing for
the same spot in a direct mapped cache, can
be stored in different locations and accessed
independently.

Set associative (3/3)
 Set selection:
 Random allocation
 Least recently used (LRU)
 Round – robin (cyclic)

Fully associative (1/2)
[Figure: fully associative cache: address, tag CAM, data RAM, mux, with hit and data outputs]
Write strategies
 Write – through
All write operations are passed to main memory
 Write – through with buffered write
Write operations are passed to main memory
through the write buffer
 Copy – back (write – back)
Write operations update only the cache.

Cache feature summary
Organizational feature    Options
Cache-MMU relationship    Physical cache | Virtual cache
Cache contents            Unified instruction and data cache | Separate instruction and data caches
Associativity             Direct-mapped (RAM-RAM) | Set-associative (RAM-RAM) | Fully associative (CAM-RAM)
Replacement strategy      Cyclic | Random | LRU
Write strategy            Write-through | Write-through with write buffer | Copy-back

‘Perfect’ cache performance

Cache form                    Performance
No cache                      1
Instruction-only cache        1.95
Instruction and data cache    2.5
Data-only cache               1.13

MMU (1/3)
Two memory management approaches:
 Segmentation
 Paging

MMU (2/3)
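 Segmented memory management:
[Figure: the segment selector picks an entry (base, limit) from the segment descriptor table; base + logical address gives the physical address; a logical address greater than the limit raises an access fault]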
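
The segmented translation shown above (add the base, compare against the limit) can be sketched as follows. The names `translate_segment` and `AccessFault`, and the table contents, are hypothetical.

```python
class AccessFault(Exception):
    """Raised when a logical address exceeds the segment limit."""


def translate_segment(descriptor_table: dict, selector: int, logical: int) -> int:
    """Return the physical address for (selector, logical address)."""
    base, limit = descriptor_table[selector]   # look up the segment descriptor
    if logical > limit:                        # the ">?" comparison
        raise AccessFault(f"address {logical:#x} exceeds limit {limit:#x}")
    return base + logical                      # the "+" adder


# Assumed table: segment 0 starts at 0x10000 and allows offsets up to 0xFFF.
table = {0: (0x10000, 0xFFF)}
print(hex(translate_segment(table, 0, 0x20)))
```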

MMU (3/3)
 Paging memory management:
[Figure: two-level paging: logical address bits 31 – 22 index the page directory, bits 21 – 12 index the page table, and bits 11 – 0 select the data within the page frame]
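
A sketch of the two-level lookup shown above, using the 10 / 10 / 12 bit split of the logical address. The function names and the nested-dictionary page tables are assumptions for illustration; a real MMU walks tables held in memory.

```python
def split_logical(addr: int) -> tuple:
    """Split a 32-bit logical address into (directory, table, offset)."""
    directory = (addr >> 22) & 0x3FF   # bits 31..22
    table = (addr >> 12) & 0x3FF       # bits 21..12
    offset = addr & 0xFFF              # bits 11..0
    return directory, table, offset


def translate_page(page_directory: dict, addr: int) -> int:
    """Walk page directory, then page table, then add the offset."""
    directory, table, offset = split_logical(addr)
    page_table = page_directory[directory]     # first-level lookup
    frame_base = page_table[table]             # second-level lookup
    return frame_base + offset


# Assumed mapping: directory entry 1, table entry 3 -> page frame at 0x80000.
page_directory = {1: {3: 0x00080000}}
print(hex(translate_page(page_directory, 0x00403004)))
```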

ARCHITECTURAL SUPPORT
FOR OPERATING SYSTEMS
[Figure: ARM1136JF-based SoC. ARM1136JF core with ETM (external trace port, clock analyser); VIC (PL192) with 14 external interrupts; DMAC (PL080) with 8 external DMA requests; CLCD (PL110) driving a CLCD display; timers, watchdog and RTC (PL031); system control with external reset and battery fail inputs; AHB/APB bridges; bus matrix with 8 64-bit AHBs (1. ARM Periph AHB, 2. ARM D Write AHB, 3. ARM D Read AHB, 4. ARM I AHB, 5. ARM DMA AHB, 6. CLCD AHB, 7. DMA 2 AHB, 8. DMA 1 AHB); MPMC (PL176) for SDRAM & DDR; SMC (PL093) for static memory; 2x UARTs (PL011); GPIO (PL061) with 32 GPIO lines; SSP (PL022); SCI (PL131) smart card interface (UICC compliant)]
CP15
 On – chip coprocessor for MMU, cache,
protection unit control.
 Control takes place through registers with
instructions executed in supervisor mode.

Protection Unit
 Simpler alternative to the MMU.
Requires simpler software and
hardware.
 Does not use translation tables, but 8
protection regions instead.

ARM DEVELOPER SUITE
ARMULATOR (1/2)
 ARMulator: Emulator of various ARM
processors.
 Allows project development in C, C++
or Assembly.
 Together with a debugger, compilers and an
assembler, it forms the
ARM Developer Suite (ADS).

ARMULATOR (2/2)
 Possible project options:
 ARM and Thumb Interworking
 Mixing C, C++ and Assembly
 Code for ROM
 Exception handlers

ARMULATOR TUTORIAL
 CODEWARRIOR ENVIRONMENT
