Chapter 03 PDF
Hardware/Software Introduction
• General-Purpose Processor
– Processor designed for a variety of computation tasks
– Low unit cost, in part because manufacturer spreads NRE
over large numbers of units
• Motorola sold half a billion 68HC05 microcontrollers in 1996 alone
– Carefully designed since higher NRE is acceptable
• Can yield good performance, size and power
– Low NRE cost, short time-to-market/prototype, high
flexibility
• User just writes software; no processor design
– a.k.a. “microprocessor” – “micro” used when they were
implemented on one or a few chips rather than entire rooms
Basic Architecture
Datapath Operations
• Load
– Read memory location into register
• ALU operation
– Input certain registers through ALU, store back in register
• Store
– Write register to memory location
[Figure: processor containing a control unit (controller, PC, IR) and a datapath (ALU with +1, registers) linked by control/status signals, connected to I/O and memory; memory holds the values 10 and 11]
Control Unit
• Control unit: configures the datapath operations
– Sequence of desired operations (“instructions”) stored in memory – “program”
• Instruction cycle – broken into several sub-operations, each one clock cycle, e.g.:
– Fetch: Get next instruction into IR
– Decode: Determine what the instruction means
– Fetch operands: Move data from memory to datapath register
– Execute: Move data through the ALU
– Store results: Write data from register to memory
[Figure: processor with control unit (controller, PC, IR) and datapath (ALU, registers R0 and R1), connected to I/O and memory. Memory contents: 100 load R0, M[500]; 101 inc R1, R0; 102 store M[501], R1; 500 holds 10; 501 is the destination]
Control Unit Sub-Operations
• Fetch
– Get next instruction into IR
– PC: program counter, always points to next instruction
– IR: holds the fetched instruction
• Decode
– Determine what the instruction means
• Fetch operands
– Move data from memory to datapath register
• Execute
– Move data through the ALU
– This particular instruction does
[Figure: instruction cycles. Each instruction passes through Fetch, Decode, Fetch ops, Exec., and Store results, one sub-operation per clk cycle. PC=100: load R0, M[500] (R0 becomes 10). PC=101: inc R1, R0 (R1 becomes 11). PC=102: store M[501], R1 (M[501] becomes 11). Memory contents: 100 load R0, M[500]; 101 inc R1, R0; 102 store M[501], R1; 500 holds 10; 501 holds 11]
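The three-instruction walk-through above can be sketched as a small C routine. This is only an illustration under stated assumptions: the opcode names, the `instr` layout, and `run_program` are hypothetical, the program is kept in its own array rather than at memory addresses 100 to 102, and every instruction is charged five clock cycles (one per sub-operation), as the slide describes.

```c
#include <assert.h>

/* Hypothetical opcodes for the three instructions in the example program. */
typedef enum { OP_LOAD, OP_INC, OP_STORE } opcode;

typedef struct {
    opcode op;
    int reg_dst;   /* destination register index (LOAD, INC) */
    int reg_src;   /* source register index (INC, STORE) */
    int addr;      /* memory address (LOAD, STORE) */
} instr;

enum { MEM_SIZE = 1024, NUM_REGS = 2, CYCLES_PER_INSTR = 5 };

/* Runs `count` instructions and returns the number of clock cycles used,
 * assuming one cycle for each of the five sub-operations. */
int run_program(const instr *program, int count, int *mem, int *regs)
{
    int pc = 0;
    int cycles = 0;

    while (pc < count) {
        instr ir = program[pc];   /* Fetch: get next instruction into IR */
        pc = pc + 1;              /* PC now points to the next instruction */

        switch (ir.op) {          /* Decode: determine what the instruction means */
        case OP_LOAD:             /* Fetch operands: memory -> register */
            assert(ir.addr >= 0 && ir.addr < MEM_SIZE);
            regs[ir.reg_dst] = mem[ir.addr];
            break;
        case OP_INC:              /* Execute: move data through the ALU (+1) */
            regs[ir.reg_dst] = regs[ir.reg_src] + 1;
            break;
        case OP_STORE:            /* Store results: register -> memory */
            assert(ir.addr >= 0 && ir.addr < MEM_SIZE);
            mem[ir.addr] = regs[ir.reg_src];
            break;
        }
        cycles += CYCLES_PER_INSTR;
    }
    return cycles;
}
```

Running it on load R0, M[500]; inc R1, R0; store M[501], R1 with M[500] = 10 leaves 11 in M[501], matching the figure, and reports 15 clock cycles under the five-cycles-per-instruction assumption.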
Architectural Considerations
register delay in entire processor
– Memory access is often the longest
Pipelining: Increasing Instruction Throughput
[Figure: non-pipelined vs. pipelined dish cleaning, with Wash and Dry stages for dishes 1 to 8]
[Figure: pipelined instruction execution, with stages Fetch-instr., Decode, Execute, and Store res. for instructions 1 to 8, plotted against Time]
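The throughput gain in the figure can be checked with a little arithmetic. The sketch below assumes an ideal pipeline: every stage takes one time step, and there are no stalls. The function names are hypothetical.

```c
#include <assert.h>

/* Time steps needed when each item finishes all stages before the next starts. */
int cycles_non_pipelined(int items, int stages)
{
    assert(items >= 0 && stages > 0);
    return items * stages;
}

/* Time steps needed when a new item enters the first stage every step:
 * the first item takes `stages` steps, each later item adds one more. */
int cycles_pipelined(int items, int stages)
{
    assert(items >= 0 && stages > 0);
    if (items == 0) {
        return 0;
    }
    return stages + (items - 1);
}
```

For the 8 dishes and 2 stages (wash, dry) in the figure, this gives 16 time steps non-pipelined and 9 pipelined. For 8 instructions on a five-stage instruction cycle, it gives 40 versus 12.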
Superscalar and VLIW Architectures
Two Memory Architectures
• Princeton
– Fewer memory wires
• Harvard
– Simultaneous program and data memory access
[Figure: Harvard: processor connected to separate program memory and data memory; Princeton: processor connected to a single memory (program and data)]
Cache Memory
Programmer’s View
Assembly-Level Instructions
[Figure: each instruction (Instruction 1, ...) consists of an opcode, operand1, and operand2]
• Instruction Set
– Defines the legal set of instructions for that processor
• Data transfer: memory/register, register/register, I/O, etc.
• Arithmetic/logical: move register through ALU and back
• Branches: determine next PC value when not just PC+1
A Simple (Trivial) Instruction Set
Addressing Modes
Addressing mode | Operand field | Register-file contents | Memory contents
Immediate | Data | |
Register-direct | Register address | Data |
Register indirect | Register address | Memory address | Data
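The three modes that survive in the table can be sketched in C as follows. This is an illustrative model only: the enum names, `fetch_operand`, and the array sizes are hypothetical, and the register file and memory are plain integer arrays.

```c
#include <assert.h>

enum { RF_SIZE = 16, MEM_WORDS = 256 };

typedef enum {
    MODE_IMMEDIATE,
    MODE_REGISTER_DIRECT,
    MODE_REGISTER_INDIRECT
} addr_mode;

/* Returns the data an instruction operates on, given its operand field. */
int fetch_operand(addr_mode mode, int operand_field,
                  const int *rf, const int *mem)
{
    switch (mode) {
    case MODE_IMMEDIATE:
        /* The operand field itself is the data. */
        return operand_field;
    case MODE_REGISTER_DIRECT:
        /* The operand field is a register address; the register holds the data. */
        assert(operand_field >= 0 && operand_field < RF_SIZE);
        return rf[operand_field];
    case MODE_REGISTER_INDIRECT: {
        /* The register holds a memory address; memory holds the data. */
        assert(operand_field >= 0 && operand_field < RF_SIZE);
        int addr = rf[operand_field];
        assert(addr >= 0 && addr < MEM_WORDS);
        return mem[addr];
    }
    }
    return 0; /* not reached for valid modes */
}
```

With register 3 holding 42 and memory location 42 holding 7, an operand field of 3 yields 3 (immediate), 42 (register-direct), or 7 (register indirect).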
Sample Programs
C program Equivalent assembly program
Programmer Considerations
Microprocessor Architecture Overview
Example: parallel port driver
Parallel Port Example
; This program consists of a sub-routine that reads
; the state of the input pin, determining the on/off state
; of our switch and asserts the output pin, turning the LED
; on/off accordingly
        .386

CheckPort proc
        push ax            ; save the content
        push dx            ; save the content
        mov  dx, 3BCh + 1  ; base + 1 for register #1
        in   al, dx        ; read register #1
        and  al, 10h       ; mask out all but bit # 4
        cmp  al, 0         ; is it 0?
        jne  SwitchOn      ; if not, we need to turn the LED on

SwitchOff:
        mov  dx, 3BCh + 0  ; base + 0 for register #0
        in   al, dx        ; read the current state of the port
        and  al, 0FEh      ; clear first bit (masking)
        out  dx, al        ; write it out to the port
        jmp  Done          ; we are done
        ...

extern "C" void CheckPort(void);   // defined in assembly
void main(void) {
    while( 1 ) {
        CheckPort();
    }
}

[Figure: PC parallel port with the switch on Pin 13 and the LED on Pin 2]
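The masking logic of the driver can be expressed as a pure C function, which makes it easy to check without touching the hardware. This is a sketch under assumptions: the switch is read from bit 4 of register #1 (as in the listing), the LED is taken to be bit 0 of register #0 (the "first bit"), and the names `SWITCH_MASK`, `LED_MASK`, and `next_data_reg` are hypothetical. The actual port reads and writes are left out.

```c
#include <assert.h>
#include <stdint.h>

#define SWITCH_MASK 0x10u  /* bit 4 of register #1: switch on Pin 13 */
#define LED_MASK    0x01u  /* bit 0 of register #0: LED on Pin 2 (assumed) */

/* Given the value read from register #1 and the current value of
 * register #0, returns the value to write back to register #0. */
uint8_t next_data_reg(uint8_t status_reg, uint8_t data_reg)
{
    if (status_reg & SWITCH_MASK) {
        /* Switch on: set the LED bit, leave the other bits unchanged. */
        return (uint8_t)(data_reg | LED_MASK);
    }
    /* Switch off: clear the LED bit, leave the other bits unchanged. */
    return (uint8_t)(data_reg & (uint8_t)~LED_MASK);
}
```

Only the LED bit changes in either branch, which is the reason the assembly reads the port before masking and writing it back.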
Operating System
execution
• Or even just multiple threads from the OS
MOV R0, 1324         -- system call “open” id
MOV R1, file_name    -- address of file-name
INT 34               -- cause a system call
JZ R0, L1            -- if zero -> error
L2:
Development Environment
• Development processor
– The processor on which we write and debug our programs
• Usually a PC
• Target processor
– The processor that the program will run on in our embedded
system
• Often different from the development processor
Software Development Process
• Compilers
– Cross compiler
• Runs on one processor, but generates code for another
• Assemblers
• Linkers
• Debuggers
• Profilers
[Figure: implementation phase: C files pass through the compiler and assembly files through the assembler to produce binary files, which the linker combines with a library into an executable file; verification phase: debugger and profiler]
Running a Program
Instruction Set Simulator For A Simple Processor
#include <stdio.h>
typedef struct {
    unsigned char first_byte, second_byte;
} instruction;
...
    return 0;
}
Testing and Debugging
• ISS
– Gives us control over time – set breakpoints, look at register values, set values, step-by-step execution, ...
– But, doesn’t interact with real environment
• Download to board
– Use device programmer
– Runs in real environment, but not controllable
• Compromise: emulator
– Runs in real environment, at speed or near
– Supports some controllability from the PC
[Figure: (a) implementation phase followed by verification phase on the development processor, using a debugger/ISS; (b) implementation phase followed by verification phase using external tools: emulator and programmer]
Application-Specific Instruction-Set Processors (ASIPs)
• General-purpose processors
– Sometimes too general to be effective in demanding
application
• e.g., video processing – requires huge video buffers and operations
on large arrays of data, inefficient on a GPP
– But single-purpose processor has high NRE, not
programmable
• ASIPs – targeted to a particular domain
– Contain architectural features specific to that domain
• e.g., embedded control, digital signal processing, video processing,
network processing, telecommunications, etc.
– Still programmable
A Common ASIP: Microcontroller
Trend: Even More Customized ASIPs
Selecting a Microprocessor
• Issues
– Technical: speed, power, size, cost
– Other: development environment, prior expertise, licensing, etc.
• Speed: how to evaluate a processor’s speed?
– Clock speed – but instructions per cycle may differ
– Instructions per second – but work per instr. may differ
– Dhrystone: Synthetic benchmark, developed in 1984. Dhrystones/sec.
• MIPS: 1 MIPS = 1757 Dhrystones per second (based on Digital’s VAX
11/780). A.k.a. Dhrystone MIPS. Commonly used today.
– So, 750 MIPS = 750*1757 = 1,317,750 Dhrystones per second
– SPEC: set of more realistic benchmarks, but oriented to desktops
– EEMBC – EDN Embedded Benchmark Consortium, www.eembc.org
• Suites of benchmarks: automotive, consumer electronics, networking, office
automation, telecommunications
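The Dhrystone MIPS conversion above is a single multiplication or division by 1757. A minimal sketch in C, with hypothetical function names:

```c
#include <assert.h>

/* 1 MIPS = 1757 Dhrystones per second (VAX 11/780 baseline). */
#define DHRYSTONES_PER_SEC_PER_MIPS 1757L

/* Converts a Dhrystone MIPS rating to Dhrystones per second. */
long mips_to_dhrystones_per_sec(long mips)
{
    assert(mips >= 0);
    return mips * DHRYSTONES_PER_SEC_PER_MIPS;
}

/* Converts a measured Dhrystones-per-second score to Dhrystone MIPS. */
double dhrystones_per_sec_to_mips(double dhrystones_per_sec)
{
    return dhrystones_per_sec / (double)DHRYSTONES_PER_SEC_PER_MIPS;
}
```

For the slide's example, `mips_to_dhrystones_per_sec(750)` returns 1317750, and converting 1,317,750 Dhrystones per second back gives 750 MIPS.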
General Purpose Processors
Processor | Clock speed | Periph. | Bus Width | MIPS | Power | Trans. | Price
General Purpose Processors
Intel PIII | 1GHz | 2x16 K L1, 256K L2, MMX | 32 | ~900 | 97W | ~7M | $900
IBM PowerPC 750X | 550 MHz | 2x32 K L1, 256K L2 | 32/64 | ~1300 | 5W | ~7M | $900
MIPS R5000 | 250 MHz | 2x32 K 2 way set assoc. | 32/64 | NA | NA | 3.6M | NA
StrongARM SA-110 | 233 MHz | None | 32 | 268 | 1W | 2.1M | NA
Microcontroller
Intel 8051 | 12 MHz | 4K ROM, 128 RAM, 32 I/O, Timer, UART | 8 | ~1 | ~0.2W | ~10K | $7
Motorola 68HC811 | 3 MHz | 4K ROM, 192 RAM, 32 I/O, Timer, WDT, SPI | 8 | ~.5 | ~0.1W | ~10K | $5
Digital Signal Processors
TI C5416 | 160 MHz | 128K, SRAM, 3 T1 Ports, DMA, 13 ADC, 9 DAC | 16/32 | ~600 | NA | NA | $34
Lucent DSP32C | 80 MHz | 16K Inst., 2K Data, Serial Ports, DMA | 32 | 40 | NA | NA | $75
Sources: Intel, Motorola, MIPS, ARM, TI, and IBM Website/Datasheet; Embedded Systems Programming, Nov. 1998
Designing a General Purpose Processor
• Not something an embedded system designer normally would do
– But instructive to see how
FSMD
Declarations:
bit PC[16], IR[16];
bit M[64k][16], RF[16][16];
States:
Reset: PC=0;
Fetch (entered from the states below): IR=M[PC]; PC=PC+1
Decode
Mov1: RF[rn] = M[dir], then to Fetch
Mov2 (0001): M[dir] = RF[rn], then to Fetch
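The FSMD fragment above can be sketched as a C state machine that advances one state per call, mimicking one clock tick. Several details are assumptions, since the slide text does not show them: the instruction layout (opcode in IR bits 15 to 12, rn in bits 11 to 8, dir in bits 7 to 0), opcode 0000 for Mov1, and the `ST_HALT` state for opcodes not covered here. The names `cpu` and `cpu_tick` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ST_RESET, ST_FETCH, ST_DECODE, ST_MOV1, ST_MOV2, ST_HALT } state;

typedef struct {
    state    st;
    uint16_t pc, ir;      /* bit PC[16], IR[16] */
    uint16_t rf[16];      /* bit RF[16][16]     */
    uint16_t m[65536];    /* bit M[64k][16]     */
} cpu;

/* Advances the machine by one state, i.e. one clock tick. */
void cpu_tick(cpu *c)
{
    assert(c != NULL);
    unsigned rn  = (c->ir >> 8) & 0xFu;  /* assumed field layout */
    unsigned dir = c->ir & 0xFFu;        /* assumed field layout */

    switch (c->st) {
    case ST_RESET:
        c->pc = 0;
        c->st = ST_FETCH;
        break;
    case ST_FETCH:
        c->ir = c->m[c->pc];             /* IR = M[PC] */
        c->pc = (uint16_t)(c->pc + 1);   /* PC = PC + 1 */
        c->st = ST_DECODE;
        break;
    case ST_DECODE:
        switch (c->ir >> 12) {
        case 0x0: c->st = ST_MOV1; break;   /* assumed opcode for Mov1 */
        case 0x1: c->st = ST_MOV2; break;   /* 0001, as on the slide */
        default:  c->st = ST_HALT; break;   /* not covered in this sketch */
        }
        break;
    case ST_MOV1:
        c->rf[rn] = c->m[dir];           /* RF[rn] = M[dir] */
        c->st = ST_FETCH;
        break;
    case ST_MOV2:
        c->m[dir] = c->rf[rn];           /* M[dir] = RF[rn] */
        c->st = ST_FETCH;
        break;
    case ST_HALT:
        break;
    }
}
```

Because the `cpu` struct holds the full 64k-word memory, it is best declared `static` or allocated on the heap rather than placed on the stack.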
Architecture of a Simple Microprocessor
• Storage devices for each declared variable
– register file holds each of the variables
• Functional units to carry out the
[Figure: control unit (controller: next-state and control logic; state register) sending control signals to all inputs and receiving signals from all outputs of the datapath; datapath with a 2x1 mux and RF (16), with signals RFwa, RFw, RFwe, RFr1a]
A Simple Microprocessor
Chapter Summary
• General-purpose processors
– Good performance, low NRE, flexible
• Controller, datapath, and memory
• Structured languages prevail
– But some assembly level programming still necessary
• Many tools available
– Including instruction-set simulators, and in-circuit emulators
• ASIPs
– Microcontrollers, DSPs, network processors, more customized ASIPs
• Choosing among processors is an important step
• Designing a general-purpose processor is conceptually the same
as designing a single-purpose processor