Computers as Components, 2nd ed. - Wayne Wolf
Figure: an embedded computer: CPU and memory, with analog input and analog output.
Overheads for Computers as Components, 2nd ed., © 2008 Wayne Wolf
Examples
• Cell phone.
• Printer.
• Automobile: engine, brakes, dash, etc.
• Airplane: engine, flight controls,
nav/comm.
• Digital television.
• Household appliances.
Figure: anti-lock braking system: a brake and a sensor at each of the four wheels, connected to the ABS unit and its hydraulic pump.
Characteristics of embedded
systems
• Sophisticated functionality.
• Real-time operation.
• Low manufacturing cost.
• Low power.
• Designed to tight deadlines by small
teams.
Figure: levels of design abstraction: requirements → specification → architecture → component design → system integration.
Top-down vs. bottom-up
• Top-down design:
• start from most abstract description;
• work to most detailed.
• Bottom-up design:
• work from small components to big system.
• Real design uses both techniques.
Figure: sample requirements form with fields name, purpose, inputs, outputs, functions, performance, manufacturing cost, power, and physical size/weight.
• Moving map obtains position from GPS, paints map from local database.
Figure: sample moving-map display showing I-78 and Scotch Road, with position lat: 40 13 lon: 32 19.
GPS moving map needs
• Functionality: For automotive use. Show major
roads and landmarks.
• User interface: At least 400 x 600 pixel screen.
Three buttons max. Pop-up menu.
• Performance: Map should scroll smoothly. No
more than 1 sec power-up. Lock onto GPS
within 15 seconds.
• Cost: $120 street price = approx. $30 cost of
goods sold.
Figure: moving map architecture blocks: database, user interface, memory, panel I/O, and timer.
UML class
Figure: class Display, with its class name, attributes (pixels, elements, menu_items), operations (mouse_click(), draw_box), and a comment.
Figure: UML generalization: Derived_class inherits from Base_class. Example: base class Display (pixels, elements, menu_items; mouse_click(), draw_box) with derived classes BW_display and Color_map_display, which add the operations pixel() and set_pixel().
Multiple inheritance
Figure: derived class Multimedia_display inherits from base classes Speaker and Display.
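Figure: link between a message set object (count = 2) and message objects msg1 (length = 1102) and msg2 (length = 2114).
Figure: state transition from a to b.
Figure: signal event: <<signal>> mouse_click declared with parameters leftorright: button and x, y: position; the event description mouse_click(x,y,button) labels the transition from a to b.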
Call event
Figure: call event draw_box(10,5,3,2,blue) on the transition from c to d; time-out event tm(time-value) on the transition from e to f.
Figure: state machine for mouse_click(x,y,button)/find_region(region): if region = menu, which_menu(i) (got menu item) then call_menu(i) (menu called); if region = drawing, find_object(objid) (found object) then highlight(objid) (object highlighted). Transitions are labeled input/output.
Figure: sequence diagram over time: mouse_click(x,y,button), which_menu(x,y,i), call_menu(i).
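Figure: model train system: console and power supply, with a receiver (rcvr) and motor on each train.
• Voltage moves around the power supply voltage; adds no DC component.
• 1 is 58 µs, 0 is at least 100 µs.
Figure: logic 1 and logic 0 waveforms over time: 58 µs for a 1, >= 100 µs for a 0.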
command       parameters
set-speed     speed (positive/negative)
set-inertia   inertia-value (non-negative)
estop         none
Figure: sequence of messages from :console to :train_rcvr: set-inertia, set-speed, set-speed, estop, set-speed.
Figure: collaboration diagram: :console sends 1..n: command to :receiver.
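Figure: train set class diagram: the train set has 1 console and 1..t trains; the console and trains are built from receiver, controller, and motor interface classes and the physical interface classes receiver*, sender*, detector*, and pulser*.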
Figure: pulse-width-modulated motor drive: the duty cycle of the +V/- waveform sets the motor voltage.
knobs*: train-knob: integer; speed-knob: integer; inertia-knob: unsigned-integer; emergency-stop: boolean.
pulser*: pulse-width: unsigned-integer; direction: boolean.
sender*, detector*: physical interface classes.
panel: train-number(): integer; speed(): integer; inertia(): integer; estop(): boolean; new-settings().
motor-interface: speed: integer.
transmitter: send-speed(adrs: integer, speed: integer); send-inertia(adrs: integer, val: integer); set-estop(adrs: integer).
receiver: current: command; new: boolean; read-cmd(); new-cmd(): boolean; rcv-type(msg-type: command); rcv-speed(val: integer); rcv-inertia(val: integer).
© 2000 Morgan Kaufmann, Overheads for Computers as Components, 2nd ed.
Class descriptions
• transmitter class has one behavior for
each type of message sent.
• receiver class provides methods to:
• detect a new message;
• determine its type;
• read its parameters (estop has no
parameters).
formatter: current-train: integer; current-speed[ntrains]: integer; current-inertia[ntrains]: unsigned-integer; current-estop[ntrains]: boolean; send-command(); panel-active(): boolean; operate().
Figure: formatter operate() behavior: control alternates between panel-active and sending commands (train number, inertia/estop).
Figure: update-panel() behavior: if panel*:read-train() is true, current-train = train-knob, update-screen, changed = true; if panel*:read-speed() is true, current-speed = throttle, changed = true; and so on for the remaining controls.
Controller class
controller: current-train: integer; current-speed[ntrains]: integer; current-direction[ntrains]: boolean; current-inertia[ntrains]: unsigned-integer; operate(); issue-command().
Figure: controller operate() behavior: wait for a command from receiver, receive-command(), issue-command().
command: type: 3 bits; address: 3 bits; parity: 1 bit.
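Figure: von Neumann architecture: the PC holds address 200; memory location 200 contains ADD r5,r1,r3, which is fetched into the CPU's IR.
Figure: Harvard architecture: separate program and data memories, each with its own address and data connections to the CPU, which holds the PC.
Figure: I/O device: the CPU reaches the device's status and data registers over address and data lines; the registers connect to the device mechanism.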
Application: 8251 UART
• Universal asynchronous receiver
transmitter (UART) : provides serial
communication.
• 8251 functions are integrated into
standard PC interface chip.
• Allows many communication parameters
to be programmed.
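Figure: serial character timing, with the line idle (no char) between characters.
Figure: 8251 interface: the CPU exchanges 8-bit status and 8-bit data with the 8251, which transmits and receives (xmit/rcv) on the serial port.
Figure: interrupt interface: the device raises intr request and the CPU answers with intr ack; the device has status and data registers and a mechanism; the CPU (PC, IR) connects to it over data/address lines.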
Debugging interrupt code
• What if the handler forgets to save and restore the registers it changes?
• Foreground program can exhibit mysterious bugs.
• Bugs will be hard to repeat: they depend on interrupt timing.
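Figure: prioritized interrupts: devices on priority levels L1 .. Ln request interrupts and the CPU returns interrupt acknowledge.
Figure: sequence diagram of :interrupts, :foreground, and processes :A, :B, :C, with interrupts A and B preempting the foreground.
Figure: interrupt vector table: the table head points to handlers 0 through 3.
Figure: sequence diagram between :CPU and :device: receive request, receive ack, receive vector.
Figure: interrupt flowchart (priority selection is assumed to be handled before this point): if no interrupt is pending, continue execution; if the interrupt priority is not greater than the current priority, ignore it; otherwise acknowledge it and wait for a vector; when a vector arrives, call table[vector]; a timeout before the vector arrives is a bus error.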
Interrupt sequence
• CPU acknowledges request.
• Device sends vector.
• CPU calls handler.
• Software processes request.
• CPU restores state to foreground
program.
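The sequence above, together with the "call table[vector]" step in the flowchart, can be sketched in C. This is only an illustration under stated assumptions: the 4-entry table, the handler names, and the software-visible `current_priority` are all hypothetical, and real CPUs perform much of this in hardware or in an assembly stub.

```c
#include <stddef.h>

#define NVECTORS 4

typedef void (*handler_t)(void);

/* Hypothetical handlers: each just records that it ran. */
static int last_handled = -1;
static void handler0(void) { last_handled = 0; }
static void handler1(void) { last_handled = 1; }
static void handler2(void) { last_handled = 2; }
static void handler3(void) { last_handled = 3; }

/* Interrupt vector table: the vector sent by the device indexes it. */
static handler_t vector_table[NVECTORS] = { handler0, handler1, handler2, handler3 };

static int current_priority = 0;   /* priority of the code now running */

/* Returns 1 if the request was serviced, 0 if it was ignored. */
int dispatch_interrupt(int intr_priority, int vector) {
    if (intr_priority <= current_priority)
        return 0;                           /* not higher priority: ignore */
    if (vector < 0 || vector >= NVECTORS || vector_table[vector] == NULL)
        return 0;                           /* bad vector: treat like a bus error */
    int saved = current_priority;           /* save foreground state */
    current_priority = intr_priority;
    vector_table[vector]();                 /* call table[vector] */
    current_priority = saved;               /* restore state to foreground */
    return 1;
}
```

Saving and restoring `current_priority` stands in for the full context save that a real handler must perform, which is exactly what the debugging slide warns about.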
• 2-D DCT/IDCT is computed from two 1-D DCT/IDCTs.
• Put data in different banks to maximize throughput.
Figure: a block passes through a column DCT and a row DCT, with interim storage between them.
d = {8,4,2,1};
for (i=0; i<4; i++) {
    compute 3 upper differences for d[i];
    compute 3 middle differences for d[i];
    compute 3 lower differences for d[i];
    compute minimum value;
    move to next d;
}
Figure: cache between the CPU and main memory: the cache controller checks each address; on a hit the cache supplies the data, otherwise the data comes from main memory.
Write operations
• Write-through: immediately copy write to
main memory.
• Write-back: write to main memory only
when location is removed from cache.
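To make the difference in memory traffic concrete, here is a minimal C model of a single cache line that counts writes reaching main memory under each policy. It is a sketch under simplifying assumptions: write-allocate on a miss, no reads modeled, and a hypothetical `cache_line_t` type.

```c
#include <stdbool.h>

/* One-line cache model, for illustration only. */
typedef struct {
    bool write_back;      /* false = write-through, true = write-back */
    bool valid;
    bool dirty;
    unsigned tag;
    unsigned mem_writes;  /* writes that reached main memory */
} cache_line_t;

/* CPU writes to an address whose tag is `tag` (write-allocate assumed). */
void cache_write(cache_line_t *c, unsigned tag) {
    if (c->valid && c->tag != tag && c->dirty)
        c->mem_writes++;              /* replacing a dirty line: flush it */
    c->valid = true;
    c->tag = tag;
    if (c->write_back) {
        c->dirty = true;              /* defer the main-memory write */
    } else {
        c->mem_writes++;              /* write-through: copy immediately */
        c->dirty = false;
    }
}

/* Remove the line from the cache (replacement or flush). */
void cache_evict(cache_line_t *c) {
    if (c->valid && c->dirty)
        c->mem_writes++;              /* write-back happens only now */
    c->valid = false;
    c->dirty = false;
}
```

Ten writes to the same location cost ten memory writes under write-through but one under write-back, paid when the line is evicted.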
Example: direct-mapped vs. set-associative
address  data
000      0101
001      1111
010      0000
011      0110
100      1000
101      0001
110      1010
111      0100
Direct-mapped cache behavior
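Figure: the memory management unit translates the CPU's logical address into the physical address used by main memory.
Figure: memory divided into pages (page 1, page 2) and segments (segment 1, segment 2).
Figure: page address translation: the page number selects a page descriptor (page i base), which is concatenated with the page offset to form the physical address.
Figure: flat vs. tree-structured page tables: a flat table yields the page descriptor directly; a tree goes through a 1st level table to a 2nd level table, whose descriptor is concatenated with the offset to form the physical address.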
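The flat-table translation described above can be sketched in C. The page size (4 KB), the 16-entry table, and the `PAGE_FAULT` sentinel are assumptions chosen for illustration; no valid bits or protection are modeled.

```c
#include <stdint.h>

#define PAGE_BITS  12u                /* assumed 4 KB pages */
#define PAGE_SIZE  (1u << PAGE_BITS)
#define NPAGES     16u                /* tiny hypothetical page table */
#define PAGE_FAULT UINT32_MAX         /* sentinel for out-of-range pages */

/* Flat page table: entry i holds the physical base of logical page i. */
static uint32_t page_table[NPAGES];

void map_page(uint32_t page, uint32_t phys_base) {
    if (page < NPAGES)
        page_table[page] = phys_base & ~(PAGE_SIZE - 1u);  /* keep base aligned */
}

/* Split the logical address into page number and offset, look up the
   descriptor, and concatenate its base with the unchanged offset. */
uint32_t translate(uint32_t logical) {
    uint32_t page   = logical >> PAGE_BITS;
    uint32_t offset = logical & (PAGE_SIZE - 1u);
    if (page >= NPAGES)
        return PAGE_FAULT;
    return page_table[page] | offset;
}
```

Because the base is page-aligned, OR-ing in the offset is the same as concatenating the two bit fields.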
Figure: pipeline timing: add r0,r1,#5 passes through fetch, decode, and execute in cycles 1-3, and sub r2,r3,r6 follows one cycle behind (fetch, decode, ex sub).
Figure: power state machine with run (Prun = 400 mW), idle, and sleep states, annotated with the transition times between states.
Figure: :input supplies 1..n input symbols to the :data compressor, which sends 1..m packed output symbols to :output.
character  P
a          .45
b          .24
c          .11
d          .08
e          .07
f          .05
Figure: Huffman code tree for these characters; the merged nodes have probabilities P = .12, .19, .31, .55, and 1.
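The tree in the figure can be reproduced by repeatedly merging the two least-probable groups. The C sketch below computes only the code lengths; the group-tracking arrays and the `NSYM` limit are illustrative choices, not the book's implementation.

```c
#define NSYM 6   /* maximum number of symbols handled by this sketch */

/* Huffman code lengths: merge the two least-probable groups n-1 times;
   each merge makes every symbol in both groups one bit longer. */
void huffman_lengths(const double p[], int n, int len[]) {
    double w[NSYM];   /* weight of each group, indexed by group id */
    int group[NSYM];  /* group id of each symbol */
    int alive[NSYM];  /* 1 while a group id is still in use */
    for (int i = 0; i < n; i++) { w[i] = p[i]; group[i] = i; alive[i] = 1; len[i] = 0; }
    for (int merges = 0; merges < n - 1; merges++) {
        int lo1 = -1, lo2 = -1;          /* smallest and second-smallest groups */
        for (int g = 0; g < n; g++) {
            if (!alive[g]) continue;
            if (lo1 < 0 || w[g] < w[lo1]) { lo2 = lo1; lo1 = g; }
            else if (lo2 < 0 || w[g] < w[lo2]) { lo2 = g; }
        }
        for (int i = 0; i < n; i++) {
            if (group[i] == lo1 || group[i] == lo2) { len[i]++; group[i] = lo1; }
        }
        w[lo1] += w[lo2];                /* merged node probability */
        alive[lo2] = 0;
    }
}
```

For the table above the merges produce P = .12, .19, .31, .55, 1 as in the figure, giving a 1-bit code for a, 2 bits for b, and 4 bits for c through f (about 2.17 bits per character on average).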
data-compressor: buffer: data-buffer; table: symbol-table; current-bit: integer; encode(): boolean, data-buffer; flush(); new-symbol-table().
data-buffer: databuf[databuflen]: character; len: integer.
symbol-table: symbols[nsymbols]: data-buffer; len: integer.
Each data-compressor has one data-buffer and one symbol-table.
Figure: encode behavior: look up the input symbol in the symbol table, pack its code into this buffer, test whether it fills the buffer, and update the length.
Figure: four-cycle handshake between device 1 and device 2 using enq and ack, in phases 1-4 over time.
• Clock provides
synchronization.
• R/W is true when
reading (R/W’ is false
when reading).
• Address is a-bit bundle
of address lines.
• Data is n-bit bundle of
data lines.
• Data ready signals
when n-bit data is
ready.
Figure: bus read timing with Adrs, Wait, and Ack signals, including wait states.
Figure: a bridge connects a high-speed bus (memory, high-speed device) to a slower bus (slow device).
• Fast devices on separate bus.
• A bridge connects two busses.
• Two varieties:
• AHB is high-performance.
• APB is lower-speed, lower cost.
• AHB supports pipelining, burst transfers, split transactions, multiple bus masters.
• All devices are slaves on APB.
• Several different
types of memory:
• DRAM.
• SRAM.
• Flash.
• Each type of memory
comes in varying:
• Capacities.
• Widths.
Random-access memory
• Dynamic RAM is dense, requires refresh.
• Synchronous DRAM is dominant type.
• SDRAM uses clock to improve performance,
pipeline memory accesses.
• Static RAM is faster, less dense, consumes
more power.
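Figure: weighted-resistor D/A converter: bits bn, bn-1, bn-2, bn-3 drive resistors R, 2R, 4R, 8R, which sum into Vout.
Figure: A/D converter in which Vin drives an encoder that produces the digital output.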
Dual-slope conversion
• Use counter to time required to
charge/discharge capacitor.
• Charging, then discharging eliminates
non-linearities.
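A discrete-time simulation shows why the discharge count measures the input: Vin is integrated for a fixed number of ticks, then a reference of opposite sign is integrated until the integrator returns to zero, so the count is about ticks × Vin / Vref. This is an idealized sketch assuming Vin >= 0 and a perfect integrator and comparator.

```c
/* Idealized dual-slope conversion: run-up for a fixed time, then count
   reference steps during run-down until the integrator crosses zero. */
unsigned dual_slope_counts(double vin, double vref, unsigned charge_ticks) {
    double integrator = 0.0;
    for (unsigned t = 0; t < charge_ticks; t++)
        integrator += vin;            /* charge: fixed time, slope set by Vin */
    unsigned count = 0;
    while (integrator > 0.0) {
        integrator -= vref;           /* discharge: fixed slope set by Vref */
        count++;
    }
    return count;                     /* approx. charge_ticks * vin / vref */
}
```

Because the same integrator is used for both phases, its gain cancels out of the ratio, which is the point of charging and then discharging.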
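Figure: dual-slope converter: Vin charges a capacitor while a timer counts.
Figure: typical embedded system: CPU and memory on the CPU bus; a bus interface connects to a high-speed bus with the interrupt controller, DMA controller, and timers; another bus interface connects to a low-speed bus with devices.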
Typical busses
• PCI: standard for high-speed interfacing
• 33 or 66 MHz.
• PCI Express.
• USB (Universal Serial Bus), Firewire (IEEE
1394): relatively low-cost serial interface
with high speed.
Figure: host system connected to the target system by a serial line.
Host-based tools
• Cross compiler:
• compiles code on host for target system.
• Cross debugger:
• displays target state, allows target system to
be controlled.
Figure: unit under test (UUT): a board with microprocessor and memory chips, and a sample of the pin registers.
• Simplifies testing of multiple chips on a board.
• Registers on pins can be configured as a scan chain.
• Used for debuggers, in-circuit emulators.
• Performance depends on all the elements of the system:
• CPU.
• Cache.
• Bus.
• Main memory.
• I/O device.
Figure: CPU, cache, and memory.
• T: # bus cycles.
• P: time/bus cycle.
• Total time for transfer:
• t = TP.
• D: data payload length.
• O1 + O2 = overhead O.
Figure: a basic transfer of width W: overhead O1, data D, overhead O2.
$T_{basic}(N) = (D+O)N/W$
Bus burst transfer bandwidth
• T: # bus cycles.
• P: time/bus cycle.
• Total time for transfer:
• t = TP.
• D: data payload length.
• O1 + O2 = overhead O.
Figure: a burst transfer of width W: data items 1, 2, …, B followed by overhead O.
$T_{burst}(N) = (BD+O)N/(BW)$
Memory aspect ratios
Figure: memories of the same capacity with different aspect ratios: 64 M × 1, 16 M × 4, and 8 M × 8.
Memory access times
• Memory component access times come from the chip data sheet.
• Page modes allow faster access for
successive transfers on same page.
• If data doesn’t fit naturally into physical
words:
• A = [(E/w)mod W]+1
• Is the performance bottleneck the bus or the memory?
parameter     bus        memory
clock period  1.00E-06   1.00E-08
W             2          0.5
D             1          1
O             3          4
B             4
N             612000     612000
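Plugging the table into the transfer-time formulas answers the question. The sketch below assumes, as the B entry suggests, that the bus uses burst transfers and the memory uses basic transfers; multiplying bus cycles by the clock period P gives seconds.

```c
/* Transfer times in seconds, using the bus models from the slides:
   basic: T = (D + O) * N / W cycles; burst: T = (B*D + O) * N / (B*W) cycles;
   t = T * P, where P is the clock period. */
double t_basic(double P, double W, double D, double O, double N) {
    return P * (D + O) * N / W;
}

double t_burst(double P, double W, double D, double O, double B, double N) {
    return P * (B * D + O) * N / (B * W);
}
```

With the numbers above, the bus takes about 0.54 s (535,500 cycles of 1 µs) while the memory takes about 0.061 s (6,120,000 cycles of 10 ns), so under these assumptions the bus is the bottleneck.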
• Speed things up by
running several units
at once.
• DMA provides
parallelism if CPU
doesn’t need the bus:
• DMA + bus.
• CPU.
Figure: alarm clock front panel: time display, alarm ready light, PM light, and buttons set time, set alarm, hour, minute, and alarm on/off.
Operations
• Set time: hold set time, depress hour,
minute.
• Set alarm time: hold set alarm, depress
hour, minute.
• Turn alarm on/off: depress alarm on/off.
Figure: alarm clock class diagram: Mechanism is associated one-to-one with Lights*, Display, Buttons*, and Speaker*.
Lights*: digit-val(); digit-scan(); alarm-on-light(); PM-light().
Buttons*: set-time(): boolean; set-alarm(): boolean; alarm-on(): boolean; alarm-off(): boolean; minute(): boolean; hour(): boolean.
Speaker*: buzz().
Display: time[4]: integer; alarm-indicator: boolean; PM-indicator: boolean; set-time(); alarm-light-on(); alarm-light-off(); PM-light-on(); PM-light-off().
Figure: AM/PM state machine: the AM->PM transition sets PM=true; the PM->AM transition sets PM=false.
Scan-keyboard behavior
Figure: scan-keyboard flowchart: compute button activations; on set-time and not set-alarm, increment the time (hours, and minutes with tens rollover, updating AM/PM); alarm-on sets alarm-ready = true; alarm-off is also handled.
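Figure: seat-belt controller state machine with states idle, seated, belted, and buzzer; transitions are labeled input/output: seat/timer on (idle → seated), belt/- (seated → belted), timer/buzzer on (seated → buzzer), belt/buzzer off (buzzer → belted), no belt/timer on (belted → seated), and no seat/- or no seat/buzzer off back to idle.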
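The seat-belt state machine maps naturally onto a C switch statement. This is a sketch: the type and field names are hypothetical, `timer_on` is modeled as a one-step "start timer" pulse, and the transitions follow the labels recovered from the figure.

```c
#include <stdbool.h>

typedef enum { IDLE, SEATED, BELTED, BUZZER } belt_state_t;

typedef struct {
    belt_state_t state;
    bool timer_on;    /* output pulse: start the seat timer this step */
    bool buzzer_on;   /* output: buzzer state */
} belt_ctrl_t;

/* One step of the controller. Inputs: seat occupied, belt fastened,
   timer expired. */
void belt_step(belt_ctrl_t *c, bool seat, bool belt, bool timer) {
    c->timer_on = false;
    switch (c->state) {
    case IDLE:
        if (seat) { c->state = SEATED; c->timer_on = true; }
        break;
    case SEATED:
        if (!seat)      c->state = IDLE;
        else if (belt)  c->state = BELTED;
        else if (timer) { c->state = BUZZER; c->buzzer_on = true; }
        break;
    case BELTED:
        if (!seat)      c->state = IDLE;
        else if (!belt) { c->state = SEATED; c->timer_on = true; }
        break;
    case BUZZER:
        if (!seat)      { c->state = IDLE;   c->buzzer_on = false; }
        else if (belt)  { c->state = BELTED; c->buzzer_on = false; }
        break;
    }
}
```

Checking `seat` first in every state gives the "no seat" transitions priority, one reasonable way to resolve inputs that change at the same time.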
Circular buffer
Figure: a data stream x1, x2, x3, ... sampled at times t1, t2, t3, ..., and a four-slot circular buffer in which each new sample (x5, x6, x7) overwrites the oldest one (x1, x2, x3).
x = a + b;    x = a + b;
y = c - d;    y = c - d;
z = x * y;    z = x * y;
y = b + d;    y1 = b + d;
Left: original basic block. Right: single assignment form.
Figure: DFG of the single assignment form: inputs a, b, c, d; a+b produces x, c-d produces y, x*y produces z, and b+d produces y1.
DFGs and partial orders
• Partial order:
• a+b and c-d must both precede x*y; b+d is independent of them.
• Can do pairs of operations in any order.
Figure: the DFG from inputs a, b, c, d to outputs z and y1.
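Figure: CDFG node types: a data flow node containing x = a + b; y = c + d, and a decision node that tests cond (T/F) or a value (branches v1 to v4).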
Equivalent forms
if (cond1) bb1();
else bb2();
bb3();
switch (test1) {
case c1: bb4(); break;
case c2: bb5(); break;
case c3: bb6(); break;
}
Figure: CDFGs: decision node cond1 with T → bb1() and F → bb2(), both flowing to bb3(); decision node test1 with branches c1, c2, c3 to bb4(), bb5(), bb6().
for loop
Figure: a for loop, its equivalent form, and the corresponding CDFG.
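A for loop is equivalent to an initialization followed by a while loop whose body ends with the update step; that is the form a CDFG represents. A small C check of the equivalence (the function names are illustrative):

```c
/* for-loop form */
int sum_for(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Equivalent while-loop form: initialization, test, body, update. */
int sum_while(const int *a, int n) {
    int s = 0;
    int i = 0;            /* initialization */
    while (i < n) {       /* loop test */
        s += a[i];        /* loop body */
        i++;              /* update */
    }
    return s;
}
```

The rewrite is exact only when the body contains no `continue`, since in a for loop `continue` still executes the update.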
Assembly and linking
Dynamic linking
• Some operating systems link modules
dynamically at run time:
• shares one copy of library among all
executing programs;
• allows programs to be updated with new
versions of libraries.
Figure: compilation flow: HLL → machine-independent optimizations → machine-dependent optimizations → assembly.
Statement translation and
optimization
• Source code is translated into
intermediate form such as CDFG.
• CDFG is transformed/optimized.
• CDFG is translated into instructions with
optimization decisions.
• Instructions are further optimized.
Figure: the expression a*b + 5*(c-d) and its DFG over inputs a, b, c, d.
Arithmetic expressions, cont’d.
Figure: DFG with numbered nodes: 1 is a*b, 2 is c-d, 3 is 5 times node 2, 4 adds nodes 1 and 3. Corresponding code:
ADR r4,a
MOV r1,[r4]
ADR r4,b
MOV r2,[r4]
MUL r3,r1,r2
ADR r4,c
MOV r1,[r4]
ADR r4,d
MOV r5,[r4]
SUB r6,r1,r5
MUL r7,r6,#5
ADD r8,r7,r3
Control code generation
if (a+b > 0)
    x = 5;
else
    x = 7;
Figure: CDFG: decision node a+b>0 leads to x=5 (true) or x=7 (false).
proc1(int a) {
    proc2(5);
}
Figure: stack growth: proc2's frame is pushed after proc1's; FP is the frame pointer, SP the stack pointer, and the argument 5 is accessed relative to SP.
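Figure: one-dimensional array a in memory: a[0], a[1] = *(a + 1), a[2], ...
• Row-major layout:
Figure: a two-dimensional array of N rows of M elements is stored as a[0,0], a[0,1], ..., a[1,0], a[1,1], ...; element a[i,j] is a[i*M+j].
struct {
    int field1;
    char field2;
} mystruct;
Figure: with aptr pointing to mystruct, field1 occupies 4 bytes and field2 is at *(aptr+4).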
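C itself stores two-dimensional arrays in row-major order, so the a[i*M+j] rule can be checked directly, along with the structure field offset. The sketch uses `sizeof(int)` instead of the slide's assumed 4 bytes; the array dimensions are arbitrary.

```c
#include <stddef.h>

#define ROWS 3   /* N in the slide */
#define COLS 4   /* M in the slide */

typedef struct {
    int field1;
    char field2;
} mystruct;

/* Row-major layout: element [i][j] of a ROWS x COLS int array lies
   (i*COLS + j) elements past the start of the array. */
size_t elem_offset(size_t i, size_t j) {
    return (i * COLS + j) * sizeof(int);
}

/* Byte offset of field2: sizeof(int) on common ABIs (4 when int is 4 bytes). */
size_t field2_offset(void) {
    return offsetof(mystruct, field2);
}
```

`offsetof` lets the compiler report the layout it actually chose, which is safer than hard-coding the 4 used on the slide.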
• Dead code:
#define DEBUG 0
if (DEBUG) dbg(p1);
• Can be eliminated by analysis of control flow, constant folding.
Figure: CDFG in which the decision node tests the constant 0, so the branch to dbg(p1) is never taken.
Loop after unrolling by a factor of two:
for (i=0; i<2; i++) {
    a[i*2] = b[i*2] * c[i*2];
    a[i*2+1] = b[i*2+1] * c[i*2+1];
}
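Assuming the "before" version was the straightforward four-iteration loop, a quick C check shows that the unrolled loop computes the same result with half as many loop tests and branches:

```c
/* Assumed original loop: four iterations. */
void mul_rolled(int a[4], const int b[4], const int c[4]) {
    for (int i = 0; i < 4; i++)
        a[i] = b[i] * c[i];
}

/* Unrolled by two, as on the slide: two iterations, two statements each. */
void mul_unrolled(int a[4], const int b[4], const int c[4]) {
    for (int i = 0; i < 2; i++) {
        a[i*2]   = b[i*2]   * c[i*2];
        a[i*2+1] = b[i*2+1] * c[i*2+1];
    }
}
```

The trade-off is code size: the body is duplicated to save loop overhead and to expose more operations to the scheduler.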
Register allocation
• Goals:
• choose register to hold each variable;
• determine lifespan of variable in the register.
• Basic case: within basic block.
w = a + b; /* t=1 */
x = c + w; /* t=2 */
y = c + d; /* t=3 */
Figure: lifetimes of a, b, c, d, w, x, and y across times 1, 2, 3.
• Need to understand performance in detail:
• Real-time behavior, not just typical.
• On complex platforms.
• Program performance ≠ CPU performance:
• Pipeline, cache are windows into program.
• We must analyze the entire program.
if (a || b) { /* T1 */
    if ( c ) /* T2 */
...
a b c | path
0 0 0 | T1=F, T3=F: no assignments
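Figure: loop flowcharts: one loop computes f = f + c[i] * x[i] and i = i+1 with a loop test; another computes z[i] = a[i] + b[i]; i = i+1 with the test i<X.
Figure: array placement in memory, with a[0,0] at address 1024 (addresses 1024 and 4099 shown).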
while (TRUE)
    a();

firout = 0.0;
for (j=curr, k=0; j<N; j++, k++)
    firout += buff[j] * c[k];
for (j=0; j<curr; j++, k++)
    firout += buff[j] * c[k];
if (firout > 100.0) firout = 100.0;
if (firout < -100.0) firout = -100.0;
• Controllability:
• Must fill circular buffer with desired N values.
• Other code governs how we access the buffer.
• Observability:
• Want to examine firout before limit testing.
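The FIR fragment above can be completed into a self-contained C sketch. The filter length, the `circ_t` type, and `circ_put()` are assumptions; `curr` is taken to index the oldest sample, so the two loops visit samples from oldest to newest, matching the code above.

```c
#define NTAPS 4   /* assumed filter length (N on the slide) */

/* Circular buffer of the NTAPS most recent samples; curr indexes the
   oldest sample, which is also the next slot to overwrite. */
typedef struct {
    double buff[NTAPS];
    int curr;
} circ_t;

void circ_put(circ_t *cb, double x) {
    cb->buff[cb->curr] = x;              /* overwrite the oldest sample */
    cb->curr = (cb->curr + 1) % NTAPS;   /* advance with wraparound */
}

/* FIR output using the slide's two-loop traversal, then limit testing. */
double fir(const circ_t *cb, const double c[NTAPS]) {
    double firout = 0.0;
    int j, k;
    for (j = cb->curr, k = 0; j < NTAPS; j++, k++)
        firout += cb->buff[j] * c[k];
    for (j = 0; j < cb->curr; j++, k++)
        firout += cb->buff[j] * c[k];
    if (firout > 100.0) firout = 100.0;
    if (firout < -100.0) firout = -100.0;
    return firout;
}
```

For testing, `circ_put()` provides controllability (it fills the buffer with chosen values); observing `firout` before the two limit tests would require splitting `fir()`, which is the observability problem the bullets describe.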
Execution paths and testing
• Paths are important in functional testing
as well as performance analysis.
• In general, an exponential number of
paths through the program.
• Show that some paths dominate others.
• Heuristically limit paths.
• Possible criteria:
• Execute every statement at least once.
• Execute every branch direction at least once.
• Equivalent for structured programs.
• Not true for gotos.
Basis paths
• Approximate CDFG with undirected graph.
• Undirected graphs have basis paths:
• All paths are linear combinations of basis paths.
• Cyclomatic complexity is a bound on the size of basis sets:
• e = # edges
• n = # nodes
• p = number of graph components
• $M = e - n + 2p$.
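The formula needs the number of connected components p as well as the edge and node counts. The C sketch below computes p with a small union-find over an undirected edge list (edge direction is ignored, as the slide suggests). The graph encoding and the `MAXN` limit are illustrative assumptions.

```c
#define MAXN 32   /* maximum number of nodes in this sketch */

static int parent[MAXN];

static int find(int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];   /* path halving */
        x = parent[x];
    }
    return x;
}

/* Cyclomatic complexity M = e - n + 2p of an undirected graph with
   nodes 0..n-1 and e edges; returns -1 if n is too large. */
int cyclomatic(int n, int e, const int edges[][2]) {
    if (n > MAXN) return -1;
    for (int i = 0; i < n; i++) parent[i] = i;
    int p = n;                           /* every node starts as a component */
    for (int k = 0; k < e; k++) {
        int a = find(edges[k][0]), b = find(edges[k][1]);
        if (a != b) { parent[a] = b; p--; }   /* two components merge */
    }
    return e - n + 2 * p;
}
```

For an if-then-else (condition, then, else, join: 4 nodes, 4 edges, 1 component), M = 4 - 4 + 2 = 2, matching its two basis paths.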
• Correct:
    if (a || (b >= c)) { printf("OK\n"); }
• Incorrect:
    if (a && (b >= c)) { printf("OK\n"); }
• Test:
• a = F
• (b >= c) = T
• Example:
• Correct: [0 || (3 >= 2)] = T
• Incorrect: [0 && (3 >= 2)] = F
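A test vector exposes the bug exactly when the correct and incorrect conditions disagree on it. This C sketch (function names are illustrative) confirms that a = F with (b >= c) = T distinguishes them while other vectors may not:

```c
#include <stdbool.h>

/* The intended condition and the buggy one from the slide. */
bool correct_cond(bool a, int b, int c)   { return a || (b >= c); }
bool incorrect_cond(bool a, int b, int c) { return a && (b >= c); }

/* A test vector exposes the bug if the two conditions disagree on it. */
bool exposes_bug(bool a, int b, int c) {
    return correct_cond(a, b, c) != incorrect_cond(a, b, c);
}
```

With a = T and b >= c both conditions are true, so that test passes the buggy code; the chosen test must drive the operands to values where || and && differ.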
• Variable def-use:
• Def when value is
assigned (defined).
• Use when used on
right-hand side.
• Exercise each def-use
pair.
• Requires testing
correct path.
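• Frequency-shift keying:
• separate frequencies for 0 and 1.
Figure: FSK waveform over time, using one frequency for 0 and another for 1.
Figure: FSK transmitter: a bit stream such as 0110101 drives a bit-controlled waveform generator.
Figure: FSK receiver: A/D converter, zero filter, and detector producing the 0 bit.
Figure: modem class diagram: Line-in* (1) - Receiver (1) and Transmitter (1) - Line-out* (1), with operations sample-in(), input(), bit-out(), bit-in(), output(), and sample-out().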
• Tasks:
• spark control
• crankshaft sensing
• fuel/air mixture
• oxygen sensor
• Kalman filter
Figure: engine controller.
Figure: timing requirements: an aperiodic process P1 has a deadline measured from its initiating event; a periodic process is initiated at the start of each period.
Rate requirements on processes
• Period: interval between process activations.
• Rate: reciprocal of period.
• Initiation rate may be higher than the rate at which one copy of the process completes: several copies of the process run at once.
Figure: copies P11 to P14 of a process running at the same time on CPUs 1 to 4.
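• A process can be in one of three states:
• executing on the CPU;
• ready to run;
• waiting for data.
Figure: process states: ready → executing (gets CPU); executing → ready (preempted); executing → waiting (needs data); waiting → ready (gets data); waiting → executing (gets data and CPU); ready → waiting (needs data).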
• Resource constraints make schedulability analysis NP-hard.
• Assume:
• No resource conflicts.
• Constant process execution times.
• Require:
• $T \geq \sum_i T_i$
• Can't use more than 100% of the CPU.
Figure: processes P1 and P2 with execution times T1, T2, T3 fitting within period T.
Hyperperiod
• Hyperperiod: least common multiple
(LCM) of the task periods.
• Must look at the hyperperiod schedule to
find all task interactions.
• Hyperperiod can be very long if task
periods are not chosen carefully.
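The hyperperiod is easy to compute with Euclid's algorithm, since lcm(a, b) = a / gcd(a, b) × b. A short C sketch (periods assumed to be positive integers in a common time unit):

```c
/* Greatest common divisor by Euclid's algorithm. */
unsigned long gcd_ul(unsigned long a, unsigned long b) {
    while (b != 0) {
        unsigned long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hyperperiod: LCM of all task periods. */
unsigned long hyperperiod(const unsigned long periods[], int n) {
    unsigned long h = 1;
    for (int i = 0; i < n; i++)
        h = h / gcd_ul(h, periods[i]) * periods[i];   /* divide first to limit overflow */
    return h;
}
```

Periods of 10, 20, and 40 give a hyperperiod of 40, but pairwise-coprime periods such as 7, 11, and 13 give 1001, which illustrates why periods should be chosen carefully.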
• Schedule in time slots.
• Same process activation irrespective of workload.
• Time slots may be equal size or unequal.
Figure: TDMA schedule repeating slots T1, T2, T3 in each period P.
TDMA assumptions
• Schedule based on least common multiple (LCM) of the process periods.
• Trivial scheduler -> very small scheduling overhead.
Figure: P1 runs three times and P2 twice within one PLCM.
TDMA schedulability
• Always same CPU utilization (assuming
constant process execution times).
• Can’t handle unexpected loads.
• Must schedule a time slot for aperiodic
events.
• TDMA period = 10 ms.
• P1 CPU time 1 ms.
• P2 CPU time 3 ms.
• P3 CPU time 2 ms.
• P4 CPU time 2 ms.
TDMA period   1.00E-02
CPU time:
P1            1.00E-03
P2            3.00E-03
P3            2.00E-03
P4            2.00E-03
total         8.00E-03
utilization   8.00E-01
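The spreadsheet's calculation is total CPU time in one TDMA period divided by the period. A minimal C version, with times in seconds as in the table:

```c
/* TDMA utilization: total CPU time of all processes in one TDMA
   period divided by the period. */
double tdma_utilization(double period, const double cpu_time[], int n) {
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += cpu_time[i];
    return total / period;
}
```

For the example, 8 ms of work in a 10 ms period gives 0.8; any value above 1.0 would mean the processes cannot fit in the schedule.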
• Schedule process only if ready.
• Always test processes in the same order.
• Variations:
• Constant system period.
• Start round-robin again after finishing a round.
Figure: round-robin schedule T1 T2 T3 T2 T3 across periods P.
Figure: priority-driven schedule over t = 0 to 60: P2 ready at t=0, P1 ready at t=15, P3 ready at t=18; execution order P2, P1, P2, P3.
The scheduling problem
• Can we meet all deadlines?
• Must be able to meet deadlines in all cases.
• How much CPU horsepower do we need
to meet our deadlines?
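Figure: multiprocessor communication: CPU 1 and CPU 2 through shared memory, and CPU 1 and CPU 2 exchanging messages.
Figure: process Pi with period ti and computation time Ti.
Figure: schedule of P1 to P4 showing the critical instant.
Figure: schedule over times 0 to 10: P1 (shorter period) runs three times while P2 runs within its longer period.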
RMS CPU utilization
• Utilization for n processes is
• $U = \sum_i T_i / t_i$
• As number of tasks approaches infinity, the maximum utilization at which RMS is guaranteed to meet all deadlines approaches ln 2, about 69%.
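The utilization sum and the RMS bound behind the 69% figure (the Liu and Layland bound n(2^(1/n) - 1)) can be computed as below. The bisection helper is only there to keep the sketch free of a libm dependency; `pow(2.0, 1.0 / n)` from `<math.h>` would normally be used.

```c
/* Total utilization U = sum of T_i / t_i (computation time over period). */
double utilization(const double T[], const double t[], int n) {
    double u = 0.0;
    for (int i = 0; i < n; i++)
        u += T[i] / t[i];
    return u;
}

/* 2^(1/n) by bisection on x^n = 2 (avoids linking libm). */
static double root_of_two(int n) {
    double lo = 1.0, hi = 2.0;
    for (int it = 0; it < 60; it++) {
        double mid = 0.5 * (lo + hi), p = 1.0;
        for (int k = 0; k < n; k++)
            p *= mid;
        if (p < 2.0) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}

/* RMS utilization bound n(2^(1/n) - 1); decreases toward ln 2 as n grows. */
double rms_bound(int n) {
    return n * (root_of_two(n) - 1.0);
}
```

Two tasks with computation times 1 and 2 and periods 4 and 6 have U ≈ 0.58, below the two-task bound of about 0.83, so RMS is guaranteed to schedule them. The bound is sufficient, not necessary: some task sets above it are still schedulable.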
• Data dependencies allow us to improve utilization.
• Restrict combination of processes that can run simultaneously.
• P1 and P2 can't run simultaneously.
Figure: data dependency from P1 to P2.
Figure: UML signal: someClass, through sigbehavior(), <<send>>s the <<signal>> aSig with parameter p : integer.
Figure: ACPI architecture: applications, OS kernel, and power management above the device drivers and ACPI BIOS, on the hardware platform.
ACPI global power states
• G3: mechanical off
• G2: soft off
• G1: sleeping state, with sleep states:
• S1: low wake-up latency with no loss of context
• S2: low latency with loss of CPU/cache state
• S3: low latency with loss of all state except memory
• S4: lowest-power state with all devices off
• G0: working state
Figure: an analog signal over time and its ADPCM coding with levels 3, 2, 1, -1, -2, -3.
ADPCM coding
• Coded in a small alphabet with positive
and negative values.
• {-3,-2,-1,1,2,3}
• Minimize error between predicted value
and actual signal value.
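A simplified ADPCM-style encoder and decoder in C illustrate the idea: predict each sample as the running reconstruction, code the prediction error with the nearest value from {-3,-2,-1,1,2,3}, and integrate the codes. The fixed step size and the simple "previous value" predictor are assumptions; practical ADPCM adapts the step size.

```c
#define STEP 1.0   /* assumed fixed quantizer step size */

static double absval(double x) { return x < 0 ? -x : x; }

/* Choose the alphabet value closest to the prediction error. */
int adpcm_quantize(double diff) {
    static const int alphabet[6] = { -3, -2, -1, 1, 2, 3 };
    int best = alphabet[0];
    for (int i = 1; i < 6; i++)
        if (absval(diff - alphabet[i] * STEP) < absval(diff - best * STEP))
            best = alphabet[i];
    return best;
}

/* Encoder: code the error against the prediction, then integrate the code
   exactly as the decoder will, so both sides track the same prediction. */
void adpcm_encode(const double x[], int n, int codes[]) {
    double pred = 0.0;
    for (int i = 0; i < n; i++) {
        codes[i] = adpcm_quantize(x[i] - pred);
        pred += codes[i] * STEP;
    }
}

/* Decoder: inverse quantize and integrate. */
void adpcm_decode(const int codes[], int n, double y[]) {
    double acc = 0.0;
    for (int i = 0; i < n; i++) {
        acc += codes[i] * STEP;
        y[i] = acc;
    }
}
```

Because the alphabet has no zero, a constant input makes the reconstruction alternate around the true value; that is a property of this sketch's alphabet, not a bug in the loop.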
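Figure: ADPCM encoder: samples minus the predicted value (Σ) feed the quantizer; an inverse quantizer and integrator produce the prediction.
Figure: ADPCM decoder: inverse quantizer followed by an integrator.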
Telephone system terms
• Subscriber line: line to phone.
• Central office: telephone switching
system.
• Off-hook: phone active.
• On-hook: phone inactive.
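Figure: answering machine class diagram with Lights, Buttons*, and Speaker* (one-to-one associations) and interface operations sample(), ring-indicator(), and pick-up().
Message: length; start-adrs; next-msg; samples. Subclasses: Incoming-message, Outgoing-message.
Figure: answering machine behavior: Activations?, Play OGM, Wait for timeout, Allocate ICM, Record ICM, Erase.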
Record-msg/playback-msg
behaviors
record-msg:
nextadrs = 0
repeat msg.samples[nextadrs] = sample(source); nextadrs++ until End(source)
playback-msg:
nextadrs = 0
repeat speaker.samples() = msg.samples[nextadrs]; nextadrs++ until nextadrs = msg.length
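The two behaviors can be sketched in C. To keep the example self-contained, the source is modeled as an array (End(source) becomes reaching its length) and the speaker as an output array; `message_t` and `MAX_SAMPLES` are hypothetical. The loops test before copying, so an empty source yields an empty message.

```c
#include <stddef.h>

#define MAX_SAMPLES 64   /* hypothetical message capacity */

typedef struct {
    int samples[MAX_SAMPLES];
    size_t length;
} message_t;

/* record-msg: copy samples until End(source), modeled here as the end of
   the input array, or until the message buffer is full. */
void record_msg(message_t *msg, const int source[], size_t source_len) {
    size_t nextadrs = 0;
    while (nextadrs < source_len && nextadrs < MAX_SAMPLES) {
        msg->samples[nextadrs] = source[nextadrs];
        nextadrs++;
    }
    msg->length = nextadrs;
}

/* playback-msg: send samples to the speaker until nextadrs = msg.length. */
void playback_msg(const message_t *msg, int speaker[]) {
    for (size_t nextadrs = 0; nextadrs != msg->length; nextadrs++)
        speaker[nextadrs] = msg->samples[nextadrs];
}
```

The capacity check is an addition not shown in the flowchart; without it, a long incoming message would overrun the buffer.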
Hardware platform
• CPU.
• Memory.
• Front panel.
• 2 A/Ds:
• subscriber line, microphone.
• 2 D/A:
• subscriber line, speaker.
• Better cost/performance.
• Match each CPU to its tasks or use custom logic (smaller, cheaper).
• CPU cost is a non-linear function of performance.
Figure: CPU cost vs. performance.
Why multiprocessors? cont’d.
• Better real-time performance.
• Put time-critical functions on less-loaded processing elements.
• Remember RMS utilization: extra CPU cycles must be reserved to meet deadlines.
Figure: cost vs. performance, marking the performance needed to meet the deadline and to meet the deadline with RMS overhead.
Why multiprocessors? cont’d.
• Using specialized processors or custom logic saves power.
• Desktop uniprocessors are not power-efficient [Aus04].
Figure: © 2004 IEEE Computer Society.
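Figure: accelerated system: the CPU sends requests and data to the accelerator, which returns results; CPU, accelerator, memory, and I/O exchange data.
Figure: single-threaded vs. multi-threaded execution of processes P1 to P4 with accelerator task A1.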
Execution time analysis
• Single-threaded:
• Count execution time of all component processes.
• Multi-threaded:
• Find longest path through execution.
Figure: P1 on M1 and P2 on M2 send data d1 and d2 to P3.
Figure: schedules of P1, P2, and P3 on M1 and M2, with and without communication (P1C, P2C) over time; transmission time = 4.
Initial schedule
Figure: initial schedule on a 0-20 time axis: P1 on M1, P2 on M2, P3 on M3; d1 and d2 cross the network before P3 runs; total time = 15.
New design
• Modify P3:
• reads one packet of d1, one packet of d2
• computes partial result
• continues to next packet
Figure: new schedule: packets of d1 and d2 alternate on the network (d1 d2 d1 d2 ...) and P3 runs on M3 as each pair arrives; total time = 12.
Buffering and performance
• Buffering may sequentialize operations.
• Next process must wait for data to enter
buffer before it can continue.
• Buffer policy (queue, RAM) affects
available parallelism.
• Three processes separated by buffers: B1 → A → B2 → B → B3 → C.
Figure: two schedules: in the first, all of A (A[0], A[1], …) finishes before any B, then all of B before C, since B must wait for all of A before getting any data; in the second, A[0], B[0], C[0], A[1], B[1], C[1], … interleave.
Multiprocessors
• Consumer electronics systems.
• Cell phones.
• CDs and DVDs.
• Audio players.
• Digital still cameras.
• Multimedia: stored in
compressed form,
uncompressed on
viewing.
• Data storage and
management: keep track
of your multimedia, etc.
• Communication:
download, upload, chat.
• Most popular CE
device in history;
most widely used
computing device.
• 1 billion sold per year.
• Handset talks to cell.
• Cells hand off
handset as it moves.
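Figure: audio player architecture: CPU, memory, audio output, display, and jog control.
Figure: CD player architecture: optical head with focus, tracking, sled, and motor servo driven by FE and TE amplifier signals; servo CPU; error corrector with memory; DAC with analog output over I2S.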
CD medium
• Linear velocity: 1.2-1.4 m/s (constant linear velocity, CLV).
• Track pitch: 1.6 microns.
• Diameter: 120 mm.
• Pit length: 0.8-3 microns.
• Pit depth: 0.11 microns.
• Pit width: 0.5 microns.
• Laser wavelength: 780 nm.
Figure: optical pickup on a sled: laser, diffraction grating, and detectors positioned over the track.
Laser focus
Figure: main-spot detectors A, B, C, D and side-spot detectors E and F.
• Level: A+B+C+D
• Focus error: (A+C)-(B+D)
• Tracking error: E-F
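The three servo signals are simple sums and differences of detector outputs. A C sketch with a hypothetical struct for the results; which sign of each error corresponds to which physical direction depends on the optics and is not specified here.

```c
typedef struct {
    double level;           /* total reflected light */
    double focus_error;     /* zero when the spot is in focus */
    double tracking_error;  /* zero when centered on the track */
} pickup_signals_t;

/* Combine main-spot detectors A-D and side-spot detectors E, F. */
pickup_signals_t pickup_signals(double A, double B, double C, double D,
                                double E, double F) {
    pickup_signals_t s;
    s.level          = A + B + C + D;
    s.focus_error    = (A + C) - (B + D);
    s.tracking_error = E - F;
    return s;
}
```

Equal illumination of all detectors gives zero focus and tracking errors; an imbalance between the diagonal pairs or between the side spots produces the error signal the servo drives back to zero.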
• Eight-to-fourteen modulation:
• Fourteen-bit code guarantees a minimum and a maximum distance between transitions.
• Example: 00000011 → 00100100000000
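Figure: audio encoder: filter bank and FFT-based masking model; scale factors are chosen, samples are requantized, and the results are multiplexed into the bit stream (0101..).
Figure: audio decoder: the bit stream (0101..) is demultiplexed; samples are inverse quantized using the scale factors and step size, expanded, and passed through the inverse filter bank.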
Figure: Bayer pattern.
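Figure: classes PC and Motion-estimator (memory[], compute-mv()); :PC invokes compute-mv() on :Motion-estimator, and both access memory[].
Figure: macroblocks and the search area.
Figure: motion estimator architecture: search area and macroblock networks feed processing elements PE 0 through PE 15; a comparator produces the motion vector; an address generator and control sequence the accesses.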
Pixel schedules
Figure: pixel schedules for PE 0, PE 1, PE 2: PE 0 computes |M(0,0)-S(0,0)| and then |M(0,1)-S(0,1)|, while PE 1 computes |M(0,0)-S(0,1)| using M(0,0) passed from PE 0.
Figure: PEs connected by communication links into a network.
PEs may be CPUs or ASICs.
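Networks in embedded systems
Figure: a PE at the sensor performs initial processing; other PEs perform more processing and drive an actuator.
Figure: PE 1, PE 2, PE 3 connected by link 1 and link 2.
Figure: arbitration among PE 1 to PE 4 when A, B, C all request: fixed priority grants A B C, A B C; round-robin grants A B C, B C A.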
Crossbar
Figure: 4 × 4 crossbar connecting in1 to in4 with out1 to out4.
Crossbar characteristics
• Non-blocking.
• Can handle arbitrary multi-cast
combinations.
• Size proportional to n².
Figure: I2C bus: masters 1 and 2 and slaves 1 and 2 share the data line (SDL) and the clock line (SCL).
Figure: SDL and SCL waveforms; a multi-byte write over time: S, adrs, 1, data, P.
Ethernet packet format
Figure: processes P1 and P2 communicating across Network 1 and Network 2.
• Computational requirements:
• sum up process requirements over least-common multiple of periods, average over one period.
• Communication requirements:
• Count all transmissions in one period.
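Figure: OSI model on two end hosts (application, presentation, session, transport, network, data link, physical), connected through an intermediate IP node that implements the network, data link, and physical layers.
Figure: Internet protocol stack: TCP and UDP (User Datagram Protocol) over IP.
Figure: QuickCam server: HTTP server and QuickCam software on a Java VM, running on a Java nanokernel on a 486.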
• 11-bit destination address.
• RTR bit determines
read/write from/to
destination.
• Any node can detect
bus error, interrupt
packet for
retransmission.
CAN controller
• Controller implements
physical and data link
layers.
• No network layer needed: the bus provides end-to-end connections.
Figure: building floors served by Hoistway 1 and Hoistway 2.
Theory of operation
• Each floor has control panel, display.
• Each car has control panel:
• one button per floor;
• emergency stop.
• Controlled by a single controller.
Figure: coarse and fine sensors along the hoistway.
Figure: elevator class diagram relating the Controller, N Cars, F Floors, Master-control-panel*, Car-control-panel*, Floor-control-panel*, Coarse-sensor*, Fine-sensor*, and Motor*.
Sensor*: hit: boolean. Subclasses: Coarse-sensor*, Fine-sensor*.
Car-control-panel*: Floors[1..F]: boolean; emergency-stop: boolean; open-door, close-door: boolean. Related class: Master-control-panel...
Motor*: speed: {o,s,f}.
Floor-control-panel*: up, down: boolean.
Car and Floor classes
Car: request-lights[1..F]: boolean; current-floor: integer.
Floor: up-light, down-light: boolean.
Controller: car-floor[1..H]: integer; emergency-stop[1..H]: integer; scan-cars(); scan-floors(); scan-master-panel(); operate().
Figure: waterfall model: requirements → architecture → coding → testing → maintenance.
Waterfall model steps
• Requirements: determine basic
characteristics.
• Architecture: decompose into basic
modules.
• Coding: implement and integrate.
• Testing: exercise and uncover bugs.
• Maintenance: deploy, fix bugs, upgrade.
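Figure: spiral model: repeated cycles of requirements, design, and test yield system feasibility, specification, prototype, initial system, and enhanced system.
Figure: successive refinement: repeated passes of specify, architect, design, build, test.
Figure: hierarchical design flow: requirements and specification, architecture, integration, testing.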
Co-design methodology
• Must architect hardware and software
together:
• provide sufficient resources;
• avoid software bottlenecks.
• Can build pieces somewhat
independently, but integration is major
step.
• Also requires bottom-up feedback.
Figure: hardware/software co-design flow: spec → architecture → HW and SW architecture → separate hardware and software detailed design → integration → test of the system, hardware, and software.
Concurrent engineering
• Large projects use many people from
multiple disciplines.
• Work on several tasks at once to reduce
design time.
• Feedback between tasks helps improve
quality, reduce number of later design
problems.
• Used in telecommunications protocol design.
• Event-oriented state machine model.
Figure: telephone example: from on-hook, the caller goes off-hook and then gets a dial tone.
Figure: statechart OR state compared with the traditional equivalent (states S3, input i2).
Statechart AND state
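Figure: statechart AND state sab with concurrent components S1/S2 and S3/S4 (transitions a, b, c, d, and r to S5), and the equivalent traditional state machine with product states S1-3, S1-4, S2-3, S2-4.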
AND-OR tables
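• Alternate way of specifying complex conditions:
cond1 or (cond2 and !cond3)
Each column is the AND of its rows, the table is the OR of its columns, and - means don't care:
         OR
cond1    T   -
cond2    -   T
cond3    -   F
Figure: state description with conditions b, c, d and outputs.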
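Evaluating an AND-OR table mechanically makes its semantics explicit: a column matches when every T row is true and every F row is false, and the table is true if any column matches. A C sketch with hypothetical type names, sized for the example above:

```c
#include <stdbool.h>

/* Table cell: must be true, must be false, or don't care (-). */
typedef enum { CELL_DC, CELL_T, CELL_F } cell_t;

#define NCOND 3
#define NCOL  2

/* Each column is an AND of its rows; the table is the OR of its columns. */
bool and_or_table(const cell_t table[NCOND][NCOL], const bool cond[NCOND]) {
    for (int col = 0; col < NCOL; col++) {
        bool column_true = true;
        for (int row = 0; row < NCOND; row++) {
            cell_t c = table[row][col];
            if ((c == CELL_T && !cond[row]) || (c == CELL_F && cond[row]))
                column_true = false;
        }
        if (column_true)
            return true;
    }
    return false;
}
```

Encoding the example table as {{T, -}, {-, T}, {-, F}} reproduces cond1 or (cond2 and !cond3) for all eight input combinations.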
TCAS top-level description
Figure: TCAS top-level state machine CAS with states power-off and power-on (containing fully-operational and own-aircraft C).
Inputs:
TCAS-operational-status {operational, not-operational}
Outputs:
sound-aural-alarm: {true, false}
aural-alarm-inhibit: {true, false}
combined-control-out: enumerated, etc.
CRC cards
• Well-known method for analyzing a
system and developing an architecture.
• CRC:
• classes;
• responsibilities of each class;
• collaborators are other classes that work with
a class.
• Team-oriented methodology.
Figure: front and back of a CRC card.
Figure: bugs arising during requirements and coding.