Lecture 02 Hardware

Embedded System
Hardware
These slides use Microsoft clip arts.

Microsoft copyright restrictions apply.
TU Dortmund
Motivation
 The need to consider both hardware and software is one of

the characteristics of embedded/cyber-physical systems.
Reasons:
• Real-time behavior
• Efficiency
- Energy
- …
• Security
• Reliability
• …

- 2-
TU Dortmund
Structure of this course
Application Knowledge
2: Specification
Design repository
Design
3: ES-hardware
4: system software (RTOS, middleware, …)
5: Evaluation & validation (energy, cost, performance, …)
6: Application mapping
7: Optimization
8: Test
Numbers denote sequence of chapters
- 3-
TU Dortmund
Embedded System Hardware
Embedded system hardware is frequently used in a loop

(“hardware in a loop“):
 cyber-physical systems
- 4-
TU Dortmund
Many examples of such loops
 Heating
 Lights
 Engine control
 Power supply
 …
 Robots
Heating: www.masonsplumbing.co.uk/images/heating.jpg
Robot: Courtesy and ©: H.Ulbrich, F. Pfeiffer, TU München
- 5-
TU Dortmund
Sensors
Processing of physical data starts with capturing this data.

Sensors can be designed for virtually every physical and
chemical quantity
 including weight, velocity, acceleration, electrical
current, voltage, temperatures etc.
 chemical compounds.
Many physical effects used for constructing sensors.
Examples:
 law of induction (generation of voltages in a changing magnetic field),
 photoelectric effects.
A huge number of sensors has been designed in recent years.
- 6-
TU Dortmund
Example: Acceleration Sensor
Courtesy & ©: S. Bütgenbach, TU Braunschweig
- 7-
TU Dortmund
Charge-coupled devices (CCD) image sensors
Based on charge transfer to next pixel cell
Corresponding to “bucket brigade device”

(German: “Eimerkettenschaltung”)
- 8-
TU Dortmund
CMOS image sensors
Based on standard
production process
for CMOS chips,
allows integration
with other
components.
- 9-
TU Dortmund
Comparison CCD/CMOS sensors
Property | CCD | CMOS
Technology optimized for | Optics | VLSI technology
Technology | Special | Standard
Smart sensors | No, no logic on chip | Logic elements on chip
Access | Serial | Random
Size | Limited | Can be large
Power consumption | Low | Larger
Applications | Compact cameras | Low cost devices, SLR cameras
See also B. Diericks: CMOS image sensor concepts. Photonics West 2000 Short course (Web)
- 10 -
TU Dortmund
Example: Biometrical Sensors
e.g.: Fingerprint sensor
© P. Marwedel, 2010
- 11 -
TU Dortmund
Artificial eyes
© Dobelle Institute
(was at www.dobelle.com)
- 12 -
TU Dortmund
Artificial eyes (2)
He looks hale, hearty, and healthy — except

for the wires. …. From a distance the wires
look like long ponytails.
© Dobelle Institute
- 13 -
TU Dortmund
Artificial eyes (3)
 Translation into sound;

resolution claimed to be good
[http://www.seeingwithsound.com/etumble.htm]
Movie - 14 -
TU Dortmund
Other sensors
 Rain sensors for wiper control

(“Sensors multiply like rabbits“ [ITT automotive])
 Pressure sensors
 Proximity sensors
 Engine control sensors
 Hall effect sensors
- 15 -
TU Dortmund
Signals
Sensors generate signals
Definition: a signal s is a mapping

from the time domain DT to a value domain DV:
s: DT → DV
DT : continuous or discrete time domain
DV : continuous or discrete value domain.
- 16 -
Discretization
Graphics: © Alexandra Nolte, Gesine Marwedel, 2003

TU Dortmund
Discretization of time
Digital computers require discrete sequences of physical

values
s: DT → DV
Discrete time domain
 Sample-and-hold circuits
- 18 -
TU Dortmund
Sample-and-hold circuits
Clocked transistor + capacitor;

Capacitor stores sequence values
e(t) is a mapping ℝ → ℝ
h(t) is a sequence of values or a mapping ℤ → ℝ
- 19 -
TU Dortmund
Do we lose information due to sampling?
Would we be able to reconstruct input signals from the

sampled signals?
 approximation of signals by sine waves.
- 20 -
TU Dortmund
Approximation of a square wave (1)
Target: square wave with period p_1 = 4
$$e'_K(t) = \sum_{k=1,3,5,\ldots}^{K} \frac{4}{\pi k} \sin\left(\frac{2\pi t}{p_k}\right)$$
with p_k = p_1/k for all k: periods of contributions to e'
[Figure panels: K=1, K=3]
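The sum of odd sine harmonics for the square wave can be evaluated numerically. The following is a minimal sketch, not taken from the slides; the function name `square_wave_approx` and the sample point t = 1 are illustrative assumptions.

```python
import math

def square_wave_approx(t, K, p1=4.0):
    """Evaluate e'_K(t): sum over odd k = 1, 3, ..., K of 4/(pi*k) * sin(2*pi*t/p_k)."""
    total = 0.0
    for k in range(1, K + 1, 2):
        p_k = p1 / k  # period of the k-th contribution
        total += 4.0 / (math.pi * k) * math.sin(2.0 * math.pi * t / p_k)
    return total

# At t = 1 (a quarter of the period p1 = 4) the target square wave has the value 1.
for K in (1, 3, 7, 11):
    print(K, round(square_wave_approx(1.0, K), 4))
```

Increasing K adds higher harmonics, so the printed values move closer to the target value.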
- 21 -
TU Dortmund
Approximation of a square wave (2)
$$e'_K(t) = \sum_{k=1,3,5,\ldots}^{K} \frac{4}{\pi k} \sin\left(\frac{2\pi t}{4/k}\right)$$
[Figure panel: K=7]
- 22 -
TU Dortmund
Approximation of a square wave (3)
$$e'_K(t) = \sum_{k=1,3,5,\ldots}^{K} \frac{4}{\pi k} \sin\left(\frac{2\pi t}{4/k}\right)$$
[Figure panel: K=11]
Applet at © http://www.jhu.edu/~signals/fourier2/index.html
- 23 -
TU Dortmund
Linear transformations
Let e1(t) and e2(t) be signals
Definition: A transformation Tr of signals is linear iff

Tr(e1 + e2) = Tr(e1) + Tr(e2)
In the following, we will consider linear transformations.

 We consider sums of sine waves instead of the original
signals.
- 24 -
TU Dortmund
Aliasing
$$e_3(t) = \sin\left(\frac{2\pi t}{8}\right) + 0.5 \sin\left(\frac{2\pi t}{4}\right)$$
$$e_4(t) = \sin\left(\frac{2\pi t}{8}\right) + 0.5 \sin\left(\frac{2\pi t}{4}\right) + 0.5 \sin\left(\frac{2\pi t}{1}\right)$$
Periods of 8,4,1
Indistinguishable if sampled at integer times, ps=1
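A short numerical check of this claim, as a sketch. It assumes that all sine terms of e3 and e4 are added, with periods 8, 4 and 1; the function names are illustrative.

```python
import math

def e3(t):
    # two sine waves with periods 8 and 4
    return math.sin(2 * math.pi * t / 8) + 0.5 * math.sin(2 * math.pi * t / 4)

def e4(t):
    # e3 plus a faster sine wave with period 1
    return e3(t) + 0.5 * math.sin(2 * math.pi * t / 1)

# Sampling at integer times (ps = 1): the period-1 term is always sampled at a zero crossing.
for t in range(9):
    print(t, round(e3(t), 6), round(e4(t), 6))

# Between the sampling points the two signals differ clearly.
print(round(e4(0.25) - e3(0.25), 6))
```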
Matlab demo - 25 -
TU Dortmund
Aliasing (2)
 Reconstruction impossible, if not sampling frequently

enough
How frequently do we have to sample?
Nyquist criterion (sampling theory):
Aliasing can be avoided if we restrict the frequencies of
the incoming signal to less than half of the sampling rate.
ps < ½ pN where pN is the period of the “fastest” sine wave
or fs > 2 fN where fN is the frequency of the “fastest” sine wave
fN is called the Nyquist frequency, fs is the sampling rate.
See e.g. [Oppenheim/Schafer, 2009]
- 26 -
TU Dortmund
Anti-aliasing filter
A filter is needed to remove high frequencies
e4(t) changed into e3(t)
[Figure: filter with input e(t) and output g(t); characteristic of the ideal filter vs. a realizable filter over frequency, with fs/2 and fs marked]
- 27 -
TU Dortmund
Examples of Aliasing in computer graphics
Original Sub-sampled, no filtering
http://en.wikipedia.org/wiki/Image:Moire_pattern_of_bricks_small.jpg
- 28 -
TU Dortmund
Examples of Aliasing in computer graphics (2)
Original (pdf screen copy)
Filtered & sub-sampled
Sub-sampled, no filtering
Impact of rasterization
http://www.niirs10.com/Resources/Reference Documents/Accuracy in Digital Image Processing.pdf
- 29 -
TU Dortmund
Discretization of values: A/D-converters
Digital computers require digital form of physical values
s: DT  DV
Discrete value domain
A/D-conversion; many methods with different speeds.
- 30 -
TU Dortmund
Flash A/D converter
[Figure: flash A/D converter circuit; one comparator is marked *]
Encodes input number of most significant ‘1’ as an unsigned number, e.g.
“1111” -> “100”,
“0111” -> “011”,
“0011” -> “010”,
“0001” -> “001”,
“0000” -> “000”
(Priority encoder).
* Frequently, the case h(t) > Vref would not be decoded
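The comparator stage and the priority encoder can be sketched in a few lines. This is only an illustration: it assumes four comparators with thresholds Vref, 3Vref/4, Vref/2 and Vref/4 (the circuit itself is not recoverable from the text), and all function names are hypothetical.

```python
def thermometer_code(h, v_ref, levels=4):
    """Comparator outputs, highest threshold first (assumed thresholds: i * v_ref / levels)."""
    thresholds = [v_ref * i / levels for i in range(levels, 0, -1)]
    return "".join("1" if h > th else "0" for th in thresholds)

def priority_encode(code):
    """Number of the most significant '1', counted from 1 at the right; 0 if there is none."""
    first_one = code.find("1")
    return 0 if first_one < 0 else len(code) - first_one

def flash_adc(h, v_ref):
    """Encode the input voltage as a 3-bit unsigned number."""
    return format(priority_encode(thermometer_code(h, v_ref)), "03b")

print(thermometer_code(0.6, 1.0), flash_adc(0.6, 1.0))
```

All comparators work in parallel, which is why the conversion time does not depend on the number of levels.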
- 31 -
TU Dortmund
Assuming 0 ≤ h(t) ≤ Vref
Encoding of voltage intervals
“11“
“10“
“01“
“00“
Vref /4 Vref /2 3Vref /4 Vref h(t)
- 32 -
TU Dortmund
Resolution
 Resolution (in bits): number of bits produced

 Resolution Q (in volts): difference between two input
voltages causing the output to be incremented by 1
$$Q = \frac{V_{FSR}}{n}$$
with
Q: resolution in volts per step
V_FSR: difference between largest and smallest voltage
n: number of voltage intervals
Example: Q = Vref /4 for the previous slide, assuming * to be absent
- 33 -
TU Dortmund
Resolution and speed of Flash A/D-converter
Parallel comparison with reference voltage

Speed: O(1)
Hardware complexity: O(n)
Applications: e.g. in video processing
- 34 -
TU Dortmund
Higher resolution:
Successive approximation
[Figure: successive approximation circuit with signals h(t), V- and w(t)]
Key idea: binary search:
Set MSB='1'
if too large: reset MSB
Set MSB-1='1'
if too large: reset MSB-1
Speed: O(log2(n))
Hardware complexity: O(log2(n))
with n = # of distinguished voltage levels;
slow, but high precision possible.
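The binary search can be written down as a short sketch. It assumes an ideal D/A converter whose output is Vref * code / 2^bits; the function name `sar_convert` and the example voltages are illustrative.

```python
def sar_convert(h, v_ref, bits):
    """Successive approximation: decide one bit per step, starting with the MSB."""
    code = 0
    for bit in range(bits - 1, -1, -1):
        trial = code | (1 << bit)              # set the current bit to '1'
        v_minus = v_ref * trial / (1 << bits)  # assumed ideal D/A output for the trial code
        if v_minus <= h:                       # not too large: keep the bit
            code = trial
        # otherwise the bit stays reset
    return code

print(format(sar_convert(0.7, 1.0, 4), "04b"))
```

For h = 0.7 and Vref = 1 the trial codes are 1000, 1100, 1010 and 1011, so only one comparison per bit is needed.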
- 35 -
TU Dortmund
Successive approximation (2)
[Figure: V- over time t approaching the input voltage Vx = h(t); voltage axis marked with the codes 1000, 1010, 1011 and 1100]
- 36 -
TU Dortmund
Application areas for flash

and successive approximation converters
Effective number of bits at bandwidth
(used in multimeters)
(using single bit D/A-converters;
common for high quality audio equipment)
[http://www.beis.de/Elektronik/
DeltaSigma/DeltaSigma.html]
(Pipelined flash
converters)
[Gielen et al., DAC 2003]

Movie IEEE tv - 37 -
TU Dortmund
Quantization Noise
Assuming “rounding“ (truncating) towards 0
[Figure: h(t), w(t) and the difference w(t)-h(t)]
- 38 -
TU Dortmund
Quantization Noise
Assuming “rounding“ (truncating) towards 0
[Figure: h(t), w(t) and the difference h(t)-w(t)]
- 39 -
TU Dortmund
Quantization noise for audio signal
$$\text{signal to noise ratio (SNR) [dB]} = 20 \log_{10}\left(\frac{\text{effective signal voltage}}{\text{effective noise voltage}}\right)$$
e.g.: 20 log10(2) = 6.02 decibels
Signal to noise for ideal n-bit converter: n * 6.02 + 1.76 [dB]
e.g. 98.1 dB for 16-bit converter, ~ 146 dB for 24-bit converter
Additional noise for non-ideal converters
Source: [http://www.beis.de/Elektronik/DeltaSigma/DeltaSigma.html]
MATLAB demo - 40 -
TU Dortmund
Signal to noise ratio
$$\text{signal to noise ratio (SNR) [dB]} = 20 \log_{10}\left(\frac{\text{effective signal voltage}}{\text{effective noise voltage}}\right)$$
e.g.: 20 log10(2) = 6.02 decibels
Signal to noise for ideal n-bit converter: n * 6.02 + 1.76 [dB]
e.g. 98.1 dB for 16-bit converter, ~ 146 dB for 24-bit converter
Additional noise for non-ideal converters
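The two formulas are easy to evaluate. A minimal sketch; the function names are illustrative and not part of the slides.

```python
import math

def snr_db(signal_voltage, noise_voltage):
    """SNR in dB from effective signal and noise voltages."""
    return 20.0 * math.log10(signal_voltage / noise_voltage)

def ideal_converter_snr_db(n_bits):
    """Signal to noise ratio of an ideal n-bit converter: n * 6.02 + 1.76 dB."""
    return n_bits * 6.02 + 1.76

print(round(snr_db(2.0, 1.0), 2))
for n in (8, 16, 24):
    print(n, round(ideal_converter_snr_db(n), 2))
```

Each additional bit doubles the number of steps and therefore adds about 6.02 dB.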
- 41 -
TU Dortmund
Summary
Hardware in a loop
 Sensors
 Discretization
• Definition of signals
• Sample-and-hold circuits
- Aliasing (and how to avoid it)
- Nyquist criterion
• A/D-converters
- Flash-based
- Successive approximation
- Quantization noise
- 42 -
Hardware
- Processing -
Embedded System

TU Dortmund

- 44 -
TU Dortmund
Processing units
Need for efficiency (power + energy):
Why worry about

energy and power?
“Power is considered as the most important constraint in

embedded systems“
[in: L. Eggermont (ed): Embedded Systems Roadmap 2002, STW]
Energy consumption by IT is the key concern

of green computing initiatives (embedded
computing leading the way)
http://www.esa.int/images/earth,4.jpg
- 45 -
TU Dortmund
Importance of Energy Efficiency
“inherent power efficiency of silicon“
© Hugo De Man, IMEC, Philips, 2007
- 46 -
TU Dortmund
Power and energy are related

to each other
$$E = \int P \, dt$$
[Figure: power P over time t, with the areas E and E' under two power curves]
In many cases, faster execution also means less energy,
but the opposite may be true if power has to be increased
to allow faster execution.
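The relation between power and energy can be illustrated numerically. The sketch below approximates the integral of P over time with the trapezoidal rule; the power values are made-up examples, not measurements.

```python
def energy(power_samples, dt):
    """Trapezoidal approximation of E = integral of P dt for equally spaced samples."""
    pairs = zip(power_samples, power_samples[1:])
    return sum((a + b) / 2.0 * dt for a, b in pairs)

slow = energy([1.0] * 5, 1.0)          # 1 W for 4 s
fast_same_power = energy([1.0] * 3, 1.0)  # 1 W for 2 s: faster and less energy
fast_more_power = energy([3.0] * 3, 1.0)  # 3 W for 2 s: faster but more energy
print(slow, fast_same_power, fast_more_power)
```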
- 47 -
TU Dortmund
Low Power vs. Low Energy Consumption
 Minimizing power consumption important for

• the design of the power supply
• the design of voltage regulators
• the dimensioning of interconnect
• short term cooling
 Minimizing energy consumption important due to
• restricted availability of energy (mobile systems)
• limited battery capacities (only slowly improving)
• very high costs of energy (solar panels, in space)
• cooling
- high costs
- limited space
• dependability
- long lifetimes, low temperatures
- 48 -
TU Dortmund
Power density continues to get worse
Nuclear reactor
Prescott: 90 W/cm²,
90 nm [c‘t 4/2004]
© Intel
M. Pollack,
Micro-32
- 49 -
TU Dortmund
Surpassed hot (kitchen) plate …?

Why not use it?
http://
www.phys.ncku.edu.tw/
~htsu/humor/fry_egg.html
- 50 -
TU Dortmund
Energy consumption in mobile devices
[O. Vargas (Infineon Technologies): Minimum power consumption in mobile-phone memory subsystems; Pennwell
Portable Design - September 2005;] Thanks to Thorsten Koch (Nokia/ Univ. Dortmund) for providing this source.
- 51 -
TU Dortmund
Application Specific Circuits (ASICs)

or Full Custom Circuits
Custom-designed circuits necessary

 if ultimate speed or
 energy efficiency is the goal and
 large numbers can be sold.
Approach suffers from
 long design times,
 lack of flexibility
(changing standards) and
 high costs
(e.g. millions of $ in mask costs).
- 53 -
TU Dortmund
Mask cost for specialized HW

becomes very expensive
Trend towards implementation in Software
HW synthesis not covered in this course.
[http://www.molecularimprints.com/Technology/
tech_articles/MII_COO_NIST_2001.PDF9]
- 54 -
TU Dortmund
Key requirements for processors
1. Energy/power-efficiency
- 55 -
TU Dortmund
Dynamic power management (DPM)
Example: STRONGARM SA1100
RUN: operational (400 mW)
IDLE: a SW routine may stop the CPU when not in use, while monitoring interrupts (50 mW)
SLEEP: shutdown of on-chip activity (160 µW)
State transitions:
RUN → IDLE: 10 µs
IDLE → RUN: 10 µs
RUN → SLEEP: 90 µs (power fault signal)
IDLE → SLEEP: 90 µs (power fault signal)
SLEEP → RUN: 160 ms
- 56 -
TU Dortmund
Fundamentals of dynamic voltage

scaling (DVS)
Power consumption of CMOS circuits (ignoring leakage):
$$P = \alpha \, C_L \, V_{dd}^2 \, f$$
with
α: switching activity
C_L: load capacitance
V_dd: supply voltage
f: clock frequency
Delay for CMOS circuits:
$$\tau = k \, C_L \, \frac{V_{dd}}{(V_{dd} - V_t)^2}$$
with
V_t: threshold voltage (V_t smaller than V_dd)
Decreasing Vdd reduces P quadratically,

while the run-time of algorithms is only linearly increased
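Both formulas can be evaluated for a few supply voltages. This is a sketch with arbitrary example parameters (alpha, C_L, k, V_t and f are placeholders, not data for a real process); the function names are illustrative.

```python
def dynamic_power(alpha, c_load, v_dd, f):
    """P = alpha * C_L * Vdd^2 * f (leakage ignored)."""
    return alpha * c_load * v_dd ** 2 * f

def gate_delay(k, c_load, v_dd, v_t):
    """tau = k * C_L * Vdd / (Vdd - Vt)^2."""
    return k * c_load * v_dd / (v_dd - v_t) ** 2

# Arbitrary example values, chosen only to show the trend.
for v_dd in (1.6, 1.3, 1.1):
    p = dynamic_power(alpha=0.5, c_load=1e-9, v_dd=v_dd, f=100e6)
    tau = gate_delay(k=1.0, c_load=1e-9, v_dd=v_dd, v_t=0.3)
    print(v_dd, p, tau)
```

At a fixed clock frequency, halving Vdd divides the power by four, while the delay grows much more slowly as long as Vt is small compared with Vdd.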
- 57 -
TU Dortmund
Variable-voltage/frequency example: INTEL Xscale
OS should
schedule
distribution
of the
energy
budget.
From Intel’s Web Site

- 59 -
TU Dortmund
Low voltage, parallel operation more efficient

than high voltage, sequential operation
Basic equations
Power: P ~ VDD² ,
Maximum clock frequency: f ~ VDD ,
Energy to run a program: E = P × t, with: t = runtime (fixed)
Time to run a program: t ~ 1/f
Changes due to parallel processing, with β operations per clock:
Clock frequency reduced to: f’ = f / β,
Voltage can be reduced to: VDD’ = VDD / β,
Power for parallel processing: P° = P / β² per operation,
Power for β operations per clock: P’ = β × P° = P / β,
Time to run a program is still: t’ = t,
Energy required to run program: E’ = P’ × t = E / β
Rough approximations!
Argument in favour of voltage scaling, VLIW processors, and multi-cores
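The chain of rough approximations can be followed step by step in code. In this sketch the number of operations per clock is called `ops_per_clock`; that name and the example numbers are assumptions for illustration.

```python
def parallel_scaling(power, runtime, ops_per_clock):
    """Follow the rough scaling argument: lower clock and voltage, more operations per clock."""
    power_per_op = power / ops_per_clock ** 2   # power per operation at reduced voltage
    total_power = ops_per_clock * power_per_op  # all parallel operations together
    energy = total_power * runtime              # the runtime is unchanged
    return total_power, energy

sequential_energy = 4.0 * 2.0  # E = P * t for P = 4 W, t = 2 s
print(sequential_energy, parallel_scaling(4.0, 2.0, 2))
```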
- 60 -
TU Dortmund
Application: VLIW processing and

voltage scaling in the Crusoe processor
 VDD: 32 levels (1.1V - 1.6V)

 Clock: 200MHz - 700MHz in increments of 33MHz
Scaling is triggered when CPU load change is detected by
software (~1/2 ms).
 More load: Increase of supply voltage (~20 ms/step),
followed by scaling clock frequency
 Less load: reduction of clock frequency, followed by
reduction of supply voltage
Worst case (1.1V to 1.6V VDD, 200MHz to 700MHz) takes
280 ms
- 61 -
TU Dortmund
Key requirement #2: Code-size efficiency

 CISC machines: RISC machines designed for run-time-,
not for code-size-efficiency
 Compression techniques: key idea
- 63 -
TU Dortmund
Code-size efficiency
 Compression techniques (continued):
• 2nd instruction set, e.g. ARM Thumb instruction set:
16-bit Thumb instr.: ADD Rd #constant
001 | 10 | Rd | Constant
(major opcode | minor opcode | source = destination | constant)
Dynamically decoded at run-time into:
1110 | 001 | 01001 | 0 Rd | 0 Rd | 0000 | Constant
(constant zero extended)

• Reduction to 65-70 % of original code size
• 130% of ARM performance with 8/16 bit memory
• 85% of ARM performance with 32-bit memory [ARM, R. Gupta]
Same approach for LSI TinyRisc, …

Requires support by compiler, assembler etc.
- 64 -
TU Dortmund
Dictionary approach, two level control store

(indirect addressing of instructions)
“Dictionary-based coding schemes cover a wide range of

various coders and compressors.
Their common feature is that the methods use some kind of a
dictionary that contains parts of the input sequence which
frequently appear.
The encoded sequence in turn contains references to the
dictionary elements rather than containing these over and
over.”
[Á. Beszédes et al.: Survey of Code-Size Reduction Methods, ACM Computing Surveys, Vol. 35, Sept. 2003, pp 223-267]
- 65 -
TU Dortmund
Key idea (for d bit instructions)
Uncompressed storage of a d-bit-wide instructions requires a×d bits.
In compressed code, each instruction pattern is stored only once, in a small table of used instructions (“dictionary”) with c ≤ 2^b entries of d bits each.
For each instruction address, S contains the table address (b bits, b « d) of the instruction.
Hopefully, a×b + c×d < a×d.
Called nanoprogramming in the Motorola 68000.
[Figure: instruction address → table S (a entries, b bits each) → dictionary (c entries, d bits each) → CPU]
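The size comparison can be tried out on a toy program. The sketch below builds the dictionary and the table S for a list of instruction patterns; the mnemonic strings and the function names are illustrative assumptions.

```python
import math

def compress(program):
    """Return (S, dictionary): S holds one dictionary index per instruction address."""
    dictionary, index_of, s_table = [], {}, []
    for instr in program:
        if instr not in index_of:        # store each instruction pattern only once
            index_of[instr] = len(dictionary)
            dictionary.append(instr)
        s_table.append(index_of[instr])
    return s_table, dictionary

def storage_bits(a, c, d):
    """(uncompressed, compressed) size in bits: a*d versus a*b + c*d, with c <= 2^b."""
    b = max(1, math.ceil(math.log2(c)))
    return a * d, a * b + c * d

s_table, dictionary = compress(["add", "ld", "add", "st", "add"])
print(s_table, dictionary)
print(storage_bits(a=1000, c=16, d=32))
```

Compression only pays off if few distinct patterns occur; with c close to a, the dictionary itself costs as much as the original code.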
- 66 -
TU Dortmund
More information on code compaction
 Popular code compaction library by Rik van de Wiel

[http://www.extra.research.philips.com/ccb] has been
moved to
http://www-perso.iro.umontreal.ca/~latendre/
codeCompression/codeCompression/node1.html
http://www.iro.umontreal.ca/~latendre/compactBib/
(153 entries as per 11/2004)
- 68 -
TU Dortmund
Key requirement #3: Run-time efficiency

- Domain-oriented architectures -
Example: Filtering in Digital signal processing (DSP)
Signal at t=ts (sampling points)
- 69 -
TU Dortmund
Filtering in digital signal processing
ADSP 2100
-- outer loop over

-- sampling times ts
{ MR:=0; A1:=1; A2:=s-1;
MX:=w[s]; MY:=a[0];
for (k=0; k <= (n−1); k++)
{ MR:=MR + MX * MY;
MX:=w[A2]; MY:=a[A1];
A1++; A2--;
}
x[s]:=MR;
}
Maps nicely
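For comparison, the inner loop can be mirrored in Python. This sketch assumes that w holds the input samples, a the n filter coefficients, and that x[s] is the sum of w[s-k] * a[k]; unlike the ADSP code it skips the loads after the last multiply/accumulate to stay within the list bounds.

```python
def fir_output(w, a, s):
    """Compute x[s] = sum over k of w[s-k] * a[k], for s >= len(a) - 1."""
    n = len(a)
    mr = 0.0
    a1, a2 = 1, s - 1          # address registers for a[] and w[]
    mx, my = w[s], a[0]        # operand registers, preloaded
    for k in range(n):
        mr += mx * my          # multiply/accumulate
        if k < n - 1:          # load operands for the next iteration
            mx, my = w[a2], a[a1]
        a1 += 1
        a2 -= 1
    return mr

print(fir_output([1.0, 2.0, 3.0, 4.0], [0.5, 0.25], 3))
```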
- 70 -
TU Dortmund
DSP-Processors: multiply/accumulate (MAC)

and zero-overhead loop (ZOL) instructions
MR:=0; A1:=1; A2:=s-1; MX:=w[s]; MY:=a[0];

for ( k:=1 <= n-1)
{MR:=MR+MX*MY; MY:=a[A1]; MX:=w[A2]; A1++; A2--}
Multiply/accumulate (MAC) instruction
Zero-overhead loop (ZOL) instruction preceding MAC instruction.
Loop testing done in parallel to MAC operations.
- 71 -
TU Dortmund
Heterogeneous registers
Example (ADSP 210x):
[Figure: data path with memories P and D; address registers A0, A1, A2 .. with address generation unit (AGU, +,-); ALU (+,-,..) with registers AX, AY, AF, AR; multiplier (*) with registers MX, MY, MF, MR]
Different functionality of registers An, AX, AY, AF, MX, MY, MF, MR
- 72 -
TU Dortmund
Separate address generation units (AGUs)
Example (ADSP 210x):
 Data memory can only be
fetched with address contained
in A,
 but this can be done in parallel
with operation in main data path
(takes effectively 0 time).
 A := A ± 1 also takes 0 time,
 same for A := A ± M;
 A := <immediate in instruction>
requires extra instruction
 Minimize load immediates
 Optimization in optimization
chapter
- 73 -
TU Dortmund
Modulo addressing
Modulo addressing:
Am++ ≡ Am:=(Am+1) mod n
(implements ring or circular buffer in memory)
[Figure: sliding window over the signal w at time t1]
n most recent values:
Memory, t=t1 | Memory, t2= t1+1
.. | ..
w[t1-1] | w[t1-1]
w[t1] | w[t1]
w[t1-n+1] | w[t1+1]
w[t1-n+2] | w[t1-n+2]
.. | ..
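A ring buffer with modulo addressing can be sketched as follows. The class and method names are illustrative; on a DSP the modulo update is done by the address generation unit, not in software.

```python
class RingBuffer:
    """Keeps the n most recent values; the write address wraps around modulo n."""

    def __init__(self, n):
        self.n = n
        self.data = [0] * n
        self.am = 0  # address register

    def push(self, value):
        self.data[self.am] = value          # overwrite the oldest value
        self.am = (self.am + 1) % self.n    # Am++ with modulo addressing

    def recent(self):
        """Values from newest to oldest."""
        return [self.data[(self.am - 1 - i) % self.n] for i in range(self.n)]

rb = RingBuffer(3)
for sample in (1, 2, 3, 4, 5):
    rb.push(sample)
print(rb.data, rb.recent())
```

Only one memory cell changes per time step, which matches the two memory snapshots: the new value replaces the oldest one.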
- 74 -
TU Dortmund
Saturating arithmetic
 Returns largest/smallest number in case of

over/underflows
 Example:
a 0111
b + 1001
standard wrap around arithmetic (1)0000
saturating arithmetic 1111
(a+b)/2: correct 1000
wrap around arithmetic 0000
saturating arithmetic + shifted 0111
“almost correct“
 Appropriate for DSP/multimedia applications:
• No timeliness of results if interrupts are generated for overflows
• Precise values less important
• Wrap around arithmetic would be worse.
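The 4-bit example can be reproduced with a small sketch for unsigned numbers. The function names are illustrative.

```python
def add_wrap(a, b, bits=4):
    """Standard wrap around arithmetic: the carry out of the top bit is lost."""
    return (a + b) & ((1 << bits) - 1)

def add_sat(a, b, bits=4):
    """Saturating arithmetic: return the largest number on overflow."""
    return min(a + b, (1 << bits) - 1)

a, b = 0b0111, 0b1001
print(format(add_wrap(a, b), "04b"), format(add_sat(a, b), "04b"))
# Average (a+b)/2: exact value, wrap around result shifted, saturated result shifted
print(format((a + b) >> 1, "04b"),
      format(add_wrap(a, b) >> 1, "04b"),
      format(add_sat(a, b) >> 1, "04b"))
```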
- 75 -
TU Dortmund
Example
MATLAB Demo - 76 -
TU Dortmund
Fixed-point arithmetic
Shifting required after multiplications and divisions in order to maintain binary point.
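A minimal sketch of why the shift is needed, assuming a format with 8 fractional bits (the constant `FRAC_BITS` and the function names are illustrative choices):

```python
FRAC_BITS = 8  # assumed position of the binary point

def to_fixed(x):
    return int(round(x * (1 << FRAC_BITS)))

def to_float(a):
    return a / (1 << FRAC_BITS)

def fixed_mul(a, b):
    # The raw product has 2 * FRAC_BITS fractional bits; shift right to restore the format.
    return (a * b) >> FRAC_BITS

def fixed_div(a, b):
    # Shift the dividend left first, otherwise the fractional bits cancel out.
    return (a << FRAC_BITS) // b

print(to_float(fixed_mul(to_fixed(1.5), to_fixed(2.25))))
print(to_float(fixed_div(to_fixed(3.0), to_fixed(1.5))))
```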
- 77 -
TU Dortmund
Real-time capability
 Timing behavior has to be predictable
[Dagstuhl workshop on predictability, Nov. 17-19, 2003]

Features that cause problems:
• Unpredictable access to shared resources
• Caches with difficult to predict replacement strategies
• Unified caches (conflicts between instructions and data)
• Pipelines with difficult to predict stall cycles ("bubbles")
• Unpredictable communication times for multiprocessors
• Branch prediction, speculative execution
• Interrupts that are possible any time
• Memory refreshes that are possible any time
• Instructions that have data-dependent execution times
 Trying to avoid as many of these as possible.
- 79 -
TU Dortmund
Multiple memory banks or memories
[Figure: ADSP 210x data path with separate memories P and D; address registers A0, A1, A2 .. with address generation unit (AGU); ALU registers AX, AY, AF, AR; multiplier registers MX, MY, MF, MR]
Simplifies parallel fetches

- 80 -
TU Dortmund
Multimedia-Instructions/Processors
 Multimedia instructions exploit that many registers,

adders etc are quite wide (32/64 bit),
 whereas most multimedia data types are narrow
(e.g. 8 bit per color, 16 bit per audio sample per channel)
 2-8 values can be stored per register and added. E.g.:
+
4 additions per instruction;
carry disabled at word
boundaries.
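The effect of disabling the carry at the boundaries can be imitated in software. The sketch below adds four 8-bit values packed into one 32-bit word; the masking trick is a common software technique and not necessarily how the hardware implements it.

```python
def packed_add_bytes(x, y):
    """Add four 8-bit lanes of two 32-bit words; no carry crosses a lane boundary."""
    low7 = 0x7F7F7F7F
    high = 0x80808080
    partial = (x & low7) + (y & low7)   # add the low 7 bits; carries stay inside each lane
    return (partial ^ ((x ^ y) & high)) & 0xFFFFFFFF  # add the top bits without carry out

print(hex(packed_add_bytes(0x01FF02FE, 0x01010101)))
```

In the example the second lane overflows from 0xFF to 0x00 without changing the lane to its left (wrap around per lane).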
- 81 -
TU Dortmund
Early example: HP precision architecture (hp PA)
Half word add instruction HADD:
Half word add?
Optional saturating arithmetic.

Up to 10 instructions can be replaced by HADD.
- 82 -
TU Dortmund
Pentium MMX-architecture (1)
64-bit vectors representing 8 byte encoded, 4 word encoded

or 2 double word encoded numbers.
wrap around/saturating options.
Multimedia registers mm0 - mm7,
consistent with floating-point registers (OS unchanged).
Instruction | Options | Comments
Padd[b/w/d], PSub[b/w/d] | wrap around, saturating | addition/subtraction of bytes, words, double words
Pcmpeq[b/w/d] | | Result= "11..11" if true, "00..00" otherwise
Pcmpgt[b/w/d] | | Result= "11..11" if true, "00..00" otherwise
Pmullw | | multiplication, 4*16 bits, least significant word
Pmulhw | | multiplication, 4*16 bits, most significant word
- 83 -
TU Dortmund
Pentium MMX-architecture (2)
Instruction | Options | Comments
Psra[w/d], Psll[w/d/q], Psrl[w/d/q] | No. of positions in register or instruction | Parallel shift of words, double words or 64 bit quad words
Punpckl[bw/wd/dq] | | Parallel unpack
Punpckh[bw/wd/dq] | | Parallel unpack
Packss[wb/dw] | saturating | Parallel pack
Pand, Pandn, Por, Pxor | | Logical operations on 64 bit words
Mov[d/q] | | Move instruction
- 84 -
TU Dortmund
Application: scaled interpolation between two images
Next word = next pixel, same color.
4 pixels processed at a time.
pxor mm7,mm7 ;clear register mm7
movq mm3,fade_val ;load scaling value
movd mm0,imageA ;load 4 red pixels for A
movd mm1,imageB ;load 4 red pixels for B
punpcklbw mm1,mm7 ;unpack, bytes to words
punpcklbw mm0,mm7 ;upper bytes from mm7
psubw mm0,mm1 ;subtract pixel values
pmulhw mm0,mm3 ;scale
paddw mm0,mm1 ;add to image B
packuswb mm0,mm7 ;pack, words to bytes
- 85 -
TU Dortmund
Short vector instruction set extensions
for Intel® Pentium®/AMD® processors
 3DNow! (AMD, 1998)
 Streaming SIMD Extensions SSE (Intel, 1999)
• 16 new registers, floating point SIMD
 SSE2 (Intel, 2001; AMD, 2003)
• MMX instructions available for new SSE registers
 SSE3 (Intel, 2004; AMD)
• vector reduction, floating point conversion independent
of global rounding mode, relaxed alignment restrictions
 SSE4 (Intel, 2006; AMD: 4 instructions implemented)
• String comparison, counting 1‘s, CRC, …
 SSE5 (AMD, 2007)
• 3-address instructions, …
 Advanced vector extensions AVX (Intel, 2008)
• Registers 256, … bit wide
- 86 -
TU Dortmund
Summary
Hardware in a loop
 Sensors
 Discretization
 Information processing
• Importance of energy efficiency
• Special purpose HW very expensive
• Energy efficiency of processors
• Code size efficiency
• Run-time efficiency
• MPSoCs
• Reconfigurable Hardware
 D/A converters
 Actuators
- 87 -
Embedded System

Processing

TU Dortmund
Key idea of very long instruction word

(VLIW) computers
Instructions included in long instruction packets.

Instruction packets are assumed to be executed in parallel.
Fixed association of packet bits with functional units.
- 89 -
TU Dortmund
Very long instruction word (VLIW) architectures
 Very long instruction word (“instruction packet”) contains

several instructions, all of which are assumed to be
executed in parallel.
 Compiler is assumed to generate these “parallel” packets
 Complexity of finding parallelism is moved from the
hardware (RISC/CISC processors) to the compiler;
Ideally, this avoids the overhead (silicon, energy, ..) of
identifying parallelism at run-time.
A lot of expectations into VLIW machines
 Explicitly parallel instruction set computers (EPICs) are an
extension of VLIW architectures: parallelism detected by
compiler, but no need to encode parallelism in 1 word.
- 90 -
TU Dortmund
EPIC: TMS 320C6xx as an example
Bit in each instruction encodes end of parallel execution

Instruction (32 bits each, bits 31..0): A B C D E F G
Bit encoding parallel execution:        0 1 1 0 1 1 0
Cycle | Instruction
1 | A
2 | B C D
3 | E F G
Instructions B, C and D use disjoint functional units, cross paths and other data path resources. The same is also true for E, F and G.
Parallel execution cannot span several packets.
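How the bit splits the instruction stream into groups that execute together can be sketched as follows. The interpretation assumed here is that a '1' means "the next instruction runs in the same cycle" and a '0' ends the group; the function name is illustrative.

```python
def parallel_groups(instructions, p_bits):
    """Split an instruction sequence into groups executed in the same cycle."""
    groups, current = [], []
    for instr, p in zip(instructions, p_bits):
        current.append(instr)
        if p == 0:               # bit cleared: end of parallel execution
            groups.append(current)
            current = []
    if current:                  # trailing instructions without a closing 0
        groups.append(current)
    return groups

for cycle, group in enumerate(parallel_groups("ABCDEFG", [0, 1, 1, 0, 1, 1, 0]), start=1):
    print(cycle, " ".join(group))
```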

- 91 -
TU Dortmund
Partitioned register files
 Many memory ports are required to supply enough

operands per cycle.
 Memories with many ports are expensive.
 Registers are partitioned into (typically 2) sets,
e.g. for TI C60x:
- 92 -
TU Dortmund
More encoding flexibility with IA-64 Itanium
3 instructions per bundle:

Bits 127..0: instruc 1 | instruc 2 | instruc 3 | template
The template contains the instruction grouping information.
There are 5 instruction types:
 A: common ALU instructions
 I: more special integer instructions (e.g. shifts)
 M: Memory instructions
 F: floating point instructions
 B: branches
The following combinations can be encoded in templates:
 MII, MMI, MFI, MIB, MMB, MFB, MMF, MBB, BBB, MLX
with LX = move 64-bit immediate encoded in 2 slots
- 93 -
TU Dortmund
Templates and instruction types
End of parallel execution called stops.

Stops are denoted by underscores.
Example:
bundle 1 bundle 2
… MMI M_II MFI_ MII MMI MIB_
Group 1 Group 2 Group 3
Very restricted placement of stops within bundle.

Parallel execution within groups possible.
Parallel execution can span several bundles
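The grouping in the example can be reproduced with a small parser. This is a sketch working on the textual notation used on the slide (letters for instruction slots, underscores for stops), not on real IA-64 encodings.

```python
def instruction_groups(stream):
    """Split a bundle sequence such as 'MMI M_II MFI_' into groups at the stops."""
    groups, current = [], []
    for ch in stream:
        if ch == "_":            # stop: end of parallel execution
            groups.append("".join(current))
            current = []
        elif ch.isalpha():       # instruction slot; blanks only separate bundles
            current.append(ch)
    if current:
        groups.append("".join(current))
    return groups

print(instruction_groups("MMI M_II MFI_ MII MMI MIB_"))
```

The second group starts in the middle of one bundle and ends at the end of the next one, which shows that groups are independent of bundle boundaries.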
- 94 -
TU Dortmund
Instruction types are mapped to

functional unit types
There are 4 functional unit (FU) types:

 M: Memory Unit
 I: Integer Unit
 F: Floating-Point Unit
 B: Branch Unit
Instruction types → corresponding FU type,
except type A (mapping to either I or M-functional units).
- 95 -
TU Dortmund
Implementation: Itanium 2 (2003)

L3 cache
 410M transistors
 374 mm2 die size
 6MB on-die L3
cache
 1.5 GHz at 1.3V
[ftp://download.intel.com/design/
itanium2/download/
madison_slides_r1.pdf]
© Intel, 2003 - 96 -
TU Dortmund
Philips TriMedia-Processor
For multimedia-applications, up to 5 instructions/cycle.
http://www.nxp.com/acrobat/datasheets/PNX15XX_SER_N_3.pdf
© NXP
- 97 -
TU Dortmund
Large # of delay slots,

a problem of VLIW processors
add sub and or
sub mult xor div
ld st mv beq
The execution of many instructions has been started before it is

realized that a branch was required.
Nullifying those instructions would waste compute power
 Executing those instructions is declared a feature, not a bug.
 How to fill all “delay slots“ with useful instructions?
 Avoid branches wherever possible.
- 100 -
TU Dortmund
Predicated execution:
Implementing IF-statements “branch-free“
Conditional Instruction “[c] I“ consists of:

 condition c
 instruction I
c = true => I executed

c = false => NOP
- 101 -
TU Dortmund
Predicated execution:
Implementing IF-statements “branch-free“: TI C6x
Source code:
if (c)
{ a = x + y;
b = x + z;
}
else
{ a = x - y;
b = x - z;
}
Conditional branch (max. 12 cycles):
[c] B L1
NOP 5
B L2
NOP 4
SUB x,y,a
|| SUB x,z,b
L1: ADD x,y,a
|| ADD x,z,b
L2:
Predicated execution (1 cycle):
[c] ADD x,y,a
|| [c] ADD x,z,b
|| [!c] SUB x,y,a
|| [!c] SUB x,z,b
- 102 -
TU Dortmund
Microcontrollers
- MHS 80C51 as an example -
Features for Embedded Systems

 8-bit CPU optimised for control applications
 Extensive Boolean processing capabilities
 64 k Program Memory address space
 64 k Data Memory address space
 4 k bytes of on chip Program Memory
 128 bytes of on chip data RAM
 32 bi-directional and individually addressable I/O lines
 Two 16-bit timers/counters
 Full duplex UART
 6 sources/5-vector interrupt structure with 2 priority levels
 On chip clock oscillators
 Very popular CPU with many different variations
- 103 -

TU Dortmund
Trend: multiprocessor systems-on-a-chip (MPSoCs)
http://www.mpsoc-forum.org/2007/slides/Hattori.pdf
- 104 -
TU Dortmund
Multiprocessor systems-on-a-chip
(MPSoCs) (2)
- 105 -
TU Dortmund
Multiprocessor systems-on-a-chip
(MPSoCs) (3)
- 106 -
TU Dortmund
Multiprocessor systems-on-a-chip (MPSoCs) (4)
© Hugo De Man, IMEC, 2007
~50% inherent power efficiency of silicon
- 107 -

Embedded System
Hardware
- Reconfigurable

Hardware -

TU Dortmund
Energy Efficiency of FPGAs
“inherent power efficiency of silicon“
© Hugo De Man, IMEC, Philips, 2007
- 109 -
TU Dortmund
Reconfigurable Logic
Full custom chips may be too expensive, software too slow.

Combine the speed of HW with the flexibility of SW
HW with programmable functions and interconnect.
Use of configurable hardware;
common form: field programmable gate arrays (FPGAs)
Applications: bit-oriented algorithms like
 encryption,
 fast “object recognition“ (medical and military)
 Adapting mobile phones to different standards.
Very popular devices from
 XILINX (XILINX Virtex II are recent devices)
 Actel, Altera and others
- 110 -
TU Dortmund
Floor-plan of VIRTEX II FPGAs
More recent: Virtex 5, but no floor-plan found for Virtex 5.

- 111 -
TU Dortmund
Virtex 5 Configurable Logic Block (CLB)
- 112 -
TU Dortmund
Virtex 5 Slice (simplified)
Memories typically used as look-up tables to implement any Boolean function of ≤ 6 variables.
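A look-up table for 6 variables is just a 64-entry, 1-bit-wide memory addressed by the inputs. The following sketch shows the idea with a Python integer as the memory; the function names and the parity example are illustrative assumptions.

```python
def make_lut(func, inputs=6):
    """Fill a 2^inputs x 1 bit memory with the truth table of func."""
    memory = 0
    for addr in range(1 << inputs):
        args = [(addr >> i) & 1 for i in range(inputs)]
        if func(*args):
            memory |= 1 << addr
    return memory

def lut_read(memory, *args):
    """Evaluate the function by addressing the memory with the input bits."""
    addr = sum(bit << i for i, bit in enumerate(args))
    return (memory >> addr) & 1

parity = make_lut(lambda *bits: sum(bits) % 2)
print(lut_read(parity, 1, 0, 0, 0, 0, 0), lut_read(parity, 1, 1, 0, 0, 0, 0))
```

Changing the implemented function only means writing different memory contents, which is what makes the logic reconfigurable.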
- 113 -
TU Dortmund
Virtex 5 SliceM
SliceM supports using

memories for storing
data and as shift
registers
- 114 -
TU Dortmund
Resources
available
in Virtex 5
devices
[© and source: Xilinx Inc.:

Virtex 5 FPGA User
Guide, May, 2009
//www.xilinx.com]
- 115 -
Hierarchical Routing Resources;
no routing plan found for Virtex 5.
TU Dortmund
Interconnect for Virtex II
- 116 -
TU Dortmund
Virtex II Pro Devices

include
up to 4 PowerPC
processor cores
Virtex 5 Devices include

up to 2 PowerPC
processor cores
[© and source: Xilinx Inc.: Virtex-II Pro™ Platform

FPGAs: Functional Description, Sept. 2002,
//www.xilinx.com]
- 117 -
Memory
TU Dortmund
Memory
Memories?
Oops!
Memories!
For the memory, efficiency is again a concern:

 speed (latency and throughput); predictable timing
 energy efficiency
 size
 cost
 other attributes (volatile vs. persistent, etc)
- 119 -
TU Dortmund
Access times and energy consumption increases

with the size of the memory
Example (CACTI Model): "Currently, the size of

some applications is
doubling every 10
months"
[STMicroelectronics,
Medea+ Workshop,
Stuttgart, Nov. 2003]
- 120 -
TU Dortmund
Access times and energy consumption

for multi-ported register files
[Figure: three plots of cycle time (ns), area (λ² × 10⁶) and power (W) over the register file size (16, 32, 64, 128) for GP6M2 and GP6M3]
Source and © H. Valero, 2001
Rixner’s et al. model [HPCA’00], Technology of 0.18 µm
- 121 -
TU Dortmund
Memory system frequently consumes

>50 % of the energy used for processing
29%
Processor Energy
Cache ($)-less Main Mem.
monoprocessor Energy
71%
Proc. Energy
Multiprocessor with 51,9% 28,1%

I-Cache Energy
cache ($) D-Cache Energy
Main Mem.
Energy
Average over 200 benchmarks

14,8%
analyzed by Verma (U. Dortmund)
5,2%
[M. Verma, P. Marwedel: Advanced Memory Optimization Techniques for Low-Power Embedded Processors, Springer, 2007]
- 122 -
TU Dortmund
Similar information according to other sources

Others Icache
DMMU 5%
8% 26%
EBOX
8%
Clock
10%
IMMU
9% Ibox
18%
Dcache
16%
Strong ARM
IEEE Journal of SSC

Nov. 96
[Based on slide by and ©: Osman S. Unsal, Israel Koren, C. Mani

Krishna, Csaba Andras Moritz, U. of Massachusetts, Amherst, 2001] [Segars 01 according to Vahid@ISSS01]
- 123 -
TU Dortmund
Trends for the Speeds
Speed gap between processor and main DRAM increases
[Figure: speed over years 0 to 5; CPU performance grows by a factor of 1.5-2 p.a., DRAM by a factor of 1.07 p.a.; gap of 2x every 2 years]
Similar problems also for embedded systems & MPSoCs
 In the future: Memory access times >> processor cycle times
 “Memory wall” problem
[P. Machanik: Approaches to Addressing the Memory Wall, TR Nov. 2002, U. Brisbane]
- 125 -
TU Dortmund
Set-associative cache n-way cache
|Set| = 2
Address
Tag Index
way 0 $ (€) way 1
Tags data block Tags data block
= =
1
Data
- 126 -
TU Dortmund
Hierarchical memories using scratch pad memories (SPM)
SPM is a small, physically separate memory mapped into the address space.
Selection is by an appropriate address decoder (simple!); no tag memory.
[Figure: address space from 0 to FFF.. with the scratch pad memory mapped into it; hierarchy of processor, SPM (with select signal) and main memory]
Example: ARM7TDMI cores, well-known for low power consumption
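A minimal sketch of why the SPM address decoder is simple: selecting the SPM is a range check on the address, with no tag comparison. The base address `SPM_BASE` and size `SPM_SIZE` below are hypothetical values chosen for illustration, not taken from the slides.

```python
SPM_BASE = 0x0000_0000   # hypothetical start of the scratch pad in the address space
SPM_SIZE = 4 * 1024      # hypothetical SPM size in bytes


def select_memory(address):
    """Address decoder: return which memory serves the given address."""
    if SPM_BASE <= address < SPM_BASE + SPM_SIZE:
        return "SPM"
    return "main"


def count_accesses(addresses):
    """Count how many accesses of a trace go to the SPM and to main memory."""
    counts = {"SPM": 0, "main": 0}
    for address in addresses:
        counts[select_memory(address)] += 1
    return counts
```

Because the mapping is fixed, software (for example a compiler) decides which code and data are placed in the SPM address range.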
- 127 -
TU Dortmund
Comparison of currents using measurements
E.g.: ATMEL board with ARM7TDMI and ext. SRAM
Current for a 32 Bit-Load Instruction (Thumb):

Program / Data    Core+SPM (mA)    Main Memory Current (mA)
Main / Main       48.2             116
Main / SPM        50.9             77.2
SPM / Main        44.4             82.2
SPM / SPM         53.1             1.16
- 128 -
TU Dortmund
Why not just use a cache ?
2. Energy for parallel access of sets, in comparators, muxes.
[Figure: energy per access [nJ] versus memory size (256 to 16384) for a scratch pad and for 2-way caches with 4 GB, 16 MB and 1 MB address space]
[R. Banakar, S. Steinke, B.-S. Lee, 2001]
- 129 -
TU Dortmund
Influence of the associativity
Parameters different from

previous slides
[P. Marwedel et al., ASPDAC, 2004]
- 130 -
TU Dortmund
Summary
 Processing
• VLIW/EPIC processors
• MPSoCs
 FPGAs
 Memories
• “Small is beautiful”
(in terms of energy consumption, access times, size)
- 131 -
Communication

TU Dortmund

- 133 -
TU Dortmund
Communication
- Requirements -
 Real-time behavior
 Efficient, economical
(e.g. centralized power supply)
 Appropriate bandwidth and communication delay
 Robustness
 Fault tolerance
 Diagnosability
 Maintainability
 Security
 Safety
- 134 -
TU Dortmund
Basic techniques:
Electrical robustness
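Single-ended vs. differential signals
[Figure: single-ended signal referenced to a shared ground vs. differential signal with local grounds at both ends]
Differential signals: voltage at input of Op-Amp positive → '1'; otherwise → '0'
Combined with twisted pairs; most noise is added to both wires.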
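The effect of noise that is added to both wires can be illustrated with a small sketch. The voltage levels, the threshold and the function names are assumptions made for this example only.

```python
def receive_single_ended(v_signal, noise, threshold=0.5):
    """Single-ended receiver: compares the (noisy) wire voltage with a threshold."""
    return 1 if v_signal + noise > threshold else 0


def transmit_differential(bit, swing=1.0):
    """Drive the two wires with opposite voltages (hypothetical +/- swing/2 levels)."""
    half = swing / 2
    return (half, -half) if bit else (-half, half)


def receive_differential(v_plus, v_minus, noise):
    """Differential receiver: the same noise is added to both wires,
    so the subtraction removes it; positive difference means '1'."""
    difference = (v_plus + noise) - (v_minus + noise)
    return 1 if difference > 0 else 0
```

With a common-mode disturbance of 5 V, `receive_differential(*transmit_differential(0), noise=5.0)` still returns 0, whereas a single-ended receiver with a 0.5 V threshold would read a '1'.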

- 135 -
TU Dortmund
Evaluation
Advantages:
 Subtraction removes most of the noise
 Changes of voltage levels have no effect
 Reduced importance of ground wiring
 Higher speed
Disadvantages:
 Requires negative voltages
 Increased number of wires and connectors
Applications:
 USB, FireWire, ISDN
 Ethernet (STP/UTP CAT 5/6 cables)
 differential SCSI
 High-quality analog audio signals (XLR) © wikipedia
- 136 -
TU Dortmund
Priority-based arbitration of communication media
For example, consider a bus
Device 0 Device 1 Device 2 Device 3
 Bus arbitration (allocation) is frequently priority-based

 Communication delay depends on communication traffic of
other partners
 No tight real-time guarantees, except for highest priority
partner
- 138 -
TU Dortmund
Real-time behavior
Carrier-sense multiple-access/collision-detection
(CSMA/CD, Standard Ethernet): no guaranteed response time.
Alternatives:
 token rings, token busses
 Carrier-sense multiple-access/collision-avoidance
(CSMA/CA)
• WLAN techniques with request preceding transmission
• Each partner gets an ID (priority). After each bus transfer,
all partners try setting their ID on the bus; partners
detecting higher ID disconnect themselves from the bus.
Highest priority partner gets guaranteed response time;
others only if they are given a chance.
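The ID-based arbitration described above can be sketched as a bit-by-bit comparison on a shared bus. The sketch assumes a wired-OR bus on which a '1' overrides a '0', so that the highest ID wins as stated on the slide; the function name `arbitrate` and the 4-bit ID width are hypothetical.

```python
def arbitrate(ids, width=4):
    """Return the ID that wins bus arbitration among the contending partners.

    IDs are put on the bus bit by bit, most significant bit first.
    Assumption: the bus behaves like a wired-OR, so a '1' wins over a '0'.
    A partner that sent '0' but reads '1' has seen a higher ID and
    disconnects itself. IDs must be unique and `ids` must not be empty.
    """
    contenders = set(ids)
    for bit in reversed(range(width)):
        bus_level = any((i >> bit) & 1 for i in contenders)
        if bus_level:
            # Partners that sent '0' at this position drop out.
            contenders = {i for i in contenders if (i >> bit) & 1}
    (winner,) = contenders
    return winner
```

The partner with the highest ID never loses an arbitration round, which is why only it gets a guaranteed response time.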
- 139 -
TU Dortmund
Time division multiple access (TDMA) busses
Each communication partner is assigned a fixed time slot.
Example (http://www.ece.cmu.edu/~koopman/jtdma/jtdma.html#classical):
 Master sends sync
 Some waiting time
 Each slave transmits in its time slot
 Variations (truncating unused slots, >1 slots per slave)
 TDMA resources have a deterministic timing behavior
 TDMA provides QoS guarantees in networks on chips
[E. Wandeler, L. Thiele: Optimal TDMA Time Slot and Cycle Length Allocation for Hard Real-Time Systems, ASP-DAC, 2006]
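The deterministic timing of a classical TDMA cycle can be made concrete with a small sketch. It assumes equal slot lengths and integer time units; the function names and the default lengths of the sync and waiting phases are hypothetical.

```python
def cycle_length(num_slaves, slot_len, sync_len=1, wait_len=1):
    """Length of one TDMA cycle: sync + waiting time + one slot per slave."""
    return sync_len + wait_len + num_slaves * slot_len


def owner_at(t, num_slaves, slot_len, sync_len=1, wait_len=1):
    """Return who may use the bus at time t: 'sync', 'wait' or a slave index."""
    offset = t % cycle_length(num_slaves, slot_len, sync_len, wait_len)
    if offset < sync_len:
        return "sync"
    if offset < sync_len + wait_len:
        return "wait"
    return (offset - sync_len - wait_len) // slot_len


def worst_case_wait(num_slaves, slot_len, sync_len=1, wait_len=1):
    """Upper bound on the time a slave waits for the start of its next slot:
    a message that just misses the slot start waits at most one full cycle."""
    return cycle_length(num_slaves, slot_len, sync_len, wait_len)
```

The bound depends only on the schedule and not on the traffic of other partners, which is what makes the timing behavior deterministic.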
- 140 -
TU Dortmund
FlexRay
 Developed by the FlexRay consortium

(BMW, Ford, Bosch, DaimlerChrysler, …)
 Specified in SDL
 Improved error tolerance and time-determinism
 Meets requirements with transfer rates >> CAN standard
High data rate can be achieved:
• initially targeted for ~ 10Mbit/sec;
• design allows much higher data rates
 TDMA protocol
 Cycle subdivided into a static and a dynamic segment.
- 141 -
TU Dortmund
TDMA in FlexRay
Exclusive bus access enabled for short time in each case.

Dynamic segment for transmission of variable length information.
Fixed priorities in dynamic segment: Minislots for each potential sender.
http://www.tzm.de/FlexRay/FlexRay_Introduction.html
Bandwidth used only when it is actually needed.
- 142 -
TU Dortmund
Time intervals in FlexRay
© Prof. Form, TU Braunschweig, 2007
 Microtick (µt) = Clock period in partners, may differ between partners
 Macrotick (mt) = Basic unit of time, synchronized between partners
(= r_i · µt, r_i varies between partners i)
 Slot = Interval allocated per sender in static segment (= p · mt, p: fixed (configurable))
 Minislot = Interval allocated per sender in dynamic segment (= q · mt, q: variable)
Short minislot if no transmission needed; starts after previous minislot.
 Cycle = Static segment + dynamic segment + network idle time
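The relations between these intervals can be written down as simple arithmetic. The sketch below is illustrative only: the function names and all numeric values in the usage note are made up, and real FlexRay configurations are constrained by the protocol specification.

```python
def macrotick_ns(microtick_ns, r):
    """Macrotick length of one partner: r microticks of its local clock."""
    return microtick_ns * r


def cycle_length_mt(num_static_slots, p, minislot_lengths, idle_mt):
    """Cycle length in macroticks.

    num_static_slots slots of p macroticks each (static segment),
    plus one minislot of q macroticks per potential sender, where q is
    given per sender in minislot_lengths (dynamic segment),
    plus the network idle time.
    """
    static_segment = num_static_slots * p
    dynamic_segment = sum(minislot_lengths)
    return static_segment + dynamic_segment + idle_mt
```

For example, a partner with a 25 ns microtick and r = 40 and a partner with a 50 ns microtick and r = 20 both arrive at the same 1000 ns macrotick. In `cycle_length_mt(3, 10, [1, 1, 6, 1], 5)`, only the third sender uses its minislot for a transmission, so the other minislots stay short.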
- 143 -
TU Dortmund
Structure of FlexRay networks
Bus guardian protects the system against failing processors,
e.g. so-called “babbling idiots”
http://www.ixxat.de/index.php?seite=introduction_flexray_en&root=5873&system_id=5875&com=formular_suche_treff
- 144 -
TU Dortmund
Communication:
Hierarchy
Inverse relation between volume and urgency quite common:
Sensor/actuator busses
- 145 -
TU Dortmund
Other busses
 Sensor/actuator busses: connecting sensors/actuators, low rates

 Field busses
 CAN: Controller bus for automotive
 LIN: low cost bus for interfacing sensors/actuators in the automotive
domain
 MOST: Multimedia bus for the automotive domain (not a field bus)
 MAP: bus designed for car factories.
 Process Field Bus (Profibus): used in smart buildings
 The European Installation Bus (EIB): bus designed for smart
buildings; CSMA/CA; low data rate.
 IEEE 488: Designed for laboratory equipment.
 Attempts to use standard Ethernet. Timing predictability an issue.
- 146 -
TU Dortmund
Wireless communication: Examples
 IEEE 802.11 a/b/g/n

 UMTS; HSPA
 DECT
 Bluetooth
 ZigBee
Timing predictability of wireless communication?
- 147 -
D/A-Converters
TU Dortmund
Embedded system hardware is frequently used in a loop (“hardware in a loop”):
- 149 -
TU Dortmund
Kirchhoff’s junction rule
Kirchhoff’s Current Law, Kirchhoff’s first rule
Kirchhoff’s Current Law:
At any point in an electrical circuit, the sum of currents flowing towards that point is equal to the sum of currents flowing away from that point.
(Principle of conservation of electric charge)
Example: i1 + i2 + i4 = i3
Formally, for any node in a circuit: $\sum_k i_k = 0$
Count current flowing away from node as negative.
Example: i1 + i2 - i3 + i4 = 0
[Jewett and Serway, 2007].
- 150 -
TU Dortmund
Kirchhoff’s loop rule
Kirchhoff’s Voltage Law, Kirchhoff’s second rule
The principle of conservation of energy implies that:
The sum of the potential differences (voltages) across all elements around any closed circuit must be zero [Jewett and Serway, 2007].
Formally, for any loop in a circuit: $\sum_k V_k = 0$
Count voltages traversed against arrow direction as negative.
Example: V1 - V2 - V3 + V4 = 0
V3 = R3·I3 if current counted in the same direction as V3
V3 = -R3·I3 if current counted in the opposite direction as V3
- 151 -
TU Dortmund
Operational Amplifiers (Op-Amps)
Operational amplifiers (op-amps) are devices amplifying the voltage difference between two input terminals by a large gain factor g:
$V_{out} = (V_+ - V_-) \cdot g$
High impedance input terminals ⇒ Currents into inputs ≈ 0
[Figure: op-amp symbol with inputs V- and V+, output Vout, supply voltage and ground; photo of an op-amp in a separate package (TO-5) [wikipedia]]
For an ideal op-amp: $g \to \infty$
(In practice: g may be around $10^4 .. 10^6$)
- 152 -
TU Dortmund
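Op-Amps with feedback
In circuits, negative feedback is used to define the actual gain.
[Figure: op-amp circuit with input voltage V1, resistor R, feedback resistor R1 carrying current I, inverting input V-, output Vout and ground; the loop used below is marked]
Due to the feedback to the inverted input, R1 reduces voltage V-. To which level?
$V_{out} = -g \cdot V_-$ (op-amp feature)
$I \cdot R_1 + V_{out} - V_- = 0$ (loop rule)
$\Rightarrow I \cdot R_1 + (-g) \cdot V_- - V_- = 0$
$\Rightarrow (1+g) \cdot V_- = I \cdot R_1$
$\Rightarrow V_- = \frac{I \cdot R_1}{1+g}$
$V_{-,ideal} = \lim_{g \to \infty} \frac{I \cdot R_1}{1+g} = 0$
V- is called virtual ground: the voltage is 0,
but the terminal need not be connected to ground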
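A quick numeric check of the virtual-ground argument, using the relation (1+g)·V- = I·R1 from this slide. The function names and the component values used in the note below (1 mA, 10 kΩ) are arbitrary assumptions for illustration.

```python
def v_minus(current, r1, gain):
    """Voltage at the inverting input, from (1 + g) * V- = I * R1."""
    return current * r1 / (1 + gain)


def loop_residual(current, r1, gain):
    """Left-hand side of the loop rule I*R1 + Vout - V- with Vout = -g * V-.

    Should evaluate to (numerically almost) zero for any gain.
    """
    vm = v_minus(current, r1, gain)
    v_out = -gain * vm
    return current * r1 + v_out - vm
```

With I = 1 mA and R1 = 10 kΩ, V- is about 1 mV for g = 10^4 and about 10 µV for g = 10^6, so it approaches 0 as the gain grows, while Vout approaches -I·R1.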
- 153 -
TU Dortmund
Digital-to-Analog (D/A) Converters
Various types, can be quite simple,

e.g.:
- 154 -
TU Dortmund
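Current ~ no. represented by x
Loop rule: $x_0 \cdot I_0 \cdot 8 \cdot R + V_- - V_{ref} = 0$
With $V_- = 0$ (virtual ground): $\Rightarrow I_0 = x_0 \cdot \frac{V_{ref}}{8 \cdot R}$
In general: $I_i = x_i \cdot \frac{V_{ref}}{2^{3-i} \cdot R}$
Junction rule: $I = \sum_i I_i$
$\Rightarrow I \sim nat(x)$, where nat(x): natural number represented by x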
- 155 -
TU Dortmund
Output voltage ~ no. represented by x
Loop rule*: $y + R_1 \cdot I' = 0$
Junction rule°: $I = I'$
From ° and *: $y + R_1 \cdot I = 0$
Hence, with $I$ from the previous slide:
$y = -V_{ref} \cdot \frac{R_1}{8 \cdot R} \cdot \sum_{i=0}^{3} x_i \cdot 2^i = -V_{ref} \cdot \frac{R_1}{8 \cdot R} \cdot nat(x)$
Op-amp turns current I ~ nat(x) into a voltage ~ nat(x)
TU Dortmund
Output generated from signal e3(t)*
* Assuming “zero-order hold”
Possible to reconstruct input signal?
- 157 -
Sampling
Theorem
TU Dortmund
Possible to reconstruct input signal?
 Assuming Nyquist criterion met
 Let {t_s}, s = ...,−1,0,1,2, ... be times at which we sample g(t)
 Assume a constant sampling rate of 1/p_s (∀s: p_s = t_{s+1} − t_s).
 According to sampling theory, we can approximate the input signal as a weighted sum of the samples y(t_s).
Weighting factor: influence of y(t_s) at time t
[Oppenheim, Schafer, 2009]
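As an illustration, the sketch below uses the standard sinc interpolation (Whittaker-Shannon) formula, z(t) = Σ_s y(t_s) · sinc(π/p_s · (t − t_s)) with sinc(x) = sin(x)/x. That this is the exact formula shown on the slide is an assumption, since the equation itself is not part of the text; the function names are hypothetical, and the sum runs over a finite list of samples only.

```python
import math


def sinc(x):
    """Unnormalized sinc: sin(x)/x, with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(x) / x


def reconstruct(samples, p_s, t):
    """Approximate the signal at time t from samples taken at t_s = s * p_s.

    Each sample y(t_s) is weighted by sinc(pi / p_s * (t - t_s)).
    Only the given finite list of samples is used, so the result is an
    approximation even for a perfectly band-limited signal.
    """
    return sum(
        y * sinc(math.pi / p_s * (t - s * p_s))
        for s, y in enumerate(samples)
    )
```

At a sampling instant the weight of that sample is 1 and the weights of all other samples are (numerically almost) 0; between sampling instants all samples contribute.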
- 159 -
TU Dortmund
Weighting factor for influence of y(t_s) at time t
No influence at t_{s+n}
- 160 -
TU Dortmund
Contributions from the various sampling instances
- 161 -
TU Dortmund
(Attempted) reconstruction of input signal
* Assuming 0-
order hold
- 162 -
TU Dortmund
How to compute the sinc( ) function?
 Filter theory: The required interpolation is performed

by an ideal low-pass filter (sinc is the Fourier transform
of the low-pass filter transfer function)
[Figure: low-pass filter turning y(t) into z(t); frequency axis marked at fs/2 and fs]
Filter removes high frequencies present in y(t)
- 163 -
TU Dortmund
How precisely are we reconstructing the input?
 Sampling theory:
• Reconstruction using sinc () is precise
 However, it may be impossible to really compute z(t) as

indicated ….
- 164 -
TU Dortmund
Limitations
 Actual filters do not compute sinc( )

In practice, filters are used as an approximation.
Computing good filters is an art itself!
 All samples must be known to reconstruct e(t) or g(t).
 Waiting indefinitely before we can generate output!
In practice, only a finite set of samples is available.
 Actual signals are never perfectly bandwidth limited.
 Quantization noise cannot be removed.
- 165 -
TU Dortmund
Output
Output devices of embedded systems include

 Displays: Display technology is extremely important. Major
research and development efforts
 Electro-mechanical devices: these influence the
environment through motors and other electro-mechanical
equipment.
Frequently require analog output.
- 166 -
TU Dortmund
Embedded system hardware is frequently used in a loop (“hardware in a loop”):
- 167 -
Actuators
TU Dortmund
Actuators
Huge variety of actuators and output devices, impossible to present all of them.
Microsystems motors as examples (© MCNC):
- 169 -
TU Dortmund
Actuators (2)
Courtesy and ©: E.
Obermeier, MAT, TU Berlin
http://www.piezomotor.se/pages/PWtechnology.html
http://www.elliptec.com/fileadmin/elliptec/User/Produkte/Elliptec_Motor/Elliptecmotor_How_it_works.h
- 170 -
TU Dortmund
Secure Hardware
 Security needed for communication and storage

 Demand for special equipment for cryptographic keys
 To resist side-channel attacks like
• measurements of the supply current or
• Electromagnetic radiation.
Special mechanisms for physical protection (shielding,
sensor detecting tampering with the modules).
 Logical security, using cryptographic methods needed.
 Smart cards: special case of secure hardware
• Have to run with a very small amount of energy.
 In general, we have to distinguish between different
levels of security and knowledge of “adversaries”
- 171 -
TU Dortmund
Summary
Hardware in a loop
 Sensors
 Discretization
 Information processing
• Importance of energy efficiency, Special purpose HW very
expensive, Energy efficiency of processors, Code size
efficiency, Run-time efficiency
• Reconfigurable Hardware
 Communication
 D/A converters
 Sampling theorem
 Actuators
- 172 -
