
Embedded System

Hardware

These slides use Microsoft clip arts.


Microsoft copyright restrictions apply.
TU Dortmund

Motivation

 The need to consider both hardware and software is one of


the characteristics of embedded/cyber-physical systems.
Reasons:
• Real-time behavior
• Efficiency
- Energy
- …
• Security
• Reliability
• …

- 2-
TU Dortmund

Structure of this course

Application Knowledge
2: Specification
Design repository
Design
3: ES-hardware
4: system software (RTOS, middleware, …)
5: Evaluation & validation (energy, cost, performance, …)
6: Application mapping
7: Optimization
8: Test

Numbers denote sequence of chapters

- 3-
TU Dortmund

Embedded System Hardware

Embedded system hardware is frequently used in a loop


(“hardware in a loop“):

 cyber-physical systems

- 4-
TU Dortmund

Many examples of such loops

 Heating

 Lights

 Engine control

 Power supply

 …

 Robots
Heating: www.masonsplumbing.co.uk/images/heating.jpg
Robot: Courtesy and ©: H.Ulbrich, F. Pfeiffer, TU München

- 5-
TU Dortmund

Sensors

Processing of physical data starts with capturing this data.


Sensors can be designed for virtually every physical and
chemical quantity
 including weight, velocity, acceleration, electrical
current, voltage, temperatures etc.
 chemical compounds.
Many physical effects used for constructing sensors.
Examples:
 law of induction (generation of voltages in a magnetic field),
 light-electric effects.
A huge number of sensors has been designed in recent years.
- 6-
TU Dortmund

Example: Acceleration Sensor

Courtesy & ©: S. Bütgenbach, TU Braunschweig

- 7-
TU Dortmund

Charge-coupled devices (CCD) image sensors

Based on charge transfer to next pixel cell

Corresponding to “bucket brigade device”


(German: “Eimerkettenschaltung”)

- 8-
TU Dortmund

CMOS image sensors

Based on standard
production process
for CMOS chips,
allows integration
with other
components.

- 9-
TU Dortmund

Comparison CCD/CMOS sensors

Property                    CCD                    CMOS
Technology optimized for    Optics                 VLSI technology
Technology                  Special                Standard
Smart sensors               No, no logic on chip   Logic elements on chip
Access                      Serial                 Random
Size                        Limited                Can be large
Power consumption           Low                    Larger
Applications                Compact cameras        Low cost devices, SLR cameras
See also B. Diericks: CMOS image sensor concepts. Photonics West 2000 Short course (Web)

- 10 -
TU Dortmund

Example: Biometrical Sensors

e.g.: Fingerprint sensor

© P. Marwedel, 2010

- 11 -
TU Dortmund

Artificial eyes

© Dobelle Institute
(was at www.dobelle.com)

- 12 -
TU Dortmund

Artificial eyes (2)

He looks hale, hearty, and healthy — except


for the wires. …. From a distance the wires
look like long ponytails.

© Dobelle Institute

- 13 -
TU Dortmund

Artificial eyes (3)

 Translation into sound;


resolution claimed to be good
[http://www.seeingwithsound.com/etumble.htm]

Movie - 14 -
TU Dortmund

Other sensors

 Rain sensors for wiper control


(“Sensors multiply like rabbits“ [ITT automotive])

 Pressure sensors

 Proximity sensors

 Engine control sensors

 Hall effect sensors

- 15 -
TU Dortmund

Signals

Sensors generate signals

Definition: a signal s is a mapping
from the time domain DT to a value domain DV:
s: DT → DV
DT : continuous or discrete time domain
DV : continuous or discrete value domain.

- 16 -
Discretization

Graphics: © Alexandra Nolte, Gesine Marwedel, 2003


TU Dortmund

Discretization of time

Digital computers require discrete sequences of physical values

s : DT → DV

Discrete time domain

 Sample-and-hold circuits

- 18 -
TU Dortmund

Sample-and-hold circuits

Clocked transistor + capacitor;
capacitor stores sequence values

e(t) is a mapping ℝ → ℝ
h(t) is a sequence of values or a mapping ℤ → ℝ

- 19 -
TU Dortmund

Do we lose information due to sampling?

Would we be able to reconstruct input signals from the sampled signals?

 approximation of signals by sine waves.

- 20 -
TU Dortmund

Approximation of a square wave (1)

Target: square wave with period p1=4

$e'_K(t) = \sum_{k=1,3,5,\dots}^{K} \frac{4}{\pi k} \sin\left(\frac{2\pi t}{p_k}\right)$

with pk = p1/k for all k: periods of contributions to e’

[Figure: approximations for K=1 and K=3]
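The partial sums can be evaluated numerically. The following is a minimal sketch, assuming the sum runs over the odd harmonics k = 1, 3, 5, ..., K with amplitudes 4/(πk); the function name `square_wave_approx` is hypothetical.

```python
import math

def square_wave_approx(t, K, p1=4.0):
    """Partial sum over the odd harmonics k = 1, 3, ..., K (assumed form)."""
    total = 0.0
    for k in range(1, K + 1, 2):
        p_k = p1 / k  # period of the k-th contribution
        total += 4.0 / (math.pi * k) * math.sin(2.0 * math.pi * t / p_k)
    return total

# At t = p1/4 the target square wave has the value 1.
for K in (1, 3, 11, 101):
    print(K, square_wave_approx(1.0, K))
```

With growing K the value at t = 1 oscillates around 1 and slowly converges to it.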

- 21 -
TU Dortmund

Approximation of a square wave (2)

$e'_K(t) = \sum_{k=1,3,5,\dots}^{K} \frac{4}{\pi k} \sin\left(\frac{2\pi t}{4/k}\right)$

[Figure: approximations for K=5 and K=7]

- 22 -
TU Dortmund

Approximation of a square wave (3)

$e'_K(t) = \sum_{k=1,3,5,\dots}^{K} \frac{4}{\pi k} \sin\left(\frac{2\pi t}{4/k}\right)$

[Figure: approximations for K=9 and K=11]

Applet at © http://www.jhu.edu/~signals/fourier2/index.html
- 23 -
TU Dortmund

Linear transformations

Let e1(t) and e2(t) be signals

Definition: A transformation Tr of signals is linear iff

Tr(e1 + e2) = Tr(e1) + Tr(e2)

In the following, we will consider linear transformations.


 We consider sums of sine waves instead of the original
signals.

- 24 -
TU Dortmund

Aliasing

$e_3(t) = \sin\left(\frac{2\pi t}{8}\right) + 0.5 \sin\left(\frac{2\pi t}{4}\right)$

$e_4(t) = \sin\left(\frac{2\pi t}{8}\right) + 0.5 \sin\left(\frac{2\pi t}{4}\right) + 0.5 \sin\left(\frac{2\pi t}{1}\right)$

Periods of 8,4,1
Indistinguishable if sampled at integer times, ps=1
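The claim can be checked numerically. This is a small sketch, assuming that the three sine components (periods 8, 4 and 1) are added with positive signs; `e3` and `e4` mirror the signal names on the slide.

```python
import math

def e3(t):
    # components with periods 8 and 4
    return math.sin(2 * math.pi * t / 8) + 0.5 * math.sin(2 * math.pi * t / 4)

def e4(t):
    # e3 plus a component with period 1
    return e3(t) + 0.5 * math.sin(2 * math.pi * t / 1)

# Sampling at integer times (ps = 1): the period-1 component is always 0.
print(max(abs(e3(n) - e4(n)) for n in range(16)))
# Between the sampling points the signals differ.
print(e4(0.25) - e3(0.25))
```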

Matlab demo - 25 -
TU Dortmund

Aliasing (2)

 Reconstruction impossible, if not sampling frequently


enough
How frequently do we have to sample?
Nyquist criterion (sampling theory):
Aliasing can be avoided if we restrict the frequencies of
the incoming signal to less than half of the sampling rate.
ps < ½ pN where pN is the period of the “fastest” sine wave
or fs > 2 fN where fN is the frequency of the “fastest” sine wave
fN is called the Nyquist frequency, fs is the sampling rate.
See e.g. [Oppenheim/Schafer, 2009]

- 26 -
TU Dortmund

Anti-aliasing filter

A filter is needed to remove high frequencies

e4(t) changed into e3(t)

[Figure: g(t)/e(t) over frequency for the ideal filter and a realizable filter; frequency axis marked at fs/2 and fs]
- 27 -
TU Dortmund

Examples of Aliasing in computer graphics

Original Sub-sampled, no filtering

http://en.wikipedia.org/wiki/Image:Moire_pattern_of_bricks_small.jpg
- 28 -
TU Dortmund

Examples of Aliasing in
computer graphics (2)

[Figure panels: Original (pdf screen copy); Filtered & sub-sampled; Sub-sampled, no filtering]
Impact of rasterization
http://www.niirs10.com/Resources/Reference Documents/Accuracy in Digital Image Processing.pdf

- 29 -
TU Dortmund

Discretization of values: A/D-converters

Digital computers require digital form of physical values

s: DT → DV

Discrete value domain

A/D-conversion; many methods with different speeds.

- 30 -
TU Dortmund

Flash A/D converter

*
Encodes input number of most significant ‘1’ as an unsigned number (priority encoder), e.g.
“1111” -> “100”,
“0111” -> “011”,
“0011” -> “010”,
“0001” -> “001”,
“0000” -> “000”.

* Frequently, the case h(t) > Vref would not be decoded
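The behavior of the comparators and the priority encoder can be sketched in software. This is an illustration only, assuming comparator thresholds at i * Vref / 4 for i = 1..4 and a strict "greater than" comparison; the function name `flash_adc` is hypothetical.

```python
def flash_adc(h, vref, n_comparators=4):
    """Return the number of the most significant comparator that fires (0 if none)."""
    code = 0
    for i in range(1, n_comparators + 1):
        threshold = i * vref / n_comparators  # assumed reference ladder
        if h > threshold:                     # all comparators work in parallel in hardware
            code = i                          # priority encoder keeps the highest '1'
    return code

for h in (0.1, 0.3, 0.6, 0.8, 1.2):
    print(h, format(flash_adc(h, 1.0), "03b"))
```

In hardware all comparisons happen at the same time; the loop only models the result.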

- 31 -
TU Dortmund

Encoding of voltage intervals

Assuming 0 ≤ h(t) ≤ Vref

[Figure: staircase mapping h(t) to the codes “00“, “01“, “10“, “11“, with steps at Vref /4, Vref /2 and 3Vref /4; h(t) axis up to Vref]

- 32 -
TU Dortmund

Resolution

 Resolution (in bits): number of bits produced


 Resolution Q (in volts): difference between two input
voltages causing the output to be incremented by 1

$Q = \frac{V_{FSR}}{n}$ with

Q: resolution in volts per step
VFSR: difference between largest and smallest voltage
n: number of voltage intervals

Example: Q = Vref /4 for the previous slide, assuming * to be absent

- 33 -
TU Dortmund

Resolution and speed of Flash A/D-converter

Parallel comparison with reference voltage


Speed: O(1)
Hardware complexity: O(n)
Applications: e.g. in video processing

- 34 -
TU Dortmund

Higher resolution:
Successive approximation

[Figure: circuit with the signals h(t), V- and w(t)]

Key idea: binary search:
Set MSB='1'
if too large: reset MSB
Set MSB-1='1'
if too large: reset MSB-1

Speed: O(log2(n))
Hardware complexity: O(log2(n))
with n= # of distinguished voltage levels;
slow, but high precision possible.
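The binary search can be written down in a few lines. The following is a minimal sketch, assuming an ideal D/A converter that produces V- = code * Vref / 2^bits; `sar_adc` is a hypothetical name.

```python
def sar_adc(h, vref, bits):
    """Successive approximation: decide one bit per step, MSB first."""
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)               # tentatively set this bit to '1'
        v_minus = trial * vref / (1 << bits)    # assumed ideal D/A output
        if v_minus <= h:                        # not too large: keep the bit
            code = trial
        # otherwise the bit stays reset
    return code

print(format(sar_adc(0.7, 1.0, 4), "04b"))
```

For h = 0.7 Vref and 4 bits, the trial codes are 1000, 1100, 1010 and 1011, and the result is 1011.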
- 35 -
TU Dortmund

Successive approximation (2)

[Figure: V- over time t, stepping through the levels 1000, 1100, 1010 and 1011 towards h(t) = Vx]
- 36 -
TU Dortmund

Application areas for flash and successive approximation converters

[Figure: effective number of bits at bandwidth for different converter types. Annotations: (used in multimeters); (using single bit D/A-converters; common for high quality audio equipment) [http://www.beis.de/Elektronik/DeltaSigma/DeltaSigma.html]; (Pipelined flash converters)]

[Gielen et al., DAC 2003]


Movie IEEE tv - 37 -
TU Dortmund

Quantization Noise

[Figure: h(t) and w(t), assuming “rounding“ (truncating) towards 0, and the difference w(t)-h(t)]

- 38 -
TU Dortmund

Quantization Noise

[Figure: h(t) and w(t), assuming “rounding“ (truncating) towards 0, and the difference h(t)-w(t)]

- 39 -
TU Dortmund

Quantization noise for audio signal

$\mathrm{SNR}\,[\mathrm{dB}] = 20 \log_{10}\left(\frac{\text{effective signal voltage}}{\text{effective noise voltage}}\right)$

e.g.: 20 log10(2)=6.02 decibels
Signal to noise for ideal n-bit converter: n * 6.02 + 1.76 [dB]
e.g. 98.1 dB for 16-bit converter, ~ 146 dB for 24-bit converter
Additional noise for non-ideal converters
Source: [http://www.beis.de/Elektronik/DeltaSigma/DeltaSigma.html]

MATLAB demo - 40 -
TU Dortmund

Signal to noise ratio

$\mathrm{SNR}\,[\mathrm{dB}] = 20 \log_{10}\left(\frac{\text{effective signal voltage}}{\text{effective noise voltage}}\right)$

e.g.: 20 log10(2)=6.02 decibels

Signal to noise for ideal n-bit converter: n * 6.02 + 1.76 [dB]
e.g. 98.1 dB for 16-bit converter, ~ 146 dB for 24-bit converter

Additional noise for non-ideal converters
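Both formulas are easy to evaluate. A small sketch with hypothetical function names, using the constants 6.02 and 1.76 from the slide:

```python
import math

def snr_db(signal_voltage, noise_voltage):
    """Signal to noise ratio in dB from effective (RMS) voltages."""
    return 20.0 * math.log10(signal_voltage / noise_voltage)

def ideal_converter_snr_db(n_bits):
    """SNR of an ideal n-bit converter according to the slide's formula."""
    return n_bits * 6.02 + 1.76

print(round(snr_db(2.0, 1.0), 2))      # doubling the voltage ratio
print(ideal_converter_snr_db(16))      # ideal 16-bit converter
```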

- 41 -
TU Dortmund

Summary

Hardware in a loop
 Sensors
 Discretization
• Definition of signals
• Sample-and-hold circuits
- Aliasing (and how to avoid it)
- Nyquist criterion
• A/D-converters
- Flash-based
- Successive approximation
- Quantization noise

- 42 -
Embedded System Hardware
- Processing -

Graphics: © Alexandra Nolte, Gesine Marwedel, 2003


TU Dortmund

Embedded System Hardware

Embedded system hardware is frequently used in a loop


(“hardware in a loop“):

 cyber-physical systems

- 44 -
TU Dortmund

Processing units

Need for efficiency (power + energy):

Why worry about


energy and power?

“Power is considered as the most important constraint in


embedded systems“
[in: L. Eggermont (ed): Embedded Systems Roadmap 2002, STW]

Energy consumption by IT is the key concern


of green computing initiatives (embedded
computing leading the way)
http://www.esa.int/images/earth,4.jpg

- 45 -
TU Dortmund

Importance of Energy Efficiency

[Figure annotation: “inherent power efficiency of silicon“]

© Hugo De Man, IMEC, Philips, 2007

- 46 -
TU Dortmund

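Power and energy are related to each other

$E = \int P \, dt$

[Figure: power P over time t for two executions; the areas under the curves are the energies E and E']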
In many cases, faster execution also means less energy,
but the opposite may be true if power has to be increased
to allow faster execution.

- 47 -
TU Dortmund

Low Power vs. Low Energy Consumption

 Minimizing power consumption important for


• the design of the power supply
• the design of voltage regulators
• the dimensioning of interconnect
• short term cooling
 Minimizing energy consumption important due to
• restricted availability of energy (mobile systems)
• limited battery capacities (only slowly improving)
• very high costs of energy (solar panels, in space)
• cooling
• high costs
• limited space
• dependability
• long lifetimes, low temperatures

- 48 -
TU Dortmund

Power density continues to get worse

Nuclear reactor
Prescott: 90 W/cm²,
90 nm [c‘t 4/2004]

© Intel
M. Pollack,
Micro-32

- 49 -
TU Dortmund

Surpassed hot (kitchen) plate …?


Why not use it?

http://www.phys.ncku.edu.tw/~htsu/humor/fry_egg.html
- 50 -
TU Dortmund

Energy consumption in mobile devices

[O. Vargas (Infineon Technologies): Minimum power consumption in mobile-phone memory subsystems; Pennwell
Portable Design - September 2005;] Thanks to Thorsten Koch (Nokia/ Univ. Dortmund) for providing this source.

- 51 -
TU Dortmund

Application Specific Circuits (ASICs)
or Full Custom Circuits

Custom-designed circuits necessary


 if ultimate speed or
 energy efficiency is the goal and
 large numbers can be sold.
Approach suffers from
 long design times,
 lack of flexibility
(changing standards) and
 high costs
(e.g. mask costs of millions of $).

- 53 -
TU Dortmund

Mask cost for specialized HW


becomes very expensive

Trend
towards
implementation
in Software

HW synthesis not covered in this course.
[http://www.molecularimprints.com/Technology/tech_articles/MII_COO_NIST_2001.PDF9]

- 54 -
TU Dortmund

Key requirements for processors

1. Energy/power-efficiency

- 55 -
TU Dortmund

Dynamic power management (DPM)

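Example: STRONGARM SA1100

RUN: operational (400mW)
IDLE: a sw routine may stop the CPU when not in use, while monitoring interrupts (50mW)
SLEEP: Shutdown of on-chip activity (160µW)

[Figure: state diagram with the states RUN, IDLE and SLEEP. Transition times: 10µs between RUN and IDLE in either direction, 90µs from RUN or IDLE into SLEEP, 160ms from SLEEP back to RUN. The transitions into SLEEP are labelled "Power fault signal".]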

- 56 -
TU Dortmund

Fundamentals of dynamic voltage


scaling (DVS)

Power consumption of CMOS circuits (ignoring leakage):

$P = \alpha \, C_L \, V_{dd}^2 \, f$ with
α: switching activity
C_L: load capacitance
V_dd: supply voltage
f: clock frequency

Delay for CMOS circuits:

$\tau = k \, C_L \, \frac{V_{dd}}{(V_{dd} - V_t)^2}$ with
V_t: threshold voltage (V_t smaller than V_dd)

Decreasing Vdd reduces P quadratically,
while the run-time of algorithms is only linearly increased
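The two relations can be tried out with numbers. This is a sketch under stated assumptions: power is modeled as alpha * C_L * Vdd^2 * f and delay as k * C_L * Vdd / (Vdd - Vt)^2, the parameter values are arbitrary, and the function names are hypothetical.

```python
def dynamic_power(alpha, c_load, vdd, f):
    """Dynamic CMOS power, ignoring leakage (assumed model)."""
    return alpha * c_load * vdd ** 2 * f

def gate_delay(k, c_load, vdd, vt):
    """CMOS gate delay (assumed model); requires vdd > vt."""
    return k * c_load * vdd / (vdd - vt) ** 2

# Halving Vdd at a fixed clock frequency quarters the power ...
print(dynamic_power(1.0, 1.0, 0.5, 1.0) / dynamic_power(1.0, 1.0, 1.0, 1.0))
# ... while the delay only doubles if Vt is negligible compared to Vdd.
print(gate_delay(1.0, 1.0, 0.5, 0.0) / gate_delay(1.0, 1.0, 1.0, 0.0))
```

For a non-negligible Vt the delay grows faster than linearly as Vdd approaches Vt.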

- 57 -
TU Dortmund

Variable-voltage/frequency example: INTEL Xscale

OS should schedule distribution of the energy budget.

From Intel’s Web Site


- 59 -
TU Dortmund

Low voltage, parallel operation more efficient


than high voltage, sequential operation

Basic equations
Power: P ~ VDD² ,
Maximum clock frequency: f ~ VDD ,
Energy to run a program: E = P × t, with: t = runtime (fixed)
Time to run a program: t ~ 1/f
Changes due to parallel processing, with β operations per clock:
Clock frequency reduced to: f’ = f / β,
Voltage can be reduced to: VDD’ = VDD / β,
Power for parallel processing: P° = P / β² per operation,
Power for β operations per clock: P’ = β × P° = P / β,
Time to run a program is still: t’ = t,
Energy required to run program: E’ = P’ × t = E / β
Rough approximations!
Argument in favour of voltage scaling, VLIW processors, and multi-cores
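A worked instance of this rough argument, as a sketch: the number of operations per clock is called `beta` here (an assumed name), and the model simply follows the proportionalities listed above.

```python
def parallel_scaling(P, t, beta):
    """Power and energy when beta operations per clock run at reduced voltage."""
    P_per_operation = P / beta ** 2       # voltage reduced by the factor beta, P ~ VDD^2
    P_total = beta * P_per_operation      # beta operations per clock
    E_total = P_total * t                 # run-time is unchanged
    return P_total, E_total

P, t = 8.0, 2.0                           # arbitrary example values
print("sequential:", P, P * t)
print("parallel:  ", parallel_scaling(P, t, beta=2))
```

With beta = 2 both power and energy are halved in this model.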
- 60 -
TU Dortmund

Application: VLIW processing and


voltage scaling in the Crusoe processor

 VDD: 32 levels (1.1V - 1.6V)


 Clock: 200MHz - 700MHz in increments of 33MHz
Scaling is triggered when CPU load change is detected by
software (~1/2 ms).
 More load: Increase of supply voltage (~20 ms/step),
followed by scaling clock frequency
 Less load: reduction of clock frequency, followed by
reduction of supply voltage
Worst case (1.1V to 1.6V VDD, 200MHz to 700MHz) takes
280 ms

- 61 -
TU Dortmund

Key requirement #2: Code-size efficiency


 CISC machines: RISC machines designed for run-time-,
not for code-size-efficiency
 Compression techniques: key idea

- 63 -
TU Dortmund

Code-size efficiency
 Compression techniques (continued):
• 2nd instruction set, e.g. ARM Thumb instruction set:
16-bit Thumb instr.:   001 10 Rd Constant   (ADD Rd #constant)
Fields: major opcode (001), minor opcode (10), source = destination (Rd), constant (zero extended)

Dynamically decoded at run-time into the 32-bit ARM instr.:
1110 001 01001 0 Rd 0 Rd 0000 Constant


• Reduction to 65-70 % of original code size
• 130% of ARM performance with 8/16 bit memory
• 85% of ARM performance with 32-bit memory [ARM, R. Gupta]
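The expansion of the Thumb ADD shown above can be mimicked in software. This is only an illustration of the bit fields on the slide (3-bit Rd, 8-bit constant), not a model of the actual decoder hardware; the function name is hypothetical.

```python
def thumb_add_imm_to_arm(thumb):
    """Expand a 16-bit Thumb 'ADD Rd, #constant' into the 32-bit ARM pattern."""
    if thumb >> 11 != 0b00110:            # major opcode 001, minor opcode 10
        raise ValueError("not a Thumb ADD Rd, #constant instruction")
    rd = (thumb >> 8) & 0b111             # 3-bit register number
    constant = thumb & 0xFF               # 8-bit constant, zero extended
    return ((0b1110 << 28) | (0b001 << 25) | (0b01001 << 20)
            | (rd << 16)                  # source register: 0 Rd
            | (rd << 12)                  # destination register: 0 Rd
            | constant)                   # bits 11..8 stay 0000

print(hex(thumb_add_imm_to_arm(0x3005)))  # ADD R0, #5
```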

Same approach for LSI TinyRisc, …


Requires support by compiler, assembler etc.
- 64 -
TU Dortmund

Dictionary approach, two level control store


(indirect addressing of instructions)

“Dictionary-based coding schemes cover a wide range of


various coders and compressors.
Their common feature is that the methods use some kind of a
dictionary that contains parts of the input sequence which
frequently appear.
The encoded sequence in turn contains references to the
dictionary elements rather than containing these over and
over.”

[Á. Beszédes et al.: Survey of Code-Size Reduction Methods, ACM Computing Surveys, Vol. 35, Sept. 2003, pp 223-267]

- 65 -
TU Dortmund

Key idea (for d bit instructions)

Uncompressed storage of a d-bit-wide instructions requires a×d bits.
In compressed code, each instruction pattern is stored only once.

For each instruction address, S contains table address of instruction (b bits wide, b « d).
Table of used instructions (“dictionary”): c ≤ 2^b entries (small), each d bits wide, read by the CPU.

Hopefully, a×b + c×d < a×d.
Called nanoprogramming in the Motorola 68000.
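The scheme can be sketched as follows. The names `compress` and `sizes_in_bits` are hypothetical, and b is chosen as the smallest width that can address all c dictionary entries, which is an assumption.

```python
import math

def compress(instructions):
    """Return (S, dictionary): S holds one table address per instruction address."""
    dictionary, index_of, S = [], {}, []
    for instr in instructions:
        if instr not in index_of:          # store each instruction pattern only once
            index_of[instr] = len(dictionary)
            dictionary.append(instr)
        S.append(index_of[instr])
    return S, dictionary

def sizes_in_bits(a, c, d):
    """Uncompressed size a*d and compressed size a*b + c*d."""
    b = max(1, math.ceil(math.log2(c)))    # assumed minimal table address width
    return a * d, a * b + c * d

program = [0xE1A00000] * 6 + [0xE2900005, 0xE1A00000]   # made-up 32-bit words
S, dictionary = compress(program)
print(S, [hex(w) for w in dictionary])
print(sizes_in_bits(a=len(program), c=len(dictionary), d=32))
```

Decompression is the indirect access `dictionary[S[address]]`.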
- 66 -
TU Dortmund

More information on code compaction

 Popular code compaction library by Rik van de Wiel


[http://www.extra.research.philips.com/ccb] has been
moved to

http://www-perso.iro.umontreal.ca/~latendre/codeCompression/codeCompression/node1.html

http://www.iro.umontreal.ca/~latendre/compactBib/

(153 entries as per 11/2004)

- 68 -
TU Dortmund

Key requirement #3: Run-time efficiency


- Domain-oriented architectures -

Example: Filtering in Digital signal processing (DSP)

Signal at t=ts (sampling points)

- 69 -
TU Dortmund

Filtering in digital signal processing

ADSP 2100

-- outer loop over


-- sampling times ts
{ MR:=0; A1:=1; A2:=s-1;
MX:=w[s]; MY:=a[0];
for (k=0; k <= (n−1); k++)
{ MR:=MR + MX * MY;
MX:=w[A2]; MY:=a[A1];
A1++; A2--;
}
x[s]:=MR;
}
Maps nicely
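For comparison, the same computation in a high-level language. This is a sketch, assuming that the loop above computes x[s] as the sum of w[s-k] * a[k] for k = 0 .. n-1; `fir_output` is a hypothetical name.

```python
def fir_output(w, a, s):
    """x[s] = sum over k of w[s-k] * a[k]; requires s >= len(a) - 1."""
    mr = 0                          # accumulator, like register MR
    for k in range(len(a)):
        mr += w[s - k] * a[k]       # one multiply/accumulate per coefficient
    return mr

w = [1, 2, 3, 4]                    # input samples (made-up values)
a = [1, 0, -1]                      # filter coefficients (made-up values)
print([fir_output(w, a, s) for s in range(len(a) - 1, len(w))])
```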
- 70 -
TU Dortmund

DSP-Processors: multiply/accumulate (MAC)


and zero-overhead loop (ZOL) instructions

MR:=0; A1:=1; A2:=s-1; MX:=w[s]; MY:=a[0];


for ( k:=1 <= n-1)
{MR:=MR+MX*MY; MY:=a[A1]; MX:=w[A2]; A1++; A2--}

Multiply/accumulate (MAC) instruction Zero-overhead loop (ZOL)


instruction preceding MAC
instruction.
Loop testing done in parallel to
MAC operations.

- 71 -
TU Dortmund

Heterogeneous registers

Example (ADSP 210x):

[Figure: data path with two memories labelled P and D; address registers A0, A1, A2, .. with address generation unit (AGU; +,-); registers AX, AY, AF and an ALU (+,-,..) with result register AR; registers MX, MY, MF and a multiplier (*) with result register MR]

Different functionality of registers An, AX, AY, AF, MX, MY, MF, MR
- 72 -
TU Dortmund

Separate address generation units (AGUs)

Example (ADSP 210x):
 Data memory can only be
fetched with address contained
in A,
 but this can be done in parallel
with operation in main data path
(takes effectively 0 time).
 A := A ± 1 also takes 0 time,
 same for A := A ± M;
 A := <immediate in instruction>
requires extra instruction
 Minimize load immediates
 Optimization in optimization
chapter

- 73 -
TU Dortmund

Modulo addressing

Modulo addressing:
Am++ ≡ Am:=(Am+1) mod n
(implements ring or circular buffer in memory)

[Figure: sliding window over the signal w up to time t1, holding the n most recent values]

Memory, t=t1        Memory, t2= t1+1
..                  ..
w[t1-1]             w[t1-1]
w[t1]               w[t1]
w[t1-n+1]           w[t1+1]
w[t1-n+2]           w[t1-n+2]
..                  ..
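A software model of such a buffer, as a minimal sketch (the class name `RingBuffer` and its methods are hypothetical): the newest sample overwrites the oldest one, and the address register wraps around modulo n.

```python
class RingBuffer:
    """Keeps the n most recent samples using modulo addressing."""

    def __init__(self, n):
        self.n = n
        self.data = [0] * n
        self.am = 0                           # models address register Am

    def push(self, value):
        self.data[self.am] = value            # overwrite the oldest value
        self.am = (self.am + 1) % self.n      # Am++ with wrap-around

    def most_recent(self):
        """Samples ordered from newest to oldest."""
        return [self.data[(self.am - 1 - k) % self.n] for k in range(self.n)]

rb = RingBuffer(3)
for sample in (1, 2, 3, 4, 5):
    rb.push(sample)
print(rb.data, rb.most_recent())
```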
- 74 -
TU Dortmund

Saturating arithmetic

 Returns largest/smallest number in case of


over/underflows
 Example:
a                                     0111
b                                   + 1001
standard wrap around arithmetic    (1)0000
saturating arithmetic                 1111
(a+b)/2: correct                      1000
   wrap around arithmetic             0000
   saturating arithmetic + shifted    0111 (“almost correct“)
 Appropriate for DSP/multimedia applications:
• No timeliness of results if interrupts are generated for overflows
• Precise values less important
• Wrap around arithmetic would be worse.
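The 4-bit example can be reproduced with a few lines. A sketch for unsigned operands; `wrap_add` and `sat_add` are hypothetical names.

```python
def wrap_add(a, b, bits=4):
    """Standard wrap-around addition: the carry out of the top bit is dropped."""
    return (a + b) & ((1 << bits) - 1)

def sat_add(a, b, bits=4):
    """Saturating addition: overflow returns the largest representable number."""
    return min(a + b, (1 << bits) - 1)

a, b = 0b0111, 0b1001
print(format(wrap_add(a, b), "04b"))        # wrap around arithmetic
print(format(sat_add(a, b), "04b"))         # saturating arithmetic
print(format((a + b) // 2, "04b"))          # (a+b)/2, correct
print(format(wrap_add(a, b) >> 1, "04b"))   # wrap around arithmetic, shifted
print(format(sat_add(a, b) >> 1, "04b"))    # saturating arithmetic, shifted
```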

- 75 -
TU Dortmund

Example

MATLAB Demo - 76 -
TU Dortmund

Fixed-point arithmetic

Shifting required after multiplications and divisions in order to maintain binary point.
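Why the shift is needed can be seen in a small sketch. It assumes a format with 8 fractional bits (the constant `FRAC_BITS` and all function names are hypothetical): a product of two such numbers has 16 fractional bits and must be shifted right, a dividend must be shifted left before dividing.

```python
FRAC_BITS = 8                                 # assumed position of the binary point

def to_fixed(x):
    return int(round(x * (1 << FRAC_BITS)))

def to_float(a):
    return a / (1 << FRAC_BITS)

def fixed_mul(a, b):
    return (a * b) >> FRAC_BITS               # shift restores the binary point

def fixed_div(a, b):
    return (a << FRAC_BITS) // b              # pre-shift keeps the fractional bits

print(to_float(fixed_mul(to_fixed(1.5), to_fixed(2.25))))
print(to_float(fixed_div(to_fixed(3.0), to_fixed(1.5))))
```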

- 77 -
TU Dortmund

Real-time capability

 Timing behavior has to be predictable

[Dagstuhl workshop on predictability, Nov. 17-19, 2003]


Features that cause problems:
• Unpredictable access to shared resources
• Caches with difficult to predict replacement strategies
• Unified caches (conflicts between instructions and data)
• Pipelines with difficult to predict stall cycles ("bubbles")
• Unpredictable communication times for multiprocessors
• Branch prediction, speculative execution
• Interrupts that are possible any time
• Memory refreshes that are possible any time
• Instructions that have data-dependent execution times
 Trying to avoid as many of these as possible.
- 79 -
TU Dortmund

Multiple memory banks or memories

[Figure: data path with two memories labelled P and D, address registers A0, A1, A2, .. with address generation unit (AGU), and the registers AX, AY, AF, AR, MX, MY, MF, MR]

Simplifies parallel fetches


- 80 -
TU Dortmund

Multimedia-Instructions/Processors

 Multimedia instructions exploit that many registers,


adders etc are quite wide (32/64 bit),
 whereas most multimedia data types are narrow
(e.g. 8 bit per color, 16 bit per audio sample per channel)
 2-8 values can be stored per register and added. E.g.:

+
4 additions per instruction;
carry disabled at word
boundaries.
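The effect of disabling the carry at the boundaries can be emulated with ordinary integer operations. This is a sketch for four 8-bit values packed into one 32-bit word with wrap-around per byte; the name `packed_add8` and the mask-based technique are illustrative and not taken from a specific instruction set.

```python
LOW7 = 0x7F7F7F7F    # lower 7 bits of each byte
TOP1 = 0x80808080    # top bit of each byte

def packed_add8(x, y):
    """Add four packed bytes; no carry crosses a byte boundary."""
    low = (x & LOW7) + (y & LOW7)          # carries stay inside each byte
    return (low ^ ((x ^ y) & TOP1)) & 0xFFFFFFFF

print(hex(packed_add8(0x01FF7F80, 0x01010101)))
```

An ordinary 32-bit addition of the same operands would let the carry of the second byte spill into the first one.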
- 81 -
TU Dortmund

Early example: HP precision architecture (hp PA)

Half word add instruction HADD:

Half word add?

Optional saturating arithmetic.


Up to 10 instructions can be replaced by HADD.
- 82 -
TU Dortmund

Pentium MMX-architecture (1)

64-bit vectors representing 8 byte encoded, 4 word encoded


or 2 double word encoded numbers.
wrap around/saturating options.
Multimedia registers mm0 - mm7,
consistent with floating-point registers (OS unchanged).

Instruction       Options         Comments
Padd[b/w/d]       wrap around,    addition/subtraction of
PSub[b/w/d]       saturating      bytes, words, double words
Pcmpeq[b/w/d]                     Result= "11..11" if true, "00..00" otherwise
Pcmpgt[b/w/d]                     Result= "11..11" if true, "00..00" otherwise
Pmullw                            multiplication, 4*16 bits, least significant word
Pmulhw                            multiplication, 4*16 bits, most significant word

- 83 -
TU Dortmund

Pentium MMX-architecture (2)

Instruction          Options             Comments
Psra[w/d]            No. of positions    Parallel shift of words, double words
Psll[w/d/q]          in register or      or 64 bit quad words
Psrl[w/d/q]          instruction
Punpckl[bw/wd/dq]                        Parallel unpack
Punpckh[bw/wd/dq]                        Parallel unpack
Packss[wb/dw]        saturating          Parallel pack
Pand, Pandn                              Logical operations on 64 bit words
Por, Pxor
Mov[d/q]                                 Move instruction

- 84 -
TU Dortmund

Application

Scaled interpolation between two images

Next word = next pixel, same color.
4 pixels processed at a time.

pxor      mm7,mm7       ;clear register mm7
movq      mm3,fade_val  ;load scaling value
movd      mm0,imageA    ;load 4 red pixels for A
movd      mm1,imageB    ;load 4 red pixels for B
punpcklbw mm1,mm7       ;unpack,bytes to words
punpcklbw mm0,mm7       ;upper bytes from mm7
psubw     mm0,mm1       ;subtract pixel values
pmulhw    mm0,mm3       ;scale
paddw     mm0,mm1       ;add to image B
packuswb  mm0,mm7       ;pack, words to bytes
- 85 -
TU Dortmund

Short vector instruction set extensions


for Intel® Pentium®/AMD® processors
 3DNow! (AMD, 1998)
 Streaming SIMD Extensions SSE (Intel, 1999)
• 16 new registers, floating point SIMD
 SSE2 (Intel, 2001; AMD, 2003)
• MMX instructions available for new SSE registers
 SSE3 (Intel, 2004; AMD)
• vector reduction, floating point conversion independent
of global rounding mode, relaxed alignment restrictions
 SSE4 (Intel, 2006; AMD: 4 instructions implemented)
• String comparison, counting 1‘s, CRC, …
 SSE5 (AMD, 2007)
• 3-address instructions, …
 Advanced vector extensions AVX (Intel, 2008)
• Registers 256, … bit wide
- 86 -
TU Dortmund

Summary

Hardware in a loop
 Sensors
 Discretization
 Information processing
• Importance of energy efficiency
• Special purpose HW very expensive
• Energy efficiency of processors
• Code size efficiency
• Run-time efficiency
• MPSoCs
• Reconfigurable Hardware
 D/A converters
 Actuators
- 87 -
Embedded System
Processing

Graphics: © Alexandra Nolte, Gesine Marwedel, 2003
TU Dortmund

Key idea of very long instruction word


(VLIW) computers

Instructions included in long instruction packets.


Instruction packets are assumed to be executed in parallel.
Fixed association of packet bits with functional units.

- 89 -
TU Dortmund

Very long instruction word (VLIW) architectures

 Very long instruction word (“instruction packet”) contains


several instructions, all of which are assumed to be
executed in parallel.
 Compiler is assumed to generate these “parallel” packets
 Complexity of finding parallelism is moved from the
hardware (RISC/CISC processors) to the compiler;
Ideally, this avoids the overhead (silicon, energy, ..) of
identifying parallelism at run-time.
A lot of expectations into VLIW machines
 Explicitly parallel instruction set computers (EPICs) are an
extension of VLIW architectures: parallelism detected by
compiler, but no need to encode parallelism in 1 word.

- 90 -
TU Dortmund

EPIC: TMS 320C6xx as an example

Bit in each instruction encodes end of parallel execution


Instr.   A  B  C  D  E  F  G
Bit      0  1  1  0  1  1  0
(each instruction occupies bits 31..0)

Cycle   Instruction
1       A
2       B C D
3       E F G

Instructions B, C and D use disjoint functional units, cross paths and other data path resources. The same is also true for E, F and G.

Parallel execution cannot span several packets.
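The grouping in the example can be derived from the bits with a short loop. This sketch assumes that a '1' means "the next instruction executes in the same cycle" and a '0' ends the parallel group, which reproduces the cycle table of the example; `group_by_parallel_bit` is a hypothetical name.

```python
def group_by_parallel_bit(instructions, bits):
    """Split an instruction sequence into groups that execute in the same cycle."""
    cycles, current = [], []
    for name, bit in zip(instructions, bits):
        current.append(name)
        if bit == 0:                 # end of parallel execution
            cycles.append(current)
            current = []
    if current:                      # trailing group without a terminating 0
        cycles.append(current)
    return cycles

for cycle, group in enumerate(group_by_parallel_bit("ABCDEFG", [0, 1, 1, 0, 1, 1, 0]), start=1):
    print(cycle, " ".join(group))
```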


- 91 -
TU Dortmund

Partitioned register files

 Many memory ports are required to supply enough


operands per cycle.
 Memories with many ports are expensive.
 Registers are partitioned into (typically 2) sets,
e.g. for TI C60x:

- 92 -
TU Dortmund

More encoding flexibility with IA-64 Itanium

3 instructions per bundle:

bits 127 .. 0:  instruc 1 | instruc 2 | instruc 3 | template
(template: instruction grouping information)

There are 5 instruction types:
 A: common ALU instructions
 I: more special integer instructions (e.g. shifts)
 M: Memory instructions
 F: floating point instructions
 B: branches
The following combinations can be encoded in templates:
 MII, MMI, MFI, MIB, MMB, MFB, MMF, MBB, BBB, MLX
with LX = move 64-bit immediate encoded in 2 slots
- 93 -
TU Dortmund

Templates and instruction types

End of parallel execution called stops.


Stops are denoted by underscores.
Example:

bundle 1 bundle 2

… MMI M_II MFI_ MII MMI MIB_

Group 1 Group 2 Group 3

Very restricted placement of stops within bundle.


Parallel execution within groups possible.
Parallel execution can span several bundles

- 94 -
TU Dortmund

Instruction types are mapped to


functional unit types

There are 4 functional unit (FU) types:


 M: Memory Unit
 I: Integer Unit
 F: Floating-Point Unit
 B: Branch Unit
Instruction types → corresponding FU type,
except type A (mapping to either I or M-functional units).

- 95 -
TU Dortmund

Implementation: Itanium 2 (2003)


L3 cache

 410M transistors
 374 mm2 die size
 6MB on-die L3
cache
 1.5 GHz at 1.3V

[ftp://download.intel.com/design/itanium2/download/madison_slides_r1.pdf]
© Intel, 2003 - 96 -
TU Dortmund

Philips TriMedia-Processor

For multimedia-applications, up to 5 instructions/cycle.

http://www.nxp.com/acrobat/datasheets/PNX15XX_SER_N_3.pdf
(incompatible with firefox?)
© NXP
- 97 -
TU Dortmund


Large # of delay slots,


a problem of VLIW processors

add sub and or


sub mult xor div
ld st mv beq

The execution of many instructions has been started before it is


realized that a branch was required.
Nullifying those instructions would waste compute power
 Executing those instructions is declared a feature, not a bug.
 How to fill all “delay slots“ with useful instructions?
 Avoid branches wherever possible.
- 100 -
TU Dortmund

Predicated execution:
Implementing IF-statements “branch-free“

Conditional Instruction “[c] I“ consists of:


 condition c
 instruction I

c = true => I executed


c = false => NOP

- 101 -
TU Dortmund

Predicated execution:
Implementing IF-statements “branch-free“: TI C6x

C source:
if (c)
{ a = x + y;
  b = x + z;
}
else
{ a = x - y;
  b = x - z;
}

Conditional branch (max. 12 cycles):
      [c] B L1
      NOP 5
      B L2
      NOP 4
      SUB x,y,a
   || SUB x,z,b
L1:   ADD x,y,a
   || ADD x,z,b
L2:

Predicated execution (1 cycle):
      [c]  ADD x,y,a
   || [c]  ADD x,z,b
   || [!c] SUB x,y,a
   || [!c] SUB x,z,b

- 102 -
TU Dortmund

Microcontrollers
- MHS 80C51 as an example -

Features for Embedded Systems


 8-bit CPU optimised for control applications
 Extensive Boolean processing capabilities
 64 k Program Memory address space
 64 k Data Memory address space
 4 k bytes of on chip Program Memory
 128 bytes of on chip data RAM
 32 bi-directional and individually addressable I/O lines
 Two 16-bit timers/counters
 Full duplex UART
 6 sources/5-vector interrupt structure with 2 priority levels
 On chip clock oscillators
 Very popular CPU with many different variations

- 103 -


TU Dortmund

Trend: multiprocessor systems-on-a-chip (MPSoCs)

http://www.mpsoc-forum.org/2007/slides/Hattori.pdf
- 104 -
TU Dortmund

Multiprocessor systems-on-a-chip
(MPSoCs) (2)

http://www.mpsoc-forum.org/2007/slides/Hattori.pdf
- 105 -
TU Dortmund

Multiprocessor systems-on-a-chip
(MPSoCs) (3)

http://www.mpsoc-forum.org/2007/slides/Hattori.pdf
- 106 -
TU Dortmund

Multiprocessor systems-on-a-chip (MPSoCs) (4)

Hugo De Man, IMEC, 2007


©

~50% inherent power efficiency of silicon - 107 -


Embedded System Hardware
- Reconfigurable Hardware -

Graphics: © Alexandra Nolte, Gesine Marwedel, 2003
TU Dortmund

Energy Efficiency of FPGAs

[Figure annotation: “inherent power efficiency of silicon“]

© Hugo De Man, IMEC, Philips, 2007

- 109 -
TU Dortmund

Reconfigurable Logic

Full custom chips may be too expensive, software too slow.


Combine the speed of HW with the flexibility of SW
HW with programmable functions and interconnect.
Use of configurable hardware;
common form: field programmable gate arrays (FPGAs)
Applications: bit-oriented algorithms like
 encryption,
 fast “object recognition“ (medical and military)
 Adapting mobile phones to different standards.
Very popular devices from
 XILINX (XILINX Virtex II are recent devices)
 Actel, Altera and others

- 110 -
TU Dortmund

Floor-plan of VIRTEX II FPGAs

More recent: Virtex 5, but no floor-plan found for Virtex 5.


- 111 -
TU Dortmund

Virtex 5 Configurable Logic Block (CLB)

- 112 -
TU Dortmund

Virtex 5 Slice (simplified)

Memories typically used as look-up tables to implement any Boolean function of ≤ 6 variables.
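How a small memory implements an arbitrary Boolean function can be sketched in software: the inputs form the address, and the stored bit is the function value. The names `make_lut` and `lut_read` are hypothetical, and the bit ordering of the address is an arbitrary choice.

```python
def make_lut(func, n_inputs=6):
    """Fill a 2^n-entry memory with the truth table of func."""
    return [func(*[(addr >> i) & 1 for i in range(n_inputs)]) & 1
            for addr in range(1 << n_inputs)]

def lut_read(lut, inputs):
    """Evaluate the function: the input bits select one memory cell."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return lut[addr]

# Example function: parity of 6 variables.
parity = make_lut(lambda a, b, c, d, e, f: a ^ b ^ c ^ d ^ e ^ f)
print(len(parity), lut_read(parity, [1, 0, 1, 1, 0, 0]))
```

Reconfiguring the function only means writing a different truth table into the memory.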

- 113 -
TU Dortmund

Virtex 5 SliceM

SliceM supports using


memories for storing
data and as shift
registers

- 114 -
TU Dortmund

Resources
available
in Virtex 5
devices

[© and source: Xilinx Inc.:


Virtex 5 FPGA User
Guide, May, 2009
//www.xilinx.com]

- 115 -
Hierarchical Routing Resources;
no routing plan found for Virtex 5.
TU Dortmund

Interconnect for Virtex II

- 116 -
TU Dortmund

Virtex II Pro Devices


include
up to 4 PowerPC
processor cores

Virtex 5 Devices include


up to 2 PowerPC
processor cores

[© and source: Xilinx Inc.: Virtex-II Pro™ Platform


FPGAs: Functional Description, Sept. 2002,
//www.xilinx.com]

- 117 -
Memory
TU Dortmund

Memory

Memories?
Oops!
Memories!

For the memory, efficiency is again a concern:


 speed (latency and throughput); predictable timing
 energy efficiency
 size
 cost
 other attributes (volatile vs. persistent, etc)

- 119 -
TU Dortmund

Access times and energy consumption increase


with the size of the memory

Example (CACTI Model):

"Currently, the size of some applications is doubling every 10 months"
[STMicroelectronics, Medea+ Workshop, Stuttgart, Nov. 2003]

- 120 -
TU Dortmund

Access times and energy consumption


for multi-ported register files
[Figure: three plots of cycle time (ns), area and power (W) over register file size (16, 32, 64, 128) for the configurations GP6M2 and GP6M3]

Source and © H. Valero, 2001

Rixner’s et al. model [HPCA’00], Technology of 0.18 µm

- 121 -
TU Dortmund

Memory system frequently consumes


>50 % of the energy used for processing
[Figure: two pie charts of energy shares.
Cache ($)-less monoprocessor: processor energy and main memory energy (slices of 29% and 71%).
Multiprocessor with cache ($): processor energy, I-cache energy, D-cache energy and main memory energy (slices of 51,9%, 28,1%, 14,8% and 5,2%).]

Average over 200 benchmarks analyzed by Verma (U. Dortmund)

[M. Verma, P. Marwedel: Advanced Memory Optimization Techniques for Low-Power Embedded Processors, Springer, 2007]

- 122 -
TU Dortmund

Similar information according to other sources

[Figure: pie chart for the StrongARM (IEEE Journal of SSC, Nov. 96) with the following shares]
Icache 26%
Ibox 18%
Dcache 16%
Clock 10%
IMMU 9%
DMMU 8%
EBOX 8%
Others 5%

[Based on slide by and ©: Osman S. Unsal, Israel Koren, C. Mani Krishna, Csaba Andras Moritz, U. of Massachusetts, Amherst, 2001] [Segars 01 according to Vahid@ISSS01]

- 123 -
TU Dortmund

Energy consumption in mobile devices

[O. Vargas (Infineon Technologies): Minimum power consumption in mobile-phone memory subsystems; Pennwell
Portable Design - September 2005;] Thanks to Thorsten Koch (Nokia/ Univ. Dortmund) for providing this source.

- 124 -
TU Dortmund

Trends for the Speeds

Speed gap between processor and main DRAM increases

[Figure: speed (logarithmic axis: 1, 2, 4, 8) over years 0 to 5. CPU performance grows by a factor of 1.5-2 p.a., i.e. ≥ 2x every 2 years; DRAM speed grows by a factor of 1.07 p.a.]

Similar problems also for embedded systems & MPSoCs
→ In the future: Memory access times >> processor cycle times
→ “Memory wall” problem

[P. Machanik: Approaches to Addressing the Memory Wall, TR Nov. 2002, U. Brisbane]
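
The growth rates quoted on this slide can be turned into a rough estimate of how fast the gap widens. The sketch below only assumes the two annual factors from the slide (1.5 p.a. for the CPU, taken as the lower bound, and 1.07 p.a. for DRAM); the function name is illustrative.

```python
def speed_gap(years, cpu_growth=1.5, dram_growth=1.07):
    """Ratio of CPU speed to DRAM speed after `years`, normalized to 1 at year 0.

    cpu_growth and dram_growth are the annual speed-up factors from the slide.
    """
    return (cpu_growth / dram_growth) ** years


if __name__ == "__main__":
    for year in range(6):
        print(f"year {year}: gap factor {speed_gap(year):.2f}")
```

Even with the conservative CPU factor, the relative gap grows by roughly 40 % per year under these assumptions.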

- 125 -
TU Dortmund

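Set-associative cache (n-way cache)

[Figure: 2-way set-associative cache $ (€), |Set| = 2. The address is split into tag and index. The index selects the tags and data blocks in way 0 and way 1; both stored tags are compared (=) with the address tag, and the matching way supplies the data.]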
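
The lookup shown in the figure can be sketched in software. This is a minimal model, not a description of any particular cache: the block size, the number of sets and the LRU replacement policy are assumptions chosen for illustration.

```python
class SetAssociativeCache:
    """Minimal model of an n-way set-associative cache (tags only, no data)."""

    def __init__(self, num_sets=4, ways=2, block_size=16):
        self.num_sets = num_sets
        self.ways = ways
        self.block_size = block_size
        # One list of tags per set; the most recently used tag is kept last.
        self.sets = [[] for _ in range(num_sets)]

    def access(self, address):
        """Return True on a hit, False on a miss (the block is then loaded)."""
        block = address // self.block_size
        index = block % self.num_sets      # selects the set
        tag = block // self.num_sets       # compared against the stored tags
        entries = self.sets[index]
        if tag in entries:                 # hardware compares all ways in parallel
            entries.remove(tag)
            entries.append(tag)
            return True
        if len(entries) == self.ways:      # set full: evict least recently used
            entries.pop(0)                 # (LRU is an assumed policy)
        entries.append(tag)
        return False


if __name__ == "__main__":
    cache = SetAssociativeCache()
    for addr in (0, 0, 64, 0, 128, 64):
        print(hex(addr), "hit" if cache.access(addr) else "miss")
```

Every access reads all ways of the selected set and compares all tags, which is the source of the extra energy discussed on the following slides.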
- 126 -
TU Dortmund

Hierarchical memories
using scratch pad memories (SPM)

SPM is a small, physically separate memory mapped into the address space

[Figure: address space from 0 to FFF.. containing the scratch pad memory and the main memory; hierarchy of processor, SPM and main memory with a select signal; no tag memory]

Selection is by an appropriate address decoder (simple!)

Example: ARM7TDMI cores, well-known for low power consumption
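
Because the SPM occupies a fixed address range, selecting it needs only a range check and no tag comparison. The sketch below illustrates such a decoder; the base address and size are hypothetical values, not taken from the slides.

```python
SPM_BASE = 0x0000_0000   # hypothetical start address of the scratch pad
SPM_SIZE = 4 * 1024      # hypothetical size: 4 KiB


def select_memory(address):
    """Return which memory serves `address`: a plain range check, no tags."""
    if SPM_BASE <= address < SPM_BASE + SPM_SIZE:
        return "SPM"
    return "main"


if __name__ == "__main__":
    for addr in (0x0000, 0x0FFF, 0x1000, 0x8000_0000):
        print(hex(addr), "->", select_memory(addr))
```

In contrast to a cache, the mapping is static, so the compiler or programmer decides which code and data are placed in the SPM range.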

- 127 -
TU Dortmund

Comparison of currents using measurements

E.g.: ATMEL board with ARM7TDMI and ext. SRAM
Current for a 32 Bit-Load Instruction (Thumb):

Program / Data    Core+SPM (mA)    Main Memory Current (mA)
Main / Main       48.2             116
Main / SPM        50.9             77.2
SPM / Main        44.4             82.2
SPM / SPM         53.1             1.16

- 128 -
TU Dortmund

Why not just use a cache?

2. Energy for parallel access of sets, in comparators, muxes.

[Figure: energy per access [nJ] versus memory size (256 to 16384) for a scratch pad and for 2-way caches with 4 GB, 16 MB and 1 MB address space] [R. Banakar, S. Steinke, B.-S. Lee, 2001]

- 129 -
TU Dortmund

Influence of the associativity

Parameters different from previous slides

[P. Marwedel et al., ASPDAC, 2004]

- 130 -
TU Dortmund

Summary

 Processing
• VLIW/EPIC processors
• MPSoCs
 FPGAs
 Memories
• “Small is beautiful”
(in terms of energy consumption, access times, size)

- 131 -
Communication

Graphics: © Alexandra Nolte, Gesine Marwedel, 2003


TU Dortmund

Embedded System Hardware

Embedded system hardware is frequently used in a loop (“hardware in a loop“):

 cyber-physical systems

- 133 -
TU Dortmund

Communication
- Requirements -

 Real-time behavior
 Efficient, economical
(e.g. centralized power supply)
 Appropriate bandwidth and communication delay
 Robustness
 Fault tolerance
 Diagnosability
 Maintainability
 Security
 Safety

- 134 -
TU Dortmund

Basic techniques:
Electrical robustness
Single-ended vs. differential signals

[Figure: single-ended signal referenced to a common ground vs. differential signal pair with a local ground on each side]

Voltage at input of Op-Amp positive → '1'; otherwise → '0'

Combined with twisted pairs; most noise is added to both wires.
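
A small numerical sketch shows why common-mode noise cancels in the differential case. The voltage levels (±0.5 V) and the noise value are hypothetical; the receivers are idealized threshold detectors.

```python
def receive_single_ended(v_signal, noise, threshold=0.0):
    """Single-ended receiver: noise on the one signal wire shifts the level."""
    return 1 if v_signal + noise > threshold else 0


def receive_differential(v_plus, v_minus, noise):
    """Differential receiver: the same noise is added to both wires and
    disappears when the two voltages are subtracted."""
    return 1 if (v_plus + noise) - (v_minus + noise) > 0 else 0


if __name__ == "__main__":
    noise = 2.0  # hypothetical common-mode disturbance in volts
    # Transmit a logical 0: -0.5 V single-ended, (-0.5 V, +0.5 V) differential.
    print("single-ended:", receive_single_ended(-0.5, noise))      # misread
    print("differential:", receive_differential(-0.5, 0.5, noise))
```

The single-ended receiver misreads the bit once the disturbance exceeds the signal swing, while the differential receiver is unaffected as long as both wires see the same noise.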


- 135 -
TU Dortmund

Evaluation

Advantages:
 Subtraction removes most of the noise
 Changes of voltage levels have no effect
 Reduced importance of ground wiring
 Higher speed
Disadvantages:
 Requires negative voltages
 Increased number of wires and connectors
Applications:
 USB, FireWire, ISDN
 Ethernet (STP/UTP CAT 5/6 cables)
 differential SCSI
 High-quality analog audio signals (XLR) © wikipedia

- 136 -
TU Dortmund

Priority-based arbitration of communication media

For example, consider a bus

Device 0 Device 1 Device 2 Device 3

 Bus arbitration (allocation) is frequently priority-based
 Communication delay depends on communication traffic of other partners
 No tight real-time guarantees, except for highest priority partner

- 138 -
TU Dortmund

Real-time behavior

Carrier-sense multiple-access/collision-detection (CSMA/CD, standard Ethernet): no guaranteed response time.
Alternatives:
 token rings, token busses
 Carrier-sense multiple-access/collision-avoidance
(CSMA/CA)
• WLAN techniques with request preceding transmission
• Each partner gets an ID (priority). After each bus transfer,
all partners try setting their ID on the bus; partners
detecting higher ID disconnect themselves from the bus.
Highest priority partner gets guaranteed response time;
others only if they are given a chance.
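
The ID-based arbitration described above can be sketched as a bitwise contest. The model below assumes a wired-OR bus on which a 1 overrides a 0, unique IDs, and that IDs are sent most significant bit first; these details are assumptions for illustration, chosen so that the highest ID wins as stated on the slide.

```python
def arbitrate(ids, width=4):
    """Return the ID that wins bitwise arbitration (highest ID wins).

    All contenders put their ID on the bus bit by bit, MSB first. A partner
    that sends 0 but observes 1 on the bus has seen a higher ID and
    disconnects itself.
    """
    contenders = list(ids)
    for bit in reversed(range(width)):
        bus = any((i >> bit) & 1 for i in contenders)   # wired-OR of driven bits
        if bus:
            contenders = [i for i in contenders if (i >> bit) & 1]
    return contenders[0]


if __name__ == "__main__":
    print(arbitrate([5, 9, 3]))   # 9 wins; 5 and 3 back off at the first bit
```

No transfer is destroyed by the contest, but a low-priority partner can lose every round, which is why only the highest-priority partner has a guaranteed response time.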

- 139 -
TU Dortmund

Time division multiple access (TDMA) busses

Each communication partner is assigned a fixed time slot.
Example: http://www.ece.cmu.edu/~koopman/jtdma/jtdma.html#classical
 Master sends sync
 Some waiting time
 Each slave transmits in its time slot
 Variations exist (truncating unused slots, >1 slots per slave)

TDMA resources have a deterministic timing behavior
TDMA provides QoS guarantees in networks on chips
[E. Wandeler, L. Thiele: Optimal TDMA Time Slot and Cycle Length Allocation for Hard Real-Time Systems, ASP-DAC, 2006]
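
The deterministic timing of TDMA follows directly from the fixed slot layout. The sketch below assumes equal-length slots, one slot per partner and a sync phase at the start of each cycle; all parameter names and values are illustrative.

```python
def slot_owner(t, slot_length, num_partners, sync_length=0):
    """Return the index of the partner owning the bus at time t,
    or None during the sync phase at the start of each cycle."""
    cycle = sync_length + slot_length * num_partners
    offset = t % cycle
    if offset < sync_length:
        return None
    return int((offset - sync_length) // slot_length)


def worst_case_wait(slot_length, num_partners, sync_length=0):
    """Upper bound on the time a partner waits for the start of its next
    slot: one full cycle, independent of the traffic of other partners."""
    return sync_length + slot_length * num_partners


if __name__ == "__main__":
    for t in range(8):
        print(t, slot_owner(t, slot_length=2, num_partners=3, sync_length=1))
    print("bound:", worst_case_wait(2, 3, 1))
```

The bound depends only on the configuration, not on what other partners send, which is the property that makes TDMA attractive for hard real-time systems.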
- 140 -
TU Dortmund

FlexRay

 Developed by the FlexRay consortium (BMW, Ford, Bosch, DaimlerChrysler, …)
 Specified in SDL
 Improved error tolerance and time-determinism
 Meets requirements with transfer rates >> CAN standard
High data rate can be achieved:
• initially targeted for ~ 10Mbit/sec;
• design allows much higher data rates
 TDMA protocol
 Cycle subdivided into a static and a dynamic segment.
- 141 -
TU Dortmund

TDMA in FlexRay

Exclusive bus access enabled for short time in each case.
Dynamic segment for transmission of variable length information.
Fixed priorities in dynamic segment: Minislots for each potential sender.
Bandwidth used only when it is actually needed.

http://www.tzm.de/FlexRay/FlexRay_Introduction.html

- 142 -
TU Dortmund

Time intervals in FlexRay

© Prof. Form, TU Braunschweig, 2007

 Microtick (µt) = Clock period in partners, may differ between partners
 Macrotick (mt) = Basic unit of time, synchronized between partners (= r_i ∙ µt, r_i varies between partners i)
 Slot = Interval allocated per sender in static segment (= p ∙ mt, p: fixed (configurable))
 Minislot = Interval allocated per sender in dynamic segment (= q ∙ mt, q: variable)
Short minislot if no transmission needed; starts after previous minislot.
 Cycle = Static segment + dynamic segment + network idle time
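
The relations between these intervals can be checked with a few lines of arithmetic. All numbers below (clock periods, r_i, p, segment lengths) are hypothetical and only illustrate how partners with different clocks arrive at the same macrotick.

```python
def macrotick_ns(microtick_ns, r_i):
    """Macrotick = r_i * microtick; r_i is chosen per partner i."""
    return r_i * microtick_ns


def cycle_ns(num_static_slots, p, dynamic_segment_mt, idle_mt, mt_ns):
    """Cycle = static segment + dynamic segment + network idle time,
    with every part expressed in macroticks (mt)."""
    static_segment_mt = num_static_slots * p   # each static slot lasts p mt
    return (static_segment_mt + dynamic_segment_mt + idle_mt) * mt_ns


if __name__ == "__main__":
    # Two partners with different clock periods but the same macrotick.
    print(macrotick_ns(25, 40), macrotick_ns(50, 20))   # both 1000 ns
    print(cycle_ns(4, 50, 100, 20, 1000), "ns per cycle")
```

Choosing r_i per partner is what lets nodes with different local clocks agree on one common time base.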
- 143 -
TU Dortmund

Structure of FlexRay networks

Bus guardian protects the system against failing processors, e.g. so-called “babbling idiots”

http://www.ixxat.de/index.php?seite=introduction_flexray_en&root=5873&system_id=5875&com=formular_suche_treff
- 144 -
TU Dortmund

Communication:
Hierarchy

Inverse relation between volume and urgency quite common:

Sensor/actuator busses

- 145 -
TU Dortmund

Other busses

 Sensor/actuator busses: connecting sensors/actuators, low rates


 Field busses
 CAN: Controller bus for automotive
 LIN: low cost bus for interfacing sensors/actuators in the automotive
domain
 MOST: Multimedia bus for the automotive domain (not a field bus)
 MAP: bus designed for car factories.
 Process Field Bus (Profibus): used in smart buildings
 The European Installation Bus (EIB): bus designed for smart
buildings; CSMA/CA; low data rate.
 IEEE 488: Designed for laboratory equipment.
 Attempts to use standard Ethernet. Timing predictability an issue.
- 146 -
TU Dortmund

Wireless communication: Examples

 IEEE 802.11 a/b/g/n


 UMTS; HSPA
 DECT
 Bluetooth
 ZigBee
Timing predictability of wireless communication?

- 147 -
D/A-Converters
TU Dortmund

Embedded System Hardware

Embedded system hardware is frequently used in a loop (“hardware in a loop“):

 cyber-physical systems

- 149 -
TU Dortmund

Kirchhoff's junction rule
Kirchhoff's Current Law, Kirchhoff's first rule

Kirchhoff's Current Law: At any point in an electrical circuit, the sum of currents flowing towards that point is equal to the sum of currents flowing away from that point.
(Principle of conservation of electric charge)

Example: $i_1 + i_2 + i_4 = i_3$

Formally, for any node in a circuit: $\sum_k i_k = 0$ (in the example: $i_1 + i_2 - i_3 + i_4 = 0$)
Count current flowing away from node as negative.
[Jewett and Serway, 2007].

- 150 -
TU Dortmund

Kirchhoff's loop rule
Kirchhoff's Voltage Law, Kirchhoff's second rule

The principle of conservation of energy implies that: The sum of the potential differences (voltages) across all elements around any closed circuit must be zero [Jewett and Serway, 2007].

Formally, for any loop in a circuit: $\sum_k V_k = 0$
Example: $V_1 - V_2 - V_3 + V_4 = 0$
Count voltages traversed against arrow direction as negative.

$V_3 = R_3 I_3$ if current counted in the same direction as $V_3$
$V_3 = -R_3 I_3$ if current counted in the opposite direction as $V_3$
- 151 -
TU Dortmund

Operational Amplifiers (Op-Amps)

Operational amplifiers (op-amps) are devices amplifying the voltage difference between two input terminals by a large gain factor g

$V_{out} = (V_+ - V_-) \cdot g$

High impedance input terminals → Currents into inputs ≈ 0

[Figure: op-amp symbol with inputs V- and V+, output Vout, supply voltage and ground; photo of an op-amp in a separate package (TO-5) [wikipedia]]

For an ideal op-amp: $g \to \infty$
(In practice: g may be around $10^4 .. 10^6$)

- 152 -
TU Dortmund

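Op-Amps with feedback

In circuits, negative feedback is used to define the actual gain

[Figure: op-amp with input voltage V1 connected via resistor R to the inverting input V-, feedback resistor R1 carrying current I between Vout and V-, non-inverting input connected to ground; the loop used below is marked]

Due to the feedback to the inverted input, R1 reduces voltage V-. To which level?

$V_{out} = -g \cdot V_-$ (op-amp feature)
$I \cdot R_1 + V_{out} - V_- = 0$ (loop rule)
⇒ $I \cdot R_1 + (-g) \cdot V_- - V_- = 0$
⇒ $(1+g) \cdot V_- = I \cdot R_1$
⇒ $V_- = \frac{I \cdot R_1}{1+g}$

$V_{-,ideal} = \lim_{g \to \infty} \frac{I \cdot R_1}{1+g} = 0$

V- is called virtual ground: the voltage is 0,
but the terminal may not be connected to ground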
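
Evaluating the result of this derivation, V- = I∙R1/(1+g), for realistic gains shows how quickly V- approaches the virtual ground. The current and resistor values below are hypothetical.

```python
def v_minus(current, r1, gain):
    """Voltage at the inverting input: V- = I * R1 / (1 + g)."""
    return current * r1 / (1 + gain)


if __name__ == "__main__":
    current = 1e-3   # 1 mA through the feedback resistor (hypothetical)
    r1 = 1e3         # 1 kOhm feedback resistor (hypothetical)
    for gain in (1e2, 1e4, 1e6):
        print(f"g = {gain:.0e}: V- = {v_minus(current, r1, gain):.2e} V")
```

For gains in the practical range of 10^4 to 10^6, V- stays in the microvolt to sub-millivolt range here, so treating it as 0 is a good approximation.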
- 153 -
TU Dortmund

Digital-to-Analog (D/A) Converters

Various types, can be quite simple, e.g.:

- 154 -
TU Dortmund

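Current ~ no. represented by x

Loop rule (for $x_0 = 1$; for $x_0 = 0$, $I_0 = 0$): $I_0 \cdot 8 \cdot R + V_- - V_{ref} = 0$

With $V_- = 0$ (virtual ground): $I_0 = x_0 \cdot \frac{V_{ref}}{8 \cdot R}$

In general: $I_i = x_i \cdot \frac{V_{ref}}{2^{3-i} \cdot R}$

Junction rule: $I = \sum_i I_i$

$I \sim nat(x)$, where nat(x): natural number represented by x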

- 155 -
TU Dortmund

Output voltage ~ no. represented by x

Loop rule (*): $y + R_1 \cdot I' = 0$
Junction rule (°): $I = I'$
(°, *) ⇒ $y + R_1 \cdot I = 0$

Hence, with I from the previous slide:
$y = -V_{ref} \cdot \frac{R_1}{8 \cdot R} \cdot \sum_{i=0}^{3} x_i \cdot 2^i = -V_{ref} \cdot \frac{R_1}{8 \cdot R} \cdot nat(x)$

Op-amp turns current I ~ nat(x) into a voltage ~ nat(x)
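
The two slides on the weighted-resistor converter can be checked numerically. The sketch assumes a 4-bit input x = (x0, x1, x2, x3) with x0 the least significant bit and per-bit currents I_i = x_i ∙ Vref / (2^(3-i) ∙ R); the component values are hypothetical.

```python
def nat(bits):
    """Natural number represented by bits = (x0, x1, x2, x3), x0 = LSB."""
    return sum(x << i for i, x in enumerate(bits))


def dac_current(bits, v_ref, r):
    """Junction rule: I is the sum of the per-bit currents I_i."""
    return sum(x * v_ref / (2 ** (3 - i) * r) for i, x in enumerate(bits))


def dac_output(bits, v_ref, r, r1):
    """Loop rule at the op-amp: y + R1 * I = 0, hence y = -R1 * I."""
    return -r1 * dac_current(bits, v_ref, r)


if __name__ == "__main__":
    v_ref, r, r1 = 8.0, 1.0, 1.0        # hypothetical component values
    for bits in ((0, 0, 0, 0), (1, 0, 1, 0), (1, 1, 1, 1)):
        print(bits, nat(bits), dac_output(bits, v_ref, r, r1))
```

With these values Vref∙R1/(8∙R) equals 1, so the magnitude of the output voltage equals nat(x) directly.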

- 156 -
TU Dortmund

Output generated from signal e3(t)

[Figure: output signal generated from e3(t)*]
* Assuming “zero-order hold”

Possible to reconstruct input signal?

- 157 -
Sampling Theorem
TU Dortmund

Possible to reconstruct input signal?

 Assuming Nyquist criterion met
 Let {t_s}, s = ..., −1, 0, 1, 2, ... be times at which we sample g(t)
 Assume a constant sampling rate of 1/p_s (∀s: p_s = t_{s+1} − t_s).
 According to sampling theory, we can approximate the input signal as follows:

$z(t) = \sum_{s=-\infty}^{\infty} y(t_s) \cdot \frac{\sin\left(\frac{\pi}{p_s}(t - t_s)\right)}{\frac{\pi}{p_s}(t - t_s)}$

The fraction is the weighting factor for the influence of y(t_s) at time t.
[Oppenheim, Schafer, 2009]

- 159 -
TU Dortmund

Weighting factor for influence of y(ts) at time t

No influence at ts+n
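
The weighting factor and the resulting interpolation can be tried out numerically. The sketch assumes the standard sinc weighting sin(x)/x with x = π(t − t_s)/p_s and a finite list of samples starting at t_0; the 1 Hz test signal and the sampling period are hypothetical.

```python
import math


def sinc_weight(t, t_s, p_s):
    """Weight of the sample taken at t_s for the reconstructed value at t."""
    x = math.pi * (t - t_s) / p_s
    return 1.0 if x == 0 else math.sin(x) / x


def reconstruct(t, samples, p_s, t_0=0.0):
    """Approximate z(t) as the weighted sum of a finite list of samples."""
    return sum(y * sinc_weight(t, t_0 + s * p_s, p_s)
               for s, y in enumerate(samples))


if __name__ == "__main__":
    p_s = 0.1                                              # 10 samples per second
    samples = [math.sin(2 * math.pi * s * p_s) for s in range(200)]
    t = 10.05                                              # between two samples
    print(reconstruct(t, samples, p_s), math.sin(2 * math.pi * t))
```

At a sampling instant all other samples contribute (numerically almost) nothing, matching the "no influence" remark above; between samples the finite number of samples leaves a small residual error.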

- 160 -
TU Dortmund

Contributions from the various sampling instances

- 161 -
TU Dortmund

(Attempted) reconstruction of input signal

* Assuming 0-order hold

- 162 -
TU Dortmund

How to compute the sinc( ) function?

 Filter theory: The required interpolation is performed by an ideal low-pass filter (sinc is the Fourier transform of the low-pass filter transfer function)

[Figure: y(t) passes through a low-pass filter, yielding z(t); frequency axis marked fs/2 and fs]
Filter removes high frequencies present in y(t)

- 163 -
TU Dortmund

How precisely are we reconstructing the input?

 Sampling theory:
• Reconstruction using sinc () is precise
 However, it may be impossible to really compute z(t) as indicated ….

- 164 -
TU Dortmund

Limitations

 Actual filters do not compute sinc( )
In practice, filters are used as an approximation.
Computing good filters is an art itself!
 All samples must be known to reconstruct e(t) or g(t).
 Waiting indefinitely before we can generate output!
In practice, only a finite set of samples is available.
 Actual signals are never perfectly bandwidth limited.
 Quantization noise cannot be removed.

- 165 -
TU Dortmund

Output

Output devices of embedded systems include
 Displays: Display technology is extremely important. Major research and development efforts.
 Electro-mechanical devices: these influence the environment through motors and other electro-mechanical equipment.
Frequently require analog output.

- 166 -
TU Dortmund

Embedded System Hardware

Embedded system hardware is frequently used in a loop (“hardware in a loop“):

 cyber-physical systems

- 167 -
Actuators
TU Dortmund

Actuators

Huge variety of actuators and output devices, impossible to present all of them.
Microsystems motors as examples (© MCNC):

- 169 -
TU Dortmund

Actuators (2)

Courtesy and ©: E. Obermeier, MAT, TU Berlin
http://www.piezomotor.se/pages/PWtechnology.html

http://www.elliptec.com/fileadmin/elliptec/User/Produkte/Elliptec_Motor/Elliptecmotor_How_it_works.h

- 170 -
TU Dortmund

Secure Hardware

 Security needed for communication and storage
 Demand for special equipment for cryptographic keys
 To resist side-channel attacks like
• measurements of the supply current or
• electromagnetic radiation.
Special mechanisms for physical protection (shielding, sensors detecting tampering with the modules).
 Logical security, using cryptographic methods, needed.
 Smart cards: special case of secure hardware
• Have to run with a very small amount of energy.
 In general, we have to distinguish between different
levels of security and knowledge of “adversaries”
- 171 -
TU Dortmund

Summary

Hardware in a loop
 Sensors
 Discretization
 Information processing
• Importance of energy efficiency, Special purpose HW very
expensive, Energy efficiency of processors, Code size
efficiency, Run-time efficiency
• Reconfigurable Hardware
 Communication
 D/A converters
 Sampling theorem
 Actuators

- 172 -
