Embedded System Design Introduction To SoC System Architecture
Mohit Arora
Ever since I got involved in chip design early in my career designing IPs, I moved progressively into SoC architecture before taking a full-time architect position focused on embedded systems. The role allowed me to visit customers worldwide and validate requirements before writing detailed architectural specifications for the design team.
In the course of this work, I realized that embedded applications, though they may look simple, have unique requirements that demand specific attention in the chip architecture, quite different from typical consumer applications, which are mostly focused on features and processing capability.
This is my second book and a natural extension of my first, "The Art of Hardware Architecture", which focused on design techniques; the current book extends that work to embedded systems architecture.
The book aims to highlight the complex issues, tasks and techniques that must be mastered by an SoC architect to define and architect an SoC for an embedded application. Since "Embedded System Design" is a broad subject, this first edition does not cover everything but attempts to include the essential elements and attributes important to designing an embedded system. Subsequent editions will include extended topics to keep the book up to date with upcoming trends.
techniques across various aspects of chip-design, especially focused on
embedded systems.
Chapter 4 covers what one needs to know about embedded "System Boot". The chapter starts with the Windows XP boot as a familiar consumer example and then expands to the boot process and options in an embedded application. It includes the challenges of embedded boot, Boot ROM, and bootloaders for embedded applications, including the popular open-source universal bootloader (U-Boot).
Chapter 5 covers "System Integrity". The chapter outlines the need for a robust watchdog and the guidelines that must be considered when designing a fault-tolerant system monitor, also known as a watchdog.
Chapter 6 covers several hardware as well as software "Debouncing Techniques" to eliminate unwanted noise or glitches in a circuit caused by an external input (usually some kind of switch).
Every possible effort was made to make the book self-contained. Feedback and comments are welcome on this or any related aspect, and can be sent to me at mohit.arora@me.com.
Acknowledgements
The original idea behind "Embedded System Design" was to link my years of experience as a system architect with practical experience architecting SoCs and meeting customers. However, the final shape of this book would not have been possible without many contributions.
My wife Pooja was so patient with my late nights, and I want to thank her for her faithful support and encouragement while I was writing this book. Most of the work occurred on weekends, at night, on vacation, and at other times inconvenient to my family. I would also like to thank my parents for allowing me to follow my ambitions throughout my childhood.
Contents
PREFACE VII
ACKNOWLEDGEMENTS XI
CONTENTS XIII
1.1 INTRODUCTION 1
2. HANDLING INTERRUPTS 32
2.1 INTRODUCTION 32
2.2 INTERRUPTS 32
2.5 INTERRUPT SERVICE ROUTINE (ISR), INTERRUPT VECTORS AND VECTOR TABLE 40
3. MEMORY ADDRESSING 61
3.1 INTRODUCTION 61
3.2 MEMORY CLASSIFICATION 61
3.10.1 Definition 85
3.10.8.1 Methods 98
8. REFERENCES 186
Introduction to Embedded Systems
1.1 Introduction
Millions of computing systems are built every year destined for desktop computers (personal computers, or PCs), workstations, mainframes and servers. Interestingly, billions of computing systems are built every year for a very different purpose: they are embedded within larger devices, repeatedly carrying out a particular function, often going completely unrecognized by the device's user.
This chapter is intended to help readers understand what makes a system an embedded system, how it differs from general computing systems, and the key components of an embedded system.
“Loosely defined, it is any device that includes a programmable computer but is not itself
intended to be a general purpose computer” by Wayne Wolf [1]
“An embedded computer system includes a microcomputer with mechanical, chemical and
electrical devices attached to it, programmed for a specific dedicated purpose, and packaged
as a complete system” by Jonathan W. Valvano [2]
One may not realize it, but embedded devices are found in all sorts of everyday items. In fact, one can easily find more than a dozen embedded devices in a home, hidden or embedded inside things like washing machines, electric shavers, digital TVs, digital cameras, air conditioners and so on. The key characteristic, however, is that an embedded system is designed to handle a particular task. Most embedded devices are primarily designed for a particular function; however, some embedded devices, such as a smartphone or digital TV, may perform a variety of functions. Table 1-1 lists some "embedded device" examples across various markets.
For a typical embedded device, a user can make choices concerning the functionality but cannot change the system functionality by adding or replacing software. For example, a programmable digital thermostat has an embedded system with the dedicated function of monitoring and controlling the surrounding temperature. The user may have choices for setting the desired low and high temperatures but cannot change its functionality to something other than a temperature controller. The software for an embedded system is often referred to as firmware and is usually contained in the system's non-volatile memory.
Unless told, most users would be completely unaware that what they are using is controlled by one or more embedded devices. Most people recognize computers by their screen, keyboard, disc drives and so on; embedded devices have none of these characteristics. In the next section, we will discuss embedded devices in more detail and the main characteristics that differentiate them from general computers.
A Microprocessor
A large primary memory that includes RAM, ROM and cache.
A large secondary memory like hard disk drive, optical drive or
solid state drive.
I/O unit such as display, keyboard, mouse and others.
Operating System (OS)
General purpose user interfaces and application software.
Embeds hardware that includes the core and necessary I/O for a
specific function.
Embeds main application software into embedded Flash.
Embeds a real time operating system (RTOS) which supervises
the application software tasks running on the hardware.
Embedded systems are typically used over long periods of time, will not (or cannot) be reprogrammed or maintained by their end users, and often face significantly different design constraints such as limited memory, low cost, strict performance guarantees, fail-safe operation, low power, reliability and guaranteed real-time behavior.
These embedded systems often use simple executives (OS kernels) or real-
time operating systems with typically small footprints, support for real-time
scheduling and no hard drives. Many embedded systems also interact with
their physical environment using a variety of sensors and/or actuators.
The main job of an air conditioner is to cool the indoor air. Air conditioners monitor and regulate the air temperature via a thermostat. Air conditioners also act as dehumidifiers. Because temperature is a key component of relative humidity, reducing the temperature of a volume of humid air causes it to release a portion of its moisture. That is why there are drains and moisture-collecting pans near or attached to air conditioners, and why air conditioners discharge water when they operate on humid days.
If you open a window air conditioner unit, you will find that it contains the following main components:
The cold side of an air conditioner contains the evaporator and a fan that
blows air over the chilled coils and into the room. The hot side contains
the compressor, condenser and another fan to vent hot air coming off the
compressed refrigerant to the outdoors. In between the two sets of coils,
there's an expansion valve. It regulates the amount of compressed liquid
refrigerant moving into the evaporator. Once in the evaporator, the
refrigerant experiences a pressure drop, expands and changes back into a
gas. The compressor is actually a large electric pump that pressurizes the
refrigerant gas as part of the process of turning it back into a liquid. There
are many additional and optional components like sensors, timers and
valves, but the evaporator, compressor, condenser and expansion valve are
the main components of an air conditioner.
This forms the basic setup for a conventional air-conditioner. Window air
conditioners have all these components mounted into a relatively small
metal box that installs into a window opening. The hot air vents from the
back of the unit, while the condenser coils and a fan cool and re-circulate
indoor air. A split-system air conditioner splits the hot side from the cold side of the system, with the hot side usually kept outside the building or room.
[Figure: air-conditioner control unit: a microcontroller with on-chip Flash and RAM; multi-channel timers with IGBT control units driving the compressor and fan motor; display control; and ADCs reading the outside and cabin temperature sensors and current detectors.]
The microcontroller's multi-channel timers create the most efficient PWM waveforms for motor speed control, resulting in better efficiency and lower power consumption.
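As a rough illustration of the duty-cycle arithmetic behind PWM motor control, the sketch below converts a requested duty cycle into a timer compare value; the period constant and function name are hypothetical, not from any specific MCU.

```c
#include <stdint.h>

/* Sketch: mapping a requested PWM duty cycle onto a timer compare
 * value, as a motor-control loop might do. TIMER_PERIOD is a made-up
 * example value, not taken from a real part. */
#define TIMER_PERIOD 1000u   /* timer counts per PWM period (hypothetical) */

uint32_t pwm_compare_value(uint32_t duty_percent)
{
    if (duty_percent > 100u)
        duty_percent = 100u;             /* clamp to a legal duty cycle */
    return (TIMER_PERIOD * duty_percent) / 100u;
}
```

A 25% duty cycle then corresponds to a compare value of 250 out of the 1000-count period.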
An airbag slows the passenger's speed to zero with little or no damage; however, the constraints it must work within are huge. The goal of an airbag is to slow the passenger's forward motion as evenly as possible in a fraction of a second. For the front driver airbag, a bag made of thin nylon fabric is folded into the steering wheel (as shown in Figure 1-2). A sensor in the device (part of the microcontroller unit explained later in this section) signals the bag to inflate during a collision. Inflation is the result of a chemical reaction that produces nitrogen gas to fill the airbag.
The whole process happens in a few milliseconds, thus requiring a microcontroller to control the operation.
Figure 1-3 shows the microcontroller at the heart of the airbag control unit, managing the whole operation. The microcontroller monitors a number of sensors, such as G-sensors, front sensors and rollover sensors. When a predefined threshold is exceeded, it sends a signal to trigger ignition of the airbags via special squib driver circuits.
[Figure 1-3: airbag control unit: a microcontroller with on-chip Flash/RAM; ADCs and serial interfaces reading rollover, safety, front and G sensors plus a passenger occupant sensor; squib drive circuits firing the driver's and front passenger's airbag inflators; power management; and a CAN transceiver to the automotive LAN.]
There are two numbers in a blood pressure reading: systolic and diastolic.
Systolic arterial pressure is the higher blood pressure reached by the arteries
during systole (ventricular contraction), and diastolic arterial pressure is the
lowest blood pressure reached during diastole (ventricular relaxation). In a
healthy young adult at rest, systolic arterial pressure is around 120 mmHg
and diastolic arterial pressure is around 80 mmHg.
Blood flow is the blood volume that flows through any tissue in a
determined period of time (typically represented as ml/min) in order to
bring tissue oxygen and nutrients transported in blood. Blood flow is
directly affected by the blood pressure, as blood flows from the area with more pressure to the area with less pressure. The greater the pressure difference, the higher the blood flow. Blood is pumped from the left ventricle of the heart out to the aorta, where it reaches its highest pressure levels.
Blood pressure falls as blood moves away from the left ventricle until it
reaches 0 mm Hg, when it returns to the heart’s right atrium.
The arm cuff is inflated using an external air pump controlled with an MCU
GPIO pin, and deflated by activating an escape valve with another GPIO
pin.
[Figure: blood pressure monitor: a microcontroller with on-chip Flash/RAM; GPIOs driving the air pump and valve through an optocoupler; a pressure sensor feeding an operational amplifier and low-pass/high-pass filters for signal processing into the ADC; an LCD controller driving a segmented/TFT display; and a serial interface for wireless connectivity.]
One way to activate the air pump is to supply current through USB; however, current from the USB port may not be enough to drive the air pump and the valve, so another option is to power these external components from an external source (such as dual AA 1.5 V batteries) that provides sufficient current. An optocoupler (as shown) couples the MCU control signals to the components being activated. The output of the optocoupler is connected to a MOSFET working as a switch, so the air pump and valve mechanisms can be activated successfully.
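A minimal sketch of the GPIO side of this control path: on real hardware the output register is a fixed memory-mapped address, but here it is modeled as a plain variable so the logic is self-contained, and the pin assignments are hypothetical.

```c
#include <stdint.h>

/* GPIO output register modeled as a plain variable; on an MCU this
 * would be a memory-mapped address. Pin numbers are hypothetical. */
uint32_t gpio_out;

#define PUMP_PIN  (1u << 3)    /* drives the optocoupler for the air pump */
#define VALVE_PIN (1u << 4)    /* opens the escape valve to deflate the cuff */

void pump_on(void)     { gpio_out |=  PUMP_PIN;  }
void pump_off(void)    { gpio_out &= ~PUMP_PIN;  }
void valve_open(void)  { gpio_out |=  VALVE_PIN; }
void valve_close(void) { gpio_out &= ~VALVE_PIN; }
```

The read-modify-write pattern (`|=`, `&= ~`) changes one pin without disturbing the others sharing the register.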
Figure 1-5 shows a system block diagram for a three phase energy meter.
As shown the energy meter hardware includes a power supply, an analog
front end, a microcontroller section, and an interface section. The analog
front end is the part that interfaces to the high voltage lines. It converts
high voltages and high currents to voltages sufficiently small to be measured
directly by the ADC (Analog to Digital Converter) of the microcontroller.
A typical energy meter also requires a Real Time Clock (RTC) for tariff
information. The RTC required for a metering application needs to be very
accurate (< 5ppm) for Time of Day (TOD), which involves dividing the
day, month and year into tariff slots. Higher rates are applied at peak load
periods and lower tariff rates at off-peak load periods.
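To put the 5 ppm figure in perspective, a tolerance in parts per million translates directly into worst-case drift per day; the helper below is a sketch of that arithmetic, not metering code.

```c
/* Worst-case RTC drift implied by a crystal tolerance in parts per
 * million: a 5 ppm clock can gain or lose up to 5 microseconds every
 * second, which accumulates over a day. */
double drift_seconds_per_day(double ppm)
{
    const double seconds_per_day = 86400.0;  /* 24 * 60 * 60 */
    return ppm * 1e-6 * seconds_per_day;
}
```

At 5 ppm the meter's clock can be off by about 0.43 s per day, roughly 13 s per month, which is why tariff-slot timing needs this level of accuracy.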
The heart of the meter is the firmware, which calculates active and reactive energy based on voltage and current measurements. The firmware also includes tamper-detection algorithms, data logging and protocols such as DLMS and the power-line modem communication protocol for Automatic Meter Reading (AMR).
[Diagram contents: three-phase inputs (220 V @ 100 A) pass through current transformers (CTs) and gain amplifiers into the metering ASIC's ADCs; a Flash compute engine handles active/reactive energy calculation, tampering algorithms, memory management, data logging, data processing, DLMS and AMR (PLM protocols); the ASIC also provides on-chip RAM, a 32 kHz real-time clock, LCD driver and display, keyboard input, serial communication, optical port (PLM, RF, Zigbee etc.), GPIO, LEDs, timers, an energy output pulse (0.1 kWh/pulse), external tamper inputs, battery backup, external EEPROM, a temperature sensor, and a PC interface for debug/programming.]
Figure 1-5: System Block Diagram for three phase Energy Meter
The energy meter also needs to be calibrated before it can be used and that
is done in a digital domain for an electronic meter. Digital calibration is fast,
efficient and can be automated, removing the time-consuming manual
trimming required in traditional, electromechanical meters. Calibration
coefficients are safely stored in an EEPROM that can be either internal or
external.
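The sketch below illustrates how such stored coefficients might be applied to raw readings. The fixed-point format (gain scaled by 2^15) and the field names are illustrative assumptions, not a real meter's layout.

```c
#include <stdint.h>

/* Sketch of digital calibration: raw ADC readings are corrected with
 * a gain and offset recovered from EEPROM. The Q15 fixed-point format
 * and the struct layout are hypothetical. */
typedef struct {
    int32_t gain_q15;   /* calibration gain in Q15 fixed point (32768 = 1.0) */
    int32_t offset;     /* calibration offset in ADC counts */
} cal_coeff_t;

int32_t apply_calibration(int32_t raw, const cal_coeff_t *cal)
{
    /* corrected = raw * gain + offset, with Q15 scaling */
    return (int32_t)(((int64_t)raw * cal->gain_q15) >> 15) + cal->offset;
}
```

With gain_q15 = 32768 (a gain of 1.0) and offset = 0, the reading passes through unchanged.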
A portable music player is a convergence of many technologies. Unlike earlier forms of music player that required moving parts to read encoded data on a tape or CD, MP3 players use solid-state memory. An MP3 player is essentially a data-storage device with an embedded software application that allows users to transfer MP3 files to the player and play them. The advantage of solid-state memory is that there are no moving parts, which means better reliability.
[Figure: MP3 player: a controller with on-chip RAM; NAND Flash via a memory interface; GPIOs for user controls; an LCD controller driving a TFT display; a USB controller connecting to a host PC; and power management.]
In addition to storing music, the MP3 player must play it back and let the user hear the songs. To do this, the player pulls the song from its memory, decompresses the MP3 encoding through a DSP running a decoding algorithm, runs the decompressed bytes through a codec that includes a digital-to-analog converter to convert the data into sound waves, and amplifies the analog signal so the song can be heard.
All portable MP3 players are battery-powered. Most branded players use a rechargeable internal lithium battery that lasts a number of hours on a single charge. Charging a lithium-ion battery via the USB port is now a commonly supported feature of most consumer-grade portable devices, including media players.
Hardware
Application Software
Real time Operating System (RTOS)
Embedded Hardware:
Power Management: This includes the power supply and additional control to support a variety of power modes, some including power gating, to offer a number of operating modes and thus optimize power consumption for hand-held devices. A system may even choose to keep some peripherals, such as the Real Time Clock (RTC), running on batteries if the main supply is lost.
[Figure: generic embedded hardware: an embedded processor with interrupt system, control logic, peripherals, RAM and ROM.]
Embedded Memory: The memory unit in an embedded system should have
low access time and high density. Some of the embedded microcontrollers
include ROM as primary bootloader that is pre-programmed by the vendor.
The contents of ROM are non-volatile (power failure does not erase the
contents). All embedded microcontrollers include some sort of system
memory or RAM (volatile) to store transient input or output data.
Embedded systems generally do not possess secondary storage devices
such as magnetic disks. As programs of embedded systems are small there
is no need for virtual storage.
Peripherals and I/Os: Peripherals are the input and output devices connected
to the serial and parallel ports of the embedded system. Serial ports transfer
one bit at a time between the peripheral and the microcontroller or
microprocessor. Parallel ports transfer an entire word consisting of many
bits simultaneously between the peripheral and the microcontroller. A
microcontroller generally communicates with the peripherals using a
programmable interface device. Programmable interface devices provide
flexibility since they can be programmed to perform I/O on different
peripherals. The microcontroller monitors the inputs from peripherals and
performs actions when certain events occur. For instance, when sensors
indicate that the level of water in the wash tub of a washing machine is
above the preset level, the microprocessor starts the wash cycle.
Interrupt Controller: Due to the real-time nature of some embedded applications, an embedded system often requires low latency and fast response to an interrupt event. This can be an important consideration when selecting a microcontroller or microprocessor for an embedded device. Apart from the interrupt controller, the chip architecture and the way caches and RAMs are organized play a big role in achieving low-latency response.
Note: The features described in this section cover general trends and options for embedded system hardware; not all embedded system hardware will include all the options described above.
The RTOS provides features that simplify the programmer’s job. For
example, an RTOS provides semaphores that can be used by the
programmer to prevent multiple tasks from simultaneously writing into
shared memory.
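RTOS semaphore APIs vary by kernel, so the sketch below models the take/give pattern with a toy binary semaphore in plain C. A real RTOS take would block the calling task rather than return failure, and all names here are illustrative.

```c
/* Conceptual model of a binary semaphore guarding shared memory,
 * illustrating the mutual-exclusion pattern an RTOS provides. A real
 * RTOS take() would block the task; this toy version reports failure. */
typedef struct { int count; } semaphore_t;

int sem_take(semaphore_t *s)           /* returns 1 on success, 0 if busy */
{
    if (s->count == 0) return 0;
    s->count--;
    return 1;
}

void sem_give(semaphore_t *s) { s->count++; }

int shared_value;                      /* the "shared memory" being protected */

int task_write(semaphore_t *s, int v)  /* write shared memory under the lock */
{
    if (!sem_take(s)) return 0;        /* another task holds the semaphore */
    shared_value = v;                  /* critical section */
    sem_give(s);
    return 1;
}
```

Two tasks calling task_write() can then never interleave inside the critical section: the second caller is refused (or, on a real RTOS, suspended) until the semaphore is given back.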
[Figure: a microprocessor interfaced via the system bus to external read-only memory, read-write memory, serial interface, timers and I/O ports.]
As shown in the figure, all support devices such as read-only memory, read-write memory, serial interface, timers and I/O ports are external and interfaced to the microprocessor via the system bus. The system bus is composed of the address bus, data bus and control bus. The prime use of a microprocessor is to read data, perform extensive calculations on that data, and store the results in a mass-storage device or display them. Popular microprocessor examples include the 8085, 8086, Z80, 6800, Pentium, and Intel i3/i5/i7 processors.
Microcontroller
Generally in the embedded world, the term "MPU" ("microprocessor unit" or "microprocessor") is used for a system-on-chip that does not include Flash (Flash being external to the chip), while the term "MCU" ("microcontroller") is used for a system-on-chip that includes on-chip Flash.
Program Memory is used for permanently storing the program being executed, while Data Memory is used for temporarily storing and keeping intermediate results and variables.
Program Memory:
[Figure: program memory (Flash) divided into an application Flash section and a boot Flash section.]
Data Memory:
Data memory is volatile memory used to store variables during program execution; its contents are lost once power to the microcontroller is removed. Data memory often includes the following:
I/O Memory space contains addresses for CPU peripheral function, such
as Control registers, SPI, and other I/O functions.
Storing data in I/O and extended I/O memory is usually handled by the compiler only; users cannot use this memory space for storing their data.
Internal SRAM (Data Memory) is used for temporarily storing and keeping
intermediate results and variables.
So program memory and data memory play different roles in a program. Program memory must be non-volatile (often on-chip or off-chip Flash), retaining its contents even after power is turned off. In contrast, data memory loses its contents because it needs power to maintain the information stored in the chip.
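This split shows up directly in C source: a typical MCU toolchain places `const` objects in program memory (Flash) and variables in data memory (RAM). Exact placement depends on the linker script; the section names in the comments are the common GCC conventions, and the data values are made up for illustration.

```c
#include <stdint.h>

/* Sketch of how a typical MCU toolchain maps C objects onto program
 * and data memory. Actual placement depends on the linker script. */
const uint16_t cal_table[4] = {100, 200, 400, 800}; /* .rodata: stays in Flash */

uint16_t reading;            /* .bss: RAM, zeroed at startup */
uint16_t scale = 3;          /* .data: RAM, initial value copied from Flash */

uint16_t scaled_reading(uint8_t idx)
{
    reading = cal_table[idx & 3];   /* constant read from program memory */
    return reading * scale;         /* working values live in data memory */
}
```

Marking lookup tables `const` keeps them out of scarce RAM, which matters on small MCUs where RAM may be a fraction of the Flash size.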
When power is removed, the contents of the RAM-based part of the program storage memory are lost. Due to the temporary nature of these programs, they are referred to as software.
In the PC world, due to the small size of the BIOS, the major part of primary storage is DRAM used for program storage. In comparison, in an embedded system such as an electronic game or coffee machine, the complete program storage memory is implemented with either ROM or Flash devices.
NOTE: Engineers must have their own criteria in order to make the right selection.
This section discusses the general considerations and some guidelines to keep in mind
when selecting a microcontroller, serving as a basis for setting your own criteria.
To start the selection process, the designer must first ask, "What does the
MCU need to do in my system?" The answer to this one simple question dictates
the required MCU features for the system and, thus, is the controlling
agency in the selection process.
The second step is to conduct a search for MCUs which meet all of the
system requirements. This usually involves searching the literature -
primarily data books, data sheets, and technical trade journals but also
includes peer consultations. If the fit is good enough, a single-chip MCU
solution has been found; otherwise, a second search must be conducted to
find an MCU which best fits the requirements with a minimum of extra
circuitry, including considerations of cost and board space. Obviously, a
single-chip solution is preferred for cost as well as reliability reasons. Of
course, if there is a company policy dictating which MCU manufacturer to
use, this will narrow the search considerably. The last step has several parts,
all of which attempt to reduce the list of acceptable MCUs to a single
choice. These parts include pricing, availability, development tools,
manufacturer support, stability, and sole sourcing. The whole process may
need to be iterated several times to arrive at the optimum decision.
MCUs generally can be classified into 8-bit, 16-bit, and 32-bit groups based
upon the size of their arithmetic and index register(s), although some
designers argue that bus access size determines the 8-, 16-, 32-bit
architecture.
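Whatever the register width of the chosen MCU, firmware usually pins down its integer sizes explicitly so the same code ports between 8-, 16- and 32-bit parts. A small sketch (the register layout packed here is invented for illustration):

```c
#include <stdint.h>

/* Fixed-width types from <stdint.h> keep data layouts stable whether
 * the target is an 8-, 16- or 32-bit MCU. */
typedef struct {
    uint8_t  status;     /* one byte: matches an 8-bit status register */
    uint16_t adc_value;  /* 16-bit conversion result */
    uint32_t timestamp;  /* 32-bit tick counter */
} sample_t;

uint32_t pack_sample(const sample_t *s)
{
    /* pack status and ADC value into one 32-bit word (the layout is
     * illustrative, not a real device's register format) */
    return ((uint32_t)s->status << 24) | s->adc_value;
}
```

On an 8-bit MCU the compiler synthesizes the 32-bit shifts from byte operations; the source stays identical.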
another clock required in the system, for example, for serial baud rates. In
general, computational power, power consumption, and system cost
increase with higher clock frequencies. System costs increase with
frequency because not only does the MCU cost more, but so do all the
support chips required, such as RAMs, ROMs, PLDs (programmable logic
device), and bus drivers.
Memory Requirements:
The most common peripherals include timers, both real-time clocks and periodic interrupt timers. Be sure to consider the range and resolution of the timer as well as any sub-functions, such as timer compare and/or input capture lines. I/O includes serial communication ports, parallel ports (I/O lines), analog-to-digital (A/D) converters, digital-to-analog (D/A) converters, and liquid crystal display (LCD) drivers. Certainly, if one wants a microcontroller with built-in Ethernet, CAN, USB, or even multiple serial ports, many common choices will be eliminated. It is also convenient if output pins can supply reasonable amounts of current for driving LEDs or transistors directly; some chips have 5 mA or less drive capability.
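Whether a pin can drive an LED directly comes down to simple Ohm's-law arithmetic; the sketch below assumes typical values (a 3.3 V supply and a 2.0 V red-LED forward drop) that are illustrative, not from the book.

```c
/* Worked example: current drawn when driving an LED directly from an
 * MCU pin through a series resistor. I = (Vcc - Vf) / R. Supply and
 * forward-voltage values are typical assumptions. */
double led_current_ma(double vcc, double vf, double r_ohms)
{
    return (vcc - vf) / r_ohms * 1000.0;   /* result in milliamps */
}
```

With a 330-ohm resistor this gives roughly 3.9 mA, comfortably within a 5 mA pin drive limit; a 150-ohm resistor would exceed it.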
Some peripherals can be handy to have: UARTs, SPI or I2C controllers, PWM controllers, and EEPROM data memory are good examples, even though similar functionality can frequently be implemented in software or with external parts. Less common built-in resources are internal/external bus capability, a computer-operating-properly (COP) watchdog system, clock detection and selectable memory configurations.
Physical Packaging:
Some OEMs simply prefer a QFP package over BGA due to ease of mounting, soldering and fabrication cost. However, for applications that need a small form factor due to the physical geometry of the product, BGA may be a better solution. Similarly, for security-related applications, one may want a BGA package even at higher cost, because pins in a BGA package are not as easy to probe as in a QFP/DIP package, providing another layer of security. It is often a combination of application needs and cost that drives the choice of package.
Microcontroller Architecture:
In a “Harvard architecture”, the instruction memory and the data memory are
separate, controlled by different buses, and sometimes have different sizes.
For microcontrollers, the instructions are usually stored in "read only"
memory, and data is in RAM or registers.
A microcontroller may support a lot of fancy instructions that seem to do a lot in one instruction; however, the real measure should be the number of clock cycles it takes to accomplish the task at hand, not how many instructions were executed. A fair comparison is to code the same routine for each candidate and compare the total number of clock cycles executed and bytes used.
MCU Interrupts:
How many interrupt lines or levels are there versus how many
does your system require?
Is there an interrupt level mask?
Once an interrupt level is acknowledged, are there individual
vectors to the interrupt handler routines or must each possible
interrupt source be polled to determine the source of the
interrupt?
Hardware Tools:
Hardware tools (programmers of some sort) are required to load the program into the microcontroller; however, they vary widely in cost. It is pretty common for manufacturers to offer low-cost development tools, and it helps further if manufacturers support third-party tools, allowing more options for the development community.
Software Tools:
Most of the microcontrollers have some level of standard tools (at least an
assembler) provided by the manufacturer. Most have "Integrated Development
Environments" (IDE) that allow integrated use of an editor with the
assembler, some compilers, and a simulator. Some have significant
additional support from the open source movement.
Literature Support:
Literature covers a wide selection of printed material which can assist in the
selection process. This includes items from the manufacturer, such as data
sheets, data books, and application notes, as well as items available at the
local book store and/or library. Book store and library items indicate not
only the popularity of the manufacturers and MCUs under consideration,
but they also offer unbiased opinions when written by non-manufacturer-
related authors.
As a final step in the selection process, the user should consider building a table listing each MCU under consideration on one axis and the important attributes on the other. The blanks should be filled in from the manufacturers' data sheets to obtain a fair side-by-side comparison. Some manufacturers have premade comparison sheets for their MCU product lines, which makes this task much easier, but as with all data sheets, be sure they are up to date with current production units.
[Figure: five competing parameters: size, cost, performance, power consumption and technology.]
Figure 1-11: Parameters that Control Embedded System Success
These design attributes typically compete with one another: improving one often degrades another. For example, if die size is reduced, thereby reducing features, the performance of the embedded system may suffer. One may choose to move to a smaller technology node to reduce die size and cost; however, that may increase leakage significantly enough to have an adverse impact on power consumption.
Performance:-
Systems used for many mission critical applications must be real-time, such
as for control of fly-by-wire aircraft, or anti-lock brakes on a vehicle, which
must produce maximum deceleration but intermittently stop braking to
prevent skidding [6]. Real-time processing fails if not completed within a
specified deadline relative to an event; deadlines must always be met,
regardless of system load.
Power Consumption:-
An embedded system typically includes no heat sink and must operate fanless, unlike a laptop or desktop PC. This increases the challenge of thinking beyond just active mode when optimizing the power consumption of an embedded system, which calls for offering several low-power modes.
Since switching between technology nodes adds huge NRE cost, volumes have to be significantly high to justify it. To reduce the per-unit cost of an embedded SoC, it is necessary to reduce the die size, either by restricting the feature set or by switching to a smaller technology node, which may be a natural transition once the technology is stable and the transition cost is justified. So there is always a fine balance between technology, die size and design cost when designing an SoC for an embedded application.
Interoperability:-
Reliability:-
The amount of software (and technology) in products is increasing
exponentially. However, software is far from error-free. Studies of the density of errors in actual code show that 1,000 lines of code typically contain 3 errors [8]. An incremental increase in code size increases the number of hidden errors in a system.
This may be application specific, but some embedded devices require high reliability and should be capable of working for long operating hours without failure. For example, a medical device used in critical care cannot afford a crash, which is common and often harmless in the case of personal computers.
Security:-
2. Handling Interrupts
2.1 Introduction
Interrupts are an essential feature of an embedded system. They enable the
software to respond, in a timely fashion, to internal and external hardware
events. By managing the interaction with external systems, effective use of
interrupts can dramatically improve system efficiency and the use of
processing resources. For example, the reception and transmission of bytes
via a UART is more efficient using interrupts rather than polling.
Offloading a task to a hardware module that reports back when finished
drastically improves performance.
Interrupts play an even more critical role in real-time systems, since events
have to be handled in real time, for example the synchronization of a video
input. This requires low latency and determinism, since the action needs to
be handled within a particular time frame. If an inordinate delay occurs, the
user will perceive the system as non-responsive.
2.2 Interrupts
An “interrupt” is an event triggered inside an embedded device, either by
internal or external hardware, that initiates an automatic transfer of software
execution to an interrupt service routine (ISR). On completion of the ISR,
software execution returns to the next instruction that would have executed
had the interrupt not occurred. The behavior is shown in Figure 2-1.
[Figure 2-1: Interrupt behavior — the CPU is busy with a task when hardware
needs service; the CPU saves its execution state, the ISR provides the service
while the hardware waits, then the CPU restores its execution state and the
hardware proceeds with the task.]
There are two ways for software to interact with hardware events:
"interrupts" and "polling". The difference between the two is best summed
up as:
Generally, in the time it takes to get information from your average device,
the CPU could be off doing something far more useful than waiting for a
busy but slow device. So, to keep from having to busy-wait all the time,
interrupts are provided that can interrupt whatever is happening so that
the operating system can do some task and return to what it was doing
without losing information. In an ideal world, all devices would probably
work by using interrupts. However, on a PC or clone, there are only a few
interrupts available for use by peripherals, so some drivers have to poll
the hardware: ask the hardware whether it is ready to transfer data yet. This
unfortunately wastes time, but it sometimes needs to be done.
[Figure: Interrupt classification — interrupts divide into synchronous and
asynchronous interrupts; asynchronous interrupts further divide into
maskable and non-maskable interrupts.]
Maskable Interrupts: Maskable interrupts are the ones that can be blocked
by various masking techniques in the hardware.
can inspect the machine's memory, and examine the internal state of the
program at the instant of its interruption.
[Figure: Interrupt masking logic — the Non-Maskable Interrupt (NMI) goes
straight to an OR gate driving the CPU's interrupt request, while the
Maskable Interrupt (INTR) is gated by an AND with the output of a clocked
enable/disable flip-flop controlled by the Enable INTR / Disable INTR
logic.]
Vectored interrupts require that the interrupting device supply the CPU
with the starting address, or transfer vector, of the ISR, while "non-vectored
interrupts" have a fixed start address for their ISRs.
[Figure: Shared interrupt request — multiple peripherals on the system
interconnect share a single IRQ line to the CPU.]
When any of the peripherals places a request, the IRQ line to the CPU gets
asserted, triggering a request. However, this does not let the CPU identify
the actual source of the interrupt among Peripheral 1, Peripheral 2 and
Peripheral 3, so within the ISR the CPU proceeds to check the service
request flags (denoted by "INTF") of all the peripherals to find out who
placed the request. Each peripheral has an open-drain request line tied to
the processor IRQ line. The activation of an interrupt request by a
particular peripheral sets its interrupt flag INTF. If the peripheral interrupt
is enabled (denoted by INTE), its IRQ will be asserted, placing a request to
the CPU. The interrupt takes place by loading the PC with a pre-defined
address where the ISR is stored. The ISR address in a non-vectored system
is usually fixed to a certain location in program memory, where the ISR
must be stored in order to be executed. When any peripheral places a
request, a single ISR is executed, and code within the ISR polls the
interrupt flag INTF of each peripheral to determine who placed the request.
The absence of a hardware mechanism that allows the CPU to automatically
identify who placed the service request is what gives the method the name
"non-vectored" [10]. Some of the processors that support non-vectored
interrupts include the 8085, 6802, PowerPC, MIPS and MSP430.
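The flag-polling scheme just described can be sketched in C. The INTF/INTE names follow the text, but the peripheral structure, counters and register layout are illustrative assumptions, not any real device's memory map.

```c
#include <stdint.h>

#define NUM_PERIPHERALS 3

/* Modeled per-peripheral interrupt control bits (assumed layout). */
typedef struct {
    uint8_t inte;   /* interrupt enable (INTE) */
    uint8_t intf;   /* interrupt flag (INTF): set when service is requested */
} periph_t;

static periph_t periph[NUM_PERIPHERALS];
static int serviced[NUM_PERIPHERALS];   /* stand-in for real servicing work */

/* The single ISR at the fixed (non-vectored) address: poll every INTF bit
   to discover which peripheral(s) asserted the shared IRQ line. */
void shared_isr(void)
{
    for (int i = 0; i < NUM_PERIPHERALS; i++) {
        if (periph[i].inte && periph[i].intf) {
            serviced[i]++;        /* service the request ...              */
            periph[i].intf = 0;   /* ... then clear the flag, so the      */
        }                         /* open-drain IRQ line is released      */
    }
}
```

Clearing INTF inside the ISR is what releases the shared IRQ line; forgetting it would retrigger the ISR forever.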
[Figure: An interrupt register and an interrupt mask register, one bit per
interrupt source (bits 0 to N-1).]
For the MC68000 and 8086 the vector table is in a fixed position in the
memory map [11]. In the Z8000 and Z80, however, the position of the table
is relative to the contents of an internal register. An 8-bit vector allows for
256 entries in the vector table. The 8086 predefines or reserves 32 of these
and care must be taken to avoid generating these vectors externally. For
MC68000, Z8000 and Z80, the full range is available for user definition
[11].
[Figure content: a 16-bit Program Counter (0000 0000 0000 xx00) indexes a
vector table in main memory; the entries at addresses 0x0000, 0x0004,
0x0008 and 0x000C hold jumps to ISR addresses 0x01C0, 0x1010, 0x0040 and
0xF000 respectively, with ISR0 starting at 0x01C0.]
Figure 2-6: A vectored interrupt scheme with interrupt vector description table
Most common processors include a vector table that defines the start
address of each interrupt service routine.
[Figure: Memory map — the ISRs reside in FLASH memory; the Interrupt
Vector Table (IVT) holds the vectors (e.g. interrupt vectors #120 to #124 at
0x00F8 to 0x00FE) that are loaded into the Program Counter (PC); on-chip
RAM and peripherals occupy the rest of the map, up to 0xFFFF.]
NOTE: The vector table is at a fixed location (defined by the processor data sheet), but
the ISRs can be located anywhere in memory.
Figure 2-8 and Figure 2-9 show the Interrupt Vector Table (IVT) and the
interrupt vectors (only the first 32 interrupt vectors are shown) for
Microchip dsPIC33F Digital Signal Controllers [12].
1 Microchip dsPIC33F Interrupt Vector Table reprinted with permission of the copyright
owner, Microchip Technology Incorporated. All rights reserved. No further reprints or
reproductions may be made without Microchip Technology Inc.’s prior written consent.
One may also notice that the dsPIC33F includes an Alternate Interrupt
Vector Table (AIVT) in the memory map, providing a means to switch
between an application and a support environment without requiring the
interrupt vectors to be reprogrammed. Based on the ALTIVT bit in the
interrupt control register, all interrupt and exception processes use either
the default vectors or the alternate vectors.
Also note that for the dsPIC33F, the interrupt controller is not involved in
the reset process. The dsPIC33F device clears its registers in response to a
Reset, which forces the Program Counter (PC) to zero. The processor then starts
The Kinetis KL25 is based on the ARM Cortex-M0+, which includes a
Nested Vectored Interrupt Controller (NVIC). On the ARMv6-M based
architecture, the NVIC supports up to 32 external interrupts with 4
different priority levels. This is explained in more detail in Section 2.8.1.
The use of an NVIC in the microcontroller profiles means that the vector
table is very different from that of other ARM processors: it consists of
addresses, not instructions. The initial stack pointer and the address of the
reset handler must be located at 0x0 and 0x4 respectively. These addresses
are loaded into the SP and PC registers by the processor at reset.
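The reset behavior described above can be sketched as a small host-side simulation. The table layout (entry 0 = initial stack pointer, entry 1 = reset handler address) follows the text; the SP value and handler bodies are assumptions for illustration, and on real hardware the table would sit at address 0x0.

```c
#include <stdint.h>

typedef void (*handler_t)(void);

static int reset_entered = 0;
static void reset_handler(void)   { reset_entered = 1; }
static void default_handler(void) { }

#define INITIAL_SP ((uintptr_t)0x20001000u)  /* assumed top-of-RAM value */

/* Cortex-M style vector table: addresses, not instructions.
   (uintptr_t stands in for the 32-bit words used on hardware.) */
static const uintptr_t vector_table[] = {
    INITIAL_SP,                    /* offset 0x00: initial stack pointer */
    (uintptr_t)reset_handler,      /* offset 0x04: reset handler         */
    (uintptr_t)default_handler,    /* offset 0x08: NMI                   */
    (uintptr_t)default_handler,    /* offset 0x0C: HardFault             */
};

/* Simulate what the core does at reset: load SP from [0x0], PC from [0x4]. */
uintptr_t simulate_reset(void)
{
    uintptr_t sp = vector_table[0];
    ((handler_t)vector_table[1])();    /* "jump" to the reset handler */
    return sp;
}
```

Because the entries are plain addresses, the handlers can be ordinary C functions, which is exactly what makes the NVIC model convenient to program against.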
[Figure: Basic interrupt-handling flow — Start, execute instruction, check
for a pending interrupt; if none, continue execution; if an interrupt is
pending, disable interrupts and service it.]
STEP 1: The CPU executes its main program, which includes a series of
instructions. While the CPU executes the instruction at 100, P1 receives
input data in the register at address 0x2000 and asserts "Int" to request
servicing by the CPU.
[Figure: The CPU, program memory and data memory sit on the system bus,
with peripherals P1 and P2 driving the "Int" line. The ISR occupies program
addresses 10-13: 10: MOV R0, 0x2000; 11: Modify R0; 12: MOV 0x2001, R0;
13: RETI (ISR return). The main program's next instruction is at 101. In the
second panel the CPU has loaded the PC with the fixed ISR start address and
begins executing the ISR.]
STEP 3: The ISR reads data from 0x2000, modifies the data and writes the
resulting data in 0x2001 (as shown). Once the data from P1 is read, it
deasserts “Int”.
STEP 4: The ISR ends with a return (RETI) instruction, thereby restoring
the PC to 100 + 1 = 101, where the CPU resumes executing the next
instruction.
STEP 1: The CPU executes its main program, which includes a series of
instructions. While the CPU executes the instruction at 100, P1 receives
input data in the register at address 0x2000 and asserts "Int" to request
servicing by the CPU.
[Figure: Vectored version of the same system — an Interrupt Vector Table
(IVT) in program memory maps vector 00 to address 08, vector 01 to address
10 and vector 02 to address 14, ahead of the same ISR at addresses 10-13 and
the main program continuing at 101. In the second panel the CPU fetches the
vector "10" for P1 from the IVT and loads it into the PC.]
STEP 3: The ISR reads data from 0x2000, modifies the data and writes the
resulting data in 0x2001 (as shown). Once the data from P1 is read, it
deasserts “Int”.
STEP 4: The ISR ends with a return (RETI) instruction, thereby restoring
the PC to 100 + 1 = 101, where the CPU resumes executing the next
instruction.
The term interrupt latency refers to the number of clock cycles required for
a processor to respond to an interrupt request. It is typically measured as
the number of clock cycles between the assertion of the interrupt request
and the cycle in which the first instruction of the interrupt handler is
executed [13].
[Figure: Interrupt latency timing diagram — the processor clock over cycles
1 to 11, showing the IRQ assertion relative to the fetch and decode of
instruction X.]
In many cases, when the clock frequency of the system is known, the
interrupt latency can also be expressed in terms of time delay, for example,
in µsec.
For generic processors, the exact interrupt latency depends on what the
processor is executing at the time the interrupt occurs. For example, in
many processor architectures, the processor starts to respond to an
interrupt request only when the current executing instruction completes,
which can add a number of extra clock cycles. Therefore the maximum
latency from interrupt request to completion of the hardware response
consists of the execution time of the slowest instruction plus the time
required to complete the memory transfers required by the hardware
response.
As a result, the interrupt latency can have a best-case and a worst-case
value. This variation results in jitter in the interrupt response, which can
be problematic in certain applications such as audio processing (introducing
signal distortion) and motor control (resulting in harmonics or vibrations)
[13].
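The best-case/worst-case reasoning above can be captured in a back-of-envelope model: a fixed entry sequence plus, in the worst case, the remaining cycles of the slowest uninterruptible instruction. The cycle counts used below are illustrative assumptions, not vendor figures.

```c
/* Simple latency model: an interrupt arriving just after a slow instruction
   has started must wait for its remaining (slowest - 1) cycles before the
   fixed entry sequence can begin. */
typedef struct {
    int entry_cycles;          /* fixed interrupt entry sequence length */
    int slowest_instr_cycles;  /* longest uninterruptible instruction   */
} cpu_model_t;

int best_case_latency(const cpu_model_t *c)
{
    return c->entry_cycles;    /* no instruction pending: entry only */
}

int worst_case_latency(const cpu_model_t *c)
{
    /* slow instruction just started: wait out its remaining cycles */
    return c->entry_cycles + c->slowest_instr_cycles - 1;
}

int jitter(const cpu_model_t *c)
{
    return worst_case_latency(c) - best_case_latency(c);
}
```

The jitter value is exactly the spread the text warns about for audio and motor-control applications.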
[Figure: Example ISR flowchart — when the RTIF flag is set, the ISR drives
OUT0 = 1, samples Ack and writes it back (Ack = 1 or Ack = 0), toggles
OUT1, increments Count and returns with RTI; the main program drives
OUT0 = 0.]
Consider a common case of an application that uses a serial communication
interface (UART). The UART hardware receives characters at an
asynchronous rate. In order to avoid loss of data in periods of high activity,
characters need to be stored in a FIFO buffer. The main program can then
process characters at a rate independent of the rate at which they arrive. It
must process the characters at an average rate faster than the average rate
at which they can arrive, otherwise the FIFO buffer will become full and
data will be lost. In other words, the buffer allows the input data to arrive
in bursts, and the main program can access the data when it is ready.
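A minimal ring-buffer sketch of this scheme follows, with fifo_put() called from the receive ISR (producer) and fifo_get() from the main program (consumer). The buffer size and names are our own; a production driver would also have to consider interrupt masking around any state shared beyond this single-producer/single-consumer split.

```c
#include <stdint.h>

#define FIFO_SIZE 64u   /* power of two, so free-running indices wrap cleanly */

typedef struct {
    uint8_t buf[FIFO_SIZE];
    volatile uint32_t head;  /* written only by the ISR (producer)   */
    volatile uint32_t tail;  /* written only by main (consumer)      */
} fifo_t;

/* ISR side: store one received character; -1 if the FIFO is full. */
int fifo_put(fifo_t *f, uint8_t c)
{
    if (f->head - f->tail == FIFO_SIZE)
        return -1;                       /* full: this byte would be lost */
    f->buf[f->head % FIFO_SIZE] = c;
    f->head++;
    return 0;
}

/* Main-program side: fetch one character; -1 if the FIFO is empty. */
int fifo_get(fifo_t *f, uint8_t *c)
{
    if (f->head == f->tail)
        return -1;                       /* empty */
    *c = f->buf[f->tail % FIFO_SIZE];
    f->tail++;
    return 0;
}
```

Keeping head writable only by the ISR and tail only by main is what lets this work without locking on a single-core microcontroller.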
[Figure: FIFO routines — FIFO_Put (called from the ISR): read data from
the input; if the FIFO is full, flag an error; return from interrupt (RTI).
FIFO_Get (called from the main program): if the FIFO is empty, flag an
error; otherwise return the data to the caller.]
The main program (the background thread in this case) checks the output
busy flag every time it writes data to the buffer. If the device is busy, then
a device-ready interrupt is expected and nothing needs to be done;
otherwise the background thread arms the output and calls SendData to
"kick start" the output process.
The SendData routine is responsible for retrieving the data from the buffer
and outputting it. If there is no more data in the buffer, it must disarm
the output to prevent further interrupts.
[Figure content: In the background thread, FIFO_Put checks whether the
FIFO is full (error if so), then checks the output-device busy flag; if the
device is not busy it arms the output and calls SendData (the "kick start")
before returning. In the foreground ISR, FIFO_Get checks whether the FIFO
is empty; if it is, the ISR disarms the output, otherwise it outputs the data
and returns from interrupt (rti).]
Figure 2-22: Kick Start an interrupt-driven output routine for a device that requests
interrupts on transitioning from busy to ready
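The kick-start pattern of Figure 2-22 can be sketched as follows. The device flags are modeled as plain variables and all names (write_byte, send_data) are illustrative stand-ins for SendData and the real driver, not an actual API.

```c
/* Modeled device state (on hardware these would be registers/flags). */
static int output_busy  = 0;  /* device currently transmitting?        */
static int output_armed = 0;  /* device-ready interrupt enabled?       */
static int buffered     = 0;  /* stand-in for the FIFO fill level      */
static int sent_count   = 0;

/* Foreground (ISR) side: move one byte from the buffer to the device. */
void send_data(void)
{
    if (buffered == 0) {
        output_armed = 0;     /* nothing left: disarm to stop interrupts */
        return;
    }
    buffered--;
    sent_count++;
    output_busy = 1;          /* device is now busy transmitting */
}

/* Background side: queue a byte; kick-start if the device sits idle. */
void write_byte(void)
{
    buffered++;
    if (!output_busy) {       /* idle: no ready-interrupt will ever come, */
        output_armed = 1;     /* so arm the output and start the first    */
        send_data();          /* transfer by hand: the "kick start"       */
    }
    /* if busy, the device-ready ISR will call send_data() later */
}
```

The subtlety the figure captures is the idle case: without the manual first call there is no busy-to-ready transition, hence no interrupt, and the data would sit in the buffer forever.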
One can also define device latency as the response time of the external I/O
device. For example, if we request that a certain sector be read from a disk,
then the device latency is the time it takes to find the correct track and spin
the disk (seek) so that the proper sector is positioned under the read head.
For an output device, the interface latency is the time between when the
output device becomes idle and when the software writes new data. A
real-time system is one that can guarantee a worst-case interface latency.
For the Cortex-M0 and Cortex-M0+ processors, the NVIC design supports
up to 32 interrupt inputs plus a number of built-in system exceptions
[13] (Figure 2-23). For each interrupt input, there are 4 programmable
priority levels (Figure 2-24). Higher-end Cortex-M processors support a
larger number of interrupt inputs. In practice, the number of interrupt
inputs and the number of priority levels are likely to be driven by the
application requirements, and defined by silicon designers based on the
needs of the chip design.
For a zero-wait-state memory system, the following table shows the latency,
in number of clock cycles, from the time the interrupt request is asserted to
the time the first instruction of the interrupt handler is ready to be
executed.
To make the Cortex-M devices easy to use and program, and to support
the automatic handling of nested exceptions or interrupts, the interrupt
response sequence includes a number of stack push operations. This
enables all of the interrupt handlers to be written as normal C subroutines,
and enables the ISR to start real work immediately without the need to
spend time on saving current context.
Considering only the processor interrupt latency may not give the overall
interrupt response time. One must also consider the software overhead of
handling the interrupts (such as stacking registers, switching register banks,
checking the actual interrupt source if the interrupt is shared, and other
miscellaneous tasks).
As with any program code, ISRs take time to execute. The faster the
performance of the processor, the quicker the interrupt request is serviced,
and the longer the system can stay in sleep mode, thus reducing power
consumption. When measured from the time an interrupt request is
asserted to the time the interrupt processing is actually completed, the
Cortex-M processors claim to do better than other microcontrollers due to
lower software overhead.
In traditional 8-bit/16-bit systems, the run time for ISRs can be many more
cycles than with Cortex-M based microcontrollers because of lower
performance. When combined with the higher maximum clock speed of
many Cortex-M based microcontrollers, the maximum interrupt processing
capacity can be much higher than other microcontroller products [13].
The jitter of the interrupt response time refers to the variation (or value
range) of the interrupt latency in cycles. In many systems, the interrupt
latency depends on what the CPU is doing when the interrupt takes place.
For example, in an architecture like the 8051, if the processor is executing
a multi-cycle instruction, the interrupt entry sequence cannot start until the
instruction is finished, which can be a few cycles later. This results in a
variation in the number of interrupt latency cycles, commonly referred to
as jitter.
In some embedded processors targeted at real-time operating systems (like
the Cortex-M processors), if a multi-cycle instruction is being executed
when an interrupt arrives, in most cases the instruction is abandoned and
restarted when the ISR completes. On the ARM Cortex-M3 [13] processor,
if the interrupt request is received during a multiple load/store (memory
access) instruction, the current state of the multiple transfer is automatically
stored as part of the PSR (Program Status Register); when the ISR
completes, the multiple transfer resumes from where it stalled, using the
saved information in the PSR. This mechanism provides high-performance
processing while maintaining low jitter in the interrupt response time.
The ARM Cortex-M3 [13] also includes "tail chaining", a technique that
allows the processor to switch to a pending ISR after the current ISR
completes, skipping some of the unstacking and stacking operations that
would normally be needed (see Figure 2-27). This also makes the processor
much more energy efficient by avoiding unnecessary memory accesses.
[Figure 2-27: Tail chaining — IRQ2 arrives while the IRQ1 handler is
running; instead of a full interrupt exit and re-entry, the processor
tail-chains directly from the IRQ1 handler into the IRQ2 handler.]
3. Memory Addressing
3.1 Introduction
Many types of memory devices are used in embedded systems. Architects
and designers focused on embedded systems must be aware of the
differences between them and understand how to use each type effectively.
This chapter covers memories in the context of embedded systems. The
first few sections cover memory technologies and memory classification
based on various characteristics. Later sections focus on building a memory
system out of memory devices and combinational components for an
efficient memory system design.
Accessibility
Persistence of Storage
Storage Density & Cost
Storage Media
Power Consumption
Accessibility
Persistence of Storage
Storage Cells
Storage density (the number of bits which can be stored per unit area) is
generally a good measure of cost. Dense memories (like SDRAM) are
much cheaper than their counterparts (like SRAM).
Power Consumption
RAM
RAM stands for Random Access Memory. RAMs are simplest and most
common form of volatile data storage. The number of words which can
be stored in a RAM are proportional (exponential of two) to the number
of address buses available. This severely restricts the storage capacity of
RAMs (A 32 GB RAM will require 36 Address lines) because designing
circuit boards with more signal lines directly adds to the complexity and
cost.
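The capacity-to-address-line relationship can be checked with a small helper: the number of lines is the smallest n with 2^n at or above the word count. The function is our own illustration of the power-of-two rule.

```c
/* Smallest n such that 2^n >= words, i.e. the number of address lines
   needed to address `words` distinct locations. */
int address_lines(unsigned long long words)
{
    int n = 0;
    while ((1ULL << n) < words)
        n++;
    return n;
}
```

For example, 1K words need 10 lines, and a byte-addressed 32 GB memory (2^35 bytes) needs 35.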
DPRAMs are static RAMs with two I/O ports. The two ports access the
same memory locations; hence DPRAMs are generally used to implement
shared memories in dual-processor systems. The operations performed
on a single port are identical to those on any RAM. There are some common
problems associated with the usage of DPRAM:
(a) Possible data corruption when both ports try to access the same
memory location. Most DPRAM devices provide interlocked memory
accesses to avoid this problem.
(b) Data coherency when a cache scheme is used by a processor accessing
the DPRAM. This happens because any data modifications (in the DPRAM)
by one processor are unknown to the cache controller of the other
processor. In order to avoid such issues, shared memories are not mapped
Dynamic RAM
Dynamic RAMs use a different storage technique for data storage. A Static
RAM has four transistors per memory cell, whereas Dynamic RAMs have
only one transistor per memory cell. The DRAMs use capacitive storage.
Since the capacitor can lose charge, these memories need to be refreshed
periodically making DRAMs more complex (due to additional control) and
power consuming. However, DRAMs have a very high storage density (as
compared to static RAMs) and are much cheaper in cost. DRAMs are
generally accessed in terms of rows, columns and pages which significantly
reduces the number of address buses (another advantage over RAM).
Generally SDRAM controller (which manages different SDRAM
commands and Address translation) is required to access a SDRAM. Most
of the modern processors come with an on-chip SDRAM controller.
Flash (NOR)
NAND FLASH
These memories are denser and cheaper than NOR Flash. However these
memories are block accessible, and cannot be used for code execution.
These devices are mostly used for Data Storage (being generally cheaper
than NOR flash). However some systems use them for storing the boot
codes (these can be used with external hardware or with built-in NAND
boot logic in the processor).
SD-MMC
Hard Disc
Hard Discs are Optical Memory devices. These devices are bulky and they
require another bulky hardware (disk reader) for reading these memories.
These memories are generally used for Mass storage. Hence they memories
do not exist in smaller and portable systems. However these memories are
being used in embedded systems which require bulk storage without any
size constraint.
Memory
The RAM family includes two important memory devices: static RAM
(SRAM) and dynamic RAM (DRAM). The primary difference between
them is the lifetime of the data they store. SRAM retains its contents as long
as electrical power is applied to the chip. If the power is turned off or lost
temporarily, its contents are lost forever. DRAM, on the other hand, has an
extremely short data lifetime, typically a few milliseconds. This is true even
when power is applied constantly.
In short, SRAM has all the properties you associate with the word RAM.
Compared to that, DRAM seems kind of useless. By itself, it is. However,
a simple piece of hardware called a DRAM controller can be used to make
DRAM behave more like SRAM. The job of the DRAM controller is to
periodically refresh the data stored in the DRAM. By refreshing the data
before it expires, the contents of memory can be kept alive for as long as
they are needed.
When deciding which type of RAM to use, a system designer must consider
access time and cost. SRAM devices offer extremely fast access times but
are much more expensive to produce. Generally, SRAM is used only where
access speed is extremely important. A lower cost-per-byte makes DRAM
attractive whenever large amounts of RAM are required. Many embedded
systems include both types: a small block of SRAM (a few kilobytes) along
a critical data path and a much larger block of DRAM (perhaps even
Megabytes) for everything else.
One step up from the masked ROM is the PROM (programmable ROM),
which comes in an unprogrammed state. Data in PROM in an
unprogrammed state is made up entirely of 1's. The process of writing data
to the PROM involves a special piece of equipment called a device
programmer. The device programmer writes data to the device one word
at a time by applying an electrical charge to the input pins of the chip. Once
a PROM has been programmed in this way, its contents can never be
changed. If the code or data stored in the PROM must be changed, the
current device must be discarded. As a result, PROMs are also known as
one-time programmable (OTP) devices.
As memory technology has matured in recent years, the line between RAM
and ROM has blurred. Now, several types of memory combine features of
both. These devices do not belong to either group and can be collectively
referred to as hybrid memory devices. Hybrid memories can be read and
written as desired, like RAM, but maintain their contents without electrical
power, just like ROM. Two of the hybrid devices, EEPROM and flash, are
descendants of ROM devices. These are typically used to store code. The
third hybrid, NVRAM, is a modified version of SRAM. NVRAM usually
holds persistent data.
Flash memory combines the best features of the memory devices described
thus far. Flash memory devices are high density, low cost, nonvolatile, fast
(to read, but not to write), and electrically reprogrammable. Thus Flash
offers significant advantages and, as a direct result, the use of flash memory
has increased dramatically in embedded systems. From a software
viewpoint, flash and EEPROM technologies are very similar. The major
difference being that flash devices can only be erased one sector at a time,
rather than byte-by-byte. Typical sector sizes are in the range 256 bytes to
16KB. Despite this disadvantage, flash is much more popular than
EEPROM and is rapidly displacing many of the ROM devices as well.
The third member of the hybrid memory class is NVRAM (non-volatile
RAM). Non-volatility is also a characteristic of the ROM and hybrid
memories discussed previously. However, an NVRAM is physically very
different from those devices. Logically, an NVRAM is just an SRAM with a
battery backup. When the power is turned on, the NVRAM operates just
like any other SRAM. When the power is turned off, the NVRAM draws
just enough power from the battery to retain its data. NVRAM is fairly
common in embedded systems. However, it is more expensive than SRAM
because of the battery, so its applications are typically limited to the storage
of a few hundred bytes of system-critical information that cannot be stored
in any better way. (Table note: 1 = only once, using a device programmer.)
Table 3-1 summarizes the features of each type of memory discussed in this
section.
NOTE: Different memory types serve different purposes, each with its own
strengths and weaknesses; a side-by-side comparison is not always effective.
Let's consider a 1 Mbit memory. To access any cell in this memory we must
be able to identify each cell. The simplest approach would be to number all
of the cells and provide the integer number of the cell one wishes to access
for read or write. That number becomes the memory address of the cell,
and providing this number is referred to as addressing a cell. However,
the memory cells (which store the individual bits) are not arranged in a
linear array that could be addressed in such a simple manner. The single-bit
cells are arranged in a two-dimensional array of 1024 x 1024 cells. All of
the cells in a row share the same word line, or select signal. Thus, when a
word line is asserted, all of the cells in the row drive their bit values onto
the bit lines. Similarly, all of the cells in a column share the same bit line.
Therefore, at any given time only one cell in a column may place a value on
this shared bit line. Addressing a row or column now requires only 10 bits
(2^10 = 1024). While strictly speaking a cell holds a bit value, we will use
the terms bit and cell interchangeably in this chapter.
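The row/column addressing just described amounts to splitting a 20-bit address into two 10-bit fields; the field widths below match the 1024 x 1024 array in the text, and the helper itself is our own illustration.

```c
#include <stdint.h>

typedef struct {
    uint32_t row;   /* selects one of 1024 word lines      */
    uint32_t col;   /* selects one bit from the row buffer */
} rc_t;

/* Split a 20-bit cell address: upper 10 bits = row, lower 10 bits = column. */
rc_t split_address(uint32_t addr20)
{
    rc_t rc;
    rc.row = (addr20 >> 10) & 0x3FF;  /* most significant 10 bits  */
    rc.col =  addr20        & 0x3FF;  /* least significant 10 bits */
    return rc;
}
```

This is the same decomposition the row and column decoders perform in hardware.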
[Figure 3-2: A 1024 x 1024 cell array — a row decoder drives the word lines,
the cells drive the bit lines into a row buffer, a column decoder selects one
bit for the I/O pin D, and the CS and RW signals control the access.]
For the same scenario, when a 20-bit address is provided we may see the
following sequence of events. The most significant 10 bits of the address
are used as a row address to a 10-bit row decoder. Each of the 1024 outputs
of the row decoder is connected to one of the 1024 word lines in the array.
The cells in the selected row drive their bit values onto the corresponding
1024 bit lines. The 1024 bit values are latched into a row buffer that can be
graphically thought of as residing at the bottom of the array in Figure 3-2.
The 10-bit column address is used to select one of the 1024 bits from this
row buffer and provide the value to the chip output. The access of data
from the chip is controlled by two signals. The first is the chip select (CS)
which enables the Memory device. The second is a signal that controls
whether data is being written to the array (RW=0) or whether data is being
read from the array (RW=1). The combination of CS and RW control the
tristate devices on the data signals D. The organization shown in Figure 3-2
uses bidirectional signals for data into and out of the memory device rather
than having separate input data signals and output data signals.
Now consider four identical memory planes like the one shown in
Figure 3-2, each operating concurrently on the same 20-bit address. The
result is 4 bits for every access, providing a 4 Mbit memory device.
Example 1:
Consider a 1M x 8 memory system design using 1M x 4 memory devices.
The total number of bits to be provided by the memory system is 8 Mbits.
Each available 1M x 4 memory device provides 4 Mbits, and thus the
memory system requires two memory devices, each providing 4 bits at each
address.
[Figure 3-3: Two 1M x 4 devices share the address lines A19:A0, chip select
CS and read/write control RW; one device drives data bits D3:D0 and the
other drives D7:D4.]
One memory device provides the least significant 4 bits at each address and
the second chip provides the most significant 4 bits at the same address.
The 20-bit address, chip select and read/write control go to both memory
devices.
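Example 1's arrangement can be modeled with two arrays standing in for the chips, one supplying the low nibble and one the high nibble at the shared address. The helper names are ours, purely for illustration.

```c
#include <stdint.h>

#define WORDS (1u << 20)   /* 1M addresses, shared by both devices */

static uint8_t dev_low[WORDS];   /* 1M x 4 device driving D3:D0 */
static uint8_t dev_high[WORDS];  /* 1M x 4 device driving D7:D4 */

/* A write presents the same address to both chips; each latches its nibble. */
void write8(uint32_t addr, uint8_t data)
{
    dev_low[addr]  = data & 0x0F;          /* low nibble to device 1  */
    dev_high[addr] = (data >> 4) & 0x0F;   /* high nibble to device 2 */
}

/* A read recombines the two nibbles onto the 8-bit data bus. */
uint8_t read8(uint32_t addr)
{
    return (uint8_t)((dev_high[addr] << 4) | dev_low[addr]);
}
```

Widening the data path this way leaves the address count unchanged, which is exactly the contrast with Example 2 below.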
Example 2:
Consider the same example, but for a memory system with 2M addresses
and a four-bit output at each address, using the same 1M x 4 memory
devices. The total number of bits in the memory system is still 8 Mbits, as
in Example 1. As shown in Figure 3-4, the memory system is organized such
that each memory device provides 4-bit data for 1M of the addresses.
[Figure 3-4: Two 1M x 4 devices (MD1 and MD2) share A19:A0, the data
lines D3:D0 and RW; address bit A20 feeds a 1:2 decoder, enabled by MSEL,
whose outputs drive the two chip selects.]
Memory device MD1 services the first 1M addresses, from 0 to 2^20 - 1.
The second memory device, MD2, provides the data at the remaining
addresses, from 2^20 to 2^21 - 1. However, this arrangement requires
additional control to ensure that only one of the two memory devices is
enabled, depending on the value of the address. This can be achieved by
feeding the most significant bit of the address to a 1:2 decoder. The most
significant address bit determines which of the two halves of the address
range is being accessed. The outputs of the decoder are connected to the
individual memory chip select signals to enable the corresponding memory
device. The remaining address lines and the read/write control signal are
connected to the corresponding inputs of each memory device. Note that
each memory device is provided with the same number of address bits,
20 in this case. The memory select signal MSEL is used as an enable for
the memory system.
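Example 2's decoding can be sketched in C. The names (MD1, MD2, MSEL) mirror the text, while the function itself is our own model of the 1:2 decoder and the shared low-order address lines.

```c
#include <stdint.h>

typedef struct {
    int cs_md1;         /* chip select for MD1 */
    int cs_md2;         /* chip select for MD2 */
    uint32_t dev_addr;  /* A19:A0, common to both devices */
} decode_t;

/* Decode a 21-bit system address: A20 picks the device, MSEL gates both. */
decode_t decode_2m(uint32_t addr21, int msel)
{
    decode_t d;
    int a20 = (addr21 >> 20) & 1;     /* most significant address bit      */
    d.cs_md1 = msel && (a20 == 0);    /* first 1M addresses: 0 .. 2^20-1   */
    d.cs_md2 = msel && (a20 == 1);    /* second 1M: 2^20 .. 2^21-1         */
    d.dev_addr = addr21 & 0xFFFFF;    /* low 20 bits go to both devices    */
    return d;
}
```

Note that exactly one chip select is active for any in-range address when MSEL is asserted, and neither when it is not.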
Example 3:
Let's consider the same example, but with a memory system with 4M
addresses and a four-bit output at each address, for a total of 16 Mbits. As
in the previous example, building the memory system with 1M x 4 memory
devices will require four memory devices.
[Figure 3-5: Four 1M x 4 devices (MA1 to MA4) share A19:A0, D3:D0 and
RW; address bits A21:A20 feed a 2:4 decoder, enabled by MSEL, that
generates the four chip selects.]
The two most significant bits of the address can be used to select one of
the memory devices. The remaining 20 bits of the address are provided to
each memory device, along with the common read/write control. A 2:4
decoder operating on the two most significant bits of the address selects
the memory device. As in the previous example, MSEL is used as a memory
system enable signal for the decoder, which in turn generates the chip select
signals. The memory organization is shown in Figure 3-5. Note that only
one memory device is active at a time; however, we may have memory
organizations where this is not necessarily the case, as in the next example.
Example 4:
Now let’s extend the previous example to build a memory system that can
provide 2M addresses with 8 bits at each address. Using the same 1M x 4
memory devices, one would need at least two memory devices to provide
an 8-bit output at any given address, and a total of four memory devices to
provide the necessary 16 Mbits (2M addresses x 8 bits). The memory devices
are organized in pairs as shown in Figure 3-6.
Figure 3-6: A 2M-address by 8-bit memory system built from four 1M x 4 devices organized in pairs: MD1 and MD2 together drive D7:D0 for one half of the address range, MD3 and MD4 for the other. A20 drives a 1:2 decoder, enabled by MSEL, that selects one pair at a time.
The first two memory devices (MD1 and MD2) provide four bits each for
the first 1M addresses. The second pair of memory devices (MD3 and
MD4) does so similarly for the second 1M addresses. A decoder uses the
most significant bit of the address to determine which of the two pairs of
memory devices will be selected for any specific address.
Memory Contents    Memory Address
0x00               0x10010000
0x11               0x10010001
0x22               0x10010002
0x33               0x10010003
0x44               0x10010004
0x55               0x10010005
0x66               0x10010006
0x77               0x10010007
0x88               0x10010008
0x99               0x10010009
0xAA               0x1001000a
The contents of each memory location in the figure is an 8-bit value
shown in hexadecimal notation. The address of each memory location is
shown adjacent to the location. Addresses are assumed to be 32-bit values
and are also shown in hexadecimal notation. This is just one example of
how data could be organized; there could be other models that return 16,
32 or 64 bits rather than 8 bits.
The first issue is how these words are stored. For example, consider the
need to store the 32-bit quantity 0x00112233 at address 0x10010000. The
four bytes occupy consecutive addresses:
0x00 0x10010000
0x11 0x10010001
0x22 0x10010002
0x33 0x10010003
If the word size is 32 bits, in a byte-addressed memory every fourth address
will be the start of a new word. Such addresses are referred to as word
boundaries. Alternatively, if the word size is 64 bits, each word will include
8 bytes; therefore every eighth byte will correspond to a word boundary.
In general one can think of 2^k byte boundaries, where 0 ≤ k ≤ n and n is the
number of bits in the address.
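As a quick sketch of this rule, the following C helper (illustrative, not from the book) tests whether an address falls on a word boundary by checking that the low k address bits are zero:

```c
#include <stdint.h>

/* An address lies on a boundary of a 2^k-byte word exactly when its low
 * k bits are zero. word_bytes must be a power of two: 4 for 32-bit
 * words, 8 for 64-bit words. */
static int is_word_boundary(uint32_t addr, unsigned word_bytes)
{
    return (addr & (word_bytes - 1u)) == 0;
}
```

For example, 0x10010000 is a 32-bit word boundary, while 0x10010002 is not.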
registers (as the first level for storing and retrieving information inside the
CPU), then a typical memory hierarchy starts with a small, expensive, and
relatively fast unit called the cache. The cache is followed in the hierarchy
by a larger, less expensive, and relatively slow main memory unit. Cache
and main memory are part of the System-on-Chip (SoC). They are followed in
the hierarchy by far larger, less expensive, and much slower external
memories, typically NOR/NAND flash. The objective behind designing a
memory hierarchy is to have a memory system that performs as if it consisted
entirely of the fastest unit, with its cost dominated by the cost of the
slowest unit.
[Figure: the memory hierarchy, from CPU registers through the Level 1 (L1) cache, main memory, and external memory, together with a table comparing speed and cost per bit at each level.]
NOTE: Data in the table above should only be taken as a relative comparison. The
numbers may not be accurate by the time the book is released. Also note that the
bandwidth numbers are de-rated to what is applicable to an embedded system, rather than
the maximum that can be achieved on modern computers or servers.
NOTE: On some Von Neumann machines the program can read from and write to
CPU registers, including the program counter. This can be dangerous, as you can point
the PC at memory blocks outside program memory space. Careless PC manipulation can
cause errors that require a hard reset [15].
Figure 3-10: Von Neumann Memory Map for the MC68705C8 [15]
Typically these chips have a register analogous to the program counter (PC)
which refers to addresses in program space. Also, some chips support the
use of any 16-bit value contained in data space as a pointer into the program
address space. These chips have special instructions to use these data
pointers.
NOTE: It is important to understand how a Harvard architecture part deals with data
in program space. It is possible to generate more efficient code using symbolic constants
declared with #define directives instead of declared constants. You may also create global
variables for constant values.
The following memory map is from the Microchip PIC16C74. Notice that
program memory is paged and data memory is banked. The stack is
implemented in hardware and the developer has no access to it [15].
accesses all the data as 32-bit words; the issue of endianness is not relevant.
However, if the software executes instructions that operate on 8- or 16-bit
data at a time, and the data needs to be mapped at specific memory addresses
(such as with memory-mapped I/O), then the issue of endianness will have
to be dealt with.
3.10.1 Definition
Endian Architecture   Byte 0     Byte 1   Byte 2   Byte 3
Big Endian            AA (MSB)   BB       CC       DD (LSB)
Little Endian         DD (LSB)   CC       BB       AA (MSB)
Table 3-3: Big Endian and Little Endian Byte Ordering
Note that the stored multi-byte data field is the same for both types of
endianness as long as the data is referenced in its native data type, i.e. as a
32-bit word. However, when the data is accessed as bytes or half-words, the
order of the sub-fields depends on the endian configuration of the system. If a
program stores the above value at location 0x100 as a word and then fetches
the data as individual bytes, two possible orders exist.
In the case of a little-endian system, the data bytes will have the order
depicted in Table 3-4.
Address Data
0x0100 DD
0x0101 CC
0x0102 BB
0x0103 AA
Table 3-4: Little Endian Addressing
Note that the rightmost byte of the word is the first byte in the memory
location at 0x100. This is why this format is called little-endian; the least
significant byte of the word occupies the lowest byte address within the
word in memory.
If the program executes in a big-endian system, the word has the byte order
in memory shown in Table 3-5.
Address Data
0x0100 AA
0x0101 BB
0x0102 CC
0x0103 DD
Table 3-5: Big Endian Addressing
The least significant byte of the word is stored in the high order byte
address. The most significant byte of the word occupies the low order byte
address, which is why this format is called big-endian.
Note: Within the half-word, the bytes maintain the same order as they have in the word
format. In little-endian mode, the least significant half-word resides at the low-order
address (0x100) and the most significant half-word resides at the high-order address
(0x102). For the big-endian case, the layout is reversed.
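The two layouts in Tables 3-4 and 3-5 can be observed directly in C. The following sketch (the helper names are our own, not from the book) stores a 32-bit word and inspects its bytes through memory; on a little-endian host the byte at the lowest address is 0xDD, on a big-endian host it is 0xAA.

```c
#include <stdint.h>
#include <string.h>

/* Return the byte stored at offset 'off' within the in-memory image of
 * a 32-bit word. memcpy avoids any strict-aliasing issues. */
static uint8_t byte_at(uint32_t word, unsigned off)
{
    uint8_t b[4];
    memcpy(b, &word, 4);
    return b[off];
}

/* Classic runtime endianness probe: look at the word's lowest address. */
static int host_is_little_endian(void)
{
    return byte_at(0xAABBCCDDu, 0) == 0xDD;
}
```

This is the same experiment as storing the value at 0x100 and reading it back byte by byte, just performed on the host running the code.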
One may see a lot of discussion about the relative merits of the two formats,
mostly religious arguments based on the relative merits of the PC versus
the Mac; however, both formats have their advantages and disadvantages.
In little-endian form, since the lowest-order byte is at offset 0 and is accessed
first, assembly language instructions for accessing 1-, 2-, 4-, or longer-byte
numbers proceed in exactly the same way for all formats. Also, because of
the 1:1 relationship between address offset and byte number (offset 0 is
byte 0), multiple-precision math routines are correspondingly easy to write.
In big-endian form, since the higher-order byte comes first, it is easy to test
whether the number is positive or negative by looking at the byte at offset
zero; there is no need to receive the complete packet of bytes to know the
sign. The numbers are also stored in the order in which they are printed out,
so binary-to-decimal routines are particularly efficient.
Address 00 01 02 03
Big-endian 12 34 56 78
Little-endian 78 56 34 12
One would notice that reading a hex dump is certainly easier on a big-endian
machine, since numbers are normally read from left to right (lower to higher
address).
have to keep reversing the byte order when working with large graphical
elements.
Table 3-7 lists several popular computer systems and their endian
architectures. Note that some CPUs can be either big or little endian (bi-
endian) by setting a processor register to the desired endian architecture.
Some of the common file formats and their endian order are listed in Table
3-8:
What this means is that any time numbers are written to a file, one needs
to know how the file is supposed to be constructed. For example, if a graphics
file (such as a .BMP file) is written on a big-endian machine, the byte order
first needs to be reversed, or else a standard program to read the file won't work.
As it turns out, all of the protocol layers in the TCP/IP suite are defined as
big-endian. In other words, any 16- or 32-bit value within the various layer
headers (for example, an IP address, a packet length, or a checksum) must
be sent and received with its most significant byte first.
each network host. The dotted decimal IP address must be translated into
such an integer.
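A portable way to honor network byte order in software is to build the byte stream explicitly with shifts, which works identically on hosts of either endianness. The helper names below are illustrative:

```c
#include <stdint.h>

/* TCP/IP headers are big-endian ("network byte order"): the most
 * significant byte is transmitted first. Writing bytes by shifting makes
 * the result independent of the host's own endianness. */
static void put_be32(uint8_t out[4], uint32_t value)
{
    out[0] = (uint8_t)(value >> 24);  /* most significant byte first */
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)(value);
}

static uint32_t read_be32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/* Convenience wrappers used below for illustration */
static uint8_t be32_byte(uint32_t value, unsigned i)
{
    uint8_t b[4];
    put_be32(b, value);
    return b[i];
}

static uint32_t be32_roundtrip(uint32_t value)
{
    uint8_t b[4];
    put_be32(b, value);
    return read_be32(b);
}
```

For instance, the IP address 192.168.0.1 packs to the 32-bit value 0xC0A80001, and put_be32 emits the bytes C0 A8 00 01 in transmission order.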
Code which will be executed directly from an 8- or 16-bit flash device must
be stored in a way that instructions will be properly recognized when they
are fetched by the processor. This may be affected by the endian
configuration of the system. Compilers typically have a switch that can be
used to control the endianness of the code image that will be programmed
into the flash device.
The relationship of a byte address to specific bits on the 32-bit data bus is
shown in the Table 3-9.
Table 3-10 shows the data byte mapping for little and big endian system
with 8-bit, 16-bit and 32-bit access.
Access                            D[31:24]  D[23:16]  D[15:8]  D[7:0]
8-bit read at address "01" (BE)      --        0B        --       --
8-bit read at address "01" (LE)      --        --        0C       --
8-bit read at address "10" (BE)      --        --        0C       --
8-bit read at address "10" (LE)      --        0B        --       --
8-bit read at address "11" (BE)      --        --        --       0D
8-bit read at address "11" (LE)      0A        --        --       --
Table 3-10: Address-data mapping for different endian systems with 8-, 16- and 32-bit
access sizes
When a core or IP within a SoC operates on a single- or multi-byte field, the
MSB is on the left-hand side of the field and the LSB is on the right-hand
side. That is, if a 16-bit field holds an integer and the desired operation is to
increment it, a "1" is added to the LSB and any needed carries are propagated
from the LSB (on the right) towards the MSB (on the left). This operation is
the same for either big- or little-endian address architectures.
This leads to one of the main issues in mixing cores and other IPs of
different endian address architectures: since a multi-byte field has a different
byte address depending on the endian mode, if a multi-byte field is to be
manipulated as a single entity, the bit ordering within the entity must be
preserved as it is moved across the various IPs.
This same issue applies to multi-bit fields that cross byte boundaries.
Consider an IP that has a 16 bit control register in its programming model.
If the bit field [8:7] within this control register defines a control field, then
it is required that the relationship of these 16 bits remain constant for all
accesses to the control register.
The serial frame received is stored in the peripheral’s memory in the order
Type, H2, H1, and H0, which is little endian. Fields in the frame can span
multiple bytes and need not end on a byte boundary (Figure 3-13); for
example, the status field can be 12 bits wide. Hence it is important for the
application that this data is not changed by endianness conversion, as the
software will process the data in that order.
Figure 3-12: Data flow from Little-Endian Peripheral to System Memory (Address
Variance) — a big-endian CPU and a DMA sit on a big-endian system interconnect,
connected to a little-endian peripheral data path.
In Figure 3-12, the data is stored in the peripheral’s memory using little-endian
addressing. When this data is transferred to the system RAM, which is
big endian, it must be ensured that the bit ordering of the data is not
changed. To achieve this in hardware, the address used to access the
peripheral RAM is modified. The modification of the address depends on the
size of the transfer, as shown in Table 3-11. Using this logic, the two least
significant bits of the address bus are inverted and the data bus is used as-is.
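The address modification can be sketched in C as follows. This is a simplified model of the scheme described above, assuming a 32-bit data bus; the function name and the exact XOR value applied for each access size are illustrative:

```c
#include <stdint.h>

/* Address "munging" between little- and big-endian domains on a 32-bit
 * bus: for sub-word accesses the low address bits are inverted so a byte
 * lane on one side maps to the matching lane on the other. A full-word
 * access needs no modification. */
static uint32_t munge_addr(uint32_t addr, unsigned size_bytes)
{
    switch (size_bytes) {
    case 1:  return addr ^ 3u; /* byte access: invert the two LSBs    */
    case 2:  return addr ^ 2u; /* half-word access: invert bit 1 only */
    default: return addr;      /* word access: address used as-is     */
    }
}
```

Note how a byte write to offset 0 on one side lands at offset 3 on the other, which is exactly the relationship between the little- and big-endian columns of Table 3-4 and Table 3-5.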
Figure 3-13: Interfacing Little Endian Memory to Big Endian Memory using Data
Invariance. The frame consists of a 3-byte header (H2:H0), a 1-byte Type field, a 2-byte
status (S1:S0), X data bytes (say 10 bytes, D0 onwards) and a CRC. With data invariance
both memories hold identical 32-bit words: Type/H2/H1/H0 at offset 0, D1/D0/S1/S0
at offset 4, D5/D4/D3/D2 at offset 8, D9/D8/D7/D6 at offset C, and the CRC at
offset 10.
Data Flow:
Data flow from a little endian peripheral to big endian memory using data
invariance is described below:
Figure 3-14: Interfacing Little Endian Memory to Big Endian Memory using Address
Invariance. The same frame is shown, but on the big-endian side the byte order within
each word is reversed (H0/H1/H2/Type at offset 0, S0/S1/D0/D1 at offset 4, and so
on), so that each byte keeps the same address in both memories.
Data Flow:
Data flow from a little endian peripheral to big endian memory using
address invariance is described below:
7. The big endian memory decodes the access as write to bits 31:24.
8. Since after endianness conversion, data from little endian
memory is on the same address location, the data gets stored in
the big endian RAM.
9. The process continues for other bytes that need to be transferred
from peripheral RAM to system RAM.
10. For 16-bit and 32-bit accesses, the above process is the same, with
the output data being swapped as shown in Table 3-11.
3.10.8.1 Methods
Various byte swap methods that are commonly used in software are:
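As one common example of such a method, the shift-and-mask routines below swap the bytes of 16- and 32-bit values in software (a generic sketch, not tied to any particular library):

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value: 0x1234 becomes 0x3412. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Swap the four bytes of a 32-bit value: 0x12345678 becomes 0x78563412. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}
```

Applying a swap twice restores the original value, which makes these routines their own inverse for converting in either direction.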
Limitation
Two 1MB bit-band regions, one in the peripheral memory area and one in
the SRAM memory area, are each mapped to a 32MB alias region. Each bit
in a bit-band region is mapped to a 32-bit word in the alias region [16].
The first bit in the bit-band peripheral memory is mapped to the first word
in the alias region, the second bit to the second word, and so on.
Writing a value to the alias region with the least significant bit, i.e. bit [0],
set to 1 will write a 1 to the bit-band bit; conversely, writing a value of 0
will clear the bit-band bit. Bits [31:1] of the word written to the alias region
are unmapped and have no effect on the bit-band value.
One can use this method to make atomic (non-interruptible) changes to a bit
in SRAM or peripheral-mapped memory. If atomic changes are not
required, this process can be slower, since changes are limited to a single bit
at a time. In certain circumstances (changing lots of bits) it may be quicker
to disable interrupts, make the changes, and re-enable interrupts.
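The bit-to-word mapping follows the standard Cortex-M formula: alias address = alias base + (byte offset x 32) + (bit number x 4). A small C sketch, using the architecturally defined SRAM bit-band base addresses:

```c
#include <stdint.h>

/* ARM Cortex-M bit-band alias calculation: each bit of the 1MB bit-band
 * region maps to one 32-bit word in the 32MB alias region. The SRAM
 * bit-band region at 0x20000000 aliases to 0x22000000. */
#define SRAM_BB_BASE   0x20000000u
#define SRAM_BB_ALIAS  0x22000000u

static uint32_t bitband_alias(uint32_t byte_addr, unsigned bit)
{
    uint32_t byte_offset = byte_addr - SRAM_BB_BASE;
    return SRAM_BB_ALIAS + (byte_offset * 32u) + (bit * 4u);
}
```

On real hardware the returned address would be used as a volatile pointer: writing 1 to it sets the target bit atomically, writing 0 clears it.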
4. System Boot
4.1 Introduction
The boot process is the sequence of steps that a system performs from the
moment power is switched on until the application is loaded. Though it
sounds simple and obvious that the main job of a bootloader is to load the
operating system, the process is often complex and is understood differently
by hardware and software engineers.
This chapter takes a deep dive into the boot process, covering both the
hardware and software aspects of an embedded system. A program cannot be
loaded into memory unless a program has already been loaded into
memory; this leads to a “chicken and egg” situation, which the boot process
solves. It is often a complex multi-step process and involves several sub-steps
before a program gets loaded into system memory.
Over the last two decades the boot process has evolved from a simple DOS-
based boot to more complicated multi-OS boot, or even peripheral boot such
as USB boot, which allows an image to be booted from a USB device. The
latter has recently become popular in industrial/embedded applications
because it provides a lot of flexibility. For example, after a software
corruption where the system (or equipment) needs to be loaded with new
firmware, the technique allows a service engineer to simply copy the new
software onto a USB pen drive and boot the system from the USB drive,
rather than shipping a big piece of equipment back to the manufacturer,
saving the thousands of dollars that would otherwise be incurred just in
transporting the equipment. To enable this, the hardware as well as software
capabilities that allow an embedded system to boot from various interfaces
like USB, PCI Express, and SDHC cards, apart from the standard boot from
on-chip or off-chip memories, need to be understood.
Windows® XP follows the same steps but with more sophistication; the
details are shown below with a step-wise explanation [17].
1. Power Supply Switched ON and POR: Boot starts with power on. The
processor is held in reset. When all voltage and current levels are
acceptable, the supply indicates that the power is stable and sends
the Power Good signal to the processor.
2. POR Negated: With a good power supply available, reset to
the processor is negated to allow the CPU to begin operation.
The CPU points to the ROM address and starts executing the
ROM BIOS code.
3. Power On Self Test (POST): The CPU starts executing the ROM
BIOS code. The ROM BIOS performs a basic test (POST) of
central hardware to verify basic functionality. Any errors that
occur at this point in the boot process are reported by means
of 'beep codes', because the video subsystem has not yet been
initialized.
4. Video Card Initialization: The BIOS looks for the video card adapter.
The startup BIOS routine scans memory addresses C000:0000
through C780:0000 to find the video ROM. The video test
initializes the video adapter, tests the video card and video
memory, and displays configuration information. Depending on
whether this is a cold start or a warm start, the ROM BIOS executes
a full POST; if it is a warm start, the memory test portion of the
POST is skipped.
5. CMOS Readout by BIOS: The BIOS locates and reads the
configuration information stored in CMOS (a small area of
memory, usually 64 bytes, maintained by a small coin cell on the
motherboard). CMOS holds settings such as the time and the boot order.
6. Master Boot Record (MBR): If the first bootable disk is a fixed disk,
the BIOS examines the very first sector of the disk for a Master
Boot Record (MBR). A Master Boot Record is made up of two
parts: the partition table, which describes the layout of the fixed
disk, and the partition loader code, which includes instructions for
continuing the boot process.
7. Boot Loader: The partition loader (or boot loader) examines the
partition table for a partition marked as active. The partition
loader then searches the very first sector of that partition for a
boot record.
8. NTLDR: The active partition's boot record is checked for a valid
boot signature and, if found, the boot sector code is executed as a
program. The loading of Windows® XP is controlled by the file
NTLDR.
In these processors, code and data are already programmed into an internal
Flash device. The only timing constraints relate to the intervals after power-
up sequences to ensure that the flash is ready to be accessed at least as
quickly as the processor is ready to make an access.
For the case where there is no internal flash and the system relies on external
memory, execution out of the external memory is slower than running code
from the faster internal memory, because the flash memory runs at a clock
speed that is typically much lower than the speed at which the processor
core runs.
Because of this, the first code that is executed is often a small code
segment that is used to set up the transfers needed to bring the remaining
code into internal memory space, where it can then be executed at the
core processor frequency.
When the transfers are complete, the processor then jumps to the start of
the internal memory space where it executes the application code that was
just transferred.
This is the most common reset configuration and requires no special setup
on the board. It provides no flexibility or options to configure any
register: all registers are initialized to fixed values, and thus the chip
comes out of reset in only one fixed state. This mode provides the fastest way
to initialize the system before the boot process, but is the least powerful in
terms of capabilities to control the system state. It might work for some
applications, but if the same microcontroller is used in a variety of applications
with varying boot requirements, this mode would be the least preferred.
Figure 4-2 shows the timing diagram for this mode. System POR asserts the
internal chip reset (both active low); when POR is de-asserted, the clock
restarts and the reset configuration is loaded into the system registers. Based
on the system configuration, the process might be gated by other necessary
tasks (for example, waiting for the PLL to lock) before the internal reset is
de-asserted and the system starts to execute the boot code.
Figure 4-2: Reset configuration timing — once the system clock is running and power-on
reset (POR) de-asserts, the reset configuration settings are copied to the system registers
before the internal system reset is released.
b) Fuse Programming:
Figure 4-3: Reset configuration bits sourced either from one-time programmable fuses or
from non-volatile registers in on-chip flash.
Figure 4-3 shows the reset configuration scheme. Note that since the fuses
are one-time programmable, and thus secure, they can be used very
effectively to enable and disable functions within the chip and to create
phantom parts at a lower price. This strategy is very common among
semiconductor vendors that sell the same silicon with different feature sets
(by blowing fuses to enable or disable specific functions) at different costs.
[Figure: reset configuration inputs driven into the System-on-Chip (SoC) through
74LVC125 line buffers, with each buffer enabled (ENB) and its input strapped to VDD
or GND to set the corresponding configuration bit.]
provide the user the ability to enable the PLL when the buffer is connected to
VDD and disable the PLL when the buffer is connected to VSS.
Usually this approach is limited by the number of pins available for the
purpose. Also note that most external line drivers (like the 74LVC125)
come in groups of 4 or 8 buffers, so to limit the cost of the solution it
is advised to keep the number of buffers in multiples of 4.
In a typical flow, when the system reset is asserted, the chip establishes
serial communication with the serial memory, and the reset configuration
information is transferred from the memory to the microcontroller over the
serial link. Once the serial data is received, the microcontroller configures the
system registers based on the received data and de-asserts the reset. This
method provides the maximum flexibility in configuring different options in
the system registers, as a large number of data bytes can be stored in serial
memories. In some of the more advanced serial configuration schemes, even
software code can be loaded from the serial memory; in this case both the
reset configuration and the boot code are read from the external serial
memory during the microprocessor reset sequence, requiring only minimal
I/O pins. While reading data stored in, for example, an external SPI memory,
the system also needs to configure the SPI memory clock frequency setting
and the configurable power-up options for the microprocessor, and
optionally load code into the microprocessor memory space. All this needs to
be accomplished before the device’s reset negates, ensuring the chip is
properly configured when exiting the reset state.
[Figure: reset boot configuration logic connected to an SPI-based serial memory over
clock, chip select, data-out and data-in lines.]
Figure 4-6 shows common hardware boot components that allow the system
to boot from a variety of interfaces. Let’s take a look at these options:
Figure 4-6: Common hardware boot components in a System-on-Chip (SoC) — the
processor with its boot ROM and internal flash, an external bus interface with a chip
select to external flash, a NAND flash controller to external NAND flash, and a
DRAM controller to DDR memory.
Just like the Windows XP boot (as explained in Part I), some microcontrollers
include a boot ROM as the primary boot option. The boot ROM includes a
basic bootloader so that the microcontroller can perform a more sophisticated
boot sequence on its own and load programs from various sources like
Ethernet, NAND flash, SD/MMC card, USB and so on. Boot ROM usage
enables more flexible boot sequences than hardwired logic could provide and
gives the user the choice to boot from various peripherals. This feature is
often used for system recovery when, for some reason, the usual boot
software in non-volatile memory (other than the ROM) gets erased.
This allows the system to boot directly from external NOR flash or other
parallel memories. It offers one of the fastest ways to boot the system,
since the interface to external memory can be 32 bits or more at a
reasonable operating frequency. For a full-fledged operating system like
Linux (or Windows), booting can take a long time (several milliseconds to
seconds) due to the size of the operating system (OS), which can be
annoying to the user. Keeping the bootloader/OS in external parallel
memory drastically reduces the boot-up time for systems where boot time is
critical (for example, in medical equipment).
Boot from internal memory is always part of the secondary boot, since the
RAM needs to be loaded by the primary boot before it can execute code. It
is common to use one of the primary boot methods (as described in this
section) to load the OS/drivers and then copy the code to RAM. Once the
OS is loaded into the RAM, it takes control of the system. Executing code
from system RAM is faster and consumes less power than other memory
technologies; it is a much faster and more efficient method than executing
directly from external or internal flash. Since the internal RAM is volatile,
some systems allow the RAM to switch to a battery supply in the event of a
power failure, so that there is no need to copy the code again from
external/internal memory when the power comes back, reducing
subsequent initialization/boot time.
[Figure: the boot ROM in the SoC holds the bootloader; OS code in external NAND
flash is copied via the NAND flash controller and the DRAM controller to DDR,
where the OS executes.]
In the example, the ROM includes the bootloader while the NAND flash
memory contains the OS and application code. The boot process starts with
system initialization in the ROM bootloader, which also includes the reset
vector. The main OS code is copied from the NAND flash memory to DDR,
and execution control switches to the external DDR after the code in ROM
has executed. This scheme is very efficient for multimedia, as executing from
DDR is much faster than executing code directly from NAND flash. Similar
schemes can be used to copy code from various interfaces such as Ethernet.
There may be a desire to boot from various interfaces like SDHC, SPI, I2C,
USB, SATA, PCI Express, Ethernet and others. All this comes as part of the
secondary boot. As mentioned before, the primary boot interface (like the
ROM) initializes the secondary boot interface (like USB) before code
execution switches to the secondary boot.
Storing boot code in external non-volatile serial memories, like SPI flash or
EEPROM on I2C, can be very useful for microcontrollers that have a low pin
count and can afford a longer boot time. In this scheme, the boot code is first
copied from the external memory to the on-chip RAM, and code execution
switches to the RAM right after reset de-assertion.
Since embedded systems do not have a BIOS to perform the initial system
configuration, the low level initialization of microprocessors, memory
controllers, and other board-specific hardware varies from board to board
and from CPU to CPU. These initializations must be performed before a
Linux kernel image can execute.
One of the first tasks the ROM performs is to establish which boot mode
has been selected. This is usually determined by reading the state of pins
that have been tied high or low; these may be dedicated "boot mode pins"
or multipurpose I/O, depending on the processor, as explained in Section
4.4.3. The ROM code reads the pin state and determines which peripheral
will be used to bring in the code and data. Alternatively, the selection can be
made via one-time programmable memory (fuses), which the ROM can read
to make the choice, as previously explained in Section 4.4.3. The ROM code
then proceeds to set up the peripheral interface, including programming all
the required registers, to make the transfer happen.
The ROM can also be responsible for setting default values of some
important system parameters pertaining to memory initialization, interrupt
handling and reset behavior. Because the ROM must be programmed to
operate within a wide variety of system situations, it often uses only the
"safest" values for key configurations like system and peripheral clock
settings.
A series of headers usually "frame" the data on the memory device. The
ROM first reads a header and then decodes it to decide how to proceed.
These headers usually include parameters such as the number of bytes to
be moved and the destination address of the transfer. Another useful header
feature is the ability to load a specific image based on certain board-level
hardware. For example, a single product may have multiple configurations,
from low-end to high-end; the external flash may therefore include multiple
images to allow identical hardware to behave in different ways. The boot
header can be used to select the desired executable out of this code store.
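A boot header of this kind might be modeled as below. The structure layout, field names, and magic value are purely illustrative, since actual header formats are vendor-specific:

```c
#include <stdint.h>

/* Hypothetical boot header "framing" an image on the memory device */
typedef struct {
    uint32_t magic;       /* fixed marker so the ROM can validate the header */
    uint32_t image_bytes; /* number of bytes to be moved                     */
    uint32_t dest_addr;   /* destination address of the transfer             */
    uint32_t board_id;    /* selects the image matching the board variant    */
} boot_header_t;

/* Scan an array of headers and pick the image for this board. Returns
 * the index of the matching header, or -1 if none validates. */
static int select_image(const boot_header_t *hdrs, int count, uint32_t board_id)
{
    for (int i = 0; i < count; i++) {
        if (hdrs[i].magic == 0xB007B007u && hdrs[i].board_id == board_id)
            return i;
    }
    return -1;
}

/* Small self-contained demo: two images in "flash", selected by board ID */
static int demo_select(uint32_t board_id)
{
    static const boot_header_t images[2] = {
        { 0xB007B007u, 1024u, 0x20000000u, 1u },
        { 0xB007B007u, 2048u, 0x20000000u, 2u },
    };
    return select_image(images, 2, board_id);
}
```

In a real ROM the matching header would then drive the copy loop: image_bytes bytes moved from the device to dest_addr before the jump to the loaded code.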
As with RAM, the boot ROM can be mapped at any memory level that the
processor supports. Typically, these ROMs are located either at the L1 memory
level, where instruction execution occurs in a single core clock cycle, or at the
L3 level of the memory hierarchy, where execution occurs in the slower system
clock domain. If a larger ROM is required, it is most often at the L3 level;
if speed of execution is important, an L1 ROM is more appropriate.
The primary option provides a direct boot capability and facilitates the
processor's first instruction fetch from the memory location where the
software initialization code resides. These interfaces are expected to be
enabled and configured right after reset, and can sometimes be configured
through the reset configuration option (as explained in 4.4.2). The small
software initialization code that resides in ROM is what constitutes the
primary bootloader.
[Figure: program execution flow — the initial bootloader (primary boot) hands over
to the full bootloader/complete OS kernel (secondary boot).]
In practice, the primary boot can be an on-chip component like a ROM
containing the initial initialization code, with execution then switching to a
secondary option like external DDR that holds the complete OS kernel.
The simplest type of boot ROM may just look for a fixed-size code block
in external memory. This fixed-size block almost always serves as a
secondary bootloader. A good example of a second-stage loader is U-Boot,
an open-source universal bootloader. It is a small segment of software that
is brought in from external memory and executes soon after the processor
powers up.
There are many standard bootloaders used in embedded applications, like
DINK32, Open Firmware, and the x86 BIOS, which facilitate the loading
of an operating system (OS) and bring the system to a safe state. U-Boot is
a similar open-source universal bootloader that is especially popular for
Linux-based embedded applications. U-Boot provides an automated
interactive environment which offers the user a lot of flexibility and
options to choose among various boot schemes and interfaces.
U-Boot can reside in internal ROM or Flash. After basic CPU, local memory,
and bus initialization, U-Boot can relocate itself to a RAM location and then
execute from there. Figure 4-9 shows the splash screen displayed on the
serial console when U-Boot is running.
Bootstrap is the second utility in the booting sequence. It controls the
search for and loading of the IOS: the bootstrap program is responsible for
bringing up the router, finding an IOS image in all possible locations, and
loading it into RAM. The non-volatile code is stored in external Flash;
during the boot process, the Boot ROM copies the IOS image from the Flash to
internal SRAM. The router also includes NVRAM, used to store data such as
configuration parameters, so that this data is not lost when the router is
powered off.
5. System Integrity
5.1 Introduction
Embedded electronic control units are finding their way into more and
more complex safety critical and mission critical applications. Many of these
applications operate in adverse conditions, which can cause code runaway
in the embedded control systems, putting them into unknown states. A
watchdog timer is the best way to bring the system out of an unknown state
into a safe state. Given its importance, the watchdog has to be carefully
designed, so as to reduce the chances of its operation being compromised
by runaway code. This chapter outlines the need for a robust watchdog and
the guidelines that must be considered while designing a fault-tolerant
system monitor, a.k.a. the watchdog. Efficient methods for refreshing a
watchdog, write-protection mechanisms, early detection of code runaway,
and a quick self-test of the watchdog are described in this chapter.
The above examples serve to highlight the need for fault tolerant systems.
Looking ahead, it’s not just the automotive, industrial, aeronautical, medical
and space applications that need fault tolerance. With the introduction of
the IEC 60730 standards, even automatic electronic controls in household
appliances are required to ensure safe and reliable operation of their
components.
Supply rails can droop at the time of power-down, or due to ground bounce or
current surges. Cosmic rays can cause bit-flipping in memory bit cells, while
EMI and noisy power supplies can result in incorrect data being read from, or
written to, memories and registers.
When such data corruption happens, program execution can be affected, as the
program counter may have been modified. Modification of the program code
memory, or a read of wrong data from code memory, can result in a totally
different, unintended instruction being executed. Thus the program flow or
the program code itself gets modified, i.e. code runaway, and the system can
enter an unknown state where its behavior is unpredictable. Such runaway can
also result from faulty coding on the part of the firmware developer:
unhandled exceptions, out-of-bound array accesses, unbounded loops, or simply
an unexpected sequence of user inputs can all lead to an unexpected outcome.
Once the program flow takes an unexpected branch, the system can start
behaving unpredictably, which is unacceptable for a safety critical system.
For example, an airbag control unit could go haywire, firing at the wrong
time or worse, not firing during an accident. While there are remedial
measures available to prevent data corruption, there is a need for a system
monitor that can detect such system failures and take action to bring the
system into a safe, known state. The system monitor would, in essence, act
as the last "dive-and-catch" for the system when a code runaway takes place.
The system monitor should be able to reliably detect a code runaway and
then bring the system into a safe state with minimum delay. The system
monitor should itself be immune to code runaways.
The watchdog's reaction time is determined by the timeout value of its timer.
Figure 5-1 illustrates the basic concept of a watchdog.
Figure 5-1: Basic watchdog concept. A clocked timer is reloaded by a refresh;
a comparator checks the timer value against the timeout value and asserts a
system reset on timeout.
This arrangement works as follows. The firmware code is first profiled to
determine the sequence of instruction execution and the time it takes.
Watchdog refresh routines are then inserted into the code such that the
interval between the executions of two successive refresh routines is always
less than the watchdog timer's timeout period. If a code runaway happens,
the program flow is disrupted: either the refresh routines are not executed
at all, or they are executed at intervals exceeding the timeout period. The
watchdog timer then times out and resets the system, pulling it back into a
known state.
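The refresh discipline just described can be sketched with a small behavioral model. All names, and the timeout of 100 ticks used in the usage note, are illustrative and not taken from any specific part:

```c
#include <stddef.h>

/* Behavioral model of the watchdog described above: a free-running timer
 * compared against a timeout value. All names and values are illustrative. */
typedef struct {
    unsigned timer;          /* counts up on every watchdog clock tick */
    unsigned timeout;        /* compare value that triggers a reset    */
    int reset_asserted;      /* set when the timer reaches the timeout */
} watchdog_t;

static void wdog_tick(watchdog_t *w)    /* one watchdog clock tick */
{
    if (!w->reset_asserted && ++w->timer >= w->timeout)
        w->reset_asserted = 1;          /* timeout: pull system to reset */
}

static void wdog_refresh(watchdog_t *w) /* the inserted refresh routine */
{
    w->timer = 0;                       /* reload the timer */
}
```

With a timeout of 100 ticks, code that refreshes every 60 ticks never triggers a reset, while a runaway that skips the refresh lets the timer reach the timeout and the reset fires.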
• The width of the watchdog timer should be such that it can cover the whole
range of timeouts, for all available clock sources in the system.
• The watchdog timer should run off a clock source that is independent of
the clock source of the system that it is monitoring. Preferably it should be
a dedicated clock source for the watchdog, say an RC oscillator. This means
that even if the system clock dies out due to some reason, leaving the system
hung, the watchdog timer can still timeout and reset the system.
• The critical control and configuration register bits of the watchdog should
have write protection on them so that once set they cannot be accidentally
modified.
• The method of refreshing the watchdog should be such that the chances of
runaway code accidentally refreshing the watchdog are minimal. If runaway
code did, by some remote chance, manage to keep refreshing the watchdog, the
watchdog would either never learn of the code runaway or learn of it only
after a long delay.
NOTE: The set of recommended features that an ideal watchdog must include is
referred to as the "Robust Watchdog" within this chapter.
(Figure: achievable timeout period versus watchdog clock frequency, plotted
for 8-bit, 16-bit, 24-bit and 32-bit timers at clock frequencies from 1 kHz
to 50 MHz, with timeout periods ranging from nanoseconds up to 1000 seconds;
a vertical band highlights a typical timeout range.)
The vertical band marks out a range of timeouts which cover the 1ms to 1
second range. As can be observed, a 32 bit counter is required to cover all
clock frequencies and the expected range of watchdog timeouts.
A Robust Watchdog should place a restriction on the time gap between the
writes of the two values, reducing the chance of runaway code being able to
"unlock" the registers for writing and possibly disable the watchdog. By
limiting the gap to just the time it takes the CPU to fetch and execute the
write instruction for the second value, the user is forced to place the
write instructions for the two values back to back in the code (as
consecutive assembly instructions). Now, if a runaway occurs after the
execution of the first write, there is no time left for the code to somehow
return and execute the instruction writing the second value of the sequence.
This makes the refresh sequence far harder for runaway code to replicate.
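A behavioral sketch of such a gap-restricted refresh sequence follows. The two magic values and the one-tick gap limit are illustrative assumptions, not taken from any specific device:

```c
/* Model of a two-value refresh sequence with a restricted time gap.
 * The values and the allowed gap are illustrative assumptions only. */
#define REFRESH_FIRST   0xA602u
#define REFRESH_SECOND  0xB480u
#define MAX_GAP_TICKS   1u       /* second write must follow at once */

typedef struct {
    unsigned gap;                /* bus clocks since the first write */
    int awaiting_second;         /* first value seen, second pending */
    int refreshed;               /* set on a valid complete sequence */
    int fault;                   /* set on any out-of-sequence write */
} refresh_reg_t;

static void refresh_bus_tick(refresh_reg_t *r)   /* every bus clock */
{
    if (r->awaiting_second)
        r->gap++;
}

static void refresh_write(refresh_reg_t *r, unsigned value)
{
    if (!r->awaiting_second) {
        if (value == REFRESH_FIRST) {
            r->awaiting_second = 1;
            r->gap = 0;
        } else {
            r->fault = 1;        /* unexpected value: likely runaway */
        }
    } else {
        if (value == REFRESH_SECOND && r->gap <= MAX_GAP_TICKS)
            r->refreshed = 1;    /* back-to-back writes: valid refresh */
        else
            r->fault = 1;        /* wrong value, or gap too large */
        r->awaiting_second = 0;
    }
}
```

A sequence written back to back refreshes the watchdog; a second write arriving late, or any third value, raises a fault instead.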
If the gap between the two words of the password is more than a few system
bus clock cycles, the watchdog treats this as an exception and resets the
system. In addition, the amount of time for which the registers stay
"unlocked" is limited too, to roughly the time it takes for these registers
to be configured once, after which they are "locked" again. This write
protection is in effect from right after system reset, leaving no room for
runaway code to "sneak in" and change the watchdog's configuration.
(Figure: watchdog timing diagram marking Tstart_timeout and Tend_timeout,
the start and end of the timeout period, and Tstart_window and Tend_window,
the start and end of the refresh window.)
While the method of running a timer in the watchdog and interpreting its
timeout as a sign of system failure (due to runaway code or system clock
failure) is time tested, it does have one shortcoming. If code runaway
happens in the early stages of the watchdog timer period, a lot of time
passes before safety measures (like resetting the system) kick in, because
the watchdog waits for its timer to time out. In some applications, this
delay in the watchdog's reaction might be as large as 1 second (the
watchdog's timeout period). A Robust Watchdog should react faster, by
recognizing the signs of runaway code early on and resetting the system
immediately, without waiting for a timeout of its internal timer. These
signs are:
• Presence of a value, other than the two bonafide values of the refresh
sequence or the register-unlock password, in the watchdog's refresh or
unlock register. The user's software code would only contain instructions
writing those two sets of values to these registers; the presence of a
third value therefore indicates something abnormal happening in the code,
probably due to a runaway.
When a timeout does take place, the logic generating the reset to the system
is run off the fast system clock (in the range of tens to hundreds of MHz),
rather than the watchdog's dedicated, slow clock (in the range of a few kHz
to a few MHz). If the reset were generated off the slow clock, say 1 kHz, it
could take the watchdog almost 1 ms to reset the system after the timeout,
leaving too much time for runaway code to cause havoc. One risk in
generating the reset off the system clock is that, in the event this clock
fails, the watchdog timer's timeout would go unacknowledged and would not
reset the system. To take care of such a situation, a backup circuit in the
Robust Watchdog waits for a second consecutive timeout of the timer and
passes it on as a reset to the system, as shown in Figure 5-4 [19].
Figure 5-4: Watchdog reset generation. The watchdog control logic, fed by
the system bus interface and the watchdog timer, normally generates the
system reset; a second consecutive timeout without a system reset selects
the backup path and forces the reset.
For IEC 60730 and other safety standards, the expectation is that anything
that monitors a safety function must be tested, and this test is required to
be fault tolerant. To test the watchdog, its main timer and the associated
compare and reset logic should be exercised. Most current implementations of
the watchdog do a simple overflow test of their timers. A 32-bit timer
running on a 1 kHz clock would take ~4 x 10^6 seconds to overflow, which is
unreasonably long for a test. In a Robust Watchdog, during its test, the
timer should split up into its constituent byte-wide stages, which are then
run independently and tested for timeout against the corresponding byte of
the actual timeout value. The block diagram in Figure 5-5 explains the
"splitting" concept; the case shown is the test of byte stage 3 of the
timer.
Figure 5-5: Timer split into byte-wide stages (bytes 1 to 4) for self-test.
In the case shown, byte stage 3 runs independently and an equality
comparison checks it against the corresponding byte of the modulus register
(the timeout value).
The disabled stages (except the most significant stage of the counter) are
loaded with a value of 0xFF. For a 1 kHz clock, a test of each byte, one
after another, would take 4 x 256 ms (~10^3 ms, i.e. about a second) for a
32-bit timeout value set to all 1's, i.e. 0xFFFFFFFF [19]. The actual time
taken depends on the timeout value that is set.
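The self-test time quoted above can be checked with a one-line model: each byte-wide stage counts at most 256 ticks of the watchdog clock, and the stages are tested sequentially. The helper name is illustrative:

```c
/* Worst-case duration of the byte-wise watchdog self-test: each of the
 * byte-wide stages counts up to 256 ticks of the watchdog clock, and the
 * stages are tested one after another. Illustrative helper only. */
static double wdog_selftest_worst_s(int byte_stages, double clock_hz)
{
    return byte_stages * 256.0 / clock_hz;
}
```

For four stages on a 1 kHz clock this gives 1.024 s, matching the ~10^3 ms figure, versus ~4 x 10^6 s for a naive full 32-bit overflow test.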
6. Debouncing Techniques
6.1 Introduction
The tendency of two metal contacts in an electronic device to generate
multiple signals as the contacts close or open is known as "bouncing".
"Debouncing" is any kind of hardware device or software that ensures that
only a single signal will be acted upon for a single opening or closing of
a contact.
Mechanical switch and relay contacts are usually made of springy metals
that are forced into contact by an actuator. When the contacts strike
together, their momentum and elasticity act together to cause bounce. The
result is a rapidly pulsed electrical current instead of a clean transition from
zero to full current. The waveform is then further modified by the parasitic
inductances and capacitances in the switch and wiring, resulting in a series
of damped sinusoidal oscillations. This effect is usually unnoticeable in AC
mains circuits, where the bounce happens too quickly to affect most
equipment, but causes problems in some analogue and logic circuits that
respond fast enough to misinterpret the on-off pulses as a data stream.
When you press a key on your computer keyboard, you expect a single contact
to be recorded by your computer. In fact, however, there is an initial
contact, a slight bounce or lightening up of the contact, then another
contact as the bounce ends, yet another bounce back, and so forth.
Manufacturers usually handle these with membrane switches, which include a
sheet of rubber with a tip of rubberized conductive material that, when
pressed, makes a connection with a set of exposed contacts on the circuit
board. The rubber is soft and therefore provides a soft connection with
little to no bounce. The main problem is that most of these solutions don't
stand up very well to the high impact stress of being stepped on.
(Figure: a switch pulled up to VCC (logic 1) through R1, with its other
terminal at GND (logic 0). The output chatters between logic 1 and logic 0
for a short time when the switch is activated and again when it is
de-activated.)
If the switch is used to turn on a lamp or start a fan motor, then contact
bounce is not a problem. But if the switch or relay is used as input to a
fast digital circuit, the bounce must be dealt with.
The reason for concern is that the time it takes for contacts to stop
bouncing is typically on the order of milliseconds, while digital circuits
can respond in microseconds or even faster (nanoseconds).
For a pressure switch, gas or liquid pressure can be used to actuate a switch
mechanism if that pressure is applied to a piston, diaphragm, or bellows,
which converts pressure to mechanical force.
Level switches can also be designed to detect the level of solid materials
such as wood chips, grain, coal etc.
Selector switches are actuated with a rotary knob or lever of some sort to
select one of two or more positions. Like the toggle switch, selector
switches can either rest in any of their positions or contain spring-return
mechanisms for momentary operation.
There are many more switch types than those listed here; different switches
behave differently and can exhibit different bounce periods. A simple, cheap
switch may exhibit a longer bounce period than a purpose-designed one; for
example, a switch designed with multiple parallel contacts gives less
bounce, but at greater switch complexity and cost. There are various
techniques and guidelines for switch design that can reduce the bounce
period, but these are beyond the scope of this book.
6.4.1 RC De-bouncer
When the switch is opened, the voltage across the capacitor is zero, but it
starts to climb at a rate determined by the values of R and C. Bouncing
contacts pull the voltage down and slow the capacitor's charge accumulation.
A fairly long RC time constant is required to eliminate the bounces
completely: R and C can be chosen so that the voltage stays below the gate's
logic-one level until the bouncing stops. A potential side effect is that
the switch may not respond to fast "open" and "close" events if the time
constant is too long.
(Figure: RC de-bouncer. Vcc feeds R1 into the capacitor node, where the
gate input (Output) is taken; R2 sits between the capacitor node and the
switch to ground.)
Now suppose the switch has been open for a while, so the capacitor is fully
charged. The user closes the switch, which discharges the capacitor through
R2. The voltage drops slowly, so the gate continues to see a logic one at
its input for a time. During the bouncing, the contacts open and close
repeatedly for short periods; while open, even briefly, the two resistors
start to recharge the capacitor, reinforcing the logic one seen by the gate.
Again, component values can be chosen to guarantee that the gate sees a one
until the bouncing contacts settle.
The RC circuit shown above can eliminate bounces even without R2 (R2 = 0).
However, a switch operated at high speed may produce bounces on the order of
a microsecond or less, with correspondingly sharp rise times. To make things
worse, depending on the physical arrangement of the components, the input to
the gate might go to a logic zero while the voltage across the capacitor is
still a one; when the contacts bounce open, the gate again sees a one. The
output is then a train of ones and zeroes, i.e. bounces. R2 ensures the
capacitor discharges slowly, giving a clean logic level regardless of the
frequency of the bounces. The resistor also limits the current flowing
through the switch's contacts, so they aren't burned up by a momentary surge
of electrons from the capacitor.
Lastly, the state information coming from the switch is not digital in
nature, so controlling something like a switching IC with it directly won't
work very well. To use the switch state information properly, a basic
analog-to-digital conversion is required. This comprises a logic gate tacked
on to the RC network, as shown in Figure 6-6.
Figure 6-6: RC de-bounce network (Vcc, R1, R2 and the capacitor) followed by
a logic gate whose Output provides a clean digital level.
The logic gate has a certain voltage threshold at which it changes its output
state. This provides some more tolerance to switch bounce but switch
bounce can still leak through as shown in Figure 6-7.
Figure 6-7: Voltage of the RC network against the gate threshold over time;
bounce near the threshold can still leak through to the logic output.
The logic gate or inverter cannot be just any standard gate. For instance,
TTL logic defines a zero as an input between 0.0 and 0.8 volts and a one as
an input above 2.0 volts; between 0.8 V and 2.0 V the output is
unpredictable. More bounce tolerance can be added by using logic gates with
Schmitt-trigger inputs. With a Schmitt trigger, once the voltage drops below
the lower threshold, the gate will not switch state again, even if the
voltage re-crosses that threshold, until the other, higher threshold is
reached. This reduces the sensitivity of the Schmitt-triggered gate to
switch bounce.
The behavior is shown in Figure 6-8.
Figure 6-8: RC network voltage against the distinct Schmitt "on" and "off"
thresholds over time; the hysteresis keeps the logic output stable despite
bounce.
Circuits based on “Schmitt trigger” inputs have hysteresis, the inputs can
dither yet the output remains in a stable, known state.
It can be tedious to tune the RC values for each and every circuit, so let's
come up with a generic RC design that works for all cases. The capacitor
discharge follows:

VCap = Vinitial x e^(-t/RC)
where
VCap = Voltage across the capacitor at time t
Vinitial = Initial voltage across the capacitor
t = time in seconds
R = Value of the resistor in Ohms
C = Value of the Capacitor in Farads
Values of R and C should be selected so that VCap always stays above the
threshold voltage at which the gate switches, until the switch stops
bouncing.
R1 + R2 controls the capacitor charge time, and sets the de-bounce period
for the condition where the switch opens. The equation for charging is:

Vthreshold = Vfinal x (1 - e^(-t/((R1+R2) x C)))

where
Vthreshold = Worst-case transition point voltage across the capacitor
Vfinal = Final charged value across the capacitor
Figure 6-9 shows a small change to the RC de-bouncer: a diode between R1 and
R2. The diode is an optional component; it preserves correct operation even
when the gate's hysteresis voltage assumes unexpected values (due to a wrong
choice of gate) such that the required R1 + R2 comes out less than R2. The
diode forms a shortcut that removes R2 from the charging circuit, so all of
the charge current flows through R1.
(Figure 6-9: the RC de-bouncer of Figure 6-6 with a diode added between R1
and R2; Vcc, R1, R2, capacitor C and the gate Output as before.)
Let's analyze this in more detail. Figure 6-10 shows the state of the
circuit with the switch open and closed, respectively.

(Figure 6-10: the circuit of Figure 6-9 redrawn for the OPEN and CLOSED
switch states, marking the node voltages Va and Vb.)
When the switch is OPEN, capacitor C charges via R1 and the diode. In time
the capacitor charges until Vb reaches within 0.7 V of Vcc, so the output of
the inverting Schmitt trigger is at logic 0.
When the switch is CLOSED, the capacitor discharges via R2. In time
capacitor C discharges and Vb reaches 0 V, so the output of the inverting
Schmitt trigger is at logic 1.
If bounce occurs and there are short periods of switch closure or opening,
the capacitor stops the voltage at Vb from immediately reaching Vcc or GND.
Although the bouncing causes slight charging and discharging of the
capacitor, the hysteresis of the Schmitt-trigger input stops the output from
switching.
Also note that resistor R2 is required as a discharge path for the
capacitor; without it, the capacitor would be shorted when the switch is
closed. Without the diode, both R1 and R2 would form the capacitor's charge
path when the switch is open; the combination of R1 and R2 would increase
the capacitor charge time, slowing down the circuit. The alternative of
making R1 smaller would waste current whenever the switch is closed and R1
is connected across the supply rails.
(Figure: two cross-coupled gates "1" and "2" forming a latch, with the
switch selecting between contacts "a" and "b" and the outputs labeled 1 OUT
and 2 OUT.)
With the switch in position "a", the output of gate "1" is logic HIGH
regardless of the value of its other input, which forces the output of gate
"2" to be held at logic LOW. If the switch now moves between contacts and is
for a while suspended in the neither-region between terminals, the latch
maintains its state because of the zero looped back from gate "2". The latch
output is thus guaranteed bounce-free.
An alternative software approach to the above idea is to run the two
contacts, with pull-ups, directly to input pins of the CPU. The CPU would of
course observe a lot of bounces, but trivial code that acts only on the
first assertion of either contact eliminates them.
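A minimal sketch of that trivial code, assuming an SPDT switch with both contacts wired active-low to two input pins; how the pins are read is left to the caller, and all names are illustrative:

```c
/* Software equivalent of the hardware latch: the reported state changes
 * only when the opposite contact is firmly asserted (active low), so
 * bounce and the floating "neither" region are simply ignored. */
typedef enum { CONTACT_A = 0, CONTACT_B = 1 } spdt_state_t;

static spdt_state_t spdt_debounce(spdt_state_t current,
                                  int pin_a_low, int pin_b_low)
{
    if (pin_a_low && !pin_b_low)
        return CONTACT_A;        /* contact "a" asserted */
    if (pin_b_low && !pin_a_low)
        return CONTACT_B;        /* contact "b" asserted */
    return current;              /* bouncing or in between: hold state */
}
```

Called on every scan of the input pins, this behaves exactly like the NAND latch: the state flips once per throw, no matter how many bounces occur.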
Solution A: Read the Switch after sufficient time allowing the bounces to settle down
Though simple, the above technique does not provide any EMI protection. It
rejects most random noise spikes by giving the switch sufficient time
(500 ms) to settle to its stable state, but a single glitch during the
moment when the switch status is actually read might still be mistaken for a
contact transition. To avoid this, the software should be modified to read
the input several times during each pass through the 500 ms loop and look
for a stable signal. This rejects most of the EMI.
CALL DELAY ; WAIT LONG ENOUGH FOR BOUNCING TO STOP
POP PSW ; RESTORE PROGRAM STATUS
EI ; RE-ENABLE INTERRUPTS
RETI ; RETURN BACK TO MAIN PROGRAM
The idea is that as soon as the switch is activated the De-bounce Routine
(DR) is called. The DR calls another subroutine called DELAY which just
kills time long enough to allow the contacts to stop bouncing. At that point
the DR checks to see if the contacts are still activated (maybe the user kept
a finger on the switch). If so, the DR waits for the contacts to clear. If the
contacts are clear, DR calls DELAY one more time to allow for bounce on
contact-release before finishing.
A de-bounce routine must be tuned to your application; the one above may
not work for everything. Also, the programmer should be aware that
switches and relays can lose some of their springiness as they age. That can
cause the time it takes for contacts to stop bouncing to increase with time.
So, the de-bounce code that worked fine when the keyboard was new might
not work a year or two later. Consult the switch manufacturer for data on
worst-case bounce times.
Solution C: Use a Counter to eliminate the noise and validate switch state
The idea is to count how long the input has been continuously active: when
the count exceeds a certain fixed value, chosen to be one to two times
longer than the expected noise pulses, the current pulse is taken to be
valid.
// include files
#include &lt;reg51.h&gt; // 8051 SFR definitions; the header name is
                   // compiler-specific (e.g. &lt;8051.h&gt; for SDCC),
                   // as is the P1_0 bit definition
unsigned char counter; // Variable used to count
unsigned char T_valid; // Variable used as the minimum
// duration of a valid pulse
void main(){
P1 = 255; // Initialize port 1 as input port
T_valid = 100; // Arbitrary number from 0 to 255 where
// the pulse is validated
counter = 0; // No pulse in progress yet
while(1){ // infinite loop
if (counter < 255){ // prevent the counter from rolling
// back to 0
counter++;
}
if (P1_0 == 1){ // input inactive: not a pulse
counter = 0; // reset the counter back to 0
}
if (counter > T_valid){
//....
// Code to be executed when a valid
// pulse is detected.
//....
}
//....
// Rest of your program goes here.
//....
}
}
In practice, a system may have multiple banks of switches. We have seen how
a single input switch can be de-bounced, but it does not make sense to
de-bounce multiple inputs individually when all of them can be handled at
once with little overhead on the CPU. This section extends the de-bouncing
algorithm to handle multiple switches or inputs. Figure 6-12 shows a system
with multiple input switches.
(Figure 6-12: four switches on inputs IN0 to IN3, each pulled up to Vcc
(5 V) through a 10K resistor.)
Main:
GOSUB Debounce_Switches // get debounced inputs
PAUSE 50 // time between readings
GOTO Main // Continue the loop
END
Debounce_Switches:
switches = 0xF // assume all four inputs valid
FOR x = 1 TO 10 // sample the inputs ten times
switches = switches & ~Switch_Inputs // invert active-low inputs, then AND
PAUSE 5 // delay between tests
NEXT
RETURN
The Debounce_Switches routine starts by assuming that all switch inputs are
valid, so all the bits of switches are set to one. The inputs are then
scanned and combined with the previous state in the FOR-NEXT loop. Since
the inputs are active low (zero when pressed), the one's complement operator
inverts them, and the AND operator (&) is used to update the current state.
For a switch to be counted as valid, it must remain pressed through the
entire FOR-NEXT loop.
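The same AND-accumulation can be written in C. In this illustrative sketch the samples are captured beforehand, one byte per reading, with the four active-low inputs in the low bits:

```c
#include <stdint.h>

/* C rendering of Debounce_Switches: a bit survives in the result only if
 * its (active-low) switch stayed pressed across every sample. */
static uint8_t debounce_switches(const uint8_t *samples, int n)
{
    uint8_t valid = 0x0F;                 /* assume all four inputs valid */
    for (int i = 0; i < n; ++i)
        valid &= (uint8_t)~samples[i];    /* invert active-low, then AND  */
    return valid;
}
```

A switch that bounces high in even one sample is dropped from the result, exactly as in the FOR-NEXT loop above.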
These de-bouncer ICs sample the switch input over a number of sampling
periods; the output does not change until the input has been stable for a
duration of 40 ms.
The under-voltage lockout circuitry ensures that the outputs are at the
correct state on power-up. While the supply voltage is below the under-
voltage threshold, the de-bounce circuitry remains transparent. Switch
states are present at the logic outputs without delay.
Apart from the de-bounce circuitry, the above Maxim devices include +/-15 kV
ESD protection on all pins, guarding against electrostatic discharges
encountered during handling and assembly.
7. Power Management
7.1 Introduction
Today's designs require an increasing number of power rails and supply
solutions in a System-on-Chip, with loads ranging from a few uA for standby
supplies to hundreds of mA for voltage regulators. It is important to choose
the appropriate solution for the targeted application and to meet specified
performance requirements, such as high efficiency, tight printed circuit
board (PCB) space, accurate output regulation, fast transient response, low
solution cost, etc. Power management design is becoming a more frequent and
challenging task for system designers, many of whom do not have strong power
backgrounds.
This chapter is aimed at system engineers who may not be very familiar with
power supply design and selection. The basic operating principles of linear
regulators and SMPS are explained, and the advantages and disadvantages of
each solution are discussed. The chapter then covers power supply design
models and considerations for embedded systems, aiming at the most optimal
solution for the target application based on power targets, efficiency, and
area tradeoffs.
Let’s start with a simple example. Let’s say in an embedded system, a 12V
bus rail is available from the front-end power supply. On the system board,
a 3.3V voltage is needed to power an operational amplifier (op amp). The
simplest approach to generate the 3.3V is to use a resistor divider from the
12V bus, as shown in Figure 7-1. Does it work well? The answer is usually
“No”. The op amp’s VCC pin current may vary under different operating
conditions. If a fixed resistor divider is used, the chip VCC voltage varies
with load. Besides, the 12V bus input may not be well regulated. There may
be many other loads in the same system sharing the 12V rail. Because of
the bus impedance, the 12V bus voltage varies with the bus loading
conditions. As a result, a resistor divider cannot provide a regulated 3.3V
to the op amp to ensure its proper operation. Therefore, a dedicated voltage
regulation loop is needed.
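The load sensitivity is easy to quantify: model the op amp's supply current as a load resistance in parallel with R2 of the Figure 7-1 divider. The function and the component values below are illustrative assumptions:

```c
/* Output of the Figure 7-1 resistor divider once the load draws current,
 * modeled as a load resistance in parallel with R2. Values illustrative. */
static double divider_vout(double vin, double r1, double r2, double r_load)
{
    double r2_eff = (r2 * r_load) / (r2 + r_load);   /* R2 || Rload */
    return vin * r2_eff / (r1 + r2_eff);
}
```

With R1 = 2.64 kOhm and R2 = 1 kOhm, the unloaded output is about 3.3 V, but a 1 kOhm load drags it below 2 V, which is exactly why a dedicated regulation loop is needed.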
Figure 7-1: Resistor Divider Generates 3.3VDC from 12V Bus Input
As shown in Figure 7-2, the feedback loop needs to adjust the top resistor
R1 value to dynamically regulate the 3.3V on VCC.
Figure 7-2: Feedback Loop Adjusts Series Resistor R1 Value to Regulate 3.3V [23]
Conceptually, a linear regulator operates its pass transistor as a variable
resistor in series with the output load. To establish the feedback loop, an
error amplifier senses the DC output voltage via a sampling resistor network
RA and RB, and compares the feedback voltage VFB with a reference voltage
VREF. The error amplifier's output voltage drives the base of the series
power transistor via a current amplifier.
When either the input VBUS voltage decreases or the load current increases,
the VCC output voltage goes down. The feedback voltage VFB decreases as
well. As a result, the feedback error amplifier and current amplifier generate
more current into the base of the transistor Q1. This reduces the voltage
drop VCE and hence brings back the VCC output voltage, so that VFB equals
VREF. On the other hand, if the VCC output voltage goes up, in a similar
way, the negative feedback circuit increases VCE to ensure the accurate
regulation of the 3.3V output. In summary, any variation of VO is absorbed
by the linear regulator transistor’s VCE voltage. So the output voltage VCC is
always constant and well regulated.
(Figure: linear regulator schematic. VIN = 12V bus feeds the collector of
the series pass transistor; a current amplifier drives its base, and VCE is
the collector-emitter drop. The emitter provides VO = 3.3V on VCC across
output capacitor C0 and the load. The error amplifier compares VFB, taken
from the RA/RB divider at node VX, against VREF.)
Since the entire load current flows through the pass transistor, its power
dissipation is PLOSS = (VIN - VO) x IO. The efficiency of a linear regulator
can thus be quickly estimated by:

eta = POUT / (POUT + PLOSS) = (VO x IO) / (VO x IO + (VIN - VO) x IO) = VO / VIN
So in the Table 1-1 example, when the input is 12V and the output is 3.3V,
the linear regulator efficiency is just 27.5%. In this case 72.5% of the
input power is simply wasted as heat in the regulator. This means that the
transistor must have the thermal capability to handle its worst-case power
dissipation, at maximum VIN and full load. So the size of the linear
regulator and its heat sink may be large, especially when VO is much less
than VIN.
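The arithmetic above is worth wiring into a quick helper. This is a sketch for an ideal regulator, with ground-pin current neglected:

```c
/* Efficiency and dissipation of an ideal linear regulator, from
 * eta = VO / VIN and PLOSS = (VIN - VO) * IO derived above. */
static double linreg_efficiency(double vin, double vout)
{
    return vout / vin;
}

static double linreg_power_loss(double vin, double vout, double iout)
{
    return (vin - vout) * iout;   /* dissipated as heat in the transistor */
}
```

For 12 V in and 3.3 V out, the efficiency is 27.5%, and at 1 A of load current the pass transistor must dissipate 8.7 W.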
It is also clear that a linear regulator or an LDO can only provide step-down
DC/DC conversion. In applications that require VO voltage to be higher
than VIN voltage, or need negative VO voltage from a positive VIN voltage,
linear regulators obviously do not work.
Unlike linear regulators, which can only step down an input, SMPS are
attractive because a topology can be selected to fit nearly any output
voltage.
a) Buck Converter
In the generic buck configuration, the switch controls the current flowing
into the inductor. The inductor stores the energy for the load.
b) Boost Converter
The generic boost configuration steps up the voltage since the inductor is
placed prior to the switch.
The boost converter, also known as the step-up converter, takes a DC input
voltage and produces a DC output voltage that’s higher in value than the
input but of the same polarity (Figure 7-5). Linear regulators cannot provide
this feature.
c) Buck-Boost Converter
e) CUK Converter
The generic CUK configuration can output a voltage that is either greater
or less than the input voltage magnitude.
The CUK converter's output voltage can be greater than or less than the
input voltage magnitude (Figure 7-8). It uses a capacitor as its main
energy-storage component. By using inductors on both the input and the
output, the CUK converter produces very little input and output current
ripple, and its electromagnetic interference (EMI) radiation is minimized.
g) Flyback Converter
The flyback converter is the most versatile of all the topologies (Figure
7-10). It allows for one or more output voltages, some of which may be
opposite in polarity. Additionally, it is very popular in battery-powered
systems. It provides isolation as well.
h) Forward Converter
The forward converter is a buck regulator with a transformer inserted
between the buck switch and the load (Figure 7-11). It provides both higher
and lower voltage outputs as well as isolation. It also might be more energy
efficient than a flyback converter [24].
i) Push-Pull Converter
j) Half-Bridge Converter
k) Full-Bridge Converter
The full-bridge converter provides isolation from the AC line (Figure 7-14).
The pulse-width modulation (PWM) control circuitry is referenced to the
output ground, requiring a dedicated voltage rail to run the control circuits.
The base drive voltages for the switch transistors have to be transformer-
coupled because of the required isolation [24].
Figure 7-15: Buck, Boost, and Buck-Boost compose the fundamental SMPS
topologies [21]
The on and off states of the MOSFET divide the SMPS circuit into two
phases: a charge phase and a discharge phase, both of which describe the
energy transfer of the inductor (see the path loops in Figure 7-15). Energy
stored in the inductor during the charging phase is transferred to the output
load and capacitor during the discharge phase. The capacitor supports the
load while the inductor is charging and sustains the output voltage. This
cyclical transfer of energy between the circuit elements maintains the output
voltage at the proper value, in accordance with its topology.
The inductor is central to the energy transfer from source to load during
each switching cycle. Without it, the SMPS would not function when the
MOSFET is switched. The energy (E) stored in an inductor (L) is
dependent upon its current (I):
E = (1/2) × L × I²    (1)

The change in inductor current, when a voltage VL is applied across the inductor for a time ΔT, follows from the same relationship:

ΔIL = (VL × ΔT) / L    (2)
Figure 7-16: Voltage and current characteristics for a steady-state
inductor [21]
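Equations (1) and (2) can be sketched numerically; the component values below (22 µH inductor, 200 kHz buck converter stepping 12 V down to 3.3 V) are illustrative assumptions, not taken from the text:

```python
L = 22e-6      # 22 uH inductor (assumed)
I = 1.5        # 1.5 A inductor current (assumed)
V_L = 8.7      # volts across the inductor during charge phase: 12 V - 3.3 V
dT = 1.375e-6  # on-time: D * Ts = (3.3/12) * (1 / 200 kHz)

E = 0.5 * L * I**2   # equation (1): energy stored in the inductor
dI = V_L * dT / L    # equation (2): current ramp over the charge phase

print(f"Stored energy: {E*1e6:.2f} uJ")    # 24.75 uJ
print(f"Current ripple: {dI*1e3:.0f} mA")  # 544 mA
```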
During the charge phase, the MOSFET is on, the diode is reverse biased,
and energy is transferred from the voltage source to the inductor (Figure
7-15). Inductor current ramps up because VL is positive. Also, the output
capacitance transfers the energy it stored from the previous cycle to the
load in order to maintain a constant output voltage. During the discharge
phase, the MOSFET turns off, and the diode becomes forward biased and,
therefore, conducts. Because the source is no longer charging the inductor,
the inductor's terminals swap polarity as it discharges energy to the load and
replenishes the output capacitor (Figure 7-15). The inductor current ramps
down as it imparts energy, according to the same transfer relationship given
previously.
The ripple current must be filtered out by the SMPS in order to deliver true
DC current to the output. This filtering action is accomplished by the
output capacitor, which offers little opposition to the high-frequency AC
current. The unwanted output-ripple current passes through the output
capacitor, and maintains the capacitor's charge as the current passes to
ground. Thus, the output capacitor also stabilizes the output voltage. In
non-ideal applications, however, equivalent series resistance (ESR) of the
output capacitor causes output-voltage ripple proportional to the ripple
current that flows through it.
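The ESR contribution to output ripple is simply the ripple current times the ESR. A rough sketch, with assumed (illustrative) capacitor ESR and ripple-current values:

```python
esr = 0.05       # 50 mOhm output-capacitor ESR (assumed)
delta_i = 0.544  # 544 mA peak-to-peak ripple current (assumed)

# Output-voltage ripple contributed by the ESR: V_ripple = ESR * delta_I
v_ripple = esr * delta_i
print(f"ESR-induced output ripple: {v_ripple*1e3:.1f} mV p-p")  # 27.2 mV
```

This is why low-ESR capacitors are favored on SMPS outputs: for a fixed ripple current, output-voltage ripple scales directly with ESR.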
So, in summary, energy is shuttled between the source, the inductor, and
the output capacitor to maintain a constant output voltage and to supply
the load. But, how does the SMPS's energy transfer determine its output
voltage-conversion ratio? This ratio is easily calculated when steady state is
understood as it applies to periodic waveforms.
In steady state, the inductor current at the beginning of the PWM period must
equal the inductor current at the end. This means that the change in inductor current during
the charge phase (ΔICHARGE) must equal the change in inductor current
during the discharge phase (ΔIDISCHARGE). Equating the change in inductor
current for the charge and discharge phases, an interesting result is
achieved, which is also referred to as the volt-second rule:
|ΔICHARGE| = |ΔIDISCHARGE|

|VL(CHARGE)| × D × TS = |VL(DISCHARGE)| × (1 − D) × TS    (3)
Simply put, the inductor voltage-time product during each circuit phase is
equal. This means that, by observing the SMPS circuits of Figure 7-15, the
ideal steady-state voltage-/current-conversion ratios can be found with
little effort. For the step-down circuit, a Kirchhoff's voltage loop around
the charge phase circuit reveals that inductor voltage is the difference
between VIN and VOUT. Likewise, inductor voltage during the discharge
phase circuit is -VOUT. Using the volt-second rule from equation 3, the
following voltage-conversion ratio is determined:
VOUT / VIN = D    (4)
Further, input power (PIN) equals output power (POUT) in an ideal circuit.
Thus, the current-conversion ratio is found:
PIN = POUT

IIN / IOUT = VOUT / VIN = D    (5)
From these results, it is seen that the step-down converter reduces VIN by
a factor of D, while input current is a D-multiple of load current. Table 7-1
lists the conversion ratios for the topologies depicted in Figure 7-15.
Generally, all SMPS conversion ratios can be found with the method used
to solve equations 3 and 5, though complex topologies can be more difficult
to analyze.
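The ideal conversion ratios for the three topologies of Figure 7-15 can be written down directly; the 12 V input and D = 0.275 below are illustrative values:

```python
# Ideal steady-state output voltages from the volt-second rule (equation 3).
def buck(v_in, d):
    return v_in * d              # step-down: VOUT = D * VIN

def boost(v_in, d):
    return v_in / (1 - d)        # step-up: VOUT = VIN / (1 - D)

def buck_boost(v_in, d):
    return -v_in * d / (1 - d)   # inverting: VOUT = -VIN * D / (1 - D)

v_in, d = 12.0, 0.275
print(f"Buck:       {buck(v_in, d):.2f} V")        # 3.30 V
print(f"Boost:      {boost(v_in, d):.2f} V")       # 16.55 V
print(f"Buck-Boost: {buck_boost(v_in, d):.2f} V")  # -4.55 V
```

The same duty cycle yields a step-down, a step-up, and an inverted output depending only on how the switch, inductor, and diode are arranged.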
Embedded devices can be grouped into four categories based on their power source:
- Wall powered
- Wall powered with battery backup
- Primarily battery powered
- Fully battery powered
Wall-powered devices operate entirely on the supply available from wall
power. They typically consume more power and work in tandem with systems
that themselves consume a lot of power, so they would be redundant anyway
when the underlying system cannot be powered on. Many devices in use fall
under this category, including medical devices, industrial systems, etc.
Devices with battery backup are very similar to the above case but have a
limited power backup using batteries. This backup is useful to shut down
the system properly and to store the system configuration and acquired
values safely until full power returns.
The most common example of primarily battery-powered devices is the mobile
phone. Such devices are designed to work primarily from a battery supply
and can be recharged whenever needed; they incorporate full-fledged battery
charging and management circuitry.
Fully battery-powered devices are designed to work only from a battery
supply with no charging mechanism. The batteries either have to be charged
externally or are non-rechargeable, like a coin cell.
Apart from these, many other power sources are used in embedded systems,
including photovoltaic (solar) power. With wearable computing becoming a
trend, power supplies now include energy generated from unconventional
sources such as a smartphone's audio jack, human/mechanical movement, or
even body heat.
Figure 7-17 shows a typical power supply design for wall-powered devices
with battery backup.
The DC power input from the wall socket is used to power the system. If
the wall power is absent, the battery powers the system. The Power path
controller routes power from the preferred source. The power conditioning
circuit finally supplies the load at the required voltage and current. A
battery monitoring and charger circuit is necessary for managing the
battery.
Wall power is obtained from an AC wall adapter plugged into the wall socket.
It converts the high-voltage AC from the wall socket into a constant
low-voltage DC suitable for running the system. The main factors to consider
when selecting the wall power are voltage and current. The voltage supplied
should be high enough to satisfy the input voltage requirement of the power
conditioning circuit, which usually comprises linear or SMPS regulators.
Also, if the power supply system incorporates battery charging, the voltage
requirement of the battery charger should be taken into account.
The power supply for these systems is usually big and integrated separately
instead of being part of the embedded SoC. A good example is a laptop
battery.
Battery-powered systems cover a wide range of embedded systems, all the way
from low-end systems that draw very low current and run bare-metal code or
an RTOS, to high-end systems running sophisticated multimedia and an
operating system like Linux.
[Figures 7-18 and 7-19 (diagrams not reproduced): single-supply schemes in
which a 3.3V/3.0V source powers the SoC's VDD_EXT rail and the system load
directly.]
This type of scheme is well suited for low-end microcontrollers where the
main requirements are low power and cost, and where the system load on each
rail is low enough that efficiency loss due to the regulator is not really a
consideration.
A common source for the single 3.3V/3.0V supply is an external 3.3V regulator.
Figure 7-20: Single-supply USB-powered regulation with on-chip USB regulator

Figure 7-21: Single-supply USB-powered regulation with external USB regulator
For highly integrated systems where cost and small form factor are the
highest priorities, there are specific advantages to integrating the USB
regulator on-chip; however, both USB cases are limited by the amount of
current that can be sourced from the USB cable.
NOTE: When powered with USB cable, there is an absolute limit of 150 mA that
can be sourced from the USB cable.
These systems may also extend to include some external volatile memory such
as DDR (DDR2, DDR3, LPDDR2, or similar), which requires a separate supply
for the DDR I/Os and for the external DRAM memory, which must be powered
separately outside the SoC. This can be done with another external regulator
dedicated to the DDR supply that also powers the external DRAM memory.
Figure 7-22: Multi-Supply embedded system with separate supply for DDR
Since there could be low-power modes in which the DRAM is powered off while
the rest of the system remains powered, it makes sense to decouple the DRAM
supply from the core supply, even if both require the same voltage: the DDR
regulator can then be switched off in low-power modes, lowering system
current. There are other good reasons to do so as well.
NOTE: One common source for 1.8V could be a rechargeable 1.8V Li-Ion battery
(not shown); however, systems running off a 1.8V battery would generally be
DDR-less low-power applications.
Figure 7-24: DC-DC Converter on 1.2V supply for higher power efficiency
For higher-load supplies, such as the core supply in a SoC, an SMPS buck
converter provides much higher efficiency. If the difference between the
source supply and the generated supply is large, a linear regulator on a
high-load rail would be very inefficient (Figure 7-25).
In this particular case, using an internal linear regulator on the high-load
supply (1.2V) would mean reduced efficiency.
Efficiency (η) = VO / VIN = 1.2 V / 4.25 V ≈ 28%
So for a 200 mA load on the 1.2V supply (= 240 mW), the system power drawn
from the Lithium-Ion battery would still be 4.25 V × 200 mA = 850 mW.
For a higher-efficiency system, one may consider including an SMPS (DC-DC)
on the 1.2V supply (Figure 7-26). The main difference from Figure 7-24 is
that the former is powered by a 3.3V external regulator while the latter
(Figure 7-26) is powered directly from the Lithium-Ion battery (3.7-4.2V).
Since DC-DC converters are highly efficient (above 90%), using one for
higher loads lowers system power and thus increases battery life.
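The LDO-versus-buck trade-off can be sketched numerically using the figures from the text (4.25 V battery, 1.2 V/200 mA core load) plus an assumed 90% buck efficiency:

```python
v_bat, v_core, i_core = 4.25, 1.2, 0.200
p_load = v_core * i_core     # 240 mW delivered to the load either way

p_ldo = v_bat * i_core       # an LDO draws the full load current from the battery
p_buck = p_load / 0.90       # buck at an assumed 90% efficiency

print(f"Load power:         {p_load*1e3:.0f} mW")  # 240 mW
print(f"Battery draw, LDO:  {p_ldo*1e3:.0f} mW")   # 850 mW
print(f"Battery draw, buck: {p_buck*1e3:.0f} mW")  # 267 mW
```

On this rail the buck draws roughly a third of the battery power the LDO does, which is the battery-life argument made above.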
Embedded systems in this category that consume higher current will generally
rely on external power management ICs (PMICs) to provide the different
supply voltages for the SoC with the highest power efficiency.
Often, not all supplies can come from the PMIC, depending on the PMIC
selected and on the number of tunable output supplies available versus what
the SoC requires. An example is shown in Figure 7-27, where the 1.8V supply
is not available from the PMIC and is generated internally through an
on-chip LDO.
Figure 7-27: Efficient power system with external PMIC for high-load
embedded systems
This would still keep the power system very efficient if the load on the
1.8V LDO is "low". For a higher load, an internal DC-DC on that supply may
be necessary (not shown).
For more complex systems, there could be scenarios where the PMIC must be
turned off during low-power modes with only part of the SoC operational. One
way to do this efficiently is to have a separate regulator (separate from
the PMIC) that powers only the "Always ON" logic that must remain enabled
during low-power modes (Figure 7-28). This provides the best combination of
low power (since the PMIC remains off during low-power modes) and fast
recovery time from low-power modes (since a standalone regulator has a
faster response time than a PMIC).
Figure 7-28: Combination of PMIC and internal LDOs for a power efficient
embedded system.
Another example where this may be very useful is a dual-core system-on-chip
combining an application core with a real-time core for housekeeping and
low-power operation. Here, the application core can run from the external
PMIC while the real-time core remains decoupled and relies on an internal
LDO, providing a good combination of low power and fast response time with
respect to wake-up from low-power modes.
NOTE: Most of the schemes shown in this section should be considered
examples rather than strict guidelines; a System-on-Chip may have several
restrictions such that one ends up with a different combination of power
schemes to meet target application needs.
[Figure (diagram not reproduced): a sensor-node power scheme in which a
step-down (buck) DC-DC converter takes a 2.2 to 5.5V input (VIN) and
generates a 1.8 to 3.6V main supply (VOUT) plus a switched supply for an
MCU SoC with its control interface, ADC, radio, and acceleration,
temperature, and proximity sensors.]
Though the LCD driver is shown separate, it could be part of the same SoC;
the same applies to the ADC and radio. The need for a small form factor will
eventually push all these components, except the LCD display, into the same
SoC.
Most phones today operate on a single-cell Li-Ion battery, which has a
maximum fully charged voltage of 4.2V. If the cellular phone manufacturer
requires the phone to operate with the battery removed and the charger
plugged in, the maximum input voltage of the system can be higher, depending
on how the battery charger is implemented. In the past, the voltage
regulator function was implemented using discrete low-dropout linear
regulators (LDOs); however, today most phones are built using more
integrated power management solutions that include a large number of
regulators (LDOs and switching regulators), battery chargers, sequencing
circuitry, and supervisory and housekeeping circuitry [27].
[Figure (diagram not reproduced): a phone power scheme in which the baseband
processor controls the power management solution over an SPI/IIC serial
interface; buck DC-DC converters power the baseband processor domains, while
LDOs power the RF and memory rails.]
The Li-Ion charger can safely charge and maintain a single-cell Li-Ion
battery operating from an AC adapter. Some chargers integrate a power FET
with thermally regulated charging to provide an efficient charging rate at a
given ambient temperature.
High power efficiency is one of the key requirements for tablets, to enable
longer hours of operation on a single charge.
Buck regulators are specifically useful for high-current load blocks; for
example, a separate buck regulator output can be dedicated to each processor
core and memory island in different power domains. The USB switch enables
the use of a single mini- or micro-USB connector for USB, UART, and audio
connections, switching the relevant signals to the connector depending on
the type of device connected. In addition, the MC34708 also integrates a
real-time clock, a coin-cell charger, a 13-channel 10-bit ADC, a 5V USB
boost regulator, two PWM outputs, a touch-screen interface, status LED
drivers, and four GPIOs [28].
Ambient energy sources can be broadly divided into direct current (DC)
sources and alternating current (AC) sources. DC sources include
harvesting energy from sources that vary very slowly with time, such as light
intensity and thermal gradients using solar panels and thermoelectric
generators respectively [29]. The output voltage of these harvesters does
not have to be rectified.
[Figure (diagram not reproduced): a typical energy-harvesting PMIC. An
energy harvester (e.g., a PV module) feeds an interface with maximum power
point tracking (MPPT) and a charger; a battery or supercapacitor stores the
energy, a regulator powers the microcontroller, and cold-start and battery
management blocks complete the PMIC.]
To extract the maximum power from a solar panel, the panel must be
operated at its maximum power point. A solar panel can be modeled as a
reverse-biased diode that delivers current in parallel with a parasitic
capacitance (C). The current output of the diode is proportional to the light
intensity.
The cold-start unit is an optional block that may or may not be present in
a typical energy-harvesting PMIC. The function of the cold-start unit is to
bootstrap the system when there is insufficient energy stored in the storage
element. The design of the cold-start unit is application dependent. For
solar applications, an input-powered (as opposed to a battery-powered)
oscillator can be used to drive the switches of a temporary low efficiency
switching converter. Once sufficient energy has been built up in the energy
buffer, the highly efficient switching converter can take over [29].
8. References