Embedded System Final

Introduction
An embedded system is some combination of computer hardware and software, either fixed in
capability or programmable, that is designed for a specific function or for specific functions within a
larger system. Industrial machines, agricultural and process industry devices, automobiles, medical
equipment, cameras, household appliances, airplanes, vending machines and toys as well as mobile
devices are all possible locations for an embedded system.
Characteristics
• Embedded systems are designed to do a specific task, unlike a general-purpose
computer, which handles multiple tasks
• Embedded systems are typically designed to meet real time constraints
• Embedded systems are not always standalone devices. Many embedded systems consist of
small parts within a larger device that serves a more general purpose.
o For example, the Gibson Robot Guitar features an embedded system for tuning the strings,
but the overall purpose of the Robot Guitar is, of course, to play music. Similarly, an
embedded system in an automobile provides a specific function as a subsystem of the
car itself.
• The program instructions written for embedded systems are referred to as firmware and are stored
in read-only memory or flash memory chips.
• They run with limited computer hardware resources: little memory and a small or non-existent
keyboard or screen.
• The user interfaces of embedded systems range from no user interface at all, in systems dedicated
to only one task, to complex graphical user interfaces. Simple embedded devices use buttons,
LEDs or character LCDs.
• Some embedded systems provide a user interface remotely over a serial or network
connection.
• An embedded system uses either a microprocessor or a microcontroller as its processor.
• Embedded systems talk with the outside world via peripherals such as serial communication
interfaces (RS-232, RS-485), USB, synchronous serial communication interfaces (I2C, SPI),
multimedia cards (SD cards), networks (Ethernet, LonWorks), fieldbuses (CAN bus), GPIO
and so on.
Classification of Embedded Systems
Embedded systems can be classified into different types based on their performance and functional
requirements, and on the performance of the microcontroller.
Embedded systems are classified into four categories based on their performance and functional
requirements:
• Stand alone embedded systems
• Real time embedded systems
• Networked embedded systems
• Mobile embedded systems
Embedded systems are classified into three types based on the performance of
the microcontroller:
• Small scale embedded systems
• Medium scale embedded systems
• Sophisticated embedded systems
Stand Alone Embedded Systems
A stand alone embedded system does not require a host system such as a computer; it works by itself. It takes
input from its input ports, either analog or digital, processes, calculates and converts the data,
and outputs the resulting data to the connected devices, which it controls, drives or uses as a
display. Examples of stand alone embedded systems are mp3 players, digital
cameras, video game consoles, microwave ovens and temperature measurement systems.
Real Time Embedded Systems
A real time embedded system is a system that gives the required output within a particular time.
These embedded systems follow time deadlines for the completion of a task. Real time
embedded systems are classified into two types: soft and hard real time systems.
Networked Embedded Systems
These embedded systems are connected to a network to access resources. The connected
network can be a LAN, a WAN or the internet. The connection can be wired or wireless. This type of
embedded system is the fastest growing area in embedded system applications. The embedded web
server is a type of system wherein all embedded devices are connected to a web server and accessed
and controlled by a web browser. An example of a LAN networked embedded system is a home security
system wherein all sensors are connected and run on the TCP/IP protocol.
Mobile Embedded Systems
Mobile embedded systems are used in portable embedded devices like cell phones, digital
cameras, mp3 players and personal digital assistants. The basic limitation of these devices is their
limited memory and other resources.
Small Scale Embedded Systems
These embedded systems are designed with a single 8- or 16-bit microcontroller and may
even be battery operated. For developing embedded software for small scale embedded systems,
the main programming tools are an editor, assembler, cross assembler and integrated development
environment (IDE).
Medium Scale Embedded Systems
These embedded systems are designed with a single or a few 16- or 32-bit microcontrollers, RISCs or DSPs.
They have both hardware and software complexities. For developing
embedded software for medium scale embedded systems, the main programming tools are C, C++,
Java, Visual C++, an RTOS, a debugger, a source code engineering tool, a simulator and an IDE.
Sophisticated Embedded Systems
These embedded systems have enormous hardware and software complexities and may need
ASIPs, IPs, PLAs, or scalable or configurable processors. They are used for cutting-edge applications
that need hardware and software co-design and components that have to be assembled in the final
system.
Components of embedded system

i) Hardware
• Power Supply
• Processor
• Memory
• Timers
• Serial communication ports
• Input/Output circuits
• System application specific circuits
ii) Software: The application software is required to perform the series of tasks.
The software of an embedded system is designed with three constraints in view:
• Available system memory
• Available processor speed
• The need to limit power dissipation when running the system continuously in cycles of
wait for events, run, stop and wake up.
iii) Real Time Operating System (RTOS): It supervises the application software
and provides a mechanism that lets the processor run processes according to a schedule and
switch from one process (task) to another.
Applications of Embedded Systems:
Embedded systems are used in different applications like automobiles, telecommunications, smart
cards, missiles, satellites, computer networking and digital consumer electronics.
Embedded Systems in Automobiles and in telecommunications
• Motor and cruise control system

• Body or Engine safety
• Entertainment and multimedia in car
• E-Com and Mobile access
• Robotics in assembly line
• Wireless communication
• Mobile computing and networking
Embedded Systems in Smart Cards, Missiles and Satellites
• Security systems
• Telephone and banking
• Defense and aerospace
• Communication
Embedded Systems in Peripherals & Computer Networking
• Displays and Monitors

• Networking Systems
• Image Processing
• Network cards and printers
Embedded Systems in Consumer Electronics
• Digital Cameras
• Set top Boxes
• High Definition TVs
• DVDs
Skills required for an embedded system designer
1. Skills for small scale embedded system designer
Full understanding of the microcontroller, with a basic knowledge of computer architecture, digital
electronics design, software engineering, data communication, control engineering, motors and
actuators, sensors and measurements, analog electronic design and IC design and manufacture.
For specific situations, specific knowledge such as control system engineering is needed.
2. Skills for medium scale embedded system designer
a. Tasks and their scheduling by RTOS
b. Cooperative scheduling (the OS does not initiate context switching; each task yields voluntarily)
and preemptive scheduling (temporarily interrupting a task being carried out by a computer system,
without requiring its cooperation, and with the intention of resuming the task at a later time)
c. Inter-processor communication functions
d. Use of shared data, and programming of critical sections and re-entrant functions
e. Use of semaphores, mailboxes, queues and pipes
f. Handling of interrupt latencies and meeting task deadlines
g. Use of various RTOS functions
h. Use of physical and virtual device drivers
3. Skills for sophisticated embedded system designer
a. Co-design, and solving the high level of complexity of the hardware and software design.
b. Full skills in hardware units and basic knowledge of C, RTOS and other programming
tools
Processor
A processor is an electronic circuit that performs operations on some external data source, usually memory
or some other data stream.
A processor has two essential units:
i) Program flow control unit (CU)
ii) Execution unit (EU)
The CU includes the fetch unit for fetching instructions from memory.
The EU has circuits that implement the instructions pertaining to data transfer operations and data conversion
from one form to another. It includes the arithmetic and logic unit (ALU) and also the circuits that execute
instructions for program control tasks (halt, interrupt, jump).
An embedded system processor chip or core can be one of the following:
1. General purpose processor (GPP)
a. Microprocessor
b. Microcontroller
c. Embedded processor
d. Digital signal processor (DSP)
2. Application specific system processor (ASSP)
3. Multiprocessor system using general purpose processor (GPP)
1. Microprocessor
A microprocessor is a single VLSI chip that has a CPU and may also have other units such as caches, a
floating point arithmetic unit, and pipelining and superscalar units for fast processing of instructions.
2. Microcontroller
A microcontroller is a single-chip VLSI unit which, though it has limited computational capabilities, possesses
enhanced input/output capabilities and a number of on-chip functional units.
3. Embedded processor
When a microcontroller or microprocessor is specially designed to have the following capabilities, the term
embedded processor is preferred to microcontroller or microprocessor.
1. Fast context switching
2. Atomic ALU operations (atomic operations in concurrent programming are program operations that run
completely independently of any other processes)
3. RISC core for fast, more precise and intensive calculations by the embedded software
4. Digital signal processor (DSP)
A DSP is an essential unit of an embedded system for a large number of applications that need signal
processing, for example image processing, multimedia, video, HDTV, DSP modem and telecommunication
processing systems.
A DSP provides fast, discrete-time signal processing instructions.
5. Application specific system processors (ASSPs)
An ASSP is an application-dependent system processor used for processing the signals of an embedded system. The
processor is dedicated to a specific application market. For example, a processor for real time video processing
can be used in digital televisions, set-top boxes, DVD players, video-conferencing and other systems.
6. Multi-processor systems using general purpose processors (GPP)
In an embedded system, several processors may be needed to execute an algorithm fast and within a strict
deadline. For example, in real-time video processing, the number of MAC operations needed per second may
be more than one DSP unit can provide. An embedded system may then incorporate two or more processors
running in synchronization.
Other hardware units
1. Power source
Embedded system devices operate in one of the following supply voltage ranges:
a) 5.0 V ± 0.25 V
b) 3.3 V ± 0.3 V
c) 2.0 V ± 0.2 V
d) 1.5 V ± 0.2 V
The following points have to be taken care of while connecting the supply rails:
a) A processor may have more than two VDD and VSS pins. This distributes the power to all the sections
and reduces interference between the sections.
b) The supply should separately power the (a) external I/O driving ports, (b) timers, (c) clock and (d) reset
circuits.
An embedded system may need to be run continuously without being switched off; the system design,
therefore, is constrained by the need to limit power dissipation while it is running.
2. Clock oscillator circuits and clocking unit(s)
The clock controls the various clocking requirements of the CPU, the system timer and the CPU machine cycles.
The machine cycles are for
• Fetching the codes and data from memory
• Decoding and executing at the processor
• Transferring the results to memory
The clock controls the time for executing an instruction. The clock uses either a crystal (external to the processor),
a ceramic resonator (internally associated with the processor) or an external oscillator attached to the processor.
• The crystal resonator gives the highest frequency stability against temperature drift in the circuit.
• The internal ceramic resonator, if available in a processor, saves the use of an external crystal and
gives a reasonable though not very highly stable frequency.
• The external IC-based clock oscillator has a significantly higher power dissipation than the internal
processor resonator. However, it provides a higher driving capability, which might be needed when
various circuits of the embedded system are driven concurrently.
3. System clock
A timer circuit suitably configured is the system-clock. It is used by the schedulers and for real-time
programming. More than one timer using the system clock may be needed for various timing and counting
needs in a system
4. Reset circuit and watchdog timer
Reset means that the processor starts processing instructions from a starting address. The reset
circuit activates for a fixed period and then deactivates. The processor circuit keeps the reset pin active
and then deactivates it to let the program proceed from a default beginning address. Reset can be activated
by one of the following:
a. An external reset circuit that activates on power-up, on switching on the reset of the system or
on detection of a low voltage.
b. A software instruction, a time-out of a programmed timer known as the watchdog timer, or a clock
monitor detecting a slowdown below certain threshold frequencies due to a fault.
The watchdog timer is a timing device that resets the system after a predefined timeout. To avoid the
reset, the watchdog must be fed before the timeout occurs. The watchdog timer is activated within the
first few clock cycles after power-up. The watchdog timer rescues the system if a fault develops and the
program gets stuck.
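The feed-or-reset behaviour of the watchdog can be sketched as a small software model. This is only an illustration: the type `watchdog_t` and the functions `wdt_init`, `wdt_feed` and `wdt_tick` are hypothetical names, and a real watchdog is a hardware counter accessed through device-specific registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of a watchdog timer. */
typedef struct {
    uint32_t timeout_ticks; /* ticks allowed between two feeds */
    uint32_t counter;       /* ticks elapsed since the last feed */
} watchdog_t;

void wdt_init(watchdog_t *w, uint32_t timeout_ticks) {
    w->timeout_ticks = timeout_ticks;
    w->counter = 0;
}

/* Feeding the watchdog restarts the countdown. */
void wdt_feed(watchdog_t *w) {
    w->counter = 0;
}

/* Advance one clock tick; returns true when a reset would be triggered. */
bool wdt_tick(watchdog_t *w) {
    w->counter++;
    if (w->counter >= w->timeout_ticks) {
        w->counter = 0; /* the system restarts, so counting starts over */
        return true;
    }
    return false;
}
```

A healthy main loop would call `wdt_feed` once per pass; if the program gets stuck, the feeds stop and `wdt_tick` eventually reports the reset.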
5. Memories
The system uses various types of memory. The types of memory and their functions are listed
below:
Memory: Function
• ROM or EPROM: Stores application programs from where the processor fetches the instruction codes.
Stores codes for system booting, initializing, initial input data and strings. Stores codes for the RTOS
and pointers (addresses) of various service routines.
• RAM (internal and external): Stores variables during program run and stores the stack. Stores input
and output buffers, for example, for speech or image.
• EEPROM or Flash: Stores non-volatile results of processing.
• Caches: Store copies of instructions and data in advance from external memories and store results
temporarily during fast processing.
6. I/O ports
• The system uses input ports to get inputs from physical devices such as key-buttons,
sensors and transducer circuits.
• The system uses output ports to send output to various devices such as LEDs, LCDs, modems,
printers, alarms, actuators and so on.
• The system may get inputs from multiple channels or may have to send output to multiple
channels.
• For networking the system, there are different types of protocols such as I2C, CAN, USB, SPI and
PCI.
7. Interrupts Handler
An interrupt handling mechanism exists in each system to handle interrupts from various processes in the
system.
Important points regarding interrupt handling are as follows:
• There can be a number of interrupt sources and groups of interrupt sources in a processor. An
interrupt may be a hardware signal that indicates the occurrence of an event. A software interrupt
may arise or can be configured in some conditions.
• The system may prioritize the sources and service them accordingly.
• Certain sources are not maskable and cannot be disabled.
• The processor’s current program diverts to the interrupt service routine on the occurrence of the
interrupt.
8. PWM and ADC
PWM stands for Pulse Width Modulation and is a method of producing variable voltages by digital
means. PWM is a way of digitally encoding analog signal levels.
Figure 1: PWM signals of varying duty cycles
Figure 1 shows three different PWM signals. Figure 1a shows a PWM output at a 10% duty cycle. That is,
the signal is on for 10% of the period and off the other 90%. Figures 1b and 1c show PWM outputs at 50%
and 90% duty cycles, respectively. These three PWM outputs encode three different analog signal values,
at 10%, 50%, and 90% of the full strength. If, for example, the supply is 9V and the duty cycle is 10%, a
0.9V analog signal results.
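The relation between duty cycle and the resulting analog level can be written as V_avg = V_supply × duty. A minimal sketch in C follows; the function names are illustrative only, and the result is the ideal average value, ignoring filtering and switching losses.

```c
/* Ideal average output voltage of a PWM signal.
   duty_percent is the on-time as a percentage of the period. */
double pwm_average_voltage(double supply_volts, double duty_percent) {
    return supply_volts * duty_percent / 100.0;
}

/* On-time in microseconds for a given period and duty cycle. */
double pwm_on_time_us(double period_us, double duty_percent) {
    return period_us * duty_percent / 100.0;
}
```

For the example in the text, `pwm_average_voltage(9.0, 10.0)` gives approximately 0.9 V.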
An Analog to Digital Converter (ADC) is a very useful feature that converts an analog voltage on a pin to a
digital number. By converting from the analog world to the digital world, we can begin to use electronics
to interface to the analog world around us.
ADCs can vary greatly between microcontrollers. The ADC on some microcontrollers is a 10-bit ADC, meaning
it has the ability to detect 1,024 (2^10) discrete analog levels. Some microcontrollers have 8-bit ADCs
(2^8 = 256 discrete levels) and some have 16-bit ADCs (2^16 = 65,536 discrete levels). Each conversion
completes within a definite conversion time.
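The level counts above follow from 2^n for an n-bit converter. The sketch below computes them and converts a raw reading to volts; it assumes an ideal ADC whose full-scale code 2^n - 1 corresponds to the reference voltage `vref`, and fewer than 32 bits. The function names are illustrative, not a particular vendor's API.

```c
#include <stdint.h>

/* Number of discrete levels of an n-bit ADC: 2^n (valid for bits < 32). */
uint32_t adc_levels(unsigned bits) {
    return (uint32_t)1u << bits;
}

/* Convert a raw ADC code to volts, assuming the full-scale code
   (2^n - 1) maps to the reference voltage vref. */
double adc_to_volts(uint32_t code, unsigned bits, double vref) {
    return vref * (double)code / (double)(adc_levels(bits) - 1u);
}
```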
9. LCD and LED displays

An LCD screen may show a multi-line display of characters or also show small graphs or icons. An LCD
needs little power. An LCD is a diode that absorbs or emits light on application of 3 V to 5 V, 50 or 60 Hz
voltage pulses with a current of less than 50 µA.
The LED is a diode that emits yellow, green or red light on application of a forward voltage. The LED is used
to indicate the system status.
10. Keypad/Keyboard
The keypad or keyboard is an important device for getting user inputs. The system must provide the
necessary interfacing and key-debouncing circuit as well as the software for the system to receive input
from the keys of the keypad or keyboard.
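Debouncing can also be done in software. The following is a minimal sketch of one common approach, a counter that accepts a new key state only after it has been read unchanged for several consecutive samples. The names `debounce_t`, `debounce_update` and the threshold `DEBOUNCE_SAMPLES` are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEBOUNCE_SAMPLES 4u /* hypothetical: equal samples needed to accept a change */

typedef struct {
    bool stable;   /* last accepted (debounced) key state */
    uint8_t count; /* consecutive samples that differ from stable */
} debounce_t;

/* Call at a fixed sampling interval with the raw key reading;
   returns the debounced key state. */
bool debounce_update(debounce_t *d, bool raw) {
    if (raw == d->stable) {
        d->count = 0;        /* no change, or the bounce has ended */
    } else if (++d->count >= DEBOUNCE_SAMPLES) {
        d->stable = raw;     /* the change persisted long enough */
        d->count = 0;
    }
    return d->stable;
}
```

The sampling interval and threshold together set the debounce time, so both would be tuned to the switch used.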
11. Modem/Transceiver
The system provides the necessary interface for the user to connect through a cable or wirelessly. A
transceiver is a circuit that can transmit as well as receive byte streams.
Embedded Systems Design: A Unified Hardware/Software Introduction
Chapter 2: Custom single-purpose processors
Outline
• Introduction
• Combinational logic
• Sequential logic
• Custom single-purpose processor design
• RT-level custom single-purpose processor design
Manish Man Shrestha
Cosmos College of management and technology
Introduction
• Processor
– Digital circuit that performs computation tasks
– Controller and datapath
– General-purpose: variety of computation tasks
– Single-purpose: one particular computation task
– Custom single-purpose: non-standard task
• A custom single-purpose processor may be
– Fast, small, low power
– But, high NRE, longer time-to-market, less flexible
Figure: digital camera chip with a lens, CCD, A2D, D2A, CCD preprocessor, pixel coprocessor, JPEG codec,
microcontroller, multiplier/accumulator, DMA controller, display controller, memory controller, ISA bus
interface, UART and LCD controller.

CMOS transistor on silicon
• Transistor
– The basic electrical component in digital systems
– Acts as an on/off switch
– Voltage at “gate” controls whether current flows from
source to drain
Figure: an IC package and the cross-section of a CMOS transistor on a silicon substrate, with the gate
above the oxide and the channel between source and drain; the transistor conducts if gate=1.

CMOS transistor implementations
• Complementary Metal Oxide Semiconductor
• We refer to logic levels
– Typically 0 is 0V, 1 is 5V
• Two basic CMOS types
– nMOS conducts if gate=1
– pMOS conducts if gate=0
– Hence “complementary”
• Basic gates
– Inverter, NAND, NOR
– Inverter: F = x'
– NAND gate: F = (xy)'
– NOR gate: F = (x+y)'
Figure: nMOS and pMOS transistor symbols and the CMOS implementations of the inverter, NAND gate and
NOR gate.

Basic logic gates
One-input gates:
x    Driver (F = x)    Inverter (F = x')
0    0                 1
1    1                 0
Two-input gates:
x y    AND       OR          XOR        NAND         NOR           XNOR
       F = xy    F = x+y     F = x⊕y    F = (xy)'    F = (x+y)'    F = (x⊕y)'
0 0    0         0           0          1            1             1
0 1    0         1           1          1            0             0
1 0    0         1           1          1            0             0
1 1    1         1           0          0            0             1

Combinational logic design
A) Problem description
y is 1 if a is 1, or b and c are 1. z is 1 if b or c is 1, but not both, or if all are 1.
B) Truth table
Inputs    Outputs
a b c     y z
0 0 0     0 0
0 0 1     0 1
0 1 0     0 1
0 1 1     1 0
1 0 0     1 0
1 0 1     1 1
1 1 0     1 1
1 1 1     1 1
C) Output equations
y = a'bc + ab'c' + ab'c + abc' + abc
z = a'b'c + a'bc' + ab'c + abc' + abc
D) Minimized output equations
K-map for y
      bc
a     00 01 11 10
0      0  0  1  0
1      1  1  1  1
y = a + bc
K-map for z
      bc
a     00 01 11 10
0      0  1  0  1
1      0  1  1  1
z = ab + b'c + bc'
E) Logic gates
Figure: gate-level circuits that produce y and z from the inputs a, b and c.
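One way to check the minimization is to compare the minimized equations with the sum-of-minterms equations on all eight input combinations. The C sketch below does that; the function names are chosen for illustration.

```c
#include <stdbool.h>

/* Output equations read directly from the truth table (sum of minterms). */
bool y_full(bool a, bool b, bool c) {
    return (!a && b && c) || (a && !b && !c) || (a && !b && c) ||
           (a && b && !c) || (a && b && c);
}

bool z_full(bool a, bool b, bool c) {
    return (!a && !b && c) || (!a && b && !c) || (a && !b && c) ||
           (a && b && !c) || (a && b && c);
}

/* Minimized equations obtained from the K-maps. */
bool y_min(bool a, bool b, bool c) { return a || (b && c); }
bool z_min(bool a, bool b, bool c) { return (a && b) || (!b && c) || (b && !c); }

/* Returns true if both forms agree on every input combination. */
bool equations_agree(void) {
    for (int i = 0; i < 8; i++) {
        bool a = (i >> 2) & 1, b = (i >> 1) & 1, c = i & 1;
        if (y_full(a, b, c) != y_min(a, b, c)) return false;
        if (z_full(a, b, c) != z_min(a, b, c)) return false;
    }
    return true;
}
```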

Combinational components
• n-bit, m x 1 Multiplexor: inputs I(m-1) … I1 I0 (n bits each), select lines S(log m) … S0, output O (n bits)
– O = I0 if S=0..00, I1 if S=0..01, …, I(m-1) if S=1..11
• log n x n Decoder: inputs I(log n -1) … I0, outputs O(n-1) … O1 O0
– O0 = 1 if I=0..00, O1 = 1 if I=0..01, …, O(n-1) = 1 if I=1..11
– With enable input e: all O’s are 0 if e=0
• n-bit Adder: inputs A and B (n bits each), outputs carry and sum
– sum = A+B (first n bits), carry = (n+1)’th bit of A+B
– With carry-in input Ci: sum = A + B + Ci
• n-bit Comparator: inputs A and B (n bits each), outputs less, equal and greater
– less = 1 if A<B, equal = 1 if A=B, greater = 1 if A>B
• n bit, m function ALU: inputs A and B (n bits each), select lines S(log m) … S0, output O (n bits)
– O = A op B, op determined by S
– May have status outputs carry, zero, etc.

Sequential components
• n-bit Register: input I (n bits), controls load and clear, output Q (n bits)
– Q = 0 if clear=1, I if load=1 and clock=1, Q(previous) otherwise
• n-bit Shift register: input I, control shift, output Q
– Q = lsb
– Content shifted
– I stored in msb
• n-bit Counter: controls count and clear, output Q (n bits)
– Q = 0 if clear=1, Q(prev)+1 if count=1 and clock=1

Sequential logic design: Example 1
Design a counter with the following binary sequence: 1, 2, 5, 7 and repeat. Use
JK flip-flops.
• Step 1: Since it is a three-bit counter, the number of flip-flops required is three
• Step 2: Flip-flop used: JK flip-flop
• Step 3: Let the three flip-flops be A, B, C
• Step 4: State diagram and state table
State diagram: 001 → 010 → 101 → 111 → 001
Present state    Next state
A B C            A+ B+ C+
0 0 1            0  1  0
0 1 0            1  0  1
1 0 1            1  1  1
1 1 1            0  0  1

Sequential logic design: Example 1 (Excitation table)
JK flip flop excitation table
Q Q+    J K
0 0     0 x
0 1     1 x
1 0     x 1
1 1     x 0
• Step 5: Excitation table
Present state    Next state    Flip flop input
A B C            A+ B+ C+      JA KA JB KB JC KC
0 0 1            0  1  0       0  x  1  x  x  1
0 1 0            1  0  1       1  x  x  1  1  x
1 0 1            1  1  1       x  0  1  x  x  0
1 1 1            0  0  1       x  1  x  1  x  0

Sequential logic design: Example 1 (K map)
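• Step 6: K map for JK states from excitation table
In the maps below, x is a don't care from the excitation table and - marks a state that is not part of
the sequence (blank in the map).
K-map for JA
      BC
A     00 01 11 10
0      -  0  -  1
1      -  x  x  -
JA = A’BC’
K-map for KA
      BC
A     00 01 11 10
0      -  x  -  x
1      -  0  1  -
KA = ABC
K-map for JB
      BC
A     00 01 11 10
0      -  1  -  x
1      -  1  x  -
JB = B’C
K-map for KB
      BC
A     00 01 11 10
0      -  x  -  1
1      -  x  1  -
KB = AC + A’BC’
K-map for JC
      BC
A     00 01 11 10
0      -  x  -  1
1      -  x  x  -
JC = A’BC’
K-map for KC
      BC
A     00 01 11 10
0      -  1  -  x
1      -  0  0  -
KC = A’B’C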
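The flip-flop input equations of Step 6 can be checked by simulating one clock edge at a time. The sketch below uses the JK characteristic equation Q+ = JQ’ + K’Q; the function names are illustrative, and the state is packed as the 3-bit value ABC.

```c
#include <stdbool.h>

/* Next output of a JK flip-flop: Q+ = J.Q' + K'.Q */
static bool jk_next(bool q, bool j, bool k) {
    return (j && !q) || (!k && q);
}

/* One clock edge of the counter; state is the 3-bit value ABC. */
unsigned counter_step(unsigned state) {
    bool a = (state >> 2) & 1, b = (state >> 1) & 1, c = state & 1;

    /* Flip-flop inputs from the K-maps of Step 6 */
    bool ja = !a && b && !c;
    bool ka = a && b && c;
    bool jb = !b && c;
    bool kb = (a && c) || (!a && b && !c);
    bool jc = !a && b && !c;
    bool kc = !a && !b && c;

    bool a2 = jk_next(a, ja, ka);
    bool b2 = jk_next(b, jb, kb);
    bool c2 = jk_next(c, jc, kc);
    return ((unsigned)a2 << 2) | ((unsigned)b2 << 1) | (unsigned)c2;
}
```

Starting from state 1 and calling `counter_step` repeatedly walks through 1, 2, 5, 7 and back to 1.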

Sequential logic design: Example 1 (Circuit diagram)
Step 7: Circuit diagram
Figure: three JK flip-flops A, B and C driven by a common clock CLK, with inputs JA, KA, JB, KB, JC and KC
and outputs A, A’, B, B’, C and C’.

Custom single-purpose processor basic model
Figure: controller and datapath. The controller receives external control inputs and datapath control
inputs and produces external control outputs and datapath control outputs. The datapath receives
external data inputs and produces external data outputs.
Figure: a view inside the controller and datapath. The controller contains the next-state and control
logic and a state register; the datapath contains registers and functional units.

Example: greatest common divisor
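• First create algorithm
• Convert algorithm to “complex” state machine
– Known as FSMD: finite-state machine with datapath
– Can use templates to perform such conversion
(a) black-box view
Figure: a GCD block with inputs go_i, x_i and y_i and output d_o.
(b) desired functionality
0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }
(c) state diagram
Figure: FSMD with states 1, 2, 2-J, 3, 4, 5, 6, 7, 8, 6-J, 5-J, 9 and 1-J. State 1 goes to 2 on condition 1
(the !1 exit is never taken). State 2 goes to 2-J on !go_i, which returns to 2, and to 3 (x = x_i) on
!(!go_i). State 3 goes to 4 (y = y_i) and then to 5. State 5 goes to 6 on x!=y and to 9 (d_o = x) on
!(x!=y). State 6 goes to 7 (y = y - x) on x<y and to 8 (x = x - y) on !(x<y). States 7 and 8 go to 6-J,
then 5-J, then back to 5. State 9 goes to 1-J, which returns to 1.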

Creating the datapath
• Create a register for any declared variable
• Create a functional unit for each arithmetic operation
• Connect the ports, registers and functional units
– Based on reads and writes
– Use multiplexors for multiple sources
• Create unique identifier
– for each datapath component control input and output
Figure: the FSMD state diagram next to the resulting datapath. In the datapath, two n-bit 2x1
multiplexors (selected by x_sel and y_sel) feed registers x (0: x) and y (0: y), loaded by x_ld and y_ld,
from the inputs x_i and y_i or from the subtractors. Functional units: != (5: x!=y) producing x_neq_y,
< (6: x<y) producing x_lt_y, subtractor 8: x-y and subtractor 7: y-x. Register d (9: d), loaded by d_ld,
drives d_o.

Creating the controller’s FSM
• Same structure as FSMD
• Replace complex actions/conditions with datapath configurations
State encoding and datapath configuration in each controller state:
0000 1:
0001 2:    branches on !go_i / !(!go_i)
0010 2-J:
0011 3:    x_sel = 0, x_ld = 1
0100 4:    y_sel = 0, y_ld = 1
0101 5:    branches on x_neq_y / !x_neq_y
0110 6:    branches on x_lt_y / !x_lt_y
0111 7:    y_sel = 1, y_ld = 1
1000 8:    x_sel = 1, x_ld = 1
1001 6-J:
1010 5-J:
1011 9:    d_ld = 1
1100 1-J:
Figure: the FSMD, the controller FSM with input go_i, and the datapath with inputs x_i and y_i, control
inputs x_sel, y_sel, x_ld, y_ld and d_ld, status outputs x_neq_y and x_lt_y, and output d_o.

Splitting into a controller and datapath
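Controller implementation model
Figure: combinational logic with inputs go_i, x_neq_y, x_lt_y and the current state Q3 Q2 Q1 Q0 from the
state register. It produces the outputs x_sel, y_sel, x_ld, y_ld and d_ld and the next state I3 I2 I1 I0.
(a) Controller
Figure: the controller FSM with states 0000 (1:), 0001 (2:), 0010 (2-J:), 0011 (3: x_sel = 0, x_ld = 1),
0100 (4: y_sel = 0, y_ld = 1), 0101 (5:), 0110 (6:), 0111 (7: y_sel = 1, y_ld = 1),
1000 (8: x_sel = 1, x_ld = 1), 1001 (6-J:), 1010 (5-J:), 1011 (9: d_ld = 1) and 1100 (1-J:). State 5
branches on x_neq_y=1 / x_neq_y=0 and state 6 on x_lt_y=1 / x_lt_y=0.
(b) Datapath
Figure: two n-bit 2x1 multiplexors (x_sel, y_sel), registers x and y (x_ld, y_ld), comparators
5: x!=y (x_neq_y) and 6: x<y (x_lt_y), subtractors 8: x-y and 7: y-x, and register 9: d (d_ld)
driving d_o.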
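The split into controller and datapath can be mimicked in software. The sketch below is an illustration under stated assumptions, not the slides' implementation: the controller is a `switch` over the thirteen states in the order of their encodings, the datapath is a struct holding the registers x, y and d, go_i is held at 1, and the inputs are assumed to be positive (the subtraction algorithm does not terminate if exactly one input is 0). All C identifiers other than the signal names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Controller states in the order of their encodings 0000..1100. */
enum { S1 = 0, S2, S2J, S3, S4, S5, S6, S7, S8, S6J, S5J, S9, S1J };

typedef struct { uint32_t x, y, d; } datapath_t;

/* Datapath: the 2x1 muxes pick the register source, *_ld enables the load. */
static void datapath_clock(datapath_t *dp, uint32_t x_i, uint32_t y_i,
                           bool x_sel, bool y_sel,
                           bool x_ld, bool y_ld, bool d_ld) {
    uint32_t x_next = x_sel ? dp->x - dp->y : x_i;
    uint32_t y_next = y_sel ? dp->y - dp->x : y_i;
    if (d_ld) dp->d = dp->x;
    if (x_ld) dp->x = x_next;
    if (y_ld) dp->y = y_next;
}

/* Runs controller and datapath until state 1-J is reached; returns d_o. */
uint32_t gcd_fsmd(uint32_t x_i, uint32_t y_i) {
    datapath_t dp = {0, 0, 0};
    const bool go_i = true; /* assumption: start signal already asserted */
    int state = S1;
    for (;;) {
        /* Status signals from the datapath to the controller */
        bool x_neq_y = dp.x != dp.y;
        bool x_lt_y = dp.x < dp.y;
        /* Control signals default to inactive */
        bool x_sel = false, y_sel = false;
        bool x_ld = false, y_ld = false, d_ld = false;
        int next;
        switch (state) {
        case S1:  next = S2; break;
        case S2:  next = go_i ? S3 : S2J; break;
        case S2J: next = S2; break;
        case S3:  x_sel = false; x_ld = true; next = S4; break;
        case S4:  y_sel = false; y_ld = true; next = S5; break;
        case S5:  next = x_neq_y ? S6 : S9; break;
        case S6:  next = x_lt_y ? S7 : S8; break;
        case S7:  y_sel = true; y_ld = true; next = S6J; break;
        case S8:  x_sel = true; x_ld = true; next = S6J; break;
        case S6J: next = S5J; break;
        case S5J: next = S5; break;
        case S9:  d_ld = true; next = S1J; break;
        default:  return dp.d; /* S1J: one result has been produced */
        }
        datapath_clock(&dp, x_i, y_i, x_sel, y_sel, x_ld, y_ld, d_ld);
        state = next;
    }
}
```

In hardware the FSM would return from 1-J to state 1 and wait for the next go_i; the sketch stops there so that it can return a value.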

Finite state machine with datapath (FSMD)
• The above state diagram is known as an FSMD.
• An FSMD is a digital system composed of a finite-state machine, which
controls the program flow, and a datapath, which performs data processing
operations.
• FSMDs are essentially sequential programs in which statements have been
scheduled into states, thus resulting in more complex state diagrams.
• A finite-state machine (FSM) or finite-state automaton (FSA, plural:
automata), finite automaton, or simply a state machine, is a mathematical
model of computation. It is an abstract machine that can be in exactly one of
a finite number of states at any given time.
• A vending machine is a familiar example of an FSM.

State diagram templates
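Assignment statement
a = b
next statement
Template: a state performing a = b, followed by the state of the next statement.
Loop statement
while (cond) {
  loop-body-statements
}
next statement
Template: a condition state C:. On cond, go to the loop-body statements, then to a join state J:, which
returns to C:. On !cond, go to the next statement.
Branch statement
if (c1)
  c1 stmts
else if c2
  c2 stmts
else
  other stmts
next statement
Template: a condition state C:. On c1, go to the c1 stmts; on !c1*c2, go to the c2 stmts; on !c1*!c2, go
to the others. All branches meet at a join state J:, which leads to the next statement.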

Fibonacci series
Find error in the datapath

Optimizing single-purpose processors
• Optimization is the task of making design metric values the best possible
• Optimization opportunities
– original program
– FSMD
– datapath
– FSM

Optimizing the original program
• Analyze program attributes and look for areas of possible improvement
– number of computations
– size of variable
– time and space complexity
– operations used
• multiplication and division very expensive

Optimizing the original program (cont’)
original program
0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }
optimized program
0: int x, y, r;
1: while (1) {
2:   while (!go_i);
     // x must be the larger number
3:   if (x_i >= y_i) {
4:     x=x_i;
5:     y=y_i;
     }
6:   else {
7:     x=y_i;
8:     y=x_i;
     }
9:   while (y != 0) {
10:    r = x % y;
11:    x = y;
12:    y = r;
     }
13:  d_o = x;
   }
Replace the subtraction operation(s) with the modulo operation in order to speed up the program.
Original program: GCD(42, 8) takes 9 iterations to complete the loop. The x and y values are evaluated
as follows: (42, 8), (34, 8), (26, 8), (18, 8), (10, 8), (2, 8), (2, 6), (2, 4), (2, 2).
Optimized program: GCD(42, 8) takes 3 iterations to complete the loop. The x and y values are evaluated
as follows: (42, 8), (8, 2), (2, 0).
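The iteration counts can be reproduced with two small C functions. This is a sketch: the function names and the `tests` out-parameter, which counts how many (x, y) pairs the loop condition evaluates, are introduced here for illustration, and the subtraction version assumes positive inputs.

```c
/* Subtraction-based GCD (original program); assumes x > 0 and y > 0.
   *tests counts the (x, y) pairs evaluated by the loop condition. */
unsigned gcd_subtract(unsigned x, unsigned y, unsigned *tests) {
    *tests = 0;
    for (;;) {
        (*tests)++;
        if (x == y) break;
        if (x < y)
            y = y - x;
        else
            x = x - y;
    }
    return x;
}

/* Modulo-based GCD (optimized program). */
unsigned gcd_modulo(unsigned x_i, unsigned y_i, unsigned *tests) {
    unsigned x, y, r;
    /* x must be the larger number */
    if (x_i >= y_i) { x = x_i; y = y_i; }
    else            { x = y_i; y = x_i; }
    *tests = 0;
    for (;;) {
        (*tests)++;
        if (y == 0) break;
        r = x % y;
        x = y;
        y = r;
    }
    return x;
}
```

For the inputs 42 and 8, both functions return 2, with 9 evaluated pairs for the subtraction version and 3 for the modulo version.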

Optimizing the FSMD
• Areas of possible improvements
– merge states
• states with constants on transitions can be eliminated, transition
taken is already known
• states with independent operations can be merged
– separate states
• states which require complex operations (a*b*c*d) can be broken
into smaller states to reduce hardware size
– scheduling

Optimizing the FSMD (cont.)
original FSMD
Figure: the original FSMD (int x, y;) with states 1, 2, 2-J, 3, 4, 5, 6, 7, 8, 6-J, 5-J, 9 and 1-J.
Optimization steps:
• eliminate state 1 – transitions have constant values
• merge state 2 and state 2J – no loop operation in between them
• merge state 3 and state 4 – assignment operations are independent of one another
• merge state 5 and state 6 – transitions from state 6 can be done in state 5
• eliminate state 5J and 6J – transitions from each state can be done from state 7 and state 8,
respectively
• eliminate state 1-J – transition from state 1-J can be done directly from state 9
optimized FSMD
Figure: the optimized FSMD (int x, y;). State 2 loops on !go_i and goes to state 3 on go_i. State 3
performs x = x_i and y = y_i and goes to state 5. State 5 goes to state 7 (y = y - x) on x<y, to state 8
(x = x - y) on x>y, and otherwise to state 9 (d_o = x). States 7 and 8 return to state 5, and state 9
returns to state 2.

Optimizing the datapath
• Sharing of functional units
– one-to-one mapping, as done previously, is not necessary
– if the same operation occurs in different states, they can share a
single functional unit
• Multi-functional units
– ALUs support a variety of operations, so one ALU can be shared
among operations occurring in different states

Optimizing the FSM
• State encoding
– task of assigning a unique bit pattern to each state in an FSM
– size of state register and combinational logic vary
– can be treated as an ordering problem
• State minimization
– task of merging equivalent states into a single state
• two states are equivalent if, for all possible input combinations, they
generate the same outputs and transition to the same next state

Summary
• Custom single-purpose processors

– Straightforward design techniques
– Can be built to execute algorithms
– Typically start with FSMD
– CAD tools can be of great assistance

Chapter 2 General-Purpose Processors: Software
Introduction
• General-Purpose Processor
– Processor designed for a variety of computation tasks
– Low unit cost, in part because manufacturer spreads NRE
(Non-recurring engineering) over large numbers of units
• Motorola sold half a billion 68HC05 microcontrollers in 1996 alone
– Carefully designed since higher NRE is acceptable
• Can yield good performance, size and power
– Low NRE cost, short time-to-market/prototype, high
flexibility
• User just writes software; no processor design
– a.k.a. “microprocessor” – “micro” used when they were
implemented on one or a few chips rather than entire rooms
Basic Architecture
• Control unit and datapath
– Note similarity to single-purpose processor
• Key differences
– Datapath is general
– Control unit doesn’t store the algorithm – the algorithm is “programmed” into the memory
Figure: a processor made of a control unit (controller, PC, IR) and a datapath (ALU, registers) linked by
control/status signals and connected to I/O and memory.

Basic architecture overview
• Datapath Unit: consists of circuitry for transforming data and for storing temporary data.
• Control Unit: consists of circuitry for retrieving program
instructions and for moving data to, from and through the
datapath according to those instructions.
• Memory: While registers serve a processor’s short term storage
requirements, memory serves the processor’s medium and long-
term information-storage requirements. There are two memory
architectures: Harvard and Princeton.

Datapath Operations
• Load
– Read memory location into register
• ALU operation
– Input certain registers through ALU, store back in register
• Store
– Write register to memory location
Figure: a processor (control unit with controller, PC and IR; datapath with ALU and registers) connected
to I/O and memory, illustrating a load, a +1 ALU operation and a store with the values 10 and 11.
Control Unit
• Control unit: configures the datapath operations
– Sequence of desired operations (“instructions”) stored in memory: the “program”
• Instruction cycle: broken into several sub-operations, each one clock cycle, e.g.:
– Fetch: Get next instruction into IR
– Decode: Determine what the instruction means
– Fetch operands: Move data from memory to datapath register
– Execute: Move data through the ALU
– Store results: Write data from register to memory
Example program in memory (M[500] initially holds 10):
100 load R0, M[500]
101 inc R1, R0
102 store M[501], R1
Control Unit Sub-Operations
• Fetch
– Get next instruction into IR
– PC: program counter, always points to next instruction
– IR: holds the fetched instruction
• Decode
– Determine what the instruction means
• Fetch operands
– Move data from memory to datapath register
• Execute
– Move data through the ALU
– This particular instruction does nothing during this sub-operation
• Store results
– Write data from register to memory
– This particular instruction does nothing during this sub-operation
[Figure: processor state while executing load R0, M[500] at PC=100. The instruction is fetched into IR, and the fetch-operands step copies 10 from M[500] into R0.]
Instruction Cycles
• Each instruction passes through the same five sub-operations, one clock cycle each: Fetch, Decode, Fetch ops, Exec., Store results
• PC=100: load R0, M[500] puts 10 into R0
• PC=101: inc R1, R0 sends R0 through the ALU (+1) and puts 11 into R1
• PC=102: store M[501], R1 writes 11 to M[501]
[Figure: three consecutive instruction cycles on the clock line, with the processor state after each one.]
Architectural Considerations
• N-bit processor
– N-bit ALU, registers, buses, memory data interface
– Embedded: 8-bit, 16-bit, 32-bit common
– Desktop/servers: 32-bit, even 64
• PC size determines address space
• Clock frequency
– Inverse of clock period
– Clock period must be longer than the longest register-to-register delay in the entire processor
– Memory access is often the longest
Pipelining: Increasing Instruction Throughput
[Figure: non-pipelined vs. pipelined dish cleaning (Wash, Dry) for dishes 1 to 8 over time, and pipelined instruction execution in which the stages Fetch-instr., Decode, Fetch ops., Execute and Store res. overlap for instructions 1 to 8.]
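The benefit shown in the pipelining figure can be put into numbers with a small sketch. It assumes an idealized pipeline: every stage takes one clock cycle and there are no stalls. The function names are illustrative only.

```c
#include <assert.h>

/* Cycles to run n instructions through k one-cycle stages with no overlap. */
unsigned long cycles_non_pipelined(unsigned long n, unsigned long k) {
    return n * k;
}

/* Ideal pipeline without stalls: k cycles until the first instruction
   completes, then one further instruction completes every cycle. */
unsigned long cycles_pipelined(unsigned long n, unsigned long k) {
    assert(k > 0);
    return n == 0 ? 0 : k + (n - 1);
}
```

For the 8 instructions and 5 stages of the figure, this gives 40 cycles without pipelining and 12 cycles with it, under the stated assumptions.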

Superscalar and VLIW Architectures
• Performance can be improved by:

– Faster clock (but there’s a limit)
– Pipelining: slice up instruction into stages, overlap stages
– Multiple ALUs to support more than one instruction stream
• Superscalar
– Scalar: non-vector operations
– Fetches instructions in batches, executes as many as possible
• May require extensive hardware to detect independent instructions
• VLIW: each word in memory has multiple independent instructions
– Relies on the compiler to detect and schedule instructions
– Currently growing in popularity

Two Memory Architectures
• Princeton
– Fewer memory wires
• Harvard
– Simultaneous program and data memory access
[Figure: Harvard: processor connected to a separate program memory and data memory. Princeton: processor connected to one memory (program and data).]

Cache Memory
• Memory access may be slow
• Cache is small but fast memory close to processor
– Holds copy of part of memory
– Hits and misses
[Figure: processor, cache (fast/expensive technology, usually on the same chip), and memory (slower/cheaper technology, usually on a different chip).]

Programmer’s View
• A programmer writes the program instructions that carry out the desired functionality
on the general-purpose processor.
• Programmer doesn’t need detailed understanding of architecture
– Instead, needs to know what instructions can be executed
• Two levels of instructions:
– Assembly level
– Structured languages (C, C++, Java, etc.)
• Most development today done using structured languages
– But, some assembly level programming may still be necessary
– Drivers: portion of program that communicates with and/or controls (drives)
another device
• Often have detailed timing considerations, extensive bit manipulation
• Assembly level may be best for these

Assembly-Level Instructions
Instruction 1 opcode operand1 operand2
...
• Instruction Set
– Defines the legal set of instructions for that processor
• Data transfer: memory/register, register/register, I/O, etc.
• Arithmetic/logical: move register through ALU and back
• Branches: determine next PC value when not just PC+1

A Simple (Trivial) Instruction Set
Assembly instruct.    First byte    Second byte    Operation
MOV Rn, direct        0000 Rn       direct         Rn = M(direct)
MOV direct, Rn        0001 Rn       direct         M(direct) = Rn
MOV @Rn, Rm           0010 Rn       Rm             M(Rn) = Rm
MOV Rn, #immed.       0011 Rn       immediate      Rn = immediate
ADD Rn, Rm            0100 Rn       Rm             Rn = Rn + Rm
SUB Rn, Rm            0101 Rn       Rm             Rn = Rn - Rm
JZ Rn, relative       0110 Rn       relative       PC = PC + relative (only if Rn is 0)
The first four bits of each instruction are the opcode; the remaining bits are the operands.

Addressing Modes
Addressing mode      Operand field       Register-file contents    Memory contents
Immediate            Data                -                         -
Register-direct      Register address    Data                      -
Register indirect    Register address    Memory address            Data
Direct               Memory address      -                         Data
Indirect             Memory address      -                         Memory address, then Data
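The addressing modes above differ only in how many lookups separate the operand field from the data. The sketch below makes that explicit. The type `AddrMode`, the function `fetch_operand`, and the 16-register and 256-word sizes are assumptions chosen for illustration, not part of any particular processor.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    IMMEDIATE, REGISTER_DIRECT, REGISTER_INDIRECT, DIRECT, INDIRECT
} AddrMode;

/* Resolve an operand field to the data it designates.
   rf is the register file, mem is the memory (illustrative sizes). */
uint16_t fetch_operand(AddrMode mode, uint8_t field,
                       const uint16_t rf[16], const uint16_t mem[256]) {
    switch (mode) {
    case IMMEDIATE:         return field;                        /* field is the data */
    case REGISTER_DIRECT:   return rf[field & 0xF];              /* register holds the data */
    case REGISTER_INDIRECT: return mem[rf[field & 0xF] & 0xFF];  /* register holds an address */
    case DIRECT:            return mem[field];                   /* field is an address */
    case INDIRECT:          return mem[mem[field] & 0xFF];       /* memory holds an address */
    }
    assert(0 && "unknown addressing mode");
    return 0;
}
```

Immediate needs no lookup, register-direct and direct need one, and the two indirect modes need two, which is why indirect accesses tend to be slower.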

Sample Programs
C program:
int total = 0;
for (int i=10; i!=0; i--)
    total += i;
// next instructions...

Equivalent assembly program:
0     MOV R0, #0;    // total = 0
1     MOV R1, #10;   // i = 10
2     MOV R2, #1;    // constant 1
3     MOV R3, #0;    // constant 0
Loop: JZ R1, Next;   // Done if i=0
5     ADD R0, R1;    // total += i
6     SUB R1, R2;    // i--
7     JZ R3, Loop;   // Jump always
Next: // next instructions...
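To see the sample assembly program execute, the simple instruction set can be run on a tiny simulator. This is only a sketch under stated assumptions: instructions are packed as 16-bit words (opcode, Rn, then an 8-bit second byte whose high nibble is Rm where needed), program and data share one 256-word memory, and the JZ offset is taken relative to the PC after it has been incremented by the fetch. The names `Cpu`, `encode`, `step` and `run_sample` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { MOV_LOAD = 0x0, MOV_STORE = 0x1, MOV_IND = 0x2, MOV_IMM = 0x3,
       ADD = 0x4, SUB = 0x5, JZ = 0x6 };

typedef struct {
    uint16_t pc;
    uint16_t rf[16];     /* register file */
    uint16_t mem[256];   /* one memory for program and data (assumed size) */
} Cpu;

/* Pack opcode, Rn and the 8-bit second byte into one 16-bit word. */
uint16_t encode(unsigned op, unsigned rn, int low8) {
    return (uint16_t)(((op & 0xFu) << 12) | ((rn & 0xFu) << 8) |
                      ((unsigned)low8 & 0xFFu));
}

/* Fetch, decode and execute one instruction. Returns 0 on an unknown opcode. */
int step(Cpu *c) {
    assert(c != 0);
    uint16_t ir = c->mem[c->pc & 0xFF];              /* fetch */
    c->pc++;
    unsigned op = ir >> 12, rn = (ir >> 8) & 0xF;    /* decode */
    unsigned rm = (ir >> 4) & 0xF, low = ir & 0xFF;
    switch (op) {
    case MOV_LOAD:  c->rf[rn] = c->mem[low]; break;
    case MOV_STORE: c->mem[low] = c->rf[rn]; break;
    case MOV_IND:   c->mem[c->rf[rn] & 0xFF] = c->rf[rm]; break;
    case MOV_IMM:   c->rf[rn] = (uint16_t)low; break;
    case ADD:       c->rf[rn] = (uint16_t)(c->rf[rn] + c->rf[rm]); break;
    case SUB:       c->rf[rn] = (uint16_t)(c->rf[rn] - c->rf[rm]); break;
    case JZ:
        if (c->rf[rn] == 0) {
            int rel = (low & 0x80) ? (int)low - 256 : (int)low;  /* sign-extend */
            c->pc = (uint16_t)(c->pc + rel);
        }
        break;
    default: return 0;
    }
    return 1;
}

/* Load the sample program and run it until PC reaches Next (address 8). */
uint16_t run_sample(void) {
    Cpu c = {0};
    c.mem[0] = encode(MOV_IMM, 0, 0);    /* total = 0 */
    c.mem[1] = encode(MOV_IMM, 1, 10);   /* i = 10 */
    c.mem[2] = encode(MOV_IMM, 2, 1);    /* constant 1 */
    c.mem[3] = encode(MOV_IMM, 3, 0);    /* constant 0 */
    c.mem[4] = encode(JZ, 1, 8 - 5);     /* Loop: to Next when i == 0 */
    c.mem[5] = encode(ADD, 0, 1 << 4);   /* total += i */
    c.mem[6] = encode(SUB, 1, 2 << 4);   /* i-- */
    c.mem[7] = encode(JZ, 3, 4 - 8);     /* always taken, back to Loop */
    for (int guard = 0; guard < 1000 && c.pc != 8; guard++) {
        if (!step(&c)) break;
    }
    return c.rf[0];                      /* total */
}
```

Calling `run_sample()` returns the value left in R0, which is the sum 10 + 9 + ... + 1 computed by the loop.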

Programmer Considerations
• Program and data memory space

– Embedded processors often very limited
• e.g., 64 Kbytes program, 256 bytes of RAM (expandable)
• Registers
– How many are there?
– Only a direct concern for assembly-level programmers
• I/O
– How to communicate with external signals?
• Interrupts
Development Environment
• Development processor
– The processor on which we write and debug our programs
• Usually a PC
• Target processor
– The processor that the program will run on in our embedded
system
• Often different from the development processor

Firmware Development Process
• Compilers
– Cross compiler
• Runs on one processor, but generates code for another
• Assemblers
• Linkers
• Debuggers
• Profilers
[Figure: implementation phase: C files pass through the compiler and assembly files through the assembler to produce binary files, which the linker combines with library files into an executable file. Verification phase: the executable is examined with the debugger and the profiler.]

• Assemblers translate assembly instructions to binary machine instructions.
• A linker allows a programmer to create a program in separately-assembled files.
• Compilers translate structured programs into machine (or assembly) programs.
• A cross-compiler executes on one processor (our development processor), but
generates code for a different processor (our target processor).
• Debuggers help programmers evaluate and correct their programs.
• Emulators support debugging of the program while it executes on the target
processor.
• A profiler performs dynamic program analysis that measures, for example, the
space (memory) or time complexity of a program, the usage of particular instructions,
or the frequency and duration of function calls. Most commonly, profiling
information serves to aid program optimization.
• All these tools are commonly combined in an integrated development environment (IDE).

Running a Program
• If the development processor is different from the target, we run our compiled code by:
– Download to target processor
– Simulate
• Simulation
– One method: Hardware description language
• But slow, not always available
– Another method: Instruction set simulator (ISS)
• Runs on development processor, but executes instructions of target
processor

Testing and Debugging
• Debugger
– Gives us control over time: set breakpoints, look at register values, set values, step-by-step execution, ...
– But, doesn’t interact with real environment
• Download to board
– Use device programmer
– Runs in real environment, but not controllable
• Compromise: emulator
– Runs in real environment, at speed or near
– Supports some controllability from the PC
[Figure: (a) implementation phase followed by a verification phase that uses a debugger on the development processor; (b) implementation phase followed by a verification phase that uses an emulator, a programmer and external tools.]
Application-Specific Instruction-Set Processors (ASIPs)
• General-purpose processors
– Sometimes too general to be effective in demanding
application
• e.g., video processing – requires huge video buffers and operations
on large arrays of data, inefficient on a GPP
– But single-purpose processor has high NRE (Non recurring
engineering), not programmable
• ASIPs – targeted to a particular domain
– Contain architectural features specific to that domain
• e.g., embedded control, digital signal processing, video processing,
network processing, telecommunications, etc.
– Still programmable
A Common ASIP: Microcontroller
• For embedded control applications

– Reading sensors, setting actuators
– Mostly dealing with events (bits): data is present, but not in huge
amounts
– e.g., VCR, disk drive, digital camera (assuming SPP for image
compression), washing machine, microwave oven
• Microcontroller features
– On-chip peripherals
• Timers, analog-digital converters, serial communication, etc.
• Tightly integrated for programmer, typically part of register space
– On-chip program and data memory
– Direct programmer access to many of the chip’s pins
– Specialized instructions for bit-manipulation and other low-level
operations
Another Common ASIP: Digital Signal Processors (DSP)
• For signal processing applications
– Large amounts of digitized data, often streaming
– Data transformations must be applied fast
– e.g., cell-phone voice filter, digital TV, music synthesizer
• DSP features
– Several instruction execution units
– Multiply-accumulate single-cycle instruction, other instructions
– Efficient vector operations – e.g., add two arrays
• Vector ALUs, loop buffers, etc.
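The multiply-accumulate operation mentioned in the DSP features is the inner step of most filters. The sketch below shows it as a plain C loop; the function name `fir_output` and the 16-bit sample format are assumptions for illustration. On a DSP, a compiler or hand-written assembly would typically map each loop iteration to one MAC instruction, which a general-purpose processor may need several instructions for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One output sample of a filter: the sum of coefficient * sample products.
   Each loop iteration is one multiply-accumulate (MAC) step. */
int32_t fir_output(const int16_t *coeff, const int16_t *sample, size_t taps) {
    assert(coeff != NULL && sample != NULL);
    int32_t acc = 0;
    for (size_t i = 0; i < taps; i++) {
        acc += (int32_t)coeff[i] * (int32_t)sample[i];
    }
    return acc;
}
```

The accumulator is wider than the operands so that the running sum of products has headroom, which mirrors the wide accumulators commonly found in DSP datapaths.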

Trend: Even More Customized ASIPs
• In the past, microprocessors were acquired as chips

• Today, we increasingly acquire a processor as Intellectual
Property (IP)
– e.g., synthesizable VHDL model
• Opportunity to add a custom datapath hardware and a few
custom instructions, or delete a few instructions
– Can have significant performance, power and size impacts
– Problem: need compiler/debugger for customized ASIP
• Remember, most development uses structured languages
• One solution: automatic compiler/debugger generation
– e.g., www.tensillica.com
• Another solution: retargettable compilers
– e.g., www.improvsys.com (customized VLIW architectures)

Selecting a Microprocessor
• Issues
– Technical: speed, power, size, cost
– Other: development environment, prior expertise, licensing, etc.
• Speed: how to evaluate a processor’s speed?
– Clock speed, but instructions per cycle may differ
– Instructions per second, but work per instruction may differ
– Dhrystone: synthetic benchmark, developed in 1984. Results are given in Dhrystones/sec.
• MIPS: 1 MIPS = 1757 Dhrystones per second (based on Digital’s VAX 11/780). A.k.a. Dhrystone
MIPS. Commonly used today.
– So, 750 MIPS = 750*1757 = 1,317,750 Dhrystones per second
– SPEC (Standard Performance Evaluation Corporation): an American non-profit
organization that aims to "produce, establish, maintain and endorse a standardized set" of performance
benchmarks for computers
– EEMBC (Embedded Microprocessor Benchmark Consortium), www.eembc.org
• A non-profit, member-funded organization formed in 1997, focused on the creation of
standard benchmarks for the hardware and software used in embedded systems
• Suites of benchmarks: automotive, consumer electronics, networking, office automation,
telecommunications
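The Dhrystone MIPS conversion above is a single multiplication, sketched here so the factor of 1757 can be reused for other ratings. The function and macro names are illustrative.

```c
#include <assert.h>
#include <limits.h>

/* 1 Dhrystone MIPS corresponds to 1757 Dhrystones per second (VAX 11/780). */
#define DHRYSTONES_PER_SEC_PER_MIPS 1757UL

unsigned long dmips_to_dhrystones_per_sec(unsigned long dmips) {
    assert(dmips <= ULONG_MAX / DHRYSTONES_PER_SEC_PER_MIPS); /* no overflow */
    return dmips * DHRYSTONES_PER_SEC_PER_MIPS;
}

unsigned long dhrystones_per_sec_to_dmips(unsigned long dps) {
    return dps / DHRYSTONES_PER_SEC_PER_MIPS;  /* integer division, rounds down */
}
```

With this, a processor rated at 750 MIPS corresponds to 1,317,750 Dhrystones per second, matching the worked figure in the list.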

General Purpose Processors
Processor | Clock speed | Periph. | Bus Width | MIPS | Power | Trans. | Price
General Purpose Processors
Intel PIII | 1GHz | 2x16 K L1, 256K L2, MMX | 32 | ~900 | 97W | ~7M | $900
IBM PowerPC 750X | 550 MHz | 2x32 K L1, 256K L2 | 32/64 | ~1300 | 5W | ~7M | $900
MIPS R5000 | 250 MHz | 2x32 K 2 way set assoc. | 32/64 | NA | NA | 3.6M | NA
StrongARM SA-110 | 233 MHz | None | 32 | 268 | 1W | 2.1M | NA
Microcontroller
Intel 8051 | 12 MHz | 4K ROM, 128 RAM, 32 I/O, Timer, UART | 8 | ~1 | ~0.2W | ~10K | $7
Motorola 68HC811 | 3 MHz | 4K ROM, 192 RAM, 32 I/O, Timer, WDT, SPI | 8 | ~.5 | ~0.1W | ~10K | $5
Digital Signal Processors
TI C5416 | 160 MHz | 128K, SRAM, 3 T1 Ports, DMA, 13 ADC, 9 DAC | 16/32 | ~600 | NA | NA | $34
Lucent DSP32C | 80 MHz | 16K Inst., 2K Data, Serial Ports, DMA | 32 | 40 | NA | NA | $75
Sources: Intel, Motorola, MIPS, ARM, TI, and IBM Website/Datasheet; Embedded Systems Programming, Nov. 1998

Designing a General Purpose Processor
• Not something an embedded system designer normally would do
– But instructive to see how simply we can build one top down
– Remember that real processors aren’t usually built this way
• Much more optimized, much more bottom-up design

FSMD
Declarations:
bit PC[16], IR[16];
bit M[64k][16], RF[16][16];
Aliases:
op IR[15..12]    rn IR[11..8]    rm IR[7..4]
dir IR[7..0]     imm IR[7..0]    rel IR[7..0]
States:
Reset:  PC=0;
Fetch:  IR=M[PC]; PC=PC+1 (entered from Reset and from the states below)
Decode: selects one of the states below according to op
Mov1 (op = 0000): RF[rn] = M[dir]; to Fetch
Mov2 (op = 0001): M[dir] = RF[rn]; to Fetch
Mov3 (op = 0010): M[rn] = RF[rm]; to Fetch
Mov4 (op = 0011): RF[rn] = imm; to Fetch
Add  (op = 0100): RF[rn] = RF[rn]+RF[rm]; to Fetch
Sub  (op = 0101): RF[rn] = RF[rn]-RF[rm]; to Fetch
Jz   (op = 0110): PC = (RF[rn]=0) ? rel : PC; to Fetch

Architecture of a Simple Microprocessor
• Storage devices for each declared variable
– register file holds each of the variables
• Functional units to carry out the FSMD operations
– One ALU carries out every required operation
• Connections added among the components’ ports corresponding to the operations required by the FSM
• Unique identifiers created for every control signal
[Figure: control unit (controller with next-state and control logic and state register, 16-bit PC, IR) and datapath (16-entry register file RF with ports RFw, RFr1, RFr2; ALU; 2x1 mux at the register-file input; 3x1 mux at the memory address) connected to memory ports A and D. Control signals: PCld, PCinc, PCclr, Irld, RFs, RFwa, RFwe, RFr1a, RFr1e, RFr2a, RFr2e, ALUs, ALUz, Ms, Mre, Mwe.]

A Simple Microprocessor
Each FSMD operation is listed with the FSM operations that replace it after a datapath is created:
Reset: PC=0;
    PCclr=1;
Fetch: IR=M[PC]; PC=PC+1
    Ms=10; Irld=1; Mre=1; PCinc=1;
Mov1 (op = 0000): RF[rn] = M[dir]
    RFwa=rn; RFwe=1; RFs=01; Ms=01; Mre=1;
Mov2 (op = 0001): M[dir] = RF[rn]
    RFr1a=rn; RFr1e=1; Ms=01; Mwe=1;
Mov3 (op = 0010): M[rn] = RF[rm]
    RFr1a=rn; RFr1e=1; Ms=10; Mwe=1;
Mov4 (op = 0011): RF[rn] = imm
    RFwa=rn; RFwe=1; RFs=10;
Add (op = 0100): RF[rn] = RF[rn]+RF[rm]
    RFwa=rn; RFwe=1; RFs=00; RFr1a=rn; RFr1e=1; RFr2a=rm; RFr2e=1; ALUs=00
Sub (op = 0101): RF[rn] = RF[rn]-RF[rm]
    RFwa=rn; RFwe=1; RFs=00; RFr1a=rn; RFr1e=1; RFr2a=rm; RFr2e=1; ALUs=01
Jz (op = 0110): PC = (RF[rn]=0) ? rel : PC
    PCld=ALUz; RFr1a=rn; RFr1e=1;
Every state other than Reset and Fetch returns to Fetch.
[Figure: the control unit and datapath of the simple microprocessor, as in the previous figure.]
You just built a simple microprocessor!

Chapter 3 Memory
Introduction
• Embedded system’s functionality aspects

– Processing
• processors
• transformation of data
– Storage
• memory
• retention of data
– Communication
• buses
• transfer of data

Cosmos College of Management and technology
Memory: basic concepts
• Stores large number of bits
– m x n: m words of n bits each
– k = Log2(m) address input signals
– or m = 2^k words
– e.g., 4,096 x 8 memory:
• 32,768 bits
• 12 address input signals
• 8 input/output data signals
• Memory access
– r/w: selects read or write
– enable: read or write only when asserted
– multiport: multiple accesses to different locations simultaneously
[Figure: m × n memory (m words, n bits per word), and the memory external view: a 2^k × n read and write memory with r/w and enable inputs, address inputs A0 ... Ak-1 and data lines Qn-1 ... Q0.]
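The relations between words, address lines and total bits can be checked with a short sketch. It assumes the number of words is a power of two, as in the m = 2^k relation above; the function names are illustrative.

```c
#include <assert.h>

/* Number of address input signals k for a memory of m words (m = 2^k). */
unsigned address_lines(unsigned long m_words) {
    assert(m_words > 0 && (m_words & (m_words - 1)) == 0); /* power of two */
    unsigned k = 0;
    while ((1UL << k) < m_words) {
        k++;
    }
    return k;
}

/* Total storage of an m x n memory in bits. */
unsigned long total_bits(unsigned long m_words, unsigned n_bits) {
    return m_words * n_bits;
}
```

For the 4,096 x 8 example, `address_lines(4096)` yields 12 and `total_bits(4096, 8)` yields 32,768.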

Write ability/ storage permanence
• Traditional ROM/RAM distinctions
– ROM
• read only, bits stored without power
– RAM
• read and write, lose stored bits without power
• Traditional distinctions blurred
– Advanced ROMs can be written to
• e.g., EEPROM
– Advanced RAMs can hold bits without power
• e.g., NVRAM
• Write ability
– Manner and speed a memory can be written
• Storage permanence
– ability of memory to hold stored bits after they are written
[Figure: Write ability and storage permanence of memories, showing relative degrees along each axis (not to scale). Storage permanence axis: near zero, battery life (10 years), tens of years, life of product. Write ability axis: during fabrication only; external programmer, one time only; external programmer, 1,000s of cycles; external programmer OR in-system, 1,000s of cycles; external programmer OR in-system, block-oriented writes, 1,000s of cycles; in-system, fast writes, unlimited cycles. Memories plotted: mask-programmed ROM, OTP ROM, EPROM, EEPROM, FLASH, NVRAM, SRAM/DRAM, with the ideal memory at the high end of both axes; the nonvolatile and in-system programmable regions are marked.]

Write ability
• Ranges of write ability
– High end
• processor writes to memory simply and quickly
• e.g., RAM
– Middle range
• processor writes to memory, but slower
• e.g., FLASH, EEPROM
– Lower range
• special equipment, “programmer”, must be used to write to memory
• e.g., EPROM, OTP ROM
– Low end
• bits stored only during fabrication
• e.g., Mask-programmed ROM
• In-system programmable memory
– Can be written to by a processor in the embedded system using the
memory
– Memories in high end and middle range of write ability

Storage permanence
• Range of storage permanence
– High end
• essentially never loses bits
• e.g., mask-programmed ROM
– Middle range
• holds bits days, months, or years after memory’s power source turned off
• e.g., NVRAM
– Lower range
• holds bits as long as power supplied to memory
• e.g., SRAM
– Low end
• begins to lose bits almost immediately after written
• e.g., DRAM
• Nonvolatile memory
– Holds bits after power is no longer supplied
– High end and middle range of storage permanence

ROM: “Read-Only” Memory
• Nonvolatile memory
• Can be read from but not written to, by a processor in an embedded system
• Traditionally written to, “programmed”, before inserting to embedded system
• Uses
– Store software program for general-purpose processor
• program instructions can be one or more ROM words
– Store constant data needed by system
– Implement combinational circuit
[Figure: external view of a 2^k × n ROM with enable input, address inputs A0 ... Ak-1 and data outputs Qn-1 ... Q0.]
Example: 8 x 4 ROM
• Horizontal lines = words
• Vertical lines = data
• Lines connected only at circles
• Decoder sets word 2’s line to 1 if address input is 010
• Data lines Q3 and Q1 are set to 1 because there is a “programmed” connection with word 2’s line
• Word 2 is not connected with data lines Q2 and Q0
• Output is 1010
[Figure: internal view of an 8 × 4 ROM. A 3×8 decoder with enable and address inputs A0, A1, A2 drives the word lines (word 0, word 1, word 2, ...); programmable connections join word lines to the data lines Q3, Q2, Q1, Q0, which are wired-OR.]
Implementing combinational function
• Any combinational circuit of n functions of same k variables can be done with 2^k x n ROM
Truth table
Inputs (address)    Outputs
a b c               y z
0 0 0               0 0
0 0 1               0 1
0 1 0               0 1
0 1 1               1 0
1 0 0               1 0
1 0 1               1 1
1 1 0               1 1
1 1 1               1 1
[Figure: 8×2 ROM with enable and address inputs a, b, c; word 0 to word 7 hold the output columns of the truth table, read out on y and z.]
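In software, the same idea is a lookup table indexed by the inputs. The sketch below stores the truth table's output columns as the contents of an 8 x 2 ROM. The array name, the function name, and the choice of a as the high-order address bit are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* 8 x 2 ROM contents, indexed by address abc: bit 1 is y, bit 0 is z. */
static const uint8_t ROM_8X2[8] = {
    0x0, /* 000 -> y=0 z=0 */
    0x1, /* 001 -> y=0 z=1 */
    0x1, /* 010 -> y=0 z=1 */
    0x2, /* 011 -> y=1 z=0 */
    0x2, /* 100 -> y=1 z=0 */
    0x3, /* 101 -> y=1 z=1 */
    0x3, /* 110 -> y=1 z=1 */
    0x3  /* 111 -> y=1 z=1 */
};

/* Read the ROM: inputs a, b, c form the address, a being the high-order bit. */
void rom_read(unsigned a, unsigned b, unsigned c, unsigned *y, unsigned *z) {
    assert(y != 0 && z != 0);
    unsigned addr = ((a & 1u) << 2) | ((b & 1u) << 1) | (c & 1u);
    uint8_t word = ROM_8X2[addr];
    *y = (word >> 1) & 1u;
    *z = word & 1u;
}
```

No logic gates are derived at all: changing the function only means changing the stored words, which is the appeal of using a ROM for combinational logic.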

Mask-programmed ROM
• Connections “programmed” at fabrication

– set of masks
• Lowest write ability
– only once
• Highest storage permanence
– bits never change unless damaged
• Typically used for final design of high-volume systems
– spread out NRE cost for a low unit cost

OTP ROM: One-time programmable ROM
• Connections “programmed” after manufacture by user
– user provides file of desired contents of ROM
– file input to machine called ROM programmer
– each programmable connection is a fuse
– ROM programmer blows fuses where connections should not exist
• Very low write ability
– typically written only once and requires ROM programmer device
• Very high storage permanence
– bits don’t change unless reconnected to programmer and more fuses
blown
• Commonly used in final products
– cheaper, harder to inadvertently modify
EPROM: Erasable programmable ROM
• Programmable component is a MOS transistor
– Transistor has “floating” gate surrounded by an insulator
– (a) Negative charges form a channel between source and drain storing a logic 1
– (b) Large positive voltage at gate causes negative charges to move out of channel and get trapped in floating gate storing a logic 0
– (c) (Erase) Shining UV rays on surface of floating-gate causes negative charges to return to channel from floating gate restoring the logic 1
– (d) An EPROM package showing quartz window through which UV light can pass
• Better write ability
– can be erased and reprogrammed thousands of times
• Reduced storage permanence
– program lasts about 10 years but is susceptible to radiation and electric noise
• Typically used during design development
[Figure: (a) floating-gate transistor with source, drain and 0V at the gate; (b) +15V applied at the gate; (c) erasing with UV light, 5-30 min; (d) EPROM package.]
Manish Man Shrestha .
EEPROM: Electrically erasable programmable ROM
• Programmed and erased electrically
– typically by using higher than normal voltage
– can program and erase individual words
• Better write ability
– can be in-system programmable with built-in circuit to provide higher
than normal voltage
• built-in memory controller commonly used to hide details from memory user
– writes very slow due to erasing and programming
• “busy” pin indicates to processor EEPROM still writing
– can be erased and programmed tens of thousands of times
• Similar storage permanence to EPROM (about 10 years)
• Far more convenient than EPROMs, but more expensive
Flash Memory
• Extension of EEPROM
– Same floating gate principle
– Same write ability and storage permanence
• Fast erase
– Large blocks of memory erased at once, rather than one word at a time
– Blocks typically several thousand bytes large
• Writes to single words may be slower
– Entire block must be read, word updated, then entire block written back
• Used with embedded systems storing large data items in
nonvolatile memory
– e.g., digital cameras, TV set-top boxes, cell phones

RAM: “Random-access” memory
• Typically volatile memory
– bits are not held without power supply
• Read and written to easily by embedded system during execution
• Internal structure more complex than ROM
– a word consists of several memory cells, each storing 1 bit
– each input and output data line connects to each cell in its column
– rd/wr connected to every cell
– when row is enabled by decoder, each cell has logic that stores input data bit when rd/wr indicates write or outputs stored bit when rd/wr indicates read
[Figure: external view of a 2^k × n read and write memory with r/w, enable, address inputs A0 ... Ak-1 and data lines Qn-1 ... Q0; internal view of a 4×4 RAM with a 2×4 decoder (enable, A0, A1), data inputs I3 to I0, data outputs Q3 to Q0, memory cells, and rd/wr routed to every cell.]

Basic types of RAM
• SRAM: Static RAM
– Memory cell uses flip-flop to store bit
– Requires 6 transistors
– Holds data as long as power supplied
• DRAM: Dynamic RAM
– Memory cell uses MOS transistor and capacitor to store bit
– More compact than SRAM
– “Refresh” required due to capacitor leak
• word’s cells refreshed when read
– Typical refresh rate 15.625 microsec.
– Slower to access than SRAM
[Figure: memory cell internals. SRAM cell with lines Data', Data and W; DRAM cell with lines Data and W.]

RAM variations
• PSRAM: Pseudo-static RAM

– DRAM with built-in memory refresh controller
– Popular low-cost high-density alternative to SRAM
• NVRAM: Nonvolatile RAM
– Holds data after external power removed
– Battery-backed RAM
• SRAM with own permanently connected battery
• writes as fast as reads
• no limit on number of writes unlike nonvolatile ROM-based memory
– SRAM with EEPROM or flash
• stores complete RAM contents on EEPROM or flash before power turned off

Example: HM6264 & 27C256 RAM/ROM devices
• Low-cost low-capacity memory devices
• Commonly used in 8-bit microcontroller-based embedded systems
• First two numeric digits indicate device type
– RAM: 62
– ROM: 27
• Subsequent digits indicate capacity in kilobits
Device characteristics
Device    Access Time (ns)    Standby Pwr. (mW)    Active Pwr. (mW)    Vcc Voltage (V)
HM6264    85-100              .01                  15                  5
27C256    90                  .5                   100                 5
[Figure: block diagrams. HM6264: data<7…0> on pins 11-13, 15-19; addr<15...0> on pins 2, 23, 21, 24, 25, 3-10; /OE on pin 22; /WE on pin 27; /CS1 on pin 20; CS2 on pin 26. 27C256: data<7…0> on pins 11-13, 15-19; addr<15...0> on pins 27, 26, 2, 23, 21, 24, 25, 3-10; /OE on pin 22; /CS on pin 20.]
[Figure: timing diagrams for a read operation (data, addr, OE, /CS1, CS2) and a write operation (data, addr, WE, /CS1, CS2).]

Example: TC55V2325FF-100 memory device
• 2-megabit synchronous pipelined burst SRAM memory device
• Designed to be interfaced with 32-bit processors
• Capable of fast sequential reads and writes as well as single byte I/O
Device characteristics
Device             Access Time (ns)    Standby Pwr. (mW)    Active Pwr. (mW)    Vcc Voltage (V)
TC55V2325FF-100    10                  na                   1200                3.3
[Figure: block diagram of the TC55V2325FF-100 with data<31…0>, addr<15…0>, addr<10...0>, /CS1, /CS2, CS3, /WE, /OE, MODE, /ADSP, /ADSC, /ADV and CLK; timing diagram of a single read operation showing CLK, /ADSP, /ADSC, /ADV, addr<15…0>, /WE, /OE, /CS1 and /CS2, CS3 and data<31…0>.]

Composing memory
• Memory size needed often differs from size of readily available memories
• When available memory is larger, simply ignore unneeded high-order address bits and higher data lines
• When available memory is smaller, compose several smaller memories into one larger memory
– Connect side-by-side to increase width of words
– Connect top to bottom to increase number of words
• added high-order address line selects smaller memory containing desired word using a decoder
– Combine techniques to increase number and width of words
[Figure: increase number of words: two 2^m × n ROMs form a 2^(m+1) × n ROM; address lines A0 ... Am-1 go to both ROMs and Am drives a 1×2 decoder that enables one of them. Increase width of words: three 2^m × n ROMs side by side form a 2^m × 3n ROM with outputs Q3n-1 ... Q0. Increase number and width of words: both techniques combined.]
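Both composition techniques can be modelled in a few lines. The sketch below treats each small ROM as an array of 8-bit words; the size `M_BITS` and the function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define M_BITS 3                   /* each small ROM has 2^3 = 8 words (assumed) */
#define WORDS  (1u << M_BITS)

/* Increase number of words: the added high-order address bit A_m acts as a
   1x2 decoder that enables exactly one of the two smaller ROMs. */
uint8_t read_deeper(const uint8_t rom0[WORDS], const uint8_t rom1[WORDS],
                    unsigned addr) {
    assert(addr < 2u * WORDS);
    unsigned select = (addr >> M_BITS) & 1u;   /* A_m */
    unsigned low    = addr & (WORDS - 1u);     /* A_(m-1) .. A_0 go to both ROMs */
    return select ? rom1[low] : rom0[low];
}

/* Increase width of words: both ROMs see the same address and their outputs
   are placed side by side to form one wider word. */
uint16_t read_wider(const uint8_t rom_hi[WORDS], const uint8_t rom_lo[WORDS],
                    unsigned addr) {
    assert(addr < WORDS);
    return (uint16_t)(((unsigned)rom_hi[addr] << 8) | rom_lo[addr]);
}
```

Combining the two functions, one selecting among rows of chips and one concatenating the chips within a row, gives the "increase number and width" arrangement.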

Memory hierarchy
• Want inexpensive, fast memory
• Main memory
– Large, inexpensive, slow memory stores entire program and data
• Cache
– Small, expensive, fast memory stores copy of likely accessed parts of larger memory
– Can be multiple levels of cache
[Figure: memory hierarchy from top to bottom: processor registers, cache, main memory, disk, tape.]
Cache
• Usually designed with SRAM

– faster but more expensive than DRAM
• Usually on same chip as processor
– space limited, so much smaller than off-chip main memory
– faster access ( 1 cycle vs. several cycles for main memory)
• Cache operation:
– Request for main memory access (read or write)
– First, check cache for copy
• cache hit
– copy is in cache, quick access
• cache miss
– copy not in cache, read address and possibly its neighbors into cache
• Several cache design choices
– cache mapping, replacement policies, and write techniques
Cache mapping
• Far fewer cache addresses are available than main memory addresses
• Are the contents of an address in the cache?
• Cache mapping used to assign main memory address to cache
address and determine hit or miss
• Three basic techniques:
– Direct mapping
– Fully associative mapping
– Set-associative mapping
• Caches partitioned into indivisible blocks or lines of adjacent
memory addresses
– usually 4 or 8 addresses per line

Direct mapping
• Main memory address divided into 2 fields
– Index
• cache address
• number of bits determined by cache size
– Tag
• compared with tag stored in cache at address indicated by index
• if tags match, check valid bit
• Valid bit
– indicates whether data in slot has been loaded from memory
• Offset
– used to find particular word in cache line
[Figure: the address is split into Tag, Index and Offset; the index selects one cache entry (V, T, D), the stored tag is compared (=) with the address tag, and the outputs are Valid and Data.]
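A direct-mapped lookup can be sketched as bit slicing followed by one tag comparison. The field widths (4 words per line, 16 lines) and all names below are assumptions for illustration, and the sketch tracks only valid bits and tags, not the cached data itself.

```c
#include <assert.h>
#include <stdint.h>

#define OFFSET_BITS 2              /* 4 words per line (assumed) */
#define INDEX_BITS  4              /* 16 lines (assumed) */
#define LINES       (1u << INDEX_BITS)

typedef struct {
    uint8_t  valid[LINES];
    uint32_t tag[LINES];
} DirectCache;

uint32_t addr_offset(uint32_t a) { return a & ((1u << OFFSET_BITS) - 1u); }
uint32_t addr_index(uint32_t a)  { return (a >> OFFSET_BITS) & (LINES - 1u); }
uint32_t addr_tag(uint32_t a)    { return a >> (OFFSET_BITS + INDEX_BITS); }

/* Returns 1 on a hit. On a miss, records the new line and returns 0. */
int cache_access(DirectCache *c, uint32_t a) {
    assert(c != 0);
    uint32_t i = addr_index(a);
    if (c->valid[i] && c->tag[i] == addr_tag(a)) {
        return 1;                  /* valid entry with matching tag */
    }
    c->valid[i] = 1;               /* load line from main memory */
    c->tag[i] = addr_tag(a);
    return 0;
}
```

Two addresses that share an index but differ in tag evict each other even when the rest of the cache is empty, which is the weakness the associative schemes address.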

Fully associative mapping
• Complete main memory address stored in each cache address
• All addresses stored in cache simultaneously compared with desired address
• Valid bit and offset same as direct mapping
[Figure: the address is split into Tag and Offset; the tag is compared (=) in parallel with the tags of all cache entries (V, T, D); the outputs are Valid and Data.]

Set-associative mapping
• Compromise between direct mapping and fully associative mapping
• Index same as in direct mapping
• But, each cache address contains content and tags of 2 or more memory address locations
• Tags of that set simultaneously compared as in fully associative mapping
• Cache with set size N called N-way set-associative
– 2-way, 4-way, 8-way are common
[Figure: the address is split into Tag, Index and Offset; the index selects a set of entries (V, T, D), whose tags are compared (=) in parallel with the address tag; the outputs are Valid and Data.]

Cache-replacement policy
• Technique for choosing which block to replace

– when fully associative cache is full
– when set-associative cache’s line is full
• Random
– replace block chosen at random
• LRU: least-recently used
– replace block not accessed for longest time
• FIFO: first-in-first-out
– push block onto queue when accessed
– choose block to replace by popping queue

Cache write techniques
• When written, data cache must update main memory

• Write-through
– write to main memory whenever cache is written to
– easiest to implement
– processor must wait for slower main memory write
– potential for unnecessary writes
• Write-back
– main memory only written when “dirty” block replaced
– The dirty bit is set when the processor writes to (modifies) this memory. The bit indicates that its
associated block of memory has been modified and has not been saved to storage yet. When a block of
memory is to be replaced, its corresponding dirty bit is checked to see if the block needs to be written
back to secondary memory before being replaced or if it can simply be removed.
– reduces number of slow main memory writes
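The difference between the two write techniques shows up in how often main memory is written. The sketch below models a cache with a single line and counts main-memory writes only; reads that fill the line on a miss are not modelled. The struct and function names are hypothetical.

```c
#include <assert.h>

typedef struct {
    int valid, dirty;
    unsigned block;        /* which memory block the single cache line holds */
    unsigned mem_writes;   /* number of slow main-memory writes so far */
} Line;

/* Write-through: every processor write also goes to main memory. */
void write_through(Line *l, unsigned block) {
    assert(l != 0);
    l->valid = 1;
    l->block = block;
    l->mem_writes++;
}

/* Write-back: main memory is written only when a dirty block is replaced. */
void write_back(Line *l, unsigned block) {
    assert(l != 0);
    if (l->valid && l->dirty && l->block != block) {
        l->mem_writes++;   /* save the modified block before replacing it */
    }
    l->valid = 1;
    l->block = block;
    l->dirty = 1;          /* cached copy now differs from main memory */
}
```

Five writes to the same block cost five main-memory writes with write-through and none with write-back until that block is eventually replaced.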

Cache impact on system performance
• Most important parameters in terms of performance:

– Total size of cache
• total number of data bytes cache can hold
• tag, valid and other house keeping bits not included in total
– Degree of associativity
– Data block size
• Larger caches achieve lower miss rates but higher access cost
– e.g.,
• 2 Kbyte cache: miss rate = 15%, hit cost = 2 cycles, miss cost = 20 cycles
– avg. cost of memory access = (0.85 * 2) + (0.15 * 20) = 4.7 cycles
• 4 Kbyte cache: miss rate = 6.5%, hit cost = 3 cycles, miss cost will not change
– avg. cost of memory access = (0.935 * 3) + (0.065 * 20) = 4.105 cycles (improvement)
• 8 Kbyte cache: miss rate = 5.565%, hit cost = 4 cycles, miss cost will not change
– avg. cost of memory access = (0.94435 * 4) + (0.05565 * 20) = 4.8904 cycles (worse)
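The three averages above all come from the same weighted sum, sketched here as a function so other cache configurations can be compared. The function name is illustrative; the miss rate is given as a fraction between 0 and 1.

```c
#include <assert.h>

/* Average cycles per memory access:
   (1 - miss_rate) * hit_cost + miss_rate * miss_cost. */
double avg_access_cost(double miss_rate, double hit_cost, double miss_cost) {
    assert(miss_rate >= 0.0 && miss_rate <= 1.0);
    return (1.0 - miss_rate) * hit_cost + miss_rate * miss_cost;
}
```

Evaluating it for the 2, 4 and 8 Kbyte configurations reproduces the 4.7, 4.105 and 4.8904 cycle figures, showing that a lower miss rate does not help once the higher hit cost outweighs it.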

Chapter 4 Interfacing
Manish Man Shrestha
Cosmos College of Management and Technology
Table of Contents
Introduction
Bus
Ports
Timing Diagrams
Basic protocol concepts
  Time multiplexing
  Strobe protocol
  Handshake protocol
Microprocessor interfacing: I/O addressing
  Port-based I/O (parallel I/O)
  Bus-based I/O
    Types of bus-based I/O
Compromises/extension for I/O addressing
  Parallel I/O peripheral
  Extended parallel I/O
ISA bus
A basic memory protocol
Microprocessor Interrupts
  Fixed interrupt
  Interrupt-driven I/O using fixed ISR location example
  Vectored interrupt
  Interrupt-driven I/O using vectored interrupt
Interrupt address table
Maskable vs. non-maskable interrupts
Direct memory access
  Peripheral to memory transfer with DMA
Arbitration
  Priority arbiter
    Types of priority
  Daisy-chain arbitration
    Pros/cons of Daisy-chain arbitration
  Network-oriented arbitration
Multilevel bus architectures
  Processor-local bus
  Peripheral bus
  Bridge
Advanced communication principles
Parallel communication
Serial communication
Wireless communication
Error detection and correction
Serial protocols: I2C
Serial protocols: CAN
Serial protocols: FireWire
Serial protocols: USB
Parallel protocols: PCI Bus
Parallel protocols: ARM Bus
Wireless protocols: IrDA
Wireless protocols: Bluetooth
Wireless Protocols: IEEE 802.11
Introduction
The aspects of embedded system functionality are processing, storage and communication. Communication
covers the transfer of data between processors, memories and various peripherals. These transfers
are implemented using buses, and connecting components in this way is called interfacing.
Bus
A bus is a set of wires with a single function (e.g., an address bus or a data bus) or the entire collection of
wires (address, data and control) together with its associated protocol. It can be uni-directional or bi-directional.
Ports
A port is the conducting device on the periphery of a component. It connects the bus to a processor or memory
and is often referred to as a pin. It can be a single wire or a set of wires with a single function.
Timing Diagrams
A timing diagram is the most common method for describing a communication protocol. Time proceeds
to the right along the x-axis, and the diagram shows both control signals and data signals.
Basic protocol concepts

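In a communication protocol, the master generally initiates the communication and the servant (slave)
responds to the master. Independently of these roles, the sender is the party that puts data on the bus
and the receiver is the party that reads it.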
Time multiplexing
• Share a single set of wires for multiple pieces of data
• Saves wires at expense of time
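As a minimal sketch of the idea, the Python below sends a 16-bit word over a hypothetical 8-bit data bus in two consecutive transfers. The high-byte-first order and the function names are assumptions made for this illustration.

```python
def split_word(word16):
    """Sender: break a 16-bit word into two 8-bit transfers, high byte first."""
    if not 0 <= word16 <= 0xFFFF:
        raise ValueError("word must fit in 16 bits")
    return [(word16 >> 8) & 0xFF, word16 & 0xFF]


def join_bytes(transfers):
    """Receiver: rebuild the 16-bit word from the two 8-bit transfers."""
    high, low = transfers
    return (high << 8) | low


transfers = split_word(0xBEEF)   # two bus cycles on 8 wires instead of one on 16
received = join_bytes(transfers)
print([hex(b) for b in transfers], hex(received))
```

Half the wires are needed, but the word takes two bus cycles instead of one.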
Strobe protocol
• Master asserts req to receive data
• Servant puts data on bus within time t_access
• Master receives data and deasserts req
• Servant ready for next request
Handshake protocol
• Master asserts req to receive data
• Servant puts data on bus and asserts ack
• Master receives data and deasserts req
• Servant ready for next request
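The difference between the strobe and handshake protocols can be sketched with a small tick-based simulation. Everything here (the `Servant` class, the `latency` parameter, the tick model) is a hypothetical model written for illustration, not a description of real hardware.

```python
class Servant:
    """Hypothetical servant that needs `latency` ticks to put data on the bus."""

    def __init__(self, value, latency):
        self.value = value
        self.latency = latency
        self.data = None        # what the servant currently drives on the data lines
        self.ack = False        # only the handshake protocol looks at this line
        self._countdown = None

    def tick(self, req):
        """Advance one time step, given the level of the master's req line."""
        if not req:             # req deasserted: release the bus, ready for next request
            self.data, self.ack, self._countdown = None, False, None
            return
        if self._countdown is None:      # a new request has just been seen
            self._countdown = self.latency
        if self._countdown > 0:
            self._countdown -= 1
        if self._countdown == 0:         # data is ready: drive it and assert ack
            self.data, self.ack = self.value, True


def strobe_read(servant, t_access):
    """Strobe: master asserts req, waits exactly t_access ticks, then samples."""
    for _ in range(t_access):
        servant.tick(req=True)
    data = servant.data                  # None if the servant was slower than t_access
    servant.tick(req=False)              # master deasserts req
    return data


def handshake_read(servant, timeout=100):
    """Handshake: master asserts req and waits until the servant asserts ack."""
    for ticks in range(1, timeout + 1):
        servant.tick(req=True)
        if servant.ack:
            data = servant.data
            servant.tick(req=False)      # master deasserts req
            return data, ticks
    raise TimeoutError("servant never asserted ack")
```

With a servant that needs 5 ticks, `strobe_read(..., t_access=3)` samples too early and gets nothing, whereas `handshake_read` simply waits the 5 ticks. The handshake tolerates servants of unknown speed at the cost of an extra ack line.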
Microprocessor interfacing: I/O addressing

• A microprocessor communicates with other devices using some of its pins
Port-based I/O (parallel I/O)

• Processor has one or more N-bit ports
• Processor’s software reads and writes a port just like a register
• E.g., P0 = 0xFF; v = P1.2; -- P0 and P1 are 8-bit ports
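The register-like behaviour of a port can be mimicked in Python. The `Port8` class below is a hypothetical model for illustration only; on a real 8051 the compiler maps `P0` and `P1.2` directly onto hardware.

```python
class Port8:
    """Hypothetical model of an 8-bit port that software treats like a register."""

    def __init__(self, value=0):
        self.value = value & 0xFF

    def write(self, value):
        """Whole-port write, like P0 = 0xFF."""
        self.value = value & 0xFF

    def read_bit(self, n):
        """Single-pin read, like v = P1.2 for n = 2."""
        return (self.value >> n) & 1

    def write_bit(self, n, bit):
        """Single-pin write that leaves the other seven pins unchanged."""
        if bit:
            self.value |= (1 << n)
        else:
            self.value &= ~(1 << n) & 0xFF


P0, P1 = Port8(), Port8(0b00000100)   # assume pin 2 of P1 is currently high
P0.write(0xFF)                        # P0 = 0xFF
v = P1.read_bit(2)                    # v = P1.2
P0.write_bit(7, 0)                    # clear only the top pin of P0
```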
Bus-based I/O
• Processor has address, data and control ports that form a single bus
• Communication protocol is built into the processor

• A single instruction carries out the read or write protocol on the bus
Types of bus-based I/O:

The processor talks to both memory and peripherals using the same bus. There are two ways to talk to peripherals:
1. Memory-mapped I/O
In memory-mapped I/O, peripheral registers occupy addresses in the same address space as memory. For example, if
the bus has a 16-bit address, the lower 32K addresses may correspond to memory and the upper 32K addresses may
correspond to peripherals.
2. Standard I/O (I/O-mapped I/O)
In standard I/O, an additional pin (M/IO) on the bus indicates whether the access is to memory or to a peripheral.
For example, if the bus has a 16-bit address, all 64K addresses correspond to memory when M/IO is set to 0 and all
64K addresses correspond to peripherals when M/IO is set to 1.
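The two addressing schemes can be contrasted with a small address-decoding sketch. It assumes the 16-bit bus and the 32K/32K split used in the example; the function names and the returned tuples are illustrative.

```python
def decode_memory_mapped(addr):
    """Memory-mapped I/O: one address space, upper half assigned to peripherals."""
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    if addr < 0x8000:                       # lower 32K addresses: memory
        return ("memory", addr)
    return ("peripheral", addr - 0x8000)    # upper 32K addresses: peripheral registers


def decode_standard_io(addr, m_io):
    """Standard I/O: the M/IO pin selects between two full 64K address spaces."""
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    return ("peripheral" if m_io else "memory", addr)


print(decode_memory_mapped(0x8000))      # ('peripheral', 0)
print(decode_standard_io(0x8000, 0))     # ('memory', 32768)
print(decode_standard_io(0x8000, 1))     # ('peripheral', 32768)
```

Memory-mapped I/O needs no extra pin or special instructions but gives up part of the memory address range. Standard I/O keeps the full range for memory at the cost of the extra M/IO line.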
Compromises/extension for I/O addressing

Parallel I/O peripheral
A parallel I/O peripheral is added when the processor supports only bus-based I/O but parallel I/O is needed for
interfacing. Each port of the peripheral is connected to a register inside the peripheral that the processor reads
and writes over the bus.
Extended parallel I/O
Extended parallel I/O is used when the processor supports port-based I/O but more ports are needed than the processor provides.
ISA bus
The ISA bus supports the standard I/O communication protocol. In this protocol the /IOR line is used for peripheral
reads and the /IOW line is used for peripheral writes.
A basic memory protocol
• Interfacing an 8051 to external memory
Ports P0 and P2 support port-based I/O when the 8051's internal memory is being used
Those ports serve as data/address buses when external memory is being used
The 16-bit address and 8-bit data are time multiplexed, so the low 8 bits of the address must be latched with the aid of
the ALE signal
Microprocessor Interrupts
Fixed interrupt
• The ISR address is built into the microprocessor and cannot be changed
• Either the ISR itself is stored at that address or, if not enough bytes are available there, a jump to the actual ISR is stored
Interrupt-driven I/O using fixed ISR location example
Vectored interrupt
• Peripheral must provide the address
• Common when microprocessor has multiple peripherals connected by a system bus
Interrupt-driven I/O using vectored interrupt
Interrupt address table

The interrupt address table is a compromise between fixed and vectored interrupts. It is a table in memory holding ISR
addresses. The peripheral does not provide the ISR address itself, but rather an index into the table.
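A minimal sketch of the table lookup, with Python functions standing in for ISR addresses. The ISR names and the table layout are invented for this illustration.

```python
def isr_timer():
    return "timer serviced"


def isr_uart():
    return "uart serviced"


# Table in memory holding ISR "addresses" (function objects stand in for addresses)
INTERRUPT_TABLE = [isr_timer, isr_uart]


def handle_interrupt(index):
    """The peripheral supplies only a small index; the processor looks up the ISR."""
    if not 0 <= index < len(INTERRUPT_TABLE):
        raise IndexError("no ISR registered for this index")
    return INTERRUPT_TABLE[index]()


print(handle_interrupt(1))
```

Because the peripheral sends only an index, an ISR can be moved by rewriting one table entry, without changing the peripheral.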
Maskable vs. non-maskable interrupts

• Maskable: programmer can set bit that causes processor to ignore interrupt
Important when in the middle of time-critical code
• Non-maskable: a separate interrupt pin that can’t be masked
Typically reserved for drastic situations, like power failure requiring immediate backup of data to non-volatile
memory
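The effect of the mask bit can be sketched as follows. The `Processor` class and the interrupt source names are hypothetical, and the model simply drops a masked interrupt, which is one possible reading of "ignore".

```python
class Processor:
    """Hypothetical processor model with one mask bit for maskable interrupts."""

    def __init__(self):
        self.mask = False       # True means maskable interrupts are ignored
        self.serviced = []      # record of interrupts that reached an ISR

    def maskable_interrupt(self, source):
        if self.mask:
            return False        # ignored while the mask bit is set
        self.serviced.append(source)
        return True

    def non_maskable_interrupt(self, source):
        self.serviced.append(source)    # the mask bit is never consulted
        return True


cpu = Processor()
cpu.mask = True                                     # entering time-critical code
ignored = cpu.maskable_interrupt("keypad")
taken = cpu.non_maskable_interrupt("power-fail")
cpu.mask = False                                    # leaving time-critical code
later = cpu.maskable_interrupt("keypad")
```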
Direct memory access

The DMA controller acts as a separate single-purpose processor that transfers data between memory and peripherals.
When a DMA request is made, the microprocessor relinquishes control of the system bus to the DMA controller and can
meanwhile continue executing its regular program. This avoids the inefficient storing and restoring of state caused by
an ISR call, and the regular program need not wait unless it requires the system bus.
Peripheral to memory transfer with DMA
Arbitration
Arbitration is a method of handling service requests from multiple peripherals. There are three types of arbitration:
priority arbiter, daisy-chain arbitration and network-oriented arbitration.
Priority arbiter
The priority arbiter is a single-purpose processor. Peripherals make requests to the arbiter, and the arbiter makes
requests to the resource. The arbiter assigns a priority to each peripheral connected to it and requests service in
accordance with those priorities. The arbiter is connected to the system bus for configuration only.
1. Microprocessor is executing its program.

2. Peripheral1 needs servicing so asserts Ireq1. Peripheral2 also needs servicing so asserts Ireq2.
3. Priority arbiter sees at least one Ireq input asserted, so asserts Int.
4. Microprocessor stops executing its program and stores its state.
5. Microprocessor asserts Inta.
6. Priority arbiter asserts Iack1 to acknowledge Peripheral1.
7. Peripheral1 puts its interrupt address vector on the system bus
8. Microprocessor jumps to the address of ISR read from data bus, ISR executes and returns (and
completes handshake with arbiter).
9. Microprocessor resumes executing its program.
Types of priority
• Fixed priority
– each peripheral has unique rank
– highest rank chosen first with simultaneous requests
– preferred when clear difference in rank between peripherals
• Rotating priority (round-robin)
– priority changed based on history of servicing
– better distribution of servicing especially among peripherals with similar priority demands
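The two policies can be compared with a short sketch. Requests are modelled as a list of booleans in which index 0 has the highest fixed rank; that encoding and the function names are assumptions for illustration.

```python
def fixed_priority(requests):
    """Grant the requesting peripheral with the highest rank (lowest index)."""
    for i, requesting in enumerate(requests):
        if requesting:
            return i
    return None


def rotating_priority(requests, last_served):
    """Round-robin: start searching just after the peripheral served last."""
    n = len(requests)
    for offset in range(1, n + 1):
        i = (last_served + offset) % n
        if requests[i]:
            return i
    return None


# All three peripherals request service continuously.
always = [True, True, True]
fixed_order = [fixed_priority(always) for _ in range(4)]

rotating_order = []
last = 2                      # pretend peripheral 2 was served most recently
for _ in range(4):
    last = rotating_priority(always, last)
    rotating_order.append(last)

print(fixed_order)            # peripheral 0 wins every time
print(rotating_order)         # service is spread across all peripherals
```

Under constant load the fixed policy starves the lower-ranked peripherals, which is acceptable only when their ranks really differ. The rotating policy shares service evenly.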
Daisy-chain arbitration
In daisy-chain arbitration, arbitration is done by the peripherals themselves. The arbitration logic is either built into
each peripheral or added to it as external logic. The peripherals are connected to each other in a daisy-chain manner:
only one peripheral is connected directly to the resource and all others are connected “upstream” of it. The peripheral
closest to the resource has the highest priority. A peripheral's req flows “downstream” to the resource, and the
resource's ack flows “upstream” to the requesting peripheral.
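The way the ack travels along the chain can be sketched in a few lines. Position 0 is the peripheral wired directly to the resource. The optional `broken` argument is a modelling assumption: it represents a peripheral that no longer forwards req or ack, so nothing upstream of it can be reached.

```python
def daisy_chain_grant(requests, broken=None):
    """Return the position of the peripheral that absorbs the ack, or None.

    requests[i] is True if peripheral i wants service; index 0 is closest
    to the resource and therefore has the highest priority.
    """
    reachable = requests if broken is None else requests[:broken]
    for position, wants_service in enumerate(reachable):
        if wants_service:
            return position      # this peripheral keeps the ack for itself
        # otherwise the peripheral passes the ack on upstream
    return None


print(daisy_chain_grant([False, True, True]))             # 1: closest requester wins
print(daisy_chain_grant([False, False, True], broken=1))  # None: peripheral 2 is cut off
```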
Pros/cons of Daisy-chain arbitration

– Easy to add/remove peripheral - no system redesign needed
– Does not support rotating priority
– One broken peripheral can cause loss of access to other peripherals
Network-oriented arbitration
In network-oriented arbitration, multiple microprocessors share a bus (network). The arbitration is typically built into
the bus protocol. It is typically used for connecting multiple distant chips.
Multilevel bus architectures

In a multilevel bus architecture, buses are divided into a processor-local bus and a peripheral bus. A single bus for all
communication is undesirable: every peripheral would need a high-speed, processor-specific bus interface, and too many
peripherals on one bus slow it down.
Processor-local bus
High speed, wide, most frequent communication
Connects microprocessor, cache, memory controllers, etc.
Peripheral bus
Lower speed, narrower, less frequent communication
Typically industry standard bus (ISA, PCI) for portability
Bridge
Single-purpose processor converts communication between busses
Advanced communication principles

• Layering
Break complexity of communication protocol into pieces easier to design and understand
Lower levels provide services to higher level
Lower level might work with bits while higher level might work with packets of data
Physical layer
Lowest level in hierarchy
Medium to carry data from one actor (device or node) to another
• Parallel communication
Physical layer capable of transporting multiple bits of data
• Serial communication
Physical layer transports one bit of data at a time
• Wireless communication
No physical connection needed for transport at physical layer
Parallel communication
• Multiple data, control, and possibly power wires
One bit per wire
• High data throughput with short distances
• Typically used when connecting devices on same IC or same circuit board
Bus must be kept short
long parallel wires result in high capacitance values which requires more time to charge/discharge
Data misalignment between wires increases as length increases
• Higher cost, bulky
Serial communication
• Single data wire, possibly also control and power wires
• Words transmitted one bit at a time
• Higher data throughput with long distances
Less average capacitance, so more bits per unit of time
• Cheaper, less bulky
• More complex interfacing logic and communication protocol
Sender needs to decompose word into bits
Receiver needs to recompose bits into word
Control signals often sent on same wire as data increasing protocol complexity
Wireless communication
• Infrared (IR)
Electromagnetic wave frequencies just below visible light spectrum
Diode emits infrared light to generate signal
Infrared transistor detects signal, conducts when exposed to infrared light
Cheap to build
Need line of sight, limited range
• Radio frequency (RF)
Electromagnetic wave frequencies in radio spectrum
Analog circuitry and antenna needed on both sides of transmission
Line of sight not needed, transmitter power determines range
Error detection and correction

• Often part of bus protocol
• Error detection: ability of receiver to detect errors during transmission
• Error correction: ability of receiver and transmitter to cooperate to correct problem
Typically done by acknowledgement/retransmission protocol
• Bit error: single bit is inverted
• Burst of bit error: consecutive bits received incorrectly
• Parity: extra bit sent with word used for error detection
Odd parity: data word plus parity bit contains odd number of 1’s
Even parity: data word plus parity bit contains even number of 1’s
Always detects single bit errors, but not all burst bit errors
• Checksum: extra word sent with data packet of multiple words
e.g., extra word contains XOR sum of all data words in packet
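Parity and an XOR checksum are both small enough to sketch directly. The function names and sample data words are illustrative, and the checksum shown is the XOR variant mentioned above rather than any specific bus standard.

```python
from functools import reduce


def parity_bit(word, odd=False):
    """Parity bit that makes the count of 1s in word plus parity even (or odd)."""
    ones = bin(word).count("1")
    bit = ones % 2                  # even parity: total number of 1s becomes even
    return bit ^ 1 if odd else bit


def parity_ok(word, bit, odd=False):
    """Receiver check: recompute the parity and compare with the received bit."""
    return parity_bit(word, odd) == bit


def xor_checksum(words):
    """Extra word holding the XOR of all data words in the packet."""
    return reduce(lambda a, b: a ^ b, words, 0)


sent = 0b1011
p = parity_bit(sent)                              # even parity bit for the word
single_error = parity_ok(sent ^ 0b0100, p)        # one bit flipped in transit
double_error = parity_ok(sent ^ 0b0110, p)        # two adjacent bits flipped
checksum = xor_checksum([0x12, 0x34, 0x56])
```

Here `single_error` is False, so the error is detected, while `double_error` is True: flipping two bits leaves the parity unchanged, which is why parity cannot catch every burst error.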
Serial protocols: I2C

• I2C (Inter-IC)
• Two-wire serial bus protocol developed by Philips Semiconductors in the early 1980s
• Enables peripheral ICs to communicate using simple communication hardware
• Data transfer rates up to 100 kbits/s and 7-bit addressing possible in normal mode
• Up to 400 kbits/s in fast mode and 3.4 Mbits/s in high-speed mode; 10-bit addressing also available
• Common devices capable of interfacing to I2C bus:
• EEPROMs, Flash, and some RAM memory, real-time clocks, watchdog timers, and microcontrollers
Serial protocols: CAN

• CAN (Controller area network)
– Protocol for real-time applications
– Developed by Robert Bosch GmbH
– Originally for communication among components of cars
– Applications now using CAN include:
• elevator controllers, copiers, telescopes, production-line control systems, and medical instruments
– Data transfer rates up to 1 Mbit/s and 11-bit addressing
– Common devices interfacing with CAN:
• 8051-compatible 8592 processor and standalone CAN controllers
– Actual physical design of CAN bus not specified in protocol
• Requires devices to transmit/detect dominant and recessive signals to/from bus
• e.g., if a single data wire is used, ‘0’ = dominant and ‘1’ = recessive (the usual CAN convention)
• Bus guarantees dominant signal prevails over recessive signal if asserted simultaneously
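The "dominant prevails" rule can be sketched as a function that resolves what the bus carries when several nodes drive one bit at the same time. This is an illustrative model only. The default of logical 0 as the dominant level is the usual CAN convention and is an assumption of the sketch, exposed as a parameter.

```python
def bus_level(driven_bits, dominant=0):
    """Resolve the bus value when several nodes drive one bit simultaneously."""
    recessive = 1 - dominant
    return dominant if dominant in driven_bits else recessive


def overridden(sent_bit, driven_bits, dominant=0):
    """True if a node reads back a different level from the one it sent."""
    return sent_bit != bus_level(driven_bits, dominant)


print(bus_level([1, 1, 0]))       # 0: one dominant bit wins over two recessive bits
print(bus_level([1, 1, 1]))       # 1: the bus stays recessive if nobody drives dominant
print(overridden(1, [1, 0]))      # True: this node sent recessive but sees dominant
```

A node that sends recessive and reads back dominant knows that another node is transmitting at the same time.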
Serial protocols: FireWire

• FireWire (a.k.a. I-Link, Lynx, IEEE 1394)
• High-performance serial bus developed by Apple Computer Inc.
• Designed for interfacing independent electronic components
• e.g., Desktop, scanner
• Data transfer rates from 12.5 to 400 Mbits/s, 64-bit addressing
• Plug-and-play capabilities
• Packet-based layered design structure
• Applications using FireWire include:
• disk drives, printers, scanners, cameras
• Capable of supporting a LAN similar to Ethernet
• 64-bit address:
• 10 bits for network ids, 1023 subnetworks
• 6 bits for node ids, each subnetwork can have 63 nodes
• 48 bits for memory address, each node can have 281 terabytes of distinct locations
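The 10 + 6 + 48 bit split can be made concrete by packing and unpacking the fields. The field order (network id in the top bits) follows the order of the list above; the function names are illustrative.

```python
NETWORK_BITS, NODE_BITS, MEMORY_BITS = 10, 6, 48


def pack_address(network_id, node_id, memory_offset):
    """Combine the three fields into one 64-bit address."""
    if not 0 <= network_id < (1 << NETWORK_BITS):
        raise ValueError("network id needs more than 10 bits")
    if not 0 <= node_id < (1 << NODE_BITS):
        raise ValueError("node id needs more than 6 bits")
    if not 0 <= memory_offset < (1 << MEMORY_BITS):
        raise ValueError("memory offset needs more than 48 bits")
    return (network_id << (NODE_BITS + MEMORY_BITS)) | (node_id << MEMORY_BITS) | memory_offset


def unpack_address(address):
    """Split a 64-bit address back into (network id, node id, memory offset)."""
    network_id = (address >> (NODE_BITS + MEMORY_BITS)) & ((1 << NETWORK_BITS) - 1)
    node_id = (address >> MEMORY_BITS) & ((1 << NODE_BITS) - 1)
    memory_offset = address & ((1 << MEMORY_BITS) - 1)
    return network_id, node_id, memory_offset


address = pack_address(5, 9, 0x1234)
print(hex(address), unpack_address(address))
print(1 << MEMORY_BITS)   # 281474976710656 distinct locations per node
```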
Serial protocols: USB

• USB (Universal Serial Bus)
• Easier connection between PC and monitors, printers, digital speakers, modems, scanners, digital
cameras, joysticks, multimedia game equipment
• Data rates
a. USB 1.0 January 1996 Full Speed (12 Mbit/s)
b. USB 1.1 August 1998 Full Speed (12 Mbit/s)
c. USB 2.0 April 2000 High Speed (480 Mbit/s)
d. USB 3.0 November 2008 SuperSpeed (5 Gbit/s)
• Tiered star topology can be used
a. One USB device (hub) connected to PC
i. hub can be embedded in devices like monitor, printer, or keyboard or can be standalone
b. Multiple USB devices can be connected to hub
c. Up to 127 devices can be connected like this
• USB host controller
a. Manages and controls bandwidth and driver software required by each peripheral
b. Dynamically allocates power downstream according to devices connected/disconnected
Parallel protocols: PCI Bus

• PCI Bus (Peripheral Component Interconnect)
– High performance bus originated at Intel in the early 1990’s
– Standard adopted by industry and administered by PCISIG (PCI Special Interest Group)
– Interconnects chips, expansion boards, processor memory subsystems
– Data transfer rates of 127.2 to 508.6 Mbytes/s and 32-bit addressing
• Later extended to 64-bit while maintaining compatibility with 32-bit schemes
– Synchronous bus architecture
– Multiplexed data/address lines
Parallel protocols: ARM Bus

• ARM Bus
– Designed and used internally by ARM Corporation
– Interfaces with ARM line of processors
– Many IC design companies have own bus protocol
– Data transfer rate is a function of clock speed
• If clock speed of bus is X, transfer rate = 16 x X bits/s
– 32-bit addressing
Wireless protocols: IrDA

• IrDA
– Protocol suite that supports short-range point-to-point infrared data transmission
– Created and promoted by the Infrared Data Association (IrDA)
– Data transfer rates between 9.6 kbps and 4 Mbps
– IrDA hardware deployed in notebook computers, printers, PDAs, digital cameras, public phones,
cell phones
– Lack of suitable drivers has slowed use by applications
– Becoming available on some embedded OS’s
Wireless protocols: Bluetooth

• Bluetooth
– New, global standard for wireless connectivity
– Based on low-cost, short-range radio link
– Connection established when within 10 meters of each other
– No line-of-sight required
• e.g., Connect to printer in another room
Wireless Protocols: IEEE 802.11

• IEEE 802.11
– Proposed standard for wireless LANs

– Specifies parameters for PHY and MAC layers of network
• PHY layer
– physical layer
– handles transmission of data between nodes
– provisions for data transfer rates of 1 or 2 Mbps
– operates in 2.4 to 2.4835 GHz frequency band (RF)
– or 300 to 428,000 GHz (IR)
• MAC layer
– medium access control layer
– protocol responsible for maintaining order in shared medium
– collision avoidance/detection
Real Time Operating System Notes
Manish Man Shrestha
2018
Table of Contents
Introduction
Benefits of RTOS
Things to consider while choosing an RTOS
Features of RTOS
RTOS Architecture
Kernel
  Monolithic kernel
  Microkernel
  Exokernel
Real Time Kernel
  Structure of real time kernel
Scheduling
  Cooperative scheduling
  Preemptive scheduling
    Rate-monotonic scheduling
    Round-robin scheduling
    Fixed priority pre-emptive scheduling
  Earliest Deadline First approach
Process
Thread
Task
  State of Task in a system
    Idle (created) state
    Ready (Active) State
    Running state
    Blocked (waiting) state
    Deleted (finished) state
Interrupt routines in RTOS
  1. Direct call to an ISR by an interrupting source and ISR sending an interrupt enter message to OS
  2. RTOS first interrupting on an interrupt, the RTOS calling the corresponding ISR
  3. RTOS first interrupting on interrupt, then RTOS calling the corresponding ISR, the ISR sending message to priority queue on interrupt service threads by temporary suspension of a scheduled task
Types of interrupt handlers in RTOS
Intertask Communication and Synchronization
  Shared Variables or Memory Areas
  Signals
  Event Flag Groups
  Semaphores
  Mailboxes
  Queues
  Mutexes
Task control block (TCB)
  Task Information at the TCB
  Information about the task state at Task Control Block
Task's context and context switching
Task endless event-waiting loop
Task characteristics
Memory allocation in RTOS
  Classification of memory allocation techniques
    Manual Memory Allocation
    Automatic memory allocation

Real Time Operating System (RTOS)
Introduction
A real-time operating system (RTOS) is an operating system (OS) intended to serve real-time
applications that process data as it comes in, typically without buffer delays. Processing in an RTOS
is either event driven or time sharing. Event-driven systems switch between tasks based on their
priorities, while time-sharing systems switch tasks based on clock interrupts.
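The two switching policies can be contrasted with a very small sketch. It assumes that a larger number means a higher priority and that every clock interrupt hands the CPU to the next task in a fixed rotation; both are simplifications chosen for illustration, and the task names are made up.

```python
def event_driven_pick(ready_tasks):
    """Event driven: run the ready task with the highest priority.

    ready_tasks is a list of (priority, name) pairs; a larger number is
    assumed to mean a higher priority.
    """
    if not ready_tasks:
        return None
    return max(ready_tasks)[1]


def time_sharing_order(tasks, ticks):
    """Time sharing: each clock interrupt switches to the next task in turn."""
    return [tasks[t % len(tasks)] for t in range(ticks)]


running = event_driven_pick([(1, "logger"), (5, "motor_control"), (3, "display")])
schedule = time_sharing_order(["A", "B", "C"], ticks=5)
print(running)     # motor_control
print(schedule)    # ['A', 'B', 'C', 'A', 'B']
```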
Benefits of RTOS
• Priority Based Scheduling: The ability to separate critical processing from
non-critical processing is a powerful tool.
• Abstracting Timing Information: The RTOS is responsible for timing and provides
API functions. This allows for cleaner (and smaller) application code.
• Maintainability/Extensibility: Abstracting timing dependencies and task-based
design results in fewer interdependencies between modules. This makes for easier
maintenance.
• Modularity: The task-based API naturally encourages modular development as a task
will typically have a clearly defined role.
• Promotes Team Development: The task-based system allows separate
designers/teams to work independently on their parts of the project.
• Easier Testing: Modular task-based development allows for modular task based
testing.
• Code Reuse: Another benefit of modularity is that similar applications on similar
platforms will inevitably lead to the development of a library of standard tasks.
• Improved Efficiency: An RTOS can be entirely event driven; no processing time is
wasted polling for events that have not occurred.
• Idle Processing: Background or idle processing is performed in the idle task. This
ensures that things such as CPU load measurement, background CRC checking, etc. will
not affect the main processing.
Things to consider while choosing an RTOS

• Responsiveness: The RTOS scheduling algorithm, interrupt latency and context switch
times largely determine the responsiveness and determinism of the system. The
most important consideration is what type of response is desired. Is a hard real-time
response required? This means that there are precisely defined deadlines that, if not
met, will cause the system to fail. Alternatively, would a non-deterministic, soft
real-time response be appropriate? In that case there are no guarantees as to when each
task will complete.
• Available system resources: Microkernels use minimum system resources and
provide limited but essential task scheduling functionality. Microkernels generally
deliver a hard real-time response, and are used extensively with embedded
microprocessors with limited RAM/ROM capacity, but can also be appropriate for
larger embedded processor systems.
Alternatively, a full-featured OS like Linux or WinCE could be used. These provide a
feature-rich operating system environment, normally supplied with drivers, GUIs and
middleware components. Full-featured OSs are generally less responsive, require more
memory and more processing power than microkernels, and are mainly used on
powerful embedded processors where system resources are plentiful.
• Open source or professionally licensed: There are widely used, free open source
RTOSs available, distributed under GPL or modified GPL licenses. However, these
licenses may contain copyleft restrictions and offer little protection. Professionally
licensed RTOS products remove the copyleft restrictions and offer full IP infringement
indemnification and warranties. In addition, you have a single company providing
support and taking responsibility for the quality of your product.
• Quality: What emphasis does the RTOS supplier place on quality within their
organization? Quality is more than just a coding standard. Are the correct procedures
in place to guarantee the quality of future products and support? Well-managed
companies that take quality seriously tend to be ISO 9001 certified.
• Safety Certification: Pre-certified and certifiable RTOSs are available for
applications that require certification to international design standards such as
DO-178C and IEC 61508. These RTOSs provide key safety features, and the design
evidence required by certification bodies to confirm that the process used to develop
the RTOS meets the relevant design standard.
• Licensing: It’s not only the RTOS functionality and features that you’ll need to
consider, but the licensing model that will work best for your project budget and the
company’s “return on investment”.
• RTOS Vendor: The company behind the RTOS is just as important as selecting the
correct RTOS itself. Ideally you want to build a relationship with a supplier that can
support not only your current product, but also your products of the future. To do this
you need to select a proactive supplier with a good reputation, working with leading
silicon manufacturers to ensure they can support the newest processors and tools.
Features of RTOS
The design of an RTOS is essentially a balance between providing a reasonably rich feature set for
application development and deployment and not sacrificing predictability and timeliness. A basic
RTOS will be equipped with the following features:
• Multitasking and Preemptibility: An RTOS must be multi-tasked and preemptible to
support multiple tasks in real-time applications. The scheduler should be able to
preempt any task in the system and allocate the resource to the task that needs it most
even at peak load.
• Task Priority: Preemption requires the capability to identify the task that needs a
resource the most and to give it control of that resource. In an RTOS, this
capability is achieved by assigning each task an appropriate priority level.
Thus, it is important for an RTOS to be equipped with this feature.
• Reliable and Sufficient Inter Task Communication Mechanism: For multiple tasks
to communicate in a timely manner and to ensure data integrity among each other,
reliable and sufficient inter-task communication and synchronization mechanisms are
required.
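• Priority Inheritance: To allow applications with stringent priority requirements to be
implemented, an RTOS must have a sufficient number of priority levels when using
priority scheduling. Priority inheritance itself means that a low-priority task holding a
resource temporarily inherits the priority of a higher-priority task waiting for that
resource, which limits priority inversion.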
• Predefined Short Latencies: An RTOS needs accurately defined, short timings
for its system calls. The behavior metrics are:
o Task switching latency: The time needed to save the context of the currently
executing task and switch to another task. This should be short.
o Interrupt latency: The time elapsed between execution of the last instruction of
the interrupted task and the first instruction in the interrupt handler.
o Interrupt dispatch latency: The time from the last instruction in the interrupt
handler to the next task scheduled to run.
• Control of Memory Management: To ensure a predictable response to an interrupt, an
RTOS should provide a way for a task to lock its code and data into real memory.
RTOS Architecture
The architecture of an RTOS is dependent on the complexity of its deployment. Good RTOSs are
scalable to meet different sets of requirements for different applications. For simple applications,
an RTOS usually comprises only a kernel. For more complex embedded systems, an RTOS can be
a combination of various modules, including the kernel, networking protocol stacks, and other
components as illustrated in Figure 1.

Figure 1 General Architecture of RTOS
Kernel
An operating system generally consists of two parts: kernel space (kernel mode) and user space
(user mode). The kernel is the smallest and most central component of an operating system. Its
services include managing memory and devices and providing an interface through which software
applications use these resources. Additional services such as managing protection of programs and
multitasking may be included depending on the architecture of the operating system. The kernel is a
bridge between applications and the actual data processing done at the hardware level. It is the
heart of an operating system.
Figure 2 RTOS kernel

There are three broad categories of kernel models available, namely:

Monolithic kernel
A monolithic kernel runs all basic system services (process and memory management, interrupt
handling and I/O communication, file system, etc.) in kernel space. As such, monolithic kernels
provide rich and powerful abstractions of the underlying hardware. The number of context switches
and the amount of messaging involved are greatly reduced, which makes it run faster than a
microkernel. Examples are Linux and Windows.
Figure 3 Monolithic Kernel Based Operating System
Microkernel
A microkernel runs only basic process communication (messaging) and I/O control. The other system
services (file system, networking, etc.) reside in user space in the form of daemons/servers. Thus,
microkernels provide a smaller set of simple hardware abstractions. A microkernel is more stable
than a monolithic kernel because the kernel is unaffected even if one of the servers (e.g. the file
system) fails. Examples are AmigaOS and QNX.

Figure 4 Microkernel Based Operating System
Exokernel
The exokernel concept is orthogonal to the micro- vs. monolithic kernel distinction: it gives an
application efficient control over hardware. The kernel runs only the services that protect
resources (e.g. tracking ownership, guarding usage, revoking access to resources) and provides a
low-level interface for library operating systems (libOSes), leaving resource management to the
application.
Figure 5 Exokernel Based Operating System

An RTOS generally avoids implementing the kernel as a large monolithic program. The kernel is
developed instead as a micro-kernel with added configurable functionalities. The kernel of an
RTOS provides an abstraction layer between the application software and hardware.

Real Time Kernel
A real time kernel usually provides the following basic activities:
• Process management
• Interrupt handling
• Process synchronization
• Other kernel activities involve the initialization of internal data structures (such as
queues, tables, task control blocks, global variables, semaphores, and so on)
Structure of real time kernel
Figure 6 Structure of a real time kernel

Machine layer: This layer directly interacts with the hardware of the physical machine. The
primitives realized at this level mainly deal with activities such as context switching, interrupt
handling, and timer handling. The primitives are not visible at the user level.
List management layer: To keep track of the status of the various tasks, the kernel has to manage
a number of lists, where tasks having the same state are enqueued. This layer provides the basic
primitives for inserting tasks into a list and removing them from it.
Processor management layer: The mechanisms developed in this layer concern only scheduling
and dispatching operations.
Service layer: This layer provides all services visible at the user level as a set of system calls.
Typical services concern task creation, task abortion, suspension of periodic instances, activation
and suspension of aperiodic instances, and system inquiry operations.
Scheduling
Scheduling is the method by which work is assigned to resources that complete the work. The
work may be virtual computation elements such as threads, processes or data flows, which are in
turn scheduled onto hardware resources such as processors and network links.

A scheduler is what carries out the scheduling activity. Schedulers are often implemented so that
they keep all resources busy, allow multiple tasks to share system resources effectively, or
achieve a target quality of service.
Some commonly used RTOS scheduling algorithms are:
• Cooperative scheduling
• Preemptive scheduling
• Rate-monotonic scheduling
• Round-robin scheduling
• Fixed priority pre-emptive scheduling
• Earliest Deadline First approach
Cooperative scheduling
Cooperative scheduling, also known as non-preemptive scheduling, is a style of multitasking in
which the operating system never initiates a context switch from a running process to another
process. Instead, processes voluntarily yield control periodically or when idle or logically blocked
in order to enable multiple applications to be run concurrently. This type of multitasking is called
"cooperative" because all programs must cooperate for the entire scheduling scheme to work.
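The voluntary yielding described above can be sketched with Python generators. This is only a simulation on a desktop interpreter, not RTOS code; the names task and run_cooperative are made up for this illustration.

```python
from collections import deque

def task(name, steps, log):
    """A cooperative task: does one step of work, then yields the CPU."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # voluntary yield; the scheduler never preempts

def run_cooperative(tasks):
    """Run generator-based tasks in circular order until all have finished."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run until the task yields
            ready.append(current)  # still alive: back to the end of the queue
        except StopIteration:
            pass                   # task finished: drop it

log = []
run_cooperative([task("A", 2, log), task("B", 3, log)])
print(log)
```

If one task never reached its yield, no other task would ever run, which is why every program has to cooperate.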
Preemptive scheduling
Preemption is the act of temporarily interrupting a task being carried out by an RTOS, without
requiring its cooperation, and with the intention of resuming the task at a later time. Such changes
of the executed task are known as context switches. It is normally carried out by a privileged task
or part of the system known as a preemptive scheduler, which has the power to preempt, or
interrupt, and later resume, other tasks in the system.
Rate-monotonic scheduling
Rate-monotonic scheduling (RMS) is a priority assignment algorithm used in real-time operating
systems (RTOS) with a static-priority scheduling class. The static priorities are assigned according
to the cycle duration of the job, so a shorter cycle duration results in a higher job priority.
These operating systems are generally preemptive and have deterministic guarantees with regard
to response times.
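The priority assignment rule can be sketched as follows. The task names and periods are invented for the example, and priority 0 is assumed to be the highest (conventions differ between RTOSs).

```python
def rate_monotonic_priorities(periods):
    """Map task name -> priority (0 = highest); a shorter period ranks higher."""
    ordered = sorted(periods, key=lambda name: periods[name])
    return {name: rank for rank, name in enumerate(ordered)}

# Hypothetical task set: name -> cycle duration in milliseconds
periods_ms = {"logger": 100, "motor_control": 5, "sensor_read": 20}
priorities = rate_monotonic_priorities(periods_ms)
print(priorities)
```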
Round-robin scheduling
Round-robin (RR) is one of the algorithms employed by RTOS schedulers. As the term is generally
used, time slices (also known as time quanta) are assigned to each process in equal portions and in
circular order, handling all processes without priority (also known as cyclic executive).
Round-robin scheduling is simple, easy to implement, and starvation-free.
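A small simulation of the time-slice rotation is given below. The burst times and the quantum are arbitrary example values, and round_robin is a hypothetical helper, not an RTOS call.

```python
from collections import deque

def round_robin(burst_times, quantum):
    """Return the execution order as a list of (task, time_run) slices."""
    remaining = deque(burst_times.items())
    timeline = []
    while remaining:
        name, left = remaining.popleft()
        run = min(quantum, left)           # run for one quantum at most
        timeline.append((name, run))
        if left > run:                     # unfinished: rejoin the end of the circle
            remaining.append((name, left - run))
    return timeline

timeline = round_robin({"A": 5, "B": 2, "C": 3}, quantum=2)
print(timeline)
```

Every task gets the CPU once per round, so no task can starve.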

Fixed priority pre-emptive scheduling
Fixed-priority preemptive scheduling is a scheduling system commonly used in real-time systems.
With fixed priority preemptive scheduling, the scheduler ensures that at any given time, the
processor executes the highest priority task of all those tasks that are currently ready to execute.
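The rule can be simulated tick by tick as sketched below. The task set is invented, a lower number is assumed to mean a higher priority, and one tick is taken as the smallest unit of execution.

```python
def fixed_priority_schedule(tasks, ticks):
    """tasks: name -> (priority, release_tick, exec_ticks); lower number = higher priority.
    Returns the name of the task run at each tick (None when the CPU is idle)."""
    remaining = {name: spec[2] for name, spec in tasks.items()}
    trace = []
    for now in range(ticks):
        ready = [name for name, (_, release, _) in tasks.items()
                 if release <= now and remaining[name] > 0]
        if not ready:
            trace.append(None)
            continue
        running = min(ready, key=lambda name: tasks[name][0])  # highest-priority ready task
        remaining[running] -= 1
        trace.append(running)
    return trace

trace = fixed_priority_schedule({"low": (2, 0, 3), "high": (1, 1, 2)}, ticks=6)
print(trace)
```

The task "low" starts first, is preempted as soon as "high" is released at tick 1, and resumes once "high" has finished.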
Earliest Deadline First approach
Earliest deadline first (EDF) or least time to go is a dynamic priority scheduling algorithm used in
real-time operating systems to place processes in a priority queue. Whenever a scheduling event
occurs (task finishes, new task released, etc.) the queue will be searched for the process closest to
its deadline. This process is the next to be scheduled for execution.
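A minimal sketch of such a deadline-ordered queue, using Python's heapq module, is shown below. The class name EdfQueue, its methods and the deadlines are assumptions made for this example.

```python
import heapq

class EdfQueue:
    """Ready queue ordered by absolute deadline (earliest first)."""

    def __init__(self):
        self._heap = []
        self._count = 0  # tie-breaker: equal deadlines keep release order

    def release(self, name, absolute_deadline):
        heapq.heappush(self._heap, (absolute_deadline, self._count, name))
        self._count += 1

    def next_task(self):
        """Remove and return the task closest to its deadline, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

queue = EdfQueue()
queue.release("telemetry", 50)
queue.release("brake", 10)
queue.release("display", 30)
order = [queue.next_task() for _ in range(4)]
print(order)
```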
Process
A process is an instance of a computer program that is being executed. It contains the program
code and its current activity. Depending on the operating system (OS), a process may be made up
of multiple threads of execution that execute instructions concurrently. While a computer program
is a passive collection of instructions, a process is the actual execution of those instructions.
Thread
A thread of execution is the smallest sequence of programmed instructions that can be managed
independently by a scheduler, which is typically a part of the operating system.
Thread advantages and characteristics:

• Faster to switch between threads; switching between user-level threads requires no major
intervention by operating system.
• Typically, an application will have a separate thread for each distinct activity.
• A Thread Control Block stores the information needed to manage and schedule a thread.
Task
A task consists of an executable program whose state is controlled by the OS. It runs when it is
scheduled to run by the OS kernel, which gives it control of the CPU on a task request or message.
A task is an executing unit of computation that is controlled by the OS scheduling mechanism,
which lets it execute on the CPU, and by the OS resource-management mechanism, which lets it use
the system memory and other system resources such as the network, files and so on.
A task is an independent process and cannot call another task. A task can send a signal or
message that lets another task run. The OS can only block a running task and let another task
gain access to the CPU to run its servicing code.
State of Task in a system
i) Idle state [Not attached or not registered]
ii) Ready state [Attached or registered]
iii) Running state
iv) Blocked (waiting) state
v) Delayed for a preset period
The number of possible states depends on the RTOS.
Idle (created) state

In the idle state, the task has been created and memory allotted to its structure. However, it is
not ready and is not schedulable by the kernel.
Ready (Active) State
In this state, the created task is ready and is schedulable by the kernel, but it is not running at
present because another task is scheduled to run and holds the system resources at this instant.
Running state
The task is executing its code and holds the system resources at this instant. It will run until it
needs some input, has to wait for an event, or is preempted by a higher-priority task.

Blocked (waiting) state
In the blocked state, execution of the task's code is suspended after the needed parameters are
saved into its context. The task needs some input, needs to wait for an event, or must wait for a
higher-priority task to block before it can run again.
Deleted (finished) state
The memory allotted to the task's structure is de-allocated and freed. To use the task again, it
has to be re-created.
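The states above can be modelled as a small state machine. The transition table below is an assumption pieced together from the descriptions in this section; real RTOSs define their own states and legal transitions.

```python
from enum import Enum, auto

class TaskState(Enum):
    IDLE = auto()
    READY = auto()
    RUNNING = auto()
    BLOCKED = auto()
    DELETED = auto()

# Assumed legal transitions (illustrative only)
ALLOWED = {
    TaskState.IDLE: {TaskState.READY, TaskState.DELETED},
    TaskState.READY: {TaskState.RUNNING, TaskState.DELETED},
    TaskState.RUNNING: {TaskState.READY, TaskState.BLOCKED, TaskState.DELETED},
    TaskState.BLOCKED: {TaskState.READY, TaskState.DELETED},
    TaskState.DELETED: set(),  # a deleted task must be re-created
}

def transition(current, new):
    """Return the new state, or raise ValueError if the move is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {new.name}")
    return new

state = transition(TaskState.IDLE, TaskState.READY)
state = transition(state, TaskState.RUNNING)
print(state.name)
```

In this model a blocked task cannot go straight to running; it first becomes ready and then waits for the scheduler.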
Interrupt routines in RTOS

ISRs have higher priority than the RTOS functions and the tasks. An ISR should not wait for
a semaphore, mutex, mailbox message or queue message. There are three alternative ways
for a system to respond to hardware source calls from the interrupts.
1. Direct call to an ISR by an interrupting source and ISR sending an interrupt enter
message to OS
• On an interrupt, the process running on the CPU is interrupted.
• The ISR corresponding to that source starts executing.
• The hardware source calls the ISR directly.
• The ISR just sends an ISR enter message to the RTOS to inform it that an ISR has taken control
of the CPU.
• ISR code can send into a mailbox or message queue, but the task waiting for the mailbox or
message queue does not start before the return from the ISR.
• On return from the ISR, after the saved context is retrieved, the RTOS either returns to the
interrupted task or reschedules the tasks.

2. RTOS first interrupting on an interrupt, then the RTOS calling the corresponding ISR
• On interrupt of a task, the RTOS itself first receives the hardware source call and initiates the
corresponding ISR after saving the present processor status.
• During execution, the ISR can then post one or more outputs for the events and messages into
the mailboxes or queues.

3. RTOS first interrupting on an interrupt, then the RTOS calling the corresponding ISR,
with the ISR sending a message to the priority queue of interrupt service threads by temporary
suspension of a scheduled task
Types of interrupt handlers in RTOS

In RTOS, interrupt handlers are divided into two parts: the First-Level Interrupt Handler (FLIH)
and the Second-Level Interrupt Handlers (SLIH). FLIHs are also known as hard interrupt handlers
or fast interrupt handlers, and SLIHs are also known as slow/soft interrupt handlers, or Deferred
Procedure Calls in Windows.
A FLIH implements at minimum platform-specific interrupt handling similar to interrupt routines.
In response to an interrupt, there is a context switch, and the code for the interrupt is loaded and
executed. The job of a FLIH is to quickly service the interrupt, or to record platform-specific
critical information which is only available at the time of the interrupt, and schedule the execution
of a SLIH for further long-lived interrupt handling.
A SLIH completes long interrupt processing tasks similarly to a process. SLIHs either have a
dedicated kernel thread for each handler, or are executed by a pool of kernel worker threads. These
threads sit on a run queue in the operating system until processor time is available for them to
perform processing for the interrupt. SLIHs may have a long-lived execution time, and thus are
typically scheduled similarly to threads and processes.

Intertask Communication and Synchronization
Different tasks in an embedded system typically must share the same hardware and software
resources or may rely on each other in order to function correctly. For these reasons, embedded
OSs provide different mechanisms that allow for tasks in a multitasking system to
intercommunicate and synchronize their behavior so as to coordinate their functions, avoid
problems, and allow tasks to run simultaneously in harmony.
Embedded OSs with multiple intercommunicating processes commonly implement interprocess
communication (IPC) and synchronization algorithms based upon one or some combination of
memory sharing, message passing, and signaling mechanisms.
There are three broad paradigms for inter-task communications and synchronization:
Task-owned facilities – attributes that an RTOS imparts to tasks that provide communication
(input) facilities. An example, discussed further below, is signals.
Kernel objects – facilities provided by the RTOS which represent stand-alone communication or
synchronization facilities. Examples include: event flags, mailboxes, queues/pipes, semaphores
and mutexes.
Message passing – a rationalized scheme where an RTOS allows the creation of message objects,
which may be sent from one task to another or to several others. This is fundamental to the
kernel design and leads to the description of such a product as being a “message passing RTOS”.
Shared Variables or Memory Areas
A simplistic approach to inter-task communication is to have variables or memory areas which are
accessible to all the tasks concerned. While accessing shared data as a means to communicate is a
simple approach, the major issue of race conditions can arise. A race condition occurs when a
process that is accessing shared variables is pre-empted before completing a modification access,
thus affecting the integrity of shared variables. To counter this issue, some kind of locking and
unlocking mechanism is needed.

Signals
Signals are one of the inter-task communication facilities offered in conventional RTOSes. They
consist of a set of bit flags – there may be 8, 16 or 32, depending on the specific implementation
– which are associated with a specific task.
A signal flag (or several flags) may be set by any task using an OR type of operation. Only the
task that owns the signals can read them. The reading process is generally destructive – i.e. the
flags are also cleared.
In some systems, signals are implemented in a more sophisticated way such that a special function
is automatically executed when any signal flags are set. This removes the necessity for the task to
monitor the flags itself.
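The OR-style setting and the destructive read can be sketched as below. The class TaskSignals and its 8-bit width are assumptions for illustration; ownership checks (only the owning task may read) are left out of this sketch.

```python
class TaskSignals:
    """Signal flags belonging to one task (8 flags assumed here)."""
    WIDTH_MASK = 0xFF

    def __init__(self):
        self._flags = 0

    def send(self, mask):
        """Any task may set flags; the bits are ORed into the current value."""
        self._flags |= mask & self.WIDTH_MASK

    def receive(self):
        """Destructive read by the owning task: return the flags and clear them."""
        flags, self._flags = self._flags, 0
        return flags

signals = TaskSignals()
signals.send(0b0001)
signals.send(0b0100)
first_read = signals.receive()
second_read = signals.receive()
print(bin(first_read), bin(second_read))
```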
Event Flag Groups
Event flag groups are like signals in that they are a bit-oriented inter-task communication facility.
They may similarly be implemented in groups of 8, 16 or 32 bits. They differ from signals in being
independent kernel objects; they do not “belong” to any specific task.
Any task may set and clear event flags using OR and AND operations. Likewise, any task may
interrogate event flags using the same kind of operation. In many RTOSes, it is possible to make
a blocking API call on an event flag combination; this means that a task may be suspended until a
specific combination of event flags has been set. There may also be a “consume” option available,
when interrogating event flags, such that all read flags are cleared.
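A non-blocking sketch of an event flag group follows. The class and parameter names (EventFlagGroup, require_all, consume) are hypothetical, and the blocking variant of the call is not modelled.

```python
class EventFlagGroup:
    """Event flags that are an independent object, not owned by any task."""

    def __init__(self):
        self._flags = 0

    def set(self, mask):
        self._flags |= mask    # OR sets the selected bits

    def clear(self, mask):
        self._flags &= ~mask   # AND with the inverted mask clears them

    def check(self, mask, require_all=True, consume=False):
        """Return the matched bits, or 0 if the requested combination is not set."""
        matched = self._flags & mask
        satisfied = (matched == mask) if require_all else (matched != 0)
        if not satisfied:
            return 0
        if consume:
            self._flags &= ~matched  # "consume" option: clear the flags that were read
        return matched

group = EventFlagGroup()
group.set(0b011)
all_of_three = group.check(0b111)                     # bit 2 is missing
any_of_three = group.check(0b111, require_all=False)
consumed = group.check(0b011, consume=True)
after_consume = group.check(0b011, require_all=False)
print(all_of_three, any_of_three, consumed, after_consume)
```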
Semaphores
Semaphores are independent kernel objects, which provide a flagging mechanism that is generally
used to control access to a resource. There are broadly two types: binary semaphores (that just
have two states) and counting semaphores (that have an arbitrary number of states). Some
processors support (atomic) instructions that facilitate the easy implementation of binary
semaphores. Binary semaphores may also be viewed as counting semaphores with a count limit of
1.
Any task may attempt to obtain a semaphore in order to gain access to a resource. If the current
semaphore value is greater than 0, the obtain will be successful, which decrements the semaphore
value. In many OSes, it is possible to make a blocking call to obtain a semaphore; this means that
a task may be suspended until the semaphore is released by another task. Any task may release a
semaphore, which increments its value.
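The obtain and release rules can be sketched as a counter. CountingSemaphore and try_obtain are names invented for this example; the sketch covers only the non-blocking call and is not safe against real concurrent access.

```python
class CountingSemaphore:
    """Counting semaphore; a count limit of 1 gives a binary semaphore."""

    def __init__(self, initial, limit):
        self._value = initial
        self._limit = limit

    def try_obtain(self):
        """Non-blocking obtain: succeeds only if the value is greater than 0."""
        if self._value > 0:
            self._value -= 1
            return True
        return False

    def release(self):
        """Increment the value, refusing to exceed the count limit."""
        if self._value >= self._limit:
            raise RuntimeError("semaphore is already at its count limit")
        self._value += 1

sem = CountingSemaphore(initial=2, limit=2)   # e.g. two identical resources
results = [sem.try_obtain(), sem.try_obtain(), sem.try_obtain()]
sem.release()
results.append(sem.try_obtain())
print(results)
```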
Mailboxes
Mailboxes are independent kernel objects, which provide a means for tasks to transfer messages.
The message size depends on the implementation, but will normally be fixed. One to four pointer-
sized items are typical message sizes. Commonly, a pointer to some more complex data is sent via
a mailbox. Some kernels implement mailboxes so that the data is just stored in a regular variable
and the kernel manages access to it. Mailboxes may also be called “exchanges”, though this name
is now uncommon.

Any task may send to a mailbox, which is then full. If a task then tries to send to a full
mailbox, it will receive an error response. In many RTOSes, it is possible to make a blocking call
to send to a mailbox; this means that a task may be suspended until the mailbox is read by another
task. Any task may read from a mailbox, which renders it empty again. If a task tries to read from
an empty mailbox, it will receive an error response. In many RTOSes, it is possible to make a
blocking call to read from a mailbox; this means that a task may be suspended until the mailbox is
filled by another task.
Some RTOSes support a “broadcast” feature. This enables a message to be sent to all the tasks that
are currently suspended on reading a specific mailbox.
Certain RTOSes do not support mailboxes at all. The recommendation is to use a single-entry
queue (see below) instead. This is functionally equivalent, but carries additional memory and
runtime overhead.
Queues
Queues are independent kernel objects, which provide a means for tasks to transfer messages. They
are a little more flexible and complex than mailboxes. The message size depends on the
implementation, but will normally be a fixed size and word/pointer oriented.
Any task may send to a queue and this may occur repeatedly until the queue is full, after which
time any attempts to send will result in an error. The depth of the queue is generally user specified
when it is created or the system is configured. In many RTOSes, it is possible to make a blocking
call to send to a queue; this means that, if the queue is full, a task may be suspended until the queue
is read by another task. Any task may read from a queue. Messages are read in the same order as
they were sent – first in, first out (FIFO). If a task tries to read from an empty queue, it will receive
an error response. In many RTOSes, it is possible to make a blocking call to read from a queue;
this means that, if the queue is empty, a task may be suspended until a message is sent to the queue
by another task.
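The non-blocking queue behavior can be sketched as below. MessageQueue, try_send and try_receive are hypothetical names; False and None stand in for the error responses, so None itself cannot be sent as a message in this sketch.

```python
from collections import deque

class MessageQueue:
    """Fixed-depth FIFO queue with non-blocking send and receive."""

    def __init__(self, depth):
        self._depth = depth      # depth is specified when the queue is created
        self._items = deque()

    def try_send(self, message):
        if len(self._items) >= self._depth:
            return False         # queue full: error response
        self._items.append(message)
        return True

    def try_receive(self):
        if not self._items:
            return None          # queue empty: error response
        return self._items.popleft()   # first in, first out

queue = MessageQueue(depth=2)
sent = [queue.try_send("m1"), queue.try_send("m2"), queue.try_send("m3")]
received = [queue.try_receive(), queue.try_receive(), queue.try_receive()]
print(sent, received)
```

With depth=1 the same class behaves like the single-entry queue suggested above as a replacement for a mailbox.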
Mutexes
Mutual exclusion semaphores – mutexes – are independent kernel objects, which behave in a very
similar way to normal binary semaphores. They are slightly more complex and incorporate the
concept of temporary ownership (of the resource, access to which is being controlled). If a task
obtains a mutex, only that same task can release it again – the mutex (and, hence, the resource) is
temporarily owned by the task.
Mutexes are not provided by all RTOSes, but it is quite straightforward to adapt a regular binary
semaphore. It would be necessary to write a “mutex obtain” function, which obtains the semaphore
and notes the task identifier. Then a complementary “mutex release” function would check the
calling task’s identifier and release the semaphore only if it matches the stored value, otherwise it
would return an error.
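The adaptation described above can be sketched as follows. BinarySemaphore and Mutex are illustrative classes, task identifiers are plain integers here, and a False return stands in for the error response.

```python
class BinarySemaphore:
    """Minimal non-blocking binary semaphore used as the building block."""

    def __init__(self):
        self._available = True

    def try_obtain(self):
        if self._available:
            self._available = False
            return True
        return False

    def release(self):
        self._available = True

class Mutex:
    """Binary semaphore plus a record of the owning task."""

    def __init__(self):
        self._sem = BinarySemaphore()
        self._owner = None

    def obtain(self, task_id):
        if not self._sem.try_obtain():
            return False
        self._owner = task_id    # note the task identifier
        return True

    def release(self, task_id):
        if self._owner is None or self._owner != task_id:
            return False         # error: the caller does not own the mutex
        self._owner = None
        self._sem.release()
        return True

mutex = Mutex()
steps = [
    mutex.obtain(1),    # task 1 takes the mutex
    mutex.obtain(2),    # task 2 cannot take it
    mutex.release(2),   # task 2 may not release it
    mutex.release(1),   # the owner releases it
    mutex.obtain(2),    # now task 2 can take it
]
print(steps)
```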

Task control block (TCB)
Task Control Block (also called process control block (PCB)) is a data structure in the operating
system kernel containing the information needed to manage the scheduling of a particular task.
The task control block is "the manifestation of a task in an operating system."
The role of the TCBs is central in task management: they are accessed and/or modified by most
OS utilities, including those involved with scheduling, memory and I/O resource access and
performance monitoring. It can be said that the set of the TCBs defines the current state of the
operating system. Data structuring for processes is often done in terms of TCBs.
Task Information at the TCB

• Task ID, for example a number between 0 and 255
• Task priority, which is represented by a byte if it lies between 0 and 255
• Parent task (if any)
• Child task (if any)
• Address of the TCB of the task that will run next
• Allocated program memory address blocks in physical memory and in secondary (virtual)
memory for the task's code
• Allocated task-specific data address blocks
• Allocated task-heap (data generated during the program run) addresses
• Allocated task-stack addresses for the functions called while the task runs
• Allocated addresses of the CPU register-save area; the task context is represented by the CPU
registers, which include the program counter and stack pointer
Information about the task state at Task Control Block:
• Task-state signal mask [when mask is set to 0 (active) the task is inhibited from running
and when reset to 1, the task is allowed to run],
• Task signals (messages) dispatch table [task IPC functions],
• OS allocated resources' descriptors (for example, device descriptor for open devices,
device-buffer addresses and status, socket-descriptors for open socket),
• Security restrictions and permissions
Task's context and context switching

• Each task has a context.
• The context is a record that reflects the CPU state just before the OS blocks one task and puts
another task into the running state.
• The context is continuously updated while a task runs.
• It is saved before switching to another task occurs.
• The present CPU registers, which include the program counter and stack pointer, are part of the
context.
• When the context has been saved at the process-stack and register-save area addresses pointed to
by the TCB, the running process stops.
• The context of the other task is then loaded and that task runs, which means that the context
has switched.
Task endless event-waiting loop

• Each task may be coded such that it starts in an endless event-waiting loop.
• An event loop is one that keeps on waiting for an event to occur. On the start event, the
loop starts executing from the instruction that follows the waiting function in the
loop.
• Execution of the service code (or setting a token that is an event for another task) then occurs.
• At the end, the task returns to waiting for the event in the loop.
Task characteristics
• Each task is independent and takes control of the CPU when scheduled by a scheduler at
an OS. The scheduler controls and runs the tasks.
• No task can call another task.
• Each task is recognized by a TCB.
• Each task has an ID.
• Each task may have priority parameter.
• A task is an independent process. The OS can only block a running task and let another
task gain access to the CPU to run its servicing code.
• Each task has its own independent values of the following at any instant: i) program counter,
ii) task stack pointer. These two values are part of the task's context.
• A task runs when the OS scheduler context-switches to it.
• Multitasking operates by context switching between the various tasks.
• Each task must either be a reentrant routine or have a way to solve the shared data problem.
• The task returns either to the idle state (on deregistering or detaching) or to the ready state
after finishing (completion of the running state), that is, when all the servicing code has been
executed.
• Each task may be coded such that it is in an endless loop, waiting for an event before it starts
running its code.
• An event can be a message in a queue or in a mailbox.
• An event can be a token or signal.
• An event can be the expiry of a delay period.
Memory allocation in RTOS

A real-time operating system uses different memory allocation strategies and algorithms in order
to improve the performance of the intended system. Basically, two types of memory allocation are
provided in an RTOS: stack and heap. Stack allocation is used during context switching for task
control blocks, while heap allocation deals with memory other than the memory used for program
code and is used for dynamic allocation of data space for tasks. Allocation of this memory is
called heap allocation.
Static memory allocation                         | Dynamic memory allocation
Memory is allocated at compile or design time    | Memory is allocated at run time
Memory allocation is a deterministic process,    | Requires a memory manager to keep track of which
that is, the allocation is already known         | memory is used and which is not
No memory allocation or deallocation actions     | Memory allocations are established and
are performed during execution                   | destroyed during execution
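The bookkeeping that dynamic allocation requires can be sketched with a toy memory manager over fixed-size blocks. BlockManager and its methods are hypothetical, block indices stand in for addresses, and the fixed block size is a simplifying assumption.

```python
class BlockManager:
    """Toy memory manager that tracks which fixed-size blocks are used or free."""

    def __init__(self, block_count):
        self._free = list(range(block_count))  # indices of unused blocks
        self._used = set()                     # indices currently handed out

    def allocate(self):
        """Hand out a free block at run time, or None if memory is exhausted."""
        if not self._free:
            return None
        block = self._free.pop(0)
        self._used.add(block)
        return block

    def free(self, block):
        """Return a block to the free list; freeing an unused block is an error."""
        if block not in self._used:
            raise ValueError("block is not currently allocated")
        self._used.remove(block)
        self._free.append(block)

manager = BlockManager(block_count=2)
first = manager.allocate()
second = manager.allocate()
third = manager.allocate()    # no blocks left
manager.free(first)
reused = manager.allocate()   # the freed block is handed out again
print(first, second, third, reused)
```

With static allocation none of this bookkeeping runs during execution, which is what makes it deterministic.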

Classification of memory allocation techniques
Manual Memory Allocation

In manual memory allocation, the programmer has direct control over the process of memory
allocation and recycling. Usually this is done either by explicit calls to heap allocation
functions or by language constructs that affect the stack.
Advantage:
• Manual memory allocation is easier for programmers to understand and use.
Disadvantage:
• Memory allocation bugs are common when manual memory allocation is used.
Automatic memory allocation
Automatic memory managers usually do their job by recycling blocks that are unreachable from
program variables.
Advantage:
• Automatic memory allocation eliminates most memory allocation bugs.
Disadvantage:
• Sometimes automatic memory managers fail to deallocate unused memory locations, causing
memory leaks.
• Automatic memory managers usually consume a great amount of CPU time and usually
have non-deterministic behavior.

Detailed internal architecture of 8051*
Data Pointer (DPTR)

This 16-bit register contains the higher byte (DPH) and the lower byte (DPL) of a 16-bit external
data RAM address. It can be accessed as one 16-bit register or as two 8-bit registers. It has been
allotted two addresses in the special function register bank, one for each of its two bytes, DPH and DPL.
Port 0 to 3 latches and drivers:
Each of the four on-chip I/O ports has its own latch and driver pair. These
latches have been allotted addresses in the special function register bank; using these addresses the
user can communicate with the ports. The ports are identified as P0, P1, P2 and P3.
Serial data buffer:

The serial data buffer internally contains two independent registers. One of them is a transmit
buffer, which is necessarily a parallel-in serial-out register. The other is called the receive buffer,
which is a serial-in parallel-out register. Loading a byte into the transmit buffer initiates serial
transmission of that byte. The serial data buffer is identified as SBUF and is one of the special
function registers. If a byte is written to SBUF, it initiates serial transmission, and if SBUF is
read, it returns the received serial data.
Timing and control unit:

This unit derives all the necessary timing and control signals required for the internal operation
of the circuit. It also derives the control signals required for controlling the external system bus.
Oscillator:
This circuit generates the basic timing clock signal for the operation of the circuit using a crystal
oscillator.
Instruction Register:
This register decodes the op-code of the instruction to be executed and gives information to the
timing and control unit to generate the necessary signals for the execution of the instruction.
EPROM and Program Address Register:

These blocks provide an on-chip EPROM and a mechanism to address it internally. EPROM is not
available in all 8051 versions.
RAM and RAM Address Register:

This block provides 128 bytes of internal RAM and a mechanism to address it internally.
ALU:
The arithmetic and logic unit performs 8-bit arithmetic and logical operations on the operands
held by the temporary registers TMP1 and TMP2. Users cannot access these temporary registers.
SFR Register Bank:

This is a set of special function registers, which can be addressed using their respective addresses,
which lie in the range 80H to FFH. Finally, the interrupt, serial port and timer units control and perform
their special functions under the control of the timing and control unit.
Timer Registers:
These two 16-bit registers can be accessed as their lower and upper bytes. For example, TL0
represents the lower byte of timer register 0, while TH0 represents the higher byte of timer
register 0. Similarly, TL1 and TH1 represent the lower and higher bytes of timer register 1. All these
registers can be accessed using the four addresses allotted to them, which lie in the special function
register (SFR) address range, i.e. 80H to FFH.
Control Registers:
The special function registers IP, IE, TMOD, TCON, SCON and PCON contain control and status
information for the interrupts, timer/counters and serial port. These registers have been allotted
addresses in the SFR bank of the 8051.
Addressing Modes
The CPU can access data in various ways, which are called addressing modes:
• Immediate addressing mode
• Register addressing mode
• Direct addressing mode
• Register indirect addressing mode
• Indexed addressing mode
1 Immediate Addressing
Immediate addressing is so named because the value to be stored immediately follows
the operation code in memory. That is to say, the instruction itself dictates what value will be stored.
The immediate data must be preceded by the pound sign, “#”. For example, the instruction:
MOV A,#20h
uses immediate addressing because the accumulator will be loaded with
the value that immediately follows, in this case 20 (hexadecimal). Immediate addressing is very fast since
the value to be loaded is included in the instruction. It can load information into any register, including
the 16-bit DPTR register. However, since the value to be loaded is fixed at assembly time, it is not very
flexible. We can also use immediate addressing mode to send data to the 8051 ports.
2 Register Addressing
This mode uses registers to hold the data to be manipulated. The source and destination registers must
match in size. For example, MOV DPTR,A will give an error because the two registers differ in size. The
movement of data directly between Rn registers is not allowed; that is, MOV R1,R2 is not possible.
3 Direct Addressing
Direct addressing is so-named because the value to be stored in memory is obtained by directly
retrieving it from another memory location. For example: MOV A,30h
This instruction will read the data out of Internal RAM address 30 (hexadecimal) and store it in the
Accumulator. Direct addressing is generally fast since, although the value to be loaded isn’t included in
the instruction, it is quickly accessible since it is stored in the 8051’s Internal RAM. It is also much more
flexible than Immediate Addressing since the value to be loaded is whatever is found at the given address
which may be variable.
Also, it is important to note that when using direct addressing, any instruction which refers to an address
between 00h and 7Fh is referring to Internal RAM. Any instruction which refers to an address between
80h and FFh is referring to the SFR control registers that control the 8051 microcontroller itself.
4 Register Indirect Addressing
Indirect addressing is a very powerful addressing mode which in many cases provides an exceptional
level of flexibility. Indirect addressing is also the only way to access the extra 128 bytes of Internal RAM
found on an 8052. Indirect addressing appears as: MOV A,@R0
This instruction causes the 8051 to read the value of the R0 register. The 8051 will then load the
accumulator with the value from Internal RAM which is found at the address indicated by R0.
A register is used as a pointer to the data. Only registers R0 and R1 can be used for this purpose; R2 – R7
cannot be used to hold the address of an operand located in RAM. When R0 and R1 hold the addresses of RAM
locations, they must be preceded by the “@” sign.
Since R0 and R1 are 8 bits wide, their use is limited to accessing information in the internal RAM. When
accessing externally connected RAM or on-chip ROM, a 16-bit pointer is needed; in such cases, the DPTR
register is used.
5 Indexed Addressing Mode
Indexed addressing mode is widely used in accessing data elements of look-up table entries located in
the program ROM. The instruction used for this purpose is MOVC A,@A+DPTR. In the instruction MOVC, “C”
means code. The contents of A are added to the 16-bit register DPTR to form the 16-bit address of the
needed data.
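The address formation described above can be modelled in a few lines of C. This is only an illustrative sketch: the function name movc_a_dptr and the array code_mem, which stands in for the 64K-byte code space, are assumptions and not part of any 8051 toolchain.

```c
#include <stdint.h>

/* Hypothetical model of MOVC A,@A+DPTR.
   code_mem must point to a 65536-byte array standing in for the code space. */
uint8_t movc_a_dptr(const uint8_t *code_mem, uint8_t a, uint16_t dptr)
{
    /* The offset in A is added to the base in DPTR; the sum is kept to 16 bits. */
    uint16_t address = (uint16_t)(dptr + a);
    return code_mem[address];
}
```

With DPTR holding the start of a table and A holding an index, the function returns the table entry at that index, which is how a look-up table in program ROM is read.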
In many applications, the size of the program code does not leave any room to share the 64K-byte code
space with data. The 8051 has another 64K bytes of memory space set aside exclusively for data
storage. This data memory space is referred to as external memory and it is accessed only by the MOVX
instruction.
MOVX is a widely used instruction allowing access to the external data memory space. To bring externally
stored data into the CPU, we use the instruction MOVX A,@DPTR
Example:
In this program, assume that the word “USA” is burned into ROM locations starting at 200H, and that the
program is burned into ROM locations starting at 0. Analyze how the program works and state where
“USA” is stored after this program is run.
Solution:
ORG 0000H ;burn into ROM starting at 0
MOV DPTR,#200H ;DPTR=200H look-up table addr
CLR A ;clear A(A=0)
MOVC A,@A+DPTR ;get the char from code space
MOV R0,A ;save it in R0
INC DPTR ;DPTR=201 point to next char
CLR A ;clear A(A=0)
MOVC A,@A+DPTR ;get the next char
MOV R1,A ;save it in R1
INC DPTR ;DPTR=202 point to next char
CLR A ;clear A(A=0)
MOVC A,@A+DPTR ;get the next char
MOV R2,A ;save it in R2
HERE: SJMP HERE ;stay here
;Data is burned into code space starting at 200H
ORG 200H
MYDATA: DB "USA"
END ;end of program
After the program runs, R0 = 55H (“U”), R1 = 53H (“S”) and R2 = 41H (“A”).
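Example:
An external ROM uses the 8051 data space to store the look-up table (starting at 1000H) for DAC
data. Write a program to read 30 bytes of these data and send them to P1.
Solution:
MYXDATA EQU 1000H ;start address of the table in external data space
COUNT EQU 30 ;number of bytes to read
MOV DPTR,#MYXDATA ;DPTR points to the first table entry
MOV R2,#COUNT ;R2 = loop counter
AGAIN: MOVX A,@DPTR ;read a byte from external data space
MOV P1,A ;send it to P1
INC DPTR ;point to the next table entry
DJNZ R2,AGAIN ;repeat until all 30 bytes are sent
;The external data space holds the table starting at 1000H:
;23H,42H,67H,89H,56H ...... (30 bytes of data for the table will be here)
END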
Memory Organization and Management

The 8051 has a separate memory space for code (programs) and data. We will refer here to on-chip
memory and external memory as shown in figure 1.5. In an actual implementation the external memory
may, in fact, be contained within the microcomputer chip. However, we will use the definitions of internal
and external memory to be consistent with 8051 instructions which operate on memory. Note, the
separation of the code and data memory in the 8051 architecture is a little unusual. The separated
memory architecture is referred to as Harvard architecture whereas Von Neumann architecture defines a
system where code and data can share common memory.
8051 Memory representation
External Code Memory

The executable program code is stored in this code memory. The code memory size is limited to
64 KBytes (in a standard 8051). The code memory is read-only in normal operation and is programmed
only under special conditions, e.g. it is a PROM or a Flash type of memory.
External RAM Data Memory

This is read-write memory and is available for storage of data. Up to 64KBytes of external RAM data
memory is supported (in a standard 8051).
Internal Memory
The 8051’s on-chip memory consists of 256 memory bytes organized as follows:
The first 128 bytes of internal memory is organized as shown in figure, and is referred to as Internal RAM,
or IRAM.
Figure: Organization of Internal RAM memory
Register Banks: 00h to 1Fh

The 8051 uses 8 general-purpose registers R0 through R7. The area 00h to 1Fh holds four banks of these
eight registers; bank 0 (00h to 07h) is selected by default. These
registers are used in instructions such as:
ADD A, R2 ; adds the value contained in R2 to the accumulator
Note that since R2 of bank 0 happens to be memory location 02h in the Internal RAM, the following
instruction has the same effect as the above instruction.
ADD A, 02h
Bit Addressable RAM: 20h to 2Fh

The 8051 supports a special feature which allows access to bit variables. This is where individual memory
bits in Internal RAM can be set or cleared. In all there are 128 bits numbered 00h to 7Fh. Being bit
variables any one variable can have a value 0 or 1. A bit variable can be set with a command such as
SETB and cleared with a command such as CLR. Example instructions are:
SETB 25h ; sets the bit 25h (becomes 1)

CLR 25h ; clears bit 25h (becomes 0)
Note, bit 25h is actually bit b5 of Internal RAM location 24h.

The Bit Addressable area of the RAM is just 16 bytes of Internal RAM located between 20h and 2Fh. So if
a program writes a byte to location 20h, for example, it writes 8 bit variables, bits 00h to 07h at once.
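The mapping from a bit address to the byte that holds it can be written out as a short C sketch. The type bit_location and the function decode_bit_address are hypothetical names used only for illustration. The sketch also covers bit addresses 80h to FFh, which on the 8051 select bits of the bit-addressable SFRs rather than Internal RAM.

```c
#include <stdint.h>

/* Result of decoding an 8051 bit address (hypothetical type for illustration). */
typedef struct {
    uint8_t byte_address;  /* byte that holds the bit */
    uint8_t bit_position;  /* 0 (LSB) to 7 (MSB) */
} bit_location;

bit_location decode_bit_address(uint8_t bit_address)
{
    bit_location loc;
    loc.bit_position = bit_address & 0x07u;
    if (bit_address < 0x80u)
        /* Bits 00h..7Fh live in Internal RAM bytes 20h..2Fh, eight bits per byte. */
        loc.byte_address = (uint8_t)(0x20u + (bit_address >> 3));
    else
        /* Bits 80h..FFh belong to the SFR whose address is the bit address with
           its three low bits cleared. */
        loc.byte_address = bit_address & 0xF8u;
    return loc;
}
```

Decoding bit address 25h with this function gives byte 24h and bit position 5, which agrees with the note above.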
General Purpose RAM: 30h to 7Fh

These 80 bytes of Internal RAM are available for general-purpose data storage. Access to this
area of memory is fast compared to access to external memory, and special instructions with single-byte
operands are used. However, these 80 bytes are also used by the system stack, and in practice little space is
left for general storage. The general-purpose RAM can be accessed using direct or indirect addressing
modes. Example of direct addressing:
MOV A, 6Ah ; reads contents of address 6Ah to accumulator

Example of indirect addressing (use registers R0 or R1):
MOV R1, #6Ah ; move immediate 6Ah to R1
MOV A, @R1 ; move indirect: R1 contains the address of the Internal RAM location whose data is
; moved to A.
These two instructions have the same effect as the direct instruction above.
SFR Registers
The SFR registers are located within the Internal Memory in the address range 80h to FFh, as shown in
figure 1.7. Not all locations within this range are defined. Each SFR has a very specific function. Each
SFR has an address (within the range 80h to FFh) and a name which reflects the purpose of the SFR.
Although 128 bytes of SFR address space are available, only 21 SFR registers are defined in the standard
8051. Undefined SFR addresses should not be accessed as this might lead to unpredictable
results. Note that some of the SFR registers are bit addressable. SFRs are accessed just like normal Internal
RAM locations, using direct addressing.
We will discuss a few specific SFR registers here to help explain the SFR concept. Other specific SFRs will
be explained later.
Port Registers SFR

The standard 8051 has four 8 bit I/O ports: P0, P1, P2 and P3. For example Port 0 is a physical 8 bit I/O
port on the 8051. Read (input) and write (output) access to this port is done in software by accessing the
SFR P0 register which is located at address 80h. SFR P0 is also bit addressable. Each bit corresponds to
a physical I/O pin on the 8051.
Example access to port 0:
SETB P0.7 ; sets the MSB bit of Port 0

CLR P0.7 ; clears the MSB bit of Port 0
The operand P0.7 uses the dot operator and refers to bit 7 of SFR P0. The same bit could be addressed
by accessing bit location 87h. Thus the following two instructions have the same meaning:
CLR P0.7
CLR 87h
PSW Program Status Word
PSW, the Program Status Word is at address D0h and is a bit-addressable register. The status bits are
listed in table
Carry flag. C
This is a conventional carry, or borrow, flag used in arithmetic operations. The carry flag is also used as
the ‘Boolean accumulator’ for Boolean instruction operating at the bit level. This flag is sometimes
referenced as the CY flag.
Auxiliary carry flag. AC

This is a conventional auxiliary carry (half carry) for use in BCD arithmetic.
Flag 0. F0
This is a general-purpose flag for user programming.
Register bank select bits. RS0 and RS1
These bits define the active register bank (bank 0 is the default register bank).
Overflow flag. OV
This is a conventional overflow bit for signed arithmetic to determine if the result of a signed arithmetic
operation is out of range.
Even Parity flag. P

The parity flag is the accumulator parity flag, set to a value, 1 or 0, such that the number of ‘1’ bits in the
accumulator plus the parity bit add up to an even number.
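Two of the PSW fields described above lend themselves to a small C sketch. The function names register_address and parity_flag are hypothetical. The bit positions used for RS0 and RS1 (bits 3 and 4 of the PSW) are those of the standard 8051 and are stated here because the text above does not give them.

```c
#include <stdint.h>

/* PSW bit masks on the standard 8051. */
#define PSW_RS0 0x08u  /* bit 3 */
#define PSW_RS1 0x10u  /* bit 4 */

/* Internal RAM address of register Rn (n = 0..7) for the bank selected in psw. */
uint8_t register_address(uint8_t psw, uint8_t n)
{
    uint8_t bank = (uint8_t)((psw & (PSW_RS1 | PSW_RS0)) >> 3);  /* 0..3 */
    return (uint8_t)(bank * 8u + (n & 0x07u));
}

/* Value of the P flag: 1 when the accumulator holds an odd number of 1 bits,
   so that the accumulator bits plus P always contain an even number of 1s. */
uint8_t parity_flag(uint8_t acc)
{
    uint8_t ones = 0;
    for (uint8_t i = 0; i < 8; i++)
        ones = (uint8_t)(ones + ((acc >> i) & 1u));
    return ones & 1u;
}
```

With both select bits clear, R2 maps to address 02h; with both set, R7 maps to 1Fh, the last byte of the register bank area.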
Stack Pointer
The Stack Pointer, SP, is an 8-bit SFR register at address 81h. The small address field (8 bits) and the
limited space available in the Internal RAM confine the stack size, and this is sometimes a limitation for
8051 programmers. The SP contains the address of the data byte currently on the top of the stack. The
SP is initialised to a defined address. A new data item is ‘pushed’ on to the stack using a PUSH
instruction, which first increments SP and then writes the data item to the new address (the old SP + 1).
Typical instructions which cause modification to the stack are: PUSH, POP, LCALL, RET, RETI etc. On
start-up, the SP SFR is initialised to 07h, so the stack will start at 08h and expand upwards in Internal
RAM. If register banks 1 to 3 are to be used, the SP SFR should be initialised to start higher up in
Internal RAM. The following instruction is often used to initialise the stack:
MOV SP, #2Fh
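The order of operations in PUSH and POP can be modelled in C as follows. This is a sketch under stated assumptions: the type cpu8051 and the functions cpu_reset, push and pop are hypothetical, and only the first 128 bytes of Internal RAM are modelled.

```c
#include <stdint.h>

/* Hypothetical model: 128 bytes of Internal RAM plus the stack pointer. */
typedef struct {
    uint8_t iram[128];
    uint8_t sp;
} cpu8051;

/* After reset the SP holds 07h, so the first byte pushed lands at 08h. */
void cpu_reset(cpu8051 *cpu)
{
    cpu->sp = 0x07u;
}

/* PUSH: SP is incremented first, then the byte is written at the new SP.
   The index is masked only to keep this sketch inside its array. */
void push(cpu8051 *cpu, uint8_t value)
{
    cpu->sp++;
    cpu->iram[cpu->sp & 0x7Fu] = value;
}

/* POP: the byte at SP is read first, then SP is decremented. */
uint8_t pop(cpu8051 *cpu)
{
    uint8_t value = cpu->iram[cpu->sp & 0x7Fu];
    cpu->sp--;
    return value;
}
```

The model makes it easy to see why a stack left at its reset position overwrites register bank 1, which starts at 08h.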
Data Pointer
The Data Pointer, DPTR, is a special 16-bit register used to address the external code or external data
memory. Since the SFR registers are just 8 bits wide, the DPTR is stored in two SFR registers, where
DPL (82h) holds the low byte of the DPTR and DPH (83h) holds the high byte of the DPTR. For example,
if you wanted to write the value 46h to external data memory location 2504h, you might use the following
instructions:
MOV A, #46h ; Move immediate 8 bit data 46h to A (accumulator)
MOV DPTR, #2504h ; Move immediate 16 bit address value 2504h to DPTR.
; Now DPL holds 04h and DPH holds 25h.
MOVX @DPTR, A ; Move the value in A to external RAM location 2504h.
; Uses indirect addressing.
Note the MOVX (Move X) instruction is used to access external memory.
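How a 16-bit value is divided between DPH and DPL can be shown with a short C sketch. The type dptr_regs and the functions dptr_load and dptr_value are hypothetical names introduced for illustration.

```c
#include <stdint.h>

/* DPL and DPH modelled as two 8-bit registers (SFRs 82h and 83h). */
typedef struct {
    uint8_t dpl;  /* low byte */
    uint8_t dph;  /* high byte */
} dptr_regs;

/* Equivalent of MOV DPTR,#value: split a 16-bit value into DPH and DPL. */
dptr_regs dptr_load(uint16_t value)
{
    dptr_regs r;
    r.dpl = (uint8_t)(value & 0xFFu);
    r.dph = (uint8_t)(value >> 8);
    return r;
}

/* Recombine the two bytes into the 16-bit address used by MOVX @DPTR. */
uint16_t dptr_value(dptr_regs r)
{
    return (uint16_t)(((uint16_t)r.dph << 8) | r.dpl);
}
```

Loading 2504h places 04h in DPL and 25h in DPH, as stated in the comment of the example above.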
Accumulator
This is the conventional accumulator that one expects to find in any computer, which is used to hold the
result of various arithmetic and logic operations. Since the 8051 microcontroller is just an 8-bit device, the
accumulator is, as expected, an 8-bit register. The accumulator, referred to as ACC or A, is usually
accessed explicitly using instructions such as:
INC A ; Increment the accumulator
However, the accumulator is also defined as an SFR register at address E0h. So the following two instructions
have the same effect:
MOV A, #52h ; Move immediate the value 52h to the accumulator
MOV E0h, #52h ; Move immediate the value 52h to direct address E0h, which is, in fact, the
; accumulator SFR register.
Usually the first method, MOV A, #52h, is used as this is the most conventional (and happens to use less
space, 2 bytes as opposed to 3 bytes!)
B Register
The B register is an SFR register at address F0h which is bit-addressable. The B register is used
implicitly by only two instructions: MUL (multiply) and DIV (divide). The B register can also be used as a
general-purpose register.
Program Counter
The PC (Program Counter) is a 2 byte (16 bit) register which always contains the memory address of the
next instruction to be executed. When the 8051 is reset the PC is always initialised to 0000h. If a 2 byte
instruction is executed the PC is incremented by 2 and if a 3 byte instruction is executed the PC is
incremented by three so as to correctly point to the next instruction to be executed. A jump instruction
(e.g. LJMP) has the effect of causing the program to branch to a newly specified location, so the jump
instruction causes the PC contents to change to the new address value. Jump instructions cause the
program to flow in a non-sequential fashion.
SFR Registers for the Internal Timer

The set up and operation of the on-chip hardware timers will be described later, but the associated
registers are briefly described here:
TCON, the Timer Control register is an SFR at address 88h, which is bit-addressable. TCON is used to
configure and monitor the 8051 timers. The TCON SFR also contains some interrupt control bits,
described later.
TMOD, the Timer Mode register is an SFR at address 89h and is used to define the operational modes
for the timers, as will be described later.
TL0 (Timer 0 Low) and TH0 (Timer 0 High) are two SFR registers addressed at 8Ah and 8Ch
respectively. The two registers are associated with Timer 0.
TL1 (Timer 1 Low) and TH1 (Timer 1 High) are two SFR registers addressed at 8Bh and 8Dh
respectively. These two registers are associated with Timer 1.
Power Control Register

PCON (Power Control) register is an SFR at address 87h. It contains various control bits including a
control bit, which allows the 8051 to go to ‘sleep’ so as to save power when not in immediate use.
Serial Port Registers

Programming of the on-chip serial communications port will be described later in the text. The associated
SFR registers, SBUF and SCON, are briefly introduced here, as follows:
SCON (Serial Control) is an SFR register located at address 98h, and it is bit-addressable. SCON
configures the behaviour of the on-chip serial port, setting up parameters such as the baud rate of the
serial port, activating send and/or receive data, and setting up some specific control flags.
SBUF (Serial Buffer) is an SFR register located at address 99h. SBUF is just a single-byte-deep
buffer used for sending and receiving data via the on-chip serial port.
Interrupt Registers
Interrupts will be discussed in more detail later. The associated SFR registers are:
IE (Interrupt Enable) is an SFR register at address A8h and is used to enable and disable specific
interrupts. The MSB bit (bit 7) is used to disable all interrupts.
IP (Interrupt Priority) is an SFR register at address B8h and it is bit addressable. The IP register
specifies the relative priority (high or low priority) of each interrupt. On the 8051, an interrupt may either
be of low (0) priority or high (1) priority.
ASSEMBLY LANGUAGE PROGRAMMING
Some Assembler Directives

The assembler directives are special instructions to the assembler program to define
some specific operations, but these directives are not part of the executable program.
Some of the most frequently used assembler directives are listed as follows:
ORG OriGinate, defines the starting address for the program in program (code) memory.
EQU EQUate, assigns a numeric value to a symbol identifier so as to make the program more readable.
DB Define a Byte, puts a byte (8-bit number) constant at this memory location.
DW Define a Word, puts a word (16-bit number) constant at this memory location.
DBIT Define a Bit, defines a bit constant, which is stored in the bit-addressable section of the Internal
RAM.
END This is the last statement in the source file to advise the assembler to stop the assembly process.
Types of Instructions
The assembly level instructions include: data transfer instructions, arithmetic
instructions, logical instructions, program control instructions, and some special
instructions such as the rotate instructions.
Data Transfer
Many computer operations are concerned with moving data from one location to another. The 8051 uses
five different types of instruction to move data:
• MOV
• MOVX
• MOVC
• PUSH and POP
• XCH
MOV
In the 8051 the MOV instruction is concerned with moving data internally, i.e. between Internal RAM, SFR
registers, general registers etc. MOVX and MOVC are used in accessing external memory data. The
MOV instruction has the following format:
MOV destination, source
The instruction copies (copy is a more accurate word than move) data from a defined source location to a
destination location. Example MOV instructions are:
MOV R2, #80h ; Move immediate data value 80h to register R2

MOV R4, A ; Copy data from accumulator to register R4
MOV DPTR, #0F22Ch ; Move immediate value F22Ch to the DPTR register
MOV R2, 80h ; Copy data from 80h (Port 0 SFR) to R2
MOV 52h, #52h ; Copy immediate data value 52h to RAM location 52h
MOV 52h, 53h ; Copy data from RAM location 53h to RAM 52h
MOV A, @R0 ; Copy contents of location addressed in R0 to A (indirect addressing)
MOVX
In the 8051 the external memory can be addressed using indirect addressing only. The DPTR register is
used to hold the address of the external data (since DPTR is a 16-bit register it can address 64 KByte
locations: 2^16 = 64K). The 8-bit registers R0 or R1 can also be used for indirect addressing of external
memory but the address range is limited to the lower 256 bytes of memory (2^8 = 256 bytes).
The MOVX instruction is used to access the external memory (X indicates eXternal memory access). All
external moves must work through the A register (accumulator).
Examples of MOVX instructions are:

MOVX @DPTR, A ; Copy data from A to the address specified in DPTR
MOVX A, @DPTR ; Copy data from address specified in DPTR to A
MOVC
MOVX instructions operate on RAM, which is (normally) a volatile memory. Program tables often need to
be stored in ROM since ROM is non volatile memory. The MOVC instruction is used to read data from the
external code memory (ROM). Like the MOVX instruction the DPTR register is used as the indirect
address register. The indirect addressing is enhanced to realise an indexed addressing mode where
register A can be used to provide an offset in the address specification. Like the MOVX instruction all
moves must be done through register A. The following sequence of instructions provides an example:
MOV DPTR, #2000h ; Copy the data value 2000h to the DPTR register
MOV A, #80h ; Copy the data value 80h to register A
MOVC A, @A+DPTR ; Copy the contents of the address 2080h (2000h + 80h)
; to register A
Note, for the MOVC the program counter, PC, can also be used to form the address.
PUSH and POP

PUSH and POP instructions are used with the stack only. The SFR register SP
contains the current stack address. Direct addressing is used as shown in the following
examples:
PUSH 4Ch ; Contents of RAM location 4Ch is saved to the stack. SP is incremented.
PUSH 00h ; The content of R0 (which is at 00h in RAM) is saved to the stack and SP is incremented.
POP 80h ; The data from current SP address is copied to 80h and SP is decremented.
XCH
The above move instructions copy data from a source location to a destination location, leaving the
source data unaffected. A special XCH (eXCHange) instruction will actually swap the data between
source and destination, effectively changing the source data. Immediate addressing may not be used with
XCH. XCH instructions must use register A. XCHD is a special case of the exchange instruction where
just the lower nibbles are exchanged. Examples using the XCH instruction are:
XCH A, R3 ; Exchange bytes between A and R3

XCH A, @R0 ; Exchange bytes between A and RAM location whose address is in R0
XCH A, A0h ; Exchange bytes between A and direct address A0h (SFR port 2)
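The difference between XCH and XCHD can be illustrated with a short C model. The function names xch and xchd are hypothetical; each takes pointers to the accumulator and to the other operand so that both values can be changed.

```c
#include <stdint.h>

/* XCH A,<byte>: swap the two bytes completely. */
void xch(uint8_t *a, uint8_t *operand)
{
    uint8_t tmp = *a;
    *a = *operand;
    *operand = tmp;
}

/* XCHD: swap only the low nibbles; the high nibbles stay where they are. */
void xchd(uint8_t *a, uint8_t *operand)
{
    uint8_t a_low = *a & 0x0Fu;
    *a = (uint8_t)((*a & 0xF0u) | (*operand & 0x0Fu));
    *operand = (uint8_t)((*operand & 0xF0u) | a_low);
}
```

Unlike a MOV, both operands are modified, which is why an immediate value cannot be used with either instruction.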
Arithmetic
Some key flags within the PSW, i.e. C, AC, OV, P, are utilised in many of the arithmetic instructions. The
arithmetic instructions can be grouped as follows:
Addition
Subtraction
Increment/decrement
Multiply/divide
Decimal adjust
Addition
Register A (the accumulator) is used to hold the result of any addition operation. Some simple addition
examples are:
ADD A, #25h ; Adds the number 25h to A, putting sum in A

ADD A, R3 ; Adds the register R3 value to A, putting sum in A
The flags in the PSW register are affected by the various addition operations, as follows:
The C (carry) flag is set to 1 if the addition resulted in a carry out of the accumulator’s MSB bit, otherwise
it is cleared. The AC (auxiliary) flag is set to 1 if there is a carry out of bit position 3 of the accumulator,
otherwise it is cleared. For signed numbers the OV flag is set to 1 if there is an arithmetic overflow.
Simple addition is done within the 8051 based on 8-bit numbers, but it is often required to add 16-bit numbers, or
24-bit numbers etc. This leads to the use of multiple byte (multi-precision) arithmetic. The least significant
bytes are added first, and if a carry results, this carry is carried over into the addition of the next significant
byte etc. This addition process is done in 8-bit precision steps to achieve multi-precision arithmetic. The
ADDC instruction is used to include the carry bit in the addition process.
Example instructions using ADDC are:

ADDC A, #55h ; Add contents of A, the number 55h, the carry bit; and put the sum in A
ADDC A, R4 ; Add the contents of A, the register R4, the carry bit; and put the sum in A.
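The flag behaviour and the multi-byte scheme described above can be checked with a small C model. The names add_flags, addc and add16 are hypothetical. The overflow rule used here (OV is the carry into bit 7 XOR the carry out of bit 7) is the usual definition of signed overflow and is an addition to what the text states.

```c
#include <stdint.h>

typedef struct {
    uint8_t c;   /* carry out of bit 7 */
    uint8_t ac;  /* carry out of bit 3 */
    uint8_t ov;  /* signed overflow */
} add_flags;

/* Model of ADDC A,operand; pass carry_in = 0 to model ADD. */
uint8_t addc(uint8_t a, uint8_t operand, uint8_t carry_in, add_flags *f)
{
    uint8_t cin = carry_in & 1u;
    uint16_t sum = (uint16_t)(a + operand + cin);
    uint8_t into_bit7 = (uint8_t)((((a & 0x7Fu) + (operand & 0x7Fu) + cin) >> 7) & 1u);

    f->c  = (uint8_t)(sum >> 8);
    f->ac = (uint8_t)((((a & 0x0Fu) + (operand & 0x0Fu) + cin) >> 4) & 1u);
    f->ov = (uint8_t)(f->c ^ into_bit7);
    return (uint8_t)sum;
}

/* 16-bit addition in two 8-bit steps: ADD on the low bytes, ADDC on the high bytes. */
uint16_t add16(uint16_t x, uint16_t y)
{
    add_flags f;
    uint8_t low  = addc((uint8_t)x, (uint8_t)y, 0, &f);
    uint8_t high = addc((uint8_t)(x >> 8), (uint8_t)(y >> 8), f.c, &f);
    return (uint16_t)(((uint16_t)high << 8) | low);
}
```

The carry produced by the low-byte step is the only information passed to the high-byte step, which is exactly the role the C flag plays between an ADD and the following ADDC.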
Subtraction
Computer subtraction can be achieved using 2’s complement arithmetic. Most computers also provide
instructions to directly subtract signed or unsigned numbers. The accumulator, register A, will contain the
result (difference) of the subtraction operation. The C (carry) flag is treated as a borrow flag, which is
always subtracted from the minuend during a subtraction operation. Some examples of subtraction
instructions are:
SUBB A, #55d ; Subtract the number 55 (decimal) and the C flag from A; and put the result in A.
SUBB A, R6 ; Subtract R6 and the C flag from A; and put the result in A.
SUBB A, 58h ; Subtract the number in RAM location 58h and the C flag from A; and put the result in A.
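The role of the C flag as a borrow can be modelled in C as follows. The function name subb is hypothetical; the carry is passed by pointer because the instruction both reads and updates it.

```c
#include <stdint.h>

/* Model of SUBB A,operand: returns A - operand - C and
   sets *carry to 1 when a borrow was needed, otherwise to 0. */
uint8_t subb(uint8_t a, uint8_t operand, uint8_t *carry)
{
    uint16_t subtrahend = (uint16_t)(operand + (*carry & 1u));
    *carry = (a < subtrahend) ? 1u : 0u;
    return (uint8_t)(a - subtrahend);   /* result wraps modulo 256 */
}
```

Since the borrow is always included, a program that wants a plain subtraction clears the carry first (CLR C) and then executes SUBB.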
Increment/Decrement
The increment (INC) instruction has the effect of simply adding a binary 1 to a number while a decrement
(DEC) instruction has the effect of subtracting a binary 1 from a number. The increment and decrement
instructions can use the addressing modes: direct, indirect and register. The flags C, AC, and OV are not
affected by the increment or decrement instructions. If a value of FFh is incremented it overflows to 00h. If a
value of 00h is decremented it underflows to FFh. The DPTR can overflow from FFFFh to 0000h. The DPTR
register cannot be decremented using a DEC instruction (unfortunately!). Some example INC and DEC
instructions are as follows:
INC R7 ; Increment register R7

INC A ; Increment A
INC @R1 ; Increment the number which is the content of the address in R1
DEC A ; Decrement register A
DEC 43h ; Decrement the number in RAM address 43h
INC DPTR ; Increment the DPTR register
Multiply / Divide
The 8051 supports 8-bit multiplication and division. This is low precision (8 bit) arithmetic but is useful for
many simple control applications. The arithmetic is relatively fast since multiplication and division are
implemented as single instructions. If better precision, or indeed, if floating point arithmetic is required
then special software routines need to be written. For the MUL or DIV instructions the A and B registers
must be used and only unsigned numbers are supported.
Multiplication
The MUL instruction is used as follows (note absence of a comma between the A and B operands):
MUL AB ; Multiply A by B.
The resulting product resides in registers A and B, the low-order byte is in A and the high order byte is in
B.
Division
The DIV instruction is used as follows:
DIV AB ; A is divided by B.
The remainder is put in register B and the integer part of the quotient is put in register A.
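The way MUL AB and DIV AB share the A and B registers can be shown with a C sketch. The type ab_result and the functions mul_ab and div_ab are hypothetical. The treatment of the OV flag (set when the product does not fit in 8 bits, or on division by zero) follows the standard 8051 behaviour and is an addition to the text above.

```c
#include <stdint.h>

typedef struct {
    uint8_t a;   /* contents of A after the instruction */
    uint8_t b;   /* contents of B after the instruction */
    uint8_t ov;  /* overflow flag */
} ab_result;

/* MUL AB: 16-bit product, low byte to A, high byte to B. */
ab_result mul_ab(uint8_t a, uint8_t b)
{
    uint16_t product = (uint16_t)(a * b);
    ab_result r;
    r.a = (uint8_t)(product & 0xFFu);
    r.b = (uint8_t)(product >> 8);
    r.ov = (product > 0xFFu) ? 1u : 0u;
    return r;
}

/* DIV AB: integer quotient to A, remainder to B.
   On division by zero OV is set; A and B are undefined on the real device
   and are simply left unchanged in this sketch. */
ab_result div_ab(uint8_t a, uint8_t b)
{
    ab_result r = { a, b, 0 };
    if (b == 0) {
        r.ov = 1;
        return r;
    }
    r.a = (uint8_t)(a / b);
    r.b = (uint8_t)(a % b);
    return r;
}
```

For instance, dividing 95 by 10 leaves 9 in A and 5 in B, which is a common way of splitting a number into decimal digits.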
Decimal Adjust (Special)

The 8051 performs all arithmetic in binary numbers (i.e. it does not support BCD arithmetic). If two BCD
numbers are added then the result can be adjusted by using the DA, decimal adjust, instruction:
DA A ; Decimal adjust A following the addition of two BCD numbers.
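What DA A does after an addition can be sketched in C. The function name da_a is hypothetical, and the correction rule shown (add 06h when the low digit exceeds 9 or AC is set, then add 60h when the high digit exceeds 9 or C is set) is the commonly documented behaviour rather than something stated in the text above.

```c
#include <stdint.h>

/* Model of DA A following an ADD/ADDC of two packed-BCD bytes.
   a  : binary sum left in the accumulator by the addition
   ac : auxiliary carry flag left by the addition
   c  : carry flag left by the addition (may be set by the adjustment) */
uint8_t da_a(uint8_t a, uint8_t ac, uint8_t *c)
{
    uint16_t value = a;
    if ((value & 0x0Fu) > 9u || ac)
        value += 0x06u;            /* correct the low BCD digit */
    if ((value >> 4) > 9u || *c)
        value += 0x60u;            /* correct the high BCD digit */
    if (value > 0xFFu)
        *c = 1u;                   /* the adjustment can set C but never clears it */
    return (uint8_t)value;
}
```

Adding BCD 47 and 25 in binary gives 6Ch; the adjustment turns this into 72h, the packed-BCD form of the decimal sum 72.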
Logical /Boolean Operations

Most control applications implement control logic using Boolean operators to act on
the data. Most microcomputers provide a set of Boolean instructions that act on byte
level data. However, the 8051 (somewhat uniquely) additionally provides Boolean
instructions which can operate on bit level data.
The following Boolean operations operate on byte level data; ANL, ORL and CPL can also
operate on bit level data:
ANL Logical AND
ORL Logical OR
CPL Complement (logical NOT)
XRL Logical XOR (exclusive OR)
Logical operations at the BYTE level
For ANL, ORL and XRL the destination of the operation can be the accumulator (register A) or
a direct address; CPL at the byte level operates on the accumulator only. Status flags are not
affected by these logical operations (unless PSW is directly manipulated). Example instructions are:
ANL A, #55h ; AND each bit in A with corresponding bit in number 55h, leaving
; the result in A.
ANL 42h, A ; AND each bit in RAM location 42h with corresponding bit in A,
; leaving the result in RAM location 42h.
ORL A, @R1 ; OR each bit in A with corresponding bit in the number whose address
; is contained in R1, leaving the result in A.
XRL A, 80h ; XOR each bit in A with corresponding bit in direct address 80h
; (port 0), leaving the result in A.
CPL A ; Complement each bit in A
Logical operations at the BIT level

The C (carry) flag is the destination of most bit level logical operations. The carry flag
can easily be tested using a branch (jump) instruction to quickly establish program
flow control decisions following a bit level logical operation.
Only the SFR registers whose addresses end in 0h or 8h are addressable in bit level operations, for example:
PSW IE IP TCON SCON
(the ports P0 to P3, the accumulator and the B register are also bit addressable).
Examples of bit level logical operations are as follows:
SETB 2Fh ; Bit 7 of Internal RAM location 25h is set
CLR C ; Clear the carry flag (flag = 0)
CPL 20h ; Complement bit 0 of Internal RAM location 24h
MOV C, 87h ; Move to carry flag the bit 7 of Port 0 (SFR at 80h)
ANL C, 90h ; AND C with the bit 0 of Port 1 (SFR at 90h)
BRANCHING
The 8051 has a rich set of jumps that can operate at the bit and byte levels.
These jump opcodes are one reason the 8051 is such a powerful microcontroller.
Bit Jumps
Bit jumps all operate according to the status of the carry flag in the PSW or the status of any
bit-addressable location.
All bit jumps are relative to the program counter.
JC radd jump relative if the carry flag is set to 1
JNC radd jump relative if the carry flag is reset to 0
JB b,radd jump relative if addressable bit is set to 1
JNB b,radd jump relative if addressable bit is reset to 0
JBC b,radd jump relative if addressable bit is set, and clear the addressable bit to 0
Byte Jumps
All byte jumps are relative to the program counter
• CJNE A, add, radd
Compare the contents of the A register with the contents of the direct address; if they are not
equal, then jump to the relative address; set the carry flag to 1 if A is less than the contents of the
direct address; otherwise, set the carry flag to 0.
• CJNE A, #n, radd
Compare the contents of the A register with the immediate number n; if they are not equal, then
jump to the relative address; set the carry flag to 1 if A is less than the number; otherwise, set the
carry flag to 0.
• CJNE Rn, #n, radd
Compare the contents of register Rn with the immediate number n; if they are not equal, then
jump to the relative address; set the carry flag to 1 if Rn is less than the number; otherwise, set
the carry flag to 0.
• CJNE @Rp, #n, radd
Compare the contents of the address contained in register Rp to the number n; if they are not
equal, then jump to the relative address; set the carry flag to 1 if the contents of the
address in Rp are less than the number; otherwise, set the carry flag to 0.
• DJNZ Rn, radd
Decrement register Rn by 1 and jump to the relative address if the result is not 0; no flags are
affected.
• DJNZ add, radd
Decrement the direct address by 1 and jump to the relative address if the result is not 0; no flags are
affected unless the direct address is the PSW.
• JZ radd
Jump to the relative address if A is 0; the flags and the A register are not changed.
• JNZ radd
Jump to the relative address if A is not 0; the flags and the A register are not changed.
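The compare-and-jump and decrement-and-jump rules above can be modelled in a few lines of host-side C. This is only an illustrative sketch: `cjne`, `djnz`, `djnz_iterations` and the `CjneResult` type are hypothetical names, and the model covers only the branch decision and the carry flag, not the program counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Outcome of a CJNE: whether the branch is taken and the resulting carry. */
typedef struct {
    bool jump_taken; /* true when the two operands differ */
    bool carry;      /* true when the left operand is smaller (unsigned) */
} CjneResult;

CjneResult cjne(uint8_t left, uint8_t right)
{
    CjneResult r;
    r.jump_taken = (left != right);
    r.carry = (left < right);
    return r;
}

/* DJNZ: decrement with 8-bit wraparound; returns true when the jump is taken. */
bool djnz(uint8_t *counter)
{
    *counter = (uint8_t)(*counter - 1);
    return *counter != 0;
}

/* Number of passes through a loop body that ends in DJNZ. */
unsigned djnz_iterations(uint8_t start)
{
    uint8_t counter = start;
    unsigned passes = 0;
    do {
        passes++;
    } while (djnz(&counter));
    return passes;
}
```

Note that a counter loaded with 0 wraps to FFh on the first decrement, so such a loop runs 256 times rather than zero times.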
UNCONDITIONAL JUMPS
All jump ranges are possible

• JMP @A+DPTR
Jump to the address formed by adding A to the DPTR; this is an unconditional jump and will
always be done; the address can be anywhere in program memory; A, the DPTR, and the flags
are unchanged
• AJMP sadd
Jump to absolute short range address sadd; this is an unconditional jump and is always taken;
no flags are affected;
• LJMP ladd
Jump to absolute long range address ladd; this is an unconditional jump and is always taken; no
flags are affected;
• SJMP radd
Jump to relative address radd; this is an unconditional jump and is always taken; no flags are
affected;
• NOP
Do nothing and go to the next instruction; NOP(no operation) is used to waste time in a software
timing loop, or to leave room in program for later; no flags are affected
CALL AND SUBROUTINE
Calls use Short or Long range Addressing

• ACALL sadd
Call the subroutine located on the same 2K page as the address of the opcode immediately following
the ACALL instruction; push the address of the instruction immediately after the call on the stack
• LCALL ladd
Call the subroutine located anywhere in program memory space; push the address of the
instruction immediately following the call on the stack
• RET
pop 2 bytes from the stack in to the program counter
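A rough host-side model of what the call and return instructions do to the stack is sketched below. The `Cpu` type and all function names are hypothetical. The sketch relies on standard 8051 behaviour that the text above does not spell out: SP is 07h after reset and is incremented before each push, LCALL is a 3-byte and ACALL a 2-byte instruction, the return address is pushed low byte first, and ACALL can only reach targets in the same 2K page as the instruction that follows it.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal CPU state for modelling calls; only the stack and PC matter here. */
typedef struct {
    uint8_t  ram[256]; /* internal RAM holding the stack */
    uint8_t  sp;       /* stack pointer */
    uint16_t pc;       /* program counter */
} Cpu;

/* LCALL: push the address of the next instruction, then jump. */
void lcall(Cpu *cpu, uint16_t target)
{
    uint16_t return_addr = (uint16_t)(cpu->pc + 3); /* LCALL is 3 bytes long */
    cpu->ram[++cpu->sp] = (uint8_t)(return_addr & 0xFF); /* low byte first */
    cpu->ram[++cpu->sp] = (uint8_t)(return_addr >> 8);   /* then high byte */
    cpu->pc = target;
}

/* RET: pop two bytes (high byte first) back into the program counter. */
void ret_instruction(Cpu *cpu)
{
    uint8_t high = cpu->ram[cpu->sp--];
    uint8_t low  = cpu->ram[cpu->sp--];
    cpu->pc = (uint16_t)(((uint16_t)high << 8) | low);
}

/* Run one call/return pair and report where execution resumes. */
uint16_t lcall_then_ret(uint16_t call_site, uint16_t target)
{
    Cpu cpu = {{0}, 0x07, 0};
    cpu.pc = call_site;
    lcall(&cpu, target);
    assert(cpu.pc == target && cpu.sp == 0x09); /* two bytes were pushed */
    ret_instruction(&cpu);
    assert(cpu.sp == 0x07);                     /* stack is balanced again */
    return cpu.pc;
}

/* ACALL reaches a target only if it shares the upper 5 address bits
 * with the instruction that follows the 2-byte ACALL. */
int acall_in_range(uint16_t acall_addr, uint16_t target)
{
    uint16_t next = (uint16_t)(acall_addr + 2);
    return (next & 0xF800) == (target & 0xF800);
}
```

The last function shows why the page is taken from the opcode after the ACALL: an ACALL at 07FEh can no longer reach 0100h, because the following instruction already lies in the next 2K page.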
INTERFACING WITH EXTERNAL DEVICES
The diagram above shows the 8051 pinout. The chip is a 40-pin package.
The four 8-bit I/O ports P0, P1, P2 and P3 each use 8 pins. Upon RESET all the ports are configured as
inputs, because their latches hold 1s. When the first 0 is written to a port bit, that bit becomes an output. To
reconfigure it as an input, a 1 must be written to it again. In other words, a port must be programmed (by
writing 1s to it) before it is used as an input port.
Port 0 - pins 32 to 39 make up the 8-bit I/O port 0. It can be used for input or output, but each pin must be
connected externally to a 10K ohm pull-up resistor, because P0 is open drain, unlike P1, P2, and P3. To make
port 0 an input, the port must be programmed by writing 1 to all of its bits. Port 0 is also designated
AD0-AD7: when an 8051/31 is connected to external memory, these lines are used as a multiplexed
address and data bus, so port 0 provides both address and data.
Port 1 - pins 1 to 8 make up the 8-bit I/O port 1. Port 1 can be used as input or output. In contrast to port
0, this port does not need any pull-up resistors since it already has pull-up resistors internally. Upon reset,
port 1 is configured as an input port.
Port 2 - pins 21 to 28 make up the 8-bit I/O port 2. However, if external memory is used, these lines make
up the high-byte of the external address (A8 to A15). Port 2 can be used as input or output. Just like P1,
port 2 does not need any pull-up resistors since it already has pull-up resistors internally. Upon reset, port
2 is configured as an input port
Port 3 - pins 10 to 17 make up the 8-bit I/O port 3. Port 3 can be used as input or output. Port 3 does not
need any pull-up resistors. Port 3 is configured as an input port upon reset; this is not the way it is most
commonly used because each of these eight pins also has an alternate function, as detailed in the table
below.
Pin    Name      Bit Address   Function
P3.0   RXD       B0H           Receive data for serial port
P3.1   TXD       B1H           Transmit data for serial port
P3.2   INT0-bar  B2H           External interrupt 0
P3.3   INT1-bar  B3H           External interrupt 1
P3.4   T0        B4H           Timer/counter 0 external input
P3.5   T1        B5H           Timer/counter 1 external input
P3.6   WR-bar    B6H           External data memory write strobe
P3.7   RD-bar    B7H           External data memory read strobe
RST - the reset input is on pin 9. This pin is used for resetting the 8051 (i.e., loading the PC with the correct
startup value).
EA-bar - the external access, on pin 31, is used for enabling or disabling the on-chip ROM. When tied high
(5V), the 8051 executes instructions in internal ROM when executing in the lower 4K (8K for the 8052) of
memory. If tied low the 8051 will always execute instructions in external memory. The 8031 and 8032
should always have pin 31 tied low as there is no internal code memory.
ALE - the address latch enable is on pin 30. The ALE is used for latching the low byte of the address into an
external register.
PSEN-bar - the program store enable is an output signal on pin 29. This signal is used for fetching
instructions from external code memory.
Sometimes we need to access only 1 or 2 bits of a port. The instructions used for single-bit
operations are SETB, CLR, CPL, JB, JNB and JBC.
The JNB and JB instructions are widely used single-bit operations. They allow you to monitor a bit and
make a decision depending on whether it is 0 or 1. These two instructions can be used for any bit of I/O
ports 0, 1, 2, and 3. Port 3 is typically not used for any I/O, either single-bit or byte-wise.
In reading a port:
Some instructions read the status of the port pins; others read the status of the internal port latch. Therefore,
when reading ports there are two possibilities:
• Read the status of the input pin
• Read the internal latch of the output port
Confusion between them is a major source of errors in 8051 programming, especially where external
hardware is concerned.
For example, the ANL P1,A instruction reads the port latch, not the pins. Its sequence of actions is as
follows:
1. It reads the internal latch of the port and brings that data into the CPU
2. This data is ANDed with the contents of register A
3. The result is rewritten back to the port latch
4. The port pin data is changed and now has the same value as port latch
Read-Modify-Write
Instructions that read the port latch normally read a value, perform an operation, and then write the result back to the
port latch. This is called the read-modify-write technique. It saves many lines of code by combining
all three actions in a single instruction:
1. Reading the port
2. Modifying it
3. Writing to the port
Example of read-modify-write:
MOV P1,#55H ;P1=01010101
AGAIN: XRL P1,#0FFH ;EX-OR P1 with 1111 1111
ACALL DELAY
SJMP AGAIN
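Why read-modify-write instructions use the latch rather than the pins can be illustrated with a simplified host-side model. Everything here is an assumption made for the sketch: the function names are hypothetical, and a pin is modelled as reading low whenever either the latch bit is 0 or external hardware pulls the pin low, which is a simplification of the real port circuitry.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified pin model: a pin is high only if its latch bit is 1
 * and nothing outside the chip is pulling it low. */
uint8_t pin_levels(uint8_t latch, uint8_t pulled_low)
{
    return (uint8_t)(latch & (uint8_t)~pulled_low);
}

/* What ANL Px,#mask does: read the latch, AND, write the latch. */
uint8_t anl_via_latch(uint8_t latch, uint8_t mask)
{
    return (uint8_t)(latch & mask);
}

/* What would happen if the instruction read the pins instead (for contrast). */
uint8_t anl_via_pins(uint8_t latch, uint8_t pulled_low, uint8_t mask)
{
    return (uint8_t)(pin_levels(latch, pulled_low) & mask);
}
```

With the latch at FFh (all bits set up as inputs) and an external switch holding bit 0 low, `anl_via_latch(0xFF, 0xFF)` leaves the latch at FFh, while the pin-based variant would store FEh and silently turn bit 0 into an output driven low.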
SOME PROGRAMS
Port 0 is configured first as an input port by writing 1s to it, and then data is received from
that port and sent to P1
$MOD51
MOV A,#0FFH ;A=FF hex
MOV P0,A ;make P0 an i/p port by writing it all 1s
BACK: MOV A,P0 ;get data from P0
MOV P1,A ;send it to port 1
SJMP BACK ;keep doing it
END
The following code will continuously send out to port 1 the alternating value 55H and
AAH
$mod51
MOV P1,#00H
BACK: MOV A,#55H
MOV P1,A
ACALL DELAY
MOV A,#0AAH
MOV P1,A
ACALL DELAY
SJMP BACK
DELAY: MOV R2,#200
AGAIN: MOV R3,#250
HERE:
DJNZ R3,HERE
DJNZ R2,AGAIN
RET
END
Port 1 is configured first as an input port by writing 1s to it, then data is received from that
port and saved in R7 and R5
$MOD51
ORG 0H
MOV A,#0FFH ;A=FF hex
MOV P1,A ;make P1 an input port by writing it all 1s
MOV A,P1 ;get data from P1
MOV R7,A ;save it in reg R7
ACALL DELAY ;wait
MOV A,P1 ;another data from P1
MOV R5,A ;save it in reg R5
STOP: SJMP STOP ;stop here so execution does not fall into DELAY
DELAY: MOV R2,#200
AGAIN: MOV R3,#250
HERE:
DJNZ R3,HERE
DJNZ R2,AGAIN
RET
END
Create a square wave of 50% duty cycle on bit 0 of port 1.

Solution:
The 50% duty cycle means that the “on” and “off” state (or the high and low portion of the
pulse) have the same length. Therefore, we toggle P1.0 with a time delay in between each
state. (Note: this is only a fragment of a program; the DELAY subroutine is not shown.)
HERE: SETB P1.0 ;set to high bit 0 of port 1
LCALL DELAY ;call the delay subroutine
CLR P1.0 ;P1.0=0
LCALL DELAY
SJMP HERE ;keep doing it
Another way to write the above program is:
HERE: CPL P1.0 ;complement bit 0 of port 1
LCALL DELAY ;call the delay subroutine
SJMP HERE ;keep doing it
Write a program to perform the following:

(a) Keep monitoring the P1.2 bit until it becomes high
(b) When P1.2 becomes high, write value 45H to port 0
(c) Send a high-to-low (H-to-L) pulse to P2.3
Solution:
$MOD51
ORG 0H
SETB P1.2 ;make P1.2 an input
MOV A,#45H ;A=45H
AGAIN: JNB P1.2,AGAIN ; get out when P1.2=1
MOV P0,A ;issue A to P0
SETB P2.3 ;make P2.3 high
CLR P2.3 ;make P2.3 low for H-to-L
END
A switch is connected to pin P1.7. Write a program to check the status of SW and perform the
following:
(a) If SW=0, send letter ‘N’ to P2
(b) If SW=1, send letter ‘Y’ to P2
Solution:
$MOD51
ORG 0H
SETB P1.7 ;make P1.7 an input
AGAIN: JB P1.7,OVER ;jump if P1.7=1
MOV P2,#'N' ;SW=0, issue 'N' to P2
SJMP AGAIN ;keep monitoring
OVER: MOV P2,#'Y' ;SW=1, issue 'Y' to P2
SJMP AGAIN ;keep monitoring
END
8051 UART
Manish Man Shrestha Cosmos College of Management and Technology

Introduction
• 8051 has built in UART with RXD (serial data receive pin) and TXD (serial data transmit
pin) on PORT3.0 and PORT3.1 respectively.
• Asynchronous serial communication is widely used for byte oriented transmission.
• Frame structure in Asynchronous communication:
– START bit: The bit with which serial communication starts; it is always low.
– Data bits packet: The data packet can be 5 to 9 bits long. Normally an 8-bit data packet is used, and it is always
sent after the START bit.
– STOP bit: One or two bits sent after the data bits packet to indicate the end of the frame. The stop bit is
always logic high.
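The frame structure just described can be sketched as a bit pattern on the host. The function names are hypothetical, and the sketch assumes the usual UART convention, not stated above, that the data bits are sent least significant bit first; it also fixes the format to 8 data bits and 1 stop bit.

```c
#include <assert.h>
#include <stdint.h>

/* Build a 10-bit frame for one byte: start bit, 8 data bits, stop bit.
 * Bit 0 of the result is the first bit on the line. */
uint16_t uart_frame_8n1(uint8_t data)
{
    uint16_t frame = 0;                    /* bit 0: START bit, always low */
    frame |= (uint16_t)((uint16_t)data << 1); /* bits 1..8: data, LSB first */
    frame |= (uint16_t)(1u << 9);          /* bit 9: STOP bit, always high */
    return frame;
}

/* Line level (0 or 1) of the bit sent at position index (0 = first). */
int frame_bit(uint16_t frame, unsigned index)
{
    assert(index < 10);
    return (int)((frame >> index) & 1u);
}
```

For the character 'A' (41h) the line therefore carries 0, then 1 0 0 0 0 0 1 0, then 1.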

Interface standard

8051 UART Programming: Baud rate calculation
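With Timer 1 as the baud rate generator and SMOD = 0, the baud rate is
$$Baudrate = Fosc/(32 * 12 * (256 - TH1))$$
so the reload value for a required baud rate is
$$TH1 = 256 - (Fosc/(32 * 12 * Baudrate))$$ //if SMOD == 0 in the PCON register
$$TH1 = 256 - (Fosc/(32 * 6 * Baudrate))$$ //if SMOD == 1 in the PCON register
Now with Fosc = 11.0592 MHz and SMOD = 0, the TH1 value for a 9600 baud rate will be:
$$TH1 = 256 - (11.0592 * 10^6)/(32 * 12 * 9600) = 256 - 3 = 253 = 0xFD$$
(0xFD is -3 when read as an 8-bit two's complement number.)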
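The two TH1 formulas can be checked on a PC with a short sketch. `th1_reload` is a hypothetical helper, not part of any 8051 library. As a design choice of this sketch, it returns -1 when the crystal does not divide evenly for the requested baud rate, because a rounded reload value would produce a baud rate error.

```c
#include <assert.h>

/* Compute the Timer 1 reload value for a baud rate.
 * Divisor is 32*12 when SMOD = 0 and 32*6 (= 16*12) when SMOD = 1.
 * Returns -1 if no exact 8-bit reload value exists. */
int th1_reload(unsigned long fosc_hz, unsigned long baud, int smod)
{
    unsigned long divisor;
    unsigned long counts;

    assert(baud > 0);
    divisor = (smod ? 16UL : 32UL) * 12UL * baud;
    counts = fosc_hz / divisor;
    if (counts == 0 || counts > 256 || (fosc_hz % divisor) != 0) {
        return -1; /* not reachable exactly with this crystal */
    }
    return (int)(256UL - counts);
}
```

With an 11.0592 MHz crystal the common baud rates divide exactly, which is why that crystal value is used in the calculation above; a 12 MHz crystal does not give an exact 9600 baud.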

Serial communication Registers
Register   Description
SCON       Serial Control Register
TCON       Timer Control Register for Baud Rate Generator
TMOD       Timer Mode Control for Baud Rate Generator
SBUF       Serial Buffer; holds the data to be transmitted and the data received
PCON       Power Control Register

SBUF: Serial Buffer Register
This is the serial communication data register used to transmit or receive data through it.

SCON: Serial Control Register
Bit 7:6 - SM0:SM1: Serial Mode Specifier
Bit 5 - SM2: for Multiprocessor Communication
Bit 4 - REN: Receive Enable; 1 = Receive enable; 0 = Receive disable
Bit 3 - TB8: 9th Transmit Bit
Bit 2 - RB8: 9th Receive Bit
Bit 1 - TI: Transmit Interrupt Flag
Bit 0 - RI: Receive Interrupt Flag

PCON (Power Control Register)
SMOD – – – GF1 GF0 PD IDL
SMOD : Double baud rate bit. If Timer 1 is used to generate baud rate and SMOD = 1, the
baud rate is doubled when the serial port is used in modes 1, 2, or 3.
GF1 : General-purpose flag bit.
GF0 : General-purpose flag bit.
PD : Power Down bit. Setting this bit activates the Power Down operation in the 8051BH.
(Available only in CHMOS).
IDL : Idle Mode bit. Setting this bit activates Idle Mode operation in the 8051BH.
(Available only in CHMOS).

UART programming
#include<reg51.h>

void UART_Init(int baudrate)
{
    SCON = 0x50; // Asynchronous mode, 8-bit data and 1-stop bit
    TMOD = 0x20; // Timer1 in Mode2.
    // Use unsigned long arithmetic: 32*12*baudrate overflows a 16-bit int
    TH1 = 256 - (11059200UL / (32UL * 12UL * baudrate)); // Load timer value for baudrate generation
    TR1 = 1; // Turn ON the timer for Baud rate generation
}

void UART_TxChar(char ch)
{
    SBUF = ch; // Load the data to be transmitted
    while(TI==0); // Wait till the data is transmitted
    TI = 0; // Clear the Tx flag for next cycle.
}

char UART_RxChar(void)
{
    while(RI==0); // Wait till the data is received
    RI=0; // Clear Receive Interrupt Flag for next cycle
    return(SBUF); // return the received char
}

int main()
{
    char i,a[]={"Welcome to 8051 Serial Comm, Type the char to be echoed: "};
    char ch;
    UART_Init(9600); // Initialize the UART module with 9600 baud rate
    for(i=0;a[i]!=0;i++)
    {
        UART_TxChar(a[i]); // Transmit predefined string
    }
    while(1)
    {
        ch = UART_RxChar(); // Receive a char from serial port
        UART_TxChar(ch); // Transmit the received char
    }
}

UART ISR
#include<reg51.h>
#define LEDs P1 // Assumption: LEDs are wired to port 1; the definition is not shown in the original
unsigned char receivedChar=0;

void serial_isr() interrupt 4
{
    if(RI == 1)
    {
        receivedChar = SBUF; // Read the received char
        SBUF = receivedChar; // Echo it back
        RI = 0; // Clear the receive flag
    }
    else if(TI == 1)
    {
        TI = 0; // Clear the transmit flag
    }
}

void main()
{
    SCON = 0x50; // Asynchronous mode, 8-bit data and 1-stop bit
    TMOD = 0x20; // Timer1 in Mode2.
    TH1 = 0xFD; // Load timer value for 9600 baudrate
    TR1 = 1; // Turn ON the timer for Baud rate generation
    ES = 1; // Enable Serial Interrupt
    EA = 1; // Enable Global Interrupt bit
    while(1)
    {
        LEDs = receivedChar;
    }
}


8051 timer

Introduction
• The 8051 has two timers/counters, they can be used as
– Timers to generate a time delay
– Event counters to count events happening outside the
microcontroller
• Both Timer 0 and Timer 1 are 16 bits wide
• Since the 8051 has an 8-bit architecture, each 16-bit timer
is accessed as two separate registers, a low byte and a
high byte

8051 timer overview
AT89C51 microcontroller has two Timers designated as Timer0 and Timer1. Each of these timers is
assigned a 16-bit register.
Timer                Size     Control Register   Count Register   Min Delay   Max Delay
TIMER0               16-bit   TMOD,TCON          TH0,TL0          1.085µs     71.107ms
TIMER1               16-bit   TMOD,TCON          TH1,TL1          1.085µs     71.107ms
TIMER2 (8052 only)   16-bit   T2CON              RCAP2H,RCAP2L    1.085µs     71.107ms

Timer registers
• Accessed as low byte and high byte
• The low byte register is called TL0/TL1
• The high byte register is called TH0/TH1
This means that the maximum number of times a timer can count without repeating is 2^16, i.e., 65536. So the
values of the timer registers can range from 0000H to FFFFH.

TMOD register
Gate Control
0 = Timer enabled
1 = Timer enabled if INTx is high
C/T: Counter or Timer Selector
0 = Internal count source (clock/12)
1 = External count source T0/T1 (P3.4/P3.5) pin
M1-M0: Mode Control
00 - Mode 0, 13 bit count mode
01 - Mode 1, 16 bit count mode
10 - Mode 2, 8 bit Auto reload mode
11 - Mode 3, Split Timer mode
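A TMOD value can be assembled from these fields as in the host-side sketch below. The helper names are hypothetical. The sketch assumes the standard TMOD layout, which the field list above does not show explicitly: the upper nibble configures Timer 1 and the lower nibble Timer 0, each ordered GATE, C/T, M1, M0 from the most significant bit down.

```c
#include <assert.h>
#include <stdint.h>

/* Build the 4-bit field for one timer: GATE, C/T, M1, M0. */
uint8_t tmod_nibble(int gate, int counter, int mode)
{
    assert(mode >= 0 && mode <= 3);
    return (uint8_t)(((gate ? 1 : 0) << 3) | ((counter ? 1 : 0) << 2) | mode);
}

/* Combine both fields: Timer 1 in bits 7..4, Timer 0 in bits 3..0. */
uint8_t tmod_value(uint8_t timer1_nibble, uint8_t timer0_nibble)
{
    return (uint8_t)(((timer1_nibble & 0x0F) << 4) | (timer0_nibble & 0x0F));
}
```

For example, Timer 1 as a plain timer in Mode 2 with Timer 0 left at its reset setting gives 20h, and Timer 0 alone in Mode 1 gives 01h.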

TCON register
TRx: Timer x run control
0 = Timer not running
1 = Timer running
TFx: Timer x OverFlow flag
0 = Timer has not overflowed/rolled over
1 = Timer has overflowed/rolled over

Timer clock frequency

Timer calculation
• The 8051 oscillator frequency is divided by 12 and then fed to the timer
• The time to increment the timer count by one (the timer tick) can be determined as below.
tick = 1/(Fosc/12)
tick = 12/Fosc
• For Fosc = 11.0592 MHz
tick = 12/11.0592M = 1.085069444us = 1.085us
• Now the timer value for the required delay can be calculated as below.
Delay = Count * tick
Count = Delay/tick
RegValue = TimerMax - Count
RegValue = TimerMax - (Delay/tick) = TimerMax - (Delay/1.085us)
RegValue = TimerMax - ((Delay/1.085) * 10^6), with Delay in seconds
Here TimerMax is the number of counts before overflow: 65536 for a 16-bit timer and 256 for an 8-bit timer.

Timer Mode1
• The timer in Mode-1 can be used as a 16-bit timer that counts from 0000 to
FFFFH, which allows a wide range of delays to be generated. The timer value
for the required delay needs to be loaded into the timer count registers TH
and TL. After the values are loaded, the timer must be started.
The timer then counts up, and once it passes the max value (0xffff)
it rolls back to zero, setting the overflow flag. At this point, the timer
values must be reloaded and the overflow flag cleared.
Timer Calculation for 50ms delay
Fosc = 11.0592Mhz
Delay = 50ms
RegValue = 65536 - (50ms/1.085us) = 65536 - 46083 = 19453 = 0x4BFD
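The reload calculation can be reproduced on a PC with integer arithmetic, as in the sketch below. All function names are hypothetical. Unlike the hand calculation, which uses the rounded tick of 1.085 us, the sketch works from the exact crystal frequency, so for 50 ms it yields 0x4C00 rather than 0x4BFD; the difference is about three timer ticks.

```c
#include <assert.h>
#include <stdint.h>

/* Number of timer ticks in a delay, using tick = 12 / Fosc exactly.
 * The result is truncated towards zero. */
uint32_t timer_counts(uint32_t fosc_hz, uint32_t delay_us)
{
    uint64_t counts = (uint64_t)delay_us * fosc_hz / 12000000ULL;
    assert(counts >= 1 && counts <= 65536); /* must fit a 16-bit timer */
    return (uint32_t)counts;
}

/* 16-bit (Mode 1) reload value: 65536 - counts. */
uint16_t mode1_reload(uint32_t fosc_hz, uint32_t delay_us)
{
    return (uint16_t)(65536UL - timer_counts(fosc_hz, delay_us));
}

/* Split a reload value into the bytes written to THx and TLx. */
uint8_t reload_high(uint16_t reload) { return (uint8_t)(reload >> 8); }
uint8_t reload_low(uint16_t reload)  { return (uint8_t)(reload & 0xFF); }
```

Either value gives a delay very close to 50 ms; the exact one is slightly nearer because 50 ms is exactly 46080 ticks at 11.0592 MHz.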

Timer 0 : polling method
#include<reg51.h>
sbit LED = P0^0;
void timerDelay()
{
    TH0 = 0X4B; // Load the timer value
    TL0 = 0XFD;
    TR0 = 1; // Turn ON Timer zero
    while(TF0 == 0); // Wait for Timer Overflow
    TF0 = 0; // Clear the timer overflow flag
    TR0 = 0; // Stop the timer
}
void main()
{
    TMOD = 0x01; // Timer0 mode 1
    while(1)
    {
        LED = 1;
        timerDelay();
        LED = 0;
        timerDelay();
    }
}

Timer 0: interrupt
#include<reg51.h>
sbit LED = P0^0;
void timer0_isr() interrupt 1
{
    TH0 = 0X4B; // Reload the timer value
    TL0 = 0XFD;
    LED = !LED; // Toggle the LED pin
}
void main()
{
    TMOD = 0x01; // Timer0 mode 1
    TH0 = 0X4B; // Load the timer value
    TL0 = 0XFD;
    TR0 = 1; // Turn ON Timer zero
    ET0 = 1; // Enable Timer0 Interrupt
    EA = 1; // Enable Global Interrupt bit
    while(1)
    {
        // Do nothing
    }
}

Timer Mode2
• The timer in Mode-2 can be used as an 8-bit timer that counts from 00 to FFH. The timer
value for the required delay needs to be loaded into the timer count register TH (from which
TL is reloaded). After the value is loaded, the timer must be started. The timer then
increments TL, and once it passes the max value (0xff) it rolls back to
zero, sets the overflow flag and reloads TL with the value from TH.
Fosc = 11.0592Mhz
Delay = 250µs
RegValue = 256 - (250µs/1.085µs) = 256 - 230 = 26 = 0x1A

Timer 0: auto reload
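#include<reg51.h>
sbit LED = P0^0;

void timer0_isr() interrupt 1
{
    LED = !LED; // Toggle the LED pin
    // Note: the timer value is not reloaded here;
    // in Mode 2 the reload from TH0 is automatic
}

void main()
{
    TMOD = 0x02; // Timer0 mode 2 (8-bit auto reload)
    TH0 = 0X1A; // Load the reload value
    TL0 = 0X1A; // Start the first period from the same value
    TR0 = 1; // Turn ON Timer zero
    ET0 = 1; // Enable Timer0 Interrupt
    EA = 1; // Enable Global Interrupt bit
    while(1)
    {
        // Do nothing
    }
}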

In the following example, a square wave of 50% duty cycle (with equal portions high
and low) is created on the P1.5 bit. Timer 0 is used to generate the time delay.

Assuming that XTAL = 11.0592 MHz, write a program to generate a square wave of 2 kHz frequency on pin P1.5.
Solution:
Since XTAL = 11.0592 MHz, the counter counts up every 1.085 us.
1. T = 1 / f = 1 / 2 kHz = 500 us is the period of the square wave.
2. Half of it, for each of the high and low portions of the pulse, is 250 us.
3. 250 us / 1.085 us = 230 and 65536 – 230 = 65306, which in hex is FF1AH.
4. TL = 1AH and TH = FFH, all in hex. The program loads these values into Timer 0 in Mode 1, starts
the timer, waits for the overflow flag TF0, complements P1.5, clears TF0 and repeats.

8051 7-segment display

Introduction
• A seven-segment display is a combination of eight LEDs
• These segments are named a, b, c, d, e, f, g and DP.
• There are two types of seven-segment displays
– Common Anode (CA) 7 Segment Display
– Common Cathode (CC) 7 Segment Display

Common Anode 7-segment display

Common Cathode 7-segment display

Common cathode connection diagram

Common cathode digital display table
* P2.7 P2.6 P2.5 P2.4 P2.3 P2.2 P2.1 P2.0 *
Character DP g f e d c b a HEX
0 0 0 1 1 1 1 1 1 0x3F
1 0 0 0 0 0 1 1 0 0x06
2 0 1 0 1 1 0 1 1 0x5B
3 0 1 0 0 1 1 1 1 0x4F
4 0 1 1 0 0 1 1 0 0x66
5 0 1 1 0 1 1 0 1 0x6D
6 0 1 1 1 1 1 0 1 0x7D
7 0 0 0 0 0 1 1 1 0x07
8 0 1 1 1 1 1 1 1 0x7F
9 0 1 1 0 1 1 1 1 0x6F
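The HEX column can be derived from the segment letters with a small host-side sketch. The function names are hypothetical. The bit assignment (segment a on bit 0 up to g on bit 6) follows the table header. The common-anode variant assumes, as is usual for such displays, that a segment lights when its pin is driven low, so its patterns are the bitwise complement of the common-cathode ones.

```c
#include <assert.h>
#include <stdint.h>

/* Patterns for digits 0..9 on a common-cathode display (from the table). */
static const uint8_t CC_PATTERNS[10] = {
    0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07, 0x7F, 0x6F
};

/* Turn a string of lit segments such as "abcdef" into a port value. */
uint8_t segments_to_byte(const char *segments)
{
    uint8_t value = 0;
    for (; *segments != '\0'; ++segments) {
        assert(*segments >= 'a' && *segments <= 'g');
        value |= (uint8_t)(1u << (*segments - 'a')); /* a -> bit 0 ... g -> bit 6 */
    }
    return value;
}

/* Pattern for one digit on a common-cathode display. */
uint8_t seg_common_cathode(unsigned digit)
{
    assert(digit < 10);
    return CC_PATTERNS[digit];
}

/* Pattern for one digit on a common-anode display (active-low segments). */
uint8_t seg_common_anode(unsigned digit)
{
    return (uint8_t)~seg_common_cathode(digit);
}
```

For example, the digit 2 lights segments a, b, d, e and g, and `segments_to_byte("abdeg")` reproduces the table entry 0x5B.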

C code
#include<reg51.h>
void delay() // Function for creating a software delay.
{
    unsigned i,j;
    for(i=0;i<0xff;i++)
    {
        for(j=0;j<0xff;j++);
    }
}
void main()
{
    unsigned char disp[]={0x3F,0x06,0x5B,0x4F,0x66,0x6D,0x7D,0x07,0x7F,0x6F};
    int i;
    while(1)
    {
        for(i=0;i<10;i++)
        {
            P2=disp[i]; // Output the pattern for digit i
            delay();
        }
    }
}

ASM code
ORG 4000H
DB 3FH, 06H, 5BH, 4FH, 66H, 6DH, 7DH, 07H, 7FH, 6FH ; Lookup table for digits 0 to 9
ORG 0000H
main: MOV R0, #10 ; Number of digits to display
MOV DPTR, #4000H ; Point to the start of the lookup table
repeat: CLR A
MOVC A, @A+DPTR ; Copy the pattern from code memory to the accumulator
MOV P2, A ; Move the pattern of the digit into port P2
ACALL delay ; Call a delay so that the transition is visible
INC DPTR ; Point to the next pattern
DJNZ R0, repeat ; Decrement R0 by 1 and repeat until it becomes 0
SJMP main
delay:
MOV R1, #0FFH
LP2: MOV R2, #0FFH
LP1: DJNZ R2, LP1
DJNZ R1, LP2
RET
END

VHDL

Overview
• VHDL is an acronym for VHSIC Hardware Description Language (VHSIC is an
acronym for Very High Speed Integrated Circuits). It is a hardware description
language that can be used to model a digital system at many levels of
abstraction, ranging from the algorithmic level to the gate level. The complexity
of the digital system being modeled could vary from that of a simple gate to a
complete digital electronic system, or anything in between. VHDL is used
– For describing hardware
– As a modeling language
– For simulation of hardware
– For early performance estimation of system architecture
– For synthesis of hardware
– For fault simulation, test and verification of design

Levels of representation and abstractions
• To keep description and design manageable
• Behavioral:
– Highest level of abstraction that describes a system in terms of what it does rather
than in terms of its components and interconnection between them. It describes
relationship between input and output signal.
– Consider a simple circuit that warns car passengers when the door isn’t locked
or the seatbelt is not used whenever the car key is inserted in the ignition lock. At the
behavioral level, this could be expressed as
Warning: Ignition_on AND (Door_open OR Seatbelt_off)
• Structural
– It describes a system as a collection of gates and components that are interconnected
to perform a desired function. It is usually closer to the physical realization of the system.

Basic structure of a VHDL file
• A digital system in VHDL consists of a design entity that can contain other entities that are then
considered components of the top-level entity. Each entity is modeled by an entity declaration and
an architecture body.
• The entity declaration is the interface to the outside world; it defines the input and output signals
• The architecture body contains the description of the entity and is composed of interconnected entities, processes
and components, all operating concurrently.
• VHDL uses reserved keywords which can’t be used as signal names or identifiers.
• Keywords and user-defined identifiers are case insensitive.
• Comments start with two adjacent hyphens (--) and are ignored by the compiler.
• VHDL also ignores line breaks and extra spaces.
• VHDL is a strongly typed language, which implies that one always has to declare the type of every
object that can have a value, such as signals, constants and variables

Entity
• An entity represents a template for a hardware block
• It describes just the outside view of a hardware model – namely its
interface with other modules in terms of input and output signals.
• The hardware block can be the entire design, a part of it or indeed an
entire “test-bench”
• A test bench includes the circuit being designed, blocks which apply test
signals to it and those which monitor its output
• The inner operation of the entity is described by an ARCHITECTURE
associated with it.

Entity declaration
• The declaration of an ENTITY describes the signals which connect this
hardware to the outside. These are called port signals. It also provides
optional values of manifest constants. These are called generics.
• General Syntax
entity NAME_OF_ENTITY is
port( signal_name: mode type;
signal_name: mode type;
:
:
signal_name: mode type);
end NAME_OF_ENTITY;

Entity declaration contd.
• NAME_OF_ENTITY is a user-selected identifier, and the signal names are user-selected identifiers that
specify the external interface signals
• An entity declaration starts with the keyword “entity” and ends with the keyword “end”. Ports are declared using the keyword
“port”
• Mode: one of the reserved words to indicate signal direction.
– In: indicates that the signal is an input
– Out: indicates that the signal is an output of the entity, whose value can only be read by other entities that use it
– Buffer: indicates that the signal is an output of the entity whose value can also be read inside the entity architecture
– Inout: the signal can be an input or an output.
• Type: a built-in or user-defined signal type
– Bit: can have the value 0 or 1
– Bit_vector: a vector of bit values, e.g. bit_vector(0 to 7)
– Std_logic, std_ulogic, std_logic_vector, std_ulogic_vector: can have nine values to indicate the value and strength of a signal
– Boolean: can have value TRUE and FALSE
– Integer: can have range of integer values
– Real: can have range of real values
– Time: to indicate time

Architecture
• It describes how an ENTITY operates. An ARCHITECTURE is always associated with an ENTITY.
There can be multiple ARCHITECTURES associated with an ENTITY. An ARCHITECTURE can describe
an entity in a structural style, behavioral style or mixed style. The language provides constructs for
describing components, their interconnects and composition. The language also includes signal
assignment, sequential and concurrent statements for describing data and control flow, and for
behavioral design.
• Syntax:
architecture architecture_name of NAME_OF_ENTITY is
--Declaration
--component declarations
--signal declarations
--constant declaration
--type declaration
begin
--statements
end architecture_name;

Behavior Model
• Header of architecture body defines architecture name e.g. behavior and associates it with entity
BUZZER
• The architecture name can be any legal identifier
• The main body starts with the keyword “begin” and ends with the keyword “end”
• The “<=” symbol is an assignment operator and assigns the value of the expression on the right to the signal on the left
• The complete program looks as follows
Library ieee;
Use ieee.std_logic_1164.all;
entity BUZZER is
port(DOOR, IGNITION, SBELT:in std_logic;
WARNING:out std_logic);
end BUZZER;
architecture behavior of BUZZER is
begin
WARNING<=(not DOOR and IGNITION)or(not SBELT and IGNITION);
end behavior;
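The logic of the WARNING assignment can be tabulated with a host-side C sketch. It models only the Boolean function, not VHDL signals or timing, and the function names are hypothetical. It assumes, from the use of "not" in the expression, that DOOR = 1 means the door is locked and SBELT = 1 means the seatbelt is fastened.

```c
#include <assert.h>
#include <stdbool.h>

/* Same expression as the VHDL assignment to WARNING. */
bool buzzer_warning(bool door, bool ignition, bool sbelt)
{
    return (!door && ignition) || (!sbelt && ignition);
}

/* Factored form: the key is in AND (door open OR belt off). */
bool buzzer_factored(bool door, bool ignition, bool sbelt)
{
    return ignition && (!door || !sbelt);
}

/* Check all eight input combinations; returns 1 if both forms agree. */
int buzzer_forms_agree(void)
{
    for (int bits = 0; bits < 8; ++bits) {
        bool door = (bits & 1) != 0;
        bool ignition = (bits & 2) != 0;
        bool sbelt = (bits & 4) != 0;
        if (buzzer_warning(door, ignition, sbelt) !=
            buzzer_factored(door, ignition, sbelt)) {
            return 0;
        }
    }
    return 1;
}
```

The factored form matches the behavioral statement given earlier in the notes, so the architecture body is a direct expansion of it.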

Structural description
• The above program can also be described using a structural model that specifies what gates are used and
how they are interconnected. For example
architecture structural of BUZZER is
--Declaration
component AND2
port(in1, in2: in std_logic;
out1: out std_logic);
end component;
component OR2
port(in1, in2: in std_logic;
out1: out std_logic);
end component;
component NOT1
port(in1: in std_logic;
out1: out std_logic);
end component;
--declaration of signals used to interconnect gates
signal DOOR_NOT, SBELT_NOT, B1, B2: std_logic;
begin
--component instantiation statements
U0: NOT1 port map(DOOR, DOOR_NOT);
U1: NOT1 port map(SBELT, SBELT_NOT);
U2: AND2 port map(IGNITION, DOOR_NOT, B1);
U3: AND2 port map(IGNITION, SBELT_NOT, B2);
U4: OR2 port map(B1, B2, WARNING);
end structural;

• Following the header is the declarative part, which gives the components (gates) that will be used in the
description of the circuit
• The statements after the begin keyword give the instantiations of the components and
describe how they are interconnected
• A library is a place where the compiler stores information about a design
project.
• std_logic is defined in the package ieee.std_logic_1164 in the ieee library.
• The package is made visible at the beginning of the VHDL file using the library and use keywords
• The .all extension indicates that all of the ieee.std_logic_1164 package is used.
• The Xilinx Foundation Express comes with several packages

VHDL language elements: identifiers
• They are user defined words used to name objects
• When choosing identifiers one needs to follow these basic rules:
– Only alphanumeric characters and the underscore may be used
– The first character must be a letter and the last one can’t be an underscore
– It can’t include two consecutive underscores
– It is case insensitive (e.g. AND2, AnD2 and and2 refer to the same object)
– An identifier can be of any length.

VHDL language elements: data objects
• A data object holds a value of a specified type. It is
created by means of an object declaration. An example
is variable COUNT: INTEGER;
– This results in the creation of a data object called COUNT,
which can hold integer values. The object COUNT is also declared to
be of the variable class
• Every data object belongs to one of the following three
classes: i) Constant ii) Variable iii) Signal

Constant
• A constant can have a single value of a given type and cannot
be changed during simulation. A constant is declared as follows:
– constant list_of_name_of_constant: type [:= initial value];
• The initial value is optional. Constants can be declared at the start of an
architecture and can then be used anywhere within the
architecture
• Constants declared within a process can only be used inside that
specific process
– constant RISE_TIME: TIME := 2 ns;

Variable declaration
• A variable may be changed during program execution. Variable value is
updated using a variable assignment statement. The variable is updated
without any delay as soon as the statement is executed. Variable must be
declared inside a process (and are local to process). The variable
declaration is as follows:
• Syntax
– variable list_of_variable_name:type[:=initial value];
• Some examples are
– variable GNIR_BIT:bit:='0';
– variable VARS:Boolean:=FALSE;
– variable SUM:integer range 0 to 256:=16;

Signal
• Signals are similar to wires on a schematic and can be
used to interconnect concurrent elements of design
• Syntax
– signal list_of_signal_names: type [:= initial value];
• Examples:
– signal SUM, CARRY: std_logic;
– signal CLOCK:bit;
– signal TRIGGER:integer:=0;

Data types
• Every data object in VHDL can hold a value that belongs to a set of values. This set of value is specified by
using a type declaration. A type is a name that has associated with it a set of values and a set of operations

Scalar data types
Enumerated
An enumerated type declaration defines a type that has a set of user-defined values consisting of
identifiers and character literals.
type MVL is ('U', '0', '1', 'T');
type MICRO_OP is (LOAD, STORE, ADD, SUB, MUL, DIV);
Integer
An integer type defines a type whose set of values fall within a specified integer range.
type INDEX is range 0 to 15;
type WORD_LENGTH is range 31 downto 0;
Real
A floating point type has a set of values in a given range of real numbers. Examples of floating
point type declarations are
type TTL_VOLTAGE is range -5.5 to -1.4;
type REAL_DATA is range 0.0 to 31.9;
Physical
A physical type contains values that represent measurement of some physical quantity, like time,
length, voltage or current
type Current is range 0 to 1000000000000
units
nA;
uA = 1000 nA;
mA = 1000 uA;
end units;

Operators
• The predefined operators in VHDL are classified into
the following six categories.
i. Logical operators
ii. Relational operators
iii. Shift operators
iv. Adding operators
v. Multiplying operators
vi. Miscellaneous operators

Logical operators
• The seven logical operators are and, or, nand, nor, xor,
xnor, not. These operators are defined for the predefined
types BIT and BOOLEAN.

Relational Operators
• They are
i. =(equal)
ii. /=(not equal)
iii. <(less than)
iv. >(greater than)
v. <=(less than or equal to)
vi. >=(greater than or equal to)
The result type for all relational operations is always the predefined type
BOOLEAN

Shift operators
• These are
i. sll – shift left logical
ii. srl – shift right logical
iii. sla – shift left arithmetic
iv. sra – shift right arithmetic
v. rol – rotate left
vi. ror – rotate right
• Examples
– rol (rotate left): 10010011 rol 1 = 00100111
– ror (rotate right): 10010011 ror 1 = 11001001
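
The six shift operators can be compared side by side on the same 8-bit value, 10010011. This is a minimal self-checking sketch that assumes a BIT_VECTOR operand, since the predefined shift operators apply to one-dimensional arrays of BIT or BOOLEAN; the entity and constant names are hypothetical.

```vhdl
-- Hypothetical test entity, for illustration only.
entity shift_demo is
end entity shift_demo;

architecture sim of shift_demo is
begin
  process
    constant v : bit_vector(7 downto 0) := "10010011";
  begin
    -- Logical shifts fill the vacated position with '0'.
    assert (v sll 1) = "00100110" report "sll failed" severity failure;
    assert (v srl 1) = "01001001" report "srl failed" severity failure;
    -- sla fills with the original rightmost bit, sra with the original leftmost bit.
    assert (v sla 1) = "00100111" report "sla failed" severity failure;
    assert (v sra 1) = "11001001" report "sra failed" severity failure;
    -- Rotates move the bit that falls off one end back in at the other end.
    assert (v rol 1) = "00100111" report "rol failed" severity failure;
    assert (v ror 1) = "11001001" report "ror failed" severity failure;
    wait;  -- stop the process
  end process;
end architecture sim;
```

For this particular value sla and rol give the same result, as do sra and ror, because the leftmost and rightmost bits are both 1.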

Addition, Multiply and Miscellaneous
• Addition
– These are + (addition), - (subtraction), & (concatenation). The operands for addition and
subtraction operator must be of the same numeric type with the result being of the same numeric
type. The operands for the concatenation operator can be either a one-dimensional array type or
an element type. The result is always an array type.
• For example: '0' & '1' results in an array of characters "01"
• Multiply
– These are * (multiplication), / (division), rem (remainder) and mod (modulus). Examples
• 7 mod 4 – has value 3
• (-7) rem 4 – has value -3
• Miscellaneous
– The miscellaneous operators are abs (absolute value) and ** (exponentiation)
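
The arithmetic examples above can be checked in simulation. The sketch below also shows the difference between mod and rem for negative operands: the result of mod takes the sign of the right operand, while the result of rem takes the sign of the left operand. The entity and variable names are hypothetical, and the concatenation is assigned to a BIT_VECTOR so that the literals '0' and '1' are read as BIT values.

```vhdl
-- Hypothetical test entity, for illustration only.
entity arith_demo is
end entity arith_demo;

architecture sim of arith_demo is
begin
  process
    variable bv : bit_vector(1 downto 0);
  begin
    -- Examples taken from the notes.
    assert 7 mod 4 = 3 report "mod failed" severity failure;
    assert (-7) rem 4 = -3 report "rem failed" severity failure;
    -- mod follows the sign of the right operand, rem the sign of the left operand.
    assert (-7) mod 4 = 1 report "mod sign rule failed" severity failure;
    assert 7 rem (-4) = 3 report "rem sign rule failed" severity failure;
    -- Miscellaneous operators.
    assert abs (-7) = 7 report "abs failed" severity failure;
    assert 2 ** 3 = 8 report "exponentiation failed" severity failure;
    -- Concatenating two elements yields an array.
    bv := '0' & '1';
    assert bv = "01" report "concatenation failed" severity failure;
    wait;  -- stop the process
  end process;
end architecture sim;
```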

IEEE library
