
Mansoura University

Faculty of Engineering
Dept. of Electronics and
Communication Engineering

Smart Blind Stick


A B. Sc. Project in
Electronics and Communications Engineering

Supervised by
Assist. Prof. Mohamed Abdel-Azim
Eng. Ahmed Shabaan, Eng. Mohamed Gamal, Eng. Eman Ashraf

Department of Electronics and Communications Engineering


Faculty of Engineering-Mansoura University

2011-2012
Team Work

No. Name Contact Information

1 Ahmed Helmy Abd-Ghaffar Ahmed2033@gmail.com

2 Nesma Zein El-Abdeen Mohammed eng_nesma.zein@yahoo.com

3 Aya Gamal Osman El-Mansy eng_tota_20@hotmail.com

4 Fatma Ayman Mohammed angel_whisper89@hotmail.com

5 Ahmed Moawad Abo-Elenin Awad ahmedmowd@gmail.com

Acknowledgement
We would like to express our gratitude to our advisor and supervisor, Dr. Mohamed Abdel-Azim, for guiding this work with interest. We would also like to thank Eng. Ahmed Shaaban, Eng. Mohammed Gamal, and Eng. Eman Ashraf, our teaching assistants, for the countless hours they spent in the labs. We are grateful to them for setting high standards and giving us the freedom to explore. We would also like to thank our colleagues for their assistance and constant support.

Our Team

Abstract

According to the World Health Organization, approximately 36.9 million people in the world were blind as of 2002. The majority of them use a conventional white cane to aid navigation. The limitation of the white cane is that information is gained only by touching objects with the tip of the cane. The traditional length of a white cane depends on the height of the user, extending from the floor to the person's sternum. So we'll design an ultrasound sensor that detects all kinds of barriers, whatever their shape or height, and warns the user with vibration. Blind people also face great difficulty moving from place to place in town, and their only current aid is a guide dog, which can cost about $20,000 and remains useful for only about 5-6 years.

So we'll design a GPS system for blind people that helps the user move from place to place in town with voice directions; he identifies the place he wants to go by voice only, with no need to type anything.

We also want to help him move indoors, in the closed places he visits daily, so we'll design an indoor navigation system that works offline and guides him by voice from one location to another in specific places: homes, malls, libraries, etc.

He may also have great difficulty controlling his electric devices, so we'll design a totally wireless, voice-operated control system for all his electric devices, connected to a security system that warns him, whether he is indoors or out, if anything goes wrong and helps him solve the problem.

Contents

Chapter-01: Introduction……………………………………………………………………………………………….. 1


1.1 Problem Definition …………………………………………………………………………………...... 1
1.2 Problem Solution …………………………………………………………………………………………. 1
1.3 Business Model ……………………………………………………………………………………………. 2
1.4 Block Diagram………………………………………………………………………………………………. 2
1.5 Detailed Technical Description ……………………….…………………………………………… 3
1.6 Pre-Project Planning….…………………………………………………………………………………. 4
1.7 Time Planning………………………………………………………………………………………………. 4
Chapter-02: Speech recognition ………………………………………………………………………………………… 7
2.1 Introduction ………………………………………………………………………………………………… 7
2.2 Literature review …………………………………………………………………………………………. 7
2.2.1 Pattern recognition ………………………………………………………. 7
2.2.2 Generation of voice ……………………………………………………… 9
2.2.3 Voice as biometric ………………………………………………………… 11
2.2.4 Speech recognition ………………………………………………………. 11
2.2.5 Speaker recognition ……………………………………………………… 12
2.2.6 Speech\speaker modeling …………………………………………….. 13
2.3 Implementation details ……………………………………………………………………………….. 13
2.3.1 Pre-processing and feature extraction …………………………… 13
2.4 Artificial neural network……………………………………………………............................. 22
2.4.1 Introduction ………………………………………………………………….. 22
2.4.2 Models ………………………………………………………………………….. 23
2.4.3 Network function …………………………………………………………... 24
2.4.4 ANN dependency graph ………………………………………………….. 24
2.4.5 Learning …………………………………………………………………………. 25
2.4.6 Choosing a cost function ……………………………………………….. 26
2.4.7 Learning paradigms ……………………………………………………….. 26
2.4.8 Supervised learning ……………………………………………………….. 26
2.4.9 Unsupervised learning ……………………………………………………. 27
2.4.10 Reinforcement learning …………………………………………………. 27
2.4.11 Learning algorithms………………………………………………………… 28
2.4.12 Employing artificial neural network ……………………………….. 28
2.4.13 Application …………………………………………………………………….. 29
2.4.14 Types of models …………………………………………………………….. 30
2.4.15 Neural network software ………………………………………………. 31
2.4.16 Types of artificial neural network ………………………………….. 31
2.4.17 Confidence analysis of neural network ………………………….. 31
Chapter-03: Image Processing ………….…………………………………………………………………………….. 32
3.1 Introduction …………………………………………………………………………………………………. 33
3.1.1 What is digital image processing? ...................................... 33
3.1.2 Motivating problems ……………………………………………………… 33
3.2 Color vision ………………………………………………………………………………………………….. 34
3.2.1 Fundamentals ………………………………………………………………… 34
3.2.2 Image formats supported by MATLAB …………………………….. 35
3.2.3 Working formats in MATLAB …………………………………………… 35
3.3 Aspects of image processing ……………………………………………………………………….. 35

3.4 Image types …………………………………………………………………………………………………. 36


3.4.1 Intensity image ……………………………………………………………… 36
3.4.2 Binary image …………………………………………………………………. 37
3.4.3 Indexed image ………………………………………………………………. 37
3.4.4 RGB image……………………………………………………………………… 37
3.4.5 Multi frame image …………………………………………………………. 37
3.5 How to …………………………………………………………………………………………………………. 38
3.5.1 How to convert between different formats …………………… 38
3.5.2 How to read file …………………………………………………………….. 38
3.5.3 Loading and saving variables in MATLAB …………………………. 39
3.5.4 How to display an image in MATLAB ……………………………….. 39
3.6 Some important definitions …………………………………………………………………………. 40
3.6.1 Imread function …………………………………………………………….. 40
3.6.2 Rotation ………………………………………………………………………… 40
3.6.3 Scaling …………………………………………………………………………… 41
3.6.4 Interpolation …………………………………………………………………. 41
3.7 Edge detection …………………………………………………………………………………………….. 41
3.7.1 Canny edge detection ……………………………………………………. 41
3.7.2 Edge tracing …………………………………………………………………… 42
3.8 Mapping ………………………………………………………………………………………………………. 43
3.8.1 Mapping image onto surface overview ………………………….. 43
3.8.2 Mapping an image onto elevation data …………………………. 44
3.8.3 Initializing the IDL display objects…………………………………… 46
3.8.4 Displaying image and geometric surface object……………… 47
3.8.5 Mapping an image onto sphere……………………………………… 51
3.9 Mapping offline……………………………………………………………………………………………. 51
Chapter-04: GPS navigation………………………………………………………………………………………….. 53
4.1 Introduction ………………………………………………………………………………………………… 53
4.1.1 What is GPS? ..................................................................... 53
4.1.2 How does it work? .............................................................. 53
4.2 Basic concepts of GPS ………………………………………………………………………………….. 54
4.3 Position calculation ……………………………………………………………………………………… 55
4.4 Communication …………………………………………………………………………………………… 57
4.5 Message format ………………………………………………………………………………………….. 57
4.6 Satellite frequencies ……………………………………………………………………………………. 58
4.7 Navigation equations ………………………………………………………………………………….. 59
4.8 Bancroft's method ……………………………………………………………………………………….. 60
4.9 Trilateration …………………………………………………………………………………………………. 60
4.10 Multidimensional Newton-Raphson calculation …………………………………………. 60
4.11 Additional method for more than four satellites ………………………….................. 61
4.12 Error sources and analysis …………………………………………………………………………… 61
4.13 Accuracy enhancement and surveying ………………………………………………………… 61
4.13.1 Augmentation………………………………………………………………… 61
4.13.2 Precise monitoring…………………………………………………………. 62
4.14 Time keeping ………………………………………………………………………………………………. 63
4.14.1 Time keeping and leap seconds …………………………………….. 63

4.14.2 Time keeping accuracy …………………………………………………… 63


4.14.3 Time keeping format………………………………………………………. 64
4.14.4 Carrier phase tracking ……………………………………………………. 64
4.15 GPS navigation …………………………………………………………………………………………….. 66
Chapter-05: Ultrasound ……………………………………………………………………………………………. 69
5.1 Introduction …………………………………………………………………………………………………. 69
5.1.1 History ……………………………………………………………………………. 69
5.2 Wave motion ……………………………………………………………………………………………….. 69
5.3 Wave characteristics ……………………………………………………………………………………. 71
5.4 Ultrasound intensity …………………………………………………………………………………….. 72
5.5 Ultrasound velocity ……………………………………………………………………………………… 75
5.6 Attenuation of ultrasound …………………………………………………………………………… 76
5.7 Reflection ……………………………………………………………………………………………………. 77
5.8 Refraction ……………………………………………………………………………………………………. 79
5.9 Absorption …………………………………………………………………………………………………. 81
5.10 Hardware part …………………………………………………………………………………… 83
5.10.1 Introduction ………………………………………………………….. 83
5.10.2 Calculating the distance…………………………………………. 87
5.10.3 Changing beam pattern and beam width …………………. 87
5.10.4 The development of the sensor………………………………… 88
Chapter-06: Microcontroller ………………………………………………………………………………………. 91
6.1 Introduction …………………………………………………………………………………….. 91
6.1.1 History of microcontroller ……………………………………… 92
6.1.2 Embedded design…………………………………………………….. 93
6.1.3 Interrupt …………………………………………………………………. 93
6.1.4 Programs ………………………………………………………………… 94
6.1.5 Other microcontroller feature ……………………………….. 94
6.1.6 Higher integration ……………………………………………………. 95
6.1.7 Programming environment ……………………………………… 97
6.2 Types of micro controller …………………………………………………………………. 98
6.2.1 Interrupt latency ………………………………………………………. 99
6.3 Microcontroller embedded memory technology ………………………… 100
6.3.1 Data……………………………………………………………………….. 100
6.3.2 Firmware ………………………………………………………………… 101
6.4 PIC microcontroller ………………………………………………………………………….. 101
6.4.1 Family core architecture ……………………………………….. 101
6.5 PIC component ………………………………………………………………………………….. 101
6.5.1 Logic circuit ……………………………………………………………… 106
6.5.2 Power supply …………………………………………………………… 119
6.6 Development tools…………………………………………………………………………… 127
6.6.1 Device programs …………………………………………………….. 127
6.6.2 Debugging ………………………………………………………………. 128
6.7 LCD display ……………………………………………………………………………………….. 130
6.7.1 LCD display pins ………………………………………………………. 131
6.7.2 LCD screen ……………………………………………………………… 131
6.7.3 LCD memory ………………………………………………………….. 132

6.7.4 LCD basic command ………………………………………………….. 136


6.7.5 LCD connecting …………………………………………………………. 138
6.7.6 LCD initialization ……………………………………………………… 139
Chapter-07: System Implementation ………………………………………… 141
7.1 Introduction ……………………………………………………………………………………… 141
7.2 Survey……………………………………………………………………………………………….. 142
7.3 Searches …………………………………………………………………………………………… 142
7.3.1 Ultrasound sensor……………………………………………. 142
7.3.2 Indoor navigation systems ………………………………………. 142
7.3.3 Outdoor navigation ………………………………………………… 142
7.4 Sponsors ……………………………………………………………………………………….. 143
7.5 Pre-design ………………………………………………………………………………………. 143
7.5.1 List of metrics ………………………………………………………. 144
7.5.2 Competitive Benchmarking Information………… 145
7.5.3 Ideal and marginally acceptable target values ……….. 146
7.5.4 Time plan diagram …………………………………………………… 146
7.6 Design ……………………………………………………………………………………………… 147
7.6.1 Speech recognition ……………………………………………….. 147
7.6.2 Ultrasound sensors …………………………………………………………… 149
7.6.3 Outdoor navigation ………………………………………………… 150
7.7 Product architecture ……………………………………………………………………… 151
7.7.1 Product schematic ………………………………………………….. 151
7.7.2 Rough geometric layout …………………………………………. 152
7.7.3 Incidental interactions …………………………………………….. 153
7.8 Defining secondary system …………………………………………………………….. 154
7.9 Detailed interface specification ……………………………………………………… 154
7.10 Establishing the architecture of the chunks ……………………………………… 155
Chapter-08: Conclusion ………………………………………………………………………………………………. 157
8.1 Introduction…………………………………………………………………………………. 158
8.2 Overview………………………………………………………………………………………….. 158
8.2.1 Outdoor navigation …………………………………………………… 158
8.2.1.1 Outdoor navigation online ……………………………………… 158
8.2.1.2 Outdoor navigation offline ………………………………………. 158
8.2.2 Ultrasound sensor …………………………………………………….. 159
8.2.3 Object identifier ………………………………………………………. 159
8.3 Features ……………………………………………………………………………………………. 159

CHAPTER 1
Introduction

1.1 | PROBLEM DEFINITION

According to the World Health Organization, approximately 36.9 million people in the world were blind as of 2002. The majority of them use a conventional white cane to aid navigation. The limitation of the white cane is that information is gained only by touching objects with the tip of the cane. The traditional length of a white cane depends on the height of the user, extending from the floor to the person's sternum.

Blind people also face great difficulty moving from place to place in town, and their only current aid is a guide dog, which can cost about $20,000 and remains useful for only about 5-6 years. They also have great difficulty identifying the objects they frequently use in the house, such as kitchen tools and clothes, and they may struggle to control their electric devices or to face a security problem on their own.

1.2 | PROBLEM SOLUTION

We are trying to solve all of the previous problems. To help the user move easily indoors and outdoors, we'll use an ultrasound sensor to detect the barriers in his way and alert him in two ways: a vibration motor whose speed increases as the distance decreases, and a voice alert that tells him the distance between him and the barrier.
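A minimal sketch of this distance-to-vibration mapping is shown below; the sensor range limits and the linear PWM scale are illustrative assumptions on our part, not final hardware values:

```python
def vibration_duty_cycle(distance_cm, min_cm=2.0, max_cm=400.0):
    """Map a measured obstacle distance to a PWM duty cycle (0-100 %).

    Closer obstacles yield a stronger (faster) vibration, so the duty
    cycle rises linearly as the distance falls. The 2-400 cm span is a
    typical ultrasonic ranging window, assumed here for illustration.
    """
    # Clamp the reading into the usable sensor range.
    d = max(min_cm, min(distance_cm, max_cm))
    # Linear inverse mapping: max_cm -> 0 %, min_cm -> 100 %.
    return 100.0 * (max_cm - d) / (max_cm - min_cm)
```

For example, an obstacle at 201 cm, halfway through the assumed range, would drive the motor at a 50 % duty cycle.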
To solve the problem of moving outside the home from place to place, we'll design software for smartphones that helps him travel using voice commands, without any external help: he simply says the place he wants to go, and the phone guides him there with spoken directions. To help him identify objects, we'll use RFID; every important object will carry a tag or ID, and when the reader reads the ID the system tells him by voice what the object is. Inside the home we'll design a system to control all electronic devices by voice commands, together with a security system designed especially for blind users. Its most important element is the fire alarm: when it detects a fire, it alerts him by calling his mobile phone and places another call to friends nearby for help. The security system also warns him if he forgets to close his door. After finishing these applications, we plan to add further features after graduation, applying new technologies to help him move in the street more easily, cross roads, and read books. The products currently on the Egyptian market do not cover any of these needs.


A blind user needs to move, take control, and do his tasks by himself without help from anybody, yet all that exists today is a plain white stick with no added technology or features. So, finally, we'll mount a sensor and an RFID reader on the white stick, and the other part is software on the mobile phone that handles the navigation and automation tasks.

1.3 | BUSINESS MODEL

Our customers are blind and visually impaired people; almost one million people in Egypt have one of the problems described above.

Our product covers several of our customers' needs: it helps them avoid the barriers in their way and guides them by voice toward the direction they must take, and it helps them move freely, without external help, in different areas through an Android application on the mobile phone, designed especially for them, which guides them by voice through the roads and tells them which direction to take to reach their goal.

To reach our goal we met with different customers to learn exactly what they need, which helped us form a vision of a final product that is comfortable to use; we were also guided technically by our sponsors to find the best way to cover all these needs. In our market the available products don't cover any of these needs; we found only a plain white stick with no technology to help the user.

1.4 | BLOCK DIAGRAMS

Fig.(1.1): General Project Block Diagram


1.5 | DETAILED TECHNICAL DESCRIPTION

Our project is built on the simplest available technologies to reach our goal in a way that is comfortable for the user, so we divided the project into two parts, software and hardware.

The hardware part consists of a PIC MCU, an MP3 module, a camera module, and an ultrasound sensor module. The software part is an Android application that can be installed on the mobile phone.

The hardware part operates in two modes, indoor and outdoor. For indoor use, a single sensor measures range, and when the user comes within 2 cm of an object the camera module photographs it to detect the code placed on it. The image is sent to the MCU, which processes it, identifies the code number, looks up the object's name in the database, then connects to the WT588D MP3 module and fetches the address of the MP3 file containing the name, which is played out through the speaker.

For outdoor use, three HC-SR04 sensors are activated in three directions to determine the best way, the one with no barriers on it; their measurements are sent to the MCU, which chooses the best direction and sends the address of the MP3 file containing the wanted direction to be played as output.

For outdoor navigation we'll design an Android application using Google Maps: the user specifies the destination by voice, the application determines his current position using GPS, the digital compass determines the angle of view, and the application guides him in the right direction using the GPS and compass data.
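The "best way" decision over the three sensor readings can be sketched as a simple comparison; the clear-path threshold used here is an assumption for illustration, not a value from our design:

```python
def choose_direction(left_cm, front_cm, right_cm, clear_cm=100.0):
    """Choose the travel direction with the most free space.

    If the straight-ahead reading already exceeds `clear_cm` (an
    assumed clearance threshold), keep going forward so the user is
    not zig-zagged needlessly; otherwise pick the direction whose
    ultrasound sensor reports the largest distance.
    """
    if front_cm >= clear_cm:
        return "front"
    readings = {"left": left_cm, "front": front_cm, "right": right_cm}
    return max(readings, key=readings.get)
```

The MCU would then map the returned direction to the address of the MP3 file holding the corresponding spoken instruction.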

Choose mode: left button → outdoor, right button → indoor.

Fig. (1.2): Button Configuration


Fig. (1.3): Indoor & Outdoor Processes Block Diagram

1.6 | PRE-PROJECT PLANNING

We started by searching for a problem that no one cares about, and we found that blind people's problems receive little attention and that suitable products are not available in Egypt. So we decided it was a good field to start in: an opportunity to solve a real problem and also to enter a new market segment with a low number of competitors.

1.7 | TIME PLANNING

Project Timing:
The three main parts are independent in execution time, but each part has many branches which are executed in series.

Timing of Product Introduction:
The timing of the product launch depends on the marketing and on studying the market again against existing products, since ours must have low cost and high quality.


Technology Readiness:
Technology is one of the fundamental components of the product, because Android and ultrasonic technology are gaining real traction among Egyptian customers.

Market Readiness:
The market is always ready for a new product; products compete in the market to give customers the best option.

The Product Plan:
This plan makes the project comfortable to implement, because anything that is arranged and planned in advance gives the best results.

CHAPTER 2
Speech Recognition

2.1 | INTRODUCTION

Biometrics is, in the simplest definition, something you are: a physical characteristic unique to each individual, such as a fingerprint, retina, iris, or voice. Biometrics has a very useful application in security; it can be used to authenticate a person's identity and control access to a restricted area, based on the premise that this set of physical characteristics can uniquely identify individuals. The speech signal conveys two important types of information: primarily the speech content, and on a secondary level, the speaker identity. Speech recognizers aim to extract the lexical information from the speech signal independently of the speaker by reducing inter-speaker variability. Speaker recognition, on the other hand, is concerned with extracting the identity of the person speaking the utterance. Both speech recognition and speaker recognition are therefore possible from the same voice input.

In our project we use the speech recognition technique, because we want to recognize the word on which the stick will base its action.

Mel Frequency Cepstral Coefficients (MFCC) are used as features for both speech and speaker recognition. We also combined energy features and the delta and delta-delta features of energy and MFCC. After calculating the features, neural networks are used to model the speech. Based on the speech model, the system decides whether or not the uttered speech matches what the user was prompted to utter.

2.2 | LITERATURE REVIEW

2.2.1 | Pattern Recognition

Pattern recognition, a branch of artificial intelligence and a sub-field of machine learning, is the study of how machines can observe the environment, learn to distinguish patterns of interest from their background, and make sound, reasonable decisions about the categories of the patterns. A pattern can be a fingerprint image, a handwritten cursive word, a human face, a speech signal, a sales pattern, etc.

The applications of pattern recognition include data mining, document classification, financial forecasting, organization and retrieval of multimedia databases, and biometrics (personal identification based on various physical attributes such as the face, retina, voice, ear, and fingerprints). The essential steps of pattern recognition are: data acquisition, preprocessing, feature extraction, training, and classification.
Features denote the descriptors. Features must be selected so that they are discriminative and invariant; they can be represented as a vector, matrix, tree, graph, or string. Ideally, they are similar for objects in the same class and very different for objects in different classes. A pattern class is a family of patterns that share some common properties. Pattern recognition by machine involves techniques for assigning patterns to their respective classes automatically, with as little human intervention as possible.

Learning and classification usually use one of the following approaches: statistical pattern recognition is based on statistical characterizations of patterns, assuming that the patterns are generated by a probabilistic system; syntactical (or structural) pattern recognition is based on the structural interrelationships of features. Given a pattern, its recognition/classification may consist of one of the following two tasks, according to the type of learning procedure:

1) Supervised classification (e.g., discriminant analysis), in which the input pattern is identified as a member of a predefined class.

2) Unsupervised classification (e.g., clustering), in which the pattern is assigned to a previously unknown class.

Fig. (2.1): General block diagram of pattern recognition system


2.2.2 | Generation of Voice

Speech begins with the generation of an airstream, usually by the lungs and diaphragm, a process called initiation. This air then passes through the larynx, where it is modulated by the glottis (vocal cords). This step is called phonation or voicing, and is responsible for the generation of pitch and tone. Finally, the modulated air is filtered by the mouth, nose, and throat, a process called articulation, and the resultant pressure wave excites the air.

Fig. (2.2): Vocal Schematic

Depending upon the positions of the various articulators, different sounds are produced. The positions of the articulators can be modeled by a linear time-invariant system whose frequency response is characterized by several peaks called formants. The change in the frequencies of the formants characterizes the phoneme being articulated.

As a consequence of this physiology, we can notice several characteristics of the frequency-domain spectrum of speech. First of all, the oscillation of the glottis results in an underlying fundamental frequency and a series of harmonics at multiples of this fundamental. This is shown in the figure below, where we have plotted a brief audio waveform for the phoneme /i:/ and its magnitude spectrum. The fundamental frequency (180 Hz) and its harmonics appear as spikes in the spectrum. The location of the fundamental frequency is speaker dependent, and is a function of the dimensions and tension of the vocal cords. For adults it usually falls between 100 Hz and 250 Hz, with female voices averaging significantly higher than male voices.
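The fundamental frequency of a voiced frame like the one in the figures can be estimated by finding the lag of maximum self-similarity inside the adult pitch range quoted above; the sketch below does this by autocorrelation on a synthetic 180 Hz signal (the synthetic frame and the search range are our own illustrative choices, not part of the project's implementation):

```python
import math

def estimate_f0(signal, fs, f_min=100.0, f_max=250.0):
    """Estimate the fundamental frequency by autocorrelation.

    The lag with the strongest self-similarity inside the assumed
    adult pitch range (100-250 Hz) is taken as one pitch period.
    """
    lag_min = int(fs / f_max)  # shortest period searched
    lag_max = int(fs / f_min)  # longest period searched
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag over the overlapping samples.
        r = sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))
        if r > best_r:
            best_r, best_lag = r, lag
    return fs / best_lag

# A synthetic "voiced" frame: a 180 Hz fundamental plus two harmonics,
# sampled at the project's 22050 Hz rate.
fs = 22050
frame = [math.sin(2 * math.pi * 180 * n / fs)
         + 0.5 * math.sin(2 * math.pi * 360 * n / fs)
         + 0.25 * math.sin(2 * math.pi * 540 * n / fs)
         for n in range(2048)]
```

Running `estimate_f0(frame, fs)` recovers a value close to the 180 Hz fundamental, mirroring how the spike locations in the magnitude spectrum reveal the pitch.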

Fig. (2.3): Audio Sample for /i: / phoneme showing stationary property of phonemes for a short period

The sound comes out as phonemes, which are the building blocks of speech. Each phoneme resonates at a fundamental frequency and its harmonics, and thus has high energy at those frequencies; in other words, each has different formants. It is this feature that enables the identification of each phoneme at the recognition stage.

Fig.(2.4): Audio Magnitude Spectrum for /i:/ phoneme showing fundamental frequency and its harmonics


Inter-speaker variations in the features of the speech signal during the utterance of a word are modeled in word training for speech recognition; for speaker recognition, the intra-speaker variations in features over long speech content are modeled.

Besides the configuration of the articulators, the acoustic manifestation of a phoneme is affected by:
- The physiology and emotional state of the speaker.
- The phonetic context.
- The accent.

2.2.3 | Voice as Biometric

The underlying premise of voice authentication is that each person's voice differs in pitch, tone, and volume enough to make it uniquely distinguishable. Several factors contribute to this uniqueness: the size and shape of the mouth, throat, nose, and teeth (the articulators), and the size, shape, and tension of the vocal cords. The chance that all of these are exactly the same in any two people is very low.

Voice biometrics has the following advantages over other forms of biometrics:
- It is a natural signal to produce.
- Implementation cost is low, since it doesn't require a specialized input device.
- It is acceptable to users.
- It mixes easily with other forms of authentication for multifactor authentication, and it is the only biometric that allows users to authenticate remotely.

2.2.4 | Speech Recognition

Speech is the dominant means of communication between humans, and it promises to be important for communication between humans and machines, if it can just be made a little more reliable.

Speech recognition is the process of converting an acoustic signal into a set of words. Applications include voice command and control, data entry, voice user interfaces, automating the telephone operator's job in telephony, etc. It can also serve as the input to natural language processing. There are two variants of speech recognition, based on the duration of the speech signal: isolated word recognition, in which each word is surrounded by some sort of pause, is much easier than recognizing continuous speech, in which words run into each other and have to be segmented.

Speech recognition is a difficult task because of the many sources of variability associated with the signal. The acoustic realizations of phonemes, the smallest sound units of which words are composed, are highly dependent on context. Acoustic variability can also result from changes in the environment, as well as in the position and characteristics of the transducer. Within-speaker variability can result from changes in the speaker's physical and emotional state, speaking rate, or voice quality. Finally, differences in sociolinguistic background, dialect, and vocal tract size and shape contribute to cross-speaker variability. Such variability is modeled in various ways; at the level of signal representation, representations that emphasize speaker-independent features have been developed.

2.2.5 | Speaker Recognition

Speaker recognition is the process of automatically recognizing who is speaking on the basis of the individual information included in the speech waves. Speaker recognition can be classified into identification and verification, and it has been applied most often as a means of biometric authentication.

2.2.5.1 | Types of Speaker Recognition

Speaker Identification
Speaker identification is the process of determining which registered speaker produced a given utterance. In a speaker identification (SID) system, no identity claim is provided; the test utterance is scored against a set of known (registered) references for each potential speaker, and the speaker whose model best matches the test utterance is selected. There are two types of speaker identification task: closed-set and open-set. In closed-set identification, the test utterance belongs to one of the registered speakers. During testing, a matching score is estimated for each registered speaker, and the speaker corresponding to the model with the best matching score is selected; this requires N comparisons for a population of N speakers.

In open-set identification, any speaker can access the system, and those who are not registered should be rejected. This requires an additional model, referred to as the garbage model, imposter model, or background model, which is trained with data provided by speakers other than the registered ones. During testing, the matching score corresponding to the best speaker model is compared with the matching score estimated using the garbage model in order to accept or reject the speaker, making the total number of comparisons N + 1. Speaker identification performance tends to decrease as the population size increases.

Speaker verification
Speaker verification, on the other hand, is the process of accepting or rejecting the identity claim of a speaker; the goal is to automatically accept or reject an identity claimed by the speaker. During testing, a verification score is estimated using the claimed speaker model and the anti-speaker model. This verification score is then compared to a threshold: if the score is higher than the threshold the speaker is accepted, otherwise the speaker is rejected.

Speaker verification thus involves a hypothesis test requiring a simple binary decision, accept or reject the claimed identity, regardless of the population size. Hence its performance is largely independent of the population size, though it depends on the number of test utterances used to evaluate the system.
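The two decision rules described above can be summarized in a few lines; the function names are ours, and the matching scores would in practice come from the trained speaker models (e.g. GMMs), which are not implemented in this sketch:

```python
def identify_speaker(scores):
    """Closed-set identification: N comparisons for N registered
    speakers, returning the speaker whose model scored best."""
    return max(scores, key=scores.get)

def verify_speaker(claimed_score, garbage_score, threshold):
    """Verification as a binary hypothesis test: accept the claim only
    if the claimed model beats the garbage (anti-speaker) model by at
    least `threshold`."""
    return (claimed_score - garbage_score) >= threshold
```

Note how identification cost grows with the population (the `max` over N scores), while verification always compares exactly two scores against a threshold.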

2.2.6 | Speaker/Speech Modeling

There are various pattern modeling and matching techniques, including Dynamic Time Warping (DTW), Gaussian Mixture Models (GMM), Hidden Markov Models (HMM), Artificial Neural Networks (ANN), and Vector Quantization (VQ). These are used interchangeably for speech and speaker modeling. The best-performing approaches are statistical learning methods: GMMs for speaker recognition, which model the variations in a speaker's features over a long sequence of utterances.

Another statistical method widely used for speech recognition is the HMM. An HMM models the Markovian nature of the speech signal, where each phoneme represents a state and a sequence of such phonemes represents a word; the sequences of features of these phonemes from different speakers are modeled by the HMM.

2.3 | IMPLEMENTATION DETAILS

The implementation of the system includes a common pre-processing and
feature extraction module, speaker-independent speech modeling, and classification
by ANNs.

2.3.1 | Pre-Processing and Feature Extraction


Starting from the capturing of audio signal, feature extraction consists of the
following steps as shown in the block diagram below:
[Block diagram: speech signal → silence removal → pre-emphasis → framing →
windowing → DFT → Mel filter bank → log → IDFT → CMS, yielding 12 MFCC,
12 ΔMFCC and 12 ΔΔMFCC coefficients, plus 1 energy, 1 Δ energy and 1 ΔΔ
energy features.]
Fig. (2.5): Pre-Processing and Feature Extraction

2.3.1.1 | Capture

The first step in processing speech is to convert the analog representation
(first air pressure, and then analog electric signals in a microphone) into a digital
signal x[n], where n is an index over time. Analysis of the audio spectrum shows
that nearly all of the energy resides in the band between DC and 4 kHz, and beyond
10 kHz there is virtually no energy whatsoever.
Used sound format:
 22050 Hz
 16-bits, Signed
 Little Endian
 Mono Channel
 Uncompressed PCM
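This format can be decoded and normalized in a few lines. A minimal NumPy sketch (our own illustrative helper, not the project code) for 16-bit signed little-endian mono PCM:

```python
import numpy as np

def decode_pcm16le(raw_bytes):
    """Interpret raw bytes as 16-bit signed little-endian mono PCM
    and normalize the samples to the range [-1, 1)."""
    samples = np.frombuffer(raw_bytes, dtype="<i2").astype(np.float64)
    return samples / 32768.0

# Two samples: 0x0000 -> 0.0 and 0x4000 (low byte first) -> 16384 -> 0.5
raw = bytes([0x00, 0x00, 0x00, 0x40])
x = decode_pcm16le(raw)   # -> array([0.0, 0.5])
```

The division by 32768 performs the PCM normalization described in section 2.3.1.3.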

2.3.1.2 | End point detection and Silence removal

The captured audio signal may contain silence at different positions: at the
beginning of the signal, between the words of a sentence, at the end of the signal,
etc. If silent frames are included, modeling resources are spent on parts of the
signal which do not contribute to the identification, so the silence must be removed
before further processing. There are several ways of doing this; the most popular
are Short-Time Energy and Zero-Crossing Rate, but both have limitations in that
their thresholds are set on an ad hoc basis. The algorithm we used relies on the
statistical properties of the background noise as well as the physiology of speech
production, and does not assume any ad hoc threshold.
It assumes that the background noise present in the utterances is Gaussian in
nature. Usually the first 200 ms or more of a speech recording (we used 4410
samples at the sampling rate of 22050 samples/sec) corresponds to silence (or
background noise), because the speaker takes some time to start reading when
recording starts.
Endpoint Detection Algorithm:
Step 1:
Calculate the mean (μ) and standard deviation (σ) of the first 200ms samples
of the given utterance. The background noise is characterized by this μ and σ.

Step 2:
Go from the 1st sample to the last sample of the speech recording. For each
sample, check whether the one-dimensional Mahalanobis distance |x − μ|/σ is
greater than 3 or not. If it is, the sample is treated as voiced; otherwise it is
unvoiced/silence. Since P[|x − μ| ≤ 3σ] = 0.997 for a Gaussian distribution, this
threshold rejects up to 99.7% of the background-noise samples, accepting only the
voiced ones.

Step 3:
Mark the voiced sample as 1 and unvoiced sample as 0. Divide the whole
speech signal into 10 ms non-overlapping windows. Represent the complete speech
by only zeros and ones.

Step 4:
Consider a window containing M zeros and N ones. If M ≥ N, convert all the
ones in that window to zeros; otherwise, convert the zeros to ones. This majority
vote is adopted because the speech production system (vocal cords, tongue, vocal
tract, etc.) cannot change abruptly within a time window as short as 10 ms.

Step 5:
Collect only the voiced part, i.e. the samples labeled '1' in the windowed
array, and copy them into a new array, thereby retrieving the voiced part of the
original speech signal.


Fig. (2.6): Input signal to End-point detection system

Fig. (2.7): Output signal from End point Detection System

2.3.1.3 | PCM Normalization

The extracted pulse-code-modulated amplitude values are normalized to
compensate for amplitude variations introduced during capture.

2.3.1.4 | Pre-emphasis

The speech signal is usually pre-emphasized before any further processing. If
we look at the spectrum of voiced segments like vowels, there is more energy at the
lower frequencies than at the higher frequencies. This drop in energy across
frequencies is caused by the nature of the glottal pulse. Boosting the high-frequency
energy makes information from these higher formants more available to the
acoustic model and improves phone detection accuracy. The pre-emphasis filter
is a first-order high-pass filter. In the time domain, with input x[n] and
0.9 ≤ α ≤ 1.0, the filter equation is:
y[n] = x[n] − α x[n−1]
We used α = 0.95.
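A minimal sketch of this filter in NumPy (illustrative; the first sample is passed through unchanged since x[−1] does not exist):

```python
import numpy as np

def pre_emphasize(x, alpha=0.95):
    """y[n] = x[n] - alpha * x[n-1]; the first sample is left unchanged."""
    y = np.copy(x).astype(float)
    y[1:] = x[1:] - alpha * x[:-1]
    return y

y = pre_emphasize(np.array([1.0, 1.0, 1.0]))   # a DC input is strongly attenuated
```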


Fig. (2.8): Signal before Pre-Emphasis

Fig.(2.9): Signal after Pre-Emphasis

2.3.1.5 | Framing and windowing

Speech is a non-stationary signal, meaning that its statistical properties are not
constant across time. Instead, we want to extract spectral features from a small
window of speech that characterizes a particular sub-phone and for which we can
make the (rough) assumption that the signal is stationary (i.e. its statistical
properties are constant within this region). We used frame blocks of 23.22 ms with
50% overlap, i.e., 512 samples per frame.


Fig.(2.10): Frame Blocking of the Signal

The rectangular window (i.e., no window) can cause problems when we do
Fourier analysis, because it abruptly cuts off the signal at its boundaries. A good
window function has a narrow main lobe and low side-lobe levels in its transfer
function, and it shrinks the values of the signal toward zero at the window
boundaries, avoiding discontinuities. The most commonly used window function in
speech processing is the Hamming window, defined as follows:

w[n] = 0.54 − 0.46 cos(2πn / (N − 1)),  0 ≤ n ≤ N − 1

Fig.(2.11): Hamming window

The extraction of the signal takes place by multiplying the value of the signal
at time n, S_frame[n], with the value of the window at time n, S_w[n]:
Y[n] = S_w[n] × S_frame[n]
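Framing and windowing together can be sketched as follows (illustrative NumPy, using the 512-sample frames with 50% overlap described above):

```python
import numpy as np

def frame_and_window(x, frame_len=512, overlap=0.5):
    """Cut the signal into 50%-overlapping frames and apply a Hamming window."""
    hop = int(frame_len * (1 - overlap))            # 256-sample hop at 50% overlap
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hamming(frame_len)                  # 0.54 - 0.46*cos(2*pi*n/(N-1))
    return np.stack([x[i * hop:i * hop + frame_len] * window
                     for i in range(n_frames)])

frames = frame_and_window(np.ones(2048))            # 7 frames of 512 samples each
```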


Fig.(2.12): A single frame before and after windowing

2.3.1.6 | Discrete Fourier Transform

A Discrete Fourier Transform (DFT) of the windowed signal is used to extract
the frequency content (the spectrum) of the current frame. The tool for extracting
spectral information i.e., how much energy the signal contains at discrete
frequency bands for a discrete-time (sampled) signal is the Discrete Fourier
Transform or DFT. The input to the DFT is a windowed signal x[n]...x[m], and the
output, for each of N discrete frequency bands, is a complex number X[k]
representing the magnitude and phase of that frequency component in the original
signal.
|X[k]| = | Σ_{n=0}^{N−1} x[n] e^(−j2πkn/N) |,  k = 0, 1, …, N−1

The commonly used algorithm for computing the DFT is the Fast Fourier
Transform or in short FFT.
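Computing the magnitude spectrum of a frame with the FFT can be sketched as follows (illustrative; the 1 kHz test tone is our own toy input):

```python
import numpy as np

def magnitude_spectrum(frame):
    """|X[k]| of a real windowed frame via the FFT (k = 0 .. N/2)."""
    return np.abs(np.fft.rfft(frame))

# A 1 kHz tone sampled at 22050 Hz should peak near the 1 kHz bin
fs, n = 22050, 512
t = np.arange(n) / fs
mag = magnitude_spectrum(np.sin(2 * np.pi * 1000 * t))
peak_hz = np.argmax(mag) * fs / n   # within one bin (~43 Hz) of 1000 Hz
```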

2.3.1.7 | Mel Filter

For calculating the MFCC, first, a transformation is applied according to the
following formula:

Mel(x) = 2595 log10(1 + x / 700)

Where x is the linear frequency in Hz. Then, a filter bank is applied to the
amplitude of the Mel-scaled spectrum. The Mel frequency warping is most
conveniently done by utilizing a filter bank with filters centered according to Mel
frequencies. The width of the triangular filters varies according to the Mel scale, so
that the log total energy in a critical band around the center frequency is included.
The centers of the filters are uniformly spaced on the Mel scale.

Fig.(2.13): Equally spaced Mel values

The result of the Mel filter bank is information about the distribution of energy
in each Mel-scale band: one output per filter, from which the cepstral coefficients
are later derived.

Fig.(2.14): Triangular filter bank in frequency scale

We have used 30 filters in the filter bank.
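The filter center frequencies follow from the Mel formula above. An illustrative sketch (the band edges 0 Hz and fs/2 = 11025 Hz are our own assumptions):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_centers(n_filters=30, f_low=0.0, f_high=11025.0):
    """Filter centers equally spaced on the Mel scale, mapped back to Hz.
    n_filters + 2 points give each triangle its left edge, center and right edge."""
    mels = np.linspace(hz_to_mel(f_low), hz_to_mel(f_high), n_filters + 2)
    return mel_to_hz(mels)

centers = mel_centers()   # 32 edge/center points, increasingly widely spaced in Hz
```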


2.3.1.8 | Cepstrum by Inverse Discrete Fourier Transform

The cepstrum transform is applied to the filter outputs in order to obtain the
MFCC features of each frame. The triangular filter outputs Y(i), i = 1, 2, …, M are
compressed using the logarithm, and the discrete cosine transform (DCT) is
applied. Here, M is equal to the number of filters in the filter bank, i.e., 30.

C[n] = Σ_{i=1}^{M} log[Y(i)] cos( πn(i − 0.5) / M ),  n = 1, 2, …, 12

Where, C[n] is the MFCC vector for each frame.


The resulting vector is called the Mel-frequency cepstrum (MFC), and the
individual components are the Mel-frequency Cepstral coefficients (MFCCs). We
extracted 12 features from each speech frame.
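The log-compression and DCT step can be sketched as follows (illustrative NumPy, using the cosine form given above with i running from 1 to M):

```python
import numpy as np

def mfcc_from_filterbank(Y, n_coeffs=12):
    """C[n] = sum_{i=1..M} log(Y_i) * cos(pi * n * (i - 0.5) / M), n = 1..12."""
    M = len(Y)
    i = np.arange(1, M + 1)
    log_Y = np.log(Y)
    return np.array([np.sum(log_Y * np.cos(np.pi * n * (i - 0.5) / M))
                     for n in range(1, n_coeffs + 1)])

c = mfcc_from_filterbank(np.ones(30))   # flat filter outputs give a zero cepstrum
```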

2.3.1.9 | Post Processing

Cepstral Mean Subtraction (CMS)


A speech signal may be subjected to some channel noise when recorded, also
referred to as the channel effect. A problem arises if the channel effect when
recording training data for a given person is different from the channel effect in
later recordings when the person uses the system: a false distance between the
training data and the newly recorded data is introduced by the differing channel
effects. The channel effect is eliminated by subtracting the mean Mel-cepstrum
coefficients from the Mel-cepstrum coefficients:

ĉ_t(n) = c_t(n) − (1/T) Σ_{τ=1}^{T} c_τ(n)

where c_t(n) is the n-th cepstral coefficient of frame t and T is the number of
frames in the utterance.
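Per-coefficient mean subtraction over the frames of an utterance can be sketched as follows (illustrative; rows are frames, columns are cepstral coefficients):

```python
import numpy as np

def cepstral_mean_subtraction(mfcc):
    """Subtract each coefficient's mean over all frames (rows = frames)."""
    return mfcc - mfcc.mean(axis=0, keepdims=True)

feats = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
cms = cepstral_mean_subtraction(feats)   # each column now has zero mean
```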

The energy feature


The energy in a frame is the sum over time of the power of the samples in the
frame; thus for a signal x in a window from time sample t1 to time sample t2 the
energy is:

E = Σ_{t=t1}^{t2} x[t]²

Delta feature
Another interesting fact about the speech signal is that it is not constant from
frame to frame. Co-articulation (the influence of a speech sound on an adjacent or
nearby speech sound) can provide a useful cue for phone identity, and it can be
preserved by using delta features. Velocity (delta) and acceleration (delta-delta)
coefficients are usually obtained from the static window-based information. These
delta and delta-delta coefficients model the speed and acceleration of the variation
of the cepstral feature vectors across adjacent windows. A simple way to compute
deltas would be just to compute the difference between frames; thus the delta value
d(t) for a particular cepstral value c(t) at time t can be estimated as:

d(t) = c(t + 1) − c(t − 1)

The differencing method is simple, but since it acts as a high-pass filtering
operation in the parameter domain, it tends to amplify noise. The solution is linear
regression, i.e. fitting a first-order polynomial; the least-squares solution is easily
shown to be of the following form:

d(t) = Σ_{m=1}^{M} m · [c(t + m) − c(t − m)] / (2 Σ_{m=1}^{M} m²)

Where, M is the regression window size. We used M = 4.
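The regression formula with M = 4 can be sketched as follows (illustrative; padding the edges by repeating the first and last values is our own assumption):

```python
import numpy as np

def delta(c, M=4):
    """d(t) = sum_{m=1..M} m * (c[t+m] - c[t-m]) / (2 * sum_{m=1..M} m^2)."""
    padded = np.pad(c, (M, M), mode="edge")             # repeat edge frames
    denom = 2.0 * sum(m * m for m in range(1, M + 1))   # = 60 for M = 4
    return np.array([sum(m * (padded[t + M + m] - padded[t + M - m])
                         for m in range(1, M + 1)) / denom
                     for t in range(len(c))])

d = delta(np.arange(20.0))   # a linear ramp has slope 1 away from the edges
```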


Composition of Feature Vector
We calculated 39 Features from each frame:
 12 MFCC Features.
 12 Deltas MFCC.
 12 Delta-Deltas MFCC.
 1 Energy Feature.
 1 Delta Energy Feature.
 1 Delta-Delta Energy Feature.

2.4 | ARTIFICIAL NEURAL NETWORKS

2.4.1 | Introduction

We have used ANNs to model our system: the network is trained on voice
samples and then used to classify test utterances into word categories, each of
which triggers an action. Here we give an overview of artificial neural networks.
The original inspiration for the term Artificial Neural Network came from
examination of central nervous systems and their neurons, axons, dendrites, and
synapses, which constitute the processing elements of biological neural networks
investigated by neuroscience. In an artificial neural network, simple artificial
nodes, variously called "neurons", "neurodes", "processing elements" (PEs) or
"units", are connected together to form a network of nodes mimicking the
biological neural networks, hence the term "artificial neural network".
Because neuroscience is still full of unanswered questions, and since there are
many levels of abstraction and therefore many ways to take inspiration from the
brain, there is no single formal definition of what an artificial neural network is.
Generally, it involves a network of simple processing elements that exhibit
complex global behavior determined by connections between processing elements
and element parameters. While an artificial neural network does not have to be
adaptive per se, its practical use comes with algorithms designed to alter the
strength (weights) of the connections in the network to produce a desired signal
flow.
These networks are also similar to the biological neural networks in the sense
that functions are performed collectively and in parallel by the units, rather than
there being a clear delineation of subtasks to which various units are assigned (see
also connectionism). Currently, the term Artificial Neural Network (ANN) tends to
refer mostly to neural network models employed in statistics, cognitive psychology
and artificial intelligence. Neural network models designed with emulation of the
central nervous system (CNS) in mind are a subject of theoretical neuroscience and
computational neuroscience.
In modern software implementations of artificial neural networks, the
approach inspired by biology has been largely abandoned for a more practical
approach based on statistics and signal processing. In some of these systems,
neural networks or parts of neural networks (such as artificial neurons) are used as
components in larger systems that combine both adaptive and non-adaptive
elements. While the more general approach of such adaptive systems is more
suitable for real-world problem solving, it has far less to do with the traditional
artificial intelligence connectionist models. What they do have in common,
however, is the principle of non-linear, distributed, parallel and local processing
and adaptation. Historically, the use of neural networks models marked a paradigm
shift in the late eighties from high-level (symbolic) artificial intelligence,
characterized by expert systems with knowledge embodied in if-then rules, to low-
level (sub-symbolic) machine learning, characterized by knowledge embodied in
the parameters of a dynamical system.

2.4.2 | Models


Neural network models in artificial intelligence are usually referred to as
artificial neural networks (ANNs); these are essentially simple mathematical
models defining a function f : X → Y or a distribution over X, or both, but
sometimes models are also intimately associated with a particular learning
algorithm or learning rule. A common use of the phrase ANN model really means
the definition of a class of such functions (where members of the class are obtained
by varying parameters, connection weights, or specifics of the architecture such as
the number of neurons or their connectivity).

2.4.3 | Network Function

The word network in the term 'artificial neural network' refers to the inter–
connections between the neurons in the different layers of each system. An
example system has three layers. The first layer has input neurons, which send data
via synapses to the second layer of neurons, and then via more synapses to the
third layer of output neurons. More complex systems will have more layers of
neurons with some having increased layers of input neurons and output neurons.
The synapses store parameters called "weights" that manipulate the data in the
calculations. An ANN is typically defined by three types of parameters:
 The interconnection pattern between different layers of neurons
 The learning process for updating the weights of the interconnections
 The activation function that converts a neuron's weighted input to its output
activation.
Mathematically, a neuron's network function f(x) is defined as a composition
of other functions g_i(x), which can further be defined as compositions of other
functions. This can be conveniently represented as a network structure, with arrows
depicting the dependencies between variables. A widely used type of composition
is the nonlinear weighted sum, f(x) = K(Σ_i w_i g_i(x)), where K (commonly
referred to as the activation function) is some predefined function, such as the
hyperbolic tangent. It will be convenient for the following to refer to a collection of
functions g_i as simply a vector g = (g_1, g_2, …, g_n).

2.4.4 | ANN dependency graph


This figure depicts such a decomposition of f, with dependencies between
variables indicated by arrows. These can be interpreted in two ways.
The first view is the functional view: the input x is transformed into a 3-
dimensional vector h, which is then transformed into a 2-dimensional vector g,
which is finally transformed into f. This view is most commonly encountered in the
context of optimization.


The second view is the probabilistic view: the random variable F = f(G)
depends upon the random variable G = g(H), which depends upon H = h(X), which
depends upon the random variable X. This view is most commonly encountered in
the context of graphical models.
The two views are largely equivalent. In either case, for this particular
network architecture, the components of individual layers are independent of each
other (e.g., the components of g are independent of each other given their input h).
This naturally enables a degree of parallelism in the implementation.
Networks such as the previous one are commonly called feed forward,
because their graph is a directed acyclic graph. Networks with cycles are
commonly called recurrent. Such networks are commonly depicted in the manner
shown at the top of the figure, where f is shown as being dependent upon itself.
However, an implied temporal dependence is not shown.

2.4.5 | Learning

What has attracted the most interest in neural networks is the possibility of
learning. Given a specific task to solve and a class of functions F, learning means
using a set of observations to find f* ∈ F which solves the task in some optimal
sense. This entails defining a cost function C : F → ℝ such that, for the optimal
solution f*, C(f*) ≤ C(f) for every f ∈ F, i.e., no solution has a cost less than the
cost of the optimal solution (see mathematical optimization).
The cost function is an important concept in learning, as it is a measure of
how far away a particular solution is from an optimal solution to the problem to be
solved. Learning algorithms search through the solution space to find a function
that has the smallest possible cost.
For applications where the solution is dependent on some data, the cost must
necessarily be a function of the observations; otherwise we would not be modeling
anything related to the data. It is frequently defined as a statistic to which only
approximations can be made. As a simple example, consider the problem of
finding the model f which minimizes C = E[(f(x) − y)²], for data pairs (x, y) drawn
from some distribution D. In practical situations we would only have N samples
from D and thus, for the above example, we would only minimize
Ĉ = (1/N) Σᵢ (f(xᵢ) − yᵢ)². Thus, the cost is minimized over a sample of the data
rather than the entire data set.


When N → ∞, some form of online machine learning must be used, where the
cost is partially minimized as each new example is seen. While online machine
learning is often used when D is fixed, it is most useful in the case where the
distribution changes slowly over time. In neural network methods, some form of
online machine learning is frequently used for finite datasets.

2.4.6 | Choosing a cost function

While it is possible to define some arbitrary, ad hoc cost function, frequently a
particular cost will be used, either because it has desirable properties (such as
convexity) or because it arises naturally from a particular formulation of the
problem (e.g., in a probabilistic formulation the posterior probability of the model
can be used as an inverse cost). Ultimately, the cost function will depend on the
desired task. An overview of the three main categories of learning tasks is provided
below.

2.4.7 | Learning paradigms

There are three major learning paradigms, each corresponding to a particular
abstract learning task. These are supervised learning, unsupervised learning and
reinforcement learning.

2.4.8 | Supervised learning

In supervised learning, we are given a set of example pairs and the aim is to
find a function in the allowed class of functions that matches the examples. In
other words, we wish to infer the mapping implied by the data; the cost function is
related to the mismatch between our mapping and the data and it implicitly
contains prior knowledge about the problem domain.
A commonly used cost is the mean-squared error, which tries to minimize the
average squared error between the network's output, f(x), and the target value y
over all the example pairs. When one tries to minimize this cost using gradient
descent for the class of neural networks called multilayer perceptrons, one obtains
the common and well-known back-propagation algorithm for training neural
networks.
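A minimal sketch of this idea for a single linear neuron, i.e. gradient descent on the mean-squared error (the toy data and learning rate are our own choices):

```python
import numpy as np

# Toy data: targets generated by the linear map y = 2*x1 - x2
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 2))
y = X @ np.array([2.0, -1.0])

w = np.zeros(2)                          # network weights, initialized to zero
for _ in range(300):
    err = X @ w - y                      # f(x) - y for every example
    w -= 0.1 * (X.T @ err) / len(y)      # step along the negative MSE gradient

# w converges toward the generating weights [2, -1]
```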
Tasks that fall within the paradigm of supervised learning are pattern
recognition (also known as classification) and regression (also known as function
approximation). The supervised learning paradigm is also applicable to sequential
data (e.g., for speech and gesture recognition). This can be thought of as learning
with a "teacher," in the form of a function that provides continuous feedback on the
quality of solutions obtained thus far.

2.4.9 | Unsupervised learning

In unsupervised learning, some data x is given together with a cost function to
be minimized, which can be any function of the data x and the network's output f.
The cost function is dependent on the task (what we are trying to model) and
our a priori assumptions (the implicit properties of our model, its parameters and
the observed variables).
As a trivial example, consider the model f(x) = a, where a is a constant, and
the cost C = E[(x − f(x))²]. Minimizing this cost gives a value of a equal to the
mean of the data. The cost function can be much more complicated. Its form
depends on the application: for example, in compression it could be related to the
mutual information between x and f(x), whereas in statistical modeling, it could be
related to the posterior probability of the model given the data. (Note that in both
of those examples those quantities would be maximized rather than minimized.)
Tasks that fall within the paradigm of unsupervised learning are in general
estimation problems; the applications include clustering, the estimation of
statistical distributions, compression and filtering.

2.4.10 | Reinforcement learning

In reinforcement learning, data are usually not given, but generated by an
agent's interactions with the environment. At each point in time, the agent performs
an action and the environment generates an observation and an instantaneous cost,
according to some (usually unknown) dynamics. The aim is to discover a policy
for selecting actions that minimizes some measure of a long-term cost; i.e., the
expected cumulative cost. The environment's dynamics and the long-term cost for
each policy are usually unknown, but can be estimated.
More formally, the environment is modeled as a Markov decision process
(MDP) with states and actions with the following probability distributions: the
instantaneous cost distribution, the observation distribution and the transition
distribution,
while a policy is defined as conditional distribution over actions given the
observations. Taken together, the two define a Markov chain (MC). The aim is to
discover the policy that minimizes the cost; i.e., the MC for which the cost is
minimal.
ANNs are frequently used in reinforcement learning as part of the overall
algorithm. Dynamic programming has been coupled with ANNs (Neuro dynamic
programming) by Bertsekas and Tsitsiklis and applied to multi-dimensional
nonlinear problems such as those involved in vehicle routing or natural resources
management because of the ability of ANNs to mitigate losses of accuracy even
when reducing the discretization grid density for numerically approximating the
solution of the original control problems.
Tasks that fall within the paradigm of reinforcement learning are control
problems, games and other sequential decision making tasks.

2.4.11 | Learning algorithms

Training a neural network model essentially means selecting one model from
the set of allowed models (or, in a Bayesian framework, determining a distribution
over the set of allowed models) that minimizes the cost criterion. There are
numerous algorithms available for training neural network models; most of them
can be viewed as a straightforward application of optimization theory and
statistical estimation.
Most of the algorithms used in training artificial neural networks employ some
form of gradient descent. This is done by simply taking the derivative of the cost
function with respect to the network parameters and then changing those
parameters in a gradient-related direction.
Evolutionary methods, simulated annealing, expectation-maximization, non-
parametric methods and particle swarm optimization are some commonly used
methods for training neural networks.

2.4.12 | Employing artificial neural networks

Perhaps the greatest advantage of ANNs is their ability to be used as an
arbitrary function approximation mechanism that 'learns' from observed data.
However, using them is not so straightforward and a relatively good understanding
of the underlying theory is essential.
Choice of model: This will depend on the data representation and the
application. Overly complex models tend to lead to problems with learning.


Learning algorithm: There are numerous trade-offs between learning
algorithms. Almost any algorithm will work well with the correct hyperparameters
for training on a particular fixed data set. However, selecting and tuning an
algorithm for training on unseen data requires a significant amount of
experimentation.

Robustness: If the model, cost function and learning algorithm are selected
appropriately the resulting ANN can be extremely robust.
With the correct implementation, ANNs can be used naturally in online
learning and large data set applications. Their simple implementation and the
existence of mostly local dependencies exhibited in the structure allows for fast,
parallel implementations in hardware.

2.4.13 | Applications

The utility of artificial neural network models lies in the fact that they can be
used to infer a function from observations. This is particularly useful in
applications where the complexity of the data or task makes the design of such a
function by hand impractical.

2.4.13.1 | Real-life applications

The tasks artificial neural networks are applied to tend to fall within the
following broad categories:
 Function approximation, or regression analysis, including time series prediction,
fitness approximation and modeling.
 Classification, including pattern and sequence recognition, novelty detection and
sequential decision making.
 Data processing, including filtering, clustering, blind source separation and
compression.
 Robotics, including directing manipulators, Computer numerical control.
Application areas include system identification and control (vehicle control,
process control, natural resources management), quantum chemistry, game-playing
and decision making (backgammon, chess, poker), pattern recognition (radar
systems, face identification, object recognition and more), sequence recognition
(gesture, speech, handwritten text recognition), medical diagnosis, financial
applications (automated trading systems), data mining (or knowledge discovery in
databases, "KDD"), visualization and e-mail spam filtering.
Artificial neural networks have also been used to diagnose several cancers.
An ANN based hybrid lung cancer detection system named HLND improves the
accuracy of diagnosis and the speed of lung cancer radiology. These networks have
also been used to diagnose prostate cancer. The diagnoses can be used to build
models from a large group of patients, against which the information of one given
patient is compared. The models do not depend on assumptions about correlations
of different variables. Colorectal cancer has also been predicted using neural
networks, which could predict the outcome for a patient with colorectal cancer
with considerably more accuracy than current clinical methods. After training, the
networks could predict multiple patient outcomes from data of unrelated
institutions.

2.4.13.2 | Neural networks and neuroscience

Theoretical and computational neuroscience is the field concerned with the
theoretical analysis and computational modeling of biological neural systems.
Since neural systems are intimately related to cognitive processes and behavior, the
field is closely related to cognitive and behavioral modeling.
The aim of the field is to create models of biological neural systems in order
to understand how biological systems work. To gain this understanding,
neuroscientists strive to make a link between observed biological processes (data),
biologically plausible mechanisms for neural processing and learning (biological
neural network models) and theory (statistical learning theory and information
theory).

2.4.14 | Types of models


Many models are used in the field defined at different levels of abstraction
and modeling different aspects of neural systems. They range from models of the
short-term behavior of individual neurons, models of how the dynamics of neural
circuitry arise from interactions between individual neurons and finally to models
of how behavior can arise from abstract neural modules that represent complete
subsystems. These include models of the long-term, and short-term plasticity, of
neural systems and their relations to learning and memory from the individual
neuron to the system level.


2.4.15 | Neural network software

Neural network software is used to simulate, research, develop and apply
artificial neural networks, biological neural networks and, in some cases, a wider
array of adaptive systems.

2.4.16 | Types of artificial neural networks

Artificial neural network types vary from those with only one or two layers of
single direction logic, to complicated multi–input many directional feedback loop
and layers. On the whole, these systems use algorithms in their programming to
determine control and organization of their functions. Some may be as simple as a
one neuron layer with an input and an output, and others can mimic complex
systems such as dANN, which can mimic chromosomal DNA through sizes at
cellular level, into artificial organisms and simulate reproduction, mutation and
population sizes.
Most systems use "weights" to change the parameters of the throughput and
the varying connections to the neurons. Artificial neural networks can be
autonomous and learn by input from outside "teachers", or even self-teach from
written-in rules.

2.4.17 | Confidence analysis of a neural network

Supervised neural networks that use an MSE cost function can use formal
statistical methods to determine the confidence of the trained model. The MSE on
a validation set can be used as an estimate for variance. This value can then be
used to calculate the confidence interval of the output of the network, assuming a
normal distribution. A confidence analysis made this way is statistically valid as
long as the output probability distribution stays the same and the network is not
modified.
By assigning a softmax activation function on the output layer of the neural
network (or a softmax component in a component-based neural network) for
categorical target variables, the outputs can be interpreted as posterior
probabilities. This is very useful in classification as it gives a certainty measure on
classifications.
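A softmax output layer can be sketched as follows (illustrative; subtracting the maximum is the standard numerical-stability trick):

```python
import numpy as np

def softmax(z):
    """Map raw output-layer activations to values that sum to 1 (posterior-like)."""
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))   # largest activation -> largest probability
```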

CHAPTER 3
Image Processing

Chapter 3 | Image Processing

3.1 | INTRODUCTION

This chapter is an introduction to handling images in Matlab. When working
with images in Matlab, there are many things to keep in mind, such as loading an
image, using the right format, saving the data as different data types, how to
display an image, conversion between different image formats, etc. This worksheet
presents some of the commands designed for these operations. Most of these
commands require you to have the Image Processing Toolbox installed with
MATLAB. To find out if it is installed, type ver at the Matlab prompt. This gives
you a list of the toolboxes that are installed on your system.
For further reference on image handling in Matlab you are recommended to
use Matlab's help browser. There is an extensive (and quite good) on-line manual
for the Image processing toolbox that you can access via Matlab's help browser.
The first sections of this worksheet are quite heavy. The only way to
understand how the presented commands work, is to carefully work through the
examples given at the end of the worksheet. Once you can get these examples to
work, experiment on your own using your favorite image!

3.1.1 | What Is Digital Image Processing?

Transforming digital information representing images.

3.1.2 | Motivating Problems:

1. Improve pictorial information for human interpretation.


2. Remove noise.
3. Correct for motion, camera position, and distortion.
4. Enhance by changing contrast, color.
5. Segmentation - dividing an image up into constituent parts.
6. Representation - representing an image by some more abstract models.
7. Classification.
8. Reduce the size of image information for efficient handling.
9. Compression with loss of digital information that minimizes loss of
"perceptual" information (JPEG, GIF, MPEG).


3.2 | COLOR VISION

The color-responsive chemicals in the cones are called cone pigments and are
very similar to the chemicals in the rods. The retinal portion of the chemical is the
same, however the scotopsin is replaced with photopsins. Therefore, the color-
responsive pigments are made of retinal and photopsins. There are three kinds of
color-sensitive pigments:
• Red-sensitive pigment
• Green-sensitive pigment
• Blue-sensitive pigment
Each cone cell has one of these pigments so that it is sensitive to that color.
The human eye can sense almost any gradation of color when red, green and blue
are mixed.
The wavelengths of the three types of cones (red, green and blue) are shown.
The peak absorbance of the blue-sensitive pigment is 445 nanometers, for the
green-sensitive pigment it is 535 nanometers, and for the red-sensitive pigment it
is 570 nanometers.
MATLAB stores most images as two-dimensional arrays (i.e., matrices), in
which each element of the matrix corresponds to a single pixel in the displayed
image. For example, an image composed of 200 rows and 300 columns of different
colored dots would be stored in MATLAB as a 200-by-300 matrix. Some images,
such as RGB, require a three dimensional array, where the first plane in the 3rd
dimension represents the red pixel intensities, the second plane represents the
green pixel intensities, and the third plane represents the blue pixel intensities.
To reduce memory requirements, MATLAB supports storing image data in
arrays of class uint8 and uint16. The data in these arrays is stored as 8-bit or 16-bit
unsigned integers. These arrays require one-eighth or one-fourth as much memory
as data in double arrays. An image whose data matrix has class uint8 is called an 8-
bit image; an image whose data matrix has class uint16 is called a 16-bit image.
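The storage model described above can be sketched as follows (using Python with NumPy arrays as a stand-in for MATLAB matrices; the 200-by-300 size is the example from the text):

```python
import numpy as np

# A hypothetical 200-by-300 grayscale image: one matrix element per pixel.
gray = np.zeros((200, 300), dtype=np.uint8)

# An RGB image needs a third dimension: plane 0 = red, 1 = green, 2 = blue.
rgb = np.zeros((200, 300, 3), dtype=np.uint8)

# uint8 stores 1 byte per element versus 8 bytes for double precision,
# i.e. one-eighth of the memory.
print(gray.nbytes)                     # 60000 bytes
print(gray.astype(np.float64).nbytes)  # 480000 bytes
```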

3.2.1 | Fundamentals
A digital image is composed of pixels which can be thought of as small dots
on the screen. A digital image is an instruction of how to color each pixel. We will
see in detail later on how this is done in practice. A typical size of an image is 512-
by-512 pixels. Later on in the course you will see that it is convenient to let the
dimensions of the image be a power of 2. For example, 2^9 = 512. In the general
case we say that an image is of size m-by-n if it is composed of m pixels in the
vertical direction and n pixels in the horizontal direction.
Let us say that we have an image in the format 512-by-1024 pixels. This
means that the data for the image must contain information about 524288 pixels,
which requires a lot of memory! Hence, compressing images is essential for
efficient image processing. You will later on see how Fourier analysis and Wavelet
analysis can help us to compress an image significantly. There are also a few
"computer scientific" tricks (for example entropy coding) to reduce the amount of
data required to store an image.

3.2.2 | Image Formats Supported By Matlab

The following image formats are supported by Matlab:

 BMP
 HDF
 JPEG
 PCX
 TIFF
 XWD

Most images you find on the Internet are JPEG-images which is the name for
one of the most widely used compression standards for images. If you have
stored an image you can usually see from the suffix what format it is stored in. For
example, an image named myimage.jpg is stored in the JPEG format and we will
see later on that we can load an image of this format into Matlab.

3.2.3 | Working Formats In Matlab:

If an image is stored as a JPEG-image on your disc we first read it into
Matlab. However, in order to start working with an image, for example to perform
a wavelet transform on the image, we must convert it into a different format. This
section explains four common formats.

3.3 | ASPECTS OF IMAGE PROCESSING


Image Enhancement: Processing an image so that the result is more suitable for a
particular application (sharpening or deblurring an out-of-focus image,
highlighting edges, improving image contrast, brightening an image, removing
noise).
Image Restoration: This may be considered as reversing the damage done to an
image by a known cause (removing blur caused by linear motion, removal of
optical distortions).
Image Segmentation: This involves subdividing an image into constituent parts,
or isolating certain aspects of an image (finding lines, circles, or particular shapes
in an image; in an aerial photograph, identifying cars, trees, buildings, or roads).

3.4 | IMAGE TYPES

3.4.1 | Intensity Image (Gray Scale Image)

This is the equivalent to a "gray scale image" and this is the image we will
mostly work with in this course. It represents an image as a matrix where every
element has a value corresponding to how bright/dark the pixel at the
corresponding position should be colored. There are two ways to represent the
number that represents the brightness of the pixel: The double class (or data type).
This assigns a floating number ("a number with decimals") between 0 and 1 to
each pixel. The value 0 corresponds to black and the value 1 corresponds to white.
The other class is called uint8 which assigns an integer between 0 and 255 to
represent the brightness of a pixel. The value 0 corresponds to black and 255 to
white. The class uint8 only requires roughly 1/8 of the storage compared to the
class double. On the other hand, many mathematical functions can only be applied
to the double class. We will see later how to convert between double and uint8.
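The two representations are related by a simple linear scaling (in Matlab, im2double and im2uint8 perform it; the Python helpers below are our own illustrative versions, not Matlab functions):

```python
def uint8_to_double(pixel):
    # Map an integer 0..255 to a float 0.0..1.0 (0 = black, 1 = white).
    return pixel / 255.0

def double_to_uint8(value):
    # Map a float 0.0..1.0 back to an integer 0..255.
    return int(round(value * 255))

print(uint8_to_double(255))   # 1.0
print(double_to_uint8(0.5))   # 128
```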

Fig. (3.1)


3.4.2 | Binary Image:

This image format also stores an image as a matrix but can only color a pixel
black or white (and nothing in between). It assigns a 0 for black and a 1 for white.

3.4.3 | Indexed Image:

This is a practical way of representing color images. (In this course we will
mostly work with gray scale images but once you have learned how to work with a
gray scale image you will also know the principle how to work with color images.)
An Indexed image stores an image as two matrices. The first matrix has the same
size as the image and one number for each pixel. The second matrix is called the
color map and its size may be different from the image. The numbers in the first
matrix are instructions of what number to use in the color map matrix.

Fig. (3.2)

3.4.4 | RGB Image

This is another format for color images. It represents an image with three
matrices of sizes matching the image format. Each matrix corresponds to one of
the colors red, green or blue and gives an instruction of how much of each of these
colors a certain pixel should use.

3.4.5 | Multi-frame Image:

In some applications we want to study a sequence of images. This is very


common in biological and medical imaging where you might study a sequence of
slices of a cell. For these cases, the multi-frame format is a convenient way of
working with a sequence of images. In case you choose to work with biological
imaging later on in this course, you may use this format.

3.5 | HOW TO?

3.5.1 | How To Convert Between Different Formats:

The following table shows how to convert between the different formats given
above. All these commands require the Image processing toolbox!

Table(3.1): Image format conversion (within the parentheses you type the name of
the image you wish to convert)

Operation Matlab command
Convert between intensity/indexed/RGB format to binary format. dither()
Convert between intensity format to indexed format. gray2ind()
Convert between indexed format to intensity format. ind2gray()
Convert between indexed format to RGB format. ind2rgb()
Convert a regular matrix to intensity format by scaling. mat2gray()
Convert between RGB format to intensity format. rgb2gray()
Convert between RGB format to indexed format. rgb2ind()

The command mat2gray is useful if you have a matrix representing an image
but the values representing the gray scale range between, let's say, 0 and 1000.
The command mat2gray automatically rescales all entries so that they fall within
0 and 1; the result is of class double (use im2uint8 if you need values between 0
and 255).
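The rescaling that mat2gray performs can be sketched as a min-max normalization (a Python sketch of the idea; mat2gray itself also accepts an explicit input range, which is omitted here):

```python
def rescale_to_unit(matrix):
    # Linearly map the smallest entry to 0.0 and the largest to 1.0,
    # mimicking what Matlab's mat2gray does for an arbitrary matrix.
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # avoid division by zero on a constant image
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lo) / (hi - lo) for v in row] for row in matrix]

m = [[0, 500], [1000, 250]]
print(rescale_to_unit(m))   # [[0.0, 0.5], [1.0, 0.25]]
```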

3.5.2 | How to Read Files

When you encounter an image you want to work with, it is usually in form
of a file (for example, if you down load an image from the web, it is usually stored
as a JPEG-file). Once we are done processing an image, we may want to write it
back to a JPEG-file so that we can, for example, post the processed image on the
web. This is done using the imread and imwrite commands. These commands
require the Image processing toolbox!


Table(3.2): Reading and writing image files

Operation Matlab command
Read an image (within the parentheses you type the name of the image file you
wish to read; put the file name within single quotes). imread()
Write an image to a file (as the first argument within the parentheses you type
the name of the image you have worked with; as a second argument you type the
name of the file and format that you want to write the image to; put the file name
within single quotes). imwrite()

Make sure to use a semicolon (;) after these commands, otherwise you will get
lots of numbers scrolling on your screen. The commands imread and imwrite
support the formats given in the section "Image formats supported by Matlab"
above.

3.5.3 | Loading And Saving Variables in Matlab

This section explains how to load and save variables in Matlab. Once you
have read a file, you probably convert it into an intensity image (a matrix) and
work with this matrix. Once you are done you may want to save the matrix
representing the image in order to continue working with it at another time. This
is easily done using the commands save and load. Note that save and load are
commonly used Matlab commands and work independently of which toolboxes
are installed.

Table(3.3) Loading and saving variables


Operation Matlab command
Save the variable X. save X
Load the variable X. load X

3.5.4 | How to Display an Image in MATLAB

Here are a couple of basic Matlab commands (that do not require any toolbox)
for displaying an image.


Table(3.4): Displaying an image given on matrix form

Operation Matlab command
Display an image represented as the matrix X. imagesc(X)
Adjust the brightness: s is a parameter such that -1<s<0 gives a darker image
and 0<s<1 gives a brighter image. brighten(s)
Change the colors to gray. colormap(gray)

Sometimes your image may not be displayed in gray scale even though you
might have converted it into a gray scale image. You can then use the command
colormap(gray) to "force" Matlab to use a gray scale when displaying an image.
If you are using Matlab with the Image processing toolbox installed, we
recommend you use the command imshow to display an image.

Table(3.5): Displaying an image given on matrix form (with Image processing toolbox)
Operation Matlab command
Display an image represented as the matrix X. imshow(X)
Zoom in (using the left and right mouse button). zoom on
Turn off the zoom function. zoom off

3.6 | SOME IMPORTANT DEFINITIONS

3.6.1 | Imread Function

A = imread (filename, fmt) reads a grayscale or true color image named filename
into A. If the file contains a grayscale intensity image, A is a two-dimensional
array. If the file contains a true color (RGB) image, A is a three-dimensional (m-
by-n-by-3) array.

3.6.2 | Rotation

>> B = imrotate (A, ANGLE, METHOD)

Where;
A: Your image.
ANGLE: The angle (in degrees) you want to rotate your image in the counter
clockwise direction.
METHOD: A string that can have one of the values 'nearest', 'bilinear', or
'bicubic' (described in the Interpolation section below).
If you omit the METHOD argument, IMROTATE uses the default method of
'nearest'.


Note: to rotate the image clockwise, specify a negative angle. The returned image
matrix B is, in general, larger than A to include the whole rotated image.
IMROTATE sets invalid values on the periphery of B to 0.

3.6.3 | Scaling

IMRESIZE resizes an image of any type using the specified interpolation
method. The supported interpolation methods are listed below.

3.6.4 | Interpolation

'nearest' (default) nearest neighbor interpolation
'bilinear' bilinear interpolation
'bicubic' bicubic interpolation
B = IMRESIZE(A,M,METHOD) returns an image that is M times the size of A. If
M is between 0 and 1.0, B is smaller than A. If M is greater than 1.0, B is larger
than A. If METHOD is omitted, IMRESIZE uses nearest neighbor interpolation.
B = IMRESIZE (A,[MROWS MCOLS],METHOD) returns an image of size
MROWS-by-MCOLS. If the specified size does not produce the same aspect ratio
as the input image has, the output image is distorted.

» a = imread('image.fmt'); % put your image in place of image.fmt
» B = IMRESIZE(a,[100 100],'nearest');
» imshow(B);
» B = IMRESIZE(a,[100 100],'bilinear');
» imshow(B);
» B = IMRESIZE(a,[100 100],'bicubic');
» imshow(B);
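The default 'nearest' method can be sketched as follows (a simplified Python illustration of nearest-neighbor resampling, not IMRESIZE's exact algorithm):

```python
def resize_nearest(img, new_rows, new_cols):
    # Nearest-neighbor resize: each output pixel copies the input pixel
    # whose coordinates are closest after scaling.
    rows, cols = len(img), len(img[0])
    out = []
    for r in range(new_rows):
        src_r = min(int(r * rows / new_rows), rows - 1)
        row = []
        for c in range(new_cols):
            src_c = min(int(c * cols / new_cols), cols - 1)
            row.append(img[src_r][src_c])
        out.append(row)
    return out

img = [[1, 2], [3, 4]]
print(resize_nearest(img, 4, 4))  # each source pixel repeated in a 2x2 block
```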

3.7 | EDGE DETECTION

3.7.1 | Canny Edge Detector

1. Low error rate of detection: results should match human perception well.
2. Good localization of edges: the distance between actual edges in an image and
the edges found by a computational algorithm should be minimized.
3. Single response: the algorithm should not return multiple edge pixels when
only a single one exists.

3.7.2 | Edge Detectors

Fig.(3.4), Fig.(3.5): Edge detector comparison (b/w, color, Canny, Sobel)

3.7.3 | Edge Tracing

b=rgb2gray(a); % convert to gray. We can only do edge tracing for gray images.
edge(b,'prewitt');
edge(b,'sobel');
edge(b,'sobel','vertical');
edge(b,'sobel','horizontal');
edge(b,'sobel','both');

We can only do edge tracing using gray scale images (i.e images without color).


>> BW = rgb2gray(A);
>> edge(BW,'prewitt')

Fig.(3.6)

That is what I saw!

>> edge(BW,'sobel','vertical')
>> edge(BW,'sobel','horizontal')
>> edge(BW,'sobel','both')
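The Sobel detector behind these commands convolves the image with two 3-by-3 kernels and thresholds the gradient magnitude. A rough Python sketch of that idea follows (the threshold and test image are our own; Matlab's edge function also performs automatic thresholding and thinning, which are omitted):

```python
import numpy as np

def sobel_edges(gray, threshold):
    # Convolve with the horizontal and vertical Sobel kernels and mark
    # pixels whose gradient magnitude exceeds the threshold.
    gx_k = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    gy_k = gx_k.T
    rows, cols = gray.shape
    edges = np.zeros((rows, cols), dtype=bool)
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            patch = gray[r - 1:r + 2, c - 1:c + 2]
            gx = np.sum(gx_k * patch)
            gy = np.sum(gy_k * patch)
            edges[r, c] = np.hypot(gx, gy) > threshold
    return edges

# A synthetic image: dark left half, bright right half -> one vertical edge.
img = np.zeros((5, 6))
img[:, 3:] = 1.0
print(sobel_edges(img, 0.5).astype(int))  # 1s mark the vertical edge
```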

Table(3.6): Data types
Type Description Range
int8 8-bit integer -128 to 127
uint8 8-bit unsigned integer 0 to 255
int16 16-bit integer -32768 to 32767
double Double precision real number Machine specific

3.8 | MAPPING

3.8.1 | Mapping Images onto Surfaces Overview


Mapping an image onto geometry, also known as texture mapping, involves


overlaying an image or function onto a geometric surface. Images may be realistic,
such as satellite images, or representational, such as color-coded functions of
temperature or elevation. Unlike volume visualizations, which render each voxel
(volume element) of a three-dimensional scene, mapping an image onto geometry
efficiently creates the appearance of complexity by simply layering an image onto
a surface. The resulting realism of the display also provides information that is not
as readily apparent as with a simple display of either the image or the geometric
surface.
Mapping an image onto a geometric surface is a two-step process. First, the
image is mapped onto the geometric surface in object space. Second, the surface
undergoes view transformations (relating to the viewpoint of the observer) and is
then displayed in 2D screen space. You can use IDL Direct Graphics or Object
Graphics to display images mapped onto geometric surfaces. The following table
introduces the tasks and routines.

Table(3.7):Tasks and Routines Associated with Mapping an Image onto Geometry


Routine(s)/Object(s) Description
SHADE_SURF Display the elevation data
IDLgrWindow::Init
IDLgrView::Init Initialize the objects necessary for an Object Graphics display.
IDLgrModel::Init
IDLgrSurface:: Init Initialize a surface object containing the elevation data.
IDLgrImage::Init Initialize an image object containing the satellite image
XOBJVIEW Display the object in an interactive IDL utility allowing
rotation and resizing.

3.8.2 | Mapping an Image onto Elevation Data

The following Object Graphics example maps a satellite image from the Los
Angeles, California vicinity onto a DEM (Digital Elevation Model) containing the
area's topographical features. The realism resulting from mapping the image onto
the corresponding elevation data provides a more informative view of the area’s
topography. The process is segmented into the following three sections:
• “Opening Image and Geometry Files”
• “Initializing the IDL Display Objects”
• “Displaying the Image and Geometric Surface Objects”


Note:
Data can be either regularly gridded (defined by a 2D array) or irregularly
gridded (defined by irregular x, y, z points). Both the image and elevation data used
in this example are regularly gridded. If you are dealing with irregularly gridded
data, use GRIDDATA to map the data to a regular grid.
Complete the following steps for a detailed description of the process.
Example Code:
See elevation_object.pro in the examples/doc/image subdirectory of the IDL
installation directory for code that duplicates this example. Run the example
procedure by entering elevation_object at the IDL command prompt or view the file
in an IDL Editor window by entering .EDIT elevation_object.pro.
Opening Image and Geometry Files:
The following steps read in the satellite image and DEM files and display the
Elevation data.

1. Select the satellite image:


>> imageFile = FILEPATH('elev_t.jpg', $
SUBDIRECTORY = ['examples', 'data'])

2. Import the JPEG file:


READ_JPEG, imageFile, image

3. Select the DEM file:


demFile = FILEPATH('elevbin.dat', $
SUBDIRECTORY = ['examples', 'data'])

4. Define an array for the elevation data, open the file, read in the data and close
the file:
dem = READ_BINARY(demFile, DATA_DIMS = [64, 64])

5. Enlarge the size of the elevation array for display purposes:


dem = CONGRID(dem, 128, 128, /INTERP)

6. To quickly visualize the elevation data before continuing on to the Object


Graphics section, initialize the display, create a window and display the elevation
data using the SHADE_SURF command:
DEVICE, DECOMPOSED = 0

WINDOW, 0, TITLE = 'Elevation Data'


SHADE_SURF, dem

Fig.(3.7):Visual Display of the Elevation Data

After reading in the satellite image and DEM data, continue with the next
section to create the objects necessary to map the satellite image onto the elevation
surface.

3.8.3 | Initializing the IDL Display Objects

After reading in the image and surface data in the previous steps, you will
need to create objects containing the data. When creating an IDL Object Graphics
display, it is necessary to create a window object (oWindow), a view object
(oView) and a model object (oModel). These display objects, shown in the
conceptual representation in the following figure, will contain a geometric surface
object (the DEM data) and an image object (the satellite image).

These user-defined objects are instances of existing IDL object classes and
provide access to the properties and methods associated with each object class.


Note:
(The XOBJVIEW utility, described in "Mapping an Image Object onto a
Sphere", automatically creates window and view objects.) Complete the following
steps to initialize the necessary IDL objects.

1. Initialize the window, view and model display objects. For detailed syntax,
arguments and keywords available with each object initialization, see
IDLgrWindow::Init, IDLgrView::Init and IDLgrModel::Init. The following
three lines use the basic syntax :
oNewObject = OBJ_NEW('Class_Name')

To create these objects:

oWindow = OBJ_NEW('IDLgrWindow', RETAIN = 2, COLOR_MODEL = 0)


oView = OBJ_NEW('IDLgrView')
oModel = OBJ_NEW('IDLgrModel')

2. Assign the elevation surface data, dem, to an IDLgrSurface object. The


IDLgrSurface::Init keyword, STYLE = 2, draws the elevation data using a filled
line style:
oSurface = OBJ_NEW('IDLgrSurface', dem, STYLE = 2)

3. Assign the satellite image to a user-defined IDLgrImage object using


IDLgrImage::Init:
oImage = OBJ_NEW('IDLgrImage', image, INTERLEAVE = 0, $
/INTERPOLATE)
INTERLEAVE = 0 indicates that the satellite image is organized using pixel
interleaving, and therefore has the dimensions (3, m, n). The INTERPOLATE
keyword forces bilinear interpolation instead of using the default nearest neighbor
interpolation method.

3.8.4 | Displaying the Image and Geometric Surface Objects

This section displays the objects created in the previous steps. The image and
surface objects will first be displayed in an IDL Object Graphics window and then
with the interactive XOBJVIEW utility.


1. Center the elevation surface object in the display window. The default object
graphics coordinate system is [–1,–1], [1,1]. To center the object in the window,
position the lower left corner of the surface data at [–0.5,–0.5, –0.5]
for the x, y and z dimensions:

2. Map the satellite image onto the geometric elevation surface using the
IDLgrSurface::Init TEXTURE_MAP keyword:
oSurface -> SetProperty, TEXTURE_MAP = oImage, $
COLOR = [255, 255, 255]
For clearest display of the texture map, set COLOR = [255, 255, 255]. If the image
does not have dimensions that are exact powers of 2, IDL resamples the image into
a larger size that has dimensions which are the next powers of two greater than the
original dimensions. This resampling may cause unwanted sampling artifacts. In
this example, the image does have dimensions that are exact powers of two, so no
resampling occurs.

oSurface -> GETPROPERTY, XRANGE = xr, YRANGE = yr, $


ZRANGE = zr
xs = NORM_COORD(xr)
xs[0] = xs[0] - 0.5
ys = NORM_COORD(yr)
ys[0] = ys[0] - 0.5
zs = NORM_COORD(zr)
zs[0] = zs[0] - 0.5
oSurface -> SETPROPERTY, XCOORD_CONV = xs, $
YCOORD_CONV = ys, ZCOORD_CONV = zs

Note:
(If your texture does not have dimensions that are exact powers of 2 and you
do not want to introduce resampling artifacts, you can pad the texture with unused
data to a power of two and tell IDL to map only a subset of the texture onto the
surface.) For example, if your image is 40 by 40, create a 64 by 64 image and fill
part of it with the image data:
textureImage = BYTARR(64, 64, /NOZERO)
textureImage[0:39, 0:39] = image ; image is 40 by 40
oImage = OBJ_NEW('IDLgrImage', textureImage)

Then, construct texture coordinates that map the active part of the texture to a
surface (oSurface):
textureCoords = [[], [], [], []]


oSurface -> SetProperty, TEXTURE_COORD = textureCoords


The surface object in IDL 5.6 has been enhanced to automatically perform
the above calculation. In the above example, just use the image data (the 40 by 40
array) to create the image texture and do not supply texture coordinates. IDL
computes the appropriate texture coordinates to correctly use the 40 by 40 image.
Note:
(Some graphic devices have a limit for the maximum texture size. If your
texture is larger than the maximum size, IDL scales it down into dimensions that
work on the device. This rescaling may introduce resampling artifacts and loss of
detail in the texture. To avoid this, use the TEXTURE_HIGHRES keyword to tell
IDL to draw the surface in smaller pieces that can be texture mapped without loss
of detail.)

3. Add the surface object, covered by the satellite image, to the model object.
Then add the model to the view object:
oModel -> Add, oSurface
oView -> Add, oModel

4. Rotate the model for better display in the object window. Without rotating the
model, the surface is displayed at a 90° elevation angle, containing no depth
information. The following lines rotate the model 90° away from the viewer along
the x-axis and 30° clockwise along the y-axis and the x-axis:
oModel -> ROTATE, [1, 0, 0], -90
oModel -> ROTATE, [0, 1, 0], 30
oModel -> ROTATE, [1, 0, 0], 30

5. Display the result in the Object Graphics window:


oWindow -> Draw, oView

Fig.(3.9): Image Mapped onto a Surface in an Object Graphics Window


6. Display the results using XOBJVIEW, setting SCALE = 1 (instead of the
default value of 1/SQRT(3)) to increase the size of the initial display:

XOBJVIEW, oModel, /BLOCK, SCALE = 1


This results in the following display:

Fig.( 3.10) Displaying the Image Mapped onto the Surface in XOBJVIEW

After displaying the model, you can rotate it by clicking in the application
window and dragging your mouse. Select the magnify button, then click
near the middle of the image. Drag your mouse away from the center of the display
to magnify the image or toward the center of the display to shrink the image. Select
the left-most button on the XOBJVIEW toolbar to reset the display.

7. Destroy unneeded object references after closing the display windows:


OBJ_DESTROY, [oView, oImage]
The oModel and oSurface objects are automatically destroyed when oView is
destroyed.

For an example of mapping an image onto a regular surface using both Direct
and Object Graphics displays, see “Mapping an Image onto a Sphere”


3.8.5 | Mapping an Image onto a Sphere

The following example maps an image containing a color representation of
world elevation onto a sphere using both Direct and Object Graphics displays.
The example is broken down into two sections:
• “Mapping an Image onto a Sphere Using Direct Graphics” .
• “Mapping an Image Object onto a Sphere” .

3.9 | MAPPING OFF LINE

In the absence of a network or online services, we can identify and follow the
track through the use of image processing techniques. We incorporate a map built
from images of the places familiar to the person, and determine how to reach
them and return clearly and safely.
We calculate the distances by using the Matlab function IMDISTLINE and,
assuming a walking speed, calculate the time it takes to get from one point to
another. We then guide the person along the road through voice commands, for
example to move forward, backward, left, or right. In this way we provide
another way to perform mapping without being online.
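The time estimate above can be sketched as follows (a Python sketch; the map scale and walking speed are assumed illustrative values, and the pixel distance stands in for the value measured with IMDISTLINE):

```python
def travel_time_seconds(pixel_distance, meters_per_pixel, speed_m_per_s):
    # Convert a distance measured on the map image into real-world metres,
    # then into walking time at the assumed speed.
    distance_m = pixel_distance * meters_per_pixel
    return distance_m / speed_m_per_s

# 250 pixels at an assumed 0.4 m/pixel, walking at 1.0 m/s -> 100 m -> 100 s.
print(travel_time_seconds(250, 0.4, 1.0))  # 100.0
```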

CHAPTER 4
GPS Navigation
Chapter 4 | GPS Navigation

4.1 | INTRODUCTION

4.1.1 | What Is GPS?


The Global Positioning System (GPS) is a satellite-based navigation system
made up of a network of 24 satellites placed into orbit by the U.S. Department of
Defense. GPS was originally intended for military applications, but in the 1980s,
the government made the system available for civilian use. GPS works in any
weather conditions, anywhere in the world, 24 hours a day. There are no
subscription fees or setup charges to use GPS.

4.1.2 | How It Works


GPS satellites circle the earth twice a day in a very precise orbit and transmit
signal information to earth. GPS receivers take this information and use
triangulation to calculate the user's exact location. Essentially, the GPS receiver
compares the time a signal was transmitted by a satellite with the time it was
received. The time difference tells the GPS receiver how far away the satellite is.
Now, with distance measurements from a few more satellites, the receiver can
determine the user's position and display it on the unit's electronic map.
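The distance computation described above reduces to multiplying the signal's transit time by the speed of light. A minimal sketch (in Python; the 70 ms transit time is an illustrative value, roughly what a signal from GPS orbit altitude takes):

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def satellite_range_m(t_transmit, t_receive):
    # The receiver compares the transmit and receive timestamps; the time
    # difference multiplied by the speed of light gives the range.
    return (t_receive - t_transmit) * SPEED_OF_LIGHT

# A signal taking about 70 ms to arrive corresponds to roughly 21,000 km.
print(satellite_range_m(0.0, 0.070))
```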

Fig.(4-1): Satellite Screen


Fig. (4-2): How GPS works

A GPS receiver must be locked on to the signal of at least three satellites to


calculate a 2D position (latitude and longitude) and track movement. With four or
more satellites in view, the receiver can determine the user's 3D position (latitude,
longitude and altitude). Once the user's position has been determined, the GPS unit
can calculate other information, such as speed, bearing, track, trip distance,
distance to destination, sunrise and sunset time and more. When people talk about
"a GPS," they usually mean a GPS receiver. The Global Positioning System (GPS)
is actually a constellation of 27 Earth-orbiting satellites (24 in operation and three
extras in case one fails). The US military developed and implemented this satellite
network as a military navigation system, but soon opened it up to civilian use.

4.2 | BASIC CONCEPT OF GPS


A GPS receiver calculates its position by precisely timing the signals sent by
GPS satellites high above the Earth. Each satellite continually transmits messages
that include:


 The time the message was transmitted.


 Precise orbital information (the ephemeris).
 The general system health and rough orbits of all GPS satellites (the almanac).
The receiver uses the messages it receives to determine the transit time of
each message and computes the distance to each satellite. These distances along
with the satellites' locations are used with the possible aid of trilateration,
depending on which algorithm is used, to compute the position of the receiver.
This position is then displayed, perhaps with a moving map display or latitude and
longitude; elevation information may be included. Many GPS units show derived
information such as direction and speed, calculated from position changes.
Three satellites might seem enough to solve for position since space has three
dimensions and a position near the Earth's surface can be assumed. However, even
a very small clock error multiplied by the very large speed of light — the speed at
which satellite signals propagate — results in a large positional error. Therefore
receivers use four or more satellites to solve for both the receiver's location and
time. The very accurately computed time is effectively hidden by most GPS
applications, which use only the location. A few specialized GPS applications do
however use the time; these include time transfer, traffic signal timing, and
synchronization of cell phone base stations.
Although four satellites are required for normal operation, fewer apply in
special cases. If one variable is already known, a receiver can determine its
position using only three satellites. For example, a ship or aircraft may have known
elevation. Some GPS receivers may use additional clues or assumptions to give a
less accurate (degraded) position when fewer than four satellites are visible.

4.3 | POSITION CALCULATION INTRODUCTION

To provide an introductory description of how a GPS receiver works, error


effects are deferred to a later section. Using messages received from a minimum of
four visible satellites, a GPS receiver is able to determine the times sent and then
the satellite positions corresponding to these times sent. The x, y, and z
components of position, and the time sent, are designated as [x_i, y_i, z_i, t_i],
where the subscript i is the satellite number and has the value 1, 2, 3, or 4.
Knowing the indicated time the message was received, t̄_r, the GPS receiver could
compute the transit time of the message as (t̄_r - t_i), if t̄_r were equal to the
correct reception time, t_r.

Chapter 4 | GPS Navigation

A pseudo range, pi = (t̄r − ti)c, would be the traveling distance of the
message, assuming it traveled at the speed of light, c.
A satellite's position and pseudo range define a sphere, centered on the
satellite, with radius equal to the pseudo range. The position of the receiver is
somewhere on the surface of this sphere. Thus with four satellites, the indicated
position of the GPS receiver is at or near the intersection of the surfaces of four
spheres. In the ideal case of no errors, the GPS receiver would be at a precise
intersection of the four surfaces.
If the surfaces of two spheres intersect at more than one point, they intersect
in a circle. The article trilateration shows this mathematically. A figure, two sphere
surfaces intersecting in a circle, is shown below. Two points where the surfaces of
the spheres intersect are clearly shown in the figure. The distance between these
two points is the diameter of the circle of intersection. The intersection of a third
spherical surface with the first two will be its intersection with that circle; in most
cases of practical interest, this means they intersect at two points. Another
figure, surface of a sphere intersecting a circle (not a solid disk) at two points,
illustrates the intersection. The two intersections are marked with dots. Again the
article trilateration clearly shows this mathematically.
For automobiles and other near-earth vehicles, the correct position of the GPS
receiver is the intersection closest to the Earth's surface. For space vehicles, the
intersection farthest from Earth may be the correct one. The correct position for the
GPS receiver is also on the intersection with the surface of the sphere
corresponding to the fourth satellite.

Fig. (4-3): Two sphere surfaces intersecting in a circle; surface of a sphere
intersecting a circle (not a solid disk) at two points


4.4 | COMMUNICATION
The navigational signals transmitted by GPS satellites encode a variety of
information including satellite positions, the state of the internal clocks, and the
health of the network. These signals are transmitted on two separate carrier
frequencies that are common to all satellites in the network. Two different
encodings are used: a public encoding that enables lower resolution navigation,
and an encrypted encoding used by the U.S. military.

4.5 | MESSAGE FORMAT

Table (4.1): GPS message format

Sub frames Description

1 Satellite clock, GPS time relationship

2–3 Ephemeris (precise satellite orbit)

4–5 Almanac component (satellite network synopsis, error correction)

Each GPS satellite continuously broadcasts a navigation message on L1 C/A
and L2 P/Y at a rate of 50 bits per second. Each complete message takes 750
seconds (12 1/2 minutes) to transmit. The message structure has a basic format of
a 1500-bit-long frame made up of five sub frames, each sub frame being 300 bits
(6 seconds) long. Sub frames 4 and 5 are sub commutated 25 times each, so that a
complete data message requires the transmission of 25 full frames. Each sub frame
consists of ten words, each 30 bits long. Thus, with 300 bits in a sub frame times 5
sub frames in a frame times 25 frames in a message, each message is 37,500 bits
long. At a transmission rate of 50 bps, this gives 750 seconds to transmit an entire
almanac message. Each 30-second frame begins precisely on the minute or half
minute as indicated by the atomic clock on each satellite.
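The timing arithmetic in this paragraph can be checked in a few lines; the constants below are exactly the word, sub frame, and frame sizes quoted above.

```python
# Sketch: verify the navigation-message timing arithmetic from the text.
BITS_PER_WORD = 30
WORDS_PER_SUBFRAME = 10
SUBFRAMES_PER_FRAME = 5
FRAMES_PER_MESSAGE = 25
BIT_RATE_BPS = 50

subframe_bits = BITS_PER_WORD * WORDS_PER_SUBFRAME   # 300 bits
frame_bits = subframe_bits * SUBFRAMES_PER_FRAME     # 1,500 bits
message_bits = frame_bits * FRAMES_PER_MESSAGE       # 37,500 bits

print(subframe_bits / BIT_RATE_BPS,                  # 6.0 seconds per sub frame
      frame_bits / BIT_RATE_BPS,                     # 30.0 seconds per frame
      message_bits / BIT_RATE_BPS / 60)              # 12.5 minutes per message
```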
The first part of the message encodes the week number and the time within
the week, as well as the data about the health of the satellite. The second part of the


message, the ephemeris, provides the precise orbit for the satellite. The last part of
the message, the almanac sub commutated in sub frames 4 & 5, contains coarse
orbit and status information for up to 32 satellites in the constellation as well as
data related to error correction. Thus, in order to obtain an accurate satellite
location from this transmitted message the receiver must demodulate the message
from each satellite it includes in its solution for 18 to 30 seconds. In order to
collect all the transmitted almanacs the receiver must demodulate the message for
732 to 750 seconds or 12 1/2 minutes.
All satellites broadcast at the same frequencies. Signals are encoded
using code division multiple access (CDMA) allowing messages from individual
satellites to be distinguished from each other based on unique encodings for each
satellite (that the receiver must be aware of). Two distinct types of CDMA
encodings are used: the coarse/acquisition (C/A) code, which is accessible by the
general public, and the precise (P) code, that is encrypted so that only the U.S.
military can access it.
The ephemeris is updated every 2 hours and is generally valid for 4 hours,
with provisions for updates every 6 hours or longer in non-nominal conditions. The
almanac is updated typically every 24 hours. Additionally, data for a few weeks
ahead are uploaded in case transmission updates delay the next data upload.

4.6 | SATELLITE FREQUENCIES


Table (4.2): GPS frequency overview

Band   Frequency       Description

L1     1575.42 MHz     Coarse-acquisition (C/A) and encrypted precision P(Y)
                       codes, plus the L1 civilian (L1C) and military (M) codes
                       on future Block III satellites.

L2     1227.60 MHz     P(Y) code, plus the L2C and military codes on the
                       Block IIR-M and newer satellites.

L3     1381.05 MHz     Used for nuclear detonation (NUDET) detection.

L4     1379.913 MHz    Being studied for additional ionospheric correction.

L5     1176.45 MHz     Proposed for use as a civilian safety-of-life (SoL) signal.

All satellites broadcast at the same two frequencies, 1.57542 GHz (L1
signal) and 1.2276 GHz (L2 signal). The satellite network uses a CDMA spread-


spectrum technique where the low-bitrate message data is encoded with a high-
rate pseudo-random (PRN) sequence that is different for each satellite. The
receiver must be aware of the PRN codes for each satellite to reconstruct the actual
message data. The C/A code, for civilian use, transmits data at
1.023 million chips per second, whereas the P code, for U.S. military use, transmits
at 10.23 million chips per second. The L1 carrier is modulated by both the C/A and
P codes, while the L2 carrier is only modulated by the P code. The P code can be
encrypted as a so-called P(Y) code that is only available to military equipment with
a proper decryption key. Both the C/A and P(Y) codes impart the precise time-of-
day to the user.
The L3 signal at a frequency of 1.38105 GHz is used by the United States
Nuclear Detonation (NUDET) Detection System (USNDS) to detect, locate, and
report nuclear detonations (NUDETs) in the Earth's atmosphere and near space.
One usage is the enforcement of nuclear test ban treaties.
The L4 band at 1.379913 GHz is being studied for additional ionospheric
correction.
The L5 frequency band at 1.17645 GHz was added in the process of GPS
modernization. This frequency falls into an internationally protected range for
aeronautical navigation, promising little or no interference under all circumstances.
The first Block IIF satellite that would provide this signal is set to be launched in
2009. The L5 consists of two carrier components that are in phase quadrature with
each other. Each carrier component is bi-phase shift key (BPSK) modulated by a
separate bit train. L5, the third civil GPS signal, will eventually support safety-of-
life applications for aviation and provide improved availability and accuracy.

4.7 | NAVIGATION EQUATIONS


The receiver uses messages received from satellites to determine the satellite
positions and time sent. The x, y, and z components of satellite position and the
time sent are designated as [xi, yi, zi, ti] where the subscript i denotes the satellite
and has the value 1, 2, ..., n, where n ≥ 4. Knowing when the message was
received, t̄r, the receiver computes the message's transit time as t̄r − ti. Note that
the receiver knows only t̄r, the reception time indicated by its on-board clock,
rather than the true reception time tr. Assuming the message traveled at the speed
of light (c), the distance
traveled is (t̄r − ti)c. Knowing the distance from receiver to satellite and the
satellite's position implies that the receiver is on the surface of a sphere centered at
the satellite's position. Thus the receiver is at or near the intersection of the


surfaces of the spheres. In the ideal case of no errors, the receiver is at the
intersection of the surfaces of the spheres.
Let b denote the clock error or bias, the amount that the receiver's clock is off.
The receiver has four unknowns, the three components of GPS receiver position
and the clock bias [x, y, z, b]. The equations of the sphere surfaces are given by:
(x − xi)² + (y − yi)² + (z − zi)² = ([t̄r − b − ti]c)², i = 1, 2, ..., n,
or, in terms of the pseudo ranges pi = (t̄r − ti)c, as
pi = √((x − xi)² + (y − yi)² + (z − zi)²) + bc.
These equations can be solved by algebraic or numerical methods.
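As a numerical sanity check of the sphere-surface form, the sketch below builds pseudo ranges from a made-up receiver state; the satellite coordinates, receiver position, and clock bias are illustrative values, not data from this project.

```python
import math

# Sketch: forward model of the pseudo range equations above. All
# coordinates (metres) and the clock bias (seconds) are made-up values,
# chosen only to exercise p_i = sqrt(...) + b*c.
C = 299_792_458.0  # speed of light, m/s

SATS = [(15600e3, 7540e3, 20140e3),
        (18760e3, 2750e3, 18610e3),
        (17610e3, 14630e3, 13480e3),
        (19170e3, 610e3, 18390e3)]

x, y, z, b = 1111e3, 2222e3, 3333e3, 1e-3  # hypothetical receiver state

# pseudo range for each satellite: geometric range plus the clock term b*c
p = [math.dist((x, y, z), s) + b * C for s in SATS]

# removing the clock term recovers the geometric range for every satellite
for s, pi in zip(SATS, p):
    assert abs((pi - b * C) - math.dist((x, y, z), s)) < 1e-6
```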

4.8 | BANCROFT'S METHOD


Bancroft's method involves an algebraic as opposed to numerical method and
can be used for the case of four satellites or for the case of more than four
satellites. If there are four satellites then Bancroft's method provides one or two
solutions for the four unknowns. If there are more than four satellites then
Bancroft's method provides the solution which minimizes the sum of the squares of
the errors for the over determined system.

4.9 | TRILATERATION
The receiver can use trilateration and one-dimensional numerical root finding.
Trilateration is used to determine the position based on three satellites' pseudo
ranges. In the usual case of two intersections, the point nearest the surface of the
sphere corresponding to the fourth satellite is chosen. Let d denote the signed
distance from the receiver position to the sphere around the fourth satellite.

4.10 | MULTIDIMENSIONAL NEWTON-RAPHSON CALCULATIONS


Alternatively, a multidimensional root-finding method such as the Newton-
Raphson method can be used. The approach is to linearize around an approximate
solution, say [x(k), y(k), z(k), b(k)], from iteration k, then solve the linear equations
derived from the quadratic equations above to obtain the next iterate
[x(k+1), y(k+1), z(k+1), b(k+1)]. Although there is no guarantee that the method
always converges due to the fact that multidimensional roots cannot be bounded,


when a neighborhood containing a solution is known, as is usually the case for
GPS, it is quite likely that a solution will be found. It has been shown that results
are comparable in accuracy to those of Bancroft's method.
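A minimal sketch of this iteration follows, assuming made-up satellite coordinates, a made-up "true" receiver state, and a rough initial guess (none of these values come from the text); the 4×4 linear step is solved with Gauss-Jordan elimination.

```python
import math

# Sketch of the multidimensional Newton-Raphson iteration described above,
# applied to four noise-free pseudo range equations. All coordinates are
# made-up illustration values (metres and seconds).
C = 299_792_458.0
SATS = [(15600e3, 7540e3, 20140e3),
        (18760e3, 2750e3, 18610e3),
        (17610e3, 14630e3, 13480e3),
        (19170e3, 610e3, 18390e3)]
TRUE = (1111e3, 2222e3, 3333e3, 1e-3)       # x, y, z, clock bias b

def pseudoranges(state):
    x, y, z, b = state
    return [math.dist((x, y, z), s) + b * C for s in SATS]

P = pseudoranges(TRUE)

def newton_solve(p, guess, iters=15):
    x, y, z, b = guess
    for _ in range(iters):
        F, J = [], []                       # residuals and Jacobian rows
        for s, pi in zip(SATS, p):
            rho = math.dist((x, y, z), s)
            F.append(rho + b * C - pi)      # f_i = |r - s_i| + b*c - p_i
            J.append([(x - s[0]) / rho, (y - s[1]) / rho, (z - s[2]) / rho, C])
        # solve J * delta = -F by Gauss-Jordan elimination with pivoting
        A = [J[i][:] + [-F[i]] for i in range(4)]
        for col in range(4):
            piv = max(range(col, 4), key=lambda r: abs(A[r][col]))
            A[col], A[piv] = A[piv], A[col]
            for r in range(4):
                if r != col:
                    f = A[r][col] / A[col][col]
                    A[r] = [a - f * v for a, v in zip(A[r], A[col])]
        delta = [A[i][4] / A[i][i] for i in range(4)]
        x, y, z, b = x + delta[0], y + delta[1], z + delta[2], b + delta[3]
    return x, y, z, b

est = newton_solve(P, (1e6, 2e6, 3e6, 0.0))  # rough initial guess
```

With exact pseudo ranges the iteration converges quadratically, so a handful of iterations recovers the assumed state to machine precision.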

4.11 | ADDITIONAL METHODS FOR MORE THAN FOUR SATELLITES

When more than four satellites are available, the calculation can use the four
best or more than four, considering number of channels, processing capability,
and geometric dilution of precision (GDOP). Using more than four is an over-
determined system of equations with no unique solution, which must be solved
by least-squares or a similar technique. If all visible satellites are used, the results
are as good as or better than using the four best. Errors can be estimated through
the residuals. With each combination of four or more satellites, a GDOP factor can
be calculated, based on the relative sky directions of the satellites used. As more
satellites are picked up, pseudo ranges from various 4-way combinations can be
processed to add more estimates to the location and clock offset. The receiver then
takes the weighted average of these positions and clock offsets. After the final
location and time are calculated, the location is expressed in a specific coordinate
system such as latitude and longitude, using the WGS 84 geodetic datum or a
country-specific system.
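A GDOP factor for one four-satellite combination can be sketched as follows; the satellite and receiver coordinates are made-up illustration values, and the geometry-matrix convention (unit line-of-sight rows plus a clock column, GDOP = √trace((GᵀG)⁻¹)) is the standard one rather than anything specified in the text.

```python
import math

# Sketch: geometric dilution of precision from the relative directions of
# the satellites used. Coordinates (metres) are made-up illustration values.
SATS = [(15600e3, 7540e3, 20140e3),
        (18760e3, 2750e3, 18610e3),
        (17610e3, 14630e3, 13480e3),
        (19170e3, 610e3, 18390e3)]
RECV = (1111e3, 2222e3, 3333e3)

# geometry matrix G: unit line-of-sight vectors plus an appended clock column
G = []
for s in SATS:
    d = [s[k] - RECV[k] for k in range(3)]
    rho = math.sqrt(sum(c * c for c in d))
    G.append([d[0] / rho, d[1] / rho, d[2] / rho, 1.0])

# N = G^T G (4x4)
N = [[sum(G[r][i] * G[r][j] for r in range(4)) for j in range(4)]
     for i in range(4)]

def inv4(M):
    """Invert a 4x4 matrix by Gauss-Jordan elimination with partial pivoting."""
    n = 4
    A = [list(M[i]) + [float(i == j) for j in range(n)] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        pval = A[col][col]
        A[col] = [v / pval for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [v - f * w for v, w in zip(A[r], A[col])]
    return [row[n:] for row in A]

gdop = math.sqrt(sum(inv4(N)[i][i] for i in range(4)))
print(gdop)
```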

4.12 | ERROR SOURCES AND ANALYSIS

Error analysis for the Global Positioning System is an important aspect for
determining what errors and their magnitude are to be expected. GPS errors are
affected by geometric dilution of precision and depend on signal arrival time
errors, numerical errors, atmospheric effects, ephemeris errors, multipath errors
and other effects. The single largest source of error in modeling the orbital
dynamics is due to variability in solar radiation pressure.

4.13 | ACCURACY ENHANCEMENT AND SURVEYING

4.13.1 | Augmentation
Integrating external information into the calculation process can materially
improve accuracy. Such augmentation systems are generally named or described


based on how the information arrives. Some systems transmit additional error
information (such as clock drift, ephemeris, or ionospheric delay), others
characterize prior errors, while a third group provides additional navigational or
vehicle information.
Examples of augmentation systems include the Wide Area Augmentation
System (WAAS), the European Geostationary Navigation Overlay Service
(EGNOS), Differential GPS, Inertial Navigation Systems (INS), and Assisted GPS.

4.13.2 | Precise Monitoring


Accuracy can be improved through precise monitoring and measurement of
existing GPS signals in additional or alternate ways.
The largest remaining error is usually the unpredictable delay through
the ionosphere. The spacecraft broadcast ionospheric model parameters, but errors
remain. This is one reason GPS spacecraft transmit on at least two frequencies, L1
and L2. Ionospheric delay is a well-defined function of frequency and the total
electron content (TEC) along the path, so measuring the arrival time difference
between the frequencies determines TEC and thus the precise ionospheric delay at
each frequency.
Military receivers can decode the P(Y)-code transmitted on both L1 and L2.
Without decryption keys, it is still possible to use a codeless technique to compare
the P(Y) codes on L1 and L2 to gain much of the same error information.
However, this technique is slow, so it is currently available only on specialized
surveying equipment. In the future, additional civilian codes are expected to be
transmitted on the L2 and L5 frequencies (see GPS modernization). Then all users
will be able to perform dual-frequency measurements and directly compute
ionospheric delay errors.
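The dual-frequency idea can be sketched with the standard first-order, ionosphere-free combination; the pseudo range values below are made up, and only the L1/L2 frequencies come from the table above.

```python
# Sketch: first-order ionospheric delay scales as 1/f^2, so pseudo ranges
# measured on two frequencies can be combined to cancel it. The range and
# delay numbers are made-up illustration values.
F1 = 1575.42e6  # L1, Hz
F2 = 1227.60e6  # L2, Hz

true_range = 22_000_000.0          # metres, hypothetical
delay_l1 = 5.0                     # metres of iono delay on L1, hypothetical
p1 = true_range + delay_l1
p2 = true_range + delay_l1 * (F1 / F2) ** 2   # same delay scaled by (f1/f2)^2

# ionosphere-free combination: the 1/f^2 term cancels exactly
p_if = (F1**2 * p1 - F2**2 * p2) / (F1**2 - F2**2)
print(p_if)  # approximately the true range, 22,000,000 m
```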
A second form of precise monitoring is called Carrier-Phase
Enhancement (CPGPS). This corrects the error that arises because the pulse
transition of the PRN is not instantaneous, and thus the correlation (satellite-
receiver sequence matching) operation is imperfect. CPGPS uses the L1 carrier
wave, which has a period of 1/(1575.42 MHz) ≈ 0.635 ns, which is about
one-thousandth of the C/A Gold code bit period of 1/(1.023 MHz) ≈ 977.5 ns,
to act as an additional clock signal and resolve the uncertainty. The phase
difference error in the normal GPS amounts to 2–3 meters (6.6–9.8 ft.) of
ambiguity. CPGPS working to within 1% of perfect transition reduces this error to
3 centimeters (1.2 in.) of ambiguity. By eliminating this error source, CPGPS


coupled with DGPS normally realizes between 20–30 centimeters (7.9–12 in) of
absolute accuracy.
Relative Kinematic Positioning (RKP) is a third alternative for a precise GPS-
based positioning system. In this approach, determination of range signal can be
resolved to a precision of less than 10 centimeters (3.9 in). This is done by
resolving the number of cycles that the signal is transmitted and received by the
receiver by using a combination of differential GPS (DGPS) correction data,
transmitting GPS signal phase information and ambiguity resolution techniques via
statistical tests—possibly with processing in real-time (real-time kinematic
positioning, RTK).

4.14 | TIME KEEPING

4.14.1 | Timekeeping and leap seconds


While most clocks are synchronized to Coordinated Universal Time (UTC),
the atomic clocks on the satellites are set to GPS time (GPST; see the page
of United States Naval Observatory). The difference is that GPS time is not
corrected to match the rotation of the Earth, so it does not contain leap seconds or
other corrections that are periodically added to UTC. GPS time was set to match
Coordinated Universal Time (UTC) in 1980, but has since diverged. The lack of corrections means
that GPS time remains at a constant offset with International Atomic Time (TAI)
(TAI – GPS = 19 seconds). Periodic corrections are performed on the on-board
clocks to keep them synchronized with ground clocks.
The GPS navigation message includes the difference between GPS time and
UTC. As of 2011, GPS time is 15 seconds ahead of UTC because of the leap
second added to UTC December 31, 2008. Receivers subtract this offset from GPS
time to calculate UTC and specific time zone values. New GPS units may not show
the correct UTC time until after receiving the UTC offset message. The GPS-UTC
offset field can accommodate 255 leap seconds (eight bits) that, given the current
period of the Earth's rotation (with one leap second introduced approximately
every 18 months), should be sufficient to last until approximately the year 2300.

4.14.2 | Timekeeping Accuracy


GPS time is accurate to about 14 nanoseconds.


4.14.3 | Timekeeping Format


As opposed to the year, month, and day format of the Gregorian calendar, the
GPS date is expressed as a week number and a seconds-into-week number. The
week number is transmitted as a ten-bit field in the C/A and P(Y) navigation
messages, and so it becomes zero again every 1,024 weeks (19.6 years). GPS week
zero started at 00:00:00 UTC (00:00:19 TAI) on January 6, 1980, and the week
number became zero again for the first time at 23:59:47 UTC on August 21, 1999
(00:00:19 TAI on August 22, 1999). To determine the current Gregorian date, a
GPS receiver must be provided with the approximate date (to within 3,584 days) to
correctly translate the GPS date signal. To address this concern the modernized
GPS navigation message uses a 13-bit field that only repeats every 8,192 weeks
(157 years), thus lasting until the year 2137 (157 years after GPS week zero).
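Resolving the 10-bit week rollover against an approximate date can be sketched as follows; the helper name and the nearest-candidate strategy are illustrative assumptions, and leap seconds are ignored for simplicity.

```python
import datetime

# Sketch: convert a (10-bit week number, seconds-into-week) pair to a
# calendar date. The legacy week field rolls over every 1,024 weeks, so an
# approximate current date is needed to pick the right epoch. Leap seconds
# are ignored in this sketch.
GPS_EPOCH = datetime.datetime(1980, 1, 6)  # GPS week zero

def gps_to_date(week_10bit, seconds_of_week, approx_now):
    """Pick the 1,024-week epoch whose date lands nearest approx_now."""
    best = None
    for n in range(4):  # enough rollovers to cover 1980 to ~2058
        week = week_10bit + n * 1024
        t = GPS_EPOCH + datetime.timedelta(weeks=week, seconds=seconds_of_week)
        if best is None or abs(t - approx_now) < abs(best - approx_now):
            best = t
    return best

# transmitted week 127 with a 2002-ish hint resolves past the 1999 rollover
# (i.e. to absolute week 127 + 1024 = 1151)
d = gps_to_date(127, 0, datetime.datetime(2002, 1, 1))
```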

4.14.4 | Carrier Phase Tracking (Surveying)


Another method that is used in surveying applications is carrier phase
tracking. The period of the carrier frequency multiplied by the speed of light gives
the wavelength, which is about 0.19 meters for the L1 carrier. Accuracy within 1%
of wavelength in detecting the leading edge reduces this component of pseudo
range error to as little as 2 millimeters. This compares to 3 meters for the C/A code
and 0.3 meters for the P code.
However, 2 millimeter accuracy requires measuring the total phase—the
number of waves multiplied by the wavelength plus the fractional wavelength,
which requires specially equipped receivers. This method has many surveying
applications.
Triple differencing followed by numerical root finding, and a mathematical
technique called least squares can estimate the position of one receiver given the
position of another. First, compute the difference between satellites, then between
receivers, and finally between epochs. Other orders of taking differences are
equally valid. Detailed discussion of the errors is omitted.
The satellite carrier total phase can be measured with ambiguity as to the
number of cycles. Let φ(ri, sj, tk) denote the phase of the carrier of
satellite j measured by receiver i at time tk. This notation shows the meaning of
the subscripts i, j, and k. The receiver (r), satellite (s), and time (t) come in
alphabetical order as arguments of φ and, to balance readability and conciseness,
let φ(i, j, k) = φ(ri, sj, tk) be a concise abbreviation. Also we define three
functions, Δr, Δs, and Δt, which return differences between receivers, satellites, and
time points, respectively. Each function has variables with three subscripts as its


arguments. These three functions are defined below. If α(i, j, k) is a function of the
three integer arguments i, j, and k, then it is a valid argument for the
functions Δr, Δs, and Δt, with the values defined as

Δr(α)(i, j, k) = α(i+1, j, k) − α(i, j, k),
Δs(α)(i, j, k) = α(i, j+1, k) − α(i, j, k), and
Δt(α)(i, j, k) = α(i, j, k+1) − α(i, j, k).

Also if α(i, j, k) and β(i, j, k) are valid arguments for the three functions
and a and b are constants, then (aα + bβ)(i, j, k) is a valid argument with values
defined as

(aα + bβ)(i, j, k) = aα(i, j, k) + bβ(i, j, k),
Δr(aα + bβ) = aΔr(α) + bΔr(β),
Δs(aα + bβ) = aΔs(α) + bΔs(β), and
Δt(aα + bβ) = aΔt(α) + bΔt(β).
Receiver clock errors can be approximately eliminated by differencing the
phases measured from satellite 1 with that from satellite 2 at the same epoch. This
difference is designated as Δs(φ)(1, 1, 1) = φ(1, 2, 1) − φ(1, 1, 1).
Double differencing computes the difference of receiver 1's satellite
difference from that of receiver 2. This approximately eliminates satellite clock
errors. This double difference is:
Δr(Δs(φ))(1, 1, 1) = Δs(φ)(2, 1, 1) − Δs(φ)(1, 1, 1)
= (φ(2, 2, 1) − φ(2, 1, 1)) − (φ(1, 2, 1) − φ(1, 1, 1)).
Triple differencing subtracts the receiver difference from time 1 from that of
time 2. This eliminates the ambiguity associated with the integral number of wave
lengths in carrier phase provided this ambiguity does not change with time. Thus
the triple difference result eliminates practically all clock bias errors and the
integer ambiguity. Atmospheric delay and satellite ephemeris errors have been
significantly reduced. This triple difference is:

Δt(Δr(Δs(φ)))(1, 1, 1) = Δr(Δs(φ))(1, 1, 2) − Δr(Δs(φ))(1, 1, 1).
Triple difference results can be used to estimate unknown variables. For example if
the position of receiver 1 is known but the position of receiver 2 unknown, it may
be possible to estimate the position of receiver 2 using numerical root finding and
least squares. Triple difference results for three independent time pairs quite
possibly will be sufficient to solve for receiver 2's three position components. This
may require the use of a numerical procedure. An approximation of receiver 2's


position is required to use such a numerical method. This initial value can probably
be provided from the navigation message and the intersection of sphere surfaces.
Such a reasonable estimate can be key to successful multidimensional root finding.
Iterating from three time pairs and a fairly good initial value produces one
observed triple difference result for receiver 2's position. Processing additional
time pairs can improve accuracy, over determining the answer with multiple
solutions. Least squares can estimate an over determined system. Least squares
determines the position of receiver 2 which best fits the observed triple difference
results for receiver 2 positions under the criterion of minimizing the sum of the
squares.
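The differencing operators Δr, Δs, and Δt described above can be sketched as higher-order functions; the toy phase model below is made up, with separable clock, ambiguity, and geometry terms chosen only so the cancellations described in the text are visible.

```python
# Sketch of the between-receiver, between-satellite, and between-epoch
# differencing operators, acting on a phase function phi(i, j, k) of
# receiver i, satellite j, and epoch k. The phase model is a made-up toy.

def d_r(f):  # difference between receivers
    return lambda i, j, k: f(i + 1, j, k) - f(i, j, k)

def d_s(f):  # difference between satellites
    return lambda i, j, k: f(i, j + 1, k) - f(i, j, k)

def d_t(f):  # difference between epochs
    return lambda i, j, k: f(i, j, k + 1) - f(i, j, k)

def geom(i, j, k):  # geometry term, depends on all three indices
    return 0.01 * (i + 1) * (j + 2) * (k + 3)

def phi(i, j, k):   # toy carrier phase
    rx_clock = 5.0 * i + 0.2 * i * k    # receiver clock error (no j)
    sat_clock = 7.0 * j + 0.3 * j * k   # satellite clock error (no i)
    ambiguity = 13.0 * i * j            # integer ambiguity (no k)
    return geom(i, j, k) + rx_clock + sat_clock + ambiguity

# triple difference: clock terms and the constant ambiguity all cancel,
# leaving only the triple difference of the geometry term
triple = d_t(d_r(d_s(phi)))
print(triple(1, 1, 1), d_t(d_r(d_s(geom)))(1, 1, 1))
```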

4.15 | GPS NAVIGATION:

A GPS navigation device is any device that receives Global Positioning
System (GPS) signals for the purpose of determining the device's current location
on Earth. GPS devices provide latitude and longitude information, and some may
also calculate altitude, although this is not considered sufficiently accurate or
continuously available enough (due to the possibility of signal blockage and other
factors) to rely on exclusively to pilot aircraft. GPS devices are used in military,
aviation, marine and consumer product applications.

GPS devices may also have additional capabilities such as:

 containing all types of maps, like street maps, which may be displayed in human
readable format via text or in a graphical format
 providing suggested directions to a human in charge of a vehicle or vessel via text
or speech
 providing directions directly to an autonomous vehicle such as a robotic probe
 providing information on traffic conditions (either via historical or real time data)
and suggesting alternative directions
 Providing information on nearby amenities such as restaurants, fueling stations,
etc.

In other words, all GPS devices can answer the question "Where am I?", and may
also be able to answer:

 Which roads or paths are available to me now?
 Which roads or paths should I take in order to get to my desired destination?


 If some roads are usually busy at this time or are busy right now, what would be a
better route to take?
 Where can I get something to eat nearby or where can I get fuel for my vehicle?

CHAPTER 5
Ultrasound
Chapter 5 | Ultrasound

5.1 | INTRODUCTION

Ultrasound is a mechanical disturbance that moves as a pressure wave through
a medium. When the medium is a patient, the wavelike disturbance is the basis for
use of ultrasound as a diagnostic tool. Appreciation of the characteristics of
ultrasound waves and their behavior in various media is essential to understanding
the use of diagnostic ultrasound in clinical medicine.

5.1.1 | History

In 1880, French physicists Pierre and Jacques Curie discovered the
piezoelectric effect. French physicist Paul Langevin attempted to develop
piezoelectric materials as senders and receivers of high-frequency mechanical
disturbances (ultrasound waves) through materials. His specific application was
the use of ultrasound to detect submarines during World War I. This technique,
sound navigation and ranging (SONAR), finally became practical during World
War II. Industrial uses of ultrasound began in 1928 with the suggestion of Soviet
physicist Sokolov that it could be used to detect hidden flaws in materials. Medical
uses of ultrasound through the 1930s were confined to therapeutic applications
such as cancer treatments and physical therapy for various ailments. Diagnostic
applications of ultrasound began in the late 1940s through collaboration between
physicians and engineers familiar with SONAR.

5.2 | WAVE MOTION

A fluid medium is a collection of molecules that are in continuous random
motion. The molecules are represented as filled circles in the margin figure.

When no external force is applied to the medium, the molecules are
distributed more or less uniformly (A). When a force is applied to the medium
(represented by movement of the piston from left to right in B), the molecules are
concentrated in front of the piston, resulting in an increased pressure at that
location. The region of increased pressure is termed a zone of compression.
Because of the forward motion imparted to the molecules by the piston, the region
of increased pressure begins to migrate away from the piston and through the
medium. That is, a mechanical disturbance introduced into the medium travels
through the medium in a direction away from the source of the disturbance. In


clinical applications of ultrasound, the piston is replaced by an ultrasound
transducer.

As the zone of compression begins its migration through the medium, the
piston may be withdrawn from right to left to create a region of reduced pressure
immediately behind the compression zone. Molecules from the surrounding
medium move into this region to restore it to normal particle density; and a second
region, termed a zone of rarefaction, begins to migrate away from the piston (C).
That is, the compression zone (high pressure) is followed by a zone of rarefaction
(low pressure) also moving through the medium.

If the piston is displaced again to the right, a second compression zone is
established that follows the zone of rarefaction through the medium. If the piston
oscillates continuously, alternate zones of compression and rarefaction are
propagated through the medium, as illustrated in D. The propagation of these zones
establishes a wave disturbance in the medium. This disturbance is termed a
longitudinal wave because the motion of the molecules in the medium is parallel to
the direction of wave propagation.

A wave with a frequency between about 20 and 20,000 Hz is a sound wave
that is audible to the human ear. An infrasonic wave is a sound wave below 20 Hz;
it is not audible to the human ear. An ultrasound (or ultrasonic) wave has a
frequency greater than 20,000 Hz and is also inaudible. In clinical diagnosis,
ultrasound waves of frequencies between 1 and 20 MHz are used.

As a longitudinal wave moves through a medium, molecules at the edge of
the wave slide past one another. Resistance to this shearing effect causes these
molecules to move somewhat in a direction away from the moving longitudinal
wave.
This transverse motion of molecules along the edge of the longitudinal wave
establishes shear waves that radiate transversely from the longitudinal wave. In
general, shear waves are significant only in a rigid medium such as a solid. In
biologic tissues, bone is the only medium in which shear waves are important.


5.3 | WAVE CHARACTERISTICS

A zone of compression and an adjacent zone of rarefaction constitute one
cycle of an ultrasound wave. A wave cycle can be represented as a graph of local
pressure (particle density) in the medium versus distance in the direction of the
ultrasound wave. The distance covered by one cycle is the wavelength of the
ultrasound wave. The number of cycles per unit time (cps, or just sec⁻¹) introduced
into the medium each second is referred to as the frequency of the wave, expressed
in units of hertz, kilohertz, or megahertz where 1 Hz equals 1 cps. The maximum
height of the wave cycle is the amplitude of the ultrasound wave. The product of
the frequency (υ) and the wavelength (λ) is the velocity of the wave; that is, c = υλ.

In most soft tissues, the velocity of ultrasound is about 1540 m/sec.
Frequencies of 1 MHz and greater are required to furnish ultrasound wavelengths
suitable for diagnostic imaging.
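The wavelengths follow directly from c = υλ; a quick check, assuming the 1540 m/sec soft-tissue velocity quoted above:

```python
# Sketch: wavelength = c / frequency, using the soft-tissue speed of
# sound quoted in the text (1540 m/sec).
c = 1540.0  # m/s

for f_hz in (1e6, 5e6, 10e6, 20e6):
    wavelength_mm = c / f_hz * 1000.0
    print(f"{f_hz / 1e6:>4.0f} MHz -> {wavelength_mm:.3f} mm")
```

At 1 MHz the wavelength is about 1.5 mm, and it shrinks in proportion as the frequency rises, which is why megahertz frequencies are needed for millimetre-scale imaging resolution.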
When two waves meet, they are said to “interfere” with each other. There are
two extremes of interference. In constructive interference the waves are “in phase”
(i.e., peak meets peak). In destructive interference the waves are “out of phase”
(i.e., peak meets valley). Waves undergoing constructive interference add their
amplitudes, whereas waves undergoing destructive interference may completely
cancel each other.


Fig. (5.1): Characteristics of an ultrasound wave

Table (5-1): Frequency Classification of Ultrasound


Frequency (Hz) Classification
20–20,000 Audible sound
20,000–1,000,000 Ultrasound
1,000,000–30,000,000 Diagnostic medical Ultrasound

Table (5-2): Quantities and Units Pertaining to Ultrasound Intensity

Quantity        Definition                              Unit
Energy (E)      Ability to do work                      joule
Power (P)       Rate at which energy is transported     watt (joule/sec)
Intensity (I)   Power per unit area (a)                 watt/cm²

Relationship: I = P/a = E/(t · a), where t = time

5.4 | ULTRASOUND INTENSITY

Ultrasound frequencies of 1 MHz and greater correspond to ultrasound
wavelengths of about 1.5 mm or less in human soft tissue.
As an ultrasound wave passes through a medium, it transports energy through
the medium. The rate of energy transport is known as “power.” Medical ultrasound
is produced in beams that are usually focused into a small area, and the beam is
described in terms of the power per unit area, defined as the beam’s “intensity”.
The relationships among the quantities and units pertaining to intensity are
summarized in Table 5-2.


Intensity is usually described relative to some reference intensity. For
example, the intensity of ultrasound waves sent into the body may be compared
with that of the ultrasound reflected back to the surface by structures in the body.
For many clinical situations the reflected waves at the surface may be as much as a
hundredth or so of the intensity of the transmitted waves. Waves reflected from
structures at depths of 10 cm or more below the surface may be lowered in
intensity by a much larger factor.

A logarithmic scale is most appropriate for recording data over a range of
many orders of magnitude. In acoustics, the decibel scale is used, with the decibel
defined as

dB = 10 log (I/I0)     (5-1)

where I is the intensity of the wave and I0 is the reference intensity. Table 5-3
shows examples of decibel values for certain intensity ratios. Several rules can be
extracted from this table:
 Positive decibel values result when a wave has a higher intensity than the reference
wave; negative values denote a wave with lower intensity.
 Increasing a wave’s intensity by a factor of 10 adds 10 dB to the intensity, and
reducing the intensity by a factor of 10 subtracts 10 dB.
 Doubling the intensity adds 3 dB, and halving subtracts 3 dB.
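These rules follow directly from Eq. (5-1); a brief Python sketch (the function name `db` is our own) illustrates them:

```python
import math

def db(intensity_ratio):
    """Decibel value for an intensity ratio I/I0, i.e. 10*log10(I/I0)."""
    return 10.0 * math.log10(intensity_ratio)

print(db(10))    # factor of 10 higher  -> +10 dB
print(db(0.1))   # factor of 10 lower   -> -10 dB
print(db(2))     # doubling             -> about +3 dB
print(db(0.5))   # halving              -> about -3 dB
```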

Table (5-3): Calculation of Decibel Values from Intensity Ratios and Amplitude Ratios

Ratio of ultrasound     Intensity ratio      Amplitude ratio
wave parameters         (I/I0) (dB)          (A/A0) (dB)

1000                     30                   60
100                      20                   40
10                       10                   20
2                         3                    6
1                         0                    0
1/2                      −3                   −6
1/10                    −10                  −20
1/100                   −20                  −40
1/1000                  −30                  −60

No universal standard reference intensity exists for ultrasound. Thus the
statement “ultrasound at 50 dB was used” is nonsensical. However, a statement
such as “the returning echo was 50 dB below the transmitted signal” is
informative. The transmitted signal then becomes the reference intensity for this
particular application. For audible sound, a statement such as “a jet engine
produces sound at 100 dB” is appropriate because there is a generally accepted
reference intensity of 10−16 W/cm2 for audible sound.

A 1-kHz tone (musical note C one octave above middle C) at this intensity is
barely audible to most listeners. A 1-kHz note at 120 dB (10−4 W/cm2) is painfully
loud. Because intensity is power per unit area and power is energy per unit time
(Table 5-2), Eq. (5-1) may be used to compare the power or the energy contained
within two ultrasound waves. Thus we could also write

dB = 10 log (Power/Power0) = 10 log (E/E0)

Ultrasound wave intensity is related to the maximum pressure (Pm) in the
medium by the following expression:

I = (Pm)^2 / (2ρc)     (5-2)

where ρ is the density of the medium in grams per cubic centimeter and c is
the speed of sound in the medium. Substituting Eq. (5-2) for I and I0 in Eq. (5-1)
yields

dB = 10 log (I/I0) = 10 log [Pm/(Pm)0]^2 = 20 log [Pm/(Pm)0]     (5-3)

When comparing the pressure of two waves, Eq. (5-3) may be used directly.
That is, the pressure does not have to be converted to intensity to determine the
decibel value.

An ultrasound transducer converts pressure amplitudes received from the
patient (i.e., the reflected ultrasound wave) into voltages. The amplitude of
voltages recorded for ultrasound waves is directly proportional to the variations in
pressure in the reflected wave.

The decibel value for the ratio of two waves may be calculated from Eq. (5-1)
or from Eq. (5-3), depending upon the information that is available concerning the
waves. The “half-power value” (ratio of 0.5 in power between two waves) is −3 dB,
whereas the “half-amplitude value” (ratio of 0.5 in amplitude) is −6 dB (Table 5-3).
This difference reflects the greater sensitivity of the decibel scale to amplitude
compared with intensity values.
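The distinction between the 10 log rule for intensity and the 20 log rule for amplitude can be verified numerically; the sketch below is illustrative only, with helper names of our own:

```python
import math

def db_from_intensity(i_ratio):
    return 10.0 * math.log10(i_ratio)   # Eq. (5-1)

def db_from_amplitude(a_ratio):
    return 20.0 * math.log10(a_ratio)   # Eq. (5-3): I is proportional to A**2

half_power = db_from_intensity(0.5)      # the "half-power value"
half_amplitude = db_from_amplitude(0.5)  # the "half-amplitude value"
print(round(half_power, 1), round(half_amplitude, 1))  # -3.0 -6.0
```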


5.5 | ULTRASOUND VELOCITY

The velocity of an ultrasound wave through a medium varies with the
physical properties of the medium. In low-density media such as air and other
gases, molecules may move over relatively large distances before they influence
neighboring molecules. In these media, the velocity of an ultrasound wave is
relatively low. In solids, molecules are constrained in their motion, and the velocity
of ultrasound is relatively high.

Liquids exhibit ultrasound velocities intermediate between those in gases and
solids. With the notable exceptions of lung and bone, biologic tissues yield
velocities roughly similar to the velocity of ultrasound in liquids. In different
media, changes in velocity are reflected in changes in wavelength of the ultrasound
waves, with the frequency remaining relatively constant. In ultrasound imaging,
variations in the velocity of ultrasound in different media introduce artifacts into
the image, with the major artifacts attributable to bone, fat, and, in ophthalmologic
applications, the lens of the eye.
Table (5-4): Approximate Velocities of Ultrasound in Selected Materials

Nonbiologic Material   Velocity (m/sec)   Biologic Material         Velocity (m/sec)
Acetone                1174               Fat                       1475
Air                    331                Brain                     1560
Aluminum (rolled)      6420               Liver                     1570
Brass                  4700               Kidney                    1560
Ethanol                1207               Spleen                    1570
Glass (Pyrex)          5640               Blood                     1570
Acrylic plastic        2680               Muscle                    1580
Mercury                1450               Lens of eye               1620
Nylon (6-6)            2620               Skull bone                3360
Polyethylene           1950               Soft tissue (mean value)  1540
Water (distilled),     1498
Water (distilled),     1540

The velocities of ultrasound in various media are listed in Table 5-4. The
velocity of an ultrasound wave should be distinguished from the velocity of
molecules whose displacement into zones of compression and rarefaction
constitutes the wave. The molecular velocity describes the velocity of the
individual molecules in the medium, whereas the wave velocity describes the
velocity of the ultrasound wave through the medium. Properties of ultrasound such
as reflection, transmission, and refraction are characteristic of the wave velocity
rather than the molecular velocity.

5.6 | ATTENUATION OF ULTRASOUND

As an ultrasound beam penetrates a medium, energy is removed from the
beam by absorption, scattering, and reflection. These processes are summarized in
Figure (5-2). As with x rays, the term attenuation refers to any mechanism that
removes energy from the ultrasound beam. Ultrasound is “absorbed” by the
medium if part of the beam’s energy is converted into other forms of energy, such
as an increase in the random motion of molecules. Ultrasound is “reflected” if
there is an orderly deflection of all or part of the beam. If part of an ultrasound
beam changes direction in a less orderly fashion, the event is usually described as
“scatter.”

Constructive and destructive interference effects characterize the echoes from
nonspecular reflections. Because the sound is reflected in all directions, there are
many opportunities for waves to travel different pathways. The wave fronts that
return to the transducer may constructively or destructively interfere at random.
The random interference pattern is known as “speckle.”

Fig. (5.2)


The behavior of a sound beam when it encounters an obstacle depends upon
the size of the obstacle compared with the wavelength of the sound. If the
obstacle’s size is large compared with the wavelength of sound (and if the obstacle
is relatively smooth), then the beam retains its integrity as it changes direction. Part
of the sound beam may be reflected and the remainder transmitted through the
obstacle as a beam of lower intensity.
If the size of the obstacle is comparable to or smaller than the wavelength of
the ultrasound, the obstacle will scatter energy in various directions. Some of the
ultrasound energy may return to its original source after “nonspecular” scatter, but
probably not until many scatter events have occurred.

In ultrasound imaging, specular reflection permits visualization of the
boundaries between organs, and nonspecular reflection permits visualization of
tissue parenchyma (Figure 5-2). Structures in tissue such as collagen fibers are
smaller than the wavelength of ultrasound. Such small structures provide scatter
that returns to the transducer through multiple pathways. The sound that returns to
the transducer from such nonspecular reflectors is no longer a coherent beam. It is
instead the sum of a number of component waves that produces a complex pattern
of constructive and destructive interference back at the source. This interference
pattern, known as “speckle,” provides the characteristic ultrasonic appearance of
complex tissue such as liver.

The behavior of a sound beam as it encounters an obstacle such as an
interface between structures in the medium is summarized in Figure 5-3. As
illustrated in Figure (5-4), the energy remaining in the beam decreases
approximately exponentially with the depth of penetration of the beam into the
medium. The reduction in energy (i.e., the decrease in ultrasound intensity) is
described in decibels, as noted earlier.
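This approximately exponential decrease can be sketched numerically. The attenuation coefficient used below (0.5 dB per cm) is an assumed, illustrative value and is not taken from this chapter:

```python
# Exponential attenuation of an ultrasound beam with depth, in decibels.
# alpha_db_per_cm = 0.5 is an illustrative assumption, not a chapter value.
alpha_db_per_cm = 0.5

def remaining_fraction(depth_cm):
    """Fraction of the original intensity left after the given depth."""
    loss_db = alpha_db_per_cm * depth_cm
    return 10 ** (-loss_db / 10.0)   # inverts dB = 10*log10(I/I0)

for depth in (1, 5, 10):
    print(depth, "cm:", round(remaining_fraction(depth), 3))
```

At 10 cm the assumed loss is 5 dB, leaving roughly a third of the original intensity.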

5.7 | REFLECTION

In most diagnostic applications of ultrasound, use is made of ultrasound
waves reflected from interfaces between different tissues in the patient. The
fraction of the impinging energy reflected from an interface depends on the
difference in acoustic impedance of the media on opposite sides of the interface.


The acoustic impedance Z of a medium is the product of the density ρ of the
medium and the velocity of ultrasound in the medium:

Z = ρc

Acoustic impedances of several materials are listed in Table 5-5. For an
ultrasound wave incident perpendicularly upon an interface, the fraction of the
incident energy that is reflected (i.e., the reflection coefficient αR) is

αR = [(Z2 − Z1)/(Z2 + Z1)]^2

where Z1 and Z2 are the acoustic impedances of the two media. The fraction of
the incident energy that is transmitted across an interface is described by the
transmission coefficient αT, where

αT = 4 Z1 Z2 / (Z1 + Z2)^2

Obviously αR + αT = 1. With a large impedance mismatch at an interface, much of
the energy of an ultrasound wave is reflected, and only a small amount is
transmitted across the interface. For example, ultrasound beams are reflected
strongly at air–tissue and air–water interfaces because the impedance of air is
much less than that of tissue or water.
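Using the reflection and transmission coefficient formulas above with impedance values from Table 5-5, a short sketch (function names are our own) shows how strongly the impedance mismatch governs reflection:

```python
# Reflection and transmission coefficients at a flat interface, with
# impedances taken from Table 5-5 (tabulated in units of 1e6 kg·m^-2·s^-1).
def alpha_r(z1, z2):
    """Fraction of incident energy reflected at the interface."""
    return ((z2 - z1) / (z2 + z1)) ** 2

def alpha_t(z1, z2):
    """Fraction of incident energy transmitted across the interface."""
    return 4 * z1 * z2 / (z1 + z2) ** 2

fat, muscle, air = 1.38, 1.70, 0.0004

print(round(alpha_r(fat, muscle), 4))  # about 1% reflected at fat-muscle
print(round(alpha_r(air, muscle), 4))  # nearly total reflection at air-tissue
```

The energy balance αR + αT = 1 holds for any pair of impedances, which is a convenient sanity check on the two formulas.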

Table (5-5): Approximate Acoustic Impedances of Selected Materials


Material                                    Acoustic Impedance ×10−6 (kg-m−2-sec−1)
Air at standard temperature and pressure 0.0004
Water 1.50
Polyethylene 1.85
Plexiglas 3.20
Aluminum 18.0
Mercury 19.5
Brass 38.0
Fat 1.38
Aqueous and vitreous humor of eye 1.50
Brain 1.55
Blood 1.61
Kidney 1.62
Human soft tissue, mean value 1.63
Spleen 1.64
Liver 1.65
Muscle 1.70
Lens of eye 1.85
Skull bone 6.10


Fig.: Range of echoes from biologic interfaces and selection of internal echoes to
be displayed over the major portion of the gray scale in an ultrasound unit. (From
Kossoff, G., et al.12 Used with permission.) The ultrasound waves will enter the
patient with little reflection at the skin surface.

Similarly, strong reflections of ultrasound occur at the boundary between the
chest wall and the lungs and at the millions of air–tissue interfaces within the
lungs. Because of the large impedance mismatch at these interfaces, efforts to use
ultrasound as a diagnostic tool for the lungs have been unrewarding. The
impedance mismatch is also high between soft tissues and bone, and the use of
ultrasound to identify tissue characteristics in regions behind bone has had limited
success.

The discussion of ultrasound reflection above assumes that the ultrasound
beam strikes the reflecting interface at a right angle. In the body, ultrasound
impinges upon interfaces at all angles. For any angle of incidence, the angle at
which the reflected ultrasound energy leaves the interface equals the angle of
incidence of the ultrasound beam; that is,
Angle of incidence = Angle of reflection

In a typical medical examination that uses reflected ultrasound and a
transducer that both transmits and detects ultrasound, very little reflected energy
will be detected if the ultrasound strikes the interface at an angle more than about 3
degrees from perpendicular. A smooth reflecting interface must be essentially
perpendicular to the ultrasound beam to permit visualization of the interface.

Fig.(5.3)


5.8 | REFRACTION

As an ultrasound beam crosses an interface obliquely between two media, its
direction is changed (i.e., the beam is bent). If the velocity of ultrasound is higher
in the second medium, then the beam enters this medium at a more oblique (less
steep) angle. This behavior of ultrasound transmitted obliquely across an interface
is termed refraction.

The relationship between the incident and refraction angles is described by
Snell’s law:

sin θi / sin θr = ci / cr     (5-4)

where θi and θr are the angles of incidence and refraction, and ci and cr are the
velocities of ultrasound in the incident and refracting media.

For example, an ultrasound beam incident obliquely upon an interface
between muscle (velocity 1580 m/sec) and fat (velocity 1475 m/sec) will enter the
fat at a steeper angle.

If an ultrasound beam impinges very obliquely upon a medium in which the
ultrasound velocity is higher, the beam may be refracted so that no ultrasound
energy enters the medium. The incidence angle at which refraction causes no
ultrasound to enter a medium is termed the critical angle θc. For the critical angle,
the angle of refraction is 90 degrees, and the sine of 90 degrees is 1. From Eq. (5-4):

sin θc / sin 90° = ci / cr

But

sin 90° = 1

Therefore

θc = sin−1 (ci/cr)

Where sin−1, or arcsine, refers to the angle whose sine is ci /cr. For any
particular interface, the critical angle depends only upon the velocity of ultrasound
in the two media separated by the interface.
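The critical-angle formula can be checked with velocities from Table 5-4, for example at a soft tissue–skull bone interface; the helper below is an illustrative sketch of our own:

```python
import math

def critical_angle_deg(c_incident, c_refracting):
    """Critical angle in degrees from Snell's law.

    Only meaningful when the refracting medium is faster
    (c_refracting > c_incident), so that sin(angle) <= 1.
    """
    return math.degrees(math.asin(c_incident / c_refracting))

# Velocities from Table 5-4: soft tissue 1540 m/s, skull bone 3360 m/s
print(round(critical_angle_deg(1540, 3360), 1))  # 27.3 degrees
```

Beyond roughly 27 degrees of incidence, no ultrasound energy would enter the bone at such an interface, which is part of why imaging behind bone is difficult.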


Refraction is a principal cause of artifacts in clinical ultrasound images. For
example, the ultrasound beam is refracted at a steeper angle as it crosses the
interface between medium 1 and 2 (c1 > c2). As the beam emerges from medium 2
and reenters medium 1, it resumes its original direction of motion. The presence of
medium 2 simply displaces the ultrasound beam laterally for a distance that
depends upon the difference in ultrasound velocity and density in the two media
and upon the thickness of medium 2. Suppose a small structure below medium 2 is
visualized by reflected ultrasound. The position of the structure would appear to
the viewer as an extension of the original direction of the ultrasound through
medium 1.
In this manner, refraction adds spatial distortion and resolution loss to ultrasound
images.

5.9 | ABSORPTION
Relaxation processes are the primary mechanisms of energy dissipation for an
ultrasound beam traversing tissue. These processes involve (a) removal of
energy from the ultrasound beam and (b) eventual dissipation of this energy
primarily as heat. As discussed earlier, ultrasound is propagated by displacement of
molecules of a medium into regions of compression and rarefaction. This
displacement requires energy that is provided to the medium by the source of
ultrasound. As the molecules attain maximum displacement from an equilibrium
position, their motion stops, and their energy is transformed from kinetic energy
associated with motion to potential energy associated with position in the
compression zone.

From this position, the molecules begin to move in the opposite direction, and
potential energy is gradually transformed into kinetic energy. The maximum
kinetic energy (i.e., the highest molecular velocity) is achieved when the molecules
pass through their original equilibrium position, where the displacement and
potential energy are zero. If the kinetic energy of the molecule at this position
equals the energy absorbed originally from the ultrasound beam, then no
dissipation of energy has occurred, and the medium is an ideal transmitter of
ultrasound. Actually, the conversion of kinetic to potential energy (and vice versa)
is always accompanied by some dissipation of energy. Therefore, the energy of the
ultrasound beam is gradually reduced as it passes through the medium.

This reduction is termed relaxation energy loss. The rate at which the beam
energy decreases is a reflection of the attenuation properties of the medium.


The effect of frequency on the attenuation of ultrasound in different media is
described in Table (5-7).14–18 Data in this table are reasonably good estimates of
the influence of frequency on ultrasound absorption over the range of ultrasound
frequencies used diagnostically. However, complicated structures such as tissue
samples often exhibit a rather complex attenuation pattern for different
frequencies, which probably reflects the existence of a variety of relaxation
frequencies and other molecular energy absorption processes that are poorly
understood at present. These complex attenuation patterns are reflected in the data
in Figure (5.3).

Table (5-7): Variation of Ultrasound Attenuation Coefficient with Frequency in Megahertz,
Where α1 Is the Attenuation Coefficient at 1 MHz

Tissue                             Frequency Variation   Material     Frequency Variation
Blood                                                    Lung
Fat                                                      Liver
Muscle (across fibers)                                   Brain
Muscle (along fibers)                                    Kidney
Aqueous and vitreous humor of eye                        Spinal cord
Lens of eye                                              Water
Skull bone                                               Castor oil
                                                         Lucite

If gas bubbles are present in a material through which a sound wave is
passing, the compressions and rarefactions cause the bubble to shrink and expand
in resonance with the sound wave. The oscillation of such bubbles is referred to as
stable cavitation.

Stable cavitation is not a major mechanism for absorption at ultrasound
intensities used diagnostically, but it can be a significant source of scatter. If an
ultrasound beam is intense enough and of the right frequency, the ultrasound-
induced mechanical disturbance of the medium can be so great that microscopic
bubbles are produced in the medium. The bubbles are formed at foci, such as
molecules in the rarefaction zones, and may grow to a cubic millimeter or so in
size. As the pressure in the rarefaction zone increases during the next phase of the
ultrasound cycle, the bubbles shrink to 10−2 mm3 or so and collapse, thereby
creating minute shock waves that seriously disturb the medium if produced in large
quantities. The effect, termed dynamic cavitation, produces high temperatures (up
to 10,000°C) at the point where the collapse occurs.19 Dynamic cavitation is
associated with absorption of energy from the beam. Free radicals are also
produced in water surrounding the collapse. Dynamic cavitation is not a significant
mechanism of attenuation at diagnostic intensities, although there is evidence that
it may occur under certain conditions.

5.10 | HARDWARE PART

5.10.1 | Introduction:

We'll use the HC-SR04 ultrasonic sensor, which transmits ultrasound waves
and uses their reflection to calculate the range between the user and any barrier on
his way. The distance is found from the time the waves take to travel from the
source to the target and back from the target to the receiver:

Velocity of sound: v = 340 m/s
Time of flight: t (in seconds)
Total path length = v × t
Distance = total path length / 2
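The calculation above can be sketched in Python; this is a hypothetical helper for illustration, not firmware from the project:

```python
# Range from ultrasonic time of flight, as described above.
SPEED_OF_SOUND = 340.0  # m/s, the value used in the text

def distance_m(round_trip_seconds):
    """One-way distance to the obstacle from the round-trip echo time."""
    total_path = SPEED_OF_SOUND * round_trip_seconds  # out and back
    return total_path / 2.0

print(distance_m(0.01))  # a 10 ms echo delay -> 1.7 m
```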
Background:
This design started after I looked at the Polaroid Ultrasonic Ranging module,
which has a number of disadvantages for use in small robots etc.:
1. The maximum range of 10.7 meters is far more than is normally required, and as
a result,
2. The current consumption, at 2.5 Amps during the sonic burst is truly horrendous.
3. The 150mA quiescent current is also far too high.
4. The minimum range of 26cm is useless. 1-2cm is more like it.
5. The module is quite large to fit into small systems, and
6. It’s EXPENSIVE.
The SRF04 was designed to be just as easy to use as the Polaroid sonar, requiring a
short trigger pulse and providing an echo pulse. Your controller only has to time
the length of this pulse to find the range. The connections to the SRF04 are shown
below:

Fig.(5-7): SRF04 Connections

The SRF04 Timing diagram is shown below. You only need to supply a short
10uS pulse to the trigger input to start the ranging. The SRF04 will send out an 8
cycle burst of ultrasound at 40 kHz and raise its echo line high. It then listens for
an echo, and as soon as it detects one it lowers the echo line again. The echo line is
therefore a pulse whose width is proportional to the distance to the object. By
timing the pulse it is possible to calculate the range in inches/centimeters or
anything else.

If nothing is detected then the SRF04 will lower its echo line anyway after
about 36mS. Here is the schematic:


Fig. (5.8): SRF04 schematic

Fig.(5.9): SRF04 Timing Diagram

The circuit is designed to be low cost. It uses a PIC12C508 to perform the
control functions and standard 40 kHz piezo transducers. The transmitting
transducer could simply be driven directly from the PIC. The 5v drive can give a
useful range for large objects, but can be problematic when detecting smaller
objects. The transducer can handle 20v of drive, so I decided to get up close to
this level. A MAX232 IC, usually used for RS232 communication, makes an ideal
driver, providing about 16v of drive.

The receiver is a classic two-stage op-amp circuit. The input capacitor C8
blocks some residual DC which always seems to be present. Each gain stage is set
to 24 for a total gain of 576 or so. This is close to the maximum gain of 25 per
stage available using the LM1458: the gain bandwidth product for the LM1458 is
1 MHz, so the maximum gain at 40 kHz is 1,000,000/40,000 = 25.

The output of the amplifier is fed into an LM311 comparator. A small amount
of positive feedback provides some hysteresis to give a clean stable output.
The problem with getting operation down to 1-2cm is that the receiver will pick
up direct coupling from the transmitter, which is right next to it. To make matters
worse, the piezo transducer is a mechanical object that keeps resonating for some
time after the drive has been removed, up to 1mS depending on when you decide it
has stopped. It is much harder to tell the difference between this direct coupled
ringing and a returning echo, which is why many designs, including the Polaroid
module, simply blank out this period.

Looking at the returning echo on an oscilloscope shows that it is much larger
in magnitude at close quarters than the cross-coupled signal. I therefore adjust the
detection threshold during this time so that only the echo is detectable. The 100n
capacitor C10 is charged to about −6v during the burst. This discharges quite
quickly through the 10k resistor R6 to restore sensitivity for more distant echoes.

A convenient negative voltage for the op-amp and comparator is generated by
the MAX232.

Unfortunately, this also generates quite a bit of high frequency noise. I
therefore shut it down whilst listening for the echo. The 10uF capacitor C9 holds
the negative rail just long enough to do this.

In operation, the processor waits for an active low trigger pulse to come in. It
then generates just eight cycles of 40 kHz. The echo line is then raised to signal the
host processor to start timing. The raising of the echo line also shuts off the
MAX232. After a while, no more than 10-12mS normally, the returning echo will
be detected and the PIC will lower the echo line. The width of this pulse represents
the flight time of the sonic burst. If no echo is detected then it will automatically
time out after about 30mS (it’s two times the WDT period of the PIC). Because the
MAX232 is shut down during echo detection, you must wait at least 10mS
between measurement cycles for the +/- 10v to recharge.

Performance of this design is, I think, quite good. It will reliably measure
down to 3cm and will continue detecting down to 1cm or less, but below 2-3cm
the pulse width doesn’t get any smaller.

Maximum range is a little over 3m. As an example of the sensitivity of this
design, it will detect a 1-inch-thick plastic broom handle at 2.4m.

Average current consumption is reasonable at less than 50mA and typically
about 30mA.

5.10.2 | Calculating the Distance

The SRF04 provides an echo pulse proportional to distance. If the width of
the pulse is measured in µS, then dividing by 58 will give you the distance in cm,
or dividing by 148 will give the distance in inches. µS/58=cm or µS/148=inches.
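The divide-by-58 rule is just the time-of-flight relation restated in microseconds per centimeter; the sketch below (helper names are our own) shows where the constant comes from, assuming sound travels at roughly 343 m/s, i.e. 0.0343 cm/µs:

```python
# At ~0.0343 cm/us, sound needs 2 / 0.0343 ~= 58.3 us of round-trip
# time per centimeter of one-way range, hence the divide-by-58 rule.
US_PER_CM = 2 / 0.0343

def range_cm(echo_pulse_us):
    """SRF04 rule of thumb: pulse width in microseconds over 58."""
    return echo_pulse_us / 58.0

def range_cm_exact(echo_pulse_us):
    """Same conversion using the derived constant."""
    return echo_pulse_us / US_PER_CM

print(round(range_cm(580), 1))        # 580 us pulse -> 10.0 cm
print(round(range_cm_exact(580), 1))  # 9.9 cm with the exact constant
```

The two results agree to within about half a percent, which is well inside the sensor's own tolerance.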

5.10.3 | Changing beam pattern and beam width

You can't! This is a question which crops up regularly, however there is no
easy way to reduce or change the beam width that I'm aware of. The beam pattern
of the SRF04 is conical with the width of the beam being a function of the surface
area of the transducers and is fixed. The beam pattern of the transducers used on
the SRF04, taken from the manufacturers’ data sheet, is shown below.


Fig.(5.10)

5.10.4 | The development of the sensor

Since the original design of the SRF04 was published, there have been
incremental improvements to improve performance and manufacturing reliability.
The op-amp is now an LMC6032 and the comparator is an LP311. The 10uF
capacitor is now 22uF and a few resistor values have been tweaked. These changes
have happened over a period of time.

All SRF04's manufactured after May 2003 have new software implementing
an optional timing control input using the "do not connect" pin. This connection is
the PIC's Vpp line used to program the chip after assembly. After programming it’s
just an unused input with a pull-up resistor. When left unconnected the SRF04
behaves exactly as it always has and is described above. When the "do not
connect" pin is connected to ground (0v), the timing is changed slightly to allow
the SRF04 to work with the slower controllers such as the Picaxe. The SRF04's
"do not connect" pin now acts as a timing control. This pin is pulled high by
default and when left unconnected, the timing remains exactly as before.

With the timing pin pulled low (grounded) a 300uS delay is added between
the end of the trigger pulse and transmitting the sonic burst. Since the echo output
is not raised until the burst is completed, there is no change to the range timing, but
the 300uS delay gives the Picaxe time to sort out which pin to look at and start
doing so. The new code has shipped in all SRF04's since the end of April 2003.
The new code is also useful when connecting the SRF04 to the slower Stamps,
such as the BS2. Although the SRF04 works with the BS2, the echo line needs to
be connected to the lower numbered input pins. This is because the Stamps take
progressively longer to look at the higher numbered pins and can miss the rising
edge of the echo signal. In this case you can connect the "do not connect" pin to
ground and give it an extra 300uS to get there.

CHAPTER 6
Microcontroller
Chapter 6 | Microcontroller

6.1 | INTRODUCTION

A microcontroller is a microprocessor system which contains data and
program memory, serial and parallel I/O, timers, and external and internal
interrupts—all integrated into a single chip that can be purchased for as little as
two dollars. About 40 percent of all microcontroller applications are found in
office equipment, such as PCs, laser printers, fax machines, and intelligent
telephones. About one third of all microcontrollers are found in consumer
electronic goods. Products like CD players, hi-fi equipment, video games, washing
machines, and cookers fall into this category. The communications market, the
automotive market, and the military share the rest of the applications.
Figure 6.1 shows the microcontroller block diagram. However, this project
concentrates on designing the Instruction Register (IR), Program Counter (PC) and
Arithmetic Logic Unit (ALU) only.

Fig.(6.1): Microcontroller Block Diagram


6.1.1 | History of Microcontroller

The first single-chip microprocessor was the 4-bit Intel 4004 released in
1971, with the Intel 8008 and other more capable microprocessors becoming
available over the next several years. However, both processors required external
chips to implement a working system, raising total system cost, and making it
impossible to economically computerize appliances.

The Smithsonian Institution says TI engineers Gary Boone and Michael
Cochran succeeded in creating the first microcontroller in 1971. The result of their
work was the TMS 1000, which went commercial in 1974. It combined read-only
memory, read/write memory, processor and clock on one chip and was targeted at
embedded systems.

Partly in response to the existence of the single-chip TMS 1000,[2] Intel
developed a computer system on a chip optimized for control applications, the Intel
8048, with commercial parts first shipping in 1977. [2] It combined RAM and ROM
on the same chip. This chip would find its way into over one billion PC keyboards,
and other numerous applications. At that time Intel's President, Luke J. Valenter,
stated that the microcontroller was one of the most successful in the company's
history, and expanded the division's budget over 25%.

Most microcontrollers at this time had two variants. One had an erasable
EPROM program memory, which was significantly more expensive than the
PROM variant which was only programmable once. Erasing the EPROM required
exposure to ultraviolet light through a transparent quartz lid. One-time parts could
be made in lower-cost opaque plastic packages.

In 1993, the introduction of EEPROM memory allowed microcontrollers
(beginning with the Microchip PIC16x84) to be electrically erased quickly without
an expensive package as required for EPROM, allowing both rapid prototyping,
and In System Programming. The same year, Atmel introduced the first
microcontroller using Flash memory. Other companies rapidly followed suit, with
both memory types.

Cost has plummeted over time, with the cheapest 8-bit microcontrollers
being available for under $0.25 in quantity (thousands) in 2009, and some 32-bit
microcontrollers around $1 for similar quantities.


Nowadays microcontrollers are cheap and readily available for hobbyists,
with large online communities around certain processors.

In the future, MRAM could potentially be used in microcontrollers as it has
infinite endurance and its incremental semiconductor wafer process cost is
relatively low.

6.1.2 | Embedded Design

A microcontroller can be considered a self-contained system with a
processor, memory and peripherals and can be used as an embedded system.[5] The
majority of microcontrollers in use today are embedded in other machinery, such
as automobiles, telephones, appliances, and peripherals for computer systems.
While some embedded systems are very sophisticated, many have minimal
requirements for memory and program length, with no operating system, and low
software complexity. Typical input and output devices include switches, relays,
solenoids, LEDs, small or custom LCD displays, radio frequency devices, and
sensors for data such as temperature, humidity, light level etc. Embedded systems
usually have no keyboard, screen, disks, printers, or other recognizable I/O devices
of a personal computer, and may lack human interaction devices of any kind.

6.1.3 | Interrupts

Microcontrollers must provide real time (predictable, though not necessarily
fast) response to events in the embedded system they are controlling. When certain
events occur, an interrupt system can signal the processor to suspend processing
the current instruction sequence and to begin an interrupt service routine (ISR, or
"interrupt handler"). The ISR will perform any processing required based on the
source of the interrupt before returning to the original instruction sequence.
Possible interrupt sources are device dependent, and often include events such as
an internal timer overflow, completing an analog to digital conversion, a logic
level change on an input such as from a button being pressed, and data received on
a communication link. Where power consumption is important as in battery
operated devices, interrupts may also wake a microcontroller from a low power
sleep state where the processor is halted until required to do something by a
peripheral event.
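The flag-based ISR pattern described above can be sketched in host-runnable C. The handler and flag names here are invented for illustration; on a real part the routine would be registered through the vendor's interrupt mechanism rather than called by hand.

```c
#include <assert.h>
#include <stdint.h>

/* Flag shared between the "ISR" and the main loop; on real hardware it
   must be volatile because the ISR can change it at any time. */
static volatile uint8_t button_pressed = 0;

/* On a real microcontroller this would be the interrupt handler for a
   pin-change event; here it is an ordinary function we call to
   simulate the interrupt firing. */
void pin_change_isr(void) {
    button_pressed = 1;   /* record the event and return quickly */
}

/* One pass of the main loop: act on the event, then clear the flag.
   Returns 1 if an event was handled, 0 otherwise. */
int service_events(void) {
    if (button_pressed) {
        button_pressed = 0;
        return 1;
    }
    return 0;
}
```

Keeping the ISR short and deferring work to the main loop is the usual way to keep interrupt latency low for other sources.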


6.1.4 | Programs

Typically microcontroller programs must fit in the available on-chip program memory, since it would be costly to provide a system with external,
expandable, memory. Compilers and assemblers are used to convert high-level
language and assembler language codes into a compact machine code for storage
in the microcontroller's memory. Depending on the device, the program memory
may be permanent, read-only memory that can only be programmed at the factory,
or program memory may be field-alterable flash or erasable read-only memory.

Manufacturers have often produced special versions of their microcontrollers in order to help the hardware and software development of the
target system. Originally these included EPROM versions that have a "window" on
the top of the device through which program memory can be erased by ultraviolet
light, ready for reprogramming after a programming ("burn") and test cycle. Since
1998, EPROM versions are rare and have been replaced by EEPROM and flash,
which are easier to use (can be erased electronically) and cheaper to manufacture.
Other versions may be available where the ROM is accessed as an external
device rather than as internal memory, however these are becoming increasingly
rare due to the widespread availability of cheap microcontroller programmers.

The use of field-programmable devices on a microcontroller may allow field update of the firmware or permit late factory revisions to products that have been
assembled but not yet shipped. Programmable memory also reduces the lead time
required for deployment of a new product.

Where hundreds of thousands of identical devices are required, using parts programmed at the time of manufacture can be an economical option. These "mask
programmed" parts have the program laid down in the same way as the logic of the
chip, at the same time.

A customizable microcontroller incorporates a block of digital logic that can be personalized in order to provide additional processing capability, peripherals and interfaces that are adapted to the requirements of the application. For example, the AT91CAP from Atmel has a block of logic that can be customized during manufacture according to user requirements.

6.1.5 | Other microcontroller features


Microcontrollers usually contain from several to dozens of general purpose input/output pins (GPIO). GPIO pins are software configurable to either an input or
an output state. When GPIO pins are configured to an input state, they are often
used to read sensors or external signals. Configured to the output state, GPIO pins
can drive external devices such as LEDs or motors.
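The input/output configuration just described can be modeled with two simulated 8-bit registers. The register names `DIR` and `OUT` are made up here; on a real part these would be memory-mapped SFRs with vendor-specific names.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated 8-bit GPIO registers (hypothetical names). */
static uint8_t DIR = 0;   /* 1 = output, 0 = input (convention varies) */
static uint8_t OUT = 0;   /* driven levels for pins configured as outputs */

void gpio_make_output(uint8_t pin) { DIR |= (uint8_t)(1u << pin); }
void gpio_set_high(uint8_t pin)    { OUT |= (uint8_t)(1u << pin); }
void gpio_set_low(uint8_t pin)     { OUT &= (uint8_t)~(1u << pin); }

/* Read back the level of one pin from the output latch. */
int gpio_read(uint8_t pin) { return (OUT >> pin) & 1u; }
```

Each bit of the direction register corresponds to one pin, which is why a single 8-bit write can reconfigure a whole port at once.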

Many embedded systems need to read sensors that produce analog signals.
This is the purpose of the analog-to-digital converter (ADC). Since processors are
built to interpret and process digital data, i.e. 1s and 0s, they are not able to do
anything with the analog signals that may be sent to them by a device. So the analog to
digital converter is used to convert the incoming data into a form that the processor
can recognize. A less common feature on some microcontrollers is a digital-to-
analog converter (DAC) that allows the processor to output analog signals or
voltage levels.

In addition to the converters, many embedded microprocessors include a variety of timers as well. One of the most common types of timers is the
Programmable Interval Timer (PIT). A PIT may either count down from some
value to zero, or up to the capacity of the count register, overflowing to zero. Once
it reaches zero, it sends an interrupt to the processor indicating that it has finished
counting. This is useful for devices such as thermostats, which periodically test the
temperature around them to see if they need to turn the air conditioner on, the
heater on, etc.
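The count-down behavior of a PIT can be sketched as a small simulation. The structure and function names are invented; a real PIT is a hardware counter decremented by the clock, with the "interrupt" delivered to the CPU rather than returned from a function.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal model of a count-down programmable interval timer: the count
   register is loaded with a period, decremented once per tick, and an
   "interrupt" is reported when it reaches zero. */
typedef struct {
    uint16_t reload;  /* value reloaded at the start of each period */
    uint16_t count;   /* current value of the count register */
} pit_t;

void pit_start(pit_t *t, uint16_t period) {
    t->reload = period;
    t->count  = period;
}

/* Advance one tick; returns 1 when the period expires (interrupt). */
int pit_tick(pit_t *t) {
    if (--t->count == 0) {
        t->count = t->reload;  /* automatically begin the next period */
        return 1;
    }
    return 0;
}
```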

A dedicated Pulse Width Modulation (PWM) block makes it possible for the
CPU to control power converters, resistive loads, motors, etc., without using lots of
CPU resources in tight timer loops.
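With a hardware PWM block, the CPU's only job is to write a period and a compare value; the block then toggles the pin on its own. A sketch of the arithmetic the CPU performs, assuming a hypothetical 16-bit period/compare register pair:

```c
#include <assert.h>
#include <stdint.h>

/* Given the PWM period register value and a requested duty cycle in
   percent, compute the compare-register value the CPU would write.
   The register layout is hypothetical; real parts differ. */
uint16_t pwm_compare_value(uint16_t period, uint8_t duty_percent) {
    if (duty_percent > 100)
        duty_percent = 100;               /* clamp nonsense requests */
    /* 32-bit intermediate avoids overflow for large periods */
    return (uint16_t)(((uint32_t)period * duty_percent) / 100u);
}
```

After this single register write the CPU is free; no tight timer loop is needed to keep the waveform going.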

A Universal Asynchronous Receiver/Transmitter (UART) block makes it possible to receive and transmit data over a serial line with very little load on the
CPU. Dedicated on-chip hardware also often includes capabilities to communicate
with other devices (chips) in digital formats such as I²C and Serial Peripheral
Interface (SPI).

6.1.6 | Higher integration

Micro-controllers may not implement an external address or data bus as they integrate RAM and non-volatile memory on the same chip as the CPU. Using
fewer pins, the chip can be placed in a much smaller, cheaper package.


Integrating the memory and other peripherals on a single chip and testing them as a
unit increases the cost of that chip, but often results in decreased net cost of the
embedded system as a whole. Even if the cost of a CPU that has integrated
peripherals is slightly more than the cost of a CPU and external peripherals, having
fewer chips typically allows a smaller and cheaper circuit board, and reduces the
labor required to assemble and test the circuit board.

A micro-controller is a single integrated circuit, commonly with the following features:
 central processing unit - ranging from small and simple 4-bit processors to
complex 32- or 64-bit processors
 volatile memory (RAM) for data storage
 ROM, EPROM, EEPROM or Flash memory for program and operating parameter
storage
 discrete input and output bits, allowing control or detection of the logic state of an
individual package pin
 serial input/output such as serial ports (UARTs)
 other serial communications interfaces like I²C, Serial Peripheral Interface and
Controller Area Network for system interconnect
 peripherals such as timers, event counters, PWM generators, and watchdog
 clock generator - often an oscillator for a quartz timing crystal, resonator or RC
circuit
 many include analog-to-digital converters, some include digital-to-analog
converters
 in-circuit programming and debugging support

This integration drastically reduces the number of chips and the amount of
wiring and circuit board space that would be needed to produce equivalent systems
using separate chips. Furthermore, on low pin count devices in particular, each pin
may interface to several internal peripherals, with the pin function selected by
software. This allows a part to be used in a wider variety of applications than if
pins had dedicated functions. Micro-controllers have proved to be highly popular
in embedded systems since their introduction in the 1970s.

Some microcontrollers use a Harvard architecture: separate memory buses for instructions and data, allowing accesses to take place concurrently. Where a
Harvard architecture is used, instruction words for the processor may be a different
bit size than the length of internal memory and registers; for example: 12-bit
instructions used with 8-bit data registers.


The decision of which peripheral to integrate is often difficult. The microcontroller vendors often trade operating frequencies and system design
flexibility against time-to-market requirements from their customers and overall
lower system cost. Manufacturers have to balance the need to minimize the chip
size against additional functionality.

Microcontroller architectures vary widely. Some designs include general-purpose microprocessor cores, with one or more ROM, RAM, or I/O functions
integrated onto the package. Other designs are purpose built for control
applications. A micro-controller instruction set usually has many instructions
intended for bit-wise operations to make control programs more compact. [6] For
example, a general purpose processor might require several instructions to test a bit
in a register and branch if the bit is set, where a micro-controller could have a
single instruction to provide that commonly-required function. Microcontrollers
typically do not have a math coprocessor, so floating point arithmetic is performed
by software.
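The single-instruction bit test mentioned above looks the same in C either way; the difference is in what the compiler emits. A sketch, with the flag name and position invented for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* On a general-purpose CPU, testing a flag bit typically costs a mask,
   a compare and a branch; a microcontroller such as a PIC covers the
   same pattern with one "bit test and skip" instruction (BTFSS/BTFSC).
   The C source is identical in both cases. */
#define READY_BIT 3   /* hypothetical flag position in a status byte */

int is_ready(uint8_t status) {
    return (status >> READY_BIT) & 1u;  /* test a single bit */
}
```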

6.1.7 | Programming environments

Microcontrollers were originally programmed only in assembly language, but various high-level programming languages are now also in common use to
target microcontrollers. These languages are either designed specially for the
purpose, or versions of general purpose languages such as the C programming
language. Compilers for general purpose languages will typically have some
restrictions as well as enhancements to better support the unique characteristics of
microcontrollers. Some microcontrollers have environments to aid developing
certain types of applications. Microcontroller vendors often make tools freely
available to make it easier to adopt their hardware.

Many microcontrollers are so quirky that they effectively require their own
non-standard dialects of C, such as SDCC for the 8051, which prevent using
standard tools (such as code libraries or static analysis tools) even for code
unrelated to hardware features. Interpreters are often used to hide such low level
quirks.

Interpreter firmware is also available for some microcontrollers. For example, BASIC on the early microcontrollers Intel 8052;[7] BASIC and FORTH
on the Zilog Z8[8] as well as some modern devices. Typically these interpreters
support interactive programming.


Simulators are available for some microcontrollers. These allow a developer to analyze what the behavior of the microcontroller and the program should be if they were using the actual part. A simulator will show the internal processor state
and also that of the outputs, as well as allowing input signals to be generated.
While on the one hand most simulators will be limited from being unable to
simulate much other hardware in a system, they can exercise conditions that may
otherwise be hard to reproduce at will in the physical implementation, and can be
the quickest way to debug and analyze problems.

Recent microcontrollers are often integrated with on-chip debug circuitry that, when accessed by an in-circuit emulator via JTAG, allows debugging of the firmware with a debugger.

6.2 | TYPES OF MICROCONTROLLERS

As of 2008 there are several dozen microcontroller architectures and vendors including:
 ARM core processors (many vendors)
o includes ARM9, ARM Cortex-A8, Sitara ARM Microprocessor
 Atmel AVR (8-bit), AVR32 (32-bit), and AT91SAM (32-bit)
 Cypress Semiconductor's M8C Core used in their PSoC (Programmable System-
on-Chip)
 Freescale ColdFire (32-bit) and S08 (8-bit)
 Freescale 68HC11 (8-bit)
 Intel 8051
 Infineon: 8, 16, 32 Bit microcontrollers[9]
 MIPS
 Microchip Technology PIC, (8-bit PIC16, PIC18, 16-bit dsPIC33 / PIC24), (32-bit
PIC32)
 NXP Semiconductors LPC1000, LPC2000, LPC3000, LPC4000 (32-bit), LPC900,
LPC700 (8-bit)
 Parallax Propeller
 PowerPC ISE
 Rabbit 2000 (8-bit)
 Renesas RX, V850, Hitachi H8, Hitachi SuperH (32-bit), M16C (16-bit), RL78,
R8C, 78K0/78K0R (8-bit)
 Silicon Laboratories Pipelined 8-bit 8051 Microcontrollers and mixed-signal
ARM-based 32-bit microcontrollers
 STMicroelectronics STM8 (8-bit), ST10 (16-bit) and STM32 (32-bit)


 Texas Instruments TI MSP430 (16-bit)


 Toshiba TLCS-870 (8-bit/16-bit).

Many others exist, some of which are used in a very narrow range of
applications or are more like applications processors than microcontrollers. The
microcontroller market is extremely fragmented, with numerous vendors,
technologies, and markets. Note that many vendors sell or have sold multiple
architectures.

6.2.1 | Interrupt latency

In contrast to general-purpose computers, microcontrollers used in embedded systems often seek to optimize interrupt latency over instruction
throughput. Issues include both reducing the latency, and making it be more
predictable (to support real-time control).

When an electronic device causes an interrupt, the intermediate results (registers) have to be saved before the software responsible for handling the
interrupt can run. They must also be restored after that software is finished. If there
are more registers, this saving and restoring process takes more time, increasing
the latency. Ways to reduce such context/restore latency include having relatively
few registers in their central processing units (undesirable because it slows down
most non-interrupt processing substantially), or at least having the hardware not
save them all (this fails if the software then needs to compensate by saving the rest
"manually"). Another technique involves spending silicon gates on "shadow
registers": One or more duplicate registers used only by the interrupt software,
perhaps supporting a dedicated stack.

Other factors affecting interrupt latency include:


 Cycles needed to complete current CPU activities. To minimize those costs,
microcontrollers tend to have short pipelines (often three instructions or less),
small write buffers, and ensure that longer instructions are continuable or
restartable. RISC design principles ensure that most instructions take the same
number of cycles, helping avoid the need for most such continuation/restart logic.
 The length of any critical section that needs to be interrupted. Entry to a critical
section restricts concurrent data structure access. When a data structure must be
accessed by an interrupt handler, the critical section must block that interrupt.
Accordingly, interrupt latency is increased by however long that interrupt is
blocked. When there are hard external constraints on system latency, developers often need tools to measure interrupt latencies and track down which critical sections cause slowdowns.
o One common technique just blocks all interrupts for the duration of the critical
section. This is easy to implement, but sometimes critical sections get
uncomfortably long.
o A more complex technique just blocks the interrupts that may trigger access to that
data structure. This is often based on interrupt priorities, which tend to not
correspond well to the relevant system data structures. Accordingly, this technique
is used mostly in very constrained environments.
o Processors may have hardware support for some critical sections. Examples
include supporting atomic access to bits or bytes within a word, or other atomic
access primitives like the LDREX/STREX exclusive access primitives introduced
in the ARMv6 architecture.
 Interrupt nesting. Some microcontrollers allow higher priority interrupts to
interrupt lower priority ones. This allows software to manage latency by giving
time-critical interrupts higher priority (and thus lower and more predictable
latency) than less-critical ones.
 Trigger rate. When interrupts occur back-to-back, microcontrollers may avoid an
extra context save/restore cycle by a form of tail call optimization. Lower end
microcontrollers tend to support fewer interrupt latency controls than higher end
ones.

6.3 | Microcontroller embedded memory technology

Since the emergence of microcontrollers, many different memory technologies have been used. Almost all microcontrollers have at least two
different kinds of memory, a non-volatile memory for storing firmware and a read-
write memory for temporary data.

6.3.1 | Data
From the earliest microcontrollers to today, six-transistor SRAM is almost
always used as the read/write working memory, with a few more transistors per bit
used in the register file. MRAM could potentially replace it as it is 4 to 10 times
denser which would make it more cost effective.
In addition to the SRAM, some microcontrollers also have internal
EEPROM for data storage; and even ones that do not have any (or not enough) are
often connected to an external serial EEPROM chip (such as the BASIC Stamp) or
external serial flash memory chip. A few recent microcontrollers beginning in
2003 have "self-programmable" flash memory.


6.3.2 | Firmware

The earliest microcontrollers used mask ROM to store firmware. Later microcontrollers (such as the early versions of the Freescale 68HC11 and early PIC
microcontrollers) had quartz windows that allowed ultraviolet light in to erase the
EPROM. The Microchip PIC16C84, introduced in 1993,[10] was the first
microcontroller to use EEPROM to store firmware. In the same year, Atmel
introduced the first microcontroller using NOR Flash memory to store firmware.

6.4 | PIC MICROCONTROLLER

PIC is a family of modified Harvard architecture microcontrollers made by Microchip Technology, derived from the PIC1650 originally developed by General
Instrument's Microelectronics Division. The name PIC initially referred to
"Peripheral Interface Controller".

PICs are popular with both industrial developers and hobbyists alike due to
their low cost, wide availability, large user base, extensive collection of application
notes, availability of low cost or free development tools, and serial programming
(and re-programming with flash memory) capability. Microchip announced in
September 2011 the shipment of its ten billionth PIC processor.

6.4.1 | Family core architectural differences

PIC microcontrollers have a Harvard architecture, and instruction words come in unusual sizes. Originally, 12-bit instructions included 5 address bits to specify the
memory operand, and 9-bit branch destinations. Later revisions added opcode bits,
allowing additional address bits.

6.4.1.1 | Baseline core devices (12 bit)

These devices feature a 12-bit wide code memory, a 32-byte register file,
and a tiny two level deep call stack. They are represented by the PIC10 series, as
well as by some PIC12 and PIC16 devices. Baseline devices are available in 6-pin
to 40-pin packages.

Generally the first 7 to 9 bytes of the register file are special-purpose registers, and the remaining bytes are general purpose RAM. Pointers are implemented using a register pair: after writing an address to the FSR (file select
register), the INDF (indirect f) register becomes an alias for the addressed register.
If banked RAM is implemented, the bank number is selected by the high 3 bits of
the FSR. This affects register numbers 16–31; registers 0–15 are global and not
affected by the bank select bits.

Because of the very limited register space (5 bits), 4 rarely-read registers were not assigned addresses, but written by special instructions (OPTION and
TRIS).

The ROM address space is 512 words (12 bits each), which may be extended
to 2048 words by banking. CALL and GOTO instructions specify the low 9 bits of
the new code location; additional high-order bits are taken from the status register.
Note that a CALL instruction only includes 8 bits of address, and may only specify
addresses in the first half of each 512-word page.

Lookup tables are implemented using a computed GOTO (assignment to the PCL register) into a table of RETLW instructions.
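The C analog of such a table is simply an indexed constant array: where the assembly does a computed jump into a row of RETLW ("return with literal in W") instructions, C indexes read-only data. The table contents below are an arbitrary 7-segment-style pattern chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Constant table standing in for a row of RETLW instructions. */
static const uint8_t seven_seg[4] = { 0x3F, 0x06, 0x5B, 0x4F };

uint8_t table_lookup(uint8_t index) {
    if (index >= 4)
        return 0;               /* guard: a computed GOTO has no bounds check */
    return seven_seg[index];    /* equivalent of "RETLW seven_seg[index]" */
}
```

On the baseline parts the bounds check matters: a computed GOTO with an out-of-range index simply jumps to whatever instruction is there.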

The instruction set is as follows. Register numbers are referred to as "f", while constants are referred to as "k". Bit numbers (0–7) are selected by "b". The
"d" bit selects the destination: 0 indicates W, while 1 indicates that the result is
written back to source register f. The C and Z status flags may be set based on the
result; otherwise they are unmodified. Add and subtract (but not rotate)
instructions that set C also set the DC (digit carry) flag, the carry from bit 3 to bit
4, which is useful for BCD arithmetic.
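The flag behavior described above can be modeled on a host machine. This is a sketch of how ADDWF-style addition updates Z, C and DC, with the struct and function names invented for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Model of the three flags an add instruction updates on these parts:
   Z (zero), C (carry out of bit 7) and DC (digit carry out of bit 3,
   used for BCD arithmetic). */
typedef struct { uint8_t z, c, dc; } flags_t;

uint8_t pic_add(uint8_t a, uint8_t b, flags_t *f) {
    uint16_t sum = (uint16_t)a + b;
    f->c  = sum > 0xFF;                        /* carry from bit 7 */
    f->dc = ((a & 0x0F) + (b & 0x0F)) > 0x0F;  /* carry from bit 3 to bit 4 */
    f->z  = (uint8_t)sum == 0;
    return (uint8_t)sum;
}
```

For example, adding 0x08 and 0x09 produces 0x11 with DC set, which is exactly the signal BCD code needs to apply a decimal adjust.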

Third-party clones (13 bit)

ELAN Microelectronics Corp. makes a series of PICmicro-like microcontrollers with a 13-bit instruction word.[11] The instructions are mostly
compatible with the mid-range 14-bit instruction set, but limited to a 6-bit register
address (16 special-purpose registers and 48 bytes of RAM) and a 10-bit (1024
word) program space.
The 7 accumulator-immediate instructions are renumbered relative to the 14-
bit PICmicro, to fit into 3 opcode bits rather than 4, but they are all there, as well as
an additional software interrupt instruction.
There are a few additional miscellaneous instructions, and there are some
changes to the terminology (the PICmicro OPTION register is called the Control register; the PICmicro TRIS registers are called I/O control registers), but the
equivalents are obvious.

Mid-range core devices (14 bit)


These devices feature a 14-bit wide code memory, and an improved 8 level
deep call stack. The instruction set differs very little from the baseline devices, but
the 2 additional opcode bits allow 128 registers and 2048 words of code to be
directly addressed. There are a few additional miscellaneous instructions, and two
additional 8-bit literal instructions, add and subtract. The mid-range core is
available in the majority of devices labeled PIC12 and PIC16.
The first 32 bytes of the register space are allocated to special-purpose
registers; the remaining 96 bytes are used for general-purpose RAM. If banked
RAM is used, the high 16 registers (0x70–0x7F) are global, as are a few of the
most important special-purpose registers, including the STATUS register which
holds the RAM bank select bits. (The other global registers are FSR and INDF, the
low 8 bits of the program counter PCL, the PC high preload register PCLATH, and
the master interrupt control register INTCON.)
The PCLATH register supplies high-order instruction address bits when the
8 bits supplied by a write to the PCL register, or the 11 bits supplied by a GOTO or
CALL instruction, is not sufficient to address the available ROM space.

Enhanced Mid-range core devices (14 bit)


Enhanced Mid-range core devices introduce a deeper hardware stack,
additional reset methods, 14 additional instructions and ‘C’ programming language
optimizations. In particular, there are two INDF registers (INDF0 and INDF1),
and two corresponding FSR register pairs (FSRnL and FSRnH). Special
instructions use FSRn registers like address registers, with a variety of addressing
modes.

PIC17 high end core devices (16 bit)


The 17 series never became popular and has been superseded by the PIC18
architecture. It is not recommended for new designs, and availability may be
limited.
Improvements over earlier cores are 16-bit wide opcodes (allowing many
new instructions), and a 16 level deep call stack. PIC17 devices were produced in
packages from 40 to 68 pins.
The 17 series introduced a number of important new features:
 a memory mapped accumulator
 read access to code memory (table reads)


 direct register to register moves (prior cores needed to move registers through the
accumulator)
 an external program memory interface to expand the code space
 an 8-bit × 8-bit hardware multiplier
 a second indirect register pair
 auto-increment/decrement addressing controlled by control bits in a status register
(ALUSTA)

PIC18 high end core devices (16 bit)

Microchip introduced the PIC18 architecture in 2000.[4] Unlike the 17 series, it has proven to be very popular, with a large number of device variants
presently in manufacture. In contrast to earlier devices, which were more often
than not programmed in assembly, C has become the predominant development
language.[5]
The 18 series inherits most of the features and instructions of the 17 series,
while adding a number of important new features:
 call stack is 21 bits wide and much deeper (31 levels deep)
 the call stack may be read and written (TOSU:TOSH:TOSL registers)
 conditional branch instructions
 indexed addressing mode (PLUSW)
 extending the FSR registers to 12 bits, allowing them to linearly address the entire
data address space
 the addition of another FSR register (bringing the number up to 3)
The RAM space is 12 bits, addressed using a 4-bit bank select register and an 8-
bit offset in each instruction. An additional "access" bit in each instruction selects
between bank 0 (a=0) and the bank selected by the BSR (a=1).
A 1-level stack is also available for the STATUS, WREG and BSR registers.
They are saved on every interrupt, and may be restored on return. If interrupts are
disabled, they may also be used on subroutine call/return by setting the s bit
(appending ", FAST" to the instruction).
The auto increment/decrement feature was improved by removing the control
bits and adding four new indirect registers per FSR. Depending on which indirect
file register is being accessed it is possible to post decrement, post increment, or
preincrement FSR; or form the effective address by adding W to FSR.
In more advanced PIC18 devices, an "extended mode" is available which makes
the addressing even more favorable to compiled code:
 a new offset addressing mode; some addresses which were relative to the access
bank are now interpreted relative to the FSR2 register


 the addition of several new instructions, notable for manipulating the FSR
registers.
These changes were primarily aimed at improving the efficiency of a data stack
implementation. If FSR2 is used either as the stack pointer or frame pointer, stack
items may be easily indexed—allowing more efficient re-entrant code. Microchip's
MPLAB C18 C compiler chooses to use FSR2 as a frame pointer.
PIC24 and dsPIC 16-bit microcontrollers
In 2001, Microchip introduced the dsPIC series of chips, which entered mass
production in late 2004. They are Microchip's first inherently 16-bit
microcontrollers. PIC24 devices are designed as general purpose microcontrollers.
dsPIC devices additionally include digital signal processing capabilities.
Architecturally, although they share the PIC moniker, they are very different
from the 8-bit PICs. The most notable differences are:[15]
 they feature a set of 16 working registers (W0-W15)
 they fully support a stack in RAM, and do not have a hardware stack
 bank switching is not required to access RAM or special function registers
 data stored in program memory can be accessed directly using a feature called
Program Space Visibility
 interrupt sources may be assigned to distinct handlers using an interrupt vector
table
Some features are:
 hardware MAC (multiply–accumulate)
 barrel shifting
 bit reversal
 (16×16)-bit single-cycle multiplication and other DSP operations
 hardware divide assist (19 cycles for 16/32-bit divide)
 hardware support for loop indexing
 Direct memory access
dsPICs can be programmed in C using Microchip's C30 compiler which is a variant
of gcc.

PIC32 32-bit microcontrollers


In November 2007 Microchip introduced the new PIC32MX family of 32-
bit microcontrollers. The initial device line-up is based on the industry standard
MIPS32 M4K Core[6]. The device can be programmed using the Microchip
MPLAB C Compiler for PIC32 MCUs, a variant of the GCC compiler. The first 18
models currently in production (PIC32MX3xx and PIC32MX4xx) are pin to pin
compatible and share the same peripherals set with the PIC24FxxGA0xx family of


(16-bit) devices allowing the use of common libraries, software and hardware
tools.
The PIC32 architecture brings a number of new features to the Microchip portfolio,
including:
 The highest execution speed 80 MIPS (120+[16] Dhrystone MIPS @ 80 MHz)
 The largest flash memory: 512 Kbytes
 One instruction per clock cycle execution
 The first cached processor
 Allows execution from RAM
 Full Speed Host/Dual Role and OTG USB capabilities
 Full JTAG and 2 wire programming and debugging
 Real-time trace

6.5 | PIC COMPONENT

6.5.1 | Logic Circuits

Some of the program instructions give the same results as logic gates. The
principle of their operation will be discussed in the text below.

AND Gate

Fig.(6.2)

The logic gate ‘AND’ has two or more inputs and one output. Let us
presume that the gate used in this example has only two inputs. A logic one (1) will
appear on its output only if both inputs (A AND B) are driven high (1). The table on the right shows the mutual dependence between the inputs and the output.

Fig.(6.3)

When used in a program, a logic AND operation is performed by the program instruction, which will be discussed later. For the time being, it is enough to remember that logic AND in a program refers to the corresponding bits of two registers.

OR GATE

Fig.(6.4)

Similarly, OR gates also have two or more inputs and one output. If the gate has only two inputs, the following applies: a logic one (1) will appear on its output if either input (A OR B) is driven high (1). If the OR gate has more than two inputs, a logic one (1) appears on its output if at least one
zero (0) as well.

Fig.(6.5)

In the program, a logic OR operation is performed in the same manner as a logic AND operation.

NOT GATE
The logic gate NOT has only one input and only one output. It operates in an extremely simple way. When a logic zero (0) appears on its input, a logic one (1) appears on its output and vice versa. This gate inverts the signal, and is therefore often called an inverter.

Fig.(6.6)


Fig.(6.7)

In the program, a logic NOT operation is performed upon one byte. The result is a byte with inverted bits. If the byte's bits are considered to be a number, the inverted value is its complement. The complement of a number is the value which, when added to the number, produces the largest 8-bit binary number. In other words, the sum of an 8-bit number and its complement is always 255.
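The complement rule above is easy to verify in C, where the bitwise NOT operator `~` performs exactly this inversion:

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise complement of an 8-bit value: every bit is inverted, so the
   value plus its complement always has all eight bits set (255). */
uint8_t complement(uint8_t x) {
    return (uint8_t)~x;
}
```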

EXCLUSIVE OR GATE

Fig.(6.8)

The EXCLUSIVE OR (XOR) gate is somewhat more complicated compared to the other gates. It represents a combination of all of them. A logic one (1) appears on its output only when its inputs have different logic states.

Fig.(6.9)

In the program, this operation is commonly used to compare two bytes. Subtraction may be used for the same purpose (if the result is 0, the bytes are equal). Unlike subtraction, the advantage of this logic operation is that it cannot produce negative results.
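The XOR comparison works because the result is zero exactly when every bit pair matches; any set bits mark the positions that differ. In C:

```c
#include <assert.h>
#include <stdint.h>

/* Two bytes are equal iff their XOR is zero. */
int bytes_equal(uint8_t a, uint8_t b) {
    return (a ^ b) == 0;
}

/* As a bonus, the XOR result itself shows which bit positions differ. */
uint8_t differing_bits(uint8_t a, uint8_t b) {
    return a ^ b;
}
```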

REGISTER
In short, a register or a memory cell is an electronic circuit which can
memorize the state of one byte.


Fig.(6.10)

SFR REGISTERS
In addition to registers which do not have any special and predetermined
function, every microcontroller has a number of registers (SFR) whose function is
predetermined by the manufacturer. Their bits are connected (literally) to internal
circuits of the microcontroller such as timers, A/D converter, oscillators and others,
which means that they are directly in command of the operation of these circuits,
i.e. the microcontroller. Imagine eight switches which control the operation of a
small circuit within the microcontroller: Special Function Registers do exactly that.

Fig.(6.11)

In other words, the state of the register bits is changed from within the program, the registers run small circuits within the microcontroller, and these circuits are connected via the microcontroller pins to peripheral electronics which is used for... well, it’s up to you.

INPUT / OUTPUT PORTS


In order to make the microcontroller useful, it has to be connected to
additional electronics, i.e. peripherals. Each microcontroller has one or more
registers (called ports) connected to the microcontroller pins. Why input/output?
Because you can change a pin function as you wish. For example, suppose you
want your device to turn on/off three signal LEDs and simultaneously monitor the
logic state of five sensors or push buttons. Some of the ports need to be configured
so that there are three outputs (connected to LEDs) and five inputs (connected to sensors). It is simply performed by software, which means that a pin function can
be changed during operation.

Fig.(6.12)

One of the important specifications of input/output (I/O) pins is the maximum
current they can handle. For most microcontrollers, the current obtained from one
pin is sufficient to activate an LED or some other low-current device (10-20 mA).
The more I/O pins there are, the lower the maximum current per pin. In other
words, the maximum current stated in the data specifications sheet for the
microcontroller is shared across all I/O ports.
Another important pin function is that it can have pull-up resistors. These
resistors connect pins to the positive power supply voltage and come into effect
when the pin is configured as an input connected to a mechanical switch or a push
button. Newer versions of microcontrollers have pull-up resistors configurable by
software.
Each I/O port is usually under control of the specialized SFR, which means
that each bit of that register determines the state of the corresponding
microcontroller pin. For example, by writing logic one (1) to a bit of the control
register (SFR), the appropriate port pin is automatically configured as an input and
voltage brought to it can be read as logic 0 or 1. Conversely, by writing a zero to
the SFR, the appropriate port pin is configured as an output; its voltage (0V or 5V)
corresponds to the state of the appropriate port register bit.
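The direction-register convention described above can be sketched as follows (a software model of such an SFR; the variable and function names are hypothetical and no real hardware is touched):

```c
#include <stdint.h>

/* Model of an 8-bit data-direction SFR: writing 1 to a bit configures the
   corresponding pin as an input, writing 0 configures it as an output. */
static uint8_t direction_reg = 0xFF;          /* reset state: all inputs */

static void pin_as_output(uint8_t pin) { direction_reg &= (uint8_t)~(1u << pin); }
static void pin_as_input(uint8_t pin)  { direction_reg |=  (uint8_t)(1u << pin); }
```

For the LED/sensor example above, clearing bits 0-2 would make three output pins for the LEDs while the remaining five bits stay inputs for the sensors.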

MEMORY UNIT
Memory is part of the microcontroller used for data storage. The easiest way
to explain it is to compare it with a filing cabinet with many drawers. Suppose, the
drawers are clearly marked so that their contents can be easily found out by
reading the label on the front of the drawer.


Fig.(6.13)

Similarly, each memory address corresponds to one memory location. The
contents of any location can be accessed and read by addressing it. Memory can
either be written to or read from. There are several types of memory within the
microcontroller:

READ ONLY MEMORY (ROM)


Read Only Memory (ROM) is used to permanently save the program being
executed. The size of program that can be written depends on the size of this
memory. Today’s microcontrollers commonly use 16-bit addressing, which means
that they are able to address up to 64 KB of memory, i.e. 65,536 locations. As a
novice, your program will rarely exceed the limit of several hundred instructions.
There are several types of ROM.

Masked ROM (MROM)


Masked ROM is a kind of ROM the content of which is programmed by the
manufacturer. The term ‘masked’ comes from the manufacturing process, where
regions of the chip are masked off before the process of photolithography. In case
of large-scale production, the price is very low. Forget it...

One Time Programmable ROM (OTP ROM)

One time programmable ROM enables you to download a program into it,
but, as its name states, one time only. If an error is detected after downloading, the
only thing you can do is to download the correct program to another chip.

UV Erasable Programmable ROM (UV EPROM)

Both the manufacturing process and characteristics of this memory are
completely identical to OTP ROM. However, the package of the microcontroller
with this memory has a recognizable ‘window’ on its top side. It enables data to be
erased under strong ultraviolet light. After a few minutes it is possible to download
a new program into it.


Installation of this window is complicated and normally raises the price, which,
from our point of view, is unfortunately a drawback.

Flash Memory
This type of memory was invented in the 1980s in the laboratories of INTEL
and was presented as the successor to the UV EPROM. Since the content of this
memory can be written and cleared practically an unlimited number of times,
microcontrollers with Flash ROM are ideal for learning, experimentation and
small-scale production. Because of its great popularity, most microcontrollers are
manufactured in flash technology today. So, if you are going to buy a
microcontroller, the type to look for is definitely flash!

RANDOM ACCESS MEMORY (RAM)


Once the power supply is turned off, the contents of RAM are cleared. It is used
for temporarily storing data and intermediate results created and used during the
operation of the microcontroller. For example, if the program performs an addition
(of whatever), it is necessary to have a register representing what in everyday life
is called the ‘sum’. For this reason, one of the registers of RAM is called the ‘sum’
and used for storing results of addition.

ELECTRICALLY ERASABLE PROGRAMMABLE ROM (EEPROM)


The contents of EEPROM may be changed during operation (similar to
RAM), but remains permanently saved even after the loss of power (similar to
ROM). Accordingly, EEPROM is often used to store values, created during
operation, which must be permanently saved. For example, if you design an
electronic lock or an alarm, it would be great to enable the user to create and enter
the password, but it would be useless if it were lost every time the power supply goes off. The
ideal solution is a microcontroller with an embedded EEPROM.

INTERRUPT
Most programs use interrupts in their regular execution. The purpose of the
microcontroller is mainly to respond to changes in its surrounding. In other words,
when an event takes place, the microcontroller does something... For example,
when you push a button on a remote controller, the microcontroller will register it
and respond by changing the channel, turning the volume up or down, etc. If the
microcontroller spent most of its time endlessly checking a few buttons for hours
or days, it would not be practical at all.


This is why the microcontroller has learnt a trick during its evolution.
Instead of checking each pin or bit constantly, the microcontroller delegates the
‘wait issue’ to a ‘specialist’ which will respond only when something attention
worthy happens. The signal which informs the central processor unit about such an
event is called an INTERRUPT.
CENTRAL PROCESSOR UNIT (CPU)
As its name suggests, this is a unit which monitors and controls all processes
within the microcontroller. It consists of several subunits, of which the most
important are:
 Instruction Decoder is a part of electronics which decodes program instructions
and runs other circuits on the basis of that. The ‘instruction set’ which is different
for each microcontroller family expresses the abilities of this circuit;
 Arithmetical Logical Unit (ALU) performs all mathematical and logical operations
upon data; and
 Accumulator is an SFR closely related to the operation of the ALU. It is a kind of
working desk used for storing all data upon which some operation should be
performed (addition, shift/move etc.). It also stores results ready for use in further
processing. One of the SFRs, called a Status Register (PSW), is closely related to
the accumulator. It shows at any given time the ‘status’ of the number stored in the
accumulator (whether the number is larger or less than zero, etc.). The accumulator
is therefore also called the working register, marked as the W register or just W.

Fig.(6.14)

BUS
A bus consists of 8, 16 or more wires. There are two types of buses: the
address bus and the data bus. The address bus consists of as many lines as
necessary for memory addressing. It is used to transmit address from the CPU to
the memory. The data bus is as wide as the data; in our case it is 8 bits (8 wires)
wide. It is used to connect all the circuits within the microcontroller.

SERIAL COMMUNICATION


Parallel connection between the microcontroller and peripherals via
input/output ports is the ideal solution for shorter distances, up to several meters.
However, in other cases when it is necessary to establish communication between
two devices on longer distances it is not possible to use parallel connection.
Instead, serial communication is used.
Today, most microcontrollers have several different systems for serial
communication built in as standard equipment. Which of these systems will be used
depends on many factors of which the most important are:
 How many devices the microcontroller has to exchange data with?
 How fast the data exchange has to be?
 What is the distance between devices?
 Is it necessary to send and receive data simultaneously?

Fig.(6.15)

One of the most important things concerning serial communication is the
protocol, which should be strictly observed. It is a set of rules which must be
applied so that devices can correctly interpret the data they mutually exchange.
Fortunately, the microcontroller automatically takes care of this, so that the work
of the programmer/user is reduced to simple write (data to be sent) and read
(received data).

BAUD RATE
The term baud rate denotes the number of bits transferred per second [bps].
Note that it refers to bits, not bytes. The protocol usually requires each byte to be
transferred along with several control bits, which means that one byte in a serial
data stream may consist of 11 bits. For example, if the baud rate is 300 bps, then at
most 37 and at least 27 bytes may be transferred per second. The most commonly
used serial communication systems are:


I2C (INTER INTEGRATED CIRCUIT)


Inter-integrated circuit is a system for serial data exchange between the
microcontrollers and specialized integrated circuits of a new generation. It is used
when the distance between them is short (receiver and transmitter are usually on
the same printed board). Connection is established via two conductors. One is used
for data transfer, the other is used for synchronization (the clock signal). As seen
in the figure below, one device is always the master. It performs addressing of one slave
chip before communication starts. In this way one microcontroller can
communicate with 112 different devices. The baud rate is usually 100 kbit/s
(standard mode) or 10 kbit/s (slow mode). Systems with a baud rate of 3.4
Mbit/s have recently appeared. The distance between devices which communicate
over an I2C bus is limited to several meters.

Fig.(6.16)

SPI (SERIAL PERIPHERAL INTERFACE BUS)


A serial peripheral interface (SPI) bus is a system for serial communication
which uses up to four conductors, commonly three. One conductor is used for data
receiving, one for data sending, one for synchronization and one alternatively for
selecting a device to communicate with. It is a full duplex connection, which
means that data is sent and received simultaneously. The maximum baud rate is
higher than that in the I2C communication system.

Fig.(6.17)

UART (UNIVERSAL ASYNCHRONOUS RECEIVER/TRANSMITTER)


This sort of communication is asynchronous, which means that a special line
for transferring the clock signal is not used. In some applications, such as radio
links or infrared remote control, this feature is crucial. Since only one
communication line is used, both receiver and transmitter operate at the same
predefined rate in order to maintain necessary synchronization. This is a very
simple way of transferring data since it basically represents the conversion of 8-bit
data from parallel to serial format. Baud rate is not high, up to 1 Mbit/sec.
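The parallel-to-serial conversion mentioned above can be sketched as follows (a hypothetical illustration of one common frame layout: one start bit, eight data bits sent LSB first, one stop bit):

```c
#include <stdint.h>

/* Pack one data byte into a 10-bit asynchronous frame.
   Bit 0 (sent first) is the start bit (0), bits 1-8 carry the data
   LSB first, and bit 9 is the stop bit (1). */
static uint16_t uart_frame(uint8_t data) {
    return (uint16_t)((1u << 9) | ((uint16_t)data << 1));
}
```

Note that only 8 of the 10 transmitted bits carry data, which is why the effective byte rate falls below baud rate divided by 8.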

OSCILLATOR

Fig.(6.18)

The evenly spaced pulses generated by the oscillator enable the synchronized
operation of all circuits within the microcontroller. The oscillator is usually
configured so as to use quartz crystal or ceramic resonator for frequency stability,
but it can also operate as a stand-alone circuit (like RC oscillator). It is important to
say that instructions are not executed at the rate imposed by the oscillator itself, but
several times slower. It happens because each instruction is executed in several
steps. In some microcontrollers, the same number of cycles is needed to execute all
instructions, while in others, the number of cycles is different for different
instructions. Accordingly, if the system uses a quartz crystal with a frequency of 20
MHz, the execution time of an instruction is not 50 ns, but 200, 400 or 800 ns,
depending on the type of MCU!
EXTERNAL OSCILLATOR IN EC MODE
The external clock (EC) mode uses an external oscillator as the clock source.
The maximum frequency of this clock is limited to 20 MHz.

Fig.(6.19)

The advantages of the external oscillator when configured to operate in EC mode:


 The independent external clock source is connected to the OSC1 input and the
OSC2 is available as a general purpose I/O;
 It is possible to synchronize the operation of the microcontroller with the rest of
the on-board electronics;
 In this mode the microcontroller starts operation immediately after the power is on.
No time delay is required for frequency stabilization; and
 Temporarily disabling the external clock source causes the device to stop operation,
while leaving all data intact. After restarting the external clock, the device
proceeds with operation as if nothing had happened.

Fig.(6.20)

EXTERNAL OSCILLATOR IN LP, XT OR HS MODE

The LP, XT and HS modes use an external oscillator as the clock source, the
frequency of which is determined by a quartz crystal or ceramic resonator
connected to the OSC1 and OSC2 pins. Depending on the features of the
component in use, select one of the following modes:
 LP mode - (Low Power) is used for low-frequency quartz crystals only. This mode
is designed to drive only 32.768 kHz crystals, usually embedded in quartz watches.
It is easy to recognize them by their small size and specific cylindrical shape. The
current consumption is the lowest of the three modes.
 XT mode is used for intermediate-frequency quartz crystals up to 8 MHz. The
current consumption is the medium of the three modes.
 HS mode - (High Speed) is used for high-frequency quartz crystals over 8 MHz.
The current consumption is the highest of the three modes.

Fig.(6.21)


CERAMIC RESONATORS IN XT OR HS MODE


Ceramic resonators are similar in their features to quartz crystals and are
therefore connected in the same way. Unlike quartz crystals, they are cheaper, and
oscillators containing them have slightly poorer characteristics. They are used for
clock frequencies ranging from 100 kHz to 20 MHz.

Fig.(6.22)

EXTERNAL OSCILLATOR IN RC AND RCIO MODE


There are certainly many advantages in using elements for frequency
stabilization, but sometimes they are really unnecessary. In most cases the
oscillator may operate at frequencies not precisely defined so that embedding of
such elements is a waste of money. The simplest and cheapest solution in these
situations is to use one resistor and one capacitor for the operation of oscillator.
There are two modes:

Fig.(6.23)

RC mode. When the external oscillator is configured to operate in RC
mode, the OSC1 pin should be connected to the RC circuit as shown in the figure.
The OSC2 pin outputs the RC oscillator frequency divided by 4. This
signal may be used for calibration, synchronization or other application
requirements.

Fig.(6.24)


RCIO mode. Likewise, the RC circuit is connected to the OSC1 pin. This time,
the available OSC2 pin is used as an additional general-purpose I/O pin. In both
cases, it is recommended to use components as shown in figure. The frequency of
such an oscillator is calculated according to the formula f = 1/T in which:
 f = frequency [Hz];
 T = R * C = time constant [s];
 R = resistance of the resistor [Ω]; and
 C = capacitance of the capacitor [F].
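The formula above can be checked numerically (a sketch; the component values are example assumptions):

```c
/* Approximate RC oscillator frequency: f = 1 / (R * C).
   With R = 10 kΩ and C = 100 nF, the time constant T is 1 ms,
   giving roughly f = 1 kHz. */
static double rc_frequency(double r_ohms, double c_farads) {
    return 1.0 / (r_ohms * c_farads);
}
```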

6.5.2 | Power Supply Circuit

There are two things worth attention concerning the microcontroller power
supply circuit:
 Brown out is a potentially dangerous condition which occurs at the moment the
microcontroller is being turned off or when the power supply voltage drops to a
minimum due to electrical noise. As the microcontroller consists of several circuits
with different operating voltage levels, this state can cause it to behave
unpredictably. In order to prevent this, the microcontroller usually has a built-in
circuit for brown out reset which resets the whole electronics as soon as the
microcontroller incurs a state of emergency.
 Reset pin is usually marked as MCLR (Master Clear Reset). It is used for external
reset of the microcontroller by applying a logic zero (0) or one (1) to it, which
depends on the type of the microcontroller. In case the brown out circuit is not
built in, a simple external circuit for brown out reset can be connected to the
MCLR pin.

TIMERS/COUNTERS
The microcontroller oscillator uses quartz crystal for its operation. Even
though it is not the simplest solution, there are many reasons to use it. The
frequency of such oscillator is precisely defined and very stable, so that pulses it
generates are always of the same width, which makes them ideal for time
measurement. Such oscillators are also used in quartz watches. If it is necessary to
measure time between two events, it is sufficient to count up pulses generated by
this oscillator. This is exactly what the timer does.

Most programs use these miniature electronic ‘stopwatches’. These are
commonly 8- or 16-bit SFRs, the contents of which are automatically incremented
by each coming pulse. Once a register is completely loaded, an interrupt may be
generated!


If the timer uses an internal quartz oscillator for its operation then it can be
used to measure time between two events (if the register value is T1 at the moment
measurement starts, and T2 at the moment it terminates, then the elapsed time is
equal to the result of the subtraction T2-T1). If the register uses pulses coming from
an external source, then such a timer is turned into a counter. This is only a simple
explanation of the operation itself. It is, however, more complicated in practice.
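The T2-T1 subtraction above can be sketched in C; unsigned arithmetic even handles a single wrap of the 8-bit register (a hypothetical helper):

```c
#include <stdint.h>

/* Elapsed ticks between two readings of an 8-bit timer register.
   Modulo-256 subtraction gives the right answer even if the timer
   wrapped past 255 once between the two readings. */
static uint8_t elapsed_ticks(uint8_t t1, uint8_t t2) {
    return (uint8_t)(t2 - t1);
}
```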

Fig.(6.25)

HOW DOES THE TIMER OPERATE?


In practice, the pulses generated by the quartz oscillator are, once per
machine cycle, brought directly or via a prescaler to the circuit which increments
the number stored in the timer register. If one instruction (one machine cycle) lasts
for four quartz oscillator periods, then this number will be incremented a million
times per second (every microsecond) when a quartz crystal with a frequency of
4 MHz is embedded.

Fig.(6.26)

It is easy to measure short time intervals, up to 256 microseconds, in the way
described above, because 255 is the largest number that one 8-bit register can
store. This restriction may be easily overcome in several ways, such as by using a
slower oscillator, registers with more bits, a prescaler or interrupts. The first two
solutions have some weaknesses, so it is recommended to use a prescaler or
interrupts.

USING A PRESCALER IN TIMER OPERATION


A prescaler is an electronic device used to reduce a frequency by a
predetermined factor. In order to generate one pulse on its output, it is necessary to
bring 1, 2, 4 or more pulses to its input. Most microcontrollers have one or more
prescalers built in, and their division rate may be changed from within the
program. The prescaler is used when it is necessary to measure longer periods of
time. If one prescaler is shared by the timer and the watchdog timer, it cannot be
used by both of them simultaneously.
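A 1:4 prescaler, as described above, might be simulated like this (the names are hypothetical; this only models the behaviour in software):

```c
#include <stdint.h>

/* A prescaled timer: the 8-bit 'ticks' register advances once for every
   'divide' input pulses, so a 1:4 prescaler quarters the input frequency. */
typedef struct {
    uint8_t count;   /* pulses seen since the last output pulse */
    uint8_t divide;  /* division factor, e.g. 4 for a 1:4 prescaler */
    uint8_t ticks;   /* the timer register itself */
} prescaled_timer;

static void clock_pulse(prescaled_timer *t) {
    if (++t->count >= t->divide) {
        t->count = 0;
        t->ticks++;              /* one output pulse per 'divide' inputs */
    }
}
```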

Fig.(6.27)

USING INTERRUPT IN TIMER OPERATION


If the timer register consists of 8 bits, the largest number it can store is 255.
For 16-bit registers it is 65,535. If this number is exceeded, the
timer will be automatically reset and counting will start at zero again. This
condition is called an overflow. If enabled from within the program, the overflow
can cause an interrupt, which gives completely new possibilities. For example, the
state of registers used for counting seconds, minutes or days can be changed in an
interrupt routine. The whole process (except for interrupt routine) is automatically
performed behind the scenes, which enables the main circuits of the
microcontroller to operate normally.
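The overflow mechanism can be sketched as follows (a software model; the overflow counter stands in for the work an interrupt routine would do):

```c
#include <stdint.h>

/* An 8-bit timer: incrementing wraps 255 back to 0, and each wrap
   (overflow) is counted, just as an interrupt routine would count
   seconds, minutes or days. */
static uint8_t  timer_reg = 0;
static unsigned overflow_count = 0;

static void timer_pulse(void) {
    if (++timer_reg == 0)        /* 255 -> 0: overflow occurred */
        overflow_count++;
}
```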

Fig.(6.28)


This figure illustrates the use of an interrupt in timer operation. Delays of
arbitrary duration, having almost no influence on the main program execution, can
be easily obtained by assigning the prescaler to the timer.

COUNTERS
If the timer receives pulses from the microcontroller input pin, then it turns
into a counter. Obviously, it is the same electronic circuit, able to operate in two
different modes. The only difference is that in this case the pulses to be counted
come over the microcontroller input pin and their duration (width) is mostly
undefined. This is why they cannot be used for time measurement, but for other
purposes such as counting products on an assembly line, the number of axle
rotations, passengers, etc. (depending on the sensor in use).

WATCHDOG TIMER
A watchdog timer is a timer connected to a completely separate RC
oscillator within the microcontroller. If the watchdog timer is enabled, every time
it counts up to its maximum value, a microcontroller reset occurs and program
execution starts from the first instruction. The point is to prevent this from
happening by using a specific command.
Anyway, the whole idea is based on the fact that every program is executed
in several longer or shorter loops. If the instructions which reset the watchdog
timer are placed at appropriate program locations, besides the commands being
regularly executed, then the operation of the watchdog timer will not affect the
program execution. If for any reason, usually electrical noise in industrial
environments, the program counter ‘gets stuck’ at some memory location from
which there is no return, the watchdog timer will not be cleared, so the register’s
constantly incremented value will reach its maximum, et voilà! A reset occurs!
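The watchdog idea can be sketched in software (hypothetical names; a real watchdog runs from its own RC oscillator in hardware):

```c
#include <stdint.h>
#include <stdbool.h>

/* Watchdog model: the counter is incremented continuously, and if it ever
   wraps (reaches its maximum and overflows), a reset is triggered.
   A well-behaved program calls watchdog_clear() often enough to prevent it. */
static uint8_t watchdog = 0;
static bool    reset_triggered = false;

static void watchdog_tick(void)  { if (++watchdog == 0) reset_triggered = true; }
static void watchdog_clear(void) { watchdog = 0; }   /* the 'specific command' */
```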

Fig.(6.29)

A/D CONVERTER


Fig.(6.30)

External signals are usually fundamentally different from those the
microcontroller understands (ones and zeros), and therefore have to be converted
into values understandable to the microcontroller. An analogue-to-digital
converter is an electronic circuit which converts continuous signals to discrete
digital numbers. In other words, this circuit converts an analogue value into a
binary number and passes it to the CPU for further processing. This module is
therefore used to measure the voltage on an input pin (an analogue value). The
result of the measurement is a number (a digital value) used and processed later in
the program.
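The conversion can be sketched numerically (a 10-bit converter is assumed here, as on many PICs; the function is an idealized model that ignores sampling and quantization-error details):

```c
/* Idealized 10-bit A/D conversion: map an input voltage in the range
   0..vref onto the integer codes 0..1023. */
static unsigned adc_convert(double volts, double vref) {
    if (volts <= 0.0)  return 0;
    if (volts >= vref) return 1023;
    return (unsigned)(volts / vref * 1023.0);
}
```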

Fig.(6.31)

INTERNAL ARCHITECTURE
All modern microcontrollers use one of two basic design models, called the
Harvard and von-Neumann architectures. They represent two different ways of
exchanging data between the CPU and memory.

VON-NEUMANN ARCHITECTURE


Fig.(6.32)

Microcontrollers using von-Neumann architecture have only one memory
block and one 8-bit data bus. As all data are exchanged through these 8 lines, the
bus is overloaded and communication is very slow and inefficient. The CPU can
either read an instruction or read/write data from/to the memory. Both cannot occur
at the same time since instructions and data use the same bus. For example, if a
program line reads that RAM memory register called ‘SUM’ should be
incremented by one (instruction: incf SUM), the microcontroller will do the
following:
1. Read the part of the program instruction specifying WHAT should be done (in this
case it is the ‘incf’ instruction for increment).
2. Read the other part of the same instruction specifying upon WHICH data it should
be performed (in this case it is the ‘SUM’ register).
3. After being incremented, the contents of this register should be written to the
register from which it was read (‘SUM’ register address).
The same data bus is used for all these intermediate operations.

HARVARD ARCHITECTURE

Fig.(6.33)

Microcontrollers using Harvard architecture have two different data buses.
One is 8 bits wide and connects the CPU to RAM. The other consists of 12, 14 or
16 lines and connects the CPU to ROM. Accordingly, the CPU can read an
instruction and access data memory at the same time. Since all RAM memory
registers are 8 bits wide, all data being exchanged are of the same width. During
the process of writing a program, only 8-bit data are considered. In other words,
all you can change from within the program, and all you can influence, is 8 bits
wide. All the programs written for these microcontrollers are stored in the
microcontroller's internal ROM after being compiled into machine code. However,
ROM memory locations do not have 8, but 12, 14 or 16 bits. The remaining 4, 6 or
8 bits represent the instruction specifying for the CPU what to do with the 8-bit
data.

The advantages of such design are the following:


 All data in the program is one byte (8 bits) wide. As the data bus used for program
reading has 12, 14 or 16 lines, both instruction and data can be read simultaneously
using these spare bits. For this reason, all instructions are single-cycle instructions,
except for the jump instruction, which is a two-cycle instruction.
 Owing to the fact that the program (ROM) and temporary data (RAM) are
separate, the CPU can execute two instructions at a time. Simply put, while RAM
read or write is in progress (the end of one instruction), the next program
instruction is read through the other bus.
 When using microcontrollers with von-Neumann architecture, one never knows
how much memory is to be occupied by the program. Basically, most program
instructions occupy two memory locations (one contains information on WHAT
should be done, whereas the other contains information upon WHICH data it
should be done). However, it is not a hard and fast rule, but the most common case.
In microcontrollers with Harvard architecture, the program word bus is wider than
one byte, which allows each program word to consist of instruction and data, i.e.
one memory location - one program instruction.
INSTRUCTION SET

Fig.(6.34)
All instructions understandable to the microcontroller are called together the
Instruction Set. When you write a program in assembly language, you actually
specify instructions in the order in which they should be executed. The main
restriction here is the number of available instructions. Manufacturers usually
adopt one of the two approaches described below:


RISC (REDUCED INSTRUCTION SET COMPUTER)


In this case, the microcontroller recognizes and executes only basic
operations (addition, subtraction, copying etc.). Other, more complicated
operations are performed by combining them. For example, multiplication is
performed by successive addition. It’s the same as if you tried to explain to
someone, using only a few different words, how to reach the airport in a new city.
However, it’s not as black as it’s painted. First of all, this language is easy to
learn, and the microcontroller is so fast that it is not possible to see all the
arithmetic ‘acrobatics’ it performs; the user only sees the final results. After all, it
is not so difficult to explain where the airport is if you use the right words, such
as left, right, kilometers, etc.
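Multiplication by successive addition, as a RISC core would effectively perform it, can be sketched as (a hypothetical helper):

```c
#include <stdint.h>

/* RISC-style multiplication: only addition and a loop are used,
   so a * b is computed as a added to itself b times. */
static uint16_t mul_by_addition(uint8_t a, uint8_t b) {
    uint16_t result = 0;
    while (b--)
        result += a;
    return result;
}
```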

CISC (COMPLEX INSTRUCTION SET COMPUTER)


CISC is the opposite of RISC! Microcontrollers designed to recognize more
than 200 different instructions can do a lot of things at high speed. However, one
needs to understand how to take advantage of all that such a rich language offers,
which is not at all easy...

HOW TO MAKE THE RIGHT CHOICE?


Ok, you are a beginner and you have decided to go on the adventure of
working with microcontrollers. Congratulations on your choice!
However, it is not as easy to choose the right microcontroller as it may seem. The
problem is not a limited range of devices, but the opposite!
Before you start to design a device based on the microcontroller, think of the
following: how many input/output lines will I need for operation? Should it
perform some other operations than to simply turn relays on/off? Does it need
some specialized module such as serial communication, A/D converter etc.? When
you create a clear picture of what you need, the selection range is considerably
reduced and it’s time to think of the price. Are you planning to build several of the
same device? Several hundred? A million? Anyway, you get the point.
If you think of all these things for the very first time then everything seems a
bit confusing. For this reason, go step by step. First of all, select the manufacturer,
i.e. the microcontroller family you can easily get. Study one particular model.
Learn as much as you need; don't go into details. Solve a specific problem and
something incredible will happen: you will be able to handle any model belonging
to that microcontroller family.


Remember learning to ride a bicycle. After several bruises at the beginning,
you were able to keep your balance, then to easily ride any other bicycle. And of
course, you will never forget programming, just as you will never forget riding a
bicycle!

6.6 | DEVELOPMENT TOOLS

Microchip provides a freeware IDE package called MPLAB, which includes
an assembler, linker, software simulator, and debugger. They also sell C compilers
for the PIC18 and dsPIC which integrate cleanly with MPLAB. Free student
versions of the C compilers are also available with all features, but their
optimizations are disabled after 60 days.

Several third parties make C language compilers for PICs, many of which
integrate with MPLAB and/or feature their own IDE. A fully featured compiler for
the PICBASIC language to program PIC microcontrollers is available from
microEngineering Labs, Inc. Development tools are also available for the PIC
family under the GPL or other free software or open source licenses.

6.6.1 | Device Programmers

Fig.(6.35) A development board for a low pin-count MCU, from Microchip

Devices called "programmers" are traditionally used to get program code into the
target PIC. Most PICs that Microchip currently sells feature ICSP (In-Circuit Serial
Programming) and/or LVP (Low Voltage Programming) capabilities, allowing the
PIC to be programmed while it is sitting in the target circuit. ICSP programming is
performed using two pins, clock and data, while a high voltage (12V) is present on
the Vpp/MCLR pin. Low voltage programming dispenses with the high voltage,
but reserves exclusive use of an I/O pin and can therefore be disabled to recover
the pin for other uses (once disabled it can only be re-enabled using high voltage
programming).
There are many programmers for PIC microcontrollers, ranging from the
extremely simple designs which rely on ICSP to allow direct download of code
from a host computer, to intelligent programmers that can verify the device at
several supply voltages. Many of these complex programmers use a
pre-programmed PIC themselves to send the programming commands to the PIC
that is to be programmed. The intelligent type of programmer is needed to program
earlier PIC models (mostly EPROM type) which do not support in-circuit
programming.
Many of the higher end flash based PICs can also self-program (write to
their own program memory). Demo boards are available with a small boot loader
factory programmed that can be used to load user programs over an interface such
as RS-232 or USB, thus obviating the need for a programmer device. Alternatively
there is boot loader firmware available that the user can load onto the PIC using
ICSP. The advantages of a boot loader over ICSP are the far superior programming
speed, immediate program execution following programming, and the ability to
both debug and program using the same cable.

Fig.(6.36) Microchip PICSTART Plus programmer

Programmers/debuggers are available directly from Microchip. Third-party
programmers range from plans to build your own, to self-assembly kits and fully
tested ready-to-go units. Some are simple designs which require a PC to do the
tested ready-to-go units. Some are simple designs which require a PC to do the
low-level programming signaling (these typically connect to the serial or parallel
port and consist of a few simple components), while others have the programming
logic built into them (these typically use a serial or USB connection, are usually
faster, and are often built using PICs themselves for control).

6.6.2 | Debugging

Software emulation
Commercial and free emulators exist for the PIC family processors.
In-circuit debugging
Later model PICs feature an ICD (in-circuit debugging) interface, built into
the CPU core. ICD debuggers (MPLAB ICD2 and other third party) can


communicate with this interface using three lines. This cheap and simple debugging system comes at a price, however: a limited breakpoint count (one on older PICs, three on newer PICs), loss of some I/O (with the exception of some
surface mount 44-pin PICs which have dedicated lines for debugging) and loss of
some features of the chip. For small PICs, where the loss of IO caused by this
method would be unacceptable, special headers are made which are fitted with
PICs that have extra pins specifically for debugging.

In-circuit emulators
Microchip offers three full in circuit emulators: the MPLAB ICE2000
(parallel interface, a USB converter is available); the newer MPLAB ICE4000
(USB 2.0 connection); and most recently, the REAL ICE. All of these ICE tools
can be used with the MPLAB IDE for full source-level debugging of code running
on the target.
The ICE2000 requires emulator modules, and the test hardware must provide
a socket which can take either an emulator module, or a production device.
The REAL ICE connects directly to production devices which support in-
circuit emulation through the PGC/PGD programming interface, or through a high
speed connection which uses two more pins. According to Microchip, it supports
"most" flash-based PIC, PIC24, and dsPIC processors.

The ICE4000 is no longer directly advertised on Microchip's website, and the purchasing page states that it is not recommended for new designs.

PICKit 2 open source structure and clones

PICKit 2 has been an interesting PIC programmer from Microchip. It can program all PICs and debug most of them (as of May 2009, only the PIC32
family is not supported for MPLAB debugging). Ever since its first releases, all
software source code (firmware, PC application) and hardware schematic are open
to the public. This makes it relatively easy for an end user to modify the
programmer for use with a non-Windows operating system such as Linux or Mac
OS. In the meantime, it has also created a lot of DIY interest and clones. This open
source structure brings many features to the PICKit 2 community such as
Programmer-to-Go, the UART Tool and the Logic Tool, which have been
contributed by PICKit 2 users. Users have also added such features to the PICKit 2
as 4MB Programmer-to-go capability, USB buck/boost circuits, RJ12 type
connectors and others.


Part number suffixes


The F in a name generally indicates the PICmicro uses flash memory and
can be erased electronically. Conversely, a C generally means it can only be erased
by exposing the die to ultraviolet light (which is only possible if a windowed
package style is used). An exception to this rule is the PIC16C84 which uses
EEPROM and is therefore electrically erasable.
An L in the name indicates the part will run at a lower voltage, often with
frequency limits imposed.[19] Parts designed specifically for low voltage operation,
within a strict range of 3 - 3.6 Volts, are marked with a J in the part number. These
parts are also uniquely I/O tolerant as they will accept up to 5V as inputs. [19]

Fig.(6.37)

This figure below shows the most commonly used solution.

Fig.(6.38)

To prevent the appearance of a high self-induced voltage, caused by a sudden interruption of the current flow through the coil, a reverse-polarized diode is connected in parallel with the coil. The purpose of this diode is to 'cut off' the voltage peak.

6.7 | LCD DISPLAY


This component is specifically manufactured to be used with microcontrollers, which means that it cannot be activated by standard IC circuits. It is used for displaying different messages on a miniature liquid crystal display. The model described here is, owing to its low price and great capabilities, the one most frequently used in practice. It is based on the HD44780 controller (Hitachi) and can display messages in two lines with 16 characters each. It can display all the letters of the alphabet, Greek letters, punctuation marks, mathematical symbols, etc. It is also possible to display symbols made up by the user. Other useful features include automatic message shift (left and right), cursor appearance, LED backlight, etc.

Fig.(6.39)

6.7.1 | LCD Display Pins

Along one side of the small printed board of the LCD display there are pins
that enable it to be connected to the microcontroller. There are 14 pins in total, marked with numbers (16 if there is a backlight).

Fig.(6.40)

6.7.2 | LCD Screen


An LCD screen can display two lines with 16 characters each. Every
character is formed on a 5x8 or 5x11 dot matrix. This chapter covers the 5x8 character display, which is most commonly used.

Display contrast depends on the power supply voltage and on whether messages are displayed in one or two lines. For this reason, a varying voltage of 0 to Vdd is applied to the pin marked Vee. A trimmer potentiometer is usually used for this purpose. Some LCD displays have a built-in backlight (blue or green LEDs). When it is used during operation, a current-limiting resistor should be connected in series with one of the backlight power supply pins (as with ordinary LEDs).

Fig.(6.41)

If no characters are displayed, or if all of them are dimmed when the display is switched on, the first thing to do is to check the potentiometer for contrast adjustment. Is it properly adjusted? The same applies if the mode of operation has been changed (writing in one or two lines).

6.7.3 | LCD Memory

The LCD display contains three memory blocks:


 DDRAM Display Data RAM;
 CGRAM Character Generator RAM; and
 CGROM Character Generator ROM.


DDRAM Memory
DDRAM memory is used for storing the characters to be displayed. It can store up to 80 characters. Some memory locations are directly connected to the characters on the display.
Everything works quite simply: it is enough to configure the display to
increment addresses automatically (shift right) and set the starting address for the
message to be displayed (for example 00 hex).
Afterwards, all characters sent through lines D0-D7 will be displayed in the usual message format, from left to right. In this case, displaying starts
from the first field of the first line because the initial address is 00 hex. If more
than 16 characters are sent, then all of them will be memorized, but only the first
sixteen characters will be visible. In order to display the rest of them, the shift
command should be used. In effect, the LCD display behaves like a window which shifts left and right over the memory locations containing different characters. This is how the effect of a message shifting across the screen is created.

Fig.(6.42)

If the cursor is on, it appears at the currently addressed location. In other words, when a character is written at the cursor position, the cursor automatically moves to the next addressed location.

This is a sort of RAM, so data can be written to it and read from it, but its contents are irretrievably lost when the power goes off.

CGROM Memory
CGROM memory contains a standard character map with all characters that
can be displayed on the screen. Each character is assigned to one memory location:


Fig.(6.43)

The addresses of the CGROM memory locations match the ASCII character codes. If the program currently being executed encounters the command ‘send character P
to port’ then the binary value 0101 0000 appears on the port. This value is the
ASCII equivalent to the character P. It is then written to an LCD, which results in
displaying the symbol from the 0101 0000 location of CGROM. In other words,
the character ‘P’ is displayed. This applies to all letters of the alphabet (capital and small), but not to digits. As seen on the map above, the addresses of all digits are pushed forward by 48 relative to their values (the address of digit 0 is 48, of digit 1 is 49, of digit 2 is 50, etc.). Accordingly, in order to display digits correctly, it is necessary to add the decimal number 48 to each of them prior to sending it to the LCD.
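The 48-offset above is just the ASCII code of '0'; a small C sketch shows the conversion (the helper names are ours, not from the project firmware):

```c
#include <assert.h>

/* Convert one decimal digit (0-9) to the CGROM/ASCII code the HD44780
 * expects: digit codes are shifted forward by 48, i.e. by '0'. */
unsigned char digit_to_lcd_code(unsigned char digit)
{
    return (unsigned char)(digit + 48);   /* same as digit + '0' */
}

/* Split a value 0-99 into the two character codes to send to the LCD. */
void value_to_lcd_codes(unsigned char value, unsigned char out[2])
{
    out[0] = digit_to_lcd_code((unsigned char)(value / 10));
    out[1] = digit_to_lcd_code((unsigned char)(value % 10));
}
```

The same offset trick works for any digit, since the ASCII codes '0'-'9' are consecutive.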


What is ASCII? From their inception until today, computers have been able to recognize only numbers, not letters. This means that all the data a computer exchanges with a peripheral device has a binary format, even though humans recognize the same data as letters (the keyboard is an excellent example). In other words, every character matches a unique combination of zeroes and ones. ASCII is a character encoding based on the English alphabet; the ASCII code specifies a correspondence between standard character symbols and their numerical equivalents.

Fig.(6.44)

CGRAM Memory
Apart from standard characters, the LCD display can also display symbols defined by the user. Any symbol of 5x8 pixels can be defined. A 64-byte RAM memory called CGRAM makes this possible.
Memory registers are 8 bits wide, but only the 5 lower bits are used. A logic one (1) in a register represents a dimmed dot, while 8 locations grouped together form one character. This is best illustrated in the figure below:


Fig.(6.45)

Symbols are usually defined at the beginning of the program by simply writing zeros and ones to the registers of CGRAM memory so that they form the desired shapes. In order to display them, it is sufficient to specify their address. Note the first column of the CGROM character map: it does not contain RAM memory addresses, but rather the symbols discussed here. In this example, ‘display 0’ means - display ‘č’, ‘display 1’ means - display ‘ž’, etc.
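As a sketch, defining one user symbol can be reduced to building the byte sequence to send: a "Set CGRAM address" command followed by the eight row patterns. The function, the symbol and the names below are illustrative, not taken from the project code:

```c
#include <assert.h>

#define LCD_SET_CGRAM 0x40  /* command base for "Set CGRAM address" */

/* Build the byte sequence that defines one user symbol in CGRAM.
 * out[0] is the "Set CGRAM address" command; out[1..8] are the 8 row
 * patterns (data bytes, only the 5 lower bits used; 1 = dark dot).
 * The caller sends out[0] as a command and the rest as data. */
void lcd_build_symbol(unsigned char slot, const unsigned char rows[8],
                      unsigned char out[9])
{
    unsigned char i;
    out[0] = (unsigned char)(LCD_SET_CGRAM | (slot << 3)); /* 8 bytes per symbol */
    for (i = 0; i < 8; i++)
        out[i + 1] = rows[i];
    /* To display the symbol afterwards, send its slot number (0-7) as data. */
}
```

With 64 bytes of CGRAM and 8 bytes per symbol, eight such slots (0-7) are available.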

6.7.4 | LCD Basic Commands

All data transferred to an LCD through the outputs D0-D7 will be interpreted as a command or data, depending on the logic state of the RS pin:
 RS = 1 - Bits D0 - D7 are addresses of the characters to be displayed. LCD
processor addresses one character from the character map and displays it. The
DDRAM address specifies the location at which the character is to be displayed. This address is defined prior to transferring the character, or the address of the previously transferred character is automatically incremented.
 RS = 0 - Bits D0 - D7 are commands for setting the display mode.
Here is a list of commands recognized by the LCD:
Command                    RS RW D7 D6 D5 D4 D3 D2 D1 D0    Execution Time
Clear display              0  0  0  0  0  0  0  0  0  1     1.64 ms
Cursor home                0  0  0  0  0  0  0  0  1  x     1.64 ms
Entry mode set             0  0  0  0  0  0  0  1 I/D S     40 us
Display on/off control     0  0  0  0  0  0  1  D  U  B     40 us
Cursor/display shift       0  0  0  0  0  1 D/C R/L x  x    40 us
Function set               0  0  0  0  1 DL  N  F  x  x     40 us
Set CGRAM address          0  0  0  1  CGRAM address        40 us
Set DDRAM address          0  0  1  DDRAM address           40 us
Read "BUSY" flag (BF)      0  1  BF DDRAM address           -
Write to CGRAM or DDRAM    1  0  D7 D6 D5 D4 D3 D2 D1 D0    40 us
Read from CGRAM or DDRAM   1  1  D7 D6 D5 D4 D3 D2 D1 D0    40 us

I/D  1 = Increment (by 1)             0 = Decrement (by 1)
S    1 = Display shift on             0 = Display shift off
D    1 = Display on                   0 = Display off
U    1 = Cursor on                    0 = Cursor off
B    1 = Cursor blink on              0 = Cursor blink off
R/L  1 = Shift right                  0 = Shift left
DL   1 = 8-bit interface              0 = 4-bit interface
N    1 = Display in two lines         0 = Display in one line
F    1 = Character format 5x10 dots   0 = Character format 5x7 dots
D/C  1 = Display shift                0 = Cursor shift


WHAT IS THE BUSY FLAG?


Compared to the microcontroller, the LCD is an extremely slow component.
For this reason, it was necessary to provide a signal which would, upon command
execution, indicate that the display is ready for the next piece of data. That signal,
called the busy flag, can be read from the line D7. The display is ready to receive
new data when the voltage on this line is 0V (BF=0).
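Since BF is bit D7 of the byte read back with RS = 0 and RW = 1 (the lower seven bits of the same read hold the current DDRAM address, per the command table above), decoding it is a simple mask. The helper names here are illustrative:

```c
#include <assert.h>

/* The busy flag is bit D7 of the status byte read with RS=0, RW=1. */
int lcd_is_busy(unsigned char status_byte)
{
    return (status_byte & 0x80) != 0;   /* BF = 1 -> still busy */
}

/* The lower 7 bits of the same read contain the current DDRAM address. */
unsigned char lcd_current_address(unsigned char status_byte)
{
    return (unsigned char)(status_byte & 0x7F);
}
```

A typical write loop would poll `lcd_is_busy()` on the value read from the display until it returns 0 before sending the next byte.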

6.7.5 | LCD Connecting

Depending on how many lines are used for connecting an LCD to the
microcontroller, there are 8-bit and 4-bit LCD modes. The appropriate mode is
selected at the beginning of the operation in the process called 'initialization'. The
8-bit LCD mode uses outputs D0- D7 to transfer data as explained on the previous
page.
The main purpose of the 4-bit LCD mode is to save valuable I/O pins of the
microcontroller. Only the 4 higher bits (D4-D7) are used for communication, while the others may be left unconnected. Each piece of data is sent to the LCD in two steps: the four higher bits first (normally through the lines D4-D7), then the four lower bits. Initialization enables the LCD to link and interpret the received bits correctly.
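A minimal sketch of that transfer order: each byte is split into two nibbles, the high one sent first on D4-D7 (the function name is ours):

```c
#include <assert.h>

/* Split a byte into the two nibbles sent over D4-D7 in 4-bit mode. */
void lcd_split_nibbles(unsigned char value,
                       unsigned char *high, unsigned char *low)
{
    *high = (unsigned char)(value >> 4);    /* sent first on D4-D7  */
    *low  = (unsigned char)(value & 0x0F);  /* sent second          */
}
```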

Fig.(6.46)


Data is rarely read from the LCD (it is mainly transferred from the microcontroller to the LCD), so it is often possible to save an extra I/O pin by simply connecting the R/W pin to ground. This saving has its price: messages will be displayed normally, but it will not be possible to read the busy flag, since it is not possible to read from the display at all. Fortunately, there is a simple solution. After sending a character or a command it is important to give the LCD enough time to do its job. Since the execution of a command may last approximately 1.64 ms, it is sufficient to wait about 2 ms for the LCD.

6.7.6 | LCD Initialization

The LCD is automatically cleared when powered up. This takes approximately 15 ms, after which it is ready for operation. The mode of operation is set by default, which means that:
1. Display is cleared.
2. Mode DL = 1 - Communication through 8-bit interface
3. N = 0 - Messages are displayed in one line
4. F = 0 - Character font 5 x 8 dots
5. Display/Cursor on/off
D = 0 - Display off
U = 0 - Cursor off
B = 0 - Cursor blink off
6. Character entry: I/D = 1 - Displayed addresses are automatically incremented by 1
7. S = 0 Display shift off

Automatic reset mostly occurs without any problems. Mostly, but not always! If for any reason the power supply voltage does not reach its full value within 10 ms, the display will start to behave completely unpredictably. If the power supply unit cannot meet this condition, or if completely safe operation is required, the process of initialization is applied. Initialization, among other things, causes a new reset, enabling the display to operate normally.

There are two initialization algorithms. Which one is performed depends on whether the connection to the microcontroller is through a 4-bit or an 8-bit interface. In both cases, all that is left to do after initialization is to specify the basic commands and, of course, to display messages.
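For illustration, here is a typical command sequence sent after the power-on delays, built from the command table in section 6.7.4 and assuming two lines, a 5x7 font and a 4-bit interface; the exact sequence depends on the display and should be checked against its datasheet:

```c
#include <assert.h>

/* Illustrative post-reset command sequence for an HD44780-type display. */
static const unsigned char lcd_init_cmds[] = {
    0x28,  /* Function set: DL=0 (4-bit), N=1 (two lines), F=0 (5x7)  */
    0x0C,  /* Display on/off: D=1 (on), U=0 (no cursor), B=0 (no blink) */
    0x01,  /* Clear display (wait ~1.64 ms after this one)             */
    0x06   /* Entry mode: I/D=1 (increment), S=0 (no display shift)    */
};
```

Each byte would be sent as a command (RS = 0), with the required delay or busy-flag check between them.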

CHAPTER 7
System Implementation
Chapter 7 | System Implementation

7.1| INTRODUCTION

We'll first take an overview of the project so it can be understood completely. The project aims to help the visually impaired face the different problems they meet in their lives.
Our system passed through different stages: we started with research, then surveys, then a search for sponsors to help us reach the best form of the product; then we started the design phase, then development to get the best results, then making a prototype and obtaining the final product.
Our system consists of 2 parts, hardware and software.
The software is outdoor navigation, online and offline, designed initially using MATLAB. The user just has to say the place he wants to go to, and we have 2 cases:

Case 1: if GPS is on, the code receives the GPS data and compares it with the database, and according to the result a specific action is taken.

Case 2: if GPS is off, the code chooses the pre-saved maps in the database and, from the speed and the calculated length of the road, the program calculates the time between the spoken orders.

The hardware section is an ultrasound sensor connected to a vibration motor and calculating the distance to obstacles: the smaller the distance, the faster the vibration motor runs. There is also an RFID reader connected to an MP3 player to help the user identify the objects he usually uses. We derived these ideas and applications from research and surveys.
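The distance-to-vibration rule can be sketched as a linear mapping from the measured distance to a PWM duty value. The 200 cm range and the 8-bit duty below are assumptions for illustration, not the project's tuned values:

```c
#include <assert.h>

#define MAX_RANGE_CM 200u   /* assumed detection range, not measured */

/* The closer the obstacle, the faster (stronger) the vibration:
 * 0 cm -> full duty (255), MAX_RANGE_CM or beyond -> motor off (0). */
unsigned char vibration_duty(unsigned int distance_cm)
{
    if (distance_cm >= MAX_RANGE_CM)
        return 0;                        /* nothing near: motor off */
    return (unsigned char)(255u - (distance_cm * 255u) / MAX_RANGE_CM);
}
```

The returned value would be written to the PWM peripheral driving the DC vibration motor.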

7.2| SURVEYS

We wanted to make a product which solves real problems, so we went to different non-profit organizations, especially RESALA, which helped us meet visually impaired volunteers several times to see the real, high-risk problems they face; accordingly, we re-ordered the wanted applications in the project.
The ultrasound sensor is the most important part, helping them move freely without any problems. Then comes the RFID, which helps them identify the objects they usually use. Then outdoor navigation, which we implemented with MATLAB. Then we started the next stage: research.


7.3| SEARCHES

We started researching to find the best way to reach our goal. At first we wanted to make a sensor and indoor and outdoor navigation, besides a security system. In this part we collected all the technical data we needed to start designing the systems.

7.3.1| Ultrasound Sensor


Then we started searching for a suitable ultrasound sensor module to use in the system. We decided on the MaxSonar, but because of its high price we decided to use the HC-SR04 sensor temporarily and move to the MaxSonar later.
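The HC-SR04 reports distance as the width of its echo pulse: the ultrasonic burst travels to the obstacle and back at roughly 343 m/s, so distance = echo_time x speed / 2, which for a time in microseconds reduces to the widely used distance_cm = echo_us / 58. A sketch of that conversion:

```c
#include <assert.h>

/* Convert an HC-SR04 echo pulse width (microseconds) to centimeters.
 * Derivation: (t_us * 0.0343 cm/us) / 2  ~=  t_us / 58. */
unsigned int hcsr04_distance_cm(unsigned long echo_us)
{
    return (unsigned int)(echo_us / 58u);
}
```

On the microcontroller, `echo_us` would come from a timer capturing the high time of the sensor's echo pin after a 10 us trigger pulse.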

7.3.2| Indoor Navigation System


For this part we need an RFID reader with a suitable range (a minimum of 2 meters). We initially started with image processing, searched for courses on it, and then started designing the code with the aid of the MATLAB help.

7.3.3| Outdoor Navigation

We did our research to get the best GPS module available, and we chose the MediaTek MT3329 GPS module because of:
-Based on MediaTek Single Chip Architecture.
-Dimension:16mm x 16mm x 6mm
-L1 Frequency, C/A code, 66 channels
-High Sensitivity Up to -165dBm tracking, superior urban performances
-Position Accuracy:< 3m CEP (50%) without SA (horizontal)
-Cold Start is under 35 seconds (Typical)
-Warm Start is under 34 seconds (Typical)
-Hot Start is under 1 second (Typical)
-Low Power Consumption:48mA @ acquisition, 37mA @ tracking
-Low shut-down current consumption:15uA, typical
-DGPS (WAAS, EGNOS, MSAS) support (optional by firmware)
-USB/UART Interface
-Support AGPS function (Offline mode: EPO valid up to 14 days )
-Includes a Molex cable adapter, 5 cm
-Includes the new basic adapter
-Weight: 0.3oz; 8 g


Then we searched for a way to connect the GPS module with MATLAB, so we bought an FTDI cable to connect the module to the PC.
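The MT3329 streams standard NMEA sentences (such as $GPGGA) over its UART/USB interface, so whichever side reads them, the parsing step is splitting comma-separated fields. A hedged C sketch (the function name and buffer handling are ours, not the project's code):

```c
#include <assert.h>
#include <string.h>

/* Copy one comma-separated field of an NMEA sentence into out.
 * Field 0 is the sentence name (e.g. "$GPGGA").
 * Returns 1 if the requested field exists, 0 otherwise. */
int nmea_field(const char *sentence, int index, char *out, int out_size)
{
    int field = 0, n = 0;
    const char *p;
    for (p = sentence; *p && *p != '\r' && *p != '\n'; p++) {
        if (*p == ',') {
            if (field == index)
                break;              /* finished collecting our field */
            field++;
            n = 0;
            continue;
        }
        if (field == index && n < out_size - 1)
            out[n++] = *p;          /* collect characters of the field */
    }
    out[n] = '\0';
    return field == index;
}
```

In a $GPGGA sentence, fields 2 and 4 carry the latitude and longitude strings that would then be compared against the pre-saved database locations.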

7.4| SPONSORS

We found it difficult to convert our project entirely to hardware, so we decided to search for sponsors to help us choose the best way to reach our goal. After meeting our sponsors several times, we arrived at the present vision of our project.
We searched for sponsors in the fields of embedded systems, medical equipment and programming.
We wanted to design auditory outdoor navigation with a PIC, but we found it very difficult, and it needs technical and financial support that does not exist in Egypt, so we made this application using MATLAB. We wanted to design indoor navigation using RFID, but after surveying with the Resala non-profit organization we found that the users do not need it, so we cancelled this part and replaced it with identifying objects using RFID.
We used one sensor to calculate the distance, but we developed the design by using 3 sensors in 3 directions to determine the way the user must go.
Our sponsors are Futek, which works in embedded systems and power saving; we need it for designing the system, finishing it and getting to the final product. And Brilliance, which we will need as technical support to help us find the best technical solutions for any problems.

7.5| PRE-DESIGN

At first we set the acceptable specifications we need in our project.

Table(7.1): Acceptable specifications

No.  Need                                       Imp
1    The suspension - Acceptable range           5
2    The suspension - Used outdoor               5
3    The suspension - Low cost                   5
4    The suspension - Light intensity            5
5    The suspension - Low power                  5
6    The suspension - Good style and finishing   3
7    The suspension - Arabic                     5
8    The suspension - Clear voice                4
9    The suspension - Easy to use                5
10   The suspension - High-quality materials     3

7.5.1| List Of Metrics

Table(7.2): List of metrics

Metric No.  Metric                                   Related needs
1           6 meters ultrasound range                1
2           Mp3 module                               2, 4, 8, 10, 11
3           Weather resistance                       3, 10
4           Low power (2.5V to start work)           6
5           Good design (design it as a watch)       7, 10
6           Cam module with UART data transfer       2
7           Buttons to choose the mode               9
8           Very small US module (1.9cm x 2.1cm)     5, 7

Needs: 1 Acceptable range; 2 Fast response; 3 Used outdoor; 4 Low cost; 5 Light intensity; 6 Low power; 7 Good style and finishing; 8 Arabic language; 9 Easy to use; 10 High-quality materials; 11 Clear voice.

7.5.2| Competitive Benchmarking Information

Table(7.3): Competitive benchmarking information

Metric No.  Needs no.    Metric                                  Imp  Units  Mansoura graduation project  Our project
1           1            6 meters ultrasound range               5    m      0.3                          3
2           2,4,8,10,11  Mp3 module                              4           -                            Found
3           3,10         Weather resistance                      5           Not found                    Found
4           6            Low power                               4    V      5V                           2.5V
5           7,10         Good design                             3           Bag                          Stick (or) Bracelet
6           2            Use cam module with UART data transfer  4           Found                        Found
7           9            Buttons to choose the mode              3           More than 5 buttons          2 buttons
8           5,7          Small dimensions US module              4    cm     4cm*2cm                      1.9cm*2.1cm


7.5.3| Ideal And Marginally Acceptable Target Values

Table(7.4): Ideal and marginally acceptable target values

Metric No.  Needs no.    Metric                                  Imp  Units  Marginal values  Ideal values
1           1            6 meters ultrasound range               5    m      3                6
2           2,4,8,10,11  Mp3 module                              4    MB     32               64
3           3,10         Weather resistance                      5           Not found        Found
4           6            Low power                               4    V      5V               2.5V
5           7,10         Good design                             3           Bag              Stick (or) Bracelet
6           2            Use cam module with UART data transfer  4           Found            Found
7           9            Buttons to choose the mode              3           No buttons       2 buttons
8           5,7          Small dimensions US module              4    cm     1.9cm*2.1cm      1.9cm*2.1cm

7.5.4| Time Plan Diagram

Fig.(7.1): Time plan (share of time, %): design 25.62, lectures 21.2, exams 21.2, other activities 18.18, develop 8.5, finishing 2.01, travelling 1.7, play 1.59.


7.6| DESIGN

We designed speech recognition, outdoor mapping using the Mapping Toolbox, indoor mapping using image processing, the ultrasound sensor code using mikroC, and a GUI interface using MATLAB.

7.6.1| Speech Recognition:


Speech processing steps for one sample: process the sample, then put it and the other samples in a dataset to train the recognizer using neural networks (NNs).


Fig.(7.3): Screenshot of some parts of speech recognition code


7.6.2| Ultrasound Sensor

Fig.(7.4) : Screenshot of some parts of Ultrasonic sensor code

Fig.(7.5):Simulation of ultrasonic sensor circuit


7.6.3| Outdoor Navigation

Mapping Code:

Fig.(7.6): Screenshot of mapping code

Fig.(7.7):Result of pre-map code


7.7| PRODUCT ARCHITECTURE

7.7.1| Product Schematic:

For the visually impaired and blind women:

Fig.(7.8): Visually impaired and blind women model - a user interface (Button 1: indoor, Button 2: outdoor) feeding a main board with the MCU and a DC power supply; the inputs are the ultrasound sensor modules and the camera module, and the output is an MP3 module driving a speaker.


For blind men:

Fig.(7.9): Blind men model - a main board with the MCU and a DC power supply; the input is the ultrasound sensor modules, and the output is an MP3 module driving a speaker.

7.7.2| Rough Geometric Layout

For blind men:

Fig.(7.10): Blind men geometric layout - a white stick carrying the MP3 module, three ultrasound sensors (pointing left, forward and right) and a speaker.

111
Chapter 7 | System Implementation

For the visually impaired and blind women:

Fig.(7.11): Ultrasound sensors pointing left, center and right, plus Buttons 1 and 2. This design is optional, not the default; the stick is the default design, which we worked on first.

7.7.3| Incidental Interactions

For Blind Men:

Fig.(7.12): The ultrasound module feeds analog data to the MCU; the MCU selects an MP3 file over SPI, and the MP3 module drives the speaker via PWM.


For the visually impaired and blind women:

Fig.(7.13): As in Fig.(7.12), with an RFID module that additionally sends its data to the MCU.

7.8| DEFINING SECONDARY SYSTEMS

 Power button (on/off).
 Speaker (connected to the MP3 module).
 LED connected to the MP3 module to show the status.
 Mute button.
 USB interface to connect the module with a computer to download files.
 Rechargeable power supply.

7.9| DETAILED INTERFACE SPECIFICATIONS

Table(7.5)
Line Name Properties
1 Power 5v
2 Ground 0v
3 Input analog


Fig.(7.14): Control unit to ultrasound sensor connection (lines 1-3).
Table(7.6)
Line Name Properties
1 Power 3.3v
2 Power 3.3v
3 Ground 0v
4 Spi data
5 Spi CS
6 Spi clk

Fig.(7.15): Control unit to MP3 module connection (lines 1-6).

7.10| ESTABLISHING THE ARCHITECTURE OF THE CHUNKS

The WT588D-U module receives the mode selection through its control port, accepts sound files through USB download, and produces the voice output.

CHAPTER 8
Conclusion
Chapter 8 | Conclusion

8.1| INTRODUCTION

Finally, our purpose was to make a project that solves a real problem found in our lives, or eases the difficulties facing some people; as we said before, blind or deaf people need special care and special devices to make their lives easier.
After the survey we did and after meeting technical and marketing sponsors, we chose the most wanted applications: outdoor navigation, the ultrasound sensor and the object identifier; we cancelled indoor navigation as the users do not need it.
We will give an overview of every part.

8.2| OVERVIEW

8.2.1| Outdoor navigation

8.2.1.1| Outdoor navigation online

In this part we use two subsystems:

1. Speech recognition.
2. Serial communication to read the GPS.

As the first step in this system, the user speaks to choose the place he wants to go to; the GPS module is then activated to detect the current location; the program we made then compares the received data with the pre-saved data, and according to the result the code takes a specific action.
We have 2 cases:
Case 1: the 2 values are not equal, so no action is taken.
Case 2: the 2 values are equal; for this location the program outputs a sound containing the direction he must go (forward, right or left) or telling him that he has arrived.

8.2.1.2| Outdoor navigation offline

In this part we use 3 subsystems:

1. Speech recognition.
2. Image processing.
3. GUI.

As the first step in this system, the user speaks to choose the place he wants to go to; the GUI then loads an image; he enters his velocity through the GUI; the code loads the map of this path and calculates the time the user needs to finish the route by taking the pre-saved length of the road and dividing it by the velocity; from this it calculates the delay between every order, and the code takes the corresponding action.

8.2.2| Ultrasound sensor

This part calculates the distance between the user and any barrier in his way. The sensor is connected to a DC vibration motor, and the motor's speed increases as the distance decreases.

8.2.3| Object identifier

In this part we use an RFID reader connected to the microcontroller and the MP3 module.
Every object has a tag; when the user's hand approaches an object, the reader activates the tag, and the tag sends its ID to the reader, which passes it to the PIC.
According to the ID, the PIC activates a specific WAV file saved in the MP3 module which contains the name of the object.
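The tag-to-recording lookup can be sketched as a simple table from tag ID to track number; the IDs, tracks and object names below are made up for illustration, not the project's actual data:

```c
#include <assert.h>
#include <string.h>

/* One entry: the ID string sent by the reader and the WAV/MP3 track
 * on the sound module that holds the object's spoken name. */
struct tag_entry {
    const char *tag_id;
    int         track;
};

static const struct tag_entry tag_table[] = {
    { "04A1B2C3", 1 },   /* e.g. "cup"  */
    { "04D4E5F6", 2 },   /* e.g. "keys" */
};

/* Return the track for a tag ID, or 0 for an unknown tag (play nothing). */
int track_for_tag(const char *tag_id)
{
    unsigned i;
    for (i = 0; i < sizeof tag_table / sizeof tag_table[0]; i++)
        if (strcmp(tag_table[i].tag_id, tag_id) == 0)
            return tag_table[i].track;
    return 0;
}
```

On the PIC, the returned track number would be sent to the MP3 module to play the matching recording.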

8.3| FEATURES

We plan to develop and add more features to our project to solve all the problems we can.
These features appear in the following points:
 Help him to read books.
 Help him to shop easily.
 Help him to find his lost objects.

Appendix
Appendix A: GUI

Appendix A: GUI
A.1 | INTRODUCTION

A.1.1 | What Is a GUI?

A graphical user interface (GUI) is a graphical display in one or more windows containing controls, called components, that enable a user to perform interactive tasks. The user of the GUI does not have to create a script or type commands at the command line to accomplish the tasks. Unlike coding programs to accomplish tasks, the user of a GUI need not understand the details of how the tasks are performed.

GUI components can include menus, toolbars, push buttons, radio buttons, list
boxes, and sliders—just to name a few. GUIs created using MATLAB® tools can
also perform any type of computation, read and write data files, communicate with
other GUIs, and display data as tables or as plots. The following figure illustrates a
simple GUI that you can easily build yourself.

Fig.(A.1): A simple GUI

The GUI contains:

• An axes component


• A pop-up menu listing three data sets that correspond to MATLAB functions:
peaks, membrane, and sinc
• A static text component to label the pop-up menu
• Three buttons that provide different kinds of plots: surface, mesh, and contour

When you click a push button, the axes component displays the selected data set using the

A.1.2|How Does a GUI Work?

In the GUI described in “What Is a GUI?” the user selects a data set from the
pop-up menu, then clicks one of the plot type buttons. The mouse click invokes a
function that plots the selected data in the axes.

Most GUIs wait for their user to manipulate a control, and then respond to
each action in turn. Each control, and the GUI itself, has one or more user-written
routines (executable MATLAB code) known as callbacks, named for the fact that
they “call back” to MATLAB to ask it to do things. The execution of each callback
is triggered by a particular user action such as pressing a screen button, clicking a
mouse button, selecting a menu item, typing a string or a numeric value, or passing
the cursor over a component.

The GUI then responds to these events. You, as the creator of the GUI,
provide callbacks which define what the components do to handle events.

This kind of programming is often referred to as event-driven programming. In the example, a button click is one such event. In event-driven programming,
callback execution is asynchronous, that is, it is triggered by events external to the
software. In the case of MATLAB GUIs, most events are user interactions with the
GUI, but the GUI can respond to other kinds of events as well, for example, the
creation of a file or connecting a device to the computer.

A.1.3 |How can you code callbacks?

You can code callbacks in two distinct ways:


• As MATLAB language functions stored in files
• As strings containing MATLAB expressions or commands (such as 'c = sqrt(a*a + b*b);' or 'print')


Using functions stored in code files as callbacks is preferable to using strings, as functions have access to arguments and are more powerful and flexible.
MATLAB scripts (sequences of statements stored in code files that do not define
functions) cannot be used as callbacks.

Although you can provide a callback with certain data and make it do
anything you want, you cannot control when callbacks will execute. That is, when
your GUI is being used, you have no control over the sequence of events that
trigger particular callbacks or what other callbacks might still be running at those
times. This distinguishes event-driven programming from other types of control
flow, for example, processing sequential data files.

A.1.4|Where Do I Start?

Ways to Build MATLAB GUIs

A MATLAB GUI is a figure window to which you add user-operated controls. You can select, size, and position these components as you like. Using callbacks you can make the components do what you want when the user clicks or manipulates them with keystrokes.

You can build MATLAB GUIs in two ways:

• Use GUIDE (GUI Development Environment), an interactive GUI construction kit.
• Create code files that generate GUIs as functions or scripts (programmatic GUI construction).

The first approach starts with a figure that you populate with components
from within a graphic layout editor. GUIDE creates an associated code file
containing callbacks for the GUI and its components. GUIDE saves both the figure
(as a FIG-file) and the code file. Opening either one also opens the other to run the
GUI.
In the second, programmatic, GUI-building approach, you create a code file that
defines all component properties and behaviors; when a user executes the file, it
creates a figure, populates it with components, and handles user interactions. The
figure is not normally saved between sessions because the code in the file creates a
new one each time it runs.
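A minimal programmatic GUI in that style might look like the sketch below. The layout values and names are illustrative assumptions; the point is that the file itself creates the figure, its components, and their behavior on every run:

```matlab
function simple_prog_gui
% Every run builds a fresh figure; nothing is saved between sessions.
f  = figure('Name','Programmatic GUI', 'NumberTitle','off');
ax = axes('Parent',f, 'Units','normalized', ...
          'Position',[0.30 0.10 0.65 0.80]);
% One push button whose callback draws into the axes above.
uicontrol('Parent',f, 'Style','pushbutton', 'String','Plot peaks', ...
          'Units','normalized', 'Position',[0.05 0.45 0.20 0.10], ...
          'Callback',@(src,evt) surf(ax, peaks(25)));
end
```

Because every property and callback is spelled out in code, such files are longer than GUIDE-generated ones, but placement is exact and reproducible.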


As a result, the code files of the two approaches look different. Programmatic
GUI files are generally longer, because they explicitly define every property of the
figure and its controls, as well as the callbacks. GUIDE GUIs define most of the
properties within the figure itself. They store the definitions in its FIG-file rather
than in its code file. The code file contains callbacks and other functions that
initialize the GUI when it opens.

MATLAB software also provides functions that simplify the creation of standard dialog boxes, for example to issue warnings or to open and save files. The GUI-building technique you choose depends on your experience, your preferences, and the kind of application you need the GUI to operate. This table outlines some possibilities.

Table (A.1): GUI techniques

• Dialog box: MATLAB software provides a selection of standard dialog boxes that you can create with a single function call. For an example, see the documentation for msgbox, which also provides links to functions that create specialized predefined dialog boxes.
• GUI containing just a few components: It is often simpler to create GUIs that contain only a few components programmatically. You can fully define each component with a single function call.
• Moderately complex GUIs: GUIDE simplifies the creation of moderately complex GUIs.
• Complex GUIs with many components, and GUIs that require interaction with other GUIs: Creating complex GUIs programmatically lets you control the exact placement of the components and provides reproducibility.

You can combine the two approaches to some degree. You can create a GUI with GUIDE and then modify it programmatically. However, you cannot create a GUI programmatically and later modify it with GUIDE.

A.2 | WHAT IS GUIDE?


GUIDE, the MATLAB Graphical User Interface Development Environment, provides a set of tools for creating graphical user interfaces (GUIs). These tools greatly simplify the process of laying out and programming GUIs.

Opening GUIDE

There are several ways to open GUIDE from the MATLAB command line.

Table (A.2): Ways to open GUIDE from the MATLAB command line

• guide: opens GUIDE with a choice of GUI templates
• guide FIG-file name: opens the named FIG-file in GUIDE

You can also right-click a FIG-file in the Current Folder Browser and select Open in GUIDE from the context menu.

Fig.(A.2)

When you right-click a FIG-file in this way, the figure opens in the GUIDE Layout Editor, where you can work on it.


Fig.(A.3)

A.2.1 | Getting Help in GUIDE

When you open GUIDE to create a new GUI, a gridded layout area displays. It has a menu bar and toolbar above it, a tool palette to its left, and a status bar below it, as shown in Fig.(A.3). All tools in the tool palette have tool tips, and setting a GUIDE preference lets you display the palette with tool names or just their icons. At any point, you can access help topics from the GUIDE Help menu. The first three options lead you to topics in the GUIDE documentation that can help you get started using GUIDE. The Example GUIs option opens a list of complete examples of GUIs built using GUIDE that you can browse, study, open in GUIDE, and run. The bottom option, Online Video Demos, opens a list of GUIDE- and related GUI-building video tutorials on MATLAB Central. You can access MATLAB video demos, as well as the page on MATLAB Central, by clicking the links in the following table.


Table (A.3): Video tutorials

• MATLAB New Feature Demos:
  - New Graphics and GUI Building Features in Version 7.6 (9 min, 31 s)
  - New Graphics and GUI Building Features in Version 7.5 (2 min, 47 s)
  - New Creating Graphical User Interfaces Features in Version 7 (4 min, 24 s)
• MATLAB Central Video Tutorials: archive for the "GUI or GUIDE" category from 2005 to present.

A.2.2 | Laying Out a GUIDE GUI

The GUIDE Layout Editor enables you to populate a GUI by clicking and
dragging GUI components into the layout area.

Fig.(A.4)

There you can resize, group, and align buttons, text fields, sliders, axes, and other components you add. Other tools accessible from the Layout Editor enable you to:
• Create menus and context menus
• Create toolbars
• Modify the appearance of components
• Set tab order
• View a hierarchical list of the component objects
• Set GUI options

A.2.3 | Programming a GUIDE GUI

When you save your GUI layout, GUIDE automatically generates a file of MATLAB code for controlling the way the GUI works. This file contains code to initialize the GUI and to organize the GUI callbacks. Callbacks are functions that execute in response to user-generated events, such as a mouse click. Using the MATLAB editor, you can add code to the callbacks to perform the functions you want.
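As a sketch of what such a filled-in callback body can look like, the fragment below plots whichever data set is selected in a pop-up menu when a button is pressed. The field names popupmenu1 and axes1 follow GUIDE's default tags but are assumptions here, not names taken from the project:

```matlab
% --- Executes on button press (tag name surf_pushbutton is illustrative).
function surf_pushbutton_Callback(hObject, eventdata, handles)
% Read the selected item from the pop-up menu, then plot that data set.
contents = cellstr(get(handles.popupmenu1, 'String'));
switch lower(contents{get(handles.popupmenu1, 'Value')})
    case 'peaks'
        data = peaks(35);
    case 'membrane'
        data = membrane;
    case 'sinc'
        [x, y] = meshgrid(-8:0.5:8);
        r = sqrt(x.^2 + y.^2) + eps;   % eps avoids division by zero
        data = sin(r)./r;
end
surf(handles.axes1, data);   % draw into the GUI's axes
```

GUIDE generates the empty function stub automatically; only the body between the signature and the plot call is user-written.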
Simple GUIDE GUI Example

This section shows you how to use GUIDE to create the graphical user interface (GUI) shown in the following figure.

Fig.(A.5)


Fig.(A.6)

To use the GUI, select a data set from the pop-up menu, then click one of the
plot-type buttons. Clicking the button triggers the execution of a callback that plots
the selected data in the axes.
Lay Out the Simple GUI in GUIDE
Open a New GUI in the GUIDE Layout Editor
1 Start GUIDE by typing guide at the MATLAB prompt. The GUIDE Quick
Start dialog displays, as shown in the following figure.

2 In the GUIDE Quick Start dialog box, select the Blank GUI (Default) template.
Click OK to display the blank GUI in the Layout Editor, as shown in the following
figure.


Fig.(A.7)
3 Display the names of the GUI components in the component palette. Select File
> Preferences. Then select GUIDE > Show names in component palette, and then
click OK. The Layout Editor then appears as shown in the following figure

Fig(A.8)


Add Components to the Simple GUIDE GUI

1 Add the three push buttons to the GUI. Select the push button tool from the component palette at the left side of the Layout Editor and drag it into the layout area. Create three buttons this way, positioning them approximately as shown in the following figure.

Fig.(A.9)

2 Add the remaining components to the GUI.
• A static text area
• A pop-up menu
• An axes
Arrange the components as shown in the following figure. Resize the axes component to approximately 2-by-2 inches.

Align the Components

If several components have the same parent, you can use the Alignment Tool to align them to one another. To align the three push buttons:

1 Select all three push buttons by pressing Ctrl and clicking them.
2 Select Tools > Align Objects.
3 Make these settings in the Alignment Tool, as shown in the following figure:
• Left-aligned in the horizontal direction.


• 20 pixels spacing between push buttons in the vertical direction

Fig.(A.10)

4 Click OK. Your GUI now looks like this in the Layout Editor.

Fig.(A.11)


Label the Push Buttons. Each of the three push buttons lets the GUI user choose a plot type: surf, mesh, and contour. This topic shows you how to label the buttons with those choices.
1 Select Property Inspector from the View menu.

Fig.(A.12)

2 In the layout area, select the top push button by clicking it.

Fig.(A.13)

3 In the Property Inspector, select the String property and then replace the existing value with Surf.
4 Press Enter. The push button label changes to Surf.


Fig.(A.14)

5 Select each of the remaining push buttons in turn and repeat steps 3 and 4. Label the middle push button Mesh, and the bottom button Contour.

List Pop-Up Menu Items. The pop-up menu provides a choice of three data sets: peaks, membrane, and sinc. These data sets correspond to MATLAB functions of the same name. This topic shows you how to list those data sets as choices in the pop-up menu.
1 In the layout area, select the pop-up menu by clicking it.
2 In the Property Inspector, click the button next to String. The String dialog box
displays.

Fig.(A.15)
3 Replace the existing text with the names of the three data sets: Peaks, Membrane,
and Sinc. Press Enter to move to the next line.


Fig.(A.16)

4 When you have finished editing the items, click OK. The first item in your list,
Peaks, appears in the pop-up menu in the layout area.

Fig.(A.17)


Modify the Static Text. In this GUI, the static text serves as a label for the
pop-up menu. The user cannot change this text. This topic shows you how to
change the static text to read Select Data.
1 In the layout area, select the static text by clicking it.
2 In the Property Inspector, click the button next to String. In the String dialog box
that displays, replace the existing text with the phrase Select Data.
3 Click OK. The phrase Select Data appears in the static text component above the pop-up menu.

Fig.(A.18)

Completed Simple GUIDE GUI Layout

In the Layout Editor, your GUI now looks like this. The next step is to save the layout, as described in the next topic, "Save the GUI Layout".

Fig.(A.19)


A.2.4 | Save the GUI Layout

When you save a GUI, GUIDE creates two files, a FIG-file and a code file. The FIG-file, with extension .fig, is a binary file that contains a description of the layout. The code file, with extension .m, contains MATLAB functions that control the GUI.
1 Save and activate your GUI by selecting Run from the Tools menu.
2 GUIDE displays the following dialog box. Click Yes to continue.

Fig.(A.20)

3 GUIDE opens a Save As dialog box in your current folder and prompts you for a
FIG-file name.

Fig.(A.21)


4 Browse to any folder for which you have write privileges, and then enter the
filename simple_gui for the FIG-file. GUIDE saves both the FIG-file and the code
file using this name.
5 If the folder in which you save the GUI is not on the MATLAB path, GUIDE
opens a dialog box, giving you the option of changing the current folder to the
folder containing the GUI files, or adding that folder to the top or bottom of the
MATLAB path.

Fig.(A.22)

6 GUIDE saves the files simple_gui.fig and simple_gui.m and activates the GUI. It
also opens the GUI code file in your default editor.
The GUI opens in a new window. Notice that the GUI lacks the standard menu bar
and toolbar that MATLAB figure windows display. You can add your own menus
and toolbar buttons with GUIDE, but by default a GUIDE
GUI includes none of these components. When you operate simple_gui, you can
select a data set in the pop-up menu and click the push buttons, but nothing
happens. This is because the code file contains no statements to service the pop-up
menu and the buttons.


Fig.(A.23)

To run a GUI created with GUIDE without opening GUIDE, execute its code file by typing its name at the command prompt:
simple_gui
You can also use the run command with the code file, for example:
run simple_gui
Note: Do not attempt to run a GUIDE GUI by opening its FIG-file outside of GUIDE. If you do so, the figure opens and appears ready to use, but the GUI does not work, because its initialization code has not run.


Appendix B: RFID
B.1 | INTRODUCTION

RFID stands for Radio-Frequency IDentification. The acronym refers to small electronic devices that consist of a small chip and an antenna. The chip typically is capable of carrying 2,000 bytes of data or less. RFID technology has been available for more than fifty years.

An RFID system has three parts:

1- A scanning antenna.
2- A transceiver with a decoder to interpret the data.
3- A transponder (the RFID tag) that has been programmed with information.

B.2 | HOW RFID WORKS?

The scanning antenna puts out radio-frequency signals in a relatively short range. The RF radiation does two things:

• It provides a means of communicating with the transponder (the RFID tag), AND
• It provides the RFID tag with the energy to communicate (in the case of a passive RFID tag).

This is an absolutely key part of the technology; RFID tags do not need to contain batteries, and can therefore remain usable for very long periods of time (maybe decades).

The scanning antennas can be permanently affixed to a surface; handheld antennas are also available. They can take whatever shape you need; for example, you could build them into a door frame to accept data from persons or objects passing through.

When an RFID tag passes through the field of the scanning antenna, it detects
the activation signal from the antenna. That "wakes up" the RFID chip, and it
transmits the information on its microchip to be picked up by the scanning antenna.


In addition, the RFID tag may be one of two types. Active RFID tags have their own power source; the advantage of these tags is that the reader can be much farther away and still get the signal. Even though some of these devices are built to have up to a 10-year life span, they have limited life spans. Passive RFID tags, however, do not require batteries, and can be much smaller and have a virtually unlimited life span.

RFID tags can be read in a wide variety of circumstances, where barcodes or other
optically read technologies are useless.

 The tag need not be on the surface of the object (and is therefore not
subject to wear)
 The read time is typically less than 100 milliseconds
 Large numbers of tags can be read at once rather than item by item.

B.3 | TECHNICAL PROBLEMS WITH RFID

1-Problems with RFID Standards

RFID has been implemented in different ways by different manufacturers; global standards are still being worked on. It should be noted that some RFID devices are never meant to leave their network. This can cause problems for companies.

2-RFID systems can be easily disrupted

Since RFID systems make use of the electromagnetic spectrum (like Wi-Fi
networks or cell phones), they are relatively easy to jam using energy at the right
frequency. Although this would only be an inconvenience for consumers in stores
(longer waits at the checkout), it could be disastrous in other environments where
RFID is increasingly used, like hospitals or in the military in the field.

Also, active RFID tags (those that use a battery to increase the range of the
system) can be repeatedly interrogated to wear the battery down, disrupting the
system.

3-RFID Reader Collision

Reader collision occurs when the signals from two or more readers overlap. The tag is unable to respond to simultaneous queries. Systems must be carefully set up to avoid this problem; many systems use an anti-collision protocol (also called a singulation protocol). Anti-collision protocols enable the tags to take turns in transmitting to a reader.

4-RFID Tag Collision

Tag collision occurs when many tags are present in a small area; but since the read time is very fast, it is easier for vendors to develop systems that ensure that tags respond one at a time.
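One common singulation scheme is framed slotted ALOHA: each tag picks a random slot in a frame, only slots containing exactly one responder yield a successful read, and collided tags retry in the next frame. A rough MATLAB simulation of that idea is sketched below; the tag count and frame size are arbitrary illustrative values, not parameters of any particular RFID standard:

```matlab
tags = 20; frameSize = 16;          % illustrative values
readTags = 0; rounds = 0;
rng(1);                             % repeatable demo run
while tags > 0
    slots = randi(frameSize, tags, 1);              % each tag picks a slot
    counts = accumarray(slots, 1, [frameSize 1]);   % tags per slot
    singles = sum(counts == 1);     % slots with exactly one tag: read OK
    readTags = readTags + singles;
    tags = tags - singles;          % collided tags retry in the next frame
    rounds = rounds + 1;
end
fprintf('All %d tags read after %d frames\n', readTags, rounds);
```

Because the read time per slot is short, even several retry frames complete quickly, which is why tag collision is considered the easier problem to engineer around.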

B.4 | SECURITY, PRIVACY AND ETHICS PROBLEMS WITH RFID

1-The contents of an RFID tag can be read after the item leaves the
supply chain

An RFID tag cannot tell the difference between one reader and another. RFID scanners are very portable, and RFID tags can be read from a distance, from a few inches to a few yards. This allows anyone to see the contents of your purse or pocket as you walk down the street. Some tags can be turned off when the item has left the supply chain.
2-RFID tags are difficult to remove

RFID tags are difficult for consumers to remove; some are very small (less than a half-millimeter square, and as thin as a sheet of paper), and others may be hidden or embedded inside a product where consumers cannot see them. New technologies allow RFID tags to be "printed" right on a product and may not be removable at all.

3-RFID tags can be read without your knowledge

Since the tags can be read without being swiped or obviously scanned, anyone
with an RFID tag reader can read the tags embedded in your clothes and other
consumer products without your knowledge. For example, you could be
scanned before you enter the store, just to see what you are carrying. You might
then be approached by a clerk who knows what you have in your backpack or
purse, and can suggest accessories or other items.

4-RFID tags can be read at greater distances with a high-gain antenna


For various reasons, RFID reader/tag systems are designed so that distance
between the tag and the reader is kept to a minimum (see the material on tag
collision above). However, a high-gain antenna can be used to read the tags from
much further away, leading to privacy problems.

5-RFID tags with unique serial numbers could be linked to an individual credit card number

At present, the Universal Product Code (UPC) implemented with barcodes allows each product sold in a store to have a unique number that identifies that product. Work is proceeding on a global system of product identification that would allow each individual item to have its own number. When the item is scanned for purchase and is paid for, the RFID tag number for a particular item can be associated with a credit card number.
B.5 | RFID TAG

An RFID tag is a microchip combined with an antenna in a compact package; the packaging is structured to allow the RFID tag to be attached to an object to be tracked. The tag's antenna picks up signals from an RFID reader or scanner and then returns the signal, usually with some additional data (like a unique serial number or other customized information). RFID tags can be very small, the size of a large rice grain; others may be the size of a small paperback book.
B.5.1 | What Are Zombie RFID Tags?

One of the main concerns with RFID tags is that their contents can be read by
anyone with an appropriately equipped scanner - even after you take it out of the
store.

One technology that has been suggested is a zombie RFID tag, a tag that can be
temporarily deactivated when it leaves the store. The process would work like this:
you bring your purchase up to the register, the RFID scanner reads the item, you
pay for it and as you leave the store, you pass a special device that sends a signal to
the RFID tag to "die." That is, it is no longer readable.

The "zombie" element comes in when you bring an item back to the store. A
special device especially made for that kind of tag "re-animates" the RFID tag,
allowing the item to reenter the supply chain.
