Module 1 - Fundamentals of NN


Professional Elective III

Soft Computing
Books
• Textbooks:
1. “Neural Networks, Fuzzy Logic and Genetic Algorithms”, S. Rajasekaran and G. A. Vijayalakshmi Pai, PHI.

• References:
1. MIT-OCW
2. “Introduction to the Theory of Neural Computation”, J. Hertz, A. Krogh and R. G. Palmer, Addison-Wesley.
3. “Artificial Neural Networks”, B. Yegnanarayana, PHI.
4. “Genetic Algorithms”, David E. Goldberg, Addison-Wesley.
Course Learning Outcomes

After the completion of the course, the student should be able to:

CO1: Interpret soft computing schemes using knowledge of discrete mathematics, data structures, theory of computer science and computer architectures. (Bloom's level 2, Understanding)
CO2: Demonstrate machine learning processes. (Bloom's level 3, Applying)
CO3: Compare and analyse soft computing schemes. (Bloom's level 4, Analyzing)
CO4: Design schemes using soft computing. (Bloom's level 5, Creating)
CO5: Evaluate various schemes of soft computing. (Bloom's level 5, Evaluating)

SC Syllabus
• Module 1 Fundamentals of Neural
Networks
Basics: Human Brain, Model of Artificial
Neuron, Neural Network Architectures,
Characteristics of Neural Networks, Learning
Methods; McCulloch-Pitts Model.
SC Syllabus
• Module 2 Back propagation Networks
BPN Architecture, back propagation learning,
applications: Parity Problem, Encoder-
Decoder, NETtalk and DECtalk, Character
Recognition, Learning Time Sequences,
Cognitron; CNN, RCNN.
SC Syllabus
• Module 3 Unsupervised Learning
Introduction, ART1 Architecture, ART1
Algorithm, Kohonen’s Algorithm,
Applications of ART1
• Module 4 Fuzzy Systems
Crisp logic; Predicate Logic; Fuzzy logic:
Fuzzy Quantifiers, Fuzzy Inference; Fuzzy
Rule Based System; Defuzzification Methods,
Application
SC Syllabus
• Module 5 Genetic Algorithm
Fundamentals: Biological background, Creation of
Offsprings, Working Principle, Encoding,
Reproduction; Mathematical Foundations; Data
Structure: Mutation, Crossover, Selection;
Applications
• Module 6 Hybrid Systems
Integration of neural networks, fuzzy logic and genetic
algorithms: Hybrid Systems; Neuro-Fuzzy hybrids,
Neuro-Evolutionary Hybrids, Fuzzy-Evolutionary
Hybrids, GA-based BPN, Simplified Fuzzy ARTMAP
of neural networks.
Soft Computing
• It is a concept of computation.
• It is the use of approximate calculations to provide approximate but usable solutions to complex computational problems.
• Computation can be viewed as a mapping y = f(x), where f is a mapping function which maps an input to some output.
• Characteristics of computing:
– It should provide a precise solution.
– The control action should be unambiguous and accurate.
– An algorithm should be available.
• Hard Computing:
– Introduced by L. A. Zadeh.
– He defines computing as hard if:
• A precise result is guaranteed.
• The control action is unambiguous.
• The control actions are formally defined, either in the form of some mathematical model or an algorithm.
– Examples of hard computing:
• Solving numerical problems
• Searching & sorting techniques
• Solving computational geometry problems, etc.
• Soft Computing:
– Proposed by L. A. Zadeh.
– According to him, soft computing is “a collection of methodologies that aim to exploit the tolerance for imprecision and uncertainty to achieve tractability, robustness, and low solution cost.”
– Soft computing encompasses fuzzy logic, neural computing and genetic algorithms.
– The role model for soft computing is, in fact, the human mind.
• Soft Computing Characteristics:
– It does not require any mathematical model.
– It may not yield a precise solution.
– Algorithms are adaptive; that is, they can adjust to changes in a dynamic situation.
– It uses biologically inspired methodologies that draw on nature and human behaviour, such as genetics, evolution, the behaviour of ant colonies, the swarming of particles, and our nervous system.
– Examples:
• Handwritten character recognition (ANN)
• Money allocation problem, e.g. identifying where or in which bank to invest (an evolutionary problem solved using GA)
• Robot movement (fuzzy logic)
• Different techniques behind SC:
1. How does a student learn from a teacher?
➢ The teacher asks questions and tells the answers.
➢ The teacher poses questions, hints at answers, and asks whether the student's answers are correct or not.
➢ The student thus learns a topic and stores it in memory.
➢ Based on this knowledge, the student can then solve many new problems.
➢ This is, in fact, exactly the way the human brain works, and the artificial neural network is based on this concept; for example, handwritten characters can be recognized.
2. How does the world select the best?
➢ Start with a population, initially random.
➢ Reproduce another population (the next generation).
➢ Rank the population and select the best individuals.
• Different techniques behind SC:
3. How does a doctor treat a patient?
➢ The doctor asks the patient about the problem he is suffering from.
➢ The doctor finds the symptoms of the disease.
➢ The doctor prescribes some tests and medicine.
➢ This is exactly the way fuzzy logic works.
Hard Vs. Soft Computing

Hard Computing | Soft Computing
It requires a precisely stated analytical model and is computationally expensive. | It is tolerant of imprecision, partial truth and approximation.
It is based on binary logic, crisp systems, numerical analysis and crisp software. | It is based on fuzzy logic, neural networks and evolutionary computation (GA).
It has the characteristics of precision and categoricity. | It has the characteristics of approximation and dispositionality.
It is deterministic. | It is probabilistic.
It requires exact input data. | It can deal with ambiguous and noisy data.
It is strictly sequential. | It can be carried out using parallel computation.
It produces precise answers. | It can yield approximate answers.
Module 1
Fundamentals of Neural
Networks
Human Nervous System
• In the language of biology, a neural network is the human nervous system: the totality of neurons in our brain, through which we think, make decisions, and perceive the world around us.
• A biological neuron is a special cell consisting of a nucleus, a body, and connectors. Each neuron has close connections with thousands of other neurons. Electrochemical impulses are transmitted through these connections, putting the entire neural network into a state of excitation, or vice versa. For example, a pleasant and exciting event (meeting a loved one, winning a competition) will generate an electrochemical impulse in the neural network located in our head, which will lead to its excitement. As a result, the neural network in our brain will transmit its excitement to other organs of our body, leading to an increased heartbeat, more frequent blinking of the eyes, etc.
Biological Neural Network
Biological Neuron
• Has 3 parts:
– Soma or cell body: where the cell nucleus is located
– Dendrites: nerve fibres connected to the cell body
– Axon: carries the impulses of the neuron
• The end of the axon splits into fine strands
• Each strand terminates in a bulb-like organ called a synapse
• Electric impulses are passed between the synapse and dendrites
• Synapses are of two types:
– Inhibitory: impulses hinder the firing of the receiving cell
– Excitatory: impulses cause the firing of the receiving cell
• A neuron fires when the weighted total of received impulses exceeds the threshold value
Fundamental Concept
• NNs are constructed and implemented to model the human brain.
• They perform various tasks such as pattern matching, classification, optimization, function approximation, vector quantization and data clustering.
• These tasks are difficult for traditional computers.
What are Neural Networks?
• Model of the brain and nervous system
• Highly parallel
– Process information much more like the brain
than a serial computer
• Applications
– As powerful problem solvers
– As biological models
Artificial Neural Network (ANN)
• ANN is a machine learning approach inspired by the way in which the brain performs a particular learning task.
• ANNs possess a large number of processing elements called nodes/neurons which operate in parallel.
• Neurons are connected to each other by connection links.
• Each link is associated with a weight which contains information about the input signal.
• Each neuron has its own function of the inputs that the neuron receives - the activation function.
• Neuron vs. node: the node is the artificial counterpart of the biological neuron.
• Structure of a node: a weighted sum of the inputs is passed through a squashing function, which limits the node output to (0, 1).
• Synapse vs. weight: the weight is the artificial counterpart of the biological synapse.
Artificial Neural Networks
• A simple neuron Y with two inputs: input nodes X1 and X2 carry signals x1 and x2 over weighted links w1 and w2.
• The net input and the output are:

y_in = x1 w1 + x2 w2
y = f(y_in)
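
A minimal Python sketch of this two-input neuron (the binary step activation and the 0.5 threshold are assumed choices for illustration):

def f(y_in, theta=0.5):
    # binary step activation: an assumed choice; any squashing function works
    return 1 if y_in >= theta else 0

def neuron(x1, x2, w1, w2):
    y_in = x1 * w1 + x2 * w2   # weighted sum of the inputs
    return f(y_in)             # activation gives the output y

print(neuron(1, 0, 0.4, 0.6))  # -> 0 (0.4 < 0.5)
print(neuron(1, 1, 0.4, 0.6))  # -> 1 (1.0 >= 0.5)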
Characteristics of NNs
• Exhibit mapping capabilities
• Learn by examples, i.e. self-learning
• Possess the capability to generalize
• Are robust and fault-tolerant systems: they can recall full patterns from incomplete, partial or noisy patterns
• Can process information in parallel, at high speed, and in a distributed manner
Applications of NNs
• Classification
In marketing: consumer spending pattern classification
In defence: radar and sonar image classification
In agriculture & fishing: fruit and catch grading
In medicine: ultrasound and electrocardiogram (ECG) image classification, medical
diagnosis
• Recognition and Identification
In general computing and telecommunications: speech, vision and handwriting recognition
In finance: signature verification and bank note verification
• Assessment
In engineering: product inspection monitoring and control
In defence: target tracking
In security: motion detection, surveillance image analysis and fingerprint matching
• Forecasting and Prediction
In finance: foreign exchange rate and stock market forecasting
In agriculture: crop yield forecasting
In marketing: sales forecasting
In meteorology: weather prediction
ANNs – The basics
• ANNs incorporate the two fundamental
components of biological neural nets:

1. Neurons (nodes)
2. Synapses (weights)
Basic Models of ANN
• The basic models of ANN are characterized by three entities:
– Interconnections
– Learning rules
– Activation function

Classification based on interconnections:
• Feed-forward networks
– Single layer
– Multilayer
• Feedback networks
• Recurrent networks
– Single layer
– Multilayer
1. Feed-forward nets
• Information flow is unidirectional:
– Data is presented to the input layer
– Passed on to the hidden layer
– Passed on to the output layer
• Information is distributed
• Information processing is parallel
• Often used in data mining
• Form an internal representation (interpretation) of the data
1.1 Single Layer Feed-forward
• An input layer of source nodes projects directly onto an output layer of neurons.
1.2 Multilayer feed-forward
• A 3-4-2 network: an input layer of 3 source nodes, a hidden layer of 4 neurons, and an output layer of 2 neurons.
2. Feedback network
• When outputs are directed back as inputs to nodes of the same or a preceding layer, a feedback network is formed.
• Used in associative memories and optimization problems, where the network looks for the best arrangement of interconnected factors.
3. Recurrent Network
• Feedback networks with a closed loop are called recurrent networks.
• The response at the (k+1)-th instant depends on the entire history of the network starting at k = 0.
• Types:
– Single node with its own feedback
– Single-layer recurrent networks
– Multilayer recurrent networks
3.3 Multilayer Recurrent Network
• A recurrent network with hidden neuron(s): feedback through the unit delay operator z⁻¹ makes it a dynamic system.
• (Figure: input, hidden and output layers with z⁻¹ delay elements in the feedback paths.)
Example of Recurrent NN
• First we see a simple example of an NN.
• Suppose we have a roommate who is perfect: he cooks every day, choosing among three dishes based on the weather.
• He cooks orange juice, pakora and Manchurian.
• We assign one-hot vectors to the foods and to the weather, e.g.

orange juice = (1 0 0), pakora = (0 1 0), Manchurian = (0 0 1)
sunny = (1 0), rainy = (0 1)
• Now for the next example: consider that the roommate cooks a different food every day, e.g. if today he cooks orange juice, the next day he cooks pakora, the day after Manchurian, and so on in a cycle.
• A more complex RNN combines both: the cooking schedule is

Day:     Mon  Tue  Wed  Thu  Fri  Sat  Sun
Food:    OJ   OJ   P    M    M    OJ   P
Weather: S    R    R    S    R    R

• The food matrix tells which food is to be cooked and the weather matrix tells for which day the food is cooked. (The pattern implied above: after a sunny day the same food is repeated; after a rainy day the next food in the cycle is cooked.)
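
As a hedged sketch of the recurrent idea, the simple cyclic case alone can be written as a one-hot vector multiplied by a permutation matrix; the food ordering and the matrix are assumptions that encode the slide's "OJ -> pakora -> Manchurian" rule:

import numpy as np

FOODS = ["OJ", "Pakora", "Manchurian"]

T = np.array([[0, 0, 1],     # column k holds the one-hot vector of the
              [1, 0, 0],     # food cooked the day after food k
              [0, 1, 0]])

food = np.array([1, 0, 0])   # day 1: orange juice
for day in range(1, 6):
    print(f"Day {day}: {FOODS[int(np.argmax(food))]}")
    food = T @ food          # recurrent step: today's output feeds back
                             # as tomorrow's input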
Basic Models of ANN
• Interconnections
• Learning rules
• Activation function
Learning
• It is the process by which an NN adapts itself to a stimulus by making proper parameter adjustments, resulting in the production of the desired response.
• Two kinds of learning:
– Parameter learning: connection weights are updated
– Structure learning: change in network structure
1. Training (Parameter Learning)
• The process of modifying the weights in the connections between network layers with the objective of achieving the expected output is called training a network.
• This is achieved through:
– Supervised learning
– Unsupervised learning
– Reinforcement learning
Classification of learning
• Supervised learning
• Unsupervised learning
• Reinforcement learning
1.1 Supervised Learning
• The child learns from a teacher.
• Each input vector requires a corresponding target vector.
• Training pair = [input vector, target vector]
• (Block diagram: input X enters the neural network with weights W and produces the actual output Y; an error signal generator compares Y with the desired output D and feeds the error signals (D − Y) back to the network.)
Supervised learning contd.
• Supervised learning performs minimization of the error.
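
As an illustrative sketch of error minimization (the slides do not fix a particular rule), the classic delta/perceptron update w ← w + α(D − Y)x can learn the OR function; the learning rate, epoch count and step activation are assumptions, and the bias is folded in as x0 = 1:

import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)

def predict(x, w):
    return 1.0 if x @ w >= 0 else 0.0        # step-activation output Y

# training pairs [input vector, target] for the OR function
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

for epoch in range(25):
    for (x1, x2), d in data:
        x = np.array([1.0, x1, x2])          # x0 = 1 carries the bias weight
        y = predict(x, w)
        w += 0.1 * (d - y) * x               # adjust weights by the error signal

print([int(predict(np.array([1.0, a, b]), w)) for (a, b), _ in data])  # [0, 1, 1, 1]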
1.2 Unsupervised Learning
• All similar input patterns are grouped together as clusters.
• If a matching input pattern is not found, a new cluster is formed.

Unsupervised learning: self-organizing
• In unsupervised learning there is no feedback.
• The network must discover patterns, regularities and features from the input data on its own.
• While doing so, the network might change its parameters.
• This process is called self-organizing.
1.3 Reinforcement Learning
• (Block diagram: input X enters the NN with weights W and produces the actual output Y; an error signal generator receives a reinforcement signal R and feeds error signals back to the network.)
Reinforcement Learning (Contd…)
• Though a teacher is available, the teacher does not provide the expected answer.
• The teacher only indicates whether the computed answer is correct or incorrect: a reward is given for a correct answer and a penalty for a wrong one.
• Feedback is provided in terms of reinforcement signals.
• This helps the network in its learning process.
Basic Models of ANN
• Interconnections
• Learning rules
• Activation function
Activation Functions
• Also known as the squashing function or transfer function.
• Defines the output of a neuron.
• Used to calculate the output response of a neuron: the sum of the weighted input signals is applied to an activation function to obtain the response.
• Activation functions can be linear or non-linear.
Activation Functions
• Binary step function (threshold θ):

f(x) = 1 if x ≥ θ, 0 if x < θ

• Bipolar step function (threshold θ):

f(x) = 1 if x ≥ θ, −1 if x < θ

• Ramp function:

f(x) = 1 if x ≥ 1, x if 0 ≤ x < 1, 0 if x < 0
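
These three functions can be written directly in Python; the boundary convention (≥ θ on the upper branch) follows the reconstruction above:

import numpy as np

def binary_step(x, theta=0.0):
    return np.where(x >= theta, 1, 0)        # 1 if x >= theta else 0

def bipolar_step(x, theta=0.0):
    return np.where(x >= theta, 1, -1)       # 1 if x >= theta else -1

def ramp(x):
    return np.clip(x, 0.0, 1.0)              # 1 if x >= 1, x if 0 <= x < 1, 0 if x < 0

x = np.array([-0.5, 0.0, 0.3, 1.2])
print(binary_step(x))   # [0 1 1 1]
print(bipolar_step(x))  # [-1  1  1  1]
print(ramp(x))          # [0.  0.  0.3 1. ]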
The Neuron
• The neuron is the basic information processing unit of an NN. It consists of:
1. A set of synapses or connecting links, each link characterized by a weight: w1, w2, …, wm
2. An adder function (linear combiner) which computes the weighted sum of the inputs:

u = Σ(j=1..m) wj xj

3. An activation function (squashing function) φ for limiting the amplitude of the output of the neuron:

y = φ(u + b)
The Neuron
• (Block diagram: input signals x1, x2, …, xm are multiplied by synaptic weights w1, w2, …, wm and combined by the summing function Σ, together with the bias b, to give the local field v; the activation function φ(·) then produces the output y.)
Bias
• Example: suppose we have the equation of a line, y = mx + c.
Why is bias required?
• The relationship between input and output is given by the equation of a straight line, y = mx + c: the input x is weighted by the slope m, and the bias contributes the constant c.
Bias
• It is a constant that helps the model fit the given data best.
• It gives the model the freedom to perform best.
• Bias is like another weight: it is included by adding a component x0 = 1 to the input vector X.
• X = (1, X1, X2, …, Xi, …, Xn)
• Bias is of two types:
– Positive bias: increases the net input
– Negative bias: decreases the net input
Bias as extra input
• Bias is an external parameter of the neuron. It can be modeled by adding an extra input x0 = +1 with weight w0 = b:

v = Σ(j=0..m) wj xj

• (Block diagram: inputs x0 = +1, x1, …, xm with weights w0 = b, w1, …, wm feed the summing function Σ to give the local field v, which passes through the activation function φ(·) to produce the output y.)
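
A small sketch of folding the bias into the weight vector as w0 = b with x0 = +1 (the numbers are arbitrary):

import numpy as np

def local_field(x, w, b):
    x_aug = np.concatenate(([1.0], x))       # prepend x0 = +1
    w_aug = np.concatenate(([b], w))         # prepend w0 = b
    return x_aug @ w_aug                     # v = b + sum_j wj xj

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.2,  0.4, 0.1])
print(local_field(x, w, b=0.3))  # 0.3 + 0.1 - 0.4 + 0.2 = 0.2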
Comparison between brain and computer

                    | Brain                                  | ANN
Speed               | A few ms                               | A few ns; massively parallel processing
Size and complexity | 10^11 neurons & 10^15 interconnections | Depends on the designer
Storage capacity    | Stores information in its interconnections (synapses); no loss of memory | Contiguous memory locations; loss of memory may sometimes happen
Tolerance           | Has fault tolerance                    | No fault tolerance; it gets disrupted when interconnections are disconnected
Control mechanism   | Complicated; involves chemicals in the biological neuron | Simpler in ANN
McCulloch-Pitts Neuron Model
McCulloch-Pitts Model (Contd…)
• Allows binary 0/1 states only.
• No interaction among network neurons.
• Widely used for logic functions.
• With feedback, this model can also act as a memory cell, which can be used in the absence of input.
• In this model, the weights are fixed. Hence networks using this model do not have learning capability.
McCulloch-Pitts Model Example

Object    | Purple? | Round? | Eat?
Blueberry | Yes     | Yes    | Yes
Golf ball | No      | Yes    | No
Violet    | Yes     | No     | No
Hot Dog   | No      | No     | No
Situation if the threshold is set at 1:

Object    | Purple? | Round? | Total x (combined input) | x > 1? | Eat?
Blueberry | 1       | 1      | 2                        | Yes    | 1
Golf ball | 0       | 1      | 1                        | No     | 0
Violet    | 1       | 0      | 1                        | No     | 0
Hot Dog   | 0       | 0      | 0                        | No     | 0
Situation if the threshold is set at 2:

Object    | Purple? | Round? | Total x | x > 2? | Eat?
Blueberry | 1       | 1      | 2       | No     | 0
Golf ball | 0       | 1      | 1       | No     | 0
Violet    | 1       | 0      | 1       | No     | 0
Hot Dog   | 0       | 0      | 0       | No     | 0
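
A minimal sketch reproducing both tables; the fixed unit weights and the strict test x > threshold are taken from the columns above:

def mp_neuron(purple, round_, threshold):
    x = 1 * purple + 1 * round_      # fixed, non-learnable weights of 1
    return 1 if x > threshold else 0

objects = {"Blueberry": (1, 1), "Golf ball": (0, 1),
           "Violet": (1, 0), "Hot Dog": (0, 0)}

for name, (p, r) in objects.items():
    # threshold 1 fires only for the blueberry; threshold 2 never fires
    print(name, mp_neuron(p, r, 1), mp_neuron(p, r, 2))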
Assignment Example 1 (MP Model)
OR function, with W1 = 1, W2 = 1, threshold (θ) = 0:

I1 | I2
0  | 0
0  | 1
1  | 0
1  | 1
Assignment Example 2 (MP Model)
OR function, with W1 = 0.5, W2 = 0.5, threshold (θ) = 0:

I1 | I2
0  | 0
0  | 1
1  | 0
1  | 1

Hint: You may change the threshold to 0.5 if necessary.
Hopfield Model
• A Hopfield Network is a model of associative memory.
• It provides a formal model which can be analyzed for
determining the storage capacity of the network.
• It consists of a single layer which contains one or more
fully connected recurrent neurons.
• It is commonly used for auto-association and optimization
tasks.
Hopfield Model (Contd…)
• The associative memory problem is summarized as follows:
✓ Store a set of p patterns ξi^μ in such a way that when presented with a new pattern ζi, the network responds by producing whichever one of the stored patterns most closely resembles ζi.
• The patterns are labeled by μ = 1, 2, …, p, while the units in the network are labeled by i = 1, 2, …, N. Both the stored patterns ξi^μ and the test patterns ζi can be taken to be either 0 or 1.
Hopfield Network
• The purpose of a Hopfield net is to store one or more patterns and to recall the full patterns based on partial input, or even some corrupted information about a pattern.
• Example: the problem of optical character recognition.
Associative Memory problem: Example
Hopfield Network
• All the nodes in a Hopfield network are both inputs and outputs.
• They are fully interconnected.
• The link from each node to itself is treated as a link with a weight of 0.
How it works
• Put a distorted pattern onto the nodes of the network.
• Iterate a number of times.
• Eventually the network arrives at one of the patterns we trained it to know and stays there.
• Steps:
✓ How to "train" the network
✓ How to update a node in the network
✓ How the overall sequencing of node updates is accomplished
✓ How you can tell if you're at one of the trained patterns
How to "train" a Hopfield network

(1)

(2)
How to "test" a Hopfield network
• Testing algorithm (concluding steps):
8. Now feed back the obtained output Vi to all other units. The activation vectors are updated.
9. Finally, test the network for convergence.
Example
• We have a 5-node Hopfield network and we want it to recognize the pattern (0 1 1 0 1).
• Since there are 5 nodes, we need a matrix of 5 × 5 weights:

 0   W12  W13  W14  W15
W21   0   W23  W24  W25
W31  W32   0   W34  W35
W41  W42  W43   0   W45
W51  W52  W53  W54   0

• If you switch the i and j indexes, you get the same result, i.e. wij = wji.
• For the vector V¹ = (0 1 1 0 1):
V¹₁ = 0, V¹₂ = 1, V¹₃ = 1, V¹₄ = 0, and V¹₅ = 1.
Weight Matrix for pattern (0 1 1 0 1)
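
The matrix values can be computed with a short sketch; the rule wij = Σs (2Vi^s − 1)(2Vj^s − 1) with zero diagonal is the one stated earlier, and the same function handles the two-pattern case below:

import numpy as np

def hopfield_weights(patterns):
    n = len(patterns[0])
    W = np.zeros((n, n), dtype=int)
    for p in patterns:
        v = 2 * np.array(p) - 1          # map {0,1} -> {-1,+1}
        W += np.outer(v, v)              # superpose one outer product per pattern
    np.fill_diagonal(W, 0)               # no self-connections (wii = 0)
    return W

print(hopfield_weights([(0, 1, 1, 0, 1)]))
print(hopfield_weights([(0, 1, 1, 0, 1), (1, 0, 1, 0, 1)]))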
Example for 2 patterns
• V1 = (0 1 1 0 1) and V2 = (1 0 1 0 1)
• Computation of weight matrix
Weight Matrix for 2 patterns
V1 = (0 1 1 0 1) and V2 = (1 0 1 0 1)
How to update a node in a Hopfield network
• Consider the input pattern (1 1 1 1 1).
• Formula to update a node: Vi = 1 if Σj wij Vj ≥ 0, else Vi = 0.
Alternate method to calculate the weighted sum
Sequencing of node updates in a Hopfield network
• There are at least two ways in which we might carry out the updating:
• Synchronously: update all the units simultaneously at each time step.
• Asynchronously: update them one at a time. Either:
– At each time step, select at random a unit i to be updated, and apply the formula; or
– Let each unit independently choose to update itself according to the above formula.
Sequencing of node updates (Contd…)
• In practice, people often code Hopfield nets in a semi-random order: they update all of the nodes in one step, but within that step the nodes are updated in random order.
• So it might go 3, 2, 1, 5, 4, then 2, 3, 1, 5, 4, etc.
When to stop updating the network
• Basically, if you go through all the nodes and
none of them changes, you can stop
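
Putting the update rule, the asynchronous sequencing and the stopping test together, a minimal recall sketch (threshold-0 updates assumed; the node order matches the example that follows):

import numpy as np

def recall(W, v, order):
    v = np.array(v)
    while True:
        changed = False
        for i in order:
            new = 1 if W[i] @ v >= 0 else 0   # Vi = 1 if sum_j wij Vj >= 0, else 0
            if new != v[i]:
                v[i], changed = new, True
        if not changed:                        # a full sweep with no change: stop
            return v

# weight matrix for the stored pattern (0 1 1 0 1), as computed earlier
vp = 2 * np.array([0, 1, 1, 0, 1]) - 1
W = np.outer(vp, vp)
np.fill_diagonal(W, 0)

# node order (3, 1, 5, 2, 4) -> zero-based indices (2, 0, 4, 1, 3)
print(recall(W, (1, 1, 1, 1, 1), order=[2, 0, 4, 1, 3]))   # -> [0 1 1 0 1]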
Example (3, 1, 5, 2, 4)
for input pattern (1 1 1 1 1)
Assignment
• Solve the above example for the following sequence: (2, 4, 3, 5, 1)
Example (2, 4, 3, 5, 1)
for input pattern (1 1 1 1 1)
Hopfield Model
• We will study the memorization (i.e. finding a set of suitable wij) of a set of random patterns, which are made up of independent bits Vi, each of which can take on the values +1 and −1 with equal probability.
• Our procedure for testing whether a proposed form of wij is acceptable is first to see whether the patterns are themselves stable, and then to check whether small deviations from these patterns are corrected as the network evolves.
• We distinguish two cases:
– One pattern
– Many patterns
Hopfield Model: Storage of One Pattern
• The condition for a single pattern to be stable is:

Vi = sgn(Σj wij Vj)   (for all i)

• It is easy to see that this is true if we take the weights proportional to the product of the components:

wij ∝ Vi Vj

since Vi² = 1. For convenience we take the constant of proportionality to be 1/N, where N is the number of units in the network. Thus we have:

wij = (1/N) Vi Vj
Hopfield Model: Storage of One Pattern
• Furthermore, it is also obvious that even if a number (fewer than half) of the bits of the starting pattern Si are wrong (i.e. not equal to Vi), they will be overwhelmed in the sum for the net input

hi = Σj wij Sj

by the majority that are right, and sgn(hi) will still give Vi. This means that the network will correct errors as desired, and we can say that the pattern Vi is an attractor.
Hopfield Model: Storage of Many Patterns
• In the case of many patterns, the weights are taken to be a superposition of terms like the one for a single pattern:

wij = (1/N) Σ(μ=1..p) Vi^μ Vj^μ

where p is the number of patterns, labeled by μ.

• An associative memory model of the above form for all possible pairs ij, with binary units and asynchronous updating, is usually called a Hopfield model.
Hopfield Model: Stability of a Particular Pattern
• Let us examine the stability of a particular pattern V^s. The stability condition generalizes to:

Vi^s = sgn(hi^s)   (for all i)

where the net input hi^s to unit i in pattern s is:

hi^s ≡ Σj wij Vj^s = (1/N) Σj Σμ Vi^μ Vj^μ Vj^s

• Now we separate the sum on μ into the special term μ = s and all the rest:

hi^s = Vi^s + (1/N) Σj Σ(μ≠s) Vi^μ Vj^μ Vj^s
Hopfield Model: Stability of a Particular Pattern
• If the second term were zero, we could immediately conclude that pattern s was stable according to the previous stability condition. This is still true if the second term is small enough: if its magnitude is smaller than 1, it cannot change the sign of hi^s and the stability condition will still be satisfied.

• The second term is called crosstalk. It turns out that it is less than 1 in many cases of interest if p is small enough.
Optimization Problems
• Optimization algorithms help us to minimize (or maximize) an objective function (another name for the error function) E(x), which is simply a mathematical function dependent on the model's internal learnable parameters.
• For example, we call the weights (W) and the bias (b) values of the neural network its internal learnable parameters; they are used in computing the output values, and are learned and updated in the direction of the optimal solution.
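
As an illustrative sketch (the slides name no particular optimizer), plain gradient descent on a mean-squared-error objective E(W, b) for a toy linear model; the data, learning rate and step count are assumptions:

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=50)
y = 2.0 * x + 0.5                       # ground-truth targets for the toy data

W, b, lr = 0.0, 0.0, 0.1
for step in range(500):
    y_hat = W * x + b
    err = y_hat - y
    # gradients of E = mean(err^2) w.r.t. the learnable parameters W and b
    W -= lr * 2 * np.mean(err * x)
    b -= lr * 2 * np.mean(err)

print(round(W, 3), round(b, 3))         # approaches W = 2.0, b = 0.5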
Optimization Algorithms
1. The Weighted Matching Problem
2. Travelling Salesman Problem
3. Graph Bipartitioning Problem