
Prepared By: Dr. Sandeep Kumar, Professor, CSE-H
Biological Neurons
Biological Neuron
• First, we will see how a biological neuron works. A neuron can be divided into three basic units: Dendrites, Cell body (also called Soma), and Axon.
• For every task we do, a specific network of these neurons in our brain fires signals. A signal arrives at the Dendrites through Synaptic terminals, passes through the Cell body to the Axon, and is then transferred to another neuron.
Neuron Models
1. McCulloch-Pitts Model
2. Rosenblatt Perceptron Model
3. Adaptive Linear Element (ADALINE)
• The McCulloch-Pitts Neuron, abbreviated as MP Neuron, is the fundamental building block of an Artificial Neural Network.
• Similar to biological neurons, both the MP Neuron and the Perceptron Model take inputs and process them to give an output.
McCulloch Pitts Model
• Proposed by Warren McCulloch and Walter Pitts in 1943.
• This model imitates the functionality of a biological neuron and is therefore also called an Artificial Neuron.
• An artificial neuron accepts binary inputs and produces a binary output based on a certain threshold value, which can be adjusted.
• This model is mainly used for classification problems.
• If the aggregated value exceeds the threshold, the output is 1; else it is 0.
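The thresholding logic above can be sketched in a few lines of Python (a minimal sketch; the function name and the >= threshold convention are my own choices):

```python
def mp_neuron(inputs, threshold):
    """McCulloch-Pitts neuron: output 1 if the sum of the binary
    inputs reaches the threshold, else 0."""
    return 1 if sum(inputs) >= threshold else 0

# With two inputs and threshold 2, the neuron acts as an AND gate;
# with threshold 1, the same neuron acts as an OR gate.
print(mp_neuron([1, 1], threshold=2))  # 1
print(mp_neuron([1, 0], threshold=2))  # 0
print(mp_neuron([1, 0], threshold=1))  # 1
```

Note that adjusting only the threshold changes which logical function the same neuron computes.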
Rosenblatt Perceptron Model
• This model was developed by Frank Rosenblatt in 1957.
• Here, the neuron is also called a Linear Threshold Unit (LTU).
• This model can work on non-boolean values, where each input connection is associated with a weight.
• The function calculates the weighted sum and, based on the threshold value provided, gives a binary output.
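The weighted-sum-then-threshold computation of an LTU can be sketched as follows (a minimal sketch; names and example values are illustrative):

```python
def ltu(inputs, weights, threshold=0.0):
    """Linear Threshold Unit: weighted sum of real-valued inputs,
    followed by a binary step at the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0

# Real-valued inputs, unlike the MP neuron's binary-only inputs.
print(ltu([0.5, -1.0, 2.0], [1.0, 1.0, 1.0], threshold=1.0))  # 1 (sum = 1.5)
print(ltu([0.5, -1.0, 2.0], [1.0, 1.0, 1.0], threshold=2.0))  # 0
```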
Difference
• Both the MP Neuron Model and the Perceptron Model work on linearly separable data.
• The MP Neuron Model only accepts boolean inputs, whereas the Perceptron Model can process any real input.
• Inputs aren't weighted in the MP Neuron Model, which makes it less flexible. The Perceptron Model, on the other hand, can assign a weight to each input.
• In both models we can adjust the threshold to make the model fit the dataset.
XOR GATE
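XOR is the classic example of data that is not linearly separable, so no single MP Neuron or Perceptron can compute it; stacking two layers of threshold units does the job. A minimal sketch (the particular OR/NAND/AND decomposition is one of several possible choices):

```python
def ltu(inputs, weights, threshold):
    """Single threshold unit: 1 if the weighted sum reaches the threshold."""
    return 1 if sum(x * w for x, w in zip(inputs, weights)) >= threshold else 0

def xor_net(x1, x2):
    """Two-layer network: XOR(x1, x2) = AND(OR(x1, x2), NAND(x1, x2))."""
    h_or = ltu([x1, x2], [1, 1], threshold=1)       # hidden unit 1: OR
    h_nand = ltu([x1, x2], [-1, -1], threshold=-1)  # hidden unit 2: NAND
    return ltu([h_or, h_nand], [1, 1], threshold=2) # output unit: AND

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))  # prints the XOR truth table
```

The hidden layer transforms the inputs into a representation in which the problem becomes linearly separable for the output unit.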
Topology
ANN
• Processing of an ANN depends upon the following three building blocks:
1. Network Topology
2. Adjustments of Weights or Learning
3. Activation Functions
• A network topology is the arrangement of a network along with its nodes and connecting lines.
Topology
• Artificial neural networks are useful only when the processing
units are organized in a suitable manner to accomplish a given
pattern recognition task.
• The arrangement of the processing units, connections, and input/output patterns is referred to as topology.
• Artificial neural networks are normally organized into layers of processing units, with connections that are:
1. Interlayer (between layers)
2. Intralayer (within a layer)
a. Feed forward: information flow is unidirectional
b. Feed backward: information flow is bidirectional
According to the topology, ANNs can be classified into the following kinds.
Basic Learning Laws
• A learning rule or learning process is a method or a mathematical logic.
• It improves the Artificial Neural Network's performance when applied over the network.
• Thus a learning rule updates the weights and bias levels of a network when the network simulates in a specific data environment.
• Applying a learning rule is an iterative process.
• It helps a neural network to learn from the existing conditions and improve its performance.
Hebbian Learning Rule
• This rule, one of the oldest and simplest, was introduced by Donald Hebb in his 1949 book “The Organization of Behavior”. It is a kind of feed-forward, unsupervised learning.
• Basic Concept − “When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.”
• From the above postulate, we can conclude that the connections
between two neurons might be strengthened if the neurons fire at
the same time and might weaken if they fire at different times.
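In its simplest discrete form the postulate becomes the update Δwᵢ = η·xᵢ·y: a weight grows only when the pre-synaptic input xᵢ and the post-synaptic output y are active together. A minimal sketch (the learning rate and names are illustrative):

```python
def hebb_update(weights, x, y, lr=1.0):
    """Hebb rule: delta_w_i = lr * x_i * y.
    A connection is strengthened only when both the pre-synaptic
    input x_i and the post-synaptic output y are active together."""
    return [w + lr * xi * y for w, xi in zip(weights, x)]

w = hebb_update([0.0, 0.0], x=[1, 1], y=1)  # both inputs active -> [1.0, 1.0]
w = hebb_update(w, x=[1, 0], y=1)           # only first active  -> [2.0, 1.0]
print(w)
```

Note there is no error signal: the rule is unsupervised and driven purely by coincident activity.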
Perceptron Learning Rule
• This rule is an error-correcting, supervised learning algorithm for single-layer feed-forward networks with a linear activation function, introduced by Rosenblatt.
• Basic Concept − As being supervised in nature, to calculate the
error, there would be a comparison between the desired/target
output and the actual output.
• If there is any difference found, then a change must be made to
the weights of connection.
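The error-correction step w ← w + η(t − y)x can be sketched as a small training loop; here it is trained on the AND gate, which is linearly separable (the learning rate, epoch count, and bias handling are my own choices):

```python
def train_perceptron(samples, lr=0.1, epochs=20):
    """Perceptron rule: after each sample, w <- w + lr*(target - output)*x.
    Weights change only when the desired and actual outputs differ."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, t in samples:
            y = 1 if sum(xi * wi for xi, wi in zip(x, w)) + b >= 0 else 0
            err = t - y                                  # desired - actual
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]  # AND gate
w, b = train_perceptron(data)
for x, t in data:
    y = 1 if sum(xi * wi for xi, wi in zip(x, w)) + b >= 0 else 0
    print(x, t, y)  # the learned unit reproduces all four targets
```

By the perceptron convergence theorem, the loop is guaranteed to stop making corrections on linearly separable data such as this.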
Delta Learning Rule (Widrow−Hoff Rule)
• Introduced by Bernard Widrow and Marcian Hoff, it is also called the Least Mean Square (LMS) method, as it minimizes the error over all training patterns.
• It is a kind of supervised learning algorithm with a continuous activation function.
• Basic Concept − The base of this rule is the gradient-descent approach, which continues forever.
• The delta rule updates the synaptic weights so as to minimize the difference between the net input to the output unit and the target value.
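A single LMS update, Δwᵢ = η(t − net)xᵢ with the continuous output net = w·x, can be sketched as follows; repeating it drives the error toward zero (learning rate and names are illustrative):

```python
def delta_update(w, x, target, lr=0.1):
    """Widrow-Hoff (LMS) rule: delta_w_i = lr * (target - net) * x_i,
    where net = w . x is the continuous (linear) output."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + lr * (target - net) * xi for wi, xi in zip(w, x)]

w = [0.0, 0.0]
for _ in range(50):                  # repeated gradient-descent steps
    w = delta_update(w, x=[1.0, 1.0], target=1.0)
net = sum(w)                         # w . [1, 1]
print(round(net, 4))                 # very close to the target 1.0
```

Unlike the perceptron rule, the error here is computed on the continuous net input rather than the thresholded binary output, so the weights keep moving as long as any residual error remains.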
Competitive Learning Rule (Winner−takes−all)
• It is concerned with unsupervised training in which the output
nodes try to compete with each other to represent the input
pattern.
• To understand this learning rule, we must understand the
competitive network which is given as follows −
• Basic Concept of Competitive
Network − This network is just like a
single layer feed forward network with
feedback connection between outputs.
• The connections between outputs are
inhibitory type, shown by dotted lines,
which means the competitors never
support themselves.
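A winner-takes-all update can be sketched as: find the output unit whose weight vector is closest to the input, and move only that unit's weights toward the input. The squared-distance competition and learning rate below are my own choices:

```python
def competitive_update(W, x, lr=0.5):
    """Winner-takes-all: only the closest unit learns. The inhibitory
    connections between outputs are modeled implicitly by leaving every
    other unit's weights unchanged."""
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(row, x)) for row in W]
    winner = dists.index(min(dists))
    W[winner] = [wi + lr * (xi - wi) for wi, xi in zip(W[winner], x)]
    return winner

W = [[0.0, 0.0], [1.0, 1.0]]        # weight vectors of two output units
winner = competitive_update(W, x=[0.9, 1.1])
print(winner, W)                    # unit 1 wins and moves toward the input
```

Over many inputs, each unit's weight vector drifts toward the centre of the cluster of patterns it keeps winning, which is why this rule is used for unsupervised clustering.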
Outstar Learning Rule
• This rule, introduced by Grossberg, is concerned with supervised
learning because the desired outputs are known.
• It is also called Grossberg learning.
• Basic Concept − This rule is applied over the neurons arranged
in a layer.
• It is specially designed to produce a desired output d of the layer
of p neurons.
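The outstar update drives the fan-out weights of the source neuron toward the desired output d of the p-neuron layer, Δwⱼ = η(dⱼ − wⱼ). A minimal sketch (the learning rate and iteration count are illustrative):

```python
def outstar_update(w, d, lr=0.2):
    """Outstar (Grossberg) rule: delta_w_j = lr * (d_j - w_j).
    Repeated application makes the outgoing weights converge to the
    desired output d of the layer of p neurons."""
    return [wj + lr * (dj - wj) for wj, dj in zip(w, d)]

w = [0.0, 0.0, 0.0]                 # weights to a layer of p = 3 neurons
for _ in range(40):
    w = outstar_update(w, d=[1.0, 0.5, -0.5])
print([round(wj, 3) for wj in w])   # approaches [1.0, 0.5, -0.5]
```

Each step shrinks the gap (dⱼ − wⱼ) by the factor (1 − η), so the weights converge geometrically to the desired output.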
Feed-Forward Neural Network
Bias-Variance Trade-off
"The two variables to measure the effectiveness of your model
are bias and variance."
Bias is the error, i.e., the difference between the given points and the points the fitted line predicts on your training set.
Variance is the error that occurs due to sensitivity to small changes
in the training set.
Cont…
• To simplify things, let us say the error is calculated as the difference between the predicted and the observed/actual value. Now, say we have a model that is very accurate. This means that the error is very small, indicating low bias and low variance. (As seen in the top-left circle in the image).
• If the variance increases, the data is spread more which results in
lower accuracy. (As seen on the top-right circle in the image).
• If the bias increases, the error calculated increases. (As seen on the
bottom-left circle in the image).
• High variance and high bias indicate that data is spread with a
high error. (As seen on the bottom-right circle in the image).
• This is the Bias-Variance Tradeoff. Earlier, I defined bias as a measure of the error between what the model captures and what the available data shows, and variance as the error from sensitivity to small changes in the available data. A model with high variance captures random noise in the data.
Cont…
• We would have created a model that fits our training data very well but fails to generalize beyond the training set (say, any testing data that the model was not trained on). Therefore our model performs poorly on the test data, resulting in lower accuracy. This problem is called over-fitting. We also say that the model has high variance and low bias.
• Similarly, we have another issue. It is called underfitting. It
occurs when our model neither fits the training data nor
generalizes on the new data (say any testing data that the
model was not trained on). Our model is underfitting when
we have high bias and low variance.
How to overcome Underfitting & Overfitting for a
regression model?
• To overcome underfitting or high bias, we can add
new parameters to our model so that the model
complexity increases thus reducing high bias.
• To overcome overfitting, we could use methods like
reducing model complexity and regularization.
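The trade-off can be seen numerically by fitting polynomials of increasing degree to noisy samples of a curve: a degree-1 fit underfits (high bias), while a degree-9 fit through 10 points interpolates the noise (high variance). A sketch using NumPy (the data, noise level, and degrees are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 10)  # noisy samples
x_test = np.linspace(0.05, 0.95, 10)
y_test = np.sin(2 * np.pi * x_test)                             # clean targets

def errors(degree):
    """Train and test mean-squared error of a polynomial fit."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train, test

for deg in (1, 3, 9):
    train, test = errors(deg)
    print(deg, round(train, 4), round(test, 4))
# degree 1: high train and test error (underfitting, high bias)
# degree 3: both errors moderate (reasonable generalization)
# degree 9: near-zero train error but typically a much larger test
#           error (overfitting, high variance)
```

Adding parameters (higher degree) reduces bias; regularization or a lower degree reduces variance, which mirrors the remedies listed above.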
Activation Dynamics Model
• In a neural network with N processing units, the set of activation
values of the units at any given instant defines the activation
state of the network.
• The trajectory depends upon the activation dynamics built into
the network. The activation dynamics is prescribed by a set of
equations, which can be used to determine the activation state of
the network at the next instant, given the activation state at the
current instant.
• A network is led to one of its steady activation states by the
activation dynamics and the input pattern. Since the steady
activation state depends on the input pattern, it is referred to as
short term memory.
Issues in Activation Dynamics Model:
Activation dynamics is described by the first derivative of the activation value of a neuron [Kosko, 1992]. For the ith neuron, it is expressed as ẋi(t) = hi(x, W), where h(.) is a function of the activation state x and the synaptic weights W of the network.
• Let us consider a network of N interconnected processing units,
where the variables and constants of each unit are shown in figure.

• The output function f(.) determines the output signal generated at the axon hillock due to a given membrane potential. This function bounds the output signal, and it is normally a nondecreasing function of the activation value. Thus the output is bounded for a typical output function. Although the activation value is shown to have a large range, in practice the membrane potential has to be bounded due to the limited current-carrying capacity of a membrane. This is called the operating range of the processing unit.
Cont…
Cont…
The input values to a processing unit coming from external sources, especially through sensory inputs, may have a large dynamic range; consider, for example, the reflections of an object in dim light and the same object in bright light. Thus the dynamic range of the external input values could be very large, and usually it is not in our control.
• If the neuron is made sensitive to smaller values of the inputs, its output signal will saturate for large input values, i.e., for x > x1 in the figure.
• On the other hand, if the neuron is made sensitive to large values of the input by making the threshold x large, its activation value becomes insensitive to small values of the input.
• This is the familiar noise-saturation dilemma [Grossberg, 1982]. The problem is how a neuron with a limited operating range for the activation values can be made sensitive to a nearly unlimited range of input values.
Synaptic Dynamics Model
For given input data, the weights of the connecting links in a network are adjusted to enable the network to learn the pattern in the given input data.
• The set of weight values of all the links in a network at any given instant defines the weight state, which can be viewed as a point in the weight space. The trajectory of the weight states in the weight space is determined by the synaptic dynamics of the network.
• The steady weight state of a network is determined by the synaptic dynamics for a given set of training inputs, and it does not change. Hence this steady weight state is referred to as long term memory.
• Activation dynamics relates to the fluctuations at the neuronal level in a biological neural network.
Distinction between Synaptic and Activation
Dynamics
• Both activation dynamics and synaptic dynamics models are expressed in terms of
expressions for the first derivatives of the activation value of each unit and the
strength of the connection between the ith unit and the jth unit, respectively
• The purpose of invoking activation dynamics model is to determine the equilibrium
state that the network would reach for a given input. In this case, the input to the
network is fixed throughout the dynamics
• The dynamics model may have terms corresponding to passive decay, excitatory input
(external and feedback) and inhibitory input (external and feedback). The passive
decay term contributes to transients, which may eventually die, leaving only the
steady state part. The transient part is due to the components representing the
capacitance and resistance of the cell membrane
• The steady state activation equations can be obtained by setting ẋi(t) = 0, for i = 1, 2, ..., N. This results in a set of N coupled nonlinear equations, the solution of which will give the steady activation state as a function of time. This assumes that the transients decay faster than the signals coming from feedback, and that the feedback signals do not produce any transients.
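For the simplest additive model with only a passive decay term and a fixed input, ẋᵢ = −xᵢ + Iᵢ, setting the derivative to zero gives the steady state xᵢ* = Iᵢ; simulating the dynamics shows the transient dying out. A sketch (the specific one-unit model, step size, and step count are illustrative assumptions):

```python
def simulate(I, dt=0.01, steps=2000, x0=0.0):
    """Euler-integrate the additive activation model  x' = -x + I.
    The passive-decay transient dies out and x settles at x* = I,
    the solution of x' = 0 for the fixed external input I."""
    x = x0
    for _ in range(steps):
        x += dt * (-x + I)
    return x

print(round(simulate(I=2.0), 4))   # approaches the steady state 2.0
```

This mirrors the statement above: the passive decay contributes only a transient, and the steady activation state is reached once that transient has decayed.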
Cont…
• The case of synaptic dynamics model is different from the activation
dynamics model. The objective in synaptic dynamics is to capture the pattern
information in the examples by incrementally adjusting the weights. Here the
weights change due to input. If there is no input, the weights also do not
change
• If the model contains a passive decay term in addition to the terms due to the
varying external input, the network not only learns continuously, but also
forgets what it had learnt initially
• In discrete implementation, i.e., determining the weight change at each
discrete time step, suitable assumptions are made regarding the contribution
of the initial weight state and also the contributions due to the samples given
in the past.
Cont…
• The activation values considered here are steady and stable,
since it is assumed that the transients due to membrane
parameters like capacitances have decayed down, and the steady
activation state of the network has reached the stable state
• The distinction between the activation dynamics and synaptic
dynamics models is highlighted by the following statements:
• For activation dynamics our interest is in the equilibrium states (x) given by V̇(x(t)) = 0, which in turn uses the solution of the equations for the steady activation states given by ẋ(t) = 0, i.e., ẋi(t) = 0, for i = 1, 2, ..., N.
• For synaptic dynamics, on the other hand, learning takes place when ẇij(t) ≠ 0.