
18CSE352T

NEURO FUZZY AND GENETIC PROGRAMMING

SESSION 2
An Introduction

• There are certain tasks which a computer can do better than humans
• Multiplying two big numbers
• Searching through millions of records
An Introduction
• There are certain tasks where humans outperform
computers
• We don’t see many robots driving cars on the road
• Some companies are working on driverless cars
• But the technology is still in its initial phase
• Natural language conversation is also an area where
humans outperform machines
• Plumbing, Electrical Works and various other expert
works
An Introduction
• Machine learning tries to make computers
better at doing things that humans
traditionally can do better than machines
• Make machines learn things like humans do
• We need to understand how human brain
works
• Whenever we think or make a decision a signal
is generated and the neurons light up
An Introduction

• Here is a simple model of neurons
• Circles are the neurons and the arrows are the synapses that connect these neurons
Topics that will be covered in this Session
• History of ANN
• Artificial Neural Network (ANN) Architectures
• Single Layer Feed Forward
• Multilayer Feed Forward
• Competitive
• Recurrent
• Activation Functions
• Identity Function
• Step Function
• Sigmoid Function
• Hyperbolic Tangent Function
HISTORY OF ANN
History of ANN
• 1943 : Neurophysiologist Warren McCulloch and Mathematician Walter Pitts wrote
a paper on how neurons work. To describe it, they modeled a simple neural network
using electrical circuits.
• 1949 : Donald Hebb pointed out the fact that “Neural pathways are strengthened
each time they are used (the way in which humans learn). If two nerves fire at the
same time, the connection b/w them is enhanced”
• 1950 : Nathaniel Rochester from IBM research laboratories simulated a hypothetical
neural network
• 1959 : Bernard Widrow and Marcian Hoff of Stanford developed models called
ADALINE (ADAptive LINear Elements) and MADALINE (Multiple ADAptive LINear
Elements)
• ADALINE was developed to recognize binary patterns
• MADALINE was the first neural network applied to a real world problem that
eliminates echoes on phone lines
History of ANN – contd.
• 1962 : Widrow & Hoff developed a learning procedure that examines the value
before the weight adjusts it.
• Despite the later success of the neural network, traditional von Neumann
architecture took over the computing scene and neural research was left behind
• In the same time period, a paper was written suggesting that the single layered neural
network could not be extended to a multiple layered neural network.
• In addition, many people in the field were using a learning function that was
fundamentally flawed
• As a result, research and funding went drastically down
• 1975 : The first multilayered network was developed, an unsupervised network.
• 1982 : interest in the field was renewed. There was a joint US-Japan conference on
Cooperative / Competitive Neural Networks. As a result, there was more funding
and thus more research in the field
History of ANN – contd.
• 1986 : There were efforts on extending the Widrow-Hoff rule to multiple layers.
Three independent groups of researchers worked on it and came up with similar
ideas which are now called Back Propagation Networks
• 1987 : IEEE’s first International Conference on Neural Networks drew more than
1,800 attendees
• 1997 : A recurrent neural network framework, Long Short-Term Memory (LSTM), was
proposed by Hochreiter & Schmidhuber
• 1998 : Yann LeCun published Gradient-Based Learning Applied to Document
Recognition
• Several other steps have been taken to get us to where we are now
The Future of Neural Networks
• The future of neural networks lies in the development of hardware.
• Due to the limitations of processors, neural networks take weeks to learn
• Some companies are trying to create a “silicon compiler” to generate a specific type of
integrated circuit that is optimized for the application of neural networks
ANN ARCHITECTURES
Artificial Neural Network (ANN) Architectures
• An ANN consists of a number of artificial neurons connected among
themselves in certain ways
• These neurons are arranged in layers, with interconnections across the layers
• The network may or may not be fully connected
• The nature of the interconnection paths also varies
• They are either unidirectional or bidirectional
• The topology of an ANN, together with the nature of its interconnection paths is
generally referred to as its architecture
1. Single-Layer Feed Forward ANN
• It is the simplest ANN architecture
• It consists of an array of i/p neurons connected to an array of o/p neurons
• The input neurons do not exercise any processing power, but simply forward the
i/p signals to the subsequent neurons. So they are not considered to constitute a layer
• So the only layer in the ANN is composed of the o/p neurons Y1, …, Yn
(Figure: input neurons X1, X2, X3 connected to output neurons Y1, …, Y4 through the
weights wij; each output neuron Yj receives the net input y_inj and emits the output y_outj)
Single-Layer Feed Forward ANN – cont..
• Net input to each output neuron is

y_in1 = x1 w11 + x2 w21 + … + xm wm1 = Σ(i=1..m) xi wi1
y_in2 = x1 w12 + x2 w22 + … + xm wm2 = Σ(i=1..m) xi wi2
⋮
y_inn = x1 w1n + x2 w2n + … + xm wmn = Σ(i=1..m) xi win

where m ⇒ number of nodes in i/p layer
n ⇒ number of nodes in o/p layer
Single-Layer Feed Forward ANN – cont..
• In vector notation, the net input to the first output neuron is

y_in1 = [x1 … xm] × [w11 … wm1]ᵀ = X × w∗1

where X = [x1 … xm] ⇒ vector of the i/p signals
w∗1 ⇒ first column of the weight matrix

W = | w11 … w1n |
    | w21 … w2n |
    |  ⋮   ⋱   ⋮ |
    | wm1 … wmn |

• Y_in denotes the vector of the net inputs

Y_in = [y_in1 … y_inn]
Y_in = X × W

where m ⇒ number of nodes in i/p layer
n ⇒ number of nodes in o/p layer
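To make the Y_in = X × W computation concrete, here is a minimal NumPy sketch. The sizes (m = 3 input nodes, n = 4 output neurons), the input values and the weight values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# Input vector X: one signal per input neuron (m = 3)
X = np.array([0.5, -1.0, 2.0])

# Weight matrix W (m x n): W[i, j] = w_ij, the weight from input i to output neuron j
W = np.array([[0.1, 0.2, 0.3, 0.4],
              [0.5, 0.6, 0.7, 0.8],
              [0.9, 1.0, 1.1, 1.2]])

# Net input to every output neuron at once: Y_in = X x W
Y_in = X @ W

# The same thing written as the explicit sums y_in_j = sum_i x_i * w_ij
Y_in_explicit = np.array([sum(X[i] * W[i, j] for i in range(3)) for j in range(4)])

print(Y_in)                               # [1.35 1.5  1.65 1.8 ]
print(np.allclose(Y_in, Y_in_explicit))   # True
```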
2. MultiLayer Feed Forward ANN
• Similar to the single layer feed forward net, except that there are one or more additional
layers of processing units between the input and the output layers
• The additional layers are called the hidden layers of the network
MultiLayer Feed Forward ANN – cont..
• Net input to the hidden layer
𝑌_𝑖𝑛 = X × 𝑊
• Net input to the output layer
Z_in = Y_out × V
Where
𝑋 = [𝑥1, 𝑥2, … , 𝑥𝑚] is the i/p vector
Y_in= [𝑦_𝑖𝑛1, 𝑦_𝑖𝑛2, … , 𝑦_𝑖𝑛𝑛] is the
net i/p vector to the hidden layer
Z_in= [𝑧_𝑖𝑛1, 𝑧_𝑖𝑛2, … , 𝑧_𝑖𝑛𝑟] is the
net i/p vector to the output layer
Y_𝑜𝑢𝑡 = [𝑦_𝑜𝑢𝑡1, 𝑦_𝑜𝑢𝑡2, … , 𝑦_𝑜𝑢𝑡𝑛]
is the o/p vector from the hidden layer
MultiLayer Feed Forward ANN – cont..
• W and V are the weight matrices for the interconnections b/w the i/p layer and the
hidden layer, and b/w the hidden layer and the o/p layer, respectively

W = | w11 w12 … w1n |
    | w21 w22 … w2n |
    |  ⋮   ⋮   ⋱  ⋮ |
    | wm1 wm2 … wmn |

V = | v11 v12 … v1r |
    | v21 v22 … v2r |
    |  ⋮   ⋮   ⋱  ⋮ |
    | vn1 vn2 … vnr |

where m ⇒ number of nodes in the i/p layer
n ⇒ number of nodes in the hidden layer
r ⇒ number of nodes in the o/p layer
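A minimal sketch of the two-stage computation (Y_in = X × W, then Z_in = Y_out × V), assuming a sigmoid hidden-layer activation; the layer sizes, random weights, and the choice of sigmoid are illustrative assumptions, not from the slides.

```python
import numpy as np

def sigmoid(x, sigma=1.0):
    """Binary sigmoid, used here only as an illustrative hidden-layer activation."""
    return 1.0 / (1.0 + np.exp(-sigma * x))

m, n, r = 3, 4, 2              # illustrative sizes: input, hidden, output nodes
rng = np.random.default_rng(0)

X = rng.normal(size=m)         # input vector
W = rng.normal(size=(m, n))    # input -> hidden weight matrix
V = rng.normal(size=(n, r))    # hidden -> output weight matrix

Y_in  = X @ W                  # net input to the hidden layer
Y_out = sigmoid(Y_in)          # output of the hidden layer
Z_in  = Y_out @ V              # net input to the output layer

print(Y_in.shape, Y_out.shape, Z_in.shape)   # (4,) (4,) (2,)
```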
MultiLayer Feed Forward ANN – cont..
It is possible to include more than one hidden layer
3. Competitive ANN
• Competitive networks are structurally
similar to single layer feed forward nets
• But, the o/p units are connected among
themselves, usually through negative weights
• Two types
1. O/p units are connected only to
their respective neighbours
2. O/p units are fully connected
• For a given i/p pattern, the output units
tend to compete among themselves to
represent that input
• Can be used for unsupervised learning
(clustering)
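A rough sketch of the competition idea as winner-take-all: the output unit receiving the largest net input "wins" and suppresses the rest. The function name, weights, and input below are illustrative assumptions, not a specific training algorithm from the slides.

```python
import numpy as np

def winner_take_all(x, W):
    """The output unit with the largest net input wins (output 1); the others output 0."""
    net = x @ W                       # net input to every output unit
    y = np.zeros(W.shape[1])
    y[np.argmax(net)] = 1.0           # competition resolved as winner-take-all
    return y

W = np.array([[0.9, 0.1],             # illustrative 3x2 weight matrix
              [0.8, 0.2],
              [0.1, 0.9]])
print(winner_take_all(np.array([1.0, 1.0, 0.0]), W))   # [1. 0.]
```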
4. Recurrent Networks
• In feed forward networks, signals flow in
one direction only (from the i/p layer
towards the o/p layer through the hidden
layers). There are no feedback loops
• A recurrent network allows feedback
loops
• Fully connected recurrent networks
contain a bidirectional path between
every pair of processing elements
• Also, a recurrent network may contain
self loops
• Also called feedback neural networks
• They are designed to work with sequence
prediction problems
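A toy illustration of the feedback loop over a sequence: the state computed at one step is fed back in at the next step. The update rule, the use of tanh, and all sizes are illustrative assumptions, not a specific recurrent architecture (such as LSTM) from the slides.

```python
import numpy as np

def recurrent_steps(x_seq, W_in, W_rec):
    """Process a sequence one element at a time; the hidden state feeds back into itself."""
    h = np.zeros(W_rec.shape[0])
    for x in x_seq:                        # one time step per sequence element
        h = np.tanh(x @ W_in + h @ W_rec)  # new state depends on input AND previous state
    return h

rng = np.random.default_rng(1)
x_seq = rng.normal(size=(5, 3))            # a sequence of 5 input vectors (illustrative)
W_in  = rng.normal(size=(3, 4))            # input -> state weights
W_rec = rng.normal(size=(4, 4))            # state -> state (feedback) weights
print(recurrent_steps(x_seq, W_in, W_rec).shape)   # (4,)
```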
ACTIVATION FUNCTIONS
Activation Functions
• The function that maps the net input value to the output signal value is known as the
activation function
• The output from a processing unit is known as its activation
• Some of the common activation functions are
1. Identity Function
2. Step Function
3. Sigmoid Function
4. Hyperbolic Tangent Function
1. Identity Function
• It is the simplest activation function
• It passes on the incoming signal as the outgoing signal without any change
• The identity activation function g(x) is defined as g(x) = x
• It is employed in the input units
• This is because the role of an input unit is to forward the incoming signal as it is to
the units in the next layer through the respective weighted paths
• Graphical representation: a straight line through the origin, y = x
2. Step Function
• It is one of the frequently used activation functions
• The step function is also known as the Heaviside function
• There are 4 types of step functions:
1. Binary step function
2. Binary threshold function
3. Bipolar step function
4. Bipolar threshold function
Binary Step Function
• It produces 1 or 0 depending on whether the net input is greater than 0 or not
• The binary step function is defined as
  g(x) = 1, if x > 0
         0, otherwise

Binary Threshold Function
• Instead of 0, a non-zero threshold value θ is used
• The binary threshold function is defined as
  g(x) = 1, if x > θ
         0, otherwise
Bipolar Step Function and Bipolar Threshold Function
• Sometimes, it is more convenient to work with bipolar data, -1 and +1, than binary data
• If a signal value 0 is sent through a weighted path, the information contained in the
interconnection weight is lost as it is multiplied by 0
• To overcome this, the binary input is converted to bipolar form and then a suitable bipolar
activation function is applied
• The output of a bipolar step function is -1 or +1
  Bipolar step:       g(x) = +1, if x > 0;  -1, otherwise
  Bipolar threshold:  g(x) = +1, if x > θ;  -1, otherwise
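A short sketch of the identity function and the four step-function variants defined above; the function names and the sample inputs are illustrative choices of mine.

```python
import numpy as np

def identity(x):
    return x                              # g(x) = x

def binary_step(x):
    return np.where(x > 0, 1, 0)          # 1 if x > 0, else 0

def binary_threshold(x, theta):
    return np.where(x > theta, 1, 0)      # 1 if x > theta, else 0

def bipolar_step(x):
    return np.where(x > 0, 1, -1)         # +1 if x > 0, else -1

def bipolar_threshold(x, theta):
    return np.where(x > theta, 1, -1)     # +1 if x > theta, else -1

x = np.array([-2.0, 0.0, 0.3, 1.5])
print(binary_step(x))                     # [0 0 1 1]
print(binary_threshold(x, 0.5))           # [0 0 0 1]
print(bipolar_step(x))                    # [-1 -1  1  1]
print(bipolar_threshold(x, 0.5))          # [-1 -1 -1  1]
```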
3. The Sigmoid Function
• The step function is not continuous and it is not differentiable
• Some ANN training algorithms require the activation function to be continuous and
differentiable
• The step function is not suitable for such cases
• Sigmoid functions have the nice property that they can approximate the step
function to the desired extent without losing its differentiability
• There are 2 types of sigmoid functions:
1. Binary sigmoid function (Logistic sigmoid function)
2. Bipolar sigmoid function
Binary Sigmoid Function
• It is defined as
  g(x) = 1 / (1 + e^(−σx))
• The parameter σ is known as the steepness parameter
• The transition from 0 to 1 can be made as steep as desired by increasing the value of σ
to the appropriate extent
• The first derivative of g(x), denoted by g′(x), is
  g′(x) = σ g(x) (1 − g(x))
• A derivative is similar to a slope: Derivative = Δy / Δx. In a non-linear curve, the slope
changes at every point. The derivative tells us how much the o/p changes for a given
change in i/p
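A minimal sketch of the binary sigmoid and its derivative formula g′(x) = σ g(x)(1 − g(x)), checked against a finite-difference Δy/Δx; the function names and the chosen σ and x values are illustrative.

```python
import numpy as np

def binary_sigmoid(x, sigma=1.0):
    """g(x) = 1 / (1 + e^(-sigma*x)); sigma is the steepness parameter."""
    return 1.0 / (1.0 + np.exp(-sigma * x))

def binary_sigmoid_deriv(x, sigma=1.0):
    """g'(x) = sigma * g(x) * (1 - g(x))."""
    g = binary_sigmoid(x, sigma)
    return sigma * g * (1.0 - g)

x, sigma, eps = 0.8, 2.0, 1e-6
# Numerical slope (delta y / delta x) should match the analytic derivative
numeric = (binary_sigmoid(x + eps, sigma) - binary_sigmoid(x - eps, sigma)) / (2 * eps)
print(binary_sigmoid_deriv(x, sigma), numeric)   # the two values agree
```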
Bipolar Sigmoid Function
• Depending on the requirement, the binary sigmoid function can be scaled to any range of
values appropriate for a given application
• The most widely used range is from -1 to +1, and the corresponding sigmoid function is
called the bipolar sigmoid function
• It is defined as
  g(x) = (1 − e^(−σx)) / (1 + e^(−σx))
• The first derivative of g(x), denoted by g′(x), is
  g′(x) = (σ / 2) (1 + g(x)) (1 − g(x))
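A small sketch of the bipolar sigmoid and its derivative as defined above; function names and sample inputs are illustrative.

```python
import numpy as np

def bipolar_sigmoid(x, sigma=1.0):
    """g(x) = (1 - e^(-sigma*x)) / (1 + e^(-sigma*x)); output range (-1, +1)."""
    e = np.exp(-sigma * x)
    return (1.0 - e) / (1.0 + e)

def bipolar_sigmoid_deriv(x, sigma=1.0):
    """g'(x) = (sigma / 2) * (1 + g(x)) * (1 - g(x))."""
    g = bipolar_sigmoid(x, sigma)
    return (sigma / 2.0) * (1.0 + g) * (1.0 - g)

x = np.linspace(-3, 3, 7)
print(bipolar_sigmoid(x))         # values approach -1 and +1 at the extremes
print(bipolar_sigmoid_deriv(x))   # largest at x = 0, where the curve is steepest
```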
Hyperbolic Tangent Function
• It is another widely employed bipolar activation
function
• It is closely related to the bipolar sigmoid function
• It is defined as
  h(x) = (e^x − e^(−x)) / (e^x + e^(−x))
• Its first derivative is
  h′(x) = (1 + h(x)) (1 − h(x))
• When the input data is binary and not continuously
valued in the range from 0 to 1, they are generally
converted to bipolar form and then a bipolar sigmoid or
hyperbolic tangent activation function is applied on
them by the processing units
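A short sketch of the hyperbolic tangent activation, its derivative, and its close relationship to the bipolar sigmoid (the two coincide when the steepness σ = 2); function names and sample inputs are illustrative.

```python
import numpy as np

def tanh_act(x):
    """h(x) = (e^x - e^(-x)) / (e^x + e^(-x))."""
    return np.tanh(x)

def tanh_deriv(x):
    """h'(x) = (1 + h(x)) * (1 - h(x)) = 1 - h(x)^2."""
    h = np.tanh(x)
    return (1.0 + h) * (1.0 - h)

def bipolar_sigmoid(x, sigma=1.0):
    e = np.exp(-sigma * x)
    return (1.0 - e) / (1.0 + e)

x = np.linspace(-2, 2, 5)
# tanh coincides with the bipolar sigmoid when the steepness sigma = 2
print(np.allclose(tanh_act(x), bipolar_sigmoid(x, sigma=2.0)))   # True
print(tanh_deriv(x))
```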
