
Artificial Neural Network

What is a Neural Network?

1. A neural network is a computational model that works the way neurons work in the human brain.

2. It is also known as an ANN (Artificial Neural Network).

3. It mimics the working mechanism of the human brain.

4. In a neural network, the machine can learn, recognize patterns, and make decisions like a human being.
Biological prototype of a neuron

Neural Network Architectures

Network Architecture Types
There are three basic types of neuron connection architectures:

➢ Single-layer feed-forward network

➢ Multi-layer feed-forward network

➢ Recurrent neural network

Single-layer feed-forward network

This network is called a single-layer network, where "single layer" refers to the output layer of computation nodes (neurons).

There is only one computational layer, so it is a single-layer architecture.

The input layer only receives signals from the external world; it does not perform any processing.
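As a minimal sketch (the dimensions and weights below are illustrative assumptions, not from the slides), the whole network reduces to one weighted sum plus activation per output neuron:

```python
import numpy as np

def step(z):
    # Threshold activation: 1 if the weighted sum is non-negative, else 0
    return (z >= 0).astype(int)

# Hypothetical sizes: 3 inputs, 2 output neurons (the single computational layer)
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))   # one weight row per output neuron
b = np.zeros(2)               # biases

x = np.array([1.0, 0.5, -0.2])   # signal from the input layer (no processing there)
y = step(W @ x + b)              # the only computation happens in the output layer
print(y)
```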
Multi-layer feed-forward network

Recurrent neural network

These networks differ from feed-forward networks in that they contain at least one feedback loop. They can be single-layer or multi-layer networks.
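As an illustrative sketch (sizes and weights are assumptions, not from the slides), the defining feedback loop can be written as a hidden state that re-enters the computation at every time step:

```python
import numpy as np

# Hypothetical sizes: 2 inputs, 3 recurrent hidden units
rng = np.random.default_rng(0)
W_in = rng.normal(size=(3, 2))    # input-to-hidden weights
W_rec = rng.normal(size=(3, 3))   # hidden-to-hidden feedback weights

h = np.zeros(3)                   # recurrent state
for x in [np.array([1.0, 0.0]), np.array([0.0, 1.0])]:
    # The feedback loop: the previous state h feeds back into the update
    h = np.tanh(W_in @ x + W_rec @ h)
print(h)
```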
Learning Rules
➢ Error-correction learning

➢ Memory-based learning

➢ Hebbian learning

➢ Competitive learning

➢ Boltzmann learning

Error-correction learning

Memory-based learning

Hebbian Learning

Competitive Learning

Rosenblatt Perceptron Model
1. The Rosenblatt perceptron model was designed by Rosenblatt in 1958 to overcome the limitations of the McCulloch-Pitts neuron model.

✓ It can process non-Boolean inputs, and it assigns a different weight to each input automatically.

2. It is a single-layer network.

3. The Rosenblatt perceptron can be seen as a set of inputs that are weighted and to which an activation function is applied.
4. The inputs can be seen as neurons and are called the input layer.

5. These neurons (the input layer) and the activation function together form a perceptron.

6. This model implements the functioning of a single neuron, which can solve linear classification problems through a very simple learning algorithm (sketched below).

7. Rosenblatt perceptrons are called the first generation of neural networks.

8. The main limitation of this model is that it cannot solve non-linear problems.
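As a hedged sketch of the very simple learning algorithm mentioned in point 6 (the classic perceptron update rule; the toy data and learning rate are illustrative assumptions):

```python
import numpy as np

def train_perceptron(X, d, lr=0.1, epochs=20):
    """Classic perceptron rule: w <- w + lr * (d - y) * x, updating only on mistakes."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, d):
            y = 1 if w @ x + b >= 0 else 0   # step activation
            w += lr * (target - y) * x
            b += lr * (target - y)
    return w, b

# Linearly separable toy data: logical AND
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, d)
print([1 if w @ x + b >= 0 else 0 for x in X])   # [0, 0, 0, 1]
```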
Perceptron Convergence theorem

Activation Function

Multilayer Perceptron

The Back-Propagation Algorithm

Gradient Descent
Gradient descent is one of the most commonly used iterative optimization algorithms in machine learning, used to train machine learning and deep learning models. It helps find a local minimum of a function.

This entire procedure is known as gradient descent, which is also known as steepest descent. The main objective of the gradient descent algorithm is to minimize the cost function through iteration. To achieve this goal, it performs two steps iteratively:

Calculate the first-order derivative of the function to compute the gradient (slope) of the function at the current point.

Move in the direction opposite to the gradient, i.e., away from the direction in which the function increases, stepping from the current point by alpha times the gradient, where alpha is the learning rate. The learning rate is a tuning parameter in the optimization process that helps decide the length of the steps.
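A minimal sketch of these two steps on a simple quadratic cost (the function and learning rate are illustrative assumptions):

```python
# Gradient descent on f(w) = (w - 3)^2, whose minimum is at w = 3
def grad(w):
    return 2 * (w - 3)   # step 1: first-order derivative of the cost

w = 0.0        # starting point
alpha = 0.1    # learning rate: controls the step length
for _ in range(100):
    w = w - alpha * grad(w)   # step 2: move against the gradient
print(w)   # approaches 3.0
```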
XOR PROBLEM

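As an illustrative sketch (my construction, not recovered from the slides): XOR is the classic problem a single perceptron cannot solve, because no single line separates its classes, while one hidden layer of threshold units makes it solvable:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
xor = np.array([0, 1, 1, 0])

# No single line w.x + b can separate {01, 10} from {00, 11}.
# One hidden layer of two threshold units makes the problem separable:
def step(z):
    return (z >= 0).astype(int)

h = step(X @ np.array([[1, -1], [-1, 1]]).T - 0.5)  # hidden: x1 AND NOT x2, x2 AND NOT x1
y = step(h @ np.array([1, 1]) - 0.5)                # output: OR of the hidden units
print(y.tolist() == xor.tolist())                   # True
```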
Batch Learning
In batch learning, the model cannot learn continuously from data once it has been trained. The model has to be trained once on the complete dataset, which may take longer and also require more computing resources.

If we then get some new data, how can we add it to this model? We have to train the model again from scratch using the whole dataset (old data + new data).

That again costs more time and computing resources. We can solve this problem using algorithms that are capable of learning continuously. This is called online learning.
Online Learning

In online learning, the model can keep learning: we feed the dataset in small groups, also known as mini-batches, without having to train the model all at once on the complete dataset. Alternatively, we can train the model using individual data points from the whole dataset.

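A hedged sketch of the idea (the model, data stream, and learning rate are assumptions): a simple linear model updated one mini-batch at a time, never revisiting old data:

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(2)   # linear model trained online

def sgd_update(w, X_batch, y_batch, lr=0.01):
    """One online step: update the weights from a mini-batch only."""
    err = X_batch @ w - y_batch
    return w - lr * X_batch.T @ err / len(y_batch)

# Data arrives in mini-batches; old batches are never stored or revisited
for _ in range(2000):
    X_batch = rng.normal(size=(8, 2))
    y_batch = X_batch @ np.array([2.0, -1.0])   # hidden true weights
    w = sgd_update(w, X_batch, y_batch)
print(w)   # moves toward [2, -1] without ever holding the full dataset
```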
Cover’s Theorem
Cover's theorem states that a complex pattern-classification problem, cast in a nonlinear high-dimensional space, is more likely to be linearly separable than in a low-dimensional space.

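An illustrative sketch (the data and the map x -> (x, x^2) are my assumptions, not from the slides): a one-dimensional problem whose classes interleave becomes linearly separable after a nonlinear cast into two dimensions:

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
label = np.array([1, 0, 0, 0, 1])   # outer vs. inner points: not separable on the line

# Nonlinear cast into a higher dimension: x -> (x, x^2)
phi = np.stack([x, x**2], axis=1)

# In the lifted space, the classes are split by the line x2 = 2.5
pred = (phi[:, 1] > 2.5).astype(int)
print((pred == label).all())   # True: linearly separable after the cast
```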
Radial Basis Function Neural Network
The idea of Radial Basis Function (RBF) networks derives from the theory of function approximation. We have already seen how Multi-Layer Perceptron (MLP) networks with a hidden layer of sigmoidal units can learn to approximate functions. RBF networks take a slightly different approach. Their main features are:

Basic form of an RBF network:

Input layer: source nodes connecting the network to the environment.

Hidden layer: provides a set of functions that form a basis for mapping the inputs into the hidden space.

Output layer: supplies the network's response.

P is the dimensionality of the input feature space, and M is the dimensionality of the transformed feature space on which we have imposed our RBFs.

Training for this kind of network comprises two phases:
➢ Training the hidden layer, which comprises M RBF functions; the parameters to be determined for each RBF are the receptor position t and, in the case of a Gaussian RBF, the spread sigma.
➢ Training the weight vectors Wij of the output layer.

Training the hidden layer:

There are different approaches to training the hidden layer. Let us assume for now that we are dealing with Gaussian RBFs, so we need to determine the receptors t and the spread, i.e., sigma. One approach is to randomly select M receptors from the N sample feature vectors, but this does not seem principled, so we can instead use a clustering mechanism to determine the receptors ti.

Since we have M nodes in the hidden layer and N samples, for clustering to work here we need N > M.
Calculation of receptors:

Consider an example where M = 3, so we need to determine three t's. Initially we divide the feature vector space into three arbitrary clusters and take their means as the initial receptors. Then we iterate over every sample feature vector and perform the steps below:
➢ a) For the selected input feature vector x, determine its distances to the means (t1, t2, t3) of the three clusters; x is assigned to the cluster whose mean is nearest.
➢ b) After x has been assigned to a cluster, the means (t1, t2, t3) are recomputed.
➢ c) Perform steps (a) and (b) for all sample points.
➢ Once the iteration finishes, we obtain the final t1, t2, and t3.
Calculation of sigma:

Once the receptors are calculated, we can use a K-nearest-neighbour approach to calculate sigma; we need to select the value of P.
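The slide's exact formula was shown in a figure, so the sketch below substitutes one common P-nearest-neighbour heuristic as an assumption: set each sigma_i to the root-mean-square distance from receptor t_i to its P nearest fellow receptors:

```python
import numpy as np

def rbf_sigmas(t, P=2):
    """Assumed heuristic: sigma_i = RMS distance from receptor t_i
    to its P nearest other receptors (not necessarily the slide's formula)."""
    d = np.linalg.norm(t[:, None, :] - t[None, :, :], axis=2)  # pairwise distances
    np.fill_diagonal(d, np.inf)                                # ignore self-distance
    nearest = np.sort(d, axis=1)[:, :P]                        # P nearest receptors
    return np.sqrt((nearest**2).mean(axis=1))

t = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])  # hypothetical receptors
print(rbf_sigmas(t, P=2))
```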

Training the weight vectors:

Let us assume the dimensionality of the hidden layer is M and the sample size is N. We can then calculate the optimal weight vectors for the network using the pseudo-inverse matrix solution.

Every component of dk will be either 1 or 0: it equals 1 if the corresponding input vector belongs to class k, and 0 otherwise.
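A minimal sketch of the pseudo-inverse solution (the matrix names Phi and D are my notation, an assumption): with Phi the N x M matrix of hidden-layer outputs and D the N x K matrix whose rows are the target vectors dk, the least-squares weights are W = pinv(Phi) D:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: N = 6 samples, M = 3 hidden RBF outputs, K = 2 classes
Phi = rng.random((6, 3))              # hidden-layer activations, one row per sample
D = np.eye(2)[[0, 0, 1, 1, 0, 1]]     # one-hot targets: d_k = 1 iff sample is in class k

W = np.linalg.pinv(Phi) @ D           # pseudo-inverse (least-squares) weight solution
print(np.argmax(Phi @ W, axis=1))     # predicted class per sample
```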
K-means clustering
➢ K-means clustering is an unsupervised learning algorithm. There is no labeled data for this clustering, unlike in supervised learning. K-means divides objects into clusters that share similarities and are dissimilar to the objects belonging to other clusters.

➢ The term 'K' is a number: you need to tell the system how many clusters to create. For example, K = 2 refers to two clusters. There are ways of finding the best or optimum value of K for given data.

For a better understanding of k-means, let's take an example from cricket. Imagine you received data on many cricket players from all over the world, giving the runs scored by each player and the wickets taken by them in the last ten matches. Based on this information, we need to group the data into two clusters, namely batsmen and bowlers.

Let's take a look at the steps to create these clusters (a code sketch follows below).
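A hedged sketch of these steps (the player statistics below are made up for illustration; the features are runs scored and wickets taken):

```python
import numpy as np

def kmeans(X, k=2, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]   # arbitrary initial means
    for _ in range(iters):
        # Step a: assign each point to the nearest center
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        # Step b: recompute each mean from its assigned points
        # (sketch assumes no cluster goes empty)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

# Made-up player stats: [runs scored, wickets taken] over the last ten matches
X = np.array([[400, 1], [380, 0], [350, 2], [60, 18], [45, 22], [80, 15]], dtype=float)
labels, centers = kmeans(X, k=2)
print(labels)    # one cluster of batsmen, one of bowlers
```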
Hybrid Learning Procedure for RBF Networks

Self-Organizing Map

• Introduced by Prof. Teuvo Kohonen in 1982
• Also known as the Kohonen feature map
• An unsupervised neural network
• A clustering tool for high-dimensional and complex data
• Maintains the topology of the dataset
• Training occurs via competition between the neurons
• It is impossible to assign network nodes to specific input classes in advance
• Can be used for detecting similarity and degrees of similarity
• It is assumed that input patterns fall into sufficiently large distinct groupings
• Weight vectors are initialized randomly
Terminology used
• Clustering
• Unsupervised learning
• Euclidean distance: for points $p = (p_1, p_2, \ldots, p_n)$ and $q = (q_1, q_2, \ldots, q_n)$,

$$ED = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$
