DL Unit - 4

UNIT – IV

UNIT IV: Probabilistic Neural Network: Hopfield Net, Boltzmann machine, RBMs,
Sigmoid net, Auto encoders.

Probabilistic Neural Network:


A probabilistic neural network (PNN) is a sort of feedforward neural network used to
handle classification and pattern recognition problems. In the PNN technique, the parent
probability distribution function (PDF) of each class is approximated using a Parzen window
and a non-parametric function. The PDF of each class is then used to estimate the class
probability of fresh input data, and Bayes’ rule is used to allocate the class with the highest
posterior probability to new input data. With this method, the possibility of misclassification
is lowered. This type of ANN was created using a Bayesian network and a statistical
approach known as Kernel Fisher discriminant analysis.
PNNs have shown a lot of promise in solving difficult scientific and engineering
challenges. The following are the major types of difficulties that researchers have attempted
to address with PNN:

 Labeled stationary data pattern classification


 Data pattern classification in which the data has a time-varying probabilistic density
function
 Applications for signal processing that work with waveforms as data patterns
 Unsupervised algorithms for unlabeled data sets, etc.

Structure of Probabilistic Neural Network:


Specht's (1990) basic framework for a probabilistic neural network is shown below. The
network is composed of four basic layers. Let's understand them one by one.

Input Layer:
Each predictor variable is represented by a neuron in the input layer. When there are
N categories in a categorical variable, N-1 neurons are used. By subtracting the median and
dividing by the interquartile range, the range of data is standardized. The values are then fed
to each of the neurons in the hidden layer by the input neurons.

Pattern Layer:
Each case in the training data set has one neuron in this layer. It saves the values of
the case’s predictor variables as well as the target value. A hidden neuron calculates the
Euclidean distance between the test case and the neuron’s center point, then uses the sigma
values to apply the radial basis kernel function.

Summation Layer:
Each category of the target variable has one pattern neuron in PNN. Each hidden
neuron stores the actual target category of each training event; the weighted value output by a
hidden neuron is only supplied to the pattern neuron that corresponds to the hidden neuron’s
category. The values for the class that the pattern neurons represent are added together.

Decision Layer:
The output layer compares the weighted votes accumulated in the pattern layer for
each target category and utilizes the largest vote to predict the target category.

PNN Operation:
 The input nodes are the set of measurements.
 The second layer consists of the Gaussian functions formed using the given set of data
points as centers.
 The third layer performs an average operation of the outputs from the second layer for
each class.
 The fourth layer performs a vote, selecting the largest value. The associated class
label is thus determined.
 In PNN, the parent probability distribution function (PDF) of each class is
approximated by a Parzen window and a non-parametric function.
 A PNN consists of several sub-networks, each of which is a Parzen window pdf
estimator for each of the classes.
 PNNs do not use back propagation for training.

Algorithm of Probabilistic Neural Network:

The training set’s exemplar feature vectors are provided to us. We know the class to which
each one belongs. The PNN is configured as follows.

1. Input the file containing the exemplar vectors and class numbers.
2. Sort these into K sets, each of which contains one class of vectors.

3. Create a Gaussian function centered on each exemplar vector in set k, and then define
the cumulative Gaussian output function for each k.

After we’ve defined the PNN, we can feed vectors into it and classify them as follows.

1. Read the input vector and feed it to the Gaussian functions of each category.
2. For each cluster of hidden nodes, compute all the Gaussian functional values at the
hidden nodes.
3. Feed all of the Gaussian functional values from the hidden node cluster to the
cluster's single output node.
4. For each category output node, add all of the inputs and multiply by a constant.
5. Determine the maximum of all the summed values at the output nodes; the category of
that node is the predicted class (a small code sketch of these steps follows).
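
The classification steps above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration: the function name, the toy data, and the single shared sigma value are assumptions for the example, not part of the original notes.

```python
import numpy as np

def pnn_classify(x, exemplars, labels, sigma=1.0):
    """Classify one input vector with a basic PNN / Parzen-window scheme."""
    # Pattern layer: one Gaussian kernel centred on every training exemplar
    sq_dist = np.sum((exemplars - x) ** 2, axis=1)
    kernel = np.exp(-sq_dist / (2.0 * sigma ** 2))

    # Summation layer: average the kernel outputs separately for each class
    classes = np.unique(labels)
    class_scores = np.array([kernel[labels == c].mean() for c in classes])

    # Decision layer: pick the class with the largest summed (averaged) value
    return classes[np.argmax(class_scores)]

# Hypothetical toy data: two 2-D classes
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9]])
y = np.array([0, 0, 1, 1])
print(pnn_classify(np.array([0.9, 1.0]), X, y))   # expected output: 1
```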

Advantages and Disadvantages:

There are various benefits and drawbacks and applications of employing a PNN rather than a
multilayer perceptron.

Advantages:
 PNNs are substantially faster to train than multilayer perceptron networks.
 PNNs have the potential to outperform multilayer perceptron networks in terms of
accuracy.
 PNNs are relatively insensitive to outliers.
 PNN networks predict target probability scores with high accuracy.
 With enough training data, PNNs approach the Bayes-optimal classifier.

Disadvantages:
 When it comes to classifying new cases, PNNs are slower than multilayer perceptron
networks.
 PNNs require extra memory to store the model, since every training exemplar is kept.

Hopfield Network:
A Hopfield network is a special kind of neural network whose response differs from that of
other neural networks: its output is computed by an iterative process that converges to a
stable state. It has just one layer of neurons, whose size corresponds to the size of the input
and output, which must be the same. When such a network recognizes, for example, digits, we
present a list of correctly rendered digits to the network. Subsequently, the network can
transform a noisy input into the corresponding perfect output.

In 1982, John Hopfield introduced an artificial neural network to store and retrieve
memory like the human brain. Here, a neuron is either in the on state or the off state. The
state of a neuron (on, +1, or off, 0) is updated depending on the input it receives from the
other neurons. A Hopfield network is first trained to store various patterns or memories.
Afterward, it is able to recognize any of the learned patterns when presented with partial or
even corrupted data about that pattern, i.e., it eventually settles down and restores the
closest stored pattern. Thus, similar to the human brain, the Hopfield model has stability in
pattern recognition.

A Hopfield network is a single-layered, recurrent network in which the neurons are fully
connected, i.e., each neuron is connected to every other neuron. If there are two neurons i
and j, then there is a connection weight wij between them, and the weights are symmetric:
wij = wji.

 Is a network with associative memory

 Can be used for different pattern recognition problems.

 It is a fully connected, single-layer, auto-associative network.

 Means it has only one layer, with each neuron connected to every other neuron.

 All the neurons act as both inputs and outputs.

 The purpose of a Hopfield net is to store 1 or more patterns and to recall the full
patterns based on partial input.

Example:

 Consider this task: scan input text and recover the corresponding stored character patterns.

Here’s the way a Hopfield network would work.

 You map it out so that each pixel is one node in the network.

 You train it (or just assign the weights) to recognize each of the 26 characters of the
alphabet, in both upper and lower case (that’s 52 patterns).

 Now, if your scan gives you a noisy pattern like the one on the right of the above
illustration, the network iterates and eventually reproduces the pattern on the left, a perfect "T".

Architecture:

 All the nodes in Hopfield network are both inputs and outputs, and they are fully
interconnected.

 That is, each node is an input to every other node in the network.

 The Hopfield network (model) consists of a set of neurons and a corresponding set of
unit delays, forming a multiple-loop feedback system, as shown in the figure.

 The number of feedback loops is equal to the number of neurons.

 Basically, the output of each neuron is fed back, via a unit delay element, to each of the
other neurons in the network.

 There is no self-feedback in the network.

 Weights should be symmetrical, i.e., wij = wji.

Properties of Hopfield network:


 A recurrent network with all nodes connected to all other nodes.
 Nodes have binary outputs (either 0,1 or -1,1).
 Weights between the nodes are symmetric.
 No connection from a node to itself is allowed.
 Nodes are updated asynchronously (i.e. nodes are selected at random).
 The network has no hidden nodes or layers (a short code sketch of training and recall follows).
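
The behaviour described above can be sketched in a few lines of Python. This is a minimal illustration under the usual conventions (bipolar -1/+1 states, Hebbian outer-product learning, zero diagonal, asynchronous updates); the 4-pixel toy pattern is a hypothetical example.

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian (outer-product) learning for bipolar (-1/+1) patterns."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)          # no self-feedback
    return W / patterns.shape[0]    # weights are symmetric: W[i, j] == W[j, i]

def recall(W, state, steps=100, seed=0):
    """Asynchronous update: repeatedly pick a random node and re-evaluate it."""
    state = state.copy()
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        i = rng.integers(len(state))
        state[i] = 1 if W[i] @ state >= 0 else -1
    return state

# Hypothetical 4-pixel pattern and a corrupted probe
stored = np.array([1, -1, 1, -1])
W = train_hopfield(stored[None, :])
print(recall(W, np.array([1, -1, -1, -1])))   # settles back to the stored pattern
```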

Boltzmann machine:
A Boltzmann machine is an intriguing neural network model that operates in the
realm of unsupervised learning. Unlike other neural networks such as artificial neural
networks (ANNs), convolutional neural networks (CNNs), recurrent neural networks
(RNNs), and self-organizing maps (SOMs), Boltzmann machines are undirected. This means
that every node in a Boltzmann machine is connected to every other node, creating a
bidirectional network.

Here are some key points about Boltzmann machines:


1) Node Types:
 Visible Nodes: These nodes are directly measurable or observable.
 Hidden Nodes: These nodes are not directly measurable or observable.
 Despite their different roles, Boltzmann machines treat visible and hidden nodes
as part of a single system.

2) Stochastic Model:

 Boltzmann machines are stochastic or generative models, meaning they introduce
randomness into their computations.
 They are not deterministic like other neural networks.
 The training data is fed into the Boltzmann machine, and the system’s weights are
adjusted accordingly.

3) Energy-Based Models:

 Boltzmann machines use the Boltzmann distribution for sampling.


 The Boltzmann distribution describes the probabilities of different states of the
system.
 The energy of the system is defined in terms of the weights of the synapses (a formula
sketch is given after this list of key points).
 The system aims to find its lowest energy state by adjusting the weights.

4) Types of Boltzmann Machines:


There are three types of Boltzmann machines:

1. Restricted Boltzmann machine


2. Deep Boltzmann machine
3. Deep Belief Network
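
As a brief aside on point 3) above (energy-based models): in the standard textbook form (not taken from these notes), the energy of a Boltzmann machine with binary unit states si, symmetric weights wij, and biases bi is E(s) = - sum over pairs i<j of wij si sj - sum over i of bi si, and the Boltzmann distribution assigns each state the probability P(s) = exp(-E(s)/T) / Z, where T is a temperature parameter and Z sums exp(-E(s')/T) over all possible states s'. Lower-energy configurations are therefore exponentially more probable, which is why training adjusts the weights so that the data corresponds to low-energy states.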

Restricted Boltzmann machine:

The term "restricted" means that we cannot connect neurons within the same type of layer.
In other words, two neurons within the input (visible) layer or within the hidden layer
cannot communicate with each other, even though the hidden and visible layers can be linked.
Because there is no output layer in this machine, the question arises of how we identify and
update the weights and measure whether our prediction is correct or not. All of these
questions have a single answer: the Restricted Boltzmann Machine. Geoffrey Hinton (2007)
proposed the RBM algorithm, which learns probability distributions from sample training data
inputs. It has widespread use in supervised and unsupervised machine learning applications
such as feature learning, dimensionality reduction, classification, collaborative filtering,
and topic modeling.

Deep Boltzmann machine:

As shown in the figure, a deep Boltzmann machine is a model with additional hidden layers
and undirected connections between the nodes. A DBM learns features from raw data in a
hierarchical manner, and the features recovered in one layer are applied as hidden variables
serving as input to the subsequent layer. The DBM training method must be modified to define
the training information, weight initialization, and adjustment parameters. Training a DBM
runs into time-complexity limitations when the parameters are tuned toward their ideal
values. Montavon et al. presented a centering optimization strategy to make the learning
process more robust and, for mid-sized DBMs, to construct a generative, quicker, and
discriminative model.

Deep Belief Network:

The Deep Belief Network is one of the types of Boltzmann machine. It is a generative
model which uses multiple stacks of the deep architecture of the Restricted Boltzmann
Machine. Each restricted Boltzmann machine performs a non-linear transformation on the
input neurons and produces the outputs that serve as the input for the consecutive model.
Because they are generative models, Deep Belief Networks can operate in both supervised and
unsupervised settings. This gives Deep Belief Networks a lot of flexibility and makes them
easier to expand.

RBMs:

Restricted Boltzmann Machines were introduced in 1986 (by Smolensky, under the name
"harmonium") for the unsupervised learning of probability distributions, and were later
popularized by Hinton. An RBM is a type of
probabilistic graphical model and is a specific kind of BM. Like BMs, RBMs are used to
discover latent feature representations in a dataset by learning the probability
distribution of the input. In general, RBMs are useful in the dimensionality reduction of
data and contribute to extracting more meaningful features.

RBM’s Architecture:

The architecture of a Restricted Boltzmann Machine (RBM) consists of two layers of
interconnected nodes: an input layer and a hidden layer with symmetrically
connected weights. As we can see in the diagram below, each node in the input layer is
connected to each node in the hidden layer, with each connection having a weight associated
with it. Also, there are no connections between nodes of the hidden or the visible layer
respectively:

The set of neurons in the hidden layer represents the probability distribution across
the input data. During training, the system updates the weights between the layers aiming to
reproduce the desired output values. The purpose of each neuron in the visible layer is to
observe a pattern of data, while the neurons in the hidden layer are used to explain the
pattern observed by the visible neurons.
Training of an RBM:

RBMs learn by using an algorithm called Contrastive Divergence (CD).

 Contrastive Divergence:
The Contrastive Divergence algorithm is a form of the gradient descent optimization
method. In general, this algorithm iteratively updates the weights and biases of the neural
network to approximate the probability distribution of the input of the RBM. It operates by
estimating the gradient of the data's log-likelihood function with respect to the model
parameters. Moreover, CD iterates through the input and compares it to the output generated
by the model until it converges to a local minimum.
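
A minimal sketch of one CD-1 update for a binary RBM is shown below, assuming NumPy. The learning rate, array shapes, and variable names are illustrative assumptions, not a prescribed implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0, W, b_vis, b_hid, lr=0.1, rng=None):
    """One Contrastive Divergence (CD-1) update for a binary RBM.

    v0 : (batch, n_visible) data batch, W : (n_visible, n_hidden) weight matrix.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Positive phase: hidden probabilities (and samples) driven by the data
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)

    # Negative phase: one Gibbs step -- reconstruct the visibles, re-infer the hiddens
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    p_h1 = sigmoid(p_v1 @ W + b_hid)

    # Approximate log-likelihood gradient: data statistics minus model statistics
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / v0.shape[0]
    b_vis += lr * (v0 - p_v1).mean(axis=0)
    b_hid += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_vis, b_hid
```

Repeating this step over many batches corresponds to the iterative weight and bias updates described in the learning phase below.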

 Learning Phase of an RBM:


The learning phase of an RBM basically refers to the adjustment of weights and
biases in order to reproduce the desired output. During this phase, the RBM receives input
data and drives it through the hidden layers. Each time an RBM layer receives a set of
inputs, it steps through a series of iterations to redefine the weights and biases.
Initially, the parameters of an RBM are initialized to small, random values. The weights
and biases are then updated iteratively until a suitable convergence criterion is reached.
The learning rate parameter is adjusted during the learning phase so that the model does
not over-fit or under-fit the data.

The process is repeated over multiple iterations, allowing the model to improve its accuracy
gradually.

The learning phase can be seen more easily in the diagram below:

As we can see, the input is multiplied by the weights between the two consecutive layers.
Each node of the hidden layer receives those products, sums them, and adds a bias. The result
is then driven through an activation function and passed on to the next hidden layer.

Therefore the learning phase considers updating and adjusting the weights between each
consecutive layer in order to find the probability distribution that best models the input data.
Note that RBMs are typically used in deep learning applications as a pre-training step. After
the RBM is trained, the weights of the model can be used as initial inputs for a more
advanced deep learning architecture, such as a deep belief network.

Advantages and Disadvantages of RBM:

Some of the Advantages of RBM Are:


 The hidden layer's activations can be included in other models as valuable features to
boost performance.
 Due to the limitations on connections between nodes, it is faster than a standard
Boltzmann machine.
 Efficiently computed and expressive enough to encode any distribution.

Some of the Disadvantages of RBM Are:


 The backpropagation algorithm is better known than the CD-k algorithm, which is
utilized in RBMs.
 Training is more challenging because the gradient of the energy function is difficult
to calculate.
 Weight adjustment during training can be difficult.

Applications:

 RBMs find applications in various fields:


o Computer Vision: Extracting features from images.
o Natural Language Processing: Learning word embeddings.
o Speech Recognition: Modeling acoustic features.

 They can also enhance other neural network architectures by improving their
performance.

Sigmoid net:
In deep learning, a sigmoid network typically refers to a neural network architecture
that uses the sigmoid activation function in its neurons. The sigmoid function, also known as
the logistic function, is commonly used in binary classification problems. It squashes its input
values to the range (0, 1), making it suitable for problems where the goal is to predict
probabilities.

The sigmoid function is defined as:

σ(z) = 1 / (1 + e^(-z))

where z is the input to the function.

When used in a neural network, the sigmoid activation function is often applied to the
output layer, especially when the network is designed for binary classification tasks. The
output of the sigmoid function can be interpreted as the probability of the input belonging to
the positive class. For example, in a binary classification problem where you are trying to
predict whether an image contains a cat or not, the sigmoid output close to 1 would indicate a
high probability of the image containing a cat, while an output close to 0 would indicate a
low probability.
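
A short sketch of the function and of how its output is read as a probability score; the numeric examples are illustrative.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real-valued input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))    # 0.5   -> maximally uncertain
print(sigmoid(4.0))    # ~0.98 -> high probability of the positive class (e.g. "cat")
print(sigmoid(-4.0))   # ~0.02 -> low probability of the positive class
```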

Auto encoders:
An AutoEncoder is an unsupervised Artificial Neural Network that attempts to encode the
data by compressing it into lower dimensions (the bottleneck layer, or code) and then decoding
the data to reconstruct the original input. The bottleneck layer (or code) holds the compressed
representation of the input data. In an AutoEncoder the number of output units must be equal to
the number of input units, since we are attempting to reconstruct the input data.

AutoEncoders usually consist of an encoder and a decoder. The encoder encodes the
provided data into a lower dimension, which is the size of the bottleneck layer, and the
decoder decodes the compressed data back into its original form. The number of neurons in the
layers of the encoder decreases as we move through its layers, whereas the number of neurons
in the layers of the decoder increases as we move through its layers. Three layers are used in
the encoder and decoder in the following example: the encoder contains 32, 16, and 7 units in
each layer respectively, and the decoder contains 7, 16, and 32 units in each layer
respectively. The code size (the number of neurons in the bottleneck) must be less than the
number of features in the data. Before feeding the data into the AutoEncoder, the data must be
scaled between 0 and 1 using MinMaxScaler, since we are going to use the sigmoid activation
function in the output layer, which outputs values between 0 and 1. When we are using
AutoEncoders for dimensionality reduction, we extract the bottleneck layer and use it to
reduce the dimensions. This process can be viewed as feature extraction.

The type of AutoEncoder that we are using here is a Deep AutoEncoder, where the encoder
and the decoder are symmetrical. Autoencoders do not necessarily need a symmetrical encoder
and decoder; the encoder and decoder can be non-symmetrical as well.
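
A minimal sketch of the 32-16-7 / 7-16-32 deep autoencoder described above, assuming Keras and inputs already scaled to [0, 1] with MinMaxScaler; the input dimensionality, hidden activations, optimizer, and loss are illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

n_features = 64                                    # hypothetical input dimensionality

inputs = keras.Input(shape=(n_features,))
# Encoder: 32 -> 16 -> 7 (the 7-unit layer is the bottleneck / code)
x = layers.Dense(32, activation="relu")(inputs)
x = layers.Dense(16, activation="relu")(x)
code = layers.Dense(7, activation="relu")(x)
# Decoder: 7 -> 16 -> 32 -> n_features, with a sigmoid output to match the [0, 1] scaling
x = layers.Dense(16, activation="relu")(code)
x = layers.Dense(32, activation="relu")(x)
outputs = layers.Dense(n_features, activation="sigmoid")(x)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(X_scaled, X_scaled, epochs=50, batch_size=32)   # input == target
```

For dimensionality reduction, the encoder part (inputs to code) can be extracted as its own keras.Model and used to produce the 7-dimensional features.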

Architecture:
An Autoencoder is a type of neural network that can learn to reconstruct images, text, and
other data from compressed versions of themselves.
An Autoencoder consists of three layers:
1. Encoder
2. Code
3. Decoder
The Encoder layer compresses the input image into a latent space representation. It encodes
the input image as a compressed representation in a reduced dimension.
The compressed image is a distorted version of the original image.

The Code layer represents the compressed input fed to the decoder layer.

The decoder layer decodes the encoded image back to the original dimension. The decoded
image is reconstructed from the latent space representation and is a lossy reconstruction of
the original image.

Types of Autoencoders:

Undercomplete Autoencoders:

Undercomplete autoencoders are unsupervised neural networks used to compress input data
by predicting the same image as output and reconstructing it from its compressed bottleneck
region. These autoencoders are primarily used to create a latent space or bottleneck, which
can be easily decompressed back using the network.
Sparse Autoencoders:

Like undercomplete autoencoders, sparse autoencoders generate a compressed version of the
input data by predicting the same image as output and reconstructing it from the compressed
bottleneck region, creating a latent space that can be decompressed back. Sparse autoencoders
are controlled by changing the number of nodes at each hidden layer and by penalizing the
activation of some neurons in the hidden layers; this regularization prevents too many neurons
from being activated at once.
Two types of regularizers are used (a small sketch follows this list):
1) the L1 loss method, which adds a penalty proportional to the magnitude of the activations;
2) the KL-divergence method, which considers activations over a collection of
samples and constrains the average activation of each neuron.
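
A minimal sketch of the first option, assuming Keras: an L1 activity regularizer on a hidden layer penalizes large activations and pushes most of them toward zero. The layer size and penalty strength are illustrative assumptions; the KL-divergence penalty is usually written as a custom loss term rather than a built-in Keras regularizer.

```python
from tensorflow.keras import layers, regularizers

# Hidden layer whose activations are penalized with an L1 term (sparsity pressure)
sparse_hidden = layers.Dense(
    16,
    activation="relu",
    activity_regularizer=regularizers.l1(1e-5),   # hypothetical penalty strength
)
```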
Contractive Autoencoders:
The contractive autoencoder uses a bottleneck function to learn an image
representation while passing it through, and a regularization term prevents the network from
learning the identity function. To train a model that works with this constraint, it is crucial to
ensure that the derivatives of the hidden layer activations are small with respect to the input.
Denoising Autoencoders:

Denoising autoencoders are a method for removing noise from images. Unlike regular
autoencoders, which use the input image itself as the ground truth, they take as input a noisy
version of the image, whose noise would be difficult to remove manually. By feeding the noisy
image into the network, it is mapped onto a lower-dimensional manifold, which makes filtering
out the noise more manageable. The loss function used with these networks is typically the L2
or L1 loss.
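
A minimal sketch of the data preparation this implies, assuming NumPy; the array X_clean, the noise level, and the variable names are illustrative assumptions. The autoencoder is then trained with the noisy array as input and the clean array as target.

```python
import numpy as np

X_clean = np.random.random((100, 784))                 # hypothetical batch of images in [0, 1]
noise_factor = 0.3                                     # assumed noise strength
X_noisy = X_clean + noise_factor * np.random.normal(size=X_clean.shape)
X_noisy = np.clip(X_noisy, 0.0, 1.0)                   # keep pixel values in [0, 1]

# Training pairs: noisy input, clean target (the clean image is the ground truth)
# autoencoder.fit(X_noisy, X_clean, epochs=50, batch_size=32)
```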
Variational Autoencoders:

Variational autoencoders (VAEs) are models that address the issue of standard
autoencoders, which represent input in a compressed form called the latent space. VAEs
express latent attributes as a probability distribution, creating a continuous latent space that
can be easily sampled and interpolated, unlike traditional autoencoders that form a
compressed latent space.
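
A minimal sketch of the sampling idea behind that continuous latent space, assuming NumPy: the encoder would output a mean and a log-variance per latent dimension, and a latent vector is drawn with the reparameterization trick so the model remains differentiable. The numbers below are hypothetical encoder outputs, not values from these notes.

```python
import numpy as np

mu = np.array([0.1, -0.4])          # hypothetical latent means from the encoder
log_var = np.array([-1.0, -2.0])    # hypothetical latent log-variances

eps = np.random.normal(size=mu.shape)
z = mu + np.exp(0.5 * log_var) * eps   # sampled latent vector passed to the decoder
print(z)
```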
