DL Unit - 4
UNIT IV: Probabilistic Neural Network: Hopfield Net, Boltzmann machine, RBMs,
Sigmoid net, Auto encoders.
Input Layer:
Each predictor variable is represented by a neuron in the input layer. When there are
N categories in a categorical variable, N-1 neurons are used. By subtracting the median and
dividing by the interquartile range, the range of data is standardized. The values are then fed
to each of the neurons in the hidden layer by the input neurons.
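The standardization described above can be sketched as follows. This is a minimal illustration, assuming NumPy is available; the function name `robust_scale` and the sample values are made up for the example.

```python
import numpy as np

def robust_scale(values):
    """Subtract the median and divide by the interquartile range (IQR)."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    q1, q3 = np.percentile(values, [25, 75])  # first and third quartiles
    return (values - median) / (q3 - q1)

# Hypothetical predictor values; the last one is an outlier.
scaled = robust_scale([1.0, 2.0, 3.0, 4.0, 100.0])
```

Because the median and the interquartile range are barely affected by the outlier, the other values keep a sensible spread after scaling.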
Pattern Layer:
Each case in the training data set has one neuron in this layer. It saves the values of
the case’s predictor variables as well as the target value. A hidden neuron calculates the
Euclidean distance between the test case and the neuron’s center point, then uses the sigma
values to apply the radial basis kernel function.
Summation Layer:
Each category of the target variable has one summation neuron in a PNN. Each hidden
(pattern) neuron stores the actual target category of its training case; the weighted value
output by a hidden neuron is supplied only to the summation neuron that corresponds to the
hidden neuron’s category. Each summation neuron adds together the values for the class it
represents.
Decision Layer:
The output layer compares the weighted votes accumulated in the summation layer for
each target category and uses the largest vote to predict the target category.
PNN Operation:
The input nodes are the set of measurements.
The second layer consists of the Gaussian functions formed using the given set of data
points as centers.
The third layer averages the outputs from the second layer for each class.
The fourth layer performs a vote, selecting the largest value. The associated class
label is thus determined.
In a PNN, the parent probability density function (PDF) of each class is
approximated by a Parzen window, which is a non-parametric estimator.
A PNN consists of several sub-networks, each of which is a Parzen window pdf
estimator for each of the classes.
PNNs do not use back propagation for training.
The training set’s exemplar feature vectors are provided to us. We know the class to which
each one belongs. The PNN is configured as follows.
1. Input the file containing the exemplar vectors and class numbers.
2. Sort these into K sets, each of which contains one class of vectors.
3. Create a Gaussian function centered on each exemplar vector in set k, and then define
the cumulative Gaussian output function for each k.
After we’ve defined the PNN, we can feed vectors into it and classify them as follows.
1. Read the input vector and feed it to the Gaussian functions in every category.
2. For each cluster of hidden nodes, compute all of the Gaussian function values at
the hidden nodes.
3. Feed all of the Gaussian function values from the hidden node cluster to the
cluster’s single output node.
4. For each category output node, add all of the inputs and multiply by a constant.
5. Find the largest of the summed values at the output nodes; its category is the
predicted class.
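The configuration and classification steps above can be sketched in a few lines. This is an illustrative sketch, not a reference implementation: it assumes NumPy, a single shared smoothing parameter `sigma`, and a class-wise average as the "multiply by a constant" step; the function name and the toy exemplars are hypothetical.

```python
import numpy as np

def pnn_classify(x, exemplars, labels, sigma=0.5):
    """Classify x with a Parzen-window PNN; returns (predicted_class, class_scores)."""
    x = np.asarray(x, dtype=float)
    exemplars = np.asarray(exemplars, dtype=float)
    labels = np.asarray(labels)
    # Pattern layer: one Gaussian per exemplar, centered on that exemplar.
    sq_dist = np.sum((exemplars - x) ** 2, axis=1)
    activations = np.exp(-sq_dist / (2.0 * sigma ** 2))
    # Summation layer: average the activations within each class.
    scores = {c: activations[labels == c].mean() for c in np.unique(labels)}
    # Decision layer: the class with the largest score wins.
    best = max(scores, key=scores.get)
    return best, scores

# Toy training set (hypothetical): two exemplars per class.
exemplars = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
labels = [0, 0, 1, 1]
predicted, scores = pnn_classify([0.1, 0.0], exemplars, labels)
```

Note that "training" here is only storing the exemplars, which is why a PNN needs no backpropagation but must keep the whole training set in memory.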
Compared with a multilayer perceptron, a PNN has the following advantages and
disadvantages.
Advantages:
PNNs train substantially faster than multilayer perceptron networks.
PNNs have the potential to outperform multilayer perceptron networks in terms of
accuracy.
PNNs are relatively insensitive to outliers.
PNN networks predict target probability scores with high accuracy.
PNNs approach Bayes-optimal classification.
Disadvantages:
When it comes to classifying new cases, PNNs are slower than multilayer perceptron
networks.
A PNN requires extra memory to store the model.
Hopfield Network:
A Hopfield network is a special kind of neural network whose response differs from that of
other neural networks: its output is computed by an iterative process that converges. It has
just one layer of neurons, whose size matches both the input and the output, which must be
the same size. When such a network is used to recognize, for example, digits, we present a
list of correctly rendered digits to the network. Subsequently, the network can transform a
noisy input into the corresponding perfect output.
In 1982, John Hopfield introduced an artificial neural network that stores and retrieves
memories like the human brain. Here, a neuron is either on or off. The state of a neuron
(on: +1, off: 0) is updated depending on the input it receives from the other neurons.
A Hopfield network is first trained to store a number of patterns or memories. Afterward, it
can recognize any of the learned patterns when given partial or even corrupted data about
that pattern, i.e., it eventually settles down and restores the closest pattern. Thus,
similar to the human brain, the Hopfield model has stability in pattern recognition.
It has only one layer, with each neuron connected to every other neuron.
The purpose of a Hopfield net is to store 1 or more patterns and to recall the full
patterns based on partial input.
Example:
Consider the task of scanning input text and displaying the corresponding ASCII characters.
You map it out so that each pixel is one node in the network.
You train it (or just assign the weights) to recognize each of the 26 characters of the
alphabet, in both upper and lower case (that’s 52 patterns).
Now if your scan gives you a pattern like the one on the right of the illustration, the
network eventually reproduces the pattern on the left, a perfect “T”.
Architecture:
All the nodes in Hopfield network are both inputs and outputs, and they are fully
interconnected.
That is, each node is an input to every other node in the network.
In other words, the output of each neuron is fed back, via a unit delay element, to each of
the other neurons in the network.
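Storing and recalling patterns in such a fully connected network can be sketched as below. This is a minimal sketch under stated assumptions: it uses NumPy, the Hebbian (outer-product) rule to assign the weights, and bipolar states (+1/-1) rather than the on/off (+1/0) coding mentioned earlier, because the bipolar form keeps the rule short. The function names and the two stored patterns are illustrative.

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian weights for bipolar (+1/-1) patterns; no self-connections."""
    patterns = np.asarray(patterns, dtype=float)
    n = patterns.shape[1]
    weights = patterns.T @ patterns / n
    np.fill_diagonal(weights, 0.0)  # a neuron does not feed back to itself
    return weights

def recall(weights, state, max_sweeps=10):
    """Update neurons one at a time until no neuron changes state."""
    state = np.array(state, dtype=float)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            # Each neuron takes the sign of the weighted input from the others.
            new_value = 1.0 if weights[i] @ state >= 0 else -1.0
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state

stored = [[1, -1, 1, -1, 1, -1, 1, -1],
          [1, 1, 1, 1, -1, -1, -1, -1]]
weights = train_hopfield(stored)
noisy = [-1, -1, 1, -1, 1, -1, 1, -1]  # first stored pattern with one bit flipped
restored = recall(weights, noisy)
```

The loop stops once a full sweep leaves every neuron unchanged, which is the "settling down" behaviour described in the text.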
Boltzmann machine:
A Boltzmann machine is a neural network model used for unsupervised learning. Unlike
other neural networks such as feedforward artificial neural networks (ANNs), convolutional
neural networks (CNNs), recurrent neural networks (RNNs), and self-organizing maps (SOMs),
Boltzmann machines are undirected. In a Boltzmann machine every node is connected to every
other node, and every connection works in both directions.
2) Stochastic Model:
3) Energy-Based Models:
The term “restricted” means that neurons in the same layer cannot be connected. In other
words, two neurons in the visible (input) layer, or two neurons in the hidden layer, cannot
communicate with each other, while the hidden and visible layers are linked. Because a
Boltzmann machine has no output layer, the question arises of how to update the weights
and how to measure whether a prediction is correct. The Restricted Boltzmann Machine (RBM)
addresses these questions. RBMs were introduced by Paul Smolensky in 1986 under the name
Harmonium and were popularized by Geoffrey Hinton and collaborators in the mid-2000s. An
RBM learns a probability distribution from sample training data inputs. It has widespread
use in supervised and unsupervised machine learning applications such as feature learning,
dimensionality reduction, classification, collaborative filtering, and topic modeling.
A deep Boltzmann machine (DBM) is a model with additional hidden layers and undirected
connections between the nodes. A DBM learns features from raw data in a hierarchical
manner: the features extracted in one layer serve as hidden variables that are the input to
the subsequent layer. The DBM training method must be adapted to define the training
information, the weight initialization, and the adjustment parameters. Time-complexity
constraints arise when the parameters of a DBM are tuned toward their optimal values.
Montavon et al. presented a centering optimization strategy that makes the learning process
more robust and lets mid-sized DBMs be trained faster into generative and discriminative
models.
The Deep Belief Network (DBN) is built from Boltzmann machines. It is a generative model
formed by stacking multiple Restricted Boltzmann Machines. Each RBM performs a non-linear
transformation on its input and produces outputs that serve as the input to the next RBM
in the stack. Because they are generative models, Deep Belief Networks can be used in both
supervised and unsupervised settings. This gives them a lot of flexibility and makes them
easy to extend.
RBMs:
RBM’s Architecture:
The set of neurons in the hidden layer represents the probability distribution across
the input data. During training, the system updates the weights between the layers, aiming
to reproduce the input values. The purpose of each neuron in the visible layer is to
observe a pattern of data, while the neurons in the hidden layer are used to explain the
pattern observed by the visible neurons.
Training of an RBM:
Contrastive Divergence:
The Contrastive Divergence (CD) algorithm is an approximate gradient-based optimization
method. It iteratively updates the weights and biases of the network so that the RBM
approximates the probability distribution of its input. It operates by estimating the
gradient of the data’s log-likelihood with respect to the model parameters and adjusting
the parameters along that estimate. In each iteration, CD compares the input with the
reconstruction generated by the model, and it continues until it converges to a local
optimum.
The process is repeated over multiple iterations, allowing the model to improve its accuracy
gradually.
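One common variant, CD with a single Gibbs step (CD-1), can be sketched as follows. This is a hedged illustration rather than the exact procedure used in any particular library: it assumes NumPy, binary visible and hidden units with sigmoid activations, full-batch updates, and hypothetical names (`cd1_step`, `b_vis`, `b_hid`) and toy data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0, W, b_vis, b_hid, rng, lr=0.1):
    """One CD-1 update on a batch v0 (rows are binary visible vectors)."""
    # Positive phase: hidden probabilities and sampled states given the data.
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: reconstruct the visible layer, then the hidden layer.
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    p_h1 = sigmoid(p_v1 @ W + b_hid)
    # Approximate gradient: data statistics minus reconstruction statistics.
    n = v0.shape[0]
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / n
    b_vis += lr * (v0 - p_v1).mean(axis=0)
    b_hid += lr * (p_h0 - p_h1).mean(axis=0)
    return np.mean((v0 - p_v1) ** 2)  # reconstruction error, for monitoring only

rng = np.random.default_rng(0)
# Toy data (hypothetical): two binary patterns, each repeated ten times.
data = np.array([[1, 1, 0, 0], [0, 0, 1, 1]] * 10, dtype=float)
W = 0.01 * rng.standard_normal((4, 2))  # 4 visible units, 2 hidden units
b_vis, b_hid = np.zeros(4), np.zeros(2)
errors = [cd1_step(data, W, b_vis, b_hid, rng) for _ in range(500)]
```

The reconstruction error returned here is only a convenient progress indicator; it is not the quantity that CD optimizes.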
The learning phase can be seen more easily in the diagram below:
The input is multiplied by the weights that connect two consecutive layers. Each node of
the hidden layer receives those products, sums them, and adds its bias. The result is
passed through an activation function and then on to the next hidden layer.
The learning phase therefore consists of updating and adjusting the weights between each
pair of consecutive layers in order to find the probability distribution that best models
the input data.
Note that RBMs are typically used in deep learning applications as a pre-training step. After
the RBM is trained, the weights of the model can be used as initial weights for a more
advanced deep learning architecture, such as a deep belief network.
Applications:
They can also enhance other neural network architectures by improving their
performance.
Sigmoid net:
In deep learning, a sigmoid network typically refers to a neural network architecture
that uses the sigmoid activation function in its neurons. The sigmoid function, also known as
the logistic function, is commonly used in binary classification problems. It squashes its input
values to the range (0, 1), making it suitable for problems where the goal is to predict
probabilities.
When used in a neural network, the sigmoid activation function is often applied to the
output layer, especially when the network is designed for binary classification tasks. The
output of the sigmoid function can be interpreted as the probability of the input belonging to
the positive class. For example, in a binary classification problem where you are trying to
predict whether an image contains a cat or not, the sigmoid output close to 1 would indicate a
high probability of the image containing a cat, while an output close to 0 would indicate a
low probability.
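The sigmoid function is σ(z) = 1 / (1 + e^(-z)). A minimal sketch of a single sigmoid output unit is shown below; the weights, bias, feature values, and the 0.5 decision threshold are illustrative assumptions, not values from these notes.

```python
import math

def sigmoid(z):
    """Logistic function: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features, weights, bias):
    """Output of one sigmoid unit: weighted sum plus bias, then sigmoid."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Hypothetical feature values and parameters for a cat / not-cat classifier.
p = predict_probability([2.0, -1.0], weights=[1.5, 0.5], bias=-0.5)
label = "cat" if p >= 0.5 else "not cat"
```

Here the weighted sum is 2.0, so the unit outputs a probability of roughly 0.88 and the example is labeled as a cat.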
Auto encoders:
An AutoEncoder is an unsupervised artificial neural network that encodes the data by
compressing it into a lower dimension (the bottleneck layer, or code) and then decodes
it to reconstruct the original input. The bottleneck layer (or code) holds the compressed
representation of the input data. In an AutoEncoder the number of output units must equal
the number of input units, since we are attempting to reconstruct the input data.
AutoEncoders usually consist of an encoder and a decoder. The encoder encodes the
provided data into a lower dimension, which is the size of the bottleneck layer, and the
decoder decodes the compressed data back into its original form. The number of neurons per
layer decreases from layer to layer in the encoder and increases from layer to layer in the
decoder. In the following example, the encoder and the decoder have three layers each. The
encoder contains 32, 16, and 7 units in its layers, and the decoder contains 7, 16, and 32
units in its layers. The code size (the number of neurons in the bottleneck) must be less
than the number of features in the data. Before the data is fed into the AutoEncoder, it
should be scaled to between 0 and 1 using MinMaxScaler, because the output layer uses a
sigmoid activation function, which outputs values between 0 and 1. When we use AutoEncoders
for dimensionality reduction, we extract the bottleneck layer and use its output as the
reduced representation. This process can be viewed as feature extraction.
The type of AutoEncoder used here is a Deep AutoEncoder, in which the encoder and the
decoder are symmetrical. Autoencoders need not have a symmetrical encoder and decoder;
the two can be non-symmetrical as well.
Architecture:
An Autoencoder is a type of neural network that can learn to reconstruct images, text, and
other data from compressed versions of themselves.
An Autoencoder consists of three components:
1. Encoder
2. Code
3. Decoder
The Encoder layer compresses the input image into a latent space representation. It encodes
the input image as a compressed representation in a reduced dimension.
The compressed image is a distorted version of the original image.
The Code layer represents the compressed input fed to the decoder layer.
The decoder layer decodes the encoded image back to the original dimension. The decoded
image is reconstructed from the latent space representation and is a lossy reconstruction
of the original image.
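The encoder, code, and decoder structure can be sketched with a very small network. This is a simplified illustration under stated assumptions: it uses NumPy instead of a deep learning framework, a single encoder layer and a single decoder layer rather than the 32-16-7 stack described earlier, a tanh encoder (an arbitrary choice), synthetic data, and plain gradient descent on the mean squared reconstruction error. All names and hyperparameters are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# Synthetic data (hypothetical): 200 samples, 6 features driven by 2 hidden factors.
raw = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 6))
# Min-max scale each feature to [0, 1] to match the sigmoid output range.
data = (raw - raw.min(axis=0)) / (raw.max(axis=0) - raw.min(axis=0))

n_features, code_size = data.shape[1], 2  # bottleneck smaller than the input
W_enc = 0.1 * rng.standard_normal((n_features, code_size))
b_enc = np.zeros(code_size)
W_dec = 0.1 * rng.standard_normal((code_size, n_features))
b_dec = np.zeros(n_features)

def encode(x):
    """Encoder: compress the input into the code (bottleneck)."""
    return np.tanh(x @ W_enc + b_enc)

def decode(code):
    """Decoder: reconstruct the input from the code."""
    return sigmoid(code @ W_dec + b_dec)

learning_rate = 2.0
losses = []
for _ in range(2000):
    code = encode(data)
    recon = decode(code)
    err = recon - data
    losses.append(np.mean(err ** 2))
    # Backpropagate the mean squared reconstruction error.
    d_out = 2.0 * err * recon * (1.0 - recon) / data.size
    d_code = (d_out @ W_dec.T) * (1.0 - code ** 2)
    W_dec -= learning_rate * (code.T @ d_out)
    b_dec -= learning_rate * d_out.sum(axis=0)
    W_enc -= learning_rate * (data.T @ d_code)
    b_enc -= learning_rate * d_code.sum(axis=0)

reduced = encode(data)  # 2-dimensional features extracted from the bottleneck
```

After training, `encode` alone gives the reduced representation, which is the feature-extraction use of the bottleneck described above.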
Types of Autoencoders:
Undercomplete Autoencoders:
Undercomplete autoencoders are unsupervised neural networks used to compress input data
by predicting the same image as output and reconstructing it from its compressed bottleneck
region. These autoencoders are primarily used to create a latent space or bottleneck, which
can be easily decompressed back using the network.
Sparse Autoencoders:
Denoising Autoencoders:
Denoising autoencoders are a method for removing noise from images. Unlike regular
autoencoders, whose input image is also their ground truth, they take a noisy version of
the image as input and use the clean image as the target. Such noise is difficult to remove
manually. When the noisy image is fed into the network, the network maps it onto a
lower-dimensional manifold, which makes noise filtering more manageable. The loss function
used with these networks is typically L2 or L1 loss.
Variational Autoencoders:
Variational autoencoders (VAEs) address a limitation of standard autoencoders, which
represent each input as a single fixed point in a compressed latent space. VAEs instead
express latent attributes as a probability distribution, creating a continuous latent space
that can be easily sampled and interpolated.