
Deep Learning

Module 3 Prof: Naveen Ghorpade


Contents
 Auto Encoders: Overview on Auto-Encoders
 Note on Biases
 Training an auto encoder
 Over complete hidden layers
 Sparse auto encoders
 De-noising auto-encoders
 Contractive auto-encoders
 Stacked auto-encoders
 Deep auto-encoders
 Building an Auto-encoder
 Tuning and optimizing
 Applications of auto-encoders
Auto Encoders
 An autoencoder is a type of artificial neural network used to learn efficient
data coding in an unsupervised manner.
 The aim of an autoencoder is to learn a representation (encoding) for a set of
data, typically for dimensionality reduction, by training the network to ignore
signal “noise”.
 Along with the reduction side, a reconstructing side is also learned, where the
autoencoder tries to generate from the reduced encoding a representation as
close as possible to its original input, hence its name.
 Several variants of the basic model exist, with the aim of forcing the learned
representations of the input to assume useful properties.
 Examples are the regularized autoencoders (Sparse, Denoising and Contractive
autoencoders), proven effective in learning representations for subsequent
classification tasks, and Variational autoencoders, with their recent
applications as generative models.
 Autoencoders are effectively used for solving many applied problems, from
face recognition to acquiring the semantic meaning of words.
Auto Encoders

 An autoencoder learns to copy its input to its output. It has an internal (hidden)
layer that describes a code used to represent the input.
 It consists of two main parts: an encoder that maps the input into the
code, and a decoder that maps the code to a reconstruction of the original
input.
 Performing the copying task perfectly would simply duplicate the signal, and
this is why autoencoders usually are restricted in ways that force them to
reconstruct the input approximately, preserving only the most relevant aspects
of the data in the copy.
 Their most traditional application was dimensionality reduction or feature
learning, but more recently the autoencoder concept has become more widely
used for learning generative models of data.
 Some of the most powerful AIs in the 2010s involved sparse autoencoders.
Auto Encoders

 A typical use of a Neural Network is a case of supervised learning. It involves
training data which contains an output label. The neural network tries to learn
the mapping from the given input to the given output label. But if the output
label is replaced by the input vector itself then the network will try to find the
mapping from the input to itself. This would be the identity function, which is a
trivial mapping.
 But if the network is not allowed to simply copy the input, then the network
will be forced to capture only the salient features. This constraint opens up a
different, previously unexplored field of applications for Neural Networks. The
primary applications are dimensionality reduction and data-specific
compression.
 The network is first trained on the given input. The network tries to
reconstruct the given input from the features it picked up and gives an
approximation to the input as the output.
Auto Encoders

 Autoencoders are a specific type of feedforward neural network where the
input is the same as the output.
 They compress the input into a lower-dimensional code and then reconstruct
the output from this representation.
 The code is a compact “summary” or “compression” of the input, also called
the latent-space representation.
 An autoencoder consists of 3 components: encoder, code and decoder.
To build an autoencoder we need 3 things:
 an encoding method
 decoding method
 a loss function to compare the output with the target.
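
To make the three components concrete, here is a minimal sketch in Python using the Keras API; the layer sizes (784 inputs, a 32-dimensional code) are illustrative assumptions, not part of the original slide.

    from tensorflow.keras import layers, models

    input_dim, code_dim = 784, 32                  # assumed sizes, e.g. flattened 28x28 images

    inputs = layers.Input(shape=(input_dim,))
    code = layers.Dense(code_dim, activation='relu')(inputs)        # encoder -> code
    outputs = layers.Dense(input_dim, activation='sigmoid')(code)   # decoder

    autoencoder = models.Model(inputs, outputs)
    # The loss compares the output with the target, which for an autoencoder is the input itself.
    autoencoder.compile(optimizer='adam', loss='mse')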
Auto Encoders

 Autoencoders are mainly a dimensionality reduction (or compression)
algorithm with a couple of important properties:
 Data-specific: Autoencoders are only able to meaningfully compress data
similar to what they have been trained on. Since they learn features specific for
the given training data, they are different than a standard data compression
algorithm like gzip. So we can’t expect an autoencoder trained on handwritten
digits to compress landscape photos.
 Lossy: The output of the autoencoder will not be exactly the same as the input,
it will be a close but degraded representation. If you want lossless compression
they are not the way to go.
 Unsupervised: To train an autoencoder we don’t need to do anything fancy,
just throw the raw input data at it. Autoencoders are considered
an unsupervised learning technique since they don’t need explicit labels to
train on. But to be more precise they are self-supervised because they generate
their own labels from the training data.
Overview of the SOM Algorithm

 We have a spatially continuous input space, in which our input vectors live.
 The aim is to map from this to a low dimensional spatially discrete output
space, the topology of which is formed by arranging a set of neurons in a grid.
 Our SOM provides such a nonlinear transformation called a feature map.
 The stages of the SOM algorithm can be summarised as follows:
1. Initialization – Choose random values for the initial weight vectors wj.
2. Sampling – Draw a sample training input vector x from the input space.
3. Matching – Find the winning neuron I(x) with weight vector closest to input
vector.
4. Updating – Apply the weight update equation
5. Continuation – keep returning to step 2 until the feature map stops changing.
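
A minimal NumPy sketch of these five stages is given below; the Gaussian neighbourhood used in the update step is an assumption, since the slide does not spell out the weight update equation, and the learning rate and neighbourhood width are kept fixed for simplicity.

    import numpy as np

    def train_som(X, grid=(10, 10), epochs=20, lr=0.5, sigma=2.0, seed=0):
        """Minimal SOM sketch: initialization, sampling, matching, updating, continuation."""
        rng = np.random.default_rng(seed)
        n_rows, n_cols = grid
        d = X.shape[1]
        # 1. Initialization: random weight vectors w_j
        W = rng.random((n_rows, n_cols, d))
        # Grid coordinates of each neuron, used by the neighbourhood function
        coords = np.stack(np.meshgrid(np.arange(n_rows), np.arange(n_cols), indexing='ij'), axis=-1)
        for _ in range(epochs):                      # 5. Continuation
            for x in rng.permutation(X):             # 2. Sampling
                # 3. Matching: winning neuron I(x) = closest weight vector
                dist = np.linalg.norm(W - x, axis=-1)
                winner = np.unravel_index(np.argmin(dist), dist.shape)
                # 4. Updating: Gaussian neighbourhood around the winner (assumed form)
                grid_dist = np.linalg.norm(coords - np.array(winner), axis=-1)
                h = np.exp(-(grid_dist ** 2) / (2 * sigma ** 2))[..., None]
                W += lr * h * (x - W)
        return W

    # Usage: W = train_som(np.random.rand(500, 3))  # map 3-D points onto a 10x10 grid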
Note on Biases
 Regularized training of an autoencoder typically results in hidden unit biases that take on large
negative values.
 The negative biases are a natural result of using a hidden layer whose responsibility is to both
represent the input data and act as a selection mechanism that ensures sparsity of the
representation.
 However, the negative biases impede the learning of data distributions whose intrinsic dimensionality
is high.
 A new activation function has been proposed that decouples the two roles of the hidden layer and
allows representations to be learned on data with very high intrinsic dimensionality, where
standard autoencoders typically fail.
 Since the decoupled activation function acts like an implicit regularizer, the model can be trained
by minimizing the reconstruction error of training data, without requiring any additional
regularization.
Training an Auto-Encoder
Step 1 - We start with an array where the lines (observations) correspond to the users and the columns (the
features) correspond to the movies. Each cell (u, i) contains the rating (from 1 to 5, 0 if no rating) of the movie
i by the user u.
Step 2 - The first user goes into the network. The input vector x = (r1, r2, ..., rm) contains all ratings for all
movies.
Step 3 - The input vector x is encoded into a vector z of lower dimensions by a mapping function f (e.g.
sigmoid function):
z = f(Wx + b), where W is the matrix of input weights and b the bias vector.
Step 4 - z is then decoded into the output vector y of same dimensions as x, aiming to replicate the input vector
x.
Step 5 - The reconstruction error d(x, y) = ||x-y|| is computed. The goal is to minimize it.
Step 6 - Backpropagation. From right to left, the error is backpropagated. The weights are updated according
to how much they are responsible for the error and the learning rate decides how much we update the weights.
Step 7 - Repeat steps 1 to 6 and update the weights after each observation (online/stochastic learning), or repeat
steps 1 to 6 but update the weights only after a batch of observations (batch learning).
Step 8 - When the whole training set has passed through the ANN, this counts as one epoch. Repeat for more
epochs until the reconstruction error converges.
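
The following Keras sketch follows Steps 1 to 8 on a toy ratings array; the layer sizes and the random data are placeholders, and the handling of missing (zero) ratings in the loss is omitted for brevity.

    import numpy as np
    from tensorflow.keras import layers, models

    # Step 1: ratings array, one row per user, one column per movie (0 = no rating).
    n_users, n_movies = 1000, 200
    R = np.random.randint(0, 6, size=(n_users, n_movies)).astype('float32')  # toy data

    # Steps 3-4: encode x into a lower-dimensional z = f(Wx + b), then decode back.
    x_in = layers.Input(shape=(n_movies,))
    z = layers.Dense(32, activation='sigmoid')(x_in)          # encoder
    y_out = layers.Dense(n_movies, activation='linear')(z)    # decoder
    ae = models.Model(x_in, y_out)

    # Steps 5-6: the reconstruction error d(x, y) is minimised by backpropagation.
    ae.compile(optimizer='adam', loss='mse')

    # Steps 7-8: batch_size=1 updates after each observation (online learning);
    # a larger batch_size updates after a batch; epochs repeats the whole training set.
    ae.fit(R, R, batch_size=32, epochs=10, verbose=0)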
Over complete hidden layers

 Overcomplete hidden layers are a setup in which an Auto-Encoder has a
hidden layer with at least as many nodes as the input layer; this allows
the Auto-Encoder to learn more features.
 However, when the number of hidden nodes equals or exceeds the number of
input nodes, the Auto-Encoder can cheat.
 It can simply pass each input through its own hidden node, copying the input
to the output and ignoring any additional hidden nodes.
 A network trained this way learns nothing useful, which makes it useless. We can
resolve this by using any of the following types of Auto-Encoders:
Sparse Auto-Encoders
Denoising Auto-Encoders
Contractive Auto-Encoders
Over complete hidden layers
Sparse auto-encoders

 Sparse autoencoders have more hidden nodes than input nodes. They can
still discover important features from the data.
 A sparsity constraint is introduced on the hidden layer. This prevents the output
layer from simply copying the input data.
 Sparse autoencoders have a sparsity penalty, Ω(h), a value close to zero but
not zero. The sparsity penalty is applied on the hidden layer in addition to the
reconstruction error. This prevents overfitting.

 Sparse autoencoders take the highest activation values in the hidden layer and
zero out the rest of the hidden nodes. This prevents the autoencoder from using all of
the hidden nodes at a time and forces only a reduced number of hidden nodes
to be used.
 As we activate and deactivate different hidden nodes for each row in the dataset, each
hidden node learns to extract a different feature from the data.
Sparse auto-encoders

 An auto-encoder takes the input image or vector and learns a code dictionary
that changes the raw input from one representation to another.
 A sparse autoencoder adds a sparsity enforcer that directs a single-
layer network to learn a code dictionary which minimizes the error in
reproducing the input while restricting the number of code words needed for
reconstruction.
 The sparse autoencoder consists of a single hidden layer, which is connected to
the input vector by a weight matrix forming the encoding step.
 The hidden layer then outputs to a reconstruction vector, using a tied weight
matrix to form the decoder.
Sparse auto-encoders

 An advancement over sparse autoencoders is the k-sparse autoencoder. Here we
keep the k neurons with the highest activations and ignore the others, for example by
using ReLU activations and adjusting the threshold until only the k largest
neurons remain. The value of k is tuned to obtain the sparsity level best suited
to the dataset.
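
A minimal NumPy sketch of the k-sparse selection step (keep the k largest activations per sample and zero out the rest); the array values are illustrative.

    import numpy as np

    def k_sparse(h, k):
        """Keep the k largest activations in each row of h, zero out the rest."""
        # Indices of the k largest activations per sample (row).
        idx = np.argpartition(h, -k, axis=1)[:, -k:]
        mask = np.zeros_like(h)
        np.put_along_axis(mask, idx, 1.0, axis=1)
        return h * mask

    # Example: hidden activations for 2 samples, 5 hidden units, keep k=2.
    h = np.array([[0.1, 0.9, 0.3, 0.8, 0.2],
                  [0.5, 0.1, 0.7, 0.2, 0.6]])
    print(k_sparse(h, 2))
    # [[0.  0.9 0.  0.8 0. ]
    #  [0.  0.  0.7 0.  0.6]]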
De-noising auto-encoders
 Autoencoders are Neural Networks which are commonly used for feature selection and
extraction. However, when there are more nodes in the hidden layer than there are inputs, the
network risks learning the so-called “Identity Function”, also called the “Null Function”,
meaning that the output equals the input, making the Autoencoder useless.
 Denoising Autoencoders solve this problem by corrupting the data on purpose, randomly
turning some of the input values to zero. In general, the percentage of input nodes which are
set to zero is about 50%. Other sources suggest a lower count, such as 30%. It depends on
the amount of data and input nodes you have.
 Denoising Autoencoders are an important and crucial tool for feature selection and extraction.
 When calculating the Loss function, it is important to compare the output values with the original
input, not with the corrupted input. That way, the risk of learning the identity function instead of
extracting features is eliminated.
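
A minimal Keras sketch of this procedure is shown below: the inputs are corrupted by randomly zeroing a fraction of their values, while the loss is computed against the original, uncorrupted input. The corruption rate, layer sizes and placeholder data are assumptions.

    import numpy as np
    from tensorflow.keras import layers, models

    def corrupt(X, drop_prob=0.3, seed=0):
        """Randomly set a fraction of input values to zero (masking noise)."""
        rng = np.random.default_rng(seed)
        mask = rng.random(X.shape) > drop_prob
        return X * mask

    input_dim = 784
    x_in = layers.Input(shape=(input_dim,))
    h = layers.Dense(128, activation='relu')(x_in)
    x_out = layers.Dense(input_dim, activation='sigmoid')(h)
    dae = models.Model(x_in, x_out)
    dae.compile(optimizer='adam', loss='binary_crossentropy')

    # Train on corrupted inputs, but compare the output with the ORIGINAL input.
    X = np.random.rand(1024, input_dim).astype('float32')  # placeholder data
    dae.fit(corrupt(X, drop_prob=0.3), X, epochs=10, batch_size=64, verbose=0)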
De-noising auto-encoders
The idea behind denoising autoencoders is simple: in order to force the hidden layer to
discover more robust features, we train the autoencoder to reconstruct the input from a
corrupted version of it.
De-noising auto-encoders

 Differently from sparse autoencoders or undercomplete autoencoders that
constrain representation, Denoising autoencoders (DAE) try to achieve a good
representation by changing the reconstruction criterion.
 Indeed, DAEs take a partially corrupted input and are trained to recover the
original undistorted input. In practice, the objective of denoising autoencoders
is that of cleaning the corrupted input, or denoising.
 Two underlying assumptions are inherent to this approach:
1. Higher level representations are relatively stable and robust to the corruption of
the input;
2. To perform denoising well, the model needs to extract features that capture
useful structure in the distribution of the input.
 In other words, denoising is advocated as a training criterion for learning to
extract useful features that will constitute better higher level representations of
the input.
Contractive auto-encoders
 The contractive autoencoder (CAE) objective is to have a robust learned representation
which is less sensitive to small variations in the data.
 Robustness of the representation is achieved by applying a penalty term to
the loss function. The penalty term is the squared Frobenius norm of the Jacobian matrix
of the hidden layer, calculated with respect to the input,
i.e. the sum of squares of all its elements.

 Contractive autoencoder is another regularization technique like sparse autoencoders
and denoising autoencoders.
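
A sketch of such a loss in TensorFlow, assuming a single sigmoid hidden layer h = sigmoid(xW + b) so that the Jacobian can be written analytically; the weighting factor lam is an assumed hyperparameter.

    import tensorflow as tf

    def contractive_loss(x, x_hat, h, W, lam=1e-4):
        """Reconstruction error plus the squared Frobenius norm of dh/dx.

        Assumes h = sigmoid(x @ W + b), so dh_j/dx_i = h_j (1 - h_j) * W_ij
        and ||J||_F^2 = sum_j (h_j (1 - h_j))^2 * sum_i W_ij^2.
        """
        mse = tf.reduce_mean(tf.square(x - x_hat))
        dh = h * (1.0 - h)                            # (batch, hidden)
        w_sq = tf.reduce_sum(tf.square(W), axis=0)    # (hidden,) summed over inputs
        frob = tf.reduce_sum(tf.square(dh) * w_sq, axis=1)
        return mse + lam * tf.reduce_mean(frob)

    # Usage inside a custom training loop (encoder_layer is an assumed Dense layer):
    # loss = contractive_loss(x, model(x), encoder(x), encoder_layer.kernel)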
Contractive auto-encoders

 The Contractive Auto-Encoder is a variation of the well-known Auto-Encoder
algorithm that has a solid background in information theory and, lately, the
deep learning community.
 The simple Auto-Encoder aims to compress the information in the given data while
keeping the reconstruction cost as low as possible.
 However, another use is to enlarge the given input's representation.
 In that case, we learn an over-complete representation of the given data instead of
compressing it.
 The most common example is the Sparse Auto-Encoder, which learns an over-complete
representation but in a sparse (smart) manner. That means that, for a given instance,
only an informative set of units is activated, therefore you are able to capture a
more discriminative representation, especially if you use the AE for pre-training
of your deep neural network.
Contractive auto-encoders

 The CAE simply aims to learn representations that are invariant to unimportant
transformations of the given data.
 It only learns transformations that are present in the given dataset and tries to
avoid more. For instance, if you have a set of car images and they have left and
right view points in the dataset, then the CAE is sensitive to those changes but it is
insensitive to the frontal view point.
 What this means is that if you give a frontal car image to the CAE after the training
phase, it tries to contract its representation at the hidden layer towards one of the
left or right view point car representations.
 From the mathematical point of view, it gives the effect of contraction by
adding an additional term to the reconstruction cost.
 This addition is the squared Frobenius norm of the Jacobian of the hidden layer
representation with respect to the input values. If this value is zero, it means that as
we change the input values we observe no change in the learned hidden representation.
Stacked auto-encoders
 A stacked autoencoder is a neural network consisting of several layers of sparse autoencoders in which the
output of each hidden layer is connected to the input of the successive hidden layer.
Stacked auto-encoders
As shown in the figure, the hidden layers are trained by
an unsupervised algorithm and then fine-tuned by a
supervised method.
Stacked autoencoder training mainly consists of three steps:

1. Train an autoencoder using the input data and acquire the
learned data (the codes).
2. The learned data from the previous layer is used as
an input for the next layer, and this continues until the
training is completed.
3. Once all the hidden layers are trained, the whole network is
fine-tuned with a supervised method (backpropagation on labelled data).
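
A minimal Keras sketch of these three steps, with placeholder data, layer sizes and epoch counts; a real stacked autoencoder would use the sparse variants described earlier rather than plain dense layers.

    import numpy as np
    from tensorflow.keras import layers, models

    X = np.random.rand(2048, 784).astype('float32')    # placeholder unlabeled data
    y = np.random.randint(0, 10, size=2048)             # placeholder labels for fine-tuning

    def train_layer(data, n_hidden):
        """Step 1: train one autoencoder layer and return its encoder and codes."""
        x_in = layers.Input(shape=(data.shape[1],))
        h = layers.Dense(n_hidden, activation='relu')(x_in)
        x_out = layers.Dense(data.shape[1], activation='sigmoid')(h)
        ae = models.Model(x_in, x_out)
        ae.compile(optimizer='adam', loss='mse')
        ae.fit(data, data, epochs=5, batch_size=128, verbose=0)
        encoder = models.Model(x_in, h)
        return encoder, encoder.predict(data, verbose=0)

    # Step 2: the codes of each layer become the input of the next layer.
    enc1, codes1 = train_layer(X, 256)
    enc2, codes2 = train_layer(codes1, 64)

    # Step 3: stack the trained encoders and fine-tune with a supervised method.
    stacked = models.Sequential([enc1, enc2, layers.Dense(10, activation='softmax')])
    stacked.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
    stacked.fit(X, y, epochs=5, batch_size=128, verbose=0)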
Deep auto-encoders
 A deep autoencoder is composed of two symmetrical deep-belief
networks: typically four or five shallow layers representing the
encoding half of the net, and a second set of four or five layers that make up the
decoding half.
 The layers are restricted Boltzmann machines, the building blocks of deep-
belief networks, with several peculiarities. Here’s a simplified schema of a
deep autoencoder’s structure.
Deep auto-encoders
 Processing the benchmark dataset MNIST, a deep autoencoder would use
binary transformations after each RBM.
 Deep autoencoders can also be used for other types of datasets with real-
valued data, on which you would use Gaussian rectified transformations for
the RBMs instead.
 Deep Autoencoders consist of two identical deep belief networks: one
network for encoding and another for decoding.
 Typically deep autoencoders have 4 to 5 layers for encoding and the next 4 to
5 layers for decoding. We use unsupervised layer-by-layer pre-training.
 The Restricted Boltzmann Machine (RBM) is the basic building block of the deep
belief network. RBMs are covered separately.
 In the above figure, we take an image with 784 pixels, train using a stack of 4
RBMs, unroll them and then fine-tune with backpropagation.
 The final encoding layer is compact and fast.
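
For reference, here is a sketch of the symmetric architecture in Keras; note that, unlike the RBM-by-RBM pretraining described above, this sketch simply trains the unrolled network end to end with backpropagation, and the layer sizes are assumptions in the spirit of the 784-pixel MNIST example.

    from tensorflow.keras import layers, models

    # Symmetric deep autoencoder: 784 -> ... -> 30 -> ... -> 784.
    # This sketch trains end to end instead of using RBM pretraining.
    encoder_dims = [1000, 500, 250, 30]

    x_in = layers.Input(shape=(784,))
    h = x_in
    for d in encoder_dims:                      # encoding half
        h = layers.Dense(d, activation='relu')(h)
    code = h                                    # compact 30-dimensional code
    for d in reversed(encoder_dims[:-1]):       # decoding half (mirror image)
        h = layers.Dense(d, activation='relu')(h)
    x_out = layers.Dense(784, activation='sigmoid')(h)

    deep_ae = models.Model(x_in, x_out)
    deep_ae.compile(optimizer='adam', loss='binary_crossentropy')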
Building an Auto-encoder

 Step 1: Importing the required libraries
 Step 2: Defining a utility function to load the data
 Step 3: Defining a utility function to build the Auto-encoder neural network
 Step 4: Defining a utility function to build and train the Auto-encoder network
 Step 5: Defining a utility function to visualize the reconstruction
 Step 6: Calling the utility functions in the appropriate order
a) Loading the data
b) Building the network
c) Building and training the Auto-encoder
d) Visualizing the reconstruction
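
A possible end-to-end sketch of these steps on MNIST is given below; the utility-function names and layer sizes are illustrative, not taken from the original notebook.

    # Step 1: import the required libraries.
    import matplotlib.pyplot as plt
    from tensorflow.keras import layers, models, datasets

    # Step 2: utility function to load the data.
    def load_data():
        (x_train, _), (x_test, _) = datasets.mnist.load_data()
        x_train = x_train.reshape(-1, 784).astype('float32') / 255.0
        x_test = x_test.reshape(-1, 784).astype('float32') / 255.0
        return x_train, x_test

    # Step 3: utility function to build the autoencoder network.
    def build_autoencoder(input_dim=784, code_dim=32):
        x_in = layers.Input(shape=(input_dim,))
        code = layers.Dense(code_dim, activation='relu')(x_in)
        x_out = layers.Dense(input_dim, activation='sigmoid')(code)
        model = models.Model(x_in, x_out)
        model.compile(optimizer='adam', loss='binary_crossentropy')
        return model

    # Step 4: utility function to build and train the autoencoder.
    def train_autoencoder(x_train, x_test, epochs=10):
        model = build_autoencoder()
        model.fit(x_train, x_train, epochs=epochs, batch_size=256,
                  validation_data=(x_test, x_test), verbose=0)
        return model

    # Step 5: utility function to visualize the reconstruction.
    def show_reconstruction(model, x, n=5):
        recon = model.predict(x[:n], verbose=0)
        for i in range(n):
            plt.subplot(2, n, i + 1)
            plt.imshow(x[i].reshape(28, 28), cmap='gray')
            plt.subplot(2, n, n + i + 1)
            plt.imshow(recon[i].reshape(28, 28), cmap='gray')
        plt.show()

    # Step 6: call the utility functions in the appropriate order.
    x_train, x_test = load_data()
    ae = train_autoencoder(x_train, x_test)
    show_reconstruction(ae, x_test)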
Tuning and optimizing
 Incorporating the properties of PCA brings significant benefits to an Autoencoder, such as
mitigating vanishing and exploding gradients, and reducing overfitting via regularization.
 Based on this, properties that we would inherit are,
1. Tied weights,
2. Orthogonal weights,
3. Uncorrelated features, and
4. Unit Norm.
Custom layers and constraints are implemented to incorporate them (see the sketch after this slide),
demonstrating how they work and the improvements in reconstruction error that they bring.
 These implementations will enable constructing a well-posed Autoencoder and optimizing it. The
optimizations improved the reconstruction error by more than 50%.
 Note: regularization techniques such as dropout are popularly used. But without a well-posed
model, these approaches take longer to optimize.
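
As an illustration, the sketch below imposes two of the listed properties in Keras: a unit-norm constraint on the encoder weights and a tied-weight decoder. The DenseTied layer is an illustrative custom implementation and is not taken from the referenced article.

    import tensorflow as tf
    from tensorflow.keras import layers, models, constraints

    class DenseTied(layers.Layer):
        """Decoder layer that reuses the transpose of a given encoder layer's kernel."""
        def __init__(self, tied_to, activation=None, **kwargs):
            super().__init__(**kwargs)
            self.tied_to = tied_to
            self.activation = tf.keras.activations.get(activation)

        def build(self, input_shape):
            out_dim = self.tied_to.kernel.shape[0]   # the encoder's input dimension
            self.bias = self.add_weight(name='bias', shape=(out_dim,), initializer='zeros')

        def call(self, inputs):
            # Tied weights: use the transpose of the encoder kernel for decoding.
            z = tf.matmul(inputs, self.tied_to.kernel, transpose_b=True) + self.bias
            return self.activation(z)

    x_in = layers.Input(shape=(784,))
    encoder_layer = layers.Dense(32, activation='relu',
                                 kernel_constraint=constraints.UnitNorm(axis=0))  # unit-norm weights
    code = encoder_layer(x_in)
    x_out = DenseTied(encoder_layer, activation='sigmoid')(code)                  # tied weights
    ae = models.Model(x_in, x_out)
    ae.compile(optimizer='adam', loss='mse')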
Applications of auto-encoders
 Used for Nonlinear Dimensionality Reduction. Encodes the input in the
hidden layer to a smaller dimension compared to the input dimension.
The hidden layer is later decoded as output. The output layer has the same
dimension as the input. An autoencoder reduces the dimensionality of linear and
nonlinear data, hence it is more powerful than PCA.
 Used in Recommendation Engines. This uses deep autoencoders to
understand user preferences in order to recommend movies, books or other items.
 Used for Feature Extraction: Autoencoders try to minimize the
reconstruction error. In the process of reducing the error, they learn some of the
important features present in the input. The input is reconstructed from the
encoded state present in the hidden layer. Encoding generates a new set of
features which is a combination of the original features; this encoding
therefore provides a compact, derived feature representation for downstream tasks.
Applications of auto-encoders

 Dimensionality Reduction
 Image Compression
 Image Denoising
 Feature Extraction
 Image generation
 Sequence to sequence prediction
 Recommendation system
Dimensionality Reduction

 In undercomplete autoencoders the size of the hidden layer is smaller than the input layer; we
force the network to learn important features by reducing the hidden layer size.
 Also, a network with high capacity (deep and highly nonlinear) may not be
able to learn anything useful.
 Dimension reduction methods are based on the assumption that the dimension of the
data is artificially inflated and its intrinsic dimension is much lower.
 As we increase the number of layers in an autoencoder, the size of the hidden
layer will have to decrease. If the size of the hidden layer becomes smaller
than the intrinsic dimension of the data, this will result in loss of
information.
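
A minimal sketch of using the encoder half of a bottleneck autoencoder for dimensionality reduction; the 2-dimensional code size and layer widths are illustrative choices.

    from tensorflow.keras import layers, models

    # Bottleneck autoencoder whose 2-D code can be used for visualisation,
    # much like the first two principal components from PCA.
    x_in = layers.Input(shape=(784,))
    h = layers.Dense(128, activation='relu')(x_in)
    code = layers.Dense(2, activation='linear')(h)       # low-dimensional bottleneck
    h_dec = layers.Dense(128, activation='relu')(code)
    x_out = layers.Dense(784, activation='sigmoid')(h_dec)

    ae = models.Model(x_in, x_out)
    encoder = models.Model(x_in, code)                    # reuse the trained encoder half
    ae.compile(optimizer='adam', loss='mse')
    # After ae.fit(X, X, ...), the low-dimensional codes are: Z = encoder.predict(X)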
Image Compression

 Usually, Autoencoders are really not good for data compression.
For image compression, it is pretty difficult for an autoencoder to do
better than basic algorithms like JPEG; only by being specific to a
particular type of image can this statement be proved wrong. Thus, the data-
specific property of autoencoders makes them impractical for compression of real-
world data. One can only use them for data on which they were trained, and
therefore, generalisation requires a lot of data.
Image Denoising
 Today, Autoencoders are very good at denoising images.
What happens when rain drops are on our window glass?
We can't get a clear image of what is behind the glass.
 Here the rain drops can be seen as noise. So, when our image gets corrupted or there is a bit of
noise in it, we call it a noisy image.
 To obtain proper information about the content of the image, we need image denoising.
We train our autoencoder to remove most (if not all) of the noise from the image.
Feature Extraction
 The encoding part of an autoencoder learns the important hidden features
present in the input data in the process of reducing the reconstruction error.
During encoding, a new set of combinations of the original features is generated.
Image generation
 There is a type of Autoencoder, the Variational Autoencoder (VAE), which is a
generative model used to generate images.
 The idea is that, given input images such as images of faces or scenery, the system
will generate similar images.
 The use is to:
1. Generate new characters of animation
2. Generate fake human images
Sequence to sequence prediction
 Encoder-Decoder models that can capture temporal structure, such
as LSTM-based autoencoders, can be used to address Machine
Translation problems.
 This can be used to:
1. Predict the next frame of a video
2. Generate fake videos
 A complete guide is provided by Jason Brownlee on Sequence to
Sequence Prediction, where the source sequence is a series of randomly
generated integer values, such as [20, 36, 40, 10, 34, 28], and the target
sequence is a pre-defined subset of the input sequence in reverse order, such as
the first 3 elements reversed: [40, 36, 20].
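
A compact Keras sketch in the spirit of that example is shown below: an LSTM encoder compresses the one-hot encoded source sequence into a fixed vector and an LSTM decoder emits the reversed first three elements. The vocabulary size, layer widths and training settings are assumptions.

    import numpy as np
    from tensorflow.keras import layers, models

    n_features = 50          # integers are one-hot encoded over [1, 50)
    seq_len, out_len = 6, 3

    def make_pair(rng):
        src = rng.integers(1, n_features, size=seq_len)
        tgt = src[:out_len][::-1]                     # first 3 elements, reversed
        one_hot = lambda s: np.eye(n_features)[s]
        return one_hot(src), one_hot(tgt)

    rng = np.random.default_rng(0)
    pairs = [make_pair(rng) for _ in range(5000)]
    X = np.stack([p[0] for p in pairs])
    Y = np.stack([p[1] for p in pairs])

    # Encoder-decoder: the encoder compresses the sequence into a fixed vector,
    # RepeatVector feeds that vector to the decoder once per output step.
    model = models.Sequential([
        layers.Input(shape=(seq_len, n_features)),
        layers.LSTM(128),
        layers.RepeatVector(out_len),
        layers.LSTM(128, return_sequences=True),
        layers.TimeDistributed(layers.Dense(n_features, activation='softmax')),
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy')
    model.fit(X, Y, epochs=20, batch_size=64, verbose=0)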
Recommendation system
 Deep Autoencoders can be used to understand user preferences to recommend
movies, books or other items.
 Consider the case of YouTube; the idea is:
 the input data is obtained by clustering similar users based on their interests, where
interests are denoted by the videos watched, the watch time for each, and
interactions (like commenting) with the video
 the encoder part will capture the interests of the user
 the decoder part will try to project those interests onto two parts:
 existing unseen content
 new content from content creators
