Lecture 14 Autoencoders


Autoencoders

Introduction
• An autoencoder is an unsupervised artificial neural network that learns to
compress and encode data efficiently, and then learns to reconstruct the
data from the reduced encoded representation so that the output is
as close to the original input as possible.
• By design, an autoencoder reduces data dimensions by learning to ignore
the noise in the data.
• Here is an example of the input/output image from the MNIST dataset to an
autoencoder.
Autoencoder
• An autoencoder is an unsupervised machine learning algorithm that takes
an image as input and tries to reconstruct it from a smaller number of bits
at the bottleneck, also known as the latent space.
• The image is most heavily compressed at the bottleneck.
• The compression is learned: as the network trains, it learns to represent
the input image as well as it can at the bottleneck.
• General-purpose image compression algorithms such as JPEG and lossless
JPEG, by contrast, compress images without any kind of training and do
fairly well.
• Autoencoders are similar to dimensionality reduction techniques like
Principal Component Analysis (PCA).
• PCA projects the data from a higher dimension to a lower dimension using
a linear transformation and tries to preserve the important features of the
data while removing the non-essential parts.
• The major difference between autoencoders and PCA lies in the
transformation: PCA uses a linear transformation, whereas autoencoders
can use non-linear transformations.
• Now that you have a bit of understanding about autoencoders, let's
break the term down and build some intuition about it!

• The above figure is a two-layer vanilla autoencoder with one hidden layer.
• In deep learning terminology, the input layer is usually not counted
when stating the total number of layers in an architecture.
• The layer count comprises only the hidden layers and the output layer.
• As shown in the image above, the input and output layers have the same
number of neurons.
• Let's take an example. You feed an image with just five pixel values into
the autoencoder, and the encoder compresses it into three values at the
bottleneck (middle layer), or latent space.
• Using these three values, the decoder tries to reconstruct the five pixel
values, that is, the input image you fed to the network.
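The five-to-three-to-five example can be sketched in a few lines of NumPy. This is only an illustration of the shapes involved: the weights are random and untrained, the sigmoid activation is an assumption, and the names `W_enc`, `W_dec`, `h` and `x_hat` are made up for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# An "image" with five pixel values (made-up numbers).
x = np.array([0.1, 0.9, 0.4, 0.7, 0.2])

# Untrained, randomly initialised weights: 5 -> 3 (encoder), 3 -> 5 (decoder).
W_enc, b_enc = rng.normal(scale=0.5, size=(3, 5)), np.zeros(3)
W_dec, b_dec = rng.normal(scale=0.5, size=(5, 3)), np.zeros(5)

h = sigmoid(W_enc @ x + b_enc)       # bottleneck / latent code: 3 values
x_hat = sigmoid(W_dec @ h + b_dec)   # reconstruction: 5 values again

print(h.shape, x_hat.shape)
```

Until the weights are trained, `x_hat` will not resemble `x`; the point here is only that the input and output have the same size while the bottleneck is smaller.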
Autoencoder Components
• An autoencoder consists of 4 main parts:
– Encoder: the part in which the model learns how to reduce the input
dimensions and compress the input data into an encoded
representation.
– Bottleneck: the layer that contains the compressed
representation of the input data. This is the lowest-dimensional
representation of the input in the network.
– Decoder: the part in which the model learns how to reconstruct the
data from the encoded representation to be as close to the original
input as possible.
– Reconstruction loss: the measure of how well the
decoder is performing, that is, how close the output is to the original
input.
• Training then uses backpropagation to minimize the
network's reconstruction loss.
How do Autoencoders work?
• We take the input and encode it to identify a latent feature
representation. We then decode the latent feature representation to
recreate the input.
• We calculate the loss by comparing the input and the output.
• To reduce the reconstruction error, we backpropagate and update the
weights.
• Each weight is updated according to how much it is responsible for the
error.
• In our example, we use a dataset of products bought by customers.
• Step 1: Take the first row of the customer data, covering all products,
as the input array. 1 represents that the customer bought the
product. 0 represents that the customer did not buy the product.
• Step 2: Encode the input into another vector h. h has a lower dimension
than the input. We can use the sigmoid activation function for h, as
it ranges from 0 to 1. W is the weight applied to the input and b is the
bias term.
h = f(Wx + b)
• Step 3: Decode the vector h to recreate the input. The output has the
same dimension as the input.
• Step 4: Calculate the reconstruction error L. The reconstruction error is
the difference between the input and output vectors (in practice it is
summarized as a single number, for example the squared norm of this
difference). Our goal is to minimize the reconstruction error so that the
output is similar to the input vector.
• Reconstruction error = input vector - output vector

• Step 5: Backpropagate the error from the output layer to the input layer
to update the weights. Weights are updated based on how much they were
responsible for the error.
• The learning rate decides by how much we update the weights.
• Step 6: Repeat steps 1 through 5 for each observation in the
dataset. Weights are updated after each observation.
• Step 7: Repeat for more epochs. An epoch is complete when all the rows in
the dataset have passed through the neural network.
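Steps 1 to 7 can be sketched as a small NumPy training loop. This is a minimal illustration, not a reference implementation: the purchase matrix `X` is made up, the loss is assumed to be half the squared error, both layers are assumed to use a sigmoid, and the function name `train_autoencoder` and its default hyperparameters are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden=2, lr=0.5, epochs=500, seed=0):
    """Train a one-hidden-layer autoencoder with per-row (stochastic) updates."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W1, b1 = rng.normal(scale=0.1, size=(n_hidden, n_in)), np.zeros(n_hidden)
    W2, b2 = rng.normal(scale=0.1, size=(n_in, n_hidden)), np.zeros(n_in)
    history = []
    for _ in range(epochs):                        # Step 7: repeat for more epochs
        total = 0.0
        for x in X:                                # Steps 1 and 6: one row at a time
            h = sigmoid(W1 @ x + b1)               # Step 2: encode, h = f(Wx + b)
            y = sigmoid(W2 @ h + b2)               # Step 3: decode to input size
            err = y - x                            # Step 4: reconstruction error
            total += 0.5 * np.sum(err ** 2)
            d_out = err * y * (1 - y)              # Step 5: backpropagate
            d_hid = (W2.T @ d_out) * h * (1 - h)
            W2 -= lr * np.outer(d_out, h)          # lr = learning rate
            b2 -= lr * d_out
            W1 -= lr * np.outer(d_hid, x)
            b1 -= lr * d_hid
        history.append(total / len(X))             # mean loss for this epoch
    return (W1, b1, W2, b2), history

# Made-up purchase data: rows are customers, columns are products (1 = bought).
X = np.array([[1, 1, 0, 0, 0], [1, 1, 1, 0, 0], [1, 1, 0, 0, 0],
              [0, 0, 0, 1, 1], [0, 0, 1, 1, 1], [0, 0, 0, 1, 1]], dtype=float)
params, history = train_autoencoder(X)
print(f"first epoch loss {history[0]:.3f}, last epoch loss {history[-1]:.3f}")
```

Updating after every row, as here, is stochastic gradient descent; updating once per batch of rows is a common alternative.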
Where are Autoencoders used?
• Non-linear dimensionality reduction: the hidden layer encodes the input
into a smaller dimension than the input dimension. The hidden layer is
later decoded as the output, and the output layer has the same
dimension as the input. Because an autoencoder can capture both linear
and non-linear structure in the data, it is more expressive than PCA.
• Recommendation engines: deep encoders are used to
understand user preferences and recommend movies, books or other items.
• Feature extraction: an autoencoder tries to minimize the
reconstruction error. In the process of reducing the error, it learns some
of the important features present in the input. It reconstructs the input
from the encoded state in the hidden layer. Encoding generates a new set
of features that are combinations of the original features, which
helps to identify the latent features present in the input
data.
• Image recognition: stacked autoencoders are used for image recognition.
Stacking multiple encoders together helps to learn different
features of an image.
Different Types of Autoencoders
• Undercomplete Autoencoders
– The goal of the autoencoder is to capture the most
important features present in the data.
– Undercomplete autoencoders have a smaller dimension
for the hidden layer than for the input layer. This helps
to obtain important features from the data.
– The objective is to minimize the loss function L(x, g(f(x))),
which penalizes g(f(x)) for being different from the input x.
– When the decoder is linear and we use a mean squared
error loss function, an undercomplete autoencoder
generates a reduced feature space similar to PCA.
– We get a powerful non-linear generalization of PCA
when the encoder function f and the decoder function g are
non-linear.
– Undercomplete autoencoders do not need any
regularization, as they maximize the probability of the data
rather than copying the input to the output.
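To make the PCA connection concrete, the sketch below builds a linear encoder f and decoder g directly from the top principal directions and measures the mean squared reconstruction error. It does not train an autoencoder; it only shows the linear solution that a linear undercomplete autoencoder with MSE loss is compared against. The data, the bottleneck size of 2 and all variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: 200 samples in 5 dimensions that mostly vary along 2 directions.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
V_k = Vt[:2].T                      # top-2 principal directions, shape (5, 2)

def f(x):
    """Linear encoder: project onto the principal directions."""
    return (x - mean) @ V_k

def g(h):
    """Linear decoder: map the code back to input space."""
    return h @ V_k.T + mean

mse_pca = np.mean((X - g(f(X))) ** 2)

# For comparison: the same construction with a random 2-D orthonormal basis.
Q, _ = np.linalg.qr(rng.normal(size=(5, 2)))
mse_random = np.mean((X - ((X - mean) @ Q @ Q.T + mean)) ** 2)

print(f"PCA subspace MSE: {mse_pca:.4f}, random subspace MSE: {mse_random:.4f}")
```

Among all 2-dimensional linear projections of the centered data, the PCA subspace gives the smallest squared reconstruction error, which is why the random basis cannot do better.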
• Sparse Autoencoders
– Sparse autoencoders can have more hidden nodes than
input nodes and still discover important features
from the data.
– A sparsity constraint is introduced on the hidden layer. This
prevents the output layer from simply copying the input data.
– Sparse autoencoders have a sparsity penalty, Ω(h), a
value close to zero but not zero. The sparsity penalty is
applied on the hidden layer in addition to the
reconstruction error. This prevents overfitting.
– Sparse autoencoders take the highest activation values in
the hidden layer and zero out the rest of the hidden
nodes. This prevents the autoencoder from using all of the
hidden nodes at once and forces only a reduced
number of hidden nodes to be used.
– Because hidden nodes are activated and deactivated for each row
in the dataset, each hidden node extracts a feature from
the data.
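The two sparsity mechanisms described above can be sketched as follows. The slides do not say which form Ω(h) takes, so the L1 penalty here is an assumed, commonly used choice; the function names, the weight `lam` and the example activations are made up for illustration.

```python
import numpy as np

def l1_sparsity_penalty(h, lam=1e-3):
    """One possible Omega(h): a small weight times the sum of |activations|."""
    return lam * np.sum(np.abs(h))

def k_sparse(h, k):
    """Keep the k highest activations and zero out the remaining hidden nodes."""
    out = np.zeros_like(h)
    top = np.argsort(h)[-k:]     # indices of the k largest activations
    out[top] = h[top]
    return out

# Made-up hidden-layer activations for one input row.
h = np.array([0.05, 0.80, 0.10, 0.60, 0.02, 0.30])
h_sparse = k_sparse(h, k=2)
penalty = l1_sparsity_penalty(h)

print(h_sparse)
print(penalty)
```

During training, the penalty would be added to the reconstruction error, while the top-k rule would be applied to the hidden layer before decoding.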
• Denoising Autoencoders (DAE)
– Denoising refers to intentionally adding noise to the
raw input before providing it to the network.
Denoising can be achieved using a stochastic
mapping.
– Denoising autoencoders create a corrupted copy of
the input by introducing some noise. This keeps
the autoencoder from copying the input to the
output without learning features of the data.
– Corruption of the input can be done randomly by
setting some of the input values to zero. The remaining
values are copied unchanged into the noised input.
– Denoising autoencoders must remove the
corruption to generate an output that is similar to
the input. The output is compared with the original input
and not with the noised input. We keep minimizing the loss
function until convergence.
– Denoising autoencoders therefore minimize the loss function
between the output and the original, uncorrupted input.
• Denoising helps the autoencoder to learn the latent representation
present in the data. The underlying idea is that a good
representation is one that can be derived robustly from a corrupted input
and that will be useful for recovering the corresponding clean input.
• A denoising autoencoder is a stochastic autoencoder, as we use a
stochastic corruption process to set some of the inputs to zero.
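The corruption step and the choice of loss target can be sketched as below. The drop probability, the mean squared error loss and the function names `corrupt` and `denoising_loss` are assumptions for illustration; the key point is that the loss takes the clean input, not the corrupted one, as its target.

```python
import numpy as np

def corrupt(x, drop_prob, rng):
    """Stochastic corruption: set each input value to zero with probability drop_prob."""
    keep = rng.random(x.shape) >= drop_prob   # True where the value is kept
    return x * keep

def denoising_loss(x_clean, x_reconstructed):
    """Mean squared error against the CLEAN input, not the corrupted copy."""
    return np.mean((x_clean - x_reconstructed) ** 2)

rng = np.random.default_rng(0)
x = np.array([0.9, 0.1, 0.5, 0.7, 0.3, 0.8])   # made-up clean input
x_noisy = corrupt(x, drop_prob=0.5, rng=rng)

# The network would receive x_noisy; its output is then scored against x.
print(x_noisy)
print(denoising_loss(x, x_noisy))   # loss if the network just copied its noisy input
```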
Contractive Autoencoders (CAE)
• The objective of a contractive autoencoder (CAE) is to have
a robust learned representation that is less
sensitive to small variations in the data.
• Robustness of the representation is
achieved by adding a penalty term to the loss
function. The penalty term is the squared Frobenius norm of
the Jacobian matrix of the hidden layer activations,
calculated with respect to the input, that is, the sum of
the squares of all elements of the Jacobian.
• The contractive autoencoder is another regularization
technique, like sparse autoencoders and denoising
autoencoders.
• CAE has been reported to surpass results obtained by regularizing an
autoencoder using weight decay or by denoising, which can make it a better
choice than a denoising autoencoder for learning useful features.
• The penalty term generates mappings that strongly contract the data,
hence the name contractive autoencoder.
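For a sigmoid encoder h = sigmoid(Wx + b), the Jacobian entries are J[j, i] = h[j] (1 - h[j]) W[j, i], so the penalty can be computed in closed form. The sketch below assumes that encoder form (the slides do not fix the activation) and checks the closed form against a numerical Jacobian; all names and values are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def contractive_penalty(W, b, x):
    """Squared Frobenius norm of dh/dx for h = sigmoid(W x + b)."""
    h = sigmoid(W @ x + b)
    dh = h * (1 - h)                                   # sigmoid derivative per hidden unit
    return np.sum(dh ** 2 * np.sum(W ** 2, axis=1))

def numerical_jacobian(W, b, x, eps=1e-6):
    """Central-difference Jacobian, used here only as a sanity check."""
    J = np.zeros((W.shape[0], x.size))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        J[:, i] = (sigmoid(W @ (x + step) + b) - sigmoid(W @ (x - step) + b)) / (2 * eps)
    return J

rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 5)), rng.normal(size=3)
x = rng.normal(size=5)

penalty = contractive_penalty(W, b, x)
check = np.sum(numerical_jacobian(W, b, x) ** 2)       # sum of squares of all elements
print(penalty, check)
```

In training, this penalty would be multiplied by a weighting coefficient and added to the reconstruction loss.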
Stacked Denoising Autoencoders
• A stacked autoencoder is a neural network with multiple
layers of sparse autoencoders.
• Adding more than one hidden
layer to an autoencoder helps to reduce
high-dimensional data to a smaller code representing
important features.
• Each hidden layer is a more compact representation than
the previous hidden layer.
• We can also corrupt the input with noise and then pass the data
through the stacked autoencoders; these are called stacked
denoising autoencoders.
• In stacked denoising autoencoders, input corruption is
used only for the initial denoising. This helps to learn important
features present in the data. Once the mapping function
f(θ) has been learned, further layers use
uncorrupted input from the previous layers.
• After training a stack of encoders as explained above, we
can use the output of the stacked denoising
autoencoders as input to a standalone supervised
machine learning model, such as a support vector machine or
multiclass logistic regression.
Deep Autoencoders
• Deep autoencoders consist of two identical deep belief networks: one
network for encoding and another for decoding.
• Typically, deep autoencoders have 4 to 5 layers for encoding and the next
4 to 5 layers for decoding. We use unsupervised layer-by-layer pre-training.
• The Restricted Boltzmann Machine (RBM) is the basic building block of the
deep belief network.
• In the figure, we take an image with 784 pixels, train using a stack of 4
RBMs, unroll them and then fine-tune with backpropagation.
• The final encoding layer is compact and fast.

Implementation of Autoencoder : Basic Autoencoder

• Load the dataset


• To start, you will train the basic autoencoder using the Fashion MNIST
dataset. Each image in this dataset is 28x28 pixels.
• Define an autoencoder with two Dense layers: an encoder, which
compresses the images into a 64-dimensional latent vector, and
a decoder, which reconstructs the original image from the latent space.
• Train the model using x_train as both the input and the target.
The encoder will learn to compress the dataset from 784 dimensions to
the latent space, and the decoder will learn to reconstruct the original
images.

• Now that the model is trained, let's test it by encoding and decoding
images from the test set.
Example: Image denoising
• An autoencoder can also be trained to remove noise from images. In the
following section, you will create a noisy version of the Fashion MNIST
dataset by applying random noise to each image. You will then train an
autoencoder using the noisy image as input, and the original image as the
target.
• Let's reimport the dataset to omit the modifications made earlier.
• Adding random noise to the images.
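The noise step might look like the NumPy sketch below. The random array stands in for the normalized Fashion MNIST images so that the snippet is self-contained, and the Gaussian noise, the `noise_factor` of 0.2 and the variable names are assumptions rather than values taken from these slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for normalized images in [0, 1]; the real data would be 28x28 Fashion MNIST.
x_train = rng.random((8, 28, 28)).astype("float32")

noise_factor = 0.2  # assumed noise strength
x_train_noisy = x_train + noise_factor * rng.normal(size=x_train.shape)

# Clip so that the noisy pixels stay in the valid [0, 1] range.
x_train_noisy = np.clip(x_train_noisy, 0.0, 1.0).astype("float32")

print(x_train_noisy.shape, x_train_noisy.min(), x_train_noisy.max())
```

The autoencoder would then be trained with `x_train_noisy` as input and `x_train` as target.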

• Plot the noisy images.


Define a convolutional autoencoder
• In this example, you will train a convolutional autoencoder
using Conv2D layers in the encoder, and Conv2DTranspose layers in
the decoder.
• The decoder upsamples the images back from 7x7 to 28x28.
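The 28x28 to 7x7 and back to 28x28 path can be traced with simple size arithmetic. The sketch assumes two stride-2 layers with "same" padding on each side, which is one configuration consistent with the sizes quoted above; the helper names are made up for this illustration.

```python
import math

def conv_same_out(size, stride):
    """Spatial output size of a strided convolution with 'same' padding."""
    return math.ceil(size / stride)

def conv_transpose_same_out(size, stride):
    """Spatial output size of a strided transposed convolution with 'same' padding."""
    return size * stride

size = 28
encoder_sizes = [size]
for stride in (2, 2):                 # two assumed stride-2 Conv2D layers
    size = conv_same_out(size, stride)
    encoder_sizes.append(size)

decoder_sizes = [size]
for stride in (2, 2):                 # two assumed stride-2 Conv2DTranspose layers
    size = conv_transpose_same_out(size, stride)
    decoder_sizes.append(size)

print(encoder_sizes, decoder_sizes)
```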

• Plotting both the noisy images and the denoised images produced by the
autoencoder.
Example: Anomaly detection
• In this example, you will train an autoencoder to detect anomalies in
the ECG5000 dataset. This dataset contains 5,000 electrocardiograms,
each with 140 data points. You will use a simplified version of the dataset,
where each example has been labeled either 0 (corresponding to an
abnormal rhythm) or 1 (corresponding to a normal rhythm). You are
interested in identifying the abnormal rhythms.
• How will you detect anomalies using an autoencoder? Recall that an
autoencoder is trained to minimize reconstruction error. You will train an
autoencoder on the normal rhythms only, then use it to reconstruct all the
data. Our hypothesis is that the abnormal rhythms will have higher
reconstruction error. You will then classify a rhythm as an anomaly if the
reconstruction error surpasses a fixed threshold.
• Load ECG data
Build the model
• You will soon classify an ECG as anomalous if its reconstruction error is
more than one standard deviation above the mean error on the normal
training examples. First, let's plot a normal ECG from the training set, the
reconstruction after it's encoded and decoded by the autoencoder, and the
reconstruction error.
• Detect anomalies
• Detect anomalies by checking whether the reconstruction loss is greater
than a fixed threshold. In this tutorial, you will calculate the mean absolute
error for normal examples from the training set, then classify future
examples as anomalous if their reconstruction error is more than one
standard deviation above the training-set mean.
• Plot the reconstruction error on normal ECGs from the training set
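The thresholding rule can be sketched as follows: compute the mean absolute error per example, set the threshold to the mean plus one standard deviation of the errors on normal training data, and flag anything above it. The loss values below are made up, and the helper names are hypothetical; a real run would obtain the reconstructions from the trained autoencoder.

```python
import numpy as np

def mae_per_example(x, x_hat):
    """Mean absolute reconstruction error for each row (one row per ECG)."""
    return np.mean(np.abs(x - x_hat), axis=1)

def fit_threshold(train_loss):
    """Threshold = mean + one standard deviation of the normal training losses."""
    return np.mean(train_loss) + np.std(train_loss)

def is_anomaly(loss, threshold):
    return loss > threshold

# Made-up reconstruction losses for normal training ECGs.
train_loss = np.array([0.020, 0.025, 0.018, 0.030, 0.022])
threshold = fit_threshold(train_loss)

# Made-up losses for two new ECGs: one typical, one poorly reconstructed.
test_loss = np.array([0.021, 0.090])
flags = is_anomaly(test_loss, threshold)

print(threshold, flags)
```

A one-standard-deviation threshold is a simple starting point; moving it up or down trades false alarms against missed anomalies.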
Thanks
