
Capsule Networks

Capsule Networks, proposed by Dr. Geoffrey Hinton and colleagues in the paper "Dynamic Routing Between Capsules" [1], are a new kind of neural network architecture with several advantages over conventional neural network architectures.

Unlike conventional architectures such as Convolutional Neural Networks (CNNs) and Artificial Neural Networks (ANNs), capsule networks have the following unique and useful advantages:

A. The probability of a feature's presence is represented in a pose-invariant way. The internal representation of a feature in a capsule network is a vector: the length of the vector represents the probability that the feature is present, and its orientation encodes the feature's pose. Hence, if the feature moves or rotates, the vector's orientation changes accordingly, but the probability of its presence remains unchanged.
B. They can learn the spatial relationships between the objects or features in an image: every decision is made after a dynamic routing algorithm that examines the internal arrangement of features. For example, a face is classified as a face only if there is a nose between two eyes and lips below the nose. In a CNN, by contrast, max-pooling can detect the presence of a feature but cannot learn its relationships with other features.
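The vector representation in point A relies on the "squash" nonlinearity from [1], which scales a capsule's raw output so that its length lies in (0, 1) and can be read as a probability, while its direction is preserved. A minimal NumPy sketch (illustrative only, not the authors' code; `eps` is added here for numerical safety and is not in the paper's formula):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # squash(s) = (||s||^2 / (1 + ||s||^2)) * (s / ||s||)
    # Length maps to (0, 1) -> probability of presence; direction is unchanged.
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm) / np.sqrt(sq_norm + eps)
    return scale * s

# A raw capsule output of length 5 squashes to length 25/26 ~ 0.96,
# with the same direction [0.6, 0.8].
v = squash(np.array([3.0, 4.0]))
```

A long input vector thus saturates toward length 1 ("feature almost certainly present"), while a short one shrinks toward 0.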

Architecture Overview:

Fig 1. Capsnet architecture proposed in [1]

Figure 1 shows the architecture of the original capsule network (CapsNet) proposed in [1], which achieved state-of-the-art performance on the MNIST dataset. The 28×28 input image is first passed through a convolutional layer with a 9×9 kernel and 256 filters. The resulting feature map is then convolved with a 9×9 kernel, a stride of 2, and 256 filters; these convolutions extract higher-level information from the input image. The final 6×6×256 feature map is reshaped into 32 primary capsule channels, each a 6×6 grid of 8-dimensional capsules. Finally, the output capsules, depicted as DigitCaps, are computed by passing the PrimaryCaps outputs through a dynamic routing algorithm. The length of the activity vector of each capsule in the DigitCaps layer indicates the presence of an instance of each class and is used to calculate the classification loss.
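The routing-by-agreement step between PrimaryCaps and DigitCaps can be sketched in NumPy as follows. The capsule counts mirror the MNIST CapsNet (6×6×32 = 1152 primary capsules routed to 10 DigitCaps of dimension 16), but this is an illustrative sketch under those assumptions, not the reference implementation, and `u_hat` here is random rather than produced by the learned transformation matrices:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    return sq_norm / (1.0 + sq_norm) / np.sqrt(sq_norm + eps) * s

def dynamic_routing(u_hat, iters=3):
    # u_hat: prediction vectors from lower capsules, shape (n_lower, n_upper, dim)
    n_lower, n_upper, _ = u_hat.shape
    b = np.zeros((n_lower, n_upper))                 # routing logits
    for _ in range(iters):
        # coupling coefficients: softmax of b over the upper capsules
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)
        s = (c[..., None] * u_hat).sum(axis=0)       # weighted sum -> (n_upper, dim)
        v = squash(s)                                # upper-capsule outputs
        b = b + np.einsum('ijd,jd->ij', u_hat, v)    # raise logits where predictions agree
    return v

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(1152, 10, 16))  # hypothetical predictions, 1152 -> 10 capsules
digit_caps = dynamic_routing(u_hat)      # shape (10, 16), each length in (0, 1)
```

Lower capsules whose predictions agree with an upper capsule's output get their coupling to that capsule strengthened on each iteration, which is what lets the network reason about the arrangement of parts.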

Fig 2. Decoder Proposed in [1]


To ensure that the encoder in the capsule network from Figure 1 encodes meaningful information in the DigitCaps, a decoder is trained to reconstruct the input image with all capsules corresponding to incorrect classes masked out; see Figure 2.
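The masking step feeding the decoder can be sketched as follows, assuming DigitCaps outputs of shape (batch, classes, dim); `mask_digitcaps` is a hypothetical helper name, not from [1]:

```python
import numpy as np

def mask_digitcaps(v, labels):
    # v: DigitCaps activity vectors, shape (batch, n_classes, dim)
    # labels: integer class labels, shape (batch,)
    # Zero out every capsule except the one for the true class, then
    # flatten to the vector that the decoder reconstructs the image from.
    mask = np.eye(v.shape[1])[labels]            # one-hot, (batch, n_classes)
    return (v * mask[..., None]).reshape(v.shape[0], -1)

caps = np.ones((2, 10, 16))                      # toy batch of DigitCaps outputs
decoder_input = mask_digitcaps(caps, np.array([3, 7]))  # (2, 160), one capsule kept each
```

Because only the correct class's 16-dimensional vector survives, the reconstruction loss pushes that vector to encode the instantiation details (stroke thickness, skew, etc.) of the input digit.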

Future Work:
In our paper "EmotionCaps - Facial Emotion Recognition Using Capsules" [2], we apply capsule networks to the task of facial emotion recognition. Owing to the aforementioned advantages of capsule networks over conventional methods, we were able to achieve state-of-the-art accuracy in modeling facial expressions and the emotions they convey. Capsule networks can therefore also be highly useful for classifying hyperspectral images, because they can detect the presence of a feature independently of its pose and learn the spatial relationships between the features of an image to classify it correctly.

[1] https://arxiv.org/abs/1710.09829
[2] https://link.springer.com/chapter/10.1007/978-3-030-63820-7_45
