Capsule Networks: Architecture Overview
Architecture Overview:
Figure 1 shows the architecture of the original capsule network (CapsNet) proposed by Sabour, Frosst, and Hinton [1], which achieved state-of-the-art performance on the MNIST dataset. The 28×28 input image is first passed through a convolutional layer with a 9×9 kernel and 256 filters. The resulting feature map is then convolved with a second 9×9 kernel, this time with a stride of 2 and 256 filters. These convolutions extract higher-level features from the input image. The final 6×6×256 feature map is reshaped into 32 channels of primary capsules, each channel a 6×6 grid of 8-dimensional capsules. Finally, the output capsules, depicted as DigitCaps, are computed by passing the PrimaryCaps outputs through the dynamic routing algorithm. The length of the activity vector of each capsule in the DigitCaps layer indicates the presence of an instance of the corresponding class and is used to compute the classification loss.
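The layer shapes, the squash nonlinearity, and the routing-by-agreement loop described above can be sketched in numpy. This is a minimal illustration, not the trained network: the random `u_hat` array stands in for the prediction vectors that, in the real model, come from learned transformation matrices applied to the primary capsule outputs.

```python
import numpy as np

def conv_out(size, kernel, stride):
    # Spatial size after a "valid" (unpadded) convolution.
    return (size - kernel) // stride + 1

# Shape walkthrough with the hyperparameters from [1]:
s1 = conv_out(28, 9, 1)        # Conv1: 28x28 -> 20x20 (256 channels)
s2 = conv_out(s1, 9, 2)        # PrimaryCaps conv: 20x20 -> 6x6 (256 channels)
n_primary = s2 * s2 * 32       # 32 channels of 8-D capsules -> 1152 capsules

def squash(v, axis=-1, eps=1e-8):
    # Nonlinearity from [1]: shrinks a vector's norm into [0, 1)
    # while preserving its direction.
    sq = np.sum(v * v, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * v / np.sqrt(sq + eps)

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_routing(u_hat, n_iters=3):
    # u_hat: (n_in, n_out, d_out) "votes" each primary capsule casts
    # for each output (DigitCaps) capsule.
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                      # routing logits
    for _ in range(n_iters):
        c = softmax(b, axis=1)                       # coupling coefficients
        s = np.einsum("ij,ijd->jd", c, u_hat)        # weighted vote sum
        v = squash(s)                                # candidate output capsules
        b = b + np.einsum("ijd,jd->ij", u_hat, v)    # reward agreement
    return v

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(n_primary, 10, 16))   # 1152 votes for 10 digit capsules
digit_caps = dynamic_routing(u_hat)            # (10, 16)
lengths = np.linalg.norm(digit_caps, axis=-1)  # per-class "presence" scores, < 1
```

Because squash bounds every capsule's norm below 1, the lengths of the DigitCaps vectors can be read directly as class-presence scores.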
Future Work:
In our paper “EmotionCaps - Facial Emotion Recognition Using Capsules” [2], we exploit capsule networks for the task of facial emotion recognition. Owing to the aforementioned advantages of capsule networks over conventional methods, we were able to achieve state-of-the-art accuracy in modeling facial expressions and the emotions corresponding to them. Capsule networks can therefore also be highly useful for classifying hyperspectral images, because they can learn the presence of a feature in a rotationally invariant manner and learn the spatial relationships between the features of an image, allowing the image to be classified correctly.
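Whatever the target classes are, digits, emotions, or hyperspectral categories, training proceeds as in [1]: a margin loss is computed on the lengths of the output capsules' activity vectors. A minimal numpy sketch, using the constants m+ = 0.9, m− = 0.1, and λ = 0.5 from [1]:

```python
import numpy as np

def margin_loss(lengths, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    # lengths: (batch, n_classes) activity-vector norms of the output capsules.
    # targets: (batch, n_classes) one-hot labels.
    present = targets * np.maximum(0.0, m_pos - lengths) ** 2
    absent = lam * (1.0 - targets) * np.maximum(0.0, lengths - m_neg) ** 2
    return (present + absent).sum(axis=1).mean()

targets = np.eye(10)[[3]]                  # one sample, true class 3
good = np.where(targets == 1, 0.95, 0.05)  # confident, correct lengths
bad = np.where(targets == 1, 0.05, 0.95)   # confident, wrong lengths
print(margin_loss(good, targets), margin_loss(bad, targets))  # 0.0 vs. large
```

The per-class form of this loss is what lets the network signal several classes at once, which [1] exploits for overlapping digits.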
References:
[1] S. Sabour, N. Frosst, and G. E. Hinton, “Dynamic Routing Between Capsules,” https://arxiv.org/abs/1710.09829
[2] “EmotionCaps - Facial Emotion Recognition Using Capsules,” https://link.springer.com/chapter/10.1007/978-3-030-63820-7_45