
https://www.youtube.com/watch?v=62QUw7AZJcg&t=624s Part 1

https://www.youtube.com/watch?v=nGilTQza62g Part 2

https://www.youtube.com/watch?v=PQEKZDYyxFc Part 3

https://www.youtube.com/watch?v=oHJB9gIWBIk&list=PLxlkzujLkmQ8Ov11Xm-K913C7uNSfr5Rp&index=35 Part 4

https://www.youtube.com/watch?v=VeCKz4E6SmY&list=PLxlkzujLkmQ8Ov11Xm-K913C7uNSfr5Rp&index=36 1×1 Convolution

https://databasecamp.de/en/ml/resnet-en

ResNet: Residual Neural Networks – easily explained!
• What is the problem with deep neural networks?
• How do residual neural networks solve the problem?
• What problems can arise with ResNets?
• How to build a ResNet block in TensorFlow?
• How does the training of a ResNet model work?
• What are the advantages and disadvantages of using ResNets?

Residual Neural Networks (ResNet) are special types of neural networks used in image
processing. They are characterized by their deep architectures, which can still produce low error
rates.

What architecture has been used in image recognition so far?

After the great success of a Convolutional Neural Network (CNN) at the ImageNet competition in 2012, CNNs became the dominant architecture in computer vision. The approach is loosely modeled on how our visual system works: when we see an image, we automatically split it into many small sub-images and analyze them individually. By reassembling these sub-images, we process and interpret the whole picture. How can this principle be implemented in a Convolutional Neural Network?

The core of the work happens in the so-called Convolution Layer. Here we define a filter, which determines how large the sub-images we look at should be, and a stride, which decides how many pixels the filter moves between two calculations, i.e. how close the sub-images are to each other. This step already reduces the dimensionality of the image considerably.

The next step is the Pooling Layer. From a purely computational point of view, the same sliding-window operation takes place here as in the Convolution Layer, with the difference that only the average or the maximum value of each window is kept, depending on the application. This condenses the small features that are crucial for solving the task into a few pixels.

Finally, there is a Fully-Connected Layer in the Convolutional Neural Network, as we already know it from regular neural networks. Now that the dimensions of the image have been greatly reduced, we can use these densely connected layers. Here, the individual sub-images are linked together again in order to recognize the connections between them and carry out the classification.
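A minimal sketch of these three building blocks in tf.keras (the filter size, stride, pooling window, and layer widths are illustrative assumptions, not values prescribed by the text):

import tensorflow as tf

model = tf.keras.Sequential([
    # Convolution Layer: 3x3 filters slide over the image with a stride of 1.
    tf.keras.layers.Conv2D(32, kernel_size=3, strides=1, activation="relu",
                           input_shape=(224, 224, 3)),
    # Pooling Layer: keeps only the maximum value of each 2x2 window.
    tf.keras.layers.MaxPooling2D(pool_size=2),
    # Fully-Connected Layer: the reduced feature maps are flattened and classified.
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])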

What is the problem with deep neural networks?

In order to achieve better results, the architectures used became deeper and deeper: several CNN blocks were simply stacked on top of each other in the hope of improving performance. However, with deep neural networks the problem of the so-called vanishing gradient arises.

Error rate with 20 and 56 layers | Source: Deep Residual Learning for Image Recognition

The training of a network happens during so-called backpropagation. In short, the error travels through the network from back to front. In each layer, the gradient is calculated to determine how much the respective neuron contributed to the error. However, the closer this process gets to the initial layers, the smaller the gradient can become, so that the weights of the neurons in the front layers are adjusted only slightly or not at all. As a result, deep network structures often have a comparatively high error.

In practice, however, we cannot simply blame the decreasing performance on the vanishing gradient problem alone. In fact, that problem can be handled relatively well with so-called batch normalization layers. The worse performance of deeper neural networks can also be due to the initialization of the layers or to the optimization function.
How do residual neural networks solve the problem?
The basic building blocks of a residual neural network are the so-called residual blocks. The basic idea is that so-called "skip connections" are built into the network. These ensure that the activation of an earlier layer is added to the output of a later layer.

Residual Block | Source: Deep Residual Learning for Image Recognition

This architecture allows the network to simply skip certain layers, especially if they do not
contribute anything to a better result. A residual neural network is composed of several of these
so-called residual blocks.
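In the notation of the original paper, a residual block therefore computes

y = F(x, {W_i}) + x

where x is the input to the block, F(x, {W_i}) is the residual mapping learned by the layers that can be skipped, and y is the output passed on to the next block.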

The benefit of including this kind of skip connection is that regularization can effectively skip any layer that degrades the performance of the architecture. As a result, training an extremely deep neural network is possible without running into issues with vanishing or exploding gradients.

What problems can arise with ResNets?

Especially with Convolutional Neural Networks, it naturally happens that the dimensionality at
the beginning of the skip connection does not match that at the end of the skip connection. This
is especially the case if several layers are skipped. In Convolutional Neural Networks, the
dimensionality is changed in each block with the help of a filter. Thus, the skip connection cannot simply add the inputs of earlier layers to the outputs of later layers.

To solve this problem, the residual can be multiplied by a linear projection to align the
dimensions. In many cases, for example, a 1×1 convolutional layer is used for this purpose.
However, it can also happen that an alignment of dimensions is not necessary at all.
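A minimal sketch of such a projection in tf.keras, assuming the skip connection has to bridge a change from 64 to 128 channels with a stride of 2 (the concrete shapes and filter counts are illustrative):

import tensorflow as tf

inputs = tf.keras.Input(shape=(56, 56, 64))

# Main path: a convolution that halves the spatial size and doubles the channels.
main = tf.keras.layers.Conv2D(128, kernel_size=3, strides=2, padding="same",
                              activation="relu")(inputs)

# Skip path: a 1x1 convolution acts as a linear projection that brings the
# input to the same shape as the main path, so the two can be added.
skip = tf.keras.layers.Conv2D(128, kernel_size=1, strides=2, padding="same")(inputs)

outputs = tf.keras.layers.Add()([main, skip])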

How to build a ResNet block in TensorFlow?

A ResNet block is relatively easy to program in TensorFlow, especially if you ensure that the
dimensions are the same when merging.
In the sketch below, the input first passes through a dense layer with 1024 neurons. This is followed by a block consisting of a dropout layer and two dense layers, which first reduces the number of neurons to 512 before increasing it again to 1024. Then the merging takes place in the add layer. Since both inputs have a dimensionality of 1024, they can be added up without any problems.
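This is a sketch of the block just described, built with the tf.keras functional API; the input size and dropout rate are assumptions, since the text does not specify them:

import tensorflow as tf

inputs = tf.keras.Input(shape=(784,))

# First dense layer with 1024 neurons; its output also serves as the skip connection.
x = tf.keras.layers.Dense(1024, activation="relu")(inputs)

# Block: dropout, then 512 neurons, then back up to 1024 neurons.
block = tf.keras.layers.Dropout(0.3)(x)  # dropout rate assumed
block = tf.keras.layers.Dense(512, activation="relu")(block)
block = tf.keras.layers.Dense(1024, activation="relu")(block)

# Both tensors have 1024 dimensions, so the Add layer can merge them directly.
outputs = tf.keras.layers.Add()([x, block])
model = tf.keras.Model(inputs, outputs)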

How does the training of a ResNet model work?

Training a ResNet (Residual Neural Network) follows the standard process of training deep
neural networks. However, the unique aspect of ResNets is the use of skip connections that allow
the direct flow of gradients, which facilitates the training of very deep neural networks.

In the training process, the training data is fed into the ResNet and the loss, i.e. the error between the predicted output and the actual output, is calculated. This loss is then propagated back through the network to update the weights so that the loss is minimized.

During the training process, the skip connections in ResNets ensure that gradient information can flow through the network, which facilitates the training of deep networks. This is because the gradient information can bypass layers whose gradients have become very small, which would otherwise lead to the vanishing gradient problem in deep neural networks.

In addition to standard training techniques such as stochastic gradient descent (SGD) and backpropagation, ResNets can also be trained with techniques such as weight decay, dropout, and batch normalization to improve performance and prevent overfitting.

Overall, the training process for a ResNet includes feeding the training data into the network,
calculating the loss or error, and updating the weights through backpropagation. The use of skip
connections facilitates the training of very deep neural networks, resulting in better accuracy and
faster convergence.
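A minimal, self-contained training sketch in tf.keras; the tiny residual model, the random stand-in data, and all hyperparameters below are assumptions chosen only to illustrate the loop of forward pass, loss calculation, backpropagation, and weight update:

import numpy as np
import tensorflow as tf

# Tiny stand-in ResNet-style model: one dense residual block plus a classifier head.
inputs = tf.keras.Input(shape=(784,))
x = tf.keras.layers.Dense(128, activation="relu")(inputs)
block = tf.keras.layers.Dense(128, activation="relu")(x)
x = tf.keras.layers.Add()([x, block])  # skip connection
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

# Random data in place of a real dataset.
x_train = np.random.rand(512, 784).astype("float32")
y_train = np.random.randint(0, 10, size=(512,))

# compile() defines loss and optimizer; fit() runs the forward pass, loss
# calculation, backpropagation, and weight updates for each mini-batch.
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=64, epochs=2)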

What are the advantages and disadvantages of using ResNets?

Advantages:

• Improved accuracy: ResNets have been shown to perform better than traditional deep neural networks on various benchmark datasets, achieving state-of-the-art performance.
• Faster convergence: ResNets allow for easier training and faster convergence due to the presence of skip connections that allow the direct flow of gradients.
• Better generalization: ResNets have been shown to generalize better than traditional deep neural networks, which is essential for real-world applications where the data distribution may change over time.
• Transfer learning: These models can be used effectively for transfer learning by fine-tuning on a smaller dataset, making them useful for practical applications where the availability of labeled data is limited.

Disadvantages:

• Increased complexity: The presence of skip connections makes ResNets more complex than traditional deep neural networks, which can lead to higher computational demands and memory requirements.
• Overfitting: ResNets can be prone to overfitting on small datasets, and care must be taken to use appropriate regularization techniques to avoid this.
• Interpretability: The complex nature of ResNets can make it difficult to interpret their internal workings and understand how they make decisions, which can be a disadvantage in certain applications.

What are Skip Connections in ResNet?


Skip connections work in two ways.

Firstly, they alleviate the issue of vanishing gradient by setting up an alternate shortcut for the gradient
to pass through. In addition, they enable the model to learn an identity function. This ensures that the
higher layers of the model do not perform any worse than the lower layers. In short, the residual blocks
make it considerably easier for the layers to learn identity functions. As a result, ResNet improves the
efficiency of deep neural networks with more neural layers while minimizing the percentage of errors. In
other words, the skip connections add the outputs from previous layers to the outputs of stacked layers,
making it possible to train much deeper networks than previously possible.
Read more at: https://viso.ai/deep-learning/resnet-residual-neural-network/

Conclusion
• A robust backbone model called ResNet is utilised often in various computer vision tasks.
• ResNet employs skip connections to transfer output from one layer to a later one. This aids in reducing the issue of vanishing gradients.
https://builtin.com/artificial-intelligence/resnet-architecture

What Is a Residual Network (ResNet)?


ResNet is an artificial neural network that introduced a so-called "identity shortcut connection," which allows the model to skip one or more layers. This approach makes it possible to train networks with thousands of layers without degrading performance. It has become one of the most popular architectures for various computer vision tasks.

What Is ResNet?

According to the universal approximation theorem, given enough capacity, a feedforward network with a single hidden layer is sufficient to represent any function. However, that layer might be massive, and the network is prone to overfitting the data. Therefore, the common trend in the research community has been to make network architectures deeper and deeper.

Since AlexNet, state-of-the-art convolutional neural network (CNN) architectures have been getting deeper and deeper. While AlexNet had only five convolutional layers, the VGG network and GoogleNet (also codenamed Inception_v1) had 19 and 22 layers, respectively.

However, you can't simply stack layers together to increase network depth. Deep networks are hard to train because of the notorious vanishing gradient problem: as the gradient is backpropagated to earlier layers, repeated multiplication may make it vanishingly small. As a result, as the network goes deeper, its performance becomes saturated or even starts to degrade rapidly.
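Written out, for a stack of layers x_i = f_i(x_{i-1}) the chain rule gives

∂L/∂x_0 = ∂L/∂x_N · ∂x_N/∂x_{N-1} · ... · ∂x_1/∂x_0

so if most of the per-layer factors have magnitude below one, the gradient reaching the earliest layers shrinks roughly exponentially with depth.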

Before ResNet, there had been several ways to deal with the vanishing gradient issue. For
instance, GoogleNet adds an auxiliary loss in a middle layer for extra supervision, but none of
those solutions seemed to really tackle the problem once and for all.

The core idea of ResNet is that it introduced a so-called “identity shortcut connection” that skips
one or more layers.

The authors of the study on deep residual learning for image recognition argue that stacking
layers shouldn’t degrade the network performance because we could simply stack identity
mappings — a layer that doesn’t do anything — on top of the current network, and the resulting
architecture would perform the same. This indicates that the deeper model should not produce a
training error higher than its shallower counterparts. They hypothesize that letting the stacked
layers fit a residual mapping is easier than letting them directly fit the desired underlying mapping. The residual block shown above explicitly allows the network to do precisely that.
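Written out: if H(x) is the desired underlying mapping, the stacked layers inside the block are instead asked to fit the residual

F(x) = H(x) - x

and the block outputs F(x) + x. If the optimal mapping is close to the identity, the layers only have to push F(x) toward zero, which is easier than learning H(x) = x from scratch.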

As a matter of fact, ResNet was not the first to make use of shortcut connections. The authors of an earlier study on Highway Networks had already introduced gated shortcut connections. These parameterized gates control how much information is allowed to flow across the shortcut. A similar idea can be found in the long short-term memory (LSTM) cell, in which a parameterized forget gate controls how much information flows to the next time step. ResNet can therefore be thought of as a special case of a highway network.
However, experiments show that the highway network performs no better than ResNet, which is surprising because the solution space of the highway network contains ResNet, so it should perform at least as well as ResNet. This suggests that it is more important to keep these "gradient highways" clear than to go for a larger solution space.

Following this intuition, the authors of the deep residual learning paper refined the residual block and, in a follow-up study on identity mappings in deep ResNets, proposed a pre-activation variant of the residual block, in which the gradients can flow unimpeded through the shortcut connections to any earlier layer. In fact, using the original residual block from the image recognition study, training a 1202-layer ResNet resulted in worse performance than its 110-layer counterpart.
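A sketch of this pre-activation ordering (batch normalization and ReLU before each convolution) in tf.keras; the filter count and input shape are illustrative assumptions:

import tensorflow as tf

def preactivation_block(x, filters=64):
    # Pre-activation ordering: BN -> ReLU -> Conv, applied twice, with an
    # untouched identity shortcut around the whole block.
    shortcut = x
    y = tf.keras.layers.BatchNormalization()(x)
    y = tf.keras.layers.ReLU()(y)
    y = tf.keras.layers.Conv2D(filters, 3, padding="same")(y)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.ReLU()(y)
    y = tf.keras.layers.Conv2D(filters, 3, padding="same")(y)
    # Nothing is applied after the addition, so the gradient can flow through
    # the shortcut unimpeded.
    return tf.keras.layers.Add()([shortcut, y])

inputs = tf.keras.Input(shape=(32, 32, 64))
outputs = preactivation_block(inputs)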

The authors of identity mappings in deep ResNets demonstrated with experiments that they can
now train a 1001-layer deep ResNet to outperform its shallower counterparts. Because of its
compelling results, ResNet quickly became one of the most popular architectures for various
computer vision tasks.

ResNet Architecture Variants and Interpretations

As ResNet has gained popularity in the research community, its architecture has been studied heavily. In this section, I will first introduce several new architectures based on ResNet, then introduce a paper that interprets ResNet as an ensemble of many smaller networks.

ResNeXt

The authors in a study on aggregated residual transformations for deep neural networks proposed
a variant of ResNet that is codenamed ResNeXt.

It is very similar to the Inception module that the authors of the study on going deeper with convolutions came up with in 2015. Both follow the split-transform-merge paradigm, except that in this variant the outputs of the different paths are merged by adding them together, while in the 2015 study they are depth-concatenated. Another difference is that in the study on going deeper with convolutions each path is different from the others (1x1, 3x3 and 5x5 convolutions), while in this architecture all paths share the same topology.

The authors introduced a hyper-parameter called cardinality — the number of independent paths
— to provide a new way of adjusting the model capacity. Experiments show that accuracy can be
gained more efficiently by increasing the cardinality than by going deeper or wider. The authors
state that compared to inception, this novel architecture is easier to adapt to new data sets and
tasks, as it has a simple paradigm and only one hyper-parameter needs to be adjusted. Inception,
however, has many hyper-parameters (like the kernel size of the convolutional layer of each
path) to tune. This novel building block has three equivalent forms.

In practice, the "split-transform-merge" is usually done via a grouped convolutional layer, which divides its input into groups of feature maps and performs a normal convolution on each group separately. The group outputs are depth-concatenated and then fed to a 1x1 convolutional layer.
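A sketch of this grouped-convolution form of a ResNeXt block in tf.keras, assuming a cardinality of 32, a bottleneck width of 128 channels, and 256 output channels (the numbers are illustrative; the groups argument of Conv2D requires TensorFlow 2.3 or newer):

import tensorflow as tf

def resnext_block(x, bottleneck=128, cardinality=32, out_channels=256):
    shortcut = x
    # 1x1 convolution reduces the channel count (the "split").
    y = tf.keras.layers.Conv2D(bottleneck, 1, activation="relu")(x)
    # Grouped 3x3 convolution: the input is split into `cardinality` groups of
    # feature maps, each convolved independently (the "transform").
    y = tf.keras.layers.Conv2D(bottleneck, 3, padding="same",
                               groups=cardinality, activation="relu")(y)
    # 1x1 convolution restores the channel count, then the shortcut is added
    # (the "merge").
    y = tf.keras.layers.Conv2D(out_channels, 1)(y)
    return tf.keras.layers.Add()([shortcut, y])

inputs = tf.keras.Input(shape=(56, 56, 256))
outputs = resnext_block(inputs)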

Densely Connected CNN

Another team of researchers proposed in 2016 a novel architecture called DenseNet that further exploits the effects of shortcut connections. It connects all layers directly with each other: the input of each layer consists of the feature maps of all earlier layers, and its output is passed to every subsequent layer. The feature maps are aggregated with depth-concatenation.

Other than tackling the vanishing gradient problem, the DenseNet authors argue that this architecture also encourages feature reuse, making the network highly parameter-efficient. One simple interpretation of this is that, in the studies on deep residual learning for image recognition and identity mappings in deep ResNets, the output of the identity mapping was added to the next block, which might impede information flow if the feature maps of two layers have very different distributions. Concatenating the feature maps instead preserves them all and increases the variance of the outputs, encouraging feature reuse.

Following this paradigm, the l-th layer will have k * (l - 1) + k_0 input feature maps, where k_0 is the number of channels in the input image. The authors used a hyper-parameter called the growth rate (k) to prevent the network from growing too wide. They also used a 1x1 convolutional bottleneck layer to reduce the number of feature maps before the expensive 3x3 convolution.
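A sketch of one dense block in tf.keras, assuming a growth rate of k = 12, four layers in the block, and a 1x1 bottleneck of 4k channels (common choices in the DenseNet paper, but illustrative here):

import tensorflow as tf

def dense_block(x, num_layers=4, growth_rate=12):
    for _ in range(num_layers):
        # 1x1 bottleneck reduces the number of feature maps before the 3x3 conv.
        y = tf.keras.layers.Conv2D(4 * growth_rate, 1, activation="relu")(x)
        # Each layer produces only `growth_rate` new feature maps.
        y = tf.keras.layers.Conv2D(growth_rate, 3, padding="same",
                                   activation="relu")(y)
        # Depth-concatenation: the new maps are appended to everything produced
        # so far, so layer l receives k * (l - 1) + k_0 input feature maps.
        x = tf.keras.layers.Concatenate()([x, y])
    return x

inputs = tf.keras.Input(shape=(32, 32, 16))
outputs = dense_block(inputs)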
