Dense Net


https://analyticsindiamag.com/a-complete-understanding-of-dense-layers-in-neural-networks/

A dense layer, also referred to as a fully connected layer, is a layer used in the final stages of a
neural network. It changes the dimensionality of the output from the preceding layer so that the
model can more easily define the relationships between the values of the data on which it is
working.

What is a Dense Layer?

In any neural network, a dense layer is a layer that is deeply connected with its preceding layer,
which means each neuron of the dense layer is connected to every neuron of the preceding layer. It
is the most commonly used layer in artificial neural networks.

Each neuron in a dense layer receives output from every neuron of its preceding layer, and the
dense layer then performs a matrix-vector multiplication on these values. In this multiplication,
the row vector of outputs from the preceding layer is multiplied by the weight matrix of the dense
layer. The general rule of matrix-vector multiplication is that the row vector must have as many
columns as the column vector has rows.

The general formula for the matrix-vector product is:

y = x · A

where x is the (1 × M) row vector of outputs from the preceding layer and A is the (M × N) weight
matrix of the dense layer. The values in the matrix are the trained parameters of the preceding
layers and can be updated by backpropagation. Backpropagation is the most commonly used algorithm
for training feedforward neural networks; it computes the gradient of the loss function with
respect to the weights of the network for a single input-output pair. From the above, we can see
that the output coming from the dense layer is an N-dimensional vector, so the dense layer changes
(and typically reduces) the dimension of the vector. So, basically, a dense layer is used for
changing the dimension of the vector by using every neuron.

As discussed before, the results from every neuron of the preceding layer go to every single neuron
of the dense layer. So we can say that if the preceding layer outputs an (M × N) matrix by
combining the results from every neuron, this output passes through a dense layer whose neuron
count should be N.
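
To make the matrix-vector picture concrete, here is a minimal NumPy sketch of what a single dense layer computes (the shapes and values are illustrative, not taken from the article):

import numpy as np

M, N = 8, 3                  # M outputs from the preceding layer, N neurons in the dense layer
x = np.random.rand(M)        # row vector coming out of the preceding layer
W = np.random.rand(M, N)     # trainable weight matrix of the dense layer
b = np.zeros(N)              # trainable bias vector

y = x @ W + b                # matrix-vector product followed by a bias add
print(y.shape)               # (3,) -> an N-dimensional output vector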

DenseNet Architecture Introduction


https://amaarora.github.io/posts/2020-08-02-densenets.html

Each convolutional layer except the first one (which takes in the input image) takes in the output of the
previous convolutional layer and produces an output feature map that is then passed to the next
convolutional layer. For L layers, there are L direct connections - one between each layer and its
subsequent layer.

The DenseNet architecture is all about modifying this standard CNN architecture like so:

Figure 1: A 5-layer dense block with a growth rate of k = 4. Each layer takes all preceding
feature-maps as input.
In a DenseNet architecture, each layer is connected to every other layer, hence the name Densely
Connected Convolutional Network. For L layers, there are L(L+1)/2 direct connections. For
each layer, the feature maps of all the preceding layers are used as inputs, and its own feature
maps are used as inputs for all subsequent layers.

This is really it. As simple as it may sound, DenseNets essentially connect every layer to every
other layer, and this is the main idea that is extremely powerful. The input of a layer inside a
DenseNet is the concatenation of the feature maps from all previous layers.

From the paper: > DenseNets have several compelling advantages: they alleviate the vanishing-
gradient problem, strengthen feature propagation, encourage feature reuse, and substantially
reduce the number of parameters.

3 But is feature concatenation possible?


Okay, so now we know that the input of the Lth layer is the concatenation of the feature maps from
layers 1, 2, ..., L-1 - but is this concatenation even possible?

At this point, I want you to think about whether we can concatenate the feature maps from the first
layer of a DenseNet with those from the last layer. If we can, why? If we can't, what do
we need to do to make this possible?

This is a good time to take a minute and think about this question.

So, here’s what I think - it would not be possible to concatenate the feature maps if the size of
feature maps is different. So, to be able to perform the concatenation operation, we need to make
sure that the size of the feature maps that we are concatenating is the same. Right?

But we can't just keep the feature maps the same size throughout the network - an essential part
of convolutional networks is the down-sampling layers that change the size of the feature maps.
For example, look at the VGG architecture below:
the input of shape 224x224x3 is downsampled to 7x7x512 towards the end of the network.

To facilitate both down-sampling and feature concatenation, the authors divided the network into
multiple densely connected dense blocks. Inside a dense block, the feature map size remains the
same.

Dividing the network into densely connected blocks solves the problem that we discussed above.

Now, the Convolution + Pooling operations outside the dense blocks can perform the
downsampling operation and inside the dense block we can make sure that the size of the feature
maps is the same to be able to perform feature concatenation.
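
A quick TensorFlow sketch (with made-up shapes) of why the spatial size matters for concatenation:

import tensorflow as tf

a = tf.random.normal((1, 32, 32, 16))    # feature maps from an early layer in a dense block
b = tf.random.normal((1, 32, 32, 24))    # feature maps from a later layer in the same block
both = tf.concat([a, b], axis=-1)        # works: same 32x32 spatial size -> shape (1, 32, 32, 40)

c = tf.random.normal((1, 16, 16, 24))    # feature maps after a down-sampling layer
# tf.concat([a, c], axis=-1)             # would raise an error: 32x32 vs 16x16 spatial sizes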

3.1 Transition Layers

The authors refer to the layers between the dense blocks as transition layers, which perform the
convolution and pooling.

From the paper, we know that the transition layers used in the DenseNet architecture consist of
a batch-norm layer and a 1x1 convolution, followed by a 2x2 average pooling layer.
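
As a rough sketch (in Keras, to match the code used later in this document; the function and argument names are my own), a transition layer following this description could look like:

from tensorflow.keras import layers

def transition_layer(x, out_channels):
    # batch norm -> 1x1 convolution -> 2x2 average pooling, as described above
    x = layers.BatchNormalization()(x)
    x = layers.Conv2D(out_channels, kernel_size=1, use_bias=False)(x)
    x = layers.AveragePooling2D(pool_size=2, strides=2)(x)
    return x
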
4 Dense connectivity
Let's consider a network with L layers, each of which performs a non-linear transformation H_L.
The output of the Lth layer of the network is denoted as x_L, and the input image is represented as
x_0.

We know that traditional feed-forward networks connect the output of the Lth layer to the (L+1)th
layer, and a skip connection (as in ResNets) can be represented as:

x_L = H_L(x_{L-1}) + x_{L-1}    (eq-1)

In the DenseNet architecture, the dense connectivity can be represented as:

x_L = H_L([x_0, x_1, ..., x_{L-1}])    (eq-2)

where [x_0, x_1, ..., x_{L-1}] represents the concatenation of the feature maps produced by layers
0, 1, ..., L-1.

4.1 Inside a single DenseBlock

Now that we understand that DenseNet architecture is divided into multiple dense blocks, let’s
look at a single dense block in a little more detail. Essentially, we know that inside a dense
block, each layer is connected to every other layer and the feature map size remains the same.

Let’s try and understand what’s really going on inside a dense block.

We have some gray input features that are passed to LAYER_0. LAYER_0 performs a
non-linear transformation to add purple features to the gray features. These are then used as
input to LAYER_1, which performs a non-linear transformation to add orange features to the
gray and purple ones. And so on, until the final output for this 3-layer dense block is a
concatenation of the gray, purple, orange, and green features.

So, in a dense block, each layer adds some features on top of the existing feature maps.
Therefore, as you can see, the size of the feature map grows after a pass through each dense layer,
as the new features are concatenated to the existing ones. One can think of the features as a
global state of the network, to which each layer adds K features of its own.

This parameter K is referred to as the growth rate of the network.
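
As a quick worked example (illustrative numbers; the DenseNet paper gives the number of inputs to layer l inside a block as k0 + k*(l-1)):

k0, k = 64, 32                       # k0 channels entering the block, growth rate k
for layer_idx in range(1, 7):        # six layers inside the dense block
    inputs = k0 + k * (layer_idx - 1)
    print(f"layer {layer_idx}: {inputs} input channels, adds {k} new feature maps")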

5 DenseNet Architecture as a collection of DenseBlocks


We already know by now from fig-4 that DenseNets are divided into multiple DenseBlocks.

The various architectures of DenseNets have been summarized in the paper.

Each architecture consists of four DenseBlocks with a varying number of layers. For example,
DenseNet-121 has [6, 12, 24, 16] layers in its four dense blocks, whereas DenseNet-169 has
[6, 12, 32, 32] layers.

We can see that the first part of the DenseNet architecture consists of a 7x7 stride 2 Conv
Layer followed by a 3x3 stride-2 MaxPooling layer. And the fourth dense block is
followed by a Classification Layer that accepts the feature maps of all layers of the network to
perform the classification.

Also, the convolution operations inside each of these architectures use Bottleneck layers.
What this means is that the 1x1 conv reduces the number of channels in the input, and the 3x3 conv
then operates on this reduced-channel version of the input rather than on the original input.

5.1 Bottleneck Layers

By now, we know that each layer produces K feature maps, which are then concatenated to the
previous feature maps. Therefore, the number of inputs is quite high, especially for the later
layers in the network.

This has huge computational requirements and to make it more efficient, the authors decided to
utilize Bottleneck layers. From the paper: > 1×1 convolution can be introduced as bottleneck
layer before each 3×3 convolution to reduce the number of input feature-maps, and thus to
improve computational efficiency. In our experiments, we let each 1×1 convolution produce 4k
feature-maps.

We know that K refers to the growth rate, so what the authors settled on is for the 1x1 conv to
first produce 4*K feature maps and then perform the 3x3 conv on these 4*K feature maps.
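
A minimal Keras sketch of this bottleneck variant of a dense layer (my own function name; it follows the BN-ReLU-1x1 conv producing 4*k maps, then BN-ReLU-3x3 conv producing k maps, pattern described above, and concatenates the result back onto its input):

from tensorflow.keras import layers

def bottleneck_dense_layer(x, growth_rate):
    y = layers.BatchNormalization()(x)
    y = layers.Activation('relu')(y)
    y = layers.Conv2D(4 * growth_rate, kernel_size=1, use_bias=False)(y)              # bottleneck: 4k maps
    y = layers.BatchNormalization()(y)
    y = layers.Activation('relu')(y)
    y = layers.Conv2D(growth_rate, kernel_size=3, padding='same', use_bias=False)(y)  # k new feature maps
    return layers.Concatenate(axis=-1)([x, y])                                        # append to the global state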

6 DenseNet Implementation
A DenseLayer accepts an input, concatenates the incoming feature maps together, and performs the
bn_function on them to get the bottleneck_output. This is done for computational efficiency.
Finally, the convolution operation is performed to get the new_features, which are of size K, the
growth_rate.
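
The names mentioned above (bn_function, bottleneck_output, new_features, growth_rate) match torchvision's DenseNet implementation, so a minimal PyTorch-style sketch of such a DenseLayer (simplified, and my own approximation rather than the post's exact code) might look like this:

import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    def __init__(self, num_input_features, growth_rate, bn_size=4):
        super().__init__()
        # bottleneck: BN -> ReLU -> 1x1 conv producing bn_size * growth_rate maps
        self.norm1 = nn.BatchNorm2d(num_input_features)
        self.relu1 = nn.ReLU(inplace=True)
        self.conv1 = nn.Conv2d(num_input_features, bn_size * growth_rate, kernel_size=1, bias=False)
        # BN -> ReLU -> 3x3 conv producing growth_rate new feature maps
        self.norm2 = nn.BatchNorm2d(bn_size * growth_rate)
        self.relu2 = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(bn_size * growth_rate, growth_rate, kernel_size=3, padding=1, bias=False)

    def bn_function(self, prev_features):
        # concatenate all incoming feature maps, then apply the bottleneck
        concatenated = torch.cat(prev_features, dim=1)
        bottleneck_output = self.conv1(self.relu1(self.norm1(concatenated)))
        return bottleneck_output

    def forward(self, prev_features):          # prev_features: list of tensors from earlier layers
        bottleneck_output = self.bn_function(prev_features)
        new_features = self.conv2(self.relu2(self.norm2(bottleneck_output)))
        return new_features                    # growth_rate feature maps to append to the global state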

It should now be easy to map the above implementation to fig-5, shown below for reference
again:

Let's say the above is an implementation of LAYER_2. First, LAYER_2 accepts the gray, purple, and
orange feature maps and concatenates them. Next, LAYER_2 performs a bottleneck operation
to create the bottleneck_output for computational efficiency. Finally, the layer performs the H_L
operation as in eq-2 to generate the new_features. These new_features are the green features in
fig-5.

Great! So far we have successfully implemented Transition and Dense layers.

https://amaarora.github.io/posts/2020-08-02-densenets.html
DenseNet is a recently proposed convolutional neural network model in which the current layer
connects with all previous layers [23]. The structure has some advantages over existing architectures,
such as mitigating the vanishing-gradient problem, enhancing feature propagation, promoting feature
reuse, and reducing the number of parameters. A deep DenseNet is defined as a series of DenseNets
(called dense blocks) sequentially connected by additional convolution and pooling operations
between consecutive dense blocks. We can create a deep neural network flexible enough to represent
complex transformations with such a structure. An example of a deep DenseNet is shown in Figure 1.

Figure 1. Three-Block DenseNet Architecture.

https://www.mdpi.com/2075-1729/13/2/349

Why do we use dense layers?


A Dense layer feeds all outputs from the previous layer to all its neurons, each providing one
output to the next layer. It's the most basic layer in neural networks.

Are dense layers fully connected?


The dense layer, also called the fully-connected layer, refers to the layer whose inside neurons
connect to every neuron in the preceding layer.

What is the use of a dense layer in CNN?

The dense layer is a simple layer of neurons in which each neuron receives input from all the
neurons of the previous layer, hence the name dense. The dense layer is used to classify images
based on the output from the convolutional layers.

What does Dense do?

Dense is the only actual network layer in that model. A Dense layer feeds all outputs from the
previous layer to all its neurons, each neuron providing one output to the next layer.

Is a dense layer a hidden layer?

The first Dense object is the first hidden layer. The input layer is specified as a parameter to the
first Dense object's constructor.
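
For example, a minimal Keras model (with illustrative layer sizes) in which the first Dense object is the first hidden layer and the input shape is passed to its constructor:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(100,)),  # first hidden layer; input given via input_shape
    tf.keras.layers.Dense(10, activation='softmax'),                   # output layer
])
model.summary()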

https://www.codingninjas.com/studio/library/dense-in-deep-learning

Dense Layer
Firstly, the image data is read by the convolutional layers, and then the image is passed through the
pooling layers. After the data has passed through the pooling and convolutional layers, the output is
transferred to the dense layer. The output from the convolutional layers has a multi-dimensional shape,
and this is the main reason we cannot pass it directly to the dense layer: the dense layer only accepts
a 1-D shape.

Flatten Layer
We usually call the Flatten method to convert multi-dimensional data into a 1-D array. To do this, we
invoke the Flatten() method between the convolutional layers and the dense layers.

Batch Normalization Layer


The batch normalization layer is also used in deep neural networks and has become a standard tool for
many practitioners working on deep learning projects. It is a significant layer in deep learning: it is
used in architectures as a linear block and helps to normalize the network during training [7].
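
Putting the last few pieces together, here is a small illustrative Keras model (the layer sizes are my own) in which the convolution and pooling output is batch-normalized, flattened, and then classified by a dense layer:

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),  # convolutional layer reads the image
    layers.MaxPooling2D((2, 2)),                                            # pooling layer down-samples
    layers.BatchNormalization(),                                            # normalizes activations during training
    layers.Flatten(),                                                       # multi-dimensional output -> 1-D array
    layers.Dense(10, activation='softmax'),                                 # dense layer classifies
])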

Introduction to DenseNets (Dense CNN)


https://www.analyticsvidhya.com/blog/2022/03/introduction-to-densenets-dense-cnn/

What are DenseNets?


So DenseNet stands for Densely Connected Convolutional Networks. It is very similar to a ResNet, with
some fundamental differences: ResNet uses an additive method, meaning it takes a previous output as an
input for a future layer, while DenseNet takes all previous outputs as input for a future layer, as
shown in the image above.
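
The difference can be seen in a couple of lines of Keras code (the shapes here are illustrative):

import tensorflow as tf
from tensorflow.keras import layers

x  = tf.random.normal((1, 32, 32, 64))    # output of an earlier layer
fx = tf.random.normal((1, 32, 32, 64))    # output of the current transformation of x

resnet_style   = layers.Add()([x, fx])           # ResNet: element-wise sum, still 64 channels
densenet_style = layers.Concatenate()([x, fx])   # DenseNet: concatenation, now 128 channels
print(resnet_style.shape, densenet_style.shape)  # (1, 32, 32, 64) (1, 32, 32, 128)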

Why do we need DenseNets?


So DenseNet was specifically developed to counter the drop in accuracy caused by the vanishing gradient
in deep neural networks: because of the long distance between the input and output layers, the
information vanishes before it reaches its destination.
DenseNet Architecture VS ResNet Architecture.

Source: paperswithcode

So suppose we have L layers. In a typical network with L layers, there will be L connections, one
between each pair of consecutive layers. In a DenseNet, however, there will be about L(L+1)/2
connections; for example, a 5-layer dense block has 5·6/2 = 15 direct connections. Because of this
dense connectivity, a DenseNet needs fewer parameters than comparable models, and we can train models
of more than 100 layers very easily using this technique.

DenseBlocks And Layers

Source: Towards Data Science


Here, as we go deeper into the network, this becomes unsustainable: if you go from the 2nd layer to the
3rd layer, the 3rd layer takes as input not only the 2nd layer's output but the outputs of all previous
layers.

So let's say we have about ten layers. Then the 10th layer will take as input all the feature maps from
the preceding nine layers. If each of these layers produces, say, 128 feature maps, there is a
feature-map explosion. To overcome this problem, we create dense blocks, and each dense block contains
a prespecified number of layers inside it.

The output from a particular dense block is given to what is called a transition layer, which is a 1x1
convolution followed by pooling (2x2 average pooling in the paper) to reduce the size of the feature
maps. So the transition layer allows for pooling, which leads to a reduction in the size of your
feature maps.

As the figure shows, we can see two blocks: the first one is the convolution layer and the second is
the pooling layer, and the combination of both forms the transition layer.

So, following are some advantages of DenseNet:

• Parameter efficiency – Every layer adds only a limited number of parameters - for example, only
about 12 kernels are learned per layer.
• Implicit deep supervision – Improved flow of gradients through the network: feature maps in all
layers have direct access to the loss function and its gradient.

So there are a few other terms that the paper talks about, and these are important concepts in
DenseNet:

• Growth rate – This determines the number of feature maps output by each individual layer inside a
dense block.
• Dense connectivity – By dense connectivity, we mean that within a dense block, each layer gets as
input the feature maps from all the previous layers, as seen in the figure.
• Composite functions – The sequence of operations inside a layer goes as follows: batch
normalization, followed by an application of ReLU, and then a convolution layer (one convolution
layer).
• Transition layers – The transition layers aggregate the feature maps from a dense block and reduce
their dimensions; pooling (average pooling in the paper) is applied here.

So far, we have a basic idea of what a DenseNet is and how it works internally. Now let's take a
simple example and understand the code.

DenseNets On CIFAR-10
Here we apply a DenseNet to the CIFAR-10 dataset. The CIFAR-10 dataset consists of 60000 32×32
colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000
test images.
So the 10 classes are as follows:

airplane
automobile
bird
cat
deer
dog
frog
horse
ship
truck

Images credit – https://www.cs.toronto.edu/~kriz/cifar.html

So first we load the dataset using the Keras library, and then we one-hot encode all the classes.

# Load CIFAR10 data
import tensorflow as tf

num_classes = 10
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()
image_height, image_width, channel_size = X_train.shape[1], X_train.shape[2], X_train.shape[3]

# convert the labels to one-hot encoding
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)

X_train.shape, X_test.shape
# ((50000, 32, 32, 3), (10000, 32, 32, 3))

All the data is in array format, so first we normalize it by dividing by 255.0. We use 255.0 because
our data is RGB, and RGB pixel values lie between 0 and 255.

X_train = X_train / 255.0
X_test = X_test / 255.0

Now we build the DenseNet as follows-

from tensorflow.keras import layers

# compression factor applied to the number of filters; its value is not shown in this excerpt, 0.5 is assumed here
compression = 0.5

# Dense block: l layers of BatchNorm -> ReLU -> 3x3 Conv, each concatenated onto the running feature maps
def denseblock(input, num_filter = 12, dropout_rate = 0.2):
    global compression
    temp = input
    for _ in range(l):
        BatchNorm = layers.BatchNormalization()(temp)
        relu = layers.Activation('relu')(BatchNorm)
        Conv2D_3_3 = layers.Conv2D(int(num_filter*compression), (3,3), use_bias=False, padding='same')(relu)
        if dropout_rate>0:
            Conv2D_3_3 = layers.Dropout(dropout_rate)(Conv2D_3_3)
        concat = layers.Concatenate(axis=-1)([temp, Conv2D_3_3])
        temp = concat
    return temp

# Transition block: BatchNorm -> ReLU -> 1x1 Conv -> 2x2 average pooling
def transition(input, num_filter = 12, dropout_rate = 0.2):
    global compression
    BatchNorm = layers.BatchNormalization()(input)
    relu = layers.Activation('relu')(BatchNorm)
    Conv2D_BottleNeck = layers.Conv2D(int(num_filter*compression), (1,1), use_bias=False, padding='same')(relu)
    if dropout_rate>0:
        Conv2D_BottleNeck = layers.Dropout(dropout_rate)(Conv2D_BottleNeck)
    avg = layers.AveragePooling2D(pool_size=(2,2))(Conv2D_BottleNeck)
    return avg

# Output layer: BatchNorm -> ReLU -> average pooling -> flatten -> softmax classifier
def output_layer(input):
    global compression
    BatchNorm = layers.BatchNormalization()(input)
    relu = layers.Activation('relu')(BatchNorm)
    AvgPooling = layers.AveragePooling2D(pool_size=(2,2))(relu)
    flat = layers.Flatten()(AvgPooling)
    output = layers.Dense(num_classes, activation='softmax')(flat)
    return output

Now we create a model with the two DenseNet blocks-

from tensorflow.keras.models import Model

l = 7   # number of layers inside each dense block
input = layers.Input(shape=(image_height, image_width, channel_size,))
First_Conv2D = layers.Conv2D(30, (3,3), use_bias=False, padding='same')(input)
First_Block = denseblock(First_Conv2D, 30, 0.5)
First_Transition = transition(First_Block, 30, 0.5)
Last_Block = denseblock(First_Transition, 30, 0.5)
output = output_layer(Last_Block)
model = Model(inputs=[input], outputs=[output])

Now we compile the model, choosing a loss function and an optimizer-

# determine loss function and optimizer
from tensorflow.keras.optimizers import Adam

model.compile(loss='categorical_crossentropy',
              optimizer=Adam(),
              metrics=['accuracy'])

We also do image augmentation for better accuracy-

from tensorflow.keras.preprocessing.image import ImageDataGenerator

batch_size = 64   # the batch size is not shown in this excerpt; 64 is assumed here

datagen = ImageDataGenerator(height_shift_range=0.1, width_shift_range=0.1,
                             shear_range=0.2, zoom_range=0.2, horizontal_flip=True)
datagen.fit(X_train)
data = datagen.flow(x=X_train, y=y_train, batch_size=batch_size)

Fit the data to the model-

step_size = X_train.shape[0] // batch_size
history = model.fit(data, steps_per_epoch=step_size, epochs=100,
                    verbose=1, validation_data=(X_test, y_test))

After running 100 epochs, we get very good accuracy here-

Author GitHub

Here we saw that the accuracy sometimes increases in one epoch and decreases in the next. This happens
because of local oscillations in the accuracy: the optimization does not go straight down to a minimum,
so it oscillates and takes more time to get there.

Conclusion on DenseNets
• Here we learned a very basic introduction to the DenseNet architecture along with its code
structure, and we learned why it is useful and how it improves on the ResNet architecture.
• In a DenseNet, the main key takeaway is the number of connections: generally, in any architecture,
the number of connections is about the same as the number of layers, but here the number of
connections is L(L+1)/2, where L is the number of layers.
• The main advantage of a DenseNet is that if you are in, say, the 3rd layer, that layer takes as
input not only the 2nd layer's output but the 1st layer's output as well; by doing this, we train
our model in a better way and the model learns better features.
• And here we saw how to use the CIFAR-10 data - basically an image dataset with 10 different
classes - to train our model to very good accuracy; by changing the dropout rate and the number of
layers, we can find a better model with even better accuracy.
