

Creating and Training Custom Layers in TensorFlow 2
Learning to create your own custom layers and training them in TensorFlow 2

Arjun Sarkar Jun 24 · 8 min read

1. Previously we’ve seen how to create custom loss functions — Creating custom Loss
functions using TensorFlow 2

2. Next, I wrote about creating custom Activation Functions using Lambda layers —
Creating Custom Activation Functions with Lambda Layers in TensorFlow 2

This is the third part of the series, where we create custom Dense Layers and train them
in TensorFlow 2.

Introduction:
Lambda layers are simple layers in TensorFlow that can be used to create custom activation functions. But Lambda layers have many limitations, especially when it comes to training. So the idea is to build custom layers that are trainable, by inheriting from the Keras Layer class in TensorFlow, with a special focus on Dense layers.

What is a Layer?
Figure 1. Layer — Dense Layer representation (Source: image created by Author)

A layer is a class that receives some input, holds some state, performs a computation, and produces an output, as required by the neural network. Every model architecture consists of multiple layers, whether it is built with the Sequential or the Functional API.

State: mostly trainable values that are updated during ‘model.fit’. In a Dense layer, the state consists of the weights and the bias, as shown in Figure 1. These values are updated as the model trains, to give better results. In some layers, the state can also contain non-trainable variables.
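For example, here is a minimal sketch (not part of the article's layer, names are illustrative) showing trainable and non-trainable state side by side:

import tensorflow as tf

# Trainable state: updated by the optimizer during model.fit
kernel = tf.Variable(tf.random.normal((4, 2)), trainable=True, name="kernel")

# Non-trainable state: lives in the layer but is never touched by gradients
# (the moving mean/variance in BatchNormalization are a real-world example)
steps_seen = tf.Variable(0, trainable=False, name="steps_seen")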

Computation: transforms a batch of input data into a batch of output data. This is the part of the layer where the calculation takes place. A Dense layer computes

Y = w*X + c, and returns Y,

where Y is the output, X is the input, w are the weights, and c is the bias.
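To make this concrete, here is a minimal sketch of the Dense computation on a toy batch, using plain TensorFlow ops (the shapes and values are illustrative):

import tensorflow as tf

X = tf.constant([[1.0, 2.0, 3.0]])   # a batch of 1 input with 3 features
w = tf.ones((3, 2))                  # weights mapping 3 features to 2 units
c = tf.zeros((2,))                   # one bias term per unit

Y = tf.matmul(X, w) + c              # the Dense computation
print(Y)                             # tf.Tensor([[6. 6.]], shape=(1, 2), dtype=float32)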

Creating a custom Dense Layer:


Now that we know what happens inside Dense layers, let’s see how we can create our
own Dense layer and use it in a model.



import tensorflow as tf
from tensorflow.keras.layers import Layer


class SimpleDense(Layer):

    def __init__(self, units=32):
        '''Initializes the instance attributes'''
        super(SimpleDense, self).__init__()
        self.units = units

    def build(self, input_shape):
        '''Create the state of the layer (weights)'''
        # initialize the weights
        w_init = tf.random_normal_initializer()
        self.w = tf.Variable(name="kernel",
                             initial_value=w_init(shape=(input_shape[-1], self.units),
                                                  dtype='float32'),
                             trainable=True)

        # initialize the biases
        b_init = tf.zeros_initializer()
        self.b = tf.Variable(name="bias",
                             initial_value=b_init(shape=(self.units,), dtype='float32'),
                             trainable=True)

    def call(self, inputs):
        '''Defines the computation from inputs to outputs'''
        return tf.matmul(inputs, self.w) + self.b

Explanation of the code above: the class is named SimpleDense. When we create a custom layer, we have to inherit from Keras's Layer class. This is done in the line ‘class SimpleDense(Layer)’.

‘__init__’ is the first method in the class and initializes it: it accepts parameters and stores them as attributes that can be used within the class. Because the class inherits from ‘Layer’, the parent class also needs to be initialized, which is done with the ‘super’ call. ‘units’ is a local class variable, analogous to the number of units in a Dense layer. The default value is set to 32, but it can always be changed when the class is instantiated.

‘build’ is the next method in the class. It is used to create the states. In the Dense layer, the two states required are ‘w’ and ‘b’, the weights and biases. When the Dense layer is created, we are not creating just one neuron of the network's hidden layer but multiple neurons in one go (in this case, 32 neurons will be created). Every neuron in the layer needs to be initialized with some starting weight and bias values. TensorFlow contains many built-in functions to initialize these values.
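For instance, a quick sketch of a few of these built-in initializers (the Glorot initializer is shown purely as an extra illustration; only the first two are used in the layer above):

import tensorflow as tf

w_init = tf.random_normal_initializer()         # normal distribution (weights below)
b_init = tf.zeros_initializer()                 # all zeros (biases below)
g_init = tf.keras.initializers.GlorotUniform()  # another common choice

print(w_init(shape=(2, 2)))   # small random values
print(b_init(shape=(2,)))     # [0. 0.]
print(g_init(shape=(2, 2)))   # scaled uniform values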

For initializing the weights, we use TensorFlow's ‘random_normal_initializer’ function, which initializes the weights randomly from a normal distribution. ‘self.w’ holds the weight state as a tensor variable, initialized using ‘w_init’. The weight values are stored in ‘float32’ format. The variable is set as trainable, which means that after every run these initial weights will be updated according to the loss function and the optimizer. The name ‘kernel’ is added so that it can easily be traced later.

For initializing the biases, TensorFlow's ‘zeros_initializer’ function is used. It sets all the initial bias values to zero. ‘self.b’ is a tensor whose size matches the number of units (here 32), and each of these 32 bias terms is initially set to zero. It is also set as trainable, so the bias terms will update once training starts. The name ‘bias’ is added to be able to trace it later.

‘call’ is the last method and performs the computation. In this case, as it is a Dense layer, it multiplies the inputs by the weights, adds the bias, and returns the output. The ‘matmul’ operation is used because the inputs and ‘self.w’ are tensors, not single numerical values.

# declare an instance of the class
my_dense = SimpleDense(units=1)

# define an input and feed it into the layer
x = tf.ones((1, 1))
y = my_dense(x)

# parameters of the base Layer class like `variables` can be used
print(my_dense.variables)

Output:

[<tf.Variable 'simple_dense/kernel:0' shape=(1, 1) dtype=float32, numpy=array([[0.00382898]], dtype=float32)>,
 <tf.Variable 'simple_dense/bias:0' shape=(1,) dtype=float32, numpy=array([0.], dtype=float32)>]

Explanation of the code above: the first line creates a Dense layer containing just one neuron (units=1). x (the input) is a tensor of shape (1, 1) with the value 1. Calling y = my_dense(x) builds the Dense layer and initializes its state. ‘.variables’ lets us look at the values initialized inside the layer (the weights and biases).

The output of ‘my_dense.variables’ is shown below the code block. It shows that ‘simple_dense’ holds two variables called ‘kernel’ and ‘bias’. The kernel ‘w’ is initialized with the value 0.0038, drawn from a random normal distribution, and the bias ‘b’ is initialized with the value 0. This is just the initial state of the layer; once trained, these values will change.
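Since ‘w’ and ‘b’ are ordinary attributes of the layer, the same state can also be inspected directly; a quick sketch:

# read each variable individually as a NumPy array
print(my_dense.w.numpy())   # e.g. [[0.00382898]]
print(my_dense.b.numpy())   # [0.]

# or fetch all weights at once via the base Layer API
kernel, bias = my_dense.get_weights()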

import numpy as np

# define the dataset
xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)

# use the Sequential API to build a model with our custom layer
my_layer = SimpleDense(units=1)
model = tf.keras.Sequential([my_layer])

# configure and train the model
model.compile(optimizer='sgd', loss='mean_squared_error')
model.fit(xs, ys, epochs=500, verbose=0)

# perform inference
print(model.predict([10.0]))

# see the updated state of the variables
print(my_layer.variables)

Output:

[[18.981567]]

[<tf.Variable 'sequential/simple_dense_1/kernel:0' shape=(1, 1) dtype=float32, numpy=array([[1.9973286]], dtype=float32)>,
 <tf.Variable 'sequential/simple_dense_1/bias:0' shape=(1,) dtype=float32, numpy=array([-0.99171764], dtype=float32)>]

Explanation of the code above: this is a very simple way to check that the custom layer works. The inputs and outputs are set (the data follows y = 2x - 1), the model is compiled with the custom layer, and it is finally trained for 500 epochs. What is important to see is that after training the model, the values of the weights and biases have changed. The weight that was initially 0.0038 is now 1.9973, and the bias that was initially zero is now -0.9917, close to the true values of 2 and -1.
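As a quick sanity check, we can reproduce the model's prediction by hand from the trained variables (a sketch using the names defined above):

# the data follows y = 2x - 1, so the learned parameters should be close to 2 and -1
w_learned = my_layer.variables[0].numpy()[0][0]   # ~1.9973
b_learned = my_layer.variables[1].numpy()[0]      # ~-0.9917

# reproduce the model's prediction for x = 10 by hand
print(w_learned * 10.0 + b_learned)               # ~18.98, close to 2*10 - 1 = 19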

Adding an Activation Function to the Custom Dense Layer:


Previously we created the custom Dense layer, but we did not add any activation to it. Of course, we could write the activation as a separate line in the model, or add it as a Lambda layer. But how do we implement the activation inside the same custom layer we created above?

The answer is a simple tweak to the ‘__init__’ and ‘call’ methods of the custom Dense layer.

class SimpleDense(Layer):

    # add an activation parameter
    def __init__(self, units=32, activation=None):
        super(SimpleDense, self).__init__()
        self.units = units

        # get the activation from Keras's built-in activations
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        w_init = tf.random_normal_initializer()
        self.w = tf.Variable(name="kernel",
                             initial_value=w_init(shape=(input_shape[-1], self.units),
                                                  dtype='float32'),
                             trainable=True)

        b_init = tf.zeros_initializer()
        self.b = tf.Variable(name="bias",
                             initial_value=b_init(shape=(self.units,), dtype='float32'),
                             trainable=True)

        super().build(input_shape)

    def call(self, inputs):
        # pass the computation through the activation
        return self.activation(tf.matmul(inputs, self.w) + self.b)

Explanation of the code above: most of the code is identical to the code we used before.

To add the activation, we specify in ‘__init__’ that we want an activation. Either a string or an instance of an activation object can be passed as this argument. It defaults to None, so if no activation function is specified, no error is thrown and the layer stays linear. Next, we resolve the activation function with ‘tf.keras.activations.get(activation)’.

The final edit is in the ‘call’ method, where the computation of the weights and biases is wrapped in ‘self.activation’. So the return value is now the computation passed through the activation.
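A quick sketch of what ‘tf.keras.activations.get’ does with each kind of argument (the values shown are illustrative):

import tensorflow as tf

relu = tf.keras.activations.get('relu')            # resolve from a string
also_relu = tf.keras.activations.get(tf.nn.relu)   # a callable passes through
linear = tf.keras.activations.get(None)            # None -> identity (linear)

x = tf.constant([-1.0, 2.0])
print(relu(x))     # [0. 2.]
print(linear(x))   # [-1. 2.]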

Complete code of the custom Dense layer with activation on the MNIST dataset:



import tensorflow as tf
from tensorflow.keras.layers import Layer


class SimpleDense(Layer):

    def __init__(self, units=32, activation=None):
        super(SimpleDense, self).__init__()
        self.units = units

        # get the activation from Keras's built-in activations
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        w_init = tf.random_normal_initializer()
        self.w = tf.Variable(name="kernel",
                             initial_value=w_init(shape=(input_shape[-1], self.units),
                                                  dtype='float32'),
                             trainable=True)

        b_init = tf.zeros_initializer()
        self.b = tf.Variable(name="bias",
                             initial_value=b_init(shape=(self.units,), dtype='float32'),
                             trainable=True)

        super().build(input_shape)

    def call(self, inputs):
        # pass the computation through the activation
        return self.activation(tf.matmul(inputs, self.w) + self.b)


# load and normalize the MNIST dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# build the model
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    # our custom Dense layer with activation
    SimpleDense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])

# compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# fit and evaluate the model
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)

Training the model with our custom Dense layer and activation gives a training accuracy
of 97.8% and a validation accuracy of 97.7%.

Conclusion:
This is the way to create custom layers in TensorFlow. Even though we only saw a Dense layer at work, it can easily be replaced by any other layer, such as a Quadratic layer, which does the following computation:

It has three state variables, a, b, and c.

Computation: Y = a*X² + b*X + c
Replacing the Dense Layer with a Quadratic layer:

import tensorflow as tf
from tensorflow.keras.layers import Layer


class SimpleQuadratic(Layer):

    def __init__(self, units=32, activation=None):
        '''Initializes the class and sets up the internal variables'''
        super(SimpleQuadratic, self).__init__()
        self.units = units
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        '''Create the state of the layer (weights)'''
        a_init = tf.random_normal_initializer()
        a_init_val = a_init(shape=(input_shape[-1], self.units), dtype='float32')
        self.a = tf.Variable(initial_value=a_init_val, trainable=True)

        b_init = tf.random_normal_initializer()
        b_init_val = b_init(shape=(input_shape[-1], self.units), dtype='float32')
        self.b = tf.Variable(initial_value=b_init_val, trainable=True)

        c_init = tf.zeros_initializer()
        c_init_val = c_init(shape=(self.units,), dtype='float32')
        self.c = tf.Variable(initial_value=c_init_val, trainable=True)

    def call(self, inputs):
        '''Defines the computation from inputs to outputs'''
        x_squared = tf.math.square(inputs)
        x_squared_times_a = tf.matmul(x_squared, self.a)
        x_times_b = tf.matmul(inputs, self.b)
        x2a_plus_xb_plus_c = x_squared_times_a + x_times_b + self.c
        return self.activation(x2a_plus_xb_plus_c)

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    SimpleQuadratic(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)

This Quadratic layer gives a validation accuracy of 97.8% on the MNIST dataset.

Thus, we can implement our own layers, along with the desired activations, in TensorFlow models, and potentially even improve overall accuracy.
