Mathematical Representation of a
Perceptron Layer (with example in
TensorFlow)
Daniel H
7 min read · Jan 31, 2019

Motivation
When I started to learn about artificial neural networks, I quickly became excited about what goes on inside those networks when they solve real-world tasks. I am a person who understands through imagination and therefore needs graphics, animations, and, where applicable, mathematical formulas.

The first question I asked myself was:


‘How does the transformation from inputs to outputs work in an ANN?’, and that is exactly what this article covers.

Sure, there are many articles about this topic already, but none is as good as the one you have written yourself. Furthermore, the contents of those articles differ slightly, and there may be readers who need exactly this article to better understand the ‘How’ and ‘Why’, which is another reason for making it public.

Contents / Structure
The article is divided into two parts:

Mathematics of a Perceptron

Application in TensorFlow

The math starts with the Linear Threshold Unit (LTU), the first artificial neuron, proposed by Warren McCulloch and Walter Pitts, then combines those units into a Perceptron, and finally clarifies the insides of a Multi-Layer Perceptron that is ready to be fed multiple instances at once for batch training.

The second part uses the explained math to build ANN layers for solving the task of classifying handwritten digits.

Prerequisites
To understand the math, you need to know how vector and matrix operations work; the dot product and addition are of particular importance.

To run the TensorFlow application, you need a Python environment with all required packages installed, especially NumPy and TensorFlow as well as their dependencies.

Mathematics of a Perceptron

Linear Threshold Unit (LTU)


The linear threshold unit (LTU) consists of one input vector x with n values, one single-valued output y, and, in between, the mathematical operations that calculate the linear combination of the inputs and their weights (i.e. the weighted sum) and apply an activation function to that result.

Linear Threshold Unit (LTU)

The weighted sum is the dot product of w and x, written as a matrix product:
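$$z = w^T x = w_1 x_1 + w_2 x_2 + \ldots + w_n x_n$$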

and the activation function used is the Heaviside step function:
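$$\text{heaviside}(z) = \begin{cases} 0 & \text{if } z < 0 \\ 1 & \text{if } z \geq 0 \end{cases}$$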

The resulting output value y is binary, since the step function outputs only 0 or 1:
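$$y = \text{heaviside}(w^T x) \in \{0, 1\}$$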
By that, a single LTU can be used for binary classification, just like Logistic
Regression.

Perceptron
A Perceptron is a simple artificial neural network (ANN) based on a single layer of LTUs, where each LTU is connected to all inputs of the vector x as well as to a bias vector b.

Perceptron with 3 LTUs

The example shown above takes one input vector x and a bias vector b = (1, 1, 1)^T (consisting of ones only). It outputs 3 binary values y.

It is important to note that there is one weight vector per LTU:
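$$y_1 = \text{heaviside}(w_1^T x + b_1), \quad y_2 = \text{heaviside}(w_2^T x + b_2), \quad y_3 = \text{heaviside}(w_3^T x + b_3)$$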

The equations can be combined to:
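$$y = \text{heaviside}(W x + b), \qquad W = \begin{pmatrix} w_1^T \\ w_2^T \\ w_3^T \end{pmatrix}$$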


W is a matrix of shape (u, n), where u = number of LTUs and n = number of
input values.

The input vector x is of shape (n, 1), the bias vector b is of shape (u, 1) and the output vector y is of shape (u, 1).

By that, the Perceptron can be used for multi-class classification.

Multi-Layer Perceptron (MLP)


A Multi-Layer Perceptron (MLP) is a composition of an input layer, at least one hidden layer of LTUs, and an output layer of LTUs. If an MLP has two or more hidden layers, it is called a deep neural network (DNN).

Multi-Layer Perceptron (hidden layer has 3 LTUs, output layer has 2 LTUs)

The calculations are the same as for a Perceptron, but now there are more layers of LTUs to combine before you reach the output y:
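$$h = \text{heaviside}(W_{\text{hidden}}\, x + b_{\text{hidden}}), \qquad y = \text{heaviside}(W_{\text{out}}\, h + b_{\text{out}})$$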

Input in Batches of k Instances


ANNs are usually trained on batches of instances (an instance is one input vector x). For this, k instances are drawn from the m available instances:
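$$x^{(1)}, x^{(2)}, \ldots, x^{(k)} \qquad \text{with } k \leq m$$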

The k instances can be combined into a single matrix X:
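$$X = \begin{pmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(k)})^T \end{pmatrix}$$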


By that, the equation to calculate y changes to:
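$$Y = \text{heaviside}(X\,W + b^T)$$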

The input X is now a matrix of shape (k, n), where k = number of instances
per batch and n = number of input values.

W is still a matrix, but its shape changed to (n, u); it is simply the transpose of the previous W.

The bias vector b is still of shape (u, 1), its transpose being added to every row of XW, and the output y changed to a matrix of shape (k, u).

Modern MLPs
Today’s MLPs make use of other activation functions in order to work better with Gradient Descent.

For this, the Heaviside step function is replaced by one of the following activations:

Logistic Function (Sigmoid)

Rectified Linear Unit (ReLU)

Hyperbolic Tangent (tanh)
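For reference, these activation functions are defined as:

$$\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \text{ReLU}(z) = \max(0, z), \qquad \tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$$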

The activation function at the output layer depends on the task to be solved
by the ANN:

classification tasks: Softmax activation function

regression tasks: no activation function
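For classification, the softmax function turns the u output values (one per class) into probabilities that sum to 1:

$$\text{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{u} e^{z_j}}$$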

Application in TensorFlow
With the math covered, we can build our own neural network layers.

The task to solve is to classify handwritten digits from 0 to 9. I use the digits dataset provided by scikit-learn, which itself is a subset of the Optical Recognition of Handwritten Digits Data Set from the UCI ML Repository.
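As a minimal sketch (not the code from the linked notebook; the variable names here are illustrative only), the dataset can be loaded directly from scikit-learn:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Load the digits dataset: 1797 grayscale images of 8 x 8 pixels each
digits = load_digits()
X = digits.data.astype(np.float32)   # shape (1797, 64), pixel values 0 to 16
y = digits.target                    # shape (1797,), class labels 0 to 9

# Hold out a test set for the later evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape)   # (1437, 64) (360, 64)
```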

Data exploration, data preprocessing and model evaluation are not part of this article, in order to keep the focus on the implementation of Perceptron layers in TensorFlow. If you want to take a look at the whole machine learning project, visit my GitHub ML repository, which contains a Jupyter Notebook.

Overview
The digits dataset consists of m = 1797 instances of grayscale images showing digits. One image is 8 by 8 pixels and each pixel value is within the range 0 to 16.
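To illustrate how the batch equation Y = activation(XW + b) translates into code, here is a minimal TensorFlow 2-style sketch. It is not the author’s original implementation (the article predates TensorFlow 2), and the names perceptron_layer and n_units, as well as X_train from the loading snippet above, are assumptions made for this example:

```python
import tensorflow as tf

def perceptron_layer(X, n_units, activation=tf.sigmoid):
    """One fully connected layer computing activation(X @ W + b).

    X is a batch of shape (k, n); W has shape (n, u) and b has shape (u,),
    matching the batch equation derived above.
    """
    n_inputs = int(X.shape[1])
    # Small random initial weights and zero-initialised biases
    W = tf.Variable(tf.random.normal([n_inputs, n_units], stddev=0.1))
    b = tf.Variable(tf.zeros([n_units]))
    return activation(tf.matmul(X, W) + b)

# Example: a hidden layer with 32 ReLU units and an output layer with
# 10 units (one per digit class), applied to a batch of k = 64 images.
X_batch = tf.constant(X_train[:64])      # shape (64, 64): 64 instances, 64 pixel values
hidden = perceptron_layer(X_batch, 32, activation=tf.nn.relu)
logits = perceptron_layer(hidden, 10, activation=tf.identity)
probabilities = tf.nn.softmax(logits)    # class probabilities, shape (64, 10)
```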
