Mathematical Representation of a Perceptron Layer (with Example in TensorFlow)
Daniel H · Follow
7 min read · Jan 31, 2019
Motivation
When I started to learn about artificial neural networks, I quickly became
excited about the inner workings that let those networks solve real-world
tasks. I am a person who understands through imagination, and therefore I
need graphics, animations and, where applicable, mathematical formulas.
Sure, there are many articles about this topic already, but none is as good
as the one you have written yourself. Furthermore, existing articles differ
slightly in content, and there may be readers who need this article to
understand the 'How' and 'Why' better, which is another reason for making it
public.
Contents / Structure
The article is divided into two parts:
Mathematics of a Perceptron
Application in TensorFlow
The math starts with the Linear Threshold Unit (LTU), the first artificial
neuron, proposed by Warren McCulloch and Walter Pitts, then combines
those units into a Perceptron, and finally clarifies the internals of a Multi-Layer
Perceptron ready to be fed multiple instances at once for batch training.
The second part uses the explained math to build ANN layers for solving the
task to classify handwritten digits.
Prerequisites
To understand the math, you need to know how vector and matrix calculus
works; the dot product and addition are of particular importance.
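As a quick refresher, here is the dot product in NumPy (a minimal sketch with made-up numbers):

```python
import numpy as np

w = np.array([0.2, -0.5, 0.1])  # example weight vector
x = np.array([1.0, 2.0, 3.0])   # example input vector

# The dot product multiplies elementwise and sums: 0.2*1 - 0.5*2 + 0.1*3
z = np.dot(w, x)
print(z)
```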
Mathematics of a Perceptron
The weighted sum is the dot product of w and x, written as a matrix product:
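Written out (a standard formulation, with symbols matching the text), the weighted sum is:

```latex
z = \mathbf{w}^\top \mathbf{x} = \sum_{i=1}^{n} w_i x_i
```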
The resulting output value y is binary, since the step function outputs 0 and 1
only:
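One common convention for the step (Heaviside) function is:

```latex
y = \operatorname{step}(z) =
\begin{cases}
0 & \text{if } z < 0 \\
1 & \text{if } z \ge 0
\end{cases}
```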
By that, a single LTU can be used for binary classification, just like Logistic
Regression.
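Putting both together, a single LTU can be sketched in NumPy (illustrative names and numbers, not taken from the article's repository):

```python
import numpy as np

def ltu(x, w):
    """Linear Threshold Unit: weighted sum followed by a step function."""
    z = np.dot(w, x)           # weighted sum of inputs
    return 1 if z >= 0 else 0  # step activation: binary output

# Toy binary classification: the weights act as a decision boundary
print(ltu(np.array([1.0, 2.0]), np.array([0.5, -0.5])))  # z = -0.5 -> 0
print(ltu(np.array([1.0, 1.0]), np.array([0.5,  0.5])))  # z =  1.0 -> 1
```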
Perceptron
A Perceptron is a simple artificial neural network (ANN) based on a single
layer of LTUs, where each LTU is connected to all inputs of vector x as well as
a bias vector b.
The example shown above takes one input vector x and a bias vector b = (1,
1, 1)^T (consisting of ones only). It outputs 3 binary values y.
It is important to note that each LTU has its own weight vector:
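Stacking the per-LTU weight vectors as rows of a matrix W (a standard formulation, consistent with the shapes given in the text), the whole layer computes:

```latex
W = \begin{pmatrix} \mathbf{w}_1^\top \\ \vdots \\ \mathbf{w}_u^\top \end{pmatrix},
\qquad
\mathbf{y} = \operatorname{step}(W\mathbf{x} + \mathbf{b})
```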
The input vector x is of shape (n, 1), the bias vector b is of shape (u, 1),
and the output vector y is of shape (u, 1).
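A quick check of these shapes in NumPy (random numbers purely for illustration):

```python
import numpy as np

n, u = 4, 3                  # n input values, u LTUs in the layer
x = np.random.rand(n, 1)     # input vector, shape (n, 1)
W = np.random.rand(u, n)     # one weight vector per LTU, stacked as rows
b = np.ones((u, 1))          # bias vector, shape (u, 1)

y = np.where(W @ x + b >= 0, 1, 0)  # step activation, elementwise
print(y.shape)  # (3, 1)
```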
Multi-Layer Perceptron (hidden layer has 3 LTUs, output layer has 2 LTUs)
The calculations are the same as for a Perceptron, but now there are multiple
layers of LTUs to combine until you reach the output y:
The input X is now a matrix of shape (k, n), where k = number of instances
per batch and n = number of input values.
The bias vector b is of shape (u, 1), and the output y becomes a matrix of
shape (k, u).
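For batch processing, NumPy broadcasting adds the bias to every row of the batch (a sketch; variable names are illustrative):

```python
import numpy as np

k, n, u = 5, 4, 3            # k instances, n input values, u LTUs
X = np.random.rand(k, n)     # input matrix, one instance per row
W = np.random.rand(u, n)     # weights, one row per LTU
b = np.ones(u)               # bias, broadcast across the k rows

Y = np.where(X @ W.T + b >= 0, 1, 0)  # output matrix, shape (k, u)
print(Y.shape)  # (5, 3)
```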
Modern MLPs
Today's MLPs make use of other activation functions in order to work better
with Gradient Descent.
The activation function at the output layer depends on the task to be solved
by the ANN: typically softmax for multi-class classification, a sigmoid for
binary classification, or no activation at all for regression.
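As a sketch of two common choices (ReLU in hidden layers, softmax at a multi-class output; these are standard textbook implementations, not taken from the article):

```python
import numpy as np

def relu(z):
    # max(0, z), elementwise: keeps positive signals, zeroes out negatives
    return np.maximum(0.0, z)

def softmax(z):
    # subtract the max for numerical stability; outputs sum to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([-1.0, 0.0, 2.0])
print(relu(z))     # [0. 0. 2.]
print(softmax(z))  # a probability distribution over 3 classes
```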
Application in TensorFlow
With the math covered, we can build our own neural network layers.
The task to solve is classifying handwritten digits from 0 to 9. I take the
digits dataset provided by scikit-learn, which itself is a subset of the Optical
Recognition of Handwritten Digits Data Set from the UCI ML Repository.
Data exploration, data preprocessing and model evaluation are not part of
this article, in order to keep the focus on the implementation of Perceptron
layers in TensorFlow. If you want to take a look at the whole machine
learning project, visit my GitHub ML repository with a Jupyter Notebook.
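Loading the dataset mentioned above can be sketched with scikit-learn's `load_digits`; the rest of the pipeline lives in the linked notebook:

```python
from sklearn.datasets import load_digits

digits = load_digits()
X, y = digits.data, digits.target  # 1797 flattened 8x8 images, labels 0-9
print(X.shape, y.shape)  # (1797, 64) (1797,)
```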
Overview
The digits dataset consists of m = 1797 instances of grayscale images
showing handwritten digits. One image is 8 by 8 pixels, and each pixel value is within