Multilayer Perceptrons For Digit Recognition With Core APIs - TensorFlow Core
Multilayer Perceptrons are made up of functional units called perceptrons. The equation of a
perceptron is as follows:

$$Z = \vec{w} \cdot X + b$$

where

$Z$: perceptron output
$X$: feature matrix
$\vec{w}$: weight vector
$b$: bias
When these perceptrons are stacked, they form structures called dense layers, which can then
be connected to build a neural network. A dense layer's equation is similar to a
perceptron's but uses a weight matrix and a bias vector instead:

$$Z = XW + \vec{b}$$

where

$X$: feature matrix
$W$: weight matrix
$\vec{b}$: bias vector
In an MLP, multiple dense layers are connected in such a way that the outputs of one layer are
fully connected to the inputs of the next layer. Adding non-linear activation functions to the
outputs of dense layers can help the MLP classifier learn complex decision boundaries and
generalize well to unseen data.
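For example, with a batch of $n$ MNIST images flattened to 784 features, a dense layer with 700 units has $X \in \mathbb{R}^{n \times 784}$, $W \in \mathbb{R}^{784 \times 700}$, and $\vec{b} \in \mathbb{R}^{700}$, so the output $Z = XW + \vec{b}$ has shape $n \times 700$.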
Setup
Import TensorFlow, pandas (https://pandas.pydata.org), Matplotlib (https://matplotlib.org) and
seaborn (https://seaborn.pydata.org) to get started.
import pandas as pd
import matplotlib
from matplotlib import pyplot as plt
import seaborn as sns
import tempfile
import os
# Preset Matplotlib figure sizes.
matplotlib.rcParams['figure.figsize'] = [9, 6]
import tensorflow as tf
import tensorflow_datasets as tfds
print(tf.__version__)
# Set random seed for reproducible results
tf.random.set_seed(22)
Split the MNIST dataset into training, validation, and testing sets. The validation set can be
used to gauge the model's generalizability during training so that the test set can serve as a
final unbiased estimator for the model's performance.
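One way to create these splits with TensorFlow Datasets; the batch size and the 10,000-example validation carve-out below are illustrative choices:

train_data, val_data, test_data = tfds.load("mnist",
                                            split=['train[10000:]', 'train[0:10000]', 'test'],
                                            batch_size=128, as_supervised=True)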
The MNIST dataset consists of handwritten digits and their corresponding true labels.
Visualize a couple of examples below.
x_viz, y_viz = tfds.load("mnist", split=['train[:1500]'], batch_size=-1, as_supervised=True)[0]
x_viz = tf.squeeze(x_viz, axis=3)
for i in range(9):
  plt.subplot(3,3,1+i)
  plt.axis('off')
  plt.imshow(x_viz[i], cmap='gray')
  plt.title(f"True Label: {y_viz[i]}")
  plt.subplots_adjust(hspace=.5)
Also review the distribution of digits in the training data to verify that each class is well
represented in the dataset.
sns.countplot(x=y_viz.numpy());
plt.xlabel('Digits')
plt.title("MNIST Digit Distribution");
The rectified linear unit (ReLU) activation function returns $\mathrm{ReLU}(X) = \max(0, X)$, passing positive inputs through unchanged and mapping negative inputs to zero:

x = tf.linspace(-2, 2, 201)
x = tf.cast(x, tf.float32)
plt.plot(x, tf.nn.relu(x));
plt.xlabel('x')
plt.ylabel('ReLU(x)')
plt.title('ReLU activation function');
The softmax activation function is a normalized exponential function that converts $m$ real
numbers into a probability distribution with $m$ outcomes/classes. This is useful for predicting
class probabilities from a neural network's output:

$$\mathrm{Softmax}(X)_i = \frac{e^{X_i}}{\sum_{j=1}^{m}e^{X_j}}$$
x = tf.linspace(-4, 4, 201)
x = tf.cast(x, tf.float32)
plt.plot(x, tf.nn.softmax(x, axis=0));
plt.xlabel('x')
plt.ylabel('Softmax(x)')
plt.title('Softmax activation function');
The dense layer
Create a class for the dense layer. By definition, the outputs of one layer are fully connected to
the inputs of the next layer in an MLP. Therefore, the input dimension for a dense layer can be
inferred based on the output dimension of its previous layer and does not need to be specified
upfront during its initialization. The weights should also be initialized properly to prevent
activation outputs from becoming too large or small. One of the most popular weight
initialization methods is the Xavier scheme, where each element of the weight matrix is
sampled uniformly in the following manner:

$$W_{ij} \sim \mathrm{Uniform}\left(-\sqrt{\frac{6}{n_{in} + n_{out}}},\ \sqrt{\frac{6}{n_{in} + n_{out}}}\right)$$
def xavier_init(shape):
  # Computes the Xavier initialization values for a weight matrix
  in_dim, out_dim = shape
  xavier_lim = tf.sqrt(6.)/tf.sqrt(tf.cast(in_dim + out_dim, tf.float32))
  weight_vals = tf.random.uniform(shape=(in_dim, out_dim),
                                  minval=-xavier_lim, maxval=xavier_lim, seed=22)
  return weight_vals
class DenseLayer(tf.Module):

  def __init__(self, out_dim, weight_init=xavier_init, activation=tf.identity):
    # Initialize the output dimension, weight initializer, and activation function
    self.out_dim = out_dim
    self.weight_init = weight_init
    self.activation = activation
    self.built = False

  def __call__(self, x):
    if not self.built:
      # Infer the input dimension from the first batch of data
      self.in_dim = x.shape[1]
      # Initialize the weights and biases
      self.w = tf.Variable(self.weight_init(shape=(self.in_dim, self.out_dim)))
      self.b = tf.Variable(tf.zeros(shape=(self.out_dim,)))
      self.built = True
    # Compute the forward pass
    z = tf.add(tf.matmul(x, self.w), self.b)
    return self.activation(z)
Next, build a class for the MLP model that executes layers sequentially. Remember that the
model variables are only available after the first sequence of dense layer calls due to
dimension inference.
class MLP(tf.Module):

  def __init__(self, layers):
    self.layers = layers

  @tf.function
  def __call__(self, x):
    # Execute the model's layers sequentially
    for layer in self.layers:
      x = layer(x)
    return x
Initialize an MLP model with two hidden ReLU layers and a ten-unit output layer. Note that the
softmax activation function does not need to be applied by the MLP; it is computed separately
in the loss and prediction functions.
hidden_layer_1_size = 700
hidden_layer_2_size = 500
output_size = 10
mlp_model = MLP([
    DenseLayer(out_dim=hidden_layer_1_size, activation=tf.nn.relu),
    DenseLayer(out_dim=hidden_layer_2_size, activation=tf.nn.relu),
    DenseLayer(out_dim=output_size)])
The loss function
The cross-entropy loss is a great choice for multiclass classification problems, since it
measures the negative log-likelihood of the data according to the model's probability
predictions:

$$L = -\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{m} y_j^{[i]} \log\left(\hat{y}_j^{[i]}\right)$$

where

$\hat{y}$: a matrix of predicted class probabilities
$y$: a one-hot encoded matrix of true classes
The tf.nn.sparse_softmax_cross_entropy_with_logits
(https://www.tensorflow.org/api_docs/python/tf/nn/sparse_softmax_cross_entropy_with_logits) function
can be used to compute the cross-entropy loss. This function does not require the model's last
layer to apply the softmax activation function, nor does it require the class labels to be one-hot
encoded.
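A minimal loss helper built on this op might look as follows; the function name is illustrative:

def cross_entropy_loss(y_pred, y):
  # Compute the mean cross-entropy loss from unnormalized logits
  sparse_ce = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=y_pred)
  return tf.reduce_mean(sparse_ce)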
Write a basic accuracy function that calculates the proportion of correct classifications during
training. In order to generate class predictions from softmax outputs, return the index that
corresponds to the largest class probability.
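A minimal sketch of such an accuracy function:

def accuracy(y_pred, y):
  # Compute accuracy after extracting class predictions from the logits
  class_preds = tf.argmax(tf.nn.softmax(y_pred), axis=1)
  is_equal = tf.equal(y, class_preds)
  return tf.reduce_mean(tf.cast(is_equal, tf.float32))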
Train the model
Use an Adam optimizer, implemented below with the Core APIs, to update the model
parameters during training.

class Adam:

  def __init__(self, learning_rate=1e-3, beta_1=0.9, beta_2=0.999, ep=1e-7):
    # Initialize optimizer hyperparameters and moment estimate slots
    self.beta_1 = beta_1
    self.beta_2 = beta_2
    self.learning_rate = learning_rate
    self.ep = ep
    self.t = 1.
    self.v_dvar, self.s_dvar = [], []
    self.built = False

  def apply_gradients(self, grads, vars):
    # Initialize the first and second moments for each variable on the first call
    if not self.built:
      for var in vars:
        self.v_dvar.append(tf.Variable(tf.zeros(shape=var.shape)))
        self.s_dvar.append(tf.Variable(tf.zeros(shape=var.shape)))
      self.built = True
    # Update the model variables given their gradients
    for i, (d_var, var) in enumerate(zip(grads, vars)):
      self.v_dvar[i].assign(self.beta_1*self.v_dvar[i] + (1-self.beta_1)*d_var)
      self.s_dvar[i].assign(self.beta_2*self.s_dvar[i] + (1-self.beta_2)*tf.square(d_var))
      v_dvar_bc = self.v_dvar[i]/(1-(self.beta_1**self.t))
      s_dvar_bc = self.s_dvar[i]/(1-(self.beta_2**self.t))
      var.assign_sub(self.learning_rate*(v_dvar_bc/(tf.sqrt(s_dvar_bc) + self.ep)))
    self.t += 1.
Now, write a custom training loop that updates the MLP parameters with mini-batch gradient
descent. Using mini-batches for training provides both memory efficiency and faster
convergence.
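A sketch of that loop, assuming the train_data batches from the split sketch above and the
cross_entropy_loss and accuracy helpers; a validation pass uses the same computation without
the gradient update and produces the validation metrics printed below:

def train_step(x_batch, y_batch, loss, acc, model, optimizer):
  # Update the model state given a batch of data
  with tf.GradientTape() as tape:
    y_pred = model(x_batch)
    batch_loss = loss(y_pred, y_batch)
  batch_acc = acc(y_pred, y_batch)
  grads = tape.gradient(batch_loss, model.variables)
  optimizer.apply_gradients(grads, model.variables)
  return batch_loss, batch_acc

optimizer = Adam()
for epoch in range(5):
  batch_losses, batch_accs = [], []
  for x_batch, y_batch in train_data:
    # Flatten the images and rescale pixel values to [0, 1]
    x_batch = tf.cast(tf.reshape(x_batch, shape=[-1, 784]), tf.float32)/255.
    batch_loss, batch_acc = train_step(x_batch, y_batch, cross_entropy_loss,
                                       accuracy, mlp_model, optimizer)
    batch_losses.append(batch_loss)
    batch_accs.append(batch_acc)
  print(f"Epoch: {epoch}")
  print(f"Training loss: {float(tf.reduce_mean(batch_losses)):.3f}, "
        f"Training accuracy: {float(tf.reduce_mean(batch_accs)):.3f}")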
Epoch: 0
Training loss: 0.223, Training accuracy: 0.934
Validation loss: 0.121, Validation accuracy: 0.962
Epoch: 1
Training loss: 0.080, Training accuracy: 0.975
Validation loss: 0.097, Validation accuracy: 0.971
Epoch: 2
Training loss: 0.047, Training accuracy: 0.986
Validation loss: 0.085, Validation accuracy: 0.977
Epoch: 3
Training loss: 0.033, Training accuracy: 0.990
Validation loss: 0.111, Validation accuracy: 0.971
Epoch: 4
Training loss: 0.027, Training accuracy: 0.991
Performance evaluation
Start by writing a plotting function to visualize the model's loss and accuracy during training.
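A simple helper along these lines, assuming per-epoch histories such as train_losses and
val_losses were collected during training (the names are illustrative):

def plot_metrics(train_metric, val_metric, metric_type):
  # Visualize a metric against training epochs
  plt.figure()
  plt.plot(range(len(train_metric)), train_metric, label=f"Training {metric_type}")
  plt.plot(range(len(val_metric)), val_metric, label=f"Validation {metric_type}")
  plt.xlabel("Epochs")
  plt.ylabel(metric_type)
  plt.legend()
  plt.title(f"{metric_type} vs training epochs");

plot_metrics(train_losses, val_losses, "cross entropy loss")
plot_metrics(train_accs, val_accs, "accuracy")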
Save and load the model
Start by making an export module that takes in raw data and performs the following
operations: data preprocessing, probability prediction, and class prediction.
class ExportModule(tf.Module):
  def __init__(self, model, preprocess, class_pred):
    # Initialize pre- and postprocessing functions
    self.model = model
    self.preprocess = preprocess
    self.class_pred = class_pred

  @tf.function(input_signature=[tf.TensorSpec(shape=[None, None, None, None], dtype=tf.uint8)])
  def __call__(self, x):
    # Run the ExportModule for new data points
    x = self.preprocess(x)
    y = self.model(x)
    y = self.class_pred(y)
    return y
def preprocess_test(x):
  # The export module takes in unprocessed and unlabeled data
  x = tf.reshape(x, shape=[-1, 784])
  x = x/255
  return x

def class_pred_test(y):
  # Generate class predictions from MLP output
  return tf.argmax(tf.nn.softmax(y), axis=1)

mlp_model_export = ExportModule(model=mlp_model,
                                preprocess=preprocess_test,
                                class_pred=class_pred_test)
models = tempfile.mkdtemp()
save_path = os.path.join(models, 'mlp_model_export')
tf.saved_model.save(mlp_model_export, save_path)
mlp_loaded = tf.saved_model.load(save_path)
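To confirm that the exported module behaves like the original model, evaluate it on raw test
images. The test-split loading and the accuracy_score helper below are illustrative:

x_test, y_test = tfds.load("mnist", split=['test'], batch_size=-1, as_supervised=True)[0]

def accuracy_score(y_pred, y):
  # Generic accuracy based on true and predicted class labels
  is_equal = tf.equal(y_pred, y)
  return tf.reduce_mean(tf.cast(is_equal, tf.float32))

test_classes = mlp_loaded(x_test)
test_acc = accuracy_score(test_classes, y_test)
print(f"Test Accuracy: {float(test_acc):.3f}")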
The model does a great job of classifying handwritten digits in the training dataset and also
generalizes well to unseen data. Now, examine the model's class-wise accuracy to ensure good
performance for each digit.
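One way to break accuracy down by digit, reusing the x_test, y_test, and test_classes tensors
from the sketch above:

print("Accuracy breakdown by digit:")
print("---------------------------")
for label in range(10):
  # Select the test examples whose true label matches the current digit
  label_ind = (y_test == label)
  pred_label = test_classes[label_ind]
  labels = y_test[label_ind]
  print(f"Digit {label}: {float(accuracy_score(pred_label, labels)):.3f}")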
It looks like the model struggles with some digits a little more than others, which is quite
common in many multiclass classification problems. As a final exercise, plot a confusion
matrix of the model's predictions and the corresponding true labels to gather more class-level
insights. Scikit-learn and seaborn have functions for generating and visualizing confusion
matrices.
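A sketch of the plotting helper called below, built on sklearn.metrics.confusion_matrix and
seaborn's heatmap:

import sklearn.metrics as sk_metrics

def show_confusion_matrix(test_labels, test_classes):
  # Compute and plot a row-normalized confusion matrix
  confusion = sk_metrics.confusion_matrix(test_labels.numpy(), test_classes.numpy())
  confusion_normalized = confusion / confusion.sum(axis=1, keepdims=True)
  plt.figure(figsize=(10, 10))
  sns.heatmap(confusion_normalized, xticklabels=range(10), yticklabels=range(10),
              cmap="Blues", annot=True, fmt='.2f', square=True)
  plt.title("Confusion matrix")
  plt.ylabel("True label")
  plt.xlabel("Predicted label")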
show_confusion_matrix(y_test, test_classes)
Class-level insights can help identify reasons for misclassifications and improve model
performance in future training cycles.
Conclusion
This notebook introduced a few techniques to handle a multiclass classification problem with
an MLP
(https://developers.google.com/machine-learning/crash-course/multi-class-neural-networks/softmax).
Here are a few more tips that may help:
Initialization schemes can help prevent model parameters from vanishing or exploding
during training.
Overfitting is another common problem for neural networks, though it wasn't a problem
for this tutorial. Visit the Overfit and underfit (/guide/core/overfit_and_underfit) tutorial for
more help with this.
For more examples of using the TensorFlow Core APIs, check out the guide
(https://www.tensorflow.org/guide/core). If you want to learn more about loading and preparing
data, see the tutorials on image data loading
(https://www.tensorflow.org/tutorials/load_data/images) or CSV data loading
(https://www.tensorflow.org/tutorials/load_data/csv).