Creating a Network
==================
The first step in using ``theanets`` is creating a model to train and use.
This involves two choices:

- one of the three broad classes of network models, and
- a series of layers that map inputs to outputs.

In ``theanets``, a network model is a subclass of :class:`Network
<theanets.feedforward.Network>`. Its primary defining characteristics are the
implementations of the :func:`Network.error()
<theanets.feedforward.Network.error>` and :func:`Network.setup_vars()
<theanets.feedforward.Network.setup_vars>` methods.
The ``error`` method defines the error function for the model as a function of
the network's output and any internal symbolic variables that the model
defines. The error is an important (and sometimes the only) component of the
loss that model trainers attempt to minimize during the learning process.
The ``setup_vars`` method defines the symbolic variables that the network
requires for computing an error value; every variable needed to compute the
error must be declared in this method.
In the brief discussion below, we assume that the network has some set of
parameters :math:`\theta`. In the feedforward pass, the network computes some
function of its inputs :math:`x \in \mathbb{R}^n` using these parameters; we
represent this feedforward function using the notation :math:`F_\theta(x)`.
.. _creating-predefined-models:
Feedforward Models
==================
There are three major types of neural network models, each defined primarily by
the loss function that the model attempts to optimize. While other types of
models are certainly possible, ``theanets`` only tries to handle the common
cases with built-in model classes. (If you want to define a new type of model,
see :ref:`creating-customizing`.)
Autoencoder
-----------

An :class:`autoencoder <theanets.feedforward.Autoencoder>` takes an array of
arbitrary data :math:`x` as input, transforms it in some way, and then attempts
to recreate the original input as the output of the network.
To evaluate the loss for an autoencoder, only the input data is required. The
default autoencoder model computes the loss using the mean squared error between
the network's output and the input:
.. math::
   \mathcal{L}(X, \theta) = \frac{1}{m} \frac{1}{n} \sum_{i=1}^m
   \left\| F_\theta(x_i) - x_i \right\|_2^2 + R(X, \theta)
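Setting the regularizer :math:`R` aside, the averaging in this loss is easy to
check in plain NumPy. The helper below is an illustrative stand-in, not part of
the ``theanets`` API:

```python
import numpy as np

def autoencoder_mse(outputs, inputs):
    """Mean squared reconstruction error: the 1/m and 1/n factors
    average over the m examples and the n input dimensions."""
    m, n = inputs.shape
    return np.sum((outputs - inputs) ** 2) / (m * n)

x = np.array([[1.0, 2.0], [3.0, 4.0]])
perfect = autoencoder_mse(x, x)        # perfect reconstruction: loss is 0
offset = autoencoder_mse(x + 0.5, x)   # constant 0.5 error per component
```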
To create an autoencoder in ``theanets``, you can create a network class
directly::

  net = theanets.Autoencoder()
Classifier
----------

A :class:`classifier <theanets.feedforward.Classifier>` takes an array of data
:math:`x` and an array of integer class labels :math:`y` as input, and computes
its loss using the categorical cross-entropy between the network's output and
the true labels:

.. math::
   \mathcal{L}(X, Y, \theta) = \frac{1}{m} \sum_{i=1}^m
   - \log F_\theta(x_i)_{y_i} + R(X, \theta)
To create a classifier model in ``theanets``, you can create a network class
directly::

  net = theanets.Classifier()

or you can use an :class:`Experiment <theanets.main.Experiment>`::

  exp = theanets.Experiment(theanets.Classifier)
  net = exp.network
A classifier model requires the following inputs at training time:

- ``x``: A two-dimensional array of input data. Each row of ``x`` is expected to
  be one data item. Each column of ``x`` holds the measurements of a particular
  input variable across all data items.
- ``labels``: A one-dimensional array of target labels. Each element of
  ``labels`` is expected to be the class index for a single data item.
The number of rows in ``x`` must match the number of elements in the ``labels``
vector. Additionally, the values in ``labels`` are expected to range from 0 to
one less than the number of classes in the data being modeled. For example, for
the MNIST digits dataset, which represents digits 0 through 9, the labels array
contains integer class labels 0 through 9.
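As a concrete (hypothetical) example, a minibatch for a 10-class problem with
MNIST-sized inputs might be laid out as follows; the arrays here are random
placeholders rather than real digit data:

```python
import numpy as np

rng = np.random.RandomState(13)

m, n, k = 32, 784, 10                  # examples, input variables, classes
x = rng.randn(m, n).astype('float32')  # one data item per row
labels = rng.randint(0, k, size=m)     # one class index per data item

# Rows of x pair up with elements of labels, and every label
# falls in the range 0 .. k - 1.
paired = x.shape[0] == labels.shape[0]
in_range = 0 <= labels.min() and labels.max() <= k - 1
```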
.. _creating-recurrent-models:
Recurrent Models
================
The three types of feedforward models described above also exist in recurrent
formulations, but in recurrent networks, time is an explicit part of the model.
In ``theanets``, if you wish to include recurrent layers in your model, you must
use a model class from the :mod:`theanets.recurrent` module; this is because
recurrent models require data matrices with an additional dimension to represent
time. In general:

- the data shapes required for a recurrent layer are all one dimension larger
  than the corresponding shapes for a feedforward network,
- the extra dimension is always the 0 axis, and
- the extra dimension represents time.
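A quick NumPy sketch of this shape convention (the sizes here are arbitrary):

```python
import numpy as np

t, m, n = 20, 8, 3   # time steps, examples per minibatch, input variables

# Feedforward minibatch: (examples, variables).
x_ff = np.zeros((m, n), 'float32')

# Recurrent minibatch: the same layout with a leading time axis.
x_rnn = np.zeros((t, m, n), 'float32')

one_dim_larger = x_rnn.ndim == x_ff.ndim + 1
time_on_axis_0 = x_rnn.shape[0] == t
```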
In addition to the three vanilla model types described above, recurrent networks
also allow for the possibility of *predicting future outputs*. This task is
handled by prediction networks.
Autoencoder
-----------

A :class:`recurrent autoencoder <theanets.recurrent.Autoencoder>`, just like its
feedforward counterpart, takes as input a single array of data :math:`X \in
\mathbb{R}^{t \times m \times n}` and attempts to recreate the same data at the
output, under a squared-error loss.
A recurrent autoencoder thus requires the following inputs:

- ``x``: A three-dimensional array of input data. Each element of axis 0 of
  ``x`` is expected to be one moment in time, each element of axis 1 is one data
  item in the batch, and each element of axis 2 holds the measurements of a
  particular input variable.
Layers in a model can use any of the following activation functions and output
modifiers:

============================ =============================================
Description                  :math:`g(z) =`
============================ =============================================
linear                       :math:`z`
logistic sigmoid             :math:`(1 + e^{-z})^{-1}`
hyperbolic tangent           :math:`\tanh(z)`
smooth relu approximation    :math:`\log(1 + \exp(z))`
categorical distribution     :math:`e^z / \sum e^z`
rectified linear             :math:`\max(0, z)`
truncated rectified linear   :math:`\max(0, \min(1, z))`
thresholded rectified linear :math:`z \mbox{ if } z > 1 \mbox{ else } 0`
thresholded linear           :math:`z \mbox{ if } |z| > 1 \mbox{ else } 0`
truncation                   :math:`\min(1, z)`
rectification                :math:`\max(0, z)`
mean-normalization           :math:`z - \bar{z}`
max-normalization            :math:`z / \max |z|`
variance-normalization       :math:`z / \mathbb{E}[(z-\bar{z})^2]`
============================ =============================================
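A few entries from the table are easy to sanity-check with standalone NumPy
definitions; these mirror the formulas above and are not the ``theanets``
implementations:

```python
import numpy as np

def softplus(z):
    # Smooth relu approximation: log(1 + exp(z)).
    return np.log1p(np.exp(z))

def relu(z):
    # Rectified linear: max(0, z).
    return np.maximum(0, z)

def softmax(z):
    # Categorical distribution: e^z / sum(e^z); shifting by the max
    # improves numerical stability without changing the result.
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([-1.0, 0.0, 2.0])
rectified = relu(z)          # negative inputs clamp to zero
probs = softmax(z)           # entries sum to one
sp0 = softplus(np.zeros(1))  # softplus(0) = log(2)
```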
.. _creating-using-weighted-targets:
Using Weighted Targets
======================
By default, the network models available in ``theanets`` treat all inputs as
equal when computing the loss for the model. For example, a regression model
treats an error of 0.1 in component 2 of the output just the same as an error of
0.1 in component 3, and each example of a minibatch is treated with equal
importance when training a classifier.
However, there are times when all inputs to a neural network model are not to be
treated equally. This is especially evident in recurrent models: sometimes, the
inputs to a recurrent network might not contain the same number of time steps,
but because the inputs are presented to the model using a rectangular minibatch
array, all inputs must somehow be made to have the same size. One way to address
this would be to cut off all inputs at the length of the shortest input, but
then the network is not exposed to all input/output pairs during training.
Weighted targets can be used for any model in ``theanets``. For example, an
:class:`autoencoder <theanets.feedforward.Autoencoder>` could use an array of
weights containing zeros and ones to solve a matrix completion task, where the
input array contains some "unknown" values. In such a case, the network is
required to reproduce the known values exactly (so these could be presented to
the model with weight 1), while filling in the unknowns with statistically
reasonable values (which could be presented to the model during training with
weight 0).
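A sketch of that matrix-completion setup in NumPy, with illustrative shapes:
known entries get weight 1 and unknown entries get weight 0.

```python
import numpy as np

rng = np.random.RandomState(42)

x = rng.randn(100, 8).astype('float32')

# Mark roughly 20% of the entries as unknown.
known = rng.rand(*x.shape) > 0.2
weights = known.astype('float32')  # 1 = reproduce exactly, 0 = free to fill in

# Unknown entries can hold any placeholder value; a weight of zero
# removes their error from the loss entirely.
x[~known] = 0.0
```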
As another example, suppose a :class:`classifier
<theanets.feedforward.Classifier>` model is being trained in a binary
classification task where one of the classes---say, class A---is only present
0.1% of the time. In such a case, the network can achieve 99.9% accuracy by
always predicting class B, so during training it might be important to ensure
that errors in predicting A are "amplified" when computing the loss. You could
provide a large weight for training examples in class A to encourage the model
not to miss these examples.
All of these cases are possible to model in ``theanets``; just include
``weighted=True`` when you create your model::

  exp = theanets.Experiment(
      theanets.recurrent.Autoencoder,
      layers=(3, (10, 'rnn'), 3),
      weighted=True)
Then, when training the weighted model, the training and validation datasets
require an additional component: an array of floating-point values with the same
shape as the expected outputs of the model. For example, a non-recurrent
Classifier model would require a weight vector with each minibatch, of the same
shape as the labels array, so that the training and validation datasets would
each have three pieces: ``sample``, ``label``, and ``weight``. Each value in the
weight array is used as the weight for the corresponding error when computing
the loss.
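Continuing the classifier example, one such three-piece batch might be
assembled as follows; the array names and the weight of 50 for the rare class
are illustrative choices, not values prescribed by ``theanets``:

```python
import numpy as np

rng = np.random.RandomState(7)

m, n = 64, 10
sample = rng.randn(m, n).astype('float32')          # input data
label = rng.randint(0, 2, size=m).astype('int32')   # binary class labels

# Amplify errors on the rare class (class 1 here) with a larger weight.
weight = np.where(label == 1, 50.0, 1.0).astype('float32')

# The dataset for a weighted, non-recurrent classifier: three pieces,
# with the weight array shaped exactly like the labels array.
batch = [sample, label, weight]
```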
.. _creating-customizing:
Customizing
===========
The ``theanets`` package tries to strike a good balance between defining
everything known in the neural networks literature, and allowing you as a
programmer to create new stuff with the library. For many off-the-shelf use
cases, the hope is that something in ``theanets`` will work with just a few
lines of code. For more complex cases, you should be able to create an
appropriate subclass and integrate it into your workflow with a little more
effort.
.. _creating-custom-layers:
Defining Custom Layers
----------------------

Layers are the real workhorse in ``theanets``; custom layers can be created to
do all sorts of fun stuff. To create a custom layer, just subclass :class:`Layer
<theanets.layers.Layer>` and give it the functionality you want. As a very
simple example, let's suppose you wanted to create a normal feedforward layer
but did not want to include a bias term::

  import theanets
  import theano.tensor as TT

  class MyLayer(theanets.layers.Layer):
      def transform(self, inputs):
          return TT.dot(inputs, self.find('w'))

      def setup(self):
          self.log_setup(self.add_weights('w'))
Once you've set up your new layer class, it will automatically be registered and
available in :func:`theanets.layers.build` using the name of your class::

  layer = theanets.layers.build('mylayer', nin=3, nout=4)
or, while creating a model::

  net = theanets.Autoencoder(
      layers=(4, ('mylayer', 'linear', 3), 4),
      tied_weights=True,
  )
This example shows how fast it is to create a model that will learn the subspace
of your dataset that spans the most variance---the same subspace spanned by the
principal components.
.. _creating-custom-regularizers:
Defining Custom Regularizers
----------------------------

To create a custom regularizer in ``theanets``, you need to subclass the
appropriate model and provide an implementation of the
:func:`theanets.feedforward.Network.loss` method.
Let's keep going with the example above. Suppose you created a linear
autoencoder model that had a larger hidden layer than your dataset::

  net = theanets.Autoencoder(layers=(4, ('linear', 8), 4), tied_weights=True)
Then, at least in theory, you risk learning an uninteresting "identity" model
such that some hidden units are never used, and the ones that are have weights
equal to the identity matrix. To prevent this from happening, you can impose a
sparsity penalty::

  net = theanets.Autoencoder(
      layers=(4, ('linear', 8), 4),
      tied_weights=True,
      hidden_l1=0.1,
  )
But then you might run into a situation where the sparsity penalty drives some
of the hidden units in the model to zero, to "save" loss during training.
Zero-valued features are probably not so interesting, so we can introduce
another penalty to prevent feature weights from going to zero::

  class RICA(theanets.Autoencoder):
      def loss(self, **kwargs):
          # Start from the loss, monitors, and updates of the base model.
          loss, monitors, updates = super(RICA, self).loss(**kwargs)
          w = kwargs.get('weight_inverse', 0)
          if w > 0:
              # Add the inverse squared norm of each feature (column) in
              # every parameter of every layer, scaled by w.
              loss += w * sum((1 / (p * p).sum(axis=0)).sum()
                              for l in self.layers for p in l.params)
          return loss, monitors, updates
This code adds a new regularizer that penalizes the inverse of the squared
length of each of the weights in the model's layers.
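For a single weight matrix, the penalty can be written down directly in NumPy
(a toy matrix, just to show the behavior):

```python
import numpy as np

def weight_inverse_penalty(p):
    # Sum over columns of 1 / ||column||^2: a column shrinking toward
    # zero makes the penalty blow up, pushing features away from zero.
    return (1.0 / (p * p).sum(axis=0)).sum()

p = np.array([[2.0, 0.1],
              [0.0, 0.0]])
# Columns have squared norms 4.0 and 0.01, so the penalty is 1/4 + 100.
penalty = weight_inverse_penalty(p)
```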
.. _creating-custom-errors:
Defining Custom Error Functions
-------------------------------

It's pretty straightforward to create models in ``theanets`` that use different
error functions from the predefined :class:`Classifier
<theanets.feedforward.Classifier>` (which uses categorical cross-entropy) and
:class:`Autoencoder <theanets.feedforward.Autoencoder>` and :class:`Regressor
<theanets.feedforward.Regressor>` (which both use mean squared error, MSE).
To define a model with a new cost function, just create a new :class:`Network
<theanets.feedforward.Network>` subclass and override the ``error`` method. For
example, to create a regression model that uses mean absolute error (MAE)
instead of MSE::

  class MaeRegressor(theanets.Regressor):
      def error(self, output):
          return abs(output - self.targets).mean()
Your cost function must return a Theano expression that reflects the cost for
your model.
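As a quick check of the difference between the two error functions, here they
are computed directly in NumPy on a toy output/target pair:

```python
import numpy as np

output = np.array([1.0, 2.0, 4.0])
targets = np.array([1.0, 2.0, 1.0])

mae = np.abs(output - targets).mean()   # mean absolute error
mse = ((output - targets) ** 2).mean()  # mean squared error

# The single error of 3 contributes linearly to the MAE
# but quadratically to the MSE.
```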