Humans and other animals process information with neural networks. These are formed from trillions of neurons (nerve cells) exchanging brief electrical pulses called action potentials.
Computer algorithms that mimic these biological structures are
formally called artificial neural networks to distinguish them
from the squishy things inside of animals. However, most
scientists and engineers are not this formal and use the
term neural network to include both biological and
nonbiological systems.
Neural network research is motivated by two desires: to obtain a
better understanding of the human brain, and to develop
computers that can deal with abstract and poorly defined
problems. For example, conventional computers have trouble
understanding speech and recognizing people's faces. In
comparison, humans do extremely well at these tasks.
Many different neural network structures have been tried, some
based on imitating what a biologist sees under the microscope,
some based on a more mathematical analysis of the problem.
The most commonly used structure is shown in Fig. 26-5. This
neural network is formed in three layers, called the input layer,
hidden layer, and output layer. Each layer consists of one or
more nodes, represented in this diagram by the small circles.
The lines between the nodes indicate the flow of information
from one node to the next. In this particular type of neural
network, the information flows only from the input to the output
(that is, from left-to-right). Other types of neural networks have
more intricate connections, such as feedback paths.
The nodes of the input layer are passive, meaning they do not
modify the data. They receive a single value on their input, and
duplicate the value to
their multiple outputs. In comparison, the nodes of the hidden
and output layer are active. This means they modify the data as
shown in Fig. 26-6. The variables X1₁, X1₂, …, X1₁₅ hold the data
to be evaluated (see Fig. 26-5). For example, they may be pixel
values from an image, samples from an audio signal, stock
market prices on successive days, etc. They may also be the
output of some other algorithm, such as the classifiers in our
cancer detection example: diameter, brightness, edge sharpness,
etc.
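As a rough sketch of how the network of Fig. 26-5 and Fig. 26-6 behaves, the snippet below passes the input values straight through the passive input nodes and applies a weighted sum followed by a squashing function in the active nodes. The layer sizes, random weights, and the sigmoid squashing function are illustrative assumptions, not values taken from the figures.

```python
import numpy as np

def sigmoid(x):
    """Squashing function applied by the active nodes."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes: 15 input nodes, 4 hidden nodes, 1 output node.
rng = np.random.default_rng(0)
w_hidden = rng.normal(size=(4, 15))   # weights into the hidden layer
w_output = rng.normal(size=(1, 4))    # weights into the output layer

x = rng.random(15)                    # e.g. 15 pixel values or classifier outputs

# Input nodes are passive: they simply duplicate x to every hidden node.
hidden = sigmoid(w_hidden @ x)        # active hidden nodes: weighted sum + squash
output = sigmoid(w_output @ hidden)   # active output node
print(output)
```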

A neural network is a complex mesh of artificial neurons that imitates how the brain works. Each neuron takes in input parameters with their associated weights and biases, and we compute the weighted sum of the ‘activated’ neurons. The activation function gets to decide which neurons push their values forward into the next layer. How this works is what we are going to see next.

The Underlying Concept


During forward propagation, our artificial neurons receive inputs from different parameters or features. Each input carries its own weight and bias, and the inputs can be interdependent, so that their combination changes the final prediction. This phenomenon is referred to as the interaction effect.
A good example is fitting a regression model to a dataset of diabetic patients, where the goal is to predict whether a person is at risk of diabetes from their body weight and height. The same body weight indicates a greater risk of diabetes for a shorter person than for a taller person, who relatively speaking has a better health index. There are of course other parameters which we are not considering at the moment. We say there is an interaction effect between height and body weight.

The activation function takes these interaction effects between parameters into account, applies a transformation, and then decides which neurons pass their values forward into the next layer.

Activation functions
So what does an artificial neuron do? Simply put, it calculates a
“weighted sum” of its inputs, adds a bias, and then decides
whether it should be “fired” or not (yes, an activation
function really does this, but let’s go with the flow for a moment).

So consider a neuron computing

Y = Σ (weight × input) + bias

Now, the value of Y can be anything ranging from −inf to +inf. The neuron really doesn’t know the bounds of this value. So how do we decide whether the neuron should fire or not? (Why this firing pattern? Because we learnt it from biology: that is the way the brain works, and the brain is a working testimony of an awesome and intelligent system.)

We decided to add “activation functions” for this purpose: they check the Y value produced by a neuron and decide whether outside connections should consider this neuron as “fired” or not. Or rather, let’s say, “activated” or not.

Step function
The first thing that comes to our minds is how about a threshold
based activation function? If the value of Y is above a certain
value, declare it activated. If it’s less than the threshold, then say
it’s not. Hmm great. This could work!

Activation function A = “activated” if Y > threshold else not

Alternatively, A = 1 if Y > threshold, 0 otherwise.

Well, what we just did is a “step function”; see the figure below.
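As a minimal sketch of this idea (the threshold value below is arbitrary, chosen only for illustration):

```python
def step_activation(y, threshold=0.0):
    """Step function: the neuron is 'activated' (1) only if y exceeds the threshold."""
    return 1 if y > threshold else 0

print(step_activation(2.5))    # 1 -> fired
print(step_activation(-0.7))   # 0 -> not fired
```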

There are a number of common activation functions in use with neural networks. This is not an exhaustive list.

The activation function of a node defines the output of that node given an input or set of inputs. A standard computer chip circuit can be seen as a digital network of activation functions that can be "ON" (1) or "OFF" (0), depending on input. This is similar to the behavior of the linear perceptron in neural networks. However, it is the nonlinear activation function that allows such networks to compute nontrivial problems using only a small number of nodes. In artificial neural networks this function is also called the transfer function.
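A classic illustration of this point is XOR, which a single linear perceptron cannot compute but a tiny network of nonlinear threshold units can. The weights below are hand-picked for illustration and are not from the text.

```python
def step(x):
    return 1 if x > 0 else 0

def xor_net(x1, x2):
    # Hidden unit h1 fires if at least one input is on (an OR gate).
    h1 = step(x1 + x2 - 0.5)
    # Hidden unit h2 fires only if both inputs are on (an AND gate).
    h2 = step(x1 + x2 - 1.5)
    # Output fires when h1 is on but h2 is off, i.e. exactly one input is on.
    return step(h1 - h2 - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))   # reproduces XOR: 0, 1, 1, 0
```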

The artificial neuron shown in Figure 2.1 is a very simple processing unit. The neuron has a fixed number of inputs n; each input is connected to the neuron by a weighted link wᵢ. The neuron sums up the net input according to the equation net = ∑ᵢ₌₁ⁿ xᵢwᵢ, or expressed as vectors, net = xᵀw. To calculate the output, an activation function f is applied to the net input of the neuron. This function is either a simple threshold function or a continuous nonlinear function. Two often-used activation functions are:

fC(net) = 1 / (1 + e^(−net))   (the continuous logistic function)

fT(net) = 1 if net > θ, otherwise 0   (the threshold function)

The artificial neuron is an abstract model of the biological
neuron. The strength of a connection is coded in the weight. The
intensity of the input signal is modeled by using a real number
instead of a temporal summation of spikes. The artificial neuron
works in discrete time steps; the inputs are read and processed at
one moment in time.
There are many different learning methods possible for a single neuron. Most of the supervised methods are based on the idea of changing the weights in a direction that decreases the difference between the calculated output and the desired output. Examples of such rules are the Perceptron Learning Rule, the Widrow-Hoff Learning Rule, and the Gradient Descent Learning Rule.
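As a minimal sketch of the first of these, the Perceptron Learning Rule, on a toy problem: the learning rate, epoch count, and the AND task below are illustrative assumptions, not taken from the text.

```python
import numpy as np

def train_perceptron(X, targets, lr=0.1, epochs=20):
    """Perceptron Learning Rule: nudge the weights so the calculated output
    moves toward the desired output."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, targets):
            y = 1 if x @ w + b > 0 else 0   # threshold activation
            w += lr * (t - y) * x            # move weights toward the target
            b += lr * (t - y)
    return w, b

# Toy data: learn a logical AND of two inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, targets)
print(w, b)
```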

A neural network is put together by hooking together many of our simple “neurons,” so that the output of a neuron can be the input of another. For example, here is a small neural network:

In this figure, we have used circles to also denote the inputs to the network. The circles labeled “+1” are called bias units, and correspond to the intercept term. The leftmost layer of the network is called the input layer, and the rightmost layer the output layer (which, in this example, has only one node).
The middle layer of nodes is called the hidden layer, because its
values are not observed in the training set. We also say that our
example neural network has 3 input units (not counting the bias
unit), 3 hidden units, and 1 output unit.
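A minimal forward pass through this 3-3-1 network could look like the sketch below; the random weights and the sigmoid activation are assumptions for illustration, since the figure itself does not fix them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
W1 = rng.normal(size=(3, 3))   # weights from the 3 inputs to the 3 hidden units
b1 = rng.normal(size=3)        # weights on the "+1" bias unit feeding the hidden layer
W2 = rng.normal(size=(1, 3))   # weights from the 3 hidden units to the 1 output unit
b2 = rng.normal(size=1)        # weight on the "+1" bias unit feeding the output

x = np.array([0.5, -1.2, 3.0])         # the 3 input units
hidden = sigmoid(W1 @ x + b1)          # hidden-layer activations
output = sigmoid(W2 @ hidden + b2)     # the single output unit
print(output)
```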
A feedforward neural network is an artificial neural
network wherein connections between the units do not form a
cycle. As such, it is different from recurrent neural networks.
The feedforward neural network was the first and simplest type
of artificial neural network devised. In this network, the
information moves in only one direction, forward, from the
input nodes, through the hidden nodes (if any) and to the output
nodes. There are no cycles or loops in the network.

The simplest kind of neural network is a single-layer perceptron network, which consists of a single layer of output
nodes; the inputs are fed directly to the outputs via a series of
weights. In this way it can be considered the simplest kind of
feed-forward network. The sum of the products of the weights
and the inputs is calculated in each node, and if the value is
above some threshold (typically 0) the neuron fires and takes the
activated value (typically 1); otherwise it takes the deactivated
value (typically -1). Neurons with this kind of activation
function are also called artificial neurons or linear threshold
units. In the literature the term perceptron often refers to
networks consisting of just one of these units. A similar neuron
was described by Warren McCulloch and Walter Pitts in the
1940s.

In this single-layer feedforward neural network, the network’s inputs are directly connected to the output layer perceptrons, Z1 and Z2.
The output perceptrons use activation functions, g1 and g2, to
produce the outputs Y1 and Y2.
Since each output perceptron computes a weighted sum of the network’s inputs, the outputs are Y1 = g1(∑ᵢ wᵢ₁Xᵢ + b₁) and Y2 = g2(∑ᵢ wᵢ₂Xᵢ + b₂).
When the activation functions g1 and g2 are identity activation
functions, the single-layer neural net is equivalent to a linear
regression model. Similarly, if g1 and g2 are logistic activation
functions, then the single-layer neural net is equivalent to
logistic regression.  Because of this correspondence between
single-layer neural networks and linear and logistic regression,
single-layer neural networks are rarely used in place of linear
and logistic regression.
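To make this correspondence concrete, here is a small sketch with made-up weights: the same output perceptron reproduces a linear-regression prediction when its activation is the identity, and a logistic-regression probability when its activation is the logistic function.

```python
import numpy as np

def single_layer_output(x, w, b, g):
    """One output perceptron: activation applied to a weighted sum of the inputs."""
    return g(x @ w + b)

x = np.array([1.5, -0.3, 2.0])   # inputs (illustrative values)
w = np.array([0.4, 0.1, -0.2])   # weights (illustrative values)
b = 0.05                         # bias / intercept term

identity = lambda z: z
logistic = lambda z: 1.0 / (1.0 + np.exp(-z))

# With the identity activation this is exactly a linear-regression prediction ...
print(single_layer_output(x, w, b, identity))
# ... and with the logistic activation it is exactly a logistic-regression probability.
print(single_layer_output(x, w, b, logistic))
```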
The next most complicated neural network is one with two
layers.  This extra layer is referred to as a hidden layer.  In
general there is no restriction on the number of hidden layers. 
However, it has been shown mathematically that a two-layer neural network can approximate any continuous function to arbitrary accuracy, provided the number of perceptrons in the hidden layer is unlimited.
However, increasing the number of perceptrons increases the
number of weights that must be estimated in the network, which
in turn increases the execution time for the network.  Instead of
increasing the number of perceptrons in the hidden layers to
improve accuracy, it is sometimes better to add additional
hidden layers, which typically reduce both the total number of
network weights and the computational time.  However, in
practice, it is uncommon to see neural networks with more than
two or three hidden layers.
Recurrent networks

Recurrent Neural Networks (RNNs) are popular models that have shown great promise in many NLP
tasks. But despite their recent popularity I’ve only found a limited number of resources that thoroughly
explain how RNNs work, and how to implement them. That’s what this tutorial is about. It’s a multi-part
series in which I’m planning to cover the following:

1. Introduction to RNNs (this post)
2. Implementing a RNN using Python and Theano
3. Understanding the Backpropagation Through Time (BPTT) algorithm and the vanishing gradient problem
4. Implementing a GRU/LSTM RNN
As part of the tutorial we will implement a recurrent neural network based
language model. The applications of language models are two-fold. First, a language model allows us to score arbitrary sentences based on how likely they are to occur in the real world. This gives us a measure of grammatical and semantic
correctness. Such models are typically used as part of Machine Translation
systems. Secondly, a language model allows us to generate new text (I think
that’s the much cooler application). Training a language model on
Shakespeare allows us to generate Shakespeare-like text. 

I’m assuming that you are somewhat familiar with basic Neural Networks. If
you’re not, you may want to head over to Implementing A Neural Network
From Scratch,  which guides you through the ideas and implementation
behind non-recurrent networks.
WHAT ARE RNNS?

The idea behind RNNs is to make use of sequential information. In a
traditional neural network we assume that all inputs (and outputs) are
independent of each other. But for many tasks that’s a very bad idea. If you
want to predict the next word in a sentence you better know which words
came before it. RNNs are called recurrent because they perform the same
task for every element of a sequence, with the output being dependent on the
previous computations. Another way to think about RNNs is that they have a
“memory” which captures information about what has been calculated so far.
In theory RNNs can make use of information in arbitrarily long sequences, but
in practice they are limited to looking back only a few steps (more on this
later). Here is what a typical RNN looks like:

A recurrent neural network and the unfolding in time of the computation involved in its forward computation. Source: Nature

The above diagram shows a RNN being unrolled (or unfolded) into a full
network. By unrolling we simply mean that we write out the network for the
complete sequence. For example, if the sequence we care about is a
sentence of 5 words, the network would be unrolled into a 5-layer neural
network, one layer for each word. The formulas that govern the computation
happening in a RNN are as follows:
- x_t is the input at time step t. For example, x_1 could be a one-hot vector corresponding to the second word of a sentence.
- s_t is the hidden state at time step t. It’s the “memory” of the network. s_t is calculated based on the previous hidden state and the input at the current step: s_t = f(U x_t + W s_{t-1}). The function f usually is a nonlinearity such as tanh or ReLU. s_{-1}, which is required to calculate the first hidden state, is typically initialized to all zeroes.
- o_t is the output at step t. For example, if we wanted to predict the next word in a sentence it would be a vector of probabilities across our vocabulary: o_t = softmax(V s_t).
There are a few things to note here:

- You can think of the hidden state s_t as the memory of the network. s_t captures information about what happened in all the previous time steps. The output at step t is calculated solely based on the memory at time t. As briefly mentioned above, it’s a bit more complicated in practice because s_t typically can’t capture information from too many time steps ago.
- Unlike a traditional deep neural network, which uses different parameters at each layer, a RNN shares the same parameters (U, V, W above) across all steps. This reflects the fact that we are performing the same task at each step, just with different inputs. This greatly reduces the total number of parameters we need to learn.
- The above diagram has outputs at each time step, but depending on the task this may not be necessary. For example, when predicting the sentiment of a sentence we may only care about the final output, not the sentiment after each word. Similarly, we may not need inputs at each time step. The main feature of an RNN is its hidden state, which captures some information about a sequence.
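Putting the formulas together, here is a minimal sketch of the unrolled forward pass, with toy sizes and random parameters standing in for a trained model and tanh assumed for the nonlinearity f. Note that the same U, W, and V are reused at every step.

```python
import numpy as np

def rnn_forward(inputs, U, W, V):
    """Unrolled forward pass of a vanilla RNN:
    s_t = tanh(U x_t + W s_{t-1}),  o_t = softmax(V s_t)."""
    hidden_size = W.shape[0]
    s = np.zeros(hidden_size)            # s_{-1}: first hidden state is all zeroes
    outputs = []
    for x in inputs:                     # one step per element of the sequence
        s = np.tanh(U @ x + W @ s)       # new memory from current input + previous memory
        z = V @ s
        o = np.exp(z) / np.exp(z).sum()  # softmax over the vocabulary
        outputs.append(o)
    return outputs

# Toy sizes: vocabulary of 5 words, hidden state of size 8.
rng = np.random.default_rng(2)
U = rng.normal(scale=0.1, size=(8, 5))
W = rng.normal(scale=0.1, size=(8, 8))
V = rng.normal(scale=0.1, size=(5, 8))

sentence = [np.eye(5)[i] for i in (0, 3, 1)]   # three one-hot word vectors
for o in rnn_forward(sentence, U, W, V):
    print(o)
```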

WHAT CAN RNNS DO?

RNNs have shown great success in many NLP tasks. At this point I should
mention that the most commonly used type of RNNs are LSTMs, which are
much better at capturing long-term dependencies than vanilla RNNs are. But
don’t worry, LSTMs are essentially the same thing as the RNN we will develop
in this tutorial, they just have a different way of computing the hidden state.
We’ll cover LSTMs in more detail in a later post. Here are some example
applications of RNNs in NLP (by no means an exhaustive list).
