
Recurrent Neural Networks

A non-linear transformation of the sum of the two matrix multiplications (for example, using the tanh or ReLU activation functions) becomes the RNN layer's output, yt.
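In standard notation, and assuming a hidden-to-hidden weight matrix W_hh, an input-to-hidden weight matrix W_hx, and a bias term b (these symbol names are illustrative rather than taken from the text above), this update can be written as

    y_t = \tanh(W_{hh} \, y_{t-1} + W_{hx} \, x_t + b)

where y_{t-1} is the state produced at the previous step, x_t is the current input, and tanh could be swapped for another non-linearity such as ReLU.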

The right side of the equation shows the effect of unrolling the recurrent relationship,
highlighting the repeated application of the same transformations and the resulting state
that combines information from past sequence elements with the current input, or context.
An alternative formulation connects the context vector to the first hidden state only; we will outline additional options for modifying this baseline architecture in the following section.
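To make the repeated application of the same transformations concrete, here is a minimal NumPy sketch of the unrolled recurrence; the weight names W_hh and W_hx, the bias b, and the dimensions in the example are illustrative assumptions, not definitions from the text:

import numpy as np

def unrolled_rnn(x_seq, h0, W_hh, W_hx, b):
    # The same weight matrices are reused at every step of the unrolled
    # recurrence; each new state mixes the accumulated context h with the
    # current input x_t.
    h = h0
    states = []
    for x_t in x_seq:
        h = np.tanh(W_hh @ h + W_hx @ x_t + b)
        states.append(h)
    return states

# Example: five 3-dimensional inputs mapped to a 4-dimensional state.
rng = np.random.default_rng(0)
states = unrolled_rnn(
    x_seq=rng.normal(size=(5, 3)),
    h0=np.zeros(4),
    W_hh=rng.normal(size=(4, 4)) * 0.1,
    W_hx=rng.normal(size=(4, 3)) * 0.1,
    b=np.zeros(4),
)

Feeding the context vector into the first hidden state only, as in the alternative formulation mentioned above, would amount to replacing h0 with that context vector.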

The recurrent connections between subsequent hidden states are critical in rendering the RNN model universal, in the sense that it can compute any (discrete) function that a Turing machine can compute.

Backpropagation through time


The unrolled computational graph shown in the preceding diagram highlights that the
learning process necessarily encompasses all time steps included in a given input sequence.
The backpropagation algorithm, which updates the weight parameters based on the
gradient of the loss function with respect to the parameters, involves a forward pass from
left to right along the unrolled computational graph, followed by a backward pass in the
opposite direction.
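A compact NumPy sketch of this procedure follows; it assumes, for simplicity, that the loss depends only on the final hidden state, and the parameter names W_hh, W_hx, and b are again illustrative. The forward pass moves left to right over the unrolled graph while caching the hidden states, and the backward pass then moves right to left, accumulating the gradients of the shared weights across all time steps:

import numpy as np

def bptt(x_seq, h0, W_hh, W_hx, b, dL_dh_last):
    # Forward pass (left to right): cache every hidden state, because the
    # backward pass needs them to evaluate the local tanh derivatives.
    hs = [h0]
    for x_t in x_seq:
        hs.append(np.tanh(W_hh @ hs[-1] + W_hx @ x_t + b))

    # Backward pass (right to left): accumulate the gradients of the weight
    # parameters, which are shared across all time steps.
    dW_hh = np.zeros_like(W_hh)
    dW_hx = np.zeros_like(W_hx)
    db = np.zeros_like(b)
    dh = dL_dh_last                        # gradient flowing into the last state
    for t in reversed(range(len(x_seq))):
        dz = dh * (1.0 - hs[t + 1] ** 2)   # backprop through tanh
        dW_hh += np.outer(dz, hs[t])
        dW_hx += np.outer(dz, x_seq[t])
        db += dz
        dh = W_hh.T @ dz                   # pass the gradient to the previous step
    return dW_hh, dW_hx, db

For a loss computed at every time step rather than only at the end, the per-step loss gradient would simply be added to dh inside the backward loop.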
