6 Calculus on Computational Graphs

6.1 Introduction
Fast, accurate and reliable computational schemes are essential when
implementing the complex systems required in deep learning applications. One
technique for achieving this is the so-called computational graph.
Computational graphs break a complex computation down into small,
executable steps that can be performed quickly with pencil and paper,
and better still with computers. In most cases, loops that repeat
the same algorithm, and that would otherwise waste computation time on loop
overhead, become much easier to handle. Computational graphs ease the training
of neural networks with the gradient descent algorithm, making them many times
faster than traditional implementations of neural networks.
Computational graphs have also found applications in weather forecasting
by reducing the associated computation time. Their strength is the fast
computation of derivatives; the technique is also known by a different name,
"reverse-mode differentiation."
Beyond its use in deep learning, backpropagation is a powerful
computational tool in many other areas, such as weather forecasting and the
analysis of numerical stability. In many ways, computational graph theory is
similar to logic gate operations in digital circuits, where dedicated logic
operations are undertaken with gates such as AND, OR, NOR and NAND in the
implementation of many binary operations. While the use of logic gates leads
to complex systems such as multiplexers, adders, multipliers and more complex
digital circuits, computational graphs have found their way into deep learning
operations involving derivatives, additions, scaling and multiplications of
real numbers by simplifying these operations.


6.1.1 Elements of Computational Graphs


Computational graphs are a useful means of breaking complex mathematical
computations and operations down into micro-computations, thereby making
them much easier to solve in a sequential manner. They also make it easier
to track computations and to understand where solutions break down. A
computational graph is a connection of nodes and links: nodes represent
variables, and links represent functions and operations.
The range of operations includes addition, multiplication, subtraction,
exponentiation and many more not listed here. Consider Figure 6.1: the three
nodes represent three variables a, b and c. The variable c is the result of
the operation of the function f on a and b. This means we can write the
result as

c = f(a, b)    (6.1)
Computational graphs allow nesting of operations, which makes it possible to
solve more complex problems. Consider the nesting of operations shown in the
computational graph of Figure 6.2. Clearly, from these three operations, we
see that it is a lot easier to undertake the more complex operation when y is
computed as in the equation

y = h(g(f(x)))    (6.2)

From Figure 6.2, as from Equation (6.2), the innermost operation f(x) is
performed first. This is followed by the second operation, g(·), and lastly
by the h(·) operation.
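
As a minimal sketch of this sequencing, the Python snippet below performs the
three operations in the order the graph dictates. The concrete choices of f,
g and h are illustrative assumptions; the text leaves them abstract.

```python
# A minimal sketch of the nested computation y = h(g(f(x))).
# The concrete functions f, g and h are illustrative assumptions,
# not taken from the text.

def f(x):
    return x + 1        # innermost operation

def g(u):
    return 2 * u        # second operation

def h(v):
    return v ** 2       # outermost operation

x = 3
u = f(x)                # first:  u = f(x) = 4
v = g(u)                # second: v = g(u) = 8
y = h(v)                # last:   y = h(v) = 64
print(u, v, y)          # 4 8 64
```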

Figure 6.1 A basic computational graph: nodes a and b feed the operation f, which produces c = f(a, b).

Figure 6.2 A chain of nested operations: u = f(x); v = g(u); y = h(v).


Consider the case where f is the operation of addition (+). The result c is
then given by the expression

c = (a + b)

When the operation is multiplication (×), the result is given by the
expression

c = (a × b)

Since f is a general operator, we may use any operator in the diagram and
write an equivalent expression for c.

6.2 Compound Expressions


For a computer to use compound expressions with a computational graph
efficiently, it is essential to factor the expressions into unit cells. For
example, the expression r = p × q = (x + y)(y + 1) can first be reduced to
two unit cells, or two terms, followed by computation of the product of
those terms. The product term is r = p × q:

r = p × q = (x + y)(y + 1) = xy + y² + x + y
p = x + y
q = y + 1

Each computational operation or component is created, and a graph is then
built out of them by connecting them appropriately with arrows. Each arrow
originates from a term and ends at the unit term that it helps to build, as
in Figure 6.3.
This form of abstraction is extremely useful in building neural networks
and deep learning frameworks. It is also useful in the programming of
expressions, for example in operations involving parallel support vector
machines (PSVM). In Figure 6.4, once the values at the roots of the
computational graph are known, the solution of the expression becomes easy
and trivial. Take the case where x = 3 and y = 4; the resulting solutions
are shown in Figure 6.5.
The evaluation of the compound expression is 35.
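
A short Python sketch of this evaluation, using the unit cells p and q from
the text, is given below; the variable names follow the expression, and the
printed result matches the value of 35 obtained above.

```python
# Evaluating the compound expression r = (x + y)(y + 1)
# by factoring it into the unit cells p and q from the text.

x, y = 3, 4
p = x + y        # unit cell 1: p = 7
q = y + 1        # unit cell 2: q = 5
r = p * q        # product node: r = 35
print(r)         # 35, matching the evaluation in the text
```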

Figure 6.3 A unit cell: inputs a and b feed the operation f to produce c.


Figure 6.4 Computational graphs of compound expressions.

Figure 6.5 Evaluating compound expressions.

6.3 Computing Partial Derivatives


One of the areas where computational graphs are used widely in neural
network applications is the computation of derivatives of variables and
functions in simple form. For derivatives, the graph simplifies the use of
the chain rule. Consider computing the partial derivative of y with respect
to b, where

y = (a + b) × (b − 3) = c × d
c = (a + b);  d = (b − 3)    (6.3)

The partial derivative of y with respect to b is

∂y/∂b = ∂y/∂c × ∂c/∂b + ∂y/∂d × ∂d/∂b    (6.4)


Figure 6.6 Computational graph for Equation (6.3): a and b feed c = a + b; b and the constant −3 feed d = b − 3.

Figure 6.7 Computation of partial derivatives superimposed on the graph of Figure 6.6.

The computational graph covering Equation (6.3) is given in Figure 6.6, and
the partial derivative is given in Equation (6.4). From Equation (6.3), the
individual partial derivatives are given as

∂y/∂c = d = (b − 3)
∂y/∂d = c = (a + b)
∂c/∂a = 1;  ∂c/∂b = 1;  ∂d/∂b = 1    (6.5)
∂y/∂b = (b − 3) × 1 + (a + b) × 1
These derivatives are superimposed on the computational graph in Figure 6.7.
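
A small Python sketch of this two-path accumulation is given below. The
values a = 2 and b = 5 are illustrative assumptions; the computed derivative
matches the analytic result (b − 3) + (a + b).

```python
# A sketch of Equations (6.4) and (6.5): computing dy/db for
# y = (a + b)(b - 3) by summing the two paths through c and d.

a, b = 2.0, 5.0          # illustrative input values

# forward pass through the graph
c = a + b                # c = a + b = 7
d = b - 3                # d = b - 3 = 2
y = c * d                # y = 14

# local derivatives at each node
dy_dc = d                # dy/dc = d
dy_dd = c                # dy/dd = c
dc_db = 1.0              # dc/db = 1
dd_db = 1.0              # dd/db = 1

# chain rule: sum over the two paths from b to y
dy_db = dy_dc * dc_db + dy_dd * dd_db
print(dy_db)             # (b - 3) + (a + b) = 2 + 7 = 9
```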

6.3.1 Partial Derivatives: Two Cases of the Chain Rule


The chain rule may be applied in various circumstances. Two cases are of
interest: the linear case (Figure 6.8) and the loop case (Figure 6.9). These two
cases are illustrated in this section.


Figure 6.8 Linear chain rule: x → y → z through y = g(x) and z = h(y), with dz/dx = ∂z/∂y × ∂y/∂x, and a perturbation propagating as Δx → Δy → Δz.

Figure 6.9 Loop chain rule: s feeds two branches x and y, which both feed z; a perturbation Δs produces Δx and Δy, which combine into Δz.

6.3.1.1 Linear chain rule


The objective in the linear case is to find the derivative of the output z
with respect to the input x, that is, dz/dx. The output depends on x through
the intermediate variable y, and hence the derivative of z will involve both
variables. From the diagram, we can write generally

z = f(x);  z = h(y);  y = g(x)    (6.6)

Therefore, the derivative of z with respect to x passes through y and is a
product of two terms:

dz/dx = ∂z/∂y × ∂y/∂x    (6.7)
Thus, the derivative of z with respect to x is computed as a product of two
partial derivatives of variables leading to it. This is shown in Figure 6.8.
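
A numerical sketch of the linear chain rule is shown below. The functions g
and h are illustrative assumptions; a finite-difference check confirms that
the product of the two local derivatives equals the end-to-end derivative.

```python
# Linear chain rule (Equation 6.7): dz/dx = dz/dy * dy/dx
# for the chain y = g(x), z = h(y). g and h are illustrative choices.
import math

def g(x):
    return x ** 2                 # y = g(x)

def h(y):
    return math.sin(y)            # z = h(y)

x = 1.5
y = g(x)
dz_dy = math.cos(y)               # local derivative of h at y
dy_dx = 2 * x                     # local derivative of g at x
dz_dx = dz_dy * dy_dx             # chain rule along the single path

# finite-difference check of the same derivative
eps = 1e-6
approx = (h(g(x + eps)) - h(g(x - eps))) / (2 * eps)
print(dz_dx, approx)              # the two values agree closely
```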

6.3.1.2 Loop chain rule


The loop chain rule in Figure 6.9 is an application of the linear chain
rule: each arm of the loop is treated with the linear chain rule, and the
objective is to find the derivative of the output z by summing the
contributions along the two arms. Here z is a function of s, z = f(s),
through two branches involving x and y, respectively. In the upper branch,
x = g(s) and z = h(x).

Figure 6.10 Multiple loop chain rule: s feeds N branches x_1, x_2, ..., x_N, which all feed z; a perturbation Δs produces Δz.

In the lower branch, y = g(s) and z = h(y). Two branches contribute to the
value of z, so that z = p(x, y). Therefore, there will also be a sum of
partial derivatives coming from the two branches. The derivative of z with
respect to s is obtained as

dz/ds = ∂z/∂x × dx/ds + ∂z/∂y × dy/ds    (6.8)
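
The following sketch applies Equation (6.8) to a concrete loop. The branch
functions (x = s², y = 3s) and the combining function z = x × y are
illustrative assumptions; the summed branch contributions match the direct
derivative of z = 3s³.

```python
# Loop chain rule (Equation 6.8): two branches x and y from s feed z.
# The branch and combining functions are illustrative assumptions.

s = 2.0
x = s ** 2              # upper branch: x = g(s)
y = 3 * s               # lower branch: a different map, for clarity
z = x * y               # z = p(x, y) = 3 * s**3

dz_dx, dz_dy = y, x     # partials of z = x * y
dx_ds = 2 * s           # dx/ds
dy_ds = 3.0             # dy/ds

# sum the contributions of the two branches
dz_ds = dz_dx * dx_ds + dz_dy * dy_ds
print(dz_ds)            # 36.0, equal to 9s^2 at s = 2
```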

6.3.1.3 Multiple loop chain rule


Generally, if z is computed from N loops, so that it is a function of N
variables, z = k(x_1, x_2, . . . , x_N), then N branches contribute to the
output z. Therefore, the total derivative of z with respect to the input is
a sum of N partial-derivative products.
The general partial derivative expression, shown as Figure 6.10, is

dz/ds = ∂z/∂x_1 × dx_1/ds + ∂z/∂x_2 × dx_2/ds + · · · + ∂z/∂x_N × dx_N/ds
      = Σ_{n=1}^{N} ∂z/∂x_n × dx_n/ds    (6.9)

The general case is more suited to deep learning situations where there are
many stages in the neural network and many branches are also involved.
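
As a sketch of Equation (6.9), the snippet below sums per-branch products
over N branches. The branch functions x_n = (n + 1)s and the combining
function z = Σ x_n² are illustrative assumptions.

```python
# Multiple loop chain rule (Equation 6.9): N branches x_n = g_n(s)
# feed z = k(x_1, ..., x_N). The concrete functions are illustrative.

N = 4
s = 1.5
xs = [(n + 1) * s for n in range(N)]       # x_n = (n + 1) * s
z = sum(x ** 2 for x in xs)                # z = k(x_1, ..., x_N)

dz_dxn = [2 * x for x in xs]               # dz/dx_n = 2 * x_n
dxn_ds = [n + 1 for n in range(N)]         # dx_n/ds = n + 1

# total derivative: sum of the per-branch products
dz_ds = sum(p * q for p, q in zip(dz_dxn, dxn_ds))
print(dz_ds)    # 90.0, equal to 2s * (1 + 4 + 9 + 16) = 60s at s = 1.5
```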

6.4 Computing of Integrals


In this section, we introduce the use of computational graphs for computing
integrals using some well-known traditional approaches, including the
trapezoidal and Simpson rules.


6.4.1 Trapezoidal Rule


Integration is traditionally the finding of the area under a curve. This
area has two dimensions: the discrete step in the sampling of the function
multiplied by the amplitude of the function at that discrete step. Thus, the
integral of the function f(x) may be computed using the trapezoidal rule
with the expression
∫_a^b f(x) dx ≈ (Δx/2)[f(x_0) + 2f(x_1) + 2f(x_2) + · · · + 2f(x_{N−1}) + f(x_N)]
             = (Δx/2)[f(x_0) + f(x_N)] + Δx Σ_{i=1}^{N−1} f(x_i)    (6.10)

so that Δx = (b − a)/N and x_i = a + iΔx, where N is the number of discrete
sampling steps.


Therefore, the computational graph for this integration method can be easily
drawn by first computing the values of the function at N + 1 locations starting
from zero to N in ∆x discrete steps. Just how big N is will be determined by
some error bound, which has been derived to be traditionally

|E| ≤ K_2(b − a)³/(12N²)

where K_2 bounds the second derivative of the function f(x) on [a, b]. This sets the
bound on the error in the integration value obtained by using the trapezoidal
rule for the function. Thus, once the choice of N is made, an error bound
has been set for the result of the integration. This error may be reduced by
changing the value of N, the number of terms in the summation. Figure 6.11
shows how to compute an integral using the trapezoidal rule.
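
A minimal implementation of Equation (6.10) is sketched below, organised the
way the computational graph suggests: evaluate f at the N + 1 grid points,
then combine the endpoint and interior contributions. The function and
interval in the usage line are illustrative.

```python
# Trapezoidal rule (Equation 6.10): sample f at N + 1 points and
# combine with endpoint weight 1/2 and interior weight 1.

def trapezoid(f, a, b, N):
    dx = (b - a) / N
    xs = [a + i * dx for i in range(N + 1)]   # x_0, ..., x_N
    interior = sum(f(x) for x in xs[1:-1])    # x_1, ..., x_{N-1}
    return dx / 2 * (f(xs[0]) + f(xs[-1])) + dx * interior

# usage: integrate f(x) = x^2 on [0, 1]; the exact value is 1/3
approx = trapezoid(lambda x: x * x, 0.0, 1.0, N=100)
print(approx)    # ~0.333350, within the K2(b - a)^3 / (12 N^2) bound
```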

6.4.2 Simpson Rule


The Simpson rule for computing the integral of a function follows the same
type of method as used for the trapezoidal rule, with two exceptions: the
summation expression is different, and the number of intervals N must be
even. The rule is

∫_a^b f(x) dx ≈ (Δx/3)[f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + · · · + 2f(x_{N−2}) + 4f(x_{N−1}) + f(x_N)]
             = (Δx/3)[f(x_0) + f(x_N)] + (4Δx/3) Σ_{i=0}^{(N/2)−1} f(x_{2i+1}) + (2Δx/3) Σ_{i=1}^{(N/2)−1} f(x_{2i})    (6.11)


Figure 6.11 Integration using the trapezoidal rule: the sampled function values f(x_i) are accumulated through a chain of addition nodes.

so that Δx = (b − a)/N and x_i = a + iΔx. Therefore, the computational graph
for this integration method can be easily drawn by first computing the
values of the function at N + 1 locations, starting from zero to N in Δx
discrete steps. Just how big N is will be determined by some error bound,
which has been derived to be traditionally

|E| ≤ K_4(b − a)⁵/(180N⁴)

where K_4 bounds the fourth derivative of the function f(x) on [a, b]. This sets the
bound on the error in the integration value obtained by using the Simpson
rule for the function. Once the choice of N is made, an error bound has been


set for the result of the integration. This error may be reduced by changing
the value of N, the number of terms in the summation.
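
A minimal implementation of Equation (6.11) is sketched below; it enforces
the even-N requirement and applies the alternating 4/2 weights. The
integrand in the usage line is an illustrative choice.

```python
# Simpson rule (Equation 6.11): odd-indexed samples get weight 4,
# interior even-indexed samples get weight 2; N must be even.

def simpson(f, a, b, N):
    if N % 2:
        raise ValueError("Simpson's rule requires an even N")
    dx = (b - a) / N
    xs = [a + i * dx for i in range(N + 1)]
    odd = sum(f(xs[i]) for i in range(1, N, 2))      # x_1, x_3, ..., x_{N-1}
    even = sum(f(xs[i]) for i in range(2, N, 2))     # x_2, x_4, ..., x_{N-2}
    return dx / 3 * (f(xs[0]) + f(xs[-1]) + 4 * odd + 2 * even)

# usage: integrate f(x) = x^4 on [0, 1]; the exact value is 1/5
approx = simpson(lambda x: x ** 4, 0.0, 1.0, N=100)
print(approx)    # ~0.2000000000, within the K4(b - a)^5 / (180 N^4) bound
```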
Exercise: Draw the computational graph for the Simpson Rule for integrating
a function.

6.5 Multipath Compound Derivatives


In multipath differentiation as used in neural network applications, there
is tandem differentiation: results from previous steps affect the derivative
at the current node. Take, for example, Figure 6.12: the derivative of Y is
influenced by the derivative of X in the forward path.
In the reverse path, the derivative of Z affects the derivative of Y. Let us
look at these two cases of multipath differentiation. In Figure 6.12, the
weights or factors by which each path derivative affects the next node are
shown with arrows and variables.
Multipath Forward Differentiation
In the discussion, we limit the number of paths to three, with the
understanding that the number of paths is unlimited and depends on the
application. Observe how the partial derivatives at each node depend on the
weights and on the derivatives from the previous node.
In Figure 6.13, the derivative of Z with respect to X depends on the
derivative of Y with respect to X. Notice that the starting point is the
derivative of the variable X with respect to X itself.

Figure 6.12 Multipath differentiation: X → Y → Z.

Figure 6.13 Multipath forward differentiation.


Figure 6.14 Multipath reverse differentiation.

Multipath Reverse Differentiation


In the reverse (backward) path dependence, Figure 6.14 shows that the
derivative of Z with respect to X starts with the derivative of Z with
respect to Z, which of course is one.
In Figure 6.14, there is a tandem of three paths from stage 1 and another
three paths to the end node, making a total of nine paths. The sum over
these paths, for both the forward and reverse partial differentiations, is
given by the product (α + β + γ)(δ + ε + ξ).
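
The nine-path count can be checked with a short sketch: summing the product
of weights along each of the 3 × 3 paths gives the same value as the
factored product (α + β + γ)(δ + ε + ξ). The weight values below are
illustrative assumptions.

```python
# Nine-path example from Figure 6.14: three stage-1 weights and three
# stage-2 weights. The numerical values are illustrative assumptions.

alpha, beta, gamma = 0.5, 1.0, 2.0     # stage-1 path weights
delta, epsilon, xi = 3.0, 0.25, 1.5    # stage-2 path weights

# sum the weight product along each of the 3 x 3 = 9 paths
paths = [w1 * w2 for w1 in (alpha, beta, gamma)
                 for w2 in (delta, epsilon, xi)]
path_sum = sum(paths)

# the factored form quoted in the text
factored = (alpha + beta + gamma) * (delta + epsilon + xi)
print(path_sum, factored)    # both equal 16.625
```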
