6 Calculus on Computational Graphs

6.1 Introduction
Fast, accurate and reliable computational schemes are essential when
implementing the complex systems required in deep learning applications. One
technique for achieving this is the so-called computational graph.
Computational graphs break a complex computation down into small,
executable steps that can be performed quickly with pencil and paper,
and better still with computers. In most cases, loops that repeat
the same algorithm, and that would otherwise waste computation time on loop
overhead, become much easier to handle. Computational graphs ease the training
of neural networks with the gradient descent algorithm, making them many times
faster than traditional implementations of neural networks.
Computational graphs have also found applications in weather forecasting
by reducing the associated computation time. Their strength is the fast
computation of derivatives; the technique is also known by a different name,
"reverse-mode differentiation."
Beyond its use in deep learning, backpropagation is a powerful
computational tool in many other areas, such as weather forecasting and the
analysis of numerical stability. In many ways, computational graph theory is
similar to logic gate operations in digital circuits, where dedicated logic
operations are undertaken with gates such as AND, OR, NOR and NAND in the
implementation of many binary operations. While the use of logic gates leads
to complex systems such as multiplexers, adders, multipliers and more complex
digital circuits, computational graphs have found their way into deep learning
operations involving derivatives, additions, scaling and multiplications of
real numbers by simplifying these operations.


6.1.1 Elements of Computational Graphs


Computational graphs are a useful means of breaking complex mathematical
computations and operations down into micro-computations, thereby making
them much easier to solve in a sequential manner. They also make it easier
to track computations and to understand where solutions break down. A
computational graph is a connection of nodes and links: nodes represent
variables, and links represent functions and operations.
The range of operations includes addition, multiplication, subtraction,
exponentiation and many more not listed here. Consider Figure 6.1: the three
nodes represent three variables a, b and c. The variable c is the result of
the operation of the function f on a and b. This means we can write the
result as

c = f(a, b)    (6.1)
Computational graphs allow nesting of operations, which makes it possible to
solve more complex problems. Consider the nesting of operations shown in the
computational graph of Figure 6.2. Clearly, from these three operations, we
see that it is a lot easier to undertake the more complex operation when y is
computed as in the equation

y = h(g(f(x)))    (6.2)

From Figure 6.2, as from Equation (6.2), the innermost operation f(x) is
performed first. This is followed by the second operation, g(·), and lastly
by the h(·) operation.
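
As a minimal sketch of this sequencing, the Python snippet below performs the
three operations in the order the graph dictates. The concrete choices of f,
g and h are illustrative assumptions; the text leaves them abstract.

```python
# A minimal sketch of the nested computation y = h(g(f(x))).
# The concrete functions f, g and h are illustrative assumptions,
# not taken from the text.

def f(x):
    return x + 1        # innermost operation

def g(u):
    return 2 * u        # second operation

def h(v):
    return v ** 2       # outermost operation

x = 3
u = f(x)                # first:  u = f(x) = 4
v = g(u)                # second: v = g(u) = 8
y = h(v)                # last:   y = h(v) = 64
print(u, v, y)          # 4 8 64
```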

Figure 6.1 A basic computational graph: nodes a and b feed the operation f, which produces c = f(a, b).

Figure 6.2 A chain of nested operations: u = f(x); v = g(u); y = h(v).


Consider the case where f is the operation of addition (+). The result c is
then given by the expression

c = (a + b)

When the operation is multiplication (×), the result is given by the
expression

c = (a × b)

Since f is a general operator, we may use any operator in the diagram and
write an equivalent expression for c.

6.2 Compound Expressions


For a computer to use compound expressions with a computational graph
efficiently, it is essential to factor the expressions into unit cells. For
example, the expression r = p × q = (x + y)(y + 1) can first be reduced to
two unit cells, or two terms, followed by computation of the product of
those terms. The product term is r = p × q:

r = p × q = (x + y)(y + 1) = xy + y² + x + y
p = x + y
q = y + 1

Each computational operation or component is created, and a graph is then
built out of them by connecting them appropriately with arrows. Each arrow
originates from a term and ends at the unit term that it helps to build, as
in Figure 6.3.
This form of abstraction is extremely useful in building neural networks
and deep learning frameworks. It is also useful in the programming of
expressions, for example in operations involving parallel support vector
machines (PSVM). In Figure 6.4, once the values at the roots of the
computational graph are known, the solution of the expression becomes easy
and trivial. Take the case where x = 3 and y = 4; the resulting solutions
are shown in Figure 6.5.
The evaluation of the compound expression is 35.
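
A short Python sketch of this evaluation, using the unit cells p and q from
the text, is given below; the variable names follow the expression, and the
printed result matches the value of 35 obtained above.

```python
# Evaluating the compound expression r = (x + y)(y + 1)
# by factoring it into the unit cells p and q from the text.

x, y = 3, 4
p = x + y        # unit cell 1: p = 7
q = y + 1        # unit cell 2: q = 5
r = p * q        # product node: r = 35
print(r)         # 35, matching the evaluation in the text
```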

Figure 6.3 A unit cell: inputs a and b feed the operation f to produce c.


Figure 6.4 Computational graphs of compound expressions.

Figure 6.5 Evaluating compound expressions.

6.3 Computing Partial Derivatives


One of the areas where computational graphs are used widely in neural
network applications is the computation of derivatives of variables and
functions in simple form. For derivatives, the graph simplifies the use of
the chain rule. Consider computing the partial derivative of y with respect
to b, where

y = (a + b) × (b − 3) = c × d
c = (a + b);  d = (b − 3)    (6.3)

The partial derivative of y with respect to b is

∂y/∂b = ∂y/∂c × ∂c/∂b + ∂y/∂d × ∂d/∂b    (6.4)


Figure 6.6 Computational graph for Equation (6.3): a and b feed c = a + b; b and the constant −3 feed d = b − 3.

Figure 6.7 Computation of partial derivatives superimposed on the graph of Figure 6.6.

The computational graph covering Equation (6.3) is given in Figure 6.6, and
the partial derivative is given in Equation (6.4). From Equation (6.3), the
individual partial derivatives are given as

∂y/∂c = d = (b − 3)
∂y/∂d = c = (a + b)
∂c/∂a = 1;  ∂c/∂b = 1;  ∂d/∂b = 1    (6.5)
∂y/∂b = (b − 3) × 1 + (a + b) × 1
These derivatives are superimposed on the computational graph in Figure 6.7.
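
A small Python sketch of this two-path accumulation is given below. The
values a = 2 and b = 5 are illustrative assumptions; the computed derivative
matches the analytic result (b − 3) + (a + b).

```python
# A sketch of Equations (6.4) and (6.5): computing dy/db for
# y = (a + b)(b - 3) by summing the two paths through c and d.

a, b = 2.0, 5.0          # illustrative input values

# forward pass through the graph
c = a + b                # c = a + b = 7
d = b - 3                # d = b - 3 = 2
y = c * d                # y = 14

# local derivatives at each node
dy_dc = d                # dy/dc = d
dy_dd = c                # dy/dd = c
dc_db = 1.0              # dc/db = 1
dd_db = 1.0              # dd/db = 1

# chain rule: sum over the two paths from b to y
dy_db = dy_dc * dc_db + dy_dd * dd_db
print(dy_db)             # (b - 3) + (a + b) = 2 + 7 = 9
```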

6.3.1 Partial Derivatives: Two Cases of the Chain Rule


The chain rule may be applied in various circumstances. Two cases are of
interest: the linear case (Figure 6.8) and the loop case (Figure 6.9). These two
cases are illustrated in this section.


Figure 6.8 Linear chain rule: x → y → z through y = g(x) and z = h(y), with dz/dx = ∂z/∂y × ∂y/∂x, and a perturbation propagating as Δx → Δy → Δz.

Figure 6.9 Loop chain rule: s feeds two branches x and y, which both feed z; a perturbation Δs produces Δx and Δy, which combine into Δz.

6.3.1.1 Linear chain rule


The objective in the linear case is to find the derivative of the output z
with respect to the input x, that is, dz/dx. The output depends on x through
the intermediate variable y, and hence the derivative of z will involve both
variables. From the diagram, we can write generally

z = f(x);  z = h(y);  y = g(x)    (6.6)

Therefore, the derivative of z with respect to x passes through y and is a
product of two terms:

dz/dx = ∂z/∂y × ∂y/∂x    (6.7)
Thus, the derivative of z with respect to x is computed as a product of two
partial derivatives of variables leading to it. This is shown in Figure 6.8.
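
A numerical sketch of the linear chain rule is shown below. The functions g
and h are illustrative assumptions; a finite-difference check confirms that
the product of the two local derivatives equals the end-to-end derivative.

```python
# Linear chain rule (Equation 6.7): dz/dx = dz/dy * dy/dx
# for the chain y = g(x), z = h(y). g and h are illustrative choices.
import math

def g(x):
    return x ** 2                 # y = g(x)

def h(y):
    return math.sin(y)            # z = h(y)

x = 1.5
y = g(x)
dz_dy = math.cos(y)               # local derivative of h at y
dy_dx = 2 * x                     # local derivative of g at x
dz_dx = dz_dy * dy_dx             # chain rule along the single path

# finite-difference check of the same derivative
eps = 1e-6
approx = (h(g(x + eps)) - h(g(x - eps))) / (2 * eps)
print(dz_dx, approx)              # the two values agree closely
```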

6.3.1.2 Loop chain rule


The loop chain rule in Figure 6.9 is an application of the linear chain
rule: each arm of the loop is treated with the linear chain rule, and the
objective is to find the derivative of the output z by summing the
contributions along the two arms. Here z is a function of s, z = f(s),
through two branches involving x and y, respectively. In the upper branch,
x = g(s) and z = h(x).

Figure 6.10 Multiple loop chain rule: s feeds N branches x_1, x_2, ..., x_N, which all feed z; a perturbation Δs produces Δz.

In the lower branch, y = g(s) and z = h(y). Two branches contribute to the
value of z, so that z = p(x, y). Therefore, there will also be a sum of
partial derivatives coming from the two branches. The derivative of z with
respect to s is obtained as

dz/ds = ∂z/∂x × dx/ds + ∂z/∂y × dy/ds    (6.8)
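
The following sketch applies Equation (6.8) to a concrete loop. The branch
functions (x = s², y = 3s) and the combining function z = x × y are
illustrative assumptions; the summed branch contributions match the direct
derivative of z = 3s³.

```python
# Loop chain rule (Equation 6.8): two branches x and y from s feed z.
# The branch and combining functions are illustrative assumptions.

s = 2.0
x = s ** 2              # upper branch: x = g(s)
y = 3 * s               # lower branch: a different map, for clarity
z = x * y               # z = p(x, y) = 3 * s**3

dz_dx, dz_dy = y, x     # partials of z = x * y
dx_ds = 2 * s           # dx/ds
dy_ds = 3.0             # dy/ds

# sum the contributions of the two branches
dz_ds = dz_dx * dx_ds + dz_dy * dy_ds
print(dz_ds)            # 36.0, equal to 9s^2 at s = 2
```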

6.3.1.3 Multiple loop chain rule


Generally, if z is computed from N loops, so that it is a function of N
variables, z = k(x_1, x_2, . . . , x_N), then N branches contribute to the
output z. Therefore, the total derivative of z with respect to the input is
a sum of N partial-derivative products.
The general partial derivative expression, shown as Figure 6.10, is

dz/ds = ∂z/∂x_1 × dx_1/ds + ∂z/∂x_2 × dx_2/ds + · · · + ∂z/∂x_N × dx_N/ds
      = Σ_{n=1}^{N} ∂z/∂x_n × dx_n/ds    (6.9)

The general case is more suited to deep learning situations where there are
many stages in the neural network and many branches are also involved.
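
As a sketch of Equation (6.9), the snippet below sums per-branch products
over N branches. The branch functions x_n = (n + 1)s and the combining
function z = Σ x_n² are illustrative assumptions.

```python
# Multiple loop chain rule (Equation 6.9): N branches x_n = g_n(s)
# feed z = k(x_1, ..., x_N). The concrete functions are illustrative.

N = 4
s = 1.5
xs = [(n + 1) * s for n in range(N)]       # x_n = (n + 1) * s
z = sum(x ** 2 for x in xs)                # z = k(x_1, ..., x_N)

dz_dxn = [2 * x for x in xs]               # dz/dx_n = 2 * x_n
dxn_ds = [n + 1 for n in range(N)]         # dx_n/ds = n + 1

# total derivative: sum of the per-branch products
dz_ds = sum(p * q for p, q in zip(dz_dxn, dxn_ds))
print(dz_ds)    # 90.0, equal to 2s * (1 + 4 + 9 + 16) = 60s at s = 1.5
```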

6.4 Computing of Integrals


In this section, we introduce the use of computational graphs for computing
integrals using some well-known traditional approaches, including the
trapezoidal and Simpson rules.


6.4.1 Trapezoidal Rule


Integration is traditionally the finding of the area under a curve. This
area has two dimensions: the discrete step in the sampling of the function
multiplied by the amplitude of the function at that discrete step. Thus, the
integral of the function f(x) may be computed using the trapezoidal rule
with the expression
∫_a^b f(x) dx ≈ (Δx/2)[f(x_0) + 2f(x_1) + 2f(x_2) + · · · + 2f(x_{N−1}) + f(x_N)]
             = (Δx/2)[f(x_0) + f(x_N)] + Δx Σ_{i=1}^{N−1} f(x_i)    (6.10)

so that Δx = (b − a)/N and x_i = a + iΔx, where N is the number of discrete
sampling steps.


Therefore, the computational graph for this integration method can be easily
drawn by first computing the values of the function at N + 1 locations starting
from zero to N in ∆x discrete steps. Just how big N is will be determined by
some error bound, which has been derived to be traditionally

|E| ≤ K_2(b − a)³/(12N²)

where K_2 bounds the second derivative of the function f(x) on [a, b]. This sets the
bound on the error in the integration value obtained by using the trapezoidal
rule for the function. Thus, once the choice of N is made, an error bound
has been set for the result of the integration. This error may be reduced by
changing the value of N, the number of terms in the summation. Figure 6.11
shows how to compute an integral using the trapezoidal rule.
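
A minimal implementation of Equation (6.10) is sketched below, organised the
way the computational graph suggests: evaluate f at the N + 1 grid points,
then combine the endpoint and interior contributions. The function and
interval in the usage line are illustrative.

```python
# Trapezoidal rule (Equation 6.10): sample f at N + 1 points and
# combine with endpoint weight 1/2 and interior weight 1.

def trapezoid(f, a, b, N):
    dx = (b - a) / N
    xs = [a + i * dx for i in range(N + 1)]   # x_0, ..., x_N
    interior = sum(f(x) for x in xs[1:-1])    # x_1, ..., x_{N-1}
    return dx / 2 * (f(xs[0]) + f(xs[-1])) + dx * interior

# usage: integrate f(x) = x^2 on [0, 1]; the exact value is 1/3
approx = trapezoid(lambda x: x * x, 0.0, 1.0, N=100)
print(approx)    # ~0.333350, within the K2(b - a)^3 / (12 N^2) bound
```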

6.4.2 Simpson Rule


The Simpson rule for computing the integral of a function follows the same
type of method as used for the trapezoidal rule, with two exceptions: the
summation expression is different, and the number of intervals N must be
even. The rule is

∫_a^b f(x) dx ≈ (Δx/3)[f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + · · · + 2f(x_{N−2}) + 4f(x_{N−1}) + f(x_N)]
             = (Δx/3)[f(x_0) + f(x_N)] + (4Δx/3) Σ_{i=0}^{(N/2)−1} f(x_{2i+1}) + (2Δx/3) Σ_{i=1}^{(N/2)−1} f(x_{2i})    (6.11)


Figure 6.11 Integration using the trapezoidal rule: the sampled function values f(x_i) are accumulated through a chain of addition nodes.

so that Δx = (b − a)/N and x_i = a + iΔx. Therefore, the computational graph
for this integration method can be easily drawn by first computing the
values of the function at N + 1 locations, starting from zero to N in Δx
discrete steps. Just how big N is will be determined by some error bound,
which has been derived to be traditionally

|E| ≤ K_4(b − a)⁵/(180N⁴)

where K_4 bounds the fourth derivative of the function f(x) on [a, b]. This sets the
bound on the error in the integration value obtained by using the Simpson
rule for the function. Once the choice of N is made, an error bound has been


set for the result of the integration. This error may be reduced by changing
the value of N, the number of terms in the summation.
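
A minimal implementation of Equation (6.11) is sketched below; it enforces
the even-N requirement and applies the alternating 4/2 weights. The
integrand in the usage line is an illustrative choice.

```python
# Simpson rule (Equation 6.11): odd-indexed samples get weight 4,
# interior even-indexed samples get weight 2; N must be even.

def simpson(f, a, b, N):
    if N % 2:
        raise ValueError("Simpson's rule requires an even N")
    dx = (b - a) / N
    xs = [a + i * dx for i in range(N + 1)]
    odd = sum(f(xs[i]) for i in range(1, N, 2))      # x_1, x_3, ..., x_{N-1}
    even = sum(f(xs[i]) for i in range(2, N, 2))     # x_2, x_4, ..., x_{N-2}
    return dx / 3 * (f(xs[0]) + f(xs[-1]) + 4 * odd + 2 * even)

# usage: integrate f(x) = x^4 on [0, 1]; the exact value is 1/5
approx = simpson(lambda x: x ** 4, 0.0, 1.0, N=100)
print(approx)    # ~0.2000000000, within the K4(b - a)^5 / (180 N^4) bound
```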
Exercise: Draw the computational graph for the Simpson Rule for integrating
a function.

6.5 Multipath Compound Derivatives


In multipath differentiation as used in neural network applications, there
is tandem differentiation: results from previous steps affect the derivative
at the current node. Take, for example, Figure 6.12: the derivative of Y is
influenced by the derivative of X in the forward path.
In the reverse path, the derivative of Z affects the derivative of Y. Let us
look at these two cases of multipath differentiation. In Figure 6.12, the
weights or factors by which each path derivative affects the next node are
shown with arrows and variables.
Multipath Forward Differentiation
In the discussion, we limit the number of paths to three, with the
understanding that the number of paths is unlimited and depends on the
application. Observe how the partial derivatives at each node depend on the
weights and on the derivatives from the previous node.
In Figure 6.13, the derivative of Z with respect to X depends on the
derivative of Y with respect to X. Notice that the starting point is the
derivative of the variable X with respect to X itself.

Figure 6.12 Multipath differentiation: X → Y → Z.

Figure 6.13 Multipath forward differentiation.


Figure 6.14 Multipath reverse differentiation.

Multipath Reverse Differentiation


In the reverse (backward) path dependence, Figure 6.14 shows that the
derivative of Z with respect to X starts with the derivative of Z with
respect to Z, which of course is one.
In Figure 6.14, there is a tandem of three paths from stage 1 and another
three paths to the end node, making a total of nine paths. The sum over
these paths, for both the forward and reverse partial differentiations, is
given by the product (α + β + γ)(δ + ε + ξ).
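
The nine-path count can be checked with a short sketch: summing the product
of weights along each of the 3 × 3 paths gives the same value as the
factored product (α + β + γ)(δ + ε + ξ). The weight values below are
illustrative assumptions.

```python
# Nine-path example from Figure 6.14: three stage-1 weights and three
# stage-2 weights. The numerical values are illustrative assumptions.

alpha, beta, gamma = 0.5, 1.0, 2.0     # stage-1 path weights
delta, epsilon, xi = 3.0, 0.25, 1.5    # stage-2 path weights

# sum the weight product along each of the 3 x 3 = 9 paths
paths = [w1 * w2 for w1 in (alpha, beta, gamma)
                 for w2 in (delta, epsilon, xi)]
path_sum = sum(paths)

# the factored form quoted in the text
factored = (alpha + beta + gamma) * (delta + epsilon + xi)
print(path_sum, factored)    # both equal 16.625
```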
