
A NOVEL SIGNAL FLOW GRAPH BASED SOLUTION FOR CALCULATION OF FIRST AND SECOND DERIVATIVES OF DYNAMICAL NONLINEAR SYSTEMS

Andrea Arcangeli, Stefano Squartini, Francesco Piazza

DEIT, Università Politecnica delle Marche
Via Brecce Bianche, 60131 Ancona, Italy
Phone: +39 0712204453, fax: +39 0712204835, email: sts@deit.univpm.it

ABSTRACT

In this paper, the problem of calculating first and second derivatives in general non-linear dynamical systems is addressed, and a solution by means of signal-flow-graph (SFG) techniques is proposed. First and full second derivatives of an output of the initial system with respect to the node variables of the starting SFG are delivered through an adjoint graph derived without using Lee's theorem. Mixed second derivatives are deduced from quantities obtained in adjoint graphs of the original graph or of graphs related to it. A detailed theoretical demonstration of these formulations is given. Even though no adjoint graph has been derived for the case of mixed derivatives, the ability of the proposed method to determine all the entries of the Hessian matrix in a completely automatic way is highlighted.

1. INTRODUCTION

The problem of sensitivity calculation in complex discrete-time systems has been dealt with effectively in the literature through Signal-Flow-Graph (SFG) techniques [1]. New algorithms for gradient computation in gradient-based linear and non-linear adaptive networks [2]-[4] have been developed. A more recent work [5] represents a generalization of the previous approaches, as it delivers derivatives of an output at a given time with respect to a variation of an internal parameter not necessarily at the same instant, allowing the gradient information for dynamical non-linear system learning to be derived. This is achieved by a suitable adjoint network, whose definition is completely based on Lee's theorem [2]. This method made it possible to work out an automatic procedure involving Jacobian-based information for system adaptation, but it left open the problem of Hessian matrix calculation. The present paper aims to provide a solution to this task. To this purpose, a new adjoint graph (defined without using Lee's theorem) is derived to yield first and full second derivatives, from which the mixed ones can be deduced, as will be explained later on.

As in [5], the class of systems addressed here is that of non-linear dynamical ones, like dynamical neural networks. They can be represented by SFGs. An SFG is a set of nodes and oriented branches. A branch is oriented from its initial node i to its final node j. There are three different types of nodes: input nodes, which have only one outgoing branch; output nodes, which have only one incoming branch; and general n-nodes, which sum their incoming branches and distribute the result to the outgoing branches. All branches other than the input and output ones are called f-branches. Each of them has two associated variables: the initial one, x, at the tail of the branch, and the final one, v, at the head of the branch. We can distinguish between two different types of f-branches depending on the relation between x and v. Taking into account that we address the case of discrete-time systems and that in our notation only branches are indexed, we can list them as:

1) Static branch: v_j = f_j(x_j(t), α_j(t), t)

2) Single delay branch: v_j = z^{-1} x_j(t) = x_j(t − 1)

The function f of the static branch is required to be differentiable, and it can change over time, since it depends on the time-varying parameter α_j(t). This lets it cover a large class of nonlinearities, the most common functional relations being:

    v_j = f_j(x_j(t))        non-linear branch
    v_j = w_j(t) · x_j(t)    linear branch

2. FIRST AND FULL SECOND DERIVATIVES

Here we derive an adjoint graph for the automatic computation of first and second derivatives of an output of the original graph with respect to its nodes. The word "adjoint" is not used here as rigorously as its definition requires: in fact, the graph derived for derivative calculation is a reverse graph but does not preserve the topology of the original one. Such a graph is characterized by two levels, one used exclusively for first derivatives, and both used for the determination of full second derivatives. This fact immediately lets us introduce the most relevant difference between this approach and the one suitable for general systems in the literature [1]: Lee's theorem has not been employed to define the functional relations of the adjoint-graph branches. Manipulation of the derivative operation has proved sufficient for our purposes. Nevertheless, we obtain a generalization of the previous approach: indeed, if we consider only the part of the adjoint graph relative to first derivatives, we notice that it coincides with the adjoint graph referred to in [1]. Concerning second derivatives, it must be noted that only the full ones are calculated here: the mixed ones will be considered in the following.

We can now describe the transformations needed to obtain the adjoint graph from the topology of the original one. They are listed in Table 1. The first step consists in showing the validity of such transformations in the case of a topologically linear graph, i.e. an SFG where each node has just one incoming and one outgoing branch, and where there is one input and one output branch. The first branch to analyze is the output one, which represents the two-level input branch of the adjoint SFG. As the output branch is a zero-memory branch and different outputs are independent of each other (also for non-linear topology SFGs), the following holds:

    ∂y_k(t)/∂y_j(t) = δ_kj

    ∂y_k(t)/∂y_k(t') = ∂y_k(t)/∂y_k(t − τ) = δ(τ) = { 1 if τ = 0, 0 if τ > 0 }
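The forward behaviour of the two f-branch types can be sketched in code. This is a minimal illustration under our own naming (StaticBranch, DelayBranch and run_chain are not from the paper), with the time-varying parameter α_j(t) omitted for brevity and a zero initial condition on the delay:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class StaticBranch:
    # v_j(t) = f_j(x_j(t)); the time-varying parameter is omitted here
    f: Callable[[float], float]

@dataclass
class DelayBranch:
    # v_j(t) = x_j(t - 1), with zero initial condition
    state: float = 0.0

def run_chain(branches: List[object], inputs: List[float]) -> List[float]:
    """Forward evaluation of a topologically linear SFG over time."""
    outputs = []
    for u in inputs:
        x = u
        for b in branches:
            if isinstance(b, StaticBranch):
                x = b.f(x)
            else:                       # DelayBranch: emit old state, store x
                x, b.state = b.state, x
        outputs.append(x)
    return outputs

# Example: a squaring static branch followed by a unit delay
chain = [StaticBranch(f=lambda x: x * x), DelayBranch()]
print(run_chain(chain, [1.0, 2.0, 3.0]))  # [0.0, 1.0, 4.0]
```

At each time step the static branch applies its nonlinearity immediately, while the delay branch emits the previous sample, which is why the squared inputs appear at the output one step late.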
In the case of the second derivative, it suffices to differentiate further:

    y_j''(τ) = ∂²y_k(t)/∂y_j(t − τ)² = 0

Let us continue by showing the formulas derived in the case of static branches. We start from the assumption that the derivatives with respect to node v_j are correct, that is:

    v_j(t − τ) = f_j(x_j(t − τ), α_j(t − τ), t − τ)

    v_j'(τ) = ∂y_k(t)/∂v_j(t − τ),   v_j''(τ) = ∂²y_k(t)/∂v_j(t − τ)²

Hence, we can calculate the derivatives of the output with respect to the node x_j = v_{j−1}. Concerning the first derivative we have:

    x_j'(τ) = ∂y_k(t)/∂x_j(t − τ) = [∂y_k(t)/∂v_j(t − τ)] · [∂v_j(t − τ)/∂x_j(t − τ)] = v_j'(τ) · f_j'(t − τ)

while for the second derivative it can be deduced that:

    x_j''(τ) = ∂²y_k(t)/∂x_j(t − τ)²
             = ∂/∂x_j(t − τ) { [∂y_k(t)/∂v_j(t − τ)] · [∂v_j(t − τ)/∂x_j(t − τ)] }
             = [∂²y_k(t)/∂v_j(t − τ)²] · [∂v_j(t − τ)/∂x_j(t − τ)]² + [∂y_k(t)/∂v_j(t − τ)] · [∂²v_j(t − τ)/∂x_j(t − τ)²]
             = v_j''(τ) · (f_j'(t − τ))² + v_j'(τ) · f_j''(t − τ)

where we have named:

    f_j'(t − τ) = f_j'(x_j(t − τ), α_j(t − τ), t − τ)
    f_j''(t − τ) = f_j''(x_j(t − τ), α_j(t − τ), t − τ)

Regarding the unit delay branch we can write, under the same aforementioned hypotheses:

    x_j'(τ) = ∂y_k(t)/∂x_j(t − τ) = ∂y_k(t)/∂v_j(t − τ + 1) = v_j'(τ − 1)

    x_j''(τ) = ∂²y_k(t)/∂x_j(t − τ)² = ∂²y_k(t)/∂v_j(t − τ + 1)² = v_j''(τ − 1)

In the case of the input branch we do not have to derive anything, since the corresponding branch in the adjoint SFG only fulfils the role of delivering the derivatives of the node to which it is connected. So the proof of validity of the transformation set could end here if we limited our analysis to topologically linear SFGs. However, we can generalize this result by observing that a general graph can be seen as several topologically linear graphs connected through addition and distribution nodes (Figure 1). This means arranging the previous formulas to fit this new scenario. For example, we cannot employ input branches in adjoint SFGs (for each topologically linear graph involved) as:

    y_j' = δ_kj δ(τ),   y_j'' = 0

since these are not the right derivatives with respect to those input nodes in the corresponding original SFGs. However, we shall be able to show that an addition node is a distribution node in the adjoint graph, and vice versa. Indeed, let us suppose, without loss of generality, that the system output depends on the addition node through a generic function with memory as follows:

    y_k = h(u_Σ),   u_Σ = u_1 + ... + u_n = Σ_{j=1}^{n} u_j

where n is the number of incoming branches. We then have:

    ∂y_k/∂u_j = (∂y_k/∂u_Σ) · (∂u_Σ/∂u_j),   ∂u_Σ/∂u_j = 1
    ⇒ v_j' = ∂y_k/∂u_j = ∂y_k/∂u_Σ = v_Σ'   ∀j = 1..n

Table 1. Transformations of the functional relations of the f-branches from the original graph to the adjoint one.

Static branch
    original:  v_k(t) = f_k(x_k(t), α_k(t), t)
    adjoint:   x_k'(τ)  = f_k'(τ) · v_k'(τ)
               x_k''(τ) = f_k''(τ) · v_k'(τ) + f_k'(τ)² · v_k''(τ)
               with f_k'(τ) = ∂f_k/∂x_k and f_k''(τ) = ∂²f_k/∂x_k²,
               both evaluated at (x_k(t − τ), α_k(t − τ), t − τ)

Unit delay branch
    original:  v_k(t) = x_k(t − 1) if t > 0; 0 if t = 0
    adjoint:   x_k'(τ)  = v_k'(τ − 1) if τ > 0; 0 if τ = 0
               x_k''(τ) = v_k''(τ − 1) if τ > 0; 0 if τ = 0

Output branch
    original:  y_k(t) = x_k(t)
    adjoint:   y_k' = δ_jk δ(τ),   y_k'' = 0

Input branch
    original:  v_k(t) = u_k(t)
    adjoint:   u_k'(τ) = v_k'(τ),   u_k''(τ) = v_k''(τ)
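Restricting the chain to static branches so that only τ = 0 matters, the Table 1 recursions for first and full second derivatives can be sketched as a reverse pass over the graph. This is a minimal illustration; the function and variable names are ours, not the paper's:

```python
import math

def adjoint_chain(fs, dfs, ddfs, u):
    """First and full second derivatives of the chain output w.r.t. every
    node variable, via the adjoint recursions for static branches:
        x' = f'(x) * v',    x'' = f''(x) * v' + f'(x)**2 * v''
    fs/dfs/ddfs list each branch's f, f', f''; u is the input value.
    Returns (nodes, d1, d2) with node values and dy/dx_k, d2y/dx_k2.
    """
    # forward pass: record the tail variable of every branch
    nodes = [u]
    for f in fs:
        nodes.append(f(nodes[-1]))
    # reverse pass, seeded at the output branch: y' = 1, y'' = 0
    d1, d2 = [1.0], [0.0]
    for df, ddf, x in zip(reversed(dfs), reversed(ddfs),
                          reversed(nodes[:-1])):
        v1, v2 = d1[0], d2[0]
        d1.insert(0, df(x) * v1)
        d2.insert(0, ddf(x) * v1 + df(x) ** 2 * v2)
    return nodes, d1, d2

# chain y = exp(sin(u)): check dy/du and d2y/du2 at u = 0.3
fs   = [math.sin, math.exp]
dfs  = [math.cos, math.exp]
ddfs = [lambda x: -math.sin(x), math.exp]
nodes, d1, d2 = adjoint_chain(fs, dfs, ddfs, 0.3)
u, y = 0.3, nodes[-1]
# analytic: y' = cos(u) e^{sin u},  y'' = (cos^2(u) - sin(u)) e^{sin u}
assert abs(d1[0] - math.cos(u) * y) < 1e-9
assert abs(d2[0] - (math.cos(u) ** 2 - math.sin(u)) * y) < 1e-9
```

The reverse pass propagates the pair (v', v'') backwards through each branch exactly as in the static row of Table 1, and the derivatives at the input node match the analytic values of the composed function.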
Similar considerations can be made for the second derivatives:

    v_j'' = ∂²y_k/∂u_j² = ∂/∂u_j (∂y_k/∂u_j) = ∂/∂u_j (∂y_k/∂u_Σ)
          = (∂²y_k/∂u_Σ²) · (∂u_Σ/∂u_j) = ∂²y_k/∂u_Σ² = v_Σ'',   ∀j = 1..n

Concerning the distribution node, we can note, without loss of generality, that there is a general function with memory describing the dependence of the desired output on the n outgoing branches: y = h(u_1, u_2, ..., u_n). Then u_1 = u_2 = ... = u_n = u holds, where u is the value of the incoming branch. This lets us state:

    u'(τ) = ∂y_k(t)/∂u(t − τ) = Σ_{j=1}^{n} [∂y_k(t)/∂u_j(t − τ)] · [∂u_j(t − τ)/∂u(t − τ)]
          = Σ_{j=1}^{n} ∂y_k(t)/∂u_j(t − τ) = Σ_{j=1}^{n} u_j'(τ)

    u''(τ) = ∂²y_k(t)/∂u(t − τ)² = Σ_{j=1}^{n} u_j''(τ)

Figure 1. Addition and distribution nodes connecting topologically linear SFGs.

In conclusion, it can be affirmed that the set of transformations in Table 1 is valid for SFGs of generic topology.

3. MIXED SECOND DERIVATIVES

We are not interested, for the moment, in deriving an extended adjoint graph covering also the case of mixed second derivatives, but just in showing how these quantities can be calculated directly from those obtained by means of the method described in the previous section. That is why we shall omit the time variable from the following expressions, assuming that the output y_k(t) at time t depends on the node variables x_i(t − τ_1), x_j(t − τ_2) at different times t − τ_1, t − τ_2. The term we are interested in will thus be restricted to its topological meaning:

    ∂²y_k(t) / [∂x_i(t − τ_1) ∂x_j(t − τ_2)] = ∂²y_k / ∂x_i ∂x_j     (1)

We shall distinguish three cases, according to the relationship between the node variables x_i, x_j, as follows.

3.1 Case 1: only dependent variables x_i, x_j

The dependence between x_i and x_j is given by x_j = f(x_i), and the output y depends on x_i only through x_j; we shall assume that f is invertible, consequently yielding x_i = F(x_j) = f^{-1}(x_j). In compliance with the formulas of the previous sections, the following notation holds: x_i'' = ∂²y/∂x_i², x_i' = ∂y/∂x_i, x_ij'' = ∂²y/∂x_i∂x_j. From this, equation (1) and the inverse function theorem, it follows that:

    ∂²y/∂x_i∂x_j = ∂/∂x_i [ (∂y/∂x_i) · (∂x_i/∂x_j) ]
                 = (∂²y/∂x_i²) · (∂x_i/∂x_j) + (∂y/∂x_i) · (∂²x_i/∂x_i∂x_j)
                 = x_i'' · (1/f') + x_i' · (−f''/(f')²) = x_ij''     (2)

    ∂²y/∂x_j∂x_i = ∂/∂x_j [ (∂y/∂x_j) · (∂x_j/∂x_i) ]
                 = (∂²y/∂x_j²) · (∂x_j/∂x_i) + (∂y/∂x_j) · (∂²x_j/∂x_j∂x_i)
                 = x_j'' · f' + x_j' · (f''/f') = x_ji''     (3)

It can be underlined that:
• x_ij'' ≠ x_ji'': in fact x_i, x_j are dependent and Schwarz's theorem cannot be applied.
• x_ij'' and x_ji'' can be calculated by means of first and full second derivatives relative to the original graph (x_i', x_i'', x_j', x_j'') or to sub-graphs of it (f', f'').

3.2 Case 2: only independent variables x_i, x_j

The conditions of Schwarz's theorem are assumed to hold, as supported by the independence of x_i and x_j. Therefore x_ij'' and x_ji'' are expected to be identical. However, we need further topological information for our purposes: namely the knowledge of a "common" node between x_i and x_j, call it x_k. This node depends on both x_i and x_j through two different functions, i.e. x_k = f_i(x_i), x_k = f_j(x_j). In particular, we can state that f_j' does not depend on x_i, and f_i' does not depend on x_j, if we choose x_k to be the first common node, i.e. the first addition node combining x_i, x_j. This means that ∂²x_k/∂x_i∂x_j = 0 and thus:

    ∂²y/∂x_i∂x_j = ∂/∂x_i [ (∂y/∂x_k) · (∂x_k/∂x_j) ]
                 = (∂²y/∂x_i∂x_k) · (∂x_k/∂x_j) + (∂y/∂x_k) · (∂²x_k/∂x_i∂x_j) = x_ik'' · f_j'

    ∂²y/∂x_j∂x_i = x_jk'' · f_i'

We can observe that the calculation of mixed derivatives in this case involves quantities attainable from Case 1 and first derivatives of suitable sub-graphs.

3.3 Case 3: both dependent and independent variables x_i, x_j

In contrast to Case 1, the output y depends on x_i not only through x_j but also through different nodes (Figure 1), i.e. we can write y = g(x_j, x_i) = g(f(x_i), x_i).
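The Case 1 formulas (2) and (3) can be checked numerically on a concrete pair of sub-graph functions. The sketch below (our own naming, not the paper's) builds x_i', x_i'', x_j', x_j'' for y = g(x_j) with x_j = f(x_i), applies the two formulas, and confirms that the mixed derivatives indeed differ:

```python
import math

def case1_mixed(f, df, ddf, g, dg, ddg, xi):
    """Mixed second derivatives for Case 1 (y depends on x_i only through
    x_j = f(x_i)), from the full derivatives of the original graph:
        x_ij'' = x_i''/f' - x_i' * f''/f'**2      (2)
        x_ji'' = x_j'' * f' + x_j' * f''/f'       (3)
    """
    xj = f(xi)
    fp, fpp = df(xi), ddf(xi)
    xi1 = dg(xj) * fp                          # x_i'  = dy/dx_i
    xi2 = ddg(xj) * fp ** 2 + dg(xj) * fpp     # x_i'' = d2y/dx_i2
    xj1, xj2 = dg(xj), ddg(xj)                 # x_j', x_j''
    xij = xi2 / fp - xi1 * fpp / fp ** 2
    xji = xj2 * fp + xj1 * fpp / fp
    return xij, xji

# y = sin(x_j), x_j = x_i**3, evaluated at x_i = 0.7
xi = 0.7
xij, xji = case1_mixed(lambda x: x ** 3, lambda x: 3 * x ** 2,
                       lambda x: 6 * x,
                       math.sin, math.cos, lambda v: -math.sin(v), xi)
# direct calculus gives x_ij'' = g''(f(x_i)) * f'(x_i)
assert abs(xij - (-math.sin(xi ** 3)) * 3 * xi ** 2) < 1e-9
# the two mixed derivatives differ by g' * f''/f', as Schwarz does not apply
assert abs((xji - xij) - math.cos(xi ** 3) * 6 * xi / (3 * xi ** 2)) < 1e-9
assert xij != xji
```

The gap between x_ij'' and x_ji'' equals g'·f''/f', which vanishes only for linear f, consistent with the first bullet of Case 1.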
We can think of splitting x_i into two new variables, x̄_i and x̃_i: x̄_i is linked to x_j through x̄_i = F(x_j) = f^{-1}(x_j), while x̃_i is completely independent of x_j. This lets us deduce the formulation of the mixed derivatives we are interested in:

    x_ij'' = ∂²y/∂x_i∂x_j = ∂/∂x_i [ (∂y/∂x̄_i) · (∂x̄_i/∂x_j) ]
           = x̄_i'' · (1/f') − x_j' · (f''/f') + (1/f') · (∂²y/∂x̃_i∂x̄_i)     (4)

Observing that ∂y/∂x̄_i = (∂y/∂x_j) · f' = ∂y/∂x_i − ∂y/∂x̃_i, equation (4) immediately reduces to (2) as soon as x̃_i = 0. Let us focus on the third term in (4). Since

    ∂/∂x_i (∂y/∂x̄_i) = x̄_i'' + ∂²y/∂x̃_i∂x̄_i

where x̄_i'' is the full second derivative of the graph deduced from the original one by splitting x_i into x̄_i and x̃_i, we can now apply what was done in Case 2 and write:

    ∂²y/∂x̃_i∂x̄_i = (∂²y/∂x̃_i∂x_k) · (∂x_k/∂x̄_i) = (∂²y/∂x̃_i∂x_k) · g'

where x_k is the first common node between x̄_i and x̃_i, with x_k = g(x̄_i) and x_k = g̃(x̃_i), so that, as in Case 1,

    ∂²y/∂x̃_i∂x_k = (1/g̃') · [ x̃_i'' − x̃_i' · (g̃''/g̃') ]

Summarising, we get the final formula:

    x_ij'' = x̄_i'' · (1/f') − x_j' · (f''/f') + (1/f') · (∂²y/∂x̃_i∂x̄_i)
    ∂²y/∂x̃_i∂x̄_i = (∂²y/∂x̃_i∂x_k) · g'
    ∂²y/∂x̃_i∂x_k = (1/g̃') · [ x̃_i'' − x̃_i' · (g̃''/g̃') ]     (5)

where x_j = f(x̄_i), x_k = g(x̄_i), x_k = g̃(x̃_i) hold. It can easily be noticed that only first and full second derivatives relative to the original graph, to the graph derived by splitting the dependent variable into its two components, and to particular sub-graphs of them occur in (5). Similarly, we can derive the following for the reverse mixed derivative:

    x_ji'' = (1/f') · [ x_j'' · (f')² + x_j' · f'' ] + ∂²y/∂x_j∂x̃_i
    ∂²y/∂x_j∂x̃_i = (∂²y/∂x_j∂x_k) · (∂x_k/∂x̃_i) = (∂²y/∂x_j∂x_k) · g̃'
    ∂²y/∂x_j∂x_k = (1/h') · [ x_j'' − x_j' · (h''/h') ]     (6)

where x_j = f(x̄_i), x_k = h(x_j), x_k = g̃(x̃_i) hold. The same considerations as above can be made. In particular, (6) reduces to (3) when x̃_i = 0. Then, we can rewrite (5) and (6) in the following compact way:

    x_ij'' = (1/f') · [ x̄_i'' + x̃_i'' · (g'/g̃') ] − x̄_i' · (f''/(f')²) − x̃_i' · (g' · g̃'') / (f' · (g̃')²)

    x_ji'' = x_j'' · [ f' + (g̃'/h') ] + x_j' · [ (f''/f') − (h''/(h')²) · g̃' ]

Figure 2. A simple SFG example for Case 3: x_1, x_2 are the both dependent and independent node variables addressed. (Nodes u, x_1, x_2, x_3, x_4, y; branches (.)², ln(.), exp(.) in cascade, with a sin(.) branch in parallel.)

4. CONCLUSIONS

SFG techniques have been successfully employed here to calculate first and second derivatives in dynamical non-linear systems. First, a suitable adjoint graph has been derived, without using Lee's theorem, to yield first and full second derivatives of an output of the original graph with respect to one of its nodes automatically. This lets the previous SFG-based approaches for sensitivity calculation [2]-[5] be considered as particular cases thereof. Second, a helpful formulation for mixed second derivatives has been deduced, making them dependent on quantities attainable by means of suitable adjoint graphs for first and full second derivatives, even though a proper graph description of such dependencies is still lacking. Its derivation is the first issue for future work. The application of the proposed approach to adaptation problems involving Jacobian- and Hessian-based information, with special care for dynamical neural systems, is also being studied. Finally, the authors are investigating how to extend the present approach to the case of multirate systems, as recently done [6] for the calculation of first derivatives.

REFERENCES

[1] A.Y. Lee, "Signal Flow Graphs—Computer-Aided System Analysis and Sensitivity Calculations", IEEE Transactions on Circuits and Systems, vol. 21, no. 2, pp. 209-216, 1974.
[2] E.A. Wan and F. Beaufays, "Diagrammatic Derivation of Gradient Algorithms for Neural Networks", Neural Computation, vol. 8, pp. 182-201, 1996.
[3] S. Osowski, "Signal flow graphs and neural networks", Biological Cybernetics, vol. 70, pp. 387-395, 1994.
[4] G. Martinelli and R. Perfetti, "Circuit theoretic approach to the backpropagation learning algorithm", Proc. of the IEEE Int. Symposium on Circuits and Systems, 1991.
[5] P. Campolucci, A. Uncini, and F. Piazza, "A Signal-Flow-Graph Approach to On-Line Gradient Calculation", Neural Computation, vol. 12, no. 8, August 2000.
[6] F. Rosati, P. Campolucci, and F. Piazza, "Learning in Linear and Non-Linear Multirate Digital Systems by Signal Flow Graphs", IEEE ISCAS, vol. 2, pp. 6-9, May 2001.
