
Adaptive Interaction and its Application to Neural Networks
Robert D. Brandt and Feng Lin

Abstract

Adaptive interaction is a new approach to introducing adaptability into man-made systems. In this approach, a system is decomposed into interconnected subsystems that we call devices, and adaptation occurs in the interactions: the interaction weights among these devices are adapted in order to minimize a given cost function. The adaptation algorithm developed is mathematically equivalent to a gradient descent algorithm but requires only local information in its implementation. One particular application of adaptive interaction that we study in this paper is in neural networks. By applying adaptive interaction, we can achieve essentially the same adaptation as the well-known back-propagation algorithm but without the need for a feedback network to propagate the errors, which has many advantages in practice. A simulation is provided to show the effectiveness of our approach.

Keywords: Adaptive interaction, neural network, back-propagation


This research is supported in part by the National Science Foundation under grants ECS-9315344 and INT-9602485. Robert D. Brandt is with Intelligent Devices, Inc., 465 Whittier Ave., Glen Ellyn, IL 60137. Feng Lin is with the Department of Electrical and Computer Engineering, Wayne State University, Detroit, MI 48202. Tel: 313-577-3428, Fax: 313-577-1101, Email: flin@ece.eng.wayne.edu.


1 Introduction
Adaptation is one of the most important mechanisms in living organisms (or natural systems). Take human beings for example. Suppose that the Detroit Pistons have just recruited a fresh new basketball star. In order for him to become the second Grant Hill, he must practice with his teammates and cooperate with other players by adapting his play. Another example is the Castros family, who performed a seven-person pyramid on a high wire in Detroit's Hart Plaza. They maintain the human pyramid while walking over a suspended cable. Needless to say, a great deal of adaptation must take place before this can be done. As a matter of fact, the brothers, sisters, and nephews have been practicing this act for more than seven years.
These two examples (and many more could be given) show that adaptation occurs naturally and constantly in natural systems. This, unfortunately, cannot be said for man-made systems. We are still waiting to see an airplane that can adapt like a bird on its own (or a car, or a train, for that matter). Only very few man-made systems have the built-in capability of adaptation.
We suspect that this lack of adaptability in man-made systems is due to the lack of understanding of adaptation mechanisms. This lack of understanding has resulted in some "unnatural" approaches to adaptation in man-made systems. Consider, for example, adaptive control systems, which are perhaps the most commonly used man-made adaptive systems. For an adaptive control system to work, we must first develop a model of the system to be controlled from physical laws; we must then identify the unknown parameters of the system by some elaborate identification scheme; and finally we must use some sophisticated synthesis methods to adjust the parameters of the controller so that it can adapt.
Obviously, natural systems do not adapt in this way. Grant Hill does not need to know the dynamics behind the trajectory of his basketball, nor does he need to estimate parameters of his teammates. As a matter of fact, he does not even have a model! Yet he adapted and became successful. Similarly, the Castros family may not even know Newton's law of gravity, but they have managed to perform the pyramid for seven years without a fall.

Therefore, we submit that our view of adaptation must be modified (i.e., adapted). Adaptation in man-made systems must be made more "natural". We must learn from the adaptations of natural systems. In fact, such attempts have been made in the past with success. One example is the use of (artificial) neural networks. Inspired by our own brain, neural networks incorporate adaptation mechanisms that make them effective in many engineering applications, especially when the model of a system is unknown or difficult to obtain.
A question of great interest is thus the following: Are such adaptation mechanisms unique to neural networks? That is, can devices other than neurons adapt in a similar manner that requires no precise modeling and identification? In other words, is the neuron a unique creation of evolution or merely a biological convenience? We will show that the answers to the above questions are no, yes, and no, respectively. As a matter of fact, we will show that any interconnected and interacting devices can adapt by adjusting their interactions, much like neurons adjusting their synapses. This is true for dynamic or static systems, and for linear or nonlinear systems.
One feature of our approach of adaptive interaction is the decomposition of a complex adaptive system into subsystems that we call devices, which interact via connections. We assume that adaptation occurs in the interactions. This is done without loss of generality, because the partition into devices and interactions is arbitrary and can be specified by the user. We model a device by a general (causal) mapping from its input to its output. Thus, we can handle linear and nonlinear systems in the same manner.
The result of our adaptation algorithm¹ is essentially equivalent to that of gradient descent. However, our algorithm is implemented locally. In other words, the adaptation of an interaction is based on information available locally (that is, in the devices that the interaction is connected to and from). This is possible because our algorithm does not attempt to calculate the gradient directly, as the direct calculation may require global information. Rather, we infer the gradient from locally available information. As we will show, this localization not only is convenient, but also has important implications for its application in neural networks.

¹Here we use the word "algorithm" in a generalized sense to mean a (mathematical) description or model for calculating and updating system parameters.
We have successfully applied this approach of adaptive interaction to adaptive control and system identification [18, 19]. These applications resulted in methods that are very different from the traditional ones. In particular, a self-tuning method for PID controllers based on adaptive interaction was developed in [18]. The advantages of this tuning method include the following:
1. It is very simple and can be easily implemented.
2. It requires virtually no knowledge of the plant.
3. It works for nonlinear as well as linear systems.
4. It is automatic and requires no human intervention.
5. It works on-line as well as off-line.
6. Stability is guaranteed after convergence.
7. The initial system can be stable or unstable.
In this paper, we apply the approach of adaptive interaction to neural networks. Before our approach, the back-propagation algorithm was used to adapt the synapses (that is, the interactions) in a neural network. To use the back-propagation algorithm, a dedicated companion (feedback) network to propagate the error back is required. This may complicate implementations of the back-propagation algorithm, especially hardware implementations. On the other hand, using adaptive interaction, we can eliminate the need for such a feedback network, and hence significantly reduce the complexity of adaptation for complex neural networks. This is particularly important in VLSI implementations of neural networks. The absence of the feedback network means that adding trainability to a chip design does not involve additional wiring-layout complexity between neurons. A trainable neuron can be designed as a standard unit without considering network topology. These trainable neurons can then be connected in any way the designer wants. Obviously, this increases the potential for designing networks with dynamically reconfigurable topologies.
Furthermore, our adaptation algorithm also has an important implication for the biological plausibility of similar adaptations occurring in biological neurons. Since the back-propagation algorithm was proposed in the 1980s, researchers have speculated about whether an analogous adaptation mechanism might be observed in biological neural systems [30]. The consensus among neuroscientists is that this is not likely [8]. The main reason for this belief is that the requirement of a separate feedback network is unlikely to be met in a biological neural system. This is not to say that reciprocal connections are rare in biological neural systems (in fact they are ubiquitous), but rather that it is unlikely that a biological neural system could satisfy the strict requirement that there exist a one-to-one correspondence between connections in the feed-forward and feedback networks, and that the corresponding connections in the two networks maintain identical weights even as they adapt. This seems even less likely given the fact that in most biological neural systems a connection between two neurons is composed of many (even hundreds) of synapses. With the elimination of the feedback network, the problem of the biological plausibility of similar adaptation occurring in biological neurons may need to be reinvestigated.

2 Adaptive Interaction
Adaptive interaction considers a complex system consisting of $N$ subsystems which we call devices. Each device (indexed by $n \in \mathcal{N} := \{1, 2, \ldots, N\}$) has an integrable output signal $y_n$ and an integrable input signal $x_n$. The dynamics of each device is described by a (generally nonlinear) causal² functional
$$F_n : X_n \to Y_n, \qquad n \in \mathcal{N},$$
where $X_n$ and $Y_n$ are the input and output spaces, respectively. That is, the output $y_n(t)$ of the $n$th device is related to its input $x_n(t)$ by
$$y_n(t) = (F_n \circ x_n)(t) = F_n[x_n](t), \qquad n \in \mathcal{N},$$
where $\circ$ denotes composition.
We assume that the Fréchet derivative of $F_n$ exists³. We further assume that each device is a single-input single-output system.

²A functional $F_n : X_n \to Y_n$ is causal if $y_n(t)$ depends only on the previous history of $x_n$, $\{x_n(\tau) : \tau \le t\}$.
³The Fréchet derivative [22], $F_n'[x]$, of $F_n[x]$, is defined as a functional such that
$$\lim_{\|\delta\| \to 0} \frac{\|F_n[x + \delta] - F_n[x] - F_n'[x]\,\delta\|}{\|\delta\|} = 0.$$


An interaction between two devices consists of a (generally non-exclusive) functional dependence of the input of one of the devices on the outputs of the others, and is mediated by an information-carrying connection denoted by $c$. The set of all connections is denoted by $C$. We assume that there is at most one connection from one device to another. Let $\mathrm{pre}_c$ be the device whose output is conveyed by connection $c$ and $\mathrm{post}_c$ the device whose input depends on the signal conveyed by connection $c$. We denote the set of input connections of the $n$th device by $I_n = \{c : \mathrm{post}_c = n\}$ and the set of output connections by $O_n = \{c : \mathrm{pre}_c = n\}$. A typical system is illustrated in Figure 1. In the figure, for example, the set of input connections of Device 2 is $I_2 = \{c_1, c_3\}$ and the set of output connections is $O_2 = \{c_4\}$. Also, $c_1$ connects Device 1 to Device 2; therefore $\mathrm{pre}_{c_1} = 1$ and $\mathrm{post}_{c_1} = 2$.
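In code, the pre/post bookkeeping for a decomposition like that of Figure 1 might look as follows. Only $c_1$'s endpoints and the sets $I_2$, $O_2$ are stated in the text, so the remaining connection endpoints below are our assumptions for illustration:

```python
# Each connection c maps to (pre_c, post_c): the device whose output it
# carries and the device whose input it feeds. Only c1's endpoints and
# I_2, O_2 are given in the text; the other endpoints are assumed.
connections = {
    "c1": (1, 2),
    "c2": (1, 3),
    "c3": (4, 2),
    "c4": (2, 5),
}

def input_connections(n):
    """I_n = {c : post_c = n} -- connections feeding device n."""
    return {c for c, (pre, post) in connections.items() if post == n}

def output_connections(n):
    """O_n = {c : pre_c = n} -- connections carrying device n's output."""
    return {c for c, (pre, post) in connections.items() if pre == n}

print(sorted(input_connections(2)))   # ['c1', 'c3'], as in the text
print(sorted(output_connections(2)))  # ['c4']
```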
For the purpose of this paper, we consider only linear interactions; that is, we assume that the input to a device is a linear combination of the outputs of other devices via the connections in $I_n$, plus possibly an external input signal $u_n(t)$:
$$x_n(t) = u_n(t) + \sum_{c \in I_n} \alpha_c y_{\mathrm{pre}_c}(t), \qquad n \in \mathcal{N},$$
where the $\alpha_c$ are the connection weights.
With this linear interaction, the dynamics of the system is described by
$$y_n(t) = F_n\Bigl[u_n(t) + \sum_{c \in I_n} \alpha_c y_{\mathrm{pre}_c}(t)\Bigr], \qquad n \in \mathcal{N}.$$
To simplify the notation, in the rest of the paper we will omit the explicit reference to time $t$ when appropriate.
The goal of our approach is to develop an algorithm to adapt the connection weights $\alpha_c$ so that some performance index $E(y_1, \ldots, y_N)$ is minimized. The only assumption we make to ensure the correctness of our adaptation algorithm is that the following equation
$$\dot{\alpha}_c = \gamma \Biggl( \sum_{s \in O_{\mathrm{post}_c}} \alpha_s \dot{\alpha}_s \, \frac{\frac{dE}{dy_{\mathrm{post}_s}} \circ F_{\mathrm{post}_s}'[x_{\mathrm{post}_s}]}{\gamma \, \frac{dE}{dy_{\mathrm{post}_s}} \circ F_{\mathrm{post}_s}'[x_{\mathrm{post}_s}] \circ y_{\mathrm{post}_c}} - \frac{\partial E}{\partial y_{\mathrm{post}_c}} \Biggr) \circ F_{\mathrm{post}_c}'[x_{\mathrm{post}_c}] \circ y_{\mathrm{pre}_c}, \qquad c \in C,$$
has a unique solution for $\dot{\alpha}_c$, $c \in C$. This requires the corresponding Jacobian determinant to be nonzero in the region of interest.
The adaptation algorithm is given in the following theorem.

Theorem 1 For the system with dynamics given by
$$y_n = F_n\Bigl[u_n + \sum_{c \in I_n} \alpha_c y_{\mathrm{pre}_c}\Bigr], \qquad n \in \mathcal{N}, \tag{1}$$
if the connection weights $\alpha_c$ are adapted according to
$$\dot{\alpha}_c = \gamma \Biggl( \sum_{s \in O_{\mathrm{post}_c}} \alpha_s \dot{\alpha}_s \, \frac{\frac{dE}{dy_{\mathrm{post}_s}} \circ F_{\mathrm{post}_s}'[x_{\mathrm{post}_s}]}{\gamma \, \frac{dE}{dy_{\mathrm{post}_s}} \circ F_{\mathrm{post}_s}'[x_{\mathrm{post}_s}] \circ y_{\mathrm{post}_c}} - \frac{\partial E}{\partial y_{\mathrm{post}_c}} \Biggr) \circ F_{\mathrm{post}_c}'[x_{\mathrm{post}_c}] \circ y_{\mathrm{pre}_c}, \qquad c \in C, \tag{2}$$
then the performance index $E$ decreases monotonically with time. In fact, the following is always satisfied:
$$\dot{\alpha}_c = -\gamma \frac{dE}{d\alpha_c}, \qquad c \in C, \tag{3}$$
where $\gamma > 0$ is some adaptation coefficient.

Proof

Since by our assumption equation (2) has a unique solution, all we need to prove is that equation (3) satisfies equation (2). Because $E$ is a functional of $y_n$, $n \in \mathcal{N}$, we have for any connection $c \in C$,
$$\frac{dE}{d\alpha_c} = \frac{dE}{dy_{\mathrm{post}_c}} \circ \frac{dy_{\mathrm{post}_c}}{dx_{\mathrm{post}_c}} \circ \frac{dx_{\mathrm{post}_c}}{d\alpha_c} = \frac{dE}{dy_{\mathrm{post}_c}} \circ \frac{dy_{\mathrm{post}_c}}{dx_{\mathrm{post}_c}} \circ y_{\mathrm{pre}_c} = \frac{dE}{dy_{\mathrm{post}_c}} \circ F_{\mathrm{post}_c}'[x_{\mathrm{post}_c}] \circ y_{\mathrm{pre}_c}.$$
Also, for any $n \in \mathcal{N}$,
$$\frac{dE}{dy_n} = \frac{\partial E}{\partial y_n} + \sum_{c \in O_n} \frac{dE}{dy_{\mathrm{post}_c}} \circ \frac{dy_{\mathrm{post}_c}}{dy_n} = \frac{\partial E}{\partial y_n} + \sum_{c \in O_n} \alpha_c \, \frac{dE}{dy_{\mathrm{post}_c}} \circ F_{\mathrm{post}_c}'[x_{\mathrm{post}_c}].$$
Using these two equations, we have
$$\frac{dE}{dy_n} = \frac{\partial E}{\partial y_n} + \sum_{c \in O_n} \alpha_c \, \frac{dE}{d\alpha_c} \, \frac{\frac{dE}{dy_{\mathrm{post}_c}} \circ F_{\mathrm{post}_c}'[x_{\mathrm{post}_c}]}{\frac{dE}{dy_{\mathrm{post}_c}} \circ F_{\mathrm{post}_c}'[x_{\mathrm{post}_c}] \circ y_{\mathrm{pre}_c}}.$$
Substituting $\frac{dE}{d\alpha_c}$ by $-\frac{\dot{\alpha}_c}{\gamma}$, we have
$$\frac{dE}{dy_n} = \frac{\partial E}{\partial y_n} - \frac{1}{\gamma} \sum_{c \in O_n} \alpha_c \dot{\alpha}_c \, \frac{\frac{dE}{dy_{\mathrm{post}_c}} \circ F_{\mathrm{post}_c}'[x_{\mathrm{post}_c}]}{\frac{dE}{dy_{\mathrm{post}_c}} \circ F_{\mathrm{post}_c}'[x_{\mathrm{post}_c}] \circ y_{\mathrm{pre}_c}}.$$
Therefore,
$$\dot{\alpha}_c = -\gamma \frac{dE}{d\alpha_c} = -\gamma \, \frac{dE}{dy_{\mathrm{post}_c}} \circ F_{\mathrm{post}_c}'[x_{\mathrm{post}_c}] \circ y_{\mathrm{pre}_c}$$
$$= -\gamma \Biggl( \frac{\partial E}{\partial y_{\mathrm{post}_c}} - \frac{1}{\gamma} \sum_{s \in O_{\mathrm{post}_c}} \alpha_s \dot{\alpha}_s \, \frac{\frac{dE}{dy_{\mathrm{post}_s}} \circ F_{\mathrm{post}_s}'[x_{\mathrm{post}_s}]}{\frac{dE}{dy_{\mathrm{post}_s}} \circ F_{\mathrm{post}_s}'[x_{\mathrm{post}_s}] \circ y_{\mathrm{pre}_s}} \Biggr) \circ F_{\mathrm{post}_c}'[x_{\mathrm{post}_c}] \circ y_{\mathrm{pre}_c}$$
$$= \gamma \Biggl( \sum_{s \in O_{\mathrm{post}_c}} \alpha_s \dot{\alpha}_s \, \frac{\frac{dE}{dy_{\mathrm{post}_s}} \circ F_{\mathrm{post}_s}'[x_{\mathrm{post}_s}]}{\gamma \, \frac{dE}{dy_{\mathrm{post}_s}} \circ F_{\mathrm{post}_s}'[x_{\mathrm{post}_s}] \circ y_{\mathrm{pre}_s}} - \frac{\partial E}{\partial y_{\mathrm{post}_c}} \Biggr) \circ F_{\mathrm{post}_c}'[x_{\mathrm{post}_c}] \circ y_{\mathrm{pre}_c}.$$
Finally, since $y_{\mathrm{post}_c} = y_{\mathrm{pre}_s}$ for $s \in O_{\mathrm{post}_c}$,
$$\dot{\alpha}_c = \gamma \Biggl( \sum_{s \in O_{\mathrm{post}_c}} \alpha_s \dot{\alpha}_s \, \frac{\frac{dE}{dy_{\mathrm{post}_s}} \circ F_{\mathrm{post}_s}'[x_{\mathrm{post}_s}]}{\gamma \, \frac{dE}{dy_{\mathrm{post}_s}} \circ F_{\mathrm{post}_s}'[x_{\mathrm{post}_s}] \circ y_{\mathrm{post}_c}} - \frac{\partial E}{\partial y_{\mathrm{post}_c}} \Biggr) \circ F_{\mathrm{post}_c}'[x_{\mathrm{post}_c}] \circ y_{\mathrm{pre}_c}.$$
This shows that equation (3) is the unique solution to equation (2). □
If $F_n$ and $E$ are instantaneous functions, which is the case for neural networks, then the composition $\circ$ can be replaced by multiplication in the adaptation algorithm. In other words, the adaptation algorithm can be simplified to
$$\dot{\alpha}_c = F_{\mathrm{post}_c}'[x_{\mathrm{post}_c}] \, \frac{y_{\mathrm{pre}_c}}{y_{\mathrm{post}_c}} \sum_{s \in O_{\mathrm{post}_c}} \alpha_s \dot{\alpha}_s - \gamma \, F_{\mathrm{post}_c}'[x_{\mathrm{post}_c}] \, y_{\mathrm{pre}_c} \, \frac{\partial E}{\partial y_{\mathrm{post}_c}}. \tag{4}$$
We have applied the above adaptation algorithm to adaptive control and system identification [18, 19]. Simulations show that the results are excellent even if we approximate $F_{\mathrm{post}_c}'[x_{\mathrm{post}_c}]$ by a constant. When we use this approximation, there is no need to know $F_{\mathrm{post}_c}[x_{\mathrm{post}_c}]$; that is, the model of the device is not needed. As we indicated earlier, adaptation without a model has many advantages.
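The remark about approximating $F'_{\mathrm{post}_c}[x_{\mathrm{post}_c}]$ by a constant can be illustrated on a single weight feeding one unknown device. In this sketch the device $F = \tanh$, all constants, and the Euler integration are our assumptions, not from the paper; the point is only that replacing $F'$ by the constant 1 still drives the error down:

```python
import math

# Adapt a single interaction weight alpha feeding an unknown device
# y = F(alpha * u) with F = tanh, minimizing E = (y - d)^2 / 2.
# Exact rule (3): alpha_dot = -gamma * (y - d) * F'(alpha*u) * u.
# Following the paper's remark, F' is replaced by the constant 1, so no
# model of F is needed during adaptation.
def adapt(steps=2000, gamma=0.5, dt=0.05):
    u, d = 0.8, 0.6          # external input and desired output (our choices)
    alpha = -1.0             # initial weight
    for _ in range(steps):
        y = math.tanh(alpha * u)
        alpha += dt * (-gamma * (y - d) * 1.0 * u)  # F' approximated by 1
    return math.tanh(alpha * u)

y_final = adapt()
print(abs(y_final - 0.6))  # residual error is small after adaptation
```

Replacing $F'$ by a constant only rescales the (sign-preserving) descent direction here, which is why the model-free update still converges.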

3 Neural Networks
Let us now apply the adaptation algorithm we developed to neural networks. We therefore take the devices in the system to be neurons. We use standard neural-network notation and denote, for $i, j \in \mathcal{N}$:

$v_i$ – the output of neuron $i$;
$h_i$ – the input of neuron $i$;
$\xi_i$ – the external input of neuron $i$;
$w_{ij}$ – the weight of the connection from neuron $j$ to neuron $i$; $w_{ij} = 0$ if $j$ is not connected to $i$.

We denote by $g(x)$ the activation function of a neuron. For sigmoidal neurons,
$$g(x) = \sigma(x) = \frac{1}{1 + e^{-x}}.$$
Ignoring the dynamics, we can describe the neural network by⁴
$$v_i = g(h_i) = g\Bigl(\sum_{j \in \mathcal{N}} w_{ij} v_j + \xi_i\Bigr), \qquad i \in \mathcal{N}.$$

For output neurons, denote

$\zeta_i$ – the desired output of neuron $i$.

Our goal is to minimize the following error:
$$E = \frac{1}{2} \sum_{i \in \mathcal{N}} e_i^2,$$
where
$$e_i = \begin{cases} v_i - \zeta_i & \text{if } i \text{ is an output neuron,} \\ 0 & \text{otherwise.} \end{cases}$$

⁴We assume that the equation has at least one fixed point which is a stable attractor.
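The fixed-point assumption of footnote 4 can be made concrete: for a (possibly recurrent) network, the output vector solves $v_i = g(\sum_j w_{ij} v_j + \xi_i)$, and when that fixed point is a stable attractor it can be found by repeated substitution. A small sketch with arbitrary two-neuron weights of our choosing:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Solve v_i = g(sum_j w[i][j]*v_j + xi[i]) by repeated substitution,
# relying on the stable-fixed-point assumption. The 2-neuron weights and
# external inputs below are arbitrary illustrative choices.
def fixed_point(w, xi, iters=200):
    v = [0.5] * len(xi)
    for _ in range(iters):
        v = [sigmoid(sum(w[i][j] * v[j] for j in range(len(v))) + xi[i])
             for i in range(len(v))]
    return v

w = [[0.0, 0.5], [-0.3, 0.0]]   # w[i][j]: weight from neuron j to neuron i
xi = [0.2, -0.1]
v = fixed_point(w, xi)

# v now satisfies the network equation to numerical precision
residual = max(abs(v[i] - sigmoid(sum(w[i][j] * v[j] for j in range(2)) + xi[i]))
               for i in range(2))
print(residual)
```

With small weights the iteration map is a contraction (the sigmoid slope is at most 1/4), so the residual shrinks geometrically.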

We can now apply our adaptation algorithm to the neural network. We have the following substitutions:
$$\alpha_c \to w_{ij}, \quad F_{\mathrm{post}_c}'[x_{\mathrm{post}_c}] \to g'(h_i), \quad y_{\mathrm{pre}_c} \to v_j, \quad y_{\mathrm{post}_c} \to v_i, \quad \alpha_s \to w_{ki}, \quad \frac{\partial E}{\partial y_{\mathrm{post}_c}} \to \frac{\partial E}{\partial v_i} = e_i.$$
Therefore, the adaptation algorithm for neural networks is as follows:
$$\dot{w}_{ij} = g'(h_i) \frac{v_j}{v_i} \sum_{k \in \mathcal{N}} w_{ki} \dot{w}_{ki} - \gamma \, g'(h_i) \, v_j \, e_i.$$
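Because the rule is implicit ($\dot{w}$ appears on both sides), in a feedforward network it can be solved layer by layer starting from the output, where the downstream sum is empty. The sketch below does this for a small 2-2-1 network (the architecture, weights, and input values are our illustrative choices, not the paper's) and checks the resulting update against a numerically differentiated gradient:

```python
import math

def sig(x): return 1.0 / (1.0 + math.exp(-x))
def dsig(x): s = sig(x); return s * (1.0 - s)

gamma = 1.0
b = [0.3, 0.7]                    # network inputs v_1, v_2 (our choices)
W1 = [[0.4, -0.2], [0.1, 0.5]]    # hidden weights w_ij (hidden i, input j)
W2 = [0.6, -0.4]                  # output weights w_ki (single output k)
target = 0.2

# Forward pass
h_hid = [sum(W1[i][j] * b[j] for j in range(2)) for i in range(2)]
v_hid = [sig(h) for h in h_hid]
h_out = sum(W2[i] * v_hid[i] for i in range(2))
v_out = sig(h_out)
e = v_out - target

# Output connections: the downstream set O_post is empty, so the rule
# reduces to w_dot = -gamma * g'(h_i) * v_j * e_i.
W2_dot = [-gamma * dsig(h_out) * v_hid[i] * e for i in range(2)]

# Hidden connections: e_i = 0, and the downstream sum uses the
# already-computed W2_dot -- local information only.
W1_dot = [[dsig(h_hid[i]) * (b[j] / v_hid[i]) * W2[i] * W2_dot[i]
           for j in range(2)] for i in range(2)]

# Compare with a numerical gradient of E = e^2 / 2 w.r.t. W1[0][0].
def E(w00):
    hh = [w00 * b[0] + W1[0][1] * b[1],
          W1[1][0] * b[0] + W1[1][1] * b[1]]
    vh = [sig(h) for h in hh]
    vo = sig(sum(W2[i] * vh[i] for i in range(2)))
    return 0.5 * (vo - target) ** 2

eps = 1e-6
num_grad = (E(W1[0][0] + eps) - E(W1[0][0] - eps)) / (2 * eps)
print(abs(W1_dot[0][0] + gamma * num_grad))  # ~0: local rule gives -gamma*dE/dw
```

The agreement illustrates the claimed equivalence with gradient descent: each update equals $-\gamma$ times the partial derivative of $E$, yet only quantities local to a connection and the updates of its downstream connections are used.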
The above algorithm is mathematically equivalent to the back-propagation algorithm. However, it does not require a feedback network to propagate the errors.
By eliminating the feedback network, our new algorithm allows a much simpler implementation than that of the back-propagation algorithm. Using our algorithm, the adaptation mechanism can be built into each neuron to make the neuron trainable. A trainable neuron can be built as a standard unit. As a particular application requires, these trainable neurons can be interconnected arbitrarily with minimal wiring. In this way, it is much easier to change the topology of a neural network, in other words, to reconfigure the network.
In the rest of the paper, we will assume $g(x) = \sigma(x)$. Since
$$\sigma'(x) = \sigma(x)\sigma(-x),$$
we can rewrite the adaptation algorithm as
$$\dot{w}_{ij} = \sigma'(h_i) \frac{v_j}{v_i} \sum_{k \in \mathcal{N}} w_{ki} \dot{w}_{ki} - \gamma \, \sigma'(h_i) \, v_j \, e_i = \sigma(-h_i) \, v_i \, \frac{v_j}{v_i} \sum_{k \in \mathcal{N}} w_{ki} \dot{w}_{ki} - \gamma \, \sigma(-h_i) \, v_i \, v_j \, e_i$$
$$= \sigma(-h_i) \, v_j \, \frac{1}{2} \Bigl(\sum_{k \in \mathcal{N}} w_{ki}^2\Bigr)' - \gamma \, \sigma(-h_i) \, v_i \, v_j \, e_i.$$
A standard unit of a trainable neuron that implements the above equation is shown in Figure 2.
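The rewriting relies on the identity $\sigma'(x) = \sigma(x)\sigma(-x)$, which follows from $1 - \sigma(x) = \sigma(-x)$. A quick numerical confirmation (the test points are arbitrary):

```python
import math

def sigma(x):
    return 1.0 / (1.0 + math.exp(-x))

# Check sigma'(x) = sigma(x) * sigma(-x) against a central-difference
# derivative at a few arbitrary points.
eps = 1e-6
checks = []
for x in (-2.0, -0.5, 0.0, 1.3, 3.0):
    numeric = (sigma(x + eps) - sigma(x - eps)) / (2 * eps)
    analytic = sigma(x) * sigma(-x)
    checks.append(abs(numeric - analytic) < 1e-8)
print(all(checks))  # True
```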

4 Application: Function Approximation

Let us now apply the adaptation algorithm developed in the previous section to function approximation. That is, our goal is to train a neural network to approximate a set of nonlinear functions
$$a_i = \phi_i(b_1, \ldots, b_l), \qquad i = 1, \ldots, m,$$
over a domain $D \subset \mathbb{R}^l$. We will denote the inputs and outputs by $b = (b_1, \ldots, b_l)$ and $a = (a_1, \ldots, a_m)$, and hence $a = \Phi(b)$.
We construct a network with $l$ inputs and $m$ outputs. The number of neurons and the topology of the connections determine the achievable accuracy of the approximation. In general, more neurons and connections in the network result in a more accurate approximation. To this end, we can define the topological capacity of a network as the total number of connections. For the construction of a network having the largest topological capacity for a given number of neurons, we refer the reader to [5].
To train the network, we vary the input $b$ over time according to
$$b = \beta(t).$$
The function $\beta$ must be such that the trajectory $b(t) = \beta(t)$ repeatedly visits all regions of $D$, more or less uniformly.
Denote the input neurons by $1, \ldots, l$ and the output neurons by $N - m + 1, \ldots, N$. Let
$$v_j = b_j, \qquad j = 1, \ldots, l.$$
Then the outputs of the neural network are functions of $b_j$, $j = 1, \ldots, l$:
$$v_i = \hat{\phi}_i(v_1, \ldots, v_l) = \hat{\phi}_i(b_1, \ldots, b_l), \qquad i = N - m + 1, \ldots, N.$$
Denote $v = (v_{N-m+1}, \ldots, v_N)$, and hence $v = \hat{\Phi}(b)$.


Since we want to approximate $\Phi$, we adapt the synapse weights to minimize the error
$$E = \frac{1}{2} \sum_{i = N - m + 1}^{N} (a_i - v_i)^2.$$
To illustrate the effectiveness of such function approximation, we build a neural network of two layers, as shown in Figure 3, using the standard units described in the previous section. We use this neural network to approximate the following nonlinear function:
$$a = b_1 + b_2 - 2 b_1 b_2.$$
Note that the XOR problem can be expressed by this function.
We performed a simulation on this network. In the simulation, we let $b_1$ and $b_2$ vary over the region $D = [0.1, 0.9] \times [0.1, 0.9]$ as follows:
$$b_1 = 0.5 + 0.4 \cos(0.001 \pi t),$$
$$b_2 = 0.5 + 0.4 \cos(0.002 \pi t).$$
We take
$$\gamma = 0.03$$
and select the initial values of $w_i$ randomly in the interval $[-2, 2]$. The simulation results are shown in Figures 4 and 5.
It is clear from the simulation that the neural network adapts nicely, as the error decreases significantly during the simulation. More simulation results can be found in [3, 4, 6], where the convergence rate, the adaptation coefficient, and the basin of attraction are studied in detail.
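A discrete-time sketch of this experiment can be written in a few lines. We simplify freely: random samples of $(b_1, b_2)$ instead of the cosine sweep, a larger adaptation coefficient, a fixed seed, and a plain layer-by-layer solution of the implicit rule, so this only mirrors the qualitative behavior reported in Figures 4 and 5:

```python
import math
import random

def sig(x): return 1.0 / (1.0 + math.exp(-x))

# Train a 2-2-1 network (as in Figure 3, no bias terms) on
# a = b1 + b2 - 2*b1*b2 using the local rule: output layer first,
# then hidden layer. Hyperparameters and sampling are our choices.
random.seed(1)
gamma = 0.2
W1 = [[random.uniform(-2, 2) for _ in range(2)] for _ in range(2)]
W2 = [random.uniform(-2, 2) for _ in range(2)]

def forward(b):
    h1 = [sum(W1[i][j] * b[j] for j in range(2)) for i in range(2)]
    v1 = [sig(h) for h in h1]
    h2 = sum(W2[i] * v1[i] for i in range(2))
    return h1, v1, h2, sig(h2)

def avg_error(samples):
    tot = 0.0
    for b1v, b2v in samples:
        *_, out = forward((b1v, b2v))
        tot += 0.5 * (out - (b1v + b2v - 2 * b1v * b2v)) ** 2
    return tot / len(samples)

test = [(0.1, 0.1), (0.1, 0.9), (0.9, 0.1), (0.9, 0.9), (0.5, 0.5)]
before = avg_error(test)
for _ in range(20000):
    b = (random.uniform(0.1, 0.9), random.uniform(0.1, 0.9))
    h1, v1, h2, out = forward(b)
    e = out - (b[0] + b[1] - 2 * b[0] * b[1])
    so = sig(h2) * sig(-h2)                      # sigma'(h2)
    W2_dot = [-gamma * so * v1[i] * e for i in range(2)]
    for i in range(2):                            # hidden layer: e_i = 0
        si = sig(h1[i]) * sig(-h1[i])            # sigma'(h1[i])
        for j in range(2):
            W1[i][j] += si * (b[j] / v1[i]) * W2[i] * W2_dot[i]
    for i in range(2):
        W2[i] += W2_dot[i]
after = avg_error(test)
print(before, after)  # average squared error before vs. after training
```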

5 Conclusion
In this paper, a new approach to system adaptation was proposed. We view the adaptation of a system as accomplished by adaptive interaction among its subsystems (devices). We derived an adaptation algorithm for adapting the interactions that can be implemented based on local information only. Furthermore, an approximation of this algorithm does not require knowledge of the models of the devices. We applied this approach of adaptive interaction to neural networks. The adaptation algorithm obtained is mathematically equivalent to the well-known back-propagation algorithm but requires no feedback network to propagate the errors.

References
[1] W. R. Ashby (1960). Design for a Brain, Wiley.
[2] R. C. Bolles and M. D. Beecher (eds.) (1988). Evolution and Learning, Lawrence Erlbaum Associates, Publishers.
[3] R. D. Brandt and F. Lin (1994). Supervised learning in neural networks without explicit error back-propagation. Proceedings of the 32nd Annual Allerton Conference on Communication, Control and Computing, pp. 294-303.
[4] R. D. Brandt and F. Lin (1996a). Can supervised learning be achieved without explicit error back-propagation? Proceedings of the International Conference on Neural Networks, pp. 300-305.
[5] R. D. Brandt and F. Lin (1996b). Optimal layering of neurons. 1996 IEEE International Symposium on Intelligent Control, pp. 497-501.
[6] R. D. Brandt and F. Lin (1996c). Supervised learning in neural networks without feedback network. 1996 IEEE International Symposium on Intelligent Control, pp. 86-90.
[7] R. D. Brandt and F. Lin (1998). Theory of Adaptive Interaction, AFI Press, to appear.
[8] F. Crick (1989). The recent excitement about neural networks. Nature, 337, pp. 129-132.
[9] B. K. Dolenko and H. C. Card (1995). Tolerance to analog hardware of on-chip learning in backpropagation networks. IEEE Transactions on Neural Networks, 6(5), pp. 1045-1052.
[10] S. Haykin (1994). Neural Networks: A Comprehensive Foundation, IEEE Press.
[11] J. Hertz, A. Krogh and R. G. Palmer (1991). Introduction to the Theory of Neural Computation, Addison-Wesley.
[12] J. H. Holland (1992). Adaptation in Natural and Artificial Systems, MIT Press.
[13] P. W. Hollis and J. J. Paulos (1994). A neural network learning algorithm tailored for VLSI implementation. IEEE Transactions on Neural Networks, 5(5), pp. 781-791.
[14] P. A. Ioannou and J. Sun (1996). Robust Adaptive Control, Prentice-Hall.
[15] R. Isermann, K.-H. Lachmann and D. Matko (1992). Adaptive Control Systems, Prentice-Hall.
[16] Y. D. Landau (1979). Adaptive Control: The Model Reference Approach, Marcel Dekker, Inc.
[17] J. A. Lansner and T. Lehmann (1993). An analog CMOS chip set for neural networks with arbitrary topologies. IEEE Transactions on Neural Networks, 4(3), pp. 441-444.
[18] F. Lin, R. D. Brandt and G. Saikalis (1998). Self-tuning of PID controllers by adaptive interaction. Preprint.
[19] F. Lin, R. D. Brandt and G. Saikalis (1998). Parameter estimation using adaptive interaction. Preprint.
[20] F. Lin and R. D. Brandt (1998). Adaptive interaction: A new approach to adaptation. Preprint.
[21] B. Linares-Barranco, E. Sanchez-Sinencio, A. Rodriguez-Vazquez, and J. L. Huertas (1993). A CMOS adaptive BAM with on-chip learning and weight refreshing. IEEE Transactions on Neural Networks, 4(3), pp. 445-455.
[22] D. G. Luenberger (1968). Optimization by Vector Space Methods, John Wiley & Sons.
[23] D. B. Parker (1987). Optimal algorithms for adaptive networks: second order back propagation, second order direct propagation, and second order Hebbian learning. Proceedings of the IEEE International Conference on Neural Networks, pp. 593-600.
[24] K. H. Pribram (1993). Rethinking Neural Networks: Quantum Fields and Biological Data, Lawrence Erlbaum Associates, Publishers.
[25] D. E. Rumelhart, G. E. Hinton, and R. J. Williams (1986). Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations (D. E. Rumelhart and J. L. McClelland, eds.), MIT Press, Cambridge, pp. 318-362.
[26] G. C. Williams (1966). Adaptation and Natural Selection, Princeton University Press.
[27] R. J. Williams (1988). On the use of back-propagation in associative reinforcement learning. Proceedings of the IEEE International Conference on Neural Networks, pp. 263-270.
[28] R. J. Williams (1988). Towards a theory of reinforcement-learning connectionist systems. Technical Report NU-CCS-88-3, Northeastern University.
[29] C.-Y. Wu and J.-F. Lan (1996). CMOS current-mode neural associative memory design with on-chip learning. IEEE Transactions on Neural Networks, 7(1), pp. 167-177.
[30] D. Zipser and D. E. Rumelhart (1990). Neurobiological significance of new learning models. In Computational Neuroscience (E. Schwartz, ed.), MIT Press, Cambridge, pp. 192-200.

[Figure omitted: five devices (Device 1 through Device 5) interconnected by connections $c_1$ through $c_4$.]

Figure 1: A typical decomposition of a system for adaptive interaction

[Figure omitted: block diagram of a trainable neuron, with inputs $v_1, \ldots, v_N$, weights $w_{i1}, \ldots, w_{iN}$, external input $\xi_i$, internal signal $h_i$, sigmoid blocks, squared-weight terms, and output $v_i$.]

Figure 2: A standard unit for a trainable neuron using our adaptive algorithm

[Figure omitted: a two-layer network with inputs $b_1$, $b_2$ and weights $w_1, \ldots, w_6$.]

Figure 3: A neural network to approximate the generalized XOR function

[Figure omitted: four panels plotting input1, input2, output, and error against $t$ (up to $2 \times 10^4$).]

Figure 4: Simulation results of the neural network in Figure 3: the error decreases as the network adapts

[Figure omitted: six panels plotting the weights $w_1, \ldots, w_6$ against $t$.]

Figure 5: Simulation results of the neural network in Figure 3: the six connection weights adapt according to the adaptive algorithm
