
Generalization Capacity in Artificial Neural Networks

- the ballistic trajectory learning case -

M. Roisenberg1, J. M. Barreto2, F. M. Azevedo2

1Instituto de Informática, Departamento de Informática Aplicada,


Universidade Federal do Rio Grande do Sul,
Porto Alegre, Brazil
e-mail: roisenb@inf.ufrgs.br
(Currently in a doctoral program at address 2)

2Grupo de Pesquisas em Engenharia Biomédica, Departamento de Engenharia Elétrica,


Universidade Federal de Santa Catarina,
Florianópolis, Brazil
e-mail: barreto@gpeb.ufsc.br, azevedo@gpeb.ufsc.br

Abstract
This paper presents the use of an Artificial Neural Network (ANN) to learn the features of a
dynamic system, in this case the development of an ANN as the controller for a device designed to
launch objects under ballistic movement. We discuss the generalization capacity both of living
beings and of the network. We also present the network topology and configuration, the learning
technique and the precision obtained. Finally, this work shows the importance of the choice of the
activation function in learning some critical points.

1. Introduction

The capacity to intercept objects in ballistic movement is common to many living beings. Observing some situations in our everyday life, we might imagine that the interception of objects under ballistic movement is a trivial action. Consider, for example, when we throw a wooden stick to a dog, or games such as football and basketball. In these situations the interception task is always present, but we rarely think about how it is done. Another difficulty in achieving that action resides in the required response time: when someone throws you a ball, you have just a few seconds to predict its trajectory and to act to intercept it at the right point. A further problem is that the ballistic movement, whose trajectory is given by the following equations:

x = vo * cos(θ) * t (1)

and

y = vo * sin(θ) * t - g * t^2 / 2 (2)

depends on the initial conditions of the launching, that is, the initial launching angle and velocity. The combination of these two variables can generate trajectories that fill the whole state space ℜ2.

As there is an infinite number of possible trajectories, it is "not trivial" to train a neural network to solve this problem (so easily solved by human beings). We must count on the neural network generalization capacity. Since it is impossible to train a neural network with the set of all possible trajectories, its generalization capacity must be exploited: we train the network with some trajectories and expect it to correctly predict all the others.

2. Generalization vs. Specialization

Artificial Neural Networks are capable of estimating continuous functions by observing how the output data relate to the process inputs, without mathematically
specifying this relationship. Due to this feature, neural networks are becoming widely used by the engineering community in identification problems and in the control of dynamic systems. Once a neural network has learned the desired relationship between the input and output data presented during the training phase, we hope that, in the execution phase, it gives right answers to other problems of the same type that it has never seen, presenting the expected generalization feature.

One of the most interesting features of intelligent living beings is the specialization versus generalization dilemma. In this context, specialization means being precise, accurate in producing the right answers to some given questions. Generalization, on the other hand, means the ability to give right answers (not necessarily accurate ones) to questions not previously seen in the examples.

Some animals are very specialized in aspects of their behavior or anatomy that help them survive in their natural environment, like the panda bear, which feeds only on certain kinds of bamboo. On the other hand, this specialization makes adaptation to other conditions, or the learning of other tasks, very difficult. Animals that are not so specialized, like humans, have a higher adaptation and survival capacity, maybe due to their generalization capacity. In the computational area, traditional computer programs perform very well on algorithmic problems, but much worse on ill-defined, imprecise, fuzzy problems. Humans, on the other hand, are very poor at solving algorithmic problems, but they can predict the ballistic trajectory of an object even if it has been launched with an angle or velocity never seen before. We expect that other computational approaches, like the neural computing paradigm, perform better on this ill-defined and imprecise class of problems.

3. Implementation

As already said, it is impossible to train the neural network with all the ballistic trajectories generated by different launching velocities and launching angles. So, we must choose an adequate set of velocities and launching angles to train the network. Once trained for these trajectories, we expect that the neural network can generalize, i.e., predict the real trajectory for angles and velocities not seen in the training examples.

In order to implement the interception device, we had to study some questions: the generalization capacity of neural networks to solve this problem, how precise the given responses are, and what the network must learn (which trajectories must be presented to the network during the training phase). So, we implemented two networks and selected two situations to train them. In the first situation, the direct model of the ballistic movement was presented to the network, i.e., for a given launching angle, velocity and time after the launching, we want to obtain the position of the launched object in the plane (x,y). In the second case, the situation was inverted and the inverse model was presented to the network. In this case, the input data to the network was a target position point and the output was the launching angle and velocity needed to reach this target point.

Suppose a device designed to intercept objects under ballistic movement, for example a basketball player like Michael Jordan. The two situations previously described take place almost at the same time. As soon as he ("Air" Jordan) detects an object under ballistic movement (the ball), the interception device tries to predict the trajectory and choose the interception point where the object will be at a certain time in the future. Following this choice, the interception device (Michael Jordan) launches himself in a ballistic trajectory with some velocity and launching angle, in order to intercept the object at the predicted point and then score by throwing the ball into the basket (another ballistic movement).

The neural network used to implement the direct and the inverse model has a feed-forward architecture with one hidden layer. The learning algorithm used was back-propagation. For the first model, the network has three neurons at the input layer and two neurons at the output layer, as we can see in figure 1a. The input neurons receive the considered time instant, the launching angle and velocity. At the
output layer, we obtain the corresponding position points (x,y).

For the second case, the two input neurons of the network receive the target point coordinates (x,y) and produce at the two output neurons the launching angle and velocity needed to reach the desired point, as can be seen in figure 1b.

Another question we faced during the implementation of the neural network was the choice of the activation function (also known as threshold, squashing, signal or output function) to be implemented in the neurons. Traditionally, in many commercial products, a function called logistic seems to be the one most frequently implemented, due to its simple derivative used in the back-propagation algorithm. However, the image of this function belongs to the range [0,1], so the network output can only take positive values in this range. That is fine for horizontal distances from the launching point (x coordinates), but it becomes a problem when we have negative height values (y coordinates). A more natural choice would be to use an activation function whose image lies in the range [-1,1]. This allows training the network with trajectories that reach negative height values, as when the launch angle or velocity is small. For this case, we chose an activation function in the hyperbolic tangent form.

We made experiments with both activation functions for the neural network neurons. First we implemented the logistic function:

(1 + e^-x)^-1

and then the hyperbolic tangent function:

(1 - e^-x) / (1 + e^-x) .

Our experiments confirmed what is stated by Gallant [3]. The learning performance presented by the network using the hyperbolic tangent activation function was higher than that presented when we used the logistic function. This can be explained because, in the first moments after the launching, the output values produced by the network are near zero. The derivative of the logistic function at these points is also very small, so the gradient method

used by the back-propagation algorithm leaves a greater residual error. When using the hyperbolic tangent activation function, that does not happen: in this case, the derivative is largest precisely at points near zero.

4. Experiments

In this section we describe the experiments we have done to study the generalization capacity of the network, the features of the training set, as well as the influence of the number of neurons in the hidden layer.

4.1. Experiment no. 1

In order to analyze the generalization capacity, the neural network was trained to learn the ballistic trajectory of an object launched with one initial velocity (10 m/s) and two launching angles (30° and 60°). Some point coordinates (x,y) followed by an object launched under these two initial conditions, during 2 seconds, were presented to the network during the training phase, as can be seen in figure 2. In the execution phase, we recalled from the trained network the 2-second trajectory of an object launched with the same initial velocity, but with the launching angle varying from 0° to 90°. We plotted the maximum error between the real trajectory and the one given by the trained network. These error values can be seen in figure 3.

We can see that, for launching angles inside the range bounded by the angles used to train the network, the interpolation done by the network was very good. However, the extrapolation capacity for angles outside the trained interval was very poor.

4.2. Experiment no. 2

In the second experiment we changed the training set. In this case, we wanted to observe the interpolation capacity when we train the network with two trajectories far away from each other. Now we trained the network maintaining the initial velocity but with other launching angles (15° and 75°). As can be seen in figure 4, the interpolation performance presented by the network during the execution phase for
launching angles inside the trained interval was not good. As stated previously, this poorer performance occurs because the "concepts" (in this case, the two different trajectories presented to the network during the training phase) are too "distant" from each other, exceeding the neural network generalization capacity. In this experiment we also varied the number of neurons in the hidden layer, but could not observe any significant variation in the generalization capacity. Finally, we trained the network with an intermediate angle as well (15°, 45° and 75°). The result can be seen in figure 4.

4.3. Experiment no. 3

For the third experiment, the inverse model previously described was implemented. In this case our goal was to teach the neural network some features of the inverse model. Given some target coordinate point (x,y) at the network input, we want to get from the network output the right launching angle and velocity to reach that point. As there are infinite solutions that reach the target point, we give as the solution to the network the smallest angle (in 15° intervals) and the corresponding launching velocity. In this example the number and complexity of the "concepts" to be taught to the network are much greater. Moreover, for many points in the coordinate space, mainly those located near the launching point, we must constantly vary the solution angle and velocity during the training phase.

To illustrate the variability of the "concepts" presented to the network, mainly at target points near the launching point, consider the following example. In some situations we must "increase" the launching angle while "reducing" the launching velocity to reach a target point more distant from the launching point (to reach the point (1.5,2.0) the launching conditions taught to the network are θo=60° and vo=8.6 m/s, while to reach the farther point (1.5,2.5) the launching angle grows to θo=75°, but the initial velocity decreases to vo=7.3 m/s). In other situations, the relation changes (to reach the point (1.5,0.5) the launching conditions taught to the network are θo=30° and vo=6.34 m/s, while to reach the farther point (1.5,1.0) the launching angle grows to θo=45° and the initial velocity also grows to vo=6.64 m/s).

For points more distant from the launching point, the variation in the "launching angle" parameter occurs less frequently, and we just have to increase the "launching velocity" parameter to reach far target points. The "concepts" to be taught to the network are more "homogeneous".

After a long training phase, the results presented by the network can be seen in figure 5. There we can see that, for target points near the launching point, the errors are large and the interpolation capacity presented by the network is poor. For target points far away from the origin, where the relationship "greater distance, greater launching velocity" is more constant, both the generalization capacity and the precision obtained by the network are greater.

5. Conclusions

The results obtained through the experiments lead us to conclude that, in general, artificial neural networks have a good generalization capacity, in the sense that they can produce correct responses (not exactly precise ones) to questions that do not belong to the training set. However, we note that this generalization occurs in the interpolation sense, between two or more previously learned "concepts". It is important to remark that the "distance" between the concepts, i.e., how different the "concepts" are, plays an essential role in the network generalization capacity.

This shows the great importance of the selection of the examples in the training set. Concerning the extrapolation capacity of neural networks, extrapolation is, as for humans, a much more complex task, and, as humans do, the networks tend to produce wrong results.

In this work, the concepts of "concept taught to a neural network" and "distance between concepts" were presented in an informal and intuitive way. We intend to formalize these concepts with a mathematically based approach in a future work.

Finally, this work tries to show the importance of the choice of the neurons' activation function in the learning performance at some critical points. Moreover, we presented an application in which some features of a dynamic system were learned by a static neural network.
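To make this point concrete, the mapping learned by the direct-model network is just a static function of its three inputs. The sketch below evaluates equations (1) and (2); the function name and the value g = 9.8 m/s² are our choices, not taken from the original implementation:

```python
import math

G = 9.8  # gravitational acceleration (m/s^2), assumed value

def ballistic_position(v0, theta_deg, t):
    """Equations (1) and (2): object position (x, y) at time t for a
    launch with initial velocity v0 (m/s) and angle theta_deg (degrees)."""
    theta = math.radians(theta_deg)
    x = v0 * math.cos(theta) * t                   # equation (1)
    y = v0 * math.sin(theta) * t - G * t ** 2 / 2  # equation (2)
    return x, y
```

Time enters this function as an ordinary input coordinate, which is why a network with no internal dynamics can represent the trajectory.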
This fact is very important. Suppose, for instance, the control of a plant performed by conventional frequency-response methods. It is well known that in several cases it is impossible to control the plant with even a non-linear gain in the feedback loop, for example a system with two integrators and without zeros. In this case, for any gain, the frequency response will have the -1 point inside the Nyquist loop, indicating instability of the closed-loop system. It is then necessary to use a dynamic system in the controller, in such a way as to reshape the frequency response and leave the -1 point outside the Nyquist loop. In other words, an ANN used in such a case must include some sort of dynamics.

In our case, however, we are learning not the dynamics of the system, but the trajectories, which are functions in a space where time is not a parameter but a coordinate variable, and it is possible to learn such functions with a static system implemented by the ANN.

Figure 2. Real and neural network generated trajectory for some launching angles and launching velocity of 10 m/s.
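A minimal end-to-end sketch of this direct model follows. The hidden-layer size, learning rate, iteration count and input/output scaling below are our assumptions (the paper does not report them); the topology is the one-hidden-layer feed-forward network with hyperbolic tangent neurons, trained by batch back-propagation on the experiment no. 1 trajectories (10 m/s, 30° and 60°):

```python
import numpy as np

G = 9.8  # assumed gravitational acceleration (m/s^2)
rng = np.random.default_rng(0)

def trajectory(v0, theta_deg, t):
    """Equations (1) and (2), vectorized over t."""
    th = np.radians(theta_deg)
    return v0 * np.cos(th) * t, v0 * np.sin(th) * t - G * t ** 2 / 2

# Training set: (time, angle, velocity) -> (x, y) samples of two trajectories.
ts = np.linspace(0.0, 2.0, 21)
inputs, targets = [], []
for ang in (30.0, 60.0):
    x, y = trajectory(10.0, ang, ts)
    inputs.append(np.stack([ts, np.full_like(ts, ang), np.full_like(ts, 10.0)], axis=1))
    targets.append(np.stack([x, y], axis=1))
X = np.concatenate(inputs)
Y = np.concatenate(targets)

# Scale inputs and outputs into tanh's working range; the symmetric
# [-1, 1] image is what lets the network represent negative heights.
x_scale = X.max(axis=0)
y_scale = np.abs(Y).max(axis=0)
Xs, Ys = X / x_scale, Y / y_scale

# One hidden layer, tanh in both layers.
H = 10  # assumed hidden-layer size
W1 = rng.normal(0.0, 0.5, (3, H)); b1 = np.zeros(H)
W2 = rng.normal(0.0, 0.5, (H, 2)); b2 = np.zeros(2)

def forward(inp):
    h = np.tanh(inp @ W1 + b1)   # hidden activations
    return h, np.tanh(h @ W2 + b2)

_, out0 = forward(Xs)
loss_before = float(np.mean((out0 - Ys) ** 2))

# Batch back-propagation (plain gradient descent).
lr = 0.05
for _ in range(5000):
    h, out = forward(Xs)
    d_out = (out - Ys) * (1.0 - out ** 2)      # tanh'(z) = 1 - tanh(z)^2
    d_h = (d_out @ W2.T) * (1.0 - h ** 2)
    W2 -= lr * h.T @ d_out / len(Xs); b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * Xs.T @ d_h / len(Xs);  b1 -= lr * d_h.mean(axis=0)

_, out1 = forward(Xs)
loss_after = float(np.mean((out1 - Ys) ** 2))
```

In the execution phase, the same forward pass is recalled for unseen angles; multiplying the output back by the scale factors recovers (x, y) in meters.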

[Figure 1a: direct model — inputs: Time, Angle, Velocity; outputs: X Position, Y Position]

[Figure 1b: inverse model — inputs: X Position, Y Position; outputs: Angle, Velocity]

Figure 1. Topology of the networks used to learn the ballistic movement in an interception device.

Figure 3. Maximum error between the real and the neural network calculated trajectory for various launching angles and launching velocity of 10 m/s. The network was trained for 30° and 60° launching angles.
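The teaching signal for the inverse network of figure 1b can be reconstructed in closed form: substituting t = x / (vo cos θ) from equation (1) into equation (2) gives y = x tan θ - g x² / (2 vo² cos² θ), so each angle with x tan θ > y admits exactly one launching velocity, and experiment no. 3 teaches the smallest such angle in 15° steps. The sketch below follows that reading (the function name and g = 9.8 m/s² are our choices):

```python
import math

G = 9.8  # assumed gravitational acceleration (m/s^2)

def inverse_solution(x, y, step_deg=15):
    """Smallest launching angle (multiple of step_deg, in degrees) and the
    velocity that make the trajectory of equations (1), (2) pass through (x, y)."""
    for deg in range(step_deg, 90, step_deg):
        th = math.radians(deg)
        drop = x * math.tan(th) - y  # must be positive for a real velocity
        if drop > 0:
            v0 = math.sqrt(G * x ** 2 / (2 * math.cos(th) ** 2 * drop))
            return deg, v0
    return None
```

This reproduces the taught pairs quoted in experiment no. 3, e.g. (1.5, 2.0) → 60° and ≈8.6 m/s, and (1.5, 0.5) → 30° and ≈6.34 m/s.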
Figure 4. Maximum error between the real and the neural network calculated trajectory for various launching angles and launching velocity of 10 m/s. The network was firstly trained for 15° and 75° launching angles and then for 15°, 45° and 75° launching angles.

Figure 5. Minimum distance between the given target point and the closest point reached by an object launched with the initial conditions calculated by the network.

References

[1] J. A. Anderson, "Neural-Network Learning and Mark Twain's Cat," IEEE Communications Magazine, pp. 16-23, September 1992.

[2] D. E. Rumelhart, G. E. Hinton and R. J. Williams, "Learning Internal Representations by Error Propagation," in D. E. Rumelhart and J. L. McClelland (Eds.), Parallel Distributed Processing, Vol. 1 (MIT Press, 1989).

[3] S. I. Gallant, Neural Network Learning and Expert Systems (MIT Press, 1993).

[4] R. Resnick, D. Halliday, Physics, Part I (John Wiley & Sons, 1973).

[5] Z. Schreter, "Connectionism: A Link Between Psychology and Neuroscience?" in J. Stender, T. Addis (Eds.), Symbols versus Neurons? (IOS Press, 1990).

[6] J. Barreto, T. Proychev, "Control of the Standing Position: A Neural Network Approach," Technical Report, Lab. of Neurophysiology, Medicine Faculty, University of Louvain, Brussels, prepared for the MUCOM (Multisensory Control of Movement, EU project), 1994.

[7] C. C. Klimasauskas, "Neural Networks: An Engineering Perspective," IEEE Communications Magazine, pp. 50-53, September 1992.

[8] J. E. Moody, "The Effective Number of Parameters: An Analysis of Generalization and Regularization in Nonlinear Learning Systems," in J. E. Moody and S. J. Hanson (Eds.), Neural Information Processing Systems 4 (Morgan Kaufmann, 1992).

[9] J. M. Barreto, F. M. de Azevedo, C. I. Zanchin, "Neural Network Identification of Resonance Frequencies from Noise," in 9th Brazilian Congress on Automatics, pp. 840-844, August 1992.
