Wang2003 Chapter ArtificialNeuralNetwork
5.1 Introduction
An artificial neural network (or simply neural network) consists of an input
layer of neurons (or nodes, units), one or two (or even three) hidden layers
of neurons, and a final layer of output neurons. Figure 5.1 shows a typical
architecture, where lines connecting neurons are also shown. Each connection
is associated with a number called a weight. The output, h_i, of neuron i
in the hidden layer is,

h_i = \sigma\left( \sum_{j=1}^{N} v_{ij} x_j + T_i^{\rm hid} \right),   (5.1)

where x_j are the N inputs, v_{ij} is the weight of the connection from input j
to hidden neuron i, T_i^{\rm hid} is the threshold of hidden neuron i, and
\sigma is the activation function.
[Figure 5.1. A feedforward neural network: an input layer, one hidden layer, and an output layer]
A common choice for the activation function is the sigmoid,

\sigma(u_i) = \frac{1}{1 + \exp(-u_i)}.   (5.2)
Other possible activation functions are the arc tangent and the hyperbolic tangent.
They respond to the inputs similarly to the sigmoid function but differ
in their output ranges.
It has been shown that a neural network constructed the way above can
approximate any computable function to an arbitrary precision. Numbers given
to the input neurons are independent variables and those returned from the
output neurons are dependent variables to the function being approximated by
the neural network. Inputs to and outputs from a neural network can be binary
(such as yes or no) or even symbols (green, red, ... ) when data are appropriately
encoded. This feature confers a wide range of applicability to neural networks.
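A forward pass through such a network amounts to applying Eqs. (5.1) and (5.2) layer by layer. The following is a minimal Java sketch of that computation; the class and method names, and all weight and threshold values in the usage note, are illustrative placeholders rather than code from this chapter.

```java
// Sketch of a forward pass through a one-hidden-layer network (Eqs. 5.1, 5.2).
public class FeedForward {
    // sigmoid activation, Eq. (5.2)
    static double sigmoid(double u) {
        return 1.0 / (1.0 + Math.exp(-u));
    }

    // one layer, Eq. (5.1): h[i] = sigma(sum_j v[i][j]*x[j] + t[i])
    static double[] layer(double[][] v, double[] t, double[] x) {
        double[] h = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            double u = t[i];
            for (int j = 0; j < x.length; j++) u += v[i][j] * x[j];
            h[i] = sigmoid(u);
        }
        return h;
    }

    // full network: inputs -> hidden layer -> output layer
    static double[] forward(double[][] v, double[] tHid,
                            double[][] q, double[] tOut, double[] x) {
        return layer(q, tOut, layer(v, tHid, x));
    }
}
```

For instance, a 2-input, 2-hidden, 1-output configuration with arbitrary weights returns a single number squashed into (0, 1) by the sigmoid.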
Artificial Neural Network 83
[Figure: surface plot of the sigmoid 1/(1 + exp(-1.8*x - 0.8*y)) over two inputs x and y; the output ranges between 0 and 1]
set has therefore to be large enough for the neural network to memorize the
features/trends embedded in the training set. On the other hand, if too many
unimportant details are contained in the training set, the neural network might
waste its resources (weights) fitting the noise. A judicious selection and/or
representation of the data is therefore critical to successful implementations
of neural networks.
Note that the definitions of the validation and test sets are reversed among
authors of different fields. We have here followed the definition of B.D. Ripley
(1996).
This section serves as a general introduction to neural networks. The fol-
lowing sections describe a step-by-step procedure to design a neural network
for time series prediction.
[Figure: a recurrent neural network; the hidden-layer outputs are fed back through time-delay units into the hidden layer]
The output of neuron i in the hidden layer becomes, instead of Eq. (5.1),

h_i(t) = \sigma\left( \sum_{j=1}^{N} v_{ij} x_j(t) + \sum_{k=1}^{H} w_{ik} h_k(t-1) + T_i^{\rm hid} \right).   (5.3)

Time is explicitly indexed above to highlight the delayed feedback. When the w_{ik}
are all zero, the network reduces to the feedforward neural network of Figure
5.1. The output neurons, o_i(t), at time t are updated as,

o_i(t) = \sigma\left( \sum_{j=1}^{H} q_{ij} h_j(t) + T_i^{\rm out} \right),   (5.4)

where q_{ij} are the weights between the hidden neurons and the output neurons,
and T_i^{\rm out} are the thresholds of the output neurons. For one-unit-time-ahead
predictions, we define, in the sense of least squares, the following error
function,

\chi^2 = \sum_{t=1}^{T} \sum_{i=1}^{N} \left( o_i(t) - x_i(t+1) \right)^2,   (5.5)
where time is discrete and T is the horizon of the data. For simplicity, we have
assumed that the dimensionality of the output vector is the same as that of the
input. A properly defined measure of error is one of the few key factors for
successful neural network applications. An example will be shown below.
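As a sketch of how Eqs. (5.3) and (5.5) translate into code, the fragment below computes one hidden-state update and the one-step-ahead error. The class and method names are hypothetical, not part of this chapter's source files; the output-layer step of Eq. (5.4) is omitted for brevity.

```java
// Sketch of the recurrent hidden update (Eq. 5.3) and the
// one-step-ahead error function (Eq. 5.5).
public class RecurrentSketch {
    static double sigmoid(double u) { return 1.0 / (1.0 + Math.exp(-u)); }

    // h(t) from inputs x(t) and previous hidden state h(t-1), Eq. (5.3):
    // h_i(t) = sigma(sum_j v[i][j]*x[j] + sum_k w[i][k]*hPrev[k] + tHid[i])
    static double[] hiddenStep(double[][] v, double[][] w, double[] tHid,
                               double[] x, double[] hPrev) {
        double[] h = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            double u = tHid[i];
            for (int j = 0; j < x.length; j++) u += v[i][j] * x[j];
            for (int k = 0; k < hPrev.length; k++) u += w[i][k] * hPrev[k];
            h[i] = sigmoid(u);
        }
        return h;
    }

    // chi^2 of Eq. (5.5): squared differences between the outputs o(t)
    // and the next inputs x(t+1), summed over time and components
    static double chi2(double[][] o, double[][] x) {
        double sum = 0.0;
        for (int t = 0; t + 1 < x.length; t++)
            for (int i = 0; i < o[t].length; i++) {
                double d = o[t][i] - x[t + 1][i];
                sum += d * d;
            }
        return sum;
    }
}
```

With all w_{ik} set to zero, hiddenStep reduces to the feedforward computation of Eq. (5.1), mirroring the remark after Eq. (5.3).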
86 INTERDISCIPLINARY COMPUTING
yen(t+1) = yen(t) + f( dollar index(t), Euro index(t), Sterling index(t),
                       10-year bond index(t), Nikkei-225(t), DJIA(t),
                       oil price(t), ... ).   (5.6)
Variables of the function f() are the indicators of the market forces at the present
time t that you believe will affect the price of yen in the currency exchange
market. The function f() in Eq. (5.6) is presumably highly nonlinear, and
in particular there exist no theories to model the behavior of the yen over time. It
is this difficulty that leads us to the method of neural networks. The choice
of the frequency of the data, namely hourly, daily, or monthly exchange rates,
depends on the objective of the researcher. The horizon of the historical data
for training is another issue since trends change over time. You also have to
make sure that the values of the variables have been evaluated in a consistent way
over time.
The second step is data representation. Raw data are rarely fed into neural
networks without preprocessing/transformation. One of the common transformations
is to take the natural logarithm of the change in the variable value. The
transformed values form a distribution whose mean and standard deviation can
be readily obtained. A scaling operation is then performed so that the range of
the data is bounded between 0 and 1 or between -1 and 1. A typical scaling is to
assign 1 (0) to all values beyond (below) 3 standard deviations away from the mean.
Values within 3 standard deviations are then linearly scaled between 0 and 1.
Again, which of the two ranges to use depends on the activation function. It
has however been pointed out that sigmoid activation functions are better for
neural networks learning average behavior, while the hyperbolic tangent works better
if learning involves deviations from the average.
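The preprocessing just described can be sketched as follows. Reading "logarithm of the change" as the log-ratio of successive values is an assumption on our part, as are the class and method names.

```java
import java.util.Arrays;

// Sketch of the preprocessing described above: a log transform of
// successive changes, then linear scaling to [0,1] with clipping
// at 3 standard deviations from the mean.
public class Preprocess {
    // log of the ratio of successive values (one possible reading of
    // "logarithm of the change"; this interpretation is an assumption)
    static double[] logChanges(double[] series) {
        double[] r = new double[series.length - 1];
        for (int t = 0; t < r.length; t++)
            r[t] = Math.log(series[t + 1] / series[t]);
        return r;
    }

    // values beyond (below) mean + 3 sd (mean - 3 sd) are assigned
    // 1 (0); values in between are linearly scaled into [0,1]
    static double[] scale(double[] v) {
        double mean = Arrays.stream(v).average().orElse(0.0);
        double sd = Math.sqrt(Arrays.stream(v)
                .map(x -> (x - mean) * (x - mean)).average().orElse(0.0));
        double lo = mean - 3 * sd, hi = mean + 3 * sd;
        double[] s = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            if (v[i] >= hi) s[i] = 1.0;
            else if (v[i] <= lo) s[i] = 0.0;
            else s[i] = (v[i] - lo) / (hi - lo);
        }
        return s;
    }
}
```

Scaling into [-1, 1] instead, for a hyperbolic-tangent network, would only change the final linear map.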
Step 3 is training, validation, and testing. We move forward in time and
divide the whole historical data set into training, validation, and test sets at a
proportion of, say, 7:2:1. The advantage of having the validation set follow the
training set is that these data contain the most up-to-date market trends. On the
other hand, care has to be taken to make sure that the trained neural network
does not favor a subgenre of market trends.
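A chronological 7:2:1 split of the kind described can be sketched as below; the class name and the convention of returning the two segment boundaries are illustrative assumptions.

```java
// Sketch of a chronological 7:2:1 split into training, validation,
// and test sets; the most recent data end up in the test set.
public class Split {
    // returns the first index of the validation segment and the
    // first index of the test segment, for n samples in time order
    static int[] boundaries(int n) {
        int trainEnd = (int) (0.7 * n); // first 70%: training
        int validEnd = (int) (0.9 * n); // next 20%: validation
        return new int[]{trainEnd, validEnd}; // last 10%: test
    }
}
```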
error function is then defined to be the squared differences between the neural
network outputs and the desired outputs.
However, we might remove all the 'no decision' target patterns from the error
function so that the neural network does not waste weights remembering
unimportant fluctuations. All three patterns are however needed in the inputs as
they make up the continuous history. If the error function is so defined, the
neural network is forced to output either buy or sell. The strategy of the user is
then changed to buy (sell) yen when the neural network outputs, say, 5 consecutive
buys (sells). In this way, the neural network might have detected a fall
in the yen price and is predicting a turnaround at the fifth buy signal. Patterns
of fewer than 5 consecutive buys might simply identify minor troughs which need
no attention since they make no profit once transaction fees are considered. The
actual strategy has to be determined by experiment and depends on the user's portfolio.
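The 5-consecutive-signals rule could be sketched as follows; the encoding of buy/sell as +1/-1 and all names here are hypothetical, not taken from the chapter.

```java
// Sketch of the signal rule described above: act only after the
// network has emitted `need` consecutive identical signals.
public class SignalRule {
    // signals: +1 = buy, -1 = sell (hypothetical encoding);
    // returns true when the last `need` entries all equal `target`
    static boolean trigger(int[] signals, int target, int need) {
        if (signals.length < need) return false;
        for (int i = signals.length - need; i < signals.length; i++)
            if (signals[i] != target) return false;
        return true;
    }
}
```

Shorter runs of buys simply fail the test, matching the remark that minor troughs are ignored.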
The last step in the design is to tune the numerical values of the weights. The
error function is a function of the neuron connecting weights whose number
is often huge. Furthermore, the function can have lots of local minima in its
weight space. A powerful function minimization method capable of finding the
global minimum is simulated annealing introduced in Chapter 4 (cf. Section
4.7). Once the neural network is deployed, frequent retraining is beneficial
and sometimes mandatory because the important temporal patterns might have
changed since the last training.
[Figure: the architecture of the Kohonen self-organizing map; an input vector is fully connected to a 2-dimensional output layer]

the distances between the input vector and the weight vector into each output neuron, which
has the same dimensions as the input vector, are calculated. The one having
the shortest distance is declared the winner, and the weight vectors which are
in the winner's neighborhood are updated according to the following rule,

w_j(k+1) = w_j(k) + \beta(k)\, G_{ij}(k)\, \left( x(k) - w_j(k) \right),   (5.7)
where w_j is the weight vector into neuron j on the output layer and k is the
iteration (or time) index. \beta(k) is called the learning rate and is usually defined as,

\beta(k) = \beta_{\rm initial} \left( \frac{\beta_{\rm final}}{\beta_{\rm initial}} \right)^{k/k_{\rm max}},   (5.8)

where k_{\rm max} is the maximum number of iterations. It is seen that if \beta_{\rm initial} = 1.0
and \beta_{\rm final} = 0.05, the value of \beta starts from 1.0 and drops as k increases until
it becomes 0.05 at k = k_{\rm max}. That is to say, the neural network learns less and
less as time goes on, much as a human does while growing up. G_{ij}(k)
defines the neighborhood and in many cases has the following form,

G_{ij}(k) = \exp\left[ -\frac{1}{2} \left( \frac{|i - j|}{\sigma(k)} \right)^2 \right],   (5.9)
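The update rule, together with the learning-rate schedule of Eq. (5.8) and the neighborhood function of Eq. (5.9), can be sketched in a few lines of Java. For simplicity this sketch uses a 1-dimensional output layer with scalar weights; the 2-dimensional, 3-component color case of SOM.java below is analogous. The class name and all values are illustrative.

```java
// Sketch of one Kohonen update step, Eqs. (5.7)-(5.9),
// for a 1-dimensional output layer with scalar weights.
public class KohonenStep {
    // learning-rate schedule, Eq. (5.8)
    static double beta(double bInitial, double bFinal, int k, int kMax) {
        return bInitial * Math.pow(bFinal / bInitial, (double) k / kMax);
    }

    // neighborhood function around winner i, Eq. (5.9)
    static double g(int i, int j, double sigma) {
        double d = Math.abs(i - j) / sigma;
        return Math.exp(-0.5 * d * d);
    }

    // Eq. (5.7): move every weight toward the input x, weighted by
    // the neighborhood of the winning neuron iWin
    static void update(double[] w, double x, int iWin,
                       double beta, double sigma) {
        for (int j = 0; j < w.length; j++)
            w[j] += beta * g(iWin, j, sigma) * (x - w[j]);
    }
}
```

With beta = 1 and a very narrow neighborhood, the winner jumps straight onto the input while its distant neighbors barely move, which is the behavior the annealed schedules of Eqs. (5.8) and (5.9) gradually approach.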
/* Sun-Chong Wang
   TRIUMF
   4004 Wesbrook Mall
   Vancouver, V6T 2A3
   Canada
   e-mail: wangsc@triumf.ca

   DataBase.java initializes the input vectors
   for the Kohonen's self-organizing map */
import java.util.*;

class DataBase extends Observable {
    Neural parent;
    Neuron[] data;   // array of Neuron objects
    final int num_samples = 100;

    public DataBase(Neural parent) {
        super();
        this.parent = parent;
        data = new Neuron[num_samples];
        initializeData();
    }   // end of constructor

    public void DrawMap(Neuron[][] weight, double[][] similarity) {
        parent.plotting.weight = weight;
        parent.plotting.similarity = similarity;
        setChanged();
        notifyObservers(parent.plotting);
    }   // end of method

    private void initializeData() {
        Random rand = new Random();
        rand.setSeed(1L);
        // 100 different colors
        for (int i=0; i<num_samples; i++) {
            data[i] = new Neuron();
            data[i].x = rand.nextDouble();
            data[i].y = rand.nextDouble();
            data[i].z = rand.nextDouble();
        }
        // six (or eight) different colors
        for (int i=0; i<num_samples/2; i++) {
            data[i] = new Neuron();
            data[i].x = 1.0;   // red
            data[i].y = 0.0;
            data[i].z = 0.0;
        }
        for (int i=50; i<60; i++) {
            data[i] = new Neuron();
            data[i].x = 0.0;   // green
            data[i].y = 1.0;
            data[i].z = 0.0;
        }
        for (int i=60; i<70; i++) {
            data[i] = new Neuron();
            data[i].x = 0.0;   // blue
            data[i].y = 0.0;
            data[i].z = 1.0;
        }
        for (int i=70; i<80; i++) {
            data[i] = new Neuron();
            data[i].x = 1.0;   // yellow
            data[i].y = 1.0;
            data[i].z = 0.0;
        }
        for (int i=80; i<85; i++) {
/* Sun-Chong Wang
   TRIUMF
   4004 Wesbrook Mall
   Vancouver, V6T 2A3
   Canada
   e-mail: wangsc@triumf.ca

   Neuron.java defines the class to hold the input
   vector as well as the weights */
public class Neuron {
    public double x, y, z;

    public Neuron() {
        x = 0.0;
        y = 0.0;
        z = 0.0;
    }   // end of constructor
}   // end of class Neuron
Figure 5.5. Six clusters resulting from six input colors
Figure 5.6. Similarity plot of the clusters in Figure 5.5
Figure 5.7. Clustering with eight input colors
Figure 5.8. Similarity plot of Figure 5.7
Figure 5.9. Clustering 100 different input colors
Figure 5.10. Similarity plot of Figure 5.9
Figures 5.9 and 5.10. It is noticed that a grid of 20 by 20 output neurons might
be insufficient, as evidenced by the blurred boundaries in the similarity map of
Figure 5.10. We then increase the number of output layer neurons to 40 by 40
(Figures 5.11 and 5.12) through the user interface dialog box of Figure 5.13.
/* Sun-Chong Wang
   TRIUMF
   4004 Wesbrook Mall
   Vancouver, V6T 2A3
   Canada
   e-mail: wangsc@triumf.ca

   SOM.java codes the unsupervised learning algorithm of
   the self-organizing map: Eqs. (5.7), (5.8), (5.9) */
import java.lang.*;
import java.util.*;
import java.util.Random;

public class SOM implements Runnable {   // a thread
    Random rand;
    int XSize, YSize;
    int iTime, ifreq;
    int num_samples;
    // initial/final beta/sigma
    double beta_i, beta_f, sigma_i, sigma_f;
    double[][] distance, similarity;
    Neuron[][] weight;
    Neuron[] data;
    SOMDialog parent;

    public SOM(SOMDialog parent) {
        this.parent = parent;
        rand = new Random();
        System.out.println("random number " + rand.nextFloat());
        XSize = 0;
        YSize = 0;
        iTime = 10000;
        ifreq = 100;
        beta_i = 1.0;
        beta_f = 0.01;
        sigma_i = (XSize+YSize)/2.0/2.0;
        sigma_f = 1.0;
    }   // end of SOM class constructor

    public void run() {   // executed by a thread
        for (int i=0; i<iTime; i++) {
            Learning(i);
            if (i%ifreq == 0) {
                Similarity();
                parent.parent.db.DrawMap(weight, similarity);
            }   // db is an instance of the DataBase class
            parent.jbar.setValue(Math.round(((float)i)/
                                 ((float)iTime)*100.0f));
        }
    }
[Figures 5.11 and 5.12. Clustering and similarity plots with 40 by 40 output neurons. Figure 5.13. The dialog box with fields for the number of neurons in x and in y]
            }
        }
    }   // end of Setup
    private void Learning(int itime) {
        int i_win=0, j_win=0;
        double exponent, beta, sigma, adjust, distance_ij, dmin;
        // randomly picks up an input vector
        int k = rand.nextInt(num_samples);
        dmin = Double.MAX_VALUE;
        // calculates the distance between the input vector and
        // the weight vector. i,j are the coordinates of the weight
        // on the 2-dimensional output layer
        for (int i=0; i<XSize; i++) {
            for (int j=0; j<YSize; j++) {
                distance[i][j] = (data[k].x - weight[i][j].x)*
                                 (data[k].x - weight[i][j].x)+
                                 (data[k].y - weight[i][j].y)*
                                 (data[k].y - weight[i][j].y)+
                                 (data[k].z - weight[i][j].z)*
                                 (data[k].z - weight[i][j].z);
                if (distance[i][j] < dmin) {
                    dmin = distance[i][j];
                    // keeps the coordinates of the winner
                    i_win = i;
                    j_win = j;
                }
            }
        }
        // boundary cells
        for (int j=1; j<YSize-1; j++) {
            similarity[0][j] = (w_distance(0,j,0,j+1)+
                                w_distance(0,j,0,j-1)+
                                w_distance(0,j,1,j))/3.0;
            similarity[XSize-1][j] =
                (w_distance(XSize-1,j,XSize-1,j+1)+
                 w_distance(XSize-1,j,XSize-1,j-1)+
                 w_distance(XSize-1,j,XSize-2,j))/3.0;
        }
        for (int i=1; i<XSize-1; i++) {
            similarity[i][0] = (w_distance(i,0,i+1,0)+
                                w_distance(i,0,i-1,0)+
                                w_distance(i,0,i,1))/3.0;
            similarity[i][YSize-1] =
                (w_distance(i,YSize-1,i+1,YSize-1)+
                 w_distance(i,YSize-1,i-1,YSize-1)+
                 w_distance(i,YSize-1,i,YSize-2))/3.0;
        }
        // corner cells
        similarity[0][0] =
            (w_distance(0,0,0,1)+w_distance(0,0,1,0))/2.0;
        similarity[0][YSize-1] = (w_distance(0,YSize-1,0,YSize-2)+
                                  w_distance(0,YSize-1,1,YSize-1))/2.0;
        similarity[XSize-1][YSize-1] = (w_distance(XSize-1,YSize-1,
                                                   XSize-2,YSize-1)+
                                        w_distance(XSize-1,YSize-1,
                                                   XSize-1,YSize-2))/2.0;
        similarity[XSize-1][0] = (w_distance(XSize-1,0,XSize-2,0)+
                                  w_distance(XSize-1,0,XSize-1,1))/2.0;
        max = 0.0;
        for (int i=0; i<XSize; i++) {
            for (int j=0; j<YSize; j++) {
                if (similarity[i][j] > max) max = similarity[i][j];
            }
        }
5.10 Summary
Neural.java
 ├── Plotter.java ──── Neuron.java
 ├── DataBase.java ─── Neuron.java
 └── SOMDialog.java ── SOM.java ── Neuron.java
Figure 5.14. Source files for the Kohonen self-organizing map in this chapter
Neural.java, containing the main() method, is easily written using Spin.java
of Chapter 4 as a template. Plotter.java, extending Canvas and implementing
Observer, is similar to the one in the appendix. SOMDialog.java,
a dialog box for user interaction, can be obtained by modifying the dialog box
class, SADialog.java, in Chapter 4.
We implemented a Kohonen self-organizing map. Vectors of colors were
input to the Kohonen neural network, which grouped them into clusters on its own
on the 2-dimensional output layer by the end of the learning.
We laid out the steps in designing a neural network. A recurrent neural
network architecture which has 'memory' neurons was introduced. Economic
time series prediction was used as an example throughout the design steps.