Int. J. Remote Sensing, 1997, vol. 18, no. 4, 699–709
Introduction
Abstract. Over the past decade there have been considerable increases in both the quantity of remotely sensed data available and the use of neural networks. These increases have largely taken place in parallel, and it is only recently that several researchers have begun to apply neural networks to remotely sensed data. This paper introduces this special issue, which is concerned specifically with the use of neural networks in remote sensing. The feed-forward back-propagation multi-layer perceptron (MLP) is the type of neural network most commonly encountered in remote sensing and is used in many of the papers in this special issue. The basic structure of the MLP algorithm is described in some detail, while some other types of neural network are mentioned. The most common applications of neural networks in remote sensing are considered, particularly those concerned with the classification of land and clouds, and recent developments in these areas are described. Finally, the application of neural networks to multi-source data and fuzzy classification is considered.
1. Background
The current generation of Earth observation sensors is producing data with great potential for use in scientific and technological investigations, in very large and ever-increasing quantities. Whilst such data provide a considerable resource with which to address many fundamental environmental issues, they also present new challenges of data processing and data interpretation. These challenges must be tackled if the full potential of the data is to be realised. Not only is this necessary for efficient use of the present data, it also provides an important constraint on the need for, and an influence on the design of, instruments proposed for future sensor platforms. It is in the context of these requirements that artificial neural networks (ANNs) are currently being applied in a wide variety of remote sensing applications. Good introductions to neural networks are provided in texts such as Kohonen (1988), Beale and Jackson (1990), Simpson (1990), Bishop (1995) and Aleksander and Morton (1991).
The use of artificial neural networks for remote sensing data interpretation has been motivated by the realisation that the human brain is very efficient at processing vast quantities of data from a variety of different sources. Neurons in the human brain receive inputs from other neurons and produce an output (if the sum of the inputs is above a certain threshold) which is then passed to other neurons. For some time it has been recognized that a mathematical approach based on the actions of biological neurons may be implemented to process and interpret many different types of digital data. While it is not possible or desirable to reproduce the complexity of the human brain on a computer, artificial neural networks that are based on an architecture of simple processing elements like neurons are proving successful for a wide range of applications, including processing and interpreting remotely sensed data.

0143–1161/97 $12.00 © 1997 Taylor & Francis Ltd
P. M. Atkinson and A. R. L. Tatnall
In the above sense, neural networks are an artificial intelligence (AI) technique and, therefore, come from the same family as expert systems and knowledge-based approaches to learning (Key et al. 1989). However, whereas expert systems are based on symbolic representation and, therefore, incorporate qualitative data into the estimation through prior programming of the learning algorithm, neural networks employ a connectionist approach in which computer code is required only to run the network (Hepner et al. 1990).
Neural networks, in the simplest sense, may be seen as data transformers (Pao 1989), where the objective is to associate the elements in one set of data with the elements in a second set. When applied to classification, for example, they are concerned with the transformation of data from feature space to class space. Neural networks, therefore, belong to the same class of techniques as automated pattern recognition (Ritter et al. 1988), regression, and spectral (and textural) classification. Given the importance of these techniques, it is not surprising that neural networks are finding increasing application in remote sensing.
The rapid uptake of neural approaches in remote sensing is due mainly to their widely demonstrated ability to:

(i) perform more accurately than other techniques such as statistical classifiers, particularly when the feature space is complex and the source data have different statistical distributions (Benediktsson et al. 1990, 1993, Schalko 1992);

(ii) perform more rapidly than other techniques such as statistical classifiers (Bankert 1994, Côté and Tatnall 1995);

(iii) incorporate a priori knowledge and realistic physical constraints into the analysis (Brown and Harris 1994, Foody 1995a, b);

(iv) incorporate different types of data (including those from different sensors) into the analysis, thus facilitating synergistic studies (Benediktsson et al. 1993, Benediktsson and Sveinsson 1997).
Given this list of benefits, it is clear that one of the main opportunities offered by neural networks is to allow the efficient handling of the large quantities of remotely sensed data which are currently being produced.

In the remainder of this paper, one of the main differences between statistical and neural approaches is discussed, a common type of neural network employed in remote sensing is introduced, and several applications of neural networks in remote sensing are considered. The objective is not to review neural networks in remote sensing, but rather to provide a context for the papers in this special issue. A useful review may be found in Paola and Schowengerdt (1995), while some general guidance is provided in Kanellopoulos and Wilkinson (1997).
data for training (Lee et al. 1990). One of the main advantages of neural networks for classification is that they are distribution-free, that is, no underlying model is assumed for the multivariate distribution of the class-specific data in feature space. It is, therefore, possible for a single class to be represented in feature space as a series of clusters (rather than a single cluster). A fundamental difference between statistical and neural approaches to classification is, therefore, that statistical approaches depend on an assumed model, whereas neural approaches depend on data. It is for this reason that neural networks are suitable for integrating data from different sources. More recent work has concentrated on the incorporation of additional knowledge into the neural network (for example, Foody 1995b).
biological neurons). Each node is a simple processing element that responds to the weighted inputs it receives from other nodes. The arrangement of the nodes is referred to as the network architecture (figure 1).
The MLP can separate data that are not linearly separable because it is 'multi-layer', and it generally consists of three (or more) types of layers. In this paper, it has been assumed that the number of layers in a network refers to the number of layers of nodes and not to the number of layers of weights. The latter definition can also be found in the literature (Bishop 1995). The first type of layer is the input layer, where the nodes are the elements of a feature vector. This vector might consist of the wavebands of a data set, the texture of the image (Kaminsky et al. 1997) or other more complex parameters (Chen et al. 1997). In the paper by Logar et al. (1997), a large number of possible feature elements are considered and a way of selecting the most appropriate is described.
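As an illustration of how such an input feature vector might be assembled, the following sketch stacks the waveband values of a pixel with a simple texture measure (local variance over a 3 × 3 window). The window size, the choice of variance as the texture measure, and all function names are illustrative assumptions, not methods taken from the papers cited above.

```python
def local_variance(image, row, col, half_window=1):
    """Variance of pixel values in a (2h+1) x (2h+1) window,
    used here as a simple texture measure (illustrative choice)."""
    values = [image[r][c]
              for r in range(row - half_window, row + half_window + 1)
              for c in range(col - half_window, col + half_window + 1)]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def feature_vector(bands, row, col):
    """Input-layer feature vector for one pixel: one spectral
    element per waveband plus one texture element per band."""
    spectral = [band[row][col] for band in bands]
    texture = [local_variance(band, row, col) for band in bands]
    return spectral + texture

# Two toy 3 x 3 'wavebands' for a single-pixel illustration.
band1 = [[10, 12, 11], [13, 10, 12], [11, 14, 10]]
band2 = [[40, 41, 40], [40, 42, 41], [43, 40, 41]]
fv = feature_vector([band1, band2], 1, 1)
print(len(fv))  # 2 spectral + 2 texture elements
```

Each element of the returned vector would feed one node of the input layer.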
The second type of layer is the internal or 'hidden' layer, so called because it contains neither input nor output units. There are no firm rules for choosing the number of hidden layers, but theory shows that one hidden layer can represent any Boolean function. An increase in the number of hidden layers enables the network to learn more complex problems, but the capacity to generalize is reduced and there is an associated increase in training time (Foody 1995b). Lippmann (1987) suggests that if a second hidden layer is used, the maximum number of nodes in the second hidden layer should be three times the number in the first hidden layer. Foody and Arora (1997) assess several factors affecting the accuracy of a neural network, one of which is the number of nodes in the hidden layer.
The third type of layer is the output layer, and this presents the output data (for example, for image classification, the number of nodes in the output layer is equal to the number of classes, c, in the classification). Some researchers have suggested that if the number of nodes in the output layer is actually greater than c, then greater classification accuracies may result (Benediktsson et al. 1993).

Each node in the network is interconnected to the nodes in both the preceding and following layers by connections. These connections have associated with them weights (or synaptic efficacies, following the biological analogy).
associated with the connection. The receiving node sums the weighted signals from all nodes to which it is connected in the preceding layer. Formally, the input that a single node j receives is weighted according to:

net_j = Σ_i v_ji o_i    (1)

where v_ji represents the weight between node i and node j, and o_i is the output from node i. The output from a given node j is then computed from:

o_j = f(net_j)    (2)

The function f is usually a non-linear sigmoid function that is applied to the weighted sum of inputs before the signal passes to the next layer. When the signal reaches the output layer it forms the network output. In traditional hard classification (where pixels are assigned to a single class only), the output of one node (that of the chosen class) is set to one, while all other nodes in the output layer are equal to zero.
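The feed-forward computation of equations (1) and (2) can be sketched as follows, using a logistic sigmoid for f, arbitrary layer sizes (4 input wavebands, 6 hidden nodes, c = 3 classes) and random initial weights. All of these choices are illustrative assumptions.

```python
import math
import random

def sigmoid(x):
    # Non-linear activation applied to the weighted sum (equation (2))
    return 1.0 / (1.0 + math.exp(-x))

def forward(weights, features):
    """Feed a feature vector forward through an MLP.

    weights[l][j][i] is the weight v_ji between node i of layer l
    and node j of layer l + 1.
    """
    outputs = features
    for layer_weights in weights:
        # net_j = sum_i v_ji * o_i (equation (1)), then o_j = f(net_j)
        outputs = [sigmoid(sum(v * o for v, o in zip(row, outputs)))
                   for row in layer_weights]
    return outputs

# Random initial weights, as is usual before training.
random.seed(0)
layers = [4, 6, 3]
weights = [[[random.uniform(-1, 1) for _ in range(n_in)]
            for _ in range(n_out)]
           for n_in, n_out in zip(layers, layers[1:])]

out = forward(weights, [0.2, 0.5, 0.1, 0.9])
print(len(out))  # one output node per class
```

Before training, the three outputs are arbitrary values in (0, 1); training adjusts the weights so that the node of the correct class approaches one and the others approach zero.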
3.4. Training
The aim of network training is to build a model of the data-generating process so that the network can generalize and predict outputs from inputs that it has not seen before. For the MLP, a training pattern is presented to the network and the signals are fed forward as described above. Then, the network output is compared with the desired output (a set of training data, for example, of known classes) and the error computed. This error is then back-propagated through the network and, generally, the weights of the connections (which are usually set randomly at the start) are altered according to what is known as the generalized delta rule (Rumelhart et al. 1986):

Δv_ji(n+1) = η(δ_j o_i) + αΔv_ji(n)    (3)

where η is the learning rate parameter, δ_j is an index of the rate of change of the error, and α is the momentum parameter. This process of feeding forward signals and back-propagating the error is repeated iteratively until the error of the network as a whole is minimized or reaches an acceptable magnitude. It is through the successive modification of the (adaptive) weights that the neural network is able to learn.
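A single application of the generalized delta rule of equation (3) might be sketched as follows. The learning rate and momentum values are illustrative assumptions, and the error terms δ_j (which come from the back-propagation step itself) are assumed to have been computed already.

```python
def delta_rule_update(weights, deltas, outputs, prev_updates,
                      learning_rate=0.2, momentum=0.9):
    """One application of the generalized delta rule (equation (3)).

    weights[j][i]      : weight v_ji between node i and node j
    deltas[j]          : back-propagated error term delta_j for node j
    outputs[i]         : output o_i of node i in the preceding layer
    prev_updates[j][i] : previous change, carried by the momentum term
    """
    new_updates = []
    for j, row in enumerate(weights):
        new_row = []
        for i, _ in enumerate(row):
            # Delta v_ji(n+1) = eta * delta_j * o_i + alpha * Delta v_ji(n)
            change = (learning_rate * deltas[j] * outputs[i]
                      + momentum * prev_updates[j][i])
            weights[j][i] += change
            new_row.append(change)
        new_updates.append(new_row)
    return new_updates

# One update step for a 2-node layer fed by 2 preceding nodes.
w = [[0.5, -0.3], [0.1, 0.4]]
prev = [[0.0, 0.0], [0.0, 0.0]]
upd = delta_rule_update(w, deltas=[0.1, -0.2], outputs=[1.0, 0.5],
                        prev_updates=prev)
```

The returned updates would be passed back in as prev_updates on the next iteration, so that the momentum term smooths successive weight changes.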
3.5. Generalizing
Several factors affect the capabilities of the neural network to generalize, that is, the ability of the neural network to interpolate and extrapolate to data that it has not seen before. These include:

(i) Number of nodes and architecture
If a large number of simple processing elements are used, the mathematical structure can be made very flexible and the neural network can be used for a wide range of applications. This may not be necessary for all applications. For example, very simple topologies using a small number of data points have been investigated (Yahn and Simpson 1995). In general terms, the larger the number of nodes in the hidden layer(s), the better the neural network is able to represent the training data, but at the expense of the ability to generalize.
(ii) Size of training set
The data set used must be representative of the entire distribution of values likely to be associated with a particular class. If the extent of the distribution of the data in feature space is not covered adequately, the network may fail to classify new data accurately. A consequence of this for the MLP algorithm is that large quantities of data are often required for training, and researchers are often concerned with finding the minimum size of data set necessary (for example, Hepner et al. 1990).

The requirement for large training data sets also means that training times may be long. To speed up the training process, several modifications to the MLP algorithm have been introduced, including the momentum term (see above), the delta-bar-delta rule, and optimization procedures (Benediktsson et al. 1993). Paola and Schowengerdt (1995), in a comparison of the back-propagation neural network and maximum-likelihood classifiers, identified the choice of training times necessary for mean square error minimization as problematic.
(iii) Training time
The time taken for training also affects the generalizing capabilities of the network. The longer that a network is trained on a specific data set, the more accurately it will be able to classify those data, but at the expense of the ability to classify previously unseen data. In particular, it is possible to overtrain a network so that it is able to memorize the training data but is not able to generalize when it is applied to different data.
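One common safeguard against overtraining, not specific to this paper, is to monitor the error on a held-out data set and stop training once that error ceases to improve. A minimal sketch follows; train_step and validation_error are hypothetical callables standing in for one epoch of weight updates and an error measure on unseen data, and the patience value is an illustrative assumption.

```python
def train_with_early_stopping(train_step, validation_error,
                              max_epochs=1000, patience=10):
    """Stop training once the held-out error stops improving.

    train_step()       : runs one epoch of weight updates (hypothetical)
    validation_error() : error on data the network has never seen
    """
    best_error = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_step()
        error = validation_error()
        if error < best_error:
            best_error = error
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # further training would mostly memorize
    return best_error

# Toy check: validation error falls then rises, as with overtraining.
errors = iter([0.9, 0.5, 0.3, 0.31] + [0.4] * 50)
best = train_with_early_stopping(lambda: None, lambda: next(errors))
```

In this toy run the held-out error bottoms out at 0.3 and training halts shortly after it begins to rise, rather than continuing to memorize the training data.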
4.1. Land
Howald (1989), McClelland et al. (1989), Hepner et al. (1990), and Downey et al. (1992) all applied neural networks to classify land cover from Landsat Thematic Mapper (TM) imagery, and all found to varying degrees that the neural approach was more accurate than traditional statistical classification. Kanellopoulos et al. (1992) conducted an experiment to estimate twenty land cover classes from SPOT High Resolution Visible (HRV) imagery, and again found that the neural approach was more accurate. Decatur (1989) applied neural networks to classify terrain from synthetic aperture radar (SAR) imagery, and Ersoy and Hong (1990) applied a hierarchical network to classify airborne multispectral scanner system (MSS) imagery. Civco (1993) found that in certain circumstances neural networks were actually less accurate than conventional statistical approaches for classifying land cover.
4.2. Clouds
A similar approach has been taken to evaluate neural networks for identifying and classifying clouds. For example, Lee et al. (1990) used a feed-forward back-propagation neural network to classify different categories of clouds, including cirrus, stratocumulus, and cumulus, from Landsat MSS data. The results showed a significant improvement over classical methods and gave an overall accuracy of 93 per cent. Several different types of neural network were compared by Welch et al. (1992) for classifying cloud data. They found the feed-forward back-propagation neural network to be accurate, but slow to train compared to other types of network, particularly the probabilistic neural network. This network makes use of Bayesian classification, assigning each case to the class with the largest value of the posterior class probability density function. The probabilistic type of network was successfully used by Bankert (1994) to classify clouds in maritime regions. In these studies, one goal is to automate the handling of satellite sensor imagery, and there is growing recognition that use must be made of cloud shape, size, texture, and context (Pankiewicz 1995, Lewis et al. 1997). Another goal is that of cloud screening to assist other work (Yahn and Simpson 1995, Logar et al. 1997).
the incorporation of a priori knowledge and data from different sources into the estimation. Neural networks are finding use in a wide range of applications in remote sensing, and new applications are being proposed frequently. In this special issue, some of the applications identified above are investigated further, and several new applications are reported.
References
Aleksander, I. and Morton, H., 1991, An Introduction to Neural Computing (London: Chapman and Hall).
Anderson, J. A. and Rosenfeld, E., 1988, Neurocomputing (Cambridge, MA: MIT Press).
Arora, M. K. and Foody, G. M., 1997, An evaluation of variables affecting the accuracy of probabilistic, fuzzy and neural network classifications with log-linear modelling. International Journal of Remote Sensing, 18, 785–798 (this issue).
Atkinson, P., Cutler, M. and Lewis, H., 1997, Mapping sub-pixel variation in land cover in the U.K. from AVHRR imagery. International Journal of Remote Sensing, 18.
Fiset, R. and Cavayas, F., 1997, Automatic comparison of a topographic map with remotely sensed images in a map updating perspective: the road network case. International Journal of Remote Sensing, 18, 991–1006 (this issue).
Fisher, P. F. and Pathirana, S., 1990, The evaluation of fuzzy membership of land cover classes in the suburban zone. Remote Sensing of Environment, 34, 121–132.
Foody, G. M., 1995a, Using prior knowledge in artificial neural network classification with a minimal training set. International Journal of Remote Sensing, 16, 301–312.
Foody, G. M., 1995b, Land cover classification using an artificial neural network with ancillary information. International Journal of Geographical Information Systems, 9, 527–542.
Foody, G. M. and Arora, M. K., 1997, An evaluation of some factors affecting the accuracy of classification by an artificial neural network. International Journal of Remote Sensing, 18, 799–810 (this issue).
Foody, G. M., Lucas, R. M., Curran, P. J. and Honzak, M., 1997, Non-linear mixture modelling without end-members using an artificial neural network. International Journal of Remote Sensing, 18, 937–953 (this issue).
Gopal, S. and Woodcock, C., 1994, Theory and methods for accuracy assessment of thematic maps using fuzzy sets. Photogrammetric Engineering and Remote Sensing, 60, 181–188.
Heerman, P. D. and Khazenie, N., 1992, Classification of multi-spectral remote sensing data using a back-propagation neural network. IEEE Transactions on Geoscience and Remote Sensing, 30, 81–88.
Hepner, G. F., Logan, T., Ritter, N. and Bryant, N., 1990, Artificial neural network classification using a minimal training set: comparison to conventional supervised classification. Photogrammetric Engineering and Remote Sensing, 56, 469–473.
Hopfield, J. J. and Tank, D. W., 1985, Neural computation of decisions in optimization problems. Biological Cybernetics, 52, 141–152.
Howald, K. J., 1989, Neural network image classification. Proceedings of the ASPRS-ACSM Fall Convention (Falls Church, VA: American Society for Photogrammetry and Remote Sensing), pp. 207–215.
Ito, Y. and Omatu, S., 1997, Category classification using a self-organizing neural network. International Journal of Remote Sensing, 18, 829–845 (this issue).
Jin, Y.-Q. and Liu, C., 1997, Biomass retrieval from high-dimensional active/passive remote sensing data by using an artificial neural network. International Journal of Remote Sensing, 18, 971–979 (this issue).
Kaminsky, E. J., Barad, H. and Brown, W., 1997, Textural neural network and version space classifiers for remote sensing. International Journal of Remote Sensing, 18, 741–762 (this issue).
Kanellopoulos, I. and Wilkinson, G. G., 1997, Strategies and best practice for neural network image classification. International Journal of Remote Sensing, 18, 711–725 (this issue).
Kanellopoulos, I., Varfis, A., Wilkinson, G. G. and Mégier, J., 1992, Land-cover discrimination in SPOT HRV imagery using an artificial neural network: a 20-class experiment. International Journal of Remote Sensing, 13, 917–924.
Key, J., Maslanik, J. A. and Schweiger, A. J., 1989, Classification of merged AVHRR and SMMR Arctic data with neural networks. Photogrammetric Engineering and Remote Sensing, 55, 1331–1338.
Kohonen, T., 1984, Self Organization and Associative Memory (Berlin: Springer-Verlag).
Kohonen, T., 1988, An introduction to neural computing. Neural Networks, 1, 3–16.
Lee, J., Weger, R. C., Sengupta, S. K. and Welch, R. M., 1990, A neural network approach to cloud classification. IEEE Transactions on Geoscience and Remote Sensing, 28, 846–855.
Lee, J. J., Shim, J. C. and Ha, Y. H., 1994, Stereo correspondence using the Hopfield neural network of a new energy function. Pattern Recognition, 27, 1513–1522.
Lewis, H. G., Côté, S. and Tatnall, A. R. L., 1997, Determination of spatial and temporal characteristics as an aid to neural network cloud classification. International Journal of Remote Sensing, 18, 899–915 (this issue).
Lippmann, R. P., 1987, An introduction to computing with neural nets. IEEE ASSP Magazine, 2, 4–22.
Logar, A., Corwin, E., Alexander, J., Lloyd, D., Berendes, T. and Welch, R., 1997, A
Neural networks in remote sensing