Image


e-Learning

Understanding
Information Retrieval
Medical
SERIES ON SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING

Series Editor-in-Chief
S K CHANG (University of Pittsburgh, USA)

Vol. 1 Knowledge-Based Software Development for Real-Time Distributed Systems


Jeffrey J.-P. Tsai and Thomas J. Weigert (Univ. Illinois at Chicago)
Vol. 2 Advances in Software Engineering and Knowledge Engineering
edited by Vincenzo Ambriola (Univ. Pisa) and Genoveffa Tortora (Univ. Salerno)
Vol. 3 The Impact of CASE Technology on Software Processes
edited by Daniel E. Cooke (Univ. Texas)
Vol. 4 Software Engineering and Knowledge Engineering: Trends for the Next Decade
edited by W. D. Hurley (Univ. Pittsburgh)
Vol. 5 Intelligent Image Database Systems
edited by S. K. Chang (Univ. Pittsburgh), E. Jungert (Swedish Defence Res.
Establishment) and G. Tortora (Univ. Salerno)
Vol. 6 Object-Oriented Software: Design and Maintenance
edited by Luiz F. Capretz and Miriam A. M. Capretz (Univ. Aizu, Japan)
Vol. 7 Software Visualisation
edited by P. Eades (Univ. Newcastle) and K. Zhang (Macquarie Univ.)
Vol. 8 Image Databases and Multi-Media Search
edited by Arnold W. M. Smeulders (Univ. Amsterdam) and
Ramesh Jain (Univ. California)
Vol. 9 Advances in Distributed Multimedia Systems
edited by S. K. Chang, T. F. Znati (Univ. Pittsburgh) and
S. T. Vuong (Univ. British Columbia)
Vol. 10 Hybrid Parallel Execution Model for Logic-Based Specification Languages
Jeffrey J.-P. Tsai and Sing Li (Univ. Illinois at Chicago)
Vol. 11 Graph Drawing and Applications for Software and Knowledge Engineers
Kozo Sugiyama (Japan Adv. Inst. Science and Technology)
Vol. 12 Lecture Notes on Empirical Software Engineering
edited by N. Juristo & A. M. Moreno (Universidad Politécnica de Madrid,
Spain)
Vol. 13 Data Structures and Algorithms
edited by S. K. Chang (Univ. Pittsburgh, USA)
Vol. 14 Acquisition of Software Engineering Knowledge
SWEEP: An Automatic Programming System Based on Genetic Programming
and Cultural Algorithms
edited by George S. Cowan and Robert G. Reynolds (Wayne State Univ.)
Vol. 15 Image: E-Learning, Understanding, Information Retrieval and Medical
Proceedings of the First International Workshop
edited by S. Vitulano (Universita di Cagliari, Italy)
Series on Software Engineering and Knowledge Engineering

Proceedings of the First International Workshop


Cagliari, Italy, 9-10 June 2003

e-Learning
Understanding
Information Retrieval
Medical

edited by Sergio Vitulano


Università degli Studi di Cagliari, Italy

World Scientific
New Jersey · London · Singapore · Hong Kong
Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: Suite 202, 1060 Main Street, River Edge, NJ 07661
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

British Library Cataloguing-in-Publication Data


A catalogue record for this book is available from the British Library

IMAGE: E-LEARNING, UNDERSTANDING, INFORMATION RETRIEVAL


AND MEDICAL
Proceedings of the First International Workshop
Copyright © 2003 by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means,
electronic or mechanical, including photocopying, recording or any information storage and retrieval
system now known or to be invented, without written permission from the Publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright
Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to
photocopy is not required from the publisher.

ISBN 981-238-587-8

Printed in Singapore by Mainland Press


PREFACE

The role played by images in many human activities, ranging from
entertainment to study and covering all phases of the learning process, is
ever more relevant and irreplaceable.
The computer age may be interpreted as a transformation of our social
life in its working and leisure aspects. In our opinion this change is so
relevant that it could be compared with the invention of printing, of the
steam-engine or the discovery of radio-waves.
While for a long time images could only be captured by photography,
we are now able to capture, to manipulate and to evaluate images with the
computer. Since the original image processing literature is spread over many
disciplines, one can understand the need to gather all the knowledge in this
field into a specific science.
This new science covers image elaboration, transmission, understanding
and ordering, and finally the role of the image in knowledge as a general
matter.
This book aims at highlighting some of the subjects listed above.
First of all we wish to emphasize the importance of images in the
learning process and in the transmission of knowledge (e-Learning section).
How much and what kind of information content do we need for image
comprehension? We try to give an answer, even if only a partial one, in the
Understanding section of this book.
The large number of images used on Internet sites poses several problems.
Their organization and the transmission of their content are typical concerns
of information retrieval, which studies and provides solutions to these
specific problems.
In the last two decades the number of images used in the medical field,
and the role they play, have become ever more important. At the same time
physicians require methodologies typical of computer science for the analysis
and organization of medical images and for CAD (computer-aided diagnosis)
purposes.
The Medical section of this volume gives examples of the interaction
between computer science and medical diagnosis.
This book tries to offer a new contribution to computer science that will
inspire the reader to discover the power of images and to apply the new
knowledge of this science adequately and successfully to his or her research
area and to everyday life.

Sergio Vitulano

CONTENTS

Preface vii

Medical Session
Chairman: M. Tegolo
An Introduction to Biometrics and Face Recognition 1
F. Perronnin, Jean-Luc Dugelay

The Use of Image Analysis in the Early Diagnosis of Oral Cancer 21


R. Serpico, M. Petruzzi, M. De Benedittis

Lung Edge Detection in Postero-Anterior Chest Radiographs 27


P. Campadelli, E. Casiraghi

Discrete Tomography from Noisy Projections 38


C. Valenti

An Integrated Approach to 3D Facial Reconstruction 46


from Ancient Skull
A. F. Abate, M. Nappi, S. Ricciardi, G. Tortora

e-Learning Session
Chairman: M. Nappi
The e-Learning Myth and the New University 60
V. Cantoni, M. Porta, M. G. Semenza

e-Learning - The Next Big Wave: How e-learning will enable 69


the transformation of education
R. Straub, C. Milani

Information Retrieval Session


Chairman: V. Cantoni
Query Morphing for Information Fusion 86
S.-K. Chang


Image Representation and Retrieval with Topological Trees 112


C. Grana, G. Pellacani, S. Seidenari, R. Cucchiara

An Integrated Environment for Control and Management 123


of Pictorial Information Systems
A. F. Abate, R. Cassino, M. Tucci

A Low Level Image Analysis Approach to Starfish Detection 132


V. Di Gesù, D. Tegolo, F. Isgrò, E. Trucco

A Comparison among Different Methods in Information Retrieval 140


F. Cannavale, V. Savona, C. Scintu

HER Application on Information Retrieval 150


A. Casanova, M. Praschini

Understanding Session
Chairman: Jean-Luc Dugelay
Issues in Image Understanding 159
V. Di Gesù

Information System in the Clinical-Health Area 178


G. Madonna

A Wireless-Based System for an Interactive Approach 200


to Medical Parameters Exchange
G. Fenu, A. Crisponi, S. Cugia, M. Picconi
AN INTRODUCTION TO BIOMETRICS AND FACE RECOGNITION

F. PERRONNIN* AND J.-L. DUGELAY

Eurecom Institute
Multimedia Communications Department
2229, route des Crêtes - B.P. 193
06904 Sophia-Antipolis cedex - France
E-mail: {perronni,dugelay}@eurecom.fr

We present in this paper a brief introduction to biometrics, which refers to the
problem of identifying a person based on his/her physical or behavioral character-
istics. We will also provide a short review of the literature on face recognition,
with a special emphasis on frontal face recognition, which represents the bulk of
the published work in this field. While biometrics have mostly been studied sep-
arately, we also briefly introduce the notion of multimodality, a topic related to
decision fusion which has recently gained interest in the biometric community.

1. Introduction to Biometrics
The ability to verify automatically and with great accuracy the identity
of a person has become crucial in our society. Even though we may not
notice it, our identity is challenged daily when we use our credit card or try
to gain access to a facility or a network, for instance. The two traditional
approaches to automatic person identification, namely the knowledge-based
approach, which relies on something that you know such as a password,
and the token-based approach, which relies on something that you have such
as a badge, have obvious shortcomings: passwords might be forgotten or
guessed by a malicious person while badges might be lost or stolen 1.
Biometrics person recognition, which deals with the problem of iden-
tifying a person based on his/her physical or behavioral characteristics, is
an alternative to these traditional approaches as a biometric attribute is
inherent to each person and thus cannot be forgotten or lost and might be
difficult to forge. The face, the fingerprint, the hand geometry, the iris,

*This work was supported in part by France Telecom Research.


etc. are examples of physical characteristics while the signature, the gait,
the keystroke, etc. are examples of behavioral characteristics. It should be
underlined that a biometric such as the voice is both physical and behav-
ioral. Ideally a biometric should have the following properties: it should be
universal, unique, permanent and easily collectible 2 .

In the next three sections of this introductory part, we will briefly de-
scribe the architecture of a typical biometric system, the measures to eval-
uate its performance and the possible applications of biometrics.

1.1. Architecture
A biometric system is a particular case of a pattern recognition system 3.
Given a set of observations (captures of a given biometric) and a set
of possible classes (for instance the set of persons that can possibly be
identified), the goal is to associate to each observation one unique class.
Hence, the main task of pattern recognition is to distinguish between the
intra-class and inter-class variabilities. Face recognition, which is the main
focus of this article, is a very challenging problem as faces of the same
person are subject to variations due to facial expressions, pose, illumination
conditions, presence/absence of glasses and facial hair, aging, etc.
A biometric system is composed of at least two mandatory modules,
the enrollment and recognition modules, and an optional one, the adapta-
tion module. During enrollment, the biometric is first measured through
a sensing device. Generally, before the feature extraction step, a series of
pre-processing operations, such as detection, segmentation, etc. should be
applied. The extracted features should be a compact but accurate repre-
sentation of the biometric. Based on these features, a model is built and
stored, for instance in a database or on a smart card. During the recognition
phase, the biometric characteristic is measured and features are extracted
as done during the enrollment phase. These features are then compared
with one or many models stored in the database, depending on the op-
erational mode (see the next section on performance evaluation). During
the enrollment phase, a user friendly system generally captures only a few
instances of the biometric which may be insufficient to describe with great
accuracy the characteristics of this attribute. Moreover, this biometric can
vary over time in the case where it is non-permanent (e.g. face, voice).
Adaptation maintains or even improves the performance of the system over
time by updating the model after each access to the system.

Figure 1. Architecture of a biometric system.
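
To make the module structure above concrete, the following minimal Python sketch mirrors the enrollment, recognition (identification and verification) and adaptation steps. The feature extractor, the mean-template model and the distance-based score are illustrative assumptions of this sketch, not the design of any particular system discussed in this paper.

```python
import numpy as np

class BiometricSystem:
    """Minimal sketch of the enrollment / recognition / adaptation modules.
    The feature extractor and matching score are placeholders."""

    def __init__(self, extract_features, threshold=0.5):
        self.extract_features = extract_features  # sensing + pre-processing + extraction
        self.models = {}                          # user id -> stored template
        self.threshold = threshold

    def enroll(self, user_id, samples):
        # Build a simple model: the mean feature vector of the enrollment captures.
        feats = np.stack([self.extract_features(s) for s in samples])
        self.models[user_id] = feats.mean(axis=0)

    def verify(self, claimed_id, sample):
        # 1:1 comparison against the claimed identity's model.
        f = self.extract_features(sample)
        return np.linalg.norm(f - self.models[claimed_id]) < self.threshold

    def identify(self, sample):
        # 1:N closed-set search: return the closest enrolled identity.
        f = self.extract_features(sample)
        return min(self.models, key=lambda uid: np.linalg.norm(f - self.models[uid]))

    def adapt(self, user_id, sample, rate=0.1):
        # Optional adaptation: slowly update the model after a successful access.
        f = self.extract_features(sample)
        self.models[user_id] = (1 - rate) * self.models[user_id] + rate * f
```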

1.2. Performance Evaluation


Generally, a biometric system can work under two different operational
modes: identification or verification. During identification, the system
should guess the identity of a person among a set of N possible identities
(1:N problem). A closed set is generally assumed, which means that all the
trials will come from people who have a model in the database, and the goal
is hence to find the most likely person. During verification, the user claims
an identity and the system should compare this identity with the stored
model (1:1 problem). This is referred to as an open set, as persons who are
not in the database may try to fool the system. One can sometimes read
claims that identification is a more challenging problem than verification or
vice-versa. Actually, identification and verification are simply two different
problems.
As it may not be enough to know whether the top match is the correct
one for an identification system, one can measure its performance through
the cumulative match score which measures the percentage of correct an-
swers among the top N matches. Also one could use recall-precision curves
as is done for instance to measure the performance of database retrieval
systems. The FERET face database is the most commonly used database
for assessing the performance of a system in the identification mode.
A verification system can make two kinds of mistakes: it can reject a
rightful user, often called client, or accept an impostor. Hence, the perfor-
mance of a verification system is measured in terms of its false rejection
rate (FRR) and false acceptance rate (FAR). A threshold is set on the scores
obtained during the verification phase and one can vary this threshold to
obtain the best possible compromise for a particular application depending


on the required security level. By varying this threshold, one obtains the
receiver operating characteristic (ROC) curve, i.e. the FRR as a function of the FAR. To
summarize the performance of the system with one unique figure, one often
uses the equal error rate (EER), which corresponds to the point where FAR = FRR.
The M2VTS database and its extension, the XM2VTSDB 5, are the most
commonly used databases for assessing the performance of a system in the
verification mode.
The interested reader can also refer to 6 for an introduction to evaluating
biometric systems.
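
As an illustration of these measures, the short Python sketch below computes the FAR and FRR for a given threshold and approximates the EER by sweeping the threshold over a set of client and impostor scores. The synthetic Gaussian scores are purely hypothetical.

```python
import numpy as np

def far_frr(client_scores, impostor_scores, threshold):
    """FRR: fraction of clients scoring below the threshold (false rejections).
    FAR: fraction of impostors scoring at/above the threshold (false acceptances)."""
    frr = np.mean(np.asarray(client_scores) < threshold)
    far = np.mean(np.asarray(impostor_scores) >= threshold)
    return far, frr

def equal_error_rate(client_scores, impostor_scores):
    """Sweep the threshold over all observed scores and return the point
    where FAR and FRR are closest (an approximation of the EER)."""
    thresholds = np.unique(np.concatenate([client_scores, impostor_scores]))
    best = min(thresholds,
               key=lambda t: abs(np.subtract(*far_frr(client_scores, impostor_scores, t))))
    far, frr = far_frr(client_scores, impostor_scores, best)
    return 0.5 * (far + frr), best

# Example with synthetic scores (higher = more client-like)
rng = np.random.default_rng(0)
clients = rng.normal(1.0, 0.5, 1000)
impostors = rng.normal(-1.0, 0.5, 1000)
eer, thr = equal_error_rate(clients, impostors)
print(f"EER ~ {eer:.3f} at threshold {thr:.2f}")
```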

1.3. Applications
There are mainly four areas of applications for biometrics: access control,
transaction authentication, law enforcement and personalization.
Access control can be subdivided into two categories: physical and vir-
tual access control 7. The former controls the access to a secured location.
An example is the Immigration and Naturalization Service’s Passenger Ac-
celerated Service System (INSPASS) deployed in major US airports which
enables frequent travelers to use an automated immigration system that
authenticates their identity through their hand geometry. The latter one
enables the access to a resource or a service such as a computer or a net-
work. An example of such a system is the voice recognition system used in
Mac OS 9.
Transaction authentication represents a huge market as it includes
transactions at an automatic teller machine (ATM) , electronic fund trans-
fers, credit card and smart card transactions, transactions on the phone or
on the Internet, etc. Mastercard estimates that a smart credit card incor-
porating finger verification could eliminate 80% of fraudulent charges 8 . For
transactions on the phone, biometric systems have already been deployed.
For instance, the speaker recognition technology of Nuance is used by the
clients of the Home Shopping Network or Charles Schwab.
Law enforcement has been one of the first applications of biometrics.
Fingerprint recognition has been accepted for more than a century as a
means of identifying a person. Automatic face recognition can also be very
useful for searching through large mugshot databases.
Finally, personalization through person authentication is very appealing
in the consumer product area. For instance, Siemens allows one to personalize
one's vehicle accessories, such as mirrors, radio station selections, seating
positions, etc. through fingerprint recognition 10.

In the following subsections, we will provide the reader with a brief review
of the literature on face recognition. This review will be split into two parts:
we will devote the next section to frontal face recognition, which represents
the bulk of the literature, and the "other modalities", corresponding to
different acquisition scenarios such as profile, range images, facial thermo-
gram or video, will be discussed in section 3. The interested reader can
refer to 11 for a full review of the literature on face recognition before 1995.
We should underline that specific parts of the face (or the head) such as
the eyes, the ears, the lips, etc. contain a lot of relevant information for
identifying people. However, this is out of the scope of this paper and the
interested reader can refer to 12 for iris recognition, to 13 for ear recogni-
tion and 14 for lip dynamics recognition. Also we will not review a very
important part of any face recognition system: face detection. For a
recent review on the topic, the reader can refer to 15.

2. Frontal Face Recognition


It should be underlined that the expression “frontal face recognition” is used
in opposition to “profile recognition”. A face recognition system that would
work only under perfect frontal conditions would be of limited interest
and even “frontal” algorithms should have some view tolerance. As a full
review, even of the restricted topic of frontal face recognition, is out of
the scope of this paper, we will focus our attention on two very successful
classes of algorithms: the projection-based approaches, i.e. the Eigenfaces
and its related approaches, and the ones based on deformable models such
as Elastic Graph Matching. It should be underlined that the three top
performers at the 96 FERET performance evaluation belong to one of these
two classes 4.

2.1. Eigenfaces and Related Approaches


In this section, we will first review the basic eigenface algorithm and then
consider its extensions: multiple spaces, eigenfeatures, linear discriminant
analysis and probabilistic matching.

2.1.1. Eigenfaces
Eigenfaces are based on the notion of dimensionality reduction. Kirby and
Sirovich 16 first outlined that the dimensionality of the face space, i.e. the space of variation
between images of human faces, is much smaller than the dimensionality of a


single face considered as an arbitrary image. As a useful approximation, one
may consider an individual face image to be a linear combination of a small
number of face components or eigenfaces derived from a set of reference
face images. The idea of the Principal Component Analysis (PCA) 17, also
known as the Karhunen-Loève Transform (KLT), is to find the subspace
which best accounts for the distribution of face images within the whole
space.
Let $\{O_i\}_{i \in [1,N]}$ be the set of reference or training faces, $\bar{O}$ be the average
face and $\tilde{O}_i = O_i - \bar{O}$. $\tilde{O}_i$ is sometimes called a caricature image. Finally,
if $\tilde{O} = [\tilde{O}_1, \tilde{O}_2, \ldots, \tilde{O}_N]$, the scatter matrix $S$ is defined as:

$$S = \sum_{i=1}^{N} \tilde{O}_i \tilde{O}_i^T = \tilde{O}\tilde{O}^T \qquad (1)$$

The optimal subspace $P_{PCA}$ is chosen to maximize the scatter of the projected faces:

$$P_{PCA} = \arg\max_P |P S P^T| \qquad (2)$$

where $|\cdot|$ is the determinant operator. The solution to problem (2) is the
subspace spanned by the eigenvectors $[e_1, e_2, \ldots, e_K]$, also called eigenfaces,
corresponding to the $K$ largest eigenvalues of the scatter matrix $S$. It
should be underlined that eigenfaces are not themselves usually plausible
faces but only directions of variation between face images (see Figure 2).
Each face image is represented by a point $P_{PCA}\tilde{O}_i = [w_1^i, w_2^i, \ldots, w_K^i]$ in
Figure 2. (a) Eigenface 0 (average face) and (b)-(f) eigenfaces 1 to 5 as estimated on a


subset of the FERET face database.

the $K$-dimensional space. The weights $w_k$ are the projections of the face
image on the $k$-th eigenface $e_k$ and thus represent the contribution of each
eigenface to the input face image.
To find the best match for an image of a person's face in a set of stored
facial images, one may calculate the Euclidean distances between the vector
representing the new face and each of the vectors representing the stored
faces, and then choose the image yielding the smallest distance 18.
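
The following Python/NumPy sketch summarizes the eigenface pipeline just described: eigenfaces are computed from centered training images (here via an SVD rather than by forming the scatter matrix explicitly), faces are projected onto the first K eigenfaces, and a query is matched by the smallest Euclidean distance between weight vectors. The random arrays standing in for face images are placeholders only.

```python
import numpy as np

def train_eigenfaces(faces, k):
    """faces: (N, d) matrix, one vectorized training face per row.
    Returns the average face and the first k eigenfaces (as rows)."""
    mean = faces.mean(axis=0)
    centered = faces - mean                       # the caricature images O_i - O_bar
    # Principal directions via SVD of the centered data
    # (equivalent to eigenvectors of the scatter matrix, without forming it).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]                           # eigenfaces e_1..e_k

def project(face, mean, eigenfaces):
    """Weights w_k: projection of the caricature image on each eigenface."""
    return eigenfaces @ (face - mean)

def nearest_face(query, gallery_weights, mean, eigenfaces):
    """Index of the stored face whose weight vector is closest (Euclidean)."""
    w = project(query, mean, eigenfaces)
    dists = np.linalg.norm(gallery_weights - w, axis=1)
    return int(np.argmin(dists))

# Usage sketch with random data standing in for 64x64 face images
rng = np.random.default_rng(1)
gallery = rng.random((40, 64 * 64))
mean, eigfaces = train_eigenfaces(gallery, k=10)
weights = np.stack([project(f, mean, eigfaces) for f in gallery])
print(nearest_face(gallery[7] + 0.01 * rng.random(64 * 64), weights, mean, eigfaces))
```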

2.1.2. Multiple Spaces Approaches


When one has a large amount of training data, one can either pool all
the data to train one unique eigenspace, which is known as the parametric
approach or split the data into multiple training sets and train multiple
eigenspaces which is known as the view-based approach. The latter approach
has been designed especially to compensate for different head poses.
One of the first attempts to train multiple eigenspaces was made in 19.
This method consists in building a separate eigenspace for each possible
view 19. For each new target image, its orientation is first estimated by pro-
jecting it on each eigenspace and choosing the one that yields the smallest
distance from face to space. The performance of the parametric and view-
based approaches were compared in 19 and the latter one seems to perform
better. The problem with the view-based approach is that it requires large
amounts of labeled training data to train each separate eigenspace.
More recently, Mixtures of Principal Components (MPC) were proposed
to extend the traditional PCA 20,21. An iterative procedure based on the
Expectation-Maximization algorithm was derived in both cases to train au-
tomatically the MPC. However, while 20 represents a face by the best set of
features corresponding to the closest set of eigenfaces, in 21 a face image is
projected on each component eigenspace and these individual projections
are then linearly combined. Hence, compared to the former approach, a
face image is not assigned in a hard manner to one eigenspace component
but in a soft manner to all the eigenspace components. 21 tested MPC on
a database of face images that exhibit large variabilities in pose and illu-
mination conditions. Each eigenspace converges automatically to varying
poses and the first few eigenvectors of each component eigenspace seem to
capture lighting variations.

2.1.3. Eigenfeatures
An eigenface-based recognition system can be easily fooled by gross varia-
tions of the image such as the presence or absence of facial hair 19. This
shortcoming is inherent to the eigenface approach which encodes a global
representation of the face. To address this issue, 19 proposed a modular or
layered approach where the global representation of the face is augmented


by local prominent features such as the eyes, the nose or the mouth. Such
an approach is of particular interest when a part of the face is occluded and
only a subset of the facial features can be used for recognition. A similar
approach was also developed in 22. The main difference is in the encoding
of the features: the notion of eigenface is extended to eigeneyes, eigen-
nose and eigenmouth as was done for instance in 23 for image coding. For
a small number of eigenvectors, the eigenfeatures approach outperformed
the eigenface approach and the combination of eigenfaces and eigenfeatures
outperformed each algorithm taken separately.

2.1.4. Linear Discriminant Approaches


While PCA is optimal with respect to data compression 16, in general it is
sub-optimal for a recognition task. Actually, PCA confounds intra-personal
and extra-personal sources of variability in the total scatter matrix S. Thus
eigenfaces can be contaminated by non-pertinent information.
For a classification task, a dimension reduction technique such as Linear
Discriminant Analysis (LDA) should be preferred to PCA 24,25,26. The idea
of LDA is to select a subspace that maximizes the ratio of the inter-class
variability and the intra-class variability. Whereas PCA is an unsupervised
feature extraction method, discriminant analysis uses the category infor-
mation associated with each training observation and is thus categorized as
supervised.
Let $O_{i,k}$ be the $k$-th picture of training person $i$, $N_i$ be the number
of training images for person $i$ and $\bar{O}_i$ be the average of person $i$. Then
$S_B$ and $S_W$, respectively the between- and within-class scatter matrices, are
given by:

$$S_B = \sum_{i=1}^{C} N_i (\bar{O}_i - \bar{O})(\bar{O}_i - \bar{O})^T \qquad (3)$$

$$S_W = \sum_{i=1}^{C} \sum_{k=1}^{N_i} (O_{i,k} - \bar{O}_i)(O_{i,k} - \bar{O}_i)^T \qquad (4)$$

where $C$ is the number of training persons and $\bar{O}$ is the average over all
training faces.

The optimal subspace $P_{LDA}$ is chosen to maximize the between-scatter
of the projected face images while minimizing the within-scatter of the
projected faces:

$$P_{LDA} = \arg\max_P \frac{|P S_B P^T|}{|P S_W P^T|} \qquad (5)$$

The solution to equation (5) is the subspace spanned by $[e_1, e_2, \ldots, e_K]$,
the generalized eigenvectors corresponding to the largest eigenvalues of the
generalized eigenvalue problem:

$$S_B e_k = \lambda_k S_W e_k, \quad k = 1, \ldots, K \qquad (6)$$


However, due to the high dimensionality of the feature space, $S_W$ is gen-
erally singular and this principle cannot be applied in a straightforward
manner. To overcome this issue, generally one first applies PCA to reduce
the dimension of the feature space and then performs the standard LDA
24,26. The eigenvectors that form the discriminant subspace are often re-
ferred to as Fisherfaces 24. In 26, the space spanned by the first few Fisherfaces
is called the most discriminant features (MDF) classification space while
PCA features are referred to as most expressive features (MEF). It should be

Figure 3. (a) Fisherface 0 (average face) and (b)-(f) Fisherfaces 1 to 5 as estimated on


a subset of the FERET face database.

underlined that LDA induces non-orthogonal projection axes, a property
which has great relevance in biological sensory systems 27.
Other solutions to equation (5) were suggested in 27,28,29.
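
A compact sketch of the usual PCA-then-LDA procedure is given below. It follows the equations above, but the particular solver (inverting $S_W$ and taking eigenvectors of $S_W^{-1}S_B$) and the parameter names are implementation choices of this sketch, not of the cited works.

```python
import numpy as np

def fisherfaces(faces, labels, n_pca, n_lda):
    """PCA to make S_W non-singular (n_pca should be small enough, e.g. <= N - C),
    then LDA in the reduced space. faces: (N, d); labels: (N,)."""
    mean = faces.mean(axis=0)
    X = faces - mean
    # --- PCA step ---
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    P_pca = vt[:n_pca]                       # (n_pca, d)
    Y = X @ P_pca.T                          # faces in the PCA space
    # --- LDA step: between- and within-class scatter in the PCA space ---
    classes = np.unique(labels)
    mu = Y.mean(axis=0)
    d = Y.shape[1]
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for c in classes:
        Yc = Y[labels == c]
        mc = Yc.mean(axis=0)
        S_b += len(Yc) * np.outer(mc - mu, mc - mu)
        S_w += (Yc - mc).T @ (Yc - mc)
    # Generalized eigenproblem S_b e = lambda S_w e, solved via inv(S_w) S_b.
    # At most C-1 eigenvalues are non-zero, so n_lda should not exceed C-1.
    evals, evecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(evals.real)[::-1][:n_lda]
    P_lda = evecs[:, order].real.T           # (n_lda, n_pca)
    return mean, P_lda @ P_pca               # combined projection (n_lda, d)
```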

2.1.5. Probabilistic Matching


While most face recognition algorithms, especially those based on eigen-
faces, generally use simple metrics such as the Euclidean distance, 30 sug-
gests a probabilistic similarity based on a discriminative Bayesian analysis of
image differences. One considers the two mutually exclusive classes of vari-
ation between two facial images: the intra-personal and extra-personal vari-
ations, whose associated spaces are denoted respectively $\Omega_I$ and $\Omega_E$. Given
two face images $O_1$ and $O_2$ and the image difference $\Delta = O_1 - O_2$, the
similarity measure is given by $P(\Omega_I|\Delta)$. Using Bayes' rule, it can be trans-
formed into:

$$P(\Omega_I|\Delta) = \frac{P(\Delta|\Omega_I)\,P(\Omega_I)}{P(\Delta|\Omega_I)\,P(\Omega_I) + P(\Delta|\Omega_E)\,P(\Omega_E)} \qquad (7)$$

The high-dimensional probability functions $P(\Delta|\Omega_I)$ and $P(\Delta|\Omega_E)$ are
estimated using an eigenspace density estimation technique 31. It was ob-
served that the denominator in equation (7) had a limited impact on the
performance of the system and that the similarity measure could be reduced
to $P(\Delta|\Omega_I)$ with little loss in performance, thus reducing the computational
requirements of the algorithm by a factor of two.
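
As a rough illustration of this probabilistic similarity, the sketch below fits one Gaussian to intra-personal and one to extra-personal image differences in a low-dimensional PCA subspace and evaluates the posterior of equation (7). The plain Gaussian fit is only a crude stand-in, assumed here for brevity, for the eigenspace density estimation technique cited above.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_difference_space(diffs, n_components):
    """Fit a Gaussian to image differences in a low-dimensional PCA subspace
    (n_components should be well below the number of difference samples)."""
    mean = diffs.mean(axis=0)
    _, _, vt = np.linalg.svd(diffs - mean, full_matrices=False)
    basis = vt[:n_components]
    proj = (diffs - mean) @ basis.T
    gauss = multivariate_normal(proj.mean(axis=0), np.cov(proj, rowvar=False))
    return mean, basis, gauss

def intra_posterior(delta, intra_model, extra_model, p_intra=0.5):
    """P(Omega_I | Delta) via Bayes' rule from the two fitted densities."""
    def likelihood(model):
        mean, basis, gauss = model
        return gauss.pdf((delta - mean) @ basis.T)
    num = likelihood(intra_model) * p_intra
    return num / (num + likelihood(extra_model) * (1.0 - p_intra))
```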

2.2. Deformable Models


As noted in 32, since most face recognition algorithms are minimum distance
pattern classifiers, special attention should be paid to the definition of
distance. The distance which is generally used is the Euclidean distance.
While it is easy to compute, it may not be optimal as, for instance, it
does not compensate for the deformations incurred from different facial
expressions. Face recognition algorithms based on deformable models can
cope with this kind of variation.

2.2.1. Elastic Graph Matching


The Elastic Graph Matching (EGM) algorithm has its roots in the neural network
community 33.
Given a template image $\mathcal{F}_T$, one first derives a face model from this
image. A grid is placed on the face image and the face model is a vector
field $O = \{o_{i,j}\}$ where $o_{i,j}$ is the feature vector extracted at position $(i,j)$
of the grid, which summarizes local properties of the face (cf. Figure 4(a)).
Gabor coefficients are generally used but other features, like morphological
feature vectors, have also been considered and successfully applied to the
EGM problem 34. Given a query image $\mathcal{F}_Q$, one also derives a vector field
$X = \{x_{i,j}\}$, but on a coarser grid than the template face (cf. Figure 4(b)).
In the EGM approach, the distance between the template and query images
is defined by the best mapping $M^*$ among the set of all possible mappings
$\{M\}$ between the two vector fields $O$ and $X$. The optimal mapping depends
on the definition of the cost function $C$. Such a function should keep a
proper balance between the local matching of features and the requirement
to preserve spatial distances. Therefore, a proper cost function should be of

Figure 4. (a) Template image and (b) query image with their associated grids. (c) Grid
after deformation using the probabilistic deformable model of face mapping (c.f. section
2.2.3). Images extracted from the FERET face database.

the form:

$$C(M) = C_v(M) + \rho\, C_e(M) \qquad (8)$$

where $C_v$ is the cost of local matchings, $C_e$ the cost of local deformations
and $\rho$ is a parameter which controls the rigidity of the elastic matching and
has to be hand-tuned.
As the number of possible mappings is extremely large, even for lattices
of moderate size, an exhaustive search is out of the question and an approx-
imate solution has to be found. Toward this end, a two-step procedure
was designed (a toy sketch follows the list below):

• rigid matching: the whole template graph is shifted around the
query graph. This corresponds to $\rho \rightarrow \infty$. We obtain an initial
mapping $M_0$.
• deformable matching: the nodes of the template lattice are then
stretched through random local perturbations to further reduce
the cost function until the process converges to a locally optimal
mapping $M^*$, i.e. once a predefined number of trials have failed to
improve the mapping cost.
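
The toy Python sketch below illustrates this two-step procedure on a rectangular grid of generic feature vectors: an exhaustive rigid search over translations followed by random single-node perturbations, accepted only when they lower the cost of equation (8). The grid representation, unit-step perturbations and stopping rule are simplifications assumed here for illustration, not the exact algorithms of the cited works.

```python
import numpy as np

rng = np.random.default_rng(2)
# node_feats: {(dy, dx): feature vector} on integer grid offsets >= 0
# query_field: array of shape (H, W, F), one feature vector per pixel position

def graph_cost(node_feats, node_offsets, query_field, origin, rho):
    """Sum of local feature distances (C_v) plus rho times the squared
    deviation of each node from its rigid grid position (C_e)."""
    cost = 0.0
    for key, feat in node_feats.items():
        dy, dx = key                            # rigid grid offset of this node
        oy, ox = node_offsets[key]              # current (possibly deformed) offset
        y, x = origin[0] + oy, origin[1] + ox
        cost += np.linalg.norm(feat - query_field[y, x])      # local matching term
        cost += rho * ((oy - dy) ** 2 + (ox - dx) ** 2)       # local deformation term
    return cost

def elastic_match(node_feats, query_field, rho=0.1, max_fails=200):
    """Step 1: rigid search over whole-graph translations (no deformation).
    Step 2: random single-node perturbations, stopping after max_fails
    consecutive trials without improvement."""
    H, W = query_field.shape[:2]
    span = max(max(dy, dx) for dy, dx in node_feats)
    rigid = {k: k for k in node_feats}
    best_origin = min(((y, x) for y in range(H - span) for x in range(W - span)),
                      key=lambda o: graph_cost(node_feats, rigid, query_field, o, rho))
    offsets = dict(rigid)
    best = graph_cost(node_feats, offsets, query_field, best_origin, rho)
    keys, fails = list(node_feats), 0
    while fails < max_fails:
        k = keys[rng.integers(len(keys))]
        trial = dict(offsets)
        trial[k] = (trial[k][0] + int(rng.integers(-1, 2)),
                    trial[k][1] + int(rng.integers(-1, 2)))
        y, x = best_origin[0] + trial[k][0], best_origin[1] + trial[k][1]
        if not (0 <= y < H and 0 <= x < W):
            fails += 1
            continue
        cost = graph_cost(node_feats, trial, query_field, best_origin, rho)
        if cost < best:
            offsets, best, fails = trial, cost, 0
        else:
            fails += 1
    return best_origin, offsets, best
```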
The previous matching algorithm was later improved. For instance,
in 34 the authors argue that the two-stage coarse-to-fine optimization is
sub-optimal as the deformable matching relies too much on the success
of the rigid matching. The two-stage optimization procedure is replaced
with a probabilistic hill-climbing algorithm which attempts to find at each
iteration both the optimal global translation and the set of optimal local
perturbations. In 35, the same authors further drop the $C_e$ term in equation
(8). However, to avoid unreasonable deformations, local translations are
restricted to a neighborhood.

2.2.2. Elastic Bunch Graph Matching


36 elaborated on the basic idea of EGM with the Elastic Bunch Graph
Matching (EBGM) through three major extensions:

• While the cost of local matchings $C_v$ only makes use of the mag-
nitude of the complex Gabor coefficients in the EGM approach, the
phase information is used to disambiguate features which have a
similar magnitude, but also to estimate local distortions.
• The features are no longer extracted on a rectangular graph but
they now refer to specific facial landmarks called fiducial points.
• A new data structure called bunch graph, which serves as a gen-
eral representation of the face, is introduced. Such a structure is
obtained by combining the graphs of a set of reference individuals.

It should be noted that the idea of extracting features at positions which


correspond to facial landmarks appeared in earlier work. In 37 feature
points are detected using a Gabor wavelet decomposition. Typically, 35 to
50 points are obtained in this manner and form the face graph. To compare
two face graphs, a two-stage matching similar to the one suggested in 33
is developed. One first compensates for a global translation of the graphs
and then performs local deformations for further optimization. However,
another difference with 33 is that the cost of local deformations (also called
topology cost) is only computed after the features are matched which results
in a very fast algorithm. One advantage of 36 over 37 is in the use of the
bunch graph which provides a supervised way to extract salient features.

An obvious shortcoming of EGM and EBGM is that $C_v$, the cost of local
matchings, is simply a sum of all local matchings. This contradicts the fact
that certain parts of the face contain more discriminant information and
that this distribution of the information across the face may vary from one
person to another. Hence, the cost of local matchings at each node should
be weighted according to their discriminatory power 38,39,34,35.

2.2.3. Probabilistic Deformable Model of Face Mapping


A novel probabilistic deformable model of face mapping 40, whose philoso-
phy is similar to EGM 33, was recently introduced. Given a template face
$\mathcal{F}_T$, a query face $\mathcal{F}_Q$ and a deformable model of the face $M$, for a face
identification task the goal is to estimate $P(\mathcal{F}_T|\mathcal{F}_Q, M)$. The two major
differences between EGM and the approach presented in 40 are:

• In the use of the HMM framework, which provides efficient formulas
to compute $P(\mathcal{F}_T|\mathcal{F}_Q, M)$ and train automatically all the parame-
ters of $M$. This makes it possible, for instance, to model the elastic
properties of the different parts of the face.
• In the use of a shared deformable model of the face $M$ for all
individuals, which is particularly useful when little enrollment data
is available.

3. Other “Modalities” for Face Recognition


In this section we will very briefly review what we called the “other modali-
ties" and which basically encompass the remainder of the literature on face
recognition: profile recognition, recognition based on range data, thermal
imagery and finally video-based face recognition.

3.1. Profile Recognition


The research on profile face recognition has been mainly motivated by the
requirements of law enforcement agencies with their so-called mug shot
databases. However, it has been the focus of a relatively restricted num-
ber of papers. It should be underlined that frontal and profile face recog-
nition are complementary as they do not provide the same information.
A typical profile recognition algorithm first locates on the contour image
points of interest such as the nose tip, the mouth, the chin, etc., also called
fiducial points, and then extracts information such as distances, angles,
etc. for the matching (see 41 for an example of an automatic system based
on this principle). An obvious problem with such an approach is the fact
that it relies on an accurate feature extraction. Alternative approaches
which alleviate this problem include (but are not limited to) the use of
Fourier descriptors for the description of closed curves 42, the application
of Eigenfaces to profiles 19 and, more recently, an algorithm based on string
matching 43.

3.2. Range Data


While a 2-D intensity image does not have direct access to the 3-D structure
of an object, a range image contains the depth information and is not
sensitive to lighting conditions (it can even work in the dark), which makes
range data appealing for a face recognition system. The sensing device can
be a rotating laser scanner, which provides a very accurate and complete
representation of the face, as used for instance in 44,45. However, such a
scanner is highly expensive and the scanning process is very slow. In 46
the authors suggested the use of the coded light approach for acquiring range
images. A sequence of stripe patterns is projected onto the face and for
each projection an image is taken with a camera. However, for shadow
regions as well as regions that do not reflect the projected light, no 3-D
data can be estimated which results in range images with a lot of missing
data. Therefore, the authors decided to switch to a multi-sensor system
with two range sensors acquiring the face under two different views. These
two sets of range data are then merged. Although these sensing approaches
reduce both the acquisition time and cost, the user of such a system should
be cooperative which restricts its use. This may explain the fact that little
literature is available on this topic.
In 44, the authors present a face recognition system based on range data
template matching. The range data is segmented into four surface regions
which are then normalized using the location of the eyes, nose and mouth.
The volume between two surfaces is used as a distance measure. In 45 the face
recognition system uses features extracted from range and curvature data.
Examples of features are the left and right eye width, the head width, etc.,
but also the maximum Gaussian curvature on the nose ridge, the average
minimum curvature on the nose ridge, etc. In 46, the authors apply and
extend traditional 2-D face recognition algorithms (Eigenfaces and HMM-
based face recognition 47) to range data. More recently, in 48 point signatures
are used as features for 3-D face recognition. These feature points are
projected into a subspace using PCA.

3.3. Facial Thermogram


The facial heat emission patterns can be used to characterize a person.
These patterns depend on nine factors including the location of major blood
vessels, the skeleton thickness, the amount of tissue, muscle, and fat 49. IR
face images have the potential to be a good biometric as this signature is
unique (even identical twins do not share the same facial thermogram)
and it is supposed to be relatively stable over time. Moreover, it cannot
be altered through plastic surgery. The acquisition is done with an infra-
red (IR) camera. Hence, it does not depend on the lighting conditions,
which is a great advantage over traditional facial recognition. However, IR
imagery is dependent on the temperature and IR is opaque to glass. A pre-
liminary study 50 compared the performance of visible and IR imagery for
face recognition and it was shown that there was little difference in perfor-
mance. However, the authors in 50 did not address the issue of significant
variations in illumination for visible images and changes in temperature for
IR images.

3.4. Video-Based Recognition


Although it has not been a very active research topic (at least compared
to frontal face recognition), video-based face recognition can offer many
advantages compared to recognition based on still images:

• Abundant data is available at both enrollment and test time. Ac-
tually, one could use video at enrollment time and still images at
test time or vice versa (although the latter scenario would perhaps
make less sense). However, it might not be necessary to process
all this data, and one of the tasks of the recognition system will be
the selection of an optimal subset of the whole set of images which
contains the maximum amount of information.
• With sequences of images, the recognition system has access to dy-
namic features which provide valuable information on the behavior
of the user. For instance, the BioID system 14 makes use of the lip
movement for the purpose of person identification (in conjunction
with face and voice recognition). Also, dynamic features are gen-
erally more secure against fraud than static features as they are
harder to replicate.
• Finally, the system can try to build a model of the face by estimating
the 3-D depth of points on objects from a sequence of 2-D images,
which is known as structure from motion 11.

Video-based recognition might be extremely useful for covert surveillance,


for instance in airports. However, this is a highly challenging problem as
the system should work in a non-cooperative scenario and the quality of
surveillance video is generally poor and the resolution is low.

4. Multimodality
Reliable biometric-based person authentication systems, based for instance
on iris or retina recognition, already exist, but the user acceptance of such
systems is generally low and they should be used only in high-security
scenarios. Systems based on voice or face recognition generally have a high
user acceptance but their performance is not yet satisfactory.
Multimodality is a way to improve the performance of a system by com-
bining different biometrics. However, one should be extremely careful about
which modalities should be combined (especially, it might not be useful to
combine systems which have radically different performances) and how to
combine them. In the following, we will briefly describe the possible multi-
modality scenarios and the different ways to fuse the information.

4.1. Different Multimodality Scenarios


We use here the exhaustive classification introduced in 51:

(1) multiple biometric systems: consists in using different biometric at-
tributes, such as the face, voice and lip movement 14. This is the
most commonly used sense of the term multimodality.
(2) multiple sensors: e.g. a camera and an infrared camera for face
recognition.
(3) multiple units of the same biometric: e.g. fusing the result of the
recognition of both irises.
(4) multiple instances of the same biometric: e.g. in video-based face
recognition, fusing the recognition results of each image.
(5) multiple algorithms on the same biometric capture.
We can compare these scenarios in terms of the expected increase of
performance of the system over the monomodal systems versus the increase
of the cost of the system, which can be split into additional software and
hardware costs.
In terms of the additional amount of information and thus in the ex-
pected increase of the performance of the system, the first scenario is the
richest and scenarios (4) and (5) are the poorest ones. The amount of
information brought by scenario (2) is highly dependent on the difference
between the two sensors. Scenario (3) can bring a large amount of infor-
mation as, for instance, the two irises or the ten fingerprints of the same
person are different. However, if the quality of a fingerprint is low for a
person, e.g. because of a manual activity, then the quality of the other
fingerprints is likely to be low.


The first two scenarios clearly introduce an additional cost as many
sensors are necessary to perform the acquisitions. For scenario (3) there is
no need for an extra sensor if captures are done sequentially. However, this
lengthens the acquisition time, which makes the system less user-friendly.
Finally, scenarios (1) and (5) induce an additional software cost as different
algorithms are necessary for the different systems.

4.2. Information Fusion


As stated at the beginning of this section, multimodality improves the
performance of a biometric system. The word performance includes both
accuracy and efficiency.
The assumption which is made is that different biometric systems make
different types of errors and thus, that it is possible to use the comple-
mentary nature of these systems. This is a traditional problem of decision
fusion 53. Fusion can be done at three different levels 52 (by increasing
order of available information):

• At the abstract level, the output of each classifier is a label such
as the ID of the most likely person in the identification case or a
binary answer such as accept/reject in the verification case.
• At the rank level, the output labels are sorted by confidence.
• At the measurement level, a confidence measure is associated to
each label.

Commonly used classification schemes such as the product rule, sum
rule, min rule, max rule and median rule are derived from a common the-
oretical framework using different approximations 54. In 55, the authors
evaluated different classification schemes, namely support vector machine
(SVM), multi-layer perceptron (MLP), decision tree, Fisher's linear dis-
criminant (FLD) and Bayesian classifier, and showed that the SVM- and
Bayesian-based classifiers had a similar performance and outperformed the
other classifiers when fusing face and voice biometrics.
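
A minimal sketch of measurement-level fusion with these fixed rules is given below; the min-max normalization and the synthetic face and voice scores are assumptions made only for illustration.

```python
import numpy as np

def min_max_normalize(scores):
    """Map one system's scores to [0, 1] so they can be combined."""
    s = np.asarray(scores, dtype=float)
    return (s - s.min()) / (s.max() - s.min())

def fuse(score_matrix, rule="sum"):
    """score_matrix: (n_systems, n_candidates) normalized scores.
    Returns one fused score per candidate using a simple fixed rule."""
    rules = {"sum": np.sum, "product": np.prod, "min": np.min,
             "max": np.max, "median": np.median}
    return rules[rule](score_matrix, axis=0)

# Example: fuse face and voice scores for 4 candidate identities
face = min_max_normalize([0.2, 0.9, 0.4, 0.1])
voice = min_max_normalize([0.3, 0.7, 0.8, 0.2])
fused = fuse(np.stack([face, voice]), rule="sum")
print("best candidate:", int(np.argmax(fused)))
```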
In the identification mode, one can use the complementary nature of
different biometrics to speed up the search process. Identification is gener-
ally performed in a sequential mode. For instance, in 56 identification is a
two-step process: face recognition, which is fast but unreliable, is used to
obtain an N-best list of the most likely persons, and fingerprint recognition,
which is slower but more accurate, is then performed on this subset.

5. Summary
We introduced in this paper biometrics, which deals with the problem of
identifying a person based on his/her physical and behavioral character-
istics. Face recognition, which is one of the most actively researched topics
in biometrics, was briefly reviewed. Although huge progress has been
made in this field over the past twenty years, research has mainly focused
on frontal face recognition from still images. We also introduced the no-
tion of multimodality as a way of exploiting the complementary nature of
monomodal biometric systems.

References
1. S. Liu and M. Silverman, "A practical guide to biometric security technol-
ogy", IT Professional, vol. 3, no. 1, pp. 27-32, Jan/Feb 2001.
2. A. Jain, R. Bolle and S. Pankanti, "Biometrics: personal identification in
networked society", Boston, MA: Kluwer Academic, 1999.
3. R. O. Duda, P. E. Hart and D. G. Stork, "Pattern classification", 2nd edition,
John Wiley & Sons, Inc.
4. P. J. Phillips, H. Moon, S. Rizvi and P. Rauss, "The FERET evaluation
methodology for face recognition algorithms", IEEE Trans. on PAMI, 2000,
vol. 22, no. 10, October.
5. K. Messer, J. Matas, J. Kittler and K. Jonsson, "XM2VTSDB: the extended
M2VTS database", AVBPA'99, 1999, pp. 72-77.
6. P. J. Phillips, A. Martin, C. L. Wilson and M. Przybocki, "An introduction
to evaluating biometric systems", Computer, 2000, vol. 33, no. 2, pp. 56-63.
7. INSPASS, http://www.immigration.gov/graphics/howdoi/inspass.htm
8. O. O'Sullivan, "Biometrics comes to life", Banking Journal, 1997, January.
9. Nuance, http://www.nuance.com
10. Siemens Automotive, http://media.siemensauto.com
11. R. Chellappa, C. L. Wilson and S. Sirohey, “Human and machine recognition
of faces: a survey”, Proc. of the IEEE, 1995, vol. 83, no. 5, May.
12. J. Daugman, "How iris recognition works", ICIP, 2002, vol. 1, pp. 33-36.
13. B. Moreno, A. Sanchez and J. F. Velez, "On the use of outer ear images for
personal identification in security applications", IEEE 3rd Conf. on Security
Technology, pp. 469-476.
14. R. W. Frischholz and U. Dieckmann, "BioID: a multimodal biometric iden-
tification system", Computer, 2000, vol. 33, no. 2, pp. 64-68, Feb.
15. E. Hjelmas and B. K. Low, "Face detection: a survey", Computer Vision and
Image Understanding, 2001, vol. 83, pp. 236-274.
16. M. Kirby and L. Sirovich, "Application of the Karhunen-Loève procedure for
the characterization of human faces," IEEE Trans. on PAMI, vol. 12, pp.
103-108, 1990.
17. I. T. Joliffe, “Principal Component Analysis”, Springer-Verlag, 1986.
18. M. A. Turk and A. P. Pentland, “Face recognition using eigenfaces,” in IEEE

Conf. on CVPR, 1991, pp. 586-591.


19. A. Pentland, B. Moghaddam and T. Starner, “View-based and modular
eigenspaces for face recognition,” IEEE Conf. on CVPR, pp. 84-91, June
1994.
20. H.-C. Kim, D. Kim and S. Y. Bang, “Face recognition using the mixture-of-
eigenfaces method,” Pattern Recognition Letters, vol. 23, no. 13, pp. 1549-
1558, Nov. 2002.
21. D. S. Turaga and T. Chen, “Face recognition using mixtures of principal
components,” IEEE Int. Conf. on IP, vol. 2, pp. 101-104, 2002.
22. R. Brunelli and T. Poggio, “Face Recognition: Features versus Templates”,
IEEE Trans. on PAMI, 1993, vol. 15, no. 10, pp. 1042-1052, Oct.
23. W. J. Welsh and D. Shah, “Facial feature image coding using principal com-
ponents,” Electronic Letters, vol. 28, no. 22, pp. 2066-2067, October 1992.
24. P. N. Belhumeur, J. P. Hespanha and D. J. Kriegman, “Eigenfaces vs. fish-
erfaces: recognition using class specific linear projection,” IEEE Transaction
on PAMI, vol. 19, pp. 711-720, Jul 1997.
25. K. Etemad and R. Chellappa, "Face recognition using discriminant eigen-
vectors," ICASSP, vol. 4, pp. 2148-2151, May 1996.
26. D. L. Swets and J. Weng, "Using discriminant eigenfeatures for image re-
trieval," IEEE Trans. on PAMI, vol. 18, no. 8, pp. 831-836, August 1996.
27. C. Liu and H. Wechsler, "Gabor feature based classification using the en-
hanced Fisher linear discriminant model for face recognition," IEEE Trans.
on IP, vol. 11, no. 4, pp. 467-476, Apr 2002.
28. L.-F. Chen, H.-Y. M. Liao, M.-T. Ko, J.-C. Lin and G.-J. Yu, "A new LDA-
based face recognition system which can solve the small sample size problem,"
Pattern Recognition, vol. 33, no. 10, pp. 1713-1726, October 2000.
29. J. Yang and J.-Y. Yang, "Why can LDA be performed in PCA transformed
space?," Pattern Recognition, vol. 36, no. 2, pp. 563-566, February 2003.
30. B. Moghaddam, W. Wahid and A. Pentland, “Beyond eigenfaces: Proba-
bilistic matching for face recognition," IEEE Int. Conf. on Automatic Face
and Gesture Recognition, pp. 30-35, April 1998.
31. B. Moghaddam and A. Pentland, “Probabilistic visual learning for object
recognition," Int. Conf. on Computer Vision, 1995.
32. J. Zhang, Y. Yan and M. Lades, “Face recognition: Eigenface, elastic match-
ing, and neural nets,” Proc. of the IEEE, vol. 85, no. 9, Sep 1997.
33. M. Lades, J. C. Vorbrüggen, J. Buhmann, J. Lange, C. von der Malsburg,
R. Würtz and W. Konen, "Distortion invariant object recognition in the
dynamic link architecture," IEEE Trans. on Computers, 1993, vol. 42, no. 3.
34. C. L. Kotropoulos, A. Tefas and I. Pitas, "Frontal face authentication us-
ing discriminant grids with morphological feature vectors," IEEE Trans. on
Multimedia, vol. 2, no. 1, pp. 14-26, March 2000.
35. A. Tefas, C. Kotropoulos and I. Pitas, "Using support vector machines to en-
hance the performance of elastic graph matching for frontal face recognition,"
IEEE Trans. on PAMI, vol. 23, no. 7, pp. 735-746, Jul 2001.
36. L. Wiskott, J. M. Fellous, N. Krüger and C. von der Malsburg, "Face recog-
nition by elastic bunch graph matching," IEEE Trans. on PAMI, vol. 19, no.

7, pp. 775-779, July 1997.


37. B. S. Manjunath, R. Chellappa and C. von der Malsburg, “A feature based
approach to face recognition,” Proc. of IEEE Computer Society Conf. on
Computer Vision and Pattern Recognition, pp. 373-378, 1992.
38. N. Krüger, "An algorithm for the learning of weights in discrimination func-
tions using a priori constraints," IEEE Trans. on PAMI, vol. 19, no. 7, Jul
1997.
39. B. Duc, S. Fischer and J. Bigün, "Face authentication with Gabor information
on deformable graphs,” IEEE Trans. on IP, vol. 8, no. 4, Apr 1999.
40. F. Perronnin, Jean-Luc Dugelay and K. Rose, “Deformable Face Mapping
for Person Identification”, ICIP, 2003.
41. C. Wu and J. Huang, “Human Face profile recognition by computer”, Pattern
Recognition, vol. 23, pp. 255-259, 1990.
42. T. Aibara, K. Ohue and Y. Matsuoka, “Human face recognition of P-type
Fourier descriptors”, SPIE Proc., vol 1606: Visual Communication and Image
Processing, 1991, pp. 198-203.
43. Y. Gao, M. Leung, "Human face profile recognition using attributed string",
Pattern Recognition, vol. 35, pp. 353-360.
44. G. Gordon, “Face recognition based on depth maps and surface curvature”,
SPIE Proc., vol. 1570, pp. 234-247, 1991.
45. G. Gordon, “Face recognition based on depth and curvature features”, IEEE
Conf on CVPR, 1992, pp. 808-810, 15-18 Jun.
46. B. Achermann, X. Jiang and H. Bunke, “Face recognition using range im-
ages”, VSMM, 1997, pp. 129-136, 10-12 Sep.
47. F. S. Samaria, “Face recognition using hidden Markov models”, Ph. D. thesis,
University of Cambridge, 1994.
48. Y. Wang, C.-S. Chua and Y.-K. Ho, "Facial feature detection and face recog-
nition from 2D and 3D images", Pattern Recognition Letters, 2002, vol. 23,
pp. 1191-1202.
49. M. Lawlor, “Thermal pattern recognition systems faces security challenges
head on”, Signal Magazine, 1997, November.
50. J. Wilder, P. J. Phillips, C. Jiang and S. Wiener, “Comparison of visible and
infra-red imagery for face recognition”, Int. Conf. on Automatic Face and
Gesture Recognition, 1996, pp. 182-187, 14-16 Oct.
51. S. Prabhakar and A. Jain, “Decision-level fusion in biometric verification”,
Pattern Recognition, 2002, vol. 35, no. 4, pp.861-874.
52. R. Brunelli and D. Falavigna, “Person identification using multiple cues”,
IEEE Trans. on PAMI, 1995, vol. 17, no. 10, pp. 955-966, Oct.
53. B. V. Dasarathy, “Decision fusion”, IEEE Computer Society Press, 1994.
54. J. Kittler, M. Hatef, R. Duin and J. Matas, “On combining classifiers”, IEEE
Trans. on PAMI, 1998, vol. 20, no. 3, pp. 226-239,
55. S. Ben-Yacoub, Y. Abdeljaoued and E. Mayoraz, "Fusion of face and speech
data for person identity verification", IEEE Trans. on NN, 1999, vol. 10,
no. 5, Sept.
56. L. Hong and A. Jain, “Integrating faces and fingerprints for personal identi-
fication”, IEEE Trans. on PAMI, 1998, vol. 20, no. 12, pp. 1295-1307.
THE USE OF IMAGE ANALYSIS IN THE EARLY DIAGNOSIS OF ORAL CANCER

R. SERPICO, M. PETRUZZI AND M. DE BENEDITTIS
Department of Odontostomatology and Surgery
University of Bari
P.zza G. Cesare 11 - Bari - ITALY
E-mail: r.serpico@doc.uniba.it

Oral squamous cell carcinoma (OSCC) is a malignant neoplasm with a
poor prognosis. Regardless of the site where the disease arises, there are several
cases in which OSCC is not detected early by clinicians, and diagnostic
delay worsens the prognosis. In the literature, several image analysis tools
with variable specificity and sensitivity have been proposed in order to
detect OSCC. Analysis of lesional autofluorescence has proved
effective, although different methods have been used to evoke the fluorescence. On the
other hand, vital staining, such as toluidine blue, requires only a clinical
assessment of the degree of staining to detect the lesions. No studies have yet been performed
using computerized analysis of OSCC images or neural networks. A
screening tool for early OSCC detection should be inexpensive, easy to use
and reliable. We hope that developments in information technology applied to the
analysis of OSCC lesions will make earlier diagnosis possible and thereby improve the prognosis.

1. Definition and epidemiology of oral carcinoma

Recently, it has been estimated that oral squamous cell carcinoma (OSCC)
represents 3% of all malignant neoplasms. OSCC usually affects more men
than women, and is considered the 6th most frequent malignant tumour in males and
the 12th in females. In the U.S.A. about 21,000 new cases of OSCC are diagnosed
every year and 6,000 people die because of this disease.
In the last decade the incidence of OSCC has continued to grow, causing a dramatic
increase in the number of individuals under 30 affected by oral carcinoma.
A serious issue concerns the prognosis of these patients. If the neoplasm is detected
at its 1st or 2nd stage, the probability of surviving for five years is 76%.
This value drops to 41% if the malignant tumour is diagnosed at its
3rd stage.
Only 9% of patients are still alive five years after an OSCC diagnosed
at its 4th stage.
The diagnostic delay has several causes:
- the way the carcinoma develops: during its early manifestation OSCC does not
produce any particular symptom or pain, so the patient tends to
ignore the lesion and rarely goes to the dentist to ask for a precise
diagnosis;
- the polymorphism that oral lesions often show: for example, an ulcer
can appear similar to a traumatic lesion, a major aphtha or a carcinoma;
- the doctors in charge, who are not used to examining the oral cavity
during routine check-ups: recent research has shown that a
person suffering from mucous lesions
in the oral cavity goes first to his family doctor, who advises a
dermatological visit.
Usually, a carcinoma is detected about 80 days after its first symptoms, and this
delay is partly responsible for the poor OSCC prognosis.

2. Fluorescence methodologies

Optical spectroscopy of tissue autofluorescence is a sensitive, non-invasive
methodology, easy to use and capable of detecting possible alterations
of the tissue.
Autofluorescence results from the presence of porphyrins connected with
neoplasm growth. The fluorescence given out by sound tissues reveals a colour
different from that observed on tissues affected by carcinoma. This
autofluorescence can also be stimulated by irradiation with a laser, xenon
light or halogen lamps.

Fig. 1. Fluorescence of OSCC localized at the border of the tongue. (Oral
Oncology 39 (2003) 150-156.)

Recently, a particular program which permits the reading of digitalized
images of fluorescent lesions has been presented. This system uses the following
operating algorithm (a sketch of such a pipeline is given after the list):

1. RGB FLUORESCENCE IMAGE
2. CONTRAST ENHANCEMENT
3. HUE EXTRACTION
4. HISTOGRAM THRESHOLDING
5. SEGMENTATION
6. QUANTITATIVE PARAMETERS EXTRACTION
7. DIAGNOSTIC ALGORITHM
8. COMPARE WITH GOLD STANDARD
9. TISSUE CLASSIFICATION.
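
The paper does not provide the implementation of these steps; the following Python sketch, which is our own assumption rather than the system described above, illustrates how steps 2-5 (contrast enhancement, hue extraction, histogram thresholding and segmentation) could be chained with NumPy and SciPy on an RGB fluorescence image. The percentile threshold and the helper names are hypothetical.

```python
# Hypothetical sketch of steps 2-5 of the operating algorithm above,
# not the implementation described in the cited work.
import numpy as np
from scipy import ndimage

def hue_channel(rgb):
    """Extract the hue channel (0..1) from an RGB image with values in 0..1."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    mx, mn = rgb.max(axis=-1), rgb.min(axis=-1)
    delta = np.where(mx - mn == 0, 1e-12, mx - mn)
    h = np.zeros_like(mx)
    h = np.where(mx == r, (g - b) / delta % 6, h)
    h = np.where(mx == g, (b - r) / delta + 2, h)
    h = np.where(mx == b, (r - g) / delta + 4, h)
    return h / 6.0

def segment_fluorescence(rgb):
    # 2. contrast enhancement (simple global stretch as a placeholder)
    stretched = (rgb - rgb.min()) / (rgb.max() - rgb.min() + 1e-12)
    # 3. hue extraction
    hue = hue_channel(stretched)
    # 4. histogram thresholding (a fixed percentile stands in for a real method)
    mask = hue > np.percentile(hue, 90)
    # 5. segmentation into connected lesion candidates
    labels, n = ndimage.label(mask)
    return labels, n
```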

These methodologies show a high sensitivity (about 95%) but a specificity of
51-60%. The scientific literature reports some studies on the use of neural
networks to judge the autofluorescence of dubious lesions. Using these neural
networks, it is possible to distinguish healthy tissue from neoplastic tissue
with a sensitivity of 86% and a specificity of 100%.
In reality, these methodologies have proved ineffective because they are not
able to identify the various mucous areas with their different dysplasia
levels.

Fig. 2. Example of mean neural network input curves grouped according to the
clinical diagnosis. (Oral Oncology 36 (2000) 286-293)

Onizawa and his collaborators tested the use of fluorescence methodologies on
55 patients suffering from OSCC. In their study, 90% of the cases analysed were
positive to fluorescence, and they found that the sensitivity and specificity
of the methodology increase with the staging of the lesion.

3. Toluidine blue

Toluidine blue is a metachromatic vital stain.
Years ago it was employed by gynaecologists, but today it is also considered a
good methodology for diagnosing OSCC.

Because the dye has a strong affinity for acidic components, it binds directly
to the genetic material (DNA, RNA) of cells that keep reproducing. DNA and RNA
synthesis is therefore highlighted in the proliferating neoplastic clones where
the neoplasm grows.
This methodology is easy, inexpensive and does not cause any physical
discomfort. The patient must only rinse his oral cavity with acetic acid (1%)
in order to remove cellular residues and any debris covering the lesion.
Subsequently, toluidine blue (1%) is applied to the lesion for 30 seconds.

Fig. 3. Example of neoplastic lesion stained by using toluidine blue. Areas with
more active mitosis stain more with toluidine blue.

The patient then rinses the lesion again with acetic acid to remove the excess,
unfixed dye. At this point the clinician can assess the lesion according to the
colour, even though the OSCC diagnosis still depends largely on the histology
report.
The stained lesion can thus be classified as:

a) TRUE POSITIVE: the lesion has absorbed the dye and is an OSCC from a
histological point of view;
b) FALSE POSITIVE: the lesion has absorbed the dye but is not an OSCC from a
histological point of view;
c) TRUE NEGATIVE: the lesion does not absorb the dye and is not an OSCC from a
histological point of view;
d) FALSE NEGATIVE: the lesion does not absorb the dye but is an OSCC from a
histological point of view.
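
These four categories determine the sensitivity and specificity figures quoted throughout this paper; the following minimal sketch (ours, with purely illustrative counts) shows how they are computed.

```python
def sensitivity_specificity(tp, fp, tn, fn):
    """Screening metrics from the four categories defined above."""
    sensitivity = tp / (tp + fn)  # fraction of true OSCC lesions that stained
    specificity = tn / (tn + fp)  # fraction of non-OSCC lesions that did not stain
    return sensitivity, specificity

# Example with made-up counts (illustrative only):
print(sensitivity_specificity(tp=38, fp=18, tn=40, fn=4))
```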

Fig. 4. Example of traumatic lesion: even though stained by toluidine blue, the
lesion is not a carcinoma (false positive).

In reality, this methodology is sensitive but not particularly specific: the
number of stained lesions that are not cancerous is large.
The scientific literature reports several studies on the reliability of this
methodology. The case series provide encouraging data about the diagnostic
power of toluidine blue, but no study has yet considered a digital reading of
the lesion. Digital methodologies could make this test more reliable, for
example by exploiting the different gradations of blue that are invisible to
the naked eye. The digital reading of lesions stained with toluidine blue aims
to offer dentists another diagnostic tool. It is inexpensive, easy to use and
non-invasive, so it could be routinely used as a screening test for patients
who regularly visit the dentist. Moreover, this methodology makes it possible
to send the digital images on-line to specialized centres for further
consultation.
At present, there is no screening methodology with a sensitivity and
specificity of 100%.
However, the use of data processing systems improves the reliability of
diagnostic methodologies and offers an objective analysis.

4. Conclusions

The scientific literature does not report trials comparing the efficacy of the
different image analysis methodologies used in OSCC diagnosis.
We hope that a univocal, reliable and inexpensive methodology for reading the
lesion will come into use. Information technology should support clinical
diagnosis and could be the ideal way to obtain an early diagnosis. This would
improve the prognosis and make the relationship between medicine and computer
science an extraordinary one.

Acknowledgments

The authors are grateful to Annalisa Chiala for reviewing this paper.

References

1. Benjamin S, Aguirre A and Drinnan A, Dent. Today 21(11):116 (2002).
2. Llewellyn CD, Johnson NW and Warnakulasuriya KA, Oral. Oncol. 37(5):401 (2001).
3. Neville, Damm, Allen, Bouquot: Oral & Maxillofacial Pathology. Saunders Press, 2nd Edition, USA (2002).
4. Onizawa K, Okamura N, Saginoya H and Yoshida H, Oral. Oncol. 39(2):150 (2003).
5. Onofre MA, Sposto MR and Navarro CM, Oral. Surg. Oral. Med. Oral. Pathol. Oral. Radiol. Endod. 91(5):535 (2001).
6. Porter SR and Scully C, Br. Dent. J. 185(2):72 (1998).
7. Reichart PA, Clin. Oral. Investig. 5(4):207 (2001).
8. van Staveren HJ, van Veen RL, Speelman OC, Witjes MJ, Star WM and Roodenburg JL, Oral. Oncol. 36(3):286 (2000).
9. Zheng W, Soo KC, Sivanandan R and Olivo M, Int. J. Oncol. 21(4):763 (2002).

LUNG EDGE DETECTION IN POSTERO-ANTERIOR CHEST RADIOGRAPHS

PAOLA CAMPADELLI
Dipartimento di Scienze dell'Informazione,
Università degli Studi di Milano,
Via Comelico, 39/41
20135, Milano, Italy
E-mail: campadelli@dsi.unimi.it

ELENA CASIRAGHI
Dipartimento di Scienze dell'Informazione,
Università degli Studi di Milano,
Via Comelico, 39/41
20135, Milano, Italy
E-mail: casiraghi@dsi.unimi.it

The use of image processing techniques and Computer Aided Diagnosis (CAD)
systems has proved to be effective for the improvement of radiologists' diagnoses,
especially in the case of lung nodule detection. The first step for the development
of such systems is the automatic segmentation of the chest radiograph in order to
extract the area of the lungs. In this paper we describe our segmentation method,
whose result is a closed contour which strictly encloses the lung area.

1. Introduction
In the field of medical diagnosis a wide variety of imaging techniques is
currently available, such as radiography, computed tomography (CT) and
magnetic resonance imaging (MRI). Although the last two are more precise
and more sensitive techniques, chest radiography is still by far the most
common procedure for the initial detection and diagnosis of lung cancer,
due to its noninvasive character, radiation dose and economic considerations.
Studies by [20] and [11] explain why the chest radiograph is one of the most
challenging radiographs to produce technically and to interpret diagnostically.
When radiologists rate the severity of abnormal findings, large interobserver
and intraobserver differences occur. Moreover, several studies in the last two
decades, as for example [8] and [2], calculated an average miss rate of 30% for
the radiographic detection of early lung nodules
by humans. In a large lung cancer screening program 90% of peripheral
lung cancers have been found to be visible in radiographs produced earlier
than the date of the cancer discovery by the radiologist. This results showed
the potentiality of improved early diagnosis, suggesting the use of computer
programs for radiograph analysis. Moreover the advent of digital thorax
units and digital radiology departments with Picture Archiving Commu-
nication Systems (PACS) makes it possible to use computerized methods
for the analysis of chest radiographs as a routine basis. The use of im-
age processing techniques and Computer Aided Diagnosis (CAD) systems
has proved to be effective for the improvement of radiologists’ detection
accuracy for lung nodules in chest radiographs as reported in [15].
The first step of an automatic system for lung nodule detection, and in
general for any further analysis of chest radiographs, is the segmentation of
the lung field, so that all the algorithms for the identification of lung
nodules will be applied just to the lung area.
The segmentation algorithms proposed in the literature to identify the
lung field can be grouped into: rule based systems ([1], [21], [22], [7], [4],
[14], [5], [3]), pixel classification methods including Neural Networks ([13],
[12], [9], [16]) and Markov random fields ([18] and [19]), active shape models
([6]) and their extensions ([17]).
In this paper we describe an automatic segmentation method which
identifies the lung area in postero-anterior (PA) digital radiographs. Since
the method is conceived as the first step of an automatic lung nodule detection
algorithm, we chose to include in the area of interest also the bottom
of the chest and the region behind the heart, which are usually excluded by
the methods presented in the literature. Besides, we tried to avoid all kinds
of assumptions such as the position and orientation of the thorax: we work
with images where the chest is not always located in the central part of the
image, it can be tilted and it can have structural abnormalities.
The method is made of two steps. First, the lungs are localized using simple
techniques (section 3), then their borders are more accurately defined
and fitted with curves and lines in order to obtain a simple closed contour
(section 4).

2. Materials
Our database currently contains 111 radiographs of patients with no disease
and 13 of patients with lung nodules. They have been acquired in the
Department of Radiology of the Niguarda Hospital in Milan. The images
were digitized with a 0.160 mm pixel size, a maximum matrix size of 2128
by 2584, and 4096 grey levels.
Before processing they have been downsampled to a dimension of 300 by
364 pixels, and filtered with a median filter of 3 pixel size. In the following
sections we will refer to these images as the original images.

3. Coarse lung border detection


3.1. Iterative thresholding
Since both the background of the image and the central part of the lungs
are characterized by the highest grey values, while the tissues between
them are very dark, we use an iterative thresholding technique to obtain a
first classification of the pixels as belonging to lung, body or background
regions.
Before applying the thresholding procedure, we enhance the image contrast
by means of a non linear extreme value sharpening technique:

$$
G_N(x,y) = \begin{cases} \max & \text{if } |\max - G(x,y)| \le |\min - G(x,y)| \\ \min & \text{otherwise} \end{cases} \qquad (1)
$$

where min and max are the minimum and maximum grey values computed on a
window Win(x, y) centered in (x, y). The window size used is 5 pixels.
We choose this operator because it has the effect of increasing the con-
trast where the boundaries between objects are characterized by gradual
changes in the grey levels. In the case of chest radiographs we often find
this situation in the peripheral area of the lung and sometimes on the top
regions and costophrenic angles.
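
A minimal NumPy sketch of this operator, under the assumption of a square 5-pixel window as stated above (and not the authors' IDL implementation), could look as follows:

```python
import numpy as np
from scipy import ndimage

def extreme_value_sharpen(img, size=5):
    """Replace each pixel with the window max or min, whichever is closer (Eq. 1)."""
    local_max = ndimage.maximum_filter(img, size=size)
    local_min = ndimage.minimum_filter(img, size=size)
    closer_to_max = np.abs(local_max - img) <= np.abs(local_min - img)
    return np.where(closer_to_max, local_max, local_min)
```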
We then perform a linear transformation on the enhanced image with
4096 grey levels, to get an image with 256 grey levels, and start the iterative
thresholding at an initial high threshold value of 235. At each step we lower
the threshold by 1 and classify the regions formed by the pixels with grey
value higher than the threshold into background and lung regions. We
consider as background regions those attached to the borders of the image
or those at a distance of 1 pixel from other border regions; the others
are identified as lung. The algorithm stops when two regions classified
differently at the previous step fuse.

To obtain a finer approximation of the lung region we repeat the described
iterative procedure three times; each time the input is the original 8-bit
image where the lung pixels found at the previous iteration are set to 0.
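
The following sketch is a simplified rendering of this descending-threshold loop, under our own assumptions (in particular, the fusion test and the border rule are reduced to their simplest form); it is not the authors' code.

```python
import numpy as np
from scipy import ndimage

def iterative_threshold(img8, start=235):
    """Lower the threshold until a background region and a lung region fuse."""
    prev_bg = np.zeros(img8.shape, bool)
    prev_lung = np.zeros(img8.shape, bool)
    for t in range(start, 0, -1):
        labels, n = ndimage.label(img8 > t)
        border = np.zeros(img8.shape, bool)
        border[0, :] = border[-1, :] = border[:, 0] = border[:, -1] = True
        bg = np.zeros(img8.shape, bool)
        lung = np.zeros(img8.shape, bool)
        for k in range(1, n + 1):
            region = labels == k
            if (region & border).any():
                bg |= region        # touches the image border: background
            else:
                lung |= region      # otherwise: lung candidate
            # stop when one region now covers pixels previously classified
            # as background AND as lung (the two regions have fused)
            if (region & prev_bg).any() and (region & prev_lung).any():
                return prev_lung, t + 1
        prev_bg, prev_lung = bg, lung
    return prev_lung, 0
```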
In [Fig.1] (left) a lung mask image is shown. The background is red
coloured, the body part is black, the lung regions are blue.

3.2. Edge detection


At this stage we look for rough lung borders. To obtain an initial edge image
(see [Fig.1] (center)) we use the simple but efficient Sobel operator, select
the 18% of the pixels with the highest gradient and delete those corresponding
to the background. We then maintain only the connected regions of edge pixels
which intersect the lung region previously identified. To delete, or to
separate from the lung borders, edge pixels belonging to other structures
such as collarbones, neck, or clavicles we use a morphological opening
operator. The regions disconnected either from the lung mask border or from
the selected edges are eliminated if their localisation satisfies one of the
following conditions: they are attached to the borders of the image or to
background regions, their bottommost pixel is located over the topmost
pixel of the lung regions, or they are totally located in the space between the
two lung areas.
If the area covered by the remaining edge pixels is less extended than the
one occupied by the lung mask, we look for new edge pixels in the lung
regions. This is done by considering in the initial edge image a bigger
percentage of pixels with the highest grey value and adding them until either
the edge pixels cover the whole lung area or the percentage reaches a
value of 40%.
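
A compact way to express the Sobel-plus-percentile selection described above (a sketch under our assumptions, not the original IDL code):

```python
import numpy as np
from scipy import ndimage

def strongest_edges(img, keep_fraction=0.18):
    """Sobel gradient magnitude, keeping the given fraction of strongest pixels."""
    gx = ndimage.sobel(img, axis=1)
    gy = ndimage.sobel(img, axis=0)
    magnitude = np.hypot(gx, gy)
    threshold = np.percentile(magnitude, 100 * (1 - keep_fraction))
    return magnitude >= threshold
```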
In [Fig.1] we show an example of the initial edge image (center) and the
extracted lung edge image, E (right).
As can be seen, further processing is necessary since some lung borders
may still be missing (the top or bottom parts, the costophrenic angles, ...),
or wrong edge pixels (belonging to the neck or collarbones) can still be
present. To solve this problem we search for the axis of the thorax. We can
thus delete, if they are present, the edges belonging to the neck or collarbones
and establish whether the thorax has a non vertical position.

Figure 1. Lung mask image, initial edge image and edge image.

3.3. Axis finder


To find the axis of the chest we use a binary image obtained by an OR
operation between the lung edge image, E, and the lung mask image. For
each horizontal line of this new image, we find the pixel in the center of
the segment connecting the leftmost and rightmost pixels and mark it if the
extremes of the segment do not belong to the same lung region. Moreover,
we consider the inclination of the line connecting one central pixel (x0, y0)
to the following one (x1, y1) and discard it if the value (y1 - y0)/(x1 - x0)
is less than 1.5; a lower value means that probably (x1, y1) has been computed
from two outermost pixels that are not symmetric with respect to the real
axis. The Hough transform for lines, and a polynomial fitting method that
minimizes the chi-square error statistic, are used to find two possible axes
of the image. The one that fits the central pixels better is then chosen as the
chest axis.
In [Fig.2] (left) the central points used to find the axis and the corresponding
lateral points are marked in blue and red respectively; on the right the dilated
axis is shown.
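
The paper uses a Hough transform for lines together with a chi-square polynomial fit; as a simpler stand-in, the sketch below collects the row midpoints, applies the 1.5 slope filter, and fits a single straight line to them by least squares. Function and variable names are our own.

```python
import numpy as np

def chest_axis(binary):
    """Fit a near-vertical axis x = a*y + b through the row midpoints of a binary image."""
    ys, xs = [], []
    for y in range(binary.shape[0]):
        cols = np.flatnonzero(binary[y])
        if cols.size >= 2:
            ys.append(y)
            xs.append(0.5 * (cols[0] + cols[-1]))   # midpoint of leftmost/rightmost pixel
    ys, xs = np.array(ys), np.array(xs)
    # discard midpoints whose segment from the last kept point is too far from vertical
    keep = [0]
    for i in range(1, len(ys)):
        dx = xs[i] - xs[keep[-1]]
        dy = ys[i] - ys[keep[-1]]
        if dx == 0 or abs(dy / dx) >= 1.5:
            keep.append(i)
    a, b = np.polyfit(ys[keep], xs[keep], 1)        # least-squares line x = a*y + b
    return a, b
```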

3.4. Edge refinement


The axis found is usually located in the center of the dorsal column. This
fact allows us to delete edges in E that belong to the dorsal column or to
the neck. They are typically small edge regions (with less than 200 pixels),
crossing the axis itself or, more often, located in a region around it. We
defined this region as a stripe whose width is 1/25 of the width of the
original image (see [Fig.2] on the right). We then delete all the regions
with less than 200 pixels that cross this stripe. If some lung edge is wrongly

Figure 2. Axis points and neck stripe.

cancelled it will be recovered in the next steps.


It can happen that the top parts of the lungs are detected by the Sobel
operator but they are not included in the lung edge image E because in the
lung mask they are not labelled as lung regions. The axis can help to verify
this condition, since the apex point of the lung should be located close to
it. Consider the left lung (in the image): let (x_p, y_p) be the coordinates of
the leftmost edge pixel with the lowest y coordinate, and let (x_a, y_a) be the
coordinates of the axis in the same row; if |x_p - x_a| is bigger than 1/4 of
the total image width, we add those pixels that in the initial edge image
are contained in a stripe extending from x_p to x_a, with a height of
y_p/10. The same operation is done for the right lung. We can also verify a
symmetry condition between the two lung top pixels; if more than one pixel
with the lowest y coordinate is found on each side, the central one is taken.
We evaluate the euclidean distance between one top pixel and the symmetric of
the other with respect to the axis; if this distance is greater than 20 we
are allowed to think that there is no symmetry between the lung edges
found, and that the wrong top pixel is the one with the higher vertical
coordinate. We therefore use this top pixel and the symmetric of the other
one as vertices of a rectangular search area in the initial edge image, and
add the edge pixels found to E.
The bottom parts of the lungs are often characterized by very low contrast
and therefore also in this region we look for edge pixels to be added
to E. In this case we use more accurate edge detectors, such as directional
gaussian filters. We limit the processing to a stripe centered around
the bottommost edge pixel and with a height fixed at 1/8 of the vertical
dimension of the original image. We work separately on the left and right
lung sub-images, applying a locally adaptive scaling operator described in
[10], followed by histogram equalisation. On these enhanced data we
search in the left lung for edges oriented at 90° and 45°, and in the right

lung for those oriented at 90° and 135°. We filter the image with a gaussian
filter at scale σ, related to the stripe dimension, take the vertical derivative
and maintain the 5% of the pixels with the highest gradient value. These
edge pixels, which often belong to the lung borders, are added to the edge
image. Since the costophrenic angle can still be missing, we filter the image
at a finer scale σ/2, take the derivative at 135° and 45° (depending on the
side) and maintain the 10% of the edge pixels. A binary image that may
represent the costophrenic angles is obtained combining this information
with the 10% of the pixels with the highest value in the vertical direction.
The regions in this binary image are added to the lung edge
image E if they touch, or are attached to, some edge pixels in it.
At this stage most of the edge pixels belonging to the lung borders should
have been determined; the image can hence be reduced, defining a rectangular
bounding box slightly greater than the lung area defined by the lung
edge image E.

4. Lung area delineation


4.1. Final contour refinement
To obtain more precise and continuous contours we process the reduced
image, but with 4096 grey levels. We enhance it with a locally adaptive
scaling algorithm and apply histogram equalization to the result. On the
grey level enhanced image we identify the pixels that in the lung edge image
E constitute the lung extremes; for each side they are the leftmost and
rightmost pixels in each row and the topmost and bottommost pixels for
each column (they are red coloured in [Fig.3] (left)). These are the seeds
of the following region growing procedure: for each seed with grey value
G(x, y), we select in its 8-neighborhood, and add to E, all the pixels in
the range [G(x, y) - 10, G(x, y) + 10]. If their number is greater than 4,
we select the pixel whose grey value is closest to G(x, y) and iterate the
procedure unless a background pixel is identified, or the selected element is
another seed, or 20 iteration steps have been done. This procedure creates
thick contours that now reach the external border of the lung, often much
better defined especially on the top and bottom; however the lateral lung
contours are often still discontinuous, especially in the right lung (see also
[Fig.3] (center)). We improve their definition calculating the horizontal
derivative of the enhanced image, and keeping 15% of the pixels
with the maximum value for the right lung, and 10% for the left.
We then delete those pixels internal to the lung or background regions; the

regions in this image intersecting edge pixels are added to the lung edge
image (the result of this addition is shown in [Fig.3] (right)).
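
As an illustration of the region growing step just described, the sketch below grows the edge set E from one seed; it is our simplification (it always follows the single neighbour closest in grey value rather than applying the "more than 4" rule), not the authors' code.

```python
# gray: 2D array of grey values; edge, background: boolean arrays of the same
# shape; seeds: a set of (row, col) tuples; seed: one element of seeds.
NEIGH = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def grow_from_seed(gray, edge, background, seeds, seed, tol=10, max_steps=20):
    """Grow the edge set E from one seed, following the neighbour closest in grey value."""
    y, x = seed
    for _ in range(max_steps):
        ref = int(gray[y, x])
        candidates = []
        for dy, dx in NEIGH:                          # 8-neighbourhood of the current pixel
            ny, nx = y + dy, x + dx
            if 0 <= ny < gray.shape[0] and 0 <= nx < gray.shape[1]:
                if abs(int(gray[ny, nx]) - ref) <= tol:
                    edge[ny, nx] = True               # pixels in [G-10, G+10] join E
                    candidates.append((ny, nx))
        if not candidates:
            break
        y, x = min(candidates, key=lambda p: abs(int(gray[p]) - ref))
        if background[y, x] or (y, x) in seeds:       # stop at background or another seed
            break
    return edge
```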

Figure 3. Enhanced image with the seed points, edge image after growing, edge image
after the last regions added.

At this point we can define the closed contour of the area containing the
lungs, fitting the borders found with curves and lines. We describe the
operation for the left lung only, referring to the binary image of its edges as
the left edge image El. We noticed that the shape of the top part of the lung
could be well fitted by a second order polynomial function. To find it we
use the Hough transform for parabolas, applied to the topmost
points of each column in El. The fitted parabola is stopped, on the right
side of its vertex, at the point where it crosses a line parallel to the axis and
passing through the rightmost pixel; on the left side it is stopped where it
crosses the left edge image; if more than one point is found we select the
one with the lowest y coordinate.

To find a closed contour approximating the lateral borders we consider
the set U composed by selecting, for each row in El, the leftmost pixel if it is
located at the left side of the top one. Since we noticed that the orientation
of the left border can change going from the top to the bottom, we
extracted from U three subsets u1, u2, u3 with an equal number of elements
and containing the points located respectively in the upper, central and
bottom part of the image. These subsets are fitted separately with different
functions. We use one parabola to fit the points in u1: this allows us to
recover errors in case the parabola used to fit the top points was too narrow
(in the central image in [Fig.4] an example of this fact is shown). A line
is used to fit the points in u2. The set u3 often contains the lateral
points of both the lateral border of the lung and the lateral border of the

costophrenic angles; we noticed that in some cases the contours of these
borders have different inclinations. We therefore fit the points in the upper
and bottom parts of u3 with two different lines.
We define as boundary in the bottom part the horizontal line that
crosses the bottommost pixel of the edge image.
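
The paper fits these point subsets with Hough transforms; as a simpler illustration of the same split-and-fit idea (our assumption, not the authors' method), ordinary least squares can be used:

```python
import numpy as np

def fit_lateral_border(points):
    """Split the lateral border points into three vertical parts and fit each separately.

    points: array of (y, x) border pixels, ordered from top to bottom.
    Returns coefficients for u1 (parabola), u2 (line) and u3 (two lines).
    """
    u1, u2, u3 = np.array_split(points, 3)
    p1 = np.polyfit(u1[:, 0], u1[:, 1], 2)          # parabola for the upper part
    p2 = np.polyfit(u2[:, 0], u2[:, 1], 1)          # line for the central part
    u3a, u3b = np.array_split(u3, 2)                # u3: two lines, upper and lower
    p3a = np.polyfit(u3a[:, 0], u3a[:, 1], 1)
    p3b = np.polyfit(u3b[:, 0], u3b[:, 1], 1)
    return p1, p2, (p3a, p3b)
```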

5. Results
We detected small errors in 4 of the 124 images in our database, where we
consider as an error the fact that a part of the lung has not been included by
the lung contours defined. The part missed by the algorithm is the border
of the costophrenic angle. The algorithm anyway shows to be robust to
structural abnormalities of the chest ([Fig.4]). The algorithm has been
implemented in IDL, an interpreted language, and, when executed on a
Pentium IV with 256 Mb of RAM, it takes from 12 seconds (for images of
patients with small sized lungs that can be cut as described in section
3.4) to 20 seconds (for images of big sized lungs).

Figure 4. Resulting images.

References
1. S.G. Armato, M. Giger, and H. MacMahon. Automated lung segmentation in
digitized posteroanterior chest radiographs. Academic Radiology, 5:245-255, 1998.
2. J.H.M. Austin, B.M. Romeny, and L.S. Goldsmith. Missed bronchogenic carcinoma:
radiographic findings in 27 patients with a potentially resectable lesion evident
in retrospect. Radiology, 182:115-122, 1992.
3. M.S. Brown, L.S. Wilson, B.D. Doust, R.W. Gill, and C. Sun. Knowledge-based
method for segmentation and analysis of lung boundaries in chest x-ray images.
Computerized Medical Imaging and Graphics, 22:463-477, 1998.
4. F.M. Carrascal, J.M. Carreira, M. Souto, P.G. Tahoces, L. Gomez, and J.J. Vidal.
Automatic calculation of total lung capacity from automatically traced lung
boundaries in postero-anterior and lateral digital chest radiographs. Medical
Physics, 25:1118-1131, 1998.
5. D. Cheng and M. Goldberg. An algorithm for segmenting chest radiographs.
Proc. SPIE, pages 261-268, 1988.
6. T. Cootes, C. Taylor, D. Cooper, and J. Graham. Active shape models - their
training and application. Comput. Vis. Image Understanding, 61:38-59, 1995.
7. J. Duryea and J.M. Boone. A fully automatic algorithm for the segmentation of
lung fields in digital chest radiographic images. Medical Physics, 22:183-191, 1995.
8. J. Forrest and P. Friedman. Radiologic errors in patients with lung cancer.
West Journal on Med., 134:485-490, 1981.
9. A. Hasegawa, S.-C. Lo, M.T. Freedman, and S.K. Mun. Convolution neural network
based detection of lung structure. Proc. SPIE 2167, pages 654-662, 1994.
10. R. Klette and P. Zamperoni. Handbook of image processing operators. Wiley, 1994.
11. H. MacMahon and K. Doi. Digital chest radiography. Clin. Chest Med., 12:19-32, 1991.
12. M.F. McNitt-Gray, H.K. Huang, and J.W. Sayre. Feature selection in the pattern
classification problem of digital chest radiographs segmentation. IEEE Trans. on
Med. Imaging, 14:537-547, 1995.
13. M.F. McNitt-Gray, J.W. Sayre, H.K. Huang, and M. Razavi. A pattern classification
approach to segmentation of chest radiographs. Proc. SPIE 1898, pages 160-170, 1993.
14. E. Pietka. Lung segmentation in digital chest radiographs. Journal of Digital
Imaging, 7:79-84, 1994.
15. T. Kobayashi, X.-W. Xu, H. MacMahon, C. Metz, and K. Doi. Effect of a
computer-aided diagnosis scheme on radiologists' performance in detection of lung
nodules on radiographs. Radiology, 199:843-848, 1996.
16. O. Tsuji, M.T. Freedman, and S.K. Mun. Automated segmentation of anatomic
regions in chest radiographs using an adaptive-sized hybrid neural network.
Med. Phys., 25:998-1007, 1998.
17. B. van Ginneken. Computer-aided diagnosis in chest radiographs. Ph.D. dissertation,
Utrecht Univ., Utrecht, The Netherlands, 2001.
18. N.F. Vittitoe, R. Vargas-Voracek, and C.E. Floyd Jr. Identification of lung regions
in chest radiographs using markov random field modeling. Med. Phys., 25:976-985, 1998.
19. N.F. Vittitoe, R. Vargas-Voracek, and C.E. Floyd Jr. Markov random field modeling
in posteroanterior chest radiograph segmentation. Med. Phys., 26:1670-1677, 1999.
20. C.J. Vyborny. The AAPM/RSNA physics tutorial for residents: Image quality and
the clinical radiographic examination. Radiographics, 17:479-498, 1997.
21. X.-W. Xu and K. Doi. Image feature analysis for computer aided diagnosis:
accurate determination of ribcage boundaries in chest radiographs. Medical Physics,
22:617-626, 1995.
22. X.-W. Xu and K. Doi. Image feature analysis for computer aided diagnosis:
accurate determination of right and left hemidiaphragm edges and delineation of
lung field in chest radiographs. Medical Physics, 23:1616-1624, 1996.

DISCRETE TOMOGRAPHY FROM NOISY PROJECTIONS

C. VALENTI
Dipartimento di Matematica ed Applicazioni
Università degli Studi di Palermo
Via Archirafi 34, 90123 Palermo - Italy
E-mail: cvalenti@math.unipa.it

The new field of research of discrete tomography will be described in this paper.
It differs from standard computerized tomography in the reduced number of
projections. It needs ad hoc algorithms which are usually based on the definition
of a model of the object to reconstruct. The main problems will be introduced and
an experimental simulation will prove the robustness of a slightly modified version
of a well known method for the reconstruction of binary planar convex sets, even
in the case of projections affected by quantization error. To the best of our
knowledge this is the first experimental study of the stability problem with a
statistical approach. Prospective applications include crystallography, quality
control and reverse engineering, while biomedical tests, due to their important
role, still require further research.

1. Introduction
Computerized tomography is an example of inverse problem solving. It
consists of the recovering of a 3D object from its projections [1]. Usually
this object is made of materials with different densities and therefore it is
necessary to take a number of projections ranging between 500 and 1000.
When the object is made of just one homogeneous material, it is possible
to reduce the number of projections to no more than four, defining the so
called discrete tomography [2]. In such a case we define a model of the body,
assuming its shape. For example, we may know the types of atoms
to analyze, the probability of finding holes inside the object and its topology
(e.g. successive slices are similar to each other or some configurations of
pixels are energetically unstable) [3].
Though these assumptions may be useful when considering applications

such as nondestructive reverse engineering, industrial quality control, electron
microscopy, X-ray crystallography, data coding and compression, they
become almost unacceptable when the data to analyze come from biomedical
tests. Nevertheless the constraints imposed by present technology
are too restrictive for real tasks, and the state-of-the-art algorithms
mainly allow the reconstruction of simulated images of special shapes.
The aim of this work is the description of an extensive simulation to verify
the robustness of a modified version of a well known method for the
reconstruction of binary planar convex sets. In particular, we will face the
stability problem under noisy projections due to quantization error. Section 2
introduces formal notations and basic problems. Section 3 gives a
brief description of the algorithm. Section 4 concludes with experimental
results and remarks.

2. Basic notations and issues


Discrete tomography differs from computerized tomography in the small
variety of density distributions of the object to analyze and in the very
few angles of the projections to take. From a mathematical point of view
we reformulate this reconstruction problem in terms of linear feasibility
(Figure 1):
$$
A\underline{x} = \underline{p}, \qquad A \in \{0,1\}^{r \times n},\ \underline{x} \in \{0,1\}^{n},\ \underline{p} \in \mathbb{N}^{r}
$$
where the binary matrix A represents the geometric relation between points
in Z² and the integer valued vector p represents their projections.

Figure 1. A subset of Z² and its corresponding linear equation system. The black disks
and the small dots represent the points of the object and of the discrete lattice,
respectively.

The main issues in discrete tomography arise from this dearth of input
data. In 1957 a polynomial time method to solve the consistency problem
(i.e. the ability to state whether there exists any A compatible with a given
p) was presented [4].
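
For the special case of horizontal and vertical projections, such a consistency test can be expressed compactly through the Gale-Ryser condition; the sketch below is an illustration of the idea, not the original 1957 procedure.

```python
def consistent(row_sums, col_sums):
    """Gale-Ryser test: does a binary matrix with these row/column sums exist?"""
    row_sums, col_sums = list(row_sums), list(col_sums)
    if sum(row_sums) != sum(col_sums):
        return False
    r = sorted(row_sums, reverse=True)
    for k in range(1, len(r) + 1):
        if sum(r[:k]) > sum(min(c, k) for c in col_sums):
            return False
    return True

# Example with small, arbitrary projections:
print(consistent([2, 3, 1], [1, 3, 2]))   # True: such a binary matrix exists
```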

The uniqueness problem derives from the fact that different A's can satisfy
the same p. For example, two A's with the same horizontal and vertical
projections can be transformed one into the other by a finite sequence of
switching operations (Figure 2). Moreover, there is an exponential number
of hv-convex polyominoes (i.e. 4-connected sets with 4-connected rows and
columns) with the same horizontal and vertical projections [5].

Figure 2. Three switches turn these tomographically equivalent objects into one another.
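
A switching operation exchanges a 2x2 configuration of the form [[1,0],[0,1]] with [[0,1],[1,0]] (or vice versa), which leaves every row and column sum unchanged; a minimal illustration, with hypothetical helper names:

```python
import numpy as np

def switch(M, r0, r1, c0, c1):
    """Apply a switching operation on rows r0, r1 and columns c0, c1 (if applicable)."""
    sub = M[np.ix_([r0, r1], [c0, c1])]
    if np.array_equal(sub, np.eye(2)) or np.array_equal(sub, np.eye(2)[::-1]):
        M[np.ix_([r0, r1], [c0, c1])] = sub[::-1]   # swap the two rows of the 2x2 block
    return M

A = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0]])
B = switch(A.copy(), 0, 1, 0, 1)
print(A.sum(axis=1), B.sum(axis=1))   # identical row projections
print(A.sum(axis=0), B.sum(axis=0))   # identical column projections
```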

Lastly, the stability problem concerns how the shape of an object changes
when its projections are perturbed. In computerized tomography the variation
in the final image due to the fluctuation of one projection sample is generally
disregarded, since that sample forms the result independently, as one of many,
and the effect is therefore distributed broadly across the reconstructed image [6].
This is not true in the discrete case, and the first theoretical analysis of the
reconstruction of binary objects of whatever shape has proved that this task is
unstable and that it is very hard to obtain a reasonably good reconstruction
from noisy projections [7]. Here we will describe how our experimental results
show that it is possible to recover convex binary bodies from their perturbed
projections, still maintaining a low reconstruction error.

3. Reconstruction algorithm
In order to verify the correctness of the algorithm we have generated 1900
convex sets with 10 x 10, 15 x 15, ..., 100 x 100 pixels. A further 100 convex
sets with both width and height randomly ranging between 10 and 100 have
been considered too. Their projections have been perturbed 1000 times
by incrementing or decrementing by 1 the value of some of their samples,
randomly chosen. This is to estimate the effect of errors with absolute value
0 ≤ ε ≤ 1, so simulating a quantization error. The number of samples
has been decided in a random way, but if we want the area of the
reconstructed body to remain constant, we add and subtract the same amount of
pixels in all projections.
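
A sketch of this perturbation scheme, under our own reading of the constant-area rule (equal numbers of incremented and decremented samples within each projection), is the following:

```python
import numpy as np

def perturb(projection, n_changes, rng=None):
    """Add 1 to half of the chosen samples and subtract 1 from the other half,
    so that the total (the object area) is preserved."""
    rng = np.random.default_rng() if rng is None else rng
    p = projection.astype(int).copy()
    n_changes -= n_changes % 2                 # need an even number of changes
    idx = rng.choice(len(p), size=n_changes, replace=False)
    p[idx[:n_changes // 2]] += 1
    p[idx[n_changes // 2:]] -= 1               # note: a 0-valued sample may go negative;
    return p                                   # such draws would be rejected in practice
```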

The well known algorithm we rely on reconstructs hv-convex polyominoes
in polynomial time, starting from a set of pixels, called the spine, that surely
belong to the object to be reconstructed. This method makes a rough
assumption on the shape of the object and then adds pixels to this core
through an iterative procedure based on partial sums of the projection
values. Usually the spine covers just a small part of the object and therefore
it is necessary to expand it by applying the filling operations (Figure 3).
The underlying idea is the recursive enforcement of the convexity constraint
on each line and along each direction until the core of pixels satisfies the
projections (Figure 4). Should this not happen, then no convex polyomino is
compatible with those projections.

Figure 3. The first two filling operations are not based on the projection value. The
circles represent pixels not yet assigned to the core.

We have generalized this algorithm by weakening the convexity constraint.
This means that as soon as it is not possible to apply a certain
filling operation, due to an inconsistency between the value of the projection
and the number of pixels already in the considered line of the core,
we skip that line and process the rest of the projection, thus reaching a
solution that we call non-convex. It may happen that the ambiguity is
reduced when processing the core along other directions. Besides the horizontal
and vertical directions, we have also considered the following ones,
d = ((1,0), (0,-1), (1,-2), (2,1), (-1,-1), (1,1)), in a number of projections
chosen between 2 and 4, according to the sets {{d1,d2}, {d3,d4},
{d5,d6}, {d1,d2,d5}, {d1,d2,d3,d4}, {d1,d2,d5,d6}, {d3,d4,d5,d6}}. The
particular directions we used are indicated in the upper right corner of each
of the following figures.
Since we are dealing with corrupt projections, most of the ambiguous
zones are not due to complete switching components. Only in the case of
complete switches do we link the processing of the remaining not yet assigned
pixels to the evaluation of a corresponding boolean 2-CNF formula (i.e.
the AND of zero or more clauses, each of which is the OR of exactly
two literals) [10].

Figure 4. Convex recovery through {d1,d2,d5}. The spine is shown in the first two
steps, the filling operations in the remaining ones. The grey pixels are not yet assigned.

This complete search has exponential time complexity, but
it has been proved that these formulas are very small and occur rarely,
especially for big images [11].
In order to measure the difference between the input image taken from
the database and the obtained one, we have used the Hamming distance (i.e.
we have counted the different homologous pixels), normalized according to
the size of the image. Most of the time we have obtained non-convex solutions
for which the boolean evaluation involves a bigger average error. For
this reason, we have preferred not to apply the evaluation to the ambiguous
zones when they were not due to switching components. We want to
emphasize that these pixels take part in the error computation only when
compared with those of the object. That is, we treat these uncertain pixels,
if any, as belonging to the background of the image.
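
The error measure just described reduces to a one-line computation; a sketch:

```python
import numpy as np

def reconstruction_error(original, reconstructed):
    """Normalized Hamming distance: fraction of homologous pixels that differ."""
    return np.count_nonzero(original != reconstructed) / original.size
```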

Figure 5. Non-convex recovery (upper right) from a binarized real bone marrow
scintigraphy (left) with 1 pixel added/subtracted along {d3, d4} and without spine. The
final reconstructed image (lower right) is obtained by deleting all remaining grey pixels.
The input image is utilized and reproduced with permission from the MIR Nuclear
Medicine digital teaching file collection at Washington University School of Medicine.
MIR and Washington University are not otherwise involved in this research project.

4. Experimental results
This final section summarizes the most important results we obtained, giv-
ing also a brief explanation.
The average error rate increases when the number of modified samples

increases. Obviously, the more we change the projections, the harder it is for
the algorithm to reconstruct the object (Figure 6a).
Many non-convex sets suffer from a number of wrong pixels lower than
the average error. Although the algorithm could not exactly reconstruct the
convex set, the forced non-convex solutions still keep the shape of the original
object. For example, about 66.11% of the non-convex solutions,
marked in grey, with fixed 100 x 100 size and 1 pixel added/subtracted along
directions {d3, d4, d5, d6}, have an error smaller than the 0.34% average error
(Figure 6b).
In the case of convex solutions, the spine construction reduces the
number of ambiguous cells for the successive filling phase. In the
case of non-convex solutions, the spine usually assumes an initial object
shape that produces solutions very different from the input polyomino. An
example of a non-convex set obtained without spine preprocessing is shown
in Figure 5.
The choice of the horizontal and vertical directions {d1, d2} is not always
the best one. For example, {d3, d4} and {d5, d6} recover more non-convex
solutions with a smaller error. This is due to the higher density
of the scan lines, which corresponds to a better resolution. More than two
directions improve the correctness of the solutions, thanks to the reduced
degree of freedom of the undetermined cells. The following tables concisely
report all these results, obtained for objects with 100 x 100 pixels, with or
without the spine construction, along different directions and by varying
the number of perturbed samples.
To the best of our knowledge this is the first experimental study of the
stability problem with a statistical approach. Our results give a quantitative
estimate of both the probability of finding solutions and of introducing
errors at a given rate. We believe that a more realistic instrumental noise
should be introduced, considering also that the probability of finding an
error with magnitude greater than 1 usually grows in correspondence of
the samples with maximum values. Moreover, though the convexity constraint
is interesting from a mathematical point of view, at present we are
also dealing with other models of objects to reconstruct, suitable for real
microscopy or crystallography tools.

Acknowledgements
The author wishes to thank Professor Jerold Wallis [12] for his kind
contribution in providing the input image of Figure 5.

Figure 6. a: Average (*), minimum and maximum error versus the number of
modified samples, for non-convex solutions with fixed 100 x 100 size, directions {d1,d2}
and spine preprocessing. Linear least-square fits are superimposed. b: Number of
non-convex solutions versus error, for fixed 100 x 100 size and 1 pixel added/subtracted
along directions {d3,d4,d5,d6} without spine. The dashed line indicates the average error.

Table 1. +1/-1 samples (constant area).

Directions        Spine   Average error   Number of solutions
                  no      0.34%           66.11%
                  no      0.35%           68.06%
                  no      0.54%           64.71%
                  no      0.64%           71.01%
                  no      0.71%           77.40%
                  no      0.79%           72.91%
                  no      1.57%           73.29%
                  yes     4.81%           38.53%
                  yes     4.83%           38.03%
{d1,d2,d5,d6}     yes     5.03%           39.11%
{d1,d2}           yes     5.44%           37.47%

Table 2. Random samples (non constant area).

Directions        Spine   Average error   Number of solutions
                  no      5.43%           67.48%
                  no      5.66%           69.85%
                  no      5.71%           69.51%
                  no      5.86%           58.34%
                  no      6.24%           62.93%
                  no      8.53%           58.75%
                  no      9.84%           75.11%
                  yes     10.67%          28.42%
                  yes     10.78%          29.92%
                  yes     10.87%          28.67%
                  yes     11.94%          28.32%

References
1. KAK A.C. AND SLANEY M., Principles of Computerized Tomographic Imaging.
IEEE Press, New York, 1988.
2. SHEPP L., DIMACS Mini-Symposium on Discrete Tomography. Rutgers University,
September 19, 1994.
3. SCHWANDER P., Application of Discrete Tomography to Electron Microscopy of
Crystals. Discrete Tomography Workshop, Szeged, Hungary, 1997.
4. RYSER H.J., Combinatorial properties of matrices of zeros and ones. Canad. J.
Math., 9:371-377, 1957.
5. DAURAT A., Convexity in Digital Plane (in French). PhD thesis, Université
Paris 7 - Denis Diderot, UFR d'Informatique, 1999.
6. SVALBE I. AND VAN DER SPEK D., Reconstruction of tomographic images using
analog projections and the digital Radon transform. Linear Algebra and its
Applications, 339:125-145, 2001.
7. ALPERS A., GRITZMANN P., AND THORENS L., Stability and Instability in
Discrete Tomography. Digital and Image Geometry, LNCS, 2243:175-186, 2001.
8. BRUNETTI S., DEL LUNGO A., DEL RISTORO F., KUBA A., AND NIVAT M.,
Reconstruction of 8- and 4-connected convex discrete sets from row and column
projections. Linear Algebra and its Applications, 339:37-57, 2001.
9. KUBA A., Reconstruction in different classes of 2D discrete sets. Lecture Notes
in Computer Science, 1568:153-163, 1999.
10. BARCUCCI E., DEL LUNGO A., NIVAT M., AND PINZANI R., Reconstructing
convex polyominoes from horizontal and vertical projections. Theoretical Computer
Science, 155:321-347, 1996.
11. BALOGH E., KUBA A., DÉVÉNYI C., AND DEL LUNGO A., Comparison of
algorithms for reconstructing hv-convex discrete sets. Linear Algebra and its
Applications, 339:23-35, 2001.
12. Mallinckrodt Institute of Radiology, Washington University School of Medicine,
http://gamma.wustl.edu/home.html.

AN INTEGRATED APPROACH TO 3D FACIAL RECONSTRUCTION


FROM ANCIENT SKULL

A. F. ABATE, M. NAPPI, S. RICCIARDI, G. TORTORA

Dipartimento di Matematica e Informatica,
Università di Salerno
84081 Baronissi, Italy
E-mail: mnappi@unisa.it

Powerful techniques for modelling and rendering tridimensional organic
shapes, like the human body, are today available for applications in many
fields such as special effects, ergonomic simulation or medical
visualization, just to name a few. These techniques are proving to be very
useful also to archaeologists and anthropologists committed to
reconstructing the aspect of the inhabitants of historically relevant sites like
Pompei. This paper shows how, starting from the radiological analysis of an
ancient skull and a database of modern individuals of the same
area/gender/age, it is possible to produce a tridimensional facial model
compatible with the anthropological and craniometrical features of the
original skull.

1. Introduction

In recent years computer generated imaging (CGI) has often been used for
forensic reconstruction [19], as an aid for the identification of cadavers, as well
as for medical visualization [3,16], for example in the planning of maxillo-facial
surgery [14]. In fact, the 3D modelling, rendering and animation environments
available today have greatly increased their power to quickly and effectively
produce realistic images of humans [8]. Nevertheless the typical approach
usually adopted for modelling a face is often still too artistic and it mainly
relies on the anatomic and physiognomic knowledge of the modeller. In other
terms, computer technology is simply replacing the old process of creating an
identikit by hand drawn sketches or by sculpting clay, adding superior editing
and simulative capabilities, but often with the same limits in terms of reliability
of the results.
The recent finding of five skulls [see Figure 1] and several bones (from a
group of sixteen individuals) in Murecine (near Pompei) offers the opportunity
to use CGI, and craniographic methods [5], to reconstruct the aspect of the
victims of this tremendous event.
This paper starts from the assumption that, unfortunately, what is lost in the
findings of ancient human remains is lost forever. This means that in no way is
it possible to exactly reproduce a face simply from its skull, because there are
many ways in which soft tissues may cover the same skull, leading to different
final aspects.

The problem is even more complicated in the (frequent) case of partial findings,
because the missing elements (mandible or teeth for example) cannot be
derived from the remaining bones [7].

Figure 1. One of the skulls found in the archaeological site of Murecine, near Pompei.

Nevertheless it is true that the underlying skeleton directly affects the overall
aspect of an individual, and many fundamental physiognomic characteristics are
strongly affected by the skull. One of the main purposes of this study is
therefore to correlate ancient skulls to skulls of living individuals, trying, in
this way, to replace lost information (for example missing bones and soft tissues)
with new compatible data. Additionally, the physiognomically relevant elements
that are too aleatory to be derived from a single compatible living individual
are selected through a search in a facial database (built from classical
art reproductions of typical Pompeians) and then integrated in the previous
reconstruction.

This paper is organized as follows. In Section 2 related works are presented.


In Section 3 the proposed reconstruction approach is presented in detail. In
Section 4 the results of the proposed method are presented and discussed. The
paper concludes showing directions for future research in Section 5.

2. Related Works

Facial reconstruction from the skull has a long history, which begins around the
end of the nineteenth century. The reconstructive methodologies developed over
more than a century [20] basically derive from two main approaches:

- the study of human facial anatomy and of the relationships between soft
tissues (skin, fat, muscles) and hard tissues (cranial bones);
- the collection of statistical facial data about individuals belonging to
different races, sexes and ages;

and they can be summarized as follows:

- 2D artistic drawing [6], in which the contours fitting a set of markers
positioned on the skull act as a reference for the hand drawing phase,
which involves the anatomic knowledge of the artist;
- photo or video overlay of facial images on a skull image [10], aimed at
comparing a face to a skull to highlight matching features;
- 3D reconstruction, either with manual clay sculpting or with digital modelling.
In the manual approach the artist starts from a clay copy of the skull,
applies the usual depth markers (typically referred to as landmarks) and
then begins to model in clay a face fitting the landmarks. In digital
modelling the first step is to produce a 3D reconstruction of the skull
[15], typically starting from CT data [17]; then a facial surface model is
created from 3D primitives using the landmarks as a reference for the
contouring curves. It is also possible to generate a solid reconstruction
of the modelled face by stereolithographic techniques [9,11];
- warping of a 3D digital facial model [18, 21], which tries to deform
(warp) a standard "reference" facial model to fit the landmarks
previously assigned on the digital model of the skull.

Many of the methods mentioned above rely on a large survey of facial soft
tissue depth, measured in a set of anatomically relevant points. First developed
on cadavers, this measurement protocol has been improved [4] with data from
other races, various body builds, and even from living individuals by means of
radiological and ultrasound diagnostic techniques.

3. The proposed method

The whole reconstructive process is detailed below in sections 3.1 to 3.11. Two
reference databases are used: the Craniometrical Database (CD) and the
Pictorial Physiognomic Database (PPD). In sections 3.5 and 3.10 these
databases are discussed in detail.

3.1. The skull


We start by selecting one dry skull among the five found in Murecine. This
skull belonged to a young male, and it was found without the mandible and
with many teeth missing, but its overall state of conservation is good.
Unfortunately the absence of the mandible makes the reconstruction of the lower
portion of the face more complicated and less reliable, because in this case there
is no original bone tissue to guide the process.
The skull is photographed and then scanned via CT on the axial plane with a
step of 1 millimetre and a slice thickness of 2 millimetres, so every slice
overlaps by 1 millimetre with the following one. This high resolution scanning
produces a set of images (about 250), as well as a 3D reconstruction of the skull.
Additionally, three radiological images of the skull are taken on three orthogonal
planes, corresponding to front, side and bottom views. The 3D mesh output by the
CT will be used as a reference to visually verify the compatibility of the
reconstructed soft tissues with the dry skull.

3.2. The set of landmarks


The next step is to define on each radiological image a corresponding set of
anatomically and physiognomically relevant points, named landmarks, each one
with a unique name and number in each view [see Figure 2].

Figure 2. Landmarks located on front and side view of skull and craniometrical tracing.

Because the landmarks are chosen according to their craniometrical relevance,
they may not correspond to the points for soft tissue thickness
measurement indicated by Moore [4]. In this study we use a set of 19
landmarks, but this number could be extended if necessary. Alternatively it is
possible to assign the landmarks directly on the 3D skull mesh produced by the
CT; in this case the following step (3.3) is not necessary because the landmarks
already have tridimensional coordinates.
A complete list of the landmarks used is shown in Table 1.

Table 1. List of landmarks.

Landmark #   Location (front view)   Landmark #   Location (side view)

3.3. Adding a third dimension to the set of landmarks


Now we have the same set of points assigned to each of the three views
corresponding to the planes XY, XZ and YZ. So it is easy to assign to each
landmark Li its tridimensional coordinates (Lxi, Lyi, Lzi), simply by measuring
them on the appropriate plane with respect to a common axis origin. We can
easily visualize the landmark set in the tridimensional space of our modelling
environment and make any kind of linear or angular measurement between two
or more landmarks.
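
A minimal sketch of this coordinate assembly, with hypothetical view conventions (the front view on the XY plane providing x and y, the side view on the XZ plane providing z, and the YZ view kept as a consistency check), could be:

```python
def landmarks_3d(xy_view, xz_view, yz_view):
    """Merge 2D landmark measurements from the three orthogonal views into 3D points.

    xy_view[i] = (x, y), xz_view[i] = (x, z), yz_view[i] = (y, z), all measured
    with respect to a common axis origin; i is the landmark name or number.
    """
    points = {}
    for i in xy_view:
        x, y = xy_view[i]
        _, z = xz_view[i]               # z taken from the XZ (side) view
        # yz_view[i] could be compared with (y, z) as a consistency check
        points[i] = (x, y, z)
    return points
```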

3.4. Extraction of craniometrical features

Starting from the landmarks previously assigned we define the n-tuple of
features (F1*, F2*, ..., Fn*) which are peculiar to this skull and result from
the craniometrical tracing of the skull [see Figure 2]. These features are
consistent with the features present in the CD; they include angles and lengths
measured on the front or side view and are listed in Table 2.

Table 2. List of features (front and side view).

Because each feature has a different relevance from a physiognomic and
craniometrical point of view, a different weight is assigned to each of them.
The resulting n-tuple (w1, w2, ..., wn), with 0 ≤ wi ≤ 1 and 1 ≤ i ≤ n,
contains the weights relative to (F1*, F2*, ..., Fn*). These weights are not
meant to depend on a particular set of features, and if Fi* = 0 then wi = 0.

3.5. Searching for similarities in the CD

The CD is built on data collected from a radiological survey [see Figure 3]
conducted on thousands of subjects of different ages and sexes, all coming
from the same geographical area in which the remains were found: Pompei and
its surroundings.

Figure 3. Samples of records used to build the CD.

Each individual represents a record in the database, and each craniometrical
feature, extracted with the same procedure shown before, is stored in a numeric
field, as are the 3D coordinates. Additionally, we stored three photographic
facial images of each subject, shot from the same position and during the same

session as the radiological images. This precise alignment of photo camera and
radio-diagnostic device is necessary to allow a spatial correlation between the
two different kinds of images.
If digital CT equipment or even a 3D scanner/digitizer were available,
an optional field could point to a facial 3D model of each subject, thus
avoiding the need for steps 3.6 and 3.7.

Once the database is built, it is possible to search through it to find the record
(the modern Pompeian individual) whose craniometrical features are most
similar to those of the unknown subject given in input. This task is accomplished
by evaluating for each record i the Craniometrical Similarity Score (CSS),
in which Fij is the j-th component of the n-tuple of features
(Fi1, Fi2, ..., Fin) relative to record i, wj represents its weight, and Dj is
the j-th component of an array (D1, D2, ..., Dn) containing the maximum allowed
difference between Fij and Fj* for each j. If any feature is not present in the
input skull, due to missing elements for example, then the corresponding term in
the CSS formula becomes zero. So 0 ≤ CSS ≤ 1, and a CSS of 1 means that a
perfect match (an almost impossible case) has been found. Ideally the CSS should
be not less than 80% for the face to be used as a valid reference for the
reconstruction.
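
The exact CSS expression is not reproduced here; the sketch below assumes a plausible weighted form consistent with the description above (each feature contributes its weight scaled by how close Fij is to Fj*, relative to the maximum allowed difference Dj, with absent features contributing zero). The names and the precise expression are our assumption, not the authors' definition.

```python
def craniometrical_similarity(F_record, F_skull, weights, D):
    """Hypothetical CSS: weighted, normalized closeness of a record's features
    to the input skull's features; 0 <= CSS <= 1, 1 = perfect match."""
    num, den = 0.0, 0.0
    for F_ij, F_star, w, d in zip(F_record, F_skull, weights, D):
        den += w
        if F_star == 0 or w == 0 or d == 0:   # feature missing in the input skull
            continue
        diff = abs(F_ij - F_star)
        if diff <= d:                         # inside the maximum allowed difference
            num += w * (1 - diff / d)
    return num / den if den else 0.0
```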

3.6. Augmenting the set of landmarks


The aim of the craniometrical database search is to augment the set of landmarks
with new landmarks relative to soft tissues, coming from the individual with the
highest CSS. In fact the radiological and photographic images of a living individual
contain useful information about the local thickness and shape of soft tissues,
which can replace data missing in the dry skull. To retrieve these data we first
normalize the photographic images to match the radiological images, and then we
blend each pair of images to highlight the facial contours on the underlying
skull, thus revealing the soft tissue thickness at many relevant points of the head
for each plane.

3.7. Modelling the facial surface


The augmented set of landmarks and the set of photographic images can be
used to guide the 3D modelling of the "best match" face. The simplest
modelling technique is to visualize the landmarks as 3D points inside the
modelling environment, mapping the three photo images on three orthogonal
planes so that in each view all the landmarks are properly positioned. Using
these visual references we can draw a sequence of cross sections whose
interpolation results in a surface model of the head. B-patches as well as NURBS
can be used for this purpose. An interesting alternative to this manual modelling
technique is the possibility of generating the model from a set of stereoscopic
images of the head as in [13]. In this case for each record the CD should also
contain three pairs of images acquired from slightly different angles. Whatever
the technique adopted, the final result is the 3D model [see Figure 4] of the head
with the maximum CSS.

Figure 4. Rough face model.

3.8. Warping the rough face model to fit the original set of landmarks

If the CSS of the reconstructed head is not equal to 1 (and this will probably
always be true), then we would like to modify the shape of this model to better
fit the craniometrical features of the found skull.
This kind of tridimensional deformation of a mesh, based on vertex relocation
by a specific transformation of coordinates, is usually referred to as "warping".
More precisely, we want to move every bone landmark Li of the "best match"
case for which |Li - Li*| ≠ 0 (where Li* is the corresponding landmark
on the dry skull) to a new position that corresponds to the coordinates of Li*.
The purpose is to affect the polygonal surface local to Li, using the landmark as
a handle to guide the transformation. Many different algorithms are available to
accomplish this task, but we chose a free form deformation, which simply
works by assigning to the input mesh a lattice with n control vertices (our
landmarks Li) and, by moving them (to Li*), smoothly deforms the
surrounding surface.
After the warping is applied, the face model fits the dry skull better, and this
match can be easily verified by visualizing at the same time the skull mesh (from
the CT 3D reconstruction) and the face model with partial transparency.
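
The authors use a lattice-based free form deformation inside their modelling tool; as a simplified stand-in (our own sketch, not the authors' implementation), the same idea of letting the landmark displacements Li -> Li* drive a smooth deformation of the surrounding surface can be written as a Gaussian-weighted displacement field:

```python
import numpy as np

def landmark_warp(vertices, landmarks_src, landmarks_dst, sigma=20.0):
    """Move each mesh vertex by a Gaussian-weighted blend of the landmark
    displacements (Li -> Li*), so the surface deforms smoothly around them."""
    vertices = np.asarray(vertices, float)          # (V, 3) mesh vertices
    src = np.asarray(landmarks_src, float)          # (L, 3) landmarks on the best match
    dst = np.asarray(landmarks_dst, float)          # (L, 3) landmarks on the dry skull
    disp = dst - src                                # displacement of each landmark
    d2 = ((vertices[:, None, :] - src[None, :, :]) ** 2).sum(-1)   # (V, L) squared dists
    w = np.exp(-d2 / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True) + 1e-12       # normalized influence weights
    return vertices + w @ disp
```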

3.9. Texturing and shading


At this point we can apply material shaders to the head model, to enhance the
realism of the reconstruction. We define a material for skin, with a texture
assigned to the diffuse channel and a shininess map to simulate the different
reflectivity levels present on actual face skin. Both textures are mapped
spherically on the mesh. For the diffuse texture we could well use the
photographic images relative to the best match case present in the CD, simply
editing them with photo-retouching software. To fine-tune the assignment of
mapping coordinates to mesh vertices we found it very useful to unwrap the
mesh; in this way it was possible to interactively edit a planar version of the
facial mesh, thus simplifying the task.
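A minimal sketch of the spherical mapping step, computing (u, v) texture coordinates from vertex positions around an assumed head centre; the lack of seam handling and the choice of centre are simplifications of what a modelling package would do.

```python
import numpy as np


def spherical_uv(vertices, center=None):
    """Assign (u, v) texture coordinates by projecting mesh vertices onto a sphere
    around the head centre, mirroring the spherical mapping described above."""
    v = np.asarray(vertices, dtype=float)
    c = v.mean(axis=0) if center is None else np.asarray(center, dtype=float)
    d = v - c
    r = np.linalg.norm(d, axis=1) + 1e-12
    u = 0.5 + np.arctan2(d[:, 1], d[:, 0]) / (2 * np.pi)               # longitude
    w = 0.5 - np.arcsin(np.clip(d[:, 2] / r, -1.0, 1.0)) / np.pi       # latitude
    return np.column_stack([u, w])
```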

3.10. Searching for missing elements in the physiognomic database


The result of the previous nine steps is the creation of a 3D model of a bald head
whose craniometrical features are compatible with the ones belonging to the found
skull, and whose soft tissue thickness comes from a living individual probably
with similar anthropological features.
We now want to integrate this tridimensional identikit of an unknown Pompeian
with physiognomic elements such as eyes, lips, nose and hair coming from the
only reliable source we have: the paintings and sculptures made by artists
contemporary with the Vesuvio eruption, who are supposed to have been inspired,
in their works, by typical local subjects.
So we introduce the PPD, built as a collection of images reproducing Pompeian
classical art. This database is based on the work in [1] and it allows, via a
query by pictorial example, to retrieve images [see Figure 5] whose
physiognomic features are compatible with given craniometrical features. As a
result of a search through the PPD, we could have a set of physiognomic elements
which can guide the refinement of the reconstruction.

Figure 5. Samples of PPD records.
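Purely as an illustration of query by pictorial example, the sketch below ranks database images by grey-level histogram distance from an example image. This generic scheme is not the actual PPD indexing described in [1], and the file paths are hypothetical.

```python
# Generic query-by-example sketch using grey-level histogram distance; only an
# illustration of pictorial retrieval, not the PPD indexing of [1].
import numpy as np
from PIL import Image


def histogram(path, bins=64):
    img = np.asarray(Image.open(path).convert("L"), dtype=float)
    h, _ = np.histogram(img, bins=bins, range=(0, 255), density=True)
    return h


def query_ppd(example_path, ppd_paths, k=5):
    """Return the k database images closest to the example in histogram distance."""
    q = histogram(example_path)
    ranked = sorted(ppd_paths, key=lambda p: np.linalg.norm(histogram(p) - q))
    return ranked[:k]
```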

3.11. Final reconstruction and rendering


The final step is to locally modify the mesh produced in step 3.9, trying to
integrate the facial features of ancient Pompeians, as resulting from the previous
search. We used a non-linear free-form deformation applied to a vertex selection
based on distance from the landmarks to properly deform the areas corresponding to
eyes, lips and nose. We also tried to generate a digital reconstruction of haircut
and shave, because these physiognomic elements, although totally aleatory, help
to visualize the subject as it most probably was. Haircut and facial hair have
been applied and oriented using a specific modelling tool. After the modelling
phase is over we can fine-tune the material properties and then produce the
final renderings [see Figure 6] of the reconstructed head in high resolution.

Figure 6. Final rendering of the reconstructed face.
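A small sketch of the distance-based vertex selection underlying this final, local deformation step; the radius is an assumed, per-feature tuning parameter rather than a value from the paper.

```python
import numpy as np


def select_near_landmark(vertices, landmark, radius):
    """Indices of mesh vertices within `radius` of a landmark, to which a local
    (e.g. non-linear free-form) deformation can then be restricted."""
    d = np.linalg.norm(np.asarray(vertices, dtype=float) - np.asarray(landmark, dtype=float), axis=1)
    return np.nonzero(d <= radius)[0]
```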



4. Discussion

The methodology presented above actually integrates some of the features
typical of the classic reconstructive approaches listed in section 2, trying to
maximize their results especially for archaeological applications.
In fact, the warping technique is common to other computerized methods, as is
the use of a “reference” facial mesh to be deformed to fit the found skull, or the
positioning of a set of landmarks on the bone remains to guide the warping.
Nevertheless this methodology differs substantially from the other ones in the
following fundamental aspects:

- The building of a custom craniometrical database, based on the
anthropological hypothesis that individuals with similar physiognomic
and craniometrical features can still be present in the same area in
which the remains were found;
- The selection of a reference candidate through a search for
craniometrical similarities in the CD, and not just based on generic
race/gender criteria;
- The modelling of a 3D facial mesh by actual (photo, CT or 3D scan)
data of the selected (living) reference candidate, and not by average
soft tissue depths collected following generic race/gender criteria and
applied to the dry skull;
- The warping technique applied to the highest-CSS subject mesh only to
improve the reconstruction, instead of using it as the main tool to
conform a generic facial mesh to the found skull;
- The use of the PPD to refine the reconstruction by adding compatible
physiognomic elements (nose, eyes, lips) often not defined with other
approaches.

These peculiarities lead to a precise applicative range for the proposed method,
with advantages and limits with respect to the other methods presented.
The proposed method works best on a complete skull, but even in the case of a
missing mandible it can still produce interesting results, using the remaining
craniometrical measurements to search for a similar subject in the CD, thus
replacing (even if a greater degree of uncertainty would arise) the lost information.
Another critical point about the “warping methods” mentioned in section 2 is
the reference face mesh to warp, because its physiognomic features affect the
final result independently of the correctness of the soft tissue depth in the discrete
set of landmarks involved in the process. The basic classification for races
(Caucasian, Afro, Asian, etc.), sex and build (fat, normal or thin) is often too
generic to accurately reproduce the aspect of specific ethnic groups.
The proposed method, based on the custom-built CD containing records of
anthropologically compatible individuals, uses as a reference mesh the 3D face
model of the most similar subject in the database, thus minimizing the amount of
interpolation between the landmarks and leading to a more accurate
reconstruction.
Finally, after the landmark-based mesh warping is applied, the resulting
reconstructed face does not include elements such as nose, lips, eyes, ears or
hair, which cannot be derived from soft tissue statistics; so, as proposed in
[19], it is necessary to manually draw them onto a rendered front or side view of
the head to obtain a complete identikit.
The proposed method relies on the PPD to search for anthropologically compatible
facial features and to apply them to the reconstructed face by local
deformations of mesh control points. Even if these added elements still remain
aleatory, they could be very useful to visualize the possible aspect(s) of the found
subject.
On the other hand, the use of the CD and PPD could be a limit to the application of
this technique or to the reliability of its results, if an appropriate
radiological/photographic survey on a population anthropologically similar to
the subject to be reconstructed is not available.

5. Conclusion

Facial reconstruction techniques have a long tradition both in forensic and
archaeological fields but, even though anthropological studies and information
technology help us to better identify and visualize a feasible reconstruction of an
individual, given its skull, we have to remark that there is no way to exactly
replace lost data.
The approach presented in this paper can considerably enhance the likeness of
the reconstructed face to the anthropological features of the ethnic group to which
the found skull belonged, but it requires correctly built CD and PPD to achieve
optimal results.
Future developments of this method will try to use as reference not only the
record with the highest CSS found searching through CD, but a set of records
whose CSS is above or equal to a previously defined threshold. By averaging the
mesh relative to each selected record, the resulting face could be a better
candidate for the next steps of reconstruction, with probably a lower influence
of random physiognomic features than in the case of a single best match.

References

[1] A. F. Abate, G. Sasso, A. C. Donadio, F. Sasso, The riddles of Murecine: the
role of anthropological research by images and visual computing, MDIC 2001,
LNCS 2184, pp. 33-41, Springer-Verlag.
[2] J.P. Moss, A.D. Linney, S.R. Grindrod, C.A. Mosse, A laser scanning
system for the measurement of facial surface morphology, Optics Lasers Eng.
10 (1989) 179-190.
[3] A.C. Tan, R. Richards, A.D. Linney, 3-D medical graphics - using the
T800 transputer, in: Proceedings of the 8th OCCAM User Group Technical
Meeting, 1988, pp. 83-89.
[4] J.S. Rhine, C.E. Moore, Facial reproduction tables of facial tissue thickness
of American Caucasoids in forensic anthropology, in: Maxwell Museum
Technical Series 1, Maxwell Museum, Albuquerque, New Mexico, 1982.
[5] R.M. George, The lateral craniographic method of facial reconstruction, J.
Forensic Sci. 32 (1987) 1305-1330.
[6] R.M. George, Anatomical and artistic guidelines for forensic facial
reconstruction, in: M.H. Iscan, R.P. Helmer (Eds.), Forensic Analysis of the
Skull, Wiley-Liss, New York, 1993, pp. 215-227, Chapter 16.
[7] H. Peck, S. Peck, A concept of facial aesthetics, Angle Orthodont. 40 (1970)
284-318.
[8] K. Waters, D. Terzopoulos, Modelling and animating faces using scanned
data, J. Visual. Graphics Image.
[9] H. Hjalgrim, N. Lynnerup, M. Liversage, A. Rosenklint, Stereolithography:
potential applications in anthropological studies, Am. J. Phys. Anthropol. 97
(1995) 329-333.
[10] A.W. Sharom, P. Vanezis, R.C. Chapman, A. Gonzales, C. Blenkinsop,
M.L. Rossi, Techniques in facial identification: computer-aided facial
reconstruction using a laser scanner and video superimposition, Int. J. Legal
Med. 108 (1996) 194-200.
[11] N. Lynnerup, R. Neave, M. Vanezis, P. Vanezis, H. Hjalgrim, Skull
reconstruction by stereolithography, in: J.G. Clement, D.L. Thomas (Eds.),
Let’s Face It! Proceedings of the 7th Scientific Meeting of the International
Association for Craniofacial Identification, Local Organising Committee of the
IACI, Melbourne, 1997, pp. 11-14.
[12] Gonzalez-Figueroa, An Evaluation of the Optical Laser Scanning System
for Facial Reconstruction, Ph.D. thesis, University of Glasgow, 1998.
[13] R. Enciso, J. Li, D.A. Fidaleo, T-Y Kim, J-Y Noh and U. Neumann,
Synthesis Of 3d Faces - Integrated Media Systems Center, University of
Southern California - Los Angeles.
[14] M.W. Vannier, J.L. Marsh, J.O. Warren, Three dimensional CT
reconstruction images for craniofacial surgical planning and evaluation,
Radiology 150 (1984) 179-184.
[15] S. Arridge, J.P. Moss, A.D. Linney, D.R. James, Three-dimensional
digitisation of the face and skull, J. Max.-fac. Surg. 13 (1985) 136-143.
[16] S.R. Arridge, Manipulation of volume data for surgical simulation, in: K.H.
Hohne, H. Fuchs, S.M. Pizer (Eds.), 3D Imaging in Medicine, NATO ASI
Series F 60, Springer-Verlag, Berlin, 1990, pp. 289-300.
[17] J.P. Moss, A.D. Linney, S.R. Grinrod, S.R. Arridge, J.S. Clifton, Three
dimensional visualization of the face and skull using computerized tomography
and laser scanning techniques, Eur. J. Orthodont. 9 (1987) 247-253.
[18] J.P. Moss, A.D. Linney, S.R. Grinrod, S.R. Arridge, D. James, A computer
system for the interactive planning and prediction of maxillo-facial surgery,
Am. J. Orthodont. Dental-facial Orthopaed. 94 (1988) 469-474.
[19] P. Vanezis, R.W. Blowes, A.D. Linney, A.C. Tan, R. Richards, R. Neave,
Application of 3-D computer graphics for facial reconstruction and comparison
with sculpting techniques, Forensic Sci. Int. 42 (1989) 69-84.
[20] A.J. Tyrell, M.P. Evison, A.T. Chamberlain, M.A. Green, Forensic three-
dimensional facial reconstruction: historical review and contemporary
developments, J. Forensic Sci. 42 (1997) 653-661.
[21] G. Quatrehomme, S. Cotin, G. Subsol, H. Delingette, Y. Garidel, G.
Grevin, M. Fidrich, P. Bailet, A. Ollier, A fully three-dimensional method for
facial reconstruction based on deformable models, J. Forensic Sci. 42 (1997)
649-652.

THE E-LEARNING MYTH AND THE NEW UNIVERSITY

VIRGINIO CANTONI, MARCO PORTA AND MARIAGRAZIA SEMENZA


Dipartimento di Informatica e Sistemistica,
Università di Pavia
Via A. Ferrata, 1
27100 Pavia, Italy
E-mail: virginio.cantoni@unipv.it, porta@vision.unipv.it,
mariagrazia.semenza@unipv.it

The role of Information and Communication Technologies (ICTs) in
educational development is underlined and is established as a priority, in order
“to reinforce academic development, to widen access, to attain universal scope
and to extend knowledge, as well as to facilitate education throughout life”. In
fact, the development of ICTs has had a significant impact on traditional higher
education systems; the former dual system has been modified and the gap is
closing, as the university of the 21st century takes shape.

1. Introduction

Technological advances offer new paradigms for university training. In particular,
multimediality has strengthened the distance learning approach, insomuch that, in
a first phase, a clear dichotomy emerged between the traditional in-presence
modality and the more aloof distance modality. With effective metaphors, the
terms “brick university” and “click university” have been used to indicate this
separation.
Initially, the two paradigms were presented with opposed traits:
i) while the in-presence modality is characterized by the class (often active
full-time), the distance modality is personalized for the student;
ii) while the first is characterized by the teacher and is centered on him or
her (who chooses topics and operational rules), the second is focused on
the student and is directly controlled by him or her;
iii) while the first has predefined schedules and time extents, the second
occurs only when required and has the strictly necessary duration;
iv) while one is based on the topic, which is discussed by voice, the other is
centered on the project, in which one learns by doing;
v) while the first is communicated through the technology (based on the
teacher competence), the second is conveyed by means of the technology
(based on the acquired knowledge), through a “query and discovery”
process by the student;

vi) to conclude, we can say that while in the in-presence paradigm the
student plays a reactive role, in the distance modality the student assumes
a proactive role.
The traditional university, as an institution offering on-site courses, needs, in
order to maintain its prestigious position, to know how to make the most of the
opportunities being offered by new technologies. The challenge is to rethink the
higher education environment in the light of new technologies in order to meet the
challenges of a global context. For this reason, several countries are promoting
technological development measures for education policy, either from government
or from university associations. This implies the establishment of strategic lines
for the development of a more open education.

2. The e-learning myth


After a first period in which several only-virtual universities were created (e.g. the
British Open University, which today has 100,000 students around the world and
uses 7,000 teachers distributed across Great Britain; the Globewide
Network Academy in Denmark; the World Lecture Hall of the University of
Texas; the Athena University, etc.), some prestigious institutions have joined their
efforts to build non-profit alliances aimed at creating distance-learning programs.
A significant example is represented by the agreement among the Universities of
Stanford, Princeton, Yale and Oxford, in October 2000. Subsequently, on-line
education entrepreneurs and for-profit associations, with or without traditional
university partners, have appeared: today, there are more than 700 university
institutions of this kind (with initiatives distributed across all the continents), as well as
more than 2000 corporate universities. At the end of April 2001, MIT
announced that, within a ten-year program, its almost 2000 courses will be put on-
line, available for free to everybody.
In addition, technological advances have increased permanent education
demands, which are becoming more and more frequent. This way, permanent links
can be established between institutions and their graduates. Life cycles of new
technologies not only require new teaching paradigms, but also recurrent refresher
courses. According to Christopher Galvin, President and CEO of Motorola,
“Motorola no longer wants to hire engineers with a four year degree. Instead, we
want our employees to have a 40 year degree”. Thus, besides institutions
providing certified courses with a final diploma, there is a growing number of
university consortia, organizations, publishers and industries aimed at developing
and distributing on-line permanent instruction programs.
This lays the foundations for the development of open higher education, its
main objective being to develop human capital in the new technology age.
However, beyond the adoption of institutional measures for the technological
development of education, the expansion of open universities, some of which have
already become macro universities capable of overshadowing the classical
university model, has transformed the traditional university, while at the same
time increasing the diversification and development of higher education models,
whether at the third cycle, such as postgraduate courses, masters degrees,
vocational training and skills recycling.
The activity of the Information Technology industry in the multimedia
instructional sector has been very intense in the last few years. Currently, on the
market, more than 100 different Learning Management Systems administer
libraries for course storage and production, provide related information, and
control course distribution and student interactive access. Like for all technologies
approaching maturity, standardization activity is very intense in this phase to
assure interoperability and ease of update and reuse of multimedia instructional
products.
Major changes are then taking place also in classical higher education
institutions and universities, owing to the impact of new technologies and on the
basis of the newcomers in the field. Universities which have become pioneers in
adapting to this new reality through the introduction of new technologies as a
complement to on-site courses.

3. Advantages and benefits of e-learning


Since they can customize the learning material to their own needs, students have
more control over the learning process and can better understand the material,
leading to a faster learning curve, compared to instructor-led training. The
delivery of content in smaller units contributes further to a more lasting learning
effect.
Students taking an online course enter a risk-free environment in which they
can try new things and make mistakes without exposing themselves. This
characteristic is particularly valuable when trying to learn soft skills, such as
leadership and decision-making. A good learning program shows the
consequences of students’ actions and where/why they went wrong.
E-learning builds on existing delivery methods to incorporate connectivity,
whether through internal networks or the Internet. It removes the isolation that
limited its predecessors to a market of enthusiasts and innovators.
E-learning is used to drive strategic organizational goals. Unlike so much
training in the past, it is tightly integrated into what the organization must
achieve, not what individuals feel is good for them.
Among the several benefits of e-learning, we can list the following: it’s
usually less expensive to produce; it’s self-paced (most e-learning programs can
be taken when needed); it moves faster (the individualized approach allows
learners to skip material they already know); it provides a consistent message (e-
learning eliminates the problems associated with different instructors teaching
slightly different material on the same subject); it can work from any location
and any time (e-learners can go through training sessions from anywhere, usually
at any time); it can be updated easily and quickly (online e-learning sessions are
especially easy to keep up-to-date because the updated materials are simply
uploaded to a server); it can lead to increased retention and a stronger grasp of
the subject (because of the many elements that are combined in e-learning to
reinforce the message, such as video, audio, quizzes, interaction, etc.); it can be
easily managed for large groups of students.
E-learning can improve retention by varying the types of content (images,
sounds and text work together), creating interaction that engages the attention
(games, quizzes, etc.), providing immediate feedback (e-learning courses can
build in immediate feedback to correct misunderstood material), and encouraging
interaction with other e-learners and e-instructors (chat rooms, discussion
boards, instant messaging and e-mail all offer effective interaction for e-
learners).

4. E-learning in Europe
The use of e-learning for enhancing quality and improving accessibility to
education and training is generally seen as one of the keystones for building the
European knowledge society.
At the Member State level, most countries have their own Action Plan for
encouraging the use of ICT in education and training: often involving direct
support for local pilots of e-learning in schools and higher education.
Evidence that true e-learning is being used in Europe is not easy to find, as
it’s typical at this stage of an embryonic technology market for organizations to
work through a series of internal pilots.
Compared to the USA, in some ways Europe is following a different path:
greater government involvement, more emphasis on creative and immersive
approaches to learning, more blending of e-learning with other forms, a greater
use of learning communities (mainly by southern European users), and
(particularly in Scandinavia) a strong emphasis on simulation and mobile
communications.
E-learning standards are recognized as being useful, even essential, to
encourage the reuse and interoperability of learning materials.
It is important to sustain the exchange of experience within Europe on the
use of ICT for learning and to develop a common understanding of what is good
or Best Practice.
We think that e-learning standards can only be established as the result of a
profitable collaboration among varied entities, operating in different contexts,
with different objectives. Only by sharing problems, solutions and evaluations of
the various outcomes can the real essence of potential drawbacks and advantages
be assessed.

5. The new role of teachers


The technological revolution taking place in higher education is changing the
classical models of on-site training and education. Educators cannot turn their
backs on information technologies when giving classes; students need to learn
new technologies and, rather than accumulate knowledge, it is increasingly
important to know where to find information. What is more, the university, as an
institution offering on-site courses, needs to know how to make the most of the
opportunities being offered by new technologies, in order to broaden its
market on the basis of this new provision.
The teacher plays in e-learning a new, different role. First of all, while
devising a course, the teacher becomes the designer of experiences, processes
and contexts for the learning activity; besides identifying the contents, he has to
focus on motivation and active learning processes. He probably has to devote
greater attention to the creation of what has never been done before than to the
analysis of previous experiences. Rather than a scientist who applies his
analytic skills, the teacher seems to act like an artist.
Even more important are the strategies for teaching at a distance. In a few
words, what is different:
- classroom teachers rely on a number of visual and unobtrusive cues from
their students. A quick glance, for example, reveals who is taking notes,
pondering a difficult concept, or preparing to make a comment. The student who
is confused, tired, or bored is equally evident. The attentive teacher receives and
analyzes these visual cues and adjusts the delivery to meet the needs of the class
during the lesson.
- the distant teacher has no visual cues: without the use of a real-time visual
medium the teacher receives no visual information from the distant sites. In any
case, if such cues do exist, they are filtered through technology, and it is difficult to
carry on a stimulating teacher-class discussion when spontaneity is altered by
technical requirements and distance. The teacher might never really know, for
example, if students are paying attention, talking among themselves or even in
the room. Separation by distance also affects the class environment: living in
different communities deprives the teacher and students of a common
community link. Therefore, even with on-site teaching, but much more so in the
distant case, during the course teachers are engaged as mentors in motivating
students, in highlighting pros and cons and in detecting the causes of failures.
The most advanced multimedia technology is not the one that artificially
replaces reality or intelligence, but rather the one that increases our skills,
adapting itself to the context and evolving while being used. Technology must fit
the user, not the contrary: it is really effective when it is ergonomic, intuitive and
transparent. By paraphrasing Wayne Hodgins, to be really efficient and effective
in multimedial teaching it is therefore necessary to choose “just the right
CONTENT, to just the right PERSON, at just the right TIME, on just the right
DEVICE, in just the right CONTEXT, and just the right WAY”.
Even if the so-called “digital divide” problem (that is, the marginalization of
computer science illiterates) is not as strong as it was in the past (especially in
the young generations), there are still many people who have not approached the
new potentialities of technology and remain in the “cybercave”, where they can
see only the shadows of technology. Even those who are completely tempted
by “hi-tech” can make a big mistake by forcing contents on technology: true
effectiveness is only obtained by adapting technology to contents!

6. The university with a long tradition and the e-learning opportunities


After centuries of stable evolution, the academic system has entered a period
of significant change, revolutionary in certain aspects. Market forces are
increasingly interested in advanced education (mostly abroad, but recently in
Italy as well), academic competition has increased and technology demonstrates
a relevant innovative impact. Students are today more active and aware of
technology capability, they are used to interaction, to plug-and-play experiences:
they now constitute the digital generation and have easy access to the whole
academic world (different universities are just a click away one from the other).
This phase of rapid transformation presents broad perspectives and novel
opportunities, but many challenges and risks as well. It is primarily essential to
develop change capabilities, to help our institutions react to the rapid changes
required by society. Secondly, actions cannot simply be extracted from even a
glorious tradition, but need to be examined in the wide field of different future
perspectives.
The program adopted by the e-learning research unit of Pavia University
includes several activities. In particular, the research unit will focus on the
following themes.
The definition of a common project methodology for modeling, classifying
and archiving educational resources that guarantees an adequate level of
interoperability and reusability, through the adoption of standards and tools for
metadata description and packaging of 'learning objects'. With an explicit
reference to the SCORM standard, the focus will be on the analysis and
development of multi-resolution mechanisms for managing data from the atomic
level (of individual assets and 'learning objects') to the higher levels of 'semantic
resolution'.
The resulting process should be applied for both classification and retrieval,
through the management of learning object metadata, as well as for generating
new educational units, through the assessment of techniques for the aggregation
and recombination of basic constituents in lower semantic levels, down to the
atomic level of learning objects.
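As a toy illustration of this hierarchical, object-oriented view, the sketch below models atomic learning objects with a few metadata fields (loosely inspired by IEEE LOM/SCORM-style metadata, not the project's actual schema) and aggregates them into higher-level units that can be searched by keyword.

```python
# Illustrative sketch only: field names and the aggregation rule are assumptions,
# not the actual metadata schema adopted by the Pavia project.
from dataclasses import dataclass, field
from typing import List


@dataclass
class LearningObject:
    identifier: str
    title: str
    keywords: List[str]
    semantic_level: int = 0          # 0 = atomic asset, higher = aggregated unit
    children: List["LearningObject"] = field(default_factory=list)


def aggregate(identifier, title, parts):
    """Recombine lower-level objects into a new unit one semantic level up."""
    keywords = sorted({k for p in parts for k in p.keywords})
    level = max(p.semantic_level for p in parts) + 1
    return LearningObject(identifier, title, keywords, level, list(parts))


def search(root, keyword):
    """Retrieve all objects in a hierarchy whose metadata mention a keyword."""
    hits = [root] if keyword in root.keywords else []
    for child in root.children:
        hits.extend(search(child, keyword))
    return hits
```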
This hierarchical metaphor for knowledge management will be analyzed in
order to develop a methodology, if not a theory, on the principles and procedures
for the generative design of education units at the higher levels. Learning objects
will be in fact the fundamental elements in the new overall learning model that
originates from the object-oriented programming paradigm.
Following this approach, the efforts will also aim at the creation of basic
components, i.e. actual learning objects, which will be reusable in many different
contexts. In fact, the very notion of learning objects relies on this fundamental
principle: when developing a new educational unit, the objective should be the
construction of several basic components about the subject considered, and these
components should be reusable in other contexts and with different learning
strategies.
Eventually, all the educational units will be accessible through the Internet,
i.e. they will be edited and used by many users simultaneously. It has to be
remarked that the objective of the analysis and development presented is not that
of achieving another 'Content Management System' but rather the definition of
the guidelines for selecting the tools and educational resources that will
eventually become the shared infrastructure.
A final, not negligible, objective of the Pavia e-learning project is that of
enlarging as much as possible the basis of potential teachers, by stimulating and
promoting the adoption of the new activity, up to achieving a course portfolio for the
entire traditional university background, ranging from the humanistic to the
scientific and to all the applied sciences.


E-LEARNING - THE NEXT BIG WAVE: HOW E-LEARNING WILL
ENABLE THE TRANSFORMATION OF EDUCATION

DR. RICHARD STRAUB


IBM Emea, Learning Solutions Director

CARLA MILANI
IBM Emea, South Region, Learning Solutions

After having analyzed the motivations that lead to a revision of the teaching and learning
models, bearing in mind the European Union initiatives, we now analyze different
possible views of e-learning, both from a technological perspective and through a more
global and integrated approach.

IBM has decided to participate in this transformation challenge and, after an accurate
analysis of all aspects of the phenomenon and the realization inside the company of wide
e-learning strategies, is ready for the education world, as a partner able to handle
complex projects, both in the academic and corporate environments.

We present the IBM role in the EU initiatives and in the public-private partnerships
started by the Commission. We also outline the IBM education model, useful for building
learning projects with a “blended” methodology.

The provision of learning through our education systems is set to undergo a
fundamental transformation - this evolution is currently in its early stages. The
advent of the knowledge society makes education and learning a primary
concern for governments and the population at large - learning is increasingly
recognised as a lifelong process with the foundations being laid during the
period of formal primary, secondary and tertiary education.

Today e-learning is considered a key enabler for this transformation. There
is a broad need for basic Information and Communication Technology (ICT)
skills in our society - ICT skills are becoming a new literacy skill. These skills
are a basic prerequisite for leveraging the potential of e-learning. Yet, the real
benefits of e-learning will flow from the enhancement it can bring to the process
of teaching and learning, improving the speed and depth of knowledge and skills
acquisition, increasing flexibility for the learner, personalising the learning path,
innovating cross-discipline approaches by networked content, and supporting new
collaborative learning methods mediated by technology. Overall, e-learning
provides a broad array of new options to shape, design and deliver
learning resources and processes.

The change of the education systems is one of the most important challenges
for our society - the mobilisation and participation of all players is required.
Education institutions have been conservative by culture and tradition - hence
the challenge of the change process is second to none. This social transformation
can only be successful if managed in a proactive and holistic way - taking into
account all critical success factors, in particular the role of e-learning as the
driver of change must be recognised and understood.

IBM has made a strategic decision to engage in this transformation process.


IBM has a broad set of capabilities which positions it as a potential partner
to governments, education institutions and other stakeholders.

1. What is e-learning?

The European Commission defines e-learning in the context of its e-learning
initiative as “the use of new multimedia technologies and the Internet to
improve the quality of learning by facilitating access to resources and services as
well as remote exchanges and collaboration”.

E-learning in a broad sense embraces all these views and meanings. As such
it can be conceived as a complex, integrated process, where the Internet enables
social inclusion and social cohesion - enabling us to involve and connect people,
pedagogy, processes, content and technology. E-learning is supporting the
development, delivery, evaluation, management and commerce of learning in an
integrated way.

Understanding the complex nature of this new learning paradigm has led
IBM to adopt a broad definition of e-learning, based on a total systems
perspective. It is related to our notion of e-business, which is about transforming
core business processes by leveraging the net. Typical core business processes
are customer relationship management (CRM), Supply Chain Management and
e-commerce. Since e-learning affects the core business processes and the
business model relating to learning provision we define it as follows:

‘E-learning is the application of e-business technology and services to teaching
and learning. It provides digital content and collaboration to support remote
learning and to augment class-based learning. It includes infrastructure, e-
learning delivery platforms, content development and management.

It provides the collaborative framework to enable knowledge sharing and peer-
to-peer learning hubs that can be further supported by mentors or coaches, thus
supporting informal collaboration, sharing of knowledge and experiential
learning.’

2. E-learning provides a new learning environment

From this perspective e-learning creates a new learning universe - a learning
environment for educators and learners with major sets of key elements:

People-centric human elements, such as pedagogical and didactic
approaches, personalised learner support, teacher quality and
capabilities, learner preferences, cultural factors, new roles (for
example, virtual tutors and community facilitators), social elements like
group interaction, collaboration and knowledge sharing

Content-centric elements, such as rich media content, authoring tools,
learning object management, flexible credits linked to learning objects,
content repositories, user-friendly content management - rich
categorisation and search

Process-centric elements, such as security and privacy, learning
management systems including enrolment facilities, online testing,
reporting capabilities, efficiency and effectiveness measurements

Technology-centric elements, such as hardware and software
infrastructure including servers, routers, end-user equipment, network
bandwidth, databases, delivery platforms, mobile technologies and
networking.

Using this broader perspective for e-learning avoids known pitfalls -
deploying the latest technology does not solve a learning need if there is no
sound pedagogical approach associated with it. The best multimedia programme
will not produce the effects desired if the bandwidth of the network does not
allow for sufficient transmission capacity towards the end-user. New learning
programmes will not achieve acceptance in the user community if they are not
compatible with the existing cultural environment, experiences and values.

All elements must be balanced in order to achieve desired results.

Figure: A holistic view of e-learning - creating a new learning environment. The diagram
relates process elements (tracking/reporting, skills planning, implementation and
assessment, support/help), content elements (instructional, interactional) and people to an
IT infrastructure of content repositories, portals, learning management systems, LCMS
and authoring tools.

It is clear from the above that technology is one of the necessary
conditions to make e-learning work - though not a sufficient condition.
However, without a sound technology strategy there is no way even to get started
with an e-learning deployment.

3. A technology view of e-learning

The base layer of the enabling technology is the network infrastructure. Our
customers tell us they need a network infrastructure that is robust, reliable,
scalable, secure and flexible - based on open standards. Availability,
interoperability and manageability are also key requirements. The network
infrastructure must allow for access by multiple devices ranging from laptop
computers to mobile phones. The network infrastructure sets the basic
capabilities and limitations as to what type of e-learning programmes can be
provided.

Another key technology element is the software, which underpins the e-
learning environment and ensures that the different application components
function seamlessly together, such as the enrolment system and billing system or rich
media object repositories and authoring tools. Software brings flexibility and
innovation for the teachers in their course context and enables teachers and
learners to collaborate synchronously or asynchronously and to establish work
processes. Software also integrates and secures your existing environment with
e-learning.

Learning portals integrate the view and access of the learning environment
from the user perspective and eventually enable the user to create a personalised
‘my.University’ or ‘mySchool’ - based on the profiles of students, educators,
administration staff, alumni and external stakeholders.

4. IBMs capabilities as a technology partner

The e-learning provider marketplace is very fragmented today. Industry
observers expect a shakeout during the next 12-18 months. There are highly
specialised niche players, providing learning management and learning content
management systems and delivery platforms - in most cases proprietary
solutions - consulting companies, acting as integrators, information technology
(IT) players, coming from the infrastructure side and telecommunication carriers.
Recently major content providers (media companies and publishers) have zeroed
in on e-learning as a key future market.

We strongly believe that e-learning will drive increased co-operation
between education institutions and industry, not only within the boundaries of a
country but across Europe and even across continents. Students and lifelong
learners will have to access the institutions and their resources anytime, from
anywhere.

IBM is committed to open standards to ensure long term vendor
independence, scalability, interoperability and flexibility of solutions. The least
desirable state in an e-learning environment is islands of incompatible
implementations: the need for economies of scale of infrastructure, content and
support places a heavy financial and productivity penalty on such an approach.
IBM provides solutions for all technology layers of e-learning - be it with IBMs
own products and services or with partners.

Given the importance that we attribute to emerging e-learning requirements
for education, we have created an IBM Institute to focus on this arena from a
strategy and research perspective - the IBM Institute of Advanced Learning. It is
a global virtual organisation with the leadership based in our Zurich laboratory
in Rueschlikon. The Institute of Advanced Learning is focusing on the
technology side as well as the human factors of e-learning.

Figure: A technology view of e-learning.

5. Key benefits that IBM can provide as a technology partner

We have capabilities to provide hardware, software and services in all
technology domains relevant for e-learning, which we can leverage to the benefit
of our customers and we feel strongly that we are best positioned as an integrator
and total solutions partner in this marketplace.

As the premier IT infrastructure provider we can help our customers to
plan, build and run the network infrastructure. We have developed
offerings directed specifically to the education market. We are working
with a number of partners in this arena, in particular with Cisco, where
we have a strategic alliance that has now been extended to the education
sector

• IBM has unique capabilities and experiences as a systems integrator to
make IBM and non-IBM components work together and to shield the
customer from the complexity of multi-vendor systems

With our consulting capabilities we can support customers in the
development of a vision for an e-learning environment and to devise
implementation and project management plans

• IBM has long standing experience with integrating access devices into
an e-learning environment. In a concept known as ‘Thinkpad*
University’ we work with universities to implement an integrated
programme for the deployment and support of mobile computing for the
students and faculty

IBM has a strong commitment to open sourcing and open standards and
has been strongly engaged in Linux, Extensible Markup Language
(XML), Java™ 2 Platform, Enterprise Edition (J2EE), Web services and
Java™. With the emergence of new standards in the field of e-learning,
IBM takes an active role in the international standardisation work, such
as the development of standards for learning object metadata

• IBM has an unrivaled software portfolio, applicable to e-learning,
covered in our four major brands:

- Use data management software for development of a
learning objects strategy by managing unstructured and
structured information and knowledge

- Develop collaboration and knowledge management into e-learning
organisations with Lotus* software, as a major requirement for e-learning
success and learning quality improvement

- Tivoli* software helps users manage, measure and secure a large
distributed and heterogeneous learning environment, to be sure the
learners have the right level of system, use the appropriate software and
to reduce the costs and risks of software and hardware administration

- Websphere* software enables users to integrate existing applications
and to develop interactive Web applications and portals.

With Lotus Learning Space we have a learning delivery platform which
allows for collaboration between learners and takes e-learning to a high
level of effectiveness. Combined with instant teamroom applications
such as Lotus Quickplace*, realtime conferencing with Lotus Sametime*
and other solutions, teachers can keep innovation and control of their
pedagogy in a flexible approach to e-learning.

To sum it up - with IBM, develop e-learning for everyone, not just a small
group, and get a better return on investment (ROI).

6. IBMs capabilities beyond technology

Technology is only one of the critical success factors for e-learning
implementation. Successful deployment of an e-learning environment requires a
thorough understanding of the interplay of technology, culture and human
elements. One of the most common pitfalls with e-learning is to undertake the
digitisation of content without addressing the human element. E-learning
requires a complete redesign and management of content and its embedding in a
meaningful and motivating pedagogical pathway. This can only be achieved with
adequate skills for instructional design, with a sound understanding of how
various technologies can be used to support specific learning objectives, within a
framework for creating and maintaining motivation and interest by the learner
and in measuring learning effectiveness.

Also, education in the 21st century should address the role of preparing
students to operate in an uncertain and ever-changing environment. What is
needed is a toolkit to last through life, comprising such intra-personal elements
as a values framework, self-knowledge, capacity for critical analysis, ability to
learn, as well as communication and social skills.

The use of new technologies in schools will free educators’ time for
concentrating on these new core competencies and for ‘identifying the strengths
of individuals, to focus on them and to lead students to achievement’. The use of
technology for e-learning also forces teachers to develop a new relationship with
students - one in which teachers act as facilitator and mentor to the self-directed,
independent and collaborative learning activities of students. The learning
journey is becoming an interactive process, which at times demands self
direction by the learner, at times is dependent on feedback from peers and tutors
and at others is simply a function of instructor defined outcomes. It will be
necessary to place comparable emphasis on technology and face-to-face
interaction, to balance a teacher-directed and a facilitated, collaborative
approach and to place equal importance on teacher delivery and learner
exploration. All of this argues for what has been called a ‘high tech - high touch’
approach. Such paradigm shifts in teaching and learning require radical changes
in the competencies of teachers and the attitudes of learners.

7. Key benefits that IBM can provide

Our value proposition for education customers builds on four main sources:

1. IBM has invested $70m, since 1994, into its ‘Reinventing Education’
partnership programme with the objective of improving the quality of
primary and secondary education. From the 28 installations around the
world and the ensuing research projects, IBM has gained significant
intellectual capital and has developed solutions for schools. More details
about IBMs Reinventing Education Programme can be found at
ibm.com/ibm/ibmgives.

2. We are one of the most significant providers of enterprise-wide Learning
Solutions. Our offerings in this arena span the learning value-chain,
from planning the learning intervention, through design, development and
implementation, to measurement - including the measurement of learning
effectiveness. Most of the know-how of IBMs learning consultants (in our
Learning Solutions organisation), gained from such engagements, is
applicable to the education sector.

3. IBM is one of the most significant developers and consumers of e-learning
for internal purposes. Today, some 40 per cent of IBMs internal education is
delivered through e-learning. We have been repeatedly recognised as one of
the most innovative e-learning companies in the world, with numerous
awards from the American Society for Training and Development (ASTD),
the Corporate University Exchange (CUX), Deutscher Industrie- und
Handelstag and so on.

4. A key element of IBMs investment in the development of e-business skills is
a new programme for partnering with educational institutions. The IBM
Scholars programme is wide-ranging, from the provision of software and
educational resources to research, academic collaboration and curriculum
development.
ibm.com/software/info/university/scholarsprogram/

Some examples of specific solutions and intellectual assets derived from
these activities and initiatives:

Through the Re-inventing Education Programme IBM has supported a
series of projects investigating effective practices in the use of technology in
education. With eight projects in various countries around the world (including
three in Europe) it has been possible to study an array of approaches to using
technology to improve instructional practice across a variety of contexts. From
online teacher professional development to online lesson planning to online
teaching interventions and online authentic assessment, the projects have
explored ways that ICT can enable transformed teaching and learning. Our
Learning Village solution incorporates the experiences and the findings of these
projects.

In our internal education programmes we have developed a conceptual
model, the IBM 4-Tier Learning Model, which helps us to better align learning
technologies with learning programmes to reach desired learning outcomes. This
‘IBM learning model’ has become a widely acclaimed and applied framework
for e-learning inside and outside IBM. The model has been proven in areas of
soft-skills training such as management development and sales training, where it
helped us to design continuous learning processes with a new, innovative ‘blend’
of technology and face-to-face learning.

Figure: Blending classroom with e-learning - the IBM four-tier learning model
(learning methods versus technology, from ‘get together’ to ‘try it, play it, experience it’).

8. The political dimension - public private partnerships

A successful transition to a new model in education requires, as a starting point,
a shared vision of how to design tomorrow’s education and training and shared
commitment from the stakeholders involved. However, the initial priority may be
to change perceptions and develop new mindsets.

Since the early days of the European Commission’s white papers ‘Growth,
Competitiveness and Employment’ (1994) and ‘Teaching and Learning’ (1995),
the European institutions have played an important role in actively tackling the
challenges of the 21st century and challenging prevailing mindsets. These white
papers set out the framework for subsequent commission documents, including
the eEurope Initiative, the eEurope Action Plan, the e-learning initiative and the
e-learning Summit and Action Plan (May 2001), the Memorandum on Lifelong
Learning and the Report On The Concrete Future Objectives Of Education
Systems. Attaining all the goals defined at the Lisbon European Council in
March 2000 presupposes the committed involvement of all the players involved
in education and training.

“The fact is that in the future a society’s economic and social performance will
increasingly be determined by the extent to which its citizens and its economic
and social forces can use the potential of these new technologies, how efficiently
they incorporate them into the economy and build up a knowledge-based
society”.
(Communication from the Commission: eLearning - Designing tomorrow’s
Education, 2000)

9. IBMs role in pan-European public private partnerships

IBM is engaged in a leading role in two major European initiatives relating to
ICT skills and e-learning: the Career-Space Consortium and the E-learning
Industry Group (eLIG).

10. The Career-Space project

Career-Space was founded in late 1998 with support and sponsorship from the
European Commission. Seven major ICT companies were founding
members - BT, IBM, Microsoft*, Nokia, Philips, Siemens and Thales (formerly
Thomson CSF). This initiative was triggered by the structural shortage of
qualified ICT personnel in Europe, which will impact the future prosperity of the
continent if not addressed adequately. As a first compelling issue to be addressed
in the context of the ICT skills gap, Career-Space has given a response from the
industry perspective as to what generic skills will be needed in the future and
should be built by the ‘suppliers’. Following the publication of 13 ‘generic ICT
skills profiles’ by the end of 1999, new members joined the group - Cisco,
Intel™, Nortel Networks and Telefonica. In addition, the European ICT Industry
Association (EICTA) and CEN/ISSS (the European standardisation organisation
for ICT) joined the Steering Committee along with EUREL (the convention of
national societies of electrical engineers), who were given the status of associate
member. With this membership the consortium now has significant credibility
to articulate requirements from an industry perspective and to provide
recommendations for action.

Following the publication of the skills profiles, Career-Space has focused on
the logical next step, that is, it has moved from the demand side to the
supply side. What should be the changes in future ICT curricula, in content and
structure, to support the demands of the knowledge society? The
recommendations have been produced in co-operation with over 20 universities
and technical educational institutions across Europe. The output of this work
effort, the new ‘Curriculum Development Guidelines - New ICT Curricula for
the 21st Century, Designing Tomorrow’s Education’, contains fundamental
recommendations for change. Besides the obvious need to provide solid
foundation skills from the engineering and informatics domains, with a particular
emphasis on a broad systems perspective, the guidelines point to the need of
including non-technical skills in the curricula, such as business skills and
personal skills.

There is no way to enforce implementation of these guidelines - however, a
number of universities have already started to implement them on a
pilot basis.

11. The European E-learning Summit 2001 and the E-learning Industry Group

Following the recognition of the importance of public private partnerships as a
key element in the transformation process of the education system, the European
Commission invited IBM, Cisco Systems, Nokia, Sanoma WSOY and
SmartForce to collaborate with a wide range of industry partners in organising
the summit. The e-learning summit explored the challenges outlined in the
European Commission’s e-learning action plan and presented an initial set of
recommendations. The Summit has taken into account a broad systemic view of
e-learning along the lines demonstrated initially in this article. As a consequence
the recommendations issued by the Summit touch elements from infrastructure to
digital content, pedagogy and professional development of teachers, to name a
few. Particular focus is also given to financial incentives and funding schemes to
progress actual implementation and diffusion of e-learning. The structural funds
and loans from the European Investment Bank are major examples of sources to
be broadly leveraged. Overall the summit working groups recommended
pragmatic steps to move the e-learning agenda forward on a European level.

Following the e-learning summit, a core group of companies has proposed
the establishment of a standing body to provide advice to the European
Commission and national governments across Europe. The objective is to
accelerate the deployment of e-learning in line with the European Commission’s
e-learning action plan. The founding members of this e-learning industry group
are 3COM, Accenture, Apple, BT, Cisco, Digitalbrain, IBM, Intel, Line
Communications, NIIT, Nokia, Online Courseware Factory, Sanoma WSOY,
Sun Microsystems and Vivendi Universal Publishing.

Dr. Richard Straub, Director of Learning Solutions, IBM Europe, Middle
East and Africa, has been elected Chairman of the group. In a meeting with the
European Commissioner for Education and Culture, Viviane Reding, four initial
projects were proposed by the group and have been reviewed and welcomed by
the Commissioner. These projects will lead to specific recommendations for
action and where feasible, support pilot implementations.

• Connecting everyone and everything from everywhere - removing the
barriers for access to interactive e-learning environments
• Adopting and participating in the development of open standards for e-learning
• Creating the conditions to sustain a commercial market for e-learning content
and development
• Increasing investment in the continuous professional development of teachers and
trainers, enhancing their status and helping them develop and understand the
principles of e-learning.

The E-learning Industry Group is an open group and welcomes the involvement
of other industry players. Other interested parties such as Associations and
Government Agencies can participate in the ‘Consultation Group’, which
represents the wider circle of the Industry Group.

12. Our vision for the future

It is our vision that new value-nets will emerge, with government bodies,
education institutions, corporations, technology providers, media companies and
publishers joining forces to provide learning on demand - as a ‘utility’. E-
learning utilities will serve schools, universities, small and medium enterprises,
larger corporations and individuals to meet their learning needs, shield them
from the complexities of the underlying infrastructures and systems and provide
ubiquitous access to learning via education portals to learning experiences. The
e-learning utility will also provide a managed environment for content producers
to deliver their content to education and learning environments. The e-learning
utility will be part of an overall learning environment, where the human elements

such as tutors, mentors and social interactions between groups will continue to
play a vital role - yet in an effective blend with technology.

The E-learning Utility will shield the education institutions from the
complexity of the IT solution and from the load of building, running and
maintaining it. It will allow focus on what is essential for the institutions - for
instance, the provision of well orchestrated learning opportunities, strong
curricula and adequate pedagogy and meeting the need of a diverse audience.

The knowledge society enforces a new way of thinking and acting about
learning. We are at the very beginning of this journey, but it has definitely
started. The winners will be those who get on the learning curve early.

[Figure: E-utility for learning - a layered model whose layers include hosting/bandwidth.]

13. The IBM e-learning model

In order to obtain and master a competency, listening to someone speaking might
not be sufficient: we also need to experiment. Certainly, the acquisition of
competencies can be achieved through individual work, but people learn better
when they study in a team, as has been shown in much research.

The same concept applies to an e-learning environment. E-learning models
centered on the teacher, such as web lectures, are very effective in transferring
information. However, to completely master a competency it is necessary that
the model used allows the student to take control of his or her own learning as
well as to practice.

An effective learning model, one that allows students to use the acquired
competency in practice, also requires interaction. Simple interaction with
the computer may not be sufficient: interaction and collaboration among several
students, or between the student and the teacher, or both, is also advisable. Finally,
in order to really master a competency, the student needs to use it in a real
situation.

Hence, as we proceed along the education chain, the level of collaboration must
increase. Even in an e-Learning model the level of interaction and collaboration
must further increase when students need to master specific competencies.

Interaction and collaboration are the most meaningful aspects that support
the so-called IBM e-Learning model that the company uses both for its own
internal education and for the management of large e-Learning projects within its
own customers.

It is a four-tier model that begins from a lower tier based on information
sharing and extends to a level of complete knowledge mastery: a model that is not
100% e-Learning based. As a matter of fact, it is not even plausible that e-
Learning could completely replace traditional, classroom-based education: it will
always be necessary, at some stage of knowledge building and development, to
put students in front of an expert.

In addition, the model lets you develop courses both horizontally and
vertically. In other words, some courses can be based on a single tier, using only
one e-Learning methodology, while others need more levels and different
methodologies. These latter solutions are called blended solutions.

Briefly, the four tiers of the IBM model are:

Tier 1: learning through information


Reading, watching, listening. This is basic knowledge transfer, ideal for the
launch of new initiatives, for announcing new company strategies, for the
advertising of new rules, etc. The tools that are used in this tier are simple web
lectures and web sites where students can quickly and easily find needed
information.

Tier 2: learning through interaction


Testing, experimenting. Basic knowledge on new applications or simple
procedural activities can be treated at this level. Some examples: CBT based or
WBT based courses with application simulation.

Tier 3: learning through collaboration


Discussing, practicing with others. Collaboration techniques, like chat lines,
team-rooms and online interaction with teachers, allow students to learn inside
the team and to share experiences. At this level, students can prepare group
exercises, or they can use more sophisticated technologies where application
sharing is possible.

Tier 4: learning together


Finally, it is possible to use the traditional classroom with a mentoring activity.
However, in an e-learning model, we use this tier only to obtain very advanced
competencies and not to transfer or to acquire basic knowledge.

The result is a time reduction for students outside the work schedule and an
optimization of teachers’ precious time as well as expensive resources.

Web links:
Information about our Learning Solutions for schools, higher education and
government can be found at this Web site:

ibm.com/learning

IBM's Reinventing Education Programme: ibm.com/ibm/ibmgives


IBM Scholars Programme:

ibm.com/software/info/university/scholarsprogram

QUERY MORPHING FOR INFORMATION FUSION

SHI-KUO CHANG
Department of Computer Science
University of Pittsburgh
Pittsburgh, PA, USA
E-mail: chang@cs.pitt.edu

An evolutionary query is a query that changes in time and/or space. For example, when
an emergency management worker moves around in a disaster area, an evolutionary
query can be executed repeatedly to evaluate the surrounding area in order to locate
objects of threat. Depending upon the position of the query originator, the time of the
day and other factors such as feedback from sensors, the query can be modified.
Incremental query modification leads to a query similar to the original query. Non-
incremental query modification on the other hand may lead to a substantially different
query. Query morphing includes both incremental query modification and non-
incremental query modification. In sensor-based evolutionary query processing, through
query morphing one or more sensors can provide feedback to the other sensors. The
sensor dependency graph is used to facilitate query optimization because most sensors
can generate large quantities of temporal/spatial information within short periods of time.
Applications to multi-sensor information fusion in emergency management, pervasive
computing and situated computing are discussed.

1. Evolutionary Queries

There is an important class of queries for information fusion applications in
emergency management, pervasive computing, situated computing [20], etc.,
which require novel query processing and information visualization techniques.
We will call this class of queries evolutionary queries. An evolutionary query is
a query that changes in time and/or space. For example, when an emergency
rescue worker moves around in a disaster area, an evolutionary query can be
executed repeatedly to evaluate the surrounding area in order to locate objects of
threat, determine routing for rescue vehicles, etc. Depending upon the position
of the person or agent, the time of the day and other factors such as feedback
from sensors, the query can be different.

The person or agent who issues the query is called the query originator.
Depending upon the spatial/temporal coordinates of the query originator and
feedback from sensors, an evolutionary query can be modified accordingly.

Under normal circumstances the modified query is quite similar to the original
query, differing mainly in the spatial constraints of the query.

As explained above, incremental query modification, where only the constraints
are changed, leads to a query similar to the original query. However, non-
incremental query modification may lead to a substantially different query. For
example, if the aircraft has entered a cloudy region or is flying at night, the query
should be modified to consider only time sequenced laser radar images because
the video sequence will yield little or no information. There are cases where
even more substantial changes of the query are necessary. Query morphing
includes both incremental query modification and non-incremental query
modification.

In this paper we investigate query morphing for sensor-based evolutionary query
processing, where one or more sensors may provide feedback to the other sensors
through query morphing. Status information such as position, time and
certainty can be incorporated both in the multi-level views and in the
morphed query. In order to accomplish sensor data independence, an ontological
knowledge base is employed. The results of query processing are visualized so
that the user can also manually modify the query. Further extension of the query
morphing approach is discussed.

2. Background and Related Research

Information fusion is the integration of information from multiple sources and
databases in multiple modalities and located in multiple spatial and temporal
domains. The fusion of multimedia information from multiple real-time sources
and databases has become increasingly important because of its practical
significance in many application areas such as telemedicine, community
networks for crime prevention, health care, emergency management, e-learning
and situated computing. The objectives of information fusion are: a) to detect
certain significant events [24, 26] and b) to verify the consistency of detected
events [11, 16, 21]. In sensor-based query processing, the queries are applied to
both static databases and dynamic real-time sources that include different types of
sensors. Since most sensors can generate large quantities of spatial information
within short periods of time, novel sensor-based query processing techniques to
retrieve and fuse information from multiple sources are needed.

In our previous research, a spatial/temporal query language called ΣQL was
developed to support the retrieval and fusion of multimedia information from

real-time sources and databases [5, 6, 9, 15]. ΣQL allows a user to specify
powerful spatial/temporal queries for both multimedia data sources and
multimedia databases, thus eliminating the need to write separate queries for
each. ΣQL can be seen as a tool for handling spatial/temporal information for
sensor-based information fusion, because most sensors generate spatial
information in a temporal sequential manner [14]. A powerful visual user
interface called the Sentient Map allows the user to formulate spatial/temporal σ-
queries using gestures [7, 8].

For an empirical study we collaborated with the Swedish Defense Research Agency,
which has collected information from different types of sensors, including laser
radar, infrared video (similar to video but generated at 60 frames/sec), and CCD
digital cameras. When we applied ΣQL to the fusion of the above described
sensor data, we discovered that in the fusion process data from a single sensor
yields poor results in object recognition. For instance, the target object may be
partially hidden by an occluding object such as a tree, rendering certain types of
sensors ineffective.

Object recognition can be significantly improved if a modified query is
generated to obtain information from another type of sensor, while allowing the
target to be partially hidden. In other words, one (or more) sensor may serve as
a guide to the other sensors by providing status information such as position,
time and certainty, which can be incorporated in multiple views and formulated
as constraints in the modified query. In the modified query, the source(s) can be
changed, and additional constraints can be included in the where-clause of the σ-
query. This approach provides better object recognition results because the
modified query can improve the result from the various sensor data that will also
lead to a better result in the fusion process. A modified query may also send a
request for new data and thus lead to a feedback process.

In early research on query modification, queries are modified to deal with
integrity constraints [22]. In query augmentation, queries are augmented by
adding constraints to speed up query processing [12]. In query refinement [23]
multiple term queries are refined by dynamically combining pre-computed
suggestions for single term queries. Recently query refinement technique was
applied to content-based retrieval from multimedia databases [3]. In our
approach, the modified queries are created to deal with the lack of information
from a certain source or sources, and therefore not only the constraints can be
changed, but also the source(s). This approach has not been considered
previously in database query optimization because usually the sources are

assumed to provide the complete information needed by the queries. Almost all
previous approaches fall under the category of incremental query modification.
For information fusion we must consider non-incremental query modification,
where not only the constraints but also the sources and even the query structure
are modified. It is for this reason that we introduce the notion of query morphing.

In addition to the related approaches in query augmentation, there is also recent
research work on agent-based techniques that is relevant to our approach. Many
mobile agent systems have been developed [1, 2, 18], and recently mobile agent
technology has begun to be applied to information retrieval from multimedia
databases [17]. It is conceivable that sensors can be handled by different agents
that exchange information and cooperate with each other to achieve information
fusion. However, mobile agents are highly domain-specific and depend on ad-
hoc, 'hardwired' programs to implement them. In contrast, our approach offers a
theoretical framework for query optimization and is empirically applicable to
different types of sensors, thus achieving sensor data independence.

3. The Sensor Dependency Graph

The sensor dependency graph is proposed to facilitate sensor-based evolutionary
query processing and optimization, because most sensors can generate large
quantities of spatial information within short periods of time. In database theory,
query optimization is usually formulated with respect to a query execution plan
where the nodes represent the various database operations to be performed [13].
The query execution plan can then be transformed in various ways to optimize
query processing with respect to certain cost functions. In sensor-based query
processing, a concept similar to the query execution plan is introduced. It is
called the sensor dependency graph, a graph in which each node Pi has
the following parameters (a minimal data-structure sketch is given after the list):

obj_type_i is the object type to be recognized
source_i is either the information source or an operator for fusion, union or combination
recog_alg_i is the object recognition/fusion algorithm to be applied
time_i is the estimated computation time of the recognition/fusion algorithm in seconds
recog_cr_i is the certainty range [min, max] for the recognition of an object
norecog_cr_i is the certainty range [min, max] for the non-recognition of an object
sqo_i is the spatial coordinates of the query originator
tqo_i is the temporal coordinates of the query originator
soi_i is the space-of-interest for object recognition/fusion (usually an area-of-interest)
toi_i is the time-of-interest for object recognition/fusion (usually a time-interval-of-interest)
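As an illustration only, the node record above can be captured in a small data structure. The field names below simply mirror the parameters listed in the paper; the class itself, its type choices and the sample values are a hypothetical sketch, not code from the original system.

from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class SDGNode:
    """One node P_i of the sensor dependency graph (hypothetical sketch)."""
    obj_type: Optional[str]          # object type to be recognized, e.g. 'truck' (None for a source node)
    source: str                      # information source (LR, IR, CCD) or a fusion/union/combination operator
    recog_alg: Optional[str]         # recognition/fusion algorithm, e.g. 'recog315', 'fusion7'
    time: float                      # estimated computation time in seconds
    recog_cr: Tuple[float, float]    # certainty range [min, max] for recognition
    norecog_cr: Tuple[float, float]  # certainty range [min, max] for non-recognition
    sqo: Optional[tuple] = None      # spatial coordinates of the query originator
    tqo: Optional[float] = None      # temporal coordinate of the query originator
    soi: str = "soi_all"             # space-of-interest (area-of-interest)
    toi: str = "toi_all"             # time-of-interest (time-interval-of-interest)
    inputs: List["SDGNode"] = field(default_factory=list)  # nodes this node depends on

# Example: the LR leaf node and an intermediate recognition node, as in Section 3
lr = SDGNode(None, "LR", "NONE", 0, (1, 1), (1, 1))
truck_lr = SDGNode("truck", "LR", "recog315", 20, (0.3, 0.5), (0.4, 0.6), inputs=[lr])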

These parameters provide detailed information on a computation step to be
carried out in sensor-based evolutionary query processing. As mentioned earlier,
the query originator is the person/agent who issues a query. For evolutionary
queries, the spatial/temporal coordinates of the query originator are required.
For other types of queries, these parameters are optional.

If the computation results of a node P1 are the required input to another node P2,
there is a directed arc from P1 to P2. Usually we are dealing with sensor
dependency trees, where the directed arcs originate from the leaf nodes and
terminate at the root node. The leaf nodes of the tree are the information
sources such as laser radar, infrared camera, CCD camera and so on. They have
parameters such as (none, LR, NONE, 0, (1,1), (1,1), sqo_i, tqo_i, soi_all, toi_all).
Sometimes we represent such leaf nodes by their symbolic names such as LR,
IR, CCD, etc. The intermediate nodes of the tree are the objects to be
recognized. For example, suppose the object type is 'truck'. An intermediate
node may have parameters (truck, LR, recog315, 10, (0.3, 0.5), (1,1), sqo_i, tqo_i,
soi_all, toi_all). The root node of the tree is the result of information fusion, for
example, a node with parameters (truck, ALL, fusion7, 2000, (0,1), (0,1), sqo_i,
tqo_i, soi_all, toi_all), where the parameter ALL indicates that information is drawn
from all the sources. In what follows, some parameters such as the
spatial/temporal coordinates sqo_i and tqo_i of the query originator, the all-
inclusive space-of-interest soi_all and the all-inclusive time-of-interest toi_all will be
omitted for the sake of clarity.

Query processing is accomplished by the repeated computation and updates of
the sensor dependency graph. During each iteration one or more nodes are
selected for computation. The selected nodes must not be dependent on any
other nodes. After the computation, one or more nodes are removed from the
sensor dependency graph. The process then iterates. As an example, the query
originator is an aircraft, and the evolutionary query is a query to find moving
trucks. By analyzing the initial query, the following sensor dependency graph T1
is constructed, where the sources are laser radar (LR), infrared (IR) and charge-
coupled device camera (CCD):

[Sensor dependency graph T1: source leaf nodes such as (none, LR, NONE, 0, (1,1), (1,1)),
intermediate recognition nodes such as (truck, LR, recog315, 20, (0.3,0.5), (0.4,0.6)),
and the fusion root node (truck, ALL, fusion7, 2000, (0,1), (0,1)).]

This means the information is from the three sources - laser radar, infrared
camera and CCD camera - and the information will be fused for recognizing the
object type 'truck'.

Next, we select some of the nodes to compute. For instance, all the three source
nodes can be selected, meaning information will be gathered from all three
sources. After this computation, the processed nodes are dropped and the
following updated sensor dependency graph T2 is obtained:

[Sensor dependency graph T2: the recognition nodes for the three sources, including
(truck, CCD, recog11, 100, (0.6,0.8), (0.1,0.3)), together with the fusion root node
(truck, ALL, fusion7, 2000, (0,1), (0,1)).]

We can then select the next node(s) to compute. Since IR has the smallest
estimated computation time, it is selected and recognition algorithm 144 is
applied. The sensor dependency graph T3 is:

(truck, LR, recog315, 20, (0.3,0.5), (0.4,0.6))

(truck, ALL, fusion7, 2000, (0,1), (0,1))

(truck, CCD, recog11, 100, (0.6,0.8), (0.1,0.3))

In the updated graph, the IR node has been removed. We now select the CCD
node because it has a much higher certainty range than LR and, after its
processing, select the LR node. The sensor dependency graph T4 is:
(truck, LR, recog315, 20, (0.3,0.5), (0.4,0.6))

(truck, ALL, fusion7, 2000, (0,1), (0,1))

Finally the fusion node is selected. The graph T5 has only a single node:

(truck, ALL, fusion7, 2000, (0,1), (0,1))

After the fusion operation, there are no unprocessed (i.e., unselected) nodes, and
query processing terminates.
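The iterative select-compute-remove procedure illustrated above can be sketched as a short loop. This is only an illustrative skeleton: it assumes the hypothetical SDGNode structure sketched earlier, and run_algorithm is a stand-in for the actual recognition and fusion routines (recog315, recog144, fusion7, ...), which are not described at code level in the paper.

def process_sensor_dependency_graph(nodes):
    """Repeatedly select nodes whose inputs are all processed, compute them,
    and drop them from the graph until nothing is left unprocessed."""
    pending = list(nodes)
    results = {}
    while pending:
        # Nodes that do not depend on any unprocessed node are ready for computation.
        ready = [n for n in pending if all(inp not in pending for inp in n.inputs)]
        # Simple heuristic echoing the example: pick the cheapest ready node first.
        node = min(ready, key=lambda n: n.time)
        results[id(node)] = run_algorithm(node, [results[id(i)] for i in node.inputs])
        pending.remove(node)  # remove the processed node from the sensor dependency graph
    return results

def run_algorithm(node, input_results):
    # Hypothetical placeholder for the recognition/fusion algorithms.
    return {"source": node.source, "alg": node.recog_alg, "inputs": input_results}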

4. Query Morphing by Incremental Query Modification

In the previous section a straightforward approach to sensor-based evolutionary
query processing was described. This straightforward approach misses the
opportunity of utilizing incomplete and imprecise knowledge gained during
query processing.

Let us re-examine the above scenario. After IR is selected and recognition
algorithm 144 is applied, suppose the result of recognition is not very good, and
only some partially occluded large objects are recognized. If we follow the
original approach, the reduced sensor dependency graph becomes T3 as shown
in Section 3. But this misses the opportunity of utilizing the incomplete and
imprecise knowledge gained by recognition algorithm 144. If the query is to find
un-occluded objects and the sensor reports only an occluded object, then the
query processor is unable to continue unless we modify the query to find
occluded objects. Therefore a better approach is to modify the original query, so
that the recognition algorithm 144 is first applied to detect objects in a space-of-
interest soi_all (i.e., the entire area). Although algorithm 144 cannot detect an
object, it is able to reduce the space of interest to a much smaller soi_23, and T4
becomes T4'.

(truck, LR, recog315, 20, (0.3,0.5), (0.4,0.6), soi_23)   (truck, ALL, fusion7, 2000, (0,1), (0,1), soi_23)

The recognition algorithm 315 can now be applied to recognize objects of the
type 'truck' in this smaller space-of-interest. Finally, the fusion algorithm fusion7
is applied.

The query modification approach is outlined below, where italic words indicate
operations for the second (and subsequent) iteration.

Step 1. Analyze the user query to generate/update the sensor dependency graph,
based upon the ontological knowledge base (see Section 6) and the multi-level
view database (see Section 5) that contains up-to-date contextual information in
the object view, local view and global view, respectively.

Step 2. If the sensor dependency graph is reduced to a single node, perform the
fusion operation (if multiple sensors have been used) and then terminate query
processing. Otherwise build/modify the σ-query based upon the user query, the
sensor dependency graph and the multi-level view database.

Step 3. Execute the portion of the σ-query that is executable according to the
sensor dependency graph.

Step 4. Update the multi-level view database and go back to Step 1.

As mentioned above, if in the original query we are interested only in finding un-
occluded objects, then the query processor must report failure when only an
occluded object is found. If, however, the query is modified to "find both un-
occluded and occluded objects", then the query processor can still continue.

Evolutionary queries and query processing are also affected by the
spatial/temporal relations among the query originator, the sensors and the sensed
objects. Therefore in query processing the spatial/temporal relations must be
taken into consideration in the construction/update of the sensor dependency
graph. The temporal relations include "followed by", "preceded by", and so on.
The spatial relations include the usual spatial relations, and special ones such as
"occluded by", and so on [19].

5. Multi-Level View Database

A multi-level view database (MLVD) is proposed to support sensor-based query
processing. The status information obtained from the sensors includes
object type, position, orientation, time, certainty and so on. The positions of the
query originator and the sensors may also change. This is processed and
integrated into the multi-level view database. Whenever the query processor
needs some information, it asks the view manager. The view manager also
shields the rest of the system from the details of managing sensory data, thus
achieving sensory data independence.

The multiple views may include the following three views in a resolution
pyramid structure: the global view, the local view and the object view. The
global view describes where the target object is situated in relation to some other
objects, e.g. a road from a map. This will enable the sensor analysis program to
find the location of the target object with greater accuracy and thus make a better
analysis. The local view provides the information such as the target object is
partially hidden. The local view can be described, for example, in terms of
Symbolic Projection [4], or other representations. Finally, there is also a need for
a symbolic object description. The views may include information about the
query originator and can be used later on in other important tasks such as in
situation analysis.

The multi-level views are managed by the view manager, which can be regarded
as an agent, or as middleware, depending upon the system architecture. The
global view is obtained primarily from the geographic information system (GIS).

The local view and object view are more detailed descriptions of local areas and
objects. The results of query processing, and the movements of the query
originator, may both lead to the updating of all three views.
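As a rough illustration of the multi-level view database, the three views can be held in a simple structure behind a view-manager interface. The class, the dictionary representation and the report format below are hypothetical sketches, not the paper's implementation.

class MultiLevelViewDatabase:
    """Holds the global, local and object views in a resolution-pyramid spirit."""
    def __init__(self):
        self.global_view = {}  # map-level context, e.g. where the target sits relative to a road
        self.local_view = {}   # descriptions of local areas, e.g. in terms of Symbolic Projection
        self.object_view = {}  # symbolic descriptions of individual objects

    def update(self, report):
        """Integrate status information (object type, position, orientation, time,
        certainty, ...) coming from sensors or from query results."""
        level = report.get("level", "object")
        getattr(self, f"{level}_view")[report["id"]] = report

    def query(self, level, key):
        # The query processor asks the view manager rather than the sensors directly,
        # which is how sensory data independence is achieved.
        return getattr(self, f"{level}_view").get(key)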

6. The Ontological Knowledge Base

For any single sensor the sensed data usually does not fully describe an object,
otherwise there will be no need to utilize other sensors. In the general case the
system should be able to detect that some sensors are not giving the complete
view of the scene and automatically select those sensors that can help the most in
providing more information to describe the whole scene. In order to do so the
system should have a collection of facts and conditions, which constitute the
working knowledge about the real world and the sensors. We propose to store
this knowledge in the ontological knowledge base, whose content includes
object knowledge structure, sensor and sensor data control knowledge.

The ontological knowledge base consists of three parts: the sensor part
describing the sensors, recognition algorithms and so on, the external conditions
part providing a description of external conditions such as weather condition,
light condition and so on, and the sensed objects part describing objects to be
sensed. Given the external condition and the object to be sensed, we can
determine what sensor(s) and recognition algorithm(s) may be applied. For
example, IR and Laser can be used at night (time condition), while CCD cannot
be used. IR probably can be used in foggy weather, but Laser and CCD cannot
be used (weather condition). However, such determination is often uncertain.
Therefore certainty factors should be associated with items in the ontological
knowledge base to deal with the uncertainty.
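To make the idea concrete, the sensor part of such an ontological knowledge base could be approximated by a table of certainty factors indexed by external condition and sensor. The numeric values below only echo the qualitative statements in the text (IR and laser usable at night, CCD not; IR probably usable in fog) and are otherwise invented for illustration; the table layout itself is an assumption.

# Hypothetical certainty factors: how usable each sensor is under a given condition.
SENSOR_USABILITY = {
    ("night", "IR"): 0.9, ("night", "LASER"): 0.9, ("night", "CCD"): 0.0,
    ("fog", "IR"): 0.6,   ("fog", "LASER"): 0.1,   ("fog", "CCD"): 0.1,
    ("clear_day", "IR"): 0.8, ("clear_day", "LASER"): 0.8, ("clear_day", "CCD"): 0.9,
}

def select_sensors(condition, threshold=0.5):
    """Return the sensors whose usability certainty under `condition` reaches the threshold."""
    return [sensor for (cond, sensor), cf in SENSOR_USABILITY.items()
            if cond == condition and cf >= threshold]

print(select_sensors("night"))  # -> ['IR', 'LASER']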

7. Query Optimization

In the previous sections we explained the evolutionary query processing steps
and proposed the major components of the system. In this section the
optimization problems related to evolutionary query processing are formulated.

Suppose that we have a sensor dependency graph such as T1 of Section 3. For
the recognition algorithm recog315, we have the following certainty ranges:
P(recog315 = yes | X = truck, Y = LR) ∈ (0.3, 0.5), and P(recog315 = no | X ≠
truck, Y = LR) ∈ (0.4, 0.6), where X = truck, Y = LR means that there is a truck
in the frame which is obtained by LR. If the input data has certainty range (a, b)
and the recognition algorithm has certainty range (c, d), then the output has
certainty range (min(a, c), min(b, d)). The optimization problem can be stated as
follows: given the sensor dependency graph, we want to recognize the object
'truck' with a certainty value above a threshold. Our goal is to minimize the
total processing time. In other words, the optimization problem is as follows:

Minimize $\sum_{j=1}^{N}\sum_{i=1}^{N} \delta_{ij}\,T_i$

where $\delta_{ij} = 1$ if algorithm $i$ runs at the $j$th order, $\delta_{ij} = 0$ if algorithm $i$ doesn't run at the $j$th order,
and $T_i$ is the processing time of algorithm $i$,

subject to

$\sum_{i=1}^{N} \delta_{ij} \le 1$  (for the $j$th order, at most one algorithm can run)

$\sum_{j=1}^{N} \delta_{ij} \le 1$  (every algorithm can be in at most one order)

$\max(C\,\Delta) \ge \theta$  ($\theta$ is the certainty threshold)

where $C = (\,c(\mathrm{ALG}_1,\ \text{a priori certainty}), \ldots, c(\mathrm{ALG}_N,\ \text{a priori certainty})\,)$ and $\Delta = [\delta_{ij}]$.
Given the sensor dependency graph, a dual problem is to recognize the object
'truck' within the processing time limit. Our goal is to maximize the certainty
value for the object truck under the condition that the total processing time is
below the time limit. The problem is as follows:

Maximize $\max(C\,\Delta)$

where $\delta_{ij} = 1$ if algorithm $i$ runs at the $j$th order, $\delta_{ij} = 0$ if algorithm $i$ doesn't run at the $j$th order,
and $C = (\,c(\mathrm{ALG}_1,\ \text{a priori certainty}), \ldots, c(\mathrm{ALG}_N,\ \text{a priori certainty})\,)$,

subject to

$\sum_{i=1}^{N} \delta_{ij} \le 1$  (for the $j$th order, at most one algorithm can run)

$\sum_{j=1}^{N} \delta_{ij} \le 1$  (every algorithm can be in at most one order)

$\sum_{i=1}^{N}\sum_{j=1}^{N} \delta_{ij}\,T_i \le T$  ($T$ is the maximum time that we can bear)

where $T_i$ is the processing time of algorithm $i$.

In the above optimization problems we have not considered the space of interest
soi when we formalize the problem. If we put it in, the formulation of the
problem becomes more complicated:

$F = (\,T(\mathrm{ALG}_1,\ \text{initial soi}), \ldots, T(\mathrm{ALG}_N,\ \text{initial soi})\,)$

$A = (\,a(\mathrm{ALG}_1,\ \text{initial soi}), \ldots, a(\mathrm{ALG}_N,\ \text{initial soi})\,)$

$\sum_{k=1}^{N} f_k$ is the total running time, so the goal function is changed to

Minimize $\sum_{k=1}^{N} f_k$

Note:
$a(\mathrm{ALG}_i, \text{soi})$ is the output soi after using algorithm $i$ on the input soi.
$A\,\delta_1$ is the output soi after using the first algorithm.
$f_k$ is the running time of the $k$th-order algorithm.
$T(\mathrm{ALG}_i, \text{soi})$ is the running time of the $i$th algorithm on the given soi.
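The first optimization problem above (minimize total time subject to a certainty threshold) is small enough to solve by enumeration when only a handful of recognition algorithms are involved. The sketch below is purely illustrative: each candidate is described by a processing time and an a priori certainty, and both the candidate list and the numbers are assumptions made for the example, not values from the paper.

from itertools import combinations

# Hypothetical candidates: (name, processing time in seconds, a priori certainty)
ALGORITHMS = [("recog315", 20, 0.5), ("recog144", 10, 0.4), ("recog11", 100, 0.8)]

def cheapest_selection(theta):
    """Brute-force search: minimize total time subject to max certainty >= theta.
    Order slots are ignored here because, in this simplified sketch, the total
    time of a selection does not depend on the order in which algorithms run."""
    best = None
    for r in range(1, len(ALGORITHMS) + 1):
        for subset in combinations(ALGORITHMS, r):
            if max(cert for _, _, cert in subset) < theta:
                continue  # certainty constraint not met by this selection
            total_time = sum(t for _, t, _ in subset)
            if best is None or total_time < best[0]:
                best = (total_time, [name for name, _, _ in subset])
    return best

print(cheapest_selection(0.7))  # -> (100, ['recog11']) with the hypothetical numbers above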

8. An Experimental Prototype

An experimental prototype for query processing, fusion and visualization has
been implemented. As shown in Figure 1, after a recognition algorithm is applied
and some objects identified, these objects can be displayed. Each object has
seven attributes: object id, object color, object type, source, recognition
algorithm, estimated processing time and certainty range for object recognition.
There are also hidden attributes including the parameters for the minimum
enclosing rectangle of that object, the spatial and temporal coordinates of the
query originator, the space of interest and time of interest, and the certainty
range for non-recognition of object.

As shown in Figure 2, the recognition algorithms can be applied dynamically to
an area in the image, and the recognized objects are displayed. Figures 3 and 4
illustrate the construction of a ΣQL query and the results of processing the query,
respectively.

Figure 1. Objects recognized by the recognition algorithms have seven attributes.

Figure 2. Recognition algorithms can be applied dynamically.

Figure 3. Visual construction of a query. The resultant query is shown in the upper right window.

Figure 4. Visualization of query processing. The result of the query is shown in the lower right window.

Figure 5. The dependency tree (left) and a selected node (right). We can trace
the Query processing step by step.

Figure 6. The next step of query processing after the step shown in Figure 5.

Figure 7. The optimization information shown in a pop-up window.

The main window in Figure 3 illustrates the visual construction of a query. The
user drags and drops objects and enters their attributes, and the constructed
query is shown in the upper right window. The objects in the dependency tree
are shown as an object stream in the middle right window. In Figure 4 the lower
right window shows the query results. When an evolutionary query is being
executed, its dependency tree will change dynamically. Figure 5 displays the
same information as that of the object stream, but in a format more familiar to

end users. It shows the dependency tree on the left side of the screen, and the
selected node with its attributes on the right side of the screen. In the next step,
both the dependency tree and the query may be changed, as illustrated in Figure
6. As shown in Figure 7, the information of optimization can be shown in a pop-
up window.

The ΣQL query shown in the upper right window of Figure 3 is as follows:

SELECT object
CLUSTER * ALIAS OBJl OBJ2
FROM
SELECT t
CLUSTER *
FROM video-source
WHERE OBJ1.type = 'car' AND OBJ1.color = 'red' AND
OBJ2.type = 'truck' AND OBJ1.t < OBJ2.t

The corresponding Object Stream is:


COMBINE
OBJ1, OBJ2
OBJ1.t < OBJ2.t

OBJ2
video-source
OBJ2.type = 'truck'

OBJ1
video-source
OBJ1.type = 'car'
OBJ1.color = 'red'

The Result Set is:


Car1, Truck3
Car2, Truck4
i.e., either {Car1, Truck3} or {Car2, Truck4} is the retrieved result.

Another example, showing the fusion operation, is illustrated in Figure 8. The
ΣQL query for fusion is as follows:

Figure 8. A fusion query.

SELECT object
CLUSTER * ALIAS OBJl OBJ2
FROM
SELECT t
CLUSTER *
FROM LR, CCD, IR
WHERE OBJ1.type = 'car' AND OBJ1.color = 'red' AND
OBJ2.type = 'truck' AND OBJ1.t < OBJ2.t

The Object Stream is:


COMBINE
OBJ1, OBJ2
OBJ1.t < OBJ2.t

FUSION
OBJ2
LR, CCD, IR
OBJ2.type = 'truck'

FUSION
OBJ1
LR, CCD, IR
OBJ1.type = 'car'
OBJ1.color = 'red'

Figure 9. A fusion query's dependency tree consists of Union/Combination
nodes, Fusion nodes and Query nodes. The selected fusion node with its
attributes is displayed on the right of the screen.

As already explained, Figures 5 and 6 show the dependency tree whose dynamic
changes illustrate the steps of query processing. Figure 9 also shows the
dependency tree. However, in this example of query processing for fusion,
several nodes are marked as 'cut-off' nodes, meaning the certainty values are
already above a threshold and consequently no further processing of these nodes
is needed.

9. Query Morphing by Non-Incremental Query Modification

In the preceding sections we explained our approach, based mainly upon incremental
query modification. Non-incremental query modification in general remains to
be incorporated into the approach. Therefore in this section we describe query
morphing by non-incremental query modification.

Conceptually, query morphing is somewhat like image morphing: the end user
formulates one query called a query point and requests the query processor to
morph one query point into another query point. Within limits, the two query
points are arbitrary, and the query processor is able to figure out automatically
how a query point is morphed into another query point. Sometimes query
morphing is accomplished by modifying the query incrementally. Sometimes
more substantial query modification is necessary. In incremental query

modification, the two query points are more or less similar. In non-incremental
query modification, the two query points are substantially different.

We define a distance measure d(q_1, q_n) between two query points q_1 and q_n,
based upon the number and type of transformation steps needed to transform q_1 into q_n.
Depending upon the type of transformation, different weights are assigned to the
transformation steps. An infinite weight is assigned to a forbidden type of
transformation. Let $f_1, f_2, \ldots, f_{n-1}$ be the $n-1$ transformations such that
$f_1(q_1) = q_2$, $f_2(q_2) = q_3$, $\ldots$, $f_{n-1}(q_{n-1}) = q_n$. The distance between $q_1$ and $q_n$ is defined as

$d(q_1, q_n) = \sum_{j=1}^{n-1} w_j$, where $w_j$ is the weight assigned to transformation step $f_j$.

The following are examples of transformation steps:

add target attributes (weight 1)
drop target attributes (weight 1)
replace sources (weight 3)
add a conditional clause (weight 2)

Each transformation step is assigned a certain weight. For example, the add
transformation step has weight 1. An incremental morphing pair of queries
(q_1, q_n) is one whose distance d(q_1, q_n) is below a threshold τ. If q_1 and q_n form
an incremental morphing pair, morphing from one into another by incremental
query modification is possible. If q_1 and q_n do not form an incremental morphing
pair, morphing from one into another by incremental query modification is
impossible.

A non-incremental morphing pair of queries (q_1, q_n) is one whose distance d(q_1,
q_n) is finite but above the threshold τ. If q_1 and q_n form a non-incremental
morphing pair, morphing from one into another by non-incremental query
modification is possible. A non-incremental transformation is one that
completely rewrites the query. A morphing pair is either an incremental or a
non-incremental morphing pair. If q_1 and q_n do not form a morphing pair,
morphing from one into another by query modification is impossible.
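A small sketch of this distance measure and of the morphing-pair classification follows. The weight table reuses the example weights listed above; the threshold value, the representation of a transformation as a plain label and the function names are assumptions made only for illustration.

import math

# Example weights from the text; a forbidden transformation gets infinite weight.
WEIGHTS = {
    "add_target_attribute": 1,
    "drop_target_attribute": 1,
    "replace_sources": 3,
    "add_conditional_clause": 2,
    "forbidden": math.inf,
}

def morphing_distance(steps):
    """d(q1, qn) = sum of the weights of the transformation steps from q1 to qn."""
    return sum(WEIGHTS[s] for s in steps)

def classify_pair(steps, tau=4):
    """Classify (q1, qn) as incremental, non-incremental, or not a morphing pair."""
    d = morphing_distance(steps)
    if d <= tau:
        return "incremental morphing pair"
    if math.isinf(d):
        return "not a morphing pair"
    return "non-incremental morphing pair"

print(classify_pair(["replace_sources"]))                            # -> incremental morphing pair
print(classify_pair(["replace_sources", "add_conditional_clause"]))  # -> non-incremental morphing pair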

In the preceding sections, the application we focused on is the fusion of remote
sensing data such as video, infrared images and radar images, and the examples
are also drawn from that application area. We will now use distance learning as
another application area in the following example of query morphing.

In distance learning, a student searches for information related to a subject
matter such as "binary tree" from the sources. The sources initially specified by
the student are the textbooks provided by the instructor.

Original query 1:
Select object
From textbook
Where object.topic = “binary tree”

The query processor finds related class notes, reference books and videotaped
materials, and consequently the sources in the query are updated by the query
processor. This is a typical example of incremental query morphing.

Morphed query 2:
Select object
From textbook, classnotes, reference-book, videotaped-materials
Where object.topic = “binary tree”

In experience-based distance learning, learning-by-doing becomes very
important. Information obtained from case studies and work experiences needs to
be fused into the knowledge base and made available to the learner. Fusion of
information is thus required. The original query is rewritten as the fusion of
several media-specific queries, each reformulated according to the characteristics
of the respective media. This is an example of a substantial query transformation
for non-incremental query morphing.

Morphed query 3:
Select object
From case-studies
Where object.topic = “binary tree”
or
Select object
From life-experiences
Where object.topic = “binary tree”

Last but not least, learning-from-peers is another important aspect of peer-
oriented distance learning. A fellow student may possess information items to be
shared: textbooks, class notes, case studies and work experiences. The original
query is rewritten as the fusion of several peer-oriented queries, each
reformulated according to the user-profile of the respective peer. This is yet
another example of a substantial query transformation for non-incremental query
morphing.

Morphed query 4:
Select object
From classnotes of student1
Where object.topic = “binary tree”
or
Select object
From life-experiences of student2
Where object.topic = “binary tree”

It is important to note that both the morphed query and the retrieval results
contain information valuable to the user/learner/student. In other words, the
questions are just as important as the answers. To this end, adlets [8] are used to
generate morphed queries to gather information. Adlets travel from node to
node to acquire more information. The query is morphed as the adlets travel
along a chosen path.

As illustrated by the above examples, query morphing is often event-driven:
when an event occurs, query morphing is invoked. Events are characterized by
conditions: “an object is occluded”, “the time of the day is 7pm”, “the weather
has changed from cloudy to raining”, “a new textbook on this subject becomes
available”, “a new work experience on this subject is obtained”, “learner Smith
has acquired work experience on this subject” and so on. Sometimes the
condition is changing along a certain direction and an event becomes predictable:
it is 6pm and soon will be night time, it is changing from cloudy to raining, and
so on. The end user can then be prompted to decide what to do in case such an
event occurs.

The end user can posit a sequence of events and query points to define a query
path for morphing. In some cases the initial query is too restrictive and the end
user may wish to enhance the significance of a certain type of object. If the end
user is able to visualize the type of objects that meet the information needs, the
end user can add clauses and/or constraints involving that type of object to the
evolutionary query. In other words the end user can repeatedly adjust the query
path for morphing to focus on certain types of objects. In that regard, we note the
importance of visualization in query morphing: we need to visualize both the
query and the retrieval results.

Corresponding to adjusting the query path, the morphing algorithm revises its
strategy to modify the query. For example, in case of cloudy conditions a source
such as the CCD sensor should be replaced by another source such as the IR
sensor. A roaming query path can be defined, which is materialized into a query
path based upon the contents of the ontological knowledge base. By specifying

the appropriate adlet propagation rule, adlet generation rule and adlet
modification rule, the interactive morphing algorithm can be designed.

10. Discussion

To study the impact of query morphing on various applications in information
fusion, a test bed for evolutionary query visualization and evaluation is being
implemented. As mentioned in previous sections, the data from the sensors must
be merged to produce a coherent response to user queries. However, given that
each sensor is contributing incomplete and potentially conflicting information, it
is likely that the system’s response will still contain an element of uncertainty.
This motivates the need for a test bed that can effectively display potentially
ambiguous results to the viewer in a meaningful way.

Under the guise of routing for emergency rescue in catastrophic events, a test
bed could be implemented that obeys the following stages. First, a query is
issued that activates the appropriate sensors to collect information about the
environment. A query processor then collects data from the sensors and fuses
the information into a coherent statement about the environment. The relevant
information is passed to a display that helps the viewer visualize the results. An
interaction loop between the viewer and the display allows the viewer to provide
feedback and modify the query.

At the broadest level, the test bed is fairly simple, consisting of three main
interface components: a query mechanism, a visualization display, and a
feedback mechanism. This general model offers the broadest possible solution
and probably describes many visual information systems for fusion. Additional
requirements may include:

1) Processing the query requires analysis of partial, ambiguous, redundant,
and possibly conflicting information. Resolution/fusion of the sensor
data ensures a level of uncertainty that needs to be expressed in the
visualization.

2) The visualization needs to be able to assist the viewer in
discriminating relevant information from background noise.

3) The evaluation of the system by the viewer needs to permit the viewer to
provide feedback on the accuracy of the sensors (a), as well as the accuracy
of the guidance provided by the visualization (b).

To address these issues, a modular approach is adopted, selecting specific
technologies that address the needs of each of the components. The foundation
for the Query module can rely on ΣQL, the query refinement fusion algorithms
described in [10] and the query morphing approach discussed above. A
framework for evolutionary query, visualization and evaluation of dynamic
environments is formulated. To close the loop in the system requires that the
viewer be able to provide feedback, evaluating both the query results as well as
the visualization itself. The query mechanism should support two major types of
feedback: sensor accuracy and expressiveness of the query. Results from
evolutionary query optimization using limited query morphing, and interactive
approaches using roaming query paths can then be compared and evaluated.
Initially we will invite graduate and undergraduate students to participate in the
evaluation study. When the algorithms are well developed and the system more
mature, we plan to evaluate the applicability of query morphing techniques to
emergency management.

References

1. J. Baumann et al., "Mole - Concepts of a Mobile Agent System", World
Wide Web, Vol. 1, No. 3, 1998, pp 123-137.
2. C. Baumer, "Grasshopper - A Universal Agent Platform based on MASIF
and FIPA Standards", First International Workshop on Mobile Agents for
Telecommunication Applications (MATA'99), Ottawa, Canada, October
1999, World Scientific, pp 1-18.
3. K. Chakrabarti, K. Porkaew and S. Mehrotra, "Efficient Query Refinement
in Multimedia Databases", 16th International Conference on Data
Engineering, San Diego, California, February 28 - March 3, 2000.
4. S. K. Chang and E. Jungert, Symbolic Projection for Image Information
Retrieval and Spatial Reasoning, Academic Press, London, 1996.
5. S. K. Chang and E. Jungert, "A Spatial/temporal query language for
multiple data sources in a heterogeneous information system environment",
The International Journal of Cooperative Information Systems (IJCIS), Vol.
7, Nos. 2 & 3, 1998, pp 167-186.
6. S. K. Chang, G. Costagliola and E. Jungert, "Querying Multimedia Data
Sources and Databases", Proceedings of the 3rd International Conference on
Visual Information Systems (Visual'99), Amsterdam, The Netherlands, June
2-4, 1999.
7. S. K. Chang, "The Sentient Map", Journal of Visual Languages and
Computing, Vol. 11, No. 4, August 2000, pp 455-474.
8. S. K. Chang and T. Znati, "Adlet: An Active Document Abstraction for
Multimedia Information Fusion", IEEE Trans. on Knowledge and Data
Engineering, January/February 2001, pp 112-123.
9. S. K. Chang, G. Costagliola and E. Jungert, "Spatial/Temporal Query
Processing for Information Fusion Applications", Proceedings of the 4th
International Conference on Visual Information Systems (Visual'2000),

Lyon, France, November 2000, Lecture Notes in Computer Science 1929,
Robert Laurini (Ed.), Springer, Berlin, pp 127-139.
10. S. K. Chang, E. Jungert and G. Costagliola, "Multi-sensor Information
Fusion by Query Refinement", Proc. of 5th Int'l Conference on Visual
Information Systems, Hsin Chu, Taiwan, March 2002, pp 1-11.
11. C.-Y. Chong, S. Mori, K.-C. Chang and W. H. Baker, "Architectures and
Algorithms for Track Association and Fusion", Proceedings of Fusion'99,
Sunnyvale, CA, July 6-8, 1999, pp 239-246.
12. G. Graefe, "Query Evaluation Techniques for Large Databases", ACM
Computing Surveys, Vol. 25, No. 2, June 1993.
13. M. Jarke and J. Koch, "Query Optimization in Database Systems", ACM
Computing Surveys, Vol. 16, No. 2, 1984.
14. E. Jungert, "An Information Fusion System for Object Classification and
Decision Support Using Multiple Heterogeneous Data Sources",
Proceedings of the 2nd International Conference on Information Fusion
(Fusion'99), Sunnyvale, California, USA, July 6-8, 1999.
15. E. Jungert, "A Data Fusion Concept for a Query Language for Multiple
Data Sources", Proceedings of the 3rd International Conference on
Information Fusion (FUSION 2000), Paris, France, July 10-13, 2000.
16. L. A. Klein, "A Boolean Algebra Approach to Multiple Sensor Voting
Fusion", IEEE Transactions on Aerospace and Electronic Systems, Vol. 29,
No. 2, April 1993, pp 317-327.
17. H. Kosch, M. Doller and L. Boszormenyi, "Content-based Indexing and
Retrieval supported by Mobile Agent Technology", Multimedia Databases
and Image Communication, LNCS 2184, (M. Tucci, ed.), Springer-Verlag,
Berlin, 2001, pp 152-166.
18. D. B. Lange and M. Oshima, Programming and Deploying Java Mobile
Agents with Aglets, Addison-Wesley, Reading, MA, USA, 1999.
19. S. Y. Lee and F. J. Hsu, "Spatial Reasoning and Similarity Retrieval of
Images using 2D C-string Knowledge Representation", Pattern Recognition,
Vol. 25, No. 3, 1992, pp 305-318.
20. Hideyuki Nakashima, "Cyber Assist Project for Situated Human Support",
Proceedings of 2002 International Conference on Distributed Multimedia
Systems, Hotel Sofitel, San Francisco Bay, September 26-28, 2002.
21. J. R. Parker, "Multiple Sensors, Voting Methods and Target Value
Analysis", Proceedings of SPIE Conference on Signal Processing, Sensor
Fusion and Target Recognition VI, SPIE Vol. 3720, Orlando, Florida, April
1999, pp 330-335.
22. M. Stonebraker, "Implementation of Integrity Constraints and Views by
Query Modification", in SIGMOD, 1975.
23. Bienvenido Vélez, Ron Weiss, Mark A. Sheldon, and David K. Gifford,
"Fast and Effective Query Refinement", Proceedings of the 20th ACM

Conference on Research and Development in Information Retrieval
(SIGIR'97), Philadelphia, Pennsylvania, July 1997.
24. E. Waltz and J. Llinas, Multisensor Data Fusion, Artech House, Boston,
1990.
25. E. Wernert and A. Hanson, "A Framework for Assisted Exploration with
Collaboration", Proceedings of Visualization '99, IEEE Computer Society
Press, 1999, pp 241-248.
26. F. E. White, "Managing Data Fusion Systems in Joint and Coalition
Warfare", Proceedings of EuroFusion98 - International Conference on Data
Fusion, October 1998, Great Malvern, United Kingdom, pp 49-52.

IMAGE REPRESENTATION AND RETRIEVAL WITH TOPOLOGICAL TREES

C. GRANA†, G. PELLACANI‡, S. SEIDENARI‡, R. CUCCHIARA†
† Dipartimento di Ingegneria dell'Informazione
‡ Dipartimento di Dermatologia
Università di Modena e Reggio Emilia, Italy

Typical processes of image representation comprise an initial region segmentation
followed by a description of the single regions' features and their relationships. Then a
graph model can be exploited in order to integrate the knowledge of the specific
regions (that are the attributed relational graph's (ARG) nodes) and the regions'
relations (that are the ARG's edges). In this work we use color features to guide
region segmentation, geometric features to characterize regions one by one and
topological features (and in particular inclusion) to describe regions' relationships.
Guided by the inclusion property we define the Topological Tree (TT) as an image
representation model that, exploiting the transitive property of inclusion, uses the
adjacency and inclusion topological features. We propose an approach based on a
recursive version of fuzzy c-means to construct the topological tree directly from the
initial image, performing both segmentation and TT construction. The TT can be
exploited in many applications of image analysis and image retrieval by similarity
in those contexts where inclusion is a key feature: we propose an applicative case of
analysis of dermatological images to support melanoma diagnosis. In this paper we
describe details of the TT algorithm, including the management of non-idealities,
and an approximate measure of tree similarity in order to retrieve skin lesions with
a similar TT-based description.

1. Introduction
A fruitful representation of the image content, often exploited in many
tasks of understanding, recognition, and information retrieval by similarity,
is based on region segmentation; a richer description adds to the region's
attributes some relationships between regions, spatial and topological, that
describe the way we perceive the mutual relations between parts of the
image. To this aim, graph-based description is a powerful formalism to model
the knowledge extracted from the images of the regions of interest and their
relationships.
Moreover, the management of large volumes of digital images has generated
additional interest in methods and tools for real-time archiving

and retrieval of images by content3. Several approaches to the problem
of content-based image management have been proposed and some have
been implemented in research prototypes and commercial systems. In
some works, Attributed Relational Graphs (ARGs) have been introduced
as a means6,1 to describe the spatial relationships, and indexing techniques
have been proposed to speed up the matching based on the edit distance6
approach. In Petrakis' papers1,2 ARGs and edit distance are used for image
retrieval in medical image databases. Accordingly, we defined the Topological
Tree, a rich description model that can be constructed a posteriori,
after the region segmentation step, for each type of image. However, in
some applicative contexts, in which the inclusion is a key feature, the inclusion
property can be exploited for segmentation too. Thus we propose an
approach called Recursive-FCM (fuzzy c-means) that exploits both color
and inclusion to perform segmentation and, at the same time, the TT
construction. This algorithm has a general formulation but is meaningful in
applications that search for inclusion and color: typical examples are
dermatological images of skin lesions, which appear as skin zones darker than
the normal skin, with many nuances of skin color. Many techniques have been
proposed for color segmentation: among them, many have been adopted for
skin lesion segmentation, such as grayscale thresholding and color clustering7.
Fuzzy c-means (FCM) color clustering has been successfully adopted in the
work of Schmid8, which adds Principal Component Analysis (PCA) to FCM:
an FCM segmentation over the first two principal components of the color
space is shown to be meaningful and robust for skin lesion images. In a
recent work9 we described a recursive extension of that approach and here
we will show further improvements that take into account non-idealities.
Moreover, in the second part of the paper we propose an approximate measure
of tree similarity that can be exploited to search for similarities between
skin lesions in an image retrieval system.

2. Topological Relations
Given an image space and an 8-connection neighborhood system, that for
each point x_i defines the neighbor set N_{x_i}, segmentation by color clustering
aims to partition the image into a set of regions R = {R_1, ..., R_k} such
that U_i R_i = I and R_i ∩ R_j = ∅ for i ≠ j. To this aim, a clustering process
that groups pixels w.r.t. their color should embed or be followed by a pixel
connectivity analysis, according to the given neighborhood system.
Then a graph-based representation describes spatial and/or topological
Then a graph-based representation describes spatial and/or topological

relations between regions. An example is the adjacency graph, a graph
G(V, E) whose vertexes are the image regions (V = R) and whose arcs represent
the adjacency property, that is a neighborhood system at region level. In
this context adjacency is defined as follows:
Def. 1: A region R_i is adjacent to R_j ⟺ ∃ x_i ∈ R_i, x_j ∈ R_j : x_j ∈ N_{x_i}.
In addition to intra-region connectivity and inter-region adjacency, we
aim to evaluate the inclusion of a region into another, thus we need to formally
define inclusion between regions. First, we consider an “extended” set of
image regions R̃ = R ∪ {R_0}, where R_0 is a dummy region representing the
external boundary of the image. Then, we define the inclusion property as
follows:
Def. 2: A region R_i ∈ R is included in R_j ∈ R̃ ⟺ ∄ P = {R_1, ..., R_N} such that
R_0 ∪ R_i ∪ U_{n=1..N} R_n is a connected region and R_j ∉ P.
This definition means that it is not possible to draw a path of connected
points between region R_i and the border of the image space (R_0) that does not
include points of R_j. The transitive property holds for inclusion: if R_i is
included in R_j and R_j in R_t, then R_i is included in R_t. Thus a tree
model is a natural representation for inclusion.
This ideal definition must be relaxed for implementation purposes in
real images: thus we use an FCH-inclusion definition that substitutes, in
Def. 2, the filled convex hull of R_j for R_j itself. In this way, an inclusion
that is not exact in a topological sense is also accepted in real image descriptions.

3. Construction of the Topological Tree


Using the FCH-inclusion property we developed an algorithm that provides
both segmentation and tree description. In a previous work [9] we detailed the
color-based segmentation with recursive FCM. Here we add improvements
to deal with exceptions found in particular cases. The algorithm for TT
construction of Fig. 1 can be summarized as follows:

(1) it carries out a color-based segmentation in two clusters, using the
PCA and FCM algorithms [9];
(2) while segmenting, it builds the corresponding tree;
(3) it recursively applies the segmentation to the regions of interest
created by the previous steps of the algorithm.

In particular the algorithm finds the presence of a region that contains


all the others, inserts it in the tree and then continues to apply the algorithm

Figure 1. Flow chart of the RecursiveFCM(R_k, P_k) algorithm: extraction of the regions R_k and of their corresponding sets P_k, followed by a recursive call RecursiveFCM(R_k, P_k) for all regions.

to the other regions, obtaining a further partitioning.


In Fig. 1 it is possible to see the recursive structure of the Recursive-FCM
algorithm for the construction of the Topological Tree. Starting from
a region R_k (initially equal to the whole image I), the algorithm verifies whether
R_k can be further partitioned. To obtain regions of interest of uniform
color and significant area, the thresholds for the partitioning conditions are
given by the variance of the first two PCA components and by the size
of the extracted regions. If the partitioning condition is not verified, R_k is
inserted in the tree. Otherwise, R_k is clustered into more regions, the non-

significant areas are erased and the presence of an external one is searched for.
The remaining regions are organized in a structure that allows for a
correct recursion step. In particular, in the ideal case (which generates a
TT with a single child for each node), the FCM algorithm creates two clusters
and should create two regions, one including the other. In real images,
often many regions are created. If one of these can be chosen as “external”,
it becomes a new node, parent of the others; all the other regions are further
inspected in the recursion. However, some of these regions could also
present mutual inclusions and thus not allow a correct tree generation.
We call these regions suspended, since they need a specific management.

3.1. Search for an “external” region of R


The external region R_EXT of R is the region that FCH-includes all other
regions of R. This is the region that is searched for and added to the tree.
In formulae: R_EXT ∈ R ⟺ ∀ R_i ∈ R, R_i ≠ R_EXT ⟹ R_i is included in the
filled convex hull of R_EXT (is FCH-included).
Since it is much easier and faster to check the inclusion between the
extents (or bounding boxes) of two regions (extent-inclusion), it is possible to
use the observation that FCH-inclusion implies extent-inclusion to search
for R_EXT. This is accomplished by searching for a region R'_EXT such that all
remaining regions are extent-included in it; if such a region exists, it is necessary
to check whether all the other regions are also FCH-included in R'_EXT.
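A minimal sketch of this search, assuming regions are represented as boolean masks and using skimage for the filled convex hull (the helper names are ours, not the paper's):

import numpy as np
from skimage.morphology import convex_hull_image

def extent(mask):
    # bounding box (rmin, rmax, cmin, cmax) of a boolean region mask
    rows, cols = np.nonzero(mask)
    return rows.min(), rows.max(), cols.min(), cols.max()

def extent_included(inner, outer):
    ri0, ri1, ci0, ci1 = extent(inner)
    ro0, ro1, co0, co1 = extent(outer)
    return ro0 <= ri0 and ri1 <= ro1 and co0 <= ci0 and ci1 <= co1

def fch_included(inner, outer):
    # inner region entirely contained in the filled convex hull of the outer one
    return np.all(convex_hull_image(outer)[inner])

def find_external_region(regions):
    # return the index of the region whose filled convex hull contains all the
    # others (R_EXT), or None if no such region exists
    for k, cand in enumerate(regions):
        others = [r for j, r in enumerate(regions) if j != k]
        if not all(extent_included(r, cand) for r in others):
            continue                     # cheap necessary condition failed
        if all(fch_included(r, cand) for r in others):
            return k
    return None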

3.2. Use of “low interest” and suspended regions


The decomposition of a region R_k can cause the generation of regions with
negligible size. Such regions are considered of low interest for the interpretation
of images. We use a parameter to select the minimum area that
a region must have to be considered interesting. The regions that have to be
eliminated are collected, during the tree construction, in a specific structure for
later integration in the tree, after its complete construction.
After obtaining the set of regions R (in Fig. 1 from R =
partitioning(R_k)), eliminating “low interest” regions and inserting an external
region in the tree, it is not possible to call the algorithm directly on all the
extracted regions. In fact, the possible presence of inclusions between the
regions of R could lead to the construction of a wrong tree, with a loss
of inclusion relationships between children produced by different clusters.
Because of this, the concept of “suspended” regions has been introduced,

indicating with this term the set of all the regions that cannot be immedi-
ately analyzed, but must wait for the including one.
We thus consider the set R after the elimination of low interest regions
and the possible external one. From R, we distinguish between the regions R_k
not included in others and the sets P_k of regions included in R_k. Now, for each
region R_k the algorithm is recursively called along with its set P_k of suspended
regions.

R_NI = ∅
P_k = ∅ for each R_k
∀ R_a ∈ R
{
    if (∃ R_k ∈ R_NI : R_a is included in R_k)
        P_k = P_k ∪ {R_a}
    else
        R_NI = R_NI ∪ {R_a}
}

Figure 2. Pseudo-code for the integration of “low interest” regions

The process for finding all suspended regions is described in Fig. 2.
It is worth noting that a reduction of the search space is obtained by
ignoring all regions of P: in fact, the external region should contain not only
all regions of R but also all the suspended regions; by the transitive
property of inclusion, however, this is guaranteed by the fact that they are included
in regions of R.

4. Tree matching
The construction of the TT is the basis of a retrieval approach searching for tree
similarities. An interesting non-exact tree matching uses the edit distance
to compare two trees. It measures the cost of operations, such as adding
or eliminating nodes, needed to transform one tree into another. Unfortunately the

edit distance based approach has proved to be computationally
too expensive to be used without modifications in a search-for-similarities
context. We tested this approach over our databases using linear assignment
to explore the whole search space, but we found unacceptable response
times. Moreover, it can be difficult to describe the cost of an operation in
order to use it together with an inter-node, feature-based similarity. These
reasons led us to produce a quick, sub-optimal algorithm that heavily
relies on two assumptions:

(1) we can match only nodes on the same level of the tree;
(2) given two sets of nodes, taken from two trees, we match one against
the other without solving the associated linear assignment prob-
lem, but considering a sorting of the two sets and letting greater
importance nodes have first choice on the other set.

The first assumption strongly limits the search space: it is acceptable in
the dermatological context and is motivated by the observation that in our
images each level tends to represent a specific feature, such as the skin, the
lesion or its colored areas, so an inter-level matching does not always make
sense. The second one is a simplification that quickly produces good
results, without any assurance of reaching an optimum. An observation
that qualitatively justifies this choice is the fact that higher importance
nodes are weighted more in their contribution to the matching function, so
guaranteeing that they get a better match pushes towards a higher overall
match.
The algorithm works recursively comparing two sub-trees according to
the following steps:

(1) The roots are compared in a Euclidean feature space by the distance
d between their feature vectors. The feature vector can comprise color,
area, symmetry, texture and whatever other information of each region. An
equivalence measure is obtained as

E = 1 / (1 + d)

(2) Children equivalence is evaluated:
(a) Let us call T_1 the tree with more nodes and T_2 the other
one;
(b) The nodes of T_1 are considered in order of importance (evaluated
on the feature vector);

(c) Each child of T_1 is matched against all not yet assigned children
of T_2;
(d) After evaluating the equivalence of all nodes, the not assigned
children of T_1 are matched against the null vector and produce
a negative match.
(3) Total equivalence is given by

E_tot = E_root · ( (Σ_i I_i E_i^s) / (Σ_i I_i) + 1 ) / 2

where E_i^s is the signed equivalence of each node (with a matching
node or with the null vector), I_i is the importance of the node and
E_root is the equivalence of the roots, as previously defined.
The equivalence measure E is bounded between 0 and 1 and this guarantees
that E_i^s is in the range [-1, 1], so the weighted sum can give -1 in
case of total mismatch of the tree structure or 1 in case of perfect match.
This value is converted by the equation to the interval [0, 1] and used as
a reduction factor for the matching value of the roots. The interval shift
has the implicit property of reducing the influence of a mismatch at lower
levels of the tree.
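A minimal sketch of the greedy, level-by-level matching just described (our illustration: the node structure is assumed and the importance of a node is taken, for simplicity, to be its first feature; the returned score follows the formula of step (3)):

import numpy as np

class Node:
    def __init__(self, features, children=()):
        self.features = np.asarray(features, dtype=float)
        self.children = list(children)

def importance(node):
    return node.features[0]          # illustrative choice, e.g. relative area

def equivalence(a, b):
    return 1.0 / (1.0 + np.linalg.norm(a.features - b.features))

def tree_match(t1, t2):
    e_root = equivalence(t1, t2)
    big, small = (t1, t2) if len(t1.children) >= len(t2.children) else (t2, t1)
    free = list(small.children)
    signed, weights = [], []
    # higher-importance children of the bigger tree choose their match first
    for child in sorted(big.children, key=importance, reverse=True):
        if free:
            best = max(free, key=lambda c: tree_match(child, c))
            free.remove(best)
            signed.append(tree_match(child, best))        # in [0, 1]
        else:
            null = Node(np.zeros_like(child.features))
            signed.append(-equivalence(child, null))      # negative match
        weights.append(importance(child))
    if not weights:
        return e_root
    s = sum(w * e for w, e in zip(weights, signed)) / sum(weights)   # in [-1, 1]
    return e_root * (s + 1.0) / 2.0                       # shifted to [0, 1]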
In this way, given an image represented by its TT, we are able to find
in an image database other images with a similar TT on the basis of the
previous algorithm. Obviously the TT representation cannot be the only
approach to support a query-by-example system, and in melanoma diagnosis
a number of other features [10] on the whole lesion or its parts should be
considered in an integrated way. Nevertheless, this is a powerful representation
method that, integrated with proven dermatological criteria, can give
interesting results in retrieval.

5. Experimental Results
Experiments have been conducted on synthetic and on real images, first to
test the correct response of the adopted algorithm and then to verify
its applicability to real world images.

5.1. Synthetic Images


In Fig. 3 we report an example of a test over synthetic images: we search
for the similarity between S1 and the whole set. The images, their TTs and the
match scores are indicated. Obviously the matching value of 1.0000 for S1 means a

Match scores for S1-S5: 1.0000, 0.9005, 0.9999, 0.9928, 0.8962.

Figure 3. Synthetic images (from left to right: S1, S2, S3, S4, S5)

perfect match. All the other images present only minor variations from the
original image. Colors were ignored in this evaluation and the feature
vector distance is computed using only the distance of the center of
mass from that of the parent and the percentage of the parent area occupied by
the region. Thus S3 and S4 are very similar to S1, while S2 and S5 lack one
node of S1. The results follow a correct evaluation, giving the ability
to order the images by similarity to the first one.

5.2. Dermatological Images


Our application context is the analysis of dermatological images for
melanoma diagnosis; this family of images presents a natural partition
of color regions included one into the other because of the usual growth
process of the lesion; moreover, the position and size of the inner areas are
significant (as a diagnostic feature). In Fig. 4 some results of a query-by-example
search are shown and an overall good retrieval was observed. In particular,
ideal non-melanoma skin lesions have a TT described by a list (as the last
image in Fig. 4), while melanomas typically present a more complicated
structure. Unfortunately we have not yet had the possibility of including a
complete diagnostic set of features in the retrieval algorithm, so a quantitative
measure is not yet available.

6. Conclusions

We presented a segmentation technique able to extract the inclusion-adjacency
structure of the image, accompanied by a low computational cost matching
technique that enables a flexible feature search over trees instead of over
the whole images. Visual comparisons over synthetic and real images have
been shown to assess the promising potential of this methodology. We
would like to thank Fabio Zanella and other students for the code generation
and the tests performed.

Figure 4. Experiments on real images

References
1. E.G.M. Petrakis et al., Image Indexing Based on Spatial Similarity, Technical
Report MUSIC-TR-01-99, Multimedia Systems Institute of Crete (MUSIC), 1999.
2. E.G.M. Petrakis et al., Similarity Searching in Medical Image Databases,
IEEE Trans. Knowl. Data Eng. 9, 435-447 (1997).
3. A.W.M. Smeulders et al., Content-Based Image Retrieval at the End of the
Early Years, IEEE Trans. Pattern Anal. Mach. Intell. 22, 1349-1380 (2000).
4. M. Flickner et al., Query By Image and Video Content: The QBIC System,

Computer 28, 23-32 (1995).
5. A. Pentland et al., Photobook: Content Based Manipulation of Image
Databases, Int. J. Comput. Vis. 18, 233-254 (1996).
6. B.T. Messmer, Efficient Graph Matching Algorithms, PhD thesis, Univ. of
Bern, Switzerland, 1995.
7. S.E. Umbaugh et al., Automatic Color Segmentation Algorithms: With
Application to Skin Tumor Feature Identification, IEEE Engineering in
Medicine and Biology 12, 75-82 (1993).
8. Ph. Schmid, Segmentation of Digitized Dermatoscopic Images by Two-Dimensional
Color Clustering, IEEE Transactions on Medical Imaging 18, 164-171 (1999).
9. R. Cucchiara et al., Exploiting Color and Topological Features for Region
Segmentation with Recursive Fuzzy c-means, Machine Graphics and Vision
11, 169-182 (2002).
10. C. Grana et al., A New Algorithm for Border Description of Polarized Light
Surface Microscopic Images of Pigmented Skin Lesions, in press on IEEE
Transactions on Medical Imaging (2003).

An Integrated Environment for Control and


Management of Pictorial Information Systems

A. F. Abate, R. Cassino and M. Tucci

Dipartimento di Matematica e Informatica


Università di Salerno
84081 Baronissi, Salerno - ITALY
E-mail: {abate, rcassino, mtucci}@unisa.it

Abstract. The paper describes an integrated environment for the control and management
of pictorial information systems. We consider the diagnostic radiology
field as a case study. A system for filing and processing medical images is a
particular pictorial information system that requires managing information of a
heterogeneous nature. In this perspective, the developed environment provides
the medical user with tools to manage textual data and images in an integrated
way. A Visual Data Definition Language was designed and implemented, which
allows the administrator of the system to extend the current database on the basis
of new user queries: the insertion of new entities and the creation of new
relationships between them take place by simply manipulating the iconic representation
of the managed information. A Visual Query Language
provides a visual environment in which a user can query the database using
iconic operators related to the management of alphanumeric and pictorial
information, with the ability to formulate composed queries: from an alphanumeric
query, pictorial data contained in the database can be retrieved, and
vice versa.

1 Introduction

A pictorial information system is a system for analysis, storage and visualization of


information of heterogeneous nature: images and alphanumeric data.
Figure 1 shows the schematic diagram of a pictorial information system [1].
The progress in data transmission has led to the large-scale interconnection of many
workstations and to "multimedia communication" involving textual data, images, etc. At
first, images are digitized through an image acquisition device (e.g. a tomograph)
and then manipulated by software tools.
The phases that characterize the elaboration of an image are:
- digitalization;
- coding and data compression;
- quality improvement and restoration;
- segmentation;
- image analysis and description;
- pictorial information management.

Planning a pictorial information system requires evaluating the informative content of
an image in an objective way through measures of pictorial information. An image
storing system supports the electronic filing of image characteristics and allows
one to create and manage a pictorial database. A pictorial database is the nucleus of a
pictorial information system, whose essential feature is the coding of images. The image
coding process can be described in three phases:
- image partitioning into smaller elements;
- contour and geo-morphological feature extraction;
- construction of a meta-description of the image useful for indexing and
retrieval of pictorial information.

Figure 1. A Pictorial Information System (image transmission, communication net, storage).

In the field of diagnostic radiology the typical information to consider can be divided
fundamentally into two categories: data related to clinical record management,
which gives the medical user real support for the development of treatment protocols
for the examined patients, and data related to medical image elaboration,
filing and retrieval. The analysis of a medical image, the report of a radiology
examination, involves the execution of complex operations such as the detection of a possible
anomaly, the determination of its exact spatial position with respect to other objects
(elements of the human body) contained in the image, and the calculation of its geo-morphological
characteristics such as area, density, symmetry, etc.
In this work, we apply an icon-based methodology to devise an integrated environment
providing a set of tools for visual pictorial database definition and manipulation.
Section 2 describes the integrated environment in which the Visual Data Definition
Language and the Visual Query Language are defined. Section 3 presents the Medical
Image Management System and similarity retrieval techniques. Section 4 describes
the management of query results. Finally, section 5 presents future extensions of this
research.

2 The integrated environment

A system for the acquisition and elaboration of medical images is a pictorial information
system that requires managing textual data (related to the management of medical
records) and medical images. In the field of medical information systems the
elaboration of a radiological image consists of abnormality detection, determination
of geo-morphological characteristics, and evaluation of the spatial relationships between
the pathology and the anatomical organs in which it is located. On the basis of
this information, the retrieval of similar images, by means of appropriate retrieval
techniques, facilitates the formulation of a diagnosis and treatment plan for the examined
patients. In this perspective, a Visual Data Definition Language was introduced and
implemented; it allows the administrator of the system to extend the current database
on the basis of new user queries: the insertion of new entities and the creation of
new relationships between them take place by simply manipulating the iconic representation
of the managed information (Figure 2).

Figure 2. Define a new entity.

The realized Visual Query Language provides a visual environment in which the
user interrogates the database by means of iconic operators related to the management
of alphanumeric and pictorial information, with the ability to formulate composed queries:
from an alphanumeric query, pictorial data contained in
the database can be retrieved, and vice versa (Figure 3).

Figure 3. Composed query.

It is possible to store the results of a query in the database, in terms of the corresponding
iconic representation, allowing an efficient reuse of previously formulated queries.
The developed VDBMS is implemented in Java, using JDBC to connect to the underlying
RDBMS. In particular, the environment was developed as a web client application,
which guarantees portability on any platform and access to any (possibly remote)
RDBMS by means of a platform-independent, user-friendly interface.

3 Pictorial information management

A medical image is included in the database if it contains an abnormality, whose
geo-morphological characteristics are automatically obtained. Such characteristics are
computed once the image is segmented [2], [3], [4], [5] and the contour of the
detected pathology is extracted [6], [7], [8], [9] using algorithms already available in
the literature (Figure 4).
Once the contour of the anomaly located in the examined radiological image is extracted,
the analysis of a pathology proceeds with the calculation of its geo-morphological
characteristics [10], [11], [12], [13], [14]. This information will be
used for querying the database for similarity retrieval on the geo-morphological
characteristics of other images previously examined.
One of the principal problems to consider in pictorial database planning is to establish
an appropriate representation of images. An image can be described both on
the basis of the alphanumeric information that identifies the characteristics of its
form and on the basis of the spatial arrangement of the objects it contains.

Figure 4. Segmentation and contour extraction.

In the environment, a virtual image is linked to each examined image. The virtual
image is built using canonical representative objects of human body components,
which describe the content of the real image and the spatial relationships between the
contained objects and the detected anomaly in a compact manner (Figure 5).

Figure 5. Virtual image associated with an image.

The geo-morphological characteristics of the examined abnormality and the virtual


image are used to query the database to retrieve similar images.

The concept of similarity between two images is expressed in terms of the Euclidean
distance, in the space of the characteristics, between the points that represent them. The
geo-morphological characteristics used in the search strategy are: area, density,
asymmetry, orientation with respect to the centroid, spreadness and uniformity (Figure 6).
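As an illustration of this search strategy, a minimal sketch that ranks the stored abnormalities by the Euclidean distance of their geo-morphological feature vectors from the query (the feature ordering and the function name are assumptions):

import numpy as np

# feature order assumed: area, density, asymmetry, orientation, spreadness, uniformity
def retrieve_by_features(query_vec, database, k=5):
    # database: list of (image_id, feature_vector) pairs
    q = np.asarray(query_vec, dtype=float)
    scored = [(float(np.linalg.norm(q - np.asarray(f, dtype=float))), image_id)
              for image_id, f in database]
    scored.sort()                        # smaller distance = more similar
    return scored[:k]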

Figure 6. Retrieval by geo-morphological characteristics.

Given a real image im, the virtual image im_v associated with im is a pair (Ob, Rel)
where:
Ob = {ob_1, ob_2, ..., ob_n} is a set of objects of im;
Rel = (Rel_x, Rel_y) is a pair of sets of binary spatial relations over Ob, where Rel_x
(resp. Rel_y) contains the mutually disjoint subsets of Ob x Ob that express relations
holding between pairs of objects of im along the x-projection (resp. y-projection)
[15].
Let Q be the virtual image associated with the image used as a query for similarity
retrieval and im_vi the virtual image associated with one of the images examined for
possible retrieval. In the case of similarity retrieval by spatial relationships, the similarity
degree, denoted by Sim_deg(Q, im_vi), is a value belonging to the interval [0,1] that is
defined by a formula that considers how many objects of Q are contained in im_vi and
how many spatial relationships similar to those of Q are found in im_vi [16].
Therefore, if Sim_deg(Q, im_vi) is greater than or equal to the minimum similarity degree
specified in the query, the image will be retrieved. In the visual management environment
realized in this paper, we consider the relational algebra operators to query the database
with regard to the alphanumeric information it contains, and the operators Similarity
Retrieval and Similarity Retrieval By Virtual Image to allow the retrieval of medical
images similar to the query image (the one used to perform the search) and relative
to clinical cases previously examined (Figure 7).
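A toy sketch of how a similarity degree in [0, 1] over virtual images could be computed from shared objects and shared spatial relations, in the spirit of Sim_deg(Q, im_vi); the equal weighting below is our assumption and does not reproduce the exact formula of [16]:

def sim_deg(query_vi, img_vi):
    # a virtual image is assumed to be a pair (Ob, (Rel_x, Rel_y)),
    # with Ob a list of object labels and Rel_x, Rel_y sets of relation tuples
    q_ob, (q_rx, q_ry) = query_vi
    i_ob, (i_rx, i_ry) = img_vi
    if not q_ob:
        return 0.0
    obj_score = len(set(q_ob) & set(i_ob)) / len(set(q_ob))
    q_rel = set(q_rx) | set(q_ry)
    i_rel = set(i_rx) | set(i_ry)
    rel_score = len(q_rel & i_rel) / len(q_rel) if q_rel else 1.0
    return 0.5 * (obj_score + rel_score)

def similarity_retrieval_by_virtual_image(query_vi, db, threshold):
    # retrieve the images whose similarity degree reaches the requested minimum
    return [img_id for img_id, vi in db if sim_deg(query_vi, vi) >= threshold]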

Figure 7. Operators to perform similarity retrieval: Similarity Retrieval and Similarity Retrieval by Virtual Image.

The first allows choosing the query image according to the practice number, year
of filing and form number of the clinical diary to which the image refers; the second
allows starting the retrieval from the virtual image associated with the query
image and inserted in the list of Analyzed Abnormalities (Figure 8).

Figure 8. “Analyzed Abnormalities” list.

The user can access the alphanumeric data related to an image by clicking on the
correlated CT scan and, if needed, can run the retrieval from the extracted image.
Besides, the possibility of formulating “combined queries” was implemented: information
on the pictorial data contained in the database can be obtained from an alphanumeric
query and vice versa.
The realized visual environment also implements the possibility of creating customizable
lists of sites of interest, to be consulted in real time by running the browser directly
from the interface.

4 Query results management

In the designed integrated environment, the possibility to store the results of a
query in the database, in terms of the corresponding iconic representation, allows
their efficient reuse and avoids the reformulation of queries already made.
To save the results of an alphanumeric query (Figure 9), two procedures were
implemented.

Figure 9. Saving a query.

If a query is saved as Query formulation, each time it is launched the query will be
performed on the current data of the examined table; otherwise, if it is saved as Query result,
the data present at the date of query storing will be saved.
The icons related to queries saved as Query formulation are inserted in the Stored
Researches; the icons related to queries saved as Query result are inserted in the Old
Researches. By clicking on either icon, the results of the associated operation will be
visualized.

5 Conclusion and further work

The portability of the environment, the formulation of alphanumeric and pictorial queries,
the possibility to store the results, to perform combined queries and to perform
consultations on the WWW are particularly useful tools for a medical user in the formulation
of diagnoses and in the development of treatment protocols for the examined
patients.

Future developments will concern the study of techniques for the analysis of medical
images of different types (mammography, etc.) and the improvement of multi-user
management and of remote access to different RDBMSs.

6 References

[1] S.K. Chang, “Principles of Pictorial Information System Design”, Prentice Hall, 1989.
[2] S. G. Carlton and R. Mitchell, “Image segmentation using texture and gray level”, Proc. IEEE
Conf. Pattern Recognition and Image Processing, Troy, New York, pp. 387-391, 6-8 June 1977.
[3] G. B. Coleman, “Image segmentation by clustering”, Report 750, University of Southern
California Image Processing Institute, July 1977.
[4] S. Vitulano, C. Di Ruberto, M. Nappi, “Different methods to segment biomedical images”,
Pattern Recognition Letters, vol. 18, (1997).
[5] A. Klinger and C. R. Dyer, “Experiments on picture representation using regular decomposition”,
Computer Graphics and Image Processing 4, 360-372 (1976).
[6] Alan Bryant, “Recognizing Shapes in Planar Binary Images”, Pattern Recognition, vol. 22,
pp. 155-164, (1989).
[7] F. Gritzali and G. Papakonstantinou, “A Fast Piecewise Linear Approximation Algorithm”,
Signal Processing, vol. 5, pp. 221-227, (1983).
[8] James George Dunham, “Optimum Uniform Piecewise Linear Approximation of Planar
Curves”, IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI, vol. 8, no. 1,
(1986).
[9] G. Papakonstantinou, “Optimal Polygonal Approximation of Digital Curves”, Signal Processing,
vol. 8, pp. 131-135, (1985).
[10] T. H. Cormen, C. E. Leiserson, R. L. Rivest, “Introduzione agli algoritmi”, cap. 35,
pp. 835-864, (1996).
[11] Jia-Guu Leu, “Computing a Shape's Moments from its Boundary”, Pattern Recognition,
vol. 24, no. 10, pp. 949-957, (1991).
[12] Mark H. Singer, “A General Approach to Moment Calculation for Polygons and Line Segments”,
Pattern Recognition, vol. 26, no. 7, pp. 1019-1028, (1993).
[13] Bing-Cheng Li and Jun Shen, “Fast Computation of Moment Invariants”, Pattern Recognition,
vol. 24, no. 8, pp. 807-813, (1991).
[14] Jin-Jang Leou and Wen-Hsiang Tsai, “Automatic Rotational Symmetry Determination for
Shape Analysis”, Pattern Recognition, vol. 20, no. 6, pp. 571-582, (1987).
[15] M. Sebillo, G. Tortora, M. Tucci and G. Petraglia, “Virtual Images for Similarity Retrieval in
Image Databases”, IEEE Trans. on Knowledge and Data Engineering, vol. 13, no. 6, Nov.-Dec.
2001, pp. 951-967.
[16] A. F. Abate, M. Nappi, G. Tortora and M. Tucci, “IME: an image management environment
with content-based access”, Image and Vision Computing, vol. 17, n. 13, pp. 967-980, 1999.

A LOW LEVEL IMAGE ANALYSIS APPROACH TO


STARFISH DETECTION

V. DI GESU, D. TEGOLO
Università di Palermo
Dipartimento di Matematica ed Applicazioni
via Archirafi 34, 90123 Palermo, Italy
{digesu,tegolo}@math.unipa.it

F. ISGRO, E. TRUCCO
Heriot-Watt University
School of Engineering & Physical Science
Edinburgh EH14 4AS, U.K.
{fisgro,e.trucco}@hw.ac.uk

This paper introduces a simple and efficient methodology to detect starfish in video sequences
from underwater missions. The nature of the input images is characterised by a low
signal/noise ratio and the presence of a noisy background represented by pebbles; this makes
the detection a non-trivial task. The procedure we use is a chain of several steps that, starting
from the extraction of the areas of interest, ends with the classification of the starfish.
Experiments report a success rate of 96% in the detection.

1. Introduction
Underwater images have been used recently for a variety of inspection tasks, in
particular for military purposes such as mine detection, for the inspection of underwater
pipelines, cables or platforms [8], or the detection of man-made objects [7].
A number of underwater missions are for biological studies, such as the inspection
of underwater life. Despite the large number of such missions, and the fact that
image analysis techniques are starting to be adopted in the fish farming field [6],
the majority of the inspection of the video footage recorded during a mission
is still done manually, as research trying to use image analysis techniques for
biological missions is relatively new [4, 10].
In this paper we present a simple system for the analysis of underwater video
streams for biological studies. In particular, our task is the detection of starfish in
each frame of the video sequence. The system presented here is the first stage of
each frame of the video sequence. The system presented here is the first stage of

a more complex system for determining the amount of starfish in a particular area
of the sea-bottom.
The problem we tackle in this work is non-trivial for a number of
reasons; in particular: the low quality of underwater images, resulting in a very low
signal to noise ratio, and the different kinds of possible backgrounds, as starfish can be
found on various classes of sea-bottoms (e.g., sand, rock).
The system we present here is a chain of several modules (see Figure 1) that
starts from the extraction of areas of interest in the image and has as its last module
a classifier that discriminates the selected areas between the two classes of starfish
and non-starfish. Experiments performed on a sample of 1090 candidates report
an average success rate for the detection of 96%.
The paper is structured as follows. The next section gives an overview of the
system. The method adopted for selecting areas of interest is described in section
3. In section 4 we describe the features that we extract from the areas of interest for
the classification, and section 5 briefly discusses the classification methodology
used for this system. Experimental results are reported and discussed in section 6,
and section 7 is left to final remarks and future developments.

2. System overview
The system, depicted in Figure 1, works as a pipeline of the following four differ-
ent modules:
(1) Data acquisition: each single frame of the underwater video sequence
(live video or recorded off-line), is read by the system for processing;
(2) Extraction of areas of interest: candidate starfish are extracted from the
current frame (section 3);
(3) Computation of shape indicators (features): for each candidate a set of
features is computed; the features chosen are a set of shape descriptors
(section 4).
(4) Classification: this module discriminates the candidate starfish between
starfish and non-starfish, using the features extracted by the previous module
(section 5).

3. Selection of areas of interest


This first module detects areas of the image likely to include starfish. The objec-
tive of this module is to select everything that can be a starfish, regardless of the
number of false positives that can be extracted: it will be the classification module
taking care of discarding the false positives.

Figure 1. Schematic representation of the detection system (system modules and adopted algorithms): digitize tape, extraction of connected regions, computation of the geometrical, morphological and histogram indicators, classification.

The method adopted is very simple. We first binarise the image using a simple
adaptive threshold [2], computing local statistics for each pixel (mean value μ and
standard deviation σ) in a window of size 7 x 7. From the binary image all the
connected components are extracted and the small ones are filtered out using
the simple X84 rejection rule [3], an efficient outlier rejection method for robust
estimation.
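A minimal sketch of this module, assuming a grayscale frame stored in a numpy array; thresholding on the deviation from the local statistics and the median/MAD size cut used in place of the exact X84 rule are our assumptions:

import numpy as np
from scipy.ndimage import uniform_filter, label

def candidate_regions(gray, win=7, k=1.0):
    # local mean and standard deviation on a win x win window
    g = gray.astype(float)
    mu = uniform_filter(g, size=win)
    sigma = np.sqrt(np.maximum(uniform_filter(g * g, size=win) - mu * mu, 0.0))
    binary = np.abs(g - mu) > k * sigma          # pixels far from local statistics
    labels, n = label(binary)                    # connected components
    if n == 0:
        return []
    sizes = np.bincount(labels.ravel())[1:]      # component areas
    # size filter in the spirit of the X84 rejection rule (median +/- 5.2 MAD)
    med = np.median(sizes)
    mad = np.median(np.abs(sizes - med))
    keep = np.flatnonzero(sizes >= med - 5.2 * mad) + 1
    return [labels == i for i in keep]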

4. Features extraction
The definition of suitable shape indicators is essential for the classification phase.
In our case the shape indicators have been suggested by the morphological struc-
ture of the starfish. We identified three indicators that are combined into a feature
vector to discriminate the connected components extracted between starfish and
noise.

Geometric indicator. The convex hull of the connected component is computed,
then a geometric shape indicator, ρ, is defined as:

ρ = a_cc / a_ch

where a_cc is the area of the connected component and a_ch is the area of
the convex hull. Small values of ρ will mostly represent starfish.

Morphological indicator. The morphological shape indicator, θ, is computed by
applying the morphological opening operator to the connected component:

θ = a_oc / a_cc

where a_oc is the area of the result obtained by applying the opening to the connected
component. Starfish are likely to return small values for the θ indicator.

Histogram indicator. This indicator, η, is based on the statistics (mean values, μ,
and variances, σ) of the histograms by row and by column of the component to be
analysed.
Small values of this indicator characterise a uniform distribution of the pixels of
the component; therefore starfish components will have small values of η.
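A minimal sketch of the first two indicators on a boolean component mask, using skimage for the convex hull and the binary opening (the disk radius is an illustrative choice, not specified in the paper):

from skimage.morphology import binary_opening, convex_hull_image, disk

def geometric_indicator(component):
    # rho = area of the component / area of its filled convex hull
    return component.sum() / convex_hull_image(component).sum()

def morphological_indicator(component, radius=3):
    # theta = area after morphological opening / area of the component
    opened = binary_opening(component, disk(radius))
    return opened.sum() / component.sum()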

Figure 2. Examples of input images.

5. The classifier
For the classification module we adopted a simple Bayesian classifier [1]. Let C_1
and C_2 represent the starfish class and the non-starfish class respectively, and let
x be a vector in the feature space. What we want is to compute the a posteriori
probabilities P(x|C_i) of a vector x belonging to the class C_i, and assign the vector
x to the class having the largest P(x|C_i).
Bayes' formula states that

P(C_i|x) = P(x|C_i) P(C_i) / P(x).
Assuming a Gaussian model for the a priori probabilities of the two classes of
vectors in the feature space (ρ, θ, η), a uniform distribution for P(x)
(i.e., P(x) = 1), and assuming that P(C_1) = P(C_2), we get that the a priori

Figure 3. Examples of the components extracted from the video sequences. The first row shows examples
of starfish; the second row shows a selection of elements from the non-starfish class.

probability P(C_i|x) equalises the a posteriori probability P(x|C_i). Therefore we
can perform the classification by comparing the two a priori probabilities.
It is worth noticing that what we consider as the non-starfish class is not
everything that is not a starfish, but only material that can be found on the sea-
bottom together with starfish, mainly pebbles. Therefore our non-starfish class is
well defined and it can be seen from Figure 4 that most of the feature vectors fall
in a bounded region of the feature space. This can justify the use of a Gaussian
distribution to model the non-starfish class, although the cluster formed by the
feature vectors of this class is not as well shaped as the one formed by the
starfish class.
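A minimal sketch of such a classifier, assuming the two Gaussians are estimated from labelled training vectors (rho, theta, eta) and that, with equal priors and uniform P(x), the decision reduces to comparing the two class densities:

import numpy as np
from scipy.stats import multivariate_normal

class TwoClassGaussianBayes:
    # equal-prior Bayesian classifier with one Gaussian per class
    def fit(self, X_starfish, X_non_starfish):
        self.models = []
        for X in (X_starfish, X_non_starfish):
            X = np.asarray(X, dtype=float)
            self.models.append(multivariate_normal(mean=X.mean(axis=0),
                                                    cov=np.cov(X, rowvar=False)))
        return self

    def predict(self, x):
        # 0 = starfish, 1 = non-starfish: pick the class with the larger density
        return int(np.argmax([m.pdf(x) for m in self.models]))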

6. Experimental results
We tested our system on different video sequences obtained as different chunks
of a long video from an underwater mission. We classified manually a number of
connected components from three different video sequences.
A set of 394 components (197 starfish and 197 non-starfish) from the first
video sequence, were used as training set in order to estimate the two Gaussian
distributions. The two clusters of points in the feature space relative to the training
set are shown in Figure 4.
A second set of 348 components, divided in 174 starfish and 174 non-starfish,
and a third set of 742 components, divided in 371 starfish and 371 non-starfish,

have been used as test sets. The two sets were extracted from the second and third
video sequence respectively. The results are reported in Table 1. In general we can
observe that the success rate in classifying elements from the starfish class is high
(in the order of 98%), which is a very good result for such a simple classifier. The
error in classifying elements from the non-starfish class is higher (in the order of 7%).
This is due to the fact that we included among the non-starfish some components
that are small parts of a starfish (such as tentacles), and these have morphological
properties similar to the starfish. A way to overcome this problem is to identify
a feature discriminating between starfish and this sub-class, and either adopt
a multistep classifier or add this feature to the feature space if different from the
three adopted.

Table 1. Results of the experiments on the two test sets. %E = error percentage,
#E = number of errors, MCS = mis-classified starfish, MCNS = mis-classified non-starfish.

             #Components      Total %E  #E   MCS %E  #E   MCNS %E  #E
Test1024b    348 (2 x 174)    3.7       14   1.72    3    6.3      11
Tes21550b    742 (2 x 371)    4.8       36   2.1     8    7.5      28

7. Conclusions
This paper presented a system for the detection of starfish in underwater video
sequences. The system is composed of a chain of modules which ends with a
Bayesian classifier that discriminates whether an area of interest extracted from the input
image represents a starfish or not. Experiments performed on a number of images
(more than 1000) show that our system has a classification success rate of 96%.
The system can be developed and improved in a number of ways. Most of
them regard the classification module. First the classification module could im-
plement modern and sophisticated learning techniques (e.g., support vector ma-
chines). We might also associate to each classification a confidence level (for
instance, a candidate is classified as a starfish with 90% confidence). Moreover
we might think of extending the classification to more classes, discriminating among
different species of starfish. We will need more than the three features described
in section 4, and it might be useful to use more than one classifier.
So far the system works on single frames. An interesting and useful extension
is to count the amount of starfish in a video sequence. To this purpose we need
to remember the starfish seen and counted in previous frames. Therefore a track-
ing module (which tracks starfish in consecutive frames) must be introduced, and

Figure 4. Plot of the distribution of the training set in the feature space. The dark points represent
elements in the non-starfish class, the grey crosses elements in the starfish class.

several candidate algorithms have been identified. Starfish counting also requires
identifying and handling occlusions between starfish.

Acknowledgements
We thank Dr. Ballaro for useful discussions. This work has been partially supported
by the following projects: the EIERO project under grant number EU-Contract
HPRI-CT-2001-00173; the international project for universities scientific cooperation
CORI May 2001-EF2001; COST action 283. The test data were provided by
Dr. Anthony Grehan (Martin Ryan Marine Science Institute, University College,
Galway, IRELAND).

References
1. R. O. Duda, P. E. Hart, and D. G. Stork. Pattern classification. Wiley, 2001.
2. R. C. Gonzales and R. E. Woods. Digital image processing. Addison Wesley, 1993.
3. F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel. Robust Statistics:
the approach based on influence functions. John Wiley & Sons, 1986.
4. D. M. Kocak, N. da Vitoria Lobo, and E. A. Widder. Computer vision techniques
for quantifying, tracking, and identifying bioluminescent plankton. IEEE Journal of
Oceanic Engineering, 24(1):81-95, 1999.

5. S. Marchand-Maillet and Y. M. Sharaiha. Binary digital image processing. Academic
Press, 1982.
6. F. Odone, E. Trucco, and A. Verri. Visual learning of weight from shape using support
vector machine. In Proceedings of the British Machine Vision Conference, 1998.
7. A. Olmos and E. Trucco. Detecting man-made objects in unconstrained subsea videos.
In Proceedings of the British Machine Vision Conference, 2002.
8. A. Ortiz, M. Simo, and G. Oliver. Image sequence analysis for real-time underwater
cable tracking. In Proceedings of the IEEE Workshop on Applications of Computer
Vision, pages 230-236, 2000.
9. J. Serra. Image analysis and mathematical morphology. Academic Press, 1982.
10. M. Soriano, S. Marcos, C. Saloma, M. Quibilan and P. Alino. Image classification
of coral reef components from underwater color video. In Proceedings of the
MTS/IEEE OCEANS Conference, volume 2, pages 1008-1013, 2001.

A COMPARISON AMONG DIFFERENT METHODS IN


INFORMATION RETRIEVAL

F. CANNAVALE AND V. SAVONA

Dipartimento di Scienze Mediche, Facoltà di Medicina,
v. S. Giorgio 12, 09124, Cagliari, Italy
E-mail: vsavona@pacs.unica.it

C. SCINTU
Dipartimento Ingegneria del Territorio, Facoltà di Ingegneria
p.zza d'Armi, 09123, Cagliari, Italy
E-mail: cescintu@unica.it

In this paper we propose a comparison among algorithms (HER and HEAT, which appeared in
the literature in the last three years) and classical elaboration and transformation methods
(such as DFT, Wavelet and Euclidean Distance), when applied to information retrieval
over several multimedia databases, chosen under specific experimental criteria.
The first database is a collection of Brodatz textures, to which we applied some linear
and non-linear transformations; this choice was due to the wide popularity of the
above mentioned textures in the scientific community and to the ease of giving a
visual interpretation of the results obtained with the applied transformations. The
second database contains several mammographies, characterized by both benignant and
malignant lesions, while the last database is an aerophotogrammetric image of Cagliari's
district area. The choice of the last two databases was due to the high degree of difficulty
of their image content.

1. Introduction
The problem of image classification and retrieval by content, based only on the
actual content of the pictorial scene, is a hard one. As it turns out, human beings
are extremely good at recognizing shapes and textures independently of their
position and orientation, but we are much less confident when programming a machine
to achieve the same task; finding an automated technique to solve the pattern
recognition problem by computer is a daunting task and no general solution is
yet available, even if scientists have worked out solutions for specific problems in
restricted areas.
In the scientific literature the proposed techniques fall almost invariably in
the category of feature extraction methods, whose key idea is to analyse the
pictorial scene in order to obtain n numerical features. In this way an image is
mapped from the Image (or pixel) Space into a single point in an n-dimensional
Feature Space, where traditional -and exact- spatial access methods may be used
to retrieve points (i.e. images) that are close to a query image. This type of user

interaction paradigm is called “query by example”, because the user supplies an


example (the query image) and the system looks for images that are “near” in
some sense.
In order to place the present work into some perspective, we now briefly
review the considered well known techniques.
Given a discrete signal f(t) in the Time Space, we use a linear
transformation which associates the amplitude of the signal at the considered
time t ∈ [0,T], where T is the total duration of the signal, with its energy value in
the Energy Space, where we obtain as many feature-energy points as the values
of t in the [0,T] interval.
We define the Euclidean Distance between two signals, both in the Feature-Energy
and in the Time Space, as the difference of the areas between the graphs
of the signals and the x axis in the considered interval, i.e. the difference of the
corresponding definite integrals.
We have to notice that the difference between two integrals is not fully
acceptable as a comparison of signals, because there are infinitely many signals, with
completely different shapes, which are characterized by the same Euclidean
Distance both in the Feature and in the Time Space.
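A minimal numpy sketch of this area-based distance, followed by an example of two clearly different signals that it cannot distinguish (the sampling step and the function name are assumptions):

import numpy as np

def area_distance(f1, f2, dt=1.0):
    # difference of the areas between each signal and the x axis on [0, T],
    # approximated with the trapezoidal rule (samples spaced by dt)
    return abs(np.trapz(f1, dx=dt) - np.trapz(f2, dx=dt))

t = np.linspace(0.0, 1.0, 1001)
print(area_distance(np.ones_like(t), 2.0 * t, dt=t[1] - t[0]))   # ~0: same area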
The second transformation used in this work is the DFT (Discrete Fourier
Transform), which, under a certain approximation, may be considered a linear
transformation that maps a signal f(t), defined in the Time Space, into the
Frequency Space.
The comparison between two signals f1(t) and f2(t) defined in the Time
Space is realized by calculating the Euclidean Distance between their
representations, using a stated harmonic content for each of the considered
signals [1].
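A minimal sketch of such a comparison, truncating the DFT of each signal to a stated number of harmonics and taking the Euclidean distance between the magnitude vectors (the truncation convention is our assumption):

import numpy as np

def dft_distance(f1, f2, n_harmonics=17):
    # Euclidean distance between the first n_harmonics DFT magnitudes
    # of two equally long signals
    F1 = np.abs(np.fft.rfft(f1))[:n_harmonics]
    F2 = np.abs(np.fft.rfft(f2))[:n_harmonics]
    return float(np.linalg.norm(F1 - F2))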
A further transformation taken into consideration in the present work is the
wavelet decomposition; if Fourier analysis consists of breaking up a signal into
sine waves of various frequencies, wavelet analysis is the breaking up of a signal
into shifted and scaled versions of the original (or mother) wavelet. A wavelet is
a waveform of effectively limited duration that has an average value of zero [2].
One major advantage afforded by wavelets is the ability to perform local
analysis, i.e. to analyze a localized area of a larger signal.
Wavelet analysis is capable of revealing aspects of data that other signal
analysis techniques miss, aspects like trends, breakdown points, discontinuities
in higher derivatives, and self-similarity.
Furthermore, because it affords a different view of data than that presented
by traditional techniques, wavelet analysis allows compressing or de-noising a
signal without appreciable degradation. Indeed wavelets have already proven

themselves to be a useful tool in the signal processing field and continue to
enjoy a burgeoning popularity today.
The last methodology we present in this work is the non-linear
transformation HER (Hierarchical Entropy-based Representation) [3][4].
Given a signal f(t) in the Time Space, we can represent it by following these
steps: first we choose hierarchically the absolute maxima of the considered
signal, then for each of the maxima we evaluate the corresponding Gaussian,
whose height is the value of the energy related to the considered maximum,
and the associated entropy, whose measure is expressed by the following
relationship:

S_i = [E_i - σ_i, E_i + σ_i] / E_i

where E_i is the energy value for the considered absolute maximum and σ_i is its
standard deviation.
We transform a signal f(t) from the Time Space into the Entropy Space by
adopting the following criterion: we consider the entropy values associated
with the absolute maxima following their hierarchical extraction order, then we
place these entropy values in the Distance/Entropy Space, with the
first maximum at the position x = 0.
HER is characterized by some interesting properties: it is invariant with
respect to translations of the signal (i.e. to the amplitude, to the time-shift and
to the initial phase-shift).
HEAT introduces a linear transformation that allows us to extend the HER
transform from 1D signals to images.


This paper is organized as follows: Section 2 gives and discusses
experimental results; Section 3 concludes our study.

2. Experiments and Results


In the experiments we focused our attention on the results of the different image
processing methods when applied to three diverse databases; the choice of the
above mentioned databases aimed to highlight the different characteristics
of the proposed methods.
For each database, the experiments aimed at establishing whether the retrieval results
were in good agreement with objective and/or human similarity judgement, and
to what extent.

The first experimental test was focused on studying the behaviour of the
methods with respect to linear and non-linear transformations of the considered
signals.
The first database was a set of 256 signals, obtained from 16 different
Brodatz textures [5]. For each of the textures we selected a 32 x 32 pixel area,
which produces a 1D signal of 1024 pixel length when HEAT is applied; these
steps generate a set of 16 one-dimensional signals.

Figure 1 Selection of tiles obtained from the Brodatz textures augmented with
transformed versions of 16 original signals (first database).

We applied several linear and non-linear transformations to the set of 1D
signals: the former were amplitude shifts, translations, mirror-reflections and
rotations by integer multiples of π/2, the latter were different histogram stretchings
and noisy versions with different amounts of added Gaussian noise.
As easily predictable, the linear transformations produced variations of the power
spectrum of the Fourier transforms, while the non-linear transformations involved
several changes in the high frequency components of the Fourier spectrum.
The obtained results confirmed the expected effects of the applied
transformations, as shown in Figure 2; it is noteworthy that the performance of
the DFT changes when we augment the number of reconstruction
components of the query signal: the bigger the number of components, the
worse the result.

Figure 2. Behaviour of the inverse DFT for different numbers of reconstruction harmonics (17, 34, 500, 1000): position of the first false alarm and number of matches in the first 15 retrieved.
Figure 3 shows a signal belonging to the first database (bark element) and
the same signal when reconstructed using 17 harmonics in the Inverse Fourier
reconstruction.

Figure 3. Original Bark Brodatz signal (top) and the same signal after the inverse DFT
obtained with the first 17 harmonics (bottom).

We can notice that the inverse DFT produces a set of almost overlapping signals
when applied to the set of 16 relevant matches (i.e. to the transformed versions
of the query image, including the query itself) using a
restricted number of components.
These results are also clearly shown in Table 1, which includes the
comparison among the proposed methods.

Table 1. Comparison among methods for the Brodatz Bark query.

                             HER     Euclidean Distance   Fourier   Wavelet
#Correct Tiles               15
1st #False Alarm position    12      11                   12        12
#False Dismissal             4       5                    4         4
Normalized Recall            0.969   0.927                0.969     0.969

The second database is a collection of mammographies, which includes 49
types of breast cancers (mass/calcification/microcalcification) of both benignant
(24 images) and malignant (25 images) nature; the diagnosis of each of the
considered radiological images is confirmed by bioptic exams [6].
For each of the mammographies we selected a 32-pixel square area, which
produces a set of 49 one-dimensional signals of 1024 pixels length.

Figure 4. Collection of 49 mammographies of benignant and malignant breast cancers (second database).

The graphs of the results of the different methods are shown in Figure
5a and Figure 5b, where a benignant and a malignant query is adopted,
respectively.

Figure 5a. Retrieval trend with a benignant query: number of matches vs. tiles for the theoretical response, HER, Euclidean Distance, Fourier and Wavelet.

Figure 5b. Retrieval trend with a malignant query: number of matches vs. tiles for the theoretical response, HER, Euclidean Distance, Fourier and Wavelet.

The wide range of variability of the signals belonging to this database and
their non-periodic nature are the most relevant factors in the qualitative and
quantitative response of the considered methods [7][8][9][10].
In Figures 5a and 5b we can observe that HER again gives results very close
to the theoretical response, while the DFT and Wavelet behaviours are similar
to each other but less performing with respect to HER.
The third and last database is a portion of an aerophotogrammetry of
Cagliari's district area, acquired at an altitude of 10,000 meters; the image is
characterized by rural communication roads, extended plantations (trees and
horticultural crops) and farms (Figure 6).

Figure 6 Aerophotogrammetry of Cagliari’s district area (third database).

The aerophotogrammetric image was divided into 6400 portions (i.e.
signals), each of them characterized by a length of 100 pixels.
The results of the application of all the considered methods are shown in
Figure 7, where a tree query is adopted; a comparison between the results obtained
on the second and the third databases reveals that the trends of the different
methods are qualitatively similar.

Figure 7. Results of the different methods with a tree query.

3. Conclusions
In this work we have faced the problem of image retrieval efficiency; four
different methods were compared when applied to three diverse databases.
Experimental results show the robustness of both HER and HEAT, as also
stated in some previous works. We were also able to show that the above
mentioned methods give results comparable to DFT, Wavelet and
Euclidean Distance, despite the intrinsically non-linear nature of HER and HEAT.
Textures, medical and aerophotogrammetric images have been considered as
databases for our experiments. The obtained results call for some further
considerations; the use of the Brodatz database allowed us to show that the
behaviour of HER, DFT and wavelet is quite similar. The comparison among methods
gives more interesting results when applied to the other proposed databases; in
fact, while the behaviour of HER is generally the best one, the behaviour of DFT,
wavelet and Euclidean Distance is worse and almost invariable with respect to the
considered database; among the last three methods we are not able to establish
which of them works generally better. Anyway, the Euclidean Distance is the least
performing among the considered methods.

The amount of information considered in the retrieval process was also taken
into account; it is important to point out that HER
gives its appreciable results considering only 20% of the whole information
content of the signals of each database.

Acknowledgements
The authors would like to thank Marco Cabras and Maria Giuseppina Carta of
the Provincia di Cagliari for their help in obtaining permission to use
some portions of the digital images of Cagliari's district area.

References
1. Brandt S., Laaksonen J., Oja E., Statistical Shape Features in Content-based Image Retrieval, Proc. of ICPR, Barcelona, Spain, September 2000.
2. Teolis A., Computational Signal Processing with Wavelets, Birkhauser, 1998.
3. Casanova A., Fraschini M., Vitulano S., Hierarchical Entropy Approach for Image and Signals Retrieval, Proc. FSKD02, Singapore, L. Wang et al. Editors.
4. Distasi R., Nappi M., Tucci M., Vitulano S., CONTEXT: A Technique for Image Retrieval Integrating CoNtour and TEXture Information, Proc. of ICIAP 2001, Palermo, Italy, 224-229, IEEE Comp. Soc.
5. Brodatz P., Textures: A Photographic Album for Artists and Designers, Dover Publications, New York, 1966. Available in a single .tar file: ftp://ftp.cps.msu.edu/pub/prip/textures/
6. Suckling J., Parker J., Dance D.R. et al., The Mammographic Image Analysis Society Digital Mammogram Database, in Digital Mammography, Gale, Astley, Cairns Eds., pp. 375-378, Elsevier, Amsterdam, 1994.
7. Issam El Naqa, Yongyi Yang, et al., Content-based Image Retrieval for Digital Mammography, ICIP 2002.
8. Acharyya M., Kundu M.K., Wavelet-based Texture Segmentation of Remotely Sensed Images, Proc. of ICIAP 2001, Palermo, 69-74, IEEE Computer Society.
9. Wang J.Z., Wiederhold G., Firschein O., Wei S.X., Content-based Image Indexing and Searching Using Daubechies Wavelets, Int. Jour. Digit. Libr., 1997, 1:311-328, Springer Verlag.
10. Chang R.F., Kuo W.J., Tsai H.C., Image Retrieval on Uncompressed and Compressed Domain, ICIP 2000.

HER: APPLICATION ON INFORMATION RETRIEVAL

A. CASANOVA AND M. FRASCHINI


Dipartimento di Scienze Mediche Internistiche, Facoltà di Medicina e Chirurgia
Via San Giorgio 12,
Cagliari, 09124, Italia
E-mail: {Casanova,fraschini}@pacs.unica.it

This paper presents an overview and some remarks on an indexing technique (HER) for image retrieval based on contour and texture data, and shows the latest results obtained with Brodatz textures, aerial photographs and medical image datasets. The method encodes a 2-dimensional visual signal into a 1-D form in order to obtain an effective technique for content-based image indexing. This representation is well-suited to both pattern recognition and image retrieval tasks. Our experimental results have also shown that the hierarchical entropy-based approach can improve the detection of suspicious areas and the diagnostic accuracy.

1. Introduction

The Information Retrieval field has generated additional interest in methods and
tools for multimedia database management, analysis and communication.
Multimedia computing systems are widely used for everyday tasks and, in particular, image databases represent the most common type of application; it is important to extend the capabilities of such an application field by developing multimedia database systems based on retrieval by content. Searching for an image in a database is a complex issue, especially if we restrict the queries to approximate or similarity matches.
A variety of techniques and working prototypes for content-based image indexing systems exist in the literature.
This paper presents an overview and some remarks on an indexing technique, the Hierarchical Entropy-based Representation, for image retrieval based on contour and texture data, and shows the latest results obtained with Brodatz textures, aerial photographs and medical image datasets. Our method has proved effective in retrieving images in all cases under investigation and it has invariance and robustness properties that make it attractive for incorporation into larger systems.
We also think that using this method in medical databases as the basis for a computer-aided detection (CAD) system could be a relatively new and intriguing

idea; our first experimental results have shown that it is effective when considering the most objective indexes used to estimate the performance of diagnosis results (sensitivity, specificity, positive predictive value, and negative predictive value).
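For reference, these four indexes are computed from the true/false positive and negative counts of the answer set; the standard formulation (not spelled out in the paper) is:

```latex
\mathrm{Sensitivity} = \frac{TP}{TP+FN}, \quad
\mathrm{Specificity} = \frac{TN}{TN+FP}, \quad
\mathrm{PPV} = \frac{TP}{TP+FP}, \quad
\mathrm{NPV} = \frac{TN}{TN+FN}
```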
The paper is organized as follows: Section 2 briefly summarizes how our method works and some of its properties; Section 3 shows a comparison with a Wavelet-based method and Section 4 describes the results obtained from the experimentation on several image datasets.

2. HER, the Method

The main task of pattern recognition is to compare a measured image in an unknown position to different prototypes. We get a direct brute-force solution to
this problem if we compare the prototypes in all possible positions and extract
the optimal coincidence. If we use Euclidean distance for comparison, we end up
calculating the maximum of a high order correlation function, which is a rather
time consuming operation. The time required grows exponentially with the
number of parameters describing the coordinate transformations induced by the
motion. A more elegant way to solve the problem involves the use of mappings
that are able to extract position-invariant intrinsic features of the object.
The method of Fourier descriptors is known to work reasonably well for the
recognition of object contours independent of position, orientation and size.
There are works that show the results of the Fourier approximation of polygons
for different numbers of Fourier coefficients. As it turns out, it is possible to achieve a good approximation of a polygon by using 15-30 coefficients.
Even with few coefficients, the Fourier series obtain an acceptable
approximation to the original curve because the low frequencies contain the most
significant information about the object.
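To make this low-pass behaviour concrete, the sketch below (our own illustration; the helper name and the choice of keeping symmetric low frequencies are assumptions, not taken from the paper) approximates a closed contour by retaining only its lowest Fourier coefficients:

```python
import numpy as np

def fourier_approximation(contour_xy, n_coeffs=20):
    """Approximate a closed contour by keeping only its n_coeffs lowest
    Fourier frequencies (illustrative sketch, not the paper's code)."""
    # Represent each boundary point as a complex number x + iy.
    z = contour_xy[:, 0] + 1j * contour_xy[:, 1]
    spectrum = np.fft.fft(z)

    # Keep the DC term plus the lowest positive and negative frequencies;
    # zero out everything else (the high-frequency detail).
    keep = max(1, n_coeffs // 2)
    mask = np.zeros(len(z))
    mask[:keep + 1] = 1.0
    mask[-keep:] = 1.0
    approx = np.fft.ifft(spectrum * mask)
    return np.column_stack([approx.real, approx.imag])
```

With 15-30 retained coefficients this reproduces the smooth outline of a polygon reasonably well, while highly jagged contours require many more terms, as discussed later.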
Other techniques recur to the minimization of the contour’s moments with
respect to an orthogonal coordinate system centered in the object’s center.
Generally, only the first two moments are used because the higher-order
moments add little information content. However, this approach does not appear
to be particularly effective: indeed, it requires a great amount of information and
long computing times. HER, Hierarchcal Entropy-based Representation, is a
time-series indexing system useful for efficient retrieval by content. This model
is employed in order to describe a 1-D signal by means of a few coefficients. The
method reconstructs the energy distribution of the given signal along the
independent variable axis selecting the most relevant local maxima based on the
area, and therefore the energy, associated with each maximum.

Considering a signal f(t) in the time space, HER represents the signal in the entropy space following these steps:
- selection of the first absolute maximum;
- consider the maximum to be the midpoint of a Gaussian distribution;
- compute its relative entropy;
- go back to the first step until a predefined number M of maxima has been used or the fraction of the total energy remaining in the signal falls below a given threshold.
In the entropy space the signal is represented by means of the sequence of the extracted maxima, each located by its distance from the first (largest) maximum.
The distance between two given signals f1(t) and f2(t) is obtained by comparing the corresponding non-linear HER representations.
HER is a good candidate for content-based retrieval whenever the information can be accurately represented by a 1-D signal.
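A minimal sketch of this maxima-extraction loop is given below; the Gaussian width and the entropy-like weighting are our own assumptions, since the paper does not give these details:

```python
import numpy as np

def her_representation(signal, max_peaks=5, energy_fraction=0.2, sigma=3.0):
    """Hedged sketch of a HER-like descriptor: iteratively pick the largest
    maximum, subtract a Gaussian bump centred on it, and record the couple
    (distance from the first maximum, entropy-like weight of the peak)."""
    residual = np.asarray(signal, dtype=float).copy()
    total_energy = np.sum(residual ** 2)
    t = np.arange(len(residual))

    peaks = []
    first_pos = None
    while len(peaks) < max_peaks:
        pos = int(np.argmax(residual))
        if first_pos is None:
            first_pos = pos

        # Gaussian bump centred on the maximum (width sigma is an assumption).
        bump = residual[pos] * np.exp(-0.5 * ((t - pos) / sigma) ** 2)

        # Entropy-like weight: fraction of the remaining energy carried by
        # the bump, plugged into -p*log(p) (again an assumption).
        p = np.sum(bump ** 2) / max(np.sum(residual ** 2), 1e-12)
        peaks.append((pos - first_pos, -p * np.log(max(min(p, 1.0), 1e-12))))

        residual = residual - bump
        if np.sum(residual ** 2) < energy_fraction * total_energy:
            break
    return peaks
```

Two signals would then be compared by matching their lists of (position, weight) couples, which is the comparison of non-linear HER representations mentioned above.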

2.1. HER for Contours


The proposed HER method has been applied to analyze and classify closed contours of objects and regions of a pictorial scene. In order to obtain a 1-D time series from 2-D contour data, the approach is to scan the contour pixel by pixel. One disadvantage is that the representation will have one point for each contour pixel; therefore, the data size can get large for images at high resolution.
The advantage is that any contour can be represented in a lossless, reversible way. The contour is scanned clockwise, starting from its top left pixel, recording the distance between each pixel and the center of mass. The contour is sampled pixel by pixel and this yields a periodic time series with as many points as there are pixels in the object contour. The frame of reference is a coordinate system
centered in the barycentre G of the object, computed as follows:

$x_G = \frac{1}{k}\sum_{i=1}^{k} x_i, \qquad y_G = \frac{1}{k}\sum_{i=1}^{k} y_i$

where $x_i$ and $y_i$ are the coordinates of a pixel $P_i$ belonging to the contour with k pixels. After that, the distance $d_i$ between the barycentre G and each of the k pixels of the contour is computed. In this way it is possible to obtain a representation y(s) of the contour in curvilinear coordinates. Such a representation is univocal, and it is possible to reconstruct the original 2-D contour shape without loss of information.
Applying the HER method it is possible to describe the y(s) representation by means of a few coefficients. The discrete form of the Fourier Transform is also often used as a shape descriptor. It has several nice, well-known mathematical properties, most importantly linearity. As shown by Zahn and Roskies, an adequate approximation of a polygon requires 15-30 coefficients. When the object has highly irregular or jagged contours, even 30 coefficients are not enough to characterize the shape adequately for accurate reconstruction.
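The contour-to-signal step described in this section can be sketched as follows (an illustrative snippet; the ordered boundary tracing itself is assumed to be available from any border-following routine):

```python
import numpy as np

def contour_signature(contour_xy):
    """Turn an ordered closed contour into the 1-D signal described above:
    the distance of each boundary pixel from the barycentre G, taken in
    scanning order (illustrative sketch)."""
    contour_xy = np.asarray(contour_xy, dtype=float)
    g = contour_xy.mean(axis=0)                     # barycentre of the k pixels
    return np.linalg.norm(contour_xy - g, axis=1)   # curvilinear signal y(s)
```

The resulting periodic series y(s) can then be indexed with HER, or with the truncated Fourier descriptors discussed above.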

2.2. HER for Textures


HER has also been applied to analyze and classify 2-D texture information. The main idea of the tool is to transform an image from a 2-D signal to a 1-D signal. In order to obtain a 1-D time series from 2-D texture data, the approach is to follow a spiral path in the texture element.
This choice has the advantages of simplicity and instant computability. Applying the HER method it is possible to obtain a representation of the 1-D signal by means of a few coefficients.
In the case of textures, the spiral method used to obtain a 1-D dataset is sensitive to rotation and reflection, so that exact theoretical invariance (as opposed to practical robustness) is not possible. However, by rotating and reflecting the texture into a canonical form before applying the spiral method, it is possible to make the whole process invariant. The method is invariant to some types of image transformation: contrast scaling, luminance shifting and translation.
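One possible spiral traversal of a square texture tile is sketched below (our own choice of path; the paper does not specify the exact scanning order):

```python
import numpy as np

def spiral_scan(tile):
    """Read a 2-D texture tile along an inward clockwise spiral and return
    the resulting 1-D signal (illustrative sketch)."""
    tile = np.asarray(tile)
    out = []
    top, bottom = 0, tile.shape[0] - 1
    left, right = 0, tile.shape[1] - 1
    while top <= bottom and left <= right:
        out.extend(tile[top, left:right + 1])              # top row, left to right
        out.extend(tile[top + 1:bottom + 1, right])        # right column, downwards
        if top < bottom:
            out.extend(tile[bottom, left:right][::-1])     # bottom row, right to left
        if left < right:
            out.extend(tile[top + 1:bottom, left][::-1])   # left column, upwards
        top, bottom, left, right = top + 1, bottom - 1, left + 1, right - 1
    return np.array(out)
```

The canonical rotation and reflection mentioned above would be applied to the tile before this scan, so that the resulting 1-D signal no longer depends on the original orientation.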

3. Comparison with a Wavelet Based Method

As said above, there are several methods available for image retrieval. The methods based on the multiresolution formulation of wavelet transforms are among the most reliable and robust. A wavelet is a waveform of effectively limited duration that has an average value of zero. One of the main advantages afforded by wavelets is the ability to perform local analysis.
The comparison was aimed at assessing the efficiency and effectiveness of
the retrieval. In particular, efficiency is related to the computational
requirements and to the index size, while effectiveness has to do with the quality
of the answer set. As for the quality of the retrieval, wavelet-based approaches
are very robust and tolerate even the addition of Gaussian noise to the query
texture without overly negative consequences. In HER, as few as 4 or 5 maxima are usually enough to characterize a texture in an effective way. Indeed, having too many maxima in the index does not improve the performance. As a
consequence, the typical size of HER indices is rather small. On the other hand,
a typical wavelet-based index requires about a hundred coefficients to work with
good accuracy.

Summing up, HER's performance in terms of quality is very close to that of methods based on the wavelet transform, but it is much less costly in terms of
computing resources and index size. Additionally, as stated above, this
representation can be effectively used for different kinds of data; in particular
contours and textures.

4. Experimental Results

Several experiments have been performed in order to assess the validity of the proposed method. For these tests, we focused on an aerial images dataset, the Brodatz set of textures and, furthermore, on one medical case study containing mammographies from the MIAS Database. In all these cases texture is significant enough that it can tentatively be used alone for indexing.
The testing dataset with aerial images was constructed using aerial photographs acquired in regions near Cagliari. The dataset includes several images of different kinds of soil, vegetables, roads, rivers and buildings. Figure 1 shows a portion of the area under investigation, with a subdivision in tiles (10 x 10 pixels). We tried to investigate the use of texture as a visual primitive to search and retrieve aerial images.

Figure 1. Partitioning process of the aerial photograph.



The results obtained demonstrate (Figure 2) that our method can be used to select a large number of geographically salient features such as vegetation patterns, parking lots, and building developments.

[Chart: distance from the query tile versus rank for the whole testing dataset; distances range up to about 300,000 over roughly 7000 ranked tiles.]

Figure 2. Distance from query tile (tree).

The testing dataset from Brodatz textures includes several transformed versions in order to test the robustness of retrieval. Using one of the original textures as the query and looking at the returned results, we found several matches at distance 0 in feature space from the query texture.

[Chart: distance from the query tile versus rank for the closest 100 matches; distances range up to about 150,000.]

Figure 3. Distance from query tile (Bark.0000).



In Figure 3 the distances from the query tile (Bark.0000) to the closest 100 matches in the testing dataset are shown. The first bin represents the first nine matches at distance 0, each one belonging to the same texture type.
Concerning the medical cases, the first database used was the MIAS Mammographic Database, digitised at 50 micron pixel edge and reduced to 200 micron pixel edge, so that every image is 1024 x 1024 pixels with 8 bits. The MIAS Database includes 330 images, arranged in pairs of films, where each pair represents the left and the right mammograms of a single patient, with the following details: MIAS database reference number, character of background tissue, class of abnormality present, severity of abnormality, image coordinates of the centre of abnormality and radius of a circle enclosing the abnormality.
The testing data set includes 67 benign and 54 malignant mammograms. The lesions were labelled according to the reference included in the Database. Table 1 illustrates the results obtained from the experimentation with different query tiles. The table is structured in this way: the “#FA/XX” columns show the number of false alarms in the XX-element answer set; the “1st FA” column contains the answer set rank of the first false alarm; the last column, “Class”, represents the class of abnormality. It is important to note that in all cases we did not find any false alarm in the first 10 retrieved mammogram tiles.

Table 1. Tile query retrieval

Tile query   1st FA   #FA/10   #FA/20   #FA/30   Class
104          15       0        1        5        M
25           12       0        5        9        B
99           16       0        3        9        M
34           11       0        3        9        B

In Figure 4 we show a graphical representation of the first 30 tiles retrieved with a malignant query (tile #104).
The rm curve represents the malignant retrieval trend, while rb is the false alarm curve. The db and dm curves represent, respectively, the benign and malignant distribution trends in the testing dataset.

[Chart: retrieval trend with the malignant query tile over the first 30 tiles, showing the rm, rb, dm and db curves.]

Figure 4. Retrieval trend with malignant query tile

5. Conclusions

The main idea we have proposed with the HER method is to consider the maxima as the most important features of a signal. The importance of the maxima lies not only in their position but rather in their “mutual position” inside the signal. HER is a hierarchical method which selects maxima considering their relative values and reciprocal distances.
The signal is represented by means of a vector containing couples of elements, where the former is the distance of the maximum from the first one and the latter represents the associated entropy. HER is a non-linear transform which presents several nice invariances: translation, rotation, reflection, luminance shifting and scale.
Experimentation using contour signals has shown encouraging results. Such results are strictly connected with the procedure we followed to transform a shape into a 1-D signal. HER for contours allows us to obtain important information on the number and on the shape of the elongations of the object under investigation. The sampling theorem clarifies the differences between the proposed method and Fourier descriptors. Also the comparison with moment-based techniques has shown the validity of the HER method.
Some considerations can be made about the results obtained using the Brodatz dataset of textures: the transformations applied to the tiles (rotation, reflection, luminance shifting and contrast shifting) do not modify the low

frequencies of the signal; this allows the Fourier Transform to obtain better results using only a few coefficients. However, such results are not better than the ones obtained with HER and the wavelets.
One of the most important properties of the HER method is its low computing time with respect to all the other techniques taken into account. Furthermore, all the experimentation has been conducted using only 30% of the whole signal information.
In conclusion we can affirm that the experimentation with HER has shown results comparable with (and sometimes better than) the Fourier Transform and Wavelets. These kinds of results are confirmed by the latest experiments on medical images (mammography database).
Considering the results obtained on medical images, we think our method could be used as the basis for a computer-aided detection (CAD) system. Finding similar images, with the aim of attracting the radiologist's attention to possible lesion sites, is surely an important way to provide aid during clinical practice. The importance of a content-based image retrieval system in computer aided detection is to help radiologists when they need reference cases to interpret an image under analysis. Our future objective is the development of an efficient database methodology for retrieving patterns in medical images representing pathological processes.

References

1. Casanova A., Fraschini M., Vitulano S., Hierarchical Entropy Approach for Image and Signals Retrieval, Proc. FSKD02, Singapore, L. Wang et al. Editors.
2. Distasi R., Nappi M., Tucci M., Vitulano S., CONTEXT: A Technique for Image Retrieval Integrating CONtour and TEXture Information, Proc. of ICIAP 2001, Palermo, Italy, IEEE Comp. Soc.
3. Brodatz P., Textures: A Photographic Album for Artists and Designers, Dover Publications, New York, 1966. Available in a single .tar file: ftp://ftp.cps.msu.edu/pub/prip/textures/
4. Issam El Naqa, Yongyi Yang, et al., Content-based Image Retrieval for Digital Mammography, ICIP 2002.

ISSUES IN IMAGE UNDERSTANDING *

VITO DI GESU
DMA, University of Palermo, Italy
IEF, University of Paris Sud, ORSAY, France
E-mail: digesu@math.unapa.it

Aim of the paper is to address some fundamental issues and viewpoints about machine vision systems. Among them, image understanding is one of the more challenging. Even in the case of human vision its meaning is ambiguous; it depends on the context and on the goals to be achieved. Here, a pragmatic view will be considered, by addressing the discussion of the algorithmic aspects of artificial vision and its applications.

1. Visual Science
Visual science is considered one of the most important fields of investigation in perception studies. One of the reasons is that the eyes collect most of the environment information, and this makes the related computation very complex. Moreover, the eyes interact with other perceptive senses (e.g. hearing, touch, smell), and this interaction is not fully understood. Mental models, stored somewhere in the brain, are perhaps used to elaborate all the information that flows from our senses to the brain. One of the results of this process is an update of our mental models by means of a sort of feedback loop. This scenario shows that the understanding of the visual scene surrounding us is a challenging problem.
The observation of visual forms plays a considerable role in the majority of human activities. For example, in our daily life we stop the car at the red traffic light, select ripe tomatoes discarding the bad ones, and read a newspaper to update our knowledge.
The previous three examples are related to three different levels of understanding. In the first example an instinctive action is performed as a

* This work has been partly supported by the European action COST-283 and by the French ministry of education.

Figure 1. Axial slices through five regions of activity of the human brain.

Figure 2. Hardware based on bio-chips.

consequence of a visual stimulus. The second example concerns a conscious decision-making activity, where attentive mechanisms are alerted by the visual task: recognize the color of the tomato and decide if it is ripe. In this case the understanding implies training and learning procedures. The third example involves a higher level of understanding. In fact, the reading of a sequence of typed words may produce or remind us of concepts, for example images and emotions. Reading may generate mental forms that are different, depending on the reader's culture, education, and past experiences. At this point we may argue that visual processes imply different degrees of complexity in the elaboration of the information.
Visual perception has been an interesting investigation topic since the beginning of human history, because of its presence in most human activities. It can be used for communication, decoration and ritual

purposes. For example, scenes of hunting have been represented by graffiti on the walls of prehistoric caves. Graffiti can be considered the first example of a visual language that uses an iconic technique to pass on history. They also suggest to us how the external world was internalized by prehistoric men.
During the centuries, painters and sculptors have discovered most of the color combination rules and spatial geometry relationships, allowing them to generate real scene representations, intriguing imaginary landscapes, and visual paradoxes. This evolution was due not only to the study of our surrounding environment; it was also stimulated by the emergence of more and more internal concepts and utilities. In fact, with the beginning of writing, visual representation became an independent form of human expression.
Since 4000 B.C., visual information has been processed by Babylonian and Assyrian astronomers to generate sky maps representing and predicting planets' and stars' trajectories. Today astronomers use computer algorithms to analyze very large sky images at different frequency ranges to infer galaxy models and to predict the evolution of our universe.
Physicians perform most diagnoses by means of biomedical images and signals. Intelligent computer algorithms have been implemented to perform automatic analysis of MRI (Magnetic Resonance Imaging) and CTA (Computerized Tomography Analysis). Here, the intelligence stands for the ability to retrieve useful information by guessing the most likely disease.
Visual science has been motivated by all the previously outlined arguments; it aims to understand how we see and how we interpret the scenes surrounding us, starting from the visual information that is collected by the eyes and processed by our brain 1,2.

One of the visual science goals is the design and realization of artificial visual systems closer and closer to the human being. Recent advances in the inspection of the human brain and future technology will allow us both to explore our physical brain in depth (see Figure 1) and to design artificial visual systems whose behaviour will be closer and closer to that of the human being (see Figure 2).
However, advances in technology will not be sufficient to realize such advanced artificial visual systems, as a matter of fact their design would need a perfect knowledge of our visual system (from the eyes to the brain).

2. The origin of artificial vision


The advent of digital computers has determined the development of computer vision. The beginning of computer vision can be dated around 1940, when Cybernetics started. In that period the physicist Norbert Wiener and the physician Arturo Rosenblueth promoted, at the Harvard Medical School, meetings between young researchers to debate interdisciplinary scientific topics. The guideline of those meetings was the formalization of biological systems, including human behavior 5. The program was ambitious, and the results were not always those people had hoped for; however, the advent of cybernetics marks the beginning of a new scientific approach to natural science. Physicists, mathematicians, neurophysiologists, psychologists, and physicians cooperated to cover all aspects of human knowledge. The result of such integration was not only a mere exchange of information coming from different cultures; it contributed to improving human thought.
In this framework, Frank Rosenblatt introduced, in a collection of papers and books 6,7,8, the concept of the perceptron. The perceptron is a multi-layer machine that stores a set of visual patterns, collected by an artificial retina, which are used as a training set for an automaton; the features and weights learned during training are then used to recognize unknown patterns. The intuitive idea is that each partial spatial predicate, recognized by the perceptron, should provide evidence about whether a given pattern belongs to a universe of patterns.
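As a reminder of how such a learning machine operates, here is a minimal single-layer perceptron training loop (a textbook sketch, not Rosenblatt's original multi-layer formulation):

```python
import numpy as np

def train_perceptron(patterns, labels, epochs=50, lr=0.1):
    """Minimal perceptron learning rule: adjust the weights whenever a
    training pattern is misclassified (textbook sketch).

    patterns: (n, d) array of feature vectors (e.g. retina responses).
    labels:   array of n target classes in {-1, +1}.
    """
    d = patterns.shape[1]
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(patterns, labels):
            if y * (np.dot(w, x) + b) <= 0:   # misclassified (or on the boundary)
                w += lr * y * x               # move the decision boundary towards x
                b += lr * y
    return w, b
```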
The dream of building machines able to recognize any pattern after a suitable training was shattered, either because of the intrinsic structural complexity of the natural visual system (even in the case of simple animals like frogs), or because of the insufficient technological development of that time.
Nevertheless, the idea of the perceptron must be considered the first paradigm of a parallel machine vision system, defined as the mutual interchange of information (data and instructions) among a set of cooperating processing units. For example, it suggested the architecture of interesting machine vision systems 9,10,11.

3. The pragmatic paradigm


The pragmatic paradigm of artificial vision is goal oriented, and the goal is to build machines that perform visual tasks in a specific environment, according to user requirements. It follows that the machine design can be based on models that are not necessarily suggested by natural visual systems. Here, the choice of the visual model is usually based on optimization criteria.
Even if the pragmatic approach has been developed under the stimulus of practical requirements, it has also contributed to a better understanding of some vision mechanisms. For example, graph theoretical algorithms have been successfully applied to recognize Gestalt clusters according to human perception 12,13. The relation between graphs and the natural grouping of patterns could be grounded on the fact that our neural system can be seen as a very dense multi-graph with billions of billions of paths. Of course, the question is still open and probably will never be solved.
Artificial visual systems can be described through several layers of increasing abstraction, each one corresponding to a set of iterated transformations. The general purpose is to reach a given goal, starting from an input scene, X, represented, for example, as an array of 2D pixels or 3D voxels defined on a set of gray levels G. The computation paradigm follows four phases (see Figure 3): low level vision, intermediate level vision, high level vision, and interpretation.
Note that these steps don't operate as a simple pipeline process; they may interact through semantic networks and control mechanisms based on feedback. For example, parameters and operators used in the low level phase can be modified if the result is inconsistent with an internal model used during the interpretation phase. The logical sequence of the vision phases is weakly related to natural vision processes; in the following, a pragmatic approach is considered, where the implementation of each visual procedure is performed by means of mathematical and physical principles, which may or may not have a neuro-physiological counterpart.
The pragmatic approach has achieved promising and useful results in many application fields. Among them are robotics vision 14, face expression analysis 15, document analysis 16, medical imaging 17, and pictorial databases 18.

3.1. Low Level Vision


Here, vision operators are applied point-wise and in neighborhood spatial domains to perform geometric and intensity transformations. Examples are digital linear and non-linear filters, histogram equalization with the cumulative histogram 19, and mathematical morphology 20. Figures 4a,b show examples of
morphological erosion using min-definitions 21. The purpose of this stage is to perform a preprocessing of the input image that reduces the effect

[Figure 3 panels: Early Vision; Cognitive Vision.]

Figure 3. The classical paradigm of an artificial vision system.

Figure 4. The input image (a); its erosion (b).

of random noise, performs sharpening, and detects structural and shape features.
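As a concrete example of such a point-wise operator, histogram equalization through the cumulative histogram can be sketched as follows (an illustrative implementation for 8-bit images, not taken from the cited reference):

```python
import numpy as np

def histogram_equalization(image, levels=256):
    """Remap gray levels so that the cumulative histogram becomes
    approximately linear (assumes integer gray levels in [0, levels))."""
    hist = np.bincount(image.ravel(), minlength=levels)
    cdf = np.cumsum(hist).astype(float)
    cdf /= cdf[-1]                                   # normalized cumulative histogram
    lut = np.round(cdf * (levels - 1)).astype(image.dtype)
    return lut[image]                                # point-wise remapping
```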
A second goal of this phase is the selection of areas of interest inside the scene, where more complex analysis is to be performed. The Discrete Symmetry Transform (DST), as defined in 22,23, is an example of an attentive operator that extracts areas of interest based on the gray level circular symmetry around each pixel (see Figures 5a,b,c,d). The definition of an interesting area depends on the problem and is based on information theory methods.

Figure 5. The attentive operator DST: a) the input image; b) the application of the DST; c) the selection of points of interest; d) the selection of eyes.

Low level vision operators can often be directly implemented in artificial retinas, both to reduce the cost of the whole computation and to enhance their performance. So-called active retinas have been included in active visual systems 24,25.
Active visual systems have mechanisms that can actively control camera parameters such as orientation, focus, zoom, aperture and vergence in response to the requirements of the task and external stimuli. More broadly, active vision encompasses attention, selectively sensing in space, resolution and time, whether it is achieved by modifying physical camera parameters or the way data is processed after leaving the camera. The tight coupling between perception and action proposed in the active vision paradigm does not end with camera movements. The processing is tied closely to the activities it supports (navigation, manipulation, signaling danger or opportunity, etc.), allowing simplified control algorithms and scene representations, quick response times, and increased success at supporting the goals of activities.
Active vision offers higher feasibility and performance at a lower cost. The application of active vision facilitates certain tasks that would be impossible using passive vision. The improvement of performance can be measured in terms of reliability, repeatability, speed and efficiency in performing specific tasks, as well as the generality of the kinds of tasks performed by the system. In active vision, the foveated sensor coupled with

a position system can replace a higher resolution sensor array. Moreover, less data needs to be acquired and processed, which significantly saves hardware costs 26,27.

3.2. Intermediate Level Vision


Neurologists argue that there are quite a number of visual analyses carried out by the brain that are categorized as intermediate level vision. Included are our ability to identify objects when they undergo various transformations, when they are partially occluded, to perceive them as the same when they undergo changes in size and perspective, to put them into categories, to learn to recognize new objects upon repeated encounter, and to select objects in the visual scene by looking at them or reaching for them 28.
In the case of artificial visual systems the intermediate level vision computation is performed on selected areas of interest. The task is the extraction of features that carry shape information. Features can be of a geometrical nature (borders, blobs, edges, lines, corners, global symmetries, etc.) or computed on pixel intensity values (regions, segmentation). These features are stored at an intermediate level of abstraction. Note that such features are free of domain information: they are not specifically objects or entities of the domain of understanding, but they contain spatial and other information. It is the spatial/geometric (and other) information that can be analyzed in terms of the domain in order to interpret the images. Yet, as in the natural case, all the features involved must be invariant to geometrical and topological transformations. This property is not always satisfied in real applications.
Geometrical features. Canny's edge detector 29 is one of the most robust and widely used in the literature. The absolute value of the derivative of a Gaussian is a good approximation to one member of his family of filters. Ruzon and Tomasi 30 introduced an edge detector that also uses color information. The RGB components are combined following two different strategies: a) the edges of the three components are computed after the application of a gradient operator and then the fusion of the three edge-images is performed; b) the edges are detected on the image obtained after fusing the gradient-images. 2D and 3D modelling is treated in 31.
Snake computation 32 is another example of a technique to retrieve object contours. Snakes are based on an elastic model of a continuous, flexible, open (or closed) parametric curve, v(s) with s ∈ [0, 1], which is imposed upon and matched to an image. Borders follow the evolution of the dynamic

system describing the snake under constraints that are imposed by the image features. The solution is found by an iterative procedure and corresponds to the minimization of the system energy. The algorithm is based on the evolution of the dynamic system:

The first term of this equation represents the internal forces, where $w_1$ is the elasticity and $w_2$ is the stiffness. The second term represents the external forces; it depends on the image features. The solution of this equation is found by an iterative procedure and corresponds to the minimization of the system energy:

$E = \int_0^1 \left[ E_{int}(v(s)) + E_{feature}(v(s)) \right] ds$

where $E_{int}(v(s)) = w_1 \left| \frac{\partial v}{\partial s} \right|^2 + w_2 \left| \frac{\partial^2 v}{\partial s^2} \right|^2$ and $E_{feature} = -(\nabla^2 I)^2$.


A global solution is usually not easily found and piecewise solutions are searched for: the energy of each snake piece is minimized, the ends are pulled to the true contour and the snake growing process is repeated. Numerical solutions are based on finite difference methods and dynamic programming 33 (see Figure 6).
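To make the iteration concrete, the following hedged sketch performs a simple explicit gradient-descent update of a closed snake on an edge map (the discretization choices are ours, not the formulation of reference 32):

```python
import numpy as np

def evolve_snake(points, edge_map, w1=0.1, w2=0.01, step=0.5, iterations=200):
    """Explicit gradient-descent evolution of a closed snake.

    points:   (N, 2) array of (row, col) snake coordinates.
    edge_map: 2-D array whose high values attract the snake
              (e.g. squared gradient magnitude of the image).
    Illustrative discretization only."""
    gy, gx = np.gradient(edge_map.astype(float))   # external force field
    v = points.astype(float).copy()
    for _ in range(iterations):
        # Internal forces via periodic finite differences:
        # elasticity ~ second derivative, stiffness ~ fourth derivative.
        d2 = np.roll(v, -1, axis=0) - 2 * v + np.roll(v, 1, axis=0)
        d4 = (np.roll(v, -2, axis=0) - 4 * np.roll(v, -1, axis=0) + 6 * v
              - 4 * np.roll(v, 1, axis=0) + np.roll(v, 2, axis=0))
        # External force sampled at the nearest pixel of each snake point.
        r = np.clip(v[:, 0].round().astype(int), 0, edge_map.shape[0] - 1)
        c = np.clip(v[:, 1].round().astype(int), 0, edge_map.shape[1] - 1)
        external = np.column_stack([gy[r, c], gx[r, c]])
        v += step * (w1 * d2 - w2 * d4 + external)
    return v
```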
Methods to compute global object symmetries are considered at this level of the analysis; the Smoothed Local Symmetry 35 has been introduced to retrieve a global symmetry (if it exists) from the local curvature of contours.
In 36,37,38 the mathematical background to extract skewed object symmetries under scaled Euclidean, affine and projective image transformations is proposed. An algorithm for back-projection is given and the case of non-coplanarity is studied. The authors also introduce the concept of an invariant signature for matching problems (see Figure 7).
A third example of an intermediate level operation is the DCAD decomposition 39, which is an extension of the Cylindrical Algebraic Decomposition 40. The DCAD decomposes a connected component of a binary input image by digital straight paths that are parallel to the Y (X)-axis and cross the internal digital borders of the component where they correspond to concavities, convexities, and bends. From this construction a connectivity graph, CG, is derived as a new representation of the input connected

Figure 6. (a) input image; (b) edge detection; (c) snake computation.

components. The CG allows us to study topological properties and visibility relations among all the components in the image.
In this phase both topological information and structural relations need to be represented by high level data structures, where full orders (sequences) and/or partial orders define relations between image components. Examples of partially ordered data structures are connectivity graphs and trees 41,39,42.

Grouping and segmentation. The automatic perception and description of a scene is the result of a complex sequence of image transformations starting from an input image. All transformations are performed by vision operators that are embedded in a vision loop representing their flow (from the low to

Figure 7. (a) Cipolla's skewed symmetries detection; (b) examples of global symmetry
detection.

the high level vision).


Within the vision loop, the segmentation of images into homogeneous components is one of the most important phases 43. For example, it plays a relevant role in the selection of the parts of an object on which to concentrate further analysis. Therefore, the accuracy of the segmentation step may influence the performance of the whole recognition procedure 44.
More formally, we may associate to the input digital image X a weighted undirected graph G = (X, E, b), whose nodes are the pixels, whose arcs E depend on the digital connectivity (e.g. 4 or 8), and where the function b : E → [0, 1] is the arc-weight, which could be a normalized distance between pixels. The segmentation is then defined as an equivalence relation, ~, that determines the graph partition:

where 0 ≤ φ ≤ 1 is a given threshold. One graph partition will correspond to each threshold. The spanning of all values of φ will provide a large set of segmentation solutions, and this makes the segmentation problem hard.
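One simple way to realize such a threshold-driven partition (a sketch under the assumption that two 4-connected pixels are equivalent when their normalized intensity difference does not exceed φ; the paper leaves the exact relation implicit) is a union-find sweep over the pixel graph:

```python
import numpy as np

def threshold_segmentation(image, phi=0.1):
    """Partition the pixel graph into segments: 4-connected pixels are merged
    when their normalized gray-level difference is <= phi (illustrative sketch)."""
    img = image.astype(float) / max(image.max(), 1)
    h, w = img.shape
    parent = np.arange(h * w)

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    for r in range(h):
        for c in range(w):
            i = r * w + c
            if c + 1 < w and abs(img[r, c] - img[r, c + 1]) <= phi:
                union(i, i + 1)             # merge with the right neighbour
            if r + 1 < h and abs(img[r, c] - img[r + 1, c]) <= phi:
                union(i, i + w)             # merge with the bottom neighbour

    return np.array([find(i) for i in range(h * w)]).reshape(h, w)
```

Sweeping φ from 0 to 1 produces the nested family of partitions mentioned above, from one segment per pixel to a single segment covering the whole image.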
On the other hand, image segmentation depends on the context and it is subjective; the decision process is driven by the goal or purpose of the visual

task. Therefore, general solutions do not exist and each proposed technique is suitable for a class of problems. In this sense, image segmentation is an ill-posed problem that does not admit a unique solution.
Moreover, the segmentation problem is often hard because the probability distribution of the features is not well known. Often, the assumption of a Gaussian distribution of the features is a rough approximation that makes the linear separation between classes false.
In the literature the segmentation problem has been formulated from different perspectives. For example, in 45 a two-step procedure is described that uses only data included in the boundary; this approach has been extended to boundary surfaces by combining splines and superquadrics to define global shape parameters 46,47. Other techniques use elastic surface models that are deformed under the action of internal forces to fit object contours using a minimal energy criterion 48. A model-driven approach to the segmentation of range images is proposed in 49.
Recently, Jianbo Shi and Malik 50 have considered 2D image segmentation as a Graph Partitioning Problem (GPP) solved by a normalized cut criterion. The method finds an approximated solution by solving a generalized eigenvalue system. Moreover, the authors consider both spatial and intensity pixel features in the evaluation of the similarity between pixels.
Recently, the problem of extracting the largest image regions that satisfy uniformity conditions in the intensity/spatial domains has been related to a Global Optimization Problem (GOP) 51 by modelling an image with a weighted graph, where the edge-weight is a function of both intensity and spatial information. The chosen solution is the one for which a given objective function attains its smallest value, hopefully the global minimum. In 52 a genetic algorithm is proposed to solve the segmentation problem as a GOP using a tree regression strategy 53.
The evaluation of a segmentation method is not an easy task because the expected results are subjective and they depend on the application. One evaluation could be the comparison with a robust and well experimented method, but this choice is not always feasible; whenever possible the evaluation should be done by combining the judgement of more than one human expert. For example, the comparison could be performed using a vote strategy as follows:

Figure 8. (a) input image; (b) human segmentation; (c) GS, (d) NMC, (e) SL, and (f) C-means segmentations.

where #agr_k is the number of pixels in which there is agreement between the human and the machine, |HP_k| is the cardinality of the segment defined by the human, and |A_k| is the cardinality of the segment found by the algorithm.
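The voting formula itself is not reproduced here; a plausible per-segment agreement score built from these three quantities (our assumption, not the paper's exact definition) is the Dice-style ratio below:

```python
def segment_agreement(n_agree, human_size, machine_size):
    """Hypothetical per-segment vote: overlap between the human segment HP_k
    and the machine segment A_k, normalized by their sizes (Dice-style ratio).
    This is an assumed form; the paper's exact formula is not reproduced."""
    return 2.0 * n_agree / (human_size + machine_size)
```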
Figure 8 shows how different segmentation methods (Genetic Segmentation (GS), Normalized Minimum Cut (NMC), Single Link (SL), C-means) perform the segmentation of the same image. Figure 8b shows the human segmentation obtained using the vote strategy.

3.3. High Level Vision


In this phase the decision tasks regard the classification and recognition of objects and the structural description of a scene. For example, high level vision provides the medical domain with objective measurements about features related to diseases, such as the narrowing of arteries, volume changes of a pumping heart, or the localization of the points attaching muscles to bones (used to analyze human motion) 54. The classification of cells is another example of a hard problem in the biological domain; for example, both statistical

Figure 9. The axial moment values of an α (a) and a β (b) cell.

and shape features are used in the classification and recognition of α and β ganglion retinal cells. In 55 a quantitative approach is presented where several features are combined, such as diameter, eccentricity, fractal dimension, influence histogram, influence area, convex hull area, and convex hull diameter. The classification is performed by integrating the results from three different clustering methods (Ward's hierarchical scheme, K-Means and Genetic Algorithm) using a voting strategy. The experiments indicated the superiority of some features, also suggesting possible biological implications; among them the eccentricity derived from the axial moments of the cell (see Figure 9).
Autonomous robots equipped with visual systems are able to recognize their environment and to cooperate in finding satisfactory solutions. For example, in 56 a probabilistic, vision-based state estimation method for individual, autonomous robots is developed. A team of mobile robots is able to estimate their joint positions in a known environment and track the positions of autonomously moving objects. The state estimators of different robots cooperate to increase the accuracy and reliability of the estimation process. The method has been empirically validated in experiments with a team of physical robots playing soccer 57.
The concept of internal model is central in this phase of the analysis.

Often, a geometric model is matched against the image features previously computed and embedded in the data structures derived in the intermediate phase. The model parameters are optimized by minimizing a cost function. Different minimization strategies (e.g. dynamic programming, gradient descent, or genetic algorithms) can be considered. Two main techniques are used in model matching: bottom-up, when the primary direction of flow of processing is from lower abstraction levels (images) to higher levels (objects), and conversely top-down, when the processing is guided by expectations from the application domain 58.
Matching results depend also on the chosen parameter space. For example, the classification of human bones from MRI scans requires the combination of multi-view data and the problem cannot admit an exact solution 59; human face recognition has been treated considering a face as an element of a multi-dimensional vector space 60; in 61 the recognition of faces under different expressions and partial occlusions has been considered. To resolve the occlusion problem, each face is divided into local regions that are individually analyzed. The match is flexible and based on probabilistic methods. The recognition system is less sensitive to the differences between the facial expression displayed on the training and the testing images, because the author weights the results obtained on each local area on the basis of how much of this local area is affected by the expression displayed on the current test image.

3.4. Interpretation
This phase exploits the semantic part of the visual system. The result belongs to an interpretation space. Examples are linguistic descriptions and the definition of physical models. This phase could be considered the conscious component of the visual system. However, in a pragmatic approach it is simply a set of semantic rules that are given, for example, by a knowledge base.
The technical problem is that of automatically deriving a sensible interpretation from an image. This task depends on the application or the domain of interest within which the description makes sense. Typically, in a domain there are named objects and characteristics that can be used in a report or to make a decision. Obviously, there is a wide gap between the nature of images (essentially arrays of numbers) and their descriptions, and the intermediate level of the analysis is the necessary link between image data and domain descriptions. There are researchers who take clues from

biological systems to develop theories, and there are those who focus on mathematical theories and the physics of the imaging process. Eventually, however, theory becomes practice in the specification of an algorithm, embodied in an executable program with appropriate data representations. There are alternate views of vision, resulting in other paradigms for image understanding and research.
In image interpretation, knowledge about the application domain is manipulated to arrive at an understanding of the recorded part of the world. Knowledge representation schemes that have been studied include semantic networks 62, Bayesian and belief networks 63, and fuzzy expert systems 64. Some of the issues addressed within these schemes are: the incorporation of procedural and declarative information, handling uncertainty, conflict resolution, and mapping existing knowledge onto a specific representation scheme. The resulting interpretation systems have been successfully applied to interpreting utility maps, music scores and face images. Future developments will focus on the central theme of fusing knowledge representations. In particular, attention will be paid to information fusion, distributed knowledge in multi-agent systems and mixing knowledge derived from learning techniques with knowledge from context and experts.
Moreover, recognition systems must be able to handle uncertainty and to include the subjective interpretation of a scene. Fuzzy logic 67 can provide a good theoretical support to model such kinds of information 65,66. For example, to evaluate the degree of truth of the propositions:

- the chair, beyond the table, is small;
- the chair, beyond the table, is very small;
- the chair, beyond the table, is quite small;
- few objects have straight medial axes;

it is necessary to represent the fuzzy predicate small, the fuzzy attributes very and quite, and the fuzzy quantifier few. The evaluation of each proposition depends on the meaning that is assigned to small, very, quite, and few. Moreover, the objects chair and table, and the spatial relation beyond, must be recognized with some degree of truth.
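A small sketch of how such fuzzy predicates and hedges could be represented is given below; the membership function and the classical hedges very = μ² and quite ≈ √μ are our assumptions in the spirit of Zadeh's fuzzy logic, not definitions taken from the paper:

```python
def small(size, reference=1.0):
    """Assumed membership function for the predicate 'small':
    1 for size 0, decreasing towards 0 around twice the reference size."""
    return max(0.0, 1.0 - size / (2.0 * reference))

def very(mu):
    """Classical concentration hedge: 'very small' = small(x)**2."""
    return mu ** 2

def quite(mu):
    """Dilation-style hedge (assumed reading of 'quite'): small(x)**0.5."""
    return mu ** 0.5

# Example: a chair of size 0.8 relative to a reference size of 1.0.
mu = small(0.8)
print(mu, very(mu), quite(mu))   # degrees of truth of the three propositions
```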
These simple examples suggest the need for the use of fuzzy logic to describe the spatial relations often used in high level vision problems. However, the meaning of the term soft-vision cannot be simply limited to the application of fuzzy operators and logic in vision. This example shows that new visual systems should include soft tools to express abstract or not fully defined concepts 68, by following the paradigms of soft-computing 69.

Figure 10. The Kanizsa triangle illusion.

4. Final remarks
This review has shown some problems and solutions in visual systems. Today, more than 10,000 researchers are working on visual science around the world. Visual science has become one of the most popular fields among scientists. Physicists, neurophysiologists, psychologists, and philosophers cooperate to reach a full understanding of visual processes from different perspectives, the fusion and integration of which will allow us to make consistent progress in this fascinating subject. Moreover, we note that anthropomorphic elements should be introduced to design complex artificial visual systems. For example, the psychology of perception may suggest new approaches to solve ambiguous 2D and 3D segmentation problems. Figure 10 shows the well known Kanizsa illusion 70. Here the perceived edges have no physical support whatsoever in the original signal.

References
1. D. Marr, San Francisco, W.H. Freeman (1982).
2. S.E. Palmer, MIT Press (1999).
3. M. D'Esposito, J.A. Detre, G.K. Aguirre, M. Stallcup, D.C. Alsop, L.J. Tippett, M.J. Farah, Neuropsychologia 35(5), 725 (1997).
4. M. Conrad, Advances in Computers, 31, 235 (1990).
5. N. Wiener, Massachusetts Institute of Technology, MIT Press, Cambridge (1965).
6. F. Rosenblatt, Proceedings of a Symposium on the Mechanization of Thought Processes, 421, London (1959).
7. F. Rosenblatt, Self-organizing Systems, Pergamon Press, NY, 63 (1960).
8. F. Rosenblatt, Spartan Books, NY (1962).
9. V. Cantoni, V. Di Gesù, M. Ferretti, S. Levialdi, R. Negrini, R. Stefanelli, Journal of VLSI Signal Processing, 2, 195 (1991).
10. A. Merigot, P. Clermont, J. Mehat, F. Devos, and B. Zavidovique, in Pyramidal Systems for Computer Vision, V. Cantoni and S. Levialdi (Eds.), Berlin, Springer-Verlag (1986).
11. W.D. Hillis, The Connection Machine, Cambridge MA: The MIT Press (1992).
12. C.T. Zahn, IEEE Trans. on Comp., C-20, 68 (1971).
13. V. Di Gesù, Int. Journal of Fuzzy Sets and Systems, 68, 293 (1994).
14. E.D. Dickmanns, Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence, Nagoya, 1577 (1997).
15. B. Kolb and L. Taylor, Cognitive Neuroscience of Emotion, R.D. Lane and L. Nadel, eds., Oxford Univ. Press, 62 (2000).
16. P.M. Devaux, D.B. Lysak, R. Kasturi, International Journal on Document Analysis and Recognition, 2(2/3), 120 (1999).
17. A.J. Fitzgerald, E. Berry, N.N. Zinovev, G.C. Walker, M.A. Smith and J.M. Chamberlain, Physics in Medicine and Biology, 47, 67 (2002).
18. J. Assfalg, A. Del Bimbo, P. Pala, IEEE Transactions on Visualization and Computer Graphics, 8(4), 305 (2002).
19. R.C. Gonzales, P. Wintz, Prentice Hall (2002).
20. J. Serra, Academic Press, New York (1982).
21. L. Vincent and P. Soille, IEEE Transactions on PAMI, 13(6), 583 (1991).
22. V. Di Gesù, C. Valenti, Vistas in Astronomy, Pergamon, 40(4), 461 (1996).
23. V. Di Gesù, C. Valenti, Advances in Computer Vision (Solina, Kropatsch, Klette and Bajcsy editors), Springer-Verlag (1997).
24. Y. Aloimonos, CVGIP: Image Understanding, 840 (1992).
25. R. Bajcsy, Proceedings of the IEEE, 76, 996 (1988).
26. T.M. Bernard, B.Y. Zavidovique, and F.J. Devos, IEEE Journal of Solid-State Circuits, 28(7), 789 (1993).
27. G. Indiveri, R. Murer, and J. Kramer, IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing, 48(5), 492 (2001).
28. P.H. Schiller, http://web.mit.edu/bcs/schillerlab/index.html.
29. J. Canny, IEEE Trans. on Pattern Analysis and Machine Intelligence, 8(6), 679 (1986).
30. M. Ruzon and C. Tomasi, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Ft. Collins CO, 2, 160 (1999).
31. O.D. Faugeras, MIT Press, 302 (1993).
32. D. Terzopoulos and K. Fleischer, The Visual Computer, 4, 306 (1988).
33. A. Blake and M. Isard, Springer-Verlag, London (1998).
34. H. Blum and R.N. Nagel, Pattern Recognition, 10, 167 (1978).
35. M. Brady, H. Asada, The International Journal of Robotics Research, 3(3), 36 (1984).
36. D.P. Mukherjee, A. Zisserman, M. Brady, Philosophical Transactions of the Royal Society of London A, 351, 77 (1995).
37. T.J. Chan, R. Cipolla, Image and Vision Computing, 13(5), 439 (1995).
38. J. Sato, R. Cipolla, Image and Vision Computing, 15(5), 627 (1997).
39. V. Di Gesù, C. Valenti, Journal of Linear Algebra and its Applications, Springer Verlag, 339, 205 (2001).
40. G.E. Collins, Proc. of the Second GI Conference on Automata Theory and Formal Languages, Springer Lect. Notes Comp. Sci., 33, 515 (1975).
41. A. Rosenfeld, Journal of ACM, 20, 81 (1974).
42. H. Samet, ACM Computing Surveys, 16(2), 187 (1984).
43. R. Duda and P. Hart, NY: Wiley and Sons (1973).
44. K.S. Fu, Pattern Recognition, 13, 3 (1981).
45. A. Pentland, Int. J. Comput. Vision, 4, 107 (1990).
46. D. Terzopoulos, D. Metaxas, IEEE Trans. on PAMI, 13(7) (1991).
47. N. Raja and A. Jain, Image and Vision Computing, 10(3), 179 (1992).
48. I. Cohen, L.D. Cohen, N. Ayache, ECCV'92, Second European Conference on Computer Vision, Italy, 19 (1992).
49. A. Gupta, R. Bajcsy, Image Understanding, 58, 302 (1993).
50. J. Shi, J. Malik, IEEE Trans. on PAMI, 22(8), 1 (2000).
51. R. Horst and P.M. Pardalos (eds.), Handbook of Global Optimization, Kluwer, Dordrecht (1995).
52. G. Lo Bosco, Proceedings of the 11th International Conference on Image Analysis and Processing, IEEE Comp. Soc. Publishing (2001).
53. L. Breiman, J.H. Friedman, R.A. Olshen, C.J. Stone, Wadsworth International Group (1984).
54. S. Vitulano, C. Di Ruberto, M. Nappi, Proceedings of the Third IEEE Int. Conf. on Electronics, Circuits and Systems, 2, 1111 (1996).
55. R.C. Coelho, V. Di Gesù, G. Lo Bosco, C. Valenti, Real Time Imaging, 8, 213 (2002).
56. T. Schmitt, R. Hanek, M. Beetz, S. Buck, and B. Radig, IEEE Transactions on Robotics and Automation, 18(5), 670 (2002).
57. M. Beetz, S. Buck, R. Hanek, T. Schmitt, and B. Radig, First International Joint Conference on Autonomous Agents and Multi Agent Systems (AAMAS), 805 (2002).
58. V. Cantoni, L. Carrioli, M. Diani, M. Ferretti, L. Lombardi, and M. Savini, Image Analysis and Processing, V. Cantoni, V. Di Gesù, and S. Levialdi (Eds.), Plenum Press, 329 (1988).
59. P. Ghosh, D.H. Laidlaw, K.W. Fleischer, A.H. Barr, and R.E. Jacobs, IEEE Transactions on Medical Imaging, 14(3) (1995).
60. A. Pentland, T. Starner, N. Etcoff, N. Masoiu, O. Oliyide, and M. Turk, Proc. Workshop Int'l Joint Conf. Artificial Intelligence, Looking at People (1993).
61. A.M. Martinez, IEEE Trans. on PAMI, 24(6) (2002).
62. A.T. McCray, S.J. Nelson, Meth. Inform. Med., 34, 193 (1995).
63. J. Pearl, Morgan Kaufmann (1988).
64. N. Kasabov, MIT Press (1996).
65. C.V. Negoita, The Benjamin/Cummings Publishing Company (1985).
66. L. Sombé, Wiley Professional Computing (1990).
67. L. Zadeh, Information and Control, 8, 338 (1965).
68. V. Di Gesù, Fundamenta Informaticae, 37, 101 (1999).
69. L.A. Zadeh, Communication of the ACM, 37(3), 77 (1994).
70. G. Kanizsa, Rivista di Psicologia, 49(1), 7 (1955).

INFORMATION SYSTEM IN THE CLINICAL-HEALTH AREA

G. MADONNA
Sistemi Informativi

1. The external context and objectives of the information system

The reference context must take into account the analysis contained in the guidelines of the “White Book on Welfare”, whose objective is that of setting down a reference picture for the creation and strengthening of the Country's social cohesion. From this viewpoint, two fundamental aspects characterising the Italian situation are being analysed, the matter of population and the role of the family, and two main objectives are identified: to favour the birth rate and to improve family policies.
As far as these focal themes are concerned, this document does not constitute a closed packet of proposals; rather, it aims at representing a basis for a discussion about a new model of social politics. The policy on solidarity must be set into a framework of broad-ranging actions aimed at guaranteeing social cohesion as a condition itself for development: this is the way institutional changes are going, and they are already underway both in Europe, with the basic rights Charter and the Lisbon summit, and in Italy, with the modification of Title V of the Constitution.
Support to the family, the elderly and the disabled: these are therefore the main objectives the “White Book on Welfare” wishes to achieve. A revolution which wishes the family to be enhanced to the utmost, where the word family is to be understood as being the fundamental and essential nucleus for the development of civil society.
The National Health System is to be inserted into this context, a system which is today subject to profound changes concerning its welfare mission, its organisation and its underlying system of financing. The Legislative Decree 229/99 (Riforma Ter) reformed the "SSN" (the National Health System), reiterating and confirming the concept of turning the system into a "company" characterised by the strategical, managerial and financial autonomy of the single health structures.

This scenario can, however, be placed within a process of transformation going back a long way, to Legislative Decree 502/92, and which as a whole implies:
• the consolidation of the trend to shift the attention of public action away from the formalities of procedures towards the verification of the effectiveness of results;
• the orientation towards user satisfaction and the transparency of public action and, in the case of the health system, towards achieving the best welfare levels compatible with the resources available;
• the attention to budget constraints, and the adoption of appropriate tools for governing them, such as the afore-mentioned process of turning the system into a "company".
Of course, both from the viewpoint of the information system and from that of the strategic objectives of the company, achieving at the same time the objective of satisfying users (effectiveness of results, quality of the level of service) and that of controlling costs on the basis of constrained resources (the objective of efficiency and productivity) is not an easy task. Figure 1 represents the situation in an elementary way:
• a Company can achieve excellent performance results, but at the cost of a use of resources which pays no attention to economic constraints and is therefore unsustainable after a short time;
• on the contrary, an approach which is attentive to economic constraints alone (an "obsession" with costs) ends up producing low costs per unit of product at the price of poor-quality products, and implies that the essential public aim of prevention and healthcare for citizens is not achieved.


Figure 1. Trade Off Effectiveness/Efficiency


The pursuit of these objectives is to be placed within a context of interactions with external entities which exchange data, information and requests. The diagram in Figure 2 represents this situation very simply, but effectively, so as to clarify the basic problems:
• the crucial role of the health service played by family doctors, and the consequent necessity to construct an effective relationship between them and the Company;
• the importance of the Region, as this is the office which issues not only financing but also standards, and the shifting of the central State Administration to a secondary role, almost always mediated by the Region, as regards the Company;
• the emergence of consumer protection associations as significant actors who ask for information and system transparency, especially from the point of view of outcomes and behaviour;
• the emergence, alongside the more traditional interactions of the Company with certified and accredited structures on the one hand and with suppliers of various services on the other, of further possibilities of interaction and the use of suppliers of health and external welfare services (service co-operatives, non-profit companies, voluntary workers, collaborators etc.).

Figure 2. Context diagram of the health Company



To sum up, all of these interactions taken together require the information system supporting them to be able to:
• concentrate on the management of the fundamental interactions (those with the patient), so as to ensure the governance of the strategic objectives pursued;
• handle the other exchanges of information adequately, in particular by progressively adopting a logic of close integration of information flows with external bodies (for example, entrusting GPs with computerised appointment-booking operations);
• automatically produce a full and self-explanatory set of information, in the public domain or with controlled access, so as to guarantee the necessary transparency of process performance and therefore make it usable by third parties, by way of portals.

2. Internal organisation: objectives and strategic variables

The primary objective of the Company is that of guaranteeing essential levels of welfare, according to what is laid out in the National Health Plan:
• collective health care in living and working environments;
• district medical care;
• hospital medical care.
This division of objectives can be immediately translated into a structural division of the Company into Departments, Districts and Hospitals, each of which expresses its own specific information needs.
The three types of structure, visualised in Figure 3, must be supported by service structures and must be governed by a system of objectives and strategic variables which integrates them.
As to the former, Figure 3 visualises the district, department and hospital structures as the core of the components the Company is made up of: a core supported by a group of general technical and administrative services, which exchanges data, information, processes and activities with the outside in two ways:
• processes of access by the Company's users;
• processes of control by the strategic management over the Company.


Figure 3. Organisational components of the health Company

The strategic variables transversal to the three types of structure can then be identified on the basis of the two essential functions of access and control:
• management control;
• the information and IT system, decisive both for the circulation of data for access purposes and for their analysis for control purposes;
• the quality system, i.e. renewed attention to the service offered to the user;
• human resources development.

3. The role of the information system: a system for transforming data into information

Having an "information system", and not just scattered chunks of IT applications, represents the decisive leap in quality which needs to be taken; some fixed points must be established in this direction, methodological points to be respected in that they constitute the "critical success factors" of an information system.

The Information System generates information useful for improving management and therefore takes account of, and provides an answer to, the following aspects:
• the operational processes which constitute the activities of the Company at its various levels, and their integration points;
• the decisions which must be taken on a daily basis;
• the assessments which must be carried out in order to make decisions;
• the organisational responsibilities of those called upon to carry out the assessments and make the decisions.
In order to achieve these objectives, the system must meet the following requirements:
• it is not parallel to the management system but an integral part of it, in that all the information, both for efficiency purposes and for system quality, can be constructed and generated by ordinary operational processes;
• the same datum, within the scope of a homogeneous procedure, is collected only once and from one place only, in that the duplication of information sources, far from being a factor of data quality, is certainly a cause of redundant procedures and even of system errors;
• it has the utmost flexibility and the highest degree of openness to the evolution of solutions, both technological and, especially, organisational, so as to be able to cope with the inevitable multiplicity, diversification and poor standardisation of activities.
The best possible integration between the applications and the archives which constitute the information system must be envisaged, in that only through integration is it possible to obtain both process efficiency and the availability of top-level information.
In other words, information, which is the significant aggregation of data processed according to interpretative models, is often the result of combining data contained in archives belonging to different subsystems.
There could be many examples, but it is perhaps sufficient to recall the problems of management control, the typical management function through which the Company ensures that the activity carried out by the organisation goes in the direction set out by the units concerned and that it operates in the most economical conditions possible.
From an information point of view, management control is an area where very diversified information is collected and summed up into economic data and activity data, or indicators of results, so that the costs of the activities and the relative production factors can be compared.
Expressed in these terms the reasoning is not particularly complex. The crucial point, nevertheless, is that designing such a system requires the identification of the different levels at which the data must be treated and processed and, in a correlated way, the identification of the integration mechanisms of the data themselves.
Since integration is the solution to the functioning problems of the information system of the company, an integration able to exchange flows of information is to be achieved by means of solutions which enable the sharing of archives by all subsystems.
It is therefore clear how the help of a strong, organic and flexible support system is fundamental to information and management activities.
The system must be organised in such a way that the data only originate from primary sources, identified as follows:
• original data, generated by management processes;
• second level data, produced by processing procedures;
• complex data, resulting from the automatic acquisition of data from more than one archive.

The three above-mentioned types are analysed and described as follows.


Original data:
We can define as original all the data that, with respect to the information process as a whole, are collected at the origin of the process, strictly correlated to the operational management of the activity. Original data are therefore all the current information gathered and managed by the part of the information system that can be defined as transactional, or On Line Transactional Processing (OLTP).

Second level data:
This is data obtained through the processing of information already acquired by the information system, possibly by procedures of a subsystem different from the one which is using the data.

Complex data:
This is data necessary for the activities of control, management and statistical and epidemiological assessment, originated by crossing data present in more than one archive at the moment in which they are correlated, in order to express significant values. This therefore leads to the definition of one or more data warehouses, starting from which the specific application systems carry out processing at a more aggregated level, in an On Line Analytical Processing (OLAP) logic.
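As an illustration of this OLAP logic, the following sketch (a hypothetical toy example in Python with SQLite, not part of the system described here; table and column names are invented) shows complex data being derived by crossing two archives fed by different subsystems, an admissions archive and a delivered-services archive, to obtain an aggregate cost indicator usable for management control.

import sqlite3

# Toy data warehouse: two archives fed by different subsystems.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE admissions (admission_id INTEGER PRIMARY KEY,
                         patient_id INTEGER, department TEXT);
CREATE TABLE services   (service_id INTEGER PRIMARY KEY,
                         admission_id INTEGER, cost REAL);
INSERT INTO admissions VALUES (1, 100, 'Cardiology'), (2, 101, 'Cardiology'),
                              (3, 102, 'Surgery');
INSERT INTO services   VALUES (1, 1, 250.0), (2, 1, 80.0),
                              (3, 2, 400.0), (4, 3, 1200.0);
""")

# Complex data: crossing the two archives yields an indicator
# (average cost per admission, by department) for management control.
for department, n_adm, avg_cost in db.execute("""
    SELECT a.department,
           COUNT(DISTINCT a.admission_id),
           SUM(s.cost) / COUNT(DISTINCT a.admission_id)
    FROM admissions a JOIN services s ON s.admission_id = a.admission_id
    GROUP BY a.department"""):
    print(department, n_adm, round(avg_cost, 2))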

Figure 4. The layers of the health information system.


The division by types of data described above can be interpreted in the light of the centrality of the patient in the health information system: Figure 4 visualises the analogy between the levels of data complexity and the thematic layers of the health information system related to the patient:
• the system centres upon the patient, who is at the base of the information pyramid;
• the next layer is constituted by the original data, which are related to the patient;
• further up lies the processing characteristic of operational systems (second level data), which operates fundamentally in the service of the patient, both directly, in that the patient is its direct user, and indirectly, as support to the work of health staff;
• the top two layers, both characterised by complex data resulting from statistical or OLAP processing, divide the information about the patient according to the two axes, effectiveness and efficiency, recalled most often:
  - in the first direction, the essential processing is of the medical and clinical sort, supporting the quality of the health outcome and the capability of the system to guarantee the necessary levels of assistance;
  - in the second direction, the typical processing is that supporting managerial and strategic decisions and the control of expenditure and the relative production level.

4. Orientation to the patient and analysis capability

As a whole, the information system of a health Company must be able to undertake many and diverse tasks, and it is necessary to adequately computerise all the processes inside the Company's administrative machine. In this sense the general orientation to the patient, which constitutes the reading key favoured here, is obviously not sufficient. In fact, it is clear that there are specific areas of management and analysis where it is essential to support specific processes. For example, in the support of specific activities such as hotel-type services or interventions of occupational medicine, it is necessary to interface with companies rather than with individuals.
However, according to the approach presented here, the computerisation of internal processes (accounts, staff management, supplies), even though essential, must be the result of an overall design which puts at the centre the management of processes orientated to users outside the Company. And this for two reasons:
• firstly, because the average level of computerisation of administrative processes is often already higher than that of the strictly productive ones, so that it is above all in the latter that effectiveness and efficiency can still be improved;
• secondly, because only by making data flow in a direct and integrated way from production processes to administrative ones, in particular according to the analysis of effectiveness and efficiency mentioned above, is it possible to provide real added value to the Company through its information system.
The ambition of this proposal of computerisation based on an integrated information system moves in this direction.
This can be illustrated with the aid of the diagram in Figure 5, which visualises the main (though certainly not the only) ways in which the patient accesses the health system.


Figure 5. Simplified diagram about access to health services


The three "ways of access" identified in the figure (medical check-up/prescription from the family doctor, hospital access through admissions and hospitalisation, information access by way of the PR Office) set up a path which, through the automation of the phases of the patient's entry into the health and medical care system (Appointments Centre, ADT, support to the PR Office), constructs a precise information route which has numerous uses:
• firstly, the patients' register acts as the place for the co-ordination and integrated management of the different information elements;
• the various processes of care act on this level (admissions, tests and check-ups); these produce technical information elements (for example the Hospital Discharge Card) but also administrative and economic effects, with the subsequent valuation of the services delivered;
• the administrative and control processes, i.e. broadly speaking all the internal aspects necessary for developing the analysis capability of the Company and its top management, are the consequence of the access process and of the complex path of welfare and medical care.

All in all, therefore, building up an information system orientated to the patient, provided the flows handled are correctly integrated, also turns out to be an effective and complete way of supporting the internal processes of an administrative sort, because of their natural correlation with the welfare and medical care path.

5. The Hospital Information System (HIS)

The Hospital Information System, in order to manage the complex wealth of company information and at the same time guarantee the involvement of the operative departments, must satisfy the objectives illustrated below:

Centrality of the patient

The information generated about a patient at the moment they come into contact with the structure and receive the services requested is collected and aggregated at the level of the patient themselves.

Co-ordination of the welfare process

The co-ordination and planned co-operation between operative units within the health structures (hospital specialists, family doctors, district teams in the hospital-territory system) enables the activation of a welfare process which ensures the highest levels of quality, timeliness and efficiency in the access to services and their delivery.

Control of production processes

Recent health legislation imposes an optimisation of expenditure and a revision of structures and processes. The company-style transformation requires that medical care be carried out in the context of a global process which contemplates both clinical and administrative aspects.

The primary objectives that the Clinical/Hospital subsystem proposes to address are as follows:
• improvement of the organisation of work for a freer and more rational use of health structures;
• improvement of the efficiency and effectiveness of services in the light of the qualitative growth of clinical welfare;
• reduction of the time spent by the patient in the health structure, by means of the organisation of the waiting times between the request for a service and its delivery (average in-patient times);
• rationalisation of the management of health information about the patient: complete and exhaustive previous clinical histories;
• availability of information for investigations and statistical surveys, clinical and epidemiological research;
• use of the System as a decision support.

Figure 6 represents the integration between the modules which compose the subsystem of the Clinical/Hospital Area. It confirms the centrality of the register, i.e. of the patient, which acts as the primary node of the whole system.

Figure 6. Integration of modules



6. Integration of modules - processes

The integration of modules, i.e. what the term Integrated Health System implies, cannot be separated from a new company vision: a vision based on "Processes".
This statement proposes another determining key for the "information system" as a strategic variable: the capability of the system to support company processes and, therefore, to map and configure itself flexibly onto these processes. In short, the design logic requested of an ERP system must concern not only administrative and accounting systems, which is by now an acquired fact, but health systems too, both of the health-administrative sort and the health-professional sort.
Each process is in itself complex and integrates administrative, health, economic and welfare elements. Each process therefore requires information integration within itself and with other correlated processes.
As Figure 7 shows, an implicit logical information flow exists which transports information from health processes ("production" in the strictest sense) towards directional processes (the "government" of production), transiting through processes which are to a certain extent auxiliary (but obviously essential for the functioning of the production machine of the company), of the health/administrative and accounting/administrative type.


Figure 7. Integration of processes



The architecture of the system must enable the support of the processes described in the figure and of the information flows which tie these processes together, and in particular:
• the territory-hospital integration favouring the patient and the relative welfare processes. The integration of welfare processes between the operative units of hospitals and the operators over the territory (family doctors and pharmacies) is activated through the support of the functionality of the whole process of delivery of services: from information activities, prescription, appointment making and delivery, to the payment of tickets and the withdrawal/acquisition of return information;
• the integration of the operational units, whether clinical or not, as an auxiliary service to the medical and health staff in carrying out the activities relative to the care of the patient. The information system must handle the activities of the care process in a unitary and facilitated way, controlling the process of delivery of the services requested and their outcome, so as to obtain an improvement in quality and efficiency;
• the integration of clinical-welfare information in order to guarantee the compactness of the welfare process. The visibility of the status of the total clinical-welfare process towards the patient is made possible thanks to the access to previous clinical records;
• the integration of information for directional purposes as an auxiliary service to the management personnel of the Company. Following the latest reforms, Companies are pursuing management improvement, guaranteeing the delivery of services at the highest levels of quality and aiming at the final objective of health, the total outcome of welfare. From this comes the need to collect, reconcile and integrate information coming from the different administrative and health information subsystems;
• the information integration with the General Management of the Regional Health Office in order to facilitate the administrative operations aimed at reimbursements (communications with local health authorities and with the Regional office concerning budgets and services delivered), the regional control and supervision activities concerning health expenditure, and lastly the activities of health (epidemiological) surveillance. The interaction of the Company with the GM of the Regional Office has a double scope: to transmit promptly the documentation necessary to receive the reimbursements and regional financial support the Company is due for services delivered, and to provide the Region with the information necessary in support of the governing of expenditure and the management of financial support, the planning and rebalancing of the Regional Health System and the improvement of services for the population.

What follows is an illustration of three examples of health processes which involve a series of integrated modules to achieve their objective.
The key identifies the fundamental health processes carried out at the hospital structure (represented on the left-hand side of the figures which follow), as well as the specific activities necessary for carrying out the processes themselves (in the green circles in the tables).

6.1. Delivery of Services Process

The Delivery of Services process starts with the request (Prescription) for the Service itself from the family doctor, the hospital doctor or the Emergency Units, and concludes with the issue of the referral (health process).
The administrative/accounting process is completed with the handling of payment, the production of the flows relative to the Ambulatory Services, and the handling of the flows relative to the Mobility of Specialist Services.

Figure 8 - Delivery of services process


The process of making appointments (request) can be activated by the modules:
• Appointment Centre and Web Appt. Centre (external patients);
• Internal Appt. Centre - Admissions, Discharges and Transfers (ADT) and Ward Management (internal patients);
• Emergency Department and Admissions (Management of Urgent Requests);
• Out-Patient Management (Direct Acceptance of Service).
All the services within the preceding modules must automatically generate/feed the work lists of the Delivery Units of the services themselves (Ambulatory Management) and, if subject to payment, produce accounting entries visible to the Cash Department Management module (the movement is recalled by the identification code of the appointment, or alternatively by personal data).
The acceptance and delivery of the service activates the referral process and the management of its progress; the issuing of the referral enables its visualisation by the modules: Ambulatory/Laboratories Management, Admissions Discharges and Transfers (ADT), Ward Management, Emergency and Admissions Department (EAD), Operating Theatres and the Departments concerned.
The services registered as delivered activate the processes of the administrative/accounting flows: control and analysis of ambulatory services (production of File "C"), control and data validation for feeding the Mobility of Specialist Services.

6.2. Hospitalisation process

The Hospitalisation process starts with the request (Prescription) for admission to hospital from the family doctor, the hospital doctor or the Emergency Units, and concludes with the discharge of the patient and the closing of their Clinical Record (health process).
The administrative/accounting process is completed with the handling of the Hospital Discharge Card (HDC) and the pricing of the HDC itself (Grouper 3M integration), the production of the flows supplied by the closed HDCs, and the management of the flows relative to Hospitalisation Mobility.

Figure 9 - Hospitalisation process

The Hospitalisation Process (request) can be activated by the modules:
• Admissions, Discharges and Transfers (ADT), for programmed hospitalisation and for Day Hospital (DH) admissions (both as administrative admission and waiting list);
• Ward Management (programmed hospitalisation);
• Emergency Department and Admissions (urgent hospitalisation);
• Internal Appt. Centre, for DH admissions with services/procedures by appointment.
Once admission has been carried out, whichever module is used, the patient is placed directly onto the in-patients list for the ward or onto the list of admissions to be confirmed for the ward.
From the ADT and Ward Management modules it is possible to visualise the previous reports of hospitalisation and ambulatory services, i.e. the patient's clinical history.

The cards relative to hospitalisation, once filled out, are recorded in the HDC/DRG module, which carries out the pricing (Grouper 3M) and the formal controls and then feeds the Hospitalisation Mobility module.
The In-patients Ward Management represents in itself the evolution of a clinical-health process within the area of the administrative process (admission and filling out of the Discharge Card).
The degree of integration ensures that the list of in-patients of each single ward is fed by all the modules of the Health System which can carry out the administrative hospitalisation. Data relative to therapies/medicine administration and the pages of the specialist clinical record are visible in the complete clinical history of the patient. The administration of medicine generates an automatic discharge entry from the ward pharmacy cabinet, which is integrated with the Pharmacy Store for the management of the sub-stock and the relative orders for supplies. From the specialist clinical record it is possible to access all the administrative and clinical data relative to the hospitalisation itself as well as to previous hospitalisations (including all the data about ambulatory services with referrals inside the Ambulatory Management).
The figure which follows represents the logical flow relative to the integrated management of an In-Patient Ward.

Figure 10 - Integration of an in-patient ward

6.3. Welfare process

The Delivery of Welfare Services process starts with the request (opening of a file) for the Service itself from the family doctor or hospital doctor, and is defined with the management of the Multi-dimensional Assessment file for Adults and the Elderly and the placing onto the waiting list according to the type of welfare regime.
The health process is completed with the discharge from the welfare structure and the closing of the Clinical Record.
The administrative/accounting process is fed by the recording of the activities delivered, so that the price lists relative both to private structures and to those in the National Health Service apply. On the basis of the planned activities it is possible to carry out expenditure forecasts for each structure and each type of welfare regime. For every welfare structure it is possible to import specific plans (defined by the regions) relative to the activities carried out on the patients on their lists.
On the basis of the data from the files, it is possible to carry out controls on the suitability of those data; such controls enable the transparent management of payments made to private structures operating under the welfare regime.
Figure 11 - Welfare Process



The welfare process (request, opening of a file) can be activated by the modules:
• Admissions, Discharges and Transfers (ADT), for the management of integrated home care and protected residential assistance, should this activity be strictly connected to the post-hospitalisation phase, i.e. protected discharge;
• Social-Health Department (management of the Multi-dimensional Assessment file for Adults and the Elderly and the placing onto the waiting lists).
The assessment of the patient, for whom the request for assistance has been presented, activates the specific welfare regime:

• Hospitalisation in Protected Welfare Accommodation;
• Day Centre Care;
• Temporary Social Hospitality;
• Protected Accommodation;
• Day Centre for the demented;
• Rehabilitation assistance;
• Integrated Home Assistance;
• Hospice.

A WIRELESS-BASED SYSTEM FOR AN INTERACTIVE APPROACH TO MEDICAL PARAMETERS EXCHANGE

GIANNI FENU
University of Cagliari
Department of Mathematics and Informatics
e-mail: fenu@unica.it

ANTONIO CRISPONI
University of Cagliari
Department of Mathematics and Informatics
e-mail: antonio@sc.unica.it

SIMONE CUGIA
University of Cagliari
Department of Mathematics and Informatics
e-mail: simone@sc.unica.it

MASSIMILIANO PICCONI
University of Cagliari
Department of Mathematics and Informatics
e-mail: mpicconi@sc.unica.it

The use of computer technology in the medical sector has recently seen the study of various applications suited to the management of clinical data, whether textual or image-based, across networks in which priority has always been given to policies of flexibility and security. In the model shown here we summarise the flexibility factors necessary for developing systems with a broad application-oriented spectrum, made possible by the investment in wireless technology, which provides security and guarantees a certified client-server exchange over the network. Moreover, the need to offer a broad base on the client side has suggested the adoption of several PDA models, making the application largely portable and allowing architectural independence. The same smart-client wireless network allows the coverage area to grow cell by cell.

1 Introduction
The diffusion of computer tools in the different sectors of modern medicine has marked an evident discontinuity in the development of scientific activities.
Notable benefits have been brought to the field of medicine by the improvement of hospital services and by the consequent growth in the quality of care, which in different ways is related not only to the development of the information process in a strict sense, but also to the transfer of information.
In particular, the transfer of information has seen over time an improvement of applications both inside and outside the hospital environment.
The evolution of computer technology and of wireless networks, combined with the development of applications designed to accommodate user mobility, offers several fields of application.
This contribution arises among such innovative services and technologies: its aim is the study and implementation of a client-server network in a hospital environment, using palm devices for the interactive consultation, visualization and insertion of the data relating to patients' clinical records.
It allows the retrieval of information relating to any previous patient admission and clinical record data, or simply the verification of the quality and/or quantity of specific parametric data. The opportunity to consult the patients' database within the hospital structure allows the analysis and monitoring of therapies without the need for paperwork, suggesting a different method of visualization of the data in each department.
The client interface has been designed to guarantee easier use and consultation, making it user-friendly for users with little computer knowledge, and to allow security and flexibility in hospital departments.
Particular attention has been given to the radio frequencies used inside hospitals, through the use of the ISM (Industrial, Scientific and Medical) frequencies specified by the ITU Radio Regulations for scientific and medical devices, with bands between 2.4 and 2.5 GHz, allowing the network to comply with the law.

2 Architectural Model
The solutions for the development of wireless applications can be classified as browser-based, synchronization-based and smart-client [3].
The browser-based approach has the disadvantage of requiring a permanent connection, with the consequent problems of a data exchange higher than the user's actual needs and of caching of information, which may not be up to date.
The synchronization-based solution has the opposite disadvantage, offline operation: it does not allow the system to work in real time on the wireless network, since the application uses a cache of data on the handheld device.
The smart-client solution allows the network exchange of only the information requested, guaranteeing data rate and a simple inquiry mode; a further advantage is the independence from the network architecture, integrating into the existing server architecture.

Figure 1. Smart-client architecture. Wi-Fi connection between a PDA client and a server.

Client-server systems were born as the coordination of two logical systems (the client and the server) and of their components within applications. Planning a client-server application requires the choice of an appropriate architecture; client-server software has to be modular, because many parts of an application are executed on more than one system [1].
The separation of application functions into different components makes client-server processes easier, as components provide clear locations when the applications are distributed.
It is also necessary to divide application modules into independent components, so that a requesting component (client) expects its outputs from a responding component (server).
Starting from the classical definition of client-server architecture as the interaction among distributed application components, it can be seen that the execution location of the components becomes the main factor for the performance of the application and of the system. Some application components can be executed more efficiently on the server or on the client.
The tasks of each side are given below:

Tasks of the client:
• Visualization/presentation;
• Interaction with users;
• Request formulation towards the server.

Tasks of the server:
• Queries on the database;
• Possible arithmetic and graphic processing;
• Patient data management;
• Communication and authentication of users;
• Response processing.

To make the code executable on several handheld devices (Pocket PC and Palm) it is important to eliminate operating-system dependences from the application code, since the hardware interfaces and peripherals of PDAs are currently not standardized; this allows use on different operating systems (Palm OS and Windows CE).
In the network architecture, the management of client-server interactions uses communication techniques based on sockets, currently considered the most flexible technology.
The client negotiates a socket with the server and establishes a connection on that socket; through this channel all the information of every single client converges on the server, so that the same client, communicating with the server at different moments, may not receive data from the same server port each time. A minimal sketch of this exchange is given below.
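The following Python fragment is a minimal sketch of this socket-based request/response exchange (illustrative only: the host address, port and blocking style are assumptions, and the original client runs on PDA operating systems rather than on desktop Python).

import socket

SERVER_ADDRESS = ("192.168.1.10", 5000)   # hypothetical hospital server

def ask_server(request: str) -> str:
    # Open a socket, send one request string and collect the whole reply.
    with socket.create_connection(SERVER_ADDRESS, timeout=5) as sock:
        sock.sendall(request.encode("ascii"))
        sock.shutdown(socket.SHUT_WR)       # tell the server the request is complete
        reply = bytearray()
        while True:
            block = sock.recv(1024)
            if not block:                   # server closed the connection
                break
            reply.extend(block)
    return reply.decode("ascii")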

3 Patients’ data management


The filing, management and consultation of the database are managed by a DBMS engine, which allows the abstraction of the data from the way in which they are stored and managed and, above all, the possibility to perform queries in a high-level language.
Particular importance is also given to the authentication system, to prevent database consultation by unauthorized users.
To implement such a mechanism, different user profiles are provided, diversified into several authorization levels and, at a minimum, into two areas: a restricted area and a reserved area.
On the skeleton of the patients' clinical records, a relational structure has been designed containing the patients' information fields: the bed occupied, the hospitalization information, the temperature curve, the water balance, the parameters relating to breathing and cardiac rate, the specific therapy and the medical examinations of the patient.
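A minimal sketch of such a relational structure is given below (a hypothetical SQLite layout written in Python; table and column names are invented here, and Figure 2 shows the actual scheme of the database).

import sqlite3

db = sqlite3.connect("patients.db")
db.executescript("""
CREATE TABLE IF NOT EXISTS patient (
    patient_id   INTEGER PRIMARY KEY,
    surname      TEXT, name TEXT,
    bed          TEXT,                  -- bed currently occupied
    admitted_on  TEXT                   -- hospitalization information
);
CREATE TABLE IF NOT EXISTS vital_signs (
    patient_id    INTEGER REFERENCES patient(patient_id),
    recorded_at   TEXT,
    temperature   REAL,                 -- temperature curve
    water_balance REAL,                 -- water balance
    heart_rate    INTEGER,
    resp_rate     INTEGER
);
CREATE TABLE IF NOT EXISTS therapy (
    patient_id    INTEGER REFERENCES patient(patient_id),
    prescribed_at TEXT, drug TEXT, dosage TEXT
);
""")
db.commit()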

Figure 2. Scheme of the database.

4 Client interface
The application meets three fundamental requirements: to provide users with informative reports, to provide data transfer security and reliability, and to provide data communication in different formats compatible with the computational architecture and the interface of the clients.
The mechanism of communication/synchronization between server and client is implemented through a pattern-matching system which interprets, on the server side, the client commands and, on the client side, the server commands [1][4].
The steps of the communication between server and client are (a minimal sketch of this exchange follows the list):
• sending from the client side of a string composed of the association <command>#<parameter>;
• decoding of the command string by the parser on the server side;
• execution of the command and the corresponding operation on the database;
• encoding of the information on the server side as <command>#<information>;
• sending of the information from the server side;
• decoding of the response string by the parser on the client side.
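The sketch below illustrates, under stated assumptions, the <command>#<parameter> encoding and a server-side parser that dispatches to response functions (the command names, field layout and patient lookup are hypothetical, not the actual command set of the system).

# Server-side parser: maps command names to response functions.
def get_personal_data(patient_id: str) -> str:
    # Hypothetical lookup; a real implementation would query the DBMS.
    return f"PERSONAL_DATA#surname=Rossi;name=Mario;bed=6;id={patient_id}"

def get_cardio_params(patient_id: str) -> str:
    return f"CARDIO_PARAMS#hr=72;rr=16;bp=120/80;id={patient_id}"

HANDLERS = {
    "GET_PERSONAL_DATA": get_personal_data,
    "GET_CARDIO_PARAMS": get_cardio_params,
}

def handle_request(message: str) -> str:
    # Decode '<command>#<parameter>', run the matching handler and
    # return the '<command>#<information>' response string.
    command, _, parameter = message.partition("#")
    handler = HANDLERS.get(command)
    return handler(parameter) if handler else "ERROR#unknown command"

# Client side: build the request, then decode the response the same way.
response = handle_request("GET_CARDIO_PARAMS#patient_42")
kind, _, payload = response.partition("#")
print(kind, payload)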

Figure 3. Searching mask of patients.

Figure 4. Searching mask of patient's personal data.

Figure 5. Searching mask of cardiobreathing parameters.


During the implementation of the communication protocol, some problems in the management of the data flow became evident; these depend on the different implementations of sockets between client and server on the different platforms.
To solve those problems and overcome these limitations, it was chosen to limit the information sent from the server to the client to a fixed size of 100 bytes per step, as sketched below. In general, the server-side code interprets the commands arriving from the client and performs the calls to the response functions.
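A minimal sketch of this fixed-size sending on the server side (the 100-byte block size is the one chosen above; the socket handling details and function names are assumptions):

BLOCK_SIZE = 100   # fixed block size chosen to smooth over socket differences

def send_in_blocks(sock, payload: bytes) -> None:
    # Send the response to the client in blocks of at most 100 bytes.
    for offset in range(0, len(payload), BLOCK_SIZE):
        sock.sendall(payload[offset:offset + BLOCK_SIZE])

# Typical use inside the server loop, with the response built by the parser:
# send_in_blocks(client_socket, handle_request(message).encode("ascii"))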

Figure 6. Searching mask of water balancing parameters.

Figure 7. Searching mask of the parameters of the temperature curve.

Particular attention has been given to the implementation of the temperature curve; from the client side it is possible to visualize the graph of the daily or weekly temperature.
This element is of notable aid, for example, in verifying the post-operative progress of patients during the medical discussion.

5 Standards for transmission


To provide the required interactivity between client and server for the exchange of real-time information, an extremely flexible communication system has been implemented, suitable to guarantee the mobility of the PDA [2][6][7].
Different technological solutions are currently available on the market to satisfy the above requirements. Transmission occurs in the ISM (Industrial, Scientific, Medical) band around 2.45 GHz; it uses spread-spectrum techniques to avoid interfering signals. The signal is spread over a band wider than necessary and, in this way, several systems can transmit in the same band [5].
Among the classes of wireless devices there is a distinction between devices designed to integrate seamlessly with pre-existing Ethernet LANs (IEEE 802.11), and devices that use an alternative technology to implement personal area networks (PAN), such as Bluetooth.

The IEEE 802.11b protocol, which defines a standard for the physical layer and for the MAC sublayer for the implementation of Wireless LANs, represents a communication system that extends a traditional LAN over radio technologies, and so it facilitates integration in an existing department. The adopted WLAN may be configured in two separate modes:
• Peer-to-peer. A connection without any existing fixed network device; two or more devices equipped with wireless cards are able to create a LAN on the fly.
• Client/server. This mode permits the connection of several devices to an Access Point, which works as a bridge between them and the wired LAN.

Bluetooth [8] is the name of an open standard for wireless communications, designed for short-range, low-power transmissions, with ease of use. Bluetooth works at the frequency of 2.4 GHz in the ISM band, in TDD (Time Division Duplex) and with Frequency Hopping (FHSS). It uses two different typologies of transmission: the first, on an asynchronous transmission bus, supports a maximum asymmetric rate of 721 Kbps; the second works on a synchronous bus, with a symmetric rate of 432.6 Kbps.
To communicate among themselves, two or more BDs (Bluetooth Devices) have to form a piconet (a radio LAN with a maximum of eight components). The baseband level reserves slots for the master and slots for the slaves, with a typically alternating pattern (master-Slave1-master-Slave2-...), for the resolution of conflicts on the bus. The master transmits in even-numbered slots, while the slaves transmit in odd-numbered ones. The forwarding scheme of the packets is entrusted to the ARQ (Automatic Repeat reQuest) mechanism: the notification of an ACK (ACKnowledgment) to the sender of the packet testifies to the good outcome of the transmission.

The smart-client architecture works on both wireless systems. To be able to work over a wide area, the use of 802.11b technology is recommended: it is altogether low-cost, compatible with the Internet and Ethernet standards, and allows a high transmission speed with good QoS (Quality of Service), ensuring low packet losses in real-time applications.

6 Cryptography and Security.


One of the weak points of any radio communication is the privacy and security of the data. Usually, for this reason, when a protocol for radio communications is implemented, advanced security standards are adopted. In this case too, security has been entrusted to the functions of data authentication and cryptography.
The authentication process can take place in two ways:
• Open System Authentication;
• Shared Key Authentication.
The first way does not provide real authentication, so any device is permitted to access.
The second way relies on a pre-shared key. This process is similar to the authentication process used in the GSM architecture: when the server receives an authentication request, it sends the client a pseudorandom number. The terminal user, based on the pre-shared key and on the pseudorandom number, calculates the output (through a non-reversible function) and sends it to the server. The server performs the same computation and compares the two values; in this way it determines whether the user is entitled to access. In this way the authentication between the PDA and the server is guaranteed.
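A minimal sketch of this challenge-response scheme follows, using HMAC-SHA256 as the non-reversible function (an assumption made for illustration; the paper does not specify which function is used, and the key value is invented).

import hmac, hashlib, secrets

PRE_SHARED_KEY = b"hospital-demo-key"    # hypothetical pre-shared key

def server_challenge() -> bytes:
    # Server side: generate the pseudorandom number sent to the client.
    return secrets.token_bytes(16)

def client_response(challenge: bytes, key: bytes = PRE_SHARED_KEY) -> bytes:
    # Client side: combine challenge and key through a one-way function.
    return hmac.new(key, challenge, hashlib.sha256).digest()

def server_verify(challenge: bytes, response: bytes,
                  key: bytes = PRE_SHARED_KEY) -> bool:
    # Server side: repeat the computation and compare the two values.
    expected = hmac.new(key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)

challenge = server_challenge()
assert server_verify(challenge, client_response(challenge))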

One of the aspects of the use of the 802.11b standard is its own security, which is entrusted to a protocol called WEP (Wired Equivalent Privacy), concerned with the authentication of nodes and with cryptography.
The logical diagram of the WEP algorithm is represented in the following figure:


Figure 8. Operation diagram of the WEP algorithm

The initialization vector (IV) is a 24-bit value, concatenated with the secret key (a 40-bit key).
In this way we obtain a set of 64 bits that is fed as input to a generator of pseudorandom codes (the WEP PRNG), creating the key sequence.
The user data (plaintext) are concatenated with a 4-byte value, called the Integrity Check Value (ICV), generated by the integrity algorithm.
At this point the key sequence and the output of the concatenation of plaintext and ICV are submitted to a XOR operation (ciphertext). Then the IV and the ciphertext are concatenated and subsequently transmitted. The IV changes for every transmission, and it is the only part transmitted in clear, so that the message can be reconstructed at the receiving end.
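A minimal sketch of this WEP encapsulation follows, using RC4 as the pseudorandom generator and CRC-32 as the integrity algorithm, which are the choices made by the WEP standard itself; the key and frame contents below are illustrative only.

import os, zlib

def rc4_keystream(key: bytes, length: int) -> bytes:
    # RC4: the pseudorandom generator used by WEP for the key sequence.
    S = list(range(256))
    j = 0
    for i in range(256):                      # key-scheduling phase
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    out, i, j = bytearray(), 0, 0
    for _ in range(length):                   # keystream generation phase
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)

def wep_encrypt(plaintext: bytes, secret_key: bytes) -> bytes:
    iv = os.urandom(3)                                   # 24-bit IV, new for every frame
    icv = zlib.crc32(plaintext).to_bytes(4, "little")    # 32-bit integrity check value
    keystream = rc4_keystream(iv + secret_key, len(plaintext) + 4)
    ciphertext = bytes(a ^ b for a, b in zip(plaintext + icv, keystream))
    return iv + ciphertext                               # the IV is sent in clear

frame = wep_encrypt(b"Frq. Card. 72/min", b"\x01\x02\x03\x04\x05")   # 40-bit key
print(frame.hex())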

7 Conclusions
The described architecture is characterized by simplicity of implementation and by ease of insertion into complex existing architectures.
This integration model is characterized by portability, security and interactivity, and it allows the user to interact with the server even at a distance.
Some enhancements are under study to work out different models and criteria for direct data exchange between PDA architectures.
In any case, the model of a user free of constraints in the treatment of, and access to, the patient's parametric data already represents a simple and reliable smart-client architecture.

References

1. E. Guttman and J. Kempf, Automatic Discovery of Thin Servers: SLP, Jini and the SLP-Jini Bridge, Proc. 25th Ann. Conf. IEEE Industrial Electronics Soc. (IECON 99), IEEE Press, Piscataway, N.J., 1999.
2. AA.VV. (edited by F. Muratore), Le comunicazioni mobili del futuro. UMTS: il nuovo sistema del 2001, CSELT, 2000.
3. R. Kuruppillai, M. Dontamsetti and J. Casentino, Tecnologie Wireless, McGraw-Hill, 1999.
4. L. Bright, S. Bhattacharjee and L. Rashid, Supporting diverse mobile applications with client profiles, International Workshop on Wireless Mobile Multimedia, Proc. 5th Ann. Conf. ACM, pp. 88-95, Atlanta, Georgia, 2002.
5. European Telecommunications Standards Institute, official site: http://www.etsi.org
6. Palm PDA official site: http://www.palm.com
7. Italian Palm Users Group, official site: http://www.itapug.it
8. Bluetooth official site: http://www.bluetooth.com
