
Data Fusion in Geographic Information Systems

Amlan Chakrabarti
University of Calcutta
KGEC : 30th June, 2009

Part 1: Preliminary Concepts of Data Fusion
The Balance of Knowledge Discovery and Analysis

[Slide diagram: a loop that balances user-side knowledge discovery (decompose problem, formulate and refine queries, retrieve information, accumulate/filter/fuse data, assess and analyze hypotheses, evaluate hypotheses and inferences, interact and collaborate with other human and virtual agents) against system-side analysis (meta-data tagging, accept and format data, hierarchical and problem-centered decomposition, source analysis and discovery, data filtering, correlation and fusion, formatting and display of reports, process monitoring and adaptation for improved performance).]

Outline
• What is Data Fusion?
• JDL Fusion Model
• Challenges of Data Fusion
• Data Fusion Problem
• Mathematical Techniques
• Bayes Theorem
• Fusion Technologies
• Research Problems and Challenges
• Conclusion

What is Data Fusion?

• Data fusion is an information process dealing with the:
  – association, correlation, and combination of data and information from
    • single and multiple sensors or sources,
  – to achieve
    • refined estimates of parameters, characteristics, events, and behaviors for observed entities in an observed field of view.

Formal Definition

• A process dealing with the association, correlation, and combination of data and information from single and multiple sources to achieve refined position and identity estimates, and complete and timely assessments of situations and threats, and their significance. The process is characterized by continuous refinement of its estimates and assessments, and the evaluation of the need for additional sources, or modification of the process itself, to achieve improved results.
  (Data Fusion Lexicon, JDL Data Fusion Subgroup, 1987)

• A process of combining data or information to estimate or predict entity states.
  (A. N. Steinberg, C. L. Bowman, F. E. White, 1998)
The JDL (Joint Directors of Laboratories) Data Fusion Model

[Slide diagram: sources feed the data fusion domain through source pre-processing. Within the domain, Level One (Object Refinement), Level Two (Situation Refinement), and Level Three (Threat Refinement) operate over a database management system with support and fusion databases. Level Four (Process Refinement) monitors the process, and Level Five (Cognitive Refinement) connects the domain to human-computer interaction with the user.]

JDL Model in Brief

• Level 1 performs "object refinement", which is an iterative process of fusing data to determine the identity and other attributes of entities and also to build tracks to represent their behavior.

• Level 2 performs "situation refinement", which is an iterative process of fusing the spatial and temporal relationships between entities to group them together and form an abstracted interpretation of the patterns in the order-of-battle data. The product from this level is called the situation assessment (DSTO, 1994).

• Level 3 performs "threat refinement", which is an iterative process of fusing the combined activity and capability of enemy forces to infer their intentions and assess the threat that they pose. The product from this level is called the threat assessment.

• Level 4 performs "process refinement", which is an ongoing monitoring and assessment of the fusion process to refine the process itself and to regulate the acquisition of data to achieve optimal results (Klein, 1993). Level 4 interacts with each of the other levels.

[Slide figure: communication structure of an imagined data fusion system in a military setting. The labels L1, L2, and L3 denote the levels in the JDL model.]

Challenges of Data Fusion

• Uncertainty of sensors:
  – no perfect sensors available
  – difficult to predict sensor performance
  – difficult to use the sensors effectively
  – heterogeneous sensors
  – power constraints
  – packet loss rate is high with wireless communication
• Dynamics of environments:
  – The sensor network is embedded in the real world; its structure and its data fusion strategy and algorithms must adapt to changes in the environment.
  – difficult to effectively task geographically distributed, non-commensurate sensors
Challenges of Data Fusion (contd.)

• Dynamics of targets:
  – Targets can appear at any time, anywhere, at any speed, under any weather conditions.
  – There is insufficient training data.

• Human-computer interface (HCI):
  – knowing how to link decision needs to sensor management
  – incorporating human knowledge into the decision process

Data Fusion Problem

Problem-solving requirement → Fusion system characteristic

• Multiple levels of abstraction, organisations and processes → Hierarchical reasoning
• 3D nature of the observed world → Spatial reasoning
• Dynamic, evolving situation → Temporal reasoning
• Uncertain data and tentative decisions → Use all possible information sources and maintain multiple hypotheses: sensors, domain knowledge, textual reports, known constraints
• Real-time monitoring and reporting → Distributed processing, task decomposition, efficient algorithms and databases

Remote-sensed Earth Science Data Sample

• Data format: HDF-EOS, integerized sinusoidal projection
  Source: Terra satellite, MODIS sensor, bands 20, 22, 23, 29, 31, 32, 33
  Data product: Land Surface Temperature (day/night land temperature per grid)

• Data format: HDF-EOS, integerized sinusoidal projection
  Source: Terra satellite, MODIS sensor, bands 1-7
  Data product: Leaf Area Index (one-sided leaf area per unit ground area)

• Data format: HDF-EOS, equal angle grid
  Source: Terra satellite, MODIS sensor, bands 1, 2, 17, 18, 19
  Data product: Precipitable Water (column water vapor amounts)

• Data format: Text report, point location
  Source: ERS-1 & 2, ATSR sensor, bands at 1.6, 3.7, 11.0, 12.0 micrometers
  Data product: Fire Event (detected fire indication with time and location)

Widely available multi-source remote-sensed data and textual information can be fused to make interpretations and inferences using hybrid reasoning techniques.

Mathematical Techniques
• The fusion process is a complex mathematical task and many issues need to be addressed.
• Data come in diverse formats and are noisy and ambiguous:
  – analogue, digital, discrete, textual, imagery
• Data dimensionality and alignment:
  – coordinate systems, units, frequency, amplitude, timing
• Temporal alignment:
  – synchronisation of data
  – the spatial distribution of sensors demands precise time measurements
  – data arrival at the fusion node may not coincide, due to variable propagation delays
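To make the temporal-alignment issue concrete, here is a minimal sketch (assuming NumPy; the sensor timestamps and readings are made up for illustration and not from the slides) that resamples two asynchronous streams onto a common time base by linear interpolation before fusing them:

```python
import numpy as np

# Asynchronous readings from two sensors (timestamps in seconds, made-up values)
t_a = np.array([0.00, 0.95, 2.10, 3.05])   # sensor A sample times
x_a = np.array([20.1, 20.4, 21.0, 21.3])   # sensor A measurements
t_b = np.array([0.50, 1.50, 2.50, 3.50])   # sensor B sample times
x_b = np.array([19.8, 20.6, 21.2, 21.5])   # sensor B measurements

# Common time base covering the overlap of both streams
t_common = np.arange(0.5, 3.1, 0.5)

# Linear interpolation aligns both streams to the common timestamps
x_a_aligned = np.interp(t_common, t_a, x_a)
x_b_aligned = np.interp(t_common, t_b, x_b)

# A simple fusion rule for the sketch: average the aligned readings at each step
fused = (x_a_aligned + x_b_aligned) / 2.0
print(np.round(fused, 2))
```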


Fusion Technologies

• Fusion techniques are drawn from many different disciplines in mathematics and engineering:
  – Probability and statistical estimation (Bayes, HMM, ...)
  – Signal processing and information theory
  – Image processing and pattern recognition
  – Artificial intelligence (classical/modern)
  – Information and communication technology
  – Software engineering and networking
  – Biological sciences
  – Control theory

Bayes Theorem

• Assume we have a hypothesis space H and a dataset D. We can define three probabilities:
• P(h) is the probability of h being the correct hypothesis before seeing any data. P(h) is called the prior probability of h.
  – Example: the chance of rain is 80% if we are close to the sea and at latitude X (no data has been seen).
• P(D) is the probability of seeing the data D.
• P(D|h) is the probability of the data given h. It is called the likelihood of h with respect to D.

Bayes Theorem

• Bayes theorem relates the posterior probability of a hypothesis given the data to the three probabilities mentioned before:

  P(h|D) = P(D|h) · P(h) / P(D)

  where P(h|D) is the posterior probability, P(D|h) is the likelihood, P(h) is the prior probability, and P(D) is the evidence.

An Example

• We do a test in the lab to check whether a patient has cancer.
• We know that only 0.008 of the population has cancer.
• The lab test is imperfect. It returns positive in 98% of the cases where the disease is present (true positive rate) and it returns negative in 97% of the cases where there is no disease (true negative rate).

Example

• What is the probability that the patient has cancer given that the lab result was positive?

  P(cancer | +) = P(+ | cancer) · P(cancer) / P(+)
                = (0.98)(0.008) / P(+)
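The slide leaves P(+) unevaluated. It can be completed with the law of total probability using only the numbers stated above; the short Python sketch below shows the arithmetic (the variable names are ours):

```python
# Numbers from the slides: prevalence and test characteristics
p_cancer = 0.008             # P(cancer), prior
p_pos_given_cancer = 0.98    # P(+ | cancer), true positive rate
p_neg_given_healthy = 0.97   # P(- | no cancer), true negative rate

# Law of total probability for the evidence P(+)
p_pos = (p_pos_given_cancer * p_cancer
         + (1 - p_neg_given_healthy) * (1 - p_cancer))

# Bayes theorem: posterior probability of cancer given a positive test
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos
print(round(p_pos, 4), round(p_cancer_given_pos, 3))  # 0.0376 and about 0.209
```

Even with a positive result, the posterior probability of cancer is only about 21%, because the disease is rare.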

Research Issues and Challenges

• Geo-spatial and temporal resolution of the data
• Availability of training data
• Multi-time-scale and asynchronous data
• Prediction intervals and predictability horizons
• Incorporation of multi-expert knowledge
• "Brittleness" of prediction and reasoning
• Incorporation of negative reasoning
• Default reasoning
• Approaches for indeterminate and unavailable information
• Human-in-the-loop processing and multi-person collaboration
• Development of cognitive aids for interpretation
• Multi-sensory representations of uncertainty

This problem provides a rich source of continuing challenges across multiple disciplines.

Conclusion

• With some imagination and near-term innovations, information fusion and understanding may not be so difficult.
Part 2: Data Fusion Models for GIS

Outline
• Fusion in Geographic Information Systems
• Geographic database
• Fusion Data Set
• Measuring the Quality of the Result
• Locations Used for Fusion
• Fusion methods
• Conclusion
Fusion in Geographic Information Systems

• The objective of data fusion is to combine data sets from multiple sources into a single set of meaningful information.
• Given two geographic databases, a fusion algorithm should produce all pairs of corresponding objects (i.e., objects that represent the same real-world entity).
• Data fusion models help us use the data generated through different data abstraction layers in an optimal way.
• The algorithms should work even when locations are imprecise and each database represents only some of the real-world entities.

Geographic database
• A geographic database stores spatial objects.
• Each object represents a single real-world entity.
• We view a geographic database as a dataset of objects with at most one object for each real-world entity.
• An object has associated spatial and non-spatial attributes.
  – Spatial: location, height, shape, topology, etc.
  – Non-spatial: name, address, number of rooms in a hotel, etc.
• Locations (polygons) can be approximated by points; the distance between two objects is then the Euclidean distance between their point locations.
• When two geographic databases are integrated, the main task is to identify pairs of objects, one from each dataset, that represent the same entity.
• In general a fusion algorithm may process more than two datasets; it then generates fusion sets with at most one object from each dataset.
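As an illustrative sketch of this object model (the class, attributes, and sample hotels below are our own assumptions, not part of any of the surveyed datasets), objects can be approximated by points and compared with the Euclidean distance:

```python
from dataclasses import dataclass
from math import hypot

@dataclass
class GeoObject:
    """A spatial object approximated by a point location plus non-spatial attributes."""
    obj_id: str
    x: float      # e.g. easting in a projected coordinate system
    y: float      # e.g. northing
    name: str = ""

def distance(a: GeoObject, b: GeoObject) -> float:
    """Euclidean distance between the point approximations of two objects."""
    return hypot(a.x - b.x, a.y - b.y)

# Hypothetical hotel objects coming from two different surveys
h1 = GeoObject("A1", 100.0, 250.0, "Hotel Basel")
h2 = GeoObject("B7", 104.0, 247.0, "Basel Hotel")
print(distance(h1, h2))  # 5.0
```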

Fusion Data Set

• We denote two data sets A = {a1, ..., am} and B = {b1, ..., bn}; two objects a ∈ A and b ∈ B are corresponding objects if they represent the same entity.
• A fusion set that is generated from A and B is either a singleton (i.e., contains a single object) or has two objects, one from each dataset.
• A fusion set {a, b} is correct if a and b are corresponding objects.
• A singleton fusion set {a} is correct if a does not have a corresponding object in the other data set.
• We measure the quality of a fusion algorithm in terms of recall and precision.
  – Recall is the percentage of correct fusion sets that actually appear in the result (e.g., 91% of all the correct fusion sets appear in the result).
  – Precision is the percentage of correct fusion sets out of all the fusion sets in the result (e.g., 80% of the sets in the result are correct).

Fusion Data Set (contd.)

• Formally, let the result of a fusion algorithm have s_r fusion sets, and let s_rc of those sets be correct.
• Let e denote the total number of real-world entities that are represented in at least one of the two datasets.
• Then the precision is s_rc / s_r and the recall is s_rc / e.
• Factors affecting recall and precision:
  – One factor that influences the recall and precision is the error interval of each dataset.
  – The error interval is a bound on the distance between an object in the dataset and the entity it represents.
  – The density of a dataset is the number of objects per unit of area.
  – The choice factor is the number of objects in a circle with a radius that is equal to the error interval (note that the choice factor is the product of the density and the area of that circle).
  – Intuitively, for a given entity, the choice factor is an estimate of the number of objects in the dataset that could possibly represent that entity.
  – It is more difficult to achieve high recall and precision when the choice factor is large.
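A quick numeric illustration of the choice factor, using assumed values for the density and the error interval (they are not taken from the slides):

```python
import math

# Assumed values for illustration only
density = 120.0 / 1.0e6     # objects per square meter (about 120 hotels per km^2)
error_interval = 50.0       # meters; bound on the location error of the dataset

# Choice factor = density * area of a circle whose radius is the error interval
choice_factor = density * math.pi * error_interval ** 2
print(round(choice_factor, 2))  # about 0.94 candidate objects per entity
```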

Measuring the Quality of the Result

  Recall = (# correct sets in the result) / (# entities) = |C| / |E|

  Precision = (# correct sets in the result) / (# all sets in the result) = |C| / |R|

where E is the set of entities in the world, C is the set of correct fusion sets in the result, and R is the set of all fusion sets in the result.
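A minimal sketch of these two measures in code, assuming the correct fusion sets are known for evaluation; the toy ground truth below is invented for illustration:

```python
def evaluate(result, ground_truth, num_entities):
    """Compute recall and precision of a collection of fusion sets.

    result       -- iterable of frozensets produced by a fusion algorithm (R)
    ground_truth -- set of frozensets that are the correct fusion sets
    num_entities -- number of real-world entities represented in the data (|E|)
    """
    result = set(result)
    correct = result & ground_truth          # C: correct sets that appear in the result
    recall = len(correct) / num_entities     # |C| / |E|
    precision = len(correct) / len(result)   # |C| / |R|
    return recall, precision

# Toy example: three entities, and an algorithm that outputs three fusion sets
truth = {frozenset({"a1", "b1"}), frozenset({"a2", "b3"}), frozenset({"a3"})}
output = [frozenset({"a1", "b1"}), frozenset({"a2", "b2"}), frozenset({"a3"})]
print(evaluate(output, truth, num_entities=3))  # recall 2/3, precision 2/3
```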
The Goal: Fusing Objects that Represent the Same Real-World Entity

[Slide figures: three data sources (Survey1, Survey2, Survey3) provide information about hotels in a city, with hotels such as Radison and Moria appearing across the surveys. One source represents locations as polygons, another as points; attributes such as the hotel rank and whether there is a nearby parking lot come from different sources. Each data source provides data that the other sources do not provide, and object fusion enables us to utilize the different perspectives of the data sources.]

Why Are Locations Used for Fusion?

• There are no global keys to identify objects that should be fused.
• Names cannot be used:
  – they change often
  – they may be missing
  – they may be in different languages
• It seems that locations are keys:
  – each spatial object includes location attributes
  – in a "perfect world," two objects that represent the same entity have the same location

Why is it Difficult to Use Locations?

• In real maps, locations are inaccurate.
• [Slide figure: an overlay of the three data sources about hotels in Tel-Aviv; for example, the Basel Hotel has three different locations in the three data sources.]

Inaccuracy Makes it Difficult to Use Locations

• It is difficult to distinguish between:
  1. a pair of objects that represent two close entities, and
  2. a pair of objects that represent the same entity.
• Partial coverage complicates the problem.

Fusion methods

Assumptions:
• There are only two data sources.
• Each data source has at most one object for each real-world entity, i.e., the matching is one-to-one.

Corresponding Objects

• Objects from two distinct sources that represent the same real-world entity.

Fusion Sets

• A fusion algorithm creates two types of fusion sets:
  – a set with a single object
  – a set with a pair of objects, one from each data source

Confidence

• Our methods are heuristics and may therefore produce incorrect fusion sets.
• A confidence value between 0 and 1 is attached to each fusion set.
• It indicates the degree of certainty in the correctness of the fusion set.
• [Slide figure: fusion sets with low confidence versus fusion sets with high confidence.]

One-Sided Nearest-Neighbor Join

• Given an object a ∈ A, we say that an object b ∈ B is the nearest B-neighbor of a if b is the closest object to a among all the objects in B.
• The one-sided nearest-neighbor join of a dataset B with a dataset A produces all fusion sets {a, b} such that a ∈ A and b ∈ B is the nearest B-neighbor of a.
• Note that every a ∈ A is in one of the fusion sets, while objects of B may appear in zero, one, or more fusion sets.
• Thus, the one-sided nearest-neighbor join is not symmetric, i.e., the result of joining B with A is not necessarily equal to the result of joining A with B.
• The definition can be modified by adding to the result the singleton set {b} for every b ∈ B that is not the nearest neighbor of some a ∈ A. We do that to boost the recall of this method; otherwise, the recall could be very low.
• We say that a dataset A is covered by a dataset B if every real-world entity that is represented in A is also represented in B.
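A short sketch of the one-sided nearest-neighbor join as described above, using simple point locations; the datasets and coordinates are hypothetical:

```python
from math import hypot

def one_sided_nn_join(A, B):
    """One-sided nearest-neighbor join of dataset B with dataset A.

    A, B -- dicts mapping object id -> (x, y) point location.
    Returns fusion sets {a, nearest B-neighbor of a} for every a in A,
    plus singletons {b} for objects of B never chosen as a nearest neighbor.
    """
    fusion_sets = []
    chosen_b = set()
    for a_id, a_pt in A.items():
        b_id = min(B, key=lambda b: hypot(B[b][0] - a_pt[0], B[b][1] - a_pt[1]))
        fusion_sets.append({a_id, b_id})
        chosen_b.add(b_id)
    # Singletons for B-objects that were not the nearest neighbor of any a
    fusion_sets.extend({b_id} for b_id in B if b_id not in chosen_b)
    return fusion_sets

# Hypothetical point locations
A = {"a1": (0.0, 0.0), "a2": (10.0, 0.0)}
B = {"b1": (1.0, 0.5), "b2": (9.0, 0.2), "b3": (50.0, 50.0)}
print(one_sided_nn_join(A, B))  # pairs {a1, b1}, {a2, b2} and the singleton {b3}
```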

The Mutually-Nearest Method

• The result includes:
  – all mutually-nearest pairs
  – all singletons, i.e., objects that are not part of any mutually-nearest pair
• [Slide figure: the input point sets, the nearest-neighbor relationships in each direction, and the resulting fusion sets.]

The Mutually-Nearest Method (contd.)

• Two objects, a ∈ A and b ∈ B, are mutually nearest if a is the nearest A-neighbor of b and b is the nearest B-neighbor of a.
• The intuition behind the mutually-nearest method is that corresponding objects are likely to be mutually nearest.
• Note that some objects of A are not in any pair of mutually-nearest objects (and, similarly, for some objects of B).
• This happens when the nearest B-neighbor of a ∈ A is some b ∈ B, but the nearest A-neighbor of b is different from a.
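A corresponding sketch of the mutually-nearest method; again the objects and coordinates are hypothetical:

```python
from math import hypot

def mutually_nearest(A, B):
    """Mutually-nearest fusion: pair a and b only if each is the other's nearest neighbor.

    A, B -- dicts mapping object id -> (x, y). Objects left unpaired become singletons.
    """
    def nearest(pt, others):
        return min(others, key=lambda o: hypot(others[o][0] - pt[0], others[o][1] - pt[1]))

    pairs = []
    paired_a, paired_b = set(), set()
    for a_id, a_pt in A.items():
        b_id = nearest(a_pt, B)
        if nearest(B[b_id], A) == a_id:      # mutual nearest-neighbor check
            pairs.append({a_id, b_id})
            paired_a.add(a_id)
            paired_b.add(b_id)
    singletons = [{x} for x in A if x not in paired_a] + [{x} for x in B if x not in paired_b]
    return pairs + singletons

A = {"a1": (0.0, 0.0), "a2": (3.0, 0.0)}
B = {"b1": (0.5, 0.0), "b2": (2.0, 0.0)}
print(mutually_nearest(A, B))  # pairs {a1, b1} and {a2, b2}; no singletons here
```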

The Probabilistic Method

• An object from one dataset has a probability of choosing an object from the other dataset.
• The probability is inversely proportional to the distance.
• The confidence of a fusion set {a, b} is the probability of the mutual choice.
• The confidence of a singleton is the probability that the object is not chosen by any object of the other dataset.
• A threshold value is used to discard fusion sets with low confidence.
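One plausible way to realize this description in code is sketched below; the inverse-distance choice probabilities, the product used as the confidence of the mutual choice, and the threshold value are our assumptions, not necessarily the authors' exact formulation:

```python
from math import hypot

def probabilistic_fusion(A, B, threshold=0.3):
    """Sketch of distance-based choice probabilities for object fusion.

    Each a in A chooses each b in B with probability inversely proportional
    to their distance; the confidence of {a, b} is taken as the product of the
    two directional choice probabilities, and pairs below the threshold are dropped.
    """
    def choice_probs(pt, others):
        weights = {o: 1.0 / (hypot(others[o][0] - pt[0], others[o][1] - pt[1]) + 1e-9)
                   for o in others}
        total = sum(weights.values())
        return {o: w / total for o, w in weights.items()}

    prob_a = {a: choice_probs(A[a], B) for a in A}   # P(a chooses b)
    prob_b = {b: choice_probs(B[b], A) for b in B}   # P(b chooses a)

    fusion_sets = []
    for a in A:
        for b in B:
            confidence = prob_a[a][b] * prob_b[b][a]  # probability of the mutual choice
            if confidence >= threshold:
                fusion_sets.append(({a, b}, round(confidence, 3)))
    return fusion_sets

A = {"a1": (0.0, 0.0), "a2": (3.0, 0.0)}
B = {"b1": (0.5, 0.0), "b2": (2.0, 0.0)}
print(probabilistic_fusion(A, B))  # keeps {a1, b1} and {a2, b2} with their confidences
```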

Mutual Influences Between Probabilities

[Slide figure: two cases illustrating that the choice probabilities of nearby objects influence one another; the example probabilities shown are 0.3 and 0.2 in Case I, and 0.8 and 0.05 in the expected outcome of Case II, where an additional object b changes the picture.]
The Normalized-Weights Method

• Normalization captures the mutual influence between probabilities.
• Iteration brings the weights to an equilibrium.
• The results are superior to those of the previous two methods, at the cost of only a small increase in computation time.
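The sketch below illustrates the iterate-to-equilibrium idea with alternating row and column normalization of an inverse-distance weight matrix; it is only an illustration of the normalization concept under our own assumptions, not the published normalized-weights algorithm:

```python
import numpy as np

def normalized_weights(dist, iterations=50):
    """Iteratively normalize an inverse-distance weight matrix.

    dist -- |A| x |B| matrix of distances between objects of the two datasets.
    Rows and columns are alternately normalized so that the weight of each
    candidate pair reflects the competing choices on both sides; iterating
    drives the matrix toward an equilibrium.
    """
    w = 1.0 / (dist + 1e-9)                    # initial weights, inversely proportional to distance
    for _ in range(iterations):
        w = w / w.sum(axis=1, keepdims=True)   # normalize each A-object's choices
        w = w / w.sum(axis=0, keepdims=True)   # normalize each B-object's choices
    return w

# Distances between two A-objects and two B-objects (made-up numbers)
dist = np.array([[0.5, 2.0],
                 [2.5, 1.0]])
print(np.round(normalized_weights(dist), 3))
```

In this toy example the iteration converges to weights that favour pairing the first objects with each other and the second objects with each other, which is the kind of equilibrium the slide refers to.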
A Case Study:

Recall and precision of the three methods, compared with the traditional nearest-neighbor join (the state of the art):

Method                              Recall   Precision
Normalized weights (best results)   0.85     0.90
Probabilistic method                0.80     0.80
Mutually nearest                    0.77     0.85
Traditional nearest neighbor        0.48     0.56

All three methods perform much better than the nearest-neighbor method.

Conclusions

• The novelty of all these approaches is in developing efficient methods that find fusion sets with high recall and precision, using only the locations of objects.

Thank you!

acakcs@caluniv.ac.in