
Data Fusion in Geographic Information Systems

Amlan Chakrabarti
University of Calcutta
KGEC : 30th June, 2009

Part 1: Preliminary Concepts of Data Fusion
The Balance of Knowledge Discovery and Analysis

[Slide diagram: a loop that balances user-side knowledge discovery (decompose problem, formulate and refine queries, retrieve information, accumulate/filter/fuse data, assess and analyze hypotheses, evaluate hypotheses and inferences, interact and collaborate with other human and virtual agents) against system-side analysis (meta-data tagging, accept and format data, hierarchical and problem-centered decomposition, source analysis and discovery, data filtering, correlation and fusion, formatting and display of reports, process monitoring and adaptation for improved performance).]

Outline
• What is Data Fusion?
• JDL Fusion Model
• Challenges of Data Fusion
• Data Fusion Problem
• Mathematical Techniques
• Bayes Theorem
• Fusion Technologies
• Research Problems and Challenges
• Conclusion

What is Data Fusion?

• Data fusion is an information process dealing with the:
  – association, correlation, and combination of data and information from
    • single and multiple sensors or sources,
  – to achieve
    • refined estimates of parameters, characteristics, events, and behaviors for observed entities in an observed field of view.

Formal Definition

• A process dealing with the association, correlation, and combination of data and information from single and multiple sources to achieve refined position and identity estimates, and complete and timely assessments of situations and threats, and their significance. The process is characterized by continuous refinement of its estimates and assessments, and the evaluation of the need for additional sources, or modification of the process itself, to achieve improved results.
  (Data Fusion Lexicon, JDL Data Fusion Subgroup, 1987)

• A process of combining data or information to estimate or predict entity states.
  (A. N. Steinberg, C. L. Bowman, F. E. White, 1998)
The JDL (Joint Directors of Laboratories) Data Fusion Model

[Slide diagram: sources feed the data fusion domain through source pre-processing. Within the domain, Level One (Object Refinement), Level Two (Situation Refinement), and Level Three (Threat Refinement) operate over a database management system with support and fusion databases. Level Four (Process Refinement) monitors the process, and Level Five (Cognitive Refinement) connects the domain to human-computer interaction with the user.]

JDL Model in Brief

• Level 1 performs "object refinement", which is an iterative process of fusing data to determine the identity and other attributes of entities and also to build tracks to represent their behavior.

• Level 2 performs "situation refinement", which is an iterative process of fusing the spatial and temporal relationships between entities to group them together and form an abstracted interpretation of the patterns in the order-of-battle data. The product from this level is called the situation assessment (DSTO, 1994).

• Level 3 performs "threat refinement", which is an iterative process of fusing the combined activity and capability of enemy forces to infer their intentions and assess the threat that they pose. The product from this level is called the threat assessment.

• Level 4 performs "process refinement", which is an ongoing monitoring and assessment of the fusion process to refine the process itself and to regulate the acquisition of data to achieve optimal results (Klein, 1993). Level 4 interacts with each of the other levels.

[Slide figure: communication structure of an imagined data fusion system in a military setting. The labels L1, L2, and L3 denote the levels in the JDL model.]

Challenges of Data Fusion

• Uncertainty of sensors:
  – no perfect sensors available
  – difficult to predict sensor performance
  – difficult to use the sensors effectively
  – heterogeneous sensors
  – power constraints
  – packet loss rate is high with wireless communication
• Dynamics of environments:
  – The sensor network is embedded in the real world; its structure and its data fusion strategy and algorithms must adapt to changes in the environment.
  – difficult to effectively task geographically distributed, non-commensurate sensors
Challenges of Data Fusion (contd.)

• Dynamics of targets:
  – Targets can appear at any time, anywhere, at any speed, under any weather conditions.
  – There is insufficient training data.

• Human-computer interface (HCI):
  – knowing how to link decision needs to sensor management
  – incorporating human knowledge into the decision process

Data Fusion Problem

Problem-solving requirement → Fusion system characteristic

• Multiple levels of abstraction, organisations and processes → Hierarchical reasoning
• 3D nature of the observed world → Spatial reasoning
• Dynamic, evolving situation → Temporal reasoning
• Uncertain data and tentative decisions → Use all possible information sources and maintain multiple hypotheses: sensors, domain knowledge, textual reports, known constraints
• Real-time monitoring and reporting → Distributed processing, task decomposition, efficient algorithms and databases

Remote-sensed Earth Science Data Sample

• Data format: HDF-EOS, integerized sinusoidal projection
  Source: Terra satellite, MODIS sensor, bands 20, 22, 23, 29, 31, 32, 33
  Data product: Land Surface Temperature (day/night land temperature per grid)

• Data format: HDF-EOS, integerized sinusoidal projection
  Source: Terra satellite, MODIS sensor, bands 1-7
  Data product: Leaf Area Index (one-sided leaf area per unit ground area)

• Data format: HDF-EOS, equal angle grid
  Source: Terra satellite, MODIS sensor, bands 1, 2, 17, 18, 19
  Data product: Precipitable Water (column water vapor amounts)

• Data format: Text report, point location
  Source: ERS-1 & 2, ATSR sensor, bands at 1.6, 3.7, 11.0, 12.0 micrometers
  Data product: Fire Event (detected fire indication with time and location)

Widely available multi-source remote-sensed data and textual information can be fused to make interpretations and inferences using hybrid reasoning techniques.

Mathematical Techniques
• The fusion process is a complex mathematical task and many issues need to be addressed.
• Data come in diverse formats and are noisy and ambiguous:
  – analogue, digital, discrete, textual, imagery
• Data dimensionality and alignment:
  – coordinate systems, units, frequency, amplitude, timing
• Temporal alignment:
  – synchronisation of data
  – the spatial distribution of sensors demands precise time measurements
  – data arrival at the fusion node may not coincide, due to variable propagation delays
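To make the temporal-alignment issue concrete, here is a minimal sketch (assuming NumPy; the sensor timestamps and readings are made up for illustration and not from the slides) that resamples two asynchronous streams onto a common time base by linear interpolation before fusing them:

```python
import numpy as np

# Asynchronous readings from two sensors (timestamps in seconds, made-up values)
t_a = np.array([0.00, 0.95, 2.10, 3.05])   # sensor A sample times
x_a = np.array([20.1, 20.4, 21.0, 21.3])   # sensor A measurements
t_b = np.array([0.50, 1.50, 2.50, 3.50])   # sensor B sample times
x_b = np.array([19.8, 20.6, 21.2, 21.5])   # sensor B measurements

# Common time base covering the overlap of both streams
t_common = np.arange(0.5, 3.1, 0.5)

# Linear interpolation aligns both streams to the common timestamps
x_a_aligned = np.interp(t_common, t_a, x_a)
x_b_aligned = np.interp(t_common, t_b, x_b)

# A simple fusion rule for the sketch: average the aligned readings at each step
fused = (x_a_aligned + x_b_aligned) / 2.0
print(np.round(fused, 2))
```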


Fusion Technologies

• Fusion techniques are drawn from many different disciplines in mathematics and engineering:
  – Probability and statistical estimation (Bayes, HMM, ...)
  – Signal processing and information theory
  – Image processing and pattern recognition
  – Artificial intelligence (classical/modern)
  – Information and communication technology
  – Software engineering and networking
  – Biological sciences
  – Control theory

Bayes Theorem

• Assume we have a hypothesis space H and a dataset D. We can define three probabilities:
• P(h) is the probability of h being the correct hypothesis before seeing any data. P(h) is called the prior probability of h.
  – Example: the chance of rain is 80% if we are close to the sea and at latitude X (no data has been seen).
• P(D) is the probability of seeing the data D.
• P(D|h) is the probability of the data given h. It is called the likelihood of h with respect to D.

Bayes Theorem

• Bayes theorem relates the posterior probability of a hypothesis given the data to the three probabilities mentioned before:

  P(h|D) = P(D|h) · P(h) / P(D)

  where P(h|D) is the posterior probability, P(D|h) is the likelihood, P(h) is the prior probability, and P(D) is the evidence.

An Example

• We do a test in the lab to check whether a patient has cancer.
• We know that only 0.008 of the population has cancer.
• The lab test is imperfect. It returns positive in 98% of the cases where the disease is present (true positive rate) and it returns negative in 97% of the cases where there is no disease (true negative rate).

Example

• What is the probability that the patient has cancer given that the lab result was positive?

  P(cancer | +) = P(+ | cancer) · P(cancer) / P(+)
                = (0.98)(0.008) / P(+)
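The slide leaves P(+) unevaluated. It can be completed with the law of total probability using only the numbers stated above; the short Python sketch below shows the arithmetic (the variable names are ours):

```python
# Numbers from the slides: prevalence and test characteristics
p_cancer = 0.008             # P(cancer), prior
p_pos_given_cancer = 0.98    # P(+ | cancer), true positive rate
p_neg_given_healthy = 0.97   # P(- | no cancer), true negative rate

# Law of total probability for the evidence P(+)
p_pos = (p_pos_given_cancer * p_cancer
         + (1 - p_neg_given_healthy) * (1 - p_cancer))

# Bayes theorem: posterior probability of cancer given a positive test
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos
print(round(p_pos, 4), round(p_cancer_given_pos, 3))  # 0.0376 and about 0.209
```

Even with a positive result, the posterior probability of cancer is only about 21%, because the disease is rare.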

Research Issues and Challenges

• Geo-spatial and temporal resolution of the data
• Availability of training data
• Multi-time-scale and asynchronous data
• Prediction intervals and predictability horizons
• Incorporation of multi-expert knowledge
• "Brittleness" of prediction and reasoning
• Incorporation of negative reasoning
• Default reasoning
• Approaches for indeterminate and unavailable information
• Human-in-the-loop processing and multi-person collaboration
• Development of cognitive aids for interpretation
• Multi-sensory representations of uncertainty

This problem provides a rich source of continuing challenges across multiple disciplines.

Conclusion

• With some imagination and near-term innovations, information fusion and understanding may not be so difficult.
Part 2: Data Fusion Models for GIS

Outline
• Fusion in Geographic Information Systems
• Geographic database
• Fusion Data Set
• Measuring the Quality of the Result
• Locations Used for Fusion
• Fusion methods
• Conclusion
Fusion in Geographic Information Systems

• The objective of data fusion is to combine data sets from multiple sources into a single set of meaningful information.
• Given two geographic databases, a fusion algorithm should produce all pairs of corresponding objects (i.e., objects that represent the same real-world entity).
• Data fusion models help us use the data generated through different data abstraction layers in an optimal way.
• The algorithms should work even when locations are imprecise and each database represents only some of the real-world entities.

Geographic database
• A geographic database stores spatial objects.
• Each object represents a single real-world entity.
• We view a geographic database as a dataset of objects with at most one object for each real-world entity.
• An object has associated spatial and non-spatial attributes.
  – Spatial: location, height, shape, topology, etc.
  – Non-spatial: name, address, number of rooms in a hotel, etc.
• Locations (polygons) can be approximated by points; the distance between two objects is then the Euclidean distance between their point locations.
• When two geographic databases are integrated, the main task is to identify pairs of objects, one from each dataset, that represent the same entity.
• In general a fusion algorithm may process more than two datasets; it then generates fusion sets with at most one object from each dataset.
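As an illustrative sketch of this object model (the class, attributes, and sample hotels below are our own assumptions, not part of any of the surveyed datasets), objects can be approximated by points and compared with the Euclidean distance:

```python
from dataclasses import dataclass
from math import hypot

@dataclass
class GeoObject:
    """A spatial object approximated by a point location plus non-spatial attributes."""
    obj_id: str
    x: float      # e.g. easting in a projected coordinate system
    y: float      # e.g. northing
    name: str = ""

def distance(a: GeoObject, b: GeoObject) -> float:
    """Euclidean distance between the point approximations of two objects."""
    return hypot(a.x - b.x, a.y - b.y)

# Hypothetical hotel objects coming from two different surveys
h1 = GeoObject("A1", 100.0, 250.0, "Hotel Basel")
h2 = GeoObject("B7", 104.0, 247.0, "Basel Hotel")
print(distance(h1, h2))  # 5.0
```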

Fusion Data Set

• We denote two data sets A = {a1, ..., am} and B = {b1, ..., bn}; two objects a ∈ A and b ∈ B are corresponding objects if they represent the same entity.
• A fusion set that is generated from A and B is either a singleton (i.e., contains a single object) or has two objects, one from each dataset.
• A fusion set {a, b} is correct if a and b are corresponding objects.
• A singleton fusion set {a} is correct if a does not have a corresponding object in the other data set.
• We measure the quality of a fusion algorithm in terms of recall and precision.
  – Recall is the percentage of correct fusion sets that actually appear in the result (e.g., 91% of all the correct fusion sets appear in the result).
  – Precision is the percentage of correct fusion sets out of all the fusion sets in the result (e.g., 80% of the sets in the result are correct).

Fusion Data Set (contd.)

• Formally, let the result of a fusion algorithm have s_r fusion sets, and let s_rc of those sets be correct.
• Let e denote the total number of real-world entities that are represented in at least one of the two datasets.
• Then the precision is s_rc / s_r and the recall is s_rc / e.
• Factors affecting recall and precision:
  – One factor that influences the recall and precision is the error interval of each dataset.
  – The error interval is a bound on the distance between an object in the dataset and the entity it represents.
  – The density of a dataset is the number of objects per unit of area.
  – The choice factor is the number of objects in a circle with a radius that is equal to the error interval (note that the choice factor is the product of the density and the area of that circle).
  – Intuitively, for a given entity, the choice factor is an estimate of the number of objects in the dataset that could possibly represent that entity.
  – It is more difficult to achieve high recall and precision when the choice factor is large.
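A quick numeric illustration of the choice factor, using assumed values for the density and the error interval (they are not taken from the slides):

```python
import math

# Assumed values for illustration only
density = 120.0 / 1.0e6     # objects per square meter (about 120 hotels per km^2)
error_interval = 50.0       # meters; bound on the location error of the dataset

# Choice factor = density * area of a circle whose radius is the error interval
choice_factor = density * math.pi * error_interval ** 2
print(round(choice_factor, 2))  # about 0.94 candidate objects per entity
```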

Measuring the Quality of the Result

  Recall = (# correct sets in the result) / (# entities) = |C| / |E|

  Precision = (# correct sets in the result) / (# all sets in the result) = |C| / |R|

where E is the set of entities in the world, C is the set of correct fusion sets in the result, and R is the set of all fusion sets in the result.
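A minimal sketch of these two measures in code, assuming the correct fusion sets are known for evaluation; the toy ground truth below is invented for illustration:

```python
def evaluate(result, ground_truth, num_entities):
    """Compute recall and precision of a collection of fusion sets.

    result       -- iterable of frozensets produced by a fusion algorithm (R)
    ground_truth -- set of frozensets that are the correct fusion sets
    num_entities -- number of real-world entities represented in the data (|E|)
    """
    result = set(result)
    correct = result & ground_truth          # C: correct sets that appear in the result
    recall = len(correct) / num_entities     # |C| / |E|
    precision = len(correct) / len(result)   # |C| / |R|
    return recall, precision

# Toy example: three entities, and an algorithm that outputs three fusion sets
truth = {frozenset({"a1", "b1"}), frozenset({"a2", "b3"}), frozenset({"a3"})}
output = [frozenset({"a1", "b1"}), frozenset({"a2", "b2"}), frozenset({"a3"})]
print(evaluate(output, truth, num_entities=3))  # recall 2/3, precision 2/3
```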
The Goal: Fusing Objects that Represent the Same Real-World Entity

[Slide figures: three data sources (Survey1, Survey2, Survey3) provide information about hotels in a city, with hotels such as Radison and Moria appearing across the surveys. One source represents locations as polygons, another as points; attributes such as the hotel rank and whether there is a nearby parking lot come from different sources. Each data source provides data that the other sources do not provide, and object fusion enables us to utilize the different perspectives of the data sources.]

Why Are Locations Used for Fusion?

• There are no global keys to identify objects that should be fused.
• Names cannot be used:
  – they change often
  – they may be missing
  – they may be in different languages
• It seems that locations are keys:
  – each spatial object includes location attributes
  – in a "perfect world," two objects that represent the same entity have the same location

Why is it Difficult to Use Locations?

• In real maps, locations are inaccurate.
• [Slide figure: an overlay of the three data sources about hotels in Tel-Aviv; for example, the Basel Hotel has three different locations in the three data sources.]

Inaccuracy Makes it Difficult to Use Locations

• It is difficult to distinguish between:
  1. a pair of objects that represent two close entities, and
  2. a pair of objects that represent the same entity.
• Partial coverage complicates the problem.

Fusion methods

Assumptions:
• There are only two data sources.
• Each data source has at most one object for each real-world entity, i.e., the matching is one-to-one.

Corresponding Objects

• Objects from two distinct sources that represent the same real-world entity.

Fusion Sets

• A fusion algorithm creates two types of fusion sets:
  – a set with a single object
  – a set with a pair of objects, one from each data source

Confidence

• Our methods are heuristics and may therefore produce incorrect fusion sets.
• A confidence value between 0 and 1 is attached to each fusion set.
• It indicates the degree of certainty in the correctness of the fusion set.
• [Slide figure: fusion sets with low confidence versus fusion sets with high confidence.]

One-Sided Nearest-Neighbor Join

• Given an object a ∈ A, we say that an object b ∈ B is the nearest B-neighbor of a if b is the closest object to a among all the objects in B.
• The one-sided nearest-neighbor join of a dataset B with a dataset A produces all fusion sets {a, b} such that a ∈ A and b ∈ B is the nearest B-neighbor of a.
• Note that every a ∈ A is in one of the fusion sets, while objects of B may appear in zero, one, or more fusion sets.
• Thus, the one-sided nearest-neighbor join is not symmetric, i.e., the result of joining B with A is not necessarily equal to the result of joining A with B.
• The definition can be modified by adding to the result the singleton set {b} for every b ∈ B that is not the nearest neighbor of some a ∈ A. We do that to boost the recall of this method; otherwise, the recall could be very low.
• We say that a dataset A is covered by a dataset B if every real-world entity that is represented in A is also represented in B.
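A short sketch of the one-sided nearest-neighbor join as described above, using simple point locations; the datasets and coordinates are hypothetical:

```python
from math import hypot

def one_sided_nn_join(A, B):
    """One-sided nearest-neighbor join of dataset B with dataset A.

    A, B -- dicts mapping object id -> (x, y) point location.
    Returns fusion sets {a, nearest B-neighbor of a} for every a in A,
    plus singletons {b} for objects of B never chosen as a nearest neighbor.
    """
    fusion_sets = []
    chosen_b = set()
    for a_id, a_pt in A.items():
        b_id = min(B, key=lambda b: hypot(B[b][0] - a_pt[0], B[b][1] - a_pt[1]))
        fusion_sets.append({a_id, b_id})
        chosen_b.add(b_id)
    # Singletons for B-objects that were not the nearest neighbor of any a
    fusion_sets.extend({b_id} for b_id in B if b_id not in chosen_b)
    return fusion_sets

# Hypothetical point locations
A = {"a1": (0.0, 0.0), "a2": (10.0, 0.0)}
B = {"b1": (1.0, 0.5), "b2": (9.0, 0.2), "b3": (50.0, 50.0)}
print(one_sided_nn_join(A, B))  # pairs {a1, b1}, {a2, b2} and the singleton {b3}
```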

The Mutually-Nearest Method

• The result includes:
  – all mutually-nearest pairs
  – all singletons, i.e., objects that are not part of any mutually-nearest pair
• [Slide figure: the input point sets, the nearest-neighbor relationships in each direction, and the resulting fusion sets.]

The Mutually-Nearest Method (contd.)

• Two objects, a ∈ A and b ∈ B, are mutually nearest if a is the nearest A-neighbor of b and b is the nearest B-neighbor of a.
• The intuition behind the mutually-nearest method is that corresponding objects are likely to be mutually nearest.
• Note that some objects of A are not in any pair of mutually-nearest objects (and, similarly, for some objects of B).
• This happens when the nearest B-neighbor of a ∈ A is some b ∈ B, but the nearest A-neighbor of b is different from a.
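A corresponding sketch of the mutually-nearest method; again the objects and coordinates are hypothetical:

```python
from math import hypot

def mutually_nearest(A, B):
    """Mutually-nearest fusion: pair a and b only if each is the other's nearest neighbor.

    A, B -- dicts mapping object id -> (x, y). Objects left unpaired become singletons.
    """
    def nearest(pt, others):
        return min(others, key=lambda o: hypot(others[o][0] - pt[0], others[o][1] - pt[1]))

    pairs = []
    paired_a, paired_b = set(), set()
    for a_id, a_pt in A.items():
        b_id = nearest(a_pt, B)
        if nearest(B[b_id], A) == a_id:      # mutual nearest-neighbor check
            pairs.append({a_id, b_id})
            paired_a.add(a_id)
            paired_b.add(b_id)
    singletons = [{x} for x in A if x not in paired_a] + [{x} for x in B if x not in paired_b]
    return pairs + singletons

A = {"a1": (0.0, 0.0), "a2": (3.0, 0.0)}
B = {"b1": (0.5, 0.0), "b2": (2.0, 0.0)}
print(mutually_nearest(A, B))  # pairs {a1, b1} and {a2, b2}; no singletons here
```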

The Probabilistic Method

• An object from one dataset has a probability of choosing an object from the other dataset.
• The probability is inversely proportional to the distance.
• The confidence of a fusion set {a, b} is the probability of the mutual choice.
• The confidence of a singleton is the probability that the object is not chosen by any object of the other dataset.
• A threshold value is used to discard fusion sets with low confidence.
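One plausible way to realize this description in code is sketched below; the inverse-distance choice probabilities, the product used as the confidence of the mutual choice, and the threshold value are our assumptions, not necessarily the authors' exact formulation:

```python
from math import hypot

def probabilistic_fusion(A, B, threshold=0.3):
    """Sketch of distance-based choice probabilities for object fusion.

    Each a in A chooses each b in B with probability inversely proportional
    to their distance; the confidence of {a, b} is taken as the product of the
    two directional choice probabilities, and pairs below the threshold are dropped.
    """
    def choice_probs(pt, others):
        weights = {o: 1.0 / (hypot(others[o][0] - pt[0], others[o][1] - pt[1]) + 1e-9)
                   for o in others}
        total = sum(weights.values())
        return {o: w / total for o, w in weights.items()}

    prob_a = {a: choice_probs(A[a], B) for a in A}   # P(a chooses b)
    prob_b = {b: choice_probs(B[b], A) for b in B}   # P(b chooses a)

    fusion_sets = []
    for a in A:
        for b in B:
            confidence = prob_a[a][b] * prob_b[b][a]  # probability of the mutual choice
            if confidence >= threshold:
                fusion_sets.append(({a, b}, round(confidence, 3)))
    return fusion_sets

A = {"a1": (0.0, 0.0), "a2": (3.0, 0.0)}
B = {"b1": (0.5, 0.0), "b2": (2.0, 0.0)}
print(probabilistic_fusion(A, B))  # keeps {a1, b1} and {a2, b2} with their confidences
```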

Mutual Influences Between Probabilities

[Slide figure: two cases illustrating that the choice probabilities of nearby objects influence one another; the example probabilities shown are 0.3 and 0.2 in Case I, and 0.8 and 0.05 in the expected outcome of Case II, where an additional object b changes the picture.]
The Normalized-Weights Method

• Normalization captures the mutual influence between probabilities.
• Iteration brings the weights to an equilibrium.
• The results are superior to those of the previous two methods, at the cost of only a small increase in computation time.
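The sketch below illustrates the iterate-to-equilibrium idea with alternating row and column normalization of an inverse-distance weight matrix; it is only an illustration of the normalization concept under our own assumptions, not the published normalized-weights algorithm:

```python
import numpy as np

def normalized_weights(dist, iterations=50):
    """Iteratively normalize an inverse-distance weight matrix.

    dist -- |A| x |B| matrix of distances between objects of the two datasets.
    Rows and columns are alternately normalized so that the weight of each
    candidate pair reflects the competing choices on both sides; iterating
    drives the matrix toward an equilibrium.
    """
    w = 1.0 / (dist + 1e-9)                    # initial weights, inversely proportional to distance
    for _ in range(iterations):
        w = w / w.sum(axis=1, keepdims=True)   # normalize each A-object's choices
        w = w / w.sum(axis=0, keepdims=True)   # normalize each B-object's choices
    return w

# Distances between two A-objects and two B-objects (made-up numbers)
dist = np.array([[0.5, 2.0],
                 [2.5, 1.0]])
print(np.round(normalized_weights(dist), 3))
```

In this toy example the iteration converges to weights that favour pairing the first objects with each other and the second objects with each other, which is the kind of equilibrium the slide refers to.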
A Case Study:

Recall and precision of the three methods, compared with the traditional nearest-neighbor join (the state of the art):

Method                              Recall   Precision
Normalized weights (best results)   0.85     0.90
Probabilistic method                0.80     0.80
Mutually nearest                    0.77     0.85
Traditional nearest neighbor        0.48     0.56

All three methods perform much better than the nearest-neighbor method.

Conclusions

• The novelty of all these approaches is in developing efficient methods that find fusion sets with high recall and precision, using only the locations of objects.

Thank you!

acakcs@caluniv.ac.in