
Data Models

Sanjeevan Shrestha
Introduction
Data Models
• Viewing an image only as ‘data’ disconnects the remote sensing
analyst from the underlying physical processes that create the ‘data’.
• Specific data models can provide the link between the physics of
remote sensing and the design of image processing algorithms.
Designation of Pixel in image?
• An image is an array of numbers with indexes i and j, where the array
values are the pixel DNs (digital numbers).
• A pixel value at row i and column j is denoted $DN_{ij}$.
• Rows and columns are conveniently numbered from (1,1) at the
upper left to (N,M) at the lower right of the image array.
Image Statistics
Remote Sensing pixel and basic statistics
Many different ways to check the pixel values and statistics:
• Looking at the frequency of occurrence of individual brightness values
(or Digital Number) in the image displayed in a histogram
• Viewing on a computer monitor the individual pixel brightness value
or DN at specific locations or within a geographic area
• Computing univariate descriptive statistics to determine if there are
unusual anomalies in the image data
• Computing multivariate statistics to determine the amount of
between-band correlation and so identify redundancy
RS data distribution
• Large samples drawn randomly from natural populations usually
produce a symmetrical frequency distribution: most values are
clustered around some central value, and the frequency of
occurrence declines away from this central point. Such a distribution
is bell shaped and is called a normal distribution.
• Many statistical tests used in the analysis of remotely sensed data
assume that the brightness values (DN) recorded in a scene are
normally distributed.
• Unfortunately, remotely sensed data may not be normally distributed,
and the analyst must be careful to identify such conditions. In such
instances, nonparametric statistical theory may be preferred.
Univariate Image Statistics
• These statistics generally apply to single-band images.
• Univariate image statistics may be further classified into:
• Histogram
• Cumulative Histogram
DN Histogram
• Describes the statistical distribution of image pixels in terms of the
number of pixels at each DN.
• Measures brightness distribution
• It is calculated simply by counting the number of pixels in each DN
‘bin’ and dividing by the total number of pixels in the image, N.
$$hist_{DN} = count(DN)/N$$
• This is analogous to the continuous probability density function (PDF)
of statistics
$$hist_{DN} \approx PDF(DN)$$
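The counting and normalization described above can be sketched with NumPy. This is a minimal illustration, assuming an integer-valued band whose DNs all lie in the range dn_min to dn_max; the function name `dn_histogram` and the synthetic data are hypothetical, not part of the original slides.

```python
import numpy as np

def dn_histogram(image, dn_min=0, dn_max=255):
    """Return (dn_values, hist) with hist[k] = count(DN_k) / N.

    Assumes every pixel value lies in [dn_min, dn_max].
    """
    image = np.asarray(image)
    # One bin per integer DN; shift so that dn_min falls in bin 0.
    shifted = image.ravel().astype(np.int64) - dn_min
    counts = np.bincount(shifted, minlength=dn_max - dn_min + 1)
    hist = counts / image.size          # divide by total pixel count N
    dn_values = np.arange(dn_min, dn_max + 1)
    return dn_values, hist

# Synthetic 8-bit band (made-up values, not real sensor data).
rng = np.random.default_rng(0)
band = rng.integers(0, 256, size=(100, 120), dtype=np.uint8)
dn_values, hist = dn_histogram(band)
```

Because the counts are divided by N, the histogram values sum to one, which is what makes the comparison with a PDF meaningful.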
DN Histogram
• The histograms of larger images of land areas are typically unimodal,
i.e. they have a single peak.
• They are usually skewed, with a tail towards the higher DNs.
• The histogram contains no direct information about the spatial
distribution of pixels.
• However, some information about scene content can be inferred from
its shape, e.g. a strongly bimodal histogram usually
indicates two dominant materials in the scene.
• What we cannot say is how the two materials are spatially
connected.
DN Histogram
• The image histogram is a useful tool for contrast enhancement.
• A common contrast enhancement technique stretches the range of
DNs and clips, or thresholds, it at one or both ends, resulting in a
certain percentage of saturated pixels.
• The appropriate DN thresholds can be obtained from the histogram
as percentages of the total number of pixels in the image.
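A percentage-clip linear stretch of the kind described above might look like the following sketch. The 2% and 98% thresholds, the function name `percent_clip_stretch`, and the synthetic low-contrast band are assumptions chosen for illustration only.

```python
import numpy as np

def percent_clip_stretch(image, low_pct=2.0, high_pct=98.0, out_max=255):
    """Linear stretch that saturates the darkest and brightest pixels."""
    image = np.asarray(image, dtype=np.float64)
    # DN thresholds taken from the cumulative pixel percentages.
    lo, hi = np.percentile(image, [low_pct, high_pct])
    if hi <= lo:
        # Flat image: nothing to stretch.
        return np.zeros(image.shape, dtype=np.uint8)
    stretched = (image - lo) / (hi - lo) * out_max
    # Pixels outside [lo, hi] saturate at 0 or out_max.
    return np.clip(np.rint(stretched), 0, out_max).astype(np.uint8)

# Synthetic low-contrast band (made-up values).
rng = np.random.default_rng(1)
band = np.clip(rng.normal(80, 10, size=(200, 200)), 0, 255).astype(np.uint8)
stretched = percent_clip_stretch(band)
```

With these settings roughly 2% of pixels saturate at each end, and the standard deviation of the output is larger than that of the input.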
Cumulative Histogram
• Some image processing algorithms, notably histogram equalization and
histogram matching, require a function called the cumulative histogram:
$$chist_{DN} = \sum_{DN'=DN_{min}}^{DN} hist_{DN'}$$
• The cumulative histogram is the fraction of pixels in the image with a
DN less than or equal to the specified DN.
• This is a monotonic function of DN, since it can only increase or stay
constant as each histogram value is accumulated.
• This is also called the Cumulative Distribution Function (CDF)
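As a rough sketch, the cumulative histogram is a running sum of the normalized histogram. The example also shows one simple form of histogram equalization built on it (each DN mapped to the scaled cumulative value); that mapping is an illustrative assumption, not necessarily the variant intended in the slides.

```python
import numpy as np

# Synthetic 8-bit band (made-up values).
rng = np.random.default_rng(2)
band = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

# Normalized histogram, then running sum over DN.
hist = np.bincount(band.ravel(), minlength=256) / band.size
chist = np.cumsum(hist)   # chist[DN] = fraction of pixels with value <= DN

# Simple equalization: use the scaled cumulative histogram as a lookup table.
lut = np.rint(chist * 255).astype(np.uint8)
equalized = lut[band]
```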
Statistical Parameters
• The mode is the value that occurs most frequently in a distribution
and is usually the highest point on the curve (histogram). It is
common, however, to encounter more than one mode in a remote
sensing dataset.
• The median is the value midway in the frequency distribution. One
half of the area below the distribution curve is to the right of the
median, and one half is to the left.
Statistical Parameters
• The mean is the arithmetic average and is defined as the sum of all
brightness value observations divided by the number of observations.
• Equivalently, each DN can be weighted by the corresponding
histogram value (the fraction of the image that has that DN) and the
weighted DNs summed.

$$\mu = \frac{1}{N}\sum_{p=1}^{N} DN_p = \sum_{DN=DN_{min}}^{DN_{max}} DN \times hist_{DN}$$
Statistical Parameters
• The image standard deviation can also be used as a measure of image
contrast since it is a measure of the histogram width i.e. the spread in
DNs.
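No formula for the standard deviation survives on this slide. Assuming the same 1/N normalization used for the mean (an assumption; some texts divide by N − 1 instead), the variance can be written in both the per-pixel and the histogram-weighted form, with the standard deviation $\sigma$ as its square root:

$$\sigma^2 = \frac{1}{N}\sum_{p=1}^{N} (DN_p - \mu)^2 = \sum_{DN=DN_{min}}^{DN_{max}} (DN - \mu)^2 \times hist_{DN}$$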
Statistical Parameters
• Skewness
• Measure of asymmetry
• Is zero for any symmetric histogram
• A histogram with a long tail toward larger DNs has a positive
skewness, which is typical of remote sensing images.
• If a distribution has a long right tail of larger values, it is positively skewed;
if it has a long left tail of smaller values, it is negatively skewed.
$$skewness = \frac{1}{N}\sum_{p=1}^{N}\left(\frac{DN_p - \mu}{\sigma}\right)^3 = \sum_{DN=DN_{min}}^{DN_{max}}\left(\frac{DN - \mu}{\sigma}\right)^3 \times hist_{DN}$$
Statistical Parameters
• Kurtosis
• Measures the sharpness of the peak relative to the normal distribution
• Is zero for the normal distribution
• If a histogram has a positive kurtosis, its peak is sharper than that of a
Gaussian
• A negative kurtosis means the peak is less sharp than that of a Gaussian
$$kurtosis = \frac{1}{N}\sum_{p=1}^{N}\left(\frac{DN_p - \mu}{\sigma}\right)^4 - 3 = \sum_{DN=DN_{min}}^{DN_{max}}\left(\frac{DN - \mu}{\sigma}\right)^4 \times hist_{DN} - 3$$
Statistical Parameters
• Both skewness and kurtosis are normalized by the standard deviation
and are unitless, unlike the mean and standard deviation.
• Skewness and kurtosis are quite sensitive to outliers, pixels with DNs
far removed from the majority distribution, because of their high
order.
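The histogram-weighted forms of the mean, standard deviation, skewness and kurtosis can be sketched as below. The function name `histogram_moments`, the 1/N normalization of the standard deviation, and the synthetic right-skewed band are assumptions made for illustration; the sketch also assumes the band is not constant, so that the standard deviation is non-zero.

```python
import numpy as np

def histogram_moments(image, n_levels=256):
    """Mean, std, skewness and kurtosis computed from the DN histogram."""
    image = np.asarray(image)
    hist = np.bincount(image.ravel(), minlength=n_levels) / image.size
    dn = np.arange(hist.size)
    mean = np.sum(dn * hist)
    std = np.sqrt(np.sum((dn - mean) ** 2 * hist))   # 1/N convention
    z = (dn - mean) / std                            # normalized deviation
    skewness = np.sum(z ** 3 * hist)
    kurtosis = np.sum(z ** 4 * hist) - 3.0           # zero for a Gaussian
    return mean, std, skewness, kurtosis

# Synthetic band with a long tail toward larger DNs (made-up values).
rng = np.random.default_rng(3)
band = np.clip(rng.gamma(shape=2.0, scale=20.0, size=(300, 300)), 0, 255)
band = band.astype(np.uint8)
mean, std, skewness, kurtosis = histogram_moments(band)
```

The histogram-based mean and standard deviation agree with the per-pixel values from `band.mean()` and `band.std()`, and the skewness is positive for this long right tail.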
Multivariate Image Statistics
• Remote sensing research is often concerned with the measurement
of how much radiant flux is reflected or emitted from an object in
more than one band.
• It is useful to compute multivariate statistical measures such as
covariance and correlation among the several bands to determine
how the measurements covary.
• Later it will be shown that variance-covariance and correlation
matrices are used in remote sensing principal components analysis
(PCA), feature selection, classification and accuracy assessment.
Scatterplot
• One way to visualize two- or three-dimensional data is the scatterplot.
• This is a binary plot that shows a dot wherever a particular
multispectral vector has a histogram count of at least one.
• However, the number of pixels with a particular vector is not shown.
• The 3-D nature of the scatterplot can help to reveal different features in
the data and aid image interpretation.
Scatterplot

• The 3-D plot can be projected into three 2-D plots by projecting every
point onto one of the bounding planes.
Scatterplot
• The projection removes some of the spectral information, because all
points along a projection line are represented by only a single point in
2-D.
• Further information is lost if the 2-D scatterplot is projected onto a 1-D
histogram.
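A two-band version of these ideas can be sketched as follows: a 2-D histogram counts pixels per (band 1, band 2) vector, the binary scatterplot records only whether each count is at least one, and summing along an axis projects the data down to a 1-D band histogram. The two synthetic correlated bands are made up for illustration.

```python
import numpy as np

# Two correlated synthetic 8-bit bands (made-up values).
rng = np.random.default_rng(4)
base = rng.normal(100, 20, size=(128, 128))
band1 = np.clip(base + rng.normal(0, 5, base.shape), 0, 255).astype(np.uint8)
band2 = np.clip(0.8 * base + rng.normal(0, 5, base.shape), 0, 255).astype(np.uint8)

# 2-D histogram: counts[a, b] = number of pixels with (band1, band2) == (a, b).
counts = np.zeros((256, 256), dtype=np.int64)
np.add.at(counts, (band1.ravel(), band2.ravel()), 1)

# Binary scatterplot: a dot wherever the count is at least one.
scatter = counts > 0

# Projection onto each axis gives the 1-D histogram counts of each band.
hist1 = counts.sum(axis=1)
hist2 = counts.sum(axis=0)
```

The binary plot discards the counts, and each projection discards how the two bands vary together, which is the loss of information noted above.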
Covariance
• The different remote sensing derived spectral measurements for each
pixel often change together in some predictable fashion.
• If there is no relationship between the brightness value in one band and
that in another for a given pixel, the values are mutually
independent; that is, an increase or decrease in one band’s brightness
value is not accompanied by a predictable change in another band’s
brightness value.
• Because spectral measurements of individual pixels may not be
independent, some measure of their mutual interaction is needed.
Covariance
• This measure, called the covariance, is the joint variation of two
variables about their common mean.
Covariance
• The covariance matrix is symmetric.
• Because the diagonal elements are the variances of the distribution
along each dimension, they are never negative.
• However, the off-diagonal elements may be negative or positive.
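The slides do not reproduce the covariance formula, so the following is only a sketch under a stated assumption: the sample covariance between bands m and n is taken as the sum over pixels of the products of deviations from each band mean, divided by N − 1 (the default used by `np.cov`). The three synthetic bands are made up for illustration.

```python
import numpy as np

# Three synthetic bands sharing a common component (made-up values).
rng = np.random.default_rng(5)
base = rng.normal(100, 20, size=(50, 60))
bands = np.stack([
    base + rng.normal(0, 5, base.shape),
    0.5 * base + rng.normal(0, 5, base.shape),
    -0.3 * base + rng.normal(0, 5, base.shape),
])                                    # shape (K, rows, cols)

K = bands.shape[0]
X = bands.reshape(K, -1)              # one row of pixel values per band
N = X.shape[1]
mu = X.mean(axis=1, keepdims=True)    # per-band mean

# Covariance matrix: joint variation of each band pair about the band means.
C = (X - mu) @ (X - mu).T / (N - 1)
```

The result is symmetric, its diagonal holds the band variances, and the entry for the first and third bands is negative because they vary in opposite directions.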
Correlation
• To estimate the degree of interrelation between variables in a manner
not influenced by measurement units, the correlation coefficient is
commonly used.
• The correlation between two bands of remotely sensed data, $\rho_{mn}$, is
the ratio of their covariance ($c_{mn}$) to the product of their standard
deviations, $(c_{mm} c_{nn})^{1/2}$, where $c_{mm}$ and $c_{nn}$ are the band variances; thus:
$$\rho_{mn} = \frac{c_{mn}}{(c_{mm} c_{nn})^{1/2}}$$
Correlation
• This value must lie between minus one and plus one; the diagonal
terms, for which m is equal to n, are each normalized to one.
• Values of the correlation coefficient close to plus or minus one
imply a strong linear dependence between the data in the two
dimensions, whereas a correlation coefficient near zero implies
little linear dependence between the two dimensions.
Correlation
• If we square the correlation coefficient, we obtain the sample
coefficient of determination ($r^2$), which expresses the proportion of
the total variation in the values of ‘band 1’ that can be accounted for
or explained by a linear relationship with the values of
‘band k’.
• Thus a correlation coefficient of 0.7 results in a coefficient of
determination of 0.49, meaning that 49% of the total variation of the
values of ‘band 1’ in the sample is accounted for by a linear
relationship with the values of ‘band k’.
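The correlation coefficient and coefficient of determination can be sketched from a two-band covariance matrix. The variable names `band_1` and `band_k` and the synthetic data are illustrative assumptions.

```python
import numpy as np

# Two synthetic bands with a partial linear relationship (made-up values).
rng = np.random.default_rng(6)
band_1 = rng.normal(100, 20, size=10_000)
band_k = 0.7 * band_1 + rng.normal(0, 14, size=10_000)

# 2 x 2 covariance matrix: variances on the diagonal, covariance off it.
C = np.cov(np.vstack([band_1, band_k]))

# Correlation: covariance divided by the product of the standard deviations.
rho = C[0, 1] / np.sqrt(C[0, 0] * C[1, 1])

# Coefficient of determination: share of variation explained linearly.
r_squared = rho ** 2
```

The value of `rho` matches `np.corrcoef(band_1, band_k)[0, 1]`, and squaring it gives the explained fraction in the same way that 0.7 squared gives 0.49.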
Signal and Noise
• Signal: a varying quantity that carries information
• Noise:
• In signal processing or computing, noise can be considered data without meaning;
that is, data that is not being used to transmit a signal but is simply produced
as an unwanted by-product of other activities.
• In information theory, however, noise is still considered to be information.
• Noise can block, distort, or change the meaning of a message in both human
and electronic communication.
Sources of Noise
• Atmospheric effects
• Detector/preamplifier noise
• Quantization Noise
Noise Models
• Additive model
• The noise is a signal-independent component added at each pixel, p, which can be
expressed as
$$DN_p = int[a_p + n_p]$$
where $a_p$ is the noise-free signal and $n_p$ is the noise.
• Signal-dependent noise model
• This is expressed as
$$DN_p = int[a_p + n_p(a_p)]$$
• For example, noise in photographic film
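The two noise models can be simulated as a rough sketch. Several choices here are assumptions made only for illustration: Gaussian noise, a noise standard deviation that grows with the square root of the signal in the signal-dependent case, and rounding to the nearest integer for the int[ ] operation.

```python
import numpy as np

rng = np.random.default_rng(7)

# Noise-free signal a_p: a horizontal brightness ramp (made-up scene).
a = np.tile(np.linspace(20, 200, 256), (256, 1))

# Additive model: the noise n_p does not depend on the signal.
n_add = rng.normal(0.0, 3.0, size=a.shape)
dn_additive = np.rint(a + n_add).astype(np.int64)

# Signal-dependent model: the noise amplitude n_p(a_p) grows with a_p.
n_dep = rng.normal(0.0, 1.0, size=a.shape) * 0.3 * np.sqrt(a)
dn_dependent = np.rint(a + n_dep).astype(np.int64)

# Residuals make the difference visible: uniform spread vs. brighter = noisier.
residual_add = dn_additive - a
residual_dep = dn_dependent - a
```

In the additive case the residual spread is the same everywhere, while in the signal-dependent case it is larger over the bright end of the ramp than over the dark end.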
Noise Models
• Global/Random Noise
• Typically about twice as large as the detector noise
• Periodic Noise
• Related to the scanning
• For whiskbroom scanners: cross-track striping due to differences in the calibration
and response of the individual detectors
• For pushbroom scanners: in-track striping and banding noise
Noise Models
• Detector scan noise
• This can be modelled by a sensor-dependent or sensor-independent model
• For example, with 16 detectors per scan, the image row i acquired by detector det during scan number scan is
$$i = (scan - 1) \times 16 + det$$
• The DN recorded by each detector can then be modelled as
$$DN_{ij}^{det} = int[gain_{det} \times e_{ij}^{det} + offset_{det}]$$
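Detector striping can be simulated with the gain and offset model above. This sketch assumes 16 detectors per scan as in the row-index formula, a perfectly uniform scene, small made-up gain and offset differences between detectors, and rounding for the int[ ] operation.

```python
import numpy as np

N_DET = 16                       # detectors per scan, as in the row formula
rng = np.random.default_rng(8)

n_scans, n_cols = 8, 100
e = np.full((n_scans * N_DET, n_cols), 100.0)   # uniform scene e_ij

# Hypothetical per-detector calibration differences.
gain = 1.0 + rng.normal(0, 0.02, N_DET)
offset = rng.normal(0, 1.0, N_DET)

rows = np.arange(1, n_scans * N_DET + 1)        # 1-based row index i
det = (rows - 1) % N_DET + 1                    # detector number, 1..16
scan = (rows - 1) // N_DET + 1                  # scan number, 1..n_scans

# Each row is recorded through its own detector's gain and offset.
dn = np.rint(gain[det - 1][:, None] * e + offset[det - 1][:, None])
dn = dn.astype(np.int64)
```

Although the scene is uniform, rows recorded by different detectors can differ in DN, and the pattern repeats every 16 rows, which is the periodic striping described for scanning sensors.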
Statistical Measure for Image quality
• Contrast
Numerical contrast can be defined in several ways:
$$c_{ratio} = \frac{DN_{max}}{DN_{min}}$$
$$c_{range} = DN_{max} - DN_{min}$$
$$c_{std} = \sigma_{DN}$$
Statistical Measure for Image quality
• Modulation
Another easily measured image property is the modulation, M, defined
as
$$M = \frac{DN_{max} - DN_{min}}{DN_{max} + DN_{min}}$$
Because DNs are non-negative, the modulation is always between zero and one.
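The contrast measures and the modulation can be computed together, as in this minimal sketch. The function name `contrast_measures` is hypothetical, the contrast ratio is taken here as the maximum DN over the minimum DN (an assumed convention), and the guards for a zero minimum or an all-zero image are illustrative choices.

```python
import numpy as np

def contrast_measures(image):
    """Return ratio, range and std contrast plus modulation for one band."""
    image = np.asarray(image, dtype=np.float64)
    dn_min, dn_max = image.min(), image.max()
    total = dn_max + dn_min
    return {
        # Ratio is undefined when the darkest pixel is zero.
        "ratio": dn_max / dn_min if dn_min > 0 else np.inf,
        "range": dn_max - dn_min,
        "std": image.std(),
        # Modulation lies between 0 and 1 for non-negative DNs.
        "modulation": (dn_max - dn_min) / total if total > 0 else 0.0,
    }

# Tiny made-up band: DN_min = 50, DN_max = 200.
band = np.array([[50, 100], [150, 200]], dtype=np.uint8)
m = contrast_measures(band)
```

For this band the range is 150 and the modulation is 150 / 250 = 0.6.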


Signal to Noise Ratio
• Some measure of the relative amounts of signal and noise is
necessary for engineering design, data quality assessment, noise
reduction algorithms and certain information extraction algorithms.
• The signal-to-noise ratio (SNR) is such a measure; it is unitless and
independent of the measurement units.
• For an image contaminated by random noise at every pixel, the amplitude
SNR can be defined as the ratio of the noise-free image contrast to
the noise contrast.
$$SNR_{amplitude} = \frac{c_{signal}}{c_{noise}}$$
Signal to Noise Ratio
• Because of the problem of outliers, the standard deviation is the
most reliable contrast measure. The SNR can therefore be defined as the ratio of
the signal standard deviation to the noise standard deviation.
$$SNR_{std} = \frac{\sigma_{signal}}{\sigma_{noise}}$$
Signal to Noise Ratio
• The power SNR is given by
$$SNR_{power} = (SNR_{amplitude})^2 = \left(\frac{c_{signal}}{c_{noise}}\right)^2$$
In terms of standard deviation, this is
$$SNR_{var} = \left(\frac{\sigma_{signal}}{\sigma_{noise}}\right)^2$$
Signal to Noise Ratio
• The SNR expressed in decibels (dB) is given by
$$SNR_{dB} = 10\log_{10}(SNR)$$
where SNR is the power SNR.
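The SNR definitions can be tied together in a short sketch. It assumes the noise-free image is known (which is only true in simulation), uses Gaussian noise with a made-up standard deviation, and takes the SNR inside the logarithm to be the power (variance) SNR with a base-10 logarithm, following the usual decibel convention.

```python
import numpy as np

rng = np.random.default_rng(9)

# Noise-free image (a brightness ramp) and zero-mean random noise.
signal = np.tile(np.linspace(50, 150, 200), (200, 1))
noise = rng.normal(0.0, 5.0, size=signal.shape)

# Standard-deviation SNR and its squared (variance) form.
snr_std = signal.std() / noise.std()
snr_var = snr_std ** 2

# Decibel form, computed from the power (variance) SNR.
snr_db = 10 * np.log10(snr_var)
```

Since the variance SNR is the square of the standard-deviation SNR, the same decibel value is obtained from `20 * np.log10(snr_std)`.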
