Geomedian Pixel Composites 08004469


6254 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 55, NO. 11, NOVEMBER 2017

High-Dimensional Pixel Composites From Earth Observation Time Series
Dale Roberts, Norman Mueller, and Alexis McIntyre

Abstract: High-quality and large-scale image composites are increasingly important for a variety of applications. Yet a number of challenges still exist in the generation of composites with certain desirable qualities such as maintaining the spectral relationship between bands, reduced spatial noise, and consistency across scene boundaries so that large mosaics can be generated. We present a new method for generating pixel-based composite mosaics that achieves these goals. The method, based on a high-dimensional statistic called the ‘geometric median,’ effectively trades a temporal stack of poor quality observations for a single high-quality pixel composite with reduced spatial noise. The method requires no parameters or expert-defined rules. We quantitatively assess its strengths by benchmarking it against two other pixel-based compositing approaches over Tasmania, which is one of the most challenging locations in Australia for obtaining cloud-free imagery.

Index Terms: Big data applications, image analysis, remote sensing, time series analysis.

Manuscript received February 6, 2017; revised April 11, 2017 and May 25, 2017; accepted June 20, 2017. Date of publication August 8, 2017; date of current version October 25, 2017. (Corresponding author: Dale Roberts.)
D. Roberts is with the Australian National University, Canberra, ACT 0200, Australia (e-mail: dale.roberts@anu.edu.au).
N. Mueller and A. McIntyre are with Geoscience Australia, Symonston, ACT 2609, Australia (e-mail: norman.mueller@ga.gov.au; alexis.mcintyre@ga.gov.au).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TGRS.2017.2723896
This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/

I. INTRODUCTION

Large-scale image composites are increasingly important for a variety of applications such as land cover mapping [1]–[3], change detection [4]–[6], and the generation of high-quality data to parametrize and validate bio-physical and geo-physical models [7]–[9]. Since first introduced by Holben [10] as a method to reduce cloud and aerosol contamination in advanced very high resolution radiometer time series, a number of compositing methodologies have been proposed (see [1], [11]–[13]). However, challenges such as maintaining the spectral relationship between bands, mitigating against boundary artifacts due to mosaicking scenes from different epochs, and ensuring spatial regularity across the mosaic image still exist. In this paper, we propose an approach that leverages high-dimensional statistical theory to deliver on all of these desirable attributes.

The creation of good composite images is a particularly important technology since the opening of the Landsat archive by the United States Geological Survey [4]. The greater availability of satellite imagery has resulted in demand to provide large regional mosaics that are representative of conditions over specific time periods while also being free of clouds and other unwanted image noise. One approach is the stitching together of a number of clear images. Another is the creation of mosaics where pixels from different epochs are combined based on some algorithm from a time series of observations. This ‘pixel composite’ approach to mosaic generation provides a more consistent result compared with stitching clear images due to the improved color balance created by the combining of one-by-one pixel representative images. Another strength of pixel-based composites is their ability to be automated for application to very large data collections and time series such as national satellite data archives.

In general, the mechanisms to select a ‘best’ pixel for pixel composites are either rule based or statistics based. Rule-based composite methods include pixel selection by a target biophysical factor such as greenness [13], pixel selection by distance to cloud [14], pixel selection by time period [1], or a combination of mechanisms [4], [15]. On the other hand, statistics-based mechanisms generate composites by taking the sequence of pixel observations for each band through time and calculating a summary statistic of these observations to create the composite image. A summary statistic is a calculated numerical value (such as the mean) that characterizes some aspect of a set of data and that is often meant to estimate the true value of a corresponding parameter (such as the population mean) in an underlying population. Common practice is to apply a 1-D summary statistic to each spectral band separately, and while this might produce a pleasing picture, the resulting composite will not retain the correct spectral relationships in the output composite image. Maintaining spectral relationships is particularly important if further analysis on the composite image is required, such as calculating band ratios on the composite image or applying machine learning algorithms. A solution to this problem is to construct the pixel-composite image using a high-dimensional summary statistic that applies to all bands at once to guarantee that the biophysical relationships among all the spectral bands are maintained. An example of this technique is the recently proposed algorithm in [11] for creating seasonal composites.

In this paper, we propose a new statistics-based approach to pixel-based compositing that takes a collection of earth observations and collapses them down to a single image that maintains the relationships among the spectral bands, provides a good representation of a typical pixel observation devoid of outliers, and exhibits reduced spatial noise. Our method effectively trades a temporal stack of poor quality observations
ROBERTS et al.: HIGH-DIMENSIONAL PIXEL COMPOSITES FROM EARTH OBSERVATION TIME SERIES 6255

TABLE I
DESCRIPTION AND WAVELENGTH RANGES (IN MICROMETERS) AND BAND NUMBERS OF THE SPECTRAL BANDS OF THE SENSOR AS LABELED IN THIS PAPER

for a single high-quality pixel composite image. This paper presents this new algorithm, provides a quantitative comparison and evaluation against other statistical-based approaches, and demonstrates its applicability over Tasmania, which is one of the most challenging locations in Australia for obtaining cloud-free imagery.

II. STUDY AREA AND DATA

The area of interest is Tasmania (centered at 42°06′S, 146°37′E), the southernmost state of Australia. Tasmania is a large island with elevation ranging between sea level and 1617 m [16] (see Fig. 1). The majority of Tasmania is covered with dense forest, with conservation the main land use along with some areas of forestry. The northern coast and main valley system running through eastern Tasmania have mixed agricultural uses including intensive horticulture and animal husbandry [17].

Fig. 1. Landsat footprints (black) over the Tasmania study area. The red box indicates the more detailed study area of Hobart, shown in Fig. 11.

The data consist of Landsat-8 OLI (LS8) satellite observations from 2014. The study area is covered by 11 Landsat footprints that are shown in Fig. 1. A subset over the city of Hobart as shown by the red box in Fig. 1 is further investigated in our assessment. An overview is provided of all the input images into the method and quantitative study in Fig. 4. Out of the nine images, only one image is clear over the detailed study area of Hobart and the majority have significant cloud cover.

Tasmania was chosen as it is one of the hardest places in Australia to create a good mosaic, as this area receives very few clear observations due to high cloud coverage, high fog frequency, and steep relief causing significant terrain shadow. It is not unusual for parts of Tasmania to receive fewer than three clear observations in any given year. This has led to difficulties in creating mosaics that cover the entire state. Scene-based mosaics require clear images to complete, and in some areas, it is difficult to find a single scene that has less than 50% cloud cover.

The Landsat data are processed using the Australian Geoscience Data Cube (AGDC) methodology whereby the continent has been spatially organized into 1 × 1 degree cells [18]. In each cell, observations are ortho-rectified and organized as a set of acquisitions corresponding to the cell area with each raw acquisition composed of a 4000 × 4000 matrix for each spectral band. Each multiband 4000 × 4000 pixel observation in each cell is termed a “tile.” The matrices allow the construction, for each 1 × 1 degree cell, of a complete time series of matrix-valued observations across six spectral bands common to most Landsat satellites (BLUE, GREEN, RED, NIR, SWIR1, and SWIR2; see Table I). The observations are ortho-rectified and atmospherically corrected to measurements of surface reflectance using the NBAR approach [19] (see point A in the workflow shown in Fig. 2). This approach includes a bidirectional reflectance factor correction based on the bidirectional distribution function data collected by the moderate resolution imaging spectroradiometer, thereby reducing the impact of changing solar illumination angles. The observations are stored on disk as 16-bit integers with surface reflectance values scaled to range from 0 to 10 000 with −999 denoting a missing value. Missing values occur naturally in the AGDC framework as a satellite observation may not fill the full tile with values or may lack data, for example, due to Landsat-7 Scan Line Corrector failure gaps [18], [20].

We note that our study area is only an example of our approach and we have already successfully applied our approach to other imagery (commercial and open access) under various processing methodologies such as algorithms that include a correction for terrain illumination [19]. Our method is general and can be applied to any temporal stack of consistently corrected input imagery where the pixels are spatially aligned with each other to a high degree of precision. For example, a stack of ortho-corrected top-of-atmosphere (TOA) imagery will produce a TOA geometric median pixel composite. Likewise, if the input imagery is corrected to surface reflectance, our algorithm will produce a pixel composite of surface reflectance.

III. METHOD

As a summary statistic, the geometric median method is simple and requires only a few steps (see the workflow in Fig. 2); however, for completeness, we also provide some details of the ancillary steps required for the production of the composite image shown in Fig. 3.

The core method used in our approach (as labeled in Fig. 2) is based on a high-dimensional generalization of the median called the geometric median (also known as $L_1$ median, the median center, or the spatial median). The geometric median was introduced in [21] and is one of a number of possible generalizations of the median to higher dimensions

Fig. 2. Overview of workflow. Details of steps A–C are given in Sections II and III.
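The storage convention described in Section II (16-bit integers, surface reflectance scaled to the range 0 to 10 000, and −999 for a missing value) can be sketched as a small decoding step. This is only an illustration: the function names are hypothetical, the sample values are made up, and dividing by 10 000 to obtain a reflectance between 0 and 1 is an assumption about how the scaling is meant to be undone.

```python
import math

NODATA = -999    # missing-value flag quoted in the text
SCALE = 10_000   # assumption: stored range 0..10000 maps to reflectance 0..1


def to_reflectance(stored):
    """Convert one stored integer to reflectance, or NaN if it is missing."""
    if stored == NODATA:
        return math.nan
    return stored / SCALE


def decode_pixel(stored_bands):
    """Decode one multiband pixel observation (a sequence of stored integers)."""
    return [to_reflectance(v) for v in stored_bands]


# Hypothetical six-band observation (BLUE, GREEN, RED, NIR, SWIR1, SWIR2)
# in which the last band is missing.
pixel = decode_pixel([412, 655, 590, 3120, 1875, -999])
print(pixel)
```

Mapping the flag to NaN early means that later steps only need a single test for missing data.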

(see [22] for a comprehensive survey). Out of all the high-dimensional medians, the geometric median is particularly attractive as its formulation allows a mathematical characterization of its properties [23] that, in turn, enables the construction of mathematically rigorous statistical tools that exploit these properties (e.g., hypothesis tests).

A. Masking: Sneaking Under the Breakdown Point

The breakdown point of a summary statistic is the smallest proportion of contamination in the observations that can have an arbitrarily large effect on the result. The geometric median has a breakdown point of 0.5 [23]. A breakdown value of $\alpha = 0.5$ is high, since if $\alpha > 0.5$, then more observations would be contaminated than not, and the algorithm would not be able to converge on good observations. In comparison, the mean $\bar{x} := (1/n) \sum_{i=1}^{n} x_i$ has a breakdown point of 0 and can be affected by a single large outlier resulting in a large effect on $\bar{x}$.

This robustness of the geometric median is particularly attractive for constructing pixel composites from time series of earth observations, as outliers do not overly impact the median value. Unfortunately, it is common that in any given time series of earth observations, more than 50% of the observations contain cloud or other artifacts (for example, see the statistics in [24, Fig. 1] or [25, Fig. 5]). To circumvent this, before applying the proposed algorithm to each pixel through time, each pixel is treated as a multiband image and masked for pixel quality [26] using logical ‘OR’ rules and the output of various algorithms that make use of spectral and spatial information to determine observation artifacts (e.g., cloud, cloud shadow, and saturated pixels). In practice, pixel quality masks are never perfect. For example, the ACCA [27] and FMASK [28], [29] cloud/shadow algorithms often miss the edges of clouds or do not detect very thin cloud or haze. However, the key idea in the approach is that the absolute accuracy of the pixel quality mask (on a per tile basis) is not important as it is only required to reduce the number of artifacts through time below the breakdown point of the geometric median to obtain a good result. The pixel quality mask is used to set identified outliers as “no data” values (NaN), which are taken into account in the algorithm implementation.

B. Geometric Median of Multiband Pixel Time Series

A geometric median can be defined in the remote sensing context as follows. Given a finite set $X$ of $p$-band pixel observations (i.e., multiband pixel time series) as vectors $x_1, \ldots, x_n \in \mathbb{R}^p$, the geometric median of these observations is

$$\hat{\mu} := \operatorname*{argmin}_{x \in \mathbb{R}^p} \sum_{i=1}^{n} \lVert x - x_i \rVert \qquad (1)$$

where $\lVert \cdot \rVert$ is the Euclidean norm and $\operatorname{argmin}_x f(x)$, the “argument of the minima,” gives the point $x$ that minimizes the function $f$. In other words, $\hat{\mu}$ is the closest $p$-dimensional vector $x \in \mathbb{R}^p$ to all the $p$-band pixel observations in $X$ using the Euclidean norm as the measure of distance. The $p$-dimensional vector $\hat{\mu}$ always exists [23]. When $p = 1$, definition (1) collapses down to the well-known (one-dimensional) median (see Section IV for a definition).

The geometric median is used in high-dimensional statistics as a robust alternative to the mean vector due to its high breakdown value. It is particularly useful when the probability distribution of the data is not necessarily multivariate normal or when outliers are present in the data. The geometric

Fig. 3. Annual pixel composite of Tasmania in 2014 using SWIR1, NIR, and GREEN in the RGB channels for display. This pixel composite was generated
using our proposed geometric median approach using the implementation found in the hdmedians package.
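The masking idea of Section III-A can be illustrated with a small single-band sketch, using the one-dimensional median as the $p = 1$ case of definition (1). Everything here is hypothetical: the series values, the two mask lists, and the function names are invented for illustration and are not taken from the paper's pixel quality products. The point is that the masks do not need to catch every cloudy observation; they only need to bring the contaminated fraction below 50%.

```python
import math
import statistics


def mask_series(values, *masks):
    """Replace an observation by NaN when ANY mask flags it (logical OR)."""
    return [math.nan if any(m[i] for m in masks) else v
            for i, v in enumerate(values)]


def robust_value(values):
    """1-D median of the non-NaN observations (the p = 1 case of (1))."""
    return statistics.median(v for v in values if not math.isnan(v))


# Hypothetical time series for one pixel: 3 clear values and 6 cloudy ones,
# so more than 50% of the observations are contaminated.
series = [0.05, 0.06, 0.07, 0.80, 0.85, 0.90, 0.75, 0.95, 0.70]
# Two imperfect masks that together miss two of the six cloudy observations.
cloud = [False, False, False, True, True, True, False, False, False]
shadow = [False, False, False, False, False, True, False, True, False]

masked = mask_series(series, cloud, shadow)

print(robust_value(series))  # 0.75, a cloudy value
print(robust_value(masked))  # 0.07, a clear value
```

After masking, two of the five remaining observations are still cloudy (40%), which is under the breakdown point, so the median lands on a clear value.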

median is equivariant under translation, scale, and orthogonal transformations [30]. That is, for any $b \in \mathbb{R}^p$, $a \in \mathbb{R}$, and $p \times p$ orthogonal matrix $Q$, if every $p$-band pixel $x_i$ is transformed to $aQx_i + b$ and $\hat{\mu}$ is the geometric median of $x_1, \ldots, x_n$, then the geometric median of $aQx_i + b$ for $i = 1, \ldots, n$ becomes $aQ\hat{\mu} + b$. For example, this implies that the geometric median of a set of tasseled cap transformed pixels [31] is exactly the same as applying the same tasseled cap transformation to the geometric median $\hat{\mu}$.

Missing values occur in the time series due to the way that observations are stored as 1 × 1 degree cells (see Section II) and the way that pixels are removed based on the pixel quality mask (as described above). The missing values are taken into account by skipping the $i$th pixel $x_i = (x_{i1}, \ldots, x_{ip})$ if any one of its components $x_{i1}, \ldots, x_{ip}$ is NaN. Then for each row $r$ and column $c$ in the temporal stack of tiles, a time series is obtained of $p$-band pixel observations $X_{rc}$ of which the pixel composite for the $(r, c)$th pixel is constructed using GEOMEDIAN($X_{rc}$) as given by Algorithm 1.

For simplicity, Algorithm 1 is given using the Weiszfeld [32] approach. The algorithm has two stopping criteria: $\epsilon$, which specifies the decimal places of precision to achieve, and maxiters, which sets the maximum number of iterations that should be performed. This approach can be optimized by implementing the modifications in [33]. Nie et al. [34] give a semidefinite program formulation. Alternatively, more recent work gives the geometric median as the fixed point of a stochastic gradient descent [35], [36]. Our actual implementation of the geometric median algorithm includes some modifications to deal with missing values and edge cases. It is implemented in

Fig. 4. Time series of images from Landsat-8 OLI (LS8) that formed the input into the pixel compositing algorithm to create Fig. 11. All images have been
given the same histogram enhancements to match the composite.

Cython, which generates efficient C code with Python bindings and is available as open-source software in the hdmedians package¹ that provides a number of high-dimensional median algorithms.

¹ www.github.com/daleroberts/hdmedians

C. Mosaicking of Pixel Composites

The final stage of the generation of large-scale pixel composites, such as Fig. 3, is the mosaicking of the 4000 × 4000 pixel composites into one large image. This is a simple aggregation of data into a single file, with no additional postprocessing required to produce a seamless result. This is achieved using standard image processing tools (e.g., GDAL²).

² www.gdal.org

D. Statistical Compositing Methods Used for Evaluation

Comparison is made between the one-dimensional median applied for each band separately, the geometric median algorithm as proposed in Section III-B, and the medoid, which is an alternative high-dimensional median (see [22]) that was introduced in the remote-sensing context by Flood [11].

The median of a set of $n$ one-dimensional observations $X = \{x_1, x_2, \ldots, x_n\}$ is obtained by sorting the set $X$ and then

Fig. 5. Scatter plot of BLUE versus RED values in a marine environment of size 50 × 50 pixels as shown in Fig. 7, comparing the distribution of values
across the various algorithms (subfigures left to right: Geometric Median, Median and Medoid) and with an actual clear observation of the same location
(final subfigure on the right). Color of the scatter points is assigned based on a 2-D kernel density estimate of the points where darker means a higher density
of points in the area. All subfigures have the same axes ranges in reflectance. The distribution of points shows most similarity between the geometric median
and the clear observation.

Fig. 6. Scatter plot of RED versus SWIR1 values in a land environment of size 50 × 50 pixels as shown in Fig. 7, comparing the distribution of values
across the various algorithms and with a clear observation of the same location. Color of the scatter points is assigned based on a 2-D kernel density estimate
of the points where darker means a higher density of points in the area. All subfigures have the same axes ranges in reflectance.
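The Weiszfeld iteration of Algorithm 1, together with the NaN-skipping rule described in Section III-B, can be sketched in plain Python as follows. This is a minimal illustration and not the Cython code in the hdmedians package: the function name `geomedian`, the clamping of very small distances, and the choice to return the latest iterate are assumptions made here for a self-contained example.

```python
import math


def geomedian(points, eps=1e-7, maxiters=1000):
    """Weiszfeld iteration for the geometric median of p-band observations."""
    # Skip every observation that has a NaN in at least one band.
    valid = [tuple(x) for x in points if not any(math.isnan(v) for v in x)]
    if not valid:
        raise ValueError("no valid observations")
    n, p = len(valid), len(valid[0])
    # Start from the (non-robust) mean, as in Algorithm 1.
    y = [sum(x[j] for x in valid) / n for j in range(p)]
    for _ in range(maxiters):
        num = [0.0] * p
        den = 0.0
        for x in valid:
            # Assumption: clamp tiny distances so that an iterate landing
            # exactly on an observation does not cause a division by zero.
            w = 1.0 / max(math.dist(x, y), 1e-12)
            den += w
            for j in range(p):
                num[j] += w * x[j]
        y_next = [v / den for v in num]
        step = math.dist(y_next, y) / math.sqrt(p)
        y = y_next  # this sketch keeps the latest iterate
        if step < eps:
            break
    return y


# Four clear two-band observations, one far outlier, and one observation
# with a missing band (all values hypothetical).
stack = [(0, 0), (2, 0), (0, 2), (2, 2), (100, 100), (math.nan, 5.0)]
print(geomedian(stack))
```

On this example the mean of the five valid observations is (20.8, 20.8), whereas the iteration settles inside the square formed by the four clear observations, which is the robustness property discussed in Section III-A.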

Algorithm 1 Geometric Median
    $X \leftarrow [x_1, x_2, \ldots, x_n]$    ▷ $X$ is a $p \times n$ data matrix
    procedure GEOMEDIAN($X$, $\epsilon = 10^{-7}$, maxiters $= 1000$)
        $y_0 \leftarrow \frac{1}{n} \sum_{i=1}^{n} x_i$, $k \leftarrow 0$
        while $k <$ maxiters do
            $y_{k+1} \leftarrow \left( \sum_{i=1}^{n} \frac{x_i}{\lVert x_i - y_k \rVert} \right) \Big/ \left( \sum_{i=1}^{n} \frac{1}{\lVert x_i - y_k \rVert} \right)$
            if $\lVert y_{k+1} - y_k \rVert / \sqrt{p} < \epsilon$ then
                break
            $k \leftarrow k + 1$
        return $y_k$

choosing the middle value when $n$ is odd. When $n$ is even, the median is given by the average of the two middle values. Given a finite set $X$ of $p$-band pixel observations modeled by vectors $X = \{x_1, \ldots, x_n\}$, the medoid of these observations is

$$m := \operatorname*{argmin}_{x \in X} \sum_{i=1}^{n} \lVert x - x_i \rVert \qquad (2)$$

where $\lVert \cdot \rVert$ is the Euclidean norm and $\operatorname{argmin}_x f(x)$, the “argument of the minima,” gives the point $x$ that minimizes the function $f$. There is a subtle difference between (1) and (2) whereby the search space for the solution differs (i.e., “$x \in \mathbb{R}^p$” versus “$x \in X$”) and has the effect that the medoid returns one of the pixel observations in $X$ whereas the geometric median can be described as a synthetic (not physically observed) observation.

IV. RESULTS AND DISCUSSION

In Figs. 3 and 11, the geometric median method was used to create composites for 2014 from Landsat 8 OLI surface reflectance data over the state of Tasmania, Australia. The resulting mosaic is visually appealing as it preserves the spectral relationships among bands resulting in good color balance and does not show any trace of the Landsat footprints. The data produced using this method are statistically representative and, since the spectral relationships are preserved, it can be used to compute band ratios or as an input into a machine learning algorithm.

The following sections provide an evaluation of the method, in which the geometric median is compared to two alternative statistical-based methods for constructing pixel composites. No comparison is made to rule-based systems as it is difficult to fairly reproduce them in full. Each algorithm is applied to the stack of images presented in Fig. 4.

A. Spectral Relationship Among Bands

The geometric median pixel composite captures the spectral relationship among bands by design. Figs. 5 and 6 compare

Fig. 7. Detail comparison of the 50 × 50 pixel land study area (as shown by red box in Fig. 11). The axes show the pixel location coordinates corresponding
to the graphs in Fig. 8. The area is shown with SWIR1, NIR, and GREEN placed in the RGB channels. Visual inspection shows that the medoid pixel
composite is pixelated as true observations are being chosen from different observations. The geometric median and median displays each have a smooth
appearance similar to the clear observation.

Fig. 8. Spectral profiles of observations from all time periods represented by the Landsat 8 image stack for the four pixel locations #1 = (25, 5),
#2 = (26, 23), #3 = (26, 24), #4 = (46, 43) within the area shown in Fig. 7. Dashed lines are observations removed through the pixel quality assessment.
Black solid lines are retained observations except for the comparative clear observation (LS8 28.09.2014) that is shown in blue. Black dots highlight the medoid
(a true pixel observation) and the red line is the calculated geometric median. The geometric median shows spatial consistency between two neighboring
points #2 and #3 compared with the medoid.
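The two comparison statistics of Section III-D, the band-wise one-dimensional median and the medoid of definition (2), can be sketched in a few lines. The function names and the four two-band observations below are hypothetical and chosen only to make the difference visible; this is an illustration rather than the code used for the evaluation.

```python
import math
import statistics


def bandwise_median(observations):
    """One-dimensional median applied to each band separately."""
    return tuple(statistics.median(band) for band in zip(*observations))


def medoid(observations):
    """Observation with the smallest summed Euclidean distance to all
    observations, i.e., definition (2) with the search restricted to X."""
    return min(observations,
               key=lambda x: sum(math.dist(x, xi) for xi in observations))


# Hypothetical two-band pixel time series.
obs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.2, 0.2)]

print(medoid(obs))           # one of the original observations
print(bandwise_median(obs))  # a band-by-band value that need not be observed
```

The medoid always returns a member of the input set, while the band-wise median combines values taken from different observations in each band, which is why it can lose the relationship among bands.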

composites produced using the geometric median, median, and medoid with a clear observation. Fig. 5 compares the relationship between the BLUE and RED bands over water, and Fig. 6 compares the relationship between the RED and SWIR1 bands over land using a scatter plot of pixel observations over a 50 × 50 pixel area. These figures show that of the three composite methods, the geometric median has the most consistent and tightest ‘shape’ compared with a typical clear observation, which signifies that it maintains the spectral relationship over different targets. The scatter plot for the clear observations in Fig. 5 also shows a change in spectral characteristics of surface reflectance over time potentially caused by differing brightness at different times of year, thin haze, or clouds.

Fig. 9. Characterization of the difference in spatial noise among the geometric median, the median, and the medoid. It shows a screeplot, as described in Section IV-B, of the Hobart study area (950 × 950 pixel) for the RED band where $\lambda_i = s_i^2 / (\sum_{j=1}^{950} s_j)$ and $s_1, \ldots, s_{950}$ are the singular values of the RED band.

The distribution of points in the geometric median plot compared with the median plot shows that the median exhibits more scatter, which we attribute to less correlation between the bands in the median than the geometric median. This is expected as the median composite is calculated band-wise and therefore loses the spectral relationships among all bands. The medoid pixel composite captures the relationship among bands as it gives true pixel observations. Although this behavior is quite attractive for some applications (e.g., provenance tracking), it can also lead to a patchwork or pixelated effect in the output as can be seen in Fig. 7 and is also present in Fig. 5 where the scatter plot shows that two clear clusters of pixels are present, caused by pixels pulled from at least two distinct time points. This can lead to pixel composites that

Fig. 10. Histograms of each band, comparing each pixel-composite algorithm to a clear observation of the same location for the Hobart study area. Exactly
the same histogram bin edges have been used for each row.

have observable spatial variation due to neighboring pixels being sourced from different points in time in a changing environment.

Another aspect of the scatter plots is quantization effects from storing reflectance values as 16-bit integers; this is specific to the implemented NBAR correction applied to the input imagery. These effects are visible in the scatter plots in the case of the clear observation, the medoid (as it pulls pixels from different observations), and in the median (due to the specification of the algorithm). However, quantization is not visible in the geometric median composite as the algorithm calculations (and the final result) are performed using floating point numbers.

B. Reduced Spatial Noise

Pixels within a spatial neighborhood of a focal pixel are likely to exhibit dynamics similar to that of the focal pixel. This property, for example, can be harnessed to improve detection of deforestation (see [37]).

As true pixels from different time points are returned, medoid pixel composites often exhibit a patchwork or pixelated appearance to the human eye compared with the geometric median and median pixel composites that qualitatively appear smoother. This effect can be seen in Fig. 7 where a detailed comparison of a 50 × 50 pixel land area is shown. This area is quite interesting as it shows how the various algorithms perform in the case of a low number of observations. At the other extreme, for a very high number of clear observations, the medoid and the geometric median would be very similar.

The spectral profiles from all the observations in the time series at four different pixel locations are shown in Fig. 8. Location #1 was chosen as the medoid shows a different spectral profile exhibiting a peak in the SWIR1 signal instead of NIR compared with geometric median (red) and the observation from 28.09.2014 (in blue). In this situation, the geometric median shows a more consistent representative value of the clear observations. Locations #2 and #3 have been chosen to show how the spectral profiles of two neighboring pixels are more consistent with each other when using a geometric median. The surface reflectance magnitude across all bands is very consistent when moving from Location #2 to Location #3 similar to the clear observation from 28.09.2014. This is compared with the medoid that shows significant variation in the NIR magnitude when moving from #2 to #3. Location #4

Fig. 11. Subset of Fig. 3 centered on Hobart, Tasmania, providing an indication of the image quality of the geometric median pixel composite. The differing
optical properties of the water are visible despite the low signal-to-noise ratio. Red boxes show the 50 × 50 pixel marine and land study areas for Figs. 5 and 6.

is chosen to show how fast biophysical changes (e.g., due to cropping) that are somewhat unrepresentative of the full stack of images affect the medoid and geometric median, which have the same value in this situation. We note that, as can be seen in the spectral profiles in Fig. 8, the clear observation of 28.09.2014 represents the peak of the vegetation growth cycle (with cropping at Location #4) and as such is an extremum of the time series of good observations. Hence, neither the medoid nor the geometric median will reflect this image completely.

In an effort to quantitatively describe the pixelated effect seen in the medoid pixel composite image, the pixel composite methods were compared across a number of $n \times n$ pixel test areas by studying the singular value decomposition of each band (denoted as the $n \times n$ matrix $B$). That is, matrix $B$ is decomposed as $B = U S V^\top$ where $S = \operatorname{diag}(s_1, \ldots, s_n)$ is a diagonal matrix and the values $s_1 \ge s_2 \ge \cdots \ge s_n$ are the singular values, i.e., if $u_1, \ldots, u_n$ are the columns of $U$ and $v_1, \ldots, v_n$ are the columns of $V$, then $B v_i = s_i u_i$. Consequently, $B = s_1 u_1 v_1^\top + s_2 u_2 v_2^\top + \cdots + s_n u_n v_n^\top$ and the matrices $s_i u_i v_i^\top$ are often called the principal images. Taking the reduced-rank approximation $B_k = s_1 u_1 v_1^\top + \cdots + s_k u_k v_k^\top$, where $k < n$, is a method commonly used to reduce noise in images [38]. As such, the speed at which the values $\lambda_i = s_i^2 / (\sum_{j=1}^{n} s_j)$

decay is a strong indication of the spatial noise remaining in the image. This is sometimes called a “screeplot” in statistics. Alternatively, as $k$ increases (or the cumulative sum of indices $i$ increases), the reduced rank matrices become closer to the original image, and hence the graphs reflect the different noise characteristics of each composite method. Fig. 9 shows the test area of Tasmania with $n = 960$, demonstrating that the median and geometric median algorithms decay faster than the medoid, while the geometric median decays slightly faster than median.

C. Improved Color Balance

One of the implications of capturing the spectral relationship among bands and reducing spatial noise is an improvement in color balance in the pixel composite. The result is an image that is qualitatively close to a clear observation in appearance, maintains the physical relationship among spectral bands, and is spatially contiguous.

Fig. 10 shows a comparison of histograms across all bands and all algorithms compared with a clear observation. It shows the presence of upper tails that are smoother and exhibit less mass in the histograms for the geometric median composite compared with the median and medoid. This is particularly visible when comparing the upper tail for the BLUE, GREEN, and RED bands of the geometric median versus the median and medoid. At the lower end of the tail, there is a smoother transition from no mass to positive mass in the histogram. Together, this could lead to automatic scaling of the displayed composite image to be different to a clear observation. Further, it is clear that the histograms for the median and medoid composites exhibit more noise compared with the geometric median and the clear observation. While this could be due to an aliasing effect in the histogram due to quantized integer values, using the same histogram bins with the clear observation does not produce the same noise problems.

D. Signal-to-Noise Ratio

The geometric median is very effective over targets that typically exhibit a low signal-to-noise ratio, such as water

instance, the method achieves a robust representation of the time period and performs well when there are a low number of clear observations. It generally preserves the spectral relationships among bands and produces outputs that contain less spatial noise than other approaches such as the medoid, and one-dimensional median applied to each band separately.

The advantages conferred by the geometric median method make it ideally suited to the generation of pixel composites over large spatial areas and to the use of these composites in downstream applications and studies. The method is currently being applied by Geoscience Australia to produce annual continental composites of Australia to provide a time series of annual cloud-free representative images from 1986 to present. The methodology applies to other data sets (e.g., Sentinel-2, SPOT, and others) as it is completely sensor agnostic. As mathematical theory underpinning the method is developed, a geometric median enables the development of statistical tools such as rigorous pixel-wise hypothesis tests to perform comparisons between pixel composites that would generally not be possible to apply to composites constructed from alternative algorithms. A forthcoming paper demonstrates this approach for rigorous bio-physical change detection across seasonal and multiannual temporal periods.

ACKNOWLEDGMENT

This paper was published with the permission of the CEO, Geoscience Australia.

REFERENCES

[1] P. Griffiths, S. V. D. Linden, T. Kuemmerle, and P. Hostert, “A pixel-based Landsat compositing algorithm for large area land cover mapping,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 6, no. 5, pp. 2088–2101, Oct. 2013.
[2] R. DeFries, M. Hansen, and J. Townshend, “Global discrimination of land cover types from metrics derived from AVHRR pathfinder data,” Remote Sens. Environ., vol. 54, no. 3, pp. 209–222, 1995.
[3] T. R. Loveland and A. S. Belward, “The IGBP-DIS global 1 km land cover data set, DISCover: First results,” Int. J. Remote Sens., vol. 18, no. 15, pp. 3289–3295, 1997.
[4] T. Hermosilla, M. A. Wulder, J. C. White, N. C. Coops, and G. W. Hobart, “An integrated Landsat time series protocol for change detection and generation of annual gap-free surface reflectance composites,” Remote Sens. Environ., vol. 158, pp. 220–234, Mar. 2015.
[5] P. Griffiths et al., “Forest disturbances, forest recovery, and changes in
bodies. Fig. 11 shows the area around Hobart that includes forest types across the Carpathian ecoregion from 1985 to 2010 based on
the mouth of the Derwent River and many embayments. Landsat image composites,” Remote Sens. Environ., vol. 151, pp. 72–88,
Aug. 2014. [Online]. Available: http://www.sciencedirect.com/science/
In Fig. 11, the differing optical properties of the water are article/pii/S0034425713003453
visible. The image is also free of noise that is often visible over [6] X. Zhan et al., “The 250 m global land cover change product
dark homogenous regions and free of sun glint. The areas of from the Moderate Resolution Imaging Spectroradiometer of NASA’s
Earth Observing System,” Int. J. Remote Sens., vol. 21, nos. 6–7,
varying depth and sediment load are apparent. Due to favorable pp. 1433–1460, 2000.
characteristics of the geometric median composite, traditional [7] G. Gutman et al., “Towards monitoring land-cover and land-use changes
techniques for determining these properties could be applied. at a global scale: The global land survey 2005,” Photogramm. Eng.
Remote Sens., vol. 74, no. 1, pp. 6–10, 2008.
[8] M. A. Wulder et al., “Landsat continuity: Issues and opportunities
V. C ONCLUSION for land cover monitoring,” Remote Sens. Environ., vol. 112, no. 3,
pp. 955–969, 2008.
A simple and robust method for producing earth observation [9] P. J. Sellers et al., “A revised land surface parameterization (SiB2) for
composites with reduced spatial noise that preserves spectral atmospheric GCMS. Part I: Model formulation,” J. Climate, vol. 9, no. 4,
pp. 676–705, 1996.
relationships has been presented. The method is based on an [10] B. N. Holben, “Characteristics of maximum-value composite images
established and well-studied statistical theory that we have from temporal AVHRR data,” Int. J. Remote Sens., vol. 7, no. 11,
applied in a remote sensing context. pp. 1417–1434, 1986.
[11] N. Flood, “Seasonal composite Landsat TM/ETM+ images using the
The geometric median has several properties that make it medoid (a multi-dimensional median),” Remote Sens., vol. 5, no. 12,
especially attractive for many remote sensing applications. For pp. 6481–6500, 2013.
Dale Roberts received the B.Sc. and B.Sc. (Hons.) degrees in mathematics from the University of Technology Sydney, Ultimo, NSW, Australia, in 2005 and 2006, respectively, and the Ph.D. degree in pure mathematics from the University of New South Wales, Sydney, NSW, in 2012.
He was with the finance industry for a short period of time. He has been a Faculty Member with Australian National University, Canberra, ACT, Australia, since 2012. Since 2014, he has been collaborating with various scientists at Geoscience Australia, Symonston, ACT, Australia, on the development of statistical and machine learning algorithms for earth observation. His research interests include probability theory and its applications.

Norman Mueller received the bachelor’s degree in physics from Macquarie University, Sydney, NSW, Australia, in 1995, and the Diploma in applied science with a specialization in geographic information system and remote sensing from Charles Sturt University, Wagga Wagga, NSW, in 2008.
He has an industry background in chemistry, information technology, and environmental consultancy. He is a specialist in the analysis of optical satellite imagery for land cover and inland water. He is currently a Senior Earth Observation Scientist with Geoscience Australia, Symonston, ACT, Australia, where he has led or contributed to several continental earth observation products including the Dynamic Land Cover Map of Australia and Water Observations from Space.

Alexis McIntyre received the B.Sc. degree in spatial information science and physical geography from the University of New South Wales, Sydney, NSW, Australia, in 2006. She received a scholarship to pursue the M.Sc. degree in geo-information science and Earth observation for environmental modeling and management through a European Commission Erasmus Mundus joint programme at four universities across Europe: Southampton University, U.K.; Lund University, Sweden; Warsaw University, Poland; and ITC at the University of Twente, Netherlands, which she received in 2010.
She joined Geoscience Australia, Symonston, ACT, Australia, in 2011, where she is currently an Earth Observation Scientist. She has contributed to the development of a range of continental products including the Dynamic Land Cover Map of Australia and Water Observations from Space. Her research interests include the analysis of time-series remote sensing data and environmental modeling.
