
Chapter 1

INTRODUCTION

Remote sensing techniques provide end users with ever-growing volumes of data. Indeed, the resolution of the acquisitions is continually improving while the number of available channels is also increasing. In addition, acquisition rates have been rising during the last few years. It is thus possible to gather large series of images for a given geographical zone. This kind of data set is designated as a satellite image time series (SITS). SITS analysis raises new challenges as processed data volumes are huge, and the temporal and spatial dimensions should be considered. Various techniques can characterize evolutions in SITS. Some of these techniques (e.g., [1]) explore the data at the region level: They extract regions from all the images to provide end users with the evolutions of these regions (e.g., growing regions). Other techniques (e.g., [2]) link descriptors to each image of the SITS. A time sequence of descriptors is thus built, and subevolutions matching temporal and frequency constraints are extracted. These techniques are image based, and the image information is reduced to a few descriptors. Furthermore, pixel-based techniques have also been proposed, focusing either on a specific evolution occurring at a precise date (e.g., pixel change detection techniques such as those reviewed in [3]) or on the characterization of the whole sequence of pixel values (e.g., synthetic channel-based techniques, as proposed in [4], or clustering techniques, such as the one presented in [5]).

This paper presents an alternative and complementary approach, relying on the identification of evolutions and subevolutions (i.e., subsequences of evolutions) at the pixel level for finding groups of pixels that could be of interest to end users. Moreover, the experiments reported in this paper show that large data sets (e.g., 20 images of 1 000 000 pixels each) can be handled by such an approach. To output pixel sets that make sense both spatially and temporally, we select sets that contain a minimum number of pixels following the same temporal evolution and having a high connectivity measure. Furthermore, we propose to sift through the data set using an automated process which is, in addition to being efficient and accurate, one of the major premises of knowledge discovery in databases (KDD). The KDD is defined as the nontrivial extraction of implicit, unknown, and potentially useful information from data [6]. It relies on data-mining techniques to extract data models from large amounts of data. Information is then derived from those data models by domain experts. Following these principles, we want to extract, in an unsupervised way, not some, but all the groups of pixels covering more than a given minimum surface threshold, with each group having a common temporal evolution and satisfying a minimum connectivity criterion. We thus avoid making too many assumptions to run the knowledge discovery process. Moreover, the common temporal evolutions are not given beforehand but are determined by the method itself. This method is complementary to the existing techniques. In practice, it is effective in determining interesting groups of pixels sharing meaningful common temporal evolutions, which would not be uncovered by other approaches. Experiments on optical and radar data are presented. The first application is dedicated to crop monitoring from a multispectral image time series, namely the Data Assimilation for Agro-Modeling (ADAM) [7] data set. 
It is a Satellite Pour l'Observation de la Terre (SPOT) SITS with large images covering a rural zone in South Romania, near Bucharest. This SITS allows performances to be assessed according to the ground truth collected by two Romanian research institutes for agriculture and soil science.

The second application relates to the monitoring of crustal deformations using synthetic aperture radar (SAR) images. The results obtained for SAR images covering Lake Mead on the Colorado River in the United States are detailed.

This paper is structured as follows. In Section II, spatiotemporal SITS analysis techniques are reviewed. In Section III, the technique proposed in this paper is presented. In Section IV, the experiments on the SPOT and SAR image time series are reported. Finally, we conclude this paper with Section V, discussing the results of this paper and indicating future research.


Chapter 2

EXISTING TECHNIQUES FOR DESCRIBING SITS

In this section, different approaches to SITS information extraction are reviewed. Feature- and model-based techniques are detailed in Section II-A, pixel-based clustering techniques are presented in Section II-B, and change detection techniques are reviewed in Section II-C. Finally, frequent sequential pattern-based techniques are presented in Section II-D.

A. Feature- and Model-Based Techniques

The SITS can be processed at a higher level than that of a pixel. For example, in [1], stochastic models such as Gibbs-Markov random fields (MRFs) are used to extract spatial and spectral features at the region/object level. Graphs encoding spatiotemporal structures contained in SITS are then inferred from these features. These graphs are finally proposed to the end user to define positive and negative examples that will be used to retrieve similar structures in SITS. The model-based approaches introduce either image statistical models such as the MRF or temporal evolution models to analyze the SITS information. For instance, in [8], the MRF model features are used to perform unsupervised classification in the feature space.

The optimal number of clusters is automatically derived using a rate-distortion analysis. In [9], a nonlinear harmonic model is introduced to identify and interpolate the dynamics of land cover classes. Phenological attributes are retrieved by fitting the intra-annual evolution of multispectral time series. Spatiotemporal patterns can also be extracted from SITS by characterizing each image, as proposed in [2]. In this case, self-organizing maps are used to extract the signature of each image, and a time sequence of signatures is built. This sequence is further mined under temporal (maximum elapsed time between signatures) and frequency constraints to determine strong temporal dependences such as "if signature A is observed once or more, then, sometime later, signature B is observed once or more."

B. Pixel-Based Clustering Techniques

Numerous works show the ability of per-pixel analysis to deliver interesting results. For example, this allows end users to summarize a SITS within one single image. This can be done over synthetic channels. In the case of SAR images, these channels can be the backscattered amplitude average and the maximum backscattered amplitude date (e.g., [4]). Other approaches try to preserve the characteristics of the sequential information by considering each value of the pixels over time without fully aggregating them. In [5], such an approach is developed for clustering optical images. A distance measure based on a metric space and on the Levenshtein edit distance is defined to cluster pixels whose evolutions share the same ground evolution. For example, sequence $A \to A \to B$ and sequence $A \to B \to B$ are summed up as "A occurs before B." Nevertheless, information about the pace of the evolution is lost, irrespective of whether A occurs twice (or not) before B. In addition, spatial information is not taken into account. In [10], another clustering approach is described.

It focuses on determining the initialization parameters of a K-means algorithm. More precisely, each pixel is considered as a vertex in a direct acyclic graph of minimal length: a minimal spanning tree (MST). This MST is used to derive the number of clusters as well as their centroids. However, information such as the distribution of the vertices has to be supplied beforehand. To our knowledge, although this technique has been tested on multispectral images, it has not been applied to any SITS.

C. Change Detection Techniques

Another approach, namely change detection, helps end users by generating a single image in which changes are plotted, i.e., a change map. Change detection techniques generally require prior information about the type of change that has to be taken into account. For example, one may want to look for abrupt changes such as floods, earthquakes, or anthropic disasters (e.g., [11]), while others may be interested in gradual changes such as biomass accumulation (e.g., [12]). Although they require fine registration, change detection techniques can be efficiently applied at the pixel level, particularly when evaluating radiance changes between optical images (e.g., [3] and [13]). This can also be done at the texture level, as proposed in [14]. On the other hand, it is difficult to process time series at the pixel level when considering SAR images. Indeed, SAR images are corrupted by the speckle effect that results in multiplicative noise. In [15], it has been proposed to consider the ratio of the local means in the neighborhood of the pixels. In [16], this kind of ratio is said to be useful for detecting abrupt changes, while the second- and third-order log-cumulants of this ratio were found to be effective in capturing gradual changes. In [17], a measure based on the Kullback-Leibler divergence [18] has been proposed to evaluate the distance between local radar intensity distributions. Once changes have been measured, a decision function is used to decide whether a given pixel belongs to the class "change" or "no-change." For example, in [19], MRFs are used to model the neighborhood of the pixels.

This information is then injected into a more general statistical framework, and a decision is made in favor of the class having the highest probability. Useful contextual information can also be introduced by integrating geostatistics as in [20]. Change detection techniques also work at the object level. For example, in [21], pixels are clustered according to their radiances and positions to find objects. Objects whose behavior does not match an unchanged reference are selected.

D. Frequent Sequential Pattern-Based Techniques

Although similar to our approach, in the sense that generally both temporal and spatial dimensions are taken into account, none of the techniques previously reviewed can extract sets of grouped pixels sharing the same evolution or subevolution without first extracting objects/regions (e.g., [1], [2], and [21]) and/or without making any assumption about the type of evolution, as in model-based techniques (e.g., [8] and [9]). Change detection techniques also look for specific change classes while pixel-based clustering techniques only consider full evolutions and not subevolutions (e.g., [4], [5], and [10]). Furthermore, when searching for subevolutions, we want to extract them without giving any priority to any date of acquisition, which prevents us from using clustering techniques. In [22], we have presented a frequent sequential pattern-based approach that is preliminary to the one described in this paper. This approach did not take into account the spatial grouping tendencies of the pixels that share a given evolution. Thus, it was possible to extract an evolution that holds for numerous pixels that are not connected to each other. As a consequence, the evolutions provided to the end user were sometimes difficult to interpret. As proposed in [22], other works (e.g., [23] and [24]) also rely on frequent sequential patterns to analyze spatiotemporal data sets. In [23], frequent subtrajectories of objects, i.e., sequences of spatial locations sampled at consecutive time stamps, are mined. Trajectory mining can be performed only if trajectories are given as prior information.

This implies that objects have to be identified beforehand. In [24], frequent sequential patterns are used to express spatiotemporal relations. More precisely, spatiotemporal neighborhoods have to be defined to check if certain types of data points are located within the spatiotemporal neighborhood of other types of data points. For example, if a sequential pattern $A \to B$ is found, then it is interpreted as follows: Data points of type B tend to occur around and after data points of type A. A measure of significance based on the spatiotemporal density is proposed to discard uninteresting frequent sequential patterns. Nevertheless, the definition of spatiotemporal neighborhoods requires end users to set both temporal and spatial constraints. In this paper, no prior assumption about temporal information is made.


Chapter 3

GFS PATTERN EXTRACTION

In this section, a new kind of data-mining pattern, the grouped frequent sequential (GFS) pattern, is defined. It is dedicated to the extraction of groups of pixels sharing a common temporal pattern and satisfying, on average, a minimum spatial connectivity. Some preliminary definitions are first given to define a SITS as a set of temporal sequences from which a common kind of data-mining pattern, the sequential pattern, can be extracted. Finally, the connectivity measure used to define the GFS patterns is introduced.

A. Preliminary Definitions

Let us consider a SITS that covers the same area at different dates. Within each image, each pixel is associated with a value, e.g., the reflectance intensity of the geographical zone that it represents. We can transform those pixel values into values belonging to a discrete domain, using labels for encoding pixel states. Those labels can correspond to ranges obtained by image quantization or to pixel classes resulting from an unsupervised classification (e.g., using K-means or expectation-maximization (EM)-based clustering).

Definition 1 (Label and Pixel State): Let $L = \{i_1, i_2, \ldots, i_s\}$ be a set containing $s$ distinct symbols called labels, used to encode the values associated with the pixels. A pixel state is a pair $(e, t)$, where $e \in L$ and $t \in \mathbb{N}$, such that $t$ is the occurrence date of $e$.

The date $t$ is simply the time stamp of the image from which the value $e$ has been obtained. Subsequently, we can define a symbolic SITS as a set of pixel evolution sequences, with each sequence describing the states of a pixel over time, i.e., at different dates.

Definition 2 (Pixel Evolution Sequence and Symbolic SITS): For a pixel $p$, the pixel evolution sequence is a pair $((x, y), \mathit{seq})$, where $(x, y)$ are the coordinates of $p$ and $\mathit{seq}$ is a tuple of pixel states $\mathit{seq} = \langle (e_1, t_1), (e_2, t_2), \ldots, (e_n, t_n) \rangle$ containing the states of $p$ ordered by increasing dates of occurrence. A symbolic SITS (or SITS when clear from the context) is then a set of pixel evolution sequences.

For a typical symbolic SITS, we thus get a set of millions of pixel evolution sequences, with each sequence containing the discrete descriptions of the acquisition values of a given pixel.


B. Sequential Patterns

An important and active data-mining research area is the mining of bases of sequences to determine the sequential patterns [25]. This domain is now mature and provides efficient techniques for extracting such patterns. A typical base of sequences is a set of sequences of discrete events, in which each sequence has a unique sequence identifier. For SITS, if we consider the pairs $(x, y)$ of coordinates of the pixels as identifiers of their evolution sequences, then a symbolic SITS is a base of sequences, and the standard notions [25] of sequential patterns and sequential pattern occurrences can be adapted as follows.

Definition 3 (Sequential Pattern): A sequential pattern $\alpha$ is a tuple $\langle \alpha_1, \alpha_2, \ldots, \alpha_m \rangle$, where $\alpha_1, \ldots, \alpha_m$ are labels in $L$ and $m$ is the length of $\alpha$. Such a pattern is also denoted as $\alpha_1 \to \alpha_2 \to \cdots \to \alpha_m$.

Definition 4 (Occurrence and Support): Let $S$ be a symbolic SITS and $\alpha = \alpha_1 \to \alpha_2 \to \cdots \to \alpha_m$ be a sequential pattern. Then, $((x, y), \langle (\alpha_1, t_1), (\alpha_2, t_2), \ldots, (\alpha_m, t_m) \rangle)$, where $t_1 < t_2 < \cdots < t_m$, is an occurrence of $\alpha$ in $S$ if there exists $((x, y), \mathit{seq}) \in S$ such that $(\alpha_i, t_i)$ appears in $\mathit{seq}$ for all $i$ in $\{1, \ldots, m\}$. Such a pixel evolution sequence $((x, y), \mathit{seq})$ is said to support $\alpha$. The support of $\alpha$ in $S$, denoted by $\mathrm{support}(\alpha)$, is simply the number of sequences in $S$ that support $\alpha$.

Example 1: A mock symbolic SITS containing the states of four pixels

This data set describes the evolution of four pixels through five images with $L = \{A, B, C, D\}$. For example, the successive discrete labels associated with the values of the pixel located at (0, 0) are A, B, C, B, and D. In this data set, the sequential pattern $A \to C \to B$ has the following four occurrences (notice that the elements in an occurrence do not need to be contiguous in time):

The pattern has four occurrences but appears in only three different pixel evolution sequences: Its support is $\mathrm{support}(A \to C \to B) = 3$. Finally, it should be pointed out that a label can be repeated within a pattern: Pattern $C \to C$ has two occurrences, one in the third and one in the fourth sequence.
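
The notions of occurrence and support from Definitions 3 and 4 can be sketched in a few lines of Python. This is only an illustration: the table of Example 1 is not reproduced here, so the data set below is hypothetical. Only the sequence at (0, 0) is quoted in the text, and the three other sequences are invented so that the counts discussed above are obtained. The function names are assumptions and do not correspond to the extractor used in the experiments.

```python
from itertools import combinations

# Hypothetical symbolic SITS on a 2 x 2 grid with five acquisition dates.
# Each string lists the labels of one pixel at dates 1..5.
# Only the sequence at (0, 0) comes from the text; the others are invented.
SITS = {
    (0, 0): "ABCBD",
    (0, 1): "BACBB",
    (1, 0): "CACBD",
    (1, 1): "DCDCA",
}


def occurrences(pattern, labels):
    """Return every tuple of dates t1 < ... < tm (1-based) matching the pattern."""
    dates = range(1, len(labels) + 1)
    return [
        ts
        for ts in combinations(dates, len(pattern))
        if all(labels[t - 1] == a for a, t in zip(pattern, ts))
    ]


def support(pattern, sits):
    """Number of pixel evolution sequences with at least one occurrence."""
    return sum(1 for labels in sits.values() if occurrences(pattern, labels))


def total_occurrences(pattern, sits):
    """Number of occurrences summed over all pixel evolution sequences."""
    return sum(len(occurrences(pattern, labels)) for labels in sits.values())
```

With this toy data, `total_occurrences("ACB", SITS)` is 4 while `support("ACB", SITS)` is 3, because the sequence at (0, 1) contains two occurrences. Enumerating all date combinations is exponential in the pattern length and is only acceptable for such a small example.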

The number of different patterns occurring in a data set can be high. Therefore, only the frequent ones are selected by using a support threshold.

Definition 5 (Frequent Sequential Pattern): Let $\sigma$ be a strictly positive integer termed the support threshold. Let $\alpha$ be a sequential pattern; then, $\alpha$ is a frequent sequential pattern if $\mathrm{support}(\alpha) \geq \sigma$.

The support threshold can also be specified as a relative threshold $\sigma_{rel} \in [0, 1]$. Then, a pattern $\alpha$ is frequent if $\mathrm{support}(\alpha)/|S| \geq \sigma_{rel}$, where $S$ is the data set and $|S|$ is the number of sequences in $S$. Such a support constraint is used by the sequential pattern extraction algorithms to reduce the search space and to achieve reasonable execution times. The effects of the support constraint on the number of patterns and computing times are well known in the sequential pattern literature and consistently have the same impact in the experiments presented in Section IV.

C. Spatial Connectivity

Sequential pattern SITS analysis leads to a natural interpretation of the notion of support. For a pattern $\alpha$, the support of $\alpha$ is simply an area, i.e., the total number of pixels in the image having an evolution in which $\alpha$ occurs. These pixels are said to be covered by $\alpha$.

Definition 6 (Covered Pixel): A pixel having the evolution sequence $((x, y), \mathit{seq})$ is covered by a sequential pattern $\alpha$ if $\alpha$ has at least one occurrence in $\mathit{seq}$. The set of the coordinates of the pixels covered by $\alpha$ is denoted by $\mathrm{cov}(\alpha)$.

Therefore, for a frequent pattern $\alpha$, the threshold $\sigma$ ($\sigma_{rel}$) can be interpreted as the minimum area (relative area) that must be covered by $\alpha$. However, a threshold on the covered area is not sufficient because, most of the time, interesting parts in images are made of pixels forming regions. An additional criterion, the average connectivity measure, is thus introduced. It is based on the 8-nearest neighbors (8-NN) convention. Using this measure, the algorithm selects patterns that cover pixels forming groups, which can be defined as follows.

Definition 7 (Local Connectivity): For a symbolic SITS $S$, let $\mathrm{occ}((x, y), \alpha)$ be a function that, given the spatial coordinates $(x, y)$ and a sequential pattern $\alpha$, indicates whether $\alpha$ occurs in $S$ at location $(x, y)$. More precisely, $\mathrm{occ}((x, y), \alpha)$ is equal to one if and only if there is a sequence $\mathit{seq}$ in $S$ at coordinates $(x, y)$ and $\alpha$ occurs in $((x, y), \mathit{seq})$. Otherwise, $\mathrm{occ}((x, y), \alpha)$ is equal to zero. If $\alpha$ occurs in $((x, y), \mathit{seq})$, then its local connectivity at location $(x, y)$ is

$$\mathrm{LC}((x, y), \alpha) = \left[ \sum_{i=-1}^{1} \sum_{j=-1}^{1} \mathrm{occ}((x + i, y + j), \alpha) \right] - 1.$$

The value $\mathrm{LC}((x, y), \alpha)$ is simply the number of pixels in the 8-neighborhood of $(x, y)$ having an evolution supporting $\alpha$. It should be noted that the sum is decremented by one so as not to count the occurrence of $\alpha$ at location $(x, y)$.

Definition 8 (Average Connectivity): The average connectivity of $\alpha$ is defined as

$$\mathrm{AC}(\alpha) = \frac{\sum_{(x, y) \in \mathrm{cov}(\alpha)} \mathrm{LC}((x, y), \alpha)}{|\mathrm{cov}(\alpha)|}.$$

For the pixels supporting $\alpha$, this measure gives the average number of neighbors in their 8-NN that also support $\alpha$. In Example 1, $\mathrm{AC}(A \to C \to B) = 6/3 = 2$, and $\mathrm{AC}(C \to C) = 2/2 = 1$. Finally, we define the GFS patterns as follows.

Definition 9 (GFS and m-GFS Patterns): Let $S$ be a symbolic SITS. Given a sequential pattern $\alpha$ frequent in $S$ and a positive real number $\kappa$ termed the average connectivity threshold, $\alpha$ is said to be a GFS pattern if $\mathrm{AC}(\alpha) \geq \kappa$ in $S$. A GFS pattern of length $m$ is called an m-GFS pattern.

For instance, in Example 1, if $\sigma = 2$ and if $\kappa = 2$, then $A \to C \to B$ is a GFS pattern while $C \to C$ is not. As shown in Section IV, in practice, the support threshold (i.e., the minimum covered area) and the average connectivity threshold (i.e., minimum degree of spatial grouping) enable the selection of patterns that are interesting with respect to applications.
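
The connectivity measures and the GFS test of Definitions 6 to 9 can be sketched as follows. This is a minimal illustration and not the authors' extractor: the 2 x 2 data set is hypothetical (only the sequence at (0, 0) is quoted in the text; the others are invented to be consistent with the values given for Example 1), and all function names are assumptions.

```python
# Hypothetical 2 x 2 symbolic SITS, labels listed for dates 1..5.
# Only the sequence at (0, 0) comes from the text; the others are invented.
SITS = {
    (0, 0): "ABCBD",
    (0, 1): "BACBB",
    (1, 0): "CACBD",
    (1, 1): "DCDCA",
}


def occurs(pattern, labels):
    """True if the pattern is a (not necessarily contiguous) subsequence."""
    remaining = iter(labels)
    return all(label in remaining for label in pattern)


def cov(pattern, sits):
    """Coordinates of the pixels covered by the pattern (Definition 6)."""
    return {xy for xy, labels in sits.items() if occurs(pattern, labels)}


def local_connectivity(xy, covered):
    """Covered pixels in the 8-neighborhood of a covered pixel (Definition 7)."""
    x, y = xy
    total = sum(
        (x + i, y + j) in covered for i in (-1, 0, 1) for j in (-1, 0, 1)
    )
    return total - 1  # the pixel (x, y) itself is not a neighbor


def average_connectivity(pattern, sits):
    """Mean local connectivity over the covered pixels (Definition 8)."""
    covered = cov(pattern, sits)
    if not covered:
        return 0.0  # convention chosen here for patterns that never occur
    return sum(local_connectivity(xy, covered) for xy in covered) / len(covered)


def is_gfs(pattern, sits, sigma, kappa):
    """Frequent (support >= sigma) and grouped (AC >= kappa), as in Definition 9."""
    return (
        len(cov(pattern, sits)) >= sigma
        and average_connectivity(pattern, sits) >= kappa
    )
```

On this toy grid every pixel is an 8-neighbor of every other one, so the three pixels covered by A, C, B each have a local connectivity of 2, and the two pixels covered by C, C each have a local connectivity of 1.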

There are several algorithms for extracting frequent sequential patterns in a sound and complete way (e.g., [25]-[27]). The main idea, used to reduce the execution times, is to take advantage of the antimonotonicity property of the support. This means that, if a sequential pattern $\alpha$ has a support $k$, then any pattern $\beta$ that contains at least the labels in $\alpha$, in the same order, has a support equal to or less than $k$. For example, if $\mathrm{support}(D \to A) = k$, then $\mathrm{support}(D \to B \to A) \leq k$. This property is commonly used by the sequential pattern extraction algorithms to limit the number of patterns to consider. For instance, if $D \to A$ has already been checked and found not to be frequent, then there is no need to test pattern $D \to B \to A$, because it cannot be frequent. Owing to this property, a drastic reduction in the search space is made possible when looking for frequent patterns. We used our own extractor engine, written in the C language [28]. It is based on the occurrence lists approach of [26] combined with the pattern-growth technique of [27] and extended for handling pixel coordinates to compute the average connectivity and to select patterns according to both the $\sigma$ and $\kappa$ thresholds.
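
The pruning idea can be illustrated with a deliberately naive depth-first miner that extends a pattern by one label at a time and stops as soon as the support falls below the threshold. It is a sketch under stated assumptions, not the C extractor of [28]: it uses no occurrence lists, it prunes only along prefixes (a special case of the antimonotonicity property), the data set is hypothetical, and the names `frequent_patterns` and `grow` are invented for the example.

```python
# Hypothetical 2 x 2 symbolic SITS, labels listed for dates 1..5.
SITS = {
    (0, 0): "ABCBD",
    (0, 1): "BACBB",
    (1, 0): "CACBD",
    (1, 1): "DCDCA",
}


def occurs(pattern, labels):
    """True if the pattern is a (not necessarily contiguous) subsequence."""
    remaining = iter(labels)
    return all(label in remaining for label in pattern)


def support(pattern, sits):
    """Number of pixel evolution sequences supporting the pattern."""
    return sum(occurs(pattern, labels) for labels in sits.values())


def frequent_patterns(sits, alphabet, sigma):
    """Map every frequent pattern to its support, growing patterns depth first."""
    found = {}

    def grow(prefix):
        for label in alphabet:
            candidate = prefix + label
            sup = support(candidate, sits)
            if sup >= sigma:
                found[candidate] = sup
                grow(candidate)
            # Otherwise, no extension of `candidate` can be frequent,
            # so the whole branch is skipped.

    grow("")
    return found


FREQUENT = frequent_patterns(SITS, "ABCD", sigma=3)
```

Every frequent pattern has only frequent prefixes, so the search remains complete while skipping most of the candidate space. A GFS extractor would additionally keep only the patterns whose average connectivity reaches the connectivity threshold.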

Fig. 1. Satellite NDVI image examples. (a) Original image. (b) Quantized image with s = 3.


Chapter 4

EXPERIMENTS

The purpose of these experiments is to verify whether GFS patterns are useful to describe a SITS in an unsupervised way. We present the experiments on the ADAM SITS [7]. The data set and its preprocessing are presented in Section IV-A. The parameter settings are described in Section IV-B, and the extracted patterns are presented in Section IV-C. To evaluate the generic nature of the proposed approach, we present the results obtained with a very different data set in Section IV-D: a SITS built from differential SAR interferograms measuring crustal deformation. All experiments have been run on a standard personal computer (Intel Core 2 at 3 GHz, 4-GB RAM, and Linux kernel 2.6.22.19-02 x86_64).

A. ADAM SITS: Presentation, Selection, and Preprocessing

The ADAM SITS contains 39 images acquired between 2000 and 2001. They were acquired with three bands by SPOT satellites: B1 in green (0.50-0.59 μm), B2 in red (0.61-0.68 μm), and B3 in near infrared (NIR, 0.78-0.89 μm). The spatial resolution is 20 m × 20 m, and the observed scene is a rural area in East Bucharest, Romania.


A subscene (containing 1000 × 1000 pixels) in the Fundulea area was selected. The scene mainly shows agricultural fields whose dimensions are larger than the spatial resolution. Various types of crops, such as wheat, corn, barley, chickpea, soya, sunflower, pea, millet, oats, or lucerne, are present. Other objects are categorized into roads, rivers, forests, and towns. The topography of this region is generally flat with a small part of the area corresponding to slopes bordering a river and to several microdepressions. A ground truth is available for the 2000-2001 period for the fields that belong to the Romanian National Agricultural Research and Development Institute. It represents 5.9% of the scene but still can be used to evaluate the results. This information has not been used within the data-mining process itself. We finally selected 20 images between October 2000 and July 2001 to ensure that adequate data are available to observe agricultural cycles, from autumn plowing and seeding to harvest. For each pixel and for each date, we considered a synthetic band B4 that gives the normalized difference vegetation index (NDVI) [29]. It can be computed using bands B2 and B3 according to the formula B4 = (B3 − B2)/(B3 + B2). The NDVI is widely used to detect live green plant canopies in multispectral remote sensing data. An example of an original image of the ADAM SITS in the B4 band is presented in Fig. 1(a). We quantized the pixel values into nonoverlapping intervals that are equally populated. To minimize the influence of possible calibration errors, quantization was done separately for each image, using the same number of intervals. For a given acquisition date, a pixel was described by a single label that indicates the interval to which that pixel value belongs. The result of quantization into s = 3 intervals/labels is presented in Fig. 1(b).
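
The preprocessing described above (NDVI band, then per-image quantization into equally populated intervals) can be sketched as follows. The interval boundaries are taken here as empirical quantiles of each image, which is one plausible reading of "equally populated"; the exact procedure used for the ADAM SITS is not specified in the text, and the function names and sample values are hypothetical.

```python
from bisect import bisect_right


def ndvi(b2_red, b3_nir):
    """Synthetic band B4 = (B3 - B2) / (B3 + B2); 0.0 when both bands are zero."""
    total = b3_nir + b2_red
    return (b3_nir - b2_red) / total if total else 0.0


def equal_population_cuts(values, s):
    """Boundaries splitting the sorted values into s groups of (almost) equal size."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[n * k // s] for k in range(1, s)]


def quantize(values, s):
    """Label each value of ONE image with the index (1..s) of its interval."""
    cuts = equal_population_cuts(values, s)
    return [bisect_right(cuts, v) + 1 for v in values]


# Hypothetical NDVI values of six pixels at a single acquisition date.
image = [0.1, 0.9, 0.5, 0.3, 0.7, 0.2]
labels = quantize(image, 3)  # each image is quantized independently
```

Because the boundaries are recomputed for every image, a label describes the rank of a pixel within its own acquisition rather than an absolute NDVI level, which is what limits the influence of calibration differences between dates. When many pixels share the same value, the groups can only be approximately equally populated.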

B. Parameter Settings

The GFS-pattern extraction requires end users to set the relative support threshold $\sigma_{rel}$, the number of labels $s$, and the average connectivity threshold $\kappa$.


1) Relative Support Threshold $\sigma_{rel}$: We first determined $\sigma_{rel}$, because it is the only parameter that is actively used during the data-mining process to prune the search space (cf. Section III). The number of GFS patterns NP is a function of this parameter: It increases exponentially when $\sigma_{rel}$ decreases. The number of patterns is also a function of $s$ to a lower order, because the total number of potential GFS patterns is equal to $\sum_{i=1}^{N} s^i$, where $N$ is the number of images. It is, of course, recommended not to overwhelm end users with too many GFS patterns to be interpreted. To consider the worst case scenario, we plotted the dependence of NP on $\sigma_{rel}$ and $s$ by setting no spatial constraint ($\kappa$ = 0). In this case, the GFS-pattern extraction is equivalent to the frequent sequential pattern extraction. The results are presented in Fig. 2(a). As expected, if $\sigma_{rel}$ is lower and $s$ is higher, NP is higher. Such a behavior is also presented in [25]-[27]. Parameter $\sigma_{rel}$ has been set to values belonging to [0.25%, 1%] to provide a detailed description of the observed scene (a relative support of 1% corresponds to an area of 10 000 pixels, i.e., 400 hm²).
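
As a quick check of the orders of magnitude quoted above, the size of the pattern space and the area corresponding to a 1% relative support can be worked out explicitly. The values N = 20, the 1000 × 1000 subscene, and the 20 m × 20 m pixel size are those given in Section IV-A; the closed form is the standard geometric sum and is added here only for illustration.

```latex
\[
\sum_{i=1}^{N} s^{i} = \frac{s^{N+1} - s}{s - 1},
\qquad
\sum_{i=1}^{20} 2^{i} = 2\,097\,150,
\qquad
\sum_{i=1}^{20} 3^{i} = \frac{3^{21} - 3}{2} = 5\,230\,176\,600 .
\]
\[
0.01 \times 10^{6}\ \text{pixels} = 10^{4}\ \text{pixels},
\qquad
10^{4} \times (20\ \text{m} \times 20\ \text{m}) = 4 \times 10^{6}\ \text{m}^{2} = 400\ \text{hm}^{2} .
\]
```

Moving from two to three labels thus multiplies the number of potential patterns by more than three orders of magnitude for 20 images, which is why the pruning provided by the support threshold matters in practice.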

Fig. 2. Number of GFS patterns NP versus (a) relative support threshold $\sigma_{rel}$ and number of labels $s$ for $\kappa$ = 0, (b) $\kappa$ and $s$ for $\sigma_{rel}$ = 0.5%, and (c) $\sigma_{rel}$ and $s$ for $\kappa$ = 6.


As can be observed, too many patterns are extracted for all values except for $s$ = 2. If the connectivity constraint is used (i.e., $\kappa$ > 0), then NP decreases exponentially when $\kappa$ increases. This behavior is illustrated in Fig. 2(b) for $\sigma_{rel}$ = 0.5% and for various values of $s$. The ability of the average connectivity measure to discard large amounts of frequent sequential patterns is demonstrated in Fig. 2(a) and (b). The notion of GFS patterns is, therefore, useful because it reduces the number of patterns that are supplied to end users by an order of magnitude. For example, with quite a low relative support threshold of 0.5%, an average connectivity threshold of 5.5, and three symbols, it is possible to provide 1104 GFS patterns instead of 43 814 frequent sequential patterns. Such a behavior holds for any value of $\sigma_{rel}$. For example, Fig. 2(c) shows the dependence of NP on $\sigma_{rel}$ and $s$ for $\kappa$ = 6. Irrespective of the values of $\sigma_{rel}$ and $s$, the quantity of the extracted GFS patterns remains reasonable. We propose choosing the value of $\sigma_{rel}$ according to the execution times by focusing on the extra costs that are due to the handling of the average connectivity measure threshold. The extraction times with and without the computation of the average connectivity measure versus $\sigma_{rel}$, for fixed $\kappa$ and $s$, are shown in Fig. 3. Fig. 4 depicts the variation of the ratio of execution times with connectivity constraint to execution times without connectivity constraint versus $\sigma_{rel}$ and $s$. Extra costs decrease with the increase in the relative support threshold $\sigma_{rel}$ and have a relatively weak dependence on the quantization $s$. This can be explained by the fact that the higher $\sigma$ is, the lower is the number of GFS patterns to be written on disk with respect to the number of frequent sequential patterns [cf. Fig. 2(a)]. Finally, $\sigma_{rel}$ = 0.5% is a good compromise between description precision and extra costs due to the handling of the connectivity constraint.
2) Average Connectivity Threshold $\kappa$ and Number of Quantization Intervals $s$: The remaining parameters to be set are $\kappa$ and $s$, the number of quantization intervals (i.e., the number of labels). To obtain homogeneous regions, high values for $\kappa$ must be considered. For $s$, low values lead to a simplified description, while high values increase the accuracy of the evolution descriptions. As previously mentioned, images between October 2000 and July 2001 were selected to observe full agricultural cycles. Observing the vegetation phenology requires regular sampling of the whole phenological cycles, which is not the case for the ADAM data set. Sampling rates are indeed not constant. Furthermore, some acquisitions suffer from varying atmospheric and sensor conditions. Moreover, depending on the zones that are considered, phenological cycles of a given type of crop do not always start and end on the same date, because different pedological, fertilization, and irrigation conditions are present. It is thus impossible to rely on GFS patterns having as many events as the number of acquisitions, i.e., 20-GFS patterns, for describing phenological cycles. We thus focused on 18-GFS patterns, because they are general enough to consider different possible occurrence dates and to discard up to two noisy values. Therefore, we propose to set $\kappa$ and $s$ by considering the percentage of pixels covered by 18-GFS patterns, denoted as NC18, and by considering the percentage of pure pixels NCP18, i.e., pixels that are covered by a single 18-GFS pattern. These two percentages are computed with respect to the total number of pixels in the image. The following three parameter settings are considered.

Fig. 3. Execution times with ($\kappa$ = 5.5) and without connectivity constraint, $t_+$ and $t_-$, respectively, versus $\sigma_{rel}$ for $s$ = 3.


Fig. 4. Ratio of the execution time with connectivity constraint, $t_+$, to the execution time without connectivity constraint, $t_-$, versus $\sigma_{rel}$ and $s$ for $\kappa$ = 5.

Fig. 5. Percentage of pixels covered by 18-GFS patterns, NC18, versus $\kappa$ and $s$ for $\sigma_{rel}$ = 0.5%.


Fig. 6. Percentage of pixels covered by a single 18-GFS pattern, NCP18, versus $\kappa$ and $s$.

1) Case A: when the ratio NCP18/NC18 (the contribution of pure pixels to the total covering) is maximized to obtain a tradeoff between the simplicity of the description and the size of the covered area.
2) Case B: when NC18 is maximized. In this case, the largest possible area is covered, irrespective of whether pixels are covered by several 18-GFS patterns.
3) Case C: when NCP18 is maximized, i.e., the simplest description that covers the largest possible area is provided. Pixels covered by more than one 18-GFS pattern are not considered.

To select well-connected GFS patterns, the values of $\kappa$ that are below five were not considered. The parameter settings for operating point A were obtained when the ratio NCP18/NC18 was maximized. For that ratio, a maximum value of 56.55% was reached with $\sigma_{rel}$ = 0.5%, $s$ = 3, and $\kappa$ = 5.5 (graph is not shown). Variations in NC18 with respect to input parameters ($\kappa$ and $s$) are presented in Fig. 5. The maximum value of NC18 was 72.63%. It was obtained for $\kappa$ = 5 and $s$ = 2 (i.e., images were binarized). We thus considered a second operating point, operating point B, for which $\sigma_{rel}$ = 0.5%, $s$ = 2, and $\kappa$ = 5. Nearly the same behavior was observed for NCP18. This is depicted in Fig. 6: A maximum value of 23.44% is obtained for $s$ = 2 and $\kappa$ = 6. This leads to defining operating point C as $\sigma_{rel}$ = 0.5%, $s$ = 2, and $\kappa$ = 6. It is significant that GFS patterns extracted for operating point C can also be extracted for operating point B, and the only difference lies in a higher value of $\kappa$. Irrespective of the operating point considered, the number of 18-GFS patterns obtained was not more than 40. The end users are thus able to browse and visualize them without being overwhelmed.
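
The two coverage percentages used to choose the operating points can be computed from the sets of pixels covered by each pattern. The sketch below assumes that these sets are already available, one per 18-GFS pattern; the function name and the toy input are hypothetical.

```python
from collections import Counter


def coverage_percentages(covers, n_pixels):
    """Return (NC, NCP): % of pixels covered by >= 1 pattern and by exactly 1.

    covers   -- list of sets of (x, y) coordinates, one set per pattern
    n_pixels -- total number of pixels in the image
    """
    hits = Counter(xy for covered in covers for xy in covered)
    nc = 100.0 * len(hits) / n_pixels
    ncp = 100.0 * sum(1 for count in hits.values() if count == 1) / n_pixels
    return nc, ncp


# Toy input: two patterns on a four-pixel image, overlapping on pixel (0, 1).
covers = [{(0, 0), (0, 1)}, {(0, 1), (1, 0)}]
nc, ncp = coverage_percentages(covers, n_pixels=4)
```

In this toy case three of the four pixels are covered and two of them are pure, so NC is 75% and NCP is 50%. Case A maximizes the ratio NCP/NC, Case B maximizes NC, and Case C maximizes NCP over the tested parameter values.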

Fig. 7. Localization of the short GFS pattern 1 1 3 3 (zoom on the area where the ground truth is available).

Fig. 8. Ground truth is available for the fields that are located in the center of the images.


C. Qualitative Results

1) Extracting General and Specific Evolutions: For operating point A (σ_rel = 0.5%, s = 3, and κ = 5.5), we obtained approximately 1000 GFS patterns, 14 of which were 18-GFS patterns. As s = 3, the pixel values were quantized into three equally populated intervals denoted by three labels (symbols): label 1 for low radiometric values, label 2 for midrange values, and label 3 for high values. Before focusing on the 18-GFS patterns, we present the results for shorter GFS patterns, which offer general information about pixel evolutions. For example, among the 4-GFS patterns extracted, one can find 1 1 3 3. Its relative support RS is equal to 38.05%, and its average connectivity measure AC is equal to 6.78. This pattern points out the first part of phenological cycles: some crops are sown, grow, and reach maturity (e.g., spring-seeded crops: corn, pea, chickpea, and Sudan grass). The pattern is localized by highlighting each pixel that supports it, while all other pixels are set to black. The result is illustrated in Fig. 7 for the area where the ground truth is available (see Fig. 8). Fairly homogeneous geometric regions with crisp boundaries can be observed. White regions correspond to different types of agricultural fields with spring crops, while black regions correspond to forests, water bodies, and other types of agricultural fields.
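The quantization and localization steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the helper names are hypothetical, equal pixel values may fall on either side of an interval boundary, a pattern is assumed to be supported when its symbols occur in order (not necessarily at consecutive dates), and the connectivity constraint is not checked.

```python
def quantize_equal_population(values, s):
    """Assign labels 1..s so that each label holds about len(values)/s values."""
    # Rank the values, then cut the ranking into s equally sized slices.
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = rank * s // len(values) + 1
    return labels

def supports(sequence, pattern):
    """True if `pattern` occurs in `sequence` as an ordered subsequence."""
    it = iter(sequence)
    # `symbol in it` advances the iterator until the symbol is found,
    # so successive symbols must appear at strictly later dates.
    return all(symbol in it for symbol in pattern)

def localize(sequences, pattern):
    """Binary mask: 1 for pixels whose sequence supports the pattern, else 0."""
    return [1 if supports(seq, pattern) else 0 for seq in sequences]
```

For instance, `localize(sequences, [1, 1, 3, 3])` would produce the kind of white-on-black mask described for the pattern 1 1 3 3, given one symbolic sequence per pixel.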

Fig. 9. Superposition of the localizations of GFS patterns 1 1 3 3 and 3 3 3 1 1 (zoom on the area where the ground truth is available; σ_rel = 0.5%, κ = 5.5, and s = 3).

Another example of an extracted short pattern is the 5-GFS pattern 3 3 3 1 1 (RS = 35.7% and AC = 6.91). It corresponds to autumn-seeded crops, particularly wheat. The superposition of the localization of this pattern and of the previous one, presented in Fig. 9, covers more than 95% of the zoomed scene area for which the ground truth is available. Unassigned pixels in black represent forests, roads, water bodies, localities, and one type of crop, namely beans. Blue areas contain pure pixels, i.e., pixels covered by only one of the two patterns. They relate to long-term phenological cycles of autumn- or spring-seeded crops. Pixels covered by both patterns are colored in red and correspond to barley and oat fields, whose short phenological cycles allow both types of evolution to be observed. This kind of short pattern can thus be used to characterize the main evolutions in a SITS.

For the localization of more precise objects or regions, longer and thus more specific patterns must be considered. For the same operating point A, only two 20-GFS patterns were extracted: one containing only label 3 (the upper third of radiometric values) and another containing only label 1 (the lower third of radiometric values). Only 3% of the pixels in the scene were covered by these 20-GFS patterns. They generally correspond to forest zones and water bodies, respectively. When the 18-GFS patterns were considered, 14 patterns covering 19.54% of the scene were obtained. Five of them are presented in Table I. These patterns are discussed in Section IV-C2.

For operating point B (σ_rel = 0.5%, κ = 5, and s = 2), 24 18-GFS patterns were extracted, and the percentage of pixels covered by at least one of these patterns rose to 72.6% of the entire scene area. In Fig. 10, the number of patterns that cover a pixel is depicted using different colors. Black indicates pixels that are not covered by any 18-GFS pattern. Violet pixels (18.8% of the scene) correspond to pure pixels, i.e., pixels covered by a single 18-GFS pattern. They mainly represent the Mostistea River, forests, and a few agricultural fields. Blue pixels (10.6% of the scene) are covered by two 18-GFS patterns, and green pixels (33% of the scene) are covered by three 18-GFS patterns. Blue and green pixels relate to crop areas. The last color, white (9.2% of the scene), marks pixels covered by five 18-GFS patterns. They relate to various types of objects.


Fig. 10. Number of 18-GFS patterns that cover pixels (σ_rel = 0.5%, κ = 5, and s = 2). (Black) Zero pattern. (Violet) One pattern. (Blue) Two patterns. (Green) Three patterns. (White) Five patterns.

2) Evaluation of GFS Patterns Using the Ground Truth: With a ground truth that covers 5.9% of the observed scene, it is possible to match 18-GFS patterns with known types of crops, both spatially and temporally. More precisely, the pixels covered by a given pattern α are divided into subsets, with each subset being related to a given distribution of occurrence dates. Such a subset, for a given distribution i of occurrence dates, is denoted by cov(α, i). A dominant crop is then assigned to each subset according to the ground truth (each covered pixel votes for the crop that it corresponds to). Within such a subset, all pixels corresponding to this dominant crop are called dominant pixels. They are denoted by d(cov(α, i)). Next, a global purity P(α) is calculated. It is inspired by purity measures that are used for evaluating the overall purity of a clustering [30, p. 549]. P(α) is defined as the ratio of the total number of dominant pixels to the total number of pixels that are covered by the pattern. More formally, if D is the set of all observed distributions of the occurrence dates, then

$$P(\alpha) = \frac{\sum_{i \in D} |d(\mathrm{cov}(\alpha, i))|}{|\mathrm{cov}(\alpha)|}.$$

Even though several crop types can hold for a given pattern, we also propose to qualify each pattern by its main crop, considering the whole set of covered pixels. No temporal discrimination is made in this case. Each pixel covered by the pattern votes for its crop (according to the ground truth). The crop with the most votes is then the main crop associated with the pattern. Other types of information about 18-GFS patterns are also considered. For a given pattern, the following two measures are computed:

1) GTC (ground truth coverage), defined as the percentage of the pixels of the ground truth covered by the pattern;
2) MCC (main crop coverage), defined as the percentage of the ground-truth pixels belonging to the main crop of the pattern that are covered by this same pattern.

Table I gives all 18-GFS patterns having a GTC measure greater than or equal to 3% for operating points A, B, and C. Regarding the dependence of P on the quantization for a given crop, better purities are likely to be obtained if s increases. The values of P for the main crops are indeed greater for the patterns extracted when s is set to three than for those obtained with s = 2. Most of the patterns presented in Table I have a high purity and thus capture parts of the ground truth. Their localization reveals the nature of the crops in various places. Let us consider, for instance, patterns no. 5, no. 11, and no. 13. The spatial localization of pattern no. 5 and its temporal discrimination are presented in Fig. 11. Each colored pixel is a covered one, and each color depicts a given distribution of occurrence dates. Light-blue pixels refer to occurrences that are valid for early acquisition dates and correspond to regions with Sudan grass and pea crops. Red and dark-blue pixels match corn crops (late acquisition dates). This pattern presents the best global purity as well as the highest GTC for s = 3. Figs. 12 and 13 present the localization of patterns no. 13 and no. 11, respectively.
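The three measures defined above (P, GTC, and MCC) can be computed with a short sketch such as the following. It is only an illustration: the function name `evaluate_pattern` and the dictionary-based inputs are hypothetical, and the sketch assumes that purity is evaluated only on covered pixels for which ground truth exists.

```python
from collections import Counter, defaultdict

def evaluate_pattern(covered, ground_truth):
    """Compute purity P, main crop, GTC, and MCC for one pattern.

    covered:      dict pixel -> id of the distribution of occurrence dates
                  (hypothetical layout for the pixels covered by the pattern).
    ground_truth: dict pixel -> crop label, for pixels with ground truth.
    """
    votes_by_distribution = defaultdict(Counter)  # one vote table per cov(pattern, i)
    overall_votes = Counter()                     # votes without temporal discrimination
    for pixel, distribution in covered.items():
        crop = ground_truth.get(pixel)
        if crop is None:          # assumption: pixels without ground truth are skipped
            continue
        votes_by_distribution[distribution][crop] += 1
        overall_votes[crop] += 1

    n_covered = sum(overall_votes.values())
    if n_covered == 0:
        return None

    # Dominant pixels: in each subset, the pixels of the most voted-for crop.
    n_dominant = sum(max(v.values()) for v in votes_by_distribution.values())
    purity = n_dominant / n_covered

    main_crop, main_hits = overall_votes.most_common(1)[0]
    main_total = sum(1 for c in ground_truth.values() if c == main_crop)
    return {
        "P": purity,
        "main_crop": main_crop,
        "GTC": 100.0 * n_covered / len(ground_truth),  # % of ground truth covered
        "MCC": 100.0 * main_hits / main_total,         # % of main-crop pixels covered
    }
```

Keeping one vote table per distribution of occurrence dates is what allows a single pattern to be associated with different dominant crops at different dates, as in the temporal discrimination discussed for pattern no. 5.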


Fig. 11. Localization of pattern no. 5 (1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 3 3). Each color corresponds to a different temporal discrimination.

Fig. 12. Localization of pattern no. 13 (1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2). Each color corresponds to a different temporal discrimination.


Fig. 13. Localization of pattern no. 11 (1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2). Each color corresponds to a different temporal discrimination.

Once again, colors depict the different distributions of the occurrence dates, and the color interpretation given for Fig. 11 applies. Pattern no. 13 is one of the purest for the corn crop, and it has the maximal MCC. Pattern no. 11 maximizes GTC, but with lower values of MCC and P. As shown in Fig. 13, pattern no. 11 covers more pixels. It relates predominantly to Sudan grass and pea crops (in light blue).

D. Crustal Deformation Monitoring Using SAR Images

1) Data Set and Application: Differential interferogram SITS are challenging data sets, as they represent large volumes of data and the acquisitions are affected by atmospheric conditions.

In this section, SAR images covering the Lake Mead area (Nevada, USA) are considered. They were acquired by the European Remote Sensing (ERS) satellites, ERS1 and ERS2, and made available through the Extraction and Fusion of Information for Displacement Measurement from SAR Imagery project [31]. Lake Mead is the largest water reservoir in the United States. Its water level showed interannual fluctuations of about 20 m for the period 1992–2009. The soil surface around the lake is affected by a subsidence/uplift motion that is correlated with water-level fluctuations. At a 50-km scale, the surface shows subsidence when the water level increases and uplift when the water level decreases [32]. Locally, however, a few areas show uplift when the water level increases, reflecting soil dilatation due to increased water pressure (and inversely). The interferometric phase difference between two radar images is computed to measure the ground motion. However, it also contains atmospheric delays. Atmospheric patterns are spatially random from one acquisition date to another, whereas deformation patterns should remain spatially correlated over time.

To characterize the crustal deformation associated with lake-level fluctuations, a subset of 20 interferograms obtained from images acquired between 1996 and 2008 was selected. Each interferogram gives the interferometric phase difference of its acquisition date relative to the master date, October 8, 1995. The atmospheric phase screen of the master image is assigned to the master date. The lake water level increased between 1996 and 1998 and dropped between 2000 and 2008. The analyzed images (759 × 716 pixels, with a resolution of 130 m × 130 m) contain phase delays due to both atmospheric and deformation patterns for an area of approximately 100 km × 100 km. Fig. 14 presents such an image. The depicted delay includes the dominating atmospheric patterns for the date August 8, 1996, along with the deformation of the surface between October 8, 1995, and August 8, 1996. One color cycle (red/yellow/green/blue/violet) corresponds to a 1.8-cm delay increase between the satellite and the Earth's surface. White central areas correspond to Lake Mead, on which no phase delay can be measured.

2) Processing and Results: The quantization procedure is the one used for the ADAM SITS (equally populated nonoverlapping intervals). The results presented here were obtained from images whose pixel values, i.e., phase difference values, were quantized into three intervals. The first interval (label 1) denotes strong negative values, the second (label 2) contains small (close to zero) positive and negative values, and the third (label 3) corresponds to strong positive values of the phase difference. A strong positive value is interpreted as subsidence, while a strong negative value relates to uplift.

Fig. 14. Interferometric phase delay on August 8, 1996, relative to the master date October 8, 1995, displayed in radar geometry.

Fig. 15. Localization of pattern 1.


Fig. 16. (Highlighted areas) Superposition of pattern 1 and average subsidence or uplift velocity.

Fig. 17. Joint localization of patterns 2, 3, 4, and 5.

Setting σ to 10 000 (σ_rel ≈ 2%) and κ to six provides 10 173 patterns. To consider precise information, the GFS patterns with the most events were selected. We found five patterns with 15 events. These patterns are listed as follows:


Pattern 1 states that some pixels are constantly associated with negative phase difference values over time. The localization of this pattern is presented in Fig. 15. Such a pattern can be due either to an incomplete assessment of the atmospheric phase screen of the master date, which affects all 15 slave dates, or to uplift deformations affecting all 15 slave dates after October 8, 1995. To check the validity of the latter hypothesis, we computed the average uplift or subsidence velocity derived from the whole interferometric data set (50 images). Fig. 16 shows the superposition (highlighted areas) of the average subsidence or uplift velocity and of pattern 1. The subsidence or uplift velocity is represented with a wrapped color scale (red/yellow/green/blue/violet). A positive (respectively, negative) color cycle from stable areas (image borders) to deformation zones corresponds to a subsidence (respectively, uplift) rate of 2.2 mm/year. The superposition highlights the main uplift area close to Las Vegas (bottom left), which is probably due to decreased water pumping in this part of the Las Vegas aquifers.

Fig. 18. (Highlighted) Superposition of the joint localization of patterns 2, 3, 4, and 5 and regression coefficient (between phase delays and water-level fluctuations).

The first labels of patterns 2, 3, 4, and 5 indicate large positive phase differences with respect to the master image, and their last labels indicate large negative phase differences. The joint localization of those patterns is depicted in Fig. 17. Such patterns appear correlated with the water level, which increased between October 8, 1995, and 1998 and decreased after 2000. In other words, these patterns suggest that there should be pixels for which subsidence (respectively, uplift) is observed when the water level increases (respectively, decreases). Such behavior would be confirmed by a positive regression coefficient between phase delays and water-level fluctuations. To check this assumption, we computed the regression coefficient using the whole interferometric data set. Large positive regression coefficients were obtained on the localization of patterns 2, 3, 4, and 5 (see Fig. 18). The regression coefficient is represented by a wrapped color scale (red/yellow/green/blue/violet). A positive (respectively, negative) color cycle, from stable areas to deformation zones, corresponds to a subsidence (respectively, uplift) of 0.7 mm when the water level increases by 1 m.
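The text does not detail how the regression coefficient was estimated. As a hedged illustration, the sketch below assumes an ordinary least-squares slope computed independently for each pixel, with the water level as the explanatory variable and the phase delay (converted to a displacement) as the response; the function name `regression_slope` is hypothetical.

```python
def regression_slope(water_levels, delays):
    """Ordinary least-squares slope of `delays` against `water_levels`.

    A positive slope means that the delay grows when the water level rises,
    which the text interprets as subsidence; a negative slope suggests uplift.
    """
    n = len(water_levels)
    if n != len(delays) or n < 2:
        raise ValueError("need two series of equal length with at least 2 dates")
    mean_x = sum(water_levels) / n
    mean_y = sum(delays) / n
    # Slope = covariance(x, y) / variance(x).
    sxx = sum((x - mean_x) ** 2 for x in water_levels)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(water_levels, delays))
    if sxx == 0:
        raise ValueError("water level is constant; slope is undefined")
    return sxy / sxx
```

Applied to every pixel of the interferogram stack, such a slope map could then be compared with the joint localization of patterns 2, 3, 4, and 5, which is the kind of check reported in Fig. 18.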

All five patterns thus relate to ground deformation and not to atmospheric perturbations, confirming that GFS patterns can be used to find spatiotemporal patterns describing nonrandom phenomena in SITS.

Chapter 5

CONCLUSION

This paper has proposed an original approach for describing a SITS by considering both the spatial and temporal dimensions. A data-mining-based method has been developed to extract, in an unsupervised way, groups of pixels sharing the same temporal evolution and having a high connectivity measure. A frequent sequential pattern extraction technique has been adapted to a spatiotemporal context by introducing the notion of GFS patterns. In this context, at the pixel level, a SITS is considered as a set of temporal and symbolic sequences, with each sequence describing the evolution of a given pixel. Sequences and subsequences that are valid for groups of pixels covering at least a minimum surface σ_rel and exceeding a degree of connectivity κ are retained as potentially interesting patterns, namely GFS patterns. No assumption about the temporal evolution is made beforehand.

The experiments have been run on a standard personal computer with two real data sets: a SPOT SITS containing 20 images originating from the ADAM data set and a series of 20 interferograms computed using SAR satellite images that were provided by ERS and ENVISAT satellites. The ADAM data set has been used to explain how to set the input parameters by studying their respective dependences. The results have shown that GFS patterns are less numerous than frequent sequential patterns and that they describe fields with different crops well. The tradeoff between a precise description and a large land coverage has been illustrated. On the one hand, for good precision, i.e., high values of the global pattern purity P, it is necessary to consider long patterns as well as a high number of symbols/labels s for pixel value quantization. In this case, however, the number of covered pixels is small. On the other hand, short patterns can offer useful general information about pixel evolutions in the scene by characterizing the main evolutions of a SITS. A large number of pixels are covered by these patterns, but the level of purity is not high. Nevertheless, a tradeoff can be found: we have presented results obtained for long GFS patterns whose purities exceed 90% and which cover up to 75% of the ground truth. The experiments have shown that, even for a binarization of satellite images (s = 2), it is possible to obtain sufficient purities and coverage for crop description. We have also shown that, by using temporal discrimination, it is possible to identify the specific crops that a pattern relates to. This discrimination, together with patterns having high purity values, offers good class characterizations that can be used for a supervised postclassification of land cover.

Another experiment has been presented for the analysis of SAR differential interferogram time series. The studied data set covers the Lake Mead area, where the soil surface around the lake is affected by a subsidence/uplift motion that is correlated with water-level fluctuations. The results have shown that it is possible to extract long GFS patterns that are correlated with water-level fluctuations. Their localization corresponds to zones where the ground deformation has been identified.

Although atmospheric perturbations were present, none of these patterns relates to them, which demonstrates the ability of GFS patterns to discard random phenomena. These results have shown that the developed technique can be applied to various types of images at different resolutions and for different purposes, such as crop or ground deformation monitoring. Future research includes using the average connectivity measure to actively mine SITS, reducing the search space and execution times. We also aim to use GFS patterns to provide a single clustering of the whole SITS.

BIBLIOGRAPHY

[1] P. Héas and M. Datcu, "Modeling trajectory of dynamic clusters in image time-series for spatio-temporal reasoning," IEEE Trans. Geosci. Remote Sens., vol. 43, no. 7, pp. 1635–1647, Jul. 2005.
[2] R. Honda and O. Konishi, "Temporal rule discovery for time-series satellite images and integration with RDB," in Proc. 5th Eur. Conf. PKDD, 2001, pp. 204–215.
[3] P. Coppin, I. Jonckheere, K. Nackaerts, B. Muys, and E. Lambin, "Digital change detection methods in ecosystem monitoring: A review," Int. J. Remote Sens., vol. 25, no. 9, pp. 1565–1596, May 2004.
[4] E. Nezry, G. Genovese, G. Solaas, and S. Rémondière, "ERS-based early estimation of crop areas in Europe during winter 1994–95," in Proc. 2nd Int. Workshop ERS Appl., London, U.K., Dec. 6–8, 1995, T.-D. Guyenne, Ed., 1996, pp. 13–20. ESA SP-383. [Online]. Available: http://adsabs.harvard.edu/full/1996ESASP.383...13N
[5] A. Ketterlin and P. Gançarski, "Sequence similarity and multi-date image segmentation," in Proc. 4th Int. Workshop Anal. Multitemporal Remote Sens. Images, Leuven, Belgium, Jul. 2007, pp. 1–4.
[6] W. Frawley, G. Piatetsky-Shapiro, and C. Matheus, "Knowledge discovery in databases: An overview," in Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W. Frawley, Eds. Menlo Park, CA: AAAI Press, 1991, pp. 1–27.
[7] Centre National d'Études Spatiales, Database for the Data Assimilation for Agro-Modeling (ADAM) Project. [Online]. Available: http://kalideos.cnes.fr/index.php?id=accueil-adam
