Professional Documents
Culture Documents
Ebook Spatial Predictive Modelling With R 1St Edition Jin Li 2 Online PDF All Chapter
Ebook Spatial Predictive Modelling With R 1St Edition Jin Li 2 Online PDF All Chapter
Ebook Spatial Predictive Modelling With R 1St Edition Jin Li 2 Online PDF All Chapter
Edition Jin Li
Visit to download the full and correct content document:
https://ebookmeta.com/product/spatial-predictive-modelling-with-r-1st-edition-jin-li-2/
More products digital (pdf, epub, mobi) instant
download maybe you interests ...
https://ebookmeta.com/product/spatial-predictive-modelling-
with-r-1st-edition-jin-li/
https://ebookmeta.com/product/fundamentals-of-spatial-analysis-
and-modelling-1st-edition-jay-gao/
https://ebookmeta.com/product/advanced-prognostic-predictive-
modelling-in-healthcare-data-analytics-1st-edition-sudipta-roy/
https://ebookmeta.com/product/3s-technology-applications-in-
meteorology-observations-methods-and-modelling-1st-edition-
shuanggen-jin/
Predictive Modelling for Energy Management and Power
Systems Engineering 1st Edition Ravinesh C. Deo
https://ebookmeta.com/product/predictive-modelling-for-energy-
management-and-power-systems-engineering-1st-edition-ravinesh-c-
deo/
https://ebookmeta.com/product/bayesian-modelling-of-spatio-
temporal-data-with-r-1st-edition-sujit-kumar-sahu/
https://ebookmeta.com/product/understanding-data-analytics-and-
predictive-modelling-in-the-oil-and-gas-industry-1st-edition-
kingshuk-srivastava/
https://ebookmeta.com/product/displaying-time-series-spatial-and-
space-time-data-with-r-second-edition-oscar-perpinan-lamigueiro/
https://ebookmeta.com/product/r-calculus-v-description-
logics-1st-edition-li/
Spatial Predictive Modeling
with R
Spatial Predictive Modeling
with R
Jin Li
First edition published 2022
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
© 2022 Jin Li
Reasonable efforts have been made to publish reliable data and information, but the author and publisher
cannot assume responsibility for the validity of all materials or the consequences of their use. The authors
and publishers have attempted to trace the copyright holders of all material reproduced in this publica-
tion and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any future
reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, trans-
mitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter
invented, including photocopying, microfilming, and recording, or in any information storage or retrieval
system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.com or
contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-
8400. For works that are not available on CCC please contact mpkbookspermissions@tandf.co.uk
Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used
only for identification and explanation without intent to infringe.
Publisher’s note: This book has been prepared from camera-ready copy provided by the authors
Contents
Preface xiii
v
vi Contents
Appendix 351
References 357
Index 369
Preface
xiii
xiv Preface
and optimized (Li and Heap 2011, 2014). Spatial predictive models are often developed in
terms of predictive accuracy based on model validation methods and are fundamentally
different from other modeling types (Leek and Peng 2015).
TABLE 1: Development of the hybrid methods for SPM (Li, Heap, Potter, and Daniell
2010, 2011b, 2011a; Li, Potter, Huang, Daniell, et al. 2010; Li 2011; Li, Potter, Huang, and
Heap 2012a, 2012b) (modified from Li (2018a)).
Predictive accuracy is the key criteria for predictive modeling, and it is used for parameter,
variable and model/method selection in this book. The property of spatial predictions is
a further criteria for predictive modeling, where professional knowledge plays its role in
examining whether the predictions are scientifically sound, reasonable, and interpretable.
This book aims to introduce SPM as a discipline to modelers and researchers. It systemati-
cally introduces the entire process of SPM . The process contains the following components:
data acquisition, method and variable selection, model or parameter optimization, accuracy
assessment, and the generation and visualization of spatial predictions. Each of these mod-
eling components plays an important role in model development. Incorrect or inappropriate
implementation of any components may lead to less accurate or even misleading predic-
tive model(s). This book provides tools for relevant components to improve the quality of
spatial predictions in various disciplines. It also provides guidelines, suggestions, recommen-
dations, and reproducible examples in R for developing the most accurate predictive model
by considering these components, relevant requirements, and factors associated with each
component.
This book concentrates more on the applications of predictive methods and less on the math-
ematical and statistical details that can be found in previous studies (e.g., Goovaerts 1997;
Webster and Oliver 2001; Venables and Ripley 2002; Wackernagel 2003; Bivand, Pebesma,
and Gomez-Rubio 2013; van Lieshout 2019). Since this book is specifically focusing on SPM
with R, for one interested in machine learning and spatio-temporal statistics with R other
Preface xv
relevant publications are available (e.g., Kuhn and Johnson 2013; James et al. 2017; Wikle,
Zammit-Mangion, and Cressie 2019).
This book covers the whole modeling process that is not only important for SPM, but also
provides valuable tools to other predictive modeling fields. It is expected to boost the ap-
plications of appropriate SPM processes, and improve the quality of spatial predictions for
various disciplines. It is also expected to enhance further research in this field, and antici-
pated further novel and performance-improved spatial predictive methods to be developed
and applied.
R versions
Two versions of R were used to run R code in this book.
R version 3.6.3 (2020-02-29) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under:
Windows 10 x64 (build 19041)
R version 4.0.3 (2020-10-10) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under:
Windows 10 x64 (build 19041)
R packages
An R package, spm (Li 2019b), initially released on CRAN in 2017, is used as one of the core
packages in this book. Three other core R packages, spm2 (Li 2021a), steprf (Li 2021c), and
stepgbm (Li 2021b), are developed for and released with this book, to cover the methods and
functions that are not currently included in the spm package.
1. The spm package
The spm package introduces some novel accurate hybrid methods of geostatistical and ma-
chine learning methods for SPM. It contains two commonly used geostatistical methods (i.e.,
OK and IDW ), two machine learning methods (RF and GBM ), four hybrid methods (i.e.,
RFIDW, RFOK, GBMIDW, and GBMOK) and two averaging methods (RFOKRFIDW,
and the average of GBMOK and GBMIDW (GBMOKGBMIDW )). For each method, two
functions are provided, with one function for assessing the predictive errors and accuracy
of the method based on cross-validation and the other for generating spatial predictions.
2. The spm2 package
The spm2 package is an extended version of spm, by further introducing some novel functions
for statistical methods (i.e., GLM, glmnet, GLS), thin plate splines, SVM, kriging methods
(i.e., simple kriging, universal kriging, block kriging, kriging with an external drift), and
228 hybrid methods plus numerous variants for SPM. For each method, two functions are
provided, with one function for assessing the predictive errors and accuracy of the method
based on cross-validation and the other for generating spatial predictions if needed. It also
contains a couple of functions for data preparation and predictive accuracy assessment.
3. The steprf package
The steprf package introduces several novel variable selection methods for RF. They are
based on averaged variable importance (AVI ), and knowledge informed AVI (KIAVI and
KIAVI2) methods.
4. The stepgbm package
The stepgbm package introduces a couple of novel variable selection methods for GBM. They
are based on relative variable influence (RVI ) and knowledge informed RVI (KIRVI and
KIRVI2) methods.
Preface xvii
Data sets
All data sets used in this book are available either in the spm, spm2, and sp (Pebesma and
Bivand 2020) packages or online as detailed in Appendix A.
References for R
Learning materials are available online for beginners, such as
(1) “An Introduction to R”;
(2) “R Language Definition”;
(3) “R Installation and Administration”;
(4) “R Data Import/Export”; and
(5) “Writing R Extensions”.
They can also be accessed from Manuals (in PDF) tab under Help tab in RGui.
Caveats
Although in this book I attempt to cover relevant components, which contribute to the
improvement of predictive accuracy, as comprehensively as possible, the SPM field is too
broad to allow that to be done completely in this book. This is because different disciplines
have their own specific features and requirements. Therefore, further work is needed to
identify factors in relevant components or additional components that can further improve
predictive accuracy in each discipline.
Furthermore, the data sets and software packages are provided in good faith, but none of
the author, publishers, and distributors warrant their accuracy and are responsible for the
consequences of their use.
xviii Preface
Contact details
The author may be contacted by email at
jldata2action@gmail.com
and would be grateful for being informed of errors and improvements to the contents of this
book. Errata and updates are available at https://github.com/jinli22.
Acknowledgment
This book would not be possible without:
(1) R, a free software environment for statistical computing and graphics (R Core Team
2020);
(2) RStudio, an integrated development environment (IDE) for R (RStudio Team 2020);
and
(3) R bookdown, a powerful tool for combining analysis and reporting into the same docu-
ment (Xie 2016).
I would like to acknowledge the contribution of many people to the conception and com-
pletion of this book. Over the years, many people have greatly influenced my career, but a
special recognition must be given to Bob Murison, University of New England, who, with
enthusiasm and patience, helped me in modern statistics and statistical computing in 1990s.
Tony Arthur and Steve Henry kindly helped to get the bee data set released from CSIRO for
this book. I am also grateful to Yanchang Zhao for helpful discussion and suggestion in the
early stage of the preparation of the book. I am indebted to Xiufu Zhang for proofreading
and critical comments on the preliminary draft. I would like to thank three reviewers for
valuable comments on the proposal of this book. I am greatly appreciative to Rob Calver
at Chapman & Hall/CRC for seeking a book idea back in 2015 and his team for continuous
support, patience, and help at each phase of the preparation of this book. Finally, I would
like to thank my family and my parents for their love and support.
Jin Li
July , 2021
Author Bio
xix
1
Data acquisition, data quality control, and spatial
reference systems
This chapter introduces relevant sampling designs for spatial data, factors to be considered
for data quality control (QC), and spatial data types and spatial reference systems to be
used for spatial predictive modeling.
library (raster)
pb.df <- as.data.frame(raster("./data/bathy_1km_Petrel.tif"), xy = TRUE , na.rm =
TRUE) # petrel bathymetry raster data imported as data - frame
names(pb.df) <- c("long", "lat", "bathy")
The data sets, pps.df and pb.df, are in dataframe format. They need to be converted to
spatial objects as below according to Bivand, Pebesma, and Gomez-Rubio (2013) so that
they can be visualized.
DOI: 10.1201/9781003091776-1 1
2 1 Data acquisition, data quality control, and spatial reference systems
library (sp)
pb <- pb.df
gridded(pb) = ~long+lat # grid spatial data into SpatialPixelsDataFrame
Spatial distribution of sediment samples in the Petrel area is illustrated in Figure 1.1 by
par(font.axis = 2, font.lab = 2)
lab. palette <- colorRampPalette(c("dark blue", "blue", "light blue"), space = "Lab
")
image(pb , axes = T, xlab = "Longitude", ylab = "Latitude", col = lab. palette (255))
plot(pps , add = TRUE , pch = 1, col = "red", cex = 0.5)
It is apparent that the samples are non-randomly distributed and even clustered.
FIGURE 1.1: Samples selected (red circle) by ad-hoc sampling designs; and the back-
ground shows the spatial patterns of bathymetry in the Petrel area.
For sampling point locations in an area using spsample, the following arguments need to be
specified:
(1) x, a spatial object;
(2) n, sample size; and
(3) type, a character to specify a sample method (e.g., “random”, “regular”).
We apply spsample to the Petrel area and produce 100 samples as below.
systsps <- spsample(pb , n = 100, "regular")
The samples selected, systsps, are illustrated in Figure 1.2. As expected for systematic
sampling design, the samples are evenly distributed over space.
FIGURE 1.2: Samples selected (red circle) by systematic sampling design; and the back-
ground shows the spatial patterns of bathymetry in the Petrel area.
For spatial predictive modeling, non-random sampling methods are not recommended. How-
ever, non-random sampling designs were compared previously (Diggle and Ribeiro Jr. 2010),
which provides some useful clues for lattice plus close pairs sampling for spatial predictive
modeling.
We apply spsample to pb and produce 100 samples using unstratified equal probability design
as an example below. To make the results reproducible, we need to use function set.seed
that takes an (arbitrary) integer argument.
set.seed (1234)
unstran <- spsample(pb , n = 100, "random")
FIGURE 1.3: Samples selected (red circle) by unstratified random sampling design, and
the background shows the spatial patterns of bathymetry in the Petrel area.
The unstratified random sampling is not recommended to collect data for spatial predictive
modeling, so for the unstratified unequal probability design, no example will be provided.
This is because (1) spatial information is available for sampling design, which would lead
to spatially stratified sampling as discussed below in Section 1.1.3, and (2) unstratified
random sampling design may even be overperformed by lattice sampling, a kind of non-
random design (Diggle and Ribeiro Jr. 2010).
ing a survey for a region. Some recently developed randomized spatial sampling procedures
were reviewed and compared using simple random sampling without replacement as a bench-
mark for comparison, and the guidance has been also provided for choosing an appropriate
spatial sampling method (Benedetti, Piersimoni, and Postigione 2017). Furthermore, some
applications of stratified random sampling with spatial information (i.e., spatially stratified
random sampling) and some R packages for stratified random sampling have been reviewed
for spatial predictive modeling (Li 2019a).
Spatially stratified random sampling design can be generated in several ways in R (Li 2019a)
using additional information. Three examples using different functions are provided below.
1. Function spsample
FIGURE 1.4: Samples selected (red circle) by spatially stratified random sampling design
using the function spsample in the sp package, and the background shows the spatial patterns
of bathymetry in the Petrel area.
The second example applies the functions stratify and spsample in the spcosa package
(Walvoort, Brus, and de Gruijter 2020) to pb.
6 1 Data acquisition, data quality control, and spatial reference systems
The function stratify is to partition a spatial object into compact strata by means of k-
means (Walvoort, Brus, and de Gruijter 2020). The descriptions of relevant arguments of
stratify are detailed in its help file, which can be accessed by ?stratify.
For sampling point locations in an area using stratify, the following arguments need to be
specified:
(1) x, a spatial object;
(2) nStrata, number of strata; and
(3) equalArea, if FALSE the algorithm results in compact strata, and if TRUE the algorithm
results in compact strata of equal size.
We apply stratify to pb and produce 20 strata as below.
library (spcosa)
The arguments of spsample in spcosa are the same as the arguments of spsample in sp. We
apply spsample to strata20 and select five samples from each stratum as below.
# select 100 samples (i.e., 5 samples from each stratum )
set.seed (1234)
sp20 <- spsample(strata20 , n = 5)
sp20 .100 <- as(sp20 , 'data.frame ')
3. Function clhs
The last example uses the function clhs in the clhs package (Roudier 2011, 2020). The
function clhs is to implement the conditioned Latin hypercube sampling (Roudier 2011).
The descriptions of relevant arguments of clhs are detailed in its help file, which can be
accessed by ?clhs.
For sampling point locations in an area using clhs, the following arguments need to be
specified:
(1) x, a data.frame or a spatial object; and
(2) szie, sample size.
We apply clhs to pb.df, a data.frame. This sampling design uses both location information
and bathymetry data to spatially stratify the sampling area and randomly select 100 samples
as below.
library (clhs)
set.seed (1234)
sample100 <- clhs(pb.df , size = 100)
sample100.sp <- pb.df[sample100 ,]
sp100 <- sample100.sp[,-3]
FIGURE 1.5: Samples selected (gray dot) by spatially stratified random sampling using
the functions stratify and spsample in the spcosa package, where the region to be sampled
is divided into 20 equal areas (green).
FIGURE 1.6: Samples selected (red circle) by spatially stratified random sampling design
using the function clhs; and the background shows the spatial patterns of bathymetry in
the Petrel area.
set.seed( 1234)
samp <- quasiSamp(n = n, dimension = 2, study.area = NULL , potential.sites = X,
inclusion.probs = altInclProbs) # generate the design according to the altered
inclusion probabilities .
Since this sampling method uses the spatial extent of survey area (i.e., potential sites)
instead of its actual spatial domain, some samples may locate outside the domain. If this
occurs, sample size (i.e., the number of samples to be selected) may need to be increased
to ensure sufficient samples fall within the domain.
We can visualize the adjusted inclusion probabilities (white to blue areas to show the inclu-
sion probability changing from low to high), legacy sites (red circle) and sample locations
as in Figure 1.7 by
X1 <- X
X1$prob <- altInclProbs
gridded(X1) = ~long+lat
image(X1 , axes = T, xlab = "Longitude", ylab = "Latitude", col = hcl.colors (10, "
blues", rev = TRUE)) # Adjusted Inclusion Probabilities
1.2 Data quality control 9
FIGURE 1.7: Samples selected (red diamond) by spatially stratified random sampling
with prior information (i.e., legacy sites, red circle), with the adjusted inclusion probabilities
increasing from the legacy sites (white to blue areas show the inclusion probability changing
from low to high).
still affected by many factors, such as data credibility, data accuracy, and completeness (Li,
Potter, Huang, Daniell, et al. 2010; Li, Potter, Huang, and Heap 2012b). Data QC process
is still required for the sediment samples.
We will use sediment samples in the Petrel area as an example, and the samples will be
extracted from MARS database as below.
sed1 <- read.csv("./data/MARS_Grain_size_as_mud_sand_gravel_mean_texture_20200524_
092819. csv")
# names (sed1)
dim(sed1)
[1] 385 20
The data structure and first three rows of sediment samples are
str(sed1 , digits.d = 6, width = 65, strict.width = "cut")
head(sed1 , 3)
The samples in sed1 can be quality controlled based on the data QC criteria developed
previously (Li, Potter, Huang, Daniell, et al. 2010; Li, Potter, Huang, and Heap 2012b). In
total, six data QC criteria are considered below to demonstrate how to conduct data QC.
library (spm2)
sed1$lat.digit <- decimaldigit(sed1$lat)
sed1$long.digit <- decimaldigit(sed1$long)
Then we remove samples with two or less digits after the decimal point in either longitude
or latitude.
sed1 <- subset(sed1 , sed1$lat.digit >= 3 | sed1$long.digit >= 3)
sed1 <- sed1[, -c(11:12)]
dim(sed1)
[1] 374 10
This shows that sample size has been reduced from 385 to 374, that is, 11 samples have
been removed.
Sample.type Frequency
1 CORE BOX 5
2 CORE GRAVITY 130
3 DREDGE PIPE 45
4 DREDGE UNSPECIFIED 92
5 GRAB SHIPEK 31
6 GRAB SMITH MCINTYRE 68
7 GRAB UNSPECIFIED 3
Among these seven sampling methods, dredge methods (i.e., dredge pipe and dredge un-
specified) are assumed to be unreliable, and samples collected using the methods are less
accurate. So, we need to remove these dredged samples.
sed1 <- subset(sed1 , sed1$ sample.type != "DREDGE PIPE" & sed1$ sample.type != "
DREDGE UNSPECIFIED")
dim(sed1)
[1] 237 10
It is clear that the sample size has been reduced from 374 to 237, that is, in total, 137
samples were collected using dredge methods and have been removed.
FALSE TRUE
230 7
12 1 Data acquisition, data quality control, and spatial reference systems
It shows that there are seven locations with duplicated samples. Since the data set is sorted
using mud variable in descending order, it ensures the samples with less mud content be
labeled as duplicated. These duplicated samples need to be removed, and the duplicated
samples with less mud content are removed below.
sed1$best_loc <- duplicated (sed1$location.id)
sed1 <- subset(sed1 , best_loc == FALSE)
sed1 <- sed1[, -c(11:12)]
dim(sed1)
[1] 230 10
This shows that sample size has been reduced from 237 to 230, that is, seven duplicated
samples have been removed.
[1] 125 10
This shows that sample size has been reduced from 230 to 125, that is, 105 samples with a
base depth more than 5 cm have been removed.
sed1 <- subset(sed1 , mud >= 0 & sand >= 0 & gravel >= 0)
dim(sed1)
[1] 111 10
It shows that sample size has been reduced from 125 to 111, that is, 14 samples are with
NA and have been removed.
In fact, with a further inspection, the sums of sediment for these 14 samples are all 100% if
the NAs were replaced with 0s. This suggests that 0s were ignored for these samples in the
MARS database, resulting in these missing values. If one wishes, it is pretty reasonable to
include these 14 samples by simply replacing the NAs with 0s.
1.3 Data quality control 13
[1] 99 102
99 100 102
1 109 1
This shows that: (1) the sum of three sediment types are 100% for 109 samples, and (2) for
the remaining two samples their sum is not 100%, but close to 100%. The differences could
result from rounding errors in their recordings, so no further samples are removed.
2. Data limits
For marine samples, bathymetry data are stored as negative, while for terrestrial samples
elevation data usually are positive. Samples with positive bathymetry are assumed to be
incorrect and need to be removed.
sed1b <- subset(sed1 , bathy <= 0)
dim(sed1b)
[1] 0 10
This suggests that no sample is selected. This is because bathymetry data are actually miss-
ing for all these remaining samples, which can be shown from summary(sed1) or simply from
sed1$bathy. Since we can derive bathy data for these samples from bathy data for Australian
EEZ as shown in Appendix A, it would not prevent the selection of these samples.
sed1 .1 <- sed1[, -c(1:4)]
names(sed1 .1); dim(sed1 .1)
[1] 111 6
In total, there are 111 samples remaining after the data QC process. In comparison with
the pps.df data set that was generated back in 2012 (Li, Potter, Huang, Daniell, et al. 2010;
Li, Potter, Huang, and Heap 2012b), there are three fewer samples in this data set. This
could be due to sample removal resulted from data QC process for MARS database since
2012.
This example intends to provide some clues to how to QC a data set at hand. Sometimes,
data noises may be resulted from repeated measures, and certain rules may need to be
developed to clean such samples based on professional knowledge (e.g., Li et al. 2009).
Furthermore, exploratory analysis can be used to further detect abnormal samples as de-
tailed in the next chapter.
14 1 Data acquisition, data quality control, and spatial reference systems
data(petrel)
class(petrel)
[1] "data.frame"
head(petrel , 3)
petrel [1:2, ]
data(sponge)
class(sponge)
[1] "data.frame"
head(sponge , 3)
The samples are point data and stored together with their associated location information.
The location information in the petrel data set is longitude and latitude, while the location
information in the sponge data set is easting and northing.
2. Grid data of predictive variables with location information
We use two grid data sets, petrel.grid and sponge.grid, in the spm package as examples below.
data(petrel.grid)
class(petrel.grid)
1.3 Spatial data types and spatial reference systems 15
[1] "data.frame"
petrel.grid [1:2, ]
data(sponge.grid)
class(sponge.grid)
[1] "data.frame"
sponge.grid [1:2, ]
The examples show that location information in the petrel.grid data set is longitude and
latitude, while the location information in the sponge.grid data set is easting and northing.
The spatial reference system for the location information can be changed when required as
demonstrated below.
The spatial information, easting and northing, in the sponge data set is stored in utm zone
52 south. Since the sponge data set is in dataframe format, it needs to be converted to
SpatialPoints format with its associated coordinate reference system prior to reprojecting as
below (Bivand, Keitt, and Rowlingson 2019; Pebesma and Bivand 2020; Hijmans 2020).
library (sp)
library (raster)
library (rgdal)
Given that the data format needs to be changed from dataframe to SpatialPoints format as
below, it is a good idea to reassign the dataframe sponge to a different object spng that will
be reformatted. Thus the format of sponge will remain unchanged for future use.
16 1 Data acquisition, data quality control, and spatial reference systems
spng [1:2, ]
crs(spng)
CRS arguments:
+proj=utm +zone =52 +south +ellps=WGS84 +units=m +no_defs
Now the spng data set can be reprojected from easting and northing in utm zone 52 south
to longitude and latitude in WGS84 using spTransform as demonstrated below.
spng.wgs84 <- spTransform(spng , CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=
WGS84 +towgs84 =0,0,0"))
class(spng.wgs84)
crs(spng.wgs84)
# windows ()
# spplot (spng.wgs84 , " sponge ")
write.csv(spng.wgs84 , "./data/ spongelonglat.csv", row.names = FALSE)
Function st_transform
Alternatively, the spng data set can be reprojected from easting and northing in utm zone 52
south to longitude and latitude in WGS84 using st_transform as shown below.
library (sf)
spng2 <- st_as_sf(sponge , coords=c("easting", "northing"))
st_crs(spng2) <- 32752
# crs( spng2 )
spng.wgs84 .2 <- st_ transform (spng2 , st_crs (4326))
class(spng.wgs84 .2)
1.3 Spatial data types and spatial reference systems 17
crs(spng.wgs84 .2)
For spatial predictive modeling, coordinate information may be used as predictive vari-
able(s). Thus spng.wgs84.2 needs to be converted into a dataframe format as follows.
spng.wgs84 .3 <- cbind ((st_ coordinates(spng.wgs84 .2)), st_set_geometry(spng.wgs84
.2, NULL))
class(spng.wgs84 .3)
[1] "data.frame"
spng.wgs84 .3[1:2 , ]
pg[1:2, ]
Now the pg data set can be reprojected from longitude and latitude in WGS84 to easting
and northing in utm zone 52 south as below.
pg.utm52s <- spTransform(pg , CRS("+proj=utm +zone =52 +south +units=m +no_defs +
ellps=WGS84 +towgs84 =0,0,0"))
class(pg.utm52s); crs(pg.utm52s)
CRS arguments:
+proj=utm +zone =52 +south +ellps=WGS84 +units=m +no_defs
Similar to the reprojection of point data spng above, st_transform can be used to reproject
the grid data set from WGS84 to utm zone 52 south.
3. Selection of spatial reference systems
Spatial reference system used to project spatial information is often assumed to have certain
effects on the performance of predictive models, thus in practice various spatial reference
systems have been developed to minimize such effects (Jiang and Li 2014). For spatial
predictive modeling, a spatial reference system that can minimize distortion in distance over
space is ideal. When a study area is relatively small and located within one utm zone, spatial
data are often projected using the utm zone or an appropriate projection system. When the
study area is spanning over two or more utm zones, the existing geographic coordinate
system (i.e., WGS84) could be used. This is because the effects of spatial reference systems
on the predictive accuracy of spatial predictive methods (i.e., IDW and OK) could be
negligible for areas at various latitudinal locations (up to 70 decimal degrees) and spatial
scales (Jiang and Li 2013, 2014; Turner, Li, and Jiang 2017). Hence, without reprojecting the
spatial data (most likely it is in WGS84), the spatial data can be used for spatial predictive
modeling for areas with latitude less than 70 decimal degrees.
Furthermore, since the predictive accuracy is the key for predictive modeling, an optimal
spatial reference system should be selected to maximize predictive accuracy. The spatial
reference system that can minimize the distortion in distance should be identified and used.
The selection of an optimal spatial reference system can also be determined based on its
effect on predictive accuracy for relevant predictive method . This can be achieved with
cross-validation function that is to be introduced later for relevant predictive methods in
this book.
2
Predictive variables and exploratory analysis
This chapter introduces data preparation of predictive variables and exploratory analysis for
predictive relevant methods, including (1) principles for pre-selection of predictive variables
and limitations, (2) predictive variables, and (3) role and limitations of exploratory analysis
in variable pre-selection.
DOI: 10.1201/9781003091776-2 19
20 2 Predictive variables and exploratory analysis
(i.e., surrogate variables) are used instead for spatial predictive modeling. Proxy variables
are usually variables directly caused by response variable and/or correlated variables. They
can be identified based on expert or professional knowledge (e.g., McArthur et al. 2010).
Certainly, predictive models can use causal variables, proxy variables, or both if causal
variables are not all available.
2.1.4 Limitations
For spatial predictive modeling, how to select potential predictive variables can be limited
or constrained by certain factors. For example, (1) they need to be continuously available
for the entire area to be predicted; (2) spatial resolution of various predictive variables
needs to meet desired resolution for the final predictions, although they can be re-scaled
or aggregated. Sometimes, even though we know certain possible predictive variables, they
may not meet these requirements and cannot be used for spatial predictive modeling. This
is particularly true in mountainous and deep sea areas for spatial predictive modeling in
the environmental sciences.
2. Climatic variables
Climatic variables may include temperature, precipitation, wind speed, and various derived
variables (e.g., humidity, seasonality).
3. Topographical variables
Topographical variables may include elevation, slope, aspect, distance-to-sea, and relevant
derived variables such as topographic position index.
4. Optical remote sensing data
Optical remote sensing data may include various reflectance bands and relevant derived
variables (e.g., NDVI, EVI).
5. Vegetation information
Vegetation information may include variables like vegetation types, abundance, and cover-
age.
6. Substrate data
Substrate data may include soil type, soil nutrients, organic matter, and soil moisture.
7. Disturbance information
Disturbance information may include grazing, fertilization, burning, and so on.
This is based on hbee1 data set from a previous study (Arthur et al. 2010, 2020).
Then some outliers identified could be excluded from further modeling and analysis.
Although identification of outliers is important for predictive modeling, the outliers identi-
fied may be caused by certain conditions (e.g., an optimal environmental condition) (Arthur
et al. 2010, 2020). Thus they could be false outliers as shown in Figure 2.2 (Li 2008; Arthur
et al. 2010) for model glmmpql1 after fitting a glmmPQL model below.
library (MASS)
glmmpql1 <- glmmPQL(hbee ~ inf + c500 + I(inf ^2) + I(links300 ^2) + inf:I(links300
^2), random = ~1| paddock/plot , data = hbee1 , family = quasi(var = "mu^2", link
= "log"), maxit = 1000)
It is apparent that one outlier identified in Figure 2.1 (i.e., the observation with the maximal
bee count number) could be no longer classified as an outlier in Figure 2.2.
2.3 Exploratory analysis 23
FIGURE 2.2: Observed honey bee count data vs. fitted values of an optimal model:
showing a false outlier (i.e., the sample with a count value > 60).
24 2 Predictive variables and exploratory analysis
Outliers may also change with predictive models developed, that is, a false outlier could
also be produced by a sub-optimal model, e.g, model glmmpql2 below.
glmmpql2 <- glmmPQL(hbee ~ inf + c500 + w2000 + w300 + I(inf ^2) + I(c500 ^2) + I(
w2000 ^2) + I(w300 ^2) + inf:w2000 + inf:w300 + c500:w2000 + c500:w300 + I(inf
^2):w2000 + I(inf ^2):w300 , data = hbee1 , random = ~ 1| paddock/plot , family =
quasi(link = log , var = mu^2), maxit = 1000)
The results are shown in Figure 2.3 (Li 2008; Arthur et al. 2010) by
par(font.axis = 2, font.lab = 2, las = 1)
plot(hbee , predict (glmmpql2 , type = "response"), xlab = "Observed values", ylab =
"Fitted values")
lines(hbee , hbee)
FIGURE 2.3: Observed honey bee count data vs. fitted values of a sub-optimal model:
showing an outlier (i.e., the observation with a count value > 60).
It is obvious that the observation with the maximal bee count number becomes an outlier.
Therefore, caution should be taken in dealing with outliers.
2. Homogeneity of variance
Variance of response variable (or depend variable) can be either homogeneity (homoscedas-
ticity) or heterogeneity (heteroscedasticity). For spatial predictive modeling using regression
methods (e.g., GLM), we need to consider the variance of response variable in relation with
its mean as shown in Figure 2.4 that is based on hbee1 data set by
mu <- with(hbee1 , tapply(hbee , list(paddock , obs), mean))
vars <- with(hbee1 , tapply(hbee , list(paddock , obs), var))
Apparently, the variance changes with sample mean and is heterogeneous. This can be fur-
ther examined based on the residuals, for details see (McCullagh and Nelder 1999; Crawley
2007).
Another random document with
no related content on Scribd:
commands Bayonne reserve, 595;
his torpor, 602, 713.
Villena, castle of, capitulates to Suchet, 287, 288.
Vittoria, battle of, 384-450;
retreat of the French from, 433-50.
[2] Certainly Carlos de España and Morillo, probably some of the Galicians, and
even some of Elio’s or Ballasteros’ troops from the South, if they proved able to
feed themselves and march.
[3] Dispatches, ix. p. 424, to General Dumouriez, to whom Wellington often sent
an illuminating note on the situation.
[4] Dispatches, ix. pp. 390-1. Alten had the 3rd, 4th, Light, and España’s
divisions.
[8] The best account of all this is in the diary for August of Tomkinson of the 16th
Light Dragoons, who was in charge of the outlying party that went to Valtanas.
[9] The actual numbers (as shown in the tables given in vol. v, Appendix xi—
which I owe to Mr. Fortescue’s kindness) were July 15, 49,636; August 1, 39,301.
The deficiency of about 600 cavalry lost had been more than replaced by
Chauvel’s 750 sabres. There was a shortage of twenty guns of the original artillery,
but Chauvel had brought up six.
[12] The 2nd, 4th, 6th, and 25th Léger, the 1st, 15th, 36th, 50th, 62nd, 65th,
118th, 119th, 120th Line had to cut themselves down by a battalion each: the 22nd
and 101st, which had been the heaviest sufferers of all, and had each lost their
eagle, were reduced from three to one battalion each. There had been seventy-
four battalions in the Army of Portugal on July 1st: on August 1st there were only
fifty-seven.
[13] See ‘Memorandum for General Santocildes’ of August 5. Dispatches, ix. pp.
344-5.
[15] The best account of all this is not (as might have been expected) in Foy’s
dispatches to Clausel, but in a memorandum drawn up by him in 1817 at the
request of Sir Howard Douglas, and printed in an appendix at the end of the life of
that officer (pp. 429-30). Sir Howard had asked Foy what he intended to do on the
23rd-27th August, and got a most interesting reply.
[16] Diary of Foy, in Girod de l’Ain’s Vie militaire du Général Foy, p. 182.
[19] See especially Sir Howard Douglas’s Memoirs, pp. 206-7, and Tomkinson’s
diary, p. 201. Napier is short and unsatisfactory at this point, and says wrongly that
Clausel abandoned Valladolid on the night of the 6th. His rearguard was certainly
there on the 7th.
[20] Castaños’s explanation was that Wellington’s letter of August 30, telling him
to march on Valladolid, did not reach him till the 7th September, along with another
supplementary letter to the same effect from Arevalo of September 3.
[21] ‘The proclamation was made from the town-hall in the square: few people of
any respectability attended.’ Tomkinson, p. 202.
[23] Wellington to Henry Wellesley, Magaz, September 12. Dispatches, ix. p. 422.
[25] Napier was not with the main army during this march, the Light Division being
left at Madrid. On the other hand Clausel had been very polite to him, and lent him
some of his orders and dispatches (Napier, iv. p. 327). I fancy he was repaid in
print for his courtesy. The diaries of Tomkinson, Burgoyne, D’Urban, and Sir
Howard Douglas do not give the impression that the French ever stayed to
manœuvre seriously, save on the 16th.
[28] One of the regiments withdrawn to the north after suffering at Arroyo dos
Molinos, see vol. iv. p. 603.
[33] There were eight rank and file of the Royal Military Artificers only, of whom
seven were hit during the siege, and five R.E. officers in all.
[35] This narrative of the assault, not very clearly worked out in Napier—is drawn
from the accounts of Burgoyne, Jones, the anonymous ‘Private Soldier of the
42nd’ [London, 1821], and Tomkinson, the latter the special friend and confidant of
Somers Cocks.
[37] For a dispute between the chief engineer, Burgoyne, who blamed the
Portuguese, and some officers in the Portuguese service who resented his words,
see Wellington, Supplementary Dispatches, xiv. p. 123.
[42] Indeed the besiegers had largely depended on a dépôt of French picks and
shovels found by chance in the town of Burgos, after the siege had begun.
[43] See especially Tomkinson, an old comrade of Cocks in the 16th Light
Dragoons, pp. 211-17.
[44] Wellington says 18 prisoners in his return. Dubreton claimed to have taken 2
officers and 36 men in his report. Possibly the difference was mortally wounded
men, who were captured but died.
[48] See Napier, iv. p. 412, who had the fact from Sir Edward Pakenham’s own
mouth.
[52] Alexander Dickson remarks in his diary, p. 772, ‘This was done to please
General Clinton, and had nothing to do with the attack.’ Clinton’s troops were
opposite this side of the Castle, and had as yet not been entrusted with any
important duty.
[54] For this dialogue, told at length, see Burgoyne’s Correspondence, ed.
Wrottesley, i. p. 235.
[55] So I make out from the returns, but Beamish’s and Schwertfeger’s Histories
of the K.G.L. both give the lesser figure of 75—still sufficiently high!
[59] By knocking off their remaining trunnions, which made them permanently
useless. Some of the captured French field-guns from the hornwork were also
destroyed.
[62] Burgoyne commanding, John Jones the historian, Captain Williams, and
Lieutenants Pitts and Reid.
[64] Ibid., i. p. 233. There is much more in this interesting page of Burgoyne’s
explanation of the failure, which I have not space to quote.
[66] Pringle was commanding the 5th Division (Leith being wounded); Bernewitz
the 7th (Hope having gone home sick on September 23): Campbell, in charge of
the 1st since Graham was invalided, was off duty himself for illness when relieved
by Paget. Bock commanded the Cavalry Division vice Stapleton Cotton, wounded
at Salamanca.
[68] Viz. (figures of the Imperial Muster Rolls for October 15) Army of Portugal
32,000 infantry, 3,400 cavalry; Army of the North: Chauvel’s cavalry brigade (lent
to the Army of Portugal since July) 700 sabres, Laferrière’s cavalry brigade 1,600
sabres, parts of Abbé’s and Dumoustier’s divisions 9,500 infantry. Allowing another
2,000 for artillery, sappers, &c., the total must have reached 53,000. Belmas says
that Caffarelli and Souham had only 41,000 men. Napier gives them 44,000. Both
these figures are far too low. No one denies that Caffarelli brought up about 10,000
men; and the Army of Portugal, by the return of October 15, had 45,000 effectives,
from whom there are only to be deducted the men of the artillery park and the
‘équipages militaires.’ It must have taken forward 40,000 of all arms. See tables of
October 15 in Appendix II.
[69] Wellington on the 11,000 Galicians, Hill on Carlos de España (4,000 men),
Penne Villemur and Murillo (3,500 men), and the Murcian remnants under Freire
and Elio, which got separated from the Alicante section of their Army and came
under Hill’s charge, about 5,000.
[70] i. e. if they brought up Suchet’s troops from Valencia, beside their own
armies.
[75] See Wellington to Hill of October 14, and Wellington to Popham of October
17. Ibid., pp. 490 and 495.
[78] Wellington to Hill, October 5. Dispatches, ix. p. 469: ‘I do not write to General
Ballasteros, because I do not know exactly where he is: but I believe he is at
Alcaraz. At least I understand he was ordered there [by the Regency]. Tell him to
hang upon the left flank and rear of the enemy, if they move by Albacete toward
the Tagus.’
[81] Wellington says in his Dispatch to Lord Bathurst of October 26 that the
Brunswick officer disobeyed orders, and was taken because he did not retire at
once, as directed.
[83] This is the figure given by Colonel Béchaud in his interesting narrative of the
doings of Maucune’s division (Études Napoléoniennes, ii. p. 396). Martinien’s lists
show 3 casualties of officers only, all in the 86th of Maucune’s division.
[86] So Colonel Béchaud’s narrative, quoted above, and most valuable for all this
retreat.
[87] These figures look very large—and exceed Napier’s estimate of 5,000
sabres. But I can only give the strength of the French official returns, viz. Curto’s
division 2,163, Boyer’s division 1,373, Merlin’s brigade 746, Laferrière’s brigade
1,662; total 5,944. All these units were engaged that day, as the French narrative
shows, except that 4 only of the 6 squadrons of gendarmerie in Laferrière’s
brigade were at the front.
[88] Owing to losses at Garcia Hernandez and Majadahonda the Germans were
only 4 squadrons, under 450 effective sabres. The Light Dragoons of Anson, all
three regiments down to 2-squadron strength, made up about 800.
[90] Who was not himself any longer at their head, having been killed in a private
quarrel some weeks before. His men were this day under his lieutenant Puente
(Schepeler, p. 680).
[91] To Caffarelli’s high disgust: see his dispatch to Clarke of October 30, where
he calls Boyer’s action a ‘fatalité que l’on ne peut conçevoir.’
[92] As Lumley did at Usagre against L’Allemand, see vol. iv. p. 412.
[93] Anson’s brigade fought, it is said, with only 600 sabres out of its original 800,
owing to heavy losses in the morning, and to the dropping behind of many men on
exhausted horses, who did not get up in time to form for the charge. Bock’s
brigade was intact, but only 400 strong. Of the French brigade 1,600 strong on
October 15 by its ‘morning state’ two squadrons out of the six of gendarmes were
not present, so that the total was probably 1,250 or so engaged.
[94] Most of this detail is from the admirable account of von Hodenberg, aide-de-
camp to Bock, whose letter I printed in Blackwood for 1913. There is a good
narrative also in Martin’s Gendarmerie d’Espagne, pp. 317-19.
[95] In a conversation with Foy (see life of the latter, by Girod de l’Ain, p. 141)
when he said that all the cavalry generals of the Army of Portugal except
Montbrun, Fournier, and Lamotte were ‘mauvais ou médiocres’—these others
being Curto, Boyer, Cavrois, Lorcet, and Carrié.
[96] Details are worth giving. The 2nd Dragoons K.G.L. had 52 casualties, the 1st
44. In Anson’s brigade the 11th Light Dragoons lost 49, the 12th only 20, the 16th
47. The officers taken prisoners were Colonel Pelly and Lieutenant Baker of the
16th, Major Fischer (mortally wounded) of the 1st Dragoons K.G.L., and Captain
Lenthe and Lieutenant Schaeffer of the 2nd Dragoons K.G.L. The two infantry
battalions had 18 casualties, of whom 13 were men missing, apparently
skirmishers cut off in the fight earlier in the day on the Hormaza, or footsore men
who had fallen behind.
[98] Napier, iv. p. 361. Corroboration may be had on p. 120 of the Journal of
Green of the 68th, who says that his colonel was much puzzled to know how so
many men had succeeded in getting liquor, and that one soldier was drowned in a
vat, overcome by the fumes of new wine.
[99] This was Bonnet’s old division: Chauvel had been commanding it since
Bonnet was disabled at Salamanca. But he had been wounded by a chance shot
at Venta del Pozo on the 23rd, and Gauthier, his senior brigadier, had taken it over.
[100] Some 27 men of the 3/1st, taken prisoners here, represent this party in the
casualty list of October 25. The battalion was not otherwise seriously engaged.
[101] Who were drawn from the 4th, 30th, and 44th.
[102] For a romantic story of how one was discovered see Napier, iv. p. 363, a
tale which I have not found corroborated in any other authority.
[103] I had not been able to make out how the 1/9th came to lose these prisoners
till I came on the whole story in the Autobiography of Hale of the 1/9th, printed at
Cirencester 1826, a rare little book, with a good account of this combat. He is my
best source for it on the British side.
[105] I have been using for the French side mainly the elaborate and interesting
narrative of Colonel Béchaud of the 66th, recently published in Études
Napoléoniennes, ii. pp. 405-11.
[108] There is a full account of this business in Foy’s dispatch to Souham of the
next morning, in which occur all the facts given by Guingret in his own little book.
That officer’s narrative must be taken as fully correct.
[109] All this from Burgoyne, i. p. 244. Napier does not mention the earthworks,
which were batteries for six guns each.
[110] There he wrote his dispatch, concerning the late combats, to Clarke. Napier
never mentions Caffarelli’s departure—a curious omission.
[111] p. 437.
[113] Deprez, travelling with great speed, reached Paris and interviewed Clarke
on September 21. The Minister, who was no friend of Soult’s, told him that neither
he himself nor the King could dare to depose the Marshal without the Emperor’s
permission. Deprez then posted on to Moscow, and overtook the Emperor there on
October 18. Napoleon in his reply practically ignored the quarrel, contented himself
with administering a general scolding to all parties, and directed them to ‘unite, and
diminish as far as possible the evils that a bad system had caused.’ But who had
inaugurated the system? He himself!
[116] Which were the 27th Chasseurs and 7th Polish Lancers.
[117] For details see Table of the Army of Spain of the date October 15th, in
Appendix II to this volume.
[120] Napier and Jourdan say that Cearra was killed; but he only suffered
concussion of the brain, and survived to tell Schepeler (p. 688) how his sword and
its sheath were melted into one rod of metal by the lightning which ran down the
side of the couch on which he was lying at the moment.
[123] This is clearly stated in Wellington’s note to Hill of October 10. Dispatches,
ix. p. 481.
[125] When Penne Villemur moved in, and went behind the Tagus, I cannot make
out exactly. But it was before October 25th, as at that time Erskine’s British cavalry
had no longer any screen in front of them.
[126] All these dispositions come from a table of routes sent to D’Urban by
Jackson, Hill’s chief of the staff (Quartermaster-general), on the 24th.
[127] Jackson, Q.M.G., to D’Urban, 27th night: ‘Sir Rowland has determined to
concentrate behind the Jarama, on account of the state of the fords upon the
Tagus, and their number,’ &c.
[128] They had been at Arganda behind the Tajuna on the previous day, when Hill
was still thinking of defending the line of the Tagus. See Diary of Leach, p. 287.
[129] Jackson to D’Urban, October 27: ‘Keep your patrols on the Tagus as long
as they can with prudence stay there, with orders to follow the march of your main
body.’ On the next day the order is varied to that quoted above.
[131] Wellington to Hill, Cabezon, October 27. Dispatches, ix. pp. 518-19.
[132] See for this Wachholz (of the Fusilier brigade), Schepeler, Purdon’s history
of the 47th, &c. Wachholz’s Brunswick Company straggled so that of 60 men he
found only 7 with him at night. Several were lost for good. Wellington put the
colonel of the 82nd under arrest, because he had lost 80 men this day.
[134] Always a reckless falsifier of his own losses (he said that he had only lost
2,800 men at Albuera!), Soult wrote in his dispatch that he had only about 25
wounded at the Puente Larga. The figure I give above is that of the staff-officer
d’Espinchel, whose memoirs are useful for this campaign. By far the best English
account is that of H. Bunbury of the 20th Portuguese (Reminiscences of a Veteran,
i. pp. 158-63). I can only trace three of the five French officers in Martinien’s lists—
Pillioud, Caulet, Fitz-James, but do not doubt d’Espinchel’s figures though his
account of the combat is hard to fit in with any English version. He speaks with
admiration of the steadiness of the defence.
[135] All this from Soult’s dispatch to the King of October 31, from Valdemoro, and
Jourdan’s to Clarke from Madrid of November 3rd.
[136] See Diary of Swabey, R.A., p. 428, in Journal of the Artillery Institution, vol.
xxii.
[138] Napier (iv. p. 373) says that Joseph left a garrison and his impedimenta in
Madrid—I can find no trace of it in the contemporary accounts, e. g. of Romanos
(Memorias de un Setenton) or of Harley who was about the town during the
second week of November. Vacani distinctly says that Joseph had to take on even
his sick (vi. p. 190). Cf. also Arteche, xi. pp. 309-12.
[139] Napier, iv. p. 373, says that Joseph went by the route of Segovia to Castile.
I cannot think where he picked up this extraordinary idea. Jourdan’s dispatch of
November 10 from Peñaranda gives all the facts. It was on the 5th, near
Villacastin, that Soult told Joseph that Hill was about to be joined by Wellington
and that the two might crush him. The King at once sent orders to Drouet to come
up by forced marches from Madrid. The Army of the Centre started next day.
Palombini did not get off till the 8th (Vacani, vi. p. 190), but the head of the column
reached Villacastin that same day.
[141] His first definite information as to this was from a Spaniard who on
November 4 saw 3,000 French infantry marching through Torquemada towards
Burgos (Dispatches, ix. p. 544). Even so late as November 8th he did not rely on
this important news as correct.
[143] They had really not the 50,000 on which Wellington speculated (’45,000
men I should consider rather below the number’ (Dispatches, ix. p. 544) ) but
60,000 or very nearly that number. But, on the day when Wellington was writing,
their rear had not even started from Madrid, and Soult’s 40,000 men were strung
out all along the road.
[144] As a matter of fact, using the best map of 1812 available to me (Nantiat’s), it
would seem that the line Rueda-Fuentesauco-Salamanca is about 50 miles, that
by Rueda-Nava del Rey-Pitiegua-Salamanca about 55 miles, while the route
suggested for the French, circuitous and running in more than one place by
country cross-paths, is over 65 miles long, not to speak of its being a worse route
for topographical reasons.
[146] The total which marched was 60,000, so Wellington was even more correct
than he supposed in his notion that 45,000 was too small a figure.
[147] Wellington to Lord Bathurst, Pitiegua, November 8. Dispatches, ix. pp. 544-
5.
[151] I cannot find the details of the marching orders of the divisions; but from
personal diaries I seem to deduce that the 5th and 7th Divisions marched by
Alaejos, the 1st and 6th by Castrejon and Vallesa, while the cavalry not only
provided a rearguard but kept out flank detachments as far as Cantalapiedra on
one side and the lower Guarena on the other.
[152] All this from the detailed routes of march in the dispatches of Jackson (Hill’s
Q.M.G.) to D’Urban on November 4-5-6.
[154] For an adventure with these rascals, who threatened to shoot one of Hill’s
aides-de-camp, see Schepeler, p. 691.
[155] The ground on which Del Parque had fought his unlucky battle in 1809.
[156] These movements from Jourdan to Clarke, of November 10, and Soult to
Clarke of November 12.
[157] See the Notes of the Baden officer Riegel (vol. iii. p. 537), who complains
bitterly of the piercing north wind, and the lack of wood to build fires.
[158] D’Espinchel (ii. p. 71) says that the voltigeurs got within the walls, but were
expelled on each occasion. The English narratives deny that they ever closed, or
reached the barricades.
[159] Soult to Joseph, 8 a.m. on the 11th, ‘bivouac sur la hauteur en arrière
d’Alba de Tormes.’
[161] All their names verifiable from Martinien’s admirable lists of ‘Officiers tués et
blessés.’
[163] Wellington to Hill, November 10, 4.30 p.m. (Dispatches, ix. p. 549).
[167] Soult to Joseph, night of November 11, from the bivouac behind Alba de
Tormes.
[168] As the table of the French Armies of Spain for October 15 in the Appendix
shows, the Army of the South had on that day 47,000 men under arms (omitting
‘sick’ and ‘detached’), the Army of the Centre 15,000, the Army of Portugal 45,000
(including Aussenac’s brigade and Merlin’s cavalry, both attached to it
provisionally). This gives a total of 107,000, without sick or detached. The Army of
Portugal may have lost 1,000 men in action at Villadrigo and Villa Muriel, &c.: the
Army of the South not more than 400 at the Puente Larga, Alba de Tormes, &c.
The Army of the Centre had not fought at all. A deduction has to be made for
Soult’s very large body of men attached to the Artillery Park, and for a smaller
number in the Army of Portugal—say 3,000 men for the two together. Souham had
left a small garrison at Valladolid—perhaps 1,500 men. If we allow 5,000 men for
sick and stragglers between October 20 and November 12 there must still have
been a good 90,000 men present. Miot (who was present) calls the total 97,000 (iii.
p. 254), making it a little too high, I imagine.
[169] Maucune and Gauthier (late Chauvel). See Wellington Dispatches, ix. p.
556.
[170] Souham naturally expressed his indignation. See Miot, iii. p. 252-3.
[172] This fact, very important in justification of Wellington’s long stay on San
Cristobal, is not mentioned in any of his dispatches. But there is a full account of
the skirmish in the Mémoire of Colonel Béchaud of Maucune’s Division, printed in
Études Napoléoniennes, iii. pp. 98-9.
[173] The reconnaissance was executed by Leith Hay, who found the French
flank at Galisancho and reported its exact position. See his Narrative, ii. pp. 99-
100.
[174] Details from Foy’s Vie militaire, ed. Girod de l’Ain, p. 118, and Béchaud
(quoted above), pp. 99-100.
[175] His original intention to attack is clearly stated in Dispatches, ix. p. 559, and
the statement is corroborated by D’Urban.
[176] D’Urban, an eye-witness, thinks that Wellington ought to have called up Hill
and attacked, despite of all difficulties. ‘Lord Wellington arrived upon the ground at
about 12 noon, and at first appeared inclined to attack what of the enemy had
already passed, with the divisions and cavalry on the spot. The success of such a
measure appeared certain, and would have frustrated all the enemy’s projects.
However, his opinion changed: they were allowed to continue passing
unmolested.’
[177] Mémoires, p. 448. Cf. also Napier, iv. p. 381, who seems to share the idea.
‘Why, it may be asked, did the English commander, having somewhat carelessly
suffered Soult to pass the Tormes and turn his position, wait so long on the
Arapiles position as to render a dangerous movement (retreat in face of the enemy
and to a flank) necessary?’