Bedrock Analysis
[2014] developed a simplified regolith model to estimate regolith thickness in areas with a high fraction of
outcrop, based on outcrops, slopes and the distance to outcrops in eight directions. Boer et al. [1996] showed
that the maximum likelihood classifier performed better in the shale and limestone areas than in the phyllite
area when predicting soil depth in dry Mediterranean areas. Shafique et al. [2009] predicted regolith
thickness from landform, elevation and distance to stream. Tesfa et al. [2009] applied generalized additive
models and random forest to predict soil depth from topographic and land cover attributes. Dahlke et al.
[2009] used class means of merged spatial explanatory variables to extrapolate the soil depth measured at
point locations. Wilford [2012] used airborne gamma-ray spectrometry and digital terrain analysis to derive a
weathering intensity index, which can be used to estimate the occurrence of outcrops.
Globally, there are several existing maps of depth to bedrock. One of the first estimates of the global
distribution of DTB (limited to the upper 2 m) was produced by FAO [1996]. Here global soil depth was mapped
using expert rules, based primarily on the soil unit's classification name, the soil phase and the slope
class. Miller and White [1998] derived the DTB for the United States based on STATSGO (State Soil Geographic
data). DTB in STATSGO2 is expressed as the shallowest depth of soil components that occupies less than
15% area of the map unit [USDA-NCSS, 2006]. Shangguan et al. [2013] estimated soil profile depth and seven
basic horizon thicknesses based on soil classes of China. Hengl et al. [2014] further tried using zero-inflated
models to estimate global DTB based on a global compilation of soil profiles. All of the previous examples
provide information about DTB within 2 m only. Wilford et al. [2016] produced a regolith depth map for the
whole of Australia at 3 arc-seconds resolution by using water well records and the R-Cubist package for model
fitting and prediction [Kuhn et al., 2014]. Recently, Pelletier et al. [2016] developed a global data set of the
average thicknesses of soil, intact regolith, and sedimentary deposits by representing upland areas by soil
data and lowland areas by water well data, using topography, climate, and geology data as input.
The above-mentioned global estimates of DTB are available at coarse resolutions only (1 km or coarser) and/or
are often of limited accuracy. In addition, soil, hydrological and geological exploration is often done in
isolated domains: predictions based only on soil data, i.e., soil maps [e.g., FAO, 1996; Miller and White, 1998;
Hengl et al., 2014] are often limited to the soil surface, with values limited to several meters. Likewise, maps
based on boreholes from geological explorations are only available for some states in the USA and small
regions, with values up to several hundred meters [see e.g., Richard et al., 2007; Illinois State Geological
Survey, 2004; Witzke et al., 2010]. Combining soil profiles and boreholes in producing DTB maps is necessary
to fill this gap and provide consistent estimates.
In this paper we describe a framework to estimate depth to bedrock at a spatial resolution of 250 m by
using state-of-the-art machine learning methods. As training points we use a compilation of publicly
available soil profiles and borehole logs. As covariates, we use an extensive list of remote sensing based
covariates including the most up-to-date lithologic map of the world, DEM-based hydrological and
morphological derivatives and MODIS land products. Our main objective is to use a statistical framework to
provide the best possible unbiased predictions of DTB. We develop this framework within the domain of
automated soil mapping as part of the SoilGrids system [Hengl et al., 2014], in which spatial predictions can
be gradually improved by adding new training data.
literature to describe the depth to bedrock. In contrast, the term ‘‘bedrock’’ is used relatively consistently in
geological literature [Illinois State Geological Survey, 2004; Missouri Geological Survey, 2013; Karlsson et al.,
2014; Jain, 2014], though differences also exist.
We consider ‘‘bedrock’’ [Jain, 2014]:
‘‘the consolidated solid rock underlying unconsolidated surface materials, such as soil or other
regolith,’’
which is considered to be equivalent to the definition of the R horizon (hard rock) in soil science
[Schoeneberger et al., 2011].
Correspondingly, we consider ‘‘Depth to Bedrock’’ (DTB) (Figure 1):
‘‘depth (in cm) from the ground surface to the contact with coherent (continuous) bedrock.’’
As such, DTB is a skewed variable with a lot of values grouped around 0 depth, while maximum values can
range up to a few thousand meters. Exposed bedrock or bedrock visible at the surface is referred to as
‘‘rock outcrop,’’ i.e., DTB = 0 [Jain, 2014].
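The definition above can be connected to the three mapped variables (absolute DTB, censored DTB within 0–200 cm, and occurrence of R horizon within 200 cm) with a small sketch. This is an illustration only, not the authors' code: the function and class names are hypothetical, and treating a depth of exactly 200 cm as "deep as or deeper than" (so not counted as an occurrence) is an assumption.

```python
from typing import NamedTuple

CENSOR_CM = 200.0  # upper limit of the censored DTB range used in the text


class DtbTargets(NamedTuple):
    absolute_cm: float  # depth to bedrock as observed, in cm
    censored_cm: float  # depth capped at CENSOR_CM
    r_horizon: int      # 1 if bedrock is reached above CENSOR_CM, else 0


def derive_targets(dtb_cm: float) -> DtbTargets:
    """Derive the three target variables from one absolute DTB value (cm)."""
    if dtb_cm < 0:
        raise ValueError("DTB must be non-negative")
    censored = min(dtb_cm, CENSOR_CM)
    # Assumption: exactly 200 cm counts as "deep as or deeper than".
    occurrence = 1 if dtb_cm < CENSOR_CM else 0
    return DtbTargets(dtb_cm, censored, occurrence)


def is_outcrop(dtb_cm: float) -> bool:
    """Rock outcrop corresponds to DTB = 0."""
    return dtb_cm == 0
```

For example, a borehole reaching bedrock at 350 cm would contribute 350 cm to the absolute DTB, 200 cm to the censored DTB, and 0 to the occurrence variable.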
Figure 2. Global distribution of depth to bedrock observations. (a) Red colors indicate soil profiles, (b) blue colors boreholes, and (c) the
yellow colors pseudo observations, i.e., points inserted using expert knowledge.
2.3.2. Boreholes
We use 1,574,776 points with borehole logs from: the United States (661,441), Canada (580,063), Australia
(5,943), Sweden (320,451), Ireland (4,250), Brazil (2,004), China (598) and Russia (26). The spatial distribution of
boreholes is shown in Figure 2. Many states in the US established digital water well databases over the last
several decades. The databases include data from the Northern High Plains aquifer, South-Central Kansas, and 14
state databases, i.e., Alaska, Indiana, Iowa, Kentucky, Maine, Minnesota, Missouri, Nevada, New Hampshire,
New York, Ohio, Pennsylvania, Tennessee and Vermont. The coordinates of the points from Alaska were
derived from the Public Land Survey System, with a geo-location error ranging from ±50 m to ±800 m (still
compatible with our target resolution of 250 m). For Canada, four provinces, i.e., British Columbia, Nova Scotia,
Prince Edward Island and Quebec, have a water well database. The list of water wells from the United States
and Canada is given in the supporting information. Boreholes of Russia are from Melnikov [1998].
For Australia, we derive DTB from the Australia National Groundwater Information System (ANGIS)
(http://www.bom.gov.au/water/groundwater/). Each well contains multiple layers of construction,
hydrostratigraphy and lithology logs, which can be used to determine the location of the bedrock [Wilford et al.,
2016]. Although the number of recorded points is >200,000, only 5,992 points from the total can be classified
as DTB measurements with high enough certainty to be further used for building global spatial prediction
models. The lookup tables used to convert original records in the ANGIS to values used for building
global spatial prediction models are available in the supporting information. For Brazil and China, DTB was
extracted from the lithology layer description by manual interpretation. The Brazil Groundwater Information
System (SIAGAS, http://siagasweb.cprm.gov.br) contains 273,972 water wells and the Chinese National
Database of Geological Drilling (http://zkinfo.cgsi.cn) contains 410,123 boreholes. Only a small fraction of
these contain lithological data that could be used as training points; these points are distributed quite
evenly across Brazil and China.
2.3.3. Pseudo or Expert-Based Observations
We use two approaches to generating pseudo-observations:
1. Based on the global mask maps of sand dune areas and steep bare surface areas (e.g., the Himalayas)
generated using remote sensing and a slope map of the world, and
2. Based on detailed geological maps reporting rock outcrops.
We generated the global mask maps of sand dune areas and steep bare surface areas using the global
MODIS surface reflectance product (MCD43A4) and global DEM and slope maps based on the SRTM
DEM [Rabus et al., 2003], both derived at 500 m. After some visual inspection, we discovered that the
medium infrared band 7 from the MCD43A4 land product [Moody et al., 2005] can be used to detect
areas of high surface reflectance (sand dunes and bare rock). For the shifting sand areas we randomly
inserted 300 points (DTB = 150 m; average depth of the sand in the Sahara) and for the steep bare surface
areas 200 points (DTB = 0 m). Again, these points were carefully inserted only for the purpose of filling
the possible gaps in the data. The resulting global mask maps used to generate pseudo-observations
are shown in Figure 3.
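The random insertion of pseudo-observations inside a mask can be sketched as follows. This is a minimal illustration under stated assumptions, not the documented R code: the masks here are tiny hypothetical grids standing in for the global rasters, the function name is made up, and points are drawn uniformly without replacement from masked cells.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[int, int, float]  # (row, col, DTB in cm)


def sample_pseudo_points(mask: Sequence[Sequence[bool]], n_points: int,
                         dtb_cm: float, seed: int = 0) -> List[Point]:
    """Randomly pick up to n_points masked cells and assign them a fixed DTB."""
    candidates = [(r, c) for r, row in enumerate(mask)
                  for c, inside in enumerate(row) if inside]
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    chosen = rng.sample(candidates, min(n_points, len(candidates)))
    return [(r, c, dtb_cm) for r, c in chosen]


# Toy 4 x 5 masks standing in for the global mask maps (hypothetical data).
sand_mask = [[True, True, False, False, False],
             [True, True, False, False, False],
             [False, False, False, False, False],
             [False, False, False, False, False]]
steep_bare_mask = [[False, False, False, False, False],
                   [False, False, False, False, False],
                   [False, False, False, True, True],
                   [False, False, False, True, True]]

# DTB = 150 m (15,000 cm) for shifting sand, DTB = 0 for steep bare surfaces.
sand_points = sample_pseudo_points(sand_mask, 3, dtb_cm=15000.0)
rock_points = sample_pseudo_points(steep_bare_mask, 2, dtb_cm=0.0)
```

With the real masks, `n_points` would be 300 for the shifting sand areas and 200 for the steep bare surface areas, as stated in the text.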
In the second approach, we also generated a few hundred points by using a number of detailed regional
geological maps. Regions having exposed bedrock maps include New York State, Vermont, Alaska, Alberta,
Manitoba and Newfoundland and areas covered by the NRCan Groundwater Program (http://gin.gw-info.net/).
All steps used to generate pseudo-points have been documented via Github (R code).
Figure 3. Global mask maps of shifting sand areas (above) and steep bare surface areas. This map was derived using the medium infrared
band 7 from the MCD43A4 MODIS land product, and global DEM and slope images (based on the SRTM DEM). Projected in the original
MODIS sinusoidal projection system.
The land mask is visible from the final prediction maps shown in Figure 8. As covariates, we use 155 global
environmental layers (most of them available from http://worldgrids.org/), which include:
1. SRTM DEM-derived parameters such as topographic wetness index, Valley Bottom Flatness index, slope,
terrain curvatures, openness and surface ruggedness index,
2. Global lithological map [Hartmann and Moosdorf, 2012],
3. Global landform map [Sayre et al., 2014],
4. Global land cover GLC30 product [Chen et al., 2015],
5. Climatic surface based on WorldClim [Hijmans et al., 2005],
6. MODIS land products, including EVI images and surface reflectance bands,
7. Global Water Table Depth in meters based on Fan et al. [2013],
8. Global 1 km Gridded Thickness of Soil, Regolith, and Sedimentary Deposit Layers based on Pelletier et al.
[2016].
The complete list of covariates is given in the supporting information. Note that the map by Pelletier et al.
[2016] is generated by combining process-based models and empirical models, and is as such ideal for sta-
tistical calibration using actual point data. For this purpose we use the layer of average soil and
sedimentary-deposit thickness which shows only depths up to 50 m.
Figure 4. The spatial prediction framework used to fit models and predict DTB variables globally at 250 m resolution.
package). Both models are tree ensemble methods. The random forest model uses fully grown decision
trees (low bias, high variance) and reduces error by reducing variance [Breiman, 2001]. The Gradient
Boosting Tree uses shallow trees (high bias, low variance) and reduces error mainly by reducing bias, and
also to some extent by reducing variance by aggregating the output from many models [Chen and Guestrin, 2016].
For model validation we used 10-fold cross-validation and comparison with regional maps. Cross-validation
is used to limit the problem of overfitting and gives an insight into how the model will generalize to an
independent data set. For each of the three target variables we derive the coefficient of determination (R²,
or the amount of variation explained by the model), mean error (ME) and root mean square error (RMSE) to
evaluate the model performance. The amount of variation explained by the model is:

$$R^2 = 1 - \frac{SSE}{SSTO} = 1 - \frac{RMSE^2}{\sigma_z^2} = [1 - 100\%] \qquad (1)$$

where SSE is the sum of squares for residuals at cross-validation points (i.e., RMSE² × n), and SSTO is the
total sum of squares. A coefficient of determination close to 1 indicates a perfect model, i.e., 100% of
variation has been explained by the model. RMSE is then:
$$RMSE = \sqrt{\frac{1}{l} \times \sum_{i=1}^{l} \left[\hat{z}(S_i) - z(S_i)\right]^2} \qquad (2)$$
where l is the number of validation points. For the occurrence of R horizon within 0–200 cm expressed as
0–1 probability values (in essence a binomial variable) we also derive the area under the receiver operating
characteristic (ROC) curve, known as the AUC. Values of AUC close to 1 are highly satisfactory, while the val-
ues of AUC close to 0.5 can be considered fairly poor.
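The validation measures defined above can be computed with a few lines of standard-library Python. This is a sketch, not the code used for the paper: the function names are hypothetical, the sign convention for ME (prediction minus observation) is an assumption because the text does not state it, and the AUC is computed with the simple pairwise (rank-based) definition rather than by tracing the ROC curve.

```python
import math
from typing import Dict, Sequence


def validation_metrics(observed: Sequence[float],
                       predicted: Sequence[float]) -> Dict[str, float]:
    """Return ME, RMSE and R2 (1 - SSE/SSTO) at validation points."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    # Assumption: residual = prediction - observation.
    residuals = [p - o for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in residuals)
    mean_obs = sum(observed) / n
    ssto = sum((o - mean_obs) ** 2 for o in observed)
    if ssto == 0:
        raise ValueError("observations have zero variance")
    return {"ME": sum(residuals) / n,
            "RMSE": math.sqrt(sse / n),
            "R2": 1.0 - sse / ssto}


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Because SSE = RMSE² × n and SSTO = σ² × n, the returned `R2` is the same quantity as 1 − RMSE²/σ² in equation (1), with σ² taken as the population variance of the observations.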
To evaluate the extrapolation risk, we used the following procedure (referred to as ‘‘cross-validation by
region’’). First, all samples were partitioned into subsets by region. Then, the spatial prediction model was
calibrated using the subset of one region (or several regions). Finally, this model was validated using the
other subsets (or other subset). At the continental scale, the spatial prediction model is calibrated using
data from one continent and then applied to the other two. The three continents are North America (United
States and Canada), Europe (Sweden and Ireland) and Australia. A similar procedure is applied to the provinces
of Canada and states of the US. For convenience, we call these spatial prediction models continental models
and state (province) models. The extrapolation risk is also evaluated by leaving one state out of the
calibration for the United States. For convenience, we refer to such a spatial prediction model as, for
example, the ‘‘without Ohio’’ model. All code used to generate predictions is available from the Github
channels (https://github.com/ISRICWorldSoil/SoilGrids250m).
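The ‘‘cross-validation by region’’ splits described above can be sketched as an index generator. This is an illustrative sketch only: the function name and the toy region labels are hypothetical, and the model fitting itself is left out.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple


def leave_one_region_out(
    regions: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held_out_region, calibration_idx, validation_idx) per region."""
    by_region: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, region in enumerate(regions):
        by_region[region].append(idx)
    for held_out, validation_idx in by_region.items():
        # Calibrate on every sample that is not in the held-out region.
        calibration_idx = [i for region, idxs in by_region.items()
                           if region != held_out for i in idxs]
        yield held_out, calibration_idx, validation_idx


# Toy region labels, one per observation (hypothetical data).
labels = ["Ohio", "Iowa", "Ohio", "Kentucky", "Iowa"]
splits = {name: (cal, val) for name, cal, val in leave_one_region_out(labels)}
```

Here `splits["Ohio"]` gives the indices for a ‘‘without Ohio’’ model. Swapping the two index lists gives the other design in the text, where a model is calibrated on a single continent, state or province and validated on the remaining ones.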
3. Results
3.1. Summary Statistics
The statistics of the absolute DTB and the censored DTB are given in Table 1. Figure 5 shows the histograms
of the absolute DTB and the censored DTB. After logarithmic transformation, the absolute DTB had a distribution
similar to the normal distribution, but with many zero values (i.e., outcrops). The frequency of values larger
than 1 m from soil profiles decreased as DTB increased. Many borehole values were around 0.5 m, 1 m, 1.5 m,
2 m, etc., as well as at integer multiples of one foot (i.e., 30.48 cm). This is because DTB is usually
recorded in feet or (half-)meters in borehole logs.

Table 1. Statistics of the Depth to Bedrock (DTB) in Centimeters (a)

Variable       Continent       Minimum   Mean      Median   Maximum   Number
Absolute DTB   Africa          2         1,337.3   125      15,000    3,281
               Asia            0         1,057.9   15       65,379    2,070
               Oceania         0         3,335.9   2,250    66,900    6,251
               Europe          0         690.5     400      22,000    281,563
               North America   0         1,487.4   850      312,541   1,227,393
               South America   0         1,595.1   500      37,000    2,433
               World           0         1,309.3   670      312,541   1,590,464
Censored DTB   Africa          2         110.01    110      195       2,636
               Asia            0         25.4      10       197       1,543
               Oceania         0         61.63     55       198       805
               Europe          0         87.73     100      198       78,491
               North America   0         105.88    120      199       192,214
               South America   0         29.25     10       190       892
               World           0         97.51     100      199       307,936

(a) There are 1,379,502 observations with a value equal to or larger than 200 cm; these are excluded when
calculating the statistics of censored DTB.

Figure 5. Histograms of (a, b) absolute depth to bedrock (DTB) and (c, d) censored DTB. For absolute DTB, values equal to or larger than
8800 cm are not shown. For censored DTB, values equal to or larger than 200 cm are not shown. The numbers of observations are 1,590,464,
13,416 and 293,095 for (a, b) absolute DTB, (c) censored DTB from soils, and (d) censored DTB from wells, respectively.
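The clustering of borehole depths at whole feet noted above can be checked with a short sketch. The helper names and the 0.5 cm tolerance are assumptions chosen for illustration; they are not taken from the paper.

```python
from typing import Iterable

FOOT_CM = 30.48  # one foot in centimeters, as given in the text


def feet_to_cm(depth_ft: float) -> float:
    """Convert a depth logged in feet to centimeters."""
    return depth_ft * FOOT_CM


def is_whole_foot(depth_cm: float, tol_cm: float = 0.5) -> bool:
    """True if depth_cm lies within tol_cm of an integer multiple of a foot."""
    nearest = round(depth_cm / FOOT_CM) * FOOT_CM
    return abs(depth_cm - nearest) <= tol_cm


def share_on_foot_grid(depths_cm: Iterable[float]) -> float:
    """Fraction of depths that look like they were logged in whole feet."""
    depths = list(depths_cm)
    if not depths:
        raise ValueError("no depths given")
    return sum(is_whole_foot(d) for d in depths) / len(depths)
```

A high share for a given well database would suggest that its logs were recorded in feet, which explains spikes in the histogram at multiples of 30.48 cm.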
Table 2. Summary Statistics and Mapping Performance for Depth to Bedrock (DTB) (a)

Variable                    Type           Units   Range       Model Fit   Model Fit   Amount of Variation   ME       RMSE
                                                               RF (R²)     GB (R²)     Explained
Absolute DTB                log-normal     cm      0–312,500   0.61        0.38        0.59                  −24.6    1,172
Censored DTB                zero-inflated  cm      0–200       0.35        0.25        0.34                  1.25     51
Occurrence (of R horizon)   binomial       prob.   0–1         0.35        0.23        0.34                  −0.006   0.34

(a) RF indicates random forest, GB indicates Gradient Boosting Tree. Amount of variation explained, mean error (ME) and root mean
square error (RMSE) were determined using 10-fold cross-validation.
Figure 6. Scaled importance of covariates with the resolution of 250 m for target variables by random forest model. NIR is near infrared radiation. MIR is middle infrared radiation. MRVBF
is Multiresolution Index of Valley Bottom Flatness. LST is land surface temperature. PWV is precipitable water vapor.
et al., 2010]. The Ohio map was produced using over 162,000 data points as control for the bedrock-
topography lines [Swinford, 2004]. Ground-moraine dominated areas have a shallow DTB, the ice-deposited
Wisconsinan-age ridge moraines generally have a medium DTB, and limited areas of deep DTB are largely
the result of deep bedrock valleys filled with drift.
Figure 7. Plot showing cross-validation results for absolute depth to bedrock on the logarithmic scale. R-square is calculated using formula
in equation (1).
The correlation coefficients between our predictions and the regional maps are 0.82 and 0.6 for Iowa and
Ohio, respectively. Although the regional maps of DTB cannot be considered a ground truth, they can
nevertheless be considered several times more accurate than our global predictions. For both areas,
there is an underestimation according to the mean error (−422 for Iowa and −528 for Ohio). Although the
differences in Figures 9 and 10 indicate some underestimation of higher values, especially in
the case of Ohio, this comparison also shows that the general patterns of the regional maps and our
predictions match in most cases. In Iowa, the bedrock surface is buried by unconsolidated surficial sediments
(mostly Quaternary) over most of its extent. Shallow DTB was found in the southwest and northwest of Iowa.
Most areas of Ohio are covered by sediments left by continental glaciers. In southwest Ohio, the
bedrock surface is very close to the land surface as this area was free from glaciation.
The correlation coefficients between the map of Pelletier et al. [2016] and the regional maps are 0.27 and
0.24 for Iowa and Ohio, respectively. For both areas, the spatial patterns of the two maps were quite different
(Figures 9 and 10). For Iowa, the deep DTB in the eastern part did not appear in the map of Pelletier et al.
[2016]. For Ohio, the frequency of medium values in the map of Pelletier et al. [2016] was very low compared
to the regional map, and most values were either near zero or 50 m.
Figure 11 shows the comparison between the observations, our predictions and the regional DTB maps
along a line in Iowa and a line in Ohio. In general, our predictions coincide well with the observations and
the regional maps. Compared to the regional map of Iowa, our prediction underestimated DTB for the bedrock
valley along the line from Alvord to Wever. Compared to the regional map of Ohio, our prediction
overestimated DTB for the hill slope around 100 km along the line from Montpelier to Pomeroy, and
underestimated it for the valley around 250 km.
Because our data set used the map of Pelletier et al. [2016] as a covariate in the prediction, the comparisons
above may be biased. However, the results from cross-validation show that the amount of variation
explained decreased only from 58.7% to 58.6% when the map of Pelletier et al. [2016] was removed from the
covariate list. The reason for this may be that many of the patterns in the map of Pelletier et al. [2016] are
already represented in the existing list of covariates (especially DEM-derived parameters, which are
also used as covariates in producing the map of Pelletier et al. [2016]). Thus, the resulting map will not
change much if the map of Pelletier et al. [2016] is removed as a covariate, and the comparisons above are
not problematic.
Figures 12 and 13 show the comparison between observations, our predictions and the map of Pelletier et al.
[2016] for Kentucky and Pennsylvania. The correlation coefficient between our predictions and observations
was relatively high. The machine learning models could reflect the major spatial pattern of DTB. However, the
underestimation of high values and the overestimation of low values were significant. In contrast, the
map of Pelletier et al. [2016] gave extreme estimates, i.e., very high or very low, for almost all areas,
and the major spatial patterns in the observations are not reflected. For example, in that map almost the
whole state of Kentucky has a shallow DTB, and the high values in the southeast corner of the state are
almost missing. This may be caused by the misclassification of landform.
We validated the map of Pelletier et al. [2016] with our DTB observations by excluding values of 50 m or
more, because the maximum value of Pelletier et al. [2016] is 50 m. For the interpolation area, including
Indiana, Kentucky, New York and Pennsylvania, where they used DTB data for calibration, the amount of
variation explained is 5%. For extrapolation, the amount of variation explained is 2%.
Table 5. Calibration and Validation Metrics of Leave One State Out Models of United States (a)

Calibration Area       Calibration        Validation R²                       Validation ME
(Without)              R²      ME         Interpolation (b)   Extrapolation   Interpolation (b)   Extrapolation
Indiana                0.735   −0.019     0.644               0.092           −0.026              −0.261
Iowa                   0.717   −0.018     0.63                0.201           0.024               0.041
South central Kansas   0.727   −0.019     0.637               0.045           −0.022              0.285
Kentucky               0.717   −0.02      0.677               0.003           0.11                0.68
Maine                  0.73    −0.021     0.613               0.012           0.029               0.221
Minnesota              0.714   −0.019     0.616               0.099           0.026               −0.567
Missouri               0.741   −0.023     0.683               0.054           −0.024              −0.306
New Hampshire          0.726   −0.019     0.648               0.051           0.039               0.003
New York               0.746   −0.02      0.684               0.008           −0.034              −0.151
Ohio                   0.722   −0.021     0.663               0.17            −0.041              0.011
Northern High Plains   0.677   −0.018     0.621               0.033           0.008               −0.739
Pennsylvania           0.736   −0.02      0.674               0.007           −0.016              −0.042
Tennessee              0.732   −0.02      0.653               0.002           −0.042              −0.296
Vermont                0.729   −0.02      0.652               0.012           0.034               0.119

(a) ME is mean error, which is calculated after logarithm transform.
(b) The average of all interpolations of leave one state out models.
Figure 8. Final prediction of (a) the absolute depth to bedrock (cm), (b) the censored depth to bedrock (cm, here values equal to 200 cm
indicate ‘‘deep as or deeper than’’), and (c) occurrence of R horizon within 200 cm (%). The maximum value of the absolute depth to
bedrock is set to 250 m for convenience of visualization, but the actual maximum predicted value is about 540 m.
4. Discussion
We used the most abundant depth to bedrock observations from soil surveys and geologic boreholes
(primarily water wells) to estimate the global spatial distribution of DTB using data-driven models. This work
presented the most up-to-date global DTB maps, with higher resolution (250 m) and higher accuracy than
previous studies such as Pelletier et al. [2016]. The cross-validation statistics show that the absolute DTB
maps and the occurrence of R horizon have moderate accuracy, and the censored DTB map has a low accuracy.
There is overestimation of the absolute DTB with a mean error of −0.25 m, and an underestimation of the
censored DTB with a mean error of 1.25 cm. The large RMSE (11.7 m) in relation to the mean predicted values
highlights the need for considered use of the depth predictions. Our prediction patterns of DTB also match
the regional maps of Iowa and Ohio, although the average differences in values are about ±10 m.
Figure 9. Comparison of (a) regional map of Iowa, (b) our prediction and (d) map of Pelletier et al. [2016]. (c, e) The scatter plots with the
correlation coefficient indicate how well our prediction and Pelletier et al.’s [2016] prediction match the regional predictions. Values have
been stretched using a log-scale to emphasize spatial patterns. Note that the maximum value of Pelletier et al. [2016] is 50 m, and that
values of 50 m or more were removed for the corresponding scatter plots.
Figure 11. Comparison of measured and predicted absolute depth to bedrock for (a) Iowa and (b) Ohio. The points are the observations.
The black line is the land surface elevation. The red line is the predicted DTB. The blue line is the DTB of regional map.
Figure 10. Comparison of (a) regional map of Ohio, (b) our predictions and (d) map of Pelletier et al. [2016]. (c, e) The scatter plots with the
correlation coefficient indicate how well our prediction and Pelletier et al.’s [2016] prediction match the regional predictions. Values have
been stretched using a log-scale to emphasize spatial patterns. Note that the maximum value of Pelletier et al. [2016] is 50 m. And we took
out the values no less than 50 meters for the corresponding scatter plots.
Figure 12. Comparison of (a) observations of Kentucky, (b) our predictions and (d) map of Pelletier et al. [2016]. (c, e) The scatter plots with
the correlation coefficient indicate how well our prediction and Pelletier et al.’s [2016] prediction match the observations. Values have been
stretched using a log-scale to emphasize spatial patterns. Note that the maximum value of Pelletier et al. [2016] is 50 m. And we took out
the values no less than 50 m for the corresponding scatter plots.
were not very successful in finding the relationship between the target variable and the covariates, and the
resulting map remains experimental.
The amount of variation explained by the models for the absolute DTB is about 59%, which means that almost
half remains unexplained. Mapping depth to bedrock is certainly complex (as soils are hidden and result from
past gradual and abrupt processes). Most likely, more detailed geomorphological and lithological maps could
be the key to improving the predictions. At the moment we use the GLiM data set, which is actually of
very general scale and low quality. As soon as more detailed global lithological data become publicly
available, they can be used to improve the predictions.
Figure 13. Comparison of (a) regional map of Pennsylvania, (b) our predictions, and (d) map of Pelletier et al. [2016]. (c, e) The scatter plots
with the correlation coefficient indicate how well our prediction and Pelletier et al.’s [2016] prediction match the regional predictions.
Values have been stretched using a log-scale to emphasize spatial patterns. Note that the maximum value of Pelletier et al. [2016] is 50 m,
and that values of 50 m or more were removed for the corresponding scatter plots.
It is common in regression, including the machine learning models we used, that low and high values
get smoothed out when R-square is low. The deepest observation in the source data is about
3000 m, but the actual maximum predicted value is about 540 m. The machine learning models also
overestimated zero DTB values, i.e., many outcrops were predicted as values around 300 cm (Figure 7). As a
result, the signature of the Andes, the Himalayas and many other mountain ranges, where DTB is near zero, is
not very clear in the map of absolute DTB. Another reason for the poor performance in mountain ranges is that
we have few observations there, apart from some pseudo-observations.
We could not predict deep values, such as depths of more than 1 km in the Andean foreland basin, because the
borehole data are also censored to some extent, i.e., we do not have many deep observations in such areas.
There is no universal requirement on how deep a drilling should go, so we do not know by how much the borehole
data are censored (likely dozens of meters). Fortunately, most applications, including Earth System Models,
are more interested in shallow DTBs. Even though we estimate the absolute DTB, it should be considered a
censored DTB when the interest is in deep DTBs.
performance of spatial prediction models. In this study, though the spatial coverage of soil profiles was
quite good, boreholes are spatially clustered and their spatial coverage was not ideal. Systematic
omission of deep DTB observations where there are no water wells or other boreholes led to the
underestimation of the DTB. For example, tropical rainforests usually have a very deep regolith, but this
feature is not predicted in the resulting DTB map due to the lack of deep observations in those areas. We
used the ellipsoid defined by Montgomery et al. [2001] to determine the feature space similarity. The results
show that the feature space is covered well by the point observations (above 99.9%), indicating that there
is no extrapolation in feature space. However, the relationship between the dependent and independent
variables may not carry over from one region to another. To reduce the extrapolation risk, the spatial
coverage of deep DTB observations is more important than their coverage in feature space.
Figure 14. Effects of observation density on the model performance for Kentucky. Black line is the amount of variation explained. Red line
is the percentage of the observations used for model calibration. There are 82,905 observations in total.
represent the spatial variation of DTB. It should be noted that fewer than 1% of grid cells within the
interpolation area had an observation when the cell size was 100 m by 100 m. This is because the
observations are spatially clustered. As a result, adding more observations will improve the prediction even
in areas with such a high density of observations. We also tested the above procedure for the global
observations. The results show that the amount of variation explained in validation was 19% when the cell size
was 100 km by 100 km, and only 2,308 observations were used for model calibration. This indicates that
there is still some predictability when observations are very sparse but evenly distributed in
space.
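The density experiment above appears to select calibration observations on a regular grid, so that larger cells leave fewer, more evenly spread points. A minimal sketch of such grid-based thinning is given below. Keeping one randomly chosen observation per occupied cell is an assumption about the procedure, and the function name and toy coordinates are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Obs = Tuple[float, float, float]  # (x in m, y in m, DTB in cm)


def thin_by_grid(observations: Sequence[Obs], cell_size_m: float,
                 seed: int = 0) -> List[Obs]:
    """Keep one randomly chosen observation per occupied square grid cell."""
    if cell_size_m <= 0:
        raise ValueError("cell size must be positive")
    cells: Dict[Tuple[int, int], List[Obs]] = {}
    for obs in observations:
        key = (math.floor(obs[0] / cell_size_m),
               math.floor(obs[1] / cell_size_m))
        cells.setdefault(key, []).append(obs)
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    return [rng.choice(group) for _, group in sorted(cells.items())]


# Toy projected coordinates in meters (hypothetical data).
points = [(10.0, 10.0, 50.0), (20.0, 30.0, 80.0), (150.0, 10.0, 0.0),
          (160.0, 90.0, 120.0), (950.0, 950.0, 300.0)]
calibration = thin_by_grid(points, cell_size_m=100.0)
```

Observations that are not selected can serve as validation points, so increasing `cell_size_m` traces how the amount of variation explained changes as the calibration data become sparser.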
5. Conclusions
We produced maps of the depth to bedrock, including the absolute DTB, the censored DTB, and the
occurrence of R horizon within 200 cm, for the whole world using state-of-the-art ground observations of depth
to bedrock and machine learning algorithms. This data set provides Earth System Models with a more accurate
estimate of the lower boundary condition. The cross-validation suggests moderate performance
for the absolute DTB and the occurrence of R horizon. However, the censored DTB contains a significant
amount of over-predicted low values. The predictability of DTB was limited by the inherent variability,
inaccuracies and censored nature of the observations and by the biased spatial coverage of the input data.
In addition, almost all the covariates used in this study reflect surface or near-surface characteristics and
processes in modern time. This restricts the ability to predict the higher values of DTB (i.e., deeper DTB).
Incorporation of more observations, especially borehole drilling logs in the tropics, wetlands, mountain
ranges, shifting sand areas and similar, would help improve the resulting maps and increase accuracy,
especially for higher values of DTB. As all processes from point to raster overlay to model fitting are fully
automated, by gradually adding new training data we hope to produce more and more accurate maps of the
underlying boundary of the world soil and regolith. The resulting global maps are available for download at
http://globalchange.bnu.edu.cn/ and http://soilgrids.org/.
Acknowledgments
This work was supported by the Natural Science Foundation of China (under grants 41575072, 41405096)
and R&D Special Fund for Nonprofit Industry (Meteorology, GYHY201206013, GYHY201306066).
ISRIC is a nonprofit-making organization, core-funded by the Dutch government, with a mandate to
serve the international community as custodian of global soil information and to increase awareness and
understanding of the role of soils in major global issues.

References
Arrouays, D., et al. (2014), GlobalSoilMap: Toward a fine-resolution global grid of soil properties, Adv. Agron., 125, 93–134,
doi:10.1016/B978-0-12-800137-0.00003-0.
Boer, M., G. DelBarrio, and J. Puigdefabregas (1996), Mapping soil depth classes in dry Mediterranean areas using terrain attributes derived
from a digital elevation model, Geoderma, 72(1-2), 99–118, doi:10.1016/0016-7061(96)00024-9.
Mallavan, B. P., B. Minasny, and A. B. McBratney (2010), Homosoil, a methodology for quantitative extrapolation of soil information across
the globe, in Progress in Soil Science, vol. 2, edited by J. Boettinger et al., pp. 137–150, Springer Netherlands, Dordrecht,
doi:10.1007/978-90-481-8863-5.
Breiman, L. (2001), Random forests, Mach. Learning, 45(1), 5–32, Dordrecht, doi:10.1023/A:1010933404324.
Brown, J., O. F. Jr., J. Heginbottom, and E. Melnikov (2001), Circum-arctic map of permafrost and ground ice conditions, Digital media, Natl.
Snow and Ice Data Cent., Boulder, Colo.
Brunke, M. A., et al. (2016), Implementing and evaluating variable soil thickness in the Community Land Model version 4.5 (CLM4.5),
J. Clim., 29, 3441–3461, doi:10.1175/JCLI-D-15-0307.1.
Calviño, P., V. O. Sadras, and F. H. Andrade (2003), Quantification of environmental and management effects on the yield of late-sown
soybean, Field Crops Res., 83(1), 67–77, doi:10.1016/S0378-4290(03)00062-5.
Chen, J., et al. (2015), Global land cover mapping at 30 m resolution: A POK-based operational approach, ISPRS J. Photogramm. Remote
Sens., 103, 7–27, doi:10.1016/j.isprsjprs.2014.09.002.
Chen, T., and C. Guestrin (2016), XGBoost: A Scalable Tree Boosting System, in KDD ’16 Proceedings of the 22nd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, pp. 785–794, Assoc. for Comput. Mach., New York, doi:10.1145/2939672.2939785.
Chesworth, W. (2008), Encyclopedia of Soil Science, Encycl. Earth Sci. Ser., Springer, Dordrecht, Netherlands, doi:10.1007/978-1-4020-3995-9.
Dahlke, H. E., T. Behrens, J. Seibert, and L. Andersson (2009), Test of statistical means for the extrapolation of soil depth point information
using overlays of spatial environmental data and bootstrapping techniques, Hydrol. Processes, 23(21), 3017–3029, doi:10.1002/hyp.7413.
Dregne, H. (2011), Soils of Arid Regions, Dev. Soil Sci., Elsevier Sci., Amsterdam.
Fan, Y., H. Li, and G. Miguez-Macho (2013), Global patterns of groundwater table depth, Science, 339(6122), 940–943, doi:10.1126/science.1229881.
FAO (1996), Digitized soil map of the world and derived soil properties, map, Rome.
FAO (2014), World Reference Base for Soil Resources 2014, Rome.
Fu, Z., Z. Li, C. Cai, Z. Shi, Q. Xu, and X. Wang (2011), Soil thickness effect on hydrological and erosion characteristics under sloping lands: A
hydropedological perspective, Geoderma, 167-168, 41–53, doi:10.1016/j.geoderma.2011.08.013.
Gochis, D. J., E. R. Vivoni, and C. J. Watts (2010), The impact of soil depth on land surface energy and water fluxes in the North American
Monsoon region, J. Arid Environ., 74, 564–571.
Hartmann, J., and N. Moosdorf (2012), The new global lithological map database GLiM: A representation of rock properties at the Earth
surface, Geochem. Geophys. Geosyst., 13, Q12004, doi:10.1029/2012GC004370.
Hengl, T., et al. (2014), Soilgrids1km — global soil information based on automated mapping, PLoS One, 9(8), e105992, doi:10.1371/
journal.pone.0105992.
Hijmans, R. J., S. E. Cameron, J. L. Parra, P. G. Jones, and A. Jarvis (2005), Very high resolution interpolated climate surfaces for global land
areas, Int. J. Climatol., 25, 1965–1978, doi:10.1002/joc.1276.
Howell, J. V. (1960), Glossary of Geology and Related Sciences, Am. Geol. Inst., Washington, D. C.
Illinois State Geological Survey (2004), Glacial drift in Illinois: Thickness and character, map, Champaign.
Jain, S. (2014), Fundamentals of Physical Geology, Springer, New Delhi.
Juilleret, J., S. Dondeyne, and C. Hissler (2014), What about the regolith, the saprolite and the bedrock? Proposals for classifying the
subsolum in WRB, in EGU General Assembly Conference Abstracts, vol. 16, pp. 2716, Copernicus, Vienna.
Karlsson, C., I. Jamali, R. Earon, B. Olofsson, and U. Mörtberg (2014), Comparison of methods for predicting regolith thickness in previously
glaciated terrain, Stockholm, Sweden, Geoderma, 226-227, 116–129, doi:10.1016/j.geoderma.2014.03.003.
Kuhn, M., S. Weston, C. Keefer, N. Coulter, and R. Quinlan (2014), Cubist: Rule- and Instance-Based Regression Modeling, R package. [Available
at https://cran.r-project.org.]
Kuriakose, S. L., S. Devkota, D. G. Rossiter, and V. G. Jetten (2009), Prediction of soil depth using environmental variables in an
anthropogenic landscape, a case study in the Western Ghats of Kerala, India, Catena, 79(1), 27–38, doi:10.1016/j.catena.2009.05.005.
Lawrence, D. M., A. G. Slater, V. E. Romanovsky, and D. J. Nicolsky (2008), Sensitivity of a model projection of near-surface permafrost
degradation to soil column depth and representation of soil organic matter, J. Geophys. Res., 113, F02011, doi:10.1029/2007JF000883.
Lowrie, W. (2007), Fundamentals of Geophysics, 2nd ed., Cambridge Univ. Press, New York.
McPherson, A. (2011), Development of the Australian National Regolith Site Classification Map, map, Geosci. Aust., Symonston ACT.
Melnikov, E. S. (1998), Catalog of boreholes from Russia and Mongolia, in International Permafrost Association, Data and Information
Working Group, comp. Circumpolar Active-Layer Permafrost System (CAPS), Version 1.0 [CD-ROM], Natl. Snow and Ice Data Cent., Univ. of Colo.
at Boulder, Boulder, Colo.
Miller, D. A., and R. A. White (1998), A conterminous United States multilayer soil characteristics dataset for regional climate and hydrology
modeling, Earth Interact., 2, 1–26, doi:10.1175/1087-3562(1998)002<0001:ACUSMS>2.3.CO;2.
Missouri Geological Survey (2013), MO 2014 overburden thickness (depth to bedrock), map, Rolla, Mo.
Montgomery, D. C., E. A. Peck, and G. G. Vining (2001), Introduction to Linear Regression Analysis, Wiley Ser. Probab. Stat., 3rd ed., John Wiley,
Hoboken, N. J., doi:10.1198/tas.2003.s211.
Moody, E. G., M. D. King, S. Platnick, C. B. Schaaf, and F. Gao (2005), Spatially complete global spectral surface albedos: Value-added
datasets derived from Terra MODIS land products, IEEE Trans. Geosci. Remote Sens., 43(1), 144–158, doi:10.1109/TGRS.2004.838359.
Pelletier, J. D., and C. Rasmussen (2009), Geomorphically based predictive mapping of soil thickness in upland watersheds, Water Resour.
Res., 45, W09417, doi:10.1029/2008WR007319.
Pelletier, J. D., P. D. Broxton, P. Hazenberg, X. Zeng, P. A. Troch, G.-Y. Niu, Z. Williams, M. A. Brunke, and D. Gochis (2016), A gridded global
data set of soil, immobile regolith, and sedimentary deposit thicknesses for regional and global land surface modeling, J. Adv. Model.
Earth Syst., 8, 41–65, doi:10.1002/2015MS000526.
Peterman, W., D. Bachelet, K. Ferschweiler, and T. Sheehan (2014), Soil depth affects simulated carbon and water in the MC2 dynamic
global vegetation model, Ecol. Modell., 294, 84–93, doi:10.1016/j.ecolmodel.2014.09.025.
Price, D. G. (2009), Engineering Geology: Principles and Practice, Springer, Berlin.
Rabus, B., M. Eineder, A. Roth, and R. Bamler (2003), The shuttle radar topography mission—A new class of digital elevation models
acquired by spaceborne radar, ISPRS J. Photogramm. Remote Sens., 57(4), 241–262, doi:10.1016/S0924-2716(02)00124-7.
Ribeiro, E., N. Batjes, J. Leenaars, and A. van Oostrum (Eds.) (2015), Towards the standardization and harmonization of world soil data, ISRIC
Rep. 2015/03, 101 pp., ISRIC — World Soil Inf., Wageningen, Netherlands.
Richard, S., T. Shipman, L. Greene, and R. Harris (2007), Estimated depth to bedrock in Arizona, map, Arizona Geological Survey, Tucson,
Ariz.
Sayre, R., et al. (2014), A new map of global ecological land units: An ecophysiographic stratification approach, map, Assoc. Am. Geogr.,
Washington, D. C.
Schenk, H. J., and R. B. Jackson (2005), Mapping the global distribution of deep roots in relation to climate and soil characteristics,
Geoderma, 126(1-2), 129–140, doi:10.1016/j.geoderma.2004.11.018.
Schoeneberger, P., D. Wysocki, E. Benham, and W. Broderson (Eds.) (2011), Field Book for Describing and Sampling Soils, 3rd ed., Natl. Soil
Surv. Cent., NRCS USDA, Lincoln.
Schoeneberger, P., D. Wysocki, E. Benham, and Soil Survey Staff (2012), Field Book for Describing and Sampling Soils, Version 3.0, Nat. Resour.
Conserv. Serv., Natl. Soil Surv. Cent., Lincoln.
Scholes, R. J., and E. B. D. Colstoun (2011), ISLSCP II Global Gridded Soil Characteristics, map, Oak Ridge, Tenn.
Shafique, M., M. v. der Meijde, and D. G. Rossiter (2009), Geophysical and remote sensing-based approach to model regolith thickness in a
data-sparse environment, Catena, 87(1), 11–19, doi:10.1016/j.catena.2011.04.004.
Shangguan, W., et al. (2013), A China dataset of soil properties for land surface modeling, J. Adv. Model. Earth Syst., 5, 212–224, doi:10.1002/
jame.20026.
Soil Survey Staff (2014), Keys to Soil Taxonomy, 12th ed., USDA–Nat. Resour. Conserv. Serv., Washington, D. C.
Swinford, E. M. (2004), What the glaciers left behind, Ohio Geol., 1, 1–5.
Tesfa, T. K., D. G. Tarboton, D. G. Chandler, and J. P. McNamara (2009), Modeling soil depth from topographic and land cover attributes,
Water Resour. Res., 45, W10438, doi:10.1029/2008WR007474.
Tromp-van Meerveld, H. J., N. E. Peters, and J. J. McDonnell (2007), Effect of bedrock permeability on subsurface stormflow and the water
balance of a trenched hillslope at the Panola Mountain Research Watershed, Georgia, USA, Hydrol. Processes, 21(6), 750–769, doi:
10.1002/hyp.6265.
USDA-NCSS (2006), Digital General Soil Map of U.S., map, U.S. Dep. of Agric., Nat. Resour. Conserv. Serv., Fort Worth, Tex.
Vermont Geological Survey, and Vermont Agency of Natural Resources (2008), Bedrock geologic map of Vermont, map, Vermont Agency
of Nat. Resour., Montpelier.
Wilford, J. (2012), A weathering intensity index for the Australian continent using airborne gamma-ray spectrometry and digital terrain
analysis, Geoderma, 183-184, 124–142, doi:10.1016/j.geoderma.2010.12.022.
Wilford, J. R., R. Searle, M. Thomas, D. Pagendam, and M. J. Grundy (2016), A regolith depth map of the Australian continent, Geoderma,
266, 1–13, doi:10.1016/j.geoderma.2015.11.033.
Witzke, B. J., R. R. Anderson, and J. P. Pope (2010), Estimated depth to bedrock of Iowa as a 110-meter pixel, 32-bit Imagine Format Raster
Dataset, map, Iowa Geol. and Water Surv., DNR, Iowa City.
Yamakawa, Y., K. Kosugi, N. Masaoka, J. Sumida, M. Tani, and T. Mizuyama (2012), Combined geophysical methods for detecting soil
thickness distribution on a weathered granitic hillslope, Geomorphology, 145-146, 56–69, doi:10.1016/j.geomorph.2011.12.035.