Estimating Sediment Settling Velocities From A Theoretically Guided Data-Driven Approach


Zhendong Cao 1; Phillip J. Wolfram 2; Joel Rowland 3; Yu Zhang 4; and Donatella Pasqualini 5
Abstract: Sediment settling velocities are commonly estimated from analytical or process-based approaches. These approaches have
theoretical constraints due to the incompletely resolved settling physics. A parametric data-driven approach was recently proposed without
theoretical constraints, but it is limited by its mathematical assumptions. To overcome these limitations, this study applies a machine learning
algorithm to an aggregated sediment settling experimental database and develops a nonparametric data-driven model to estimate the
noncohesive sediment settling velocity in water. A cross-comparison against five process-based equations and a parametric data-driven equation
demonstrates the higher accuracy and better consistency of the new model in estimating sediment settling velocities under various physical
regimes. The new model also shows an easily implemented self-update capability by assimilating theoretical data derived from the process-
based equations. The updated model, incorporating experimental and theoretical data of sediment settling processes, further improves the
accuracy and reduces the uncertainty in estimating sediment settling velocities. This study demonstrates the capability of machine learning in
sediment transport study and illustrates an alternative framework for other hydraulic engineering challenges. DOI: 10.1061/(ASCE)HY.1943-7900.0001798. © 2020 Published by American Society of Civil Engineers.
1 Postdoctoral Research Associate, Fluid Dynamics and Solid Mechanics, Theoretical Div., Los Alamos National Laboratory, Los Alamos, NM 87544 (corresponding author). ORCID: https://orcid.org/0000-0002-5400-431X. Email: caozd999@lanl.gov
2 Scientist, Fluid Dynamics and Solid Mechanics, Theoretical Div., Los Alamos National Laboratory, Los Alamos, NM 87544. Email: pwolfram@lanl.gov
3 Scientist, Div. of Earth and Environmental Science, Los Alamos National Laboratory, Los Alamos, NM 87544. Email: jrowland@lanl.gov
4 Postdoctoral Research Associate, Div. of Earth and Environmental Science, Los Alamos National Laboratory, Los Alamos, NM 87544. Email: yuzhang@lanl.gov
5 Scientist, Div. of Analytics, Intelligence and Technology, Los Alamos National Laboratory, Los Alamos, NM 87544. Email: dmp@lanl.gov
Note. This manuscript was submitted on November 26, 2019; approved on April 28, 2020; published online on July 22, 2020. Discussion period open until December 22, 2020; separate discussions must be submitted for individual papers. This paper is part of the Journal of Hydraulic Engineering, © ASCE, ISSN 0733-9429.

Introduction

Sediment settling velocity is a prerequisite for various sediment transport studies and engineering applications. It influences the sediment transport mode, rate, and distance in water (e.g., Dietrich 1982) and, therefore, is crucial in modeling sediment deposition, suspension, mixing, and exchange processes (e.g., Zhiyao et al. 2008). Sediment settling is a complex process because it depends on the sediment particle density, particle geometry (e.g., size, shape, and roundness), interparticle cohesivity, fluid characteristics (temperature, density, and viscosity), and fluid turbulence velocity (e.g., Dietrich 1982; Nielsen 1993). Many sediment settling velocity equations have been proposed based on a broad range of approaches, including (pseudo-)analytical approaches (e.g., Stokes 1851; Oseen 1927; Goldstein 1929; Rubey 1933), process-based approaches (e.g., Dietrich 1982; Van Rijn 1989; Raudkivi 1990; Fredsoe and Deigaard 1992; Cheng 1997; Ahrens 2000; She et al. 2005; Wu and Wang 2006; Nasiha and Shanmugam 2018), and a parametric data-driven approach (Goldstein and Coco 2014).

Analytical and process-based approaches derive sediment settling velocities from the force equilibrium equation governing the settling particles, which is solved by finding the solution of

$$
C_D = \left[\left(\frac{M}{R}\right)^{1/n} + N^{1/n}\right]^{n}
$$

(e.g., Cheng 1997), where $C_D$ = particle drag coefficient; $R$ = particle Reynolds number; and $M$, $N$, and $n$ = parameters. Analytical equations (e.g., Stokes 1851; Oseen 1927; Goldstein 1929) solve the $C_D$–$R$ relationship analytically under small particle Reynolds numbers ($R < 2$) and have been proven to break down in real applications (Munson et al. 2006). Process-based equations extend their application regimes to a much wider $R$ range, but the $C_D$–$R$ relationship can only be empirically approximated because the settling physics (e.g., turbulent flow around the settling particles) cannot be resolved analytically. Various $C_D$–$R$ relationships have been derived from different experimental data with certain assumptions (e.g., Table 1 in Zhiyao et al. 2008), and different process-based equations have been proposed accordingly. However, the variations of the equations indicate the limitation of process-based approaches to describe the settling process in precise mathematical terms. Furthermore, only a few studies (e.g., Jiménez and Madsen 2003; Wu and Wang 2006) have compared different process-based equations, and new equations are rarely tested against existing data to determine their general reliability. All these factors indicate the uncertainty and inaccuracy of process-based equations in estimating settling velocities in sediment transport studies (e.g., Bhattacharya et al. 2007; Yang 2013). This might explain why many sediment transport studies neither identify (e.g., Nielsen 1986; Huijts et al. 2006; Warner et al. 2008; Donatelli et al. 2018; Olabarrieta et al. 2018) nor use (e.g., Warner et al. 2007; Ganju et al. 2009; Fagherazzi et al. 2013) equations to estimate sediment settling velocities.

As pioneers, Goldstein and Coco (2014) developed a new sediment settling velocity equation from an aggregated database using a genetic programming (GP) data-driven approach. This approach does not have the theoretical constraints of process-based models, but it requires the mathematical operators to be specified as a prerequisite.
J. Hydraul. Eng., 2020, 146(10): 04020067

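Table 1. Aggregated sediment settling velocity database

| Source | Data points | $D_n$ (m) | $\Delta$ | $\nu$ (m²/s) | $w$ (m/s) |
|---|---|---|---|---|---|
| Corey (1949) | 46 | $1.6 \times 10^{-3}$–$7.1 \times 10^{-3}$ | 1.65 | $1.2 \times 10^{-6}$ | 0.16–0.42 |
| Wilde (1952) | 335 | $3.8 \times 10^{-3}$–$2.5 \times 10^{-2}$ | 1.42–2.08 | $4.6 \times 10^{-7}$–$9.2 \times 10^{-5}$ | 0.09–1.0 |
| Briggs et al. (1962) | 126 | $9.0 \times 10^{-5}$–$5.5 \times 10^{-4}$ | 2.19–4.07 | $1.0 \times 10^{-6}$ | $9.0 \times 10^{-3}$–$9.5 \times 10^{-2}$ |
| Schulz et al. (1954) | 159 | $1.0 \times 10^{-4}$–$1.4 \times 10^{-2}$ | 1.54–6.5 | $1.0 \times 10^{-6}$–$1.0 \times 10^{-5}$ | $6.1 \times 10^{-3}$–0.16 |
| US Inter-Agency Committee (1957) | 12 | $1.5 \times 10^{-4}$–$1.5 \times 10^{-3}$ | 1.65 | $1.0 \times 10^{-6}$ | $1.5 \times 10^{-2}$–1.17 |
| Alger (1964) | 64 | $2.4 \times 10^{-2}$–$3.1 \times 10^{-2}$ | 1.57–1.98 | $8.0 \times 10^{-7}$–$7.0 \times 10^{-4}$ | 0.214–1.11 |
| Komar and Reimers (1978) | 51 | $6.4 \times 10^{-3}$–$1.9 \times 10^{-2}$ | 1.77–2.80 | $9.0 \times 10^{-4}$ | $1.9 \times 10^{-2}$–0.12 |
| Hallermeier (1981) | 115 | $1.0 \times 10^{-4}$–$2.2 \times 10^{-3}$ | 0.03–1.67 | $1.0 \times 10^{-6}$–$1.0 \times 10^{-4}$ | $4.4 \times 10^{-3}$–0.219 |
| Cheng (1997) | 43 | $1.0 \times 10^{-6}$–$4.5 \times 10^{-3}$ | 1.65 | $6.6 \times 10^{-7}$–$1.4 \times 10^{-4}$ | $5.7 \times 10^{-7}$–0.28 |
| Paphitis et al. (2002) | 12 | $3.63 \times 10^{-4}$–$8.58 \times 10^{-4}$ | 1.72–1.8 | $1.1 \times 10^{-6}$ | $2.65 \times 10^{-3}$–$8.19 \times 10^{-3}$ |
| Smith and Cheung (2003) | 22 | $4.2 \times 10^{-4}$–$6.9 \times 10^{-3}$ | 1.6 | $9.9 \times 10^{-7}$–$1.0 \times 10^{-6}$ | $4.8 \times 10^{-2}$–0.316 |
| Ferguson and Church (2004) | 12 | $6.8 \times 10^{-5}$–$4.3 \times 10^{-3}$ | 1.65 | $9.2 \times 10^{-7}$ | $6.0 \times 10^{-3}$–0.307 |
| Watts and Zarillo (2019) | 10 | $3.85 \times 10^{-4}$–$8.8 \times 10^{-3}$ | 1.7 | $9.6 \times 10^{-7}$ | $9.98 \times 10^{-2}$–0.275 |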
However, it is unclear how to select mathematical operators for GP, which makes it challenging to reproduce the researchers' work; furthermore, predefined operators may limit a model's ability to fully capture the internal relationships in data. To overcome these limitations, this study presents a nonparametric data-driven model of sediment settling velocities based on an aggregated database. The new model, built upon a machine learning (ML) algorithm called a random forest (RF), relaxes the theoretical or mathematical constraints of previous studies and describes the internal relationships in the data from a fully data-driven approach. Background knowledge of the RF and its applications to relevant research are presented in the methodology section. The new model has the capability to self-update with the addition of new theoretical data derived from process-based equations. Cross-comparisons between different approaches (process-based, parametric data-driven, and nonparametric data-driven) are also presented to evaluate the variety of sediment settling velocity models across multiple physical regimes.

The paper is organized as follows. First, data compilation, preprocessing, and the ML model configuration are described in the methodology section. Then the ML model performance, cross-model comparison, and sensitivity tests are presented in the results section. Differences between process-based and data-driven models, including model sensitivity analysis and ML model update, are provided in the discussion section. This is followed by a conclusion section summarizing the value of this study and its illustration of ML capabilities for hydraulic engineering applications.

Methodology

Aggregated Database and Data Pre-Processing

A multisource database of sediment settling measurements is compiled from Paphitis et al. (2002), Goldstein and Coco (2014), and Watts and Zarillo (2019). The database has 1,006 measurements (Table 1) with four variables: the nominal diameter of a particle ($D_n$) (m), the particle submerged specific gravity ($\Delta$) (dimensionless), the kinematic viscosity of fluid ($\nu$) (m²/s), and the corresponding particle settling velocity ($w$) (m/s). The specific gravity $\Delta = \rho_s/\rho_w - 1$, where $\rho_s$ and $\rho_w$ are the density (kg/m³) of sediment and water, respectively. Data standardization is carried out according to the following four steps:
1. The nominal diameter ($D_n$) is converted to sieve diameter ($D$) by $D = D_n/1.1$ (Raudkivi 1990) for consistency with parametric settling equations that use $D$ as an input variable.
2. Data are restricted to noncohesive sediment with $D > 6.25 \times 10^{-5}$ m (1,000 measurements).
3. Data are classified into sand ($D < 2 \times 10^{-3}$ m) and gravel ($D > 2 \times 10^{-3}$ m).
4. Only data in water are considered ($\nu < 2 \times 10^{-5}$ m²/s) for typical real-world applications.

After data standardization, the final database contains 756 experimental measurements with the variable distributions shown in Fig. 1.

New Data-Driven Approach Using Random Forest

A RF (Breiman 2001) is a supervised machine learning algorithm that learns intrinsic information within data, as demonstrated across a variety of scientific fields and applications (Svetnik et al. 2003; Cutler et al. 2007; Shotton et al. 2013; Chen and Hu 2017; Kane et al. 2014; Zhou et al. 2019; Chen et al. 2019). A brief overview of a RF algorithm with simple examples is presented in Liaw and Wiener (2002). Comparisons between RF and other ML algorithms (e.g., artificial neural network and support vector machine) demonstrate its easy implementation, competitively high accuracy, and transparency, as well as its ability to deal with small sample sizes and high-dimensional data (e.g., Svetnik et al. 2003; Liu et al. 2013; Smith et al. 2013; Biau and Scornet 2016; Tyralis and Papacharalampous 2017; Chen and Hu 2017; Chen et al. 2019). Readers are referred to Verikas et al. (2011) for a review of the wide applications of RF and its comparisons with other ML algorithms. This study chooses a RF regression model because it (1) does not require significant data preprocessing for discrete data, (2) guarantees good performance with reduced sensitivity to outliers, which are sometimes inevitable and hard to exclude in experimental data, (3) is less susceptible to overfitting, (4) requires minimal parameter tuning, (5) does not have linear/nonlinear assumptions, and (6) can provide the relative importance of each variable in the model, which is helpful in model sensitivity analysis (e.g., Segal 2004; Svetnik et al. 2003; Raschka and Mirjalili 2017). Here, the open-source RF regression algorithm sklearn.ensemble.RandomForestRegressor (Pedregosa et al. 2011), implemented in Python 3 (Summerfield 2010), is applied. Three parameters are fine-tuned in the model: the number of decision trees (ntrees), the depth of each tree (max_depth), and the number of features (max_features) used to determine tree splits. Each parameter is evaluated over a range of values, and the combination of values for best model performance is determined using grid search (Lerman 1980). Fivefold cross-validation (Kohavi 1995) is incorporated into the grid search to mitigate model overfitting (Raschka and Mirjalili 2017) during

Fig. 1. Aggregated database sample distributions: (a) sediment sieve diameter; (b) fluid kinematic viscosity; (c) submerged specific gravity; and (d) settling velocity.
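The four data standardization steps described in the methodology can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the `Measurement` record, its field names, and the `standardize` helper are hypothetical and are not taken from the study's code, and a particle with $D$ exactly equal to $2 \times 10^{-3}$ m is assigned to gravel because the text leaves that boundary case open.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

D_MIN = 6.25e-5       # step 2: lower bound on sieve diameter for noncohesive sediment (m)
SAND_GRAVEL = 2.0e-3  # step 3: sand/gravel boundary on sieve diameter (m)
NU_MAX = 2.0e-5       # step 4: upper bound on kinematic viscosity for water (m^2/s)


@dataclass
class Measurement:
    """One raw record; the class and field names are hypothetical."""
    dn: float     # nominal diameter D_n (m)
    delta: float  # submerged specific gravity (dimensionless)
    nu: float     # fluid kinematic viscosity (m^2/s)
    w: float      # observed settling velocity (m/s)


def standardize(records: List[Measurement]) -> List[Dict[str, Union[float, str]]]:
    """Apply the four standardization steps and return the retained records."""
    kept = []
    for rec in records:
        d = rec.dn / 1.1  # step 1: nominal diameter to sieve diameter
        if d <= D_MIN:    # step 2: keep noncohesive sediment only
            continue
        if rec.nu >= NU_MAX:  # step 4: keep measurements made in water only
            continue
        # step 3: D == SAND_GRAVEL is treated as gravel (an assumption)
        sediment_class = "sand" if d < SAND_GRAVEL else "gravel"
        kept.append({"D": d, "delta": rec.delta, "nu": rec.nu,
                     "w": rec.w, "sediment_class": sediment_class})
    return kept
```

The `sediment_class` label produced here is the quantity a stratified train/test split would be keyed on.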
training. The final RF model with optimized model parameters is selected from a trade-off between model simplicity and accuracy.

The database is split into training and test data sets based on $D$ because of the broad distribution of these values in the database [Fig. 1(a)]. A stratified random split is applied to preserve a constant sediment class ratio (0.35 for gravel and 0.65 for sand) in both training and test data sets, with the test size ratio (test_size) varied between 0.1 and 0.9 to produce nine parameter tuning scenarios. In each scenario, the RF decision trees are binarily split until the depths reach max_depth. A grid search is performed for max_depth from 5 to 20, max_features from 1 to 3, and ntrees from 100 to 400 at an interval of 50. A fivefold cross-validation assesses model performance, and the optimized model is selected based on both the coefficient of determination [$R^2$, Eq. (1)] and the standard deviation [$\sigma$, Eq. (2)]. These two metrics are used to evaluate the average model performance throughout the model cross-validation process, i.e., higher $R^2$ and lower $\sigma$ indicate higher model accuracy and stability. Each scenario generates one optimized model that is verified by the test data across three metrics: $R^2$, mean absolute error [MAE, Eq. (3)], and root-mean-square error [RMSE, Eq. (4)]. Both MAE and RMSE are the evaluation metrics for the model accuracy: MAE describes the average error of the model directly, and RMSE has the benefit of penalizing large errors in the model. Among the nine optimized models (Table 2), the one with the best performance on both training and test data is selected as the final RF model:

$$
R^2 = 1 - \frac{\sum_{i=1}^{N_{Tr}} \left(w_{obs}^i - w_{rfp}^i\right)^2}{\sum_{i=1}^{N_{Tr}} \left(w_{obs}^i - \bar{w}_{obs}\right)^2} \tag{1}
$$

$$
\sigma = \sqrt{\frac{1}{N_{Tr}} \sum_{i=1}^{N_{Tr}} \left(w_{rfp}^i - \bar{w}_{obs}\right)^2} \tag{2}
$$

$$
\mathrm{MAE} = \frac{1}{N_{Te}} \sum_{i=1}^{N_{Te}} \left|w_{obs}^i - w_{rfp}^i\right| \tag{3}
$$

$$
\mathrm{RMSE} = \sqrt{\frac{1}{N_{Te}} \sum_{i=1}^{N_{Te}} \left(w_{obs}^i - w_{rfp}^i\right)^2} \tag{4}
$$

where $N_{Tr}$ and $N_{Te}$ = total number of samples in training and test data, respectively; $w_{obs}^i$ = observed settling velocity of ith sample;
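Table 2. Grid search results with fivefold cross-validation

| Scenario | test_size | ntrees | max_depth | Training MAE^a | Training RMSE^a | Training $R^2$ | Test MAE^a | Test RMSE^a | Test $R^2$ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.1 | 100 | 8 | 1.57 | 2.67 | 0.99 | 2.54 | 4.81 | 0.97 |
| 2 | 0.2 | 150 | 9 | 1.37 | 2.35 | 0.99 | 2.42 | 4.28 | 0.97 |
| 3 | 0.3 | 100 | 9 | 1.35 | 2.30 | 0.99 | 2.44 | 4.38 | 0.97 |
| 4 | 0.4 | 350 | 11 | 1.17 | 2.13 | 0.99 | 2.60 | 4.64 | 0.96 |
| 5 | 0.5 | 400 | 13 | 1.16 | 2.15 | 0.99 | 2.55 | 4.65 | 0.96 |
| 6 | 0.6 | 400 | 5 | 2.05 | 3.31 | 0.98 | 3.06 | 5.55 | 0.95 |
| 7 | 0.7 | 150 | 6 | 1.62 | 2.34 | 0.99 | 3.30 | 6.14 | 0.93 |
| 8 | 0.8 | 100 | 5 | 1.66 | 2.50 | 0.99 | 3.68 | 6.66 | 0.92 |
| 9 | 0.9 | 150 | 9 | 1.31 | 2.29 | 0.99 | 3.77 | 6.78 | 0.92 |

a In units of $10^{-2}$ m/s.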
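To make the tuning procedure behind Table 2 concrete, the sketch below enumerates the hyperparameter grid stated in the text and implements the three test metrics ($R^2$, MAE, and RMSE) in plain Python. It is an illustration, not the study's code: the function and constant names are hypothetical, and unit steps for max_depth and max_features are an assumption, since the text gives an explicit interval only for ntrees. In practice each grid point would be fitted with a random forest regressor under fivefold cross-validation.

```python
import math
from itertools import product

# Candidate values per tuning scenario. Unit steps for max_depth and
# max_features are assumed; the interval of 50 for ntrees is from the text.
NTREES = range(100, 401, 50)
MAX_DEPTH = range(5, 21)
MAX_FEATURES = range(1, 4)
GRID = list(product(NTREES, MAX_DEPTH, MAX_FEATURES))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mae(observed, predicted):
    """Mean absolute error between observations and estimates."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root-mean-square error; penalizes large errors more than MAE does."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )
```

With these ranges the grid holds 336 candidate combinations per scenario, so fivefold cross-validation implies 1,680 model fits for each value of test_size.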

Fig. 2. (Color) Performance of RF estimator on the training and test data, respectively: (a) training data (605 measurements); and (b) test data (151 measurements).
$\bar{w}_{obs}$ = averaged settling velocity; and $w_{rfp}^i$ = estimated settling velocity of ith sample from RF model.

Results

Model Performance

Model performance obtained during parameter tuning is shown in Table 2. Each row denotes a scenario for the best combination of ntrees and max_depth. Model performance is the best when max_features = 3. The fivefold cross-validation results (with all $R^2$ values greater than 0.95) and the metric evaluations indicate that no optimized model is overfit. Model structure (estimated by ntrees × max_depth) and the model accuracy (MAE, RMSE, and $R^2$) on both training and test data are compared to determine Scenario 2 as the final RF model, which is referred to as the RF estimator hereafter.

The performance of the RF estimator on both training and test data is shown in Fig. 2. The evaluation metrics in the training data (MAE = $1.37 \times 10^{-2}$ m/s, RMSE = $2.35 \times 10^{-2}$ m/s, and $R^2$ = 0.99) and the test data (MAE = $2.42 \times 10^{-2}$ m/s, RMSE = $4.28 \times 10^{-2}$ m/s, and $R^2$ = 0.97) indicate that an accurate model for sediment settling velocities has been developed without any physical or mathematical constraints. The one standard deviation (1-sigma) prediction interval shows that the RF estimator has relatively smaller prediction intervals for smaller settling velocities, e.g., for $w < 0.30$ m/s, but the relative uncertainties as obtained by normalizing the standard deviation with the velocity are smaller for larger settling velocities (not shown here). The overall performance of the RF estimator on the entire database is MAE = $1.58 \times 10^{-2}$ m/s, RMSE = $2.85 \times 10^{-2}$ m/s, and $R^2$ = 0.98 (Fig. 4 red text and dots). Feature importance analysis identifies $D$ as the most important parameter for the RF estimator with an index of 0.89 (out of 1.0). The importance indices of $\nu$ and $\Delta$ are both quite small and are less than 0.10 individually. This result indicates that the RF estimator is only sensitive to $D$ and the corresponding particle Reynolds number, which will be further demonstrated and discussed in the sensitivity analysis.

Cross-Model Comparison

The RF estimator is compared against six selected algebraic parametric sediment settling velocity equations, as listed in Table 3. The selected equations include five process-based equations (VR89, SB97, CH97, WW06, and SH09) and one parametric data-driven equation (GC14). All these equations have been well validated by their experimental data and are reported to be applicable to different flow regimes (Van Rijn 1989; Soulsby 1997; Cheng 1997; Wu and Wang 2006; Sadat-Helbar et al. 2009; Goldstein and Coco 2014). Specifically, VR89 and CH97 solve the $C_D$–$R$ relationship similarly, but they derive different $M$, $N$, and $n$ from different experimental data; WW06 considers the sediment particle shape in solving the $C_D$–$R$ relationship; SB97 is one of the most simplified process-based equations with optimization of the coefficients in a combined viscous plus bluff-body drag law; SH09 is a process-based equation, but it is derived from the artificial data generated by other equations; and GC14 is the first parametric data-driven equation using an aggregated experimental database. As shown in Fig. 3, each equation performs well in the test data, but none outperforms the RF estimator. Considering that GC14 is developed using nearly the same database as used in this study, the model accuracy of GC14 and the RF estimator is further compared on the entire final database. The results shown in Fig. 4 indicate that the RF estimator has a higher accuracy and overall less bias with a higher $R^2$ and smaller MAE and RMSE.

Model performance across different sediment classes (gravel and sand) is also evaluated. The results (Table 4) demonstrate that the RF estimator performance is comparably accurate on sand and gravel classes, but the algebraic equations do not have consistent accuracy across different sediment classes. For example, WW06 is the best among the six equations on sand samples (MAE = $0.96 \times 10^{-2}$ m/s, RMSE = $1.35 \times 10^{-2}$ m/s, and $R^2$ = 0.94), but it performs poorly on gravel samples (MAE = $10.34 \times 10^{-2}$ m/s,

Table 3. Algebraic parametric settling velocity equations (reference, abbreviation, and formulation of $w$)

Van Rijn (1989), VR89:

$$
w = \begin{cases}
\dfrac{\nu D_*^3}{18 D} & \text{if } D_*^3 \le 16.187 \\[2ex]
\dfrac{10 \nu}{D} \left[\sqrt{1 + 0.01 D_*^3} - 1\right] & \text{if } 16.187 < D_*^3 \le 16187 \\[2ex]
\dfrac{1.1 \nu D_*^{1.5}}{D} & \text{if } D_*^3 > 16187
\end{cases}
$$

Soulsby (1997), SB97:

$$
w = \frac{\nu}{D} \left[\sqrt{10.36^2 + 1.049 D_*^3} - 10.36\right]
$$

Cheng (1997), CH97:

$$
w = \frac{\nu}{D} \left[\sqrt{25 + 1.2 D_*^2} - 5\right]^{1.5}
$$

Wu and Wang (2006), WW06^a:

$$
w = \frac{M \nu}{N D} \left[\sqrt{0.25 + \left(\frac{4 N D_*^3}{3 M^2}\right)^{1/n}} - 0.5\right]^{n}
$$

Sadat-Helbar et al. (2009), SH09:

$$
w = \begin{cases}
0.033 \dfrac{\nu}{D} \left(\dfrac{\Delta g D^3}{\nu^2}\right)^{0.963} & \text{if } D_* \le 10 \\[2ex]
0.51 \dfrac{\nu}{D} \left(\dfrac{\Delta g D^3}{\nu^2}\right)^{0.535} & \text{if } D_* > 10
\end{cases}
$$

Goldstein and Coco (2014), GC14:

$$
w = \frac{37.8 \Delta D_n + 3{,}780 \Delta D_n^2}{0.383 + 10{,}000 \Delta \nu + 100 \Delta^2 D_n}
$$

a $D_* = D \left(\Delta g / \nu^2\right)^{1/3}$ is the particle effective diameter. The coefficients $M$, $N$, and $n$ in WW06 consider the Corey shape factor $S_f$ (Corey 1949), $M = 53.5 e^{-0.65 S_f}$, $N = 5.65 e^{-2.5 S_f}$, and $n = 0.7 + 0.9 S_f$, where $S_f = c/\sqrt{ab}$, with $a$, $b$, and $c$ the lengths (m) of the longest, intermediate, and shortest axes of the particle. In this study $S_f = 0.8$.
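The equations of Table 3 translate directly into code. The sketch below is one possible standard-library implementation, offered as an illustration rather than the authors' code. The function names, the default arguments, and the value $g = 9.81$ m/s² are assumptions, the VR89 branch conditions are applied to $D_*^3$, and GC14 is evaluated with $D_n = 1.1 D$ so that all six functions take the sieve diameter as input.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2); the value is assumed here


def dstar_cubed(d, nu=1.0e-6, delta=1.65):
    """D_*^3 = delta * g * D^3 / nu^2, the cube of the effective diameter."""
    return delta * G * d ** 3 / nu ** 2


def vr89(d, nu=1.0e-6, delta=1.65):
    d3 = dstar_cubed(d, nu, delta)
    if d3 <= 16.187:
        return nu * d3 / (18.0 * d)
    if d3 <= 16187.0:
        return 10.0 * nu / d * (math.sqrt(1.0 + 0.01 * d3) - 1.0)
    return 1.1 * nu * d3 ** 0.5 / d  # D_*^1.5 equals sqrt(D_*^3)


def sb97(d, nu=1.0e-6, delta=1.65):
    d3 = dstar_cubed(d, nu, delta)
    return nu / d * (math.sqrt(10.36 ** 2 + 1.049 * d3) - 10.36)


def ch97(d, nu=1.0e-6, delta=1.65):
    dstar_sq = dstar_cubed(d, nu, delta) ** (2.0 / 3.0)
    return nu / d * (math.sqrt(25.0 + 1.2 * dstar_sq) - 5.0) ** 1.5


def ww06(d, nu=1.0e-6, delta=1.65, sf=0.8):
    m = 53.5 * math.exp(-0.65 * sf)
    n_coef = 5.65 * math.exp(-2.5 * sf)
    n = 0.7 + 0.9 * sf
    inner = (4.0 * n_coef * dstar_cubed(d, nu, delta) / (3.0 * m ** 2)) ** (1.0 / n)
    return m * nu / (n_coef * d) * (math.sqrt(0.25 + inner) - 0.5) ** n


def sh09(d, nu=1.0e-6, delta=1.65):
    d3 = dstar_cubed(d, nu, delta)
    coef, expo = (0.033, 0.963) if d3 ** (1.0 / 3.0) <= 10.0 else (0.51, 0.535)
    return coef * nu / d * d3 ** expo


def gc14(d, nu=1.0e-6, delta=1.65):
    dn = 1.1 * d  # back to nominal diameter, the inverse of D = D_n / 1.1
    numerator = 37.8 * delta * dn + 3780.0 * delta * dn ** 2
    denominator = 0.383 + 1.0e4 * delta * nu + 100.0 * delta ** 2 * dn
    return numerator / denominator
```

For a 1 mm quartz grain in water, all six functions return settling velocities on the order of 0.1 m/s, which gives a quick sanity check on the coefficients.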
Fig. 3. (Color) Performances of algebraic parametric equations on test data.

Fig. 4. (Color) Comparison of RF estimator and GC14 on entire database (756 measurements).

RMSE = $13.34 \times 10^{-2}$ m/s, and $R^2$ = 0.65). This indicates that the traditional process-based equations, derived from limited experimental data and not tested by new data, may fail to show their general reliability. Compared to WW06, the RF estimator improves the accuracy by 9.4% for sand samples and 48.6% for gravel samples according to the MAE values. Similarly, compared to GC14, which performs the best on gravel samples but the worst on sand samples, the RF estimator has an increased accuracy by 44.2% and 17.8% on sand and gravel classes, respectively. In general, the RF estimator outperforms the other six algebraic equations and shows consistent accuracy across different sediment classes.

Model Sensitivity Testing

To evaluate the sensitivity of the RF estimator and the six equations to each input variable, three sensitivity scenarios are completed with each scenario testing one feature.

Sensitivity to Sediment Particle Size

Sediment particle size is typically characterized by laboratory sieve analysis (e.g., BSI 1967, 1986). Assuming a 10% error for grain size ($D$ error) in sieve analysis, three groups of sediment are selected, with the average $D$ as $10^{-4}$ m (fine sand), $10^{-3}$ m (coarse sand), and $10^{-2}$ m (pebbles), respectively. Then the values of $w$ are calculated for each group using the RF estimator and the six equations, assuming constant nominal values of $\nu = 1.0 \times 10^{-6}$ m²/s and $\Delta = 1.65$. Model results are represented by the maximum, minimum, and mean estimate values of each sediment group and their standard deviation ($\sigma$) in Fig. 5, where abscissa labels are model names (Table 3), the black error bars indicate the standard deviation of the estimate, the red error bars indicate the estimate ranges between the minimum and maximum values, and the black dots along the bars indicate the mean values of each estimate group. Both the RF estimator and GC14 are sensitive to the grain size variations from fine sand to pebbles, with the highest sensitivity on the coarse sand. The remaining five equations show high sensitivity for coarse sand and pebbles, but less for fine sand. GC14 overestimates $w$ when the grain size is small [Fig. 5(a)], indicated by the estimate range (red bar) not overlapping with any other model results. This is consistent with the findings of Goldstein and Coco (2014).

Sensitivity to Kinematic Viscosity and Submerged Specific Gravity

Twenty-one different fluid kinematic viscosity $\nu$ values are selected with constant intervals from the range of $10^{-6}$ m²/s to $10^{-5}$ m²/s. Constant nominal values of $D = 10^{-3}$ m and $\Delta = 1.65$ are defined for each $\nu$ value. For the sensitivity test against $\Delta$, 15 different $\Delta$ values are selected from [1, 3], with $\nu$ and $D$ assigned to be $10^{-6}$ m²/s and $10^{-3}$ m, respectively. Model sensitivity results to $\nu$ and $\Delta$ are shown in Fig. 6. Fig. 6(a) shows that the RF estimator and GC14 are not sensitive to $\nu$, but the other five equations are strongly sensitive. Sensitivity results for the specific gravity $\Delta$ are similar: the five process-based models are more sensitive to $\Delta$ than the RF estimator and GC14 [Fig. 6(b)].

Discussion

In contrast to the traditional process-based approach based on limited experimental data and mathematical/physical laws, the data-driven approach is inductive and relies on an aggregated database to develop insight, predictions, or relationships. The two distinctive approaches have intrinsic differences. Here the differences are addressed from two perspectives: (1) model sensitivity and (2) model scalability. The latter focuses on the improvement of our data-driven model with the addition of more data derived from sampling theoretical relationships.

Model Sensitivity Analysis

Sediment particles are sampled with a wide range of $D$, varying from $6.25 \times 10^{-5}$ m to about 0.03 m [Fig. 1(a)], across 323 different $D$ values in the database. The numbers of $\nu$ and $\Delta$ values,
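Table 4. Performances of models on test data

| Model name | Sand MAE^a | Sand RMSE^a | Sand $R^2$ | Gravel MAE^a | Gravel RMSE^a | Gravel $R^2$ | Sand and gravel MAE^a | Sand and gravel RMSE^a | Sand and gravel $R^2$ |
|---|---|---|---|---|---|---|---|---|---|
| RF | 0.87 | 1.21 | 0.96 | 5.31 | 7.07 | 0.92 | 2.42 | 4.28 | 0.97 |
| VR89 | 1.18 | 1.62 | 0.92 | 7.65 | 10.60 | 0.78 | 3.44 | 6.39 | 0.93 |
| SB97 | 1.36 | 1.93 | 0.89 | 8.17 | 11.40 | 0.75 | 3.73 | 6.90 | 0.91 |
| CH97 | 1.12 | 1.76 | 0.91 | 7.95 | 10.62 | 0.78 | 3.51 | 6.43 | 0.92 |
| WW06 | 0.96 | 1.35 | 0.94 | 10.34 | 13.34 | 0.65 | 4.23 | 7.96 | 0.88 |
| SH09 | 1.40 | 2.10 | 0.86 | 10.47 | 13.51 | 0.64 | 4.56 | 8.16 | 0.88 |
| GC14 | 1.56 | 2.23 | 0.85 | 6.67 | 8.60 | 0.86 | 3.34 | 5.39 | 0.95 |

Note: Bold denotes lowest two error values.
a In units of $10^{-2}$ m/s.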
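The accuracy gains quoted in the text can be recomputed from Table 4 as relative error reductions, $(e_{ref} - e_{RF})/e_{ref}$. The short sketch below does this for the rows involved; the helper name and dictionary layout are hypothetical. Under this reading, the MAE columns reproduce the 9.4%, 48.6%, and 44.2% figures, while the 17.8% quoted for gravel relative to GC14 appears to match the RMSE column (8.60 versus 7.07) rather than the MAE column, which would give about 20.4%.

```python
def relative_improvement(reference_error, rf_error):
    """Percentage reduction in error of the RF estimator versus a reference."""
    return 100.0 * (reference_error - rf_error) / reference_error


# Test-data errors copied from Table 4, in units of 1e-2 m/s.
MAE = {
    "RF": {"sand": 0.87, "gravel": 5.31},
    "WW06": {"sand": 0.96, "gravel": 10.34},
    "GC14": {"sand": 1.56, "gravel": 6.67},
}
RMSE_GRAVEL = {"RF": 7.07, "GC14": 8.60}

for name in ("WW06", "GC14"):
    for sediment_class in ("sand", "gravel"):
        gain = relative_improvement(MAE[name][sediment_class],
                                    MAE["RF"][sediment_class])
        print(f"RF vs {name}, {sediment_class}, MAE: {gain:.1f}%")

gain_rmse = relative_improvement(RMSE_GRAVEL["GC14"], RMSE_GRAVEL["RF"])
print(f"RF vs GC14, gravel, RMSE: {gain_rmse:.1f}%")
```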

Fig. 5. (Color) Model sensitivity to different grain sizes: (a) fine sand; (b) coarse sand; and (c) pebbles.
however, are much smaller. There are only 48 different $\nu$ values and 70 different $\Delta$ values in the database. More specifically, $\nu$ and $\Delta$ can be effectively clustered into single groups [Figs. 1(b and c)]: 646 out of the 756 samples (85.4%) have $\nu$ values around $10^{-6}$ m²/s ($\log_{10} \nu \in [-6.1, -5.9]$); more than 70.7% of the $\Delta$ values fall between 1.45 and 1.85, with 295 values equal to 1.65. This explains why the RF estimator (and GC14) is sensitive to $D$ but not to either $\nu$ or $\Delta$.

In fact, the database is largely composed of data on quartz particles falling in water. This is the most common engineering condition in the real world: water kinematic viscosity is around $10^{-6}$ m²/s, and most sediment particles are quartz or similar minerals with a characteristic density of 2,650 kg/m³; water density is around 1,000 kg/m³; and the specific gravity $\Delta = 1.65$ is the typical value in sediment transport studies (Soulsby 1997). The data-driven model's sensitivity does not affect the model application in the real world. For example, Farrell and Sherman (2015) compiled a database from all five of the existing sediment settling velocity experimental data sets in air, in which all the measurements share the same air kinematic viscosity ($1.5 \times 10^{-5}$ m²/s) and submerged specific density (2,207). However, from that database they derived a linear equation of settling velocity in air with respect to only grain size, which outperforms eight previous equations with more independent variables (e.g., $\nu$ and $\Delta$) and more complex formulations [Table 5 in Farrell and Sherman (2015)].

The proposed model's sensitivity testing demonstrates intrinsic differences between a data-driven model and a process-based model. The key capability of a process-based model is its transparency in physics: each variable has a distinct role in determining the physical process. For example, $D$ determines the volume and surface area of a particle, $\nu$ determines the drag forces on settling particles, and $\Delta$ determines the relative gravity of particles in fluid. These variables largely control the particle settling process, with their contributions to the settling velocity represented as distinct terms in the equations (Table 3). The data-driven model, however, is an inductive approach that embeds theory and logic within the

Fig. 6. (Color) Model sensitivity results: (a) $\nu \in [10^{-6}, 10^{-5}]$; and (b) $\Delta \in [1, 3]$.
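The viscosity and specific gravity sweeps behind Fig. 6 can be set up as follows. This is a sketch under stated assumptions: it evaluates only the SB97 equation from Table 3 as a stand-in for any of the models, it takes $g = 9.81$ m/s², and it spaces the 15 $\Delta$ values evenly over [1, 3], which the text does not state explicitly. The helper names are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2); the value is assumed here


def sb97(d, nu, delta):
    """Soulsby (1997) settling velocity, following the form given in Table 3."""
    dstar_cubed = delta * G * d ** 3 / nu ** 2
    return nu / d * (math.sqrt(10.36 ** 2 + 1.049 * dstar_cubed) - 10.36)


def linspace(start, stop, count):
    """Evenly spaced values from start to stop, both endpoints included."""
    step = (stop - start) / (count - 1)
    return [start + i * step for i in range(count)]


def relative_spread(values):
    """(max - min) / mean, a simple scalar summary of model sensitivity."""
    return (max(values) - min(values)) / (sum(values) / len(values))


nu_values = linspace(1.0e-6, 1.0e-5, 21)   # 21 viscosities at constant interval
delta_values = linspace(1.0, 3.0, 15)      # 15 specific gravities (even spacing assumed)

# One variable is swept while the other two stay at their nominal values.
w_vs_nu = [sb97(1.0e-3, nu, 1.65) for nu in nu_values]
w_vs_delta = [sb97(1.0e-3, 1.0e-6, delta) for delta in delta_values]

print("relative spread over nu:", relative_spread(w_vs_nu))
print("relative spread over delta:", relative_spread(w_vs_delta))
```

A process-based equation such as SB97 responds monotonically to both variables, which is the behavior the text contrasts with the flat response of the RF estimator and GC14.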
Table 5. Averaged IA values of models in different and overall R bands

IA values in various $\log_{10}(R)$ bands

| Model | <0.5 | 0.5–1.0 | 1.0–1.5 | 1.5–2.0 | 2.0–2.5 | 2.5–3.0 | 3.0–3.5 | >3.5 | Overall average |
|---|---|---|---|---|---|---|---|---|---|
| VR89 | 0.95 | 0.90 | 0.93 | 0.96 | 0.89 | 0.95 | 0.99 | 0.98 | 0.96 |
| SB97 | 0.95 | 0.88 | 0.91 | 0.94 | 0.99 | 0.96 | 0.93 | 0.90 | 0.93 |
| CH97 | 0.78 | 0.83 | 0.85 | 0.93 | 0.96 | 0.99 | 0.99 | 0.99 | 0.97 |
| WW06 | 0.83 | 0.91 | 0.96 | 0.94 | 0.89 | 0.85 | 0.85 | 0.86 | 0.86 |
| SH09 | 0.65 | 0.81 | 0.95 | 0.96 | 0.92 | 0.95 | 0.97 | 0.93 | 0.95 |
| GC14 | 0.33 | 0.76 | 0.96 | 0.95 | 0.91 | 0.91 | 0.91 | 0.92 | 0.89 |
| RF | 0.87 | 0.95 | 0.96 | 0.97 | 0.90 | 0.92 | 0.88 | 0.84 | 0.91 |
| RF_update | 0.89 | 0.95 | 0.97 | 0.98 | 0.97 | 0.99 | 0.99 | 0.99 | 0.99 |

Note: The highest values in each column are bolded.
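The IA values in Table 5 compare each estimate with an ensemble-averaged theoretical velocity: for one particle, IA is 1 minus the absolute deviation of an estimate from the ensemble mean, divided by that mean. The sketch below illustrates the idea with the standard library only. It is not the study's implementation: it averages three of the five process-based equations (SB97, CH97, SH09) to stay short, assumes $g = 9.81$ m/s², and uses a hypothetical fixed random seed for the 10% subsample that would be added to the training set.

```python
import math
import random

G = 9.81      # gravitational acceleration (m/s^2); assumed value
NU = 1.0e-6   # kinematic viscosity of water (m^2/s)
DELTA = 1.65  # submerged specific gravity of quartz sediment


def _dstar_cubed(d):
    return DELTA * G * d ** 3 / NU ** 2


def sb97(d):
    return NU / d * (math.sqrt(10.36 ** 2 + 1.049 * _dstar_cubed(d)) - 10.36)


def ch97(d):
    return NU / d * (math.sqrt(25.0 + 1.2 * _dstar_cubed(d) ** (2.0 / 3.0)) - 5.0) ** 1.5


def sh09(d):
    d3 = _dstar_cubed(d)
    coef, expo = (0.033, 0.963) if d3 ** (1.0 / 3.0) <= 10.0 else (0.51, 0.535)
    return coef * NU / d * d3 ** expo


# The study averages five process-based equations; three are used here for brevity.
EQUATIONS = {"SB97": sb97, "CH97": ch97, "SH09": sh09}


def theoretical_velocity(d):
    """Ensemble mean of the process-based estimates for one grain size."""
    return sum(eq(d) for eq in EQUATIONS.values()) / len(EQUATIONS)


def individual_accuracy(w_avg, w_est):
    """IA = 1 - |w_avg - w_est| / w_avg; values lie in (-inf, 1]."""
    return 1.0 - abs(w_avg - w_est) / w_avg


# 1,000 grain sizes evenly distributed over [1e-4, 1e-2] m.
diameters = [1.0e-4 + i * (1.0e-2 - 1.0e-4) / 999 for i in range(1000)]
theory = [(d, theoretical_velocity(d)) for d in diameters]

# 10% of the theoretical data (100 samples) would be added to the training set.
rng = random.Random(0)  # fixed seed chosen only for reproducibility
added_to_training = rng.sample(theory, 100)

ia_sb97 = [individual_accuracy(w_avg, sb97(d)) for d, w_avg in theory]
print("mean IA of SB97 against the ensemble:", sum(ia_sb97) / len(ia_sb97))
```

Because IA is normalized by the ensemble mean, an estimate that is twice the mean scores 0 and larger overestimates score below 0, which is why the metric is unbounded from below.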
model implicitly. The development of the data-driven model relies on the sufficiency and accuracy of the database, but it is not susceptible to overreductionism via conceptual assumptions. One consequence is that the data-driven model might be applicable only within the range of the data used to develop it. But this limitation also applies to the process-based models using empirical descriptions for complex physical processes, although it is rarely mentioned (Goldstein et al. 2019).

Improvement of RF Estimator with Theoretical Data

The largest deficiency of the RF estimator is that it does not have theoretical knowledge of sediment settling processes as do process-based equations. One solution is to have the RF estimator trained on data from theoretically derived predictions. Under the inspiration of Sadat-Helbar et al. (2009), an ensemble learning method is introduced to generate the theoretical data from the five process-based equations in Table 3. This will incorporate theoretical knowledge of the settling process from the predicted data with reliable accuracy, and the ensemble learning method will integrate five process-based equations and reduce the uncertainty of each individual estimate (e.g., Mendes-Moreira et al. 2012). The ensemble learning procedure is implemented as follows:
1. Select 1,000 sediment particles with fixed density ($\rho_s = 2{,}650$ kg/m³) but varying grain size ($D$ values that are evenly distributed in the range $[10^{-4}, 10^{-2}]$ m).
2. Estimate the settling velocities of the particles in water (assume $\nu = 10^{-6}$ m²/s and $\Delta = 1.65$) using each of the five process-based equations.

3. Average the five estimates of each particle to generate the theoretical data.

As a scalable model with no fixed formulation, the RF estimator has an easily implemented capability to self-update given more data. The RF estimator is retrained and updated by randomly selecting 10% (100 samples) of the new theoretical data and adding them to the original training data set. The updated RF estimator (RF_update hereafter) is then tested on the remaining 90% of the data using precisely the same approach and observational data used to test the RF model. To compare the model performance, an individual accuracy (IA) metric is introduced:

$$\mathrm{IA}_{ij} = 1 - \frac{\left| w^{i}_{\mathrm{avg}} - w^{i}_{\mathrm{est},j} \right|}{w^{i}_{\mathrm{avg}}} \qquad (5)$$

where $\mathrm{IA}_{ij}$ = accuracy of the jth process-based equation in estimating the individual ith sample in the new theoretical data; and $w^{i}_{\mathrm{avg}}$ and $w^{i}_{\mathrm{est},j}$ = averaged theoretical value and estimated value of sediment settling velocity of the ith sample by the jth equation, respectively. The $\mathrm{IA}_{ij}$ value varies in the range (−∞, 1]. An IA closer to 1 indicates a better estimate of the model.

IA variations with respect to particle Reynolds numbers (R = wD/ν) for each model/equation are shown in Fig. 7. IA values in the range [0.7, 1] are highlighted, and the minimum values are provided for some equation curves (SH09 and GC14) that are not fully shown in the figure. Most models have IA values in the range [0.8, 1], but no model performs consistently well on all R bands. GC14 significantly underestimates the settling velocity when R is small (for the case of very fine sand), as noted by Goldstein and Coco (2014). The RF estimator has the most fluctuations of IA due to the ensemble of decision trees in RF, but these fluctuations are predominantly within the range of the overall IA values as shown in Fig. 7. The performance of RF_update is greatly improved by integrating the theoretical data, especially at R > 10². The fluctuations in the RF estimator are modulated in the RF_update by incorporation of the theoretical data. Table 5 lists averaged IA values of each model on different and overall R bands. Process-based equations are, as expected, very accurate in estimating the new test data generated from them, especially VR89 and CH97. The RF estimator and GC14 do not outperform the process-based equations on this new test data, mainly because they do not have the theoretical knowledge of the sediment settling process that the process-based equations do. Compared with the RF estimator and the other algebraic equations, the RF_update shows a significant improvement in estimating the theoretical data, with slightly low IA values for the first R band [log₁₀(R) < 0.5] but with a near-optimal value for the remaining bands. In addition, the RF_update is applied to the original training and test data, and the results show that the RF_update performs nearly the same as the RF estimator on the original training data, and slightly better on the original test data (MAE = 2.31 × 10⁻² m/s, RMSE = 4.11 × 10⁻² m/s, and R² = 0.97 for RF_update versus MAE = 2.42 × 10⁻² m/s, RMSE = 4.11 × 10⁻² m/s, and R² = 0.97 for the RF estimator). This indicates that the RF_update does not lose experimental data knowledge when trained using the theoretical knowledge from the new data. The RF_update model learns from both the theory and the data. The overall accuracy and consistency of the RF_update are the best among the eight models. This indicates that the RF_update has learned from the available sediment settling knowledge contained within the experimental and theoretical data and has reduced the estimation uncertainty. This improvement is important because uncertainty is the main issue in reliable prediction for sediment transport applications (Yang 2013).

Fig. 7. (Color) IA variations in various models with respect to R from theoretical data.

The easily implemented, straightforward, self-updating capability is an advantage of the nonparametric data-driven model with no algebraic formulation. Although process-based equations are more transparent in terms of physical interpretation than the data-driven models, a deterministic mathematical expression of the complex settling physics appears impractical. GC14 should also be able to assimilate the new data for better estimates, but the manually predefined mathematical operations constrain the model to the limited knowledge of sediment settling physics, although it does provide an algebraic equation that the RF estimator cannot. However, the benefits of the fully data-driven approach appear to outweigh this disadvantage by producing an accurate and consistent sediment settling velocity estimate across various physical regimes because it is flexible and comprehensive in integrating aggregated experimental and theoretical knowledge to express the particle settling process inductively.

Further Sediment Settling Velocity Modeling Considerations

The real-world sediment settling process is much more complex than the simplified conditions typically addressed in the literature and this study. For instance, particle shape affects sediment settling velocity (e.g., Corey 1949; Wilde 1952; Dietrich 1982; Camenen 2007), but in the database the particle shape is only denoted by a particle diameter. This may limit the model application to spherical sediment particles only. The effects of flocculation of cohesive sediment (e.g., Winterwerp 2002), sediment suspension (e.g., Baldock et al. 2004), and mutual particle interactions (e.g., Imai 1980; El-Nahhas et al. 2009) also influence the sediment settling process. But these complex real-world effects are not explicitly considered within the present database. Additionally, some physical processes in the laboratory experiments, such as wall effects (e.g., Brown and Lawler 2003), that are absent in the real world may introduce error into the experimental data and are not considered or calibrated for in the database.

The model developed in this study is accurate within these constraints and is readily extensible subject to the availability of additional data, but this comes at the cost of losing a simplified framework for computation and direct theoretical underpinnings. This makes the model difficult to understand and more costly to compute as compared to an algebraic equation. However, the complexity of the problem and the limitations of existing approaches as illustrated here suggest value in this framework to provide better estimates of sediment settling velocity. Minimally, there are two key benefits: (1) the results obtained with the RF_update model can be used to better select input values for sediment settling computations and (2) results suggest parameter ranges that additional laboratory or theoretical exploration should address to improve understanding of sediment settling processes given the accuracy of existing estimations relative to the integrated knowledge assimilated in the RF_update model.
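The ensemble procedure and the IA metric of Eq. (5), described under "Improvement of RF Estimator with Theoretical Data", can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' code: this section does not restate the five process-based equations, so three widely quoted formulas from the reference list (Cheng 1997; Soulsby 1997; Ferguson and Church 2004, here with C1 = 18 and C2 = 1.0) stand in for them; "evenly distributed" is read as linear spacing; and all function and variable names are hypothetical.

```python
import math
import random

G = 9.81       # gravitational acceleration (m/s^2)
NU = 1.0e-6    # kinematic viscosity of water (m^2/s), as assumed in step 2
DELTA = 1.65   # submerged specific gravity, as assumed in step 2


def d_star(d):
    """Dimensionless grain size D* = (Delta * g / nu^2)^(1/3) * D."""
    return (DELTA * G / NU**2) ** (1.0 / 3.0) * d


def cheng_1997(d):
    """Cheng (1997): w = (nu / D) * (sqrt(25 + 1.2 D*^2) - 5)^1.5."""
    return NU / d * (math.sqrt(25.0 + 1.2 * d_star(d) ** 2) - 5.0) ** 1.5


def soulsby_1997(d):
    """Soulsby (1997): w = (nu / D) * (sqrt(10.36^2 + 1.049 D*^3) - 10.36)."""
    return NU / d * (math.sqrt(10.36**2 + 1.049 * d_star(d) ** 3) - 10.36)


def ferguson_church_2004(d, c1=18.0, c2=1.0):
    """Ferguson and Church (2004); c1 and c2 are assumed values."""
    return DELTA * G * d**2 / (c1 * NU + math.sqrt(0.75 * c2 * DELTA * G * d**3))


# Stand-in ensemble (an assumption; the study uses five equations).
EQUATIONS = {
    "CH97": cheng_1997,
    "SO97": soulsby_1997,
    "FC04": ferguson_church_2004,
}

# Step 1: 1,000 grain sizes, linearly spaced in [1e-4, 1e-2] m.
n = 1000
d_min, d_max = 1.0e-4, 1.0e-2
diameters = [d_min + i * (d_max - d_min) / (n - 1) for i in range(n)]

# Step 2: settling velocity of every particle from every equation.
estimates = {name: [eq(d) for d in diameters] for name, eq in EQUATIONS.items()}

# Step 3: ensemble average per particle gives the theoretical data.
w_avg = [
    sum(estimates[name][i] for name in EQUATIONS) / len(EQUATIONS)
    for i in range(n)
]

# Eq. (5): individual accuracy of equation j on sample i.
ia = {
    name: [1.0 - abs(w_avg[i] - w[i]) / w_avg[i] for i in range(n)]
    for name, w in estimates.items()
}

# Random 10% of the theoretical data is set aside to augment the training
# set; the remaining 90% is held out for testing.
rng = random.Random(0)  # fixed seed only to make the sketch repeatable
update_idx = set(rng.sample(range(n), n // 10))
update_set = [(diameters[i], w_avg[i]) for i in sorted(update_idx)]
test_set = [(diameters[i], w_avg[i]) for i in range(n) if i not in update_idx]
```

In a full workflow, `update_set` would be appended to the experimental training data before refitting the random forest, and the same IA calculation would be applied to the refitted model's predictions on `test_set`.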
Conclusion

In this study, a new nonparametric data-driven model is developed for noncohesive sediment settling velocity in water based on an aggregated multisource sediment settling database. Cross-comparisons between the new model and six algebraic equations demonstrate the differences between the process-based and data-driven models: (1) the nonparametric data-driven model, without any theoretical and mathematical constraints, estimates sediment settling velocities with higher accuracy and better consistency under various physical regimes; (2) the data-driven model's sensitivity largely depends on the training database, and the process-based equations are sensitive to all variables controlling the settling process; and (3) the nonparametric data-driven model has an easily implemented self-updating capability given more data. By integrating both experimental and theoretical knowledge of the sediment settling process, the updated model further enhances the model performance with improved estimate accuracy and consistency.

There are many complex natural processes with incompletely resolved physics in the hydraulic engineering research literature, such as sediment transport, turbulent flows, and water resource management. Deterministic mathematical frameworks for such complex processes appear intractable from the perspective of current process-based approaches. Alternatively, with the increase in available data and progress in ML techniques, data-driven approaches have been increasingly applied to complex hydraulic engineering research, for example, in sediment transport [e.g., a review in Goldstein et al. (2019)], in turbulent flows (e.g., Mohan et al. 2019), and in water resource research (e.g., Najafzadeh et al. 2017; Granata et al. 2018). This study, using a nonparametric data-driven approach, not only demonstrates the capability of ML models to better identify the internal relationships present in data but also shows their potential in leveraging theoretical knowledge for further model improvement. In addition, it provides an alternative framework that may be applied to other hydraulic engineering challenges.

Data Availability Statement

The compiled data and Python code for this study can be obtained by sending a written request to the corresponding author.

Acknowledgments

The authors thank Drs. Goldstein and Coco for sharing data and for helpful discussions motivating this study. The authors would also like to thank the editors and three anonymous reviewers for their valuable comments that helped improve the manuscript. Funding for this study was provided under the Los Alamos National Laboratory Research and Development Directed Research project “Adaption Science for Complex Natural-Engineered Systems” (20180033DR). This publication has been supported by the Los Alamos Laboratory Directed Research and Development project under LA-UR-20-22942.

References

Ahrens, J. P. 2000. “A fall-velocity equation.” J. Waterway, Port, Coastal, Ocean Eng. 126 (2): 99–102. https://doi.org/10.1061/(ASCE)0733-950X(2000)126:2(99).
Alger, G. 1964. “Terminal fall velocity of particles of irregular shapes as affected by surface area.” Ph.D. dissertation, Dept. of Civil Engineering, Colorado State Univ.
Baldock, T. E., M. R. Tomkins, P. Nielsen, and M. G. Hughes. 2004. “Settling velocity of sediments at high concentrations.” Coastal Eng. 51 (1): 91–100. https://doi.org/10.1016/j.coastaleng.2003.12.004.
Bhattacharya, B., R. K. Price, and D. P. Solomatine. 2007. “Machine learning approach to modeling sediment transport.” J. Hydraul. Eng. 133 (4): 440–450. https://doi.org/10.1061/(ASCE)0733-9429(2007)133:4(440).
Biau, G., and E. Scornet. 2016. “A random forest guided tour.” Test 25 (2): 197–227. https://doi.org/10.1007/s11749-016-0481-7.
Breiman, L. 2001. “Random forests.” Mach. Learn. 45 (1): 5–32. https://doi.org/10.1023/A:1010933404324.
Briggs, L. I., D. S. McCulloch, and F. Moser. 1962. “The sand particles.” J. Sediment. Res. 32 (4): 645–656. https://doi.org/10.1306/74D70D44-2B21-11D7-8648000102C1865D.
Brown, P. P., and D. F. Lawler. 2003. “Sphere drag and settling velocity revisited.” J. Environ. Eng. 129 (3): 222–231. https://doi.org/10.1061/(ASCE)0733-9372(2003)129:3(222).
BSI (British Standards Institution). 1967. Methods of testing soils for civil engineering purposes. BS 1377:1967. London: BSI.
BSI (British Standards Institution). 1986. British standard specification for test sieves. BS 410:1986. London: BSI.
Camenen, B. 2007. “Simple and general formula for the settling velocity of particles.” J. Hydraul. Eng. 133 (2): 229–233. https://doi.org/10.1061/(ASCE)0733-9429(2007)133:2(229).
Chen, S., and C. Hu. 2017. “Estimating sea surface salinity in the northern Gulf of Mexico from satellite ocean color measurements.” Remote Sens. Environ. 201 (Nov): 115–132. https://doi.org/10.1016/j.rse.2017.09.004.
Chen, S., C. Hu, B. B. Barnes, R. Wanninkhof, W. J. Cai, L. Barbero, and D. Pierrot. 2019. “A machine learning approach to estimate surface ocean pCO2 from satellite measurements.” Remote Sens. Environ. 228 (Jul): 203–226. https://doi.org/10.1016/j.rse.2019.04.019.
Cheng, N.-S. 1997. “Simplified settling velocity formula for sediment particle.” J. Hydraul. Eng. 123 (2): 149–152. https://doi.org/10.1061/(ASCE)0733-9429(1997)123:2(149).
Corey, A. T. 1949. “Influence of shape on the fall velocity of sand grains.” M.S. thesis, Irrigation Engineering, Colorado Agricultural and Mechanical College.
Cutler, D. R., T. C. Edwards Jr., K. H. Beard, A. Cutler, K. T. Hess, J. Gibson, and J. J. Lawler. 2007. “Random forests for classification in ecology.” Ecology 88 (11): 2783–2792. https://doi.org/10.1890/07-0539.1.
Dietrich, W. E. 1982. “Settling velocity of natural particles.” Water Resour. Res. 18 (6): 1615–1626. https://doi.org/10.1029/WR018i006p01615.
Donatelli, C., N. K. Ganju, S. Fagherazzi, and N. Leonardi. 2018. “Seagrass impact on sediment exchange between tidal flats and salt marsh, and the sediment budget of shallow bays.” Geophys. Res. Lett. 45 (10): 4933–4943. https://doi.org/10.1029/2018GL078056.
El-Nahhas, K., N. G. El-Hak, M. A. Rayan, and I. El-Sawaf. 2009. “Effect of particle size distribution on the hydraulic transport of settling slurries.” In Proc., 13th Int. Water Technology Conf., IWTC13. Hurghada, Egypt: International Water Technology Conference.
Fagherazzi, S., P. L. Wiberg, S. Temmerman, E. Struyf, Y. Zhao, and P. A. Raymond. 2013. “Fluxes of water, sediments, and biogeochemical compounds in salt marshes.” Ecol. Processes 2 (1): 3. https://doi.org/10.1186/2192-1709-2-3.
Farrell, E. J., and D. J. Sherman. 2015. “A new relationship between grain size and fall (settling) velocity in air.” Prog. Phys. Geogr. 39 (3): 361–387. https://doi.org/10.1177/0309133314562442.
Ferguson, R., and M. Church. 2004. “A simple universal equation for grain settling velocity.” J. Sediment. Res. 74 (6): 933–937. https://doi.org/10.1306/051204740933.
Fredsoe, J., and R. Deigaard. 1992. “Advanced series on ocean engineering.” In Vol. 3 of Mechanics of coastal sediment transport. Singapore: World Scientific.
Ganju, N. K., D. H. Schoellhamer, and B. E. Jaffe. 2009. “Hindcasting of decadal-timescale estuarine bathymetric change with a tidal-timescale model.” J. Geophys. Res. Earth Surf. 114 (F4): F04019. https://doi.org/10.1029/2008JF001191.
Goldstein, E. B., and G. Coco. 2014. “A machine learning approach for the prediction of settling velocity.” Water Resour. Res. 50 (4): 3595–3601. https://doi.org/10.1002/2013WR015116.
Goldstein, E. B., G. Coco, and N. G. Plant. 2019. “A review of machine learning applications to coastal sediment transport and morphodynamics.” Earth Sci. Rev. 194 (Jul): 97–108. https://doi.org/10.1016/j.earscirev.2019.04.022.
Goldstein, S. 1929. “The steady flow of viscous fluid past a fixed spherical obstacle at small Reynolds numbers.” Proc. R. Soc. London, Ser. A Math. Phys. Eng. Sci. 123 (791): 225–235. https://doi.org/10.1098/rspa.1929.0067.
Granata, F., M. Saroli, G. de Marinis, and R. Gargano. 2018. “Machine learning models for spring discharge forecasting.” Geofluids 2018: 1–13. https://doi.org/10.1155/2018/8328167.
Hallermeier, R. J. 1981. “Terminal settling velocity of commonly occurring sand grains.” Sedimentology 28 (6): 859–865. https://doi.org/10.1111/j.1365-3091.1981.tb01948.x.
Huijts, K. M. H., H. M. Schuttelaars, H. E. De Swart, and A. Valle-Levinson. 2006. “Lateral entrapment of sediment in tidal estuaries: An idealized model study.” J. Geophys. Res. Oceans 111 (C12): C12016. https://doi.org/10.1029/2006JC003615.
Imai, G. 1980. “Settling behavior of clay suspension.” Soils Found. 20 (2): 61–77. https://doi.org/10.3208/sandf1972.20.2_61.
Jiménez, J. A., and O. S. Madsen. 2003. “A simple formula to estimate settling velocity of natural sediments.” J. Waterway, Port, Coastal, Ocean Eng. 129 (2): 70–78. https://doi.org/10.1061/(ASCE)0733-950X(2003)129:2(70).
Kane, M. J., N. Price, M. Scotch, and P. Rabinowitz. 2014. “Comparison of ARIMA and random forest time series models for prediction of avian influenza H5N1 outbreaks.” BMC Bioinf. 15 (1): 276. https://doi.org/10.1186/1471-2105-15-276.
Kohavi, R. 1995. “A study of cross-validation and bootstrap for accuracy estimation and model selection.” IJCAI 14 (2): 1137–1145.
Komar, P. D., and C. Reimers. 1978. “Grain shape effects on settling rates.” J. Geol. 86 (2): 193–209. https://doi.org/10.1086/649674.
Lerman, P. 1980. “Fitting segmented regression models by grid search.” J. R. Stat. Soc. Ser. C Appl. Stat. 29 (1): 77–84. https://doi.org/10.2307/2346413.
Liaw, A., and M. Wiener. 2002. “Classification and regression by random forest.” R News 2 (3): 18–22.
Liu, M., M. Wang, J. Wang, and D. Li. 2013. “Comparison of random forest, support vector machine and back propagation neural network for electronic tongue data classification: Application to the recognition of orange beverage and Chinese vinegar.” Sens. Actuators, B 177 (Feb): 970–980. https://doi.org/10.1016/j.snb.2012.11.071.
Mendes-Moreira, J., C. Soares, A. M. Jorge, and J. F. D. Sousa. 2012. “Ensemble approaches for regression: A survey.” ACM Comput. Surv. 45 (1): 1–40. https://doi.org/10.1145/2379776.2379786.
Mohan, A., D. Daniel, M. Chertkov, and D. Livescu. 2019. “Compressed convolutional LSTM: An efficient deep learning framework to model high fidelity 3D turbulence.” Preprint, submitted February 28, 2019. https://arxiv.org/abs/1903.00033.
Munson, B. R., D. F. Young, and T. H. Okiishi. 2006. Fundamentals of fluid mechanics. New York: Wiley.
Najafzadeh, M., A. Tafarojnoruz, and S. Y. Lim. 2017. “Prediction of local scour depth downstream of sluice gates using data-driven models.” ISH J. Hydraul. Eng. 23 (2): 195–202. https://doi.org/10.1080/09715010.2017.1286614.
Nasiha, H. J., and P. Shanmugam. 2018. “Estimation of settling velocity of sediment particles in estuarine and coastal waters.” Estuarine Coastal Shelf Sci. 203 (Apr): 59–71. https://doi.org/10.1016/j.ecss.2018.02.001.
Nielsen, P. 1986. “Suspended sediment concentrations under waves.” Coastal Eng. 10 (1): 23–31. https://doi.org/10.1016/0378-3839(86)90037-2.
Nielsen, P. 1993. “Turbulence effects on the settling of suspended particles.” J. Sediment. Res. 63 (5): 835–838.
Olabarrieta, M., W. R. Geyer, G. Coco, C. T. Friedrichs, and Z. Cao. 2018. “Effects of density-driven flows on the long-term morphodynamic evolution of funnel-shaped estuaries.” J. Geophys. Res. Earth Surf. 123 (11): 2901–2924. https://doi.org/10.1029/2017JF004527.
Oseen, C. W. 1927. Vol. 1 of Hydrodynamik. Leipzig, Germany: Akad. Verl. Ges.
Paphitis, D., M. B. Collins, L. A. Nash, and S. Wallbridge. 2002. “Settling velocities and entrainment thresholds of biogenic sands (shell fragments) under unidirectional flow.” Sedimentology 49 (1): 211–225. https://doi.org/10.1046/j.1365-3091.2002.00446.x.
Pedregosa, F., et al. 2011. “Scikit-learn: Machine learning in Python.” J. Mach. Learn. Res. 12: 2825–2830.
Raschka, S., and V. Mirjalili. 2017. Python machine learning. Birmingham, UK: Packt Publishing.
Raudkivi, A. J. 1990. Loose boundary hydraulics. 3rd ed. Oxford, UK: Pergamon.
Rubey, W. W. 1933. “Settling velocity of gravel, sand, and silt particles.” Am. J. Sci. 25 (148): 325–338. https://doi.org/10.2475/ajs.s5-25.148.325.
Sadat-Helbar, S. M., E. Amiri-Tokaldany, S. Darby, and A. Shafaie. 2009. “Fall velocity of sediment particles.” In Proc., 4th IASME/WSEAS Int. Conf. on Water Resources, Hydraulics and Hydrology, WHH’09. Cambridge, UK: WSEAS Press.
Schulz, S. E., R. H. Wilde, and M. L. Albertson. 1954. Influence of shape on the fall velocity of sedimentary particles. M.R.D Sediment Series No. 5. Omaha, NE: USACE.
Segal, M. R. 2004. Machine learning benchmarks and random forest regression. Technical Rep. San Francisco: Center for Bioinformatics & Molecular Biostatistics, Univ. of California.
She, K., L. Trim, and D. Pope. 2005. “Fall velocities of natural sediment particles: A simple mathematical presentation of the fall velocity law.” J. Hydraul. Res. 43 (2): 189–195. https://doi.org/10.1080/00221686.2005.9641235.
Shotton, J., T. Sharp, A. Kipman, A. Fitzgibbon, M. Finocchio, A. Blake, M. Cook, and R. Moore. 2013. “Real-time human pose recognition in parts from single depth images.” Commun. ACM 56 (1): 116–124. https://doi.org/10.1145/2398356.2398381.
Smith, D. A., and K. F. Cheung. 2003. “Settling characteristics of calcareous sand.” J. Hydraul. Eng. 129 (6): 479–483. https://doi.org/10.1061/(ASCE)0733-9429(2003)129:6(479).
Smith, P. F., S. Ganesh, and P. Liu. 2013. “A comparison of random forest regression and multiple linear regression for prediction in neuroscience.” J. Neurosci. Methods 220 (1): 85–91. https://doi.org/10.1016/j.jneumeth.2013.08.024.
Soulsby, R. 1997. Dynamics of marine sands: A manual for practical applications. London: Thomas Telford.
Stokes, G. G. 1851. Vol. 9 of On the effect of the internal friction of fluids on the motion of pendulums. Cambridge, UK: Pitt Press.
Summerfield, M. 2010. Programming in Python 3: A complete introduction to the Python language. Boston, MA: Addison-Wesley.
Svetnik, V., A. Liaw, C. Tong, J. C. Culberson, R. P. Sheridan, and B. P. Feuston. 2003. “Random forest: A classification and regression tool for compound classification and QSAR modeling.” J. Chem. Inf. Comput. Sci. 43 (6): 1947–1958. https://doi.org/10.1021/ci034160g.
Tyralis, H., and G. Papacharalampous. 2017. “Variable selection in time series forecasting using random forests.” Algorithms 10 (4): 114. https://doi.org/10.3390/a10040114.
US Inter-Agency Committee. 1957. Some fundamentals of particle size analysis: A study of methods used in measurement and analysis of sediment loads in streams. Rep. No. 12. Minneapolis: US Inter-Agency Committee on Water Resources.
Van Rijn, L. C. 1989. Handbook: Sediment transport by currents and waves. Delft, Netherlands: Delft Hydraulics Laboratory.
Verikas, A., A. Gelzinis, and M. Bacauskiene. 2011. “Mining data with random forests: A survey and results of new tests.” Pattern Recognit. 44 (2): 330–349. https://doi.org/10.1016/j.patcog.2010.08.011.
Warner, J. C., C. R. Sherwood, and W. R. Geyer. 2007. “Sensitivity of estuarine turbidity maximum to settling velocity, tidal mixing, and sediment supply.” In Estuarine and coastal fine sediments dynamics, edited by J. P.-Y. Maa, L. P. Sanford, and D. H. Schoellhamer, 355–376. Amsterdam: Elsevier.
Warner, J. C., C. R. Sherwood, R. P. Signell, C. K. Harris, and H. G. Arango. 2008. “Development of a three-dimensional, regional, coupled wave, current, and sediment-transport model.” Comput. Geosci. 34 (10): 1284–1306. https://doi.org/10.1016/j.cageo.2008.02.012.
Watts, I. M., and G. A. Zarillo. 2019. “Fall velocity determination of individual grain size classes in a carbonate rich environment: Implications for numerical modeling.” In Proc., 9th Int. Conf. on Coastal Sediments, 1029–1040. Singapore: World Scientific.
Wilde, R. H. 1952. “Effect of shape on the fall-velocity of sand-sized particles.” M.S. thesis, Fort Collins, Colorado A & M College.
Winterwerp, J. C. 2002. “On the flocculation and settling velocity of estuarine mud.” Cont. Shelf Res. 22 (9): 1339–1360. https://doi.org/10.1016/S0278-4343(02)00010-9.
Wu, W., and S. S. Wang. 2006. “Formulas for sediment porosity and settling velocity.” J. Hydraul. Eng. 132 (8): 858–862. https://doi.org/10.1061/(ASCE)0733-9429(2006)132:8(858).
Yang, S. Q. 2013. “Why cannot sediment transport be accurately predicted.” In Proc., 35th World Congress of the International Association for Hydraulic Research, 1–10. Chengdu, China: International Association for Hydraulic Research.
Zhiyao, S., W. Tingting, X. Fumin, and L. Ruijie. 2008. “A simple formula for predicting settling velocity of sediment particles.” Water Sci. Eng. 1 (1): 37–43. https://doi.org/10.1016/S1674-2370(15)30017-X.
Zhou, Y., S. Li, C. Zhou, and H. Luo. 2019. “Intelligent approach based on random forest for safety risk prediction of deep foundation pit in subway stations.” J. Comput. Civ. Eng. 33 (1): 05018004. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000796.