Front. Struct. Civ. Eng.

https://doi.org/10.1007/s11709-019-0591-x

RESEARCH ARTICLE

SPT based determination of undrained shear strength: Regression models and machine learning

Walid Khalid MBARAK (a), Esma Nur CINICIOGLU (b), Ozer CINICIOGLU (a,*)
(a) Department of Civil Engineering, Bogazici University, Istanbul 34342, Turkey
(b) School of Business, Quantitative Methods Division, Istanbul University, Istanbul 34320, Turkey
(*) Corresponding author. E-mail: ozer.cinicioglu@boun.edu.tr

© Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature 2019
ABSTRACT The purpose of this study is the accurate prediction of undrained shear strength using Standard
Penetration Test (SPT) results and soil consistency indices, such as water content and Atterberg limits. In this
study, along with the conventional methods of simple and multiple linear regression, three machine learning
approaches, random forest, gradient boosting and a stacked model, are developed for prediction of undrained shear
strength. These models are applied to a relatively large data set of 230 observations from different projects around
Turkey. As an improvement over the available studies in the literature, this study applies correct statistical
analysis techniques to this data set, such as a train/test split to avoid overfitting of the developed models.
Furthermore, the validity and consistency of the prediction results are ensured with the correct use of statistical
measures such as p-values and cross-validation, which were missing in previous studies. To compare the performances
of the models developed in this study with the prior ones in the literature, all models were applied to the test data
set and their performances were evaluated in terms of the resulting root mean squared error (RMSE) and coefficient of
determination (R²). The models developed in this study demonstrate superior prediction capabilities compared to all
of the prior studies. Moreover, to facilitate the use of machine learning algorithms for prediction purposes, the
entire source code prepared for this study and the collected data set are provided as supplements to this study.

KEYWORDS undrained shear strength, linear regression, random forest, gradient boosting, machine learning, standard
penetration test

Article history: Received Oct 18, 2018; Accepted Feb 25, 2019

1 Introduction

With the ever growing project sites, constrained budgets and reduced project deadlines, correlations in
geotechnical engineering together with field and laboratory tests have formed the basis of the design process.
Therefore, it is invaluable to provide the engineers with tools to obtain critical soil properties from the results
of commonly used practical field tests. However, most conventional field testing methods do not allow the direct
measurement of the necessary design parameters, and relating the results of field tests to the necessary design
parameters requires the use of empirical correlations. One such design parameter is undrained shear strength.

Undrained shear strength is an essential parameter in geotechnical design when cohesive soils are considered.
Stability calculations based on undrained strength of soils are both more practical and safer when the problem
involves rapid loading and positive excess pore water pressure generation. Undrained shear strength can be measured
with laboratory tests (e.g., unconfined compression test, triaxial test) and with the field vane shear test, which is
the only field test that can directly measure undrained shear strength. However, laboratory tests require high
quality undisturbed samples, increasing the economic burden on the project, and it is generally impossible to
conduct vane tests on most soil profiles. Therefore, a more practical and cost effective alternative for determining
undrained shear strength is to use statistical methods that explore the relationship between undrained shear
strength and field test results. This is an attractive alternative for design engineers, because reliable
predictions of undrained
shear strength can be made without the economic burden of laboratory testing. This fact has also been acknowledged
by previous studies [1–3], which use SPT results to predict undrained shear strength. However, the statistical
analyses conducted in most of the available works in the literature are preliminary in the sense that they use only
one independent variable to predict the dependent one, and fail to inform the reader regarding the number of
observations used and the level of significance of the results obtained. A statistical analysis conducted in that
manner will naturally involve ambiguities which may lead to underestimation or overestimation of undrained shear
strength [2]. Additionally, even though many varying equations are suggested in the literature, a performance
comparison of the suggested models with their preceding alternatives is mostly neglected. As will be further
detailed in Section 3 for prediction of undrained shear strength, this omission in turn leaves the validity of their
results open to discussion.

Additionally, the use of statistical analysis in geotechnical engineering is invaluable, since its application
promises to reveal the underlying mechanism of soil behavior. In this regard, there exist remarkable contributions on
how to treat the uncertainties existing in different domains of engineering with the use of sensitivity and
stochastic analyses [4–6]. On the other hand, the results revealed by a statistical analysis may only be considered
valid and applicable estimations of the underlying relationship if the analysis is conducted in a statistically
correct manner. If not, the resulting equation becomes nothing more than an invalid estimation which is destined to
fail when a new data set emerges. Therefore, as will be further discussed in the text, when conducting a statistical
analysis, it is the utmost duty of both practitioners and researchers to pay attention to the use of the necessary
confirmatory statistical metrics.

In this context, one of the projected contributions of this paper is to present a critical review of the existing
literature on the use of SPT for predicting undrained shear strength. For that reason, first, prior studies that lack
the use of the correct statistical metrics for model development are identified. Next, the available data set is
randomly divided into testing and training sets. Subsequently, using the training data set that corresponds to 80% of
the whole data set, new simple and multiple linear regression models are developed for prediction of undrained shear
strength in a statistically correct manner. Then, the performances of the models suggested in this work and the prior
ones in the literature are tested using the test data set, which corresponds to 20% of the whole data set, chosen on
a random basis and set aside for testing purposes only. Next, two machine learning algorithms, random forest and
gradient boosting, are proposed for the prediction of undrained shear strength. Finally, in an effort to improve the
accuracy of the suggested models, a stacked model is developed for prediction, which is a combination of the three
aforementioned models. The superior performance of the machine learning models created suggests that they may be
considered an attractive alternative to the conventional method of linear regression. Accordingly, the outline of
the remainder of the paper is as follows: in Section 2, the data set used in this study is introduced, accompanied by
a brief description of SPT. Furthermore, previous models for prediction of undrained shear strength are reviewed.
Then, the models used for estimating undrained shear strength, linear regression, random forest and gradient
boosting, are explained. In Section 3, the results of the models developed for this study are compared with the
results of the models frequently used in practice. Finally, in Section 4, conclusions are summarized. All of the
analyses in this study are conducted using the Classification and Regression Training (CARET) package available in R
version 2017, a free software environment for statistical computing and graphics that is supported by the R
Foundation for Statistical Computing [7].

2 Data set and method

In this study, the constructed models explore the relation of the SPT results, along with the parameters indicating
the soil properties, to the undrained shear strength of fine grained soils. These parameters include the water
content (wn) and the Atterberg limits: liquid limit (LL), plastic limit (PL), and plasticity index (PI).

SPT, though it is the most commonly used in situ test for site exploration, cannot be directly used to measure any
mechanical properties. However, the general trend in SPT results is that better soil conditions correspond to higher
expected blow counts. It is this indication of a possible positive correlation between SPT blow count and cu that
leads many researchers to work on estimation of undrained shear strength of cohesive soils using SPT results
[1,8–11].

SPT typically involves a standard sampler driven into the ground by energy delivered from a 63.5 kg hammer dropped
from a height of 760 mm. The process is repeated until the sampler has penetrated a distance of 450 mm into the soil.
The hammer blows required to penetrate each interval of 150 mm are recorded. The test is stopped if the number of
blows required to penetrate a certain 150 mm interval exceeds 50, or if more than 100 blows are required for the
entire 300 mm. The SPT-N value is calculated by summing the blows required to penetrate the final 300 mm. Due to
factors such as borehole diameter, hammer configuration and many more, SPT hammer efficiencies can vary widely.
Accordingly, following the suggestions of Bolton Seed et al. [12] and Skempton [13], it is now common practice to
apply corrections to the raw SPT-N values so as to render them useful to the engineer. A hammer efficiency of 60% is
now the standard level to which all N values are corrected
(N60). Hence, in this study the example of Hettiarachchi and Brown [14] is followed and the correlations are based
on N60. Another widely used SPT-N value correction is the overburden correction. Overburden correction is necessary
for cohesionless soils, especially when the SPT results are to be used for predicting relative density. Since this
study deals with fine grained soils, overburden correction is not used.

The accuracy of a statistical analysis increases with the number of observations included in the data set, and the
validity of a suggested model improves if the observations are sampled from different populations. Accordingly, the
data set collected for this study covers 230 observations from different projects around Turkey. The observed
parameters include undrained shear strength (cu), SPT blow count corrected for efficiency (N60), Atterberg limits
(LL, PL, PI) and natural water content (wn). SPT blow count cannot be used to estimate strength values exceeding
200 kPa, since SPT is not a suitable test for such geological materials. These soils are referred to as intermediate
geomaterials, representing a state between soils and rock. When intermediate geomaterials are encountered, the
necessary number of SPT blows exceeds the aforementioned limits before the target penetration depth is reached and
the test is terminated prematurely. For such tests, the number of blows is not recorded and the result is given as
"refusal". On the other hand, samples obtained from intermediate geomaterials can be tested in the laboratory to
measure undrained shear strength. Since SPT tests that resulted in "refusal" cannot be used in the analyses, in
order to maintain consistency in the interpretation of data, data points with undrained shear strength values
greater than 200 kPa are excluded from the data set. Thus, the final data set used in this study covers a total of
214 observations. In Fig. 1 the scatter plots for each of the independent variables used in this study, N60, water
content (wn), LL, PL, and PI, versus cu, the target variable of prediction, are provided. Additionally, a statistical
summary of the final data set used for this study is given in Table 1. The entire data are given in Ref. [15].

For evaluation of the performance of the suggested models, all of the statistical analyses conducted in this
research use an 80%–20% train-test split on the data set.

Most of the earlier research in the literature on the estimation of undrained shear strength from SPT results used
either simple or multiple linear regression models for prediction. However, most of these works do not use the
statistical metrics needed to develop a valid regression model. Previous studies on the subject matter are given in
Table 2 along with the equations proposed. Additionally, Table 2 provides information on the number of observations
used in each study and the results of the performance metrics, such as R² (coefficient of determination), p-value,
and root mean squared error (RMSE). The absence of these metrics in the related work is indicated as not available
(NA). Therefore, in the following section, a brief description of simple and multiple linear regression models is
given and the necessary metrics, p-value, RMSE, and R², for purposes of validation and performance evaluation, are
revisited.

2.1 A discussion on the available linear regression models in literature

Linear regression modeling is one of the oldest and simplest forms of regression methods. In linear regression, a
direct linear relationship between the predictor variables x and the response variable y is assumed (Eq. (1)).

$$\hat{y} = \beta_1 + \beta_2 x_2 + \cdots + \beta_p x_p, \tag{1}$$

where ŷ indicates the prediction of the response variable y based on the values of the independent variables x. βp
represents the coefficient of the corresponding independent variable and p is the number of independent variables
used in the study.

The verification of the resulting regression equation is conducted through t-tests on each variable of the
regression equation. The result of the significance test is reported in the form of p-values, which indicate the
probability of obtaining by chance a coefficient estimate at least as extreme as βi for the variable xi when its
actual value is equal to zero. Therefore, only the variables with a p-value smaller than 0.05 are considered
statistically significant and hence included in the resulting regression equation.

Evaluation of model fit for linear regression can be conducted using different measures such as root mean squared
error (RMSE) and the coefficient of determination R². As depicted in Eq. (2), where N refers to the total number of
observations, RMSE is the square root of the mean of the squared residuals, a residual being the difference between
the observed value yi and the predicted value ŷi. Hence, RMSE shows the absolute fit of the model to the data, with
lower values indicating a better fit of the model.

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}. \tag{2}$$

The main difference between RMSE and R² is that R² is defined as a relative measure of fit, and not an absolute
one. The calculation of R² is given in Eq. (3). Here, the sum of squared errors in the numerator, which represents
the variation left unexplained by the model, is compared to the total sum of squares of the mean model. Hence, the
resulting value of R² usually ranges from 0 to 1, and it is interpreted as the fraction of variation explained by the
model compared to the mean model. The equation for R² is given below, in which ȳ is the mean of the observed data.
$$R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}. \tag{3}$$

In rare cases where the fit of the suggested model is worse than the fit of the mean model given in the
denominator, the resulting R² value can be negative. That shows that the fit of the prediction model, which is the
regression line in this case, is actually worse than just fitting a horizontal line for prediction [16].

Fig. 1 Scatter plots of independent variables versus cu: (a) N60; (b) water content (wn); (c) LL; (d) PL; (e) PI.

Table 1 Summary of the final data set
variable    minimum    maximum    median    mean
N60         1          50         14        16.56
wn          9%         96%        31%       32.23%
LL          23         44         57        56.48
PL          11         60         25        24.43
PI          7          80         32        32.29
cu (kPa)    8          200        68        77.25

One of the biggest pitfalls of R² is that it can artificially increase with the number of independent variables
added. To solve this problem, adjusted R² was developed, which
incorporates the model's degrees of freedom to make up for the artificial increase in R². Especially when multiple
linear regression is used, adjusted R² should be used for evaluation. The formulation of the adjusted R² is given in
Eq. (4), where n indicates the sample size and k stands for the number of independent variables.

$$\text{Adjusted } R^2 = 1 - \frac{(n - 1)}{[n - (k + 1)]} (1 - R^2). \tag{4}$$

Though both RMSE and R² are important for evaluating the model fit of a regression analysis and it is best to use
both, RMSE should especially be preferred when the objective of the regression analysis is prediction.

In Table 2 prior works of different researchers for prediction of undrained shear strength of cohesive soils are
summarized. The variable in these correlations is given as N60 even though some of the correlations were originally
developed with N values without applying any corrections. The underlying reason is, as several researchers [14,17]
have suggested, that most SPT hammers work close to the 60% energy efficiency level and it is prudent to use N60
instead of N in these correlations.

Table 2 Previous correlations presented by different researchers for fine-grained soils
researcher(s)                   cu (kPa)                                number of observations used   R²       p-values   RMSE
Sanglerat [9]                   12.5N60                                 NA                            NA       NA         NA
Nixon [10]                      12N60                                   NA                            NA       NA         NA
Ajayi and Balogun [18]          1.39N60 + 74.2                          NA                            NA       NA         NA
Decourt [11]                    12.5N                                   NA                            NA       NA         NA
Decourt [11]                    15N60                                   NA                            NA       NA         NA
Kulhawy and Mayne [19]          6.25N60                                 NA                            NA       NA         NA
Sivrikaya and Toğrol [1]        4.45N                                   226                           0.64     NA         NA
Sivrikaya and Toğrol [1]        6.35N60                                 226                           0.6084   NA         NA
Hettiarachchi and Brown [14]    4.1N60                                  26                            NA       NA         NA
Sivrikaya [2]                   3.33N - 0.75wn + 0.20LL + 1.67PI        100                           0.6724   NA         NA
Sivrikaya [2]                   4.43N60 - 1.29wn + 1.06LL + 1.02PI      100                           0.6561   NA         NA
Nassaji and Kalantari [3]       2N60 - 0.4wn - 1.1LL + 2.4PI + 33.3     72                            0.6561   NA         NA

As evident in Table 2, the main problem of the earlier works is the lack of the necessary statistical evidence to
judge their validity. Unfortunately, as can be seen in Table 2, none of the earlier works conducted t-tests and used
the corresponding p-values for verification of the suggested models. Additionally, the R² values are only presented
in Refs. [1,2,14], whereas RMSE results are missing from all of the earlier works presented in Table 2. Some of these
works [1,2,14] report the range of the error in prediction as a performance measure of the regression analysis.
However, the range of the error cannot represent the total magnitude of the error made in prediction and therefore,
compared to RMSE, is not an effective measure of prediction performance. Furthermore, most of these works provide no
information regarding the number of observations in their data sets. A correlation equation can only be as flexible
as the observations used in its derivation, and hence, in order to judge the validity of the results, it is important
to know the number of observations in the data set. Also, with the exception of Hettiarachchi and Brown [14], none of
the prior studies used a test/train split of their data, which results in overfitting of the developed models,
rendering them unstable and extremely variable. Consequently, though some of the researchers [1,3,14] compared their
results with their preceding alternatives, the lack of correct metrics for the comparison leaves their results open
to discussion. For that reason, the results they present, as given in Table 2, cannot be considered valid statistical
models, and their reported relatively high R² values [2,3] are only an indication of model overfitting.

Considering all of the points presented above, the correct use of statistical analysis is essential in order to
reveal the underlying relation between undrained shear strength and SPT results in a scientifically valid and
acceptable form. Therefore, in this work, all the analyses are performed and presented along with the results of the
statistical significance tests conducted.

2.2 Proposed models

2.2.1 Model 1: simple linear regression

For comparison with the prior works of Sanglerat [9], Decourt [11], Nixon [10], Ajayi and Balogun [18], Kulhawy and
Mayne [19], Sivrikaya and Toğrol [1], and Hettiarachchi and Brown [14], which use simple linear regression for
prediction of undrained shear strength, initially a simple linear regression is developed where N60 is the predictor
variable. The results of the simple regression analysis and the resulting equation are given as Table 3 and Eq. (5),
respectively.
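As a rough illustration of the simple linear regression and of the fit measures in Eqs. (2)–(4), the following Python sketch fits a least-squares line and compares it with one fixed correlation from Table 2. It is not the code used in the study, which was written in R. The (N60, cu) pairs are invented for illustration only, and the function names are hypothetical. The measures are computed on the fitting data here, whereas the study evaluates on a held-out test set.

```python
import math


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y ~ b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rmse(y, y_hat):
    """Root mean squared error, Eq. (2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def r_squared(y, y_hat):
    """Coefficient of determination, Eq. (3)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Adjusted R^2, Eq. (4): n observations, k independent variables."""
    return 1.0 - (n - 1) / (n - (k + 1)) * (1.0 - r2)


# Hypothetical (N60, cu in kPa) pairs, invented for illustration only.
n60 = [2, 5, 8, 12, 15, 20, 26, 33]
cu = [35, 50, 52, 70, 71, 95, 100, 125]

b0, b1 = fit_simple_ols(n60, cu)
fitted = [b0 + b1 * n for n in n60]
fit_rmse = rmse(cu, fitted)
fit_r2 = r_squared(cu, fitted)
fit_adj_r2 = adjusted_r_squared(fit_r2, n=len(cu), k=1)

# A fixed correlation from Table 2 (Hettiarachchi and Brown: cu = 4.1 * N60),
# scored on the same hypothetical data with the same measures.
prior = [4.1 * n for n in n60]
prior_rmse = rmse(cu, prior)
prior_r2 = r_squared(cu, prior)

print(f"fitted line: cu = {b0:.3f} + {b1:.3f} * N60")
print(f"fitted: RMSE = {fit_rmse:.2f}, R2 = {fit_r2:.3f}, adj. R2 = {fit_adj_r2:.3f}")
print(f"prior:  RMSE = {prior_rmse:.2f}, R2 = {prior_r2:.3f}")
```

Because least squares minimizes the sum of squared residuals on the fitting data, the fitted line cannot score a higher in-sample RMSE than any fixed correlation, which is one reason the comparison in the study is made on a separate test set.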
Table 3 Summary of the simple linear regression analysis
coefficient    estimate    Std. error    t-value    p-value (Pr(>|t|))
intercept      32.639      4.8078        6.789      1.82E-10
N60            2.771       0.244         11.339     2.00E-16

$$c_u = 32.639 + 2.771 N_{60}. \tag{5}$$

Equation (5) is made unit-independent by dividing both sides by atmospheric pressure (pa = 100 kPa), as shown in
Eq. (6):

$$\frac{c_u}{p_a} = 0.32639 + 0.02771 N_{60}. \tag{6}$$

Figure 2 presents the measured vs. predicted values of the undrained shear strength as estimated by the simple
linear regression analysis.

Fig. 2 Plot of measured vs. predicted undrained shear strength using Model 1: simple linear regression.

2.2.2 Model 2: multiple linear regression

As the second model of the research, a multiple linear regression model is developed using all of the variables
present in the data set. Accordingly, Table 4 presents the results of the multiple linear regression.

Table 4 Summary of the multiple regression analysis: step 1
coefficient    estimate    Std. error    t-value    p-value (Pr(>|t|))
intercept      49.53       11.472        4.318      2.72E-05
N60            2.67        0.256         10.437     2.00E-16
wn             -0.935      0.252         -3.714     0.000279
LL             -11.235     15.239        -0.737     0.462
PL             11.279      15.227        0.741      0.4599
PI             11.657      15.242        0.765      0.4454

As can easily be seen, only the variables N60 and wn proved to be statistically significant at the 1% significance
level, whereas LL, PL, and PI resulted in p-values well above the acceptable limit of 0.05. Accordingly, as the next
step of the regression analysis, LL, PL, and PI are excluded from the analysis, since there is not enough statistical
evidence for their effects on prediction of undrained shear strength. However, before proceeding with the resulting
model consisting of N60 and wn only, the model is further improved by considering the fact that the effect of one of
the variables (N60, wn, LL, PL, PI) on undrained shear strength might depend on the value of another variable.
Accordingly, an interaction term wn × LL is added to the model with N60 and wn. A summary of the resulting regression
analysis along with the p-values is given in Table 5. As can be seen in Table 5, all of the variables in the new
model, N60, wn and wn × LL, prove to be significant at the 5% level, and the values of the performance metrics
improved. This result shows that the effect of wn on undrained shear strength is different for different levels of
LL. This is expected, since the relative magnitude of wn with respect to LL defines a soil sample's consistency. Two
different soil samples might have the same wn, but the sample that has the smaller LL is more likely to be softer,
because that sample's water content is closer to its liquid limit. Accordingly, Eqs. (7) and (8) represent the final
regression equations. Here, Eq. (7) is normalized with pa (= 100 kPa) to make it unit-independent, as given in
Eq. (8).

Table 5 Summary of the multiple regression analysis results: final step after adding the interaction term
coefficient    estimate    Std. error    t-value    p-value (Pr(>|t|))
intercept      69.562      9.15          7.594      2.14E-12
N60            2.602       0.235         11.047     2.00E-16
wn             -1.789      0.44          -4.057     7.63E-05
wn × LL        0.0122      0.005         2.453      0.0152

$$c_u = 69.562 + 2.602 N_{60} - 1.789 w_n + 0.0122 (w_n \times LL), \tag{7}$$

$$\frac{c_u}{p_a} = 0.69562 + 0.02602 N_{60} - 0.01789 w_n + 0.000122 (w_n \times LL). \tag{8}$$

The regression analysis is conducted using the training set, which corresponds to 80% of the whole data. The
remaining 20% will later be used as the test data for performance evaluation purposes. Figure 3 presents the
measured vs. predicted values of the undrained shear strength as estimated by the multiple linear regression
analysis.
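To make the role of the interaction term concrete, the following Python sketch evaluates Eqs. (5), (7) and (8) with the coefficients reported in Tables 3 and 5. It assumes that wn and LL are entered as percentage values (for example 31 for 31%), which is inferred from Table 1 rather than stated with the equations. The function names are hypothetical.

```python
P_A = 100.0  # atmospheric pressure in kPa, as used in Eqs. (6) and (8)


def cu_simple(n60):
    """Eq. (5): cu in kPa predicted from N60 alone."""
    return 32.639 + 2.771 * n60


def cu_multiple(n60, wn, ll):
    """Eq. (7): cu in kPa from N60, wn and the wn x LL interaction.

    Assumption: wn and LL are percentages given as plain numbers.
    """
    return 69.562 + 2.602 * n60 - 1.789 * wn + 0.0122 * (wn * ll)


def wn_slope(ll):
    """Change in predicted cu (kPa) per unit of wn at a given LL.

    This is the partial derivative of Eq. (7) with respect to wn.
    """
    return -1.789 + 0.0122 * ll


# Median N60 and wn from Table 1, combined with three LL values.
n60, wn = 14, 31
for ll in (40, 57, 80):
    cu = cu_multiple(n60, wn, ll)
    print(f"LL = {ll}: cu = {cu:.1f} kPa, cu/pa = {cu / P_A:.3f}, "
          f"d(cu)/d(wn) = {wn_slope(ll):.3f}")

print(f"simple model, N60 = {n60}: cu = {cu_simple(n60):.1f} kPa")
```

For the same N60 and wn, the sample with the smaller LL receives the lower predicted cu, which matches the consistency argument given above for including the wn × LL term.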
Fig. 3 Measured vs. predicted undrained shear strength using Model 2: multiple linear regression.

2.2.3 Model 3: random forest

The random forest is a powerful machine learning algorithm used for both regression and classification problems. It
is essentially a tree-based method which involves breaking up the predictor space into a number of simple regions.
To make a prediction for a given observation, the mean of the training observations in the region to which it
belongs is used. Since the splitting criterion used to break up the predictor space can be summarized in a tree,
these methods are known as decision tree methods [20].

The random forest tree-based model uses the bagged tree approach. It uses bootstrap resampling to create multiple
training sets and fits trees to each bootstrapped training data set. Bootstrap is essentially a resampling technique
which involves repeatedly taking random samples from the training data with replacement. This simply means that
selected data may appear more than once in the selected subset [20]. During tree fitting, random forest randomly
picks m predictors from the total number available and only searches within these randomly selected predictors for
the best possible split [20]. This number of predictors m, referred to as mtry in the random forest package in the R
programming language, becomes a critical tuning parameter in the random forest algorithm [21].

A simple procedure of how to build a random forest is explained as follows [21,22]:
1) Draw a sample from the available data set using the bootstrap resampling method (i.e., drawing samples with
replacement).
2) Grow a tree on the sample, restricting the search at each split to mtry randomly selected predictors.
3) Repeat Steps 1 and 2 until a user defined number of trees, typically 500, is grown.

For each regression tree grown, a different bootstrap sample is drawn from the training set. About two-thirds of the
sample is used to determine the regression function and the remainder, which is termed the out of bag (OOB) sample,
is used to validate the accuracy of the function [23].

The random forest algorithm employed in this analysis was run through the CARET package [24] in R [7]. In the random
forest model, two parameters need to be optimized so as to maximize prediction performance. These parameters are
ntree and mtry [25]. ntree is simply defined as the number of trees used to perform the regression. The prediction
accuracy is more sensitive to mtry than ntree [21,25]. Therefore, throughout the analysis ntree is fixed at 500,
which is the default number of the random forest model in the CARET package available in R.

To determine the best mtry parameter, a 10-fold cross-validation which is repeated 5 times is used to perform a
search on a grid of multiple parameter values. In 10-fold cross-validation the training data are split randomly into
10 folds; the model is fitted on nine of them and its error is tested on the remaining one, with each fold held out
in turn. Repeating this multiple times and taking the average of the cross-validation results ensures reliability of
the cross-validation test. As can easily be seen from the results given in Fig. 4, the optimum value of mtry happens
to be 2 in this model.

Fig. 4 Determination of mtry parameter in the random forest model.

Unlike linear regression models, random forest is unable to generate a tangible equation as to how the prediction
was achieved. Instead, as explained earlier, it splits the predictor space into regions while trying to minimize the
residual standard error in each region as the splitting takes place. From the available data set, 171 observations
were used to train the model and 43 observations were used to test the developed model. In Fig. 5 the result of the
random forest regression model on the test data is shown, where
the measured versus predicted cu using the random forest model can be seen.

Fig. 5 Measured versus predicted values of undrained shear strength using Model 3: random forest.

2.2.4 Model 4: gradient boosting

Gradient boosting is another avenue for improving the prediction results of a decision tree. It can be applied to
different statistical learning methods for both regression and classification problems. Recall that bagging involves
generating multiple training data sets from the original set using bootstrap, fitting a decision tree to each of
these bootstrapped samples, and finally combining all these fitted trees to create a single model. Gradient boosting
works in a similar way, except that each one of the trees is grown using information from the previously grown
trees. This process can be termed sequential growing. This sequential growing of the trees in the gradient boosting
approach allows the algorithm to learn slowly [20].

Let us break down this process a little further to form a clear picture. Given a model, the residuals of the model
are fitted rather than the response. A new decision tree is then added to the model and the residuals are updated.
Each of these new decision trees can be small; hence the residuals of the model are improved slowly. In the gradient
boosting algorithm, the shrinkage parameter, λ, slows down the process even further, allowing smaller decision trees
to improve the residuals and hence the performance of the model. The shrinkage parameter is therefore an essential
tuning parameter in the gradient boosting algorithm [26].

The shrinkage parameter (λ), the number of trees and the interaction depth are what need to be tuned so as to obtain
the best predictive model. The shrinkage parameter (λ) is a positive number, often between 0.001 and 0.1, which
controls the learning process of the gradient boosting algorithm. This parameter determines how slowly or quickly
the algorithm fits the data given. If a very low value of the shrinkage parameter is chosen, the learning process
will be very slow and a large number of trees will be required. A very large number of trees can lead to overfitting.
Friedman [27] suggested the number of trees to range between 100 and 500 for optimal results. The interaction depth
controls the complexity of the gradient boosting algorithm. In simpler terms, the interaction depth controls the
number of splits in each tree. James et al. [20] suggested that an interaction depth between 1 and 5 often works
well. However, to determine the shrinkage parameter and the interaction depth that give the optimal predictive
model, a 10-fold cross-validation repeated 5 times is applied, as was done for the random forest model. From the
cross-validation results presented in Fig. 6, the best model parameters for the prediction of the undrained shear
strength are obtained when the shrinkage parameter λ is 0.0101 and the interaction depth is 3.

As with the random forest model and most supervised learning algorithms, it is not possible to derive a tangible
equation. In Fig. 7 the relationship between the measured and predicted values obtained using the gradient boosting
algorithm is presented. The comparison was done using the test data, which was not used while training the model.

2.2.5 Model 5: the stacked model

The main goal of using a stacked model is to increase the prediction performance by incorporating different models
into the analysis. Already developed models can be stacked to further improve their predictive capabilities [28]. In
Fig. 8, the simple structure of a stacked model is presented. Stacking involves feeding the predictions of lower
layers to an upper layer stacking function. The machine learning models pass their predictions to the upper layer
and this layer makes decisions based on the performances of the models in the layer below. Here, it can be seen that
the top layer has been labeled as the stacking function. This simply means one can use any machine learning
algorithm or a simpler function as the stacking function. The selection of the models to be part of the stacked
model should be based on a clearly set performance criterion, such as a predetermined cutoff point for model accuracy
or for the RMSE obtained with the individual model. Accordingly, only the models which meet the requirements of the
performance criterion are incorporated into the stacked model. In this study, as presented in Fig. 8, the lower
layers are the multiple linear regression, random forest and gradient boosting models. Their prediction
performances, in the form of RMSE indices, are used to determine the weights of the weighted average computed by the
stacking function.

When building the stacked model, it is important to
Walid Khalid MBARAK et al. SPT Based determination of undrained shear strength 9

Fig. 6 Gradient boosting cross-validation results.

Fig. 7 Measured versus predicted values of undrained shear


strength using model 4: gradient boosting. Fig. 8 Structure of the stacked model.

ensure that the predictions of the models which are 0.70, which according to Spearman’s coefficient of
incorporated into the model are not highly correlated with correlation ranking is termed as a ‘strong’ relationship.
each other. If high correlations among the predictions of In Table 6, the correlation plot of the outputs of the
individual models are present, then their combination will individual models is presented. Consequently, the models
not result in an improved performance on the stacked used in this study, linear regression, random forest and
model. Therefore, before embarking on creating the gradient boosting, are not highly correlated with each
stacked model, the correlation between the predictions of other, hence stacking of these models can be undertaken.
individual models should be checked. If the correlation Figure 9 presents the measured versus predicted values of
present between two models exceeds a previously the undrained shear strength as estimated by the stacked
determined threshold, then the one with lower performance model. As with the other models, comparison was done
should be omitted from the stacked model developed. using the test data which was not used while training the
In this study, the upper limit for this correlation is set to model.
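The weighting step of the stacked model described above can be sketched as follows. This is a minimal illustration in Python and not the authors' R code of Ref. [29]. It assumes that the stacking function is a weighted average whose weights are proportional to the inverse of each base model's RMSE; the text does not state the exact weighting rule, so this rule is an assumption. The function names (rmse, stack) and all numerical values are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def stack(preds, scores):
    """Weighted average of the base-model predictions.

    ASSUMPTION: weights are proportional to 1/RMSE, so that a model with a
    lower RMSE contributes more. The paper's exact rule may differ.
    """
    inverse = {name: 1.0 / scores[name] for name in preds}
    total = sum(inverse.values())
    weights = {name: inverse[name] / total for name in preds}
    n = len(next(iter(preds.values())))
    stacked = [sum(weights[name] * preds[name][i] for name in preds)
               for i in range(n)]
    return stacked, weights

# Hypothetical measured cu values (kPa) and base-model predictions,
# used only for illustration; they are not data from the study.
measured = [40.0, 55.0, 70.0, 90.0, 120.0]
preds = {
    "mlr": [45.0, 50.0, 75.0, 85.0, 110.0],
    "rf": [38.0, 60.0, 66.0, 95.0, 118.0],
    "gbm": [42.0, 52.0, 73.0, 88.0, 125.0],
}

# RMSE of every base model, then the stacked prediction and its RMSE.
scores = {name: rmse(measured, p) for name, p in preds.items()}
stacked, weights = stack(preds, scores)
stacked_rmse = rmse(measured, stacked)
```

Because the stacked prediction is a convex combination of the base predictions, its RMSE cannot exceed that of the worst base model; whether it beats the best base model depends on how differently the base models err, which is why the correlation check above matters.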

Table 6 Correlation plot of models used in the stacked model
predictions                        multiple linear regression model   random forest   gradient boosting
multiple linear regression model   1                                  -0.26645        -0.1636
random forest                      -0.26645                           1               0.086442
gradient boosting                  -0.1636                            0.086442        1

Fig. 9 Measured versus predicted values of undrained shear strength using Model 5: stacked model.

3 Results, comparison and discussion: suggested models vs. existing work

This section is devoted to the comparison of the performances of the models developed in this work with the aforementioned earlier studies in literature, detailed in Section 2, Table 2. For that purpose, the test data set, which corresponds to 20% of the whole data in size, is used to check the prediction performances of all the available models. Several different equations exist in literature for the prediction of the undrained shear strength of cohesive soils. As given in further detail in Section 2, other than Sivrikaya [2] and Nassaji and Kalantari [3], all of them use the method of simple linear regression, having N60 as the only predictor variable of undrained shear strength, cu. Sivrikaya [2] and Nassaji and Kalantari [3], on the other hand, use a multiple linear regression model, where their proposed equations include N60, water content, liquid limit, and plasticity index. To compare the performances of the models developed in this study with the nine aforementioned equations existing in literature, all of the models were applied to the test data set and their performances were evaluated in terms of the resulting RMSE, coefficient of determination (R2), and adjusted R2. Hence, as presented in Table 7, a total of 14 models, nine existing and five developed in this work, were used in this section.
Considering the results of the linear regression equations existing in literature and the ones proposed here, it can be seen that both of the linear regression equations developed in this study outperformed their preceding counterparts. The simple linear regression model (Eq. (6)) resulted in an RMSE value of 27.93 and an R2 value of 0.55, whereas its closest follower, Hettiarachchi and Brown [14], yielded 30.14 and 0.47 for RMSE and R2, respectively. The multiple linear regression model given in Eq. (8), on the other hand, performed even better, with a lower RMSE value of 24.55 and higher R2 and adjusted R2 values of 0.68 and 0.67, respectively. As can clearly be seen in Table 7, the

Table 7 Comparison of models used to estimate undrained shear strength
group                             model                           RMSE (kPa)   R2       adjusted R2
models used in prior studies      Sanglerat [9]                   181.53       -18.10   -18.57
                                  Nixon [10]                      171.29       -16.00   -16.42
                                  Ajayi and Balogun [18]          39.49        0.10     0.07
                                  Decourt [11]                    232.90       -30.50   -31.20
                                  Kulhawy and Mayne [19]          57.51        -0.92    -0.96
                                  Sivrikaya and Toğrol [1]        59.30        -1.04    -1.09
                                  Sivrikaya [2]                   64.51        -1.41    -1.67
                                  Hettiarachchi and Brown [14]    30.14        0.47     0.46
                                  Nassaji and Kalantari [3]       32.44        0.39     0.33
models developed in this study    SLR: Equation (6)               27.93        0.55     0.54
                                  MLR: Equation (8)               24.55        0.68     0.67
                                  random forest                   23.50        0.70     0.69
                                  gradient boosting               25.15        0.66     0.65
                                  stacked                         22.89        0.73     0.72
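The three metrics reported in Table 7 can be reproduced in form with the minimal Python sketch below. It assumes the standard definitions, RMSE as the square root of the mean squared error, R2 = 1 - SS_res/SS_tot, and adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of test observations and p the number of predictors; the function names and the sample values are hypothetical and are not taken from the study's data. The value n = 46 used at the end assumes that the test set is exactly 20% of the 230 observations.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def adjusted_r2(r2_value, n, p):
    """Adjusted R2 for n observations and p predictors (standard definition)."""
    return 1.0 - (1.0 - r2_value) * (n - 1) / (n - p - 1)

# Hypothetical measured cu values (kPa) and two sets of predictions.
measured = [40.0, 55.0, 70.0, 90.0, 120.0]
close = [45.0, 50.0, 75.0, 85.0, 110.0]      # reasonable predictions
far = [150.0, 180.0, 210.0, 260.0, 330.0]    # strongly over-predicting model

r2_close = r2(measured, close)
r2_far = r2(measured, far)  # negative: worse than predicting the mean

# Consistency check against the SLR row of Table 7 (R2 = 0.55, one predictor),
# ASSUMING n = 46 test observations (20% of 230).
adj_slr = adjusted_r2(0.55, n=46, p=1)
```

A strongly over-predicting equation produces residuals larger than the spread of the measurements around their mean, so SS_res exceeds SS_tot and R2 becomes negative, which is the situation of several earlier equations in Table 7.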

performance of the proposed models is drastically better than that of their preceding counterparts, both in terms of R2 and RMSE.
Another noteworthy observation regarding Table 7 is that some of the earlier models in literature resulted in negative R2 values for their prediction performances on the test data set. These are the equations suggested by Sanglerat [9], Nixon [10], Decourt [11], Kulhawy and Mayne [19], Sivrikaya and Toğrol [1], and Sivrikaya [2]. As detailed in Section 2.1, negative R2 values only happen when the mean of the data provides a better fit to the outcomes than the model used to make the predictions. A visual comparison of the predictive performances of all the models is provided in Fig. 10. Figure 10 only focuses on the region where the observation data are located. That is why some of the less successful predictions [1,9–11] are truncated at the top boundary.
The poor performance of the earlier studies can be attributed to many factors, such as the missing verification and evaluation metrics of p-value, R2, and RMSE. Additionally, in none of these studies was the method of cross-validation used in the model development. Moreover, with the exception of Hettiarachchi and Brown [14], none of the earlier works utilized a train/test split of the data when developing their models. Hence, the whole data set was used for both developing the model and testing the results. This practice in turn makes their results prone to overfitting. Overfitted models tend to be extremely variable and thus their solutions are unstable. In Ref. [14], instead, a train/test split of the data was performed and the analyses were conducted accordingly. This fact, as presented in Table 7, might be one of the reasons for the better performance of the equation suggested by Hettiarachchi and Brown [14] compared to the previous ones in literature.
The superior performance of the models suggested in this work compared to their preceding counterparts is also apparent in the results of the machine learning models. As can be seen in Table 7, the gradient boosting model resulted in an RMSE value of 25.15 and an R2 value of 0.66. The random forest model, on the other hand, showed an even better performance, with 23.50 for RMSE and 0.70 for R2. Furthermore, the stacked model, which incorporates the linear regression, gradient boosting and random forest models suggested in this work, has the lowest RMSE score, 22.89, and the highest R2, 0.73, resulting in the best prediction performance among both the earlier models and the ones developed in this study.
The superior performance of the stacked model is due to its ability to combine different predictions coming from other models. Hence, the combination of the outputs of the multiple linear regression, random forest and gradient boosting models resulted in the best prediction performance. Furthermore, when random forest and gradient boosting algorithms are utilized, the cross-validation process ensures the determination of the optimum tuning parameters, which in turn contributes to producing superior results compared to the equations available in literature.
Hettiarachchi and Brown [14] provide the test data set used in their study, which covers 12 observations of N60 and the corresponding undrained shear strength values, cu. As the next step of the analysis, this data set is utilized as a further verification of the performances of the simple linear

Fig. 10 Comparison of the predictions of all models with the test data of this study.

regression models available in literature (Table 2) and the one developed in this study (Eq. (6)). Since this data set does not provide information regarding the other predictor variables, water content wn and the Atterberg limits, the multiple linear regression and machine learning models are not considered for comparison. In line with this, since only one independent variable, N60, is used for prediction, the results are reported in terms of RMSE and R2 only. In Fig. 11, the predictions of all the simple linear models on the test data set [14] are demonstrated. Additionally, in Table 8, the corresponding RMSE and R2 values are provided. Evidently, as can be seen in Table 8, having the smallest RMSE value and the greatest coefficient of determination R2, Eq. (6) provides the most accurate prediction among the simple linear regression equations.
The equations available in literature use either simple linear regression or multiple linear regression. Equations that use simple linear regression benefit from the inherent simplicity of directly linking N60 to cu through a proportionality constant. This is an attractive approach both for the researchers developing the model and for the end-user. However, this approach is more prone to errors that stem from the mechanics of the problem. Basically, an SPT sampler is a miniature open-ended pipe pile [14]. During an SPT test, the driving effort must be sufficient to overcome the total of skin resistance and end-bearing resistance. When skin resistance is considered, the influence of the state of the soil on the mechanics of the problem is minor. However, the mechanism by which the end-bearing develops is a function of the state of the soil. As in the case of pipe piles, the possible formation of a soil plug during driving significantly affects the possible magnitude of end resistance. Penetration in stiff to very stiff clays is more likely to result in plug formation, and generally formation of a plug is not expected in soft clays. Then, to increase the accuracy of cu predictions, it becomes necessary to take the state of the soil into account. The simplest ways to describe the state of a cohesive soil in nature are either to use the overconsolidation ratio or to use the Atterberg limits and in situ moisture content in combination. As information on liquid limit, plastic limit and in situ moisture content is more readily available in general, this study developed its multiple linear regression using these as input, as previously done by Sivrikaya [2] and Nassaji and Kalantari [3]. However, as evident in Table 4, wn is found to be statistically significant, whereas the Atterberg limits are not. Nevertheless, it is still possible that a parameter's influence depends on its interaction with a statistically insignificant parameter. That is why possible interactions between the Atterberg limits and wn were investigated, and the influence of the interaction term wn × LL was found to be significant. Table 5 presents the results of the regression analysis. This outcome is not surprising because it is known that soil behavior depends on the relative magnitude of wn with respect to LL. As the magnitude of wn approaches LL, the response gets softer, and vice versa. In other words, the influence of wn on undrained shear strength depends on its interaction with LL.
In this study, different algorithmic models are used to predict the undrained shear strength using the Atterberg limits and the water content. As explained in previous sections, unlike the regression models, the machine learning algorithms random forest, gradient boosting and the

Fig. 11 Comparison of the predictions of all simple linear regression models with the test data of Hettiarachchi and Brown [14].

stacked model do not form tangible equations. Though this may be considered a disadvantage of the suggested machine learning algorithms for daily use in practice, the far better accuracy in prediction performance outweighs this shortcoming. Additionally, to ease the use of the machine learning models developed in this work, the entire source code of the models is given in Ref. [29].

Table 8 Comparison of models used to estimate undrained shear strength on the test data of Hettiarachchi and Brown [14]
simple linear regression equation   RMSE     R2
Sanglerat [9]                       146.81   -19.59
Decourt [11]                        190.15   -33.54
Nixon [10]                          138.20   -17.20
Ajayi and Balogun [18]              35.85    -0.23
Kulhawy and Mayne [19]              41.22    -0.62
Sivrikaya and Toğrol [1]            42.77    -0.75
Hettiarachchi and Brown [14]        19.18    0.65
This study: SLR Equation (6)        18.47    0.67

4 Conclusions

With this paper, two objectives toward the accurate prediction of undrained shear strength using SPT results are realized. First, both simple and multiple linear regression models are developed and their performances are tested in comparison with the linear regression models suggested in literature. Both of the linear regression models suggested in this work performed better in terms of the performance metrics, R2 and RMSE. In fact, this result does not come as a surprise. The main problem with the prior studies is the lack of use of the necessary and correct statistical techniques, such as the use of p-values for validation, the use of a training/test split of data for model verification and the use of cross-validation. As a result, their results may reflect random chance and their models become prone to overfitting. In this research, all these problems inherent in the prior models are explained and the explanations of the statistical metrics are given in full detail.
The second objective of this study is to demonstrate the use of advanced techniques, machine learning algorithms, as a superior alternative to the conventional methods of linear regression for prediction of undrained shear strength. For that purpose, random forest, gradient boosting and the stacked models were developed and used. This choice proved to be successful, which is evident in the superior prediction capability of the suggested machine learning algorithms, especially in the case of the stacked model, which incorporates the MLR, gradient boosting and random forest models. Using the stacked model, several different models can be applied simultaneously. In this regard, there exist other machine learning models, like the hybrid nonlinear modeling used in Ref. [30] and the artificial neural network or adaptive neuro-fuzzy inference system models used in Ref. [31], which have proved successful, especially in terms of high prediction accuracy. As a future study, in order to improve the prediction accuracy, these and other machine learning models can be used for prediction of undrained shear strength. For ease of use of the suggested models, on the other hand, the entire source code is provided to readers in an open data repository as Ref. [29]. Moreover, the data used in this study is provided in Ref. [15] in order to enhance the database in literature on the subject matter and to facilitate the use of the provided code for prediction also by those engineers who do not have any training data at their disposal.

Acknowledgements The authors would like to thank Zemin Etud ve Tasarim A. S and Geocon Zemin Uzmanlari ve Muhendislik Ltd. Sti. for providing the data that was utilized in this study.

References

1. Sivrikaya O, Toğrol E. Determination of undrained strength of fine-grained soils by means of SPT and its application in Turkey. Engineering Geology, 2006, 86(1): 52–69
2. Sivrikaya O. Comparison of artificial neural networks models with correlative works on undrained shear strength. Eurasian Soil Science, 2009, 42(13): 1487–1496
3. Nassaji F, Kalantari B. SPT Capability to estimate undrained shear strength of fine-grained soils of Tehran, Iran. Electronic Journal of Geotechnical Engineering, 2011, 16: 1229–1238
4. Vu-Bac N, Lahmer T, Zhuang X, Nguyen-Thoi T, Rabczuk T. A software framework for probabilistic sensitivity analysis for computationally expensive models. Advances in Engineering Software, 2016, 100: 19–31
5. Hamdia K M, Silani M, Zhuang X, He P, Rabczuk T. Stochastic analysis of the fracture toughness of polymeric nanoparticle composites using polynomial chaos expansions. International Journal of Fracture, 2017, 206(2): 215–227
6. Hamdia K M, Ghasemi H, Zhuang X, Alajlan N, Rabczuk T. Sensitivity and uncertainty analysis for flexoelectric nanostructures. Computer Methods in Applied Mechanics and Engineering, 2018, 337: 95–109
7. R Core Team. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing, 2017
8. Terzaghi K, Peck R B. Soil Mechanics in Engineering Practice. New York: Wiley, 1967
9. Sanglerat G. The Penetrometer and Soil Exploration: Interpretation of Penetration Diagrams, Theory and Practice. Developments in Geotechnical Engineering 1. Amsterdam: Elsevier Publishing Company, 1972
10. Nixon I K. Standard penetration test: State of the art report. In: Proceedings of the 2nd European Symposium on Penetration Testing, Vol 1. Stockholm: AA Balkema Publishers, 1982, 3–24

11. Decourt L. General Report/Discussion session 2: SPT, CPT, pressuremeter testing and recent developments in in-situ testing-Part 2: The standard penetration test, state-of-the-art report. In: The 12th International Conference on Soil Mechanics and Foundation Engineering. Rio De Janeiro: Taylor & Francis, 1989, 2405–2416
12. Bolton Seed H, Tokimatsu K, Harder L F, Chung R M. Influence of SPT procedures in soil liquefaction resistance evaluations. Journal of Geotechnical Engineering, 1985, 111(12): 1425–1445
13. Skempton A W. Standard penetration test procedures and the effects in sands of overburden pressure, relative density, particle size, ageing and overconsolidation. Geotechnique, 1986, 36(3): 425–447
14. Hettiarachchi H, Brown T. Use of SPT blow counts to estimate shear strength properties of soils: Energy balance approach. Journal of Geotechnical and Geoenvironmental Engineering, 2009, 135(6): 830–834
15. Khalid W, Cinicioglu E N, Cinicioglu O. Undrained Shear Strength, SPT, Water Content, Atterberg Limits-2018. Mendeley Data v1, 2018
16. Bisht D C, Jangid A. Discharge modelling using adaptive neuro-fuzzy inference system. International Journal of Advanced Science and Technology, 2011, 31: 99–114
17. McGregor J A, Duncan J M. Performance and Use of the Standard Penetration Test in Geotechnical Engineering Practice. Blacksburg, Virginia: Virginia Polytechnic Institute and State University, 1998
18. Ajayi L A, Balogun L A. Penetration testing in tropical lateritic and residual soils: Nigerian experience. In: First International Symposium on Penetration Testing. Rotterdam: Balkema Pub., 1988, 315–328
19. Kulhawy F H, Mayne P W. Manual on Estimating Soil Properties for Foundation Design. California: Palo Alto, 1990
20. James G, Witten D, Hastie T, Tibshirani R. An Introduction to Statistical Learning. New York: Springer, 2013
21. Zhou J, Shi X, Du K, Qiu X, Li X, Mitri H S. Feasibility of random-forest approach for prediction of ground settlements induced by the construction of a shield-driven tunnel. International Journal of Geomechanics, 2017, 17(6): 04016129
22. Breiman L. Random forests. Machine Learning, 2001, 45(1): 5–32
23. Adusumilli S, Bhatt D, Wang H, Bhattacharya P, Devabhaktuni V. A low-cost INS/GPS integration methodology based on random forest regression. Expert Systems with Applications, 2013, 40(11): 4653–4659
24. Kuhn M. Building predictive models in R using the CARET package. Journal of Statistical Software, 2008, 28(5): 1–26
25. Kuhn M, Johnson K. Applied Predictive Modeling. New York: Springer, 2013
26. Bishop C M. Pattern Recognition and Machine Learning. New York: Springer, 2011
27. Friedman J H. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 2001, 29(5): 1189–1232
28. Breiman L. Stacked regressions. Machine Learning, 1996, 24(1): 49–64
29. Khalid W, Cinicioglu E N, Cinicioglu O. Code for Predicting Undrained Shear Strength Using CARET Package in R. Mendeley Data v1, 2019
30. Badawy M F, Msekh M A, Hamdia K M, Steiner M K, Lahmer T, Rabczuk T. Hybrid nonlinear surrogate models for fracture behavior of polymeric nanocomposites. Probabilistic Engineering Mechanics, 2017, 50: 64–75
31. Hamdia K M, Lahmer T, Nguyen-Thoi T, Rabczuk T. Predicting the fracture toughness of PNCs: A stochastic approach based on ANN and ANFIS. Computational Materials Science, 2015, 102: 304–313
