
Journal of Building Engineering 61 (2022) 105046


Predicting nominal shear capacity of reinforced concrete wall in building by metaheuristics-optimized machine learning
Jui-Sheng Chou *, Chi-Yun Liu, Handy Prayogo, Riqi Radian Khasani, Danny Gho,
Gretel Gaby Lalitan
Department of Civil and Construction Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan

A R T I C L E  I N F O

Keywords:
Reinforced concrete shear wall
Structural mechanics
Shear capacity
Machine learning
Extreme gradient boosting
Metaheuristic optimization
Jellyfish search optimizer
Symbiotic organisms search

A B S T R A C T

Reinforced concrete shear walls are used in many structural systems to resist earthquake loading. In recent earthquakes, shear wall buildings have tended to perform well. Modern building codes include provisions concerning shear capacity, which are recognized for their effectiveness. Studies have demonstrated that the American Concrete Institute (ACI) 318-19 provision uses a low safety factor and does not cover high-strength concrete shear walls, whereas the Eurocode 8 provision is overly conservative. A rational method for predicting shear wall capacity could be used as an alternative to the simplified provision in the codes. Nevertheless, the use of rational methods may present some difficulties for structural engineers because they require an iterative calculation to determine the peak strengths of shear walls. Accordingly, an appropriate data-driven machine learning scheme that accurately determines shear capacity is needed. Three experimental cases that involve various input variables are adopted herein to train single models and ensemble models. Numerical analytics show that the best result is achieved by using extreme gradient boosting (XGBoost), which involves conventional parameters and synthetic parameters that are inspired by the ACI shear wall strength equation. Subsequently, two metaheuristic optimization algorithms are used to fine-tune the hyperparameters of the generally recognized XGBoost. Two proposed metaheuristically optimized hybrid models, jellyfish search (JS)-XGBoost and symbiotic organisms search (SOS)-XGBoost, outperform the ACI provision equation and grid search optimization (GSO)-XGBoost in the literature in predicting the nominal capacity of reinforced concrete shear walls in buildings. Metaheuristics-optimized machine learning models can be used to improve building safety, simplify a cumbersome shear capacity calculation process, and reduce material costs. The systematic approach that is utilized herein also serves as a general framework for quantifying the performance of various mechanical models and empirical formulas that are used in design standards.

1. Introduction
Reinforced concrete (RC) shear walls (SWs) (Fig. 1) are used in many structural systems to resist earthquake loading [1]. In recent earthquakes, shear wall buildings have generally performed well [2]. Modern building codes, such as the American Concrete Institute (ACI) 318-19 code and Eurocode 2 (EC-2), include provisions concerning SW flexural and shear capacity that are recognized for their effectiveness.

* Corresponding author.
E-mail addresses: jschou@mail.ntust.edu.tw (J.-S. Chou), d10905002@mail.ntust.edu.tw (C.-Y. Liu), d10905818@mail.ntust.edu.tw (H. Prayogo), d10905812@mail.ntust.edu.tw (R.R. Khasani), m10905839@mail.ntust.edu.tw (D. Gho), m10905833@mail.ntust.edu.tw (G.G. Lalitan).

https://doi.org/10.1016/j.jobe.2022.105046
Received 10 June 2022; Received in revised form 22 July 2022; Accepted 25 July 2022
Available online 17 August 2022
2352-7102/© 2022 Elsevier Ltd. All rights reserved.

The mechanism that determines flexural capacity has been adequately explained by flexural theory [3], but the ACI code provisions for shear capacity are relatively unsophisticated [4]. Studies have shown that the ACI 318-19 provision has a low safety factor and does not cover high-strength concrete SWs; moreover, the Eurocode 8 provision is overly conservative [3].
A rational method to predict peak shear wall strength could be used in place of the simplified provisions in building codes.
Nevertheless, the use of rational methods, such as the softened strut-and-tie model [5] and the truss model [3], may present some difficulties to structural engineers because such methods require a relatively complex calculation to determine the peak strengths of the shear walls.
Hence, an alternative approach that can provide accurate values of shear strength and is simple enough to use is required.
In the last few years, artificial intelligence (AI)-based models have been shown to be effective in predicting the shear strengths of
deep beams [6], soils [7], and concrete columns [8]. The use of a data-driven AI-based model is desirable because of its relative
simplicity and ease of development relative to rule-based/rational models [9]. Furthermore, the end-user of an AI model does not need
to perform complex calculations.
Despite the ease of use of recently developed AI models, developing an accurate AI model is a daunting challenge. The difficulty
arises in optimizing the hyperparameters of an AI algorithm. The usage of sub-optimal hyperparameters may result in unsatisfactory
model performance [10]. A metaheuristic optimizer can be used to tune the hyperparameters, yielding an optimized AI-based model
that offers improved precision and performance in predicting shear strength of RC walls in buildings. The results of this research can
support building safety and simplify an otherwise tedious shear strength calculation process.
This paper is organized as follows. Section 2 reviews the literature on rational and machine learning methods for predicting the
shear capacity of RC shear walls. Section 3 provides the basics of machine learning techniques, metaheuristic optimization algorithms,
hybrid model construction, and methods for evaluating models. Section 4 describes the collected data, data preprocessing, Pearson’s
correlation analysis, and the setting of hyperparameters of models. Section 5 comprehensively compares prediction models to identify
the best one; this model is then optimized by fine-tuning its hyperparameters using metaheuristic algorithms and optimal hybrid AI
models are thus proposed. The final section draws conclusions.

2. Literature review
Fig. 2 depicts a shear wall: a structural wall designed to resist combinations of shear, moments, and axial forces in the plane
of the wall [11]. Reinforced concrete shear walls are often used in high-rise buildings to withstand lateral forces due to wind or
earthquake loads. Shear walls can be grouped into the following three categories [12].
• Short/squat walls: reinforced concrete walls with a height-to-length ratio of less than or equal to two. The failure of squat walls is
generally shear-related and non-ductile.
• Slender/flexural walls: reinforced concrete walls with a height-to-length ratio greater than or equal to three. The behavior of
slender walls tends to be controlled by flexure.
• Intermediate walls: reinforced concrete walls with a height-to-length ratio value between two and three. The behavior of such
reinforced concrete walls is governed by shear and flexure.

The main target of this research is squat walls, which tend to fail under shear rather than flexure. The nominal shear strength of a reinforced concrete shear wall is given by Eq. (1) and shall not exceed $0.66 A_{cv}\sqrt{f'_c}$ in the ACI 318-19 provision [11].

$$V_n = A_{cv}\left(\alpha_c \lambda \sqrt{f'_c} + \rho_t f_y\right) \tag{1}$$

where $V_n$ is the nominal shear strength (resistance); $A_{cv}$ is the gross area of the concrete section bounded by the web thickness and the length of the section in the direction of the shear force; $\alpha_c$ is the height-to-length ratio of the wall; $\lambda$ is a modification factor that reflects the fact that the mechanical properties of lightweight concrete are poorer than those of normal-weight concrete of the same compressive strength; $f'_c$ is

Fig. 1. Structural building components.


Fig. 2. Required design inputs and shear capacity of reinforced concrete shear wall.

the compressive strength of concrete; $\rho_t$ is the ratio of the area of distributed transverse reinforcement to the gross concrete area perpendicular to that reinforcement; and $f_y$ is the yield strength of non-prestressed reinforcement. Fig. 2 shows those geometric factors and additional mechanical properties that affect the shear strength of reinforced concrete walls, such as the wall height $h_w$, wall length $l_w$, web thickness $t_w$, flange width $b_f$, flange thickness $t_f$, vertical web reinforcement ratio $\rho_v$ and strength $f_{yv}$, horizontal web reinforcement ratio $\rho_h$ and strength $f_{yh}$, longitudinal boundary reinforcement ratio $\rho_L$ and strength $f_{yL}$, applied axial load $P$, aspect ratio $\alpha_c$ (wall height $h_w$ divided by wall length $l_w$), and square root of concrete compressive strength $\sqrt{f'_c}$.
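For concreteness, the sketch below evaluates Eq. (1) and the $0.66 A_{cv}\sqrt{f'_c}$ cap in Python. It is a minimal sketch, assuming mm² and MPa units so that the result converts to kN; the function name and example inputs are illustrative, not drawn from the database, and it is not a design tool.

```python
import math

def aci_nominal_shear_strength(A_cv, alpha_c, lam, f_c, rho_t, f_y):
    """Evaluate Eq. (1) with the ACI 318-19 cap of 0.66 * A_cv * sqrt(f'_c).

    Assumes A_cv in mm^2 and strengths in MPa, so the result (in N) is
    converted to kN; an illustrative sketch only.
    """
    v_n = A_cv * (alpha_c * lam * math.sqrt(f_c) + rho_t * f_y)  # Eq. (1), in N
    v_max = 0.66 * A_cv * math.sqrt(f_c)                         # code upper bound, in N
    return min(v_n, v_max) / 1e3                                 # convert N -> kN

# Illustrative inputs only (not a database record)
print(aci_nominal_shear_strength(A_cv=165_000, alpha_c=0.25, lam=1.0,
                                 f_c=33.0, rho_t=0.007, f_y=387.0))
```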

Following the successful application of machine learning (ML) in the engineering field [13,14], several investigations of the capacity of shear wall structures have been undertaken. Mangalathu et al. [15] developed an ML-based method to predict the failure modes of reinforced concrete shear walls based on experimental data concerning shear walls. Siam et al. [16] applied ML techniques to classify the performance and predict the drift of masonry shear walls. Gondia et al. [17] proposed a genetic programming model to predict the shear strength of squat walls.
Concrete structures are among the most common structures, and owing to their wide range of uses, understanding their behavior is
essential in structural engineering. Artificial Neural Networks (ANNs) [18] and fuzzy systems have been successfully used to estimate
the capacity of structural RC members and to determine the properties of concrete that influence it [19]. Tran, for instance, used ML
models to predict the chloride diffusion coefficient of concrete that contains supplementary cementitious materials, such as silica fume,
ground granulated blast furnace slag, and fly ash [20]. The ML technique represents a new way of predicting the effective stiffness of
precast concrete columns with greatly improved accuracy relative to conventional methods and has been used to investigate systematically the effects of design parameters on the stiffness of precast concrete columns [21].
In particular, ANN, Support Vector Regression (SVR) [22], and Random Forest (RF) [23] are widely used to predict the compressive strength of concrete [6,24], to recognize handwritten digits [25], and to forecast energy consumption [26–28] and solar energy generation [29]. Studies have shown that the aforementioned models are highly effective for prediction. As the field of ML progresses, some advanced
ensemble models have been developed. One such model is extreme gradient boosting (XGBoost). Most recently, Feng et al. [30] used
XGBoost [31] to improve the prediction accuracy of squat shear wall strength.
XGBoost is extensively used not only in engineering but in a wide range of other fields, predicting, for example, insurance claims [32] and protein-protein interactions [33], among many other applications. The widespread use of XGBoost has established that the model is
robust and can be applied in real-world problems. However, XGBoost has more hyperparameters than other ML techniques, such as
ANN, SVR, and RF [32]. The usage of a sub-optimal set of hyperparameters may yield unsatisfactory model performance and so the
development of optimal XGBoost models is quite challenging.
Metaheuristic algorithms have been shown to be effective for optimizing ML models that are difficult to formulate
mathematically (also known as black box models). Since they are gradient-free optimizers, metaheuristic algorithms have been used to
optimize SVM [34], ANN [35], and other ML model parameters. However, a review of the recent corpus of metaheuristic optimization
has identified the use of many ‘classical’ metaheuristic algorithms, such as Genetic Algorithm (GA) and Particle Swarm Optimization


(PSO) [36–38]. These classical algorithms, while effective, may not offer optimal results efficiently.
The newer metaheuristic algorithms Jellyfish Search (JS) and Symbiotic Organisms Search (SOS) [39,40] are parameter-free and so
do not require any form of parameter fine-tuning, unlike GA and PSO. They therefore hold a practical advantage over classical metaheuristic algorithms and are used in this study. The following numerical experiments will identify the optimal ML model to predict RC
wall shear strength. The usage of the superior metaheuristic algorithm with the best ML model constitutes a comprehensive framework
for precisely predicting RC shear wall strength.

3. Methodology
The following sub-sections present the machine learning techniques, metaheuristic optimization algorithms, hybrid model
framework, and evaluation methods that are adopted in this research.

3.1. Machine learning


3.1.1. Base learner
3.1.1.1. Artificial neural network. The Artificial Neural Network (ANN) is an artificial intelligence method that is inspired by human neurons [41,42]. A Multi-Layer Perceptron (MLP) is a fully connected class of feedforward ANN trained by back-propagation. The most basic unit of an MLP, the neuron, is combined with an activation function to provide a nonlinear mapping that fits the actual output values and improves the prediction accuracy. The output of each neuron is then fed forward into another neuron until an output neuron is ultimately reached. The performance of the ANN is frequently determined by the network size. Generally, more neurons or layers correspond to a higher probability of favorable performance since more weighting parameters can be evaluated by training.
3.1.1.2. Support vector regression. Support vector regression (SVR) is an extension of the support vector machine that was proposed by
Cortes and Vapnik [43] to solve regression problems. Equation (2) defines the generic SVR model.

$$f(x) = w^T \varphi(x) + b \tag{2}$$

where $f(x)$ is the regression function; $\varphi(x)$ is the kernel function that converts the input data $x$ into a higher-dimensional space; $w$ is the weight vector of the hyperplane; and $b$ is the bias parameter.
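As an illustration of Eq. (2), the following sketch fits a polynomial-kernel SVR with scikit-learn, mirroring the WEKA PolyKernel and C = 1.0 defaults listed in Table 1; the toy data stand in for the wall features, and scikit-learn is an assumption (the paper's base learners were built in WEKA).

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Toy data standing in for the wall features X and shear capacities y
rng = np.random.default_rng(0)
X = rng.random((100, 5))
y = X @ np.array([3.0, -1.0, 2.0, 0.5, 1.5]) + 0.1 * rng.standard_normal(100)

# f(x) = w^T phi(x) + b with a polynomial kernel, echoing Table 1's
# PolyKernel and C = 1.0 defaults
model = make_pipeline(StandardScaler(), SVR(kernel="poly", C=1.0))
model.fit(X, y)
print(model.predict(X[:3]))
```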

3.1.2. Ensemble model


3.1.2.1. Random forest. Random forest (RF) is an ensemble learning method that combines many decision trees. Tin Kam Ho [44] introduced the concept of random decision trees, and Breiman [45] extended the original version by combining bootstrap aggregation with randomized feature selection to develop the random forest. The RF employs the bootstrap resampling approach to construct multiple decision trees and average their predictions. By decorrelating the decision trees through bootstrap resampling, the generalization accuracy of the algorithm is increased.
A regression tree is used in the RF regression algorithm. Each of the regression trees is built on various bootstrap samples from the
initial training data. The regression trees are constructed using two thirds of the training dataset. The growth of each regression tree,
including its maximum depth and branching, depends on the hyperparameters. The error of the random forest is approximated using the remaining data, yielding the out-of-bag (OOB) error. The OOB error estimates the prediction error for the observations that are not used in building the current tree, and it can be used to evaluate relative variable importance, which shows how each feature influences the prediction model.
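A minimal scikit-learn sketch of the bootstrap and OOB mechanism described above follows (toy data; using scikit-learn's oob_score_ as the OOB estimate is our assumption, not the paper's stated tooling).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy data standing in for the wall features and shear capacities
rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = X.sum(axis=1) + 0.1 * rng.standard_normal(200)

# bootstrap=True resamples rows for each tree; oob_score=True scores each
# sample on the trees that never saw it, giving the out-of-bag estimate
rf = RandomForestRegressor(n_estimators=100, bootstrap=True, oob_score=True,
                           random_state=0)
rf.fit(X, y)
print(rf.oob_score_)            # OOB R^2 estimate of generalization
print(rf.feature_importances_)  # relative variable importance
```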
3.1.2.2. Extreme gradient boosting. Extreme gradient boosting (XGBoost) is a scalable tree boosting system that is widely used in data
science and yields state-of-the-art solutions to many real-world problems, using a minimal amount of resources [31]. This section
comprises three parts. The first part will explain the use of XGBoost for regression; the second part will elaborate the construction of
decision trees in XGBoost; and the third part illustrates the technique to improve generalization ability in decision trees.
XGBoost is an ensemble method that involves decision trees. The predicted value is calculated by combining the outputs of the individual decision trees.

$$\hat{y}_i = \sum_{k=1}^{K} \alpha_k f_k(x_i) \tag{3}$$

where $\hat{y}_i$ is the predicted value; $K$ is the number of decision trees; $f_k(\cdot)$ is the prediction function of a single decision tree; and $\alpha_k$ is a learning rate that is used to avoid overfitting.
XGBoost builds trees by minimizing the following loss function.

$$L^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t), \quad \text{where } \Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} \|w_j\|^2 \tag{4}$$

where $\hat{y}_i^{(t-1)}$ is the prediction of the $i$-th datum at the $(t-1)$-th iteration, and $l(\cdot)$ is a function that quantifies the squared difference between the prediction $\hat{y}_i$ and the target $y_i$. The purpose of the rest of the loss function is to determine the appropriate decision function that minimizes the loss. The second term $\Omega(\cdot)$ is a regularization term that penalizes the complexity of the model through the parameters ($\gamma$ and $\lambda$) that are used in generating the decision trees; $T$ is the number of leaves in the tree, and $w_j$ is the weight of leaf $j$.
The second-order Taylor approximation can be used to optimize quickly the objective in the general setting; by removing the


constant terms, the equation can be approximated as follows [31].


$$\tilde{L}^{(t)} = \sum_{i=1}^{n}\left[g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)\right] + \Omega(f_t) \tag{5}$$

where $g_i = \partial_{\hat{y}^{(t-1)}}\, l(y_i, \hat{y}^{(t-1)})$ and $h_i = \partial^2_{\hat{y}^{(t-1)}}\, l(y_i, \hat{y}^{(t-1)})$ are first- and second-order gradient statistics on the loss function, i.e., the first and second partial derivatives of the loss with respect to $\hat{y}_i^{(t-1)}$. The optimal weight $w_j$ that is used in Eq. (4) is calculated using the following equation (Eq. (6)) [31].

$$w_j = -\frac{\sum_{i\in I} g_i}{\sum_{i\in I} h_i + \lambda} \tag{6}$$

In building XGBoost trees for regression, the similarity score of each leaf in the tree is calculated using Eq. (7) [31].

$$\text{Similarity score} = \frac{\left(\sum_{i\in I} g_i\right)^2}{\sum_{i\in I} h_i + \lambda} \tag{7}$$

where $\lambda$ is a regularization parameter that prevents the overfitting of training data.


Then, the gain of each branch is used to determine how to split the data. Assume that $I_L$ and $I_R$ are the instance sets of the left and right branches after the split and $I = I_L \cup I_R$; the gain of each branch is given by Eq. (8) [31].

$$\text{Gain} = \frac{\left(\sum_{i\in I_L} g_i\right)^2}{\sum_{i\in I_L} h_i + \lambda} + \frac{\left(\sum_{i\in I_R} g_i\right)^2}{\sum_{i\in I_R} h_i + \lambda} - \frac{\left(\sum_{i\in I} g_i\right)^2}{\sum_{i\in I} h_i + \lambda} \tag{8}$$

Pruning is carried out by calculating the differences between Gain and a user-defined tree complexity parameter, γ. If the result is
positive, the branch is not pruned; otherwise, the branch is pruned or removed. If the branch is pruned, then γ must be subtracted from
the Gain of the upper branch. The XGBoost model keeps building new trees until the residuals reach a certain threshold or the model
reaches the maximum number of tree depths.

3.2. Metaheuristic optimization algorithm


Metaheuristic optimization algorithms are becoming increasingly popular for solving complex problems in various domains for the
following reasons [46]. (i) They rely on simple concepts and are easy to implement; (ii) they do not depend on information about the
gradient of the objective function; (iii) they do not become trapped in local minima; (iv) they can be utilized to solve a wide range of
problems in various fields. Exploration (diversification) and exploitation (intensification) are the two main phases of a metaheuristic
algorithm. The main differences among metaheuristics lie in how they balance those two processes through trajectory-based (exploitation-oriented) and population-based (exploration-oriented) methods [47].

Fig. 3. Jellyfish behavior in the ocean.


3.2.1. Jellyfish search optimizer


The artificial jellyfish search (JS) optimizer is a new metaheuristic optimization algorithm that is inspired by the behavior of
jellyfish as they seek food in the ocean (Fig. 3) [48]: they follow the ocean current at first; as time goes by, they increasingly move
within jellyfish swarms; and a time control mechanism governs the switching between these motions. In the JS algorithm, both
exploration and exploitation are considered. In the beginning, jellyfish sense the ocean current and use it to find planktonic food. Over
time, the movements of the jellyfish switch to passive and active motions inside swarms for exploitation. Toward the end, jellyfish
bloom occurs, which is the optimum phase.
Jellyfish search is based on three idealized rules [39].
(i). Jellyfish either follow the ocean current or move inside the swarm, and a “time control mechanism” governs the switching
between these types of movement.
(ii). Jellyfish move in the ocean in search of food. They are more attracted to locations where more food is available.
(iii). The quantity of food found is determined by the location and corresponding objective function.

In the JS, movement toward an ocean current is exploration; movements within a jellyfish swarm are exploitation, and a time
control mechanism switches between them. Initially, the probability of exploration exceeds that of exploitation; over time, the
probability of exploitation increases, ultimately becoming much higher than that of exploration. The jellyfish identify the best location
inside the searched areas. Fig. 4 presents the flowchart of the artificial JS optimizer.
3.2.1.1. Population initialization. The logistic map is used to improve the diversity of the initial population [39]. $X_i$ is the logistic chaotic value of the location of the $i$-th jellyfish; $X_0$ is used to generate the initial population of jellyfish, with $X_0 \in (0, 1)$, $X_0 \notin \{0.0, 0.25, 0.5, 0.75, 1.0\}$; and the parameter $\eta$ is set to 4.0. The population size, nPop, is set according to the complexity of the problem.

$$X_{i+1} = \eta X_i (1 - X_i) \tag{9}$$
3.2.1.2. Boundary conditions. Oceans, which represent the search space of the JS algorithm, surround the globe. Since the earth is approximately spherical, when a jellyfish moves outside the bounded search area (that is, across the Earth's North or South Pole), it is assumed to return to the opposite bound. Equation (10) presents this re-entry process. $X_{i,d}$ is the location of the $i$-th jellyfish in the $d$-th dimension [39]; $X'_{i,d}$ is the updated location after the boundary constraints have been imposed; and $U_{b,d}$ and $L_{b,d}$ are the upper and lower bounds on the search space in the $d$-th dimension, respectively.

Fig. 4. Algorithmic flowchart of JS optimizer.


$$X'_{i,d} = \begin{cases} \left(X_{i,d} - U_{b,d}\right) + L_{b,d} & \text{if } X_{i,d} > U_{b,d} \\ \left(X_{i,d} - L_{b,d}\right) + U_{b,d} & \text{if } X_{i,d} < L_{b,d} \end{cases} \tag{10}$$

3.2.1.3. Ocean current. The ocean current contains large amounts of nutrients, so the jellyfish are attracted to it. The direction of the ocean current ($\overrightarrow{trend}$) is determined by averaging all of the vectors from each jellyfish in the ocean to the jellyfish that is currently in the best location (Eq. (11)). $X^*$ is the location of the jellyfish that currently has the best location in the swarm; $\mu$ is the mean location of all jellyfish; $X_i(t+1)$ is the new location of each jellyfish (Eq. (12)); and $\beta > 0$ is a distribution coefficient that is related to the length of $\overrightarrow{trend}$. From the results of a sensitivity analysis [39] based on numerical experiments, $\beta = 3$ is obtained.

$$\overrightarrow{trend} = X^* - \beta \times \text{rand}(0,1) \times \mu \tag{11}$$

$$X_i(t+1) = X_i(t) + \text{rand}(0,1) \times \left(X^* - \beta \times \text{rand}(0,1) \times \mu\right) \tag{12}$$


3.2.1.4. Jellyfish swarm. In a swarm, jellyfish exhibit passive (type A) and active (type B) motions. Initially, when the swarm has just been formed, most jellyfish exhibit type A motion. Over time, they increasingly exhibit type B motion. Type A motion is the motion of jellyfish around their own locations, and the corresponding updated location of each jellyfish is given by Eq. (13) [39], where $U_b$ and $L_b$ are the upper and lower bounds of the search space, respectively, and $\gamma > 0$ is a motion coefficient that is related to the length of motion around the locations of the jellyfish.

$$X_i(t+1) = X_i(t) + \gamma \times \text{rand}(0,1) \times (U_b - L_b) \tag{13}$$
To simulate type B motion, a jellyfish (j) other than the one of interest is selected at random, and a vector from the jellyfish of
interest (i) to the selected jellyfish (j) specifies the direction of movement. When the quantity of food at the location of the selected
jellyfish (j) exceeds that at the location of the jellyfish (i) of interest, the latter moves toward the former; it moves directly away from it
if the quantity of food available to the selected jellyfish (j) is lower than that available to the jellyfish of interest (i). Hence, each
jellyfish moves in a direction that favors its finding food in a swarm. Equations (14) and (15) simulate the direction of motion and the
updated location of a jellyfish, respectively [39]. This movement is considered to be effective exploitation of the local search space.
$$X_i(t+1) = X_i(t) + \text{rand}(0,1) \times \overrightarrow{direction} \tag{14}$$

$$\overrightarrow{direction} = \begin{cases} X_j(t) - X_i(t) & \text{if } f(X_i(t)) \ge f(X_j(t)) \\ X_i(t) - X_j(t) & \text{if } f(X_i(t)) < f(X_j(t)) \end{cases} \tag{15}$$
3.2.1.5. Time control mechanism. To regulate the movement of jellyfish between following the ocean current and moving inside the
jellyfish swarm, the time control mechanism [39] involves a time control function c(t) and a constant 0.5. The time control function
yields a random value that fluctuates between 0 and 1 over time. Equation (16) specifies the time control function. When its value
exceeds 0.5, the jellyfish follow the ocean current; when its value is less than 0.5, they move inside the swarm. The variable t is the time
specified as the iteration number. The maximum number of iterations, Maxiter, like the population size, is an initialized parameter.
Similar to c(t), the function (1− c(t)) is used to simulate the movement inside a swarm (type A or B).
$$c(t) = \left| \left(1 - \frac{t}{\text{Max}_{iter}}\right) \times \left(2 \times \text{rand}(0,1) - 1\right) \right| \tag{16}$$
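To make the JS mechanics concrete, the following Python sketch strings together the logistic-map initialization (Eq. (9)), the re-entry boundary rule (Eq. (10)), the ocean-current move (Eqs. (11)–(12)), the swarm motions (Eqs. (13)–(15)), and the time control function (Eq. (16)). It is a simplified illustration, not the authors' implementation: the function name, the greedy acceptance step, and the per-individual evaluation of c(t) are our own simplifications.

```python
import numpy as np

def jellyfish_search(f, lb, ub, n_pop=20, max_iter=50, beta=3.0, gamma=0.1, seed=0):
    """Illustrative JS sketch following Eqs. (9)-(16); not the reference code."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    dim = len(lb)

    # Logistic-map initialization, Eq. (9), with eta = 4.0 (X0 drawn randomly
    # here; the paper excludes a few fixed points such as 0.25 and 0.5)
    x = np.empty((n_pop, dim))
    x[0] = rng.random(dim)
    for i in range(1, n_pop):
        x[i] = 4.0 * x[i - 1] * (1.0 - x[i - 1])
    pop = lb + x * (ub - lb)
    fit = np.array([f(p) for p in pop])

    for t in range(1, max_iter + 1):
        for i in range(n_pop):
            best = pop[fit.argmin()]
            c = abs((1 - t / max_iter) * (2 * rng.random() - 1))  # Eq. (16)
            if c >= 0.5:
                # Follow the ocean current, Eqs. (11)-(12)
                new = pop[i] + rng.random(dim) * (best - beta * rng.random() * pop.mean(axis=0))
            elif rng.random() > 1 - c:
                # Passive (type A) motion inside the swarm, Eq. (13)
                new = pop[i] + gamma * rng.random(dim) * (ub - lb)
            else:
                # Active (type B) motion, Eqs. (14)-(15)
                j = (i + 1 + int(rng.integers(n_pop - 1))) % n_pop  # j != i
                step = pop[j] - pop[i] if fit[j] <= fit[i] else pop[i] - pop[j]
                new = pop[i] + rng.random(dim) * step
            # Re-entry across the bounds, Eq. (10)
            new = np.where(new > ub, (new - ub) + lb, new)
            new = np.where(new < lb, (new - lb) + ub, new)
            new_fit = f(new)
            if new_fit < fit[i]:  # greedy acceptance (a simplification)
                pop[i], fit[i] = new, new_fit
    return pop[fit.argmin()], fit.min()

# Usage: minimize the sphere function over [-5, 5]^3
print(jellyfish_search(lambda p: float((p ** 2).sum()), [-5] * 3, [5] * 3))
```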

Fig. 5. Symbiotic organisms in an ecosystem.


3.2.2. Symbiotic organisms search


Symbiotic organism search (SOS) [40] is a population-based search algorithm that is inspired by symbiotic interactions among
organisms in an ecosystem. Fig. 5 presents a group of symbiotic organisms that live together in an ecosystem [40]. SOS begins with an
initial population called the ecosystem. In the initial ecosystem, a group of organisms is generated randomly within the search space. Each
organism represents one candidate solution to the corresponding problem. Each organism in the ecosystem is associated with a certain
fitness value, which reflects degree of adaptation to the desired objective. Interactions benefit both sides in the mutualism phase;
benefit one side and do not impact the other in the commensalism phase; benefit one side and actively harm the other in the parasitism
phase. Each organism interacts with the other organism randomly through all phases. The process is repeated until termination criteria
are met.
As a metaheuristic algorithm, SOS integrates the concepts of exploration and exploitation to find near-optimal solutions efficiently. However, because the SOS algorithm does not possess any algorithm-specific parameters, it may become trapped in local optima or converge prematurely in some cases. Numerical experiments and validation are therefore required before the algorithm is applied to practical problems. Fig. 6 displays the flowchart of the SOS algorithm.
3.2.2.1. Ecosystem initialization. In the early stage of SOS operations, the initial population matrix, which represents the candidate
solutions, is created. The number of dimensions of the matrix depends on the number of organisms (eco_size) in the ecosystem and the
number of decision variables (D), or the number of variables sought. Each organism represents a candidate solution to the corre­
sponding problem. Each organism in the ecosystem is associated with a certain fitness value, which reflects the degree of adaptation to
the desired objective.
3.2.2.2. Mutualism phase. In SOS, an organism $X_i$ is matched to the $i$-th member of the ecosystem. Another organism $X_j$ is chosen at random from the ecosystem to interact with $X_i$. Both organisms engage in a mutualistic relationship to increase their mutual survival rate in the ecosystem. The update equations of the two organisms are Eqs. (17) and (18) [40], where rand(0, 1) is a uniformly distributed random number between zero and unity; $X_{best}$ denotes the organism with the best fitness value in the population; and MV (mutual_vector), set as the average of the two organisms, represents the relationship between them. The benefit factor (BF) is set at random to either 1 (partial benefit) or 2 (full benefit) for each organism, and specifies whether an organism partially or fully benefits from the interaction.

Fig. 6. Flowchart of SOS algorithm.


$$X_i^{new} = X_i + \text{rand}(0,1) \times (X_{best} - \text{MV} \times BF_1) \tag{17}$$

$$X_j^{new} = X_j + \text{rand}(0,1) \times (X_{best} - \text{MV} \times BF_2) \tag{18}$$


3.2.2.3. Commensalism phase. The commensalism phase in the ecosystem favors only one organism, while the other organism receives minimal benefit or is unaffected. In SOS, an organism $X_j$ is selected at random from the ecosystem to interact with $X_i$. Under this condition, organism $X_i$ attempts to benefit from the interaction, whereas organism $X_j$ itself neither benefits nor suffers from the relationship. The new candidate solution $X_i^{new}$ is calculated according to Eq. (19) [40]. Following the biological laws of nature, organism $X_i$ is updated only if its new fitness exceeds its pre-interaction fitness.

$$X_i^{new} = X_i + \text{rand}(-1,1) \times (X_{best} - X_j) \tag{19}$$

Fig. 7. Flowchart of hybrid model evaluation.


3.2.2.4. Parasitism phase. The parasitism phase in the ecosystem benefits only one organism, while the other organism suffers a disadvantage from the interaction. In SOS, organism $X_i$ plays a role similar to that of a parasite through a 'Parasite_Vector', which is created in the search space by duplicating organism $X_i$ and modifying it in randomly selected dimensions. Organism $X_j$ is selected at random from the ecosystem and acts as a host to the parasite vector. The fitness values of both organisms are then evaluated. If the Parasite_Vector has a better fitness value than organism $X_j$, it replaces $X_j$ in the ecosystem [40].
3.2.2.5. Termination criteria. The SOS algorithm is iteratively implemented until pre-specified termination criteria are satisfied. The
termination criteria that are used in metaheuristic applications usually involve the maximum number of iterations (max_iter) or the
number of fitness evaluations (FE). Once the algorithm is terminated, a solution is presented.
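The three phases can be sketched compactly in Python; this is an illustrative simplification rather than the reference SOS code (the clipping-based bound handling, the 50% mutation mask in the parasitism phase, and the helper names are our assumptions). Note that each organism triggers four objective-function evaluations per iteration (two in mutualism, one each in commensalism and parasitism), matching the evaluation count discussed later in Section 5.4.

```python
import numpy as np

def sos(f, lb, ub, eco_size=20, max_iter=50, seed=0):
    """Illustrative SOS sketch covering Eqs. (17)-(19); not the reference code."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    dim = len(lb)
    eco = lb + rng.random((eco_size, dim)) * (ub - lb)   # initial ecosystem
    fit = np.array([f(o) for o in eco])

    def accept(i, cand):
        """Greedy replacement: keep the candidate only if it is fitter."""
        cand = np.clip(cand, lb, ub)  # simple bound handling (our choice)
        cf = f(cand)
        if cf < fit[i]:
            eco[i], fit[i] = cand, cf

    for _ in range(max_iter):
        for i in range(eco_size):
            best = eco[fit.argmin()]
            j = (i + 1 + int(rng.integers(eco_size - 1))) % eco_size  # j != i
            # Mutualism, Eqs. (17)-(18): both organisms may benefit via MV
            mv = (eco[i] + eco[j]) / 2.0
            bf1, bf2 = int(rng.integers(1, 3)), int(rng.integers(1, 3))
            accept(i, eco[i] + rng.random(dim) * (best - mv * bf1))
            accept(j, eco[j] + rng.random(dim) * (best - mv * bf2))
            # Commensalism, Eq. (19): only organism i benefits
            accept(i, eco[i] + rng.uniform(-1, 1, dim) * (best - eco[j]))
            # Parasitism: a mutated copy of i competes with host j
            parasite = eco[i].copy()
            mask = rng.random(dim) < 0.5
            parasite[mask] = lb[mask] + rng.random(int(mask.sum())) * (ub - lb)[mask]
            accept(j, parasite)
    return eco[fit.argmin()], fit.min()

# Usage: minimize the sphere function over [-5, 5]^3
print(sos(lambda p: float((p ** 2).sum()), [-5] * 3, [5] * 3))
```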

3.3. Hybrid model framework


In evaluating the adequacy of the proposed hybrid models, the data are divided into two sets – a training set (70%) and a testing set
(30%). The training set is used to tune the model by the metaheuristic optimization method. Ten-fold cross-validation is employed to
evaluate the performance (fitness) of the model. Once the optimal values of the hyperparameters have been obtained, the model is
tested against the testing set to calculate its final performance metrics. Fig. 7 displays the flowchart of hybrid model evaluation. Once the framework is satisfactorily validated and tested, a model built with the collected data can be trained to generate predictive analytics.
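A minimal sketch of this evaluation flow, assuming scikit-learn and the xgboost package (the toy data stand in for the wall dataset, and the fitness helper name is ours):

```python
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from xgboost import XGBRegressor

# Toy stand-ins for the 492-record wall dataset (16 features, capacity in kN)
rng = np.random.default_rng(0)
X = rng.random((492, 16))
y = 1000.0 * X[:, 0] + 100.0 * rng.standard_normal(492)

# 70/30 split; only the training set is seen during metaheuristic tuning
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

def fitness(params):
    """Ten-fold cross-validated RMSE on the training set (to be minimized)."""
    scores = cross_val_score(XGBRegressor(**params), X_tr, y_tr, cv=10,
                             scoring="neg_root_mean_squared_error")
    return -scores.mean()

# A metaheuristic proposes hyperparameters, fitness() scores them, and the
# best setting is finally refit on X_tr and evaluated once on (X_te, y_te).
print(fitness({"n_estimators": 100, "max_depth": 6}))
```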

3.4. Performance metric


Four statistical performance metrics are used in this study: the goodness of fit (R2), the root mean square error (RMSE), the mean absolute error (MAE), and the mean absolute percentage error (MAPE). They determine which methods better predict the shear strength of a shear wall and are formulated as follows.
$$R^2 = \left(\frac{n\sum_{i=1}^{n} \hat{y}_i y_i - \sum_{i=1}^{n} \hat{y}_i \sum_{i=1}^{n} y_i}{\sqrt{\left[n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]\left[n\sum_{i=1}^{n} \hat{y}_i^2 - \left(\sum_{i=1}^{n} \hat{y}_i\right)^2\right]}}\right)^2 \tag{20}$$

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{21}$$

Table 1
ANN, SVR, RF, and XGBoost parameter settings.

Machine learning Parameter Default value

Artificial Neural Network (ANN) hidden_layer_number 1


hidden_layer_node 4
BatchSize 100
Learning Rate 0.3
Momentum 0.2
Training Time 500
ValidationThreshold 20
Support Vector Regression (SVR) BatchSize 100
C 1.0
Kernel PolyKernel
NumDecimalPlaces 2
Regression Optimizer RegSMOImproved
Random Forest (RF) n_estimators 100
max_depth None
min_sample_split 2
min_sample_leaf 1
min_weight_fraction_leaf 0
max_features None
max_leaf_nodes None
min_impurity_decrease 0
Bootstrap True
max_samples None
Extreme Gradient Boosting (XGBoost) eta 0.3
gamma 0
max_depth 6
min_child_weight 1
max_delta_step 0
subsample 1
colsample_bytree 1
lambda 1
alpha 0


$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left|y_i - \hat{y}_i\right| \tag{22}$$

$$\text{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n} \left|\frac{y_i - \hat{y}_i}{y_i}\right| \tag{23}$$

where $n$ is the number of observations; $y_i$ is the actual value of the $i$-th observation; and $\hat{y}_i$ is the predicted value of the $i$-th observation.
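The four metrics translate directly into code; the sketch below follows Eqs. (20)–(23), computing R2 as the squared Pearson correlation per Eq. (20):

```python
import numpy as np

def metrics(y, y_hat):
    """R2 (squared Pearson correlation, Eq. (20)), RMSE, MAE, and MAPE."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    r = np.corrcoef(y, y_hat)[0, 1]                                  # Eq. (20)
    return {"R2": r ** 2,
            "RMSE": float(np.sqrt(np.mean((y - y_hat) ** 2))),      # Eq. (21)
            "MAE": float(np.mean(np.abs(y - y_hat))),               # Eq. (22)
            "MAPE": float(100 * np.mean(np.abs((y - y_hat) / y)))}  # Eq. (23)

print(metrics([100.0, 200.0, 300.0], [110.0, 190.0, 310.0]))
```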

4. Data preprocessing and model setting


The shear wall dataset is collected from the Shear Wall Database [3,30]. The data are then preprocessed to remove typos, duplicate records, and missing values, reducing the disturbance of anomalies on model stability. Data are collected and preprocessed in the following steps.
(i). Original data from Feng et al. (2021) [30] are used. The data concern 534 shear walls.
(ii). The dataset of Feng et al. is drawn from the studies of Ning and Li (2017) [49] and Massone and Melo (2018) [50]. The combined dataset contained 100 duplicate ("twin") wall records, which were deleted, leaving data for 434 walls.
(iii). The wall data from Chandra et al. (2018) [3] concerning 84 walls were then merged in. Twenty-six duplicate samples were removed, leaving data concerning a total of 492 shear walls.
Since the range of values of raw data is wide, objective functions do not work effectively without normalization in some machine
learning algorithms [51]. Feature scaling is one of the most important data preprocessing steps in machine learning. In this investigation, standardization is used to put different variables on the same scale: the mean is subtracted from each value of each variable, and
then division by the standard deviation of the variable shifts its distribution to have a mean of zero and a standard deviation of one.
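As an illustration, this standardization step is a one-liner with scikit-learn's StandardScaler (toy rows shown in place of the 492-record dataset):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy rows standing in for the 492-record, 16-feature wall dataset
X_train = np.array([[150.0, 420.0], [4572.0, 3960.0], [1040.0, 1352.0]])

scaler = StandardScaler()              # z = (x - mean) / std, per variable
X_std = scaler.fit_transform(X_train)  # fit statistics on training data only
print(X_std.mean(axis=0), X_std.std(axis=0))  # approximately 0 and 1
```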
The WEKA 3.8 and Python 3.9 programming software packages were used in this work. They were used to construct base ML
models (WEKA), and to implement ensemble and metaheuristics-optimized XGBoost models (Python). Numerical experiments are
performed on a desktop computer with an Intel Core i7-10700 CPU@2.90 GHz processor and 16 GB DDR4 RAM. Table 1 presents the
default hyperparameters of the machine learning models (ANN, SVR, RF, and XGBoost).

5. Results and discussion


5.1. Case scenarios for shear wall strength prediction
Three case scenarios of shear wall datasets are considered to train the machine learning models. In the first case, the model is run
using all of the available original features; in the second case, it is run using synthetic features that are evaluated using the ACI equation
(Eq. (1)) for the shear capacity of a structural wall, and in the third case, it is run using all of the original features and the synthetic
features for shear wall strength prediction. Case 2 is considered to analyze the effect of reduced dimensionality on predictive performance and to enable a fair accuracy comparison between the machine learning technique and the current design code. Additionally, reducing the number of feature dimensions is expected to decrease data acquisition costs and simplify laboratory experiments, provided that the model retains satisfactory accuracy. Table 2 presents the variables in all three cases. Appendix 1 shows the
datasets for cases 1 to 3.

5.2. Exploratory data analysis


Data exploration, also known as exploratory data analysis, is a process by which statistical and visualization methods are used to

Table 2
Selected feature factors in three cases.

Group Description Variable Symbol Case 1 Case 2 Case 3

Feature from original dataset Wall height X1 hw ✓ – ✓


Wall length X2 lw ✓ – ✓
Web thickness X3 tw ✓ – ✓
Flange thickness X4 tf ✓ – ✓
Flange width X5 bf ✓ – ✓
Concrete strength X6 f'c ✓ – ✓
Horizontal web reinforcement strength X7 f yh ✓ ✓ ✓
Vertical web reinforcement strength X8 f yv ✓ – ✓
Longitudinal boundary reinforcement strength X9 f yL ✓ – ✓
Horizontal web rebar ratio X10 ρh ✓ ✓ ✓
Vertical web rebar ratio X11 ρv ✓ – ✓
Longitudinal boundary rebar ratio X12 ρL ✓ – ✓
Axial load of wall X13 P ✓ – ✓
Additional feature required when using ACI equation Gross area of concrete section X14 Acv – ✓ ✓
Aspect ratio X15 αc – ✓ ✓
Square root of concrete strength X16 √f'c – ✓ ✓


understand and visualize data. This step helps to identify patterns in a dataset. Data exploration has three primary goals: to identify the characteristics of single variables, to reveal patterns in data distributions, and to determine relationships between variables.
Visualization methods graphically represent data in graphs and charts to facilitate understanding of complex structures and relationships within the data. Table 3 presents the statistical descriptions of each variable, and Fig. 8 displays a visualization of
relevant statistical distributions. Obviously, these variables display non-normal and irregular distributions, supporting the need to
determine the input-output relationship by advanced data analytics.
The input variables are the wall height $h_w$, wall length $l_w$, web thickness $t_w$, flange width $b_f$, flange thickness $t_f$, concrete compressive strength $f'_c$, vertical web reinforcement ratio $\rho_v$ and strength $f_{yv}$, horizontal web reinforcement ratio $\rho_h$ and strength $f_{yh}$, longitudinal boundary reinforcement ratio $\rho_L$ and strength $f_{yL}$, applied axial load $P$, gross area of concrete section $A_{cv}$, aspect ratio $\alpha_c$, and square root of concrete compressive strength $\sqrt{f'_c}$. The output is simply the nominal shear capacity of the squat wall, $V_n$. From Feng et al., 13 input variables and one output are used, and three new input variables, $A_{cv}$, $\alpha_c$, and $\sqrt{f'_c}$, are added in this study (Table 2). A total of 492 records with 16 input variables and one output variable are thus involved. The entire database is randomly split into training (70%) and testing (30%) sets.
Pearson's correlation coefficient is used to measure the linear correlation between two variables. Its value lies between -1 and +1, where -1 indicates a perfect negative linear correlation, 0 indicates no linear correlation, and 1 indicates a perfect positive linear correlation. A coefficient with a magnitude between 0.50 and 1 is said to indicate a strong correlation. Fig. 9 presents a heatmap of the correlation coefficients between pair-wise variables. The newly added variables based on the ACI equation are seen to be strongly correlated with existing variables. For example, the gross area of concrete section $A_{cv}$ is strongly correlated with seven variables: $h_w$, $l_w$, $t_w$, $t_f$, $b_f$, $P$, and $V_n$. The aspect ratio $\alpha_c$ is strongly correlated with the wall height $h_w$. The square root of concrete strength $\sqrt{f'_c}$ is strongly correlated with three variables: $f'_c$, $f_{yL}$, and $P$. The output variable $V_n$ is strongly correlated with six variables: $l_w$, $t_w$, $t_f$, $b_f$, $P$, and $A_{cv}$.
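A sketch of this correlation screening with pandas (toy data and an abbreviated column set in place of the full database):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the 492-record database (column names illustrative)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((492, 4)), columns=["hw", "lw", "tw", "Vn"])

corr = df.corr(method="pearson")   # pairwise Pearson coefficients
strong = corr["Vn"].abs() >= 0.5   # |r| >= 0.5 flagged as "strong"
print(corr.round(2))
print(strong)
```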

5.3. Comparison of single and ensemble models


To evaluate the performance of single (ANN and SVR) and ensemble (RF and XGBoost) models, the loss functions of the models in
each case are compared. In the first case in which only the factors from Feng et al. (2021) are used, RF performs best with R2, RMSE,
and MAE values of 0.966, 123.82 kN, and 72.50 kN, respectively, while XGBoost has a similar performance with the smallest MAPE of
15.01%, as seen in Table 4.
In the second case with five factors, the predictive performance of RF equals that of XGBoost. RF performs best with an R2 and
RMSE of 0.965 and 125.34 kN, respectively, while the XGBoost yields the smallest MAE and MAPE of 73.25 kN and 13.88%. In the
third case, XGBoost performs best with R2, RMSE, MAE, and MAPE values of 0.978, 99.36 kN, 59.96 kN, and 13.22%, respectively.
Fig. 10 displays the MAPEs of single and ensemble models. In terms of predictive accuracy, the ensemble models present superior
performance in all three cases.
Overall, the ensemble models outperform the single (baseline) models. Among the ensemble models, XGBoost performs best in
most evaluation measures, so the JS and SOS metaheuristic algorithms are integrated into XGBoost to maximize performance in the
prediction of shear wall strength. An interesting finding is that XGBoost has better predictive accuracy when the number of input factors is large, whereas RF is preferred when the number of factors is small.

5.4. Integration of machine learning and optimization algorithm to establish hybrid models
Two recently developed optimization algorithms, jellyfish search (JS) and symbiotic organisms search (SOS), are integrated with

Table 3
Variables in database of shear strength of squat walls.

Description Variable Symbol Unit Minimum Maximum Mean Standard deviation

Wall height X1 hw mm 150 4572 1039.641 716.2902


Wall length X2 lw mm 420 3960 1352.025 741.9087
Web thickness X3 tw mm 10 160 73.79238 40.66748
Flange thickness X4 tf mm 30 360 130.7381 76.58449
Flange width X5 bf mm 30 3045 241.121 287.2892
Concrete strength X6 f'c MPa 12.3 138 33.12083 21.92864
Horizontal web reinforcement strength X7 f yh MPa 0 1420 387.3421 152.2247
Vertical web reinforcement strength X8 f yv MPa 0 1420 384.8493 150.9224
Longitudinal boundary reinforcement strength X9 f yL MPa 208.9 1009 401.0812 126.1964
Horizontal web rebar ratio X10 ρh – 0 0.03667 0.006865 0.005496
Vertical web rebar ratio X11 ρv – 0 0.03667 0.007003 0.005798
Longitudinal boundary rebar ratio X12 ρL – 0.004 0.10583 0.033842 0.020022
Axial load of wall X13 P kN 0 2617 264.8215 465.4279
Gross area of concrete section X14 Acv mm2 10800 825450 165018.4 159520.1
Aspect ratio X15 αc – 0.21049 2.4 0.826876 0.447027
Square root of concrete strength X16 √f'c MPa 3.50713 11.74734 5.529621 1.596655
Shear capacity of structural wall Y Vn kN 16.377 3138.128 599.9185 646.3667


Fig. 8. Statistical distribution of input and output variables.


Fig. 9. Correlation analysis.

the best model (XGBoost), identified in the aforementioned section, to establish hybrid models, JS-XGBoost and SOS-XGBoost, for
prediction of shear wall strength. The search space for various hyperparameters of XGBoost, including eta, gamma, max_depth,
min_child_weight, max_delta_step, subsample, colsample_bytree, lambda, and alpha, is defined as a range of values that is taken from
the literature [52] and provided in Table 5. The search range and hardware devices that are used in this study can be used for reference
although a larger search range should be considered in future work. In the SOS algorithm, the objective function is evaluated four times per organism in a single iteration: twice in the mutualism phase and once each in the commensalism and parasitism phases. In the JS algorithm, the objective function is evaluated only once per individual in a single iteration. To make a fair comparison between the optimization algorithms, the
initialization control parameters for JS and SOS are given in Table 6.
The objective function of the metaheuristic algorithms is the minimization of RMSE (kN). The training dataset is used to optimize
the hyperparameter settings. Ten-fold cross-validation is adopted to evaluate the performance (fitness) of the model. After the hybrid
models have been optimized, the test dataset is used to evaluate the prediction performance. Because the generalizability of the resulting model must be considered, the hybrid models are run five times to eliminate random bias in the population initialization of the JS and SOS algorithms. Three shear wall datasets are used to evaluate the hybrid models.
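As a sketch of how the Table 5 bounds feed the optimizers, the snippet below defines the search space and a hypothetical decode helper that maps a continuous position vector to XGBoost keyword arguments (the rounding of max_depth is our assumption); a metaheuristic such as the JS sketch in Section 3.2.1 would then minimize the cross-validated fitness from Section 3.3 over these bounds.

```python
import numpy as np

# Search-space bounds from Table 5 (order: eta, gamma, max_depth,
# min_child_weight, max_delta_step, subsample, colsample_bytree, lambda, alpha)
LB = np.array([0.001, 0, 0, 0, 0, 0.001, 0.5, 0, 0])
UB = np.array([1.0, 100, 100, 100, 100, 1.0, 1.0, 10, 10])

def decode(p):
    """Map a continuous position vector to XGBoost keyword arguments."""
    return {"learning_rate": p[0], "gamma": p[1], "max_depth": int(round(p[2])),
            "min_child_weight": p[3], "max_delta_step": p[4], "subsample": p[5],
            "colsample_bytree": p[6], "reg_lambda": p[7], "reg_alpha": p[8]}

# Each metaheuristic minimizes fitness(decode(p)) over [LB, UB], e.g. with the
# JS sketch from Section 3.2.1 and the CV fitness from Section 3.3:
#   best_p, best_rmse = jellyfish_search(lambda p: fitness(decode(p)), LB, UB,
#                                        n_pop=20, max_iter=50)
print(decode(LB + np.random.default_rng(0).random(9) * (UB - LB)))
```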
In the first case, in which only the 13 factors from Feng et al. (2021) are used, JS-XGBoost exhibits the best average performance
with R2, RMSE, MAE, and MAPE values of 0.976, 104.46 kN, 64.06 kN, and 16.92% respectively, over five runs. Table 7 compares the
performance measures of hybrid models. Both JS-XGBoost and SOS-XGBoost achieve better RMSE values than the original XGBoost. An
average RMSE of 104.46 kN was obtained using JS-XGBoost in five runs. This result is 3.22 kN better than the 107.68 kN that was
obtained using XGBoost with the default hyperparameters.


Table 4
Comparison of performance measures for single and ensemble models.

No. Model Method Dataset Metric

R2 RMSE (kN) MAE (kN) MAPE (%)

Case 1 Single ANN Training 0.980 95.06 67.63 32.12


Test 0.929 193.55 116.10 36.02
SVR Training 0.828 260.51 121.52 34.08
Test 0.749 346.50 156.93 40.26
Ensemble RF Training 0.991 57.38 32.08 6.39
Test 0.966 123.82 72.50 15.43
XGBoost Training 0.999 2.99 1.35 0.84
Test 0.965 125.85 72.76 15.01
Case 2 Single ANN Training 0.899 206.43 139.66 75.10
Test 0.882 225.05 150.16 90.69
SVR Training 0.844 256.17 142.26 42.47
Test 0.852 250.49 135.15 51.51
Ensemble RF Training 0.989 64.38 34.89 7.16
Test 0.965 125.34 73.77 14.58
XGBoost Training 0.999 10.29 3.88 1.87
Test 0.956 140.76 73.25 13.88
Case 3 Single ANN Training 0.978 102.52 77.11 53.03
Test 0.960 140.76 96.02 65.98
SVR Training 0.874 229.54 110.80 26.42
Test 0.852 251.17 122.51 32.87
Ensemble RF Training 0.992 55.59 31.04 5.97
Test 0.972 110.90 65.71 13.41
XGBoost Training 0.999 3.17 1.60 0.89
Test 0.978 99.36 59.96 13.22

Fig. 10. MAPEs of single and ensemble models.

Table 5
XGBoost tuning hyperparameters.

Parameter Lb Ub Description

eta 0.001 1 Learning rate


gamma 0 100 Minimum loss to split
max_depth 0 100 Maximum depth of a tree
min_child_weight 0 100 Minimum sum of instance weight (hessian) needed in a child
max_delta_step 0 100 Maximum delta step for each leaf output
subsample 0.001 1 Subsample of rows in training datasets
colsample_bytree 0.5 1 Subsample of columns in training datasets
lambda 0 10 L2 regularization term on weights
alpha 0 10 L1 regularization term on weights

Fig. 11 presents the convergence histories of the RMSE for JS and SOS. The vertical axis represents RMSE (kN). SOS yields a better optimal solution than JS in four of the five runs of the optimization process. Furthermore, the optimization runtime of SOS was 1.5 min shorter than that of JS on average. Therefore, better convergence efficiency of the objective-function optimizer may not translate into better performance as measured by the metrics of interest.
In the second case, five variables that were identified in the ACI equation for the shear strength of a structural wall, Eq. (1), are


Table 6
Control parameters in metaheuristic algorithms.

Metaheuristic algorithm Max_iter Pop_num

SOS 50 20
JS 50 80 (20 × 4)

Table 7
Comparison of performance measures of hybrid models in case 1.

Method Dataset Metric Computational time


R2 RMSE (kN) MAE (kN) MAPE (%)

JS-XGBoost Training 0.998 25.34 12.58 3.39 02:01:58


0.998 30.27 14.53 3.68 02:03:21
0.999 20.21 11.60 3.80 02:03:36
0.999 23.48 11.93 3.58 02:09:28
0.999 17.07 9.29 3.65 02:12:29
Mean 0.999 23.28 11.98 3.62 02:06:10
Standard deviation 0.000 4.50 1.69 0.13 00:04:04
JS-XGBoost Test 0.979 97.96 57.76 14.62 00:00:01
0.975 105.82 63.72 15.33 00:00:01
0.976 105.40 66.07 16.79 00:00:01
0.975 105.90 64.76 18.68 00:00:01
0.975 107.22 68.00 19.17 00:00:01
Mean 0.976 104.46 64.06 16.92 00:00:01
Standard deviation 0.001 3.31 3.46 1.79 00:00:00
SOS-XGBoost Training 0.999 21.16 11.26 3.63 02:00:20
0.996 39.10 18.34 4.51 02:03:02
0.999 18.47 10.22 3.24 02:07:48
0.999 23.85 12.47 4.42 02:01:57
0.998 26.62 14.09 3.95 02:10:12
Mean 0.998 25.84 13.28 3.95 02:04:40
Standard deviation 0.001 7.16 2.84 0.48 00:03:43
SOS-XGBoost Test 0.975 105.61 63.26 16.54 00:00:01
0.973 110.44 66.24 16.67 00:00:01
0.975 106.77 66.88 16.61 00:00:01
0.975 106.86 69.21 20.00 00:00:01
0.974 108.74 68.31 19.01 00:00:01
Mean 0.974 107.68 66.78 17.77 00:00:01
Standard deviation 0.001 1.70 2.04 1.46 00:00:00

Fig. 11. Convergence histories of RMSE (kN) for JS and SOS in case 1.


Fig. 12. Convergence histories of RMSE (kN) for JS and SOS in case 2.

used. Fig. 12 shows that SOS converges faster than JS and reaches a smaller objective function value; however, the best algorithm cannot be determined with reference only to the convergence history. The best model is identified by comparing the multiple performance
measures of the models, as in Table 8. Neither of the hybrid models (JS-XGBoost and SOS-XGBoost) outperforms the ensemble models,
RF and XGBoost (Table 4). However, a comparison between only JS-XGBoost and SOS-XGBoost demonstrates that SOS-XGBoost
performs better in all respects, with the exception of their having the same R2 value. The results in case 2 indicate that with fewer
factors, ensemble models are better than single and hybrid models for predicting shear wall strength.
Additionally, this study used the ACI 318-19 provision equation, as given in Eq. (1), to calculate the shear strength. The resulting R2, RMSE, MAE, and MAPE values are 0.853, 393.725 kN, 233.755 kN, and 34.736%, as shown in Table 8. Under the same features as in case 2, SOS-XGBoost achieved mean R2, RMSE, MAE, and MAPE values of 0.953, 145.03 kN, 78.99 kN, and 19.55%. The proposed SOS-XGBoost thus yields a 63.16% improvement in RMSE and a 43.72% improvement in MAPE compared with the ACI 318-19 provision equation in shear capacity estimation.
In the last case with 16 factors, JS-XGBoost yields results similar to those of SOS-XGBoost in R2, RMSE, and MAE, as shown in Table 9. Both JS-XGBoost and SOS-XGBoost achieve better performance than the original XGBoost with the default hyperparameters in every metric except the MAPE. The efficacies of the two hybrid algorithms are comparable, and both outperform the original XGBoost in case 3.
Fig. 13 displays the convergence histories of RMSE of JS and SOS for case 3. The vertical axis represents RMSE (kN). SOS can obtain
a better optimal solution and converges faster than JS in all five runs in the optimization process. Additionally, the optimization
runtime of SOS was 1.43 min shorter than that of JS, as seen in Table 9. This result confirms that faster convergence of the objective-function optimizer does not necessarily translate into better performance on the metrics of interest.
Consistent with the “no free lunch” concept, the hyperparameters of machine learning models are often set case by case. Table 10
displays the settings of the hyperparameters of JS-XGBoost and SOS-XGBoost in case 3, demonstrating the challenge of manually
setting various algorithm parameters to determine the optimal models.

5.5. Superiority of JS-XGBoost and SOS-XGBoost in shear capacity estimation


To justify the performances of the proposed hybrid models that are used in this study, the original data from Feng et al. [30], which
were used as a reference, are used to train the hybrid metaheuristics-optimized XGBoost models. Despite the erroneous duplicate data
in the original dataset, running the model with the original, flawed dataset is the only way to compare fairly the model suggested by
Feng et al. (XGBoost, tuned by grid search optimization, GSO) and the models in this study. Fig. 14 shows the convergence histories of
JS and SOS using the original data. SOS converges faster than JS and yields a smaller value of the objective function, although JS starts with a lower RMSE. Table 11 presents the results that were obtained using the proposed hybrid models and the model of Feng et al.
with the same original dataset. Both JS-XGBoost and SOS-XGBoost yield better results than the GSO-XGBoost model. Although the
hybrid models that were built by integrating metaheuristics (i.e., JS and SOS) into XGBoost yield slightly different test performance
based on diverse metrics (Table 11), JS-XGBoost has the best R2 of 0.975, RMSE of 106.349 kN, and MAE of 64.092 kN, and
SOS-XGBoost has the best MAPE of 18.129%.

5.6. Discussion
The analytical results in case 3 are illustrative because that case uses all of the original and synthetic features (Table 2) to predict shear wall strength more accurately than the other two cases. Fig. 15 presents a histogram that compares the MAPEs of the best single, ensemble, and hybrid models, which are SVR (32.87%), XGBoost (13.22%), and JS-XGBoost (11.57%). The MAPE value of JS-


Table 8
Comparison of performance measures of hybrid models in case 2.

Method Dataset Metric Computational time


R2 RMSE (kN) MAE (kN) MAPE (%)

JS-XGBoost Training 0.999 13.76 7.20 3.16 02:16:31


0.998 29.27 20.26 9.64 02:25:12
1.000 11.05 4.89 2.51 02:23:56
0.999 11.94 6.34 3.82 02:18:04
0.999 10.64 4.73 2.37 02:21:20
Mean 0.999 15.33 8.69 4.30 02:21:01
Standard deviation 0.001 7.05 5.86 2.72 00:03:19
JS-XGBoost Test 0.958 138.11 84.94 27.10 00:00:01
0.961 133.60 78.61 22.90 00:00:01
0.955 142.51 80.36 22.30 00:00:01
0.954 144.90 85.22 24.37 00:00:01
0.935 171.67 95.11 25.01 00:00:01
Mean 0.953 146.16 84.85 24.34 00:00:01
Standard deviation 0.009 13.33 5.74 1.69 00:00:00
SOS-XGBoost Training 0.999 11.55 6.00 3.16 02:31:30
1.000 13.04 7.46 3.66 03:04:56
0.999 22.35 11.13 3.73 02:43:59
0.999 16.32 9.20 3.92 02:49:17
0.999 11.15 5.36 2.59 03:00:23
Mean 0.999 14.88 7.83 3.41 02:50:01
Standard deviation 0.000 4.15 2.11 0.48 00:11:55
SOS-XGBoost Test 0.951 149.46 72.16 14.25 00:00:01
0.959 136.49 72.61 15.78 00:00:01
0.960 134.35 77.72 19.81 00:00:01
0.953 145.97 82.53 23.80 00:00:01
0.944 158.88 89.93 24.10 00:00:01
Mean 0.953 145.03 78.99 19.55 00:00:01
Standard deviation 0.006 8.94 6.65 4.03 00:00:00
ACI 318-19 provision – 0.853 373.725 233.755 34.736 –

Table 9
Comparison of performance measures of hybrid models in case 3.

Method Dataset Metric Computational time

R2 RMSE (kN) MAE (kN) MAPE (%)

JS-XGBoost Training 0.999 13.13 6.86 2.54 02:11:20


0.999 19.49 8.51 2.84 02:10:42
0.998 25.54 11.08 3.01 02:10:28
0.998 24.82 11.15 3.14 02:18:14
0.999 18.72 8.64 2.48 02:14:21
Mean 0.999 20.34 9.25 2.80 02:13:01
Standard deviation 0.000 4.53 1.65 0.26 00:02:57
JS-XGBoost Test 0.976 103.98 67.19 18.18 00:00:01
0.979 98.07 61.37 18.80 00:00:01
0.980 94.55 60.62 14.26 00:00:01
0.982 90.04 55.45 11.57 00:00:01
0.984 85.19 52.87 12.96 00:00:01
Mean 0.980 94.36 59.50 15.16 00:00:01
Standard deviation 0.003 6.47 4.98 2.86 00:00:00
SOS-XGBoost Training 0.999 9.19 5.89 2.64 01:50:31
0.998 25.43 13.71 3.90 02:05:39
0.999 21.83 10.59 3.97 02:11:34
0.999 18.58 9.32 3.01 02:32:57
0.999 20.02 9.78 3.06 02:17:14
Mean 0.999 19.01 9.86 3.32 02:11:35
Standard deviation 0.000 5.42 2.51 0.53 00:13:55
SOS-XGBoost Test 0.983 88.26 56.56 17.73 00:00:01
0.983 87.35 56.20 15.80 00:00:01
0.979 104.57 64.67 17.43 00:00:01
0.981 92.16 57.83 14.79 00:00:01
0.979 96.94 61.31 14.53 00:00:01
Mean 0.981 93.85 59.31 16.06 00:00:01
Standard deviation 0.002 6.34 3.23 1.32 00:00:00


Fig. 13. Convergence histories of RMSE (kN) for JS and SOS in case 3.

Table 10
Tuning of hyperparameters of hybrid models in case 3.

| Hyperparameter | Feasible range | Default setting in Python | Optimal value in JS-XGBoost | Optimal value in SOS-XGBoost |
|---|---|---|---|---|
| eta | [0, 1] | 0.3 | 0.194 | 0.635 |
| gamma | [0, ∞] | 0 | 23.036 | 44.657 |
| max_depth | [0, ∞] | 6 | 54.306 | 57.417 |
| min_child_weight | [0, ∞] | 1 | 4.322 | 6.363 |
| max_delta_step | [0, ∞] | 0 | 0 | 74.125 |
| subsample | (0, 1] | 1 | 0.823 | 0.712 |
| colsample_bytree | (0, 1] | 1 | 0.688 | 0.781 |
| lambda | [0, ∞] | 1 | 3.313 | 8.897 |
| alpha | [0, ∞] | 0 | 6.575 | 5.096 |
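To illustrate how the tuned values in Table 10 map onto an actual model, the sketch below instantiates an XGBoost regressor with the JS-XGBoost column using the scikit-learn wrapper, in which eta corresponds to learning_rate and lambda/alpha to reg_lambda/reg_alpha. The value of max_depth is rounded to an integer, since XGBoost requires one, and X_train and y_train are placeholders for the case-3 feature matrix and measured shear capacities.

```python
from xgboost import XGBRegressor

# JS-XGBoost optima from Table 10; max_depth rounded because XGBoost expects an integer
model = XGBRegressor(
    learning_rate=0.194,      # eta
    gamma=23.036,
    max_depth=54,             # 54.306 in Table 10, rounded down
    min_child_weight=4.322,
    max_delta_step=0,
    subsample=0.823,
    colsample_bytree=0.688,
    reg_lambda=3.313,         # "lambda" in the native parameter list
    reg_alpha=6.575,
)
# model.fit(X_train, y_train)  # X_train, y_train: case-3 features and shear capacities (kN)
```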

Fig. 14. Convergence histories of RMSE (kN) for JS and SOS with original data.

The MAPE value of JS-XGBoost is 12.48% and 64.8% better than those of XGBoost and SVR, respectively.
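These relative improvements follow the standard formula, improvement (%) = (MAPE_ref − MAPE_new)/MAPE_ref × 100; for example, (13.22 − 11.57)/13.22 × 100 ≈ 12.48% relative to XGBoost, and (32.87 − 11.57)/32.87 × 100 ≈ 64.8% relative to SVR.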
For XGBoost in cases 1 to 3 (Table 4), the maximum difference in error rates between the training and test datasets is 14.17%, revealing that XGBoost may suffer from slight overfitting, as shown in Fig. 16. Notably, fine-tuning the hyperparameters of XGBoost by integrating it with the JS and SOS algorithms reduced the variability of its MAPE values. For JS-XGBoost, the difference between the error rates of the training and test datasets is 11.23% in case 1 and 9.09% in case 3; for SOS-XGBoost, the difference is 11.09% in case 2. These results suggest two ways of mitigating overfitting in the shear wall datasets, as sketched below: (1) using metaheuristic optimization algorithms to fine-tune the XGBoost hyperparameters, and


Table 11
Comparison of performance measures for hybrid models using original dataset.

| Method | Dataset | R2 | RMSE (kN) | MAE (kN) | MAPE (%) |
|---|---|---|---|---|---|
| JS-XGBoost | Training | 0.997 | 35.486 | 14.820 | 5.523 |
| JS-XGBoost | Test | 0.975 | 106.349 | 64.092 | 21.325 |
| SOS-XGBoost | Training | 0.995 | 45.989 | 19.383 | 5.563 |
| SOS-XGBoost | Test | 0.963 | 129.415 | 68.696 | 18.129 |
| GSO-XGBoost | Training | 0.999 | 17.623 | 12.326 | 5.843 |
| GSO-XGBoost | Test | 0.968 | 120.22 | 65.613 | 18.262 |

Fig. 15. Histogram of MAPEs for best single, ensemble, and hybrid models in case 3.

Fig. 16. Histogram of MAPEs for best XGBoost and metaheuristics-optimized XGBoost models.

(2) increasing the number of input features according to the ACI provision equation.
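The first approach can be sketched generically: any population-based metaheuristic, JS and SOS included, evaluates candidate hyperparameter vectors by cross-validated RMSE and keeps the best one found. The Python sketch below uses plain random search as a stand-in for the JS/SOS update rules, which are not reproduced here; the finite upper bounds for the unbounded parameters in Table 10 are assumptions made for illustration, and X, y are placeholders for the shear wall data.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

rng = np.random.default_rng(42)

# (name, lower, upper) loosely following Table 10; finite upper bounds assumed for illustration
BOUNDS = [("learning_rate", 0.01, 1.0), ("gamma", 0.0, 50.0), ("max_depth", 1, 60),
          ("min_child_weight", 0.0, 10.0), ("subsample", 0.5, 1.0),
          ("colsample_bytree", 0.5, 1.0), ("reg_lambda", 0.0, 10.0), ("reg_alpha", 0.0, 10.0)]

def cv_rmse(params, X, y):
    """Objective: 5-fold cross-validated RMSE of an XGBoost regressor."""
    kwargs = {k: (int(v) if k == "max_depth" else v) for k, v in params.items()}
    scores = cross_val_score(XGBRegressor(**kwargs), X, y, cv=5,
                             scoring="neg_root_mean_squared_error")
    return -scores.mean()

def tune(X, y, n_iter=50):
    """Random-search stand-in for the JS/SOS loop: sample, evaluate, keep the best."""
    best_params, best_rmse = None, np.inf
    for _ in range(n_iter):
        candidate = {name: rng.uniform(lo, hi) for name, lo, hi in BOUNDS}
        rmse = cv_rmse(candidate, X, y)
        if rmse < best_rmse:
            best_params, best_rmse = candidate, rmse
    return best_params, best_rmse

# Usage (X, y: feature matrix and measured shear capacities, not supplied here):
# params, rmse = tune(X, y)
```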
The tendency of the ACI code to underestimate nominal shear capacity relative to that predicted by metaheuristics-optimized ML
raises issues of design quality and safety, as shown in Fig. 17. Notably, the values of nominal shear capacity that are predicted using JS-
XGBoost in case 3 tend to be overestimated in the middle-strength range (680–1000 kN), yielding a larger predictive variability than
that of the ACI code; they tend to be underestimated at higher strengths (above 2000 kN). The overall accuracy of JS-XGBoost exceeds
that of the ACI code. For the purposes of design and quality control, further research into data quality, ML model improvements, and
the use of outputs from the prediction system must be carried out.
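A diagnostic plot of the kind shown in Fig. 17 can be produced as in the sketch below; the capacity arrays are hypothetical placeholders, and points above the 1:1 line correspond to overestimation.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical actual and predicted capacities (kN); replace with real model outputs
actual = np.array([520, 760, 980, 1450, 2300, 2950])
predicted = np.array([540, 820, 1010, 1400, 2150, 2700])

lim = [0, 1.1 * max(actual.max(), predicted.max())]
plt.figure(figsize=(5, 5))
plt.scatter(actual, predicted, label="JS-XGBoost (illustrative)")
plt.plot(lim, lim, "k--", label="1:1 line")   # points above the line are overestimates
plt.xlim(lim); plt.ylim(lim)
plt.xlabel("Actual nominal shear capacity (kN)")
plt.ylabel("Predicted nominal shear capacity (kN)")
plt.legend()
plt.show()
```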


Fig. 17. Scatter plot of actual versus predicted/calculated nominal shear capacities.

6. Conclusion
Accurately calculating the shear capacity of an RC wall is complex and time-consuming. Thus, most engineers use simplified
equations for design purposes. However, the simplified equations in current building codes have low safety factors and do not cover
new materials. Accordingly, an appropriate machine learning scheme that can accurately determine peak shear strength is needed.
This investigation considered three cases. The first case involves models that use all of the original variables in the literature. The second case involves the synthetic variables derived from the ACI equation for the shear strength of a wall. The third case involves a combination of original and synthetic variables.
Based on the results in these three cases with single and ensemble models (ANN, SVR, RF, and XGBoost), XGBoost performed best, with R2, RMSE, MAE, and MAPE values of 0.978, 99.36 kN, 59.96 kN, and 13.22%, respectively, in case 3. We conclude that incorporating synthetic features improved the results over those obtained using only the original variables (first case) or only the ACI variables (second case). Overall, ensemble models outperform single models. In particular, XGBoost exhibits greater predictive accuracy when more factors are involved, and RF is effective even when the number of factors is small.
To further increase predictive accuracy, the JS and SOS metaheuristic optimization algorithms were integrated into XGBoost to fine-tune its hyperparameters; both algorithms yielded better predictive accuracy than the grid search results reported in the literature. The best RMSE value, 85.19 kN, was obtained using JS-XGBoost over five runs in case 3 with 16 factors; this value was 14.17 kN better than the 99.36 kN obtained using XGBoost with the default hyperparameters, a 14.26% improvement in shear strength predictive performance. Furthermore, SOS-XGBoost improves RMSE by 63.16% relative to the ACI provision equation in estimating shear wall capacity (case 2), allowing the designed size of a shear wall or the concrete strength to be reasonably reduced while still meeting building codes and saving construction and material costs. The efficacies of the two algorithms are similar, and both yield better results than the original XGBoost. The JS algorithm finds the best solutions, but SOS performs more stably herein.
In the three cases of interest, the hybrid models generally exhibit more reliable and better predictive performance than the single and ensemble models. In particular, JS-XGBoost delivers the best performance, with a MAPE of 11.57% in case 3, although some hybrid models exhibit poorer predictive performance than the ensemble models in case 2. Based on the analytical results, this study recommends using metaheuristics-optimized XGBoost for shear strength prediction of RC walls in buildings. The value of this research warrants examination by other researchers in other contexts or with other hybrid models. The proposed machine learning framework improves building safety and simplifies a cumbersome shear capacity calculation process. It can also be used as a general tool for quantifying the performance of various mechanical models and empirical formulas in design standards.

Replication of results
The datasets (Appendix 1), codes, and results that were generated and/or analyzed during the current study are available from the
corresponding author upon reasonable request. Appendix 2 provides details of the best metaheuristics-optimized XGBoost models for
each case study, which may be of benefit to researchers and practitioners in the future.

Declaration of competing interest


The authors declare that they have no known competing financial interests or personal relationships that could have appeared to
influence the work reported in this paper.


Data availability

We have attached the source data and source code in the appendices.

Acknowledgements
The authors would like to thank the National Science and Technology Council, Taiwan, for financially supporting this research
under contract NSTC 110-2221-E-011-080-MY3.

Appendix A. Supplementary data


Supplementary data to this article can be found online at https://doi.org/10.1016/j.jobe.2022.105046.
