
Ecological Modelling 450 (2021) 109560

Contents lists available at ScienceDirect

Ecological Modelling
journal homepage: www.elsevier.com/locate/ecolmodel

A Bayesian network with fuzzy mathematics for species habitat suitability analysis: A case with limited Angelica sinensis (Oliv.) Diels data
Quanzhong Zhang a,b, Haiyan Wei b, Jing Liu a,b, Zefang Zhao a,b,c, Qiao Ran a,b,c, Wei Gu a,c,d,*
a National Engineering Laboratory for Resource Development of Endangered Crude Drugs in Northwest of China, Shaanxi Normal University, Xi’an 710119, China
b School of Geography and Tourism, Shaanxi Normal University, Xi’an 710119, China
c The Key Laboratory of Ministry of Education for Medicinal Resources and Natural Pharmaceutical Chemistry, Shaanxi Normal University, Xi’an 710119, China
d College of Life Sciences, Shaanxi Normal University, Xi’an 710119, China

ARTICLE INFO

Keywords: Bayesian network (BN); Fuzzy matter element model (FME); Fuzzy mathematics and Bayesian network model (FBM); Species habitat suitability; Angelica sinensis (Oliv.) Diels

ABSTRACT

There are many different types of species distribution models (SDMs) that are widely used in the field of ecology. In this research, we explored a new advanced mechanism for predicting the distribution of species based on fuzzy membership function, principle of maximum entropy, fuzzy mathematics comprehensive evaluation, and the framework of Bayesian networks. We use fuzzy mathematics and Bayesian network model (FBM) to simulate relationships between species’ habitats and environmental variables, and the relationship may be difficult to quantify effectively. FBM, which combines species data, environmental data, expert experience, and machine learning, could reduce the data and system error. In the case of medicinal plant, Angelica sinensis (Oliv.) Diels, many approaches have been applied, including nine learning sequence of sampling sites, three FBM models, two types of information classification by fuzzy mathematical classification (FMC) and equal interval classification (EIC), and the evaluation of AIC and log-likelihood. Through the comparison of reasoning results between FBM and fuzzy matter element model (FME) in testing sites, the result shows that the combination of objective data and empirical model structure makes FBM have better result output. Besides, FBM sensitivity analysis helps researchers explore in detail the impact of environmental factors on each level of species habitat suitability. The temperature factor has an important influence on the highly suitable, moderately suitable, and lowly suitable habitats of A. sinensis. Through FMC and sensitivity analysis, annual mean temperature (Bio1) in 5.92 °C-9.05 °C and mean temperature of warmest quarter (Bio10) in 14.80 °C-18.60 °C are the highly suitable habitat temperature range of A. sinensis.

1. Introduction

Many researchers are interested in species distribution models (SDMs). On this basis, the relationship model between habitat occurrence localities and environmental requirements was constructed (Phillips et al., 2006). Since the late twentieth century, scholars have been using mathematical methods to construct species environmental suitability models to help people understand the living conditions of species (Friedman, 1991; Manel et al., 1999; Breiman, 2001; Elith et al., 2006). Each model has its own theoretical framework and its own conceptual definition of species suitability. Due to the systematic error of different SDMs, the final prediction results also differ. After long-term application and screening, a few SDMs have coped well with environmental conditions and have been widely used up to now.

The generalized linear model (GLM) proposed by Nelder and Wedderburn (1972) appeared very early and has been applied to the suitability research of species. As a habitat suitability model, GLM performs well when there are a large number of species or a lack of data (Hirzel et al., 2001). The generalized additive model (GAM) is a non-parametric extension of GLM and is used to deal with complex geographic data to predict spatial distributions of species (Zaniewski et al., 2002). Unlike GLM or GAM, the genetic algorithm for rule-set production (GARP) could predict the distribution of plants and animals through automated spatial modeling and filter out potential sources of errors (Stockwell, 1999). Subsequently, the Maxent model emerged, which used maximum-entropy techniques to model the distribution of species. And its sequential update

* Corresponding author.
E-mail addresses: weihy@snnu.edu.cn (H. Wei), weigu@snnu.edu.cn (W. Gu).

https://doi.org/10.1016/j.ecolmodel.2021.109560
Received 26 June 2020; Received in revised form 2 February 2021; Accepted 3 April 2021
Available online 22 April 2021
0304-3800/© 2021 Elsevier B.V. All rights reserved.
Q. Zhang et al. Ecological Modelling 450 (2021) 109560

Table 1
Our team uses FME to study the potential distribution of medicinal plants.

Predicted period | Species of medicinal plant | Number of environment variables | Characteristic values of species | Study area in China | Literature
Current | Schisandra sphenanthera Rehd. et Wils. | 15 | schisantherin A | Qinling Mountains | Lu et al., 2012
Current | Schisandra sphenanthera Rehd. et Wils. | 15 | deoxyschizandrin, γ-schizandrin | Qinling Mountains | Guo et al., 2013
Current | Scutellaria baicalensis Georgi | 15 | baicalin | The whole of China | Zhu et al., 2015
Current | Cornus officinalis | 21 | loganin | Qinling Mountains | Sang et al., 2015
Current | Sinopodophyllum hexandrum (Royle) Ying | 12 | fruit yield | 7 provinces in western China | Guo et al., 2015
Current | Angelica sinensis (Oliv.) Diels | 14 | ferulic acid | 10 provinces in western China | Shang et al., 2015
Current | Schisandra chinensis (Turcz.) Baill. | 18 | lignan | 4 provinces of China | Mao et al., 2016
Current | Gynostemma pentaphyllum (Thunb.) Makino | 19 | gypenosides | The whole of China | Zhao et al., 2017
Future | Schisandra sphenanthera Rehd. et Wils. | 19 | schisantherin A | Qinling Mountains | Guo et al., 2016

algorithms could handle a very large number of features (Phillips et al., 2006). By contrast with GARP, Maxent predicted the distribution more accurately when it considered presence-only data (Ray et al., 2018). With the advancement of machine learning techniques (Džeroski et al., 1997; Chon et al., 2001), several models have been developed using a variety of algorithms, such as classification trees (Debeljak et al., 2001), random forest (RF) (Freeman et al., 2012), and the generalized boosting model, which is usually called boosted regression trees (GBM/BRT) (Hallstan et al., 2013).

The fuzzy matter element model (FME), which is established on the basis of fuzzy mathematics and GIS, provides an effective, quantitative method for the evaluation of plant habitat suitability. FME fits the relationship between environmental variables data at sampling sites and species characteristics, such as the content of marker compounds extracted from the species (Nino et al., 2017). In 2012, our team first used the FME to predict the suitability distribution of Schisandra sphenanthera Rehd. et Wils. in the Qinling Mountains (Lu et al., 2012). Since then, we have applied FME to study the suitability distribution of many medicinal plants using different numbers of environment variables (Table 1). Hence, we have accumulated valuable experience in optimizing and improving FME.

Bayesian theories or methods are widely used in various fields of research, such as financial econometrics (Wang et al., 2017a), risk management (Xin et al., 2017), environmental assessment and management (Marcot and Penman, 2018), road infrastructure (Sierra et al., 2018), mechanical engineering (Wang et al., 2017b; Wang and Matthies, 2019), biotechnology (Gendelman et al., 2017), and ecology (Salliou et al., 2017). Some studies have focused on species habitat research based on Bayesian theory (Feki-Sahnoun et al., 2018; Wiest et al., 2019; Thompson et al., 2020). In terms of ecological suitability, some researchers chose Bayesian methods to study the habitat distribution of animals. For vertebrates, it has been shown that a Bayesian network (BN) based on abundant prior knowledge is sufficient to establish a model with reasonable sensitivity (Tantipisanuh et al., 2014). Gieder et al. (2014) used BN to study the response of habitat to sea level rise, and successfully predicted the existence of bird nests on the barrier islands. Havron et al. (2017) applied BN to a small marine dataset to model the probability of occurrence of three macrofauna species. They were the first to show that BN can effectively model habitat suitability of benthic macrofauna. Hradsky et al. (2017) showed that although only remote-sensed or mapped variables are provided, BN can effectively distinguish the existence of most native mammal species in the Otway Ranges in southeastern Australia. Using a Bayesian hierarchical species distribution model, Coll et al. (2019) predicted the distribution of European hake (Merluccius merluccius), anglerfishes (Lophius piscatorius and L. budegassa), and red mullets (Mullus barbatus and M. surmuletus) in an exploited marine ecosystem of the Northwestern Mediterranean Sea. Beyond animals, few studies have used Bayesian theory to focus on the suitability distribution of terrestrial plants. Meineri et al. (2015) disentangled direct and indirect associations between landscape physiography, environmental variables, and species distribution of four common plants at their northern range limit in Sweden by Gaussian BN. However, due to the difficulty of data collection and experience acquisition, no scholars have directly used BN to predict the distribution of plant species suitability. This inspired us to make a species suitability prediction of BN supported by fuzzy mathematics.

Our approach is closely linked to ecological niche theory (Patten and Auble, 1981). The Chinese mathematician Cai founded matter element theory (Cai, 1983). In this research, we use matter element theory, the fuzzy membership function (Zadeh, 1965), the principle of maximum entropy (Parkash et al., 2008), and fuzzy mathematics comprehensive evaluation (Lu et al., 2012) to assist BN modeling. Fuzzy membership functions could fit the membership relations between fuzzy features, such as medicinal effective contents of species, and different numerical environmental variables. In general, the scope of fuzzy membership degree is [0,1] (Zadeh, 1965). The principle of maximum entropy of information could determine the weighting coefficients associated with the observed data, avoiding the subjective bias of relying only on experience. Fuzzy mathematics comprehensive evaluation could standardize the original data according to the membership relations (Lu et al., 2012; Zhao et al., 2017). Fuzzy mathematics could help us build habitat suitability models in the absence of expert experience. In this research, the role of fuzzy mathematics is mainly reflected in the value classification of states of single environmental variables (SEVs), the determination of states of integrated environmental variables (IEVs), and the habitat suitability node. The role of BN is mainly reflected in the construction of the model’s frame, the calculation of the probability of species habitat suitability, and sensitivity analysis. The results of fuzzy mathematics are then fed into the BN model.

Angelica sinensis (Oliv.) Diels is taken as an example of model construction in this article. A. sinensis is a perennial herb, mainly distributed in the provinces of Gansu, Sichuan, Shaanxi, Hubei, and Yunnan (Committee of Flora of China, 1992). The roots are used as Chinese herbal medicine (CHM), and the content of the marker compound ferulic acid should be no lower than 0.05% (Committee of National Pharmacopoeia, 2020). The roots of A. sinensis have been used historically as a tonic, hematopoietic, and anti-inflammatory agent for the treatment of gynecological diseases such as menstrual disorders, amenorrhea and dysmenorrhea for

Fig. 1. 90 sampling sites (orange sites) and 50 predicted sites (black sites) of A. sinensis in China. Sampling sites are obtained mainly from the provinces of Gansu,
Hubei, Sichuan, and Yunnan. The distribution of the predicted sites is symmetrical, and the habitat suitability of them will be predicted by FBM after a series of
machine learning and reasoning.

thousands of years in Chinese herbal medicine prescriptions (Lü et al., 2009; Cao et al., 2010).

Our modeling objective is to build the fuzzy mathematics and Bayesian network model (FBM), to reconstruct the real relationship between environmental variables and A. sinensis’ habitats, to predict the habitat suitability of A. sinensis, and to show the optimal results of the predicted sites given by FBM. The original data will be discretized by fuzzy mathematics classification (FMC) and equal interval classification (EIC). We also use additional testing sites to compare the predicted results of the FBM and FME to verify the reliability of the FBM. Using general data, this paper attempts to prove that FBM has great potential in predicting the suitability of medicinal plant habitats.

2. Materials and methods

2.1. Study area

We obtain the location information of distribution sites of A. sinensis by consulting Flora of China, investigation resource reports, literature, and digital herbarium. The sites with ferulic acid content are acquired from

Fig. 2. The three kinds of FBM networks. (a) Model A, a simple network structure obtained by learning with fuzzy mathematics classification (FMC); (b) Model B, which adds three IEV nodes and directed arrows (green) reflecting the relationship between the different IEVs; (c) Model C, which, based on model B, adds two arrows (red) pointing directly to the habitat suitability node to show the importance of temperature and precipitation factors.
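
The layered structure of models B and C, and the no-loop requirement of a Bayesian network, can be illustrated with a small sketch. This is not the authors' Hugin model: it encodes only the arrows named in the text of Section 2.3, it assumes that EL points at the temperature and precipitation IEV nodes, and it omits Totalrad13 and the green inter-IEV arrows because their endpoints are not stated in the text. The node labels and the helper names (`with_direct_arrows`, `is_acyclic`, `cpt_size`) are hypothetical.

```python
from collections import deque

# Parent -> children arrows stated in Section 2.3 (assumption: Fig. 2b may
# contain further arrows that are not listed here).
MODEL_B = {
    "Bio1": ["Temperature"],
    "Bio10": ["Temperature"],
    "Bio12": ["Precipitation"],
    "Bio15": ["Precipitation"],
    "Bio18": ["Precipitation"],
    "SOM": ["Soil"],
    "EL": ["Temperature", "Precipitation"],  # assumed targets of the EL arrows
    "Temperature": ["Habitat"],
    "Precipitation": ["Habitat"],
    "Soil": ["Habitat"],
    "Habitat": [],
}


def with_direct_arrows(graph):
    """Model C sketch: copy model B and add Bio1 -> Habitat, Bio12 -> Habitat."""
    g = {node: list(children) for node, children in graph.items()}
    g["Bio1"].append("Habitat")
    g["Bio12"].append("Habitat")
    return g


def is_acyclic(graph):
    """Kahn's algorithm: True if all nodes can be topologically ordered."""
    indegree = {node: 0 for node in graph}
    for children in graph.values():
        for child in children:
            indegree[child] += 1
    queue = deque(node for node, deg in indegree.items() if deg == 0)
    ordered = 0
    while queue:
        node = queue.popleft()
        ordered += 1
        for child in graph[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return ordered == len(graph)


def cpt_size(graph, node, n_states=4):
    """CPT entries: product of the state counts of the node and its parents."""
    n_parents = sum(node in children for children in graph.values())
    return n_states ** (n_parents + 1)


MODEL_C = with_direct_arrows(MODEL_B)
```

Under these assumptions, with four states per node, the habitat suitability node of the sketched model B has 4^4 = 256 table entries, and the two extra arrows of model C raise this to 4^6 = 4096, which hints at why the richer structure needs more learning data.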


Table 2
The eight SEVs’ states divided by the fuzzy mathematics classification (FMC) method (four states of each SEVs node in the FBM network).

SEVs (unit) | 1 | 2 | 3 | 4
Bio1 (°C) | ~4.61; 10.36~ | 4.61~5.31; 9.67~10.36 | 5.31~5.92; 9.05~9.67 | 5.92~9.05
Bio10 (°C) | ~13.21; 20.20~ | 13.21~14.05; 19.35~20.20 | 14.05~14.80; 18.60~19.35 | 14.80~18.60
Bio12 (mm) | ~511.64; 1058~ | 511.64~540.21; 924.67~1058 | 540.21~568.79; 791.33~924.67 | 568.79~791.33
Bio15 | ~61.33; 95.19~ | 61.33~65.41; 91.11~95.19 | 65.41~69.04; 87.48~91.11 | 69.04~87.48
Bio18 (mm) | ~263.77; 628~ | 263.77~274.03; 528~628 | 274.03~284.28; 428~528 | 284.28~428
SOM (‰) | ~76.25; 109.87~ | 76.25~80.31; 105.81~109.87 | 80.31~83.91; 102.21~105.81 | 83.91~102.21
Totalrad13 (MJ/m2) | ~4490.94; 5449.06~ | 4490.94~4606.51; 5333.49~5449.06 | 4606.51~4709.25; 5230.75~5333.49 | 4709.25~5230.75
EL (m) | ~1049; 2739~ | 1049~1252.85; 2535.15~2739 | 1252.85~1434.08; 2353.92~2535.15 | 1434.08~2353.92

1, 2, 3, and 4 represent node state 1, 2, 3, and 4 of SEVs, respectively.
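
The Bio1 row of Table 2 can be reproduced from the membership function given as Eq. (1) in Section 2.4, F(x) = exp(-((10x - 74.86)/26.18)^2), together with the cut levels 0.3, 0.5, and 0.7. The following is a minimal sketch for Bio1 only; the function names are hypothetical, and the other seven SEVs would need their own fitted membership functions, which are not given in the text.

```python
import math


def bio1_membership(x):
    """Eq. (1): membership degree of annual mean temperature x (deg C)."""
    return math.exp(-(((10.0 * x - 74.86) / 26.18) ** 2))


def thresholds(level):
    """Solve F(x) = level; returns the lower and upper Bio1 bounds."""
    half_width = 26.18 * math.sqrt(-math.log(level))
    return (74.86 - half_width) / 10.0, (74.86 + half_width) / 10.0


def fmc_state(x, cuts=(0.3, 0.5, 0.7)):
    """FMC state 1-4: one plus the number of cut levels that F(x) reaches."""
    f = bio1_membership(x)
    return 1 + sum(f >= c for c in cuts)
```

Solving at the three cut levels gives bounds that round to 4.61 and 10.36, 5.31 and 9.67, and 5.92 and 9.05, matching the Bio1 intervals in Table 2. Because F(x) is symmetric about its peak, each state below 4 consists of two intervals, one on each side of the optimum.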

Shang et al. (2015). In order to remove the errors caused by the excessive concentration of sampling sites, we select samples by eliminating adjacent sampling sites. Finally, we obtain 90 sampling sites of A. sinensis with complete information including ferulic acid content (Fig. 1).

2.2. Collection of environmental variables

We collect bio-climatic variables, soil variables, topographic variables, and light variables, which are used to simulate the distribution suitability of A. sinensis. For bio-climatic variables, we collect 19 SEVs (Bio1-Bio19) from the WorldClim database (www.worldclim.org/current). For soil variables, we collect 5 SEVs from the Harmonized World Soil Database (HWSD, http://www.fao.org/soils-portal/en). For topographic variables, we collect 3 SEVs from the Resource and Environment Data Cloud Platform (RESDC, http://www.resdc.cn). In addition, 4 SEVs related to light are acquired from the Chinese Ecosystem Research Network (CERN, http://www.cern.ac.cn).

Variable selection is a three-step process. The first step is to set up membership functions between all SEVs and the content of ferulic acid at the 90 sampling sites of A. sinensis in Matlab (Matlab software 9.1, 2016); the retained SEVs are those with an R-squared greater than 0.8. The second step is a correlation analysis of the chosen environmental variables, in order to remove highly correlated variables and prevent overfitting of the model. If two or three variables are highly correlated, we choose the one that has more biological significance and value for A. sinensis growth. Finally, we obtain eight SEVs, including Bio1 (annual mean temperature), Bio10 (mean temperature of warmest quarter), Bio12 (annual precipitation), Bio15 (precipitation seasonality), Bio18 (precipitation of the warmest quarter), SOM (soil organic matter), Totalrad13 (annual total solar radiation), and EL (elevation above sea level). The eight SEVs include two temperature variables (Bio1 and Bio10), three precipitation variables (Bio12, Bio15, and Bio18), one soil variable (SOM), one light variable (Totalrad13), and one terrain variable (EL).

2.3. Bayesian network for FBM

In the field of Bayesian theory, the Bayesian paradigm is based on the concept that the probability of an event can be corrected by the knowledge of additional information (Smith et al., 1985), and the modification of this probability is called the condition. BN is a common method for applying Bayesian theory in practical research (Havron et al., 2017). More specifically, a BN is a causal network that assigns probabilities to all internal nodes (“child nodes”). The parent nodes and the child nodes are connected by directional arrows. The states of the parent nodes are used as the condition for probability transfer. Hence, the state of the parent nodes could affect the state of the child nodes through the arrows (Fuster-Parra et al., 2015). An important assumption of the BN used in FBM construction is that there is no loop in the network. Formally, the network is a directed acyclic graph.

Here, the theoretical framework of BN proposed by FBM includes three parts: eight SEVs nodes (Bio1, Bio10, Bio12, Bio15, Bio18, SOM, Totalrad13 and EL), three IEVs nodes (temperature suitability, precipitation suitability, and soil suitability), and the habitat suitability node. The simple diagram of the FBM model is constructed as Fig. 2a by learning the network structure with FMC information (see Table 2) in the Hugin software (Hugin Expert 8.4, 2016). The network structure of model A is obtained by machine learning of 90 sampling sites without adding operator experience information.

In the process of model building (Fig. 2b and Fig. 2c), we constructed temperature, precipitation, and soil as three IEVs nodes. Habitat suitability is a node supported by the precipitation, temperature and soil nodes. Model B and model C have the following in common: (1) The networks are roughly divided into three layers. (2) The suitability of temperature (Bio1 and Bio10), precipitation (Bio12, Bio15, and Bio18), and soil (SOM) directly determines the habitat suitability of A. sinensis. (3) The effects of topography (EL) are regarded as potential impacts on temperature and precipitation (Russak, 2009; Marini et al., 2011). As for the difference between model B and model C, model B has a clear hierarchy (Fig. 2b). Considering the importance of annual mean temperature (Bio1) and annual precipitation (Bio12) for the growth of A. sinensis, model C adds, on the basis of model B, two connection arrows directed from the nodes of Bio1 and Bio12 to the node of habitat suitability (Fig. 2c).

For an FBM network, the probability distribution of each node must be described by a set of mutually exclusive “states”, such as those in Table 2. Before obtaining the probability distribution of the entire network, we need to set the states of all nodes.

2.4. The states of SEVs nodes in FBM

BNs are usually modeled on discrete domains, so continuous variables need to be discretized first (Uusitalo, 2007). Hence, we classify the eight SEVs by the FMC and EIC discretization methods. In order to set the state of SEVs through FMC, we establish the membership function between the marker compound ferulic acid of A. sinensis and the environmental variables of 90 sampling sites in Matlab to determine the variables’ thresholds (Lu et al., 2012; Guo et al., 2016). The higher the value of each membership function (degree of membership), the higher the content of the marker compound ferulic acid. According to the degree of membership, we classify the states of the environmental variables. We divide the numerical range for FMC so that each SEVs node can be classified into four states (Table 2). “1”, “2”, “3”, and “4” respectively represent node state 1, 2, 3, and 4 of


SEVs by FMC, and different states represent respective numerical intervals.

The process of dividing numerical intervals has the following three steps. Take Bio1 as an example. Firstly, according to the Bio1 values and the marker compound ferulic acid of 90 sampling sites, the relationship between Bio1 and ferulic acid is fitted using Matlab software. Eq. (1) is the membership function.

$$F(x) = \exp\left[-\left(\frac{10x - 74.86}{26.18}\right)^{2}\right] \qquad (1)$$

where x represents the actual value of Bio1. F(x) represents Bio1’s membership of ferulic acid in A. sinensis, that is, the extent to which Bio1 supports the production of ferulic acid in A. sinensis. The value of F(x) ranges from 0 to 1. If the degree of membership is 0, the attribute has complete non-membership, meaning that Bio1 does not support the synthesis of ferulic acid. If the degree of membership is 1, the attribute has complete membership, meaning that Bio1 completely supports the synthesis of ferulic acid in A. sinensis. A value between 0 and 1 implies that the attribute has partial membership. Combining the results of fuzzy mathematics and expert opinions, the classification limits of membership degree we choose are 0.3, 0.5, and 0.7. Secondly, because the numerical distribution of F(x) is symmetric (Table 3), when the value of F(x) is 0.3, we get two thresholds of x at 4.61 and 10.36. When the value is 0.5, we obtain two thresholds, 5.31 and 9.67. When the value is 0.7, we acquire two thresholds, 5.92 and 9.05. These numerical intervals represent the different support of environmental variable Bio1 for the growth of A. sinensis (Table 3). According to the similarity relationship reflected by the numerical attributes of SEVs (Table 2, Table 3), FMC is classified.

Table 3
Calculate numerical intervals by the membership function of variable Bio1.

Values of F(x) | Numerical intervals of x
0 ~ 0.30 | x < 4.61; x > 10.36
0.30 ~ 0.50 | 4.61 ≤ x < 5.31; 9.67 < x ≤ 10.36
0.50 ~ 0.70 | 5.31 ≤ x < 5.92; 9.05 < x ≤ 9.67
0.70 ~ 1.00 | 5.92 ≤ x ≤ 9.05

EIC is the control group of FMC; EIC and FMC are used together to explore the efficiency of the model using raw data. EIC is a classification method based on completely equal SEVs intervals. The range of SEVs values in the study area is divided into four equal intervals. The classification results are shown in Table 4. “I”, “II”, “III”, and “Ⅳ” respectively represent node state I, II, III, and Ⅳ of SEVs by EIC. Both FMC and EIC are applicable to model B (Fig. 2b) and model C (Fig. 2c). For model A (Fig. 2a), this paper only uses the information of FMC, not EIC, because when we try to use EIC information to obtain the network structure through machine learning, the result violates the most basic ecological and physical laws. To sum up, we can identify the state of SEVs nodes through FMC and EIC.

Table 4
The eight SEVs’ states divided by the equal interval classification (EIC) method (four states of each SEVs node in the FBM network).

SEVs (unit) | I | II | III | Ⅳ
Bio1 (°C) | ~7.88 | 7.88~11.65 | 11.65~15.43 | 15.43~
Bio10 (°C) | ~16.18 | 16.18~19.35 | 19.35~22.53 | 22.53~
Bio12 (mm) | ~593.50 | 593.50~892 | 892~1190.50 | 1190.50~
Bio15 | ~67.75 | 67.75~79.50 | 79.50~91.25 | 91.25~
Bio18 (mm) | ~308.50 | 308.50~452 | 452~595.50 | 595.50~
SOM (‰) | ~89.18 | 89.18~104.23 | 104.23~119.29 | 119.29~
Totalrad13 (MJ/m2) | ~4150.18 | 4150.18~4697.98 | 4697.98~5245.77 | 5245.77~
EL (m) | ~1084.75 | 1084.75~1743.50 | 1743.50~2402.25 | 2402.25~

I, II, III, and Ⅳ represent node state I, II, III, and Ⅳ of SEVs, respectively.

2.5. The states of IEVs and habitat suitability nodes in FBM

For the three IEVs and habitat suitability nodes, we set the states as “unsuitable”, “lowly suitable”, “moderately suitable”, and “highly suitable”. We have three steps to confirm the state of the IEVs and habitat suitability node at each sampling site. Firstly, we calculate and obtain the membership functions of all eight SEVs. Secondly, the weight of environmental variables is confirmed by the weight of the evaluation index, which is determined by the maximum entropy principle of information (Zhao et al., 2017). Thirdly, the comprehensive suitability evaluation index Rk is determined by the fuzzy mathematics comprehensive evaluation, calculated as Eq. (2). Here, Rw is the information entropy weight vector of the index, and Rξ is the fuzzy matter element (Lu et al., 2012).

$$R_k = R_w * R_\xi = \begin{bmatrix} & M_1 & M_2 & \cdots & M_m \\ K_j & K_1 & K_2 & \cdots & K_m \end{bmatrix} \qquad (2)$$

where the fuzzy operator * in the formula adopts (·, +), so that $K_j = \sum_{i=1}^{n} W_i \xi_{ij}$. Kj is the total score of the suitability evaluation index in the jth evaluation unit, Wi is the weight value of the ith index, and ξij is the membership degree of the ith index in the jth grid. Mm represents the mth grid. Kj takes values in (0, 1); the greater the value, the more suitable the grid unit is for species growth under temperature, precipitation, soil factors, or whole habitat conditions. For the comprehensive suitability evaluation index Rk, values less than 0.3 are treated as unsuitable. Values from 0.3 to 0.5 are identified as lowly suitable. Values from 0.5 to 0.7 represent moderately suitable, and values more than 0.7 represent highly suitable. Through the comprehensive suitability assessment of precipitation, temperature, soil, and the habitat suitability, we can find out the states of all IEVs and habitat suitability nodes in the FBM network at each sample site. Finally, we obtain the states of all nodes in the FBM network at all sampling sites.

2.6. Probability setting and evaluation of model

The size of the conditional probability distribution table (CPT) is the product of all possible states of the parent and child nodes (Butz et al., 2010). In general, CPT is determined from the data through various methods, but a more objective and effective method of obtaining CPT is through case studies, such as machine learning. Here, the CPT of the FBM network is obtained through machine learning with the learning function of the Hugin software (Hugin Expert 8.4, 2016). The log-likelihood index (Lie et al., 2018) plays an important role in the transition from the prior probability to the posterior probability by the expectation maximization (EM) algorithm (Basak et al., 2012). The EM algorithm uses an iterative process to determine the candidate network, thereby estimating the value of the log-likelihood indexes.

As for the log-likelihood value, the more explanatory variables, the more explained parts in the dependent variable, the larger the

Table 5
A serial number of learning documents for FBM’s machine learning. We extract from the 90 sampling sites and divide them randomly into 9 groups, according to the increasing sequence number from 1 to 9, with an interval of 10 sampling sites, to obtain the learning documents of 9 sets of sampling sites. Finally, we get 9 learning documents.

The learning documents | Number of sampling sites | Sequence number
1 | 10 | Seq 1st
2 | 20 | Seq 2nd
3 | 30 | Seq 3rd
4 | 40 | Seq 4th
5 | 50 | Seq 5th
6 | 60 | Seq 6th
7 | 70 | Seq 7th
8 | 80 | Seq 8th
9 | 90 | Seq 9th
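
The comprehensive evaluation step of Section 2.5 (Eq. (2)) amounts to a weighted sum of membership degrees followed by a four-level classification. The sketch below assumes the entropy weights are already available and sum to 1; how the weights are derived is not given in the text, so the values in the usage note are purely illustrative. The function names are hypothetical, and the handling of scores exactly equal to 0.3, 0.5, or 0.7 is an assumption, since the text does not say which side of each boundary is inclusive.

```python
import math


def composite_score(weights, memberships):
    """K_j = sum_i W_i * xi_ij for one evaluation unit, as in Eq. (2)."""
    if len(weights) != len(memberships):
        raise ValueError("one weight per membership degree is required")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights are assumed to sum to 1")
    return sum(w * m for w, m in zip(weights, memberships))


def suitability_state(k):
    """Map a score in (0, 1) to a node state; boundaries are an assumption."""
    if k < 0.3:
        return "unsuitable"
    if k < 0.5:
        return "lowly suitable"
    if k < 0.7:
        return "moderately suitable"
    return "highly suitable"
```

For example, with illustrative weights (0.5, 0.3, 0.2) and membership degrees (0.9, 0.6, 0.4), the score is 0.45 + 0.18 + 0.08 = 0.71, which falls in the highly suitable class. The same two functions apply to an IEV node (using only its own SEVs) or to the habitat suitability node (using all of them).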


corresponding likelihood function, and conversely, the fewer explanatory variables, the smaller the likelihood function (Kharroubi and Sweeting, 2010). Here, the log-likelihood of the data given the model is simply the sum of the log-likelihoods of the individual cases. The magnitude of this value depends on the sum of squared residuals (and the sample size), and the value is negative. Hence, the smaller the absolute value and the smaller the sum of squared residuals, the higher the effectiveness of parameter fitting (Lie et al., 2018). However, as the value of the log-likelihood function becomes smaller, we cannot judge that the model with reduced variables is better than the previous model.

The Akaike information criterion (AIC) is commonly used for model selection (Burnham and Anderson, 2004). AIC is a standard to measure the goodness of fit, characterizing how well the model matches the data (Akaike, 1987). For FBM, the fitting validity of the model represents the matching degree of the model to the data. Here, AIC is computed as l-k, where l is the log-likelihood and k is the number of free parameters in the model. Using Hugin software, the result of the simplified AIC formula is negative. The smaller the absolute value, the better the model performance (Burnham and Anderson, 2004).

2.7. Modelling process

The models are ordered from the standpoint of their adequacy (Halfon and Reggiani, 1978). Therefore, we choose three kinds of network models (Fig. 2) and two methods of information classification (FMC and EIC), with nine kinds of learning sequence of sampling sites (Table 5), to compare the suitability and uncertainty of the three models. In the study area, 50 predicted sites (Fig. 1) are randomly selected, which are used as the prediction sites for the suitability of A. sinensis, and the accuracy and uncertainty of the model are evaluated.

Even though it is difficult to collect data on the occurrence of A. sinensis with complete information including ferulic acid content, we still collect information on 10 additional testing sites (including ferulic acid content information), to compare the results of three FBM model (Fig. 2) con-

Table 7
The results of 10 testing sites based on FME and FBM.

ID of testing sites | FME | FBM by FMC: Model A | FBM by FMC: Model B | FBM by FMC: Model C | FBM by EIC: Model B | FBM by EIC: Model C
1 | h | h | h | h | m | h
2 | m | m | m | m | h | h
3 | h | h | h | h | h | h
4 | h | h | h | h | h | h
5 | h | h | h | h | h | h
6 | l | – | l | l | l | l
7 | n | – | n | n | l | l
8 | l | l | l | l | l | l
9 | l | l | l | l | l | l
10 | l | l | l | l | l | l

h, m, l, and n represent highly suitable, moderately suitable, lowly suitable, and unsuitable, respectively. “–” represents the case in which the probability values of highly, moderately, lowly suitable, and unsuitable are all equal to 25%, which is called “equal probability”.

2.8. Sensitivity analysis of environmental variables

In this paper, the state of each node and the CPT of all nodes are obtained according to the A. sinensis sampling sites, and the sensitivity analysis of environmental variables affecting the habitat of A. sinensis is carried out. Starting from the parent node, the state of the node is entered into the Hugin software for inference. Each state of a node is preset to 100%, and the remaining nodes are not set. Probabilities are propagated to the remaining nodes according to Bayesian inference. The probability difference between each child node and its initial state is calculated. The larger the difference, the stronger the sensitivity.

Table 6
The occurrence rate of “×” and “–” compared under three models (A, B, and C), two data discretization methods (FMC and EIC), and different learning stages (Seq 1st, Seq 2nd-5th, and Seq 6th-9th).

Model | State of habitat suitability node | FMC: Seq 1st | FMC: Seq 2nd-5th | FMC: Seq 6th-9th | EIC: Seq 1st | EIC: Seq 2nd-5th | EIC: Seq 6th-9th
A | “×” | 82% | 60% | 31.5% | | |
A | “–” | 14% | 33.5% | 55.5% | | |
B | “×” | 60% | 11.5% | 0 | 70% | 49.5% | 0
B | “–” | 0 | 0 | 0 | 0 | 0 | 0
C | “×” | 60% | 11.5% | 0 | 70% | 49% | 0
C | “–” | 6% | 12.5% | 1% | 6% | 10.5% | 14%

“×” represents an impossible distribution in the process of network probabilistic inference, beyond the predicted range, which is called “conflict”. “–” represents the case in which the probability values of highly, moderately, lowly suitable, and unsuitable are all equal to 25%, which is called “equal probability”. “Conflict” and “equal probability” indicate the model’s poor extraction of effective information and poor reasoning ability after learning the data of the corresponding A. sinensis sampling sites.

3. Results

3.1. The prediction results of model A, B, and C

The predicted results of model A, B, and C for the 50 predicted sites under the nine sequences of sampling sites are shown in Appendix A. There are four states of habitat suitability of A. sinensis, including highly suitable, moderately suitable, lowly suitable, and unsuitable. There are some special results, like “×” and “–”, which can characterize the performance differences between the three models. “×” means an impossible distribution in the network probability reasoning process, beyond the predicted range, which is called “conflict”. “–” means that the probabilities of highly, moderately, lowly suitable, and unsuitable are all equal to 25%, which is called “equal probability”. When calculating the relevant indicators in the results, we find that model A, whose network structure is learned from the original data, performs poorly (Appendix Table A1). Across the two discretization methods, whether for model B (Appendix Table A2, A3) or model C (Appendix Table A4, A5), the performance of FMC is better than that of EIC. When all models learn data from 10 sampling sites (Seq 1st), “conflict” has a higher occurrence rate in the prediction results (Table 6). This is evidently caused by too little sample information. Consider Seq 1st as the first stage, Seq 2nd to Seq 5th as the second stage, and Seq 6th to Seq 9th as the third stage. In these three stages, with the prediction results of models A, B, and C, the
structed by the two methods of information classification (Table 2, incidence of “conflict” is gradually decreasing (Table 6). For “equal
Table 4) with the FME. probability”, the occurrence rates are different due to different models.
Model B does not have the phenomenon of “equal probability” (Ap­
pendix Table A2, A3), while model A has been increasing in these three


Fig. 3. Conditional probability distribution table (CPT) in B model after learning 90 sampling sites with SEVs’ states divided by FMC (Seq 9th), IEVs’ states divided
by comprehensive suitability evaluation, in which h, m, l, and n respectively represent node state highly suitable, moderately suitable, lowly suitable, and unsuitable
of IEVs. And 1, 2, 3, and 4 respectively represent node state 1, 2, 3, and 4 of SEVs. They correspond to the states 1–4 in Table 2.
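
To make the structure of such a CPT concrete, the following is a minimal sketch that estimates the distribution of a child node (states h, m, l, n) given a single parent node (states 1 to 4) by simple frequency counting. This is a simplification swapped in for illustration: the study obtains its CPTs in Hugin, and the function name `estimate_cpt`, the single-parent layout, the uniform fallback for unobserved states, and the example cases are all assumptions rather than the authors' procedure or data.

```python
from collections import Counter, defaultdict

SUITABILITY = ["h", "m", "l", "n"]  # highly, moderately, lowly suitable, unsuitable
PARENT_STATES = [1, 2, 3, 4]        # discretized states of one parent node

def estimate_cpt(cases):
    """Estimate P(child state | parent state) from complete cases by counting."""
    counts = defaultdict(Counter)
    for parent_state, child_state in cases:
        counts[parent_state][child_state] += 1
    cpt = {}
    for state in PARENT_STATES:
        total = sum(counts[state].values())
        if total == 0:
            # Assumption of this sketch: an unobserved parent state stays uniform.
            cpt[state] = {s: 1.0 / len(SUITABILITY) for s in SUITABILITY}
        else:
            cpt[state] = {s: counts[state][s] / total for s in SUITABILITY}
    return cpt

# Hypothetical training cases, not taken from the paper's data.
cases = [(4, "h"), (4, "h"), (4, "m"), (1, "n"), (1, "l")]
cpt = estimate_cpt(cases)
```

Each row of `cpt` sums to 1, so a row can be read like one column block of a conditional probability table.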

stages (Table 6). When model C is based on EIC, the occurrence rate of “equal probability” has been increasing (Appendix Table A5), and on the basis of FMC, it first rises and then decreases (Appendix Table A4, Table 6).

3.2. Test results

We use 10 testing sites to evaluate the performance of FBM (models A, B, and C) and FME. The result shows that models A, B, and C are in a stable state after learning the information of 90 sampling sites (Appendix Tables A1–A5). Thus, FBM uses the CPT obtained after learning the information of 90 sampling sites (Seq 9th) to infer the results of the test sites. The FME also used information from the 90 sampling sites for modelling. Comparing FBM and FME (Table 7), the test results of models B and C by FMC are consistent with the results of FME. Model A has two “equal probability” occurrences. Except for ID1 of the test sites, the prediction results of models B and C for the test sites are consistent by EIC. Relative to the FME prediction results, there is a 30% difference. Model B has an unclear boundary in the discrimination between highly suitable and moderately suitable through EIC. From the results, the operation of model B and model C through FMC is more stable and performs better.

3.3. Model fitting validation, stabilization and optimal model

By FMC, the absolute value of AIC of model A tends to be the highest among the three models, and that of model B is the lowest (Fig. 4a). The absolute value of the log-likelihood of model A is the lowest among the three models, followed by model B and model C (Fig. 4b). By EIC, the absolute value of AIC of model B is much smaller than that of model C (Fig. 4c), but the absolute value of the log-likelihood of model C is smaller than that of model B (Fig. 4d). In the experiment, the model with the smallest AIC absolute value is usually chosen. Model B has the minimum AIC absolute value under both discretization methods (FMC and EIC). For model B, the AIC absolute value under FMC (Fig. 4a) is smaller than that under EIC (Fig. 4c).

Based on the fitting validity, the stabilization of the model, and the predicted results, model B with SEVs’ states divided by FMC is the best. Fig. 3 shows the conditional probability distribution of model B after learning the information by FMC from environmental data. Finally, we confirm that the predicted results of Seq 9th of model B with SEVs’ states divided by FMC and IEVs’ states divided by comprehensive suitability evaluation are accurate for A. sinensis, as shown in Fig. 5. Among them, if a predicted site has the same probability for several suitability levels (Appendix Table A2), we show the lowest level of suitability (Fig. 5). Generally, the highly and moderately suitable habitats of A. sinensis are mainly distributed in Gansu and Shaanxi, which is consistent with the genuine producing areas of A. sinensis (Shang et al., 2015). Lowly suitable habitats are located in Yunnan, Sichuan, Chongqing, and Hubei.

3.4. Sensitivity analysis of SEVs and IEVs on habitat suitability of A. sinensis

The sensitivity analysis is based on the CPT (Fig. 3) of model B after learning 90 sampling sites with SEVs’ states divided by FMC (Seq 9th) and IEVs’ states divided by comprehensive suitability evaluation [eq. (2)]. In order to analyze the degree of influence of SEVs and IEVs on the habitat suitability of A. sinensis, we set limit values for each node in the FBM, setting the SEV and IEV nodes separately. We set the probability of a node being in a certain state to 100% and the probabilities of the other states of this node to 0 (Appendix Table A6). At the same time, the remaining nodes retain their original probability state. After the probability propagation, the state probabilities of the A. sinensis habitat suitability node are recorded and compared with the original state probabilities of the habitat suitability node of A. sinensis (Fig. 3). The result indicates that different probability settings lead to changes in the probability values of the four suitability levels (Appendix Table A6). We extract the maximum and minimum values of the probability change of suitability (Fig. 6).

For the result of SEVs, the main environmental variables that


Fig. 4. AIC and Log-likelihood indexes of model A, B, and C with 9 kinds of learning sequences of sampling sites. (a) and (b) are under the fuzzy mathematics
classification (FMC). (c) and (d) are based on equal interval classification (EIC).
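
The two indexes plotted in Fig. 4 can be illustrated with a minimal sketch, assuming the definitions given in the text: the log-likelihood is the sum of the per-case log-likelihoods, and the score is l − k with k the number of free parameters. The function names, the per-case probabilities, and the parameter counts below are hypothetical and are not taken from the study.

```python
import math

def log_likelihood(case_probabilities):
    """Sum of the log-probabilities the model assigns to each observed case."""
    return sum(math.log(p) for p in case_probabilities)

def aic_score(loglik, n_free_parameters):
    """Score in the l - k form described in the text (always negative here)."""
    return loglik - n_free_parameters

def pick_model(scores):
    """Return the model name whose score has the smallest absolute value."""
    return min(scores, key=lambda name: abs(scores[name]))

# Hypothetical per-case probabilities and free-parameter counts.
case_probs = [0.5, 0.25, 0.5, 0.125]
l = log_likelihood(case_probs)
scores = {"A": aic_score(l, 12), "B": aic_score(l, 3), "C": aic_score(l, 6)}
best = pick_model(scores)
```

With equal log-likelihoods, the model with the fewest free parameters has the smallest absolute score, which is the selection rule applied to Fig. 4.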

promote the increase or decrease of the highly suitable probability tend to be the same (Fig. 6a), including SOM, Bio1 and Bio10. This means that changes in temperature and soil factors dominate the change in the highly suitable probability. For moderately suitable habitats, the main environmental variables affecting their probability changes are SOM, Bio1 and EL (Fig. 6b). This is an interesting finding. Soil, temperature, and topographic factors together dominate the change in the probability of a moderately suitable habitat. However, there is a discrepancy between the main environmental variables that promote the increase of the lowly suitable probability and those that reduce it (Fig. 6c). The main variables that promote the increase of the lowly suitable probability are Bio1 (11.54%), Bio10 (5.11%), and EL (3.59%), and the main factors that reduce the probability are Bio1 (−9.6%), SOM (−7.31%), and EL (−4.17%) (Fig. 6c). This implies that the environmental variables that promote the increase of the lowly suitable probability do not have the same influence when reducing it. The simulation of unsuitable probability changes shows a similar result (Fig. 6d).

As for the three IEVs, the temperature suitability node causes the highest change in the probability of highly suitable, moderately suitable, and lowly suitable (Fig. 6a, b, c). This implies the important influence of temperature factors on the suitability of the habitat of A. sinensis. For the unsuitable probability change, the precipitation suitability node reaches the maximum value (24.49%) when increasing its occurrence probability, while the soil suitability node reaches the minimum value (−12.59%) when decreasing its occurrence probability (Fig. 6d). This implies the important adjustment role of precipitation and soil factors in the transformation of unsuitable habitat of A. sinensis into suitable habitat.

4. Discussion

4.1. Environmental factors and FBM prediction limits

The bioclimatic, soil, terrain, and light data used in our method have important biological significance for plant growth. These data are collected from the most widely used data sets and are used to predict the potential distribution of species (Soberón and Peterson, 2005; Pliscoff et al., 2014; Zhao et al., 2020). Although local factors (such as geographic location) and biological interactions are sometimes considered important for maintaining plant growth in a local area, our team assumes that bioclimatic, soil, terrain, and light variables, and the mechanisms relating them, are key drivers of physiological processes related to species survival. However, factors such as biological interaction types and processes are currently difficult to obtain at large spatial scales (Liu et al., 2019; Zhang et al., 2020a, 2020b). At the same time, the obstacles to assessing the impact of human activities on the growth of the research object would complicate the initial FBM research and evaluation. The dynamics and diversity of species, and biological interactions, are not considered in the modeling of A. sinensis in this study. We can work with experts in biogeography, ecology, and ecophysiology to address these issues in a more coordinated way. For the prediction results, FBM currently takes the form of points. In the future, with the optimization of the model, we expect the model prediction to be presented in planar form rather than as points.

BN allows us to integrate different data sources into a single framework (Molina-Navarro et al., 2020). Ramírez et al. (2019) argue that BN combines qualitative data with expert knowledge and can obtain better risk assessment by formalizing objective and quantitative evaluation models. BN allows modelers to incorporate data, expert knowledge, or any combination of the two (Koen et al., 2017). In the


Fig. 5. Habitat suitability map of 50 predicted sites of A. sinensis obtained from the Seq 9th of B model with SEVs’ states divided by FMC and IEVs’ states divided by
comprehensive suitability evaluation.
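
The mapping rule used for this figure (tied suitability levels are shown as the lowest tied level) can be sketched as follows. The function name `map_level` is hypothetical; treating “×” as unsuitable follows the note to Table A1, while treating “–” as a four-way tie is an assumption of this sketch.

```python
ORDER = ["h", "m", "l", "n"]  # from highest to lowest suitability

def map_level(label):
    """Collapse a table label such as 'h/l' to the single level drawn on the map."""
    if label == "×":
        # Conflict: the final prediction is regarded as unsuitable (Table A1 note).
        return "n"
    if label == "–":
        # Assumption: equal probability is handled like any other tie.
        return ORDER[-1]
    tied = label.split("/")
    # The largest index in ORDER is the lowest suitability level among the ties.
    return max(tied, key=ORDER.index)

# Seq 9th labels of predicted sites 11 and 22 in Table A2.
site_11 = map_level("h/l")
site_22 = map_level("m/l")
```

Both example sites are therefore drawn as lowly suitable.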

absence of current data, it is still feasible for BN to model biodiversity at fine spatial scales (Grafius et al., 2019). Combining ecological process interactions and spatially explicit data in the BN framework can improve researchers’ understanding of the process in space (Froese et al., 2019). Here, RF and Maxent have better spatial distribution performance to a certain extent (Elith et al., 2010). However, only Maxent can use categorical data as input for environmental variables (Guo et al., 2017, 2019). FBM retains this input advantage of Maxent and allows the input of categorical data. In addition, because FBM is built on BN, it allows the addition of more empirical and decision-making data and can effectively integrate various ecological data, which is difficult for other models (such as Maxent and RF). FBM has unique advantages in flexible data fusion (including the data types that can be added, visualization of data relationships, etc.).

4.2. Network framework and node state of FBM

Three network frameworks of FBM are compared with each other. Model A represents a machine-learned structure, and models B and C represent human experience (including previous literature and practice). In fact, all models involve machine learning, because in the subsequent process of obtaining the CPT, they all learn from the data automatically. FBM combines regression, expectation maximization, and Bayesian inference algorithms. It provides more information on the potential causal pathways of species habitat suitability than FME (Lu et al., 2012), RF (Zhang et al., 2020b), and Maxent (Guo et al., 2017). Bayesian inference considers the interdependency between predictive variables, and the relationship between variables and outcome is usually non-linear (Feki-Sahnoun et al., 2018). BN can capture the interactions between predictor variables (Hradsky et al., 2017), and can easily be updated with new knowledge on the nature of relationships already established in BN (Molina-Navarro et al., 2020). Hence, this potential advantage of the FBM approach is particularly useful for observational studies with a large number of variables, as causality and time relationships are often unknown in these studies.

This paper follows the previous suitability state allocation method to deal with the IEVs and habitat suitability nodes of the FBM network, and divides the suitability of A. sinensis into “unsuitable”, “lowly suitable”, “moderately suitable”, and “highly suitable”. FMC and EIC are used to preprocess SEVs. FMC uses a mapping relationship between environmental variables and membership degrees. EIC is used as a control group for FMC in model data preprocessing. Bayesian networks usually require discrete data (Uusitalo, 2007). Discretization means that only the rough features of the original distribution are captured (Friedman and Goldszmidt, 1996), and statistical information is lost (Hamilton et al., 2015; Lucena-Moya et al., 2015; Meyer et al., 2014). However, discretization is beneficial in this study. With the increase of learning data, FBM has successfully provided reliable results (Table 6) and sensitivity analysis (Appendix Table A6), and has not provided biased results. Therefore, discretization can mitigate the effects of noisy data (incorrectly saved values/typing errors/incorrect values) (Feki-Sahnoun et al., 2018).

4.3. Model reliability and robustness

Model evaluation uses the AIC and log-likelihood indicators (Fig. 4). Fuzzy mathematics constructs a membership function between environmental variables and the A. sinensis characteristic value (ferulic acid content), which is similar to constructing a normal distribution. In addition to log-likelihood, this paper uses AIC instead of the Bayesian information criterion (BIC), because BIC can only be used if the posterior distribution can be approximated by a normal distribution (Burnham and Anderson, 2004). Fuzzy mathematics first eliminates the traces of Gaussian noise, and the remaining information may contain some Poisson distribution through discretization. Therefore, for the rigor of the research, AIC is used to evaluate the goodness of model fit (Burnham and Anderson, 2004). In addition to AIC and log-likelihood values, FBM could also use Kappa, TSS, and AUC for model performance measurement in the future, because they are widely used to evaluate SDMs (Guo et al., 2018). Beyond statistical indicators, the research also needs to measure the difference in model prediction performance through different statistical data (Swets, 1988).

At the same time, the EIC method as a control group can help researchers find the effectiveness of the modeling data without fuzzy


Fig. 6. The maximum and minimum effects of nodes on A. sinensis habitat in B model after learning 90 sampling sites (Seq 9th) with SEVs’ states divided by FMC and IEVs’ states divided by comprehensive suitability evaluation. (a) The highly suitable probability changes; (b) The moderately suitable probability changes; (c) The lowly suitable probability changes; (d) The unsuitable probability changes. “Ts” means temperature suitability node. “Ps” means precipitation suitability node. “Ss” means soil suitability node.
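
The quantities summarized in this figure can be illustrated with a minimal sketch of the sensitivity procedure on a two-node network: one parent node with four states and the habitat suitability node as its child. Setting a parent state to 100% makes the child distribution equal to the corresponding CPT row, and the change is measured against the baseline marginal. The prior, the CPT values, and the function names are hypothetical; the study performs this propagation in Hugin on the full network.

```python
LEVELS = ["h", "m", "l", "n"]  # suitability states of the child node

def marginal(prior, cpt):
    """Baseline child distribution: sum over parent states of P(state) * P(level | state)."""
    return [sum(prior[s] * cpt[s][j] for s in range(len(prior)))
            for j in range(len(LEVELS))]

def sensitivity(prior, cpt):
    """Probability change of every level when each parent state is set to 100%."""
    base = marginal(prior, cpt)
    changes = {}
    for s in range(len(prior)):
        # With state s fixed at 100%, the child follows row s of the CPT.
        changes[s + 1] = [cpt[s][j] - base[j] for j in range(len(LEVELS))]
    return base, changes

def extremes(changes, level):
    """Maximum and minimum change of one suitability level over all parent states."""
    j = LEVELS.index(level)
    values = [row[j] for row in changes.values()]
    return max(values), min(values)

# Hypothetical prior over parent states 1-4 and CPT rows P(h, m, l, n | state).
prior = [0.25, 0.25, 0.25, 0.25]
cpt = [
    [0.1, 0.2, 0.3, 0.4],
    [0.2, 0.3, 0.3, 0.2],
    [0.3, 0.3, 0.2, 0.2],
    [0.4, 0.2, 0.2, 0.2],
]
base, changes = sensitivity(prior, cpt)
max_h, min_h = extremes(changes, "h")
```

In this toy network the highly suitable probability rises most when the parent is fixed in state 4 and falls most when it is fixed in state 1.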

mathematical denoising algorithm. Models B and C both use the FMC (Table 2) and EIC (Table 4) discretization methods. Observation shows that the two models (models B and C) fit better and reason more reasonably when learning FMC data (Fig. 4). In the comparison between FBM and FME, the results also show that FMC is superior to EIC (Table 7). Wiest et al. (2019) argue that a target training data set can build a model probability structure, which supports a more robust cross-validation analysis. We developed FBM on a regional scale rather than a smaller sub-region or statistical scale. The three FBM frameworks and the corresponding target training data sets can build the model probability distribution, which is the performance of powerful cross-validation analysis. Although there are no specific guidelines to derive the number of cases for the structure and parameters of the FBM, as the cases increase, the BN model becomes more robust. For all FBM models, whether FMC or EIC is used, after the model structure parameters are fixed, the implicit parameters required for inference increase as the amount of data increases (Fig. 4), and the absolute values of the AIC and log-likelihood functions also increase. When evaluated on the FMC data, model A, model B, and model C give very different feedback. Model A comes only from machine structure learning. Model A has fewer nodes, so it has fewer independent variables than the other two models, and the absolute value of its log-likelihood is also smaller (Fig. 4b). But model A has too many hidden variables, and the absolute value of its AIC is much higher than that of the other two models (Fig. 4a). Similarly, model B is superior to model C in data fitting, as is its model structure: because model B has fewer implicit parameters than model C, it can prevent the model from overfitting during the mapping process.

Although the results contain a number of “conflict” and “equal probability” events, the probability of these events decreases as the learning data increase (Table 6). “Conflict” indicates that the software cannot obtain results after calculation. The reason is that the network’s inference oversteps the scope of reasoning. This means that the network probability distribution selected by researchers cannot exist in the current network: either the predicted sites are not suitable for the species’ growth, or the structure of the network model is not reasonable. The results show that the model requires a large amount of original data, and the quality and quantity of the original data have a profound impact on the prediction results of the FBM model. This is the same as for other SDMs, such as GLM, GARP, and RF (Hirzel et al., 2001; Ray et al., 2018; Zhang et al., 2020b).

4.4. Case and features of FBM

In China, A. sinensis is widely known as a CHM (Committee of National Pharmacopoeia, 2020), and the quality of A. sinensis and its marker compound ferulic acid have been systematically researched (Zhang et al., 2012, 2019; Ma et al., 2015). Sufficient and sound basic research data can effectively support our research. This is why we chose A. sinensis instead of other medicinal plants as a typical case.

Each model has its own theoretical framework and its own conceptual definition of species suitability. Unlike most previous models, such as GLM, GARP, RF, and Maxent (Hirzel et al., 2001; Ray et al., 2018; Zhang et al., 2020b; Guo et al., 2017), FBM has a unique understanding of species suitability. As described in the methods above, the membership function fits the marker compound ferulic acid of A. sinensis and the environmental variables together, and further processing then yields the level of species suitability. Because FBM currently mainly targets medicinal plants, to a large extent the content level of the species marker compound, that is, the level of medicinal value, represents the level of species suitability. FBM emphasizes that the content of the species marker compound determines the suitability of the species’ habitat. The original data required by the FBM model include the content of the species marker compound, the locations of species occurrence, and environmental data. We have compared FBM with FME in the manuscript. However, models such as GLM, GARP, RF, and Maxent combine observations of species occurrence or abundance with environmental estimates (Elith and Leathwick, 2009). They have no data on the content of the species marker compound for simulation. Therefore, for medicinal plants, FBM’s prediction mechanism is more effective and


can help us quickly predict where there will be high-quality medicinal materials. Because of the different definitions of habitat suitability and different data sources, the structures of FBM and other models (such as RF and Maxent) are very different, and the evaluation methods are also different. The reliability of different model systems actually comes from different statistical models (Strandén and Christensen, 2011). The comparison between FBM and other models (such as RF and Maxent) cannot help FBM to improve model prediction accuracy, reliability and robustness.

Through a series of reliability and robustness analyses, this study adopted the results of Seq 9th of model B with SEVs’ states divided by FMC and IEVs’ states divided by comprehensive suitability evaluation (Fig. 5). In this study, 50 predicted sites of A. sinensis were randomly selected in the study area (Fig. 1). From the prediction results, the distribution of A. sinensis habitat suitability is related to the sampling sites. The prediction result reflects the distribution pattern of ferulic acid content in A. sinensis sampling sites. The highly and moderately suitable habitats of A. sinensis are mainly distributed in Gansu and Shaanxi (Shang et al., 2015). This further confirms that the model has a reasonable mapping and generalization ability.

FBM adds sensitivity analysis to help researchers further explore the impact of variables on habitat suitability. From Appendix Table A6, we further extract the degree of influence of each node on the habitat status of A. sinensis (Fig. 6). From the maximum and minimum values of the suitability probability changes (Appendix Table A6), we find that SOM, Bio1, Bio10, and EL affect the increase and decrease of the four suitability probabilities to different degrees (Fig. 6). In terms of the range of suitability probability changes (min to max), almost all four suitability changes are dominated by these environmental variables (SOM, Bio1 and Bio10). For IEVs, the temperature suitability node still has a great influence on the suitability of A. sinensis (Fig. 6).

FBM performs Bayesian inference based on FMC. The role of fuzzy mathematics in this study is the discretization of nodes. According to the degree of membership, we classify the states of the environmental variables (Table 2). This means that each state of the eight SEVs itself contains some information that supports or does not support the suitability of A. sinensis. For example, for the Bio1 node (Appendix Table A6), when the probability of state “4” is 100%, the probability of highly suitable is the largest (39.82%) and the probability of unsuitable is the smallest (11.58%). In contrast, when the probability of state “1” is 100%, the highly suitable probability is the smallest (17.29%) and the unsuitable probability is the largest (21.80%). In summary, the four states of FMC (Table 2), from state 1 to state 4, imply gradually increasing support of the environmental variables for the growth of A. sinensis. Obviously, FMC can extract the relationship between the value of environmental variables and the growth of A. sinensis, while EIC cannot (Fig. 4, Table 6). Hence, FMC significantly improves the operating efficiency of FBM. Through FMC and the sensitivity analysis (Appendix Table A6), Bio1 in 5.92 °C–9.05 °C, Bio10 in 14.80 °C–18.60 °C, Bio12 in 568.79 mm–791.33 mm, Bio15 in 69.04–87.48, and Bio18 in 284.28 mm–428 mm can promote the highly suitable habitats of A. sinensis (Table 2), which is consistent with the conclusions of previous studies (Shang et al., 2015; Zhang et al., 2017). Consistent with the research results on genuine producing areas, SOM has a profound effect on the growth quality of A. sinensis (Zhao et al., 2002). This is an interesting finding. We can continue to analyze the ranges where SEVs have the greatest influence on the moderately suitable, lowly suitable, and unsuitable habitats of A. sinensis (Table 2, Appendix Table A6). Unlike FME, FBM pre-processes the data while preserving the adaptability of the species to the environment. It adds a machine learning process that can perform detailed probabilistic reasoning and explore relationships between environmental variables, thereby improving the model’s ability to identify and fit data and the efficiency of using species data.

5. Conclusion

Combining fuzzy mathematics, expert experience, machine learning, and Bayesian reasoning, and taking A. sinensis as a case, this study establishes a species habitat suitability model, FBM. Model A is generated by machine learning. Models B and C combine previous research and experience. Compared with model A, the model frameworks B and C are more accurate and reliable. Model B has simple structure parameters and efficient data identification, and its data output stability is better than that of model C and model A. Comparative experiments with the two discretization methods, FMC and EIC, show that the results of FMC are effectively recognized by the FBM, and reliable results are output. FBM sensitivity analysis helps researchers explore in detail the effects of environmental factors on various aspects of species habitat suitability. It is easy for researchers to obtain the environmental data of terrestrial plants needed for FBM operation, and many plants have species characteristics that can be used for FBM. Hence, the concept and framework of FBM may have broad application prospects for research on species habitat suitability in the future.

CRediT authorship contribution statement

Quanzhong Zhang: Conceptualization, Methodology, Software, Formal analysis, Data curation, Writing – original draft, Writing – review & editing, Visualization. Haiyan Wei: Writing – review & editing, Supervision. Jing Liu: Investigation, Validation, Writing – review & editing. Zefang Zhao: Data curation, Writing – review & editing. Qiao Ran: Data curation, Writing – review & editing. Wei Gu: Conceptualization, Validation, Writing – review & editing, Supervision, Project administration, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We thank everyone in our team, because we summarized our previous studies to make progress and innovation. We also pay tribute to researchers who engage in the study of species distribution. Also, we appreciate Miss Yunfei Gu, a doctoral student at the University of Southampton, for assisting us in revising the paper. This work is supported by the Basic Research Programs of Natural Science Foundation of Shaanxi Province (No. 2020JM-277), the National Natural Science Foundation of China (No. 31070293), and the Research and Development Program of Science and Technology of Shaanxi Province (No. 2014K14–01–02).

Appendix A

Table A1, A2, A3, A4, A5, and A6


Table A1
The results of 50 predicted sites based on model A with 9 kinds of learning sequences of sampling sites with SEVs’ states divided by fuzzy mathematics classification
(FMC) and IEVs’ states divided by comprehensive suitability evaluation.
ID of predicted sites Seq 1st Seq 2nd Seq 3rd Seq 4th Seq 5th Seq 6th Seq 7th Seq 8th Seq 9th

1 × – – – – – – – –
2 × × × – – – – – –
3 × × – – – – – – –
4 × × × – – – – – –
5 × × × – – – – – –
6 × × × × – – – – –
7 × × × × × × × × ×
8 × × – – – – – – –
9 × × × × – – – – –
10 × × × × l l l l l
11 × × × × × × × × ×
12 n n n n n n n n n
13 – × × × × n n n n
14 × – – – – – – – –
15 × × × × × × × × ×
16 × × × × × × × × ×
17 × × × × × × × × ×
18 – – – – – – – – –
19 × × – – – – – l l
20 × × – – – – – – –
21 × × × × × × × × ×
22 × × × × × × – × ×
23 × × × × × × × × ×
24 × × × – – – – – –
25 × – – – – – – – –
26 × × × – – – × × ×
27 × × – – – – l l l
28 × × × × – – – – –
29 – × × × – – – – –
30 – × × × × × × × ×
31 × × × × × × × × ×
32 × × × × – – – – –
33 × × × × – – – – –
34 × × × × × × × × ×
35 × × × × × × × × ×
36 × × × × – × – – –
37 × × × × × × × × ×
38 × × × × × – – – –
39 × × × × – – – – –
40 × × × × × × × × ×
41 × × × × – – – – –
42 – – – – – – – h –
43 × h h h h h h h h
44 × × – – – – – – –
45 h h/m h/m h/m m m m h h
46 – – – – – – – – –
47 – × × × – – – – –
48 × × × × × × × × ×
49 × × × × – – – – –
50 × – – – – – – – –

h, m, l, and n represent highly suitable, moderately suitable, lowly suitable, and unsuitable. The “h/m” represents the probability value of highly and moderately
suitable is equal, and is greater than that of lowly suitable and unsuitable probability. “–” represents the probability value of highly, moderately, lowly suitable, and
unsuitable is equal to 25%, which is called “equal probability”. “ × ” represents an impossible distribution in the process of network probabilistic inference, beyond the
predicted range, which is called “conflict”, and the final prediction result was regarded as unsuitable.
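
The labelling convention described in this note can be sketched as a small function that turns the four state probabilities of a site into the symbol shown in the table. The function name `result_label`, the tolerance, and the use of `None` to stand for a failed inference are assumptions made for illustration.

```python
LEVELS = ["h", "m", "l", "n"]  # highly, moderately, lowly suitable, unsuitable

def result_label(probs, tol=1e-9):
    """Return the table symbol for one site from its four state probabilities."""
    if probs is None:
        # Assumption: None stands for an inference that failed ("conflict").
        return "×"
    top = max(probs)
    tied = [level for level, p in zip(LEVELS, probs) if abs(p - top) <= tol]
    if len(tied) == len(LEVELS):
        # All four states at 25%: "equal probability".
        return "–"
    # A single winner gives e.g. "l"; ties give e.g. "h/m".
    return "/".join(tied)

# Hypothetical probability vectors.
labels = [
    result_label([0.1, 0.2, 0.6, 0.1]),
    result_label([0.4, 0.4, 0.1, 0.1]),
    result_label([0.25, 0.25, 0.25, 0.25]),
    result_label(None),
]
```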


Table A2
The results of 50 predicted sites based on model B with 9 kinds of learning sequences of sampling sites with SEVs’ states divided by fuzzy mathematics classification
(FMC) and IEVs’ states divided by comprehensive suitability evaluation.
ID of predicted sites Seq 1st Seq 2nd Seq 3rd Seq 4th Seq 5th Seq 6th Seq 7th Seq 8th Seq 9th

1 × l × l l l l l l
2 × l n l l l l l l
3 × l × l l l l l h
4 × l × h h h h h l
5 × m n l l l l l l
6 × l h h h h h h h
7 l l l l l l l l l
8 m m m m m m m m m
9 × × × h h h h/l h l
10 × l × l l l l l l
11 × × × l l h m h/m/l h/l
12 × l l/n l l l l l l
13 l l n n n n n n n
14 l m n n n n n n n
15 × l × l l l l l l
16 × l l/n l l l l l l
17 × l l n n n n n n
18 l l h/l l l l l l l
19 l l h/l l l l l l l
20 l m n l l l l l l
21 × m l l l l l l l
22 × × × l l l l l m/l
23 × m × l l l l l l
24 l l l l l l l l l
25 × l × l l l l l l
26 × l n l l l l l l
27 l l l/n l l l l l l
28 × × × l l l l l l
29 l l l/n l l l l l l
30 l l l/n l l l l l l
31 × m n l l l l l l
32 × l l l l l l l l
33 × l l l l l l l l
34 × × × n n n n n n
35 × l l n n n n n n
36 l l l l l l l l l
37 × l × n n n n n n
38 l l l/n l l l l l l
39 l l n l l l l l l
40 × l × l l l l l n
41 l l n l l l l l l
42 × l h h h h h h h
43 h h h h h h h h h
44 m m m m m m m m m
45 m m h h h h h h h
46 l l l/n l l l l l l
47 h/m h h h h h h h m
48 × l h/l l l l l l l
49 × × × h h h h/l h l
50 × × × l l l l l l

“h/m/l” indicates that the highly, moderately, and lowly suitable states have equal probability, and that this probability is greater than that of the unsuitable state. “h/l” indicates
that the highly and lowly suitable states have equal probability, and that this probability is greater than that of the moderately suitable and unsuitable states. “m/l” indicates that the
moderately and lowly suitable states have equal probability, and that this probability is greater than that of the highly suitable and unsuitable states. “l/n” indicates that the
lowly suitable and unsuitable states have equal probability, and that this probability is greater than that of the highly and moderately suitable states. “–” and “×” are explained in Table A1.
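
The labelling convention described in the notes to Tables A1 and A2 can be sketched in Python as follows. This is an illustrative sketch only, not the authors' implementation: the function name `suitability_label`, the tolerance `tol`, and the use of `None` to signal a conflict are assumptions introduced here.

```python
# Hypothetical helper: maps posterior probabilities (in %) of the four
# habitat suitability states to the labels used in Tables A1 to A5.
STATES = ("h", "m", "l", "n")  # highly, moderately, lowly suitable, unsuitable


def suitability_label(probs, tol=1e-9):
    """Return the table label for one predicted site.

    probs: dict with keys 'h', 'm', 'l', 'n' (probabilities in %),
           or None when inference yields a conflict (assumed encoding).
    """
    if probs is None:
        return "×"  # conflict; the final prediction is regarded as unsuitable
    top = max(probs[s] for s in STATES)
    # States whose probability ties with the maximum, in h, m, l, n order.
    tied = [s for s in STATES if abs(probs[s] - top) <= tol]
    if len(tied) == len(STATES):
        return "–"  # equal probability: every state is at 25%
    return "/".join(tied)


print(suitability_label({"h": 40, "m": 40, "l": 15, "n": 5}))
print(suitability_label({"h": 25, "m": 25, "l": 25, "n": 25}))
```

A single most probable state yields a one-letter label, and ties are joined in the order h, m, l, n, which matches the combined labels that appear in the tables (for example “h/m/l” and “l/n”).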


Table A3
Results for the 50 predicted sites based on model B under nine learning sequences of sampling sites, with SEV states divided by equal interval classification (EIC)
and IEV states divided by comprehensive suitability evaluation.
ID of predicted sites Seq 1st Seq 2nd Seq 3rd Seq 4th Seq 5th Seq 6th Seq 7th Seq 8th Seq 9th

1 l l l l l l l m/l l
2 × × l l l l l m/l l
3 m h h h h h m m m
4 l l l l l l h h h
5 l m m m m m m m m
6 × × × × × l l l h
7 × × × × × l l l l
8 m h m m m m m m m
9 × × × × × l l l l
10 × × × × × l l l l
11 × × × × × l l l l
12 × × × × × l l l l
13 × × × × l l l l l
14 × × × × × l l l l
15 × × × × l l l l l
16 × × × × l l l m/l l
17 × × l l l l l l l
18 l h/l h/m/l l l h h h h
19 l l l l l l l l l
20 l l l l l l l l l
21 × × l l l l l l l
22 m l l l l l l l l
23 × × l l l l l m/l l
24 l l l l l l l l l
25 × × l l l l l l l
26 × × l l l l l l l
27 l l l l l l l l l
28 × × l l l l l l l
29 × × l l l l l l l
30 × × l l l l l m/l l
31 × × × × × l l l l
32 × × × × h h h h h
33 × × × × l l l l h
34 × × × × l l l l l
35 × × × × l l l l l
36 × × × × l l l l l
37 × × × × l l l l l
38 × × × × l l l m/l l
39 × × × × × n l l l
40 × × × × l l l m/l l
41 × × × × l l l l l
42 × × × × × h h h h
43 m h h × h h h h h
44 × × h h h h h h h
45 m h h h h h h h h
46 m h m m m m m m m
47 m h/m h h h h h h h
48 × × × × × l l l h
49 × × × × × l l l l
50 × × × × × l l l l

The letters and symbols in the table are explained in Table A1 and Table A2.


Table A4
Results for the 50 predicted sites based on model C under nine learning sequences of sampling sites, with SEV states divided by fuzzy mathematics classification
(FMC) and IEV states divided by comprehensive suitability evaluation.
ID of predicted sites Seq 1st Seq 2nd Seq 3rd Seq 4th Seq 5th Seq 6th Seq 7th Seq 8th Seq 9th

1 × l × n n n n n n
2 × l l/n l/n l/n l/n l/n l/n l
3 × – × h h h/m h/m h/m h
4 × – × h h h/m h/m h/m h/m
5 × – – – – – l l l
6 × h – – – h h h h
7 l – – l l l l l l
8 l l l/n l/n l/n l/n l/n l/n l
9 × × × l l l l l l
10 × l × l l l l l l
11 × × × n n n n n n
12 × l/n l l l l l l l
13 l l n n n n n n n
14 – – h h h h/m h h h
15 × l × n n n n n n
16 × l n n n n n n n
17 × l l l l l l l l
18 l l l l l l l l l
19 l l l l l l l l l
20 l l l l l l l l l
21 × l l l l l l l l
22 × × × l l l l l l
23 × l × l l l l l l
24 l – – l l l l l l
25 × l × l l l l l l
26 × l l l l l l l l
27 l l n l l l l l l
28 × × × l l l l l l
29 l l n l l l l l l
30 l l n n n n n n n
31 × l l l l l l l l
32 × – – l l l l l l
33 × – – l l l l l l
34 × × × n n n n n n
35 × l n n n n n n n
36 l – – l l l l l l
37 × l × n n n n n n
38 l l n n n n n n n
39 – – h h h h/m h h h
40 × l × n n n n n n
41 l l n l l l l l l
42 × h/m h h h h h h h
43 h h h h h h h h h
44 m m h h h h h h h
45 m m h h h h h h h
46 l l l/n l/n l/n l/n l/n l/n l
47 – h h h h h h h h
48 × – – – – – l l l
49 × × × l l l l l l
50 × × × l l l l l l

The letters and symbols in the table are explained in Table A1 and Table A2.


Table A5
Results for the 50 predicted sites based on model C under nine learning sequences of sampling sites, with SEV states divided by equal interval classification (EIC)
and IEV states divided by comprehensive suitability evaluation.
ID of predicted sites Seq 1st Seq 2nd Seq 3rd Seq 4th Seq 5th Seq 6th Seq 7th Seq 8th Seq 9th

1 m m m m m m m l l
2 × × m m m m m l l
3 m m m m m m m m m
4 l l l l l l l l l
5 l m m m m m m m m
6 × × × × × – h h h
7 × × × × × l l l l
8 – – – – – – – – l
9 × × × × × l l l l
10 × × × × × l l l l
11 × × × × × l l l l
12 × × × × × l l l l
13 × × × × n n n n n
14 × × × × × – – – –
15 × × × × l l l l l
16 × × × × l l l l l
17 × × m m m m m m m
18 – l l l l l l l l
19 l l l l l l l l l
20 l l l l l l l l l
21 × × – – l l l l l
22 l l l l l l l l l
23 × × l l l l l l l
24 – – – – – – – – –
25 × × l l l l l l l
26 × × l – l l l l l
27 l l l l l l l l l
28 × × l l l l l l l
29 × × – – l l l l l
30 × × m m m m m l l
31 × × × × × – – – –
32 × × × × l l l l l
33 × × × × l – – – –
34 × × × × l l l l l
35 × × × × l l l l l
36 × × × × l l l l l
37 × × × × l l l l l
38 × × × × l l l l l
39 × × × × × – – – –
40 × × × × l l l l l
41 × × × × l l l l l
42 × × × × × l h h h
43 h/m/l – – – – – h h h
44 × × h h h h h h h
45 h h h h h h h h h
46 l m m m m m m m m
47 l – – – – – m m h/m
48 × × × × × l l l l
49 × × × × × l l l l
50 × × × × × l l – –

The letters and symbols in the table are explained in Table A1 and Table A2.


Table A6
Sensitivity analysis of the habitat suitability node in model B after learning 90 sampling sites (Seq 9th), with SEV states divided by FMC and IEV states divided by
comprehensive suitability evaluation.
The h, m, l, and n columns give the state of the habitat suitability node.
Nodes Probability setting of node state (%) h (%) h (%) change m (%) m (%) change l (%) l (%) change n (%) n (%) change

Bio1 "1"=100,"2"=0,"3"=0,"4"=0 17.29 −11.67 19.04 −5.77 41.87 11.54 21.80 5.9
Bio1 "1"=0,"2"=100,"3"=0,"4"=0 23.83 −5.13 29.43 4.62 31.06 0.73 15.68 −0.22
Bio1 "1"=0,"2"=0,"3"=100,"4"=0 28.71 −0.25 28.37 3.56 29.31 −1.02 13.61 −2.29
Bio1 "1"=0,"2"=0,"3"=0,"4"=100 39.82 10.86 27.87 3.06 20.73 −9.6 11.58 −4.32
Bio10 "1"=100,"2"=0,"3"=0,"4"=0 21.33 −7.63 24.93 0.12 35.44 5.11 18.31 2.41
Bio10 "1"=0,"2"=100,"3"=0,"4"=0 22.81 −6.15 24.65 −0.16 34.61 4.28 17.93 2.03
Bio10 "1"=0,"2"=0,"3"=100,"4"=0 23.38 −5.58 24.09 −0.72 34.77 4.44 17.76 1.86
Bio10 "1"=0,"2"=0,"3"=0,"4"=100 33.1 4.14 24.88 0.07 27.46 −2.87 14.56 −1.34
Totalrad13 "1"=100,"2"=0,"3"=0,"4"=0 23.43 −5.53 26.07 1.26 33.44 3.11 17.06 1.16
Totalrad13 "1"=0,"2"=100,"3"=0,"4"=0 26.56 −2.4 27.14 2.33 30.73 0.4 15.58 −0.32
Totalrad13 "1"=0,"2"=0,"3"=100,"4"=0 31.5 2.54 28.05 3.24 26.58 −3.75 13.87 −2.03
Totalrad13 "1"=0,"2"=0,"3"=0,"4"=100 30.21 1.25 23.52 −1.29 30.24 −0.09 16.03 0.13
EL "1"=100,"2"=0,"3"=0,"4"=0 23.33 −5.63 25.13 0.32 33.92 3.59 17.62 1.72
EL "1"=0,"2"=100,"3"=0,"4"=0 30.96 2 28.92 4.11 26.16 −4.17 13.95 −1.95
EL "1"=0,"2"=0,"3"=100,"4"=0 33 4.04 25.72 0.91 26.67 −3.66 14.6 −1.3
EL "1"=0,"2"=0,"3"=0,"4"=100 27.79 −1.17 23.36 −1.45 32.26 1.93 16.59 0.69
Bio12 "1"=100,"2"=0,"3"=0,"4"=0 26.74 −2.22 23.71 −1.1 32.91 2.58 16.63 0.73
Bio12 "1"=0,"2"=100,"3"=0,"4"=0 27.21 −1.75 24.87 0.06 31.63 1.3 16.28 0.38
Bio12 "1"=0,"2"=0,"3"=100,"4"=0 28.61 −0.35 24.31 −0.5 31.1 0.77 15.98 0.08
Bio12 "1"=0,"2"=0,"3"=0,"4"=100 31.38 2.42 25.42 0.61 27.97 −2.36 15.24 −0.66
Bio15 "1"=100,"2"=0,"3"=0,"4"=0 28.04 −0.92 23.8 −1.01 31.2 0.87 16.96 1.06
Bio15 "1"=0,"2"=100,"3"=0,"4"=0 27.96 −1 23.98 −0.83 31.46 1.13 16.6 0.7
Bio15 "1"=0,"2"=0,"3"=100,"4"=0 28.18 −0.78 24.14 −0.67 31 0.67 16.68 0.78
Bio15 "1"=0,"2"=0,"3"=0,"4"=100 29.19 0.23 25.05 0.24 30.1 −0.23 15.66 −0.24
Bio18 "1"=100,"2"=0,"3"=0,"4"=0 26.11 −2.85 26.66 1.85 30.48 0.15 16.75 0.85
Bio18 "1"=0,"2"=100,"3"=0,"4"=0 27.17 −1.79 23.6 −1.21 32.95 2.62 16.27 0.37
Bio18 "1"=0,"2"=0,"3"=100,"4"=0 30.18 1.22 24.72 −0.09 29.63 −0.7 15.47 −0.43
Bio18 "1"=0,"2"=0,"3"=0,"4"=100 31.07 2.11 25.27 0.46 28.25 −2.08 15.42 −0.48
SOM "1"=100,"2"=0,"3"=0,"4"=0 15.85 −13.11 32.35 7.54 31.67 1.34 20.12 4.22
SOM "1"=0,"2"=100,"3"=0,"4"=0 30.89 1.93 22.46 −2.35 23.02 −7.31 23.63 7.73
SOM "1"=0,"2"=0,"3"=100,"4"=0 36.2 7.24 17.1 −7.71 29.33 −1 17.37 1.47
SOM "1"=0,"2"=0,"3"=0,"4"=100 43.79 14.83 17.59 −7.22 31.19 0.86 7.44 −8.46
Temperature suitability "h"=100,"m"=0,"l"=0,"n"=0 60.4 31.44 28.94 4.13 5.33 −25 5.33 −10.57
Temperature suitability "h"=0,"m"=100,"l"=0,"n"=0 34.93 5.97 26.41 1.6 28.51 −1.82 10.15 −5.75
Temperature suitability "h"=0,"m"=0,"l"=100,"n"=0 3.97 −24.99 47.76 22.95 32.05 1.72 16.22 0.32
Temperature suitability "h"=0,"m"=0,"l"=0,"n"=100 5.25 −23.71 5.25 −19.56 58.31 27.98 31.19 15.29
Precipitation suitability "h"=100,"m"=0,"l"=0,"n"=0 42.9 13.94 36.09 11.28 12.87 −17.46 8.13 −7.77
Precipitation suitability "h"=0,"m"=100,"l"=0,"n"=0 34.77 5.81 23.14 −1.67 33.91 3.58 8.18 −7.72
Precipitation suitability "h"=0,"m"=0,"l"=100,"n"=0 9.29 −19.67 23.99 −0.82 55.39 25.06 11.32 −4.58
Precipitation suitability "h"=0,"m"=0,"l"=0,"n"=100 27.54 −1.42 12.02 −12.79 20.05 −10.28 40.39 24.49
Soil suitability "h"=100,"m"=0,"l"=0,"n"=0 54.73 25.77 15.36 −9.45 26.6 −3.73 3.31 −12.59
Soil suitability "h"=0,"m"=100,"l"=0,"n"=0 33.72 4.76 17.14 −7.67 29.57 −0.76 19.57 3.67
Soil suitability "h"=0,"m"=0,"l"=100,"n"=0 26.61 −2.35 24.08 −0.73 24.39 −5.94 24.93 9.03
Soil suitability "h"=0,"m"=0,"l"=0,"n"=100 9.31 −19.65 34.02 9.21 34.91 4.58 21.76 5.86

"1", "2", "3", and "4" respectively represent node state 1, 2, 3, and 4 of SEVs. "h", "m", "l", and "n" respectively represent the node state of IEVs as highly suitable,
moderately suitable, lowly suitable and unsuitable habitat.
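
The “change” columns of Table A6 appear to report each posterior probability minus a common baseline probability of the habitat suitability node. The table does not state that baseline, so the short Python sketch below back-calculates it from two rows as a consistency check. The function name `implied_baseline` and the interpretation of “change” as posterior minus baseline are assumptions made here, not statements from the authors.

```python
# Back-calculate the baseline implied by one row of Table A6,
# assuming change = posterior - baseline (an interpretation, not stated in the table).
STATES = ("h", "m", "l", "n")


def implied_baseline(posteriors, changes):
    """Return {state: posterior - change}, rounded to two decimals (values in %)."""
    return {s: round(p - c, 2) for s, p, c in zip(STATES, posteriors, changes)}


# Row: Bio1 with state "1" set to 100%.
bio1_state1 = implied_baseline((17.29, 19.04, 41.87, 21.80),
                               (-11.67, -5.77, 11.54, 5.9))
# Row: SOM with state "4" set to 100%.
som_state4 = implied_baseline((43.79, 17.59, 31.19, 7.44),
                              (14.83, -7.22, 0.86, -8.46))

print(bio1_state1)
print(som_state4)
```

If the interpretation holds, every row of the table should imply the same baseline distribution, which gives a quick way to spot transcription errors in the tabulated values.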


References Guo, Y.L., Gu, W., Lu, C.Y., Wei, H.Y., 2013. Deoxyschizandrin and γ-schizandrin content
in wild Schisandra sphenanthera to determine potential distribution in Qinling
Mountains. Chin. Bull. Bot. 48 (4), 411–422 (in Chinese).
Akaike, H., 1987. Factor analysis and AIC. Psychometrika 52 (3), 317–332.
Guo, Y.L., Wei, H.Y., Gu, W., Zhang, H.L., 2015. Potential distributions of
Basak, A., Brinster, I., Mengshoel, O.J., 2012. MapReduce for Bayesian network
Sinopodophyllum hexandrum based on fuzzy matter element model. Acta Ecol. Sin.
parameter learning using the EM algorithm. Proc. of Big Learning: Algorithms,
35 (3), 770–778 (in Chinese).
Systems and Tools. pp. 1–6.
Guo, Y.L., Wei, H.Y., Lu, C.Y., Gao, B., Gu, W., 2016. Predictions of potential
Breiman, L., 2001. Random forests. Mach. Learn. 45 (1), 5–32.
geographical distribution and quality of Schisandra sphenanthera under climate
Burnham, K.P., Anderson, D.R., 2004. Multimodel inference understanding AIC and BIC
change. PeerJ 4 (10), e2554.
in model selection. Sociol. Method. Res. 33 (2), 261–304.
Halfon, E., Reggiani, M.G., 1978. Adequacy of ecosystem models. Ecol. Model. 4 (1),
Butz, C.J., Yan, W., Lingras, P., Yao, Y.Y., 2010. The CPT structure of variable
41–50.
elimination in discrete Bayesian networks. In: Ras, Z.W., Tsay, L.S. (Eds.), Advances
Hallstan, S., Johnson, R.K., Sandin, L., 2013. Effects of dispersal-related factors on
in Intelligent Information Systems. Studies in Computational Intelligence, Vol 265.
species distribution model accuracy for boreal lake ecosystems. Diversity (Basel) 5
Springer, Berlin, Heidelberg, pp. 245–257.
(2), 393–408.
Cai, W., 1983. The extension set and incompatibility problem. J. Sci. Explor. 3 (1), 81–93
Hamilton, S.H., Pollino, C.A., Jakeman, A.J., 2015. Habitat suitability modelling of rare
(in Chinese).
species using Bayesian networks: model evaluation under limited data. Ecol. Model.
Cao, W., Li, X.Q., Wang, X., Fan, H.T., Zhang, X.N., Hou, Y., Liu, S.B., Mei, Q.B., 2010.
299, 64–78.
A novel polysaccharide, isolated from Angelica sinensis (Oliv.) Diels induces the
Havron, A., Goldfinger, C., Henkel, S., Marcot, B.G., Romsos, C., Gilbane, L., 2017.
apoptosis of cervical cancer HeLa cells through an intrinsic apoptotic pathway.
Mapping marine habitat suitability and uncertainty of Bayesian networks: a case
Phytomedicine 17, 598–605.
study using Pacific benthic macrofauna. Ecosphere 8 (7), e01859.
Chon, T.S., Kwak, I.S., Park, Y.S., Kim, T.H., Kim, Y.S., 2001. Patterning and short-term
Hirzel, A.H., Helfer, V., Metral, F., 2001. Assessing habitat-suitability models with a
predictions of benthic macroinvertebrate community dynamics by using a recurrent
virtual species. Ecol. Model. 145 (2–3), 111–121.
artificial neural network. Ecol. Model. 146 (1), 181–193.
Hradsky, B.A., Penman, T.D., Ababei, D., Hanea, A., Ritchie, E.G., York, A., Stefano, J.D.,
Coll, M., Pennino, M.G., Steenbeek, J., Sole, J., Bellido, J.M., 2019. Predicting marine
2017. Bayesian networks elucidate interactions between fire and other drivers of
species distributions: complementarity of food-web and Bayesian hierarchical
terrestrial fauna distributions. Ecosphere 8 (8), 1–19.
modelling approaches. Ecol. Model. 405, 86–101.
Hugin Expert 8.4, Hugin Expert A/S, 2016. http://www.hugin.com/.
Committee of Flora of China, 1992. Flora of China, vol. 55 (3). Science Press, Beijing,
Kharroubi, S.A., Sweeting, T.J., 2010. Posterior simulation via the signed root log-
p. 41 (in Chinese).
likelihood ratio. Bayesian Anal 5 (4), 787–815.
Committee of National Pharmacopoeia, 2020. Pharmacopoeia of the People’s Republic of
Koen, H., de Villiers, J.P., Roodt, H., de Waal, A., 2017. An expert-driven causal model of
China, vol. 1. China Medical Science Press, Beijing, p. 142 (in Chinese).
the rhino poaching problem. Ecol. Model. 347, 29–39.
Debeljak, M., Džeroski, S., Jerina, K., Kobler, A., Adamič, M., 2001. Habitat suitability
Lie, H.C., Sullivan, T.J., Teckentrup, A.L., 2018. Random forward models and log-
modelling for red deer (Cervus elaphus L.) in South-central Slovenia with
likelihoods in Bayesian inverse problems. SIAM/ASA J. Uncertain. Quantificat. 6 (4),
classification trees. Ecol. Model. 138 (1), 321–330.
1600–1629.
Džeroski, S., Grbović, J., Walley, W.J., Kompare, B., 1997. Using machine learning
Liu, J., Yang, Y., Wei, H.Y., Zhang, Q.Z., Zhang, X.H., Zhang, X.Y., Gu, W., 2019.
techniques in the construction of models. II. Data analysis with rule induction. Ecol.
Assessing habitat suitability of parasitic plant Cistanche deserticola in northwest
Model. 95 (1), 95–111.
China under future climate scenarios. Forests 10 (9), 823.
Elith, J., Graham, C.H., Anderson, R.P., Dudík, M., Ferrier, S., Guisan, A., Hijmans, R.J.,
Lu, C.Y., Gu, W., Dai, A.H., Wei, H.Y., 2012. Assessing habitat suitability based on
Huettmann, F., Leathwick, J.R., Lehmann, A., Li, J., Lohmann, L.G., Loiselle, B.A.,
geographic information system (GIS) and fuzzy: a case study of Schisandra
Manion, G., Moritz, C., Nakamura, M., Nakazawa, Y., Overton, J.M.M., Peterson, A.
sphenanthera Rehd. et Wils. in Qinling Mountains. China. Ecol. Model. 242 (3),
T., Phillips, S.J., Richardson, K., Scachetti-Pereira, R., Schapire, R.E., Soberón, J.,
105–115.
Williams, S., Wisz, M.S., Zimmermann, N.E., 2006. Novel methods improve
Lucena-Moya, P., Brawata, R., Kath, J., Harrison, E., ElSawah, S., Dyer, F., 2015.
prediction of species’ distributions from occurrence data. Ecography 29 (2),
Discretization of continuous predictor variables in Bayesian networks: an ecological
129–151.
threshold approach. Environ. Modell. Softw. 66, 36–45.
Elith, J., Kearney, M., Phillips, S., 2010. The art of modelling range-shifting species.
Lü, J.L., Zhao, J., Duan, J.A., Yan, H., Tang, Y.P., Zhang, L.B., 2009. Quality evaluation of
Methods Ecol. Evol. 1, 330–342.
Angelica sinensis by simultaneous determination of ten compounds using LC-PDA.
Elith, J., Leathwick, J.R., 2009. Species distribution models: ecological explanation and
Chromatographia 70, 455–465.
prediction across space and time. Annu. Rev. Ecol. Evol. S. 40 (1), 677–697.
Ma, J.P., Guo, Z.B., Jin, L., Li, Y.D., 2015. Phytochemical progress made in investigations
Feki-Sahnoun, W., Njah, H., Hamza, A., Barraj, N., Mahfoudi, M., Rebai, A., Hassen, M.
of Angelica sinensis (Oliv.) Diels. Chin. J. Nat. Medicines 13 (4), 241–249.
B., 2018. Using general linear model, Bayesian Networks and Naive Bayes classifier
Manel, S., Dias, J.M., Ormerod, S.J., 1999. Comparing discriminant analysis, neural
for prediction of Karenia selliformis occurrences and blooms. Ecol. Infor. 43, 12–23.
networks and logistic regression for predicting species distributions: a case study
Freeman, E.A., Moisen, G.G., Frescino, T.S., 2012. Evaluating effectiveness of down-
with a Himalayan river bird. Ecol. Model. 120 (2–3), 337–347.
sampling for stratified designs and unbalanced prevalence in Random Forest models
Mao, Y.J., Wei, H.Y., Shang, Z.H., Zhu, L.N., Sang, M.J., Gu, W., 2016. Habitat suitability
of tree species distributions in Nevada. Ecol. Model. 233 (2), 1–10.
assessment of Schisandra chinensis (Turcz.) Baill. in northeast China based on GIS
Friedman, J.H., 1991. Multivariate adaptive regression splines. Ann. Stat. 19 (1), 1–67.
and fuzzy matter element model. Chin. J. Appli. Envi. Biol. 22 (2), 241–248 (in
Friedman, N., Goldszmidt, M., 1996. Discretization of continuous attributes while
Chinese).
learning Bayesian networks. In: Saitta, L. (Ed.), Proceedings of the Thirteenth
Marcot, B.G., Penman, T.D., 2018. Advances in Bayesian network modelling: integration
International Conference on Machine Learning. Morgan Kaufmann, San Francisco,
of modelling technologies. Environ. Modell. Softw. 111, 386–393.
CA, pp. 157–165.
Marini, L., Bona, E., Kunin, W.E., Gaston, K.J., 2011. Exploring anthropogenic and
Froese, J.G., Pearse, A.R., Hamilton, G., 2019. Rapid spatial risk modelling for
natural processes shaping fern species richness along elevational gradients.
management of early weed invasions: balancing ecological complexity and
J. Biogeogr. 38 (1), 78–88.
operational needs. Methods Ecol. Evol. 10 (12), 2105–2117.
Matlab software 9.1, MathWorks, 2016. https://cn.mathworks.com/.
Fuster-Parra, P., García-Mas, A., Ponseti, F.J., Leo, F.M., 2015. Team performance and
Meineri, E., Dahlberg, C.J., Hylander, K., 2015. Using Gaussian Bayesian Networks to
collective efficacy in the dynamic psychology of competitive team: a Bayesian
disentangle direct and indirect associations between landscape physiography,
network analysis. Hum. Mov. Sci. 40 (40), 98–118.
environmental variables and species distribution. Ecol. Model. 313, 127–136.
Gendelman, R., Xing, H., Mirzoeva, O.K., Sarde, P., Curtis, C., Feiler, H., McDonagh, P.,
Meyer, S.R., Johnson, M.L., Lilieholm, R.J., Cronan, C.S., 2014. Development of a
Gray, J.W., Khalil, I., Korn, W.M., 2017. Bayesian network inference modeling
stakeholder-driven spatial modeling framework for strategic landscape planning
identifies TRIB1 as a novel regulator of cell cycle progression and survival in cancer
using Bayesian networks across two urban–rural gradients in Maine, USA. Ecol.
cells. Cancer Res 77 (7), 1575–1585.
Model. 291, 42–57.
Gieder, K.D., Karpanty, S.M., Fraser, J.D., Catlin, D.H., Gutierrez, B.T., Plant, N.G.,
Molina-Navarro, E., Segurado, P., Branco, P., Almeida, C., Andersen, H.E., 2020.
Turecek, A.M., Robert Thieler, E., 2014. A Bayesian network approach to predicting
Predicting the ecological status of rivers and streams under different climatic and
nest presence of the federally-threatened piping plover (charadrius melodus) using
socioeconomic scenarios using Bayesian Belief Networks. Limnologica 80, 125742.
barrier island features. Ecol. Model. 276, 38–50.
Nelder, J.A., Wedderburn, R.W.M., 1972. Generalized linear models. J. Roy. Stat. Soc.
Grafius, D.R., Corstanje, R., Warren, P.H., Evans, K.L., Norton, B.A., Siriwardena, G.M.,
135 (3), 370–384.
Pescott, O.L., Plummer, K.E., Mears, M., Zawadzka, J., Richards, J.P., Harris, J.A.,
Nino, G., Yang, Y.P., Franz, K.H., Anita, A., Caroline, S.W., 2017. Angelica sinensis
2019. Using GIS-linked Bayesian Belief Networks as a tool for modelling urban
(Oliv.) diels: influence of value chain on quality criteria and marker compounds
biodiversity. Landscape Urban Plan 189, 382–395.
ferulic acid and z-ligustilide. Medicines 4 (1), 14.
Guo, Y., Li, X., Zhao, Z., Nawaz, Z., 2019. Predicting the impacts of climate change, soils
Parkash, O., Sharma, P.K., Mahajan, R., 2008. New measures of weighted fuzzy entropy
and vegetation types on the geographic distribution of Polyporus umbellatus in
and their applications for the study of maximum weighted fuzzy entropy principle.
China. Sci. Total Environ. 648, 1–11.
Inform. Sci. 178 (11), 2389–2395.
Guo, Y., Li, X., Zhao, Z., Wei, H., 2018. Modeling the distribution of Populus euphratica
Patten, B.C., Auble, G.T., 1981. System theory of the ecological niche. Am. Natu. 117 (6),
in the Heihe River Basin, an inland river basin in an arid region of China. Sci. China
893–922.
Earth Sci. 61 (11), 1669–1684.
Phillips, S.J., Anderson, R.P., Schapire, R.E., 2006. Maximum entropy modeling of
Guo, Y., Li, X., Zhao, Z., Wei, H., Gao, B., Wei, G., 2017. Prediction of the potential
species geographic distributions. Ecol. Model. 190 (3–4), 231–259.
geographic distribution of the ectomycorrhizal mushroom Tricholoma matsutake
Pliscoff, P., Luebert, F., Hilger, H.H., Guisan, A., 2014. Effects of alternative sets of
under multiple climate change scenarios. Sci. Rep. 7, 46221.
climatic predictors on species distribution models and associated estimates of
extinction risk: a test with plants in an arid environment. Ecol. Model. 288, 166–177.

18
Q. Zhang et al. Ecological Modelling 450 (2021) 109560

Ramírez, R.S., Mora, F., Quintero, E., 2019. The use of geospatial data and Bayesian Wang, Y., Bi, L., Wang, S., Lin, S., Xiang, W., 2017b. The application of dynamic Bayesian
Networks to assess the risk status of Mexican amphibians. Glob. Ecol. Conserv. 20, network to reliability assessment of emu traction system. Eksploat. Niezawodn. 19
e00735. (3), 349–357 (in Polish).
Ray, D., Behera, M.D., Jacob, J., 2018. Evaluating ecological niche models: a comparison Wiest, W.A., Correll, M.D., Marcot, B.G., Olsen, B.J., Elphick, C.S., Hodgman, T.P.,
between Maxent and GARP for predicting distribution of Hevea brasiliensis in India. Guntenspergen, G.R., Shriver, W.G., 2019. Estimates of tidal-marsh bird densities
Proc. Natl. Acad. Sci., India, Sect. B Biol. Sci. 88 (4), 1337–1343. using Bayesian networks. J. Wildlife Manage. 83 (1), 109–120.
Russak, V., 2009. Changes in solar radiation and their influence on temperature trend in Xin, P., Khan, F., Ahmed, S., 2017. Dynamic hazard identification and scenario mapping
Estonia (1955–2007). J. Geophys. Res-Atmos. 114 (D10), D00D01. using Bayesian network. Process. Saf. Environ. 105, 143–155.
Salliou, N., Barnaud, C., Vialatte, A., Monteil, C., 2017. A participatory Bayesian belief Zadeh, L.A., 1965. Fuzzy sets. Infor. Cont. 8 (3), 338–353.
network approach to explore ambiguity among stakeholders about socio-ecological Zaniewski, A.E., Lehmann, A., Overton, J.M., 2002. Predicting species spatial
systems. Environ. Modell. Softw. 96, 199–209. distributions using presence-only data: a case study of native New Zealand ferns.
Sang, M.J., Wei, H.Y., Guo, Y.L., Gao, B., Zhu, L.N., Cui, J.L., Gu, W., 2015. Habitat Ecol. Model. 157 (2–3), 261–280.
suitability of Cornus officinalis in the Qinling region based on Fuzzy mathematics. Zhang, D.F., Zhang, Q., Guo, J., Sun, C.Z., Wu, J., Nie, X., Xie, C.X., 2017. Research on
Plant Sci. J. 33 (6), 757–765 (in Chinese). the global ecological suitability and characteristics of regions with Angelica sinensis
Shang, Z.H., Wei, H.Y., Gu, W., Mao, Y.J., Zhu, L.N., Sang, M.J., 2015. Potential based on the MaxEnt model. Act. Ecol. Sin. 37 (15), 5111–5120.
ecological suitability regionalization analysis of Angelica sinensis based on GIS and Zhang, H.Y., Bi, W.G., Yu, Y., Liao, W.B., 2012. Angelica sinensis (Oliv.) Diels in China:
fuzzy matter element model. J. Chin. Med. Mat. 38 (7), 1370–1374 (in Chinese). distribution, cultivation, utilization and variation. Genet. Resour. Crop Ev. 59 (4),
Sierra, L.A., Yepes, V., García-Segura, T., Pellicer, E., 2018. Bayesian network method for 607–613.
decision-making about the social sustainability of infrastructure projects. J. Clean. Zhang, X.H., Wei, H.Y., Zhao, Z.F., Liu, J., Zhang, Q.Z., Zhang, X.Y., Gu, W., 2020a. The
Prod. 176, 521–534. global potential distribution of invasive plants: anredera cordifolia under climate
Smith, A.F.M., Skene, A.M., Shaw, J.E.H., Naylor, J.C., Dransfield, M., 1985. The change and human activity based on Random Forest models. Sustainability 12 (4),
implementation of the Bayesian paradigm. Commun. Statist.-Theor. Meth. 14 (5), 1491.
1079–1102. Zhang, X.Y., Wei, H.Y., Zhang, X.H., Liu, J., Zhang, Q.Z., Gu, W., 2020b. Non-Pessimistic
Soberón, J., Peterson, T.A., 2005. Interpretation of models of fundamental ecological predictions of the distributions and suitability of Metasequoia glyptostroboides
niches and species’ distributional areas. Biodivers. Inform. 2, 1–10. under climate change using a Random Forest model. Forests 11 (1), 62.
Stockwell, D., 1999. The GARP modelling system: problems and solutions to automated Zhang, Y.H., Lu, Y.Y., He, C.Y., Gao, S.F., 2019. A method for cell suspension culture and
spatial prediction. Int. J. Geogr. Inf. Sci. 13 (2), 143–158. plant regeneration of Angelica sinensis (Oliv.) Diels. Plant Cell Tiss. Org. 136,
Strandén, I., Christensen, O.F., 2011. Allele coding in genomic evaluation. Genet. Sel. 313–322.
Evol. 43 (1), 25. Zhao, Y.J., Chen, S.B., Gao, G.Y., Feng, Y.X., Yang, S.L., Xu, L.Z., Du, L.J., Hu, S.L.,
Swets, J.A., 1988. Measuring the accuracy of diagnostic systems. Science 240, Feng, X.F., 2002. Study on the physicochemical properties of cultivated soil of
1285–1293. genuine crude and no-enuine crude Chinese Angelica. China J. Chin. Mate. Medi. 27
Tantipisanuh, N., Gale, G.A., Pollino, C., 2014. Bayesian networks for habitat suitability (1), 19–22 (in Chinese).
modeling: a potential tool for conservation planning with scarce resources. Ecol. Zhao, Z.F., Guo, Y.L., Wei, H.Y., Ran, Q., Liu, J., Zhang, Q.Z., Gu, W., 2020. Potential
Appl. 24 (7), 1705–1718. distribution of Notopterygium incisum Ting ex H. T. Chang and its predicted
Thompson, P.R., Fagan, W.F., Staniczenko, P.P., 2020. Predictor species: improving responses to climate change based on a comprehensive habitat suitability model.
assessments of rare species occurrence by modelling environmental co-responses. Ecol. Evol. 10 (6), 3004–3016.
Ecol. Evol. 10, 3293–3304. Zhao, Z.F., Guo, Y.L., Wei, H.Y., Ran, Q., Gu, W., 2017. Predictions of the potential
Uusitalo, L., 2007. Advantages and challenges of Bayesian networks in environmental geographical distribution and quality of a Gynostemma pentaphyllum base on the
modelling. Ecol. Model. 203, 312–318. fuzzy matter element model in China. Sustainability 9 (7), 1114.
Wang, C., Matthies, H.G., 2019. Novel model calibration method via non-probabilistic Zhu, L.N., Wei, H.Y., Guo, Y.L., Sang, M.J., Cui, J.L., Gu, W., 2015. Suitable habitat
interval characterization and Bayesian theory. Reliab. Eng. Syst. Safe. 183, 84–92. division of Scutellaria baicalensis Georgi based on entropy weight and matter
Wang, H., Chatpatanasiri, R., Sattayatham, P., 2017a. Stock trading using PE ratio: a element model. Bull. Soil Water Cons. 35 (1), 153–158 (in Chinese).
dynamic Bayesian network modeling on behavioral finance and fundamental
investment. arXiv:1706.02985.
