Professional Documents
Culture Documents
Invited Review Recursive Models in Animal Breeding
Invited Review Recursive Models in Animal Breeding
Invited Review Recursive Models in Animal Breeding
106:2198–2212
https://doi.org/10.3168/jds.2022-22578
© 2023, The Authors. Published by Elsevier Inc. and Fass Inc. on behalf of the American Dairy Science Association®.
This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
2198
Varona and González-Recio: INVITED REVIEW: RECURSIVE MODELS 2199
the structural parameters with a Markov Chain Monte of respective order, and A is the numerator relationship
Carlo approach. Since then, structural equation models matrix (which can be replaced by a genomic relation-
have been widely used in animal breeding to infer caus- ship matrix in a genomic selection approach).
al relationships between phenotypes and to estimate This RM can be reparametrized as a standard mixed
genetic parameters. Most applications have assumed model (MTM) by multiplying all terms of [1] by Λ−1,
unidirectional relationships and used RM; however, as follows:
their statistical and biological interpretations remain
debated. In most cases, RM inference can be achieved yi = Λ−1Xi b + Λ−1u i + Λ−1ei = Λ−1Xi b + u*i + e*i
within an MTM framework, although it can present
identification problems in certain circumstances, and with
some restrictions are needed to ensure proper param-
eter estimation. The objective of this review was to
clarify some controversial aspects of recursive models (
u* ~ N 0, A ⊗ G* ) (
and e* ~ N 0, I ⊗ R* , )
that have not been fully addressed.
where u* and e* are the vector of breeding values and
RECURSIVE MODELS IN ANIMAL BREEDING residuals under the MTM, and G* = Λ−1GΛ−1′ and
′
R* = Λ−1RΛ−1 .
No human or animal subjects were used, so this anal-
ysis did not require approval by an Institutional Ani- IDENTIFICATION
mal Care and Use Committee or Institutional Review
Board. Following the notation proposed by Gianola and The fully conditional density of data given the pa-
Sorensen (2004), the statistical definition of RM is as rameters of the model under an RM is as follows:
follows:
p (y | b, u, Λ, R)
Λyi = Xi b + u i + ei , [1] n 1
∝ ∏exp − (Λyi − Xi b − u i )′ R−1 (Λyi − Xi b − u i )
i=1
2
where b is the vector of fixed effects, yi, ui, and ei are
n 1 ′
m × 1 vectors of phenotypic measurements, additive
genetic effects and residuals of the m traits associated
(
= ∏exp − yi − Λ−1Xi b − Λ−1u i ΛR−1Λ
2
)( )
i=1
with the ith multivariate record, and Xi is its corre-
sponding incidence matrix. Λ is an m × m matrix of (yi − Λ−1Xi b − Λ−1ui )}
the recursive parameters with ones on the diagonal and
minus the recursive effects of the ith trait on the jth and, under the MTM, is as follows:
trait (−λ1→j) in some or all the elements below the
diagonal, as follows:
(
p y | b* , u* , R* )
1 0 … 0 n 1 ′
−λ 1
… 0 2
( ) (
∝ ∏exp − yi − Xi b* − u*i R*−1 yi − Xi b* − u*i )
Λ = 1→2 . i=1
… … … …
−λ −λ2→k … 1 As R*−1 = Λ ′R−1Λ, u*i = Λ−1u i , and, if each factor af-
1→k
fects all traits (Wu et al., 2010),
( )
Breeding values u ′ = {u'1, u'2 , …, u'n , u'n+1, …, u's } and b1
( {
residual effects e ′ = e'1, e'2 , …, e'n }) are distributed fol- −1
Λ Xi b = Xi Λ … = Xi b* ,
−1
lowing multivariate Gaussian distributions: bf
u ~ N (0, A ⊗ G) and e ~ N (0, I ⊗ R) , where f is the number of factors. It can be shown that
fully conditional densities of the 2 models are com-
where n is the number of multivariate records, s is pletely equivalent, and the models can only differ due
the number of individuals including recorded (n) and to the assumed prior distributions within a Bayesian
unrecorded (s − n), G and R are m × m genetic and framework. The MTM is completely identifiable, and
residual (co)variances matrices, I is the identity matrix G* and R* can be inferred from likelihood or Bayesian
Journal of Dairy Science Vol. 106 No. 4, 2023
Varona and González-Recio: INVITED REVIEW: RECURSIVE MODELS 2200
Figure 1. Four scenarios of the relationship between 2 variables (y1 and y2): (a) causality of y1 into y2, (b) causality and residual correlation
between y1 and y2, (c) causality and residual correlation between y1 and y2 with an extra variable (z) affecting y1, and (d) causality and residual
correlation between y1 and y2 with an extra variable (z) affecting y2.
approaches. Instead, the RM has up to m2/2 − m/2 ad- that scenario, 3 additional sources of information are
ditional recursive parameters, and there are an infinite available; namely, the covariance between z and y1, the
number of combinations of R, G, and the recursive pa- covariance between z and y2, and the variance of z and
rameters that generate the same likelihood. Therefore, 2 new unknowns (w and σz2 ) . However, in the scenario
to achieve the inference of the parameters involved, where the variable z affects y2, rather than y1 (Figure
some restrictions must be imposed. In animal breeding 1d), the covariance between z and y1 is null. Then, it
models, 2 strategies have been used to achieve statisti- does not provide any information, and the model is not
cal identification, (1) imposing restrictions on the ele- identifiable.
ments of the (co)variance component matrices (Varona In short, the only ways to make recursive models
et al., 2007) or (2) imposing constraints in the form of identifiable are (1) imposing constraints on the (co)
linear combinations of explanatory variables to dispose variance matrix (scenario a), or (2) having an instru-
of instrumental variables (Gianola and Sorensen, 2004). mental auxiliary variable that influences the dependent
To illustrate these 2 types of constraints, let us assume traits only through the independent variable (scenario
a simple recursive model that has 2 traits (y1 and y2) in c). Identification of auxiliary variables requires find-
4 scenarios (Figure 1). ing a systematic effect or a covariate (in the jargon of
For simplicity, assume that the means of the 2 traits mixed models) that explains a significant proportion of
are known. In the first scenario (Figure 1a), the only the variation in the independent traits, and influences
relationship between y1 and y2 is defined by the recur- the dependent traits through the recursive relationship,
sive parameter (λ) because the residuals are not corre- only. In the animal breeding scope, finding a variable
lated. Therefore, there are 3 sources of information that matches these 2 conditions can be difficult be-
available, the variances of y1 and y2 and the covariance cause traits of economic interest are often affected by
between them, which can be used to estimate σ12 , σ22 , common environmental factors. Nevertheless, there are
and λ, and the model is completely identifiable. In the some examples. For instance, milk yield is genetically
second scenario (Figure 1b), the residuals are correlat- correlated with days open, and have a recursive effect
ed; therefore, the covariance between y1 and y2 depends because cows that have high energy demands in lacta-
on λ and σ12, and we have only the same 3 sources of tion can incur a negative energy balance, which impairs
information to infer 4 parameters (σ12 , σ22 , σ12 , and λ ) reproductive function. In that case, the energy compo-
that make the model unidentifiable. One way to make sition of feed can be an auxiliary variable. The amount
the model identifiable is to include an auxiliary instru- of energy in the ration (z) influences the amount of
mental variable (z; Figure 1c). The instrumental vari- milk (y1) a cow produces; however, in the absence of
ables must be (i) correlated with the explanatory vari- lactation, reproductive performance (y2) is expected to
able (y1) and (ii) not correlated with the residuals. In be normal as long as the minimum feed requirements
are met. Thus, the amount of energy in the ration (z) σ2u 2 + 2λ1→2 σu12 + λ12→2 σ2u1
affects milk yield (y1), not reproductive performance h2 * = ,
(y2), but high levels of y1 would impair y2.
u2
( )
σ2u 2 + σ2e2 + 2λ1→2 σu12 + σe12 + λ12→2 σ2u1 + σ2e1 ( )
In more complex models, such as multiple-trait RM
models that have several recursive dependencies, identi- σu12 + λ1→2 σu21
fying auxiliary variables can be extremely cumbersome. corr ( u1 , u*2 )= .
However, forcing the residual covariance matrix to be (
σu21 σu2
2
+ 2λ1→2 σu12 + λ12→2 σu21 )
diagonal guarantees the identification in multiple-trait
RM, although we must bear in mind that this is a strict
For simplicity, λ1→2 is expressed as units of the indepen-
assumption in most cases. Phenotypic causal relation-
dent trait residual standard deviation (σe1 ) . Figure 2
ships between traits rarely is the sole cause of envi-
shows the landscape of heritability of trait 2 and the
ronmental covariance between traits, there are usually
recursive effect of trait 1 on trait 2 at various levels of
other shared external environmental effects. In addi-
the genetic covariance between the 2 traits. In the pres-
tion, mixed models can have other restrictions such as
ence of recursiveness, the heritability of the dependent
setting to zero the genetic or the phenotypic covariance
trait increases if λ1→2 has the same sign or direction as
between traits, as proposed by Varona et al. (2007) and
the genetic covariance. The stronger the genetic covari-
Jamrozik and Schaeffer (2011).
ance, the larger the increase in h2. If, however, the re-
The equivalence between MTM and RM holds under
cursive parameter and the genetic covariance are in
the assumption of fully recursive relationships. If the
opposite directions, the heritability of the dependent
number of constrains imposed is greater than the num-
trait decreases. In general, the stronger genetic correla-
ber required to be identifiable or if some of the recur-
tion the larger effect is shown on the change in herita-
sive relationships are fixed, the model is over-identified.
bility.
Over-identified models are often nested models with
Genetic correlation under recursiveness is depicted in
respect to the complete models (Bentler and Satorra,
Figure 3. The relationship between λ1→2 and the ge-
2010), and the parameter space of G* and R* is re-
netic correlation follows a sigmoidal curve. The pres-
stricted to a subspace. Therefore, they can be compared
ence of a recursive effect causes an increase (or decrease)
with frequentist or Bayesian models comparison tools;
in the genetic correlation if it has the same (or differ-
however, these tests compare model fit to the available
ent) direction as the genetic covariance. The recursive
data, only, and cannot differentiate causality from as-
parameter has the largest effect on h2 and the genetic
sociation. To illustrate that, one can test whether 2
correlation for values up to 1 genetic standard devia-
traits (y1 and y2) are associated but cannot determine
tion. Values larger than that tend to have a lesser effect
whether y1 has an effect on y2 or vice versa, or whether
on the 2 genetic parameters because λ1→2 affects in a
cofounders affect both traits simultaneously.
quadratic manner to h2u* ; therefore, if λ1→2 > 1, the
2
statistical equivalence of MTM and RM can be invoked the energy in the ration will increase milk production.
to calculate the breeding values from an RM It will also indirectly modify the SCC, although there
(u i* = Λ−1u i ). Furthermore, an RM allows the split of would be no direct influence of the energy of the diet
the breeding values into direct genetic effects (ui) that on the SCC.
are attributed to genes that directly affect the trait, External interventions can also modify the causal
and indirect genetic effects (ui* − ui), which are caused structure between traits. In the above example of the
by genes that influence the independent trait and affect dominant and compliant cows, if they are raised un-
the dependent trait through its phenotypic influence. If der feed restriction or without ensuring full access to
selection is based on the predicted breeding values from feeder, a breeding program selecting for larger produc-
the RM (ui), the response to selection in the dependent tion yields would also increase aggressiveness. Genetic
traits will occur by acting on the direct genetic effects. correlation from an MTM may show positive estimates,
Note, however, that it may cause a correlated genetic because it does not differentiate between the causal phe-
response in the independent trait because of the pres- notypic effect and the genetic correlation (association
ence of additive genetic covariance. If the breeding plan due to genes). However, if more feed bins and plenty
aims to modify the dependent trait and keep the inde- of feed availability is supplied to cows, more compliant
pendent trait constant, restricted selection indexes individuals can eat as much as the more aggressive in-
(Kempthorne and Nordskog, 1959) should be used. dividuals. Hence, the external intervention removes the
Similarly, estimates of the additive genetic (co)vari- phenotypic causal effect that favor dominant animals in
ances components in an RM (G) and their equivalents detriment of other cows.
in an MTM (G* = Λ−1GΛ−1′ ) allow the decomposition of The ability to predict the expected outcome for those
the genetic variance between traits. The estimates of external interventions requires distinguishing between
additive genetic variance from an RM reflect the addi- causality and association, which cannot be achieved us-
tive genetic variance caused by genes acting directly on ing standard goodness-of-fit tests given the statistical
the trait. The difference between the estimates from an equivalence of the RM and the MTM. The distinction
MTM and an RM is the additive genetic variance that between them can be extremely complex because of
the dependent traits incorporated through the pheno- the possibility of confounding effects that affect sev-
typic influence of other traits. For instance, assuming eral traits (Rubin, 1974). Some authors (Shipley, 2000;
the same simple recursive models defined in equations Pearl, 2009) have developed methods that can achieve
[2] and [3], the additive variance in an MTM can be that goal. An adaptation of the Inductive Causation
split into the additive variance in an RM (σu2 2
) and the (IC) algorithm (Pearl, 2009) to the scope of mixed
models by Valente et al. (2010) is the most widely used
additive variance generated by the causal influence of
algorithm for inferring the causal structure within a set
trait 2 (2λ1→2 σu12 + λ12→2σ2u1 ) . The same decomposition of phenotypic traits. Briefly, the IC algorithm has the
can be achieved for the additive genetic covariance be- following 3 steps:
tween traits, which can be divided into the covariance
generated by pleiotropic genes that affect the 2 traits 1. For each pair of traits (y1 and y2), search for a
(σu12 ) , simultaneously, and the additive genetic covari- set of traits such that y1 is independent of y2,
ance caused by the phenotypic influence of the indepen- given this set. If y1 and y2 are dependent in each
dent trait (λ1→2 σ2u1 ) . possible set, they are declared as adjacent and
Selection on independent traits produces a correlated are connected by an undirected edge.
response in the dependent trait because of both sources 2. If 2 nonadjacent variables (y1 and y2) are con-
of genetic covariance. However, an environmental nected to an extra-adjacent variable (y3) and
change or an external intervention (Rosa and Valente, are not independent for all sets of variables that
2013), that directly increases (or reduces) the indepen- contain y3, connect y3 to y1 and y2 by a directed
dent traits, will modify the phenotypic performance edge. The result is a partially oriented graph
of dependent traits. This phenotypic change is solely that has directed and undirected edges.
through the phenotypic influence between traits ex- 3. Change as many undirected to directed edges
pressed by the recursive parameter, and will not imply as possible to avoid generating new colliders or
any change in the genes associated with the genetic cycles.
variation. For example, milk yield in dairy cattle is af-
fected by the amount of energy in the diet and has a The goal of the IC algorithm is to identify each pair of
causal effect on the SCC in milk. An external inter- variables as either conditionally dependent or indepen-
vention that modifies the diet of cows by increasing dent, which is based on the partial correlations of the
2 traits given the sets of remaining traits. As proposed Sorensen, 2014), and Gaussian – Multiplicative Bino-
by Valente et al. (2010), within a Bayesian framework, mial models (Varona et al., 2020). Furthermore, some
those partial correlations can be obtained from the pos- authors (Heringstad et al., 2009) have presented models
terior distribution of the residual (co)variances matrix that have complex causal structures and include several
calculated in a standard multivariate mixed model. If Gaussian and Categorical traits. In those studies, the
the HPD (highest posterior density) of a partial cor- most frequent assumption to ensure identification was
relation includes zero, the 2 traits are assumed to be to set the residual covariances between traits to zero.
conditionally independent. A more detailed description Traditionally, RM have proposed causal effects be-
of the algorithm is described by Valente et al. (2010, tween phenotypes; however, in some cases, the number
2011) and Rosa and Valente (2013). For each pair of of traits can be many and clustered in latent pseudo-
traits, the algorithm chooses among independence, cau- phenotypes or latent variables that summarize each
sality, or associated confounders. However, it cannot be group of phenotypes. For example, Peñagaricano et al.
ruled out that one trait has an effect on another, and, at (2015) analyzed 19 traits associated with postweaning
the same time, there are confounding factors that affect growth, carcass quality, fat composition, or meat qual-
both traits simultaneously. Therefore, neither the RM ity in pigs, which they summarized in 5 latent variables
nor the MTM may reflect the real relationship between through factor analysis and were used to develop an
the 2 traits, and both models are subject to very strong RM. The use of latent variables might be useful to un-
assumptions. Again, the only way to measure the mag- derstand the causal structure of large sets of variables,
nitude of the effects of causality effects or the influence such as those generated in massive phenotypic devices
of cofounders is to have an external intervention that (Fernandes et al., 2020) or those that result from the
modifies the independent trait and allows quantifying omics analysis (Lippolis et al., 2019), as proposed by
the effects on the dependent trait. Saborío-Montero et al. (2021).
Recently, the use of RM in GWAS analysis has gained
RECURSIVE MODELS IN ANIMAL BREEDING popularity. GWAS has become the main method for
identifying genomic regions that harbor the genes that
After the introduction of recursive models in quan- cause the genetic variation in quantitative traits. The
titative genetics and animal breeding (Gianola and influence of those genomic regions on the phenotypic
Sorensen, 2004), they have been applied in numerous traits can be caused by the direct influence of the genes
studies. For instance, recursive models have been ap- within the genomic region or mediated by another trait
plied to calving traits (López de Maturana et al., 2010; that has a causal dependency on them. For that reason,
Inoue et al., 2017), milk traits (Rehbein et al., 2013; some (Momen et al., 2019; Wang et al., 2020) have
Bouwman et al., 2014; Santos et al., 2015; Tiezzi et al., developed RM for use in GWAS that provide several
2015), growth traits (González-Rodríguez et al., 2014; types of signals, direct, indirect, and overall. The direct
Leal-Gutiérrez et al., 2018; Krugmann et al., 2020), dis- signals point to the genomic regions that directly affect
eases (König et al., 2008; Rehbein et al., 2013; Quigley the dependent trait, and the indirect signals identify
et al., 2018), and microbiota traits (Saborío-Montero the regions that affect the trait through the phenotypic
et al., 2020) in different species. The methods behind dependency between traits. The overall signals indicate
recursive models and the wide range of applications the genomic regions that affect the genetic variability
have been reviewed elsewhere (Wu et al., 2010; Valente of the traits, either directly or indirectly.
et al., 2013; Cantet et al., 2017; Inoue, 2020).
Recursive models were initially formulated based on EXTENSIONS OF RECURSIVE MODELS
the assumption that all traits have a continuous distri-
bution; however, in animal breeding, it is quite common Recursive Inference from the Output of an MTM
for the traits of interest to be categorical and have 2
or more possible outputs. For those types of traits, ei- The inference results on the (co)variance components
ther threshold models (Gianola, 1982) or models that and the recursive parameters of an RM can be trans-
assume a non-Gaussian distribution for the residuals formed into the results of an MTM. Instead, in the op-
(Tempelman and Gianola, 1993; Varona and Sorensen, posite direction, as the number of parameters increases,
2010) are used. Bivariate recursive models when the there are an infinite number of combinations of the (co)
independent traits follow a Gaussian distribution variance components and recursive parameters, and the
has been described for Gaussian – Threshold Models transformation cannot be performed directly. In some
(López de Maturana et al., 2007; König et al., 2008; Wu scenarios and under some specific restrictions, however,
et al., 2008), Gaussian – Binomial Models (Varona and the statistical equivalence of MTM and RM can be used
growth data set, which showed that the influence of with each recursive parameter. The results suggested
initial weight was high in the early growth period and that the causal effect was higher for large litters than
decreased thereafter. for small litters. Another way to model nonlinear rela-
tionships between the dependent and the independent
Heterogeneous Causal Structures trait is to use a nonlinear function (González-Rodríguez
et al., 2014; Varona and Sorensen, 2014; Varona et al.,
If there is population stratification, phenotypic 2020). However, in complex analyses that have more
causality may differ between subpopulations. In that than 2 traits, the model can be affected by identifica-
case, RM can be reformulated to identify an alternative tion problems (Gianola and Sorensen, 2004).
causal relationship between the traits. Strata might A nonlinear causality between traits implies that the
be known (e. g., male or female) or unknown (e. g., genetic parameters of the dependent trait vary along
healthy or sick) categories of individuals. With known the parameter space of the independent trait. To il-
stratification, the use of an RM is straightforward with lustrate this, let us assume a nonlinear recursive model
standard REML or Bayesian approaches, although with 2 traits:
equivalence with MTM no longer holds because the
covariance structure in the MTM differs between sub- yi1 = xi′1β1 + u i1 + ei1
populations. If the distribution of individuals among
subpopulations is unknown (Jamrozik and Schaeffer, yi2 = f (yi1 ) + xi′2 β2 + u i2 + ei2 ,
2010), an additional latent variable (Wu et al., 2007)
about the unknown strata of the individuals must be
included. The inference can be achieved, under a Mar- where f (yi1 ) = λ1yi1 + λ 2 y2i1, the independent trait (y1)
kov Chain Monte Carlo approach to Bayesian inference, takes values from 80 to 120 units, the recursive param-
by using a Metropolis-Hasting step (Hastings, 1970) eters are λ1 = 1.00, and λ 2 = −0.005, and the covariance
within a Gibbs sampler (Gelfand and Smith, 1990) to matrices are as follows:
sample the latent variables.
σ2 σe12 100 0
R = e1 =
Nonlinear Relationship Between Traits σ
e12 σ2e2 0 100
The traditional definitions of MTM and RM imply a σ2 σa12 30 10
linear relationship between traits; however, the causal G = a1 = .
or non-causal relationship between traits can be non- σ
a12 σ2a 2 10 30
linear (Fuerst-Waltl et al., 1997, 1998; Mulder et al.,
2015). A special case of heterogeneous causal structures
occurs if the causal influence of the independent trait (
The total genetic G* = Λy1−1GΛy1−1′ ) and residual
on the dependent trait varies along its range of values.
For example, López de Maturana et al. (2009, 2010) de-
(R *
= Λy1−1RΛy1−1
′
) (co)variance matrix can be calcu-
lated along the parametric space by using
veloped a series of recursive models that had a hetero-
geneous structural coefficient between gestation length, 1 0 1 0
calving difficulty, and perinatal mortality in dairy
cattle. They defined 4 segments along the parameter Λy1 = ∂f (yi1 ) Λy1−1 =
∂f ( y )
− y 1 y
i1
1
space of gestation length (261–267, 268–273, 274–279, i1 i1
280–291 d), and showed that the recursive relationships ∂f (yi1 )
differed among the segments, which confirmed that the = λ1 + 2λ 2 y1 .
relationship between traits was nonlinear. Similarly, yi1
Jamrozik et al. (2010) proposed a model that defines
heterogeneous structural coefficients between milk yield The plots of the recursive function, the heritability of
and somatic cell score across (the first 3 lactations) and the dependent traits, and residual and genetic correla-
between (4 intervals) lactations. Likewise, Ibáñez-Es- tions along the parametric space of the independent
criche et al. (2010) described a change-point recursive trait (80 to 120 units) are presented in Figure 5.
model for the relationship between litter size and the It can be seen that a nonlinear recursive function
number of stillbirths in pigs. The study did not fix the (Figure 5a) leads to a modification of the heritability
segments within the litter size parameter space a-priori, along the parametric space of y1. In this example (Fig-
and used change-point techniques (Chib, 1998) to infer ure 5b), the heritability decreases from >0.25 at y1 = 80
the size of the regions of the parameter space associated to <0.21 at y1 = 120, and the genetic and the residual
Journal of Dairy Science Vol. 106 No. 4, 2023
Varona and González-Recio: INVITED REVIEW: RECURSIVE MODELS 2207
Figure 5. Plots of the values of the recursive function (a), the heritability of the dependent trait (b), genetic (c), and residual (d) correlations.
correlations decrease from 0.50 to 0.15 (Figure 5c) and gence properties, or stability of estimates (Jamrozik
from 0.20 to −0.20 (Figure 5d) when y1 changes from and Schaeffer, 2011). Moreover, they can be used to
80 to 120 units. It implies that the genetic and residual propose alternative models for complex problems. For
correlations between the 2 traits are also modified by instance, modeling non-genetic (or residual) sources
an external intervention that modifies the phenotypic of covariance in bivariate models that includes traits
mean of y1. Let us imagine that, in our population, the with Gaussian and non-Gaussian (Binomial, Poisson)
average independent trait is 80 and, at that point in residuals is challenging; however, some Gaussian and
the parametric space, it is strongly correlated with the non-Gaussian traits have a strong phenotypic covari-
dependent trait. Therefore, it can be used to achieve a ance caused by nongenetic factors. In fact, the assump-
correlated response under a breeding scheme. However, tion of statistical independence of the residuals in cases
if there are some external interventions or selection where there is nongenetic covariance can lead to unde-
processes that increase the independent trait to 120, sirable bias in the predictions of breeding values. Va-
the genetic correlation would be reduced to 0.15, and rona and Sorensen (2014) suggested using an RM and
the correlated response in the dependent trait will be setting the non-Gaussian trait as dependent and the
much lower. Gaussian trait as independent. Statistical equivalence
Note that these statements are conditioned on a of the MTM and the RM can be invoked to reconstruct
model that has very strict assumptions, e.g., the non- the breeding values of the non-Gaussian trait. With
linear relationship is caused by the phenotypic effect that approach, the covariance between the Gaussian
of the independent trait on the dependent trait, only. and non-Gaussian residuals is taken into account in the
Nevertheless, nonlinear recursive models might provide prediction of breeding values and bias is avoided.
a new and interesting area of research about the rela- Another operational application of RM has been de-
tionships among the traits of livestock populations. veloped for the analysis of Residual feed intake (Koch
et al., 1963), which is one of the most commonly used
Recursive Models as Operational Tools approaches to assess feed efficiency in livestock popula-
tions. The usual method for analyzing genetic variation
Recursive (or structural equation) models have been in residual feed intake has a 2-stage approach (Berry
developed to describe the causal relationships between and Crowley, 2013). First, a linear regression model
variables (or traits); however, RM can be useful for relating dry matter intake (DMI) to the energy sinks
operational reasons, even if there is no causality be- is used. Second, the estimated residuals (residual feed
tween traits, because they may perform better than intake) are analyzed using an MTM. Recently, Jam-
MTM in terms of computational requirements, conver- rozik et al. (2021) and Wu et al. (2021) suggested an
Journal of Dairy Science Vol. 106 No. 4, 2023
Varona and González-Recio: INVITED REVIEW: RECURSIVE MODELS 2208
RM that jointly analyzed DMI and all energy sinks that affects the independent trait without any direct
by postulating a recursive relationship between each effect of the dependent trait.
energy variable and the observed DMI. As such, the The estimates of variance components and breeding
systematic and breeding values for the residual feed values can be transformed from an RM to an MTM,
intake are obtained directly, without the need for the but the biological interpretation differs. In MTM,
2-stage approach. the breeding values predict the complete effects of
A third scenario that can benefit from operational genes on the traits and those values should be used
RM is that in which traits are defined as the ratio of for selection, whereas, in RM, breeding values reflect
2 Gaussian traits (y2/y1); e.g., the percentages of fat the additive genetic effects while holding constant the
and protein in milk, the feed conversion ratio. Ratio causal trait. If there is a recursive effect between 2 (or
traits have a Cauchy distribution only if the 2 traits more) traits, it may be preferred to consider it in an
are Gaussian and uncorrelated; otherwise, the distribu- RM framework for a more reliable prediction of the
tion cannot be assigned to any standard distribution. breeding values. The implementation of RM allows
Therefore, usually, breeding values for ratio traits are the splitting of the breeding values and the genetic
obtained from ad hoc models. To solve this problem, variances and covariances into direct and indirect ef-
Jamrozik et al. (2017) proposed using an RM to model fects. It ensures a deeper interpretation of the direct
those kinds of traits as y2 = λ × y1. Note that, in the and correlated response to selection, mainly under
RM, breeding values for the dependent trait (y2) can nonlinear recursive effects, or when the recursive ef-
be understood as the additive effect of the genes after fect goes in opposite direction to the additive genetic
holding the independent trait (y1; Valente et al., 2013). covariance. Furthermore, RM allow a more thorough
Therefore, the breeding values provided by the model understanding of the distribution of the genetic effects
for y2 are proxies for the breeding values of the ratio throughout the genome.
(y2/y1). The main advantage of that approach is that it This review presented the particularities of the mod-
avoids the use of statistical approximations to the ratio els, their interpretation, and some interesting and novel
trait, and it can be easily implemented in most mixed applications. Among these applications we mentioned
model software. (1) the statistical equivalence of RM and MTM can
be invoked to obtain the estimates of the recursive re-
FINAL REMARKS AND CONCLUSIONS lationships from the MTM estimates, (2) RM can be
expanded to describe more complex relationships be-
In recent decades, RM have gained popularity in tween traits including heterogeneous causal structures
animal breeding. Recursiveness between 2 traits under and nonlinear relationship between traits, (3) RM can
quantitative genetic analyses implies that the genetic be expanded to models that introduce some degree of
parameters such as heritability of the dependent trait regularization in the recursive structure that aims to
or genetic correlation are influenced by both the genetic estimate a very large number of recursive parameters,
variability of the independent trait and the recursive and (4) RM can be used for operational reasons because
effect between the 2 traits. Moreover, it must be noted of their simplicity for implementation, even though
that the heritability and correlation estimates under there is no causality between traits. In conclusion, the
MTM may change if the environmental circumstances use of RM in animal breeding is expected to increase
determining this nongenetic causal link are modified. due to their properties for causal inference and the pos-
In most cases, RM are statistically equivalent to sibility to use them as operational models. A plethora
MTM but, because RM have additional parameters, of applications of RM are expected with the advent of
their implementation requires imposing constraints multiomic (including phenomics) data. We hope that
that imply several strict assumptions. Essentially, there this review will contribute to the development of future
are 2 types of constraints, either to impose indepen- applications of RM in animal breeding.
dence of the residuals between traits or to dispose of an
auxiliary variable. In animal breeding, both of those as- ACKNOWLEDGMENTS
sumptions are difficult to meet. The first requires that
the only cause for the statistical association between This study received no external funding. The authors
the 2 traits is phenotypic causality and that there are wish to acknowledge the valuable suggestions of the
no environmental effects acting on both traits, simulta- reviewers and the help of Bruce MacWhirter (Madrid,
neously. The second constrain requires the disposition Spain) in the English edition. The authors have not
of an auxiliary variable (systematic effect or covariate) stated any conflicts of interest.
Figure A1. Causal structure between the traits recorded at times 1 (y1,y2), 2 (y3,y4), and 3 (y5,y6).
A B′ I 0 A 0 I A−1B′
R* = = = Λ−1R Λ−1′ ,
Example of the calculation the additive and residual B C BA−1 I 0 C − BA−1B′ 0 I
1 1 1
(co)variances matrices in a recursive model (RMM)
from the estimates under a standard mixed models where
(SMM) in traits recorded sequentially at 3 time points.
Let us suppose that we dispose phenotypes for 6 traits 1
1
5
2 1 1
recorded at time 1 (y1,y2), 2 (y3,y4) and 3 (y5,y6) and 5 2 1 1 2 5 1 1
A= B= ,
there is causal relationship between the traits recorded 2 5 0. 5 0 . 5 C = 1 1 5 2
at time 1 with the traits recorded at times 2 and 3, and 0.5 0.5 1 1 2 5
between traits recorded at time 2 with the recorded at
time 3 as is described in Figure A1. then
Let us suppose that we dispose of the following esti-
mates of the additive (G*) and the residual (R*) (co) 1
0 0 0 0 0 5
2 0 0 0 0
variances matrices achieved from a standard multitrait 0
1 0 0 0 0 2 5 0 0 0 0
mixed model (SMM): 0.143 0.143 1 0 0 0 0 0 4.714 1.714 0.857 0.857
R* =
0.143 0.143 0 1 0 0 0 0 1.714 4.714 0.857 0.857
2 1 1 1 1 1
0.071 0.071 0 0 1 0 0 0 0.857 0.857 4.929 1.9928
1 2 1 1 1 1
0..071 0.071 0 0 0 1 0 0 0.857 0.857 1.928 4.929
1 1 2 1 1 1
G* = 1 0 0 0 0 0
1 1 1 2 1 1
0 1 0 0 0
0
1 1 1 1 2 1 0.143 0.143 1 0 0 0
−1 −1′
1 1 1 1 1 2 = Λ1 R1Λ2 .
0.143 0.143 0 1 0 0
0.071 0.071 0 0 1 0
0.071 0.071 0 0 0 1
5 2 1 1 0. 5 0. 5
2 5 1 1 0.5 0.5
1 1 5 2 1 1 Now, we use a new Block LDL decomposition for R1 as:
R* = .
1 1 2 5 1 1
A B′ I 0 A1 0 I A−1B′
0.5 0.5 1 1 5 2 R1 = 1 1
=
0 C − B A−1B′
0
1 1 =
B
1 C B A−1
I I
0.5 0.5 1 1 2 5 1 1 1 1 1 1 1
′
Λ−2 1R 2Λ−2 1 ,