
Theor Appl Genet (2016) 129:273–287

DOI 10.1007/s00122-015-2626-6

ORIGINAL ARTICLE

Multiple-trait- and selection indices-genomic predictions for grain yield and protein content in rye for feeding purposes
Albert Wilhelm Schulthess1 · Yu Wang1,2 · Thomas Miedaner2 · Peer Wilde3 ·
Jochen C. Reif1 · Yusheng Zhao1

Received: 1 June 2015 / Accepted: 17 October 2015 / Published online: 3 November 2015
© Springer-Verlag Berlin Heidelberg 2015

Abstract

Key message Exploiting the benefits of multiple-trait genomic selection for protein content prediction, relying on additional grain yield information within training sets, is a realistic genomic selection approach in rye breeding.

Abstract Multiple-trait genomic selection (MTGS) was specifically designed to benefit from the information of genetically correlated indicator traits in order to improve genomic prediction accuracies. Two segregating F3:4 rye testcross populations genotyped using diversity array technology markers and evaluated for grain yield (GY) and protein content (PC) were considered. The aims of our study were to explore the benefits of MTGS over single-trait genomic selection (STGS) for GY and PC prediction and to apply GS to predict different selection indices (SIs) for GY and PC improvement. Our results using a two-trait model (2TGS) empirically confirm that the ideal scenario to exploit the benefits of MTGS would be when the predictions of a target trait of relatively low heritability with scarce phenotypic records are supported by an intensively phenotyped, genetically correlated indicator trait with higher heritability. This ideal scenario is expected for PC in practice. According to our GS implementation, MTGS can be performed in order to achieve more cycles of selection per unit of time. If the aim is exclusively to improve the prediction accuracy of a scarcely phenotyped trait, 2TGS will be a more accurate approach than a three-trait model that incorporates an additional correlated indicator trait. In general, for balanced phenotypic information, we recommend performing GS with SIs treated as single traits, this method being a simple, direct and efficient way of prediction.
Communicated by J. Wang.

A. W. Schulthess and Y. Wang equally contributed to this work.

Electronic supplementary material The online version of this article (doi:10.1007/s00122-015-2626-6) contains supplementary material, which is available to authorized users.

* Yusheng Zhao, zhao@ipk-gatersleben.de
Jochen C. Reif, reif@ipk-gatersleben.de
Albert Wilhelm Schulthess, schulthess@ipk-gatersleben.de
Yu Wang, wangy@ipk-gatersleben.de
Thomas Miedaner, thomas.miedaner@uni-hohenheim.de
Peer Wilde, peer.wilde@kws-lochow.de

1 Department of Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), 06466 Gatersleben, Germany
2 State Plant Breeding Institute, University of Hohenheim, 70593 Stuttgart, Germany
3 KWS LOCHOW GMBH, 29296 Bergen, Germany

Abbreviations
2T Two-trait
3T Three-trait
2TGS Two-trait genomic selection
3TGS Three-trait genomic selection
BLUE(s) Best linear unbiased estimator(s)
BLUP(s) Best linear unbiased predictor(s)
CMS Cytoplasmic-male sterile
DArT Diversity array technology
GEBV(s) Genomic predicted breeding value(s)
GS Genomic selection
GY Grain yield
MT Multiple-trait
MTGS Multiple-trait genomic selection
NIRS Near-infrared reflectance spectroscopy
O-SI Smith–Hazel or optimum selection index
PC Protein content
QTL Quantitative trait loci
REML Restricted maximum likelihood
RD Rogers' distance
R-SI Restricted selection index
SDS Sudden death syndrome
SEW Single ear weight
SI Selection index
ST Single-trait
STGS Single-trait genomic selection

Introduction

Genomic selection (GS) is specifically designed to predict complex traits such as grain yield (GY). In GS, training populations that are both phenotyped and genotyped are used to estimate the effects of genome-wide marker panels. The marker effects are then applied to predict the trait performance of genotyped but non-phenotyped selection candidates (Meuwissen et al. 2001). GS has been successfully applied in plant breeding to predict, for instance, the performance of testcross populations of maize (Albrecht et al. 2011; Windhausen et al. 2012; Zhao et al. 2012), sugar beet (Hofheinz et al. 2012), and rye (Wang et al. 2014), or the line per se performance of diversity panels in wheat (Crossa et al. 2010; Heffner et al. 2011; Rutkoski et al. 2012), rice (Spindel et al. 2015), soybean (Jarquín et al. 2014; Shu et al. 2013), and barley (Heslot et al. 2012). The major benefit of GS lies in its potential to speed up selection cycles and, thus, to enhance the selection gains per unit of time and cost (Endelman et al. 2013; Longin et al. 2015; Rutkoski et al. 2012).

In practice, when selection is performed to improve the economic value of animals or plants, it is generally applied to several traits simultaneously (Falconer and MacKay 1996). However, current studies in plant sciences generally target single-trait (ST) GS approaches without exploiting information from correlated traits. A phenotypic correlation occurs when the phenotypic values for two traits are associated due to genetic and non-genetic causes, pleiotropy and linkage being the two main known reasons for a nonzero genetic correlation (Bernardo 2010). Multiple-trait genomic selection (MTGS) was originally designed to profit from the information contained in correlated indicator traits (Calus and Veerkamp 2011). Since then, this approach has been relatively widely applied to simulated (Calus and Veerkamp 2011; Guo et al. 2014; Hayashi and Iwata 2013; Jia and Jannink 2012) and animal breeding datasets (Christensen et al. 2012; Liu et al. 2014; Tsuruta et al. 2011). Simulation studies have extensively examined the potential benefits of MTGS, and their findings agree with the advantages of indirect selection (Falconer and MacKay 1996): prediction accuracies for a target trait with relatively low heritability can be substantially improved when a genetically correlated indicator trait with higher heritability is included in the genomic prediction models (Guo et al. 2014; Hayashi and Iwata 2013; Jia and Jannink 2012). In contrast, studies applying MTGS to real/experimental plant breeding data to take advantage of correlated indicator traits are less frequent (Bao et al. 2015; Jia and Jannink 2012; Rutkoski et al. 2012). These applied GS studies have concluded that the benefits of MTGS over STGS are not as substantial as in simulation studies when non-phenotyped selection candidates are to be predicted. In addition, simulation and applied MTGS studies have mainly focused on the factors influencing the potential superiority in prediction accuracy that these methods can achieve compared to STGS, but they have ignored the implications for selection when the correlations between traits are, in practice, undesirable for plant breeders. Presumably, all these practical limitations have so far hindered the routine implementation of MTGS by public plant breeding institutions and companies.

Secondary traits (like grain protein content, PC) that are negatively correlated with a primary trait (for example GY) can deteriorate under single-trait selection (Dolan et al. 1996). In this sense, when plant breeders try to improve several quantitative traits simultaneously, selection using a total score or economic selection index (SI) is generally more efficient than selection based on independent culling levels or tandem selection (Hazel and Lush 1942). SIs are based on the overall performance of genotypes and thus account for the ability of favorable levels in some trait(s) to compensate for unfavorable levels in other trait(s) (Dolan et al. 1996). The Smith–Hazel or optimum SI (O-SI) (Hazel 1943; Smith 1936) was originally designed to achieve the simultaneous improvement of several traits. More than a decade later, Kempthorne and Nordskog (1959) derived a restricted SI (R-SI) from the O-SI, which is specifically designed for situations in which breeders want to improve some traits while holding other traits at their average level. Crop studies have evaluated the application of these indices to improve overall genotype performance across different traits, including studies on alfalfa (Elgin et al. 1970), oat (Dolan et al. 1996; Eagles and
Frey 1974), maize (Suwantaradon et al. 1975), and soybean (Holbrook et al. 1989; Openshaw and Hadley 1984). In recent years, SIs for multiple-trait (MT) selection have been coupled with information on molecular markers underlying quantitative trait loci (QTL) (Cerón-Rojas et al. 2008) or with high-density molecular marker data as in GS (Bernardo 2010; Dekkers 2007). However, until now, SIs coupled with state-of-the-art GS approaches have not been extensively applied in plant studies.

In the European Union, rye (Secale cereale L.) production is mostly concentrated in Poland and Germany. The end-uses of rye in Germany comprise food (25 %), mainly bread making; feed (32 %), mostly home-grown feed; and bioenergy production (37 %) (http://www.praxisnah.de/index.cfm/article/7270.html). Besides GY, which is important for all purposes, breeding in recent decades has favored goals for bread making, which do not necessarily meet the market requirements for other end-uses of rye (Miedaner et al. 2012). For instance, PC in rye must be minimized for baking purposes; however, this trait should reach sufficient levels to be commercially competitive with wheat when rye is used for animal feeding. Since GY and PC are to be selected in the same direction in rye for feeding purposes, a negative correlation between them would complicate the simultaneous breeding of both traits. In this study, two segregating F3:4 rye testcross populations were evaluated for GY, PC, and single ear weight (SEW) in up to 15 environments. The main objectives of this work were: (1) to explore the benefits of MTGS over STGS for the prediction of GY and PC in rye using different sizes of balanced and unbalanced training datasets for both traits plus SEW, and (2) to combine SIs with GS approaches for GY and PC improvement when rye is used for feeding purposes.

Materials and methods

Plant material, field experiments and phenotypic data analyses

The plant material used for this study is described in detail by Miedaner et al. (2012). Briefly, two segregating populations were developed from the crosses Lo115-N × Lo90-N (POP-A) and Lo115-N × Lo117-N (POP-B). Subsequently, 220 lines from each population were randomly selected at the F3:4 generation and crossed with an unrelated cytoplasmic-male sterile (CMS) single-cross tester to obtain three-way hybrids (testcrosses) of the type (A·B) × F3:4 (Hübner et al. 2013; Miedaner et al. 2012).

Originally, the 220 (A·B) × F3:4 hybrids plus the testcrosses of the two parents of each population (each of them repeated nine times) and two common checks were evaluated across 15 environments (see Table S1 for details of the field experiments) in an incomplete 24 × 10 lattice design with two replications (Miedaner et al. 2012). From these experiments, data for GY (t/ha), PC (%) and SEW (g) were obtained. From past studies, GY data were available for all environments (Hübner et al. 2013; Miedaner et al. 2012; Wang et al. 2015), whereas PC and SEW data were originally obtained for ten (Miedaner et al. 2012; Wang et al. 2015) and six (Miedaner et al. 2012) environments, respectively. SEW data for six additional environments (denoted as BEK11n, PET10n, PET11n, WAL10n, WAL11n and WOH10n in Table S1) are included here, making a total of 12 environments for this trait. PC was determined by near-infrared reflectance spectroscopy (NIRS) as outlined in detail by Miedaner et al. (2012) and Wang et al. (2015).

Phenotypic analyses for each trait (GY, PC, and SEW) were performed using a two-step approach as presented in detail by Wang et al. (2014). First, best linear unbiased estimators (BLUEs) were calculated within environments with a linear model that included an intercept and the testcross progenies as fixed factors, whereas blocks, replications and the error component were treated as random. In a second step, BLUEs across environments were computed with a model that included an overall mean and the testcross progenies as fixed factors, and the environments, the genotype × environment interaction of the testcross progenies, and the residual effect as random. BLUEs across environments for the testcross progenies were used to compute the phenotypic correlations among traits. In parallel, the corresponding variance components were estimated with the models of the first and second step, but treating the testcross progenies as random. Using these estimates, heritabilities ($h^2$) were computed as

$$h^2 = \frac{\sigma_G^2}{\sigma_G^2 + \dfrac{\sigma_{G\times E}^2}{\mathrm{Nr.Env.}} + \dfrac{\sigma_{\mathrm{Eff.Error}}^2}{\mathrm{Nr.Rep.}\times\mathrm{Nr.Env.}}},$$

where $\sigma_G^2$, $\sigma_{G\times E}^2$ and $\sigma_{\mathrm{Eff.Error}}^2$ represent the genotypic, the genotype × environment interaction of the testcross progenies, and the effective error variance components, respectively, whereas Nr.Env. and Nr.Rep. are the numbers of environments and replicates, respectively (Piepho and Möhring 2007). Additionally, using the BLUEs for the testcross progenies obtained from the first step, additive genetic ($G_0$) and residual ($R_0$) q × q variance–covariance matrices among the q traits were estimated by pre-specifying a genomic estimated kinship matrix within an MT mixed model constructed according to Henderson and Quass (1976). This model included the general and environmental means as fixed effects and the testcross progenies and residuals as random factors, assuming an unstructured covariance matrix between traits. Denoting the diagonal and off-diagonal elements of $G_0$ as $c_{ii}$ and $c_{ij}$, respectively, the genetic correlations between traits i and
j were computed as $c_{ij}/\sqrt{c_{ii}\times c_{jj}}$. BLUEs and variance component estimates were obtained using the restricted maximum likelihood (REML) algorithm implemented in the ASReml-R package (Butler et al. 2009).

Marker data and genetic analyses

Using diversity array technology (DArT) markers, 201 and 219 lines were genotyped for POP-A and POP-B, respectively (Miedaner et al. 2012). After quality checks of these marker data, 394 and 584 DArT markers were retained for POP-A and POP-B, respectively (Wang et al. 2014). Subsequently, the DArT markers were used to calculate the Rogers' distance (RD) matrix for each population as described by Reif et al. (2005); in this case, however, the distances were expressed as J − RD to resemble a relationship matrix, J being an n × n matrix whose elements all equal 1 (Searle 1982) and n the number of genotypes in the population. These matrices were used as the pre-specified genomic estimated kinship matrices for estimating the $G_0$ and $R_0$ matrices.

GS methods

Based on Henderson and Quass (1976), ST and MTGS can be presented as a single unified mixed model. In general, if n different genotypes or individuals are genotyped for m loci (markers) and have $n_i$ phenotypic records available for q different traits (with $n_i \le n$ and i = 1, 2, …, q), then the unified mixed model to predict marker effects can be defined as

$$Y = 1_N \mu + Z a + e, \qquad (1)$$

where

$$Y=\begin{bmatrix}Y_1\\ Y_2\\ \vdots\\ Y_q\end{bmatrix},\quad \mu=\begin{bmatrix}\mu_1\\ \mu_2\\ \vdots\\ \mu_q\end{bmatrix},\quad 1_N=\begin{bmatrix}1_{n_1} & 0 & \cdots & 0\\ 0 & 1_{n_2} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & 1_{n_q}\end{bmatrix},$$

$$Z=\begin{bmatrix}Z_1 & 0 & \cdots & 0\\ 0 & Z_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & Z_q\end{bmatrix},\quad a=\begin{bmatrix}a_1\\ a_2\\ \vdots\\ a_q\end{bmatrix},\quad e=\begin{bmatrix}e_1\\ e_2\\ \vdots\\ e_q\end{bmatrix}.$$

For trait i (i = 1, 2, …, q), $Y_i$ is the $n_i \times 1$ vector of phenotypic values, $\mu_i$ is the overall mean, $Z_i$ is the $n_i \times m$ design matrix of marker effects, $a_i$ is the $m \times 1$ vector of marker effects, $1_{n_i}$ denotes an $n_i \times 1$ vector whose elements all equal 1 (Searle 1982), and $e_i$ is the $n_i \times 1$ vector of residuals. The total number of phenotypic records is $N = \sum_{i=1}^{q} n_i$. It follows that an ST model is the particular case of the unified mixed model with q = 1. In parallel, a is assumed to follow a multivariate normal distribution $a \sim N(0, G)$, where 0 is an (m × q) × 1 vector of zeros and $G = m^{-1} G_0 \otimes I_m$, $I_m$ being an m × m identity matrix and ⊗ the direct (Kronecker) product operator between matrices (Searle 1982). Similarly, e is assumed to follow a multivariate normal distribution $e \sim N(0, R)$, where 0 is an N × 1 vector of zeros and R is a sub-matrix of $R_0 \otimes I_n$, $I_n$ being an n × n identity matrix. The R matrix is obtained from $R_0 \otimes I_n$ by removing the rows and columns that correspond to the genotypes without records for each trait (Henderson and Quass 1976). With the above assumptions, the BLUEs of µ ($\hat{\mu}$) and the best linear unbiased predictors (BLUPs) of a ($\hat{a}$) can be obtained from the following mixed-model equations (Henderson 1975):

$$\begin{bmatrix}\hat{\mu}\\ \hat{a}\end{bmatrix}=\begin{bmatrix}1_N' R^{-1} 1_N & 1_N' R^{-1} Z\\ Z' R^{-1} 1_N & Z' R^{-1} Z + G^{-1}\end{bmatrix}^{-1}\begin{bmatrix}1_N' R^{-1} Y\\ Z' R^{-1} Y\end{bmatrix}. \qquad (2)$$

Genomic predicted breeding values (GEBVs) were then computed as $\hat{\mu} + Z_U \hat{a}$, where $Z_U$ is the design matrix for the untested genotypes. In the present study, GY and PC were jointly predicted mainly with a two-trait (2TGS) model (q = 2) and, secondly, with a three-trait (3TGS) model (q = 3) that included SEW as a third correlated trait.

SIs for GY and PC and their genomic predictions

For the construction of an SI for GY and PC, the O-SI and R-SI were employed. A general SI for the ith genotype is defined as $SI_i = p_i' b$, where $p_i'$ is a 1 × q row vector with the phenotypic values of the ith genotype for the q traits, and b is a q × 1 vector containing the index weights for each of the q traits (Lin and Allaire 1977). For the O-SI (Hazel 1943; Smith 1936), $b = P^{-1} G_0 w$, with P being the q × q phenotypic variance–covariance matrix between traits, $G_0$ coming from the phenotypic data analyses, and w being a q × 1 vector containing the relative economic weights for the q traits under selection. The P matrix was derived as $P = G_0 + R_0/\mathrm{Nr.Env.}$, with $G_0$ and $R_0$ estimated from the phenotypic data analyses. For the R-SI, the index weights are calculated as

$$b = \left[I_q - P^{-1} G_s' \left(G_s P^{-1} G_s'\right)^{-1} G_s\right] P^{-1} G_0 w,$$

with P and $G_0$ as defined for the O-SI, $I_q$ a q × q identity matrix, and $G_s$ the $G_0$ matrix with q − s rows deleted, s being the number of traits whose genetic gains should be set to zero (Bernardo 2010; Kempthorne and Nordskog 1959). Using an R-SI for GY and PC, two scenarios were taken into account: breeders want to select for GY (R-SI_GY) or for PC (R-SI_PC) while maintaining the remaining trait at its mean level. When an R-SI is calculated for two traits, the economic weights become irrelevant (Kempthorne and Nordskog 1959). Hence, only a 1:1 price ratio between GY and PC was considered for its calculation. However, for the
O-SI, the economic weights play an important role; hence, an approximate 10:3 price ratio between GY and PC was additionally considered for this SI. This price ratio was derived from the Canadian Wheat Board (CWB) prices and PC premiums for small-grain cereals (http://www.cwb.ca/pricing) and the GY means of POP-A and POP-B, respectively.

Based on suggestions by Bernardo (2010) to perform GS for a base SI (Williams 1962), two methods were considered for the genomic prediction of the different SIs in the present study. The first method consisted of calculating each SI using the BLUEs across environments as phenotypic data for GY and PC, followed by STGS treating the calculated SI as if it were a single trait. Hereafter, this method is called the direct method. The second group of methods first predicted GY and PC genomically, either separately with two ST models or jointly with 2TGS, and then calculated the SI using these predictions as phenotypic values. From now on, this group of methods is referred to as the reversed methods. From the unified mixed model and the SI equations, it becomes apparent that the reversed methods use the information from the variance–covariance matrices redundantly, similar to what Bauer and Léon (2008) did with classic BLUP approaches.

Cross-validation for genomic predictions

In this study, four different balanced training set sizes were considered for the cross-validation procedure, comprising 40, 80, 120 or 160 genotypes, whereas the validation set was composed of 41 and 59 genotypes for POP-A and POP-B, respectively. Since phenotypic data are normally generated in an unbalanced way for the different traits within plant breeding programs, an unbalanced training set scenario was additionally considered for the MT approaches. In this scenario, the training set size for one of the traits was fixed at its maximum level (i.e., 160 genotypes), while the four different training set sizes were applied to the other trait(s) included in the model. Using ST and MTGS methods, the BLUEs across environments for GY, SEW and PC were used as phenotypic values for the training set, allowing the computation of $\hat{\mu}$ and $\hat{a}$, which were subsequently used to calculate the GEBVs for each trait of the selection candidates in the validation set. Genomic prediction accuracies for GY and PC were calculated as the correlation between the GEBVs and the BLUEs across environments of these traits in the validation set. Prediction accuracies were standardized by dividing them by the square root of the heritability of each trait.

For the direct and reversed SI predictions, each training set was first used to estimate the $G_0$ and P matrices. In the case of the direct method, these estimated matrices were employed to calculate the O-SI and R-SI for the training set, using the BLUEs across environments as phenotypic data for GY and PC; subsequently, the GEBVs for the validation set were computed by means of STGS. For the reversed methods, the estimated $G_0$ and P matrices from each balanced and unbalanced training set were used to compute the different SIs of the genotypes in the validation set. For this purpose, GY and PC genomic predictions from ST- or 2TGS were used as phenotypic data of the validation set. The true $G_0$ and P matrices were assumed to be those obtained when all the genotypes of POP-A and POP-B, respectively, were used for their estimation. Prediction accuracies were then expressed as the correlation between the SI predictions and the SI calculated using the true $G_0$ and P matrices, with the GY and PC BLUEs across environments as phenotypic data for the validation set. Each prediction accuracy value presented in this study is the average of the cross-validation procedure repeated 1000 times. All computations were performed within the R environment (R Core Team 2014).

Results

Phenotypic variation, heritabilities and relationships between traits

Phenotypic analyses for POP-A and POP-B have been partially reported elsewhere (Miedaner et al. 2012; Wang et al. 2015) and are briefly summarized here for GY, PC, and SEW. BLUEs across environments reflect the high phenotypic variation for these three traits within POP-A and POP-B (Table S2). High $h^2$ values were found for GY (0.81 and 0.84 for POP-A and POP-B, respectively) and SEW (0.87 and 0.83 for POP-A and POP-B, respectively), whereas medium (0.68) and high (0.83) $h^2$ values were estimated for PC in POP-A and POP-B, respectively. Significant phenotypic correlations were found among all pairs of traits (Table 1). Phenotypic correlations were consistent in sign and similar in magnitude across both populations. Phenotypically, GY was negatively correlated with PC but positively correlated with SEW, while the correlation between PC and SEW was negative. For POP-A, the strongest phenotypic correlation was found between GY and SEW, whereas for POP-B the strongest one was that of GY with PC. In each population, the $G_0$ matrices were estimated through the pre-specification of a genomic estimated relationship matrix (Figure S1) within an MT mixed model, which subsequently allowed the computation of the genetic correlations between traits. In both populations, the estimated genetic correlations resemble their phenotypic counterparts in sign and also follow the same ranking of magnitude (Table 1).
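As a minimal sketch of the two formulas behind these results, the entry-mean heritability and the conversion of $G_0$ covariances into genetic correlations (both defined in Materials and methods), the computation reduces to a few lines of Python. This is not the authors' implementation, which used R and ASReml-R; the function names and every numeric input below are hypothetical illustrations, not estimates from this study.

```python
import math


def entry_mean_heritability(var_g, var_gxe, var_err, n_env, n_rep):
    """h2 = s2_G / (s2_G + s2_GxE / n_env + s2_err / (n_rep * n_env))."""
    return var_g / (var_g + var_gxe / n_env + var_err / (n_rep * n_env))


def genetic_correlations(g0):
    """Convert a q x q genetic variance-covariance matrix G0 (nested lists)
    into correlations r_ij = c_ij / sqrt(c_ii * c_jj)."""
    q = len(g0)
    return [[g0[i][j] / math.sqrt(g0[i][i] * g0[j][j]) for j in range(q)]
            for i in range(q)]


# Hypothetical variance components for a single trait (not from this study).
h2 = entry_mean_heritability(var_g=1.0, var_gxe=1.5, var_err=3.0, n_env=15, n_rep=2)

# Hypothetical G0 for (GY, PC) with a negative genetic covariance.
g0 = [[0.25, -0.06],
      [-0.06, 0.09]]
r_g = genetic_correlations(g0)

print(round(h2, 3), round(r_g[0][1], 3))  # 0.833 -0.4
```

Dividing the genotype × environment and error variances by the numbers of environments and replicates is what makes this an entry-mean heritability: with the same variance components, testing in more environments yields a higher $h^2$.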

Table 1 Phenotypic (Cor_P) and genetic (Cor_G) correlations among grain yield (GY, t/ha), protein content (PC, %), and single ear weight (SEW, g) in two testcross populations (POP-A and POP-B), each comprising 220 progenies

        Cor_P                            Cor_G
Trait   GY         PC         SEW        GY        PC        SEW
GY      –          −0.454**   0.354**    –         −0.421    0.290
PC      −0.387**   –          −0.246**   −0.459    –         −0.161
SEW     0.500**    −0.227**   –          0.586     −0.343    –

Estimated for the testcross progenies of POP-A (below diagonal) and POP-B (above diagonal) using data for 15 (GY), ten (PC) and 12 (SEW) environments
** Significantly different from zero with P < 0.01
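The negative genetic correlation between GY and PC in Table 1 is the situation the SIs defined under "SIs for GY and PC and their genomic predictions" are meant to handle. As a minimal sketch of those index equations, assuming hypothetical $G_0$ and $R_0$ values rather than the estimates of this study, the O-SI weights $b = P^{-1} G_0 w$ and the R-SI weights can be computed in plain Python as follows. The authors' computations were done in R; the helper names here are illustrative only.

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting for a small non-singular matrix."""
    n = len(a)
    m = [[float(x) for x in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [x / scale for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def optimum_index_weights(p, g0, w):
    """O-SI (Smith-Hazel) weights b = P^-1 G0 w; w is a q x 1 column."""
    return matmul(matmul(inverse(p), g0), w)


def restricted_index_weights(p, g0, w, restricted):
    """R-SI (Kempthorne-Nordskog) weights
    b = [I - P^-1 Gs' (Gs P^-1 Gs')^-1 Gs] P^-1 G0 w,
    where Gs keeps the rows of G0 for the traits whose genetic gain is held at zero."""
    q = len(p)
    p_inv = inverse(p)
    gs = [g0[i] for i in restricted]
    gs_t = transpose(gs)
    middle = inverse(matmul(matmul(gs, p_inv), gs_t))
    correction = matmul(matmul(matmul(p_inv, gs_t), middle), gs)
    projector = [[float(i == j) - correction[i][j] for j in range(q)]
                 for i in range(q)]
    return matmul(projector, matmul(matmul(p_inv, g0), w))


# Hypothetical (GY, PC) matrices, not estimates from this study.
g0 = [[0.25, -0.06], [-0.06, 0.09]]
r0 = [[1.50, -0.30], [-0.30, 0.60]]
n_env = 10
p = [[g0[i][j] + r0[i][j] / n_env for j in range(2)] for i in range(2)]  # P = G0 + R0 / Nr.Env.

b_osi = optimum_index_weights(p, g0, [[10.0], [3.0]])  # 10:3 GY:PC price ratio
b_rsi_gy = restricted_index_weights(p, g0, [[1.0], [1.0]], restricted=[1])  # PC gain held at zero
```

Because the expected response of each trait is proportional to $G_0 b$, the R-SI_GY weights make the PC row of $G_0 b$ vanish. With two traits and one restriction, that single constraint fixes the direction of b, so changing w only rescales the weights, which matches the statement in Materials and methods that economic weights become irrelevant for a two-trait R-SI. Plain nested lists keep the sketch dependency-free; a linear-algebra library would normally replace the hand-written helpers.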

Genomic predictions for GY and PC

Predictions for GY and PC using STGS were previously reported by Wang et al. (2015) considering 201 and 219 testcrosses from POP-A and POP-B, respectively. Briefly, prediction accuracies for both traits were higher in POP-B than in POP-A, and higher for GY than for PC (Table S3). Nevertheless, in POP-B the difference between GY and PC prediction accuracies was marginal. In the present study, GS was performed jointly for GY and PC using a 2T model (Table S3), and the prediction accuracies were compared with those obtained by its ST counterparts (Fig. 1). For balanced training set sizes, the benefits of 2TGS compared to STGS were marginal, if not null, for both traits in POP-A and POP-B. Marginal improvements were observed for PC prediction accuracies in POP-A when the model also included GY information, whereas these benefits were absent for POP-B. Improvements in PC prediction accuracies were much more noticeable when unbalanced training set sizes were used for 2TGS: if PC prediction is supported by GY information for the full training set size, prediction accuracies increase compared to those obtained by STGS for PC. As in the balanced training set scenarios, these improvements in PC prediction accuracies were much more evident in POP-A than in POP-B. Nonetheless, the benefits of 2TGS over STGS for PC prediction supported by maximized GY information dissipate as the training set size for PC increases. Slight improvements in prediction accuracy were observed when GY predictions were supported by maximizing PC information. As for PC, these improvements were much more notable in POP-A than in POP-B, and they also dissipated as the training set size for GY increased. However, when the training set size was already maximized for a particular trait, adding more information for the second trait in the model does not appear to be very beneficial (in terms of prediction accuracy) for the trait whose training set size is already maximized.

[Fig. 1: four panels (GY POP-A, GY POP-B, PC POP-A, PC POP-B) showing the difference of accuracy (y-axis) against training set size (x-axis: 40, 80, 120, 160) for 2Tb−ST, 2Tu−ST and 2Tufull−ST160]

Fig. 1 Comparison between single-trait (ST) and two-trait (2T) models of the cross-validated standardized prediction accuracy for grain yield (GY, t/ha) and protein content (PC, %) using balanced (2Tb) and unbalanced (2Tu) training set scenarios for populations A (POP-A) and B (POP-B). In the balanced scenario all traits share the training set size, whereas in the unbalanced scenario only one trait (either GY or PC) has information for 160 genotypes in the model (namely 2Tufull) and the remaining trait has information only according to the training set size. All comparisons between 2T and ST approaches were made according to the training set size; for the unbalanced training set scenarios, the prediction accuracies for the trait with maximized information (2Tufull) were compared with those obtained by their ST counterparts using a training set size of 160 genotypes (ST160). Each point corresponds to the average of 1000 cross-validations

After performing 2TGS, a 3TGS model adding SEW information was also considered for POP-A (Table S4), and the prediction accuracies of this approach were compared with their 2T counterparts (Table 2). In general, for balanced and unbalanced training set sizes, no improvements in GY and PC prediction accuracies were found using 3TGS over 2TGS, with the exception of GY and PC predictions supported by maximized SEW information. In this particular case, marginal improvements in prediction accuracy were observed for GY and PC, but these also dissipated with increasing training set size, in the same way as the improvements achieved by 2TGS.

Selection methods for PC and GY

A graphical representation of the different selection strategies considered in this study is included in Fig. 2. For all
cases, culling levels were applied to select the uppermost 10 % of the genotypes according to GY, PC, and each SI. Culling levels for GY and PC split the dispersion graphs of GY and PC into four quadrants (denoted as Q1, Q2, Q3 and Q4 in Fig. 2). Quadrant Q3 corresponds to all genotypes that are culled because of their low GY and PC levels. When independent culling levels for GY and PC are used and both traits are to be improved simultaneously, selection for one trait completely impedes the use of the same selection criterion for the second trait, which is reflected by an empty Q2 quadrant. Therefore, no genotypes could be selected for high GY and high PC simultaneously using independent culling levels.

Table 2 Comparison of the standardized prediction accuracies for grain yield (GY, t/ha) and protein content (PC, %) between the two- (2T) and three-trait (3T) models using balanced and unbalanced training set scenarios in testcross population A

TS    VS    Δ(3Tb − 2Tb)              Δ(3TufullGY − 2TufullGY)  Δ(3TufullPC − 2TufullPC)  Δ(3TufullSEW − 2Tb)
            GY           PC           GY           PC           GY           PC           GY           PC
40    41    −0.008       0            −0.004       0.013        −0.003       −0.007       0.041        0.047
80    41    −0.009       −0.002       −0.003       0.005        −0.007       −0.005       0.014        0.020
120   41    −0.006       0            −0.003       0.003        −0.005       −0.002       0.002        0.007
160   41    −0.003       0.001        −0.002       0.001        −0.002       0.001        −0.003       0.001

The 2T model considered GY and PC, and the 3T models additionally took single ear weight (SEW, g) into account. TS and VS denote training set and validation set sizes for each trait, respectively. In the balanced (b) scenario all traits share the training set size, whereas in the unbalanced (u) scenarios only one trait (either GY, PC or SEW) has information for 160 genotypes (namely ufull-GY, ufull-PC and ufull-SEW, respectively) and the remaining trait(s) have information only according to the training set size. Each value corresponds to the average of 1000 cross-validations

[Fig. 2: dispersion graphs of Protein Content (y-axis) versus Grain Yield (x-axis) for POP-A and POP-B, each split into quadrants Q1–Q4; selection lines for GY, PC, O-SI_(1:1), O-SI_(10:3), R-SI_GY and R-SI_PC]

Fig. 2 Grain yield (GY, t/ha) and protein content (PC, %) dispersion graphs for populations A (POP-A) and B (POP-B). Each line represents a different selection strategy, always selecting the best 10 % of the genotypes. Vertical and horizontal lines are the culling levels for GY and PC, respectively, and split each diagram into four different quadrants (denoted as Q1–Q4). O-SI_(1:1) and O-SI_(10:3) refer to the optimum selection indices (SIs) for GY:PC price ratios of 1:1 and 10:3, respectively. R-SI_GY and R-SI_PC are the restricted SIs when GY is improved while PC is maintained at the average level, or when PC is improved while GY is maintained at the average level, respectively

Furthermore, if the GY and PC culling levels are considered as part of a tandem selection strategy, the improvement of one of the traits during the first generation(s) would limit the improvement of the second trait during the next generation(s) (considered a breeding cost for the second trait when its economic importance is ignored). Quadrants Q4 and Q1 represent the breeding costs in GY and PC, respectively, when these traits are selected during the late generations of a tandem selection program that aims to improve GY and PC simultaneously. However, an SI such as the O-SI can partly reduce these breeding costs and also allows the inclusion of economic information. If both traits have the same scale, the most balanced case of simultaneous selection for high GY and PC values would be expected for the O-SI when both traits have the same economic importance. The O-SI would allow the selection of genotypes that maximize GY and PC simultaneously, or of genotypes that compensate moderate values for one trait with high values for the second trait. When GY acquires 10:3 times the economic importance of PC, the curve of the O-SI approaches the culling level for GY, because genotypes maximizing GY would be preferred for selection. The curves representing the R-SIs (R-SI_GY and

[Fig. 3 image: panels POP-A and POP-B; x-axis: Training set size (40, 80, 120, 160); y-axis: Accuracy (0.0–1.0); legend: GY, PC, O-SI_(1:1), O-SI_(10:3), R-SI_GY, R-SI_PC]

Fig. 3  Relationship between training set size and the cross-validated accuracies of prediction for single-trait genomic selection of grain yield (GY, t/ha), protein content (PC, %) and the different selection indices (SIs) in populations A (POP-A) and B (POP-B). O-SI_(1:1) and O-SI_(10:3) refer to the optimum selection indices for the GY:PC price ratios 1:1 and 10:3, respectively. R-SI_GY and R-SI_PC are the restricted selection indices when GY is improved while PC is maintained at the average level, or when PC is improved while GY is maintained at the average level, respectively. Each point corresponds to the average of 1000 cross-validations

R-SI_PC) are located towards the culling level of the trait which is intended to be improved.

Genomic predictions for SIs

The accuracies of genomic predictions for the different SIs of GY and PC according to the direct method are shown in Fig. 3. Similar to STGS for GY and PC, prediction accuracies improved for all SIs as the training set size increased. Independent of the training set size, prediction accuracies for all SIs were higher in POP-B than in POP-A. In both populations, the highest prediction accuracies were obtained for the O-SI with a 10:3 GY:PC price ratio, followed by those observed for the R-SI when GY improvement was the breeding goal. In POP-A, the O-SI with a 1:1 GY:PC price ratio and the R-SI for PC improvement had the lowest prediction accuracies. In POP-B, the O-SI with a 1:1 GY:PC price ratio also had the lowest prediction accuracy among the SIs. However, the R-SI for PC improvement there had prediction accuracies similar to those obtained for the R-SI when GY improvement was the breeding goal. All these differences in prediction accuracy between the SIs were more evident in POP-A than in POP-B.

A group of reversed prediction methods, reconstructing each SI from ST- and 2TGS predictions for GY and PC, was implemented. These methods were compared to the direct method of SI prediction to see whether they could achieve improvements in prediction accuracy (Table 3). Using reversed methods based on ST- and 2TGS with balanced training sets for GY and PC, the improvements were marginal for some of the SIs and tended to disappear as the training set size increased. In POP-A, marginal improvements were observed for two SIs:

- the O-SI with a 1:1 GY:PC price ratio, reconstructed from ST predictions for GY and PC;
- the R-SI for GY improvement, calculated from both ST- and 2TGS predictions.

For POP-B, marginal improvements could only be observed for the R-SI for PC improvement reconstructed using 2TGS predictions. Nevertheless, in some cases the prediction accuracies of the reversed methods were slightly lower than those obtained by their direct counterparts. The largest shortfall was observed for the R-SI for PC improvement based on STGS predictions for GY and PC in POP-A. However, when 2TGS predictions for GY and PC using unbalanced training set sizes were employed in the reversed method, much larger improvements in prediction accuracy could be observed. When 2TGS predictions were obtained by maximizing GY information to the full training set size, notable improvements were observed particularly for the O-SI with a 1:1 GY:PC price ratio in POP-A. In both populations, substantial improvements were achieved for the O-SI with a 10:3 price ratio and the R-SI for GY improvement. If 2TGS predictions were obtained by maximizing PC information to the full training set size, considerable improvements were found for the R-SI for PC improvement in both populations. These were followed by improvements for the O-SIs with 1:1 and 10:3 GY:PC price ratios in POP-B and POP-A, respectively. Nonetheless, similar to the case of balanced training set sizes, the advantages of the reversed prediction methods using unbalanced training set sizes tended

Table 3  Comparison of the cross-validated accuracy of genomic predictions for the different selection indices (SI) between a direct prediction method (DST) and four different reversed prediction methods (R_) in two rye testcross populations, POP-A and POP-B

Population  TS   VS   ∆ (R_ST-DST)                                 ∆ (R_MTb-DST)
                      O-SI_(1:1)  O-SI_(10:3)  R-SI_GY  R-SI_PC    O-SI_(1:1)  O-SI_(10:3)  R-SI_GY  R-SI_PC
POP-A       40   41   0.013       0.006        0.017    −0.013     0.005       0.009        0.018    0.007
            80   41   0.009       0.007        0.014    −0.015     0.005       0.010        0.016    0.009
            120  41   0.008       0.003        0.008    −0.015     0.006       0.008        0.013    0.008
            160  41   0.004       −0.001       0.003    −0.013     0.004       0.006        0.010    0.008
POP-B       40   59   0.004       0.004        0.005    −0.001     0.001       0.000        0.003    0.015
            80   59   0.000       0.000        −0.001   −0.002     −0.002      −0.003       0.003    0.005
            120  59   −0.002      −0.002       −0.003   −0.003     −0.001      −0.002       0.001    0.001
            160  59   −0.002      −0.001       −0.002   −0.003     −0.001      −0.002       0.000    0.000

Population  TS   VS   ∆ (R_MTufullGY-DST)                          ∆ (R_MTufullPC-DST)
                      O-SI_(1:1)  O-SI_(10:3)  R-SI_GY  R-SI_PC    O-SI_(1:1)  O-SI_(10:3)  R-SI_GY  R-SI_PC
POP-A       40   41   0.055       0.145        0.141    0.010      −0.005      0.031        −0.004   0.099
            80   41   0.027       0.071        0.073    0.011      0.002       0.017        0.009    0.055
            120  41   0.013       0.029        0.033    0.009      0.007       0.009        0.011    0.031
            160  41   0.004       0.006        0.010    0.008      0.004       0.006        0.010    0.008
POP-B       40   59   0.005       0.118        0.133    0.004      0.090       −0.004       −0.040   0.134
            80   59   0.006       0.044        0.049    0.005      0.033       −0.005       −0.002   0.051
            120  59   0.002       0.014        0.016    0.002      0.012       −0.002       0.001    0.018
            160  59   −0.001      −0.002       0.000    0.000      −0.001      −0.002       0.000    0.000

GY and PC denote the traits grain yield (t/ha) and protein content (%), respectively. O-SI_(1:1) and O-SI_(10:3) refer to the optimum SIs for the GY:PC price ratios 1:1 and 10:3, respectively. R-SI_GY and R-SI_PC are the restricted SIs when GY or PC, respectively, is intended to be improved while the remaining trait is maintained at the average level. DST is the implementation of genomic selection using each SI directly as a single trait (ST) to train the model. R_ST corresponds to the reversed method using GY and PC predictions coming from two independent ST models. R_MTb and R_MTu are the reversed methods using GY and PC predictions from two-trait (2T) models in balanced and unbalanced scenarios, respectively. TS and VS denote training set and validation set sizes, respectively. For the balanced scenario all traits share the training set size. For the unbalanced scenario, only one trait (either GY or PC) has information for 160 genotypes in the model (namely MTufull-GY or MTufull-PC, respectively), and the remaining trait only has information according to the training set size. Comparisons correspond to the difference in prediction accuracy between the R_ and DST methods. Each value corresponds to the average of 1000 cross-validations

to vanish as the training set size increased. The exception was the R-SI for GY improvement when PC information was maximized to the full training set size. For this specific case, reversed predictions for the R-SI for GY improvement appeared to be inferior to the direct method at small training set sizes. As the training set size for GY was increased, however, the reversed method tended to reach the prediction accuracy level of the direct method, and particularly in POP-A slightly exceeded it.

Discussion

MTGS can be performed in order to achieve more cycles of selection per unit of time

In plant science, and compared to STGS studies, applications of MTGS methods that exploit the information contained in correlated indicator traits are relatively scarce. Only three studies using real/experimental plant breeding data have been published to date (Bao et al. 2015; Jia and Jannink 2012; Rutkoski et al. 2012). Jia and Jannink (2012) benefited from phenotype imputation using MTGS applied to pine breeding data. Compared to STGS, they obtained a 60 % increase in prediction accuracy for a partially masked trait when all the genotypes had been measured for another particular trait. Similarly, Rutkoski et al. (2012) found substantial prediction accuracy improvements and also higher gains per cycle over STGS. They obtained these by additionally including the kernel quality index as a predictor variable within an MTGS model for predicting deoxynivalenol toxin levels in wheat. Nonetheless, the results from both studies imply that, in practice, the correlated indicator trait must first be measured for the selection candidates before MTGS can be performed for a specific target trait. In consequence,
a cost-effective prerequisite for implementing these MTGS methods is that evaluating the correlated indicator trait must be less costly than phenotyping the target trait directly. In addition, such MTGS implementations do not allow the breeding process to be accelerated by completing more cycles of selection per unit of time (Rutkoski et al. 2012). Faster cycling is one of the main advantages of STGS (Longin et al. 2015; Rutkoski et al. 2012). Recently, Bao et al. (2015) used different MTGS models covering various combinations of traits related to sudden death syndrome (SDS) resistance in soybean, but found no significant benefits when shifting from STGS to MTGS models. We presume that these past findings have somewhat discouraged the plant breeding community from implementing MTGS as a routine within plant breeding programs. Nevertheless, this situation also motivated us to perform MT genomic predictions for the selection candidates relying only on trait information contained within the training sets. The positive results achieved in our study, discussed in the following sections, indicate that MTGS could also be implemented to achieve more cycles of selection per unit of time in the same way as STGS. We discuss the theoretical and practical implications of our findings for the plant breeding community in general, and for the improvement of GY and PC in rye for feeding purposes in particular.

Correlations among GY, PC and SEW

Jia and Jannink (2012) concluded that genetic correlations between traits are the basis for the benefits of MTGS. Therefore, to avoid applying MTGS in a scenario unfavorable to its benefits, we first computed the phenotypic and genetic correlations between GY, PC and SEW. Phenotypic correlations among these traits were previously reported by Miedaner et al. (2012) for POP-A and POP-B using a smaller number of environments. The phenotypic correlations found in the present study (Table 1) agree in sign and are similar in magnitude to the previous reports. Negative phenotypic and genetic correlations between GY and PC were found in POP-A and POP-B. At the phenotypic level, starch content (SC) is positively correlated with GY (r = 0.208 and 0.356 for POP-A and POP-B, respectively). It is negatively correlated with PC (r = −0.601 and −0.809 for POP-A and POP-B, respectively). The negative relationship between GY and PC could therefore be partly mediated by SC. In this sense, a higher SC would increase GY but simultaneously decrease PC in a competitive way, since both are seed storage constituents (Kindred et al. 2008; Perez et al. 1996). Positive phenotypic and genetic correlations were found between GY and SEW in both populations. This is expected because SEW is an important GY component in cereals (Lin and Allaire 1977; Pinto et al. 2002; Yoshira et al. 2000). For both populations, negative phenotypic and genetic correlations between PC and SEW were observed. However, mediating effects of other traits on this relationship were not clear, and the possible causes of the negative correlation between SEW and PC are beyond the scope of the present study.

Trait heritabilities, data structure orthogonality and genetic correlations are the main driving forces influencing the benefits of MTGS over STGS for GY and PC within the two rye testcross populations

In practice, if a target trait is difficult to measure with precision, measurement errors may reduce its heritability so much that indirect selection becomes advantageous. Such selection uses a genetically correlated indicator trait with higher heritability (Falconer and MacKay 1996). Consistent with this, past studies using simulated data have shown that MTGS is beneficial compared to STGS mostly in one situation. This is when genomic predictions of a trait with relatively low heritability are supported by information from a genetically correlated trait with higher heritability (Guo et al. 2014; Hayashi and Iwata 2013; Jia and Jannink 2012). In parallel, prediction accuracies for a highly heritable trait are influenced neither by changes in the heritability of the supporting correlated trait nor by changes in the genetic correlations between traits (Jia and Jannink 2012). Therefore, little benefit should be expected for highly heritable traits when shifting from STGS to MTGS. The results of the present study using experimental rye data agree with these past findings. High heritabilities for GY and PC were estimated in POP-B (h² = 0.84 and 0.83 for GY and PC, respectively). Hence, the benefits of 2TGS over STGS were marginal, if not null, for both traits within this population (Fig. 1). However, 2TGS was more beneficial than STGS for PC prediction in POP-A, where the heritability of PC (h² = 0.68) was lower than that of GY (h² = 0.81). These improvements were even more pronounced within unbalanced training sets when GY was set to the maximum training set size. In addition, some prediction accuracy improvements for GY were observed with unbalanced training set sizes in POP-A (Fig. 1). They occurred when shifting from STGS to 2TGS with GY information only partially available and PC information maximized to the full training set size. Guo et al. (2014) studied, with simulated data, the case where phenotypic information is partially available for the target trait and completely available for the genetically correlated indicator trait. This case is analogous to the unbalanced training sets in the present study. They observed that MTGS performs much better than STGS under these conditions, especially for a target trait with lower heritability. In consequence, the superiority
of 2TGS over STGS using unbalanced training set sizes in the present study diminished as the training set size for the partially available target trait increased (Fig. 1).

Genetic correlations also influence the relative superiority of MTGS over STGS. Simulation studies show that, when a trait with high heritability supports MTGS predictions for a trait with lower heritability, prediction accuracies rise with the genetic correlation between the two traits (Calus and Veerkamp 2011; Hayashi and Iwata 2013; Jia and Jannink 2012). Furthermore, Bao et al. (2015) found no significant prediction accuracy improvements of MTGS over STGS and attributed this lack of improvement to the weak correlations among traits. In the present study, the benefits of 2T- over STGS were more noticeable for POP-A than for POP-B (Fig. 1). The absolute phenotypic correlation between GY and PC was higher for POP-B than for POP-A. Even so, the genetic correlation between GY and PC was slightly more pronounced for POP-A than for POP-B (Table 1). The genetic correlations between GY and PC could therefore also partly explain why the benefits of 2T- over STGS were more evident for POP-A than for POP-B.

In our study, POP-B had a higher marker density than POP-A, with 584 and 394 DArT markers, respectively. In general, simulation studies on STGS have shown that increased marker density has a positive effect on the prediction accuracy of GS (Meuwissen 2009; Solberg et al. 2008). However, we did not directly evaluate the prediction accuracies reached by MTGS itself. We evaluated only the difference in prediction accuracy compared to STGS models (Fig. 1; Table 2). We therefore anticipate that the marker density effect reported in past simulation studies is not relevant to our results. Nevertheless, to our knowledge the influence of marker density on the superiority of MTGS over STGS has not been studied in depth so far. We wanted to rule out this potential confounding effect of the different marker densities of POP-A and POP-B. We therefore performed ST- and 2TGS for GY and PC in POP-B using a balanced training set scenario, now jointly sampling markers and genotypes in each cross-validation run. The marker sample size was 394 DArT markers, and the genotype sample followed each training set size. The resulting differences in prediction accuracy of MTGS over STGS were extremely close to the original ones (data not shown). We therefore expect that the different performances of MTGS over STGS between POP-A and POP-B (Fig. 1) are mainly driven by the differences between the two populations in heritabilities (Table S2) and genetic correlations (Table 1). Nonetheless, studies on MTGS using different marker densities should be performed in the near future. Such studies should additionally consider factors such as the genetic architecture and the degree of phenotyping imbalance within training sets for the predicted traits. These future studies would improve understanding of the marker density effect on the superiority of MTGS over STGS and could help generalize our findings to other scenarios.

In summary, our findings using experimental rye breeding data complement the results of past simulation studies (Calus and Veerkamp 2011; Guo et al. 2014; Hayashi and Iwata 2013; Jia and Jannink 2012) and of experimental plant data (Bao et al. 2015). Together, they provide a practical understanding of the conditions needed to exploit the benefits of MTGS over STGS. In this sense, the ideal scenario for plant breeders to take advantage of MTGS is predicting a low-heritability trait with scarce phenotypic records. The predictions are then supported by a genetically correlated, highly heritable indicator trait that was measured on a large scale.

Practical implications of MTGS for PC and GY in rye

According to our results using balanced training set scenarios, only PC in POP-A benefited from 2T- over STGS (Fig. 1). These improvements in prediction accuracy were not pronounced. Nevertheless, they imply that a 2TGS model with GY as a highly heritable, genetically correlated indicator trait predicts PC slightly more accurately than STGS when PC has a relatively low heritability. In addition, GY and PC of POP-A benefited notably from MTGS when phenotypic information within the training set was partially available for the target trait and completely available for the indicator trait (Fig. 1). In general, due to the high economic importance of GY in cereal crops, plant breeding programs tend to handle more phenotypic records for GY than for any other trait. Therefore, exploiting the benefits of 2TGS over STGS with unbalanced training sets would be more realistic for PC than for GY. However, SEW is an important component determining GY in cereals (Lin and Allaire 1977; Pinto et al. 2002; Yoshira et al. 2000), and both traits were positively correlated in the present study (Table 1). We therefore further examined the potential benefits over STGS of using SEW as an indicator trait to predict GY by means of 2TGS (Table S5). Pronounced benefits of 2TGS over STGS in GY prediction accuracy were only observed in the unbalanced training set scenario with maximized SEW information for POP-A. We presume that 2TGS offered no benefit over STGS within balanced training set scenarios because GY and SEW had high and similar heritability estimates in both rye testcross populations (Table S2). Additionally, 2TGS gave almost no benefit over STGS with unbalanced training sets in POP-B. This could be mainly a consequence of the substantially lower genetic correlation between GY and SEW compared to that of POP-A (Table 1). Secondly, it could reflect the fact that SEW
was slightly less heritable than GY in POP-B (Table S2). Phenotyping SEW could theoretically be less time- and resource-consuming than measuring GY in field trials. Nevertheless, the plausibility of this MTGS strategy for GY prediction, using SEW as the correlated indicator trait, should be evaluated according to the optimal allocation of resources between GY and SEW phenotyping. In general, we consider that realistic and cost-effective strategies for implementing MTGS in plant breeding programs should ultimately be sought. This would parallel what has already been done for STGS implementation for GY prediction in cereal breeding (Endelman et al. 2013; Longin et al. 2015).

Optimizing the number of traits handled by MTGS models

MTGS models have rarely been fitted in the past using more than two traits simultaneously. Tsuruta et al. (2011) performed MTGS with a model that simultaneously considered 18 traits of US Holstein cattle. They reported that prediction accuracies increase when switching from STGS to MTGS, but that the increase depends on the trait being predicted. In contrast, Bao et al. (2015) found no significant benefits when shifting from STGS to an MTGS model that simultaneously considered four traits related to SDS resistance in soybean. However, neither study discussed the benefits and limitations of progressively adding traits to an MTGS model that already performed well with a few traits. Furthermore, MT methods are known to demand more time and computing resources than their ST counterparts. This limitation is expected to become more serious as the number of phenotypic records or traits increases (Bauer and Léon 2008; Calus and Veerkamp 2011; Hayashi and Iwata 2013). This is probably another reason why MTGS has not been routinely adopted until now. Consequently, an optimal balance between prediction accuracy improvement and computational efficiency should be investigated when additional traits are considered within the predictive model. In the present study, GY and PC prediction accuracy improved only for the MTGS model including GY, PC and SEW with a training set of maximized SEW information in POP-A (Table 2). One advantage of 3T- over 2TGS is therefore that it can simultaneously improve the prediction accuracies of two traits with scarce phenotypic records, provided information for a third correlated trait is completely available. Nevertheless, some caution is needed when shifting from 2T- to 3TGS models.

- Adding more correlated traits to the MTGS model increases time and computing resource requirements. It could also increase the probability of collinearity problems, which would cause difficulties such as convergence problems when solving the mixed-model equations.
- The computational burden would also increase when unbalanced datasets are handled by the mixed-model equations.
- The contribution of each newly added trait should be expected to tend asymptotically towards zero, so the more traits an MTGS model has, the smaller the contribution of each additional trait.
- The interpretation of models with more traits becomes more complex.

Last but not least, the present study compared two strategies for a trait with scarce phenotypic records. With 2TGS and an indicator trait with complete information (Tables S3 and S5), prediction accuracies were in general higher than, or similar to, those obtained by 3TGS. In the 3TGS models, both of these traits had partial information and a third correlated trait had complete information (Table S4). According to these results for GY and PC in rye, if the aim is to improve the prediction accuracy of only one particular trait, 2TGS should be sufficient.

Predicting SIs for GY and PC in rye

There are several possible selection procedures to maximize the economic value of animals or plants, including tandem selection, independent culling levels, or simultaneous selection by means of an index. The SI method is expected to give the most rapid improvement of the economic value (Falconer and MacKay 1996). Applying GS to predict different SIs could therefore help bridge the gap between GS and MT improvement. In the present study, we first presented a direct method for SI prediction, in which each SI was treated as a single trait within the GS model. Four SIs were predicted using this method, and each reached a different level of prediction accuracy (Fig. 3). The b values determine the contribution of each trait to the SI. We therefore expected two factors to influence the prediction accuracy of the SI. These were the relative importance of each trait within a particular SI (see the b_GY:b_PC ratio in Table S6) and the GY and PC prediction accuracies. However, this trend was not clearly observed. The differences in heritability among the SIs were not large (Table S6). Even so, the higher the heritability of an SI, the better its prediction accuracy in general (Fig. 3). The heritability of a sum of traits includes not only the phenotypic and genetic variances of the traits but also the covariances among them. It therefore does not necessarily match the average heritability of the traits composing the index (Openshaw and Hadley 1984). This makes the heritability of an SI difficult to interpret, but also suggests that the covariances between traits play some role in the prediction accuracies of the SI.
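The following Python sketch shows numerically why the heritability of an index need not equal the average heritability of its component traits. It computes two-trait index weights and the index heritability b′Gb/b′Pb under several assumptions:

- The optimum index follows the textbook Smith-Hazel form b = P⁻¹Ga.
- The restricted index follows a Kempthorne-Nordskog-type restriction, giving zero expected genetic change in one trait.
- These formulations may not exactly match how the O-SI and R-SI weights in Table S6 were derived.
- The covariance matrices are hypothetical and are not estimates from POP-A or POP-B.
- All function names are illustrative.

```python
"""Two-trait selection index weights and index heritability (sketch).

Assumes the Smith-Hazel optimum index b = P^-1 G a and a Kempthorne-Nordskog
restriction of zero expected genetic change in one trait. Inputs are 2x2
phenotypic (P) and genetic (G) covariance matrices for (GY, PC).
"""


def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular matrix")
    return [[d / det, -b / det], [-c / det, a / det]]


def optimum_weights(P, G, a):
    """Smith-Hazel weights b = P^-1 G a for economic weights a."""
    return mat_vec(inv2(P), mat_vec(G, a))


def restricted_weights(P, G, a, keep):
    """Weights giving zero expected genetic change in trait index `keep`.

    b_r = b0 - P^-1 g (g' P^-1 g)^-1 g' b0, where g is column `keep` of the
    symmetric matrix G and b0 is the unrestricted optimum index.
    """
    b0 = optimum_weights(P, G, a)
    g = [G[i][keep] for i in range(len(G))]
    p_inv_g = mat_vec(inv2(P), g)
    scale = dot(g, b0) / dot(g, p_inv_g)
    return [b0_i - p_i * scale for b0_i, p_i in zip(b0, p_inv_g)]


def gain_direction(G, b):
    """Expected genetic change per trait, up to the factor i / sigma_I."""
    return mat_vec(G, b)


def index_heritability(P, G, b):
    """h2 of I = b'y, i.e. b'Gb / b'Pb; covariances enter both terms."""
    return dot(b, mat_vec(G, b)) / dot(b, mat_vec(P, b))


# Hypothetical standardized (GY, PC) with a negative genetic correlation.
P = [[1.0, -0.3], [-0.3, 1.0]]
G = [[0.8, -0.35], [-0.35, 0.7]]   # trait heritabilities 0.8 and 0.7

b_osi = optimum_weights(P, G, [1.0, 1.0])             # O-SI analogue, 1:1 weights
b_rgy = restricted_weights(P, G, [1.0, 0.0], keep=1)  # R-SI_GY analogue: hold PC
print(gain_direction(G, b_rgy))                       # PC entry is zero up to rounding
print(round(index_heritability(P, G, b_osi), 3))      # 0.583, below both trait h2
```

With these hypothetical values the index heritability (about 0.58) falls below both trait heritabilities. The reason is that the negative genetic covariance removes a larger share of b′Gb than the negative phenotypic covariance removes of b′Pb.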

In addition, a group of reversed prediction methods was computed and then compared to the direct method (Table 3). These methods construct each SI from ST- and 2TGS predictions for GY and PC. Bauer and Léon (2008) constructed an O-SI for two simulated traits with a 1:1 price ratio using classic ST- and MT-BLUP approaches. Their approach resembles the reversed methods using balanced training sets in the present study. In general, constructing the SI from MT-BLUPs gave them better selection responses than using ST-BLUPs, especially when the two traits were negatively correlated. In terms of prediction accuracy, however, our reversed methods improved substantially on the direct SI prediction method only for some SIs, and only when based on 2TGS predictions with unbalanced training sets (Table 3). One disadvantage of the direct method is that it can only handle balanced phenotypic information. It consequently cannot exploit the benefits of MTGS with unbalanced training sets for different traits. Nonetheless, these improvements in prediction accuracy were not observed for all SIs. We presume that this inconsistency is partially related to the b values, although demonstrating this is beyond the scope of the present study. As a general recommendation, the direct method should be preferred for SI genomic prediction when balanced phenotypic information is available, because it is simpler and more efficient than the reversed methods.

Conclusion

This is the first study using experimental plant breeding data that:

(1) discusses in depth the practical implications of exploiting the information of correlated traits by means of MTGS;
(2) applies GS tools for different SIs to confront an undesired correlation between two traits that are intended to be bred in rye for feeding purposes.

From the results, we conclude that most findings from past MTGS studies using simulated data also hold true for GY and PC data in rye. Thus, the ideal scenario for exploiting the benefits of MTGS is predicting a trait of relatively low heritability with scarce phenotypic records. The predictions are then supported by a genetically correlated, highly heritable indicator trait that was measured on a large scale. In a plant breeding context, this ideal scenario is more realistic for PC than for GY. In the present study, GS relied only on trait information contained within the training sets. Based on experimental plant breeding data, this indicates that MTGS could also be implemented to achieve more cycles of selection per unit of time in the same way as STGS. The difference in marker density between the two testcross rye populations is not expected to have influenced the superiority of MTGS over STGS. Even so, these results should be validated by means of simulations before being extrapolated to other situations. In addition, 3TGS can simultaneously improve the prediction accuracies of two traits with partial information. However, if the aim is exclusively to improve the prediction accuracy of one particular trait, a 2TGS model is expected to be better suited. In general, for balanced phenotypic information, we recommend performing GS with the SI treated as a single trait, as this is a simple, direct and efficient way of prediction. Last but not least, further studies are required to compare the GS methods suggested for a base SI by Bernardo (2010) and presented for the O-SI and R-SI in our study. They should be compared with alternative SI genomic prediction methods (Cerón-Rojas et al. 2008; Dekkers 2007). We expect our findings, considered together with past simulated and real/experimental data studies, to improve understanding of MT improvement by means of GS. They should also open the door to future studies on how plant breeding institutions and companies can implement these methods cost-effectively.

Author contribution statement YW and AS wrote the manuscript. YW, AS and YZ performed calculations and constructed figures and tables. TM and PW supervised the acquisition of phenotypic data. YW, AS, YZ and JCR interpreted the results. YZ, JCR, TM and PW helped to improve the manuscript.

Acknowledgments This research was conducted within the project “Erweiterung der genetischen Basis von Hybridroggen für Korn- und Biomasseleistung sowie Trockenheitstoleranz mittels Mehrlinienkartierung und DH-Technik” (Broadening the genetic basis of hybrid rye for grain and biomass performance and drought tolerance by means of multi-line mapping and DH technology). The project was financially supported by the German Federal Ministry of Food and Agriculture via the “Fachagentur Nachwachsende Rohstoffe e.V.”, Gülzow, Germany (Grant ID: 22021711).

Compliance with ethical standards

Conflict of interest The authors declare that they have no conflict of interest.

Ethical statement The experiments were performed according to the current laws of Germany.

References

Albrecht T, Wimmer V, Auinger HJ, Erbe M, Knaak C, Ouzunova M, Simianer H, Schön CC (2011) Genome-based prediction of testcross values in maize. Theor Appl Genet 123:339–350
Bao Y, Kurle JE, Anderson G, Yong ND (2015) Association mapping and genomic prediction for resistance to sudden death syndrome in early maturing soybean germplasm. Mol Breed 35:128
Bauer A, Léon J (2008) Multiple-trait breeding values for parental selection in self-pollinating crops. Theor Appl Genet 116:235–242
13
286 Theor Appl Genet (2016) 129:273–287

Bernardo RN (2010) Breeding for quantitative traits in plants. Jia Y, Jannink JL (2012) Multiple-trait genomic selection meth-
Stemma Press, Woodbury ods increase genetic value prediction accuracy. Genetics
Butler DG, Cullis BR, Gilmour AR, Gogel B (2009) ASReml-R refer- 192:1513–1522
ence manual. The State of Queensland, Department of Primary Kempthorne O, Nordskog AW (1959) Restricted selection indices.
Industries and Fisheries, Brisbane Biometrics 15:10–19
Calus MPL, Veerkamp RF (2011) Accuracy of multi-trait genomic Kindred DR, Verhoeven TMO, Weightman RM, Swanston JS, Agu
selection using different methods. Genet Sel Evol 43:26 RC, Brosnan JM, Sylvester-Bradley R (2008) Effects of vari-
Cerón-Rojas JJ, Sahagún-Castellanos J, Castillo-González F, Santacruz- ety and fertiliser nitrogen on alcohol yield, grain yield, starch
Varela A, Crossa J (2008) A restricted selection index method and protein content, and protein composition of winter wheat. J
based on eigenanalysis. J Agric Biol Environ Stat 13:440–457 Cereal Sci 48:46–57
Christensen OF, Madsen P, Nielsen B, Ostersen T, Su G (2012) Lin C, Allaire F (1977) Heritability of a linear combination of traits.
Single-step methods for genomic evaluation in pigs. Animal Theor Appl Genet 51:1–3
6:1565–1571 Liu T, Qu H, Luo C, Li X, Shu D, Lund MS, Su G (2014) Genomic
Crossa J, de Los Campos G, Pérez P, Gianola D, Burgueño J, Araus selection for the improvement of antibody response to newcas-
JL, Makumbi D, Singh RP, Dreisigacker S, Yan J, Arief V, Ban- tle disease and avian influenza virus in chickens. PLoS One.
ziger M, Braun HJ (2010) Prediction of genetic values of quan- doi:10.1371/journal.pone.0112685
titative traits in plant breeding using pedigree and molecular Longin CFH, Mi X, Würschum T (2015) Genomic selection in wheat:
markers. Genetics 186:713–724 optimum allocation of test resources and comparison of breed-
Dekkers JCM (2007) Prediction of response to marker-assisted and ing strategies for line and hybrid breeding. Theor Appl Genet
genomic selection using selection index theory. J Anim Breed 128:1297–1306
Genet 124:331–341 Meuwissen THE (2009) Accuracy of breeding values of ‘unrelated’
Dolan DJ, Stuthman DD, Kolb FL, Hewings AD (1996) Multiple trait individuals predicted by dense SNP genotyping. Genet Sel Evol
selection in a recurrent selection population in oat (Avena sativa 41:35
L.). Crop Sci 36:1207–1211 Meuwissen THE, Hayes BJ, Goddard ME (2001) Prediction of total
Eagles HA, Frey KJ (1974) Expected and actual gains in eco- genetic value using genome-wide dense marker maps. Genetics
nomic value of oat lines from five selection methods. Crop Sci 157:1819–1829
14:861–864 Miedaner T, Hübner M, Korzun V, Schmiedchen B, Bauer E,
Elgin JH, Hill RR, Zeiders KE (1970) Comparison of four meth- Haseneyer G, Wilde P, Reif JC (2012) Genetic architecture of
ods of multiple trait selection for five traits in alfalfa. Crop Sci complex agronomic traits examined in two testcross populations
10:190–193 of rye (Secale cereale L.). BMC Genomics 13:706
Endelman JB, Atlin GN, Beyene Y, Semagn K, Zhang X, Sorrells ME, Openshaw SJ, Hadley HH (1984) Selection indexes to modify protein
Jannink J-L (2013) Optimal design of preliminary yield trials concentration of soybean seeds. Crop Sci 24:1–4
with genome-wide markers. Crop Sci 54:48–59 Perez C, Juliano BO, Liboon SP, Alcantara JM, Cassman KG (1996)
Falconer DS, MacKay TFC (1996) Introduction to quantitative genet- Effects of late nitrogen fertilizer application on head rice
ics, 4th edn. Ronald Press Company, New York yield, protein content, and grain quality of rice. Cereal Chem
Guo G, Zhao F, Wang Y, Zhang Y, Du L, Su G (2014) Comparison of 73:556–560
single-trait and multiple-trait genomic prediction models. BMC Piepho HP, Möhring J (2007) Computing heritability and selec-
Genet 15:30 tion response from unbalanced plant breeding trials. Genetics
Hayashi T, Iwata H (2013) A Bayesian method and its variational 177:1881–1888
approximation for prediction of genomic breeding values in mul- Pinto RJB, Alvarez JB, Martínet LM (2002) Preliminary evaluation
tiple traits. BMC Bioinformatics 14:34 of grain yield components in hexaploid tritordeum. Crop Breed
Hazel LN (1943) The genetic basis for constructing selection indexes. Appl Biotechnol 2:213–218
Genetics 28:476–490 R Core Team (2014) R: A language and environment for statistical
Hazel LN, Lush JL (1942) The efficiency of three methods of selec- computing. R Foundation for Statistical Computing, Vienna,
tion. J Hered 33:393–399 Austria. ISBN 3-900051-07-0. http://www.R-project.org
Heffner EL, Jannink JL, Iwata H, Souza E, Sorrells ME (2011) Reif JC, Hamrit S, Heckenberger M, Schipprack W, Maurer HP, Bohn
Genomic selection accuracy for grain quality traits in biparental M, Melchinger AE (2005) Trends in genetic diversity among
wheat populations. Crop Sci 51:2597–2606 European maize cultivars and their parental components during
Henderson CR (1975) Best linear unbiased estimation and prediction the past 50 years. Theor Appl Genet 111:838–845
under a selection model. Biometrics 31:423–447 Rutkoski J, Benson J, Jia Y, Brown-Guedira G, Jannink JL, Sorrells
Henderson CR, Quass RL (1976) Multiple trait evaluation using rela- ME (2012) Evaluation of genomic prediction methods for Fusar-
tives’ records. J Anim Sci 43:1188–1197 ium head blight resistance in wheat. Plant Genome 5:51–61
Heslot N, Yang HP, Sorrells ME, Jannink JL (2012) Genomic selection Searle SR (1982) Matrix algebra useful for statistics. Wiley, New York
in plant breeding: a comparison of models. Crop Sci 52:146–160 Shu Y, Yu D, Wang D, Bai X, Zhu Y, Guo C (2013) Genomic selection
Hofheinz N, Borchardt D, Weissleder K, Frisch M (2012) Genome- of seed weight based on low-density SCAR markers in soybean.
based prediction of test cross performance in two subsequent Genet Mol Res 12:2178–2188
breeding cycles. Theor Appl Genet 125:1639–1645 Smith HF (1936) A discriminant function for plant selection. Ann
Holbrook CC, Burton JW, Carter TE (1989) Evaluation of recurrent Eugen 7:240–250
restricted index selection for increasing yield while holding seed Solberg TR, Sonesson AK, Woolliams JA, Meuwissen T (2008)
protein constant in soybean. Crop Sci 29:324–329 Genomic selection using different marker types and densities. J
Hübner M, Wilde P, Schmiedchen B, Dopierala P, Gowda M, Reif Anim Sci 86:2447–2454
JC, Miedaner T (2013) Hybrid rye performance under natural Spindel J, Begum H, Akdemir D, Virk P, Collard B, Redoña E, Atlin
drought stress in Europe. Theor Appl Genet 126:475–482 G, Jannink JL, McCouch SR (2015) Genomic selection and
Jarquín D, Kocak K, Posadas L, Hyma K, Jedlicka J, Graef G, Lorenz association mapping in rice (Oryza sativa): effect of trait genetic
A (2014) Genotyping by sequencing for genomic prediction in a architecture, training population composition, marker number
soybean breeding population. BMC Genomics 15:740 and statistical model on accuracy of rice genomic selection in

13
Theor Appl Genet (2016) 129:273–287 287

elite, tropical rice breeding lines. PLoS Genet. doi:10.1371/jour- Williams JS (1962) The evaluation of a selection index. Biometrics
nal.pgen.1004982 18:375–393
Suwantaradon K, Eberhart SA, Mock JJ, Owens JC, Guthrie WD Windhausen VS, Atlin GN, Hickey JM, Crossa J, Jannink JL, Sor-
(1975) Index selection for several agronomic traits in the BSSS2 rells ME, Raman B, Cairns JE, Tarekegne A, Semagn K, Bey-
maize population. Crop Sci 15:827–833 ene Y, Grudloyma P, Technow F, Riedelsheimer C, Melchinger
Tsuruta S, Misztal I, Aguilar I, Lawlor TJ (2011) Multiple-trait AE (2012) Effectiveness of genomic prediction of maize hybrid
genomic evaluation of linear type traits using genomic and phe- performance in different breeding populations and environments.
notypic data in US Holsteins. J Dairy Sci 94:4198–4204 G3 Genes| Genomes| Genetics 2:1427–1436
Wang Y, Mette MF, Miedaner T, Gottwald M, Wilde P, Reif JC, Zhao Yoshira T, Karasawa T, Nakatsuka K (2000) Yielding ability of winter
Y (2014) The accuracy of prediction of genomic selection in elite triticale in a heavy snow area of central Hokkaido. Crop Breed
hybrid rye populations surpasses the accuracy of marker-assisted Appl Biotechnol 2:213–218
selection and is equally augmented by multiple field evaluation Zhao Y, Gowda M, Liu W, Würschum T, Maurer HP, Longin FH,
locations and test years. BMC Genomics 15:556 Ranc N, Reif JC (2012) Accuracy of genomic selection in
Wang Y, Mette MF, Miedaner T, Wilde P, Reif JC, Zhao Y (2015) First European maize elite breeding populations. Theor Appl Genet
insights into the genotype–phenotype map of phenotypic stabil- 124:769–776
ity in rye. J Exp Bot 66:3275–3284

13

You might also like