PIC Pig Dataset
ABSTRACT Although common datasets are an important resource for the scientific community and can be used to address important questions, genomic datasets of a meaningful size have not generally been available in livestock species. We describe a pig dataset that PIC (a Genus company) has made available for comparing genomic prediction methods. We also describe genomic evaluation of the data using methods that PIC considers best practice for predicting and validating genomic breeding values, and we discuss the impact of data structure on accuracy. The dataset contains 3534 individuals with high-density genotypes, phenotypes, and estimated breeding values for five traits. Genomic breeding values were calculated using BayesB, with phenotypes and de-regressed breeding values, and using a single-step genomic BLUP approach that combines information from genotyped and un-genotyped animals. The genomic breeding value accuracy increased with increased trait heritability and with increased relationship between training and validation. In nearly all cases, BayesB using de-regressed breeding values outperformed the other approaches, but the single-step evaluation performed only slightly worse. This dataset was useful for comparing methods for genomic prediction using real data. Our results indicate that validation approaches accounting for relatedness between populations can correct for potential overestimation of genomic breeding value accuracies, with implications for genotyping strategies to carry out genomic selection programs.

KEYWORDS pig; genomic relationships; GenPred; cross-validation; shared data resources
The availability of common datasets is important to the scientific community. These datasets can be used to compare and benchmark methods, to provide links between different datasets facilitating meta-analysis, and to enable researchers who lack such resources to test hypotheses that address important questions. In human genetics, many common datasets are available, e.g. International HapMap Project (The International HapMap Consortium 2003), 1000 Genomes (The 1000 Genomes Project Consortium 2010), Framingham Heart Study (Dawber et al. 1951), Mouse Genomes Project (Keane et al. 2011), and have led to numerous discoveries.

Because commercial data may hold economic value, common datasets of meaningful size have in general been unavailable in livestock, particularly during the genomics era in which organizations have made large investments in genotypes and phenotypes. Efforts to implement genomic selection in a number of livestock species have generated datasets that contain multigenerational phenotypes, genotypes, pedigree, and progeny test-derived estimated breeding values (EBV). Such commonly available datasets could be used in numerous ways, in particular to evaluate new methods to estimate genomic breeding values (gEBV). Additionally, important questions relating to the nature and properties of genomic selection (GS) could be assessed with such a dataset, including the effect on inbreeding (Daetwyler et al. 2007), the importance of close relatives when training prediction equations (Habier et al. 2010; Clark et al. 2011), and the dynamics of long-term response to selection (Jannink 2010).
As part of the effort to implement GS in pigs (Sus scrofa), PIC (a Genus company) created several datasets containing individuals with phenotypes for a number of traits, high-density genotypes (60k), polygenic estimated breeding values (pEBV; no genomic information) for pure and crossbred traits, and complete pedigree. PIC has implemented several methods for routine genomic evaluation, including a two-step approach using BayesA/BayesB (Meuwissen et al. 2001) and the single-step evaluation described by Misztal et al. (2009), which, based on extensive internal testing, PIC considers current best

Copyright © 2012 Cleveland et al.
doi: 10.1534/g3.111.001453
Manuscript received October 20, 2011; accepted for publication December 19, 2011
This is an open-access article distributed under the terms of the Creative Commons Attribution Unported License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Supporting information is available online at http://www.g3journal.org/lookup/suppl/doi:10.1534/g3.111.001453/-/DC1
1 Corresponding author: Genus plc, 100 Bluegrass Commons Blvd., Suite 2200, Hendersonville, TN 37075. E-mail: matthew.cleveland@genusplc.com
accuracies similar to BayesB_ebv, but smaller in most cases. No strong evidence exists to support the existence of major QTL for these traits, and therefore, BayesB would not be expected to have a large advantage (results not shown). For traits affected by large numbers of loci (genes), the expectation is that a genomic BLUP approach should perform similarly to BayesB (Luan et al. 2009; Habier et al. 2010). In SStep, however, genomic relationships are augmented by pedigree relationships between genotyped and un-genotyped animals for a more accurate modeling of individual relationships, especially when only a subset of the population is genotyped (Legarra et al. 2009). Assuming no large QTL exist, any large differences between BayesB_ebv and SStep may be due to differences in the information content of the phenotypes used by each method and to differences in the training set size relative to differences in effective heritability. SStep outperformed BLUP, indicating that genomic information was useful, but only when the heritability was high. These findings, taken together, highlight the need for additional genotyping to improve lowly heritable traits using genomics, particularly when only phenotypes are available.

Including genomic information in a genetic evaluation is expected to increase the accuracy of the EBV, but the results presented here (Figure 2) illustrate the potential to actually decrease the accuracy when analyzing phenotypes, especially when the heritability is low. When heritability is not high, large numbers of genotyped and phenotyped animals are needed to achieve even moderate gEBV accuracy (Goddard 2009; Goddard and Hayes 2009). Such numbers were not available in this dataset. Alternatively, the heritability of the trait can be increased by using de-regressed progeny test EBV as phenotypes, in which the effective heritability is proportional to the pEBV accuracy (Garrick et al. 2009). The expected accuracy of the gEBV will then be based on this higher heritability (e.g. BayesB_ph vs. BayesB_ebv). Often, the number of available genotyped samples is based on practical or budgetary considerations, and the decision to incorporate

Figure 4 Genomic breeding value accuracy of the 6-fold cross-validation (XVal) for individuals with one parent (1P), both parents (2P), at least one offspring (O), at least one parent and one offspring (PO), or no parents or offspring (N) genotyped in the training set, for T4. The trait was analyzed with BayesB with phenotypes (BayesB_ph), the single-step approach (SStep), and BayesB with estimated breeding values (BayesB_ebv).
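The relationship categories named in the Figure 4 caption (1P, 2P, O, PO, N) depend on which relatives of a validation individual are genotyped in the training set. The following is a minimal Python sketch of such a classification, not the procedure used for the paper. The pedigree layout, the function name `relationship_category`, the toy animal IDs, and the precedence rule (PO first, then O, then 2P and 1P) are all assumptions made for illustration.

```python
from typing import Dict, Optional, Set, Tuple

# Assumed (hypothetical) pedigree layout: animal ID -> (sire ID, dam ID).
# None marks an unknown parent.
Pedigree = Dict[str, Tuple[Optional[str], Optional[str]]]


def relationship_category(animal: str, pedigree: Pedigree, training: Set[str]) -> str:
    """Label a validation animal as 1P, 2P, O, PO, or N."""
    sire, dam = pedigree.get(animal, (None, None))
    # Count parents that are genotyped members of the training set.
    n_parents = sum(p is not None and p in training for p in (sire, dam))
    # An offspring is any training animal that lists this animal as a parent.
    has_offspring = any(
        child in training and animal in parents
        for child, parents in pedigree.items()
    )
    # Assumption: PO takes precedence, then O, then 2P and 1P.
    if has_offspring and n_parents > 0:
        return "PO"
    if has_offspring:
        return "O"
    if n_parents == 2:
        return "2P"
    if n_parents == 1:
        return "1P"
    return "N"


# Toy pedigree (made-up IDs, for illustration only).
pedigree: Pedigree = {
    "s1": (None, None),
    "d1": (None, None),
    "a": ("s1", "d1"),
    "b": ("s1", None),
    "c": ("a", None),
    "e": (None, None),
    "f": ("s1", "d1"),
    "g": (None, None),
    "h": ("g", None),
}
training = {"s1", "d1", "c", "h"}
categories = {
    x: relationship_category(x, pedigree, training)
    for x in ("a", "b", "e", "f", "g")
}
```

In this toy pedigree, `a` has both parents and one offspring in the training set and is labeled PO, whereas `e` has no genotyped parents or offspring in training and is labeled N. Accuracy could then be summarized within each label, in the manner of the per-category comparison the caption describes.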
ACKNOWLEDGMENTS
Nader Deeb, Olwen Southwood, and Lizhen Wang are acknowledged
for their assistance in creating the dataset. J.M.H. was funded by
Australian Research Council project LP100100880, of which Genus
plc is a co-funder.
Figure 6 Genomic breeding value accuracy of the validation set containing young animals (YoungVal), for all traits, using a standard polygenic BLUP (BLUP), BayesB with phenotypes (BayesB_ph), the single-step approach (SStep), and BayesB with estimated breeding values (BayesB_ebv).

LITERATURE CITED
Clark, S. A., J. M. Hickey, and J. H. J. van der Werf, 2011 Different models of genetic variation and their effect on genomic evaluation. Genet. Sel. Evol. 43: 18.
Cleveland, M. A., S. Forni, D. J. Garrick, and N. Deeb, 2010 Prediction of genomic breeding values in a commercial pig population. Proceedings of the 9th World Congress on Genetics Applied to Livestock Production. Leipzig, Germany, August 1–6, 2010, Paper 0266.
Daetwyler, H. D., B. Villanueva, P. Bijma, and J. A. Woolliams, 2007 Inbreeding in genome-wide selection. J. Anim. Breed. Genet. 124: 369–376.
Dawber, T. R., G. F. Meadors, and F. E. Moore, Jr., 1951 Epidemiological approaches to heart disease: the Framingham Study. Am. J. Public Health 41: 279–286.