GS Agb 701
Dr. Supriya
Scientist
ICAR-CIRB, Hisar
Quantitative Traits
Nevertheless, to date, most genetic progress for quantitative traits in livestock has been
made by selection on phenotype or Estimated Breeding Values (EBV) derived from the
phenotype without knowledge of the number of genes that affect the trait or the effects of
each gene.
What are molecular techniques in general?
Types of Genetic Markers:
• Morphological
• Biochemical
• Chromosomal
Characteristics of Genetic Markers:
• Co-dominant expression
• Nondestructive assay
• Complete penetrance
• Early onset of phenotypic expression
• High polymorphism
• Random distribution throughout the genome
• Assay can be automated
04/10/2024 10
Advantages of molecular techniques
The knowledge of the genetic architecture of quantitative traits: the underlying
genetic basis of a phenotypic trait
Greater genetic gain and higher accuracy of selection
(3) PCR
Markers must be polymorphic
[Figure: gel images, lanes 1-8, for parents P1 and P2 at Marker A and Marker B]
Using markers A and B (flanking the QTL, 5 cM on each side):
$1 - 2 r_A r_B \approx 99.5\%$
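The 99.5% figure can be checked with simple arithmetic. The sketch below assumes that rA and rB are the recombination fractions between the QTL and markers A and B, and that 5 cM corresponds to r ≈ 0.05 (an approximation that only holds for short distances). The function name is hypothetical.

```python
def flanking_marker_reliability(r_a: float, r_b: float) -> float:
    """Reliability of selecting on two flanking markers: 1 - 2 * rA * rB.

    r_a, r_b: recombination fractions between the QTL and each marker
    (assumed here to be about 0.05 for a 5 cM distance).
    """
    return 1.0 - 2.0 * r_a * r_b


reliability = flanking_marker_reliability(0.05, 0.05)
print(f"Reliability with flanking markers: {reliability:.1%}")
```

The closer the two markers are to the QTL, the smaller the product rA·rB and the closer the reliability gets to 100%.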
[Figure: F1 and large F2 populations; individuals selected for the QTL]
MARKER-ASSISTED SELECTION
Method whereby phenotypic selection is based on DNA markers
Role of markers in Genetic improvement
Varies by objective, germplasm, and trait genetic architecture (Bernardo, 2008).
Limitations of MAS
The application of MAS in animal breeding practices has been very limited due to
several reasons.
Molecular geneticists used markers that were large DNA segments (100-1000 base
pairs) but were not very close to the actual genes.
As the success of MAS depends on the closeness of markers and genes, the term
linkage disequilibrium (LD) was used to denote this closeness.
High LD indicates that the marker is closer to the gene, with a lower chance
of recombination between the marker allele and the gene allele across generations.
MAS requires LD of 30% or more between markers and genes for the markers to act
as proxies for the genes.
Limitations of MAS
In practice, very few markers have significant effects on economic traits,
and each of these markers explains only a marginal proportion of the genetic
variation.
Computationally intensive
Computationally Very intensive
(LE-MAS)
A genome-wide approach typically provides
better predictions
[Figure: accuracy (rA) of MAS versus genomic selection (GS)]
$Y_i = \mu + \mathrm{animal}_i + \sum \mathrm{SNP}_{ijk} + e_i$
Where,
A GRM constructed over all the markers can be used in a regular mixed model instead of
an additive relationship matrix (Goddard, 2009)
The effectiveness of GBLUP depends on the closeness of GRM to the realized genetic
relationships
Appropriate quality control of genomic data before constructing the GRM can avoid
much of the bias in the resulting estimates
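As an illustration of quality control before building the GRM, the sketch below filters SNPs on call rate and minor allele frequency (MAF). The function name, the 0/1/2 coding with NaN for missing calls, and the thresholds (0.90 and 0.05) are assumptions chosen for the example, not values taken from the slides.

```python
import numpy as np


def qc_keep_mask(genotypes, min_call_rate=0.90, min_maf=0.05):
    """Return a boolean mask of SNPs (columns) that pass QC.

    genotypes: animals x SNPs array coded 0/1/2, with np.nan for missing calls.
    Thresholds are illustrative assumptions.
    """
    g = np.asarray(genotypes, dtype=float)
    called = ~np.isnan(g)
    n_called = called.sum(axis=0)
    call_rate = n_called / g.shape[0]
    # Allele frequency from the called genotypes only.
    allele_sum = np.where(called, g, 0.0).sum(axis=0)
    p = allele_sum / np.maximum(2 * n_called, 1)
    maf = np.minimum(p, 1.0 - p)
    return (call_rate >= min_call_rate) & (maf >= min_maf)


genotypes = np.array([
    [0, 1, 2, 0],
    [1, 1, 2, np.nan],
    [2, 1, 2, np.nan],
    [1, 0, 2, 0],
])
keep = qc_keep_mask(genotypes)
print(keep)  # third SNP is monomorphic, fourth has a low call rate
```

Only the columns flagged `True` would then be passed on to the GRM construction.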
What to include?
1. Only females
2. Only males, e.g. proven progeny-tested (PT) bulls with EBV
3. Both males and females: more accuracy (larger reference population)
(Sire families: full sibs (FS) and half sibs (HS), trios)
Prospects:
Collaboration with developed countries
Across region genomic prediction with pooled data
Assumptions:
Linkage disequilibrium (LD) between the markers and the QTL and, ideally, for
every QTL a marker in perfect LD, i.e. r² = 1, where r² is the squared
correlation between the alleles at two loci.
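To make the LD measure concrete, the sketch below computes r² between two biallelic loci from phased haplotypes, using r² = D² / (pA(1 - pA) pB(1 - pB)) with D = pAB - pA·pB. The 0/1 haplotype coding and the function name are assumptions for the example.

```python
import numpy as np


def ld_r2(hap_a, hap_b):
    """Squared correlation (r^2) between alleles at two biallelic loci.

    hap_a, hap_b: allele indicators (0/1) per haplotype at locus A and B.
    Both loci must be polymorphic in the sample.
    """
    a = np.asarray(hap_a, dtype=float)
    b = np.asarray(hap_b, dtype=float)
    p_a, p_b = a.mean(), b.mean()
    d = (a * b).mean() - p_a * p_b  # disequilibrium coefficient D
    return d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))


perfect_ld = ld_r2([0, 0, 1, 1], [0, 0, 1, 1])  # marker always travels with the QTL allele
no_ld = ld_r2([0, 1, 0, 1], [0, 0, 1, 1])       # alleles are independent
print(perfect_ld, no_ld)
```

A marker in perfect LD with the QTL gives r² = 1, which is the ideal case described in the assumption.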
1. M1------------Q1------M2--------Q2------------M3----------Q3-----------M4------------Q4
2. M11--M12--Q1--M21--M22---Q2---M31---M32--------Q3----M41----M42-------Q4
Estimation of marker effects depends on the structure of the reference population
=> Effect of a single marker (unrelated reference and validation populations) or
of a haplotype (related populations)?
2. For bulls: y = DYD (daughter yield deviation) = twice the weighted average, for
a bull, of the YD of all his daughters, each corrected for ½ the genetic merit of
her dam
3. For bulls: y = de-regressed proofs (DRP), obtained by solving the MME to get
the right-hand side (EBV/reliability)
The EBV of a sire that is based on 100 daughters of which 10 are genotyped,
should be corrected for the 10 genotyped daughters, such that the DRP still
includes the information of the 90 daughters that are not genotyped.
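The two bull phenotypes described above can be sketched as follows. This is a deliberately simplified illustration: the function names, the weights, and the numbers are hypothetical, and the DRP is taken literally as EBV/reliability, whereas operational de-regression procedures also handle parent-average contributions and the removal of genotyped daughters.

```python
import numpy as np


def daughter_yield_deviation(yd, dam_ebv, weights):
    """DYD = twice the weighted average of daughters' YD,
    each corrected for half the genetic merit of her dam."""
    yd = np.asarray(yd, dtype=float)
    dam_ebv = np.asarray(dam_ebv, dtype=float)
    adjusted = yd - 0.5 * dam_ebv
    return 2.0 * np.average(adjusted, weights=weights)


def deregressed_proof(ebv, reliability):
    """Simplified de-regression: EBV divided by its reliability."""
    return ebv / reliability


# Hypothetical bull with three daughters.
dyd = daughter_yield_deviation(yd=[100, 200, 300],
                               dam_ebv=[20, -20, 0],
                               weights=[1, 1, 2])
drp = deregressed_proof(ebv=300.0, reliability=0.75)
print(dyd, drp)
```

Dividing by the reliability undoes the shrinkage in the EBV, so bulls with low reliability are not penalised twice when their records enter the genomic model.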
Genomic prediction models
There are mainly two classes of methods for genomic selection:
1) SNP effect-based method
2) Genomic relationship-based method: GBLUP, ssGBLUP
1. Multi-step model: Direct Genomic Values (DGVs) by RR-BLUP or SNP-BLUP
Only genotyped animals
2. Over-parametrization problem (“small n, big p”): fit SNPs as random effects
Assumes a normal distribution of SNP effects and a constant variance
Use of a subset of SNPs (tag-SNPs)
SNPs with large estimated effects are incorporated in an animal model, and the
animal effects represent the remaining polygenic effects not captured by those SNPs
Selecting tag-SNPs: GWAS, feature selection, forward stepwise regression,
PCA, etc.
GWAS-Assisted Genomic Prediction
Problems:
1. Lack of power: is the causal variant truly tagged by a marker?
2. Beavis effect (winner’s curse): estimated effect = real effect +
"estimation noise"
[Figure, two panels: (left) 10 true QTLs, 5,000 markers, 1,000 individuals;
(right) 100 true QTLs, 50,000 markers, 1,000 individuals]
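The Beavis effect can be illustrated with a small simulation of "estimated effect = real effect + estimation noise". The sketch below is not a full GWAS: it assumes a single QTL with a fixed true effect, estimated repeatedly with normal noise, and keeps only the estimates that pass a significance threshold. All parameter values and names are hypothetical.

```python
import numpy as np


def simulate_winners_curse(true_effect=0.2, noise_sd=0.1, threshold=0.3,
                           n_rep=2000, seed=0):
    """Simulate repeated estimates of one QTL effect and flag the 'significant' ones."""
    rng = np.random.default_rng(seed)
    estimates = true_effect + rng.normal(0.0, noise_sd, size=n_rep)
    selected = np.abs(estimates) > threshold  # only these would be reported
    return estimates, selected


true_effect = 0.2
estimates, selected = simulate_winners_curse(true_effect=true_effect)
mean_all = estimates.mean()
mean_selected = np.abs(estimates[selected]).mean()
print(f"true effect:              {true_effect:.3f}")
print(f"mean of all estimates:    {mean_all:.3f}")
print(f"mean of selected effects: {mean_selected:.3f}")
```

Averaged over all replicates the estimate is unbiased, but the subset that passes the threshold overestimates the effect, which is why effects from a GWAS tend to shrink when re-estimated in independent data.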
3. Including SNPs with true association with QTL: Bayesian models
BayesA model:
• Assumes every SNP has some effect
• Very few SNPs have large effects while many have small effects
BayesB model:
• Uses a Metropolis-Hastings algorithm to determine whether or not a locus has
any effect on the trait
• Assumes that a proportion (π) of the SNPs have zero effect while 1 - π SNPs
have a nonzero effect
• BayesB becomes BayesA when π is zero
BayesC model:
• Mixture of SNP-BLUP and BayesB
• Assumes a constant variance for the fraction of SNPs having an effect on the
trait, as in SNP-BLUP, while a fraction (π) of SNPs have zero effect, as in
BayesB
• Gibbs sampling is applied
• QTLs with large effects are picked up by the same SNPs as in the
BayesB model, while SNPs with smaller effects still each explain a small and
equal share of the variance, as in the SNP-BLUP model
BayesCπ
• Modified BayesC with an additional step to estimate the proportion of SNPs
not having any effect on the trait
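The role of π in these priors can be seen by drawing SNP effects from a point-mass mixture: with probability π the effect is exactly zero, otherwise it is drawn from a normal distribution with a common variance (the BayesC-type assumption; BayesB would use locus-specific variances instead). The sketch below only samples from the prior and is not a Gibbs sampler; names and values are hypothetical.

```python
import numpy as np


def sample_mixture_prior(n_snp, pi, sigma2, rng):
    """Draw SNP effects: zero with probability pi, else N(0, sigma2)."""
    has_effect = rng.random(n_snp) >= pi
    draws = rng.normal(0.0, np.sqrt(sigma2), size=n_snp)
    return np.where(has_effect, draws, 0.0)


rng = np.random.default_rng(42)
n_snp = 10_000
effects_sparse = sample_mixture_prior(n_snp, pi=0.9, sigma2=0.01, rng=rng)
effects_dense = sample_mixture_prior(n_snp, pi=0.0, sigma2=0.01, rng=rng)

frac_zero = np.mean(effects_sparse == 0.0)
print(f"fraction of zero effects with pi = 0.9: {frac_zero:.3f}")
print(f"nonzero effects with pi = 0: {np.count_nonzero(effects_dense)}")
```

With π = 0 every SNP receives an effect from the same normal distribution, which is the SNP-BLUP assumption; BayesCπ treats π as unknown and estimates it from the data.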
Allele Coding
Four individuals and two loci, where alleles for the loci are {A, a} and {B, b}.
The genotypes of the four individuals are:
aa Bb
AA bb
Aa bb
aa bb
VanRaden (2008) defined the matrix M with {-1, 0, 1} coding, and then
Z = M - P, where column i of P contains 2(p_i - 0.5) and p_i is the allele
frequency at locus i.
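The coding can be worked through for the four individuals above. The sketch assumes that the counted allele is the upper-case one (A or B) and that allele frequencies are computed from these four animals. It also forms G = ZZ' / (2 Σ p_i(1 - p_i)), VanRaden's first method, which is stated here as an addition to the slide.

```python
import numpy as np

genotypes = ["aa Bb", "AA bb", "Aa bb", "aa bb"]

# Count copies of the upper-case allele at each locus (assumed reference allele).
counts = np.array(
    [[sum(ch.isupper() for ch in locus) for locus in g.split()] for g in genotypes],
    dtype=float,
)

M = counts - 1.0                      # {-1, 0, 1} coding
p = counts.mean(axis=0) / 2.0         # allele frequencies from the sample
P = np.tile(2.0 * (p - 0.5), (len(genotypes), 1))
Z = M - P                             # centred genotypes
G = Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

print("M =\n", M)
print("p =", p)
print("Z =\n", Z)
print("G =\n", np.round(G, 3))
```

Because p is computed from the same four animals, every column of Z sums to zero, so the elements of G also sum to zero.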
Best Predictor = Bayesian estimator
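$$\hat{a} = E(a \mid y) = \frac{\int a \, p(y \mid a) \, p(a) \, da}{\int p(y \mid a) \, p(a) \, da}$$
where
• $\hat{a}$: estimate of SNP effects
• $p(y \mid a)$: « Likelihood » (how SNP effects affect the phenotype)
• $p(a)$: « Prior » (how we think SNP effects are)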
How to get best predictors
$y = Xb + Za + \cdots + e$
• Everyone assumes $p(e) \sim N(0, R)$
• But people assume funny things for $p(a)$
• Do we want very strong marker effects?
– Yes: Bayesian Alphabet (Bayes A, B, C, R, S… Bayesian Lasso…)
– No: $p(a) \sim N(0, I\sigma_a^2)$, i.e. SNP-BLUP
Assumptions regarding the
distributions of marker effects
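$$\begin{pmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + D^{-1} \end{pmatrix} \begin{pmatrix} \hat{b} \\ \hat{a} \end{pmatrix} = \begin{pmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{pmatrix}$$
$$D = \begin{pmatrix} \sigma_a^2 & 0 & \cdots & 0 \\ 0 & \sigma_a^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_a^2 \end{pmatrix} = I\sigma_a^2, \qquad D^{-1} = I\frac{1}{\sigma_a^2}$$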
SNP-BLUP is flexible
• In theory
– Multiple trait models
– REML
– Threshold models
– Maternal effects, random regression, social
effects…
• But:
– Little software around
– Multiple trait models will involve huge
matrices
– We shrink the least-squares estimate
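The shrinkage in SNP-BLUP can be seen by solving the mixed model equations directly. The sketch below assumes a model with only an overall mean, R = Iσ²e and D = Iσ²a, so that the equations reduce to adding λ = σ²e/σ²a to the diagonal of Z'Z. The data are simulated and all names and values are hypothetical.

```python
import numpy as np


def snp_blup(y, Z, lam):
    """Solve the SNP-BLUP mixed model equations with an overall mean.

    lam = sigma_e^2 / sigma_a^2 (assumed known).
    Returns (mean estimate, vector of SNP effect estimates).
    """
    n, p = Z.shape
    X = np.ones((n, 1))
    lhs = np.block([
        [X.T @ X, X.T @ Z],
        [Z.T @ X, Z.T @ Z + lam * np.eye(p)],
    ])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    solution = np.linalg.solve(lhs, rhs)
    return solution[0], solution[1:]


# "Small n, big p": 20 animals, 50 SNPs (simulated, hypothetical values).
rng = np.random.default_rng(0)
n_animals, n_snps = 20, 50
Z = rng.integers(0, 3, size=(n_animals, n_snps)).astype(float)
true_a = rng.normal(0.0, 0.5, size=n_snps)
y = 10.0 + Z @ true_a + rng.normal(0.0, 1.0, size=n_animals)

mu_weak, a_weak = snp_blup(y, Z, lam=1.0)        # little shrinkage
mu_strong, a_strong = snp_blup(y, Z, lam=100.0)  # strong shrinkage
print(np.linalg.norm(a_weak), np.linalg.norm(a_strong))
```

Ordinary least squares has no unique solution here because there are more SNPs than records; adding λ to the diagonal makes the system solvable and pulls the estimates toward zero, more strongly as λ grows.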
Genomic prediction models
There are mainly two classes of methods for genomic selection:
1) SNP effect-based method
2) Genomic relationship-based method: GBLUP, ssGBLUP
Var(u) =
G =
Relationships
• Relationships were conceived as standardized covariances (Fisher, Wright)
• $\mathrm{Cov}(u_i, u_j) = R_{ij}\,\sigma_u^2$
• $R_{ij}$: “some” relationship
• $\sigma_u^2$: genetic variance
• Pedigree relationships
• Marker relationships
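Once a relationship matrix is chosen, breeding values can be predicted from Cov(u) = Rσ²u. The sketch below assumes a model with only an overall mean, y = 1μ + u + e, and a known ratio λ = σ²e/σ²u; it uses the form û = R V⁻¹(y - 1μ̂) with V = R + λI, which avoids inverting the relationship matrix. Passing a marker-based G gives GBLUP, and passing a pedigree-based A gives the traditional BLUP. Names and data are hypothetical.

```python
import numpy as np


def blup_breeding_values(y, rel, lam):
    """BLUP of breeding values for y = 1*mu + u + e.

    rel: relationship matrix (pedigree A or genomic G).
    lam: sigma_e^2 / sigma_u^2 (assumed known).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    ones = np.ones(n)
    V = rel + lam * np.eye(n)  # Var(y) / sigma_u^2
    # Generalised least squares estimate of the mean.
    mu = (ones @ np.linalg.solve(V, y)) / (ones @ np.linalg.solve(V, ones))
    u_hat = rel @ np.linalg.solve(V, y - mu * ones)
    return mu, u_hat


# Four unrelated animals (identity relationship matrix), lambda = 1.
y = np.array([1.0, 2.0, 3.0, 6.0])
mu_hat, u_hat = blup_breeding_values(y, np.eye(4), lam=1.0)
print(mu_hat, u_hat)
```

With unrelated animals each deviation from the mean is simply regressed by 1/(1 + λ); off-diagonal relationships let records of relatives contribute to each prediction.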
Traditional Pedigree
Sire of Sire
Sire
Dam of Sire
Animal
Sire of Dam
Dam
Dam of Dam
2007
Interbull annual meeting 2007 (51) VanRaden
Genomic Pedigree
Haplotype Pedigree
atagatcgatcg
ctgtagcgatcg ctgtagcttagg
agatctagatcg agggcgcgcagt
ctgtctagatcg cgatctagatcg
atgtcgcgcagt cggtagatcagt
agagatcgcagt agagatcgatct
atgtcgctcacg atggcgcgaacg
ctatcgctcagg
Genotype Pedigree
Count number of second allele
121101011110
111211120200
101121101111
122221121111
101101111102
011111012011
121120011010
0 = homozygous for first allele (alphabetically)
1 = heterozygous
2 = homozygous for second allele (alphabetically)
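The counting rule can be written as a small helper. The function name and the two-letter genotype strings are assumptions for illustration.

```python
def code_genotype(genotype: str, alleles: str) -> int:
    """Count copies of the second allele (alphabetically) in a genotype.

    genotype: two-letter genotype, e.g. "AG".
    alleles: the two alleles segregating at the locus, e.g. "AG".
    """
    first, second = sorted(alleles)
    return sum(1 for allele in genotype if allele == second)


coded = [code_genotype(g, "AG") for g in ["AA", "AG", "GA", "GG"]]
print(coded)  # [0, 1, 1, 2]
```

Applying the helper locus by locus yields one string of 0/1/2 codes per animal, like those shown above.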
Realized relationships
• Identical-by-descent (IBD) relationships based on pedigree are average
relationships, which assume infinite loci.
• « Real » IBD relationships $R$ are a bit different due to finite genome size (Hill and
Weir, 2010)
• Therefore A is the expectation of the realized relationships $R$
• SNPs are more informative than A.
• Two full sibs might have a correlation of 0.4 or 0.6
• You need many markers to get these « fine relationships »
• If p are computed from the sample
• In HWE & Linkage Equilibrium
• Average of Diag(G) = 1
• Average (G) =0
• With average inbreeding F
• Average of Diag(G) = 1+F
• If p are computed from the data
• This implies that E(Breeding Values)=0
• Positive and negative inbreeding
• Some individuals are more heterozygous than the average of
the population (OK, no biological problem)
• Negative genomic relationships
• This implies that individuals i and j are more distinct than an
average pair of individuals in the data
• Fixing negative estimates of relationships to 0 is bad practice
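These properties can be checked numerically. The sketch below simulates unrelated, non-inbred animals in Hardy-Weinberg and linkage equilibrium, computes allele frequencies from the sample, and builds G as ZZ' / (2 Σ p_i(1 - p_i)). The population size, number of SNPs, and frequency range are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_animals, n_snps = 200, 2000

# Independent loci in HWE: genotype = number of copies of the counted allele.
p_true = rng.uniform(0.1, 0.9, size=n_snps)
genotypes = rng.binomial(2, p_true, size=(n_animals, n_snps)).astype(float)

p_hat = genotypes.mean(axis=0) / 2.0   # frequencies computed from the data
Z = genotypes - 2.0 * p_hat            # centred genotypes
G = Z @ Z.T / (2.0 * np.sum(p_hat * (1.0 - p_hat)))

mean_G = G.mean()
mean_diag = np.diag(G).mean()
off_diag = G[np.triu_indices(n_animals, k=1)]
n_negative = int((off_diag < 0).sum())

print(f"average of G:        {mean_G:.2e}")
print(f"average of diag(G):  {mean_diag:.3f}")
print(f"negative off-diagonals: {n_negative} of {off_diag.size}")
```

Because the average of G is forced to zero, a large share of the off-diagonal elements must be negative. Setting them to zero would shift the average relationship upward and break this property.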
GBLUP
[Figure: a sparse 7K genotype + « reference population » with complete genotypes → complete 54K genotype]
Works well! (<1% error in Holstein if sire genotyped with 54K)
[Figure, left panel: GI for four paths of selection (SB, SC, DB, and DC) by birth year of offspring for Holsteins]
[Figure, right panel: Genetic gain per year estimates from four paths of selection (Four Paths) for MY in Holstein]
(García-Ruiz et al., 2016)
Options for genomic selection in low income countries
Nucleus breeding programs
Centrally organized
Generic breeding goal
Good potential for species like poultry
Community based breeding programs
Locally organized
Use locally adapted genetic resources
May struggle to get suitable reference population
Any participatory approach needs immediate benefits for the
participating farmers: not in the future
Genomic Selection under Indian dairying scenario
Implementation of the whole genome based selection in dairy animals keeping in view
the present Indian dairy farming system
[Diagram: NGOs, organized dairy herds, and dairy cooperatives; large-scale phenomics recording and genotyping]
India must prioritize the milch species, and the breeds within each species, to be
included in GS programs. Initially, either a single breed or a group of breeds
must be considered for developing the reference population.
Further, the number of reference populations required will also depend upon the
breed(s) included in the development of a multi-breed DNA chip.
Creating a multi-breed reference population, probably with a limited number of
subordinate breeds, can be a viable way to capture the large DNA variation
segregating within and across Indian dairy populations.
It is most likely that a single reference population will not suffice to capture
the entire genomic diversity present across Indian dairy breeds/species.
Harris et al. (2008) reported that SNP estimates calculated from a Holstein Friesian
reference population did not produce accurate genomic estimated breeding values in
Jersey bulls, and vice versa.
Lack of core competence in computational biology and big-data
analysis, and limited trained manpower in the field of quantitative
genetics for the prediction of genomic breeding values.