
Molecular genetic approaches for enhancing animal productivity with special reference to genomic selection

Dr. Supriya
Scientist
ICAR-CIRB, Hisar
Quantitative Traits

The observed trait is the sum of the effects of many genes and environmental factors
Slide courtesy: Dj.de-koning@slu.se
Department of Animal Breeding and Genetics, SLU
CONVENTIONAL SELECTION
• Selection: giving preference to certain individuals to reproduce
• Selection: non-random differential breeding (Lerner, 1958)
• Shift from phenotype-based selection to genotype-based selection
• Genes are no longer part of a ‘black box’: from black box to white box
• Selection with the help of markers: a recent development
Three Approaches to Selection of dairy animals
1. Conventional selection (index-based, EBV, MME)
 - Traditional animal breeding programs rely mainly on phenotypes evaluated in several environments (progeny testing).
 - Selection and recombination are based solely on the resulting data plus pedigree information, when available.
2. Genetic / marker-assisted selection (MAS), QTL-based
 - Uses molecular markers in linkage disequilibrium (LD) with quantitative trait loci (QTL).
3. Genomic selection (GS), chip-based, i.e. SNP chip
04/10/2024 4
Molecular Markers (MAS)

Genome-Wide SNPs (GS)


Superior genetics leads to superior production

Changes are mainly due to conventional selection and breeding

Slide courtesy: 27th Annual Conference of the Ethiopian Society of Animal Production (ESAP), ILRI
Why do we bother when conventional selection is doing so well?
• Conventional animal breeding relies on phenotypic selection: traits are measured directly, and animals with superior performance in the desired traits are used as parents for the next generation in the herd.
• Selection of the best-performing strains has been practiced for several years now, but genetic gains are slow and evaluation takes a long time.
• Selection based on phenotypes and pedigree has limitations associated with data collection and precision of measurement.
• It can only be applied to traits that are easily measured and moderately to highly heritable, and it is costly because it demands maintenance of the breeding stock during measurement.
• Immense achievements were possible through selection. However, the need for more production, efficiency and robustness remains critical.
• Robust decision-making tools are therefore required.

Nevertheless, to date, most genetic progress for quantitative traits in livestock has been
made by selection on phenotype or Estimated Breeding Values (EBV) derived from the
phenotype without knowledge of the number of genes that affect the trait or the effects of
each gene.
What are molecular techniques in general?
• Molecular genetics investigates the genetic makeup of living things at the molecular (DNA, RNA and protein) level.
• It involves the identification and mapping of genes and their polymorphisms, to identify variants associated with economic traits.
• It involves the analysis of DNA, RNA, proteins and lipids.
• Molecular techniques have wide applications in biology, biochemistry, genetics and biophysics.
What is a marker?
A molecular marker is a piece of DNA that is associated with a certain trait of an organism.

Types of genetic markers:
• Morphological
• Biochemical
• Chromosomal

Characteristics of genetic markers:
• Co-dominant expression
• Nondestructive assay
• Complete penetrance
• Early onset of phenotypic expression
• High polymorphism
• Random distribution throughout the genome
• Assay can be automated
Advantages of molecular techniques
• Knowledge of the genetic architecture of quantitative traits, i.e. the underlying genetic basis of a phenotypic trait
• Greater genetic gain and higher accuracy
• Molecular genetic information can be obtained at an early age → reduction in generation intervals
• Molecular genetic information can be obtained on all selection candidates, which is beneficial for:
 - sex-limited traits
 - traits that are expensive or difficult to record
 - traits that require slaughter of the animal (carcass traits)
• With molecular genetic techniques, it is possible to unravel many genetic polymorphisms at the DNA level
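The link between earlier selection and faster progress can be made concrete with the usual annual form of the breeder's equation, ΔG per year = i·r·σA / L (selection intensity i, accuracy r, additive genetic standard deviation σA, generation interval L in years). The sketch below is illustrative only: the function name and all numbers are hypothetical, chosen to show the direction of the effect, not results reported in this presentation.

```python
def annual_genetic_gain(intensity: float, accuracy: float, sigma_a: float,
                        generation_interval: float) -> float:
    """Breeder's equation per year: dG = i * r * sigma_A / L."""
    if generation_interval <= 0:
        raise ValueError("generation interval must be positive")
    return intensity * accuracy * sigma_a / generation_interval


# Hypothetical scenarios (illustrative values only, sigma_a assumed in kg milk):
# progeny-tested bulls: high accuracy but a long generation interval;
# genomically selected young bulls: lower accuracy, much shorter interval.
progeny_test = annual_genetic_gain(intensity=1.4, accuracy=0.9, sigma_a=500.0,
                                   generation_interval=6.0)
genomic = annual_genetic_gain(intensity=1.4, accuracy=0.7, sigma_a=500.0,
                              generation_interval=2.5)
print(f"progeny test: {progeny_test:.1f} kg/year, genomic: {genomic:.1f} kg/year")
```

Even with a lower accuracy, the shorter generation interval dominates in this made-up example, which is the argument the slide makes for early-age genomic information.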
Marker Assisted Selection
o Accelerate genetic progress
 - Measure DNA markers early in life
 - Even in embryos
o Sex-limited traits
 - Milk
 - Litter size
o Traits measured in relatives
 - Meat quality traits!

Three starting points for MAS (the original figure ranks them by ease of detection and ease of use):
• Functional mutations: known gene (GAS)
• Markers in population-wide linkage disequilibrium (LD) with the functional mutation (LD-MAS)
• Markers in population-wide linkage equilibrium (LE) with the functional mutation (LE-MAS)
Marker Genotyping
(1) BLOOD SAMPLING

(2) DNA EXTRACTION

(3) PCR

(4) GEL ELECTROPHORESIS

(5) MARKER ANALYSIS

Markers must be polymorphic

[Figure: gel lanes 1-8 for parents P1 and P2; left panel: not polymorphic, right panel: polymorphic!]

Polymorphism: a detectable difference at a particular gene or marker occurring among individuals.
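A polymorphism check can be expressed in a few lines. This sketch assumes genotypes are given as two-character strings (e.g. "AB") for one marker; the helper names and the optional minor-allele-frequency threshold `min_maf` are illustrative assumptions, not part of any specific genotyping pipeline.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies at one marker from diploid genotypes such as 'AB' or 'AA'."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def is_polymorphic(genotypes, min_maf=0.0):
    """Polymorphic if more than one allele is observed and the rarest allele's
    frequency exceeds min_maf (an assumed quality-control threshold)."""
    freqs = allele_frequencies(genotypes)
    if len(freqs) < 2:
        return False
    return min(freqs.values()) > min_maf


# Parents P1 and P2 showing the same band vs. different bands (hypothetical)
print(is_polymorphic(["AA", "AA"]))  # not polymorphic
print(is_polymorphic(["AA", "BB"]))  # polymorphic
```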
Markers must be tightly linked to target loci!
• Ideally markers should be <5 cM from a gene or QTL

RELIABILITY FOR SELECTION (r = recombination frequency between marker and QTL)
Marker A --- 5 cM --- QTL
Using marker A only: $1 - r_A \approx 95\%$

Marker A --- 5 cM --- QTL --- 5 cM --- Marker B
Using markers A and B: $1 - 2r_Ar_B \approx 99.5\%$

• Using a pair of flanking markers can greatly improve the reliability of genetic selection but increases time and cost
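The reliability figures above follow from simple arithmetic. This sketch assumes, as the slide does, that 5 cM corresponds to a recombination frequency of about 0.05 (r ≈ d/100, reasonable only for short distances); the function names are hypothetical.

```python
def recombination_fraction(distance_cm: float) -> float:
    """Approximate r by d / 100 (only adequate for short map distances)."""
    return distance_cm / 100.0


def reliability_single(distance_cm: float) -> float:
    """Probability that the marker allele still tracks the QTL allele: 1 - r_A."""
    return 1.0 - recombination_fraction(distance_cm)


def reliability_flanking(dist_a_cm: float, dist_b_cm: float) -> float:
    """Reliability with flanking markers A and B, as given on the slide: 1 - 2 r_A r_B."""
    r_a = recombination_fraction(dist_a_cm)
    r_b = recombination_fraction(dist_b_cm)
    return 1.0 - 2.0 * r_a * r_b


print(reliability_single(5), reliability_flanking(5, 5))
```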
Example: MARKER-ASSISTED BREEDING THROUGH GENE CHIP
P1 (susceptible to disease) x P2 (resistant to disease)
↓
F1
↓
F2 (large populations)
↓
Selected individual (QTL)

MARKER-ASSISTED SELECTION: a method whereby phenotypic selection is based on DNA markers
Role of markers in genetic improvement
Varies by objective, germplasm and trait genetic architecture (Bernardo, 2008).
Limitations of MAS
The application of MAS in animal breeding practice has been very limited for several reasons.
• Molecular geneticists used markers that were large DNA segments (100-1000 base pairs) but were not very close to the actual genes.
• As the success of MAS depends on how close the markers are to the genes, linkage disequilibrium (LD) between marker and gene alleles is used as a measure of this closeness.
• High LD indicates that the marker is closer to the gene, with less chance of recombination between the marker allele and the gene allele across generations.
• MAS requires an LD of 30% or more between markers and genes for the markers to act as proxies for the genes.
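To make the LD threshold concrete, the sketch below computes r² between a marker and a QTL from phased two-locus haplotypes, using D = p_AB - p_A·p_B and r² = D² / (p_A(1 - p_A)·p_B(1 - p_B)). Reading the "30%" criterion as r² ≥ 0.3 is an assumption, and the haplotype counts are made up for illustration.

```python
def ld_r2(haplotypes):
    """r^2 between two bi-allelic loci from phased haplotypes coded (0/1, 0/1).

    Allele 1 plays the role of A at the first locus and B at the second.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n
    p_b = sum(h[1] for h in haplotypes) / n
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


# Hypothetical haplotypes: (marker allele, QTL allele)
haps = [(1, 1)] * 40 + [(0, 0)] * 45 + [(1, 0)] * 10 + [(0, 1)] * 5
r2 = ld_r2(haps)
print(f"r2 = {r2:.3f}; meets an r2 >= 0.3 threshold: {r2 >= 0.3}")
```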
Limitations of MAS
• In practice, very few markers have significant effects on economic traits, and each of these markers explains only a marginal proportion of the genetic variation.
• Determination of marker genotypes often lacks consistency; consequently, selection on marker genotypes was not as successful as expected.
• Determination of DNA markers mostly relied on restriction fragment length polymorphisms (RFLP). The discovery of microsatellites, which are much smaller segments of DNA, provided better markers, but only a few microsatellites could be found for quantitative traits in livestock.
[Figure: Genomics research: cattle vs. buffalo]
Genomic Selection
Genomic selection is a form of marker-assisted selection
in which genetic markers covering the whole genome are
used so that all quantitative trait loci (QTL) are in linkage
disequilibrium with at least one marker
(Goddard and Hayes, 2007)

Thanks to large-scale SNP genotyping!!!
MAS vs. GS
• MAS: find QTL or genes and select specifically for favourable alleles. GS: estimate effects across all markers and select on the sum of effects.
• MAS: QTL/QTN effects (RFLP, MS...). GS: SNP effects (bi-allelic markers).
• MAS: needs a strategy to combine with EBV. GS: replaces EBV with GEBV.
• MAS: major genotyping requirements but low genome-wide coverage (LE-MAS). GS: high genotyping requirements and high genome-wide coverage.
• MAS: computationally intensive (LE-MAS). GS: computationally very intensive.
A genome-wide approach typically provides better predictions
[Figure: prediction accuracy (rA) of MAS vs. GS; panels from Lorenzana and Bernardo (2009) and Lorenz (2013)]


Schematic diagram representing the overall process of prediction of GEBVs
Classification of GEBV estimation methods (Wang et al., 2018)


SNP effects can be predicted from the reference population with the simplest model (Calus, 2010):

$$y_i = \mu + \text{animal}_i + \sum_{j}\sum_{k=1}^{2} \text{SNP}_{ijk} + e_i$$

Where,

$y_i$ = trait of interest, or national EBV, daughter yield deviation, deregressed proof or average daughter performance of animal $i$
$\mu$ = fixed effects, or simply the mean
$\text{animal}_i$ = polygenic random effect of animal $i$
$\text{SNP}_{ijk}$ = effect of allele $k$ ($k$ = 1, 2) at locus $j$ carried by animal $i$, summed across alleles and loci
$e_i$ = residual

Then, the GEBV of animal $i$ in the test population can be computed as:

$$\text{GEBV}_i = \text{animal}_i + \sum_{j}\sum_{k=1}^{2} \text{SNP}_{ijk}$$
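The GEBV formula is a polygenic term plus a sum of allele effects over loci. The sketch assumes allele effects per SNP have already been estimated in the reference population and that alleles are coded 0/1; the function name and all numbers are hypothetical.

```python
def gebv(polygenic, genotype, allele_effects):
    """GEBV_i = animal_i + sum over loci j and both alleles k of SNP_ijk.

    genotype: list of (allele1, allele2) pairs, one per locus, alleles coded 0 or 1.
    allele_effects: per locus, (effect of allele 0, effect of allele 1),
    assumed to be estimated beforehand in the reference population.
    """
    if len(genotype) != len(allele_effects):
        raise ValueError("one genotype pair is needed per locus")
    snp_sum = sum(locus_effects[a1] + locus_effects[a2]
                  for (a1, a2), locus_effects in zip(genotype, allele_effects))
    return polygenic + snp_sum


# Hypothetical estimates for three SNPs and one test-population animal
effects = [(-0.5, 0.5), (0.0, 1.0), (0.25, -0.25)]
candidate = [(1, 1), (0, 1), (0, 0)]
print(gebv(polygenic=2.0, genotype=candidate, allele_effects=effects))  # 4.5
```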


• Instead of SNP effects, a genomic relationship matrix (GRM) can be used in a model that assumes each SNP contributes equally to the additive genetic variance. Such a model is known as GBLUP (Meuwissen et al., 2001)
• A GRM constructed over all the markers can be used in a regular mixed model instead of the pedigree-based additive relationship matrix (Goddard, 2009)
• The effectiveness of GBLUP depends on how close the GRM is to the realized genetic relationships
• Appropriate quality control of genomic data before constructing the GRM can avoid much of the bias in any estimate
• Variance not captured by the markers can be explained by including an extra polygenic effect
Genomic Selection: Components and strategies

Dr. Supriya Chhotaray
Scientist
ICAR-CIRB, Hisar
Road map of GS

(Wang et al., 2018)


Reference Population
Size and structure:
• Mostly reported sizes: 500-3000
• Before deciding the sample size of the reference population → power analysis
• Genomic history of the population → MRCA and close pedigree with the young selection candidates

What to include?
1. Only females
2. Only males, e.g. proven progeny-tested (PT) bulls with EBV
3. Both males and females: more accuracy (larger reference size)
(Sire families: full-sibs (FS) and half-sibs (HS), trios)

Prospects:
• Collaboration with developed countries
• Across-region genomic prediction with pooled data
Assumptions:
• Linkage disequilibrium (LD) between the markers and the QTL: ideally, for every QTL there is a marker in perfect LD, i.e. $r^2 = 1$, where $r^2$ is the square of the correlation between the alleles at the two loci.
• Strong LD → high marker density
• Expected LD between loci $c$ Morgans apart: $E(r^2) \approx 1/(1 + 4N_e c)$, which approaches $1/(4N_e c)$ when $4N_e c$ is large
• Marker density is a function of $N_e$ per Morgan
• A marker density of about $10 N_e$ markers per Morgan:
 => If $N_e$ = 100, about 1000 markers per Morgan (1 M ≈ 1 chromosome)
 => If $N_e$ = 1000, about 10,000 markers per Morgan
• The total number of markers needed is this density multiplied by the genome size L (summed length of all chromosomes), i.e. about $10 N_e L$.
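The rules of thumb above can be turned into a small calculator. It assumes Sved's (1971) approximation E(r²) ≈ 1/(1 + 4Ne·c) with c in Morgans (the slide's 1/(4Nec) is its large-Ne·c limit) and the slide's density of about 10·Ne markers per Morgan; the function names and the 30-Morgan genome length in the usage line are illustrative assumptions.

```python
def expected_r2(ne, c_morgans):
    """Sved (1971): E(r^2) ~ 1 / (1 + 4 Ne c), with c the distance in Morgans."""
    return 1.0 / (1.0 + 4.0 * ne * c_morgans)


def markers_per_morgan(ne):
    """Rule of thumb from the slide: about 10 * Ne markers per Morgan."""
    return 10.0 * ne


def total_markers(ne, genome_length_morgans):
    """Marker density multiplied by the genome size L (summed chromosome lengths)."""
    return markers_per_morgan(ne) * genome_length_morgans


for ne in (100, 1000):
    spacing = 1.0 / markers_per_morgan(ne)  # Morgans between adjacent markers
    print(ne, markers_per_morgan(ne), total_markers(ne, 30.0),
          round(expected_r2(ne, spacing), 3))
```

At this density the expected r² between adjacent markers is 1/(1 + 0.4) ≈ 0.71 whatever the value of Ne, which is why the required density scales with Ne.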
What is the marker density for the following
two conditions?

1. M1------------Q1------M2--------Q2------------M3----------Q3-----------M4------------Q4

2. M11--M12--Q1--M21--M22---Q2---M31---M32--------Q3----M41----M42-------Q4
Estimation of marker effects depends upon the structure of the reference population
=> Effects of single markers (unrelated reference and validation populations) or of haplotypes (related)?

Updating Reference population:


Strategies?
1. Random
2. Top Selected individuals based on GEBV
3. Selection to maximize genetic diversity and response

• Multi-breed reference population

The phenotypes for the Reference population:


1. EBVs
2. DYDs
3. Traits such as milk yield, fat yield
4. De-regressed proofs
1. For cows: y = YD (yield deviation) = individual record corrected for all fixed and non-genetic random effects

2. For bulls: y = DYD (daughter yield deviation) = twice the average, for a bull, of the YDs of all his daughters, each corrected for ½ the genetic merit of its dam (with associated weights)

3. For bulls: y = de-regressed proof (DRP) --> obtained by working back through the MME to recover the right-hand side from the EBV (roughly EBV/reliability)
The EBV of a sire based on 100 daughters, of which 10 are genotyped, should be corrected for the 10 genotyped daughters, so that the DRP still includes the information of the 90 daughters that are not genotyped.
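The bull phenotypes defined above can be sketched as follows. The code follows the slide's wording literally (twice the weighted average of daughters' YD, each corrected for half the dam's genetic merit); the function names are hypothetical, and the EBV/reliability scaling is only the rough approximation quoted on the slide, not a full deregression procedure.

```python
def bull_phenotype_from_daughters(daughter_yd, dam_ebv, weights=None):
    """Twice the weighted average of daughters' YD, each corrected for half
    of its dam's genetic merit (literal reading of the slide)."""
    if len(daughter_yd) != len(dam_ebv):
        raise ValueError("need one dam EBV per daughter")
    if weights is None:
        weights = [1.0] * len(daughter_yd)
    corrected = [yd - 0.5 * ebv for yd, ebv in zip(daughter_yd, dam_ebv)]
    average = sum(w * c for w, c in zip(weights, corrected)) / sum(weights)
    return 2.0 * average


def naive_deregressed_proof(ebv, reliability):
    """EBV / reliability, the rough scaling quoted on the slide. Published
    deregression procedures are more involved (e.g. they also remove
    parent-average information), so treat this only as a placeholder."""
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    return ebv / reliability


print(bull_phenotype_from_daughters([10.0, 20.0], [4.0, 8.0]))  # 24.0
print(naive_deregressed_proof(300.0, 0.75))  # 400.0
```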
Genomic prediction models
There are mainly two classes of methods for genomic selection:
1) SNP effect-based methods
2) Genomic relationship-based methods: GBLUP, ssGBLUP
1. Multi-step model: direct genomic values (DGVs) by RR-BLUP or SNP-BLUP
 → only genotyped animals
2. Over-parametrization problem, "small n, big p" → SNPs fitted as random effects
 • Assumes a normal distribution of SNP effects with a constant variance
 • Use of a subset of SNPs → tag-SNPs
 • Large estimated effects are incorporated in an animal model, and the animal effects represent the remaining polygenic effects after the SNPs
 • Selecting tag-SNPs: GWAS, feature selection, forward stepwise regression, PCA etc.
GWAS-Assisted Genomic Prediction

Significant SNPs → include their effects in GEBV prediction

This checks the problem of "over-parametrization".


Results of GWAS on the 1st principal component synthesized over seven consecutive test-day milk yields of Murrah buffaloes, plotted as a Manhattan plot of genome-wide SNPs.
Problems:
1. Lack of power: is the causal variant truly tagged by a marker?
2. Beavis effect (winner's curse): estimated effect = real effect + "estimation noise"
3. Markers associated in one population are not necessarily associated in another one
4. The true list of acting genes and QTL will vary across populations due to drift or selection

[Figure panels: 10 true QTLs, 5000 markers, 1000 individuals; 100 true QTLs, 50,000 markers, 1000 individuals]
3. Including SNPs with true association with QTL: Bayesian models
BayesA model:
• assumes all SNPs have some (nonzero) effect
• very few SNPs have large effects while many have small effects

BayesB model:
• Metropolis-Hastings algorithm to determine whether a locus has any effect on
the trait or not
• Assumes that a proportion (π) of the SNPs have zero effect while 1 - π SNPs
have a nonzero effect
• BayesB becomes BayesA when π is zero
BayesC model:
• A mix of SNP-BLUP and BayesB
• Assumes a constant variance for the fraction of SNPs having an effect on the trait, as in SNP-BLUP, and that a fraction (π) of SNPs have zero effect, as in BayesB
• Gibbs sampling is applied
• QTLs with large effects pick up the same SNPs as in the BayesB model, while SNPs with smaller effects still explain small and equal variances, as in the SNP-BLUP model
BayesCπ
• Modified BayesC with an additional step to estimate the proportion of SNPs
not having any effect on the trait
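To see how the BayesA, BayesB and BayesC priors differ, one can simulate SNP effects from them. This is a sketch of the priors only, not of the Gibbs or Metropolis-Hastings samplers used to fit the models; the degrees of freedom `nu`, the variance `scale` and the function name are illustrative assumptions.

```python
import math
import random


def draw_prior_effects(n_snps, pi, nu=4.0, scale=0.01, common_variance=False, seed=1):
    """Draw SNP effects from a BayesB-style (or BayesC-style) prior.

    - With probability pi an effect is exactly zero (the "spike").
    - BayesB: each nonzero SNP gets its own variance from a scaled inverse
      chi-square(nu, scale) distribution, so nonzero effects are t-distributed.
    - BayesC (common_variance=True): all nonzero effects share the variance `scale`.
    - pi = 0 with SNP-specific variances corresponds to the BayesA prior.
    """
    rng = random.Random(seed)
    effects = []
    for _ in range(n_snps):
        if rng.random() < pi:
            effects.append(0.0)
            continue
        if common_variance:
            variance = scale
        else:
            chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-square with nu d.f.
            variance = nu * scale / chi2  # scaled inverse chi-square draw
        effects.append(rng.gauss(0.0, math.sqrt(variance)))
    return effects


bayes_b = draw_prior_effects(10_000, pi=0.95)
bayes_a = draw_prior_effects(10_000, pi=0.0)
print(sum(e == 0.0 for e in bayes_b) / len(bayes_b))  # expected to be close to 0.95
print(sum(e == 0.0 for e in bayes_a))  # no exact zeros expected when pi = 0
```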
Allele Coding
Four individuals and two loci, where alleles for the loci are {A, a} and {B, b}.
The genotypes of the four individuals are:
aa Bb
AA bb
Aa bb
aa bb
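VanRaden (2008) defined matrix M with -1, 0, 1 coding (homozygote, heterozygote, other homozygote), and then $Z = M - P$, where column $i$ of P contains $2(p_i - 0.5)$ and $p_i$ is the frequency of the second allele at locus $i$.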
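VanRaden's coding can be checked on the four-individual example above. The sketch assumes the uppercase allele (A, B) is the counted "second" allele and uses G = ZZ' / (2Σp_i(1 - p_i)), VanRaden's (2008) scaling; the function name is hypothetical.

```python
def vanraden_g(genotypes, counted_alleles):
    """Build Z = M - P and G = ZZ' / (2 * sum p_j (1 - p_j)) (VanRaden, 2008).

    genotypes: one tuple of genotype strings per individual, e.g. ("aa", "Bb").
    counted_alleles: the allele treated as "second" at each locus (assumption).
    """
    n, m = len(genotypes), len(counted_alleles)
    # copies of the counted allele: 0, 1 or 2
    counts = [[ind[j].count(counted_alleles[j]) for j in range(m)] for ind in genotypes]
    # allele frequencies computed from the sample itself
    p = [sum(row[j] for row in counts) / (2 * n) for j in range(m)]
    # M uses -1/0/1 coding (count - 1); P holds 2(p_j - 0.5); Z = M - P
    z = [[(counts[i][j] - 1) - 2 * (p[j] - 0.5) for j in range(m)] for i in range(n)]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    g = [[sum(z[i][k] * z[l][k] for k in range(m)) / scale for l in range(n)]
         for i in range(n)]
    return z, p, g


genotypes = [("aa", "Bb"), ("AA", "bb"), ("Aa", "bb"), ("aa", "bb")]
z, p, G = vanraden_g(genotypes, counted_alleles=["A", "B"])
print(p)  # [0.375, 0.125]
for row in G:
    print(["%.3f" % v for v in row])
```

Counting the lowercase allele instead flips the sign of each Z column but leaves G unchanged. Because p is computed from these four animals, each column of Z sums to zero, so the elements of G also sum to zero.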
Best Predictor = Bayesian estimator

• Accurate prediction involves the integration of all information: prior information and observed information
• In the context of genomic predictions, the Best Predictor is composed of two parts:
– The prior distribution of marker effects, $p(a)$
– The likelihood of the data given the marker effects, $p(y \mid a)$

Slide courtesy: Andres Legarra


Best Predictor = Bayesian estimator

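$$\hat{a} = E(a \mid y) = \frac{\int a \, p(y \mid a) \, p(a) \, da}{\int p(y \mid a) \, p(a) \, da}$$

where $\hat{a}$ is the estimate of SNP effects, $p(y \mid a)$ is the « likelihood » (how SNP effects affect the phenotype) and $p(a)$ is the « prior » (how we think SNP effects are).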
How to get best predictors

• Under some assumptions: explicit equation (SNP-BLUP)
• Under other assumptions: iterative equations (Gibbs samplers, Expectation-Maximization)
Bayesian regressions

$y = Xb + Za + \dots + e$
• Everyone assumes $p(e) \sim N(0, R)$
• But people assume funny things for $p(a)$
• Do we want very strong marker effects?
– Yes: Bayesian Alphabet (BayesA, B, C, R, S…, Bayesian Lasso…)
– No: $p(a) \sim N(0, I\sigma^2_a)$, SNP-BLUP
Assumptions regarding the
distributions of marker effects

1. Normal distribution: ridge regression BLUP (RR-BLUP), SNP-BLUP, GBLUP
2. Normal distribution with unknown variances: BayesC, GREML, G-Gibbs
3. Student (t) distribution: BayesA
4. Mixture of Student (t) distribution and spike at 0: BayesB
5. Mixture of normal distribution and spike at 0: BayesCπ
6. Double exponential: Bayesian Lasso
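Mixed model equations for SNP-BLUP

• Z'Z is not diagonal
• Prior information: variance of SNP effects
• Usually assumed $Var(a) = I\sigma^2_a$ and $Var(e) = R = I\sigma^2_e$

$$\left( \begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z \end{bmatrix} + D^{-1} \right) \begin{bmatrix} \hat{b} \\ \hat{a} \end{bmatrix} = \begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{bmatrix}, \qquad D^{-1} = \begin{bmatrix} 0 & 0 \\ 0 & I\sigma_a^{-2} \end{bmatrix}$$

With SNP-specific variances, $I\sigma_a^{-2}$ is replaced by $\operatorname{diag}(\sigma_{a_1}^{-2}, \sigma_{a_2}^{-2}, \dots)$.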
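With R = Iσ²e and Var(a) = Iσ²a, multiplying the equations by σ²e gives the familiar form with λ = σ²e/σ²a added to the diagonal of the SNP block. The pure-Python sketch below builds and solves that system with the overall mean as the only fixed effect; the data and helper names are hypothetical, and a real analysis would use dedicated software and sparse solvers.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def snp_blup(y, z, var_a, var_e):
    """SNP-BLUP MME with the mean as the only fixed effect.

    Assumes Var(a) = I var_a and Var(e) = I var_e, so lambda = var_e / var_a
    is added to the diagonal of the SNP block only.
    """
    n, p = len(y), len(z[0])
    w = [[1.0] + [float(v) for v in row] for row in z]  # W = [X | Z] with X = 1
    lam = var_e / var_a
    lhs = [[sum(w[k][i] * w[k][j] for k in range(n)) for j in range(p + 1)]
           for i in range(p + 1)]
    for i in range(1, p + 1):  # no shrinkage on the fixed effect
        lhs[i][i] += lam
    rhs = [sum(w[k][i] * y[k] for k in range(n)) for i in range(p + 1)]
    solution = solve(lhs, rhs)
    return solution[0], solution[1:]


# Hypothetical data generated without noise as y = 2 + z1 - z2
z = [[0, 1], [1, 0], [2, 1], [1, 2], [0, 0]]
y = [1.0, 3.0, 3.0, 1.0, 2.0]
print(snp_blup(y, z, var_a=1e8, var_e=1.0))   # almost no shrinkage: close to (2, [1, -1])
print(snp_blup(y, z, var_a=1e-8, var_e=1.0))  # heavy shrinkage: SNP effects close to 0
```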
SNP-BLUP is flexible

• In theory
– Multiple trait models
– REML
– Threshold models
– Maternal effects, random regression, social
effects…
• But:
– Little software around
– Multiple trait models will involve huge
matrices
– We shrink the least square estimate

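GBLUP from SNP-BLUP

We have defined breeding values as the sum of SNP effects:

$$u = Za$$

$$Var(u) = Z\,Var(a)\,Z' = ZZ'\sigma^2_a$$

$$G = \frac{ZZ'}{2\sum_j p_j(1-p_j)}, \quad \text{so that } Var(u) = G\sigma^2_u \text{ with } \sigma^2_u = 2\sum_j p_j(1-p_j)\,\sigma^2_a$$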
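SNP-BLUP and GBLUP give the same breeding-value predictions when G = ZZ'/k and σ²u = kσ²a. The sketch below checks this numerically, assuming phenotypes already corrected for fixed effects (so no intercept), a made-up centred Z and an arbitrary scaling constant k; all names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def snp_blup_u(y, z, var_a, var_e):
    """u = Z a_hat, with (Z'Z + lambda I) a_hat = Z'y and lambda = var_e / var_a."""
    zt = transpose(z)
    lhs = matmul(zt, z)
    for i in range(len(lhs)):
        lhs[i][i] += var_e / var_a
    rhs = [sum(zt[i][k] * y[k] for k in range(len(y))) for i in range(len(zt))]
    a_hat = solve(lhs, rhs)
    return [sum(zij * aj for zij, aj in zip(row, a_hat)) for row in z]


def gblup_u(y, z, var_a, var_e, k):
    """u = G (G + lambda_u I)^-1 y with G = ZZ'/k and var_u = k * var_a."""
    g = [[v / k for v in row] for row in matmul(z, transpose(z))]
    lhs = [row[:] for row in g]
    for i in range(len(lhs)):
        lhs[i][i] += var_e / (k * var_a)
    w = solve(lhs, y)
    return [sum(gij * wj for gij, wj in zip(row, w)) for row in g]


# Hypothetical centred genotypes (4 animals x 3 SNPs) and corrected phenotypes
z = [[-0.75, 0.75, 0.5], [1.25, -0.25, -0.5], [0.25, -0.25, 0.5], [-0.75, -0.25, -0.5]]
y = [1.0, -0.5, 0.3, -0.8]
u_snp = snp_blup_u(y, z, var_a=0.2, var_e=1.0)
u_g = gblup_u(y, z, var_a=0.2, var_e=1.0, k=0.6875)
print(max(abs(s - g) for s, g in zip(u_snp, u_g)))  # numerically negligible
```

The equality follows from the identity Z(Z'Z + λI)⁻¹Z' = ZZ'(ZZ' + λI)⁻¹; GBLUP solves a system the size of the number of animals, SNP-BLUP one the size of the number of SNPs.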
Relationships
• Relationships were conceived as standardized covariances (Fisher, Wright)
• $Cov(u_i, u_j) = R_{ij}\sigma^2_u$
• $R_{ij}$: "some" relationship
• $\sigma^2_u$: genetic variance

• Genetic relationships are due to shared (Identical By State) alleles at causal genes
• If I share blood group OO with somebody, I am "like" his twin
• These genes are unknown (and many will likely remain so)
• Use proxies:

• Pedigree relationships
• Marker relationships
Traditional Pedigree
[Figure: Animal with its Sire and Dam, and grandparents Sire of Sire, Dam of Sire, Sire of Dam and Dam of Dam]

2007
Interbull annual meeting 2007 (51) VanRaden
Genomic Pedigree

Haplotype Pedigree
atagatcgatcg

ctgtagcgatcg ctgtagcttagg

agatctagatcg agggcgcgcagt

ctgtctagatcg cgatctagatcg

atgtcgcgcagt cggtagatcagt

agagatcgcagt agagatcgatct

atgtcgctcacg atggcgcgaacg

ctatcgctcagg

Genotype Pedigree
Count the number of copies of the second allele
121101011110
111211120200
101121101111
122221121111
101101111102
011111012011
121120011010
0 = homozygous for first allele (alphabetically)
1 = heterozygous
2 = homozygous for second allele
(alphabetically)
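The 0/1/2 coding above can be produced from raw genotype calls with a short helper. The sketch assumes bi-allelic SNPs and that the "second" allele is the alphabetically later one among those observed; with a real chip, the allele order would come from the SNP manifest. Function names and calls are hypothetical.

```python
def code_locus(calls):
    """Code one bi-allelic locus as 0/1/2 copies of the alphabetically second allele.

    calls: genotype strings of all animals at this locus, e.g. ["AG", "GG", "AA"].
    If fewer than two alleles are observed, every animal is coded 0 (an assumption).
    """
    alleles = sorted({allele for call in calls for allele in call})
    if len(alleles) > 2:
        raise ValueError("SNPs are expected to be bi-allelic")
    if len(alleles) < 2:
        return [0] * len(calls)
    second = alleles[1]
    return [call.count(second) for call in calls]


def genotype_rows(loci):
    """Turn per-locus calls into one 0/1/2 string per animal, as on the slide."""
    coded = [code_locus(calls) for calls in loci]
    return ["".join(str(locus[i]) for locus in coded) for i in range(len(loci[0]))]


# Hypothetical calls for three animals at three SNPs
loci = [["AA", "AG", "GG"], ["CT", "CC", "TT"], ["AC", "AC", "CC"]]
print(genotype_rows(loci))  # ['011', '101', '222']
```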
Realized
relationships
• Identical By Descent (IBD) relationships based on pedigree are average relationships, which assume infinite loci.
• « Real » IBD relationships $R$ differ somewhat because of finite genome size (Hill and Weir, 2010)
• Therefore A is the expectation of the realized relationships $R$
• SNPs are more informative than A.
• Two full sibs might have a correlation of 0.4 or 0.6
• You need many markers to get these « fine relationships »

• If allele frequencies p are computed from the sample
• In HWE and linkage equilibrium:
• Average of diag(G) = 1
• Average of G = 0
• With average inbreeding F:
• Average of diag(G) = 1 + F
• If p are computed from the data
• This implies that E(breeding values) = 0
• Positive and negative inbreeding
• Some individuals are more heterozygous than the population average (OK, no biological problem)
• Negative genomic relationships
• This implies that individuals i and j are more distinct than an average pair of individuals in the data
• Fixing negative estimates of relationships to 0 is wrong practice
GBLUP

• We obtain animal, not SNP, solutions
• Immediate application to maternal effects models, random regression, competition effect models, multiple traits, etc.
• All genotyped individuals can be included, with or without phenotypes
• Regular software (blupf90, asreml, wombat…) works
• Therefore, GREML and G-Gibbs are simple extensions
Multi-trait GBLUP
Imputation to get a high-density SNP panel
i_p_ta_io_ c_nsi_t_ i_ pr_di_t_n_ t_e m__s__g l_t_er_ + dictionary
Imputation consists in predicting the missing letters

[Figure: a 7K genotype (e.g. T T G C A T C) plus a « reference population » with complete 54K genotypes is used to fill in the missing 54K genotypes]
Works well!!! (<1% error in Holstein if the sire is genotyped with 54K)

Figure courtesy: vincent ducrocq@jouy.inra.fr


Factors affecting prediction accuracy
• Training population size
• Trait heritability
• Influence of G x E, precision of measurements
• Marker density
• Effective population size of the breeding population, i.e. genetic diversity of the breeding population
• Genetic relationship between the training population and selection candidates
• Statistical model

Key questions:
→ What accuracy can I achieve?
→ How many animals do I need for training?
→ How many markers do I need?
→ What is the cost-benefit ratio?
Accuracy of genomic prediction depends on the size of the reference population (Van der Werf, 2013)
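The dependence of accuracy on reference size can be explored with the deterministic formula of Daetwyler et al. (2008), r = √(N·h² / (N·h² + Me)), where Me is the number of independent chromosome segments. The sketch uses one approximation found in the literature, Me ≈ 2NeL / ln(4NeL); Ne = 100 and L = 30 Morgans are hypothetical values, so the printed accuracies only illustrate trends.

```python
import math


def effective_segments(ne, genome_length_morgans):
    """Approximate number of independent chromosome segments: 2 Ne L / ln(4 Ne L)."""
    x = 4.0 * ne * genome_length_morgans
    return 2.0 * ne * genome_length_morgans / math.log(x)


def expected_accuracy(n_ref, h2, me):
    """Daetwyler et al. (2008): r = sqrt(N h2 / (N h2 + Me))."""
    nh2 = n_ref * h2
    return math.sqrt(nh2 / (nh2 + me))


me = effective_segments(ne=100, genome_length_morgans=30)  # hypothetical population
for n_ref in (500, 1000, 3000, 10000):
    for h2 in (0.1, 0.3):
        acc = expected_accuracy(n_ref, h2, me)
        print(f"N = {n_ref:>5}, h2 = {h2}: accuracy ~ {acc:.2f}")
```

Accuracy rises with N·h², so low-heritability traits need much larger reference populations to reach the same accuracy.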


Impact on dairy cattle

Figure courtesy: vincent ducrocq@jouy.inra.fr


Results of GS on US Holstein dairy cattle
[Figure, left: generation interval (GI) for four paths of selection (SB, SC, DB and DC) by birth year of offspring for Holsteins]
[Figure, right: genetic gain per year estimated from the four paths of selection (Four Paths) for milk yield (MY) in Holsteins]
(García-Ruiz et al., 2016)
Options for genomic selection in low income countries
• Nucleus breeding programs
 - Centrally organized
 - Generic breeding goal
 - Good potential for species like poultry
• Community-based breeding programs
 - Locally organized
 - Use locally adapted genetic resources
 - May struggle to get a suitable reference population
• Any participatory approach needs immediate benefits for the participating farmers, not benefits in the future
Genomic Selection under Indian dairying scenario
• Implementing whole-genome-based selection in dairy animals, keeping in view the present Indian dairy farming system, raises a few concerns:
 - The majority of dairy animals are held by small-scale farmers
 - Limited animal identification and recording system
 - Lack of organized herds

Need of the hour:
 - Networking among these dairy herds to build a large, organized dairying network
 - Adoption of intensive dairy production systems
 - Digital animal identification and production recording systems
 - Indian dairying needs a well-established real-time recording system

[Figure: Central and State Government farms, Central and State Government organizations, NGOs and dairy cooperatives linked to organized dairy herds with large-scale phenomics recording and genotyping]
• India must prioritize the milch species/breeds within species to be engaged in genomic selection; for GS programs, either a single breed or a group of breeds must first be considered for developing a reference population.

• Further, the number of reference populations required will also depend upon the breed(s) included in the development of a multi-breed DNA chip.
• Creation of a multi-breed reference population, probably with limited numbers of subordinate breeds, can be a viable solution to ensure that the large DNA variation segregating within and across Indian dairy populations is captured.

• A single reference population will most likely not suffice to capture the entire genomic diversity present across Indian dairy breeds/species.
• Harris et al. (2008) reported that SNP effects estimated from a Holstein-Friesian reference population did not produce accurate genomic estimated breeding values in Jersey bulls, and vice versa.
• Lack of core competence in computational biology and big-data analysis, and limited trained manpower in quantitative genetics for prediction of genomic breeding values.

• Most essentially, the diverse dairy organizations in India need to come onto one platform if they seriously want to execute a genomic selection programme in Indian dairying.
