Visscher Present

Queensland Institute of
Medical Research
GWAS for quantitative traits
Peter M. Visscher
peter.visscher@qimr.edu.au
Overview
• Darwin and Mendel
• Background: population genetics
• Background: quantitative genetics
• GWAS
– Examples
– Analysis
– Statistical power
[Galton, 1889]
Mendelian Genetics
Following a single (or several)
genes that we can directly score
Phenotype highly informative as to genotype
Darwin & Mendel
• Darwin (1859) Origin of Species

– Instant Classic, major immediate impact
– Problem: Model of Inheritance
• Darwin assumed Blending inheritance
• Offspring = average of both parents
• zo = (zm + zf)/2
• Fleeming Jenkin (1867) pointed out the problem
– Var(zo) = Var[(zm + zf)/2] = (1/2) Var(parents)
– Hence, under blending inheritance, half the variation is
removed each generation and must somehow be
replenished by mutation.
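A minimal simulation sketch of Jenkin's argument, assuming random mating and a normally distributed starting population (both are illustrative assumptions, as are the function name and the population size). Each offspring is the plain average of two randomly drawn parents, so the variance should roughly halve every generation.

```python
import random
import statistics


def blend_generation(parents, rng):
    """Return offspring where each one is the average of two random parents."""
    n = len(parents)
    return [(rng.choice(parents) + rng.choice(parents)) / 2 for _ in range(n)]


rng = random.Random(1)
# Hypothetical starting population with trait variance close to 1.
population = [rng.gauss(0.0, 1.0) for _ in range(20000)]

variances = [statistics.pvariance(population)]
for _ in range(5):
    population = blend_generation(population, rng)
    variances.append(statistics.pvariance(population))

# Ratio of successive variances; each should be close to 1/2.
ratios = [variances[t + 1] / variances[t] for t in range(5)]
print([round(r, 2) for r in ratios])
```

After five generations only about 1/32 of the original variance is left, which is why blending inheritance needs an implausibly large input of new variation.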
Mendel
• Mendel (1865), Experiments in Plant Hybridization
• No impact, paper essentially ignored
– Ironically, Darwin had an apparently unread copy in his
library
– Why ignored? Perhaps too mathematical for 19th
century biologists
• Rediscovery in 1900 (by three independent
groups)
• Mendel’s key idea: Genes are discrete particles
passed on intact from parent to offspring
The height vs. pea debate (early 1900s)
Biometricians vs. Mendelians
Do quantitative traits have the same hereditary and evolutionary properties as
discrete characters?
[Figure: trait distributions for genotypes qq, Qq and QQ, with means m-a, m+d and m+a]
RA Fisher (1918). Transactions of the Royal Society of Edinburgh 52: 399-433.
Population Genetics
• Allele and genotype frequencies
• Hardy-Weinberg Equilibrium
• Linkage (dis)equilibrium
Allele and Genotype Frequencies
Given genotype frequencies, we can always compute allele
frequencies, e.g.,
$p_i = \mathrm{freq}(A_i A_i) + \frac{1}{2} \sum_{j \neq i} \mathrm{freq}(A_i A_j)$
The converse is not true: given allele frequencies we
cannot uniquely determine the genotype frequencies
For n alleles, there are n(n+1)/2 genotypes
If we are willing to assume random mating, genotype frequencies are in Hardy-Weinberg proportions:
$\mathrm{freq}(A_i A_j) = \begin{cases} p_i^2 & \text{for } i = j \\ 2 p_i p_j & \text{for } i \neq j \end{cases}$
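A short sketch of both directions, using made-up genotype frequencies for three alleles (the allele labels, the numbers and the function names are illustrative assumptions). It computes allele frequencies from genotype frequencies, then the Hardy-Weinberg proportions implied by those allele frequencies.

```python
from itertools import combinations_with_replacement


def allele_frequencies(genotype_freqs):
    """p_i = freq(AiAi) + (1/2) * sum over j != i of freq(AiAj)."""
    p = {}
    for (a, b), f in genotype_freqs.items():
        if a == b:
            p[a] = p.get(a, 0.0) + f
        else:
            p[a] = p.get(a, 0.0) + f / 2
            p[b] = p.get(b, 0.0) + f / 2
    return p


def hardy_weinberg(p):
    """Genotype frequencies under random mating: p_i^2 or 2 * p_i * p_j."""
    alleles = sorted(p)
    return {
        (a, b): p[a] ** 2 if a == b else 2 * p[a] * p[b]
        for a, b in combinations_with_replacement(alleles, 2)
    }


# Hypothetical observed genotype frequencies (they sum to 1).
observed = {
    ("A1", "A1"): 0.30, ("A1", "A2"): 0.20, ("A1", "A3"): 0.10,
    ("A2", "A2"): 0.25, ("A2", "A3"): 0.05, ("A3", "A3"): 0.10,
}
p = allele_frequencies(observed)
expected = hardy_weinberg(p)
print(p)
print(len(expected))  # n(n+1)/2 genotypes for n = 3 alleles
```

The observed and expected genotype frequencies differ here even though they share the same allele frequencies, which illustrates why allele frequencies alone do not determine genotype frequencies.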
Hardy-Weinberg
• Prediction of genotype frequencies from allele freqs
• Allele frequencies remain unchanged over generations, provided:
– Infinite population size (no genetic drift)
– No mutation
– No selection
– No migration
(HW proportions are checked as part of QC in GWAS studies)
• Under HW conditions, a single generation of random
mating gives genotype frequencies in Hardy-Weinberg
proportions, and they remain forever in these proportions
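One common way to use Hardy-Weinberg proportions for QC is a 1-df chi-square goodness-of-fit test on the genotype counts at each SNP. The slides do not say which test is meant, so the sketch below is an assumption, and the function name and the counts are made up for illustration.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of observed genotype counts against HW proportions.

    Assumes both alleles are present, so no expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # allele frequency from genotype counts
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2_ok, p_ok = hwe_chi_square(360, 480, 160)     # counts in HW proportions
chi2_bad, p_bad = hwe_chi_square(400, 400, 200)   # excess of homozygotes
print(chi2_ok, p_ok)
print(chi2_bad, p_bad)
```

A SNP with a very small p-value in such a test is often flagged, since genotyping error is one possible cause of a departure from HW proportions.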
Linkage equilibrium
Random mating and recombination eventually change
gamete frequencies so that they are in linkage equilibrium (LE).
Once in LE, gamete frequencies do not change (unless acted on
by other forces)
At LE, alleles in gametes are independent of each other:
freq(AB) = freq(A)*freq(B)
freq(ABC) = freq(A) * freq(B) * freq(C)
Linkage disequilibrium
When linkage disequilibrium (LD) is present, alleles are no
longer independent: knowing that one allele is in the
gamete provides information on alleles at other loci:
freq(AB) ≠ freq(A) * freq(B)
The disequilibrium between alleles A and B is given by
D_AB = freq(AB) – freq(A)*freq(B)
GWAS relies on LD between markers and causal variants
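A small sketch that computes D_AB from the four gamete (haplotype) frequencies at two biallelic loci. It also returns the squared correlation r² = D² / [pA(1 - pA) pB(1 - pB)], which is the standard definition rather than something given on this slide. The haplotype frequencies and the function name are illustrative assumptions.

```python
def ld_stats(hap_freqs):
    """Return (D, r2) from haplotype frequencies with keys AB, Ab, aB, ab."""
    p_a = hap_freqs["AB"] + hap_freqs["Ab"]  # freq(A)
    p_b = hap_freqs["AB"] + hap_freqs["aB"]  # freq(B)
    d = hap_freqs["AB"] - p_a * p_b          # D_AB = freq(AB) - freq(A)*freq(B)
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2


# Linkage equilibrium: freq(AB) = freq(A) * freq(B)
d_le, r2_le = ld_stats({"AB": 0.25, "Ab": 0.25, "aB": 0.25, "ab": 0.25})
# Linkage disequilibrium: AB and ab gametes are over-represented
d_ld, r2_ld = ld_stats({"AB": 0.40, "Ab": 0.10, "aB": 0.10, "ab": 0.40})
print(d_le, r2_le)
print(d_ld, r2_ld)
```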

[Figure: example haplotypes carrying QTL alleles (Q1, Q2) and marker alleles (M1, M2), under linkage equilibrium and under linkage disequilibrium]
The Decay of Linkage Disequilibrium
The frequency of the AB gamete is given by
freq(AB) = freq(A)*freq(B) + D_AB
If the recombination frequency between the A and B loci
is c, the disequilibrium in generation t is
$D(t) = D(0)(1 - c)^t$
Note that D(t) -> zero, although the
approach can be slow when c is very
small
NB: Gene mapping & GWAS
[Figure: LD (0 to 1) against generation (0 to 100) for c = 0.10, c = 0.01 and c = 0.001]
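The decay formula can be tabulated directly. The sketch below uses the three recombination frequencies from the figure legend; the function names and the "half-life" helper are illustrative additions, not part of the slides.

```python
import math


def ld_after(d0, c, t):
    """Disequilibrium after t generations: D(t) = D(0) * (1 - c)**t."""
    return d0 * (1 - c) ** t


def generations_to_fraction(c, fraction):
    """Generations needed for D(t)/D(0) to fall to the given fraction."""
    return math.log(fraction) / math.log(1 - c)


for c in (0.10, 0.01, 0.001):
    remaining = ld_after(1.0, c, 100)
    half_life = generations_to_fraction(c, 0.5)
    print(f"c = {c}: D(100)/D(0) = {remaining:.4f}, "
          f"half-life = {half_life:.1f} generations")
```

With c = 0.001 most of the initial disequilibrium is still present after 100 generations, which is the situation that makes tightly linked markers useful for mapping.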
Forces that Generate LD
• Drift (finite population size)
• Selection
• Migration (admixture)
• Mutation
• Population structure (stratification)
Effective population size determines the number of markers needed for GWAS
Quantitative Genetics
The analysis of traits whose variation is
determined by both a number of genes and
environmental factors
[Figure: trait distributions for genotypes qq, Qq and QQ, with means m-a, m+d and m+a]
Phenotype is highly uninformative as to underlying genotype
Complex (or Quantitative) trait
• No (apparent) simple Mendelian basis for variation
in the trait
• May be a single gene strongly influenced by
environmental factors
• May be the result of a number of genes of equal
(or differing) effect
• Most likely, a combination of both multiple genes
and environmental factors.
• Examples: blood pressure, cholesterol levels, IQ,
height, etc.
Basic model of Quantitative Genetics
Basic model: P = G + E
G = average phenotypic value for that genotype
if we are able to replicate it over the universe
of environmental values, G = E[P]
G x E interaction: G values are different
across environments. Basic model now
becomes P = G + E + GE
Biometrical model for single diallelic Quantitative
Trait Locus (QTL)
Contribution of the QTL to the Mean (X)
$\mu = \sum_i x_i f(x_i)$
Genotypes           AA     Aa     aa
Effect, x           a      d      -a
Frequencies, f(x)   p^2    2pq    q^2
Mean (X) = a(p^2) + d(2pq) – a(q^2) = a(p-q) + 2pqd

Example: Apolipoprotein E & Alzheimer’s
Genotype               ee     Ee     EE
Average age of onset   68.4   75.5   84.3
2a = G(EE) - G(ee) = 84.3 - 68.4 = 15.9 --> a = 7.95
d = G(Ee) - [G(EE) + G(ee)]/2 = 75.5 - 76.35 = -0.85
d/a = -0.11: only a small amount of dominance
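The same arithmetic as a small sketch, using the three genotype means from the table above. The function name is an illustrative assumption.

```python
def additive_dominance(g_low, g_het, g_high):
    """Return (a, d) from the genotypic values of the two homozygotes and the heterozygote."""
    a = (g_high - g_low) / 2          # half the difference between homozygotes
    d = g_het - (g_high + g_low) / 2  # heterozygote deviation from the midpoint
    return a, d


# Average age of onset for genotypes ee, Ee, EE
a, d = additive_dominance(68.4, 75.5, 84.3)
print(round(a, 2), round(d, 2), round(d / a, 3))
```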

Biometrical model for single diallelic QTL
Contribution of the QTL to the Variance (X)
$\mathrm{Var} = \sum_i (x_i - \mu)^2 f(x_i)$
Genotypes           AA     Aa     aa
Effect, x           a      d      -a
Frequencies, f(x)   p^2    2pq    q^2    (HW proportions)
Var (X) = (a-m)^2 p^2 + (d-m)^2 2pq + (-a-m)^2 q^2, where m = Mean (X)
= V_QTL
Biometrical model for single diallelic QTL
Var (X) = (a-m)^2 p^2 + (d-m)^2 2pq + (-a-m)^2 q^2
= 2pq[a+(q-p)d]^2 + (2pqd)^2
= V_A,QTL + V_D,QTL
Additive effects: the main effects of individual alleles

Dominance effects: represent the interaction between alleles
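A numerical check of the decomposition, as a sketch: the variance computed directly from the genotype effects and HW frequencies should equal the sum of the additive and dominance terms. The values a = 1, d = 0.5, p = 0.3 and the function name are arbitrary choices for illustration.

```python
def qtl_mean_variance(a, d, p):
    """Return (mean, total variance, V_A, V_D) for a diallelic QTL in HW proportions."""
    q = 1 - p
    effects = (a, d, -a)                 # genotypes AA, Aa, aa
    freqs = (p * p, 2 * p * q, q * q)    # HW proportions
    mean = sum(x * f for x, f in zip(effects, freqs))
    var = sum((x - mean) ** 2 * f for x, f in zip(effects, freqs))
    v_a = 2 * p * q * (a + (q - p) * d) ** 2   # additive variance
    v_d = (2 * p * q * d) ** 2                 # dominance variance
    return mean, var, v_a, v_d


mean, var, v_a, v_d = qtl_mean_variance(a=1.0, d=0.5, p=0.3)
print(mean, var, v_a, v_d)
```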
Biometrical model for single biallelic QTL
[Figure: genotypic values -a, d and a plotted for genotypes aa, Aa and AA (Fisher 1918)]
Var (X) = Regression Variance + Residual Variance
= Additive Variance + Dominance Variance
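A sketch of the regression view, assuming HW proportions: regress the genotypic value on the number of A alleles (2, 1, 0), weighting each genotype by its frequency. The slope should equal a + (q - p)d, the variance explained by the regression should equal the additive variance, and the residual should equal the dominance variance. Parameter values and the function name are illustrative assumptions.

```python
def regression_decomposition(a, d, p):
    """Frequency-weighted regression of genotypic value on allele count."""
    q = 1 - p
    counts = (2, 1, 0)                   # copies of allele A in AA, Aa, aa
    values = (a, d, -a)                  # genotypic values
    freqs = (p * p, 2 * p * q, q * q)    # HW proportions
    mean_x = sum(x * f for x, f in zip(counts, freqs))
    mean_g = sum(g * f for g, f in zip(values, freqs))
    var_x = sum((x - mean_x) ** 2 * f for x, f in zip(counts, freqs))
    cov = sum((x - mean_x) * (g - mean_g) * f
              for x, g, f in zip(counts, values, freqs))
    var_g = sum((g - mean_g) ** 2 * f for g, f in zip(values, freqs))
    slope = cov / var_x
    regression_var = slope ** 2 * var_x
    residual_var = var_g - regression_var
    return slope, regression_var, residual_var


slope, regression_var, residual_var = regression_decomposition(1.0, 0.5, 0.3)
print(slope, regression_var, residual_var)
```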
Association (GWAS)
• State of play
• Model
• Analysis method
• Power of detection
Disease                             Number    Percent of Heritability   Heritability Measure
                                    of loci   Measure Explained
Age-related macular degeneration    5         50%                       Sibling recurrence risk
Crohn’s disease                     32        20%                       Genetic risk (liability)
Systemic lupus erythematosus        6         15%                       Sibling recurrence risk
Type 2 diabetes                     18        6%                        Sibling recurrence risk
HDL cholesterol                     7         5.2%                      Phenotypic variance
Height                              40        5%                        Phenotypic variance
Early onset myocardial infarction   9         2.8%                      Phenotypic variance
Fasting glucose                     4         1.5%                      Phenotypic variance
• GWAS works
• Effect sizes are typically small
– Disease: OR ~1.1 to ~1.3
– Quantitative traits: % var explained <<1%
Effect sizes QT (104 SNPs)
[Figure: histogram of % variance explained per SNP, quantitative traits; y-axis: Frequency]
Linear model for single SNP
• Allelic
Additive model
Y = µ+ b*x + e
x = 0, 1, 2 for genotypes aa, Aa and AA
• Genotypic
Additive + dominance model
Y = µ + Gi + e
Gi = genotype group effect, one for each of the
genotypes aa, Aa and AA
Method
• Linear regression
• ANOVA
• (other: maximum likelihood, Bayesian)
Test statistic (allelic model)
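$T = \hat{b} / \sigma(\hat{b}) \sim t_{N-2} \approx N(0,1)$
$T^2 = \hat{b}^2 / \mathrm{var}(\hat{b}) \sim F_{1,N-2} \approx \chi^2_1$
$\mathrm{var}(\hat{b}) = \frac{\sigma_e^2}{N\,\mathrm{var}(x)} = \frac{\sigma_e^2}{N\,2p(1-p)}$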
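A minimal sketch of the allelic test on simulated data, using only the standard library. The simulation settings (N = 5000, p = 0.3, b = 0.2, residual SD = 1) and the function names are made up for illustration; the p-value uses the normal approximation from the first formula above.

```python
import math
import random


def simulate(n, p, b, sigma_e, rng):
    """Genotypes x in {0, 1, 2} under HW proportions; phenotype y = b*x + e."""
    xs = [sum(rng.random() < p for _ in range(2)) for _ in range(n)]
    ys = [b * x + rng.gauss(0.0, sigma_e) for x in xs]
    return xs, ys


def allelic_test(xs, ys):
    """Least-squares fit of y = mu + b*x + e; returns (b_hat, se, T, p_value)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)   # equals N * var(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b_hat = sxy / sxx
    mu_hat = mean_y - b_hat * mean_x
    rss = sum((y - mu_hat - b_hat * x) ** 2 for x, y in zip(xs, ys))
    sigma2_e = rss / (n - 2)                   # residual variance estimate
    se = math.sqrt(sigma2_e / sxx)             # sqrt of var(b_hat)
    t = b_hat / se
    p_value = math.erfc(abs(t) / math.sqrt(2)) # two-sided, N(0,1) approximation
    return b_hat, se, t, p_value


rng = random.Random(7)
xs, ys = simulate(n=5000, p=0.3, b=0.2, sigma_e=1.0, rng=rng)
b_hat, se, t, p_value = allelic_test(xs, ys)
print(b_hat, se, t, t ** 2, p_value)
```

The estimated standard error should be close to sqrt(σe² / (N 2p(1-p))), since var(x) is approximately 2p(1-p) under HW proportions.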
Statistical Power (additive model)
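Proportion of phenotypic variance explained by the QTL:
$q^2 = \{2p(1-p)[a + d(1-2p)]^2\} / \sigma_p^2$
Non-centrality parameter of χ² test:
$\lambda = N q^2/(1-q^2) \approx N q^2$
Required sample size given type-I (α) and type-II (β) error:
$N = [(1-q^2)/q^2]\,(z_{1-\alpha/2} + z_{1-\beta})^2 \approx (z_{1-\alpha/2} + z_{1-\beta})^2 / q^2$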
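A sketch of the sample size formula using the standard library normal quantile function. Here `power` stands for 1 - β; the example values of q², α and power are illustrative assumptions, as is the function name.

```python
from statistics import NormalDist


def required_n(q2, alpha, power):
    """N = [(1 - q2) / q2] * (z_{1-alpha/2} + z_{1-beta})**2, with power = 1 - beta."""
    z = NormalDist().inv_cdf
    ncp_needed = (z(1 - alpha / 2) + z(power)) ** 2
    return (1 - q2) / q2 * ncp_needed


# A QTL explaining 1% of variance, tested at alpha = 0.05 with 80% power
n_single_test = required_n(q2=0.01, alpha=0.05, power=0.80)
# A QTL explaining 0.5% of variance at a stringent genome-wide threshold
n_genome_wide = required_n(q2=0.005, alpha=5e-8, power=0.80)
print(round(n_single_test), round(n_genome_wide))
```

Because N scales with 1/q², halving the variance explained roughly doubles the required sample size, and a stringent α raises it further.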

LD again
r² = squared LD correlation between QTL and genotyped SNP
Proportion of variance explained at SNP = r²q²
Required sample size for detection
$N \approx (z_{1-\alpha/2} + z_{1-\beta})^2 / (r^2 q^2)$
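A short sketch of how incomplete LD inflates the required sample size, using the approximate formula above. The values of q², α and power and the function name are illustrative assumptions; `power` stands for 1 - β.

```python
from statistics import NormalDist


def required_n_at_marker(q2, r2, alpha, power):
    """Approximate N to detect a QTL through a marker in LD (r2) with it."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) ** 2 / (r2 * q2)


for r2 in (1.0, 0.8, 0.5, 0.2):
    n = required_n_at_marker(q2=0.005, r2=r2, alpha=5e-8, power=0.80)
    print(f"r2 = {r2}: N is about {n:.0f}")
```

The required N is proportional to 1/r², so a marker with r² = 0.5 needs twice the sample size of the causal variant itself.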
Genetic Power Calculator (Shaun Purcell)
http://pngu.mgh.harvard.edu/~purcell/gpc/
Serum bilirubin: if all GWAS were so simple…
38% of phenotypic variance explained
[Figure: 95% CI of phenotype by genotype (0, 1, 2) at RS2070959_A]
