
Queensland Institute of Medical Research

GWAS for quantitative traits

Peter M. Visscher
peter.visscher@qimr.edu.au
Overview
• Darwin and Mendel
• Background: population genetics
• Background: quantitative genetics
• GWAS
– Examples
– Analysis
– Statistical power
[Galton, 1889]
Mendelian Genetics
Following a single (or several) genes that we can directly score

Phenotype highly informative as to genotype
Darwin & Mendel

• Darwin (1859) Origin of Species


– Instant Classic, major immediate impact
– Problem: Model of Inheritance
• Darwin assumed Blending inheritance
• Offspring = average of both parents
• zo = (zm + zf)/2
• Fleeming Jenkin (1867) pointed out the problem
– Var(zo) = Var[(zm + zf)/2] = (1/2) Var(parents)
– Hence, under blending inheritance, half the variation is
removed each generation and this must somehow be
replenished by mutation.
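
Jenkin's argument can be checked with a small simulation. This is an illustrative sketch, not part of the original slides: it assumes random mating (both parents drawn independently from the current generation) and a normally distributed starting trait, and the helper name `blend_generation` is hypothetical.

```python
import random
import statistics

def blend_generation(parents, rng):
    """Return offspring whose trait is the mid-parent value of two random parents."""
    n = len(parents)
    return [(rng.choice(parents) + rng.choice(parents)) / 2 for _ in range(n)]

rng = random.Random(1)
# Generation 0: trait values with variance close to 1 (assumed starting point).
population = [rng.gauss(0.0, 1.0) for _ in range(20000)]

variances = [statistics.pvariance(population)]
for _ in range(4):
    population = blend_generation(population, rng)
    variances.append(statistics.pvariance(population))

# Each ratio should be close to 0.5: half the variance is lost per generation.
ratios = [variances[i + 1] / variances[i] for i in range(4)]
print([round(r, 2) for r in ratios])
```

After four generations only about 1/16 of the starting variance is left, which is why blending inheritance needs an implausibly large supply of new variation.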
Mendel
• Mendel (1865), Experiments in Plant Hybridization
• No impact, paper essentially ignored
– Ironically, Darwin had an apparently unread copy in his
library
– Why ignored? Perhaps too mathematical for 19th
century biologists
• Rediscovery in 1900 (by three independent
groups)
• Mendel’s key idea: Genes are discrete particles
passed on intact from parent to offspring
The height vs. pea debate (early 1900s)

Biometricians vs. Mendelians

Do quantitative traits have the same hereditary and evolutionary properties as discrete characters?
[Figure: trait distributions for genotypes qq, Qq and QQ, with genotype means m-a, m+d and m+a]

RA Fisher (1918). Transactions of the Royal Society of Edinburgh 52: 399-433.
Population Genetics
• Allele and genotype frequencies
• Hardy-Weinberg Equilibrium
• Linkage (dis)equilibrium
Allele and Genotype Frequencies
Given genotype frequencies, we can always compute allele frequencies, e.g.,

$$p_i = \mathrm{freq}(A_i A_i) + \frac{1}{2} \sum_{j \neq i} \mathrm{freq}(A_i A_j)$$
The converse is not true: given allele frequencies we
cannot uniquely determine the genotype frequencies

For n alleles, there are n(n+1)/2 genotypes

If we are willing to assume random mating, genotype frequencies are in Hardy-Weinberg proportions:

$$\mathrm{freq}(A_i A_j) = \begin{cases} p_i^2 & \text{for } i = j \\ 2 p_i p_j & \text{for } i \neq j \end{cases}$$
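
The two directions (genotype frequencies to allele frequencies, and allele frequencies to Hardy-Weinberg genotype frequencies) can be sketched as follows. The function names and the three-allele genotype frequencies are made up for illustration; they are not from the slides.

```python
from itertools import combinations_with_replacement

def allele_frequencies(genotype_freqs):
    """p_i = freq(AiAi) + 1/2 * sum over j != i of freq(AiAj).

    genotype_freqs maps unordered allele pairs, e.g. ("A1", "A2"), to frequencies.
    """
    p = {}
    for (a, b), f in genotype_freqs.items():
        # A homozygote adds f/2 twice to its allele; a heterozygote adds f/2 to each.
        p[a] = p.get(a, 0.0) + f / 2
        p[b] = p.get(b, 0.0) + f / 2
    return p

def hardy_weinberg(p):
    """Genotype frequencies under random mating: p_i^2 (i = j) or 2 p_i p_j (i != j)."""
    alleles = sorted(p)
    return {
        (a, b): p[a] ** 2 if a == b else 2 * p[a] * p[b]
        for a, b in combinations_with_replacement(alleles, 2)
    }

# Made-up genotype frequencies for three alleles (not in HW proportions).
observed = {
    ("A1", "A1"): 0.30, ("A1", "A2"): 0.20, ("A1", "A3"): 0.10,
    ("A2", "A2"): 0.20, ("A2", "A3"): 0.10, ("A3", "A3"): 0.10,
}
p = allele_frequencies(observed)
expected = hardy_weinberg(p)
print(p)
print(len(expected))  # n(n+1)/2 genotypes for n = 3 alleles
```

The observed and expected genotype frequencies differ even though both give the same allele frequencies, which is the sense in which the converse does not hold.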
Hardy-Weinberg
• Prediction of genotype frequencies from allele freqs

• Allele frequencies remain unchanged over generations, provided:
• Infinite population size (no genetic drift)
• No mutation
• No selection
• No migration
• Under HW conditions, a single generation of random
mating gives genotype frequencies in Hardy-Weinberg
proportions, and they remain forever in these proportions
NB: QC in GWAS studies
Linkage equilibrium

Random mating and recombination eventually change gamete frequencies so that they are in linkage equilibrium (LE).

Once in LE, gamete frequencies do not change (unless acted on by other forces)

At LE, alleles in gametes are independent of each other:

freq(AB) = freq(A)*freq(B)
freq(ABC) = freq(A) * freq(B) * freq(C)
Linkage disequilibrium
When linkage disequilibrium (LD) is present, alleles are no
longer independent: knowing that one allele is in the
gamete provides information on alleles at other loci:

freq(AB) ≠ freq(A) * freq(B)

The disequilibrium between alleles A and B is given by

D_AB = freq(AB) – freq(A)*freq(B)

GWAS relies on LD between markers and causal variants
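
A minimal sketch of the disequilibrium calculation, using made-up gamete frequencies. The r² formula (D² divided by the product of the four allele frequencies) is the standard squared-correlation measure of LD; it is an addition here and is not defined on this slide. The function name `ld_stats` is hypothetical.

```python
def ld_stats(f_AB, f_A, f_B):
    """Return (D, r2) for alleles A and B from gamete and allele frequencies."""
    d = f_AB - f_A * f_B
    # r2: squared correlation between the two loci (standard definition,
    # assumed here rather than taken from the slide).
    r2 = d ** 2 / (f_A * (1 - f_A) * f_B * (1 - f_B))
    return d, r2

# Made-up gamete frequencies for AB, Ab, aB, ab.
gametes = {"AB": 0.4, "Ab": 0.1, "aB": 0.1, "ab": 0.4}
f_A = gametes["AB"] + gametes["Ab"]
f_B = gametes["AB"] + gametes["aB"]
d_AB, r2 = ld_stats(gametes["AB"], f_A, f_B)
print(round(d_AB, 3), round(r2, 3))  # 0.15 0.36
```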


[Figure: example haplotypes at a QTL (alleles Q1, Q2) and a marker (alleles M1, M2), shown for a population in linkage equilibrium and for one in linkage disequilibrium]
The Decay of Linkage Disequilibrium
The frequency of the AB gamete is given by

freq(AB) = freq(A)*freq(B) + D_AB

If the recombination frequency between the A and B loci is c, the disequilibrium in generation t is

$$D(t) = D(0)\,(1 - c)^t$$

Note that D(t) -> zero, although the approach can be slow when c is very small

[Figure: LD (y-axis, 0 to 1) against generation (x-axis, 0 to 100) for c = 0.10, c = 0.01 and c = 0.001]

NB: Gene mapping & GWAS
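
The decay formula is easy to tabulate for the three recombination frequencies in the plot. This is a sketch with hypothetical function names; the starting value D(0) = 1 is only a scaling choice, so the output is LD relative to generation 0.

```python
def ld_after(d0, c, t):
    """Disequilibrium after t generations of random mating: D(t) = D(0) * (1 - c)**t."""
    return d0 * (1 - c) ** t

def generations_to_halve(c):
    """Smallest whole number of generations after which D has at least halved."""
    t = 0
    while (1 - c) ** t > 0.5:
        t += 1
    return t

for c in (0.10, 0.01, 0.001):
    # Columns: c, relative LD left after 100 generations, generations to halve LD.
    print(c, round(ld_after(1.0, c, 100), 3), generations_to_halve(c))
```

For tightly linked loci (c = 0.001) about 90% of the initial LD is still present after 100 generations, which is what makes association mapping with nearby markers possible.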
Forces that Generate LD
• Drift (finite population size)
• Selection
• Migration (admixture)
• Mutation
• Population structure (stratification)

Effective population size determines the number of markers needed for GWAS
Quantitative Genetics
The analysis of traits whose variation is
determined by both a number of genes and
environmental factors
[Figure: trait distributions for genotypes qq, Qq and QQ, with genotype means m-a, m+d and m+a]

Phenotype is highly uninformative as to underlying genotype
Complex (or Quantitative) trait
• No (apparent) simple Mendelian basis for variation
in the trait
• May be a single gene strongly influenced by
environmental factors
• May be the result of a number of genes of equal
(or differing) effect
• Most likely, a combination of both multiple genes
and environmental factors.
• Examples: blood pressure, cholesterol levels, IQ,
height, etc.
Basic model of Quantitative Genetics

Basic model: P = G + E

G = average phenotypic value for that genotype if we are able to replicate it over the universe of environmental values, G = E[P]

G x E interaction: G values are different across environments. Basic model now becomes P = G + E + GE
Biometrical model for single diallelic Quantitative Trait Locus (QTL)
Contribution of the QTL to the Mean (X)

$$\mu = \sum_i x_i f(x_i)$$

Genotypes            AA     Aa     aa
Effect, x            a      d      -a
Frequencies, f(x)    p^2    2pq    q^2

$$\mathrm{Mean}(X) = a p^2 + d(2pq) - a q^2 = a(p - q) + 2pqd$$


Example: Apolipoprotein E & Alzheimer’s

Genotype                ee      Ee      EE
Average age of onset    68.4    75.5    84.3

2a = G(EE) - G(ee) = 84.3 - 68.4 --> a = 7.95

d = G(Ee) - [G(EE) + G(ee)]/2 = -0.85

d/a = -0.11: only a small amount of dominance
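
The arithmetic of the example can be reproduced directly from the three genotype means. In this sketch the function names are hypothetical, and the allele frequency p = 0.8 used for the population mean is an arbitrary illustrative value, not an estimate from the slides.

```python
def additive_dominance(g_low, g_het, g_high):
    """Return (m, a, d) from the three genotype means.

    m is the midpoint of the two homozygotes, a is half their difference,
    and d is the deviation of the heterozygote from the midpoint.
    """
    m = (g_high + g_low) / 2
    a = (g_high - g_low) / 2
    d = g_het - m
    return m, a, d

def qtl_mean(a, d, p):
    """Contribution of the QTL to the mean, a(p - q) + 2pqd, under HW proportions."""
    q = 1 - p
    return a * (p - q) + 2 * p * q * d

# Average ages of onset from the slide, in the order ee, Ee, EE.
m, a, d = additive_dominance(68.4, 75.5, 84.3)
print(round(a, 2), round(d, 2), round(d / a, 2))

# Population mean for a hypothetical frequency p = 0.8 of the E allele.
print(round(m + qtl_mean(a, d, 0.8), 2))
```

The ratio d/a evaluates to about -0.107, so the heterozygote sits only slightly below the midpoint of the two homozygotes.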


Biometrical model for single diallelic QTL
Contribution of the QTL to the Variance (X)

$$\mathrm{Var} = \sum_i (x_i - \mu)^2 f(x_i)$$

Genotypes            AA     Aa     aa
Effect, x            a      d      -a
Frequencies, f(x)    p^2    2pq    q^2     (HW proportions)

$$\mathrm{Var}(X) = (a-m)^2 p^2 + (d-m)^2\, 2pq + (-a-m)^2 q^2 = V_{QTL}$$

where m = µ = Mean (X)
Biometrical model for single diallelic QTL

$$\begin{aligned}
\mathrm{Var}(X) &= (a-m)^2 p^2 + (d-m)^2\, 2pq + (-a-m)^2 q^2 \\
&= 2pq\,[a + (q-p)d]^2 + (2pqd)^2 \\
&= V_{A,QTL} + V_{D,QTL}
\end{aligned}$$

Additive effects: the main effects of individual alleles

Dominance effects: represent the interaction between alleles
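
The decomposition can be checked numerically by comparing the closed-form additive and dominance components with the variance computed directly from the three genotype classes. The values of a, d and p below are arbitrary, and the function names are hypothetical.

```python
def qtl_variance_components(a, d, p):
    """Return (V_A, V_D) for a diallelic QTL in HW proportions."""
    q = 1 - p
    v_a = 2 * p * q * (a + (q - p) * d) ** 2
    v_d = (2 * p * q * d) ** 2
    return v_a, v_d

def qtl_variance_direct(a, d, p):
    """Var(X) as the sum of (x - m)^2 f(x) over genotypes AA, Aa, aa."""
    q = 1 - p
    effects = (a, d, -a)
    freqs = (p * p, 2 * p * q, q * q)
    m = sum(x * f for x, f in zip(effects, freqs))
    return sum((x - m) ** 2 * f for x, f in zip(effects, freqs))

a, d, p = 1.0, 0.5, 0.3  # arbitrary illustrative values
v_a, v_d = qtl_variance_components(a, d, p)
print(round(v_a, 4), round(v_d, 4), round(qtl_variance_direct(a, d, p), 4))
```

Note that the additive variance depends on d whenever p differs from q, so dominance at the locus still contributes to the additive component.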
Biometrical model for single biallelic QTL

[Figure (Fisher 1918): genotypic values -a, d and a plotted for genotypes aa, Aa and AA]

Var (X) = Regression Variance + Residual Variance
        = Additive Variance + Dominance Variance
Association (GWAS)
• State of play
• Model
• Analysis method
• Power of detection
Disease                              Number of loci   Percent of heritability   Heritability measure
                                                      measure explained
Age-related macular degeneration     5                50%                       Sibling recurrence risk
Crohn’s disease                      32               20%                       Genetic risk (liability)
Systemic lupus erythematosus         6                15%                       Sibling recurrence risk
Type 2 diabetes                      18               6%                        Sibling recurrence risk
HDL cholesterol                      7                5.2%                      Phenotypic variance
Height                               40               5%                        Phenotypic variance
Early onset myocardial infarction    9                2.8%                      Phenotypic variance
Fasting glucose                      4                1.5%                      Phenotypic variance

• GWAS works
• Effect sizes are typically small
– Disease: OR ~1.1 to ~1.3
– Quantitative traits: % var explained <<1%
Effect sizes QT (104 SNPs)
[Figure: histogram of % variance explained per SNP for quantitative traits; y-axis: frequency (0 to 35); x-axis tick labels not recoverable]
Linear model for single SNP
• Allelic
Additive model
Y = µ + b*x + e
x = 0, 1, 2 for genotypes aa, Aa and AA

• Genotypic
Additive + dominance model
Y = µ + Gi + e
Gi = effect of genotype group i, corresponding to
genotypes aa, Aa and AA
Method
• Linear regression
• ANOVA
• (other: maximum likelihood, Bayesian)
Test statistic (allelic model)

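$$T = \frac{\hat{b}}{\sigma(\hat{b})} \sim t_{N-2} \approx N(0,1)$$

$$T^2 = \frac{\hat{b}^2}{\mathrm{var}(\hat{b})} \sim F_{1,N-2} \approx \chi^2_1$$

$$\mathrm{var}(\hat{b}) = \frac{\sigma_e^2}{N\,\mathrm{var}(x)} = \frac{\sigma_e^2}{N\,2p(1-p)}$$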
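
A minimal sketch of the allelic test on simulated data, using only the Python standard library. Everything about the simulation is an assumption made for illustration: allele frequency 0.3, true effect b = 0.2, residual standard deviation 1, N = 5000, and genotypes drawn in Hardy-Weinberg proportions. The function name `allelic_test` is hypothetical.

```python
import math
import random

def allelic_test(x, y):
    """Least-squares fit of y = mu + b*x + e; returns (b_hat, se_b, T)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    mu_hat = mean_y - b_hat * mean_x
    # Residual variance on N - 2 degrees of freedom.
    rss = sum((yi - mu_hat - b_hat * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_e = rss / (n - 2)
    se_b = math.sqrt(sigma2_e / sxx)  # sxx plays the role of N * var(x)
    return b_hat, se_b, b_hat / se_b

# Simulated data (all parameter values are assumptions for this example).
rng = random.Random(7)
n, p, b_true = 5000, 0.3, 0.2
x = [(rng.random() < p) + (rng.random() < p) for _ in range(n)]  # genotypes 0/1/2
y = [b_true * xi + rng.gauss(0.0, 1.0) for xi in x]

b_hat, se_b, t_stat = allelic_test(x, y)
# Under HW, var(x) = 2p(1-p), so se(b) should be near sqrt(sigma_e^2 / (N * 2p(1-p))).
se_expected = math.sqrt(1.0 / (n * 2 * p * (1 - p)))
print(round(b_hat, 3), round(se_b, 4), round(se_expected, 4), round(t_stat, 2))
```

The empirical standard error from the regression should agree closely with the value predicted from N and the allele frequency, which is the link between the test statistic and the power formulas.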
Statistical Power (additive model)
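Proportion of phenotypic variance explained by the QTL:

$$q^2 = \frac{2p(1-p)\,[a + d(1-2p)]^2}{\sigma_p^2}$$

Non-centrality parameter of χ² test:

$$\lambda = \frac{N q^2}{1 - q^2} \approx N q^2$$

Required sample size given type-I (α) and type-II (β) error:

$$N = \frac{1 - q^2}{q^2}\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \approx \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{q^2}$$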
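
The sample size formula can be evaluated with the standard library's normal quantile function. This is a sketch under assumed inputs: the significance level 5e-8 and power of 80% (β = 0.2) are illustrative choices, not values stated on the slide, and the function names are hypothetical.

```python
from statistics import NormalDist

def variance_explained(a, d, p, sigma_p2):
    """q2 = 2p(1-p)[a + d(1-2p)]^2 / sigma_p^2."""
    return 2 * p * (1 - p) * (a + d * (1 - 2 * p)) ** 2 / sigma_p2

def required_n(q2, alpha, beta):
    """N = [(1 - q2) / q2] * (z_(1-alpha/2) + z_(1-beta))^2."""
    z = NormalDist().inv_cdf
    return (1 - q2) / q2 * (z(1 - alpha / 2) + z(1 - beta)) ** 2

# Illustrative values (assumed): alpha = 5e-8, power = 80%.
for q2 in (0.01, 0.005, 0.001):
    print(q2, round(required_n(q2, 5e-8, 0.2)))
```

Because N scales with 1/q², a variant explaining 0.1% of the variance needs roughly ten times the sample of one explaining 1%.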


LD again
r² = squared LD correlation between QTL and genotyped SNP

Proportion of variance explained at SNP = r²q²

Required sample size for detection

$$N \approx \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{r^2 q^2}$$
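
The effect of incomplete LD on the required sample size can be sketched with the approximate formula. The inputs are assumptions for illustration (a QTL explaining 0.5% of the variance, α = 5e-8, power 80%), and the function name is hypothetical.

```python
from statistics import NormalDist

def required_n_at_marker(q2, r2, alpha, beta):
    """Approximate N to detect a QTL through a marker:
    (z_(1-alpha/2) + z_(1-beta))^2 / (r2 * q2)."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(1 - beta)) ** 2 / (r2 * q2)

# Illustrative values (assumed): q2 = 0.005, alpha = 5e-8, power = 80%.
for r2 in (1.0, 0.8, 0.5):
    print(r2, round(required_n_at_marker(0.005, r2, 5e-8, 0.2)))
```

The sample size is inflated by a factor 1/r², so a marker with r² = 0.5 to the causal variant needs twice the sample of the causal variant itself.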
Genetic Power Calculator (Shaun Purcell)
http://pngu.mgh.harvard.edu/~purcell/gpc/
Serum bilirubin: if all GWAS were so simple…

[Figure: mean phenotype with 95% CI (y-axis) by genotype at RS2070959_A (x-axis: 0, 1, 2); 38% of phenotypic variance explained; the figure also carries the label 1984]
