
Computational Biology II

MBG2004
SNP Genotyping Methods
Assistant Prof. Cemalettin Bekpen

Some of the slides are adapted from Debbie Nickerson (Genome Sciences-UW)
https://www.idtdna.com/pages/education/decoded/article/genotyping-terms-to-know
SNP Genotyping Technologies
https://www.youtube.com/watch?v=JP4LlGCrLdA
Please watch it; you are responsible for its content.
SNP Genotyping
Probe and Target: Matched (C Allele, target G) vs. Mis-Matched (T Allele, target A)

Allele-Specific Hybridization: Hybridize vs. Fail to hybridize
– Eclipse, Dash, Molecular Beacon, Affymetrix

Taqman: Degrade vs. Fail to degrade

Polymerase Extension (+ddCTP): C incorporated vs. C fails to incorporate
– Fluorescence Polarization, Sequenom

Oligonucleotide Ligation: Ligate vs. Fail to ligate
– SNPlex, ParAllele, Illumina
SNP Typing Formats (by Scale)

Low: Microtiter Plates - Fluorescence
eg. Taqman - Good for a few markers and lots of samples - PCR prior to genotyping

Medium: Size Analysis by Electrophoresis
eg. SNPlex - Intermediate multiplexing reduces costs - Genotype directly on genomic DNA - new paradigm for high throughput

High: Arrays - Custom or Universal
eg. Illumina, ParAllele, Affymetrix - Highly multiplexed - 96, 1,500 SNPs and beyond (500K+)
Taqman
Microtiter Plates - Fluorescence

Genotyping with fluorescence-based homogeneous assays (single-tube assay) = 1 SNP/tube
https://www.applied-maths.com/applications/taqman-based-snp-genotyping
Genotype Calling - Cluster Analysis
(Figure: scatter plot with axes SNP 1252 - T and SNP 1252 - C)
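A minimal sketch of how a genotype could be called from the two allele-specific fluorescence signals plotted in the cluster diagram. This is not the vendor's clustering algorithm: it uses fixed angle thresholds as a simplified stand-in for cluster analysis, and the function name `call_genotype` and all threshold values are illustrative assumptions.

```python
import math

def call_genotype(signal_c, signal_t, min_intensity=0.2,
                  hom_c_max_deg=30.0, hom_t_min_deg=60.0):
    """Call a genotype from two allele-specific fluorescence signals.

    All thresholds are illustrative assumptions, not vendor defaults.
    """
    # Weak total signal: treat as a failed reaction (no call).
    if math.hypot(signal_c, signal_t) < min_intensity:
        return None
    # Angle of the point in the (C signal, T signal) plane, in degrees.
    angle = math.degrees(math.atan2(signal_t, signal_c))
    if angle <= hom_c_max_deg:
        return "CC"   # signal mostly from the C-allele probe
    if angle >= hom_t_min_deg:
        return "TT"   # signal mostly from the T-allele probe
    return "CT"       # both probes give signal

# Hypothetical (C signal, T signal) pairs for four samples.
samples = [(1.0, 0.05), (0.6, 0.55), (0.04, 0.9), (0.05, 0.03)]
calls = [call_genotype(c, t) for c, t in samples]
print(calls)
```

In practice the cluster boundaries are usually fitted per SNP from the data rather than fixed in advance.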
Genotyping by Mass Spectrometry - 24
Polyacrylamide Gel Electrophoresis (PAGE)

– Different concentration of acrylamide gels


– Proteins are negatively charged
– Move toward the positive electrode according to
size
– Smaller proteins travel furthest
– Can be visualized by several types of stains with
varying sensitivity (coomassie blue, silver)
– Can be semi-quantitative
– Can be native or denatured using Sodium
Dodecyl Sulfate (SDS-PAGE) – more common
– The smaller the protein size – the higher
acrylamide concentration used
Capillary electrophoresis (mun.ca)
Technological Leap - No advance PCR

Universal PCR after preparing multiple regions for analysis -

Several are based on primer-specific ligation on genomic DNA followed by PCR of the ligated products - different strategies and different readouts.

SNPlex, Illumina, ParAllele (Affymetrix)

Also, reduced representation - Affymetrix
- cut with restriction enzyme, then ligate linkers, amplify from the linkers, and read out by chip hybridization.
SNPlex Assay - 48 SNPs
Size Analysis by Electrophoresis

(Figure: allele-specific probes (A or G) tagged with ZipCode1 or ZipCode2, a locus-specific probe (P), and universal PCR priming sites, annealed to the genomic DNA target (C).)

1. Ligation - ligation product formed (homozygote shown in this case)

2. Clean-up

Detection
9. Characterize on Capillary Sequencer
(Figure: capillary traces for SNP 1 and SNP 2)
SNPlex Readout

Zipchute1 N C Position 1
Zipchute2 NN A Position 2
Zipchute3 NNN T Position 3
Zipchute n N(n) T Position n

n ~ 48/lane
~2000 lanes/day
~96,000 genotypes/day
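The daily throughput quoted for the SNPlex readout follows directly from the multiplexing level and the number of capillary lanes. A small sketch of the arithmetic, using only the approximate figures from the slide (the variable names are illustrative):

```python
snps_per_lane = 48     # n ~ 48 SNPs multiplexed per capillary lane
lanes_per_day = 2000   # ~2000 lanes/day

# Each lane yields one genotype per multiplexed SNP.
genotypes_per_day = snps_per_lane * lanes_per_day
print(genotypes_per_day)  # 96000
```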
Arrays - Custom or Universal
ParAllele - Defined and Custom Formats

- Intermediate Strategy

- Multiplex ~ 20,000 SNPs

- Affymetrix readout - Universal Arrays

ParAllele Technology (MIP)

Molecular Inversion Probes (MIP)

https://www.ncbi.nlm.nih.gov/probe/docs/techmip/
The probes are labelled and either detected by an array reader or sequenced on an Illumina instrument.
Whole Genome Association Strategies

Two Platforms Available Different Designs

- Affymetrix

- Illumina
Affymetrix GeneChip Mapping 500K Array Set

https://www.thermofisher.com/us/en/home/life-science/microarray-analysis/affymetrix.html?category=34003&categoryIdClicked=34003&rootCategoryId=34003&navMode=34003&aId=aboutNav
Affymetrix Assays

100,000 to 500,000 fixed formats

Whole genome strategy

Not all regions - unique

Cheap per genotype, but cost per sample and throughput are an issue


500K: Content Optimized SNP Selection

Selection funnel: ~2,200,000 SNPs from Public & Perlegen (48 individuals; call rate, concordance) -> ~650K SNPs (400 samples; call rate, accuracy, LD) -> 500K SNPs

• Initial Selection: 48 people
– 2.2M SNPs
– 25 million genotypes
– 16 each Caucasian, African, Asian
– All HapMap samples
• Maximize performance: Second selection over 400 people
– 270 HapMap Samples
– 130 diversity samples
– Accuracy
  • HW, Mendel error, reproducibility
– Call rates
• Maximize information content:
– Prioritize SNPs based on LD & HapMap (Broad Institute)
The Assay - Details

Optimized for 250-2000 bp

http://www.affymetrix.com/products/arrays/specific/100k.affx
GeneChip® Mapping 500K Assay

http://tools.thermofisher.com/content/sfs/manuals/500k_assay_manual.pdf
40 probes are used per SNP
High Throughput Chip Formats
80% genome coverage of Mapping 500K

• 500K run on 270 HapMap samples
• Pairwise r² analysis for common SNPs (MAF > 0.05)
• Robust coverage across populations at r² = 0.8
– CEPH, Asian ~66%
– Yoruba ~45%
• 2 & 3 marker predictors (multimarker) further increase coverage
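Coverage here is measured with pairwise r² between SNPs. As a rough sketch, assuming haplotype and allele frequencies are already known, r² can be computed from the standard linkage disequilibrium coefficient D, and coverage as the fraction of SNPs whose best r² with any array SNP reaches the threshold. The function names `ld_r2` and `coverage` are illustrative, and the example values are made up.

```python
def ld_r2(p_a, p_b, p_ab):
    """Pairwise r^2 between two biallelic SNPs.

    p_a, p_b: frequencies of allele A at SNP 1 and allele B at SNP 2.
    p_ab: frequency of the haplotype carrying both A and B.
    """
    d = p_ab - p_a * p_b                       # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom

def coverage(best_r2_per_snp, threshold=0.8):
    """Fraction of SNPs whose best r^2 with any array SNP meets the threshold."""
    covered = sum(1 for r2 in best_r2_per_snp if r2 >= threshold)
    return covered / len(best_r2_per_snp)

r2_perfect = ld_r2(0.5, 0.5, 0.5)    # alleles always travel together
r2_none = ld_r2(0.5, 0.5, 0.25)      # alleles are independent
example_coverage = coverage([1.0, 0.85, 0.4, 0.79])  # made-up best-r^2 values
print(r2_perfect, r2_none, example_coverage)
```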
Mapping 500K Set

• >500K SNPs
– 2 array set
• Performance
– 93-98% call rate range (>95% average)
– >99.5% concordance with HapMap Genotypes, 99.9% reproducibility
• SNP lists, annotation and genotype data available without restriction at Affymetrix.com
Illumina - Infinium I & II
10K - 300K

• https://www.youtube.com/watch?v=g0iPW9eAwrc

• https://dnatech.genomecenter.ucdavis.edu/infinium-assay/
• https://www.jove.com/v/50683/infinium-assay-for-large-scale-snp-genotyping-applications
Infinium II Assay
Single Base Extension

Two haptens/colors
(Figure: Bead U probe, A and C alleles, WGA target)
https://dnatech.genomecenter.ucdavis.edu/infinium-assay/
Hardy–Weinberg Principle
• When gametes containing either of two alleles, A or a, unite at random to form the next generation, the genotype frequencies among the zygotes (AA : Aa : aa) are given by the ratio
p² : 2pq : q²
This constitutes the Hardy–Weinberg (HW) Principle.

p = frequency of a dominant allele A
q = frequency of a recessive allele a
p + q = 1
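A minimal sketch of the HW expectation, assuming a single locus with two alleles. The function name `hw_genotype_frequencies` and the example value p = 0.6 are illustrative.

```python
def hw_genotype_frequencies(p):
    """Expected genotype frequencies under Hardy-Weinberg for allele frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    q = 1.0 - p                      # p + q = 1
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}

freqs = hw_genotype_frequencies(0.6)
print(freqs)
```

The three frequencies always sum to 1, since p² + 2pq + q² = (p + q)².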

Fig. 14.10
Hardy–Weinberg Principle
• One important implication of the HW Principle is that allelic
frequencies will remain constant over time if the following
conditions are met:
• The population is sufficiently large
• Mating is random
• Allelic frequencies are the same in males and females
• Selection does not occur = all genotypes have equal viability and fertility
• Mutation and migration are absent

H-W ASSUMPTIONS:

1) Mating is random (with respect to the locus).

2) The population is infinitely large (no sampling error, i.e., no random genetic drift).

3) Genes are not added from outside the population (no gene flow or migration).

4) Genes do not change from one allelic state to another (no mutation).

5) All individuals have equal probabilities of survival and reproduction (no selection).
IMPLICATIONS OF THE H-W PRINCIPLE:

1) A random mating population with no external forces acting on it will reach the equilibrium H-W frequencies in a single generation, and these frequencies remain constant thereafter.

2) Any perturbation of the gene frequencies leads to a new equilibrium after random mating.

3) The amount of heterozygosity is maximized when the gene frequencies are intermediate.

2pq has a maximum value of 0.5 when p = q = 0.5
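Two of these implications can be checked numerically. The sketch below assumes the H-W conditions hold exactly (random mating, no drift, selection, mutation or migration); the function name `next_generation` and the starting frequencies are illustrative.

```python
def next_generation(f_AA, f_Aa, f_aa):
    """Genotype frequencies after one round of random mating."""
    p = f_AA + 0.5 * f_Aa            # frequency of allele A among gametes
    q = 1.0 - p
    return (p * p, 2 * p * q, q * q)

# Start far from equilibrium: only homozygotes, no heterozygotes.
start = (0.5, 0.0, 0.5)
gen1 = next_generation(*start)       # reaches H-W proportions
gen2 = next_generation(*gen1)        # stays there

# Scan allele frequencies to find where heterozygosity 2pq peaks.
grid = [i / 100 for i in range(101)]
p_at_max = max(grid, key=lambda p: 2 * p * (1 - p))
max_het = 2 * p_at_max * (1 - p_at_max)

print(gen1, gen2, p_at_max, max_het)
```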
Hardy–Weinberg Principle
• Another important implication is that for a rare allele, there are many more heterozygotes than there are homozygotes for the rare allele
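The size of this effect follows from the H-W proportions: the ratio of heterozygotes to rare homozygotes is 2pq / q² = 2p / q, which grows quickly as q shrinks. A small sketch (the function name and the example values of q are illustrative):

```python
def het_to_rare_hom_ratio(q):
    """Ratio of heterozygotes (2pq) to rare-allele homozygotes (q^2) under H-W."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    p = 1.0 - q
    return 2.0 * p / q               # 2pq / q^2 simplifies to 2p / q

for q in (0.1, 0.01, 0.001):
    print(q, het_to_rare_hom_ratio(q))
```

For q = 0.01, for example, heterozygotes outnumber rare homozygotes by about 198 to 1.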

Fig. 14.12
FOUR PRIMARY USES OF THE H-W PRINCIPLE:

1) Enables us to compute genotype frequencies from generation to generation, even with selection.

2) Serves as a null model in tests for natural selection, nonrandom mating, etc., by comparing observed to expected genotype frequencies.

3) Forensic analysis.

4) Expected heterozygosity provides a useful means of summarizing the molecular genetic diversity in natural populations.
Genotype frequencies for Wing-color Polymorphism in a Natural Population of the Moth Panaxia dominula (Fisher & Ford, 1947)

Color Pattern        dominula   medionigra   bimaculata   Total
Genotype             B1B1       B1B2         B2B2
Sample Size (Nij)    905        78           3            N = 986
Frequency (Pij)      0.918      0.079        0.003        1.000

Allele Frequencies:

p1 = p11 + ½ p12 = 0.918 + ½ (0.079) = 0.958
p2 = 1 - p1 = 0.042

Expectations (Nij)   p1² N      2 p1 p2 N    p2² N
                     905        79           2
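The comparison of observed and expected counts can be turned into a goodness-of-fit test. The sketch below is one common way to do this, a chi-square test with 1 degree of freedom, and is not taken from the slides. It computes allele frequencies from the raw counts rather than from the rounded frequencies, so its expected counts differ slightly from the rounded values in the table. With an expected count below 5 in the rarest class, the chi-square p-value is only approximate. The function name `hw_chi_square` is illustrative.

```python
import math

def hw_chi_square(n11, n12, n22):
    """Chi-square test of H-W proportions for one biallelic locus.

    Assumes both alleles are observed, so no expected count is zero.
    """
    n = n11 + n12 + n22
    p1 = (2 * n11 + n12) / (2 * n)           # allele 1 frequency from counts
    p2 = 1.0 - p1
    expected = (p1 * p1 * n, 2 * p1 * p2 * n, p2 * p2 * n)
    observed = (n11, n12, n22)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return p1, expected, chi2, p_value

# Panaxia dominula counts from the table: B1B1, B1B2, B2B2.
p1, expected, chi2, p_value = hw_chi_square(905, 78, 3)
print(round(p1, 3), [round(e, 1) for e in expected], round(chi2, 2), round(p_value, 2))
```

For these counts the statistic is small, so the sample shows no evidence of departure from H-W proportions.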
Hardy Weinberg Equilibrium

• Given
– p = Allele 1 frequency
– q = 1 - p
• Expectations
– p² = frequency of genotype 11
– 2pq = frequency of genotype 12
– q² = frequency of genotype 22

https://ramneetkaur.com/hardy-weinberg-principle/
Hardy Weinberg Disequilibrium

• Heterozygote excess
– Biologic
  • Differential survival
– Technical
  • Nonspecific assays
  • Duplicated regions

• Homozygote excess
– Biologic
  • Population stratification
  • Null allele
– Technical
  • Allele dropout
Data Quality Control

• Estimate Error Rates from Replicates


• Check Hardy Weinberg Equilibrium
• Check Allele and Haplotype Frequencies
• Check Missing Data - Site specific
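A minimal sketch of two of these checks: the per-SNP call rate (missing data) and the error rate estimated from replicate genotyping runs. The data layout (lists of genotype strings with `None` for a missing call), the function names, and the example calls are all assumptions for illustration.

```python
def call_rate(genotypes):
    """Fraction of samples with a non-missing call (None = missing)."""
    return sum(g is not None for g in genotypes) / len(genotypes)

def replicate_error_rate(calls_a, calls_b):
    """Discordance between two replicate runs, over pairs called in both."""
    pairs = [(a, b) for a, b in zip(calls_a, calls_b)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no sample was called in both replicates")
    return sum(a != b for a, b in pairs) / len(pairs)

# Made-up calls for one SNP across five samples.
snp_calls = ["CC", "CT", None, "TT", "CC"]
rate = call_rate(snp_calls)

# Made-up replicate runs of the same five samples.
run1 = ["CC", "CT", "TT", None, "CC"]
run2 = ["CC", "CC", "TT", "TT", "CC"]
error = replicate_error_rate(run1, run2)

print(rate, error)
```

SNPs with a low call rate or a high replicate error rate would typically be flagged before downstream analysis; the cut-offs are study-specific.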
SNP Genotyping Summary

1. Many different genotyping approaches are available - Low to high throughput

2. Some platforms permit users to pick custom SNPs but the highest throughputs
are available only in fixed contents.

3. Not all custom SNPs will work for every format. Multiple formats will be required
to carry out most projects targeting specific SNPs

4. There are still trade-offs for throughput - Samples vs. SNPs

5. Costs still dictate study design.

6. Regardless of the study - Design, quality control and tracking will rule the day!
Laboratory Information Management Systems are key in every study design
(Key: Track - Samples, Assays, Completion rate, Reproducibility/Error Analysis)
