Biological Sequence Determination: Protein
artwork: commons.wikimedia.org
Sequencing
protein, RNA, DNA
context: technological, biological
old methods
“third generation”: SMRT, nanopores, etc.
contemplation of the future
Protein Sequencing
Why Proteins?
Small
Chemically distinguishable (purifiable)
Digestible (pepsin, trypsin, chymotrypsin)
Important
Insulin
Fred Sanger
Nobel prize, 1958
Classes of RNA
mRNA
  modified bases (cap with m7G, 2'-O-methylation)
  splicing
  polyadenylation
tRNA
  modified bases (GMe, GMe2, CMe, T, ψ, UH2, I, IMe)
rRNA
  prokaryotic: 70S = 50S (5S, 23S) + 30S (16S)
  eukaryotic: 80S = 60S (5S, 5.8S, 28S) + 40S (18S)
7SL
  RNA of Signal Recognition Particle (SRP)
  homologous to Alu SINE (11% of human genome)
snRNA
  spliceosomes (U1, U2, U4, U5, U6)
snoRNA
  pre-rRNA processing (U3)
  guide 2'-O-methylation
  guide pseudouridylation
RNAi
  siRNA (short interfering RNA)
  miRNA (microRNA)
    post-transcriptional gene silencing
    3' UTR, conserved
piRNA
  transcriptional silencing of retrotransposons
... et cetera ...
DNA Sequencing
1977
The “modern era” of DNA sequencing begins
Chemical Sequencing of DNA
(Maxam-Gilbert)
February 1977
Two steps:
Damage bases
specific, partial
Cleave backbone
Four reactions: A, A+G, C, C+T
Related approaches are still used in specialized applications (e.g., DNA footprinting)
http://nobelprize.org/nobel_prizes/chemistry/laureates/1980/gilbert-lecture.pdf
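How the four reactions listed above combine into a base call can be sketched in a few lines. This is an idealized illustration, not the laboratory procedure: it assumes every position is cleaved cleanly, that band positions map directly to sequence positions, and the function names are hypothetical.

```python
def simulate_lanes(seq):
    """Return, for each of the four reactions, the positions that would show a band."""
    lanes = {"A": set(), "A+G": set(), "C": set(), "C+T": set()}
    for position, base in enumerate(seq, start=1):
        if base in "AG":
            lanes["A+G"].add(position)
        if base == "A":
            lanes["A"].add(position)
        if base in "CT":
            lanes["C+T"].add(position)
        if base == "C":
            lanes["C"].add(position)
    return lanes


def read_lanes(lanes):
    """Deduce the sequence by comparing lanes at each band position."""
    positions = sorted(set().union(*lanes.values()))
    calls = []
    for position in positions:
        if position in lanes["A"]:
            calls.append("A")      # band in both A and A+G
        elif position in lanes["A+G"]:
            calls.append("G")      # band in A+G only
        elif position in lanes["C"]:
            calls.append("C")      # band in both C and C+T
        else:
            calls.append("T")      # band in C+T only
    return "".join(calls)


print(read_lanes(simulate_lanes("GATTACA")))
```

A band that appears only in the combined lane (A+G or C+T) identifies the base that has no lane of its own.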
Chain Termination Sequencing
(Sanger Sequencing)
2',3'-dideoxy TTP
Bacterial DNA polymerase I adds nucleotides to the 3' end of the primer to complement the 5'-overhanging template.
Each strand is an ordered sequence with a direction.
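The logic of chain termination can be sketched as follows. This is a simplified model, assuming one reaction per dideoxynucleotide, termination at every possible position, and a template written 3' to 5' so that the new strand reads 5' to 3'; the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def termination_lanes(template_3_to_5):
    """For each ddNTP reaction, list the extension lengths at which chains terminate."""
    new_strand = "".join(COMPLEMENT[base] for base in template_3_to_5)
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)   # a ddNTP of this base can stop the chain here
    return lanes


def read_gel(lanes):
    """Read the new strand 5' to 3' by ordering fragments from shortest to longest."""
    base_at_length = {}
    for base, lengths in lanes.items():
        for length in lengths:
            base_at_length[length] = base
    return "".join(base_at_length[n] for n in sorted(base_at_length))


lanes = termination_lanes("TACG")
print(lanes)
print(read_gel(lanes))
```

Each fragment length occurs in exactly one lane, so sorting by length recovers the order of bases in the synthesized strand.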
uses: data quality monitoring, assembly, consensus, finishing criteria
Sequencing Strategy
"The major time spent in DNA sequencing is
spent in the preparation of the DNA
fragments and on the elements of strategy."
-- W. Gilbert, 1980
"Third-Generation" Sequencing
SMRT (Pacific Biosciences)
pyrosequencing
pyrophosphate (released by dNTP incorporation) + APS (adenosine 5'-phosphosulfate)
-> ATP + sulfate (catalyzed by ATP sulfurylase)
pyrosequencing
ATP + O2 (oxygen) + luciferin -> light (catalyzed by firefly luciferase)
problem: pyrophosphate recycling
solution: apyrase breaks down ATP to AMP + 2 Pi
problem: luciferase can use dATP
solution: use an analog suitable for polymerase but not luciferase (dATPαS)
pyrosequencing
flowgram
individual template molecule
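A flowgram can be illustrated with a small simulation. This is a sketch under idealized assumptions: signal is exactly proportional to the number of bases incorporated, there is no noise, and the cyclic flow order `TACG` is an assumption chosen for the example, not something stated in the slides.

```python
def flowgram(read, flow_order="TACG"):
    """Return (nucleotide, signal) pairs, one per flow, for the strand being synthesized."""
    if set(read) - set(flow_order):
        raise ValueError("read contains a base that is never flowed")
    signals = []
    position = 0
    while position < len(read):
        for nucleotide in flow_order:
            run = 0
            # A homopolymer run is incorporated in a single flow.
            while position < len(read) and read[position] == nucleotide:
                run += 1
                position += 1
            signals.append((nucleotide, run))
    return signals


def call_bases(signals):
    """Turn the flow signals back into a base sequence."""
    return "".join(nucleotide * run for nucleotide, run in signals)


flows = flowgram("TTAG")
print(flows)
print(call_bases(flows))
```

Because a homopolymer run produces one proportionally larger signal rather than separate peaks, run length has to be estimated from signal intensity.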
Emulsion PCR
DNA anchored to bead all comes from the same template molecule
Alternatives to chemiluminescence
heat (“thermosequencing”)
pH change ("Ion Torrent")
Cycles of Reversible Termination
Illumina/Solexa
Helicos
Helicos
Illumina
Mme I: TCCRAC (20/18)
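The site notation can be made concrete with a short search. This is a minimal sketch, assuming the usual reading of the notation: R stands for A or G, and the enzyme cuts 20 nt past the site on the top strand and 18 nt past it on the bottom strand. Only the top strand is scanned, and the function name is hypothetical.

```python
import re

# R is the IUPAC code for a purine (A or G).
MMEI_SITE = re.compile(r"TCC[AG]AC")


def mmei_cuts(seq):
    """Return (site_start, top_cut, bottom_cut) for each site with room to cut."""
    cuts = []
    for match in MMEI_SITE.finditer(seq):
        top_cut = match.end() + 20      # offset counted from the end of the site
        bottom_cut = match.end() + 18
        if top_cut <= len(seq):
            cuts.append((match.start(), top_cut, bottom_cut))
    return cuts


print(mmei_cuts("TCCGAC" + "A" * 25))
```

The two-base difference between the strands leaves a 2-nt 3' overhang at the cut.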
Illumina Genome Analyzer
Library Preparation
Illumina Genome Analyzer
SOLiD (ABI)
Complete Genomics
Polonator (Church Lab)
SOLiD
Sequencing by Oligonucleotide Ligation and Detection
3'- ATNNN~ZZZ*-5'
artwork is from the pamphlet Dibase Sequencing and Color Space Analysis
SOLiD
SOLiD: Dibase Encoding
AT AC AA AG
CG CA CC CT
GC GT GG GA
TA TG TT TC
SOLiD: Dibase Encoding
base space color space
The base sequence is one unit longer than the color sequence.
SNP causes two color changes; a single color change is probably an error
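The dibase table can be reproduced with a two-bit code per base, where the color of a dinucleotide is the XOR of its two bases. The numeric color labels 0 to 3 are an assumption for illustration; the table itself only groups dinucleotides that share a color.

```python
BASES = "ACGT"   # A=0, C=1, G=2, T=3


def color(first, second):
    """Color of a dinucleotide: XOR of the two-bit codes of its bases."""
    return BASES.index(first) ^ BASES.index(second)


def encode(seq):
    """Base space to color space: one color per overlapping pair of bases."""
    return [color(a, b) for a, b in zip(seq, seq[1:])]


def decode(first_base, colors):
    """Color space to base space, given the known first base."""
    seq = [first_base]
    for c in colors:
        seq.append(BASES[BASES.index(seq[-1]) ^ c])
    return "".join(seq)


reference = encode("ATGGC")
variant = encode("ATCGC")          # single-base change at the third position
changed = [i for i, (r, v) in enumerate(zip(reference, variant)) if r != v]
print(reference, variant, changed)
print(decode("A", reference))
```

A substitution alters both dinucleotides that contain the changed base, which is why a real SNP shows up as two adjacent color changes, while an isolated color change shifts every downstream base on decoding.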
Single-Molecule, Real-Time (SMRT) Sequencing
High throughput
Parallelism (small reactions)
Speed (immediate results)
Long reads
Read individual templates from mixtures
Haplotyping
SMRT Sequencing
Simulated SMRT Sequencing Data
Platform Comparisons
nimblegen.com
Bonus Slides
Selenocysteine
tRNA
Omics
transcriptome
exome
kinome
“Plus and Minus” Method (circa 1975)
"minus": polymerase stops at missing base
"plus": T4 DNA polymerase 3' exonuclease stalled by dNTP
Sanger F, Coulson AR.
J Mol Biol. 94(3):441-8, 1975
pyrosequencing
Animation: http://www.pyrosequencing.com/DynPage.aspx?id=7454
Bioinformatics Classics
valid: the set of bases assigned to probability p should have an actual error rate of p
discriminating: helps to distinguish correct vs. incorrect base calls
1,000,000 base calls with 10,000 errors (p = 0.01)
better if we can break it into two sets of 500,000:
p = 0.018 in one set (9,000 errors)
p = 0.002 in the second set (1,000 errors)
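The arithmetic of this example can be checked directly. The sketch below also converts each error probability to a quality score using the standard Phred definition Q = -10 log10(p), which the slides do not spell out; the function names are hypothetical.

```python
import math


def error_rate(errors, calls):
    """Observed error probability for a set of base calls."""
    return errors / calls


def phred_quality(p):
    """Standard Phred-scaled quality for error probability p."""
    return -10 * math.log10(p)


pooled = error_rate(10_000, 1_000_000)
poor_set = error_rate(9_000, 500_000)
good_set = error_rate(1_000, 500_000)

for label, p in [("pooled", pooled), ("poor set", poor_set), ("good set", good_set)]:
    print(f"{label}: p = {p:.3f}, Q = {phred_quality(p):.1f}")
```

The pooled calls all sit at Q20, whereas the split puts half of them near Q27 and isolates nine tenths of the errors in the other half, which is what makes the split scoring more discriminating.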
Error Probability Calibration
'Given a set of parameters and a training set of reads for which it
is known which base-calls are correct and which are errors, find a
way of associating parameter values to error probabilities that
has (near) maximum discrimination power for small r.'
Phred Quality Score Parameters
Empirical.
Small values tend to correspond to more accurate base-calls.
Window-based parameters smooth out error probabilities.
glycosylation (glycoproteins)
  mucin, cellular interaction, structural
  N-linked: asparagine
  O-linked: serine, threonine, hydroxylysine, hydroxyproline
acylation (at O, N, or S)
  acetylation (acetate, CH3CO2−)
  myristoylation (myristate, a C14 fatty acid)
  palmitoylation (palmitate, a C16 fatty acid)
alkylation
  methylation
  isoprenylation
iodination
  thyroid hormone
hydroxylation
  hydroxylysine in collagen
phosphorylation
  signal transduction
ADP-ribosylation
  signal transduction
  cholera toxin
covalently bound enzyme cofactors
  FAD, biotin, etc
ubiquitination
... and many more
“Wandering Spot” Method (ca. 1970s)
RNA or DNA
partial digestion
2D separation
Horizontal = base composition
Vertical = size
Sequence-specific RNases
Phy M: A+U
A: pyrimidine-specific (C+U)
U2: A or A+G
T1: degrades after G residues
enzymatic chemical
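The specificities listed above can be turned into a toy digestion. This sketch assumes each enzyme cleaves immediately after its target bases and that digestion runs to completion (sequencing gels rely on partial digestion of end-labeled RNA). U2 is treated as A-specific here, one of the two readings the list allows; the dictionary keys and function name are hypothetical.

```python
# Bases after which each RNase is assumed to cleave, following the list above.
CLEAVES_AFTER = {
    "T1": "G",
    "A": "CU",
    "U2": "A",
    "PhyM": "AU",
}


def digest(rna, enzyme):
    """Split an RNA (written 5' to 3') after every base the enzyme recognizes."""
    targets = CLEAVES_AFTER[enzyme]
    fragments = []
    start = 0
    for index, base in enumerate(rna):
        if base in targets:
            fragments.append(rna[start:index + 1])
            start = index + 1
    if start < len(rna):
        fragments.append(rna[start:])   # 3' remainder with no cleavage site
    return fragments


print(digest("AUGGCUA", "T1"))
print(digest("AUGGCUA", "A"))
```

Comparing the fragment sets produced by enzymes with different specificities is what lets each position be assigned a base.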
Modified Nucleotides in tRNA
(post-transcriptional)
Phred
third-party base caller with better
accuracy than ABI's
open source(ish)
“The whole of the DNA to be sequenced is shotgunned into a suitable vector and
cloned. Ideally the cloned fragments would be of at least 200 bases in length. The
clones are then sequenced and the computer used to collate the data. Collation
involves searching for overlaps in the data.”
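The overlap search described in the quotation can be sketched as a suffix-prefix comparison between two reads. This is a naive illustration, assuming error-free reads on the same strand; the function names and the `min_length` default are hypothetical.

```python
def overlap(left, right, min_length=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    longest = min(len(left), len(right))
    for length in range(longest, min_length - 1, -1):
        if left[-length:] == right[:length]:
            return length
    return 0


def merge(left, right, min_length=3):
    """Join two reads through their overlap, or return None if they do not overlap."""
    length = overlap(left, right, min_length)
    if length == 0:
        return None
    return left + right[length:]


print(overlap("GATTACAGG", "CAGGTTT"))
print(merge("GATTACAGG", "CAGGTTT"))
```

The minimum overlap length guards against joining reads that share only a few bases by chance.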
2D gel electrophoresis
cybertory.org/exercises/primerDesign
Protein Sequencing
Edman Degradation
phenylisothiocyanate
invented ca. 1950s
automated ca. 1973
proceeds from N-terminus
read 50-70 aa
http://en.wikipedia.org/wiki/Edman_degradation
Mass Spectrometry
Precise determination of molecular weights of peptides
A few amino acids can ID a spot on 2D gel
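As an illustration of what a precise molecular weight means for a peptide, the sketch below sums monoisotopic residue masses and adds one water for the free termini. The mass table holds standard reference values rather than anything from the slides, covers only the 20 common amino acids, and the function name is hypothetical.

```python
# Monoisotopic residue masses in daltons (standard values, not from the slides).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified linear peptide (one-letter codes)."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


print(round(peptide_mass("GG"), 4))
print(round(peptide_mass("K") - peptide_mass("Q"), 4))
```

Leucine and isoleucine have identical masses and cannot be told apart this way, while lysine and glutamine differ by only about 0.036 Da, which is one reason mass precision matters.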
(Sec)