
Next Generation

Sequencing Technologies II

BIO 254
Spring 2021
DNA Sequencing Methods

- Sequencing methodologies: Sanger sequencing, chemical degradation sequencing, and pyrosequencing
- Next-generation (second-generation) sequencing: whole exome sequencing (WES) and whole genome sequencing (WGS)
- Third-generation sequencing: single DNA molecule sequencing
DNA Sequencing Methods
Methods for DNA sequencing were first devised in the mid-1970s.
Two different procedures were published at almost the same time:

- The chain termination method (Sanger et al., 1977), in which the sequence of a single-stranded DNA molecule is determined by enzymatic synthesis of complementary polynucleotide chains, these chains terminating at specific nucleotide positions.

- The chemical degradation method (Maxam and Gilbert, 1977), in which the sequence of a double-stranded DNA molecule is determined by treatment with chemicals that cut the molecule at specific nucleotide positions.

A third method was added later:

- Pyrosequencing (Ronaghi et al., 1998), which does not require electrophoresis or any other fragment separation procedure and so is more rapid than chain termination sequencing.
DNA Sequencing Methods

- The chain termination and chemical degradation methods were equally popular at first.

- The chain termination procedure has since gained ascendancy, particularly for genome sequencing, because:

- the chemicals used in the chemical degradation method are toxic and therefore hazardous to the health of the researchers,

- it has been easier to automate chain termination sequencing.

- Automated sequencing techniques were essential to complete the Human Genome Project in a reasonable time span.
Sanger Sequencing: chain termination method

- Developed by Sanger and his colleagues at the Medical Research Council in Cambridge, England.
- Also called enzymatic sequencing (Sanger et al., 1977).

Method:
• The unknown sequence is subcloned into a single-stranded DNA virus.
• DNA synthesis is initiated from a primer sequence adjacent to the unknown sequence.
• This method utilizes chain-terminating analogs (dideoxynucleotides, ddNTPs) of the four DNA nucleotides (A, G, C, and T).
• When one of them is incorporated into DNA by DNA polymerase, synthesis of the growing DNA chain is terminated.
Structures of normal deoxyribose and the dideoxyribose sugar used in DNA sequencing.

Addition of a nucleotide to the 3′-OH terminus of a primer strand.
Dideoxy method of DNA sequencing.
If the synthesis of DNA molecules begins at a fixed point on a template in the presence of a low concentration of the A analog, the analog will infrequently be incorporated instead of the normal A nucleotide at any one position. However, when incorporation occurs, the synthesis of the chain stops.
Thus a nested set of DNA fragments that terminate at every A nucleotide in the unknown sequence is generated.
By correlating the length of the terminated chains with the identity of the base analog that was present in the reaction, one can determine the order of the nested DNA fragments and the corresponding nucleotide sequences.
At present, this method dominates DNA sequencing applications, primarily because once the subclones are generated the procedure involves only a few simple steps.
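The idea of reading a sequence from nested, terminated fragments can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names are made up for this example, the input is the newly synthesized strand rather than the template, and every position is assumed to yield exactly one terminated fragment.

```python
def terminated_lengths(new_strand):
    """For each base analog, list the lengths of the fragments it terminates.

    A fragment ending at position n of the new strand has length n and
    carries the analog of the base found at that position.
    """
    lengths = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        lengths[base].append(position)
    return lengths


def read_sequence(lengths):
    """Order all fragments by length and read off the terminating base."""
    by_length = {}
    for base, sizes in lengths.items():
        for size in sizes:
            by_length[size] = base
    return "".join(by_length[size] for size in sorted(by_length))


fragments = terminated_lengths("GATTACA")
print(fragments["A"])            # fragment lengths from the A reaction: [2, 5, 7]
print(read_sequence(fragments))  # GATTACA
```

Sorting by length plays the role of the gel: the shortest fragment gives the first base, the next shortest the second base, and so on.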
Trace of the fluorescence pattern obtained from a DNA sequencing gel by automated detection of
the fluorescence of each band as it comes off the bottom of the capillary tube during continued
electrophoresis.
Sanger Sequencing based on PCR
• The starting material for a chain termination sequencing experiment is
single-stranded DNA molecules.

There are several ways in which single-stranded DNA can be obtained:
• The DNA can be cloned in a plasmid vector.
• The DNA can be cloned in a bacteriophage M13 vector or a phagemid.
Vectors based on M13 bacteriophage are designed specifically for the production of single-stranded templates for DNA sequencing.
M13 bacteriophage has a single-stranded DNA genome. Infected cells continually secrete new M13 phage particles, approximately 1000 per generation, each containing the single-stranded version of the genome.
• PCR can be used to generate single-stranded DNA: a single primer is used, so that only one strand is copied.
Sanger Sequencing – fluorescently labeled nucleotides
• The detection system for fluorolabels has opened the way to
automated sequence reading.
• The label is attached to the ddNTPs; a different fluorolabel used for each one.
• Chains terminated with A are labeled with one fluorophore, chains terminated
with C are labeled with a second fluorophore, and so on.
• It is possible to carry out the four sequencing reactions - for A, C, G and T - in a
single tube and to load all four families of molecules into just one lane of the
polyacrylamide gel.
• The fluorescent detector can discriminate between the different labels and
determine if each band represents an A, C, G or T.
• The sequence can be read directly as the bands pass in front of the detector and
sent straight to a computer for storage.
• When combined with robotic devices that prepare the sequencing reactions and
load the gel, the fluorescent detection system provides a major increase in
throughput.
• It generates sequence data rapidly.
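As a rough sketch of how a detector could turn four dye signals into a base call, the Python below picks the strongest channel for each band. The channel order, the intensity values and the function name are assumptions made for this example; real base-calling software also corrects for dye mobility, peak overlap and noise.

```python
DYES = ("A", "C", "G", "T")  # assumed channel order for this sketch


def call_bases(trace):
    """Call one base per band by choosing the brightest dye channel.

    trace: list of 4-tuples of intensities, in DYES order, one tuple per
    band as it passes the detector.
    """
    calls = []
    for band in trace:
        brightest = max(range(len(DYES)), key=lambda i: band[i])
        calls.append(DYES[brightest])
    return "".join(calls)


# Three bands with made-up intensities
example_trace = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.0, 0.8, 0.1),
    (0.0, 0.1, 0.1, 0.7),
]
print(call_bases(example_trace))  # AGT
```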
Chemical Sequencing

- Chemical sequencing was developed by Maxam and Gilbert at Harvard University.
- It uses chemicals that break the DNA chain at specific nucleotides.
- The DNA molecule is labeled at one end with a radioactive tag.
- It is then cleaved with each chemical separately in such a way as to
generate breaks infrequently at any given nucleotide.
- As in the enzymatic sequencing technique, the DNA fragments are
separated according to size, and the sizes are correlated with the
nucleotide that is cleaved.
- This method is generally more time-consuming than the
enzymatic sequencing method, but it often produces fewer ambiguities
in the interpretation of the data.
- Both methods generate mixtures of specific DNA fragments that are
separated by polyacrylamide gel electrophoresis.
- Polyacrylamide gels can resolve fragments that differ in size by a single
nucleotide.
Pyrosequencing
- does not require electrophoresis or any other fragment separation procedure
- is more rapid than chain termination sequencing.
- In pyrosequencing, the template is copied in a straightforward manner
without added ddNTPs.
- The addition of a nucleotide to the end of the growing strand is detectable
because it is accompanied by release of a molecule of pyrophosphate.
- Pyrophosphate is converted by the enzyme ATP sulfurylase into ATP, which luciferase then uses to generate a flash of chemiluminescence.
- Each dNTP is added separately, one after the other, with a nucleotidase
enzyme also present in the reaction mixture so that if a dNTP is not
incorporated into the polynucleotide then it is rapidly degraded before the
next dNTP is added.
- This procedure makes it possible to follow the order in which the dNTPs are
incorporated into the growing strand.
- The technique requires a repetitive series of additions to the reaction
mixture, precisely the type of procedure that is easily automated, with the
possibility of many experiments being carried out in parallel.
The principle of pyrosequencing

(A) Incorporation of a dNTP into the growing chain releases a molecule of pyrophosphate
(PPi). In the presence of adenosine 5′ phosphosulfate, the enzyme ATP-sulfurylase converts
pyrophosphate into ATP, which luciferase uses to generate a flash of light.
(B) A pyrosequencer trace. Individual dNTPs are dispensed as possible substrates for the
pyrosequencing reaction in cycles with the order: C then T then G then A, and the intensity
of any light emitted in response to each nucleotide dispensed is recorded to give the graph
shown here. The interpreted sequence here would be: GAGTTCCCGAAGGCACCAAA.
Pyrosequencing reactions
Advances across multiple fields were brought together
to achieve routine sequencing at the genome scale
• In the mid-to-late 1990s, microarrays were developed as highly parallel assays to
measure RNA and DNA.
• Between 2001 and 2006, microarrays offered the first genome-scale parallel
analysis of DNA and RNA.
• In 2006, second- and third-generation sequencing techniques began to emerge
and allowed examination of billions of templates of DNA and RNA
simultaneously.
• Although now almost a decade old, the term next-generation sequencing remains the popular way to describe very-high-throughput sequencing methods that allow millions to trillions of observations to be made in parallel during a single instrument run.
Since 2006, there has been an explosion of
• new methods,
• techniques, and
• protocols
for the examination of virtually any question in basic genetics or clinical research
involving nucleic acid.

Genomics
Transcriptomics
Metabolomics
Microbiomics
Massively parallel sequencing
Next Generation Sequencing (NGS)

• Machines can sequence millions of templates simultaneously.


• Can generate one million base pairs of sequence in a single sequencing run.
• May generate reads of 100-1000 nucleotides
• The template can be directly obtained from genomic DNA and amplified by PCR

NGS companies
NGS; use of terminators

• Shear gDNA into small pieces.
• Attach PCR adaptors to the 3’ and 5’ ends of the sheared DNA.
• The adaptors are complementary to primers found on the slide.
• The template is denatured and covalently bound to random positions on the slide. The slide is also completely covered by the primers.
• In each PCR cycle, the free adaptor end of the template hybridizes with an immobilized neighbouring primer.
• Elongation of the primer yields a daughter strand.
• The free end of the daughter strand hybridizes with another neighbouring primer and is used for the next PCR cycle.
• Multiple rounds of PCR generate a cluster of DNA molecules that will be used for terminator sequencing.
NGS; use of terminators

A method for massively parallel DNA sequencing that captures and amplifies DNA template strands
on a glass slide, and then uses reversible terminators for sequencing.
NGS; use of terminators
• Each nucleotide in the reaction mix has its 3’ end blocked by a fluorescence emitter.
• In each cycle of sequencing, all four labeled nucleotides are used. After one nucleotide is added, each template is blocked from further elongation because its 3’ end is blocked.
• Its fluorescence is scanned and recorded.
• The fluorescent emitters are then chemically removed and replaced with 3’OH.
• In the next round, the second nucleotide is added. Chain elongation is blocked again, the emitter is chemically replaced with a free 3’OH, and so on.

The method yields:
• about 1 billion nucleotides in one run,
• 25 million simultaneous sequencing reactions,
• read lengths of 100 bp per template.
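The cycle-by-cycle logic of reversible terminators can be sketched as a small simulation in which every cluster gains exactly one base per cycle. This is a toy model under stated assumptions: each template string is written in the order the polymerase reads it, incorporation never fails, and the function name is made up for this example.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(templates, cycles):
    """Extend every template by one blocked, labeled nucleotide per cycle.

    In each cycle the complementary nucleotide is added, its label is
    'scanned' (appended to the read), and the block is removed so that
    the next cycle can add the following nucleotide.
    """
    reads = ["" for _ in templates]
    for cycle in range(cycles):
        for index, template in enumerate(templates):
            if cycle < len(template):
                reads[index] += COMPLEMENT[template[cycle]]
    return reads


# Two clusters sequenced in parallel for three cycles
print(sequence_by_synthesis(["ACGT", "TTGA"], cycles=3))  # ['TGC', 'AAC']
```

Because all clusters advance together, the read length equals the number of cycles, while the number of reads equals the number of clusters on the slide.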
NGS; long reads but few templates
• Shear gDNA
• Attach PCR adaptors
• Capture these denatured DNA fragments on beads (30 micrometers).
• Each bead gets one template.
• Each bead is enclosed in an oil droplet containing all reactants required for PCR.
• With multiple cycles of PCR, 10^7 copies of the original template are generated.
• The oil is dissolved by a detergent and each bead is transferred to one of 1.5 million wells in an array (the array is about 6 cm²).
• The reactants flow across the array chambers.
• Sequencing results are read by pyrosequencing.

• Generates 100 million base pairs in one run.
• 500,000 simultaneous sequencing reactions in one run.
• Average read length is 200 base pairs.
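The figures quoted for this method fit together with simple arithmetic, shown here as a quick Python check. The function name is hypothetical, and the well-occupancy figure is just the ratio of the two numbers given above, not a specification of the instrument.

```python
def run_yield(reactions, mean_read_length):
    """Total bases per run = number of parallel reads x average read length."""
    return reactions * mean_read_length


reactions = 500_000      # simultaneous sequencing reactions per run
read_length = 200        # average read length in base pairs
wells = 1_500_000        # wells on the array

print(run_yield(reactions, read_length))  # 100000000, i.e. 100 million bp
print(round(reactions / wells, 2))        # fraction of wells giving a read: 0.33
```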
A method for massively parallel DNA sequencing that captures and amplifies DNA
template strands on tiny beads, and then determines the sequence by detecting the
pyrophosphate released as each new nucleotide is incorporated during synthesis.
Third Generation Sequencing: Massively-parallel sequencing
of unamplified DNA

• DNA sequencing technologies that use single unamplified DNA templates are sometimes called single-molecule sequencing or third-generation sequencing.

• They avoid the biases introduced by PCR, and produce very long sequences at low cost.
• However, sequence accuracy can be a problem.

Examples include:
- Pacific Biosciences systems (PacBio system) and
- Oxford Nanopore Technologies system
Pacific Biosciences systems

• The PacBio RSII system was released in 2010.


• It was the first DNA sequencing method to sequence single unamplified
molecules in real time.
• The sequencing templates are double-stranded DNA molecules that
have single-stranded hairpin oligonucleotides ligated to each end.
• A single sequencing primer is annealed to one of the hairpins.
• Sequencing takes place in a SMRT Cell, a fabrication containing 150,000
tiny wells (called zero-mode waveguides).
• A single DNA polymerase molecule is anchored to the bottom of each
well.
• Dye-labeled dNTPs diffuse in and out of the wells.
• The dye labels are on the terminal phosphates of the dNTPs, so that on
incorporation the dye is lost and totally natural DNA is synthesized.
Pacific Biosciences systems

• A high-processivity, strand-displacement DNA polymerase is used, such as Φ29 or Bst polymerase.
• The synthesis point moves round and round the template, making long concatemers of the sequence and using both the sense and antisense strands as template in its journey round.
• When the polymerase binds and then incorporates a dNTP, the time spent by the attached label at the bottom of the well is longer than the time spent by randomly diffusing dNTP molecules. The signal of the attached label is recorded by the laser imaging system.
• The duration represents the real-time dynamics of the polymerase, and
this may be different when template bases carry epigenetic
modifications such as methylation.
• Thus, the system has the unique ability to identify patterns of
modification directly from the raw data.
Single-molecule real-time sequencing using the PacBio system

(A) The template for PacBio sequencing is a double stranded DNA with single-strand hairpins
(green) ligated to either end. A sequencing primer (red) anneals to one of the hairpins. The
strand-displacing polymerase (gray) moves the template continuously round, producing
concatenated copies of the whole sequence.
(B) The polymerase is anchored at the bottom of a well; the four phospho-labeled dNTPs
diffuse freely.
(C) When a nucleotide is incorporated into the growing chain, its fluorescent label remains at
the bottom of the well much longer than the freely diffusing dNTPs.
PacBio system; advantages and disadvantages

• The PacBio RSII allows extremely long reads: 20 kb or more.
• Both sample preparation times and run times are short, allowing the whole procedure to be completed in a single day.
• The error rate per nucleotide is around 11%, much higher than in competing systems.
• The errors are random and can be compensated by allowing the
polymerase to run through the same circular template many times.
• The throughput is lower and the cost per base higher than for many
competing systems.
• However, the long reads make it ideal for de-novo sequencing of small
bacterial and viral genomes.
• It is also ideal for sequencing low-complexity regions or structural
variants in human and other genomes.
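Why repeated passes over the same circular template compensate for random errors can be illustrated with a majority vote. The sketch below makes strong simplifying assumptions that are not from the slides: passes are already aligned and of equal length, errors are independent between passes, and only substitutions occur (real consensus calling must also handle insertions and deletions). Function names are invented for this example.

```python
from collections import Counter
from math import comb


def consensus(passes):
    """Majority vote at each position across aligned passes."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*passes)
    )


def majority_error(per_base_error, n_passes):
    """Probability that more than half of the passes are wrong at a position.

    For an odd number of passes this is an upper bound on the chance that
    the majority vote is wrong, since wrong passes may disagree with
    each other.
    """
    p = per_base_error
    return sum(
        comb(n_passes, k) * p**k * (1 - p) ** (n_passes - k)
        for k in range(n_passes // 2 + 1, n_passes + 1)
    )


print(consensus(["ACGT", "ACGA", "TCGT"]))  # ACGT
print(majority_error(0.11, 1))              # a single pass keeps the 11% error
print(majority_error(0.11, 5) < 0.02)       # True: five passes cut it sharply
```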
Oxford Nanopore Technologies system
• Oxford Nanopore are developing a competing third-generation system.
• In the MinION device, released in 2014, a flow cell contains about 500
wells.
• Each well is spanned by a synthetic, electrically resistant membrane in
which a single nanopore is anchored.
• The nanopores are made of modified α-hemolysin protein and are 10 nm long × 1 nm in size.
• Single-stranded DNA feeds through the pore.
• As the different sized nucleotides pass through, they block the ionic current flowing through the pore to different extents, so each nucleotide can be recognized.
• In practice, the blocking effect depends on at least five contiguous
nucleotides and must be deconvoluted to identify individual
nucleotides.
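The last point can be made concrete with a small sketch: if each current level reflects a window of five contiguous nucleotides, the signal is really a series of overlapping 5-mers, and the sequence is recovered by chaining them. This is a conceptual illustration only; the function names are made up, and it assumes each 5-mer has already been identified correctly from the current, which is the hard part in practice.

```python
def kmers(sequence, k=5):
    """All overlapping windows of k nucleotides, in pore order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def rebuild(kmer_path):
    """Chain consecutive k-mers that overlap by k-1 bases into one sequence."""
    if not kmer_path:
        return ""
    sequence = kmer_path[0]
    for previous, current in zip(kmer_path, kmer_path[1:]):
        if previous[1:] != current[:-1]:
            raise ValueError("consecutive k-mers do not overlap by k-1 bases")
        sequence += current[-1]
    return sequence


windows = kmers("ACGTACG")
print(windows)           # ['ACGTA', 'CGTAC', 'GTACG']
print(rebuild(windows))  # ACGTACG
print(4 ** 5)            # 1024 possible 5-mers to tell apart
```

With 1024 possible 5-mers rather than four single bases, many current levels lie close together, which helps explain why deconvolution is needed.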
Oxford Nanopore Technologies system

• To get signals for analysis, the passage of the DNA through the pore
must be slowed down by several orders of magnitude.
• It is slowed down by coupling it to a relatively slow-moving processive
enzyme.
• The test DNA is double-stranded.
• The leader end has a single-strand extension coupled to the motor
enzyme.
• The far end can be ligated to a hairpin oligonucleotide carrying a second
motor enzyme.
• Thus, both strands are sequenced.
Feeding test DNA through a nanopore

Oxford Nanopore’s sequencing strategy requires DNA templates to be ligated with two
adaptors.
The first adaptor is bound with a motor enzyme as well as a tether, whereas the second
adaptor is a hairpin oligonucleotide that is bound by the HP motor protein.
Changes in current that are induced as the nucleotides pass through the pore are used to
discriminate bases.
The library design allows sequencing of both strands of DNA from a single molecule (two
direction reads).
Oxford Nanopore Technologies system; advantages
and disadvantages

• Nanopore sequencing offers great promise.
• The read length is limited only by the length of the test DNA.
• Sample preparation is simple, the process is quick, and the device is small and simple enough to be portable.
• The big problem is the error rate, with insertion, deletion, and substitution rates of 4.9%, 7.8%, and 5.1%, respectively.
• A high run-failure rate has also been reported.
• However, the device has already demonstrated its use in sequencing a previously unresolved, highly repetitive region of human chromosome X.
• Its reliability and accuracy have improved since then.
• It is expected to be used for many applications.
Other technologies

Many alternative massively-parallel sequencing technologies are available or under development by a variety of companies:
• Complete Genomics (http://www.completegenomics.com)
• Genapsys (http://genapsys.com)
• Genia (http://www.geniachip.com)
• Gnubio (http://gnubio.com)
• Lasergen (http://lasergen.com)
• Nabsys (http://nabsys.com)
• Stratos (http://stratosgenomics.com)
• ZSG (http://www.zsgenetics.com)
Comparison of NGS platforms

The two most common specifications used to compare platforms are:
• the number of reads produced in a given instrument run and
• the length of those reads.
• Other metrics—such as cost per run, cost per base, instrument run time,
presequencing sample preparation time, sample preparation cost, and platform
bias or error modes—are far more difficult to compare.
NGS Applications
Whole Genome Sequencing (WGS)
Whole Exome Sequencing (WES)
Targeted Sequencing (TS)
RNA sequencing
Methylation sequencing
Whole Exome Sequencing (WES)
Most commonly used NGS application
About 180,000 exons and exon/intron boundaries are captured from the human genome
About 30 Mb in total, <2% of the genome
Whole Exome Sequencing (WES)

• Sequencing all or some of the protein-coding regions of the genome.
• An effective and efficient method to characterize the ∼1–2% of the genome annotated to encode proteins.
• Oligonucleotides are used to target regions of interest.
• Commercialized technologies for oligonucleotide-based sequence capture have been developed, with platforms from Agilent Technologies, Illumina, and Roche NimbleGen being the most dominant over the last five years.
Whole Exome Sequencing (WES)

Whole-exome sequencing quickly became an efficient and accurate tool in rare disease diagnosis in clinics.
However, WES fails to identify a causative variant in approximately 70% of cases, because:
• its power to detect multi-variant effects is limited,
• it misses the possible impact of variations outside the exonic regions, such as deep intronic or regulatory variants that could play a role.
Ex: for more common disorders, such as autism and sporadic schizophrenia, an exceptional rate of de novo mutations is observed.

These observations are profoundly changing our perception of these diseases and providing novel frameworks for the analysis of diseases with extensive locus heterogeneity.
Why is WES the method of choice?
The exome harbors about 85% of known disease-causing mutations
Coverage is generally higher in WES, giving higher accuracy
Manageable data size
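The coverage argument is simple arithmetic: the same amount of sequence spread over a smaller target gives deeper coverage. The sketch below uses the 30 Mb exome size from the earlier slide; the genome size of about 3,100 Mb and the 9 Gb of sequencing output are assumptions chosen for illustration, and the function name is hypothetical.

```python
def mean_depth(total_bases_sequenced, target_size_bp):
    """Average number of reads covering each base of the target."""
    return total_bases_sequenced / target_size_bp


EXOME_BP = 30_000_000       # about 30 Mb, as stated for WES
GENOME_BP = 3_100_000_000   # assumed approximate human genome size
OUTPUT_BP = 9_000_000_000   # assumed sequencing output for this example

print(mean_depth(OUTPUT_BP, EXOME_BP))              # 300.0
print(round(mean_depth(OUTPUT_BP, GENOME_BP), 1))   # 2.9
print(round(100 * EXOME_BP / GENOME_BP, 1))         # exome share of genome in %: 1.0
```

This idealized figure ignores capture efficiency and uneven coverage, so real exome depth is lower than the raw ratio suggests.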
Cost-effective alternative to WGS
Genetic diagnosis
Identification of genes responsible for single gene
disorders
Any Disadvantages?
Limited coverage of important regulatory and
evolutionarily conserved sequences
NGS workflow - Data analysis
• Sample preparation: DNA extraction, DNA fragmentation
• Attachment of NGS adapters: end repair, adaptor ligation
• Finalizing library: enrichment PCR, size selection
• Sequencing: amplification of immobilized ssDNA molecules, massively parallel sequencing
• Data processing and analysis

Data processing and analysis
• Primary analysis: quality filter, read mapping
• Secondary analysis: alignment, variant calling
• Tertiary analysis: annotation and filtering of variants, association analysis, population structure and advanced statistical analysis
From WES to candidate gene

• Filtering of WES data (disease genes, synonymous variants, read depth, in silico prediction, MAF…)
• Verification of variants (PCR amplification of variant regions, Sanger sequencing in the patient, segregation analysis in the family)
• Functional analyses for candidate genes
WES Data Analysis
The geneticist should select the possible disease-causing variants from more than 20,000 listed variants by using different filtering criteria.
Public human genomic databases and bioinformatics software are used for this.
WES Data Analysis
(Known disease genes)

• Mode of inheritance: homozygous or heterozygous
• Read depth >30
• Exclude synonymous variants, intronic variants, non-coding exon variants and upstream/downstream variants
• Minor allele frequency (MAF) <1%, to exclude common polymorphisms
• If the variant lies in a known causative gene leading to a similar disorder, or if the results of previous analyses point to the same locus, it can be selected as a candidate variant.
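The filtering criteria above can be expressed as a short Python filter. This is a minimal sketch, not the procedure from the course manual: the field names (`gene`, `depth`, `maf`, `consequence`), the consequence labels and the example variants are all invented for illustration, and real annotation output will use its own vocabulary.

```python
# Variant classes excluded by the criteria above (labels are hypothetical)
EXCLUDED_CONSEQUENCES = {
    "synonymous", "intronic", "non_coding_exon", "upstream", "downstream",
}


def passes_filters(variant, min_depth=30, max_maf=0.01):
    """Keep well-covered, rare variants of a potentially damaging class."""
    return (
        variant["depth"] > min_depth
        and variant["maf"] < max_maf
        and variant["consequence"] not in EXCLUDED_CONSEQUENCES
    )


# Made-up variants, for demonstration only
variants = [
    {"gene": "GENE1", "depth": 85, "maf": 0.0002, "consequence": "missense"},
    {"gene": "GENE2", "depth": 12, "maf": 0.0001, "consequence": "missense"},
    {"gene": "GENE3", "depth": 60, "maf": 0.15, "consequence": "missense"},
    {"gene": "GENE4", "depth": 70, "maf": 0.001, "consequence": "synonymous"},
]

candidates = [v["gene"] for v in variants if passes_filters(v)]
print(candidates)  # ['GENE1']
```

The mode of inheritance and the known-disease-gene criteria are left out here because they depend on the pedigree and on external databases.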
SIFT
Sorting Intolerant from Tolerant
Predicts whether an amino acid
substitution affects protein function
based on sequence homology and the
physical properties of amino acids.
Deleterious or tolerated.
The smaller the score, the greater the predicted effect
PolyPhen
Polymorphism Phenotyping
Predicts possible impact of an amino acid
substitution on the structure and function
of a human protein using straightforward
physical and comparative considerations.
Benign, possibly damaging and probably
damaging
The higher the score, the greater the predicted effect
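The opposite directions of the two scores are easy to mix up, so here is a small sketch that maps scores to the categories named above. The cutoff values (0.05 for SIFT, 0.15 and 0.85 for PolyPhen) are commonly quoted defaults but are assumptions here, not taken from the slides, and should be checked against the documentation of the tool version in use.

```python
def sift_label(score, cutoff=0.05):
    """SIFT: low scores mean the substitution is poorly tolerated."""
    return "deleterious" if score < cutoff else "tolerated"


def polyphen_label(score, possibly=0.15, probably=0.85):
    """PolyPhen: high scores mean the substitution is likely damaging."""
    if score > probably:
        return "probably damaging"
    if score > possibly:
        return "possibly damaging"
    return "benign"


print(sift_label(0.01), "/", polyphen_label(0.99))  # deleterious / probably damaging
print(sift_label(0.40), "/", polyphen_label(0.05))  # tolerated / benign
```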
Polymerase Chain Reaction (PCR)

Developed by Kary Mullis in the 1980s to amplify a single copy or a few copies of a segment of DNA, generating thousands to millions of copies.
PCR Components
DNA template- the sample DNA that contains the target sequence.
DNA polymerase- an enzyme that synthesizes new strands of DNA
complementary to the target sequence.
➡ Taq DNA polymerase (from Thermus aquaticus), Pfu DNA
polymerase (from Pyrococcus furiosus)
Primers- short pieces of single-stranded DNA that are complementary to
the target sequence. The polymerase begins synthesizing new DNA from
the end of the primer.
Nucleotides (dNTPs or deoxynucleotide triphosphates)- single units of
the bases A, T, G, and C, which are essentially "building blocks" for new
DNA strands.
PCR Steps

• Initial denaturation
• 35 cycles of: denaturation, annealing, elongation
• Final elongation
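The effect of cycling can be sketched with the usual idealized model in which every cycle multiplies the number of copies by (1 + efficiency). The function name and the 90% efficiency value are assumptions for illustration; real reactions plateau as primers and dNTPs run out, so the theoretical figure for 35 cycles is never reached in practice.

```python
def copies_after(cycles, start=1, efficiency=1.0):
    """Idealized PCR yield: each cycle multiplies copies by (1 + efficiency)."""
    return start * (1 + efficiency) ** cycles


print(copies_after(10))                             # 1024.0 with perfect doubling
print(copies_after(35) == 2 ** 35)                  # True: theoretical ceiling
print(copies_after(35, efficiency=0.9) < 2 ** 35)   # True: lower with 90% efficiency
```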
Sanger sequencing
Verification process
Sanger sequencing - false positive
result?
The segregation of the variant should be
analyzed among the family members.
If the variation is the causative mutation
in the family, it is expected to be present
in all affected but absent in all
unaffected individuals in the same
family.
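The segregation rule stated above translates directly into a check over family members. The sketch below uses invented member labels and carrier sets, not the family from the assignment, and it encodes only the simple rule on the slide (no allowance for incomplete penetrance or carrier status in recessive disease).

```python
def segregates(carriers, affected, members):
    """True if the variant is in every affected and no unaffected member."""
    return all((member in carriers) == (member in affected) for member in members)


# Hypothetical family, for demonstration only
members = ["F.1", "F.2", "F.3", "F.4", "F.5"]
affected = {"F.3", "F.5"}

print(segregates({"F.3", "F.5"}, affected, members))         # True
print(segregates({"F.3", "F.4", "F.5"}, affected, members))  # False: unaffected carrier
print(segregates({"F.3"}, affected, members))                # False: affected non-carrier
```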
EXAMPLE – Family E1
From now on…
Study the pedigree given in your manual (B254).
Filter the variants and decide which candidate variants
should be analyzed.
Design primers for these candidate variants
(instructions in the manual).
You were given the Sanger sequence chromatograms
for these genes for the index patient (B254.3).
Decide which variants you will analyze further and e-mail me (acandayan@gmail.com) by 5 pm on the 11th of April.
You will receive chromatograms for the family members at 6 pm on the 11th of April.
Decide which variant/s might be related to disease.
Submit your reports on April 19th.
Reports
The procedure is written in detail in your manual.
The reports will be written as if every step is done
by you.
No separate methods and results section, combine
the two.
Pedigree of Family B254

B254.1 B254.2

B254.3 B254.4 B254.5
