12/7/21, 2:54 PM Module 10.1: Genome Sequencing Methods

10 Microbial Genomics

Sequencing methods, regulatory sequences and proteins, and gene expression

Image credit: isak55 / iStock / Getty Images

Unless otherwise specified, all figures and tables presented on this web page are from: Microbiology. Wessner, Dupont, Charles, Neufeld. ©
2020 John Wiley & Sons Canada, Ltd.

This material is reproduced with the permission of John Wiley & Sons Canada, Ltd.

Now that advances in genomics have determined the complete genome sequences of many prokaryotes,
molecular biology techniques are becoming ever more important in microbial genetics.

Introduction to Genomics

Genomics studies the entire complement of genetic information within an organism. Genomics focuses on
the genome and its associated genes, regulatory sequences, and noncoding sequences. Genomics can
provide important information about the encoded potential of a microorganism and can inform related
studies exploring the genetics and ecology of microorganisms. Comparing sequenced genomes from many
microorganisms (comparative genomics) provides insight into the evolutionary histories and relationships
among branches of the tree of life.

Genomics also reveals that approximately one-third of sequenced genes from most microorganisms have
unknown function and are annotated as hypothetical proteins. We have no idea what these do. It is
incredible that, even with a microorganism as intensively studied as Escherichia coli, approximately one-
third of the genes in its genome are of unknown function! Genetics approaches that we covered earlier
again become important to identify the function of these unknown genes by analyzing the phenotypic
effects of mutants.

Functional genomics is the discipline that determines the functions of unknown genes by the construction
of mutants and that analyzes the biochemical and physiological effects of the mutations. Functional
genomics also involves the analysis of the structure, function, and regulation of proteins (proteomics) and
the analysis of the expression of all of the transcripts in the genome at once (transcriptomics), using either
high-throughput sequencing or microarray analysis. Metagenomics involves the extraction and analysis of
DNA directly from an environmental sample.

In this module, we’ll look at these topics and their impacts in more detail.

Test Your Knowledge

What is studied in comparative genomics?

The comparison between many organisms at a single gene locus


The relationship between organisms at a whole genome level
The comparison of genes expressed between organisms
The comparison of proteins expressed between organisms

Correct! Comparative genomics is intended to study the relationship between organisms on the
basis of their genomes.

Sanger/Dideoxy Sequencing

The study of genomics has been greatly facilitated by the development of recombinant DNA protocols
involving the manipulation of DNA into plasmids and bacteriophage for subsequent expression and
sequencing. DNA sequencing techniques have progressed enormously in recent decades, as has the ability
to store and analyze large sequence datasets, such that microbial genomics has advanced at a dizzying
pace. Nonetheless, it is very important to be familiar with traditional sequencing methods and how they have
helped usher in the era of genomics.

Let’s start with Sanger sequencing.

Sanger sequencing, also known as dideoxy sequencing, remains one of the methods for sequencing DNA
that generates long reads of the highest quality. Although few universities still have Sanger sequencing
instruments, sending samples to sequencing facilities for Sanger sequencing generates 700–1000 base
reads at very high accuracy.

Method

Figure 11.3 shows you the output from a Sanger sequencing run for an individual DNA template. This
output is called an electropherogram and, as you can see, the unique signal for each base provides a clear
indication of the sequence of a template of interest.

Figure 11.3. Trace from an automated Sanger DNA sequencer. Automated methods have improved the efficiency of DNA
sequencing and removed the requirement for the researcher to read the gel directly. Fluorescent dye tags are used in
place of radioactivity, enabling the automated detection of DNA fragments during electrophoresis. In the resulting
output, a different color represents each of the four dideoxynucleotide bases: black for G, green for A, red for T, blue
for C. The trace of the peaks can be interpreted by specialized software.

There are three common steps for Sanger sequencing:

1. Cloning gene fragments of interest: The cloning of DNA involves using restriction enzymes and
ligase to insert your DNA of interest into a vector, usually a plasmid vector.
2. DNA synthesis: You then generate multiple copies of that same sequence through DNA synthesis.
3. Electrophoresis: The DNA synthesis products can then be separated by size in a gel matrix
(electrophoresis).

The key to the process of Sanger/dideoxy sequencing involves DNA polymerase for the DNA-synthesis
step. The DNA-synthesis step is carried out in a reaction that has normal deoxynucleotides and also a
small proportion of dideoxynucleotides (ddNTPs), which lack the 3′ hydroxyl groups that allow for
continued DNA synthesis, as shown in Figure 11.2A. These dideoxynucleotides are called terminator
nucleotides because they terminate synthesis by DNA polymerase.

Figure 11.2A. Sanger, or dideoxy, sequencing. The Sanger sequencing method relies on the use of special
dideoxyribonucleotides that can be inserted into a growing DNA strand by DNA polymerase but block the addition of
further nucleotides.

Terminating DNA polymerase at different points along the template strand results in fragments of
different lengths that can be separated physically, by size, eventually allowing you to reconstruct the
sequence itself based on size.
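
To make the logic of chain termination concrete, here is a minimal Python sketch of a single sequencing reaction. It is a toy model, not a description of any real instrument: the function names, the 5% ddNTP fraction, and the example template are all assumptions chosen for illustration. The template is written 3′ to 5′ so that the newly synthesized strand reads 5′ to 3′.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def simulate_sanger(template, ddntp_fraction=0.05, n_molecules=20000, seed=1):
    """Simulate one reaction; return {fragment_length: terminating_base}.

    ddntp_fraction is a hypothetical chance that the polymerase picks a
    ddNTP instead of a normal dNTP at any given position.
    """
    rng = random.Random(seed)
    fragments = {}
    for _ in range(n_molecules):
        for position, template_base in enumerate(template, start=1):
            new_base = COMPLEMENT[template_base]
            # A ddNTP lacks the 3' hydroxyl, so synthesis stops here and the
            # fragment carries the label of its final (terminator) base.
            if rng.random() < ddntp_fraction:
                fragments[position] = new_base
                break
    return fragments


def read_sequence(fragments):
    """Order fragments by size, as electrophoresis does, and read the labels."""
    return "".join(fragments[length] for length in sorted(fragments))


template = "TACGGATCCTAGGCAT"  # made-up template, written 3' to 5'
fragments = simulate_sanger(template)
print(read_sequence(fragments))
```

Because every fragment length ends in a labelled terminator, sorting the fragments from shortest to longest spells out the synthesized strand one base at a time.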

The initial dideoxyribonucleotide triphosphate sequencing approach was done using the method shown in
Figure 11.2B. Admittedly, this figure is complicated, but it is important for understanding how Sanger
sequencing is done today. Watch the video walkthrough of Figure 11.2B, and consult the corresponding
passage in your textbook if unclear.

Video walkthrough of Figure 11.2B. Copyright John Wiley & Sons Canada, Ltd. and University of Waterloo


Use of Fluorophores

The Sanger sequencing method originally used X-ray film to detect radioactively labelled fragments
produced in dideoxynucleotide terminator reactions. This type of Sanger sequencing, with a separate reaction
for each terminator nucleotide, is not used anymore.

Automated methods for sequencing with fluorescent labels, instead of radioactive labels, are safer,
cheaper, and much easier.

Terminators are prepared with fluorophores, or fluorescent molecules, attached to them. Each of the four
terminators is labelled with its own fluorophore with a unique emission wavelength. This allows the
reaction to be carried out in a single tube and then run in one lane of a gel.

Fluorescence is measured as the fragments, separating by size, pass a laser that excites the
fluorophores while a detector scans for each of the four terminators at each fragment size. Beyond ~1000
bases or so, the signal can no longer be discerned, partly because few strands reach that length
without DNA polymerase incorporating a dideoxy terminator.
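
The drop in signal with read length can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that the polymerase has the same small chance p of incorporating a terminator at every position; the value of p is hypothetical and real reactions are more complicated.

```python
def fraction_terminated_at(n, p):
    """Fraction of strands whose first terminator lands exactly at position n."""
    return p * (1 - p) ** (n - 1)


def fraction_still_extending(n, p):
    """Fraction of strands that pass position n without a terminator."""
    return (1 - p) ** n


p = 0.005  # hypothetical per-position chance of picking a ddNTP
for n in (100, 500, 1000):
    print(n, f"{fraction_terminated_at(n, p):.6f}")
```

Under this assumption the number of fragments of each length falls off geometrically, so the peaks for long fragments are much weaker than those for short ones.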

Primer Walking

Sanger sequencing can also be used to obtain longer sequences from larger inserts in a plasmid by
sequencing sequentially in one direction along a large insert. This is called primer walking, and is shown in
Figure 11.4.

Primers are designed to the end of a known region of DNA next to a DNA insert of interest, then a Sanger
sequencing reaction generates ~700–1000 bases of additional sequence information. A new primer is
designed to the end of this new sequence region, allowing for sequencing into yet more of the insert. This
process continues iteratively, until the other side of the insert is reached.
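
The iterative process can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the read length of 800 bases and the `setback` parameter (how far back from the end of the known sequence each new primer is placed) are hypothetical values, and the function name is made up for this example.

```python
def primer_walk(insert_length, read_length=800, setback=100):
    """Return (primer_position, read_end) for each sequencing reaction.

    Position 0 means the primer binds known vector sequence at the insert
    boundary. Each later primer is designed `setback` bases before the end
    of the sequence known so far (an assumed design rule).
    """
    if setback >= read_length:
        raise ValueError("setback must be smaller than read_length")
    reactions = []
    known_end = 0
    primer_position = 0
    while known_end < insert_length:
        read_end = min(primer_position + read_length, insert_length)
        reactions.append((primer_position, read_end))
        known_end = read_end
        primer_position = known_end - setback
    return reactions


for primer_position, read_end in primer_walk(3000):
    print(f"primer at {primer_position}, read reaches {read_end}")
```

Each reaction depends on the sequence from the one before it, which is why primer walking is sequential and slow compared with methods that sequence many fragments in parallel.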

Figure 11.4. Primer walking. Sanger sequencing requires binding of a sequencing primer to initiate DNA polymerase
activity. As demonstrated in Figure 11.2, the ends of cloned fragments are usually sequenced using sequencing
primers complementary to known sequence regions within the vector (in blue). A cloned fragment (insert DNA) of
about 3 kb is sequenced using the primer walking method, with each sequencing reaction generating between 500
and 1,000 bases of sequence. The dark blue arrows indicate the sequence that is generated, and the short gold lines
at the base of each arrow correspond to the sequencing primers. Starting from the first sequences obtained from the
end of the insert using a primer to the vector cloning site sequence, each subsequent sequencing reaction, with the
same template, is carried out using primers designed based on the preceding sequence. Note that the primers are all
shown as different colors because they are all distinct, designed based on the unique sequence obtained along the
DNA template of interest.

Primer walking is a particularly useful method for genome sequencing. As described in the next section,
genome sequencing efforts often generate large fragments of a genome sequence. However, there are
often gaps in the final sequence that need filling. By amplifying a gap of several thousand bases by
long-range polymerase chain reaction (PCR), it is then possible to sequence through the gap until the next
known genome sequence is reached. We have “closed” genomes in this way, which is particularly
satisfying, and Sanger sequencing is the only suitable method for this genomics-related application.

Test Your Knowledge


What is the key reagent in Sanger sequencing?

primer
sequencing gel
enzyme
ddNTPs

Correct!

Why is primer walking used?

To generate new primers used to amplify DNA


To sequence a longer stretch of DNA than produced by single Sanger sequencing reads

To obtain sequence on both sides of a primer binding site


To know what DNA fragments to subclone for further sequencing

High-Throughput Sequencing

In the mid-2000s, alternatives to Sanger sequencing were developed, providing access to many more
sequences for only a fraction of the per-sequence cost associated with the traditional Sanger sequencing
approach. The key innovation was the high-throughput nature of the “next-generation sequencing”
platforms. Instead of sequencing a single sequence at a time, and perhaps 96 samples at a time, hundreds
of thousands or millions of sequences can be sequenced in an instrument simultaneously.

Pyrosequencing
The first of these high-throughput sequencing methods increased the throughput of an existing
pyrosequencing technique. Let’s look at how pyrosequencing works to better understand how the
method was eventually scaled up.

Like Sanger sequencing, pyrosequencing involves monitoring the incorporation of bases by DNA
polymerase. As shown in Figure 11.5, each time a nucleotide is incorporated into a growing DNA strand,
pyrophosphate is released. Adenosine phosphosulfate (APS) is supplied in the reaction mix as a substrate
for the next step.

Figure 11.5. Pyrosequencing. In the strand shown, DNA polymerase adds dTTP to the growing DNA strand. ATP-
sulfurylase reacts with the pyrophosphate that is released upon incorporation with adenosine phosphosulfate (APS),
generating ATP. This ATP interacts with luciferase and releases a burst of light. The automated detection of this light
by a camera indicates that the nucleotide was added to the growing DNA strand.

In the presence of the ATP sulfurylase enzyme, pyrophosphate is converted to ATP. The released ATP
provides the energy for the luciferase enzyme, which gives off light. Every time a base is successfully
incorporated, a flash of light is emitted from the reaction. By providing individual nucleotides sequentially
to the reaction, light generation indicates that a particular base was incorporated. If two or more of the
same base are incorporated, light production will be two or more times as bright.
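
The way a sequence is read from these flashes of light can be sketched as follows. This is an idealized model: it assumes the light signal is exactly proportional to the number of bases incorporated, and the flow order "TACG" and the function names are assumptions made for this example.

```python
def simulate_flowgram(new_strand, flow_order="TACG", light_per_base=1.0):
    """Return [(base_flowed, light_signal), ...] for an idealized reaction."""
    unknown = set(new_strand) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not in flow order: {sorted(unknown)}")
    flows = []
    i = 0
    while i < len(new_strand):
        for base in flow_order:
            run = 0
            # Every consecutive copy of the flowed base is incorporated at
            # once, so a run of n identical bases gives n times the light.
            while i < len(new_strand) and new_strand[i] == base:
                run += 1
                i += 1
            flows.append((base, run * light_per_base))
            if i >= len(new_strand):
                break
    return flows


def decode_flowgram(flows, light_per_base=1.0):
    """Convert light signals back into a base sequence."""
    return "".join(base * round(signal / light_per_base) for base, signal in flows)


flows = simulate_flowgram("TTAGGGC")
print(flows)
print(decode_flowgram(flows))
```

In practice the signal for long runs of one base is hard to tell apart (for example, seven versus eight of the same base), which is a known source of errors for methods that read several identical bases in a single flow.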

Scaling up of this reaction was done first by a company known as 454 and the method was called
454 pyrosequencing. The number 454 referred to the number of the method they tried, which was the
method the company adopted.

As shown in Figure 11.6, the 454 pyrosequencing method involves PCR amplification of DNA fragments
on the surface of beads in an emulsion. Many pyrosequencing reactions then occur in a multi-well format,
where every well contains a single bead that corresponds to an individual DNA fragment. Hundreds of
thousands of wells are sequenced simultaneously, with the incorporation of nucleotide bases one at a
time, observing when light is emitted from each of the wells.

Figure 11.6. 454 pyrosequencing. This high-throughput pyrosequencing method starts with shearing of the DNA into
fragments. Short adapter sequences are then ligated to the end of each fragment. One of these adapter ends is
attached to agarose beads, and the fragment is then amplified by PCR within water droplets, a process called
emulsion PCR. Individual beads with the amplification products are distributed in a flow cell, where repeated
pyrosequencing reactions are carried out as in Figure 11.5, alternating between each of the four dNTP bases. ATP
produced by the DNA synthesis results in light production by luciferase, which can be measured by a CCD camera.
Ion Torrent sequencing uses this same overall process but instead measures protons released by the incorporation of
nucleotides.

Other High-Throughput Sequencing Methods


Since 454 pyrosequencing, many other high-throughput sequencing methods have become commonplace.

Ion Torrent (Figure 11.7A) uses the release of protons with the incorporation of nucleotide bases (instead
of pyrophosphate for 454 pyrosequencing) to indicate that a base (or bases) was incorporated. Thus, this
multi-well system is like a high-throughput pH meter, and individual bases are added one at a time.
Figure 11.7A shows how compact this multi-well system is: hundreds of thousands of wells
are packed into this small “chip.”

Illumina (Figure 11.7B) sequencing can generate tens or hundreds of millions of sequences on a single flow
cell the size of a microscope slide. After producing many copies of each template DNA strand on the glass
surface of the flow cell, the technique involves incorporation of one base at a time, each with a unique
fluorophore. Laser scanning identifies base incorporations, one at a time, with the fluorophore removed
each time before the next base is incorporated.

Figure 11.7. High‐throughput sequencing technologies. Examples of sequencing platforms used to generate millions of
reads from DNA fragments, demonstrating the small physical size required for sample analysis. A. An Ion PI Chip
(bottom) and the corresponding Ion Proton system from Ion Torrent (top). B. A typical flow cell (bottom) and the
corresponding MiSeq instrument from Illumina (top).

454 pyrosequencing (https://en.wikipedia.org/wiki/454_Life_Sciences) (454 Life Sciences), Ion Torrent
(https://www.thermofisher.com/ca/en/home/brands/ion-torrent.html) (ThermoFisher Scientific), and
Illumina (http://www.illumina.com/) (Illumina) generate sequences that are a few hundred bases in
length, shorter than those generated with Sanger sequencing and with higher error rates, but the
increased throughput compensates enormously where genome sequencing is concerned.

Keep an eye on the rapid development of DNA sequencing technology.

PacBio (http://www.pacb.com/) (Pacific Biosciences of California): very long reads of ~3000 bases;
high error rate
Nanopore sequencing (https://nanoporetech.com/applications/dna-nanopore-sequencing) (Oxford
Nanopore Technologies): very compact; the sequencer sits on a desk and plugs into a USB port of a
laptop

These technologies are often combined with one or more other methods to balance the strengths and
weaknesses of both.

The Illumina instrument shown in Figure 11.7B is in the Department of Biology at the University of
Waterloo and is used extensively by my lab and the labs of several other microbiology professors to
explore microbial communities and sequence genomes of microbes of interest.

Test Your Knowledge


What is the primary difference between 454 pyrosequencing and Sanger sequencing?

All four ddNTPs are different colours in Sanger sequencing whereas only one colour is used for 454
pyrosequencing
Sanger sequencing does not require cloning, 454 requires cloning
454 does not require cloning into a vector
454 sequencing is more expensive than Sanger sequencing on a per-sequence basis

Correct! 454 pyrosequencing does not require that template DNA be cloned into a vector first.

Shotgun Sequencing

Once you have access to sequencing technology and a microorganism of interest that you would like to
sequence a genome for, what’s next?

Sequencing of bacterial genomes is done by an approach known as shotgun sequencing, shown in
Figure 11.8. Generally, this approach involves cutting or shearing of genomic DNA into many small pieces,
cloning the pieces into plasmid vectors, and then randomly sequencing the cloned DNA. More recently,
the cloning step is omitted and fragmented DNA is sequenced directly by high-throughput sequencing
methods.

Figure 11.8. Process Diagram: Whole genome shotgun sequencing. Whole genome shotgun sequencing is the strategy of choice for determining
the sequence of bacterial genomes. It involves the direct sequencing of random clones or DNA fragments. The development of this method
was made possible by the improvement of DNA-sequencing methods and the availability of computational power to find overlaps between
the obtained sequences.

Sequencing of the fragmented genomic DNA is typically done so that any given region of the genome is
sequenced much more than just once. In fact, genome sequencing involves sequencing the entire genome
with at least 10-fold coverage. Every region of the genome is sequenced at least 10 times and often
hundreds or thousands of times with the high-throughput sequencing approaches of today. It is called
shotgun sequencing because you shred a genome into tiny pieces and sequence them all many times.
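
Coverage is simple arithmetic: the total number of bases sequenced divided by the size of the genome. The sketch below works through an example. The genome size (roughly 4.6 million bases, about the size of an E. coli genome) and the 250-base read length are illustrative values, and the estimate of unsequenced bases uses a Poisson approximation that assumes reads land uniformly at random, which is an idealization not taken from this module.

```python
import math


def coverage(n_reads, read_length, genome_size):
    """Average number of times each base of the genome is sequenced."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


def expected_fraction_unsequenced(c):
    """Poisson estimate of the fraction of bases covered by no read at all."""
    return math.exp(-c)


genome_size = 4_600_000  # approximate, for illustration only
n = reads_needed(10, 250, genome_size)
print(n, coverage(n, 250, genome_size))
print(f"{expected_fraction_unsequenced(10):.2e}")
```

Even at 10-fold average coverage, random sampling leaves a small fraction of bases unsequenced, which is one reason higher coverage is typical with today's high-throughput methods.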

The next step is to treat the millions of short sequence reads like a jigsaw puzzle, assembling a single
genome sequence. As we discussed earlier, primer walking is still required to “close” a genome because
some genomic regions do not sequence well and some regions do not assemble well.
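
The jigsaw-puzzle idea can be illustrated with a toy assembler that repeatedly merges the two reads with the longest overlap. This greedy strategy is a teaching sketch only; it is not how production assemblers work, and the reads and function names below are made up for the example.

```python
def overlap(a, b, min_length=3):
    """Length of the longest suffix of a that matches a prefix of b."""
    for size in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_length=3):
    """Merge reads pairwise by best overlap; return the remaining contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b, min_length)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # no overlaps left: the remaining pieces are separated by gaps
        merged = reads[i] + reads[j][size:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


reads = ["TACGTTAGC", "ATGGCGTA", "GCGTACGT"]
print(greedy_assemble(reads))
```

When no overlaps remain, the assembler returns more than one piece, and those gaps are the kind of thing primer walking is used to close. Repeated sequences can also mislead a greedy approach like this one, which fits with the observation that some regions do not assemble well.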


Bioinformatics

After you have sequenced your genome, the next step is to annotate and analyze your genome, which
involves bioinformatics.

Bioinformatics involves the use of computational tools to analyze sequence data, assemble sequences (if
it's a genome that's being analyzed), and then store that information in a searchable way. Although
bioinformatics is presented here in the context of DNA sequencing and genomics, bioinformaticians use
computational methods to analyze biological data from DNA, RNA, proteins, and metabolites. Given the
enormous progress in methodologies, bioinformatics is increasingly valuable for advancing modern-day
microbiological discovery.

Within the context of genomics, bioinformatics analysis usually involves analyzing raw sequence data and
trying to make meaningful interpretations of it.

The first step in this process is beginning to annotate those genomes that have been collected. Annotation
involves identifying the likely locations of genes within the genome and assigning putative functions to
those gene regions.

The likely locations of genes are known as open reading frames (ORFs). Open reading frames should have
the approximate size of a gene and possess start and stop codons, the Shine-Dalgarno sequence, and
potentially a promoter sequence as well.
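
A very simple ORF search can be sketched as below. It is a minimal illustration with several loudly stated assumptions: it scans only the forward strand, treats ATG as the only start codon, and does not look for a Shine-Dalgarno sequence or promoter, all of which real annotation tools handle. The function name and the example sequence are made up.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=3):
    """Return (start, end, frame) for forward-strand ORFs; end is exclusive.

    min_codons counts the codons from the start codon up to, but not
    including, the stop codon.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == START:
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


sequence = "CCATGAAACCCGGGTAGTT"
for start, end, frame in find_orfs(sequence):
    print(start, end, frame, sequence[start:end])
```

In a real genome you would raise `min_codons` considerably, since short ORFs occur by chance, and you would also scan the reverse complement because genes are found on both strands.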

It’s amazing to think that the first bacterial genome (Haemophilus influenzae) was sequenced in 1995. The
genome of Escherichia coli was published in 1997. These days, tens or hundreds of thousands of bacterial
and archaeal genomes are available for analysis from NCBI - Genome information by organism
(https://www.ncbi.nlm.nih.gov/genome/browse/).

From a bioinformatics perspective, it is enormously fortunate that the increase in computational power
has progressed alongside developments in DNA sequencing technology. If not, it simply wouldn't be
possible to analyze the data that are being generated today.

An enormous bioinformatics challenge remains: roughly one-third of all genes have unknown function,
which limits our ability to analyze genomes comprehensively. This is why functional genomics is critical:
using methods and approaches discussed in our module on genetics, it helps us better assign functions to
these open reading frames that have no known purpose.

Test Your Knowledge


What is the goal of functional genomics?


To determine the biological function of unknown genes


To determine modes of inheritance of unknown genes
To determine how genes interact with each other to cause a phenotype
To determine the functions of unknown bacteria

Correct! Functional genomics seeks to determine the function of unknown genes.

https://learn.uwaterloo.ca/content/enforced/701343-BIOL240_081_cel_1219/lecture-content/module10/genomics/genomics-part1.html?ou=701343