Lec 6 BI 317 Lecture 6 (McStay)

BI317 Human Molecular Genetics
Prof Brian McStay

Centre for Chromosome Biology
& Discipline of Biochemistry
Lecture 6
DNA sequencing (past, present and future)
and
How the human genome sequence was obtained
See Strachan and Read Chapter 8
Sanger (dideoxy) sequencing, run on thousands of automated fluorescent
sequencers at centres such as the Sanger Centre in Cambridge, UK, was
responsible for the human genome sequence.
Dideoxy nucleotides can be incorporated into DNA by polymerase but
act as chain terminators: the 3’-OH group has been replaced by H, so the
chain cannot form a phosphodiester bond with an incoming dNTP.
[Figure: a deoxy NTP (3’-OH present) compared with a dideoxy NTP (no 3’-OH; chain terminator)]
Fluorescent derivatives of each Dideoxy NTP have been developed
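The chain-termination idea can be sketched in a few lines of Python. This is a toy model, not a description of any real instrument: it assumes the template is written 3’ to 5’, that termination happens once at every position, and that each fragment's dye reports its final base. The function name `sanger_read` is made up for illustration.

```python
def sanger_read(template: str) -> str:
    """Toy model of reading a template by dideoxy chain termination."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    # The polymerase copies the template, so the new strand is its complement.
    new_strand = "".join(complement[base] for base in template)
    # A ddNTP can be incorporated at any position, giving one terminated
    # fragment per length; the dye on the ddNTP identifies the last base.
    fragments = [(length, new_strand[length - 1])
                 for length in range(1, len(new_strand) + 1)]
    # Electrophoresis separates the fragments by size, shortest first.
    fragments.sort()
    # Reading the dye colours in size order gives the sequence.
    return "".join(dye for _, dye in fragments)


print(sanger_read("TACGGA"))  # ATGCCT
```

Reading the dyes from the shortest fragment to the longest recovers the newly synthesised strand one base at a time.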
The Sanger method underpinned molecular genetics for three decades. The
method is disadvantaged by the need for gel electrophoresis, making it difficult
to scale up. The most advanced machines could only sequence 96 samples at
a time (30-60kb every 3-4 hrs).
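A rough calculation shows why this does not scale. The sketch below assumes a haploid human genome of about 3.1 Gb (a figure not given on the slide) and a single machine running non-stop; `machine_days` is an illustrative name.

```python
GENOME_BP = 3.1e9  # assumed haploid human genome size in base pairs


def machine_days(kb_per_run: float, hours_per_run: float,
                 coverage: float = 1.0) -> float:
    """Days one sequencer would need to read the genome `coverage` times."""
    bp_per_hour = kb_per_run * 1000 / hours_per_run
    return GENOME_BP * coverage / bp_per_hour / 24


# Best and worst cases from the throughput quoted above.
fastest = machine_days(kb_per_run=60, hours_per_run=3)
slowest = machine_days(kb_per_run=30, hours_per_run=4)
print(f"{fastest / 365:.1f} to {slowest / 365:.1f} machine-years per pass")
```

Under these assumptions a single pass over the genome takes somewhere between about 18 and 47 machine-years, which is why thousands of sequencers were run in parallel.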
New technologies developed in the mid-2000s led to sequencing of DNA
during strand synthesis. No gel was required and this led to the development
of massively parallel sequencing, also known as next generation sequencing
(NGS).
Illumina workflow (1)
Library preparation
Samples consisting of longer fragments are first sheared into a random library of 100-300 base-pair
long fragments. After fragmentation the ends of the obtained DNA-fragments are repaired, and an A-
overhang is added at the 3'-end of each strand. Afterwards, adaptors which are necessary for
amplification and sequencing are ligated to both ends of the DNA-fragments. These fragments are
then size selected and purified.
Cluster Generation
Cluster generation is performed on the Illumina cBot. Single DNA fragments are attached to the flow cell
by hybridizing to oligos on its surface that are complementary to the ligated adaptors. The DNA molecules are
then amplified by so-called bridge amplification, which results in hundreds of millions of unique clusters.
Finally, the reverse strands are cleaved and washed away and the sequencing primer is hybridized to the
DNA templates.
Sequencing
During sequencing, the huge number of generated clusters are sequenced simultaneously. The DNA
templates are copied base by base using the four nucleotides (ACGT) which are fluorescently-labeled and
reversibly terminated. After each synthesis step, the clusters are excited by a laser which causes
fluorescence of the last incorporated base. After that, the fluorescence label and the blocking group are
removed allowing the addition of the next base. The fluorescence signal after each incorporation step is
captured by a built-in camera, producing images of the flow cell.
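The cycle-by-cycle logic can be sketched as follows. This is a simplified model, assuming every cluster incorporates exactly one base per cycle and that base calling is error-free; `sequence_by_synthesis` is a hypothetical name, not part of any Illumina software.

```python
def sequence_by_synthesis(clusters: list[str], cycles: int) -> list[str]:
    """Toy model: read one base per cycle from every cluster in parallel."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    reads = ["" for _ in clusters]
    for cycle in range(cycles):
        # One reversibly terminated, labelled base is added to every cluster.
        for index, template in enumerate(clusters):
            if cycle < len(template):
                # "Imaging": the colour identifies the incorporated base.
                reads[index] += complement[template[cycle]]
        # The label and blocking group are then removed (nothing to model
        # here), which allows the next cycle to add one more base.
    return reads


print(sequence_by_synthesis(["ACGT", "TTGA"], cycles=3))  # ['TGC', 'AAC']
```

The read length is set by the number of cycles, while the number of reads is set by the number of clusters, which is where the massive parallelism comes from.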
[Figure: Illumina instruments MiSeq, NextSeq 500 and HiSeq 2500; slide label "2000X human genome"]
Sequencing without an amplification step could overcome bias in the
amplification step of current NGS platforms (e.g. Illumina).
This prompted the advent of single molecule sequencing, the most
notable platform being that offered by Pacific Biosciences (PacBio).
PacBio currently gives fewer and less
accurate reads than Illumina. The real value
is the length of the reads obtained. This
offers significant advantages when
sequencing difficult regions of genomes.
Sequencing of larger inserts (15-20 kb) to support de novo
assembly applications with HiFi data
By combining longer but possibly less accurate PacBio reads with shorter
Illumina reads we can now sequence through repeated DNA
[Figure: target DNA with long PacBio reads and short Illumina reads aligned to it]
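Why read length matters for repeats can be shown with a small check: a read only resolves a repeat if it covers the whole repeat plus some unique sequence on both sides. The coordinates, the 20 bp anchor and the function name `spans_repeat` are all illustrative assumptions.

```python
def spans_repeat(read_start: int, read_length: int,
                 repeat_start: int, repeat_end: int,
                 anchor: int = 20) -> bool:
    """True if the read covers the repeat plus `anchor` unique bases each side."""
    read_end = read_start + read_length
    return (read_start <= repeat_start - anchor
            and read_end >= repeat_end + anchor)


# A hypothetical 5 kb repeat block at positions 1000-6000.
print(spans_repeat(900, 150, 1000, 6000))    # short read: False
print(spans_repeat(500, 15000, 1000, 6000))  # long read: True
```

A 150 bp read can never bridge the 5 kb block wherever it starts, whereas a 15 kb read can, so the long read fixes the order of the flanking unique sequence and the short reads can then be used to polish the bases.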

Array-based DNA capture can enable targeted resequencing,
e.g. exome sequencing captures all exons in cancer samples.

Human genome project
Framework maps are needed for the first-time sequencing of
complex genomes
Genetic maps rely on the principle that if two mutant phenotypes show a tendency
to be inherited together, they can be expected to be closely linked on the same
chromosome.
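Linkage is usually quantified as the recombination fraction: the proportion of offspring that carry a non-parental combination of alleles. The sketch below uses invented counts from a hypothetical test cross in a model organism; the genotype labels and the function name are illustrative only.

```python
def recombination_fraction(offspring: list[str],
                           parental_types: set[str]) -> float:
    """Fraction of offspring carrying a non-parental allele combination."""
    recombinants = sum(1 for genotype in offspring
                       if genotype not in parental_types)
    return recombinants / len(offspring)


# Invented data: parental combinations are AB and ab.
offspring = ["AB"] * 45 + ["ab"] * 45 + ["Ab"] * 5 + ["aB"] * 5
print(recombination_fraction(offspring, {"AB", "ab"}))  # 0.1
```

A fraction near 0 means the two loci are tightly linked, while a fraction of 0.5 means they segregate independently, as expected for loci far apart or on different chromosomes.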
Genetic maps can be constructed in model organisms
For ethical and practical reasons genetic mapping of mutations
could never be contemplated in humans
The first human genetic maps were of low resolution and
were constructed using polymorphic DNA markers
RFLP: restriction fragment length polymorphisms,
assayed by Southern blotting (slow)
A second generation human genetic map was based on microsatellite DNA
Microsatellite DNA is also known as short tandem repeats (STRs) (remember 2nd yr lectures)
Microsatellite instability arises because of mistakes during replication
Assayed by PCR (rapid)
[Figure labels: marker on average every 3.5 Mb; marker roughly every Mb]
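What makes a microsatellite a useful marker is that alleles differ in the number of repeat units, which shows up as a difference in PCR product length. The sketch below counts repeat units directly from sequence; the allele sequences and the name `longest_repeat_run` are invented for illustration.

```python
import re


def longest_repeat_run(sequence: str, unit: str = "CA") -> int:
    """Number of units in the longest uninterrupted tandem run of `unit`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)


# Two hypothetical alleles of the same (CA)n marker with identical flanks.
allele_1 = "GGT" + "CA" * 12 + "TTG"
allele_2 = "GGT" + "CA" * 15 + "TTG"
print(longest_repeat_run(allele_1), longest_repeat_run(allele_2))  # 12 15
```

Here the two alleles differ by three CA units, so their PCR products would differ by 6 bp and could be told apart by size.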
Physical Mapping
Somatic cell hybrids
Radiation hybrids contain fragments of human chromosomes generated by
X-rays, integrated into rodent chromosomes
Physical Mapping
Yeast Artificial Chromosome (YAC) vectors enable cloning of megabase-sized fragments of DNA
• URA3 and TRP1 provide positive selection for yeast containing YAC
• SUP4 disruption by insert turns positive colonies from white to red
• ARS is a replication origin
• Telomeres (TEL) and a centromere (CEN4) confer stability to the artificial chromosome
A clone contig: the chromosomal sequence from position A to B is represented by
overlapping DNA inserts in a series of genomic clones (YACs or BACs). Clones with
overlapping inserts are generated by the random fragmentation of DNA (usually by partial
digestion with restriction enzymes) when a genomic DNA library is constructed.
Physical Mapping
Bacterial Artificial Chromosome vectors (BACs)
A bacterial artificial chromosome (BAC) is a vector based on the F-plasmid.
BAC vectors are usually in the range of 10-13 kb in size.
The usual insert size of a bacterial artificial chromosome is 100-300 kb.
The enormous capacity of BAC vectors makes them especially suited to genome
projects (i.e. fewer clones are required to cover the genome).
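The effect of insert size on library size is simple arithmetic. The sketch below assumes a haploid genome of about 3.1 Gb (not stated on the slide), a mid-range 150 kb BAC insert and a 10 kb plasmid insert chosen only for comparison; `clones_needed` is an illustrative name.

```python
import math


def clones_needed(genome_bp: int, insert_bp: int, coverage: float = 1.0) -> int:
    """Minimum number of clones to hold `coverage` genome equivalents of DNA."""
    return math.ceil(genome_bp * coverage / insert_bp)


GENOME_BP = 3_100_000_000  # assumed haploid human genome size

print(clones_needed(GENOME_BP, 150_000))  # BAC-sized inserts: 20667
print(clones_needed(GENOME_BP, 10_000))   # plasmid-sized inserts: 310000
```

These are lower bounds for perfectly tiled clones. Because real libraries are made by random fragmentation, several genome equivalents are needed in practice, which the `coverage` argument can model.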
Sequence-tagged sites
An important physical mapping aim was to build a map based on sequence-tagged
site (STS) markers.
An STS marker is any known UNIQUE DNA sequence that can easily be assayed by
PCR.
STS markers include polymorphic markers such as microsatellites and many more
non-polymorphic sequences (many obtained from end sequence of BAC clones).
Aligning BAC clones by hybridisation with STS probes
A clone contig: an example of human clone contig assembly, showing YAC clones from a
portion of human chromosome 2. Positive typing for an STS marker is indicated by a
closed circle, and brackets indicate the absence of an expected STS (YAC instability).
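STS content mapping boils down to a simple rule: two clones that test positive for the same unique STS must overlap. The sketch below applies that rule to invented typing results; the clone and marker names and both function names are hypothetical.

```python
from itertools import combinations


def overlapping_clones(sts_hits: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Clone pairs that share at least one STS marker."""
    return [(a, b) for a, b in combinations(sorted(sts_hits), 2)
            if sts_hits[a] & sts_hits[b]]


def is_single_contig(sts_hits: dict[str, set[str]]) -> bool:
    """True if every clone is linked to every other through shared STSs."""
    clones = sorted(sts_hits)
    if not clones:
        return False
    linked, frontier = {clones[0]}, [clones[0]]
    while frontier:
        current = frontier.pop()
        for other in clones:
            if other not in linked and sts_hits[current] & sts_hits[other]:
                linked.add(other)
                frontier.append(other)
    return len(linked) == len(clones)


# Invented typing results: which STS markers each clone is positive for.
hits = {"BAC1": {"sts1", "sts2"}, "BAC2": {"sts2", "sts3"},
        "BAC3": {"sts3", "sts4"}}
print(overlapping_clones(hits))  # [('BAC1', 'BAC2'), ('BAC2', 'BAC3')]
print(is_single_contig(hits))    # True
```

This only works because each STS is unique in the genome. A marker that falls in repeated sequence would link clones that do not really overlap.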
Expressed sequence tags
Other markers were obtained from sequenced cDNA clones and these were known
as expressed sequence tags (ESTs)
[Figure label: library of cDNA inserts]
Two approaches to genome sequencing
The hierarchical approach is the best approach for first-time
sequencing, but for resequencing, whole genome shotgun is the best
approach, especially in the era of NGS.
Human Genome Project          Celera Genomics
International Consortium      Commercial enterprise
Publicly funded
Hierarchical approach         Whole genome shotgun approach
15th Feb 2001                 16th Feb 2001
The first draft genomes from both were published simultaneously.
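The core step in any shotgun strategy is merging reads that overlap. The sketch below is a naive greedy assembler on three invented reads; it is far simpler than the assemblers actually used by either project, and the function names are illustrative.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while len(reads) > 1:
        best_size, best_a, best_b = 0, None, None
        for a in reads:
            for b in reads:
                if a != b:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_a, best_b = size, a, b
        if best_size == 0:
            break  # nothing left to merge: the rest are separate contigs
        reads.remove(best_a)
        reads.remove(best_b)
        reads.append(best_a + best_b[best_size:])
    return reads


print(greedy_assemble(["ATGGCGTA", "GCGTACGT", "ACGTTAGC"]))
# ['ATGGCGTACGTTAGC']
```

Greedy merging works on unique sequence but can join the wrong reads when the same overlap occurs in several places. The hierarchical approach limits that problem by assembling one mapped clone at a time.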
Currently we are on human genome build 19 (hg19, also known as GRCh37)

The human genome is complete, or is it?
α satellite DNA is present at all human centromeres
Large blocks of β satellite and sat 1, sat 2 and sat 3 are
indicated in red
Blocks of repetitive sequence present a considerable problem in contig assembly

Powerful genome databases and genome browsers
help to store and analyze the enormous amounts of
sequence data available
Genome browsers such as Ensembl allow users to explore
selected sub-chromosomal regions using a graphical interface.
Here the subject of the query was the human CFTR (cystic
fibrosis transmembrane conductance regulator) gene on chromosome
7q31.
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between
sequences. The program compares nucleotide or protein sequences to sequence databases
and calculates the statistical significance of matches. BLAST can be used to infer functional
and evolutionary relationships between sequences as well as help identify members of gene
families.
[Figure: Nucleotide BLAST with a DNA sequence query, and the results page]
[Figure: TBLASTN, a protein sequence query vs the translated nucleotide database of Danio rerio]
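"Local similarity" can be made concrete with a scoring sketch. The code below is not BLAST, which uses a fast seed-and-extend heuristic and reports statistical significance. It is the exact Smith-Waterman dynamic programme for the best local alignment score, with scoring values chosen arbitrarily for illustration.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between `a` and `b` (Smith-Waterman)."""
    best = 0
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            # A cell never drops below zero, so an alignment can start anywhere.
            current[j] = max(0,
                             previous[j - 1] + step,  # align the two bases
                             previous[j] + gap,       # gap in b
                             current[j - 1] + gap)    # gap in a
            best = max(best, current[j])
        previous = current
    return best


# Two unrelated flanks around a shared 7-base core (ACGTACG).
print(local_alignment_score("TTTACGTACGAAA", "GGGACGTACGCCC"))  # 14
```

The score comes entirely from the shared core, and the unrelated flanks are ignored. That is the sense in which the similarity is local rather than end-to-end.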
1. Sequencing technologies
NGS (Illumina)
Sanger sequencing
Single molecule long read (PacBio, Nanopore)
2. Human Genome project
3. A very brief intro to analysing genomic data
