How To Sequence a Genome

Introduction: Transcript
Studying the human genome - the complete set of human genes - is a way of studying fundamental details about
ourselves. The three billion letters of the human genome are written using the four-letter alphabet of DNA. The DNA is
divided among 23 pairs of chromosomes that are found in each of the trillions of cells in our bodies. In 2003, the
Human Genome Project produced a complete representative sequence of the human genome. Of course, people are not
identical, and DNA sequences do differ subtly between individuals. Currently, a number of separate projects are
charting sequence variations found in human populations.
The representative sequence is a composite from several people who donated blood samples. Originally, close to 100
people volunteered to give a sample of their blood. Each person provided their informed consent, affirming that they
agreed to the study of their DNA. No names were attached to the blood samples and ultimately scientists used only a
few of them. These measures ensured that the DNA sequences remained anonymous; not even the donors knew
whether their samples were actually used or not.
The main goal of the Human Genome Project was to read, letter by letter, the three billion bases of human DNA.
Before starting to sequence the human genome, scientists built maps of the chromosomes and developed and refined
techniques for analyzing DNA. With the tools in place, project scientists began large-scale DNA sequencing in 1999. In
just one year, they had amassed sequence data covering more than 80 percent of the genome.
The human genome is a massive text. If the three billion letters (or bases) of the genome were printed in telephone
books, they would require a stack of books nearly as tall as the Washington Monument.
To accurately determine the sequence of every base in the genome, scientists needed to read the three billion bases not
just once, but at least six to ten times. Individual sequencing reactions could only reveal the order of a few hundred
bases of DNA at a time - amounting to a fraction of a page. This meant that to place in order all of the DNA bases, it
was necessary to produce many thousands of overlapping segments of DNA sequence.
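The arithmetic behind this requirement can be sketched in a few lines. This is only a rough estimate: the read length of 500 bases is an assumption standing in for "a few hundred bases", and the function name is made up for illustration.

```python
def reads_needed(genome_bases: int, coverage: int, read_length: int) -> int:
    """Smallest number of reads whose combined length is genome_bases * coverage."""
    total_bases = genome_bases * coverage
    return -(-total_bases // read_length)  # integer ceiling division

GENOME_BASES = 3_000_000_000  # three billion bases, as stated in the transcript
READ_LENGTH = 500             # assumption: "a few hundred bases" taken as 500

for coverage in (6, 10):
    n = reads_needed(GENOME_BASES, coverage, READ_LENGTH)
    print(f"{coverage}x coverage: {n:,} reads")
# 6x coverage: 36,000,000 reads
# 10x coverage: 60,000,000 reads
```

Under these assumptions the count lands in the tens of millions of reads. Shorter reads or deeper coverage push it higher.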
Mapping: Transcript
To begin the project, researchers built maps of the human genome. They identified thousands of DNA sequence
landmarks that helped them navigate across the chromosomes.
Developing genome maps was necessary preparation for DNA sequencing. These same maps also served to orient
geneticists who were hunting for disease genes.
With enough landmarks in place, project scientists created "libraries" of clones that spanned the genome. Each clone
contained a manageably small fragment of human DNA that was stored in bacteria. Scientists used the landmarks to tell
them what part of the human genome each fragment came from.
This clone-by-clone approach made it possible to double check the location of each DNA sequence. It also allowed
participating laboratories from around the world to carve up the genome and coordinate their work.
Building Libraries: Transcript

Clone libraries offered the same advantage as real libraries: orderly access to information. In most clone libraries, the
DNA fragments were stored in E. coli. These are bacteria that normally live in our intestines. Each E. coli cell stored a
single segment of human DNA and represented a single book of the library. Clone libraries allowed each human
fragment to be tracked and easily copied.
Subclones: Transcript
The clone libraries were prepared using bacterial artificial chromosomes, or BACs. Each BAC clone contained 100,000
to 200,000 bases of DNA sequence. The large BAC clones were used to establish the order of the DNA sequences. To
sequence the DNA, smaller-sized clones were needed. Project scientists cut the large BAC clones into smaller
fragments of about 2,000 bases. These smaller fragments were typically stored in viruses called phage that can infect E.
coli cells.
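As a rough sense of scale, the number of 2,000-base fragments needed to cover one BAC clone end to end can be worked out directly. This sketch assumes non-overlapping fragments, which is a simplification; in practice the fragments overlap and many more are prepared. The function name is hypothetical.

```python
def fragments_to_tile(clone_bases: int, fragment_bases: int) -> int:
    """Minimum number of non-overlapping fragments that span a clone once."""
    return -(-clone_bases // fragment_bases)  # integer ceiling division

# BAC sizes and fragment size are the figures given in the transcript.
print(fragments_to_tile(100_000, 2_000), fragments_to_tile(200_000, 2_000))
# 50 100
```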
E. coli to Store and Copy DNA: Transcript

E. coli cells containing fragments of human DNA, or any other type of DNA, can be stored in freezers indefinitely.
When scientists need to retrieve DNA from the library, they simply revive the cells by bringing them back up to 37
degrees Centigrade - gut temperature.
The E. coli cells act as copiers, producing many copies of the human DNA sequence that they contain. To prepare to
sequence DNA, a clone of cells containing the same bit of human DNA is released into a rich, warm broth. The cells
are shaken vigorously to provide them with air. This causes them to divide rapidly - about once every half hour. After
incubating for just a single night, one third of a teaspoon of broth contains billions of E. coli cells and so, billions of
copies of the particular fragment of human DNA they contained.
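The effect of dividing every half hour can be sketched as simple doubling. This is an idealized model: it assumes uninterrupted doubling with no nutrient limits, and it takes "a single night" to be 12 hours, which is an assumption rather than a figure from the transcript.

```python
def cells_after(start_cells: int, hours: float, doubling_minutes: float = 30) -> int:
    """Cell count after unchecked doubling (ignores nutrient limits and cell death)."""
    doublings = int(hours * 60 // doubling_minutes)
    return start_cells * 2 ** doublings

print(cells_after(1, 12))      # 16777216 (24 doublings from a single cell)
print(cells_after(1_000, 12))  # 16777216000
```

Starting from a single cell, 24 doublings give about 17 million cells. Starting from a thousand cells, the same night gives billions.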
Preparing DNA for Sequencing Reactions: Transcript

The next morning, the E. coli cells are broken open to release their DNA. The human DNA is
separated from the cell debris and washed clean.
Now there are enough copies of the human DNA fragment to set up a sequencing reaction.
Sequencing Reactions: Transcript

A DNA sequencing reaction includes four main ingredients: "template" DNA copied by the E. coli; free bases, the
building blocks of DNA that come in 4 types; short pieces of DNA called "primers"; and DNA polymerase, the enzyme
that copies DNA.
The chemical reaction that makes DNA in a test tube is similar to what happens in a living cell: both rely on DNA
polymerase and, in both cases, DNA strands have a head end, which is called the 5' end, and a tail end, which is called
the 3' end. A DNA strand can grow only from its 3' end.
Making DNA in cells and sequencing DNA in test tubes both depend on complementary base pairing. The building
blocks on opposite strands of DNA pair specifically - a C always pairs with a G, and an A always pairs with a T.
The primer sequence binds to its complementary sequence on the template DNA.
Free bases that match the template sequence can attach to the new strand's growing (3') end.
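Base pairing and primer binding can be illustrated with a short sketch. The sequences are made-up examples. To keep the direction simple, the template is written 3' to 5' from left to right, so the new strand reads 5' to 3' in the same direction.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}  # A pairs with T, C pairs with G

def complement(strand: str) -> str:
    """Base-by-base partner of a strand, position for position."""
    return "".join(PAIRS[base] for base in strand)

def primer_site(template_3to5: str, primer_5to3: str) -> int:
    """Index where the primer pairs with the template, or -1 if it does not bind."""
    return complement(template_3to5).find(primer_5to3)

template = "GGTACGGATCCGATTA"  # hypothetical template, written 3' -> 5'
primer = "ATGC"                # hypothetical primer, written 5' -> 3'

site = primer_site(template, primer)
new_strand = complement(template)[site:]  # primer plus bases added at its 3' end
print(site, new_strand)  # 2 ATGCCTAGGCTAAT
```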
Among the free bases in the solution are a few that have a fluorescent dye attached to them. When a dye-bearing base
attaches to the growing strand, it stops the new DNA strand from growing any further. A different colored dye is
attached to each of the four kinds of bases.
Products of Sequencing Reactions: Transcript

A completed sequencing reaction contains an array of colored DNA fragments. The shortest fragments correspond to
the length of the primer plus one dye-colored base. The longest fragments are usually between 500 and 800 bases long,
depending on when the sequencing reaction ran out of steam.
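The set of fragments in a finished reaction can be sketched by listing every point at which a dye-bearing base could have stopped the strand. A real reaction produces these fragments at random in varying amounts; this sketch only enumerates the possibilities, using a made-up sequence.

```python
def termination_products(new_strand: str, primer_len: int):
    """(fragment length, dye-labeled final base) for each possible stopping point."""
    return [
        (end, new_strand[end - 1])
        for end in range(primer_len + 1, len(new_strand) + 1)
    ]

# Hypothetical new strand: a 4-base primer (ATGC) followed by five added bases.
products = termination_products("ATGCCTAGG", primer_len=4)
print(products)
# [(5, 'C'), (6, 'T'), (7, 'A'), (8, 'G'), (9, 'G')]
```

The shortest product is the primer plus one dye-colored base, and each longer product ends one base further along the template.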
The products of sequencing reactions are fed into an automated sequencing machine. Automated sequencers have
become increasingly sophisticated during the past decade. They can run more samples, process them more quickly, and
are easier to operate.
Separating the Sequencing Products: Transcript
The DNA molecules produced during the sequencing reaction are separated from each other by a process called
electrophoresis. DNA molecules are negatively charged. The sequencing machine sets up an electric field; all the DNA
moves through a porous gel toward the positive electrode. The gel acts like a sieve; shorter DNA fragments move more
quickly through the holes of the gel than do larger DNA fragments.
Reading the Sequencing Products: Transcript

As each DNA fragment reaches the end of the gel, a laser excites its fluorescent dye. A camera detects the color of the
emitted light and passes that information to a computer. One by one, the machine records the colors of the DNA
fragments that pass through the gel.
A single sequencing reaction can reveal the order of several hundred DNA bases.
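How the sequence falls out of the colors can be sketched as follows. The color-to-base assignment is hypothetical, since the transcript says only that each base has its own dye, and the fragment list is made-up data.

```python
# Hypothetical dye assignment; real chemistries define their own colors.
DYE_TO_BASE = {"green": "A", "blue": "C", "yellow": "G", "red": "T"}

def read_sequence(fragments) -> str:
    """fragments: (length, dye color) pairs. Shorter fragments reach the laser first."""
    arrival_order = sorted(fragments, key=lambda fragment: fragment[0])
    return "".join(DYE_TO_BASE[color] for _, color in arrival_order)

fragments = [(8, "yellow"), (5, "blue"), (7, "green"), (9, "yellow"), (6, "red")]
print(read_sequence(fragments))  # CTAGG
```

Because each fragment is exactly one base longer than the one before it, the order of colors at the detector is the order of bases in the new strand.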
Assembling the Results: Transcript

A computer program integrates the data from individual sequencing reactions. It can spot where DNA fragments
overlap and order them as they originally were on the chromosome.
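The idea of ordering fragments by their overlaps can be shown with a toy greedy merge. This is a minimal sketch, not the software the project used: it assumes error-free reads from one strand with no repeated sequence, and the reads are made up.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is also a prefix of b (0 if too short)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps any more; leave the pieces separate
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads

print(greedy_assemble(["GCTAATCG", "ATGCCTAG", "CTAGGCTA"]))
# ['ATGCCTAGGCTAATCG']
```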
Many overlapping sequence reads are needed to generate the uninterrupted sequence of the original stretch of DNA.
During the Human Genome Project, every base pair of DNA was sequenced an average of nine times. Some stretches
of DNA were easy to read and needed to be sequenced less often, while other stretches were more difficult to read
and had to be sequenced more often.
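How often each base is read, and the average over a stretch, can be computed from where the reads fall. The read positions below are made-up values for a 12-base region, chosen only to show that depth varies from base to base.

```python
def coverage_depth(region_length: int, reads):
    """reads: (start, length) pairs, 0-based. Returns how many reads cover each base."""
    depth = [0] * region_length
    for start, length in reads:
        for pos in range(max(start, 0), min(start + length, region_length)):
            depth[pos] += 1
    return depth

reads = [(0, 6), (2, 6), (4, 6), (6, 6)]  # hypothetical read placements
depth = coverage_depth(12, reads)
print(depth)                    # [1, 1, 2, 2, 3, 3, 3, 3, 2, 2, 1, 1]
print(sum(depth) / len(depth))  # 2.0
```

An average of nine times, as in the project, is the same calculation at a much larger scale: some bases sit above the average and some below it.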
During the Human Genome Project, scientists ran more than 50 million sequencing reactions. Some 2,000 scientists
from more than two dozen labs around the world worked on the project.
Working Draft Sequence: Transcript

Whenever a stretch of DNA that spanned 2,000 or more bases was assembled, it was placed into public databases
within 24 hours. Anyone with access to the Internet could see and analyze the sequence data.
After sequencing the 3 billion letters in the human genome an average of nine times, the Human Genome Project had
released DNA sequence for 99 percent of the genome. This finished sequence was 99.99 percent accurate. The project
had completed all of its goals ahead of schedule and under budget.
Conclusion: Transcript
The Human Genome Project also produced other advances, not expected to be accomplished until much later. These
included an advanced draft of the mouse genome and an initial draft of the rat genome.
Medical researchers did not wait to use data from the Human Genome Project. When the project began in 1990, fewer
than 100 human disease genes had been identified. At the project's conclusion in 2003, the number of identified disease
genes had risen to more than 1,400.
The Human Genome Project focused on a single representative DNA sequence. The next step was to analyze DNA
sequences from different populations. This catalog of human genetic variation was called the HapMap. Completed in
2005, the HapMap used single nucleotide polymorphisms called SNPs to identify large blocks of DNA sequence called
haplotypes that tend to be inherited together. To use the data, researchers compare haplotypes between people with and
without a disease. Haplotypes shared by people with the disease are then examined in detail to look for associated
genes. Already, scientists have used HapMap data to identify a gene associated with age-related macular degeneration, a
disease responsible for blindness among the elderly. It is expected that the HapMap will play an important role in
identifying many more disease genes in the future.
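The comparison of haplotypes between people with and without a disease can be sketched as a difference in frequencies. The haplotype labels and group sizes are invented toy data, and a real study would use far larger groups and a proper statistical test rather than a raw difference.

```python
from collections import Counter

def haplotype_enrichment(cases, controls):
    """Frequency of each haplotype among cases minus its frequency among controls."""
    case_counts, control_counts = Counter(cases), Counter(controls)
    return {
        hap: case_counts[hap] / len(cases) - control_counts[hap] / len(controls)
        for hap in set(case_counts) | set(control_counts)
    }

# Toy data: one haplotype label per person (hypothetical).
cases = ["ACG", "ACG", "ACG", "TCA"]     # people with the disease
controls = ["ACG", "TCA", "TCA", "TCA"]  # people without the disease

scores = haplotype_enrichment(cases, controls)
print(max(scores, key=scores.get))  # ACG
```

A haplotype that is clearly more common among people with the disease marks a region worth examining in detail for associated genes.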
