What Is The Human Genome Project

Human Genome Project
By Shveta Jaishankar, GCM III
102 08 2553
CONTENTS
1. Introduction
2. Brief History
3. NHGRI Management, mission and goals
4. Timeline of milestones in genetics
5. Molecular Basics
6. Study, Sequencing, Conclusion
7. Bioinformatics
8. Participating institutions
9. Goals
10. Publications, Legislations and Sponsors
11. Benefits
12. Ethical, Legal and Social Issues
13. What we’ve learned so far
14. Timeline of Human Genome Project
15. Bibliography
The Human Genome Project (HGP) was one of the great feats of exploration in history - an inward voyage of
discovery rather than an outward exploration of the planet or the cosmos; an international research effort to
sequence and map all of the genes - together known as the genome - of members of our species, Homo
sapiens. Completed in April 2003, the HGP gave us the ability, for the first time, to read nature's
complete genetic blueprint for building a human being.
Begun formally in 1990, the U.S. Human Genome Project was a 13-year effort coordinated by the U.S.
Department of Energy and the National Institutes of Health. The project originally was planned to last 15
years, but rapid technological advances accelerated the completion date to 2003. Project goals were to
• identify all the approximately 20,000-25,000 genes in human DNA,

• determine the sequences of the 3 billion chemical base pairs that make up human DNA,
• store this information in databases,
• improve tools for data analysis,
• transfer related technologies to the private sector, and
• address the ethical, legal, and social issues (ELSI) that may arise from the project.
To help achieve these goals, researchers also studied the genetic makeup of several nonhuman organisms.
These include the common human gut bacterium Escherichia coli, the fruit fly, and the laboratory mouse.
A unique aspect of the U.S. Human Genome Project is that it was the first large scientific
undertaking to address potential ELSI implications arising from project data.
Another important feature of the project was the federal government's long-standing dedication to
the transfer of technology to the private sector. By licensing technologies to private companies and
awarding grants for innovative research, the project catalyzed the multibillion-dollar U.S. biotechnology
industry and fostered the development of new medical applications.
Landmark papers detailing the sequence and analysis of the human genome were published in the February
2001 and April 2003 issues of Nature and Science.
HUMAN GENOME PROJECT

The Human Genome Project (HGP) was the international, collaborative research program whose goal was
the complete mapping and understanding of all the genes of human beings. All our genes together are
known as our "genome."
The HGP was the natural culmination of the history of genetics research. In 1911, Alfred Sturtevant,
then an undergraduate researcher in the laboratory of Thomas Hunt Morgan, realized that he could - and
had to, in order to manage his data - map the locations of the fruit fly (Drosophila melanogaster) genes
whose mutations the Morgan laboratory was tracking over generations. Sturtevant's very first gene map can
be likened to the Wright brothers' first flight at Kitty Hawk. In turn, the Human Genome Project can be
compared to the Apollo program bringing humanity to the moon.
The hereditary material of all multi-cellular organisms is the famous double helix of deoxyribonucleic
acid (DNA), which contains all of our genes. DNA, in turn, is made up of four chemical bases, pairs of which
form the "rungs" of the twisted, ladder-shaped DNA molecules. All genes are made up of stretches of these
four bases, arranged in different ways and in different lengths. HGP researchers have deciphered the human
genome in three major ways: determining the order, or "sequence," of all the bases in our genome's DNA;
making maps that show the locations of genes for major sections of all our chromosomes; and producing
what are called linkage maps, complex versions of the type originated in early Drosophila research, through
which inherited traits (such as those for genetic disease) can be tracked over generations.
The HGP has revealed that there are probably about 20,500 human genes. The completed human sequence
can now identify their locations. This ultimate product of the HGP has given the world a resource of detailed
information about the structure, organization and function of the complete set of human genes. This
information can be thought of as the basic set of inheritable "instructions" for the development and function
of a human being.
The International Human Genome Sequencing Consortium published the first draft of the human
genome in the journal Nature in February 2001 with the sequence of the entire genome's three billion base
pairs some 90 percent complete. A startling finding of this first draft was that the number of human genes
appeared to be significantly fewer than previous estimates, which ranged from 50,000 genes to as many as
140,000. The full sequence was completed and published in April 2003.
Upon publication of the majority of the genome in February 2001, Francis Collins, the director of NHGRI,
noted that the genome could be thought of in terms of a book with multiple uses: "It's a history book - a
narrative of the journey of our species through time. It's a shop manual, with an incredibly detailed
blueprint for building every human cell. And it's a transformative textbook of medicine, with insights that
will give health care providers immense new powers to treat, prevent and cure disease."
The tools created through the HGP also continue to inform efforts to characterize the entire genomes of
several other organisms used extensively in biological research, such as mice, fruit flies and flatworms.
These efforts support each other, because most organisms have many similar, or "homologous," genes with
similar functions. Therefore, the identification of the sequence or function of a gene in a model organism, for
example, the roundworm C. elegans, has the potential to explain a homologous gene in human beings, or in
one of the other model organisms. These ambitious goals required and will continue to demand a variety of
new technologies that have made it possible to relatively rapidly construct a first draft of the human
genome and to continue to refine that draft. These techniques include:
• DNA Sequencing
• The Employment of Restriction Fragment-Length Polymorphisms (RFLP)
• Yeast Artificial Chromosomes (YAC)
• Bacterial Artificial Chromosomes (BAC)
• The Polymerase Chain Reaction (PCR)
• Electrophoresis
Of course, information is only as good as the ability to use it. Therefore, advanced methods for widely
disseminating the information generated by the HGP to scientists, physicians and others are necessary in
order to ensure the most rapid application of research results for the benefit of humanity. Biomedical
technology and research are particular beneficiaries of the HGP.
However, the momentous implications for individuals and society of possessing the detailed genetic
information made possible by the HGP were recognized from the outset. Another major component of the
HGP - and an ongoing component of NHGRI - is therefore devoted to the analysis of the ethical, legal and
social implications (ELSI) of our newfound genetic knowledge, and the subsequent development of policy
options for public consideration.
A BRIEF HISTORY
In February 2001, the Human Genome Project (HGP) published its results to that date: a 90 percent
complete sequence of all three billion base pairs in the human genome. The HGP consortium published its
data in the February 15, 2001, issue of the journal Nature.
The project had its ideological origins in the mid-1980s, but its intellectual roots stretch back further.
Alfred Sturtevant created the first Drosophila gene map in 1911.
The crucial first step in molecular genome analysis, and in much of the molecular biological research
of the last half-century, was the discovery of the double helical structure of the DNA molecule in 1953 by
Francis Crick and James Watson. The two researchers shared the 1962 Nobel Prize (along with Maurice
Wilkins) in the category of "physiology or medicine."
In the mid-1970s, Frederick Sanger developed techniques to sequence DNA, for which he received
his second Nobel Prize in chemistry in 1980. (His first, in 1958, was for studies of protein structure.) With
the automation of DNA sequencing in the 1980s, the idea of analyzing the entire human genome was first
proposed by a few academic biologists.
The United States Department of Energy, seeking data on protecting the genome from the
mutagenic (gene-mutating) effects of radiation, became involved in 1986, and established an early genome
project in 1987.
In 1988, Congress funded both the NIH and the DOE to embark on further exploration of this
concept, and the two government agencies formalized an agreement by signing a Memorandum of
Understanding to "coordinate research and technical activities related to the human genome."
James Watson was appointed to lead the NIH component, which was dubbed the Office of Human Genome
Research. The following year, the Office of Human Genome Research evolved into the National Center for
Human Genome Research (NCHGR).
In 1990, the initial planning stage was completed with the publication of a joint research plan,
"Understanding Our Genetic Inheritance: The Human Genome Project, The First Five Years, FY 1991-1995."
This initial research plan set out specific goals for the first five years of what was then projected to be a 15-
year research effort.
In 1992, Watson resigned, and Michael Gottesman was appointed acting director of the center. The
following year, Francis S. Collins was named director.
The advent and employment of improved research techniques, including the use of restriction fragment-
length polymorphisms, the polymerase chain reaction, bacterial and yeast artificial chromosomes and
pulsed-field gel electrophoresis, enabled rapid early progress. Therefore, the 1990 plan was updated with a
new five-year plan announced in 1993 in the journal Science (262: 43-46; 1993).
Indeed, a large part of the early work of the HGP was devoted to the development of improved technologies
for accelerating the elucidation of the genome. In a 2001 article in the journal Genome Research, Collins
wrote, "Building detailed genetic and physical maps, developing better, cheaper and faster technologies for
handling DNA, and mapping and sequencing the more modest-sized genomes of model organisms were all
critical stepping stones on the path to initiating the large-scale sequencing of the human genome."
Also in 1993, the NCHGR established a Division of Intramural Research (DIR), in which genome technology
is developed and used to study specific diseases. By 1996, eight NIH institutes and centers had also
collaborated to create the Center for Inherited Disease Research (CIDR), for study of the genetics of
complex diseases.
In 1997, the NCHGR received full institute status at NIH, becoming the National Human Genome
Research Institute, with Collins remaining as the director of the new institute. A third five-year plan
was announced in 1998, again in Science, (282: 682-689; 1998).
In June 2000 came the announcement that the majority of the human genome had in fact been sequenced,
which was followed by the publication of 90 percent of the sequence of the genome's three billion base-pairs
in the journal Nature, in February 2001.
On April 14, 2003 the National Human Genome Research Institute (NHGRI), the Department of
Energy (DOE) and their partners in the International Human Genome Sequencing Consortium announced
the successful completion of the Human Genome Project.
Surprises accompanying the sequence publication included: the relatively small number of human genes,
perhaps as few as 30,000; the complex architecture of human proteins compared to their homologs - similar
genes with the same functions - in, for example, roundworms and fruit flies; and the lessons to be taught by
repeat sequences of DNA.
How NHGRI Managed the Human Genome Project
NHGRI was created in 1989 (as the NCHGR) to manage the role of NIH in the HGP and funded research in a variety of areas
related to the project.
The Division of Extramural Research (DER) for NHGRI supported and managed the role of NIH in the
HGP, set the scientific priorities for HGP research and supervised the peer-reviewed research projects that
addressed those research efforts. The extramural research community and the NHGRI National Advisory
Council for Human Genome Research (NACHGR) advise the DER. Major areas of genome-related research
overseen by the DER include: the development of technologies used in gene sequencing and mapping; the
analysis of the functions of the genes and the proteins for which most genes code; computer technologies
for managing and disseminating the enormous amounts of data generated by the HGP; determination of the
crucial differences in the genetic makeup of individual human beings from each other; and examination of
the ethical, legal and social implications of genetic research.
The Completion of the Sequence and Remaining Goals
In 2003, an accurate and complete human genome sequence was finished and made available to scientists
and researchers two years ahead of the original HGP schedule and at a cost less than the original estimated
budget. With the completion of the HGP, the mission of the NHGRI has expanded to include studies aimed at
understanding how the human genome functions in the role of creating gene products, most notably the
many proteins for which genes code.
NHGRI Mission and Goals
In late 2001 through 2002, NHGRI gathered the world's leading genome researchers to discuss and
determine the direction of future research at two large "bookend" meetings called Beyond the Beginning:
The Future of Genomics I and II and held workshops throughout 2002 to discuss specific areas of genomic
research, policy, education and ethics.
The specific ideas and recommendations that arose from these sessions have informed the next stage of
genomic research, resulting in a vision document authored by the leadership at NHGRI: A Vision for the
Future of Genomics Research. The overarching mission of the HGP and the NHGRI, however, remains the
same: the quest to understand the human genome and the role it plays in both health and disease. Francis
Collins called the publication in February 2001 of the majority of the human genome "the end of the
beginning." With the completion of the HGP in April 2003, his words continue to ring true. Writing in a 2001
article for Genome Research titled "Contemplating the End of the Beginning," Collins explained:
Critical understanding of gene expression, the connection between sequence variations and phenotype,
large-scale protein-protein interactions, and a host of other global analyses of human biology can now get
seriously underway. For me, as a physician, the true payoff from the HGP will be the ability to better
diagnose, treat and prevent disease, and most of those benefits to humanity still lie ahead. With these
immense data sets of sequence and variation now in hand, we are now empowered to pursue those goals in
ways undreamed of a few years ago.
Timeline of milestones in genetics
1859: Darwin publishes On the Origin of Species, proposing continual evolution of species
1865: Mendel's Peas
1869: DNA First Isolated
1879: Mitosis Observed
1900: Rediscovery of Mendel's work
1902: Orderly Inheritance of Disease Observed
1902: Chromosome Theory of Heredity
1909: The Word Gene Coined
1911: Fruit Flies Illuminate the Chromosome Theory
1941: One Gene, One Enzyme
1943: X-ray Diffraction of DNA
1944: DNA is "Transforming Principle"
1944: Jumping Genes
1952: Genes are Made of DNA
1953: DNA Double Helix
1955: 46 Human Chromosomes
1955: DNA Copying Enzyme
1956: Cause of Disease Traced to Alteration
1958: Semiconservative Replication of DNA
1959: Chromosome Abnormalities Identified
1961: mRNA Ferries Information
1961: First Screen for Metabolic Defect in Newborns
1966: Genetic Code Cracked
1968: First Restriction Enzymes Described
1972: First Recombinant DNA
1973: First Animal Gene Cloned
1975-77: DNA Sequencing
1976: First Genetic Engineering Company
1977: Introns Discovered
1981-82: First Transgenic Mice and Fruit Flies
1982: GenBank Database Formed
1983: First Disease Gene Mapped
1983: PCR Invented
1986: First Time Gene Positionally Cloned
1987: First Human Genetic Map
1987: YACs Developed
1989: Microsatellites, New Genetic Markers
1989: Sequence-tagged Sites, Another Marker
1990: Launch of the Human Genome Project
1990: ELSI Founded
1990: Research on BACs
1991: ESTs, Fragments of Genes
1992: Second-generation Genetic Map of Human Genome
1992: Data Release Guidelines Established
1993: New HGP Five-year Plan
1994: FLAVR SAVR Tomato
1994: Detailed Human Genetic Map
1994: Microbial Genome Project
1995: Ban on Genetic Discrimination in Workplace
1995: Two Microbial Genomes Sequenced
1995: Physical Map of Human Genome Completed
1996: International Strategy Meeting on Human Genome Sequencing
1996: Mouse Genetic Map Completed
1996: Yeast Genome Sequenced
1996: Archaea Genome Sequenced
1996: Health Insurance Discrimination Banned
1996: 280,000 Expressed Sequence Tags (ESTs)
1996: Human Gene Map Created
1996: Human DNA Sequence Begins
1997: Bermuda Meeting Affirms Principle of Data Release
1997: E. coli Genome Sequenced
1997: Recommendations on Genetic Testing
1998: Private Company Announces Sequencing Plan
1998: M. tuberculosis Bacterium Sequenced
1998: Committee on Genetic Testing
1998: HGP Map Includes 30,000 Human Genes
1998: New HGP Goals for 2003
1998: SNP Initiative Begins
1998: Genome of Roundworm C. elegans Sequenced
1999: Full-scale Human Genome Sequencing
1999: Chromosome 22
2000: Free Access to Genomic Information
2000: Chromosome 21
2000: Working Draft
2000: Drosophila and Arabidopsis genomes sequenced
2000: Executive Order Bans Genetic Discrimination in the Federal Workplace
2000: Yeast Interactome Published
2000: Fly Model of Parkinson's Disease Reported
2001: First Draft of the Human Genome Sequence Released
2001: RNAi Shuts Off Mammalian Genes
2001: FDA Approves Genetics-based Drug to Treat Leukemia
2002: Mouse Genome Sequenced
2002: Researchers Find Genetic Variation Associated with Prostate Cancer
2002: Rice Genome Sequenced
2002: The International HapMap Project is Announced
2002: The Genomes to Life Program is Launched
2002: Researchers Identify Gene Linked to Bipolar Disorder
2003: Human Genome Project Completed
2003: Fiftieth Anniversary of Watson and Crick's Description of the Double Helix
2003: The First National DNA Day Celebrated
2003: ENCODE Program Begins
2003: Premature Aging Gene Identified
2004: Rat and Chicken Genomes Sequenced
2004: FDA Approves First Microarray
2004: Refined Analysis of Complete Human Genome Sequence
2004: Surgeon General Stresses Importance of Family History
2005: Chimpanzee Genomes Sequenced
2005: HapMap Project Completed
2005: Trypanosomatid Genomes Sequenced
2005: Dog Genomes Sequenced
2006: The Cancer Genome Atlas (TCGA) Project Started
2006: Second Non-human Primate Genome is Sequenced
2006: Initiatives to Establish the Genetic and Environmental Causes of Common Diseases Launched
Exploring Our Molecular Selves

The Human Genome Project is a way of exploring our molecular selves.
Almost all of our cells - the muscle cells that let us smile, the brain cells that perceive the humor in things,
the cells of our eyes that take it all in - contain a complete set of all our genes, the genome.
If we could journey inside ourselves, into a cell, we would see 23 pairs of chromosomes packed into
a nucleus.
Each chromosome contains a long coil of DNA. If all the chromosomes were unwound, the DNA in
just one of our cells would stretch 6 feet long. The DNA double helix contains four kinds of building blocks,
A, T, C and G: an A always pairs with a T, a C with a G.
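The pairing rule is simple enough to sketch in a few lines of Python. This is only an illustration: the function name complement_strand and the sample sequence are invented for the example and do not come from the project.

```python
# Pairing rule from the text: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement_strand(strand):
    """Return the base that pairs with each base of `strand`, position by position."""
    return "".join(PAIRS[base] for base in strand)

# Made-up example sequence.
print(complement_strand("ATCGGA"))  # TAGCCT
```

Because the rule is symmetric, applying it twice gives back the original strand, which is why either strand of the helix is enough to rebuild the other.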
DNA contains information to make every part of our bodies with its 4-letter language. Each of our
thousands of genes codes for a specific part. RNA polymerase copies the information in a gene into a
messenger molecule, messenger RNA.
The building blocks of messenger RNA and DNA are called bases. The bases on one strand of the
DNA specify the order of bases on the new strand of messenger RNA.
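As a rough sketch of that copying step, the Python below builds a messenger RNA from a DNA template strand. Two things here are not stated in the text above: RNA uses the base U (uracil) wherever DNA would use T, and the sketch ignores strand direction. The function name transcribe and the sample sequence are made up.

```python
# Template DNA base -> messenger RNA base.
# RNA uses U (uracil) in place of T; this fact is added here, not taken from the text.
DNA_TO_MRNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template_strand):
    """Build the mRNA whose bases pair with the DNA template strand (direction ignored)."""
    return "".join(DNA_TO_MRNA[base] for base in template_strand)

# Made-up template sequence.
print(transcribe("TACGGCATT"))  # AUGCCGUAA
```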
The DNA always stays inside the nucleus, but messenger RNA travels out into the cytoplasm. There,
a protein-making machine called a ribosome can read messenger RNA to make a particular protein.
Every three bases of the messenger RNA molecule code for an amino acid. Proteins are made of
amino acids. tRNA molecules help translate the language of DNA and RNA into the language of proteins.
tRNA molecules bring the right amino acids that the ribosome links together to make a protein.
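The three-bases-at-a-time reading can be sketched as follows. The table holds only a handful of entries from the standard genetic code, chosen for the example (the full code has 64 three-base "words"), and the function name translate and the sample message are assumptions for illustration.

```python
# A small excerpt of the standard genetic code; None marks a "stop" signal.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly",
    "CCG": "Pro", "AAA": "Lys", "UAA": None,
}

def translate(mrna):
    """Read the mRNA three bases at a time and return the amino acids in order."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:  # stop signal: the protein is finished
            break
        protein.append(amino_acid)
    return protein

# Made-up message: four amino acids followed by a stop.
print(translate("AUGUUUGGCCCGUAA"))  # ['Met', 'Phe', 'Gly', 'Pro']
```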
Proteins are the laborers. Some form structures like tendons and hair, others perceive light, scents
and flavors, control chemical reactions and carry messages between cells. To understand our molecular
selves, scientists have read the three billion letters making up the DNA in the human genome.
Different sets of genes, interacting with complex environmental factors, influence things like our looks,
personalities, and risks for diseases like cancer and heart disease.
A growing understanding of our genes and all they do will help us understand the complexity and the
wonder of life.
STUDY
Studying the human genome - the complete set of human genes - is a way of studying fundamental
details about ourselves. The three billion letters of the human genome are written using the four-letter
alphabet of DNA. The DNA is divided among 23 pairs of chromosomes that are found in each of the trillions
of cells in our bodies. In 2003, the Human Genome Project produced a complete representative sequence of
the human genome. Of course, people are not identical, and DNA sequences do differ subtly between
individuals. Currently, a number of separate projects are charting sequence variations found in human
populations.
The representative sequence is a composite from several people who donated blood samples.
Originally, close to 100 people volunteered to give a sample of their blood. Each person provided their
informed consent, affirming that they agreed to the study of their DNA. No names were attached to the
blood samples and ultimately scientists used only a few of them. These measures ensured that the DNA
sequences remained anonymous; not even the donors knew whether their samples were actually used or
not.
The main goal of The Human Genome Project was to read, letter by letter, the three billion bases of
human DNA. Before starting to sequence the human genome, scientists built maps of the chromosomes and
developed and refined techniques for analyzing DNA. With the tools in place, project scientists began large-
scale DNA sequencing in 1999. In just one year, they had amassed sequence data covering more than 80
percent of the genome.
The human genome is a massive text. If the three billion letters (or bases) of the genome were
printed in telephone books, they would require a stack of books nearly as tall as the Washington Monument.
To accurately determine the sequence of every base in the genome, scientists needed to read the three
billion bases not just once, but at least six to ten times. Individual sequencing reactions could only reveal
the order of a few hundred bases of DNA at a time - amounting to a fraction of a page. This meant that to
place in order all of the DNA bases, it was necessary to produce many thousands of overlapping segments of
DNA sequence.
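A back-of-the-envelope calculation shows the scale this implies. The sketch below assumes a read length of 500 bases as a stand-in for "a few hundred"; that figure, and the function name reads_needed, are assumptions for illustration rather than project numbers.

```python
GENOME_BASES = 3_000_000_000  # three billion bases, from the text
READ_LENGTH = 500             # assumed: "a few hundred bases" per reaction

def reads_needed(genome_bases, coverage, read_length):
    """Number of reads needed to cover every base `coverage` times (rounded up)."""
    total_bases = genome_bases * coverage
    return (total_bases + read_length - 1) // read_length

for coverage in (6, 10):
    print(coverage, reads_needed(GENOME_BASES, coverage, READ_LENGTH))
# 6 36000000
# 10 60000000
```

Under these assumptions, reading the genome six to ten times over takes tens of millions of individual sequencing reactions.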
Mapping
To begin the project, researchers built maps of the human genome. They identified thousands of DNA
sequence landmarks that helped them navigate across the chromosomes.
Developing genome maps was necessary preparation for DNA sequencing. These same maps also
served to orient geneticists who were hunting for disease genes.
With enough landmarks in place, project scientists created "libraries" of clones that spanned the
genome. Each clone contained a manageably small fragment of human DNA that was stored in bacteria.
Scientists used the landmarks to tell them what part of the human genome each fragment came from.
This clone-by-clone approach made it possible to double check the location of each DNA sequence. It also
allowed participating laboratories from around the world to carve up the genome and coordinate their work.
Building Libraries
Clone libraries offered the same advantage as real libraries: orderly access to information. In most
clone libraries, the DNA fragments were stored in E. coli. These are bacteria that normally live in our
intestines. Each E. coli cell stored a single segment of human DNA and represented a single book of the
library. Clone libraries allowed each human fragment to be tracked and easily copied.
Subclones
The clone libraries were prepared using bacterial artificial chromosomes, or BACs. Each BAC clone
contained 100,000 to 200,000 bases of DNA sequence. The large BAC clones were used to establish the
order of the DNA sequences. To sequence the DNA, smaller-sized clones were needed. Project scientists cut
the large BAC clones into smaller fragments of about 2,000 bases. These smaller fragments were typically
stored in viruses called phage that can infect E. coli cells.
E. coli to Store and Copy DNA

E. coli cells containing fragments of human DNA, or any other type of DNA, can be stored in freezers
indefinitely. When scientists need to retrieve DNA from the library, they simply revive the cells by bringing
them back up to 37 degrees Centigrade - gut temperature.
The E. coli cells act as copiers, producing many copies of the human DNA sequence that they contain. To
prepare to sequence DNA, a clone of cells containing the same bit of human DNA is released into a rich,
warm broth. The cells are shaken vigorously to provide them with air. This causes them to divide rapidly -
about once every half hour. After incubating for just a single night, one third of a teaspoon of broth contains
billions of E. coli cells and so, billions of copies of the particular fragment of human DNA they contained.
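The power of repeated doubling can be checked with a little arithmetic. The sketch below assumes ideal, uninterrupted division every 30 minutes starting from a single cell; real cultures slow down as nutrients run out, and the function name cells_after and the chosen numbers of hours are illustrative assumptions.

```python
def cells_after(hours, start_cells=1, minutes_per_division=30):
    """Cell count after `hours` of ideal doubling (no slowdown, no cell death)."""
    divisions = (hours * 60) // minutes_per_division
    return start_cells * 2 ** divisions

print(cells_after(12))  # 16777216   (24 doublings)
print(cells_after(15))  # 1073741824 (30 doublings, over a billion)
```

In this idealized picture a single cell passes the one-billion mark after about 15 hours, and every one of its descendants carries a copy of the same human DNA fragment.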
Preparing DNA for Sequencing Reactions

The next morning, the E. coli cells are broken open to release their DNA. The human DNA is separated from
the cell debris and washed clean.
Now there are enough copies of the human DNA fragment to set up a sequencing reaction.
Sequencing Reactions
A DNA sequencing reaction includes four main ingredients: "template" DNA copied by the E. coli;
free bases, the building blocks of DNA that come in 4 types; short pieces of DNA called "primers"; and DNA
polymerase, the enzyme that copies DNA.
The chemical reaction that makes DNA in a test tube is similar to what happens in a living cell: both
rely on DNA polymerase and, in both cases, DNA strands have a head end, which is called the 5' end, and a
tail end, which is called the 3' end. A DNA strand can grow only from its 3' end.
Making DNA in cells and sequencing DNA in test tubes both depend on complementary base pairing. The
building blocks on opposite strands of DNA pair specifically - a C always pairs with a G, and an A always
pairs with a T.
The primer sequence binds to its complementary sequence on the template DNA.
Free bases that match the template sequence can attach to the new strand's growing (3') end.
Among the free bases in the solution are a few that have a fluorescent dye attached to them. When a dye-
bearing base attaches to the growing strand, it stops the new DNA strand from growing any further. A
different colored dye is attached to each of the four kinds of bases.
Products of Sequencing Reactions

A completed sequencing reaction contains an array of colored DNA fragments. The shortest fragments
correspond to the length of the primer plus one dye-colored base. The longest fragments are usually
between 500 and 800 bases long, depending on when the sequencing reaction ran out of steam.
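The mix of fragments a reaction produces can be pictured with a small sketch. It lists one fragment for every position at which a dye-bearing base could stop the growing strand, recording only each fragment's length and its final, dyed base. Strand direction is ignored, and the function name termination_fragments and the sample sequences are invented for illustration.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def termination_fragments(primer, template_after_primer):
    """Return (fragment length, dyed last base) for each possible stopping point."""
    # The new strand is built from bases that pair with the template.
    new_strand = [PAIRS[base] for base in template_after_primer]
    fragments = []
    for n in range(1, len(new_strand) + 1):
        # A dyed base at position n ends the fragment: primer plus n new bases.
        fragments.append((len(primer) + n, new_strand[n - 1]))
    return fragments

# Made-up two-base primer and three-base stretch of template.
print(termination_fragments("GG", "TAC"))  # [(3, 'A'), (4, 'T'), (5, 'G')]
```

The shortest fragment in the output is the primer plus one dyed base, as described above.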
The products of sequencing reactions are fed into an automated sequencing machine. Automated
sequencers have become increasingly sophisticated during the past decade. They can run more samples,
process them more quickly, and are easier to operate.
Separating the Sequencing Products
The DNA molecules produced during the sequencing reaction are separated from each other by a process
called electrophoresis. DNA molecules are negatively charged. The sequencing machine sets up an electric
field; all the DNA moves through a porous gel toward the positive electrode. The gel acts like a sieve;
shorter DNA fragments move more quickly through the holes of the gel than do larger DNA fragments.
Reading the Sequencing Products

As each DNA fragment reaches the end of the gel, a laser excites its fluorescent dye. A camera detects the
color of the emitted light and passes that information to a computer. One by one, the machine records the
colors of the DNA fragments that pass through the gel.
A single sequencing reaction can reveal the order of several hundred DNA bases.
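Separating by size and then reading colors amounts to sorting the fragments by length and noting the dye on each. In the sketch below the color assigned to each base is a hypothetical choice made for the example, as are the function name read_sequence and the sample fragments.

```python
# Hypothetical dye colors; real instruments use their own dye sets.
DYE_TO_BASE = {"green": "A", "red": "T", "blue": "C", "yellow": "G"}

def read_sequence(fragments):
    """Order (length, dye color) fragments shortest first and read off the bases."""
    by_length = sorted(fragments, key=lambda fragment: fragment[0])
    return "".join(DYE_TO_BASE[color] for _, color in by_length)

# Fragments arrive mixed together; shorter ones reach the end of the gel first.
mixed = [(5, "yellow"), (3, "green"), (4, "red")]
print(read_sequence(mixed))  # ATG
```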
Assembling the Results

A computer program integrates the data from individual sequencing reactions. It can spot where DNA
fragments overlap and order them as they originally were on the chromosome.
Many overlapping sequence reads are needed to generate the uninterrupted sequence of the
original stretch of DNA. During the Human Genome Project, every base pair of DNA was sequenced an
average of nine times. Some stretches of DNA were easy to read and needed to be sequenced a little less
often, while other stretches were more difficult to read and had to be sequenced more often.
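The idea of spotting overlaps and joining reads can be sketched with a simple greedy merge. This is a toy version for illustration only, not the software the project used; real assemblers must also cope with read errors and repeated sequences. The function names, the minimum overlap of three bases and the sample reads are all assumptions.

```python
def overlap(a, b, min_len=3):
    """Length of the longest end of `a` that matches the start of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def assemble(reads):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing left that overlaps
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

# Three made-up reads that overlap end to end.
print(assemble(["ATGCGT", "CGTTAC", "TACGGA"]))  # ['ATGCGTTACGGA']
```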
During the Human Genome Project, scientists ran more than 50 million sequencing reactions. Some
2,000 scientists from more than two dozen labs around the world worked on the project.
Working Draft Sequence

Whenever a stretch of DNA that spanned 2,000 or more bases was assembled, it was placed into public
databases within 24 hours. Anyone with access to the Internet could see and analyze the sequence data.
After sequencing the 3 billion letters in the human genome an average of nine times, the Human Genome
Project had released DNA sequence for 99 percent of the genome. This finished sequence was 99.99 percent
accurate. The project had completed all of its goals ahead of schedule and under budget.
Conclusion
The Human Genome Project also produced other advances, not expected to be accomplished until much
later. These included an advanced draft of the mouse genome and an initial draft of the rat genome.
Medical researchers did not wait to use data from the Human Genome Project. When the project began in
1990, fewer than 100 human disease genes had been identified. At the project's conclusion in 2003, the
number of identified disease genes had risen to more than 1,400.
The Human Genome Project focused on the DNA sequence of an individual. The next step was to analyze
DNA sequences from different populations. This catalog of human genetic variation was called the HapMap.
Completed in 2005, the HapMap used single nucleotide polymorphisms called SNPs to identify large blocks
of DNA sequence called haplotypes that tend to be inherited together. To use the data, researchers compare
haplotypes between people with and without a disease. Haplotypes shared by people with the disease are
then examined in detail to look for associated genes. Already, scientists have used its data to identify a
gene associated with age-related macular degeneration, a disease responsible for blindness among the
elderly. It is expected that the HapMap will play an important role in identifying many more disease genes in
the future.
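The comparison step can be illustrated with a toy calculation. The haplotypes and the two tiny groups below are invented purely for the example; real studies involve many people and use formal statistical tests. The function names are also assumptions.

```python
from collections import Counter

def haplotype_frequencies(haplotypes):
    """Fraction of the group carrying each haplotype."""
    counts = Counter(haplotypes)
    total = len(haplotypes)
    return {hap: count / total for hap, count in counts.items()}

def enriched_in_cases(cases, controls):
    """Haplotypes seen in cases, most over-represented relative to controls first."""
    case_freq = haplotype_frequencies(cases)
    control_freq = haplotype_frequencies(controls)
    excess = {hap: freq - control_freq.get(hap, 0.0) for hap, freq in case_freq.items()}
    return sorted(excess, key=excess.get, reverse=True)

# Invented data: people with the disease (cases) and without it (controls).
cases = ["ACG", "ACG", "ACG", "TTG"]
controls = ["ACG", "TTG", "TTG", "TTG"]
print(enriched_in_cases(cases, controls))  # ['ACG', 'TTG']
```

A haplotype that ranks first in such a comparison is a candidate region to examine in detail for associated genes.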
BIOINFORMATICS
When the Human Genome Project was begun in 1990 it was understood that to meet the project's goals, the
speed of DNA sequencing would have to increase and the cost would have to come down. Over the life of
the project virtually every aspect of DNA sequencing was improved. It took the project approximately four
years to sequence its first one billion bases but just four months to sequence the second billion bases.
During the month of January, 2003, 1.5 billion bases were sequenced. As the speed of DNA sequencing
increased, the cost decreased from 10 dollars per base in 1990 to 10 cents per base at the conclusion of the
project in April 2003. Although the Human Genome Project is officially over, improvements in DNA
sequencing continue to be made. Researchers are experimenting with new methods for sequencing DNA
that have the potential to sequence a human genome in just a matter of weeks for a few thousand dollars.
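To put the per-base prices quoted above in perspective, the sketch below multiplies each price by the roughly 3 billion bases of the genome. It is illustrative arithmetic only: it assumes a single pass over the genome at a flat price (the project actually read each base about nine times), and the function name is our own.

```python
GENOME_SIZE_BASES = 3_000_000_000  # approximate genome size quoted in the text

def single_pass_cost(cost_per_base_dollars: float,
                     bases: int = GENOME_SIZE_BASES) -> float:
    """Cost of reading every base once at a flat per-base price."""
    return cost_per_base_dollars * bases

cost_1990 = single_pass_cost(10.00)  # 10 dollars per base
cost_2003 = single_pass_cost(0.10)   # 10 cents per base

print(f"At the 1990 price: ${cost_1990:,.0f}")
print(f"At the 2003 price: ${cost_2003:,.0f}")
print(f"Price reduction:   {cost_1990 / cost_2003:.0f}-fold")
```

The drop from 10 dollars to 10 cents per base is a hundredfold reduction, whatever the number of passes.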
DNA sequencing performed on an industrial scale has produced a vast amount
of data to analyze. In August 2005 it was announced that the three largest
public collections of DNA and RNA sequences together store one hundred
billion bases, representing over 165,000 different organisms. As sequence
data began to pile up, the need for new and better methods of sequence
analysis was critical.
Bioinformatics is the branch of biology that is concerned with the acquisition,
storage, and analysis of the information found in nucleic acid and protein
sequence data. Computers and bioinformatics software are the tools of the trade.
Genetic data represent a treasure trove for researchers and companies interested in how genes contribute
to our health and well being. Almost half of the genes identified by the Human Genome Project have no
known function. Researchers are using bioinformatics to identify genes, establish their functions, and
develop gene-based strategies for preventing, diagnosing, and treating disease.
A DNA sequencing reaction produces a sequence that is several hundred bases long. Gene sequences
typically run for thousands of bases. The largest known gene is that associated with Duchenne muscular
dystrophy. It is approximately 2.4 million bases in length. In order to study genes, scientists first assemble
long DNA sequences from series of shorter overlapping sequences.
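The idea of building a long sequence from shorter overlapping ones can be sketched as follows. This is a deliberately simplified illustration, not the method used by the project: it assumes the reads are error-free, already listed in left-to-right order, and all from the same strand. The reads and function names are hypothetical.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0

def merge_reads(reads: list[str], min_overlap: int = 3) -> str:
    """Chain reads that are already in left-to-right order into one sequence."""
    assembled = reads[0]
    for read in reads[1:]:
        size = overlap_length(assembled, read, min_overlap)
        if size == 0:
            raise ValueError(f"read {read!r} does not overlap the assembled sequence")
        assembled += read[size:]  # append only the part that is new
    return assembled

# Hypothetical short reads; real reads are several hundred bases long.
reads = ["ATGGCGTAC", "GTACCTTGA", "TTGACCGTAA"]
print(merge_reads(reads))
```

Real assembly software must also cope with sequencing errors, repeated regions, and reads of unknown order and orientation.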
Scientists enter their assembled sequences into genetic databases so that other scientists may use the data.
Since the sequences of the two DNA strands are complementary, it is only necessary to enter the sequence
of one DNA strand into a database. By selecting an appropriate computer program, scientists can use
sequence data to look for genes, get clues to gene functions, examine genetic variation, and explore
evolutionary relationships. Bioinformatics is a young and dynamic science. New bioinformatic software is
being developed while existing software is continually updated.
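Because the two strands are complementary, the second strand can always be recomputed from the first. The minimal Python sketch below shows one way to do this; the function name is our own, and the example sequence is arbitrary.

```python
# Base-pairing rules: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(strand: str) -> str:
    """Return the opposite strand, written in its own 5' to 3' direction."""
    return strand.upper().translate(COMPLEMENT)[::-1]

stored_strand = "ATGGCGTACC"
other_strand = reverse_complement(stored_strand)
print(stored_strand, other_strand)
```

Applying the function twice returns the original sequence, which is why storing one strand loses no information.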
Finding Genes
Figure 1: One of the most important aspects of bioinformatics is identifying genes within a long DNA
sequence. Until the development of bioinformatics, the only way to locate genes along the chromosome was
to study their behavior in the organism (in vivo) or isolate the DNA and study it in a test tube (in vitro).
Bioinformatics allows scientists to make educated guesses about where genes are located simply by
analyzing sequence data using a computer (in silico). In principle, locating genes should be easy. DNA
sequences that code for proteins begin with the three bases ATG that code for the amino acid methionine
and they end with one or more stop codons; either TAA, TAG or TGA. Unfortunately, finding genes isn't
always so easy.
Figure 1: DNA Sequences - three bases and stop codons
Figure 2: Let's consider a DNA sequence that contains a gene of
interest. The DNA strand that codes for the protein is called the
sense strand because its sequence reads the same as that of the
messenger RNA. The other strand is called the antisense strand and
serves as the template for RNA polymerase during transcription.
Figure 2: Sense Strand / Antisense Strand
Figure 3: A gene begins with a codon for the amino acid methionine and ends with one of three stop codons.
The codons between the start and stop signals code for the various amino acids of the gene product but do
not include any of the three stop codons. When examining an unknown DNA sequence, one indication that it
may be part of a gene is the presence of an open reading frame or ORF. An ORF is any stretch of DNA that
when transcribed into RNA has no stop codon.
Figure 3: Open Reading Frame
Figure 4: A computer program can be used to check an unknown DNA sequence for ORFs. The program
transcribes each DNA strand into its complementary RNA sequence and then translates the RNA sequence
into an amino acid sequence. Each DNA strand can be read in three different reading frames. This means
that the computer must perform six different translations for any given double-stranded DNA sequence.
Figure 4: Three different reading frames
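The six-frame search described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the program used by any real gene finder: it scans DNA codons directly instead of first transcribing to RNA (the two are equivalent apart from T versus U), it treats an ORF as a run from ATG to the first in-frame stop codon, and all names and the example sequence are hypothetical.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(strand: str) -> str:
    """Opposite strand, written in its own 5' to 3' direction."""
    return strand.translate(COMPLEMENT)[::-1]

def orfs_in_frame(strand: str, frame: int, min_codons: int = 2) -> list[str]:
    """ORFs (ATG ... stop) found when reading `strand` from offset `frame`."""
    found = []
    start = None
    for pos in range(frame, len(strand) - 2, 3):
        codon = strand[pos:pos + 3]
        if start is None and codon == START_CODON:
            start = pos                      # first ATG opens a candidate ORF
        elif start is not None and codon in STOP_CODONS:
            orf = strand[start:pos + 3]      # include the stop codon
            if len(orf) // 3 >= min_codons:
                found.append(orf)
            start = None
    return found

def six_frame_orfs(sense: str, min_codons: int = 2) -> dict[str, list[str]]:
    """Scan three frames on each strand: '+1'..'+3' and '-1'..'-3'."""
    sense = sense.upper()
    results = {}
    for label, strand in (("+", sense), ("-", reverse_complement(sense))):
        for frame in range(3):
            results[f"{label}{frame + 1}"] = orfs_in_frame(strand, frame, min_codons)
    return results

for frame_name, orfs in six_frame_orfs("CCATGGCTTAAGG").items():
    print(frame_name, orfs)
```

Taking the reverse complement of the input is what lets the same left-to-right scan read the second strand in its own 5' to 3' direction.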
Figure 5: The presence of an ORF doesn't guarantee that the DNA sequence is part of a gene. We expect
that, just by chance, there will be some long stretches of DNA that do not contain stop codons yet are not
parts of genes. Likewise, codons for methionine do not always mark the start of a gene sequence.
Methionine codons are also found within genes. Nevertheless, searching for ORFs identifies regions of the
DNA sequence that might be parts of genes.
Figure 5: Regions of DNA sequence that might be part of genes
Figure 6: A single RNA or DNA strand has a phosphate group at one
end and a sugar (ribose for RNA and deoxyribose for DNA) at the
other end. The end of the strand with the phosphate group is called
the 5' end and the opposite end with the sugar is called the 3' end.
In the double helix, the two strands run in opposite directions. That
is, one strand runs in the 5' to 3' direction while the complementary
strand runs in the 3' to 5' direction.
Figure 6: Strands with 5' and 3'
Figure 7: The enzymes and ribosomes that carry out protein
synthesis only work in one direction. During transcription, the mRNA
is made in the 5' to 3' direction. During translation, the mRNA is
read in the 5' to 3' direction. This means that a computer program
looking for ORFs also must read each DNA strand in the 5' to 3'
direction.
Figure 7: Transcription and Translation
Figure 8: It is easier to locate genes in bacterial DNA than in eukaryotic DNA. In bacteria, the genes are
arranged like beads on a string. Each gene consists of a single ORF. The situation in eukaryotic organisms
is complicated by the split nature of the genes. Most eukaryotic genes take the form of alternating exons
and introns. Each exon is an ORF that codes for amino acids. The intron sequences do not code for amino
acids and contain internal stop codons.
Figure 8: Exons and Introns
Figure 9: One of the surprises of the Human Genome Project was
the relatively small number of genes found - about 25,000. One
might ask, "How can something as complicated as a human have
only 25 percent more genes than the tiny roundworm C. elegans?"
Part of the answer seems to involve alternative splicing. Alternative
splicing refers to the process by which a given gene is spliced into
more than one type of mRNA molecule.
Figure 9: Alternative Splicing
ORFs are just one feature that a computer program looks for when locating potential genes. Genes are also
characterized by specific control sequences that are recognized by enzymes involved with transcription and
translation. When a computer program finds a DNA sequence that satisfies all of these gene features (an
ORF plus the appropriate control sequences), it identifies the sequence as likely coming from a gene.
However, only testing the DNA sequence in the laboratory can prove that the gene is active in an organism.
Finding Functions
Once a nucleic acid or amino acid sequence has been assembled, bioinformatic analysis can be used to
determine if the sequence is similar to that of a known gene. This is where sequences from model
organisms are helpful. For example, let’s say we have an unknown human DNA sequence that is associated
with the disease cystic fibrosis. A bioinformatic analysis finds a similar sequence from mouse that is
associated with a gene that codes for a membrane protein that regulates salt balance. It is a good bet that
the human sequence also is part of a gene that codes for a membrane protein that regulates salt balance.
Determining the similarity of two sequences is not easy. For example, it was recently reported that the
genomes of humans and chimpanzees are 96 percent similar. What does this really mean?
Consider the following two sequences:

Each sequence consists of 20 bases. There is just one base difference between them. Because the two
sequences match at 19 out of 20 bases, we can say that the two sequences are 95 percent the same.
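The position-by-position comparison described above is simple to express in Python. The two 20-base sequences below are hypothetical stand-ins chosen to differ at a single base; they are not the sequences from the original figure, and the function name is our own.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

# Hypothetical example sequences: identical except for the final base.
sequence_1 = "ATGGCGTACCTTGACCGTAA"
sequence_2 = "ATGGCGTACCTTGACCGTAT"
print(percent_identity(sequence_1, sequence_2))
```

This measure only counts substitutions at fixed positions, so it says nothing useful when one sequence has gained or lost a base.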
Now consider the following two DNA sequences:
This time, 16 out of 20 bases match. We can say that the two sequences are 80 percent the same. Careful
inspection however reveals another sort of similarity between Sequences 3 and 4.
If we align the sequences like this . . .
We see that the 2 sequences differ by just a missing base in Sequence 4 (or an added base to Sequence 3).
Does the deletion (or insertion) of a single base equal four base substitutions as suggested in this example?
There is no simple answer to that question. When comparing sequences, we must be concerned not only
with the quantity of the differences but the quality as well.
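One common way to account for a missing or added base is to let the comparison insert gaps. The sketch below uses Needleman-Wunsch global alignment, a standard textbook method that we are swapping in for illustration; it is not necessarily what any particular tool uses. The scoring values (match +1, mismatch -1, gap -2) and the example sequences are arbitrary assumptions.

```python
def global_align(seq_a: str, seq_b: str,
                 match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch alignment; returns (aligned_a, aligned_b, score)."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair_score(i: int, j: int) -> int:
        return match if seq_a[i - 1] == seq_b[j - 1] else mismatch

    # Fill the table: best score for aligning each pair of prefixes.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one best alignment.
    aligned_a, aligned_b = [], []
    i, j = len(seq_a), len(seq_b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            aligned_a.append(seq_a[i - 1]); aligned_b.append(seq_b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(seq_a[i - 1]); aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-"); aligned_b.append(seq_b[j - 1])
            j -= 1
    return "".join(reversed(aligned_a)), "".join(reversed(aligned_b)), score[-1][-1]

# Hypothetical sequences: the second is the first with one base removed.
sequence_3 = "ATGGCGTACCTTGA"
sequence_4 = "ATGGGTACCTTGA"
top, bottom, total = global_align(sequence_3, sequence_4)
print(top)
print(bottom)
print("score:", total)
```

How heavily a gap is penalized relative to a substitution is exactly the "quality versus quantity" judgment raised above, and changing those numbers can change which alignment is preferred.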
Scientists have written computer programs that can be used to see if a particular DNA sequence is similar to
any others that are stored in a sequence database. One of the most popular such programs is called BLAST
(Basic Local Alignment Search Tool). Using this program is somewhat like using a search engine on the
Internet. The user provides the program with a biological sequence (when using BLAST) or a subject (when
using a search engine). In each case, the program compares the input information to the information found
in the database. The results are given with the most closely matching items (or sequences) listed first,
followed by items (or sequences) that match less well.
Let’s look at an example of a BLAST search. The input sequence that is being compared to others in the
database is called the query sequence. In our example, the query is the short human DNA sequence listed
below.
Once the query sequence is submitted, the BLAST program compares it, one at a time, to every sequence
in its database. Typically, the search results are displayed so that the query sequence is shown at the top
and the matching sequences are listed below it. The listed sequence “hits” also may include links to relevant
bibliographic information. The results from this search are shown below.
BLAST Search Terminology:
Sequence ID: A unique number used to identify the DNA sequence.

Description: Describes the species from which the sequence comes and the gene it is associated with (if
any).
Query: Indicates how many bases are in the input (test) sequence.
Match: The amount of shading on each graphic indicates how well the query sequence matches the hit (or
subject) sequence. Note, the shading does not compare the similarities between the whole genomes.
Expected (E) Value: Result of a mathematical calculation that describes the significance of a match. The
lower the E value (closer to “0”), the better the match. An E value of less than 10^-6 is a biologically
significant match.
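For readers curious about the calculation behind the E value, the usual statistical form is E = K * m * n * e^(-lambda * S), where m is the query length, n is the database length, and S is the alignment score. The Python sketch below evaluates that formula. The values of K and lambda depend on the scoring scheme; the defaults used here are placeholders chosen for illustration and are not BLAST's actual parameters.

```python
import math

def expected_value(raw_score: float, query_length: int, database_length: int,
                   k: float = 0.1, lam: float = 1.0) -> float:
    """Expected number of chance hits scoring at least `raw_score`.

    k and lam are placeholder statistical parameters (assumptions).
    """
    return k * query_length * database_length * math.exp(-lam * raw_score)

query_length = 50              # hypothetical query of 50 bases
database_length = 1_000_000_000  # hypothetical database of one billion bases

for score in (20, 30, 40):
    e_value = expected_value(score, query_length, database_length)
    print(f"score {score}: E = {e_value:.3g}")
```

The formula shows why higher-scoring matches have lower E values, and why the same match is less significant in a larger database.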
The BLAST program compares a single input sequence, one at a time, to others in a sequence database. The
results can provide clues as to the identity and function of the input sequence. Sometimes you may want to
compare a number of different sequences, all at the same time to see where they are alike and where they
are different. The CLUSTAL program was developed to produce such multiple alignments. CLUSTAL gets its
name because it deals with clusters of sequences.
CLUSTAL alignments are sometimes used by scientists examining genetic variation within a population. For
example, once a gene has been associated with a disease, scientists can use CLUSTAL to examine how the
gene sequence varies among people with and without the disease. The example below shows a CLUSTAL
alignment of DNA sequences from a portion of the gene associated
with cystic fibrosis. The person affected by the disease is seen to be missing a three-base DNA sequence.
Multiple sequence alignments are also useful to scientists investigating the evolutionary relationships among
species. For example, the CLUSTAL program can be used to align a series of related sequences from
different species. Once the program has produced the best alignment for the sequences, another program
can calculate the evolutionary relationships between them. These data can be used to construct a tree
diagram showing the evolutionary relationships for that sequence among the various species.
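As a rough sketch of the step that follows a multiple alignment, the Python below computes the fraction of differing positions between each pair of aligned sequences and reports the closest pair. Tree-building programs start from distances like these, but this is not a tree-building algorithm itself. The species labels and aligned sequences are hypothetical.

```python
from itertools import combinations

def p_distance(aligned_a: str, aligned_b: str) -> float:
    """Fraction of differing positions, ignoring columns that contain a gap."""
    compared = [(a, b) for a, b in zip(aligned_a, aligned_b)
                if a != "-" and b != "-"]
    if not compared:
        raise ValueError("no gap-free columns to compare")
    return sum(a != b for a, b in compared) / len(compared)

def closest_pair(alignment: dict[str, str]) -> tuple[str, str]:
    """Names of the two sequences separated by the smallest distance."""
    return min(combinations(sorted(alignment), 2),
               key=lambda pair: p_distance(alignment[pair[0]], alignment[pair[1]]))

# Hypothetical aligned sequences of equal length.
alignment = {
    "species_a": "ATGGCGTACC",
    "species_b": "ATGGCGTACT",
    "species_c": "ATGACGTTCT",
}
for first, second in combinations(sorted(alignment), 2):
    print(first, second, p_distance(alignment[first], alignment[second]))
print("closest:", closest_pair(alignment))
```

In a tree diagram built from these distances, the closest pair would be joined first, with more distant sequences branching off earlier.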
International Human Genome Sequencing Consortium (Participating institutions)
1. Whitehead Institute/MIT Center for Genome Research, Cambridge, Mass., U.S.

2. The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton,
Cambridgeshire, U.K.
3. Washington University School of Medicine Genome Sequencing Center, St. Louis, Mo., U.S.
4. U.S. Department of Energy Joint Genome Institute, Walnut Creek, Calif., U.S.
5. Baylor College of Medicine Human Genome Sequencing Center, Department of Molecular and Human
Genetics, Houston, Tex., U.S.
6. RIKEN Genomic Sciences Center, Yokohama, Japan
7. Genoscope and CNRS UMR-8030, Evry, France
8. GTC Sequencing Center, Genome Therapeutics Corporation, Waltham, Mass., U.S.
9. Department of Genome Analysis, Institute of Molecular Biotechnology, Jena, Germany
10. Beijing Genomics Institute/Human Genome Center, Institute of Genetics, Chinese Academy of
Sciences, Beijing, China
11. Multimegabase Sequencing Center, The Institute for Systems Biology, Seattle, Wash., U.S.
12. Stanford Genome Technology Center, Stanford, Calif., U.S.
13. Stanford Human Genome Center and Department of Genetics, Stanford University School of
Medicine, Stanford, Calif., U.S.
14. University of Washington Genome Center, Seattle, Wash., U.S.
15. Department of Molecular Biology, Keio University School of Medicine, Tokyo, Japan
16. University of Texas Southwestern Medical Center at Dallas, Dallas, Texas, U.S.*
17. University of Oklahoma's Advanced Center for Genome Technology, Dept. of Chemistry and
Biochemistry, University of Oklahoma, Norman, Okla., U.S.
18. Max Planck Institute for Molecular Genetics, Berlin, Germany
19. Cold Spring Harbor Laboratory, Lita Annenberg Hazen Genome Center, Cold Spring Harbor, N.Y.,
U.S.
20. GBF - German Research Centre for Biotechnology, Braunschweig, Germany
*Sequencing center is no longer in operation.
GOALS
The completion of the human DNA sequence in the spring of 2003 coincided with the 50th anniversary of
Watson and Crick's description of the fundamental structure of DNA. The analytical power arising from the
reference DNA sequences of entire genomes and other genomics resources has jump-started what some call
the "biology century."
The Human Genome Project was marked by accelerated progress. In June 2000, the rough draft of
the human genome was completed a year ahead of schedule. In February 2001, the working draft was
completed, and special issues of Science and Nature containing the working draft sequence and analysis
were published. Additional papers were published in April 2003 when the project was completed.
The project's first 5-year plan, intended to guide research in FYs 1990-1995, was revised in 1993
due to unexpected progress, and the second plan outlined goals through FY 1998. The third and final plan
[Science, 23 October 1998] was developed during a series of DOE and NIH workshops. Some 18 countries
have participated in the worldwide effort, with significant contributions from the Sanger Center in the United
Kingdom and research centers in Germany, France, and Japan.
Human Genome Project Goals and Completion Dates
Area: Genetic Map
Goal: 2- to 5-cM resolution map (600 - 1,500 markers)
Achieved: 1-cM resolution map (3,000 markers)
Date: September 1994

Area: Physical Map
Goal: 30,000 STSs
Achieved: 52,000 STSs
Date: October 1998

Area: DNA Sequence
Goal: 95% of gene-containing part of human sequence finished to 99.99% accuracy
Achieved: 99% of gene-containing part of human sequence finished to 99.99% accuracy
Date: April 2003

Area: Capacity and Cost of Finished Sequence
Goal: Sequence 500 Mb/year at < $0.25 per finished base
Achieved: Sequence > 1,400 Mb/year at < $0.09 per finished base
Date: November 2002

Area: Human Sequence Variation
Goal: 100,000 mapped human SNPs
Achieved: 3.7 million mapped human SNPs
Date: February 2003

Area: Gene Identification
Goal: Full-length human cDNAs
Achieved: 15,000 full-length human cDNAs
Date: March 2003

Area: Model Organisms
Goal: Complete genome sequences of E. coli, S. cerevisiae, C. elegans, D. melanogaster
Achieved: Finished genome sequences of E. coli, S. cerevisiae, C. elegans, D. melanogaster, plus
whole-genome drafts of several others, including C. briggsae, D. pseudoobscura, mouse and rat
Date: April 2003

Area: Functional Analysis
Goal: Develop genomic-scale technologies
Achieved (with dates):
High-throughput oligonucleotide synthesis (1994)
DNA microarrays (1996)
Eukaryotic, whole-genome knockouts (yeast) (1999)
Scale-up of two-hybrid system for protein-protein interaction (2002)
Source: Science 300, 286 (2003)
Key Definitions
cDNA: cDNA stands for complementary DNA, a synthetic type of DNA generated from messenger RNA, or
mRNA, the molecule in the cell that takes information from protein-coding DNA - the genes - to the protein-
making machinery and instructs it to make a specific protein. By using mRNA as a template, scientists use
enzymatic reactions to convert its information back into cDNA and then clone it, creating a collection of
cDNAs, or a cDNA library. These libraries are important to scientists because they consist of clones of all
protein-encoding DNA, or all of the genes, in the human genome.
cM: cM stands for centiMorgan, a unit of genetic distance. Generally, one centiMorgan equals about 1
million base pairs.
Eukaryotic: A eukaryote is a single-celled or multicellular organism whose cells contain a distinct
membrane-bound nucleus.
Mb: Mb stands for megabase, a unit of length equal to 1 million base pairs and roughly equal to 1 cM.
Microarray: Microarrays are devices used in many types of large-scale genetic analysis. They can be used
to study how large numbers of genes are expressed as messenger RNA in a particular tissue, and how a
cell's regulatory networks control vast batteries of genes simultaneously. In microarray studies, a robot is
used to precisely apply tiny droplets containing functional DNA to glass slides. Researchers then attach
fluorescent labels to complementary DNA (cDNA) from the tissue they are studying. The labeled cDNA binds
to its matched DNA sequence at a specific location on the slide. The slides are put into a scanning
microscope that can measure the brightness of each fluorescent dot. The brightness reveals how much of a
specific cDNA fragment is present, an indicator of how active a gene is.
Scientists use microarrays in many different ways. For example, microarrays can be used to look at which
genes in cells are actively making products under a specific set of conditions, as well as to detect and/or
examine differences in gene activity between healthy and diseased cells.
Oligonucleotide: A short polymer of 10 to 70 nucleotides. A nucleotide is one of the structural
components, or building blocks, of DNA and RNA. A nucleotide consists of a base chemical - either adenine
(A), thymine (T), guanine (G) or cytosine (C) - plus a sugar-phosphate backbone. Oligonucleotides are often
used as probes for detecting complementary DNA or RNA because they bind readily to their complements.
SNP: SNP stands for single nucleotide polymorphism. SNPs - pronounced "snips" - are common, but
minute, variations that occur in the human genome at a frequency of one in every 300 bases. That means
10 million positions out of the 3 billion base-pair human genome have common variations. These variations
can be used to track inheritance in families and susceptibility to disease, so scientists are working hard to
develop a catalogue of SNPs as a tool to use in their efforts to uncover the causes of common illnesses like
diabetes or heart disease.
STS: STS stands for sequence tagged site, a short DNA segment that occurs only once in a genome and
whose exact location and order of bases is known. Because each is unique, STSs are helpful in chromosome
placement of mapping and sequencing data from many different laboratories. STSs serve as landmarks on
the physical map of a genome.
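Two of the definitions above involve simple arithmetic that can be checked directly: the SNP frequency of one in every 300 bases, and the rule of thumb that one centiMorgan corresponds to about 1 million base pairs. The Python sketch below works through both. The constant and function names are our own, and the cM conversion is only the rough average stated in the definitions.

```python
GENOME_SIZE_BASES = 3_000_000_000  # approximate genome size quoted in the text
BASES_PER_SNP = 300                # one common variation per 300 bases
BASES_PER_CENTIMORGAN = 1_000_000  # rough average: 1 cM is about 1 Mb

def expected_snp_count(genome_size: int = GENOME_SIZE_BASES,
                       bases_per_snp: int = BASES_PER_SNP) -> int:
    """Number of common variant positions implied by the stated frequency."""
    return genome_size // bases_per_snp

def centimorgans_to_bases(distance_cm: float) -> float:
    """Approximate physical distance for a genetic distance in cM."""
    return distance_cm * BASES_PER_CENTIMORGAN

print(f"Expected SNP positions: {expected_snp_count():,}")
print(f"A 5-cM interval spans roughly {centimorgans_to_bases(5):,.0f} bases")
```

The first result reproduces the 10 million positions mentioned in the SNP definition.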
Publications Summarizing Various Aspects of the Project
• Special issue of Nature Human Genome Collection (2006)

• Special issue of Science: Building on the DNA Revolution (April 11, 2003)
o "The Human Genome Project: Lessons from Large-Scale Biology," Francis S. Collins, Michael
Morgan, Aristides Patrinos, Science 300, 286 (2003)
o "Realizing the Potential of the Genome Revolution: The Genomes to Life Program," Marvin E.
Frazier, Gary M. Johnson, David G. Thomassen, Carl E. Oliver, Aristides Patrinos, Science
300, 290 (2003)
• Nature Genetics: A 10-Year Retrospective 1992-2002 (vol. 33, March 2003)
• Controversial From the Start --Science article summarizing the history of the HGP (February 2001)
• Genomes: 15 Years Later--A perspective from Charles DeLisi, HGP Pioneer (July 2001)
• 1997 Human Genome Program Report contains history of the Project
• Bermuda Conference Data Release Policies (1997, 1996)
• NCHGR-DOE Guidance on Human Subjects Issues in Large-Scale DNA Sequencing (1996)
• Special Anniversary Issue of Human Genome News (7(3-4); Sept.-Dec. 1995) Summarizing the
History and Progress of the Project
• Evolution of a Vision (Part I) by David Smith, then Director of the DOE HGP (December 1995)
• Evolution of a Vision (Part II) by Francis S. Collins, Director of NIH NCHGR (December 1995)
• Origins of the Human Genome Project by Robert Cook-Deegan (1994; Risk Journal)
• Mapping the Genome: The Vision, the Science, the Implementation; What is the Genome Project?
[Article from Los Alamos Science. A round table discussion with David Baltimore, David Botstein,
David R. Cox, David J. Galas, Leroy Hood, Robert K. Moyzis, Maynard V. Olson, Nancy S. Wexler,
and Norton D. Zinder] Los Alamos National Laboratory, 1992
• History of the Department of Energy Human Genome Program adapted from the U.S. DOE 1991-92
Human Genome Program Report (published June 1992)
• Data Sharing Policy: (1992) A U.S. Department of Energy and National Institutes of Health
Coordinated Effort
• What is the Human Genome Project?: A Summary of the Project Beginnings (published April 1990)
• Orchestrating the Human Genome Project, by Charles Cantor, Science Vol. 248, April 1990
• The Human Genome Project: Past, Present, and Future by J.D. Watson, Science Vol. 248, April 1990
• The Department of Energy (DOE) Human Genome Initiative, Benjamin J. Barnhart, Genomics 5, 657
(1989).
• The (May 1985) Santa Cruz Workshop, R.L. Sinsheimer, Genomics 5, 954 (1989).
• Mapping Our Genes: Genome Projects --How Big? How Fast? 1988 report from the U.S. Congress
Office of Technology Assessment
• Mapping and Sequencing the Human Genome, report from the National Research Council
Commission on Life Sciences, National Academy Press, Washington, DC, 1988
• Report on the Human Genome Initiative for the Office of Health and Environmental Research: April
1987 report that officially outlined the Department of Energy's strategies for the Human Genome
Project
• Summary Report of the 1986 Santa Fe Workshop, "Sequencing the Human Genome", Bitensky, M.;
Los Alamos National Laboratory, Los Alamos, NM. See also Nature: Meetings that changed the world:
Santa Fe 1986, (Oct. 16, 2008.)
• "The Alta Summit, December 1984," by Robert Cook-Deegan, Genomics 5, 661-663 (published
October 1989): The beginning of the Human Genome Project.
• Human Genome News (HGN) newsletter on the Human Genome Project. All published issues (since
1989) are available. See also the archive of HGN History and Project Management articles.
• DOE HGP Reports and Workshop Abstracts report on program research.
Project Enabling Legislation
• The Atomic Energy Act of 1946 (P.L. 79-585) provided the initial charter for a comprehensive
program of research and development related to the utilization of fissionable and radioactive
materials for medical, biological, and health purposes.
• The Atomic Energy Act of 1954 (P.L. 83-706) further authorized the AEC "to conduct research on the
biologic effects of ionizing radiation."
• The Energy Reorganization Act of 1974 (P.L. 93-438) provided that responsibilities of the Energy
Research and Development Administration (ERDA) shall include "engaging in and supporting
environmental, biomedical, physical, and safety research related to the development of energy
resources and utilization technologies."
• The Federal Non-nuclear Energy Research and Development Act of 1974 (P.L. 93-577) authorized
ERDA to conduct a comprehensive non-nuclear energy research, development, and demonstration
program to include the environmental and social consequences of the various technologies.
• The DOE Organization Act of 1977 (P.L. 95-91) mandated the Department "to assure incorporation
of national environmental protection goals in the formulation and implementation of energy
programs; and to advance the goal of restoring, protecting, and enhancing environmental quality,
and assuring public health and safety," and to conduct "a comprehensive program of research and
development on the environmental effects of energy technology and program."
Project Sponsors
• The U.S. Department of Energy funded its Human Genome Program through its Office of Biological
and Environmental Research (genome@science.doe.gov).
• The U.S. National Institutes of Health funded its program through the National Human Genome
Research Institute (NHGRI).
BENEFITS OF GENOME PROJECTS
Rapid progress in genome science and a glimpse into its potential applications have spurred observers to
predict that biology will be the foremost science of the 21st century. Technology and resources generated
by the Human Genome Project and other genomics research are already having a major impact on research
across the life sciences. The potential for commercial development of genomics research presents
U.S. industry with a wealth of opportunities, and sales of DNA-based products and technologies in the
biotechnology industry are projected to exceed $45 billion by 2009
(Consulting Resources Corporation Newsletter, Spring 1999).
Some current and potential applications of genome research include
• Molecular medicine
• Energy sources and environmental applications
• Risk assessment
• Bioarchaeology, anthropology, evolution, and human migration
• DNA forensics (identification)
• Agriculture, livestock breeding, and bioprocessing
Molecular Medicine
• Improved diagnosis of disease

• Earlier detection of genetic predispositions to disease
• Rational drug design
• Gene therapy and control systems for drugs
• Pharmacogenomics "custom drugs"
Technology and resources promoted by the Human Genome Project are starting to have profound impacts
on biomedical research and promise to revolutionize the wider spectrum of biological research and clinical
medicine. Increasingly detailed genome maps have aided researchers seeking genes associated with dozens
of genetic conditions, including myotonic dystrophy, fragile X syndrome, neurofibromatosis types 1 and 2,
inherited colon cancer, Alzheimer's disease, and familial breast cancer.
On the horizon is a new era of molecular medicine characterized less by treating symptoms and more by
looking to the most fundamental causes of disease. Rapid and more specific diagnostic tests will make
possible earlier treatment of countless maladies. Medical researchers also will be able to devise novel
therapeutic regimens based on new classes of drugs, immunotherapy techniques, avoidance of
environmental conditions that may trigger disease, and possible augmentation or even replacement of
defective genes through gene therapy.
Energy and Environmental Applications
• Use microbial genomics research to create new energy sources (biofuels)

• Use microbial genomics research to develop environmental monitoring techniques to detect
pollutants
• Use microbial genomics research for safe, efficient environmental remediation
• Use microbial genomics research for carbon sequestration
In 1994, taking advantage of new capabilities developed by the genome project, DOE initiated the Microbial
Genome Program to sequence the genomes of bacteria useful in energy production, environmental
remediation, toxic waste reduction, and industrial processing. A follow-on program, Genomic Science
Program (GSP) builds on data and resources from the Human Genome Project, the Microbial Genome
Program, and systems biology. GSP will accelerate understanding of dynamic living systems for solutions to
DOE mission challenges in energy and the environment.
Despite our reliance on the inhabitants of the microbial world, we know little of their number or their nature:
estimates are that less than 0.01% of all microbes have been cultivated and characterized. Microbial
genome sequencing will help lay a foundation for knowledge that will ultimately benefit human health and
the environment. The economy will benefit from further industrial applications of microbial capabilities.
Information gleaned from the characterization of complete microbial genomes will lead to insights into the
development of such new energy-related biotechnologies as photosynthetic systems, microbial systems that
function in extreme environments, and organisms that can metabolize readily available renewable resources
and waste material with equal facility. Expected benefits also include development of diverse new products,
processes, and test methods that will open the door to a cleaner environment. Biomanufacturing will use
nontoxic chemicals and enzymes to reduce the cost and improve the efficiency of industrial processes.
Microbial enzymes have been used to bleach paper pulp, stone wash denim, remove lipstick from glassware,
break down starch in brewing, and coagulate milk protein for cheese production. In the health arena,
microbial sequences may help researchers find new human genes and shed light on the disease-producing
properties of pathogens.
Microbial genomics will also help pharmaceutical researchers gain a better understanding of how pathogenic
microbes cause disease. Sequencing these microbes will help reveal vulnerabilities and identify new drug
targets.
Gaining a deeper understanding of the microbial world also will provide insights into the strategies and limits
of life on this planet. Data generated in this young program have helped scientists identify the minimum
number of genes necessary for life and confirm the existence of a third major kingdom of life. Additionally,
the new genetic techniques now allow us to establish more precisely the diversity of microorganisms and
identify those critical to maintaining or restoring the function and integrity of large and small ecosystems;
this knowledge also can be useful in monitoring and predicting environmental change. Finally, studies on
microbial communities provide models for understanding biological interactions and evolutionary history.
Risk Assessment
• Assess health damage and risks caused by radiation exposure, including low-dose exposures
• Assess health damage and risks caused by exposure to mutagenic chemicals and cancer-causing
toxins
• Reduce the likelihood of heritable mutations
Understanding the human genome will have an enormous impact on the ability to assess risks posed to
individuals by exposure to toxic agents. Scientists know that genetic differences make some people more
susceptible and others more resistant to such agents. Far more work must be done to determine the genetic
basis of such variability. This knowledge will directly address DOE's long-term mission to understand the
effects of low-level exposures to radiation and other energy-related agents, especially in terms of cancer
risk.
Bioarchaeology, Anthropology, Evolution, and Human Migration
• Study evolution through germline mutations in lineages
• Study migration of different population groups based on female genetic inheritance
• Study mutations on the Y chromosome to trace lineage and migration of males
• Compare breakpoints in the evolution of mutations with ages of populations and historical events
Understanding genomics will help us understand human evolution and the common biology we share with all
of life. Comparative genomics between humans and other organisms such as mice has already led to the
identification of similar genes associated with diseases and traits. Further comparative studies will help
determine the yet-unknown function of thousands of other genes.
Comparing the DNA sequences of entire genomes of different microbes will provide new insights about
relationships among the three domains of life: archaea, bacteria, and eukaryotes.
DNA Forensics (Identification)
• Identify potential suspects whose DNA may match evidence left at crime scenes
• Exonerate persons wrongly accused of crimes
• Identify crime and catastrophe victims
• Establish paternity and other family relationships
• Identify endangered and protected species as an aid to wildlife officials (could be used for
prosecuting poachers)
• Detect bacteria and other organisms that may pollute air, water, soil, and food
• Match organ donors with recipients in transplant programs
• Determine pedigree for seed or livestock breeds
• Authenticate consumables such as caviar and wine
Any type of organism can be identified by examination of DNA sequences unique to that species. Identifying
individuals is less precise, although when DNA sequencing technologies progress further, direct
characterization of very large DNA segments, and possibly even whole genomes, will become feasible and
practical and will allow precise individual identification.
To identify individuals, forensic scientists scan about 10 DNA regions that vary from person to person and
use the data to create a DNA profile of that individual (sometimes called a DNA fingerprint). There is an
extremely small chance that another person has the same DNA profile for a particular set of regions.
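The "extremely small chance" comes from multiplying together the chance of a match at each region. The sketch below illustrates that product rule under two assumptions that are not from the text above: the regions are inherited independently, and each region's genotype is shared by 10% of people. The frequencies are made up for illustration and are not real population data.

```python
from math import prod

def random_match_probability(genotype_freqs):
    """Product rule: multiply the per-region genotype frequencies.

    Assumes the regions are statistically independent of one another.
    """
    return prod(genotype_freqs)

# Hypothetical values: 10 regions, each genotype shared by 10% of people.
illustrative_freqs = [0.1] * 10
rmp = random_match_probability(illustrative_freqs)
print(f"Random match probability: about 1 in {1 / rmp:,.0f}")
```

With these illustrative numbers the chance of a coincidental match is about 1 in 10 billion, which shows why adding more regions makes a profile rapidly more discriminating.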
Agriculture, Livestock Breeding, and Bioprocessing
• Disease-, insect-, and drought-resistant crops
• Healthier, more productive, disease-resistant farm animals
• More nutritious produce
• Biopesticides
• Edible vaccines incorporated into food products
• New environmental cleanup uses for plants like tobacco
Understanding plant and animal genomes will allow us to create stronger, more disease-resistant plants and
animals, reducing the costs of agriculture and providing consumers with more nutritious, pesticide-free
foods. Already growers are using bioengineered seeds to grow insect- and drought-resistant crops that
require little or no pesticide. Farmers have been able to increase outputs and reduce waste because their
crops and herds are healthier.
Alternate uses for crops such as tobacco have been found. One researcher has genetically engineered
tobacco plants in his laboratory to produce a bacterial enzyme that breaks down explosives such as TNT and
dinitroglycerin. Waste that would take centuries to break down in the soil can be cleaned up by simply
growing these special plants in the polluted area.
ETHICAL, LEGAL, AND SOCIAL ISSUES (ELSI)
The U.S. Department of Energy (DOE) and the National Institutes of Health (NIH) devoted 3% to 5% of
their annual Human Genome Project (HGP) budgets toward studying the ethical, legal, and social issues
(ELSI) surrounding availability of genetic information. This represents the world's largest bioethics program,
which has become a model for ELSI programs around the world.
The scientists who launched the Human Genome Project believed in the power of genetic information to
transform health care to allow earlier diagnosis of diseases than ever before possible and to fuel the creation
of powerful new medicines.
But it was also clear that genetic information could potentially be used in ways that are hurtful or unfair —
for example denying health insurance because of an increased risk for developing a particular disease.
Aware of the danger and hoping to ward it off, the founders of the Human Genome Project created a
program to explore the Ethical, Legal, and Social Implications of new genetic knowledge. The goal was to
anticipate problems that might arise and to prompt solutions.
For example, in the future, doctors will likely be able to give each of us a "genetic report card" that will spell
out our risk of developing a variety of different diseases. But will we really want that information? How will it
be used? Who will have access to our genetic information? How will it affect our lives, our families, and our
communities?
The challenge of addressing these issues is not reserved for scientists. We all have a stake in making sure
that everyone will benefit from genetic research and no one is harmed.
Societal Concerns Arising from the New Genetics
Fairness in the use of genetic information by insurers, employers, courts, schools, adoption agencies, and
the military, among others.
- Who should have access to personal genetic information, and how will it be used?
Privacy and confidentiality of genetic information.
- Who owns and controls genetic information?
Psychological impact and stigmatization due to an individual's genetic differences.
- How does personal genetic information affect an individual and society's perceptions of that individual?
- How does genomic information affect members of minority communities?
Reproductive issues including adequate informed consent for complex and potentially controversial
procedures, use of genetic information in reproductive decision making, and reproductive rights.
- Do healthcare personnel properly counsel parents about the risks and limitations of genetic technology?
- How reliable and useful is fetal genetic testing?
- What are the larger societal issues raised by new reproductive technologies?
Clinical issues including the education of doctors and other health service providers, patients, and the
general public in genetic capabilities, scientific limitations, and social risks; and implementation of standards
and quality-control measures in testing procedures.
- How will genetic tests be evaluated and regulated for accuracy, reliability, and utility? (Currently, there is
little regulation at the federal level.)
- How do we prepare healthcare professionals for the new genetics?
- How do we prepare the public to make informed choices?
- How do we as a society balance current scientific limitations and social risk with long-term benefits?
Uncertainties associated with gene tests for susceptibilities and complex conditions (e.g., heart disease)
linked to multiple genes and gene-environment interactions.
- Should testing be performed when no treatment is available?
- Should parents have the right to have their minor children tested for adult-onset diseases?
- Are genetic tests reliable and interpretable by the medical community?
Conceptual and philosophical implications regarding human responsibility, free will vs genetic determinism,
and concepts of health and disease.
- Do people's genes make them behave in a particular way?
- Can people always control their behavior?
- What is considered acceptable diversity?
- Where is the line between medical treatment and enhancement?
Health and environmental issues concerning genetically modified foods (GM) and microbes.
- Are GM foods and other products safe to humans and the environment?
- How will these technologies affect developing nations' dependence on the West?
Commercialization of products including property rights (patents, copyrights, and trade secrets) and
accessibility of data and materials.
- Who owns genes and other pieces of DNA?
- Will patenting DNA sequences limit their accessibility and development into useful products?
What We've Learned So Far
What Does the Draft Human Genome Sequence Tell Us?
By the Numbers
• The human genome contains 3164.7 million chemical nucleotide bases (A, C, T, and G).
• The average gene consists of 3000 bases, but sizes vary greatly, with the largest known human
gene being dystrophin at 2.4 million bases.
• The total number of genes is estimated at 30,000, much lower than previous estimates of 80,000
to 140,000 that had been based on extrapolations from gene-rich areas as opposed to a composite
of gene-rich and gene-poor areas.
• Almost all (99.9%) nucleotide bases are exactly the same in all people.
• The functions are unknown for over 50% of discovered genes.
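To put two of these figures in perspective, the short calculation below works out how many bases the 0.1% difference between people represents, and how the dystrophin gene compares with the average gene. It uses only the numbers quoted in the list above; the variable names are illustrative.

```python
GENOME_BASES = 3_164_700_000      # 3164.7 million bases
IDENTICAL_FRACTION = 0.999        # 99.9% of bases are the same in all people
AVERAGE_GENE_BASES = 3_000        # average gene size
DYSTROPHIN_BASES = 2_400_000      # largest known human gene

# Bases that can differ between two people (the remaining 0.1%).
differing_bases = GENOME_BASES * (1 - IDENTICAL_FRACTION)

# How many times larger dystrophin is than the average gene.
dystrophin_ratio = DYSTROPHIN_BASES / AVERAGE_GENE_BASES

print(f"Differing bases: about {differing_bases / 1e6:.1f} million")
print(f"Dystrophin is about {dystrophin_ratio:.0f} times the average gene size")
```

So a 0.1% difference still amounts to roughly 3 million bases, and dystrophin is about 800 times the size of an average gene.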
The Wheat from the Chaff
• Less than 2% of the genome codes for proteins.
• Repeated sequences that do not code for proteins ("junk DNA") make up at least 50% of the human
genome.
• Repetitive sequences are thought to have no direct functions, but they shed light on chromosome
structure and dynamics. Over time, these repeats reshape the genome by rearranging it, creating
entirely new genes, and modifying and reshuffling existing genes.
• During the past 50 million years, a dramatic decrease seems to have occurred in the rate of
accumulation of repeats in the human genome.
How It's Arranged
• The human genome's gene-dense "urban centers" are predominantly composed of the DNA building
blocks G and C.
• In contrast, the gene-poor "deserts" are rich in the DNA building blocks A and T. GC- and AT-rich
regions usually can be seen through a microscope as light and dark bands on chromosomes.
• Genes appear to be concentrated in random areas along the genome, with vast expanses of
noncoding DNA between.
• Stretches of up to 30,000 C and G bases repeating over and over often occur adjacent to gene-rich
areas, forming a barrier between the genes and the "junk DNA." These CpG islands are believed to
help regulate gene activity.
• Chromosome 1 has the most genes (2968), and the Y chromosome has the fewest (231).
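The contrast between GC-rich and AT-rich regions described above can be measured directly from sequence data. The following is a minimal sketch, assuming the sequence is available as a plain string of A, C, G, and T; the function names and the toy sequence are made up for illustration and are not part of any HGP tool.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def windowed_gc(seq, window):
    """GC content of consecutive, non-overlapping windows along seq."""
    return [gc_content(seq[i:i + window]) for i in range(0, len(seq), window)]

# Toy sequence: a GC-rich stretch followed by an AT-rich stretch.
toy_sequence = "GCGCCGGCGC" + "ATTATAATAT"
print(windowed_gc(toy_sequence, 10))  # [1.0, 0.0]
```

On real chromosomes the windows would be thousands of bases long, and high values would mark candidate gene-dense regions.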
How the Human Compares with Other Organisms
• Unlike the human's seemingly random distribution of gene-rich areas, many other organisms'
genomes are more uniform, with genes evenly spaced throughout.
• Humans have on average three times as many kinds of proteins as the fly or worm because of mRNA
transcript "alternative splicing" and chemical modifications to the proteins. This process can yield
different protein products from the same gene.
• Humans share most of the same protein families with worms, flies, and plants, but the number of
gene family members has expanded in humans, especially in proteins involved in development and
immunity.
• The human genome has a much greater portion (50%) of repeat sequences than the mustard weed
(11%), the worm (7%), and the fly (3%).
• Although humans appear to have stopped accumulating repeated DNA over 50 million years ago,
there seems to be no such decline in rodents. This may account for some of the fundamental
differences between hominids and rodents, although gene estimates are similar in these species.
Scientists have proposed many theories to explain evolutionary contrasts between humans and other
organisms, including those of life span, litter sizes, inbreeding, and genetic drift.
Variations and Mutations
• Scientists have identified about 1.4 million locations where single-base DNA differences (SNPs) occur
in humans. This information promises to revolutionize the processes of finding chromosomal
locations for disease-associated sequences and tracing human history.
• The ratio of germline (sperm or egg cell) mutations is 2:1 in males vs females. Researchers point to
several reasons for the higher mutation rate in the male germline, including the greater number of
cell divisions required for sperm formation than for eggs.
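A single-base difference is simply a position where two otherwise matching sequences carry different bases. The sketch below shows the idea, assuming the two sequences are already aligned and of equal length. The function name and the short example sequences are hypothetical, and real SNP discovery also has to handle alignment, sequencing errors, and population frequency.

```python
def find_single_base_differences(seq_a, seq_b):
    """Return (position, base_a, base_b) for each mismatch.

    Both sequences must already be aligned and the same length.
    Positions are 0-based.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]

# Made-up sequences that differ at one position.
person_1 = "ATGCCGTA"
person_2 = "ATGTCGTA"
print(find_single_base_differences(person_1, person_2))  # [(3, 'C', 'T')]
```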
Applications, Future Challenges

Deriving meaningful knowledge from the DNA sequence will define research through the coming decades to
inform our understanding of biological systems. This enormous task will require the expertise and creativity
of tens of thousands of scientists from varied disciplines in both the public and private sectors worldwide.
The draft sequence already is having an impact on finding genes associated with disease. A number of
genes have been pinpointed and associated with breast cancer, muscle disease, deafness, and blindness.
Additionally, finding the DNA sequences underlying such common diseases as cardiovascular disease,
diabetes, arthritis, and cancers is being aided by the human variation maps (SNPs) generated in the HGP in
cooperation with the private sector. These genes and SNPs provide focused targets for the development of
effective new therapies.
One of the greatest impacts of having the sequence may well be in enabling an entirely new approach to
biological research. In the past, researchers studied one or a few genes at a time. With whole-genome
sequences and new high-throughput technologies, they can approach questions systematically and on a
grand scale. They can study all the genes in a genome, for example, or all the transcripts in a particular
tissue or organ or tumor, or how tens of thousands of genes and proteins work together in interconnected
networks to orchestrate the chemistry of life.
The Next Step: Functional Genomics

The words of Winston Churchill, spoken in 1942 after 3 years of war, capture well the HGP era: "Now this is
not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning."
The avalanche of genome data grows daily. The new challenge will be to use this vast reservoir of data to
explore how DNA and proteins work with each other and the environment to create complex, dynamic living
systems. Systematic studies of function on a grand scale (functional genomics) will be the focus of biological
explorations in this century and beyond. These explorations will encompass studies in transcriptomics,
proteomics, structural genomics, new experimental methodologies, and comparative genomics.
• Transcriptomics involves large-scale analysis of messenger RNAs transcribed from active genes to
follow when, where, and under what conditions genes are expressed.
• Studying protein expression and function, or proteomics, can bring researchers closer to what's
actually happening in the cell than gene-expression studies. This capability has applications to drug
design.
• Structural genomics initiatives are being launched worldwide to generate the 3-D structures of one
or more proteins from each protein family, thus offering clues to function and biological targets for
drug design.
• Experimental methods for understanding the function of DNA sequences and the proteins they
encode include knockout studies to inactivate genes in living organisms and monitor any changes
that could reveal their functions.
• Comparative genomics—analyzing DNA sequence patterns of humans and well-studied model
organisms side-by-side—has become one of the most powerful strategies for identifying human
genes and interpreting their function.
TIMELINE OF HUMAN GENOME PROJECT
1985
• Robert Sinsheimer holds meeting on human genome sequencing at University of California, Santa
Cruz.
• At OHER, Charles DeLisi and David A. Smith commission the first Santa Fe conference to assess the
feasibility of a Human Genome Initiative.
1986
• First Santa Fe Conference is held, March 3-4, 1986.
• Following the Santa Fe conference, DOE OHER announces Human Genome Initiative. With $5.3
million, pilot projects begin at DOE national laboratories to develop critical resources and
technologies.
1987
• Congressionally chartered DOE advisory committee, HERAC, recommends a 15-year,
multidisciplinary, scientific, and technological undertaking to map and sequence the human genome.
DOE designates multidisciplinary human genome centers.
• NIH NIGMS begins funding of genome projects.
1988
• HUGO founded by scientists to coordinate efforts internationally.
• First annual Cold Spring Harbor Laboratory meeting on human genome mapping and sequencing.
• Telomere (chromosome end) sequence having implications for aging and cancer research is
identified at LANL.
1989
• DNA STSs recommended to correlate diverse types of DNA clones.
• DOE and NIH establish Joint ELSI Working Group.
1990
• DOE and NIH present joint 5-year U.S. HGP plan to Congress. The 15-year project formally begins.
• Projects begun to mark gene sites on chromosome maps as sites of mRNA expression.
1991
• Human chromosome mapping data repository, GDB, established.
1992
• Low-resolution genetic linkage map of entire human genome published.
1993
• International IMAGE Consortium established to coordinate efficient mapping and sequencing of
gene-representing cDNAs.
• DOE-NIH ELSI Working Group's Task Force on Genetic and Insurance Information releases
recommendations.
• DOE and NIH revise 5-year goals.
• IOM releases U.S. HGP-funded report, "Assessing Genetic Risks."
• LBNL implements novel transposon-mediated chromosome-sequencing system.
• GRAIL sequence-interpretation service provides Internet access at ORNL.
1994
• Genetic-mapping 5-year goal achieved 1 year ahead of schedule.
• Completion of second-generation DNA clone libraries representing each human chromosome by
LLNL and LBNL.
• Genetic Privacy Act, first U.S. HGP legislative product, proposed to regulate collection, analysis,
storage, and use of DNA samples and genetic information obtained from them; endorsed by ELSI
Working Group.
• DOE MGP launched; spin-off of HGP.
• DOE HGP Information Web site activated for public and researchers.
1995
• LANL and LLNL announce high-resolution physical maps of chromosome 16 and chromosome 19,
respectively.
• Moderate-resolution maps of chromosomes 3, 11, 12, and 22 published.
• Physical map with over 15,000 STS markers published.
• First (nonviral) whole genome sequenced (for the bacterium Haemophilus influenzae).
• Sequence of smallest bacterium, Mycoplasma genitalium, completed; provides a model of the
minimum number of genes needed for independent existence.
• EEOC guidelines extend ADA employment protection to cover discrimination based on genetic
information related to illness, disease, or other conditions.
1996
• Methanococcus jannaschii genome sequenced; confirms existence of third major branch of life on
earth.
• Health Insurance Portability and Accountability Act prohibits use of genetic information in certain
health-insurance eligibility decisions, requires DHHS to enforce health-information privacy provisions.
• DOE and NCHGR issue guidelines on use of human subjects for large-scale sequencing projects.
• Saccharomyces cerevisiae (yeast) genome sequence completed by international consortium.
• Sequence of the human T-cell receptor region completed.
1997
• Escherichia coli genome sequence completed.
• Second large-scale sequencing strategy meeting held in Bermuda.
• High-resolution physical maps of chromosomes X and 7 completed.
• DOE forms Joint Genome Institute for implementing high-throughput activities at DOE human
genome centers, initially in sequencing and functional genomics.
1998
• Caenorhabditis elegans genome sequence completed.
• DOE and NIH reveal new five-year plan for HGP, predict project completion by 2003.
• JGI exceeds sequencing goal, achieves 20 Mb for FY 1998.
• GeneMap'98 containing 30,000 markers released.
• Incyte Pharmaceuticals announces plans to sequence human genome in 2 years.
• Mycobacterium tuberculosis bacterium sequenced.
• Celera Genomics formed to sequence much of human genome in 3 years using HGP-generated
resources.
• Largest-ever ELSI meeting attended by over 800 from diverse disciplines and sponsored by DOE;
Whitehead Institute; and the American Society of Law, Medicine, and Ethics.
• Human Genome Project passes midpoint.
1999
• First Human Chromosome Completely Sequenced: Chromosome 22.
• HGP advances goal for obtaining a draft sequence of the entire human genome from 2001 to 2000.
2000
• HGP leaders and President Clinton announce the completion of a "working draft" DNA sequence of
the human genome.
• International research consortium publishes chromosome 21 genome, the smallest human
chromosome and the second to be completely sequenced.
• DOE researchers announce completion of chromosomes 5, 16, and 19 draft sequence.
• International collaborators publish genome of fruit fly Drosophila melanogaster.
2001
• Human Chromosome 20 Finished - Chromosome 20 is the third chromosome completely sequenced
to the high quality specified by the Human Genome Project.
• Publication of Initial Working Draft Sequence, February 12, 2001.
Special issues of Science (Feb. 16, 2001) and Nature (Feb. 15, 2001) contain the working draft of the
human genome sequence. Nature papers include initial analyses and descriptions of the sequence
generated by the publicly sponsored Human Genome Project, while Science publications focus on
the draft sequence reported by the private company, Celera Genomics. A press conference was held
at 10 a.m., Monday, February 12, 2001, to discuss the landmark publications. Pieter de Jong's team
(now at the Oakland Children's Hospital, Oakland, CA) was a major provider of the BAC libraries
used in the sequencing of the human and several other genomes.
2002
• Mouse Genome Sequencing Consortium publishes its draft sequence of mouse genome in the
December 5, 2002, issue of Nature.
• International consortium led by the DOE Joint Genome Institute publishes draft sequence of Fugu
rubripes.
2003
• Human Chromosome 6 Completed, October 2003.
• Human Chromosome 7 Completed, July 2003.
• Human Chromosome Y Completed, June 2003.
• Human Genome Project Declared Complete, April 2003.
• Human Chromosome 14 Finished.
2004
• Human Chromosome 16 Completed, December 2004.
• Landmark Paper: Finishing the euchromatic sequence of the human genome, Nature, Oct. 21, 2004
• Human Gene Count Estimates Changed to 20,000 to 25,000, October 2004.
• Human Chromosome 5 Completed, September 2004.
• Landmark Paper: Human genome: Quality assessment of the human genome sequence. Nature 429,
365-368 (27 May 2004)
• Human Chromosome 9 Completed, May 2004.
• Human Chromosome 18 Completed, March 2004.
• Human Chromosome 13 Completed, March 2004
2005
• Human Chromosome 4 Completed, April 2005.
• Human Chromosome X Completed, March 2005.
2006
• Human Chromosome 8 Completed, January 2006.
2008
• Genetic Information Nondiscrimination Act (GINA) Becomes Law, May 2008.
• Landmark Paper: Mapping and sequencing of structural variation from eight human genomes,
Nature, May 1, 2008
Acronyms
• ADA - Americans with Disabilities Act
• ANL - Argonne National Laboratory, a Department of Energy Laboratory
• BAC - bacterial artificial chromosome
• cDNA - complementary deoxyribonucleic acid
• DHHS - Department of Health and Human Services
• DNA - deoxyribonucleic acid
• DOE - Department of Energy
• EEOC - Equal Employment Opportunity Commission
• ELSI - ethical, legal, and social issues
• FY - federal fiscal year (October 1 to September 30)
• GDB - Genome Database
• GRAIL - Gene Recognition and Analysis Internet Link
• HERAC - Health and Environmental Research Advisory Committee
• HGI - Human Genome Initiative
• HGP - Human Genome Project, Human Genome Program
• HUGO - Human Genome Organisation
• ICPEMC - International Commission for Protection Against Environmental Mutagens and
Carcinogens
• IMAGE - Integrated Molecular Analysis of Gene Expression
• IOM - Institute of Medicine
• JGI - the Department of Energy's Joint Genome Institute in Walnut Creek, California. The JGI
houses the DOE's production sequencing facility.
• LANL - Los Alamos National Laboratory, a Department of Energy Laboratory
• LBNL - Lawrence Berkeley National Laboratory, a Department of Energy Laboratory
• LLNL - Lawrence Livermore National Laboratory, a Department of Energy Laboratory
• MGP - Microbial Genome Project
• MOU - memorandum of understanding
• mRNA - messenger ribonucleic acid
• NAS - National Academy of Sciences
• NCHGR - National Center for Human Genome Research at National Institutes of Health (NIH)
• NHGRI - National Human Genome Research Institute at National Institutes of Health (NIH)
• NIGMS - National Institute of General Medical Sciences at National Institutes of Health (NIH)
• NIH - National Institutes of Health
• NRC - National Research Council
• OBER - Office of Biological and Environmental Research, U.S. Department of Energy (formerly
Office of Health and Environmental Research)
• OHER - Office of Health and Environmental Research, U.S. Department of Energy (now OBER)
• ORNL - Oak Ridge National Laboratory, a Department of Energy Laboratory
• OTA - Office of Technology Assessment
• R&D - research and development
• SBH - Sequencing by hybridization
• STS - sequence tagged site
• UNESCO - United Nations Educational, Scientific, and Cultural Organization
• YAC - yeast artificial chromosome
BIBLIOGRAPHY
WEBSITES
http://www.ornl.gov/sci/techresources/Human_Genome/project/info.shtml
http://www.genome.gov
http://genomics.energy.gov
http://nih.gov/hgp.htm
http://accessexcellence.com/resource/center/genome/projects.htm
http://www.scq.ubc.ca/wp-content/uploads/2006/08/sequencing2.gif
http://bioweb.wku.edu/courses/biol350/DNAsequencing13/Images/DNA_sequence.gif
http://www.bio.miami.edu/~cmallery/255/255hist/mcb4.1.dogma.jpg
BOOKS
Genetics: A Conceptual Approach - Pierce, B. A.
Bioinformatics: Sequence and Genome Analysis - Mount, D. W.
From Genes to Genomes: Concepts and Applications of DNA Technology - Dale
