What Is The Human Genome Project
By Shveta Jaishankar
GCM III
102 08 2553
CONTENTS
1. Introduction
2. Brief History
3. NHGRI Management, mission and goals
4. Timeline of milestones in genetics
5. Molecular Basics
6. Study, Sequencing, Conclusion
7. Bioinformatics
8. Participating institutions
9. Goals
10. Publications, Legislations and Sponsors
11. Benefits
12. Ethical, Legal and Social Issues
13. What we’ve learned so far
14. Timeline of Human Genome Project
15. Bibliography
The Human Genome Project (HGP) was one of the great feats of exploration in history - an inward voyage of
discovery rather than an outward exploration of the planet or the cosmos; an international research effort to
sequence and map all of the genes - together known as the genome - of members of our species, Homo
sapiens. Completed in April 2003, the HGP gave us the ability, for the first time, to read nature's
complete genetic blueprint for building a human being.
Begun formally in 1990, the U.S. Human Genome Project was a 13-year effort coordinated by the U.S.
Department of Energy and the National Institutes of Health. The project originally was planned to last 15
years, but rapid technological advances accelerated the completion date to 2003. Key technologies used in the project included:
• DNA Sequencing
• The Employment of Restriction Fragment-Length Polymorphisms (RFLP)
• Yeast Artificial Chromosomes (YAC)
• Bacterial Artificial Chromosomes (BAC)
• The Polymerase Chain Reaction (PCR)
• Electrophoresis
Of course, information is only as good as the ability to use it. Therefore, advanced methods for widely
disseminating the information generated by the HGP to scientists, physicians and others are necessary in
order to ensure the most rapid application of research results for the benefit of humanity. Biomedical
technology and research are particular beneficiaries of the HGP.
However, the momentous implications for individuals and society of possessing the detailed genetic
information made possible by the HGP were recognized from the outset. Another major component of the
HGP - and an ongoing component of NHGRI - is therefore devoted to the analysis of the ethical, legal and
social implications (ELSI) of our newfound genetic knowledge, and the subsequent development of policy
options for public consideration.
A BRIEF HISTORY
In February 2001, the Human Genome Project (HGP) published its results to that date: a 90 percent
complete sequence of all three billion base pairs in the human genome. The HGP consortium published its
data in the February 15, 2001, issue of the journal Nature.
The project had its ideological origins in the mid-1980s, but its intellectual roots stretch back further.
Alfred Sturtevant created the first Drosophila gene map in 1911.
The crucial first step in molecular genome analysis, and in much of the molecular biological research
of the last half-century, was the discovery of the double helical structure of the DNA molecule in 1953 by
Francis Crick and James Watson. The two researchers shared the 1962 Nobel Prize (along with Maurice
Wilkins) in the category of "physiology or medicine."
In the mid-1970s, Frederick Sanger developed techniques to sequence DNA, for which he received
his second Nobel Prize in chemistry in 1980. (His first, in 1958, was for studies of protein structure). With
the automation of DNA sequencing in the 1980s, the idea of analyzing the entire human genome was first
proposed by a few academic biologists.
The United States Department of Energy, seeking data on protecting the genome from the
mutagenic (gene-mutating) effects of radiation, became involved in 1986, and established an early genome
project in 1987.
In 1988, Congress funded both the NIH and the DOE to embark on further exploration of this
concept, and the two government agencies formalized an agreement by signing a Memorandum of
Understanding to "coordinate research and technical activities related to the human genome."
James Watson was appointed to lead the NIH component, which was dubbed the Office of Human Genome
Research. The following year, the Office of Human Genome Research evolved into the National Center for
Human Genome Research (NCHGR).
In 1990, the initial planning stage was completed with the publication of a joint research plan,
"Understanding Our Genetic Inheritance: The Human Genome Project, The First Five Years, FY 1991-1995."
This initial research plan set out specific goals for the first five years of what was then projected to be a 15-
year research effort.
In 1992, Watson resigned, and Michael Gottesman was appointed acting director of the center. The
following year, Francis S. Collins was named director.
The advent and employment of improved research techniques, including the use of restriction fragment-
length polymorphisms, the polymerase chain reaction, bacterial and yeast artificial chromosomes and
pulsed-field gel electrophoresis, enabled rapid early progress. Therefore, the 1990 plan was updated with a
new five-year plan announced in 1993 in the journal Science (262: 43-46; 1993).
Indeed, a large part of the early work of the HGP was devoted to the development of improved technologies
for accelerating the elucidation of the genome. In a 2001 article in the journal Genome Research, Collins
wrote, "Building detailed genetic and physical maps, developing better, cheaper and faster technologies for
handling DNA, and mapping and sequencing the more modest-sized genomes of model organisms were all
critical stepping stones on the path to initiating the large-scale sequencing of the human genome."
Also in 1993, the NCHGR established a Division of Intramural Research (DIR), in which genome technology
is developed and used to study specific diseases. By 1996, eight NIH institutes and centers had also
collaborated to create the Center for Inherited Disease Research (CIDR), for study of the genetics of
complex diseases.
In 1997, the NCHGR received full institute status at NIH, becoming the National Human Genome
Research Institute (NHGRI), with Collins remaining as director of the new institute. A third five-year plan
was announced in 1998, again in Science (282: 682-689; 1998).
In June 2000 came the announcement that the majority of the human genome had in fact been sequenced,
which was followed by the publication of 90 percent of the sequence of the genome's three billion base-pairs
in the journal Nature, in February 2001.
On April 14, 2003, the National Human Genome Research Institute (NHGRI), the Department of
Energy (DOE) and their partners in the International Human Genome Sequencing Consortium announced
the successful completion of the Human Genome Project.
Surprises accompanying the sequence publication included: the relatively small number of human genes,
perhaps as few as 30,000; the complex architecture of human proteins compared to their homologs - similar
genes with the same functions - in, for example, roundworms and fruit flies; and the lessons to be taught by
repeat sequences of DNA.
NHGRI, created in 1989 as the NCHGR to manage the role of NIH in the HGP, funded research in a variety of
areas related to the project.
The Division of Extramural Research (DER) for NHGRI supported and managed the role of NIH in the
HGP, set the scientific priorities for HGP research and supervised the peer-reviewed research projects that
addressed those research efforts. The extramural research community and the NHGRI National Advisory
Council for Human Genome Research (NACHGR) advise the DER. Major areas of genome-related research
overseen by the DER include: the development of technologies used in gene sequencing and mapping; the
analysis of the functions of the genes and the proteins for which most genes code; computer technologies
for managing and disseminating the enormous amounts of data generated by the HGP; determination of the
crucial genetic differences between individual human beings; and examination of the ethical, legal and
social implications of genetic research.
In 2003, an accurate and complete human genome sequence was finished and made available to scientists
and researchers two years ahead of the original HGP schedule and at a cost less than the original estimated
budget. With the completion of the HGP, the mission of the NHGRI has expanded to include studies aimed at
understanding how the human genome functions in the role of creating gene products, most notably the
many proteins for which genes code.
In late 2001 through 2002, NHGRI gathered the world's leading genome researchers to discuss and
determine the direction of future research at two large "bookend" meetings called Beyond the Beginning:
The Future of Genomics I and II and held workshops throughout 2002 to discuss specific areas of genomic
research, policy, education and ethics.
The specific ideas and recommendations that arose from these sessions have informed the next stage of
genomic research, resulting in a vision document authored by the leadership at NHGRI: A Vision for the
Future of Genomics Research. The overarching mission of the HGP and the NHGRI, however, remains the
same: the quest to understand the human genome and the role it plays in both health and disease. Francis
Collins called the publication in February 2001 of the majority of the human genome "the end of the
beginning." With the completion of the HGP in April 2003, his words continue to ring true. In a 2001
Genome Research article titled "Contemplating the End of the Beginning," Collins explained:
Critical understanding of gene expression, the connection between sequence variations and phenotype,
large-scale protein-protein interactions, and a host of other global analyses of human biology can now get
seriously underway. For me, as a physician, the true payoff from the HGP will be the ability to better
diagnose, treat and prevent disease, and most of those benefits to humanity still lie ahead. With these
immense data sets of sequence and variation now in hand, we are now empowered to pursue those goals in
ways undreamed of a few years ago.
1859: Darwin publishes On the Origin of Species, proposing continual evolution of species
1865: Mendel's Peas
1869: DNA First Isolated
1879: Mitosis Observed
1900: Rediscovery of Mendel's work
1902: Orderly Inheritance of Disease Observed
1902: Chromosome Theory of Heredity
1909: The Word Gene Coined
1911: Fruit Flies Illuminate the Chromosome Theory
1941: One Gene, One Enzyme
1943: X-ray Diffraction of DNA
1944: DNA is "Transforming Principle"
1944: Jumping Genes
1952: Genes are Made of DNA
1953: DNA Double Helix
1955: 46 Human Chromosomes
1955: DNA Copying Enzyme
1956: Cause of Disease Traced to Alteration
1958: Semiconservative Replication of DNA
1959: Chromosome Abnormalities Identified
1961: mRNA Ferries Information
1961: First Screen for Metabolic Defect in Newborns
1966: Genetic Code Cracked
1968: First Restriction Enzymes Described
1972: First Recombinant DNA
1973: First Animal Gene Cloned
1975-77: DNA Sequencing
1976: First Genetic Engineering Company
1977: Introns Discovered
1981-82: First Transgenic Mice and Fruit Flies
1982: GenBank Database Formed
1983: First Disease Gene Mapped
1983: PCR Invented
1986: First Time Gene Positionally Cloned
1987: First Human Genetic Map
1987: YACs Developed
1989: Microsatellites, New Genetic Markers
1989: Sequence-tagged Sites, Another Marker
1990: Launch of the Human Genome Project
1990: ELSI Founded
1990: Research on BACs
1991: ESTs, Fragments of Genes
1992: Second-generation Genetic Map of Human Genome
1992: Data Release Guidelines Established
1993: New HGP Five-year Plan
1994: FLAVR SAVR Tomato
1994: Detailed Human Genetic Map
1994: Microbial Genome Project
1995: Ban on Genetic Discrimination in Workplace
1995: Two Microbial Genomes Sequenced
1995: Physical Map of Human Genome Completed
1996: International Strategy Meeting on Human Genome Sequencing
1996: Mouse Genetic Map Completed
1996: Yeast Genome Sequenced
1996: Archaea Genome Sequenced
1996: Health Insurance Discrimination Banned
1996: 280,000 Expressed Sequence Tags (ESTs)
1996: Human Gene Map Created
1996: Human DNA Sequence Begins
1997: Bermuda Meeting Affirms Principle of Data Release
1997: E. coli Genome Sequenced
1997: Recommendations on Genetic Testing
1998: Private Company Announces Sequencing Plan
1998: M. Tuberculosis Bacterium Sequenced
1998: Committee on Genetic Testing
1998: HGP Map Includes 30,000 Human Genes
1998: New HGP Goals for 2003
1998: SNP Initiative Begins
1998: Genome of Roundworm C. elegans Sequenced
1999: Full-scale Human Genome Sequencing
1999: Chromosome 22
2000: Free Access to Genomic Information
2000: Chromosome 21
2000: Working Draft
2000: Drosophila and Arabidopsis genomes sequenced
2000: Executive Order Bans Genetic Discrimination in the Federal Workplace
2000: Yeast Interactome Published
2000: Fly Model of Parkinson's Disease Reported
2001: First Draft of the Human Genome Sequence Released
2001: RNAi Shuts Off Mammalian Genes
2001: FDA Approves Genetics-based Drug to Treat Leukemia
2002: Mouse Genome Sequenced
2002: Researchers Find Genetic Variation Associated with Prostate Cancer
2002: Rice Genome Sequenced
2002: The International HapMap Project is Announced
2002: The Genomes to Life Program is Launched
2002: Researchers Identify Gene Linked to Bipolar Disorder
2003: Human Genome Project Completed
2003: Fiftieth Anniversary of Watson and Crick's Description of the Double Helix
2003: The First National DNA Day Celebrated
2003: ENCODE Program Begins
2003: Premature Aging Gene Identified
2004: Rat and Chicken Genomes Sequenced
2004: FDA Approves First Microarray
2004: Refined Analysis of Complete Human Genome Sequence
2004: Surgeon General Stresses Importance of Family History
2005: Chimpanzee Genome Sequenced
2005: HapMap Project Completed
2005: Trypanosomatid Genomes Sequenced
2005: Dog Genome Sequenced
2006: The Cancer Genome Atlas (TCGA) Project Started
2006: Second Non-human Primate Genome is Sequenced
2006: Initiatives to Establish the Genetic and Environmental Causes of Common Diseases Launched
Mapping
To begin the project, researchers built maps of the human genome. They identified thousands of DNA
sequence landmarks that helped them navigate across the chromosomes.
Developing genome maps was necessary preparation for DNA sequencing. These same maps also
served to orient geneticists who were hunting for disease genes.
With enough landmarks in place, project scientists created "libraries" of clones that spanned the
genome. Each clone contained a manageably small fragment of human DNA that was stored in bacteria.
Scientists used the landmarks to tell them what part of the human genome each fragment came from.
This clone-by-clone approach made it possible to double check the location of each DNA sequence. It also
allowed participating laboratories from around the world to carve up the genome and coordinate their work.
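The landmark-based placement described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual mapping software: the landmark names, their positions (`LANDMARK_MAP`) and the clone names are all hypothetical. Each clone is placed between the outermost landmarks it contains, and two clones that share a landmark must overlap.

```python
# Hypothetical landmark map: sequence-tagged site (STS) name -> position in bases.
LANDMARK_MAP = {"STS-1": 10_000, "STS-2": 95_000, "STS-3": 180_000, "STS-4": 260_000}


def place_clones(clone_landmarks, landmark_map):
    """Estimate where each clone lies from the landmarks it contains.

    Returns (clone, start, end) tuples sorted by start position, where start and
    end are the outermost landmark positions detected in that clone. Clones with
    no recognised landmark cannot be placed and are left out.
    """
    placed = []
    for clone, landmarks in clone_landmarks.items():
        positions = [landmark_map[name] for name in landmarks if name in landmark_map]
        if positions:
            placed.append((clone, min(positions), max(positions)))
    return sorted(placed, key=lambda p: p[1])


def shared_landmarks(clone_landmarks, a, b):
    """Landmarks found in both clones; a shared landmark implies the clones overlap."""
    return sorted(set(clone_landmarks[a]) & set(clone_landmarks[b]))


# Hypothetical clones, listed out of order; BAC-X carries an unmapped landmark.
clones = {
    "BAC-C": ["STS-3", "STS-4"],
    "BAC-A": ["STS-1", "STS-2"],
    "BAC-B": ["STS-2", "STS-3"],
    "BAC-X": ["STS-99"],
}

for clone, start, end in place_clones(clones, LANDMARK_MAP):
    print(f"{clone}: {start:,}-{end:,}")
```

Sorting by position puts the out-of-order clones back into genome order, and the clone carrying an unmapped landmark is simply left unplaced.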
Building Libraries
Clone libraries offered the same advantage as real libraries: orderly access to information. In most
clone libraries, the DNA fragments were stored in E. coli. These are bacteria that normally live in our
intestines. Each E. coli cell stored a single segment of human DNA and represented a single book of the
library. Clone libraries allowed each human fragment to be tracked and easily copied.
Subclones
The clone libraries were prepared using bacterial artificial chromosomes, or BACs. Each BAC clone
contained 100,000 to 200,000 bases of DNA sequence. The large BAC clones were used to establish the
order of the DNA sequences. To sequence the DNA, smaller-sized clones were needed. Project scientists cut
the large BAC clones into smaller fragments of about 2,000 bases. These smaller fragments were typically
stored in viruses called phage that can infect E. coli cells.
Conclusion
The Human Genome Project also produced other advances, not expected to be accomplished until much
later. These included an advanced draft of the mouse genome and an initial draft of the rat genome.
Medical researchers did not wait to use data from the Human Genome Project. When the project began in
1990, fewer than 100 human disease genes had been identified. At the project's conclusion in 2003, the
number of identified disease genes had risen to more than 1,400.
The Human Genome Project focused on the DNA sequence of an individual. The next step was to analyze
DNA sequences from different populations. This catalog of human genetic variation was called the HapMap.
Completed in 2005, the HapMap used single nucleotide polymorphisms called SNPs to identify large blocks
of DNA sequence called haplotypes that tend to be inherited together. To use the data, researchers compare
haplotypes between people with and without a disease. Haplotypes shared by people with the disease are
then examined in detail to look for associated genes. Scientists have already used HapMap data to identify
a gene associated with age-related macular degeneration, a leading cause of blindness among the
elderly. The HapMap is expected to play an important role in identifying many more disease genes in
the future.
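The case-control comparison described above can be sketched as follows. This is a simplified illustration under stated assumptions: the haplotype labels and counts are hypothetical, each person is given a single haplotype for the region (real people carry two copies), and the odds ratio is used as a simple measure of how strongly a haplotype is enriched among people with the disease. Real association studies use far larger samples and formal statistical tests.

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Fraction of the group carrying each haplotype label."""
    counts = Counter(haplotypes)
    total = len(haplotypes)
    return {label: n / total for label, n in counts.items()}


def odds_ratio(haplotype, cases, controls):
    """Odds of carrying `haplotype` among cases divided by the odds among controls.

    Adds 0.5 to every cell of the 2x2 table (Haldane correction) so that a
    zero count does not cause division by zero.
    """
    a = cases.count(haplotype) + 0.5                     # cases with the haplotype
    b = len(cases) - cases.count(haplotype) + 0.5        # cases without it
    c = controls.count(haplotype) + 0.5                  # controls with it
    d = len(controls) - controls.count(haplotype) + 0.5  # controls without it
    return (a / b) / (c / d)


def rank_haplotypes(cases, controls):
    """Haplotypes ordered from most to least enriched among cases."""
    labels = set(cases) | set(controls)
    return sorted(labels, key=lambda h: odds_ratio(h, cases, controls), reverse=True)


# Hypothetical data: one haplotype label per person for one region of DNA.
cases = ["H1"] * 6 + ["H2"] * 2 + ["H3"] * 2
controls = ["H1"] * 2 + ["H2"] * 5 + ["H3"] * 3

print("cases:", haplotype_frequencies(cases))
print("controls:", haplotype_frequencies(controls))
for h in rank_haplotypes(cases, controls):
    print(h, round(odds_ratio(h, cases, controls), 2))
```

In this made-up data, H1 is carried by 60 percent of cases but only 20 percent of controls, so it would be the region examined in detail for an associated gene.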
BIOINFORMATICS
When the Human Genome Project was begun in 1990 it was understood that to meet the project's goals, the
speed of DNA sequencing would have to increase and the cost would have to come down. Over the life of
the project virtually every aspect of DNA sequencing was improved. It took the project approximately four
years to sequence its first one billion bases but just four months to sequence the second billion bases.
During the month of January, 2003, 1.5 billion bases were sequenced. As the speed of DNA sequencing
increased, the cost decreased from 10 dollars per base in 1990 to 10 cents per base at the conclusion of the
project in April 2003. Although the Human Genome Project is officially over, improvements in DNA
sequencing continue to be made. Researchers are experimenting with new methods for sequencing DNA
that have the potential to sequence a human genome in just a matter of weeks for a few thousand dollars.
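The figures in the paragraph above can be turned into rates with some simple arithmetic. This sketch treats "approximately four years" as exactly 48 months, so the results are rough.

```python
# Figures quoted in the paragraph above; "approximately four years" is taken as 48 months.
first_billion_months = 48
second_billion_months = 4
january_2003_bases = 1.5e9
cost_per_base_1990 = 10.00  # dollars
cost_per_base_2003 = 0.10   # dollars

rate_first = 1e9 / first_billion_months    # bases per month
rate_second = 1e9 / second_billion_months  # bases per month

print(f"First billion:  {rate_first / 1e6:.1f} million bases per month")
print(f"Second billion: {rate_second / 1e6:.1f} million bases per month")
print(f"January 2003:   {january_2003_bases / 1e6:.0f} million bases per month")
print(f"Speed-up between the first and second billion: {rate_second / rate_first:.0f}x")
print(f"Fall in cost per base, 1990 to 2003: {cost_per_base_1990 / cost_per_base_2003:.0f}x")
```

The first billion bases came in at roughly 21 million bases per month and the second at 250 million per month, about a 12-fold increase, while the cost per base fell 100-fold.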
DNA sequencing performed on an industrial scale has produced a vast amount
of data to analyze. In August 2005 it was announced that the three largest
public collections of DNA and RNA sequences together store one hundred
billion bases, representing over 165,000 different organisms. As sequence
data began to pile up, the need for new and better methods of sequence
analysis became critical.
Bioinformatics is the branch of biology that is concerned with the acquisition,
storage, and analysis of the information found in nucleic acid and protein
sequence data. Computers and bioinformatics software are the tools of the trade.
Genetic data represent a treasure trove for researchers and companies interested in how genes contribute
to our health and well-being. Almost half of the genes identified by the Human Genome Project have no
known function. Researchers are using bioinformatics to identify genes, establish their functions, and
develop gene-based strategies for preventing, diagnosing, and treating disease.
A DNA sequencing reaction produces a sequence that is several hundred bases long. Gene sequences
typically run for thousands of bases. The largest known gene is that associated with Duchenne muscular
dystrophy. It is approximately 2.4 million bases in length. In order to study genes, scientists first assemble
long DNA sequences from a series of shorter overlapping sequences.
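Assembly from overlapping sequences can be illustrated with a simple greedy strategy: repeatedly join the two reads with the longest overlap between the end of one and the start of the other. This is a toy sketch with hypothetical reads and an assumed minimum overlap of three bases; real assemblers must also cope with sequencing errors, repeated sequences and reads from both DNA strands, and the greedy method is only one of several approaches.

```python
def overlap(a, b, min_length=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`, or 0 if
    there is no such overlap of at least `min_length` bases."""
    start = 0
    while True:
        # Look for the first min_length bases of b somewhere in a.
        start = a.find(b[:min_length], start)
        if start == -1:
            return 0
        # Accept it only if everything from there to the end of a matches b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_length=3):
    """Merge the pair of reads with the largest overlap until none overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_length)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining reads stay as separate pieces
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical short reads taken from one sequence, supplied out of order.
reads = ["TCCGATAG", "ACGTTGCA", "ATAGCT", "GCAAGTCC"]
print(greedy_assemble(reads))
```

The four fragments are rebuilt into a single 20-base sequence by chaining their overlaps.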
Scientists enter their assembled sequences into genetic databases so that other scientists may use the data.
Since the sequences of the two DNA strands are complementary, it is only necessary to enter the sequence
of one DNA strand into a database. By selecting an appropriate computer program, scientists can use
sequence data to look for genes, get clues to gene functions, examine genetic variation, and explore
evolutionary relationships. Bioinformatics is a young and dynamic science. New bioinformatic software is
being developed while existing software is continually updated.
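Because each base pairs with a fixed partner (A with T, C with G) and the two strands run in opposite directions, the second strand can be regenerated from the first by complementing each base and reversing the result. A minimal Python sketch (the example sequence is hypothetical):

```python
# Maps each base to its Watson-Crick partner; other characters pass through unchanged.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Sequence of the opposite strand, written in the conventional 5'-to-3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


stored = "ATGCCG"  # hypothetical strand entered into a database
print(reverse_complement(stored))
```

Applying the function twice returns the original strand, which is why storing one strand loses no information.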
Finding Genes
Open reading frames (ORFs), stretches of DNA that begin with a start codon and run for many codons before
reaching a stop codon, are just one feature that a computer program looks for when locating potential genes.
Genes are also characterized by specific control sequences that are recognized by enzymes involved with
transcription and translation. When a computer program finds a DNA sequence that satisfies all of these gene
features (an ORF plus the appropriate control sequences), it identifies the sequence as likely coming from a
gene. Only laboratory testing of the DNA sequence, however, can prove that the gene is active in an organism.
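The ORF part of this search can be sketched as follows. This simplified version assumes the standard genetic code (start codon ATG; stop codons TAA, TAG and TGA), scans only the strand it is given in its three reading frames, and uses a hypothetical minimum length of three codons; it does not look for the control sequences mentioned above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Find ATG...stop open reading frames in the three frames of one strand.

    Returns (start, end) index pairs with `end` exclusive, so each slice
    includes the stop codon. `min_codons` counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # frame closed; look for the next ATG
    return sorted(orfs)


dna = "CCATGAAACCCGGGTAGTT"  # hypothetical sequence
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])
```

A fuller search would also scan the opposite strand and use a much larger minimum length, since short ORFs often arise by chance.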
Finding Functions
Once a nucleic acid or amino acid sequence has been assembled, bioinformatic analysis can be used to
determine if the sequence is similar to that of a known gene. This is where sequences from model
organisms are helpful. For example, let’s say we have an unknown human DNA sequence that is associated
with the disease cystic fibrosis. A bioinformatic analysis finds a similar sequence from mouse that is
associated with a gene that codes for a membrane protein that regulates salt balance. It is a good bet that
the human sequence also is part of a gene that codes for a membrane protein that regulates salt balance.
Determining the similarity of two sequences is not easy. For example, it was recently reported that the genomes
of humans and chimpanzees are 96 percent similar. What does this really mean?
This time, 16 out of 20 bases match. We can say that the two sequences are 80 percent the same. Careful
inspection, however, reveals another sort of similarity between Sequences 3 and 4.
We see that the two sequences differ by just a missing base in Sequence 4 (or an added base in Sequence 3).
Does the deletion (or insertion) of a single base equal four base substitutions as suggested in this example?
There is no simple answer to that question. When comparing sequences, we must be concerned not only
with the quantity of the differences but the quality as well.
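The difference between counting matches position by position and allowing for an insertion or deletion can be shown with a small sketch. The two sequences here are hypothetical (not Sequences 3 and 4 from the example), and the scoring values (+1 match, -1 mismatch, -2 gap) are assumptions chosen for illustration. The alignment uses the Needleman-Wunsch dynamic-programming method, which finds the best overall alignment when gaps are allowed.

```python
def positional_identity(a, b):
    """Fraction of matching bases, compared position by position with no gaps
    (over the length of the shorter sequence)."""
    n = min(len(a), len(b))
    return sum(1 for i in range(n) if a[i] == b[i]) / n


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment. Returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # pair two bases
                score[i - 1][j] + gap,                           # gap in b
                score[i][j - 1] + gap,                           # gap in a
            )

    # Trace back from the bottom-right cell to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


seq_a = "ACGTTGCAAGTCCGATAGCT"
seq_b = seq_a[:10] + seq_a[11:]  # hypothetical: the same sequence with one base deleted

print(f"position by position: {positional_identity(seq_a, seq_b):.0%} identical")
aligned_a, aligned_b, best = global_align(seq_a, seq_b)
print(aligned_a)
print(aligned_b)
```

Compared position by position, only 11 of 19 bases match, because every base after the deletion is shifted; the gapped alignment shows instead that the sequences are identical apart from one missing base.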
Scientists have written computer programs that can be used to see if a particular DNA sequence is similar to
any others that are stored in a sequence database. One of the most popular such programs is called BLAST
(Basic Local Alignment Search Tool). Using this program is somewhat like using a search engine on the
Internet. The user provides the program with a biological sequence (when using BLAST) or a subject (when
using a search engine). In each case, the program compares the input information to the information found
in the database. The results are given with the most closely matching items (or sequences) listed first,
followed by items (or sequences) that match less well.
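The idea of ranking database entries by how well they match a query can be sketched in Python. This is not how BLAST itself works: BLAST uses a fast heuristic that finds short word matches and extends them, whereas this sketch substitutes the slower but exact Smith-Waterman local alignment score. The database entries and scoring values (+2 match, -1 mismatch, -2 gap) are hypothetical.

```python
def local_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between two sequences."""
    prev = [0] * (len(subject) + 1)
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * (len(subject) + 1)
        for j in range(1, len(subject) + 1):
            s = match if query[i - 1] == subject[j - 1] else mismatch
            # A local score never drops below zero: an alignment can start anywhere.
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query, database):
    """Score the query against every database entry, best match first."""
    hits = [(name, local_score(query, seq)) for name, seq in database.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


query = "ACGTTGCAAGTCCGAT"
database = {  # hypothetical entries
    "unrelated": "CCCCCCCCCCCC",
    "partial_match": "ACGTTGCATTTTTTTT",
    "exact_copy": "TTTT" + query + "GGGG",
}
for name, score in search(query, database):
    print(name, score)
```

As with a search engine, the closest match is listed first and weaker matches follow.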
Let’s look at an example of a BLAST search. The input sequence that is being compared to others in the
database is called the query sequence. In our example, the query is the short human DNA sequence listed
below.
Once the query sequence is submitted, the BLAST program compares it, one at a time, to every sequence
in its database. Typically, the search results are displayed so that the query sequence is shown at the top
and the matching sequences are listed below it. The listed sequence “hits” also may include links to relevant
bibliographic information. The results from this search are shown below.
The BLAST program compares a single input sequence, one at a time, to others in a sequence database. The
results can provide clues as to the identity and function of the input sequence. Sometimes you may want to
compare a number of different sequences at the same time to see where they are alike and where they
are different. The CLUSTAL program was developed to produce such multiple alignments. CLUSTAL gets its
name because it deals with clusters of sequences.
CLUSTAL alignments are sometimes used by scientists examining genetic variation within a population. For
example, once a gene has been associated with a disease, scientists can use CLUSTAL to examine how the
gene sequence varies among people with and without the disease. The example below shows a CLUSTAL
alignment of DNA sequences from a portion of the gene associated
with cystic fibrosis. The person affected by the disease is seen to be missing a three-base DNA sequence.
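Once sequences have been aligned, spotting a deletion like this one is a matter of scanning the alignment columns. The sketch below assumes the alignment has already been produced (for example by CLUSTAL) and uses hypothetical sequences, not the actual cystic fibrosis gene; it reports the columns that vary and the gap runs in each sequence.

```python
def gap_runs(row):
    """Return (start, end) column ranges, end exclusive, where `row` has gaps ('-')."""
    runs, start = [], None
    for i, ch in enumerate(row):
        if ch == "-" and start is None:
            start = i
        elif ch != "-" and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(row)))
    return runs


def variable_columns(alignment):
    """Columns where not every sequence in the alignment has the same character."""
    rows = list(alignment.values())
    return [i for i in range(len(rows[0])) if len({row[i] for row in rows}) > 1]


# Hypothetical, already-aligned sequences.
alignment = {
    "unaffected_1": "ATGGCTACCTTTGGAGCT",
    "unaffected_2": "ATGGCTACCTTTGGAGCT",
    "affected":     "ATGGCTACC---GGAGCT",
}

print("variable columns:", variable_columns(alignment))
for name, row in alignment.items():
    for start, end in gap_runs(row):
        missing = alignment["unaffected_1"][start:end]
        print(f"{name}: missing {end - start} bases ({missing}) at columns {start}-{end - 1}")
```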
Multiple sequence alignments are also useful to scientists investigating the evolutionary relationships among
species. For example, the CLUSTAL program can be used to align a series of related sequences from
different species. Once the program has produced the best alignment for the sequences, another program
can calculate the evolutionary relationships between them. These data can be used to construct a tree
diagram showing the evolutionary relationships for that sequence among the various species.
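One simple tree-building method is UPGMA, which repeatedly joins the two closest groups of sequences. The sketch below applies it to hypothetical, already-aligned sequences (made up for illustration, not real data) using the proportion of differing positions as the distance. Published studies generally use more sophisticated distance corrections and tree methods, so treat this as an illustration of the idea rather than the program that would actually be used alongside CLUSTAL.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def upgma(alignment):
    """Build a rooted tree with UPGMA, returned as a Newick-style string
    without branch lengths."""
    sizes = {name: 1 for name in alignment}  # cluster label -> number of sequences
    dist = {
        frozenset(pair): p_distance(alignment[pair[0]], alignment[pair[1]])
        for pair in combinations(alignment, 2)
    }
    while len(sizes) > 1:
        a, b = sorted(min(dist, key=dist.get))  # the closest pair of clusters
        merged = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        for other in sizes:
            # Distance to the new cluster is the size-weighted average distance.
            dist[frozenset((merged, other))] = (
                dist[frozenset((a, other))] * size_a + dist[frozenset((b, other))] * size_b
            ) / (size_a + size_b)
        dist = {pair: d for pair, d in dist.items() if a not in pair and b not in pair}
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Made-up aligned sequences (not real data).
aligned = {
    "human":   "ACGTTGCAAGTCCGATAGCT",
    "chimp":   "ACGTTGCAAGTCCGATAGCA",
    "gorilla": "GCGTTACAAGTCCGATAGCT",
    "mouse":   "ACATTGCGTGCCCTATCGCT",
}
print(upgma(aligned))
```

The result is a nested grouping in which the two most similar sequences are joined first and the most different one joins last.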
GOALS
The completion of the human DNA sequence in the spring of 2003 coincided with the 50th anniversary of
Watson and Crick's description of the fundamental structure of DNA. The analytical power arising from the
reference DNA sequences of entire genomes and other genomics resources has jump-started what some call
the "biology century."
The Human Genome Project was marked by accelerated progress. In June 2000, the rough draft of
the human genome was completed a year ahead of schedule. In February 2001, the working draft was
completed, and special issues of Science and Nature containing the working draft sequence and analysis
were published. Additional papers were published in April 2003 when the project was completed.
The project's first 5-year plan, intended to guide research in FYs 1991-1995, was revised in 1993
due to unexpected progress, and the second plan outlined goals through FY 1998. The third and final plan
[Science, 23 October 1998] was developed during a series of DOE and NIH workshops. Some 18 countries
have participated in the worldwide effort, with significant contributions from the Sanger Center in the United
Kingdom and research centers in Germany, France, and Japan.
Area | HGP goal | Standard achieved | Date achieved
Physical map | 30,000 STSs | 52,000 STSs | October 1998
DNA sequence | 95% of gene-containing part of human sequence finished to 99.99% accuracy | 99% of gene-containing part of human sequence finished to 99.99% accuracy | April 2003
Gene identification | Full-length human cDNAs | 15,000 full-length human cDNAs | March 2003
• The Atomic Energy Act of 1946 (P.L. 79-585) provided the initial charter for a comprehensive
program of research and development related to the utilization of fissionable and radioactive
materials for medical, biological, and health purposes.
• The Atomic Energy Act of 1954 (P.L. 83-706) further authorized the AEC "to conduct research on the
biologic effects of ionizing radiation."
• The Energy Reorganization Act of 1974 (P.L. 93-438) provided that responsibilities of the Energy
Research and Development Administration (ERDA) shall include "engaging in and supporting
environmental, biomedical, physical, and safety research related to the development of energy
resources and utilization technologies."
• The Federal Non-nuclear Energy Research and Development Act of 1974 (P.L. 93-577) authorized
ERDA to conduct a comprehensive non-nuclear energy research, development, and demonstration
program to include the environmental and social consequences of the various technologies.
• The DOE Organization Act of 1977 (P.L. 95-91) mandated the Department "to assure incorporation
of national environmental protection goals in the formulation and implementation of energy
programs; and to advance the goal of restoring, protecting, and enhancing environmental quality,
and assuring public health and safety," and to conduct "a comprehensive program of research and
development on the environmental effects of energy technology and program."
Project Sponsors
• The U.S. Department of Energy funded its Human Genome Program through its Office of Biological
and Environmental Research (genome@science.doe.gov).
• The U.S. National Institutes of Health funded its program through the National Human Genome
Research Institute (NHGRI).
Rapid progress in genome science and a glimpse into its potential applications have spurred observers to
predict that biology will be the foremost science of the 21st century. Technology and resources generated
by the Human Genome Project and other genomics research are already having a major impact on research
across the life sciences. The potential for commercial development of genomics research presents
U.S. industry with a wealth of opportunities, and sales of DNA-based products and technologies in the
biotechnology industry are projected to exceed $45 billion by 2009
(Consulting Resources Corporation Newsletter, Spring 1999).
Some current and potential applications of genome research include:
• Molecular medicine
• Energy sources and environmental applications
• Risk assessment
• Bioarchaeology, anthropology, evolution, and human migration
• DNA forensics (identification)
• Agriculture, livestock breeding, and bioprocessing
Molecular Medicine
Technology and resources promoted by the Human Genome Project are starting to have profound impacts
on biomedical research and promise to revolutionize the wider spectrum of biological research and clinical
medicine. Increasingly detailed genome maps have aided researchers seeking genes associated with dozens
of genetic conditions, including myotonic dystrophy, fragile X syndrome, neurofibromatosis types 1 and 2,
inherited colon cancer, Alzheimer's disease, and familial breast cancer.
On the horizon is a new era of molecular medicine characterized less by treating symptoms and more by
looking to the most fundamental causes of disease. Rapid and more specific diagnostic tests will make
possible earlier treatment of countless maladies. Medical researchers also will be able to devise novel
therapeutic regimens based on new classes of drugs, immunotherapy techniques, avoidance of
environmental conditions that may trigger disease, and possible augmentation or even replacement of
defective genes through gene therapy.
In 1994, taking advantage of new capabilities developed by the genome project, DOE initiated the Microbial
Genome Program to sequence the genomes of bacteria useful in energy production, environmental
remediation, toxic waste reduction, and industrial processing. A follow-on program, the Genomic Science
Program (GSP), builds on data and resources from the Human Genome Project, the Microbial Genome
Program, and systems biology. GSP will accelerate understanding of dynamic living systems for solutions to
DOE mission challenges in energy and the environment.
Despite our reliance on the inhabitants of the microbial world, we know little of their number or their nature:
estimates are that less than 0.01% of all microbes have been cultivated and characterized. Microbial
genome sequencing will help lay a foundation for knowledge that will ultimately benefit human health and
the environment. The economy will benefit from further industrial applications of microbial capabilities.
Information gleaned from the characterization of complete microbial genomes will lead to insights into the
development of such new energy-related biotechnologies as photosynthetic systems, microbial systems that
function in extreme environments, and organisms that can metabolize readily available renewable resources
and waste material with equal facility. Expected benefits also include development of diverse new products,
processes, and test methods that will open the door to a cleaner environment. Biomanufacturing will use
nontoxic chemicals and enzymes to reduce the cost and improve the efficiency of industrial processes.
Microbial enzymes have been used to bleach paper pulp, stone wash denim, remove lipstick from glassware,
break down starch in brewing, and coagulate milk protein for cheese production. In the health arena,
microbial sequences may help researchers find new human genes and shed light on the disease-producing
properties of pathogens.
Microbial genomics will also help pharmaceutical researchers gain a better understanding of how pathogenic
microbes cause disease. Sequencing these microbes will help reveal vulnerabilities and identify new drug
targets.
Gaining a deeper understanding of the microbial world also will provide insights into the strategies and limits
of life on this planet. Data generated in this young program have helped scientists identify the minimum
number of genes necessary for life and confirm the existence of a third major kingdom of life. Additionally,
the new genetic techniques now allow us to establish more precisely the diversity of microorganisms and
identify those critical to maintaining or restoring the function and integrity of large and small ecosystems;
this knowledge also can be useful in monitoring and predicting environmental change. Finally, studies on
microbial communities provide models for understanding biological interactions and evolutionary history.
Risk Assessment
• Assess health damage and risks caused by radiation exposure, including low-dose exposures
• Assess health damage and risks caused by exposure to mutagenic chemicals and cancer-causing
toxins
• Reduce the likelihood of heritable mutations
Understanding the human genome will have an enormous impact on the ability to assess risks posed to
individuals by exposure to toxic agents. Scientists know that genetic differences make some people more
susceptible and others more resistant to such agents. Far more work must be done to determine the genetic
basis of such variability. This knowledge will directly address DOE's long-term mission to understand the
effects of low-level exposures to radiation and other energy-related agents, especially in terms of cancer
risk.
Understanding genomics will help us understand human evolution and the common biology we share with all
of life. Comparative genomics between humans and other organisms, such as mice, has already led to the discovery of similar
genes associated with diseases and traits. Further comparative studies will help determine the yet-unknown
function of thousands of other genes.
Comparing the DNA sequences of entire genomes of different microbes will provide new insights about
relationships among the three kingdoms of life: archaebacteria, bacteria (eubacteria), and eukaryotes.
• Identify potential suspects whose DNA may match evidence left at crime scenes
• Exonerate persons wrongly accused of crimes
• Identify crime and catastrophe victims
• Establish paternity and other family relationships
• Identify endangered and protected species as an aid to wildlife officials (could be used for
prosecuting poachers)
• Detect bacteria and other organisms that may pollute air, water, soil, and food
• Match organ donors with recipients in transplant programs
• Determine pedigree for seed or livestock breeds
• Authenticate consumables such as caviar and wine
Any type of organism can be identified by examination of DNA sequences unique to that species. Identifying
individuals is less precise, although when DNA sequencing technologies progress further, direct
characterization of very large DNA segments, and possibly even whole genomes, will become feasible and
practical and will allow precise individual identification.
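The idea that a species can be recognized from DNA sequences unique to it can be sketched as a simple substring search. This is a minimal illustration under stated assumptions: the species names and "signature" sequences below are hypothetical placeholders, and real assays rely on validated, much longer marker regions and must tolerate sequencing errors, which this sketch does not. Because a DNA sample may be read from either strand, the sketch also checks each signature's reverse complement.

```python
# Hypothetical species "signatures": invented short sequences for illustration only.
SIGNATURES = {
    "species_A": "ATGCGTACCTTGA",
    "species_B": "GGCTTAACGTTAC",
}

# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite DNA strand, read in the same direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def identify_species(sample: str, signatures: dict = SIGNATURES) -> list:
    """Return the sorted names of species whose signature occurs in the sample on either strand."""
    sample = sample.upper()
    hits = []
    for name, sig in signatures.items():
        if sig in sample or reverse_complement(sig) in sample:
            hits.append(name)
    return sorted(hits)


if __name__ == "__main__":
    print(identify_species("ttATGCGTACCTTGAcc"))  # ['species_A']
    print(identify_species("GTAACGTTAAGCC"))      # ['species_B'] (found on the reverse strand)
```

Exact substring matching is used here only to keep the idea visible; it finds a signature only if every base matches.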
To identify individuals, forensic scientists scan about 10 DNA regions that vary from person to person and
use the data to create a DNA profile of that individual (sometimes called a DNA fingerprint). There is an
extremely small chance that another person has the same DNA profile for a particular set of regions.
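Why a coincidental match is so unlikely can be shown by multiplying probabilities across regions (the "product rule"). The sketch below is a simplified illustration that assumes Hardy-Weinberg genotype frequencies (p² when both copies carry the same allele, 2pq when they carry two different alleles) and assumes the regions are inherited independently; the allele frequencies in the example are hypothetical, not real population data.

```python
from math import prod
from typing import Optional, Sequence, Tuple


def genotype_frequency(p: float, q: Optional[float] = None) -> float:
    """Expected genotype frequency under Hardy-Weinberg equilibrium.

    p only  -> same allele inherited from both parents: p * p
    p and q -> two different alleles: 2 * p * q
    """
    if q is None:
        return p * p
    return 2 * p * q


def random_match_probability(profile: Sequence[Tuple[float, ...]]) -> float:
    """Multiply the genotype frequencies of all regions (product rule).

    Assumes the regions are inherited independently of one another.
    """
    return prod(genotype_frequency(*alleles) for alleles in profile)


if __name__ == "__main__":
    # Hypothetical profile: 10 regions, each carrying two different alleles
    # that each occur in 10% of the population.
    profile = [(0.1, 0.1)] * 10
    print(random_match_probability(profile))  # about 1.02e-17
```

Each region alone matches about 2% of people in this made-up example, but ten independent regions together match roughly one person in 10^17.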
Understanding plant and animal genomes will allow us to create stronger, more disease-resistant plants and
animals, reducing the costs of agriculture and providing consumers with more nutritious, pesticide-free
foods. Already growers are using bioengineered seeds to grow insect- and drought-resistant crops that
require little or no pesticide. Farmers have been able to increase outputs and reduce waste because their
crops and herds are healthier.
Alternate uses for crops such as tobacco have been found. One researcher has genetically engineered
tobacco plants in his laboratory to produce a bacterial enzyme that breaks down explosives such as TNT and
dinitroglycerin. Waste that would take centuries to break down in the soil can be cleaned up by simply
growing these special plants in the polluted area.
The U.S. Department of Energy (DOE) and the National Institutes of Health (NIH) devoted 3% to 5% of
their annual Human Genome Project (HGP) budgets toward studying the ethical, legal, and social issues
(ELSI) surrounding availability of genetic information. This represents the world's largest bioethics program,
which has become a model for ELSI programs around the world.
The scientists who launched the Human Genome Project believed in the power of genetic information to
transform health care, allowing diseases to be diagnosed earlier than ever before and fueling the creation
of powerful new medicines.
But it was also clear that genetic information could potentially be used in ways that are hurtful or unfair,
for example, to deny someone health insurance because of an increased risk of developing a particular disease.
Aware of the danger and hoping to ward it off, the founders of the Human Genome Project created a
program to explore the Ethical, Legal, and Social Implications of new genetic knowledge. The goal was to
anticipate problems that might arise and to prompt solutions.
For example, in the future, doctors will likely be able to give each of us a "genetic report card" that will spell
out our risk of developing a variety of different diseases. But will we really want that information? How will it
be used? Who will have access to our genetic information? How will it affect our lives, our families, and our
communities?
The challenge of addressing these issues is not reserved for scientists. We all have a stake in making sure
that everyone will benefit from genetic research and no one is harmed.
Societal Concerns Arising from the New Genetics
Fairness in the use of genetic information by insurers, employers, courts, schools, adoption agencies, and
the military, among others.
- Who should have access to personal genetic information, and how will it be used?
- Privacy and confidentiality of genetic information.
- Who owns and controls genetic information?
Reproductive issues including adequate informed consent for complex and potentially controversial
procedures, use of genetic information in reproductive decision making, and reproductive rights.
- Do healthcare personnel properly counsel parents about the risks and limitations of genetic technology?
- How reliable and useful is fetal genetic testing?
- What are the larger societal issues raised by new reproductive technologies?
Clinical issues including the education of doctors and other health service providers, patients, and the
general public in genetic capabilities, scientific limitations, and social risks; and implementation of standards
and quality-control measures in testing procedures.
- How will genetic tests be evaluated and regulated for accuracy, reliability, and utility? (Currently, there is
little regulation at the federal level.)
- How do we prepare healthcare professionals for the new genetics?
- How do we prepare the public to make informed choices?
- How do we as a society balance current scientific limitations and social risk with long-term benefits?
Uncertainties associated with gene tests for susceptibilities and complex conditions (e.g., heart disease)
linked to multiple genes and gene-environment interactions.
- Should testing be performed when no treatment is available?
- Should parents have the right to have their minor children tested for adult-onset diseases?
- Are genetic tests reliable and interpretable by the medical community?
Conceptual and philosophical implications regarding human responsibility, free will vs genetic determinism,
and concepts of health and disease.
- Do people's genes make them behave in a particular way?
- Can people always control their behavior?
- What is considered acceptable diversity?
- Where is the line between medical treatment and enhancement?
Health and environmental issues concerning genetically modified (GM) foods and microbes.
- Are GM foods and other products safe for humans and the environment?
- How will these technologies affect developing nations' dependence on the West?
Commercialization of products including property rights (patents, copyrights, and trade secrets) and
accessibility of data and materials.
- Who owns genes and other pieces of DNA?
- Will patenting DNA sequences limit their accessibility and development into useful products?
By the Numbers
• The human genome contains 3164.7 million chemical nucleotide bases (A, C, T, and G).
• The average gene consists of 3000 bases, but sizes vary greatly, with the largest known human
gene being dystrophin at 2.4 million bases.
• The total number of genes is estimated at 30,000, much lower than previous estimates of 80,000
to 140,000 that had been based on extrapolations from gene-rich areas as opposed to a composite
of gene-rich and gene-poor areas.
• Almost all (99.9%) nucleotide bases are exactly the same in all people.
• The functions are unknown for over 50% of discovered genes.
• The human genome's gene-dense "urban centers" are predominantly composed of the DNA building
blocks G and C.
• In contrast, the gene-poor "deserts" are rich in the DNA building blocks A and T. GC- and AT-rich
regions usually can be seen through a microscope as light and dark bands on chromosomes.
• Genes appear to be concentrated in random areas along the genome, with vast expanses of
noncoding DNA between them.
• Stretches of up to 30,000 C and G bases repeating over and over often occur adjacent to gene-rich
areas, forming a barrier between the genes and the "junk DNA." These CpG islands are believed to
help regulate gene activity.
• Chromosome 1 has the most genes (2968), and the Y chromosome has the fewest (231).
• Unlike the seemingly random distribution of gene-rich areas in the human genome, many other organisms'
genomes are more uniform, with genes evenly spaced throughout.
• Humans have on average three times as many kinds of proteins as the fly or worm because of mRNA
transcript "alternative splicing" and chemical modifications to the proteins. This process can yield
different protein products from the same gene.
• Humans share most of the same protein families with worms, flies, and plants, but the number of
gene family members has expanded in humans, especially in proteins involved in development and
immunity.
• The human genome has a much greater portion (50%) of repeat sequences than the mustard weed
(11%), the worm (7%), and the fly (3%).
• Although humans appear to have stopped accumulating repeated DNA over 50 million years ago,
there seems to be no such decline in rodents. This may account for some of the fundamental
differences between hominids and rodents, although gene estimates are similar in these species.
Scientists have proposed many theories to explain evolutionary contrasts between humans and other
organisms, including those of life span, litter sizes, inbreeding, and genetic drift.
• Scientists have identified about 1.4 million locations where single-base DNA differences (SNPs) occur
in humans. This information promises to revolutionize the processes of finding chromosomal
locations for disease-associated sequences and tracing human history.
• Germline (sperm or egg cell) mutations occur at a ratio of 2:1 in males versus females. Researchers point to
several reasons for the higher mutation rate in the male germline, including the greater number of
cell divisions required for sperm formation than for eggs.
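The contrast between GC-rich "urban centers" and AT-rich "deserts", and the CpG islands mentioned above, are usually measured with two simple statistics: GC content and the ratio of observed to expected CpG dinucleotides. The sketch below assumes the commonly cited Gardiner-Garden and Frommer thresholds for a CpG island (at least 200 bases, GC content above 50%, observed/expected CpG ratio above 0.6); these thresholds and the function names are assumptions for illustration, not figures from this report.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_observed_expected(seq: str) -> float:
    """Observed CpG dinucleotides divided by the number expected from the C and G counts.

    Expected count = (#C * #G) / length. "CG" cannot overlap itself,
    so str.count gives the exact number of CpG sites.
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply the assumed Gardiner-Garden and Frommer-style thresholds to one window."""
    return (len(seq) >= min_len
            and gc_content(seq) > min_gc
            and cpg_observed_expected(seq) > min_ratio)


if __name__ == "__main__":
    print(gc_content("ATGC"))                 # 0.5
    print(looks_like_cpg_island("CG" * 100))  # True
    print(looks_like_cpg_island("AT" * 150))  # False
```

This checks a single window only; genome-scale tools typically slide such a window along each chromosome and merge adjacent windows that pass.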
The draft sequence already is having an impact on finding genes associated with disease. A number of
genes have been pinpointed and associated with breast cancer, muscle disease, deafness, and blindness.
Additionally, finding the DNA sequences underlying such common diseases as cardiovascular disease,
diabetes, arthritis, and cancers is being aided by the human variation maps (SNPs) generated in the HGP in
cooperation with the private sector. These genes and SNPs provide focused targets for the development of
effective new therapies.
One of the greatest impacts of having the sequence may well be in enabling an entirely new approach to
biological research. In the past, researchers studied one or a few genes at a time. With whole-genome
sequences and new high-throughput technologies, they can approach questions systematically and on a
grand scale. They can study all the genes in a genome, for example, or all the transcripts in a particular
tissue or organ or tumor, or how tens of thousands of genes and proteins work together in interconnected
networks to orchestrate the chemistry of life.
The avalanche of genome data grows daily. The new challenge will be to use this vast reservoir of data to
explore how DNA and proteins work with each other and the environment to create complex, dynamic living
systems. Systematic studies of function on a grand scale (functional genomics) will be the focus of biological
explorations in this century and beyond. These explorations will encompass studies in transcriptomics,
proteomics, structural genomics, new experimental methodologies, and comparative genomics.
• Transcriptomics involves large-scale analysis of messenger RNAs transcribed from active genes to
follow when, where, and under what conditions genes are expressed.
• Studying protein expression and function (proteomics) can bring researchers closer to what's
actually happening in the cell than gene-expression studies. This capability has applications to drug
design.
• Structural genomics initiatives are being launched worldwide to generate the 3-D structures of one
or more proteins from each protein family, thus offering clues to function and biological targets for
drug design.
• Experimental methods for understanding the function of DNA sequences and the proteins they
encode include knockout studies to inactivate genes in living organisms and monitor any changes
that could reveal their functions.
• Comparative genomics, analyzing DNA sequence patterns of humans and well-studied model
organisms side by side, has become one of the most powerful strategies for identifying human
genes and interpreting their function.
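Comparing sequences side by side, like the 99.9% identity between people quoted in this section, ultimately comes down to counting matching positions. The sketch below computes percent identity for two sequences that are assumed to be already aligned, with gaps written as "-"; producing the alignment itself (for example with a dynamic-programming method such as Needleman-Wunsch) is not shown. Skipping gap columns is one of several conventions in use and is an assumption of this sketch.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percentage of aligned, gap-free columns in which the two sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must already be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))  # 90.0
    print(percent_identity("AC-T", "ACGT"))              # 100.0
```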
1985
• Robert Sinsheimer holds meeting on human genome sequencing at University of California, Santa
Cruz.
• At OHER Charles DeLisi and David A. Smith commission the first Santa Fe conference to assess the
feasibility of a Human Genome Initiative.
1986
• Following the Santa Fe conference, DOE OHER announces Human Genome Initiative. With $5.3
million, pilot projects begin at DOE national laboratories to develop critical resources and
technologies.
• First Santa Fe Conference is held, March 3-4, 1986.
1987
• Congressionally chartered DOE advisory committee, HERAC, recommends a 15-year,
multidisciplinary, scientific, and technological undertaking to map and sequence the human genome.
DOE designates multidisciplinary human genome centers.
• NIH NIGMS begins funding of genome projects.
1988
• HUGO founded by scientists to coordinate efforts internationally.
• First annual Cold Spring Harbor Laboratory meeting on human genome mapping and sequencing.
• Telomere (chromosome end) sequence having implications for aging and cancer research is
identified at LANL.
1989
• DNA STSs recommended to correlate diverse types of DNA clones.
• DOE and NIH establish Joint ELSI Working Group.
1990
• DOE and NIH present joint 5-year U.S. HGP plan to Congress. The 15-year project formally begins.
• Projects begun to mark gene sites on chromosome maps as sites of mRNA expression.
1991
• Human chromosome mapping data repository, GDB, established.
1992
• Low-resolution genetic linkage map of entire human genome published.
1993
• International IMAGE Consortium established to coordinate efficient mapping and sequencing of
gene-representing cDNAs.
• DOE-NIH ELSI Working Group's Task Force on Genetic and Insurance Information releases
recommendations.
• DOE and NIH revise 5-year goals.
• IOM releases U.S. HGP-funded report, "Assessing Genetic Risks."
• LBNL implements novel transposon-mediated chromosome-sequencing system.
• GRAIL sequence-interpretation service provides Internet access at ORNL.
1994
• Genetic-mapping 5-year goal achieved 1 year ahead of schedule.
• Completion of second-generation DNA clone libraries representing each human chromosome by
LLNL and LBNL.
• Genetic Privacy Act, first U.S. HGP legislative product, proposed to regulate collection, analysis,
storage, and use of DNA samples and genetic information obtained from them; endorsed by ELSI
Working Group.
• DOE MGP launched; spin-off of HGP.
• DOE HGP Information Web site activated for public and researchers.
1995
• LANL and LLNL announce high-resolution physical maps of chromosome 16 and chromosome 19,
respectively.
• Moderate-resolution maps of chromosomes 3, 11, 12, and 22 published.
• Physical map with over 15,000 STS markers published.
• First (nonviral) whole genome sequenced (for the bacterium Haemophilus influenzae).
• Sequence of smallest bacterium, Mycoplasma genitalium, completed; provides a model of the
minimum number of genes needed for independent existence.
• EEOC guidelines extend ADA employment protection to cover discrimination based on genetic
information related to illness, disease, or other conditions.
1996
• Methanococcus jannaschii genome sequenced; confirms existence of third major branch of life on
earth.
• Health Insurance Portability and Accountability Act prohibits use of genetic information in certain
health-insurance eligibility decisions, requires DHHS to enforce health-information privacy provisions.
• DOE and NCHGR issue guidelines on use of human subjects for large-scale sequencing projects.
• Saccharomyces cerevisiae (yeast) genome sequence completed by international consortium.
• Sequence of the human T-cell receptor region completed.
1997
• Escherichia coli genome sequence completed.
• Second large-scale sequencing strategy meeting held in Bermuda.
• High-resolution physical maps of chromosomes X and 7 completed.
• DOE forms Joint Genome Institute for implementing high-throughput activities at DOE human
genome centers, initially in sequencing and functional genomics.
1998
• Caenorhabditis elegans genome sequence completed.
• DOE and NIH reveal new five-year plan for HGP, predict project completion by 2003.
• JGI exceeds sequencing goal, achieves 20 Mb for FY 1998.
• GeneMap'98 containing 30,000 markers released.
• Incyte Pharmaceuticals announces plans to sequence human genome in 2 years.
• Mycobacterium tuberculosis bacterium sequenced.
• Celera Genomics formed to sequence much of human genome in 3 years using HGP-generated
resources.
• Largest-ever ELSI meeting attended by over 800 people from diverse disciplines and sponsored by DOE;
Whitehead Institute; and the American Society of Law, Medicine, and Ethics.
• Human Genome Project passes midpoint.
1999
• First Human Chromosome Completely Sequenced: Chromosome 22.
• HGP advances goal for obtaining a draft sequence of the entire human genome from 2001 to 2000.
2000
• HGP leaders and President Clinton announce the completion of a "working draft" DNA sequence of
the human genome.
• International research consortium publishes the sequence of chromosome 21, the smallest human
chromosome and the second to be completely sequenced.
• DOE researchers announce completion of chromosomes 5, 16, and 19 draft sequence.
• International collaborators publish genome of fruit fly Drosophila melanogaster.
2001
• Human Chromosome 20 Finished: Chromosome 20 is the third chromosome completely sequenced
to the high quality specified by the Human Genome Project.
• Publication of Initial Working Draft Sequence, February 12, 2001
Special issues of Science (Feb. 16, 2001) and Nature (Feb. 15, 2001) contain the working draft of the
human genome sequence. Nature papers include the initial description and analysis of the sequence
generated by the publicly sponsored Human Genome Project, while Science publications focus on
the draft sequence reported by the private company, Celera Genomics. A press conference was held
at 10 a.m., Monday, February 12, 2001, to discuss the landmark publications. Pieter de Jong's team
(now at the Oakland Children's Hospital, Oakland, CA) was a major provider of the BAC libraries
used in the sequencing of the human and several other genomes.
2002
• Mouse Genome Sequencing Consortium publishes its draft sequence of mouse genome in the
December 5, 2002, issue of Nature.
• International consortium led by the DOE Joint Genome Institute publishes draft sequence of Fugu
rubripes.
2003
• Human Chromosome 6 Completed, October 2003.
• Human Chromosome 7 Completed, July 2003.
• Human Chromosome Y Completed, June 2003.
• Human Genome Project Declared Complete, April 2003.
• Human Chromosome 14 Finished –
2004
• Human Chromosome 16 Completed, December 2004.
• Landmark Paper: Finishing the euchromatic sequence of the human genome, Nature, Oct. 21, 2004
• Human Gene Count Estimates Changed to 20,000 to 25,000, October 2004.
• Human Chromosome 5 Completed, September 2004.
• Landmark Paper: Human genome: Quality assessment of the human genome sequence. Nature 429,
365-368 (27 May 2004)
• Human Chromosome 9 Completed, May 2004.
• Human Chromosome 10 Completed, May 2004.
• Human Chromosome 18 Completed, March 2004.
• Human Chromosome 19 Completed, March 2004.
• Human Chromosome 13 Completed, March 2004.
2005
• Human Chromosome 4 Completed, April 2005.
• Human Chromosome 2 Completed, April 2005.
• Human Chromosome X Completed, March 2005.
2006
• Human Chromosome 1 Completed, May 2006.
• Human Chromosome 3 Completed, April 2006.
• Human Chromosome 17 Completed, April 2006.
• Human Chromosome 11 Completed, March 2006.
• Human Chromosome 12 Completed, March 2006.
• Human Chromosome 15 Completed, March 2006.
• Human Chromosome 8 Completed, January 2006.
2008
• Genetic Information Nondiscrimination Act (GINA) Becomes Law, May 2008.
• Landmark Paper: Mapping and sequencing of structural variation from eight human genomes,
Nature, May 1, 2008
Acronyms
BIBLIOGRAPHY
WEBSITES
http://www.ornl.gov/sci/techresources/Human_Genome/project/info.shtml
http://www.genome.gov
http://genomics.energy.gov
http://nih.gov/hgp.htm
http://accessexcellence.com/resource/center/genome/projects.htm
http://www.scq.ubc.ca/wp-content/uploads/2006/08/sequencing2.gif
http://bioweb.wku.edu/courses/biol350/DNAsequencing13/Images/DNA_sequence.gif
http://www.bio.miami.edu/~cmallery/255/255hist/mcb4.1.dogma.jpg
BOOKS
Genetics: A Conceptual Approach - Pierce, B. A.
Bioinformatics: Sequence and Genome Analysis - Mount, D. W.
From Genes to Genomes: Concepts and Applications of DNA Technology - Dale.