
Proc. Natl. Acad. Sci. USA
Vol. 90, pp. 4338-4344, May 1993

Review
The human genome project
Maynard V. Olson
Department of Molecular Biotechnology, University of Washington, Seattle, WA 98195

ABSTRACT The Human Genome Project in the United States is now well underway. Its programmatic direction was largely set by a National Research Council report issued in 1988. The broad framework supplied by this report has survived almost unchanged despite an upheaval in the technology of genome analysis. This upheaval has primarily affected physical and genetic mapping, the two dominant activities in the present phase of the project. Advances in mapping techniques have allowed good progress toward the specific goals of the project and are also providing strong corollary benefits throughout biomedical research. Actual DNA sequencing of the genomes of the human and model organisms is still at an early stage. There has been little progress in the intrinsic efficiency of DNA-sequence determination. However, refinements in experimental protocols, instrumentation, and project management have made it practical to acquire sequence data on an enlarged scale. It is also increasingly apparent that DNA-sequence data provide a potent means of relating knowledge gained from the study of model organisms to human biology. There is as yet little indication that the infusion of technology from outside biology into the Human Genome Project has been effectively stimulated. Opportunities in this area remain large, posing substantial technical and policy challenges.

Abbreviations: DOE, Department of Energy; FISH, fluorescence in situ hybridization; NIH, National Institutes of Health; NRC, National Research Council; STS, sequence-tagged site; YAC, yeast artificial chromosome.

In the United States, the Human Genome Project first took clear form in February 1988, with the release of the National Research Council (NRC) report Mapping and Sequencing the Human Genome (1). To a degree remarkable in Federal science policy, this report has had a clear effect on subsequent programmatic activity. With a budget in the current fiscal year of $170 million, jointly administered by the National Institutes of Health (NIH) and the Department of Energy (DOE), a program is underway that conforms closely to the recommendations of the NRC committee. After a 5-year reality check, it is of both scientific and policy interest to examine how the committee's view toward this project has fared in the field.

Background

The human genome is the genetic material in human egg and sperm cells (i.e., germ cells), which contain 3 × 10^9 base pairs (bp) of DNA. Given the four-letter alphabet of DNA (customarily symbolized with the letters G, A, T, and C), the sequence of 3 × 10^9 bp corresponds to 750 megabytes of information. If the sequence of the human genome could be determined, it would be possible to store and manipulate it on a desktop computer. However, even the dream of acquiring DNA sequence on this scale is of recent origin. Dramatic progress was made during the 1950s and 1960s in understanding the mechanisms by which genetic information specifies biological structure and function. However, during this era, the information itself remained nearly inaccessible.

A landmark event in DNA analysis came in 1970 with the discovery of site-specific restriction enzymes (refs. 2 and 3; ref. 4, p. 64). These remarkable enzymes have the ability to scan any source of DNA for every occurrence of a particular string of bases; for example, the enzyme EcoRI recognizes the string GAATTC. Restriction enzymes cleave both strands of the double helix at their recognition sites. Since the cleavage events are directed by the DNA sequence, they always occur at the same positions in different samples of genomic DNA extracted from any genetically homogeneous source (e.g., different tissues of the same individual human or different individuals sampled from an inbred strain of mice).

Restriction enzymes provide a means to develop precise physical maps of DNA simply by determining the coordinates in base pairs of the sites at which particular enzymes cleave (ref. 4, pp. 66-67; ref. 5). Like topographic maps, physical maps of DNA derive their utility through annotation: mapped landmarks provide reference points relative to which functional DNA sequences such as genes can be localized. Restriction enzymes also facilitate a key step in the cut-and-splice procedures by which recombinant-DNA molecules (i.e., DNA clones) are constructed (ref. 4, pp. 73-74 and pp. 99-124).

The importance of recombinant-DNA technology is often attributed primarily to its synthetic dimension. For example, the ability to design and construct a DNA molecule that programs a bacterium to synthesize a mammalian protein provides a route to large amounts of the pure protein. The ability to alter the structure of the protein through site-directed mutagenesis lends genuine novelty to the resultant biosynthetic opportunities. However, the importance of recombinant-DNA technology in making the Human Genome Project feasible stems from its analytical dimensions. Cloning provides a means to purify individual recombinant-DNA molecules from complex mixtures and then to prepare biochemically useful amounts of the molecules by culturing the microbial strains into which they have been introduced.

A less obvious consequence of the discovery of restriction enzymes was the development of the first practical method of genetic mapping in humans (ref. 4, pp. 519-522; ref. 6). Most human cells contain two copies of each DNA sequence, one of maternal and the other of paternal origin. When a new germ cell is produced, it contains only one copy of the genome, a copy that is a unique mosaic of the two genomes from which it was derived. Genetic mapping involves measuring, through actual inheritance studies in families, the probability that two closely spaced segments of the genome will stay together during germ-cell formation. The mapping requires an ability to distinguish between the two copies of the genome present in the somatic cells from which the germ cells are derived. Subtle differences in the base sequence of different instances of the human genome sometimes alter restriction sites and, hence, restriction-fragment sizes. These alterations are detectable even in complex genomes by a method known as gel-transfer hybridization, which was developed in 1975 (ref. 4, pp. 127-130; ref. 7). In 1987, the first global human genetic map, based on "restriction-fragment-length polymorphisms," was published (8).

As to the actual determination of DNA sequence, reasonably efficient methods first appeared in 1977 (ref. 4, pp. 67-69; refs. 9 and 10). A technique known as
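The 750-megabyte figure quoted in the Background section follows from simple arithmetic: a four-letter alphabet carries 2 bits per base, so 3 × 10^9 bp amounts to 6 × 10^9 bits. The following is a minimal sketch of that calculation, assuming 8-bit bytes and decimal megabytes (10^6 bytes); the function name is illustrative and not taken from the article.

```python
import math


def genome_megabytes(base_pairs: int, alphabet_size: int = 4) -> float:
    """Storage needed for a sequence, in decimal megabytes (10**6 bytes)."""
    bits_per_base = math.log2(alphabet_size)  # 2 bits for G, A, T, C
    total_bits = base_pairs * bits_per_base
    return total_bits / 8 / 1e6


print(genome_megabytes(3 * 10**9))  # 750.0
```

The estimate covers one copy of each base pair and ignores any compression or annotation overhead.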
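Restriction mapping, as described in the Background section, can be pictured as a string search: find every occurrence of the recognition string, record the coordinates, and read off the fragment sizes. The sketch below is illustrative only. The two short "alleles" are invented sequences, and the cut offset of 1 (cleavage after the first G of GAATTC on the strand shown) is a parameter supplied here, not a detail given in the article. It also shows how a single base change that abolishes a site alters the fragment sizes, which is the basis of a restriction-fragment-length polymorphism.

```python
def cut_positions(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Return 0-based coordinates at which the given strand is cut."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = seq.find(site, start + 1)
    return positions


def fragment_sizes(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Fragment lengths from a complete digest of a linear molecule."""
    bounds = [0] + cut_positions(seq, site, cut_offset) + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Two hypothetical alleles that differ by one base inside the second site.
allele_1 = "TTTT" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "AAAA"
allele_2 = "TTTT" + "GAATTC" + "CCCCCCCC" + "GAGTTC" + "AAAA"

print(fragment_sizes(allele_1))  # [5, 14, 9]
print(fragment_sizes(allele_2))  # [5, 23]
```

On a gel, the second allele would show one 23-bp fragment where the first shows fragments of 14 and 9 bp.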

chain-termination sequencing came to dominate standard practice: it is based on enzymatic DNA synthesis, carried out in vitro in the presence of artificial chain-terminating variants of the normal DNA-precursor molecules. By the early 1980s individual sequences exceeding 10^5 bp had been determined (11); however, a more common scale of analysis was 10^3-10^4 bp. Most physical mapping was carried out on a similar scale.

Given the gap between the ability to determine 10^5 bp of DNA sequence in a state-of-the-art laboratory and the 10^9-bp size of the human genome, the NRC committee confronted an enormous problem of scale. Partly because of the obvious need for improved technology and also because of a desire to maximize synergy between genome analysis and studies of biological function, the committee recommended against early emphasis on large-scale sequencing of human DNA. Instead, it advocated comprehensive physical and genetic mapping of the human genome, extensive mapping and sequencing of the smaller genomes of several model organisms, and a systematic effort to develop improved sequencing technology.

Principal Aims of the Human Genome Project

More important than the specific mapping and sequencing objectives of the Human Genome Project are three broader aims that are implicit in these goals:
(i) To improve the research infrastructure of human genetics.
(ii) To help establish DNA sequence as the primary interface between knowledge of human biology and knowledge of the biology of model organisms.
(iii) To launch an open-ended effort to improve the analytical biochemistry of DNA.
For the purposes of this review, progress in the Human Genome Project will be examined relative to these three broad aims.

The Research Infrastructure of Human Genetics

In the context of the Human Genome Project, research infrastructure refers to the biological, informational, and methodological tools with which genetics research is carried out. Intensive genetic analysis of any species is heavily dependent on infrastructure. Particularly important are genetic-linkage maps, physical maps of DNA, and characterized DNA clones. The latter are useful as reagents that can be used to assay for particular short segments of the genome by DNA-DNA hybridization (ref. 12, pp. 188-191) and as the starting points for sequence analysis or functional studies.

Human genetics is uniquely dependent on strong research infrastructure. While model organisms have been extensively bred for the specific purpose of facilitating genetic analysis (13), human genetics is limited to the examination of individuals, families, and populations as they are found in contemporary society. Hence, the NRC committee set ambitious goals for the construction of detailed physical and genetic maps of the human genome, as well as organized collections of cloned human DNA. By design, the goals were too ambitious for the technology of 1988. In retrospect, they were so ambitious that they probably would have overwhelmed the basic methodologies on which the NRC report was based. Fortunately, technical advances since 1988 have exceeded all reasonable expectations.

Much of this progress was made possible by the development of the polymerase chain reaction (PCR). PCR, which is essentially a method of in vitro cloning, allows the amplification of specific DNA molecules in vitro through cycles of enzymatic DNA synthesis (ref. 4, pp. 79-85). PCR amplification is dependent on a pair of short, synthetic "primers" (i.e., single-stranded DNA molecules whose ends can be extended by DNA polymerase under the direction of template molecules). The test sample provides the template molecules, and the primers direct the amplification to a particular segment of the template DNA, typically a region only a few hundred base pairs in length. Starting with a minute sample of total human DNA, it is possible to amplify any such region 1 billionfold while leaving the rest of the genome at its original concentration.

Widespread application of the PCR depends on an efficient, automated method for the chemical synthesis of the PCR primers. An approach to DNA synthesis based on phosphoramidite chemistry, which became routine in the early 1980s, meets this need (ref. 4, pp. 69-70; refs. 14-16). The first paper on the PCR appeared in 1985 (17) but received little notice; for example, despite its present prominence in genome analysis, it is not mentioned in the NRC report. The explosive growth of PCR applications began with the publication of an important refinement of the PCR protocol in 1989: the use of a thermostable DNA polymerase (18). This refinement allowed the cycles of DNA synthesis, which are analogous to cellular generations, to be driven by simple thermal cycling with no new addition of reagents at each cycle.

By the end of 1989, it was already apparent that the PCR provided a practical means of abstracting large-scale physical maps away from the particular methods used to construct them. This capability required a new choice of landmarks called sequence-tagged sites (STSs; ref. 19). An STS is simply a short, unique sequence of DNA that can be amplified via the PCR. STSs are ideal landmarks during map construction because of the ease with which they can be detected by PCR assays. Equally important is their role in map representation and map use.

Complex physical maps based on restriction sites are of little value as experimental tools unless they are supported by a collection of clones that can be used to detect particular segments of the mapped DNA via DNA-DNA hybridization assays. Comprehensive maps of the human genome would have to be supported by tens of thousands of clones, each of which would have to be maintained as a separate microbial strain. In contrast, STSs can be described in an electronic data base in a form that makes them experimentally accessible in any laboratory. The most critical aspect of an STS description is the DNA sequence of the two primers. Laboratory implementation of an STS simply requires that the two primers be synthesized and the appropriate temperature-cycling regime be carried out.

Most large-scale physical maps are constructed through the process of "contig building." A contig is an organized set of DNA clones that collectively provide redundant cloned coverage of a region that is too long to clone in one piece (ref. 4, pp. 587-588; refs. 20-22). Typically, the clones have random end points, and the contig is described by specifying the amount of overlap between each clone and its nearest neighbors. A procedure referred to as STS-content mapping provides a convenient method of establishing these overlaps (ref. 4, pp. 610-612; ref. 23). In a step that precedes contig building, the STSs are tested to confirm that they occur in a single copy in the genome; then, if two clones share even a single STS, they can be reliably assumed to overlap.

Although the PCR has had a profound effect on physical mapping, other new developments have also improved the prospects for the construction of large-scale physical maps. One such development has been the introduction of the yeast artificial-chromosome (YAC) cloning system, first described in 1987 (ref. 4, pp. 590-592; ref. 24). YACs allow large segments of DNA to be cloned as linear, artificial chromosomes into the yeast host Saccharomyces cerevisiae. Even some of the earliest YAC clones were 10 times the size of the largest clones that had been constructed previously. Furthermore, the YAC system appears capable of cloning a higher proportion of the genomic DNA of many organisms
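The "1 billionfold" amplification mentioned in the discussion of the PCR is consistent with idealized doubling of the target at every cycle. The sketch below assumes that idealization; the `efficiency` parameter and the function names are introduced here for illustration, and the article itself does not state a cycle count.

```python
import math


def fold_amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Idealized fold increase in target copies after a number of PCR cycles.

    efficiency is the fraction of templates copied per cycle
    (1.0 means perfect doubling).
    """
    return (1.0 + efficiency) ** cycles


def cycles_needed(fold: float, efficiency: float = 1.0) -> int:
    """Smallest whole number of cycles that reaches at least the requested fold."""
    return math.ceil(math.log(fold) / math.log(1.0 + efficiency))


print(fold_amplification(30))  # 1073741824.0
print(cycles_needed(1e9))      # 30
```

Under perfect doubling, roughly 30 cycles suffice; lower per-cycle efficiency raises the count.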
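The rule used in STS-content mapping (two clones that share even one single-copy STS are assumed to overlap) amounts to finding connected groups of clones. The following is a minimal sketch using a union-find structure. The clone and STS names are hypothetical, and the sketch only groups clones into contigs; it does not order clones within a contig or estimate the extent of overlap.

```python
from collections import defaultdict


def build_contigs(clone_sts: dict[str, set[str]]) -> list[set[str]]:
    """Group clones into contigs: clones sharing at least one STS are joined."""
    parent = {clone: clone for clone in clone_sts}

    def find(c: str) -> str:
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    # Index clones by the STSs they contain, then join clones under each STS.
    clones_by_sts = defaultdict(list)
    for clone, stss in clone_sts.items():
        for sts in stss:
            clones_by_sts[sts].append(clone)
    for clones in clones_by_sts.values():
        for other in clones[1:]:
            parent[find(other)] = find(clones[0])

    groups = defaultdict(set)
    for clone in clone_sts:
        groups[find(clone)].add(clone)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical PCR results: which STSs were detected in which YAC clone.
hits = {
    "yac1": {"sts_a", "sts_b"},
    "yac2": {"sts_b", "sts_c"},
    "yac3": {"sts_c"},
    "yac4": {"sts_x", "sts_y"},
    "yac5": {"sts_y"},
}
print(build_contigs(hits))
```

Here yac1, yac2, and yac3 form one contig and yac4 and yac5 another; the gap between the two groups is the kind of discontinuity that supplementary methods must bridge.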

than could be recovered using earlier systems. This point has been most clearly documented during the physical mapping of the genome of the nematode worm Caenorhabditis elegans (25).

By 1989, YAC technology had evolved to the point where specific segments of the human genome could be recovered efficiently in YAC clones (26). Soon thereafter, multi-megabase-pair contigs began to appear (23, 27-29), and, in the fall of 1992, complete YAC-based physical maps of human chromosome 21 (30) and the human Y chromosome (31) were published. In these projects, contig construction was largely by STS-content mapping. There is little doubt that the same technology employed on chromosomes Y and 21, as well as on a large segment of the X chromosome (29), has sufficient power to produce highly connected physical maps of the entire human genome.

Another important advance in physical mapping has been the development of fluorescence in situ hybridization (FISH) into a routine procedure. This technique employs DNA probes that can detect segments of the human genome by DNA-DNA hybridization on samples of lysed metaphase cells prepared under conditions that preserve the morphology of the condensed human chromosomes. Attachment of fluorescent molecules to the probe DNA allows visualization in the light microscope of the position on a chromosome to which the probe binds. The technique is a refinement of previous in situ hybridization methods that depended on radiolabeling of the probes and autoradiographic detection (32). The increases in convenience, reliability, and resolution that have accompanied nonisotopic detection have transformed the role of in situ hybridization in physical mapping. The first nonisotopic visualization of single-copy sequences in human chromosomes by in situ hybridization was published in 1985 (33). Fluorescence detection of single-copy sequences was introduced in 1987 (34), after which applications expanded rapidly (35-37).

FISH contributes to two aspects of long-range physical mapping. First, it allows individual clones to be mapped at a coarse level long before contig building is complete, thereby providing reagents of immediate use in the analysis of targeted regions. Second, contig maps have discontinuities whenever a site in the genome is missing from the available clone collections. FISH provides a way to order and orient contigs along a chromosome even when occasional discontinuities exist. Early efforts to construct physical maps of human chromosomes, which depended on cosmid clones that are propagated in Escherichia coli, yielded relatively small contigs separated by discontinuities (38, 39). YAC-based methods have led to greatly improved continuity, but the need remains for supplementary methods to define the order and orientation of disconnected contigs along chromosomes.

Radiation-hybrid mapping, which involves fragmentation of chromosomes in cultured cells with high doses of x-rays followed by incorporation of the fragments into stable cell lines, provides still another solution to this problem (ref. 4, pp. 608-609; ref. 40). Current protocols for radiation-hybrid mapping are notable for their abandonment of the traditional goal of isolating a single short segment of the human genome in each rodent cell line. Nearly all of the radiation-hybrid lines produced by these protocols contain many unrelated segments of the human genome. Proximity of two STSs, or other markers, is inferred by statistical analysis of the pattern in which they occur in a large collection of cell lines. Closely spaced markers have a higher probability of occurring together in the same cell line than do pairs of markers that are on different chromosomes or are far apart on the same chromosome.

While the PCR, together with such new techniques as YAC cloning, FISH, and radiation-hybrid mapping, has led to a surge of success in physical mapping, PCR-based methods have also transformed genetic mapping. In particular, the PCR has allowed development of a new class of genetic markers that have a particularly high probability of existing in alternate forms in different instances of the human genome.

These markers are based on short, repetitive DNA sequences that are widely distributed in the human genome. A particularly common motif is ...(CA)n.... At sites where this motif occurs, n, the number of repetitions of the dinucleotide CA, is highly variable from one instance of the human genome to the next (41, 42). Different values of n lead to PCR-amplification products of different lengths when the entire ...(CA)n... tract is amplified by using primers that flank the repeat; these differences are readily detected by gel electrophoresis. An attractive feature of PCR-detectable genetic markers is that they are simply a special type of STS. As such, they can be readily included as landmarks in physical maps, as well as genetic maps, thereby providing a simple method of interrelating these two types of maps. Many PCR-detectable genetic markers have been integrated into preexisting maps of the human, greatly improving these maps (43). Still more recently, a human genetic map that is completely based on PCR-detectable markers has been constructed (44). Markers of the same type have also transformed genetic mapping in the mouse, whose genome is the same size as that of the human (45).

A key test of the effectiveness of the infrastructure-building features of the Human Genome Project is the extent to which its components are being used even before genome-wide physical maps are available. The most critical test involves projects directed at the "positional cloning" of genes associated with heritable diseases. Positional cloning is a strategy that was developed during the 1980s to allow determination of the biochemical basis of the many heritable diseases whose analysis has resisted the more traditional approach of direct biochemical analysis of diseased tissue (46). In general, the biochemical analysis of diseased tissue is rarely effective unless the genetic defect alters a protein whose metabolic role in normal tissue is already understood. Few of the heritable diseases that cause mental retardation, psychosis, congenital malformation, malignant tumors, and other similarly complex effects meet this criterion.

The first step in positional cloning is to localize the "disease" gene by carrying out genetic mapping studies on families with multiple affected members. Studies of the coinheritance of the disease with genetically mapped DNA markers allow determination of the position of the gene in the genome. Actual biochemical identification of the gene still remains a formidable task since the resolution of genetic maps in the human is rarely better than 1 megabase pair (Mbp). Physical mapping and functional studies on the cloned DNA are required to find the gene within the candidate region.

Better physical mapping methods, particularly the combination of YAC cloning and FISH analysis, have improved the prospects for positional cloning. An exemplary baseline case, published just before either of these techniques became widely available, is cystic fibrosis. Final success in the positional cloning of the cystic fibrosis gene required heroic physical mapping efforts that never achieved any semblance of continuous cloned coverage of the candidate region (47). Piecemeal cloning and mapping proved adequate only because the gene was large and in a gene-poor region of the genome. Subsequent successes with a series of disease genes reveal the influence of improved techniques. Examples in which YACs, FISH, or both figured prominently include the following: fragile-X syndrome (48-50), the most common heritable form of mental retardation; familial adenomatous polyposis (51, 52), a heritable form of colorectal cancer; myotonic dystrophy (53), an adult-onset disease that affects muscle function; Kallmann syndrome (54, 55), a defect in neuronal development; Lowe syndrome (56), a developmental defect affecting the lens, brain, and kidney; and Menkes disease
(57-59), a neurological disease that is lethal in early childhood.

The genes for many other heritable diseases are now under analysis by similar techniques. Particularly impressive is progress on the genetic mapping of diseases such as familial breast and ovarian cancer (60, 61) and early-onset familial Alzheimer disease (62). The genetic analysis of these diseases is complicated by a set of factors that will be encountered increasingly often as positional cloning is applied to complex, adult-onset genetic disorders: suitable families are rare, small, and incomplete (i.e., few grandparents, parents, or siblings of the affected individuals are available); even family members that remain disease-free throughout a normal life span cannot be reliably categorized as unaffected since they may have died from other causes before disease developed; the disease is common enough in the general population that cases with genetic and nongenetic causes occur frequently in the same family. Highly informative genetic markers, such as the PCR-detectable CA-repeat polymorphisms, have helped address these problems since they maximize the likelihood that the segment of the chromosome that bears the disease-causing mutation can be tracked reliably from one generation to the next even when there are many family members that are missing or must be excluded from the study because of their uncertain disease status.

In addition to improved genetic mapping, successful completion of these projects may require further advances in physical mapping and sequencing. Because of the difficulty of the genetic analysis, it is unlikely that disease genes such as the recently described one for early-onset familial Alzheimer disease on chromosome 14 (62) will be localized even to within 1 Mbp by genetic mapping. Thus, its isolation will place great demands on physical mapping resources and techniques for locating genes within cloned DNA. The case of Huntington disease, for which ample family resources are available, is instructive: the gene was genetically mapped to a position near the end of the short arm of chromosome 4 in 1983 (63) but has not yet been identified in cloned DNA. Its position, even now, is known only to within 2.5 Mbp (64).

Still another class of disease genes whose analysis has benefited from new infrastructure are genes whose disruption in somatic cells causes cancer. Particularly in leukemias and lymphomas, a common mechanism by which disease-causing mutations arise is translocation, a process of chromosome breakage and rejoining. The combination of YACs and FISH analysis has simplified the mapping of the chromosomal breakpoints and allowed the isolation of genes whose disruption is the initiating event in several forms of neoplasia (65, 66).

DNA Sequence as an Interface Between Knowledge of Human Biology and Knowledge of the Biology of Model Organisms

Central to the NRC committee's recommendations, which emphasized the importance of sequencing the genomes of model organisms, was the belief that DNA sequence offers a potent means of interrelating diverse aspects of biological knowledge. Events during the past 5 years have strongly reinforced this concept.

Particularly remarkable is the ability of DNA-sequence data to call attention to similarities between biological phenomena that are superficially unrelated. A typical example involves the successful transfer of information from the study of yeast mating to diverse areas of human biology. Yeast cells have two mating types, commonly referred to as a and α. In the yeast life cycle, a and α cells are the rough counterparts of mammalian germ cells. The yeast counterpart of fertilization involves the fusion of a and α cells, a process that is partly mediated by two peptide hormones, a factor and α factor. These hormones are named after the cell type that secretes them. They trigger a series of changes in the opposite cell type that prepare the cell for fusion.

The mechanisms through which a factor and α factor are synthesized and secreted have been studied in detail by genetic techniques that are particularly well developed in yeast (67). A peculiar feature of a-factor secretion is its independence of the pathway through which yeast proteins are normally secreted. A particular gene, STE6, encodes a protein that allows a factor to leave the cell while bypassing the normal secretory pathway. Sequence analysis of STE6 revealed unmistakable similarity to the human gene mdr1 (68, 69). This gene has attracted interest because of its involvement in multiple-drug resistance, a phenomenon in which malignant cells become simultaneously resistant to several of the most commonly used chemotherapeutic agents (70). Once the relatedness of STE6 and mdr1 had been established by sequence comparison, it was quickly shown by gene-transfer experiments that the mouse version of the mdr1 gene will actually substitute in yeast for the function of STE6, correcting the inability of yeast cells with mutations in the STE6 gene to secrete a factor (71). The availability of the yeast system opens up a powerful new front for the study of this poorly understood transport mechanism.

Studies of α-factor biosynthesis have proven equally productive in providing insights into human metabolism. While the secretion of α factor follows the normal pathway, its biosynthesis has a feature that is unusual in yeast but relatively common in human cells: it is produced by proteolytic processing of a precursor peptide at a Lys-Arg linkage. Genetic studies revealed that the gene KEX2 encodes the protease that carries out this processing step. Comparison of the sequence of KEX2 with all other known DNA sequences revealed strong similarity to a human gene of previously unknown function, c-fur (72). Subsequent analysis showed that c-fur is a member of a family of human genes that encode proteases that process precursors to many important proteins and peptides including insulin, nerve growth factor, bone morphogenetic protein, and a major component of the AIDS virus (73). There is a long history of direct, biochemical efforts to identify these proteases because of their potential interest as pharmacological targets. These efforts led to the description of a whole series of proteases that are capable of cleaving Lys-Arg linkages under particular in vitro conditions but that serve other functions in vivo.

These two examples illustrate the strength of the concept, which is fundamental to the Human Genome Project, that DNA sequence provides the key to efficient knowledge transfer between model organisms and human biology. At present, this process requires considerable serendipity, only because available DNA-sequence data on the genomes of both the human and the major model organisms are fragmentary. There has been enough progress on the sequencing of the genomes of E. coli (74), S. cerevisiae (75), and the nematode worm C. elegans (76) to indicate the value of systematic genomic sequencing. However, the real work of determining complete sequences for the genomes of these model organisms still lies ahead.

In humans, the main new source of systematic data has come from the sequencing of cDNAs, cloned DNA copies of the messenger RNA (mRNA) molecules that actually direct protein synthesis (ref. 4, pp. 102-104; ref. 77). This method is a cost-effective way of discovering new human genes because only a small fraction of genomic DNA directly codes for proteins. However, cDNA sequencing is unlikely to replace genomic sequencing as the definitive method of characterizing the complete set of human genes for several reasons: genes contain critical DNA sequences that regulate their expression but are not included in the mRNA; there are common instances both in which one gene produces multiple, substantially different mRNA molecules and in which multiple genes produce nearly identical mRNA molecules, situations that are difficult to sort out

without detailed knowledge of the structures of the corresponding genes and gene families; the cost advantages of cDNA sequencing erode when the goal is accurate, full-length sequences rather than one-pass, partial sequences; no adequate solution has been found to the problem that the mRNA products of different genes are present at widely different concentrations, which vary dramatically in different tissues, different developmental stages, and different metabolic states.

Open-Ended Improvements in the Analytical Biochemistry of DNA

The promise of DNA-sequence comparison as a fundamental tool in biological research emphasizes the need for progressively better methods of DNA analysis, particularly DNA sequencing. A critical feature of this challenge is its open-ended nature. DNA sequencing is a technology, like digital computing, for which there is no obvious point at which further improvements would saturate potential applications. A basic misimpression about the Human Genome Project is that once its narrow goals are met, demands for large-scale DNA sequencing will taper off. DNA sequence data are basically a source of hypotheses, the rigorous testing of which typically requires the acquisition of still more DNA sequence. The determination of a "reference" human sequence will provide a strong incentive to trace genes through evolution with finer grain than the E. coli/yeast/worm/fly/mouse/human comparisons on which the NRC committee recommended early emphasis. Finally, the study of individual variation, which plays a central role both in biology and in medicine, poses unbounded demands for DNA-sequence data.

Juxtaposed to this open-ended need for improvements in the efficiency of DNA sequencing is the reality that there has been no obvious increase in the basic efficiency of DNA sequencing during the past decade. The protocols have become more robust, and the skill level required for success has been lowered. Fluorescence-based methods with real-time detection of the products of DNA-sequencing reactions during electrophoresis have eased laboratory management of large projects and decreased the subjectivity of data interpretation (ref. 4, pp. 595-598; ref. 78). Hence, the practicality of large projects is greater now than it was a decade ago. However, it is not apparent that there has been any change in either the efficiency or the accuracy with which an expert DNA sequencer can gather data.

The NRC committee recognized this problem but was overoptimistic about its resolution (ref. 1, p. 2):

"The technical problems associated with mapping and sequencing the human and other genomes are sufficiently great that a scientifically sound program requires a diversified, sustained effort to improve our ability to analyze complex DNA molecules .... Prospects are ... good that the required advanced DNA technologies would emerge from a focused effort that emphasizes pilot projects and technological development."

NIH and DOE have both made vigorous efforts to steer a significant portion of the project in the recommended directions. However, there is little indication that decisive research momentum has developed in the technology of DNA sequencing. It can be argued that the problem is predominantly cultural rather than technical, relating to the different value systems and research emphases of molecular genetics, on the one hand, and analytical chemistry, applied physics, and engineering, on the other (79).

An illustration of the magnitude of the technical challenge is provided by the gap between the theoretical and actual output of the current generation of DNA-sequencing instruments. Standard commercial instruments now have the capacity to produce ~30 kilobase pairs (kbp) of raw sequence data per day (78). Allowing for the desirability of determining the sequence of the two redundant strands of DNA independently and for some oversampling of data for each strand, a ratio of raw sequence data to finished data of 5:1 should be achievable. Hence, a single instrument should be capable of producing 6 kbp of finished sequence per day, or ≈2 Mbp per year. In reality, no genome center has yet produced even 1 Mbp of contiguous, finished sequence per year even though such centers typically have many sequencing instruments.

This paradox reflects the present impossibility of integrating all the steps in DNA sequencing into a continuous process that fully utilizes even the capabilities of current sequencing instruments. Although this experience is universal among DNA sequencing laboratories, there is little consensus about which steps in the process are rate limiting, much less what should be done to improve them.

What is clear is that there is a dramatic gap between the advanced biological technologies of molecular genetics and the primitive nonbiological technologies. The latter include the physical manipulation of samples, methods of chemical and physical analysis, process design, quality control, and information handling. These areas are all critical to efforts to scale up bench-top molecular genetics, and most biologists are poorly trained to make the needed innovations. The Human Genome Project has stimulated increased interactions between biologists and scientists and technologists who have the necessary expertise to solve these problems. However, the difficulties of translating these beginnings into major improvements in DNA analysis continue to pose substantial policy challenges.

Conclusions

For an effort that is only in its third year of substantial funding, the Human Genome Project in the United States is making good progress toward its central goals. The policy on which it was based has proven farsighted even in the face of rapid technological change. In the mapping goals of the project, which have dominated the first years, the experimental methods that are leading to success have diverged widely from those extant when the NRC report was issued. Nonetheless, the report's conceptual framework has survived with little alteration.

Examples abound of biological advances that have benefited directly from the early activities of the Human Genome Project. Precise tracking of the cause-and-effect relationships between activities funded through the Human Genome Project in the United States and specific biological advances is neither possible nor desirable. Human genome analysis is a loosely coordinated international endeavor to which funding agencies and scientists in many countries have already made important contributions. Vigorous research activity funded through other Federal programs, private agencies, and industry has also had a major impact. Nonetheless, NIH and DOE programmatic efforts, particularly through their productive investment in YACs, FISH, PCR-detectable DNA polymorphisms, and radiation-hybrid mapping, have clearly achieved good progress toward the mapping goals of the NRC report and also contributed directly to the success of many other research projects in the biomedical sciences.

Like other human endeavors, the Human Genome Project has succeeded best when it has aligned itself with broader trends. Examples include its increasing reliance on PCR, yeast genetics, and fluorescence microscopy. It has succeeded least when it has tried to establish new trends such as the importation of high technology from other areas into biology. This tension is healthy and will undoubtedly remain as the project focuses increased attention on its flagship goal of determining the sequence of the >90% of the human genome about which we still know almost nothing.

Note Added in Proof. The gene that is mutated
in Huntington disease has now been identified (80).

1. National Research Council (1988) Mapping and Sequencing the Human Genome (Natl. Acad. Press, Washington, DC).
2. Smith, H. O. & Wilcox, K. W. (1970) J. Mol. Biol. 51, 379-391.
3. Smith, H. O. (1979) Science 205, 455-462.
4. Watson, J. D., Gilman, M., Witkowski, J. & Zoller, M. (1992) Recombinant DNA (Freeman, New York), 2nd Ed.
5. Nathans, D. (1979) Science 206, 903-909.
6. Botstein, D., White, R. L., Skolnick, M. & Davis, R. W. (1980) Am. J. Hum. Genet. 32, 314-331.
7. Southern, E. M. (1975) J. Mol. Biol. 98, 503-517.
8. Donis-Keller, H., Green, P., Helms, C., Cartinhour, S., Weiffenbach, B. et al. (1987) Cell 51, 319-337.
9. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
10. Maxam, A. M. & Gilbert, W. (1977) Proc. Natl. Acad. Sci. USA 74, 560-564.
11. Baer, R., Bankier, A. T., Biggin, M. D., Deininger, P. L., Farrell, P. J., Gibson, T. J., Hatfull, G., Hudson, G. S., Satchwell, S. C., Seguin, C., Tuffnell, P. S. & Barrell, B. G. (1984) Nature (London) 310, 207-211.
12. Alberts, B., Bray, D., Lewis, J., Raff, M., Roberts, K. & Watson, J. D. (1989) Molecular Biology of the Cell (Garland, New York), 2nd Ed.
13. Fink, G. R. (1988) Genetics 118, 549-550.
14. Beaucage, S. L. & Caruthers, M. H. (1981) Tetrahedron Lett. 22, 1859-1862.
15. Caruthers, M. H. (1985) Science 230, 281-285.
16. Hunkapiller, M., Kent, S., Caruthers, M., Dreyer, W., Firca, J., Giffin, C., Horvath, S., Hunkapiller, T., Tempst, P. & Hood, L. (1984) Nature (London) 310, 105-111.
17. Saiki, R. K., Scharf, S., Faloona, F., Mullis, K. B., Horn, G. T., Erlich, H. A. & Arnheim, N. (1985) Science 230, 1350-1354.
18. Saiki, R. K., Gelfand, D. H., Stoffel, S., Scharf, S. J., Higuchi, R., Horn, G. T., Mullis, K. B. & Erlich, H. A. (1988) Science 239, 487-491.
19. Olson, M., Hood, L., Cantor, C. & Botstein, D. (1989) Science 245, 1434-1435.
20. Coulson, A., Sulston, J., Brenner, S. & Karn, J. (1986) Proc. Natl. Acad. Sci. USA 83, 7821-7825.
21. Olson, M. V., Dutchik, J. E., Graham, M. Y., Brodeur, G. M., Helms, C., Frank, M., MacColin, M., Scheinman, R. & Frank, T. (1986) Proc. Natl. Acad. Sci. USA 83, 7826-7830.
22. Kohara, Y., Akiyama, K. & Isono, K. (1987) Cell 50, 495-508.
23. Green, E. D. & Olson, M. V. (1990) Science 250, 94-98.
24. Burke, D. T., Carle, G. F. & Olson, M. V. (1987) Science 236, 806-812.
25. Coulson, A., Kozono, Y., Lutterbach, B., Shownkeen, R., Sulston, J. & Waterston, R. (1991) BioEssays 13, 413-417.
26. Brownstein, B. H., Silverman, G. A., Little, R. D., Burke, D. T., Korsmeyer, S. J., Schlessinger, D. & Olson, M. V. (1989) Science 244, 1348-1351.
27. Anand, R., Ogilvie, D. J., Butler, R., Riley, J. H., Finniear, R. S., Powell, S. J., Smith, J. C. & Markham, A. F. (1991) Genomics 9, 124-130.
28. Silverman, G. A., Jockel, J. I., Domer, P. H., Mohr, R. M., Taillon-Miller, P. & Korsmeyer, S. J. (1991) Genomics 9, 219-228.
29. Little, R. D., Pilia, G., Johnson, S., D'Urso, M. & Schlessinger, D. (1992) Proc. Natl. Acad. Sci. USA 89, 177-181.
30. Chumakov, I., Rigault, P., Guillou, S., Ougen, P., Billaut, A. et al. (1992) Nature (London) 359, 380-387.
31. Foote, S., Vollrath, D., Hilton, A. & Page, D. C. (1992) Science 258, 60-66.
32. Gall, J. G. & Pardue, M. L. (1969) Proc. Natl. Acad. Sci. USA 63, 378-383.
33. Landegent, J. E., Jansen in de Wal, N., van Ommen, G.-J. B., Baas, F., de Vijlder, J. J. M., van Duijn, P. & van der Ploeg, M. (1985) Nature (London) 317, 175-177.
34. Landegent, J. E., Jansen in de Wal, N., Dirks, R. W., Baas, F. & van der Ploeg, M. (1987) Hum. Genet. 77, 366-370.
35. Lawrence, J. B., Villnave, C. A. & Singer, R. H. (1988) Cell 52, 51-61.
36. Trask, B., Pinkel, D. & van den Engh, G. (1989) Genomics 5, 710-717.
37. Lichter, P., Tang, C.-J. C., Call, K., Hermanson, G., Evans, G. A., Housman, D. & Ward, D. (1990) Science 247, 64-69.
38. Stallings, R. L., Torney, D. C., Hildebrand, C. E., Longmire, J. L., Deaven, L. L., Jett, J. H., Doggett, N. A. & Moyzis, R. K. (1990) Proc. Natl. Acad. Sci. USA 87, 6218-6222.
39. Tynan, K., Olsen, A., Trask, B., de Jong, P., Thompson, J., Zimmermann, W., Carrano, A. & Mohrenweiser, H. (1992) Nucleic Acids Res. 20, 1629-1636.
40. Cox, D. R., Burmeister, M., Price, E. R., Kim, S. & Myers, R. M. (1990) Science 250, 245-250.
41. Weber, J. L. & May, P. E. (1989) Am. J. Hum. Genet. 44, 388-396.
42. Litt, M. & Luty, J. A. (1989) Am. J. Hum. Genet. 44, 397-401.
43. NIH/CEPH Collaborative Mapping Group (1992) Science 258, 67-86.
44. Weissenbach, J., Gyapay, G., Dib, C., Vignal, A., Morissette, J., Millasseau, P., Vaysseix, G. & Lathrop, M. (1992) Nature (London) 359, 794-801.
45. Dietrich, W., Katz, H., Lincoln, S. E., Shin, H. S., Friedman, J., Dracopoli, N. C. & Lander, E. S. (1992) Genetics 131, 423-447.
46. Collins, F. S. (1992) Nature Genet. 1, 3-6.
47. Rommens, J. M., Iannuzzi, M. C., Kerem, B. S., Drumm, M. L., Melmer, G., Dean, M., Rozmahel, R., Cole, J. L., Kennedy, D., Hidaka, N., Zsiga, M., Buchwald, M., Riordan, J. R., Tsui, L. C. & Collins, F. S. (1989) Science 245, 1059-1065.
48. Verkerk, A. J. M. H., Pieretti, M., Sutcliffe, J. S., Fu, Y. H., Kuhl, D. P. A. et al. (1991) Cell 65, 905-914.
49. Kremer, E. J., Pritchard, M., Lynch, M., Yu, S., Holman, K., Baker, E., Warren, S. T., Schlessinger, D., Sutherland, G. R. & Richards, R. I. (1991) Science 252, 1711-1714.
50. Oberle, I., Rousseau, F., Heitz, D., Kretz, C., Devys, D., Hanauer, A., Boue, J., Bertheas, M. F. & Mandel, J. L. (1991) Science 252, 1097-1102.
51. Kinzler, K. W., Nilbert, M. C., Su, L. K., Vogelstein, B., Bryan, T. M. et al. (1991) Science 253, 661-665.
52. Groden, J., Thliveris, A., Samowitz, W., Carlson, M., Gelbert, L. et al. (1991) Cell 66, 589-600.
53. Fu, Y. H., Pizzuti, A., Fenwick, R. G., Jr., King, J., Rajnarayan, S., Dunne, P. W., Dubel, J., Nasser, G. A., Ashizawa, T., de Jong, P., Wieringa, B., Korneluk, R., Perryman, M. B., Epstein, H. F. & Caskey, C. T. (1992) Science 255, 1256-1258.
54. Legouis, R., Hardelin, J. P., Levilliers, J., Claverie, J. M., Compain, S., Wunderle, V., Millasseau, P., Le Paslier, D., Cohen, D., Caterina, D., Bougueleret, L., Delemarre-Van de Waal, H., Lutfalla, G., Weissenbach, J. & Petit, C. (1991) Cell 67, 423-435.
55. Franco, B., Guioli, S., Pragliola, A., Incerti, B., Bardoni, B., Tonlorenzi, R., Carrozzo, R., Maestrini, E., Pieretti, M., Taillon-Miller, P., Brown, C. J., Willard, H. F., Lawrence, C., Persico, M. G., Camerino, G. & Ballabio, A. (1991) Nature (London) 353, 529-536.
56. Attree, O., Olvios, I. M., Okabe, I., Bailey, L., Nelson, D. L., Lewis, R. A., McInnes, R. R. & Nussbaum, R. L. (1992) Nature (London) 358, 239-242.
57. Vulpe, C., Levinson, B., Whitney, S., Packman, S. & Gitschier, J. (1993) Nature Genet. 3, 7-13.
58. Chelly, J., Tumer, Z., Tonnesen, T., Petterson, A., Ishikawa-Brush, Y., Tommerup, N., Horn, N. & Monaco, A. P. (1993) Nature Genet. 3, 14-19.
59. Mercer, J. F. B., Livingston, J., Hall, B., Paynter, J. A., Begy, C., Chandrasekharappa, S., Lockhart, P., Grimes, A., Bhave, M., Siemieniak, D. & Glover, T. W. (1993) Nature Genet. 3, 20-25.
60. Hall, J. M., Lee, M. K., Newman, B., Morrow, J. E., Anderson, L. A., Huey, B. & King, M.-C. (1990) Science 250, 1684-1689.
61. Hall, J. M., Friedman, L., Guenther, C., Lee, M. K., Weber, J. L., Black, D. M. & King, M.-C. (1992) Am. J. Hum. Genet. 50, 1235-1242.
62. Schellenberg, G. D., Bird, T. D., Wijsman, E. M., Orr, H. T., Anderson, L., Nemens, E., White, J. A., Bonnycastle, L., Weber, J. L., Elisa Alonso, M., Potter, H., Heston, L. L. & Martin, G. M. (1992) Science 258, 668-671.
63. Gusella, J. F., Wexler, N. S., Conneally, P. M., Naylor, S. L., Anderson, M. A., Tanzi, R. E., Watkins, P. C., Ottina, K., Wallace, M. R., Sakaguchi, A. Y., Young, A. B., Shoulson, I., Benilla, E. & Martin, J. B. (1983) Nature (London) 306, 234-238.
64. Davies, K. (1992) Nature (London) 357, page before 95.
65. Diabali, M., Selleri, L., Parry, P., Bower, M., Young, B. D. & Evans, G. A. (1992) Nature Genet. 2, 113-118.
66. Ziemin-van der Poel, S., McCabe, N. R., Gill, H. J., Espinosa, R., III, Patel, Y., Harden, A., Rubinelli, P., Smith, S. D., LeBeau, M. M., Rowley, J. D. & Diaz, M. O. (1991) Proc. Natl. Acad. Sci. USA 88, 10735-10739.
67. Botstein, D. & Fink, G. R. (1988) Science 240, 1439-1443.
68. Kuchler, K., Sterne, R. E. & Thorner, J. (1989) EMBO J. 8, 3973-3984.
69. McGrath, J. P. & Varshavsky, A. (1989) Nature (London) 340, 400-404.
70. Gottesman, M. M. & Pastan, I. (1988) J. Biol. Chem. 263, 12163-12166.
71. Raymond, M., Gros, P., Whiteway, M. & Thomas, D. Y. (1992) Science 256, 232-234.
72. Fuller, R. S., Brake, A. J. & Thorner, J. (1989) Science 246, 482-486.
73. Barr, P. J. (1991) Cell 66, 1-3.
74. Daniels, D. L., Plunkett, G., III, Burland, V. & Blattner, F. R. (1992) Science 257, 771-778.
75. Oliver, S. G., van der Aart, Q. J. M., Agostoni-Carbone, M. L., Aigle, M., Alberghina, L. et al. (1992) Nature (London) 357, 38-46.
76. Sulston, J., Du, Z., Thomas, K., Wilson, R., Hillier, L., Staden, R., Halloran, N., Green, P., Thierry-Mieg, J., Qiu, L., Dear, S., Coulson, A., Craxton, M., Durbin, R., Berks, M., Metzstein, M., Hawkins, T., Ainscough, R. & Waterston, R. (1992) Nature (London) 356, 37-41.
77. Adams, M. D., Kelley, J. M., Gocayne, J. D., Dubnick, M., Polymeropoulos, M. H., Xiao, H., Merril, C. R., Wu, A., Olde, B., Moreno, R. F., Kerlavage, A. R., McCombie, W. R. & Venter, J. C. (1991) Science 252, 1651-1656.
78. Hunkapiller, T., Kaiser, R. J., Koop, B. F. & Hood, L. (1991) Science 254, 59-67.
79. Olson, M. V. (1991) Anal. Chem. 63, 416A-420A.
80. The Huntington's Disease Collaborative Research Group (1993) Cell 72, 971-983.
