Professional Documents
Culture Documents
Human Molecular Genetics, Fourth Edition True PDF (PDFDrive) PDF
Human Molecular Genetics, Fourth Edition True PDF (PDFDrive) PDF
Human Molecular Genetics, Fourth Edition True PDF (PDFDrive) PDF
ATTCACACTTGAGTAACTC
4 EDITION
TH
450 460
46 0 470
470 480 490
49 0 500
500 510 520
450 460
0 470
4 480 490
490 500
5 00 510 520
www.garlandscience.com/
I S B N 978-0-8153-4149-9
9 780815 341499
TOM STRACHAN AND ANDREW READ
HMG4_FINAL.indd 1 11/03/2010 11:18
This page is intentionally left blank.
HUMAN
MOLECULAR
GENETICS 4TH EDITION
This page is intentionally left blank.
HUMAN
MOLECULAR
GENETICS 4TH EDITION
Tom Strachan is Scientific Director of the Institute of Human Andrew Read is Emeritus Professor of Human Genetics at
Genetics and Professor of Human Molecular Genetics at Newcastle Manchester University, UK and a Fellow of the Academy of
University, UK, and is a Fellow of the Academy of Medical Sciences Medical Sciences. Andrew has been particularly concerned with
and a Fellow of the Royal Society of Edinburgh. Tom’s early research making the benefits of DNA technology available to people with
interests were in multigene family evolution and interlocus genetic problems. He established one of the first DNA diagnostic
sequence exchange, notably in the HLA and 21-hydroxylase gene laboratories in the UK over 20 years ago (it is now one of two
clusters. While pursuing the latter, he became interested in medical National Genetics Reference Laboratories), and was founder
genetics and disorders of development. His most recent research chairman of the British Society for Human Genetics, the main
has focused on developmental control of the cohesin regulators professional body in this area. His own research is on the molecular
Nipbl and Mau-2. pathology of various hereditary syndromes, especially hereditary
hearing loss.
Tom Strachan and Andrew Read were recipients of the European
Society of Human Genetics Education Award in 2007.
Dedication
This book is dedicated to Rodney Harris and to the memory of Margaret Weddle
(1943–2009).
This page is intentionally left blank.
vii
Preface
The pace of scientific and technical advance in human genetics has not slack-
ened since our third edition appeared in 2004. This has mandated a thorough
revision and reorganization of Human Molecular Genetics with much of the text
being completely rewritten. While only a few of the basic introductory chapters
retain their identity from the third edition, the aims of the text remain the same:
to provide a framework of principles rather than a list of facts, to provide a bridge
between basic textbooks and the research literature, and to communicate our
continuing excitement about this very fast-moving area of science.
The ‘finished’ human genome reference sequence was published in 2004 and
we are now entering an era where vast DNA sequence datasets will be produced
annually. The game changer is the advent of massively parallel DNA sequencing
which is already transforming how we approach genetics. Single molecule
sequencing will lead to a dramatic reduction in DNA sequencing costs and prom-
ises the ability to sequence a human genome in hours. We can confidently expect
that the genomes of huge numbers of organisms and individuals will have been
completed before the next edition of this book.
Powerful bioinformatics programs are already being pressed into service to
compare our genome with that of a burgeoning number of other organisms.
Comparative genomics is helping us understand the forces that have shaped the
evolution of our genome and that of many model organisms that are so impor-
tant to research and various biomedical applications. These studies have already
been extremely helpful in defining the most highly conserved and presumably
important parts of our genome. They are also helping us to identify the fastest
changing components of our genome and what it is that makes us unique.
Sequence-based transcriptomic analysis will become a major industry. It will
be an important player in our quest to understand human gene function within
the context of large projects, such as the ENCODE project that aims to create an
encyclopedia of DNA elements of known function. Eventually, as vast datasets
are accumulated on gene function, the stage will truly be set for systems biology
to develop.
Other large scale projects such as HapMap have been exploring the range of
genetic variation across the world’s populations. In disease-related research,
genome-wide screening for copy number variants has identified the problems
affecting many individual patients and led to the delineation of new microdele-
tion and microduplication syndromes. Whole exome sequencing is now poised
to explain the causes of many rare recessive conditions. In cancer, the first full
genome sequences of tumors are starting to reveal the landscape of carcinogen-
esis in unprecedented detail.
For common complex diseases, however, the picture is less pleasing. A com-
bination of new science (HapMap) and new technology (high-throughput SNP
genotyping) has finally allowed researchers to identify genetic susceptibility fac-
tors for common diseases, but it has become apparent that the variants revealed
by genome-wide association studies explain only a very small part of the overall
genetic susceptibility to most complex diseases. We are left with a problem—
where is the hidden heritability? Will it be found by large-scale resequencing, or
perhaps might it lie in epigenetic effects?
viii Preface
All these developments have affected both the way genetics research is done,
and the way we think about our genome. Genetics is more than ever about
processing and collating vast amounts of private and public data to extract mean-
ingful patterns. The data have also been forcing us to revise some of our basic
ideas about human genetics. Humans are more variable than we thought, with
copy number variants accounting for more variable nucleotides than SNPs. We
transcribe almost all of our genome, and the old picture of discrete genes thinly
scattered across a sea of junk DNA is starting to look untenable. Cells are now
known to be awash with a startling variety of noncoding RNAs of unknown func-
tion. Perhaps our genome might be primarily an RNA, rather than a protein,
machine.
The fourth edition of Human Molecular Genetics has therefore been heavily
updated in order to maintain its hallmark currency and continue to provide a
framework for understanding this exciting and rapidly advancing subject.
Coverage of epigenetics, noncoding RNAs, and cell biology, including stem cells,
have all been expanded. Greater detail has been provided on the major animal
models used in genetic studies and how they are used as models for human dis-
ease. The most recent developments in next generation sequencing and com-
parative genomics have been included. The text closes by looking at the develop-
ment of therapies to treat human disease. Genetic testing and screening, stem
cells and cell therapy, and personalized medicine are all discussed together with
a balanced view of the ethical issues surrounding these issues.
We would like to thank the staff at Garland Science who have undertaken the
job of converting our drafts into the finished product, Elizabeth Owen, Mary
Purton, David Borrowdale, and Simon Hill, and hope readers will appreciate all
the work they have put into this. As ever we are grateful to our respective families
for their forbearance and support.
TEACHING RESOURCES
The images from the book are downloadable from the web in JPEG and
Powerpoint® formats via the Classwire™ course management system. The
system also provides access to instructional resources for other Garland Science
books. In addition to serving as an online archive of electronic teaching resources,
Classwire allows instructors to build customized websites for their classes. Please
visit www.classwire.com/garlandscience or email science@garland.com for
further information. (Classwire is a trademark of Chalkfree, Inc.)
ix
Acknowledgments
In writing this book we have benefited greatly from the advice of many geneticists
and biologists. We are grateful to many colleagues at Newcastle and Manchester
Universities who commented on some aspects of the text notably: C. Brooks,
K. Bushby, A. Knight, M. Lako, H. Middleton-Price, S. Ramsden, G. Saretzki,
N. Thakker, and A. Wallace. We are also appreciative of help and data provided
by various staff members at the European Bioinformatics Institute notably:
Xose Fernandez, Javier Herrero, Julio Fernandez Banet, Simon White, and Ewan
Birney. In addition, we would like to thank the following for their suggestions in
preparing this edition.
Contents
Detailed Contents
4.3 PRINCIPLES OF CELL SIGNALING 102 Additional recombination and mutation mechanisms
Signaling molecules bind to specific receptors in contribute to receptor diversity in B cells, but
responding cells to trigger altered cell behavior 102 not T cells 130
Some signaling molecules bind intracellular receptors The monospecificity of Igs and TCRs is due to allelic
that activate target genes directly 103 exclusion and light chain exclusion 131
Signaling through cell surface receptors often FURTHER READING 132
involves kinase cascades 105
Signal transduction pathways often use small
intermediate intracellular signaling molecules 106 Chapter 5
Synaptic signaling is a specialized form of cell Principles of Development 133
signaling that does not require the activation of 5.1 AN OVERVIEW OF DEVELOPMENT 134
transcription factors 107
Animal models of development 135
4.4 CELL PROLIFERATION, SENESCENCE, AND
5.2 CELL SPECIALIZATION DURING
PROGRAMMED CELL DEATH 108
DEVELOPMENT 136
Most of the cells in mature animals are non-dividing
cells, but some tissues and cells turn over rapidly 108 Cells become specialized through an irreversible
Mitogens promote cell proliferation by overcoming series of hierarchical decisions 136
braking mechanisms that restrain cell cycle The choice between alternative cell fates often
progression in G1 109 depends on cell position 137
Cell proliferation limits and the concept of cell Sometimes cell fate can be specified by lineage
senescence 111 rather than position 138
Large numbers of our cells are naturally 5.3 PATTERN FORMATION IN DEVELOPMENT 139
programmed to die 111 Emergence of the body plan is dependent on axis
The importance of programmed cell death 112 specification and polarization 139
Apoptosis is performed by caspases in response Pattern formation often depends on gradients of
to death signals or sustained cell stress 113 signaling molecules 141
Extrinsic pathways: signaling through cell Homeotic mutations reveal the molecular basis of
surface death receptors 113 positional identity 142
Intrinsic pathways: intracellular responses to
cell stress 113 5.4 MORPHOGENESIS 144
4.5 STEM CELLS AND DIFFERENTIATION 114 Morphogenesis can be driven by changes in cell
shape and size 144
Cell specialization involves a directed series of
Major morphogenetic changes in the embryo
hierarchical decisions 114
result from changes in cell affinity 145
Stem cells are rare self-renewing progenitor cells 115
Cell proliferation and apoptosis are important
Tissue stem cells allow specific adult tissues to be
morphogenetic mechanisms 145
replenished 115
Stem cell niches 117 5.5 EARLY HUMAN DEVELOPMENT:
Stem cell renewal versus differentiation 117 FERTILIZATION TO GASTRULATION 146
Embryonic stem cells and embryonic germ cells During fertilization the egg is activated to form
are pluripotent 117 a unique individual 146
Origins of cultured embryonic stem cells 118 Cleavage partitions the zygote into many smaller
Pluripotency tests 119 cells 147
Embryonic germ cells 119 Mammalian eggs are among the smallest in the
animal kingdom, and cleavage in mammals is
4.6 IMMUNE SYSTEM CELLS: FUNCTION
exceptional in several ways 148
THROUGH DIVERSITY 119
Only a small percentage of the cells in the early
The innate immune system provides a rapid
mammalian embryo give rise to the mature
response based on general pattern recognition
organism 148
of pathogens 121
Implantation 150
The adaptive immune system mounts highly specific
Gastrulation is a dynamic process by which cells
immune responses that are enhanced by memory
of the epiblast give rise to the three germ layers 151
cells 122
Humoral immunity depends on the activities of 5.6 NEURAL DEVELOPMENT 154
soluble antibodies 123 The axial mesoderm induces the overlying
In cell-mediated immunity, T cells recognize cells ectoderm to develop into the nervous system 155
containing fragments of foreign proteins 125 Pattern formation in the neural tube involves the
T-cell activation 127 coordinated expression of genes along two axes 155
The unique organization and expression of Ig and Neuronal differentiation involves the combinatorial
TCR genes 128 activity of transcription factors 157
Detailed Contents xv
5.7 GERM-CELL AND SEX DETERMINATION IN In phage display, heterologous proteins are
MAMMALS 158 expressed on the surface of phage particles 179
Primordial germ cells are induced in the early Eukaryotic gene expression is performed with
mammalian embryo and migrate to the greater fidelity in eukaryotic cell lines 180
developing gonads 158 Transient expression in insect cells by using
Sex determination involves both intrinsic and baculovirus 181
positional information 159 Transient expression in mammalian cells 181
5.8 CONSERVATION OF DEVELOPMENTAL Stable expression in mammalian cells 181
PATHWAYS 160 6.4 CLONING DNA IN VITRO : THE POLYMERASE
Many human diseases are caused by the failure of CHAIN REACTION 182
normal developmental processes 160 PCR can be used to amplify a rare target DNA
Developmental processes are often highly selectively from within a complex DNA
conserved but some show considerable species population 183
differences 160 The cyclical nature of PCR leads to exponential
FURTHER READING 161 amplification of the target DNA 183
Selective amplification of target sequences depends
Chapter 6 on highly specific binding of primer sequences 184
PCR is disadvantaged as a DNA cloning method
Amplifying DNA: Cell-based DNA Cloning
by short lengths and comparatively low yields
and PCR 163 of product 185
Cell-based DNA cloning 165 A wide variety of PCR approaches have been
The polymerase chain reaction (PCR) 165 developed for specific applications 186
6.1 PRINCIPLES OF CELL-BASED DNA CLONING 165 Allele-specific PCR 187
Manageable pieces of target DNA are joined to Multiple target amplification and whole
vector molecules by using restriction genome PCR methods 187
endonucleases and DNA ligase 166 PCR mutagenesis 188
Basic DNA cloning in bacterial cells uses vectors Real-time PCR (qPCR) 189
based on naturally occurring extrachromosomal FURTHER READING 189
replicons 168
Cloning in bacterial cells uses genetically modified Chapter 7
plasmid or bacteriophage vectors and modified Nucleic Acid Hybridization: Principles and
host cells 169
Applications 191
Transformation is the key DNA fractionation step
in cell-based DNA cloning 171 7.1 PRINCIPLES OF NUCLEIC ACID
Recombinant DNA can be selectively purified after HYBRIDIZATION 192
screening and amplifying cell clones with desired In nucleic acid hybridization a known nucleic acid
target DNA fragments 172 population interrogates an imperfectly
6.2 LARGE INSERT CLONING AND CLONING understood nucleic acid population 192
SYSTEMS FOR PRODUCING SINGLE- Probe–target heteroduplexes are easier to identify
STRANDED DNA 172 after capture on a solid support 193
Early large insert cloning vectors exploited Denaturation and annealing are affected by
properties of bacteriophage l 173 temperature, chemical environment, and the
Large DNA fragments can be cloned in bacterial extent of hydrogen bonding 194
cells by using extrachromosomal low-copy- Stringent hybridization conditions increase the
number replicons 174 specificity of duplex formation 195
Bacterial artificial chromosome (BAC) and The kinetics of DNA reassociation is also dependent
fosmid vectors 174 on the concentration of DNA 196
Bacteriophage P1 vectors and P1 artificial 7.2 LABELING OF NUCLEIC ACIDS AND
chromosomes 175 OLIGONUCLEOTIDES 197
Yeast artificial chromosomes (YACs) enable cloning Different classes of hybridization probe can be
of megabase fragments of DNA 175 prepared from DNA, RNA, and oligonucleotide
Producing single-stranded DNA for use in DNA substrates 197
sequencing and in vitro site-specific mutagenesis 176 Long nucleic acid probes are usually labeled by
M13 vectors 176 incorporating labeled nucleotides during strand
Phagemid vectors 177 synthesis 197
6.3 CLONING SYSTEMS DESIGNED FOR GENE Labeling DNA by nick translation 198
EXPRESSION 178 Random primed DNA labeling 198
Large amounts of protein can be produced by PCR-based strand synthesis labeling 198
expression cloning in bacterial cells 178 RNA labeling 198
xvi Detailed Contents
The discovery of many small RNAs that regulate The protein interactome provides an important
gene expression caused a paradigm shift in gateway to systems biology 399
cell biology 376 Defining nucleic acid–protein interactions is
MicroRNAs as regulators of translation 376 critical to understanding how genes function 401
MicroRNAs and cancer 378 Mapping protein–DNA interactions in vitro 401
Some unresolved questions 378 Mapping protein–DNA interactions in vivo 402
CONCLUSION 378 CONCLUSION 403
FURTHER READING 379 FURTHER READING 404
Chapter 12 Chapter 13
Studying Gene Function in the Human Genetic Variability and Its
Post-Genome Era 381 Consequences 405
12.1 STUDYING GENE FUNCTION: AN OVERVIEW 382 13.1 TYPES OF VARIATION BETWEEN HUMAN
Gene function can be studied at a variety of GENOMES 406
different levels 383 Single nucleotide polymorphisms are numerically
Gene expression studies 383 the most abundant type of genetic variant 406
Gene inactivation and inhibition of gene Both interspersed and tandem repeated sequences
expression 383 can show polymorphic variation 408
Defining molecular partners for gene Short tandem repeat polymorphisms: the
products 383 workhorses of family and forensic studies 408
Genomewide analyses aim to integrate analyses Large-scale variations in copy number are
of gene function 384 surprisingly frequent in human genomes 409
12.2 BIOINFORMATIC APPROACHES TO 13.2 DNA DAMAGE AND REPAIR MECHANISMS 411
STUDYING GENE FUNCTION 384 DNA in cells requires constant maintenance
Sequence homology searches can provide valuable to repair damage and correct errors 411
clues to gene function 385 The effects of DNA damage 413
Database searching is often performed with a DNA replication, transcription, recombination,
model of an evolutionarily conserved sequence 386 and repair use multiprotein complexes that
Comparison with documented protein domains share components 415
and motifs can provide additional clues to Defects in DNA repair are the cause of many human
gene function 388 diseases 415
Complementation groups 416
12.3 STUDYING GENE FUNCTION BY SELECTIVE
GENE INACTIVATION OR MODIFICATION 389 13.3 PATHOGENIC DNA VARIANTS 416
Clues to gene function can be inferred from Deciding whether a DNA sequence change is
different types of genetic manipulation 389 pathogenic can be difficult 416
RNA interference is the primary method for evaluating Single nucleotide and other small-scale changes
gene function in cultured mammalian cells 390 are a common type of pathogenic change 417
Global RNAi screens provide a systems-level Missense mutations 417
approach to studying gene function in cells 392 Nonsense mutations 418
Inactivation of genes in the germ line provides the Changes that affect splicing of the primary
most detailed information on gene function 392 transcript 419
Frameshifts 420
12.4 PROTEOMICS, PROTEIN–PROTEIN
Changes that affect the level of gene expression 421
INTERACTIONS, AND PROTEIN–DNA
Pathogenic synonymous (silent) changes 422
INTERACTIONS 393
Variations at short tandem repeats are occasionally
Proteomics is largely concerned with identifying
pathogenic 422
and characterizing proteins at the biochemical
Dynamic mutations: a special class of
and functional levels 393
pathogenic microsatellite variants 423
Large-scale protein–protein interaction studies
Variants that affect dosage of one or more genes
seek to define functional protein networks 395
may be pathogenic 425
Yeast two-hybrid screening relies on reconstituting
a functional transcription factor 396 13.4 MOLECULAR PATHOLOGY: UNDERSTANDING
Affinity purification–mass spectrometry is widely THE EFFECT OF VARIANTS 428
used to screen for protein partners of a test The biggest distinction in molecular pathology is
protein 397 between loss-of-function and gain-of-function
Suggested protein–protein interactions are often changes 428
validated by co-immunoprecipitation or Allelic heterogeneity is a common feature of
pull-down assays 398 loss-of-function phenotypes 429
xx Detailed Contents
Loss-of-function mutations produce dominant Lod scores of +3 and –2 are the criteria for linkage
phenotypes when there is haploinsufficiency 429 and exclusion (for a single test) 453
Dominant-negative effects occur when a mutated For whole genome searches a genomewide threshold
gene product interferes with the function of the of significance must be used 455
normal product 431 14.4 MULTIPOINT MAPPING 455
Gain-of-function mutations often affect the way in The CEPH families were used to construct marker
which a gene or its product reacts to regulatory framework maps 455
signals 432 Multipoint mapping can locate a disease locus on a
Diseases caused by gain of function of G-protein- framework of markers 456
coupled hormone receptors 433
14.5 FINE-MAPPING USING EXTENDED PEDIGREES
Allelic homogeneity is not always due to a gain of
AND ANCESTRAL HAPLOTYPES 457
function 433
Autozygosity mapping can map recessive conditions
Loss-of-function and gain-of-function mutations in
efficiently in extended inbred families 457
the same gene will cause different phenotypes 433
Identifying shared ancestral chromosome segments
13.5 THE QUEST FOR GENOTYPE–PHENOTYPE allows high-resolution genetic mapping 459
CORRELATIONS 435
14.6 DIFFICULTIES WITH STANDARD LOD SCORE
The phenotypic effect of loss-of-function mutations
ANALYSIS 460
depends on the residual level of gene function 435
Errors in genotyping and misdiagnoses can generate
Genotype–phenotype correlations are especially
spurious recombinants 461
poor for conditions caused by mitochondrial
Computational difficulties limit the pedigrees that
mutations 436
can be analyzed 462
Variability within families is evidence of modifier
Locus heterogeneity is always a pitfall in human
genes or chance effects 437
gene mapping 462
CONCLUSION 438 Pedigree-based mapping has limited resolution 462
FURTHER READING 438 Characters with non-Mendelian inheritance cannot
be mapped by the methods described in this
Chapter 14 chapter 463
Genetic Mapping of Mendelian CONCLUSION 463
Characters 441 FURTHER READING 464
14.1 THE ROLE OF RECOMBINATION IN GENETIC
Chapter 15
MAPPING 442
Recombinants are identified by genotyping parents Mapping Genes Conferring Susceptibility to
and offspring for pairs of loci 442 Complex Diseases 467
The recombination fraction is a measure of the 15.1 FAMILY STUDIES OF COMPLEX DISEASES 468
genetic distance between two loci 442 The risk ratio (l) is a measure of familial clustering 468
Recombination fractions do not exceed 0.5, however Shared family environment is an alternative
great the distance between two loci 445 explanation for familial clustering 469
Mapping functions define the relationship between Twin studies suffer from many limitations 469
recombination fraction and genetic distance 445 Separated monozygotic twins 470
Chiasma counts give an estimate of the total map Adoption studies are the gold standard for
length 446 disentangling genetic and environmental factors 471
Recombination events are distributed non-randomly 15.2 SEGREGATION ANALYSIS 471
along chromosomes, and so genetic map distances Complex segregation analysis estimates the most
may not correspond to physical distances 446 likely mix of genetic factors in pooled family data 471
14.2 MAPPING A DISEASE LOCUS 448 15.3 LINKAGE ANALYSIS OF COMPLEX
Mapping human disease genes depends on genetic CHARACTERS 473
markers 448 Standard lod score analysis is usually inappropriate
For linkage analysis we need informative meioses 449 for non-Mendelian characters 473
Suitable markers need to be spread throughout the Near-Mendelian families 473
genome 449 Non-parametric linkage analysis does not require a
Linkage analysis normally uses either fluorescently genetic model 474
labeled microsatellites or SNPs as markers 450 Identity by descent versus identity by state 474
14.3 TWO-POINT MAPPING 451 Affected sib pair analysis 474
Scoring recombinants in human pedigrees is not Linkage analysis of complex diseases has several
always simple 451 weaknesses 475
Computerized lod score analysis is the best way to Significance thresholds 475
analyze complex pedigrees for linkage between Striking lucky 476
Mendelian characters 452 An example: linkage analysis in schizophrenia 476
Detailed Contents xxi
15.4 ASSOCIATION STUDIES AND LINKAGE Mouse models have a special role in identifying
DISEQUILIBRIUM 477 human disease genes 503
Associations have many possible causes 477 16.2 THE VALUE OF PATIENTS WITH
Association is quite distinct from linkage, except CHROMOSOMAL ABNORMALITIES 504
where the family and the population merge 479 Patients with a balanced chromosomal abnormality
Association studies depend on linkage and an unexplained phenotype provide valuable
disequilibrium 479 clues for research 504
The size of shared ancestral chromosome X–autosome translocations are a special case 505
segments 480 Rearrangements that appear balanced under the
Studying linkage disequilibrium 481 microscope are not always balanced at the
The HapMap project is the definitive study of linkage molecular level 506
disequilibrium across the human genome 481 Comparative genomic hybridization allows a
The use of tag-SNPs 484 systematic search for microdeletions and
15.5 ASSOCIATION STUDIES IN PRACTICE 484 microduplications 507
Early studies suffered from several systematic Long-range effects are a pitfall in disease gene
weaknesses 485 identification 508
The transmission disequilibrium test avoids the 16.3 POSITION-INDEPENDENT ROUTES TO
problem of matching controls 485 IDENTIFYING DISEASE GENES 509
Association can be more powerful than linkage studies A disease gene may be identified through knowing
for detecting weak susceptibility alleles 486 the protein product 509
Case-control designs are a feasible alternative to A disease gene may be identified through the function
the TDT for association studies 487 or interactions of its product 509
Special populations can offer advantages in A disease gene may be identified through an animal
association studies 488 model, even without positional information 511
A new generation of genomewide association (GWA) A disease gene may be identified by using
studies has finally broken the logjam in complex characteristics of the DNA sequence 511
disease research 488
The size of the relative risk 490 16.4 TESTING POSITIONAL CANDIDATE GENES 512
For Mendelian conditions, a candidate gene is
15.6 THE LIMITATIONS OF ASSOCIATION normally screened for mutations in a panel of
STUDIES 491 unrelated affected patients 512
The common disease–common variant hypothesis Epigenetic changes might cause a disease without
proposes that susceptibility factors have ancient changing the DNA sequence 513
origins 491 The gene underlying a disease may not be an
The mutation–selection hypothesis suggests that obvious one 514
a heterogeneous collection of recent mutations Locus heterogeneity is the rule rather than the
accounts for most disease susceptibility 492 exception 514
A complete account of genetic susceptibility will Further studies are often necessary to confirm that
require contributions from both the common the correct gene has been identified 515
disease–common variant and mutation–selection
hypotheses 493 16.5 IDENTIFYING CAUSAL VARIANTS FROM
CONCLUSION 494 ASSOCIATION STUDIES 515
Identifying causal variants is not simple 516
FURTHER READING 495 Causal variants are identified through a combination
of statistical and functional studies 516
Chapter 16 Functional analysis of SNPs in sequences with no
Identifying Human Disease Genes and known function is particularly difficult 518
Susceptibility Factors 497 Calpain-10 and type 2 diabetes 518
Chromosome 8q24 and susceptibility to
16.1 POSITIONAL CLONING 498
prostate cancer 518
Positional cloning identifies a disease gene from
its approximate chromosomal location 499 16.6 EIGHT EXAMPLES OF DISEASE GENE
The first step in positional cloning is to define the IDENTIFICATION 519
candidate region as tightly as possible 500 Case study 1: Duchenne muscular dystrophy 519
The second step is to establish a list of genes in the Case study 2: cystic fibrosis 520
candidate region 500 Case study 3: branchio-oto-renal syndrome 522
The third step is to prioritize genes from the Case study 4: multiple sulfatase deficiency 522
candidate region for mutation testing 501 Case study 5: persistence of intestinal lactase 523
Appropriate expression 501 Case study 6: CHARGE syndrome 525
Appropriate function 502 Case study 7: breast cancer 526
Homologies and functional relationships 502 Case study 8: Crohn disease 528
xxii Detailed Contents
16.7 HOW WELL HAS DISEASE GENE ATM: the initial detector of damage 555
IDENTIFICATION WORKED? 531 Nibrin and the MRN complex 555
Most variants that cause Mendelian disease have CHEK2: a mediator kinase 555
been identified 531 The role of BRCA1/2 556
Genomewide association studies have been very p53 to the rescue 556
successful, but identifying the true functional Defects in the repair machinery underlie a variety
variants remains difficult 532 of cancer-prone genetic disorders 556
Clinically useful findings have been achieved in a Microsatellite instability was discovered through
few complex diseases 532 research on familial colon cancer 557
Alzheimer disease 532 17.6 GENOMEWIDE VIEWS OF CANCER 559
Age-related macular degeneration (ARMD) 533 Cytogenetic and microarray analyses give
Eczema (atopic dermatitis) 533 genomewide views of structural changes 559
The problem of hidden heritability 534 New sequencing technologies allow genomewide
CONCLUSION 534 surveys of sequence changes 559
FURTHER READING 535 Further techniques provide a genomewide view of
epigenetic changes in tumors 560
Genomewide views of gene expression are used to
Chapter 17 generate expression signatures 560
Cancer Genetics 537
17.7 UNRAVELING THE MULTI-STAGE
17.1 THE EVOLUTION OF CANCER 539 EVOLUTION OF A TUMOR 561
17.2 ONCOGENES 540 The microevolution of colorectal cancer has been
Oncogenes function in growth signaling pathways 541 particularly well documented 561
Oncogene activation involves a gain of function 542 17.8 INTEGRATING THE DATA: CANCER AS CELL
Activation by amplification 542 BIOLOGY 564
Activation by point mutation 543 Tumorigenesis should be considered in terms of
Activation by a translocation that creates a pathways, not individual genes 564
novel chimeric gene 543 Malignant tumors must be capable of stimulating
Activation by translocation into a angiogenesis and metastasizing 565
transcriptionally active chromatin region 544 Systems biology may eventually allow a unified
Activation of oncogenes is only oncogenic under overview of tumor development 566
certain circumstances 546 CONCLUSION 566
17.3 TUMOR SUPPRESSOR GENES 546 FURTHER READING 567
Retinoblastoma provided a paradigm for
understanding tumor suppressor genes 546
Some tumor suppressor genes show variations on Chapter 18
the two-hit paradigm 547 Genetic Testing of Individuals 569
Loss of heterozygosity has been widely used as a
18.1 WHAT TO TEST AND WHY 571
marker to locate tumor suppressor genes 548
Many different types of sample can be used for
Tumor suppressor genes are often silenced
genetic testing 571
epigenetically by methylation 549
RNA or DNA? 572
17.4 CELL CYCLE DYSREGULATION IN CANCER 550 Functional assays 572
Three key tumor suppressor genes control events
18.2 SCANNING A GENE FOR MUTATIONS 572
in G1 phase 551
A gene is normally scanned for mutations by
pRb: a key regulator of progression through
sequencing 573
G1 phase 551
A variety of techniques have been used to scan a
p53: the guardian of the genome 551
gene rapidly for possible mutations 574
CDKN2A: one gene that encodes two key
Scanning methods based on detecting
regulatory proteins 552
mismatches or heteroduplexes 574
17.5 INSTABILITY OF THE GENOME 553 Scanning methods based on single-strand
Various methods are used to survey cancer cells for conformation analysis 576
chromosomal changes 553 Scanning methods based on translation: the
Three main mechanisms are responsible for the protein truncation test 576
chromosome instability and abnormal Microarrays allow a gene to be scanned for almost
karyotypes 553 any mutation in a single operation 576
Telomeres are essential for chromosomal stability 554 DNA methylation patterns can be detected by a
DNA damage sends a signal to p53, which initiates variety of methods 577
protective responses 555 Unclassified variants are a major problem 578
Detailed Contents xxiii
Even if no single test gives a strong prediction, Nuclear transfer has been used to produce
maybe a battery of tests will 623 genetically modified domestic mammals 646
Much remains unknown about the clinical validity Exogenous promoters provide a convenient way
of susceptibility tests 624 of regulating transgene expression 646
Risk of type 2 diabetes: the Framingham and Tetracycline-regulated inducible transgene
Scandinavian studies 625 expression 647
Risk of breast cancer: the study of Pharoah Tamoxifen-regulated inducible transgene
and colleagues 625 expression 647
Risk of prostate cancer: the study of Zheng Transgene expression may be influenced by position
and colleagues 627 effects and locus structure 648
Evidence on the clinical utility of susceptibility
20.3 TARGETED GENOME MODIFICATION
testing is almost wholly lacking 627
AND GENE INACTIVATION IN VIVO 648
19.5 POPULATION SCREENING 628 The isolation of pluripotent ES cell lines was a
Screening tests are not diagnostic tests 629 landmark in mammalian genetics 648
Prenatal screening for Down syndrome defines an Gene targeting allows the production of animals
arbitrary threshold for diagnostic testing 629 carrying defined mutations in every cell 649
Acceptable screening programs must fit certain Different gene-targeting approaches create null
criteria 631 alleles or subtle point mutations 650
What would screening achieve? 631 Gene knockouts 650
Sensitivity and specificity 631 Gene knock-ins 650
Choosing subjects for screening 633 Creating point mutations 651
An ethical framework for screening 633 Microbial site-specific recombination systems allow
Some people worry that prenatal screening conditional gene inactivation and chromosome
programs might devalue and stigmatize engineering in animals 651
affected people 633 Conditional gene inactivation 653
Some people worry that enabling people with Chromosome engineering 653
genetic diseases to lead normal lives spells Zinc finger nucleases offer an alternative way of
trouble for future generations 634 performing gene targeting 654
19.6 THE NEW PARADIGM: PREDICT AND Targeted gene knockdown at the RNA level involves
PREVENT? 634 cleaving the gene transcripts or inhibiting their
translation 656
CONCLUSION 636
In vivo gene knockdown by RNA interference 656
FURTHER READING 637 Gene knockdown with morpholino antisense
oligonucleotides 656
Chapter 20
20.4 RANDOM MUTAGENESIS AND LARGE-SCALE
Genetic Manipulation of Animals for
ANIMAL MUTAGENESIS SCREENS 657
Modeling Disease and Investigating Gene Random mutagenesis screens often use chemicals
Function 639 that mutate bases by adding ethyl groups 658
20.1 OVERVIEW 640 Insertional mutagenesis can be performed in ES
A wide range of species are used in animal modeling 640 cells by using expression-defective transgenes as
Most animal models are generated by some kind gene traps 658
of artificially designed genetic modification 640 Transposons cause random insertional gene
Multiple types of phenotype analysis can be inactivation by jumping within a genome 658
performed on animal models 642 Insertional mutagenesis with the Sleeping
Beauty transposon 659
20.2 MAKING TRANSGENIC ANIMALS 643
piggyBac-mediated transposition 660
Transgenic animals have exogenous DNA inserted
The International Mouse Knockout Consortium
into the germ line 643
seeks to knock out all mouse genes 660
Pronuclear microinjection is an established method
for making some transgenic animals 644 20.5 USING GENETICALLY MODIFIED ANIMALS
Transgenes can also be inserted into the germ line TO MODEL DISEASE AND DISSECT GENE
via germ cells, gametes, or pluripotent cells derived FUNCTION 661
from the early embryo 645 Genetically modified animals have furthered our
Gene transfer into gametes and germ cell knowledge of gene function 661
precursors 645 Creating animal models of human disease 662
Gene transfer into pluripotent cells of the early Loss-of-function mutations are modeled by selectively
embryo or cultured pluripotent stem cells 645 inactivating the orthologous mouse gene 662
Gene transfer into somatic cells (animal Null alleles 663
cloning) 645 Humanized alleles 663
Detailed Contents xxv
Leaky mutations and hypomorphs 663 21.4 PRINCIPLES OF GENE THERAPY AND
Gain-of-function mutations are conveniently MAMMALIAN GENE TRANSFECTION
modeled by expressing a mutant transgene 663 SYSTEMS 696
Modeling chromosomal disorders is a challenge 666 Genes can be transferred to a patient’s cells either in
Modeling human cancers in mice is complex 667 culture or within the patient’s body 699
Gene knockouts to model loss of tumor Integration of therapeutic genes into host
suppressor function 668 chromosomes has significant advantages but raises
Transgenic mice to model oncogene activation 670 major safety concerns 700
Modeling sporadic cancers 670 Viral vectors offer strong and sometimes long-term
Humanized mice can overcome some of the species transgene expression, but many come with
differences that make mice imperfect models 670 safety risks 700
New developments in genetics are extending the Retroviral vectors 701
range of disease models 671 Adenoviral and adeno-associated virus (AAV)
CONCLUSION 672 vectors 702
Other viral vectors 703
FURTHER READING 673
Non-viral vector systems are safer, but gene
transfer is less efficient and transgene expression
Chapter 21 is often relatively weak 704
Genetic Approaches to Treating Disease 677 Transfer of naked nucleic acid by direct
21.1 TREATMENT OF GENETIC DISEASE VERSUS injection or particle bombardment 704
GENETIC TREATMENT OF DISEASE 678 Lipid-mediated gene transfer 704
Treatment of genetic disease is most advanced for Compacted DNA nanoparticles 704
disorders whose biochemical basis is well 21.5 RNA AND OLIGONUCLEOTIDE THERAPEUTICS
understood 678 AND THERAPEUTIC GENE REPAIR 704
Genetic treatment of disease may be conducted Therapeutic RNAs and oligonucleotides are often
at many different levels 679 designed to selectively inactivate a mutant allele 705
21.2 GENETIC APPROACHES TO DISEASE Therapeutic ribozymes 706
TREATMENT USING DRUGS, RECOMBINANT Therapeutic siRNA 707
PROTEINS, AND VACCINES 680 Antisense oligonucleotides can induce exon
Drug companies have invested heavily in genomics skipping to bypass a harmful mutation 707
to try to identify new drug targets 680 Gene targeting with zinc finger nucleases can repair
Therapeutic proteins can be produced by expression a specific pathogenic mutation or specifically
cloning in microbes, mammalian cell lines, or inactivate a target gene 708
transgenic animals 681 21.6 GENE THERAPY IN PRACTICE 709
Genetic engineering has produced novel antibodies The first gene therapy successes involved recessively
with therapeutic potential 683 inherited blood cell disorders 710
Aptamers are selected to bind to specific target Gene therapies for many other monogenic disorders
proteins and inhibit their functions 685 have usually had limited success 712
Vaccines have been genetically engineered to Cancer gene therapies usually involve selective killing
improve their functions 686 of cancer cells, but tumors can grow again by
Cancer vaccines 687 proliferation of surviving cells 713
21.3 PRINCIPLES AND APPLICATIONS OF CELL Multiple HIV gene therapy strategies are being pursued,
THERAPY 687 but progress toward effective treatment is slow 714
Stem cell therapies promise to transform the CONCLUSION 715
potential of transplantation 687 FURTHER READING 716
Embryonic stem cells 688
Tissue stem cells 688
Practical difficulties in stem cell therapy 689
Allogeneic and autologous cell therapy 689
Nuclear reprogramming offers new approaches to
disease treatment and human models of human
disease 690
Induced pluripotency in somatic cells 693
Transdifferentiation 694
Stem cell therapy has been shown to work but is
at an immature stage 695
This page is intentionally left blank.
Chapter 1
• The great bulk of eukaryotic genetic information is stored in the DNA found in the nucleus.
A tiny amount is also stored in mitochondrial and chloroplast DNA.
• DNA molecules are polymers of nucleotide repeat units that consist of one of four types of
nitrogenous base, plus a sugar, plus a phosphate.
• The backbone of any DNA molecule is a sugar–phosphate polymer, but it is the sequence of
the bases attached to the sugars that determines the identity and genetic function of any
DNA sequence.
• DNA normally occurs as a double helix, comprising two strands that are held together by
hydrogen bonds between pairs of complementary nitrogenous bases.
• Transmission of genetic information from cell to cell is normally achieved by copying the
complementary DNA molecules that are then shared equally between two daughter cells.
• Genes are discrete segments of DNA that are used as a template to synthesize a functional
complementary RNA molecule.
• Most genes make an RNA that will serve as a template for making a polypeptide.
• Various genes make RNA molecules that do not encode polypeptide. Such noncoding RNA
often helps regulate the expression of other genes.
• Like DNA, RNA molecules are polymers of nucleotide repeat units that consist of one of four
types of nitrogenous base (three of these are the same as in DNA), plus a slightly different
sugar, plus a phosphate.
• Unlike DNA, RNA molecules are usually single-stranded.
• To become functional, newly synthesized RNA must undergo a series of maturation steps
such as excising unwanted intervening sequences and chemical modification of certain
bases.
• Polypeptide synthesis occurs at ribosomes, either in the cytoplasm or inside mitochondria
and chloroplasts.
• The sequential information encoded in the RNA is interpreted at the ribosome via a triplet
genetic code, determining the basic structure of the polypeptide.
• Polypeptides often undergo a variety of chemical modifications.
• Proteins display extraordinary structural and functional diversity.
2 Chapter 1: Nucleic Acid Structure and Gene Expression
Nucleic acids and polypeptides are linear sequences of simple Figure 1.1 Repeat units in nucleic acids.
(A) The linear backbone of nucleic acids
repeat units consists of alternating phosphate and sugar
residues. Attached to each sugar is a base.
Nucleic acids The basic repeat unit (pale peach shading)
DNA and RNA have very similar structures. Both are large polymers with long consists of a base + sugar + phosphate = a
linear backbones of alternating residues of a phosphate and a five-carbon sugar. nucleotide. The sugar has five carbon atoms
Attached to each sugar residue is a nitrogenous base (Figure 1.1A). The sugars in numbered 1¢ to 5¢. (B) In DNA, the sugar is
DNA and RNA differ, in either lacking or possessing, respectively, an –OH group deoxyribose. (C) In RNA, the sugar is ribose,
at their 2¢-carbon positions (Figure 1.1B, C). In deoxyribonucleic acid (DNA), the which differs from deoxyribose in having a
hydroxyl (OH) group attached to carbon 2¢.
sugar is deoxyribose; in ribonucleic acid (RNA), the sugar is ribose.
Unlike the sugar and phosphate residues, the bases of a nucleic acid molecule
vary. The sequence of bases identifies the nucleic acid and determines its func-
tion. Four types of base are commonly found in DNA: adenine (A), cytosine (C),
guanine (G), and thymine (T). RNA also has four major types of base. Three of
them (adenine, cytosine, and guanine) also occur in DNA, but in RNA uracil (U)
replaces thymine (Figure 1.2A).
6
C 7
C
6
7
4
C 1 C6 5
5 N
5 N 5 N7 1N C
C CH HN C
1N N CH
3 CH
CH 8
8 HC C
HC 8 C C N9
C C CH N 2 N 4
2 4 N 2 6 2 N 4 H
N H O N H2N 3
3 H 3 9
9
1
OH
thymine (T) uracil (U)
CH2 O
5¢
O O
C H H C 1¢
4¢
C C
CH3 H 3¢ 2¢ H
C 5 C 5
HN 4 C HN 4 CH OH OH
3 3
Purine
Adenine adenosine adenosine monophosphate adenosine diphosphate (ADP) adenosine triphosphate (ATP)
(AMP)a
Guanine guanosine guanosine monophosphate guanosine diphosphate (GDP) guanosine triphosphate (GTP)
(GMP)a
Pyrimidine
Cytosine cytidine cytidine monophosphate (CMP)a cytidine diphosphate (CDP) cytidine triphosphate (CTP)
Thymine thymidine thymidine monophosphate thymidine diphosphate (TDP) thymidine triphosphate (TTP)
(TMP)a
Uracil uridine uridine monophosphate (UMP)a uridine diphosphate (UDP) uridine triphosphate (UTP)
Note that TMP, TDP, and TTP are not normally found in cells.
Bases consist of heterocyclic rings of carbon and nitrogen atoms, and can be
divided into two classes: purines (A and G), which have two interlocked rings,
and pyrimidines (C, T, and U), which have a single ring. In nucleic acids, each
base is attached to carbon 1¢ (one prime) of the sugar; a sugar with an attached
base is called a nucleoside (Figure 1.2B). A nucleoside with a phosphate group
attached at the 5¢ or 3¢ carbon of the sugar is the basic repeat unit of a DNA strand
and is called a nucleotide (Figure 1.2C and Table 1.1).
Polypeptides
Proteins are composed of one or more polypeptide molecules that may be modi-
fied by the addition of carbohydrate side chains or other chemical groups. Like
DNA and RNA, polypeptide molecules are polymers that are a linear sequence of
repeating units. The basic repeat unit is called an amino acid. An amino acid has
a positively charged amino group (–NH2) and a negatively charged carboxylic
acid (carboxyl) group (–COOH). These are connected by a central a-carbon atom
that also bears an identifying side chain that determines the chemical nature of
the amino acid. Polypeptides are formed by a condensation reaction between the
amino group of one amino acid and the carboxyl group of the next, to form a
repeating backbone, where the side chain (called an R-group) can differ from one
amino acid to another (Figure 1.3).
The 20 different common amino acids can be categorized according to their
side chains:
• basic amino acids (Figure 1.4A) carry a side chain with a net positive charge
at physiological pH;
DNA, RNA, AND POLYPEPTIDES 5
CH2
CH
b 2 CH2
CH2 CH2
g CH2 CH2
CH2
d NH C CH2 CH2
CH2 N CH
e C C C
O– O–
+ + +
NH3 H2N NH2 HC NH O O
basic acidic
(C)
CH2
C
CH2 HC CH
CH2 CH2 HC CH
CH2 CH CH3 C CH2
C C
O NH2 O NH2 OH OH OH SH
(D)
H O
HN C C OH
CH2 CH a
H3C CH2 H 2C CH2
CH CH
Figure 1.4 R groups of the 20 common
H CH3 H3C CH3 H3C CH3 CH3 CH2 amino acids, grouped according to
glycine alanine valine leucine isoleucine proline chemical class. There are 11 polar amino
(Gly; G) (Ala; A) (Val; V) (Leu; L) (Ile; I) (Pro; P) acids, divided into three classes: (A) basic
amino acids (positively charged); (B) acidic
amino acids (negatively charged); and
CH2
(C) uncharged polar amino acids bearing
CH2 CH2
C CH three different types of chemical group.
CH2 Polar chemical groups are highlighted. (D) In
HC CH HC C C
addition, a fourth class is composed of nine
S HC CH HC C C nonpolar neutral amino acids. Amino acids
within each class are chemically very similar.
CH3 CH CH NH Side-chain carbon atoms are numbered
methionine phenylalanine tryptophan from the central a-carbon atom (see the
(Met; M) (Phe; F) (Trp; W) lysine side chain). In proline, the R group’s
side chain connects to the amino acid’s –NH2
nonpolar group as well as to its central a-carbon atom.
6 Chapter 1: Nucleic Acid Structure and Gene Expression
Hydrogen Hydrogen bonds form when a hydrogen atom interacts with electron-attracting
atoms, usually oxygen or nitrogen atoms
Ionic Ionic interactions occur between charged groups. They can be very strong in
crystals, but in an aqueous environment the charged groups are shielded by both
water molecules and ions in solution and so are quite weak. Nevertheless, they
can be very important in biological function, as in enzyme–substrate recognition
Van der Waals Any two atoms in close proximity show a weak attractive bonding interaction
forces (van der Waals attraction) as a result of their fluctuating electrical charges. When
atoms become extremely close, they repel each other very strongly (van der
Waals repulsion). Although individual van der Waals attractions are very weak, the
cumulative effect of many such attractive forces can be important when there is a
very good fit between the surfaces of two macromolecules
In general, polar amino acids are hydrophilic, and nonpolar amino acids are
hydrophobic. Glycine, with its very small side chain, and cysteine (whose –SH
group is not as polar as an –OH group) occupy intermediate positions on the
hydrophilic–hydrophobic scale.
As described below, the side chains can be modified by the addition of various
chemical groups or sugar chains.
BOX 1.1 THE IMPORTANCE OF HYDROGEN BONDING IN NUCLEIC ACIDS AND PROTEINS
Intermolecular hydrogen bonding in nucleic acids codons bind to tRNA during translation. Many regulatory RNAs,
This is important in permitting the formation of the following double- such as microRNAs, control the expression of selected target
stranded nucleic acids: genes by base pairing to complementary sequences at the RNA
• Double-stranded DNA. The stability of the double helix is level.
maintained by hydrogen bonding between A–T and C–G base
Intramolecular hydrogen bonding in nucleic acids
pairs. The individual hydrogen bonds are weak, but in eukaryotic
This is particularly prevalent in RNA molecules. Intramolecular base
cells the two strands of a DNA helix are held together by between
pairing can form hairpins that may be crucially important to the
tens of thousands and hundreds of millions of hydrogen bonds.
structure of some RNAs such as rRNA and tRNA (see Figure 1.9), and as
• DNA–RNA duplexes. Hydrogen bonds form naturally between DNA
targets for gene regulation.
and RNA during transcription, but the base pairing is transient
because the RNA migrates away from DNA as it matures. Intramolecular hydrogen bonding in proteins
• Double-stranded RNA. This occurs stably in the genomes of some Several characteristic elements of protein secondary structure, such
viruses. It also arises transiently in cells during gene expression. as a-helices and b-pleated sheets, arise because of hydrogen bonding
For example, during RNA splicing, small nuclear RNA molecules between side chains of different amino acids on the same polypeptide
bind to complementary sequences in pre-mRNA, and mRNA chain.
NUCLEIC ACID STRUCTURE AND DNA REPLICATION 7
Because the phosphodiester bonds link carbon atoms number 3¢ and number 5¢
3¢
5¢ of successive sugar residues, the two ends of a linear DNA strand are different.
The 5¢ end has a terminal sugar residue in which carbon atom number 5¢ is not
GC
linked to another sugar residue. The 3¢ end has a terminal sugar residue whose 3¢ TA
carbon is not involved in phosphodiester bonding. The two strands of a DNA AT
duplex are described as being anti-parallel to each other because the 5¢Æ3¢ direc- GC
minor
tion of one DNA strand is the opposite to that of its partner, according to Watson– groove
Crick base-pairing rules (Figure 1.8). AT pitch
Genetic information is encoded by the linear sequence of bases in the DNA GC 3.6 nm
strands. The two strands of a DNA duplex have complementary sequences, so the CG
sequence of bases of one DNA strand can therefore readily be inferred from that AT
major
of the other strand. It is usual to describe DNA by writing the sequence of bases groove
of one strand only, in the 5¢Æ3¢ direction, which is the direction of synthesis of TA
new DNA or RNA from a DNA template. When describing the sequence of a DNA CG
TA
region encompassing two neighboring bases (a dinucleotide) on one DNA strand,
AT
it is usual to insert a ‘p’ to denote a connecting phosphodiester bond. So, a CG
base pair means a C on one DNA strand is hydrogen-bonded to a G on the com-
plementary strand, but CpG represents a deoxycytidine covalently linked to a TA
GC
neighboring deoxyguanosine on the same DNA strand (see Figure 1.8). 5¢
Unlike DNA, RNA is normally single-stranded except for certain viruses that 3¢
have double-stranded RNA genomes. However, to perform certain cell functions 1 nm
two RNA molecules may need to associate transiently to form base pairs, and
Figure 1.7 Features of the DNA double
intermolecular hydrogen bonding also permits the formation of transient RNA–
helix. The two DNA strands wind round each
DNA duplexes (see Box 1.1). other, producing a minor groove and a major
In addition, hydrogen bonding can occur between bases within a single- groove in the double helix. The double helix
stranded RNA (or DNA) molecule to produce structurally and functionally impor- has a pitch of 3.6 nm and a radius of 1 nm
tant stretches of double-stranded sequence. Hairpin structures may be formed per turn.
with stems that are stabilized by hydrogen bonding between bases (Figure 1.9A).
Intrachain base pairing causes certain RNA molecules to have complex struc-
tures (Figure 1.9B).
In double-stranded RNA, A pairs with U instead of T. Although G usually pairs
with C, sometimes G–U base pairs are formed (see the example in Figure 1.9B).
Although not particularly stable, G–U base pairing does not significantly distort
the RNA–RNA helix.
5¢ end 3¢ end
H OH
O–
H 2¢ 3¢ H
C C
O P O– 4¢
1¢ C H H C
O 5¢
O CH2
CH2
C G
O O
5¢
C H H C 1¢ –O
4¢ P O
C C
H 3¢ 2¢ H
HO
O H
H 2¢
3¢ H
CC
O P O– 4¢
1¢ C H H C
O 5¢
O CH2
CH2
G C
O O
5¢
C H H C 1¢ –O
4¢ P O
C C
H 3¢ 2¢ H Figure 1.8 Anti-parallel nature of the DNA
H O
O H double helix. The two anti-parallel DNA
H 2¢ 3¢ H
C C strands run in opposite directions in linking
O P O– 4¢
1¢ C H H C 3¢ to 5¢ carbon atoms in the sugar residues.
O 5¢ This double-stranded trinucleotide has the
O CH2 sequence 5¢ pCpGpT–OH 3¢/5¢ pApCpG–OH 3¢,
CH2
T A where p stands for a phosphate group and
O O
5¢
–OH 3¢ represents the 3¢ terminal hydroxyl
C H H C 1¢ –O
4¢ P O group. This is conventionally abbreviated to
C C give the 5¢Æ3¢ sequence of nucleotides on only
H 3¢ 2¢ H –O
OH H one strand, either as 5¢-CGT-3¢ (blue strand) or
3¢ end 5¢ end as 5¢-ACG-3¢ (purple strand).
NUCLEIC ACID STRUCTURE AND DNA REPLICATION 9
A A A
3¢ 5¢ 3¢ 5¢ 3¢ 5¢ 3¢ 5¢ 3¢ 5¢ 3¢ 5¢
leading lagging
strand strand
Polymerase Family Standard DNA replication Additional or alternative roles in DNA repair, recombination, etc.
d (delta) B main polymerase that synthesizes lagging multiple roles in DNA repair
strand
l (lambda) X
double-strand break repair; VDJ recombinationg; base excision repairb
m (mu) X
Interspersed repeat reverse transcriptases (LINE-1 or occasionally converts mRNA and other RNA into cDNA, which can integrate elsewhere
endogenous retrovirus elements) into the genome
Telomerase reverse transcriptase (Tert) replicates DNA at the ends of linear chromosomes
a Terminal deoxynucleotide transferase. bBase excision repair identifies and removes inappropriate bases or inappropriately modified bases.
c Translesion synthesis involves the replication of DNA past damaged DNA (lesions) on the template strand. dInterstrand crosslink repair is the repair
of highly cytotoxic lesions where covalent DNA bonds have been formed between the DNA strands. eMismatch repair is a form of DNA repair that
corrects mistakes arising when noncomplementary nucleotides form a base pair. f Nucleotide excision repair is used to fix helix-distorting lesions.
gSomatic hypermutation and VDJ recombination are mechanisms used in B cells to diversify immunoglobulin sequences.
DNA polymerases can continue to synthesize new DNA strands opposite a lesion
in the template DNA (translesion synthesis) and they can contribute to the
sequence diversity of immunoglobulins (e.g. by introducing many base changes
in coding sequences) and so assist in the recognition of numerous foreign anti-
gens by the immune system.
Single linear Multiple linear Single circular Multiple circular Mixed (linear +
circular)
DNA GENOMES
Double-stranded (ds)DNA some viruses; a segmented dsDNA mitochondria; multipartite viruses; a very few bacteria,
very few bacteria, viruses; eukaryotic chloroplasts; many some bacteria e.g. Agrobacterium
e.g. Borrelia nuclei bacteria and Archaea tumefaciens
RNA GENOMES
See Figure 1.12 for examples of viral genomes and for explanations of segmented and multipartite viruses.
Viruses have developed many different strategies to infect and subvert cells,
and their genomes show extraordinary diversity when compared with cellular
genomes (Table 1.4 and Figure 1.12). Because RNA replication has a much higher
error rate than DNA replication, viral RNA genomes have a higher mutational
load than DNA genomes. Although viral RNA genomes are generally quite small,
the elevated mutation rate permits more rapid adaptation to changing environ-
mental conditions. RNA viruses usually replicate in the cytoplasm; DNA viruses
generally replicate in the nucleus.
Retroviruses are unusual RNA viruses both because they replicate in the
nucleus and also because their RNA replicates via a DNA intermediate. The
single-stranded RNA genome is converted into a single-stranded cDNA using a
viral reverse transcriptase. The single-stranded viral cDNA is then converted into
double-stranded DNA, by using a DNA polymerase from the host cell. Other viral
proteins then help insert this double-stranded DNA into the host cell’s chromo-
somal DNA. It can remain there for long periods or be used to synthesize new
viral RNA genomes that are packaged as new virus particles.
poxviruses
(have closed ends)
(B)
DNA RNA
21
1
gemini virus influenza virus
543
Normally, only one of the two DNA strands in a duplex serves as a template for
RNA synthesis. During transcription, double-stranded DNA is bound by RNA
polymerase. The DNA is then unwound, enabling the DNA strand that will act as
a template for RNA synthesis (the template strand) to form a transient double-
stranded RNA–DNA hybrid with the growing RNA strand.
The RNA transcript is complementary to the template strand of the DNA and
has the same 5¢Æ3¢ direction and base sequence (except that U replaces T) as the
opposite, nontemplate DNA strand. The nontemplate strand is often called the
sense strand, and the template strand is often called the antisense strand (Figure
1.13).
In documenting gene sequences, it is customary to show only the DNA Figure 1.13 Transcribed RNA is
sequence of the sense strand. The orientation of sequences relative to a gene nor- complementary to one strand of DNA.
mally refers to the sense strand. For example, the 5¢ end of a gene refers to The nucleotide sequence of transcribed
RNA is normally identical to that of the
sense strand sense strand, except that U replaces T, and
5¢ is complementary to that of the template
3¢ strand. The nucleotide at the extreme
3¢ 5¢ 5¢ end of a primary RNA transcript carries a
5¢ triphosphate group that may later
5¢ PPP OH 3¢ template (antisense) strand
undergo modification; the 3¢ end has a free
RNA hydroxyl group.
14 Chapter 1: Nucleic Acid Structure and Gene Expression
I 28S rRNAa, 18S rRNAa, 5.8S rRNAa localized in the nucleolus; RNA polymerase I produces a single primary
transcript (45S rRNA) that is cleaved to give the three rRNA classes listed here
II mRNAb, miRNAc, most snRNAsd and snoRNAse RNA polymerase II transcripts are unique in being subject to capping and
polyadenylation
III 5S rRNAa, tRNAf, U6 snRNAg, 7SL RNAh, various the promoter for some genes transcribed by RNA polymerase III (e.g. 5S rRNA,
other small noncoding RNAs tRNA, 7SL RNA) is internal to the gene; for others, it is located upstream of the
gene (see Figure 1.15)
aRibosomal RNA. bMessenger RNA. cMicroRNA. dSmall nuclear RNAs. eSmall nucleolar RNAs. fTransfer RNA. gU6 snRNA is a component of the
spliceosome, an RNA–protein complex that removes unwanted noncoding sequences from newly formed RNA transcripts. h7SL RNA forms part of the
signal recognition particle, which has an important role in the transport of newly synthesized proteins.
RNA TRANSCRIPTION AND GENE EXPRESSION 15
transcription of gene
(B)
E1 E2 E3 Figure 1.16 The process of RNA splicing.
gu - - - - - - - - - - ag gu - - - ag (A) In this example, the gene contains three
exons and two introns. (B) The primary RNA
cleave primary RNA transcript at transcript is a continuous RNA copy of the
and discard intronic sequences 1 and 2 gene and contains sequences transcribed
(C) from exons (E1, E2, and E3) and introns.
E1 E2 E3 (C) The primary transcript is cleaved at
regions corresponding to exon–intron
boundaries (splice junctions). The RNA
splicing of exonic sequences 1, 2, copies of the introns are snipped out and
and 3 to produce mature RNA discarded. (D) The RNA copies of the exons
are retained and then fused together
(D) E1 E2 E3 (spliced) in the same linear order as in the
genomic DNA sequence.
RNA PROCESSING 17
10s to > 10,000 nucleotides < 20 nucleotides Figure 1.17 Three consensus DNA
sequences in introns of complex
splice branch splice eukaryotes. Most introns in eukaryotic
donor site site acceptor site
genes contain conserved sequences that
C A T CTA C CCCCCCCCCCC C G
AG GT AG T N A N AG correspond to three functionally important
A G C T CG T TTTTTTTTTTT T A
regions. Two of the regions, the splice donor
exon exon
site and the splice acceptor site, span the 5¢
and 3¢ boundaries of the intron. The branch
site is an additional important region that
Although the conserved GT and AG dinucleotides are crucial for splicing, they typically occurs less than 20 nucleotides
are not sufficient to mark the limits of an intron. The nucleotide sequences that upstream of the splice acceptor site. The
are immediately adjacent to them are also quite highly conserved, constituting nucleotides shown in red in these three
splice junction consensus sequences (Figure 1.17). A third conserved intronic consensus sequences are almost invariant.
sequence that is also important in splicing is known as the branch site and is typi- The other nucleotides detailed in both
cally located no more than 40 nucleotides upstream of the intron’s 5¢ terminal AG the intron and the exons are those most
commonly found at each position. In some
(see Figure 1.17). Other exonic and intronic sequences can promote splicing
instances, two nucleotides may be equally
(splice enhancer sequences) or inhibit it (splice silencer sequences), and muta- common, as in the case of C and T near the
tions in these sequences can cause disease. 3¢ end of the intron. Where N appears, any of
The essential steps in splicing are as follows: the four nucleotides may occur.
• Nucleophilic attack of the intron’s 5¢ terminal G nucleotide by the invariant A
of the branch site consensus sequence, to form a lariat-shaped structure.
• Cleavage of the exon/intron junction at the splice donor site.
• Nucleophilic attack by the 3¢ end of the upstream exon of the splice acceptor
site, leading to cleavage and release of the intronic RNA in the form of a lariat,
and the splicing together of the two exonic RNA segments (Figure 1.18).
For genes residing in eukaryotic nuclei, RNA splicing is mediated by a large
RNA–protein complex, called the spliceosome. Spliceosomes have five types of
snRNA (small nuclear RNA) and more than 50 proteins. The snRNA molecules
associate with proteins to form small nuclear ribonucleoprotein (snRNP, or
snurp) particles. The specificity of the splicing reaction is established by
RNA–RNA base pairing between the RNA transcript to be spliced and snRNA
molecules within the spliceosome. There are two types of spliceosome:
• The major (GU-AG) spliceosome processes transcripts corresponding to clas-
sical GT-AG introns. It contains five types of snRNA. U1 and U2 snRNAs recog-
nize and bind the splice donor and branch sites, respectively. U4, U5, and U6
snRNAs subsequently bind to cause looping out of the intronic RNA (Figure
1.19). (A)
splice
• The minor (AU-AC) spliceosome processes transcripts corresponding to rare donor site
AU-AC introns. It also has five snRNAs but uses U11 and U12 snRNA instead
E1 E2
of U1 and U2 and has variants of U4 and U6 snRNA. 5¢ GU A AG 3¢
Once a splice donor site is recognized by the spliceosome, it scans the RNA
OH
sequence until it meets the next splice acceptor site (signaled as a target by the
upstream presence of the branch site consensus sequence). nucleophilic attack by A
in branch site at the 5¢
Specialized nucleotides are added to the ends of most RNA terminal G of intronic RNA
polymerase II transcripts
(B) splice
In addition to RNA splicing, the ends of RNA polymerase II transcripts undergo acceptor
modifications: the 5¢ end is capped by adding a variant guanine by using an site
E1 U E2
unusual phosphodiester bond, and a long sequence of adenines is added to the G
5¢ 3¢ A AG 3¢
3¢ end. As well as protecting the ends from cellular exonucleases, these modifica-
tions may assist the correct functioning of the RNA transcripts.
nucleophilic attack of
splice acceptor site by
3¢ end of E1
Figure 1.18 The mechanism of RNA splicing. (A) The unprocessed primary
RNA transcript with intronic RNA separating sequences E1 and E2 that
correspond to exons in DNA. The splicing mechanism involves a nucleophilic
attack on the G of the 5¢ GU dinucleotide. This is carried out by the 2¢ OH group U
G
on the conserved A of the branch site and results in the formation of a lariat A AG
structure (B), and cleavage of the splice donor site. The 3¢ OH at the 3¢ end of the
E1 sequence performs a nucleophilic attack on the splice acceptor site, causing (C)
release of the intronic RNA (as a lariat-shaped structure) and (C) fusion (splicing) E1 E2
of E1 and E2. 5¢ 3¢
18 Chapter 1: Nucleic Acid Structure and Gene Expression
Figure 1.20 The 5¢ cap of a eukaryotic mRNA. The nucleotide components 5¢CH
O Pu
2
in pink represent the residue of the original 5¢ end of a eukaryotic pre-mRNA.
The primary pre-mRNA transcript begins with a nucleotide that contains a 4¢ 1¢
purine (Pu) base and a 5¢ triphosphate group. However, as the pre-mRNA 3¢ 2¢
undergoes processing, the end phosphate group at the 5¢ end is excised with O O
a phosphatase to leave a 5¢ diphosphate group, and a specialized nucleotide is
covalently joined to form a cap that will protect mRNA from exonuclease attack P CH3
and assist in the initiation of translation. The cap nucleotide (with base shown
O
in red) is first formed when a GTP residue is cleaved to generate a guanosine
monophosphate that is then added through a 5¢–5¢ triphosphate linkage (pale CH2 O N
peach shading) to the diphosphate group of the original purine end nucleotide.
Subsequently nitrogen atom 7 of the new 5¢ terminal G is methylated. In mRNAs 4¢ 1¢
synthesized in vertebrate cells, the 2¢ carbon atom of the ribose of each of the 3¢ 2¢
two adjacent nucleotides, the original purine end-nucleotide and its neighbor, O O
are also methylated, as illustrated in this example. m7G, 7-methylguanosine;
N, any nucleotide. CH3
RNA PROCESSING 19
spacer spacer
In addition to the sequence of cleavage reactions (see Figure 1.22), the pri-
mary rRNA transcript also undergoes a variety of base-specific modifications.
This extensive RNA processing is undertaken by many different small nucleolar
RNAs that are encoded by about 200 different genes in the human genome.
Mature tRNA molecules also undergo extensive base modifications, and about
10% of the bases in any tRNA are modified versions of A, C, G, or U. Common
examples of modified nucleosides include dihydrouridine, which has extra
hydrogens at carbons 5 and 6; pseudouridine, an isomer of uridine; inosine
(deaminated guanosine); and N,N¢-dimethylguanosine.
OH O OH O OH OH OH O
+
adenine O CH2 adenine O CH2 adenine O CH2 adenine O CH2
O O O O
O P O– O P O– O P O– O P O–
O O O O
Each tRNA has its own anticodon, a trinucleotide at the center of the anti-
codon arm (see Figure 1.9B) that provides the necessary specificity to interpret
the genetic code. For an amino acid to be added to a growing polypeptide, the
relevant codon of the mRNA molecule must be recognized by base pairing with a
complementary anticodon on the appropriate aminoacyl tRNA molecule. This
happens on the ribosome. The small ribosomal subunit binds the mRNA, and the
large subunit has two sites for binding aminoacyl tRNAs, namely a P (peptidyl)
site and an A (aminoacyl) site (Figure 1.24).
The cap at the 5¢ end of messenger RNA molecules is important in initiating
translation. It is recognized by certain key proteins that bind the small ribosomal
subunit, and these initiation factors hold the mRNA in place. In cap-dependent
translation initiation, the ribosome scans the 5¢ UTR of the mRNA in the 5¢Æ3¢
direction to find a suitable initiation codon, an AUG that is found within the
Kozak consensus sequence 5¢-GCCPuCCAUGG-3¢ (where Pu represents purine).
The most important determinants are the G at position +4 (immediately follow-
ing the AUG codon), and the purine (preferably A) at –3 (three nucleotides
upstream of the AUG codon).
22 Chapter 1: Nucleic Acid Structure and Gene Expression
Although more than 60 codons can specify an amino acid, the number of dif-
TABLE 1.6 RULES FOR BASE
ferent cytoplasmic tRNA molecules is quite a bit smaller, and only 22 types of
PAIRING CAN BE RELAXED
mitochondrial tRNA are made. The interpretation of more than 60 sense codons
WOBBLE AT POSITION 3
with a much smaller number of different tRNAs is possible because base pairing
OF A CODON
in RNA is more flexible than in DNA. Pairing of codon and anticodon follows the
normal A–U and G–C rules for the first two base positions in a codon. However, at Base at 5¢ end of Base recognized
the third position there is some flexibility (base wobble), and GU base pairs are tRNA anticodon at 3¢ end of mRNA
tolerated here (Table 1.6). codon
The genetic code is the same throughout nearly all life forms. However, mito- A U only
chondria and chloroplasts have a limited capacity for protein synthesis, and dur-
ing evolution their genetic codes have diverged slightly from that used at cyto- C G only
plasmic ribosomes. Translation of nuclear-encoded mRNA continues until one
G (or I)a C or U
of three stop codons is encountered (UAA, UAG, or UGA) but in mammalian
mitochondria there are four possibilities (UAA, UAG, AGA, and AGG). U A or G
The meaning of a codon can also be dependent upon the sequence context;
aInosine (I) is a deaminated form of
that is, the nature of the nucleotide sequence in which it is embedded. Depending
guanosine.
on the surrounding sequence, some codons in a few types of nuclear-encoded
mRNA can be interpreted differently from normal. For example, in a wide variety
of cells the stop codon UGA is alternatively interpreted as encoding seleno-
cysteine with some nuclear-encoded mRNAs, and UAG can sometimes be inter-
preted to encode glutamine.
Phosphorylation (PO4 –) Tyr, Ser, Thr achieved by specific kinases; may be reversed by phosphatases
Hydroxylation (OH) Pro, Lys, Asp hydroxyproline (Hyp) and hydroxylysine (Hyl) are particularly common in
collagens
N-glycosylation (complex carbohydrate) Asna takes place initially in the endoplasmic reticulum, with later additional changes
occurring in the Golgi apparatus
O-glycosylation (complex carbohydrate) Ser, Thr, Hylb takes place in the Golgi apparatus; less common than N-glycosylation
Glycosylphosphatidylinositol (glycolipid) Aspc serves to anchor protein to outer layer of plasma membrane
attached carbohydrate); if they are, they have a single sugar residue, N-acetyl-
glucosamine, attached to a serine or threonine residue. However, proteins that
are secreted from cells or transported to lysosomes, the Golgi apparatus, or the
plasma membrane are routinely glycosylated. In these cases, the sugars are
assembled as oligosaccharides before being attached to the protein.
Two major types of glycosylation occur. Carbohydrate N-glycosylation
involves attaching a carbohydrate group to the nitrogen atom of an asparagine
side chain, and O-glycosylation entails adding a carbohydrate to the oxygen atom
of an OH group carried by the side chains of certain amino acids (see Table 1.7).
Proteoglycans are proteins with attached glycosaminoglycans (polysaccha-
rides) that usually include repeating disaccharide units containing glucosamine
or galactosamine. The best-characterized proteoglycans are components of the
extracellular matrix, a complex network of macromolecules secreted by, and sur-
rounding, cells in tissues or in culture systems.
Addition of lipid groups
Some proteins, notably membrane proteins, are modified by the addition of fatty
acyl or prenyl groups. These added groups typically serve as membrane anchors,
hydrophobic amino acid sequences that secure a newly synthesized protein
within either a plasma membrane or the endoplasmic reticulum (Table 1.8).
Anchoring a protein to the outer layer of the plasma membrane involves the
attachment of a glycosylphosphatidylinositol (GPI) group. This glycolipid group
contains a fatty acyl group that serves as the membrane anchor; it is linked suc-
cessively to a glycerophosphate unit, an oligosaccharide unit, and finally—
through a phosphoethanolamine unit—to the C-terminus of the protein. The
entire protein, except the GPI anchor, is located in the extracellular space.
Post-translational cleavage
The primary translation product may also undergo internal cleavage to generate
a smaller mature product. Occasionally the initiating methionine is cleaved from
Primary the linear sequence of amino acids in a can vary enormously in length from a few to
F T P A
polypeptide thousands of amino acids
V
L F A H
D
K F L A
S
V T S V
L
T S K Y
Secondary the path that a polypeptide backbone follows varies along the length of the polypeptide; common
within local regions of the primary structure elements of secondary structure include the a-helix
and b-pleated sheet
Tertiary the overall three-dimensional structure of a can take various forms (e.g. globular, rod-like, tube,
polypeptide, arising from the combination of all coil, sheet)
of the secondary structures
Quaternary the aggregate structure of a multimeric protein can be stabilized by disulfide bridges between
(comprising more than one subunit, which may subunits or ligand binding, and other factors
be of more than one type)
TRANSLATION, POST-TRANSLATIONAL PROCESSING, AND PROTEIN STRUCTURE 25
the primary translation product, as during the synthesis of b-globin (see Figure
1.23C, D). More substantial polypeptide cleavage is observed during the matura-
tion of many proteins, including plasma proteins, polypeptide hormones,
neuropeptides, and growth factors. Cleavable signal sequences are often used to
mark proteins either for export or for transport to a specific intracellular location.
A single mRNA molecule can sometimes specify more than one functional
polypeptide chain as a result of post-translational cleavage of a large precursor
polypeptide (Figure 1.26).
(A)
exon 1 intron 1 exon 2 intron 2 exon 3
1 63 110 Figure 1.26 Insulin synthesis involves
multiple post-translational cleavages of
gene gt ag ATG G gt ag TG AAC TAG
polypeptide precursors. (A) The human
transcription and insulin gene comprises three exons and
RNA processing two introns. The coding sequence (the part
(B) 1 63 110
that will be used to make polypeptide) is
mRNA m7Gppp AUG GUG AAC AAAAAA An shown in deep blue. It is confined to the 3¢
sequence of exon 2 and the 5¢ sequence of
5¢ UTR 3¢ UTR
translation exon 3. (B) Exon 1 and the 5¢ part of exon 2
(C) 1 24 25 110
specify the 5¢ untranslated region (5¢ UTR),
preproinsulin N Met Ala Phe Phe Asn C and the 3¢ end of exon 3 specifies the
3¢ UTR. The UTRs are transcribed and so
post-translational
are present at the ends of the mRNA. (C) A
cleavage primary translation product, preproinsulin,
1 24
(D) leader sequence Met Ala has 110 residues and is cleaved to give
(D) a 24-residue N-terminal leader sequence
1 86
Phe Leu Glu Arg Gly Asn proinsulin (that is required for the protein to cross the
cell membrane but is thereafter discarded)
cleavage of proinsulin plus an 86-residue proinsulin precursor.
(E) Proinsulin is cleaved to give a central
1 35
segment (the connecting peptide) that may
(E) Glu Arg
connecting peptide maintain the conformation of the A and B
1 30 1 21 chains of insulin before the formation of
Phe Leu Gly Asn their interconnecting covalent disulfide
insulin B chain insulin A chain bridges (see Figure 1.29).
26 Chapter 1: Nucleic Acid Structure and Gene Expression
(A) H (B)
1 Figure 1.27 The structure of a standard
C Phe Met Leu a-helix and an amphipathic a-helix.
+ Arg Ala
C N (A) The structure of an a-helix is stabilized
O 4 Ala Leu by hydrogen bonding between the oxygen
H C + Arg
1 Gly of the carbonyl group (C=O) of each peptide
H 2
C N C
Ser Leu 2 bond and the hydrogen on the peptide bond
N C O + Arg amide group (NH) of the fourth amino acid
Gly
3C O + Arg Ala away, making the helix have 3.6 amino acid
4 H + Arg
H C Pro Leu residues per turn. The side chains of each
C N 3 amino acid are located on the outside of the
C N
O O helix; there is almost no free space within
H C
5
the helix. Note that only the backbone of the
H polypeptide is shown, and some bonds have
C N C
6 been omitted for clarity. (B) An amphipathic
N C O
a-helix has tighter packing and has charged
7C O
8 amino acids and hydrophobic amino acids
H C H
C N located on different surfaces. Here we show
C N an end view of such a helix: five positively
O O
charged arginine residues are clustered on
H C9
H
one side of the helix, whereas the opposing
C N C side has a series of hydrophobic amino
10
N C O acids (mostly Ala, Leu, and Gly). The lines
O within the circle indicate neighboring
residues—the initiator methionine (position
1) is connected to a leucine (2), which
is connected to an arginine (3), which is
adjacent to an alanine (4), and so on.
in proteins that perform key cellular functions (such as transcription factors,
where they are usually represented in the DNA-binding domains). Identical
a-helices with a repeating arrangement of nonpolar side chains can coil round
each other to form a particularly stable coiled coil. Coiled coils occur in many
fibrous proteins, such as collagen of the extracellular matrix, the muscle protein
tropomyosin, a-keratin in hair, and fibrinogen in blood clots.
The b-pleated sheet
b-Pleated sheets are also stabilized by hydrogen bonding but, in this case, they
occur between opposed peptide bonds in parallel or anti-parallel segments of
the same polypeptide chain (Figure 1.28). b-Pleated sheets occur—often together
with a-helices—at the core of most globular proteins.
(A) (B)
The b-turn
Hydrogen bonding can occur between amino acids that are even nearer to each
other within a polypeptide. When this arises between the peptide bond CO group
of one amino acid residue and the peptide bond NH group of an amino acid resi-
due three places farther along, this results in a hairpin b-turn. Abrupt changes in
the direction of a polypeptide enable compact globular shapes to be achieved.
These b-turns can connect parallel or anti-parallel strands in b-pleated sheets.
Higher-order structures
Many more complex structural motifs, consisting of combinations of the above
structural modules, form protein domains. Such domains are often crucial to a
protein’s overall shape and stability and usually represent functional units
involved in binding other molecules. Another important determinant of the
structure (and function) of a protein is disulfide bridges. They can form between
the sulfur atoms of sulfhydryl (–SH) groups on two amino acids that may reside
on a single polypeptide chain or on two polypeptide chains (Figure 1.29).
In general, the primary structure of a protein determines the set of secondary
structures that, together, generate the protein’s tertiary structure. Secondary
structural motifs can be predicted from an analysis of the primary structure, but
the overall tertiary structure cannot easily be accurately predicted. Finally, some
proteins form complex aggregates of polypeptide subunits, giving an arrange-
ment known as the quaternary structure.
FURTHER READING
Agris PF, Vendeix FA & Graham WD (2007) tRNA’s wobble Preiss T & Hentze MW (2003) Starting the protein synthesis
decoding of the genome: 40 years of modification. J. Mol. Biol. machine: eukaryotic translation initiation. BioEssays 25,
366, 1–13. 1201–1211.
Calvo O & Manley JL (2003) Strange bedfellows: polyadenylation Sander DM. Big Picture Book of Viruses. http://www.virology.net/
factors at the promoter. Genes Dev. 17, 1321–1327. Big_Virology/BVHomePage.html
Carradine CR, Drew HR, Luisi B & Travers AA (2004) Understanding Wear MA & Cooper JA (2004) Capping protein: new insights into
DNA. The Molecule And How It Works, 3rd ed. Academic mechanism and regulation. Trends Biochem. Sci. 29, 418–428.
Press. Whitford D (2005) Protein Structure And Function. John Wiley.
Crain PF, Rozenski J & McCloskey JA. RNA Modification Database.
http://library.med.utah.edu/RNAmods
Fedorova O & Zingler N (2007) Group II introns: structure, folding
and splicing mechanisms. Biol. Chem. 388, 665–678.
Garcia-Diaz M & Bebenek K (2007) Multiple functions of DNA
polymerases. Crit. Rev. Plant Sci. 26, 105–122.
This page is intentionally left blank.
Chapter 2
• Chromosomes have two fundamental roles: the faithful transmission and appropriate expression of
genetic information.
• Prokaryotic chromosomes contain circular double-stranded DNA molecules that are relatively protein-
free, but eukaryotic chromosomes consist of linear double-stranded DNA molecules complexed
throughout their lengths with proteins.
• Chromatin is the DNA–protein matrix of eukaryotic chromosomes. The complexed proteins serve
structural roles, including compacting the DNA in different ways, and also regulatory roles.
• Chromosomes undergo major changes in the cell cycle, notably at S phase when they replicate and at
M phase when the replicated chromosomes become separated and allocated to two daughter cells.
• DNA replication at S phase produces two double-stranded daughter DNA molecules that are
held together at a specialized region, the centromere. When the daughter DNA molecules remain held
together like this they are known as sister chromatids, but once they separate at M phase they become
individual chromosomes.
• At the metaphase stage of M phase the chromosomes are so highly condensed that gene expression is
uniformly shut down. But this is the optimal time for viewing them under the microscope. Staining
with dyes that bind preferentially to GC-rich or AT-rich regions can give reproducible chromosome
banding patterns that allow different chromosomes to be differentiated.
• During interphase, the long period of the cell cycle that separates successive M phases, chromosomes
have generally very long extended conformations and are invisible under optical microscopy. The
extended structure means that genes can be expressed efficiently.
• Even during interphase some chromosomal regions always remain highly condensed and
transcriptionally inactive (heterochromatin), whereas others are extended to allow gene expression
(euchromatin).
• Sperm and egg cells have one copy of each chromosome (they are haploid), but most cells are diploid,
having two sets of chromosomes.
• Fertilization of a haploid egg by a haploid sperm generates the diploid zygote from which all other
body cells arise by cell division.
• In mitosis a cell divides to give two daughter cells, each with the same number and types of
chromosomes as the original cell.
• Meiosis is a specialized form of cell division that occurs in certain cells of the testes and ovaries to
produce haploid sperm and egg cells. During meiosis new genetic combinations are randomly created,
partly by exchanging sequences between maternal and paternal chromosomes.
• Three types of functional element are needed for eukaryotic chromosomes to transmit DNA faithfully
from mother cell to daughter cells: the centromere (ensures correct chromosome segregation at cell
division); replication origins (initiate DNA replication); and telomeres (cap the chromosomes to stop
the internal DNA from being degraded by nucleases).
• An abnormal number of chromosomes can sometimes occur but this is often lethal if present in most
cells of the body.
• Structural chromosome abnormalities arising from breaks in chromosomes can cause genes to be
deleted or incorrectly expressed.
• Having the correct number and structure of chromosomes is not enough. They must also have the
correct parental origin because certain genes are preferentially expressed on either paternally or
maternally inherited chromosomes.
30 Chapter 2: Chromosome Structure and Function
M phase: sister chromatids separate to give two Figure 2.1 Changes in chromosomes and
chromosomes that are distributed into two daughter cells DNA content during the cell cycle. The cell
cycle shown at the right includes a very short
+ M phase, when the chromosomes become
extremely highly condensed in preparation
for nuclear and cell division. Afterwards,
chromosomes = 2n chromosomes = 4n chromosomes = 2n
cells enter a long period of growth called
DNA = 4C DNA = 4C DNA = 2C
interphase, during which chromosomes
are enormously extended so that genes
can be expressed. Interphase is divided
late S phase: two DNA double
helices per chromosome M into three phases: G1, S (when the DNA
G2
centromere replicates), and G2. Chromosomes contain
sister one DNA double helix from the end of M
chromatids G1 phase right through until just before the
S
DNA is duplicated in S phase. After the DNA
paired
double helix has been duplicated, the two
double helices
resulting double helices are held together
chromosomes = 2n
DNA = 4C tightly along their lengths (by specialized
protein complexes called cohesins) until
DNA replication M phase. As the chromosomes condense
at M phase they are now seen to consist of
two sister chromatids, each containing a
early S phase: one DNA double helix per chromosome
centromere
DNA duplex, that are bound together only
at the centromeres. During M phase the
chromosome
two sister chromatids separate to form two
DNA double helix independent chromosomes that are then
chromosomes = 2n equally distributed into the daughter cells.
DNA = 2C
A small subset of diploid body cells constitute the germ line that gives rise to
gametes (sperm cells or egg cells). In humans, where n = 23, each gamete con-
tains one sex chromosome plus 22 non-sex chromosomes (autosomes). In eggs,
the sex chromosome is always an X; in sperm it may be either an X or a Y. After a
haploid sperm fertilizes a haploid egg, the resulting diploid zygote and almost all
gamete
of its descendant cells have the chromosome constitution 46,XX (female) or 46,XY production
(male) (Figure 2.2).
Cells outside the germ line are somatic cells. Human somatic cells are usually
diploid but, as will be described later, there are notable exceptions. Some types of
non-dividing cell lack a nucleus and any chromosomes, and so are nulliploid.
Other cell types have multiple chromosome sets; they are naturally polyploid as egg (23,X) sperm sperm
a result of multiple rounds of DNA replication without cell division. (23,X) (23,Y)
Figure 2.2 The human life cycle, from a chromosomal viewpoint. Haploid
egg and sperm cells originate from diploid precursors in the ovary and testis in cell growth, division,
women and men, respectively. All eggs have a 23,X chromosome constitution, and development
representing 22 autosomes plus a single X sex chromosome. A sperm can carry
either sex chromosome, so that the chromosome constitution is 23,X (50%) and
23,Y (50%). After fertilization and fusion of the egg and sperm nuclei, the diploid
zygote will have a chromosome constitution of either 46,XX or 46,XY, depending
on which sex chromosome the fertilizing sperm carried. After many cell cycles,
this zygote gives rise to all cells of the adult body, almost all of which will have
the same chromosome complement as the zygote from which they originated. 46,XX 46,XY
32 Chapter 2: Chromosome Structure and Function
anaphase
prophase
cytokinesis
centrioles
cells (see Figure 2.3). Interaction between the mitotic spindle and the centromere
is crucial to this process and we will consider this in detail in Section 2.3.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X sperm 5
The second division of meiosis is identical to mitosis, but the first division has
important differences. Its purpose is to generate genetic diversity between the
daughter cells. This is done by two mechanisms: independent assortment of
paternal and maternal homologs, and recombination. (A)
Independent assortment
Every diploid cell contains two chromosome sets and therefore has two copies
(homologs) of each chromosome (except in the special case of the X and Y chro-
mosomes in males). One homolog is paternally inherited and the other is mater-
nally inherited. During meiosis I the maternal and paternal homologs of each
pair of replicated chromosomes undergo synapsis by pairing together to form a
bivalent. After DNA replication, the homologous chromosomes each comprise pairing
two sister chromatids, so each bivalent is a four-stranded structure at the meta-
phase plate. Spindle fibers then pull one complete chromosome (two chroma- (B)
tids) to either pole. In humans, for each of the 23 homologous pairs, the choice of
which daughter cell each homolog enters is independent. This allows 223 or about
8.4 ¥ 106 different possible combinations of parental chromosomes in the gam-
etes that might arise from a single meiotic division (Figure 2.5).
Recombination
The five stages of prophase of meiosis I (Figure 2.6) begin during fetal life and, in crossing over
human females, can last for decades. During this extended process, the homologs
within each bivalent normally exchange segments of DNA at randomly posi- (C)
tioned but matching locations. At the zygotene stage (Figure 2.6B), a proteina-
ceous synaptonemal complex forms between closely apposed homologous
chromosomes. Completion of the synaptonemal complex marks the start of the
pachytene stage (Figure 2.6C), during which recombination (crossover) occurs.
Crossover involves physical breakage of the DNA in one paternal and one mater-
nal chromatid, and the subsequent joining of maternal and paternal fragments.
The mechanism that allows alignment of the homologs is not known (see
Figure 2.6A, B), although such close apposition is required for recombination.
partial separation
Located at intervals on the synaptonemal complex are very large multiprotein
assemblies, called recombination nodules, that may mediate recombination (D) chiasmata
Figure 2.7 Metaphase I to production of gametes. (A) At metaphase I, the bivalents (A) metaphase plate
align on the metaphase plate, at the center of the spindle apparatus. Contraction of
spindle fibers draws the chromosomes in the direction of the spindle poles (arrows).
(B) The transition to anaphase I occurs at the consequent rupture of the chiasmata.
(C) Cytokinesis segregates the two chromosome sets, each to a different primary
spermatocyte. Note that, as shown in this panel, after recombination during prophase
I the chromatids share a single centromere but are no longer identical. (D) Meiosis II
in each primary spermatocyte, which does not include DNA replication, generates
unique genetic combinations in the haploid secondary spermatocytes. Only 2 of the
possible 23 different human chromosomes are depicted, for clarity, so only 22 (i.e. 4) of
the possible 223 (8,388,608) possible combinations are illustrated. Although oogenesis
can produce only one functional haploid gamete per meiotic division (see Figure 2.4),
the processes by which genetic diversity arises are the same as in spermatogenesis.
Location all tissues specialized germ line cells in testis and ovary
DNA replication and cell division normally one round of only one round of replication per two cell divisions
replication per cell division
Duration of prophase short (~30 min in human cells) can take decades to complete
Recombination rare and abnormal during each meiosis; normally occurs at least once in each
chromosome arm after pairing of maternal and paternal homologs
Relationship between daughter cells genetically identical genetically different as a result of independent assortment of
homologs and recombination
supercoiling to form chromatin. Even in the interphase nucleus, when the DNA
is in a very highly extended form, the 2 nm thick DNA double helix undergoes at
least two levels of coiling that are directed by binding of basic histone proteins.
First, a 10 nm thick filament is formed that is then coiled into a 30 nm thick chro-
matin fiber. The chromatin fiber undergoes looping and is supported by a scaf-
fold of nonhistone proteins (Figure 2.8A).
The nucleosome is the fundamental unit of DNA packaging: a stretch of
147 bp of double-stranded DNA is coiled in just less than two turns around a
central core of eight histone proteins (two molecules each of the core histones
H2A, H2B, H3, and H4, all highly conserved proteins) (Figure 2.8B). Adjacent
nucleosomes are connected by a short length (8–114 bp) of linker DNA; the length
of the linker DNA varies both between species and between regions of the
genome. Electron micrographs of suitable preparations show nucleosomes to
have a string of beads appearance (Figure 2.8C). This first level of DNA packaging
is the only one that still allows transcriptional activity.
The N-terminal tails of the core histones protrude from the nucleosomes (see
Figure 2.8B). Specific amino acids in the histone tails can undergo various types
of post-translational modification, notably acetylation, phosphorylation, and
methylation. As a result, different proteins can be bound to the chromatin in a
way that affects chromatin condensation and the local level of transcriptional
activity. Additional histone genes encode variant forms of the core histones that
(A) nucleosome
H1 histone
2 nm
DNA uncoiling to enable
high-level gene
expression
formation of
solenoid structure
looped domain
10 nm
octameric
histone core 30 nm
chromatin
fiber
(C) (D)
H1 histone DNA
N
histone
octamer
N
N
N
N N
Figure 2.8 From DNA double helix to interphase chromatin. (A) Binding of basic histone proteins. The 2 nm thick DNA double helix binds basic histones to
undergo coiling, forming a 10 nm thick nucleosome filament that is further coiled into the 30 nm chromatin fiber. In interphase the chromatin fiber is organized
into looped domains with ~50–200 kb of DNA attached to a scaffold of nonhistone acidic proteins. High levels of gene expression require local uncoiling of the
chromatin fiber to give the 10 nm nucleosomal filaments. (B) A nucleosome consists of almost two turns of DNA wrapped round an octamer of core histones
(two each of H2, H3A, H3B, and H4). Note the extensive a-helical structure of histones and their protruding N-terminal tails. (C) Electron micrograph of 10 nm
nucleosomal filaments. (D) Cross section view of the 30 nm chromatin fiber showing one turn of the solenoid [the outer red box corresponds to that shown in
(A)]. The additional H1 histone, which binds to linker DNA, is important in organizing the structure of the 30 nm chromatin fiber. [(A) adapted from Grunstein M
(1992) Sci. Am. 267, 68. With permission from Scientific American Inc., and Alberts B, Johnson A, Lewis J et al. (2008) Molecular Biology of the Cell, 5th ed. Garland
Science/Taylor & Francis LLC. (B) adapted from Alberts B, Johnson A, Lewis J et al. (2008) Molecular Biology of the Cell, 5th ed. Garland Science/Taylor & Francis
LLC from figures by Jakob Waterborg, University of Missouri—Kansas City. (C) courtesy of Jakob Waterborg, University of Missouri—Kansas City. (D) adapted
from Klug A (1985) Proceedings, RW Welch Federation Conference, Chem. Res. 39, 133. With permission from The Welch Foundation.]
38 Chapter 2: Chromosome Structure and Function
kinetochore
microtubule
Figure 1 (A) Centrosome structure. (B) Kinetochore-centromere association. polar astral
(C) Mitotic spindle structure. microtubule spindle pole microtubule
40 Chapter 2: Chromosome Structure and Function
anaphase, the kinetochore microtubules pull the previously paired sister chro- centromere
matids toward opposite poles of the spindle. Kinetochores at the centromeres
TCACATGAT 80–90 bp TGATTTCCGAA
control assembly and disassembly of the attached microtubules, which drives AGTGTACTA > 90% (A+T) ACTAAAGGCTT
chromosome movement. CDE I CDE II CDE III
In the budding yeast Saccharomyces cerevisiae, the sequences that specify
centromere function are very short, as are other functional chromosomal ele-
ments (Figure 2.10). The centromere element (CEN) is about 120 bp long and telomere
contains three principal sequence elements, of which the central one, CDE II, is tandem repeats based on the general formula
particularly important for attaching microtubules to the kinetochore. A centro- (TG)1–3 TG2–3
meric CEN fragment derived from one S. cerevisiae chromosome can replace the
centromere of another yeast chromosome with no apparent consequence.
autonomous replicating sequence
S. cerevisiae centromeres are highly unusual because they are very small and
the DNA sequences specify the sites of centromere assembly. They do not have ~50 bp
counterparts in the centromeres of the fission yeast Schizosaccharomyces pombe
or in those of multicellular animals. Centromere size has increased during
11 bp AT-rich imperfect copies
eukaryote evolution and, in complex organisms, centromeric DNA is dominated core element of core element
by repeated sequences that evolve rapidly and are species-specific. The relatively
rapid evolution of centromeric DNA and associated proteins might contribute to Figure 2.10 In S. cerevisiae, chromosome
the reproductive isolation of emerging species. function is uniquely dependent on
Although centromeric DNA shows remarkable sequence heterogeneity short defined DNA sequence elements.
across eukaryotes, centromeres are universally marked by the presence of a S. cerevisiae centromeres are very short
(often 100–110 bp) and, unusually for
centromere-specific variant of histone H3, generically known as CenH3 (the
eukaryotic centromeres, are composed of
human form of CenH3 is named CENP-A). At centromeres, CenH3/CENP-A defined sequence elements. There are three
replaces the normal histone H3 and is essential for attachment to spindle micro- contiguous centromere DNA elements
tubules. Depending on centromere organization, different numbers of spindle (CDEs), of which CDE II and CDE III are the
microtubules can be attached (Figure 2.11). most important functionally. Telomeres
Mammalian centromeres are particularly complex. They often extend over are composed of tandem TG-rich repeats.
several megabases and contain some chromosome-specific as well as repetitive Autonomous replicating sequences are
DNA. A major component of human centromeric DNA is a-satellite DNA, whose defined by short AT-rich sequences. The
structure is based on tandem repeats of a 171 bp monomer. Adjacent repeat units three types of short sequence can be
can show minor variations in sequence, and occasional tandem amplification of combined with foreign DNA to make an
artificial chromosome in yeast cells.
a sequence of several slightly different neighboring repeats results in a higher-
order repeat organization. This type of a-satellite DNA is characteristic of cen-
tromeres and is marked by 17 bp recognition sites for the centromere-binding
protein CENP-B.
Unlike the very small discrete centromeres of S. cerevisiae, the much larger
centromeres of other eukaryotes are not dependent on sequence organization
alone. Neither specific DNA sequences (e.g. a-satellite DNA) nor the DNA-
binding protein CENP-B are essential or sufficient to dictate the assembly of a
functional mammalian centromere. Poorly understood DNA characteristics
specify a chromatin conformation that somehow, by means of sequence-
independent mechanisms, controls the formation and maintenance of a func-
tional centromere.
tary strand. On the centromeric side of the human telomeric TTAGGG repeats are
a further 100–300 kb of telomere-associated repeat sequences (Figure 2.12A). 125 bp
These have not been conserved during evolution, and their function is not yet
understood.
(B) S. pombe regional centromere
The (TTAGGG)n array of a human telomere often spans about 10–15 kb (see
Figure 2.12A). A very large protein complex (called shelterin, or the telosome)
contains several components that recognize and bind to telomeric DNA. Of these
components, two telomere repeat binding factors (TRF1 and TRF2) bind to
double-stranded TTAGGG sequences.
As a result of natural difficulty in replicating the lagging DNA strand at the
extreme end of a telomere (discussed in the next section), the G-rich strand has a OTR IMR core IMR OTR
single-stranded overhang at its 3¢ end that is typically 150–200 nucleotides long central domain
(see Figure 2.12A). This can fold back and form base pairs with the other, C-rich, 40–100 kb
strand to form a telomeric loop known as the T-loop (Figure 2.12B, C).
(C) H. sapiens
regional centromere
Figure 2.11 Differences in eukaryotic centromere organization. In all
eukaryotic centromeres a centromere-specific variant of histone H3 (generically
called CenH3) is implicated in microtubule binding. However, the extent to which
the centromere extends across the DNA of a chromosome and the number of
microtubules involved can vary enormously. (A) The budding yeast S. cerevisiae
has the simplest form of centromere organization, a point centromere, with just
125 bp of DNA wrapped around a single nucleosome; each kinetochore makes
only one stable microtubule attachment during metaphase. (B) In the fission
yeast S. pombe, centromeres have multiple microtubule attachments clustered in type II type I type II
a-satellites a-satellites a-satellites
a region occupying 35–110 kb of DNA (the average chromosome has more than
4 Mb of DNA). The microtubule attachment sites are clustered in a non-repetitive 0.1–4 Mb
core sequence which is flanked by different types of repeat sequence (IMR,
innermost; OTR, outermost). (C) Human centromeres have multiple microtubule
(D) C. elegans
attachment sites and are clustered over regions of up to 4 Mb of DNA. Higher-order diffuse centromere
a-satellite DNA repeats are prominent at the centromeres. In addition to binding
to nucleosomes containing a CenH3 protein, CENP-A, they also have binding sites
for the CENP-B protein. (D) In some eukaryote species, such as the nematode
Caenorhabditis elegans, the centromere is very diffuse: kinetochores that bind
spindle microtubules are distributed across the whole length of the chromosome,
and the chromosomes are said to be holocentric. [Adapted from Vagnarelli P, Ribeiro
SA & Earnshaw WC (2008) FEBS Lett. 582, 1950–1959. With permission from Elsevier.] telomere telomere
42 Chapter 2: Chromosome Structure and Function
Paramecium TTGGGG
Trypanosoma TAGGGG
Chlamydomonas TTTTAGGG
Arabidopsis TTTAGGG
Nematodes TTAGGC
Vertebrates TTAGGG
aIn the direction toward the end of the chromosome. Note: although telomere function is conserved
throughout eukaryotes and telomeric repeat sequences are generally strongly conserved, the
telomeres of arthropods, such as Drosophila, are radically different in structure, being composed
of long DNA repeats that are unrelated to the TG-rich oligonucleotide repeats found in other
eukaryotes.
G 21, 22, Y small; acrocentric, with satellites on 21 and 22 but not on Y BOX 2.2 HUMAN CHROMOSOME
aAutosomes are numbered from largest to smallest, except that chromosome 21 is slightly smaller NOMENCLATURE
than chromosome 22. bA metacentric chromosome has its centromere at or near the middle. The International System for Human
cA submetacentric chromosome has its centromere placed so that the two arms are of clearly
Cytogenetic Nomenclature (ISCN) is
unequal length. dAn acrocentric chromosome has its centromere at or near one end. eA satellite,
fixed by the Standing Committee on
in this context, is a small segment separated by a non-centromeric constriction from the rest of a
Human Cytogenetic Nomenclature that
chromosome; these occur on the short arms of most acrocentric human chromosomes.
issues detailed nomenclature reports,
most recently in 2005 (see Further
Reading). The basic terminology for
Fluorescing bands are called Q bands and mark the same chromosomal seg- banded chromosomes was decided at
ments as G bands. a meeting in Paris in 1971, and is often
referred to as the Paris nomenclature.
• R-banding — this is essentially the reverse of the G-banding pattern. The Short arm locations are labeled
chromosomes are heat-denatured in saline before being stained with Giemsa. p (petit) and long arms q (queue).
The heat treatment denatures AT-rich DNA, and R bands are Q negative. The Each chromosome arm is divided into
same pattern can be produced by binding GC-specific dyes such as chromo- regions labeled p1, p2, p3, etc., and q1,
mycin A3, olivomycin, or mithramycin. q2, q3, etc., counting outward from the
centromere. Regions are delimited by
• T-banding — identifies a subset of the R bands that are especially concen-
specific landmarks, which are consistent
trated close to the telomeres. The T bands are the most intensely staining of and distinct morphological features,
the R bands and are revealed by using either a particularly severe heat treat- such as the ends of the chromosome
ment of the chromosomes before they are stained with Giemsa, or a combina- arms, the centromere, and certain
tion of standard dyes and fluorescent dyes. bands. Regions are divided into bands
• C-banding — this is thought to demonstrate constitutive heterochromatin, labeled p11 (one-one, not eleven!),
mainly at the centromeres. The chromosomes are typically exposed to dena- p12, p13, etc., sub-bands labeled p11.1,
p11.2, etc., and sub-sub-bands, for
turation with a saturated solution of barium hydroxide, before Giemsa
example p11.21, p11.22, in each case
staining. counting outward from the centromere
Banding patterns correlate with functional elements of chromosome struc- (see Figure 2.14 and also the inside back
ture. The DNA of G bands replicates late in S phase and is relatively condensed, cover of this book, where ideograms are
whereas the DNA of R bands (which are G negative) generally replicates early in S shown for all human chromosomes).
phase and is less condensed. G-band DNA contains relatively few genes and is Relative distance from the
less active transcriptionally. Although the AT content of human G-band DNA is centromere is described by the words
proximal and distal. Thus, proximal Xq
only slightly higher than that of R-band DNA, individual G bands consistently
means the segment of the long arm of
have lower GC content than their immediate flanking sequences. This may cor- the X that is closest to the centromere,
relate with how the DNA becomes organized as it condenses. and distal 2p means the portion of
Banding resolution can be increased by using more elongated chromosomes, the short arm of chromosome 2 that is
at or before prometaphase rather than metaphase. High-resolution procedures most distant from the centromere, and
for human chromosomes can identify 400, 550, or 850 bands (see Figures 2.14 therefore closest to the telomere.
and 2.15; see the inside back cover of this book for an ideogram for all human When comparing human
chromosomes). Analysis of banding patterns can also be used to investigate chromosomes with those of another
chromosomal abnormalities involving missing and mispositioned genetic species, the convention is to use the
material. first letter of the genus name and the
first two letters of the species name, for
Reporting of cytogenetic analyses example:
• HSA18: human chromosome 18
A minimal cytogenetic analysis report provides a text-only statement that always (Homo sapiens)
gives the total number of chromosomes and the sex chromosome constitution (a • PTR10: chimp chromosome 10
karyotype). The normal karyotypes for human females and males are 46,XX and (Pan troglodytes)
46,XY, respectively. When there is a chromosomal abnormality, the karyotype • MMU6: mouse chromosome 6
also describes the type of abnormality and the chromosome bands or sub-bands (Mus musculus)
affected (Box 2.2).
46 Chapter 2: Chromosome Structure and Function
1 2 3 4 5
NUMERICAL
STRUCTURAL
Insertion 46,XX,ins(2)(p13q21q31) a rearrangement of one copy of chromosome 2 by insertion of segment 2q21–q31 into a
breakpoint at 2p13
Marker 47,XX,+mar indicates a cell that contains a marker chromosome (an extra unidentified chromosome)
Reciprocal 46,XX,t(2;6)(q35;p21.3) a balanced reciprocal translocation with breakpoints at 2q35 and 6p21.3
translocation
Robertsonian 45,XY,der(14;21)(q10;q10) a balanced carrier of a 14;21 Robertsonian translocation. q10 is not really a chromosome
translocation (gives band, but indicates the centromere; der is used when one chromosome from a translocation
rise to one derivative is present
chromosome)
46,XX,der(14;21)(q10;q10),+21 an individual with Down syndrome possessing one normal chromosome 14, a Robertsonian
translocation 14;21 chromosome, and two normal copies of chromosome 21
This is a short nomenclature; a more complicated nomenclature is defined by the ISCN that allows complete description of any chromosome
abnormality; see Shaffer LG, Tommerup N (eds) (2005) ISCN 2005: An International System for Human Cytogenetic Nomenclature. Karger.
Chromosomes that do not enter the nucleus of a daughter cell are eventually
66%
degraded.
Mixoploidy
Mixoploidy means having two or more genetically different cell lineages within 2n 2n
one individual. The genetically different cell populations usually arise from the
same zygote (mosaicism). More rarely, they originate from different zygotes n
(chimerism); spontaneous chimerism usually arises by the aggregation of
10% endomitosis
Figure 2.21 Origins of triploidy and tetraploidy. (A) Origins of human triploidy.
Dispermy is the principal cause, accounting for 66% of cases. Triploidy is also
caused by diploid gametes that arise by occasional faults in meiosis; fertilization
n 4n
of a diploid ovum and fertilization by a diploid sperm account for 10% and 24%
of cases, respectively. (B) Tetraploidy involves normal fertilization and fusion of 2n
gametes to give a normal zygote. Subsequently, however, tetraploidy arises by
endomitosis when DNA replicates without subsequent cell division. 24%
52 Chapter 2: Chromosome Structure and Function
fraternal twin zygotes, or immediate descendant cells therefrom, within the early
embryo. Abnormalities that would otherwise be lethal (such as triploidy) may not
be lethal in mixoploid individuals.
Aneuploidy mosaics are common. For example, mosaicism resulting in a pro-
portion of normal cells and a proportion of aneuploid (e.g. trisomic) cells can be
ascribed to nondisjunction or chromosome lag occurring in one of the mitotic
divisions of the early embryo (any monosomic cells that are formed usually die
out). Polyploidy mosaics (e.g. human diploid/triploid mosaics) are occasionally
found. As gain or loss of a haploid set of chromosomes by mitotic nondisjunction
is extremely unlikely, human diploid/triploid mosaics most probably arise by
fusion of the second polar body with one of the cleavage nuclei of a normal dip-
loid zygote.
Clinical consequences
Having the wrong number of chromosomes has serious, usually lethal, conse-
quences (Table 2.5). Even though the extra chromosome 21 in a person with tri-
somy 21 (Down syndrome) is a perfectly normal chromosome, inherited from a
normal parent, its presence causes multiple abnormalities that are present from
birth (congenital). Embryos with trisomy 13 or trisomy 18 can also survive to
term, but both result in severe developmental malformations, respectively Patau
syndrome and Edwards syndrome. Other autosomal trisomies are not compati-
ble with life. Autosomal monosomies have even more catastrophic consequences
than trisomies, and are invariably lethal at the earliest stages of embryonic life.
The developmental abnormalities associated with monosomies and trisomies
must be the consequence of an imbalance in the levels of gene products encoded
on different chromosomes. Normal development and function depend on innu-
merable interactions between gene products that are often encoded on different
chromosomes. For at least some of these interactions, balancing the levels of
gene products encoded by genes on different chromosomes is critically impor-
tant; each chromosome probably contains several genes for which the amount of
POLYPLOIDY
Triploidy (69,XXX or 69,XYY) 1–3% of all conceptions; almost never born live and do not
survive long
ANEUPLOIDY AUTOSOMES
Trisomy (one extra usually lethal during embryonic or fetala stages, but individuals
chromosome) with trisomy 13 (Patau syndrome) and trisomy 18 (Edwards
syndrome) may survive to term; those with trisomy 21
(Down syndrome) may survive beyond age 40
Additional sex chromosomes individuals with 47,XXX, 47,XXY, or 47,XYY all experience
relatively minor problems and a normal lifespan
Lacking a sex chromosome although 45,Y is never viable, in 45,X (Turner syndrome), about
99% of cases abort spontaneously; survivors are of normal
intelligence but are infertile and show minor physical diagnostic
characteristics
aIn humans, the embryonic period spans fertilization through to the end of the eighth week of
gene product made is tightly regulated. Altering the relative numbers of chromo-
somes will affect these interactions, and reducing the gene copy number by 50%
could be expected to have more severe consequences than providing three gene
copies in place of two.
Having extra sex chromosomes can have far fewer ill effects than having an
extra autosome. People with 47,XXX or 47,XYY karyotypes often function within
the normal range, and 47,XXY men have relatively minor problems in compari-
son with people with any autosomal trisomy. Even monosomy (in 45,X women)
can have remarkably few major consequences—although 45,Y is always lethal.
Because normal people can have either one or two X chromosomes, and
either no or one Y, there must be special mechanisms that allow normal function
with variable numbers of sex chromosomes. For the Y chromosome, this is
because the very few genes it carries are very largely focused on determining
maleness. The mammalian X chromosome is also a special case because
X-chromosome inactivation in females controls the level of X-encoded gene
products independently of the number of X chromosomes in the cell.
It is not obvious why triploidy is lethal in humans and other animals. With
three copies of every autosome, the dosage of autosomal genes is balanced and
should not cause problems. Triploids are always sterile because triplets of chro-
mosomes cannot pair and segregate correctly in meiosis, but many triploid plants
are in all other respects healthy and vigorous. The lethality in animals may be due
to an imbalance between products encoded on the X chromosome and auto-
somes, for which X-chromosome inactivation would be unable to compensate.
(A) 2 breaks in same arm (B) 2 breaks in different arms Figure 2.22 Stable outcomes after
incorrect repair of two breaks on a
a a single chromosome. (A) Incorrect repair
of two breaks (green arrows) occurring
b b
in the same chromosome arm can
c c involve loss of the central fragment (here
containing hypothetical regions e and f )
d d and re-joining of the terminal fragments
e e (deletion), or inversion of the central
fragment through 180° and rejoining of
f f the ends to the terminal fragments (called
g g
a paracentric inversion because it does
not involve the centromere). (B) When
two breaks occur on different arms of the
deletion inversion inversion join broken same chromosome, the central fragment
ends
(encompassing hypothetical regions b to f
in this example) may invert and re-join the
c b
a a a terminal fragments (pericentric inversion).
b b f d Alternatively, because the central fragment
contains a centromere, the two ends can be
c c e joined to form a stable ring chromosome,
e
while the acentric distal fragments are lost.
d d d f Like other repaired chromosomes that retain
f ring a centromere, ring chromosomes can be
g c
chromosome
stably propagated to daughter cells.
e b
g g
satellite
satellite
stalk
proximal
short arm
Figure 2.23 Reciprocal and Robertsonian translocations. (A) Reciprocal translocation. The
derivative chromosomes are stable in mitosis when one acentric fragment is exchanged for another;
when a centric fragment is exchanged for an acentric fragment, unstable acentric and dicentric
chromosomes are produced. (B) Robertsonian translocation. This is a highly specialized reciprocal
translocation in which exchange of centric and acentric fragments produces a dicentric chromosome
that is nevertheless stable in mitosis, plus an acentric chromosome that is lost in mitosis without any
effect on the phenotype. It occurs exclusively after breaks in the short arms of the human acrocentric
chromosomes 13, 14, 15, 21, and 22. As illustrated on the right, the short arm of the acrocentric
chromosomes consists of three regions: a proximal heterochromatic region (composed of highly
repetitive noncoding DNA), a distal heterochromatic region (called a chromosome satellite), and a thin
connecting region of euchromatin (the satellite stalk) composed of tandem rRNA genes. Breaks that
occur close to the centromere can result in a dicentric chromosome in which the two centromeres are
so close that they can function as a single centromere. The loss of the small acentric fragment has no
phenotypic consequences because the only genes lost are rRNA genes that are also present in large
copy number on the other acrocentric chromosomes.
gametes
fertilization
by a normal
gamete
zygotes
21
14
14 21
outcomes of meiosis
14 21
21
14
fertilization
by a normal
gamete
Figure 2.25 Possible outcomes of meiosis in a carrier of a Robertsonian translocation. Carriers are asymptomatic but often
produce unbalanced gametes that can result in a monosomic or trisomic zygote. The two monosomic zygotes and the trisomy 14
zygote in this example would not be expected to develop to term.
correct parental origin. Rare 46,XX conceptuses in which both complete haploid
genomes originate from the same parent (uniparental diploidy) never develop
correctly, and experimentally induced uniparental diploidy in other mammals
also results in abnormal development. Even for particular chromosomes, having
both homologs derived from the same parent (uniparental disomy) can cause
abnormality. Parental origin of chromosomes is important because some genes
are expressed differently according to the parent of origin; such genes are subject
to imprinting.
Uniparental diploidy in humans arises most often by fertilization of a faulty
egg that lacks chromosomes by a diploid sperm (produced by chance chromo-
some doubling) or by the activation of an unovulated oocyte (parthenogenesis),
resulting in each case in a 46,XX karyotype. The resulting conceptuses develop
abnormally.
Normal human (and mammalian) development involves the production of
both an embryo that develops into a fetus and also extra-embryonic membranes,
including the trophoblast, that act to support development and give rise to the
placenta. Uniparental diploidy changes the important balance between the
embryo or fetus and its supporting membranes. Paternal uniparental diploidy
produces hydatidiform moles, abnormal conceptuses that develop to show wide-
spread hyperplasia (overgrowth) of the trophoblast but no fetal parts; they have a
significant risk of transformation into choriocarcinoma. Maternal uniparental
diploidy results in ovarian teratomas, rare benign tumors of the ovary that consist
of disorganized embryonic tissues but are lacking in vital extra-embryonic
membranes.
Uniparental disomy is thought to arise in most cases by trisomy rescue. In
such cases a trisomic conceptus that would otherwise probably die before birth
gives rise to a 46,XX or 46,XY embryo by chromosome loss. The chromosome loss
occurs through faulty mitotic division very early in development in a cell that has
58 Chapter 2: Chromosome Structure and Function
(B)
normal acentric
normal inverted
inverted
dicentric
the capacity to give rise to all the different cell types in the organism (a totipotent
cell). The progeny of this cell are euploid and form the embryo, and all the aneu-
ploid cells die. If each of the three copies of a trisomic chromosome has an equal
chance of being lost, there is a two in three chance that a single chromosome loss
will lead to the normal chromosome constitution and a one in three chance of
uniparental disomy (either paternal or maternal).
Uniparental disomy may often go undiagnosed because there are no obvious
phenotypic effects, and it is detected only when it results in characteristic syn-
dromes. In such cases, the chromosome involved contains imprinted genes that
are expressed incorrectly. Uniparental disomy can involve heterodisomy (one
parent contributes non-identical homologs, most probably by trisomy rescue).
Sometimes, however, the two chromosome copies inherited from one parent are
identical (isodisomy). This can happen by monosomy rescue when a monosomic
embryo that would otherwise die achieves euploidy by selective duplication of
the monosomic chromosome.
CONCLUSION
In this chapter we have described fundamental aspects of chromosome structure
and function, with an emphasis on human chromosomes and the generation of
chromosome abnormalities in humans. There are close similarities in the ways
that eukaryotic chromosomes are structured, notably in the way that DNA is
complexed with histones and organized into higher-order structures.
FURTHER READING 59
But there are also important differences. During eukaryote evolution, cen-
tromeres have become progressively larger and more complex, and in more com-
plex eukaryotes they are dominated by rapidly evolving species-specific re-
petitive DNA sequences. The mammalian sex chromosomes have unique
characteristics and we will consider them in greater detail in Chapter 10 when we
consider chromosome evolution. Specialized mammalian X-inactivation and
imprinting systems will also be explored in greater detail in the context of gene
expression in Chapter 11, as will the relationship between chromatin structure
and gene expression.
We will also provide many illustrative examples on how chromosome abnor-
malities cause inherited disease and cancer in Chapters 16 and 17 that also con-
tain fuller descriptions of some of the technologies used to analyze chromo-
somes, such as array CGH.
FURTHER READING
Chromosome structure and function Rooney DE (2001) (ed.) Human Cytogenetics: Constitutional
Belmont AS (2006) Mitotic chromosome structure and Analysis, 3rd ed. Oxford University Press.
condensation. Curr. Opin. Cell Biol. 18, 632–638. Rooney DE (2001) (ed.) Human Cytogenetics: Malignancy and
Blackburn EH (2001) Switching and signaling at the telomere. Acquired Abnormalities, 3rd ed. Oxford University Press.
Cell 106, 661–673. Schinzel A (2001) Catalogue of Unbalanced Chromosome
Courbet S, Gay S, Arnoult N et al. (2008) Replication fork Aberrations in Man. Walter de Gruyter.
movement sets chromatin loop size and origin choice in Shaffer LG & Bejani BA (2006) Medical applications of array CGH
mammalian cells. Nature 455, 557–560. and the transformation of clinical cytogenetics. Cytogenet.
Haering CH, Farcas AM, Arumugam P et al. (2008) The cohesin Genome Res. 115, 303–309.
ring concatenates sister DNA molecules. Nature 454, Therman E & Susman M (1992) Human Chromosomes: Structure,
297–301. Behavior and Effects, 3rd ed. Springer.
Herrick J & Bensimon A (2008) Global regulation of genome Trask BJ (2002) Human cytogenetics: 46 chromosomes, 46 years
duplication in eukaryotes: an overview from the and counting. Nat. Rev. Genet. 3, 769–778.
epifluorescence microscope. Chromosoma 117, 243–260. Human cytogenetic nomenclature
Louis EJ & Vershinin AV (2005) Chromosome ends: different
sequences may provide conserved functions. BioEssays Shaffer LG & Tommerup N (eds) (2005) ISCN 2005: An
27, 685–697. International System for Human Cytogenetic Nomenclature.
Meaburn KJ & Misteli T (2007) Chromosome territories. Karger.
Nature 445, 379–381. Web-based cytogenetic resources
Riethman H (2008) Human telomere structure and biology.
Annu. Rev. Genomics Hum. Genet. 9, 1–19. Useful compilations are available on certain sites such as
Vagnarelli P, Ribeiro SA & Earnshaw WC (2008) Centromeres: old www.kumc.edu/gec/prof/cytogene.html
tales and new tools. FEBS Lett. 582, 1950–1959.
• The pattern of inheritance of a character follows Mendel’s rules if its presence, absence, or
specific nature is normally determined by the genotype at a single locus.
• Mendelian characters can be dominant, recessive, or co-dominant. A character is dominant
if it is evident in a heterozygous person, recessive if not.
• Human Mendelian characters give pedigree patterns that are often recognizable, although
seldom as unambiguous as the results of laboratory breeding experiments because of the
limited size and non-ideal structure of most human families.
• Genetically controlled characters can be dichotomous (present or absent) or continuous
(quantitative). The genes controlling quantitative characters are called quantitative trait
loci.
• Most characters depend on more than one genetic locus, and often also on environmental
factors. Such characters are called complex or multifactorial.
• Polygenic theory explains why many quantitative characters show a normal (Gaussian)
distribution in a population.
• Polygenic threshold theory provides a framework for understanding the genetics of
dichotomous multifactorial characters, but it is not a tool for predicting the characteristics
of individuals.
• Subject to certain conditions, gene frequencies and genotype frequencies are related by the
Hardy–Weinberg formula.
• Consanguineous matings contribute disproportionately to the incidence of rare autosomal
recessive conditions; ignoring this when using the Hardy–Weinberg formula leads to
incorrect gene frequencies.
• Gene frequencies in populations are the result of a combination of random processes
(genetic drift) and the effects of new mutation and natural selection.
• Heterozygote advantage often explains why an autosomal recessive condition is particularly
common in a population.
62 Chapter 3: Genes in Pedigrees and Populations
Genetics as a science started with Gregor Mendel’s experiments in the 1860s. BOX 3.1 NOMENCLATURE OF
Mendel bred and crossed pea plants, counted the numbers of plants in each gen- HUMAN GENES AND ALLELES
eration that manifested certain traits, and deduced mathematical rules to explain
his results. Patterns of inheritance in humans (and in all other diploid organisms Human genes have upper-case
that reproduce sexually) follow the same principles that Mendel identified in his italicized names; examples are CFTR
peas, and are explained by the same basic concepts. and CDKN2A.
The process of discovery often
Genes can be defined in two ways:
involves several competing research
• A gene is a determinant, or a co-determinant, of a character that is inherited groups, so that the same gene may
in accordance with Mendel’s rules. initially be referred to by several
• A gene is a functional unit of DNA. different names. Eventually an official
name is assigned by the HUGO (Human
In this chapter we will be exploring genes through the first of these defini- Genome Organisation) Nomenclature
tions. Later chapters will consider genes as DNA. A major aim of human molecu- Committee, and this is the name
lar genetics is to understand how genes as functional DNA sequences determine that should be used henceforth.
the observable characters of a person. Official names can be found on the
A locus (plural loci) is a unique chromosomal location defining the position nomenclature website (http://
of an individual gene or DNA sequence. Thus, we can speak of the ABO blood www.genenames.org) or from entries
group locus, the Rhesus blood group locus, and so on. in genome browser programs such as
Alleles are alternative versions of a gene. For example, A, B, and O are alterna- Ensembl (described in Chapter 8).
The formal nomenclature for alleles
tive alleles at the ABO locus. Box 3.1 shows the formal rules for nomenclature of
uses the gene name followed by an
alleles. However, for pedigree interpretation, alleles are usually simply denoted asterisk and then the allele name or
by upper- and lower-case versions of the same letter, so that the genotypes at a number, for example, PGM*1, HLA-A*31.
locus would be written AA, Aa, or aa. The upper-case letter is conventionally used When only the allele name is used, it
for the allele that determines the dominant character (see below). can be written *1, *31. However, the
The genotype is a list of the alleles present at one or a number of loci. rules for nomenclature of alleles are less
Phenotypes, characters, or traits are the observable properties of an organ- rigorously adhered to in publications
ism. The means of observation may range from simple inspection to sophisti- than those for genes.
cated laboratory investigations.
A person is homozygous at a locus if both alleles at that locus are the same,
and heterozygous if they are different. For different purposes we may check more
or less carefully whether the two alleles at a locus are really the same. For the
pedigree interpretation in this chapter, we will be concerned only with the phe-
notypic consequences of a person’s genotype. We will describe a person as
homozygous if both alleles at the locus in question have the same phenotypic
effect, even though inspection of the DNA sequence might reveal differences
between them.
A person is hemizygous if they have only a single allele at a locus. This may be
because the locus is on the X or Y chromosome in a male, or it may be because
one copy of an autosomal locus is deleted.
A character is dominant if it is manifested in a heterozygous person, recessive
if not.
they are the characters of choice for following the transmission of a chromosomal
segment through a pedigree, as described in Chapter 13. Directly observed char-
acteristics of proteins (such as electrophoretic mobility or enzyme activity) usu-
ally display Mendelian pedigree patterns, but these characters can be influenced
by more than one locus because of post-translational modification of gene prod-
ucts. Disease states or other typical traits that reflect the biochemical action of a
protein within a cellular context are less often entirely determined at a single
genetic locus. The failure or malfunction of a developmental pathway that results
in a birth defect is likely to involve a complex balance of factors. Thus, the com-
mon birth defects (for example, cleft palate, spina bifida, or congenital heart dis-
ease) are rarely Mendelian. Behavioral traits such as IQ test performance or
schizophrenia are still less likely to be Mendelian—but they may still be geneti-
cally determined to a greater or smaller extent.
Non-Mendelian characters may depend on two, three, or many genetic loci.
We use multifactorial here as a catch-all term covering all these possibilities.
More specifically, the genetic determination may involve a small number of loci
(oligogenic) or many loci each of individually small effect (polygenic); or there
may be a single major locus with a polygenic background—that is, the genotype
at one locus has a major effect on the phenotype, but this effect is modified by the
cumulative minor effects of genes at many other loci. In reality, characters form a
continuous spectrum, from perfectly Mendelian through to truly polygenic
(Figure 3.1). Superimposed on this there may be a greater or smaller effect of
environmental factors.
For dichotomous characters (characters that you either have or do not have,
such as extra fingers) the underlying loci are envisaged as susceptibility genes,
whereas for quantitative or continuous characters (such as height or weight)
they are seen as quantitative trait loci (QTLs). Any of these characters may tend
to run in families, but the pedigree patterns are not Mendelian and do not fit any
standard pattern.
Figure 3.1 Spectrum of genetic
In a further layer of complication, a common human condition such as diabe-
determination. ABO blood group depends
tes is likely to be very heterogeneous in its causation. Some cases may have a (with rare exceptions) on the genotype at
simple Mendelian cause, some might be entirely the result of environmental fac- just one locus, the ABO locus at chromosome
tors, while the majority of cases may be multifactorial. Such conditions are called 9q34. Rhesus hemolytic disease of the
complex. newborn depends on the genotypes of
mother and baby at the RHD locus at
ABO Rhesus chromosome 1p36, but also on mother and
blood hemolytic Hirschsprung adult
group disease disease stature baby’s being ABO compatible. Hirschsprung
disease depends on the interaction of several
genetic loci. Adult stature is determined by
the cumulative small effects of many loci.
Environmental factors are also important in
Mendelian polygenic the etiology of Rhesus hemolytic disease,
(monogenic) Hirschsprung disease, and adult height.
64 Chapter 3: Genes in Pedigrees and Populations
twin brothers
affected
miscarriage
carrier (optional)
6 offspring,
dead 6 sex unknown/
not stated
I
III
1 2
1 2
II
IV ?
1 2 3 4 5 6 7 8 9 10 11 12
1 2 3 4
I I
1 2 1 2
II II
1 2 3 4 5 6 1 2 3 4 5 6 7 8
III III
1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 9 10
IV ? IV ?
1 2 3 4 5 6 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Figure 3.5 Pedigree pattern of an X-linked recessive Figure 3.6 Pedigree pattern of an X-linked dominant condition. The risk for the
condition. The females marked with dots are definitely individual marked with a query is negligibly low if male, but 100% if female.
carriers; individuals III3 and/or IV4 may also be carriers,
but we do not know. The risk for the individual marked
with a query is 1 in 2 males or 1 in 4 of all offspring.
I
1 2
II
1 2 3 4 5 6 7
III
1 2 3 4 5 6 7 8 9 10
they use X-inactivation (sometimes called lyonization after its discoverer, Dr (A)
Mary Lyon).
Early in embryogenesis each cell somehow counts its number of X chromo-
somes, and then permanently inactivates all X chromosomes except one in each
somatic cell. Inactivated X chromosomes are still physically present, and on a
standard karyotype they look entirely normal, but the inactive X remains con-
densed throughout the cell cycle. Most genes on the chromosome are perma-
nently silenced in somatic cells. The mechanism is discussed in Chapter 11. In
interphase cells the inactive X may be seen under the microscope as a Barr body
or sex chromatin (Figure 3.8). Regardless of the karyotype, each somatic cell
retains a single active X:
• XY males keep their single X active (no Barr body).
• XX females inactivate one X in each cell (one Barr body).
• Females with Turner syndrome (45,X) do not inactivate their X (no Barr
body).
• Males with Klinefelter syndrome (47,XXY) inactivate one X (one Barr body). (B)
• 47,XXX females inactivate two X chromosomes (two Barr bodies).
Figure 3.8 A Barr body. (A) A cell from an XX female has one inactivated
X chromosome and shows a single Barr body. (B) A cell from a 49,XXXXY male
has three inactivated X chromosomes and shows three Barr bodies. (Courtesy of
Malcolm Ferguson-Smith, University of Cambridge.)
MENDELIAN PEDIGREE PATTERNS 67
Y
I
IV 6
V 6
chains. But even with these extensions, Beadle and Tatum’s hypothesis cannot be
used to imply a one-to-one correspondence between entries in the OMIM cata-
logue and DNA transcription units.
The genes of classical genetics are abstract entities. Any character that is
determined at a single chromosomal location will segregate in a Mendelian pat-
tern—but the determinant may not be a gene in the molecular geneticist’s sense
of the word. Fascio-scapulo-humeral muscular dystrophy (severe but non-lethal
weakness of certain muscle groups; OMIM 158900) is caused by small deletions
of sequences at 4q35—but, at the time of writing, nobody has yet found a relevant
protein-coding sequence at that location, despite intensive searching and
sequencing. This example is unusual—most OMIM entries probably do describe
the consequences of mutations affecting a single transcription unit. However,
there is still no one-to-one correspondence between phenotypes and transcrip-
tion units because of three types of heterogeneity:
• Locus heterogeneity is where the same clinical phenotype can result from
mutations at any one of several different loci. This may be due to epistasis,
where gene A controls the action of gene B, or lies upstream of gene B in a
pathway; alternatively genes A and B may function in separate pathways that
affect the same phenotype.
• Allelic heterogeneity is where many different mutations within a given gene
can be seen in different patients with a certain genetic condition (explored
more fully in Chapter 16). Many diseases show both locus and allelic
heterogeneity.
• Clinical heterogeneity is used here to describe the situation in which muta-
tions in the same gene produce two or more different diseases in different
people. For example, mutations in the HPRT gene can produce either a form
of gout or Lesch–Nyhan syndrome (severe mental retardation with behavioral
problems; OMIM 300322). Note that this is not the same as pleiotropy. That
term means that one mutation has multiple effects in the same organism.
Most mutations are pleiotropic if you look carefully enough at the pheno-
type.
Locus heterogeneity
Hearing loss provides good examples of locus heterogeneity. When two people
with autosomal recessive profound congenital hearing loss marry, as they often
do, the children most often have normal hearing. It is easy to see that many dif-
ferent genes would be needed to construct so exquisite a machine as the cochlear
hair cell, and a defect in any of those genes could lead to deafness. If the muta-
tions causing deafness are in different genes in the two parents, all their children
will be double heterozygotes—but assuming that both parental conditions are
recessive, the children will have normal hearing (Figure 3.12).
When two recessive characters may or may not be caused by mutations at the
same locus, a test cross between homozygotes for the two characters (in experi-
mental organisms) can provide the answer. This is called a complementation
test. If the mutations in the two parental stocks are at the same locus, the progeny
will not have a wild-type allele, and so will be phenotypically abnormal. If there
are two different loci, the progeny are heterozygous for each of the two recessive
characters, and therefore phenotypically normal, as illustrated in Table 3.1.
Very occasionally alleles at the same locus can complement each other
(interallelic complementation). This might happen if the gene product is a pro-
tein that dimerizes, and the mutant alleles in the two parents affect different
parts of the protein, such that a heterodimer may retain some function. In gen-
eral, however, if two mutations complement each other, it is reasonable to assume
that they involve different loci. Cell-based complementation tests (fusing cell
lines in tissue culture) have been important in sorting out the genetics of human
phenotypes such as DNA repair defects, in which the abnormality can be observed
in cultured cells. Hearing loss provides a rare opportunity to see complementa-
tion in action in human pedigrees.
Locus heterogeneity is only to be expected in conditions such as deafness,
blindness, or learning disability, in which a rather general pathway has failed; but
even with more specific pathologies, multiple loci are very frequent. A striking
example is Usher syndrome, an autosomal recessive combination of hearing loss
and progressive blindness (retinitis pigmentosa), which can be caused by muta-
tions at any of 10 or more unlinked loci. OMIM has separate entries for known
examples of locus heterogeneity (defined by linkage or mutation analysis), but
there must be many undetected examples still contained within single entries.
Clinical heterogeneity
Sometimes several apparently distinct human phenotypes turn out all to be
caused by different allelic mutations at the same locus. The difference may be
one of degree—mutations that partly inactivate the dystrophin gene produce
Becker muscular dystrophy (OMIM 300376), whereas mutations that completely
inactivate the same gene produce the similar but more severe Duchenne muscu-
lar dystrophy (lethal muscle wasting; OMIM 310200). At other times the differ-
ence is qualitative—inactivation of the androgen receptor gene causes androgen
insensitivity (46,XY embryos develop as females; OMIM 313700), but expansion
of a run of glutamine codons within the same gene causes a very different
disease, spinobulbar muscular atrophy or Kennedy disease (OMIM 313200).
These and other genotype–phenotype correlations are discussed in more depth
in Chapter 13.
II
III
Mendelian polygenic
(monogenic)
probability
time, and so the risk of having acquired it is cumulative and increases through
life. Depending on the disease, the penetrance may become 100% if the person 0.5
lives long enough, or there may be people who carry the gene but who will never B
0.4
develop symptoms no matter how long they live. Age-of-onset curves such as
0.3
those in Figure 3.16 are important tools in genetic counseling, because they
enable the geneticist to estimate the chance that an at-risk but asymptomatic 0.2
person will subsequently develop the disease. 0.1
0
Many conditions show variable expression 0 10 20 30 40 50 60 70
age (years)
Related to non-penetrance is the variable expression frequently seen in domi-
nant conditions. Figure 3.17 shows an example from a family with Waardenburg Figure 3.16 Age-of-onset curves for
syndrome. Different family members show different features of the syndrome. Huntington disease. Curve A shows the
The cause is the same as with non-penetrance: other genes, environmental fac- probability that an individual carrying the
tors, or pure chance have some influence on the development of the symptoms. disease gene will have developed symptoms
Non-penetrance and variable expression are typically problems with dominant, by a given age. Curve B shows the risk at a
rather than recessive, characters. In part this reflects the difficulty of spotting given age that an asymptomatic person who
has an affected parent nevertheless carries
non-penetrant cases in a typical recessive pedigree. However, as a general rule,
the disease gene. [From Harper PS (1998)
recessive conditions are less variable than dominant ones, probably because the Practical Genetic Counselling, 5th ed.
phenotype of a heterozygote involves a balance between the effects of the two With permission from Edward Arnold
alleles, so that the outcome is likely to be more sensitive to outside influence than (Publishers) Ltd.]
the phenotype of a homozygote. However, both non-penetrance and variable
expression are occasionally seen in recessive conditions.
These complications are much more conspicuous in humans than in experi-
mental organisms. Laboratory animals and crop plants are far more genetically
uniform than humans, and live in much more constant environments. What we
see in human genetics is typical of a natural mammalian population. Nevertheless,
mouse geneticists are familiar with the way in which the expression of a mutant
gene can change when it is bred onto a different genetic background—an impor-
tant consideration when studying mouse models of human diseases.
Anticipation
Anticipation describes the tendency of some conditions to become more severe
(or have earlier onset) in successive generations. Anticipation is a hallmark of
conditions caused by a very special genetic mechanism (dynamic mutation)
whereby a run of tandemly repeated nucleotides, such as CAGCAGCAGCAGCAG,
is meiotically unstable. A (CAG)40 sequence in a parent may appear as a (CAG)55
sequence in a child. Certain of these unstable repeat sequences become patho-
genic above some threshold size. Examples include fragile X syndrome (mental
retardation with various physical signs; OMIM 309550), myotonic dystrophy (a
II
III
II 4
III 3 4 2 5
III
(A) (B)
I I
1 2
II II
1 2 3 4 5
III III
1
Figure 3.21 Complications to the basic Mendelian patterns (7): new mutations. (A) A new autosomal dominant mutation. The
pedigree pattern mimics an autosomal or X-linked recessive pattern. (B) A new mutation in X-linked recessive Duchenne muscular
dystrophy. The three grandparental X chromosomes were distinguished by using genetic markers; here we distinguish them with
three different colors (ignoring recombination). III1 has the grandpaternal X, which has acquired a mutation at one of four possible
points in the pedigree—each with very different implications for genetic counseling:
• If III1 carries a new mutation, the recurrence risk for all family members is very low.
• If II1 is a germinal mosaic, there is a significant (but hard to quantify) risk for her future children, but not for those of her three
sisters.
• If II1 was the result of a single mutant sperm, her own future offspring have the standard recurrence risk for an X-linked recessive
trait, but those of her sisters are free of risk.
• If I1 was a germinal mosaic, all four sisters in generation II have a significant (but hard to quantify) risk of being carriers of the
condition.
• The mutation causes abnormal proliferation of a cell that would normally (A)
replicate slowly or not at all, thus generating a clone of mutant cells. This I
1 2
is how cancer happens, and this whole topic is discussed in detail in
Chapter 17.
A mutation in a germ-line cell early in development can produce a person
who harbors a large clone of mutant germ-line cells [germinal (or gonadal)
mosaicism]. As a result, a normal couple with no previous family history may II
produce more than one child with the same serious dominant disease. The pedi- 1 2 3
gree mimics recessive inheritance. Even if the correct mode of inheritance is real-
ized, it is very difficult to calculate a recurrence risk to use in counseling the par-
ents. The problem is discussed by van der Meulen et al. (see Further Reading).
Usually an empiric risk (see below) is quoted. Figure 3.21B shows an example of
the uncertainty that germinal mosaicism introduces into counseling, in this case III
in an X-linked disease. 1 2 3 4
Molecular studies can be a great help in these cases. Sometimes it is possible
to demonstrate directly that a normal father is producing a proportion of mutant
sperm. Direct testing of the germ line is not feasible in women, but other acces-
sible tissues such as fibroblasts or hair roots can be examined for evidence (B)
of mosaicism. A negative result on somatic tissues does not rule out germ-line III2 II2 III4
mosaicism, but a positive result, in conjunction with an affected child, proves it.
Individual II2 in Figure 3.22 is an example.
Chimeras
Mosaics start life as a single fertilized egg. Chimeras, in contrast, are the result of
fusion of two zygotes to form a single embryo (the reverse of twinning), or alter- Figure 3.22 Germ-line and somatic
natively of limited colonization of one twin by cells from a non-identical co-twin mosaicism in a dominant disease.
(Figure 3.23). Chimerism is proved by the presence of too many parental alleles (A) Although the grandparents in this
at several loci (in a sample that is prepared from a large number of cells). If just pedigree are unaffected, individuals II2 and
one locus were involved, one would suspect mosaicism for a single mutation, III2 both suffer from familial adenomatous
polyposis, a dominant inherited form of
rather than the much rarer phenomenon of chimerism. Blood-grouping centers
colorectal cancer (OMIM 175100; see
occasionally discover chimeras among normal donors, and some intersex Chapter 17). The four grandparental copies
patients turn out to be XX/XY chimeras. A fascinating example was described by of chromosome 5 (where the pathogenic
Strain et al. (see Further Reading). They showed that a 46,XY/46,XX boy was the gene is known to be located) were
result of two embryos amalgamating after an in vitro fertilization in which three distinguished by using genetic markers,
embryos had been transferred into the mother’s uterus. and are color-coded here (ignoring
recombination). (B) The pathogenic
mutation could be detected in III2 by gel
electrophoresis of blood DNA (red arrows;
the black arrow shows the band due to
the normal, wild-type allele). For II2 the gel
trace of her blood DNA shows the mutant
(A)
bands, but only very weakly, showing that
genetic she is a somatic mosaic for the mutation.
fertilization change The mutation is absent in III4, even though
marker studies showed that he inherited the
high-risk (blue in this figure) chromosome
from his mother. Individual II2 must therefore
be both a germ-line and a somatic mosaic
for the mutation. (Courtesy of Bert Bakker,
Leiden University Medical Center.)
mosaic
(B)
fusion or
exchange
fertilization of cells
Figure 3.23 Mosaics and chimeras.
(A) Mosaics have two or more genetically
different cell lines derived from a single
zygote. The genetic change indicated may
be a gene mutation, a numerical or structural
chromosomal change, or in the special case
of lyonization, X-inactivation. (B) A chimera is
derived from two zygotes, which are usually
chimera both normal but genetically distinct.
GENETICS OF MULTIFACTORIAL CHARACTERS: THE POLYGENIC THRESHOLD THEORY 79
frequency
ent and character, published in the same year, 1865, as Mendel’s paper (and
expanded in 1869 to a book, Hereditary Genius), he spent many years investigat- aa AA
ing family resemblances. Galton was devoted to quantifying observations and
applying statistical analysis. His Anthropometric Laboratory, established in
London in 1884, recorded from his subjects (who paid him threepence for the
privilege) their weight, sitting and standing height, arm span, breathing capacity,
strength of pull and of squeeze, force of blow, reaction time, keenness of sight and 70 80 90 100 110 120 130
hearing, color discrimination, and judgments of length. In one of the first appli- (B) two loci
cations of statistics, he compared physical attributes of parents and children, and
established the degree of correlation between relatives. By 1900 he had built up a AAbb
large body of knowledge about the inheritance of such attributes, and a tradition AaBb
aaBB
frequency
(biometrics) of their investigation.
When Mendel’s work was rediscovered, a controversy arose. Biometricians Aabb AABb
aaBb AaBB
accepted that a few rare abnormalities or curious quirks might be inherited as
Mendel described, but they pointed out that most of the characters likely to be
important in evolution (fertility, body size, strength, and skill in catching prey or aabb AABB
finding food) were continuous or quantitative characters and not amenable to
Mendelian analysis. We all have these characters, only to different degrees, so you 70 80 90 100 110 120 130
cannot define their inheritance by drawing pedigrees and marking in the people
(C) three loci
who have them. Mendelian analysis requires dichotomous characters that you AABbcc
AAbbCc
either have or do not have. A controversy, heated at times, ran on between AAbbcc AaBBcc aaBBCC
Mendelians and biometricians until 1918. That year saw a seminal paper by AaBbcc aaBBCc AAbbCC
frequency
mathematical form. A more rigorous treatment can be found in textbooks of (D) many loci
quantitative or population genetics.
genetically determined
Any variable quantitative character that depends on the additive action of a large
number of individually small independent causes (whether genetic or not) will
show a normal (Gaussian) distribution in the population. Figure 3.24 gives a
70 80 90 100 110 120 130
highly simplified illustration of this for a genetic character. We suppose the char-
acter to depend on alleles at a single locus, then at two loci, then at three. As more Figure 3.24 Successive approximations
loci are included, we see two consequences: to a Gaussian distribution. The charts
show the distribution in the population of a
• The simple one-to-one relationship between genotype and phenotype disap- hypothetical character that has a mean value
pears. Except for the extreme phenotypes, it is not possible to infer the geno- of 100 units. The character is determined by
type from the phenotype. the additive (co-dominant) effects of alleles.
• As the number of loci increases, the distribution looks increasingly like a Each upper-case allele adds 5 units to the
Gaussian curve. The addition of a little environmental variation would smooth value, and each lower-case allele subtracts
out the three-locus distribution into a good Gaussian curve. 5 units. All allele frequencies are 0.5. (A) The
character is determined by a single locus.
A more sophisticated treatment, allowing dominance and varying gene fre- (B) Two loci. (C) Three loci. (D) The addition of
quencies, leads to the same conclusions. Because relatives share genes, their a minor amount of ‘random’ (environmental
phenotypes are correlated, and Fisher’s 1918 paper predicted the size of the cor- or polygenic) variation produces the
relation for different relationships. Gaussian curve.
80 Chapter 3: Genes in Pedigrees and Populations
Hidden assumptions
In the simple model of Figure 3.25 there is a hidden assumption: that there is
random mating. For each class of mothers, the average IQ of their husbands is
assumed to be 100. Thus, the average IQ of the children is actually the mid-paren-
tal IQ, as common sense would suggest. In the real world, highly intelligent
women tend to marry men of above average intelligence (assortative mating).
The regression would therefore be less than halfway to the population mean,
even if IQ were a purely genetic character.
A second assumption of our simplified model is that there is no dominance.
Each person’s phenotype is assumed to be the sum of the contribution of each
allele at each relevant locus. If we allow dominance, the effect of some of a par-
ent’s genes will be masked by dominant alleles and invisible in their phenotype,
but they can still be passed on and can affect the child’s phenotype. Given domi-
nance, the expectation for the child is no longer the mid-parental value. Our best
guess about the likely phenotypic effect of the masked recessive alleles is obtained
by looking at the rest of the population. Therefore, the child’s expected pheno-
type will be displaced from the mid-parental value toward the population mean.
How far it will be displaced depends on how important dominance is in deter-
mining the phenotype.
Misunderstanding heritability
The term heritability is often misunderstood. Heritability is quite different from
the mode of inheritance. The mode of inheritance (autosomal dominant, poly-
genic, etc.) is a fixed property of a trait, but heritability is not. Heritability of IQ is
shorthand for heritability of variations in IQ. Contrast the following two
questions:
• To what extent is IQ genetic? This is a meaningless question.
• How much of the differences in IQ between people in a particular country at
a particular time is caused by their genetic differences, and how much by their
different environments and life histories? This is a meaningful question, even
if difficult to answer.
In different social circumstances, the heritability of IQ will differ. The more Figure 3.26 Multifactorial determination
equal a society is, the higher the heritability of IQ should be. If everybody has of a disease or malformation. The angels
equal opportunities, several of the environmental differences between people and devils can represent any combination of
genetic and environmental factors. Adding
have been removed. Therefore more of the remaining differences in IQ will be
an extra devil or removing an angel can tip
due to the genetic differences between people.
the balance, without that particular factor
being the cause of the disease in any general
The threshold model extended polygenic theory to cover sense. (Courtesy of the late Professor RSW
dichotomous characters Smithells.)
Some continuously variable characters such as blood pressure or body mass
index are of great importance in public health, but for medical geneticists the
innumerable diseases and malformations that tend to run in families but do not
show Mendelian pedigree patterns are of greater concern. A major conceptual
tool in non-Mendelian genetics was provided by the extension of polygenic the-
ory to dichotomous or discontinuous characters (those that you either have or do
not have).
The key concept is that even for a dichotomous character, there is an underly-
ing continuously variable susceptibility. You may or may not have a cleft palate,
but every embryo has a certain susceptibility to cleft palate. The susceptibility
may be low or high; it is polygenic and follows a Gaussian distribution in the
population. Together with the polygenic susceptibility, we postulate the exist-
ence of a threshold. Embryos whose susceptibility exceeds a critical threshold
value develop cleft palate; those whose susceptibility is below the threshold, even
if only just below, develop a normal palate. Stripped of mathematical subtlety,
the model can be represented as in Figure 3.26. The threshold can be imagined
as the neutral point of the balance. Changing the balance of factors tips the phe-
notype one way or the other.
For cleft palate, a polygenic threshold model seems intuitively reasonable. All
embryos start with a cleft palate. During early development the palatal shelves
must become horizontal and fuse together. They must do this within a specific
developmental window of time. Many different genetic and environmental fac-
tors influence embryonic development, so it seems reasonable that the genetic
part of the susceptibility should be polygenic. Whether the palatal shelves meet
and fuse with time to spare, or whether they only just manage to fuse in time, is
unimportant—if they fuse, a normal palate forms; if they do not fuse, a cleft pal-
ate results. There is therefore a natural threshold superimposed on a continu-
ously variable process.
Using threshold theory to understand recurrence risks
Threshold theory helps explain how recurrence risks vary in families. Affected
people must have an unfortunate combination of high-susceptibility alleles.
Their relatives who share genes with them will also, on average, have an increased
susceptibility, with the divergence from the population mean depending on the
proportion of shared genes. Thus, polygenic threshold characters tend to run in
families (Figure 3.27). Parents who have had several affected children may have
just been unlucky, but on average they will have more high-risk alleles than par-
ents with only one affected child. The threshold is fixed, but the average suscep-
tibility, and hence the recurrence risk, increases with an increasing number of
previous affected children.
Many supposed threshold conditions have different incidences in the two sexes.
Threshold theory accommodates this by postulating sex-specific thresholds.
GENETICS OF MULTIFACTORIAL CHARACTERS: THE POLYGENIC THRESHOLD THEORY 83
Congenital pyloric stenosis, for example, is five times more common in boys than
girls. The threshold must be higher for girls than boys. Therefore, to be affected,
a girl has on average to have a higher liability than a boy. Relatives of an affected
girl therefore have a higher average susceptibility than relatives of an affected boy
(Figure 3.28). The recurrence risk is correspondingly higher, although in each
case a baby’s risk of being affected is five times higher if it is a boy because a less
extreme liability is sufficient to cause a boy to be affected (Table 3.2).
Male proband 19/296 (6.42%) 7/274 (2.55%) 5/230 (2.17%) 5/242 (2.07%)
Female proband 14/61 (22.95%) 7/62 (11.48%) 11/101 (10.89%) 9/101 (8.91%)
More boys than girls are affected, but the recurrence risk is higher for relatives of an affected girl. The data fit a polygenic threshold model with
sex-specific thresholds (Figure 3.28). [Data from Fuhrmann and Vogel (1976) Genetic Counselling. Springer.]
X-linked locus
Note that these genotype frequencies will be seen whether or not A1 and A2 are the only alleles at
the locus.
Example 1
An autosomal recessive condition affects 1 newborn in 10,000. What is the
expected frequency of carriers?
Hardy–Weinberg relationships give the following relationships:
Phenotypes: Unaffected Affected
Genotypes: AA Aa aa
Frequencies: p2 2pq q 2 = 1/10,000
q 2 is 1 in 10,000 or 10–4, and therefore q = 10–2 or 1 in 100. One in 100 genes
at the A locus are a, 99 in 100 are A. The carrier frequency, 2pq, is therefore
2 ¥ 99/100 ¥ 1/100, which is very nearly 1 in 50.
This calculation assumes that the frequency of the condition has not been
increased by inbreeding. This is an important proviso, which is considered in
more detail below.
Example 2
If a parent of a child affected by the above condition remarries, what is the risk of
producing an affected child in the new marriage?
To produce an affected child, both parents must be carriers, and the risk is
then 1 in 4. We know that the parent of the affected child is a carrier. If the new
spouse is, from the genetic point of view, a random member of the population,
his or her carrier risk is 2pq, which we have calculated as 1 in 50. The overall risk
is then 1/50 ¥ 1/4 = 1/200.
This assumes that there is no family history of the same disease in the new
spouse’s family.
Example 3
X-linked red–green color blindness affects 1 in 12 Swiss males. What proportion
of Swiss females will be carriers and what proportion will be affected?
Assuming that we are dealing with a single X-linked recessive phenotype, the
Hardy–Weinberg relationships are as follows:
Males Females
Genotypes: A1 A2 A1A1 A1A2 A2A2
Frequencies: p q = 1/12 p2 2pq q2
q = 1/12; therefore p = 1 – q = 11/12. The frequency of carriers, 2pq, is then
2 ¥ 1/12 ¥ 11/12 = 22/144, and the frequencies of homozygous affected females is
q 2 = 1 in 144. Thus, this single-locus model predicts that 15% of females will be
carriers and 0.7% will be affected.
In fact, the frequency of affected females is rather lower than this calculation
predicts. This is part of the evidence that there are two forms of red–green blind-
ness, protan and deutan. These are caused by variants in different but adjacent
genes at Xq28. The mathematics and genetics are discussed in OMIM entry
303900.
Inbreeding
The simple Hardy–Weinberg calculations are invalid if the underlying assump-
tion, that a person’s two alleles are picked independently from the gene pool, is
violated. In particular, there is a problem if there has not been random mating.
86 Chapter 3: Genes in Pedigrees and Populations
Assortative mating can take several forms, but the most generally important is
inbreeding. A marriage in which there is a previous genetic relationship between
the partners is called a consanguineous marriage.
If you marry a relative you are marrying somebody whose genes resemble
your own:
• First-degree relatives (parents, children, full sibs) share half their genes
(always for parents and children; on average for sibs).
• Second-degree relatives (grandparents, grandchildren, uncles, aunts, neph-
ews, nieces, half-sibs) share one-quarter of their genes, on average.
• Third-degree relatives (first cousins etc.) share one-eighth of their genes on
average.
In each case the sharing refers to the proportion of genes that are identical by
descent, as a result of that relationship. We all share many genes simply by virtue
of all being human. The coefficient of relationship between two people is defined
as the proportion of their genes that they share by virtue of their relationship. The
coefficient of inbreeding of a person can be defined as the probability that they
receive at a given locus two alleles that are identical by descent. It is equal to half
the coefficient of relationship of the parents. Popular myth would have it that
many rural or isolated populations are heavily inbred and backward because of
that. In fact, average inbreeding coefficients in most populations are well below
0.01, and almost never exceed 0.03 even in the most isolated groups.
Although inbreeding is seldom significant at the population level, it can have
major consequences for individuals. The calculations in Box 3.6 show that the
rarer a recessive condition is, the greater will be the proportion of all cases that
are the product of cousin or other consanguineous marriages. For most Western
societies only 1% or less of all marriages would be between first cousins.
Nevertheless, for a recessive condition with gene frequency q = 0.01, the formula
shows that 6% of all affected children would be born in that 1% of marriages. The
converse also holds. For example, among white northern Europeans there is little
increased consanguinity among parents of children with cystic fibrosis, because
the condition is so common. Tay–Sachs disease (OMIM 272800) is strongly asso-
ciated with parental consanguinity among non-Jews, in whom it is rare, but much
less so among Ashkenazi Jews, in whom it is rather common.
For rare autosomal recessive conditions, a basic Hardy–Weinberg calculation
that ignores inbreeding will badly overestimate the carrier frequency in the pop-
ulation at large. For a couple who have already had an affected child, the recur-
rence risk is 1 in 4, regardless of whether they are cousins or unrelated. But for
situations in which the risk of an affected child depends on estimating the fre-
quency of carriers in the population, a correct calculation requires a knowledge
of population genetic theory, and of the genetics of the particular population in
question, that goes beyond the scope of this book.
Other causes of departures from the Hardy–Weinberg relationship
Inbreeding is by far the most important cause of deviations from the Hardy–
Weinberg relationship between gene frequencies and genotype frequencies.
Other possibilities include selective migration and mortality. If people with a cer-
tain genotype are more likely to be removed from the population, by migration or
death, then those remaining will be depleted of that genotype, in comparison
with Hardy–Weinberg predictions. Large-scale immigration of people from a
population with a different gene frequency would equally upset the relationship.
One generation of random mating would suffice to reestablish a Hardy–Weinberg
relationship, albeit with a new gene frequency. However, if the population
remained stratified into groups that did not interbreed freely, the Hardy–Weinberg
relationship might apply within each group, but not to the overall population.
For genetic marker studies, systematic laboratory error is an important pos-
sible cause of an apparent deviation from the expected Hardy–Weinberg ratios.
For example, the assay used might score heterozygotes unreliably. Observed pro-
portions can be compared with the Hardy–Weinberg prediction by a simple c2
test with k(k – 1)/2 degrees of freedom, where there are k alleles at the locus being
tested. More sophisticated statistical treatments are needed if the numbers are
small.
FACTORS AFFECTING GENE FREQUENCIES 87
Suppose a man marries his first cousin. Consider the risk of their having a baby affected with an
autosomal recessive condition that has gene frequency q. If we know nothing about him, there is
¼ ¼
a chance 2pq that he is a carrier of the condition. If he is a carrier, there is a 1 in 8 chance that his
cousin also is, by virtue of their common ancestry (Figure 1). The overall risk of an affected child
is 2pq ¥ 1/8 ¥ 1/4 = pq/16. Had he married an unrelated woman the risk would have been q2, the
same as for any other couple. The ratio of risk is ½ ¼
pq/16:q2 = p:16q
For a rare recessive condition, q will be small; p will therefore be almost 1 and the ratio simplifies to
1:16q
Suppose a proportion c of all marriages in a population are between first cousins. Let each
marriage produce one child. Suppose the risk that child would be affected by a certain rare
autosomal recessive condition, when the parents are first cousins, is x. The risk for a child from
an outbred marriage is 16qx. The c cousin marriages produce xc affected children, and the (1 – c) Figure 1 Relatives share genes. Figures
outbred marriages produce 16qx(1 – c) affected children. A proportion c/[c + 16q(1 – c)] of all show the proportion of their genes that
affected children will be born to first cousins. The table shows some examples. relatives share with a proband (arrow) by
virtue of their relationship.
Proportion of all marriages Frequency of disease Percentage of all affected
that are between first allele children whose parents are
cousins first cousins
0.01 0.01 6
0.01 0.005 11
0.01 0.001 39
0.05 0.01 25
0.05 0.005 40
0.05 0.001 77
Note that we have used several simplifying assumptions:
• In deriving the 1:16q ratio of risks we used the approximation p = 1, which would be
reasonable only for rare recessive conditions.
• We ignored the possibility that the cousin might carry the mutant allele inherited
independently, and not as a result of her relationship to the carrier man. Again, that is
reasonable for a rare condition, but not for a very common one.
• Finally, we assumed that all unions were either between first cousins or fully outbred.
Despite these approximations, it remains generally true that consanguineous unions contribute
disproportionately to the incidence of rare recessive diseases.
A final point to note is that unrecognized locus heterogeneity can upset the
use of the Hardy–Weinberg relation to calculate genetic risks. Suppose a recessive
disease with frequency F is actually caused by homozygosity at any one of 10 loci.
The example of Usher syndrome, quoted earlier, shows that this is not an unreal-
istic scenario. To keep the mathematics simple, we will suppose that each of the
10 loci contributes equally to the incidence of the disease, so each separate locus
contributes F/10 to the incidence. We wish to advise a known carrier of his risk of
having an affected child if his partner is unrelated and has no family history of
the disease. As we calculated previously, the risk is 2pq ¥ 1/4. For each locus, the
disease allele has frequency ÷(F/10), and the carrier frequency is approximately
2÷(F/10). If we were unaware of the locus heterogeneity, we would assume the
frequency of the disease allele to be ÷F, and the carrier frequency to be approxi-
mately 2÷F. For equally frequent disease alleles at n loci, the single-locus calcula-
tion overstates the risk by a factor ÷n.
Genotypes: AA Aa aa
CONCLUSION
In this chapter we have considered genes in a way that Gregor Mendel would
have recognized. Genes, as treated here, are recognized through the pedigree
patterns they give rise to as they segregate through multigeneration families.
Characters that do not give Mendelian pedigree patterns are explained with
mathematical tools that stem from the work of Mendel’s contemporary, Francis
Galton. Further mathematical tools developed by Hardy and Weinberg 100 years
ago can be used to infer aspects of population genetics. All this can be done with-
out ever asking what the physical nature of genes is. Even when the relationship
of genes to chromosomes was worked out early in the twentieth century, this was
done without any knowledge of their physical nature, or any need to understand
it, as we will see in Chapter 14. But of course we are interested in understanding
what genes do and what they are, and for this we need to get physical—first with
cells, then with DNA. This is the subject of the following chapters.
FURTHER READING
Introduction: some basic definitions Read AP & Donnai D (2007) The New Clinical Genetics. Scion.
[A basic undergraduate textbook illustrating clinical aspects of
Druery CT & Bateson W (translators, revised by Blumberg R) (1901) the topics covered in this chapter.]
Mendel’s paper in English. http://www.mendelweb.org/ University of Kansas Genetics Education Center.
Mendel.html [An English translation of Mendel’s original 1865 http://www.kumc.edu/gec/ [A portal to a large range of Web
paper, with notes.] resources covering basic genetics.]
90 Chapter 3: Genes in Pedigrees and Populations
• Cells show extraordinary diversity in size, form, and function. Histology recognizes more than
200 different cell types in adult humans, but this is a massive underestimate—the number of
different neurons alone is likely to be in the thousands.
• Cells make connections with each other (cell adhesion); transient connections allow cells to
perform various functions, while stable connections allow functionally similar cells to form
tissues.
• Tissues are composed of cells of one or more types, plus an extracellular matrix (ECM), a complex
network of secreted macromolecules that supports cells and interacts with them to regulate
many aspects of their behavior.
• Cell adhesion is regulated by a limited number of different types of cell adhesion molecule that
link cells to neighboring cells or to ECM. Neighboring cells also make contact through different
types of cell junction.
• Many cell functions require that cells signal to each other over both short and long distances.
Signals transmitted by one cell change the behavior of responding cells by altering the activity of
proteins, notably transcription factors, that bring about a change in gene expression.
• Transmitting cells often secrete a ligand molecule that binds to a receptor on the surface of
responding cells to initiate a downstream signal transduction pathway.
• Other small signaling molecules pass through the plasma membranes of responding cells to bind
to intracellular receptors, or are anchored in the membrane of the transmitting cell and interact
with receptors on the surfaces of adjacent cells.
• Cell proliferation is regulated at different stages in the cell cycle. During development, cell
proliferation causes rapid growth of an organism. At maturity, cell proliferation is limited to
certain cell types that need to be renewed, and there is an equilibrium between cell birth and cell
death.
• Programmed cell death is functionally important in development and throughout life. There are
different pathways for getting rid of cells that are unwanted, unnecessary, or potentially
dangerous.
• During development, cells become progressively more restricted in the range of cell types that
they give rise to. At maturity, most cells are terminally differentiated.
• Stem cells are comparatively unspecialized cells that can give rise to differentiated cells and yet
are able to renew themselves, often by asymmetric cell division.
• Stem cells derived from the early embryo can give rise to all the cells of the body. Those derived
from cells later in development have less differentiation potential.
• Immune system cells are diverse and highly specialized. Those of the innate immune system
work in the nonspecific recognition of foreign or altered host molecules known as antigens.
• B and T lymphocytes are the core of the adaptive immune system, which mounts strong, highly
specific immune responses.
• Unique DNA rearrangements in B and T cells means that different immunoglobulins are made in
different B cells and that different T-cell receptors are made in different T cells. As a result, a
single individual can generate huge ranges of immunoglobulins and T-cell receptors.
92 Chapter 4: Cells and Cell–Cell Communication
Cells can be classified into broad taxonomic groups according to differences in plants
their internal organization and functions (see Figure 4.1). The major division of protophyta metaphyta
organisms into prokaryotes and eukaryotes is founded on fundamental differ-
ences in cell architecture. animals
Prokaryotes have a simple internal organization, with a single, membrane-
protozoa metazoa
enclosed compartment that is not subdivided by any internal membranes, as
occurs in eukaryotes (see below). Under the electron microscope, prokaryotic
cells appear relatively featureless. However, prokaryotes are far from primitive: UNICELLULAR MULTICELLULAR
they have been through many more generations of evolution than humans. All
prokaryotes are unicellular, and they comprise two kingdoms of life: Figure 4.1 Classification of unicellular
and multicellular organisms. The first two
• Bacteria (formerly called eubacteria to distinguish them from archaebacteria)
domains of life—bacteria and archaea—are
are found in many environments. Some cause disease; others perform tasks also known as prokaryotes because they
that are useful or essential to human survival. Huge numbers of bacteria lack internal membranes. Eukaryotes, which
inhabit our bodies. Comprising about 500–1000 different species, the vast form the third domain, have membrane-
majority of such commensal bacteria live in the gut, and many are beneficial: enclosed organelles. This domain is further
they ferment complex indigestible carbohydrates and synthesize various vita- subdivided into unicellular and multicellular
mins, including folic acid, vitamin K, and biotin. fungi, plants, and animals. The name protist
has been used as a generic term for any
• Archaea are a poorly understood group of organisms that superficially resem- unicellular eukaryote but has also been used
ble bacteria and so were formerly termed archaebacteria. They are often found to describe unicellular eukaryotes that can
in extreme environments (such as hot acid springs), but some species are not easily be classified as animals, plants,
found in more convivial locations such as in the guts of cows. or fungi.
CELL STRUCTURE AND DIVERSITY 93
Prokaryotes have no defined nucleus and their chromosomal DNA does not
seem to be highly organized; instead it exists as a nucleoprotein complex known
as the nucleoid (Box 4.1). The typical prokaryote genome is represented as a sin-
gle, circular chromosome containing less than 10 Mb of DNA. However, prokary-
otes have recently been identified that possess multiple circular chromosomes,
multiple linear chromosomes, mixtures of circular and linear chromosomes, and
genomes up to 30 Mb in size in the case of Bacillus megaterium.
Eukaryotes are thought to have first appeared about 1.5 billion years ago.
They have a much more complex organization than their prokaryote counter-
parts, with internal membranes and membrane-enclosed organelles including a
nucleus (see Box 4.1). There is only one kingdom of eukaryotic organisms—the
Eukarya—but it includes both unicellular organisms (yeasts, protozoans, and so
on) and multicellular fungi, plants, and animals.
In the vast majority of eukaryotes, the genome comprises two or more linear
chromosomes contained within the nucleus. Each chromosome is a single, very
long DNA molecule packaged with histones and other proteins in an elaborate
and highly organized manner. The number and DNA content of the chromo-
somes vary greatly between species (see also below).
The membranes surrounding and dividing the cell are selectively permeable,
regulating the transport of a variety of ions and small molecules into and out of
the cell and between compartments.
The soluble portions of both the cytoplasm and the nucleoplasm are highly
organized. In the nucleus, a variety of subnuclear structures have been identified
and are thought to be arranged in the context of a nuclear matrix (see Box 4.1). In
the cytoplasm, the cytoskeleton is an internal scaffold of protein filaments that
provides stability, generates the forces needed for movement and changes in cell
shape, facilitates the intracellular transport of organelles, and allows communi-
cation between the cell and its environment.
CNS, central nervous system. Note that some cells in early development are not represented in adults. A full list of the human adult cell types
recognized by histology is available on the Garland Scientific Web site at: http://www.garlandscience.com/textbooks/0815341059
.asp?type=supplements
96 Chapter 4: Cells and Cell–Cell Communication
UNICELLULAR
MULTICELLULAR
Note that gene numbers are best current estimates, partly because of the difficulty in identifying
genes encoding functional RNA products. For genome sizes on a wide range of organisms, see the
database of genome sizes at http://www.cbs.dtu.dk/databases/DOGS/index.php
CELL ADHESION AND TISSUE FORMATION 97
(A) (B)
blood vessel (blood sinus)
fusion
There are also very small differences in the DNA sequence of cells from a sin- Figure 4.2 Examples of polyploid
gle individual. Such differences can arise in three ways: cells arising from endomitosis or cell
fusion. (A) The megakaryocyte is a giant
• Programmed differences in specialized cells. Sperm cells have either an X or a polyploid (16C–64C) bone marrow cell
Y chromosome. Other examples are mature B and T lymphocytes, in which that is responsible for producing the
cell-specific DNA rearrangements occur so that different B cells and different thrombocytes (platelets) needed for blood
T cells have differences in the arrangement of the DNA segments that will clotting. It has a large multi-lobed nucleus
encode immunoglobulins or T-cell receptors, respectively. This is described in as a result of undergoing multiple rounds
more detail in Section 4.6. of DNA replication without cell division
(endomitosis). Multiple platelets are formed
• Random mutation and instability of DNA. The DNA of all cells is constantly
by budding from cytoplasmic processes of
mutating because of environmental damage, chemical degradation, genome the megakaryocyte and so have no nucleus.
instability, and small but significant errors in DNA replication and DNA repair. (B) Skeletal muscle fiber cells are polyploid
During development, each cell builds up a unique profile of mutations. because they are formed by the fusion of
• Chimerism and colonization. Very occasionally, an individual may naturally large numbers of myoblast cells to produce
have two or more clones of cells with very different DNA sequences. Fraternal extremely long multinucleated cells.
(non-identical twin) embryos can spontaneously fuse in early development, A multinucleated cell is known as a
syncytium.
or one such embryo can be colonized by cells derived from its twin.
epithelium
tissue
asymmetry (polarity) in a plane that is at right angles to the cell sheet, with a
basolateral part of the cell adjacent to and interacting with the basal lamina, and
an apical part at the opposing end that faces the exterior or the lumen of a cylin-
drical tube.
Connective tissue
Connective tissue is largely composed of ECM that is rich in fibrous polymers,
notably collagen. Sparsely distributed within connective tissue is a remarkable
variety of specialized cells, including both indigenous cells and also some immi-
grant cells, notably immune-system and blood cells. The indigenous cells com-
prise primitive mesenchymal cells (undifferentiated multipotent stem cells) and
various differentiated cells that they give rise to, notably fibroblasts that synthe-
size and secrete most of the ECM macromolecules (see Figure 4.5).
Cells are sparsely distributed in the supporting ECM, and it is the ECM rather
than the cells it contains that bears most of the mechanical stress on connective
tissue (see Figure 4.5). Loose connective tissue has fibroblasts surrounded by a
flexible collagen fiber matrix; it is found beneath the epithelium in skin and many
internal organs and also forms a protective layer over muscle, nerves, and blood
vessels. In fibrous connective tissue the collagen fibers are densely packed, pro-
viding strength to tendons and ligaments. Cartilage and bone are rigid forms of
connective tissue.
Muscle tissue
Muscle tissue is composed of contractile cells that have the special ability to
shorten or contract so as to produce movement of the body parts. Skeletal muscle
fibers are cylindrical, striated, under voluntary control, and multinucleated
because they arise by the fusion of precursor cells called myoblasts (see Figure
4.2B). Smooth muscle cells are spindle-shaped, have a single, centrally located
nucleus, lack striations, and are under involuntary control (see Figure 4.4).
Cardiac muscle has branching fibers, striations, and intercalated disks; the com-
ponent cells, cardiomyocytes, each have a single nucleus, and contraction is not
under voluntary control.
Nervous tissue
Nervous tissue is limited to the brain, spinal cord, and nerves. Neurons are elec-
trically excitable cells that process and transmit information via electrical signals
(impulses) and secreted neurotransmitters. They have three principal parts: the
cell body (the main part of the cell, performing general functions); a network of
dendrites (extensions of the cytoplasm that carry incoming impulses to the cell
body); and a single long axon that carries impulses away from the cell body to the
end of the axon (Figure 4.6).
(A) (B) ~1 mm
cell body
dendrite
myelin
sheath
axon
layers of
myelin Schwann myelin
nucleus sheath cell
axon
nucleus
Figure 4.6 Neurons and myelination. (A) Neuron structure. Each neuron has a single long axon with multiple axon termini (dendrites) that are
connected to other neurons or to an effector cell such as a muscle cell. Neurons are insulated by certain glial cells such as Schwann cells that form a
myelin sheath. (B) Myelination of an axon from a peripheral nerve. Each Schwann cell wraps its plasma membrane concentrically around the axon,
forming a myelin sheath covering 1 mm of the axon. [From Alberts B, Johnson A, Lewis J et al. (2008) Molecular Biology of the Cell, 5th ed. Garland
Science/Taylor & Francis LLC.]
102 Chapter 4: Cells and Cell–Cell Communication
Neurons account for less than 10% of cells in the nervous system; the other
90% are glial cells. Glial cells do not transmit impulses, but instead support the
activities of the neurons in a variety of ways. The axons of neurons have an insu-
lating sheath of a phospholipid, myelin, that is produced by certain glial cells:
oligodendrocytes (in the central nervous system) and Schwann cells (in the
peripheral nervous system). Astrocytes are small star-shaped glial cells that (A)
ensheath synapses and regulate neuronal function. Microglial cells are phago-
cytic and protect against bacterial invasion. Other glial cells provide nutrients by
binding blood vessels to the neurons.
Neurons communicate with each other at two types of synapse, as detailed in transmitting
the next section. cell
ligand
4.3 PRINCIPLES OF CELL SIGNALING responding
cell receptor
All cells receive and respond to signals from their environment, such as changes receptor
in the extracellular concentrations of certain ions and nutrient molecules, or activation
temperature shocks. Whereas single-celled organisms largely function independ-
ently, the cells of multicellular organisms must cooperate with each other for the
benefit of the organism. To coordinate and regulate physiological and biochemi- (B)
cal functions they must communicate effectively by sending and receiving sig-
nals. The role of cell signaling during early development will be explained in
Chapter 5. In this section, we give some of the principles underlying cell
signaling.
TABLE 4.3 IMPORTANT CLASSES OF VERTEBRATE CELL SIGNALING MOLECULES AND THEIR RECEPTORS
Signaling molecule Receptor Examples/comments
Small signaling molecules that cross the cell Internal receptors See Figure 4.7A
membrane
Steroid hormones, retinoids can reside in cytoplasm or nucleus; are converted signaling using nuclear hormone receptors
to transcription factors when bound by their (Figures 4.8 and 4.9)
ligand
Signaling molecules that migrate to reach Cell surface receptors See Figure 4.7B
cell surface
Some growth factors (FGFs, EGFs), ephrins, receptors with intrinsic kinase activity many; FGF, EGF, and ephrin signaling are
and some hormones (insulin) widely used during embryonic development
Cytokines (interleukins, interferons, etc.); receptors with associated tyrosine kinase activity; many
some hormones and growth factors (growth internal domains have associated JAK protein
hormone, prolactin, erythropoietin, etc.) (Figure 4.10)
Various hormones (epinephrine, serotonin, G-protein-coupled receptors (Figure 4.11); they signaling involving olfactory receptors, taste
glucagon, FSH, etc.), histamine, opioids, span the cell membrane seven times and their receptors, rhodopsin receptors, etc.
neurokinins, etc. internal domains have associated GTP-binding
proteins
TGF-b family receptors with associated serine/threonine kinase many examples in signaling during embryonic
activity, internal domain associated with SMAD development
proteins
Signaling molecules immobilized on cell Cell surface receptors See Figure 4.7C
surface
EGF, epidermal growth factor; FGF, fibroblast growth factor; FSH, follicle-stimulating hormone; TGF-b, transforming growth factor-b.
HTH motif HTH motif binding to DNA HLH monomer HLH dimer binding to DNA
turn loop
a-helix
a-helix
NH2 a-helix
a-helix
COOH
Leu L L
L L C H
L L
Leu L L Zn
C H
a-helix
Leu
Leu
Figure 1 Structural motifs commonly found in transcription factors and DNA-binding proteins. HTH, helix–turn–helix motif; HLH, helix–loop–
helix motif.
PRINCIPLES OF CELL SIGNALING 105
(A) nuclear receptor (B) response element Figure 4.8 The nuclear receptor
1 777 superfamily. (A) Members of the nuclear
GR AGAACA nnn TGTTCT receptor superfamily all have a similar
DBD LBD TCTTGT nnn ACAAGA
structure, with a central DNA-binding
1 553
AGGTCA nnn TGACCT domain (DBD) and a C-terminal ligand-
ER DBD LBD TCCAGT nnn ACTGGA binding domain (LBD). Numbers refer to
1 432 the protein size in amino acid residues.
AGGTCA nnnnn AGACCA
RAR DBD LBD TCCAGT nnnnn TCTGGT GR, glucocorticoid receptor; ER, estrogen
receptor; RAR, retinoic acid receptor; TR,
1 408
TR DBD AGGTCA TGACCT thyroxine receptor; VDR, vitamin D receptor.
LBD TCCAGT ACTGGA
(B) The response elements recognized by
1 427 the nuclear receptors also have a conserved
VDR DBD LBD AGGTCA nnn AGGTCA
TCCAGT nnn TCCAGT structure, with two hexanucleotide
recognition sequences typically separated
by either three or five nucleotides. The
hexanucleotide sequences have the general
The receptors for steroid hormones and also those for the signaling molecules consensus of AGNNCA with the two central
thyroxine and retinoic acid belong to a common nuclear receptor superfamily. nucleotides (pale shading) conferring
Each receptor in this superfamily contains a centrally located DNA-binding specificity and belonging to one of three
domain of about 68 amino acids, and a ligand-binding domain of about 240 classes: AA, AC, or GT.
amino acids located close to the C terminus (Figure 4.8A). The DNA-binding
domain contains structural motifs known as zinc fingers (see Box 4.2) and binds
as a dimer, with each monomer recognizing one of two hexanucleotides in the
response element. The two hexanucleotides are either inverted repeats or direct
repeats that are usually separated by three or five nucleotides (Figure 4.8B).
The nuclear hormone receptors are normally found in the cytoplasm in an
inactive state. Either the ligand-binding domain directly represses the DNA-
binding domain or the receptor is bound to an inhibitory protein, as in the gluco- GC
corticoid receptor (Figure 4.9). Upon ligand binding, the inhibition is relieved
and the activated ligand–receptor complex migrates to the nucleus.
GCR
Signaling through cell surface receptors often involves kinase cascades +
The plasma membranes of animal cells positively bristle with transmembrane
receptors for signaling molecules, and the primary cilium, which protrudes into
the external environment, is studded with such receptors. This previously myste-
rious organelle is now thought to have a major role as a sensor, using the multiple
different receptors to sense, and respond to, alterations in the extracellular
environment.
When a receptor binds its signaling molecule on the external surface of the
cell, a conformational change is induced in its internal (cytoplasmic) domain.
This change alters the properties of the receptor, perhaps affecting its contact activation
with another protein, or stimulating some latent enzyme activity. For many sig-
Hsp90
naling receptors, either the receptor or an associated protein has integral kinase
activity. The activated kinase often causes the receptor to phosphorylate itself,
which then allows it to phosphorylate other proteins inside the cell, thereby acti-
vating them. These target proteins are themselves often kinases. They, in turn,
can phosphorylate and thereby activate proteins further down a signal transduc- dimerization and
tion pathway, resulting in a kinase cascade. translocation to
Eventually, a transcription factor is activated (or inhibited as appropriate), nucleus
resulting in a change in gene expression. The length of the signaling cascade can
be short (as in the cytokine-regulated JAK–STAT pathway; see Figure 4.10) or nucleus
have many steps (such as the MAP kinase pathway). Pathways in which receptors
do not have kinase activity often have several downstream components with
either kinase or phosphorylase activity.
Figure 4.9 Cell signaling by ligand activation of an intracellular receptor. target genes
A glucocorticoid (GC), like other hydrophobic hormones, can pass through
the plasma membrane and bind to a specific intracellular receptor. The
glucocorticoid receptor (GCR) is normally bound to an Hsp90 inhibitory protein
complex and is found within the cytoplasm. After binding to glucocorticoid, co-activator
however, the inhibitor complex is released and the now activated receptor
forms dimers and translocates to the nucleus. Here, it works as a transcription
factor by specifically binding to a particular response element sequence (lower
panel; see Figure 4.8B) in target genes at, and with the cooperation of, specific AGAACA nnn TGTTCT target
TCTTGT nnn ACAAGA gene
co-activator proteins, activating the target genes.
106 Chapter 4: Cells and Cell–Cell Communication
P P
target
gene
Hydrophilic Cyclic AMP (cAMP) produced from ATP by adenylate cyclase effects are usually mediated through protein kinase
(cytosolic)
Cyclic GMP (cGMP) produced from GTP by guanylate cyclase best-characterized role is in visual reception in the
vertebrate eye
Hydrophobic phosphatidylinositol PIP2 is a phospholipid that is enriched IP3 binds to receptors in the endoplasmic reticulum,
2+
(membrane- 4,5-bisphosphate (PIP2) at the plasma membrane and can be thereby causing the release of Ca into the cytosol;
associated) and its cleavage products; cleaved to generate DAG on the inner Ca2+-dependent protein kinase C is thereby recruited
diacylglycerol (DAG); inositol layer of the plasma membrane, releasing to the plasma membrane (see Figure 4.12)
trisphosphate (IP3) IP3 into the cytoplasm
PRINCIPLES OF CELL SIGNALING 107
that spreads as a traveling wave along the plasma membrane of the axon. The synaptic vesicle
containing
change in electrical potential at the axon terminus causes a neurotransmitter neurotransmitters
to be released that diffuses across the synaptic cleft to bind to receptors on
dendrites of interconnected neurons, causing local depolarization of the
plasma membrane (see Figure 4.13). In a similar fashion, neurons also trans- neurotransmitter
mit signals to muscles (at neuromuscular junctions) and glands. receptor
• Electrical synapses are less common. Here, the gap between the two connect-
ing neurons is very small, only about 3.5 nm, and is known as a gap junction. axon terminus dendrite
Neurotransmitters are not involved; instead, gap junction channels allow ions synaptic
to cross from the cytoplasm of one neuron into another, causing rapid depo- cleft
larization of the membrane.
Figure 4.13 Synaptic signaling.
At chemical synapses, a depolarized axon
4.4 CELL PROLIFERATION, SENESCENCE, AND terminus of a transmitting (presynaptic) cell
releases a neurotransmitter that is normally
PROGRAMMED CELL DEATH stored in vesicles. The release occurs by
exocytosis: the vesicles containing the
Cells are formed usually by cell division. Cell division may be common in certain neurotransmitters fuse with the plasma
tissues but especially so in early development, when rapid cell proliferation membrane, releasing their contents into the
underlies the growth of multicellular organisms and the progression toward narrow (about 20–40 nm) synaptic cleft. The
maturity. However, throughout development, cell death is common, and by the neurotransmitters then bind to transmitter-
time of maturity an equilibrium is reached between cell proliferation and cell gated ion channel receptors on the surface
death. of the dendrites of a communicating neuron,
Although some cell loss is accidental and is the result of injury or disease, causing local depolarization of the plasma
planned or programmed cell death is very common and is functionally impor- membrane.
tant. Like the organisms that contain them, cells age, and the aging process—cell
senescence—is related to various factors, including the frequency of cell division.
(B)
G1/S-cyclin (E) S-cyclin (A)
M-cyclin (B)
G1 S G2 M G1
M-Cdk M-Cdk
S-Cdk Cdk Cdk S-Cdk
inactive inactive
G1/S-Cdk S-Cdk M-Cdk
mitogen
Ras Ras
GDP GTP
P P
P P GTP GDP
MAP kinase
cell breaks apart into several membrane-lined vesicles called apoptotic bodies
that are phagocytosed.
Another form of PCD is autophagy, a catabolic process in which the cell’s
components are degraded as a way of coping with adverse conditions such as
nutrient starvation or infection by certain intracellular pathogens. Intracellular
double-membrane structures engulf large sections of the cytoplasm and fuse
with lysosomes, thereby targeting the enclosed proteins and organelles for degra-
dation. Taken to an extreme, autophagy can lead to cell death. Other forms of
PCD are known but are poorly characterized.
Killing of defective cells during removal of defective immature natural mistakes occur in the DNA rearrangements that give T and B
development lymphocytes lymphocytes their specific receptors; up to 95% of immature T cells have
defective receptors and die by apoptosis
Killing of harmful cells removal of harmful T lymphocytes some immature T cells contain receptors that recognize self antigens instead
of foreign antigens and need to be eliminated by apoptosis before they
cause normal host cells to die
removal of virally infected and a class of T lymphocytes known as killer T cells is responsible for inducing
tumor cells apoptosis in host cells that express viral antigens or altered antigens/
neoantigens produced by tumor cells
removal of cells with damaged cells with damaged DNA tend to accumulate harmful mutations and are
DNA potentially harmful; DNA damage often induces cell suicide pathways
Killing of excess, obsolete, or removal of surplus neurons during in the embryo, many more neurons are produced than are needed; those
unnecessary cells development neurons that fail to make the right connections, for example to muscle cells,
undergo apoptosis
removal of interdigital cells during development, fingers and toes are sculpted from spade-like
handplates and footplates by causing apoptosis of the unnecessary
interdigital cells (see Figure 5.7)
CELL PROLIFERATION, SENESCENCE, AND PROGRAMMED CELL DEATH 113
key immune system cells, allowing disease progression toward AIDS. Many can-
cer cells have devised strategies to oppose the immunosurveillance systems that
normally induce apoptosis of cancer cells. Successful chemotherapy often relies
on using chemicals that help induce cancer cells to apoptose. Other forms of
PCD are important in neurodegenerative disease, such as in Huntington disease
and Alzheimer disease, as well as in myocardial infarction and in stroke, in which
secondary PCD caused by oxygen deprivation in the area surrounding the ini-
tially affected cells greatly increases the size of the affected area.
effector
caspases
caspases
3, 6, 7
Totipotent the capacity to differentiate into all possible cells of the the zygote and its immediate descendants
organism, including the extra-embryonic membranes
Pluripotent can give rise to all of the cells of the embryo, and therefore of cells within the inner cell mass of a blastocyst (Figure 4.19A)
a whole animal, but are no longer capable of giving rise to the
extra-embryonic structures
Multipotent can give rise to multiple lineages but has lost the ability to give hematopoietic stem cell (Figure 4.17)
rise to all body cells
Oligopotent can give rise to a few types of differentiated cell neural stem cell that can create a subset of neurons in brain
Unipotent can give rise to one type of differentiated cell only spermatogonial stem cell
In adults, tissue stem cells are typically rare components of the tissue that
they renew, and the lack of convincing stem cell-specific markers makes them
generally difficult to purify. The ones that we know most about in mammals come
from easily accessible tissues with rapidly proliferating cells. Two of the best-
characterized systems are blood and epidermis, both of which are known to be
maintained by a single type of progenitor cell. Blood cells are initially made in
certain embryonic structures before the fetal liver takes over production. From
fetal week 20 onward, blood cells originate from the bone marrow where two
types of stem cell have long been known:
• Hematopoietic stem cells (HSCs). HSCs are anchored to fibroblast-like osteo-
blasts of the spongy inner marrow of the long bones. Grafting studies in mice
provide especially persuasive evidence that a single type of HSC is responsible
for making all the different blood cells. When a mouse that has been irradi-
ated, to ensure destruction of the bone marrow, receives a graft of purified
HSCs from another mouse strain, the incoming cells are able to differentiate
and re-populate the blood. Only about 1 in 10,000 to 1 in 15,000 bone marrow
cells is an HSC; the frequency in blood is about 1 in 100,000. In comparison
with other stem cells, however, HSCs have been reasonably highly purified.
They are multipotent, but the cell lineages to which they give rise become
increasingly specialized, ultimately producing all of the terminally differenti-
ated blood cells (Figure 4.17).
• Mesenchymal stem cells (MSCs). MSCs are primitive connective tissue stem
cells. Bone marrow MSCs (also known as marrow stromal cells) are poorly
defined, and so are heterogeneous. Unlike HSCs, they are not constantly self-
renewing but are relatively long-lived. They are multipotent and give rise to a
variety of cell types, including bone, cartilage, fat, and fibrous connective
tissue. Figure 4.17 A single type of self-renewing
MSCs are also found in umbilical cord blood, where they seem to have a wider multipotent hematopoietic stem cell gives
differentiation potential than in the bone marrow. In addition, MSCs are known rise to all blood cell lineages.
hematopoietic
stem cell
erythrocyte
plasma T-cell subclasses NK cells platelets dendritic macrophage basophil
cell cells
lymphocytes granulocytes
STEM CELLS AND DIFFERENTIATION 117
epidermis
villus
dermis stem cells goblet cells
basal
layer
sebaceous
gland
bulge mesenchymal
cells transit
stem cells amplifying
crypt cells
intestinal
stem cells
Paneth
cells
to occur in a wide variety of non-hematopoietic tissues. Recent evidence sug- Figure 4.18 Locations of skin, hair follicle,
gests that the MSCs in diverse tissue types may originate from progenitor cells and intestinal stem cells. (A) Location of
found in the walls of blood vessels. epidermal stem cells. Epidermal stem cells
can be found in three types of niche located
Stem cell niches in the bulge region of the hair follicle,
the basal layer of the epidermis, and the
A variety of other adult tissues are also known to contain stem cells, including sebaceous gland. Stem cells in the epidermal
brain, skin, intestine, and skeletal muscle. Germ cells are sustained by dedicated basal layer give rise to proliferating
male and female germ-line stem cells. Such stem cells exist in special supportive progenitor cells known as transit amplifying
microenvironments, known as stem cell niches, where they are kept active and cells, which in turn give rise to increasingly
alive by substances secreted by neighboring cells. differentiated upper cell layers. (B) Location
The location of some stem cells has been studied, notably in model organ- of intestinal stem cells. Intestinal stem
isms. Epidermal stem cells are known to occur in three locations (Figure 4.18A), cells are located within narrow bands in
of which those in the bulge region of the hair follicles give rise to both the hair intestinal crypts. They move downward
follicle and the epidermis. The stem cells that will form epidermis give rise to to differentiate into lysozyme-secreting
Paneth cells, or upward to generate transit
keratinocytes that progressively differentiate as they move toward upper layers.
amplifying (TA) cells. The nearby intestinal
In the epithelium of the small intestine, stem cells are located near the base of villi are regenerated every 3–5 days from the
crypts, which are small pits interspersed between projections called villi (Figure crypt TA cells and consist of three types of
4.18B). differentiated cell: mucus-secreting goblet
One characteristic of stem cells that helps in identifying where they are located cells, absorptive enterocytes (by far the
is that they divide only rarely. They do not undergo cell division very frequently, most prevalent villus cell), and occasional
so as to conserve their proliferative potential and to minimize DNA replication hormone-secreting enteroendocrine cells.
errors. As a result, differentiation pathways generally begin with stem cells giving [(A) courtesy of The Graduate School of
rise to proliferating progenitor cells known as transit amplifying (TA) cells. Biomedical Sciences, University of Medicine
Unlike stem cells, TA cells are committed to differentiation and divide a finite and Dentistry of New Jersey. (B) adapted
from Blanpain C, Horsley V, & Fuchs E (2007)
number of times until they become differentiated.
Cell, 128, 445–458. With permission from
Stem cell renewal versus differentiation Elsevier.]
Some progress has been made in identifying factors that are important in stem
cell renewal and differentiation potential, but much of the detail has yet to be
worked out. Stem cells can in principle renew themselves by normal (symmetric)
cell divisions. To produce cells that are more specialized than they are, and so
launch differentiation pathways, stem cells need to undergo asymmetric cell
divisions (Box 4.3). Initially, the stem cell gives rise to a cell that is committed to
a different fate and that typically gives rise in turn to a series of progressively
more specialized cells.
basal layer
differentiation
differentiation
Figure 2 Asymmetric and symmetric stem cell divisions. (A) Asymmetric division of stem cells.
(B) Symmetric division of stem cells. (C) Asymmetric division of stem cells produced by reorienting the
differentiation spindle so that one of the two daughter cells is in a different microenvironment.
shown to be able to give rise to healthy adult mice after being injected into blas- (A)
tocysts from a different mouse strain that were then transferred into the oviduct
of a foster mother (the technology and some applications are described in
Chapter 20).
Human ES cells were first cultured in the late 1990s. They have traditionally
been derived from embryos that develop from eggs fertilized in vitro in an IVF (in
vitro fertilization) clinic. The IVF procedure is intended to help couples who have
difficulty in conceiving children; it normally results in an excess of fertilized eggs
that may then be donated for research purposes.
ES cells can be established in culture by transferring ICM cells into a dish con-
taining culture medium. Culture dishes have traditionally been coated on their
inner surfaces with a layer of feeder cells (such as embryonic fibroblasts). The
overlying ICM cells attach to the sticky feeder cells and are stimulated to grow by
nutrients released into the medium from the feeder cells (Figure 4.19B). After the ICM
ICM cells have proliferated, they are gently removed and subcultured by re- (B)
plating into several fresh culture dishes. After 6 months or so of repeated subcul-
turing, the original small number of ICM cells can yield large numbers of ES cells.
ES cells that have proliferated in cell culture for 6 months or more without dif-
ferentiating, and that are demonstrably pluripotent while appearing genetically
normal, are referred to as an embryonic stem (ES) cell line.
Pluripotency tests
To test whether ES cells are likely to be pluripotent, their differentiation potential
is initially analyzed in culture. When cultured so as to favor differentiation, they
first adhere to each other to form cell clumps known as embryoid bodies, then
differentiate spontaneously to form cells of different types, but in a rather unpre-
dictable fashion. By manipulating the growth media, they can also be tested for Figure 4.19 Origins and potential of
embryonic stem cells. (A) A 6-day-old
their ability to differentiate along particular pathways.
human blastocyst. The blastocyst is a fluid-
A more rigorous pluripotency test involves assaying the ability to differentiate
filled hollow ball of cells containing about
in vivo. ES cells are injected into an immunosuppressed mouse to test for their 100 cells. Most cells are located on the
ability to form a benign tumor known as a teratoma. Naturally occurring and outside and will form trophoblast. The
experimentally induced teratomas typically contain a mixture of many differenti- cells in the interior, the inner cell mass (ICM)
ated or partly differentiated cell types from all three germ layers. If ES cells can will form all the cells of the embryo.
form a teratoma they are considered to be pluripotent. (B) Cultured human ES (embryonic stem)
cells. Cells surgically removed from the
Embryonic germ cells ICM can be used to give rise to a cultured
An alternative strategy for establishing pluripotent embryonic cell lines has human ES cell line. Human ES cell lines
involved isolating primordial germ cells, the cells of the gonadal ridge that will were traditionally established by growing
the human ICM cells on a supporting layer
normally develop into mature gametes. When cultured in vitro primordial germ
of mouse embryonic fibroblast feeder cells.
cells give rise to pluripotent embryonic germ (EG) cells. Human EG cells, derived Human ES cells fill most of the image shown
from primordial germ cells or embryos and fetuses from 5 to 10 weeks old, were here, and the feeder cells are located at the
first cultured in the late 1990s. extremities. (Courtesy of M Herbert (A) &
EG cells behave in a very similar fashion to ES cells and share many marker M Lako (B), Newcastle University.)
proteins that are believed to be important in ES cell pluripotency and self-
renewal. Currently, however, the molecular basis of pluripotency and of the
capacity for self-renewal is imperfectly understood.
The exact relationship between cultured ES cells and pluripotent ICM cells is
also not understood. ICM cells are not stem cells: they persist for only a limited
number of cell divisions in the intact embryo before differentiating into other cell
types with more restricted developmental potential. Even at the earliest stages
ICM cells are heterogeneous, and some evidence suggests that cultured ES cells
more closely resemble primitive ectoderm cells, whereas other evidence suggests
that ES cells are of germ cell origin.
B cells express immuno- originate in bone marrow and humoral (antibody) immunity; response to foreign
globulins (Igs) and when mature migrate to blood antigens; important antigen-presenting cells
class II major histo- and lymphoid organs
compatibility complex
(MHC) proteins
Plasma cell secrete IgM, IgG, IgA, or produced by antigen-stimulated secrete soluble antibodies
IgE antibodies B cells in lymph nodes
Memory B cells clonally expanded B-cell produced by antigen-stimulated respond in secondary immune responses
populations B cells in lymph nodes
T cells express T-cell receptors originate in bone marrow, cell-mediated immunity; important in antiviral and
(TCRs) carried by the blood to the anti-tumor responses
thymus where they mature
Helper Th cell CD4+ receptors derived from ab T cells recognize exogenous antigens bound to class II MHC protein;
signal to other immune system cells, stimulating them to
proliferate and be activated
Cytotoxic Tc cell CD8+ receptors derived from ab T cells recognize endogenous proteins bound to class I MHC; kill
virally infected host cells and tumor cells
+
Regulatory (Treg) CD4 and CD25 derived from ab T cells suppress autoimmune responses
cells (suppressor
T cells)
Memory T cells clonally expanded derived by clonal expansion of stimulated to expand rapidly in secondary immune
populations of ab T cells Tc, Th, or Treg cells responses
NKT cells TCRs interact with CD1 recognize glycolipid antigens; important regulatory cells
and not MHC proteins
gd T cells distinctive TCR has g and recognize glycolipid antigens; important regulatory cells
d chains
Natural killer CD16+ and CD56+ originate from bone marrow innate immunity; first line of defense against many
(NK) cells Fc receptors for IgG; and are present in blood and different viral infections; produce interferon-g and TNF-a,
receptors that detect spleen which can stimulate dendritic cell maturation, activate
class I MHC macrophages, and regulate Th cells; recognize and kill
stressed/diseased cells expressing reduced class I MHC
proteins or stress proteins
IMMUNE SYSTEM CELLS: FUNCTION THROUGH DIVERSITY 121
Monocytes in blood
Tissue-specific macrophages alveolar macrophages (lung), histocytes (connective tissue), potent phagocytes
intestinal macrophages, Kupffer cells (liver), mesangial cells
(kidney), microglial cells (brain), osteoclasts (bone)
DENDRITIC CELLS (DCs) in tissues in contact with external environment (mainly in dedicated antigen-presenting cells; limited
skin and inner linings of nose, lungs, stomach, intestines) phagocytic activity; key coordinators of innate
and adaptive immunity
Interstitial DCs move from interstitial spaces in most organs to lymph nodes/
spleen
BLOOD GRANULOCYTES in blood; migrate to sites of infection/inflammation during dedicated antigen-presenting cells
immune response
Neutrophil lysosome-like granules active motile phagocytes; often the first cells to
arrive at a site of inflammation after infection
Basophil Fc receptors for IgE; histamine-containing granules; are important in mounting allergic responses and in
equivalents of the mast cells in tissues responses to large parasites
MAST CELLS Fc receptors for IgE; histamine-containing granules; tissue dedicated antigen-presenting cells; important in
equivalents of the basophils in blood mounting allergic responses and in responses to
large parasites
filter blood). They are each packed with mature immune cells, predominantly
lymphocytes but also including macrophages, dendritic cells, and other cells
whose functions will be detailed below.
Rather less organized mucosa-associated lymphoid tissue is found in various
sites in the body. Gut-associated lymphoid tissue collectively constitutes the larg-
est lymphoid organ in the body, which is consistent with its need to interact with
a huge load of antigens from food and commensal bacteria. It comprises the ton-
sils, adenoids, Peyer’s patches in the small intestine, and lymphoid aggregates in
the appendix and large intestine. Epithelium-associated lymphoid tissue is also
found in the skin and in the mucous membranes lining the upper airways, bron-
chi, and genitourinary tract.
The first line of defense in vertebrate immune systems is physical. The epithe-
lial linings of the skin, gut, and lungs act both as barriers (epithelial cells are firmly
attached to each other by tight junctions) and as surfaces for trapping and killing
pathogens. The epithelia can produce thick mucous secretions to trap pathogens,
and can then kill them with lysozyme (in sweat) and antimicrobial peptides
known as defensins (in mucus).
Vertebrates mount an immune response based on cooperation between the
innate and adaptive immune systems. Cell signaling is a key element in this coop-
eration. Various cell surface molecules and secreted molecules called cytokines
orchestrate the interactions between immune system cells that are needed for
the proper functioning and regulation of the overall immune response. The most
prevalent cytokines are members of the interleukin and interferon protein
families.
of a species, and are rapid, developing within minutes. The principal effector C1 complement
cells perform two major functions: complex
• Various cells engulf and kill bacteria and fungi by phagocytosis, a form of
endocytosis. The major phagocytic cells are neutrophils (a white blood cell
with many lysosomal-like cytoplasmic granules), tissue macrophages (ubiq-
uitous, but concentrated in sites vulnerable to infection such as lungs and
gut), and blood monocytes (precursors to macrophages).
• NK (natural killer) cells induce virus-infected cells and tumor cells to undergo
apoptosis. Two principal pathways are used. First, NK cells bind to diseased
cells, and small cytoplasmic granules are released by exocytosis into the inter-
microbe
cellular space. Some of the released proteins (perforins) are inserted into the
membranes of the target cell to create pores, whereas others (granzymes)
enter the target cells, where they cleave critical substrates such as procaspase
3 that initiate apoptosis. In the second pathway, Fas ligands on NK cells acti-
vate Fas receptors on target cells to initiate apoptosis (see Figure 4.16).
Other cells help in various ways. In response to injury, platelets release coagu-
lation factors and adhere to each other and to endothelial cells lining the walls of
blood vessels to form clots and lessen bleeding. Mast cells release chemical mes-
sengers that cause blood vessels near a wound to constrict, reducing blood loss at
the wound. They also secrete histamines and other factors that induce blood ves-
sels slightly farther from the wound to dilate, increasing the delivery of blood
membrane
components to the general region. Cells in tissue then release chemokines such attack complex
as interleukin-8 to create a chemical gradient that attracts neutrophils and other
white blood cells to the site.
The innate immune system must distinguish self from nonself. To do this, it
recognizes conserved features of pathogens by using various cell surface recep-
tors, and secreted recognition molecules in the blood complement system.
Toll-like receptors (TLRs) are a family of transmembrane proteins that are
involved in the recognition of specific features of pathogens as being foreign.
Examples of these features are the following: peptidoglycans (from bacterial cell
walls); flagella; lipopolysaccharides (on Gram-negative bacteria); and double-
stranded RNA, methylated DNA, and zymosan (in fungal cell walls). TLRs are
abundant on macrophages and neutrophils, and on epithelial cells lining the
lung and gut. Once stimulated, TLRs induce the surface expression of co-
stimulatory molecules that are essential for initiating adaptive immune responses
(see below) and stimulate the secretion of pharmacologically active molecules
(prostaglandins and cytokines) that both initiate an inflammatory response and
also help induce an adaptive immune response.
The complement system comprises 20 or so proteins that circulate in the Figure 4.20 Activating the complement
blood and extracellular fluid. Complement proteins are made mostly by the liver, system. Antibodies bound to the surface of
but blood monocytes, tissue macrophages, and epithelial cells of the gastrointes- a microbe can be bound by complement via
tinal and genitourinary tracts can also make significant amounts. The proteins a component (C1q) of the C1 complement
are proteases that interact in a proteolytic cascade on the surface of invading complex, activating the classical complement
microbes. Eventually a membrane attack complex is formed that can punch holes pathway. Eventually a membrane attack
complex is formed: a ring of complement C9
through the outer membranes of the microbes, causing them to swell up and
proteins surrounding a core of complement
burst (Figure 4.20). Activation of complement is possible by three different proteins C5 to C8 integrates into the cell
pathways: membrane of the microbe. The membrane
• The classical pathway involves interaction with the adaptive immune system, attack complex then punches a hole in
particularly invariant domains of antibodies, which bind to antigens on the the microbe’s cell membrane, and the cell
surface of a microbe. contents are released, killing the microbe.
innate immune system when the latter fails to provide adequate protection
against an invading pathogen. It has three key characteristics:
• Exquisite specificity: it can discriminate between tiny differences in molecu-
lar structure.
• Extraordinary adaptability: it can respond to an unlimited number of
molecules.
• Memory: it can remember a previous encounter with a foreign molecule and
respond more rapidly and more effectively on a second occasion.
Whereas innate immune responses are the same in all members of a species,
adaptive immune responses vary between individuals: one individual may mount
a strong reaction to a particular antigen that another individual may never
respond to. There are two major arms to the adaptive immune response:
• Humoral (antibody) immunity is mediated by B lymphocytes (also called
B cells).
• Cell-mediated immunity is effected by T lymphocytes (T cells) and is an
important antiviral defense system.
Both B and T cells have dedicated cell surface receptors that can specifically
recognize individual antigens.
What makes the adaptive immune system so proficient at recognizing anti-
gens is that during its development in the primary lymphoid organs each naive B
or T cell acquires a cell surface antigen receptor of a unique specificity. Binding of
this receptor to its specific antigen activates the cell and causes it to proliferate
into a clone of cells with the same immunological specificity as the parent cell
(clonal selection). As a result, the number of lymphocytes that can recognize the
specific antigen can be rapidly expanded.
Immunological memory is another important feature of the adaptive immune
system. During the massive clonal expansion of antigen-specific lymphocytes
that occurs after an initial encounter with antigen, some of the expanding daugh-
ter cells differentiate into memory cells that are able to respond to the antigen
more rapidly or more effectively. Memory cells are endowed with higher-affinity
antigen receptors, and also combinations of adhesion molecules, homing recep-
tors, and cytokine receptors that direct the lymphocytes to migrate efficiently
into specific tissues (extravasation).
variable
regions
IgM m pentamer/monomer B-cell receptor (monomer) and first Ig to be produced (see Figure 4.26A); activates
secreted antibody (pentamer) complement
IgD d monomer B-cell receptor only second Ig to be produced (see Figure 4.26A); function
unknown
IgA a often dimer or B-cell receptor, but predominantly as predominant Ig in external secretions such as breast milk,
tetramer antibodies secreted by plasma cells saliva, tears, and mucus of the bronchial, genitourinary,
and digestive tracts
H H
L L
a a b a
a or b g or d
b2M b2M
In cell-mediated immunity, T cells recognize cells containing Figure 4.22 Some representatives of
the immunoglobulin superfamily. The
fragments of foreign proteins immunoglobulin domain is a barrel-like
Microorganisms such as viruses that penetrate cells and multiply within them structure held together by a disulfide bridge
(pink dot). Multiple copies of this structural
are out of reach of antibodies. T cells evolved to deal with this need; they carry on
domain are found in many proteins with
their surfaces dedicated T-cell receptors (TCRs) that recognize sequences from important immune system functions, but
foreign antigens. TCRs are heterodimers that are evolutionarily related to both other members of the immunoglobulin
immunoglobulins and other cell receptors of the immunoglobulin superfamily superfamily can have different functions,
(see Figure 4.22). such as Ig-CAM cell adhesion molecules
There are two types of TCR, composed either of a and b chains or of g and d (Section 4.2). Shaded boxes show the
chains. Unlike the immunoglobulins on B cells, most ab TCRs and the cells that variable domains of immune system
bear them are limited to recognizing foreign proteins, and only after they have proteins. b2M, b2-microglobulin.
been degraded intracellularly. Peptides of an appropriate size and sequence
derived from this degradation bind to major histocompatibility complex (MHC)
proteins (Box 4.4) that are then transported to the surface of the cell. With the
peptide held in a cleft on its outer surface, the MHC protein presents the peptide
to a T cell carrying an appropriate TCRab receptor (antigen presentation), thereby
activating that T cell.
(A) (C)
–Ab +Ab
docking
protein
BOX 4.4 THE MAJOR HISTOCOMPATIBILITY COMPLEX, MHC PROTEINS, AND ANTIGEN PRESENTATION
The major histocompatibility complex (MHC) class II region class III region class I region
is a large gene cluster that contains genes mostly
DQB1
DRB1
DRB9
DQA2
DQA1
DPA2
DPA1
DOB
DOA
DRA
(see Figure 4.22). These transmembrane proteins are
G
B
C
F
heterodimers. The MHC contains genes for the highly
*alternative genes on
polymorphic a (heavy) chain of class I and both a and different haplotypes
b chains of class II MHC proteins. The lighter chain Figure 1 Organization of HLA genes in the human MHC.
of class I MHC protein, b2-microglobulin, is not polymorphic and is
encoded by a gene elsewhere in the genome.
other cell types (including fibroblasts in skin, vascular endothelial cells,
The human MHC (known as the HLA complex; HLA stands for
glial cells, and pancreatic b cells) can be induced to express class II
human leukocyte antigen) spans several megabases at 6p21.3. The
MHC proteins.
class I MHC a chain genes (HLA-A, HLA-B, HLA-C, HLA-E, HLA-F, and
HLA-G) are located at the telomeric side; class II MHC genes are tightly From antigen presentation to T-cell activation
clustered at the centromeric end. An intervening region, sometimes Co-stimulatory molecules are surface molecules that interact
known as the class III MHC gene region, contains various complement with specific receptors on the T cell (Figure 2). The co-stimulatory
and other genes (Figure 1). signal delivered by this interaction is necessary to induce the Th
The MHC class I and class II proteins are among the most cell to synthesize the T-cell growth factor interleukin-2 (IL-2) and to
polymorphic vertebrate proteins known. Most individuals are express high-affinity receptors for IL-2. The secreted IL-2 stimulates
heterozygous at the corresponding loci, and both allelic forms the proliferation of T cells expressing the CD4 or CD8 cell surface
are functional and co-expressed at the cell surface. This extreme receptors. This complex way of inducing T-cell responses is presumably
polymorphism is presumably driven by intense natural selection necessary to avoid the inadvertent activation of T cells, which would
pressure from viruses and other parasites, which favors individuals be potentially damaging, and to allow the detailed regulation of their
with a greater capacity for recognizing infected host cells. proliferation and functional differentiation.
Because of their very high degree of polymorphism, class I and Co-stimulatory molecules are of such critical importance to the
II MHC proteins are important in graft rejection after organ or tissue initiation and regulation of immune responses that they are not
transplantation from one individual to another. If the donor and constitutively expressed, even by professional antigen-presenting
recipient (host) are unrelated, major differences could be expected in cells. Their expression can be induced via the TCR-triggered transient
the MHC proteins expressed by transplanted cells and host cells. The expression of CD40 ligand on T cells and the subsequent interaction
graft will be rejected because it will be recognized as foreign and will with and crosslinking of this ligand to CD40 on professional antigen-
be attacked by the immune system. To increase transplant success presenting cells. The expression of co-stimulators is also triggered by
rates, tissue typing is performed to identify alleles at important signaling arising from the recognition of the conserved features of
MHC loci (in easily accessible blood cells) so that potential donors pathogens that is part of the innate immune system, most importantly
can be sought with a closely matching MHC profile, and drugs are the signaling through Toll-like receptors expressed on dendritic cells,
administered that can suppress immune responses. However, MHC macrophages, and certain other cells.
polymorphism remains a challenge for successful cell therapy.
Antigen presentation T cell
The function of classical MHC proteins is to bind and transport TCRab
peptides to the cell surface for antigen presentation to T cells with
TCRab receptors.
• Class I MHC proteins bind peptides predominantly derived
from endogenous antigens and present them to cytotoxic T (Tc)
lymphocytes. CD4
• Class II MHC proteins bind peptides predominantly derived or peptide
CD28 LFA1
CD8 antigen
from exogenous antigens and present them to helper T (Th)
lymphocytes.
All cells with MHC proteins on their surface can present peptides
to T cells and in this context can be referred to as antigen-presenting MHC
cells. However, when such cells are recognized and potentially killed B7.1 (CD80) protein ICAM
by Tc cells they are often known as target cells. Almost all nucleated or B7.2 (CD86)
cells express class I MHC proteins and so can function as a target cell
presenting endogenous antigens to Tc cells. Target cells are often cells antigen-presenting
cell
infected with a virus or some other intracellular microorganism. In
addition, altered self cells such as cancer cells and aging body cells can Figure 2 Presentation of peptide antigens to T cells. Cell adhesion
serve as target cells, as can allogeneic cells introduced via skin grafts or (mediated by molecules such as ICAM1 and LAF1) brings the cells
organ transplantation (between genetically non-identical individuals). together. The peptides (about eight or nine amino acids long for
By contrast, relatively few cells express class II MHC molecules. class I MHC, and somewhat larger for class II MHC) are held in an
Among the most important are dendritic cells, macrophages, and outward-facing cleft in the MHC molecule. CD8 or CD4 proteins bind
B cells. These cells are also among the few that can express the to non-polymorphic components of class I and class II MHC proteins,
co-stimulatory molecules that are essential for initiating immune respectively. In addition to TCR–antigen recognition, a co-stimulatory
responses in both Tc and Th cells. For both these reasons they are signal is needed, such as a member of the B7 family that is recognized
sometimes referred to as professional antigen-presenting cells. Several by CD28 receptors on the T-cell surface.
IMMUNE SYSTEM CELLS: FUNCTION THROUGH DIVERSITY 127
This process, in which most ab T cells recognize antigens only after they have
been degraded and become associated with MHC molecules, is often referred to
as MHC restriction. It allows T cells to survey a peptide library derived from the
entire set of proteins that is contained within the cell but is presented on the sur-
face of the cell by MHC molecules, thereby providing a remarkable evolutionary
solution to the problem of how to detect intracellular pathogens. At the same
time, it restricts T cells to recognition of only those antigens that are associated
with cell-surface MHC molecules and are derived from intracellular spaces. T
cells therefore complement B cells and antibodies that can recognize only extra-
cellular antigens and pathogens.
Cell-associated antigens that are synthesized within the cell are referred to as
endogenous antigens. They derive from proteins produced within the cells
(including normal cellular proteins, tumor proteins, or viral and bacterial pro-
teins produced within infected cells). Such antigens are degraded within the
cytosol by proteasomes, large complexes containing enzymes that cleave peptide
bonds, converting proteins to peptides. These antigens are bound by class I MHC
proteins. Antigens that are captured by endocytosis or phagocytosis are described
as exogenous antigens. The internalized antigens move through several increas-
ingly acidic compartments, where they encounter hydrolytic enzymes including
proteases. These antigens are presented by class II MHC proteins.
An important aspect of this process is that MHC molecules cannot distinguish
self from nonself antigens, and therefore, except on the rare occasions when a
cell does indeed become infected or captures a microbial protein, the MHC mol-
ecules on its surface are presenting peptides derived from the degradation of self
proteins. A key event during the development of ab T cells in the thymus is that
only those T cells that have receptors that potentially recognize foreign peptides
in association with self MHC molecules are allowed to mature and be released
into the periphery (positive selection). Those that recognize self peptides are
induced to commit suicide by apoptosis (negative selection).
T-cell activation
Of the major classes of T cells (see Table 4.7), there are three main ones that par-
ticipate in this process:
• Helper T (Th) cells (which usually carry a CD4 cell surface receptor) recognize
peptides that are predominantly produced from exogenous protein antigens
and bound to a class II MHC protein (see Box 4.4). Only a few types of cell—
dendritic cells, macrophages, and B cells—express class II MHC proteins and
act to present antigens to Th cells. Once activated, Th cells secrete cytokines
to mobilize or regulate other immune system cells so as to generate an effec-
tive immune response. There are three types: Th1 cells activate macrophages
and Tc cells (see below); Th2 cells stimulate activation of eosinophils and pro-
mote antibody production by B cells; Th17 cells promote autoimmunity and
inflammation.
• Cytotoxic T (Tc) cells (which usually carry a CD8 cell surface receptor) recog-
nize peptides that are predominantly produced from endogenous protein
antigens and are bound to a class I MHC protein. Almost all nucleated cells
express class I MHC proteins and so can present antigen to Tc cells. Once acti-
vated, and in association with cytokines secreted by Th cells, they differenti-
ate into effector cells called cytotoxic T lymphocytes (CTLs).
CTLs are particularly important in recognizing and killing virally infected
cells. Like NK cells, they induce apoptosis, either by the Fas pathway (see
Figure 4.16) or often by delivering granules containing perforin and granzymes.
Some viruses seek to avoid being killed by CTLs by downregulating the expres-
sion of class I MHC molecules on host cells. However, NK cells have receptors
that screen host cells for the presence of class I MHC antigens and are acti-
vated when they sense host cells lacking class I MHC proteins.
• Regulatory T (Treg) cells. These cells generally express CD4 but are distin-
guished from helper T cells by their constitutive surface expression of CD25
(one of the chains of the interleukin-2 receptor) and intracellular expression
of the transcription factor Foxp3. They suppress autoimmunity (immune
responses to self antigens; Box 4.5)
128 Chapter 4: Cells and Cell–Cell Communication
Not all ab T cells recognize peptides presented by MHC class I and class II
molecules. In particular, a subclass called NKT cells can react with glycolipid
antigens, which are presented by members of the CD1 protein family rather than
by MHC proteins. CD1 proteins are non-polymorphic and structurally closely
resemble class I MHC proteins; however, like class II MHC proteins, their expres-
sion is restricted to professional antigen-presenting cells.
Other T-cell populations express a gd T-cell receptor and are found in a wide
variety of lymphoid tissues. Despite the abundance and presumed importance of
these cells, very little is known about their antigen recognition and function
except that in many cases it seems to be MHC-unrestricted. Possibly, gd T cells
recognize antigens presented by non-MHC-encoded molecules, or perhaps like
B cells they can recognize soluble antigens. Some human gd T cells resemble NKT
cells in being able to recognize bacterial glycolipids bound to CD1 molecules.
Other populations of gd T cells, such as those found in the skin, have essentially
invariant TCRs and presumably recognize a common microbial antigen or a self
molecule, but the nature of this is unknown.
g3 g1 a1 g2 g4 e a2
Centromeric
exon that potentially encodes the variable domain of a receptor chain (Figure
4.25). A typical receptor gene locus may contain 100 V, 5 D, and 10 J elements.
Random recombination of these gives 5000 permutations, and because each
immune receptor contains two chains, each formed from randomly rearranged
receptor genes, the rearrangement process by itself creates a potential basic
2
germ-line repertoire of (5000) or about 25 million specificities.
heavy chain
V1 V2 V3 V4 Vn D1 D2 D3 D4 Dn J1 J2 J3 J4 Jn Cm Cd Cg3 Cg1 Ca2
DNA
D–J joining
V–D–J joining
RNA splicing
mRNA
translation
polypeptide V C
Cd
Cg Cm
spliced to give
dmRNA IgG
Large as the combinatorial diversity above may seem, it represents only the
tip of the iceberg of receptor diversity. A much bigger contribution to V domain
variability is provided by mutational events that take place before the joining of
V, D, and J elements and which create junctional diversity. First, exonucleases
nibble away at the ends of each element, removing a variable number of nucle-
otides. Second, a template-independent DNA polymerase adds a variable num-
ber of random nucleotides to the nibbled ends, creating N regions. Combinatorial
and junctional diversity together create an almost limitless repertoire of variable
domains in immune receptors.
The genetic mechanisms leading to the production of functional VJ and VDJ
exons often involve large-scale deletions of the sequences separating the selected
gene segments (most probably by intrachromatid recombination; Figure 4.26B).
Conserved recombination signal sequences flank the 3¢ ends of each V and J seg-
ment and both the 5¢ and 3¢ ends of each D gene segment. They are recognized by
RAG proteins that initiate the recombination process, and they are the only lym-
phocyte-specific proteins involved in the entire recombination process. The
recombination signal sequences also specify the rules for recombination,
enabling joining of V to J, or D to J followed by V to DJ, but never V to V or D to D.
FURTHER READING
General Cell proliferation, senescence, and programmed
Alberts B, Johnson A, Lewis J et al. (2007) Molecular Biology Of cell death
The Cell, 5th ed. Garland Publishing. Adams JM (2003) Ways of dying: multiple pathways to apoptosis.
Cells Alive! Collection of various videos and animations in cell Genes Dev. 17, 2481–2495.
biology, immunology, and related areas at Berry D (2007) Molecular animation of cell death mediated
http://www.cellsalive.com/toc.htm by the Fas pathway. Science STKE 3 April, doi:10.1126/
Cooper GM & Hausman RE (2006) The Cell, A Molecular stke.3802007tr1. http://stke.sciencemag.org/cgi/content/full/
Approach, 4th ed. Sinauer Associates Inc. sigtrans;2007/380/tr1/DC1
Pollard TD & Earnshaw WC (2007) Cell Biology, 2nd ed. Saunders. Elmore S (2007) Apoptosis: a review of programmed cell death.
Toxicol. Pathol. 35, 495–516.
Cell structure and diversity Morgan DO (2007) The Cell Cycle: Principles Of Control. New
Handwerger KE & Gall JG (2006) Subnuclear organelles: new Science Press Ltd.
insights into form and function. Trends Cell Biol. 16, 19–26. Raff M (1998) Cell suicide for beginners. Nature 396, 119–122.
Liang B (2001) Construction of the cell membrane Von Zglinicki T, Saretzki G, Ladhoff J et al. (2005) Human cell
[animation]. http://www.wisc-online.com/objects/index_ senescence as a DNA damage response. Mech. Ageing Dev.
tj.asp?objID=AP1101 126, 111–117.
Singla V & Reiter JF (2006) The primary cilium as the cell’s
antennae: signaling at a sensory organelle. Science 313, Stem cells and differentiation
629–633. Moore KA & Lemischka IR (2006) Stem cells and their niches.
Science 311, 1880–1885.
Cell adhesion and tissue formation Nature Insight on Stem Cells (2001) Nature 414, 88–131.
Beckerle M (ed.) (2002) Cell Adhesion (Frontiers In Molecular Nature Insight on Stem Cells (2006) Nature 441, 1059–1102.
Biology). Oxford University Press. Weissman IL, Anderson DJ & Gage F (2001) Stem and progenitor
Extravasation Animation, available at: http://multimedia.mcb cells: origins, phenotypes, lineage commitments and
.harvard.edu/media.html [This video shows how regulating transdifferentiation. Annu. Rev. Cell Dev. Biol. 17, 387–403.
cell adhesion permits white blood cells to migrate into
tissues.] Immune system cells
Gumbiner BM (1996) Cell adhesion: the molecular basis of tissue Hayday AC & Pennington DJ (2007) Key factors in the organized
architecture and morphogenesis. Cell 84, 345–377. chaos of early T cell development. Nat. Immunol. 8, 137–144.
Hynes RO (2002) Integrins: bidirectional allosteric signalling Immune system cell lineages (2007) Immunity 26, 669–750.
machines. Cell 110, 673–687. Janeway C, Travers P, Walport M & Shlomchik M (2004)
Immunobiology: The Immune System In Health And Disease,
Cell signaling 6th ed. Garland Scientific.
Brivanlou AH & Darnell JE (2002) Signal transduction and the Kindt TJ, Goldsby RA & Osborne BA (2007) Kuby Immunology, 6th
control of gene expression. Science 295, 813–818. ed. WH Freeman & Co.
Christensen ST & Ott CM (2007) Cell signaling: a ciliary signaling Roitt IM, Martin SJ, Delves PJ & Burton D (2006) Roitt’s Essential
switch. Science 317, 330–331. Immunology, 11th ed. Blackwell Scientific.
Database of Cell Signaling. Science STKE (Signal Transduction
Knowledge Environment) at http://stke.sciencemag.org/cm/
[Permits browsing of details of signal transduction pathways.]
Gavi S, Shumay E, Wang HY & Malbon CC (2006) G-protein-
coupled receptors and tyrosine kinases: crossroads in cell
signaling and regulation. Trends Endocr. Metab. 17, 48–54.
Chapter 5
Principles of
Development 5
KEY CONCEPTS
• The zygote and each cell in very early stage vertebrate embryos (up to the 16-cell stage in
mouse) can give rise to every type of adult cell; as development proceeds, the choice of cell
fate narrows.
• In the early development of vertebrate embryos, the choice between alternative cell fates
primarily depends on the position of a cell and its interactions with other cells rather than
cell lineage.
• Many proteins that act as master regulators of early development are transcription factors;
others are components of pathways that mediate signaling between cells.
• The vertebrate body plan is dependent on the specification of three orthogonal axes and the
polarization of individual cells; there are very significant differences in the way that these
axes are specified in different vertebrates.
• The fates of cells at different positions along an axis are dictated by master regulatory genes
such as Hox genes.
• Cells become aware of their positions along an axis (and respond accordingly) as a result of
differential exposure to signaling molecules known as morphogens.
• Major changes in the form of the embryo (morphogenesis) are driven by changes in cell
shape, selective cell proliferation, and differences in cell affinity.
• Gastrulation is a key developmental stage in early animal embryos. Rapid cell migrations
cause drastic restructuring of the embryo to form three germ layers—ectoderm, mesoderm,
and endoderm—that will be precursors of defined tissues of the body.
• Mammals are unusual in that only a small fraction of the cells of the early embryo give rise
to the mature animal; the rest of the cells are involved in establishing extraembryonic
tissues that act as a life support.
• Developmental control genes are often highly conserved but there are important species
differences in gastrulation and in many earlier developmental processes.
134 Chapter 5: Principles of Development
pathways. Until quite recently, the gene products that were known to be involved
in modulating transcription and in cell signaling were exclusively proteins, but
now it is clear that many genes make noncoding RNA products that have impor-
tant developmental functions. In this chapter we will focus on vertebrate, and
particularly mammalian, development. Our knowledge of early human develop-
ment is fragmentary, because access to samples for study is often restricted for
ethical or practical reasons. As a result, much of the available information is
derived from animal models of development.
Invertebrates Caenorhabditis elegans easy to breed and maintain (GT = 3–5 days); fate body plan different from that of vertebrates;
(roundworm) of every single cell is known difficult to do targeted mutagenesis
Drosophila melanogaster easy to breed (GT = 12–14 days); sophisticated body plan different from that of vertebrates;
(fruit fly) genetics; large numbers of mutants available cannot be stored frozen
Fish Danio rerio (zebrafish) relatively easy to breed (GT = 3 months) and small embryo size makes manipulation difficult
maintain large populations; good genetics
Frogs Xenopus laevis transparent large embryo that is easy to genetics is difficult because of tetraploid
(large-clawed frog) manipulate genome; not so easy to breed (GT = 12 months)
Xenopus tropicalis diploid genome makes it genetically more smaller embryo than X. laevis
(small-clawed frog) amenable than X. laevis; easier to breed than
X. laevis (GT < 5 months )
Birds Gallus gallus (chick) accessible embryo that is easy to observe and genetics is difficult; moderately long generation
manipulate time (5 months)
Mammals Mus musculus (mouse) relatively easy to breed (GT = 2 months); small embryo size makes manipulation difficult;
sophisticated genetics; many different strains and implantation of embryo makes it difficult to
mutants access
Papio hamadryas extremely similar physiology to humans very expensive to maintain even small colonies;
(baboon) difficult to breed (GT = 60 months); major ethical
concerns about using primates for research
investigations
The three germ layers, as they are known, are ectoderm, mesoderm, and endo-
derm, and they will give rise to all the somatic tissues.
The constituent cells of the three germ layers are multipotent, and their
differentiation potential is restricted. The ectoderm cells of the embryo, for
example, give rise to epidermis, neural tissue, and neural crest, but they cannot
normally give rise to kidney cells (mesoderm-derived) or liver cells (endoderm-
derived). Cells from each of the three germ layers undergo a series of sequential
differentiation steps. Eventually, unipotent progenitor cells give rise to terminally
differentiated cells with specialized functions.
(A) (B) transplant cells from Figure 5.2 Ectoderm cells in the early
ventral to dorsal surface Xenopus embryo become committed to
epidermal or neural cell fates according to
their position. (A) Cells in the dorsal midline
ventral dorsal ectoderm (red) give rise to the neural plate
surface surface that is formed along the anteroposterior
(A–P) axis; other ectodermal cells such
as ventral ectoderm (green) give rise to
epidermis. (B) If the ventral ectoderm is
grafted onto the dorsal side of the embryo,
A A A it is re-specified and now forms the neural
plate instead of epidermis. This shows that
the fate of early ectoderm cells is flexible
and does not depend on their lineage but
on their position. The fate of dorsal midline
ectoderm cells is specified by signals from
cells of an underlying mesoderm structure,
P P P the notochord, that forms along the
anteroposterior axis.
The ectodermal cells are therefore flexible. If positioned on the front (ventral)
surface of the embryo they will normally give rise to epidermis, but if surgically
grafted to the dorsal midline surface they give rise to neural plate (Figure 5.2).
Similarly, if dorsal midline ectoderm cells are grafted to the ventral or lateral
regions of the embryo they form epidermis. The dorsal ectoderm cells are induced
to form the neural plate along the midline because they receive specialized sig-
nals from underlying mesoderm cells. These signals are described in Section 5.3.
The fate of the ectoderm—epidermal or neural—depends on the position of
the cells, not their lineage, and is initially reversible. At this point cell fate is said
to be specified, which means it can still be altered by changing the environment
of the cell. Later on, the fate of the ectoderm becomes fixed and can no longer be
altered by grafting. At this stage, the cells are said to be determined, irreversibly
committed to their fate because they have initiated some molecular process that
inevitably leads to differentiation. A new transcription factor may be synthesized
that cannot be inactivated, or a particular gene expression pattern is locked in
place through chromatin modifications. There may also be a loss of competence
for induction. For example, ectoderm cells that are committed to becoming epi-
dermis may stop synthesizing the receptor that responds to signals transmitted
from the underlying mesoderm.
(A) A (B)
normal situs inversus right isomerism left isomerism
heart
R L L R
R R L L
D
right lung
(3 lobes)
R L
V liver
stomach
spleen
Figure 5.4 The three axes of bilaterally symmetric animals. (A) The three axes. The anteroposterior (A–P) axis
is sometimes known as the craniocaudal or rostrocaudal axis (from the Latin cauda = tail and rostrum = beak). In
vertebrates the A–P and dorsoventral (D–V) axes are clearly lines of asymmetry. By contrast, the left–right (L–R) axis
is superficially symmetric (a plane through the A–P axis and at right angles to the L–R axis seems to divide the body
into two equal halves). (B) Internal left–right asymmetry in humans. Vertebrate embryos are initially symmetric with
P respect to the L–R axis. The breaking of this axis of symmetry is evolutionarily conserved, so that the organization and
placement of internal organs shows L–R asymmetry. In humans, the left lung has two lobes, the right lung has three.
The heart, stomach, and spleen are placed to the left; the liver is to the right, as shown in the left panel. In about 1 in
10,000 individuals the pattern is reversed (situs inversus) without harmful consequences. Failure to break symmetry
leads to isomerism: an individual may have two right halves (the liver and stomach become centralized but no
spleen develops) or two left halves (resulting in two spleens). In some cases, assignment of left and right is internally
inconsistent (heterotaxia), leading to heart defects and other problems. [(B) adapted from McGrath and Brueckner
(2003) Curr. Opin. Genet. Dev. 13, 385–392. With permission from Elsevier.]
The establishment of polarity in the early embryo and the mechanisms of axis
specification can vary significantly between animals. In various types of animal,
asymmetry is present in the egg. During differentiation to produce the egg, cer-
tain molecules known collectively as determinants are deposited asymmetrically
within the egg, endowing it with polarity. When the fertilized egg divides, the
determinants are segregated unequally into different daughter cells, causing the
embryo to become polarized.
In other animals, the egg is symmetric but symmetry is broken by an external
cue from the environment. In chickens, for example, the anteroposterior axis of
the embryo is defined by gravity as the egg rotates on its way down the oviduct. In
frogs, both mechanisms are used: there is pre-existing asymmetry in the egg
defined by the distribution of maternal gene products, while the site of sperm
entry provides another positional coordinate.
In mammals the symmetry-breaking mechanism is unclear. There is no evi-
dence for determinants in the zygote, and the cells of early mammalian embryos
show considerable developmental flexibility when compared with those of other
vertebrates (Box 5.1).
BOX 5.1 AXIS SPECIFICATION AND POLARITY IN THE EARLY MAMMALIAN EMBRYO
In many vertebrates, there is clear evidence of axes being set up in cells, the inner cell mass (ICM), which are located at one end of the
the egg or in the very early embryo. Mammalian embryos seem to be embryo, the embryonic pole. The opposite pole of the embryo is the
different. There is no clear sign of polarity in the mouse egg, and initial abembryonic pole. The embryonic face of the ICM is in contact with
claims that the first cleavage of the mouse zygote is related to an axis the trophectoderm, whereas the abembryonic face is open to the
of the embryo have not been substantiated. fluid-filled cavity, the blastocele. This difference in environment is
The sperm entry point defines one early opportunity for sufficient to specify the first two distinct cell layers in the ICM: primitive
establishing asymmetry. Fertilization induces the second meiotic ectoderm (epiblast) at the embryonic pole, and primitive endoderm
division of the egg, and a second polar body (Figure 1) is generally (hypoblast) at the abembryonic pole (Figure 1B, left panel). This, in
extruded opposite the sperm entry site. This defines the animal– turn, defines the dorsoventral (D–V) axis of the embryo. It is not clear
vegetal axis of the zygote, with the polar body at the animal pole. how the ICM becomes positioned asymmetrically in the blastocyst in
Subsequent cleavage divisions result in a blastocyst that shows the first place, but it is interesting to note that when the site of sperm
bilateral symmetry aligned with the former animal–vegetal axis of the entry is tracked with fluorescent beads, it is consistently localized to
zygote (see Figure 1A). trophectoderm cells at the embryonic–abembryonic border.
The first clear sign of polarity in the early mouse embryo is at the It is still unclear how the anteroposterior (A–P) axis is specified, but
blastocyst stage, when the cells are organized into two layers—an the position of sperm entry may have a defining role. The first overt
outer cell layer, known as trophectoderm, and an inner group of indication of the A–P axis is the primitive streak, a linear structure that
PATTERN FORMATION IN DEVELOPMENT 141
develops in mouse embryos at about 6.5 days after fertilization, and in The left–right (L–R) axis is the last of the three axes to form. The
human embryos at about 14 days after fertilization. The decision as to major step in determining L–R asymmetry occurs during gastrulation,
which end should form the head and which should form the tail rests when rotation of cilia at the embryonic node results in a unidirectional
with one of two major signaling centers in the early embryo, a region flow of perinodal fluid that is required to specify the L–R axis.
of extraembryonic tissue called the anterior visceral endoderm (AVE). In Somehow genes are activated specifically on the embryo’s left-hand
mice, this is initially located at the tip of the egg cylinder but it rotates side to produce Nodal and Lefty-2, initiating signaling pathways
toward the future cranial pole of the A–P axis just before gastrulation that activate genes encoding left-hand-specific transcription factors
(Figure 1C). The second major signaling center, the node, is established (such as Pitx2) and inhibiting those that produce right-hand-specific
at the opposite extreme of the epiblast. transcription factors.
How do cells become aware of their position along an axis and therefore
behave accordingly? This question is pertinent because functionally equivalent
cells at different positions are often required in the production of different struc-
tures. Examples include the formation of different fingers from the same cell
types in the developing hand, and the formation of different vertebrae (some
with ribs and some without) from the same cell types in the mesodermal struc-
tures known as somites.
In many developmental systems, the regionally specific behavior of cells has
been shown to depend on a signal gradient that has different effects on equiva-
lent target cells at different concentrations. Signaling molecules that work in this
way are known as morphogens. In vertebrate embryos, this mechanism is used
to pattern the main anteroposterior axis of the body, and both the anteroposte-
rior and proximodistal axes of the limbs (the proximodistal axis of the limbs runs
from adjacent to the trunk to the tips of the fingers or toes).
(A)
1 2 3 4 5 6 7 8 9 10 11 12 13
Hoxa 3¢ 5¢
Hoxb
Hoxc
Figure 5.5 Hox genes are expressed
Hoxd at different positions along the
anteroposterior axis according to their
position within Hox clusters.
(B) (A) Conservation of gene function within
Hox clusters. In Drosophila, an evolutionary
A P
rearrangement of what was a single
occipital cervical thoracic lumbar sacral caudal Hox cluster resulted in two subclusters
1 2 3 4 1 2 3 4 5 6 7 1 2 3 4 5 6 7 8 9 10111213 1 2 3 4 5 6 1 2 3 4 1 2 3 4 collectively called HOM-C. Mammals have
four Hox gene clusters, such as the mouse
B1
A1 Hoxa, Hoxb, Hoxc, and Hoxd clusters shown
A3 here, that arose by the duplication of a
B3
D4 single ancestral Hox cluster. Colors indicate
A4
B4 sets of paralogous genes that have similar
A5
B5 functions/expression patterns, such as
B6 Drosophila labial (lab) and Hoxa1, Hoxb1, and
Hox gene
C5
C6 Hoxd1. (B) Mouse Hox genes show graded
A6
A7 expression along the anteroposterior axis.
B9
B7 Genes at the 3¢ end, such as Hoxa1 (A1) and
C8 Hoxb1 (B1), show expression at anterior (A)
C9
D8 parts of the embryo (in the head and neck
A10
D9 regions); those at the 5¢ end are expressed
D10
A11 at posterior (P) regions toward the tail.
D11 [Adapted from Twyman (2000) Instant Notes
D12
D13 In Developmental Biology. Taylor & Francis.]
PATTERN FORMATION IN DEVELOPMENT 143
(A) limits of HoxD organization Figure 5.6 Digit formation in the chick
gene expression of digits limb bud is specified by the zone of
polarizing activity. (A) Normal specification
of digits. The zone of polarizing activity (ZPA)
D9 II is located in the posterior margin of the
D10
D11 III developing limb bud (left panel). ZPA cells
D12 produce the morphogen Sonic hedgehog
D13 IV
ZPA ZPA (Shh), and the ensuing gradient of Shh levels
generates a nested overlapping pattern of
D1
D1 1
D1 0
D1
3
2
D9
expression patterns for the five different
(B)
HoxD genes (middle panel), which is
Shh Shh
important for the specification of digits (right
panel). (B) Duplication of ZPA signals causes
RA RA mirror-image duplication of digits. Grafting
ZPA of a second ZPA to the anterior margin of
D9 D10 11 12 3 IV
the limb bud (or placement of a bead coated
D D D1 III
ZPA with Shh or retinoic acid, RA) establishes
II an opposing morphogen gradient and
II results in a mirror-image reversal of digit
ZPA III fates. [From Twyman (2000) Instant Notes In
D1 2
ZPA
D1 1
IV
3
D9
The source of the morphogen gradient that guides Hox gene expression along
the major anteroposterior axis of the embryo is thought to be a transient embry-
onic structure known as the node, which is one of two major signaling centers in
the early embryo (see Box 5.1), and the morphogen itself is thought to be retinoic
acid. The node secretes increasing amounts of retinoic acid as it regresses, such
that posterior cells are exposed to larger amounts of the chemical than anterior
cells, resulting in the progressive activation of more Hox genes in the posterior
regions of the embryo.
5.4 MORPHOGENESIS
Cell division, with progressive pattern formation and cell differentiation, would
eventually yield an embryo with organized cell types, but that embryo would be
a static ball of cells. Real embryos are dynamic structures, with cells and tissues
undergoing constant interactions and rearrangements to generate structures
and shapes. Cells form sheets, tubes, loose reticular masses, and dense clumps.
Cells migrate either individually or en masse. In some cases, such behavior is in
response to the developmental program. In other cases, these processes drive
development, bringing groups of cells together that would otherwise never come
into contact. Several different mechanisms underlying morphogenesis are sum-
marized in Table 5.2 and discussed in more detail below.
Change in cell shape change from columnar to wedge-shaped cells during neural tube closure in birds and mammals
Change in cell size expansion of adipocytes (fat cells) as they accumulate lipid droplets
Gain of cell–cell adhesion condensation of cells of the cartilage mesenchyme in vertebrate limb bud
Loss of cell–cell adhesion delamination of cells from epiblast during gastrulation in mammals
Loss of cell–matrix adhesion delamination of cells from basal layer of the epidermis
Differential rates of cell proliferation selective outgrowth of vertebrate limb buds by proliferation of cells in the progress zone, the
undifferentiated population of mesenchyme cells from which successive parts of the limb are laid down
Alternative positioning and/or different embryonic cleavage patterns in animals; stereotyped cell divisions in nematodes
orientation of mitotic spindle
Apoptosis separation of digits in vertebrate limb bud (Figure 5.7); selection of functional synapses in the
mammalian nervous system
Migration The movement of an individual cell with respect to other cells in the embryo. Some cells, notably the neural crest cells (Box 5.4)
and germ cells (Section 5.7), migrate far from their original locations during development
Ingression The movement of a cell from the surface of an embryo into its interior (Figure 5.13)
Egression The movement of a cell from the interior of an embryo to the external surface
Delamination The movement of cells out of an epithelial sheet, often to convert a single layer of cells into multiple layers. This is one of the major
processes that underlie gastrulation (Figure 5.13) in mammalian embryos. Cells can also delaminate from a basement membrane,
as occurs in the development of the skin
Intercalation The opposite of delamination: cells from multiple cell layers merge into a single epithelial sheet
Condensation The conversion of loosely packed mesenchyme cells into an epithelial structure; sometimes called a mesenchymal-to-epithelial
transition
Dispersal The opposite of condensation: conversion of an epithelial structure into loosely packed mesenchyme cells; an epithelial-to-
mesenchymal transition
(A) (B)
E12.5
dying cells
(yellow)
are created by the death of interdigital cells in the hand and foot plates beginning
at about 45 days of gestation (Figure 5.7). In the mammalian nervous system,
apoptosis is used to prune out the neurons with nonproductive connections,
allowing the neuronal circuitry to be progressively refined. Remarkably, up
to 50% of neurons are disposed of in this manner, and in the retina this can
approach 80%.
(A) (B) Figure 5.8 The specialized sex cells. (A) The
plasma perivitelline zona pellucida head acrosomal vesicle egg. The mammalian egg (oocyte) is a large
membrane space 5 mm nucleus cell, 120 mm in diameter, surrounded by an
extracellular envelope, the zona pellucida,
cortical midpiece
5 mm which contains three glycoproteins, ZP-1,
granule
ZP-2, and ZP-3, that polymerize to form
a gel. The first polar body (the product of
meiosis I; not shown here) lies under the zona
within the perivitelline space. At ovulation,
plasma membrane oocytes are in metaphase II. Meiosis II is not
egg 50 mm completed until after fertilization. (B) The
flagellum sperm. This cell is much smaller than the
egg, with a 5 mm head containing highly
compacted DNA, a 5 mm cylindrical body (the
midpiece, containing many mitochondria),
and a 50 mm tail. At the front, the acrosomal
vesicle contains enzymes that help the sperm
120 mm to make a hole in the zona pellucida, allowing
it to access and fertilize the egg. Fertilization
triggers secretion of cortical granules by the
The male gamete, the sperm cell, is a small cell with a greatly reduced cyto- egg that effectively inhibit further sperm from
plasm and a haploid nucleus. The nucleus is highly condensed and transcrip- passing through the zona pellucida. [From
tionally inactive because the normal histones are replaced by a special class of Alberts B, Johnson A, Lewis J et al. (2002)
packaging proteins known as protamines. A long flagellum at the posterior end Molecular Biology of the Cell, 4th ed. Garland
provides propulsion. The acrosomal vesicle (or acrosome) at the anterior end Science/Taylor & Francis LLC.]
(Figure 5.8B) contains digestive enzymes. Human sperm cells have to migrate
very considerable distances, and out of the 280 million or so ejaculated into the
vagina only about 200 reach the required part of the oviduct where fertilization
takes place.
Fertilization begins with attachment of a sperm to the zona pellucida followed
by the release of enzymes from the acrosomal vesicle, causing local digestion of
the zona pellucida. The head of the sperm then fuses with the plasma membrane
of the oocyte and the sperm nucleus passes into the cytoplasm. Within the oocyte,
the haploid sets of sperm and egg chromosomes are initially separated from each
other and constitute, respectively, the male and female pronuclei. They subse-
quently fuse to form a diploid nucleus. The fertilized oocyte is known as the
zygote.
Figure 5.9 Early development of the mammalian embryo, from zygote to zygote
blastocyst. In mammals, the first cleavage is a normal meridional division (along
the vertical axis), but in the second cleavage one of the two cells (blastomeres)
divides meridionally while the other divides at right angles, equatorially (rotational
cleavage). At the eight-cell stage, the mouse morula undergoes compaction, when
the blastomeres huddle together to form a compact ball of cells. The tight packaging,
stabilized by tight junctions formed between the outer cells of the ball, seals off the
inside of the sphere. The inner cells form gap junctions, enabling small molecules
and ions to pass between them. The morula does not have an internal cavity, but 2-cell stage
the outer cells secrete fluid so that the subsequent blastocyst becomes a hollow ball
of cells with a fluid-filled internal cavity, the blastocele. The blastocyst has an outer cleavage
layer of cells, the trophoblast, that will give rise to an extraembryonic membrane, plane I
the chorion, plus an inner cell mass (ICM) located at one end of the embryo, the
embryonic pole. Cells from the ICM will give rise to the other extracellular membranes
as well as the embryo proper and the subsequent fetus. For convenience, the zona
pellucida that surrounds the early embryo (see Figure 5.10) is not shown. [Adapted
from Twyman (2000) Instant Notes In Developmental Biology. Taylor & Francis.]
as maternal genome regulation, and the maternal gene products are often
referred to as maternal determinants.
4-cell stage
Mammalian eggs are among the smallest in the animal kingdom,
cleavage cleavage
and cleavage in mammals is exceptional in several ways plane IIA plane I
Early mammalian development is unusual in that it is concerned trophoblast cells, which produce the enzymes that erode the lining
primarily with the formation of tissues that mostly do not contribute of the uterus, helping the embryo to implant into the uterine wall.
to the final organism. These tissues are the four extraembryonic The chorion is also a source of hormones (chorionic gonadotropin)
membranes: yolk sac, amnion, chorion, and allantois. The chorion that influence the uterus as well as other systems. In all these cases,
combines with maternal tissue to form the placenta. As well as the chorion serves as a surface for respiratory exchange. In placental
protecting the embryo (and later the fetus), these life support systems mammals, the chorion provides the fetal component of the placenta
are required to provide for its nutrition, respiration, and excretion. (see below).
Yolk sac Allantois
The most primitive of the four extraembryonic membranes, the yolk The most evolutionarily recent of the extraembryonic membranes, the
sac is found in all amniotes (mammals, birds, and reptiles) and also allantois develops from the posterior part of the alimentary canal in
in sharks, bony fishes, and some amphibians. In bird embryos, the the embryos of reptiles, birds, and mammals. It arises from an outward
yolk sac surrounds a nutritive yolk mass (the yellow part of the egg, bulging of the floor of the hindgut and so is composed of endoderm
consisting mostly of phospholipids). In many mammals, including and splanchnic lateral plate mesoderm. In most amniotes it acts as a
humans and mice, the yolk sac does not contain any yolk. The yolk sac waste (urine) storage system, but not in placental mammals (including
is generally important because: humans). Although the allantois of placental mammals is vestigial and
• the primordial germ cells pass through the yolk sac on their may regress, its blood vessels give rise to the umbilical cord vessels.
migration from the epiblast to the genital ridge (for more details,
Placenta
see Section 5.7);
The placenta is found only in some mammals and is derived partly
• it is the source of the first blood cells of the conceptus and most of
from the conceptus and partly from the uterine wall. It develops after
the first blood vessels, some of which extend themselves into the
implantation, when the embryo induces a response in the neighboring
developing embryo.
maternal endometrium, changing it to become a nutrient-packed
The yolk sac originates from splanchnic (= visceral) lateral plate
highly vascular tissue called the decidua. During the second and third
mesoderm and endoderm.
weeks of development, the trophoblast tissue becomes vacuolated,
Amnion and these vacuoles connect to nearby maternal capillaries, rapidly
The amnion is the innermost of the extraembryonic membranes, filling with blood.
remaining attached to and immediately surrounding the embryo. It As the chorion forms, it projects outgrowths known as chorionic
contains amniotic fluid that bathes the embryo, thereby preventing villi into the vacuoles, bringing the maternal and embryonic blood
drying out during development, helping the embryo to float (and so supplies into close contact. At the end of 3 weeks, the chorion has
reducing the effects of gravity on the body), and acting as a hydraulic differentiated fully and contains a vascular system that is connected
cushion to protect the embryo from mechanical jolting. The amnion to the embryo. Exchange of nutrients and waste products occurs over
derives from ectoderm and somatic lateral plate mesoderm. the chorionic villi. Initially, the embryo is completely surrounded by
Chorion the decidua, but as it grows and expands into the uterus, the overlying
The chorion is also derived from ectoderm and somatic lateral plate decidual tissue (decidua capsularis) thins out and then disintegrates.
mesoderm. In the embryos of birds, the chorion is pressed against The mature placenta is derived completely from the underlying
the shell membrane, but in mammalian embryos it is composed of decidua basalis.
150 Chapter 5: Principles of Development
splitting at
morula or
separate fertilized eggs before
splitting in
early blastocyst
e.g. at 2-
2 separate cell stage
blastocysts
chorions
chorion
amnions amnions
Implantation
At about day 5 of human development, an enzyme is released that bores a hole
through the zona pellucida, causing it to partly degrade and releasing the blasto-
cyst (see Figure 5.10). The hatched blastocyst is now free to interact directly with
the cells lining the uterus, the endometrium. Very soon after arriving in the uterus
(day 6 of human development), the blastocyst attaches tightly to the uterine epi-
thelium (implantation). Trophoblast cells proliferate rapidly and differentiate
into an inner layer of cytotrophoblast and an outer multinucleated cell layer, the
syncytiotrophoblast, that starts to invade the connective tissue of the uterus
(Figure 5.11).
EARLY HUMAN DEVELOPMENT: FERTILIZATION TO GASTRULATION 151
(A) (B)
maternal
day 6 syncytiotrophoblast blood vessel day 11
syncytiotrophoblast
primary stem villi
amniotic cavity
epiblast endometrial
(primitive epithelium
ectoderm)
cytotrophoblast
hypoblast maternal blood
(primitive blastocele vessels
endoderm)
cytotrophoblast yolk sac
0.1 mm
Even before implantation occurs, cells of the ICM begin to differentiate into Figure 5.11 Detail of human embryo
two distinct layers. An external cell layer, the epiblast (or primitive ectoderm) will implantation. (A) Human embryo
give rise to ectoderm, endoderm, and mesoderm (and hence all of the tissues of implantation begins at about 6 days
after fertilization, when the hatched
the embryo), and also the amnion, yolk sac, and allantois. An internal cell layer,
blastocyst adheres to the wall of the
the hypoblast (also called primitive endoderm or visceral endoderm) will give rise uterus (endometrium). Trophoblast
to the extraembryonic mesoderm that lines the primary yolk sac and the cells differentiate into an inner layer of
blastocele. cytotrophoblast and an outer multinucleated
A fluid-filled cavity, the amniotic cavity, forms within the inner cell mass, cell layer, the syncytiotrophoblast, that
enclosed by the amnion. The embryo now consists of distinct epiblast and hypo- invades the connective tissue of the uterus.
blast layers that will come together to form a flat bilaminar germ disk located The inner cell mass of the blastocyst has
between two fluid-filled cavities, the amniotic cavity on one side and the yolk sac given rise to two distinct cell layers, the
on the other (Figure 5.11B and Figure 5.12). epiblast and the hypoblast. (B) By about
11 days after fertilization, the primary
yolk sac detaches from the surrounding
Gastrulation is a dynamic process by which cells of the epiblast cytotrophoblast. Loose endodermal cells are
give rise to the three germ layers scattered round the yolk sac. Villi composed
of cytotrophoblast cells begin to extend into
Gastrulation takes place during the third week of human development and is the the syncytiotrophoblast. [Reproduced from
first major morphogenetic process in development. During gastrulation, the ori- McLachlan (1994) Medical Embryology. With
entation of the body is laid down, and the embryo is converted from a bilaminar permission from Addison-Wesley.]
structure into a structure of three germ layers: ectoderm, endoderm, and meso-
derm. It should be noted that whereas the bilaminar germ disk is a flat disk in
humans, rabbits, and some other mammals, it is cup-shaped in the mouse (com-
pare Figure 1B in Box 5.1).
In mammals, birds, and reptiles, the major structure characterizing gastrula-
tion is a linear one, the primitive streak. This transient structure appears, at
about day 15 in human development, as a faint groove along the longitudinal
midline of the now oval-shaped bilaminar germ disk. Over the course of the next
day, the primitive streak develops a groove (primitive groove), which deepens and
elongates to occupy close to half the length of the embryo. By day 16 a deep
depression (the primitive pit), surrounded by a slight mound of epiblast (the
primitive node), is evident at the end of the groove, near the center of the germ
disk (see Figure 5.12).
The process of gastrulation is extremely dynamic, involving very rapid cell
movements. At day 14–15 in human development, the epiblast cells near the
primitive streak begin to proliferate, flatten, and lose their connections with one
another. These flattened cells develop pseudopodia that allow them to migrate
downward through the primitive streak into the space between the epiblast and
152 Chapter 5: Principles of Development
extraembryonic mesoderm
the hypoblast (Figure 5.13B). Some of the ingressing epiblast cells invade the
hypoblast and displace its cells, leading eventually to complete replacement of
the hypoblast by a new layer of epiblast-derived cells, the definitive endoderm.
Starting on day 16, some of the migrating epiblast cells diverge into the space
between the epiblast and the newly formed definitive endoderm to form a third
layer, the intraembryonic mesoderm (Figure 5.13C). When the intraembryonic
mesoderm and definitive endoderm have formed, the residual epiblast is now
described as the ectoderm, and the new three-layered structure is referred to as
the trilaminar germ disk (see Figure 5.13C).
The ingressing mesoderm cells migrate in different directions, some laterally
and others toward the anterior, and others are deposited on the midline (Figure
5.14).
The cells that migrate through the primitive pit in the center of the primitive
node and come to rest on the midline form two structures:
• The prechordal (also called prochordal) plate is a compact mass of mesoderm
to the anterior of the primitive pit. The prechordal plate will induce important
cranial midline structures such as the brain.
• The notochordal process is a hollow tube that sprouts from the primitive pit
and grows in length as cells proliferating in the region of the primitive node
add on to its proximal end. The notochordal process and adjacent mesoderm
(A) (B) day 14–15 (C) day 16
epiblast
(primitive
ectoderm)
hypoblast
(primitive embryonic
endoderm) ectoderm
Figure 5.13 During human gastrulation, a flat bilaminar germ disk transforms into a trilaminar embryo. The outcome of gastrulation is similar in
all mammals, but major differences may occur in the details of morphogenesis, particularly in how extraembryonic structures are formed and used. (A) In
humans, the epiblast and hypoblast come into contact to form a flat bilaminar germ disk. The epiblast cells within this disk are described as the primitive
ectoderm but will give rise to all three germ layers—ectoderm, endoderm, and mesoderm—as described in panels (B) and (C). The epiblast cells that are
not in contact with the hypoblast will give rise to the ectoderm of the amnion. The hypoblast will give rise to the extraembryonic endoderm that lines the
yolk sac. (B) The bilaminar germ disk at 14–15 days of human development. Along the length of the primitive streak, epiblast cells migrate downward to
invade the hypoblast, and in so doing they become converted to embryonic endoderm and displace the cells of the hypoblast. (C) The bilaminar germ
disk at 16 days of human development. A second wave of ingressing epiblast cells diverge into the space between the epiblast and the newly formed
embryonic endoderm to form embryonic mesoderm. The remaining epiblast cells are now known as the embryonic ectoderm. [From Larsen (2001)
Human Embryology, 3rd ed. With permission from Elsevier.]
EARLY HUMAN DEVELOPMENT: FERTILIZATION TO GASTRULATION 153
Figure 5.14 Migration paths of mesoderm cells that ingress during epiblast
gastrulation. Epiblast cells ingressing through the primitive pit migrate toward
the anterior end (A) to form the notochordal process and the prechordal plate
(oval shape at anterior end—see Figure 5.12) and laterally through the primitive mesoderm
groove to form the lateral mesoderm flanking the midline. The oval shape at the endoderm
posterior end (P) marks the site of what will become the primordial anus. [From
Larsen (2001) Human Embryology, 3rd ed. With permission from Elsevier.]
induce the overlying embryonic ectoderm to form the neural plate, the pre-
cursor of the central nervous system (Figure 5.15). At the same time as the primitive node
notochordal process extends, the primitive streak regresses. By day 20 of primitive groove
A
human development, the notochordal process is completely formed, but it
then transforms from a hollow tube to a solid rod, the notochord. The noto-
chord will later induce the formation of components of the nervous system
(as described in Section 5.6).
The ingressing mesoderm cells that migrate laterally condense into rodlike P epiblast
and sheetlike structures on either side of the notochord. There are three main endoderm
structures (Figure 5.16A):
• The paraxial mesoderm, a pair of cylindrical condensations lying immedi-
ately adjacent to and flanking the notochord, first develops into a series of
whorl-like structures known as somitomeres, which form along the antero-
posterior axis through the third and fourth weeks of human development.
The first seven cranial somitomeres will eventually go on to form the striated
muscles of the face, jaw, and throat, but the other somitomeres develop fur-
ther into discrete blocks of segmental mesoderm known as somites (see Figure
5.16C). Cervical, thoracic, lumbar, and sacral somites will establish the seg-
mental organization of the body by giving rise to most of the axial skeleton
(including the vertebral column), the voluntary muscles, and part of the der-
mis of the skin.
• The intermediate mesoderm, a pair of less pronounced cylindrical conden-
sations, just lateral to the paraxial mesoderm, later develops into the urinary
system, parts of the genital system, and kidneys.
• The remainder of the lateral mesoderm forms a flattened sheet, known as the
lateral plate mesoderm. Starting on day 17 of human development, each of
the lateral plates splits horizontally into two layers separated by a space that
will become the body cavity, the coelom. The dorsal layer is known as the
somatic (or parietal) mesoderm or somatopleure. It becomes applied to the Figure 5.15 Progression of the human
inner surface of the ectoderm and will give rise to the inner lining of the body embryonic disk during week 3, showing
development of the neural plate and
wall (see Figure 5.16B). The ventral layer, adjacent to the endoderm, is the
notochord. The sketches are dorsal views
splanchnic (or visceral) mesoderm or splanchnopleure, and it will give rise to of the embryonic disk, showing how
the linings of the visceral organs. it progresses from 15 to 21 days after
Figure 5.17 summarizes how the three germ layers of the early embryo give fertilization. The primitive streak emerges
rise to the many different tissues of the mature organism. at about day 15 along the posterior dorsal
surface, advancing toward the center of
the embryonic disk by the addition of cells
to its posterior (caudal) end. Mesenchymal
prechordal neural cells migrate from the anterior end of the
plate fold primitive streak to form a midline cellular
A cord known as the notochordal process.
The notochordal process grows cranially
neural until it reaches the prechordal plate, the
plate
future site of the mouth. By day 18, the
notochordal developing notochordal process can be seen
process to be accompanied by a thickening of the
primitive
overlying epiblast to form the neural plate,
pit newly added cells
which will eventually give rise to the brain
P primitive and spinal cord. As the notochordal process
streak cloacal develops, it changes from being a tube to
membrane notochord become a solid rod, the notochord. The
cloacal membrane is the primordial anus.
A, anterior; P, posterior. [From Moore (1984)
The Developing Human, 3rd ed. With
15 days 18 days 20 days 21 days
permission from Elsevier.]
154 Chapter 5: Principles of Development
dermatome connective tissue layers of skin eventually form all tissues of the embryo.
The connective tissue of the head and
intermediate Mullerian ducts oviducts, uterus
the cartilage of the skull and of structures
mesoderm mesonephros kidney, ovary, testis, ureter derived from branchial arches are a mixture
of ectodermal and mesodermal tissue, as
somatopleure connective tissue of body wall, limbs
lateral plate shown. Although the notochord persists
mesoderm splanchnopleura heart, blood vessels, blood cells, in adults in some primitive vertebrates, in
adrenal cortex
mammals and other higher vertebrates
it becomes ossified in regions of forming
lungs, trachea, bronchi vertebrae and contributes to the center of
embryonic primitive gut digestive tube, liver, pancreas the intervertebral disks. Note that some of
endoderm
pharynx, pharyngeal pouches, thyroid the embryonic mesoderm cells go on to
form extraembryonic mesoderm.
NEURAL DEVELOPMENT 155
The vertebrate neural crest is quite extraordinary and, although Table 1 Major classes of neural crest derivatives
derived from ectoderm, its importance has led to suggestions that
it be recognized as a fourth germ layer. Neural crest cells originate Tissue/region Cell types/structure
during neurulation from dorsal ectoderm cells at the edges of the
neural folds (Figure 5.18). The neural crest is a transient structure—the Peripheral nervous sensory ganglia; sympathetic ganglia;
cells disperse soon after the neural tube closes. They undergo an system (PNS) neurons parasympathetic ganglia
epithelial-to-mesenchymal transition and they migrate away from the
midline (Figure 1). PNS glial cells Schwann cells; non-myelinating glial cells
Neural crest cells migrate to peripheral locations, where they Endocrine/ adrenal medulla; calcitonin-secreting
assume a variety of different ectodermal and mesodermal fates, giving paraendocrine cells; carotid body type I cells
rise to a quite prodigious number of differentiated cell types (Table 1). derivatives
Most of our information comes from fate-mapping studies in the chick,
which have been aided by the ability to transplant corresponding Head and neck corneal endothelium and stroma; dermis,
domains of the dorsal neural tube between chick and quail embryos smooth muscle, and adipose tissue;
and to identify the cellular derivatives of such domains. facial and anterior ventral skull cartilage
Neural crest (NC) cells form at all levels along the anteroposterior and bones; lachrymal gland, connective
axis of the neural tube; however, four main, but overlapping, regions tissue; pituitary gland, connective tissue;
are recognized with characteristic derivatives and functions as listed salivary gland, connective tissue; thyroid,
below. connective tissue; tooth papillae
Cranial neural crest
Skin dermis, smooth muscle, and adipose
Some NC cells migrate dorsolaterally and give rise to the craniofacial
tissue; melanocytes
mesenchyme, which in turn produces cartilage, bone, cranial neurons,
glia, and connective tissues of the face. NC cells also give rise to Others arteries originating from aortic arch—
structures within the pharyngeal (branchial) arches and pouches, connective tissue and smooth muscle;
including cartilaginous rudiments of several bones of the nose, thymus, connective tissue; truncoconal
face, middle ear, jaw, and neck and also cells of the thymus and septum
odontoblasts of tooth primordia.
Trunk neural crest
An early wave of NC cells migrates ventrolaterally through the anterior
half of the sclerotomes (the blocks of mesodermal tissue derived from into the ectoderm, becoming melanocytes and going on to move
somites that will differentiate into the vertebral cartilage). Some of through the skin toward the ventral midline of the body.
the NC cells stay in the sclerotomes to form the dorsal root ganglia Vagal and sacral neural crest
containing the sensory neurons. Other NC cells continue ventrally to The vagal (neck) NC lies opposite chick somites 1–7; the sacral NC
form the sympathetic ganglia, the adrenal medulla, and nerve clusters lies posterior to somite 28. The neck and sacral NC cells generate the
surrounding the aorta. A later wave of NC cells migrates dorsolaterally parasympathetic (enteric) ganglia of the gut.
Cardiac neural crest
D The cardiac neural crest is a subregion of the vagal neural crest,
original site of neural crest cells extending from chick somites 1 to 3. Cardiac NC cells develop into
melanocytes, neurons, cartilage, or connective tissue (of certain
pharyngeal arches). In addition, this type of NC gives rise to the
neural tube ectoderm entire (muscular/connective tissue) walls of the large arteries as they
arise from the heart (outflow tracts), and contribute to the septum
sensory somite separating the pulmonary circulation from the aorta.
ganglion
sympathetic notochord
L ganglion R
aorta Figure 1 Main migratory pathways during neural crest
adrenal migration. Schematic cross section through the middle part of the
gland
trunk of a chick embryo. Cells taking the pathway just beneath the
coelomic ectoderm (outer red arrows) will form melanocytes; those that take
enteric cavity
the deep pathway via the somites (inner red arrows) will form the
ganglia
gut tube
neurons and glial cells of sensory and sympathetic ganglia, and parts
of the adrenal gland. The neurons and glia of the enteric ganglia, in
the wall of the gut, are formed from neural crest cells that migrate
along the length of the body from either the neck or sacral regions.
[From Alberts B, Johnson A, Lewis J et al. (2002) Molecular Biology of
V the Cell, 4th ed. Garland Science/Taylor & Francis LLC.]
factors of the Emx and Otx families are expressed. In the hindbrain and spinal
cord, positional identities are controlled by the Hox genes. In the hindbrain, it
seems that the positional values of cells are fixed at the neural plate stage and
that migrating neural crest cells carry this information with them and impose
positional identities on the tissues surrounding their eventual resting place.
NEURAL DEVELOPMENT 157
TGF-b D
Shh V0
V1
V2
M
V3
Conversely, positional identity in the spinal cord is imposed by signals from the Figure 5.19 Pattern formation of the
surrounding paraxial mesoderm, which can be shown by transplanting cells to nervous system. Dorsoventral pattern
different parts of the axis. Cranial neural crest cells behave according to their lin- formation involves a competition between
secreted signals from the notochord (Sonic
eage when moved to a new position; that is, they do the same things as they would
hedgehog, Shh) and from the ectoderm
in their original position. Trunk crest cells, in contrast, behave according to their
(initially BMP4, but later other members of
new position—they do the same things as their neighbors. the TGF-b superfamily). As the neural plate
The developing nervous system is a good model of pattern formation and dif- folds up, the signals are expressed in the
ferentiation, because various cell types (different classes of neurons and glia) neural tube itself. The result is the activation
arise along the dorsoventral axis. This process is controlled by a hierarchy of of different transcription factors in different
genetic regulators (Figure 5.19). Dorsoventral polarity is generated by opposing zones resulting in the regional specification
sets of signals originating from the notochord and the adjacent ectoderm. The of different neuronal dorsal (D) and ventral
notochord secretes the signaling molecule Sonic hedgehog (Shh), which has ven- (V) subtypes and motor (M) neurons.
tralizing activity, whereas the ectoderm secretes several members of the TGF-b
(transforming growth factor) superfamily, including BMP4, BMP7, and a protein
appropriately named Dorsalin.
As the neural plate begins to fold, these same signals begin to be expressed in
the extreme ventral and dorsal regions of the neural tube itself—the floor plate
and roof plate, respectively. The opposing signals have opposite effects on the
activation of various homeodomain class transcription factors (such as Dbx1,
Irx3, and NKx6.1). Expression of specific transcription factors divides the neural
tube into discrete dorsoventral zones, which later become the regional centers of
different classes of neuron (see Figure 5.19). For example, the zone defined by the
expression of Nkx6.1 alone becomes the region populated by motor neurons,
which are residents of the ventral third of the neural tube.
Isl-1
Isl-1, Isl-2
sympathetic
ventral limb
neuron
Figure 5.20 Differentiation of the nervous system. The genes that control neurogenesis in the developing nervous system are also expressed in
discrete zones along the dorsoventral axis. They may be regulated by Pax3, Pax6, and Pax7 genes, whose expression domains are in turn partly defined
by the Sonic hedgehog signal emanating from the notochord. The expression domains of the proneural gene products, neurogenin, MATH1, and
MASH1 are shown. Neural development is characterized by the expression of the transcription factor NeuroD. Then different classes of neuron begin to
express distinct groups of transcription factors. Those of the LIM homeodomain family, which includes Isl-1, Isl-2, and Lim-3, are especially important in
determining subneuronal fates by regulating the behavior of projected axons.
158 Chapter 5: Principles of Development
and anteroposterior axes of the nervous system, which is defined by the combi-
nation of transcription factors discussed in the previous section.
Further diversification is controlled by refinements in the expression patterns
of the above transcription factors. For example, all motor neurons initially express
two transcription factors of the LIM homeodomain family: Islet-1 and Islet-2.
Later, only those motor neurons projecting their axons to the ventral limb mus-
cles express just these two LIM transcription factors. Neurons expressing Isl-1,
Isl-2, and a third transcription factor, Lim-3, project their axons to the axial mus-
cles of the body wall. Neurons expressing Isl-2 and Lim-1 project their axons to
dorsal limb muscles, whereas those expressing Isl-1 alone project their axons to
sympathetic ganglia (see Figure 5.20). The transcription factors determine the
particular combinations of receptors expressed on the axon growth cones and
therefore determine the response of growing axons to different physical and
chemical cues (such as chemoattractants). Once the first axons have reached
their targets, further axons can find their way by growing along existing axon
paths—a process termed fasciculation.
The PGCs then migrate away from the posterior region of the primitive streak
into the endoderm and enter the developing hindgut for a short period. At the
end of gastrulation the PGCs are determined and migrate into the genital ridge, a
mesoderm structure that is part of the developing gonad. At this stage, the gonad
is bipotential, being capable of developing into either testis or ovary.
(B)
FURTHER READING
General Noncoding RNA in development
Gilbert SF (2006) Developmental Biology, 8th ed. Sinauer Amaral PP & Mattick JS (2008) Noncoding RNA in development.
Associates. Mamm. Genome 19, 454–492.
Hill M (2008) UNSW Embryology. http://embryology.med.unsw Stefani G & Slack FJ (2008) Small non-coding RNAs in animal
.edu.au/ [An educational resource for learning concepts in development. Nat. Rev. Mol. Cell Biol. 9, 219–230.
embryological development hosted by the University of New
South Wales, Sydney.] Animal models of development
Wolpert L, Smith J, Jessell T et al. (2006) Principles Of
Development, 3rd ed. Oxford University Press. Bard J (ed.) (1994) Embryos: Color Atlas Of Development. Mosby.
162 Chapter 5: Principles of Development
Morphogenesis
Aman A & Piotrowski T (2009) Cell migration during
morphogenesis. Dev. Biol. (in press).
Kornberg TB & Guha A (2007) Understanding morphogen
gradients: a problem of dispersion and containment. Curr.
Opin. Genet. Dev. 17, 264–271.
Steinberg MS (2007) Differential adhesion in morphogenesis: a
modern view. Curr. Opin. Genet. Dev. 17, 281–286.
• DNA molecules are very long and complex and are prone to shearing when isolated from
other cell constituents, making their purification difficult.
• DNA cloning means making multiple identical copies (clones) of a DNA sequence of
interest, the target DNA; the increase in copy number is called amplification. The process
requires a DNA polymerase to replicate a target DNA sequence repeatedly, either within
cells or in vitro.
• In cell-based DNA cloning, target DNA is fractionated by transfer into bacterial or yeast cells
(transformation). Each transformed cell typically takes up just one target DNA molecule and
replicates it using the host cell’s DNA polymerase.
• Before transfer into cells, a target DNA is joined to vector DNA molecules, a plasmid, or a
phage DNA capable of self-replication inside cells. The target-vector DNA complexes are
known as recombinant DNA.
• Restriction endonucleases cut DNA molecules at defined short recognition sequences and
are needed to prepare target and vector DNA for joining together.
• After having been transferred into cells, recombinant DNA typically replicates
independently of the host cell’s chromosome(s).
• Large target DNAs are more stable in cells if carried by vectors whose copy number is
constrained to one or two per cell.
• Megabases of DNA can be cloned in yeast by adding very short sequence elements
necessary for chromosome function to form a linear artificial chromosome.
• In expression cloning the target DNA is designed to be expressed. It is converted to RNA and
in many cases translated into a recombinant protein.
• Cell-based DNA cloning is slow and laborious; however, it can produce very large amounts
of cloned DNA and is the first choice for cloning large DNA sequences, expressing genes,
and making comprehensive sets of DNA clones (DNA libraries).
• DNA cloning can also be performed rapidly in vitro with the polymerase chain reaction
(PCR). A purified DNA polymerase is usually used to replicate a specific DNA sequence
selectively within a complex sample DNA. Oligonucleotides are designed to bind specifically
to sites in the sample DNA that flank the target sequence, allowing the polymerase to
initiate DNA replication at these sites.
• PCR can also be used to amplify many different target DNA fragments simultaneously, for
example by using complex mixtures of numerous different primer sequences. This
indiscriminate amplification can replenish many of the sequences in precious samples
where there is little starting DNA.
• As well as being a DNA cloning procedure, PCR is also widely used as an assay to quantitate
DNA and RNA after it has been converted by reverse transcriptase to complementary DNA
(cDNA). Real-time PCR is the most efficient method, and quantitates PCR products while
the PCR reaction is occurring.
164 Chapter 6: Amplifying DNA: Cell-based DNA Cloning and PCR
The fundamentals of current DNA technology are very largely based on two quite
different approaches to studying specific DNA sequences within a complex DNA
population, as listed below (see also Figure 6.1):
• DNA cloning. DNA sequences of interest are selectively replicated in some way
to produce very large numbers of identical copies (clones). The resulting huge
increase in copy number (amplification) means that DNA cloning is effec-
tively a way of purifying a desired DNA sequence so that it can be comprehen-
sively studied.
• Molecular hybridization. The fragment of interest is not amplified or purified
in any way; instead, it is specifically detected within a complex mixture of
many different sequences. This will be covered in Chapter 7.
Before DNA cloning, our knowledge of DNA was extremely limited. DNA clon-
ing technology changed all that and revolutionized the study of genetics. In
comparison with protein sequences, DNA molecules are extremely large and
complex—individual nuclear DNA molecules often contain hundreds of millions
of nucleotides. When DNA is isolated from cells by standard methods, these huge
molecules are fragmented by shear forces, generating complex mixtures of still
very large DNA fragments (about 50–100 kb in length with standard DNA isola-
tion methods). Given the complexity of the DNA isolated from the cells of a typi-
cal eukaryote, or even prokaryote, the challenge was how to analyze it.
For many eukaryotes, one early approach had been to separate different pop-
ulations of DNA sequence by centrifugation. Ultracentrifugation in equilibrium
density gradients (e.g. CsCl density gradients) typically fractionates the DNA
from a eukaryotic cell into a major band representing the bulk of the DNA plus
several minor satellite bands that are composed of classes of repetitive DNA. The
DNA molecules of the satellite bands (satellite DNA) have different buoyant den-
sities from that of the bulk DNA (and from each other) because they consist of
very long arrays of short tandem repeats whose base composition is significantly
different from that of the bulk DNA. The satellite DNA sequences were found to
be involved in specific aspects of chromosome structure and function. Although
valuable and interesting, the purified satellite DNAs were a minor component of
the genome and did not contain genes.
What was needed were more general methods that allowed any DNA sequence
to be purified, not just DNA sequences whose base compositions deviated sig-
nificantly from that of the total DNA. Two major DNA cloning methodologies
were developed that made this possible. Both use DNA polymerases to make
multiple replica copies of DNA sequences (DNA clones), permitting in principle
the purification of any DNA sequence within a complex starting DNA
Figure 6.1 General approaches for
population. studying specific DNA sequences in
complex DNA populations. DNA molecules
are very long and complex, and any gene
or exon or other DNA sequence of interest
typically represents a tiny fraction of the
complex DNA population
starting DNA that we can isolate from cells.
Two quite different approaches can be
used to study a particular DNA sequence
GENERAL APPROACHES amplification of a detection of a of interest. One way is to purify a specific
FOR STUDYING SPECIFIC specific DNA specific DNA DNA sequence by selectively replicating
DNA SEQUENCES (DNA cloning) its sequence in some way (DNA cloning),
thereby increasing its copy number
(amplification). The specific DNA can either
cell-based cell-free cloning molecular
METHODS
cloning (PCR)
be cloned within bacterial or yeast cells
hybridization
by using cellular DNA polymerases, or by
using purified DNA polymerases in vitro, as
in the polymerase chain reaction (PCR). An
ESSENTIAL • suitable host cell • thermostable DNA • labeled DNA, RNA, alternative approach makes no attempt to
REQUIREMENTS • replicon (vector) polymerase or oligonucleotide purify the DNA sequence, but instead seeks
• means of attaching • sequence information probe to detect it specifically. It involves labeling
foreign DNA to vector
and transferring to
to enable synthesis of
specific oligonucleotide
• means of detecting
a previously isolated nucleic acid sequence
DNA fragments to
host cell primers which probe binds that is related in sequence to the desired
• selection/screening to DNA sequence so that it can specifically
identify transformed recognize it in a molecular hybridization
cells/recombinants
reaction.
PRINCIPLES OF CELL-BASED DNA CLONING 165
To do this, the mixture of transformed cells is spread out on an agar surface and
allowed to grow to form well-separated cell colonies, each arising from a single
transformed cell. Individual cell colonies can then be picked and allowed to
undergo a second growth phase in liquid culture, resulting in very large numbers
of cell clones (Figure 6.2C).
The final step is to harvest the expanded cell cultures and purify the recombi-
nant DNA (Figure 6.2D). The sections below consider these steps in some detail.
(D)
DNA fragments of a manageable size. When DNA is isolated from tissues and cul-
tured cells, the unpackaging of the huge DNA molecules and inevitable physical
shearing results in heterogeneous populations of DNA fragments. The fragments
are not easily studied because their average size is still very large and they are of
randomly different lengths. With the use of restriction endonucleases it became
possible to convert this hugely heterogeneous population of broken DNA frag-
ments into sets of smaller restriction fragments of defined lengths.
Type II restriction endonucleases also paved the way for artificially recombin-
ing DNA molecules. By cutting a vector molecule and the target DNA with restric-
tion endonuclease(s) producing the same type of sticky end, vector–target asso-
ciation is promoted by base pairing between their sticky ends (Figure 6.3). The
hydrogen bonding between their termini facilitates subsequent covalent joining
by enzymes known as DNA ligases. Intramolecular base pairing can also happen,
as when the two overhanging ends of a vector molecule associate to form a circle
168 Chapter 6: Amplifying DNA: Cell-based DNA Cloning and PCR
(cyclization), and ligation reactions are designed to promote the joining of target
DNA to vector DNA and to minimize vector cyclization and other unwanted
products such as linear concatemers produced by sequential end-to-end joining
(see Figure 6.3).
target DNA cut with MboI vector DNA cut with BamHI Figure 6.3 Formation of recombinant
DNA. Heterogeneous target DNA sequences
GATC GGATCC cut with a type II restriction enzyme (MboI
CTAG CCTAGG in this example) are mixed with vector
DNA cut with another restriction enzyme
GATC
such as BamHI that generates the same
CTAG type of sticky ends (here, a 5¢ overhanging
sequence of GATC). Ligation conditions are
chosen that will allow target and vector
molecules to combine, to give recombinant
DNA. However, other ligation products
GATC are possible. Intermolecular target–target
CTAG
concatemers are shown on the left, and
GATC
intramolecular vector cyclization on the
CTAG right. Other possible ligation products not
shown here include intermolecular vector
concatemers, cyclization of multimers, and
C co-ligation events that involve two different
GAT target sequences being included with a
CTAG
vector molecule in the same recombinant
GATC GATC
CTAG CTAG DNA molecule.
intermolecular target–target concatemers
intramolecular
C
GAT vector cyclization
CTAG
C TA C
GAT
G
intermolecular vector–target
recombinant DNA
Escherichia coli isolate, for example, might have three different small plasmids
present in multiple copies and one large single-copy plasmid. Natural examples
of bacterial plasmids include plasmids that carry the sex factor (F) and those that
carry drug-resistance genes. Some plasmids sometimes insert their DNA into the
bacterial chromosome (integration). Such plasmids, which can exist in two
forms—extrachromosomal replicons or integrated plasmids—are known as
episomes.
A bacteriophage (or phage, for short) is a virus that infects bacterial cells.
Unlike plasmids, phages can exist extracellularly but, like other viruses, they must
invade a host cell to reproduce, and they can only infect and reproduce within
certain types of host cell. Phages may have DNA or RNA genomes, and in DNA
phages the genome may consist of double-stranded DNA or single-stranded DNA
and may be linear or circular. There is considerable variation in genome sizes,
from a few kilobases to a few hundred kilobases.
In the mature phage particle, the genome is encased in a protein coat that can
aid in binding to specific receptor proteins on the surface of a new host cell and
entry into the cell. To escape from one cell to infect a new one, bacteriophages
produce enzymes that cause the cell to burst (cell lysis). The ability to infect new
cells depends on the number of phage particles produced, the burst size, and so
high copy numbers are usually attained. Like episomal plasmids, some bacterio-
phages, such as phage l, can also integrate into the host chromosome, where-
upon the integrated phage sequence is known as a prophage.
(A)
cell 2 2 2 2
2 2 2 2
3 3
division 2 2
3 3 2 2 2 2
transformation and 3
3 3
1 extrachromosomal
replication 1 1 2 2 2 2
1
2 + 1
1
1
1
2 2
2
2
2
2
2 2
2
2
2
2
3 2 2
2
2 2 2 2 2 2
heterogeneous 2
2
2 2
2
2
recombinant DNA competent cells transformed cells with 2 2 2 2
multiple copies (clones)
of one recombinant DNA 2 2 2 2
2 2 2 2
2 2
2 2 2 2
to produce blue colonies. However, recombinants with an insert in the polylinker Figure 6.5 Transformation, the critical
can cause insertional inactivation so that no functional b-galactosidase is DNA fractionation step, and large-scale
produced. amplification of recombinant DNA.
(A) Cell-based DNA cloning typically starts
with a complex population of target DNA
Transformation is the key DNA fractionation step in cell-based fragments that are joined to a single
DNA cloning type of vector molecule, producing a
heterogeneous collection of recombinant
The plasma membrane of cells is selectively permeable so that only certain small DNA molecules. During transformation, the
molecules are allowed to pass into and out of cells; large molecules such as long recombinant DNA molecules are transferred
DNA fragments are normally unable to cross the membrane. However, cells can into competent cells, so that a cell typically
be treated in different ways so that the selective permeability of the plasma mem- takes up just one type of recombinant DNA
branes is temporarily distorted. One way, for example, is to use electroporation, molecule. As a result, the cell population
in which cells are exposed to a short high-voltage pulse of electricity. As a result acts as a fractionation system, allocating
of the temporary alteration to the cell membrane, some of the treated cells individual recombinant DNAs to individual
become competent, meaning that they are capable of taking up foreign DNA from cells. Within each cell, a recombinant
the extracellular environment. DNA can replicate extrachromosomally,
often producing many identical DNA
Only a small percentage of competent cells will take up recombinant DNA to
clones. Thereafter each transformed cell
become transformed. However, those that do will often take up just a single undergoes repeated divisions to generate
recombinant DNA molecule. This is the basis of the critical fractionation step in populations of identical cell clones that
cell-based DNA cloning. The population of transformed cells can be thought of will form cell colonies. (B) The cell colonies
as a sorting office in which the complex mixture of DNA fragments is sorted by can be physically separated by plating
depositing individual DNA molecules into individual recipient cells. Extra- out, a procedure that involves growth of
chromosomal replication of the recombinant DNA often allows many identical transformed cells on a nutrient agar surface
copies of a single type of recombinant DNA in a cell (Figure 6.5). so as to allow separation of cell colonies.
Because circular DNA transforms much more efficiently than linear DNA, Individual colonies are picked and added to
most of the cell transformants will contain cyclized products rather than linear liquid culture in flasks to allow secondary
amplification. (Courtesy of James Stock,
recombinant DNA concatemers. Usually vector molecules are treated to reduce
King’s College. Used under the Creative
the chance of cyclizing and so most of the transformants will contain recombi- Commons Attribution ShareAlike 3.0 license.)
nant DNA. Note, however, that co-transformation, the occurrence of more than
one type of introduced DNA molecule within a cell clone, can sometimes occur.
The transformed cells are allowed to multiply. In cloning using plasmid vec-
tors and a bacterial cell host, a solution containing the transformed cells is sim-
ply spread over the surface of nutrient agar in a Petri dish and the cells are allowed
to grow (a process called plating out—Figure 6.5). This usually results in the for-
mation of bacterial colonies that consist of cell clones (all the cells in a single
colony are identical because they are all descended from a single founder cell).
172 Chapter 6: Amplifying DNA: Cell-based DNA Cloning and PCR
TABLE 6.2 PROPERTIES OF CELLBASED DNA CLONING SYSTEMS ARRANGED ACCORDING TO INSERT CAPACITY
Cloning system based on Insert size Copy Applications
number
(per cell )
PLASMID VECTORS
Plasmid vectors Standard plasmid and Usually up often high Very versatile. Routinely used in subcloning of fragments amplified
allowing phagemid vectors to 5–10 kb by PCR. Also widely used in making cDNA libraries and for expressing
replication to genes to give RNA or protein products. Phagemids can produce
high-copy- single-stranded recombinant DNA
numbers
Cosmid vectors 30–44 kb high Used for making genomic DNA libraries with modest-sized inserts.
However, high copy number makes inserts comparatively unstable
Plasmid vectors Fosmid vectors 30–44 kb 1–2 An alternative to cosmid cloning, providing much greater insert
whose copy stability at the expense of lower yields of recombinant DNA
number is
constrained PAC (P1 artificial 130–150 kb 1–2 Used for making genomic DNA libraries, often as a supplementary
chromosome) vectors aid to BAC cloning
BAC (bacterial artificial up to 1–2 Used extensively to make genomic libraries for use in genome
chromosome) vectors 300 kb projects
YAC (yeast artificial 0.2–2.0 Mb low Used as an intermediate cloning system for some genome projects
chromosome) vectors (e.g. Human Genome Project) but inserts are often unstable. Also
useful in studying function of large genes or gene clusters
PHAGE VECTORS
M13 small low to Useful in the past for making single-stranded recombinant DNA for
medium sequencing templates and site-directed mutagenesis. Superseded
by phagemids
l insertion vectors e.g. lgt11 up to 10 kb low Useful for making cDNA libraries because they can accept quite large
inserts and because large numbers of recombinants can be made
l replacement vectors 9–23 kba low Used for making genomic DNA libraries in the past; rarely used now
P1 70–100 kb low Used for making genomic libraries in the past; rarely used now
aInsert size is increased compared to insertion vectors by removing a central non-essential part of the l genome and replacing it by the DNA to be
cloned. Note that plasmid vectors are generally preferred; phage vectors are comparatively difficult to work with.
important tools in the Genome Projects and in working out gene structure and
the relationship between genes and their transcripts. We will consider them in
detail in Chapter 8.
Vertebrate genes are often very large with a high content of repetitive DNA,
making them less amenable to cloning. Repetitive DNA is extremely rare in bac-
terial genomes, and large amounts of introduced recombinant DNA that is rich in
repetitive DNA sequences are not well tolerated. As a result, standard plasmids
can rarely be used to clone large (> 10 kb) fragments of animal DNA; instead, the
inserts are usually less than 5 kb. Plasmids continue to be the workhorses of DNA
cloning for many research purposes, however. Vectors such as pUC19 (see Figure
6.4) permit high-copy-number replication and large amounts of recombinant
DNA, and, as we will see in Section 6.3, plasmid vectors are widely used to express
genes of interest. To clone larger DNA fragments, different DNA cloning systems
were needed (see Table 6.2 for an overview). In addition, some applications for
studying recombinant DNA, notably DNA sequencing and in vitro site-specific
mutagenesis, required single-stranded DNA templates. As a result, cloning vec-
tors were also developed that could be used to generate single-stranded recom-
binant DNA.
integration and DNA synthesis, host cell lysis, Figure 6.6 A map of the phage l genome.
coat proteins recombination and pathway regulation Genes are shown as vertical bars. The central
cos region of the 48.5 kb l genome is non-
essential for extrachromosomal replication,
cos viral coat production, and lysis of the host
cell—it contains genes that are required only
non-essential region when l integrates into the chromosome
of its host cell. When using l replacement
The central portion of the l genome is a non-essential region: the genes it con- vectors, cloning occurs by replacing this
tains are required only for integrating l DNA into the host chromosome. DNA central non-essential region by target
cloning is therefore possible by removing a large central section of the l genome restriction fragments that can be 9–23 kb
DNA and replacing it by up to 23 kb of target DNA (Figure 6.6). long. The terminal cos sequences have
A further refinement that improved the upper size of fragments that could be 5¢ overhanging ends that are 12 nucleotides
cloned exploited the properties of the terminal cos sequences of l. The cos long and complementary in sequence,
sequences terminate in 5¢ overhanging ends that are 12 nucleotides long and enabling cyclization. The l genome needs to
be linear to fit into its protein coat, but after
complementary in sequence so that they are cohesive ends; hence the name cos
the phage has attached to the membrane of
(see Figure 6.6). The l genome needs to be linear to fit into its protein coat, but
a new host cell and inserted its linear DNA,
after the phage has attached to the membrane of a new host cell and inserted its the inserted l DNA immediately cyclizes by
linear DNA, the inserted l DNA immediately cyclizes by base pairing between the base pairing between the overhanging 5¢
cos sequences. Extrachromosomal replication of l DNA occurs by a rolling-circle cos sequences, then replicates eventually by
model that generates linear multimers of the unit length. The l multimers are a rolling-circle model that generates linear
cleaved within the cos sequence to generate unit length linear genomes for pack- multimers of the unit 48.5 kb length. Coat
aging into viral protein coats. proteins are synthesized and the l multimers
For proper packaging of l DNA within the phage protein coat the spacing are then snipped asymmetrically within the
between the cos sequences is constrained to be 35–53 kb. By inserting the cos cos sequence to generate multiple copies of
the unit genome that are then individually
sequence into a plasmid, a new type of cloning vector, the cosmid, was produced
packed within protein coats.
that, when used with an in vitro l packaging system, could accept foreign frag-
ments up to 40 kb in length. However, cosmid recombinants that contain verte-
brate DNA 40 kb long are unstable. The inserts typically have repetitive sequences
and when the recombinant DNA is present in a bacterial cell at high copy number,
recombination makes the inserts prone to rearrangement and deletion.
To achieve cloning of larger inserts, cloning systems needed to use low-copy-
number replicons. One successful approach uses bacterial hosts but with extra-
chromosomal replicons that are constrained in copy number. Another solution
involves using yeast cells and seeks to construct artificial chromosomes to
replicate target DNA. In this case, the vectors include sequences essential for
chromosomal function, including an origin of replication that originates from a
chromosome, resulting in low copy numbers. To generate very large target DNA
fragments for cloning, DNA needs to be isolated from cells under gentle condi-
tions and then subjected to partial digestion with a restriction endonuclease, so
that only a small minority of the available restriction sites are actually cut.
replicon, and so only low yields of recombinant DNA can be recovered from the
host cells. However, because of their great insert stability they were the most
widely used clones that were selected for sequencing in the Human Genome
Project.
Cosmid-like vectors that have been adapted for low-copy-number replication
by using F-factor genes are known as fosmid vectors. The inserts have the same
size range as in cosmid cloning but are much more stable.
Bacteriophage P1 vectors and P1 artificial chromosomes
An alternative approach to cloning large inserts in E. coli was to develop vectors
based on bacteriophages that have naturally large genomes. For example, bacte-
riophage P1 has a 110–115 kb linear double-stranded DNA genome that, like
phage l, is packaged into a protein coat. Bacteriophage P1 cloning vectors have
therefore been designed in which components of P1 are included in a circular
plasmid. Subsequently, features of the P1 and F-factor systems were combined in
another plasmid-based cloning system in which recombinants are known as a P1
artificial chromosomes (PACs). As in BACs, the inserts of PACs are very stable
and they have also been used as direct templates for sequencing in some genome
projects.
wild-type
homoduplex
5¢
G C C mutagenic primer
CGG T C mutant
AT
AG G
C A sequence
T
single-stranded C G
M13 A C A C
recombinant DNA synthesis transformation
G C
T A in vitro of E. coli host cell
AC G
T
T GC
A
double-stranded
3¢ heteroduplex
yeast cells; instead, the external cell walls must first be removed. The resulting mutant
homoduplex
yeast spheroplasts can accept exogenous fragments but are osmotically unstable
and need to be embedded in agar. The overall transformation efficiency is very
low, and the yield of cloned DNA is low (about one copy per cell). Nevertheless, Figure 6.8 Oligonucleotide mismatch
the capacity to clone large DNA fragments (up to 2 Mb) made YACs a vital tool in mutagenesis. Many different methods of
the physical mapping of some complex genomes, notably the human genome. oligonucleotide mismatch mutagenesis can
be used to create a desired point mutation at
a unique predetermined site within a cloned
Producing single-stranded DNA for use in DNA sequencing and DNA molecule. The example illustrates
in vitro site-specific mutagenesis the use of a mutagenic oligonucleotide to
direct a single nucleotide substitution in
Single-stranded DNA clones have been preferred templates for DNA sequencing: a gene. The gene is cloned into a suitable
the sequences obtained are clearer and easier to read than when double-stranded vector to generate a single-stranded
DNA is used. We will consider aspects of DNA sequencing in Chapter 8. recombinant DNA, such as an M13 vector
Single-stranded DNA templates are also ideal substrates for site-specific (as shown here) or a phagemid vector. An
in vitro mutagenesis, an invaluable experimental tool in dissecting the func- oligonucleotide primer is designed to be
tional contributions made by short sequences of nucleotides or amino acids. For complementary in sequence to a portion
example, the suspected contribution of a specific amino acid to the function of of the gene sequence encompassing the
its protein can be tested by deleting the amino acid or mutating it to give a differ- nucleotide to be mutated (in this case an
adenine, A) and containing the desired
ent amino acid with a rather different side chain. To do this, a single-stranded
non-complementary base at that position
cloned DNA with the wild-type sequence is allowed to base-pair with a synthetic
(C, not T). Despite the internal mismatch,
oligonucleotide that is designed to be perfectly complementary in base sequence annealing of the mutagenic primer is
except for some changes that will correspond to the desired mutation. The oligo- possible, and second-strand synthesis can
nucleotide is allowed to prime new DNA synthesis to create a complementary be extended by DNA polymerase and the
full-length sequence containing the desired mutation. The newly formed wild- gap sealed by DNA ligase. The resulting
type–mutant heteroduplex is used to transform cells, and the desired mutant heteroduplex can be transformed into
sequence can be identified by screening for the mutation (Figure 6.8). E. coli, whereupon two populations of
Single-stranded recombinant DNA is often obtained by using cloning systems recombinants can be recovered: wild-type
based on filamentous bacteriophages such as phages M13, fd, and f1, which nat- and mutant homoduplexes. The latter can
be identified by PCR-based allele-specific
urally adopt a single-stranded DNA form at some stage in their life cycle.
amplification methods (see Figure 6.17).
Alternatively, plasmid vectors are used that contain filamentous phage DNA
sequences that regulate the production of single-stranded DNA.
M13 vectors
Bacteriophage M13 has a 6.4 kb circular single-stranded DNA genome enclosed
in a protein coat, forming a long filamentous structure. It infects certain strains of
E. coli. After the M13 phage binds to the wall of a bacterial cell, it injects its DNA
genome into the cell. Shortly after the single-stranded M13 DNA enters the cell it
is converted to a double-stranded DNA, the replicative form that serves as a tem-
plate for making numerous copies of the genome. After a short time, a phage-
encoded product switches DNA synthesis toward the production of single strands
that migrate to the cell membrane. Here they are enclosed in a protein coat, and
hundreds of mature phage particles are extruded from the infected cell without
cell lysis.
M13 vectors are based on the double-stranded replicative form, which has
been modified to contain a multiple cloning site and a lacZ selection system for
screening for recombinants. Double-stranded M13 recombinants can be trans-
fected into a suitable E. coli strain; after a certain period, phage particles are har-
vested and used to retrieve single-stranded recombinant DNA (Figure 6.9).
LARGE INSERT CLONING AND CLONING SYSTEMS FOR PRODUCING SINGLE-STRANDED DNA 177
MCS
lacl lacZ¢
ori single-
M13 vector stranded
~7.3 kb recombinant
DNA
transfect double-
stranded recombinant
DNA into host cells
switch to single-
replication of
stranded DNA
double-stranded
synthesis of
recombinant DNA
strand only
transcription
phage coat proteins
Figure 6.9 Producing single-stranded recombinant DNA. M13 vectors use the same lacZ assay for recombinant screening as
pUC19 (see Figure 6.4). Target DNA is inserted in the multiple cloning site (MCS). The double-stranded M13 recombinant DNA
enters the normal cycle of phage replication to generate numerous copies of the genome, before switching to production of
single-stranded DNA (+ strand only). After packaging with M13 phage coat protein, mature recombinant phage particles exit from
the cell without lysis. Single-stranded recombinant DNA is recovered by precipitating the extruded phage particles and chemically
removing the protein.
Phagemid vectors
A small segment of a single-stranded filamentous bacteriophage genome can be
inserted into a plasmid to form a hybrid vector known as a phagemid; Figure
6.10 shows the pBluescript vector as an example. The chosen segment of the
phage genome contains all the cis-acting elements required for DNA replication
and assembly into phage particles. They permit successful cloning of inserts sev-
eral kilobases long, unlike M13 vectors, in which inserts of this size tend to be
unstable. After transformation of a suitable E. coli strain with a recombinant
phagemid, the bacterial cells are superinfected with a filamentous helper phage,
such as phage f1, which is required for providing the coat protein. Phage particles
secreted from the superinfected cells will be a mixture of helper phage and
recombinant phagemids. The mixed single-stranded DNA population can be MCS
Plac lacZ
used directly for DNA sequencing because the primer for initiating DNA strand
synthesis is designed to bind specifically to a sequence of the phagemid vector
adjacent to the cloning site.
f1
Figure 6.10 pBluescript, a phagemid vector. The pBluescript series of
origin
phagemid vectors each contain two origins of replication, a standard plasmid pBluescript vector
one (ori) and a second one from the filamentous phage f1. The f1 origin of 2.9 kb
replication allows the production of single-stranded DNA, but the vector lacks
ori
the genes for phage coat proteins. Target DNA is inserted into the phagemid
vector, which is then used to transform host E. coli cells. Superinfection of
transformed cells with a helper M13 phage allows the recombinant phagemid
DNA to be packaged within phage protein coats, and the recombinant
single-stranded DNA can be recovered as in Figure 6.9. AmpR
178 Chapter 6: Amplifying DNA: Cell-based DNA Cloning and PCR
(B)
1 2 3 1 2 3
1
1 1
1
biotin S
phage library in selective binding of phage S S
S
aqueous solution antibody expressing protein 1
biotin-conjugated Petri dish coated
antibody specific with streptavidin (S)
selective binding of
for protein 1 biotin-conjugated phage
fusion protein that is incorporated into the phage’s protein coat, so that it is dis- Figure 6.13 Phage display. (A) Library
played on the surface of the phage without affecting the phage’s ability to infect construction. A heterogeneous mixture of
cells. If an antibody is available for the expressed protein, phage displaying that target cDNAs is cloned into a phage vector,
such as one based on phage M13 or f1,
protein can be selected by preferential binding to the antibody: affinity purifica-
to express foreign proteins on the phage
tion of virus particles bearing such a protein can be achieved from a 108-fold
surface. In this example, DNA is inserted
excess of phage not displaying the protein, using even minute quantities of the into a cloning site located within the initial
relevant antibody (Figure 6.13). part of the coding sequence for the gene
In protein engineering, phage display is a powerful adjunct to random muta- from phage f1 that makes one of the phage
genesis programs as a way of selecting for desired variants from a library of coat proteins (protein III). The idea is that
mutants. It has also proved a powerful alternative source of constructing anti- recombinants should express a fusion
bodies, bypassing normal immunization techniques and even hybridoma tech- protein with the target protein fused to this
nology. Phage libraries can also be used to identify proteins that interact with a phage coat protein, thereby enabling the
specific protein. In the same way in which antibodies can be used in affinity phage particles to display the foreign protein
on their surfaces. After transformation of
screening, a known protein (or any other molecule to which a protein can bind)
host E. coli cells, the phage DNA replicates,
can be used as a bait to select phages that display any other proteins that bind to
and phage particles are assembled,
the bait protein. extruded from the host cell, and harvested.
The mixture of recombinant phages is
Eukaryotic gene expression is performed with greater fidelity in known as a phage expression library.
eukaryotic cell lines (B) Library screening with a specific antibody.
An antibody specific for just one type of
Bacterial expression systems offer the great advantage of protein expression at protein will bind only to phage particles
very high levels. However, the biological properties of many eukaryotic proteins that display that protein. If the antibody
synthesized in bacteria may not be representative of the native molecules. When is complexed with biotin, phage particles
produced in bacterial cells, they do not undergo their normal post-translational carrying the desired protein can be purified
processing, and the folding of the eukaryotic protein may be incorrect or ineffi- by using streptavidin-coated Petri dishes: the
desired recombinants will have an antibody
cient. As an alternative, eukaryotic cells, including insect and mammalian cells,
complexed to a biotin group and the biotin
are widely used for expressing recombinant proteins. The expression vectors are group will bind strongly to streptavidin.
typically designed to be able to be replicated both in the desired eukaryotic cells
and in E. coli, where the intention is simply to produce large amounts of recom-
binant DNA that can subsequently be used for expression studies in eukaryotic
cells.
Once transfected into animal cells, the recombinant DNA can be expressed
for short or long periods, according to the expression vector used. For some pur-
poses, all that is required is transient expression. Here, the recombinant DNA
remains as an independent extrachromosomal genetic element (an episome)
within the transfected cells. Expression of the introduced gene reaches a maxi-
mal level about two or three days after transfection of the expression vector into
the mammalian cell line; thereafter, expression can diminish rapidly as a result of
cell death or loss of the expression construct.
The alternative is to use stable expression systems in which the recombinant
DNA can integrate into a host cell’s chromosome. The advantage here is that if
CLONING SYSTEMS DESIGNED FOR GENE EXPRESSION 181
the recombinant DNA has integrated so that the introduced gene is expressed, all
descendant cells will contain this gene and expression can be maintained over
many cell generations. Some viruses are highly efficient at integrating into chro-
mosomal DNA, and we will consider them in the context of gene therapy in a later
chapter. But plasmid expression vectors can also randomly integrate at low fre-
quencies, and stable cell lines can be developed with an integrated gene of
interest.
Transient expression in insect cells by using baculovirus
Baculovirus gene expression is a popular method for producing large quantities
of recombinant proteins in insect host cells: the protein yields are higher than in
mammalian expression systems, and the costs are lower. In most cases, post-
translational processing of eukaryotic proteins expressed in insect cells is similar
to the protein processing that occurs in mammalian cells, and the expression of
very large proteins is possible. The proteins produced in insect cells have compa-
rable biological activities and immunological reactivities to those of proteins
expressed in mammalian cells.
Autographa californica nuclear polyhedrosis virus (AcMNPV) is a baculovirus
that can be propagated in certain insect cell lines and is often used as a cloning
vector for protein expression systems. The viral polyhedrin protein is transcribed
at high rates and, although essential for viral propagation in its normal habitat, it
is not needed in culture. As a result, its coding sequence can be replaced by a cod-
ing DNA for a eukaryotic protein. The cloning vector is designed to express the
inserted protein from the powerful polyhedrin promoter, resulting in expression
levels accounting for more than 30% of the total cell protein.
medium that makes it difficult to isolate DNA by the standard methods. Successful multiple cloning site
PCR amplification is possible from small amounts of degraded DNA extracted
from decomposed tissues at archaeological or historical sites and from formalin- HindIII
KpnI
fixed tissue samples. AmpR
PCMV BamHI
SpeI
BstXI
PCR can be used to amplify a rare target DNA selectively from pcDNA3.1/myc-His
EcoRI
within a complex DNA population pUC ori BGH pA PstI
EcoRV
Usually, PCR is designed to permit the selective replication of one or more specific f1 ori BstXI
SV40 pA NotI
target DNA sequences within a heterogeneous collection of DNA sequences, neo XhoI
enabling it to increase vastly in copy number (amplification). Often the starting SV40 ori BstEII
XbaI
DNA is total genomic DNA from a particular tissue or cultured cells, and the target and
enhancer ApaI myc
DNA is typically a tiny fraction of the starting DNA. For example, if we want to SacII epitope
amplify the 1.6 kb b-globin gene from human genomic DNA (with a haploid BstBI
6 ¥ His
genome size of 3200 Mb), the ratio of target DNA to starting DNA is 1.6 kb:3200 Mb, PmeI
or 1:2,000,000. Many PCR reactions involve the amplification of even smaller tar- STOP
gets. For example, single human exons are on average about 290 bp long, and
thus represent less than 0.000001% of a starting population of genomic DNA. Figure 6.15 A mammalian expression
As an alternative to genomic DNA, the starting DNA may be total cDNA pre- vector, pcDNA3.1/myc-His. The Invitrogen
pared by isolating RNA from a suitable tissue or cell line and then converting it pcDNA series of plasmid expression
into DNA with the enzyme reverse transcriptase. This is known as reverse tran- vectors offer high-level constitutive protein
scriptase-PCR or RT-PCR for short. If the original RNA population has many cop- expression in mammalian cells. Shown
ies of the transcript of the target DNA, there may be significant enrichment of the here is the pcDNA3.1/myc-His expression
target sequence in comparison with its representation in genomic DNA. Under vector. Cloned cDNA inserts can be
optimal conditions, RT-PCR can also sometimes be used to amplify target transcribed from the strong cytomegalovirus
(PCMV ) promoter, which ensures high-
sequences from tissues in which they are expressed at only a basal transcription
level expression in mammalian cells, and
level.
transcripts are polyadenylated with the help
of a polyadenylation sequence element
The cyclical nature of PCR leads to exponential amplification of from the bovine growth hormone gene
the target DNA (BGH pA). The multiple cloning site region
is followed by two short coding sequences
PCR depends on synthesizing oligonucleotide sequences to act as primers for that can be expressed to give peptide tags.
new DNA synthesis at defined target sequences within the starting DNA; the idea The myc epitope tag allows recombinant
is to selectively replicate only the target sequence(s). It consists of a series of screening with an antibody specific for
cycles of three successive reactions conducted at different temperatures, so the the Myc peptide, and an affinity tag of six
process is sometimes described as thermocycling. First, the starting DNA is heated consecutive histidine residues (6 ¥ His)
to a temperature that is high enough to break the hydrogen bonds holding the facilitates purification of the recombinant
two complementary DNA strands together. As a result, the double strands sepa- protein by affinity chromatography with a
rate to give single-stranded DNA (denaturation). For human genomic DNA, the nickel–nitrilotriacetic acid matrix. Translation
reaction mixture is generally heated to about 93–95°C for the denaturation step. ends at a defined termination codon (STOP).
A neo gene marker (regulated by an SV40
After cooling, the synthetic oligonucleotide primers are allowed to bind by base
promoter/enhancer and an SV40 poly(A)
pairing to a complementary sequence on the single-stranded DNA (annealing). sequence) permits selection by the growth
Next, in the presence of the four deoxynucleoside triphosphates dATP, dCTP, of transformed host cells on the antibiotic
dGTP, and dTTP, a purified DNA polymerase initiates the synthesis of new DNA G418. Components for propagation in E. coli
strands that are complementary to the individual DNA strands of the target DNA include a permissive origin of replication
segment. from plasmid ColE1 (pUC ori) and an
The orientation of the primers is deliberately chosen so that the direction of ampicillin resistance gene (AmpR) plus an
synthesis of new DNA strands is toward the other primer-binding site. As a result, origin of replication from phage f1 (f1 ori),
the newly synthesized strands can, in turn, serve as templates for new DNA syn- which provides an option for producing
thesis, causing a chain reaction with an exponential increase in product (Figure single-stranded recombinant DNA.
6.16). The three successive reactions of denaturation, primer annealing, and
DNA synthesis are repeated up to about 30–40 times in a standard PCR reaction.
The DNA synthesis step of PCR typically occurs at about 70–75°C, but the DNA
polymerase used also needs to withstand the much higher temperature of the
denaturation step. The requirement for a heat-stable enzyme drove researchers
to isolate DNA polymerases from microorganisms whose natural habitat is hot
springs. An early and widely used example is Taq polymerase from Thermus
aquaticus. However, Taq polymerase lacks an associated 3¢Æ5¢ exonuclease activ-
ity to provide a proofreading activity. All polymerases make occasional errors by
inserting the wrong base, but such copying errors can be rectified by a proofread-
ing 3¢Æ5¢ exonuclease activity. As a result, other thermophilic bacteria were
sought as sources of heat-stable polymerases with a 3¢Æ5¢ proofreading activity,
such as Pfu polymerase from Pyrococcus furiosus.
184 Chapter 6: Amplifying DNA: Cell-based DNA Cloning and PCR
second-cycle products
third-cycle products
30th cycle
its binding site. A useful measure of the stability of any nucleic acid duplex is the
melting temperature (Tm ), the temperature corresponding to the mid-point in
observed transitions from the double-stranded form to the single-stranded form.
To ensure a high degree of specificity in primer binding, the primer annealing
temperature is typically set to be about 5°C below the calculated melting tem-
perature; if the annealing temperature were to be much lower, primers would
tend to bind to other regions in the DNA that had partly complementary
sequences.
The melting temperature, and so the temperature for primer annealing,
depends on base composition. This is because GC base pairs have three hydro-
gen bonds and AT base pairs only two. Strands with a high percentage of GC base
pairs are therefore more difficult to separate than those with a low composition
of GC base pairs. Optimal priming uses primers with a GC content between 40%
and 60% and with an even distribution of all four nucleotides. Additionally, the
calculated Tm values for two primers used together should not differ by greater
than 5°C, and the Tm of the amplified target DNA should not differ from those of
the primers by greater than 10°C.
Primer design is aided by various commercial software programs and certain
freeware programs that are accessible through the web at compilation sites such
as http://www.humgen.nl/primer_ design.html. As we will see below, it is par-
ticularly important that the 3¢ ends of primers are perfectly matched to their
intended sequences. Various modifications are often used to reduce the chances
of nonspecific primer binding, including hot-start PCR, touch-down PCR, and
the use of nested primers (Box 6.2).
X Y
a suitable restriction site at its 5¢ end. The nucleotide extension does not base-
pair to target DNA during amplification, but afterward the amplified product can
be digested with the appropriate restriction enzyme to generate overhanging
ends for cloning into a suitable vector.
Allele-specific PCR
Many applications require the ability to distinguish between two alleles that may
differ by only a single nucleotide. Diagnostic applications include the ability to
distinguish between a disease-causing point mutation and the normal allele.
Allele-specific PCR takes advantage of the crucial dependence of correct base
pairing at the extreme 3¢ end of bound primers. In the popular ARMS (amplifica-
tion refractory mutation system) method, allele-specific primers are designed
with their 3¢ end nucleotides designed to base-pair with the variable nucleotide
that distinguishes alleles. Under suitable experimental conditions amplification
will not take place when the 3¢ end nucleotide is not perfectly base paired, thereby
distinguishing the two alleles (Figure 6.17).
Multiple target amplification and whole genome PCR methods
Standard PCR is based on the need for very specific primer binding, to allow the
selective amplification of a desired known target sequence. However, PCR can
also be deliberately designed to simultaneously amplify multiple different target
DNA sequences. Sometimes there may be a need to amplify multiple members of
a family of sequences that have some common sequence characteristic. Some
highly repetitive elements such as the human Alu sequence occur at such a high
frequency that it is possible to design primers that will bind to multiple Alu
sequences. If two neighboring Alu sequences are in opposite orientations, a sin-
gle type of Alu-specific primer can bind to both sequences, enabling amplifica-
tion of the sequence between the two Alu repeats (Alu-PCR).
For some purposes, all sequences in a starting DNA population need to be
amplified. This approach can be used to replenish a precious source of DNA that
is present in very limiting quantities (whole genome amplification). As we will see
in a later chapter, it has also been used for indiscriminate amplification to pro-
duce templates for DNA sequencing. One way of performing multiple target
amplification is to attach a common double-stranded oligonucleotide linker
covalently to the extremities of all of the DNA sequences in the starting
(A)
allele 1 5¢ A 3¢
allele 2 3¢ C 5¢
A
allele 1 specific conserved primer
primer (ASP1)
C
allele 2 specific
primer (ASP2) Figure 6.17 Allele-specific PCR is
dependent on perfect base-pairing of the
(B) allele 1 3¢ end nucleotide of primers. (A) Alleles
A 1 and 2 differ by a single base (AÆC). Two
allele-specific oligonucleotide primers (ASP1
A and ASP2) are designed that are identical
either
to the sequence of the two alleles over a
T
region preceding the position of the variant
C no amplification nucleotide, but which differ and terminate
in the variant nucleotide. ASP1 and ASP2 are
or T
used as alternative primers in PCR reactions
with another primer that is designed to bind
to a conserved region on the opposite DNA
allele 2 strand for both alleles. (B) ASP1 will bind
perfectly to the complementary strand of the
C
allele 1 sequence, permitting amplification
with the conserved primer. However, the
C
3¢-terminal C of the ASP2 primer mismatches
either G with the T of the allele 1 sequence, making
amplification impossible. Similarly, ASP2, but
A no amplification
not ASP1, can bind perfectly to allele 2 and
or G initiate amplification.
188 Chapter 6: Amplifying DNA: Cell-based DNA Cloning and PCR
digest target DNA with restriction endonuclease, synthesize two oligonucleotides so that th Figure 6.18 Using linker oligonucleotides
e.g. MboI (which gives 5¢ GATC overhangs) have complementary sequences except th to amplify multiple target DNAs
each has the same palindromic sequenc
at the 5¢ end, e.g. GATC
simultaneously. The DNA to be amplified
is digested with a restriction endonuclease
5¢ GATC 3¢ 5¢ GATC 3¢ that produces overhanging ends. The linker
3¢ CTAG 5¢ is prepared by individually synthesizing
3¢ CTAG 5¢
GATC 3¢
two oligodeoxyribonucleotides that are
CTAG allow to base pair complementary in sequence except that
when allowed to combine they form a
GATC 3¢ GATC double-stranded DNA sequence with
CTAG CTAG
the same type of overhanging ends as
heterogeneous target DNA fragments linker oligonucleotides the restriction fragments. Ligation of the
linker oligonucleotides to the target DNA
restriction fragments can result in fragments
ligate being flanked on each side by a linker.
Primers that are specific for the linkers can
GATC GATC GATC then permit amplification of target DNA
CTAG CTAG CTAG molecules with linkers at both ends (shown
GATC GATC GATC in the case of only one of the different
CTAG CTAG CTAG fragments in this example). As a result,
numerous DNA sequences in a starting
GATC GATC GATC population can be amplified simultaneously.
CTAG CTAG CTAG
targets flanked by linker oligonucleotides
population and then use linker-specific primers to amplify all the DNA sequences
(Figure 6.18). An alternative to using linker oligonucleotides is to use compre-
hensively degenerate oligonucleotides as primers. That is, when oligonucleotides
are synthesized one nucleotide at a time, all four nucleotides can be inserted at
specific positions instead of a single one, to give a parallel set of many different
oligonucleotides. As a result, very large numbers of different oligonucleotides are
synthesized and can bind to numerous different sequences in the genome, ena-
bling PCR amplification of much of the genomic DNA. Note, however, that whole
genome amplification is often now not performed by PCR but by an alternative
form of in vitro DNA cloning (see Box 6.2).
PCR mutagenesis
PCR can be used to engineer various types of predetermined base substitutions,
deletions, and insertions into a target DNA. In 5¢ add-on mutagenesis a desired
sequence or chemical group is added in much the same way as can be achieved
by ligating an oligonucleotide linker. A mutagenic primer is designed that is com-
plementary to the target sequence at its 3¢ end, and the 5¢ end contains a desired
novel sequence or a sequence with an attached chemical group. The extra 5¢
sequence does not participate in the first annealing step of the PCR reaction but
subsequently becomes incorporated into the amplified product, thereby gener-
ating a recombinant product (Figure 6.19A). The additional 5¢ sequence may
contain one or more of the following: a suitable restriction site that may facilitate
subsequent cell-based DNA cloning; a modified nucleotide containing a reporter
group or labeled group such as a biotinylated nucleotide or fluorophore; or a
phage promoter to drive gene expression.
In mismatched primer mutagenesis, the primer is designed to be only partly
complementary to the target site, but in such a way that it will still bind specifi-
cally to the target. Inevitably this means that the mutation is introduced close to
the extreme end of the PCR product. This approach may be exploited to intro-
duce an artificial diagnostic restriction enzyme site that permits screening for a
FURTHER READING 189
(A) (B)
1M Figure 6.19 PCR mutagenesis. (A) 5¢ add-
2 on mutagenesis. Primers can be modified
5¢ 3¢ 5¢ 3¢ at the 5¢ end to introduce a desired novel
3¢ 5¢
3¢ 5¢ 1 sequence or chemical group (red bars) that
2M does not take part in the initial annealing
PCR reaction 1 PCR reaction 2
with primers 1 with primers 2 to the target DNA but will be copied in
and 1M and 2M subsequent cycles. The products of the
second and final PCR cycles are shown
second-cycle products product 1 in this example. (B) Mismatched primer
mutagenesis. A specific predetermined
product 2 mutation located in a central segment can be
introduced by two separate PCR reactions,
combine two sets of 1 and 2, each amplifying overlapping
PCR products, denature segments of DNA. Complementary mutant
and re-anneal primers, 1M and 2M, introduce deliberate
base mismatching at the site of the mutation.
After the two PCR products are combined,
final PCR products + denatured, and allowed to re-anneal, each of
the two product 1 strands can base-pair with
complementary strands from product 2 to
3¢ extension
form heteroduplexes with recessed
3¢ ends. DNA polymerase can extend the
3¢ ends to form full-length products with the
introduced mutation in a central segment,
and they can be amplified by using the outer
PCR using primers
1 and 2 primers 1 and 2 only.
known mutation. Mutations can also be introduced at any point within a chosen
sequence by using mismatched primers. Two mutagenic reactions are designed
in which the two separate PCR products have partly overlapping sequences con-
taining the mutation. The denatured products are combined to generate a larger
product with the mutation in a more central location (Figure 6.19B).
Real-time PCR (qPCR)
Real-time PCR (or qPCR, an abbreviation of quantitative PCR) is a widely used
method to quantitate DNA or RNA. In the latter case, RNA is copied by using
reverse transcriptase to make complementary DNA strands. Standard PCR reac-
tions, in which the amplification products are analyzed at the end of the proce-
dure, are not well suited to quantitation. Samples are removed from the reaction
tubes and are usually size-fractionated on agarose gels and detected by binding
ethidium bromide, which fluoresces under ultraviolet radiation. This procedure
is both time-consuming and not very sensitive. In real-time PCR (qPCR), how-
ever, quantitation is done automatically, while the reaction is actually occurring,
in specially designed PCR equipment. The reaction products are analyzed at an
early stage in the amplification process when the reaction is still exponential,
allowing more precise quantitation than at the end of the reaction. There are
many applications, both in absolute quantitation of a nucleic acid and in com-
parative quantitation. One of the most important is in tracking gene expression,
and we will describe in detail in Chapter 8 how qPCR works and is used in gene
expression profiling.
FURTHER READING
General cell-based DNA cloning Sambrook J & Russell DW (2001) Molecular Cloning. A Laboratory
Primrose SB, Twyman RM, Old RW & Bertola G (2006) Principles Manual. Cold Spring Harbor Laboratory Press.
Of Gene Manipulation And Genomics, 7th ed. Blackwell Vector databases: for compilation of websites see http://mybio
Publishing Professional. .wikia.com/wiki/Vector_databases
Roberts RJ (2009) Official REBASE Homepage. http://rebase Watson JD, Caudy AA, Myers RM & Witkowski JA (2007)
.neb.com/rebase/rebase.html [database of restriction Recombinant DNA: Genes And Genomes, 3rd ed. Freeman.
endonucleases.]
190 Chapter 6: Amplifying DNA: Cell-based DNA Cloning and PCR
PCR
Kubista M, Andrade JM, Bengtsson M et al. (2006) The real-time
polymerase chain reaction. Mol. Aspects Med. 27, 95–125.
McPherson MJ & Moller SG (2006) PCR: The Basics. Garland
Scientific Press.
National Center for Biotechnology Information (2008) Primer-
BLAST: primer designing tool. http://www.ncbi.nlm.nih
.gov/tools/primer-blast/ [to find primers specific for a given
sequence.]
DNA mutagenesis
Ling MM & Robinson BH (1997) Approaches to DNA mutagenesis:
an overview. Anal. Biochem. 254, 157–178.
Chapter 7
DNA molecules are very large and break easily when isolated from cells, making
them difficult to study. Chapter 6 looked at the way in which DNA cloning, either
in cells or in vitro, allows individual DNA molecules to be selectively replicated to
very high copy numbers. When amplified in this way, the DNA is effectively puri-
fied, and DNA cloning enables individual DNA sequences in genomic DNA to be
studied, and also individual RNA molecules after they have been copied to give a
cDNA. In this chapter, we consider an altogether different approach. Instead of
trying to purify individual nucleic acid sequences, the object is to track them spe-
cifically within a complex population that represents a sample of biological or
medical interest.
At the basic research level, nucleic acid hybridization is often used to track
RNA transcripts and so obtain information on how genes are expressed. It is also
used to identify relationships between DNA sequences from different sources.
For example, a starting DNA sequence can be used to identify other closely related
sequences from different organisms or from different individuals within the same
species, or even related sequences from the same genome as the starting
sequence. Nucleic acid hybridization is also often used as a way of identifying
disease alleles and aberrant transcripts associated with disease.
anneal
probe–target re-annealed
re-annealed sample nucleic acids heteroduplexes probe
Oligo–DNA or oligo–RNAd
%GC from 30% to 75% GC. cL, length of duplex in base pairs. dOligo, oligonucleotide; ln, effective
length of primer = 2 ¥ (no. of G + C) + (no. of A + T). For each 1% formamide, Tm is decreased by
about 0.6°C, and the presence of 6 M urea decreases Tm by about 30°C.
length of the duplex. There are more hydrogen bonds in longer nucleic acid mol-
ecules and so more energy is required to break them. The increase in duplex sta-
bility is not linearly proportional to length, and the effect of changing the length
is particularly noticeable at shorter length ranges. As will be described in the next
section, base mismatching reduces duplex stability.
Base composition and the chemical environment are also important factors
in duplex stability. A high percentage of GC base pairs means greater difficulty in
separating the strands of a duplex because GC base pairs have three hydrogen
bonds and AT base pairs have only two. The presence of monovalent cations (e.g.
Na+) stabilizes the hydrogen bonds in double-stranded molecules, whereas
strongly polar molecules (such as formamide and urea) disrupt hydrogen bonds
and thus act as chemical denaturants.
A progressive increase in temperature also makes hydrogen bonds unstable,
eventually disrupting them. The temperature corresponding to the mid-point in
observed transitions from double-stranded form to single-stranded form is
known as the melting temperature (Tm). For mammalian genomes, with a base
composition of about 40% GC, the DNA denatures with a Tm of about 87°C in
buffers whose pH and salt concentration approximate physiological conditions.
The Tm of perfect hybrids formed by DNA, RNA, or oligonucleotide probes can be
determined by using standard formulae (Table 7.1).
(A) nucleic acid probe (B) oligonucleotide probe Figure 7.3 Exploiting probe–target
mismatches during nucleic acid
probe hybridization. (A) Longer nucleic acid
perfect probes (more than 100 nucleotides long)
match will form stable heteroduplexes with target
with target
target stable stable sequences that are similar but not identical
to the probe, if hybridization conditions
are reduced in stringency. This allows
single
cross-species analyses and identification
mismatch of distantly related members of a gene
stable unstable at high
family. (B) Short oligonucleotide probes
hybridization stringency are less tolerant of sequence mismatches.
By using a high-stringency hybridization, it
is possible to select for perfectly matched
significant
mismatching duplexes only, and so distinguish between
allelic sequences that differ by just a single
stable at reduced hybridization
stringency; unstable at high nucleotide.
stringency
long (more than 100 bp). Conversely, if the region of base complementarity is
short, as when oligonucleotides are involved (typically 15–20 nucleotides),
hybridization conditions can be chosen such that a single mismatch renders a
heteroduplex unstable (Figure 7.3).
length of the newly synthesized strand. Typically a DNA or RNA polymerase uses
a suitable cloned DNA to make DNA or RNA copies that are labeled by using a
labeling mix in which at least one of the four dNTPs or rNTPs is labeled.
Labeling DNA by nick translation
In nick translation, single-strand breaks (nicks) are introduced in the DNA to
leave exposed 3¢ hydroxyl termini and 5¢ phosphate termini. The nicking can be
achieved by adding a suitable endonuclease such as pancreatic deoxyribonu-
clease I (DNase I). The exposed nick serves as a starting point for introducing new
nucleotides with E. coli DNA polymerase I, a multisubunit enzyme that has both
DNA polymerase and 5¢Æ3¢ exonuclease activities. At the same time as the DNA
polymerase activity adds new nucleotides at the 3¢ hydroxyl side of the nick, the
5¢Æ3¢ exonuclease activity removes existing nucleotides from the other side of
the nick. Thus, the nick will be moved progressively along the DNA in the 5¢Æ3¢
direction (Figure 7.4). If the reaction is conducted at a relatively low temperature
(about 15°C), the reaction proceeds no further than one complete renewal of the
existing nucleotide sequence. Although there is no net DNA synthesis at these
temperatures, the synthesis reaction allows the incorporation of labeled nucle-
otides in place of a previously existing unlabeled nucleotide.
Random primed DNA labeling
The random primed DNA labeling method uses a mixture of many different
hexanucleotides to bind randomly to complementary hexanucleotide sequences
in a DNA template and prime new DNA strand synthesis (Figure 7.5). Synthesis
of new complementary DNA strands is catalyzed by the Klenow subunit of
E. coli DNA polymerase I (containing the polymerase activity but not the associ-
ated 5¢Æ3¢ exonuclease activity).
PCR-based strand synthesis labeling
The standard PCR reaction can be modified to include one or more labeled nucle-
otide precursors that become incorporated into the PCR product throughout its
length.
RNA labeling
RNA probes (riboprobes) can be obtained by in vitro transcription of DNA cloned
in a suitable plasmid expression vector. Expression vectors contain a promoter
sequence immediately adjacent to the insertion site for foreign DNA, allowing
any inserted DNA sequence to be transcribed. Strong phage promoters are com-
monly used, such as those from phages SP6, T3, and T7, and the corresponding
phage polymerase is provided to ensure specific transcription of the cloned DNA.
The inclusion of labeled ribonucleotides ensures that the newly synthesized RNA
is labeled (Figure 7.6).
LABELING OF NUCLEIC ACIDS AND OLIGONUCLEOTIDES 199
5¢
5¢ 5¢
MCS Pvull
SP6 promoter
ori
pSP64
vector
Figure 7.6 RNA probes are usually made
by transcribing cloned DNA inserts with
AmpR a phage RNA polymerase. The plasmid
vector pSP64 contains a promoter sequence
cut with Pvull
for phage SP6 RNA polymerase linked to the
multiple cloning site (MCS) together with
an origin of replication (ori) and ampicillin
resistance gene (AmpR). A suitable DNA
fragment is cloned into the MCS and then
the purified recombinant DNA is linearized
SP6 RNA polymerase
ATP, GTP, UTP, CTP by cutting with the restriction enzyme PvuII.
SP6 RNA polymerase and a mixture of NTPs,
at least one of which is labeled, are added,
initiating transcription from a specific site
within the SP6 promoter and continuing
through the inserted DNA. Multiple labeled
RNA transcripts of the inserted DNA are
produced. Many similar expression vectors
multiple labeled
RNA transcripts offer two different phage promoters flanking
the multiple cloning system, such as those
for phages T3 and T7, and the respective
phage polymerases are used for generating
riboprobes from either of the two DNA
strands.
200 Chapter 7: Nucleic Acid Hybridization: Principles and Applications
TABLE 1 CHARACTERISTICS OF RADIOISOTOPES COMMONLY USED FOR LABELING DNA AND RNA PROBES
Isotope Half-life Decay type Emission energy (MeV) Exposure time Suitability for high-
resolution studies
3H 12.4 years b– 0.019 very long excellent
32P 14.3 days b– 1.710 short poor
33P 25.5 days b– 0.248 intermediate intermediate
35S 87.4 days b– 0.167 intermediate intermediate
DAPI 358 461 slightly longer wavelength, the emission wavelength. The light emitted
by the fluorophore passes back up and straight through the dichroic
Green mirror, through an appropriate barrier filter and into the microscope
eyepiece. A second beam-splitting device can also permit the light to
FITC 492 520 be recorded in a CCD (charge-coupled device) camera.
Fluorescein (see Figure 1) 494 523
Red
objective
Detection of fluorophore-labeled nucleic acids labeled sample
Fluorophore-labeled nucleic acids can be detected with laser scanners on slide
or by fluorescence microscopy. The fluorophores are detected by
passing a beam of light from a suitable light source (an argon laser is Figure 2 Fluorescence microscopy. The excitation filter allows light
used in automated DNA sequencing, a mercury vapor lamp is used of only an appropriate wavelength (blue in this example) to pass
in fluorescence microscopy) through an appropriate color filter. The through. The transmitted blue light is reflected by the dichroic (beam-
filter is designed to transmit light at the desired excitation wavelength. splitting) mirror onto the labeled sample, which then fluoresces
In fluorescence microscopy systems, this light is reflected onto the and emits light of a longer wavelength (green light in this case).
fluorescently labeled sample on a microscope slide by using a dichroic The emitted green light passes straight through the dichroic mirror
mirror that reflects light of certain wavelengths while allowing light of and then through a second barrier filter, which blocks unwanted
other wavelengths to pass straight through (Figure 2). The light then fluorescent signals, leaving the desired green fluorescence emission
excites the fluorophore to fluoresce; as it does so, it emits light at a to pass through to the eyepiece of the microscope.
There are two widely used indirect label detection systems. The biotin–strep-
tavidin system depends on the extremely high affinity between two naturally
occurring ligands. Biotin (a vitamin) acts as the reporter, and is specifically bound
by the bacterial protein streptavidin with an affinity constant (also known as the
dissociation constant) of 10–14, one of the strongest known in biology. Biotinylated
202 Chapter 7: Nucleic Acid Hybridization: Principles and Applications
M M M M
A A A A
R R R R
detection of marker
O
biotin-16-dUTP
C
C16 spacer
HN NH
O O O O HC CH
C CH CH CH2 NH C (CH2)3 NH C (CH2)3 NH C (CH2)4 CH CH
N C S
C CH biotin
O N
P P P CH2 O
C H H
C C
O
OH H
O
OH
CH3
digoxigenin-11-dUTP CH3 H
C11 spacer
OH
O O O
O
C CH CH CH2 NH C (CH2)5 NH C CH2 H
N C
digoxigenin
C CH
O N
P P P CH2 O
C H H
C C
OH H
Figure 7.8 Structure of biotin- and digoxigenin-modified nucleotides. The biotin and digoxigenin reporter groups shown here are linked to the
5¢ carbon atom of the uridine of dUTP by spacer groups consisting, respectively, of a total of 16 carbon atoms (biotin-16-dUTP) or 11 carbon atoms
(digoxigenin-11-dUTP). The spacer groups are needed to ensure physical separation of the reporter group from the nucleic acid backbone, so that the
reporter group protrudes sufficiently far to allow the affinity molecule to bind to it.
HYBRIDIZATION TO IMMOBILIZED TARGET NUCLEIC ACIDS 203
The phosphodiester backbone of nucleic acids means that they Very large DNA fragments are very poorly separated by
carry numerous negatively charged phosphate groups, so that when conventional agarose gel electrophoresis. Instead, specialized
present in an electric field they will migrate in the direction of the equipment is used in which a discontinuous electric field is used.
positive electrode. By migrating through a porous gel, nucleic acid During electrophoresis the direction or polarity of the electric field
molecules can be separated according to size. This happens because is intermittently changed. The electric field may be pulsed so that a
the porous gel acts as a sieve, and small nucleic acid molecules pass particular negative–positive orientation is established for a short time
easily through the pores of the gel but larger nucleic acid molecules before being reversed for another short interval and the switching of
are impeded by frictional forces. the field is performed recurrently. In pulsed-field gel electrophoresis,
Traditionally, quite small to moderately large nucleic acid the intermittent switching of the electric field forces the DNA to
molecules have been separated by using agarose slab gels with reorient to be able to migrate in the new electric field; the time
individual samples loaded into cut-out wells so that different lanes are taken to reorient depends on the length of the molecule. As a result,
used by the migrating samples (Figure 1). After electrophoresis, the it is possible to separate fragments of DNA by size up to several
gels are stained with chemicals, such as ethidium bromide or SYBR megabases in length.
green, that bind to nucleic acids and that can fluoresce when exposed
to ultraviolet radiation.
samples loaded into wells
For fragments between 0.1 kb and 30 kb long, the migration
speed depends on fragment length, but scarcely at all on the base
composition. However, conventional agarose gel electrophoresis is
not well suited to separating rather small or very large nucleic acid slab gel
small fragments
molecules. The percentage of agarose can be varied to increase
migrate faster
resolution—decreasing the percentage helps to separate larger
fragments, and increasing it makes it easier to separate smaller
fragments. To get superior resolution of smaller nucleic acids, however,
polyacrylamide gels are normally used, and they are used in DNA
sequencing to separate fragments that differ in length by just a single
nucleotide. Figure 1 Gel electrophoresis.
HYBRIDIZATION TO IMMOBILIZED TARGET NUCLEIC ACIDS 205
wash to remove
lm
as
ta
unhybridized probe
leta
cre
cen
ney
ti s
g
r
hea
pan
ske
live
bra
tes
pla
lun
kid
4.4
Figure 7.11 Northern blot hybridization. Northern blotting uses samples of total RNA or purified
poly(A)+ mRNA from tissues or cells of interest. The RNA is size-fractionated by electrophoresis,
transferred to a membrane, and hybridized with a suitable labeled nucleic acid probe. In the example
shown here, the probe was a labeled cDNA from the FMR1 (fragile X mental retardation syndrome)
gene. The results show comparative expression of FMR1 in different tissues: strongest expression was
found in the brain and testis (4.4 kb), with decreasing gene expression in the placenta, lung, and kidney,
respectively, and almost undetectable expression in liver, skeletal muscle, and pancreas. Although no
4.4 kb transcript was evident in heart tissue, smaller transcripts (about 1.4 kb) were evident that could
have been generated by alternative splicing or aberrant transcription. [From Hinds HL, Ashley CT,
Sutcliffe JS et al. (1993) Nat. Genet. 3, 36–43. With permission from Macmillan Publishers Ltd.]
206 Chapter 7: Nucleic Acid Hybridization: Principles and Applications
zona limitans
After hybridization, the probe solution is removed, and the filter is washed
extensively, dried, and processed to detect the bound labeled probe. The position
of strong signals from bound probe is related back to a master plate containing
the original pattern of colonies, to identify colonies containing DNA related to
the probe. These can then be individually picked and grown in liquid culture to
obtain sufficient quantities for extraction and purification of the recombinant
DNA.
Once it became possible to create complex DNA libraries, more efficient auto-
mated methods of clone screening were required. Now robotic gridding devices
can be used to pipette from clones arranged in microtiter dishes into predeter-
mined linear coordinates on a membrane. The resulting high-density clone fil-
ters can permit rapid and efficient library screening, as described in Chapter 8.
laser
scanning
laser
scanning
weak
hybridization
signal
strong
hybridization
signal
spot on the array is analyzed with digital imaging software that converts the sig-
nal into one of a palette of colors according to its intensity.
We will consider practical examples of microarray use in subsequent chap-
ters, such as in Chapter 8 where we consider how gene expression is analyzed.
Here we are concerned with the basic principles and technologies. The object of
microarray hybridization is for each of the thousands of different probes to form
heteroduplexes with target sequences in the labeled nucleic acids of the test sam-
ple. At any one position on the array numerous identical copies of just one type of
oligonucleotide or DNA will be available for hybridization (Figure 7.14).
Target sequences will be present at different amounts in the sample
under study. The amount of specific target DNA bound by each probe will depend
on the frequency of that specific target DNA in the test sample. A probe that binds
a relatively rare target nucleic acid will have less fluorescent label bound to it,
resulting in a weak hybridization signal, whereas positions on the array that are
strongly fluorescent after washing indicate probes that have bound an abundant
target nucleic acid. Quantitating the amount of fluorescent label bound across all
the positions in a microarray gives a huge data set that can reflect the relative
frequencies of numerous target sequences in the sample population.
(A)
photolithograph
(mask)
quartz wafer
microarray
(B)
photolabile group
O O O O OH O A
O O OH OH O A A A O
STEP 1-A
linker deprotection nucleotide
addition
O O A OH OH A C C A
A A A A C A A
STEP 1-C
deprotection nucleotide
addition
C T G
C G
synthesis
A A T
G C A T
T C G
C C A
A A
STEP 25-T
(end) deprotection nucleotide
addition
Figure 7.15 Construction of oligonucleotide microarrays by using light-directed oligonucleotide synthesis in situ.
(A) Affymetrix oligonucleotide microarrays are constructed by synthesizing thousands of oligonucleotides at predetermined
positions on a quartz wafer microarray, using a photolithograph (mask) to determine the position of each added nucleotide.
(B) Each oligonucleotide is assembled starting from a linker molecule that is fixed to the array and whose free end carries a
protective photolabile group. A mask with holes cut into it at specific positions is placed above the array and light from an external
source is shone through the holes. At the positions exposed to light the photolabile groups are destroyed (deprotection). A specific
mononucleotide coupled to a photolabile group is added to the array and will combine with deprotected strands. The process
is repeated three times with another three masks with holes cut in different positions, introducing in turn one of the other three
nucleotides. The net result is that each linker molecule will have a single nucleotide attached, either A, C, G, or T, in a specific
pre-programmed way. The sequential use of four different masks to allow coupling of the four different nucleotides is repeated for a
further 24 cycles. The end result is an assembly of oligonucleotides 25 nucleotides long, each with a precisely determined sequence
at a specific coordinate in the microarray.
markers. As we will describe later, large-scale DNA variation analyses often use
microarrays that have bacterial artificial chromosome DNA clones as probes that
seek to identify large subchromosomal deletions and duplications.
In addition to microarray hybridization, it should be noted that microarrays
have many other applications. Applications include the use of protein arrays,
arrays that assay binding between nucleic acids and proteins, and many others.
FURTHER READING
General Affymetrix and Illumina oligonucleotide microarray
Sambrook J & Russell DW (2001) Molecular Cloning. A Laboratory technology
Manual. Cold Spring Harbor Laboratory Press. Affymetrix GeneChip™. http://www.affymetrix.com/technology/
index.affx
Fluorescence labeling Gunderson KL, Kruglyak S, Graige MS et al. (2004) Decoding
Invitrogen Molecular Probes—The Handbook: A Guide To randomly ordered DNA arrays. Genome Res. 14, 870–877.
Fluorescent Probes And Labeling Technologies. http://www Illumina BeadArray™. http://www.illumina.com/pages.ilmn?ID=5
.invitrogen.com/site/us/en/home/References/Molecular
-Probes-The-Handbook.html Microarray hybridization applications
Falciani F (ed.) (2007) Microarray Technology Through
In situ hybridization Applications. Taylor and Francis.
Wilkinson D (1998) In Situ Hybridization: A Practical Approach, Hoheisel JD (2006) Microarray technology: beyond transcript
2nd ed. IRL Press. profiling and genotype analysis. Nat. Rev. Genet. 7, 200–210.
Microarray hybridization principles and basic
technology
Various authors (1999) The Chipping Forecast. Nat. Genet.
21 (Suppl.), 1–60.
Various authors (2002) The Chipping Forecast II. Nat. Genet.
32 (Suppl.), 465–551.
This page is intentionally left blank.
Chapter 8
• DNA libraries are collections of cloned DNA fragments that collectively represent the
genome of an organism (genomic DNA libraries) or genome subfractions.
• cDNA libraries are derived from reverse-transcribed RNA and so contain only the expressed
part of the genome.
• DNA sequencing usually relies on synthesizing new DNA strands from single-stranded
templates. The most modern methods sequence huge numbers of DNA samples in parallel
without electrophoresis.
• Genome resequencing entails using an initial genome sequence as a reference for rapid
resequencing of the genomes or genome subfractions of other individuals from the same
species.
• DNA markers are DNA sequences that have a unique subchromosomal location and can be
conveniently assayed. Marker maps provided important framework maps for the
sequencing of complex genomes that have numerous repetitive DNA elements.
• Polymorphic DNA markers are needed to make genetic maps. The different markers are
genotyped in individuals spanning different generations of a common pedigree.
• Both non-polymorphic and polymorphic DNA markers can be used to make physical maps
of chromosomes, often by genotyping panels of cells that have different fragments of
individual chromosomes.
• A clone contig is a series of cloned genomic DNA fragments, arranged in the same linear
order as the subchromosomal region from which they were derived. Overlaps between each
DNA fragment allow them to be placed in order, providing important framework maps for
genome sequencing.
• Computer-based gene prediction often relies on identifying significant sequence similarity
between a test DNA sequence (or derived protein sequence) and a known gene sequence or
protein.
• The terms transcriptome and proteome describe, respectively, the complete set of RNA
transcripts and the complete set of proteins produced by a cell. Whereas the different
nucleated cells of an organism have stable, almost identical genomes, transcriptomes and
proteomes are dynamic and each can vary very significantly from cell to cell.
• High-resolution expression analyses are designed to obtain detailed gene expression
patterns in cells or tissues but are limited to analyzing one or a few genes at a time.
• Highly parallel expression analyses provide gene expression information for many
thousands of genes at a time in two or more cellular sources.
214 Chapter 8: Analyzing the Structure and Expression of Genes and Genomes
The advent of DNA cloning and DNA sequencing technologies in the 1970s ush-
ered in a new era in which it became possible to characterize genes in a system-
atic way. By sequencing a gene and its transcript(s) it became possible to work
out exon–intron organization and to predict the sequence of any likely protein
products.
Initial progress was slow but quickly picked up as technologies improved. By
the early 1980s, researchers began to make plans for sequencing all the different
DNA molecules in a cell (collectively known as a genome), and by the start of the
1990s, internationally coordinated projects had been launched to determine the
complete sequence of the human genome and that of several model organisms.
The genome sequences paved the way for comprehensive analyses of gene struc-
ture and gene expression.
During the amplification stage, different colonies may grow at different rates,
however. For example, the growth of particular cells may be retarded because
they contain a recombinant DNA that affects their metabolism in some way.
Amplification may therefore result in a distortion of the original representation
of cell clones in the unamplified library.
length of
(B) A reaction (terminates with ddATP) synthesized (C) A C G T
DNA
n
AG n + 2
AGTAG n + 5
3¢ 5¢
AGTCCGGTAGTAG n + 13
n + 16 n + 18
ACGAGTCCGGTAGTAG
n + 17
n+6
were developed to sequence single DNA molecules that had not been amplified 1) + dTTP degraded
in any way, and the first commercial machine for this purpose became available 2) + dATP degraded
in 2008. 3) + dCTP degraded
4) + dGTP PP i LIGHT
Massively parallel DNA sequencing is transforming molecular genetics. Such
methods are being extensively used in rapid genome resequencing for various dGMP
purposes, including personal genome sequencing and the identification of
mutations. In addition, they are providing comprehensive analyses of the
transcriptome, the collection of different RNA molecules in cells. Furthermore, CAG
as we will see in Chapter 12, they are providing rapid endpoints for various analy-
ses that seek to identify DNA–protein interactions and binding sites for regula- GTCA
different types of oligonucleotide adaptor are ligated to the ends of the DNA frag-
ments to provide universal priming sequences for amplification. After denatura- (B)
tion, those single-stranded fragments that have the different adaptors ligated at sulfurylase luciferase
the different ends are selected. The next step is to attach the DNA molecules to PPi ATP LIGHT
beads (Figure 8.9A) and takes advantage of biotin–streptavidin binding. As
described in Section 7.2, the bacterial protein streptavidin has an extraordinarily 2–
APS SO4 luciferin oxyluciferin
high binding affinity for the vitamin biotin, and so by designing one of the adap-
tors to contain a biotin tag, the DNA molecules will bind to beads coated with Figure 8.8 The principle underlying
biotin’s binding partner, streptavidin. pyrosequencing. (A) Base insertion in
Single-stranded DNA templates are immobilized on beads, and the beads are pyrosequencing. In pyrosequencing, a
separated from each other by creating an oil–water emulsion, with each droplet DNA polymerase synthesizes a DNA chain
containing a single bead and the reagents needed for PCR. The droplets are by using a single-stranded DNA template
known as microreactors (Figure 8.9B). After PCR amplification there are 10 mil- and the four normal dNTPs. Instead of
lion copies of one DNA fragment immobilized on one bead. The emulsion is then having a mixture of the four dNTPs, the
individual dNTPs are provided sequentially.
broken and the beads are deposited into picoliter wells on a slide (one bead per
When the correct dNTP is provided, the
well) that are then layered with smaller beads that have ATP sulfurylase and incorporation of the dNMP nucleotide is
luciferase attached to their surfaces (see Figure 8.9B). A fixed sequence of the tracked by the simultaneous production
dNTP precursors (T, then A, then C, then G) is washed over the beads and chemi- of a pyrophosphate (PPi) group that is
luminescent light is emitted each time a nucleotide is incorporated; this is used to produce light. If an incorrect dNTP
recorded for the individual positions. is provided it is degraded by the enzyme
The massively parallel pyrosequencing method developed by 454 Life Sciences apyrase. In this example, the first dNMP
(and marketed by Roche) produces reasonably long individual DNA sequence to be incorporated is G, as indicated by
readouts, but many hundreds of thousands of sequences can be read in parallel. the production of light in the G reaction
As a result, the throughput is close to ten thousand times greater than that of dide- only, to be followed by T. (B) The insertion
of the correct base is monitored by light
oxysequencing (Table 8.1). The related Solexa sequencing technology was devel-
production in a two-step reaction. The
oped more recently by the Illumina company and has even greater sequencing released PPi is used by the enzyme ATP
throughput, although accompanied by smaller individual sequence readouts. sulfurylase to generate ATP, which in turn
An alternative technology, massively parallel sequencing-by-ligation has drives a luciferase reaction to produce light,
been developed by the Applied Biosystems company. Their SOLiD™ (Sequencing as detected by a charge-coupled device
by Oligonucleotide Ligation Detection) technology also uses an emulsion-based (CCD) camera.
PCR strategy to amplify individual DNA fragments. The amplification products
are then randomly deposited onto an array for probing with fluorescently labeled
oligonucleotides. The SOLiD method also has high sequencing throughput but
with comparatively short individual sequence readouts.
Single-molecule sequencing
A new wave of technologies, sometimes known as third-generation sequencing,
permit the sequencing of single DNA molecules that are not amplified in any way.
SEQUENCING DNA 221
(A) (B)
break
clonal microreactors,
amplification and enrich for
occurs inside DNA-positive
microreactors beads
load enzyme
beads and
centrifuge
Figure 8.9 Sample preparation for massively parallel pyrosequencing. (A) Genomic DNA is isolated, fragmented, ligated to
oligonucleotide adaptors, and separated into single strands. (B) Fragments are bound to beads under conditions that favor one
fragment per bead. Thereafter the beads are captured in the droplets of a microreactor (an emulsion containing a PCR reaction
mixture in oil). Clonal PCR amplification occurs within each droplet, resulting in beads each carrying 10 million copies of a unique
DNA template. The emulsion is broken to release the beads, and the DNA strands are denatured. Single beads carrying single-
stranded DNA clones are deposited into individual wells of a fiber-optic slide (PicoTiterPlate™) that contains 1.6 million wells.
Each well will contain a picoliter sequencing reaction. Smaller beads carrying immobilized enzymes required for pyrophosphate
sequencing (Figure 8.8B) are deposited into each well to enable the pyrosequencing reaction to take place. [Adapted from Margulies
M, Egholm M, Altman WE et al. (2005) Nature 437, 376–380. With permission from Macmillan Publishers Ltd.]
Sequencing-by-synthesis Roche GS >300 >0.45 Gb per run of Long read lengths but high reagent costs
of PCR-amplified DNA FLX sequencer 7 hours and limited output
Illumina Genome ~100 18 Gb for a run of 4 days Currently most widely used but high
Analyzer IIx reagents cost
2 ¥ ~100a 35 Gb for a run of 9 days
Sequencing-by-ligation Applied Biosystems ~50 Up to 30 Gb for a run of Short read lengths, long runs and high
of PCR- amplified DNA SOLiD 3 7 days reagent costs
Single molecule Helicos Biosciences ~30 ~40 Gb per run of 8 days Low reagent costs but short read lengths
sequencing HeliScope and comparatively high error rates
Pacific ~1000 Not currently available Not expected to be released until late
Biosciences 2010/2011. Great potential (see text)
aWhen using a mate-pair run that involves obtaining sequences from both ends of the DNA molecules.
222 Chapter 8: Analyzing the Structure and Expression of Genes and Genomes
fluorophore
group O
O O O
N g b a base Figure 8.10 A phospholinked fluorescent
N O P O P O P O labeled dNTP. The dashed arrow indicates
H cleavage that occurs when the nucleotide
H O– O– O– CH2 O is incorporated into DNA. A pyrophosphate
spacer
C H H C group containing the b- and g-phosphate
C C groups and an attached fluorophore is
H H
cleaved off from the dNMP residue that will
OH H be incorporated into DNA.
SEQUENCING DNA 223
(A)
SMRT
chip
aluminium
detection
system
SMRT chip glass emission
excitation
(B)
1 2 A 3 4 A
5
C T G T A
G
C C C C
time
Figure 8.11 Principles of SMRT (single-molecule real-time sequencing) technology. (A) The SMRT chip and ZMW nanophotonic
visualization chambers. The SMRT chip shown on the left is an aluminum film 100 nm thick deposited on a silicon dioxide substrate that
extends just over 40 mm ¥ 30 mm in size. The metal film contains thousands of perforations known as zero-mode waveguides (ZMWs)
that can act as tiny visualization chambers, in each of which a DNA polymerase molecule performs DNA sequencing with a single DNA
molecule template. The chip is mounted next to a fluorescence detection system. By viewing through the glass support it can visually
record real-time DNA sequencing reactions in each of the individual ZMWs (right). (B) The phospholinked dNTP incorporation cycle plus
the corresponding expected time trace of detected fluorescence intensity. SMRT is a sequencing-by-synthesis method that uses dNTPs
labeled with one of four different fluorophores, depending on which base is present. However, because the fluorophore is attached to
the g-phosphate of dNTPs (see Figure 8.10), it does not get incorporated into the growing DNA chain. Nevertheless, a DNA polymerase
molecule will hold an incoming dNTP steady at its active site for a brief time before the triphosphate group is cleaved and the dNMP is
inserted. During this time, the engaged fluorophore emits fluorescent light whose color corresponds to the base identity. The steps shown
are: (1) the phospholinked nucleotide begins to associate with the template in the active site of the polymerase, (2) causing an elevation of
the fluorescence output on the corresponding color channel. (3) Phosphodiester bond formation liberates the dye–linker–pyrophosphate
product, which diffuses out of the ZMW, thus ending the fluorescence pulse. (4) The polymerase translocates to the next position, and (5)
the next nucleotide to be inserted binds the active site beginning the subsequent pulse. [Adapted from Eid J, Fehr A, Gray J et al. (2009)
Science 323, 133–138. With permission from the American Association for the Advancement of Science.]
change in the current represents a direct reading of the DNA sequence. See http://
www.nanoporetech.com/sequences for a video explanation of one such method.
This is a fast moving area, but like SMRT has great potential for offering fast very
high throughput DNA sequencing at low cost.
(A) (B)
target
P
target region C
target
region B fragmentation linker
region A
and polishing ligation
denaturing and
hybridization
target fragment
amplification elution
array
washing
(D)
BOX 8.1 FRAMEWORK MAPS BASED ON DNA MARKERS AND CLONE CONTIGS
First time sequencing of complex genomes (which invariably have DNA markers may or may not be polymorphic (Table 1), but to
large amounts of repetitive DNA) is assisted by first constructing be useful for mapping purposes a marker needs to have a unique
framework maps for each chromosomal DNA molecule. The chromosomal location. Different methods have been used to map
framework maps best suited for DNA sequencing are based on a human DNA marker to a specific subchromosomal localization. A
clone contigs, linearly organized series of cloned overlapping DNA common approach is to type panels of artificially constructed hybrid
fragments that collectively represent chromosomal DNA sequences cells that contain a full set of rodent chromosomes plus one or more
(Figure 1). specific human chromosomes or multiple human chromosome
Assembling clone contigs depends on being able to identify fragments. The latter are generated by exposing human cells to
overlaps between cloned DNA fragments. There are various ways controlled radiation, causing chromosome breakage, and then fusing
in which this can be done but often clones are assayed for the the human cells with rodent cells so that variable sets of human
presence of certain DNA markers known to map in the approximate chromosome fragments can insert into the rodent chromosomes
subchromosomal region. A DNA marker is any short DNA sequence (radiation hybrids). Alternatively, DNA markers have been mapped on
that can be assayed in some way, either by a hybridization assay or, chromosomes by labeling clones known to contain the marker with
more conveniently, by a PCR assay. a fluorophore and hybridizing them to suitably treated fixed human
fragmentation (see Figure 8.1). Identical copies of the same chromosomal DNA
molecules will be cleaved at different locations on the DNA, and so any unique
short sequence will be represented by a series of overlapping DNA fragments of
different sizes produced by differential DNA cleavage. The different fragments
are then attached to identical vector molecules and randomly introduced into
cells.
Early attempts to assemble a series of human genomic DNA clones with over-
lapping inserts would proceed from specific starting clones. A hybridization
probe prepared from the insert DNA of a chosen genomic clone of interest would
be used to screen all the cells of the DNA library to identify those that contained
a strongly hybridizing DNA fragment, which could indicate an overlapping DNA
fragment. Newly identified clones could then be used to prepare new hybridiza-
tion probes to identify more distally overlapping clones. This procedure, known
as chromosome walking, was extremely laborious.
More efficient, genome-wide methods of identifying clones with overlapping
inserts were subsequently developed using clone fingerprinting. The idea was to
submit all clones in the library to some assay that could help identify clones with
chromosome preparations on a microscope slide (chromosome FISH; it is possible to sequence the ends of numerous large insert clones
see Figures 2.16A and 2.17A). from a genomic DNA library, thereby generating thousands
DNA marker maps can be built in different ways. Polymorphic of markers.
markers can simply be assayed in members of multigenerational If large numbers of STS markers are available, they can be used
pedigrees to identify groups of linked markers that must map to the to build framework marker maps that allow clones to be assembled
same chromosome. Non-polymorphic markers can be placed on into clone contig maps. An early example of a short clone contig map
marker maps by using PCR to assay the markers in panels of radiation that was assembled from STS marker mapping is shown in Figure 2.
hybrid cells, or in panels of YAC clones or other genomic DNA clones The idea is to arrange clones in a linear order according to whether
with large inserts. Common markers are sequence-tagged site they type positively or negatively for STS markers. Yeast artificial
(STS) markers originating from clones that have been at least partly chromosome clones were initially a popular choice for assembling
sequenced, including previously identified and sequenced DNA clone contig maps, but YACs containing large human inserts were
clones, such as gene clones or other DNA clones of interest that have found to be unstable, being prone to internal rearrangements and
already been well characterized and mapped. More generally, deletions (see Figure 2).
11
11
91
11
10
11
45
11
11
86
0
X IV
3
S3
S2
S2
S2
S2
S2
S2
S1
S2
S2
S2
H1
Fa
Z
MA
HK
D2
D2
D2
D2
D2
D2
D2
D2
D2
D2
D2
AN
EG
90
TG
AC
TABLE 1 COMMON MARKERS USED IN CONSTRUCTING FRAMEWORK DNA MAPS OF COMPLEX GENOMES
Marker type Marker Definition
Polymorphic Restriction fragment length Any DNA polymorphism that results in the creation or destruction of one recognition
polymorphism (RFLP) sequence for a specific restriction endonuclease. Used to be assayed by hybridization
but now assayed by PCR
Single nucleotide A single nucleotide change that is polymorphic. The polymorphism is usually limited
polymorphism (SNP) (typically, there are just two significantly frequent alleles), but SNPs are very suited to
automated genotyping such as the use of pyrosequencing (see Figure 8.8)
Non-polymorphic Sequence-tagged site (STS) Any DNA sequence that has been mapped to a specific subchromosomal location and
can be assayed by PCR amplification
Expressed sequence tag (EST) A subset of sequence-tagged sites that happen to be located within DNA sequences
known to be transcribed
228 Chapter 8: Analyzing the Structure and Expression of Genes and Genomes
overlapping DNA inserts. For example, clone insert DNAs were submitted to
restriction mapping and assays of repetitive DNA content to look for characteris-
tic patterns of spacing of sites for specific restriction endonucleases or of particu-
lar repetitive sequences.
Clone fingerprinting based on restriction mapping and/or repetitive DNA
content is still quite laborious. A yet more efficient method depends on the avail-
ability of a high-density DNA marker map. Various DNA markers can be used.
They may be polymorphic (and may be localized on a genetic map) or non-
polymorphic. The only requirements are that they should have a unique subchro-
mosomal localization and be able to be assayed in some way (see Box 8.1).
The first human genetic maps were of low resolution and were
constructed with mostly anonymous DNA markers
In genetic mapping different polymorphic markers are assayed in all members of
a variety of multigeneration families. The resulting genotypes are then analyzed
by computer to work out how alleles of the different markers segregate at each
meiotic event connecting parents to their offspring, and to identify markers
where specific alleles co-segregate. The discovery of apparently random DNA
polymorphisms in the human genome prompted the idea of constructing a com-
prehensive, nonclassical human genetic map, one that was based predominantly
on anonymous DNA markers rather than on genes.
The first genetic linkage map of the human genome was published in 1987
and was based on restriction fragment length polymorphisms (RFLPs). An
GENOME STRUCTURE ANALYSIS AND GENOME PROJECTS 229
40 bp allele 1 = (CA)16 P2
CA CA CA CA CA CA CA CA CA CA CA CA CA CA CA CA
GT GT GT GT GT GT GT GT GT GT GT GT GT GT GT GT
P1 40 bp
The 1994 map was of sufficiently high resolution to achieve the HGP’s goals
for genetic mapping, and from then onward the major focus was on developing
and refining physical maps leading to the ultimate physical map, the complete
DNA sequence of each chromosome. However, genetic mapping of the human
genome has continued, in two major ways. First, microsatellite maps of even
higher resolution have been developed. Second, high-density maps have been
developed for single nucleotide polymorphisms (SNPs) by the International
HapMap (haplotype mapping) Consortium. The major motivation has been to
aid the identification of common disease genes. SNPs are not particularly poly-
morphic (they mostly have two alleles) but they are well suited to automated typ-
ing, by methods such as pyrosequencing (see Figure 8.8). The most recent such
SNP map has over 3 million SNP markers, or roughly one SNP per kilobase.
TABLE 8.2 DIFFERENT TYPES OF PHYSICAL MAP USED TO MAP THE HUMAN NUCLEAR GENOME
Type of map Examples/methodology Resolution
Chromosome somatic cell hybrid panels with human chromosome fragments distance between adjacent chromosomal breakpoints
breakpoint maps derived from natural, translocation, or deletion chromosomes on a chromosome is usually several Mb
monochromosomal radiation hybrid (RH) maps distance between breakpoints is often many Mb
Restriction maps created with restriction endonucleases that cut only rarely several hundred kb
Clone contig maps overlapping YAC clones average YAC insert has several hundred kb
STS maps requires previous sequence information from clones that have less than 1 kb is possible, but standard STS maps have
been mapped to subchromosomal regions resolutions in tens of kb
EST maps requires cDNA sequencing, then mapping of cDNAs back to other average resolution in the human nuclear genome is
physical maps ~90 kb
markers with an average spacing of just less than 200 kb. In this landmark achieve-
ment, the chromosomal locations of the non-polymorphic STS markers had been
obtained by two approaches. One approach used STS content mapping. YAC
clones from the CEPH YAC library were assayed for large numbers of STS markers
to identify clones sharing certain markers, and to assign them to YAC contigs with
a known subchromosomal location. The second approach used hybrid cell map-
ping. Different human STS markers were assayed in panels of human–rodent
hybrid cells. The hybrid cells each contained a full complement of rodent chro-
mosomes but differed in that they contained different whole human chromo-
somes or different sets of fragments from human chromosomes that had inte-
grated into the rodent chromosomes.
Although the YAC mapping achievements were impressive, there was a prob-
lem. YACs containing large human DNA inserts (which will contain many repeti-
tive DNA sequences) proved to be prone to rearrangements and deletions (see
Figure 2 in Box 8.1). Because the YAC inserts were often not faithful representa-
tions of the original starting human DNA, YAC clones could not be the template
for the final genome sequencing effort.
Alternative large insert cloning systems were developed. Bacterial artificial
chromosome (BAC) and phage P1 artificial chromosome (PAC) libraries have
smaller inserts (80–250 kb) than YACs but, crucially, human inserts are compara-
tively stable in BACs and PACs. To provide even more STS markers, clones in the
BAC libraries were routinely subjected to end-sequencing in which a few hun-
dred nucleotides were sequenced at either end of the insert. Eventually, large
BAC/PAC clone contigs were established for each human chromosome, paving
the way for the final genome sequencing phase (see overview in Figure 8.16).
clones diseases
BOX 8.2 MAJOR MILESTONES IN MAPPING AND SEQUENCING THE HUMAN GENOME
1956 The first physical map of the human genome is determined. Using light microscopy of stained tissue, JH Tjio and A Levan (Hereditas 42,
1–6) reveal that our cells normally contain 46 chromosomes and that there are 24 different types of human chromosome.
1977 Fred Sanger and colleagues publish the dideoxy DNA sequencing method (Proc. Natl Acad. Sci. USA 74, 5463–5467). With some further
refinements (fluorescence labeling and automation) it will be the method used to sequence the human genome and many other
genomes.
1980 David Botstein et al. (Am. J. Hum. Genet. 32, 314–331) propose that a human genetic map can be constructed by using a set of random DNA
markers such as restriction fragment length polymorphisms (RFLPs).
1981 Sanger and colleagues publish the complete sequence of human mitochondrial DNA (Anderson S et al. Nature 290, 457–465).
1984 A workshop is held in Alta, Utah, to evaluate methods of mutation detection and characterization and to project future technologies.
A principal conclusion is that an enormously large, complex, and expensive sequencing program is required to permit high-efficiency
mutation detection.
1987 The US Department of Energy publishes a report on a Human Genome Initiative, which is the first of its kind.
1987 Helen Donis-Keller and colleagues report the first genetic linkage map of the human genome (Cell 51, 319–337). The map is based on
RFLPs and has a low resolution.
1988 The US National Institutes of Health (NIH) sets up a dedicated Office of Human Genome Research (later renamed the National Center for
Human Genome Research).
1988 The Human Genome Organization (HUGO) is established to coordinate international efforts by facilitating the exchange of research
resources, encouraging public debate, and advising on the implications of human genome research.
1990 The Human Genome Project (HGP) is launched officially after implementation of a $3 billion 15-year project in the USA.
1992 The first comprehensive human genetic linkage map, based on microsatellite markers (Weissenbach J et al. Nature 359, 794–801).
1993 A first-generation physical map of the human genome is reported, based on YAC clones (Cohen D et al. Nature 366, 698–701).
1994 An improved genetic map is published, based mostly on microsatellite markers and with a spacing of one marker per centimorgan (Murray
JC et al. Science 265, 2049–2054).
1995 The first detailed physical map of the human genome is published, based on sequence-tagged sites (Hudson TJ et al. Science 270,
1945–1954).
1996 A high-density human BAC library is published (Kim UJ et al. Genomics 34, 213–218).
1998 GeneMap ’98, the first reasonably comprehensive map of gene-based markers, is published (Deloukas P et al. Science 282, 744–746).
1999 The first essentially complete DNA sequence for a human chromosome is reported, for chromosome 22 (Dunham I et al. Nature 402,
489–495).
2001 Draft sequences of the human nuclear genome, comprising roughly 90% of the total euchromatic component, are published by the
International Human Genome Sequencing Consortium (IHGSC) (Nature 409, 860–921) and by Celera (Science 291, 1304–1351).
2001/2 Publication of a draft sequence of the mouse nuclear genome (Waterson B et al. Nature 420, 520–562). Human–mouse comparisons help
human gene identification/characterization.
2003/4 The essentially completed sequence (about 99%) of the euchromatic component of the human genome is reported, comprising 2.85 Gb of
DNA, or roughly 90% of the total of 3.2 Gb (analyses published in 2004 by the IHGSC, Nature 431, 931–945).
2004 Over 21,000 human genes are validated by full-length cDNA clones (Imanishi T et al. PLoS Biol. 2, e162).
2005–7 The International HapMap Consortium reports increasingly detailed single nucleotide polymorphism (SNP) maps for the human genome
in 2005 (Nature 437, 1299–1320) and 2007 (Nature 449, 851–861). The latter map has more than 3.1 million SNPs, roughly one SNP per
kilobase.
2007–8 The age of personal genome sequencing begins with delivery of euchromatic genome sequences for James Watson and Craig Venter and
the launch of the 1000 Genomes Project (http://www.1000genomes.org/page.php).
For the publicly funded HGP, most of the sequence was contributed by large
genome centers, notably the Wellcome Trust Sanger Institute, Washington
University, Baylor College of Medicine, MIT/Harvard, the DoE Joint Genome
Institute, the Japanese RIKEN Institute, and the French Genoscope center. To
ensure efficiency, it was agreed that specific centers would take primary respon-
sibility for assembling clone contigs and subsequent sequencing of individual
chromosomes: for example, the Sanger Institute for chromosome 1, and
Washington University for chromosome 2.
The publicly funded HGP met with aggressive competition from privately
funded genome sequencing. In 1999 the Celera company announced that it
intended to produce a draft human genome sequence in 2 years and that it would
do this by whole genome shotgun sequencing (see Figure 8.13A) rather than the
slower large contig assembly approach of the HGP.
The ensuing race between the publicly funded International Human Genome
Sequencing Consortium (IHGSC) and the privately funded Celera accelerated
the HGP timetable. In 2001, both sides published a draft sequence of the human
genome that covered about 90% of the euchromatic genome sequence. The
euchromatic component is about 90% of the total genome, and so the draft
sequences actually represented about 80% of the total genome but 90% of the
total gene sequence.
Although the race was perceived to have ended in a draw in 2001, it had not
been a fair race, and the finishing line had not been reached. The IHGSC had
made their data freely available, posting sequence data updates on the Web every
24 hours. Celera made use of huge blocks of the IHGSC’s sequence data, re-
processed it, and fed the data back into its own sequence compilation. The Celera
sequence was therefore not an independently obtained human genome sequence.
Unlike the IHGSC, Celera denied external access to their sequence data. Long
after they had published their analyses, Celera required expensive subscription
charges to view their sequence data.
To complete the sequence of the human genome, the IHGSC continued alone
with the hard work of filling in the gaps in the sequence. By 2003, the IHGSC had
produced an essentially complete sequence of the euchromatic component of
the human genome that was available on the Web and followed up by publishing
analyses of the finalized sequence in 2004.
The 2.85 Gb of sequence reported in 2003 was extremely accurate and arranged
in contigs of close to 9 Mb on average. The sequence was interrupted by 341 gaps
234 Chapter 8: Analyzing the Structure and Expression of Genes and Genomes
that remained. The gaps are of two types. Structural gaps result from problems
with ambiguous map assignment due to repetitive sequences (often duplications
with near-identical sequences), but can be solved by standard (clone-based)
DNA sequencing methods. Non-structural gaps are ones where the missing
sequence can not be cloned in bacterial cells. In such cases, the sequence alter-
nates between pyrimidine and purine nucleotides, making the DNA twist in a
reverse orientation to form a left-handed helix that is toxic to bacteria. They can
be closed only by using sequencing-by-synthesis methods. The number of gaps
in the euchromatic genome has been progressively reduced to less than 200 in
the most current sequence (NCBI build 37; released 2009).
Because a variety of DNA libraries had been used to generate the sequence,
the final sequence was a composite one, representing various donor cells of dif-
ferent genotypes. This sequence acted as a reference sequence that hugely facili-
tated subsequent sequencing of the euchromatic genomes of individual people.
An early demonstration was a 2007 report of the sequencing of Jim Watson’s
genome in a few months using massively parallel pyrosequencing. The age of
personal genome sequencing began to take off with the international 1000
Genomes Project that was launched in early 2008. The aim is to obtain a detailed
catalog of human genetic variation by sequencing the genomes of at least 1000
individuals representing a variety of different ethnic groups (see Box 8.2).
Once the initial draft genome sequence had been obtained, another major
international effort focused on functionally important sequences with a priority
of identifying and characterizing all human genes and regulatory elements. By
2004 more than 21,000 human genes were reported to have been validated by
determining full-length cDNA sequences. However, as described in the sections
below, there are still considerable uncertainties about exactly how many genes
we have.
TABLE 8.3 MAJOR ELECTRONIC DATABASES THAT SERVE AS GENERAL NUCLEOTIDE OR PROTEIN SEQUENCE
REPOSITORIES
Database type Database Originator/host URL
Nucleotide sequence GenBank US National Center for Biotechnology Information (NCBI) http://www.ncbi.nlm.nih.gov
TREMBL EBI (translations of coding sequences from the EMBL database that http://www.ebi.ac.uk/trembl/
have not yet been deposited in SWISS-PROT) http://ca.expasy.org/sprot
Note that the databases have subsets that are devoted to particular sequence classes. For transcribed sequences and genes, the dbEST database
(http://www.ncbi.nlm.nih.gov/dbEST) and the UniGene database (http://www.ncbi.nlm.nih.gov/unigene) have been widely used.
959 cells or 1031 cells depending on its gender, should have more than 20,000
protein-coding genes, about as many as a human. The correlation between gene
content and organism complexity will be considered more fully in Chapter 10.
GENOME BROWSERS
GENOME COMPILATIONS
The predicted translation products can in turn be used to search for homol-
ogy against all known protein sequences (using BLASTP), and against the pre-
dicted translation products of all known nucleotide sequences (using TBLASTN).
Suspected candidate protein sequences can also be used as queries in homology
searching against databases of protein domains and short motifs, as detailed in
Chapter 10.
The detection of novel genes can also be aided by computer algorithms that BOX 8.4 GENE ONTOLOGY AND
recognize certain elements commonly found in genes. Exon prediction programs THE GO CONSORTIUM
such as GENSCAN, for example, test for conserved consensus sequences at splice
junctions (see Figure 1.17) and assign high probability to a predicted exon if there As the trickle of genome sequence data
is a large open reading frame that differentiates the sequence from noncoding began to become a flood, biologists
DNA (where on average one of the three termination codon sequences will occur started to grapple with the difficult
challenge of integrating sequence
every 60 bp or so).
data with the vast and rapidly growing
Integrated gene-finding software packages have been developed that com- body of data from gene function
bine programs designed to identify exons and gene-associated motifs with gen- analyses. Searching across the broad
eral sequence-homology-based database searching programs. One example is scientific literature and databases has
the Genotator program (http://www.fruitfly.org/~nomi/genotator/). been hampered by wide variations
Another characteristic that helps identify genes in vertebrates is base compo- in terminology. What was needed
sition. Vertebrate genes are very often associated with small regions, often about was a standardized system of gene
1 kb long, that are both GC-rich and have a significantly higher frequency of the ontology to represent gene function
dinucleotide CpG than the bulk of the genome, where the CpG dinucleotide is across genomes and species. The Gene
statistically under-represented. Because these regions can be viewed as islands of Ontology (GO) Consortium was formed
in 1998 and the GO project began to
normal CpG frequency in a sea of genomic DNA that is otherwise CpG-deficient,
develop a controlled vocabulary to
they are referred to as CpG islands. describe the attributes of genes and
As we will see in Chapter 9, the CpG dinucleotide is a target for DNA methyla- gene products in any organism, to
tion that can cause local condensation of chromatin and inhibit gene expression. allow searching across multiple gene
The upstream regions and other important sites that regulate gene ex- products and species for common
pression need to be able to adopt an open chromatin conformation for gene characteristics.
expression, and so high frequencies of CpG are not tolerated in these regions. To describe gene products, the GO
CpG islands can be identified both experimentally and by computer programs Consortium developed three separate
that scan the sequence for variation in base composition. ontologies—biological process, cellular
component, and molecular function.
Obtaining accurate estimates for the number of human genes is These ontologies permit the annotation
of molecular characteristics across
surprisingly difficult species and may be broad or more
Analysis of the human genome sequence has led to the identification of many focused. For example, the biological
thousands of previously unstudied genes. As we will see in later chapters, the process can be as broad as signal
functions of predicted genes are being intensively studied in the post-genome- transduction or more restricted, for
example a-glucoside transport. Each
sequencing era, and systematic and hierarchical vocabularies are being devel-
vocabulary is structured so that any
oped to define gene function (Box 8.4). Surprisingly, however, many years after term may have more than one parent as
the complete euchromatic sequence was obtained and despite intense investiga- well as zero, one, or more children. This
tions, the total number of human genes is still not accurately known, and it may makes attempts to describe biology
be some time before a precise figure can be obtained. much richer than would be possible
Before the human genome was sequenced, the number of human genes had with a hierarchical graph. Currently the
been expected to be in the range 70,000–100,000, or close to three to four times GO vocabulary consists of more than
the number of genes in C. elegans. By 2001, experimental evidence was available 17,000 terms that will, in time, all have
for about 11,000 human genes, but analyses of the draft genome sequence strict definitions for their usage. See
reported in that year raised the predicted number to only about 30,000–40,000 http://www.geneontology.org for more
information.
mostly protein-coding genes. The essentially complete sequence of the euchro-
matic component of the human genome was obtained by 2003. After analysis of
the new sequence, the number of predicted protein-coding genes had fallen to
somewhat less than 25,000, and current estimates suggest close to 21,500 human
protein-coding genes.
The difficulty in establishing the precise number of genes is not confined to
the human genome but to all moderately complex genomes. As more and more
genomes were sequenced, however, comparative genomics has been extremely
helpful in gene identification. In particular, comparative genomics has helped in
the identification of protein-coding genes, and the recent estimate of about
21,500 human protein-coding genes is likely to be quite accurate. RNA genes are,
however, much more difficult to identify for two major reasons. First, they lack
any open reading frame, and second, their sequences tend to be much less con-
served than protein-coding sequences.
During the course of the human genome project, little attention was devoted
to RNA genes, and the various estimates of gene number were dominated by
expectations of the number of protein-coding genes. Remarkably, the 2001
Science paper in which Celera reported their analyses of the draft human genome
sequence does not mention RNA genes at all! Since then, there has been a revolu-
tion in our understanding of the importance of RNA genes. As detailed in Chapter
9, there are close to 1000 human miRNA genes, and many entirely new classes of
BASIC GENE EXPRESSION ANALYSES 239
• Genes that are expressed at low levels and/or at unusual cellular locations and stages of
development may not be well represented in available cDNA libraries, and may not be evident if
they have small exons.
• Very large genes have widely dispersed exons and may be misinterpreted as a cluster of two or
more smaller genes. This type of overestimate of gene number arises because many of the cDNA
libraries used to validate genes and their exon–intron organization have comparatively short inserts.
• Some nonfunctional gene copies (pseudogenes) are transcribed and may initially be mistaken for
true genes.
RNA have recently been identified, many with a regulatory function. It may take
some time before we have a clear idea of the number of human RNA genes. Table
8.5 lists some of the difficulties in estimating gene numbers.
RT-PCR/qPCR
(C) (D)
Nonspecific detection
SYBR Green I When free in solution, this dye shows relatively little fluorescence, but when it binds to double-stranded DNA, its
dye fluorescence increases more than 1000-fold. It is, however, nonspecific in its binding of double-stranded DNA and so can
bind to primer dimers or nonspecific amplification products (which is why it is not used in standard PCR). To control for this, a
melting curve is conducted at the end of the run by increasing the temperature slowly from 60°C to 95°C while continuously
monitoring the fluorescence. At a certain temperature, the whole amplified product will dissociate fully, resulting in a
decrease in fluorescence as the dye dissociates from the DNA. The temperature of dissociation is dependent on the length
and composition of the amplicon, allowing different DNA fragments to be distinguished (should there be nonspecific
amplification or significant primer dimers).
Molecular Molecular Beacon probes are designed to have a stem–loop structure F Q key:
Beacon probes that brings a fluorophore at the 5¢ end in very close proximity to a F fluorophore
quencher at the 3¢ end, severely limiting the fluorescence emitted
by the fluorophore. However, in the presence of a complementary activated
F
fluorophore
target sequence, the probe unfolds and hybridizes to the target,
causing the fluorophore to be displaced from the quencher Q quencher
(Figure 1). As a result, the quencher no longer absorbs the
photons emitted by the fluorophore and the probe starts to fluoresce. unbound probe
F Q
F Q
(A) 1 2 C 3 4 5 C 6 7 8
BOX 8.6 OBTAINING ANTIBODIES
400 kD
Antibodies that can specifically detect a protein of interest can be raised in different ways. 200
dystrophin
A comparatively recent approach has been to use phage display, an expression cloning system
(blot) 100
in which recombinant phages are used to display heterologous proteins on bacterial cell
surfaces (see Section 6.3). However, the main method of obtaining antibodies has been the
traditional route of repeatedly injecting animals (such as rodents, rabbits, or goats) with a suitable myosin
immunogen that represents a protein under investigation. One approach is to design a synthetic (gel)
peptide, often 20–50 amino acids long, that is then conjugated to a suitable molecule (such as 1 2 C 3 4 5 C 6 7 8
(B)
keyhole limpet hemocyanin) that will help maximize the immunogenicity. The hope is that the
400 kD
peptide will adopt a conformation resembling that of the native polypeptide sequence, but
success is not guaranteed and several different peptides may need to be designed. 200
dystrophin
An alternative type of immunogen is a fusion protein that contains most or a large portion (blot) 100
of the target protein sequence fused to another protein that confers some advantages, notably in
assisting protein purification. The fusion protein is synthesized by cloning a cDNA for the desired
target protein into a plasmid expression vector that also contains a cDNA for the companion myosin
protein and selecting for recombinants in which the two cDNAs are in frame (see Figure 6.12). (gel)
If the animal’s immune system has responded, specific antibodies should be secreted into the
serum. The antibody-rich serum (antiserum) contains a heterogeneous mixture of antibodies, each Figure 8.21 Immunoblotting (western
produced by a different B lymphocyte. The different antibodies recognize different parts (epitopes) blotting). Immunoblotting involves
of the immunogen and are known as polyclonal antibodies. the detection of polypeptides after size
A homogeneous preparation of antibodies with a defined specificity can be prepared, fractionation in a polyacrylamide gel and
however, by propagating a clone of cells (originally derived from a single B lymphocyte). transfer (blotting) to a membrane. Here, the
Because B cells have a limited lifespan in culture, it is preferable to establish an immortal cell blots show dystrophin detection in muscle
line: antibody-producing cells are fused with cells derived from an immortal B-cell tumor. From samples from individual patients with
the resulting heterogeneous mixture of hybrid cells, those hybrids that have both the ability to Duchenne or Becker muscular dystrophy
make a particular antibody and the ability to multiply indefinitely in culture are selected. Such (lanes 1–8) and in normal control muscle
hybridomas are propagated as individual clones, each of which can provide a permanent and (lanes C). (A) Blot using the Dy4/6D3
stable source of a single type of monoclonal antibody (mAb). antibody, which is specific for the dystrophin
rod domain. (B) Blot using the Dy6/C5
antibody, which is specific for the C-terminal
region of dystrophin. Myosin was used as
Protein expression in cultured cells is often analyzed by using a loading control to provide comparative
quantitation of the amount of muscle
different types of fluorescence microscopy samples in the different lanes. [Courtesy
The subcellular location of a protein of interest can also be tracked by antibodies of the late Louise Anderson (formerly
in cultured cells with the use of different types of fluorescence microscopy. The Nicholson) from Nicholson LV, Johnson MA,
protein may be followed directly by using a specific antibody raised against it. Bushby KM et al. (1993) J. Med. Genet. 30,
737–744. With permission from the BMJ
Alternatively, a peptide tag or other protein is coupled to the protein of interest,
Publishing Group.]
and expression of the protein is followed indirectly. In immunofluorescence
microscopy, antibodies track protein expression directly. The antibodies are
labeled with a suitable fluorescent dye, such as fluorescein or rhodamine, and
after antibody has bound to the protein of interest, the expression of the protein
is directly monitored within cells by using fluorescence microscopy (see Box 7.3
Figure 2 for the principle of fluorescence microscopy).
In epitope tagging, the relevant protein is tagged by attaching to its N- or
C-terminus an immunogenic marker peptide sequence for which a specific anti-
body already exists. To achieve this, a cDNA for the protein of interest is cloned
into an expression vector at a site adjacent to, and in frame with, a coding
sequence for the relevant peptide. Recombinant plasmids are transfected into
the appropriate cells, and the expressed protein is tracked with a fluorescently
labeled antibody specific for the peptide tag. Commonly used epitope tags are
shown in Table 8.7.
TABLE 8.7 COMMONLY USED EPITOPE TAGS FOR TRACKING PROTEIN LOCALIZATION
Sequence of tag Peptide origin Linked to protein at Monoclonal antibody
More recently, fluorescence microscopy has been used to track protein expres-
sion in cells without the use of antibodies. Instead, a naturally fluorescent pro-
tein is coupled to the protein of interest to provide a marker of its expression
pattern. One such protein is the green fluorescent protein (GFP), a 238-amino
acid protein originally identified in the jellyfish Aequoria victoria. Similar pro-
teins are expressed in many jellyfish and seem to be responsible for the green
light that they emit, being stimulated by energy obtained after the oxidation of
luciferin or another photoprotein.
When the gene encoding GFP was cloned and transfected into target cells in
culture, expression of GFP in heterologous cells was also marked by emission of
the green fluorescent light. This means that GFP is an autofluorescent protein.
Because it is a natural functional fluorophore, it can serve directly as a reporter
that can be readily followed by conventional and confocal fluorescence micros-
copy, making it a popular tool for tracking gene expression in cell lines and even
in some whole organisms.
Various genetically engineered mutants of GFP have improved its efficiency
in different ways, and various color mutants have been generated, notably blue
fluorescent protein, cyan fluorescent protein, and yellow fluorescent protein. nucleus
Another autofluorescent protein, the red fluorescent protein, has been derived
from the Discosoma coral.
To track the expression of a protein within living cells, the GFP-coding DNA
sequence has been inserted at the beginning or end of the cDNA for the protein
of interest within a suitable expression vector. Transfection and expression of the
resulting hybrid cDNA sequence produces a fusion protein with GFP linked to
the N- or C-terminus of the protein of interest. In many cases, the fusion protein
is expressed in the same way as the original protein, and so helps to reveal its
location (Figure 8.23).
1 2 3 4 5 8 6 7
experiment
67
43
SDS-
PAGE
30
20
14
The sensitivity of 2D-PAGE is dependent on the detection limit for very scarce
proteins, but the class of SYPRO® dyes can detect protein spots in the nanogram
range. Sensitivity is also influenced by the resolution of the gel, because spots
representing scarce proteins can be obscured by those representing abundant
proteins. Pre-fractionation can remove abundant proteins and so simplify the
initial sample loaded onto the gel, and the resolution can be improved by using
gels with a very narrow pH range (see Figure 8.25).
Important developments to improve the efficiency of 2D-PAGE include soft-
ware for spot recognition and quantitation, algorithms that can compare protein
spots across multiple gels, and robots that can pick interesting spots and process
them for later identification by mass spectrometry as described below. However,
a remaining major limitation of 2D-PAGE is that it is not highly suited to automa-
tion, making it difficult to perform high-throughput analyses of many samples.
Alternative separation methods based on multidimensional high-performance
liquid chromatography (HPLC) may eventually displace 2D-PAGE as the major
platform for protein separation.
Mass spectrometry
Mass spectrometry (MS) is used to determine the accurate masses of molecules
in a particular sample. In expression proteomics it helps identify the proteins in
selected spots that have been resolved by 2D-PAGE. It does this by accurate deter-
mination of molecular mass, a procedure that can be completed more quickly
than direct sequencing of proteins by Edman degradation. MS is also easy to
automate for high-throughput sample analysis, and so it can conveniently proc-
ess thousands of spots from a two-dimensional gel or hundreds of HPLC
fractions.
Until recently, MS could not be applied to large molecules such as proteins
and nucleic acids because these were broken into random fragments during the
ionization process. This limitation has been overcome using soft-ionization
methods such as matrix-assisted laser desorption/ionization (MALDI) (Figure
8.26 and Box 8.8) and electrospray ionization (ESI).
HIGHLY PARALLEL ANALYSES OF GENE EXPRESSION 251
sample paths of ionized peptides reflector Figure 8.26 Principle of MALDI-TOF mass
matrix spectrometry. The analyte (the sample
under study—usually a collection of tryptic
peptide fragments) is mixed with a matrix
compound and placed near the source of a
low
laser. The laser heats up the analyte–matrix
mass-to-charge
peptide crystals, causing the analyte to expand
into the gas phase without significant
high
fragmentation. Ions then travel down a
mass-to-charge
laser flight tube to a reflector, which focuses the
peptide
ions onto a detector. The time of flight (the
time taken for ions to reach the detector) is
dependent on the mass/charge ratio and
detector allows the mass of each molecule in the
analyte to be recorded.
The masses of proteins, or more usually the peptide fragments derived from
them by digestion with proteases, can be used to identify the proteins by correlat-
ing the experimentally determined masses with those predicted from database
sequences. As described below there are three different ways to annotate a pro-
tein by MS (Figure 8.27):
• Peptide mass fingerprinting (PMF) is best suited for simple proteomes. A
simple protein mixture (typically a single spot from a two-dimensional gel) is
digested with trypsin to generate a collection of tryptic peptides. These are
subject to MALDI-MS using a time-of-flight (TOF) analyzer (see Figure 8.26
and Box 8.8), which returns a set of mass spectra. The spectra are used as a
search query against protein sequence databases. The search algorithm per-
forms virtual trypsin digests of all the proteins in the database and calculates
the masses of the predicted tryptic peptides. It then attempts to match these
predicted masses against those determined experimentally.
exact match
LGMDGYR
GISLANWMCLAK human lysozyme
WESGYNTR
MALDI-TOF
analysis peptide mass
trypsin fingerprint
digestion
check
SLANW human lysozyme collection
RATNYN sheep lysozyme of EST hits
ESI-MS/MS FQINS rat lactalbumin
analysis WESGYN
GISLAN
GISLANW
GISLANWM
fragment ion GISLANWMC peptide ladder
masses GISLANWMCL
GISLANWMCLA
GISLANWMCLAK
Figure 8.27 Protein annotation by mass spectrometry. Individual protein samples (such as spots from two-dimensional gels) are
digested with trypsin, which cleaves at the C-terminal side of lysine (K) or arginine (R) residues as long as the next residue is
not proline. The tryptic peptides can be analyzed as intact molecules by MALDI-TOF, and the masses used as search queries against
protein databases. Algorithms are used that take protein sequences, cut them with the same cleavage specificity as trypsin, and
compare the theoretical masses of these peptides with the experimental masses obtained by MS. Ideally, the masses of several
peptides should identify the same parent protein, in this case human lysozyme. There may be no hits if the protein is not in the
database or, more probably, that it has been subject to post-translational modification or artifactual modification during the
experiment. In these circumstances, electrospray ionization coupled with tandem mass spectrometry (ESI-MS/MS) can be used
to fragment the ions. The fragment ion masses can be used to search EST databases and obtain partial matches, which may lead
eventually to the correct annotation. Alternatively, the masses of peptide ladders can be used to determine protein sequences de novo.
FURTHER READING
DNA sequencing Tucker T, Marra M, Friedman JM (2009) Massively parallel DNA
Clarke J, Wu HC, Jayasinghe L et al. (2009) Continuous base sequencing: the next big thing in genetic medicine. Am. J.
identification for single-molecule nanopore DNA sequencing. Hum. Genet. 85, 142–154.
Nat. Nanotechnol. 4, 265–270.
Eid J, Fehr A, Gray J et al. (2009) Real-time DNA sequencing from
single polymerase molecules. Science 323, 133–138. Human Genome Project and human genetic
Gupta P (2008) Single-molecule DNA sequencing technologies maps (see also Box 8.2 for additional historical
for future genomics research. Trends Biotechnol. 26, 602–611. references)
Hodges E, Xuan Z, Balija V et al. (2007) Genome-wide in situ
exon capture for selective resequencing. Nat. Genet. 39, All About The Human Genome Project (HGP). http://www
1522–1527. .genome.gov/10001772 [An educational resource maintained
Mamanova L, Coffey AJ, Scott CE et al. (2010) Target-enrichment by the US National Human Genome Research Institute.]
strategies for next generation sequencing. Nat. Meth. Garber M, Zody MC, Arachchi HM et al. (2009) Closing gaps in the
7, 111–118. human genome using sequencing by synthesis. Genome Biol.
Mardis ER (2008) The impact of next-generation sequencing 10, R60.
technology on genetics. Trends Genet. 24, 133–141. Genome Reference Consortium. http://www.ncbi.nlm.nih
Metzker M (2010) Sequencing technologies—the next .gov/projects/genome/assembly/grc/ [Gives updates and
generation. Nat. Rev. Genet. 11, 31–46. graphical overviews of the human and mouse genomes.]
254 Chapter 8: Analyzing the Structure and Expression of Genes and Genomes
Imanishi T, Itoh T, Suzuki Y et al. (2004) Integrative annotation of The Gene Ontology Consortium (2008) The Gene Ontology
21,037 human genes validated by full-length cDNA clones. Project in 2008. Nucleic Acids Res. 36, D440–D444. [See also
PLoS Biol. 2, e162. http://www.geneontology.org/]
International HapMap Consortium (2005) A haplotype map of Wheeler DL, Barrett T, Benson DA et al. (2008) Database resources
the human genome. Nature 437, 1299–1320. of the National Center for Biotechnology Information. Nucleic
International HapMap Consortium (2007) A second generation Acids Res. 36 (Database issue), D13–D21.
human haplotype map of over 3.1 million SNPs. Nature 449,
851–861. Basic gene expression analyses
Various authors (2001) Human Genome Issue. Nature 409, Applied Biosystems (undated) Real-Time PCR Vs. Traditional PCR.
813–958. [Available at Nature Network OmicsGateway http://www.appliedbiosystems.com/support/tutorials/pdf/
portal, http://www.nature.com/omics/subjects/ rtpcr_vs_tradpcr.pdf [Real-time PCR tutorial.]
genomesequenceandanalysis/archive/2001/index.html] VanGuilder HD, Vrana KE & Freeman WM (2008) Twenty-five
Various authors (2001) Human Genome Issue. Science 291, years of quantitative PCR for gene expression analysis.
1177–1351 [Papers available at http://www.sciencemag.org/ Biotechniques 44, 619–626.
content/vol291/issue5507/index.dtl] Ward TH & Lippincott-Schwartz J (2006) The uses of green
Various authors (2003) User’s Guide to the Human Genome. fluorescent protein in mammalian cells. Methods Biochem.
Nature Genet. 35 (1s), 1–79. [Available at http://www.nature Anal. 47, 305–337.
.com/ng/journal/v35/n1s/index.html]
Various authors (2006) Human Genome Collection. Nature S1, Highly parallel gene expression analyses: transcript
1–305. [Contains papers analyzing the sequences of all 24 profiling
human chromosomes; available at http://www.nature.com/ Allison DB, Cui X, Page GP & Sabripour M (2006) Microarray data
nature/supplements/collections/humangenome/index.html] analysis: from disarray to consolidation and consensus. Nat.
Waterston RH, Lander ES & Sulston JE (2002) On the sequencing Rev. Genet. 7, 55–65.
of the human genome. Proc. Natl Acad. Sci. USA 99, Belacel N, Wang Q & Cuperlovic-Culf M (2006) Clustering
3712–3716. methods for microarray gene expression data. Omics
10, 507–531.
Model Organism Genome Projects
Jongeneel CV, Delorenzi M, Iseli C et al. (2005) An atlas of
Genome News Network. A Quick Guide To Sequenced human gene expression from massively parallel signature
Genomes. http://www.genomenewsnetwork.org/resources/ sequencing (MPSS). Genome Res. 15, 1007–1014.
sequenced_genomes/genome_guide_index.shtml. [Gives Murray D, Doran P, MacMathuna P & Moss AC (2007) In silico gene
a brief account of each model organism that has been expression analysis—an overview. Mol. Cancer 6, 50.
sequenced and an associated literature cross-reference.] Various authors (2002) The Chipping Forecast II. Nat. Genet.
Liolios K, Mavrommatis K, Tavernarakis N & Kyrpides NC (2008) 32 (Suppl.), 465–551.
The Genomes On Line Database (GOLD) in 2007: status of
genomic and metagenomic projects and their associated Highly parallel gene expression analyses: protein
metadata. Nucleic Acids Res. 36 (Database issue), D475–D479. profiling
Genome databases, browsers, gene annotation, Hamdan M & Righetti PG (2003) Assessment of protein
expression by means of 2-D gel electrophoresis with and
and associated bioinformatics
without mass spectrometry. Mass Spectrom. Rev. 22, 272–284.
Barnes MR (ed.) (2007) Bioinformatics for Geneticists: A Kolker E, Higdon R & Hogan JM (2006) Protein identification
Bioinformatics Primer for the Analysis of Genetic Data, 2nd and expression analysis using mass spectrometry. Trends
ed. John Wiley and Sons. Microbiol. 14, 229–235.
Bina M (2006) Use of genome browsers to locate your favorite
genes. Methods Mol. Biol. 338, 1–7.
Reeves GA, Talavera D & Thornton JM (2009) Genome and
proteome annotation: organization, interpretation and
integration. J. R. Soc. Interface 6, 129–147.
Spudich G, Fernández-Suárez XM & Birney E (2007) Genome
browsing with Ensembl: a practical overview. Brief. Funct.
Genomics Proteomics 6, 202–219.
Chapter 9
Organization of the
Human Genome 9
KEY CONCEPTS
• The human genome is subdivided into a large nuclear genome with more than 26,000
genes, and a very small circular mitochondrial genome with only 37 genes. The nuclear
genome is distributed between 24 linear DNA molecules, one for each of the 24 different
types of human chromosome.
• Human genes are usually not discrete entities: their transcripts frequently overlap those
from other genes, sometimes on both strands.
• Duplication of single genes, subchromosomal regions, or whole genomes has given rise to
families of related genes.
• Genes are traditionally viewed as encoding RNA for the eventual synthesis of proteins, but
many thousands of RNA genes make functional noncoding RNAs that can be involved in
diverse functions.
• Noncoding RNAs often regulate the expression of specific target genes by base pairing with
their RNA transcripts.
• Some copies of a functional gene come to acquire mutations that prevent their expression.
These pseudogenes originate either by copying genomic DNA or by copying a processed
RNA transcript into a cDNA sequence that reintegrates into the genome
(retrotransposition).
• Occasionally, gene copies that originate by retrotransposition retain their function because
of selection pressure. These are known as retrogenes.
• Transposons are sequences that move from one genomic location to another by a cut-and-
paste or copy-and-paste mechanism. Retrotransposons make a cDNA copy of an RNA
transcript that then integrates into a new genomic location.
• Very large arrays of high-copy-number tandem repeats, known as satellite DNA, are
associated with highly condensed, transcriptionally inactive heterochromatin in human
chromosomes.
256 Chapter 9: Organization of the Human Genome
nuclear genome
The human genome comprises two parts: a complex nuclear genome with more
than 26,000 genes, and a very simple mitochondrial genome with only 37 genes
(Figure 9.1). The nuclear genome provides the great bulk of essential genetic
information and is partitioned between either 23 or 24 different types of chromo-
somal DNA molecule (22 autosomes plus an X chromosome in females, and an
additional Y chromosome in males).
Mitochondria possess their own genome—a single type of small circular
DNA—encoding some of the components needed for mitochondrial protein syn-
thesis on mitochondrial ribosomes. However, most mitochondrial proteins are
encoded by nuclear genes and are synthesized on cytoplasmic ribosomes before
being imported into the mitochondria.
As detailed in Chapter 10, sequence comparisons with other mammalian
genomes and vertebrate genomes indicate that about 5% of the human genome
has been strongly conserved during evolution and is presumably functionally
important. Protein-coding DNA sequences account for just 1.1% of the genome.
The other 4% or so of strongly conserved genome sequences consists of non-
protein-coding DNA sequences, including genes whose final products are func-
tionally important RNA molecules, and a variety of cis-acting sequences that
regulate gene expression at DNA or RNA levels. Although sequences that make
non-protein-coding RNA have not generally been so well conserved during evo-
lution, some of the regulatory sequences are much more strongly conserved than
protein-coding sequences.
Protein-coding sequences frequently belong to families of related sequences
that may be organized into clusters on one or more chromosomes or be dispersed
throughout the genome. Such families have arisen by gene duplication during
evolution. The mechanisms giving rise to duplicated genes also give rise to non-
functional gene-related sequences (pseudogenes).
One of the big surprises in the past few years has been the discovery that the
human genome is transcribed to give tens of thousands of different noncoding
RNA transcripts, including whole new classes of tiny regulatory RNAs not previ-
ously identified in the draft human genome sequences published in 2001.
Although we are close to obtaining a definitive inventory of human protein-cod-
ing genes, our knowledge of RNA genes remains undeveloped. It is abundantly
clear, however, that RNA is functionally much more versatile than we previously
suspected. In addition to a rapidly increasing list of human RNA genes, we have
also become aware of huge numbers of pseudogene copies of RNA genes.
A very large fraction of the human genome, and other complex genomes, is
made up of highly repetitive noncoding DNA sequences. A sizeable component
is organized in tandem head-to-tail repeats, but the majority consists of inter-
spersed repeats that have been copied from RNA transcripts in the cell by reverse
GENERAL ORGANIZATION OF THE HUMAN GENOME 257
16
genes. All genes lack introns and are closely
S
Glu
PL clustered. The symbols for protein-coding
ND6 Leu genes are shown here without the prefix
ND5 MT- that signifies mitochondrial gene. The
ND1
genes that encode subunits 6 and 8 of the
ATP synthase (ATP6 and ATP8) are partly
Gln Ile
Leu f-Met overlapping. Other polypeptide-encoding
Ser
His genes specify seven NADH dehydrogenase
OL Ala ND2 subunits (ND4L and ND1–ND6), three
Asn
ND4 cytochrome c oxidase subunits (CO1–CO3),
Cys and cytochrome b (CYB). tRNA genes are
Tyr Trp
L represented with the name of the amino
strand ND4L acid that they bind. The short 7S DNA strand
Arg Ser is produced by repeat synthesis of a short
ND3 segment of the H strand (see Figure 9.2).
Gly CO1
For further information, see the MITOMAP
CO3 database at http://www.mitomap.org/.
Asp
ATP6 ATP8 CO2
Lys
starts from common promoters in the CR/D-loop region and continues round
the circle (in opposing directions for the two different strands), to generate large
multigenic transcripts. The mature RNAs are subsequently generated by cleavage
of the multigenic transcripts.
Almost two-thirds (24 out of 37) of the mitochondrial genes specify a func-
tional noncoding RNA as their final product. There are 22 tRNA genes, one for
each of the 22 types of mitochondrial tRNA. In addition, two rRNA genes are ded-
icated to making 16S rRNA and 12S rRNA (components of the large and small
subunits, respectively, of mitochondrial ribosomes). The remaining 13 genes
encode polypeptides, which are synthesized on mitochondrial ribosomes. These
13 polypeptides form part of the mitochondrial respiratory complexes, the
enzymes of oxidative phosphorylation that are engaged in the production of ATP.
However, the great majority of the polypeptides that make up the mitochondrial
oxidative phosphorylation system plus all other mitochondrial proteins are
encoded by nuclear genes (Table 9.1). These proteins are translated on cytoplas-
mic ribosomes before being imported into the mitochondria.
Unlike its nuclear counterpart, the human mitochondrial genome is extremely
compact: all 37 mitochondrial genes lack introns and are tightly packed (on aver-
age, there is one gene per 0.45 kb). The coding sequences of some genes (notably
those encoding the sixth and eighth subunits of the mitochondrial ATP synthase)
show some overlap (Figure 9.4) and, in most other cases, the coding sequences of
neighboring genes are contiguous or separated by one or two noncoding bases.
Some genes even lack termination codons; to overcome this deficiency, UAA
codons have to be introduced at the post-transcriptional level (see Figure 9.4).
The mitochondrial genetic code
Prokaryotic genomes and the nuclear genomes of eukaryotes encode many hun-
dreds to usually many thousands of different proteins. They are subject to a uni-
versal genetic code that is kept invariant: mutations that could potentially change
GENERAL ORGANIZATION OF THE HUMAN GENOME 259
I NADH dehydrogenase 7 42
rRNA 2 0
tRNA 22 0
Ribosomal proteins 0 79
the genetic code are likely to produce at least some critically misfunctional pro-
teins and so are strongly selected against. However, the much smaller mitochon-
drial genomes make very few polypeptides. As a result, the mitochondrial genetic
code has been able to drift by mutation to be slightly different from the universal
genetic code.
In the mitochondrial genetic code there are 60 codons that specify amino
acids, one fewer than in the nuclear genetic code. There are four stop codons:
UAA and UAG (which also serve as stop codons in the nuclear genetic code) and
AGA and AGG (which specify arginine in the nuclear genetic code; see Figure
1.25). The nuclear stop codon UGA encodes tryptophan in mitochondria, and
AUA specifies methionine not isoleucine.
The mitochondrial genome specifies all the rRNA and tRNA molecules needed
for synthesizing proteins on mitochondrial ribosomes, but it relies on nuclear-
encoded genes to provide all other components, such as the protein components
of mitochondrial ribosomes and aminoacyl tRNA synthetases. Because there are
only 22 different types of human mitochondrial tRNA, individual tRNA molecules
need to be able to interpret several different codons. This is possible because of
third-base wobble in codon interpretation. Eight of the 22 tRNA molecules have
anticodons that each recognize families of four codons differing only at the third
base. The other 14 tRNAs recognize pairs of codons that are identical at the first
two base positions and share either a purine or a pyrimidine at the third base. Figure 9.4 The MT-ATP6 and MT-ATP8
Between them, therefore, the 22 mitochondrial tRNA molecules can recognize a genes are transcribed in different
total of 60 codons [(8 ¥ 4) + (14 ¥ 2)]. reading frames from overlapping
segments of the mitochondrial H
strand. MT-ATP8 is transcribed from
nucleotides 8366 to 8569, and MT-ATP6
MT-ATP8 from 8527 to 9204. After transcription,
8366 8522 8577 9202 9206 the RNA encoding the ATP synthase 6
subunit is cleaved after position 9206
1 53 68 and polyadenylated, resulting in a
Met Pro Lys Trp Thr Lys Ile Cys Ser Leu His Ser Leu Pro Pro Gln Ser Stop C-terminal UAA codon where the first
ATG CCAAAATGAACGAAAATCTGTTCGCTTCATTCATTGCCCCCACAATCCTAGGCCTA ACATA two nucleotides are derived ultimately
Met Asn Glu Asn Leu Phe Ala Ser Phe Ile Ala Pro Thr Ile Leu Gly Leu Thr from the TA at positions 9205–9206
1 17 226 and the third nucleotide is the first A of
MT-ATP6 the poly(A) tail.
260 Chapter 9: Organization of the Human Genome
Number of different DNA molecules 23 (in XX cells) or 24 (in XY cells); all linear one circular DNA molecule
Total number of DNA molecules per cell varies according to ploidy; 46 in diploid cells often several thousand copies (but copy number
varies in different cell types)
Associated protein several classes of histone and nonhistone protein largely free of protein
Repetitive DNA more than 50% of genome; see Figure 9.1 very little
Transcription genes are often independently transcribed multigenic transcripts are produced from both
the heavy and light strands
Codon usage 61 amino acid codons plus three stop codonsa 60 amino acid codons plus four stop codonsa
Recombination at least once for each pair of homologs at meiosis not evident
Chromosome sizes are taken from the ENSEMBL Human Map View (http://www.ensembl.org/Homo_sapiens/Location/Genome). Heterochromatin
figures are estimates abstracted from International Human Genome Sequencing Consortium (2004) Nature 431, 931–945. The size of the total human
genome is estimated to be about 3.1 Gb, with euchromatin accounting for close to 2.9 Gb and heterochromatin accounting for 200 Mb.
centromere. Certain chromosomes, notably 1, 9, 16, and 19, also have significant
amounts of heterochromatin in the euchromatic region close to the centromere
(pericentromere), and the acrocentric chromosomes each have two sizeable het-
erochromatic regions. But the most significant representation is in the Y chromo-
some, where most of the DNA is organized as heterochromatin.
The base composition of the euchromatic component of the human genome
averages out at 41% (G+C), but there is considerable variation between chromo-
somes, from 38% (G+C) for chromosomes 4 and 13 up to 49% (for chromosome
19). It also varies considerably along the lengths of chromosomes. For example,
the average (G+C) content on chromosome 17q is 50% for the distal 10.3 Mb but
drops to 38% for the adjacent 3.9 Mb. There are regions of less than 300 kb with
even wider swings, for example from 33.1% to 59.3% (G+C).
The proportion of some combinations of nucleotides can vary considerably.
Like other vertebrate nuclear genomes, the human nuclear genome has a con-
spicuous shortage of the dinucleotide CpG. However, certain small regions of
transcriptionally active DNA have the expected CpG density and, significantly,
are unmethylated or hypomethylated (CpG islands; Box 9.1).
The human genome contains at least 26,000 genes, but the exact
gene number is difficult to determine
Several years after the Human Genome Project delivered the first reference
genome sequence, there is still very considerable uncertainty about the total
human gene number. When the early analyses of the genome were reported in
2001, the gene catalog generated by the International Human Genome Sequenc-
ing Consortium was very much oriented toward protein-coding genes. Original
estimates suggested more than 30,000 human protein-coding genes, most of
which were gene predictions without any supportive experimental evidence. This
number was an overestimate because of errors that were made in defining genes
(see Box 8.5).
To validate gene predictions supportive evidence was sought, mostly by evo-
lutionary comparisons. Comparison with other mammalian genomes, such as
262 Chapter 9: Organization of the Human Genome
DNA methylation in multicellular animals often involves methylation (A) cytosine (B)
of a proportion of cytosine residues, giving 5-methylcytosine (mC). m
NH2 5¢ CpG 3¢
In most animals (but not Drosophila melanogaster), the dinucleotide
m
CpG is a common target for cytosine methylation by specific cytosine 4C 3¢ GpC 5¢
5
methyltransferases, forming mCpG (Figure 1A). 3N CH
DNA methylation has important consequences for gene deamination
C CH
expression and allows particular gene expression patterns to be stably O 2
NH 6
1
transmitted to daughter cells. It has also been implicated in systems
methylation 5¢ TpG 3¢
of host defense against transposons. Vertebrates have the highest
at 5¢ carbon m
levels of 5-methylcytosine in the animal kingdom, and methylation 3¢ GpC 5¢
is dispersed throughout vertebrate genomes. However, only a small NH2
percentage of cytosines are methylated (about 3% in human DNA, DNA duplication
mostly as mCpG but with a small percentage as mCpNpG, where N is C
any nucleotide). N C CH3
5¢ TpG 3¢
5-Methylcytosine is chemically unstable and is prone to C CH 3¢ ApC 5¢
deamination (see Figure 1A). Other deaminated bases produce O NH
+
derivatives that are identified as abnormal and are removed by the 5-methylcytosine
DNA repair machinery (e.g. unmethylated cytosine produces uracil 5¢ CpG 3¢
m
when deaminated). However, 5-methyl cytosine is deaminated to deamination 3¢ GpC 5¢
give thymine, a natural base in DNA that is not recognized as being
abnormal by cellular DNA repair systems. Over evolutionarily long O
periods, therefore, the number of CpG dinucleotides in vertebrate
DNA has gradually fallen because of the slow but steady C
HN C CH3
conversion of CpG to TpG (and to CpA on the complementary
strand; Figure 1B). C CH
Although the overall frequency of CpG in the vertebrate genome O NH
is low, there are small stretches of unmethylated or hypomethylated thymine
DNA that are characterized by having the normal, expected CpG (forms mismatch with G;
frequency. Such islands of normal CpG density (CpG islands) are inefficiently recognized by
comparatively GC-rich (typically more than 50% GC) and extend over DNA repair system)
hundreds of nucleotides. CpG islands are gene markers because they Figure 1 Instability of vertebrate CpG dinucleotides. (A) The cytosine
are associated with transcriptionally active regions. Highly methylated in CpG dinucleotides is a target for methylation at the 5¢ carbon atom.
DNA regions are prone to adopting a condensed chromatin The resulting 5-methylcytosine is deaminated to give thymine (T), which
conformation, but for actively transcribing DNA the chromatin needs is inefficiently recognized by the DNA repair system and so tends to
to be in a more extended, open unmethylated conformation that persist (however, deamination of unmethylated cytosine gives uracil
allows various regulatory proteins to bind more readily to promoters which is readily recognized by the DNA repair system). (B) The vertebrate
and other gene control regions. CpG dinucleotide is gradually being replaced by TpG and CpA.
those of the mouse and the dog, failed to identify counterparts of many of the
originally predicted human genes. By late 2009 the estimated number of human
protein-coding genes appeared to be stabilizing somewhere around 20,000 to
21,000, but huge uncertainty remained about the number of human RNA genes.
RNA genes are difficult to identify by using computer programs to analyze
genome sequences: there are no open reading frames to screen for, and many
RNA genes are very small and often not well conserved during evolution. There is
also the problem of how to define an RNA gene. As we detail in Chapter 12, com-
prehensive analyses have recently suggested that the great majority of the
genome—and probably at least 85% of nucleotides—is transcribed. It is currently
unknown how much of the transcriptional activity is background noise and how
much is functionally significant.
By mid-2009, evidence for at least 6000 human RNA genes had been obtained,
including thousands of genes encoding long noncoding RNAs that are thought to
be important in gene regulation. In addition, there is evidence for tens of thou-
sands of different tiny human RNAs, but in many such cases quite large numbers
of different tiny RNAs are obtained by the processing of single RNA transcripts.
We look at noncoding RNAs in detail in Section 9.3.
The combination of about 20,000 protein-coding genes and at least 6000 RNA
genes gives a total of at least 26,000 human genes. This remains a provisional
total gene number; defining RNA genes is challenging and it will be some time
before we obtain an accurate human gene number.
GENERAL ORGANIZATION OF THE HUMAN GENOME 263
TABLE 9.4 STRUCTURAL VARIATION IN SIZE AND ORGANIZATION OF HUMAN PROTEINCODING GENES
Human protein Size of protein (no. Size of gene No. of Coding DNA (%) Average size of Average size of
of amino acids) (kb) exons exon (bp) intron (bp)
Where isoforms are evident, the given figures represent the largest isoforms. As genes get larger, exon size remains fairly constant but intron sizes can
become very large. Internal exons tend to be fairly uniform in size, but the terminal exon or some exons near the 3¢ end can be many kilobases long;
for example, exon 26 of the APOB gene is 7.5 kb long. Note the extraordinarily high exon content and comparatively small intron sizes in the genes
encoding type VII collagen and titin. In addition to SRY, other single-exon protein-coding genes in the nuclear genome include retrogenes (see
Table 9.8) and genes encoding other SOX proteins, interferons, histones, many G-protein-coupled receptors, heat shock proteins, many ribonucleases,
and various neurotransmitter receptors and hormone receptors.
266 Chapter 9: Organization of the Human Genome
introns, there is an inverse correlation between gene size and fraction of coding
DNA (see Table 9.4). This does not arise because exons in large genes are smaller
than those in small genes. The average exon size in human genes is close to
300 bp, and exon size is comparatively independent of gene length. Instead, there
is huge variation in intron lengths, and large genes tend to have very large introns
(see Table 9.4). Transcription of long introns is, however, costly in time and energy;
transcription of the 2.4 Mb dystrophin gene takes about 16 hours. Thus, very
highly expressed genes often have short introns or no introns at all.
Repetitive sequences within coding DNA
Highly repetitive DNA sequences are often found within introns and flanking
sequences of genes. They will be detailed in Section 9.4. In addition, repetitive
DNA sequences are found to different extents in exons. Tandem repetition of very
short oligonucleotide sequences (1–4 bp) is frequent and may simply reflect sta-
tistically expected frequencies for certain base compositions. Tandem repetition
of sequences encoding known or assumed protein domains is also quite com-
mon, and it may be functionally advantageous by providing a more available bio-
logical target.
The sequence identities between the repeated protein domains are often
quite low but can sometimes be high. Lipoprotein Lp (a), encoded by the LPA
gene on chromosome 6q26, provides a classical example. It contains multiple
tandemly repeated kringle domains, which are each about 114 amino acids long
and form disulfide-bonded loops. The different kringle domains are often nearly
identical in amino acid sequence. Even at the nucleotide sequence level the DNA
repeats that encode the kringle domains show very high levels of sequence iden-
tity, making them prone to unequal crossover. As a result, the LPA gene is subject
to length polymorphism, and the number of kringle domains in lipoprotein Lp
(a) varies but is usually 15 or more.
longest isoform. Data were obtained largely from ENSEMBL release 55 datasets.
268 Chapter 9: Organization of the Human Genome
(A)
0 900 kb
PPIPs
2.2 kb 10 kb 4 kb
Figure 9.7 Overlapping genes and genes-within-genes. (A) Genes in the class III region of the HLA complex are tightly packed and
overlapping in some cases. Arrows show the direction of transcription. (B) Intron 27b of the NF1 (neurofibromatosis type I) gene is 60.5 kb
long and contains three small internal genes, each with two exons, which are transcribed from the opposing strand. The internal genes (not
drawn to scale) are OGMP (oligodendrocyte myelin glycoprotein) and EVI2A and EVI2B (human homologs of murine genes thought to be
involved in leukemogenesis and located at ecotropic viral integration sites).
separate transcript for each gene. Such genes are said to form part of a poly-
cistronic (= multigenic) transcription unit. Polycistronic transcription units are
common in simple genomes such as those of bacteria and the mitochondrial
genome (see Figure 9.3). Within the nuclear genome, some examples are known
of different proteins being produced from a common transcription unit. Typically,
they are produced by cleavage of a hybrid precursor protein that is translated
from a common transcript. The A and B chains of insulin, which are intimately
related functionally, are produced in this way (see Figure 1.26), as are the related
peptide hormones somatostatin and neuronostatin. Sometimes, however, func-
tionally distinct proteins are produced from a common protein precursor. The
UBA52 and UBA80 genes, for example, both generate ubiquitin and an unrelated
ribosomal protein (S27a and L40, respectively).
More recent analyses have shown that the long-standing idea that most
human genes are independent transcription units is not true, and so the defini-
tion of a gene will need to be radically revised. Multigenic transcription is now
known to be rather frequent in the human genome, and specific proteins and
functional noncoding RNAs can be made by common RNA precursors. This will
be explored further in Section 9.3.
albumin
cluster 4q12
ALB AFP ALF GC/DBP
Growth hormone gene cluster 5 clustered within 67 kb; one pseudogene (Figure 9.8) 17q24
Class I HLA heavy chain genes ~20 clustered over 2 Mb (Figure 9.10) 6p21
HOX genes 38 organized in four clusters (Figure 5.5) 2q31, 7p15, 12q13, 17q21
Histone gene family 61 modest-sized clusters at a few locations; two large clusters many
on chromosome 6
Olfactory receptor gene family > 900 about 25 large clusters scattered throughout the genome many
NF1 (neurofibromatosis type I) > 12 one functional gene at 22q11; others are nonprocessed many, mostly pericentromeric
pseudogenes or gene fragments (Figure 9.11)
Ferritin heavy chain 20 one functional gene on chromosome 11; most are many
processed pseudogenes
TABLE 9.7 EXAMPLES OF HUMAN GENES WITH SEQUENCE MOTIFS THAT ENCODE HIGHLY CONSERVED DOMAINS
Gene family Number of genes Sequence motif/domain
Homeobox genes 38 HOX genes plus 197 homeobox specifies a homeodomain of ~60 amino acids; a wide variety of different
orphan homeobox genes subclasses have been defined
PAX genes 9 paired box encodes a paired domain of ~124 amino acids; PAX genes often also have a
type of homeodomain known as a paired-type homeodomain
SOX genes 19 SRY-like HMG box which encodes a domain of 70–80 amino acids
Forkhead domain genes 50 the forkhead domain is ~110 amino acids long
POU domain genes 16 the POU domain is ~150 amino acids long
PROTEIN-CODING GENES 271
A A
(A) (B)
Figure 1 Origins of nonprocessed and processed
gene duplication transcription
pseudogenes. (A) Copying of genomic DNA
and processing
sequence containing gene A can produce duplicate
copies of gene A. Strong selection pressure needs mRNA
A A reverse
to be applied to one of the copies to maintain
gene function (bold arrow), but the other copy transcriptase
can be allowed to mutate (dashed arrow). If it mutational mutational cDNA
picks up inactivating mutations (red circles), a constraint flexibility
chromosomal
nonprocessed pseudogene (yA) can arise. integration
A yA
(B) A processed pseudogene arises after cellular
reverse transcriptases convert a transcript of a gene
into a cDNA that then is able to integrate back into
mutation
the genome (see Figure 9.12 for details). The lack
of important sequences such as a promoter usually promoter inactivating non-inactivating
results in an inactive gene copy. mutation mutation yA
PROTEIN-CODING GENES 273
TABLE 9.8 EXAMPLES OF HUMAN INTRONLESS RETROGENES AND THEIR PARENTAL INTRONCONTAINING HOMOLOGS
Retrogene Intron-containing homolog Product
TAF1L at 9p13 TAF1 at Xq13 TATA box binding protein associated factor, 250 kD
We have known for many decades that various ubiquitous ncRNA classes are
essential for cell function. Until recently, however, we have largely been accus-
tomed to thinking of ncRNA as not much more than a series of accessories that are
needed to process genes to make proteins. Transfer RNAs are needed at the very
end of the pathway, serving to decode the codons in mRNA and provide amino
acids in the order they are needed for insertion into growing polypeptide chains.
Ribosomal RNAs are essential components of the ribosomes, the complex ribo-
nucleoprotein factories of protein synthesis.
Other ubiquitous ncRNAs were known to function higher up the pathway to
ensure the correct processing of mRNA, rRNA, and tRNA precursors. Various
small RNAs are components of complex ribonucleoproteins involved in different
processing reactions, including splicing, cleavage of rRNA and tRNA precursors,
and base modifications that are required for RNA maturation. Typically these
RNAs work as guide RNAs, by base pairing with complementary sequences in the
precursor RNA.
We have also long been aware of a few ncRNAs that have other functions, such
as RNAs implicated in X-inactivation and imprinting, and the RNA component of
the telomerase ribonucleoprotein needed for synthesis of the DNA of telomeres
(see Figure 2.13). But these RNAs seemed to be quirky exceptions.
In the past decade or so, however, there has been a revolution in how we view
RNA. Many thousands of different ncRNAs have recently been identified in ani-
mal cells. Many of them are developmentally regulated and have been shown to
have crucial roles in a whole variety of different processes that occur in special-
ized tissues or specific stages of development. Several ncRNAs have already been
implicated in cancer and genetic disease.
Now that the human genome is known to have close to 20,000 protein-coding
genes, about the same as in the 1 mm nematode Caenorhabditis elegans, which
has only about 1000 cells—the question now is whether RNA-based regulation is
the key feature in explaining our complexity. Certainly, the complexity of RNA-
based gene regulation increases markedly in complex organisms, as is described
in Chapter 10. Maybe it is time to view the genome as more of an RNA machine
than just a protein machine.
276 Chapter 9: Organization of the Human Genome
HBII-85
snoRNA
Figure 9.13 Functional diversity of RNA. Various ubiquitous RNAs function in housekeeping roles in cells, and in protein synthesis
and protein export from cells (using 7SL RNA, the RNA component of the signal recognition particle). RNAs involved in RNA
maturation include spliceosomal small nuclear RNAs (snRNAs), small nucleolar RNAs (snoRNAs), small Cajal body RNAs (scaRNAs),
and two RNA ribonucleases. Telomere DNA synthesis is performed by a ribonucleoprotein that consists of TERC (the telomerase RNA
component) and a reverse transcriptase (see Figure 2.13). The Y RNA family are involved in chromosomal DNA replication, and RNase
MRP has a crucial role in initiating mtDNA replication as well as in cleaving pre-rRNA precursors in the nucleolus. Diverse classes of
RNA serve a regulatory function in gene expression. Although some do have general accessory roles in transcription, regulatory
RNAs are often restricted in their expression to certain cell types and/or developmental stages. Three classes of tiny RNA use RNA
interference (RNAi) pathways to act as regulators: Piwi protein-interacting RNAs (piRNAs) regulate the activity of transposons in
germ-line cells; microRNAs (miRNAs) regulate the expression of target genes; and endogenous short interfering RNAs (endo-siRNAs)
act as gene regulators and also regulate some types of transposon. Some members of the snoRNA family, such as HBII-85 snoRNA,
are involved in epigenetic gene regulation. A large number of long RNAs are involved in regulating various genes, often at the
transcriptional level. Some are known to be involved in epigenetic gene regulation in imprinting, X-inactivation, and so on. RNA
families are shown in black type; individual RNAs are shown in red.
Ribosomal 12S rRNA, 16S rRNA 1 of each components of mitochondrial ribosomes cleaved from multigenic transcripts produced
RNA (rRNA), by H strand of mtDNA (Figure 9.3)
~120–5000
nucleotides 5S rRNA, 5.8S rRNA, 1 of each components of cytoplasmic ribosomes 5S rRNA is encoded by multiple genes in
18S rRNA, 28S rRNA various gene clusters; 5.8S, 18S, and 28S
rRNA are cleaved from multigenic transcripts
(Figure 1.22); the multigenic 5.8S–18S–28S
transcription units are tandemly repeated on
each of 13p,14p,15p, 21p, and 22p (= rDNA
clusters)
Transfer mitochondrial family 22 decode mitochondrial mRNA to make 13 single-copy genes. tRNAs are cleaved from
RNA (tRNA), proteins on mitochondrial ribosomes multigenic mtDNA transcripts (Figure 9.3)
~70–80
nucleotides cytoplasmic family 49 decode mRNA produced by nuclear genes 700 tRNA genes and pseudogenes dispersed
(Figure 9.13) at multiple chromosomal locations with some
large gene clusters
Small nuclear spliceosomal family 9 U1, U2, U4, U5, and U6 snRNAs process about 200 spliceosomal snRNA genes are
RNA (snRNA), with subclasses Sm standard GU–AG introns (Figure 1.19); found at multiple locations but there are
~60–360 and Lsm (Table 9.10) U4atac, U6atac, U11, and U12 snRNAs moderately large clusters of U1 and U2 snRNA
nucleotides process rare AU–AC introns genes; most are transcribed by RNA pol II
non-spliceosomal several U7 snRNA: 3¢ processing of histone mRNA; mostly single-copy functional genes
snRNAs 7SK RNA: general transcription regulator;
Y RNA family: involved in chromosomal
DNA replication and regulators of cell
proliferation
Small C/D box class 246 maturation of rRNA, mostly nucleotide usually within introns of protein-coding
nucleolar RNA (Figure 9.15A) site-specific 2¢-O-ribose methylations genes; multiple chromosomal locations, but
(snoRNA), some genes are found in multiple copies in
~60–300 H/ACA class 94 maturation of rRNA by modifying uridines gene clusters (such as the HBII-52 and HBII-85
nucleotides (Figure 9.15B) at specific positions to give pseudouridine clusters—Figure 11.22)
Small Cajal 25 maturation of certain snRNA classes in usually within introns of protein-coding
body RNA Cajal bodies (coiled bodies) in the nucleus genes
(scaRNA)
Miscellaneous BC200 1 neural RNA that regulates dendritic 1 gene, BCYRN1, at 2p16
small protein biosynthesis; originated from Alu
cytoplasmic repeat
RNAs,
~80–500 7SL RNA 3 component of the signal recognition three closely related genes clustered on
nucleotides particle (SRP) that mediates insertion of 14q22
secretory proteins into the lumen of the
endoplasmic reticulum
Vault RNA 3 components of cytoplasmic vault RNPs VAULTRC1, VAULTRC2, and VAULTRC3 are
that have been thought to function in clustered at 5q31 and share ~84% sequence
drug resistance identity
Y RNA 4 components of the 60 kD Ro RNY1, RNY3, RNY4, and RNY5 are clustered at
ribonucleoprotein, an important target of 7q36
humoral autoimmune responses
RNA GENES 279
MicroRNA > 70 families of ~1000 multiple important roles in gene see Figure 9.17 for examples of genome
(miRNA), related miRNAs regulation, notably in development, organization, and Figure 9.16 for how they are
~22 nucleotides and implicated in some cancers synthesized
Piwi-binding RNA 89 individual > 15,000 often derived from repeats; expressed 89 large clusters distributed across the genome;
(piRNA), ~24–31 clusters only in germ-line cells, where they individual clusters span from 10 kb to 75 kb
nucleotides limit excess transposon activity with an average of 170 piRNAs per cluster
Endogenous many probably often derived from pseudogenes, clusters at many locations in the genome
short interfering more than inverted repeats, etc.; involved in
RNA (endo- 10,000a gene regulation in somatic cells and
siRNA), ~21–22 may also be involved in regulating
nucleotides some types of transposon
Long noncoding many > 3000 involved in regulating gene usually individual gene copies; transcripts often
regulatory RNA, expression; some are involved undergo capping, splicing, and polyadenylation
often > 1 kb in monoallelic expression but antisense regulatory RNAs are typically long
(X-inactivation, imprinting), and/or as transcripts that do not undergo splicing
antisense regulators (Table 9.11)
aBased on extrapolation of studies in mouse cells.
NONCODE integrated database of all ncRNAs except rRNA and tRNA http://www.noncode.org
sno/scaRNAbase small nucleolar RNAs and small Cajal body-specific RNAs http://gene.fudan.sh.cn/snoRNAbase.nsf
Component of minor spliceosome U11, U12, U4atac, and U5 snRNAs U6atac snRNA
Bound core proteins Sm proteins (SmB, SmD1, SmD2, SmD3, SmE, SmF, SmG) seven Lsm proteins (LSM2–LSM8)
Location synthesized in the nucleus and then exported to the never leave the nucleus; undergo maturation in
cytoplasm, where they each associate with seven Sm the nucleolus and then accumulate in speckles to
proteins and undergo 5¢ and 3¢ end processing; then perform spliceosomal function
re-imported into the nucleus to undergo more RNA
processing in Cajal bodies, before accumulating in
speckles to perform spliceosomal function
Site-specific nucleotide performed by scaRNAs in the Cajal bodies of the nucleus performed by snoRNAs in the nucleolus
modification
aAlthough not a spliceosomal snRNA, U7 snRNA has a similar structure to the Sm class of snRNAs and five of its seven core proteins are identical to core
some 15. Nonstandard functions are known or expected for some snoRNA genes
that do not have sequences complementary to rRNA sequences. For example, the
HBII-52 snoRNA has an 18-nucleotide sequence that is perfectly complementary
to a sequence within the HTR2C (serotonin receptor 2c) gene at Xp24, and
regulates alternative splicing of this gene. The neighboring HBII-85 snoRNAs
have recently been implicated in the pathogenesis of Prader–Willi syndrome
(OMIM 176270).
Small Cajal body RNA genes
The scaRNAs resemble snoRNAs and perform a similar role in RNA maturation,
but their targets are spliceosomal snRNAs and they perform site-specific modifi-
cations of spliceosomal snRNA precursors in the Cajal bodies of the nucleus.
There are at least 25 human genes, each specifying one type of scaRNA. Like
snoRNA genes, the scaRNA genes are typically located within the introns of genes
transcribed by RNA polymerase II.
long ds RNA
dicer
cap poly(A)
DNA
cap poly(A)
cap poly(A)
HMT DNMT
histone methylation
DNA methylation
degraded RNA
transcription repressed
Figure 9.16 Human miRNA synthesis. (A) General scheme. The primary transcript, pri-miRNA, has a 5¢ cap
(A)
(m7GpppG) and a 3¢ poly(A) tail. miRNA precursors have a prominent double-stranded RNA structure (RNA
hairpin), and processing occurs through the actions of a series of ribonuclease complexes. In the nucleus,
nucleus Rnasen, the human homolog of Drosha, cleaves the pri-miRNA to release the hairpin RNA (pre-miRNA); this
miR is then exported to the cytoplasm, where it is cleaved by the enzyme dicer to produce a miRNA duplex.
The duplex RNA is bound by an argonaute complex and the helix is unwound, whereupon one strand (the
passenger) is degraded by the argonaute ribonuclease, leaving the mature miRNA (the guide strand) bound
to argonaute. miR, miRNA gene. (B) A specific example: the synthesis of human miR-26a1. Inverted repeats
RNA pol II (shown as highlighted sequences overlined by long arrows) in the pri-miRNA undergo base pairing to form
a hairpin, usually with a few mismatches. The sequences that will form the mature guide strand are shown in
red; those of the passenger strand are shown in blue. Cleavage by both the human Drosha and dicer (green
arrows) is typically asymmetric, leaving an RNA duplex with overhanging 3¢ dinucleotides.
pri-miRNA
(B)
cap g u u c a
g g
gug ccucgu caaguaa ccaggauaggcugu g
pri-miRNA
pre-miRNA
cgc ggggca guucauu gguucuauccggua a u
a c - c c c
AAAAA
RNASEN complex
cytoplasm
u u c a
g g
u caaguaa ccaggauaggcugu g
pre-miRNA
pre-miRNA gca guucauu gguucuauccggua a u
c - c c c
u u
5¢ u caaguaa ccaggauaggcu 3¢
miRNA duplex miRNA duplex
3¢ gca guucauu gguucuaucc 5¢
c -
cap poly(A)
More than 15,000 different human piRNAs have been identified and so the
piRNA family is among the most diverse RNA family in human cells (see Table
9.8). The piRNAs map back to 89 genomic intervals of about 10–75 kb long (for
more information see the piRNAbank database; Table 9.9). They are thought to be
cleaved from large multigenic transcripts. In humans, the multi-piRNA tran-
scripts contain sequences for up to many hundreds of different piRNAs.
piRNAs are thought to repress transposition by transposons through an RNA
interference pathway, by association with piwi proteins, which are evolutionarily
related to argonaute proteins (Figure 9.18).
Endogenous siRNAs
Long double-stranded RNA in mammalian cells triggers nonspecific gene silenc-
ing through interferon pathways, but transfection of exogenous synthetic siRNA
duplexes or short hairpin RNAs induces RNAi-mediated silencing of specific
genes with sequence elements in common with the exogenous RNA. As we will
see in Chapter 12, this is an extremely important experimental tool that can give
valuable information on the cellular functions of a gene. Very recently, it has
become clear that human cells also naturally produce endogenous siRNAs (endo-
siRNAs).
In mammals, the most comprehensive endo-siRNA analyses have been per-
formed in mouse oocytes. Like piRNAs, endo-siRNAs are among the most varied
RNA population in the cell (many tens of thousands of different endo-siRNAs
have been identified in mouse oocytes). They arise as a result of the production
of limited amounts of natural double-stranded RNA in the cell. One way in
which this happens involves the occasional transcription of some pseudogenes
(Figure 9.19).
RNA GENES 287
piRNA cluster
transcript
DNA
methylation histone
(in mammals) modifications
(C) repression
piwi piwi
DNMT
Me Me Me Me Me HMT
HP1
HP1
mRNA cleavage
H19 2.3 kb 11p15 5 exons spanning 2.67 kb involved in imprinting at the 11p15 imprinted cluster
associated with Beckwith–Wiedemann syndrome
KCNQTOT1 (= LIT1) 59.5 kb 11p15 1 exon antisense regulator at the imprinted cluster at 11p15
PEG3 1.8 kba 19q13 variable number of exons but maternally imprinted and known to function in tumor
up to 9 exons spanning 25 kb suppression by activating p53
HOTAIR 2.2 kb 12q13 6 exons spanning 6.3 kb trans-acting gene regulator; although part of a
regulatory region in the HOX-C cluster on 12q13, HOTAIR
RNA represses transcription of a 40 kb region on the
HOX-D cluster on chromosome 2q31
aLargest isoforms.
HIGHLY REPETITIVE DNA: HETEROCHROMATIN AND TRANSPOSON REPEATS 289
Satellite 3 ATTCC/GGAAT 13p, 14p, 15p, 21p, 22p, and heterochromatin on 1q, 9q,
and Yq12
Microsatellite DNA < 100 bp often 1–4 bp widely dispersed throughout all chromosomes
aThe distinction between satellite, minisatellite, and microsatellite is made on the basis of the total array length, not the size of the repeat unit.
bSatellite DNA arrays that consist of simple repeat units often have base compositions that are radically different from the average 41% G+C
(and so could be isolated by buoyant density gradient centrifugation, when they would be differentiated from the main DNA and appear as satellite
bands—hence the name).
290 Chapter 9: Organization of the Human Genome
Of the three human LINE families, LINE-1 (or L1) is the only family that con-
tinues to have actively transposing members. LINE-1 is the most important
human transposable element and accounts for a higher fraction of genomic DNA
(17%) than any other class of sequence in the genome.
Full-length LINE-1 elements are more than 6 kb long and encode two pro-
teins: an RNA-binding protein and a protein with both endonuclease and reverse
transcriptase activities (Figure 9.21A). Unusually, an internal promoter is located
within the 5¢ untranslated region. Full-length copies therefore bring with them
their own promoter that can be used after integration in a permissive region of
the genome. After translation, the LINE-1 RNA assembles with its own encoded
proteins and moves to the nucleus.
To integrate into genomic DNA, the LINE-1 endonuclease cuts a DNA duplex
on one strand, leaving a free 3¢ OH group that serves as a primer for reverse tran-
scription from the 3¢ end of the LINE RNA. The endonuclease’s preferred cleavage
site is TTTTØA; hence the preference for integrating into AT-rich regions. AT-rich
DNA is comparatively gene-poor, and so because LINEs tend to integrate into
AT-rich DNA they impose a lower mutational burden, making it easier for their
host to accommodate them. During integration, the reverse transcription often
fails to proceed to the 5¢ end, resulting in truncated, nonfunctional insertions.
Accordingly, most LINE-derived repeats are short, with an average size of 900 bp
for all LINE-1 copies, and only about 1 in 100 copies are full length.
The LINE-1 machinery is responsible for most of the reverse transcription in
the genome, allowing retrotransposition of the nonautonomous SINEs and also
of copies of mRNA, giving rise to processed pseudogenes and retrogenes. Of the
6000 or so full-length LINE-1 sequences, about 80–100 are still capable of trans-
posing, and they occasionally cause disease by disrupting gene function after
insertion into an important conserved sequence.
Alu repeats are the most numerous human DNA elements and
originated as copies of 7SL RNA
SINEs (short interspersed nuclear elements) are retrotransposons about
100–400 bp in length. They have been very successful in colonizing mammalian
genomes, resulting in various interspersed DNA families, some with extremely
high copy numbers. Unlike LINEs, SINEs do not encode any proteins and they
cannot transpose independently. However, SINEs and LINEs share sequences at
their 3¢ end, and SINEs have been shown to be mobilized by neighboring LINEs.
By parasitizing on the LINE element transposition machinery, SINEs can attain
very high copy numbers.
The human Alu family is the most prominent SINE family in terms of copy Figure 9.21 The human LINE-1 and Alu
number, and is the most abundant sequence in the human genome, occurring on repeat elements. (A) The 6.1 kb LINE-1
average more than once every 3 kb. The full-length Alu repeat is about 280 bp element has two open reading frames: ORF1,
a 1 kb open reading frame, encodes p40,
long and consists of two tandem repeats, each about 120 bp in length followed by
an RNA-binding protein that has a nucleic
a short An/Tn sequence. The tandem repeats are asymmetric: one contains an acid chaperone activity; the 4 kb ORF2
internal 32 bp sequence that is lacking in the other (Figure 9.21B). Monomers, specifies a protein with both endonuclease
and reverse transcriptase activities. A
(A) LINE-1 repeat element bidirectional internal promoter lies within
the 5¢ untranslated region (UTR). At the
5¢ UTR ORF1 ORF2 3¢ UTR other end, there is an An/Tn sequence, often
described as the 3¢ poly(A) tail (pA). The
pA
LINE-1 endonuclease cuts one strand of a
DNA duplex, preferably within the sequence
TTTTØA, and the reverse transcriptase uses
the released 3¢-OH end to prime cDNA
synthesis. New insertion sites are flanked
p40 endonuclease reverse by a small target site duplication of
transcriptase
2–20 bp (flanking black arrowheads).
(B) An Alu dimer. The two monomers have
(B) Alu repeat element similar sequences that terminate in an
left monomer right monomer An/Tn sequence but differ in size because of
the insertion of a 32 bp element within the
AAAAAA AAAAAA
TTTTTT TTTTTT larger repeat. Alu monomers also exist in
A B the human genome, as do various truncated
32 bp copies of both monomers and dimers.
CONCLUSION 293
containing only one of the two tandem repeats, and various truncated versions
of dimers and monomers are also common, giving a genomewide average of
230 bp.
Whereas SINEs such as the MIR (mammalian-wide interspersed repeat) fami-
lies are found in a wide range of mammals, the Alu family is of comparatively
recent evolutionary origin and is found only in primates. However, Alu sub-
families of different evolutionary ages can be identified. In the past 5 million or
so years since the divergence of humans and African apes, only about 5000 cop-
ies of the Alu repeat have undergone transposition; the most mobile Alu sequences
are members of the Y and S subfamilies.
Like other mammalian SINEs, Alu repeats originated from cDNA copies of
small RNAs transcribed by RNA polymerase III. Genes transcribed by RNA
polymerase III often have internal promoters, and so cDNA copies of transcripts
carry with them their own promoter sequences. Both the Alu repeat and, inde-
pendently, the mouse B1 repeat originated from cDNA copies of 7SL RNA, the
short RNA that is a component of the signal recognition particle, using a retro-
transposition mechanism like that shown in Figure 9.12. Other SINEs, such as the
mouse B2 repeat, are retrotransposed copies of tRNA sequences.
Alu repeats have a relatively high GC content and, although dispersed mainly
throughout the euchromatic regions of the genome, are preferentially located in
the GC-rich and gene-rich R chromosome bands, in striking contrast to the pref-
erential location of LINEs in AT-rich DNA. However, when located within genes
they are, like LINE-1 elements, confined to introns and the untranslated regions.
Despite the tendency to be located in GC-rich DNA, newly transposing Alu
repeats show a preference for AT-rich DNA, but progressively older Alu repeats
show a progressively stronger bias toward GC-rich DNA.
The bias in the overall distribution of Alu repeats toward GC-rich and, accord-
ingly, gene-rich regions must result from strong selection pressure. It suggests
that Alu repeats are not just genome parasites but are making a useful contribu-
tion to cells containing them. Some Alu sequences are known to be actively tran-
scribed and may have been recruited to a useful function. The BCYRN1 gene,
which encodes the BC200 neural cytoplasmic RNA, arose from an Alu monomer
and is one of the few Alu sequences that are transcriptionally active under nor-
mal circumstances. In addition, the Alu repeat has recently been shown to act as
a trans-acting transcriptional repressor during the cellular heat shock response.
CONCLUSION
In this chapter, we have looked at the architecture of the human genome. Each
human cell contains many copies of a small, circular mitochondrial genome and
just one copy of the much larger nuclear genome. Whereas the mitochondrial
genome bears some similarities to the compact genomes of prokaryotes, the
human nuclear genome is much more complex in its organization, with only
1.1% of the genome encoding proteins and 95% comprising nonconserved, and
often highly repetitive, DNA sequences.
Sequencing of the human genome has revealed that, contrary to expectation,
there are comparatively few protein-coding genes—about 20,000–21,000 accord-
ing to the most recent estimates. These genes vary widely in size and internal
organization, with the coding exons often separated by large introns, which often
contain highly repetitive DNA sequences. The distribution of genes across the
genome is uneven, with some functionally and structurally related genes found
in clusters, suggesting that they arose by duplication of individual genes or larger
segments of DNA. Pseudogenes can be formed when a gene is duplicated and
then one of the pair accumulates deleterious mutations, preventing its expres-
sion. Other pseudogenes arise when an RNA transcript is reverse transcribed and
the cDNA is re-inserted into the genome.
The biggest surprise of the post-genome era is the number and variety of non-
protein-coding RNAs transcribed from the human genome. At least 85% of the
euchromatic genome is now known to be transcribed. The familiar ncRNAs
known to have a role in protein synthesis have been joined by others that have
roles in gene regulation, including several prolific classes of tiny regulatory RNAs
and thousands of different long ncRNAs. Our traditional view of the genome is
being radically revised.
294 Chapter 9: Organization of the Human Genome
FURTHER READING
Human mitochondrial genome Conrad B & Antonarakis SE (2007) Gene duplication: a drive for
Anderson S, Bankier AT, Barrell BG et al. (1981) Sequence and phenotypic diversity and cause of human disease. Annu. Rev.
organization of the human mitochondrial genome. Nature Genomics Hum. Genet. 8, 17–35.
290, 457–465. Kaessmann H, Vinckenbosch N & Long M (2009) RNA-based gene
Chen XJ & Butow RA (2005) The organization and inheritance of duplication: mechanistic and evolutionary insights. Nat. Rev.
the mitochondrial genome. Nat. Rev. Genet. 6, 815–825. Genet. 10, 19–31.
Falkenberg M, Larsson NG & Gustafsson CM (2007) DNA Linardopoulou EV, Williams EM, Fan Y et al. (2005) Human
replication and transcription in mammalian mitochondria. subtelomeres are hot spots of interchromosomal
Annu. Rev. Biochem. 76, 679–699. recombination and segmental duplication. Nature 437,
MITOMAP: human mitochondrial genome database. http://www 94–100.
.mitomap.org Redon R, Ishikawa S, Fitch KR et al. (2006) Global variation in copy
Wallace DC (2007) Why do we still have a maternally inherited number in the human genome. Nature 444, 444–454.
mitochondrial DNA? Insights from evolutionary medicine. Tuzun E, Sharp AJ, Bailey JA et al. (2005) Fine-scale structural
Annu. Rev. Biochem. 76, 781–821. variation of the human genome. Nat. Genet. 37, 727–732.
Model Organisms,
Comparative Genomics,
and Evolution 10
KEY CONCEPTS
• Model organisms have long been used for understanding facets of basic biology and for
applied research, with important roles in modeling disease and helping evaluate potential
therapies. Genome sequences for many model organisms are now available.
• Sequences from different genomes can be aligned. Comparing genome sequences provides
genomewide information on the relatedness of DNA sequences and inferred protein
sequences, and gives an insight into the processes shaping genome organization and
genome evolution.
• Many functionally important sequences are subject to purifying (negative) selection
whereby harmful mutations are selectively eliminated; these sequences are said to show
evolutionary constraint and seem to be strongly conserved.
• Some mutations in functionally important sequences are subject to positive selection
because they are beneficial. Such sequences underlie adaptive evolution.
• Neutral mutations, which are neither harmful nor beneficial, can become fixed in the
population through random genetic drift.
• Evolutionary novelty often arises through duplicated genes; after duplication, genes can
diverge in sequence and ultimately in function.
• Genes that originated by duplication of a gene within a single genome are known as
paralogs; orthologs are homologous genes in different species that descended from the
same gene in a common ancestor.
• One or both members of a duplicated gene pair can acquire a new function. Changes in
gene function can be due to mutations in the cis-regulatory elements that regulate the
gene’s expression.
• Comparative genomics has contributed to the assessment of evolutionary relationships
between organisms (phylogenetics).
• Morphological evolution is largely driven by mutations in cis-regulatory elements rather
than differences in protein sequences.
• Fragments from transposable elements can be co-opted to make new functionally
important sequences (exaptation); they can give rise to new exons (exonization) or to new
cis-regulatory elements.
• Closely related species such as humans and chimpanzees have almost identical gene
repertoires, but there are important differences between them in copy number of some
gene families, in differential inactivation or modification of some orthologs, and in
regulation of gene expression.
298 Chapter 10: Model Organisms, Comparative Genomics, and Evolution
Now that we have the sequence of the human genome, biologists ultimately hope
to get detailed answers to two big questions. First, at the molecular level, how
does genome information program the normal functioning of cells and organ-
isms? The corollary to this, of course, is: How does genome information become
altered or misinterpreted to lead to hereditary diseases? Second, what makes
individuals and species different? Getting detailed answers to these two funda-
mental questions will take a long time.
Obtaining genome sequences seemed to be an extremely arduous enterprise
in the pre-genome era of the 1990s and early 2000s. In the post-genome era, as
next-generation sequencing technologies become routine, we will soon be inun-
dated with genome sequences from a vast array of organisms and from multiple
individuals within species. With hindsight, we can now recognize that genome
sequencing was the easy part, and just the beginning of a fascinating voyage of
discovery.
To start to answer the question of how our own genes operate, we need to
identify all the genes and gene products, all the regulatory elements, and all the
other functional elements. In Chapter 12, we look at how gene function and gene
regulation are studied in cells and model organisms. Genome sequences also
give us the opportunity to compare representative individuals within and between
species. In Chapter 13, we consider variation between individual humans. In this
chapter, we address variation between species. We will consider how humans
relate to other organisms in the tree of life and the forces that have shaped genome
evolution.
Sequence comparisons across whole genomes from different organisms are
now the mainstay of a new discipline, comparative genomics, and they are begin-
ning to provide powerful new insights into our relationship to other organisms.
We have a major interest in a variety of model organisms from both basic and
applied research perspectives.
cals). Other unicellular organisms have been studied because they are patho-
gens, notably diverse bacterial species.
Archaea are not known to be associated with disease, and the major interest
in studying them is to know more about how they evolved to be so different. They
were initially found in unusual, often extreme environments, such as at very high
temperatures, in waters of extreme pH or salinity, or in oxygen-deficient muds of
marshes. They are now known to inhabit more familiar environments such as in
soils and lakes and have been found thriving inside the digestive tracts of cows,
termites, and marine life, where they produce methane. The metabolic and
energy conversion systems of archaea resemble those of bacteria but the systems
used to handle and process genetic information (DNA replication, transcription,
and translation) are more closely aligned to those of eukaryotes.
To model eukaryotic cell functions, most reliance has been placed on yeasts
because they are easy to culture and genetically very amenable, and because cer-
tain key molecules are known to have been strongly conserved in eukaryotes,
from yeasts to mammals. Yeasts are common on plant leaves and flowers, in soil,
and in salt water, and are also found on the skin surface and in the intestinal
tracts of warm-blooded animals, where they may live symbiotically or as para-
sites. They typically replicate asexually by budding rather than by binary fission:
the cytoplasm and dividing nucleus from the parent cell are initially a continuum
with the bud, or daughter yeast, before a new cell wall is deposited to separate the
two. Usually, yeasts are not associated with disease, but some Candida species
can cause common health problems, including candidiasis (thrush).
Two genetically amenable yeasts, Saccharomyces cerevisiae and
Schizosaccharomyces pombe, have been particularly well studied (see Box 10.1).
Genetic screens have provided a host of valuable mutants that have been partic-
ularly valuable in understanding aspects of the cell cycle and DNA repair that are
very relevant to our understanding of human cells and cancer and birth defects.
Various protozoan models are studied, including amebae, flagellates, and cili-
ates that move using, respectively, pseudopodia, flagella, and cilia. They are of
interest to biomedical researchers as models of various facets of cell and develop-
mental biology. Many are also parasites associated with disease, acting as hosts
for pathogenic bacteria that cause diseases such as legionnaire’s disease, salmo-
nellosis, and tuberculosis, or they may cause disease directly as in some trypano-
somes and malaria-causing Plasmodium species. As models of cell and develop-
mental biology, most interest has been focused on the ameba Dictyostelium
discoideum and the ciliate Tetrahymena thermophila.
Algae are plant-like organisms that exist in unicellular and multicellular
forms. The most relevant unicellular model is Chlamydomonas (see Box 10.1).
With the advent of massively parallel DNA sequencing (see Chapter 8, p. 220),
unicellular genome sequencing is now routine. As a result, it is now possible to
conduct a comprehensive investigation of the human microbiome, the collective
term for the microorganisms that live and interact inside and on humans (resid-
ing mostly on the skin and in the digestive tract, and outnumbering human cells
by a factor of 10 or so). As an extension to the Human Genome Project, an inter-
national effort of linked projects called the Human Microbiome Project has been
set up to sequence diverse genomes within the human microbiome.
There is also the expectation that synthetic model organisms will be created in
the near future. The first cellular genome to be completely synthesized—that of
Mycoplasma genitalium—was reported in early 2008. The 0.53 Mb synthetic
genome was assembled by a variety of genetic engineering procedures, but ulti-
mately derived from approximately 10,000 synthetic oligonucleotides about 50
nucleotides in length. This achievement came one year after another important
breakthrough: the first successful case of cellular genome transplantation. One
species, Mycoplasma capricolum, was changed into another, Mycoplasma
mycoides, by removing the host chromosome from M. capricolum and replacing
it with an equivalent chromosome isolated from M. mycoides.
Together with the ability to mutate genes, the above breakthroughs herald the
start of a new era of synthetic biology that will lead to the production of artificial
life. Synthetic unicellular organisms hold great promise for biomedical research
and the possibility of developing novel solutions to environmental problems,
such as the production of green biofuels and breaking down toxic waste. However,
MODEL ORGANISMS 301
even with inbuilt protection systems, there are important safety issues and the
threat of bioterrorism on a large scale.
Various fish, frog, and bird models offer accessible routes to the
study of vertebrate development
Mammalian development is not easy to analyze. First, egg cells are very small and
difficult to manipulate. Second, developing embryos implant into the uterine
epithelium; development then proceeds inside the mother, making it difficult to
access material for study. Fish, frogs, and birds produce eggs that can be large
and easy to manipulate, and the subsequent development proceeds externally to
the mother. For these reasons, various non-mammalian vertebrate models are
routinely studied to illuminate our understanding of vertebrate development.
302 Chapter 10: Model Organisms, Comparative Genomics, and Evolution
Caenorhabditis elegans is a roundworm 1 mm long. It can be The fruit fly Drosophila melanogaster has been studied extensively for
cultured easily in the laboratory, on agar plates (feeding on bacteria) decades, during which a vast amount of information has been built
or in liquid culture, and is very amenable to genetic analyses. The up about gene function. Some of the major features and approaches
predominant sex is hermaphrodite (XX), a modified female. By that have assisted genetic mapping and functional analyses are listed
producing both sperm and eggs it can self-fertilize, resulting in below.
increased homozygosity. Rarely, a male (X0) form develops through
(Courtesy of André
occasional loss of one of the two hermaphrodite X chromosomes, and
Karwath under the Creative
hermaphrodites will mate preferentially with males when available.
Commons Attribution
Several areas of study take advantage of the unique features of
ShareAlike 2.5 license.)
C. elegans:
Two types of small freshwater fish have been popular developmental models:
the zebrafish, which originated in India, and more recently the medaka (Japanese
killifish—Box 10.3). In addition, the genomes of pufferfish are remarkably free of
repetitive DNA, and two such species—Takifugu rubripes and Tetraodon nigro-
viridis—have been useful in aiding the discovery of genes and regulatory ele-
ments by comparative genomics.
Frogs of the genus Xenopus (African clawed frog) have been particularly
important models for investigating both embryonic development and cell biol-
ogy. The name Xenopus means strange foot and originates from the sharp claws
on the toes of the large, strong, webbed hind feet. Xenopus has been an important
MODEL ORGANISMS 303
TABLE 10.1 PRINCIPAL REASONS FOR STUDYING MAJOR INVERTEBRATE MODEL ORGANISMS
Organism class Principal models Major reasons why studied
parasitic Wuchereria bancrofti (causes elephantiasis) to understand pathogenesis and for economic reasons (some
Onchocerca volvulus (causes river blindness) nematodes attack crops)
Arthropods nonpathogenic Drosophila melanogaster (fruit fly) model of development, behavior, and neuroscience; also for
gaining insights into gene function, modeling some human
diseases at the cellular level, and rapid drug screening
Danaus plexippus (monarch butterfly) to study the cellular and molecular mechanisms underlying a
sophisticated circadian clock
pathogenic Anopheles gambiae (transmits Plasmodium these mosquitoes can transmit a variety of different parasites
falciparum, which causes malaria) and viruses
Aedes aegypti (transmits yellow fever and
Dengue fever viruses)
Echinoderms sea urchins, notably Strongylocentrotus models of development, and as basic deuterostomes
purpuratus
Ascidians Ciona intestinalis (sea squirt) (Figure 10.7B) model of development, and as a primitive chordate
Cephalochordates Amphioxus (Figure 10.7A) model of development, and because evolutionarily closely
related to vertebrates
model for establishing the mechanisms for early fate decisions, patterning of the
basic body plan, and organogenesis. There has also been seminal work on chro-
mosome replication, chromatin and nuclear assembly, cell cycle components,
cytoskeletal elements, and signaling pathways.
Birds have amniotic membranes, and avian development closely resembles
that of mammals. However, whereas a mammalian embryo depends on the
mother for its nutrition (with exchange occurring across the placenta), avian
embryos do not have a placenta and so are self-developing systems. Because a
bird embryo develops outside the body, it is accessible at all stages in develop-
ment. The chick is a good model largely because its embryo is easily obtained
(see Box 10.3).
cousins. Livestock animals, such as sheep and pigs, have some useful applica-
tions, but rodents are by a long way the most studied mammalian models, partly
because of their greater amenability to genetic studies and partly because of
financial and ethical considerations. Of these, the mouse is the premier model for
a variety of reasons, although an increasing number of studies involve rat models
MODEL ORGANISMS 305
(see Box 10.3). We consider mouse and rat models, largely in the context of genetic
models of disease, in Chapter 20.
Rodents are not good models for understanding some human systems, nota-
bly how our brain functions. Nonhuman primates resemble humans in physiol-
ogy, cognitive capabilities, detailed brain organization, social complexity, repro-
duction, and development, and have been particularly important in neuroscience
research. Being large and normally living in complex social groups, great apes are
expensive to maintain and their use in animal experimentation has been conten-
tious. As a result, most primate studies have been conducted on smaller primates
with shorter generation times, notably the rhesus macaque and marmoset (Table
10.2).
TABLE 10.2 PRINCIPAL REASONS FOR STUDYING MAJOR MAMMALIAN MODEL ORGANISMS
Organism Major reasons why studied
NONPRIMATE MODELS
Cat interactions between host gene and infectious pathogens, notably involving feline immunodeficiency virus; as a model of
genetic disease, and testing treatment regimens; model of reproductive physiology, endocrinology, and behavior
Cow, sheep, pig as models of human endocrinology and physiology; host–pathogen interactions in relation to food safety. Transgenic
sheep have also been used to produce valued human proteins in their milk
Dog as a model of genetic diseases (a long history of selective breeding means that many types of dog are prone to genetic
diseases such as cancer, heart disease, deafness, blindness, and autoimmune disorders); an important model for the
genetics of behavior, and used extensively in pharmaceutical research
Mouse principal model for understanding gene function and mammalian development, and for modeling human disease and
testing treatment regimens; as detailed in Chapter 20, sophisticated genetic tools and large numbers of mutants have
long been available
Rat large size compared with mouse makes it more suited to physiological, neurological, pharmacological, and biochemical
analyses. Has provided useful models for various complex human disorders, but genetics not as well developed as mouse
genetics; routinely used in testing drug toxicity and therapeutic efficiency
Chimpanzee for evolutionary reasons, being the species most closely related to humans; to identify the basis of its resistance to
developing Plasmodium falciparum-induced malaria and its comparative resistance to the HIV progression to AIDS
Marmoset used in studying brain function, drug sensitivity, immunity, and autoimmune disease. Also used for studying cognitive and
behavioral disorders
Rhesus macaque the most widely used primate model. The preferred model for vaccine development. Important model in neuroanatomy
and neurophysiology, and better suited than marmosets for studying higher brain function
A series of white paper discussion documents on individual model organisms and proposed genome projects can be accessed after clicking
on individual genome sequencing projects listed on the US National Human Genome Research Institute’s list of approved sequencing targets
(http://www.genome.gov/10002154).
306 Chapter 10: Model Organisms, Comparative Genomics, and Evolution
BOX 10.4 PRINCIPAL MECHANISMS THAT AFFECT THE POPULATION FREQUENCY OF ALLELES
Individuals within a population differ from each other, mostly as Random genetic drift has little effect in large populations. Even
a result of inherited genetic variation. Several factors affect the though the gametes that are transmitted are a small fraction of the
frequency of any mutant allele in a population. total gametes in the population, they are a statistically representative
Natural selection sample. However, when population sizes are small (e.g. as a result of
Natural selection is the process whereby some of the inherited genetic geographical isolation), the number of gametes that are transmitted
variation will result in differences between individuals in their capacity to the next generation will be proportionately smaller, and random
to engage in reproduction (affected by parameters such as mortality, genetic drift can cause considerable changes in allele frequencies
health, and mating success) and to produce healthy offspring (Figure 1). In the absence of new mutations and factors such as
(differences in fertility, fecundity, and viability of the offspring). The selection that affect allele frequency, alleles subject to random
fitness of an organism is a measure of the individual’s ability to survive genetic drift will eventually become fixed (the point at which the allele
and to reproduce successfully. In the simplest models, the fitness of an frequency in the population is 0% or 100%).
individual is determined solely by its genetic makeup, with each locus Interlocus sequence exchange
contributing independently. As a result, one can also talk about the Frequent sequence exchanges can occur between highly homologous
fitness of a genotype. gene loci that encode essentially identical products. For example,
Many mutations that occur in functionally important sequences human 5.8S rRNA, 18S rRNA, and 28S rRNA are encoded by multiple
decrease the fitness of the individuals that carry them (the mutation tandemly repeated transcription units (see Figure 1.22) that show
could remove or change a functionally important amino acid or very high levels of sequence identity. Mechanisms such as unequal
nucleotide, or alter splicing, or cause premature termination of an crossover and gene conversion cause sequence exchanges between
encoded protein). Carriers of the mutation will be reproductively less the different repeats, resulting in sequence homogenization (one
successful and so, to conserve functionally important sequences, sequence is replaced by another, so that one type of repeat can
harmful mutations are selected against and removed from the increase in frequency). As a result of frequent sequence exchanges,
population. This type of selection is known as purifying selection (also the multiple gene loci effectively behave as alleles. The frequency
called negative selection), and mutations are said to be evolutionarily of a specific gene repeat (allele) can be determined in part by the
constrained. Purifying selection results in a decrease in genetic frequency with which it engages in sequence exchange
diversity as the population stabilizes on a particular trait value, an
effect known as stabilizing selection. generation allele
Very rarely, a new mutation confers a selective advantage and frequency
increases the fitness of its carrier. Such a mutation will be subjected
0 0.5/0.5
to positive selection (also called Darwinian selection or advantageous
selection), which would be expected to foster its spread through a
population. A resulting change in phenotype can be described as
adaptive evolution. Mutations that do not confer any advantage or available 0.5/0.5
harm to their carriers are described as neutral mutations. gametes
If a locus has two alleles with different fitnesses, the fitness
of the heterozygote may be intermediate between the two types
of homozygote. The mode of selection in this case is co-dominant random
sampling: 1 0.4/0.6
and the selection will be directional, resulting in an increase in 10 gametes
the advantageous allele. In some cases, however, a new mutation
may be advantageous in heterozygotes but not in homozygotes
(heterozygote advantage). This situation, in which the heterozygote 0.4/0.6
has a higher fitness than both the mutant homozygote and the normal
homozygote, is a form of balancing selection known as overdominant
selection (see the example of cystic fibrosis in Box 3.7).
Random genetic drift 2 0.7/0.3
Changes in allele frequency can occur simply by chance, a process
known as random genetic drift. Even if all the individuals in a
population had exactly the same fitness so that natural selection could
not operate, allele frequencies would nevertheless change because 0.7/0.3
of random sampling of gametes. Such sampling occurs because,
through choice or circumstances, not all individuals in a population
will reproduce, and so only a small proportion of the available gametes
in any generation is ever passed on to the next generation. Even if
every single individual were to contribute two gametes to the next Figure 1 Random sampling of gametes in small populations can
generation, sampling would still occur because heterozygotes can lead to considerable changes in allele frequencies. [Adapted from
produce two types of gamete bearing different alleles but the two Bodmer WF and Cavalli-Sforza LL (1976) Genetics, Evolution And Man.
gametes passed to the next generation may, by chance, carry the With permission from WH Freeman & Company.]
same allele.
BOX 10.5 ASSAYING THE EVOLUTION OF CODING SEQUENCES BY USING Ka/Ks RATIOS
Coding sequences are, by definition, functionally important and so that is silent. However, for immune system genes that co-evolve with
they are subject to selection pressure. Most coding sequences are genes of invading microorganisms in a battle to recognize and kill the
subject to purifying selection to conserve functionally important invaders, the Ka/Ks ratio may be much greater than 1.0, which is very
amino acid sequences, and they seem to evolve slowly. However, strong evidence that positive selection is involved.
some coding sequences are subject to positive selection and Working out a Ka/Ks ratio over a full-length coding sequence
diverge rapidly in sequence in a way that is beneficial to the can be misleading because one part of a protein may be subject
organism. They include many immune system proteins that need to purifying selection while another domain is subject to positive
to be able to adapt quickly to be able to recognize rapidly evolving selection. Modern methods can use multiple alignments of
microbial antigens, many proteins involved in reproductive processes, orthologous sequences from several species within a phylogeny to
and many others. work out a Ka/Ks ratio for each codon.
The standard way of assessing whether a sequence is subject to Various potential pitfalls may affect the accuracy of the
positive or purifying selection is to calculate the ratio of the number calculations. One assumption—that synonymous substitutions are
of non-synonymous (amino acid-changing) substitutions per non- neutral substitutions—may not always be true. For example, the silent
synonymous site (Ka or dN) to the number of synonymous (silent) site may be part of a regulatory sequence within an exon, such as
substitutions per synonymous site (Ks or dS). This ratio is variously an exonic splice enhancer. Also, if the compared sequences are from
known as the Ka/Ks ratio or the dN/dS ratio, or w. When comparing distantly related organisms, multiple consecutive substitutions may
orthologs from different species, the overall Ka/Ks ratio is often very have taken place at a specific site but go unnoticed.
much less than 1.0 because a substitution that changes an amino For a Web-based server for calculating Ka/Ks ratios, see Selecton
acid is much less likely to be different between two species than one (http://selecton.tau.ac.il/).
COMPARATIVE GENOMICS 309
TABLE 10.3 SOME COMPUTER PROGRAMS USED TO ALIGN COMPLEX GENOME SEQUENCES AND DETECT
EVOLUTIONARILY CONSTRAINED SEQUENCES
Task Program Characteristics URL/reference
Reconstructing MERCATOR uses only coding exons as initial anchoring points http://www.biostat.wisc.edu/~cdewey/mercator/
homologous
collinearity GRIMM analyzes rearrangements in pairs of genomes http://grimm.bioprojects.org/GRIMM/index.html
MLAGAN multiple global alignment of genomic sequences Brudno et al. (2003) Genome Res. 13, 721–731
Multiple TBA/MULTIZ TBA is designed to be able to align many Miller et al. (2007) Genome Res. 17, 1797–1808; the
alignment megabase-sized regions of multiple mammalian 28-way vertebrate alignment can be viewed with
genomes; to perform the dynamic-programming the UCSC genome browser at http://genome.ucsc
step, TBA runs a stand-alone program called .edu, downloaded in bulk by anonymous FTP from
MULTIZ, which has been used to align much of the http://hgdownload.cse.ucsc.edu/goldenPath/hg18
human genome sequence with the sequences of /multiz28way, or analyzed with the Galaxy server at
27 other vertebrate genomes http://g2.trac.bx.psu.edu
Constraint PHASTCONS identifies evolutionarily conserved elements in a Siepel et al. (2005) Genome Res. 15, 1034–1050;
detection multiple alignment, given a phylogenetic tree http://compgen.bscb.cornell.edu/~acs/phast
Cons-HOWTO.html
SCONE gives a sequence conservation score after Asthana et al. (2007) PLoS Comp. Biol. 3, 2559–2568
estimating the rate at which each nucleotide is
evolving, and then computes the probability of
neutrality given the estimated rate
For further information on these and additional programs, see Margulies & Birney (2008) in Further Reading.
SC ATGACTCTCCTACTTCAAA-AATGTATCAGAGTAGAACTGC-GGAAAGATCCAATATTTA
SP ACGACTCTCCTACCACAAATAATGCGTCGGAAAAGAACTGCTGGGCAAATCCAATGTTTG
SM TTGATTTCACTAACCCAAACACTGTATCAAAAAGGAAAAGTTAAACAAACATAATGTTCA
SB -----------------GAAACTTTTCAACAAACGTAGTTTAAACTTGGTGCCGTCTTAA
* * * * * * * **
SC ACGCTTCT-TCTTCACTCGAAATATTTCTGTCAGTTTTAGAATCTGTTATATAT-CCCTT
SP ACACTGCC-TCTGGACCTGAAATGTCCCCGTCTATTCCAGAAGCTCTTATAGCTCCCCTT
SM ATATTTGTGTCTGGTATTCACTTGGATC-ACCTGTCTTGGAATCTCTTATATCT--TCTC
SB GATGTT---AAGCCAATCTAATTCTTCCGGTGTGTTTGGTCCTCGAGAATGTTGTTATCG
* * * * * * **
SC TAAAGGAAACTTGATACTTCTAGATTTTCATCTT-GTC-GATCTTCAAACAACGTGTTA-
SP CAAAGGAAATTTCGTATATTCATA-TCTCATCTTCATC-AATGTTCAAACAGCGTGTTA- Figure 10.4 Using comparative genomics
to test the validity of a predicted gene.
SM CAAAGTAGATCTGATGTCACTAGA-TCTCATTTATGTCAAATGATTGAACAATGTATTCG
The S. cerevisiae YDR102C sequence (SC)
SB CATTGCTCAAAACAAAGTCCTATATTTGTACCTATCTC-AATGGCAGGGCAACAAGTTGT is aligned here with homologs from
* * * * * * * * ** *** ** ** S. paradoxus (SP), S. mikatae (SM), and
S. bayanus (SB). YDR102C was originally
SC -----TCAGAGAAACAGCTCTATGAGAAATCAGCTGATG-TCAGTATTCTGTCAGTCACT identified as an open reading frame (ORF)
SP CAGACACAGAGAAACAGCTTCATGAGAAGTCAGCCGGTG-TCAGTGCTCTGACATCCGCT that was predicted to encode a 111 amino
SM CAGACACCAGTTAACAACAGCAAGATAAATTGAGGGACG----GTGCTCTACTGCCTGCT acid protein. Asterisks indicate nucleotide
SB AAG--CATGAGCAGCAGTAGCGAAAGGAACCGCCTGAAAAATAGTACTCGGTAATCTTCA positions where there is identity in the four
* ** * * * ** * * * sequences. Trinucleotides underlined in red
indicate potential premature termination
SC TCGTGAATTTACAT-TGACTGCTCTTTCGTGCAATTCATACTGTATTATTATAGTTGCAA codons. Not only is the start codon (green
box) not conserved but the alignments
SP CAGTGAACCCATGTATGACTGCTCTTTCGTGCAATTCAATCTGTA---TTCTAGCCGCAA require frequent insertions and deletions
SM CAGATAACTCATCG-TGACTGCTCTTCTGTACGGTCCAACCTGTATTCTCTT--CTTAAA (arrows). Thus, the homologs in the other
SB CATACTTCTTAC---CAGCTGTTCTCTCGTGTGGGCCAATCTGTATCTGACTATCATCTA Saccharomyces species could not make a
* *** *** ** ** ** *** * * protein like the one that had been predicted
for the YDR102C ORF, raising doubts about
SC CTAGATGGCAAAAAATT--GATAATGCCCTAAAAATGACAGGCCGGTAA whether the YDR102C sequence is really
SP TTAAACGACAGAAAATTTATATAATGCCCTAATGATGACAGGTTGATAA - start
a coding sequence. [From supplementary
SM TTAGAAGACGGGAAGTT--GATAATGCCCTATTGACGACAGGGCATAAA - stop information in Kellis M, Patterson N, Endrizzi
SB CTAGCC-TCAAAGAATT--GATAATGCCCTAACGGCGACAGGCCACAA- - in/del M et al. (2003) Nature 423, 241–254. With
** * * ** ****** * **** **** ** * permission from Macmillan Publishers Ltd.]
312 Chapter 10: Model Organisms, Comparative Genomics, and Evolution
inert, but high-throughput sequence sampling now suggests that at least 85% of
the human genome is transcribed. In addition, as we describe later in this chap-
ter, highly repetitive retrotransposons can give rise to functionally important
sequences including enhancer sequences, exons, and even some genes.
(A) (B)
Nkx2.5.IV
Nkx2.5.III
Nkx2.5.II
Nkx2.5.I
Ankrd1.I
Smyd1.I
Hand2.I
Mef2c.I
Gata6.I
sensitivity
Tbx1.I
Des.I
% identity
human–opossum >245,000 CNEs rat 90–100
human 80–90
chimp 70–80
human–chicken >65,000 CNEs dog 60–70
cow 50–60
opossum <50%
human–xenopus >29,000 CNEs chicken
xenopus
zebrafish
specificity human–fugu >11,000 CNEs fugu
GENE AND GENOME EVOLUTION 313
TABLE 10.4 PRINCIPAL GENOME BROWSERS FOR IDENTIFYING CONSERVED NONCODING SEQUENCES IN VERTEBRATES
Genome browser Conserved elements identified with Sequence alignment based on URL
strings under comparison are returned. Global alignments are less prone to demonstrating false homology because each letter of one sequence
is constrained to being aligned with only one letter of the other. However, local alignments can cope with rearrangements between non-syntenic,
orthologous sequences by identifying similar regions in sequences; this, however, comes at the expense of a higher false positive rate. Glocal
alignment is a combination of global and local alignment that allows the user to create a map that transforms one sequence into another while
allowing for arrangement events.
shuffling
The discovery in the mid- to late 1970s that most eukaryotic genes are split by
introns was a surprise. Such introns are known as spliceosomal introns because
the splicing reaction requires dedicated spliceosomes, unlike autocatalytic (self- (B)
splicing) introns. The latter are found in both bacteria and eukaryotes, but their
distribution is very restricted, being primarily found in rRNA and tRNA genes,
and in a few protein-coding genes found in some types of mitochondria and
chloroplasts.
Spliceosomal introns are believed to have originated from a subclass of auto-
catalytic introns known as group II introns that subsequently lost their capacity
for self-splicing. They are thought to have arisen in eukaryotes and were probably
inserted into genes at an early stage in eukaryote evolution, but there is evidence
for some later integrations too. However, the great majority of introns pre-dated
chordate evolution. Although humans and the primitive chordate, Amphioxus,
diverged from a common ancestor approximately 550 million years ago, com-
parison of the human and Amphioxus genomes reveals that 85% of the introns
are conserved at precisely analogous locations.
During the evolution of complex genomes, spliceosomal introns expanded in
size by the insertion of different types of DNA sequence, notably repetitive DNA
sequences originating by retrotransposition. As a result, exons came to be well Figure 10.7 Amphioxus and tunicates
separated in genomic DNA. Splitting functionally important sequences as exons represent two subphyla of chordates.
in genomic DNA introns facilitated the copying of exons in evolutionarily advan- (A) An example of an amphioxus (lancelet),
Branchiostoma lanceolata. (B) A tunicate,
tageous ways, allowing duplication and new combinations of important
Ciona intestinalis. [(A) courtesy of Hans
sequences. Hillewaert under the Creative Commons
Attribution ShareAlike 2.5 License.
Exon duplication
(B) courtesy of Clément Lamy under the
Tandem duplication of exons is evident in about 10% of genes in humans, D. mel- Creative Commons Attribution ShareAlike
anogaster, and C. elegans, and is often responsible for generating repeated pro- 3.0 License.]
tein domains (Figure 10.8). Unequal crossover (or unequal sister chromatid
exchange) can result in intragenic duplication so that a segment of genomic DNA
spanning one or more exons is duplicated.
(A)
E1 E2 E3 E18 E19
Gene duplication can permit increased gene dosage but its major
value is to permit functional complexity
Human genes have been duplicated by various duplication mechanisms (see
Chapter 9, p. 263). Gene families are a major general feature of metazoan genomes,
and one gene in a hundred is estimated to be duplicated and fixed in the popula-
tion every million years. When a gene duplicates within a genome, the two result-
ing gene copies are initially identical. In rare cases, the duplicated genes may be
maintained to produce identical, or essentially identical, gene products because
increased amounts of gene product are advantageous in some way. The large
numbers of genes that make essentially identical copies of different ribosomal
RNA and histone proteins come into this category.
In response to an altered environment, other genes may undergo duplication
to provide a selective advantage through increased gene dosage, as illustrated in
the recent adaptation of human populations to starch diets. Starch is digested
using the enzyme salivary amylase. The chimpanzee has a single salivary amylase
orthologs orthologs
gene, but in humans a series of evolutionarily very recent tandem gene duplica-
tions has produced multiple copies of such genes at the AMY1 gene cluster at
1p21. AMY1 copy number varies significantly between haplotypes, and AMY1
copy number positively correlates with the expression of salivary amylase. Human
populations that historically consume a high-starch diet have significantly more
AMY1 copies than populations that traditionally consume a low-starch diet;
increased AMY1 copy number seems to be an adaptive response to increased
starch in the diet.
Gene duplication to provide increased gene dosage is comparatively rare.
More frequently, after gene duplication the sequences of the two genes diverge
quite extensively. One of the duplicated gene copies may become a pseudogene
(see Box 9.2). Alternatively, two divergent functional gene copies are retained.
The divergent gene copies may acquire different properties and often they come
to be expressed in different ways that are functionally advantageous. Gene dupli-
cation is thus thought to be a major motor that drives increasing functional com-
plexity. The two homologous gene copies are described as paralogs (as opposed
to orthologs, which are present in different species; Figure 10.10).
Retrogenes offer an immediate possibility for functional divergence: because
the gene copy is made at the cDNA level, it lacks the promoter and both neigh-
boring and intronic regulatory elements of the parent gene (see Figure 9.12).
Continued expression of a retrogene is therefore dependent on regulatory
sequences in the neighborhood of its integration site that are usually rather dif-
ferent from the sequences that regulate the parent gene. As a result, the parent
gene and retrogene copy can be expressed in different ways, often in different cell
types. After being exposed to a different cellular (and molecular) environment, Figure 10.11 Functional divergence of
the retrogene can come to acquire different functions. duplicated genes. (A) Neofunctionalization.
Here, one of the duplicated genes retains
Genes duplicated at the genomic DNA level can also acquire different func-
the function of the ancestral gene. The
tions (Figure 10.11). One possibility is that one of the duplicate genes acquires a other paralog undergoes mutations (green
distinctive new function (neofunctionalization). Pure neofunctionalization is arrow) that result in its having altered
rare, however. Usually both genes undergo expression changes over evolutionary characteristics (green box, but could be in
time. Sometimes the duplicated genes acquire complementary mutations in dif- regulatory sequences unlike as shown here),
ferent cis-regulatory regions so that they diverge in expression while between leading it to acquire a new function.
them initially maintaining the expression domains of the ancestral gene. As a (B) Subfunctionalization. The duplicated gene
result, different subsets of the functions of the ancestral gene can be partitioned copies undergo complementary deleterious
mutations, often at the level of regulatory
(A) elements. In this example, the ancestral
gene is imagined to be regulated by two
sets of upstream control elements (blue and
red symbols) so that it is expressed in two
different cell types, or different tissue types,
or at different developmental stages (blue
and red arrows). Mutations (jagged arrows)
(B)
inactivate one set of control elements in
one paralog and the other set in the second
paralog so that the expression patterns
and functions of the ancestral gene are
partitioned between the two daughter
genes.
GENE AND GENOME EVOLUTION 317
ancestral globin
~800 MYA
~550 MYA
~500 MYA
~450 MYA
~200 MYA
~150 MYA
NGB CYGB MB HBZ HBA2 HBA1 HBE1 HBG2 HBG1 HBD HBB
14q 17q 22q 16p 11p
43.2%
Figure 10.12 Evolution of the vertebrate globin superfamily. Human globin genes are located on five chromosomes (bottom).
Those resulting from evolutionarily ancient gene duplications have subsequently been separated by chromosomal rearrangements.
The more recent duplications have resulted in gene clusters on chromosomes 11 and 16. Globins encoded within a gene cluster
show a greater degree of sequence identity (64.1–99.3%) than globins on different chromosomes (e.g. 43.2% between a- and
b-globin; sequence identities of the a- and b-globin genes to globins on the other chromosomes are significantly less than 30%).
Some of the duplications have been recent: the HBA1 and HBA2 genes encode identical a-globins, and the HBG1 and HBG2 genes
encode g-globins that differ by a single amino acid. Other vertebrates show some differences. Sometimes there are copy number
differences (for example, mice have a single a-globin and two b-globin genes, goats have a triplicated b-globin gene cluster,
and many fish have two cytoglobin genes). In addition, there are qualitative differences, such as a globin gene that is expressed
specifically in the eye of birds. MYA, million years ago.
318 Chapter 10: Model Organisms, Comparative Genomics, and Evolution
functions in the case of neuroglobin. Fish have two cytoglobin genes that are
expressed widely but have diverged in gene expression: the cygb-2 gene is
expressed very strongly in neuronal tissues, unlike the cygb-1 gene.
The blood globins have subsequently undergone further specialization, but
this time differences in gene regulation result in differences in developmental
timing. Thus, in early development, z-globin is expressed instead of a-globin,
which will take its place at later stages, and e-globin (embryonic period) and
g-globins (fetal period) are used instead of b-globin, which begins to be synthe-
sized at three months’ gestation and gradually accumulates as g-globin produc-
tion declines. One rationale is that the globins used in hemoglobin during embry-
onic and fetal periods are better suited to the more hypoxic environment at these
stages.
As we shall see later on in this chapter, different metazoan species can have
very different copy numbers for a specific gene family and, in some cases, the dif-
ferences can clearly be seen to be evolutionarily advantageous.
diploid genome
WGD
tetraploid genome
urochordates (tunicates)
more than 50% of the genes in the genome of P. tetraurelia are clearly seen to be
duplicated. The Paramecium studies show that after WGD, gene loss does not
cephalochordates
occur initially on a massive scale but instead occurs over a very long time-scale,
and that highly expressed genes and genes encoding proteins that function as
vertebrates
WGD
part of complexes are more likely to be retained after WGD.
Within vertebrates, polyploidy is comparatively common in amphibians and
reptiles, and some bony fish are polyploid. Constitutional polyploidy in mam-
mals is expected to be extremely rare, however, because of gene dosage difficul-
Ciona amphioxus bony other
ties arising from the X–Y sex chromosome system. fishes vertebrates
Comparative genomics has defined time points for some of the major WGD
events. For most land-dwelling vertebrates, the most recent WGD occurred early Figure 10.14 Major whole genome
in chordate evolution after the split of the lineage giving rise to tunicates such as duplication events arising in vertebrate
lineages. The evolutionary lineages leading
Ciona from the vertebrate lineage, when there seems to have been two rounds of
to most of the current vertebrates have
genome duplication. A major WGD event also occurred in the lineage leading to undergone two major whole genome
teleost fishes (the bony fishes, including zebrafish, pufferfish, and medaka), sub- duplication (WGD) events that took place
sequent to divergence from mammals (Figure 10.14). after the split from tunicates some time
within the approximate interval of 525–875
Major chromosome rearrangements have occurred during million years ago. In addition, a major WGD
mammalian genome evolution event occurred about 340–350 million years
ago in the lineage leading to bony fishes.
Large-scale chromosome rearrangements have been frequent during mamma-
lian genome evolution. Sometimes chromosome evolution can be seen to be
uncoupled from phenotype evolution. A classic example is provided by the
Chinese and Indian species of muntjac, a type of small deer. The two species are
so closely related that they can mate and give birth, although the offspring are
not viable. The Chinese muntjac has 46 chromosomes, but as a result of various
chromosome fusion events, the Indian muntjac has far fewer, but much larger,
chromosomes (Figure 10.15).
(A) (B)
X–Y XY (male); XX (female) prevalent in mammals, but also in Y chromosome carries male-determining
some amphibians and in some fishb factor; sperm are X or Y and so determine
the sex
Z–W ZW (female); ZZ (male) prevalent in birds and reptiles, but also eggs are Z or W and so determine the sex
in some amphibians and in some fishb
X-autosome ratio XX (female/hermaphrodite); in many types of insect and some C. elegans is an example in which
X0a/XY (male) nematodes XX individuals are hermaphrodites;
D. melanogaster has a Y chromosome but it
does not confer maleness
Haplo-diploid relies on ploidy only in many types of insect unfertilized eggs produce haploid fertile
males; fertilized eggs are diploid and give rise
to females or sterile males
aThe 0 denotes zero, so that X0 means a single X. bThere can be different systems in the same kind of organism. For example, some medaka species
have the X–Y system and others have the Z–W system.
(homomorphic) but are presumed to differ at the gene level. In other cases, the
sex chromosomes can clearly be seen to be structurally different and are said to
be heteromorphic.
In heteromorphic sex chromosomes, one sex has two different sex chromo-
somes and is said to be heterogametic, being able to produce gametes with either
one of the two sex chromosomes. The other sex is homogametic and produces
gametes with the same sex chromosome. For some species, the male is hetero-
gametic, whereupon the sex chromosomes are designated X and Y. For other
species, the female is heterogametic and the sex chromosomes are designated Z
and W (see Table 10.5).
In both the X–Y and Z–W sex chromosome systems, the chromosome that is
exclusively associated with the heterogametic sex (Y and W) is comparatively
small and poorly conserved with very few genes and with large amounts of repet-
itive DNA. For example, the human X chromosome contains many genes (includ-
ing close to 900 protein-coding genes) within 155 Mb of DNA. The Y chromosome
has just 59 Mb of DNA and much of it is genetically inert constitutive heterochro-
matin. There are more than 80 protein-coding genes on the Y chromosome, but
several genes are repeated and effectively the human Y chromosome encodes
only close to 30 different types of protein.
In female meiosis, the two X chromosomes pair up to form bivalents much
like any pair of autosomal homologs, but X–Y pairing in male meiosis is more
problematic because of major differences between the X and Y chromosomes in
size and sequence composition. Recombination between the human X and Y
chromosomes occurs exclusively in short terminal regions that, as a result of
engaging in frequent X–Y crossovers, are identical on the X and Y chromosomes.
Because of recombination these sequences can move between the X and Y chro-
mosomes; they are neither X-linked nor Y-linked and are known as pseudoauto-
somal regions.
The remaining regions of the sex chromosomes, containing the great majority
of their sequences, do not recombine in male meiosis and so are sex-specific
regions. Because the Y chromosome is exclusively male, the non-recombining
region on the Y chromosome never engages in recombination and is sometimes
referred to as the male-specific region. The equivalent region on the X chromo-
some does, however, recombine with its partner X chromosome in female
meiosis.
The Xp/Yp and Xq/Yq pseudoautosomal regions are allelic sequences, but the
X-specific and Y-specific components outside these areas are rather different in
sequence. Nevertheless, there are multiple regions of X–Y homology in the
X-specific and Y-specific regions that indicate a common evolutionary origin for
the X and Y chromosomes. They include 25 gene pairs with functional homologs
on the X and Y, known as gametologs (Figure 10.17).
322 Chapter 10: Model Organisms, Comparative Genomics, and Evolution
2.6 Mb at the extreme tips of Xp and Yp. The minor one, PAR2, spans 330 kb at the RPS4X EIF1AY
extreme tips of Xq and Yq. PAR1 is the site of an obligate crossover during male
meiosis that ensures correct meiotic segregation. As a result, the sex-averaged RBMY
boundary
PAR1 sex-specific
ANT3
PPP2R3B CSF2RA DHRSXY
ARSE ARSF
TEL PGPL SHOX XE7 IL3RA ASMTL ASMT Tramp MIC2 5¢ XG 3¢ XG ARSD
X
diverged
diverged
diverged
diverged
mouse + – – – – – – –– – – – mouse X
orthologs orthologs
diverged
diverged
ANT3 mouse Y
PPP2R3B CSF2RA DHRSXY orthologs
TEL PGPL SHOX XE7 IL3RA ASMTL ASMT Tramp MIC2 5¢ XG SRY RPS4Y
Y
Figure 10.18 Organization and evolutionary instability of the major pseudoautosomal region. The 2.6 Mb human PAR1 region is common to the
tips of Xp and Yp and contains multiple genes, of which 13 protein-coding genes are shown here. Adjoining PAR1 are the sex-specific parts of the X
and Y chromosomes, with only the first few genes shown here. The PAR1 boundary occurs within the XG blood group gene, so that the 5¢ end of XG is
in the pseudoautosomal region whereas the 3¢ end of XG is X-specific. On the Y chromosome, there is a truncated 5¢ XG gene homolog containing just
the promoter and the first few exons. The adjacent Y-specific DNA contains unrelated sequences including SRY and RPS4Y. Genes within PAR1 and in
the neighboring sex-specific regions (which were part of a former pseudoautosomal region) have been poorly conserved during evolution, and mouse
orthologs are often undetectable (–) or have extremely diverged sequences with low similarity to the human ones. TEL, telomere.
GENE AND GENOME EVOLUTION 323
PROTOTHERIA
METATHERIA
case of the major male determinant, SRY (located just 5 kb from the PAR1 bound-
EUTHERIA
ary) and the neighboring steroid sulfatase gene, STS.
Unlike the human X chromosome, which can recombine throughout its length
with a partner X chromosome in female meiosis, the human Y chromosome can
be viewed as a mostly asexual (non-recombining) chromosome. Population
genetics predicts that a non-recombining chromosome will acquire harmful
mutations. This happens because natural selection acts through recombination
to eliminate deleterious mutations.
Imagine a deleterious mutation arising in some gene on a chromosome that
has another gene with a beneficial allele. Natural selection can act through
recombination to uncouple the beneficial and harmful mutations, retaining the
beneficial mutation and eliminating the harmful one. But if they cannot be
uncoupled—because recombination between them is not possible—deleterious
mutations can hitchhike along with strongly beneficial ones, or if strongly delete-
rious mutations are eliminated by selection, linked beneficial mutations will also
be discarded.
Once harmful mutations accumulate in the non-recombining Y and cause
loss of gene function, inactive pseudogenes are formed. There is no selective
pressure to retain the relevant DNA segment, and genes will be progressively lost.
Normal DNA turnover mechanisms will ensure that the chromosome gradually,
but inexorably, contracts by a series of deletions.
The X chromosome is thought to have much the same DNA and gene content
as the ancestral autosome. By contrast, the human Y chromosome has lost the
great majority of its genes and much of the DNA content and has been envisaged
to be heading for extinction. On the basis that the Y chromosome had the same
number of genes as the X several hundred million years ago but is now reduced
to a very small number, one calculation has suggested that the Y chromosome
would be driven to extinction within only 10 million years from now. As we will
see in the following section, that calculation did not foresee that gene conversion,
a mechanism involving non-reciprocal sequence transfer, can occur in parts of
the Y chromosome.
TABLE 10.6 ABOUT 75% OF PROTEINCODING GENES ON THE HUMAN MSY REGION FUNCTION IN THE TESTIS
MSY sequence Gene Gene(s) Copy number Expression Function
classa number
RBMY 6
BPY2 3
DAZ 4
aSee Figure 10.21 for the location of the different gene classes in the human MSY (male-specific region of the Y chromosome). Data adapted from
DNA sequence (a copy is made of the donor DNA sequence that is then used to
replace the original acceptor sequence—see Box 14.1 for the mechanism). In the
context considered here, the outcome is that one arm of a palindrome acts as a
gene conversion donor sequence and the other arm as an acceptor so that the
sequence of one arm is periodically replaced by a perfect copy of the sequence of
the other arm. Essentially, this is a type of intrachromosomal recombination and
may act so as to preserve the testis-expressed genes from the extinction that they
might otherwise have faced in the absence of recombination.
Amniotes—vertebrates that develop an amnion. Includes reptiles, Hagfish—a type of fish that have skulls but are classed as
birds, and mammals but not fishes and amphibians. invertebrates because the notochord does not get converted into a
Anthropoids—hominoids and monkeys. vertebral column. They are closely related to vertebrates.
Ascidians—sea squirts (e.g. Ciona intestinalis), a type of tunicate. Hominids—humans plus great apes (gorilla, chimpanzee, and
Bilaterians—bilaterally symmetrical metazoans, including the bonobo) plus orangutans (see Figure 10.26).
majority of animal phyla. They mostly have three germ layers and so Hominins—humans plus great apes (see Figure 10.26).
are now almost synonymous with triploblasts. Hominoids—humans plus great apes plus orangutans plus gibbons
Catarrhine primates—hominoids and Old World monkeys (see Figure (see Figure 10.26).
10.26). Metatherian mammals—marsupials (e.g. kangaroo).
Cephalochordates—chordates that do not have a skull or vertebral Monotremes—prototherian egg-laying mammals (e.g. platypus).
column (e.g. lancelets, Amphioxus). New World monkeys (Platyrrhini)—monkeys that are limited to
Chordates—animals that undergo an embryonic stage in which they tropical forest environments of southern Mexico, and Central and
possess a notochord, a dorsal tubular nerve cord, and pharyngeal South America.
gill pouches; includes urochordates (tunicates), cephalochordates, and Old World monkeys (Cercopithecidae)—monkeys that are found in a
craniates. wide variety of environments in South and East Asia, the Middle East,
Cnidarians (= coelenterates)—eukaryotes that have two germ layers and Africa.
and are radially symmetrical (e.g. jellyfish, corals, and sea anemones). Phylum—a group of species sharing a common body organization.
Coelomates—organisms with a coelom, or fluid-filled internal Platyrrhine primates—New World monkeys.
body cavity that is completely lined with mesoderm, as opposed to Prototherian mammals—monotremes.
acoelomates (e.g. flatworms) and pseudocoelomates (e.g. nematodes Protostomes—from the Greek meaning first mouth. Organisms in
that have a fluid-filled body cavity but one that is not completely which the mouth originates from the blastopore (the opening of the
lined with mesoderm). Divided into two groups, protostomes and primitive digestive cavity to the exterior of the embryo). Later on, the
deuterostomes. anus will open opposite the mouth. Includes molluscs, annelids, and
Clade—a group of organisms including all the descendants of a last arthropods. Compare with deuterostomes.
common ancestor. A clade is a monophyletic taxon. Radiata—radially symmetrical animals with only two germ layers. Also
Craniates—chordates with skulls; that is, vertebrates plus hagfish. known as diploblasts.
Deuterostomes—from the Greek, meaning second mouth. Animals in Taxon—a group of organisms recognized at any level of the
which the blastopore (the opening of the primitive digestive cavity to classification.
the exterior of the embryo) is located to the posterior of the embryo Teleost fish—bony fish (as opposed to cartilaginous fish),
and becomes the anus. Later on, the mouth opens opposite the anus. characterized by a fully moveable upper jaw, rayed fins, and a swim
Includes chordates and echinoderms. Compare with protostomes. bladder (e.g. zebrafish and pufferfish).
Diploblasts—metazoans with only two germ layers: ectoderm Triploblasts—metazoans with three germ layers; they are bilaterally
and endoderm. This group is often classified as radiata, symmetrical and hence can also be described as bilaterians.
because its members are radially symmetrical. It comprises Tunicates—a primitive chordate group (sea squirts, e.g. Ciona
cnidarians and ctenophora (comb jellies). intestinalis).
Echinoderms—radially symmetrical marine adult animals Urochordates—chordates that have a notochord limited to the caudal
that develop from bilaterally symmetrical larvae. They have three region. There is only one major grouping, the tunicates.
germ layers and so are triploblasts.
Eutherian mammals—placental mammals. For more information on phylogenetic groupings, see the NCBI
Gnathostomes—jawed vertebrates. The great majority of vertebrates Taxonomy Browser (http://www.ncbi.nlm.nih.gov/Taxonomy/).
are gnathostomes, but lampreys are jawless vertebrates.
A rooted tree (or cladogram) infers the existence of a common ancestor (rep- (A)
A C
resented as the trunk or root of the tree) and indicates the direction of the evolu-
tionary process (Figure 10.23B). The root of the tree may be determined by com- D
paring sequences against an outgroup sequence (one that is clearly but distantly
related from the sequences under study). An unrooted tree (Figure 10.23A) does B E
not infer a common ancestor and shows only the evolutionary relationships
internal
between the organisms. Note that the number of possible rooted trees is usually nodes
much higher than the number of unrooted trees.
Figure 10.24 Constructing a phylogenetic tree with UPGMA. In this example, the N-terminal 50 amino acids of z-globins from human and six
other vertebrate species were aligned, and a rooted tree was constructed by using the UPGMA (unweighted pair group method with arithmetic mean)
hierarchical clustering method, which first calculates a distance matrix. (A) The human z-globin sequence is used as a reference for the alignment.
Sequence identity is shown by a dot. (B) A distance matrix is constructed in which pairwise comparisons are used to establish the fraction of amino acid
sites that are different between two sequences. For example, the human and chimp sequences differ at one amino acid out of 50, and so the fraction is
0.02. (C) The distance matrix values are used to construct the phylogenetic tree. Note that the lengths of the horizontal branches are proportional to the
distances between the nodes. Phylogeny software packages such as PHYLIP are available at various Web servers (e.g. at the Institut Pasteur in Paris,
http://bioweb2.pasteur.fr/phylogeny/intro-en.html) and typically offer neighbor joining (unrooted trees) and UPGMA (rooted trees) as alternatives.
OUR PLACE IN THE TREE OF LIFE 329
may be possible for a small number of sequences. For a large number of sequences,
the number of trees generated becomes extremely large and the computational
time needed to identify the best evolutionary tree becomes impossibly long. In
that case, a subset of possible trees is created by using heuristics: methods are
used that will produce an answer in a computable length of time, even if it may
not be the optimal one.
Once an evolutionary tree has been derived, statistical methods can be used
to gain a measure of its reliability. A popular method is bootstrapping, a form of
Monte Carlo simulation. Typically, a subsample of the data is removed and
replaced by a randomly generated equivalent data set; the resulting pseudose-
quence is analyzed to see whether the suggested evolutionary pattern is still
favored. If there is a clear relationship between two sequences, the randomiza-
tion introduced by the resampling will not erase it. If, however, a node connect-
ing the two sequences in the original tree is spurious, it may disappear on ran-
domization because the randomization process changes the frequencies of the
individual sites.
Bootstrapping often involves resampling subsets of data 1000 times. The boot-
strap value is typically given as a percentage. A value of 100 means that the simu-
lations fully support the original interpretation; values of 95–100 indicate a high
level of confidence in a predicted node; values less than 95 do not mean that the
original grouping of sequences is wrong, but that the available data do not pro-
vide convincing support.
As a result of a combination of classical and molecular phylogenetic
approaches, humans have been classified as belonging to a range of different
groups of organisms. Figure 10.25 shows simplified phylogenies for eukaryotes
and metazoans, and Figure 10.26 shows a vertebrate phylogeny.
(A)
BACTERIA
4000 ARCHAEA
MYA
EUKARYOTES
3800 taxa, etc. examples
MYA
metamonads Giardia lamblia
parabasilids Trichomonas
choanoflagellates
Figure 10.25 Simplified
METAZOANS (see panel B)
eukaryotic and
metazoan phylogenies.
(A) Life is divided into
three domains—bacteria,
archaea, and eukarya—
(B) taxa, etc. examples and the eukaryotic
poriferans sponges phylogeny is shown here.
The choanoflagellates
DIPLOBLASTS (also called collar
cnidarians jellyfish, corals, flagellates) are considered
sea anemones to be the closest living
protist relatives of the
sponges, the most
BILATERA = TRIPLOBLASTS
primitive metazoans. (B) In
the metazoan phylogeny,
PROTOSTOMES
diploblasts have two germ
platyhelminths flatworms layers whereas triploblasts
molluscs octopus
annelids earthworms have three germ layers
arthropods D. melanogaster and are bilaterally
Anopheles gambiae symmetrical. Note that the
nematodes C. elegans
C. briggsae fundamental protostome–
1000 deuterostome split, which
MYA DEUTEROSTOMES occurred about 1000
hemichordates acorn worms million years ago (MYA),
echinoderms sea urchins, starfish, means that C. elegans
sea cucumbers and D. melanogaster are
more distantly related
CHORDATES from humans than are
urochordates some other invertebrates
tunicates Ciona intestinalis
(ascidians) such as the sea urchin
cephalochordates
cephalochordates amphioxus Strongylocentrotus
craniates Hyperotreti hagfishes purpuratus, and notably
VERTEBRATES (see Figure 10.26) other chordates including
amphioxus and the
tunicate Ciona intestinalis.
OUR PLACE IN THE TREE OF LIFE 331
TABLE 10.7 THE INCONSISTENT RELATIONSHIP BETWEEN GENE NUMBER AND ORGANISMAL COMPLEXITY
Species Description Genome size (Mb) Number of protein-coding genes
genome that has a somatic, non-reproductive function, and a smaller micronucleus that is required for reproduction.
332 Chapter 10: Model Organisms, Comparative Genomics, and Evolution
TABLE 10.8 HIGHLY VARIABLE COPY NUMBER FOR CHEMOSENSORY RECEPTOR GENE FAMILIES IN DIFFERENT
VERTEBRATES
Gene Number of functional gene copies
family
Human Chimp Mouse Rat Dog Cow Opossum Platypus Chicken Xenopus Zebrafish
OR 388 399 1063 1259 822 1152 1198 348 300 1024 155
TAAR 6 3 15 17 2 17 22 4 3 6 109
OR, olfactory receptor; V1R, vomeronasal type 1 receptor; V2R, vomeronasal type 2 receptor; TAAR, trace amine-associated receptor. Data abstracted
from Nei et al. (2008) Nat. Rev. Genet. 9, 951–963.
OUR PLACE IN THE TREE OF LIFE 333
is less clear why Kruppel-associated box (KRAB)-zinc finger gene families have
expanded as they have in various mammalian lineages.
Although gene family expansions can therefore be an adaptive change, there
also seems to be a significant random element in the way in which gene family
copy number can vary between species, a process that has been called genomic
drift.
Figure 10.27 Humans have a gene that makes flies grow wings. (A)
(A) Drosophila with a mutation in the apterous gene lack wings. The normal
phenotype can be rescued after inserting the wild-type fly gene (B) or its human
ortholog, LHX2 (C), into the mutant fly embryo. [Adapted from Rincón-Limas
DE, Lu CH, Canal I et al. (1999) Proc. Natl Acad. Sci. USA 96, 2165–2170. With
permission from the National Academy of Sciences.]
closely related species. The cumulative data led Sean Carroll to propose a new
genetic theory of morphological evolution in 2008.
Key developmental regulators typically exhibit mosaic pleiotropism: they
(B)
serve multiple roles (and so are pleiotropic), but they also function independ-
ently in different cell types, germ layers, body parts, and developmental stages.
Because the same protein can shape the development of many different body
parts, the coding sequences of such regulators are under very considerable evo-
lutionary constraint, and so are extremely highly conserved. At the protein level,
their functions can be extraordinarily conserved; sometimes, their functions can
be maintained after they have been artificially replaced by very distantly related
orthologs, and sometimes even paralogs.
As shown in Figure 10.27, a human ortholog of the Drosophila apterous gene
can regulate the formation of fly wings. The implication is that it is not just the
gene regulators that are highly conserved, but also the proteins that they interact (C)
with to perform their functions, and that in some cases the conservation has
been maintained over as much as the billion years of evolution that separate
humans and flies (Figure 10.25B). Although there was a rapid expansion in the
number of fundamentally different genes encoding transcription factors, signal-
ing molecules, and receptors at the beginning of metazoan evolution, there has
been comparatively little expansion since the period just before the origin of bila-
terians (see Figure 10.25B). For example, 11 of the 12 vertebrate Wnt gene fami-
lies, which encode proteins that regulate cell–cell interactions in embryogenesis,
have counterparts in cnidarians such as jellyfish and, remarkably, six of the major
bilaterian signaling pathways are represented in sponges that diverged from
other metazoans at the very beginning of metazoan evolution (see Figure
10.25B).
The Hox genes, which specify the anterior–posterior axis and segment iden-
tity in embryogenesis, have also undergone very few changes over many hun-
dreds of millions of years. Coelacanths, cartilaginous fish that are very distantly
related to us, have much the same classical Hox genes as we do. More accurately,
they have four more Hox genes than us because of a loss of Hox genes in verte-
brate lineages leading to mammals (Figure 10.28).
Thus, although many other protein families were diversifying over the last
several hundred million years, there has been comparatively little diversification
of key developmental regulators, possibly because of constraints imposed by
gene dosage. Instead, expansion of gene function without gene duplication has
often been achieved by using the same gene to shape several entirely different
traits. By appropriately regulating gene expression, the same gene product can be
sent to different but specific tissues (heterotopy) and at different developmental
Figure 10.28 Gene duplication has not
stages (heterochrony) where, according to its environment, it can interact with occurred over hundreds of millions
of years of Hox cluster evolution. The
14 1 coelacanth is a cartilaginous fish that
A descended from the earliest diverging
B lineage of jawed vertebrates (see Figure
C
D13 D 10.26). Coelacanths, humans, and frogs have
14 1 coelacanth remarkably similar Hox gene organizations,
A
B ~410 13 1
and there does not seem to have been any
C MYA A gene duplication in the 410 million years or
D B so since they diverged from a last common
common ancestor A14 C
B10 D ancestor. Rather, there have been instances
C1 ~370 B13 western clawed frog of occasional Hox gene loss during this time,
C3 MYA D12 shown by curved arrows. For simplicity, only
13 1
A the classical Hox genes are shown within the
B Hox cluster. MYA, million years ago. [Adapted
C
D from Hoegg S & Meyer A (2005) Trends Genet.
human 21, 421–424. With permission from Elsevier.]
OUR PLACE IN THE TREE OF LIFE 335
factors, for example, or in the loss or modification of existing binding sites. But
new CREs may evolve by random mutation and also as a result of sequence inser-
tion events involving transposable elements.
Transposable elements have been viewed as genome parasites that owe their
survival to their ability to replicate faster than the host that carries them; RNA
interference mechanisms are needed to limit their spread within a genome.
Transposable element repeats account for about 45% of the DNA in the human
genome, although only a small percentage is active in transposition (see Chapter
9). By integrating into new locations in the DNA, they can influence the expres-
sion of nearby genes by a variety of possible mechanisms (Figure 10.31).
By disrupting the normal patterns of gene expression, transposable elements
are known to cause disease. In addition to being a potential threat at the level of
the genome and the individual, however, transposable elements can also be ben-
eficial. Comparative genomic studies have shown that at least 10,000 human
transposable element fragments have evolved under strong purifying selection,
indicating that they are functionally important. Many of them are located close
to genes encoding developmental regulators, and many have been strongly con-
served over hundreds of millions of years.
The functional importance of transposable elements arises from their occa-
sional ability to donate sequences that are then used by the host genome for a
novel function, a process known as exaptation. Thus, the sequences of a small
number of genes, sometimes known as neogenes, are thought to have originated
in large part from transposons. They include TERT (telomerase reverse tran-
scriptase), CENPB (encoding a centromere protein), and the recombination acti-
vating genes RAG1 and RAG2.
More commonly, transposable element sequences have contributed smaller
gene components. One way involves donating sequence fragments that have
appropriate splice sites so they can be included as novel exons (exonization—
Figure 10.31C). Such exons have contributed to both functional noncoding RNA
(such as the BC200 RNA—see Table 9.9) and various proteins. In addition, trans-
posable element sequences that contain binding sites for trans-acting factors
have been exapted to provide novel CREs.
As an example, the LF-SINE family of retrotransposons are known to have
contributed sequences to generate both a novel exon and a novel enhancer. The
starting point for this discovery was uc.338, a 223 bp ultraconserved element
located at 12q13 whose sequence is 100% conserved in human, mouse, and rat.
The uc.338 sequence lies within the PCBP2 gene, which encodes an RNA-binding
regulatory protein, and spans an alternatively spliced exon that encodes part of
the PCBP2 protein. Comparative genomics also showed that the sequence of
uc.338 showed a high degree of sequence identity to members of the LF-SINE
retrotransposon family in the coelacanth, a fish that has descended from one of
the most evolutionary ancient lineages of jawed fishes (Figure 10.32).
OUR PLACE IN THE TREE OF LIFE 337
GLUD2 glutamate metabolism enzyme hominoid-specific retrogene; evolved by retrotransposition from the housekeeping
gene GLUD1; believed to have undergone positive selection resulting in specific
targeting to mitochondria with novel, potentially brain-specific functions
CDC14Bretro cell cycle protein hominoid-specific retrogene that evolved by copying from a microtubule-expressed
isoform of CDC14; believed to have undergone positive selection and now
preferentially associates with endoplasmic reticulum
TCP10L likely spermatogenesis factor also expressed in hepatocytes, where it may be involved in maintaining the
differentiation state
snaR gene family ~117-nucleotide noncoding RNAs hominid-specific and rapidly evolving; predominantly expressed in testis
that bind nuclear factor 90
Various miRNA genes various microRNAs generated by primate-specific expansions in miRNA gene families; they notably
include brain-expressed miRNAs, some of which have been claimed to be human-
specific
Various primate-specific genes have been identified, and during the past 60
million years or so of primate evolution there seems to have been a burst of ret-
rogene formation—some estimates suggest that at least one new retrogene has
formed on average every one million years. Other primate-specific genes have
originated through the duplication of genomic DNA sequences, such as by seg-
mental duplication. Some of the primate-specific genes are known to have
evolved recently and may be present in hominoids but not monkeys, and some
others are restricted to hominids (Table 10.9). Truly human-specific genes seem
to be extremely rare; however, families of genes encoding brain-expressed
miRNAs have undergone significant expansion in primate lineages, and some of
the miRNA genes have been reported to be human-specific.
Lineage-specific differential expression of a single gene also plays a part.
Inappropriate ectopic expression of a gene may have important consequences,
as strikingly revealed with the FGF4 gene in dogs (Figure 10.33). Human–
chimpanzee gene differences resulting from lineage-specific gene inactivation
are clearly evident. Since the divergence of humans and chimpanzees from a
(A)
TABLE 10.10 EXAMPLES OF EVOLUTIONARY INACTIVATION OR MODIFICATION OF GENES IN THE HUMAN LINEAGE AFTER
DIVERGENCE FROM CHIMPANZEES
Gene Gene product Nature of change Consequence
CMAH CMP-N-acetyl- frameshifting deletion unlike other primates (and mammals), humans do not make
neuraminic acid N-glycolylneuraminic acid (Neu5Gc), which is one of the two
hydrolyase most common sialic acids in mammals and is known to be
expressed in a developmentally regulated and tissue-specific
fashion
MYH16 type of myosin heavy inactivation loss of this myosin protein in humans has been suggested to
chain correlate with the considerably smaller masticatory muscles
used to chew food, making larger brain sizes possible; see text
and Stedman et al. (2004) Nature 428, 373–374
SIGLEC11 type of sialic acid 5¢ end and first five exons replaced by new and strong expression in human microglia
sequence copied from a pseudogene
common ancestor about 5–7 million years ago, about 80 genes have been
inactivated on the human lineage and now appear as nonprocessed pseudo-
genes, and lineage-specific gene modification is also apparent in some cases
(Table 10.10 shows some examples).
Differential inactivation of MYH16, a gene expressed in the jaw muscles, has
been considered to be one of the most biologically meaningful differences
between humans and chimpanzees. Humans cannot make a MYH16 protein
because MYH16 was inactivated by mutation in the human lineage about 2.4 mil-
lion years ago, just before the modern hominid cranium evolved. Inactivation of
MYH16 in humans may have led to a decrease in the size of the jaw muscle, allow-
ing the necessary room for the hominid cranium to be remodeled to accommo-
date a larger brain.
TABLE 10.11 EXAMPLES OF DNA SEQUENCES THAT HAVE BEEN POSTULATED TO BE SUBJECT TO DIFFERENTIAL POSITIVE
SELECTION IN THE HUMAN LINEAGE
DNA Type of Properties References
sequence sequence
FOXP2 protein-coding encodes a transcription factor that is important in speech; see the Enard et al. (2002) Nature 418, 869–872;
gene text Enard et al. (2009) Cell 137, 961–971
MCPH protein-coding encodes microcephalin, a protein that regulates brain size Evans et al. (2005) Science 309,
gene 1717–1720
HAR1A (HAR1F) RNA gene expressed specifically in Cajal–Retzius neurons in the developing Pollard et al. (2006) Nature 443, 167–172
neocortex; co-expressed with it is reelin, a protein that is
fundamentally important in specifying the six-layer structure of the
brain
HACNS1 enhancer important in early development; in comparison with orthologous Prabhakar et al. (2008) Science 321,
sequences in chimpanzee and rhesus macaque it has gained a 1346–1350
strong expression domain in the developing limb that has been
postulated to result in altered human limb anatomy, such as
specialization of the hand to facilitate using tools or the foot to allow
walking on two legs
In several cases of putative positive selection in the human lineage, the great majority of nucleotides altered in the human lineage arose by
replacement of an A or T with a G or C. This ATÆGC bias has prompted the suggestion that the nucleotide changes simply reflect biased gene
conversion rather than positive selection; for a recent discussion, see Hurst LD (2009) Nature 457, 543–544.
We show in Table 10.11 some examples of DNA sequences that have been
believed to be positively selected in the human lineage, and we outline different
approaches to identify them below. To identify genes that confer human charac-
teristics, one major approach starts with phenotypes that relate to our special
cognitive abilities and brain capacity. Studies of human microcephaly, a neu-
rodevelopmental disorder characterized by decreased brain growth, have identi-
fied genes that regulate brain growth such as MCPH1 for which there is evidence
for positive selection in the human lineage. However, the ongoing adaptive evo-
lution of such genes is not explained by increased intelligence. Various language
and reading disorders with a significant genetic component have also been stud-
ied, including developmental dyslexia (reading disability), speech language dis-
order (SLD), and also specific language impairment (SLI), where the heritability
is particularly high.
The SLD gene FOXP2 has been extensively studied. It encodes a transcription
factor that has been very well conserved during evolution—out of a total of 715
amino acids, only 3 are changed in the mouse. Of these, two amino acids, N303
and S325, seem to be human-specific when compared with various nonhuman
primates and are thought to have been positively selected during recent human
evolution (Figure 10.34).
Evidence that the apparently human-specific N303 and S325 residues are
important in brain function comes from the Foxp2hum/hum mouse, which is
homozygous for a humanized version of the Foxp2 gene (genetically engineered
to express the human-specific N303 and S325). Foxp2hum/hum mice have altered
vocalization and also changes in certain neurons that are expected to enhance
the efficiency of brain mechanisms that in humans are known to regulate motor
control, including speech, word recognition, and several other cognitive
properties.
303 325
modern human DNGIKHGGLD LTTNNSSSTT SSNTSKASPP ITHHSIVNGQ SSVLSARRDS SSHEETGASH TLYGHGVCKW
Neanderthal DNGIKHGGLD LTTNNSSSTT SSNTSKASPP ITHHSIVNGQ SSVLSARRDS SSHEETGASH TLYGHGVCKW
chimp DNGIKHGGLD LTTNNSSSTT SSTTSKASPP ITHHSIVNGQ SSVLNARRDS SSHEETGASH TLYGHGVCKW
gorilla DNGIKHGGLD LTTNNSSSTT SSTTSKASPP ITHHSIVNGQ SSVLNARRDS SSHEETGASH TLYGHGVCKW
orangutan DNGIKHGGLD LTTNNSSSTT SSTTSKASPP ITHHSIVNGQ SSVLNARRDS SSHEETGASH TLYGHGVCKW
rhesus DNGIKHGGLD LTTNNSSSTT SSTTSKASPP ITHHSIVNGQ SSVLNARRDS SSHEETGASH TLYGHGVCKW
marmoset DNGIKHGGLD LTTNNSSSTT SSTTSKASPP ITHHSIVNGQ SSVLNARRDS SSHEETGASH TLYGHGVCKW
mouse DNGIKHGGLD LTTNNSSSTT SSTTSKASPP ITHHSIVNGQ SSVLNARRDS SSHEETGASH TLYGHGVCKW
Figure 10.34 Human-specific amino acids in the highly conserved FOXP2 protein. The FOXP2 protein has been extremely highly conserved during
evolution, showing 99.6% identity to mouse Foxp2. Alignment with nonhuman primate orthologs suggests that amino acids N303 and S325 result
from non-synonymous substitutions in the human lineage after human–chimp divergence but before the lineages leading to modern humans and
Neanderthals diverged about 600,000 years ago.
CONCLUSION 341
CONCLUSION
Model organisms have long been useful in research, to understand molecular
and physiological processes and to study disease and evaluate potential thera-
pies for human diseases. Now that we have the genome sequences for many of
these model organisms as well as humans, the detection of differences and simi-
larities between genomes is enabling us to learn more about the processes that
shape a genome and allow it to evolve.
Comparative genomics has allowed the construction of molecular phyloge-
netic trees of extant species, deepening our understanding of our position in the
tree of life, and revealing information about how genomes, including our own,
have evolved.
As well as helping to identify and validate predicted protein-coding genes,
comparative genomics has also been helpful in identifying functional noncoding
DNA. The latter is thought to account for about 80% of the highly conserved DNA
in the human genome, being made up of RNA genes and short regulatory ele-
ments that control how genes are expressed.
Functionally important sequences tend to be strongly conserved and are sub-
ject to purifying (negative) selection, with harmful mutations being selectively
eliminated. Those mutations that are beneficial tend to be positively selected.
These mutations underlie adaptive evolution.
Shuffling and duplication of the exons within protein-coding genes increases
gene complexity. Transposable elements can also be copied and used to make
coding sequences or to introduce new functionally important sequences, such as
novel exons and novel regulatory elements.
Evolutionary novelty can also arise through the duplication of genes or even
whole genomes. Once a gene has been duplicated, the two copies can diverge
342 Chapter 10: Model Organisms, Comparative Genomics, and Evolution
FURTHER READING
Model organisms Kaletta T & Hengartner MO (2006) Finding function in novel
Butte AJ (2008) The ultimate model organism. Science 320, targets: C. elegans as a model organism. Nat. Rev. Drug Discov.
325–327. 5, 387–398.
Davis RH (2004) The age of model organisms. Nat. Rev. Genet. Segalat L (2007) Invertebrate animal models of disease as
5, 69–76. screening tools in drug discovery. ACS Chem. Biol. 2, 231–236.
Fields S & Johnston M (2005) Whither model organism research? WormBook. http://www.wormbook.org [An online review of
Science 307, 1885–1886. C. elegans biology.]
Spradling A, Ganetsky B, Hieter P et al. (2006) New roles for
model genetic organisms in understanding and treating Vertebrate models (rodents and zebrafish are
human disease: report from the 2006 Genetics Society of referenced in Chapter 20)
America Meeting. Genetics 172, 2025–2032. Beck CW & Slack JMW (2001) An amphibian with ambition:
Unicellular model organisms and synthetic biology a new role for Xenopus in the 21st century. Genome Biol.
2, reviews1029.1–1029.5 (doi:10.1186/gb-2001-2-10-
Gibson DG, Benders GA, Andrews-Pfamkoch C et al. (2008) reviews1029).
Complete chemical synthesis, assembly, and cloning of a Capitanio JP & Emborg ME (2008) Contributions of non-human
Mycoplasma genitalium genome. Science 319, 1215–1220. primates to neuroscience research. Lancet 371, 1126–1135.
Keller LC, Romijn EP & Zamora I (2005) Proteomic analysis of Carlsson HE, Schapiro SJ, Farah I & Hau J (2004) Use of primates in
isolated Chlamydomonas centrioles reveals orthologs of research: a global view. Am. J. Primatol. 63, 225–237.
ciliary-disease genes. Curr. Biol. 15, 1090–1098. Cyranoski D (2009) Marmoset model takes centre stage. Nature
Mager WH & Winderickx J (2005) Yeast as a model for medical and 459, 492.
medicinal research. Trends Pharmacol. Sci. 26, 265–273. Pennisi E (2007) Boom time for monkey research. Science 316,
Turnbaugh PJ, Ley RE, Hamady M et al. (2007) The Human 216–218.
Microbiome Project. Nature 449, 804–810. Stern CD (2005) The chick: a great model system becomes even
Invertebrate animal models greater. Dev. Cell 8, 9–17.
Beckingham KM, Armstrong JD, Texada MJ et al. (2005) Comparative genomics
Drosophila melanogaster—the model organism of choice for
the complex biology of multi-cellular organisms. Gravit. Space Blanchette M (2007) Computation and analysis of multi-sequence
Biol. Bull. 18, 17–29. alignments. Annu. Rev. Genomics Hum. Genet. 8, 193–213.
Davidson B & Christiaen L (2006) Linking chordate gene networks Dolinski K & Botstein D (2007) Orthology and functional
to cellular behavior in ascidians. Cell 124, 247–250. conservation in eukaryotes. Annu. Rev. Genet. 41, 465–507.
FURTHER READING 343
Margulies EH & Birney E (2008) Approaches to comparative Comai L (2005) The advantages and disadvantages of being
sequence analysis: towards a functional view of vertebrate polyploid. Nat. Rev. Genet. 6, 836–846.
genomes. Nat. Rev. Genet. 9, 303–313. Dehal P & Boore JL (2006) Two rounds of whole genome
Miller W, Makova KD, Nekrutenko A & Hardison RC (2004) duplication in the ancestral vertebrate. PLoS Biol. 10, e314.
Comparative genomics. Annu. Rev. Genomics Hum. Genet. 5, Force A, Lynch M, Pickett FB et al. (1999) Preservation of duplicate
15–56. genes by complementary degenerative mutations. Genetics
Stern A, Doron-Faigenboim A, Erez E et al. (2007) Selecton 151, 1531–1545.
2007: advanced models for detecting positive and purifying Guth SIE & Wegner M (2008) Having it both ways: Sox protein
selection using a Bayesian inference approach. Nucleic Acids function between conservation and innovation. Cell. Mol. Life
Res. 35, W506–W511. Sci. 65, 3000–3018.
Ureta-Vidal A, Ettwiller L & Birney E (2003) Comparative Hurles M (2004) Gene duplication: the genomic trade in spare
genomics: genome-wide analysis in metazoan eukaryotes. parts. PLoS Biol. 2, e206.
Nat. Rev. Genet. 4, 251–262. Perry GH, Dominy NJ, Claw KG et al. (2007) Diet and the evolution
of human amylase gene copy number. Nat. Genet. 39,
Comparative genomics: invertebrates 1256–1260.
Drosophila 12 Genomes Consortium (2007) Evolution of genes Rodríguez-Trelles F, Tarrío R & Ayala FJ (2006) Origins and
and genomes on the Drosophila phylogeny. Nature 450, evolution of spliceosomal introns. Annu. Rev. Genet. 40,
203–218. 47–76.
Stark A, Lin MF, Kheradpour P et al. (2007) Discovery of functional Semon M & Wolfe KH (2007) Consequences of genome
elements in 12 Drosophila genomes using evolutionary duplication. Curr. Opin. Genet. Dev. 17, 505–512.
signatures. Nature 450, 219–232. Tvrdik P & Capecchi MR (2006) Reversal of Hox1 gene
Stein LD, Bao Z, Blasiar D et al. (2003) The genome sequence subfunctionalization in the mouse. Dev. Cell 11, 239–250.
of Caenorhabditis briggsae: a platform for comparative Chromosome evolution
genomics. PLoS Biol. 1, e45.
Ferguson-Smith MA & Trifonov V (2007) Mammalian karyotype
Vertebrate comparative genomics and functional evolution. Nat. Rev. Genet. 8, 950–962.
noncoding DNA Graves JA (2006) Sex chromosome specialization and
degeneration in mammals. Cell 124, 901–914.
Bejerano G, Pheasant M, Makunin I et al. (2004) Ultraconserved Graves JA (2008) Weird animal genomes and the evolution of
elements in the human genome. Science 304, 1321–1325. vertebrate sex and sex chromosomes. Annu. Rev. Genet. 42,
Elgar G & Vavouri T (2008) Tuning in to the signals: non-coding 565–586.
sequence conservation in vertebrate genomes. Trends Genet. Marshall OJ, Chueh AC, Wong LH & Choo KHA (2008)
24, 344–352. Neocentromeres: new insights into centromere structure,
Gibbs RA, Weinstock GM, Metzker ML et al. (2004) Genome disease development, and karyotype evolution. Am. J. Hum.
sequence of the Brown Norway rat yields insights into Genet. 82, 261–282.
mammalian evolution. Nature 428, 493–521. Ross MT, Graham DV, Coffey AJ et al. (2005) The DNA sequence of
Margulies EH, Cooper GM, Asimenos G et al. (2007) Analyses of the human X chromosome. Nature 434, 325–337.
deep mammalian alignments and constraint predictions for Rosser ZH, Balaresque P & Jobling M (2009) Gene conversion
1% of the human genome. Genome Res. 17, 760–774. between the X chromosome and the male-specific region
Nickel GC, Tefft D & Adams MD (2008) Human PAML browser: on the Y chromosome at a translocation hotspot. Am. J. Hum.
a database of positive selection on human genes using Genet. 85, 130–134.
phylogenetic methods. Nucleic Acids Res. 36, D800–D808. Rozen S, Skaletsky H, Marszalek JD et al. (2003) Abundant gene
Pang KC, Frith MC & Mattick JS (2006) Rapid evolution of non- conversion between arms of palindromes in human and ape
coding RNAs: lack of conservation does not mean lack of Y chromosomes. Nature 423, 873–876.
function. Trends Genet. 22, 1–5. Skaletsky H, Kuroda-Kawaguchi T, Minx PJ et al. (2003) The male-
Pheasant M & Mattick JS (2007) Raising the estimate of functional specific region of the human Y chromosome is a mosaic of
human sequences. Genome Res. 17, 1245–1253. discrete sequence classes. Nature 423, 825–837.
Visel A, Bristow J & Pennacchio LA (2007) Enhancer identification
through comparative genomics. Semin. Cell Dev. Biol. 18, Molecular phylogenetics and taxonomy
140–152. Delsuc F, Brinkmann H & Philippe H (2005) Phylogenomics
and the reconstruction of the tree of life. Nat. Rev. Genet. 6,
General molecular evolution 361–375.
Bromham L (2008) Reading The Story In DNA: A Beginner’s Guide Hall BG (2007) Phylogenetic Trees Made Easy: A How-To Manual,
To Molecular Evolution. Oxford University Press. 3rd ed. Sinauer Associates, Inc.
Li W-H (2007) Molecular Evolution. Sinauer Associates, Inc. NCBI Taxonomy Browser. http://www.ncbi.nlm.nih.gov/taxonomy
Nei M & Kumar S (2000) Molecular Evolution And Phylogenetics.
Gene and genome evolution Oxford University Press.
Aury J-M, Jaillon O, Duret L et al. (2006) Global trends of whole- Phylogeny programs. http://evolution.genetics.washington
genome duplication revealed by the ciliate Paramecium .edu/phylip/software.serv.html [One of several sites collating
tetraurelia. Nature 444, 171–178. phylogeny programs available through the Web.]
Babushok DV, Ostertag EM & Kazazian HH Jr (2007) Current topics
in genome evolution: molecular mechanisms of new gene
Recent mammalian genome sequence papers with
formation. Cell. Mol. Life Sci. 64, 542–554. major evolutionary implications
Blomme T, Vandepoele K, De Bodt S et al. (2006) The gain and loss Mikkelsen TS, Wakefield MJ, Wakefield J et al. (2007) Genome of
of genes during 600 million years of vertebrate evolution. the marsupial Monodelphis domestica reveals innovation in
Genome Biol. 7, R43. non-coding sequences. Nature 447, 167–178.
344 Chapter 10: Model Organisms, Comparative Genomics, and Evolution
Rhesus Macaque Genome Sequencing and Analysis Consortium Human–primate differences and the genetic basis
(2007) Evolutionary and biomedical insights from the rhesus of our uniqueness (see also Table 10.10)
macaque genome. Science 316, 222–234.
Warren WC, Hillier LW, Graves JAM et al. (2008) Genome analysis Berezikov E, Thuemmler F, van Laake LW et al. (2006) Diversity of
of the platypus reveals unique signatures of evolution. Nature microRNAs in human and chimpanzee brain. Nat. Genet.
453, 175–184. 38, 1375–1377.
Bird CP, Stranger BE, Liu M et al. (2007) Fast-evolving non-coding
Interspecific homology, lineage-specific gene sequences in the human genome. Genome Biol. 8, R118.
family expansion, and gene inactivation/loss Galtier N & Duret L (2007) Adaptation or biased gene conversion?
Extending the null hypothesis of molecular evolution. Trends
Church DM, Goodstadt L, Hillier LW et al. (2009) Lineage-specific Genet. 23, 273–277.
biology revealed by a finished genome assembly of the Gilad Y, Oshlack A, Smyth GK et al. (2006) Expression profiling in
mouse. PLoS Biol. 7, e1000112. primates reveals a rapid evolution of human transcription
Clusters of Orthologous Groups database. http://www.ncbi.nlm factors. Nature 440, 242–245.
.nih.gov/COG/ [A phylogenetic classification of proteins in Kelley JL & Swanson WJ (2008) Positive selection in the human
complete genomes.] genome: from genome scans to biological significance.
HomoloGene Database (NCBI). http://www.ncbi.nlm.nih.gov/ Annu. Rev. Genomics Hum. Genet. 9, 143–160.
homologene [For automated detection of homologs in other Konopka G, Bomar JM, Winden K et al. (2009) Human-specific
species.] transcriptional regulation of CNS development genes by
Matsuya A, Sakate R, Kawahara Y et al. (2008) Evola: ortholog FOXP2. Nature 462, 213–218.
database of all human genes in H-InvDB with manual Kosiol C, Vinar T, da Fonseca RR et al. (2008) Patterns of positive
curation of phylogenetic trees. Nucleic Acids Res. 36, selection in six mammalian genomes. PLoS Genet.
D787–D792. 4, e1000144.
Nei M, Niimura Y & Nozawa M (2008) Evolution of animal Noonan JP (2009) Regulatory RNAs and the evolution of human
chemosensory receptor gene repertoires: roles of chance and development. Curr. Opin. Genet. Dev. 19, 557–564.
necessity. Nat. Rev. Genet. 9, 951–964. Pollard KS, Salama SR, King B et al. (2006). Forces shaping the
Ponting CP (2008) The functional repertoires of metazoan fastest evolving regions of the human genome. PLoS Genet.
genomes. Nat. Rev. Genet. 9, 689–698. 2, e168.
Ponting CP & Goodstadt L (2009) Separating derived from Prabhakar S, Noonan JP, Paabo S & Rubin M (2006) Accelerated
ancestral features of human and mouse genomes. Biochem. evolution of conserved non-coding sequences in humans.
Soc. Trans. 37, 734–739. Science 314, 786.
Wang X, Grus WE & Zhang J (2006) Gene losses during human Sabeti PC, Schaffner SF, Fry B et al. (2006) Positive natural
origins. PLoS Biol. 4, e52. selection in the human lineage. Science 312, 1614–1620.
Noncoding DNA, cis-acting regulatory sequences, Yang Z (2007) PAML 4: phylogenetic analysis by maximum
likelihood. Mol. Biol. Evol. 24, 1586–1591.
and morphological evolution
Carroll SB (2008) Evo-Devo and an expanding evolutionary
synthesis: a genetic theory of morphological evolution. Cell
134, 25–36.
Copley RR (2008) The animal in the genome: comparative
genomics and evolution. Phil. Trans. R. Soc. B 363, 1453–1461.
Jeong S, Rebeiz M, Andolfatto P et al. (2008) The evolution
of gene regulation underlies a morphological difference
between two Drosophila sister species. Cell 132, 783–793.
Taft RJ, Pheasant M & Mattick JS (2007) The relationship between
non-protein-coding DNA and eukaryotic complexity.
BioEssays 29, 288–299.
Visel A, Prabhakar S, Akiyama JA et al. (2008) Ultraconservation
identifies a small subset of extremely constrained
developmental enhancers. Nat. Genet. 40, 158–160.
Wray GA (2007) The evolutionary significance of cis-regulatory
mutations. Nat. Rev. Genet. 8, 206–216.
• The characteristic forms and behaviors of different cell types result from different patterns
of expression of the same set of genes. Expression is regulated at both the transcriptional
and translational levels.
• RNA polymerase II transcribes all protein-coding genes, and many that encode noncoding
RNAs. A large multiprotein complex must be assembled at the gene promoter before
transcription can be initiated.
• A fixed set of general transcription factors forms the basal transcription apparatus, with a
large cast of gene-specific or tissue-specific transcription factors, co-activators, and
co-repressors modulating promoter activity.
• Promoter activity is affected by sequence-specific DNA-binding proteins that bind either
directly to the promoter or at more distant positions (enhancers, repressors, and locus
control regions). DNA looping brings distant sites close to the promoter. Co-activators and
co-repressors do not bind DNA but are recruited to promoters by protein–protein
interactions.
• Large-scale studies of human transcripts have shown that the majority of human genomic
DNA is transcribed. The function (if any) of most of the transcripts is unknown. The
implications of this are currently not clear, but a substantial revision of the traditional view
of discrete genes sparsely scattered along genomic DNA may be required.
• The local chromatin conformation may be more important than the local DNA sequence in
defining promoters. It is governed by an interplay between histone-modifying enzymes,
ATP-driven chromatin remodeling complexes, and DNA methyltransferases that methylate
cytosine in CpG dinucleotides. RNA molecules may also have a role.
• Histones in nucleosomes can undergo many different covalent modifications. The pattern
of these constitutes an additional layer of heritable information (a histone code) that has a
major, but not exclusive, role in determining chromatin conformation.
• Heritable effects on gene expression that are not caused by a change in the DNA sequence
are called epigenetic changes. The heritability is due to heritable patterns of methylation of
CpG sequences.
• For a few dozen human genes, expression depends on parental origin because of
imprinting. This is an epigenetic process whereby alleles at a locus carry an imprint of their
parental origin that governs the pattern of expression.
• Most protein-coding genes encode multiple polypeptides because of the use of alternative
promoters and/or alternative splicing. In many cases, the resulting protein isoforms have
different tissue specificities, biological properties, and functions.
• Gene expression is also regulated by controlling whether or not an mRNA will be translated.
This often depends on proteins and microRNAs (miRNAs) that bind to sequences in the 3¢
untranslated region of the mRNA.
• The role of the myriad different miRNAs is still uncertain, but it is thought that they fine-
tune the expression of many genes. Major changes in gene expression are more often due to
controls that affect transcription.
346 Chapter 11: Human Gene Expression
All the cells of our body, apart from gametes, are derived by repeated mitosis from
the original fertilized egg. All those cells, with a few minor exceptions, therefore
contain exactly the same set of genes. The differences in anatomy, physiology,
and behavior between cells, described in Chapters 4 and 5, are the result of differ-
ing patterns of gene expression. During normal development, cells follow branch-
ing trajectories in which successively more specialized patterns of gene expres-
sion define the transitions from totipotency through pluripotency and onward to
terminal differentiation. Most of these developmental decisions are irreversible
in the context of normal human physiology, although there is great interest in
reversing them by artificial manipulation, to produce induced pluripotent cells
(see Chapter 21) or to promote the regeneration of damaged tissues or organs. In
addition to these fixed patterns, cells have a repertoire of short-term flexible
changes in gene expression that enable them to respond to their fluctuating
environments.
Controls on gene expression operate at several levels. On the largest scale,
megabase regions of chromatin around centromeres and elsewhere form highly
condensed and genetically inert heterochromatin that can be seen by Giemsa
(G-band) staining of metaphase chromosomes. The dark G bands do contain
some expressed sequences, but most expressed genes are located in regions that
stain pale in a standard G-banded preparation (see Figure 2.14). The location of a
chromosome within the interphase cell nucleus can also have a profound effect
on its level of expression.
This chapter explores how gene expression is controlled at the single gene
level, from the selection of which genes to transcribe, through processing of the
transcript, to translation to produce the initial protein product.
As described in Chapter 8, techniques for studying gene expression have
evolved rapidly. Early studies used RT-PCR to document the expression or other-
wise of single genes of interest in different tissues or at different stages of devel-
opment. Gene expression was quantitated by real-time RT-PCR. As with other
aspects of genetics, the emphasis moved on to genomewide studies, using micro-
arrays for expression profiling. Typically, the experiments compare expression
profiles in two or more tissues or cell states. Now the focus is moving on to direct
sequencing of cDNA on a massive scale. In some ways, this is just a refinement of
the large-scale generation of expressed sequence tags (ESTs) by partial sequenc-
ing of cDNA libraries. However, the new massively parallel sequencing machines
allow an unprecedented depth of sequencing, and hence the unbiased quantita-
tion of even very rare transcripts and detection of rare splice isoforms.
Efforts to understand the factors controlling human gene expression have
been focused by the ENCODE (Encyclopedia of DNA Elements) project. This is a
systematic attempt to identify all functional sequence elements in human DNA
(see http://genome.ucsc.edu/ENCODE). A pilot phase, launched in 2003, focused
on 44 genomic regions, totaling 30 Mb or 1% of the human genome. In 2007, the
project moved into the production phase, studying the whole genome. At the
same time, parallel efforts (ModENCODE; http://www.modencode.org) targeted
two important model organisms, the fruit fly Drosophila melanogaster and the
nematode worm Caenorhabditis elegans. Data from the ENCODE project inform
much of the discussion in this chapter.
There is a growing awareness that the way in which our genome works is
much more complex than it once appeared. Early understanding naturally
focused on the sparsely scattered protein-coding sequences and was guided by
the idea that each gene encoded a single polypeptide. It now turns out that most
protein-coding genes encode multiple polypeptides. Moreover, although less
than 1.5% of our DNA codes for protein, much of the rest is transcribed, and cells
are awash with newly recognized RNA molecules, of whose function (if any) we
still have only a very limited understanding. The genome projects showed that
we humans have scarcely more protein-coding genes than the 1 mm long
Caenorhabditis elegans worm. It seems inevitable that our greater complexity
must be the result of more complex organization and regulation of a broadly sim-
ilar set of genes. Some of the new vision was presented in Box 9.5, which dis-
cussed revising the concept of the gene. We are still at an early stage of unraveling
the mechanisms that control human gene expression, but this chapter outlines
current concepts and sets the scene for emerging new data.
PROMOTERS AND THE PRIMARY TRANSCRIPT 347
5¢ RACE (Rapid amplification of cDNA ends) is a form of RT-PCR that The method is a variant of the SAGE (Serial analysis of gene
allows amplification of the whole mRNA sequence between an internal expression) technique that first introduced the use of type IIs
primer and the 5¢ end of the RNA (Figure 1). This is a low-throughput restriction enzymes and sequencing of concatemerized tags. The short
method but is particularly useful for identifying unsuspected tags are sufficient to identify the molecule, and the concatemerization
upstream exons of a gene, the result of initiation at a novel promoter. allows huge numbers of tags to be sequenced, permitting
In the ENCODE project, 5¢ RACE was systematically used to examine unprecedentedly accurate quantification of the relative abundances of
transcripts from 399 loci in 12 different tissues. RACE products different mRNAs. The purpose of using the type IIs enzyme to generate
were identified with the use of oligonucleotide microarrays. Novel short tags is to reduce the amount of sequencing needed. With the
fragments were found for 90% of all loci tested. introduction of massively parallel sequencing technologies, this may
CAGE (Cap analysis of gene expression) is a high-throughput no longer be an important consideration, and it may be simpler just to
method that sequences just the first 18 nucleotides adjacent to the 5¢ sequence cap-selected cDNAs on a massive scale.
cap of all mRNAs in a sample. The sequence allows the transcription 5¢ AAAAAAA...An 3¢
start site to be identified, and the number of occurrences of that
sequence is a direct count of the number of molecules of that mRNA in 3¢ AAAAAAA 5¢ reverse transcription of RNA using an
the original sample. The method depends on three basic steps: internal primer; extend 3¢ end of transcript
using terminal transferase and dATP
• cDNAs that correspond to the capped 5¢ end of mRNAs are
selected. This is achieved by a chemical reaction that biotinylates 5¢
the mRNA cap, followed by selection with streptavidin-coated 3¢ anneal adaptor containing primer
TTTTT
magnetic beads.
• An adaptor containing the recognition site for the type IIs enzyme 3¢ AAAAAAA 5¢ sequence A and extend
MmeI is ligated to the cDNA. Type IIs restriction enzymes recognize
a specific DNA sequence but cut the DNA elsewhere, a defined
B
number of base pairs away. Digestion with MmeI releases an
18-nucleotide fragment of the cDNA with a two-base overhang (a 5¢ AAAAA 3¢
PCR with primers A and B
sticky end). 3¢ TTTTT 5¢
• The 18-base fragments with their sticky ends are ligated together A
to form long concatemers, which are then sequenced. Figure 1 Principle of 5¢ RACE.
TFIID
TAF subunits ~11 recognize other DNA sequences near the start point; regulate DNA
binding by TBP
TFIIF 3 stabilizes RNA polymerase interaction with TBP and TFIIB; helps
attract TFIIE and TFIIH
TBP, TATA-binding protein; TAF, TBP-associated factors; BRE, TFIIB recognition element.
PROMOTERS AND THE PRIMARY TRANSCRIPT 349
Figure 11.2 Initiation of transcription at a pol II promoter. Before RNA start of transcription
polymerase II can initiate transcription, several general transcription factors— TATA box
TFIID, TFIIB, TFIIE, TFIIF, and TFIIH—are required to assemble sequentially on
the promoter. The promoter illustrated here includes a TATA box, to which TFIID
binds. Many promoters lack a TATA box and use other sequence elements to
assemble the transcription factors and RNA polymerase II. When this basal TBP TFIID
transcription apparatus has fully assembled, TFIIH opens up the DNA double
helix at the transcription start site and initiates a conformational change in
the polymerase by phosphorylating its protruding C-terminal domain. This
allows the polymerase to commence transcription. Additional DNA-binding
transcription factors and protein-binding co-activators (not shown) control the
level of activity of a promoter. TBP, TATA-box binding protein; CTD, C-terminal TFIIB
domain of RNA pol II. [From Alberts B, Johnson A, Lewis J et al. (2007) Molecular
Biology of the Cell, 5th ed. Garland Science/Taylor & Francis LLC.]
Unlike DNA polymerases, RNA polymerases do not need a primer to get started,
but further actions of the TFII factors are needed before transcription can start.
One subunit of TFIIH is a DNA helicase. This uses energy from the hydrolysis of TFIIE
ATP to open up the DNA double helix, giving pol II access to the template strand.
The polymerase then starts RNA synthesis but initially fails to progress away from TFIIH
the promoter. Short oligoribonucleotides are produced in a process of abortive
RNA polymerase II
initiation until a set of conformational changes occur that allow the pol II to move
into elongation mode. These changes are triggered by TFIIH-dependent phos-
phorylation of serine residues in the C-terminal domain of pol II. Some data sug-
gest that this transition is the most critical part of transcription control. Most
promoters, whether active or inactive, are said to be occupied by a pre-initiation
complex, but only those on active genes are able to move into elongation mode.
The polymerase then moves along the template strand, leaving behind most
of the general transcription factors but picking up new proteins, the elongation
factors such as the Elongin complex. The average speed is about 20 nucleotides UTP, ATP
CTP, GTP
per second, although there are pauses and spurts. At this speed it would take
more than 24 hours to transcribe the 2.4 Mb dystrophin gene!
Termination of transcription
There seem to be no specific sequences that act as stop signals for RNA polymer- P P
ase; there are no transcriptional analogs of the UGA, UAG, and UAA translational
stop codons. For many genes, termination occurs at various positions in different
RNA
individual transcripts. It turns out that termination of transcription is linked to
cleavage of the transcript. A well-known feature of mRNAs is the poly(A) addition
TRANSCRIPTION
site—AAUAAA or a closely similar sequence. As described in Chapter 1 (see Figure
1.21), the mRNA is cut a few nucleotides downstream of this site. This cleavage
also initiates the events leading to the termination of transcription. The cut is
made by a protein complex that is physically tethered to the RNA polymerase.
The complex includes an exonuclease. Once the cleavage has produced a free
RNA end, the exonuclease moves along the RNA that is still emerging from the
polymerase, degrading it in a 5¢Æ3¢ direction. A sort of race ensues between the
polymerase, continuing along the DNA and elongating the tail of the transcript,
and the exonuclease coming up behind the polymerase and eating up the tran-
script (Figure 11.3). The exonuclease is the faster of the two, and when it catches
up with the polymerase, transcription terminates.
P P
exonuclease
polymerase progression
AAUAAA
P P
AAUAAA
(A)
NH CH CO NH CH CO NH CH CO NH CH CO NH CH CO
H3K4 trimethylateda;
monomethylatedb
H3K27 trimethylated
H4R3 methylated
H4K5 acetylated
Within each category of chromatin there are sub-varieties and variant patterns of modification,
but those shown are the most frequent modifications in each broad category of chromatin.
See Box 11.2 for nomenclature of amino acids in histones. Blank boxes signify that there is no clear
pattern. aAt promoters. bAt enhancers.
CHROMATIN CONFORMATION: DNA METHYLATION AND THE HISTONE CODE 355
contrast with TAF3, DNA methyltransferases can bind only to nucleosomes that BOX 11.3 continued
are unmethylated at H3K4. Thus, H3K4 methylation contributes not only to
recruitment of the transcription machinery but also to the protection of pro-
moter regions against undesired silencing. This is particularly important for DNA-binding
housekeeping genes transcribed from CpG island promoters, the natural targets proteins
for DNA methyltransferases.
A web of interactions affects the final outcome, and no one factor is fully
determinative. The proteins that recognize specifically modified histones may
themselves have histone-modifying activity, producing positive feedback loops—
for example, some chromodomain proteins have histone deacetylase activity.
Thus, histone modifications can be interdependent. Certain modifications fol-
low on from one another. Phosphorylation of serine at H3S10 promotes acetyla-
tion of the adjacent H3K9 and inhibits its methylation. Ubiquitylation of H2AK119
is a prerequisite for the dimethylation and trimethylation of H3K4. The protein treat living cells or cell
nuclei with formaldehyde
complexes often also include components of ATP-dependent chromatin remod-
eling complexes and DNA methyltransferases that methylate DNA, both described crosslink formed
below. Gene expression depends on a balance of stimulatory and inhibitory
effects rather than on a simple one-to-one histone code.
ATP-dependent chromatin remodeling complexes
The histone-modifying enzymes described above constitute one set of determi-
nants of chromatin function; another set comprises ATP-driven multiprotein
complexes that modify the association of DNA and histones. The components of
these chromatin remodeling complexes are strongly conserved from yeast to
humans, and studies in many organisms have demonstrated their involvement extract and fragment chromatin;
in numerous developmental switches. They can be grouped into families, digest with restriction enzyme
depending on the nature of the ATPase subunit (Table 11.3). Each ATPase associ-
ates with a variety of partners to form large mix-and-match complexes with a
rich and confusing nomenclature. Table 11.3 is far from being a comprehensive
list. Some subunits are tissue-specific. The complexes often include proteins that
interact specifically with modified histones, for example through bromodomains
or chromodomains. Other subunits may have histone-modifying activity, so that
the ATP-dependent and histone-modifying factors can be interdependent.
Chromatin structure is dynamic. Whereas the histone-modifying enzymes
affect nucleosome function through covalent changes, the ATP-dependent DNA ligation
remodeling complexes change the nucleosome occupancy of DNA. They move
nucleosomes along the DNA or promote nucleosome assembly or disassembly.
Both locally and globally, nucleosome positioning affects gene expression.
Promoters of active genes and other regulatory sequences are relatively free of
nucleosomes, with a characteristic dip in nucleosome occupancy coinciding
exactly with the transcription start site. This is probably necessary to allow RNA remove crosslinks by heat
polymerase and transcription factors to gain access to the DNA. These regions treatment and proteolysis
therefore appear as DNase-hypersensitive sites (DHSs), which therefore often
mark promoters and other regulatory sequences. Changes in sensitivity to DNase
at particular loci can be observed when cells differentiate or change their state;
these presumably reflect the repositioning of nucleosomes. In yeast, the Isw2
remodeling factor has been shown to act as a general repressor of transcription
by sliding nucleosomes away from coding sequences onto adjacent regulatory inverse PCR with primers
sequences. The human counterparts probably act similarly. from bait sequence
Several variant histones exist, which replace the standard histones in specific
types of chromatin or in response to specific signals. The best known is CENP-A,
a variant histone H3 that is found in centromeric heterochromatin. Another interactor sequences
example is H2A.Z, a variant H2A that is associated with active chromatin regions
and may be involved in containing the spread of heterochromatin. The relative identify interactor
sequences on
depletion of nucleosomes at active promoters and regulatory sequences, men- microarrays or by
tioned above, reflects the fact that nucleosomes at these locations tend to con- sequencing
tain H2A.Z and another variant histone, H3.3. This makes them unstable, thus
allowing DNA-binding proteins better access to these sequences. Another vari- Figure 2 Chromosome conformation
ant of H2A, macro-H2A, is characteristic of nucleosomes on the inactive X chro- capture (4C). [Adapted from Alberts
mosome. Incorporation of these variant histones depends on the ability of some B, Johnson A, Lewis J et al. (2007)
chromatin remodeling complexes to loosen the structure of nucleosomes and Molecular Biology of the Cell, 5th ed.
promote histone exchange. For example, the TIP60 complex (see Table 11.3) has Garland Science/Taylor & Francis LLC.]
a function in replacing H2A by H2A.Z.
CHROMATIN CONFORMATION: DNA METHYLATION AND THE HISTONE CODE 357
CHRAC 3 SNF2H –
BAF, Brg- or Brm-associated factor; NuRF, nucleosome remodeling factor; CHRAC, chromatin
accessibility complex; ACF, ATP-dependent chromatin assembly factor; NuRD, nucleosome
remodeling and histone deacetylase; TIP, Tat-interactive protein.
(A) (B)
H
8
7 O H N CH3
N 5 6
HC 4
C C C C
6 Figure 11.10 5-Methylcytosine.
N C N H N CH
9
4 1 (A) 5-Methylcytosine base-pairs with guanine
N C C N in the same way as unmodified cytosine.
sugar 3 2 2 1
(B) There is a symmetrically methylated CpG
N H O
sequence in the center of this molecule. The
sugar methyl groups (red) lie in the major groove of
H
the double helix. [(B) designed by Mark Sherman.
G 5MeC Courtesy of Arthur Riggs and Craig Cooney.]
358 Chapter 11: Human Gene Expression
CH3 CH3
5¢ CagggCgggCttCgagtCa 3¢ 5¢ CagggCgggCttCgagtCa 3¢
Taql site Taql site
all unmethylated cytosines modified sodium methylated cytosines unchanged
bisulfite
CH3 CH3
5¢ UagggUgggUttUgagtUa 3¢ 5¢ UagggCgggUttCgagtUa 3¢
PCR
5¢ TagggTgggTttTgagtTa 3¢ 5¢ TagggCgggTttCgagtTa 3¢
analyses
P
P
P
UM
UM
MS
MS
Figure 1 Modification of DNA by sodium bisulfite. MSP, primer specific for the methylated
sequence; UMP, primer specific for the unmethylated sequence.
These dense clusters of CpG sequences tend to be protected from DNA meth-
ylation regardless of whether the associated gene is active or inactive, and they
are likely to be under strong selective pressure to retain promoter activity. On
occasion, however, CpG islands can acquire DNA methylation, which inevitably
results in the shutting down of transcription, as observed on the inactive X chro-
mosome or at imprinted genes (see below). Aberrant methylation of CpG islands,
particularly those associated with tumor suppressor genes, is a general charac-
teristic of cancer cells (this is discussed further in Chapter 17). A minority of CpG
islands do show variable methylation that correlates with gene silencing, often as
part of a developmental switch. Figure 11.11 shows examples of the distributions
of CpG methylation in different tissues and at different loci.
Methyl-CpG-binding proteins
The effect of DNA methylation on gene expression can be observed directly, in
that some transcription factors such as YY1 fail to bind to methylated DNA. DNA
methylation can also be read by proteins that contain a methyl-CpG-binding
domain (MBD). These can then recruit other proteins associated with repressive
CHROMATIN CONFORMATION: DNA METHYLATION AND THE HISTONE CODE 359
DNMT3L 606588 binds to chromatin with unmethylated H3K4 and DNMT3A, DNMT3B; histone deacetylases
stimulates activity of DNMT3A/3B
PCNA, proliferating cell nuclear antigen; HP1, heterochromatin protein 1. aFor the distinction between de novo and maintenance methylases, see
Section 11.3 on p. 365. bDNMT2 turned out to have RNA rather than DNA as its substrate, but its structure is that of a DNA methyltransferase, and it is
included here because it is listed as such in many publications.
maintenance
methylation
degree of methylation
sperm
primordial germ cells PGCs
fertilized de novo
methylation
in somatic cells
oocyte
erasure of placenta,
parental yolk sac
de novo egg demethylation
epigenetic methylation
settings in early embryo
and
establishment de novo
of imprinting methylation in
marks trophoblast
blastocyst lineages
developmental stage
the two pronuclei have fused, the early zygote genome undergoes further Figure 11.12 Changes in DNA methylation
genomewide demethylation until the morula and early blastula stages in the pre- during mammalian development. Drastic
implantation embryo. Starting at the blastocyst stage, and concurrently with ini- and often tissue-specific changes in overall
methylation accompany gametogenesis and
tial differentiation events, the genome undergoes a genomewide wave of de novo
early embryonic development. Their causal
DNA methylation. The extent of this methylation varies in different cell lineages:
role remains uncertain, although mice that
somatic cell lineages are heavily methylated, whereas trophoblast-derived line- are specifically unable to methylate sperm
ages (giving rise to the placenta, yolk sac, and so on) are relatively undermethyl- DNA are infertile. Note the breaks (slashes)
ated. The primordial germ cells of the embryo (from which the gametes will ulti- in the developmental lines. PGCs, primordial
mately be derived—see Chapter 4) undergo a second wave of genomewide germ cells.
reprogramming, with progressive demethylation first occurring on entry of the
migrating primordial germ cells to the genital ridge, followed by genomewide
remethylation as the germ cells begin to develop.
These changes are very dramatic, but it is debatable how far they are causative
of the events they accompany, rather than being downstream results of the mech-
anisms that cause differentiation. DNA methylation is a feature of vertebrates,
flowering plants, and some fungi, but it is virtually absent from many of the best-
studied model organisms, including yeast, C. elegans, and Drosophila —yet the
mechanisms of differentiation are fundamentally similar in all these organisms.
DNA methylation is certainly required for vertebrate development, because ani-
mals deficient in DNA methyltransferase activity die at various stages of develop-
ment. Similarly, although pluripotent embryonic stem cells derived from
blastocyst-stage embryos can grow normally even in the complete absence of
DNA methylation, these cells immediately undergo apoptosis when stimulated
to differentiate. Extensive epigenetic erasure might be important for resetting the
epigenetic information originating from the previous generation to achieve a
relatively blank and versatile pluripotent state necessary at the onset of
development.
humans with loss-of-function mutations of DNMT3B have the autosomal reces- methylated DNA binds
MECP2 and MBD1
sive ICF syndrome (Immunodeficiency, Centromeric instability, Facial anoma-
lies; OMIM 242860), in which the pericentromeric heterochromatic satellites 2
these can
and 3 of chromosomes 1, 9, and 16 are demethylated and unstable. recruit HDACs
DNA methylation and histone modification interact (Figure 11.13). DNA and histone
methyltransferases
methyltransferases and some proteins that bind methylated DNA (MeCpG-
binding proteins) recruit repressive complexes that contain HDACs and/or his-
tone methyltransferases. Methylated H3K9 can bind chromodomain proteins histone
DNA
deacetylation
such as HP1 (see below), which can in turn recruit DNA methyltransferases. methylation
and methylation
These mutually reinforcing effects may underlie long-term repressive chromatin
states. In addition, constitutive, but not facultative, heterochromatin is very rich methylated
in repetitive DNA. In the fission yeast S. pombe, this invokes the RNA-dependent histones bind
mechanism described below, and the same may be true in mammals. chromodomain
proteins
The role of HP1 protein
these can recruit DNA
HP1 (heterochromatin protein 1) is a small nonhistone protein, highly conserved methyltransferases
across evolution, that is typically found in heterochromatin. One feature of het-
erochromatin is its tendency to spread, and this may be mediated by HP1. HP1 Figure 11.13 Mutually reinforcing
contains a chromodomain that enables it to bind to histones modified by the histone and DNA methylation in
inactive chromatin. Methylation of DNA
methylation of H3K9. Nucleosomes containing H3K9me3 recruit HP1, which
attracts proteins that modify associated
then binds H3K9 methyltransferase. This can methylate H3K9 in the next-door histone proteins. Histone deacetylation
nucleosome, leading to HP1 binding, and so mediating a spread of methylation and methylation attract proteins that
and HP1 along a series of nucleosomes. Spreading is limited by sites, known as methylate associated DNA. In addition
boundary elements or insulators, at which the chromatin is especially rich in to the effects shown here, some histone
acetylated histones and methylated H3K4, which are markers of active methyltransferases can interact directly
transcription. with DNA methyltransferases. Endogenous
More generally, HP1 acts as a platform onto which a variety of chromatin- short interfering RNAs may be important in
modifying proteins can be assembled. These include both HDACs and histone nucleating heterochromatin. HDACs, histone
methyltransferases. The multiplicity of possible partners probably explains why deacetylases; MECP2, MBD1, methyl cytosine
binding proteins.
HP1 is also found in euchromatic regions, where it usually represses but some-
times activates gene expression. In Neurospora and maybe elsewhere, HP1 can
recruit a DNA methyltransferase, providing another link between histone meth-
ylation and DNA methylation.
A role for small RNA molecules
A surprising finding in the fission yeast Schizosaccharomyces pombe was that
when genes involved in the RNA interference mechanism (described in Box 9.7)
were inactivated, this affected the assembly of heterochromatin. Heterochromatic
regions are coated with components of the RNAi-induced transcriptional silenc-
ing complex (RITS). One component of this complex is a chromodomain-
containing protein, which may explain how the RITS could be targeted to chro-
matin containing methylated H3K9. Paradoxically, silencing of heterochromatin
requires transcription from repeat elements within the region, and these tran-
scripts are processed to small interfering RNAs (siRNAs). In some instances the
siRNAs can trigger the methylation of histones, providing a mutually reinforcing
mechanism like the mutually reinforcing methylation of histones and DNA
described above. In mammals, small RNAs generated from repeat sequences are
similarly involved in the nucleation of heterochromatin. RNA-based mechanisms
introduce the possibility that sequence-specific hybridization has a role in deci-
sions about chromatin conformation—an interesting area for speculation, given
that promoters are poorly defined by the local DNA sequence.
A role for nuclear localization
The interphase cell nucleus is highly structured. Ribosomal RNA synthesis is
confined to the nucleolus; most transcription takes place in a limited number of
transcription factories, and RNA splicing occurs mainly in structures called
nuclear speckles. Individual chromosomes occupy distinct territories that are
characteristic of a particular cell type (see Figure 2.9). Active genes tend to be
located on the outside of their chromosome territory, and to lie toward the center
of the nucleus. Inactive genes are more likely to be near the nuclear periphery.
Artificially tethering a region of chromatin to the nuclear lamina affects the
expression of the genes it contains.
362 Chapter 11: Human Gene Expression
Many reports document how sequences that are located far apart on a chro-
mosome, or even on different chromosomes, but that are involved in regulatory
interactions, may come to lie together within the nucleus—a phenomenon
described as gene kissing. The technique of chromosome conformation capture
(3C or 4C) is used to study such effects (see Box 11.3). For example, in differentiat-
ing female mouse embryonic stem cells, the two copies of the Xic (X-inactivation
center) sequence lie close to one another just before the X-inactivation process
starts, when the X chromosomes are being counted, but not at other times.
No single prime cause?
DNA methylation, histone modification, and nucleosome positioning all show
clear correlations with different states of chromatin, but it has not yet been pos-
sible to identify a single overriding control. Heterochromatin contains methyl-
ated DNA, tightly packed nucleosomes, and dimethylated or trimethylated H3K9.
Transcription start sites have unmethylated DNA and are marked by monometh-
ylated, dimethylated, or trimethylated H3K4 and acetylation of H3K9 and H4K5,
8, 12, and/or 16. Additionally, nucleosomes at these sites often contain the vari-
ant histones H2A.Z and H3.3, which make them unstable. Active genes are
reported to have a 5¢Æ3¢ gradient of modifications. Near the 5¢ end, lysines 4, 9,
and 27 of H3 are all monomethylated. In more distal parts of active genes, H3K36
is trimethylated, and the DNA is often highly methylated. By contrast, inactive
genes have trimethylation of H3K4 and H3K27. Current understanding suggests
that the different states of chromatin are the result of several different processes,
each of which individually contributes only a small part of the effect. Feedback
loops cause specific combinations of small effects to govern the overall chroma-
tin state.
known genes by including novel upstream exons. The novel exons were often
50–100 kb upstream of the previously documented transcription start site, with
20% located more than 200 kb upstream. There was also evidence of transcripts
containing exons of more than one gene. What this means in terms of cell func-
tion remains to be determined.
In the light of these findings, it is difficult to maintain the traditional idea of
genes as discrete stretches of DNA that constitute clearly delimited transcription
units. The results raise serious questions for future research. Do the novel tran-
scripts represent novel functional RNAs, and if so what is their function? Or do
they represent a requirement for transcription of much intergenic DNA, but not
for the transcript itself—as suggested by studies of some imprinted loci (see the
next section)? Alternatively, do the data mean that cells, just like the people stud-
ying them, find it difficult to identify which parts of the genome are functional
and which are junk? Examination of the novel transcription start sites showed
that they had properties indistinguishable from known start sites, which suggests
Figure 11.15 An example of the ENCODE data. This small subset of the overall data shows genes, CpG islands, transcripts, putative promoters, and
histone modification in 350 kb of genomic DNA adjacent to the telomere of chromosome 16p of HeLa cells. The aim is to identify and map all the factors
that, acting in combination, make up the chromatin landscape and govern gene expression. [Adapted from the ENCODE Project Consortium (2007) Nature
447, 799–816. With permission from Macmillan Publishers Ltd.]
364 Chapter 11: Human Gene Expression
that the novel transcripts are not just some sort of noise in the system. It seems
likely that the traditional view of the genome is in need of major revision, but the
implications of this are at present far from clear (see also Box 9.4).
Predicting transcription start sites
Within a core promoter there are typically several alternative transcription start
sites, each operating with a different efficiency. The relative usages of the differ-
ent start sites can be predicted fairly successfully from the local DNA sequence.
However, on a larger scale the location of core promoters is not predicted by the
DNA sequence. Transcription start sites can be identified by a combination of
indicators: DNase hypersensitivity, DNA methylation, and histone modifications.
(A)
Figure 11.17 shows that a specific pattern of histone modification is clearly asso-
ciated with transcription start sites of active genes.
H3K4me1
scaffold. Certainly, the ENCODE project confirmed previous work that demon- H3K4me2
H3K4me3
strated the existence of large (hundreds of kilobases) functional chromosome H3ac
H4ac
domains. Although we can now describe and manipulate many gene control ele- FAIRE
ments, we are often still unable to explain why those sequences have acquired DNase I
that function.
Figure 11.17 Chromatin conformation at transcription start sites. –5000 –3000 –1000 1000 3000 5000
distance to transcription start site
Curves show the distribution of markers of chromatin conformation across
transcription start sites of genes within the ENCODE pilot phase regions.
(A) All annotated genes. (B) All active genes associated with a CpG island. (C)
(C) All inactive genes associated with a CpG island. Data are shown for five
different histone modifications (monomethylation, dimethylation, and
relative frequency of binding
H3K4me1
trimethylation of H3K4, acetylation of H3 and H4), and for two markers of open H3K4me2
H3K4me3
chromatin [susceptibility to formaldehyde crosslinking, FAIRE (formaldehyde- H3ac
assisted isolation of regulatory elements), and DNase I hypersensitivity]. The H4ac
FAIRE
peaks show that the chromatin surrounding active transcription start sites DNase I
is in an open conformation and carries a particular combination of histone
modifications. The signature comprises H3K4me3 and H3K9/14ac located just
downstream of the transcription start site, with H3K4me2 further downstream.
There is a weaker and more diffuse peak of acetylated H4. [Adapted from the
ENCODE Project Consortium (2007) Nature 447, 799–816. With permission from –5000 –3000 –1000 1000 3000 5000
Macmillan Publishers Ltd.] distance to transcription start site
EPIGENETIC MEMORY AND IMPRINTING 365
example of example of
paternal X inactivated maternal X inactivated
pluripotent lineages, occurring at the late blastula stage in mice, and most prob-
ably also in humans. Inactive X chromosomes remain condensed throughout the
cell cycle and can be seen as the Barr body or sex chromatin on the periphery of
the cell nucleus (see Figure 3.8). A 46,XX cell may inactivate the maternal or the
paternal X chromosome, but whichever is chosen for inactivation, that same one
is inactivated in all daughter cells (Figure 11.19A). The body of an adult female is
thus a mosaic of cell clones, each clone retaining the pattern of X-inactivation
that was established in its progenitor cell early in embryonic life. Figure 11.19B
shows a striking example from the tortoiseshell cat. Inactivation is stable through
mitosis but not across the generations. A woman’s maternal X chromosome can
equally well have been the active or inactive one in her mother, and it has the
same chance as her paternal X chromosome of being inactivated in her own
cells.
Initiating X-inactivation: the role of XIST
Inactivation is initiated at the X-inactivation center (XIC ) at Xq13 and then prop-
agates along the whole length of the chromosome in what may be an extreme
example of the tendency of heterochromatin to spread. Spreading depends on
physical continuity of the chromosome, and also on some unknown special fea-
ture of the X-chromosome DNA. As described in Chapter 2, if the inactive X chro-
mosome is split in two by an X-autosome translocation, inactivation is limited to
the segment that includes XIC. It cannot jump to the detached portion of the X
chromosome. It may spread from the XIC some way into the attached autosomal
sequences, but only a short distance; it does not propagate all along the auto-
somal segment.
As previously mentioned, the earliest observed event in XX cells is a transient
pairing of the two XIC sequences. This is probably the mechanism by which the X
chromosomes are counted, because counting is disrupted if the XIC sequences
are deleted, replicated, or translocated. XIC encodes a large noncoding RNA, XIST
(X-inactivation-specific transcript), which is expressed only from the inactive X
chromosome. Its primary transcript undergoes splicing and polyadenylation to
generate a 17 kb mature noncoding RNA. The RNA is then thought to recruit
some protein factors that organize the chromatin into a closed, transcriptionally
EPIGENETIC MEMORY AND IMPRINTING 367
inactive conformation. In differentiated cells that have already undergone RT-PCR results
from 9 cell lines
X-inactivation, loss of XIST does not cause reactivation. Thus, XIST is required for
the establishment of X-inactivation, but not to maintain it.
10
In the mouse, expression of Xist from the active X chromosome is silenced by
an antisense transcript (Tsix) that alters the chromatin configuration at the Xist 20
locus, possibly through an RNAi mechanism induced by double-stranded Xist- 30
Tsix RNA. Humans have a TSIX gene, but it has little sequence or structural
homology to mouse Tsix, and it is much less clear whether a similar mechanism 40
X-inactivation is not a blanket inactivation of the entire chromosome. The two 120
pseudoautosomal regions (see Chapter 2) escape inactivation, but even outside 130
these regions X-inactivation is patchy. Some specific examples of genes that
escape inactivation were mentioned in Chapter 10 (see p. 326). Some of these are 140
genes that have a functional counterpart on the Y chromosome, for which there 150
would be no need for dosage compensation. Many, however, have no such coun- Mb
terpart. In one study, different hybrid cells containing independent copies of a
human inactive X were used to investigate the transcription of 612 X-linked Figure 11.20 Genes that escape
X-inactivation. Columns in the rectangle
genes: 458 of the genes were inactivated in all or most of the cell lines, but 94 were
show results of systematic RT-PCR tests
expressed. The remaining 60 genes showed a variable pattern of expression in for expression of X-linked genes in nine
different cell lines (Figure 11.20). Despite the general requirement for physical independent somatic cell hybrids. Each
continuity, the spreading X-inactivation is evidently able to jump over these hybrid contained a single inactive human
excluded genes and continue its progress. X chromosome. Blue bars identify genes
that were expressed from the inactive X
At imprinted loci, expression depends on the parental origin chromosome in a particular hybrid, and
yellow bars mark genes that were not
In Chapter 2, we described evidence that for some genes the parental origin mat- expressed. Many genes, scattered all along
ters. Zygotes with two maternal or two paternal genomes develop abnormally, as the X chromosome, escape inactivation in
ovarian teratomas or hydatidiform moles, respectively. For some chromosomal one or more of the hybrids. Some genes,
segments, although not others, uniparental disomy (UPD; see Section 2.5, p. 50) such as XIST, escape inactivation in all nine
is pathogenic. Systematic surveys have been made to identify imprinted chromo- hybrids, whereas others show a more patchy
somal regions in the mouse. Unlike in humans, all the mouse chromosomes are pattern of inactivation. Cen, centromere.
acrocentric, so Robertsonian translocations (see Figure 2.23) can permit crosses [From Carrel L & Willard HF (2005) Nature 434,
400–404. With permission from Macmillan
that produce offspring with both copies of any chosen whole chromosome
Publishers Ltd.]
derived from a specified single parent. These reveal that UPD for some chromo-
somes has no phenotypic effect, whereas for others it produces abnormal pheno-
types. The abnormal phenotypes are sometimes complementary for different
parental origins; for example, overgrowth is often seen in paternal UPD and
growth retardation in maternal UPD. For some chromosomes, UPD is lethal.
These effects are caused by UPD for small numbers of specific genes on the rele-
vant chromosome, rather than by the whole chromosome. Further work has
identified about 100 imprinted loci on 11 of the mouse chromosomes. The results
are summarized at www.har.mrc.ac.uk/research/genomic_imprinting/maps
.html. In humans, a similar systematic investigation of imprinting is not possible,
but many of the mouse imprinted genes are now known also to be imprinted in
humans.
To confirm imprinting of a gene, it is necessary to identify an individual who
is heterozygous for a sequence variant present in the mature mRNA; mRNA from
different tissues can then be checked for monoallelic or biallelic expression, and
the origin of each allele can be determined by typing the parents. For some genes,
this type of analysis has shown that imprinting is confined to certain tissues or to
certain stages of development (Table 11.5). In humans, imprinted genes have
been identified on chromosomes 6q24–q26, 7q21–q22, 7q32 (and maybe 7p12),
8q24, 11p15, 14q32, 15q11–q13, 19q13, and 20q13. Figure 3.18 showed examples
of pedigrees involving pathogenic mutations in imprinted genes.
368 Chapter 11: Human Gene Expression
IGF2 insulin-like growth factor maternal imprinted in many tissues but biallelically expressed in brain, adult liver,
chondrocytes, etc.
PEG1/MEST esterase maternal imprinted in fetal tissues but biallelically expressed in adult blood
UBE3A ubiquitin ligase paternal imprinted exclusively in brain; biallelically expressed in other tissues
KvLQT1 potassium channel paternal imprinted in several tissues but biallelically expressed in heart
WT1 Wilms tumor gene paternal frequently imprinted in cells of placenta and brain but biallelically expressed in
kidney
N
(A)
As
RP
at 15q11–q13 that is deleted in Prader–
RN
N
F-S
L2
B3
rf2
no
Willi or Angelman syndrome. (A) Genes
A
2
A
RN
10
RN
RN
GE
II s
UR
BR
5O
E3
N
in the imprinted cluster. Arrows show the
ATP
MK
MA
PW
PW
ND
HB
SN
UB
GA
C1
direction of transcription. Blue represents
paternally expressed genes, pale blue
predominantly paternally expressed genes,
IC
red maternally expressed genes, and black
biallelically expressed genes. IC marks an
(B) imprinting control center, a sequence that
is differentially methylated in males and
females and that seems to control overall
SNURF–SNRPN extended transcription unit imprinting of the region. (B) There seem to
HBII-43 RPN
B
38
38
3
II-4
II-4
UR
HB
SN
These regions bind the insulator protein CTCF when unmethylated, but not
when methylated. CTCF can prevent a promoter from having access to an CTCF
enhancer if, but only if, it lies between the two (Figure 11.23).
IGF2 ICR H19 E E
––– +++
Two questions arise about imprinting: how is it done and why is it
enhancers
done?
The phenomenon of imprinting raises two difficult questions. The first is: How
does it work? It must be reversible: a man may inherit a particular sequence with Me
a maternal imprint from his mother, but if he passes it on to his child it will then
carry a paternal imprint. The imprint probably consists of differential methyla- IGF2 ICR H19 E E
+++ –––
tion of CpG dinucleotides in imprinting control centers in the paternal and
maternal genomes. These are imposed by sex-specific de novo methylation dur- Figure 11.23 The H19 and IGF2 genes
on chromosome 11p compete for an
ing gametogenesis. How the specific regions to be imprinted are chosen is not
enhancer. Binding of the CTCF insulator
known. Although methylation patterns are quite different in sperm and eggs, protein to a differentially methylated
most of these differences are erased in the wave of demethylation in the early region determines the outcome of the
zygote (see Figure 11.12). However, for these special regions the imprint is erased competition. On the maternal chromosome,
only in primordial germ cells, not in somatic cells. Interestingly, at least in the the imprinting control region (ICR) is
mouse the Dnmt1 DNA methyltransferase uses different promoters in spermato- unmethylated, allowing it to bind CTCF
cytes, oocytes, and somatic cells, so that each contains a different isoform of the (orange oval). This prevents the IGF2 gene
enzyme. Maybe gamete-specific isoforms allow this differential control of from accessing the enhancer, and so the
methylation. enhancer drives H19 expression. On the
The second question concerns the function of imprinting. One popular the- paternal chromosome, methylation of the
imprinting control region prevents the
ory is based on a conflict of evolutionary interest between fathers and mothers.
binding of CTCF, thereby allowing IGF2 to
Selfish gene theory suggests that paternal genes might be best propagated by outcompete H19 for access to the enhancer.
ensuring that offspring are born as robust as possible, even at the expense of the [Adapted from Wallace JA & Felsenfeld G
mother—a man can father children by many different mothers. Maternal genes, (2007) Curr. Opin. Genet. Dev. 17, 400–407.
in contrast, are best propagated if the mother remains capable of further preg- With permission from Elsevier.]
nancies. So, the theory goes, paternal genes program the fetus to extract nutri-
ents at the greatest possible rate from the mother through the placenta, whereas
maternal genes act to limit the depredations of a parasitic fetus. This does fit
many imprinted loci, but is unlikely to be the full explanation. Why should some
imprinting be tissue-specific? And not all imprinted genes have functions related
to intrauterine growth, or are imprinted in the direction predicted by the parental
conflict hypothesis.
–/–
lethal
Some genes are expressed from only one allele but independently
of parental origin
X-inactivation ensures the monoallelic expression of most X-linked genes.
However, monoallelic expression, independently of parental origin, is also a fea-
ture of a surprisingly high proportion (maybe as high as 5–10%) of autosomal
genes. Three processes can be distinguished.
First, the immunoglobulin and T-cell receptor genes show monoallelic expres-
sion after programmed DNA rearrangements. Each B or T lymphocyte expresses
only a single allele of the relevant gene. As described in Chapter 4, functional
immunoglobulin and T-cell receptor genes are assembled from large batteries of
potential gene segments by a complicated series of DNA rearrangements. Some
random elements within the process mean that the chance of a productive rear- inactive H
rangement is quite low; however, once a functional protein has been produced, a (methylated)
feedback mechanism inhibits further DNA rearrangements.
Competition for a single-copy enhancer provides a second mechanism. The
olfactory receptor genes are a well-studied example (Figure 11.25). As mentioned
in Chapter 9, olfactory receptor genes are the largest gene family in humans and
mice. Mice have some 1300 of them, and humans at least 900. Nevertheless, each active H
olfactory neuron expresses just a single allele of a single receptor gene, so that it
fires only in response to one specific odorant. In mice, and presumably also in
humans, expression of any olfactory receptor gene depends absolutely on an
enhancer sequence (H) present as only a single copy in each genome. Chromo- OR1
some conformation capture experiments show that the single copy of H can
associate with any one of the 1300 receptor genes, regardless of their chromo-
OR2
somal location. Although a diploid olfactory neuron contains two copies of H,
only one is active; the other is inactivated by methylation (interestingly, appar-
ently at CpA rather than CpG sequences). OR3
Finally, the great majority of cases are random. In independent cell clones, a
gene is monoallelically expressed in some clones, but both alleles are expressed
in others. In cell cultures, the effects are stable within a clonal lineage. For a given Figure 11.25 Competition for a single
gene, some clones may show paternal and others maternal expression. In an enhancer leads to monoallelic expression
of olfactory receptors. Expression of any
individual there is no consistency of parental origin between different mono-
of the hundreds of olfactory receptor genes
allelically expressed genes, even when they lie on the same chromosome. (OR1, OR2, OR3, …) distributed around
Polymorphic variation in the expression level of alleles is common and heritable the genome depends on association with
(expression quantitative trait loci or e-QTLs), so monoallelic expression may be a single-copy enhancer, H. One of the two
the tail end of the distribution. The cause and significance of all this are unknown, alleles of H in a diploid cell is inactivated by
but probably e-QTLs and random monoallelic expression explain much of the methylation (green circles), so that there is
phenotypic diversity of our species. only a single active copy of H in each cell.
372 Chapter 11: Human Gene Expression
P1 P2 P3 P4
four alternative D D D D A D A
promoters and
first exons downstream exons
Figure 11.26 A gene with several
four alternative
primary transcripts alternative promoters. The gene has four
alternative promoters (P1–P4) and first
exons (different-colored boxes), plus two
AAAAAAA...
downstream exons (red boxes). The positions
AAAAAAA... of splice donor (D) and splice acceptor (A)
four alternative sites are shown. Each splice junction has to
mature mRNAs AAAAAAA...
be made by joining a donor and an acceptor
AAAAAAA... site. The mature mRNA has a poly(A) tail.
ONE GENE—MORE THAN ONE PROTEIN 373
(Figure 11.27). Transcripts using these promoters encode large dystrophin pro-
tein (Dp) isoforms with a molecular weight of 427 kD. The three Dp427 isoforms
differ in their N-terminal amino acid sequence because the three alternative first
exons include coding sequence. However, in addition to these three promoters,
at least four other alternative internal promoters can be used, generating much (B)
smaller transcripts that lack many of the exons of the Dp427 isoforms.
the transferrin receptor (TfR), part of the machinery for absorbing dietary iron,
again without any effect on the production of the mRNA. The 5¢ UTRs of both fer-
ritin heavy-chain mRNA and light-chain mRNA contain a single iron-response
element (IRE). This is a specific cis-acting regulatory sequence that forms a hair-
pin structure. Several such IRE sequences are also found in the 3¢ UTR of the
transferrin receptor mRNA. Regulation is effected by a specific IRE-binding pro-
tein that is activated at low iron levels to bind IREs (Figure 11.32). Like many
translational controls, the effects depend on the secondary structure of the
mRNAs.
Until a few years ago, these scattered examples of RNA-based controls on
gene expression were considered interesting but rare events. That view was
changed with the discovery of RNAi and miRNAs.
(A) (B)
transfection miRNA control miR-155 Figure 11.33 Proteomic analysis of
100 transfection H
the effect of changing the level of a
CEBP-b
relative intensity
CONCLUSION
Gene expression is controlled by decisions at all levels: on whether or not to tran-
scribe a sequence; on the choice of transcription start site; on how to process an
alternatively spliced primary transcript; on how, when, and where to translate
the message in the mature mRNA; and on the modification and transportation,
within or outside the cell, of the initial polypeptide.
Protein-coding genes, and many genes encoding functional RNAs, are tran-
scribed by RNA polymerase II. Before transcription can start, a large complex of
proteins must be assembled around the transcription start site. Apart from the
RNA polymerase, these include the TFII transcription factors (a complex of at
least 27 individual polypeptides) that comprise the basal transcription appara-
tus, plus many tissue-specific proteins. Some of these proteins recognize specific
DNA sequence elements near the transcription start site, such as the TATA box.
Others join the complex through protein–protein interactions. These include
proteins bound to distant regulatory elements that are brought into close prox-
imity to the promoter by DNA looping.
Transcription start sites are defined more by the local chromatin conforma-
tion than by the local DNA sequence. The conformation depends on a set of
mutually reinforcing processes: histones in the core nucleosomes undergo
numerous covalent modifications or are sometimes replaced by variant mole-
cules; cytosines in CpG dinucleotides may be methylated, and ATP-dependent
chromatin remodeling complexes alter the distribution of nucleosomes. The
many nonhistone proteins in chromatin include the enzymes that add or remove
modifying groups to or from the histones and DNA. They are themselves differ-
entially bound to chromatin, depending on its conformation and state of
modification.
Chromatin conformation controls gene expression, both in the short term as
cells respond to fluctuating metabolic and environmental conditions, and in
the longer term as developmental programs unfold. Enduring changes in confor-
mation are the basis of epigenetic effects—changes in gene expression that do
not depend on changes in the DNA sequence. For some genes, epigenetic effects
depend on the parental origin—the phenomenon of genetic imprinting.
Transcription requires an open chromatin conformation, whereas heterochro-
matin has a closed, tightly packed conformation and represses gene expression.
Small noncoding RNA molecules are involved in the establishment of stable
heterochromatin.
Primary transcripts are often subject to alternative splicing. Exons can be
skipped or included, or variant donor or acceptor sites can be used. Some genes
can potentially encode hundreds of different polypeptides because of extensive
FURTHER READING 379
FURTHER READING
Promoters and the primary transcript Chromatin conformation: DNA methylation and the
Fuda NJ, Ardehali B & Lis JT (2009) Defining mechanisms that histone code
regulate RNA polymerase II transcription in vivo. Nature Barski A, Cuddapah S, Cui K et al. (2007) High-resolution profiling
461, 186–192. [A nice review of controls on initiation of of histone methylations in the human genome. Cell 129, 823–
transcription, with good (but nonhuman) examples.] 837. [Genomewide mapping of histone methylations across
Kleinjan DA & van Heyningen V (2005) Long-range control of promoters, enhancers, insulators, and transcribed regions.]
gene expression: emerging mechanisms and disruption in ENCODE Project Consortium (2007) Identification and analysis
disease. Am. J. Hum. Genet. 76, 8–32. [Gives detail on PAX6, of functional elements in 1% of the human genome by the
SHH, and other clinically important examples.] ENCODE pilot project. Nature 447, 799–816. [Findings of
Lonard DM & O’Malley BW (2007) Nuclear receptor coregulators: the pilot project; details are in a series of papers in Genome
judges, juries, and executioners of cellular regulation. Mol. Research.]
Cell 27, 691–700. [Examples of regulation by networks of co- Koch CM, Andrews RM, Flicek P et al. (2007) The landscape of
activators and co-repressors.] histone modifications across 1% of the human genome in
Maniatis T & Reed R (2002) An extensive network of coupling five human cell lines. Genome Res. 17, 691–707. [Detail from
among gene expression machines. Nature 416, 499–506. [An the ENCODE pilot project.]
extra insight into control of gene expression.] Lall S (2007) Primers on chromatin. Nat. Struct. Mol. Biol. 14,
Sandelin A, Carnici P, Lenhard B et al. (2007) Mammalian RNA 1110–1115. [A good general introduction.]
polymerase II core promoters: insights from genomewide Ooi L & Wood IC (2007) Chromatin cross-talk in development
studies. Nat. Rev. Genet. 8, 424–436. [Promoter structure and and disease: lessons from REST. Nat. Rev. Genet. 8, 544–554.
transcription start sites; useful boxes on techniques.] [Function of the CoREST complex.]
Shiraki T, Kondo S, Katayama S et al. (1999) Cap analysis gene Wang Z, Zang C, Rosenfeld JA et al. (2008) Combinatorial patterns
expression for high-throughput analysis of transcriptional of histone acetylations and methylations in the human
starting point and identification of promoter usage. Methods genome. Nat. Genet. 40, 897–903. [A genomewide study of
Enzymol. 303, 19–44. [Gives details of the CAGE technique the distribution of over 30 specific histone modifications,
outlined in Box 11.1.] that identifies specific combinations that are characteristic of
Smale ST & Kadonaga JT (2003) The RNA polymerase II core promoters, enhancers, etc.]
promoter. Annu. Rev. Biochem. 72, 449–479. [Summarizes
much work on the start sites of genes.] Heterochromatin
Visel A, Rubin EM & Pennachio LA (2009) Genomic views of Bühler M & Moazed D (2007) Transcription and RNAi in
distant-acting enhancers. Nature 461, 199–205. [Review of heterochromatic gene silencing. Nat. Struct. Mol. Biol. 14,
the nature and function of enhancers, with emphasis on 1041–1048. [Involvement of small RNAs in heterochromatin
human examples.] formation.]
380 Chapter 11: Human Gene Expression
Gaszner M & Felsenfeld G (2006) Insulators: exploiting Ringrose L & Paro R (2007) Epigenetic regulation of cellular
transcriptional and epigenetic mechanisms. Nat. Rev. Genet. memory by the polycomb and trithorax group proteins.
7, 703–713. [General review of insulators.] Annu. Rev. Genet. 38, 413–443.
Hediger F & Gasser SM (2006) Heterochromatin protein 1: Wutz A & Gribnau J (2007) X-inactivation Xplained. Curr. Opin.
don’t judge the book by its cover! Curr. Opin. Genet. Dev. Genet. Dev. 17, 387–393.
16, 143–150. [Functions of HP1.]
Trojer P & Reinberg D (2007) Facultative heterochromatin: is there One gene—more than one protein
a distinctive molecular signature? Mol. Cell 28, 1–13. [Features Maniatis T & Tasic B (2002) Alternative splicing and proteome
of various types of facultative heterochromatin.] expansion in metazoans. Nature 418, 236–243. [A good
source of examples.]
DNA methylation Stamm S, Zhang MQ, Marr TG & Helfman DM (1994) A sequence
de la Serna IL, Ohkawa Y & Imbalzano AN (2006) Chromatin compilation and comparison of exons that are alternatively
remodelling in mammalian differentiation: lessons from spliced in neurons. Nucleic Acids Res. 22, 1515–1526. [Lists
ATP-dependent remodellers. Nat. Rev. Genet. 7, 461–473. many examples.]
Eckhardt F, Lewin J, Cortese R et al. (2006) DNA methylation Südhof TC (2008) Neuroligins and neurexins link synaptic
profiling of human chromosomes 6, 20 and 22. Nat. Genet. function to cognitive disease. Nature 455, 903–911.
38, 1378–1385. [An example of large-scale mapping of [Discusses alternative splicing of the neurexin 3 (NRXN3)
epigenetic signatures.] gene shown in Figure 11.29.]
Goll MG & Bestor TH (2005) Eukaryotic DNA cytosine
methyltransferases. Annu. Rev. Biochem. 74, 481–514. Control of gene expression at the level of
[Review of the DNA methyltransferases.] translation
Baek D, Villén J, Shin C et al. (2008) The impact of microRNAs on
Effects of nuclear localization on gene expression protein output. Nature 455, 64–71.
Akhtar A & Gasser SM (2007) The nuclear envelope and Buchan JR & Parker R (2007) The two faces of microRNA. Science
transcriptional control. Nat. Rev. Genet. 8, 507–517. 318, 1877–1878. [Reviews the evidence that miRNAs can
Dekker J, Rippe K, Dekker M & Kleckner N (2002) Capturing sometimes stimulate, rather than inhibit, translation.]
chromatin conformation. Science 295, 1306–1311. Chendrimada TP, Finn KJ & Ji X (2007) MicroRNA silencing
[The original 3C paper.] through recruitment of EIF6. Nature 447, 823–828.
Fraser P & Bickmore W (2007) Nuclear organization of the Klausner RD, Rouault TA & Harford JB (1993) Regulating the fate
genome and the potential for gene regulation. Nature 447, of messenger RNA: the control of cellular iron metabolism.
413–417. Cell 72, 19–28. [Source of the data on iron-response elements
Lanctôt C, Cheutin T, Cremer M et al. (2008) Dynamic genome in Figure 11.32.]
architecture in the nuclear space: regulation of gene Lee R, Feinbaum R & Ambros V (2004) A short history of a short
expression in three dimensions. Nat. Rev. Genet. RNA. Cell 116 (2 Suppl.), S89–S92. [Personal recollections of
8, 104–115. groundbreaking work.]
Linnemann AK, Platts AE & Krawetz SA (2009) Differential nuclear Mathonnet G, Fabian MR, Svitkin YV et al. (2007) MicroRNA
scaffold/matrix attachment marks expressed genes. Hum. inhibition of translation initiation in vitro by targeting of the
Mol. Genet. 18, 645–654. [Underlines the difficulty of defining cap-binding complex eIF4F. Science 317, 1764–1767. [One
matrix/scaffold attachment regions; a good source of way in which miRNAs inhibit translation; this may not be the
references.] whole answer.]
Ruvkun G, Wightman B & Ha Z (2004) The 20 years it took to
Epigenetic memory and imprinting recognize the importance of tiny RNAs. Cell 116 (2 Suppl.),
Cuzin F, Grandjean V & Rassoulzadegan M (2008) Inherited S93–S96. [A personal account of this important work.]
variation at the epigenetic level: paramutation from the plant Selbach M, Schwanhäusser B, Thierfelder N et al. (2008)
to the mouse. Curr. Opin. Genet. Dev. 18, 193–196. [Discusses Widespread changes in protein synthesis induced by
their work on the Kit paramutation.] microRNAs. Nature 455, 58–63. [Source of the data in Figure
Geneimprint. Jirtle Laboratory at Duke University. 11.33; illustrates an important experimental approach.]
www.geneimprint.com [A list of imprinted genes in humans
and other organisms.]
Gimelbrant A, Hutchinson JN, Thompson BR & Chess A (2007)
Widespread monoallelic expression on human autosomes.
Science 318, 1136–1140. [Surprising data showing
widespread monoallelic expression.]
Heard E & Disteche CM (2006) Dosage compensation in
mammals: fine-tuning the expression of the X-chromosome.
Genes Dev. 20, 1848–1867.
Horsthemke B & Wagstaff J (2008) Mechanisms of imprinting of
the Prader–Willi/Angelman region. Am. J. Med. Genet. 146A,
2041–2052.
Lomvardas S, Barnea G, Pisapia DJ et al. (2006) Interchromosomal
interactions and olfactory receptor choice. Cell 126, 403–413.
[How the H enhancer ensures that each olfactory neuron
expresses only a single olfactory receptor.]
Chapter 12
• Clues to the function of a gene can be inferred by comparing the sequence of the gene and
any protein products with all previously investigated nucleotide and protein sequences to
find related sequences. The related sequences may be closely related genes or proteins, or
they may be short functional sequence components such as protein domains or peptide/
nucleotide sequence motifs.
• Inactivating an individual gene often results in an altered phenotype that provides
important insights into the function of the gene, but useful information can also sometimes
be obtained by overexpressing genes or expressing them ectopically.
• Gene function can be conveniently studied in cultured cells by silencing the expression of a
specific target gene using RNA interference. More extensive information about how a gene
works can often be inferred by selectively inactivating a gene in whole model organisms.
• Yeast two-hybrid screens identify interactions between proteins by splitting a transcription
factor into two functionally complementary domains that are then joined onto test proteins;
interaction between two such test proteins can reconstitute the transcription factor and
switch on a reporter gene.
• Large-scale protein interaction studies seek to define interactomes, global functional
protein networks in cells.
• Protein–protein and protein–DNA interactions can be identified with the help of chemical
crosslinking agents such as formaldehyde that can covalently bond proteins to other
proteins or to DNA within cells.
• Chromatin immunoprecipitation identifies DNA-binding sites for a protein in vivo by using
a specific antibody to capture a protein after it has been chemically crosslinked to DNA
within cells.
382 Chapter 12: Studying Gene Function in the Post-Genome Era
Sequencing of the human genome was, with hindsight, the easy part. Now comes
the hard part: how do we unravel what the sequence means and discover how it
helps construct a functioning human being and differentiates us from other
species?
In the post-genome era, work will continue for some time to get a complete
inventory of human genes. As was described in Chapter 10, comparative genom-
ics has been particularly helpful in identifying regulatory elements and in vali-
dating predicted protein-coding genes. The vast majority of protein-coding genes
have been identified in the human genome, and we now know that there are
about 20,000 genes of this kind. However, for the reasons described in Chapter 9,
RNA genes have been much more difficult to identify within genomic DNA, and
our knowledge of human RNA genes, although growing rapidly, remains rather
incomplete.
As the effort to identify all the genes and other functional components in the
human genome intensifies, researchers seek to understand our genome in differ-
ent ways. One major priority, as will be detailed in Chapter 13, is prompted by the
biological and medical need to understand genetic variation at the genome level.
In this chapter we consider another priority, the need to understand the func-
tions of genes. In the post-genome era, whole genome analyses are increasingly
being applied to understand the function and regulation of genes. In many cases,
such analyses have been developed from methods that were used to study one or
a few genes at a time. However, getting a whole-genome perspective allows us to
develop comprehensive networks of interacting genes and gene products, and so
obtain a fuller picture of how genes and cells work.
Chapter 11 (p. 362), the ENCODE project seeks to identify and analyze all the
functional components of our genome by a combination of experimental
approaches and bioinformatics analyses. The widespread availability of many
experimental approaches to perform global gene analyses has meant that a net-
work of small laboratories throughout the world are also making invaluable con-
tributions in addition to large, well-equipped genome centers.
Mouse Atlas of Gene SAGEa and sequence tag data from many http://www.mouseatlas.org/mouseatlas_index_html
Expression mouse tissues during development
processed enzymatically so that from individual RNA molecules short sequence fragments (tags) are recovered and then sequenced. bRNA-seq
involves shotgun sequencing of the whole transcriptome, using massively parallel DNA sequencing.
databases and new algorithms—for example, to search and analyze the data or to
model structures and molecular interactions—as it has by experimental advances.
However, in addition, as described in this section, bioinformatics can be used
directly to infer gene function simply by comparing sequences from previously
uninvestigated genes with other nucleotide and protein sequences about which
some function is already known.
BOX 12.1 HOW BIOINFORMATICS CAN HAVE A VITAL ROLE IN ELUCIDATING GENE FUNCTION:
THE EXAMPLE OF THE NIPBL GENE
Positional cloning identified a previously unstudied gene at 5p13 Different protein interaction methods were used to show that
as a major locus for a severe developmental malformation, Cornelia the fly Nipped-B protein and the human NIPBL protein each bound a
de Lange syndrome (CdLS; OMIM 122470). The sequence of the small protein roughly 600 amino acid residues long, which turned out
underlying gene provided few clues to its function, but sequence to be closely related orthologs of each other and unexpectedly also
homology searching with the predicted 2804 amino acid protein of the well-studied C. elegans Mau-2 protein. Mau-2 was known to be
revealed that it had well-studied orthologs in flies and fungi. a regulator of cell and axon migration in C. elegans, but nothing was
The Drosophila ortholog, Nipped-B, was known to be a known about any of its orthologs in other species.
developmental regulator that seemed to facilitate long-range Standard BLAST searches did not find any evidence to suggest
interactions between promoters and remote enhancers in certain that Mau-2 and its orthologs were related to SCC4 in any way.
target genes, and so the human gene was named NIPBL (Nipped- However, subsequent analyses with PSI-BLAST, a specialized sequence
B-like). Fungal orthologs of the NIPBL protein, such as budding homology search program that is useful for detecting similarities
yeast SCC2, were known to load cohesins onto chromatin and to between distantly related sequences, showed that SCC4, Mau-2,
be important in regulating chromosome segregation. Individuals and the proteins binding to NIPBL and Nipped-B were orthologs.
with CdLS possessing a causative NIPBL mutation are heterozygotes Subsequent experimental analyses confirmed the idea that vertebrate
and do not have a significantly increased incidence of chromosome orthologs functioned in loading cohesin onto chromatin as partners of
segregation abnormalities. However, when both NIPBL alleles were vertebrate SCC2 orthologs. As for the SCC2/Nipped-B/NIPBL
selectively inactivated in human cells by RNA interference, clear family, selective inhibition of expression of genes encoding Mau-2
defects in sister chromatid cohesion were found. Parallel studies in and its orthologs also resulted in chromosome segregation defects
which the Nipped-B gene was inactivated in Drosophila cells also (Figure 1).
produced sister chromatid cohesion defects. It was concluded that the yeast SCC2–SCC4 cohesin-loading
In budding yeast, SCC2 was known to bind a smaller protein complex and its chromosome segregation function has been
SCC4, close to 600 amino acid residues long, and it had been shown conserved during evolution but that, in vertebrates, the same proteins,
that the SCC2–SCC4 complex was responsible for loading cohesins, including the NIPBL protein, are involved in various aspects of gene
multisubunit proteins that regulate chromosome segregation, on to regulation during embryonic development. Subsequent work showed
chromatin. What had been intriguing, however, was that whereas SCC2 that both NIPBL and human Mau-2 were involved in regulating axon
had been highly conserved during evolution with clear orthologs in migration. Genes encoding certain cohesin subunits were also found
all genomes, including NIPBL and Nipped-B, SCC4 seemed so poorly to be mutated in CdLS, and cohesins were found to have roles in
conserved that standard BLAST searching could find no matches other long-distance gene regulation in various cells including non-dividing
than homologs in closely related species of yeast that were confined to neurons.
the Saccharomyces genus. Figure 1 Synergy between
bioinformatic and experimental
investigations to identify functions for
Cornelia de Lange
human phenotype the Cornelia de Lange syndrome gene,
syndrome
NIPBL. Blue arrows, experimental work;
positional cloning of red arrows, bioinformatic analyses; colors
novel gene at 5p13 of boxes distinguish proteins according
full-length cDNA to species. Box outlines indicate whether
previous functional information was
human gene
unknown gene extensive (thick black borders), to limited
(now NIPBL) (dashed border), to unknown (no outline).
See Box 8.3 and Tables 8.4 and 12.4 for
TBLASTN
BLASTP 3-frame details of databases and homology
translation searching programs used. aa, amino acid.
TBLASTN
BLASTP
388 Chapter 12: Studying Gene Function in the Post-Genome Era
TABLE 12.2 SECONDARY DATABASES OF PROTEIN SEQUENCES USED TO IDENTIFY CONSERVED ELEMENTS AND PROTEIN
DOMAINS
Database Contents URL
InterPro a search facility that integrates the information from other http://www.ebi.ac.uk/interpro/
secondary databases
TABLE 12.3 DIFFERENT TYPES OF REVERSE GENETIC SCREEN IN CULTURED MAMMALIAN CELLS
Type of screen Basis of method
Loss-of-function usually, the RNA interference (RNAi) pathway is induced to selectively degrade RNA transcripts of a specific target gene
(see Figures 12.3 and 12.4)
Gain-of-function involves transfection of an exogenous gene copy driven by a suitable promoter to overexpress or misexpress that gene in a
desired cell type
Dominant-negative relies on producing a mutant gene product that interferes with the normal product of a specific target gene; similar to
loss-of-function screen but suppresses gene activity more efficiently than RNA; often the mutant product is designed to
be a truncated version of the wild type that competes with wild-type protein in some way, or the mutant protein becomes
incorporated with wild-type proteins in multisubunit complexes, thereby inactivating the complex
Modifier seeks to identify genes that enhance or suppress a specific phenotype; the initial phenotype is produced by one of the three
methods above; thereafter, the mutant cells are subjected to a second genetic modification (which can again be any of the
three methods above); if altered expression of a second gene enhances or suppresses the initial phenotype, the second gene
is considered to be a modifier of the gene that caused the initial phenotype; it then remains to be worked out how the two
genes interact
For further details see the review by Tochitani & Hayashizaki (2007).
typically reverse genetic screens. There are four different functional classes of
mutation—gain-of-function, loss-of-function, dominant-negative, and modifi-
er—and different screens can be conducted depending on the desired effect on
the target gene (Table 12.3).
Of the different approaches listed in Table 12.3, by far the most widely used
involves selective gene inactivation to cause a loss of function. An early general
approach used the specificity of base pairing to selectively inhibit the expression
of a predetermined gene. In antisense technology, RNA or oligonucleotide con-
structs are designed to have a sequence that is complementary to that of RNA
transcripts from a gene of interest. By hybridizing to the sense transcripts, the
antisense constructs block gene expression (gene knockdown).
Antisense technology was first developed in the 1980s. Specific antisense RNA
molecules were initially used to knock down the expression of a target gene, but
this was very much a hit-or-miss affair depending on which gene was targeted.
Subsequently, antisense oligodeoxynucleotides were used because they were
more stable than RNA. More recently, morpholino antisense oligonucleotides are
used. They have a vastly more stable and robust structure than conventional oli-
gonucleotides, and they are more consistent in producing significant inhibition
of gene expression. As will be described in Chapter 20, morpholino oligonucle-
otides are now routinely used to knock down the expression of specific genes in
various vertebrate models, notably zebrafish embryos.
As explained in the next section, the natural phenomenon of RNA interfer-
ence (see Box 9.7) has also been widely applied as an experimental tool to knock
down the expression of specific target genes in cultured cells and in the embryos
of some invertebrates, notably the nematode Caenorhabditis elegans.
(siRNA) (Figure 12.3, pathway 1). The siRNA associates with a multisubunit pro-
tein complex, the RNA-induced silencing complex (RISC), and the duplex siRNA
is unwound and one strand degraded by a RISC ribonuclease called argonaute.
The RISC with one siRNA strand attached to it is now activated, and the siRNA
strand will guide it to bind RNA transcripts containing complementary sequences.
Any transcript that is bound by the specific antisense siRNA is then targeted for
destruction by the argonaute ribonuclease.
The addition of long dsRNA to mammalian cells does not inhibit the expres-
sion of specific genes as in C. elegans. Instead, long dsRNA induces a general anti-
viral pathway that produces global, nonspecific gene silencing. It does this by
activating the protein kinase PKR (which is also induced by interferon). Activated
PKR phosphorylates the translation initiation factor 2 (eIF2a), thereby inactivat-
ing it and causing a generalized inhibition of translation. PKR also phosphory-
lates 2¢,5¢ oligoadenylate synthetase (2¢,5¢AS), which, when so activated, induces
a ribonuclease that causes nonspecific degradation of mRNA.
As an alternative to adding long dsRNA to mammalian cells, two approaches
are used (see Figure 12.3). In one method (Figure 12.3, pathway 2; see also next
section), transgenes are added that can be expressed to give a partly double-
stranded hairpin RNA (like the structure of miRNA). Another widely used method
involves chemically synthesizing two short complementary oligoribonucleotides
and allowing them to anneal to form a synthetic equivalent of siRNA. Often, the
392 Chapter 12: Studying Gene Function in the Post-Genome Era
by random inactivation of a see Figures designed mutant proteins synthetic nucleic antibodies
mutagenesis, e.g. specific 12.3 to block that bind to acids (or peptides) designed to
using chemical predetermined and translation or to normal selected to bind work within
mutagens gene 12.4 induce alternative proteins to a protein cells
splicing
Although all four types of screen listed in Table 12.3 can also be performed in Figure 12.5 Principal ways of studying
vivo, the most common approach to defining the function of a predetermined gene function in vivo by genetically
gene in whole model organisms is to inactivate a gene and then assess the result- modifying model eukaryotes. (A) Most
methods seek to inactivate a gene to
ing phenotype. This can be done by mutating the gene to produce a null allele, a
produce a mutant phenotype that can
gene knockout, or the gene can be effectively inactivated by inhibiting its expres-
be correlated with a specific gene (peach
sion at the RNA or protein level (Figure 12.5). boxes). Usually this is done at the level of
Valuable clues to gene function are usually obtained by this method, but not DNA, where large-scale random mutagenesis
all genes produce a phenotype when knocked out because of genetic redundancy screens have been performed in many
(sometimes two or more genes have identical or very similar roles and so double models, including both invertebrates
or triple gene knockouts may be needed to elucidate a phenotype). Other genes (such as D. melanogaster) and vertebrates
may be difficult to study because their functions are so crucial that inactivating (notably zebrafish and mice). Specific
them causes cell death. knockouts of predetermined genes have
The technologies involved in designing and studying genetically modified also been conducted, notably by using
homologous recombination in mice. The
animals to understand gene function are highly sophisticated. We examine these
expression of a specific gene can also
in Chapter 20, where animal models of disease are described. Different methods
be selectively inhibited by targeting its
are popular in different model eukaryotes. transcripts—large-scale RNA interference-
Table 12.4 outlines the principal eukaryote models that are currently used to based genetic screens have been conducted
infer how human genes work. The extent to which they are valuable depends on in C. elegans and D. melanogaster and gene
the degree to which functions of genes have been conserved during evolution. knockdown with antisense morpholino
Thus, although it may seem evident that mammalian models would generally be oligonucleotides is commonly performed
the most useful models (because of the high biochemical and physiological simi- in zebrafish embryos. Blocking proteins
larity to humans), even unicellular yeasts have also provided many insights into with dominant-negative mutant proteins,
human gene function. aptamers, or intrabodies also have potential
roles in treating disease (by specifically
downregulating harmful genes) and are
12.4 PROTEOMICS, PROTEINPROTEIN INTERACTIONS, discussed in more detail in Chapter 21.
AND PROTEINDNA INTERACTIONS (B) An alternative approach is to use a
transgene to overexpress a specific gene
The majority of known human genes encode polypeptides that are incorporated or to express it in tissues or developmental
in proteins. Proteins serve numerous functions in cells, and large-scale studies of stages in which it is not normally expressed
how human proteins function have begun. (ectopic expression) in an attempt to produce
a phenotype that will provide clues to the
function of the gene.
Proteomics is largely concerned with identifying and
characterizing proteins at the biochemical and functional levels
In its traditional, broadest sense, proteomics encompasses all global, or at least
large-scale, studies of proteins. However, it is now usual to use a more restricted
meaning that excludes protein structure determination (structural genomics)
and homology studies that compare sets of protein sequences and protein
394 Chapter 12: Studying Gene Function in the Post-Genome Era
TABLE 12.4 PRINCIPAL MODEL EUKARYOTIC ORGANISMS USED TO ASSESS GENE FUNCTION
Organism Class Methods of studying gene Comments Genes and mutants databases
function
Saccharomyces yeast highly sophisticated genetics despite being unicellular, yeasts have SGD (Saccharomyces Genome
cerevisiae (unicellular and mutant screens aided by been useful in modeling how some Database) at http://www
fungus) high frequency of homologous genes work to perform some highly .yeastgenome.org/
recombination conserved cellular functions such as
in DNA repair and cell cycle control
Caenorhabditis nematode large-scale genetic screens using extensively studied; invariant and WormBase at http://www
elegans (roundworm) RNA interference fully mapped cell lineage; the only .wormbase.org/
model in which the complete wiring
system is known for the nervous
system (there are 302 neurons, and all
their connections are known)
Drosophila fruit fly highly sophisticated genetics extensively studied and huge FlyBase at http://flybase.bio
melanogaster and mutant screens, aided by P1 numbers of well-characterized .indiana.edu/
transposon mutagenesis mutants
Dario rerio zebrafish selective gene inactivation principal uses are for understanding ZFIN at http://zfin.org/
by using morpholino how genes function in early
oligonucleotides to inhibit development and in offering a rapid
expression at level of translation; assessment of how a selected gene
also, use of transgenes to express functions in a vertebrate
dominant-negative mutant
proteins
Mus musculus mouse large-scale random mutagenesis the no. 1 model for understanding MGI (Mouse Genome Informatics)
screens (using chemical human gene function because of the at http://www.informatics.jax.org/
mutagens or transgenes); large combination of being a mammal and
programs using homologous the capacity for sophisticated genetic
recombination to knock out manipulation
specific genes; expression
of mutant transgenes and
overexpression/ectopic
expression of normal transgenes;
transgenic RNA interference
Nature Reviews Genetics has published a series of review articles on the Art and Design of Genetic Screens in model organisms that includes the above
models as follows: yeast: Forsburg SL (2001) Nat. Rev. Genet. 2, 659–668; C. elegans: Jorgensen EM & Mango SE (2002) Nat. Rev. Genet. 3, 356–369;
Drosophila: St Johnston D (2002) Nat. Rev. Genet. 3, 176–188; zebrafish: Patton EE & Zon LI (2001) Nat. Rev. Genet. 2, 956–966; mouse: Kile BT &
Hilton DJ (2005) Nat. Rev. Genet. 6, 557–567. Many of the technologies are explained in detail in Chapter 20.
2-D gels
phosphorylation
mass
spectrometry e.g. comparing two
samples representing
different but related
database states or conditions
searching glycosylation by 2-D gel electrophoresis
substrates of enzymes such as protein kinases. Antibody arrays have been com-
paratively widely used.
Much of the focus of proteomics is currently being devoted to identifying
protein–protein interactions and protein complexes. The ultimate aim is to define
protein networks that will aid our understanding of how cells work.
In addition to binding to other proteins and to nucleic acids, proteins interact
with a large number of small molecules that act as ligands, substrates, cargoes (in
the case of transport proteins), cofactors, and allosteric modulators. In all cases,
these interactions occur because the surface of the protein and the interacting
molecule are complementary in shape and chemical properties. Protein–ligand
interactions provide information on how genes function, and they are also impor-
tant for therapeutic purposes: the basis of drug design is to identify molecules
that interact specifically with target proteins and change their activities in the
cell. Most drugs work by interacting with a protein and altering its activity in a
manner that is physiologically beneficial. We will consider such interactions in
the context of therapeutic applications in Chapter 21.
TABLE 12.5 COMMONLY USED METHODS TO DETECT AND VALIDATE PROTEINPROTEIN INTERACTIONS
Method Description Comments
Yeast two-hybrid (Y2H) relies on mating of haploid yeast cells expressing hybrid the specificity of interactions is often low; yeast cells
screening (Figure 12.7) proteins with complementary transcription factor domains; are not ideal environments for testing interaction
positive interactions bring together a novel transcription between mammalian proteins
factor within the resulting diploid yeast cells
Affinity purification–mass a specific protein is covalently linked to an insoluble matrix; affinity purification is more specific than Y2H but does
spectrometry (AP–MS) this bound protein is then used to capture any proteins that not readily detect transient interactions; like Y2H, it has
(Figure 12.8A) it interacts with from a cell lysate; captured protein can be been used extensively to work out the detail of protein
eluted, purified, and analyzed by mass spectrometry interactomes in model organisms
TAP (tandem affinity relies on two sequential rounds of affinity purification; this
purification) tagging is achieved by transfecting cells with a cDNA of interest
(Figure 12.8B) coupled with a sequence that will add a terminal tag to
express two different proteins/peptides known to be bound
by other proteins
Fluorescence resonance microscopy method that can monitor interactions in real provides circumstantial supportive evidence only for a
energy transfer (FRET) time; it tests interaction of suggested interacting proteins supposed protein–protein interaction
(Figure 12.9) after labeling them with different fluorophores
Co-immunoprecipitation a specific antibody is used to precipitate a desired protein the gold standard for validating protein interaction,
(co-IP; Figure 12.10) from a cell lysate; if the protein is complexed to other especially when using endogenous proteins
proteins, the interacting proteins should be co-precipitated
Pull-down assay a variant of co-IP that is identical in all aspects except that
it uses a specific protein ligand other than an antibody to
capture a protein complex
For more detail on these and other methods including protein microarrays, see the review by Lalonde et al. (2008) under Further Reading. Note that
the division into (A) and (B) is somewhat arbitrary. For example, a directed Y2H assay can be designed to test the interaction between two specific
proteins. On the other hand, although co-IP is often used for validating presumed interactions, it can also be used to isolate entire protein complexes.
ability to bind to individual test proteins, and various assays can be performed to
validate interactions between specific proteins. We summarize the methods in
Table 12.5 and provide details of different methods in the next sections.
library of BD hybrids
massively parallel
protein–protein
interaction screening
(A)
high [Na+]
covalently
couple
proteins
insoluble to matrix
matrix
gel purification
mass spectrometry
Figure 12.9 Using fluorescence resonance energy transfer (FRET) to test a (A)
protein interaction. Fluorescence microscopy using FRET can be used to test 436 nm
protein interactions within cells in real time. (A) Two proteins of interest (1 and
480 nm
2) are expressed as hybrid proteins that have different fluorescent protein tags,
such as cyan fluorescent protein (CFP) and yellow fluorescent protein (YFP)
shown here. The cells are then exposed to light of a defined wavelength that YFP
CFP
will excite only one of the two fluorophores, in this case CFP. When excited by
exposure to light of wavelength 436 nm, CFP emits blue light with a slightly
longer wavelength of 480 nm and continues to do so if proteins 1 and 2 do not 2
interact. (B) If, however, proteins 1 and 2 do interact, their attached fluorescent
proteins come into very close proximity, and fluorescence energy is transferred 1
from the donor CFP to the acceptor YFP in a non-radiative manner via
dipole–dipole coupling (resonance). After excitation of CFP, only a small residual
amount of the absorbed light is now emitted at 480 nm; most of the energy is
transferred from CFP to the adjacent YFP. The acceptor YFP then emits light at a (B)
different wavelength (535 nm) that is readily distinguished.
436 nm
480 nm
transferred to the other type of fluorophore—the acceptor—if the two proteins to 535 nm
which the donor and acceptor fluorophores are bound come into extremely close FRET
CFP
contact (within 1–10 nm). As a result, a significant amount of the energy absorbed
by the excited donor is emitted at a much longer wavelength than if the donor YFP
and acceptor were distant from each other (Figure 12.9).
FRET only offers suggestive evidence of an interaction. To validate protein–
protein interaction, more robust evidence is required, such as that obtained by 1 2
using co-immunoprecipitation (co-IP). In this method, antibodies specific for a
particular protein are added directly to a cell lysate. If the protein of interest is
bound to one or more other proteins, the antibody can be used to immunopre-
cipitate the target protein while it is still bound to interacting proteins, often as
part of a complex (Figure 12.10). A variant co-IP technique, called a pull-down
assay, uses a specific ligand in place of an antibody to capture a protein of inter-
est as part of a complex. In both methods, the bound proteins can be identified
after peptide cleavage and mass spectrometry analysis.
Figure 12.10 Co-immunoprecipitation
The protein interactome provides an important gateway to (co-IP). This method is popularly used to
systems biology validate a suggested interaction between
two proteins, but it can also be used as
Affinity purification can be used to isolate entire protein complexes, allowing the a screening method to directly identify
different components to be identified by mass spectrometry. This is one of the novel proteins that bind to a protein of
major technology platforms in interaction proteomics. It has been widely used for interest. A cell lysate (or other protein mix)
is incubated with an antibody specific
for a protein of interest to form antigen–
antibody antibody complexes. The resulting immune
complexes are then captured by binding
to agarose or Sepharose beads to which
protein A or protein G has been coupled
(protein A and protein G are bacterial
proteins that happen to bind strongly to
immunoglobulins). As a result, the immune
complexes are precipitated out of solution
protein and, after centrifugation and washing,
A (or G)
cell lysate beads antibody binding
the unbound protein is eluted (1), leaving
elute
and capture 1 2 the target protein bound via its antibody
to the beads. If the target protein was
already bound to an interacting protein it
too would be co-precipitated. The beads
can then be treated to elute the target
protein and any interacting proteins (2). To
validate a suspected protein interaction,
western blotting is then conducted with an
antibody specific for the presumed protein
western peptide partner. Alternatively, the method can be
blotting cleavage used as a screening tool to identify novel
using antibody interacting proteins by treating them with
specific for
proteolytic enzymes to generate peptides
suggested
interacting mass that can be analyzed and identified by mass
protein spectrometry spectrometry.
400 Chapter 12: Studying Gene Function in the Post-Genome Era
BioGRID (General Repository for curated set of physical and genetic interactions for http://www.thebiogrid.org/
Interaction Datasets) more than 20 species; searchable by species
DroID (Drosophila Interactions Database) a comprehensive gene and protein interactions http://www.droidb.org/
database designed specifically for the model
organism Drosophila but with useful links to protein
interaction data from other organisms
HPRD (Human Protein Reference integrates data deposited in the Human Proteinpedia http://www.hprd.org/
Database) along with literature information curated in the
context of individual proteins
MINT (Molecular INTeraction database) experimentally verified protein interactions mined http://mint.bio.uniroma2.it/mint/Welcome.do
from the scientific literature by expert curators
vesicular transport
cell cycle recombination
control
cell Figure 12.12 Simplified functional group
structure cell interaction map of the yeast proteome.
polarity
DNA repair The data shown here were derived
mating protein
modification from a more complex map of individual
protein folding response
interactions, obtained mostly from two-
protein
cytokinesis hybrid screens. Only connections for which
differentiation synthesis
chromatin/chromosome there are 15 or more interactions between
protein translocation structure proteins of the connected groups are
RNA shown. A small number of interactions occur
processing between almost all groups and often tend to
signal transduction
nuclear–cytoplasmic
be spurious, that is, based on false positive
transport pol II transcription results. Only fully annotated yeast proteins
cell stress RNA are included, and many proteins are known
lipid fatty-acid turnover
RNA to belong to several functional classes. [From
and sterol splicing
metabolism
Tucker CL, Gera JF & Uetz P (2001) Trends
carbohydrate pol I transcription Cell Biol. 11, 102–106. With permission from
metabolism pol III transcription Elsevier.]
402 Chapter 12: Studying Gene Function in the Post-Genome Era
Figure 12.13 DNase I footprinting. DNA molecules that are labeled (red)
at one end are allowed to bind to a protein (green). Partial digestion of any
unbound DNA with pancreatic DNase I can result in a ladder of DNA fragments
of different sizes because the location of a cleavage site (gray vertical bars) on
an individual DNA molecule is essentially random (left). If, however, the DNA
contains a protein-binding site, incubation with a suitable protein extract may
result in the binding of protein at a specific region (right). Subsequent partial
digestion with pancreatic DNase I no longer produces random cleavage; instead,
the bound protein protects the underlying DNA sequence from cleavage. As a
result, the spectrum of fragment sizes is skewed against certain fragment sizes,
leaving a gap when size-fractionated on a gel, a DNase I footprint.
partial DNase I
cell extract can be revealed by taking advantage of their comparative inaccessi-
bility to nucleases, such as pancreatic DNase I (Figure 12.13).
In gel mobility shift assays, a suspected regulatory DNA sequence can be
assayed for the ability to bind proteins, and the binding proteins can subse-
quently be purified. The basis of the method is that a labeled DNA clone bound
to protein migrates more slowly in polyacrylamide gel electrophoresis than does
unbound labeled DNA. Fractions of the gel that contain slowly migrating labeled
DNA can be subjected to column chromatography to purify the bound protein.
To define DNA (or oligonucleotide) binding sites for a specific protein (or
other ligand), an in vitro selection procedure can be used, such as SELEX (system-
atic evolution of ligands by exponential enrichment ). Here, parallel oligonucle-
otide syntheses produce a huge library of oligonucleotides with constant
sequences at their 5¢ and 3¢ ends for primer binding but with randomly generated gel electrophoresis
and autoradiography
internal sequences of a fixed length. Often about 1013–1015 different sequences
are designed. The oligonucleotide library is then mixed with the target protein to
allow oligonucleotide–protein binding and the mixture is passed through an
affinity chromatography column. After several rounds of column chromatogra- footprint
phy, oligonucleotide sequences are purified that can bind to the target protein
(or other ligand) with extremely high affinity (Figure 12.14).
Mapping protein–DNA interactions in vivo
Methods such as those described above do not let us know where proteins bind label DNase I bound
to DNA in vivo. To obtain this information, proteins are covalently fixed to DNA cleavage site protein
(A) (B)
1
2
3
4
5
6
repeat process affinity
chromatography
15
10
library of ~1015 different oligonucleotides
amplify
selected
oligonucleotides
(crosslinking) within intact cells by using a reagent such as formaldehyde. The formaldehyde
cells are then lysed and the crosslinked DNA–protein complexes are purified and O
analyzed.
C
Formaldehyde is a highly reactive dipolar molecule whose carbon atom acts
as a nucleophilic center. It reacts readily with both amino and imino groups H H
found in proteins and DNA. The most available reacting groups in proteins are H
the amino groups of lysine and arginine and the imino group of histidine. In R N
DNA, the primary reacting groups are the amino groups of adenine and cytosine. H
Figure 12.15 shows an example of how formaldehyde can cause crosslinking of macromolecule
macromolecules such as proteins and DNA.
The most popular method for mapping DNA–protein interactions in vivo is
chromatin immunoprecipitation (ChIP ; see Box 11.3). Proteins are cross-linked
to DNA within cells. The cells are lysed, and the chromatin is extracted and frag- H H
mented by physical shearing or enzymatic treatment. DNA fragments that have a
R N C OH
crosslinked protein of interest are isolated by using an antibody specific for that
protein, and purified DNA fragments are analyzed to identify the location of the H H
binding site for the protein.
Schiff’s base
The basic ChIP method is often devoted to investigating interactions between
a specific protein and a few genomic regions that are known or suspected to have
binding sites for that protein. More recent ChIP variants permit the identification
of protein-binding sites on a genomewide scale. A major motivation is to identify H
H
all the binding sites for transcription factors within a genome. In the ChIP-chip
method (also called ChIP-on-chip, ChIPArray, and genomewide location analy- R N C
sis), the ChIP-purified DNA is amplified, fluorescently labeled, and analyzed by H
hybridization to genomic DNA sequences distributed within a microarray. As an
alternative, sequencing of the ChIP-purified DNA (ChIP-seq) is becoming widely N H
used. In a test run of ChIP-seq by Johnson et al., a zinc finger repressor protein, R¢
the neuron-restrictive silencing factor (NRSF), was found to bind to 1946 loca- N
tions in the human genome.
In addition to defining positions on DNA occupied by regulatory proteins,
ChIP technologies are routinely being used to map the positions within genomes
H H
of various types of modified histone and methylated cytosine. R¢
R N C N
CONCLUSION N
H
The study of gene function has been transformed in the post-genome era.
Experimental studies on a gene-by-gene basis have been augmented by Figure 12.15 Formaldehyde-induced
genomewide screens. In addition, the many databases that have arisen from crosslinking. Formaldehyde treatment
genome projects and new analytical tools for their interrogation have driven can result in covalent linking of proteins
much of the functional genomics revolution. to DNA or other proteins, often through
The function of a gene can often be inferred from sequence homology searches amino or imino groups. In this example, a
of increasing sophistication. Experimental approaches are then needed to verify nucleophilic attack causes covalent bonding
this putative function. Whereas classical genetics began with an observed pheno- between formaldehyde and an amino group
type and searched to find a gene responsible for it, reverse genetics now allows us carried by a macromolecule (R). A Schiff
base intermediate can then form that can
to alter a gene or its expression to create an altered phenotype. Model organisms,
be covalently linked to a suitable chemical
in particular the mouse, have enabled such studies at an organismal level. These group (in this case, an imino group) on
are explored further in Chapter 20. More recently developed techniques such as a different macromolecule (R¢). The end
RNA interference (RNAi) have been widely employed in genomewide studies of result is that a –CH2– bond contributed by a
gene function in cultured human and animal cells. formaldehyde molecule is used to crosslink
Experimental techniques that determine the specific interactions between the macromolecules R and R¢.
proteins and other ligands underpin the new field of proteomics. Yeast two-
hybrid assays and affinity purification–mass spectrometry techniques can be
used to screen for binding partners for specific proteins. Techniques such as fluo-
rescence resonance energy transfer and co-immunoprecipitation can then be
used to validate suggested protein–protein interactions.
The wealth of information from both genomewide experimental screens and
bioinformatics is enabling the mapping of vast protein networks of interacting
proteins in cells. Genes that encode functional noncoding RNA have not been
explored so extensively, partly because identifying RNA genes has been a com-
paratively recent endeavor and is ongoing. Nevertheless, we are increasingly
aware of the importance of many different noncoding RNAs in gene regulation
and in disease; as we will see in Chapter 20, genetically manipulated model
organisms are being used to dissect their functions.
404 Chapter 12: Studying Gene Function in the Post-Genome Era
FURTHER READING
Sequence homology searching Pandey A & Mann M (2000) Proteomics to study genes and
Barnes MR (ed.) (2007) Bioinformatics For Geneticists, 2nd ed. genomes. Nature 405, 837–846.
John Wiley & Sons. Ptacek J, Devgan G, Michaud G et al. (2005) Global analysis of
Durbin R, Eddy SR, Krogh A & Mitchison G (2005) Biological protein phosphorylation in yeast. Nature 438, 679–684.
Sequence Analysis: Probabilistic Models Of Proteins And Protein–protein interaction
Nucleic Acids. Cambridge University Press.
Margulies EH & Birney E (2008) Approaches to comparative Cusick ME, Klitgord N, Vidal M & Hill DE (2005) Interactome:
sequence analysis: towards a functional view of vertebrate gateway into systems biology. Hum. Mol. Genet. 14,
genomes. Nat. Rev. Genet. 9, 303–313. R171–R181.
NCBI BLAST and PSI-BLAST tutorials are available at http://www Krogan NJ, Cagney G, Yu H et al. (2006) Global landscape of
.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html protein complexes in the yeast Saccharomyces cerevisiae.
Nature 440, 637–643.
Genetic screens in mammalian cells (see also the Lalonde S, Erhardt DW, Loque D et al. (2008) Molecular and
footnote to Table 12.4 for references to reviews on cellular approaches for the detection of protein–protein
interactions: latest techniques and current limitations. Plant J.
genetic screening in various model eukaryotes) 53, 610–635.
Bernards R, Brummelkamp TR & Beijersbergen RL (2006) shRNA Rual JF, Venkatesan K, Hao T et al. (2005) Towards a proteome-
libraries and their use in cancer genetics. Nat. Methods 3, scale map of the human protein–protein interaction network.
701–706. Nature 437, 1173–1178.
Grimm S (2004) The art and design of genetic screens: Stelzl U, Worm U, Lalowski M et al. (2005) A human protein–
mammalian culture cells. Nat. Rev. Genet. 5, 179–189. protein interaction network: a resource for annotating the
Kittler R, Pelletier L, Heninger A-K et al. (2007) Genome-scale proteome. Cell 122, 957–968.
RNAi profiling of cell division in human tissue culture cells.
Nat. Cell Biol. 9, 1401–1412. Identifying DNA–protein interactions
Martin SE & Caplen NJ (2007) Applications of RNA interference Johnson DS, Mortazavi A, Myers RM & Wold B (2007) Genome-
in mammalian systems. Annu. Rev. Genomics Hum. Genet. 8, wide mapping of in vivo protein–DNA interactions. Science
81–108. 316, 1497–1502.
Silva JM, Li MZ, Chang K et al. (2005) Second-generation shRNA Kim TH & Ren B (2006) Genome-wide analysis of protein–DNA
libraries covering the mouse and human genomes. Nat. interactions. Annu. Rev. Genomics Hum. Genet. 7, 81–102.
Genet. 37, 1281–1288. Orlando V (2000) Mapping chromosomal proteins in vivo by
Tochitani S & Hayashizaki Y (2007) Functional screening revisited formaldehyde-crosslinked-chromatin immunoprecipitation.
in the postgenomic era. Mol. Biosyst. 3, 195–207. Trends Biochem. Sci. 25, 99–104.
Stoltenburg R, Reinemann C & Strehlitz B (2007) SELEX—A
Proteomics (r)evolutionary method to generate high-affinity nucleic acid
Bertone P & Snyder M (2005) Advances in functional protein ligands. Biomol. Eng. 24, 381–403.
microarray technology. FEBS J. 272, 5400–5411. Wei C-L, Wu Q, Vega VB et al. (2006) A global map of p53
Gstaiger M & Aebersold R (2009) Applying mass spectrometry- transcription-factor binding sites in the human genome.
based proteomics to genetics, genomics and network Cell 124, 207–219.
biology. Nat. Rev. Genet. 10, 617–627.
Nature Insight on Proteins to Proteomes (2007) Nature 450,
963–1009.
Chapter 13
• Human DNA variants can be classified as large scale versus small scale, common versus
rare, and pathogenic versus nonpathogenic.
• Single nucleotide polymorphisms (SNPs) are the most numerous variants.
• Short tandem repeat polymorphisms (STRPs or microsatellites) are very common. The
repeat units are most commonly 1, 2, or 4 bp long, and the string of tandem repeats usually
extends over less than 100 bp. The number of repeats in a string often varies between people
as a result of polymerase stuttering during DNA replication.
• Minisatellites are tandem repeats with longer repeat units, typically 10–50 bp. Again, the
number of repeat units in a run often varies between people, but this is most often because
of unequal recombination. Minisatellites are commoner toward the telomeres of
chromosomes.
• Many larger sequences (between 1 kb and 1 Mb) show variation in copy number between
normal healthy people. About 5% of the whole genome can vary in this way. A common
cause of such variation is recombination between mispaired repeated sequences.
• Genomic DNA is subject to constant processes of damage and repair. Most damage passes
unnoticed because it is efficiently repaired. Failures in repairing damage or correcting
replication errors are the ultimate cause of most sequence variation.
• Most common variants are not pathogenic, although they might, in combination with other
variants, increase or decrease susceptibility to multifactorial diseases. They are useful as
markers (variants that can be used to recognize a chromosome or a person) in pedigree
analysis, forensics, and studies of the origins and relationships of populations.
• Pathogenic effects can be mediated by either a loss of function or a gain of function of a
gene product. Very many different changes in a gene can cause a loss of function. Usually
only one or a few specific changes can cause a gain of function.
• Loss-of-function changes usually lead to recessive phenotypes, and gain-of-function
changes to dominant phenotypes. Dominant phenotypes can also result from loss of
function if a 50% level of the normal gene product is not sufficient to produce a normal
phenotype (haploinsufficiency), or if the protein product of the mutant allele interferes with
the function of the normal product (dominant-negative effects).
• Molecular pathology seeks to explain why a certain genetic change causes a particular
clinical phenotype. However, well-defined genotype–phenotype correlations are rare in
humans because humans differ greatly from one another in their genetic make-up and
environments.
406 Chapter 13: Human Genetic Variability and Its Consequences
This chapter is about the differences that exist between different human indi-
viduals in the DNA sequence of their genomes (differences between humans and
other species were covered in Chapter 10). Now that we have complete genome
sequences for several named individuals, we can see that they differ from one
another in millions of ways. Most of the many differences between individual
human genomes seem to have no effect whatsoever. Other differences do affect
the phenotype, producing the normal range of genetically determined variants in
body build, pigmentation, metabolism, temperament, and so on, that make each
of us individual. All these are normal variants. Some variants are pathogenic—
that is, they either cause disease or make their bearer susceptible to a disease that
they may or may not actually develop, depending on their other genes, their life-
style, their environment, or pure luck. In this chapter, we consider first the nor-
mal variants and then those that are pathogenic. There is one further level of
genetic variation that we do not consider here: epigenetic variation (variations in
DNA methylation and chromatin conformation) between individuals. This was
described in Chapter 11, but it is still a matter of speculation how much our indi-
vidual phenotypes are due to genetic and how much to epigenetic differences
between us.
The word polymorphism is used by human geneticists to mean commonest mutation that causes cystic fibrosis in northern
several different things at different times. Needless to say, this can European populations (labeled p.F508del; see Box 13.2 for an
cause confusion. explanation of the notation) has a frequency of 0.01–0.02 in
• Molecular geneticists often describe a variant as a polymorphism northern European populations. As shown in Chapter 3, this could
if its frequency in the population is above some arbitrary value, not be sustained by recurrent mutation in the face of the selective
often 0.01. For example, SNP rs1447295 (see Box 13.2 for an pressure against people with cystic fibrosis.
explanation of the notation) at 8q24 is a polymorphism with two • Clinical geneticists often use polymorphism to mean a non-
alleles, C and A, with frequencies 0.93 and 0.07, respectively, in pathogenic variant, regardless of its frequency. Pathogenic
the European HapMap population. This is the meaning we use in variants, whether common or rare, would be described as
this book, unless otherwise specified. Variants whose frequency is mutations.
below the arbitrary threshold might be described as rare variants. The word mutation can also be used with two different senses,
• Population geneticists define a polymorphism as the stable referring to either the process or the product:
coexistence in a population of more than one genotype at • an event that changes a DNA sequence—UV radiation produced a
frequencies such that the rarer type could not be maintained mutation in the DNA.
simply by recurrent mutation. On this definition, some pathogenic • a DNA sequence change that may have happened a long time
mutations would count as polymorphisms. For example, the ago—she had inherited a mutation from her father.
TYPES OF VARIATION BETWEEN HUMAN GENOMES 407
BOX 13.2 NOMENCLATURE FOR DESCRIBING DNA AND AMINO ACID VARIANTS
Each single nucleotide polymorphism (SNP) in the public database Amino acid substitutions
dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP) can be referred Use either one-letter codes (X means a stop codon) or three-letter
to by its unique identifier, such as rs212570, where rs stands for codes. Proteins are numbered with the initiator methionine as
reference SNP and 212570 is a unique serial number. Short tandem codon 1.
repeat polymorphisms (STRPs) have identifiers such as D6S282, where • p.R117H or Arg117His—replace arginine 117 by histidine.
D stands for DNA segment, 6 is the number of the chromosome on • p.G542X or Gly542Stop—replace the codon for glycine 542
which the marker is located, S means a single copy sequence, and with a stop codon.
282 is a unique serial number. Deletions and insertions
Any sequence change can be described using the conventions set Use del for deletions and ins for insertions, preceded by the nucleotide
out on the website of the Human Genome Variation Society, position or interval (for DNA changes) or the amino acid symbol
http://www.hgvs.org/mutnomen/. The commoner cases are described (one-letter code; for amino acid changes).
below. • p.F508del—in a protein (p) delete phenylalanine (F) 508.
All variants are prefixed with g. (genomic), c. (cDNA), r. (RNA), or • c.6232_6236del or c.6232_6236delATAAG—delete five
p. (protein). nucleotides starting with nucleotide 6232 of the cDNA. The
Nucleotide substitutions identity of the deleted nucleotide(s) can be specified.
For changes in a gene, the A of the initiator ATG codon is numbered • g.409_410insC—insert C between nucleotides 409 and 410 of
+1; the base immediately preceding this is –1. There is no zero. The genomic DNA.
number of the altered nucleotide is followed by the change. A program, Mutalyzer, has been developed to generate the
• g.1162G>A—in genomic DNA, replace guanine at position 1162 correct name for any sequence variant that the user inputs; see
by adenine. http://www.lovd.nl/mutalyzer/.
For changes within introns, when only the cDNA sequence
is known in full, specify the intron number by IVS (IVS stands for
intervening sequence) or the number of the nearest exon position.
• g.621+1G>T or IVS4+1G>T—replace G by T at the first base of
intron 4 (nucleotide 621 is the last base of exon 4).
anything from 0.5 downward. We would not expect high-frequency SNP alleles to
have any major phenotypic effect, because natural selection should have ensured
that they were eliminated if harmful or fixed (present in everybody) if beneficial.
However, those evolutionary processes might be incomplete at present for some
particular allele. Additionally, heterozygote advantage is a mechanism that can
produce stable polymorphisms even when one allele is harmful in homozygous
form. This happens when asymptomatic carriers of a deleterious recessive condi-
tion have some selective advantage over normal homozygotes, as with sickle-cell
anemia (see Chapter 3, p. 64). The argument from natural selection is less power-
ful with rare SNPs, but even so, these would also not generally be expected to
have a phenotypic effect, if only because most SNPs are not located in coding or
regulatory sequences.
Some polymorphic sites have more complex variants. A SNP may have three
alleles, say A, G, or C, or two SNPs may be adjacent to one another. The great
majority of more complex variants in the dbSNP database (about 2 million of the
12 million entries) are deletion/insertion polymorphisms (DIPs or indels).
Typically one or two nonrepeated nucleotides are inserted or deleted (Figure
13.1B). Insertions or deletions of repeat units have different causes and dynam- (A)
ics, and are discussed below in the section on short tandem repeats (p. 408). GCCTGTTTTATATTAC/TGATCCAATTTTTTCA
Sometimes a SNP will affect a nucleotide that is part of the recognition site for
(B)
one or another restriction enzyme. Although many hundreds of restriction
GAGACAGAGTTTCGC(T)TCTTGTTGCCCAGGCT
enzymes are known, most of them recognize a palindromic sequence such as
GAATTC. The sequence is palindromic because the complementary strand, read (C)
in the 5¢Æ3¢ direction, also reads GAATTC: CCAAGCCTGGAGCTA/GGCCGTGGGCCAGGCAAG
5¢-ACGATTGAATTCTAGGATC-3¢ Figure 13.1 Types of single nucleotide
3¢-TGCTAACTTAAGATCCTAG-5¢ polymorphism (SNP). (A) A simple SNP.
Only about 10% of all nucleotides fall in such palindromic sequences, so most rs212570 is a C/T variant on chromosome
20. (B) A deletion/insertion polymorphism.
SNPs do not affect any restriction site. When a SNP happens to lie within a restric-
rs36126541 is insertion or deletion of
tion site, only one allele will retain the necessary recognition sequence. Thus, the
a single T nucleotide in a sequence
polymorphism will create or abolish a restriction site. SNPs of this type (Figure on chromosome 12. (C) A restriction
13.1C), known as restriction fragment length polymorphisms (RFLPs) or fragment length polymorphism (RFLP).
restriction site polymorphisms (RSPs), were the first widely used DNA markers SNP rs36078338 creates or abolishes the
and were used to create the first genetic linkage map of the human genome (see sequence GCTAGC, which is the recognition
Chapter 8). site for the restriction enzyme NheI.
408 Chapter 13: Human Genetic Variability and Its Consequences
2 3 1 2 2 3
B C C D
+ +
1 1 2 3 1 3 normal replication
C B D C 1 2 3
5¢ CA CA CA 3¢
3¢ GT GT GT GT GT GT 5¢
1 2 3 4 5 6
defining and mapping sufficient STRPs to construct a high-resolution marker
framework map across the human genome; about 150,000 STRPs have been
identified. Although STRPs still remain the tool of choice for forensic work, link- 1 2 3 4 5 6
5¢ CA CA CA CA CA CA 3¢
age work has moved to using SNPs because they can be genotyped on a huge
scale with the use of microarrays.
Large-scale variations in copy number are surprisingly frequent in backward slippage causes insertion
human genomes 2
CA
Variants large enough to be seen by a cytogeneticist under the microscope are 1 2
5¢ CA CA 3¢
almost always the result of isolated pathogenic accidents and are not part of nor- 3¢ GT GT GT GT GT GT 5¢
mal human variation. Cytogeneticists recognize just three types of relatively 1 2 3 4 5 6
common normal variant:
• The size of the heterochromatic regions at the centromeres of chromosomes
1 2 2 3 4 5 6
1, 9, or 16 and the long arm of the Y chromosome may vary. 5¢ CA CA CA CA CA CA CA 3¢
• The short arms of the acrocentric chromosomes (chromosomes 13, 14, 15, 21,
and 22, which have the centromere close to one end—see Figure 2.15) vary
considerably in size and morphology. Often the most distal part appears as a forward slippage causes deletion
so-called satellite connected to the main body of the chromosome by a short, 1 2 3 5
thin stalk. Note that this use of the word satellite is unrelated to its use to 5¢ CA CA CA CA 3¢
describe DNA tandem repeats. 3¢ GT GT GT GT GT 5¢
1 2 3 5 6
• A variety of fragile sites—segments of uncoiled chromatin—may be seen GT
when cells are cultured under conditions that make DNA replication diffi- 4
cult—for example, by starving the cells of thymidine or adding the DNA
polymerase inhibitor aphidicolin (Figure 13.5). Most fragile sites are normal
variants, although the FRAXA and FRAXE fragile sites discussed below (see 1 2 3 5 6
5¢ CA CA CA CA CA 3¢
p. 424) are pathogenic.
All these changes reflect varying copy numbers of tandemly repeated Figure 13.4 Slipped-strand mispairing
sequences. Until recently it was assumed that any deletion or insertion of a large during DNA replication. A new DNA strand
stretch of non-repetitive DNA would be pathogenic. The advent of techniques (pink) is being synthesized, using the blue
such as array-comparative genomic hybridization (in which test and control strand as template. During normal DNA
replication the nascent strand often partly
DNAs compete to hybridize to a microarray; see Chapter 2) and whole genome
dissociates from the template and then
SNP arrays made it relatively easy for the first time to scan whole genomes for reassociates. When there is a tandemly
extra or missing material, and it quickly became apparent that many large-scale repeated sequence, the nascent strand
changes can be found in the genomes of normal healthy people. A major study by may mispair with the template when it
Redon et al. of 269 healthy individuals from four geographically distinct popula- reassociates. This can result in the newly
tions (the HapMap sample; see Chapter 15) identified 1447 regions with variable synthesized strand having fewer or more
numbers of copies of sequences of at least 1 kb in length. In total these variable repeat units than the template strand.
410 Chapter 13: Human Genetic Variability and Its Consequences
regions were reported to cover 360 Mb, or 12% of the genome (although that esti-
mate has since been revised downward), and included many known genes. The
mean size of the variable segments was 250 kb. The techniques used were rela-
tively insensitive to smaller, kilobase-sized variants. Many such smaller variable
segments would have been missed. These smaller variants can be identified using
the technique of paired-end mapping. This involves selecting random fragments
of genomic DNA of a known size, and sequencing a few dozen nucleotides from
each end of a fragment. One can then compare the distance apart of those end
sequences in the selected fragment (whose size is known) with the distance apart
of the same sequences in the reference human genome (Figure 13.6). If the sepa-
ration is greater in the selected fragment than in the reference genome, the frag-
ment must contain an insertion that is not present in the reference genome; con-
versely, a lesser distance indicates a deletion. Recent studies have shown that
these smaller variants are even more frequent than larger ones (Figure 13.7).
Although SNPs are individually more numerous, copy-number variants Figure 13.5 A chromosomal fragile
(CNVs) are responsible for by far the greatest number of nucleotides that differ site. A stretch of relatively uncondensed
between two genomes. Thus, the human genome is considerably more variable chromatin in an otherwise highly condensed
between normal healthy people than had previously been assumed. The studies metaphase chromosome. There are about
120 locations in the human genome where a
mentioned above did not reveal, when there is a gain in copy number, whether
chromosome of a normal healthy person has
the multiple copies of the sequence are scattered across several chromosomal some tendency to show a fragile site when
locations or clustered in tandem. Other studies show that they most commonly cells are cultured under special conditions.
form tandem clusters. The Database of Genomic Variants (TCAG; http://projects Shown here is an X chromosome with the
.tcag.ca/variation/) is a database of variants seen in apparently healthy people, pathogenic fragile site FRAXA (gray arrow).
whereas the Decipher database (http://decipher.sanger.ac.uk/) records variants (Courtesy of Graham Fews, University of
seen in people with phenotypic abnormalities (although whether any particular Birmingham.)
variant is the cause of the phenotypic abnormality is seldom known for certain).
The genome sequences of single individuals that are now becoming available
give a striking picture of the overall variability present in ostensibly healthy
individuals.
The pioneer genome scientist Craig Venter was the first individual whose full
diploid genome sequence was determined. The reference human genome
genomic DNA
fragment,
select 3 kb fragments,
ligate biotinylated adaptors
bio bio
ligate to promote
circularization
bio
bio
Figure 13.6 The principle of paired end
digest with restriction enzyme, mapping. In paired end mapping, small
select fragments containing biotin size-selected fragments of sheared genomic
DNA from BAC or PAC clones are used,
bio which are marked by ligation to biotinylated
adaptors. In both cases, the aim is to identify
a sequence representing the two ends of the
bio original DNA joined by the circularization.
In chromosome jumping this was to find
sequence,
map ends to reference sequences (green) that were 80–130 kb
genome sequence from a known sequence (blue), in order to
reference genome speed up construction of a clone contig
sequence across a large genomic region. In paired end
mapping, the aim is to identify structural
study genome
variants, by finding sequences that are a
different distance apart in the test genome
than in the reference human genome
no variant deletion insertion
sequence.
DNA DAMAGE AND REPAIR MECHANISMS 411
1000
0
1–10 10–50 50–100 100–200 200–500 500–1000 >1000
size ranges (kb)
O H
agents can transfer a methyl or other alkyl group onto DNA bases and can
cause cross-linking between bases within a strand or between different DNA
strands.
Although external agents are more visible, the major threats to the stability of
a cell’s DNA come from internal chemical events, such as those illustrated in
Figures 13.8 and 13.9:
• Depurination—about 5000 adenine or guanine bases are lost every day from
each nucleated human cell by spontaneous hydrolysis of the base–sugar link
(see Figures 13.8 and 13.9A).
• Deamination—at least 100 cytosines each day in each nucleated human cell
are spontaneously deaminated to produce uracil (see Figures 13.8 and 13.9B).
Less frequently, spontaneous deamination of adenine produces
hypoxanthine.
• Attack by reactive oxygen species—highly reactive superoxide anions (O2–) and
related molecules are generated as a by-product of oxidative metabolism in
mitochondria. They can also be produced by the impact of ionizing radiation
on cellular constituents. These reactive oxygen species attack purine and
pyrimidine rings (see Figure 13.8).
• Nonenzymatic methylation—accidental nonenzymatic DNA methylation by
S-adenosyl methionine produces about 300 molecules per cell per day of the
cytotoxic base 3-methyl adenine, plus a quantity of the less harmful 7-methyl
guanine (see Figure 13.8). This is quite distinct from the enzymatic methyla-
tion of cytosine to produce 5-methyl cytosine, which cells use as a major
method of controlling gene expression (see Chapter 11). The methylated ade-
nine or guanine bases distort the double helix and interfere with vital
DNA–protein interactions.
In addition to damage of these various types, errors arise during normal DNA
metabolism. A certain error rate (incorporation of the wrong nucleotide) is inevi-
table during DNA replication. Proofreading mechanisms correct the great major-
ity of the resulting mismatches, but a few may persist, producing sequence vari-
ants. Failure of the proofreading mechanism in somatic cells is one cause of
cancer (see Chapter 17). In addition, occasional errors in replication or recombi-
nation leave strand breaks in the DNA, which must be repaired if the cell is to
survive.
DNA DAMAGE AND REPAIR MECHANISMS 413
(A) guanine
O
N H
N no base
HC
N N NH2 OH
O H2O guanine O
O P O CH2 O P O CH2
O O
_ _
O O
cytosine
(B) O H H H O H
N H2O NH3 O uracil
H
N NH
H N O N O
O P O CH2 O P O CH2
O O
_ H2O NH3 _
O O
O H O H
(C) 5-methylcytosine
H H
N O thymine
H3C H3C
N NH
H N O N O
O P O CH2 O P O CH2
O O
_ _
O O
O H O H
O
O
CH2 C NH
O
N C O
C C
O H H CH3
and involves the sister chromatid. The eukaryotic machinery chromosomal rearrangements, but it is better than leaving breaks
for recombination repair is less well defined than the excision unrepaired. One reason why chromosomal telomeres require a
repair systems. Human genes involved in this pathway include special structure is to protect normal chromosome ends from this
NBS (Nijmegen breakage syndrome; OMIM 602667), BLM (Bloom double-strand break response.
syndrome; OMIM 604610), and the breast cancer susceptibility A final repair mechanism is concerned with correcting mismatches
genes BRCA2 (OMIM 600185) and BRCA1 (OMIM 113705). caused by replication errors. Cells deficient in mismatch repair have
• In nonhomologous end joining, large multiprotein complexes mutation rates 100–1000-fold normal, with a particular tendency
are assembled at broken ends of DNA molecules, and DNA ligases to replication slippage in homopolymeric runs (see Figure 13.4). In
rejoin the broken ends regardless of their sequence (Figure 2B). humans, the mechanism involves at least five proteins, and defects
There is always some loss of DNA sequence from the broken ends. cause Lynch syndrome (OMIM 120435 and OMIM 609310). Details of
This is a desperate measure that is likely to cause mutations or the mechanism are given in Chapter 17.
Even if a cell is able to survive with damaged DNA, unrepaired damage is likely
to be mutagenic. If mutations occur in germ cells, they can give rise to new vari-
ants in the population that provide much of the raw material for evolution as well
as the intraspecific variants considered in this chapter. A major source of muta-
tions is error-prone trans-lesion DNA synthesis. The sloppy polymerases z (zeta)
and i (iota) are able to replicate damaged DNA, bypassing stalled replication
forks—but at the cost of a high error rate. Additionally, modified bases may mis-
pair during DNA replication, introducing permanent sequence changes. For
example, uracil, produced by the deamination of cytosine, pairs with adenine, as
does the 8-hydroxy guanine that is a product of oxidative attack on DNA.
A special case is 5-methyl cytosine. When cytosine is deaminated, the product
is uracil. This is an unnatural base in DNA that is efficiently recognized and cor-
rected. However, 5-methyl cytosine is deaminated to thymine, a natural base in
DNA (Figure 13.9C). If the resulting G–T mispairing is not corrected before the
next round of DNA replication, one daughter cell will have a permanent CÆT
mutation. Evidence from both evolutionary studies and human diseases shows
the importance of deamination of 5-methyl cytosine as a source of sequence
changes. Methylated cytosines are almost always found in CpG sequences (that
is, the nucleotide 3¢ of cytosine is guanosine), and CpG sequences are mutational
hotspots (see also Chapters 8 and 11).
In response to these various forms of damage, cells deploy a whole range of
repair mechanisms. Different mechanisms correct different types of lesion (Box
13.3). The importance of effective DNA repair systems is highlighted by the
roughly 130 human genes participating in DNA repair, and by the severe diseases
that affect people with deficient repair systems.
251260), and Bloom syndrome (OMIM 210900). Clinically, patients may suffer
symptoms including hypersensitivity to sunlight, neurological and/or skeletal
problems, anemia, and especially a high incidence of various cancers. In the lab-
oratory, cells from patients with these disorders show hypersensitivity to various
DNA-damaging agents. cell A cell B
Complementation groups
cell fusion
Many of these conditions can be divided into a series of complementation groups
by using cell fusion tests (Figure 13.10). If two cells, A and B, lack function in dif-
ferent repair genes, then when the cells are fused the resulting hybrid cell will
contain functioning copies of both genes. Cell A, with gene A defective, will pro-
vide a functioning copy of gene B, and cell B, in which gene B is defective, will
provide a functional copy of gene A. Thus, the hybrid should recover the wild-
type resistance to DNA damage. Using this technique, cells from patients with fused cell
xeroderma pigmentosum, caused by defects in nucleotide excision repair (see
Box 13.3), have been divided into seven different groups. Cells from any one
group will complement (make good) the defect in a cell from any other group. DNA repair
test
Fanconi anemia, caused by a defective cellular response to DNA damage, has
been divided into at least 12 groups. In general, a different gene is mutated in repair-deficient: repair-competent:
each separate complementation group, although the clinical phenotypes over- same different
lap. Details of these genetic diseases and their complementation groups can be complementation complementation
group groups
found in the OMIM database (http://www.ncbi.nlm.nih.gov/omim).
Molecular studies of these various groups have defined a large number of Figure 13.10 Assigning DNA repair defect
genes involved in human DNA repair. Sorting out the individual pathways has genes to complementation groups.
been greatly aided by the very strong conservation of repair mechanisms across Repair-deficient cell lines from two unrelated
the whole spectrum of life. Not only the reaction mechanisms but also the pro- patients (A and B) are fused in cell culture,
tein structures and gene sequences are often conserved from E. coli to humans. and the fused cells are then tested for their
Generally, eukaryotes have multiple systems corresponding to each single sys- ability to repair DNA. If the fused cells are
tem in E. coli. For example, nucleotide excision repair requires six proteins in competent in DNA repair, then cells A and B
E. coli but at least 30 in mammals. A downside of the conservation is a confusing belong to different complementation groups
and probably contain defects in different
gene nomenclature, referring sometimes to human diseases (e.g. xeroderma pig-
DNA repair genes. If the fused cells still
mentosum type D, or XPD), sometimes to yeast mutants (RAD genes), and some-
lack the ability for DNA repair, then A and B
times to mammalian cell complementation systems (ERCC—excision repair belong to the same complementation group
cross-complementing). So, for example, XPD, ERCC2, and RAD3 are the same and may have defects in the same DNA
gene in human, mouse, and yeast. repair gene.
Not all the diseases that involve hypersensitivity to DNA-damaging agents are
caused by defects in the DNA repair systems themselves. Sometimes it is the
broader cellular response to damage that is defective. Normal cells react to DNA
damage by stalling progress through the cell cycle at a checkpoint until the dam-
age has been repaired, or by triggering apoptosis if the damage is irreparable.
Patients with ataxia-telangiectasia and Fanconi anemia have intact repair sys-
tems but are deficient in damage-sensing or response mechanisms. Defects in
cell cycle control and in the apoptotic response are central to the development of
cancer, and they will be discussed further in Chapter 17.
In any case, for many gene products no laboratory test is available that checks all
aspects of the gene’s function in vivo. Some variants may be pathogenic only at
times of environmental stress, and others may have subtle effects that manifest
as susceptibility to a disease, perhaps only when in combination with certain
other genetic variants.
In the absence of a definitive functional test, the nature of the sequence
change often provides a clue. First we can ask whether the variant affects a
sequence that is known to be functional. Such sequences would include the cod-
ing sequences of genes, sequences flanking exon–intron junctions (splice sites),
the promoter sequence immediately upstream of a gene, and any other known
regulatory sequences. The great majority of all known pathogenic variants affect
sequences that were already known to be functional, and these comprise only a
small percentage of our total DNA. However, it is always possible that a variant
located outside any known functional sequence might lie in a currently unidenti-
fied functional element. As we saw in Chapter 11, the ENCODE project is reveal-
ing many previously unsuspected functional elements in the human genome.
Such elements are suspected to be locations for variants that merely alter suscep-
tibility to a disease, rather than directly causing any disease.
If a variant does affect a known functional sequence, we must try to predict its
effect. A table of the genetic code (see Figure 1.25) can be used to identify the
effect of a coding sequence variant on the protein product of a gene. As described
below, nonsense mutations, frameshifts, and many deletions can be confidently
predicted to wreck the protein. Similarly, changes to the invariant GT...AG
sequences at splice sites are highly likely to be pathogenic. Changes that merely
replace one amino acid with a different one (missense changes) are more diffi-
cult to interpret.
Another approach is to look for precedents. Maybe a variant is already docu-
mented in dbSNP, the database of single nucleotide polymorphisms (see above).
Alternatively, it may be documented in one of the databases of pathogenic muta-
tions listed in Further Reading. A different sort of precedent can be sought by
checking the normal sequence of related genes. These may be in humans (para-
logs) or other species (orthologs). If the variant is present as the normal, wild-
type sequence of a related gene it is unlikely to be pathogenic.
Further aspects of this problem are considered in Chapter 18, where we dis-
cuss genetic testing. In the rest of this section we consider some of the many ways
in which a change in a functional sequence can be pathogenic.
(A)
V H L T P E E K S
normal HBB gene GTG CAT CTG ACT CCT GAG GAG AAG TCT
V H L T P V E K S
sickle-cell mutation GTG CAT CTG ACT CCT GTG GAG AAG TCT
a1 b1
a2 b2
Phe 85
Val 6
b1 a1
Phe b2 a2 Leu 88
85 Val 6
a1 b1 Leu
a2 b2 88
cell mutation is pathogenic because it replaces a polar amino acid with a nonpo-
lar one on the outside of the globin molecule (Figure 13.11). This makes the
molecules tend to stick together. Protein aggregation, the result of abnormal pro-
teins having sticky external areas, has emerged as a common pathogenic mecha-
nism in a variety of diseases, especially progressive neurodegenerative condi-
tions, and is discussed further on p. 425.
It is seldom possible to predict these effects with much confidence. It helps if
the three-dimensional structure of the protein has been solved, so that one can
model the likely structural effect of a substitution. If amino acid sequences of
related proteins (from humans or other organisms) are known, we can see which
amino acids are invariant and which seem free to vary widely between species.
Most amino acid substitutions probably have no effect on the functioning of a
protein.
Nonsense mutations
Three of the 64 codons in the genetic code are stop codons, and so it is quite com-
mon for a nucleotide substitution to convert the codon for an internal amino
acid of a protein into a stop codon. When ribosomes encounter a stop codon they
dissociate from the mRNA, and the nascent polypeptide is released (Figure
13.12A). However, genes containing premature termination codons seldom
cause production of the truncated protein that might be predicted. Cells have a
mechanism, nonsense-mediated decay (NMD), that detects mRNAs containing
premature termination codons and degrades them. Thus, the usual result of a
nonsense mutation is to prevent any expression of the gene.
NMD works because the spliced mRNA that travels from the nucleus to the
ribosomes retains a memory of the positions of the introns. The splicing mecha-
nism leaves proteins of the exon junction complex (EJC) attached to splice sites.
During the first (pioneer) round of translation, as the ribosome passes each splice
site it clears the EJC proteins attached to that site. If there is a premature termina-
tion codon, the ribosome will not have traversed every splice site before it
detaches. Some EJC proteins will remain attached to the mRNA, and this marks
the mRNA for destruction (Figure 13.12B).
PATHOGENIC DNA VARIANTS 419
intron/exon size (bp) frameshift deletions causing DMD or BMD Figure 13.16 Effect of deletions in the
dystrophin gene. The gene consists of
intron 41 31,823
79 small exons surrounded by large introns.
exon 42 195 0 About 65% of all mutations in this gene are
intragenic deletions. Because the introns are
intron 42 22,380 much larger than the exons, the breakpoints
exon 43 173 –1 nearly always lie within introns. The effect is
to delete one or more whole exons. When
intron 43 70,465 this results in a frameshift (red bars) it causes
the severe Duchenne muscular dystrophy
exon 44 148 +1
(DMD; OMIM 310200). If the deleted exons
intron 44 248,401 are frame-neutral (green bars) the result is
the milder Becker muscular dystrophy (BMD;
exon 45 176 –1
OMIM 300376), even if the deletion is larger
intron 45 36,111 than one causing DMD.
exon 46 148 +1
intron 46 2334
exon 47 150 0
intron 47 54,222
exon 48 186 0
intron 48 38,368
exon 49 102 0
intron 49 16,634
deletions or duplications usually cause frameshifts. Most exons are not frame-
neutral (that is, the number of nucleotides in the exon is not a multiple of three),
so excluding one or more exons because of a deletion or splicing error will more
often than not create a frameshift. Most introns are much larger than the exons
they surround, so in most genes the breakpoints of random intragenic deletions
or duplications will lie within introns. The result is to delete or duplicate one or
more whole exons (Figure 13.16).
Changes that affect the level of gene expression
A variant in a control sequence might affect the level of transcription of a gene, so
that although the gene product is entirely normal, too little (or too much) of it is
produced. Most obviously this would happen if the variant changed the promoter
sequence. In reality, few such variants have been described. This is partly because
diagnostic laboratories do not routinely sequence promoters. If they did find a
change, they would seldom know how to interpret its significance. As described
in Chapter 1, promoters can include consensus binding sites for a variety of tran-
scription factors, and the effect of a sequence change can usually only be deter-
mined by experiment.
Patients with a- or b-thalassemia have been thoroughly investigated in the
search for mutations affecting transcription. Their diseases are forms of anemia
caused by a quantitative deficiency of a- or b-globin, respectively, and so were
prime candidates for mutations of this type. However, a-thalassemia is most
commonly caused by reduced numbers of active a-globin genes (see below).
Some mutations in the b-globin gene promoter have been identified (see OMIM
141900, listed mutations 370–381, and also Figure 13.17A), but the great majority
of b-thalassemia mutations work by producing an unstable mRNA or an unstable
globin protein, rather than directly repressing transcription. Three common
types of event have been noted:
• Many thalassemia mutations are splicing errors or nonsense mutations that
produce a premature termination codon, resulting in nonsense-mediated
decay of the mRNA.
• Changes in the 3¢ untranslated region may cause the mRNA to be unstable. A
change in this region may create or abolish a critical binding site for a micro-
RNA or a protein that regulates translation (see Chapter 11).
422 Chapter 13: Human Genetic Variability and Its Consequences
• Cells are very efficient at detecting and degrading abnormal proteins that fold
incorrectly. For example, one form of a-thalassemia is caused by a change of
the normal TAA stop codon of an a-globin gene to CAA. This encodes
glutamine, and translation of the mRNA continues for a further 31 amino
acids until another stop codon is encountered (Figure 13.17B). This variant
hemoglobin (Hb Constant Spring) is unstable, and clinically the effect is
a-thalassemia resulting from a quantitative deficiency of a-globin chains.
Fragile-X site A (FRAXA) 309550 X FMR1 Xq27.3 5¢ untranslated (CGG)n 6–54 200–1000+
region
Fragile-X site E (FRAXE) 309548 X FMR2 Xq28 5¢ untranslated (CCG)n 4–39 200–900
region
Friedreich ataxia (FRDA) 229300 AR FXN 9q13 intron 1 (GAA)n 6–32 200–1700
Myotonic dystrophy 1 (DM1) 160900 AD DMPK 19q13 3¢ untranslated (CTG)n 5–37 50–10,000
region
Myotonic dystrophy 2 (DM2) 602668 AD ZNF9 3q21.3 intron 1 (CCTG)n 10–26 75–11,000
Spinocerebellar ataxia 10 (SCA10) 603516 AD ATXN10 22q13.31 intron 9 (ATTCT)n 10–20 500–4500
Myoclonic epilepsy of Unverricht 254800 AR CSTB 21q22.3 promoter (CCCCGC 2–3 40–80
and Lundborg CCCGCG)n
Spinocerebellar ataxia 1 (SCA1) 164400 AD ATXN1 6p23 coding (CAG)n 6–38 39–82
Spinocerebellar ataxia 2 (SCA2) 183090 AD ATXN2 12q24 coding (CAG)n 15–24 32–200
Machado-Joseph disease (SCA3) 109150 AD ATXN3 14q32.1 coding (CAG)n 13–36 61–84
Spinocerebellar ataxia 6 (SCA6) 183086 AD CACNA1A 19p13 coding (CAG)n 4–17 21–33
Spinocerebellar ataxia 7 (SCA7) 164500 AD ATXN7 3p14.1 coding (CAG)n 4–35 37–306
Spinocerebellar ataxia 17 (SCA17) 607136 AD TBP 6q27 coding (CAG)n 25–42 47–63
Spinocerebellar ataxia 8 (SCA8) 608768 AD ? 13q21 untranslated RNA (CTG)n 16–34 74+
Spinocerebellar ataxia 12 (SCA12) 604326 AD PPP2R2B 5q32 promoter (CAG)n 7–45 55–78
Huntington disease-like 2 (HDL2) 606438 AD JPH3 16q24.3 variably spliced (CTG)n 7–28 66–78
exon
aModes of inheritance: X, X-linked; AD, autosomal dominant; AR, autosomal recessive.
(CAG)n
in coding sequence
Comparisons of repeat sizes in parent and child are hard to interpret mecha-
nistically because they may reflect the mitotic or meiotic instability of the repeat
in the parental germ line or, alternatively, the ability of gametes to transmit large
repeats. Moreover, repeat sizes are usually studied in DNA extracted from periph-
eral blood lymphocytes, and if there is mitotic instability the lymphocytes may
have very different repeat sizes from those in sperm or egg, or in the tissues
involved in the disease pathology. Some repeats are unstable somatically, and so
give a smeared band when blood DNA is analyzed by electrophoresis. Others are
unstable between parent and child but stable in mitosis, and so blood DNA gives
a sharp band, but of different sizes in parent and child.
(A) (B)
a a
a a
a a a a a a a a
or
a a a a
a
One common mechanism generating changes in gene dosage is non-allelic Figure 13.20 Deletions of a-globin genes
homologous recombination (NAHR). Segmental duplications (often defined as in a-thalassemia. (A) Normal copies of
sequences 1 kb or longer with 95% or greater sequence identity) may misalign chromosome 16 each carry two a-globin
genes (red) arranged in tandem. Repeat
when homologous chromosomes pair in meiosis. NAHR then produces deletions
blocks flanking the genes (shown in gray
or duplications. The misaligned repeats have the same sequence but not the
and yellow) may misalign, allowing unequal
same chromosomal location, so recombination is homologous but the sequences crossover. The figure shows one of several
are not alleles. Many (but by no means all) of the common nonpathogenic vari- such misaligned recombinations that can
ants in copy number seen in normal healthy people are generated by this mecha- produce a chromosome carrying only one
nism. a-Thalassemia provides a good example of NAHR producing a pathogenic active a gene. Unequal crossovers between
variation in gene dosage. Most people have four copies of the a-globin gene (aa/ other repeats (not shown) can produce
aa) as a result of an ancient tandem duplication. As shown in Figure 13.20, NAHR chromosomes carrying no functional a
between low-copy repeat sequences flanking the a-globin genes can produce gene. Thus, the number of a-globin genes
chromosomes carrying more or fewer a-globin genes. Reduced copy numbers of may vary in individuals from none to four or
more. (B) The consequences become more
a-globin genes produce successively more severe effects. People with three cop-
severe as the number of a genes diminishes.
ies (aa/a–) are healthy; those with two (whether the phase is a– /a– or aa/– –)
See Weatherall et al. (Further Reading) for
suffer mild a-thalassemia; those with only one gene (a–/– –) have severe disease; detail.
and lack of all a genes (– –/– –) causes lethal hydrops fetalis (fluid accumulation
in the fetus).
X-chromosome monosomy and trisomy are particularly interesting because
X-inactivation (inactivation of all except one of the X chromosomes in a cell; see
Chapter 3) ought to render them asymptomatic in somatic tissues. However, as
noted in Chapter 11, a surprisingly large number of genes on the X chromosome
escape inactivation. Some of these have a counterpart on the Y chromosome, but
most do not. For those genes that escape X-inactivation but lack a Y-linked coun-
terpart, normal females would have two functional copies and males only one.
Turner (45,X) females would have the same single active copy as normal males—
but perhaps in the context of female development, a single copy is not sufficient.
The skeletal abnormalities of Turner syndrome are caused by haploinsufficiency
for SHOX (50% of the normal gene product is not sufficient to produce a normal
phenotype). This is a homeobox gene that is located in the Xp/Yp pseudoauto-
somal region, and so is present in two copies in both males and normal females.
Below the level of conventional cytogenetic resolution but above the single
gene level, pathogenic variations in copy number are classified as microdeletions
or microduplications (Table 13.2). Among these, three different molecular
pathologies can be distinguished:
• Single gene syndromes, in which all the phenotypic effects are due to the
deletion (or sometimes duplication) of a single gene. For example, Alagille
syndrome (OMIM 118450) is seen in patients with a microdeletion at 20p11.
However, 93% of Alagille patients have no deletion but instead are hetero-
zygous for point mutations in the JAG1 gene located at 20p12. The cause of the
syndrome in all cases is a half dosage of the JAG1 gene product.
• Contiguous gene syndromes are seen primarily in males with X-chromosome
deletions (Figure 13.21A). The classic case was a boy BB who had Duchenne
muscular dystrophy (DMD; OMIM 310200), chronic granulomatous disease
(CGD; OMIM 306400), and retinitis pigmentosa (OMIM 312600), together
with mental retardation. He had a chromosomal deletion in Xp21 that removed
a contiguous set of genes and incidentally provided investigators with the
means to clone the genes whose absence caused two of his diseases, DMD
and CGD. Deletions of the tip of Xp are seen in another set of contiguous gene
syndromes. Successively larger deletions remove more genes and add more
PATHOGENIC DNA VARIANTS 427
Those syndromes caused by having only one functional copy of a single gene are often also seen in patients who do not have a microdeletion, but a
point mutation in the gene. WAGR, Wilms tumor, aniridia, genital abnormalities, mental retardation; VCFS, velocardiofacial syndrome.
patient no. 1 2 3 4 5 6 7 8
X1 A1
X2 A2
X3 A3
X4 A4
Figure 13.21 X-linked and autosomal
microdeletion syndromes. (A) On the
X5 A5 X chromosome, male patients 1–3 show
a nested series of contiguous gene
X6 A6 syndromes, whereas patient 4 has a different
contiguous gene syndrome. Deletion of
X7 A7 gene X1 or X5 is lethal in males. (B) On the
autosome, contiguous gene syndromes
X8 A8 are rare because of the balancing effect of
the second chromosome. In this example,
only gene A5 is dosage sensitive. Patients
5–7, with different-sized deletions all
phenotype X2 X2 X2 X6 A5 A5 A5 A5
of patient: + + + encompassing gene A5, all show the same
X3 X3 X7 phenotype as patient 8, who is heterozygous
+ for a loss-of-function point mutation in the
X4 A5 gene.
428 Chapter 13: Human Genetic Variability and Its Consequences
+
+ +
+ + ++ +
+
+ + + + +
+
+ + + ++ + Figure 13.22 The spectrum of mutations
+ + ++ ++ + ++ + + ++ + + ++ + + + + + ++ +
in the ATM gene in patients with ataxia-
telangiectasia. The great variety of different
+
mutations, most of which would truncate
1 10 20 30 40 50 60
exon the gene product, show that this recessive
disease is caused by loss of function of the
truncating mutations: non-truncating mutations:
ATM gene. [Data reported in Li A & Swift M
nonsense mutation missense mutation (2000) Am. J. Med. Genet. 92, 170–177. With
+ frameshifting insertion/deletion + in-frame deletion permission from Wiley-Liss, Inc., a subsidiary
splice site mutation of John Wiley & Sons, Inc.; and from Ensembl
deletions of all or part of the gene and OMIM.]
430 Chapter 13: Human Genetic Variability and Its Consequences
+
+ + Figure 13.23 Loss-of-function mutations
+
+
+ +
in the PAX3 gene in patients with type I
+ + + + + + + Waardenburg syndrome. A wide range
1 2 3 4 5 6 7 8 9 10 of both truncating and non-truncating
mutations are seen, similar to the data for
the ATM gene in Figure 13.21. Missense
changes are concentrated in the two shaded
+
areas that encode key functional DNA-
binding domains (orange, paired domain;
green, homeodomain) of the PAX3 protein.
truncating mutations: non-truncating mutations: Although the disease is dominant, the data
nonsense mutation missense mutation in the figure show that the cause must be
+ frameshifting insertion/deletion + in-frame deletion a loss of function of the PAX3 gene. This is
splice site mutation therefore an example of haploinsufficiency.
deletions of all or part of the gene
One might reasonably ask why there should be haploinsufficiency for any
gene product. Why has natural selection not managed things better? If a gene is
expressed so that two copies make only a barely sufficient amount of product,
selection for variants with higher levels of expression should lead to the evolution
of a more robust organism, with no obvious price to be paid. The answer is that in
most cases this has indeed happened—which is why relatively few genes are
dosage-sensitive. There are a few cases in which a cell with only one working copy
of a gene just cannot meet the demand for a gene product that is needed in large
quantities. An example may be elastin. In people heterozygous for a deletion or
loss-of-function mutation of the elastin gene, tissues that require only modest
quantities of elastin (skin and lung, for example) are unaffected, but the aorta,
where much more elastin is required, often shows an abnormality, supravalvular
aortic stenosis (OMIM 185500). However, certain gene functions are inherently
dosage-sensitive. These include:
• gene products that are part of a quantitative signaling system whose function
depends on partial or variable occupancy of a receptor or a DNA-binding site,
for example;
• gene products that compete with each other to determine a developmental or
metabolic switch;
• gene products that cooperate in interactions with fixed stoichiometry (such
as the a- and b-globins or many structural proteins).
In each case, the gene product is titrated against something else in the cell.
What matters is not the correct absolute level of product, but the correct relative
levels of interacting products. The effects are sensitive to changes in all the inter-
acting partners; thus, these dominant conditions often show highly variable
expression. Genes whose products act essentially alone, such as many soluble
enzymes of metabolism, seldom show dosage effects.
abnormal no
expression expression expression
Overexpression NR0B1 300018 male-to-female sex reversal gene duplications cause overexpression and sex reversal
Acquire new PI (Pittsburgh allele) 107400 lethal bleeding disorder a rare gain-of-function allele of the a1-antitrypsin gene
substrate (Figure 13.25)
Ion channel open SCN4A 168300 paramyotonia congenita specific mutations in this sodium channel gene delay
inappropriately of von Eulenburg closing
Protein aggregation HD 143100 Huntington disease proteins with expanded polyglutamine runs form toxic
aggregates
Receptor GNAS1 174800 McCune–Albright disease somatic mutations only; constitutional form would
permanently on probably be lethal
Chimeric gene BCR–ABL 151410 chronic myeloid leukemia somatic mutations only
MOLECULAR PATHOLOGY: UNDERSTANDING THE EFFECT OF VARIANTS 433
acid substitution in the active site converts an elastase inhibitor into a novel,
constitutively active, antithrombin, resulting in a lethal bleeding disorder.
Diseases caused by gain of function of G-protein-coupled hormone
receptors
The G-protein-coupled hormone receptors provide good examples of activating
mutations. Many hormones exert their effects on target cells by binding to the
extracellular domains of transmembrane receptors. Binding of ligand to the
receptor causes the cytoplasmic tail of the receptor to catalyze the conversion of
an inactive (GDP-bound) G-protein into an active (GTP-bound) form, and this
relays the signal further by stimulating ion channels or enzymes such as adenylyl
cyclase. Some mutant receptors are constitutionally active: they fire (signal to
downstream effectors) even in the absence of ligand:
• Familial male precocious puberty (OMIM 176410: onset of puberty by age 4
years in affected boys) is found with a constitutively active luteinizing hor-
mone receptor. Figure 13.25 A mutation causing the
• Autosomal dominant thyroid hyperplasia can be caused by an activating a1-antitrypsin molecule to gain a novel
mutation in the thyroid-stimulating hormone receptor (see OMIM 275200). function. a1-Antitrypsin is an inactivator of
specific proteases. The protease cleaves the
• The bone disorder Jansen metaphyseal chondrodystrophy (OMIM 156400) peptide bond between residues 358 and 359
can be caused by a constitutionally active parathyroid hormone receptor. in the a1-antitrypsin molecule. As a result,
the two residues (shown as green balls)
Allelic homogeneity is not always due to a gain of function spring 65 Å apart, as shown by the arrows
here. The conformational change traps and
We stated above that allelic heterogeneity is normally a hallmark of loss-of- inactivates the protease. In the wild-type
function phenotypes. However, it is not safe to assume that any condition that allele, residues 358 and 359 are methionine
shows allelic homogeneity is caused by a gain of function. There are other possi- and serine, respectively. This creates a
ble explanations: substrate for elastase; thus, the wild-type
a1-antitrypsin functions as an anti-elastase
• In some diseases, the phenotype is very directly related to the gene product
agent. In the Pittsburgh variant a missense
itself, rather than being a more remote consequence of the genetic change. mutation, p.M358R, replaces the methionine
The disease may then be defined in terms of a particular variant product, as in with arginine. The structure, and the effect
sickle-cell anemia. of cleavage of the peptide bond between
• Some specific molecular mechanism may make a certain sequence change in residues 358 and 359, are unaltered, but
a gene much more likely than any other change—for example the (CGG)n now the specificity is for thrombin rather
expansion in fragile X syndrome. than elastase. Thus, the Pittsburgh variant
has no anti-elastase activity, but has become
• There may be a founder effect. For example, certain disease mutations are a powerful antithrombin agent. The result
common among Ashkenazi Jews. Present-day Ashkenazi Jews are descended is a lethal bleeding disorder. Orange and
from a fairly small number of founders. If one of those founders carried a blue mark the parts of the a1-antitrypsin
recessive allele, it may be found at high frequency in the present Ashkenazi molecule N-terminal and C-terminal of
population. the cleavage site; carbohydrate residues
are shown in gray. [Image S3D00427 from
• Selection favoring heterozygotes enhances founder effects and often results University of Geneva ExPASy molecular
in one or a few specific mutations being common in a population. biology Web server, http://www.expasy.ch/]
point mutations
* *
>60 no disease
HPRT is an X-linked gene (OMIM 308000); the phenotype in males directly reflects the activity of
their single HPRT gene.
CONCLUSION
In this chapter, we have described the many ways in which the genomes of two
humans can differ. Principally, these are through single nucleotide polymor-
phisms, variable numbers of tandem repeats, and large-scale copy-number vari-
ants. Studies of these differences are directly relevant to understanding many
features of human populations. The structure of populations in terms of inbreed-
ing and reproductive isolation, the relationship of different populations to each
other, the geographical and historical origins of populations—all these aspects
are illuminated by studies of genetic variation.
When we come to studies of health and disease, it is not quite so straightfor-
ward. Connecting sequence variants with health and disease requires approach-
ing the question from both ends. We have seen how one can try to decide whether
a particular variant may be pathogenic or not. In some cases the answer is clear,
but all too often simply looking at the DNA sequence does not give a clear answer.
Even when we can be confident that a sequence change would be pathogenic, we
can very rarely predict the actual clinical symptoms it would produce. To relate
DNA to disease we need also to start from the disease end of the connection.
Studying genetic diseases allows us to identify the underlying pathogenic
sequence variants. This approach has been hugely successful with the Mendelian
diseases, as the next chapters demonstrate. Given suitable families, it is relatively
easy to identify the chromosomal location of the gene underlying a Mendelian
character (Chapter 14), which usually allows the researcher to go on to identify
the relevant gene (Chapter 16). Often it remains puzzling why a malfunction in a
particular gene should cause that particular set of symptoms, but at least the
pathogenic variants can usually be identified unambiguously.
Many genetic variants do not cause Mendelian conditions but may neverthe-
less have an influence on health. Genetic susceptibility factors are important
determinants, though not the sole ones, of many common diseases. Linking spe-
cific variants to specific disease susceptibilities has been a very difficult task that
is still far from fully accomplished. Chapters 15 and 16 review progress in that
area. Hopefully we will eventually be in a position to derive useful predictive
health information from analysis of a person’s genome sequence—which will
raise a whole series of issues about genetic testing and population screening.
These are covered in Chapters 18 and 19.
FURTHER READING
Types of variation between human genomes Kidd JM, Cooper GM, Donahue WF et al. (2008) Mapping and
Bentley DR, Balasubramanian S, Swerdlow HP et al. (2008) sequencing of structural variation from eight human
Accurate whole-genome sequencing using reversible genomes. Nature 453, 56–64. [A systematic study of variants
terminator chemistry. Nature 456, 53–59. [Mainly a technical 8 kb and larger in individuals from Asia, Europe, and Africa.]
account of using the Illumina technology to sequence an Korbel JO, Urban AE, Affourtit JP et al. (2007) Paired-end mapping
individual’s genome.] reveals extensive structural variation in the human genome.
DbSNP. www.ncbi.nlm.nih.gov/projects/SNP [The 12 million Science 318, 420–426. [Results of using a method of detecting
entries in Build 126 include some rare pathogenic variants CNVs down to 3 kb in size.]
and some changes that are more complex than single Levy S, Sutton G, Ng PC et al. (2007) The diploid genome
nucleotide polymorphisms.] sequence of an individual human. PLoS Biol. 5, e254.
Decipher database of pathogenic large-scale variants. [Craig Venter’s sequence.]
http://decipher.sanger.ac.uk/ Li JZ, Absher DM, Tang H et al. (2008) Worldwide human
Den Dunnen JT & Antonarakis SE (2001) Nomenclature for the relationships inferred from genome-wide patterns of
description of human sequence variations. Hum. Genet. 109, variation. Science 319, 1100–1104. [Using SNP data to infer
121–124. population relationships.]
Horaitis O, Talbot CC Jr, Phommarinh M et al. (2007) A database of Lukusa T & Fryns JP (2008) Human chromosome fragility. Biochim.
locus-specific databases. Nat. Genet. 39, 425. Biophy. Acta 1779, 3–16. [A review of chromosome fragile
[A manifesto for locus-specific mutation databases.] sites.]
Jakobsson M, Scholz SW, Scheet P et al. (2008) Genotype, Mutation databases. http://www.genomic.unimelb.edu.au/mdi/
haplotype and copy-number variation in worldwide human The Human Genome Variation Society Mutation Database
populations. Nature 451, 998–1003. [Analysis of SNPs and Initiative lists details and links for many databases, including
CNVs in individuals from 29 populations.] http://www.hgvbaseg2p.org [Human Genome Variation
FURTHER READING 439
Genetic Mapping of
Mendelian Characters 14
KEY CONCEPTS
• Genetic mapping is needed to localize genes underlying observable phenotypes, which cannot
be identified by searching the DNA sequence of the human genome.
• During meiosis there is normally at least one crossover in each arm of each pair of synapsed
homologous chromosomes. Pedigree analysis reveals the result of these crossovers as genetic
recombination.
• Genes that are physically close together on the same chromatid tend to travel together during
meiosis; genes that are farther apart are liable to separate through crossovers between
homologous chromosomes.
• The genetic distance between two loci is defined as the frequency with which recombination
separates them during meiosis. This is often measured as the fraction of gametes that are
recombinant. A recombinant fraction of 0.01 corresponds to 1% recombination and a genetic
distance of 1 centimorgan (cM).
• Recombinant fractions never exceed 0.5, regardless of whether the two loci concerned are on the
same or different chromosomes.
• Genetic maps are co-linear with physical maps, but distances do not necessarily correspond
because the probability of recombination per megabase of DNA is not uniform along the length
of a chromosome, and differs between males and females.
• To map human traits, it is necessary to assemble family pedigrees containing a sufficient number
of informative meioses.
• Genes are mapped by checking a collection of pedigrees for co-segregation of the relevant
phenotype and the alleles of genetic markers. The main markers used are short tandem repeat
polymorphisms (STRPs or microsatellites) and single nucleotide polymorphisms (SNPs).
• The statistical criterion for assessing linkage is the logarithm of the odds or lod score. In simple
situations, the threshold of significance for linkage is a lod score of +3 and against linkage –2.
• Human pedigrees often have complex structures in which recombinants cannot be identified
through simple inspection, and so software programs are needed that calculate the relative
likelihood of the observed data on the competing hypotheses of linkage or no linkage.
• Linkage analysis can be more efficient if data for more than two loci are analyzed simultaneously,
an approach known as multipoint mapping.
• Multipoint linkage analysis has been used on large family pedigrees to construct marker
framework maps, which can then be used to pinpoint the location of a disease gene.
• Autozygosity mapping is a powerful method of mapping genes for rare recessive conditions. It
works by hunting for ancestral chromosome segments shared by affected people in large
consanguineous families.
• The resolution of linkage analysis depends on the number of meioses analyzed. The ultimate
resolution is obtained by typing individual sperm.
• The methods described above have been very successful for mapping Mendelian conditions, but
loci conferring susceptibility to common complex diseases require different approaches, which
are described in Chapter 15.
442 Chapter 14: Genetic Mapping of Mendelian Characters
The finished sequence of the human genome is the ultimate physical map, allow-
ing the chromosomal location of physical entities (such as DNA sequences and
breakpoints) to be defined to the nearest nucleotide. However, nothing in the raw
sequence would have allowed a researcher to locate the gene, HD, responsible for
Huntington disease, or any other phenotypic character. Of course, the HD gene
can now be located by using a genome browser, but this is only possible because
the gene was previously localized and identified by other means, and that infor-
mation was added to the annotation used by the browser. To locate genes respon-
sible for a particular disease or any other phenotypic character, we still need a
genetic map.
Genes defined through phenotypes can be identified in various ways (as will
be described in Chapter 16), but the usual methods all involve first discovering
the chromosomal location of the gene. Such genetic mapping can be performed
for any phenotypic determinant. The first step is to collect a series of clinically
well-phenotyped cases, which are then genotyped in the laboratory for a series of
genetic markers. These genotypes are then analyzed to map the determinant. In
this chapter we describe the basic principles of genetic mapping and their appli-
cation to Mendelian characters. Mapping susceptibility factors for complex dis-
eases, where there is genetic susceptibility but not a Mendelian inheritance pat-
tern, is based on the same principles, but brings in extra complications, which
are considered in Chapter 15.
Recombination is believed to involve the following steps (Figure 1): Figure 1 The stages of recombination and gene conversion.
• A double-strand break is made in one parental DNA molecule Note that here we show the individual strands of each double
(yellow). Exonucleases strip back the 5¢ ends of the broken strands, helix; Figure 14.3 shows chromatids, each of which contains an
leaving single-stranded 3¢ overhangs. The single-stranded ends intact double helix. [From Alberts B, Johnson A, Lewis J et al. (2007)
are stabilized by cooperative binding of Rad51 or related proteins. Molecular Biology of the Cell, 5th ed. Garland Science/Taylor &
• The single strand invades the complementary double helix Francis LLC.]
(red), initially forming a triple helix but
eventually displacing one of the original
complementary strands. DNA synthesis original chromatids
extends the invading strand for a kilobase
or so (green). The displaced complementary
strand pairs with the remaining strand of
the cut molecule, which is extended by DNA
synthesis (green). double-strand break
strip back 5¢ ends
• The two junctions linking the two double
helices (known as Holliday junctions after
Dr Robin Holliday, who first described their
properties) are each resolved by cutting
two strands and ligating the cut ends to the
opposite partner. strand invasion
Depending on the strands cut, the resulting DNA synthesis
double helices are recombinant or non-
recombinant. However, within the region
between the two Holliday junctions, things are
more complicated. Recombination is a reciprocal
process (an equal exchange between the two
double helices), but in the region between the
two junctions two mechanisms may cause a ligate
non-reciprocal transfer of sequence, known
as gene conversion. First, both stretches of
newly synthesized DNA (green) have been
made by using the red strand as a template,
rather than one using the red and the other
the yellow strand. Second, if there were any
sequence differences between the two original DNA strands cut at arrows
double helices, the hatched regions in the figure
may contain mismatches. Mismatch repair
enzymes will correct these by stripping back
and resynthesizing one of the strands, chosen at
random. The result of either of these processes is non-recombinant recombinant
to patch short (typically 1 kb) stretches of
sequence derived from one parental double microduplications are produced by non-allelic homologous
helix into the other, without making any reciprocal replacement. recombination between low-copy-number repeats spaced along
Recombination generally takes place only in prophase of the the same chromosome.
first division of meiosis, and normally involves truly allelic sequences. • Nonhomologous recombination can sometimes also occur,
However, there are exceptions: presumably through chance malfunction of the recombination
• Recombination happens in mitosis at maybe 10–4 times the machinery. This can be a source of chromosomal rearrangements.
frequency of meiotic recombination. Mitotic recombination is • Special recombination mechanisms rearrange immunoglobulin
a significant mechanism in carcinogenesis and is discussed in and T-cell receptor genes during the maturation of lymphocytes.
Chapter 17. These generate the diversity that allows an immune response to
• Many sequences occur more than once in the genome and so be mounted against virtually any antigen (see Chapter 4).
homologous but non-allelic recombination is not infrequent. However, these exceptions do not affect the use of normal meiotic
In Chapter 13 we saw how recurrent microdeletions and recombination for genetic mapping.
single crossover double crossover Figure 14.3 Single and double crossovers.
two-strand three-strand four-strand The figure shows the chromosomal events
that determine whether a gamete is
recombinant or non-recombinant for the
two loci A and B. Each crossover involves
A1 A2 A1 A2 A1 A2 A1 A2 two of the four chromatids of the two
synapsed homologous chromosomes.
bivalents in One chromosome carries alleles A1 and
prophase of B1, the other alleles A2 and B2. Chromatids
meiosis I in the gametes labeled N carry a parental
combination of alleles (A1B1 or A2B2).
Chromatids labeled R carry a recombinant
B1 B2 B1 B2 B1 B2 B1 B2 combination (A1B2 or A2B1). Note that
recombinant and non-recombinant are
gametes defined only in relation to these two loci.
For example, in the result of the
three-strand double crossover, the second
chromatid from the left has been involved
in crossovers—but it is non-recombinant
for loci A and B because it carries alleles A1
and B1, a parental combination. A single
chromatids crossover generates two recombinant and
in gamete two non-recombinant chromatids (50%
recombinants). The three types of double
crossover occur in random proportions, so
the average effect of a double crossover is to
give 50% recombinants.
N R R N N N N N R N R N R R R R
much lower than the theoretical chance of a double crossover between two mark- (A)
ers 10 cM apart if there were no interference (10% ¥ 10% = 1%) . The interested
reader should consult Ott’s book (see Further Reading) for a fuller discussion of
mapping functions.
Interference makes true close double recombinants very unlikely, but gene
conversion (see Box 14.1) can produce results that are indistinguishable from
very close double recombinants. Only very short sequences are involved in gene
conversion—of the order of 1 kb. This is far below the resolution of almost all
genetic mapping, and so most gene conversion events have no influence on
genetic maps. However, they could pose a problem for identifying conserved
ancestral chromosome segments, as explained in Chapter 15 (see Section 15.4).
0 cM
0 cM
chromosome 18 95.5 cM
118 cM
0 cM
0 cM
chromosome 21 53.5 cM
61.5 cM
pseudoautosomal region at the tip of the short arms of the X and Y chromosomes.
Males have an obligatory crossover within this 2.6 Mb region, so that it is 50 cM
long. Thus, for this region in males 1 Mb = 19 cM, whereas in females 1 Mb =
2.7 cM. Uniquely, the Y chromosome, outside the pseudoautosomal region, has
no genetic map because it is not subject to synapsis and crossing over in
meiosis.
Two lines of evidence show that recombination is also highly non-random at
a much finer, kilobase, level of resolution:
• When individuals are genotyped for a dense map of SNP markers, and the
results from many different individuals are compared, it seems that our
genome consists of a set of conserved ancestral segments separated by fairly
sharp boundaries. This structure implies that, historically, very few recombi-
nation events have taken place within the ancestral blocks, whereas block
boundaries are hotspots for recombination. Typical block sizes are 5–15 kb.
These findings, from the International HapMap Project, are discussed in more
detail in Chapter 15.
448 Chapter 14: Genetic Mapping of Mendelian Characters
activity (cM/Mb)
recombination recombination occurs at highly localized
200
hotspots, with average recombination
frequencies up to 70 cM/Mb. Outside the
hotspots, recombination is no more than
100
0.04 cM/Mb. Green bars mark relatively
stable haplotype blocks where crossovers
are infrequent. Black bars mark the short
0
–150 –100 –50 0 sequences that were selected for analysis
chromosome distance (kb) of crossovers in individual sperm. [Adapted
from Jeffreys AJ, Neumann R, Panayi M
et al. (2005) Nat. Genet. 37, 601–606. With
• Recombination events can be observed directly at 1 kb or higher resolution by permission from Macmillan Publishers Ltd.]
genotyping single sperm from suitably heterozygous men. Such work by
Jeffreys and colleagues on two different chromosomal regions has clearly
shown that almost all recombination occurs in scattered 1–2 kb hotspots
(Figure 14.6). By and large (although not entirely), these hotspots correspond
to the gaps between ancestral conserved segments. It is not possible to study
female meiosis in the same way, but the general agreement with the block
structure defined through population studies suggests that recombination in
females must also be largely confined to the same small hotspots.
The restriction of recombination events to such very localized hotspots has
major implications for studies of the mechanism of recombination. Tantalizingly,
the DNA sequence at these points does not have any obvious special feature to
explain this behavior. Identical DNA sequences can be hotspots in humans but
not in chimpanzees. However, genetic mapping in general is not affected by this 40
very fine-scale irregularity. The best current human genetic map has a resolution males
of only about 500 kb. 35
Individuals also vary in the average number of crossovers per meiosis, and if 30
they do, then they will also have different genetic map lengths (Figure 14.7). 25
Thus, genetic maps are always probabilistic and should not be over-interpreted.
In fact, the same is true of physical maps. Although physical distances in any one 20
recombinations per meiosis
for two different genetic diseases are extremely rare. Even if they can be found, 10
individual
they will probably have no children or be unsuitable for genetic analysis in some
other way. Thus, it is seldom possible in humans to map diseases or other pheno-
types against one another, as one would in Drosophila. Figure 14.7 Individual differences in
numbers of recombinations per meiosis.
Cheung and colleagues studied an average
Mapping human disease genes depends on genetic markers of eight meioses per individual in 34 women
To map a human disease (or the locus determining any other uncommon pheno- and 33 men by genotyping members of
the Centre d’Étude du Polymorphisme
type), we need to collect families in which the character of interest is segregating,
Humain (CEPH) Utah families for 6324 SNP
and then find some other Mendelian character that is also segregating in these
markers. Each vertical line of dots shows the
same families, so that individuals can be scored as recombinant or non- number of recombinations identified in the
recombinant. A suitable character would have the following characteristics: different meioses in one person. The scatter
• It should show a clean pattern of Mendelian inheritance, preferably co- within and between individuals shows that
dominant so that the genotype can always be inferred from the phenotype. there is no one correct human map length.
[From Cheung VG, Burdick JT, Hirschmann
• It should be scored easily and cheaply using readily available material (e.g. a D & Morley M (2007) Am. J. Hum. Genet. 80,
mouthwash rather than a brain biopsy). 526–530. With permission from Elsevier.]
MAPPING A DISEASE LOCUS 449
Blood groups 1910–1960 ~20 May need fresh blood, and rare antisera. Genotype cannot always be inferred from
phenotype because of dominance. No easy physical localization
Electrophoretic mobility 1960–1985 ~30 May need fresh serum, and specialized assays. No easy physical localization. Often
variants of serum proteins limited numbers of alleles and low frequency polymorphism
HLA tissue types 1970– 1 One set of closely linked loci, usually scored as a haplotype. Highly informative;
many alleles with high or moderate frequency. Can only test for linkage to 6p21.3
DNA RFLPs 1975– >105 Two-allele markers, maximum heterozygosity 0.5. Assayed previously by Southern
blotting, now by PCR. Easy physical localization
DNA VNTRs (minisatellites) 1985–1990 ~104 Many alleles, highly informative. Assayed by Southern blotting. Easy physical
localization. Tend to cluster near ends of chromosomes
DNA VNTRs (microsatellites) 1990– ~105 Many alleles, highly informative. Can be assayed by automated multiplex PCR.
Easy physical localization. Distributed throughout genome
DNA SNPs 2000– ~107 Two-allele markers. Densely distributed across the genome. Massively parallel
technologies (microarrays, etc.) allow a sample to be assayed for thousands of
SNPs in a single operation. More stable over evolutionary time than microsatellites
RFLP, restriction fragment length polymorphism; SNP, single nucleotide polymorphism; VNTR, variable number of tandem repeats.
I
A1 A1 A1 A 1 A1 A1 A1 A1
II
A1 A1 A2 A2 A1 A2 A1 A2 A1 A2 A1 A2 A1 A2 A3 A4
III
A1 A2 A1 A2 A1 A1 A2 A4
available. In theory, linkage could be detected between loci that are 40 cM apart, Figure 14.8 Informative and
but the amount of data required to do this is prohibitive (Table 14.2). uninformative meioses. In each pedigree,
Obtaining enough family material to test more than 20–30 meioses can be individual II1 has a dominant condition that
we know he inherited along with marker
seriously difficult for a rare disease. Thus, as can be seen in Table 14.2, mapping
allele A1 because he inherited the condition
requires markers spaced at intervals no greater than 10–20 cM across the genome. from his mother, who is homozygous A1A1.
Given the genome lengths calculated above, and allowing for imperfect informa- Is the sperm that produced his daughter
tiveness, we need a minimum of several hundred markers to cover the genome. recombinant or non-recombinant between
Human genetic mapping only became generally feasible with the identification the loci for the condition and the marker?
of DNA markers, starting with restriction fragment length polymorphisms (A) This meiosis is uninformative: the father
(RFLPs) in the 1970s. As well as being more numerous than earlier protein-based (II1) is homozygous and so his marker alleles
markers, DNA markers have the advantage that they can all be assayed with the cannot be distinguished. (B) This meiosis
same techniques. Moreover, their chromosomal location can be easily deter- is uninformative because both parents are
mined by reference to the human genome sequence. This allows DNA-based heterozygous for alleles A1 and A2: the child
could have inherited A1 from the father and
genetic maps to be cross-referenced to physical maps.
A2 from the mother, or vice versa. (C) This
meiosis is informative and non-recombinant:
Linkage analysis normally uses either fluorescently labeled the child inherited A1 from the father. (D) This
microsatellites or SNPs as markers meiosis is informative and recombinant: the
child inherited A2 from the father.
The first DNA-based linkage studies used RFLPs that had to be assayed by the
very laborious Southern blotting technique, which made a genomewide search
for linkage extremely onerous. Subsequent developments have steadily made
linkage analysis quicker and easier. Microsatellite (STRP) markers were intro-
duced that were highly informative because they had many possible alleles, and
could be assayed using PCR. In the early days, each microsatellite was amplified
individually and run individually on a manually poured slab gel. Later, compati-
ble sets of microsatellite markers were developed that could be amplified together
in a multiplex PCR reaction to give non-overlapping allele sizes, so that they
could be run in the same gel lane. With fluorescent labeling in several colors, it
became possible to genotype a sample for 10 or more markers in a single gel lane
or capillary of a sequencing machine.
A major achievement of the early stages of the Human Genome Project was to
identify upward of 10,000 highly polymorphic microsatellite markers and place
them on framework maps (see Chapter 8). These maps made possible the spec-
tacular progress of the 1990s in mapping Mendelian diseases. The original micro-
satellites were mostly repeats of the dinucleotide sequence CA. Trinucleotide and
tetranucleotide repeats have gradually replaced dinucleotide repeats as the
markers of choice because they give cleaner results.
With the widespread adoption of microarray technology, SNPs have come to
the fore as genetic markers. SNPs normally have only two alleles, which means
that a higher proportion of meioses are uninformative for linkage using a SNP
than using a microsatellite. Two to three SNPs will generally be required to
TABLE 14.2 THE NUMBER OF INFORMATIVE MEIOSES NEEDED TO OBTAIN EVIDENCE OF LINKAGE BETWEEN TWO LOCI
Recombination fraction 0 0.05 0.10 0.15 0.20 0.30 0.40
The higher the recombination fraction is between two linked loci, the more meioses are needed to obtain evidence that they are linked. See Box 14.2
for an explanation of how these figures were calculated.
TWO-POINT MAPPING 451
(A) (B)
I I
A2 A5 A1 A6
II II
A1 A2 A3 A4 A1 A2 A3 A4
III III
A1 A3 A2 A3 A1 A4 A1 A4 A2 A4 A2 A3 A1 A3 A2 A3 A1 A4 A1 A4 A2 A4 A2 A3
N N N N N R
(C)
II
A1 A2 A3 A4 A5 A6
III
A1 A3 A2 A3 A1 A4 A1 A4 A2 A4 A2 A3 A1 A5 A1 A6
Figure 14.9 Recognizing recombinants. Three versions of a family with an autosomal dominant disease, typed for a marker with alleles A1–A6.
(A) All meioses are phase-known. We can identify III1–III5 unambiguously as non-recombinant (N) and III6 as recombinant (R). (B) The same family, but
phase-unknown. The mother, II1, could have inherited either marker allele A1 or A2 with the disease; thus, her phase is unknown. Either III1–III5 are
non-recombinant and III6 is recombinant; or III1–III5 are recombinant and III6 is non-recombinant. (C) The same family after further tracing of relatives. III7
and III8 have also inherited marker allele A1 along with the disease from their father—but we cannot be sure whether their father’s allele A1 is identical by
descent to the allele A1 in his sister II1. Maybe there are two copies of allele A1 among the four grandparental marker alleles. The likelihood of this depends
on the gene frequency of allele A1. Thus, although this pedigree contains linkage information, extracting it is problematic.
452 Chapter 14: Genetic Mapping of Mendelian Characters
0.7 0.4
0.6 0.3
0.5 0.2
lod score
0.4 0.1
0.3 0
0 0.1 0.2 0.3 0.4 0.5 0.6
0.2
recombination fraction
0.1
0 Figure 2 Lod score curve for family B.
0 0.1 0.2 0.3 0.4 0.5 0.6
recombination fraction
Family C
Figure 1 Lod score curve for family A. At this point non-masochists turn to the computer.
TWO-POINT MAPPING 453
must take likelihoods calculated for each possible genotype of I1, I2, and II3,
weighted by the probability of that genotype. For I1 and I2, the genotype proba-
bilities depend on both the gene frequencies and the observed genotypes of II1,
III7, and III8. Genotype probabilities for II3 are then calculated by simple
Mendelian rules. Human linkage analysis, except in the very simplest cases, is
entirely dependent on computer programs that implement algorithms for han-
dling these branching trees of genotype probabilities, given the pedigree data
and a table of gene frequencies.
The result of linkage analysis is a table of lod scores at various recombination
fractions, like the two tables in Box 14.2. Positive lod scores give evidence in favor
of linkage, and negative lods give evidence against linkage. Note that only recom-
bination fractions between 0 and 0.5 are meaningful, and that all lod scores are
zero at q = 0.5 [because they are then measuring the ratio of two identical prob-
abilities, and log10 (1) = 0]. The results can be plotted to give curves (see Box
14.2).
Returning to the two questions posed at the start of this section, we now see
that the most likely recombination fraction is the one at which the lod score is
highest. If there are no recombinants, the lod score will be maximum at q = 0. If
there are recombinants, Z will peak at the most likely recombination fraction
(0.167 = 1/6 for family A in Figure 14.9, but it is harder to predict for family B).
Lod scores of +3 and –2 are the criteria for linkage and exclusion
(for a single test)
The second question posed at the start of this section concerned the threshold of
significance. Here the answer is at first sight surprising. Z = 3.0 is the threshold for
accepting linkage, with a 5% chance of a type 1 error (falsely rejecting the null
hypothesis). For most statistics, p < 0.05 is used as the threshold of significance,
but Z = 3.0 corresponds to 1000:1 odds [log10(1000) = 3.0]. The reason why such a
stringent threshold is chosen lies in the inherent improbability that two loci, cho-
sen at random, should be linked. With 22 pairs of autosomes to choose from, it is
not likely the two loci would be located on the same chromosome (syntenic) and,
even if they were, loci well separated on a chromosome segregate independently.
Common sense tells us that if something is inherently improbable, we require
strong evidence to convince us that it is true. This common sense can be quanti-
fied in a Bayesian calculation (Box 14.3), which shows that odds of 1000:1 in fact
correspond precisely to the conventional p = 0.05 threshold of significance. Thus,
Z = 3.0 is the threshold for accepting linkage with a 5% chance of falsely rejecting
the null hypothesis. Linkage can be rejected if Z < –2.0. Values of Z between –2
Because of the low prior probability that two randomly chosen loci should be linked, evidence
giving 1000:1 odds in favor of linkage is required in order to give overall 20:1 odds in favor of
linkage. This corresponds to the conventional p = 0.05 threshold of statistical significance. The
calculation is an example of the use of Bayesian statistics to combine probabilities. See the text
for a description of the lod score.
454 Chapter 14: Genetic Mapping of Mendelian Characters
Figure 14.10 Lod score curves. Graphs of lod score against recombination
5
fraction (q) from a hypothetical set of linkage experiments. The green curve shows
evidence of linkage (Z > 3) with no recombinants. The red curve shows evidence
of linkage (Z > 3), with the most likely recombination fraction being 0.23. The 4
purple curve shows linkage excluded (Z < –2) for recombination fractions 1 lod
below 0.12; inconclusive for larger recombination fractions. The blue curve is unit
3
inconclusive at all recombination fractions.
2
and +3 are inconclusive. The same logic suggests a threshold lod score of 2.3 for
1
establishing linkage between an X-linked character and an X-chromosome
marker (prior probability of linkage ≈ 1/10).
lod score
Confidence intervals are hard to deduce analytically, but a widely accepted 0 θ
0.1 0.2 0.3 0.4 0.5
support interval extends to recombination fractions at which the lod score is 1
unit below the peak value (the lod – 1 rule). Thus, the red curve in Figure 14.10 –1
gives acceptable evidence of linkage (Z > 3), with the most likely recombination
fraction 0.23 and support interval 0.17–0.32. The curve will be more sharply –2
peaked for greater amounts of data but, in general, peaks are quite broad. It is
important to remember that distances on human genetic maps are often very –3
imprecise estimates.
Negative lod scores exclude linkage for the region where Z < –2. The purple –4
curve in Figure 14.10 excludes the disease from 12 cM (= 0.12 recombination frac-
tion) either side of the marker. Although gene mappers hope for a positive lod
–5
score, exclusions are not without value. They tell us where the disease is not
(exclusion mapping). This can exclude a possible candidate gene, and if enough
of the genome is excluded, only a few possible locations may remain.
Table 14.3 shows a real example of two-point lod scores from a study of fami-
lies with migraine with aura. There are several points worth noting:
• There is evidence of linkage—some lod scores are well above the 3.0 threshold
(green shaded rows).
TABLE 14.3 TWOPOINT LOD SCORES FOR A SERIES OF MARKERS IN FAMILIES SHOWING AUTOSOMAL DOMINANT
INHERITANCE OF MIGRAINE WITH AURA
Marker lod (Z) at q =
Markers from chromosome 15q11–q13 are listed in chromosomal order. For each marker, the maximum lod score is shown in bold. The markers
highlighted in green show significant evidence of linkage to the locus (i.e. Z > 3.0) determining migraine in these families. A multipoint analysis of the
same data is shown in Figure 14.10. [Data from Russo L, Mariotti P, Sangiorgi E et al. (2005) Am. J. Hum. Genet. 76, 312–326.]
MULTIPOINT MAPPING 455
• For some markers, the maximum lod score is at zero recombination (like the
green curve in Figure 14.10; actually q = 0.001 is used to avoid problems with
minus infinity scores for unlinked markers). For some others (D15S986 and
D15S113) it is slightly negative at q = 0.001 but rises almost to the threshold of
significance at q = 0.1, like the red curve in Figure 14.10 although less extreme;
for others (D15S1012 and D15S971) it is always negative, like the purple curve
in Figure 14.10.
Tables like this should not be over-interpreted. The precise lod score depends
on the vagaries of which meioses were informative with which markers. The thing
to note is the overall trend, showing strongly positive scores for markers toward
the center of the region covered, with negative scores at each end of the region.
Abc/abc A–¥–(B, C ) 47
aBC/abc
AbC/abc B–¥–(A, C ) 95
aBc/abc
One thousand offspring have been scored from crosses between female Drosophila fruit flies
heterozygous for recessive characters at three linked loci (ABC/abc) and triply homozygous
males (abc/abc). This design allows the genotype of the eggs to be inferred by inspection of the
phenotype of the offspring. The rarest class of offspring will be those whose production requires
two crossovers. Of the 1000 flies, 142 (95 + 47) are recombinant between A and B, 52 (47 + 5)
between A and C, and 100 (95 + 5) between B and C. Only five flies are recombinant between A and
C but not between A and B, so these must have double crossovers, A–¥–C–¥–B. Therefore the map
order is A–C–B and the genetic distances are approximately A–(5 cM)–C–(10 cM)–B.
serious difficulty for map makers. Something more intelligent than brute-force
computing had to be used to work out the correct order. Even without large-scale
sequence data, physical mapping information was immensely helpful. Markers
that can be assayed by PCR can be used as sequence-tagged sites (STS) and phys-
ically localized, either by database searching or experimentally by using radiation
hybrids. The result is a physically anchored marker framework.
Disease–marker mapping suffers from the necessity of using whatever fami-
lies can be found in which the disease of interest is segregating. Such families will
rarely have ideal structures. All too often the number of meioses is undesirably
small, and some are phase-unknown. Marker–marker mapping can avoid these
problems. Markers can be studied in any family, so families can be chosen that
have plenty of children and ideal structures for linkage, like the family in Figure
14.1. The construction of marker framework maps has benefited greatly from a
collection of families each with eight or more children and all four grandparents,
assembled specifically for the purpose by the Centre d’Étude du Polymorphisme
Humain (CEPH; now the Fondation Jean Dausset) in Paris. Immortalized cell
lines from every individual of these CEPH families ensure a permanent supply of
DNA, and sample mix-ups and non-paternity have long since been ruled out by
typing with many markers. As an example, the widely used 1998 CHLC
(Cooperative Human Linkage Center) map was based on the results of scoring
eight CEPH families with 8325 microsatellites, resulting in more than 1 million
genotypes.
0
(marker positions are shown by black arrows;
colored arrows indicate the position of
two candidate genes) and the multipoint
–5 lod score is calculated for each possible
location of the disease locus. Lod scores
dip to strongly negative values near the
position of loci that show recombinants
–10 with the disease. The highest peak marks
the most likely location; odds in favor of
this location are measured by the degree to
which the highest peak overtops its rivals.
–15 The peak lod score is 6.54; that is, a tenfold
stronger likelihood than the maximum of
5.56 obtained by two-point analysis. There
which are usually only very roughly known. The physical distances may be known
were no recombinants within this region in
precisely, but for this purpose we need the genetic distances. For markers 1 cM this data set, so the candidate region could
apart on the highest resolution map currently available (the Icelandic map of not be narrowed down further. [From Russo
Kong et al.), the 95% confidence interval for their true distance is 0.5–1.8 cM. L, Mariotti P, Sangiorgi E et al. (2005) Am. J.
Moreover, those figures relate only to the individuals whose data were used in Hum. Genet. 76, 327–333. With permission
constructing the map, who will not be the same individuals as those used in any from Elsevier.]
particular disease mapping study. As Figure 14.7 shows, total map length varies
between individuals. Additionally, none of the mapping functions in linkage pro-
grams even approximates to the real complexities of chiasma distribution (see
Figure 14.5). However, unless distances on the marker map are radically wrong, it
remains true that the highest peak marks the most likely location. If the marker
framework is physically anchored, as described above, the stage is then set to
search the DNA of the candidate interval and identify the disease gene.
a child would be autozygous at only 1/64 of all loci (Figure 14.12). If that child is 1/8 1/8
homozygous for a particular marker allele, this could be because of autozygosity
or it could be because a second, independent example of the same allele has
entered the family at some stage (these alleles can be described as identical by 1/16 +
state, or IBS). The rarer the allele is in the population, the greater is the likelihood 1/4 1/16
= 1/8
that homozygosity represents autozygosity. In practice, the requisite rarity is
achieved by using a haplotype of closely linked markers. For an infinitely rare
allele or haplotype, a single homozygous affected child born to second cousins 1/2 1/16
generates a lod score of log10(64) = 1.8. If there are two other affected sibs who are
both also homozygous for the same rare allele, the lod score is log10(64 ¥ 4 ¥ 4) =
1/32
3.0; the chance that a sibling would have inherited the same pair of parental hap-
lotypes even if they are unrelated to the disease is 1 in 4. A B
Thus, quite small consanguineous families can generate significant lod scores.
Autozygosity mapping is an especially powerful tool if families can be found with
multiple people affected by the same recessive condition in two or more sibships C
linked by consanguinity. Suitable families may be found in Middle Eastern coun- Figure 14.12 Second cousins share 1/32 of
tries and parts of the Indian subcontinent where first-cousin and other consan- their genes, on average, by virtue of their
guineous marriages are common. The method has been applied with great suc- relationship. Individuals A and B are second
cess to locating genes for autosomal recessive hearing loss (Figure 14.13). As cousins (their mothers are first cousins). The
mentioned in Chapter 3, extensive locus heterogeneity makes recessive hearing figures show the proportion of A’s genes that
loss impossible to analyze by combining data from several small nuclear each linking relative is expected to share. C,
families. the child of A and B, would be expected to
be autozygous at 1/64 of all loci.
A bold application of autozygosity in a Northern European population ena-
bled Houwen and colleagues in 1994 to map a rare recessive condition, benign
recurrent intrahepatic cholestasis (OMIM 243300), using only four affected indi-
viduals (two sibs and two supposedly unrelated people) from an isolated Dutch
1 2
I
1 2 3 4 5 6 7
II
5 2
2 2
3 2
5 2
2 2
3 1
2 2
2 3
6 3
1 3
1 2 3 4 5 6 7 8 9 10
III
5 2 5 2
1 2 1 2
1 2 1 2
5 2 5 2
1 2 3 4 5 3 2 3 2 6
6 1 6 1
IV 2 2 2 2
D2S305 4 5 6 5 5 7 5 7 5 7 3 3 3 3 5 6
D2S310 2 1 1 2 2 1 1 2 1 2 7 3 7 3 1 3
D2S144 6 1 2 3 4 5 1 2 1 2 6 3 6 3 1 2
D2S171 7 5 2 1 2 7 5 3 5 3 5 3
AFMb346ye5 1 3 3 4 3 2 3 4 3 4 3 2
AFMa052yb5 4 6 5 6 6 3 6 5 6 5 6 2
D2S158 2 2 4 2 2 1 2 2 2 2 2 2
D2S174 3 3 4 5 5 6 5 3 5 3 5 6
D2S1365 9 7 5 2 2 8 2 7 2 7 2 7
D2S170 5 6 2 6 6 2 6 5 6 5 6 2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
V
D2S305 4 5 5 5 5 6 4 6 5 5 4 5 5 5 7 7 7 7 5 7 7 7 5 5 5 2 5 5 5 2 5 2 5 6 5 5 5 6 5 5
D2S310 2 2 1 2 1 1 2 1 1 2 2 2 1 2 1 2 1 2 2 2 1 2 2 1 1 2 1 1 1 2 1 2 1 3 1 1 1 1 1 1
D2S144 1 3 1 3 1 2 1 2 1 3 6 3 1 3 5 2 5 2 4 2 5 2 4 1 1 2 1 1 1 2 1 1 1 2 1 1 1 1 1 1
D2S171 5 1 5 1 5 2 5 2 5 1 7 1 5 1 7 3 7 3 2 3 7 3 2 5 5 2 5 5 5 2 5 5 5 3 5 5 5 5 5 5
AFMb346ye5 3 4 3 4 3 3 3 3 3 4 1 4 3 4 2 4 2 4 3 4 2 4 3 3 3 2 3 3 3 2 3 3 3 2 3 3 3 3 3 3
AFMa052yb5 6 6 6 6 6 5 6 5 6 6 4 6 6 6 3 5 3 5 6 5 3 5 6 6 6 1 6 6 6 1 6 6 6 2 6 6 6 6 6 6
D2S158 2 2 2 2 2 4 2 4 2 2 2 2 2 2 1 2 1 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
D2S174 3 5 3 5 3 4 3 4 3 5 3 5 3 5 6 3 6 3 5 3 6 3 5 5 5 3 5 3 5 3 5 3 3 6 3 5 3 5 3 5
D2S1365 7 2 7 2 7 5 7 5 7 2 9 2 7 2 8 7 8 7 2 7 8 7 2 2 2 3 2 7 2 3 2 7 7 7 7 2 7 2 7 2
D2S170 6 6 6 6 6 2 6 2 6 6 5 6 6 6 2 5 2 5 6 5 2 5 6 6 6 3 6 6 6 3 6 6 6 2 6 6 6 6 6 6
Figure 14.13 Autozygosity mapping. A large, multiply consanguineous family in which several members suffer from autosomal recessive profound
congenital deafness (filled symbols). Shaded blocks mark a haplotype of microsatellite markers from chromosome 2 that segregates with the deafness.
Markers AFMa052yb5 and D2S158 (blue) are homozygous in all affected people and no unaffected people. The deafness gene should lie somewhere
between the two markers that flank these (AFMb346ye5 and D2S174). [Adapted from Chaib H, Place C, Salem N et al. (1996) Hum. Mol. Genet. 5, 155–158.
With permission from Oxford University Press.]
FINE-MAPPING USING EXTENDED PEDIGREES AND ANCESTRAL HAPLOTYPES 459
X1, K1 3 49
X1, K2 147 19
X2, K1 8 70
X2, K2 8 25
Data from typing for two RFLP markers, the first with alleles X1 and X2 and the second with alleles
K1 and K2 in 114 British families with a child with cystic fibrosis (CF). Chromosomes carrying the CF
disease mutation tend to carry allele X1 and allele K2. [Data from Ivinson AJ, Read AP, Harris R et al.
(1989) J. Med. Genet. 26, 426–430.]
460 Chapter 14: Genetic Mapping of Mendelian Characters
marker A 1 2
θ1 = 0.05
homologous
chromosomes
of a phase-known disease + D
triple heterozygote
θ2 = 0.05
marker B 1 2
1 2 1 2 1 2 1 2
Figure 14.15 Apparent double
recombinants suggest errors in the data.
In this phase-known triple heterozygote,
gametes + D + D D + D + two markers (A and B) flank the disease
locus (D). On the genetic map, the markers
are each only 5 cM from the disease locus,
giving a recombination frequency for each
1 2 2 1 2 1 1 2 single crossover of 0.05. The recombination
frequencies of the possible gametes are
type of gamete non-recombinant marker–marker recombinants apparent double shown below. Because of interference, the
recombinants probability of true double recombinants
expected 1 – θ1 – θ2 θ1 + θ2 < θ1θ2 would be substantially below the theoretical
frequency = 0.9 = 0.1 < 0.0025 value of 0.05 ¥ 0.05 = 0.0025.
462 Chapter 14: Genetic Mapping of Mendelian Characters
CONCLUSION
The human genome sequence provides the ultimate physical map, enabling
physical entities such as genes, DNA sequences, and chromosomal breakpoints
to be localized to the nearest nucleotide. However, localizing the determinants of
phenotypes such as genetic diseases requires an alternative approach, namely
genetic mapping. Genetic mapping depends on identifying a physical entity that
consistently travels together (co-segregates) with the phenotypic determinant
through multiple meioses in a collection of families. Because of recombination
during meiosis, only sequences that lie very close together on the same chromo-
somal segment will consistently co-segregate.
The usual physical entities used in genetic mapping are microsatellites or sin-
gle nucleotide polymorphisms (SNPs). Array-based technologies allow hundreds
of thousands of SNPs, distributed across the whole genome, to be tested quickly
and easily for co-segregation with a phenotype of interest. If there is significant
but imperfect co-segregation, the fraction of meioses in which the two fail to co-
segregate (the recombination fraction) is a measure of their genetic distance
apart. Genetic distances are not identical to physical distances, measured in kilo-
bases, because the probability of a crossover is not uniform along the whole
length of every chromosome. It varies systematically between the sexes and
between individuals. The relationship between the genetic distance, measured in
centimorgans, and the recombination fraction is not linear, but must be expressed
mathematically by a mapping function. This is necessary because recombination
fractions never exceed 0.5, no matter how far apart physically two loci are.
464 Chapter 14: Genetic Mapping of Mendelian Characters
FURTHER READING
General ideas in genetic recombination, from the man who proposed
Laboratory of Statistical Genetics, Rockefeller University. http:// the Holliday junction. The full text is available electronically.]
linkage.rockefeller.edu/ [A very useful website, giving access Hultén MA & Lindsten J (1973) Cytogenetic aspects of human
to all the programs mentioned in this chapter, and a wealth male meiosis. Adv. Hum. Genet. 4, 327–387. [Classical
of general information about human linkage analysis.] microscope study of male meiosis.]
Ott J (1999) Analysis Of Human Genetic Linkage, 3rd ed. Johns Jeffreys AJ, Neumann R, Panayi M et al. (2005) Human
Hopkins University Press. [Gives a full discussion of linkage recombination hotspots hidden in regions of strong marker
analysis and mapping functions.] association. Nat. Genet. 37, 601–606. [An elegant and
Terwilliger J & Ott J (1994) Handbook For Human Genetic comprehensive exploration of recombination hotspots in
Linkage. Johns Hopkins University Press. [Full of practical a region of human chromosome 1; also a useful source of
advice indispensable to anybody undertaking human linkage references to earlier work.]
analysis.] Keeney S (2001) Mechanism and control of meiotic
recombination initiation. Curr. Top. Dev. Biol. 52, 1–53.
The role of recombination in genetic mapping [A detailed review of the molecules involved in meiotic
recombination, focusing on yeast.]
Broman KW & Weber JL (2000) Characterization of human
crossover interference. Am. J. Hum. Genet. 66, 1911–1926.
[An attempt to define the best mapping function for human Mapping a disease locus
linkage analysis.] Botstein D, White RL, Skolnick M & Davis RW (1980) Construction
Cheung VG, Burdick JT, Hirschmann D & Morley M (2007) of a genetic linkage map in man using restriction fragment
Polymorphic variation in human meiotic recombination. Am. length polymorphisms. Am. J. Hum. Genet. 32, 314–331.
J. Hum. Genet. 80, 526–530. [A study of an average of eight [A classic in the history of developing markers for linkage.]
meioses in each of 67 individuals using 6324 SNPs, giving Shete S, Tiwari H & Elston RC (2000) On estimating the
detail of inter-individual and chromosomal variations in heterozygosity and polymorphism information content
recombination.] value. Theor. Popul. Biol. 57, 265–271. [For those interested
Holliday R (1990) The history of the DNA heteroduplex. BioEssays in a thorough mathematical analysis of informativeness of
12, 133–142. [An interesting personal review of the history of markers.]
FURTHER READING 465
Two-point mapping
Morton NE (1955) Sequential tests for the detection of linkage.
Am. J. Hum. Genet. 7, 277–318. [The blockbuster paper that
established the lod score method for human linkage. Not
easy reading.]
Russo L, Mariotti P, Sangiorgi E et al. (2005) A new susceptibility
locus for migraine with aura in the 15q11–q13 genomic
region containing three GABAA receptor genes. Am. J. Hum.
Genet. 76, 312–326. [The study described in Table 14.3.]
Multipoint mapping
Abecasis GR, Cherny SS, Cookson WO & Cardon LR (2002)
Merlin—rapid analysis of dense genetic maps using sparse
gene flow trees. Nat. Genet. 30, 97–101.
Gudbjartsson DF, Jonasson K, Frigge ML & Kong A (2000) Allegro,
a new computer program for multipoint linkage analysis. Nat.
Genet. 25, 12–13.
Kong A, Gudbjartsson DF, Sainz J et al. (2002) A high-resolution
recombination map of the human genome. Nat. Genet. 31,
241–247. [The most comprehensive human marker–marker
genetic map, following 5136 microsatellites through 1257
meioses in a set of Icelandic families.]
Lathrop GM, Lalouel JM, Julier C & Ott J (1985) Multilocus linkage
analysis in humans: detection of linkage and estimation of
recombination. Am. J. Hum. Genet. 37, 482–498. [Discusses
the advantages of three-point over two-point linkage, and
introduces the Linkage program package.]
• Many common diseases are multifactorial, with a variety of environmental and genetic
factors each reducing or increasing an individual’s susceptibility to that disease. They are
also usually complex, having many different possible causes.
• The major genetic effects on human health and disease come from genetic factors that
affect susceptibility to common complex diseases, rather than those that cause Mendelian
diseases. A main aim of much current human genetic research is to identify such factors.
• Evidence for the role of genetic factors in many common complex diseases comes from
studies of families, twins, and adopted people. Such studies need careful interpretation to
disentangle genetic effects from the effects of a shared family environment.
• Linkage analysis has shown only very limited success in mapping susceptibility factors for
common complex diseases. Standard lod score analysis cannot be used because it is not
possible to specify certain parameters for the susceptibility factors, such as the mode of
inheritance, gene frequencies, or penetrances. Instead, non-parametric methods are used,
which do not require a detailed genetic model to be provided. Non-parametric linkage
methods take affected relatives and look for chromosomal segments that they share more
often than expected by chance. Affected sib pairs are the most frequently used sets of
relatives.
• An alternative approach to finding susceptibility factors is to seek populationwide statistical
associations between a certain genotype and a disease. Such associations may arise because
many supposedly unrelated people share a chromosome segment inherited from a distant
common ancestor who carried a susceptibility factor. However, not all populationwide
associations have a genetic cause.
• The International HapMap Project has defined the ancestral chromosome segments in four
human populations, and cataloged markers (tagging SNPs) that can be used to identify
them.
• Shared ancestral chromosome segments are extremely small (typically a few kilobases).
Thus, seeking associations requires the use of a dense array of closely spaced markers. It has
only recently become feasible to conduct genomewide association studies.
• Identifying susceptibility factors, either by linkage studies or by association, has proved
unexpectedly difficult. Only recently have studies become sufficiently powerful to identify
susceptibility factors reliably.
• Association studies can only identify factors that are present on chromosome segments that
are shared by many individuals in the study group. Thus, the ability of association studies to
identify susceptibility factors for common complex diseases depends on the common
disease–common variant hypothesis, which supposes that most susceptibility factors are
ancient variants found on shared ancestral chromosome segments.
• An alternative hypothesis, the mutation–selection hypothesis, supposes most factors to be
the result of a highly heterogeneous set of individually rare recent mutations.
• If the mutation–selection hypothesis is true, identifying susceptibility factors will require
large-scale resequencing of individual genomes from cases and controls. New sequencing
technologies make this feasible.
468 Chapter 15: Mapping Genes Conferring Susceptibility to Complex Diseases
The main genetic contribution to morbidity and mortality is through the genetic
component of common complex diseases. These diseases have no one single
cause, but result from the cumulative effects of a variety of genetic and environ-
mental factors, often different in different affected individuals. Identifying the
genes concerned is a central task for medical research. However, this is a much
more formidable task than identifying the mutations that cause monogenic dis-
eases. For Mendelian diseases, given a sufficient collection of affected families,
linkage analysis as described in Chapter 14 can usually localize the causative
gene to a small chromosomal segment containing only a handful of candidate
genes. Similar studies in complex diseases have been much less successful.
A meta-analysis by Altmüller and colleagues in 2001 reviewed 101 linkage
studies in 31 complex diseases. The result was sobering. Candidate regions
defined in different linkage studies of the same disease rarely coincided. There
were some real successes, reviewed by Lohmueller and colleagues (see Further
Reading) but, despite huge efforts by leading research teams, overall the studies
had made only limited progress in localizing susceptibility genes. Recently this
has changed. A combination of new technology (high-resolution SNP chips) and
new understanding of the structure of human genomes (the HapMap project) is
finally allowing susceptibility factors to be reliably identified. In this chapter, we
describe the ways in which this difficult problem has been approached.
Numbers at risk are corrected to allow for the fact that some at-risk relatives were below or only
just within the age of risk for schizophrenia (say, 15–35 years). l values are calculated assuming a
population incidence of 0.8%. [Data from McGuffin P, Shanks MF & Hodgson RJ (eds) (1984) The
Scientific Principles of Psychopathology. Grune & Stratton.]
papers by Risch (see Further Reading). As an example, Table 15.1 shows pooled
data from several studies of schizophrenia. Family clustering is evident from the
raised l values, for example a sevenfold increased risk for somebody, one of
whose parents is schizophrenic. As expected, l values drop back toward 1 for
more distant relationships such as nephews, nieces, and cousins.
MZ DZ
The numbers show the total number of twin pairs ascertained and the number that were
concordant (both twins diagnosed as schizophrenic). Diagnostic and inclusion criteria varied
between studies; despite the heterogeneity there is a clear tendency for more monozygotic (MZ)
than dizygotic (DZ) pairs to be concordant. [For references, see Onstad S, Skre I, Torgersen S &
Kringlen E (1991) Acta Psychiatr. Scand. 83, 395–401.]
both twins have the condition (concordant, +/+) and the number of pairs in
which only one twin is affected (discordant, +/–). Probandwise concordance is
obtained by counting a pair twice if both were probands, and thus gives higher
values for the concordance. For example, in the study by Onstad et al. cited in
Table 15.2, in 7 of the 24 MZ twin pairs, both twins were independently ascer-
tained as affected probands. Thus, the probandwise concordance was 0.48, cal-
culated as [(8 + 7)/(24 + 7)], in comparison with the pairwise concordance of 0.33
(8/24). Probandwise concordances are thought to be more comparable with
other measures of family clustering.
Genetic characters should show a higher concordance in MZ than DZ twins,
and many characters do. However, a higher concordance in MZ twins than in DZ
twins does not prove a genetic effect. For a start, half of DZ twins are of different
sexes, whereas all MZ twins are the same sex. Even if the comparison is restricted
to same-sex DZ twins (as it is in the studies shown in Table 15.2), at least for
behavioral traits the argument can be made that MZ twins are more likely to look
very similar, to be dressed and treated the same, and thus to share more of their
environment than DZ twins.
Separated monozygotic twins
Monozygotic twins separated at birth and brought up in entirely separate envi-
ronments seem to provide an ideal experiment for separating the effects of shared
genes and shared environment. Francis Crick once made the tongue-in-cheek
suggestion that one of each pair of twins born should be donated to science for
this purpose. Such separations happened in the past more often than one might
expect—the birth of twins was sometimes the last straw for an overburdened
mother. Fascinating television programs can be made about twins reunited after
40 years of separation, who discover they have similar jobs, wear similar clothes,
and like the same music. As research material, however, separated twins have
many drawbacks:
• Any research is necessarily based on small numbers of arguably exceptional
people.
• The separation was often not total—often the twins were separated some time
after birth, and were brought up by relatives.
• There is a bias of ascertainment—everybody wants to know about strikingly
similar separated twins, but separated twins who are very different are not
newsworthy.
• Research on separated twins cannot distinguish intrauterine environmental
causes from genetic causes. This may be important, for example in studies of
sexual orientation (the gay gene), in which some people have suggested that
maternal hormones may affect the fetus in utero so as to influence its future
sexual orientation.
SEGREGATION ANALYSIS 471
Thus, for all their anecdotal fascination, separated twins have contributed
relatively little to human genetic research.
Index cases (47 chronic schizophrenic adoptees) 44/279 (15.8%) 2/111 (1.8%)
Control adoptees (matched for age, sex, social status of adoptive 5/234 (2.1%) 2/117 (1.7%)
family, and number of years in institutional care before adoption)
The study involved 14,427 adopted persons aged 20–40 years in Denmark; 47 of them were diagnosed as chronic schizophrenic. The 47 were
matched with 47 non-schizophrenic control subjects from the same set of adoptees. [Data from Kety SS, Wender PH, Jacobsen B et al. (1994)
Arch. Gen. Psychiatry 51, 442–455.]
472 Chapter 15: Mapping Genes Conferring Susceptibility to Complex Diseases
Major dominant locus 1.00 7.56 1.2 ¥ 10–5 0.19 2.8 0.42
Data are for families ascertained through a proband with long-segment Hirschsprung disease (OMIM 142623). Parameters that can be varied are as
follows: d, the degree of dominance of any major disease allele; t, the difference in liability between people homozygous for the low-susceptibility
and the high-susceptibility alleles of a major susceptibility gene, measured in units of standard deviation of liability; q, the gene frequency of any
major disease allele; H, the proportion of total variance in liability that is due to polygenic inheritance, in adults; z, the ratio of heritability in children to
heritability in adults; x, the proportion of cases due to new mutation. The values shown are those that best account for the family data using the stated
model. The c2 statistic is a standard test that compares the performance of each model with the mixed model, in which a mix of all mechanisms is
allowed. A single major locus encoding dominant susceptibility explains the data as well as the mixed model. [Data from Badner JA, Sieber WK,
Garver KL & Chakravarti A (1990) Am. J. Hum. Genet. 46, 568–580.]
of these, and the environmental factors may include both familial and non-
familial variables. In complex segregation analysis, a whole range of possible
inheritance patterns, gene frequencies, penetrances, and so on, are modeled by
computer analysis to find the mix of scenarios that gives the greatest overall like-
lihood for the observed data. Factors in this mixed model are then omitted or
constrained to identify the minimum that must be included so as to avoid a sig-
nificant loss of explanatory power. As with lod score analysis (see Chapter 14), the
question asked is how much more likely the observations are when one hypoth-
esis is compared with another.
In the example of Table 15.4, the ability of specific models (sporadic, poly-
genic, recessive, dominant) to explain the data was compared with the likelihood
calculated by a general mixed model, in which the computer program could
freely optimize the mixture of single-gene, polygenic, and random environmen-
tal causes. All models were constrained by the overall epidemiology, sex ratios,
and probabilities of ascertainment estimated from the collected data. A single-
locus dominant model is not significantly worse than the mixed model at explain-
ing the data (c2 = 2.8, p = 0.42). In contrast, models assuming no genetic factors
(sporadic), pure polygenic inheritance, or pure recessive inheritance perform
very badly. On the argument that simple explanations are preferable to compli-
cated explanations, the analysis suggests the existence of a major dominant sus-
ceptibility to Hirschsprung disease. One such factor has now been identified, the
RET (rearranged during transfection) oncogene (OMIM 164761).
However clever the segregation analysis program is, it can do no more than
maximize the likelihood across the parameters it was given. If a major factor is
omitted, the result can be misleading. This was well illustrated by the data in
Table 15.5. McGuffin & Huckle asked their classes of medical students which of
their relatives had attended medical school. When they fed the results through a
The data are taken from a survey of medical students and their families. The meaning of the
symbols is explained in the foornote to Table 15.4. Affected is defined as attending medical school.
The analysis seems to support recessive inheritance, because this accounts for the data equally well
as the unrestricted model (but see the text). n.s., not significant. [Data from McGuffin P & Huckle P
(1990) Am. J. Hum. Genet. 46, 994–999.]
LINKAGE ANALYSIS OF COMPLEX CHARACTERS 473
In the first case, identifying the Mendelian subset is intrinsically valuable, but
it does not necessarily cast any light on the causes of the non-Mendelian disease. A1 A2 A1 A3 A1 A2 A2 A3
That was the case with breast cancer and Alzheimer disease. In breast cancer this
led to the discovery of the BRCA1 and BRCA2 genes, as described in Chapter 16
(see Case 7, p. 526). Mutations in the PSEN1, PSEN2, and APP genes cause rare
Mendelian forms of early-onset Alzheimer disease. However, the common non- A1 A3 A1 A2 A1 A3 A1 A2
Mendelian forms of breast cancer and Alzheimer disease do not usually involve
Figure 15.1 Identity by state and identity
any of these genes. In the second case, the loci mapped are also susceptibility
by descent. Both sib pairs share allele A1.
factors for the common non-Mendelian disease—Hirschsprung disease is an The first sib pair have two independent
example. Finally, an early study of schizophrenia exemplified the third case, pro- copies of A1 (colored red and blue),
ducing a lod score of 6 that is now generally agreed to have been spurious. This indicating identity by state but not by
debacle was enough to persuade most investigators to switch to alternative meth- descent. The second sib pair share copies of
ods of analysis. the same paternal A1 allele (red), showing
identity by descent. The difference is only
Non-parametric linkage analysis does not require a genetic model apparent if the parental genotypes are
known.
Model-free or non-parametric methods of linkage analysis look for alleles or
chromosomal segments that are shared by affected individuals more often than
random Mendelian segregation would predict. Some of the basic ideas underly-
ing these approaches were set out in 1990 in the three papers by Risch mentioned
previously.
Identity by descent versus identity by state
It is important to distinguish segments identical by descent (IBD) from those
identical by state (IBS). Alleles that are IBD are demonstrably copies of the same
ancestral (usually parental) allele. IBS alleles look identical, and may indeed be
copies, but their common ancestry is not demonstrable. This may be because of
a lack of information about any common ancestor, or because the genotypes do
not permit unambiguous tracing of the ancestral origin of the alleles. Figure 15.1 (A) frequency
illustrates the difference. In non-parametric analysis, IBD alleles are treated
mathematically in terms of the Mendelian probability of inheritance from the
AB CD
defined common ancestor; for IBS alleles, the population frequency is used. For
very rare alleles, two independent origins are unlikely, so IBS generally implies
IBD. With common alleles, no such inference can be made. Multiallele microsat-
ellites are more efficient than two-allele markers such as SNPs for defining IBD, AC AC ¼
AD
and multilocus multiallele haplotypes are better still, because any one haplotype BC ½
is likely to be rare. Shared segment analysis can be conducted with either IBS or BD ¼
IBD data, provided that the appropriate analysis is used. IBD is the more power- (B)
ful of the two, but it requires samples from more relatives because it is necessary
to work out the likelihood that sharing is IBD rather than IBS. AB CD
Figure 15.2 Affected sib pair analysis. (A) By random segregation, sib pairs
share 2 (both AC), 1 (AC and either AD or BC), or 0 (AC and BD) parental haplotypes (D)
4 , ©, and 4 of the time, respectively. (B) Pairs of sibs who are both affected by a
1 1
Mendelian dominant condition must share the segment that carries the disease AB CD
allele, and they may or may not (a 50:50 chance) share a haplotype from the
unaffected parent. (C) Pairs of sibs who are both affected by a Mendelian recessive
condition necessarily share the same two parental haplotypes for the relevant
chromosomal segment. (D) For complex conditions, haplotype sharing above that AC AC >¼
expected to occur by chance (as in panel A) identifies chromosomal segments AD >½
BC
containing susceptibility genes. BD <¼
LINKAGE ANALYSIS OF COMPLEX CHARACTERS 475
and chromosomal regions are sought where the sharing is above the random
1:2:1 ratios of sharing 2, 1, or 0 haplotypes IBD. If the sib pairs are tested only for
IBS, the expected sharing on the null hypothesis must be calculated as a function
of the gene frequencies.
Shared segment analysis can be performed using any set of affected relatives
and without making any assumptions about the genetics of the disease. Affected
sib pairs are especially favored because they are relatively easy to collect.
Multilocus analysis is preferable to single-locus analysis because it more effi-
ciently extracts the information about IBD sharing across the chromosomal
region. The Mapmaker/Sibs computer program is widely used to analyze multi-
point ASP data. Programs such as Genehunter extend shared segment analysis to
other relationships. The programs calculate the extent to which affected relatives
share alleles IBD. The result across all affected pedigree members is compared
with the null hypothesis of simple Mendelian segregation (markers will segregate
according to Mendelian ratios unless the segregation among affected people is
distorted by linkage or association). The comparison can be used to compute a
non-parametric lod (NPL) score. Theoretical arguments by Lander & Kruglyak
suggest genomewide lod score thresholds of 3.6 for IBD testing of affected sib
pairs, and 4.0 for IBS testing.
Criteria suggested by Lander E & Kruglyak L (1995) Nat. Genet. 11, 241–247. The figures for p values
and lod scores are for ASP studies as given by Altmüller J, Palmer LJ, Fischer G et al. (2001)
Am. J. Hum. Genet. 69, 936–950.
of the data on the given hypothesis. For example, as explained in Chapter 14,
in two-point analysis of a Mendelian condition, a lod score of 3 corresponds
to an absolute probability of 0.05.
Complex disease studies often use simulation to estimate their significance
thresholds. In a typical permutation test, 1000 replicates of the family collection
are generated by a computer program with random marker genotypes, but based
on correct allele frequencies, recombination fractions, and so on. A genomewide
search is conducted in each simulated data set and the maximum lod score is
noted. The genomewide threshold of significance is set at a score that is exceeded
in less than 5% of the replicates. This is taken to be a lod score that has no more
than a 5% probability of having arisen purely by chance in the real data set.
Striking lucky threshold of
The actual lod score for any particular factor depends on how it happened to significance L3 L9
segregate in the particular families studied. But there are many susceptibility fac- L6
L1 L5 L8
lod score
tors, and a genomewide scan tests for all of them. If there are a dozen or more initial study L2 L7
cohort L4
L10
susceptibility loci, but the studies are only marginally powerful enough to detect
the effect of any of them, then there is a large and unavoidable element of chance.
In one set of families, a certain factor may happen to segregate in a way that gives
a significant lod score, whereas in an independent set of families used in a repli- threshold of L5
cation study, that factor may not happen to show such a favorable pattern (Figure significance L8
L1
15.3). In fact, targeted replication requires a study design with a much higher L2
L3
L4
L7 L10
lod score
statistical power than the original study. Thus, failure to replicate the conclusion replication L6 L9
study cohort
of a study does not necessarily mean that the original report was a false positive.
Nevertheless, for a long time the paucity of well-replicated findings cast a shadow
over linkage studies of complex disease. ten susceptibility loci (L1–L10)
suggestive 8p
Williams et al. (1999) ASPs (UK or Irish Caucasian) 196 ASPs suggestive 4p, 18q, Xcen
formed part of the meta-analysis by Altmüller J, Palmer LJ, Fischer G et al. (2001) Am. J. Hum. Genet. 69, 936–950. Full references are given in that paper.
NPL score
NPL score
NPL score
NPL score
1 1 1 1
0 0 0 0
-1 -1 -1 -1
-2 -2 -2 -2
-5 95 195 295 -5 95 195 -5 95 195 0 50 100 150
distance (cM) distance (cM) distance (cM) distance (cM)
NPL score
NPL score
NPL score
1 1 1 1
0 0 0 0
-1 -1 -1 -1
-2 -2 -2 -2
-5 95 195 -5 95 195 -5 45 95 145 -5 45 95 145
distance (cM) distance (cM) distance (cM) distance (cM)
NPL score
NPL score
NPL score
1 1 1 1
0 0 0 0
-1 -1 -1 -1
-2 -2 -2 -2
-5 45 95 245 -5 45 95 145 -5 45 95 145 -5 45 95 145
distance (cM) distance (cM) distance (cM) distance (cM)
NPL score
NPL score
NPL score
1 1 1 1
0 0 0 0
–1 -1 -1 -1
–2 -2 -2 -2
–5 45 95 145 -5 45 95 -5 45 95 -5 45 95
distance (cM) distance (cM) distance (cM) distance (cM)
NPL score
NPL score
NPL score
1 1 1 1
0 0 0 0
-1 -1 -1 -1
-2 -2 -2 -2
-5 45 95 -5 45 95 -5 45 95 -5 45 95
distance (cM) distance (cM) distance (cM) distance (cM)
NPL score
NPL score
1 1 1
0 0 0
-1 -1 -1
-2 -2 -2
-5 15 35 55 -5 20 40 60 -5 45 95 145
distance (cM) distance (cM) distance (cM)
Figure 15.4 Results of a genomewide association study of schizophrenia. A total of 171 affected individuals from 70 multiply affected sibships were
genotyped for 338 microsatellite markers spaced across the genome. The blue line shows results when only people with schizophrenia (DSM-III criteria)
were counted as affected. For the red line, individuals with DSM schizoaffective disorder were also included. The green line uses a broad definition of
affected (schizophrenia, schizoaffective disorder, paranoid or schizotypal personality disorder, delusional disorder, or brief reactive psychosis). In the
event, no significant or suggestive lod scores were observed. NPL, nonparametric lod score. [From Shaw SH, Kelly M, Smith AB et al. (1998) Am. J. Med.
Genet. (Neuropsychiatr. Genet.) 81, 364–378. With permission of Wiley-Liss, Inc., a subsidiary of John Wiley & Sons, Inc.]
ASSOCIATION STUDIES AND LINKAGE DISEQUILIBRIUM 479
assumptions about population structure and history to estimate the size distri-
bution of shared ancestral segments. However, the wide stochastic variance and
reliance on unknowable details of population history make even the most elabo-
rate calculations unreliable. What we need is data, and recently increasing quan-
tities of real data have become available.
Studying linkage disequilibrium
LD is a statistic about populations, not individuals. To study it, a sample of indi-
viduals from the chosen population is genotyped for a series of linked polymor-
phic markers. SNPs are the markers of choice, for three reasons:
• They are sufficiently abundant that they can be used to check very short chro-
mosome segments for disequilibrium.
• In comparison with microsatellites, they have a far lower rate of mutation.
This is important when the aim is to identify ancient conserved chromosomal
segments.
• SNPs are easy to genotype on a large scale, up to 1 million at once, spread
across the genome. This is very hard with other polymorphisms.
Unless genotyping is done on individual chromosomes isolated by laboratory
manipulation (an expensive option), the raw data will consist of the genotypes at
each individual locus, rather than haplotypes of alleles across multiple linked
loci. The raw genotype data must be phased—that is, converted into haplotypes—
before any useful analysis can be done. This can be done by genotyping other
family members to infer haplotypes; alternatively, computer programs can be
used to convert genotypes into haplotypes. These programs use a maximum-
likelihood procedure: they guess possible haplotypes, and keep trying until they
find the guess that best explains the whole data set with the minimum number of
plausibly related haplotypes (haplotypes within a population should be related
to one another by a minimum number of recombinations).
Various statistics have been used as measures of LD (Box 15.1).
distance apart. Closely spaced SNPs often showed little or no LD, whereas some-
times there was strong LD between more widely separated SNPs. Box 15.2 shows
how the patterns of disequilibrium can be represented graphically, and also illus-
trates just how complex and irregular the patterns can be. These irregularities
reflect the combined effects of several factors:
• Recombination, the force that breaks up ancestral segments, occurs mainly at
a limited number of discrete hotspots (see Chapter 14). SNPs that are close
together but separated by a hotspot show little or no LD, whereas, conversely,
LD may be strong across even a large chromosomal region if it is devoid of
hotspots.
• Gene conversion (see Box 14.1) may replace a small internal part of a con-
served segment, producing a localized breakdown of LD, whereas markers
either side of the replaced segment continue to show LD with each other.
• Population history is important. The older a population is, the shorter are the
conserved segments. LD is more extensive and of longer range in populations
derived from recent founders, as often occurs with small, genetically isolated
populations. LD will have a shorter range in populations that have remained
constant in size than in populations that have undergone a recent expansion.
Superimposed on this regularity are many stochastic effects. Chromosome
segments may carry a mixture of marker alleles that are IBD and alleles that
are only IBS (through independent mutations, back mutation, and so on), but
LD statistics do not distinguish between these two cases.
Efficient disease association studies depend on a detailed knowledge of the
patterns of LD across the genome. It is important to know how big and how
diverse the ancestral chromosome segments are, so that SNPs can be chosen to
test each conserved ancestral segment. The International HapMap Project was
established to provide this detailed knowledge. A consortium of academic insti-
tutions and pharmaceutical companies typed several million SNPs in 269 indi-
viduals drawn from four human populations: 30 white American parent–child
trios from Utah (CEU), 30 Yoruba parent–child trios from Ibadan, Nigeria (YRI),
45 individual Han Chinese from Beijing (CHB), and 44 individual Japanese from
Tokyo (JPT). All were ostensibly healthy and, although not a formal population
sample, were hopefully typical of the population from which they were drawn.
The Phase I report in 2005 (see Further Reading) summarized results from
1,007,329 SNPs; Phase II has added a further 3 million. The same individuals have
been used in several other studies, for example in surveys of copy-number vari-
ants and of gene expression levels, adding value and depth to the HapMap data.
The CEU and YRI samples, which each consisted of parent–child trios, could
be phased directly. For each trio, the genotype of the child could be used to infer
phase in the parental genotypes. Thus, each set of 30 trios yielded 120 phased
parental haplotypes. For the CHB and JPT samples, computer programs were
used to convert genotypes into haplotypes.
The key findings can be summarized as follows (Table 15.8):
• All four populations show a similar structure of blocks of strong linkage dis-
equilibrium, with little or no LD between markers in adjacent blocks. Block
boundaries are generally at similar positions across the four populations.
ASSOCIATION STUDIES AND LINKAGE DISEQUILIBRIUM 483
Only haplotypes having a frequency of 0.05 or greater in the relevant population are reported here.
The CHB and JPT samples have been combined because they show very similar patterns. A different
statistical method of defining blocks gave somewhat different detailed figures but a similar overall
pattern. YRI, Yoruba from Ibadan, Nigeria; CEU, white Americans from Utah; CHB, Han Chinese from
Beijing; JPT, Japanese from Tokyo. [Data from The International HapMap Consortium (2005) Nature
437, 1299–1320.]
484 Chapter 15: Mapping Genes Conferring Susceptibility to Complex Diseases
• Blocks vary in size, but are typically 5–15 kb. Much longer blocks can be found
in chromosomal regions where there is very little recombination, such as cen-
tromeres, or that have been subject to recent strong selection.
• The blocks, as defined here, cover only 67–87% (depending on the statistic
used to define a block) of the genomic regions examined; in other words,
13–33% of the regions analyzed show a more chaotic and diverse pattern of
variation between individuals.
• The general size of blocks is similar in the CEU, CHB, and JPT samples
(13.2–16.3 kb), but smaller in the YRI samples (7.3 kb). This accords with the
Out of Africa model of human origins that suggests that humans originated in
Africa and evolved there for a considerable time before the progenitors of
modern non-African populations migrated out. Sub-Saharan African popula-
tions are older and more variable than all others. Their founders lie further in
the past, and the present structure reflects a greater number of generations
during which meiotic recombination has had the chance to fragment the
founder chromosomes.
• Human genetic variability is much more limited than the number of SNPs
might suggest. In each population studied, on average 4.0–5.6 haplotypes
account for 93–95% of copies of any given block.
The transmission disequilibrium test (TDT) statistic, which is based on 94 families with one
the standard c2 statistic, is or more diabetic children
(a – b)2/(a + b)
where a is the number of times that a heterozygous parent transmits
I to the affected offspring, and b is the number of times that the
neither parent one parent heterozygous
other allele is transmitted. The mathematically inclined can find the
heterozygous for allele I
justification for this procedure in the paper by Spielman, McGinnis, and for allele I 57 families
Ewens (see Further Reading). 37 families
Figure 1 shows the transmission disequilibrium test applied
to type 1 diabetes. Ninety-four families were investigated for an
association between type 1 diabetes and a particular allele at a X–X I–X I–X
repetitive sequence upstream of the insulin gene. Among the
94 families were 57 parents who were heterozygous for the allele
under investigation (denoted by I) and some other allele (denoted I– X–
collectively by X). These 57 parents transmitted 124 alleles to diabetic allele I other allele (X
offspring (some had more than one affected child). Of these, 78 were transmitted transmitted
allele I and 46 allele X. 78 cases 46 cases
The TDT statistic had a value of (78 – 46)2/(78 + 46) = 322/124 = Figure 1 An example of TDT data. Only one parent is shown in each
8.26, corresponding to p = 0.004. Thus, the data demonstrated an case. [Data reported in Spielman RS, McGinnis RE & Ewens WJ (1993)
association between the I allele and type 1 diabetes. Am. J. Hum. Genet. 52, 506–516. With permission from Elsevier.]
1000 isolated cases and their parents than it is to collect 1000 affected sib pairs, at
least for early-onset diseases, for which both parents are likely to be still alive. The
paper helped trigger a widespread move away from linkage studies and toward
studies of association.
BOX 15.4 SAMPLE SIZES NEEDED TO FIND A DISEASE SUSCEPTIBILITY LOCUS BY A GENOMEWIDE SCAN
Risch & Merikangas (1996) calculated the sample sizes Table 1 Comparison of the power of linkage and association studies
needed to find a disease susceptibility locus by a genomewide
scan by using either affected sib pairs (ASPs) or the transmission Susceptibility factor ASP analysis TDT analysis
disequilibrium test (TDT). This Box summarizes their
formulae and equations, but the original paper (see Further g p Y No. of pairs P(trA) No. of trios
Reading) should be consulted for the derivations and for details. 5 0.01 0.534 2530 0.830 747
A standard piece of statistics tells us that the sample
size M required to distinguish a genetic effect from the null 0.1 0.634 161 0.830 108
hypothesis with power (1 – b) and significance level a is
given by (Za – sZ1 – b)2/m2, where Z refers to the standard 0.5 0.591 355 0.830 83
normal deviate. The mean m and variance s2 are calculated
3 0.01 0.509 33,797 0.750 1960
as functions of the susceptibility allele frequency (p) and the
relative risk g conferred by one copy of the susceptibility allele. 0.1 0.556 953 0.750 251
The model assumes that the relative risk for a person carrying
two susceptibility alleles is g 2, that the marker used is always 0.5 0.556 953 0.750 150
informative, and that there is no recombination with the
susceptibility locus. 2 0.1 0.518 9167 0.667 696
For ASP, the expected allele sharing at the susceptibility
0.5 0.526 4254 0.667 340
locus is given by Y = (1 + w)/(2 + w), where w = [pq(g – 1)2]/
(pg + q). m = 2Y – 1 and s2 = 4Y(1 – Y). The genomewide 1.5 0.1 0.505 115,537 0.600 2219
threshold of significance (probability 0.05 of a false positive
anywhere in the genome; testing for sharing IBD) requires a 0.5 0.510 30,660 0.600 950
lod score of 3.6, corresponding to a = 3 ¥ 10– 5, and Za = 4.014.
For 80% power to detect an effect, 1 – b = 0.2 and Z1 – b = –0.84. 1.2 0.1 0.501 3,951,997 0.545 11,868
For the TDT, the probability that a parent will be
0.5 0.502 696,099 0.545 4606
heterozygous for the allele in question is h = pq(g + 1)/
(pg + q). P(trA), the probability that such a heterozygous parent The table shows the number of affected sib pairs (ASPs) or parent–child
will transmit the high-risk allele to the affected child, is g/(1 + g ). trios needed for 80% power to detect significant linkage or association
m = ÷h(g – 1)/(g + 1), and s2 = 1 – [h(g – 1)2/(g + 1)2]. As discussed in a genomewide search. g is the relative risk for individuals of genotype
above, for a genomewide screen involving 1,000,000 tests, Aa compared with aa; p is the frequency of the susceptibility allele, A. For
a = 5 ¥ 10– 8 , Za = 5.33, and, as before, Z1 – b = –0.84. affected sib pair analysis, Y is the expected allele sharing; for the trios analyzed
In Table 1, the Za, Z1 – b, m, and s2 values are used to by the transmission disequilibrium test, P(trA) is the probability that a parent
calculate sample sizes by substitution in the formula heterozygous for allele A will transmit A to an affected child. For low values
M = (Za – sZ1 – b)2/m2. For the TDT, the answer is halved of g, unfeasibly large numbers of affected sib pairs are needed to detect an
because each parent–child trio allows two tests, one on each effect. [Data from Risch N & Merikangas K (1996) Science 273, 1516–1517.]
parent.
488 Chapter 15: Mapping Genes Conferring Susceptibility to Complex Diseases
Different studies of the same disease and candidate gene may report associations with different
markers, or with multilocus haplotypes. a These odds ratios and all allele or haplotype frequencies
are representative examples of data from one out of several reports on each association.
b These odds ratios are from the meta-analysis by Lohmueller KE, Pearce CL, Pike M et al. (2003)
15
bipolar disorder Figure 15.9 Results of the Wellcome
10 Trust Case-Control Consortium (WTCCC)
5 genomewide association study. For each of
0 the seven diseases studied, the distribution
1
9
10
11
12
13
14
15
16
17
18
19
20
21
22
X
of p values (as –log10p) for the association
coronary artery disease of each SNP with the disease is shown, at
the appropriate chromosomal position.
15 Most of the 469,557 SNPs that passed all
10 the quality control checks showed weak or
5
no association with the respective disease
0
(blue dots, merged together). Those showing
1
9
10
11
12
13
14
15
16
17
18
19
20
21
22
X
stronger evidence of association (p < 10–5)
Crohn disease
are marked with green dots. The 25 most
15 strongly associated SNPs or clusters of SNPs
10 (p < 5 ¥ 10–7, the threshold of significance
5 in this study) are marked with red triangles.
0 [Adapted from The Wellcome Trust
1
9
10
11
12
13
14
15
16
17
18
19
20
21
22
X
Case-Control Consortium (2007) Nature
15 hypertension 447, 661–676. With permission from
–log10(p)
9
10
11
12
13
14
15
16
17
18
19
20
21
22
X
rheumatoid arthritis
15
10
5
0
1
9
10
11
12
13
14
15
16
17
18
19
20
21
22
X
type 1 diabetes
15
10
5
0
1
9
10
11
12
13
14
15
16
17
18
19
20
21
22
X
type 2 diabetes
15
10
5
0
1
9
10
11
12
13
14
15
16
17
18
19
20
21
22
X
chromosome
WTCCC study, and one of the remaining two showed positive signals that did not
quite reach significance. Almost all of the novel susceptibility factors have subse-
quently been confirmed in independent follow-up studies.
The WTCCC study is not the only one to have exploited the new tools and
knowledge noted above. Between them, these studies clearly show that the era of
reliable GWA studies has finally arrived.
The size of the relative risk
Considering the 25 SNPs showing the strongest evidence of association in the
WTCCC study, the odds ratios (risk for a heterozygous carrier of the risk allele
compared with a homozygous non-carrier) varied between 1.09 and 5.49.
However, only the previously well-known associations of certain HLA types with
autoimmune disease (type 1 diabetes and rheumatoid arthritis) produced high
odds ratios. For all other cases, the highest ratio observed was 2.08, and in only
five cases did it even exceed 1.5. As already mentioned, the study was predicted
to have 80% power to detect a factor with an odds ratio of 1.5. The clear conclu-
sion is that there are no individually strong ancient susceptibility factors for any
of these typical common complex diseases.
A common story is emerging from this and other similar studies: the ancient
susceptibility factors can be identified by current methods—but the relative risks
they confer are quite small.
THE LIMITATIONS OF ASSOCIATION STUDIES 491
genes suggest that maybe half of all such changes may be mildly deleterious (with
a quarter being seriously deleterious and a quarter neutral). For silent changes
and changes in noncoding sequences, the proportion of neutral variants is likely
to be much higher.
Several preliminary resequencing studies have indeed found an excess of rare
missense variants in candidate genes in disease cohorts, but such studies need to
be conducted on a wider selection of genes and in more diseases to see whether
the effect is general. The new massively parallel sequencing technologies allow
this to be done, so that definitive tests of the mutation–selection hypothesis are
now possible.
FURTHER READING
Family studies of complex disease Linkage analysis of complex characters
Burmeister M, McInnis MG & Zollner S (2008) Psychiatric genetics: Altmüller J, Palmer LJ, Fischer G et al. (2001) Genomewide scans
progress amid controversy. Nat. Rev. Genet. 9, 527–540. [An of complex human diseases: true linkage is hard to find. Am.
overview of progress since the pioneering studies used as J. Hum. Genet. 69, 936–950. [A sobering meta-analysis of 101
examples in this section.] linkage studies in 31 complex diseases.]
McGuffin P, Shanks MF & Hodgson RJ (eds) (1984) The Scientific Lander ES & Kruglyak L (1995) Genetic dissection of complex
Principles Of Psychopathology. Grune & Stratton. traits: guidelines for interpreting and reporting linkage
Risch N (1990) Linkage strategies for genetically complex traits. results. Nat. Genet. 11, 241–247. [Introducing the widely used
1. Multilocus models. 2. The power of affected relative pairs. categories of suggestive, significant, and highly significant
3. The effect of marker polymorphism on analysis of affected results.]
relative pairs. Am. J. Hum. Genet. 46, 222–228; 229–241;
242–253. [Three key papers establishing the statistical basis Association studies and linkage disequilibrium
of familial clustering and shared segment analysis.] Cardon LR & Bell JI (2001) Association study designs for
Rosenthal D & Kety SS (1968) The Transmission Of Schizophrenia. complex diseases. Nat. Rev. Genet. 2, 91–99. [A general non-
Pergamon Press. mathematical discussion of designs for association studies.]
International HapMap Consortium (2003) The International
Segregation analysis HapMap Project. Nature 426, 789–796. [A description of the
Badner JA, Sieber WK, Garver KL & Chakravarti A (1990) A aims and methods of the project.]
genetic study of Hirschsprung disease. Am. J. Hum. Genet. 46, International HapMap Consortium (2005) A haplotype map of
568–580. [A good example of segregation analysis applied to the human genome. Nature 437, 1299–1320. [The primary
a non-Mendelian condition.] report of Phase I of the HapMap project; available for
McGuffin P & Huckle P (1990) Simulation of Mendelism revisited: download from http://www.hapmap.org]
the recessive gene for attending medical school. Am. J. International HapMap Consortium (2007) A second generation
Hum. Genet. 46, 994–999. [A warning about some pitfalls of human haplotype map of over 3.1 million SNPs. Nature 449,
segregation analysis.] 851–861. [Report of Phase II.]
496 Chapter 15: Mapping Genes Conferring Susceptibility to Complex Diseases
Jobling MA, Hurles ME & Tyler-Smith C (2004) Human The limitations of association studies
Evolutionary Genetics: Origins, People and Disease. Garland Bodmer W & Bonilla C (2008) Common and rare variants in
Science. [A unique and excellent textbook that, among many multifactorial susceptibility to common disease. Nat. Genet.
other things, sets out the whole background to linkage 40, 695–710. [An important contribution to the debate about
disequilibrium.] the relative merits of the common disease–common variant
Lohmueller KE, Pearce CL, Pike M et al. (2003) Meta-analysis and mutation–selection hypotheses.]
of genetic association studies supports a contribution of Helbig I, Mefford HC, Sharp AJ et al. (2009) 15q13.3
common variants to susceptibility to common disease. Nat. microdeletions increase risk of idiopathic generalized
Genet. 33, 177–182. [Showing that some early association epilepsy. Nat. Genet. 41, 160–162. [One of several studies
studies did indeed produce replicable results.] showing that the microdeletion acts as a typical low-
Risch N & Merikangas K (1996) The future of genetic studies penetrance susceptibility factor for several different
of complex human diseases. Science 273, 1516–1517. [The neuropsychiatric disorders. This paper includes references to
power calculations in this paper helped trigger the move other studies.]
from linkage to association studies; see also Science 275, Jacobsson M, Scholz SW, Scheet P et al. (2008) Genotype,
1327–1330 (1997) for discussion.] haplotype and copy number variation in worldwide
Slatkin M (2008) Linkage disequilibrium—understanding the human populations. Nature 451, 998–1003. [An extension
evolutionary past and mapping the medical future. Nat. of HapMap-type investigations to 485 individuals from 29
Rev. Genet. 9, 477–485. [A useful general introduction to LD, populations.]
including definitions, measures, and origins.] Kryukov GV, Pennachio LA & Sunyaev SR (2007) Most rare
Association studies in practice missense alleles are deleterious in humans: implications for
complex disease and association studies. Am. J. Hum. Genet.
Altshuler D & Daly M (2007) Guilt beyond a reasonable doubt. 80, 727–739. [Data and calculations suggesting that around
Nat. Genet. 39, 813–815. [A brief review of the achievements 50% of rare coding-sequence variants are mildly deleterious,
of the first wave of successful genomewide association thus supporting the mutation–selection hypothesis.]
studies.] Pritchard JK & Cox NJ (2002) The allelic architecture of human
Campbell CD, Ogburn EL, Lunetta KL et al. (2005) Demonstrating disease genes: common disease–common variant or not?
stratification in a European American population. Nat. Genet. Hum. Mol. Genet. 11, 2417–2423. [Arguments against the
37, 868–872. [A cautionary tale about the need to match common disease–common variant hypothesis.]
cases and controls very carefully.] Reich DE & Lander ES (2001) On the allelic spectrum of human
Database of Genotypes and Phenotypes (dbGaP). http://www disease. Trends Genet. 17, 502–510. [Arguments in favor of the
.ncbi.nlm.nih.gov/entrez/query/Gap/gap_tmpl/about common disease–common variant hypothesis.]
.html [Planned to be a central resource for raw data relating Topol EJ & Frazer KA (2007) The resequencing imperative. Nat.
genotypes and phenotypes, including results of association Genet. 39, 439–440. [A brief review of the case for large-scale
studies.] resequencing and results to date.]
Schaid DJ (1998) Transmission disequilibrium, family controls and
great expectations. Am. J. Hum. Genet. 63, 935–941. [A review
of the strengths and weaknesses of the TDT.]
Spielman RS, McGinnis RE & Ewens WJ (1993) Transmission
test for linkage disequilibrium: the insulin gene region and
insulin-dependent diabetes mellitus (IDDM). Am. J. Hum.
Genet. 52, 506–516. [Describes the statistical basis for the
transmission disequilibrium test, and shows an example of its
power.]
Wellcome Trust Case-Control Consortium (2007) Genome-wide
association study of 14,000 cases of seven common diseases
and 3,000 shared controls. Nature 447, 661–678. [An excellent
overview of how to perform GWA studies and what they
might show.]
Wilson JF & Goldstein DB (2001) Consistent long-range linkage
disequilibrium generated by admixture in a Bantu–Semitic
hybrid population. Am. J. Hum. Genet. 67, 926–935. [The
advantages of an admixed population for association
studies.]
Wright AF, Carothers AD & Pirastu M (1999) Population choice
in mapping genes for complex diseases. Nat. Genet. 23,
397–404. [Despite its age, a good review of the options.]
Chapter 16
• Positional cloning has been the main route through which the genes underlying Mendelian
diseases have been identified. In positional cloning, a gene is identified from its
chromosomal location alone.
• Positional candidate genes are identified by drawing up a list of genes at that location, on
the basis of their function, expression pattern, and homologies.
• Model organisms can provide clues to gene identification because the structure and
function of many genes are conserved between humans and other organisms.
• Mice are particularly useful for identifying disease genes and studying pathogenic
mechanisms because of the ease with which genes can be mapped and manipulated in
mice, and because of the relatively close similarity of mice and humans.
• Patients with chromosome abnormalities can provide essential pointers to the location of a
disease gene. Large abnormalities may be recognized by conventional cytogenetics, and
small ones by comparative genomic hybridization or using SNP arrays. These approaches
have been especially useful for severe dominant conditions, in which most cases are new
mutations, making linkage analysis impossible.
• Some disease genes have been identified by alternative, position-independent approaches.
These include functional studies of cells in culture, direct identification of altered proteins
or changed gene expression, and extrapolation from animal models.
• Candidate genes are tested by seeking mutations in collections of unrelated affected cases.
Functional studies may be needed to confirm an uncertain identification.
• Susceptibility factors for complex disease are mainly sought through association studies.
These may be targeted at genes whose function makes them likely candidates, or studies
may be conducted on a genomewide scale.
• Association studies identify haplotype blocks that are associated with susceptibility to a
disease. Identifying the actual causal variant is difficult because of linkage disequilibrium
between all the variants in the block. Both statistical and functional evidence are needed.
• Association studies have identified susceptibility factors for many complex diseases, but for
most diseases the known factors only weakly influence susceptibility and collectively
account for only a small part of the overall genetic determination, leaving a problem of
hidden heritability.
• The hidden heritability in complex diseases may consist of the cumulative effects of very
many extremely weak genetic factors, or of the effects of a highly heterogeneous collection
of individually rare mutations, or possibly of epigenetic changes.
498 Chapter 16: Identifying Human Disease Genes and Susceptibility Factors
Few subjects have moved as fast as human disease gene identification. Before
1980, very few human genes had been identified as disease loci. The few early
successes involved a handful of diseases with a known biochemical basis, for
which it was possible to purify the gene product. In the 1980s, advances in recom-
binant DNA technology allowed a new purely genetic approach, sometimes given
the rather meaningless label reverse genetics. The number of disease genes identi-
fied started to increase, but these early successes were hard-won, heroic efforts.
With the advent of polymerase chain reaction (PCR) techniques for linkage stud-
ies and mutation screening, it all became much easier. Now that the human and
other genome projects have made available a vast range of resources, the ability
to identify a Mendelian disease gene depends almost entirely on having access to
suitable families.
The 1990s saw triumphant progress in identifying the causes of Mendelian
diseases, so that the genes underlying most common Mendelian diseases have
now been identified. The exceptions tend to be those such as nonsyndromic
mental retardation, in which the hunt for causative genes is bedeviled by irreduc-
ible genetic heterogeneity or, alternatively, very rare conditions for which it is
difficult to find enough suitable families. However, identifying the factors confer-
ring susceptibility to common complex diseases is much more challenging. As
described in Chapter 15, only association studies have sufficient power to iden-
tify most susceptibility factors; linkage studies are seldom productive. For years,
researchers were unable to make much progress. After many false dawns, the
new generation of large-scale genomewide association studies, reported in
increasing numbers from 2005 onward, has finally broken the logjam. Many sus-
ceptibility factors have now been mapped, but identifying the actual causal
sequence change remains very difficult.
In this chapter, we first consider positional cloning, which has been the most
successful general approach to the identification of Mendelian disease genes.
Most commonly, the candidate chromosomal region is identified through link-
age studies, but alternatives are to use chromosomal breakpoints or animal mod-
els. We next consider some other approaches: cloning based on knowledge of the
gene product, and identifying causal variants through genomewide association
studies. An important question is how one knows when one has succeeded. How
can we know that a candidate sequence variant really is the cause of the pheno-
type? This connects with the discussion in Chapter 13 of pathogenic versus non-
pathogenic sequence changes. Toward the end of the chapter is a series of case
studies that show how the various methods have been used to identify important
phenotypic determinants.
The approaches described are just as applicable to identifying determinants
of normal variations such as red hair or red–green color blindness as they are to
identifying disease genes. The determinants that we would wish to identify may
not necessarily be genes, in the sense of protein-coding sequences. The primary
determinant might affect noncoding RNAs or alter chromatin structure, and it
might be located some distance from any protein-coding sequence. Whatever
the mechanism, however, we are looking for DNA sequence variants that cause
phenotypic variation. Epigenetic changes that do not alter the DNA sequence
(such as changes in DNA methylation) also cause phenotypic variation, ulti-
mately mediated by changes in the expression of protein-coding genes.
It is important not to be misled by phrases such as the gene for cystic fibrosis,
the gene for diabetes, and so on. You would not describe your domestic freezer as
a machine for ruining frozen food. Genes do a job in cells; if the job is not done,
or is done wrongly, the result may be a disease. However, many human genes
were first discovered through research on the diseases caused by mutations in
them, so terminology such as the cystic fibrosis gene does reflect historical, if not
biological, reality.
A1 B1 C1 D1 E1
mapped Y candidate N collect genetic
successfully
candidate chromosomal families mapping:
located?
homolog? region? for mapping genome search
Y
Y N
B2 C2 D2 E2
think again
check chromosomal clone the
about mode of
databases deletions or chromosomal
inheritance,
for genes translocations breakpoints
heterogeneity
N
A3 B3 D3 E3
cloned Y possible identify think again
candidate candidate new human about candidate
homolog? gene genes genes
B4 D4
N
has it work out
been fully full sequence
characterized? and structure
Y
N
C5 D5 E5
collect look pathogenic
unrelated for mutations mutations
patients found?
major efforts in screening cDNA libraries to identify and characterize the genes.
define the candidate region
By 1995, only about 50 inherited disease genes had been identified by this
approach.
The first step in positional cloning is to define the candidate obtain clones of all the DNA of the region
region as tightly as possible
The difficulty of positional cloning depends very largely on the size of the candi-
date region, so the first priority is to narrow this down as far as possible. For identify all the genes in the region
Mendelian diseases, this is mainly a function of the number of meioses available
for linkage analysis. The limit of resolution is reached when the last recombinant
has been mapped between closely spaced markers. Using the generalization of
prioritize them for mutation screening
1 cM = 1 Mb, a family collection with 100 informative meioses might localize a
Mendelian disease to a candidate region of about 1 Mb—depending on luck and
on the frequency of recombination in the particular chromosomal region.
At this stage, the raw data should be scrutinized by inspecting haplotypes test candidate genes for
rather than relying on computer analysis (Figure 16.3). When single recom- mutations in affected people
binants define the boundaries of the candidate region, it is important to consider
possible sources of error. Meticulous clinical diagnoses are imperative. Key Figure 16.2 The logic of positional
cloning. The figure illustrates the logical
recombinants are more reliable if they occur in unambiguously affected peo-
progression of positional cloning as
ple—an unaffected individual might be a non-penetrant carrier of the disease originally practiced. Nowadays a list of genes
gene. Apparent close double recombinants are highly suspect; as explained in in the candidate region can be downloaded
Chapter 14, they most probably reflect errors in diagnosis or genotyping. The from genome databases. Before the current
possibility of phenocopies must be considered. Sometimes, despite good positive sequence data and high-resolution marker
lod scores, there seem to be recombinants with every marker tried. This is usually maps were available, researchers tried all
an indication of locus heterogeneity—in one or more of the families in the study sorts of shortcuts to reduce the labor of pure
the disease does not map to the region under study. positional cloning.
I I
1 2 1 2 3
II II
1 2 3 4 5 6 7 1 2 3 4
7 8 6 2 2 1 6 2 3 7 2 1 6 2 D12S84 8 10 8 10 8 5 8 10 D12S84
5 6 5 2 5 4 5 2 6 4 5 4 5 2 D12S105 3 4 3 3 3 3 3 3 D12S105
4 3 2 1 2 1 2 1 5 4 2 1 2 1 D12S234 8 6 8 2 8 5 8 2 D12S234
6 2 6 9 5 3 6 9 5 2 6 3 6 9 D12S129 2 4 2 3 2 3 4 3 D12S129
6 1 2 5 5 7 2 5 7 5 2 7 2 5 D12S354 7 1 7 5 7 1 2 5 D12S354
2 8 2 ? 5 5 2 5 8 4 2 5 2 8 D12S79 10 6 10 8 10 7 3 8 D12S79
III
1 2 3 4
2 8 6 7 6 3 6 7 D12S84
2 6 5 5 5 6 5 4 D12S105
1 3 2 4 2 5 2 4 D12S234
6 2 6 6 6 5 6 2 D12S129
2 1 2 6 2 7 2 5 D12S354
2 8 2 2 2 8 2 4 D12S79
Figure 16.3 Defining the minimal candidate region by inspection of haplotypes. The two pedigrees show a dominantly inherited skin disorder,
Darier–White disease (OMIM 124200), which had previously been mapped to chromosome 12q. The 12q marker haplotype that segregates with the
disease is highlighted in yellow. Gray boxes mark inferred haplotypes in dead people. Numbers within the boxes identify different alleles of the relevant
marker. (A) The recombination in individual II6 in this pedigree shows that the disease must be located distal of marker D12S84. D12S105 (green) is
uninformative because I1 was evidently homozygous for allele 5 (II3 and II7 evidently inherited opposite haplotypes from I1, but both have D12S105 allele
5). The recombination shown in III1 suggests that the disease gene maps proximal to D12S129, but this requires confirmation because the interpretation
requires the genotypes of II1 and II2 to be inferred correctly, and that III1 not be a non-penetrant gene carrier. (B) The recombination in individual II4 in
this pedigree provides the confirmation. The combined data locate the Darier–White gene to the interval between D12S84 and D12S129. [From Carter SA,
Bryce SD, Munro CS et al. (1994) Genomics 24, 378–382. With permission from Elsevier.]
POSITIONAL CLONING 501
Scale 100 kb
chr6: 43000000 43050000 43100000 43150000 43200000 43250000 43300000 43350000 43400000
Assembly from Fragments
Assembly
RefSeq Genes
RPL7L1 PTCRA GNMT MEA1 MRPL2 SRF C6orf108 SLC22A7
C6orf226 CNPY3 PPP2R5D CUL7 CUL9 SLC22A7
PEX6 KLHDC3 KLC4 C6orf108 CRIP3
PPP2R5D KLC4 TTBK1 ZNF318
PPP2R5D KLC4
C6orf153 KLC4
PTK7
PTK7
PTK7
PTK7
genomic libraries (chromosome walking), and then expressed sequences from Figure 16.4 Using a genome browser to
within the clone contig had to be identified by mapping transcripts. The cam- list the genes in a candidate region.
paign to clone the cystic fibrosis gene, described in case study 2 at the end of this This partial screendump from the UCSC
genome browser (http://www.genome
chapter, gives an excellent insight into the difficulties researchers had to over-
.ucsc.edu/) shows genes in a 500 kb region
come in this early phase.
of chromosome 6p21.1. Exons of a gene
Nowadays a genome browser such as Ensembl (http://www.ensembl.org/) or (vertical bars) are linked by a horizontal
the Santa Cruz browser (http://genome.cse.ucsc.edu/) would be used to display line, and arrowheads show the direction of
all the definite and possible genes in the candidate region (Figure 16.4). Deeply transcription.
impressive though these displays are, it is important not to rely totally on them.
These extremely sophisticated tools need to be used as adjuncts to, and not
replacements for, thought. They must be supplemented by first-hand in-depth
study of the region. The reference sequence used by the browsers is currently far
from perfect. Going back to the raw data can identify missing or variable parts of
the sequence. The ENCODE (Encyclopedia of DNA Elements) project (see
Chapter 11) has shown that gene structures identified by the browsers are often
not complete. The current annotations do include virtually all protein-coding
genes, but they may not identify all the possible exons or alternative transcripts,
especially those using small exons separated by large introns, and they certainly
omit many noncoding transcripts. It can be useful to compare the predictions of
several different gene-finding programs. In the laboratory, techniques such as
reverse transcriptase (RT)-PCR (see Chapter 8) or 5¢ RACE (see Box 11.1) can be
used to verify claimed isoforms and search for new ones.
The third step is to prioritize genes from the candidate region for
mutation testing
To identify mutations, samples from a collection of unrelated affected people
need to be sequenced. Typically, a candidate region defined by linkage analysis
would contain several genes with 100 or more exons in total. One approach, made
increasingly possible by new sequencing technologies, is simply to sequence
every exon across the candidate region. More usually, however, a candidate gene
is selected, and each exon of that gene is individually amplified by PCR and
sequenced. In choosing genes for sequencing it makes sense to start with the
most promising ones, although ease of analysis is also a consideration. All things
being equal, one would deal with a gene with 4 exons before tackling one
with 65.
Appropriate expression
A good candidate gene should have an expression pattern consistent with the
disease phenotype. Expression need not be restricted to the affected tissue,
because there are many examples of widely expressed genes causing a tissue-
specific disease, but the candidate gene should at least be expressed in the place
where the pathology is seen, and at or before the time at which the pathology
becomes evident. For example, genes responsible for neural tube defects should
be expressed in the neural tube shortly before it closes (which happens during
the third and fourth weeks of human embryonic development; see Chapter 5).
The expression of candidate genes can be tested by RT-PCR, northern blotting
(see Chapter 7), or serial analysis of gene expression (SAGE; see Box 11.1). Much
of the preliminary work can be performed with databases (dbEST or the SAGE
database, both accessible through the NCBI homepage at http://www.ncbi.nlm
.nih.gov/) rather than in the laboratory. In situ hybridization against mRNA in
tissue sections, or immunohistochemistry with labeled antibodies, provides the
502 Chapter 16: Identifying Human Disease Genes and Susceptibility Factors
most detailed picture of expression patterns. Studies are usually done on mouse
tissues, especially for embryonic stages. The common assumption that humans
and mice will show similar expression patterns is not always justified, and cen-
tralized resources of staged human embryo sections have been established to
allow the equivalent analyses to be performed, where necessary, on human
embryos (see http://www.hdbr.org/).
Appropriate function
When the function of a gene in the candidate region is known, it may be obvious
whether or not it is a good candidate for the disease. For example, rhodopsin was
a good candidate for retinitis pigmentosum, and fibrillin for the connective tis-
sue disorder Marfan syndrome (OMIM 154700). For a novel gene, sequence anal-
ysis will often provide clues to its function through the recognition of common
sequence motifs such as transmembrane domains or tyrosine kinase motifs.
These may be sufficient to prioritize a gene as a candidate, given the pathology of
the disease. For example, ion transport in the inner ear is known to be critical for
hearing, so an ion channel gene would be a natural candidate in positional clon-
ing of a deafness gene.
Homologies and functional relationships
Sometimes a gene in the candidate region turns out to be a close homolog of a
known gene (a paralog in humans, or an ortholog in other species). If mutations
in the homologous gene cause a related phenotype, the new gene becomes a
compelling candidate. For example, Marfan syndrome is caused by mutations in
the fibrillin gene. A clinically overlapping condition, congenital contractural
arachnodactyly (CCA; OMIM 121050), was mapped to chromosome 5q. The can-
didate region contained the FBN2 gene, which is a paralog of fibrillin. FBN2
mutations were soon demonstrated in CCA patients. Weak homologies may be
missed by the programs used for automatic genome annotation. A more directed,
hypothesis-driven search may come up with significant weak homologies that
can point to the gene’s function.
Candidate genes may also be suggested on the basis of a close functional rela-
tionship to a gene known to be involved in a similar disease. The genes could be
related by encoding a receptor and its ligand, or other interacting components in
the same metabolic or developmental pathway. For example, mutations in the
RET gene were known to cause some but not all cases of Hirschsprung disease.
RET encodes a cell-surface tyrosine kinase receptor. The genes encoding the RET
ligands GDNF and NRTN were then obvious candidates for hunting for further
Hirschsprung mutations.
Now that we have genome sequences of many different species, it has become
clear how far structural and functional homologies extend across even very dis-
tantly related species. The great majority of mouse genes have an identifiable
human counterpart, and the same is true of other mammalian species. As
described in Chapter 10, extensive homologies can be detected between genes in
humans and model organisms such as zebrafish, the fruit fly Drosophila, the
nematode worm Caenorhabditis elegans and even the yeasts Saccharomyces
cerevisiae and Schizosaccharomyces pombe. For these distantly related organ-
isms, homology may be more evident in the amino acid sequence of the protein
product than in the DNA sequence of the gene. Looking at the amino acid level
allows synonymous coding sequence changes to be ignored.
A very powerful means of prioritizing candidates from among a set of human
genes is therefore to see what is known about homologous genes in well-studied
model organisms. Such data might include the pattern of expression and the
phenotype of mutants. Papers listed in Further Reading show how data from
Drosophila and yeast can be used to infer the function of human genes. Mice are
especially useful for such investigations, and their use is considered in more
detail below. Even more than gene sequences, pathways are often highly con-
served, so that knowledge of a developmental or control pathway in Drosophila
or yeast can be used to predict the likely working of human pathways—although
mammals often have several parallel paths corresponding to a single path in
lower organisms.
By contrast, mutant phenotypes are less likely to correspond closely. A strik-
ing example is the wingless apterous mutant of Drosophila. A human gene, LHX2,
POSITIONAL CLONING 503
Several methods are available for easy and rapid mapping of Recombinant inbred strains
phenotypes or DNA clones in mice. Together with the ability to These are obtained by systematic inbreeding of the progeny of a cross.
construct transgenic mice, they make the mouse especially useful for For example, the widely used BXD strains are a set of 26 lines derived
comparisons with humans. by more than 60 generations of inbreeding from the progeny of a
Interspecific crosses (Mus musculus crossed with either Mus spretus or C57BL/6J ¥ DBA/2J cross. They provide unlimited supplies of a panel of
Mus castaneus) chromosomes with fixed recombination points. DNA is available as a
Different mouse species have different alleles at many polymorphic public resource, analogous to the human CEPH families. Recombinant
loci, making it easy to recognize the origin of a marker allele. This is inbred strains are particularly suited to mapping quantitative traits,
exploited in two ways: which can be defined in each parent strain and averaged over a
• Constructing marker framework maps. Several laboratories number of animals of each recombinant type. In comparison with mice
have generated large sets of F2 backcrossed mice. Any marker from interspecific crosses, it may be harder to find a marker in a given
or cloned gene can be assigned rapidly to a small chromosomal region that distinguishes the two original strains, and the resolution is
segment defined by two recombination breakpoints in the lower because of the smaller numbers.
collection of backcrossed mice. For example, the Collaborative Congenic strains
European backcross was produced from a M. spretus ¥ These are identical except at a specific locus. They are produced
M. musculus (C57BL) cross. Five hundred F2 mice were produced by repeated backcrossing and can be used to explore the effect of
by backcrossing with M. spretus, and 500 by backcrossing with changing just one genetic factor on a constant background.
C57BL. All microsatellites in the framework map are scored in
every mouse.
• Mapping a new phenotype. A cross must be set up specifically
to do this but, unlike with humans, any number of F2 mice can be
bred to map to the desired resolution. M. musculus ¥ M. castaneus
crosses are easier to breed than M. musculus ¥ M. spretus.
is able to complement the deficient function of the mutant, so that the flies grow
normal wings (see Figure 10.27). We humans must have a virtually identical
developmental pathway to Drosophila, but clearly we use it for a different pur-
pose. No human mutation has been described, but Lhx2 knockout mice die
before birth with defects in development of the eyes, forebrain, and erythrocytes.
Branchio-oto-renal syndrome (OMIM 113650), the subject of case study 3 at the
end of this chapter, provides another example of a conserved pathway used for a
different purpose.
Clinicians have made major contributions to identifying disease Additional mental retardation
genes by identifying patients who have causative chromosome A patient may have a typical Mendelian disease but may in addition
abnormalities. be mentally retarded. This may be coincidental, but such cases can be
A cytogenetic abnormality in a patient with the standard caused by deletions that eliminate the disease gene plus additional
clinical presentation neighboring genes. Large chromosomal deletions almost always
If a disease gene has already been mapped to a certain chromosomal cause severe mental retardation, reflecting the involvement of a
location, and then a patient with that disease is found who has high proportion of our genes in fetal brain development. When the
a chromosome abnormality affecting that same location, the patient is also a de novo case of a condition that is normally familial,
chromosome abnormality most probably caused the disease. cytogenetic and molecular analysis is warranted.
Patients with balanced translocations or inversions often Contiguous gene syndromes
have breakpoints located within the disease gene, or very close Very rarely, a patient seems to suffer from several different genetic
to it. Cloning these breakpoints can provide the quickest route to disorders simultaneously. This may be just very bad luck, but
identifying the disease gene. With deletions, the breakpoints may sometimes the cause is simultaneous deletion of a contiguous
be located some distance from the disease gene, but if the deleted set of genes. Contiguous gene syndromes are described in
segment is smaller than the current candidate region, defining the Chapter 13; they are particularly well defined for X-linked diseases.
breakpoints helps localize the gene. One famous example, described at the end of this chapter, led
Most such patients will have de novo mutations. Some researchers directly to the identification of the gene mutated in Duchenne
feel that performing chromosome analysis on all patients with de novo muscular dystrophy.
mutations is a worthwhile expenditure of research effort.
THE VALUE OF PATIENTS WITH CHROMOSOMAL ABNORMALITIES 505
ple by splicing together exons of two genes to create a novel chimeric gene—this
is rare in inherited disease but common in tumorigenesis (see Chapter 17). In
either case, the breakpoint provides a valuable clue to the exact physical location
of the disease gene. The position of the breakpoint is most easily defined by using
FISH (Figure 16.6). This will often be sufficient to identify the gene involved. If
necessary, once the breakpoint has been localized to a single clone, its exact posi-
tion could be defined by finding a sequence that can be amplified by PCR from
DNA of the patient by using a pair of primers, one from each of the two chromo-
somes involved. An alternative approach would be to make a genomic library
from the patient’s DNA and then sequence appropriate clones. Breakpoints of
deletions or duplications can be easily mapped on microarrays by looking for a
change in the hybridization intensity.
An example of the power of this approach is the identification of the Sotos
syndrome gene (Figure 16.7). This syndrome (OMIM 117550) involves childhood
overgrowth, dysmorphic features, and mental retardation. Because it always
occurs de novo, there were no multigeneration families that might allow the caus-
ative gene to be localized by linkage analysis. A single patient with a de novo bal-
anced 5;8 translocation, 46,XX,t(5;8)(q35;q24.1), provided the vital clue. As shown
in Figure 16.7, the translocation breakpoint disrupted the NSD1 gene. This could
have been purely coincidental. Proof that NSD1 was the gene mutated in Sotos
syndrome came from showing that 4 out of 38 independent patients with
Sotos syndrome had point mutations in the NSD1 gene, and 20 out of 30 patients
had microdeletions involving this gene (Figure 16.8).
2.0
1.0
extract DNA 0
–1.0
intensity. The SNP genotypes can also be informative: SNPs from within a deleted
region will show only a single allele (apparent homozygosity), and SNPs from a
duplicated region may show three alleles.
Whichever technique is used, it is important to confirm the results by FISH.
Balanced abnormalities would not be detected, and the results for a duplication
or amplification would not distinguish tandem repeats from dispersed repeats.
The main limitation of these techniques is in the interpretation of the results. As
mentioned in Chapter 13, large-scale application of these methods has identified
innumerable copy number variations in normal healthy people. Just because a
microdeletion is seen in a patient with a clinical phenotype, it does not follow
that the deletion caused the phenotype. Databases of normal variants, such as
TCAG (see Further Reading), can be consulted to eliminate known nonpatho-
genic variants, but there always remains the possibility that the patient carries a
novel but harmless variant. Thus, as always with chromosomal abnormalities,
variants detected by CGH suggest hypotheses; these must then be checked by
other means.
studying gene expression in cells from that patient, or by finding unrelated chro-
mosomally normal cases with point mutations in the relevant gene.
1. isolate human
MYO15 gene, using
p.I892F primers designed p.C674Y sequence BAC 425 p24
p.N890Y from the mouse
p.K1300X myo15 mutation computer analysis
2. use somatic cell identified in sh2 identifies a novel myosin
MYO15 mutations hybrids to check that mouse gene, myo15
identified in three it maps to 17p11.2
unrelated DFNB3
patients
• The Fanconi anemia (see OMIM 227650) groups A, C, and G genes (see Chapter Figure 16.12 Functional complementation
13, p. 416) were identified by screening cDNA libraries for clones that could in transgenic mice as a tool for identifying
cure the DNA repair defect in cells from patients. a human disease gene. In mice, the shaker-2
mutation causes deafness. This mutation was
• The second of the two ways in which the gene responsible for multiple sulfa- identified by finding a wild-type clone that
tase deficiency was identified (see case study 4 at the end of this chapter) corrected the defect. Members of a human
involved finding a chromosome fragment that was able to correct the defect family with a similar phenotype that mapped
in a cell line from a patient. to the corresponding chromosomal location
• A similar functional complementation approach, but in this case in a whole proved to have mutations in the orthologous
gene, MYO15.
animal, was used to identify the mouse shaker-2 and human DFNB3 deafness
genes (Figure 16.12). In fact, this approach did use some positional informa-
tion, but it is described here because it is another example of functional com-
plementation. A form of autosomal recessive hearing loss was mapped to a
unique locus, DFNB3, in a very large extended Indonesian kindred. Because
all the patients were likely to be homozygous for the same single mutation, it
was unlikely that the pathogenic change could be unambiguously identified
by sequencing DNA from the candidate region in family members.
Comparative mapping showed that DFNB3 and shaker-2 were likely to be
orthologs. Transgenic shaker-2 mice were constructed that contained wild-
type BAC clones from the shaker-2 candidate region. A BAC that corrected the
phenotype was identified and turned out to contain an unconventional
myosin gene, myo15. The human MYO15 gene was then isolated on the basis
of its close homology with the mouse gene; its position within the DFNB3
candidate region was confirmed, and the Indonesian patients were then
shown to have a mutation in MYO15.
When a disease affects a biochemical pathway, each component of the path-
way is a non-positional candidate for the disease. The components may be iden-
tified through protein or RNA studies.
• Protein–protein interactions can be identified through yeast two-hybrid
experiments or by using mass spectrometry to identify the components of a
multiprotein complex. For example, BRIP1 and PALB2 (see case study 7 at the
end of this chapter) were identified as susceptibility factors for familial breast
cancer because the proteins they encode interact with the proteins encoded
by the known breast cancer susceptibility genes BRCA1 and BRCA2.
POSITION-INDEPENDENT ROUTES TO IDENTIFYING DISEASE GENES 511
• Microarrays are often used to compare mRNA profiles from patients and con-
trols, or from cells before and after the expression of some protein is knocked
down by RNAi (RNA interference; see Box 9.6, p. 284 and Chapter 12). This
produces a list of genes whose expression is altered—but the list is usually
very long. It is then necessary somehow to distinguish primary effects from all
the downstream changes.
Waardenburg–Hirschsprung
disease (OMIM 277580)
Figure 16.13 Identifying the gene causing
Waardenburg–Hirschsprung disease.
Dominant megacolon mouse: Individuals affected by this condition
no families a possible ortholog (OMIM 277580) have pigmentary anomalies,
suitable for
linkage
hearing loss, Hirschsprung disease, and
sometimes other neurological problems.
The condition is very rare, and affected
Dom locus mapped to
mouse chromosome 15 people seldom reproduce, so it was not
positional possible to identify the gene responsible
cloning not by first mapping it in families and then
possible
testing positional candidates. The Dominant
human SOX10 Dom gene identified megacolon (Dom) mouse has a similar
cloned as Sox10 combination of pigmentary anomalies and
lack of enteric ganglia. Breeding experiments
in mice allowed the causative gene (Sox10)
to be identified. The human homolog was
SOX10 mutations found in then cloned, and DNA from patients was
some human patients
tested directly for mutations.
512 Chapter 16: Identifying Human Disease Genes and Susceptibility Factors
In some cases, the desired mutations may be hard to find. Apart from the
practical problem of screening a large gene with many exons, some mutations
are not readily discoverable by PCR testing of genomic DNA. For example, one
frequent mutation of the cystic fibrosis transmembrane receptor gene (CFTR)
that causes cystic fibrosis, c.3849+10kb C>T, is a single nucleotide change deep
within an intron that activates a cryptic splice site. It would be missed by the
standard exon-by-exon PCR and sequencing protocols. Initial detection of this
mutation required RT-PCR, although once it had been identified a PCR assay
could easily be designed to detect further cases by using genomic DNA.
In another example, most mutations that cause severe hemophilia A
(OMIM 306700) could not be detected even after sequencing all 26 exons of the
F8A gene. It turned out that a recurrent inversion disrupted the gene. The
intragenic breakpoint was in a 32 kb intron, and so each exon of the gene remained
intact and appeared normal on PCR or sequencing (Figure 16.14). The inversion
was detected by Southern blotting. It is caused by non-allelic homologous recom-
bination between a repeat located in intron 22 of the F8A gene and copies of the
repeat located 360 kb and 435 kb away. Once the relevant sequences were known,
a long PCR test could be designed to detect it.
the central nervous system, many more or less exclusively. All of them are candi-
date genes for mental retardation. Although mental retardation is an extreme
example, almost all observable phenotypes depend on the action of more than
one gene, and so locus heterogeneity is a very common and unsurprising obser-
vation. Indeed, at first sight it seems surprising that any condition should not be
grossly heterogeneous. Locus homogeneity probably results in part from the way
in which most conditions are defined by combinations of symptoms. For exam-
ple, each individual feature of cystic fibrosis can have a variety of causes, but the
combination is specific to people who have no functional copy of the CFTR gene.
Additionally, we are good at seeing very subtle differences between people and
labeling them accordingly. If mutations in two different genes cause very similar
but subtly different phenotypes, we would probably label them differently in
humans but not in mice, flies, or worms.
region is usually too large to search for mutations. However, starting in about
2005, genomewide association studies in many different complex diseases have
identified SNPs that are reproducibly associated with susceptibility. As described
in Chapter 15, these studies typically genotype 500,000 or more tagging SNPs (as
defined by the HapMap project) and copy-number variants in very large case-
control designs. For example, the Wellcome Trust Case-Control Consortium
(WTCCC) genotyped 500,568 SNPs in 14,000 patients with seven diseases and
3000 controls.
SNP are excluded. A proportion of tests are repeated to check the overall error
rate. As a result of all these checks, the number of samples and SNPs used in the
analysis is usually only 70–80% of the number actually tested. sequence the block in a panel of cases and
In addition, association studies detect variants that fulfill the common controls to identify every variant
disease–common variant hypothesis (see Chapter 15). Because such variants are
necessarily ancient, they cannot have been subject to strong negative selection.
Thus, they are likely to have fairly subtle effects on gene expression or function, test association of each variant with the
unlike the frameshift, splice-site, or nonsense mutations commonly seen in disease, using logistic regression to disentangle
effects of linkage disequilibrium between SNPs
Mendelian conditions. Susceptibility factors may be polymorphisms in noncod-
ing DNA that have some small effect on promoter activity, splicing, or mRNA sta-
bility. They may be located at a large distance from the gene whose activity they
affect—the case study of persistence of intestinal lactase at the end of this chap- identify the factor(s) giving the strongest
association(s) and perform functional assays
ter illustrates the sort of variant that may be expected. It is therefore unlikely that
the true causal variant will be identifiable by simple inspection of the sequence,
in the way that it often is for a Mendelian disease. Figure 16.15 Procedure for identifying
a causal variant through association
studies. Genomewide association studies
Causal variants are identified through a combination of statistical use tagging SNPs to identify haplotype
and functional studies blocks that are associated with disease
susceptibility. Any of the variants in that
Figure 16.15 sketches the way in which one would hope to solve these problems. block might be the actual functional variant.
The original association study will most probably have used a limited number of Statistical and functional studies are needed
tag-SNPs (described in Chapter 15); sequencing of the entire associated region in to identify it.
IDENTIFYING CAUSAL VARIANTS FROM ASSOCIATION STUDIES 517
a large panel of the cases and controls will reveal all variants in the study sample.
These will include copy-number variants and polymorphisms with minor allele
frequencies below the 5% threshold normally used in association studies.
Next, each individual variant will be tested for association with the disease,
ideally in an independently ascertained set of patients and controls. Haplotype
blocks, such as those in the HapMap data, are defined by an arbitrary cutoff of
linkage disequilibrium that is always less than 100%, so that the degree of asso-
ciation is not necessarily identical for every variant within a block. Moreover,
most variants are two-allele SNPs, but within the population there will usually be
more than two alternative haplotype blocks at any location. Thus, particular vari-
ants will probably be present on more than one of the alternative blocks. If a vari-
ant that is present on two different blocks is causative, it will show a much stronger
association with the disease than variants that are present on only one of the
blocks. Conversely, a nonfunctional variant that is present on one block that also
carries the functional variant but is also present on another block that does not
do so will show a weaker association with susceptibility (Figure 16.16 shows this
in a simplified example). If the investigators are lucky, testing every variant for
association will show one peak of especially strong association for some small
cluster of SNPs, against a background of the general association of all the SNP
alleles that are found on the relevant haplotype block.
Several different variants in a region may each contribute, independently or
in combination, to susceptibility. Sorting out the causal relationships is extremely
difficult when all the variants are in linkage disequilibrium with each other. The
main tool is logistic regression: a principal variant is selected, and the effect of
other variants is studied, conditioned on the effects of the principal variant. If
this procedure still shows an effect, the second variant does indeed make an
independent contribution. An additional problem is that the main determinants
of susceptibility may be different in different populations. Apart from differences
due to interaction of genetic factors with different environments, associations
identify shared ancestral chromosome segments, which may be different in dif-
ferent populations. Readers interested in a more detailed discussion of the statis-
tical methods used in association studies should consult the review by Balding
(see Further Reading).
When a candidate causal variant has been identified, some sort of functional
test must be used to confirm that it has a biological effect. Missense changes in
coding sequences are the easiest to investigate, through direct tests of protein
function. The balance of splice isoforms can be checked for variants located
within gene sequences, and variants in promoters or enhancers can be studied
with the use of appropriate expression systems. But laboratory studies seldom
capture the full function of a complex gene in all cells and all tissues, and under
all conditions. In a systematic study of gene knockouts in the budding yeast
Saccharomyces cerevisiae, only a minority of all knockouts had any detectable
effect. It is highly unlikely that most yeast genes really have no function—what
the result tells us is that standard laboratory biochemical tests are not able to
assess the full function of a gene, even in a simple organism. In a similar study in
the mouse, 96% of all knockouts did show some abnormality, but the abnormali-
ties could be subtle, often behavioral. If this is true of knockouts, it is likely to be
A B C D
4 haplotypes, 3 SNPs
Figure 16.16 SNP associations with
frequency in population 0.1 0.2 0.3 0.4 disease susceptibility. In this imaginary
population there are four alternative
risk for someone haplotypes (A–D) at a certain location. Three
2¥ 2¥ 1¥ 1¥
with this haplotype different SNPs are shown as three colored
bars; the alternative allele in each case is
relative likelihood that a 0.2 0.4 0.3 0.4 blank. The red SNP is a true susceptibility
case has this haplotype
factor for the disease under study, doubling
probability that a case the risk. The blue and green SNPs have no
0.2/T 0.4/T 0.3/T 0.4/T
has this haplotype = 0.15 = 0.31 = 0.23 = 0.31 causal effect. When each SNP is tested for
(T = 0.2 + 0.4 + 0.3 + 0.4 = 1.3) association with the disease, all will show
an association because each is present on
proportion of cases haplotype A, but it will be strongest for the
having a given SNP 0.46 0.15 0.38 true causal SNP.
518 Chapter 16: Identifying Human Disease Genes and Susceptibility Factors
doubly so for the more subtle functional changes that are suspected of causing
susceptibility to complex diseases.
In a few cases, such investigations have convincingly explained the associa-
tion of a variant with disease susceptibility or resistance. A likely example comes
from the genomewide association study of Crohn disease described in the case
studies. The p.T300A variant in the ATG16L1 gene is an amino acid change in a
protein that has convincing functional links to the disease, and variation at this
amino acid accounts for the whole of the observed association.
0
128.10 128.20 128.30 128.40 128.50 128.60 128.70
position on 8q24 (Mb)
identified, and for most of these new loci plausible candidate genes could be sug-
gested. The odds ratios for the susceptibility alleles were low, mostly in the range
1.1–1.5 (an odds ratio of 1 means that the factor has no effect on risk). The studies
demonstrate that it is possible to identify weak susceptibility factors while leav-
ing open the question of whether doing so is worth the trouble.
was autosomal recessive, and families were generally small in the countries where
it was prevalent. Thus, mapping relied on combining large numbers of affected
sib pairs and hoping that there was no locus heterogeneity. Frustratingly, the first
linkage had been to the locus encoding the enzyme paraoxonase—whose chro-
mosomal location was unknown. This was rectified when paraoxonase, and
hence CF, were localized to chromosome 7q31.
Unlike with Duchenne muscular dystrophy, there were no patients with CF
who had appropriately located chromosomal breakpoints to speed the discovery
process. Instead, years of grinding effort were needed to clone as much as possi-
ble of the DNA from the candidate region. This became an intensely competitive
race between three major groups. The winners, a US–Canadian group led by Lap-
Chee Tsui, used 12 different genomic libraries, including one made from a
human–hamster hybrid cell that contained only human chromosome 7. A major
problem was to build up a clone contig across the candidate region. Chromosome
jumping was used to bypass the size limitation of cosmids, which have a maxi-
mum cloning capacity of 45 kb. This was an early version of the paired-end
mapping technique that is now used to scan the human genome for struct-
ural variants (see Figure 13.6). In this early manifestation, DNA fragments of
80–130 kb from a partial MboI digest of genomic DNA were recovered from pre-
parative pulsed-field gels. The fragments were mixed in very dilute solution with
an excess of a short marker sequence that had compatible sticky ends; they were
then exposed to DNA ligase. The hoped-for ligation products were circles of
80–130 kb of genomic DNA with the marker fragment located at the position
where the ends joined. Circles were fragmented with a restriction enzyme and
the fragments were cloned. The desired clones contained the marker and a known
7q31 sequence, joined to an unknown sequence, which was hoped to come
from a position in 7q31 that was 80–130 kb away from the known sequence
(Figure 16.19).
genomic DNA
A restriction map was constructed across the region and the available genomic
clones were hybridized to zoo blots to find genomic sequences that were con-
served across species. Such sequences were putative gene sequences. Segments
that showed conservation across species were used to screen cDNA libraries
made from tissues that were affected in CF. In addition, a search was made for the
CpG islands that often mark the 5¢ ends of genes, and clones were sequenced to
look for open reading frames. Eventually, and with great difficulty, a 6.5 kb cDNA
clone was isolated that led to the characterization of the CF transmembrane
receptor (CFTR) gene. The original 1989 paper is well worth reading to get a feel
for just how heroic these early efforts were.
C CH NH
HC O
formylglycine
mass spectrometry. This allowed them to identify a partial amino acid sequence.
Searching genetic databases for sequences that could encode their peptides
identified a single human cDNA. This corresponded to a gene they named SUMF1
on chromosome 3p26. They then sequenced the SUMF1 gene in a series of
patients with MSD and demonstrated mutations (Figure 16.21).
Meanwhile, Andrea Ballabio’s group in Naples took a different approach.
Ballabio had long been interested in sulfatases as a result of early positional clon-
ing work in which he had identified four sulfatase genes on the short arm of the X
chromosome. He used a purely genetic approach, microcell-mediated chromo-
some transfer, to identify the MSD gene. This technique had had an important
role in physical mapping during the earlier phases of the Human Genome Project.
The starting material is a set of human–mouse somatic cell hybrids each contain-
ing a single human chromosome tagged with a dominant selectable marker (see
Chapter 6, p. 182). Panels of such cells have been established as publicly available
resources (US National Institute of General Medical Sciences Human/Rodent
Somatic Cell Hybrid Mapping Panel #2, http://ccr.coriell.org/Sections/
Collections/NIGMS/Map02.aspx?PgId=496).
The monochromosomal hybrid cells were subjected to prolonged mitotic
arrest by continued exposure to cytochalasin B, an inhibitor of mitotic spindle
formation. This causes the chromosome content to become partitioned into dis-
crete subnuclear packets (micronuclei). The micronuclei can be physically iso- extract of bovine testis
lated from the cells by centrifugation in the presence of cytochalasin B, resulting
chromatographic
in the formation of microcells, which are particles consisting of a single micronu- separation,
cleus and a thin rim of cytoplasm, surrounded by an intact plasma membrane. guided by
Microcells, each containing a single known human chromosome, were fused functional assay
with cells from a patient with MSD, which were then assayed to see whether they
had recovered any sulfatase activity. These difficult experiments showed that highly enriched enzyme preparation
chromosome 3 could restore the activity. mass spectrometry
To narrow down the candidate region, the chromosome 3 monochromosomal
hybrid cells were irradiated before microcells were made. The radiation frag-
partial peptide sequence
mented the chromosomes, so that in the subsequent microcell-mediated chro-
mosome transfer, only random small chromosome fragments were transferred. database searches for
cDNA that could encode
Fusion cells that had recovered sulfatase activity were tested for the presence of these peptides
donor-specific alleles of microsatellite markers from chromosome 3. By this
method, the candidate region was defined as a 2.6 Mb region of 3p26. This con- candidate cDNA
tained seven genes, all of which were sequenced in a panel of patients. Mutations
in one gene were found in every patient (Figure 16.22). design PCR primers, test
for mutations in patients
The two competing groups had identified the same gene by very different
routes. Their results were published back to back in Cell in 2003 (see Further
mutations in SUMF1 identified in patients
Reading).
Figure 16.21 How the Göttingen group
Case study 5: persistence of intestinal lactase identified the gene responsible for
multiple sulfatase deficiency. The relevant
Most people worldwide stop producing intestinal lactase in childhood, after
enzymic activity was purified biochemically;
which they are unable to digest lactose, the sugar found in milk. Drinking fresh mass spectrometry was then used to obtain
milk causes discomfort and diarrhea. However, some people, including most partial peptide sequence. A candidate gene
Northern Europeans, continue to produce intestinal lactase, and can therefore was then identified from genetic databases,
tolerate milk as adults. Persistence of intestinal lactase correlates very closely and mutations in this gene were found in
across the world with agricultural practice. Populations that keep dairy animals MSD patients.
524 Chapter 16: Identifying Human Disease Genes and Susceptibility Factors
have high levels of persistence, whereas persistence is rare in those that do not. panel of mouse–human hybrid cells, each
For example, whereas the majority of African adults are lactose-intolerant, the containing the whole mouse genome plus
one specific human chromosome
Beja, Tutsi, and Fulani, who are traditionally pastoralist, are mostly tolerant.
Biochemical and genetic evidence suggested that the main genetic determi- treat with the mitotic
nant of persistence was linked to the lactase (LCT) gene on 2q21, but no relevant inhibitor cytochalasin B,
then select those
change could be found in the coding exons or promoter. Eventually a single C/T microcells that contain
SNP in intron 13 of the unrelated but neighboring gene MCM6 was shown to be a human chromosome
perfectly associated with persistence in Northern Europeans. Functional assays
confirmed that this polymorphism, –13910 C/T, affected transcription of the panel of microcells, each containing just a
lactase gene. In Africans, however, no such association could be demonstrated. single human chromosome
The –13910 T allele that determines persistence in Northern Europeans is rare or fuse each microcell in
absent in lactose-tolerant Africans. Instead, other SNPs in the same intron 13 of turn with cells from
the MCM6 gene are associated with lactase persistence and have been shown to patients with multiple
sulfatase deficiency,
affect levels of transcription of the LCT gene. It is interesting to speculate that, look for one that
had these effects been found in a genomewide association study, and if we were corrects the
biochemical defect
unaware of the function of the LCT gene, MCM6 might have been the gene for
lactase persistence.
human chromosome 3 corrects the defect
Lactose intolerance is the ancestral state, in man and other mammals.
Tolerance has been strongly selected for in pastoralist populations because of the repeat the whole
procedure using mouse–
obvious advantages of being able to use milk, a rich source of energy and nutri- human hybrids that
ents. The selection has resulted in convergent evolution of the trait, with different contain just fragments
sequence variants being selected in different parts of the world to achieve the of human chromosome 3
same result. The haplotypes that carry the persistence alleles are very much
longer than those associated with nonpersistence (Figure 16.23). In Eurasians, a 2.6 Mb fragment from human 3p26
corrects the defect
(A) African G/C-14010 identify genes in
this region
the average fully homozygous tract length in –13910 T/T homozygotes was
1.4 Mb, in comparison with 1900 bp in C/C homozygotes; in Africans lengths
were 1.8 Mb and 1800 bp in –14010 C/C and G/G homozygotes, respectively. The
very long haplotypes are evidence of strong recent selection: there has been little
time for recombination to fragment them.
(A) 2.0
2 4 6 8 10 12 14 16 18 20 22 Y
0.3
0
–0.3
1 3 5 7 9 11 13 15 17 19 21 X
–0.2
chromosome position
58 Mb 59 Mb 60 Mb 61 Mb 62 Mb 63 Mb
(B) cen tel
M3C39325 TOX
DKFZP434F122 CAB RAB2 CHD7 M3C34848 A8PH
AL831990 NSMAF
8DC8P
CYP7A1
(C)
38
ATG TAA
chromo chromo SNF2
helicase
Figure 16.24 Discovering the cause of CHARGE syndrome. (A) Initial detection of a deletion (arrow) in one CHARGE patient by array-CGH. (B) The
candidate region on 8q12 and genes in the region. Combining data from the first deletion case with data from a second patient with a chromosomal
rearrangement identified two candidate regions (red arrows) for the CHARGE gene. The genes in green were screened for mutations. (C) Genomic structure
of the CHD7 gene, showing exons (black rectangles) and functional domains (colored blocks). Mutations found in 10 non-deletion patients are shown as red
dots. [Adapted from Vissers LE, van Ravenswaaij CM, Admiraal R et al. (2004) Nat. Genet. 36, 955–957. With permission from Macmillan Publishers Ltd.]
526 Chapter 16: Identifying Human Disease Genes and Susceptibility Factors
1500 families
complex segregation
analysis
5¢
locus at 17q21
D17S250 (Z = 3.28–5.98
with D17S84)
BRCA1 gene
24 exons 8 cM ( ) BRCA1
1863 amino 17 cM ( )
acids Figure 16.25 How the BRCA1 gene was
positional 214 selected found. Initial segregation analysis of 1500
cloning multi-case families families with multiple cases of breast
D17S588 (international
mutation lod score analysis cancer suggested that some cases might be
collaborative study)
analysis heritable in a Mendelian manner. Linkage
BRCA1 candidate region analysis of 23 of these families identified a
susceptibility locus at 17q21, which led to
3¢
the identification of the BRCA1 gene.
EIGHT EXAMPLES OF DISEASE GENE IDENTIFICATION 527
seemed that BRCA1 might account for 80–90% of families with both breast and
ovarian cancer but a much smaller proportion of families with breast cancer
alone. Male breast cancer was seen mainly in BRCA2 families. Evidence concern-
ing the function of these two genes, and why mutations cause cancer, is discussed
in Chapter 17.
Important though these discoveries were, it would be a mistake to suppose
that we now understand the genetic basis of breast cancer, or even of familial
breast cancer. Mutations at these two loci account for only 20–25% of the familial
risk of breast cancer. A survey by Ford and colleagues of 257 families with four or
more cases of breast cancer suggested that, of those with four or five cases of
female breast cancer, but no ovarian or male breast cancer, 67% probably did not
involve mutations in BRCA1 or BRCA2. Despite intensive research, no third major
gene has been discovered that could explain the remaining familial tendency.
Segregation analysis is consistent with the remaining genetic susceptibility being
polygenic—that is, due to a large number of genes, each having only a modest
effect.
The BRCA1/2 genes were identified by focusing on small numbers of special
families. Identifying the other susceptibility factors requires large-scale case-
control studies. Many candidate genes have been the subject of focused studies
that have identified additional susceptibility factors.
• A specific variant, c.1100delC, in the CHEK2 gene (encoding a protein impor-
tant in repair of DNA damage, and hence a good candidate for involvement in
cancer) was seen in 201 out of 10,890 cases and only 64 out of 9065 controls,
giving an odds ratio of 2.34.
• By sequencing all 62 exons of the ATM gene (described in more detail in
Chapter 17), mutations were found in 12 out of 443 cases but only 2 out of 521
controls. The odds ratio was 2.37. All cases were familial, but they did not have
BRCA1/2 mutations.
• By sequencing all 20 exons of the BRIP1 gene (encoding a protein that inter-
acts with the BRCA1 protein), mutations were found in 9 out of 1212 cases but
only 2 out of 2081 controls, giving an odds ratio of 2.0. Again, the cases were
chosen to be familial but negative for BRCA1/2 mutations.
• By sequencing all 13 exons of the PALB2 gene (encoding a protein that inter-
acts with the BRCA2 protein), mutations were found in 10 out of 923 cases
but none of 1084 controls, suggesting an odds ratio of 2.3 for mutation carri-
408 cases
ers. As before, the cases were familial and had been checked for BRCA1/2 stage 1 400 controls
mutations. 266,722 SNPs
These factors are relevant in the small minority of families in which they are select the 5% most
segregating, but at a population level their effect is small. For example the CHEK2 associated SNPs
c.1100delC variant was seen in only 1.9% of cases. In fact, all these susceptibility
factors together, including BRCA1 and BRCA2, accounted for only about 25% of 3990 cases
stage 2 3916 controls
the familial tendency. A very large genomewide association study attempted to 12,711 SNPs
identify other factors. Figure 16.26 outlines the study design.
SNPs at five novel loci gave convincing evidence of association with breast select the top 30 SNPs
cancer susceptibility. Four of the loci contained genes that, on functional grounds,
were considered plausible candidates (FGFR2, TNRC9, MAP3K1, and LSP1). The 21,860 cases
fifth did not fall close to any likely candidate gene but, interestingly, was close to stage 3 22,578 controls
the region on chromosome 8q24 that has been implicated in a complex pattern 30 SNPs
of susceptibility to prostate cancer (see Figure 16.17). The odds ratios for the five
loci were 1.26 (FGFR2), 1.11 (TNRC9), 1.13 (MAP3K1), 1.07 (LSP1), and 1.08 (the
anonymous 8q24 SNP). All the putative risk alleles were common in the popula-
tion so, despite the low odds ratios, these factors may have an appreciable role in 5 novel susceptibility loci
the overall incidence of breast cancer. Whether knowing one’s genotype for these 4 contained plausible candidate genes
<0.00001 13 0.96
0.00001–0.0001 12 7.0
0.0001–0.001 88 53
Data are from stages 1 and 2 of the study outlined in Figure 16.26. aObserved data after certain
statistical corrections. bNumber expected on the null hypothesis of no true association.
Linkage and association studies have suggested locations for a long and growing list of
susceptibility factors. In many cases a candidate gene has been identified. aRegions supported by
suggestive linkage data (lod scores ≥ 2.2). Other regions were defined by association data, although
in some cases there was also weak linkage data.
CARD15 mutations account for only part of the susceptibility to Crohn dis-
ease. As usual in such cases, many other candidate regions were proposed in
these early linkage studies, two of which are noteworthy. A region on 5q31 (IBD5)
showed linkage and association in several studies, with an odds ratio of 1.5. The
candidate region contains several cytokine genes that are plausible candidates,
but attempts to identify the precise variant causing susceptibility have not so far
been successful. A coding variant in the IL23R gene at chromosome 1p31
(p.R381Q) is strongly protective, with an odds ratio of 0.26—but this variant
accounts for only a small proportion of overall susceptibility, because the
arginine-381 allele that confers susceptibility is present in 93% of controls as well
as in 98.1% of patients.
530 Chapter 16: Identifying Human Disease Genes and Susceptibility Factors
TABLE 16.3 STATISTICAL ANALYSIS OF THE DATA FROM A GENOMEWIDE STUDY OF CROHN DISEASE
Chromosome Initial GWA study Replication Replication cohort Overall Gene
no. cohort 1: TDT result 2: allele frequency
Allele frequency p (transmissions/non- (cases/controls) Odds p
(cases/controls) transmissions) ratio
The data analyzed here are also shown in Figure 16.27. The frequency of the minor SNP allele in cases and controls is shown for the initial scan and
second replication set. TDT data show the transmission/nontransmission of the minor SNP allele from parent to affected offspring. GWA, genomewide
associations; N.D., not done. aData from an earlier report by the same group. bSNPs showing significant replication in the TDT series.
HOW WELL HAS DISEASE GENE IDENTIFICATION WORKED? 531
0
1
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
X
chromosome
This work has been chosen as a good example of the progress that has been
made over the past few years. The list of susceptibility loci in Table 16.2 is long
and still growing. The work has demonstrated that determinants of susceptibility
to the two types of inflammatory bowel disease, Crohn disease and ulcerative
colitis, are mostly different, and identifying the main Crohn disease susceptibility
factors has produced a better understanding of the pathological process.
the only way to identify causative mutations. A pilot study by Tarpey and col-
leagues (see Further Reading) attempted to do this for X-linked mental retarda-
tion (XLMR). After excluding cases with mutations in known XLMR genes, 208
families were chosen for study, and an attempt was made to sequence every exon
on the X chromosome in an affected male from each family. The study identified
nine novel disease genes, but in 50–75% of families (depending on the degree of
certainty required) the cause remained unknown. Worryingly, in several genes
the study identified clearly inactivating mutations in normal male controls.
Extending this to the much larger number of autosomal disease genes is clearly a
formidable task.
that modern living conditions (dry air and frequent washing) aggravate the skin
barrier defects in filaggrin-deficient babies. These babies then first encounter
antigens through a defective skin barrier rather than through the normal intesti-
nal route, maybe during a short sensitive period.
CONCLUSION
There are many ways in which a disease gene may be identified. Knowledge of
the protein product or of a related gene (in humans or another species) can lead
directly to identification of the gene. More often, first the chromosomal location
of the gene is defined, and then a search of that region is made for a gene that
carries mutations in patients but not in controls (positional cloning). Sometimes
chromosomal abnormalities can suggest a location. Translocations, inversions,
or deletions may cause clinical problems by disrupting or deleting genes. If two
unrelated patients share a chromosomal breakpoint and also a particular clinical
feature, maybe a gene responsible for that feature is located at or near the break-
point. More often, however, disease genes are localized by linkage analysis in a
collection of families where the disease is segregating, as described in Chapters
14 and 15.
Once a candidate location has been defined, it is necessary to identify all the
genes in the region and prioritize them for mutation analysis. The genome data-
bases are a powerful but not infallible tool for identifying candidate genes. It may
be necessary to perform additional searches or laboratory experiments to iden-
tify all exons of a gene and other transcripts. Genes are prioritized for testing on
the basis of their likely function and expression pattern, and also on practical
considerations about the ease of testing. As sequencing has become faster and
cheaper, it has become increasingly attractive simply to sequence all exons across
the candidate region, without trying too hard to decide priorities.
FURTHER READING 535
This approach has been outstandingly successful at identifying the genes and
mutations underlying Mendelian conditions, but less so for complex diseases. As
described in Chapter 15, linkage analysis, even using non-parametric methods,
has had very limited success with complex diseases, probably because of the
great heterogeneity of causes. Association studies are much more successful.
Genomewide association studies of many different complex diseases have used
high-resolution SNP genotyping in large case-control studies to identify numer-
ous SNPs that are associated with either susceptibility or resistance to the dis-
ease. Generally these SNPs will not directly affect susceptibility. Instead, they will
be in linkage disequilibrium with the functional variant. Identifying the true
functional variants is extremely challenging.
The question will remain: what is the practical value of having a list of suscep-
tibility factors that confer relative risks of 1.2 or less? Funding agencies and phar-
maceutical companies have invested heavily in research in these areas in the
belief, or at least the hope, that such knowledge would revolutionize medicine. In
Chapter 19 we ask how far these hopes look like being realized. Meanwhile, per-
haps the area of medicine in which analyzing genomewide changes using micro-
arrays is closest to clinical application is oncology. We explore this area in the
next chapter.
FURTHER READING
Positional cloning ways in which Drosophila can be used to help understand
Church DM, Goodstadt L, Hillier LW et al. (2009) Lineage-specific human disease.]
biology revealed by a finished genome assembly of the Cuzin F, Grandjean V & Rassoulzadegan M (2008) Inherited
mouse. PLoS Biol. 7(5), e1000112. variation at the epigenetic level: paramutation from the
Comprehensive Mouse Knockout Consortium (2004) The plant to the mouse. Curr. Opin. Genet. Dev. 18, 193–196. [A
knockout mouse project. Nat. Genet. 36, 921–924. review of paramutation by the group that described the Kit
Database of Genomic Variants. http://projects.tcag.ca/variation/ paramutation in the mouse.]
[A database of copy-number and other structural variants Jirtle RL & Skinner MK (2007) Environmental epigenomics and
found in apparently normal subjects.] disease susceptibility. Nat. Rev. Genet. 8, 253–262. [A review
European Mouse Mutagenesis Consortium (2004) The European of possible cases in which epigenetic effects allow acquired
dimension for the mouse genome mutagenesis program. characters to be heritable.]
Nat. Genet. 36, 925–927.
Peters LL, Robledo RF, Bult CJ et al. (2007) The mouse as a
Identifying causal variants from association studies
resource for human biology: a resource guide for complex Balding DJ (2006) A tutorial on statistical methods for population
trait analysis. Nat. Rev. Genet. 8, 58–69. [An impressive association studies. Nat. Rev. Genet. 7, 781–790. [A detailed
detailing of the very extensive resources in mouse genetics discussion of the statistical methods used in association
that can help in the identification of human disease factors.] studies.]
Steinmetz LM, Scharfe C, Deutschbauer AM et al. (2002) Horikawa Y, Oda N, Cox NJ et al. (2000) Genetic variation in the
Systematic screen for human disease genes in yeast. Nat. gene encoding calpain-10 is associated with type 2 diabetes
Genet. 31, 400–404. mellitus. Nat. Genet. 26, 163–175. [A heroic early attempt to
identify the precise variant causing susceptibility.]
The value of patients with chromosomal Wellcome Trust Case Control Consortium (2007) Genome-wide
abnormalities association study of 14,000 cases of seven common diseases
and 3,000 shared controls. Nature 447, 661–678. [A good
Kurotaki N, Imaizumi K, Harada N et al. (2002) Haploinsufficiency
overview of the state of the art in genomewide association
of NSD1 causes Sotos syndrome. Nat. Genet. 30, 365–366.
studies.]
Vissers LELM, Veltman JA, Geurts van Kessel A & Brunner HG
Witte JS (2007) Multiple prostate cancer risk variants on 8q24.
(2005) Identification of disease genes by whole genome CGH
Nat. Genet. 39, 579–580. [A commentary on three papers in
arrays. Hum. Mol. Genet. 14, R215–R223.
this number of the journal that report association studies of
Position-independent routes to identifying disease susceptibility to prostate cancer.]
genes Eight examples of disease gene identification
Koob MD, Benzow KA, Bird TD et al. (1998) Rapid cloning of
expanded trinucleotide repeat sequences from genomic Duchenne muscular dystrophy
DNA. Nat. Genet. 18, 72–75. [Use of the repeat expansion Koenig M, Hoffman EP, Bertelson CJ et al. (1987) Complete
detection method for position-independent identification of cloning of the Duchenne muscular dystrophy (DMD) cDNA
genes containing expanded repeats.] and preliminary genomic organization of the DMD gene
in normal and affected individuals. Cell 50, 509–517. [The
Testing positional candidate genes original report of the cloning of this gene.]
Bier E (2005) Drosophila, the golden bug, emerges as a tool for Royer-Pokora B, Kunkel LM, Monaco AP et al. (1985) Cloning
human genetics. Nat. Rev. Genet. 6, 9–23. [An overview of the the gene for an inherited human disorder—chronic
536 Chapter 16: Identifying Human Disease Genes and Susceptibility Factors
granulomatous disease—on the basis of its chromosomal study mentioned on p. 528. This paper is well worth reading
location. Nature 322, 32–38. [The first gene to be positionally for the picture it gives of the methodology.]
cloned.] Klionsky DJ (2009) Crohn disease, autophagy and the Paneth cell.
Worton RG & Thompson MW (1988) Genetics of Duchenne N. Engl. J. Med. 360, 1785–1786. [A succinct account of our
muscular dystrophy. Annu. Rev. Genet. 22, 601–629. [An understanding of the pathogenesis.]
excellent review, giving a good feel for the obstacles Ogura Y, Bonen DK, Inohara N et al. (2001) A frameshift mutation
overcome in cloning the dystrophin gene.] in NOD2 associated with susceptibility to Crohn disease.
Nature 411, 603–606. [One of the three studies mentioned on
Cystic fibrosis p. 528.]
Rommens JM, Januzzi MC, Kerem B-S et al. (1989) Identification Rioux JD, Xavier RJ, Taylor KD et al. (2007) Genome-wide
of the cystic fibrosis gene: chromosome walking and association study identifies new susceptibility loci for Crohn
jumping. Science 245, 1059–1065. [A humbling illustration of disease and implicates autophagy in disease pathogenesis.
the huge effort needed in the pioneering days of positional Nat. Genet. 39, 596–604. [The study from which Figure 16.27
cloning.] and Table 16.3 are taken.]
Cancer Genetics
17
KEY CONCEPTS
• Cancer is the result of somatic cells acquiring genetic changes that confer on them six general
features: (1) independence of external growth signals, (2) insensitivity to external anti-growth
signals, (3) the ability to avoid apoptosis, (4) the ability to replicate indefinitely, (5) the ability of a
mass of such cells to trigger angiogenesis and vascularize, and (6) the ability to invade tissues and
establish secondary tumors.
• These features are acquired through a normal Darwinian process of natural selection acting on
random mutations. The microevolution occurs in several stages, with each successive change
giving an extra selective advantage to descendants of a cell.
• Highly evolved and sophisticated defense mechanisms protect the body against the proliferation
of such mutant cells, but tumor cells have mutations that disable these defenses.
• Stem cells, unlike most somatic cells, have the ability to replicate indefinitely, and so they already
possess one of the features of cancer cells. Thus, stem cells are likely candidates to evolve into
tumor cells.
• Oncogenes normally act to promote cell division, but a complex regulatory network limits this
activity. In tumor cells, one copy of an oncogene is often abnormally activated—through point
mutations, copy-number amplification, or chromosomal rearrangements—so that it escapes
regulation.
• Chromosomal rearrangements in tumor cells often create novel chimeric oncogenes but may
alternatively upregulate the expression of an oncogene by placing it under the influence of a
powerful enhancer.
• The normal role of tumor suppressor genes is to limit cell division. In tumor cells, both copies
of a tumor suppressor gene are often inactivated by deletions, point mutations, or methylation
of the promoter.
• Many tumor suppressor genes have been identified through investigation of familial cancer
predisposition syndromes. In these syndromes, individuals inherit a mutation that inactivates
one allele of a tumor suppressor gene.
• Oncogenes and tumor suppressor genes normally function in the cell signaling that controls the
cell cycle, or in the response to DNA damage. Understanding these processes is central to
understanding what goes wrong in cancer.
• Genomic instability is a normal feature of tumor cells. Because of this instability, tumors contain
large populations of cells carrying a great variety of mutations, on which natural selection can
act. Driver mutations contribute to tumor development and are subject to positive selection;
passenger mutations are chance by-products of the genomic instability of most tumor cells.
• Most instability is seen at the chromosomal level. However, some tumors are cytogenetically
normal but have a high level of DNA replication errors.
• New sequencing technology allows the totality of acquired genetic changes in tumor cells to be
cataloged. The results of these studies emphasize the individuality and large number of such
changes in tumor cells.
• Histological and molecular profiling of a tumor can provide important prognostic information
and a guide to treatment. Drugs can target specific acquired genetic changes. Genomewide
studies using expression arrays are an important tool for this.
• Tumorigenesis is best understood by thinking in terms of altered pathways rather than individual
mutated genes.
538 Chapter 17: Cancer Genetics
oral
mucosa
Figure 17.1 Stages in the development of a carcinoma. Three histological sections of oral mucosa, stained with hematoxylin and
eosin, showing stages in the development of oral cancer. (A) Normal epithelium. (B) Dysplastic epithelium, which is a potentially
premalignant change. The epithelium shows disordered growth and maturation, abnormal cells, and an increased mitotic index
(the proportion of cells undergoing mitosis). (C) Cancer arising from the surface epithelium and invading the underlying connective
tissues. The islands of the tumor (carcinoma) show disordered differentiation, abnormal cells, and increased and atypical mitoses.
Pathologists use such changes in tissue architecture to identify and grade tumors. (Courtesy of Nalin Thakker, University of
Manchester.)
Cancer—a condition in which cells divide without control—is not so much a dis-
ease as the natural end state of any multicellular organism. We are all familiar
with the basic Darwinian idea that a population of organisms that show heredi-
tary variation in reproductive capacity will evolve by natural selection. Genotypes
that reproduce faster or more extensively will come to dominate later genera-
tions, only to be supplanted, in turn, by yet more efficient reproducers. Exactly
the same applies to the population of cells that constitutes a multicellular organ-
ism. The determining factor can be an increased birth rate or a decreased death
rate. Cellular birth and death are under genetic control, and if somatic mutation
creates a variant that proliferates faster, the mutant clone will tend to take over
the organism. Cancers are the result of a series of somatic mutations with, in
some cases, also an inherited predisposition. Thus, cancer can be seen as a natu-
ral evolutionary process.
Tumors can be classified according to the tissue of origin; thus, carcinomas
are derived from epithelial cells, sarcomas from bone or connective tissue, and
leukemias and lymphomas from blood cell precursors. Solid tumors can be con-
sidered as organs, in that they consist of a variety of cell types and are thought to
be maintained by a small population of (cancer) stem cells. An important distinc-
tion is between benign (noninvasive) and malignant (invasive) tumors.
Pathologists classify tumors based on the histology as seen under the microscope
(Figure 17.1). Genetic tests for specific chromosomal rearrangements or genom-
ewide expression profiling (discussed later in this chapter) allow further refine-
ments of the classification. These classifications are important for deciding prog-
nosis and management, but they do not explain how the cancer evolved. The aim
of cancer genetics is to understand the multi-step mutational and selective path-
way that allowed a normal somatic cell (but maybe a stem cell) to found a popu-
lation of proliferating and invasive cancer cells. As the key molecular events are
revealed, new prognostic indicators become available to the pathologist. It is
hoped that some of these events will also present new therapeutic targets to the
oncologist.
THE EVOLUTION OF CANCER 539
Turning a normal epithelial cell into an invasive cancer cell requires perhaps six specific mutations
in the one cell. It would seem extremely unlikely that any one cell should suffer so many mutations
(which is why most of us are alive). If a typical mutation rate is 10–7 per gene per cell generation,
the probability of this happening to any one of the 1013 cells in a person is 1013 ¥ 10–42, or 1 in
1029. Cancer nevertheless happens because of a combination of two mechanisms:
• Some mutations enhance cell proliferation, creating an expanded target population of cells
for the next mutation (Figure 17.2). This may require a combination of two or more mutations.
• Some mutations affect the stability of the entire genome, at either the DNA or the
chromosomal level, increasing the overall mutation rate. Malignant tumor cells usually
advertise their genomic instability by their abnormal karyotypes.
Because cancers depend on these two mechanisms, they develop in stages, starting with
tissue hyperplasia or benign growths. Within a stage, successive random mutations generate an
increasingly diverse cell population until eventually, by chance, one cell acquires a change or
combination of changes that gives it a growth advantage.
540 Chapter 17: Cancer Genetics
mutation can increase the likelihood that a cell will pick up subsequent muta- mutation no. 1
tions, either by conferring a growth advantage (Figure 17.2) or by inducing
genomic instability. Accumulating all these mutations nevertheless takes time, so
that cancer is mainly a disease of post-reproductive life, when there is little selec-
tive pressure to improve the defenses still further. selective growth of clone
Cell types differ in the extent to which their normal capabilities approach with mutation no. 1
these tumor cell capabilities—for example, some cell types divide rapidly, some
are relatively resistant to apoptosis, and so on. Tumors arise most easily from
cells in which that approach is closest. The existence of populations of rapidly mutation no. 2
dividing and relatively undifferentiated cells in fetuses and infants explains the
special cancers seen in young children. Stem cells are thought to be important as
tumor progenitors, because they already possess the capacity for indefinite pro-
liferation. There is controversy over how far all tumors arise from mutated stem
cells, but there is agreement that tumor precursor cells have stem cell-like prop- selective growth of clone
erties, whether these are innate or are acquired by mutation. with mutations 1 + 2
The genes that are the targets of these mutations can be divided into two
broad categories, although, as always in biology, these are more tools for thinking
about cancer than watertight exclusive classifications. mutation no. 3
• Oncogenes are genes whose normal activity promotes cell proliferation. Gain-
of-function mutations in tumor cells create forms that are excessively or inap-
propriately active. A single mutant allele may affect the behavior of a cell. The
nonmutant versions are properly called proto-oncogenes.
• Tumor suppressor genes are genes whose products act to limit normal cell
proliferation. Mutant versions in cancer cells have lost their function. Some continuing evolution by
tumor suppressor gene products prevent inappropriate cell cycle progres- natural selection
sion, some steer deviant cells into apoptosis, and others keep the genome sta-
ble and mutation rates low by ensuring accurate replication, repair, and seg-
regation of the cell’s DNA. Both alleles of a tumor suppressor gene must be
inactivated to change the behavior of a cell.
By analogy with a bus, one can picture the oncogenes as the accelerator and
the tumor suppressor genes as the brake. Jamming the accelerator on (a domi-
malignant tumor
nant gain of function of an oncogene) or having all the brakes fail (a recessive loss
of function of a tumor suppressor gene) will make the bus run out of control. Figure 17.2 Multi-stage evolution of
Traditionally, both categories have been seen as protein-coding genes, but simi- cancer. Each successive mutation gives the
lar reasoning could be applied to other genetic control elements—in particular cell a growth advantage, so that it forms an
expanded clone, thus presenting a larger
microRNAs (see Chapter 9, p. 283).
target for the next mutation.
17.2 ONCOGENES
Oncogenes were discovered in the 1960s when it was realized that some animal
cancers (especially leukemias and lymphomas) were caused by viruses. Some of
the viruses had relatively complicated DNA genomes (SV40 virus and papilloma
viruses, for example), but others were acute transforming retroviruses that had
very simple RNA genomes. The genome of a standard retrovirus has just three
transcription units: gag, encoding internal proteins; pol, encoding a reverse tran-
scriptase and other proteins; and env, encoding envelope proteins. Post-
transcriptional processing produces several proteins from single transcripts.
Great excitement ensued when it was discovered that the ability of acute trans-
forming retroviruses to transform cells in culture (that is, to change their growth
pattern to one resembling that of tumor cells) was entirely due to their posses-
sion of one extra gene, the oncogene (Figure 17.3). For a short time, some enthu- Figure 17.3 An acute transforming
siasts hoped that the whole of cancer might be explained by infection with viruses retrovirus. (A) The genome of a standard
carrying oncogenes, and that once these had been identified anti-cancer vac- retrovirus has three transcription units: gag,
cines could be developed. encoding internal proteins; pol, encoding
proteins needed for replication; and env,
encoding the proteins needed for packaging
(A) 8500 nt RNA genome new virus particles. There are also long
terminal repeated sequences (LTR). (B) In a
LTR gag pol env LTR random processing error, part of the normal
retroviral genome has been replaced by a
(B) cellular oncogene. The virus is no longer
competent to replicate, but it can ferry the
LTR oncogene LTR oncogene into a cell. nt, nucleotide.
ONCOGENES 541
Researchers soon discovered that the viral oncogenes were copies of normal
cellular genes, proto-oncogenes, that had become accidentally incorporated into
the retroviral particles. The viral versions were somehow activated, enabling
them to transform infected cells.
Most human cancers do not depend on viruses, but nevertheless their resi-
dent proto-oncogenes have become activated. In the late 1970s an assay was
developed to detect activated cellular oncogenes. NIH 3T3 mouse fibroblast cells
were transfected with random DNA fragments from human tumors. Some of the
cells were transformed, and the human DNA responsible could be identified by
constructing a genomic library in bacteriophage from fragments of the DNA of
the transformed 3T3 cells and then screening for recombinant phage that con-
tained the human-specific Alu repeat. The oncogenes identified in the transform-
ing fragments included many that were already known from the viral studies.
DNABINDING PROTEINS
D-type cyclins:
encodes a virus-specific D-type cyclin, which is an independent gene, not an activated version of one of the human cyclin D genes. However, oncogenic
activated versions of all three human D-type cyclins, the result of somatic mutations, have been identified in certain leukemias.
542 Chapter 17: Cancer Genetics
MYCN neuroblastoma
Chromosomal rearrangement BCR–ABL1 chronic myeloid leukemia (see also Table 17.3)
creating a novel chimeric gene
COOH
NH2
GTP
ABL1 gene 5¢ 3¢
BCR gene 5¢ 3¢
translocation joins the 3¢ part of the ABL1 genomic sequence onto the 5¢ part of
the BCR (breakpoint cluster region) gene on chromosome 22, creating a novel
fusion gene. This chimeric gene is expressed to produce a tyrosine kinase related
to the ABL1 product but with abnormal transforming properties (Figure 17.6).
Many other tumor-specific recurrent rearrangements that produce chimeric
oncogenes have now been recognized (Table 17.3). This mechanism is seen in
15–25% of leukemias, lymphomas, and sarcomas, but has been reported in only
1% or less of the common solid epithelial tumors. The apparent scarcity of gene
fusions in epithelial tumors may, however, be deceptive, and be due to technical
difficulties. A review by Mitelman and colleagues (see Further Reading) gives a
valuable discussion and many examples. All known examples are cataloged in
the Mitelman database of chromosomal aberrations in cancer (http://cgap.nci.
nih.gov/Chromosomes/). At present, 358 different fusions have been recognized,
involving 337 different genes. Many genes are promiscuous—that is, they are
found in fusions with various different partners. The MLL gene at 11q23 is the
most extreme example, having been noted in fusions with more than 40 different
partners. Fusions can be produced not only by translocations but also by inver-
sions or, occasionally, by deletions that remove sequences that normally separate
two genes.
Clinically, the specific gene fusions present in a patient’s leukemic cells have
an important bearing on the management and prognosis. They can be identified
by a targeted FISH assay. For example, to check for the 9;22 translocation, differ-
ently colored FISH probes for the BCR and ABL1 genes can be hybridized to an
interphase cell. If the translocation is present, there will be one BCR signal, one
ABL signal, and two fusion signals from the reciprocal products of the transloca-
tion (see Figure 2.17).
Activation by translocation into a transcriptionally active chromatin
region
Burkitt lymphoma is especially common in malarial regions of Central Africa and
Papua New Guinea. Mosquitoes and Epstein–Barr virus are believed to have some
role in the etiology, but activation of the MYC oncogene is a central event. A char-
acteristic chromosomal translocation, t(8;14)(q24;q32), is seen in 75–85% of
patients (Figure 17.7). The remainder have t(2;8)(p12;q24) or t(8;22)(q24;q11).
Each of these translocations juxtaposes the MYC oncogene (normally located at
ONCOGENES 545
Acute promyelocytic leukemia t(15;17)(q22;q12) PML–RARA transcription factor + retinoic acid receptor
Note how the same gene may be involved in several different rearrangements. For a more complete list see http://www.sanger.ac.uk/genetics/CGP
/Census. ALL, acute lymphoblastoid leukemia; AML, acute myeloid leukemia; CML, chronic myeloid leukemia.
8q24) close to an immunoglobulin (IG) locus. This may be IGH at 14q32, IGK at
2p12, or IGL at 22q11. Unlike the translocations shown in Table 17.3, these trans-
locations do not create novel chimeric genes. Instead, they bring the oncogene
under the influence of regulatory elements that normally ensure high expression
of the immunoglobulin genes in antibody-producing B cells. In the 8;14 translo-
cation, the MYC and IGH genes are in opposite transcriptional orientations, head
to head. Often, depending on the precise breakpoint, exon 1 of the MYC gene
(which is noncoding) is not included in the translocated material. Deprived of its
normal upstream controls and placed in an active chromatin domain, MYC is
expressed at an inappropriately high level. Between 25% and 65% of all B-cell
malignancies involve the activation of one or another oncogene by an
8 14 der (8) der (14)
References to the genes and diseases may be found in OMIM under the number cited. A longer list is
given in Table 1 of Futreal PA et al. (2001) Nature 409, 850–852.
Some genes frequently lose the function of one allele in tumors, but the
retained allele seems fully functional. Sometimes this is because the second allele
is inactivated by methylation, which is not detected by some of the experimental
methods used, but some cases do seem to be genuine one-hit events. It is not
unreasonable to postulate that there are tumor suppressor genes for which hap-
loinsufficiency is enough to provide a growth advantage.
More unusually, there is at least one case in which three hits are seen. In famil-
ial adenomatous polyposis (FAP or APC; OMIM 175100; described later in this
chapter) numerous premalignant polyps (adenomas) develop in the colon.
Certain inherited germ-line mutations are weak and confer an attenuated phe-
notype with fewer polyps. Adenomas from such patients often have two somatic
mutations in addition to the inherited mutation. One somatic mutation inacti-
vates the wild-type allele. This gives the cells a growth advantage, but they gain a
further advantage from a chromosomal deletion that eliminates the partly func-
tional germ-line APC allele.
LOH analysis has been a major industry in cancer research during the 1990s, stromal tumor apparent
contamination cells LOH
but it is fair to ask whether the results have justified the effort. Advanced cancer
cells may show LOH at as many as one-quarter of all loci, so large samples are
needed to tease out the specific changes from the general background. Array-
CGH and other genomewide approaches have supplanted studies with single
ψTSG1
markers. The results suggest the presence of a remarkably large number of tumor
suppressor genes, yet attempts to follow up such studies by positional cloning TSG
have had quite a low success rate. ψTSG2
The tumor DNA used in LOH studies always comes from a collection of cells,
and these may well not be genetically homogeneous. Figure 17.11 illustrates one
pitfall that arises from the inevitable contamination of primary tumor material
by stromal (normal) cells. It is better to speak of a quantitative allelic imbalance
rather than an all-or-nothing loss of heterozygosity. Cell lines can be used to cir-
Figure 17.11 A pitfall in interpreting
cumvent the problem of heterogeneity, but tumor cell lines normally have many
LOH data. The true event in this tumor is
more chromosome abnormalities than primary tumor cells. A more general homozygous deletion of the TSG gene. In
problem is that there is almost never any information on the karyotypes of the the region of homozygous deletion, all the
tumors used in LOH studies—and LOH is a product of the chromosomal instabil- PCR product is derived from contaminating
ity and deficient repair of double-strand DNA breaks that are so characteristic of stromal material (green lines), which is
tumor cells. Maybe some of the losses observed are by-products of specific pat- genetically normal. Thus, LOH data show
terns of chromosomal instability rather than selection for loss of a tumor sup- retention of heterozygosity at the TSG locus
pressor gene. (pale pink). In the flanking regions one allele
is present in the tumor cells, so the PCR
product shows LOH (dark pink). The pattern
Tumor suppressor genes are often silenced epigenetically by suggests the existence of two spurious
methylation tumor suppressor loci, yTSG1 and yTSG2.
The true situation would be revealed by
Tumor suppressor genes may be silenced by deletion (reflected in loss of hetero-
fluorescence in situ hybridization (FISH) or
zygosity) or by point mutations, but a very common third mechanism is methyla- by immunohistochemical testing for the
tion of the promoter. This mechanism is seen with genes whose promoters con- TSG gene product.
tain CpG islands. Most CpG islands in normal cells are unmethylated (see Chapter
11 for a discussion of exceptions to this). In tumor cells, we frequently see meth-
ylation of normally unmethylated CpG islands in the promoters of certain genes.
The effect is to prevent expression of the gene. Specific genes differ in their sus-
ceptibility to this type of silencing (Figure 17.12). Some tumor suppressor genes
(for example MSH2) are often silenced by mutation but never by methylation. For
others (for example MLH1), methylation occurs as a common alternative to point
mutation, whereas in some (for example RASSF1A at 3p21, and HIC1 at 17p13.3),
methylation is the only known mechanism causing tumor-specific loss of
function.
TP73
MSH2
VHL
MLH1
RASSF1A
FHIT P15
P14 P16
APC
GSTP1
SMARCA3 RB1
1 2 3 4 5 6 7 8 9 10 11 12 13
SSI1 HIC1
TP53 STK11
SMAD4 NF2
NF1 SMARCB1
CDH1
BRCA1 TIMP3
14 15 16 17 18 19 20 21 22 X Y
Figure 17.12 Ways of silencing tumor suppressor genes. Genes shown in green are silenced only by mutation, and those in red only by methylation,
whereas those in purple may be silenced by either mechanism. [From Jones PA & Baylin SB (2002) Nat. Rev. Genet. 3, 415–428. With permission from
Macmillan Publishers Ltd.]
550 Chapter 17: Cancer Genetics
Cdk4/6
DNA damage
p16INK4A
cyclin D response
irreparable damage:
pRb Mdm2 p53
trigger apoptosis
E2F p14ARF
reparable damage:
p21WAF1/CIP1 pause for repair
p14ARF p16INK4A
transcript transcript Figure 17.16 The two products of the
CDKN2A gene. This gene (also known as
MTS and INK4A) encodes two completely
unrelated proteins. p16INK4A is translated
from exons 1a, 2, and 3, and p14ARF from
exons 1b 1a 2 3 exons 1b, 2, and 3—but with a different
reading frame of exons 2 and 3. The two
gene products are active in the pRb and p53
p16 coding p14 coding arms of cell cycle control, respectively, as
untranslated shown in Figure 17.14.
sequence sequence
INSTABILITY OF THE GENOME 553
6 7 8 9 10 11 12
13 14 15 16 17 18
19 20 21 22 X Y
phosphorylation P
G1 S
of damage. Any cell that is unable to repair DNA damage should be eliminated by
apoptosis, but in a whole population of such cells there must be selective pres-
sure for variants that can escape death. The different types of damage and differ-
ent repair mechanisms were described in Chapter 13 and are reviewed by
Hoeijmakers (see Further Reading). Defects in each are associated with specific
forms of cancer:
• Nucleotide excision repair is defective in several cancer-prone syndromes,
particularly the various forms of xeroderma pigmentosum (XP; OMIM
278700). Patients with XP are homozygous for inherited loss-of-function
mutations in one or other of the genes involved in nucleotide excision repair.
They are un-able to repair DNA damage caused by ultraviolet radiation. They
are exceedingly sensitive to sunlight and develop many tumors on exposed
skin.
• Base excision repair defects are not common in human cancers, but a survey
of patients with colon cancer by Farrington and colleagues (see Further
Reading) showed that rare homozygous defects in the MUTYH repair enzyme
were associated with a 93-fold increased risk of colon cancer. Overall, these
accounted for about 0.5% of all cases of colon cancer.
• Double-strand break repair defects primarily affect homologous recombina-
tion rather than nonhomologous end joining, although both mechanisms
require the ATM–NBS1–BRCA1–BRCA2 machinery mentioned above.
• Replication error repair defects cause a large generalized increase in muta-
tion rates. These defects came to light when studies of familial colon cancer
revealed the phenomenon of microsatellite instability, described below.
nonmalignant DNA
160.50 173.19 211.96 227.50
567 369 1650 1301
Figure 17.19 Microsatellite instability.
Electropherograms of two PCR-amplified
microsatellite markers. Upper traces: blood
DNA. Lower traces: tumor DNA. Note the
tumor DNA 211.99 227.53
160.49 173.21 extra peaks in the tumor DNA. (Courtesy of
2075 407
378 197 Lise Hansen, University of Aarhus.)
558 Chapter 17: Cancer Genetics
syndrome I and endometrial cancer. For details, see Jiricny J & Nystrom-Lahti M (2000) Curr. Opin.
Genes Dev. 10, 157–162.
A
G
MSH2
hMutSa
MSH6
A
G
PMS2
ATP-powered hMutLa
movement MLH1
Figure 17.20 Mechanism of mismatch
repair. Replication errors can produce
A mismatched base pairs or small insertion–
G deletion loops. These are recognized by
hMutSa, a dimer of MSH2 and MSH6
proteins, or sometimes by the MSH2—MSH3
exonuclease dimer hMutSb. The proteins translocate
(1) strip back
polymerase
(2) resynthesize along the DNA, bind the MLH1–PMS2 dimer
replication factors
hMutLa, then assemble the full repairosome,
C which strips back and resynthesizes the
G newly synthesized strand.
GENOMEWIDE VIEWS OF CANCER 559
consequence of the many cell divisions and genomic instability of cancer cells.
Passenger mutations probably greatly outnumber driver mutations. For example,
the glioblastoma multiforme study by Parsons et al. identified one or more non-
silent coding sequence changes in 685 of the 20,661 genes studied, with a mean
of 47 mutations per tumor. Combining data from sequencing (including limited
testing of a further 83 tumor samples), copy-number changes, and expression
data suggested that 42 of these genes might harbor driver mutations.
These and other similar results clearly show the complexity and heterogeneity
of tumorigenesis. Individual tumors have large numbers of mutations and, even
among tumors of the same type, there is only limited overlap. To extract signifi-
cant patterns, the International Cancer Genome Consortium plan to sequence at
least 2000 genes in 250 samples each from 50 tumor types, a total of 12,500 sam-
ples. The review by Stratton, Campbell, and Futreal gives a good picture of the
achievements of this work and the challenges it faces.
series of tumors can be used to search for common patterns or meaningful differ-
ences. Meticulous control of the experimental conditions and very careful filter-
ing and processing of the raw data are necessary to produce reproducible results.
The processed data are displayed as a matrix in which each column shows data
for one sample and each row shows data for one gene. Cells of the matrix are
colored to indicate raised, lowered, or unaltered levels of expression.
The challenge is to extract meaningful patterns from the matrix. The object
may be to find gene expression signatures—patterns of overexpression or under-
expression of a limited number of genes—that are correlated with known clinical
variables and may serve to predict that variable in an unknown case. Alternatively,
the object may be to identify unexpected between-sample similarities and differ-
ences that can define new classes of tumor. Hierarchical clustering methods rear-
range the rows and columns of the matrix to maximize similarities.
MicroRNAs (miRNAs), small noncoding RNAs that are thought to have a regu-
latory role, have also been targets of genomewide expression profiling. The pro-
files show characteristic changes in different tumor types. Most miRNAs are
downregulated in most tumors. This suggests a general dedifferentiation of the
cells, a reversion to a more embryonic state. MiRNA expression tends to increase
as cells differentiate.
Superimposed on an overall decrease in miRNA expression, cancer cells show
abnormal miRNA expression profiles. Lu and colleagues reported profiles of 217
miRNAs in 334 samples of normal human tissue and various tumors (see Further
Reading). The profiles were remarkably diverse and performed better than mRNA
profiles at discriminating between different tumor types, and between tumor
and normal cells. Other researchers showed that overexpression of one miRNA,
miR-10b, in nonmetastatic breast tumors triggered invasion. Clinical progression
of primary breast tumors correlated with miR-10b levels. Other miRNAs
(miR-126 and miR-335) had the opposite effect: metastatic breast cancer cells
lose expression of these two miRNAs, and restoring their expression decreased
the metastatic potential of the cells.
The data are currently confusing, but some individual miRNA genes do fulfill
the criteria for oncogenes or tumor suppressor genes. This may be because they
control the expression of one of the well-known classical oncogenes or tumor
suppressor genes, or it might reflect some independent action of the miRNA.
Expression profiling has been widely and successfully used for identifying dis-
tinct categories of tumor (class discovery). Figure 17.21 shows typical examples.
The eventual aim must be to develop prognostic and predictive indicators.
Prognostic indicators would predict the likely survival of the patient, regardless
of treatment. Predictive indicators would predict which treatments would be
most likely to succeed with that particular tumor. How far expression profiling
has moved toward such direct clinical relevance is discussed in Chapter 19.
Pr Bl Br Co Ga Ki Li Ov Pa LA LS
Bl
(95%)
Br
(91%)
Co
148 ‘classifier’ genes
(93%)
correlate with
resistance
Ga
(92%)
Ki
(99%)
Li
(99%) (D)
Ov 70
(95%) 60
Pa 50
tumors
(99%)
40
LA 30
(91%)
LS 20
(95%)
10
100 tumors
Figure 17.21 Examples of the use of expression arrays to characterize tumors. (A) Identifying the tissue of origin. Expression
patterns of 148 genes (rows) classify 100 tumors (columns) by origin. Pr, prostate; Bl, bladder/ureter; Br, breast; Co, colorectal; Ga,
gastroesophagus; Ki, kidney; Li, liver; Ov, ovary; Pa, pancreas; LA, lung adenocarcinoma; LS, lung squamous cell carcinoma. Red
indicates increased gene expression, blue decreased expression. (B) Distinguishing two types of diffuse large B-cell lymphoma.
Clustering of gene expression distinguishes germinal center-like tumors (orange bars) from activated B-like tumors (blue bars).
Five-year survival in the two groups is 76% and 16%, respectively. (C) Sensitivity to chemotherapy drugs. Expression profiles of
6817 genes in 60 tumor cell lines were correlated with response to 232 drugs. The figure shows gene expression patterns (rows) in
30 cell lines (columns) that predict sensitivity or resistance to one drug, cytochalasin D. (D) Predicting the clinical outcome of breast
cancer. Expression patterns of 25,000 genes were scored in 98 primary breast cancers. The figure shows profiles of 70 prognostic
marker genes (rows) in 78 cancers (columns). All patients had undergone surgery and radiotherapy; the solid yellow line divides
those predicted to require additional chemotherapy (above the line) from those who do not need this unpleasant treatment.
[(A) from Su AI, Welsh JB, Sapinoso LM et al. (2001) Cancer Res. 61, 7388–7393. With permission from the American Association
for Cancer Research. (B) from Alizadeh AA, Eisen MB, Davis RE et al. (2000) Nature 403, 503–511. With permission from Macmillan
Publishers Ltd. (C) from Staunton JE, Slonim DK, Coller HA et al. (2001) Proc. Natl Acad. Sci. USA 98, 10787–10792. With permission
from the National Academy of Sciences. (D) data from van ’t Veer LJ, Dai H, van de Vijver MJ et al. (2002) Nature 415, 530–536. With
permission from Macmillan Publishers Ltd.]
frequent in very early stage lesions, whereas others usually only appear later. The
principal observations are as follows:
• The earliest detectable lesions, aberrant crypt foci, lack all APC expression. In
people with FAP, who have one inherited loss-of-function mutation of APC in
every cell, about 1 epithelial cell in 106 develops into a polyp. This rate is con-
sistent with loss of the second APC allele being the determining event.
• Early stage adenomas, like most cancer cells, show a global hypomethylation
of CpG sequences in the DNA. Despite the overall hypomethylation, CpG
islands in the promoters of specific genes are hypermethylated, as mentioned
previously.
• About 50% of intermediate and late adenomas, but only about 10% of early
adenomas, have mutations in the KRAS oncogene. Thus, KRAS mutations may
often be required for progression from early to intermediate adenomas.
• About 50% of late adenomas and carcinomas show a loss of heterozygosity on
18q. This is relatively uncommon in early and intermediate adenomas. It
seems likely that the relevant gene is SMAD4 rather than the initial candidate,
DCC.
UNRAVELING THE MULTI-STAGE EVOLUTION OF A TUMOR 563
Figure 17.22 A model for the multi-step development of colon cancer. In many tumors a loss, activation, or mutation of certain
genes is seen at particular histological stages, as described in the text. This is primarily a tool for thinking about how tumors develop,
rather than a firm description. Every colorectal cancer is likely to have developed through the same histological stages, but the
underlying genetic changes are more varied.
• Colorectal carcinomas, but not adenomas, have a very high frequency of TP53
loss or mutations.
These observations have been summarized in the much-quoted ‘Vogelgram’
of Figure 17.22. This scheme should not be over-interpreted; it is a model for
thinking about the multi-stage evolution of cancer, not a description of how it
always happens. As the statistics quoted above show, many colon cancers do not
show these changes. Indeed, only about 60% have APC mutations, and Lynch
syndrome I tumors follow a different route.
Similar but more rudimentary schemes can be made for other tumors for
which the data are not so rich. The early mutations in the multi-step evolution of
a tumor are more critical than the later ones, because there may be only a few
ways in which a small number of mutations can change the behavior of an other-
wise normal cell that has all its defenses intact. The genomic instability of later
stages allows a much wider range of possibilities on which natural selection can
act. According to the gatekeeper hypothesis of Kinzler and Vogelstein, in a given
renewing cell population one particular gene is responsible for maintaining a
constant cell number. Mutation of a gatekeeper leads to a permanent imbalance
between cell division and cell death, whereas mutations of other genes have no
long-term effect if the gatekeeper is functioning correctly. The tumor suppressor
genes identified by studies of Mendelian cancers are likely to be the gatekeepers
for the tissue involved, for example RB1 for retinal progenitor cells, NF2 for
Schwann cells, and VHL for kidney cells. Sporadic cases would then need two
successive mutations in the gatekeeper—but, as was shown earlier for retino-
blastoma, this is compatible with the normal target cell population size and the
relative rarity of each specific sporadic cancer. Some common cancers such as
prostate or lung cancer do not have clean Mendelian forms. These pose a bit of a
problem for the gatekeeper hypothesis because they suggest that no single initial
mutation substantially increases the likelihood that a tumor will develop.
Interestingly, inherited APC mutations are almost always frameshifting (68%)
or nonsense (30%) changes, whereas the somatic mutations (the second hit in
familial cases, and both mutations in sporadic tumors) seldom inactivate the
gene fully. This fits well with the gatekeeper hypothesis. People with inherited
FAP remain normal and healthy until one cell suffers a second mutation; there
should be some difference between their inherited mutations and the somatic
mutations that are postulated to change cell behavior. However, this argument
seems specific to the APC gene. It is perhaps relevant that APC protein not only
regulates the level of b-catenin, a critical controller of cell growth (see below), but
also has a role in ensuring chromosomal stability. APC protein helps ensure the
attachment of spindle microtubules to the kinetochores of chromosomes. We
have already noted that even very early adenomas show chromosomal instability.
It is likely that a single APC mutation (of the right type) already confers a growth
advantage on colonic epithelium, whereas mutation of a single BRCA1/2 gene
gives no advantage to ductal epithelium. This may explain why APC mutation is
very common in sporadic colon cancer but BRCA1/2 silencing is seen in only
10–15% of sporadic breast cancers.
564 Chapter 17: Cancer Genetics
(A) (B)
axin axin
inactive APC
b-Cat
GSK3B Figure 17.23 The APC pathway in
GSK3B colorectal cancer. (A) One main role of the
b-Cat APC protein is to bind b-catenin (b-Cat),
APC which is then marked for destruction. Axin
TCF collaborates in this process. GSK3B, glycogen
b-catenin
synthase kinase 3B. (B) Loss of APC function
degradation b-Cat
allows b-catenin to accumulate and bind to
nucleus nucleus
the TCF/LEF (lymphoid enhancer binding
factor) transcription factor, thus promoting
the transcription of genes such as MYC and
TCF TCF cyclin D1. However, there are other ways
b-Cat of achieving the same effect (see the text).
[Adapted from Fearnhead NS, Britton MP
& Bodmer WF (2001) Hum. Mol. Genet.
10, 721–733. With permission from Oxford
University Press.]
INTEGRATING THE DATA: CANCER AS CELL BIOLOGY 565
amplification
in 14% MDM2
amplification
MDM4 in 7%
NF1 RAS PI(3)K PTEN
proliferation mutation
survival FOXO
in 1%
translation
Thus, behind the heterogeneity of molecular events, there is a much more Figure 17.24 Altered signaling pathways
regular picture of pathways that must be inactivated for a tumor to develop. In in glioblastoma multiforme tumor cells.
their 2002 review, Hahn and Weinberg attempted to list all these pathways and A variety of different sequence or copy-
number changes among pathway
draw up a ‘route map’ of cancer. The current generation of integrated genom-
components are seen in different individual
ewide investigations of different tumor types is providing many concrete exam-
tumors. (A) In 87% of tumors, the pathways
ples (Figure 17.24), and the review by Vogelstein and Kinzler gives many others. through which p53 affects senescence and
Figure 12.11 gives an impression of just how complex a full picture will be. apoptosis are compromised by a variety
of different genetic changes. (B) In 88% of
Malignant tumors must be capable of stimulating angiogenesis tumors, various genetic changes affect the
and metastasizing pathways by which receptor tyrosine kinases
(EGFR, ERBB2, PDGFRA, and MET) acting
Of the six capabilities that a malignant tumor must acquire, we have already seen though RAS and PI(3)K (phosphoinositide
the roles of activated oncogenes, deranged cell cycle control, p53 mutation, and 3-kinase) control cell proliferation and
reactivation of telomerase in achieving the first four capabilities in the list. The survival. Yellow indicates activating genetic
final two capabilities are angiogenesis and metastasis. alterations, with frequently altered genes
showing deeper shades of yellow. Blue
After a tumor reaches a volume of 2–3 mm3 it becomes dependent on the pro-
indicates inactivating mutations, with
duction of new blood vessels (angiogenesis) to obtain an adequate supply of oxy-
darker shades corresponding to a higher
gen and nutrients. The capability of sustained angiogenesis depends on an ang- percentage of alterations. [Data from Cancer
iogenic switch that both upregulates pro-angiogenic factors and down-regulates Genome Atlas Research Network (2008)
angiogenesis inhibitors. Different types of tumor may use different molecular Nature 455, 1061–1068. With permission
strategies to activate the switch. Many cytokines and other signaling molecules from Macmillan Publishers Ltd.]
are involved. Members of the VEGF (vascular endothelial growth factor) family of
proteins have prominent roles in this. Expression is upregulated in many tumors,
and VEGF is the target of anti-cancer drugs such as Avastin®. Hypoxia triggers the
expression of many genes by stabilizing the transcription factor HIF1a (Hypoxia
Inducible Factor 1a). One hereditary cancer syndrome, von Hippel–Lindau syn-
drome (see Table 17.4), is caused by the loss of a protein that normally keeps
HIF1a levels low by ubiquitylating it and flagging it for degradation.
Metastasis may not be a specifically selected capability, because the ability to
metastasize is not clearly advantageous to any cell in a primary tumor. Some
capabilities that are advantageous to primary tumors also incidentally facilitate
metastasis. Tumors need to be highly vascularized, but this also increases the
opportunities for cells to leave the tumor and travel round the body. Primary
tumors need to be invasive to provide room for growth, but the decreased cell
adhesion and increased ability to disrupt tissues also aid metastasis. On the other
hand, a whole series of functions are not important to primary tumors but are
necessary to allow a newly seeded tumor cell to establish a colony in a new loca-
tion. The review by Nguyen and Massagué discusses these issues.
566 Chapter 17: Cancer Genetics
Systems biology may eventually allow a unified overview of tumor Figure 17.25 Pathways important for the
acquisition of cancer cell capabilities.
development This figure is an initial attempt to show
Each of the six capabilities listed by Hanahan and Weinberg may be acquired by the involvement of oncogenes and tumor
a variety of different genetic changes, but in each case acquiring it requires cer- suppressor genes (shown in red) in many
tain specific pathways to be activated or inactivated. Figure 17.25, taken from the pathways that affect a cell’s choice of
progressing through the cell cycle, exiting
review by Hanahan and Weinberg, shows some of the target pathways. The capa-
from the cycle, or undergoing apoptosis.
bilities are not necessarily acquired in the same order in different tumors, but the Many specific examples are discussed by
requirements for genomic instability and successive clonal expansions at inter- Hanahan and Weinberg, from whose paper
mediate stages impose a certain regularity on the process. this figure is taken. RTK, receptor tyrosine
All these pathways are used to steer a cell in certain directions in particular kinase; 7-TMR, 7-transmembrane segment
circumstances (Figure 17.26A). Life would be very simple if signal and response cell surface receptor; ECM, extracellular
were connected by a single unbranched linear pathway (Figure 17.26B) but this matrix. [From Hanahan D & Weinberg RA
seems never to be the case. Rather, multiple branching, overlapping, and par- (2000) Cell 100, 57–70. With permission from
tially redundant pathways control the behavior of cells (Figure 17.26C). Such Elsevier.]
complicated networks are probably necessary to confer stability and resilience
on the extraordinarily complex machinery of a cell. Experimentally, unraveling
the precise genetic circuitry of the controls is exceedingly difficult, partly because
of their complexity and partly because it is difficult to distinguish direct from
indirect effects in transfection or knockout experiments. The science of systems
biology is devoted to quantitative modeling of such networks, and one of its main
applications is in cancer. When a full quantitative description of the control net-
work in each different cell type has become available, we will be in a position to
truly understand cancer.
CONCLUSION
Cancer is the result of a Darwinian microevolutionary process among the cells of
our body. Just as in the evolution of a population of animals, individual cells
acquire mutations at random, and natural selection favors any cell containing
mutations that allow it to reproduce faster or resist death better than its neigh-
bors. To become a cancer cell, a cell must be able to continue replicating, inde-
pendently of external growth and anti-growth signals. This requires several suc-
cessive mutations to disable multiple regulatory pathways.
FURTHER READING 567
Genes that normally act to stimulate growth suffer activating mutations and (A)
stasis
are known as oncogenes; in contrast, those that normally act to restrain growth external
and differentiation
suffer inactivating mutations and are known as tumor suppressor genes. Genomic internal mitosis
instability allows the more rapid accumulation of subsequent mutations. These signals
apoptosis
consist of a small number of driver mutations and a much larger number of pas-
senger mutations. Driver mutations contribute to the development of the tumor (B)
and are positively selected; passenger mutations are incidental by-products of
tumor development. Through many rounds of ill-controlled replication with high signals responses
mutation rates, tumors come to contain a heterogeneous collection of clones,
probably maintained by a small population of cancer stem cells.
Whereas the evolution of a population of organisms is brought about by natu- (C)
ral selection of germ-line mutations, the microevolution of a tumor depends on
selection of somatic mutations. Nevertheless, certain predisposing mutations signals responses
may be passed through the generations in the germ line, causing the various rare
familial cancer syndromes. This can happen because mutations in most tumor
Figure 17.26 The options open to a
suppressor genes have no phenotypic effect until both alleles are inactivated (as
cell, and how it chooses. (A) In response
Knudson put it, they must suffer two hits). A single inactivating mutation can be
to internal and external signals, a cell
transmitted through the germ line, so that it is present in every cell of the off- chooses between stasis, mitosis, apoptosis,
spring. Such cells still function normally but are vulnerable to a single hit that and sometimes differentiation. (B) An
inactivates the remaining functional allele. This is a much more frequent occur- imaginary cell in which signals are linked to
rence than two hits happening by chance on a single normal cell, so the result is responses by linear unbranched pathways
an inherited high risk of cancer. of stimulation (Æ) or inhibition (—|). Human
Every tumor is a unique product of a separate microevolutionary process, and cells do not function like this. (C) In real cells,
contains a unique spectrum of driver and passenger mutations. However, all signals feed into a complex network of partly
tumors need to have acquired a set of specific capabilities: in addition to inde- redundant interactions whose outcome is
not easy to predict analytically.
pendence from external growth and anti-growth signals, tumors need to avoid
apoptosis, replicate indefinitely, develop their own blood supply (angiogenesis),
and, in malignant tumors, invade other tissues to establish secondary tumors.
These capabilities depend on subverting the same network of signaling pathways
in a given cell type. Thus, considered in terms of pathways, tumors are much less
heterogeneous than when the individual gene mutations they contain are listed.
Understanding cancer requires a quantitative and genomewide view of how the
signaling network functions in normal cells of each particular type. Only when
we have obtained such a view will we then understand how specific genetic or
epigenetic changes can destabilize the network and allow a tumor to develop.
FURTHER READING
The evolution of cancer Mitelman F, Johansson B & Mertens F (eds) (2002) Mitelman
Database of Chromosome Aberrations in Cancer. http://cgap.
Hanahan D & Weinberg RA (2000) The hallmarks of cancer. Cell
nci.nih.gov/Chromosomes/Mitelman
100, 57–70. [An important review, introducing the concept of
Mitelman F, Johansson B & Mertens F (2007) The impact of
the six essential capabilities of a tumor cell and giving many
translocations and gene fusions on cancer causation. Nat.
examples.]
Rev. Cancer 7, 233–244.
Rosen JM & Jordan CT (2009) The increasing complexity of the
cancer stem cell paradigm. Science 324, 1670–1673. [A review Tumor suppressor genes
of the controversies and complexities surrounding the Burkhart DL & Sage J (2008) Cellular mechanisms of tumour
concept of cancer stem cells.] suppression by the retinoblastoma gene. Nat. Rev. Cancer
8, 671–682. [An in-depth review of the functions of the pRb
Oncogenes protein.]
Bishop JM (1983) Cellular oncogenes and retroviruses. Annu. Cavenee WK, Dryja TP, Phillips RA et al. (1983) Expression
Rev. Biochem. 52, 301–354. [A nice review of early work on of recessive alleles by chromosomal mechanisms in
oncogenes.] retinoblastoma. Nature 305, 779–784. [The paper that
Boxer LM & Dang CV (2001) Translocations involving c-myc and established Knudson’s two-hit hypothesis. Some of the data
c-myc function. Oncogene 20, 5595–5610. [An insight into are in an earlier paper, Godbout R, Dryja TP, Squire J et al.
the complexities of the Burkitt lymphoma and other MYC (1983) Somatic inactivation of genes on chromosome 13 is a
translocations.] common event in retinoblastoma. Nature 304, 451–453.]
Cancer Gene Census. http://www.sanger.ac.uk/genetics/CGP/ Fodde R & Smits R (2002) One-hit carcinogenesis. Science 298,
Census [A database of genes for which mutations have been 761–763. [Examples of haploinsufficient tumor suppressor
causally implicated in cancer.] genes.]
Meyer N & Penn LZ (2008) Reflecting on 25 years with MYC. Nat. Knudson AG (2001) Two genetic hits (more or less) to cancer.
Rev. Cancer 8, 976–990. [A historical review illustrating the Nat. Rev. Cancer 1, 157–162. [A review of his two-hit
many roles of one of the major oncogenes.] hypothesis.]
568 Chapter 17: Cancer Genetics
Thiagalingam S, Laken S, Willson JK et al. (2001) Mechanisms International Cancer Genome Consortium. http://www.icgc.org/
underlying losses of heterozygosity in human colorectal home [A project whose goal is to obtain a comprehensive
cancers. Proc. Natl Acad. Sci. USA 98, 2698–2702. [A nice study description of genomic, transcriptomic and epigenomic
using DNA markers and FISH to identify the mechanisms of changes in 50 different tumor types and/or subtypes which
loss in 62 tumors. The mechanisms often differed from those are of clinical and societal importance across the globe.]
described by Cavenee and colleagues in retinoblastoma.] Ley TJ, Mardis ER, Ding L et al. (2008) DNA sequencing of a
cytogenetically normal acute myeloid leukaemia genome.
Cell cycle dysregulation in cancer Nature 456, 66–72. [The first full sequence of a tumor
Kastan MB & Bartek J (2004) Cell cycle checkpoints and cancer. genome.]
Nature 432, 316–323. [One of a cluster of eight reviews of cell Lu J, Getz G, Miska E et al. (2005) MicroRNA expression profiles
division and cancer in this issue of Nature. This review focuses classify human cancers. Nature 435, 834–838.
on the response to DNA damage.] Ma L, Teruya-Feldstein J & Weinberg RA (2007) Tumor invasion
Kops GJPL, Weaver BAA & Cleveland DW (2005) On the road to and metastasis initiated by microRNA-10b in breast cancer.
cancer: aneuploidy and the mitotic checkpoint. Nat. Rev. Nature 449, 682–688.
Cancer 5, 773–785. [Gives details of the mitotic checkpoint Parsons DW, Jones S, Zhang X et al. (2008) An integrated
mechanism.] genomic analysis of human glioblastoma multiforme. Science
Malumbres M & Barbacid M (2005) Mammalian cyclin-dependent 321, 1807–1812.
kinases. Trends Biochem. Sci. 30, 630–641. [A general review of Quackenbush J (2006) Microarray analysis and tumor
the cyclin-dependent kinases and their functions.] classification. N. Engl. J. Med. 354, 2463–2472. [A clear and
Malumbres M & Barbacid M (2009) Cell cycle, CDKs and cancer: readable account of how microarray data are processed and
a changing paradigm. Nat. Rev. Cancer 9, 153–166. [Suggests interpreted.]
that Cdk1 is the only generally essential cyclin-dependent Stratton MR, Campbell PJ & Futreal PA (2009) The cancer genome.
kinase; others may have specialized roles in certain cell Nature 458, 719–724. [A review of achievements and
types.] challenges in obtaining genomewide views of cancer.]
Massagué J (2004) G1 cell-cycle control and cancer. Nature 432, Tavasoie SF, Alarcón C, Oskarsson T et al. (2008) Endogenous
298–306. [A detailed review of mechanisms controlling human microRNAs that suppress breast cancer metastasis.
progression through G1 and across the G1/S threshold.] Nature 451, 147–152.
Weir BA, Woo MS, Getz G et al. (2007) Characterizing the cancer
Instability of the genome genome in lung adenocarcinoma. Nature 450, 893–898.
Brennan P, McKay J, Lee M et al. (2007) Uncommon CHEK2 mis- [Identifying copy-number variations in 371 tumors.]
sense variant and reduced risk of tobacco-related cancers:
case-control study. Hum. Mol. Genet. 16, 1794–1801. [An Unraveling the multi-stage evolution of a tumor
interesting study and worth reading for the clear discussion Fodde R, Smits R & Clevers H (2001) APC, signal transduction
of this unexpected finding—but not a comfort blanket for and genetic instability in colorectal cancer. Nat. Rev. Cancer
smokers!] 1, 55–67. [A detailed look at APC function and the surprising
Farrington SM, Tenesa A, Barnetson R et al. (2005) Germ line regularities in the types of mutation found.]
susceptibility to colorectal cancer due to base-excision repair Kinzler KW & Vogelstein B (1996) Lessons from hereditary
gene defects. Am. J. Hum. Genet. 77, 112–119. colorectal cancer. Cell 87, 159–170. [A review of their
Fishel R, Lescoe MK, Rao MRS et al. (1993) The human mutator important early work on the development of colorectal
gene homolog MSH2 and its association with hereditary cancer, and an introduction to the gatekeeper hypothesis.]
nonpolyposis colon cancer. Cell 75, 1027–1038. [Identifying
MSH2 mutations as a cause of human mismatch repair Integrating the data: cancer as cell biology
defects.] Hahn WC & Weinberg RA (2002) Modelling the molecular circuitry
Hoeijmakers J (2009) DNA damage, aging and cancer. New Engl. J. of cancer. Nat. Rev. Cancer 2, 331–340. [An attempt to define a
Med. 361, 1475–1485. road map of carcinogenesis.]
Nguyen DX & Massagué J (2007) Genetic determinants of cancer
Genomewide views of cancer metastasis. Nat. Rev. Genet. 8, 341–352. [A detailed discussion
Cancer Genome Anatomy Project. http://cgap.nci.nih.gov/ of how and when a tumor acquires metastatic potential.]
[A project whose goal is ‘to determine the gene expression Rajagopalan H, Bardelli A, Lengauer C et al. (2002) RAF/RAS
profiles of normal, precancer, and cancer cells, leading oncogenes and mismatch-repair status. Nature 418, 934.
eventually to improved detection, diagnosis, and treatment [Showing that RAS and BRAF mutations are alternative ways
for the patient.’] of achieving the same thing in tumorigenesis.]
Cancer Genome Atlas Research Network (2008) Comprehensive Smith G, Carey FA, Beattie J et al. (2002) Mutations in APC,
genomic characterization defines human glioblastoma genes Kirsten-ras, and p53 –alternative genetic pathways to
and core pathways. Nature 455, 1061–1068. colorectal cancer. Proc. Natl Acad. Sci. USA 99, 9433–9438.
Feinberg AP, Ohlsson R & Henikoff S (2006) The epigenetic [This study highlights the heterogeneity of molecular events
progenitor origin of human cancer. Nat. Rev. Genet. 7, 21–32. in the evolution of colorectal cancer.]
[Argues for an early epigenetic event being crucial to Vogelstein B & Kinzler KW (2004) Cancer genes and the pathways
tumorigenesis.] they control. Nat. Med. 10, 789–799. [A good overview of
Greenman C, Stephens P, Smith R et al. (2007) Patterns of somatic pathways in tumor development.]
mutation in human cancer genomes. Nature 446, 153–158.
[Sequence analysis of 518 protein tyrosine kinase genes in
210 diverse human cancers.]
Chapter 18
Genetic Testing of
Individuals 18
KEY CONCEPTS
Genetic testing is a routine tool for almost all biomedical scientists. PCR-based
tests are the standard way of identifying pathogens and are the everyday working
tools of microbiologists and virologists. Emerging diseases are identified by
sequencing the pathogen’s genome and are tracked by genetic tests. Animal and
plant breeders use genetic tests to guide their work. Our modern understanding
of evolution and development is based on genetic tests. However, this being a
book on human molecular genetics, we will confine ourselves here to tests
focused on the human genome. Even here the field is very extensive. Previous
chapters have covered many applications of genetic tests in research, for exam-
ple to map disease genes or susceptibility factors, or to identify the cause of a
disease. Here we consider the methodology of testing in human diagnostics and
forensics. Many of the principles have already been covered in Chapter 7.
When considering diagnostic tests, wherever possible we will use two of the
most common Mendelian diseases, cystic fibrosis (CF) and Duchenne muscular
dystrophy (DMD), to illustrate the various testing methods. Both CF and DMD
involve large genes with extensive allelic heterogeneity, but beyond that they
pose rather different sets of problems for DNA diagnosis (Table 18.1). Between
them, they show many of the issues involved in testing for Mendelian diseases. As
always in this book, we concentrate on the principles and not the practical details.
Best practice guidelines for laboratory diagnosis of the commoner Mendelian
diseases are available on the websites of the American College of Medical Genetics
and the UK Clinical Molecular Genetics Society, among others. The reader inter-
ested in specific procedures or conditions should consult these (see Further
Reading).
Genetic tests are unusual among clinical tests because a genetic test is nor-
mally performed just once and the result forms a permanent part of a person’s
health record. Thus, it is especially important to avoid mistakes in diagnostic
testing. Any laboratory offering clinical (or forensic) testing must have adequate
quality assurance measures in place. These are beyond the scope of this chap-
ter—any interested reader should consult the principles and guidelines pub-
lished by the Organisation for Economic Co-operation and Development (OECD)
(see Further Reading).
When a clinician brings a sample of a patient’s DNA to a laboratory for diag-
nostic testing, there are three possible questions that the laboratory might be
asked to answer:
• Does the patient have any mutation in any gene that would explain the dis-
ease? This is an unreasonable question to ask for routine diagnosis, even if it
might very occasionally be possible in a research context. Gene testing has to
be targeted. Even though the day may come when a person’s entire genome
sequence forms part of his or her basic health record, we would still need to
know where to look among the several million specific variants present in a
person’s DNA. At best it might be reasonable to ask a laboratory to check
TABLE 18.1 THE CONTRASTING GENETICS OF CYSTIC FIBROSIS AND DUCHENNE MUSCULAR DYSTROPHY
Condition Cystic fibrosis Duchenne muscular dystrophy
Characteristics of gene fairly large gene: 250 kb genomic DNA, giant gene: 2400 kb genomic DNA, 79 exons, 14 kb mRNA
27 exons, 6.5 kb mRNA
Spectrum of mutations almost all mutations are single-nucleotide 65% of mutations are deletions encompassing one or more
changes; a few large deletions complete exons; 5% duplications; 30% nonsense, splice site, or small
frameshifting mutations; missense mutations are very unusual
Frequency of new mutations new mutations are extremely rare new mutations are very frequent
Frequency of mosaicism mosaicism is not a problem mosaicism (see Chapter 3, p. 77) is common
Frequency of intragenic little intragenic recombination recombination hotspot (12% between markers at either end of the
recombination gene)
WHAT TO TEST AND WHY 571
Mouthwash or buccal scrape noninvasive; usually enough DNA for several dozen PCR tests, but quality and quantity very variable
Skin, muscle, etc. for RNA studies; needs to be a tissue where the gene of interest is expressed; requires a biopsy, which is invasive
and often unpleasant
Single cell from a blastocyst for pre-implantation diagnosis; technically very demanding
Amniotic fluid a relatively poor source of fetal DNA compared with chorionic villi, but used for testing at 15–20 weeks of
pregnancy
Fetal DNA in maternal blood a promising alternative to chorionic villi; detectable from 6 weeks; paternal alleles only
Pathological specimens vital resource for genotyping dead people; also tumor, etc., biopsies; fixed paraffin-embedded tissue requires
special procedures for DNA purification
Guthrie card the cards used for neonatal screening have one or more spots of the baby’s blood, not all of which are normally
used for the screening; if cards are archived they are a possible source of DNA from a deceased baby
572 Chapter 18: Genetic Testing of Individuals
attempts have been made to use fetal cells in the maternal circulation for less
invasive prenatal diagnosis. Unfortunately, the cells are present only in tiny num-
bers, and the results have never been reliable enough for routine use. Free fetal
DNA is more promising, although only tests for paternally derived alleles are
informative, because the fetal DNA comprises only maybe 3% of the free DNA in
the maternal blood circulation; the rest is maternal. Archived pathological sam-
ples are very important sources of DNA from deceased persons (although con-
sent laws in some countries make it difficult to use them). When samples have
been formalin-fixed, special procedures are necessary, and only short sequences,
typically 200 base pairs or less, can usually be recovered.
Southern blotting (see Figure 7.10) is still necessary for a few applications,
including as an alternative to FISH (see Chapter 2, p. 47) for testing for major bal-
anced rearrangements that would not be picked up by array-CGH (see Chapter 2,
p. 49) and, in some circumstances, for fragile X full mutations (see below).
RNA or DNA?
If a gene has to be scanned for unknown mutations, testing RNA by reverse tran-
scriptase PCR (RT-PCR; see Chapter 8) offers several advantages. Direct testing of
DNA usually involves amplifying and testing each exon separately, and this can
be a major chore in a gene with many exons. A gene with 15 exons that encodes a
2.5 kb transcript can be scanned by RT-PCR in perhaps four overlapping seg-
ments, whereas the same analysis on genomic DNA would require 15 PCR reac-
tions and 30 sequencing runs (assuming that both strands of each PCR product
are sequenced, which would be normal practice). In addition, only RT-PCR can
reliably detect aberrant splicing. It is often hard to predict whether a DNA
sequence change will affect splicing; moreover, aberrant splicing may result from
activation of a cryptic splice site deep within an intron, which would not be
detected by the usual exon-by-exon protocols for sequencing genomic DNA (see
Chapter 8).
However, analyzing RNA has disadvantages. RNA is much less convenient to
obtain and work with. Samples must be handled and processed with extreme
care to avoid degrading mRNA. Importantly, the gene of interest may not be
expressed in any readily accessible tissue. A very low level of illegitimate tran-
scripts may sometimes be detectable in tissues in which the gene is not normally
expressed, but reliable analysis of very low-level mRNAs can be difficult. In addi-
tion, truncating mutations usually result in unstable mRNA because of nonsense-
mediated decay (see Chapter 13), so that the RT-PCR product from a heterozygous
person may show only the normal allele. Treating cultured cells or whole blood
with the translation inhibitor puromycin has been shown to inhibit nonsense-
mediated decay, which may allow cDNA sequencing to detect transcripts that
include premature termination codons.
Functional assays
When considering pedigree patterns in Chapter 3, we simply distinguished two
classes of allele, normal and abnormal, or A and a. Assaying the function of a
gene might allow a similar distinction to be made in the laboratory—which is,
after all, the essential question in most diagnoses. Function might be tested at
the DNA level, for example by a cell transfection assay. Alternatively, the protein
product of a gene could be tested biochemically if its function can be adequately
assayed in the laboratory. The problem with functional assays is that they are
specific to a particular gene or protein; by contrast, DNA technology is generic.
This has obvious advantages for the diagnostic laboratory, but in addition it
encourages technical development, because any new technique could be used
for a wide variety of problems.
patient
T A A A C C T A C C A –G T N C A N C N A A N
TABLE 18.3 SOME METHODS FOR SCANNING A GENE FOR MUTATIONS BEFORE SEQUENCING
Method Advantages Disadvantages
Heteroduplex gel mobility very simple to set up; cheap and simple to run sequences < 200 bp only; limited sensitivity; does not reveal
position of change
dHPLC quick, high throughput; quantitative expensive equipment; does not reveal position of change
SSCP simple, cheap; sensitive when formatted for sequences < 200 bp only; does not reveal position of change
capillary electrophoresis
DGGE high sensitivity choice of primers is critical; expensive GC-clamped primers; does
not reveal position of change
Melting curve analysis high sensitivity proprietary system using a Light-Cycler® machine
Protein truncation test high sensitivity for chain-terminating mutations; chain-terminating mutations only; expensive, difficult technique;
shows position of change usually needs RNA
dHPLC, denaturing high-performance liquid chromatography; SSCP, single-strand conformation polymorphism; DGGE, denaturing gradient gel
electrophoresis.
report is issued the data must be checked by two separate individuals, and the
likely significance of any variant must be considered. Programs are available that
automatically report differences between the test sequence and a standard
sequence, provided that the sequence is of good quality. DNA quality is critical
for avoiding artifacts and for reliably detecting base substitutions in
heterozygotes.
800
4
600
fluorescence intensity
3 400
200
2 0
1 900 patient
800
700
0
2 3 4 5 6 600
time (minutes) 500
400
300
200
100
0
Figure 18.3 Scanning the DMD and NF2 genes for mutations. (A) Scanning for DMD mutation by denaturing high-performance liquid chromatography
(dHPLC). Exon 6 of the dystrophin gene gives different patterns in an affected male (blue trace) and a normal control (red trace). Sequencing revealed
the splice site mutation c.738+1G>T. For males, because this is an X-linked condition, test DNA must be mixed with an equal amount of normal DNA to
allow the formation of heteroduplexes. (B) Mutation scanning of the NF2 (neurofibromatosis 2) gene by chemical cleavage of mismatches. A fluorescently
labeled meta-PCR product contained exons 6–10 of the NF2 gene. The lower track shows the sample from a patient; a heterozygous intron 6 splice site
mutation 600-3c>g is revealed by hydroxylamine cleavage of the 1032 bp meta-PCR product to fragments of 813 + 239 bp (arrows). The upper track is a
control sample. [(A) courtesy of Richard Bennett, Children’s Hospital, Boston. (B) courtesy of Andrew Wallace, St Mary’s Hospital, Manchester.]
576 Chapter 18: Genetic Testing of Individuals
Figure 18.4 DMD mutation scanning with the protein truncation test
(PTT). A coupled transcription–translation reaction is used to produce labeled
polypeptide products encoded by a segment of a cDNA. Segments containing
premature termination codons produce truncated polypeptides. The figure
shows results for one segment of the dystrophin mRNA studied in five affected
boys and one unaffected control (c). The samples in lanes 3 and 5 produce
truncated, faster-running polypeptides. The position of the termination codon
within a sample can be determined from the size of the truncated polypeptide.
(Courtesy of Johan den Dunnen and D. Verbove, Leiden, The Netherlands.) DMD segment 1EF
SCANNING A GENE FOR MUTATIONS 577
wild-type mutant
cell 11 12 13 cell 11 12 13
A A
G G
C C
T T
PCR products are always unmethylated, because they are made from the nor-
mal monomers supplied in the reaction mix. Thus, none of the methods described PCR product
so far gives any information on the methylation patterns in a DNA sample.
Genomewide studies of methylation use chromatin immunoprecipitation with Figure 18.6 Using restriction enzymes to
an antibody against methylated DNA (see Chapter 11, p. 353), but to study the analyze the methylation status of a DNA
methylation status of an individual sequence, two main methods are used: sequence. The restriction enzyme HpaII cuts
CCGG, but only when the cytosines are not
• Restriction enzyme digestion. HpaII cuts only unmethylated CCGG, whereas
methylated. MspI cuts the same sequence
MspI cuts any CCGG, whether or not it is methylated. Thus, methylated
regardless of methylation. If genomic
genomic DNA gives different patterns of restriction fragments with the two DNA is digested with HpaII before PCR
enzymes. Alternatively, a PCR template containing a CCGG sequence can be amplification, any amplicon that contains a
digested with either HpaII or MspI before amplification. If the CCGG is meth- CCGG sequence will give a product only if
ylated, the template will not be cleaved by HpaII, and the PCR will yield a methylation of the CCGG site protects it from
product (Figure 18.6). digestion.
578 Chapter 18: Genetic Testing of Individuals
Restriction digestion of PCR-amplified DNA; check size only when the mutation creates or abolishes a natural restriction site or one engineered by
of products on a gel the use of special PCR primers (see Figure 18.7); tests for a single mutation
Hybridize PCR-amplified DNA to allele-specific general method for specified point mutations; large arrays allow massively parallel
oligonucleotides (ASO) on a dot-blot or gene chip scanning for a very large number of mutations or SNPs
PCR using allele-specific primers (ARMS test) general method for single point mutations; primer design is critical (see Figure 6.17)
Single-nucleotide primer extension general method for specified point mutations; formatted for readout on DNA sequencer or
on microarray for parallel assay of many sites
Oligonucleotide ligation assay (OLA) general method for specified point mutations (see Figure 18.9); a few dozen tests can be
multiplexed
Mass spectrometry very fast (< 1 second) versatile general method; quantitative result (see Box 8.8 and
Figure 8.26)
PCR with primers located either side of a translocation successful amplification shows the presence of the suspected deletion or specified
breakpoint rearrangement
Check size of expanded repeat for dynamic repeat diseases (see Table 13.1) only; large expansions may require Southern
blots
580 Chapter 18: Genetic Testing of Individuals
Sickle-cell disease only this particular mutation produces the sickle-cell p.E6V in HBB gene (see Figure 7.9)
phenotype
Achondroplasia only G380R produces this particular phenotype; high two distinct changes, both causing p.G380R in
frequency of new mutant cases FGFR3 gene
Huntington disease, myotonic dystrophy gain-of-function mutations unstable expanded repeats (see Table 13.1)
Fragile X common molecular mechanism: expansion of an see Table 13.1; other mutations occur, but are
unstable repeat rare
Charcot–Marie–Tooth disease (HMSN1) common molecular mechanism: recombination duplication of 1.5 Mb at 17p11.2 (see
between misaligned repeats Figure 13.25); point mutations also occur
a- and b- thalassemia selection for heterozygotes leads to different ancestral see Figure 13.19 (a-thalassemia) and
mutations being common in different populations Table 18.6 (b-thalassemia)
Tay–Sachs disease founder effect in Ashkenazi Jews; ancient heterozygote two common HEXA mutations in Ashkenazim:
advantage 4 bp insertion in exon 11 (73%); exon 11
donor splice site G>C (15%)
Cystic fibrosis common ancestral mutations in northern European see Table 18.7 and Section 3.5
populations, ancient heterozygote advantage
See Section 13.4 for a further discussion of the reasons why some diseases show a limited range of mutations, whereas others have extensive allelic
heterogeneity.
N H N M
– – – intron – – – – exon – – – – – intron – – – – – – – – –
genomic DNA 5¢ – – – G T G A G T A T T – – – 3¢ 5¢ – – – G T G T G T A T T – – – 3¢
PCR primer 3¢ – – – – – – C A T G A – – – 5¢ 3¢ – – – – – – C A T G A – – – 5¢
PCR product 5¢ – – – G T G A G T A C T – – – 3¢ 5¢ – – – G T G T G T A C T – – – 3¢
Scal
Figure 18.7 Introducing an artificial diagnostic restriction site. An A>T mutation in the intron 4 splice site of the FACC gene
does not create or abolish a restriction site. The PCR primer stops short of this altered base but has a single base mismatch (red G)
in a non-critical position that does not prevent it from hybridizing to and amplifying both the normal and mutant sequences. The
mismatch in the primer introduces an AGTACT restriction site for ScaI into the PCR product from the normal sequence but not the
mutant sequence. The ScaI-digested product from homozygous normal (N), heterozygous (H), and homozygous mutant (M) patients
is shown. (Courtesy of Rachel Gibson, Guy’s Hospital, London.)
TESTING FOR A SPECIFIED SEQUENCE CHANGE 581
to labeled PCR-amplified test DNA. The same reverse dot-blot principle is applied
on a massively parallel scale in DNA microarrays (see above and Chapter 7,
p. 207).
Allele-specific PCR amplification
The principle of the ARMS (amplification refractory mutation system) method
was shown in Figure 6.17. Paired PCR reactions are performed. One primer (the
conserved primer) is the same in both reactions; the other exists in two slightly
different versions, one specific for the normal sequence and the other specific for
the mutant sequence. Additional control primers are usually included, to amplify
some unrelated sequence from every sample as a check that the PCR reaction has
worked. The location of the common primer can be chosen to give different-sized
products for different mutations, so that the PCR products of multiplexed reac-
tions can be separated by gel electrophoresis. The pairs of mutation-specific
primers can also be made to give distinguishable products. For example, they can
(A)
be labeled with different fluorescent or other labels, or given 5¢ extensions of dif- 3¢
ferent sizes. Multiplexed mutation-specific PCR is well suited to screening fairly 5¢
large numbers of samples for a given panel of mutations (Figure 18.8).
A modified procedure, pyrophosphorolysis-activated polymerization, is effec- 3¢ 5¢
tive for detecting variant alleles that are present at very low levels. Applications
would include testing for residual disease in cancer, testing for rare somatic
mutations, and testing for paternally derived alleles in fetal DNA extracted from 5¢ 3¢
maternal blood. The reaction uses a primer with a dideoxy 3¢ end that pairs at the
position of the mutant nucleotide. In the presence of pyrophosphate, DNA ligation creates PCR template
polymerase will remove the 3¢ dideoxynucleotide, in a reversal of the normal
polymerization reaction, but only if it is correctly paired with the test strand.
(B)
The polymerase then extends the primer in the normal way, using normal dNTPs. 3¢
The variant nucleotide is thus read twice: once by the reverse polymerase reac- 5¢
tion, and again by the forward reaction. This dual selectivity results in an extremely
low level of errors. 3¢ 5¢
1000
500
Fluorescence intensity
900
600
300
1500
1000
500
DNA polymerase plus four differently labeled ddNTPs. The test DNA acts as tem- Figure 18.10 Using the oligonucleotide
plate for the addition of a single labeled dideoxynucleotide to the primer (Figure ligation assay to test for 29 known
CF mutations. A multiplex OLA is performed
18.11). The nucleotide added to the primer is identified, for example, by running
and the products are amplified by PCR.
the extended primer on a fluorescence sequencer, allowing the base at the posi-
Ligation oligonucleotides are designed so
tion of interest to be identified. Minisequencing can readily be adapted to an that products for each mutation and its
array-based format (APEX; arrayed primer extension) to allow the entire sequence normal counterpart can be distinguished
of a gene to be checked for base substitutions in a single operation. The same by size and color of label. A ligation product
idea, but using reversibly blocked fluorescently labeled monomers, is the prin- from the splice site mutation 621+1g>t is
ciple behind the Illumina/Solexa ultra-high-throughput sequencing technology seen (red box). The person may be a carrier
(see Chapter 8, p. 220). After a single nucleotide has been added and identified, or may be a compound heterozygote with
the blocking groups can be removed with a suitable flash of light, and the cycle a second mutation that is not one of the 29
can be repeated to identify the next nucleotide. detected by this kit. (Courtesy of Andrew
Wallace, St Mary’s Hospital, Manchester.)
insAGGC
c.3958delCTCAG
c.3875delGTCT
c.4184delTCAA
c.5149delCTAA
c.3600del11
c.2804delAA
c.185delAG
c.5382insC
c.3960C>T
c.4446C>T
c.300T>G
Pyrosequencing
As described in Chapter 8, pyrosequencing is a method of examining very short
stretches of sequence—typically 1–5 nucleotides—adjacent to a defined start
point. The main use is for SNP typing, in which only one or two bases are
sequenced. Pyrosequencing uses an ingenious cocktail of enzymes to couple the
release of pyrophosphate that occurs when a dNTP is added to a growing DNA
chain to light emission by luciferase (see Figure 8.8). A primer is hybridized to the
test DNA and offered each dNTP in turn. When the correct dNTP is present, so
that the primer can be extended, a flash of light is emitted. The method has been
developed into a machine that can automatically analyze 10,000 samples a day.
Output is quantitative, so that allele frequencies of a SNP can be estimated in a
single analysis of a large pooled sample. The same technology, applied on a mas-
sively parallel scale, is the basis of the Roche/454 technique for ultra-high-
throughput sequencing (see Figure 8.9).
size ladder
TOF (see Figure 8.26). Applied to DNA, the technique can measure a mass up to
20 kD with an accuracy of ±0.3%. MALDI-TOF MS can be used as a very much
PCR
faster alternative to gel electrophoresis for sizing oligonucleotides up to about
100 nucleotides long. A test takes less than a second in the machine. For small
oligonucleotides, the accuracy is sufficient to deduce the base composition 1000 ~ 60 Mb
500
directly from the exact mass. Alternatively, SNPs can be analyzed by primer exten- 200
sion using mass-labeled ddNTPs. The spots of DNA to be analyzed can be arrayed 100
on a plate and the machine will automatically ionize each spot in turn. Current fragmentation and
systems can genotype tens of thousands of SNPs per day and, as with pyrose- end-labeling
frag
quencing, samples can be pooled to measure allele frequencies directly.
P2 P2
P1 P2 P1 P2
tag tag
3¢ T 5¢ 3¢ C 5¢
cleavage cleavage
tag tag
5¢ A 3¢ 5¢ G 3¢
tag tag
5¢ 3¢ 5¢ 3¢
capture capture
A tags G
Figure 18.14 SNP genotyping by the Illumina GoldenGate® method. Two allele- A
T
specific (P1 and P2) oligonucleotides and one locus-specific (P3) oligonucleotide are G
used, with a combination of allele-specific primer extension and DNA ligation to provide C
reliable discrimination of alleles. As with the MIP system (see Figure 18.13), ligation annealing
creates a PCR substrate that is amplified with universal dye-labeled primers, and the
products are identified through hybridization of the tag sequence to an array—in
A
this case, a bead array. [From Syvanen AC (2005) Nat. Genet. 37 (Suppl), S5–S10. With P1 T
permission from Macmillan Publishers Ltd.] 5¢ C tag 3¢
P3
P2
amplified with one of two universal dye-labeled primers (Figure 18.14). The G
P2 C
LSO carries a tag sequence that is used to hybridize the PCR product to a spe- tag
5¢ T 3¢
cific bead in a bead array. P3
P1
extension
18.4 SOME SPECIAL TESTS ligation
techniques P1
tag P3
Generally, introns are much bigger than the exons they flank, and so random T
breakpoints in a gene will most probably lie within an intron. Thus, kilobase- P3
scale partial deletions or duplications of a gene sequence, if they do not lie wholly P2
P3
tag
within one intron, will most probably remove or duplicate one or more whole C
exons. Such changes will not be apparent when genomic DNA is amplified exon P3
by exon. In heterozygous deletions of whole exons, the mutant allele will give no
product. The test sees only the normal allele, and the result looks entirely normal.
Similarly, duplications will not show up by these methods. PCR in its normal form A
tag
is not quantitative, so any extra yield of product would not be noticed. RNA test- G
ing would solve the problem, but RNA may be difficult to obtain.
This is a problem that is particular to kilobase-scale deletions or duplications. capture
Small deletions or duplications that lie wholly within an exon will be apparent
when the exon is sequenced. Large-scale copy-number changes that might be
anywhere in the genome are efficiently detected by array-CGH. However, this is A G
an expensive technique for routine diagnosis, and the normal BAC arrays have
tags
low sensitivity for changes involving only a kilobase or so of DNA. When the can-
didate gene is already known, the technique of choice for detecting whole-exon
deletions or duplications is multiplex ligation-dependent probe amplification
(MLPA).
The multiplex ligation-dependent probe amplification (MLPA) test
MLPA is a development of the oligonucleotide ligation assay (OLA) described
above. As in the OLA, pairs of oligonucleotides hybridize to adjacent locations on
the test DNA, and DNA ligase seals the gap between them to produce a single
molecule. In the OLA the gap is positioned at a suspected variant nucleotide and
ligation will occur only if there is an exact match; in MLPA the gap is positioned
at a (hopefully) invariant nucleotide in the test DNA, so ligation will always take
place if the template sequence is present. Ligation creates a PCR-amplifiable
molecule (see Figure 18.9). The presence of a PCR product signals the presence of
the appropriate matching sequence in the test DNA.
MLPA is a multiplex procedure with up to 45 probe pairs combined in some of
the available kits. The individual PCR products are distinguished by length, so as
to generate a series of peaks when the products are run on a DNA sequencer
(Figure 18.15). Although the PCR is not truly quantitative (absolute amounts of
product vary between probes), the relative amounts of a particular product in
two samples should reflect the relative amounts of the template in the two sam-
ples. When results from test and control samples are compared, the relative peak
heights give a direct readout of the dosage of each sequence in the test DNA rela-
tive to the control DNA. Normally, one ligation is used for each exon of a gene,
allowing whole-exon deletions and duplications to be detected.
586 Chapter 18: Genetic Testing of Individuals
exon
13
(A)
track 1 2 3 4 5 6 7 8 9 10
exon
48
51
43
45
50
53
47
60
Figure 18.16 Multiplex screen for
52
dystrophin deletions in males. (A) Products
of multiplex PCR amplification of nine exons,
primers using samples from 10 unrelated boys with
Duchenne/Becker muscular dystrophy. PCR
(B) primers have been designed so that each
exon 43 44 45 46 47 48 49 50 51 52 53 60
exon, with some flanking intron sequence,
gives a different-sized PCR product.
track (B) Interpretation of the results in (A).
1 Solid lines show exons definitely deleted,
2 dotted lines show the possible extent of
3 deletion running into untested exons.
4 No deletion is seen in samples 7 and 9;
5 these patients may have point mutations,
6
or deletions of exons not examined in this
7
8 test. Exon sizes and spacing are not to scale.
9 Compare with Figure 13.15. [(A) courtesy of
10 R. Mountford, Liverpool Women’s Hospital.]
SOME SPECIAL TESTS 587
II 2 II
1 2 3 4 1 2 3 4
3 3 1 1 1 2
III III
1 2 1 2
1 3
the gene) will reveal 98% of all deletions. It is important to consider alternative
explanations for failure to amplify: maybe there was a technical failure of the PCR
or a base substitution in one of the primer-binding sites. Most deletions remove
more than one exon. Deletions of just a single exon and deletions that seem to Figure 18.17 Deletion carriers in a
affect non-contiguous exons need confirming. Deletions can be confirmed by DMD family revealed by apparent non-
using alternative PCR primers or by MLPA. About 5% of dystrophin mutations are maternity. Pedigree (A) and results (B) of
duplications of one or more exons, and detecting these requires MLPA in males genotyping with the intragenic marker
STR45. The affected boy, III1, has a deletion
as well as in females.
that includes STR45 (his lane on the gel
Apparent non-maternity in a family in which a deletion is segregating is blank). His mother, II2, and his aunt, II3,
inherited no allele of STR45 from their
If a deletion is segregating in a DMD family, genotyping females for microsatel- mother, I2, showing that the deletion is being
lites that map within the deletion may reveal apparent non-maternity, in which a transmitted in the family. I2 is apparently
mother has transmitted no marker allele to her daughter because of the deletion homozygous for this highly polymorphic
(Figure 18.17). In such families, non-maternity proves that a woman is a carrier, marker (lane 2), but in fact is hemizygous
whereas heterozygosity (in the daughter or sister of a deletion carrier) proves that as a result of the deletion. The boy’s other
a woman is not a carrier. Several markers suitable for this purpose have been aunt, II4, and his sister, III2, are heterozygous
identified in the introns at deletion hotspots. The method works best in families for the marker and therefore do not carry
in which there is an affected male in whom the deletion can first be defined. In the deletion. (C) Pedigree showing how the
marker genotypes of individuals I2, II2, and II3
principle, the same approach could be applied to tracking any other deletion,
are hemizygous as a result of the deletion.
whether X-linked or autosomal, through a family. However, for any autosomal
deletion, a quantitative method would be needed to identify any heterozygous
deletion in the first place, and if such a method were available it would be simpler
to use the same method to test other family members.
Figure 18.18 Prenatal diagnosis of trisomy 21 with QF-PCR. Fetal DNA, obtained by chorionic villus biopsy or amniocentesis,
was amplified in a multiplex PCR reaction using dye-labeled primers for four highly polymorphic microsatellites from each of
chromosomes 13, 18, and 21. The markers from chromosomes 13 and 18 each show two equal-sized peaks (if heterozygous) or one
larger peak (if homozygous). Markers from chromosome 21 all show three peaks (D21S1411) or two peaks in a 2:1 size ratio (D21S11,
D21S1270, and D21S1435). (Courtesy of Susan Hamilton, St Mary’s Hospital, Manchester.)
(C)
The mutation screen for some diseases must take account of EcoRI EclXI (CGG)n EcoRI
geographical variation
Ox1.9
The population genetics of recessive diseases is often dominated by founder
effects or the effects of heterozygote advantage (see Chapter 3, p. 89). The result-
ing limited diversity of mutations in a population can make genetic testing much
easier. b-Thalassemia and cystic fibrosis are good examples. For both these con-
ditions, a very large number of different mutations in the relevant gene have been
described, but in each case a handful of mutations account for most cases in any
particular population. With b-thalassemia, DNA testing is not needed to diag- F
nose carriers or affected people—orthodox hematology does this perfectly well—
but it is the method of choice for prenatal diagnosis. Different mutations are pre- NM
dominant in different populations (Table 18.6). Provided that one has DNA
samples from the parents and knows their ethnic origin, the parental mutations
can often be found by using only a small cocktail of specific tests, after which the
fetus can be readily checked.
In cystic fibrosis, the p.F508del mutation is the commonest in all European
populations and is believed to be of ancient origin. However, the proportion of all
mutations that are p.F508del varies, being generally high in the north and west of P
Europe and lower in the south. Testing for cystic fibrosis mutations divides into N
two phases. First, a limited number of specified mutations, always including
p.F508del but otherwise population-specific, are sought with the methods
described in Section 18.3. As Table 18.7 shows, there is no obvious natural cutoff 1 2 3 4 5 6 7
in terms of diminishing returns on testing for specific mutations. If this phase
fails to reveal the mutations, then, if resources allow, a screen for unknown muta-
tions may be instituted, using the methods described in Section 18.2. Alternatively,
gene tracking (see below) may be used. The impact of this diversity on proposals
for population screening is discussed in Chapter 19.
Surprisingly often, when a recessive disease is particularly common in a cer-
tain population, it turns out that more than one mutation is responsible. An
example is Tay–Sachs disease among Ashkenazi Jews, where there are two com-
mon HEXA mutations (see Table 18.5). It is difficult to explain this situation except
by assuming there has been a long-continuing heterozygote advantage, favoring
the accumulation of mutations in the gene.
590 Chapter 18: Genetic Testing of Individuals
In each country, certain mutations are frequent because of a combination of founder effects and
selection favoring heterozygotes. Data courtesy of J Old, Institute of Molecular Medicine, Oxford.
b0, complete absence of b-globin chains; b+, b-globin present but in insufficient quantity. The
nomenclature of mutations used here is nonstandard (see Box 13.2, p. 407), but is widely used for
thalassemia.
p.F508del and a few of the other relatively common mutations are probably ancient and
spread through selection favoring heterozygotes; the other mutations are probably recent, rare,
and highly heterogeneous. Cystic fibrosis is more homogeneous in this population than in most
others. See Box 13.2 for nomenclature of mutations. Data courtesy of Andrew Wallace, St Mary’s
Hospital, Manchester.
case in which this does happen. Worldwide, 20–50% of children with autosomal
recessive profound congenital hearing loss have mutations in the GJB2 gene that
encodes connexin 26. Different specific mutations are common in different pop-
ulations—c.30delG in Europe, c.235delC in East Asia, c.167delT in Ashkenazi
Jews. A simple PCR test therefore provides the answer in a good proportion of
cases. For the remainder, it would be necessary first to sequence the whole GJB2
gene and then, if resources allowed, to examine a large number of other genes.
In many other cases, no one gene accounts for a significant proportion of
cases. Learning difficulties present the ultimate challenge in this respect. Custom
chips are being developed to allow a large panel of genes to be screened; alterna-
tively, the new exon capture and ultra-fast sequencing technologies may allow
dozens of genes to be sequenced at a reasonable cost. Technically, the challenge
is identical to the problem of screening a person’s DNA for large numbers of vari-
ants that confer susceptibility or resistance to common complex diseases.
However, data interpretation presents different problems. For susceptibility
screening, the problem is to know the combined risk from many variants, each of
which modifies risk to only a small degree. Risks cannot be simply added or mul-
tiplied; combining them requires a quantitative model of the effect of each vari-
ant on overall cell biology. For heterogeneous Mendelian conditions this is not a
problem: one mutation is responsible for the condition. The problem is the large
number of unclassified variants that will undoubtedly be identified.
592 Chapter 18: Genetic Testing of Individuals
Shown here are three stages in the investigation of a late-onset her unaffected chromosome. Her affected chromosome, inherited
autosomal dominant disease where, for one reason or another, direct from her dead father, must be the one that carries marker allele 1.
testing for the mutation is not possible. • By typing III2 and her father, we can work out which marker allele
• Individual III2 (arrow), who is pregnant, wishes to have a she received from her mother. If she is 2–1 or 2–3, it is good
presymptomatic test to show whether she has inherited news: she inherited marker allele 2 from her mother, which is the
the disease allele. The first step is to tell her mother’s two grandmaternal allele. If she types as 1–1 or 1–3 it is bad news:
chromosomes apart. A marker, closely linked to the disease locus, she inherited the grandpaternal chromosome, which carries the
is found for which II3 is heterozygous (2–1). disease allele.
• Next we must establish phase—that is, work out which marker Note that it is the segregation pattern in the family, and not the
allele in II3 is segregating with the disease allele. The maternal actual marker genotype, that is important: if III2 has the same marker
grandmother, I2, is typed for the marker (2–4). Thus, II3 must have genotype, 2–1, as her affected mother, this is good news, not bad
inherited marker allele 2 from her mother, which therefore marks news, for her.
I
2–4 2–4
II
2–1 2–1 2–1 1–3
III
2–1
2–3
1–1
1–3
a greater risk is unexpected locus heterogeneity, so that the true disease locus in
the family is actually different from the locus being tracked.
Bayesian calculations give a quick answer for simple pedigrees, but the calcu-
lations can get very elaborate for more complex ones. Few people feel fully confi-
dent of their ability to work through a complex pedigree correctly, although the
attempt is a valuable mental exercise for teasing out the factors contributing to
the final risk. An alternative is to use a linkage analysis program.
Using linkage programs for calculating genetic risks
At first sight it may seem surprising that a program designed to calculate lod
scores can also calculate genetic risks, but in fact the two are closely related
(Figure 18.22). As described in Chapter 14, linkage analysis programs are gener-
al-purpose engines for calculating the likelihood of a pedigree, given certain data
and assumptions. For calculating the likelihood of linkage we calculate the ratio
likelihood of data | linkage, recombination fraction q
likelihood of data | no linkage (q = 0.5)
For estimating the risk that a proband carries a disease gene, we calculate the
ratio
likelihood of data | proband is a carrier, recombination fraction q
likelihood of data | proband is not a carrier, recombination fraction q
As in Box 18.2, the vertical line | means given that.
(A) (B)
M C F1 F2
suspects
specimen
victim
1 2 3
D2S1338 2 +
TPOX 2 +
D3S1358 3 + +
FGA 4 + + +
CSF1PO 5 +
D5S818 5 +
F13A1 6 +
D7S820 7 +
D8S1179 8 + + +
THO1 11 + + + +
VWA 12 + + + +
D13S317 13 +
FES/FPS 15 +
D16S539 16 + +
D18S51 18 + + +
D19S433 19 +
D21S11 21 + + +
Amelogenin X, Y + + +
Current panels are CODIS and SGM+. The UK first panel and SGM panel are no longer used; they are
included to illustrate how testing panels have evolved over time. The match probability of a profile
is the probability that two unrelated individuals would have the same profile by random chance. It is
calculated by multiplying together the chance of a match for each individual marker allele.
assuming p to be the same for every bin and ignoring the need to match on
intensity as well as position). Even for p = 0.2, p10 is only 10–7. It is imperative
that the same binning criteria be used for judging matches between two pro-
files and for calculating p. The criteria can be arbitrary within certain limits,
but they must be consistent.
X Y 13 14 29 30 16 18
D3S1358
HUMVWA D16S539 D2S1338
16 17 15 18 11 12 18 19
14 6 9.3 15 18
Figure 18.24 A DNA profile produced using the SGM+ set of markers. The SGM+ multiplex uses three different colored labels for
primers to make the results clearer. Numbers in boxes are the number of repeat units in each microsatellite allele. [From Jobling MA
& Gill P (2004) Nat. Rev. Genet. 5, 739–751. With permission from Macmillan Publishers Ltd.]
the complete genotype from a single definable ancestor. For the Y chromosome,
a panel of short tandem repeat markers is used, whereas mitochondrial geno-
types are defined by SNPs. An interesting example was the identification of the
remains of the Russian tsar and his family, killed by the Bolsheviks in 1917, by
comparing DNA profiles of excavated remains with those of living distant rela-
tives (see Further Reading). These markers are also sometimes used in criminal
investigations, particularly when the question arises of whether the criminal
might be a relative of somebody whose profile is stored on a forensic database.
Genetic markers provide a much more reliable test of zygosity. The extensive
but now obsolete literature on using blood groups for this purpose is summa-
rized by Race & Sanger (see Further Reading). DNA profiling is nowadays the
method of choice. The Jeffreys fingerprinting probe provided an immediate
impression: samples from MZ twins look like the same sample loaded twice, and
samples from DZ twins show differences. When single-locus markers are used, if
twins give the same types, then for each locus the probability that DZ twins would
type alike is calculated. If the parents have been typed, this follows from Mendelian
principles; otherwise, the probability of DZ twins typing the same must be calcu-
lated for each possible parental mating and weighted by the probability of that
mating calculated from population gene frequencies. The resultant probabilities
for each (unlinked) locus are multiplied, to give an overall likelihood PI that DZ
twins would give the same results with all the markers used. The probability that
the twins are MZ is then
Pm = m/[m + (1 – m)PI ],
where m is the proportion of twins in the population who are MZ (about 0.4 for
same-sex pairs).
our physical appearance is also encoded in our DNA. People dream of a DNA
photofit—a portrait (and maybe surname) of the criminal revealed by the DNA
left at the crime scene! Considerable development work is taking place toward
that goal, but it remains far distant. Currently only eye and hair color can be even
roughly predicted. It is also worth remembering that, whatever might be possible
with a good blood sample, crime-scene DNA is often too limited in quantity and
quality to allow extensive genotyping.
Courtroom issues
If the genotypes do all match, the court needs to know the odds that the criminal
is the suspect rather than a random member of the population. Of course, if the
alternative were the suspect’s brother or the suspect’s identical twin, the odds
would look very different. It is also important to remember that a DNA match
may prove beyond reasonable doubt that the suspect had been present at the
crime scene, but (apart perhaps in rape cases) it cannot prove what he or she did
there. By itself, a DNA match is not normally enough to secure a conviction.
The fate of DNA evidence in courts provides a fascinating insight into the dif-
ference between scientific and legal cultures. Suppose that a suspect’s DNA pro-
file matches the scene-of-crime sample. There are still at least three obstacles to
a rational use of the DNA data.
First, the jury may simply not believe, or may perhaps choose to ignore, the
DNA data, as evidently happened in the O.J. Simpson trial. Maybe they will decide
the incriminating DNA was planted—a not unreasonable proposition.
Second, an unscrupulous lawyer may try to lead the jury into a false probabil-
ity argument, the so-called Prosecutor’s Fallacy. This consists of confusing the
probability that the suspect is innocent, given the match, with the probability of
a match, given that the suspect is innocent. The jury should consider the first
probability, not the second. As Box 18.3 shows, the two are very different.
Finally, objections may be raised to some of the principles by which DNA-
based probabilities are calculated:
• The multiplicative principle, that the overall probabilities can be obtained by
multiplying the individual probabilities for each allele or locus, depends on
the assumption that genotypes are independent. If the population actually
consisted of reproductively isolated groups, each of whom had genotypes
that were fairly constant within a group but very different between groups,
Suppose that a court case has good evidence that the person whose DNA was found at the crime
scene committed the crime. The suspect’s DNA profile matches the crime-scene sample. Does that
make the suspect guilty? Consider two different probabilities:
• The probability that the suspect is innocent, given the match.
• The probability of a match, given that the suspect is innocent.
Using Bayesian notation (see Box 18.2) with M = match, G = suspect is guilty, and I = suspect
is innocent, the first probability is PI|M and the second is PM|I. The Prosecutor’s Fallacy consists of
arguing that the relevant probability is PM|I when in fact it is PI|M. The calculation below shows
how different these two probabilities are.
If the suspect were guilty, the samples would necessarily match: PM|G = 1. Let us suppose
that population genetic arguments say there is a 1 in 106 chance that a randomly selected person
would have the same profile as the crime-scene sample: PM|I = 10–6. Suppose that the guilty
person could have been any one of 107 men in the population. If there is no other evidence to
implicate him, he is simply a random member of the population, and the prior probability that
he is guilty (before considering the DNA evidence) is PG = 10–7. The prior probability that he is
innocent is PI = 1 – 10–7, which is very close to 1.
Bayes’s theorem tells us that
PI|M = (PI ¥ PM|I ) / [(PI ¥ PM|I ) + (PG ¥ PM|G)]
= 10–6/(10–6 + 10–7)
= 1.0/1.1
= 0.9.
We already saw that PM|I = 10–6—quite a difference!
Courts make fools of themselves by ignoring compelling DNA evidence—but this calculation
also shows that DNA evidence could not safely convict somebody unless PM|I was well below 10–6.
Plans to screen all men in a large town to find a rapist need to take account of this.
DNA PROFILING 601
CONCLUSION
In this chapter we have reviewed some of the many ways in which a DNA sequence
can be checked for the presence of known or unspecified variants.
For cases in which it is necessary to scan one or more candidate genes for any
variant, DNA sequencing is the main tool. Normally, genomic DNA is sequenced,
but cDNA may be preferable for large genes with numerous small exons, or if
abnormal splicing is suspected. However, RNA requires more careful handling
than DNA. DNA or RNA samples are generally amplified by PCR (RT-PCR in the
case of RNA). If it is important to reduce the sequencing load, various techniques
allow a quick scan of exons of a gene, allowing sequencing to be restricted to
those that show some variant. These techniques exploit differences in the prop-
erties of DNA heteroduplexes or in the conformation of single-stranded DNA.
Alternatively, microarrays can be used to scan a whole gene sequence for vari-
ants. The continuing rapid improvements in sequencing technology have made
all these alternatives to sequencing less attractive; for most diagnostic laborato-
ries the main bottleneck in mutation analysis is now in analyzing the sequence
data rather than in generating it.
Another group of tests are designed to detect specific mutations in a particu-
lar gene. Commercial kits for such tests are increasingly available. Allele-specific
oligonucleotide probes can be used in reverse dot-blot hybridization or, on a
large scale, as probes on a DNA microarray. In ARMS (amplification refractory
mutation system) testing, pairs of allele-specific primers are used in PCR to dis-
criminate between normal and mutant sequences. In the oligonucleotide liga-
tion assay (OLA), mutations prevent the DNA ligation of two paired oligonucle-
otides and thus PCR amplification of the test sequence. Specialized sequencing
technologies such as pyrosequencing or minisequencing identify just one or a
few nucleotides adjacent to a primer, thus allowing specific mutations to be
detected. The same microarray techniques used for mutation detection can also
be used to search whole genomes for specific mutations or SNPs.
Special tests are required to identify heterozygous deletions or copy-number
changes in genomic DNA. Multiplex ligation-dependent probe amplification
(MLPA), a variant of the OLA test, is a popular method for detecting deletions or
duplications of exons within a gene. At the level of whole chromosomes, quanti-
tative PCR of a panel of fluorescence-labeled microsatellites (QF-PCR) is often
used as an alternative to standard karyotyping for prenatal diagnosis of the com-
mon fetal trisomies. For genomewide detection of copy-number changes, array-
comparative genomic hybridization or high-resolution SNP chips can be used.
The older technique of gene tracking in a family pedigree is still useful in some
situations, but care needs to be taken in the choice of DNA marker and calcula-
tion of the risk.
DNA profiles, consisting of genotypes for a panel of polymorphic markers, are
of great use in identifying individuals and determining relationships. The older
DNA fingerprinting technique, based on hypervariable minisatellites, has been
superseded by single-locus profiling using standardized panels of polymorphic
microsatellites. The main use of DNA profiling is in criminal investigations, in
which profiling and the associated police DNA databases have revolutionized
practice but have also raised concerns about possible infringements of civil liber-
ties. Courts have not always found it easy to handle the scientific questions raised
by DNA profiling within an adversarial legal framework. It is important that they
should recognize the limitations, as well as the powers, of DNA profiling. Other
uses for DNA profiling include resolving cases of disputed paternity, identifying
the origin of clinical samples (e.g. between mother and fetus, or transplant donor
and recipient), and determining the zygosity of twins.
The emphasis in this chapter has been on the techniques used to genotype a
sample; except for the problem of unclassified variants there was little need to
discuss the significance of the result. However, the same techniques can be used
in situations where there is much more uncertainty about how useful the results
might be. One such case is the use of genetic tests to predict somebody’s response
to a drug. Individual variations in response often have a genetic basis, and there
are hopes that in the future genetic testing will lead to more efficient prescribing,
with the ultimate goal of personalized medicine. How realistic is this goal? Further
questions arise from the use of genetic tests in population screening. This raises
FURTHER READING 603
FURTHER READING
Best practice guidelines program that combines the biophysical characteristics of
American College of Medical Genetics. http://www.acmg.net/ amino acids and protein multiple sequence alignments to
Clinical Molecular Genetics Society (UK). http://cmgsweb.shared predict where missense substitutions in genes of interest fall
.hosting.zen.co.uk/ in a spectrum from deleterious to neutral.]
Cystic fibrosis mutation database. http://www.genet.sickkids J. Craig Venter Institute: SIFT (Sorting Intolerant from Tolerant).
.on.ca/cftr/ [The best source of overview information about http://sift.jcvi.org/ [Predicts whether an amino acid
cystic fibrosis mutations.] substitution affects protein function on the basis of sequence
Leiden muscular dystrophy pages. http://www.dmd.nl/ homology and the physical properties of amino acids.]
[A comprehensive and well-maintained source of information PolyPhen (Polymorphism Phenotyping). genetics.bwh.harvard
on all molecular aspects of muscular dystrophy.] .edu/pph/ [A tool that predicts the possible impact of an
amino acid substitution on the structure and function of a
RNA analysis human protein.]
Andreutti-Zaugg C, Scott R & Iggo R (1997) Inhibition of
Methods for testing for a specified sequence
nonsense-mediated messenger RNA decay in clinical samples
facilitates detection of human MSH2 mutations with an Fakhrai-Rad H, Pourmand N & Ronaghi M (2002) Pyrosequencing:
in vivo fusion protein assay and conventional techniques. an accurate detection platform for single nucleotide
Cancer Res. 57, 3288–3293. [Deals with the use of puromycin polymorphisms. Hum. Mutat. 19, 479–485.
to identify mRNAs containing premature termination Liu Q & Sommer SS (2004) PAP: detection of ultra rare mutations
codons.] depends on P* oligonucleotides: “sleeping beauties”
awakened by the kiss of pyrophosphorolysis. Hum. Mutat.
Methods for scanning a gene for mutations 23, 426–436.
Cotton RGH, Edkins E & Forrest S (eds) (1998) Mutation Detection: Monforte JA & Becker CH (1997) High-throughput DNA analysis
A Practical Approach. IRL Press. [Individual chapters review a by time-of-flight mass spectrometry. Nat. Med. 3, 360–362.
wide range of methods.]
Keen J, Lester D, Inglehearn C et al. (1991) Rapid detection of Large-scale SNP genotyping
single base mismatches as heteroduplexes on Hydrolink gels. Affymetrix GeneChip. http://www.affymetrix.com/technology/
Trends Genet. 7, 5. index.affx [Requires a log-in.]
Sheffield VC, Beck JS, Kwitek AE et al. (1993) The sensitivity of Fan J-B, Oliphant A, Shen R et al. (2003) Highly parallel SNP
single strand conformational polymorphism analysis for the genotyping. Cold Spring Harbor Symp Quant Biol. 68, 69–78.
detection of single base substitutions. Genomics 16, 325–332. Gut IG (2004) DNA analysis by MALDI-TOF mass spectrometry.
Sheffield VC, Beck JS, Nichols B et al. (1992) Detection of Hum. Mutat. 23, 437–441.
multiallele polymorphisms within gene sequences by Hardenbol P, Yu F, Belmont J et al. (2005) Highly multiplexed
GC-clamped denaturing gradient gel electrophoresis. Am. J. molecular inversion probe genotyping: over 10,000 targeted
Hum. Genet. 50, 567–575. SNPs genotyped in a single tube assay. Genome Res.
van der Luijt R, Khan PM, Vasen H et al. (1994) Rapid detection 15, 269–275.
of translation-terminating mutations at the adenomatous Illumina BeadArray. http://www.illumina.com/pages.ilmn?ID=5
polyposis coli (APC ) gene by direct protein truncation test. Stanssens P, Zabeau M, Meersseman G et al. (2004) High-
Genomics 20, 1–4. throughput MALDI-TOF discovery of genomic sequence
Wallace AJ, Wu CL & Elles RG (1999) Meta-PCR: a novel method polymorphisms. Genome Res. 14, 126–133.
for creating chimeric DNA molecules and increasing the Tõnisson N, Zernant J, Kurg A et al. (2002) Evaluating the
productivity of mutation scanning techniques. Genet. Test. arrayed primer extension resequencing assay of TP53 tumor
3, 173–183. suppressor gene. Proc. Natl Acad. Sci. USA 99, 5503–5508.
Various authors (2002) The Chipping Forecast II. Nat. Genet.
Analysis of DNA methylation 32 (Suppl), 465–551. [Reviews of microarray-based
Ehrlich M, Nelson MR, Stanssens P et al. (2005) Quantitative high- technologies.]
throughput analysis of DNA methylation patterns by base-
specific cleavage and mass spectrometry. Proc. Natl Acad. Sci. Gene tracking
USA 102, 15785–15790. Bridge PJ (1997) The Calculation Of Genetic Risks: Worked
Laird PW (2003) The power and the promise of DNA methylation Examples In DNA Diagnostics, 2nd ed. Johns Hopkins
markers. Nat. Rev. Cancer 3, 253–266. University Press. [A very detailed set of calculations covering
Zeschnigk M, Lich C, Buiting K et al. (1997) A single-tube PCR test almost every conceivable situation in DNA diagnostics.]
for the diagnosis of Angelman and Prader–Willi syndrome Young ID (2006) An Introduction To Risk Calculation In Genetic
based on allelic methylation differences at the SNRPN locus. Counseling. Oxford University Press. [A clinically oriented set
Eur. J. Hum. Genet. 5, 94–98. [An example of methylation- of risk calculations.]
specific PCR.]
DNA profiling
The problem of unclassified variants Gill P, Ivanov PL, Kimpton C et al. (1994) Identification of the
International Agency for Research on Cancer: Align-GVGD. remains of the Romanov family by DNA analysis. Nat. Genet.
agvgd.iarc.fr/agvgd_input.php [A freely available, Web-based 6, 130–135.
604 Chapter 18: Genetic Testing of Individuals
Quality assurance
Organisation for Economic Co-operation and Development. OECD
Guidelines for quality assurance in molecular genetic testing.
http://www.oecd.org/dataoecd/43/6/38839788.pdf
Chapter 19
19
Pharmacogenetics,
Personalized Medicine,
and Population Screening
KEY CONCEPTS
• Medicine in the post-genomic era holds the promise of testing of individuals to deliver
personalized medicine and screening of populations to determine susceptibility to
common diseases.
• Any clinical test, including genetic tests, should be evaluated by first considering its
analytical validity, which includes its sensitivity, specificity, and predictive value.
• A broader evaluation of a clinical test should consider clinical validity, clinical utility, and
ethical implications.
• The most crucial question about a proposed test is how often the result, in combination
with other available data, will move participants across a threshold for possible
further action.
• Pharmacogenetics is concerned with the way in which genetic variations between
individuals cause different responses to a drug.
• Many pharmacogenetic variations affect the rate of absorption or metabolism of drugs
(pharmacokinetics).
• Other pharmacogenetic variations affect the response of a drug target to a given level of
the drug (pharmacodynamics).
• One aim of personalized medicine is to tailor the drug and dosage that an individual
receives to his/her genotype. Obstacles to personalized prescribing include the inadequate
predictive power of most genotypes and the need to develop rapid real-time bedside
genotyping methods.
• Genotype-specific prescribing is closest to reality in cancer medicine, in which some drugs
are already prescribed only after a tumor specimen has been genotyped.
• Expression profiling of tumors may eventually guide treatment, but its value has yet to be
confirmed by large prospective studies.
• Testing for susceptibility to complex diseases is a growing but largely unregulated industry.
Tests marketed directly to consumers over the Internet are often not supported by evidence
of clinical utility.
• Population screening usually aims at defining a high-risk subset of the population. This will
almost certainly include many individuals who do not, in fact, have the condition being
screened for. People in the high-risk group are then offered some intervention, most usually
a definitive diagnostic test.
• Screening tests usually involve a trade-off between sensitivity and specificity, with an
arbitrary intervention level being defined to optimize the compromise. Screening programs
need very careful consideration to ensure that they do more good than harm.
• A vision for the future role of genetics in medicine is to move from the current reactive
‘diagnose and treat’ system to a proactive ‘predict and prevent’ system. Whether this will be
achievable, and if so when it will happen, are extremely contentious and
unresolved questions.
606 Chapter 19: Pharmacogenetics, Personalized Medicine, and Population Screening
Test positive a c
Test negative b d
these cases the analytical validity covers just laboratory quality issues. It should
not be taken for granted. In 2001, investigators in California sent hair samples
from the same person to six of the leading US laboratories that offered hair min-
eral analysis to people concerned about nutritional deficiencies. The results from
the different laboratories were grossly disparate and overall not far from random.
Before relying on results from a laboratory it is advisable to check that it is enrolled
in a reputable quality assurance scheme.
The measures become more interesting if the test is defined as the totality of
what the laboratory does to try to answer a clinical question. In Chapter 18 we
distinguished between testing for a specific predefined sequence change and
scanning an entire gene for mutations. The clinically relevant question is usually
whether any mutation has been detected anywhere in the relevant gene. The sen-
sitivity of the test will depend on how hard we look. Will we sequence every exon,
or will we rely on a method such as single strand conformation polymorphism or
melting curve analysis? Will we use multiplex ligation-dependent probe amplifi-
cation (MLPA) to check for whole exon deletions, in addition to sequencing
exons? Scanning the entire gene also raises the problem of unclassified variants,
discussed in Chapter 18. If a nonpathogenic variant is mistakenly reported as a
positive test result, the specificity of the test is reduced.
In the end, the value of a test must be judged by the changes it brings about.
Whatever the performance of a test according to the criteria outlined above, in
the end what matters is what it achieves for patients. Does it change what people
do? For most conditions there will be some threshold for action—for taking
X-rays, doing a biopsy, or prescribing a drug. The result of any single test will be
considered alongside all other relevant information—the age and sex of the
patient, their general health and social situation, and the results of any other tests.
A useful test is one that makes a substantial contribution to moving people across
the action threshold, in either direction. In this connection, absolute risk is much
more important than relative risk. A test may alter somebody’s risk tenfold, but if
the effect is only to change the risk from 1 in 10,000 to 1 in 1000 it will probably
not make any practical difference. In contrast, a positive test for a BRCA1 muta-
tion has a relative risk of only about 7 (80% for a carrier versus 12% general popu-
lation risk), but the test result is important because of the high absolute risk.
derivative of the substrate. The P450 cytochromes probably evolved to deal with
toxic plant metabolites, and variants may be important in determining the
response to chemicals in the diet and environment, as well as to prescribed drugs.
The enzymes are found mainly in the liver, but some occur in the small intestine
and elsewhere. Some drugs have an additional effect of inducing or inhibiting
specific P450 enzymes, which can lead to unforeseen interactions between these
drugs and those that are substrates of the enzyme. The Cytochrome P450 Drug
Interaction Table (see Further Reading) maintains comprehensive lists.
Humans have about 60 P450 genes, encoding enzymes that, between them,
are responsible for the phase 1 metabolism of maybe 60% of all prescribed drugs.
They are grouped into several families. Genes have names such as CYP2D6 (cyto-
chrome P450 family 2, subfamily D, polypeptide 6). At least 10 different P450
enzymes have significant roles in phase 1 drug metabolism. Many of them can
metabolize a considerable range of different drugs, and individual drugs are often
substrates for more than one P450 enzyme. These wide and overlapping specifi-
cities mean that there are not always close correlations between any one P450
activity in a patient and his/her handling of a specific drug. Nevertheless, some
drugs are metabolized by just one P450 enzyme, and in pharmacological research
these drugs can be used as probes to identify individuals with unusual functional
variants of that enzyme.
CYP2D6
Back in the 1970s, it was noted that some individuals showed markedly enhanced
sensitivity to the antihypertensive drug debrisoquine and to the anti-arrhythmic
drug sparteine. On investigation, these individuals showed high plasma levels of
the drugs but low urinary levels of the catabolism products, implying that they
were failing to metabolize and excrete the drugs. The cause was eventually identi-
fied as low activity of the CYP2D6 enzyme. Individuals can be classified into poor,
intermediate, extensive, and ultra-rapid metabolizers. Figure 19.2 shows how
CYP2D6 activity governs the effective dose of the antidepressant drug nortriptyl-
ine, which is metabolized by this enzyme.
CYP2D6 is involved in the metabolism of perhaps 25% of all drugs. Variation
in its activity has significant effects on the response to some beta-blockers used
to treat hypertension and heart disease, and to several psychiatric drugs includ-
ing tricyclic antidepressants. Poor metabolizers are at risk of overdose effects of
these drugs. In contrast, CYP2D6 is also required to convert codeine into its active
form, morphine. Codeine is ineffective in pain relief for poor metabolizers,
whereas ultra-rapid metabolizers risk adverse effects such as sedation and
impaired breathing.
90
Figure 19.2 Pharmacogenetics of
80
debrisoquine and nortriptyline. The
70 activity of the CYP2D6 enzyme is measured
by the metabolic ratio (MR), which is the ratio
number of patients
60
of amounts of a substrate drug, debrisoquine
50 and its metabolic product in urine after
a standard dose of the drug. High ratios
40
show poor conversion due to low enzyme
30 activity. The graph shows observed ratios
for a range of patients. The same enzyme is
20
largely responsible for phase 1 catabolism
10 of the antidepressant drug nortriptyline.
0
Depending on the CYP2D6 phenotype as
0.01 0.1 1 10 100 measured by the metabolic ratio, patients
MR require different doses of nortriptyline.
nortriptyline dose requirement (mg/day) [Adapted from Meyer UA (2004) Nat. Rev.
>250–500 150–100 20–50 Genet. 5, 669–676. With permission from
nortriptyline (mg) Macmillan Publishers Ltd.]
612 Chapter 19: Pharmacogenetics, Personalized Medicine, and Population Screening
0 CYP2D6
0 24 48 72
hours
7% 10%
13%
The CYP2D6 gene has nine exons and is located on chromosome 22q13.
Sequencing the gene in poor metabolizers has revealed a variety of point muta-
tions, mainly nonsense, frameshift, or splice site changes, and occasional com-
plete gene deletions. Interestingly, the Human Genome Project chromosome 22
70%
reference sequence happened to be from a deletion carrier, so CYP2D6 was not
recorded in the original sequence. Ultra-rapid metabolizers have increased num-
bers of CYP2D6 genes (Figure 19.3).
Other P450 enzymes
Another P450 cytochrome, CYP2C9, hydroxylates various drugs, including non- CYP2C9
steroidal anti-inflammatory drugs, sulfonylureas, inhibitors of angiotensin- 4%
converting enzyme, and oral hypoglycemics. For example, rare poor metaboliz-
ers have an exaggerated response to tolbutamide, a hypoglycemic agent that is
used to treat type 2 diabetes. The role of CYP2C9 variants in hypersensitivity to
warfarin is discussed below in the section on personalized medicine.
38%
CYP2C19 metabolizes a variety of drugs, including anti-convulsants such as
58%
mephenytoin, proton pump inhibitors such as omeprazole (used to treat stom-
ach ulcers), proguanil (an antimalarial), and certain antidepressants. Subjects
can be classified phenotypically into poor, intermediate, and extensive metabo-
lizers (Figure 19.4). Intermediate metabolizers are compound heterozygotes for
an active and an inactive allele; poor metabolizers have two inactive alleles. The
frequency of poor metabolizers is 3–5% among Caucasians and African-
Americans, but higher in Orientals and very high in Polynesians. Mephenytoin
and omeprazole are degraded by CYP2C19, so poor metabolizers require a lower CYP2C19
dose. A particular risk for poor metabolizers is unacceptably prolonged sedation 12%
with a standard dose of diazepam as a result of slow demethylation by CYP2C19.
In contrast, proguanil is a prodrug that requires activation by CYP2C19 to form
the active molecule, cycloguanil; poor metabolizers therefore show a decreased 37%
effect of that drug.
CYP3A4 is another P450 cytochrome involved in the metabolism of maybe
40% of all drugs. Its activity in the liver varies up to thirtyfold between individu- 51%
als, but the cause of this variability has been hard to identify. Coding sequence
variants are not common. The enzyme is inducible, and it may be that regulatory
effects are the main cause of the variability. A further complication is that the
activity of CYP3A4 strongly overlaps with that of yet another P450 enzyme,
CYP3A5.
poor metabolizer
Figure 19.4 Population frequencies of P450 activity variants.
The distributions differ in different populations. [Data for CYP2D6 adapted intermediate metabolizer
from Meyer UA (2004) Nat. Rev. Genet. 5, 669–676. Data for CYP2C9 for northern
Europeans from Service RF (2005) Science 308, 1858–1860. Data for CYP2C19 extensive metabolizer
for Taiwanese from Liou YH, Lin CT, Wu YJ & Wu LS (2006) J. Hum. Genet.
ultra-rapid metabolizer
51, 857–863.]
PHARMACOGENETICS AND PHARMACOGENOMICS 613
Other P450 enzymes with important roles in drug metabolism include CYP1A2
and maybe CYP2A6. Figure 19.4 shows the frequencies of variants for three of the
most important P450 enzymes.
24
adverse effect of the drug. Other drugs for which variations in acetylation rate SH
can be clinically important include procainamide (an anti-arrhythmic), hydrala- O
zine (an antihypertensive), dapsone (anti-leprosy), and several sulfa drugs. H2N NH COOH
NH
COOH
However, much depends on precisely what response is being studied. The review
by Evans & McLeod (see Further Reading) gives details.
The ADRB1 gene located on chromosome 10q24–26 encodes the beta-1 recep-
tor. This has only limited homology to the beta-2 receptor and is targeted by dif-
ferent drugs (Figure 19.7). A common polymorphism, p.Arg389Gly, has been
tested for pharmacodynamic effects. The Gly389 allele is associated with a
decreased cardiovascular response to beta-blocker drugs. For example, Arg389
homozygotes had a much better response to the beta-blocker bucindolol than
those who are heterozygous or homozygous for the Gly389 allele.
Variations in the angiotensin-converting enzyme
The angiotensin-converting enzyme (ACE) is a peptidase that converts angi-
otensin I into angiotensin II. The latter is an important regulator of blood pres-
sure and has many other physiological functions. A polymorphic insertion/dele-
tion of an Alu sequence in an intron of the ACE gene is associated with variations
in enzyme activity (Figure 19.8). DD (deletion) homozygotes have about twice
the level of circulating ACE than II (insertion) homozygotes.
ACE inhibitors such as enalapril and captopril are widely used drugs for treat-
ing heart failure. Several reports suggest that ACE inhibitors are more effective in
DD than II patients. The insertion–deletion polymorphism has also been exten-
sively studied for association with diseases. DD homozygotes are at increased
risk of myocardial infarction and coronary artery disease, and possibly also for
complications of type 2 diabetes, but at slightly decreased risk of late-onset
Alzheimer disease.
Variations in the HT2RA serotonin receptor
The serotonin (5-hydroxytryptamine) receptor 2A has been a target for many
investigations because of the central role of serotonin in many neurological proc-
esses. Response to psychiatric drugs is notoriously unpredictable, and several
studies have looked for associations between HT2RA variants and variations in
I allele
insertion
Alu
deletion
Figure 19.8 An insertion–deletion
D allele
polymorphism in the ACE gene that
vasoconstriction: encodes the angiotensin-converting
angiotensin-converting enzyme blood pressure enzyme. Insertion (I) alleles have an
insertion of an Alu element into an intron
angiotensin I renal effects: of the gene. Homozygotes for the deletion
angiotensin II
(inactive) retention of Na+,Cl–
DRVYIHPF allele (D) have higher levels of circulating
DRVYIHPFHL excretion of K+
enzyme than those with the I allele.
secretion of Peptide sequences are shown with the
dipeptide cleaved off aldosterone single-letter code.
616 Chapter 19: Pharmacogenetics, Personalized Medicine, and Population Screening
response. Two variants, p.Ile197Val and p.His452Tyr, have been associated with a
poor response to the antipsychotic drug clozapine, whereas an intronic SNP,
rs7997012, has been associated with variable response to the antidepressant
citalopram. There are sometimes conflicting results regarding possible associa-
tions of HT2RA variants with conditions including schizophrenia, obsessive–
compulsive disorder, and alcohol dependence.
Malignant hyperthermia and the ryanodine receptor
Rare individuals have a life-threatening response to inhalation anesthetics such
as halothane or isoflurane. They suffer severe muscle damage (rhabdomyolysis)
and an extreme rise in temperature. This condition, known as malignant hyper-
thermia (MH), is genetic but heterogeneous. Many, but not all, cases have muta-
tions in the ryanodine receptor gene, RYR1, on chromosome 19q13. This gene
encodes a calcium release channel in the muscle sarcoplasmic reticulum. It is
thought that MH results when a mildly temperature-sensitive ryanodine recep-
tor allows excessive calcium flow, which causes the temperature to rise and initi-
ates a catastrophic positive feedback loop.
Warfarin
Warfarin is a powerful anticoagulant, used for patients at risk of embolism or
thrombosis (and also to poison rats and mice by triggering internal bleeding). It
works by decreasing the availability of vitamin K. This is an essential cofactor for
the enzyme g-glutamyl carboxylase that activates blood clotting factors II (pro-
thrombin), VII, IX, and X. During these reactions, vitamin K is converted to the
inactive vitamin K epoxide, and this is recycled by the action of vitamin K epoxide
reductase (VKOR). Warfarin inhibits VKOR (Figure 19.9).
Achieving the correct level of anticoagulation is clinically very important. If
the level is too low, the patient remains at risk of thrombosis or embolism; how-
ever, if it is too high, there is a risk of hemorrhage, which can be life-threatening.
The therapeutic window (the range of doses that are efficacious and not harmful)
is narrow. After insulin, warfarin is the most common prescription drug respon-
sible for emergency hospital admissions. The average cost of a bleeding episode
has been estimated as $16,000. But the effective warfarin dose varies up to twen-
tyfold between individuals. This has made warfarin something of a test case for
the general utility of pharmacogenetically informed prescribing.
Warfarin as normally used is a mixture of two stereoisomers, R-warfarin and
S-warfarin. Both isomers are active, but the S form is about three times as potent
as the R form. CYP2C9 is the principal enzyme that catalyzes the conversion of
S-warfarin to inactive 6-hydroxy and 7-hydroxy metabolites, whereas the oxida-
tive metabolism of R-warfarin is catalyzed mainly by CYP1A2 and CYP3A4. Vari-
ants in CYP2C9 and VKORC1 (the gene encoding subunit 1 of VKOR), together
with a limited set of environmental factors, can explain about 50–60% of the vari-
ability of response to warfarin. However, as Figure 19.10 shows, several other
LIVER CELL
6-hydroxywarfarin
7-hydroxywarfarin
8-hydroxywarfarin
CYP1A1
10-hydroxywarfarin CYP2C9
CYP1A2
CYP2C19
CYP3A4
CYP2C9
R-warfarin S-warfarin 6-hydroxywarfarin
CYP1A2
CYP4F2
CYP2C8
CYP2C9
CYP1A1
CYP2C18
7-hydroxywarfarin dehydrowarfarin
Figure 19.10 The catabolism of warfarin.
Warfarin, as prescribed, is a mixture of two
4-hydroxywarfarin stereoisomers, R- and S-warfarin, each
of which can be catabolized by several
different P450 enzymes. As a result, the rate
of catabolism is not wholly governed by
ABCB1 the activity of any one enzyme, although
elimination
via kidney
CYP2C9 has the major effect. (Courtesy of
elimination PharmGKB. With permission from PharmGKB
via bile and Stanford University.)
618 Chapter 19: Pharmacogenetics, Personalized Medicine, and Population Screening
P450 enzymes have a role in removing warfarin. In 2007, the US Food and Drug
Administration put wording on the label of warfarin that recommended (but did
not mandate) genotyping these two loci. Many clinicians, however, remain skep-
tical about the value of genotyping their patients, because CYP2C9 and VKORC1
genotypes do not fully predict the response. A major goal in pharmacogenetics is
to identify the remaining factors and develop a simple test that will accurately
predict the correct dose of warfarin for a patient. The review by Wadelius and
Piromohamed (see Further Reading) describes 30 other genes in which variation
might explain some of the variable effect, and discusses the prospects for better
prediction.
microdosing study
a single very small dose is given to maybe 10–15 healthy volunteers to get preliminary data on the
pharmacodynamics and pharmacokinetics (this step may be omitted)
Nowadays companies try to avoid compounds that are metabolized by the most
variable enzymes. Many of the drugs described above, for which pharmacoge-
netic effects are important, were developed years ago when these effects were
less well understood. They are also now off-patent, so that the company has little
financial incentive to develop a genetic test.
Predicting adverse effects is particularly important. It can be both a financial
and a public relations disaster for a company if an adverse effect becomes appar-
ent only after a drug has been brought to market. There will probably be litigation
and claims for compensation. Vioxx® (rofecoxib), a nonsteroidal anti-inflamma-
tory drug used to treat arthritis and other forms of pain, is a recent example. Vioxx
was withdrawn from the market in 2004 by its manufacturer, Merck, because of
evidence that high doses could increase the risk of heart attacks and stroke. Sales
in the previous year had generated $2.5 billion in revenue in the USA alone. If it
could be shown that only patients with certain genotypes were vulnerable to the
adverse effect, the company would have a huge incentive to market the drug in
combination with a genetic test. But identifying rare genotypes that predispose
to adverse effects can be difficult. If only a few patients in a phase III clinical trial
experience these effects, there will be little statistical power to identify the cause;
moreover, obtaining the necessary DNA for investigation may be complicated
and require extra consent.
O N
P P CYTOSOL
ATP P P ATP * * ATP
TK
P P
gefitinib blocks access of ATP to TK
P P domains bearing certain specific mutations (*)
cross-phosphorylation
by activated
kinase domains
Figure 19.12 A genotype-specific drug.
The anti-cancer drug gefitinib (Iressa)
activation of intracellular intracellular signaling targets mutant epidermal growth factor
signaling cascade blocked (EGF) receptors. (A) The EGF receptor is
a transmembrane tyrosine kinase (TK).
In response to ligand binding, receptors
Gefitinib for lung and other cancers that have EGFR mutations dimerize, then use ATP to autophosphorylate
Iressa® (gefitinib) and Tarceva® (erlotinib) are small-molecule inhibitors of the tyrosine residues. This transmits a mitogenic
epidermal growth factor receptor (EGFR; Figure 19.12). Both drugs selectively signal through an intracellular signal
target EGFR molecules that carry certain tumor-specific mutations. They have transduction cascade. (B) In some small cell
lung cancers, in-frame deletions or point
proved effective in the 10% of non-small cell lung cancers that have certain spe-
mutations in the TK domain lead to excessive
cific mutations in the EGFR gene (an in-frame deletion in exon 19 or a point signaling. The small-molecule drug gefitinib
mutation in codon 858), but ineffective in tumors that lack these mutations. They blocks access of ATP specifically to the
have been used mainly in patients in whom chemotherapy has been tried and mutant EGF receptor molecules, preventing
has failed; recent data suggest that they may also be better than chemotherapy as downstream signaling. Wild-type EGF
first-line treatments for mutation-bearing tumors. Their utility is being explored receptors are not affected.
in a variety of other cancers.
NAT1
normal LIV-1
HNF3A
breast-like XBP1
GATA3
ESR1
PTP4A2
RERG
SCUBE2
5.6
2.8
1.4
1.4
2.8
5.6
4
2
1
2
4
TABLE 19.1 THE SIGNIFICANCE OF A GENETIC TEST RESULT FOR THREE DIFFERENT CONDITIONS
Test CAG repeat number BRCA1/2 mutation T/C SNP rs7903146 in TCF7L2 gene
Risk if test result is positive or high-risk 100% 70–85% 6.5% (a) 36% (b)
Risk if test result is negative or low-risk 0% 12% (population risk) 4.5% (a) 27%(b)
The risks for type 2 diabetes are calculated from frequencies of the risk allele of 0.345 in cases and 0.261 in controls [Denmark B sample of Helgason et
al. (2007) Nat. Genet. 39, 218–225] assuming a person with an age and sex-specific risk before testing of 5% (a) or 30% (b). Test result positive means that
a person has one copy of the high-risk allele.
per-allele odds ratios between 1.2 and 1.5), and even these estimates are likely to
be inflated.” The point about inflation refers to the phenomenon of striking lucky,
discussed in Chapter 15 (see Figure 15.3).
As discussed in Chapter 15, it is not surprising that factors identified through
association studies are weak. To show associations with tagging SNPs, these fac-
tors must have persisted in the population for hundreds of generations. Given
that they must therefore be evolutionarily insignificant, it requires something
special for them to be clinically significant. Possible special factors are an inter-
action with some specific feature of modern life, for example: smoking or a West-
ern diet; a balanced effect whereby susceptibility to one disease is balanced by
resistance to another; balancing selection in favor of heterozygotes; or an effect
entirely limited to people past reproductive age (over the age of 50 years).
TABLE 19.2 THE 10 STRONGEST ASSOCIATIONS FOUND BY THE WELLCOME TRUST CASE CONTROL CONSORTIUM STUDY
OF 500,000 SNPS IN SEVEN COMPLEX DISEASES
Disease Chromosomal location of factor Odds ratio 95% confidence interval
As shown in Figure 15.9, no strong associations were detected for the seventh disease studied, hypertension. MHC, major histocompatibility complex.
Data from Wellcome Trust Case Control Consortium (2007) Nature 447, 661–678.
624 Chapter 19: Pharmacogenetics, Personalized Medicine, and Population Screening
1.0
plotted on a graph (Figure 19.14). The area under the curve is a measure of the
0.9
ability of the test to discriminate between people with and without the condition
being tested. A completely useless test would give a straight line (area under 0.8
curve = 0.5). The more useful a test is, the more the curve bulges up from the 0.7
sensitivity
diagonal, and the greater the area under the curve. A test that discriminated per- 0.6
fectly would have an area under the curve of 1. 0.5
Figure 19.14 shows the ROC curves from simulations by Janssens and col- 0.4
leagues (see below and Further Reading) of genotyping two, three, four, or five 0.3 2 genes (AUC 0.59)
susceptibility factors for a hypothetical complex disease. We see that the area 0.2 3 genes (AUC 0.64)
4 genes (AUC 0.68)
under the curve increases as more tests are added to the battery. The exact shape 0.1 5 genes (AUC 0.70)
of the curves, and the amount of increase in the area under the curve as a new 0
0 0.2 0.4 0.6 0.8 1.0
test is added, depend on the extra risk identified by that test. The curves in the
1 – specificity
figure assume that the successive tests are of variants that confer relative risks of
1.5, 2.0, 2.5, 3.0, and 3.5, in that order. Note that the beneficial effect of adding Figure 19.14 Receiver-operating
extra tests is partly a consequence of the fact that each successive test identifies characteristic (ROC) curve for a simulated
more risk than the preceding test. If they all had the same effect on the overall test battery, showing the increasing
risk, the progression would be less marked. discrimination obtained by testing two,
It is important, however, to bring some population genetics into the argu- three, four, or five independent disease
ment. Suppose there are 20 independent two-allele SNPs, each having one allele susceptibility factors. The ROC curve plots
sensitivity against (1 – specificity) for a test.
that confers a relative risk of 1.05 for a disease and is present at a frequency (q) of
The more discriminating a test is, the greater
0.3 in the population: 51% of people have at least one high-risk allele for any one is the area under the curve (AUC). [From
SNP (q = 0.3, p = 0.7, p2 = 0.49, 1 - p2 = 0.51). That single test result makes little Janssens AC, Pardo MC, Steyerberg EW &
difference to the person’s risk. But if somebody has high-risk alleles at all 20 loci, van Duijn CM (2004) Am. J. Hum. Genet. 74,
their risk will undoubtedly be high. Similarly, if they have only low-risk alleles at 585–588. With permission from Elsevier.]
all 20 loci, their risk will be low. Such results would very probably have clinical
utility. But these are independent loci, and only one person in (0.51)20 = 1.4 per
million people would have the very high risk. One person in (0.49)20 = 0.6 per mil-
lion people would have a very low risk; everybody else would be somewhere in
between.
Testing for several factors must provide more information than testing for just
a single factor. The ROC curves in Figure 19.14 illustrated this. The more tests that
are added to a battery, the greater is its power to detect significantly high or low
risks—but the smaller is the percentage of people who would show those ex-
treme risks. In many ways, the larger a test battery is, the more it resembles test-
ing for a Mendelian condition: it becomes good at defining rare people who are at
notably high (or low) risk. But Mendelian testing is done only when the family
history or clinical signs suggest a high prior risk. The value of a test battery must
depend on its ability to report usefully above-average risks for a usefully large
minority of subjects.
In our example with 20 SNPs, we can use the binomial theorem to calculate
that 0.7% of people would have one or more high-risk alleles at 16 or more of the
20 loci, 7% would have high-risk alleles at 14 or more loci, and 15% at 13 or more
loci. We cannot, however, calculate the relative risk of these people simply by
doing sums with the relative risks at individual loci. This is biology, not arithme-
tic. Different factors may affect the same pathway and either be redundant or act
synergistically. A very detailed systems biology understanding of the pathogene-
sis, coupled with extensive epidemiological data, would be necessary to interpret
these intermediate risks.
Figure 19.15 shows the simulations by Janssens and colleagues (see Figure
19.14) in more detail. With the parameters used in this simulation, it is clear that
the chances of a person being identified as being at very high risk are extremely
small.
0
0 20 40 60 80 100 0 20 40 60 80 100 0 20 40 60 80 100
genetic risk across the population, on the basis of the reported allele frequencies
and per-allele odds ratios. The risk in the highest risk class (homozygous for the
high-risk allele at all seven loci) is 6.3 times the risk in the lowest risk class—but
only six women in a million would fall into either of these classes. The distribu-
tion of genetic risk in cases is only marginally different from the distribution in
the general population. The 10% of the population at highest risk accounts for
only 15% of cases.
Clearly, genotyping women for these seven factors would contribute little to
defining a high-risk group for special attention. Maybe these genotypes could be
used to modify the age at which women are offered routine mammographic
screening. This would be similar to the way in which maternal serum biomarkers
are used to modify the age threshold for prenatal testing for Down syndrome (see
Section 19.5). At present, in the UK all women over the age of 50 years are offered
mammographic screening every 3 years. Screening would be more effective if
entry into the program were dictated by combined age and genetic risk, rather
than by age alone. Women with high-risk genotypes would be offered screening
from an earlier age. The corollary (assuming the overall numbers screened remain
the same) is that women with low-risk genotypes should be denied screening
until some later age. Some of them will nevertheless develop breast cancer that
could have been picked up had they been screened at 50 years old, so this may
prove a difficult concept to sell to the public. The proposal could also be seen as
another instance of the tendency to focus on marginal genetic effects rather than
much more significant lifestyle factors. Calorie intake and weight have far more
influence on susceptibility to breast cancer than the genetic factors discussed
here. Being 20 kg overweight at the menopause doubles a woman’s risk of breast
cancer. A screening program based on age, weight, and diet would define a high-
risk group far more effectively than one based on age and the currently known
genotypes; moreover, it would define a cohort of women who would be a natural
target for risk reduction programs.
0
–0.4 –0.2 0 0.2 0.4
log relative risk
(B) affected cases Figure 19.16 Relative risk of sporadic
6 breast cancer, based on the predicted
combined effects of seven susceptibility
loci. (A) The distribution of genetic risk in
the general population. (B) The distribution
proportion of population (%)
Data from Zheng SL, Sun J, Wiklund F et al. (2008) N. Engl. J. Med. 358, 910–919.
628 Chapter 19: Pharmacogenetics, Personalized Medicine, and Population Screening
to consumers through the Internet. Typical offerings have titles such as Heart
health, Bone health, Immunogenomic profile, and Nutritional genetic profile.
A Dutch–US group reported in 2008 on the scientific bases of genomic profiles
offered by seven such companies (see Further Reading). Janssens and colleagues
identified the particular SNPs used for each profile and sought evidence that
these were significant risk factors for the claimed aspect of health. Their findings
did little to bolster confidence. They concluded, “This review shows that the
excess disease risk associated with many genetic variants included in genomic
profiles has not been investigated in meta-analyses, or has been found to be min-
imal or not significant.... Although genomic profiling may have potential to
enhance the effectiveness and efficiency of preventive interventions, to date the
scientific evidence for most associations between genetic variants and disease
risk is insufficient to support useful applications.”
Few would question the right of people to find out about their own genetic
constitution, as long as they do so at their own expense. People should, however,
be able to trust the claims that are made for the services they are buying. Pro-
vided that the advertising is honest, people will opt to buy or not buy a test
according to personal whim. This may not accord with the calculations of health
economists. A news article in Nature (vol. 453, p. 570; 2008) reported the anec-
dotal case of a young woman whose test result, from a company selling through
the Internet, claimed to change her risk of obesity from 32%, based on her age
and sex, to 34%. She was a satisfied customer of the service. The report quotes her
as saying, “It really does show that if I wasn’t already taking care of myself I would
probably be overweight or diabetic.”
Table 19.4 summarizes the situation in 2010 regarding susceptibility testing.
Whether the situation will be very different in the future is one of the biggest
debates in public health.
Environment and lifestyle do not usually have environment and lifestyle are often more
a major role important than the genetic factors covered by
a test
A single genetic test is highly predictive a single genetic test has very little predictive
value
Genotype of one person may have implications no implications for relatives (susceptibility
for their relatives depends on particular combinations of
genotypes, and recombination during meiosis
breaks these up)
Tests assessed for sensitivity, specificity, and tests assessed for potential profitability before
predictive value before they are made available they are offered
Testing laboratories subject to external quality testing laboratory may or may not be subject to
assurance any quality assurance
POPULATION SCREENING 629
2.0
1.5
1.0
0.5
0
20 25 30 35 40 45
maternal age at term (years)
most babies are born to younger women, so too are most babies with Down syn-
drome. A screening test with a cutoff based on maternal age alone has low
sensitivity.
Over the years, several biomarkers have been identified in maternal serum
that show different, although strongly overlapping, distributions in pregnancies
in which the fetus does or does not have Down syndrome. None of these would
individually be useful as a screening test, but combining age and measures of
several biomarkers can give a valuable composite risk. Several alternative tests
have been proposed. For example, a study of 46,193 pregnancies in London hos-
pitals demonstrated the efficacy of a quadruple test using maternal age plus lev-
els of alpha-fetoprotein, unconjugated estriol, human chorionic gonadotropin,
and inhibin-A (Table 19.5). Other biomarkers such as PAPP-A (pregnancy-associ-
ated plasma protein A) or looking at the neck of the fetus by ultrasound (nuchal
translucency test) are used in some programs. The FASTER (First and Second Tri-
mester Evaluation of Risk) study by the US National Institutes of Health com-
pared several possible protocols. In many programs, the threshold for offering a
definitive diagnostic test would be a composite risk of Down syndrome of 1 in 300
or greater. Various options in the FASTER study gave detection rates of 90–95%
for a false positive rate of 5%.
Whatever the protocol, there is a trade-off between sensitivity and specificity.
Sensitivity can be maximized only at the cost of plunging more couples into con-
siderable stress and anxiety, as well as triggering risky and invasive diagnostic
testing. In contrast, minimizing the number of false positives means that more
women who have been given a negative test result will nevertheless go on to have
a baby with Down syndrome. One important consideration is how early in preg-
nancy the result can be given. A positive result in the first trimester probably
engenders less stress than one obtained later, and if the couple opt for termina-
tion of an affected pregnancy, this is far less traumatic when performed earlier on
in the pregnancy.
Either test defines a high-risk group who then need a definitive diagnostic test. The quadruple test is
considerably more efficient than maternal age alone. Data from Wald NJ, Huttly WJ & Hackshaw AK
(2003) Lancet 361, 835–836.
POPULATION SCREENING 631
A positive result must enable some preventive treatment, e.g. special diet for PKU; review and
useful action choice of reproductive options in cystic fibrosis carrier
screening
The test must have high sensitivity tests with many false negatives undermine confidence in
and specificity the program; tests with many false positives, even if these
are subsequently filtered out by a definitive diagnostic test,
can create unacceptably high levels of anxiety
The whole program must be socially subjects must opt in with informed consent; screening
and ethically acceptable without counseling is unacceptable; there must be no
pressure to terminate affected pregnancies; screening
must not be seen as discriminatory
The benefits of the program must it is unethical to use limited health care budgets in an
outweigh its costs inefficient way
TABLE 19.7 A TEST THAT PERFORMS WELL IN THE LABORATORY MAY BE USELESS FOR POPULATION SCREENING
Prevalence of True positives in True positives True negatives in False positives Positive predictive
condition population screened detected by population detected by value
screening screening
different pathogenic mutations have been described. Even with the new ultra-
high-throughput sequencing methods it would not be feasible to test for all of
them as a population screen. Screening would test for some subset of the com-
moner mutations. People carrying rare mutations not included in the screening
panel would be false negatives. The question is, what sensitivity would be accept-
able? Figure 19.19 shows a flowchart for the screening program.
It is clear that simply testing for the most common mutation, p.F508del, would
not produce an acceptable program; more affected children would be born to
couples who tested negative on the screening program than would be detected
by the screening. Whether or not such a program were financially cost-effective,
it would surely be socially unacceptable. What constitutes an acceptable program
is harder to define. One suggestion focuses on +/- couples; that is, couples with
one known carrier and the partner negative on all the tests. The partner still might
be a carrier of an undetectable or unidentifiable rare mutation. Writing in the
journal Nature (see Further Reading), Leo ten Kate suggested that an acceptable
screening program is one in which the risk for such +/- couples is no higher than
in the general population risk before screening. For cystic fibrosis screening of
northern Europeans, that would require a sensitivity of about 96%. Rather than
screen everybody with such high sensitivity, it might be more efficient to use an
initial screen of fairly low sensitivity and involving very large numbers of sub-
jects, but then to do a very thorough screen of the much smaller number of part-
ners of the carriers identified in the initial screen.
Choosing subjects for screening
As well as considering how many mutations to test, organizers of a screening pro-
gram for CF carriers would have to decide whom to test. As Table 19.8 shows,
there are several options, each of which has advantages and disadvantages.
TABLE 19.8 POSSIBLE WAYS OF ORGANIZING POPULATION SCREENING FOR CARRIERS OF CYSTIC FIBROSIS
Group tested Advantages Disadvantages
Newborns easily organized no consequences for 20 years; many families would forget
the result; unethical to test children
School leavers easily organized; inform people before they start difficult to conduct ethically; risk of stigmatization of
relationships carriers
Couples from physician’s couple is unit of risk; stresses physician's role in preventive difficult to control quality of counseling
lists medicine; pre-conception screening allows a full range of
reproductive options
Women in antenatal clinic easily organized; rapid results bombshell effect for carriers; partner may be unavailable;
time pressure on laboratory
Adult volunteers (drop-in few ethical problems bad framework for counseling; no targeting to suitable
CF center) users; inefficient use of resources?
634 Chapter 19: Pharmacogenetics, Personalized Medicine, and Population Screening
This argument has special force for couples who already have one affected child.
They often request prenatal testing, saying that however much they love that
child, they could not cope with having a second affected child. Whatever their
position on this argument, all civilized people must surely agree that society has
an obligation to look after people born with disabilities and do whatever is pos-
sible to allow them a full life.
Some people worry that enabling people with genetic diseases to lead
normal lives spells trouble for future generations
If neonatal screening detects a baby with PKU, and the family is able to keep to
the low-phenylalanine diet, the baby will grow into a person who can lead a nor-
mal life. That may well include becoming a parent. Phenotypically such people
are normal, but genetically they have PKU, and they will pass on one of their
mutant alleles to any child. If they had not been treated, they could never have
become parents and their mutant allele would have died with them. Is this not
going against natural selection and storing up trouble for future generations?
It has to be true that helping those affected by a mutation to live normal lives
has the effect of increasing the number of mutant alleles in the next generation—
the question is, by how much? A little population genetics helps here:
• For recessive conditions, only a very small proportion of the disease genes are
carried by affected people. The great majority are in healthy heterozygotes.
The Hardy–Weinberg equation (see Chapter 3) gives the ratio of carriers to
affected in a population as 2pq:q2, where q is the frequency of the disease
allele and p = 1 - q. Because each carrier has one copy of the disease allele and
each affected person has two, the proportion of disease alleles present in
affected people is 2q2/(2pq + 2q2). This expression simplifies to just q when
numerator and denominator are both divided by 2q. So, for a recessive dis-
ease affecting one person in 10,000 (q2 = 1/10,000; q = 0.01), only 1% of disease
alleles are in affected people. Whether or not we allow affected people to
transmit their disease alleles has very little effect on the frequency of the dis-
ease in future generations. We do not need to think of measures to prevent
them from having children as the price of treatment.
• For dominant conditions, the effect on the gene frequency would be more
marked. If a condition is sufficiently serious that, untreated, it is incompatible
with reproduction, then it must be maintained by recurrent mutation. If
affected people were treated so well that they had the same reproductive suc-
cess as unaffected people, then the frequency of the condition would double
in the next generation. On the other hand, if treatment were so successful,
would an increase in the number of affected people be such a disaster?
Optimists believe that medicine will move from a reactive ‘diagnose and treat’
diagnose and treat medicine
to a proactive ‘predict and prevent’ model. At present, we wait until we are unwell
CONCLUSION
Clinical tests must be evaluated against several criteria. The analytical validity
measures how far the test measures the thing it seeks to measure. Typical consid-
erations would be the sensitivity, specificity, false positive rate, and positive pre-
dictive value. For a test of a continuously variable quantity, suitable measures
would be the test–retest reliability and the ability to report correctly the value in
a known test sample. Reputable laboratories enroll in external quality assurance
schemes to ensure that their results have good analytical validity.
Clinical validity measures the extent to which the test result predicts the clini-
cal outcome that it is testing. For tests of genotypes, the clinical validity depends
on the extent of genotype–phenotype correlation. Further criteria are the clinical
utility—how useful is the result to the patient?—and considerations of the ethics
of the test.
An important facet of individual genetic variation is the way in which indi-
viduals differ in their response to many drugs. Pharmacogenetics considers the
effect of specific genotypes on pharmacokinetics (what the body does to a drug
by way of absorption, metabolic activation, and elimination) and pharmacody-
namics (the variable response of a target organ to a drug). There are many exam-
ples in which a drug is either ineffective or dangerous to patients who have some
specific genetic variant. This has led to an interest in patient-specific prescribing,
in which the choice and dose of a drug would be influenced by a genetic test on
the patient. This is already a reality with some expensive anti-cancer drugs, but it
has been slow to become part of wider medical practice. Reasons for the slow
uptake include the limited predictive value of many pharmacogenetic tests, and
the logistic problems of integrating genetic testing into normal clinical routines.
A second area where common individual genetic differences are of clinical
interest is their role in modulating susceptibility to common complex diseases.
Previous chapters have described the scientific and technical breakthroughs that
have led to the identification of long lists of genetic susceptibility factors for many
complex diseases. However, attempts to use this knowledge as a basis for person-
alized medicine seem premature. For most common diseases, the currently iden-
tified genetic factors account for only a modest part of the individual differences
in susceptibility. Genotyping may be performed to satisfy individual curiosity,
but for most susceptibility tests the clinical validity is low, and no clinical utility
has been established.
Measures of susceptibility find current use in population screening programs.
In contrast with diagnostic tests, population screening tests do not usually seek to
identify cases directly. Instead, the usual aim is to define a high-risk group of peo-
ple who can then be offered a definitive diagnostic test or some other useful inter-
vention. The initial screening tests for genetic conditions are usually based on
clinical or biochemical findings, rather than tests of genotypes. Usually, there is a
trade-off between sensitivity and specificity in deciding where to draw the line
between a normal and a high-risk result. In contrast with diagnostic tests, large-
scale population screening programs target people who had no particular prior
reason to be concerned about the condition being screened for. This makes it very
important to consider the social and ethical implications of any proposed screen-
ing program. A poorly designed program could easily do more harm than good.
In the longer term, there are hopes that susceptibility testing could become
much more predictive and could be combined with preventive measures. This
would allow medicine to move from the current diagnose and treat model to a
predict and prevent model. People would be genotyped at some early age for a
whole range of susceptibility factors, their individual susceptibilities identified,
and appropriate preventive measures recommended. How far this vision could
ever be realized is a matter of considerable controversy. Progress will depend on
identifying stronger determinants of susceptibility and better means of preven-
tion. The chances of achieving these goals will differ from one disease to
another.
FURTHER READING 637
FURTHER READING
Evaluation of tests Smith I, Procter M, Gelber RD et al. (2007) 2-year follow-up of
Burke W, Atkins D, Gwinn M et al. (2002) Genetic test evaluation: trastuzumab after adjuvant chemotherapy in HER2-positive
information needs of clinicians, policy makers and the public. breast cancer: a randomised controlled trial. Lancet 369,
Am. J. Epidemiol. 156, 311–318. [The ACCE framework and its 29–36. [Trastuzumab therapy for breast cancer is effective
only for tumors with amplification of the ERBB2 gene.]
implications.]
Wadelius M & Piromohamed M (2007) Pharmacogenetics
Seidel S, Kreutzer R, Smith D et al. (2001) Assessment of
of warfarin: current status and future challenges.
commercial laboratories performing hair mineral analysis.
Pharmacogenom. J. 7, 99–111. [Reviews a long list of
JAMA 285, 67–72.
potentially relevant genetic variants.]
Wang Q, Khoury MJ, Botto L et al. (2003) Improving the
Wald NJ & Law MR (2003) A strategy to reduce cardiovascular
prediction of complex diseases by testing for multiple
disease by more than 80%. Br. Med. J. 326, 1419–1423.
disease-susceptibility genes. Am. J. Hum. Genet. 72, 636–649.
[A proposal for a universal polypill for all, in contrast with
[The conclusions were largely refuted by Janssens and
tailored, personalized medicine.]
colleagues—see below—but the paper is a good source of
equations relevant to evaluating tests.]
Tumor expression profiling
Pharmacogenetics Armstrong SA, Staunton JE, Silverman LB et al. (2002) MLL
Arranz MJ, Munro J, Owen MJ et al. (1998) Evidence for association translocations specify a distinct gene expression profile that
between polymorphisms in the promoter and coding regions distinguishes a unique leukemia. Nat. Genet. 30, 41–47. [An
of the 5-HT2A receptor gene and response to clozapine. Mol. example of successful molecular subtyping.]
Psychiatry 3, 61–66. [An example of a clinically important Lakhani SR & Ashworth A (2001) Microarray and histopathological
pharmacogenetic effect.] analysis of tumours: the future and the past? Nat. Rev. Cancer
Durham WJ, Aracena-Parks P, Long C et al. (2008) RyR1 1, 151–157.
S-nitrosylation underlies environmental heat stroke and Petricoin EF, Zoon KC, Kohn EC et al. (2002) Clinical proteomics:
sudden death in Y522S RyR1 knockin mice. Cell 133, 53–65. translating benchside promise into bedside reality. Nat. Rev.
[The mechanism of malignant hyperthermia.] Drug Discov. 1, 683–695.
Evans WE & McLeod HL (2003) Pharmacogenomics—drug Sorlie T, Tibshirani R, Parker J et al. (2003) Repeated observation of
disposition, drug targets and side effects. N. Engl. J. Med. 348, breast tumor subtypes in independent gene expression data
538–549. [A general review.] sets. Proc. Natl Acad. Sci. USA 100, 8418–8423. [Source of the
Flockhart DA (2007) Drug Interactions: Cytochrome P450 Drug data in Figure 19.13.]
Interaction Table. Indiana University School of Medicine, Sotiriou C & Piccart MJ (2007) Taking gene-expression profiling to
Division of Clinical Pharmacology. http://medicine.iupui.edu/ the clinic: when will molecular signatures become relevant to
clinpharm/ddis/table.asp [accessed 25 June 2008]. patient care? Nat. Rev. Cancer 7, 545–553.
McMahon FJ, Buervenich S, Charney D et al. (2006) Variation in the Van de Vijver MJ, He YD, van ’t Veer LJ et al. (2002) A gene-
gene encoding the serotonin 2A receptor is associated with expression signature as a predictor of survival in breast
outcome of antidepressant treatment. Am. J. Hum. Genet. 78, cancer. N. Engl. J. Med. 347, 1999–2009. [Work subsequently
804–814. [An example of the importance of genetic variants commercialized as the MammaPrint system.]
for treating psychiatric disease.]
Patin E, Barreiro LB, Sabeti PC et al. (2006) Deciphering the ancient Susceptibility testing
and complex evolutionary history of human arylamine Easton DF, Pooley DA, Dunning AM et al. (2007) Genomewide
N-acetyltransferase genes. Am. J. Hum. Genet. 78, 423–436. association study identifies novel breast cancer susceptibility
[Natural selection and NAT gene polymorphisms.] loci. Nature 447, 1087–1095. [Identification of five novel
The Pharmacogenomics Knowledge Base [PharmGKB]. susceptibility factors using a three-stage protocol, as
http://www.pharmgkb.org (accessed 25 June 2008). described in Chapter 16.]
Weinshilboum R (2003) Inheritance and drug response. Janssens AC, Pardo MC, Steyerberg EW & van Duijn CM (2004)
N. Engl. J. Med. 348, 529–537. [A general review.] Revisiting the clinical validity of multiplex genetic testing in
common diseases. Am. J. Hum. Genet. 74, 585–588. [A riposte
Personalized medicine to the paper by Wang and colleagues; see above.]
Aspinall MG & Hamermesh RG (2007) Realizing the process of Janssens AC, Gwinn M, Bradley LA et al. (2008) A critical appraisal
personalized medicine. Harvard Business Rev. 85, 108–117. of the scientific basis of commercial genomic profiles used to
[Discusses organization of services, not the science.] assess health risks and personalize health interventions. Am.
International Warfarin Pharmacogenetics Consortium J. Hum. Genet. 82, 593–598. [A very negative appraisal of some
(2009) Estimation of the warfarin dose with clinical and commercial lifestyle genotyping offers.]
pharmacogenetics data. N. Engl. J. Med. 360, 753–764. Pharaoh PDP, Antoniou AC, Easton DF & Ponder BA (2008)
[Performance of an algorithm for warfarin dosage that Polygenes, risk prediction and targeted prevention of breast
incorporates both clinical and genetic data.] cancer. N. Engl. J. Med. 358, 2796–2803. [A proposal to use
Lyssenko V, Jonsson A, Almgren P et al. (2008) Clinical risk factors, susceptibility genotypes to modify mammographic screening
DNA variants and the development of type 2 diabetes. criteria.]
N. Engl. J. Med. 359, 2220–2232. [The two studies summarized Wellcome Trust Case Control Consortium (2007) Genome-wide
in Section 19.4.] association study of 14,000 cases of seven common diseases
Meigs JB, Shrader MPH, Sullivan LM et al. (2008) Genotype score and 3,000 shared controls. Nature 447, 661–678. [One of the
in addition to common risk factors for prediction of type 2 definitive genomewide association studies, involving 17,000
diabetes. N. Engl. J. Med. 359, 2208–2219. individuals and 7 diseases.]
638 Chapter 19: Pharmacogenetics, Personalized Medicine, and Population Screening
Population screening
Andrews LB, Fullarton JE, Holtzman NA & Motulsky AG (1994)
Assessing Genetic Risks: Implications For Health And Social
Policy. The National Academies Press, Washington DC. http://
www.nap.edu/catalog.php?record_id=2057#toc [Report
on ethical issues in genetic population screening from a
committee of distinguished American geneticists, clinicians,
lawyers, and theologians.]
Malone FD, Canick JA, Ball RH et al. First- and Second-Trimester
Evaluation of Risk (FASTER) Research Consortium (2005)
First-trimester or second-trimester screening, or both, for
Down syndrome. N. Engl. J. Med. 353, 2001–2011. [Data on the
performance of American screening protocols.]
ten Kate L (1989) Carrier screening in CF. Nature 342, 131.
[Discusses the sensitivity required for a CF carrier screening
program.]
Wald NJ, Huttly WJ & Hackshaw AK (2003) Antenatal screening
for Down syndrome with the quadruple test. Lancet 361,
835–836. [Data on the performance of a British screening
protocol.]
Chapter 20
• The study of gene function and human genetic disease can be greatly aided by genetically
modified animal models, often known as transgenic animals.
• To make a transgenic animal, exogenous DNA (known as a transgene) is inserted into the
germ line; the transgene may contain more than one gene and accompanying regulatory or
marker sequences.
• Transgenes are often introduced into a fertilized oocyte, whereupon they can randomly
integrate into the genome and be transmitted to all cells of the developing animal.
• Transgenes can also be introduced into embryos via cultured pluripotent stem cells that can
ultimately give rise to all cells of the animal.
• In gene targeting a specific predetermined gene is genetically modified within cultured
pluripotent stem cells and the genetically modified stem cells are used to introduce the
mutation into the germ line.
• Gene targeting may be designed to inactivate a gene (a gene knockout ) or to modify it; gene
targeting frequently exploits homologous recombination, a natural cellular process, to
replace specific DNA sequences within a gene by DNA sequences carried on an introduced
vector DNA.
• Transgenes can also be directed to integrate within a specific target gene so that the target
gene is inactivated while transgene sequences are expressed under the regulation of the
promoter of the target gene (a gene knock-in).
• Larger genomic changes can be brought about by chromosome engineering. Short
recognition sequences for a microbial recombinase are inserted by gene targeting into
specific positions in the chromosomes of pluripotent stem cells. A specific microbial
recombinase is then used to induce the recognition sequences to recombine to produce a
chromosomal deletion, inversion, or translocation.
• Specific predetermined target genes can also be knocked down in animal cells at the RNA
level by using RNA interference or synthetic antisense oligonucleotides to selectively target
transcripts of the gene to be knocked down.
• Large-scale mutagenesis screens often use insertional mutagenesis, whereby specific
transgenes are inserted into the genome, causing inactivation of genes in a random or semi-
random way. The transgene acts as a marker to identify the inactivated gene.
• Disease resulting from a loss of gene function is often modeled by using gene targeting to
inactivate the orthologous gene in a suitable animal model.
• Disease resulting from a gain of function can be modeled by making transgenic animals
with mutant transgenes.
640 Chapter 20: Genetic Manipulation of Animals for Modeling Disease and Investigating Gene Function
Much of our knowledge of gene function and human disease has been obtained
or illuminated by studying model organisms. As outlined in Chapter 5, the con-
servation of basic developmental processes between species has enabled the
study of model organisms to aid our understanding of human development. The
characteristics of a wide variety of model organisms were detailed in Section 10.1.
In this chapter, we explore how knowledge of the genome sequences of humans
and of various animals has facilitated the genetic manipulation of animal models
to understand gene function and to mimic human disease states.
20.1 OVERVIEW
Unicellular models—notably Escherichia coli and the yeast Saccharomyces cere-
visiae—have been invaluable in shedding light on how genes function and on
many general cellular functions (see Box 10.1 and Table 10.1). About 30% of
human genes known to be involved in disease have functional homologs in yeast,
and yeasts have also been of great help in modeling some kinds of human disease
that are caused by defects in basic cellular functions, such as abnormalities of
protein glycosylation.
As described in Chapter 12, studies of cultured invertebrate and mammalian
cells have also been invaluable for investigating gene function and understand-
ing pathogenesis. However, many types of cells are not easy to culture, and cul-
tured cells can take us only so far—they cannot reveal much about interactions
between different cell types. If we are to begin to understand our biology and to
model disease we also need to study whole organisms.
Animal models are extremely valuable in providing insights into gene func-
tion and disease processes, allowing detailed physiological assessments and
investigations of the cellular and molecular basis of disease. They are also cru-
cially important in allowing us to test drugs and other novel therapies before con-
ducting clinical trials on human subjects.
TABLE 20.1 PRINCIPAL ANIMAL CLASSES USED TO MODEL HUMAN DISEASE AND INVESTIGATE GENE FUNCTION
Organism class Advantages as models of disease and gene function Principal disadvantages and practical limitations
Nonhuman closely resemble humans at physiological, biochemical, and primate experimentation is ethically unacceptable to many
primates (see genetic levels; high intelligence; great apes have similar brains people; primate colonies are very expensive to maintain;
Table 10.2) to those of humans; important for neuroscience research generally long generation times and small numbers of
(understanding brain function, etc.) and modeling resistance to offspring are a major limitation to genetic studies, but
certain diseases marmosets, which have a relatively short gestation period
and are reasonably prolific, may be more amenable,
especially after successful germ line transmission of
transgenes first reported in 2009 (see the text)
Other mammals resemble humans quite closely at different levels; sheep and mammals other than rodents have comparatively long
(see Table 10.2 pigs, and to a lesser extent rats, are more suited to physiological generation times and low numbers of offspring, and
and Box 10.3) studies than mice; dogs provide models of genetic disorders experimentation on them is not ethically acceptable to some
and genetics of behavior; cats are good models for studying people; rodents are not good models for studying normal
host-gene–pathogen interactions; rodents have short human brain function and disorders of higher brain function,
generation times and large numbers of offspring, making them such as Alzheimer disease; they also show important genetic
more suited to genetic analyses; mice have been especially and biochemical differences from humans, and their short
amenable to genetic manipulation, with large numbers of generation times may make them less suited to modeling
disease models; rats have provided better models of complex late-onset disorders
disorders than mice
Other vertebrates chickens, frogs (Xenopus), and zebrafish are good models for genetic manipulation has been limited in non-mammalian
(see Box 10.3) studying early developmental processes; the large eggs of vertebrates other than zebrafish; some models are relatively
Xenopus are useful for biochemical analyses; zebrafish are expensive to maintain and have moderately long generation
inexpensive to maintain and are particularly suited to genetic times and moderate offspring numbers; more distantly
analyses, with short generation times and large numbers of related to humans than mammals are, and lack counterparts
offspring; they have a transparent embryo that facilitates for some human genes
mutant screening, and good genetic manipulation
Invertebrates the nematode C. elegans and fruit fly D. melanogaster are invertebrates are evolutionarily very distant from humans
(see Box 10.2) inexpensive to maintain and are extremely well suited to and so are physiologically remote, even though some
genetic analyses, having very short generation times and cellular processes are reasonably conserved; they lack
very large numbers of offspring, and being amenable to counterparts for a considerable number of human genes
sophisticated genetic manipulations; both organisms have
been very important for exploring the functions of the sizeable
number of human genes that have homologs in the worm
or fruit fly, and they are often very useful in providing cellular
models with which to understand human disease processes
disorders such as the mdx muscular dystrophy mouse model, and of more com-
plex disorders such as the NOD non-obese diabetic mouse.
The vast majority of animal disease models have been produced by human
intervention at different levels. For some models, disease has been induced by
treating animals with chemicals or infectious agents. In others, autoimmune dis-
ease has been induced by challenging the immune system with suitable antigens;
for example, injection of pristane, a synthetic adjuvant oil, into the dermis of rats
has produced a model of arthritis. Some other models have been obtained by
selective breeding to obtain animal strains that are genetically susceptible to dis-
ease. However, most animal models are produced by some kind of genetic
modification.
The techniques of genetic modification of animals will be described in this
chapter. In Section 20.2 we describe strategies in which the aim is to insert an
exogenous gene into an animal’s germ line and then express it to help study gene
function or to induce disease. Other strategies simply rely on modifying endog-
enous genes within the germ line to produce animals with different mutant phe-
notypes, and two main approaches are used. Sometimes, as detailed in Section
20.3, the aim is to specifically alter the expression of a single predetermined
endogenous gene to produce a disease model and/or understand the function of
that gene. Alternatively, as described in Section 20.4, endogenous genes can be
mutated at random to generate a large series of mutants whose phenotypes are
then analyzed and categorized.
642 Chapter 20: Genetic Manipulation of Animals for Modeling Disease and Investigating Gene Function
may be analyzed to gain insights into abnormal cellular functions, and relevant
permanent cell lines may be produced for convenient analysis of the mutant
phenotype at the cellular level.
In large-scale mutagenesis screens, large numbers of mutants are produced,
and those that produce easily ascertained phenotypes tend to be over-repre-
sented. Mutants with major defects in limbs or craniofacial structures are easier
to identify than those with minor alterations to some internal organs, but at least
in zebrafish abnormalities of some internal organs such as the heart are more
readily observed because the embryo is transparent. As we will see in Section
20.4, standardized phenotyping is important for large-scale phenotyping
studies.
sperm
Figure 20.2 Genetically modified mice
primordial can be produced by a variety of routes.
germ cells zygote blastocyst
The boxes linked by gray arrows at the top
show components of the mouse life cycle
oocytes ICSI and represent the multiple stages at which,
potentially, genetic modification can be
performed. The lower part of the figure
pronuclear
microinjection shows two mice as sources of donor cells
transfection retroviral for nuclear transfer procedures (process
nuclear transfer microinjection
transduction shown in green arrows). Red arrows show
exogenous DNA cell the input and transport of exogenous DNA
transfer
for all other routes. The most widely used
transfection transfection gene transfer methods—microinjection into
the male pronucleus of the fertilized oocyte,
and transfection of embryonic stem (ES) cells
followed by injection of genetically modified
somatic ES cells
cells ES cells into a blastocyst—are shown by thick
red arrows. ICSI, intracytoplasmic sperm
injection. For convenience, some methods
described in the text are not shown.
644 Chapter 20: Genetic Manipulation of Animals for Modeling Disease and Investigating Gene Function
Transgenes can also be inserted into the germ line via germ cells,
gametes, or pluripotent cells derived from the early embryo
Various other routes have also been used to introduce transgenes into the germ
line, in addition to the fertilized oocyte (zygote) route. Some of them are less
widely used but can be useful in some circumstances and can be popular with
non-mammalian species.
domestic mammals
The principal methods used to generate transgenic mice have not been readily
applicable in some animals. However, somatic cell nuclear transfer, a techni-
cally very demanding method that allows animals to be cloned, has occasionally
been used to make transgenic mammals other than mice. 6-year-old Scottish blackface
Somatic cell nuclear transfer involves the replacement of an oocyte nucleus Finn Dorset sheep sheep
with the nucleus of a somatic cell, which is then reprogrammed by the oocyte.
Despite the differentiated state of the donor cell, the reprogramming can
make the nucleus totipotent so that it is able to recapitulate the whole of
development. mammary gland
oviduct
The technology itself is not new. It has been used for more than 50 years to isolate
clone amphibians, and it has been possible to produce cloned mammals from ovulated
oocytes
embryonic cells since the 1980s. In 1995, it was shown for the first time that mam-
mals could be cloned from the nuclei of cultured cells, and in 1997 the first mam- isolate and
mal cloned from an adult cell was produced, a sheep called Dolly. culture cells
remove
Dolly was produced by transferring a nucleus from a mammary gland cell chromosomes
from a Finn Dorset sheep into an enucleated oocyte taken from a Scottish black-
face sheep. The resulting artificially fertilized oocyte was allowed to develop to
the blastocyst stage before implantation into the uterus of a foster mother (Figure
20.4). Out of a total of 434 oocytes, only 29 developed to the transferable blasto- enucleated oocyte
culture in
cyst stage, and of these only one developed to term, giving rise to Dolly. Dolly has serum-depleted 1. inject
the same nuclear genome as the Finn Dorset sheep donor of the mammary gland medium into oocytes
cell, so they are considered to be clones. However, their mitochondrial genomes under zona
pellucida
are different because Dolly inherited her mtDNA from the Scottish blackface
sheep from which the enucleated oocyte was derived. Subsequently, a variety of donor G0 cells
fusion
different mammals have been cloned by the nuclear transfer procedure, includ- 2. electric
ing mice, rats, cats, dogs, horses, mules, and cows. pulse
We will consider here the nuclear transfer method as a way to generate trans-
genic animals. The essential point is that if the donor nucleus is taken from a renucleated oocyte
somatic cell that has been genetically manipulated to contain a transgene of
interest, the animal that develops from the manipulated oocyte will be trans- (B)
genic. Transgenic livestock have been produced in this way with transgenes that
produce therapeutic proteins, as described in Chapter 21. It is also possible to
modify a specific predetermined endogenous gene in a somatic cell, such as a
fibroblast, and then use somatic cell nuclear transfer to make an animal with the blastocyst
mutation in all its cells, as in a pig model of cystic fibrosis that will be described
in Section 20.5. We consider the ethical implications of human cloning in implant in uterus
Chapter 21. of foster Scottish
blackface sheep
Exogenous promoters provide a convenient way of regulating Figure 20.4 The first successful attempt
transgene expression at mammalian cloning from adult cells
resulted in a sheep called Dolly. (A) The
To make a useful transgenic animal, the transgene not only has to be inserted donor nuclei were derived from a cell line
into the host’s genome but also has to be expressed to produce functional prod- established from adult mammary gland
ucts. Some regulatory elements that influence transgene expression are found cells of a Finn Dorset sheep. The donor
within the host genome, but important elements are provided with the trans- cells were deprived of serum before use,
gene, most notably a suitable promoter and a polyadenylation site. forcing them to exit from the cell cycle and
In transgenic animals, it is often desirable to express the transgene in particu- enter a quiescent state known as G0, in
which only minimal transcription occurs.
lar tissues or at particular developmental stages that are appropriate for the
Nuclear transfer was accomplished by fusing
intended function of the transgene. To achieve a desired expression pattern, the individual somatic cells to enucleated,
transgene will have a suitable tissue-specific or stage-specific promoter. metaphase II-arrested oocytes from a
Maximum control over transgene expression is provided by inducible promoters Scottish blackface sheep. Eggs are normally
that can be switched on and off according to need, usually by controlling the sup- fertilized by transcriptionally inactive sperm
ply of a particular chemical ligand. Typically, the transcription factors that regu- whose nuclei are presumably programmed
late such promoters are structurally modified by this ligand. by transcription factors and other chromatin
Several naturally inducible promoters have been used, but endogenous pro- proteins available in the egg, and so the
moters are generally disadvantaged by leakiness (a high background expression G0 nucleus may represent the ideal basal
level), and by relatively low levels of induction, unwanted co-activation of other state for reprogramming. (B) Dolly with her
firstborn, Bonnie. [(B) courtesy of the Roslin
endogenous genes that respond to the same ligand, and differential rates of
Institute, Edinburgh.]
uptake and elimination of the ligand in different organs. More robust inducible
promoter systems have therefore been developed by using non-mammalian,
MAKING TRANSGENIC ANIMALS 647
(A)
isolate inner endogenous gene
isolate
cell mass
blastocyst
X
selection for mutated endogenous gene
re-implant gene-targeted
into foster ES cells ( )
mother
inject into (B)
C57B10/J
chimeric blastocyst genetically
C57B10/J
blastocyst modified male chimera C57B10/J female
ES cells
mating
Figure 20.6 Genetic targeting of embryonic stem cells to introduce mutations into the mouse germ line. (A) An embryonic
stem (ES) cell line is made by excising blastocysts from the oviducts of a suitable mouse strain (such as 129), and cells from the
blastocyst’s inner cell mass are cultured to eventually give an ES cell line (see Box 21.2 for the details). ES cells can be genetically
modified while in culture by transfection of a linearized recombinant plasmid containing DNA sequences that are identical to certain
sequences in a predetermined endogenous gene, except for a desired genetic modification. Homologous recombination allows
sequence swapping so that the targeted gene is mutated in the ES cell’s genome. Only a very few cells will undergo homologous
recombination to produce the expected modification of the targeted gene, but they can be selected for and amplified. The modified
ES cells are then injected into isolated blastocysts of another mouse strain such as C57B10/J, which has black coat coloration that
is recessive to the agouti color of the 129 strain; the cells are then implanted into a pseudopregnant foster mother of the same
strain as the blastocyst. Subsequent development of the introduced chimeric blastocyst results in a chimeric offspring containing
two populations of cells that derive from different zygotes (here, 129 and C57B10/J, normally evident by the presence of differently
colored coat patches). (B) Backcrossing of chimeras can produce mice that are heterozygous for the genetic modification (bottom
left). Subsequent interbreeding of heterozygous mutants generates homozygotes.
assembled by fusing coding sequences for different types of zinc finger to pro-
duce a particular combination of zinc fingers with the desired DNA-binding
sequence specificity (Figure 20.13A).
Like restriction nucleases, zinc finger nucleases work as dimers. To make a
double-strand DNA break, the two monomers bind to and cut complementary
DNA strands. Standard restriction nucleases typically work as homodimers (with
the result that both monomers recognize the same DNA sequence and the recog-
nition sequence is palindromic). However, zinc finger nucleases are deliberately
designed to work as heterodimers, and each monomer is designed to recognize
different DNA sequences that are located within a very short distance from each
other in the target gene (Figure 20.13B). A zinc finger nuclease heterodimer com-
posed of monomers each with four zinc fingers (as shown in Figure 20.13) effec-
tively binds a 24 (12 + 12) nucleotide recognition sequence that can be expected
to occur just once in a genome.
The induced double-strand break can be repaired by different natural cellular
dsDNA repair pathways (as described in Box 13.3). One such pathway, the non-
homologous end joining (NHEJ) pathway, is naturally prone to error: because the
broken ends are simply joined together, the joining process is frequently accom-
panied by a loss or gain of nucleotides (Figure 20.13C). For the vast majority of
locations in the DNA sequence there would be no serious consequences, but
when the double-strand break is designed to occur in a coding sequence, the
result is often gene inactivation. Successful gene targeting with zinc finger nucle-
Figure 20.13 Targeted gene inactivation
ases was first conducted in zebrafish in 2008 and in rats in 2009. In both cases using genetically engineered zinc finger
mRNAs encoding zinc finger nucleases were injected into one-cell embryos, and nucleases. (A) Zinc finger nucleases (ZFNs)
target genes were inactivated after inaccurate repair by the NHEJ DNA repair contain a series of zinc fingers joined by an
pathway. amino acid linker to a DNA-cleaving domain.
Individual zinc fingers bind to specific triplet
(A)
sequences in DNA, and combinations
DNA-cleaving
domain monomer
of different zinc finger modules can be
N 1 2 3 4
assembled in a defined order to bind to
amino acid specific DNA sequences of interest. (B) A
Zn fingers linker double-strand break is introduced into
(B) functionally important sequence in a specific
left ZFN recognition site right ZFN recognition site gene with a pair of different ZFNs. In this
CGGAGCCGC T T T NNNNNNAC T C T G T GGA AG example the ZFNs each have four zinc fingers
GCC T CGGCGA A ANNNNNN T GAGACACC T T C and are designed to specifically recognize
neighboring 12 bp sequences at the target
binding of ZFNs at neighboring
recognition sequences site. The effective recognition sequence
is 12 + 12 = 24 nucleotides and so should
4 3 2 1 N occur just once in the genome. The targeting
event is designed so that the DNA-cleaving
CGGAGCCGC T T T NNNNNNAC T C T G T GGA AG
GCC T CGGCGA A ANNNNNN T GAGACACC T T C domains of the two different ZFNs can
come into contact and dimerize, activating
N 1 2 3 4
cleavage of both DNA strands. The resulting
dimeric nuclease cleaves DNA double-strand break (DSB) activates cellular
on both strands to generate a DNA repair pathways. The cleavage domains
double-strand break of the two ZFNs can be designed to be
asymmetrical so that heterodimers form but
not homodimers. (C) In the nonhomologous
(C) (D) end joining (NHEJ) DNA repair pathway, the
NHEJ repair HR repair repair is naturally often an imperfect one,
5¢ 3¢ 5¢ 3¢ with nucleotides being deleted or inserted
3¢ 5¢ 3¢ 5¢ that may cause gene inactivation. (D) In the
resection homologous recombination (HR) pathway,
the 5¢ ends of the DSB are first trimmed back
5¢ 3¢
3¢ 5¢ (resection), allowing strand invasion by donor
DNA strands. A plasmid with homologous
repaired, but with deletion
invasion by donor sequence (bold lines) that is designed to
DNA strand and have an inactivating mutation (red asterisk)
repair may be used as the donor DNA, acting as a
repaired, but with insertion template for synthesizing new DNA from a
portion of the homologous sequence (thick
5¢ 3¢ blue lines) to replace the previously existing
3¢ 5¢ sequence. As a result, the selected mutation
repaired sequence with is copied into the gene sequence, causing
inactivating mutation gene inactivation.
656 Chapter 20: Genetic Manipulation of Animals for Modeling Disease and Investigating Gene Function
The alternative major pathway for repairing double-strand DNA breaks, the
homologous recombination (HR) pathway, offers a general and more powerful
way of artificially inducing faulty DNA repair. In this cellular pathway the broken
DNA ends are trimmed back and new DNA, copied from a donor DNA strand
with the homologous sequence, is inserted (Figure 20.13D). To induce an inacti-
vating mutation, a plasmid can be provided that contains a portion of the target
gene sequence spanning the location of the double-strand break but altered to
include a specific inactivating mutation. The HR pathway would use the plasmid
as a donor template DNA to direct repair and in so doing would be able to copy
the inactivating mutation into the repaired sequence (see Figure 20.13D). This
approach allows exquisite control over the mutagenesis procedure, unlike the
random inactivation of the NHEJ pathway.
locations, they sometimes integrate into genes and can inactivate them.
Retrotransposons encode a reverse transcriptase and they move via an RNA
intermediate. DNA transposons move as DNA and encode a DNA transposase
that binds to short inverted terminal repeats (see Figure 9.20).
DNA transposons have been widely used in germ-line mutagenesis in D. mel-
anogaster and C. elegans. The 2907 bp P element of D. melanogaster is a celebrated
example that has revolutionized the study of gene function in the fruit fly. It
moves by a cut-and-paste mechanism: each element needs to be excised from
the DNA before moving to a new location (as opposed to the copy-and-paste
mechanisms used by retrotransposons; see Figure 10.9). Germ-line mutations
caused by transposon mutagenesis are molecularly tagged, greatly facilitating
their identification.
Until very recently, transposons were not available to mutagenize vertebrate
genomes in a meaningful way. Transposable elements had been isolated from
several vertebrate species, but nearly all of the elements were found to be inac-
tive or had low frequencies of transposition. Transposons with broad host ranges
were sought for use as effective mutagens in a wide range of vertebrate
genomes.
Two recently developed transposon systems that offer high-frequency trans-
position in vertebrate genomes are described below. In addition to applications
involving germ-line mutagenesis, vertebrate transposon systems can be used in
the mutagenesis of somatic cells for cancer gene discovery and in modeling
somatic cancers. They can also be used in standard transgenesis via pronuclear
microinjection, offering both high efficiency and single-copy transgenes.
Insertional mutagenesis with the Sleeping Beauty transposon
A defective transposon in salmonid fish, presumed to have been inactive for mil-
lions of years, was resuscitated by molecular engineering and named Sleeping
Beauty (Figure 20.17). It can transpose in cells of a wide variety of vertebrate spe-
cies, but it is popularly used to produce genetically modified mice by inserting
transgenes into the mouse genome, causing insertional inactivation of genes.
Because its sequence is rather different from the endogenous mouse transposons,
it is readily identifiable.
Although Sleeping Beauty shows no integration preference with respect to
gene structure, it tends to reintegrate into sites located within a short distance
(often less than 3 Mb) from the locus it transposes from. This phenomenon,
called local hopping, is a feature of many transposons (the D. melanogaster P1
transposon usually reintegrates within 100 kb of its donor site). The local hop-
ping tendency can be exploited by designing modified Sleeping Beauty trans-
posons to permit region-specific mutagenesis and chromosome engineering
(see the review by Carlson & Largaespada in Further Reading).
660 Chapter 20: Genetic Manipulation of Animals for Modeling Disease and Investigating Gene Function
TA
AT
TA TA
AT AT
piggyBac-mediated transposition
Many DNA transposons, including the P element of D. melanogaster, are nonfunc-
tional outside their natural hosts—presumably they need some host cell factor to
transpose. Others, however, have a broad host range. piggyBac (PB) is a DNA
transposon originally derived from the cabbage looper moth, but PB element-like
sequences have been found in diverse genomes from fungi to mammals. PB
transposition is not so dependent on host cell factors, and PB transposes effi-
ciently in human and mouse cell lines and also in mouse germ-line cells.
PB transposons insert into the tetranucleotide TTAA and regenerate the
sequence on excision. Like Sleeping Beauty, piggyBac displays preferential inte-
gration into actively transcribed loci but has a higher transposition efficiency;
there is no evidence for local hopping but a bias toward reintegration within
intragenic regions.
C57BL/6 ES cells are generally less efficient than 129 ES cells in colonizing
developing embryos and so usually give rise to low-level chimeras, but once they
have done this there is no problem in achieving germ-line transmission. Recently,
a C57BL/6-derived strain of mouse ES cells known as JM8 has been shown to be
highly competent for germ-line transmission and has been genetically modified
to show a dominant pale (agouti) coat color like the 129 strain. Because of these
advantages, the JM8 strain is likely to be used widely in large-scale mouse
knockouts.
Although the next few years will see a massive increase in the number of pub-
licly available mouse knockouts, much work will be required to move from knock-
outs in ES cells to the time-consuming development and phenotyping of the
resulting knockout mice. For the large mutagenesis screens, the challenge of ana-
lyzing the many facets of the phenotype—collectively called the phenome—is
being addressed, and various phenotyping standardization protocols have been
developed and are being applied.
Humanized alleles
Many single-gene disorders show considerable mutational heterogeneity. For
disorders arising from a loss of function of a polypeptide-encoding gene, most
chain-terminating mutations can be expected to be null alleles, but the effects of
some splicing mutations and some other point mutations may be more difficult
to predict. Disease alleles found at high frequency may warrant the effort needed
to engineer the pathogenic human mutation into the sequence of the mouse
allele, thereby creating a humanized allele (Box 20.2 gives some examples).
BOX 20.2 MODELING RECESSIVE DISORDERS IN MICE: THE EXAMPLE OF CYSTIC FIBROSIS
Cystic fibrosis (CF; OMIM 219700) is a recessive disorder that is Other investigators deleted an early exon (exon 1 or exon 2)
particularly common in populations of Caucasian ancestry. It is caused to cause early premature termination. Again, there was no lung
by mutations in the CFTR (cystic fibrosis transmembrane regulator) pathology. However, in the deleted exon 1 model, the severity of the
gene, which has 27 exons, specifying a polypeptide of 1480 amino intestinal disease was observed to vary significantly depending on the
acid residues. The DF508 mutation, a trinucleotide deletion in exon 10, genetic background. This prompted the originators of the first exon
is very frequent (about 70% of CF alleles). Other relatively common 10 deletion model to make an inbred congenic strain by repeatedly
alleles result in the amino acid substitutions G551D and G480C. backcrossing with C57BL/6 until the contribution of the ES cell strain
In the absence of a suitable animal model, the identification of was limited to a small region including the Cftr allele. The resulting
CFTR as the CF locus prompted efforts to make a mouse knockout mouse now developed the hoped-for lung pathology.
model. Different types of mutation were generated in the mouse Cftr Attempts to model human alleles with a single amino acid
gene, and the resulting knockout mice were produced on various deletion or substitution produced milder phenotypes because the
genetic backgrounds. Not unexpectedly, the different CF models mutant protein had some residual functional activity. Two
showed very considerable differences in phenotype (Table 1). attempts to model the DF508 mutation produced different
Initial attempts focused on mutating exon 10, either deleting phenotypes because of differences in the amount of expression of
it to cause a downstream shift in the translational reading frame or the engineered CftrDF508 allele. Other models have been produced for
introducing an insertional mutation. Three such models failed to show the missense mutations. A mouse with the G551D mutation did have
evidence of lung pathology (see Table 1). Two mutations seemed to be some intestinal disease but less than for the null mice, and a mouse
null alleles, resulting in an absence of Cftr from mutant homozygotes, with the G480C allele had a very mild phenotype. The milder models
and did result in severe intestinal phenotypes; the other model had can nevertheless provide valuable information on how the
some residual Cftr mRNA because of a leaky mutation. Cftr gene functions.
TABLE 1 SOME OF THE DIFFERENT GENETARGETING STRATEGIES USED TO MAKE MOUSE MODELS OF CYSTIC
FIBROSIS
Mutant allele Nature of Cftr mRNA Genetic background Phenotype
mutationa (percentage (ES cell strain—
of wild type) embryo strain)
m1UNC DEx10 0 E14TG2a—C57BL/6 less than 5% survive to maturity; no lung pathology even in
adulthood, but severe intestinal disease
m1HGU Ins. Ex10 10 129—MF1 a leaky mutation by which some transcripts underwent exon
skipping to produce some functional mRNA; the resulting mild
phenotype included mild intestinal obstruction; about 95% of
mice survive to maturity
m1CAM DEx10 0 mixed: 129—MF1 and similar pathology to the m1UNC on a mixed genetic background,
129—C57BL/6 except has additional pathology in the lacrimal gland of the eye
m1HSC DEx1 0 129—CD1 severe phenotype with 30% survival but, after subsequent
crossing with variety of different mouse strains, the survival rates
and severity of phenotypes are seen to vary widely
m1EUR DF508 100 129—FBV disease mild (possibly because of high expression of the mutant
protein that is known to have partial Cl– channel function)
m2CAM DF508 30 129—C57BL/6 about 35% of mice die by day 16 (compared with 80% for null
mutants); higher survival is probably due to production of
mutant Cftr mRNA with some functional activity
m1KTH DF508 very low 129—C57BL/6 survival rate only about 40% because virtually no Cftr mRNA of
any kind is produced in intestine
aDEx, deletion of exon; Ins., insertional mutation. For historical reasons, the 6th and 7th exons of the human CFTR and mouse Cftr genes are
labeled as exons 6a and 6b, and so in this table exons 10 and 11 are in fact the 11th and 12th exons of the Cftr gene, respectively.
bAs a result of extensive backcrossing with C57BL/6 and selection for the original Cftr allele, only a very small region (including the Cftr gene)
originated from the E14TG2a ES cells, and the vast majority of the DNA is from the C57BL/6 strain.
TABLE 20.2 ANIMAL MODELS FOR MODELING HUNTINGTON DISEASE PATHOGENESIS AND EXPLORING THE FUNCTION OF
THE HD GENE
Animal Models and application(s)
Mouse (see also gene knockout to investigate effects of loss of huntingtin protein function; conventional transgenics that express
Figure 20.19) mutant huntingtin to study pathogenesis; knock-ins to study pathogenesis when mutant mouse huntingtin is
regulated by the endogenous promoter
Rat conventional transgenics that express mutant huntingtin; transgenics in which viral vectors were used to deliver
mutant transgenes to brain
Rhesus macaque transgenics with mutant HTT gene encoding expanded polyglutamine tract
Drosophila melanogaster enhancer and suppressor screens to identify genes that modify polyglutamine toxicity; phenotype rescue by
inhibition of transcriptional repressor complexes
Caenorhabditis elegans RNAi screen to identify genes that modulate polyglutamine aggregation
Saccharomyces cerevisiae enhancer and suppressor screens to identify genes that modify polyglutamine toxicity
666 Chapter 20: Genetic Manipulation of Animals for Modeling Disease and Investigating Gene Function
u
A1 Mmu10
)1Y
on the long arm of human chromosome
Mmu16 A2
(16
A3.1 A1 21 (Hsa21) mostly have counterparts on
Dp
A2 A3.2 C3.1 the distal part of mouse chromosome 16
A3 A3.3 A2
B1 (Mmu16) (pale green boxes). Genes on the
B1 B2 A3
B3 A4 distal extremity of Hsa21 have orthologs on
mouse chromosome 17 (Mmu17) (red
n
je
B2 B1
5D
C C3.2
1C
boxes) or 10 (Mmu10) (yellow boxes).
5
B3
Ts
Ts
B2
s6
B4 D (B) Extent of trisomy in segmental trisomy
1T
E1.1 B3 16 mouse models. The widely used Ts65Dn
Ms
B5 E1.2
E1.3 B4 C3.3 mouse model is trisomic for a distal region
C1.1 Hsa21 E2 B5.1
C1.2 B5.2 of mouse chromosome 16 that spans about
E3 B5.3
C1.3 60% of the region showing conservation
C2 E4 C1
C3.1 of synteny with Hsa21. However, it is also
E5 C2 C4
C3.2 trisomic for a more than 5.8 Mb region on
C3.3 C3 mouse chromosome 17. Complementary
C4 components of the trisomic region in Ts65Dn
D1
are represented in the Ts1Cje (small proximal
D2 component) and Ms1Ts65 (large distal
D3 component) models. The Dp(16)1Yu model
was generated very recently by long-range
difficult to model because of the limited conservation of human–mouse synteny chromosome engineering with Cre–loxP
(gene order; see Figure 10.16). Nevertheless, considerable effort has been devoted site-specific recombination; it carries an
to making mouse models of trisomy 21 (Down syndrome). accurate duplication of the full 22.9 Mb
In trisomy 21, the pathogenesis results from the dosage sensitivity of a few region on mouse chromosome 16 that
genes on chromosome 21, for which the presence of three copies instead of two shows conserved synteny with Hsa21.
is sufficient to produce abnormal gene regulation. Because 21p is essentially
made up of heterochromatin and of rRNA genes that are also present in multiple
copies on four other chromosomes, the problem lies with genes on 21q. Most of
these genes have counterparts on mouse chromosome 16 (Figure 20.20A).
However, mouse trisomy 16 does not resemble human trisomy 21; trisomy 16
mice die in utero and are not useful models. Importantly, most genes on mouse
chromosome 16 have orthologs on chromosomes other than human chromo-
some 21. Genes in the distal 4.5 Mb of human 21q have orthologs on mouse
chromosome 17 or 10.
To obtain more accurate models, various mice with segmental trisomy 16
have been produced. Earlier models relied on using random mutagenesis and
retrieving mice with partial trisomy 16 arising through translocations. The Ts65Dn
mouse has been a very widely used model of trisomy 21 because of its learning
and behavior deficits. However, it is trisomic for less than 60% of the human
chromosome 21 syntenic region on mouse chromosome 16. Additionally, it is tri-
somic for a region of mouse chromosome 17. Recently, long-range chromosome
engineering using Cre–loxP recombination has produced a promising model
with a duplication of the entire 22.9 Mb region on mouse chromosome 16 that is
syntenic with human chromosome 21 (see Figure 20.20).
As a radical alternative to duplicating mouse chromosome segments,
transchromosomic mice have been generated that carry a single almost com-
plete human chromosome 21. To do this, chromosomes from human fibroblast
cells were transferred into mouse ES cells by using microcell-mediated chromo-
some transfer (see Chapter 16, p. 523, for a description of this technique). With
the use of a marker gene, those ES cells that had picked up human chromosome
21 were selected. Some of the resulting ES cells had a nearly complete human
chromosome 21 and were used to make an aneuploid transchromosomic mouse
line, Tc1, that had various characteristics of trisomy 21 (Figure 20.21). Although
the Tc1 transchromosomic mouse model appeared promising, it has not been
easy to work with. Random loss of the Hsa21 fragment during development
means that analyses are complicated by the variable levels of mosaicism in differ-
ent tissues.
(A)
wild-type
ES cell
(B) MmuX
Figure 20.21 Making the transchromosomic mouse line Tc1 to model Down syndrome.
(A) Using gene targeting, a neomycin resistance gene (neo) was inserted into human chromosome
21 (Hsa21) in a human cell line. Cells carrying a neo-tagged Hsa21 were arrested in metaphase and
then centrifuged to isolate microcells carrying one or just a few chromosomes. The microcells were
fused to murine embryonic stem (ES) cells. After selection with G418 for uptake of the neo gene,
ES cells were screened to identify the extent of Hsa21 present. Female mouse ES cell lines were
injected into mouse blastocysts to generate female chimeric mice. The latter were bred to establish
the Tc1 mouse strain, which carries a freely segregating Hsa21. PEG, poly(ethylene glycol).
(B) Fluorescence in situ hybridization (FISH) of metaphase chromosomes from Tc1 mouse
splenocytes. Chromosomes are revealed by DAPI staining (blue), and chromosome painting was
performed with probes specific for human chromosome 21 (Hsa21) and mouse chromosome X
(MmuX). (Courtesy of Elizabeth Fisher and Victor Tybulewicz, University College London.)
Hsa21
TABLE 20.3 UNEXPECTED PHENOTYPES IN MICE FOLLOWING INACTIVATION OF ORTHOLOGS OF HUMAN TUMOR
SUPPRESSOR GENES
Mouse gene Tumors in standard knockout mouse Human tumors arising from loss of orthologous gene function
Apc (Min, D716, multiple polyps in small intestine polyps in colon progressing to carcinoma
D580 alleles)
Brca1 none breast, ovary
Brca2 none breast, ovary
Ink4a fibrosarcoma, lymphoma, squamous cell carcinoma familial melanoma, sporadic pancreatic and brain tumors
Nf1 pheochromocytoma, myeloid leukemia benign neurofibroma, malignant fibrosarcoma
Nf2 osteosarcoma, fibrosarcoma, lung adenocarcinoma, schwannomas, meningomas, ependymomas, gliomas
hepatocellular carcinoma
Rb brain, pituitary retinoblastoma, osteosarcoma
Tp53 osteosarcoma, lymphoma, soft-tissue sarcoma breast carcinoma, brain, sarcomas, leukemia
Second-generation models using conditional inactivation of the above tumor suppressor genes and/or inactivation of additional genes involved in
tumorigenesis have generally produced good models; see the text and the reviews by Rangarajan & Weinberg (2003) and Frese & Tuveson (2007)
under Further Reading.
TABLE 20.4 EXAMPLES OF ZEBRAFISH DISEASE MODELS PRODUCED BY GENE KNOCKDOWN USING MORPHOLINO
OLIGONUCLEOTIDES
Disease OMIM numbera Disease locus Zebrafish phenotype
Barth syndrome 302060 TAZ dose-dependent lethality, severe developmental and growth retardation,
bradycardia, pericardial effusions, and generalized edema
Joubert syndrome type 5 610188 CEP290 (NPHP6) defects in retinal, cerebellar, and otic cavity development
Lenz microphthalmia 300166 BCOR developmental perturbations of the eye, skeleton, and central nervous system
Kari G, Rodeck U & Dicker AP (2007) Clin. Pharmacol. Ther. 82, 70–80.
672 Chapter 20: Genetic Manipulation of Animals for Modeling Disease and Investigating Gene Function
breakthrough has been in the extended provision of pluripotent stem cell lines to
enable gene targeting in non-murine vertebrates.
New promising ES cell lines have been developed from rat blastocysts, and
stable rabbit ES cell lines have also recently been reported. An alternative and
exciting new route has been to induce somatic cells to dedifferentiate so as to
generate pluripotent stem cells that are operationally identical to ES cells. Such
induced pluripotent stem (iPS ) cells may serve as alternatives to ES cells for per-
forming gene targeting. iPS cell lines have been reported from a variety of experi-
mental animals including mice, rats, pigs, and rhesus monkeys.
The difficulties in accurately replicating many human phenotypes in rodent
models have prompted considerable interest in nonhuman primate models.
Although transgenesis has been possible in some cases, such as the rhesus
macaque model of Huntington disease described above, germ-line transmission
of functional transgenes was not successful until 2009, when transgene transmis-
sion was reported in marmosets. Together with the development of nonhuman
primate iPS cells, these breakthroughs may lead to a significant increase in the
use of nonhuman primate models.
Human iPS cells have also recently been made and in Chapter 21 we will detail
the technology for making human iPS cells, because of their therapeutic poten-
tial and their exciting potential for providing human cellular models of disease.
CONCLUSION
Animal models are vital for our exploration of how genes function in cells, in
understanding human disease, and in testing novel drugs and therapies. The
ability to modify the genome of model animals has greatly assisted research
efforts.
The use and usefulness of animal models depends on various factors, includ-
ing evolutionary relatedness to humans, ethical concerns, and practical advan-
tages and limitations. Fly and worm models have been invaluable for under-
standing gene function and modeling disease at the cellular level. Because they
are particularly amenable to genetic manipulation, mice and zebrafish are impor-
tant vertebrate models.
Genetic manipulation of animals normally involves introducing exogenous
DNA into the germ line to create a transgenic animal. Pronuclear microinjection
of exogenous DNA into fertilized zygotes is often used to study the effect of
expressing an exogenous gene that may be normal or a mutant. An alternative is
to modify endogenous genes, either by gene targeting (modifying a specific pre-
determined gene) or by some kind of random mutagenesis. Gene targeting often
exploits the cellular process of homologous recombination to replace important
sequences in an endogenous gene by a mutant version.
Site-specific recombination gives control over where a gene is inserted and
when it might be expressed. Use of systems such as Cre–loxP allows conditional
knockouts, in which genes essential for early development can be switched off in
specific tissues or at a desired developmental stage. This technology can also be
used to engineer larger chromosomal changes such as deletions, insertions, and
translocations.
Gene function can also be altered by targeting RNA transcripts for destruction
by RNA interference pathways or blocking their expression by using morpholino
antisense oligonucleotides. Although these methods often have transient effects
and may allow some residual gene function, they are comparatively easy to do
and can produce some useful models quite quickly.
Large-scale mutagenesis often uses random mutagenesis with mutagenic
chemicals or insertional inactivation by a known transgene sequence, when the
position of the disrupted gene can be identified by the inserted sequence.
New genetic technologies are increasing the range of disease models. Until
recently, knocking out gene function in vertebrates was very largely performed in
mice, but the use of zinc finger nucleases has extended gene targeting to species
such as zebrafish and rats. The development of iPS cells from somatic cells opens
the prospect of widening the range of model species still further. Finally, human
iPS cells can also offer the prospect of cellular human models of human disease.
We consider this aspect in Chapter 21 within the context of biomedical applica-
tions of human iPS cells.
FURTHER READING 673
FURTHER READING
Mouse genetics, transgenic animals, and basic Targeted gene inactivation at the RNA level
gene targeting Gao X & Zhang P (2007) Transgenic RNA interference with mice.
Capecchi M (1989) Altering the genome by homologous Physiology 22, 161–166.
recombination. Science 244, 1288–1292. Heasman J (2002) Morpholino oligos: making sense of antisense?
Chan AW, Chong KY, Martinovich C et al. (2001) Transgenic Dev. Biol. 243, 209–214.
monkeys produced by retroviral gene transfer into mature Kamath RS, Fraser AG, Dong Y et al. (2003) Systematic functional
oocytes. Science 291, 309–312. analysis of the Caenorhabditis elegans genome using RNAi.
Couzin J (2007) Erasing microRNAs reveals their powerful punch. Nature 421, 231–237.
Science 316, 530.
Houdebine LM (2007) Transgenic animals in biomedical research.
ENU mutagenesis in rodents
Methods Mol. Biol. 360, 163–202. Hrabe de Angelis MH, Flaswinkel H, Fuchs H et al. (2000)
Silver L (1995) Mouse Genetics. Concepts And Applications. Genome-wide, large-scale production of mutant mice by
Oxford University Press. [Now freely available electronically ENU mutagenesis. Nat. Genet. 4, 444–447.
at: http://www.informatics.jax.org/silver/] Nolan PM, Peters J, Strivens M et al. (2000) A systematic, genome-
wide, phenotype-driven mutagenesis programme for gene
Somatic cell nuclear transfer for animal cloning, function studies in the mouse. Nat. Genet. 4, 440–443.
transgenesis, and gene targeting Smits B, Mudde J, van de Belt J et al. (2006) Generation of gene
knockouts and mutant models in the laboratory rat by
Campbell KHS, McWhir J, Richie WA & Wilmut I (1996) Sheep
ENU-driven target-selected mutagenesis. Pharmacogenet.
cloned by nuclear transfer from a cultured cell line. Nature
Genomics 16, 159–169.
380, 64–66.
McCreath KJ, Howcroft J, Campbell KHS et al. (2000) Production Insertional mutagenesis using gene traps and
of gene-targeted sheep by nuclear transfer from cultured
somatic cells. Nature 405, 1066–1069.
transposons
Schnieke AE, Kind AJ, Ritchie WA et al. (1997) Human factor Carlson CM & Largaespada DA (2005) Insertional mutagenesis
IX transgenic sheep produced by transfer of nuclei from in mice: new perspectives and tools. Nat. Rev. Genet.
transfected fetal fibroblasts. Science 278, 2130–2133. 7, 568–580.
Wilmut I, Schnieke AE, McWhir J et al. (1997) Viable offspring Ding S, Wu X, Han M, Zhuang Y & Xu T (2005) Efficient
derived from fetal and adult mammalian cells. Nature transposition of the piggyBac (PB) transposon in mammalian
385, 810–813. cells and mice. Cell 122, 473–483.
Dupuy AJ, Akagi K, Largaespada DA et al. (2005) Mammalian
Cre–loxP, conditional mutagenesis, and mutagenesis using a highly mobile somatic Sleeping Beauty
transposon system. Nature 436, 221–226.
chromosome engineering
Friedel RH, Plump A, Lu X et al. (2005) Gene targeting using a
Bradley A & van der Weyden L (2006) Mouse chromosome promoterless gene trap vector (“targeted trapping”) is an
engineering for modeling human disease. Annu. Rev. efficient method to mutate a large fraction of genes. Proc.
Genomics Hum. Genet. 7, 247–276. Natl Acad. Sci. USA 102, 13188–13193.
Feil R & Metzger D (eds) (2007) Conditional Mutagenesis: An Kawakami K (2005) Transposon tools and methods in zebrafish.
Approach To Disease Models. Springer-Verlag. Dev. Dyn. 234, 244–254.
Kühn R & Torres RM (2002) Cre/loxP recombination system and Stanford WL, Cohn JB & Cordes SP (2001) Gene-trap mutagenesis:
gene targeting. Methods Mol. Biol. 180, 175–204. past, present and beyond. Nat. Rev. Genet. 2, 756–768.
Le Y & Sauer B (2000) Conditional gene knockout using cre Yamamura K-I & Araki K (2008) Gene trap mutagenesis in mice:
recombinase. Methods Mol. Biol. 136, 477–485. new perspectives and tools in cancer research. Cancer Sci.
Wallace HA, Marques-Kranc F, Richardson M et al. (2007) 99, 1–6.
Manipulating the mouse genome to engineer precise
functional syntenic replacements with human sequence. Cell Large-scale knockout mouse projects and mouse
128, 197–209. phenotyping
Wu S, Ying G, Wu Q & Capecchi MR (2007) Toward simpler and
faster genome-wide mutagenesis in mice. Nat. Genet. 39, Abbott A (2009) The check-up. Nature 460, 947–948. [A report on
922–930. the German Mouse Clinic.]
Brown SD, Wurst W, Kuhn R & Hancock JM (2009) The functional
annotation of mammalian genomes: the challenge of
Zinc finger nucleases and targeted genome
phenotyping. Annu. Rev. Genet. 43, 305–333.
modification in non-murine vertebrates Grimm D (2006) Mouse genetics. A mouse for every gene. Science
Geurts AM, Cost GJ, Freyvert Y et al. (2009) Knockout rats via 312, 1862–1866.
embryo microinjection of zinc-finger nucleases. Science 325, International Mouse Knockout Consortium (2007) A mouse for all
433. reasons. Cell 128, 9–13.
McCreath KJ, Howcroft J, Campbell KH et al. (2000) Production Morgan H, Beck T, Blake A et al. (2010) EuroPhenome: a repository
of gene targeted sheep by nuclear transfer from cultured for high-throughput mouse phenotyping data. Nucleic Acids
somatic cells. Nature 405, 1066–1069. Res. 38, D577–D585.
Remy S, Tesson L, Menoret S et al. (2010) Zinc-finger nucleases: a Pettitt SJ, Liang Q, Rairdan XY et al. (2009) Agouti C57BL/6N
powerful tool for genetic engineering of animals. Transgenic embryonic stem cells for mouse genetic resources. Nat.
Res. (in the press) (DOI:10.1007/s11248-009-9323-7). Methods 6, 493–495.
674 Chapter 20: Genetic Manipulation of Animals for Modeling Disease and Investigating Gene Function
Genetic redundancy Jonkers J & Berns A (2002) Conditional mouse models of sporadic
Kafri R, Springer M & Pilpel Y (2009) Genetic redundancy: new cancer. Nat. Rev. Cancer 2, 251–265.
tricks for old genes. Cell 6, 389–392. Rangarajan A & Weinberg RA (2003) Opinion: comparative
biology of mouse versus human cells: modelling human
Genetic screens and gene function analyses in cancer in mice. Nat. Rev. Cancer 3, 952–959.
vertebrates Santos J, Fernandez-Navarro P, Villa-Morales M et al. (2008)
Genetically modified mouse models in cancer studies.
Aitman TJ, Critser JK, Cuppen E et al. (2008) Progress and Clin. Transl. Oncol. 10, 794–803.
prospects in rat genetics: a community view. Nat. Genet. Stoletov K & Klemke R (2008) Catch of the day: zebrafish as a
5, 516–522. human cancer model. Oncogene 27, 4509–4520.
Jorgensen EM & Mango SED (2002) The art and design of genetic
screens: Caenorhabditis elegans. Nat. Rev. Genet. 3, 356–369. Mouse genetic background and modifier genes
Kile BT & Hilton DJ (2005) The art and design of genetic screens: Dietrich WF, Lander ES, Smith JS et al. (1993) Genetic
mouse. Nat. Rev. Genet. 6, 557–567. identification of Mom-1, a major modifier locus affecting Min-
Mashimo T, Yanagihara K, Tokuda S et al. (2008) An ENU-induced induced intestinal neoplasia in the mouse. Cell 75, 631–639.
mutant archive for gene targeting in rats. Nat. Genet. Jackson Laboratory Manual on Mouse Genetic Background.
5, 514–515. http://jaxmice.jax.org/manual/index.htmlgenetics
Nguyen D & Xu T (2008) The expanding role of mouse genetics Kwong LN, Shedlovsky A, Biehl BS et al. (2007) Identification of
for understanding human biology and disease. Dis. Model Mom7, a novel modifier of ApcMin/+ on mouse chromosome
Mech. 1, 56–66. 18. Genetics 176, 1237–1244.
Skromne I & Prince VE (2008) Current perspectives in zebrafish Nadeau J (2003) Modifier genes and protective alleles in humans
reverse genetics: moving forwards. Dev. Dyn. 237, 861–882. and mice. Curr. Opin. Genet. Dev. 13, 290–295.
St Johnston D (2002) The art and design of genetic screens:
Drosophila melanogaster. Nat. Rev. Genet. 3, 176–188. General reviews on mouse disease models
Modeling single-gene disorders and difficulties in replicating human disease
Guilbault C, Saeed Z, Downey GP & Radzioch D (2007) Cystic phenotypes
fibrosis mouse models. Am. J. Respir. Cell Mol. Biol. 36, 1–7. Bartlett NW, Walton RP, Edwards MR et al. (2008) Mouse models
Hickey MA & Chesselet MF (2003) The use of transgenic and of rhinovirus-induced disease and exacerbation of allergic
knock-in mice to study Huntington’s disease. Cytogenet. airway inflammation. Nat. Med. 14, 199–204.
Genome Res. 100, 276–286. Bedell MA, Largaespada DA, Jenkins NA & Copeland NG (1997)
Kist R, Watson M, Wang X et al. (2005) Reduction of Pax9 Mouse models of disease. Part II: recent progress and future
gene dosage in an allelic series of mouse mutants causes directions. Genes Dev. 11, 11–43.
hypodontia and oligodontia. Hum. Mol. Genet. 14, 3605–3617. Dennis C (2006) Cancer: off by a whisker. Nature 442, 739–741.
Mankodi A, Logigian E, Callahan L et al. (2000) Myotonic Elsea SH & Lucas RE (2003) The mousetrap: what we can learn
dystrophy in transgenic mice expressing an expanded CUG when the mouse model does not mimic the human disease.
repeat. Science 289, 1769–1773. ILAR J. 43, 66–79.
Seznec H, Agbulut O, Sergeant N et al. (2001) Mice transgenic for Liao B-Y & Zhang J (2008) Null mutations in human and mouse
the human myotonic dystrophy region with expanded CTG orthologs frequently result in different phenotypes. Proc. Natl
repeats display muscular and brain abnormalities. Hum. Mol. Acad. Sci. USA 105, 6987–6992.
Genet. 10, 2717–2726. Thyagarajan T, Totey S, Danton MJS & Kulkarni AB (2003)
Slow EJ, van Raamsdonk J, Rogers D et al. (2003) Selective striatal Genetically altered mouse models: the good, the bad and the
neuronal loss in a YAC128 mouse model of Huntington ugly. Crit. Rev. Oral Biol. Med. 14, 154–174.
disease. Hum. Mol. Genet. 12, 1555–1567.
Yamamoto A, Lucas JJ & Hen R (2000) Reversal of neuropathology Genetically modified invertebrates for modeling
and motor dysfunction in a conditional model of disease and drug discovery
Huntington’s disease. Cell 101, 57–66.
Bier E (2005) Drosophila, the golden bug, emerges as a tool for
Modeling chromosomal disorders human genetics. Nat. Rev. Genet. 6, 9–23.
Bilen J & Bonini NM (2005) Drosophila as a model for human
Li Z, Yu T, Morishima M et al. (2007) Duplication of the entire
neurodegenerative disease. Annu. Rev. Genet. 39, 153–171.
22.9 Mb human chromosome 21 syntenic region on mouse
Olsen A, Vantipalli MC & Lithgow GJ (2006) Using Caenorhabditis
chromosome 16 causes cardiovascular and gastrointestinal
elegans as a model for aging and age-related diseases. Ann.
abnormalities. Hum. Mol. Genet. 16, 1359–1366.
N.Y. Acad. Sci. 1067, 120–128.
O’Doherty A, Ruf S, Mulligan C et al. (2005) An aneuploidy mouse
strain carrying human chromosome 21 with Down syndrome
Genetic modification of non-murine vertebrates:
phenotypes. Science 309, 2033–2037.
Reeves RH & Gardner CC (2007) A year of unprecedented progress and novel pluripotent stem cells
progress in Down syndrome basic research. Ment. Retard. Dev. Buehr M, Meek S, Blair K et al. (2008) Capture of authentic
Disabil. Res. Rev. 13, 215–220. embryonic stem cells from rat blastocysts. Cell 135,
1287–1298.
Modeling cancers Bugos O, Bhide M & Zilka N (2009) Beyond the rat models of
Evers B & Jonkers J (2006) Mouse models of BRCA1 and BRCA2 human neurodegenerative disease. Cell. Mol. Neurobiol. 29,
deficiency: past lessons, current understanding and future 859–869.
prospects. Oncogene 25, 5885–5897. Honda A, Hirose M, Inoue K et al. (2008) Stable embryonic stem
Frese KK & Tuveson DA (2007) Maximizing mouse cancer models. cell lines in rabbits: potential small animal models for human
Nat. Rev. Cancer 7, 645–658. research. Reprod. Biomed. Online 17, 706–715.
FURTHER READING 675
Genetic Approaches to
Treating Disease 21
KEY CONCEPTS
• Genetic techniques can be applied in various ways in treating disease, regardless of whether
the disease has a genetic origin or not.
• Genetics is being used extensively in the drug discovery process to identify new drug targets.
• Therapeutic recombinant proteins produced by expressing cloned DNA are safer than those
sourced from animal and human sources.
• Genetic engineering of vaccines may involve inserting genes for desired antigens into a viral
vector and using either the modified vector or the purified antigen as the vaccine.
Alternatively, DNA can be used as a vaccine by direct injection into muscle cells.
• Cells lost by disease or injury can be replaced by suitable stem cells that have been directed
to differentiate into the correct specialized cell type.
• Nuclear reprogramming involves fundamental alteration of the chromatin structure and
gene expression pattern so as to convert differentiated cells into less specialized cells
(dedifferentiation) or to a different type of differentiated cell (transdifferentiation).
• Autologous cell therapy involves conversion of a patient’s cells into a type of cell that is
deficient through disease or injury. In principle this can be done by artificially stimulating
the mobilization of existing stem cells in the patient or by nuclear reprogramming.
• Skin fibroblasts from a patient can be dedifferentiated to make patient-specific totipotent or
pluripotent stem cells that can be differentiated to required cell types for use in autologous
cell therapy and for studying disease. In therapeutic cloning the nucleus from a patient’s cell
is transplanted into an enucleated egg cell from a donor, giving rise to a totipotent cell. More
conveniently, the skin cells can simply be induced to express key transcription factors that
will convert them to pluripotent stem cells.
• Gene therapy involves transferring genes, RNA, or oligonucleotides into the cells of a patient
so as to alter gene expression in some way that counteracts or alleviates disease. The
patient’s cells are often genetically modified in culture and then returned to the body.
• Gene therapy strategies often seek to compensate for underproduction of an important
gene product. Other strategies are designed to block the expression of a mutant gene that
makes a harmful gene product, or to induce alternative splicing so that a harmful mutation
does not get included in a mRNA, or to kill harmful cells, for example in cancer gene
therapy.
• Viral vectors are commonly used in gene therapy because they have good gene transfer
rates, but they sometimes provoke strong immune responses and also abnormal activation
of cellular genes such as proto-oncogenes.
• Zinc finger nucleases can be designed to make just one double-strand DNA break within the
genome of intact cells. Cellular pathways that repair double-strand DNA breaks can then be
induced to replace a sequence containing a harmful gene mutation with the normal allelic
sequence.
678 Chapter 21: Genetic Approaches to Treating Disease
In the preceding chapters in this part of the book, we have looked at how genetics
can be used to test individuals and populations for genetic diseases and suscep-
tibility for complex diseases. Chapter 20 explored genetic manipulation of ani-
mals to model human diseases. In this chapter, we look at how genetics can be
employed to treat diseases. A chapter on disease treatment might reasonably
consider two quite separate matters: the treatment of genetic disease, and the
genetic treatment of disease. We therefore consider briefly what these two areas
cover before describing how genetic techniques are being used directly to treat
disease.
Despite much work, however, relatively few genetic diseases have wholly sat-
isfactory treatments.
IL2R
Insulin diabetes
Leptin obesity
Erythropoietin anemia
lines (or microbial bioreactors). Instead, transgenic animals have been used
because large amounts of a recombinant human protein can be produced in
ways that simplify its purification. One way is to arrange that the desired protein
is secreted in the animal’s milk. An expression vector is designed with the coding
sequence for the human protein under the control of a promoter from a gene that
normally makes a milk protein, and the recombinant DNA is inserted into the
animal’s germ line (Figure 21.2).
mammary gland-
specific promoter
pA transfection
cDNA for
protein X
transgene
X – cells goat (or cow,
rabbit, etc.)
(A) rodent mAb (B) chimeric Ab (C) humanized Ab Figure 21.3 Antibody engineering.
(A) Classic monoclonal antibodies (mAbs) are
monospecific rodent antibodies synthesized
by hybridomas. (B) Chimeric V/C antibodies
(Abs) are genetically engineered to have
rodent variable region sequences containing
the critically important hypervariable
complementarity determining region (CDR)
joined to human constant region sequences.
(C) Humanized antibodies can be engineered
so that all the molecule except the CDR is of
human origin. (D) More recently, it has been
possible to obtain fully human antibodies
(D) fully human Ab (E) scFv Ab KEY:
rodent human by different routes (see the text and Figure
21.4). (E) Single-chain variable fragment
constant (C) region domains (scFv) antibodies contain variable region
domains jointed by a peptide linker. They
variable (V) region domains are particularly well suited to working within
the reducing environment of cells and serve
complementarity determining region as intrabodies (intracellular antibodies) by
binding to specific antigens within cells.
Depending on the length of the linker, they
bind their target as monomers, dimers, or
trimers. Multimers bind their target more
fully human antibodies have recently been obtained against nicotine. They have strongly than monomers.
been shown to be effective in inhibiting the entry of nicotine into the brain in
mice; if the same effect is replicated in clinical trials they could be a valuable aid
in stopping smoking.
One very promising class of therapeutic antibody is designed to have a single
polypeptide chain. Single-chain variable fragment (scFv) antibodies have almost
all the binding specificity of a mAb but are restricted to a single non-glycosylated
variable chain (see Figure 21.3E). scFV antibodies can be made on a large scale in
bacterial cells, yeast cells, or even plant cells. They are particularly well suited to
acting as intracellular antibodies (intrabodies). Instead of being secreted like
normal antibodies, they are designed to bind specific target molecules within
cells and can be directed as required to specific subcellular compartments.
Unlike standard antibodies, which have four polypeptide chains linked by disul-
fide bridges, intrabodies are stable in the reducing environment within cells.
Intrabodies can be used to carry effector molecules that can perform specific
functions when antigen binding occurs. However, for many therapeutic pur-
poses, they are designed simply to block specific protein–protein associations
within cells. As such, they complement conventional drugs. Protein–protein
interactions usually occur across large, flat surfaces and are considered unsuit-
able targets for typical small-molecule drugs, which normally operate by fitting
snugly into clefts on the surface of macromolecules. Promising therapeutic target
proteins for intrabodies include mutant proteins that tend to misfold in a way
that causes neurons to die, as in various neurodegenerative diseases including
Alzheimer, Huntington and prion diseases.
From inauspicious beginnings in the 1980s, mAbs have become the most
successful biotech drugs ever, and the market for mAbs is the fastest-growing
component of the pharmaceutical industry. By the middle of 2008, a total of
21 mAb therapies had been approved by the US FDA. Of the therapeutic mAbs
currently in use, the eight bestsellers together generate an annual income of
close to $30 billion. Another 200 mAb products are in the pipeline. Of the FDA-
approved mAbs, 19 are partly or fully human and most of them are aimed at treat-
ing cancers or autoimmune disease (Table 21.2). Only one of the approved anti-
bodies is directed against infectious disease, being used to fend off respiratory
syncytial virus in infants.
Safety issues can sometimes be a concern, however. In 2006 a clinical trial to
assess a mAb called TGN1412 left six volunteers fighting for their lives. TGN1412
was hoped to be an effective agent in cancer immunotherapy, in which the strat-
egy is to stimulate immune system cells so as to provoke a strong immune
response against the cancer cells. However, TGN1412 activated immune cells to
such an extent that the volunteers developed intense and nearly fatal allergic
reactions to the mAb.
GENETIC APPROACHES TO DISEASE TREATMENT USING DRUGS, RECOMBINANT PROTEINS, AND VACCINES 685
myeloma spleen
cells cells
hybridomas
anti-X mAbs
TABLE 21.2 PARTLY OR FULLY HUMAN THERAPEUTIC MONOCLONAL ANTIBODIES mAbs APPROVED BY THE US FOOD
AND DRUG ADMINISTRATION AS OF MID2008
Disease category Targeta mAb generic name (trade name) mAb classb Disease treated
factor a; EGFR, epidermal growth factor receptor; HER2, human epidermal growth factor receptor 2; VEGF, vascular endothelial growth factor;
GPIIb/IIIa, a platelet integrin; RSV, respiratory syncytial virus; NSCL, non-small-cell lung. bSee Figure 21.3 for structures.
chronic viral infections. Several aptamers are currently being tested in preclinical
and clinical trials, and recently the aptamer pegaptanib (Macugen®) received
FDA approval for the treatment of neovascular (wet) age-related macular degen-
eration, a principal cause of blindness in adults. Loss of vision in this disorder
arises by the formation of new blood vessels (angiogenesis) and leakage from
blood vessels, both of which are promoted by vascular endothelial growth factor
(VEGF). Pegaptanib aptamers specifically bind to VEGF and inhibit its angiogen-
esis functions.
Pluripotent embryonic stem cells are derived from the early embryo, a stage
in development at which cells are naturally still unspecialized. Later, as tissues
form in development, cells become more and more specialized but some stem
cells are retained in the fetus and in adults. They are needed to regenerate tissue.
Our blood, skin, and intestinal epithelial cells, for example, have limited lifetimes
and are replaced regularly by new cells produced by differentiation from stem
cells. Stem cells are also used in tissue repair. Although many stem cells derived
from fetal and adult tissues are tissue-specific, mesenchymal stem cells are found
in a wide range of tissues and may constitute a subset of cells associated with
minor blood vessels (pericytes).
Embryonic stem cells
Cells of the inner cell mass of a blastocyst can be isolated and cultured to gener-
ate embryonic stem (ES) cell lines (Box 21.2). Mouse ES cells have been studied
for many years but, after the isolation of human ES cell lines was first reported in
1998, stem cell therapy was propelled to the forefront of both medical research
and ethical debate. ES cells have clear advantages for cell therapy because they
can be grown comparatively readily in culture and propagated indefinitely
and can give rise to cell lineages. Different ES cell lines show significant variation
in some properties, however, such as their potential for differentiation. Some ES
cell lines readily differentiate toward mesodermal lineages, for example, whereas
others do not.
The use of ES cell lines is contentious because the methods used to isolate
them have traditionally involved the destruction of a human embryo to obtain
cells in the inner cell mass. Nevertheless, some human ES cell lines have been
made without destroying an embryo. For example, individual cells (blastomeres)
can be carefully isolated from very early embryos in such a way that the embryo
remains viable, and a few human ES cell lines have been generated from indi-
vidual blastomeres isolated in this way.
Tissue stem cells
Deriving stem cells from adult human tissues and adult and cord blood has not
been contentious. Tissue stem cells were initially thought to be comparatively
plastic under the right experimental conditions. For example, when bone mar-
row stem cells are placed in the environment of liver tissue they seem to be
induced to give rise to new liver cells. However, when rigorously tested, there is
little evidence to support claims of this type. Instead, tissue stem cells are now
known to be almost always rather limited in their differentiation potential. An
exception is provided by the adult germ line—pluripotent germ-line stem cells
have recently been derived from spermatogonial cells of the adult human testis
and show many similarities to ES cells. Many tissue stem cells are also difficult to
grow in culture.
While normal mobilization of an individual’s stem cells can help make small
repairs to different degrees, large injuries and serious disease pose too much of a
challenge for the body’s natural repair processes. But what if we could artificially
enhance the normal stem cell response to repair? Bone marrow cells that are
mobilized in response to disease or injury are an attractive target for artificially
enhanced mobilization. They include mesenchymal stem cells, which can
become bone or cartilage cells, and endothelial progenitor cells, which produce
the cells that make up our blood vessels and help with tissue repair. A promising
report published in early 2009 showed that administration of the drug Mozobil®
and a growth factor was found to elicit a hundredfold increase in the release of
endothelial and mesenchymal progenitor cells from the bone marrow in mice.
epidermis
neural cells
C
macrophage
lymphocyte
muscle
exocrine
embryo pancreas
D
endocrine
pancreas
stomach
unfertilized ICM
egg blastocyst B
ES cells
differentiated
cells
Figure 1 Experimental routes for nuclear reprogramming in mammals. Normal steps in development and tissue differentiation are shown by
black arrows. Experimental procedures are shown as blue arrows. (A) Nuclear transfer into an unfertilized egg reprograms a somatic nucleus to
dedifferentiate to give a totipotent cell. (B) A somatic cell is stimulated to dedifferentiate to become a pluripotent stem cell that closely resembles
an ES cell. (C) Lineage switching is a form of transdifferentiation that involves dedifferentiation followed by differentiation down a different
differentiation pathway. (D) Transdifferentiation by direct conversion of one somatic cell into another. ICM, inner cell mass.
692 Chapter 21: Genetic Approaches to Treating Disease
The resulting cells would be stimulated to develop into blastocysts, and inner cell
mass cells would be removed to make ES cells. This type of approach became
known as therapeutic cloning, to distinguish it from the banned reproductive
cloning (Figure 21.5).
Therapeutic cloning has been controversial. Not only does it involve the
destruction of an unfertilized human egg, but also, by providing the technology
for cloning human blastocysts, it has raised fears that human reproductive clon-
ing could be facilitated. In addition, obtaining eggs from human donors involves
an invasive and potentially dangerous procedure. The attraction of therapeutic
cloning was that it offered the possibility of making patient-specific ES cells that
could, in principle, be used for autologous cell therapy, and disease-specific ES
cells that could be used for studying pathogenesis and in drug screening as
described below. In practice, cloning human blastocysts by somatic cell nuclear
transfer proved to be extraordinarily inefficient—thus far, there is no report of a
human ES cell line made by this method.
1996 production of the first cloned mammal, a sheep called Dolly, by somatic cell nuclear transfer; see Figure 20.4 (Nature 380,
64–66; PMID 8598906)
2004 chemical screens identify small molecules that cause reversal of differentiation, such as reversine, which can cause myoblasts
to dedifferentiate into progenitor cells (J. Am. Chem. Soc. 126, 410–411; PMID 14719906 and Nat. Biotechnol. 22, 833–840;
PMID 15229546)
2004 reprogramming of nuclei transferred from olfactory sensory neurons to eggs to generate totipotent cells and subsequently
cloned mice (Nature 428, 44–49; PMID 14990966)
2004 stepwise reprogramming of B cells into macrophages—an early example of a lineage switch (Cell 117, 663–676;
PMID 15163413)
2005 nuclear reprogramming of somatic cells after fusion with human ES cells (Science 309, 1369–1373; PMID 16123299)
2006 reprogramming of cultured mouse embryonic and adult fibroblasts to pluripotent stem cells following transfection by genes
encoding just four transcription factors (Cell 126, 663–676; PMID 16904174)
2007 induction of pluripotency in human somatic cells (Cell 131, 861–872; PMID 18035408 and Science 318, 1917–1920;
PMID 18029452)
2008 transdifferentiation by in vivo reprogramming of adult pancreatic exocrine cells to beta cells (Nature 455, 627–632;
PMID 18754011)
2008 nuclear reprogramming provides the first patient-specific and disease-specific pluripotent stem cells (Cell 134, 877–886;
PMID 18691744 and Science 321, 1218–1221; PMID 18669821)
2009 generation of fertile viable mice derived exclusively from iPS cells (Nature 461, 86–90; PMID 19672241, Nature 461, 91–94;
PMID 19672243, and Cell Stem Cell 5, 135–138; PMID 19631602)
2009 generation of iPS cells using recombinant proteins only (Cell Stem Cell 4, 381–384; PMID 19398399)
2010 reprogramming of fibroblasts to give functional neurons (Vierbuchen T et al., Nature 463, 1035–1041; PMID 20107439)
iPS cell, induced pluripotent stem cell; PMID, PubMed identifier number.
694 Chapter 21: Genetic Approaches to Treating Disease
differentiated cells
For some disorders, the use of iPS cell-derived cells also means that drugs can
now be tested directly on the relevant human cells. Drugs can be tested for their
ability to reduce the progression of the pathogenesis or even to counteract it in
some way, and the toxicity levels can be determined directly, rather than relying
on animal studies.
The potential of iPS cells for efficient autologous cell therapy remains unclear.
Safety concerns were an issue with the original technology, which used retroviral
vectors to transfect reprogramming genes, one of which was a known oncogene,
c-Myc. However, new methods do not use c-Myc, and avoid genome integration
by using non-integrating vectors or by transducing cells with the pluripotency-
inducing transcription factor proteins themselves, or miRNAs and other small
molecules involved in the same pathways. Performing efficient precisely directed
differentiation remains a major challenge. Like ES cells, undifferentiated iPS cells
can give rise to tumors, and there is considerable ignorance about the molecular
details of the long, multi-step differentiation pathways required to convert plu-
ripotent stem cells to differentiated somatic cells.
Transdifferentiation
An alternative to iPS cells is to use lineage-reprogramming methods that would
result in limited dedifferentiation to produce oligopotent progenitors of a desired
cell type, or a direct cell conversion (transdifferentiation) to produce the desired
cell type (see Box 21.3 Figure 1, pathways C and D). For example, pancreatic exo-
crine cells could be reprogrammed to make pancreatic beta cells that could pos-
sibly be induced to join pancreatic islets to replace beta cells lost in type 1 diabe-
tes. By just taking one step back in differentiation or a sideways step, differentiation
toward producing the desired cell type would be much simpler or not even
required. Additionally, disruption to the epigenetic marks that are set during
PRINCIPLES AND APPLICATIONS OF CELL THERAPY 695
TABLE 21.3 SOME PRINCIPAL TARGETS FOR STEM CELL THERAPY AND EXAMPLES OF SUCCESSFUL CELL REPLACEMENT
THERAPY IN RODENT DISEASE MODELS
Disease/injury Cell affected Stem cell therapy Comments/references
Leukemia/lymphoma hematopoietic allogeneic bone marrow transplantation from successful over many decades
stem cells killed by matched donor
chemotherapy
Eye injury/disease limbal stem cells in limbal stem cells are taken from the healthy Pellegrini et al. (1997) Lancet 349, 990–993;
one eye eye, expanded ex vivo, and transplanted back PMID 9100626 and Kolli et al. (2010) Stem
into the affected eye Cells in press; PMID 20014040
Spinal cord injuries neurons transplantation of fetal neural stem cells or ES some evidence for successful treatment;
cell-derived oligodendrocyte progenitors first target for approved clinical trials using
ES cells
Heart disease cardiomyocytes (heart direct injection of bone marrow stem cells conflicting views on the degree of success,
muscle cells) (including hematopoietic stem cells and and uncertainty regarding the underlying
mesenchymal stem cells) cause of any clinical improvement
Stroke neurons human fetal neural stem cells used as cell first clinical trial underway
source
Parkinson disease midbrain dopaminergic fetal neuronal cells used as cell source difficulties in ensuring that grafted neurons
neurons do not themselves develop disease after a
few years
Rat Parkinson disease midbrain dopaminergic iPS cells were differentiated into neural Wernig et al. (2008) Proc. Natl Acad. Sci. USA
model neurons precursor cells that were transplanted into the 105, 5856–5861; PMID 18391196
fetal mouse brain; the transplanted cells were
functionally integrated, causing improvements
in symptoms
Brain disease (in mouse glial cells as a result of human fetal glial progenitor cells were injected Windrem et al. (2008) Cell Stem Cell 2,
shiverer mutant) myelin deficiency into the nervous system to correct abnormal 553–565; PMID 18522848
brain development
Blindness (in Crx-deficient degeneration of human ES cells were differentiated to give Lamba et al. (2009) Cell Stem Cell 4, 73–79;
mice) photoreceptors retinal cells that were transplanted into the PMID 19128794
retinas of the mouse Crx mutant and restored
some visual function
gene A gene A
greatly overexpress a normal antigen in a way that can provoke weak immune
responses. Immunotherapy is designed to amplify the naturally weak
immune response to cancer cells.
in vivo ex vivo
X X Figure 21.8 In vivo and ex vivo gene
therapy. In ex vivo gene therapy, cells are
+
select X removed from the patient, genetically
remove cells; modified in some way in the laboratory, and
cells amplify returned to the patient. This allows just the
appropriate cells to be treated, and they
patient’s cells
–
some cells
+
can be checked to make sure they have the
X now X correct genetic modification before they are
+
X cells returned to the patient. For many tissues,
this is not possible and the cells must be
return cells to body modified within the patient’s body (in vivo
gene therapy).
700 Chapter 21: Genetic Approaches to Treating Disease
TABLE 21.4 VIRAL VECTORS OF USE IN MAMMALIAN GENE TRANSFECTION AND GENE THERAPY
Virus class Viral genome Cloning Interaction Target cells Transgene Comments
capacity with host expression
genome
g-Retroviruses ssRNA 7–8 kb integrating dividing cells only long-lasting moderate vector yielda;
(oncoretroviruses) risk of activation of
cellular oncogene
Lentiviruses ssRNA; ~9 kb up to 8 kb integrating dividing and long-lasting high vector yielda; risk of
non-dividing cells; and high-level oncogene activation
tropism varies expression
Adenoviruses dsDNA; often 7.5 kb, non-integrating dividing and transient but high- high vector yielda;
up to 38 kb but up to non-dividing cells level expression immunogenicity can be
35 kb for a major problem
gutless vectors
Adeno-associated ssDNA; 5 kb < 4.5 kb non-integrating dividing and high-level high vector yielda;
viruses non-dividing cells expression in small cloning capacity
medium to long but immunogenicity is
term (year) less significant than for
adenovirus
Herpes simplex dsDNA; > 30 kb non-integrating central nervous potential for long- able to establish lifelong
virus 120–200 kb system lasting expression latent infections
Some viruses that are used to make gene therapy vectors are naturally patho-
genic, and some can potentially generate strong immune responses. For safety
reasons, viral vectors are generally designed to be disabled so that they are
replication-defective. However, replication-competent viruses have sometimes
been used as therapeutic agents, notably oncolytic viruses for treating cancers.
Where a virus is naturally immunogenic, the viral vectors are modified in an
effort to reduce or eliminate the immunogenicity.
Retroviral vectors
Retroviruses—RNA viruses that possess a reverse transcriptase—deliver a nu-
cleoprotein complex (preintegration complex) into the cytoplasm of infected
cells. This complex reverse-transcribes the viral RNA genome and then integrates
the resulting cDNA into a single, but random, site in a host cell’s chromosome.
Retroviruses offer high efficiencies of gene transfer but can be generated at titers
that are not so high as some other viruses, and they can afford moderately high
gene expression. Because they integrate into chromosomes, long-term stable
transgene expression is possible, but uncontrolled chromosome integration con-
stitutes a safety hazard because promoter/enhancer sequences in the recom-
binant DNA can inappropriately activate neighboring chromosomal genes.
g-Retroviral vectors are derived from simple mouse and avian retroviruses
that contain three transcription units: gag, pol, and env (Figure 21.9). In addition,
a cis-acting RNA element, y, is important for packaging, being recognized by viral
proteins that package the RNA into infectious particles. Because native g-retrovi-
ruses transform cells, the vector systems need to be engineered to ensure that
they can produce only permanently disabled viruses. g-Retroviruses cannot get
their genomes through nuclear pores and so infect dividing cells only. This limi-
tation can, however, be turned to advantage in cancer treatment. Actively divid-
ing cancer cells in a normally non-dividing tissue such as brain can be selectively
infected and killed without major risk to the normal (non-dividing) cells.
Lentiviruses are complex retroviruses that have the useful attribute of infect-
ing non-dividing as well as dividing cells, and can be produced in titers that are a
hundredfold greater than is possible for g-retroviruses. In addition to expressing
late (post-replication) mRNAs from gag, pol, and env transcription units, six early
702 Chapter 21: Genetic Approaches to Treating Disease
REGION REPLACED
viral proteins are produced before replication of the virus. The early proteins
include two regulatory proteins, tat and rev, that bind specific sequences in the
viral genome and are essential for viral replication. Like other retroviruses, they
allow long-term gene expression. Results with marker genes have been promis-
ing, showing prolonged in vivo expression in muscle, liver, and neuronal tissue.
Most lentiviral vectors are based on HIV, the human immunodeficiency virus,
and much work has been devoted to eliminating unnecessary genes from the
complex HIV genome and generating safe packaging lines while retaining the
ability to infect non-dividing cells. Lentiviruses appear to have a safer chromo-
some integration profile than g-retroviruses; self-inactivating lentivirus vectors
provide an additional layer of safety.
Adenoviral and adeno-associated virus (AAV) vectors
Adenoviruses are DNA viruses that cause benign infections of the upper respira-
tory tract in humans. As with retroviral vectors, adenoviral vectors are disabled
and rely on a packaging cell to provide vital functions. The adenovirus genome is
relatively large, and gutless adenoviral vectors, which retain only the inverted ter-
minal repeats and packaging sequence, can accommodate up to 35 kb of thera-
peutic DNA (Figure 21.10).
Adenovirus virus vectors can be produced at much higher titers than g-retro-
viruses and so transgenes can be highly expressed. They can also efficiently
transduce both dividing and non-dividing cells. A big disadvantage is their
immunogenicity. Even though a live replication-competent adenovirus vaccine
has been safely administered to several million US army recruits over several
decades (for protection against natural adenoviral infections), unwanted immune
reactions have been a problem in several gene therapy trials, as described below.
Moreover, because these vectors are non-integrating, gene expression is short-
PRINCIPLES OF GENE THERAPY AND MAMMALIAN GENE TRANSFECTION SYSTEMS 703
to kill harmful cells, but instead to counteract the harmful effects of genes within
cells. Some conditions are caused by gain-of-function mutations (Chapter 13,
p. 428) or a dominant-negative effect (Chapter 13, p. 431), and the problem is a
resident gene that is doing something positively harmful. Diseases that might
benefit from such strategies could include cancers that arise through activated
oncogenes, dominant Mendelian conditions other than those caused by loss of
gene function, and autoimmune diseases. Infectious diseases might also be
treated by targeting a pathogen-specific gene or gene product.
In principle, two groups of strategies could be used to counteract a gene’s
harmful effects. One way is to specifically block expression of the harmful gene
by downregulating transcription, by destroying the transcript, or by inhibiting
the protein product. In Section 21.2 we outlined various methods that inhibit
gene expression at the protein level; here, we will consider methods that inhibit
or cleave specific RNA transcripts. A second type of strategy seeks to restore the
normal gene function in some way. Either the DNA sequence is altered to correct
a genetic defect, or alternative gene splicing is artificially induced so as to bypass
a causative mutation at the RNA level.
Whatever method is used, some sort of agent or construct must be delivered
to a target cell and made to function there. In that respect, the problems of effi-
cient delivery and expression are similar to those described in the previous sec-
tion. In the case of dominant diseases or activated oncogenes, there is the addi-
tional problem of designing an agent that will selectively attack the mutant allele
without affecting the normal allele.
ensure that sufficient intact antisense oligonucleotides (AOs) arrived at the target
tissue, clinical applications focused on applying the AOs directly to diseased tis-
sue. In 1998, Vitravene® (fomivirsen) became the first FDA-approved therapeutic
AO; it is applied directly to the eyes to treat cytomegalovirus retinitis in AIDS
patients. The cornea (and anterior chamber of the eye) is an immunologically
privileged site and so a high concentration of reagents can be administered
directly by injection.
Subsequent technological developments produced a second generation of
AOs based on nucleic acid analogs that are much more chemically stable than
normal oligonucleotides, such as 2-O-methyl phosphorothioate oligonucle-
otides, morpholino phosphorothioate oligonucleotides, and locked nucleic acids
(Figure 21.12). As we saw in Section 20.3, morpholino oligonucleotides are now
widely used to knock down genes in some model organisms, but there have
been few therapeutic applications that exploit morpholinos or other second-
generation AOs to block gene expression. Although second-generation AOs are
chemically stable, off-target effects can occur when the AO inhibits RNAs other
than the target.
In addition, as for any other oligonucleotide, efficient delivery of AOs to cells
has been a significant problem. In an effort to increase the in vivo uptake of AOs
by cells, some second-generation AOs, such as the morpholino oligonucleotides,
have been modified by conjugating peptides to them; however, this can some-
times provoke immune reactions.
Therapeutic ribozymes
A separate wave of RNA therapeutics focused on RNA enzymes (ribozymes), such
as the plant hammerhead ribozyme, that naturally cleave other RNA target mol-
ecules. Genetically modified ribozymes have been made in which a catalytic,
RNA-cleaving domain from a natural ribozyme is fused to an antisense RNA
sequence designed to bind to transcripts from a specific target gene of interest.
The modified ribozyme would bind to the transcripts of the target gene and
cleave them. Ribozymes have, however, not been so effective in practice: clinical
trials to treat cancer and other diseases have had rather disappointing
outcomes.
(A) (B)
O base O base
O O
OH OMe
O O
O P O– O P S–
O base O base
O O
OH OMe
O O
(C) (D)
O O base
O
O base
N Figure 21.12 Second-generation
antisense oligonucleotides. (A) The
O P N normal ribose phosphate backbone
O base of RNA. (B) Structure of a 2-O-methyl
O O
O phosphorothioate oligonucleotide.
(C) Structure of a morpholino
N O P O– phosphorothioate oligonucleotide.
(D) Structure of an oligonucleotide known
as a locked nucleic acid.
RNA AND OLIGONUCLEOTIDE THERAPEUTICS AND THERAPEUTIC GENE REPAIR 707
Therapeutic siRNA
In the 1990s, the discovery of RNA interference (RNAi) transformed the potential
for specific inhibition of gene expression and provided a stimulus to the develop-
ment of RNA therapeutics. Experimentally, the object is to produce a double-
stranded RNA with a base sequence corresponding to a transcribed part of the
target gene, and so provoke natural cellular defense pathways to destroy tran-
scripts of the specific gene. Long double-stranded RNA cannot be used for this
purpose in human (or other mammalian) cells because it provokes interferon
responses that result in nonspecific inhibition of gene expression (see Chapter 12,
p. 391). However, short double-stranded RNA is highly efficient at inducing
knockdown of the expression of specific genes.
Performing RNAi in mammalian cells often involves transfecting transgenes
that express a short hairpin RNA. After transfection into cells, the expressed RNA
is cleaved by the cytoplasmic enzyme dicer to give a double-stranded siRNA
about 21–23 nucleotides long (see Figures 12.3 and 12.4). Other RNAi therapies
deliver just the naked siRNA.
siRNA technology is highly efficient at gene knockdown in mammalian cells
and is transforming our understanding of human gene function. There is there-
fore considerable excitement about its therapeutic potential. Promising thera-
peutic targets would include viral infections, cancers (targeting oncogenes), and
neurodegenerative disease (targeting harmful mutant alleles). The method has
been used with some success in treating various animal models of disease, but
delivery is comparatively inefficient because of the reliance on non-viral vector
systems.
Some early clinical trials using siRNA have focused on the immunologically
privileged eye. One principal target is the VEGF gene that underlies macular
degeneration (as described in the section on therapeutic aptamers on page 686)
and has involved simply injecting naked siRNA. Applying RNAi to target diseases
with less localized pathology is a much more formidable task. However, systemic
and reasonably effective delivery has been possible in animal models by attach-
ing siRNAs to cholesterol, to ferry the siRNA through the bloodstream.
repaired sequence
BOX 21.8 SOME OF THE MAJOR UPS AND DOWNS IN THE PRACTICE OF GENE THERAPY
1990– commencement of the first clinical trial involving gene therapy, using recombinant g-retroviruses to overcome an autosomal
recessive form of severe combined immunodeficiency (SCID) due to deficiency of adenosine deaminase (ADA); the trial was hailed as
a success but patients had also been treated in parallel with standard enzyme replacement using polyethylene glycol (PEG)-ADA
1999 death of Jesse Gelsinger in September 1999, just a few days after receiving recombinant adenoviral particles by intra-hepatic
injection in a clinical trial of gene therapy for ornithine transcarbamylase deficiency; a massive immune response to the adenovirus
particles resulted in multiple organ failure
2000– the first unambiguous gene therapy success involved using recombinant g-retroviruses to treat an X-linked form of SCID (see
Figure 21.16); however, several of the treated children went on to develop leukemia, with one death, as a result of insertional
activation of cellular oncogenes (see the main text)
2006– transient success in using g-retroviral gene therapy to treat two adult patients with chronic granulomatous disease, a recessive
disorder that affects phagocyte function causing immunodeficiency; despite initial success, both patients went on to show
transgene silencing, and also myelodysplasia as a result of insertional activation of cellular genes (see the main text); 27 months
after gene therapy one patient died from overwhelming sepsis; see Ott et al. (2006) Nat. Med. 12, 401–409; PMID 16582916 and
Stein et al. (2010) Nat. Med. 16, 198–204; PMID 20098431
2006 report of limited success in gene therapy for hemophilia B: AAV2 vectors were used successfully to transduce hepatocytes and
express a Factor IX transgene; however, an immune response led to destruction of the transduced cells; see Manno et al.
Nat. Med. 12, 342–347; PMID 16474400
2009 successful gene therapy for ADA deficiency reported by Aiuti et al. N. Engl. J. Med. 360, 447–458; PMID 19179314
2009 first report of successful gene therapy for a central nervous system disorder, X-linked adrenoleukodystrophy, and the first using a
lentiviral vector; see Cartier et al. Science 326, 818–823; PMID 19892975
2009 report of successful in vivo gene therapy using retinal injection of recombinant AAV to treat Leber congenital amaurosis, a form
of childhood blindness; the significant post-treatment gain in vision was found to be maintained after one year; see Cideciyan et al.
N. Engl. J. Med. 361, 725–727; PMID 19675341
benefits. By contrast, dominant disorders—in which one of the two alleles is nor-
mal and makes a gene product—present much greater challenges. Getting the
correct gene dosage to treat haploinsufficiency could be problematic, and coun-
teracting the effect of a mutant allele that makes a harmful product would require
very efficient and selective silencing of the mutant allele, while leaving the nor-
mal allele unaffected.
monogenic naked/
diseases plasmid
7.9% DNA
17.7%
CD34+ stem cells enriched retroviral vector carrying Figure 21.16 Ex vivo gene therapy
by magnetic bead-antibody IL2RG gene for X-linked severe combined
immunodeficiency disease (X-SCID). Bone
marrow was removed from the patient and
antibody affinity was used to enrich for cells
expressing the CD34 antigen, a marker of
hematopoietic stem cells. To do this, bone
marrow cells were mixed with paramagnetic
beads coated with a CD34-specific
monoclonal antibody; beads containing
bound cells were removed by using a
magnet. The transduced stem cells were
expanded in culture before being returned
to the patient. For details, see Cavazzana-
Calvo M et al. (2000) Science 288, 669–672;
PMID 10784449 and Hacein-Bey-Abina S et
30–150 ml of bone al. (2002) N. Engl. J. Med. 346, 1185–1193;
marrow aspirated under PMID 11961146.
general anesthetic infuse 14–38 million transduce cells in plastic bag
cells per kg body weight for 3 days; cells multiply
5–8-fold
Figure 21.8). Disorders resulting from recessively inherited defects in white blood
cell function were among the first targets for gene therapy. White blood cells have
key roles in the immune system (see Tables 4.7 and 4.8), and defects in their func-
tion can give rise to severe immunodeficiencies.
The first definitive gene therapy success came in treating severe combined
immunodeficiency (SCID), in which both B and T lymphocyte function is defec-
tive. Patients are extremely vulnerable to infectious disease, and in some cases
they have been obliged to live in a sterile environment. The most common form
of SCID is due to inactivating mutations in the X-linked IL2RG gene that makes
the gc common gamma subunit for multiple interleukin receptors, including
interleukin receptor 2. Lymphocytes use interleukins as cytokines to signal to
each other and to other immune system cells, and so lack of the gc cytokine recep-
tor subunit had devastating effects on lymphocyte and immune system function.
Another common form of SCID is due to adenosine deaminase (ADA) deficiency;
the resulting build-up of toxic purine metabolites kills T cells. Because T cells
regulate B-cell activity, both T- and B-cell function is affected.
SCID gene therapy has involved ex vivo g-retroviral transfer of IL2RG or ADA
coding sequences into autologous patient cells (see Box 21.8). To aid the chances
of success, bone marrow cells were used because they are comparatively enriched
in hematopoietic stem cells, the cells that give rise to all blood cells (see Figure
4.17); a further refinement was to select for cells expressing CD34, a marker of
hematopoietic stem cells (Figure 21.16). By 2008 Fischer et al. reported that 17
out of 20 X-linked SCID patients and 11 out of 11 ADA-deficient SCID patients
had been successfully treated and retained a functional immune system (for
more than 9 years after treatment in the earliest patients).
With the use of retrovirus vectors, genes could be inserted into chromosomes;
the therapeutic DNA would be stably transmitted after cell division, giving long-
lasting transgene expression. This clear advantage was eroded by the lack of con-
trol over where the transgenes integrated. In one clinical trial for X-linked SCID,
four patients developed T-acute lymphoblastic leukemia after retroviral integra-
tion because promoter/enhancer sequences in the inserted DNA inappropriately
activated a neighboring LMO2 proto-oncogene. Activation of LMO2 is now
known to promote the self-renewal of pre-leukemic thymocytes, so that commit-
ted T cells accumulate additional genetic mutations required for leukemic
transformation.
Similar protocols were applied to treating patients with an X-linked form of
chronic granulomatous disease (CGD). Patients with CGD have immunodefi-
ciency arising from mutations in any of four genes that encode subunits of the
NADPH oxidase complex. NADPH oxidase is involved in making free radicals and
other toxic small molecules that phagocytes use to kill the microbes that they
engulf; a defective NADPH oxidase results in defective phagocyte function.
712 Chapter 21: Genetic Approaches to Treating Disease
Ovarian cancer tumor cells gene addition to restore tumor intraperitoneal injection of retrovirus or adenovirus encoding full-
suppressor gene function length cDNA encoding p53 or BRCA1, with the hope of restoring cell
cycle control
Ovarian cancer tumor cells oncogene inactivation inject adenovirus encoding a scFv antibody to ErbB2; hope to
inactivate a growth signal
Malignant tumor-infiltrating genetic manipulation of tumor extract TILs from surgically removed tumor and expand in culture;
melanoma lymphocytes cells to trigger apoptosis infect TILs ex vivo with a retroviral vector expressing TNF-a, infuse
(TILs) into patient; hope that TILs will target remaining tumor cells, and the
TNF-a will kill them (see Figure 21.7E for the principle)
Various tumors tumor cells increase antigenicity of tumor transfect tumor cells with a retrovirus expressing a cell surface
cells so that immune system antigen such as HLA-B7 or a cytokine such as IL-12, IL-4, GM-CSF, or
destroys tumor g-interferon; hope that this enhances the immunogenicity of the
tumor, so that the host immune system destroys it; often done ex vivo
with lethally irradiated tumor cells (see Figure 21.7E for the principle)
Prostate cancer dendritic cells modify dendritic cells to treat autologous dendritic cells with a tumor antigen or cDNA
enhance tumor-specific expressing the antigen, to prime them to mount an enhanced
immune reaction immune response to the tumor cells (see Figure 21.7E for the
principle)
Malignant glioma tumor cells genetic modification of tumor inject a retrovirus expressing thymidine kinase (TK) or cytosine
(brain tumor) cells so that they convert a deaminase (CD) into the tumor; only the dividing tumor cells, not
non-toxic prodrug into a toxic the surrounding non-dividing brain cells, are infected; then treat
compound that kills them with ganciclovir (TK-positive cells convert this to the toxic ganciclovir
triphosphate) or 5-fluorocytosine (CD-positive cells convert it to
the toxic 5-fluorouracil); virally infected (dividing) cells are killed
selectively (see Figure 21.17)
Head and neck tumor cells use of oncolytic viruses that are inject ONYX-015 engineered adenovirus into tumor; the virus can only
tumors engineered to selectively kill replicate in p53-deficient cells, so it selectively lyses tumor cells; this
tumor cells treatment was effective when combined with systemic chemotherapy
TNF, tumor necrosis factor; IL, interleukin; GM-GSF, granulocyte/macrophage colony-stimulating factor. See the National Institute of Health clinical trials
database at http://clinicaltrials.gov/ for a comprehensive survey.
714 Chapter 21: Genetic Approaches to Treating Disease
• Use of oncolytic viruses that are engineered to kill tumor cells selectively
• Genetic modification of tumor cells so that they, but not surrounding non-
tumor cells, convert a non-toxic prodrug into a toxic compound that kills
them (Figure 21.17).
As at 2009, three cancer immunotherapies were at the phase III stage of clinical
trials. Also at the phase III stage are adenoviral-based transfer of a herpes simplex
virus thymidine kinase suicide gene for treating glioblastoma (see Figure 21.17)
and adenoviral-based expression of p53 for the treatment of head and neck can-
cers and for Li–Fraumeni (OMIM 151623) tumors.
A general problem is that the methods for killing cancer cells are far from
100% efficient, and even when the efficiency is very high so that tumors shrink,
they can grow again as surviving cancer cells proliferate. The idea of curing can-
cer is being replaced by a more realistic long-term management of cancer
disorders.
CONCLUSION
Genetic technologies can be used in the treatment of disease, regardless of
whether the disease has a genetic cause or not. This can be part of a treatment
regime that involves conventional drugs or vaccines, but genetic manipulations
can also be used in the production of drugs and vaccines. Genetic engineering
has also been used to make therapeutic proteins. Expression cloning in microor-
ganisms, cultured cells, or transgenic livestock (in which the protein of interest is
expressed in milk or eggs) avoids the health risks associated with harvesting these
proteins from animal or human sources. Genetic engineering has also been
applied to make partly or fully human monoclonal antibodies. They are more
stable in human serum than rodent monoclonal antibodies and are consequently
better suited for therapeutic purposes.
Genetic interventions are also being used in two developing and intercon-
nected therapeutic strategies: cell replacement therapy (to replace cells lost
through disease or injury, enabling repair of damaged tissues) and gene therapy
(involving genetic modification of a patient’s cells). Stem cells are important in
cell therapy because they are both self-renewing and also capable of being dif-
ferentiated to make new tissue cells. Bone marrow transplants are an established
form of stem cell therapy to treat blood disorders and hematopoietic stem cell
depletion after chemotherapy.
To avoid immune rejection, autologous cells are prefered for cell therapy. In
principle, they can be generated by artificially enhancing the mobilization and
differentiation of stem cells in the patient or by using laboratory nuclear repro-
gramming methods to reverse the normal developmental progression of cells
toward specialization, which has opened new research and therapeutic opportu-
nities. The transplant of the nucleus from an adult somatic cell into an enucle-
ated oocyte allowed the cloning of Dolly the sheep and various other types of
animal species, and could in principle be used to make patient-specific embry-
onic stem cells; however, the procedures are technically challenging and ethically
contentious.
Nuclear reprogramming of skin fibroblasts from patients by ectopic expres-
sion of specific transcription factors offers a new route to obtaining patient-
specific and disease-specific pluripotent stem cells, with few ethical concerns.
716 Chapter 21: Genetic Approaches to Treating Disease
Such induced pluripotent stem (iPS) cells can be differentiated to provide human
cellular models of disease and are being used to test drugs; in principle they
could provide sources of cells for autologous cell therapy. Transdifferentiation
techniques hold the promise of making more subtle changes to cell lineage (e.g.
altering pancreatic exocrine cells into beta cells to replace those lost in type 1
diabetes).
In gene therapy, the aim is to transfer a DNA, RNA, or oligonucleotide into a
patient’s cells to counteract or alleviate disease. Transfer into cells is often
achieved with virus vectors, which offer high efficiency of gene transfer and high-
level, sometimes long-lasting, transgene expression. Safety issues are a concern
because integration into chromosomes is uncontrolled and can accidentally
activate a neighboring proto-oncogene. Non-viral vectors are safer, but gene
transfer rates are low and expression is often transient and comparatively weak.
The genetic modification could theoretically be directed to germ-line cells (a
permanent, transmissible modification), but for ethical reasons germ-line gene
therapy is not being attempted, and all current trials involve somatic gene ther-
apy. The aim is often to add a gene that can functionally replace a defective gene,
but some therapies seek to block expression of a mutant or ectopically expressed
gene at the RNA level, usually using RNA interference, or at the protein level.
Another class of gene therapy strategy seeks to restore the function of a defec-
tive gene, either at the RNA level by inducing exon skipping so that a causative
mutation is bypassed, or at the DNA level by using zinc finger nucleases. These
enzymes are genetically engineered to act as restriction nucleases that cleave
genomic DNA at just one site, such as at the location of an inactivating point
mutation. The double-strand DNA break can trigger a cellular repair pathway
that uses homologous recombination to replace the mutant sequence by a
homologous wild-type sequence provided by a transfected plasmid.
Most clinical trials of gene therapy have sought to treat cancer, and here the
strategy is to kill cancer cells selectively; however, success has been limited. The
first clearly successful gene therapies have been for recessive blood cell disor-
ders, which are more amenable because the cells are highly accessible and can be
genetically modified outside the body before being returned to the patient, and
because even a small amount of transgene expression can often bring about
some clinical benefit.
FURTHER READING
General Cardinale A & Biocca S (2008) The potential of intracellular
antibodies for therapeutic targeting of protein-misfolding
Costa T, Scriver CR & Childs B (1985) The effect of Mendelian diseases. Trends Mol. Med. 14, 373–380.
disease on human health: a measurement. Am. J. Med. Genet. Hoogenboom HR (2005) Selecting and screening human
21, 231–242. recombinant antibody libraries. Nat. Biotechnol. 23,
Treacy EP, Valle D & Scriver CR (2001) Treatment of genetic 1105–1116.
disease. In The Metabolic And Molecular Basis Of Inherited Kim SJ, Park Y & Hong HJ (2005) Antibody engineering for the
Disease, 8th ed. (Scriver CR, Beaudet AL, Sly WS, Valle MD development of therapeutic antibodies. Mol. Cell 20, 17–29.
eds). McGraw-Hill. Lillico SG, Sherman A, McGrew MJ et al. (2007) Oviduct-specific
expression of two therapeutic proteins in transgenic hens.
Genetic approaches to identifying novel drugs and Proc. Natl Acad. Sci. USA 104, 1771–1776.
drug targets Reichert JM (2008) Monoclonal antibodies as innovative
therapeutics. Curr. Pharm. Biotechnol. 9, 423–430.
Caskey CT (2007) The drug development crisis: efficiency and
Stocks M (2005) Intrabodies as drug discovery tools and
safety. Annu. Rev. Med. 58, 1–16.
therapeutics. Curr. Opin. Chem. Biol. 9, 359–365.
King A (2009) Researchers find their Nemo. Cell 139, 843–846.
Stöger E, Vaquero C, Torres E et al. (2000) Cereal crops as viable
[A profile on zebrafish that includes its advantages for drug
production and storage systems for pharmaceutical scFv
screening.]
antibodies. Plant Mol. Biol. 42, 583–590.
Welch EM, Barton ER, Zhuo J et al. (2007) PTC124 targets genetic
disorders caused by nonsense mutations. Nature 447, 87–91. Genetic approaches to disease treatment using
aptamers or vaccines
Antibody engineering and therapeutic antibodies
Bunka DH & Stockley PG (2006) Aptamers come of age—at last.
and proteins Nat. Rev. Microbiol. 4, 588–596.
Beerli RR, Bauer M, Buser RB et al. (2008) Isolation of human Dausse E, Gomes SDR & Toulme J-J (2009) Aptamers: a new class
monoclonal antibodies by mammalian cell display. Proc. Natl of oligonucleotides in the drug discovery pipeline. Curr. Opin.
Acad. Sci. USA 105, 14336–14341. Pharmacol. 9, 602–607.
FURTHER READING 717
Ellington Lab Aptamer Database. http://aptamer.icmb.utexas Verma IM & Weitzman MD (2005) Gene therapy: twenty-first
.edu/index.php century medicine. Annu. Rev. Biochem. 74, 711–738.
Kutzler MA & Weiner DB (2008) DNA vaccines: ready for prime
time? Nat. Rev. Genet. 9, 776–788. Gene delivery into mammalian cells
Soler E & Houdebine LM (2007) Preparation of recombinant Bouard D, Alazard-Dany N & Cosset FL (2009) Viral vectors: from
vaccines. Biotechnol. Annu. Rev. 13, 65–94. virology to transgene expression. Br. J. Pharmacol. 157,
153–165.
General reviews on pluripotent stem cells and Chowdhury EH (2009) Nuclear targeting of viral and non-viral
nuclear reprogramming DNA. Expert Opin. Drug Deliv. 6, 697–703.
Farjo R, Skaggs J, Quiambao AB et al. (2006) Efficient non-viral
Conrad S, Renninger M, Hennenlotter J et al. (2008) Generation
ocular gene transfer with compacted DNA nanoparticles.
of pluripotent stem cells from adult human testis. Nature 456,
PLoS ONE 1, e38.
344–349.
Pringle, IA, Hyde SC & Gill DR (2009) Non-viral vectors in cystic
Fenno LE, Ptaszek LM & Cowan CA (2008) Human embryonic
fibrosis gene therapy: recent developments and future
stem cells: emerging technologies and practical applications.
prospects. Expert Opin. Biol. Ther. 9, 991–1003.
Curr. Opin. Genet. Dev. 18, 324–329.
Rolland A (2006) Nuclear gene delivery: the Trojan horse
Hochedlinger K & Plath K (2009) Epigenetic reprogramming and
approach. Expert Opin. Drug. Deliv. 3, 1–10.
induced pluripotency. Development 136, 509–523.
Simoes S, Filipe A, Faneca H et al. (2005) Cationic liposomes for
Jaenisch R & Young R (2008) Stem cells, the molecular circuitry of
gene delivery. Expert Opin. Drug. Deliv. 2, 237–254.
pluripotency and nuclear reprogramming. Cell 132, 567–582.
Wolfrum C, Shi S, Jayaprakash KN et al. (2007) Mechanisms and
Rossant J (2009) Reprogramming to pluripotency: from frogs to
optimization of in vivo delivery of lipophilic siRNAs. Nat.
stem cells. Cell 138, 1047–1050.
Biotechnol. 25, 1149–1157.
Yamanaka S (2008) Pluripotency and nuclear reprogramming.
Philos. Trans. R. Soc. Lond. B 363, 2079–2087. RNA and antisense therapeutics
Zhou Q & Melton DA (2008) Extreme makeover: converting one
Castanotto D & Rossi JJ (2009) The promises and pitfalls of RNA
cell into another. Cell Stem Cell 3, 382–388.
interference-based therapeutics. Nature 457, 426–433.
Rayburn ER & Zhang R (2008) Antisense, RNAi, and gene silencing
General stem cell therapy and regenerative strategies for therapy: mission possible or impossible? Drug
medicine (see also references in Table 21.3) Discov. Today 13, 513–521.
Ährlund-Richter L, De Luca M, Marshak DR et al. (2009) Isolation Van Ommen G-J, van Deutekom J & Aartsma-Rus A (2008) The
and production of cells suitable for human therapy: therapeutic potential of antisense-mediated exon skipping.
challenges ahead. Cell Stem Cell 4, 20–26. Curr. Opin. Mol. Ther. 10, 140–149.
Nature Insight on Regenerative Medicine (2008) Nature 453, Wood M, Yin H & McClorey G (2007) Modulating the expression of
301–351. disease genes with RNA-based therapy. PLoS Genet. 3, e109.
Pitchford SC, Furze RC, Jones CP, Wengner AM & Rankin SM (2009) Yokota T, Takeda S, Lu QL et al. (2009) A renaissance for antisense
Differential mobilization of subsets of progenitor cells from oligonucleotide drugs in neurology: exon skipping breaks
the bone marrow. Cell Stem Cell 4, 62–72. new ground. Arch. Neurol. 66, 32–38.
iPS cells and disease-specific pluripotent stem cells Zinc finger nucleases and potential therapeutic
Colman A & Dreesen O (2009) Pluripotent stem cells and disease applications
modeling. Cell Stem Cell 5, 244–247. Cathomen T & Joung JK (2008) Zinc-finger nucleases: the next
Kiskinis E & Eggan K (2010) Progress towards the clinical generation emerges. Mol. Ther. 16, 1200–1207.
application of patient-specific pluripotent stem cells. J. Clin. Perez EE, Wang J, Miller JC et al. (2008) Establishment of HIV-1
Invest. 120, 51–59. resistance in CD4+ T cells by genome editing using zinc-
Muller R & Lengerke C (2009) Patient-specific pluripotent stem finger nucleases. Nat. Biotechnol. 26, 808–816.
cells: promises and challenges. Nat. Rev. Endocrinol. Urnov FD, Miller JC, Lee Y-L et al. (2005) Highly efficient
5, 195–203. endogenous human gene correction using designed zinc-
Nature Methods Reviews on Induced Pluripotent Stem Cells finger nucleases. Nature 435, 646–651.
(2010) Nat. Methods 7, 17–33. Wu J, Kandavelou K & Chandrasegaran S (2007) Custom-designed
Nishikawa S-I, Goldstein RA & Nierras CR (2008) The promise zinc finger nucleases: what is next? Cell Mol. Life Sci. 64,
of human induced pluripotent stem cells for research and 2933–2944.
therapy. Nat. Rev. Mol. Cell Biol. 9, 725–729.
Saha K & Jaenisch R (2009) Technical challenges in using human Gene therapy in practice (see also references
induced pluripotent stem cells to model disease. Cell Stem in Box 21.8)
Cell 5, 584–595. Baxevanis CN, Perez SA & Papmichail M (2009) Cancer
immunotherapy. Crit. Rev. Clin. Lab. Sci. 46, 167–189.
Gene therapy: general background reviews and Edelstein ML, Abedi MR & Wixon J (2007) Gene therapy clinical
information trials worldwide to 2007—an update. J. Gene Med. 9, 833–842.
Gene Therapy Net. http://www.genetherapynet.com/ Fischer A & Cavazzana-Calvo M (2008) Gene therapy of inherited
[An information resource for basic and clinical research in diseases. Lancet 371, 2044–2047.
gene therapy, cell therapy, and genetic vaccines.] Frank NY, Schatton T & Frank MH (2010) The therapeutic promise
Gordon JW (1999) Genetic enhancement in humans. Science 283, of the cancer stem cell concept. J. Clin. Invest. 120, 41–50.
2023–2024. Gene Therapy Clinical Trials Worldwide. http://www.wiley.co.uk/
O’Connor TP & Crystal RG (2006) Genetic medicines: treatment genmed/clinical/ [Wiley’s database of worldwide gene
strategies for hereditary disorders. Nat. Rev. Genet. 7, 261–276. therapy clinical trials.]
718 Chapter 21: Genetic Approaches to Treating Disease
Glossary
3¢ end The end of a DNA or RNA strand that is linked to the antibody A protein produced by activated B-cells in
rest of the chain only by carbon 5 of the sugar, not carbon 3. response to a foreign molecule or microorganism. (Figure
(Figure 1.8) 4.21)
3¢, 5¢- phosphodiester bond The link between adjacent anticipation The tendency for the severity of a condition to
nucleotides in DNA or RNA. (Figure 1.5) increase in successive generations. Commonly due to bias of
ascertainment (Section 3.3), but seen for real with dynamic
5¢ end The end of a DNA or RNA strand that is linked to the
mutations. (Section 13.3)
rest of the chain only by carbon 3 of the sugar, not carbon 5.
(Figure 1.8) anticodon The 3-base sequence in a tRNA molecule that
base-pairs with the codon in mRNA.
5¢ RACE (rapid amplification of cDNA ends) A technique
for characterizing the ends of mRNAs, in this case the 5¢ antigen A molecule that can induce an adaptive immune
ends. (Box 11.1) response or that can bind to an antibody or T-cell receptor.
adaptive immune system The immune system of verte- antiparallel Of the strands in a double-stranded nucleic
brates that creates immunological memory. acid molecule, running in opposite directions so that where
one strand has its 5¢ end the complementary strand has its
affected sib pair (ASP) analysis A form of non-parametric
3¢ end.
linkage analysis based on measuring haplotype sharing by
sibs who both have the same disease. (Figure 15.2) antisense RNA A transcript complementary to a normal
mRNA. Naturally occurring antisense RNAs, made using the
affinity tag In genetic manipulation, a short peptide that is
nontemplate strand of a gene, are important regulators of
attached to a recombinant protein in order to allow the pro-
gene expression.
tein to be isolated by affinity chromatography.
antisense strand (template strand) The DNA strand of a
alleles Alternative forms of the same gene.
gene, which, during transcription, is used as a template by
allelic heterogeneity The existence of many different RNA polymerase for synthesis of mRNA. (Figure 1.13)
mutations, but all within the same gene, in unrelated people
antisense technology Experimental inhibition of expres-
with the same phenotype.
sion of a gene by use of an RNA, an oligonucleotide or a mor-
amino acid The building blocks of proteins. (Figure 1.4) pholino derivative complementary to the mRNA of the gene.
amplification An increase in the copy number of a DNA apoptosis Programmed cell death.
sequence as a result of cloning, PCR or natural processes.
aptamers Single-stranded nucleic acid molecules designed
anaphase lag Loss of a chromosome because it moves to bind specifically to an antigen, mimicking antibodies.
too slowly at anaphase to get incorporated into a daughter
archaea Single-celled prokaryotes superficially resembling
nucleus.
bacteria, but with molecular features indicative of a third
ancestral chromosome segments, shared Chromosomal kingdom of life.
segments that are shared by apparently unrelated people
ARMS (amplification refractory mutation system) test
because they are inherited from an unknown distant com-
Allele-specific PCR. (Figures 6.17 and 18.8)
mon ancestor. (Figures 14.14 and 15.6)
array CGH (comparative genomic hybridization) Com-
aneuploidy A chromosome constitution with one or more
petitive hybridization of a test and control sample to a
chromosomes extra or missing from a full euploid set.
microarray of mapped clones to detect copy number varia-
anneal Allowing two complementary single-stranded tions. (Figure 16.10)
nucleic acids to form a base-paired double strand. The
ASOs (allele-specific oligonucleotides) Under stringent
reverse of denaturation.
hybridization conditions, oligonucleotides 15–20 nt long will
720 Glossary: ASOs
hybridize only to a perfectly matched target. This provides opposite strands of a double-stranded nucleic acid, hydro-
the basis for various methods of distinguishing alleles that gen-bonded to each other. (Figure 1.6)
differ by only a single nucleotide. (Figure 7.9)
Bayesian statistics A method of combining a number of
association A tendency of two characters (diseases, marker independent probabilities. It forms the basis of much genetic
alleles etc.) to occur together at non-random frequencies. risk estimation. (Box 18.2)
Association is a simple statistical observation, not a genetic
bias of ascertainment Distorted proportions of pheno-
phenomenon, but can sometimes be caused by linkage dis-
types in a dataset caused by the way cases are collected. (Sec-
equilibrium. (Section 15.4)
tion 3.2)
assortative mating Mating where the partner is chosen on
biometrics The statistical study of quantitative characters.
the basis of phenotypic or genotypic similarity (e.g. tall peo-
ple tend to marry tall people, deaf people tend to marry deaf biotin-streptavidin system A tool for isolating labeled
people; some people prefer to marry relatives). Assortative molecules. The bacterial protein streptavidin binds the
mating can produce a non-Hardy–Weinberg distribution of B-vitamin biotin with exceptionally high affinity. Biotinyl-
genotypes in a population. ated molecules can be isolated using streptavidin-coated
magnetic beads.
autoimmunity An abnormal state in which the distinction
between self and nonself fails, so that the body mounts an bisulfite sequencing A method for identifying methyl-
adaptive immune response against one or more self mol- ated cytosines in a DNA sample. Sodium bisulfite converts
ecules. (Box 4.5) unmethylated cytosines, but not methylated cytosines, to
uracil. When the product is sequenced, cytosines that were
autophagy Digestion of worn-out organelles by a cell’s
originally methylated are still read as cytosines, but those
own lysosomes.
that were unmethylated are read as thymine. (Box 11.4)
autoradiography Using photographic film to make a
bivalent The four-stranded structure seen in prophase I
radiolabeled molecule reveal its location on a gel or in a cell.
of meiosis, comprising two synapsed homologous chromo-
(Box 7.2)
somes. (Figure 14.4)
autosome Any chromosome other than the sex chromo-
blastocyst An embryo at a very early stage of development
somes, X and Y.
when it consists of a hollow ball of cells with a fluid-filled
autozygosity In an inbred person, homozygosity for alleles internal compartment, the blastocele. (Figure 5.9)
identical by descent. (Section 14.5)
blastomere One of the many cells formed by cleavage of a
B cells (B lymphocytes) The lymphocytes that make anti- fertilized egg.
bodies.
bootstrapping A statistical method designed to check the
bacteria Single-celled organisms that lack a membrane- accuracy of an evolutionary tree constructed from compara-
bound nucleus or other organelles. tive sequence analysis. (Section 10.4)
bacterial artificial chromosome (BAC) A cloning vector in branch site In mRNA processing, a rather poorly defined
which inserts up to 300 kb long can be propagated in bacte- sequence (consensus YNCTRAY; R = purine, Y = pyrimidine,
rial cells. N = any nucleotide) located 10–50 bases upstream of the
splice acceptor, containing the adenosine at which the lariat
bacteriophage (phage) A virus that infects bacteria. Modi-
splicing intermediate is formed. (Figure 1.17)
fied phage are used as vectors for cloning in bacterial cells.
bromodomain A protein domain that stimulates binding
banding In a preparation of chromosomes, treatments to
to acetylated lysine residues, primarily in histones.
make the chromosomes stain in a reproducible pattern of
dark and light bands to aid identification of chromosomes C value paradox The lack of a direct relationship between
and detection of structural abnormalities. (Figure 2.14) the amount of DNA in the cells of an organism (the C value)
and the complexity of the organism. (Table 4.2)
Barr body The chromatin of an inactive X chromosome,
seen as a blob of condensed chromatin at the edge of the CAAT box A short sequence, GGCCAATCT or a close vari-
nucleus of interphase cells that contain one or more inactive ant that is found in the promoter of many genes that are
X chromosomes. Also called sex chromatin. (Figure 3.8) transcribed by RNA polymerase II.
basal lamina Thin mat of extracellular matrix that sepa- CAGE (cap analysis of gene expression) A high-through-
rates epithelial sheets and many types of cell from the under- put technique for cataloging bulk mRNAs by isolating
lying connective tissue. around 18 nucleotides adjacent to the 5¢ cap for sequencing.
(Box 11.1)
basal transcription apparatus The multiprotein complex
that is required for RNA polymerase II to transcribe any gene. capping A stage in RNA processing, addition of a special
Additional, tissue-specific, proteins are usually also needed. nucleotide, 7-methylguanosine triphosphate, by a 5¢-5¢ bond
(Figure 11.2) to the 5¢ end of a primary transcript. Capping is important
for the stability of the RNA. (Figure 1.20)
base complementarity The relationship between bases on
opposite strands of a double-stranded nucleic acid: A always cell The basic unit of all living things.
opposite T (or U in RNA) and G always opposite C.
cell adhesion The way cells recognize and bind to each
base pair The unit of length of a double-stranded nucleic other.
acid. Also, more narrowly, two purine or pyrimidine bases on
Glossary: coefficient of inbreeding 721
cell cycle The reproductive cycle of a cell, comprising Protein and DNA are reversibly cross-linked, the chosen
mitosis (M phase), the first gap (G1 phase), DNA synthesis protein is precipitated with an antibody, and the associated
(S phase), a second gap (G2 phase), then mitosis of the next DNA sequenced. (Box 11.3)
cycle. (Figure 2.1)
chromatin remodeling complexes Protein complexes that
cell differentiation The process by which a less specialized can move, dissociate or reconstitute nucleosomes in chro-
cell becomes more specialized. matin, as part of the systems controlling chromatin confor-
mation. (Table 11.3)
cell fate In developmental biology, the normal progeny of a
given cell as development proceeds. chromodomain A protein domain that stimulates binding
to methylated lysine residues, primarily in histones.
cell junctions Specialized structures that control the pas-
sage of molecules between cells. They may form totally chromosome conformation capture A set of techniques
impermeable barriers, or they may allow specific types or for identifying DNA sequences that lie close together in
sizes of molecule to pass. (Figure 4.3) interphase nuclei. (Box 11.3)
cell lineage In development, the ancestry and descendants chromosome engineering Genetic engineering tech-
of a cell, as traced backwards or forwards through successive niques to produce specific large-scale chromosomal dele-
cell divisions. tions or rearrangements. (Figure 20.12)
cell polarity Asymmetry of a cell. In embryos, asymmetric chromosome jumping An obsolete technique used to
cells often divide to form daughter cells that follow different obtain clones from a chromosomal location some distance
fates in development. (Box 4.3) (typically 100 kb) away from a previously isolated clone. (Fig-
ure 16.19)
centimorgan (cM) The unit of genetic distance. Loci 1 cM
apart have a 1% probability of recombination during meio- chromosome painting Fluorescence labeling of a whole
sis. Figure 14.5 shows the relationship between genetic and chromosome by a FISH procedure in which the probe is a
physical distances. cocktail of many different DNA sequences from that particu-
lar chromosome. (Figures 2.16, 2.18, and 17.17)
central dogma As formulated by Francis Crick, DNAÆ
RNAÆprotein—that is, the DNA specifies the nucleotide chromosome set The chromosomes of a haploid genome.
sequence of an RNA which in turn specifies the amino acid
chromosome walking Isolating sequences adjacent on the
sequence of a protein. Useful, but not always true.
chromosome to a characterized clone by screening genomic
centriole A cylinder of short microtubules located in the libraries for clones that partially overlap it.
centrosome. (Box 2.1)
cis-acting Controlling the activity of a gene only when it is
centromere The primary constriction of a chromosome, part of the same DNA molecule or chromosome as the regu-
separating the short arm from the long arm, and the point at latory factor. Compare trans-acting regulatory factors which
which spindle fibers attach to pull chromatids apart during can control their target sequences irrespective of their chro-
cell division. mosome location.
centrosome In cell division, the microtubule organizing clonal selection and expansion The process responsible
center that forms a spindle pole. (Box 2.1) for immunological memory. Binding of an antigen to a B or
T cell stimulates it to multiply, forming a clone of cells that
CEPH families A set of families assembled by the Centre
react to the same antigen. However, clones that respond to
d’Etude du Polymorphisme Humain in Paris to assist the
self-antigens are eliminated.
production of marker-marker framework maps.
clone fingerprinting Identifying independent clones that
CGH—see comparative genomic hybridization
contain overlapping inserts by comparing the pattern of
character An observable property of an individual. fragments produced by a series of restriction enzymes.
checkpoint Checkpoints prevent further progress through clones Identical copies (of a DNA sequence, a cell, an
the cell cycle unless the genome and the cell are in a suitable organism). In genetic research, often means cells containing
state to proceed. (Figure 17.13) identical recombinant DNA molecules (the cells themselves
may or may not be identical).
chiasma (plural chiasmata) The physical manifestation of
meiotic recombination, as seen under the microscope. (Fig- cloning Production of many identical copies of a DNA
ure 14.4) sequence, a cell or a whole organism.
chimera (1) An organism derived from more than one co-activators Proteins that enhance transcription of a
zygote. (Figure 3.23) (2) A chimeric gene is a gene created gene through protein-protein rather than protein-DNA
when a chromosomal rearrangement brings together parts interactions.
of two different genes to create a novel functional gene – a
coding RNA Messenger RNA that codes for protein.
frequent event in tumors. (Figure 17.6)
codon A nucleotide triplet (strictly in mRNA, but by exten-
chromatin A general term for the packaged DNA in a cell
sion, in genomic coding DNA) that specifies an amino acid
nucleus. The basic conformation is a 30 nm coiled coil of
or a translation stop signal.
DNA and histones. (Figure 2.8)
coefficient of inbreeding The proportion of loci at which a
chromatin immunoprecipitation (ChIP) A technique for
person is homozygous by virtue of the consanguinity of their
identifying the DNA sequences that bind a specific protein.
parents. (Section 3.5)
722 Glossary: coefficient of relationship
coefficient of relationship Of two people, the proportion complex Of a phenotype, one that can have a variety of dif-
of loci at which they share alleles identical by descent. (Box ferent causes and modes of inheritance in different people.
3.6)
compound heterozygote A person with two different
coefficient of selection The chance of reproductive failure mutant alleles at a locus.
for a certain genotype, relative to the most successful geno-
concatemers Molecules joined end-to-end in a chain.
type. (Section 3.5)
concordance Of twins, the frequency with which co-twins
cofactor A small molecule or metal ion that is required for
have the same phenotype.
the biological activity of a protein. More generally, any factor
that assists the principal actor in a process. conformation Of a complex molecule, the 3-dimensional
shape – the result of the combined effects of many weak
co-immunoprecipitation (co-IP) Using an antibody to
noncovalent bonds.
precipitate a known molecule complexed with its binding
partners, as a way of identifying the binding partners. (Fig- congenic strain In experimental animals, two strains that
ure 12.10) have identical genetic backgrounds and differ only in some
chosen gene or location.
colony hybridization Screening a DNA library by hybridiz-
ing a labeled probe to gridded-out colonies of cells contain- consanguineous mating A mating where both parties
ing recombinant molecules. (Figure 8.3) share one or more identifiable recent common ancestors.
common disease–common variant hypothesis The consensus sequence A representation of the main shared
hypothesis that most factors conferring susceptibility to elements of a family of functionally related DNA sequences.
common non-Mendelian diseases are ancient variants that
are common in the population (in distinction to the muta- conservative change In a protein, replacement of one
tion–selection hypothesis, q.v.). (Section 15.6) amino acid by a chemically similar one.
compaction In an early embryo, the tightening of cell-cell conserved sequence A sequence (of DNA or sometimes
contacts that converts the loosely bound products of the ini- protein) that is identical or recognizably similar across a
tial cleavage divisions of the zygote into a compact morula. range of organisms.
comparative genomic hybridization (CGH) Competitive constitutional abnormality An abnormality that was pres-
fluorescence in situ hybridization of a test and control sam- ent in the zygote, and so is present in every cell of a person.
ple, normally to a microarray of mapped clones but originally contig A set of overlapping clones. (Box 8.1)
to a spread of normal chromosomes, to detect chromosomal
regions in the test sample that are amplified or deleted com- contiguous gene syndrome A syndrome that is the result
pared to the control sample. of a deletion that inactivates two or more contiguous genes,
each of which contributes to the phenotype. (Table 13.2)
comparative genomics A systematic and comprehensive
comparison of the genomes of different organisms. continuous character A character like height, which
everybody has, but to differing degree – as compared with a
complement system A system of serum proteins activated dichotomous character like polydactyly, which some people
by antigen-antibody complexes or by microorganisms. (Fig- have and others do not.
ure 4.20)
copy number variation (CNV) Variation between individ-
complementary Of two nucleic acid strands, having uals in the number of copies of a particular DNA sequence in
sequences such that they can anneal to form a double- their genomes. (Section 13.1)
stranded molecule.
co-repressors Proteins that suppress transcription of a
complementary DNA (cDNA) A DNA copy of an RNA, gene through protein-protein rather than protein-DNA
made by reverse transcriptase. (Figure 8.2) interactions.
complementation Two alleles complement if in combina- cosmid A vector for cloning in E. coli. Cosmids have an
tion they restore the wild-type phenotype (Table 3.1). Nor- insert capacity of up to 44 kb.
mally alleles complement only if they are at different loci,
although some cases of interallelic complementation occur. CpG island Short stretch of DNA, often < 1 kb, containing
frequent unmethylated CpG dinucleotides. CpG islands tend
complementation test A breeding or cell culture experi- to mark the 5¢ ends of genes.
ment to establish the relationship between two recessive
mutations by checking the phenotype of an organism or cell Cre–loxP technique A technique that allows tissue- or
that inherits one of the mutations from each parent or each stage-specific knockout of a gene in an intact animal. (Figure
of two donor cells. If the mutations are allelic, the phenotype 20.11)
will be mutant; if not, it will be wild type. (Figures 3.9 and crossover An act of meiotic recombination, or the physical
3.12) manifestation of that, as seen under the microscope. (Fig-
complete truncate ascertainment Sampling a population ures 14.3 and 14.4)
by collecting every family where at least one child has a cer- cryptic splice site A sequence in pre-mRNA with some
tain recessive condition. The collection will not include fam- homology to a splice site. Cryptic splice sites may be used as
ilies where both parents are carriers but none of the children splice sites when splicing is disturbed or after a base substi-
is affected. (Figure 3.11) tution mutation that increases the resemblance to a normal
splice site. (Figure 13.13)
Glossary: endoplasmic reticulum 723
cytogenetics The study of chromosomes. DNA replication The process of copying a DNA molecule
to make two identical daughter molecules.
cytokines Extracellular signaling proteins or peptides that
act as local mediators in cell-cell communication. DNase-hypersensitive sites (DHSs) Regions of chromatin
that are rapidly digested by DNase I because the DNA is rela-
cytokinesis The final event of cell division, when the cyto-
tively exposed rather than being tightly packaged in nucleo-
plasm of the cell divides. (Figure 2.3)
somes. They are believed to mark important long-range
cytosol The contents of the cytoplasm of a cell, excluding control sequences. (Section 11.2)
membrane-bound organelles such as mitochondria or lyso-
dominant In human genetics, any trait that is expressed in
somes.
a heterozygote. See also semi-dominant.
degenerate Used in genetics to describe a many-to-one
dominant negative effect The situation where a mutant
relation between structure and function. The genetic code is
protein interferes with the function of its normal counter-
degenerate because most amino acids can be incorporated
part in a heterozygous person. (Figure 13.24)
into a polypeptide in response to any of several different
codons in the mRNA. An oligonucleotide is degenerate if it is double helix The normal structure of DNA; two antiparal-
a mixture of several related sequences. lel DNA strands wrapped round one another.
denaturation Dissociation of double-stranded nucleic driver mutations In cancer, mutations that are subject to
acid to give single-strands. Also destruction of the 3-dimen- positive selection during tumorigenesis because they assist
sional structure of a protein by heat or high pH. development of the tumor. Cf. passenger mutations.
dichotomous character A character like polydactyly, which dynamic mutation An unstable expanded repeat that
some people have and others do not have – as compared to a changes size between parent and child. (Section 13.3)
continuous character like height, which everybody has, but
ectoderm One of the three germ layers of the embryo. It
to differing degree.
is formed during gastrulation from cells of the epiblast and
dideoxy (Sanger) sequencing The standard method of gives rise to the nervous system and outer epithelia. (Figures
DNA sequencing, developed by Fred Sanger and using 5.1 and 5.17)
didexoynucleotide chain terminators. (Figure 8.6)
electroporation A method of transferring DNA into cells in
diploid Having two copies of each type of chromosome; vitro by use of a brief high-voltage pulse.
the normal constitution of most human somatic cells.
electrospray ionization A mass spectrometric technique
distal (of chromosome) Positioned comparatively distant in which a solution containing the analyte is sprayed as very
from the centromere. fine charged droplets into the machine. (Box 8.8)
disulfide bridge In proteins, an intramolecular or inter- elongation factors Factors that assist progression of RNA
molecular link between the SH groups of two cysteine resi- polymerase along a DNA sequence once transcription has
dues. Disulfide bridges are important for maintaining the been initiated.
3-dimensional folding of proteins. (Figure 1.29)
embryonic germ cells Pluripotent cells derived from cul-
DNA chip A high-density microarray carrying oligonucle- tured primordial germ cells of embryos.
otides or longer single-stranded DNA molecules.
embryonic stem (ES) cell line Embryonic stem cells that
DNA duplex A double-stranded DNA molecule. have continued to proliferate after subculturing for a period
of 6 months or longer and that are judged to be pluripotent
DNA fingerprinting A now obsolete method of identify-
and genetically normal.
ing a person for legal or forensic purposes based on prob-
ing Southern blots with a hypervariable minisatellite probe. embryonic stem (ES) cells Undifferentiatied, pluripotent
(Figure 18.23) cells derived from an embryo. A key tool for genetic manipu-
lation. (Figure 21.5)
DNA libraries The result of cloning random DNA frag-
ments or molecules. A collection of cells containing different empiric risks Risks calculated from survey data rather
recombinant vectors, which must then be screened to find than from genetic theory. Genetic counseling in most non-
any desired sequence. (Figures 8.1 to 8.4) Mendelian conditions is based on empiric risks. (Section 3.4)
DNA methylation Conversion of cytosine in DNA to endoderm One of the three germ layers of the embryo. It
5-methyl cytosine, a signal that helps regulate gene expres- is formed during gastrulation from cells migrating out of the
sion. (Figures 11.10 to 11.13) epiblast layer. (Figures 5.1 and 5.17)
DNA polymerase The family of enzymes that can add endonuclease An enzyme that cuts DNA or RNA at an
nucleotides to the 3¢ end of a DNA molecule. (Table 1.3) internal position in the chain.
DNA primase An enzyme that synthesizes a short RNA endophenotype A phenotype that marks one facet of the
molecule that serves as a primer for DNA replication. changes that occur during development of a complex dis-
ease. Hopefully endophenotypes are directly related to an
DNA profiling Using genotypes at a series of polymorphic
underlying biological change.
loci to recognize a person, usually for legal or forensic pur-
poses. (Section 18.6) endoplasmic reticulum A meshwork of membranes in the
cytoplasm of cells, forming a compartment where mem-
DNA repair Correcting lesions in DNA caused by mistakes
brane-bound and secreted proteins are made.
during replication or by external agents such as radiation or
chemicals.
724 Glossary: enhancer
enhancer A set of short sequence elements which stimu- expression array A microarray of probes for expressed
late transcription of a gene and whose function is not criti- sequences, used to analyze the pattern of gene expression
cally dependent on their precise position or orientation. in a given cell type or tissue by hybridizing to labeled cDNA
(Figures 11.5 and 11.6) from the cell or tissue. (Figures 8.24 and 17.21)
epiblast The layer of cells in the pregastrulation embryo expression cloning Cloning in vectors that are deigned to
that will give rise to all three germ layers of the embryo allow genes in the insert to be expressed. Used to make puri-
proper, plus the extraembryonic ectoderm and mesoderm. fied gene product. (Figures 6.11 and 6.15)
Cf. hypoblast. (Figure 5.13)
extracellular matrix A meshwork of polysaccharide and
epigenetic Heritable (from mother cell to daughter cell, protein molecules found within the extracellular space and
or sometimes from parent to child), but not produced by in association with the basement membrane of the cell sur-
a change in DNA sequence. DNA methylation is the best face. It provides a scaffold to which cells adhere and serves to
understood epigenetic mechanism. promote cellular proliferation.
episome Any DNA sequence that can exist in an autono- first-degree relatives Parents, children or sibs.
mous extra-chromosomal form in the cell. Often used to
FISH—see fluorescent in situ hybridization
describe self-replicating and extra-chromosomal forms of
DNA. fitness (f ) In population genetics, a measure of the suc-
cess in transmitting genotypes to the next generation, rela-
epistasis Literally ‘standing above’. Gene A is epistatic to
tive to the most successful genotype. Also called biological or
gene B if A functions upstream of B in a common pathway.
reproductive fitness. f always lies between 0 and 1.
Loss of function of A will cause all the effects of loss of func-
tion of B, and maybe other effects as well. fluorescent in situ hybridization (FISH) In situ hybridiza-
tion using a fluorescently labeled DNA or RNA probe. A key
epitope The part of an immunogenic molecule to which an
technique in modern molecular genetics. (Figures 2.16 and
antibody responds.
2.17)
epitope tagging A method for visualizing a specific pro-
fluorophore A fluorescent chemical group, used for label-
tein in cells or tissues. A recombinant version of the protein
ing nucleic acids or proteins. (Box 7.3)
is produced that has attached a marker peptide for which a
fluorescently labeled antibody is available. (Table 8.7) founder effect High frequency of a particular allele in a
population because the population is derived from a small
EST—see expressed sequence tag
number of founders, one or more of whom carried that allele.
euchromatin The fraction of the nuclear genome which
fragile sites Locations on chromosomes where, under spe-
contains transcriptionally active DNA and which, unlike het-
cial culture conditions, the chromatin of metaphase chro-
erochromatin, adopts a relatively extended conformation.
mosomes appears uncondensed. Most are nonpathogenic
(Figure 11.9)
variants present at varying frequencies in normal healthy
eukaryotes Organisms made of cells with a membrane- individuals, but a few are pathogenic. (Figure 13.5)
bound nucleus and other organelles (Box 4.1). One of the
fragment ion searching A method of identifying proteins
three kingdoms of life.
in tandem mass spectrometry from the mass of fragments
exaptation An unusual evolutionary process in which they produce. (Figure 8.27)
sequences derived from a transposable element are used by
frameshift mutation A mutation that alters the triplet
the host genome for a novel function. (Figure 10.32)
reading frame of a mRNA (by inserting or deleting a number
exclusion mapping Genetic mapping with negative of nucleotides that is not a multiple of 3). (Figures 13.14 and
results, showing that the locus in question does not map to a 13.15)
particular location. Particularly useful for excluding a possi-
framework map A map of the locations of some physical
ble candidate gene without the labor of mutation screening.
entities – genetic markers, sequence-tagged sites or clones
exome The totality of exons in a genome. – across a genome or chromosome. Used as a step towards a
full genome sequence. (Box 8.1)
exon A segment of a gene that is retained during splicing.
Individual exons may contain coding and/or noncoding functional genomics Analysis of gene function on a large
DNA (untranslated sequences). (Figure 1.16) scale, by conducting parallel analyses of gene expression/
function for large numbers of genes, even all genes in a
exon junction complex (EJC) A set of proteins that are
genome.
bound to mRNAs during splicing, at the positions where
introns have been removed. Exon junction complexes are fusion protein The product of a natural or engineered
removed from the mRNA during a ‘pioneer’ round of transla- fusion gene: a single polypeptide chain containing amino
tion by ribosomes. Incomplete removal, because of a prema- acid sequences that are normally part of two or more sepa-
ture termination codon, triggers nonsense-mediated decay rate polypeptides. (Figures 6.12 and 17.6)
of the mRNA. (Figure 13.12)
gain-of-function mutations Mutations that cause the
exonuclease An enzyme that digests DNA or RNA from one gene product to do something abnormal, rather than simply
end. May be a 3¢ or a 5¢ exonuclease. to lose function. Usually the gain is a change in the timing or
level of expression. (Section 13.4)
expressed sequence tag (EST) Short partial sequences of
cDNAs that can be used to follow gene expression or to iso- gamete Sperm or egg; a haploid cell formed when a pri-
late a full-length cDNA. mordial germ cell undergoes meiosis.
Glossary: Golgi apparatus 725
gastrulation Conversion of the two-layer embryo (consist- genetic background The genotypes at all loci other than
ing of epiblast plus hypoblast) to one that contains the three one under active investigation. Variations in genetic back-
germ layers: ectoderm, mesoderm and endoderm. ground are a major reason for imperfect genotype-pheno-
type correlations.
GC box A short sequence, GGGCGG or a close variant, that
is found in the promoters of many genes that are transcribed genetic code The relationship between a codon and the
by RNA polymerase II. amino acid it specifies. (Figure 1.25)
gene (1) A functional DNA unit (but see Box 9.4). (2) A fac- genetic distance Distance on a genetic map, defined by
tor that controls a phenotype and segregates in pedigrees recombination fractions and the mapping function, and
according to Mendel’s laws. measured in centimorgans. (Section 14.1)
gene conversion A naturally occurring nonreciprocal genetic drift Random changes in gene frequencies over
genetic exchange in which a sequence of one DNA strand is generations because of random fluctuations in the propor-
altered so as to become identical to the sequence of another tions of the alleles in the parental population that are trans-
DNA strand. (Box 14.1) mitted to offspring. Only significant in small populations.
gene expression Production of the gene product (a protein genetic map A map showing the sequence and recombi-
or a functional RNA). nation fractions between genes, based on breeding experi-
ments or observation of human pedigrees (Table 14.4 and
gene family A set of related genes with a presumed com-
Figure 14.11). Figure 14.5 shows examples of the relation
mon ancestry. (Table 9.6)
between genetic and physical maps. Cf. physical maps.
gene frequency The proportion of all alleles at a locus that
genetic marker Any character that can be used to fol-
are the allele in question (Section 3.5). Really we mean allele
low the segregation of a particular chromosomal segment
frequency, but the use of gene frequency is too well estab-
through a pedigree or in a population. Normally a DNA
lished now to change.
sequence polymorphism.
gene knockdown Targeted inhibition of expression of a
genetic redundancy Partially or completely overlapping
gene by, for example, using siRNA or a morpholino oligonu-
function of genes at more than one locus, so that loss of
cleotide to bind to RNA transcripts.
function mutations at one locus do not cause overall loss of
gene knock-in A targeted mutation that replaces activity of function.
one gene by that of an introduced gene (usually an allele).
genome The total set of different DNA molecules of an
(Figures 20.8 and 20.9)
organelle, cell or organism. The human genome consists of
gene knockout The targeted inactivation of a gene within 3 ¥ 109 bp of DNA divided into 25 molecules, the mitochon-
an intact cell. drial DNA molecule plus the 24 different chromosomal DNA
molecules. Cf. transcriptome, proteome.
gene ontology A formal controlled vocabulary for describ-
ing the functions of genes, as an aid to automated cross-ref- genome browser A program that provides a graphical
erencing. interface for interrogating genome databases.
gene pool All the genes (in the whole genome or at a speci- genomewide association study (GWAS) The standard
fied locus) in a particular population. (Section 3.5) approach to identifying factors governing susceptibility to
complex disease. (Figure 15.11)
gene superfamily A set of multiple genes and gene families
that show signs of overall distant structural and functional genotype–phenotype correlation The extent to which
relationships – for example the immunoglobulin and the a phenotype can be predicted from a genotype. Typically
G-protein coupled receptor superfamilies. much higher in experimental animals, which are inbred and
live under standard laboratory conditions, than in humans.
gene targeting Targeted modification of a gene in a cell or Poor correlation is a major limitation in human genetic test-
organism. (Figures 20.6 to 20.9) ing and counseling.
gene therapy Treating disease by genetic modification. germ line The germ cells and those cells which give rise to
May involve adding a functional copy of a gene that has lost them; other cells of the body constitute the soma.
its function, inhibiting a gene showing a pathological gain of
function, or more generally, replacing a defective gene. germinal (gonadal, or gonosomal) mosaic An individual
who has a subset of germ-line cells carrying a mutation that
gene tracking Following a disease gene through a pedi- is not found in other germ-line cells. (Figure 3.22)
gree by use of linked markers rather than a direct test for the
pathogenic change. (Box 18.1) glycolipid A lipid molecule with a covalently attached
sugar or oligosaccharide.
gene trap Using random insertions of a reporter construct
into genes of embryonic stem cells to generate embryos with glycosaminoglycans Long polysaccharide molecules
random genes inactivated. The affected genes can be identi- made of pairs of sugar units, one of which is always an amino
fied via the reporter. (Figure 20.16) sugar. A major component of extracellular matrix.
general transcription factors DNA-binding proteins that glycosylation Covalent addition of sugars, usually to a pro-
are always required to allow transcription to take place (as tein or lipid molecule.
distinct from tissue-specific or stage-specific transcription
Golgi apparatus A membranous organelle in which pro-
factors). (Table 11.1)
teins and lipids are modified and sorted for transport to dif-
ferent destinations. (Box 4.1)
726 Glossary: gonadal mosaic
gonadal mosaic—see germinal mosaic histones A family of small basic proteins that complex with
DNA to form nucleosomes. (Figures 2.8 and 11.8)
GT–AG rule The rule that almost all human introns begin
with GT (GU in the RNA) and end in AG. A few follow an AT– homeobox A 180 bp module found in many genes that
AC rule and use a different spliceosome machine. have functions in development. The products of homeobox
genes regulate the expression of target genes through a 60
haploid Describing a cell (typically a gamete) which has
amino acid DNA-binding homeodomain.
only a single copy of each chromosome (e.g. the 23 chromo-
somes in human sperm and eggs). homoduplex Double-stranded DNA in which the two
strands match perfectly. Cf. heteroduplex.
haploinsufficiency A locus shows haploinsufficiency if
producing a normal phenotype requires more gene product homologs (chromosomes) The two copies of a chromo-
than the amount produced by a single functional allele. (Sec- some in a diploid cell. Unlike sister chromatids, homologous
tion 13.4) chromosomes are not copies of each other; one was inher-
ited from the father and the other from the mother.
haplotype A series of alleles found at linked loci on a single
chromosome. homologs (genes) Two or more genes whose sequences are
significantly related because of a close evolutionary relation-
Hardy–Weinberg distribution The simple relationship
ship, either between species (orthologs) or within a species
between gene frequencies and genotype frequencies that is
(paralogs).
found in a population under certain conditions. (Section 3.5)
homoplasmy Of a cell or organism, having all copies of the
heat map A form of data display used particularly for
mitochondrial DNA identical. Cf. heteroplasmy.
expression array data. A table of cells, with each row repre-
senting a gene, each column a sample, and the color of each homozygote An individual having two identical alleles at
cell representing the level of expression of that gene in that a particular locus. For clinical purposes a person is often
sample. (Box 8.7 Figure 2 and Figure 17.21) described as homozygous AA if they have two normally-
functioning alleles, or homozygous aa if they have two
helicase A protein that acts to separate the two strands of
pathogenic alleles at a locus, regardless of whether the alleles
double-stranded nucleic acid, as part of the machinery for
are in fact completely identical at the DNA sequence level.
replication, recombination and repair.
Homozygosity for alleles identical by descent is called auto-
hematopoietic stem cells Self-renewing bone marrow cell zygosity.
that gives rise to all the various types of blood cell. (Figure
housekeeping gene A gene that provides some basic aspect
4.17)
of cell function, common to most or all cells of an organism.
hemizygous Having only one copy of a gene or DNA
humanized antibodies Monoclonal antibodies made
sequence in diploid cells. Males are hemizygous for most
in rodent systems but where some or most of the rodent-
genes on the sex chromosomes. Deletions occurring on one
specific sequence has been replaced by human-specific
autosome produce hemizygosity in males and in females.
sequence.
heritability The proportion of the causation of a character
humanized mice Mice that have been modified so that
that is due to genetic causes. (Section 3.4)
some chosen aspect of their genetics or physiology more
heterochromatin Chromatin that is highly condensed and closely resembles its human equivalent.
shows little or no evidence of active gene expression. Facul-
hybrid cells Cells containing genetic material from more
tative heterochromatin may reversibly decondense to form
than one species. Human-rodent hybrid cells, containing a
euchromatin, depending on the requirements of the cell.
full rodent genome and one or more human chromosomes
Constitutive heterochromatin remains condensed through-
or fragments, were widely used in physical mapping of the
out the cell cycle. It is found at centromeres plus some other
human genome.
regions. Cf. euchromatin. (Figure 11.9)
hybridization Of nucleic acids, allowing complementary
heteroduplex Double-stranded DNA in which there is
single strands to base-pair (anneal).
some mismatch between the two strands. Important in
mutation detection. hybridization stringency The degree to which the con-
ditions (temperature, salt concentration) during a hybrid-
heteroplasmy Mosaicism, usually within a single cell, for
ization assay permit sequences with some mismatches to
mitochondrial DNA variants. (Section 3.2)
hybridize. High stringency conditions allow only perfect
heterozygote An individual having two different alleles at matches. (Figure 7.3)
a particular locus.
hybridization, molecular Hybridization of two comple-
heterozygote advantage The situation when somebody mentary single-stranded nucleic acids.
heterozygous for a mutation has a reproductive advantage
hybridoma A cell line made by fusing an antibody-produc-
over both homozygotes. Sometimes called over-dominance.
ing B cell with a cell of a B lymphocyte tumor. The source of
Heterozygote advantage is one reason why severe recessive
monoclonal antibodies.
diseases may remain common. (Box 3.7)
hydrogen bond A weak chemical bond that forms when a
histone code The idea that the pattern of covalent modifi-
hydrogen atom lies in line between two oxygen, nitrogen or
cation of histones in nucleosomes determines the activity of
fluorine atoms. The basis of base-pairing in nucleic acids.
the DNA in the vicinity. In fact, histone modification is only
one of several factors that, between them, determine gene hydrophilic Of a chemical group, having energetically
expression. (Section 11.2) favorable interactions with water and other polar molecules.
Glossary: lateral inhibition 727
A property of charged or polar groups. inner cell mass (ICM) A group of cells located internally
within the blastocyst which will give rise to the embryo
hydrophobic Of a chemical group, repelled by water and
proper.
other polar groups. Hydrophobic groups associate together
in the interior of protein molecules, membranes etc. insertional mutagenesis Mutation (usually abolition
of function) of a gene by insertion of an unrelated DNA
hypoblast The layer of cells in the pre-gastrulation embryo
sequence into the gene.
which gives rise to extraembryonic endoderm. (Figure 5.13)
insulators DNA elements that act as barriers to the spread
hypomorph An allele that produces a reduced amount or
of chromatin changes or the influence of cis-acting elements.
activity of product.
interallelic complementation—see complementation
identity by descent (IBD) Alleles in an individual or in two
people that are known to be identical because they have interference In meiosis, the tendency of one crossover to
both been inherited from a demonstrable common ances- inhibit further crossing over within the same region of the
tor. (Figure 15.1) chromosomes. (Section 14.1)
identity by state (IBS) Alleles that appear identical, but interphase All the time in the cell cycle when a cell is not
may or may not be identical by descent because there is no dividing.
demonstrable common source. (Figure 15.1)
interphase FISH Fluorescence in situ hybridization of a
immunoblotting Using an antibody to identify proteins probe to interphase cell nuclei. Used to detect aneuploidies
that have been fractionated by size and charge by electro- or other chromosomal abnormalities without the need to
phoresis and then transferred to a nitrocellulose membrane. culture cells, or to examine the subnuclear localization of
(Figure 8.21) chromosomes in nondividing cells. (Figures 2.17 and 17.4)
immunogen Any molecule that elicits an immune intrabodies Engineered nonsecreted intracellular anti-
response. bodies that can be used to inactivate selected molecules
inside a cell.
immunological memory The ability of the adaptive
immune system to mount a rapid and strong response to an intracytoplasmic sperm injection (ICSI) A human infer-
antigen that it has previously encountered. tility treatment in which sperm heads are injected into
unfertilized eggs. Sometimes used experimentally to make
imprinting In genetics, determination of the expression of
transgenic animals by first coating the sperm head with the
a gene by its parental origin. (Table 11.5)
desired transgene DNA.
imprinting control center (IC) A short sequence within an
intron Segments of a transcript that are cut out during
imprinted gene cluster where differential methylation con-
splicing (Figure 1.16). Introns appear to be largely nonfunc-
trols the imprinting status of genes within the cluster. (Figure
tional, but they may contain elements that modify transcrip-
11.22)
tion or splicing of their host gene, and some introns are the
in situ hybridization Hybridization of a labeled DNA or source of small nucleolar RNAs (Figure 11.22), microRNAs
RNA probe to an immobilized nucleic acid target. The target (Figure 9.17), and other functional RNAs.
may be denatured DNA within a chromosome preparation
iron-response element (IRE) A sequence element in cer-
(chromosome in situ hybridization), RNA within the cells of
tain mRNA species that changes the activity of the mRNA in
a tissue section on a microscope slide (tissue in situ hybrid-
response to excess or deficiency of Fe++. (Figure 11.32)
ization) or RNA within a whole embryo (whole mount in situ
hybridization). isochromosome An abnormal symmetrical chromosome,
consisting of two identical arms, which are normally either
in vitro mutagenesis Techniques to introduce a specific
the short arm or the long arm of a normal chromosome.
desired sequence change into the DNA of a cell through
manipulation of vectors. (Figures 6.8 and 6.19) Ka/Ks ratio An indicator of selection affecting a gene
sequence: the ratio of non-synonymous to synonymous
inbreeding Marrying a blood relative. The term is compar-
codon changes in a comparison of two organisms. (Box 10.5)
ative, since ultimately everybody is related. The coefficient
of inbreeding is the proportion of a person’s genes that are karyogram A display of the chromosomes of a cell, sorted
identical by descent. (Box 3.6) into pairs. (Figure 2.15)
indels Insertion / deletion variants. karyotype A summary of the chromosome constitution of
a cell or person, such as 46,XY. Often used more loosely to
induced pluripotent stem (iPS) cells Somatic cells that
mean an image showing the chromosomes of a cell sorted in
have been treated with specific genes or gene products to
order and arranged in pairs (strictly, a karyogram).
reprogram them to resemble pluripotent stem cells. They
can then be induced to differentiate into desired cell types. A kinetochore The structure at chromosomal centromeres
great hope for regenerative medicine. to which the spindle fibers attach.
induction In development, the process whereby one tissue lagging strand In DNA replication, the strand that is syn-
changes the state or fate of an adjacent tissue. thesized as Okazaki fragments. (Figure 1.11)
innate immune system System of nonspecific response to lateral inhibition A process during embryogenesis in
a pathogen using the natural defenses of the body. Cf. adap- which cells that differentiate inhibit neighboring cells from
tive immune system. doing the same. The result is to develop spaced sets of dif-
ferentiated cells.
728 Glossary: LCR
MHC (major histocompatibility complex) restriction The may be point mutations, chromosomal changes, etc. (Fig-
process that restricts recognition of foreign antigens by T ures 3.22 and 3.23)
cells to fragments that are associated with an MHC molecule
motif A short sequence or structure (usually in a protein)
on the surface of an antigen-presenting cell.
that forms a recognizable signature of a structure or func-
microcell-mediated chromosome transfer A technique tion.
for introducing a selected single chromosome into a mutant
mRNA—see messenger RNA
cell, to see if it can correct the mutant phenotype. (Figure
16.22) mtDNA (mitochondrial DNA) DNA of the 16,569 nt mito-
chondrial genome. (Figure 9.3)
microRNAs (miRNAs) Short (21–22 nt) RNA molecules
encoded within normal genomes that have a role in regula- multifactorial A character that is determined by some
tion of gene expression and maybe also of chromatin struc- unspecified combination of genetic and environmental fac-
ture. (Figures 9.16, 9.17, and 11.33) tors. Cf. polygenic.
microsatellite instability A type of genomic instability multiplex ligation-dependent probe amplification (MLPA)
seen in some colon and other cancers where replication A method for testing a DNA sample for specific copy-num-
errors are not corrected; observed as the production of extra ber changes, usually deletion or duplication of whole exons.
alleles at many polymorphic microsatellites. (Figure 17.19) (Figure 18.15)
microsatellite Small run (usually less than 0.1 kb) of tan- multipoint mapping Genetic mapping based on consider-
dem repeats of a very simple DNA sequence, usually 1–4 bp, ing the simultaneous segregation of more than two marker
for example (CA)n. Often polymorphic, providing the pri- in pedigrees.
mary tool for genetic mapping during the 1990s. Sometimes
also described as STR (short tandem repeat) polymorphism. mutagenesis Creation of mutations, in vitro or in vivo.
(Figure 13.2) mutation–selection hypothesis The hypothesis that
microtubules Long hollow cylinders constructed from genetic susceptibility to complex diseases is mainly the
tubulin polymers. Form the spindle fibers that move chro- result of a heterogeneous collection of mutations that turn
mosomes in mitosis and meiosis, and contribute to the cyto- over relatively quickly because of natural selection (in dis-
skeleton. tinction to the common disease-common variant hypoth-
esis, q.v.). (Section 15.6)
minisatellite An array (typically 0.1–20 kb long) of tan-
demly repeated 10–50 bp DNA sequences. (Section 13.1) necrosis Cell death as a result of irreparable external dam-
age.
minor groove In a DNA double helix, the smaller of the two
spiral grooves that run along the length of the molecule. neural crest A group of migratory cells in an embryo that
form along the lateral margin of the neural folds and give rise
missense changes Changes in a coding sequence that to many different tissues. Neural crest derivatives include
cause one amino acid in the gene product to be replaced by part of the peripheral nervous system, melanocytes, some
a different one. (Section 13.3) bone and muscle, the retina and other structures. (Box 5.4)
mitogen A substance that stimulates cells to divide. neural plate In an embryo, the precursor of the neural
tube. (Figure 5.18)
mitosis The normal process of cell division, which pro-
duces daughter cells genetically identical to the parent cell. neural tube In vertebrate embryos, a tube of ectoderm that
(Figure 2.3) will form the brain and spinal cord. (Figure 5.18)
MLPA—see multiplex ligation-dependent probe amplifica- node (1) In an early embryo, the anterior end of the primi-
tion tive streak. (Figure 5.12) (2) In a phylogenetic tree, the point
at which two lineages diverge. (Figure 10.24)
monoclonal antibody (mAb) A pure antibody with a single
specificity, produced by hybridoma technology, as distinct non-allelic homologous recombination (NAHR) Recom-
from polyclonal antibodies that are raised by immunization. bination between misaligned DNA repeats, either on the
(Box 8.6) same chromosome, on sister chromatids or on homologous
chromosomes. NAHR generates recurrent deletions, dupli-
morphogenesis The formation of structures during embry-
cations or inversions. (Figure 13.20)
onic development. (Table 5.2)
noncoding RNA (ncRNA) RNA that does not contain
morphogens Signaling molecules that can impose a pat-
genetic code for a protein. Noncoding RNAs have many dif-
tern on a field of cells in response to a gradient of concentra-
ferent functions in cells. (Table 9.9)
tion of the morphogen.
nondisjunction Failure of chromosomes (sister chroma-
Morpholino A stable chemically modified RNA analog
tids in mitosis or meiosis II; paired homologs in meiosis I) to
used to inhibit expression of a gene under study. (Figure
separate (disjoin) at anaphase. The major cause of numerical
20.14)
chromosome abnormalities.
morula An early stage of embryonic development; a
non-parametric In linkage analysis, a method such as
loosely packed ball of cells that will give rise to the blasto-
affected sib pair analysis that does not depend on a specific
cyst. (Figure 5.9)
genetic model.
mosaic An individual who has two or more genetically dif-
non-parametric lod (NPL) score The statistical output of
ferent cell lines derived from a single zygote. The differences
non-parametric linage analysis. (Table 15.6)
730 Glossary: non-penetrance
non-penetrance The situation when somebody carrying one gene–one enzyme hypothesis The hypothesis
an allele that normally causes a dominant phenotype does advanced by Beadle and Tatum in 1941 that the primary
not show that phenotype. An effect of other genetic loci or action of each gene was to specify the structure of an enzyme.
of the environment. A pitfall in genetic counseling. (Figure Historically very important, but now seen to be only part of
3.14) the range of gene functions.
non-recombinant In a pedigree, two loci are non-recom- origin of replication A site on DNA where replication can
binant in a gamete that contains the same combination of be initiated.
alleles as the person received from his or her parent. (Figure
ortholog Orthologous genes are genes present in different
14.1)
organisms that are related through descent from a common
nonsense mutation A mutation that replaces the codon ancestral gene. (Figure 10.10)
for an amino acid with a premature termination codon. (Fig-
P1 artificial chromosome (PAC) A vector based on P1 bac-
ure 13.12)
teriophage that allows inserts of @100 kb to be cloned in E.
nonsense-mediated mRNA decay A cellular mechanism coli cells.
that degrades mRNA molecules that contain a premature
paired-end mapping Comparing the number of nucleo-
termination codon (>50 nt upstream of the last splice junc-
tides separating two known sequences in a person’s DNA
tion). (Figure 13.12)
with the number in a reference genome, as a way of identify-
northern blot A membrane bearing RNA molecules that ing structural rearrangements. (Figure 13.6)
have been size-fractionated by gel electrophoresis, used as a
palindrome A DNA sequence such as ATCGAT that reads
target for a hybridization assay. Used to detect the presence
the same when read in the 5¢Æ3¢ direction on each strand.
and size of transcripts of a gene of interest. (Figure 7.11)
DNA-protein recognition, for example by restriction
notochord A flexible rod-like structure that in mammalian enzymes, often relies on palindromic sequences.
embryos induces formation of the central nervous system.
paralog One of a set of homologous genes within a single
(Figure 5.15)
species. (Figure 10.10)
nuclear reprogramming Large-scale epigenetic changes
parametric In linkage analysis, a method such as standard
to convert the pattern of gene expression in a cell to that
lod score analysis, that requires a tightly specified genetic
typical of another cell type or state.
model.
nucleic acid DNA or RNA.
paramutation An inherited phenotype that mimics the
nucleolar organizer region (NOR) The satellite stalks of result of a DNA sequence change, but where any change is
human chromosomes 13, 14, 15, 21 and 22. NORs contain epigenetic rather than a change of the DNA sequence. (Fig-
arrays of ribosomal RNA genes and can be selectively stained ure 11.24)
with silver. Each NOR forms a nucleolus in telophase of cell
passenger mutations In cancer, mutations that arise inci-
division; the nucleoli fuse in interphase.
dentally during development of a tumor and do not play any
nucleolus The site within the nucleus where ribosomal causative role in the process. Cf. driver mutations.
RNA is transcribed and assembled into the ribosomal sub-
PCR—see polymerase chain reaction
units.
penetrance The frequency with which a genotype mani-
nucleoside A purine or pyrimidine base linked to a sugar
fests itself in a given phenotype.
(ribose or deoxyribose). (Table 1.1)
peptide mass fingerprinting (PMF) A way of identifying
nucleosome The basic structural unit of chromatin, com-
proteins in a mixture by digesting with trypsin, analyzing the
prising 147 bp of DNA wound round an octamer of histone
mixture of peptides on a mass spectrometer, and comparing
molecules. (Figure 2.8)
the resulting peaks against a database showing the patterns
nucleotide A nucleoside phosphate. The basic building produced by known proteins. (Figure 8.27)
block of DNA and RNA. (Table 1.1)
phage A bacteriophage. A virus that replicates in bacterial
odds ratio In case-control studies, the relative odds of a cells.
person with or without a factor under study being a case.
phage display An expression cloning method in which for-
(Box 19.3)
eign genes are inserted into a phage vector and are expressed
Okazaki fragments Short strands of DNA; the immediate to give polypeptides that are displayed on the surface (pro-
product of lagging-strand replication. tein coat) of the phage. (Figure 6.13)
oligogenic A character that is determined by a small num- phagemid vector A vector containing sequences derived
ber of genes acting together. from phage and from plasmids, that allows inserts of several
kilobases to be cloned in E. coli and then released as single-
oligosaccharide A molecule consisting of a few linked
stranded DNA ready for sequencing. (Figure 6.10)
sugar units.
pharmacodynamics The response of a target organ or cell
oncogene A gene involved in control of cell proliferation
to a drug.
which, when overactive can help to transform a normal cell
into a tumor cell (Table 17.1). Originally the word was used pharmacogenetics The study of the influence of individual
only for the activated forms of the gene, and the normal cel- genes or alleles on the metabolism or function of drugs.
lular gene was called a proto-oncogene, but this distinction
is now widely ignored.
Glossary: probe 731
pharmacogenomics The use of genome resources polymorphism Strictly, the existence of two or more vari-
(genome sequences, expression profiles etc.) to identify new ants (alleles, phenotypes, sequence variants, chromosomal
drug targets. structure variants) at significant frequencies in the popula-
tion. Looser usages among molecular geneticists include (1)
pharmacokinetics The absorption, activation, catabolism
any sequence variant present at a frequency >1% in a popu-
and elimination of a drug.
lation (2) any nonpathogenic sequence variant, regardless of
phase-known In a pedigree, a person in whom the phase frequency. (Section 13.1)
of two or more loci (i.e. the combination of alleles inherited
polymorphism information content (PIC) A measure of
from each individual parent) is known. (Section 14.3)
how often genotyping a polymorphism can make a meiosis
phase-unknown—see phase-known informative for linkage. For most purposes the average het-
erozygosity at the locus is used for this purpose, but the PIC
phenocopy A person (or organism) who has a phenotype is a more accurate metric.
normally caused by a certain genotype, but who does not
have that genotype. Phenocopies may be the result of a dif- polypeptide A string of amino acids linked by peptide
ferent genetic variant, or of an environmental factor. bonds (Figure 1.3). Proteins may consist of one or more poly-
peptide chains.
phenome The totality of phenotypes. Cf. genome.
population attributable risk (PAR) In epidemiology, the
phenotype The observable characteristics of a cell or contribution that a particular factor or combination of fac-
organism, including the result of any test that is not a direct tors makes to the overall incidence of a condition. Can be
test of the genotype. used to measure how much of the overall genetic suscepti-
phylogeny Classification of organisms according to per- bility to a disease can be accounted for by the factors thus far
ceived evolutionary relatedness. (Figures 10.24 to 10.26) identified. (Section 19.4)
physical map A map showing the locations of some physi- position effect Complete or partial silencing of a gene
cal entities on a chromosome or genome. The entities might when a chromosomal rearrangement moves it close to het-
be DNA sequences or features such as natural or radiation- erochromatin. (Section 16.2)
induced breakpoints. Cf. genetic map. positional cloning Identifying a disease gene using knowl-
pitch Of a spiral, the distance occupied by a single turn, edge of its chromosomal location. The way the great major-
which is 3.4 nm in the standard B-DNA double helix. ity of Mendelian disease genes were identified. (Figure 16.2)
plasma membrane The membrane that surrounds a cell. positional information Information supplied to or pos-
sessed by cells according to their position in a multicellular
plasmid A small circular DNA molecule that can replicate organism.
independently in a cell. Modified plasmids are widely used
as cloning vectors. positive selection Selection in favor of a particular geno-
type.
pleiotropy The common situation where variation in one
gene affects several different aspects of the phenotype. potency Of a cell, its potential for dividing into different
cell types. Cells can be totipotent, pluripotent or committed
ploidy The number of complete sets of chromosomes in a to one fate.
cell. Cells can be haploid, diploid, triploid… polyploid.
premutation alleles Among diseases caused by dynamic
polar body In female meiosis, the small product of the mutations (expanding repeats), a repeat expansion that is
asymmetrical division of the cell mass during each division large enough to be unstable on transmission, but not large
of meiosis. The polar bodies eventually degenerate. enough to cause disease. (Section 13.3)
poly(A) tail The string of 200 or so A residues that are primary structure Of a polypeptide or nucleic acid, the
attached to the 3¢ end of a mRNA. The poly(A) tail is impor- linear sequence of amino acids or nucleotides in the mol-
tant for stabilizing mRNA. (Figure 1.21) ecule. (Table 1.8)
polyadenylation Addition of the poly(A) tail to the 3¢ end of primary transcript The RNA product of transcription of a
a mRNA. (Figure 1.21) gene by RNA polymerase, before splicing. The primary tran-
polyclonal antibodies Natural antibodies produced by script of a gene contains all the exons and introns.
the adaptive immune system in response to an antigen. primer A short oligonucleotide, often 15–25 bases long,
Polyclonal antibodies are typically a mixture of species that which base-pairs specifically to a target sequence to allow a
respond to different epitopes of the stimulating antigen. polymerase to initiate synthesis of a complementary strand.
polygenic A character determined by the combined action primitive streak In early embryos, a transient structure
of a number of genetic loci. Mathematical polygenic theory that defines the longitudinal axis. (Figure 5.12)
(Section 3.4) assumes there are very many loci, each with a
small effect. primordial germ cells (PGCs) Cells in the embryo and
fetus which will ultimately give rise to germ-line cells.
polylinker In a cloning vector, a short sequence containing
recognition sites for several different restriction enzymes, as proband (or propositus) The person through whom a
an aid to making recombinant molecules. (Figure 6.4) family was ascertained.
polymerase chain reaction (PCR) The standard technique probe A known DNA or RNA fragment (or a collection of
used to amplify short DNA sequences. (Figure 6.16 and Box such fragments) used in a hybridization assay to identify
6.2) closely related DNA or RNA sequences within a complex,
732 Glossary: programmed cell death
poorly understood mixture of nucleic acids. In standard quantitative character A character like height, which
hybridization assays, the probe is labeled but in reverse everybody has, but to differing degree – as compared with a
hybridization assays the target is labeled. dichotomous character like polydactyly, which some people
have and others do not.
programmed cell death Programmed death of an animal
cell, in which a ‘suicide’ program is activated in the cell. quantitative PCR (qPCR) PCR methods that allow accu-
(Table 4.5 and Figure 4.16) rate estimation of the amount of template present. Reliable
qPCR methods are based on real-time techniques. (Box 8.5)
prokaryotes Single-celled microorganisms (bacteria or
archaea) that lack a membrane-bound nucleus. quantitative trait locus (QTL) A locus that contributes to
determining the phenotype of a continuous character.
prometaphase In mitosis, late prophase, when chro-
mosomes are well separated but not yet maximally con- quaternary structure The overall structure of a multimeric
tracted; the optimum stage for normal cytogenetic analysis. protein. (Table 1.8)
(Figure 2.15)
reading frame During translation, the way the continuous
promoter A combination of short sequence elements, sequence of the mRNA is read as a series of triplet codons.
normally just upstream of a gene, to which RNA polymerase There are three possible reading frames for any mRNA, and
binds in order to initiate transcription of the gene. (Figure the correct reading frame is set by correct recognition of the
11.1) AUG initiation codon.
proofreading An enzymic mechanism by which DNA rep- real-time PCR A PCR process in which the accumulation
lication errors are identified and corrected. of product is followed in real time, which allows accurate
quantitation of the amount of template present. (Box 8.5)
propositus—see proband
recessive A character is recessive if it is manifest only in the
protein A molecule consisting of one or more polypeptide
homozygote.
chains folded into a specific 3-dimensional structure.
recombinant In linkage analysis, a gamete that contains a
protein domain A structural (and often functional) sub-
combination of alleles that is different from the combination
unit of a protein; a structural module that may be found in
which the parent inherited from their parent. (Figure 14.1)
several different proteins.
recombinant DNA An artificially constructed hybrid DNA
proteome The totality of proteins in a cell or organism.
containing covalently linked sequences from two or more
proteomics Global or large-scale studies of the proteins in different sources. (Figure 6.3)
a cell or organism. (Figure 12.6)
recombinant proteins Proteins produced in expression
proto-oncogenes Normal cellular genes whose function is cloning systems. Although the vector is recombinant, the
to promote cell proliferation, and in which tumors may carry protein is not actually recombinant.
activating mutations.
recombination fraction For a given pair of loci, the pro-
proximal Of a chromosomal location, comparatively close portion of meioses in which they are separated by recombi-
to the centromere. nation. Usually signified as q. q, values vary between 0 and
0.5. (Section 14.1)
pseudoautosomal regions (PAR) Regions at each tip of the
X and Y chromosomes containing X-Y homologous genes regression to the mean The phenomenon whereby par-
(Figures 10.17 and 10.18). Because of X-Y recombination, ents with extreme values of quantitative characters have, on
alleles in these regions show an apparently autosomal mode average, children with less extreme values. This is a purely
of inheritance. (Figure 3.9) statistical phenomenon, and has no bearing on whether or
not a character is genetically determined. (Section 3.4)
pseudogene A DNA sequence that shows a high degree
of sequence homology to a non-allelic functional gene, but relative risk In epidemiology, the relative risks of develop-
which is itself nonfunctional. (Box 9.2) ing a condition in people with and without a susceptibility
factor.
purifying selection (negative selection) Selection against
unfavorable genotypes. replicate Make an exact copy.
purines Nitrogenous bases having a specific double-ring replication fork In DNA replication, the point along a DNA
chemical structure. Adenine and guanine are purines. (Fig- strand where the replication machinery is currently at work.
ure 1.2)
replicon Any nucleic acid that is capable of self-replica-
pyrimidines Nitrogenous bases having a specific single- tion. Many cloning vectors use extrachromosomal replicons
ring chemical structure. Cytosine, thymine and uracil are (as in the case of plasmids), while others use chromosomal
pyrimidines. (Figure 1.2) replicons, either directly (as in the case of yeast artificial
chromosome vectors), or indirectly, by allowing integration
pyrosequencing A technique for sequencing a few nucleo- into chromosomal DNA.
tides from a defined start point. (Figure 8.8)
reporter gene A gene used to test the ability of an upstream
qPCR—see quantitative PCR sequence joined on to it to cause its expression. Putative cis-
quadrupole mass analyzer A method of separating ana- acting regulatory sequences can be coupled to a reporter
lytes by mass/charge ratio for mass spectrometry. (Box 8.8) gene and transfected into suitable cells to study their func-
tion. Alternatively, transgenic animals (and other organisms)
Glossary: sense strand 733
are often made with a promotorless reporter gene integrated RNA polymerase An enzyme that can add ribonucleotides
at random into the chromosomes, so that expression of the to the 3¢ end of an RNA chain. Most RNA polymerases use a
reporter marks the presence of an efficient promoter. (Figure DNA template, but some use an RNA template, and hence
20.9) synthesize double-stranded RNA. (Table 1.5)
reporter molecule A molecule whose presence is read- RNA processing The processes required to convert a pri-
ily detected (for example, a fluorescent molecule) that is mary transcript into a mature messenger RNA – capping,
attached to a DNA sequence we wish to monitor. (Figures 7.7 splicing and polyadenylation.
and 7.8)
RNA splicing—see splicing
restriction endonucleases A bacterial enzyme that cuts
RT-PCR—see reverse transcriptase PCR
double-stranded DNA at a short (normally 4, 6 or 8 bp) rec-
ognition sequence. (Box 6.1 and Table 6.1) SAGE (serial analysis of gene expression) A method of
expression profiling based on sequencing. (Box 11.1)
restriction fragment length polymorphism (RFLP) A
DNA polymorphism that creates or abolishes a recogni- satellite (chromosome) Stalked knobs variably seen on
tion sequence for a restriction endonuclease. When DNA the ends of the short arms of the acrocentric chromosomes
is digested with the relevant enzyme, the sizes of the frag- (13, 14, 15, 21, and 22)
ments will differ, depending on the presence or absence of
the restriction site. (Figures 8.14 and 13.1C) satellite (DNA) Originally described a DNA fraction that
forms separate minor bands on density gradient centrifu-
restriction site polymorphism (RSP)—see restriction frag- gation because of its unusual base composition. The DNA
ment length polymorphism is composed of very long arrays of tandemly repeated DNA
sequences.
retrogene A functional gene that appears to be derived
from a reverse-transcribed RNA. (Table 9.8) second messenger Intracellular signaling molecules that
relay signals from cell surface receptors to downstream tar-
retrovirus An RNA virus with a reverse transcriptase func-
gets. (Table 4.4)
tion, enabling the RNA genome to be copied into cDNA prior
to integration into the chromosomes of a host cell. (Figure secondary structure The path of the backbone of a folded
17.3) polypeptide or single-stranded nucleic acid, determined by
weak interactions between residues in different parts of the
reverse genetics Inferring phenotypes from knowledge of
sequence. (Table 1.8)
genes (a reversal of the classical pathway in which genes are
identified through the study of phenotypes) second-degree relatives Uncles, aunts, nephews, nieces,
grandparents, grandchildren and half-sibs.
reverse transcriptase An enzyme, often of viral origin, the
makes a DNA copy of an RNA template; a RNA-dependent segmental aneuploidy syndrome A syndrome caused
DNA polymerase. by deletion or duplication of a segment of a chromosome.
(Table 13.2)
reverse transcriptase PCR (RT-PCR) Indirect PCR ampli-
fication of RNA by first making a cDNA copy using reverse segmental duplication The existence of very highly related
transcriptase. DNA sequence blocks on different chromosomes or at more
than one location within a chromosome. (Figure 9.6)
ribonuclease An enzyme that breaks down RNA.
segregation (1) The distribution of allelic sequences
ribonuclease protection assay A method for quantitat-
between daughter cells at meiosis. Allelic sequences are said
ing one specific RNA transcript in a complex mixture. Uses
to segregate, non-allelic sequences to assort. (2) In pedigree
a labeled antisense probe to protect the transcript of interest
analysis, the probability of a child inheriting a phenotype
from degradation by ribonuclease.
from a parent.
ribosomal DNA The DNA from which ribosomal RNA is
segregation analysis The statistical methodology for infer-
transcribed. In human cells, located on the short arms of
ring modes of inheritance. (Section 15.2)
the acrocentric chromosomes (13, 14, 15, 21 and 22). (Figure
1.22) segregation ratio The proportion of offspring who inherit
a given gene or character from a parent. (Section 15.2)
ribosome The large cytoplasmic protein-RNA complex
where polypeptides are assembled using information in a semi-conservative Of DNA replication, each daughter
messenger RNA. double helix contains one parental and one newly synthe-
sized strand. (Figure 1.10)
ribozyme A natural or synthetic catalytic RNA molecule.
semi-discontinuous Of DNA replication, where one newly
risk ratio (l) In family studies, the relative risk of disease
synthesized strand has to be made in short pieces (Okazaki
in a relative of an affected person, compared to a member of
fragments) because DNA polymerase can only extend a
the general population. (Section 15.1)
chain in the 5¢Æ3¢ direction. (Figure 1.11)
RNA editing Insertion, deletion or substitution of nucleo-
sense strand The DNA strand of a gene that is comple-
tides in an mRNA after transcription. An unusual event in
mentary in sequence to the template (antisense) strand, and
humans. (Figure 11.30)
identical to the transcribed RNA sequence (except that DNA
RNA interference (RNAi) The use of siRNAs to knock down contains T where RNA has U). Quoted gene sequences are
(but rarely completely abolish) expression of specified genes. always of the sense strand, in the 5¢Æ3¢ direction. (Figure
A powerful tool for studying gene function. (Box 9.6) 1.13)
734 Glossary: sensitivity
sensitivity Of a test, the proportion of all true positives that small nucleolar RNA (snoRNA) A large family of small
it is able to detect. RNA molecules present in the nucleolus that act as guides
to modify specific bases in other RNA molecules, especially
sequence identity The degree to which two nucleic acid or
ribosomal RNAs.
protein sequences are identical.
SNP—see single nucleotide polymorphism
sequence similarity A looser measure of sequence identity
that takes into account synonymous codon replacements in somatic cell Any cell in the body that is not part of the
DNA and conservative amino acid replacements in proteins. germ line.
sequence-tagged site (STS) Any unique piece of DNA for somatic cell nuclear transfer (SCNT) An experimental
which a specific PCR assay has been designed, so that any manipulation in which the nucleus of an unfertilized egg is
DNA sample can be easily tested for its presence or absence. removed and replaced by the nucleus of a somatic cell from
(Box 8.1) another animal. The technique used to produce Dolly the
sheep. (Figure 20.4)
sex chromatin The Barr body (q.v.)
somatopleure In an embryo, the somatic mesoderm. (Fig-
short interfering RNA (siRNA) 21–22 nt double-stranded
ure 5.16)
RNA molecules that can dramatically shut down expres-
sion of genes (RNA interference). siRNAs are a major tool for somites Paired blocks of segmental mesoderm that will
studying gene function. (Box 9.6) establish the segmental organization of the body by giving
rise to most of the axial skeleton (including the vertebral col-
short tandem repeat (STR) polymorphism A polymorphic
umn), the voluntary muscles and part of the dermis of the
microsatellite.
skin. (Figure 5.16)
shotgun sequencing Sequencing a genome by mass
Southern blot Transfer of DNA fragments from an electro-
sequencing of random fragments, then using computers to
phoretic gel to a nylon or nitrocellulose membrane (filter), in
assemble the mass of short sequence reads into a final over-
preparation for a hybridization assay. (Figure 7.10)
all sequence. Works well for simple genomes, but assembly
is difficult for large genomes with much repetitive DNA, as specificity In testing, a measure of the performance of a
with humans. test. Specificity = (1 – false positive rate). (Box 19.2)
silencer Combination of short DNA sequence elements splanchnopleure In an embryo, the visceral mesoderm.
which suppress transcription of a gene. (Figure 5.16)
silent change A nucleotide substitution in a coding spliceosome The large ribonucleoprotein complex that
sequence that does not alter the amino acid encoded. Silent splices primary transcripts to remove introns.
mutations may nevertheless cause problems by interfering
splicing Cutting out the introns from an RNA primary tran-
with splicing. (Figure 13.13C)
script and joining together the exons. (Figures 1.16 and 1.18)
SINE (short interspersed nuclear element) A class of
SSCP (single strand conformation polymorphism) A
moderate to highly repetitive DNA sequence families, of
method for scanning a short piece of DNA (up to 200 bp) for
which the best known in humans is the Alu repeat family.
sequence variants compared to a control sample. (Figure
(Figures 9.20 and 9.21)
18.2)
single nucleotide polymorphism (SNP) A position in the
start codon In mRNA, the AUG codon at which the ribo-
genome where two or occasionally three alternative nucleo-
some initiates protein synthesis. Start codons are embed-
tides are common in the population. May be pathogenic or
ded in the Kozak consensus sequence, GCCRCCAUGG (R =
neutral. The dbSNP database lists human SNPs, but includes
purine).
some rare pathogenic variants and some variants that
involve two or more contiguous nucleotides. stem cell A cell that can act as a precursor to differentiated
cells but which retains the capacity for self-renewal.
single selection Collecting a small subset of families in a
population by collecting consecutive cases with the condi- stem cell niches The special locations of stem cells. (Figure
tion under study. Families with two affected cases will be 4.18)
twice as likely to feature in the collection as those with only
one affected case, and families with three affected cases sticky end A short single-stranded protrusion at one end
will be three times as likely. This introduces specific biases of a double-stranded nucleic acid. Molecules with comple-
in the epidemiology of the sample compared to the general mentary sticky ends can associate and then be covalently
population, which need appropriate statistical treatment. joined by DNA ligase. This is the key step in making recom-
(Box 3.4) binant DNA.
single-stranded binding protein A protein that binds and stop codon In mRNA, a UAA, UAG or UGA triplet. When the
stabilizes single-stranded DNA. Important in recombination ribosome encounters an in-frame stop codon it dissociates
and DNA repair. from the mRNA and releases the nascent polypeptide.
sister chromatid Two chromatids present within a single stratification A population is stratified if it consists of sev-
chromosome and joined by a centromere. Non-sister chro- eral sub-populations that do not interbreed freely. Strati-
matids are present on different but homologous chromo- fication is a source of error in association studies and risk
somes. estimation.
Glossary: triplet 735
stringency (of hybridization) The choice of conditions time-of-flight (TOF) mass analyzer A method of analysis
that will allow either imperfectly matched sequences or only in mass spectrometry based on the time taken for ions to
perfectly matched sequences to hybridize. travel down a flight tube. (Box 8.8)
subfunctionalization The independent specialization of tissue A set of contiguous functionally related cells.
the two gene copies formed by gene duplication. (Figure
tissue hybridization Hybridization of a labeled probe to
10.11)
RNA molecules in a tissue section to show their distribution.
subtraction cloning A method of cloning DNA sequences (Figure 7.12)
that are present in one DNA sample and absent in a second,
Tm—see melting temperature
generally similar, sample. Has been used to clone a gene that
is deleted in a patient with a disease by subtraction against topoisomerase An enzyme that can unwind DNA, relax
normal DNA. coiling or even pass one DNA double helix through another
by making temporary cuts and then rejoining the ends.
supercoiling Coiling an already coiled strand.
totipotent A cell that is able to give rise to all the cell types
susceptibility gene A gene, variation in which influences
in an organism.
susceptibility or resistance to a complex disease.
trait A character or phenotype.
synapsis A close functional association of two partners,
e.g. homologous chromosomes in prophase I of meiosis trans-acting Of a regulatory factor, affecting expression
(Figure 2.6) or neurons in the nervous system. (Figure 4.13) of all copies of the target gene, irrespective of chromosomal
location. Trans-acting regulatory factors are usually proteins
synaptonemal complex A proteinaceous substance that
which can diffuse to their target sites.
helps link paired homologous chromosomes during pro-
phase I of meiosis. transchromosomic mice Mice engineered to carry a whole
exogenous (normally human) chromosome. Named by anal-
synonymous Of codons, two codons that specify the same
ogy to transgenic mice.
amino acid.
transcription factors DNA-binding proteins that promote
synteny Loci are syntenic if they are on the same chro-
transcription of genes. (Section 11.1)
mosome. Syntenic loci are not necessarily linked: loci suffi-
ciently far apart on the chromosome assort at random, with transcriptome The total set of different RNA transcripts in
50% recombinants. a cell or tissue. Cf. genome, proteome.
synthetic biology The attempt to produce wholly synthetic transduction (1) Relaying a signal from a cell surface recep-
living organisms. tor to a target within a cell. (2) Using recombinant viruses to
introduce foreign DNA into a cell.
systems biology The attempt to get a full understanding of
how cells and organisms function by quantitative modeling transfection Direct introduction of an exogenous DNA
of the network of interactions between genes, pathways and molecule into a cell without using a vector.
metabolism that link inputs and outputs.
transformation Of a cell. (1) Uptake by a competent bacte-
T cells (T lymphocytes) A heterogeneous set of lympho- rial cell of naked high molecular weight DNA from the envi-
cytes including T helper cells and cytotoxic T lymphocytes ronment. (2) Alteration of the growth properties of a normal
that, between them, are responsible for adaptive cell-medi- eukaryotic cell as a step towards evolving into a tumor cell.
ated immunity.
transgene An exogenous gene that has been transfected
tag-SNPs Single nucleotide polymorphisms selected into cells of an animal or plant. It may be present in some
because the combined genotypes of a small number of tissues (as in human gene therapies) or in all tissues (as in
such tag-SNPs serve to identify haplotype blocks and make germ-line engineering, e.g. in mouse). Introduced trans-
it unnecessary to genotype every SNP in the block. (Figure genes may be episomal and be transiently expressed, or can
15.7) be integrated into host cell chromosomes.
target In hybridization assays, the sequence to which the transgenic animal An animal in which artificially intro-
probe is intended to hybridize. duced foreign DNA (a transgene) becomes stably incorpo-
rated into the germ line. (Figures 20.2 and 20.3)
TATA box A short sequence, TATAAA or a close variant, that
is part of the promoter of many genes that are transcribed by transit amplifying cells The immediate progeny by which
RNA polymerase II in a tissue-specific or stage-specific way. stem cells give rise to differentiated cells. Transit amplifying
cells go through many cycles of division, but eventually dif-
taxon A phylogenetic group. (Figure 10.22)
ferentiate.
terminal differentiation The state of a cell that has ceased
translocation Transfer of chromosomal regions between
dividing and has become irreversibly committed to some
nonhomologous chromosomes. (Figure 2.23)
specialized function.
transmission disequilibrium test (TDT) A statistical test
tertiary structure The 3-dimensional structure of a poly-
of allelic association. (Box 15.3)
peptide. (Table 1.8)
transposon A mobile genetic element. (Figures 9.20 and
third-degree relatives The parents or children of second-
9.21)
degree relatives of a person, most commonly the first cous-
ins. triplet Three consecutive nucleotides.
736 Glossary: trophoblast (or trophectoderm)
trophoblast (or trophectoderm) Outer layer of polarized whole genome amplification In vitro amplification of all
cells in the blastocyst which will go on to form the chorion, the DNA of a genome as a way of making a precious small
the embryonic component of the placenta. genomic DNA sample go further. Done using phi29 DNA
polymerase rather than PCR. (Box 6.2)
tropism The specificity of a virus for a particular cell type,
determined in part by the interaction of viral surface struc- wobble pairing A special relaxed base-pairing that occurs
tures with receptors present on the surface of the cell. between the 3¢ nucleotide of a codon and a tRNA anticodon.
(Table 1.6)
tumor suppressor gene (TSG) A gene whose normal func-
tion is to inhibit or control cell division. TSG are typically X-inactivation (lyonization) The epigenetic inactivation
inactivated in tumors. (Section 17.3) of all but one of the X chromosomes in the cells of mammals
that have more than one X. (Figures 11.19 and 11.20)
ultraconserved sequence Genomic sequences >200 bp
long that are 100% conserved between human, rat and X-inactivation center (Xic) The location on the proxi-
mouse genomes. They are likely to be important regulatory mal long arm of the human X chromosome from which the
elements. spreading X-inactivation is initiated. (Section 11.3)
unclassified variant A DNA sequence change seen in XY body In male meiosis, the X and Y chromosomes associ-
diagnostic testing, where the laboratory is unable to decide ate in a condensed body of heterochromatin.
whether or not it is pathogenic.
YAC—see yeast artificial chromosome
uniparental diploidy A 46,XX diploid conceptus where
yeast artificial chromosome (YAC) A vector able to propa-
both genomes derive from the same parent. Such concep-
gate inserts of up to 2 megabases in yeast cells. (Figure 6.7)
tuses never develop normally.
yeast two-hybrid system A technique for identifying pro-
uniparental disomy A cell or organism in which both cop-
tein-protein interactions. Proteins that physically associate
ies of one particular chromosome pair are derived from one
are identified by their ability to bring together two separated
parent. Depending on the chromosome involved, this may
parts of a transcription factor, and so stimulate transcription
or may not be pathogenic.
of a reporter gene in yeast cells. (Figure 12.7)
unrelated Ultimately everybody is related; the word is
zinc finger nucleases Synthetic enzymes that combine an
used in this book to mean people who do not have an identi-
endonuclease module with a sequence-specific targeting
fied common ancestor in the last four or so generations.
module, so as to cleave DNA at a selected sequence. (Figure
untranslated region (5¢UTR, 3¢UTR) Regions at the 5¢ end 20.13)
of mRNA before the AUG translation start codon, or at the
zona pellucida A layer of glycoprotein that surrounds an
3¢ end after the UAG, UAA or UGA stop codon. (Figure 1.23)
unfertilized egg and acts as a barrier to fertilization.
variable expression Variable extent or intensity of phe-
zone of polarizing activity (ZPA) During limb develop-
notypic signs among people with a given genotype. (Figure
ment, cells of the ZPA secrete a morphogen that controls the
3.17)
patterning of the digits. (Figure 5.6)
vector A nucleic acid that is able to replicate and maintain
zoo blot A Southern blot containing DNA samples from
itself within a host cell, and that can be used to confer simi-
a range of different species, used to check for conserved
lar properties on any sequence covalently linked to it. (Table
sequences.
6.2)
zygote The fertilized egg cell.
Watson–Crick rules Describe the normal base-pairing in
double-stranded nucleic acid: A with T (or U); G with C.
western blotting A procedure analogous to Southern blot-
ting but involving proteins separated by charge and size on
electrophoretic gels, blotted onto a membrane and detected
using antibodies or stains. (Figure 8.21)
737
Index
Notes: Please note that all genes mentioned in the text are listed in the index
under ‘genes (list)’ using their official HUGO nomenclature where possible.
Entries followed by ‘f’ refer to figures and those followed by ‘t’ and ‘b’ refer to
tables and boxed material respectively. Entries followed by ‘ff’ (or ‘bb’ or ‘tt’) refer
to figures (boxes; tables) which span consecutive pages. Please note that all
entries refer to humans unless otherwise stated
Alu repeats, 292–293, 292f Anatomical phenotyping, 642f see also Transfer RNA (tRNA)
copy number, chimpanzee, 337 Ancestral chromosome segments, 408, 457, 517 Antigen,
copy number, human, 291f autozygosity and, 457, 458, 459 endogenous, 127
evolutionary origin from 7SL RNA, 293 disease mapping, 457, 459–460 exogenous, 127
multiple target PCR, 187 association studies see Association studies lymphocytes and, 120–121, 123
selective pressure on, 293 cystic fibrosis and, 459, 459t presentation to T cells, 125, 126b, 126f
symmetric repeats, 292–293 NBS, 459–460, 460f Antigen presentation mechanism, 126b, 126f
Alzheimer disease, non-parametric linkage analysis, 474–475, Antigen-presenting cells (APCs), 120t, 121t, 126b, 126f
clinically relevant associations, 532–533 474f professional, 126b
early-onset disease, 474, 532–533 heterozygote advantage and, 459 see also specific cell types
late-onset disease (Apoε4), 533 see also Heterozygote advantage Antisense DNA strand, 13, 13f
linkage analysis, 474, 532–533 recombination and linkage disequilibrium, 447, Antisense RNA/transcripts,
PCD role, 113 457, 459, 479–480, 483 definition, 192b
protein aggregation, 425b size of shared segments, 480–481, 481f imprinting and, 369
Amber termination codons, 22 see also Haplotype blocks noncoding RNA (ncRNA), 287
AMCA, excitation/emission wavelengths, 201t Anchored PCR, 186b RNA processing, 287
Amebae, as model organisms, 299b, 300 Anchoring junctions, 98f, 99 X-inactivation and, 367
Amelogenin, as sex marker, 597 see also specific types see also RNA interference
Amino acid(s), 4 Androgen insensitivity, 72 see also siRNA (short interfering RNA)
classification by side chain, 4–5 sexual differentiation and, 159 Antisense technology, 390
acidic amino acids, 5, 5f Androgen receptor, mutations in, 72 antisense oligonucleotides, 390
basic amino acids, 4, 5f Anergy, self-reactive T cells and, 128b chemically stable classes, 706, 706f
nonpolar neutral amino acids, 5, 5f Anesthetics, malignant hyperthermia and, 616 enhancing cellular uptake of, 706
uncharged polar amino acids, 5, 5f Aneuploidy, 50–51, 51t inducing exon skipping, 707, 708f
definition, 4 cancer cells, 50 therapeutic, 706-707, 708f
hydrophilicity/hydrophobicity, 6 clinical consequences, 52–53, 52t see also Morpholino oligonucleotides
mutation effects mechanisms, 51 Antisense RNA therapy, 705, 714
conservative vs. nonconservative monosomy, 50, 51t see also Ribozymes, therapeutic; siRNA,
substitutions, 417 mosaics, 52 therapeutic
protein stability and, 417–418, 418f, rescue and uniparental disomy, 57–58 Antiserum, 244b
421–422, 422f sex chromosomes see Sex chromosome a1-Antitrypsin, 432–433, 433f
sickle-cell mutation and, 417–418, 418f aneuploidies AP axis see Anteroposterior (AP) axis
substitution nomenclature/terminology, trisomy see Trisomy APC protein,
407b see also specific disorders APC mutations, 548, 550, 562, 564
see also Mutation(s) Angelman syndrome (AS), 368–370, 427, 427t colorectal carcinogenesis, 562
post-translational modification see Post- imprint control center and, 369 genetic heterogeneity, 564
translational modification maternal origin of deletion and, 368 pathways approach and, 564–565, 564f
selenocysteine, 21st amino acid, 280b methylation assays, 577 familial vs. sporadic mutations, 563
sequence effects on protein structure, 25–27, 417 microdeletions, 368, 368f, 428 functional role, 550, 563, 564
structure, 4, 5f UBE3A gene defect, 368–369, 369f Apes,
general formula, 5f Angiogenesis, humans see Human(s)
R-groups of the 20 common amino acids, inhibition using aptamers, 686 vertebrate phylogeny and, 331f
4–5 inhibition using mAb, 686t see also individual species
side chain modification, 6 tumor dependence on, 565 Apolipoprotein B, 375
translation and Angiogenic switch, 565 Apolipoprotein E, late-onset Alzheimer disease and,
specification by mRNA see Genetic code Angiotensin-converting enzyme (ACE), 612, 615, 615f 533
tRNA recognition, 20, 280b Animal models, see also Model organisms Apoptosis (type I programmed cell death), 111–112
see also Aminoacyl tRNA; Translation of disease, 662-672 development role, 108, 112, 112t, 113, 145–146
tRNA genes and frequency correlation, 279 of development, 135-136, 135t C. elegans and, 112, 302b
see also individual amino acids Animal–vegetal axis, 140b, 141f digit formation and morphogenesis, 146,
Amino acid substitution(s) Anion-exchange chromatography, DNA, 172 146f
Aminoacyl tRNA, 20 Aniridia, 351, 352f, 508 disease role, 112–113
base wobble and, 23, 23t, 259, 280b Anneal/annealing, 193f cancer, 546, 557
ribosome binding sites, 21, 21f, 22 definition, 192b DNA damage-induced, 413, 557
Aminomethylcoumarin (AMCA), excitation/emission factors affecting, 194–195 cancer role, 546, 557
wavelengths, 201t kinetics, 196 defects, 416
Amniocentesis, 587, 629 PCR and, 183 oncogene activation and, 546
Amnion, function and origin, 149b, 151 primer Tm, 185 immune/defensive role, 112, 112t
Amniotes, definition, 327b Anterior visceral ectoderm (AVE), 141b, 141f mechanism/pathways, 113–114
Amniotic cavity, 151, 151f, 152f Anteroposterior (AP) axis, caspases and, 113
Amniotic fluid, 149b asymmetry, 139, 140f extrinsic pathway, 113, 114f
AMP (adenosine monophosphate), 3f Hox genes and, 142f intrinsic pathway, 113–114, 114f
Amphibian(s), polarity and axis specification in mammals, p53 and, 551
developmental models, 135t, 136, 302–303 140bb, 141f structural features, 111
polyploidy, 319 Anthropoids, definition, 327b tissue turnover and, 112, 112t
sex determination, 321t Antibiotic resistance genes, see also Programmed cell death
vertebrate phylogeny and, 331f expression cloning, 182, 183f Aptamers, 393f, 685
Amphipathic alpha helix, 26f vector construction, 170 therapeutic applications, 685-686
Amphioxus, Antibodies see also Immunoglobulin(s) Archaea (= Archaebacteria), 92–93, 92f
intron conservation, 314 functions, 122f, 123-124, 125f Arginine, structure, 5f
model organism, 303t, 314, 314f monoclonal, see Monoclonal antibodies Argonaute (Ago) endoribonuclease,
phylogenetic relationships, 330f polyclonal vs. monoclonal, 244b miRNA maturation, 283, 285f
Amplification refractory mutation system (ARMS), 187, raising with peptide immunogens, 244b RNA interference, 284b, 284f, 285f, 391
581, 581f, 602 raising using fusion proteins, 244b Aripiprazole, CYP2D6 variation and, 616
Amylase, evolution, 316 use in immunoblotting, 243,244f ARMS (amplification refractory mutation system), 187,
Analytical validity, 606–607, 606b, 636 Antibodies, genetically engineered for therapy, 683- 581, 581f, 602
clinical validity vs., 607 685, 686t Array-CGH, 49–50
measures, 607b, 607t different classes of, 683, 684f, 686t copy-number variation, 409
susceptibility tests, 624 FDA-approved, 684, 686t in cancer studies, 553, 559, 560
Anaphase, safety risks, 684 principle, 507, 507f
meiosis I, aneuploidy mechanisms, 51 Anticipation, 74–75, 424, 512 whole exon deletions/duplications, 585
mitosis, 32–33, 32f Anticoagulation, warfarin, 617, 617f Arrayed primer extension technique (APEX), 582
kinetochore and, 39b, 39f, 40 Anticodons, 21, 21f Arrays see Microarray(s)
Anaphase lag, 51 specificity of codon-anticodon recognition, 280b, ARS (autonomously replicating sequence), 40–41,
Anaphase-promoting complex (APC), 550 280f 175, 175f
Index 739
Arthritis associations, see under Rheumatoid arthritis, inbreeding see Inbreeding dorsoventral axis, 139, 140f
Arthritis, rat model, 641 Asthma, embryonic-abembryonic axis, 141b, 141f
Artificial chromosome(s), 36 ADRB2 variations and pharmacodynamics, left-right axis, 139, 140f, 141
bacterial (BACs), 174–175, 231 614–615 Axon(s), 101, 101f
human, see Human artificial chromosomes (HACs) therapeutic mAbs, 686t growth/guidance, 158
yeast, see Yeast artificial chromosomes (YACs) Astral fibers, 39b, 39f
Ascertainment Astrocytes, 102
bias of, 69, 69f Asymmetric cell division, 118b
B
complete truncate, 70b female gametogenesis, 33, 33f, 145 Bacillus megaterium, genome, 93
single selection, 70b intrinsically asymmetric division and, 118b, 118f Backcrossing, of mice, 660, 661b
Ascidians, definition, 327b intrinsically symmetric division and, 118b, 118f to produce congenics, 664b
Ashkenazi Jews, founder effects, 433 morphogenesis and, 145 Bacteria (= eubacteria), 92
ASOs see Allele-specific oligonucleotide probes (ASOs) origins, 118b, 118f chromosome, 168
Asparagine, structure, 5f polar body origins, 118b, 118f commensal, 92
Aspartic acid, structure, 5f stem cells, 117, 118b, 118f, 138 DNA cloning and, 165, 168
Assay vs. test, 606 Ataxia-telangiectasia, see also Cell-based DNA cloning
Association, allelic heterogeneity, 429 extrachromosomal replicons, 168
definition, 477 DNA repair defect, 415, 416, 555 see also Phage; Plasmids
linkage vs., 479, 480f, 486, 487b A-T base pair, structure, 7f immunoglobulins and, 124, 125f.
possible causes, 477, 479 Atelosteogenesis II, 436 innate immunity and, 122
linkage disequilibrium and see Linkage ATM protein and DNA damage detection, 555, 556f mitochondrial DNA similarity, 257
disequilibrium ATP-dependent chromatin remodeling complexes, as model organisms, 298, 299b, 300
susceptibility gene identification see Association 356, 357t polycistronic transcription units, 268
studies ATryn, recombinant anti-thrombin, 683 restriction–modification systems, 165
see also Genotype–phenotype relationship Autism, copy-number variants and, 532 see also individual species
Association studies, 477–494, 498, 515–519, 535, 622 Autocatalytic (self-splicing) introns, 314 Bacterial artificial chromosomes (BACs), 174–175, 231
costs, 488 Autocatalytic nucleic acids, 275b Bacteriophage see Phage; specific viruses
genome-wide studies see Genome-wide Autocrine signaling, 102, 102f Baculovirus, transient expression vectors, 181
association (GWA) studies Autofluorescence, green fluorescent protein (GFP), Balanced chromosome abnormalities, 55–56, 56f,
HapMap project see HapMap project 245, 245f 504, 504f
linkage analysis vs., 484, 486–487, 487b, 487t, 492 Autoimmunity, 128b prenatal trisomy screening and, 629
methodology, 484–490, 516–518, 516f anergy and, 128b as unbalanced at the molecular level, 426, 506
candidate regions, 485, 516 peripheral tolerance, 128, see also Microdeletions
case-control designs see Case-control regulatory T cells and, 127 unexplained phenotypes and gene identification,
studies self-tolerance and, 128b 504–505
controls, 485, 516 Automated sequencing see DNA sequencing see also Chromosome breakpoints; specific
corrections, 485 Autonomously replicating sequence (ARS), 40–41, abnormalities
detecting weak associations, 484, 486–487, 175, 175f Balancing selection, 307b
516 Autonomous specification, 138, 139f Barr body (sex chromatin), 66, 66f
functional testing, 517–518 Autonomous transposons, 290 see also X-inactivation
identification of variants, 516–517, 517f Autonomy, screening programs and, 633 Basal lamina, 100, 100f, 139f
logistic regression, 517 Autophagy, 112 Basal layer of epidermis, 117f, 118f
phasing (genotype to haplotype Autophosphorylation, receptor-mediated signaling Basal transcription apparatus, 15, 348, 348t, 349f
conversion), 481, 482 and, 105 conformational change and elongation, 349
relative risk or odds ratio, 477, 490, 609b, Autoradiography, DNA repair and, 415
609t common radioisotopes used, 200t histone modification and, 355–356
resequencing candidate genes, 492–493, nucleic acid hybridization and, 199–200, 200b, protein–protein interactions, 352
531–532 205f, 206f, 207f see also specific components
sequences with no known function, Autosomal dominant migraine with aura, 454–455, Base(s), 7
518–519, 519f 454t base + sugar (nucleosides), 3, 3f, 4t
SNP analysis see SNP (single nucleotide Autosomal dominant thyroid hyperplasia, 433 base + sugar + phosphate see Nucleotide(s)
polymorphism) Autosomal inheritance, 64 complementarity, 8
special populations, 488, 488f dominant traits/diseases, 64f, 66b definition, 192b
statistical analysis, 481, 482b, 517 gene tracking, 592b, 592f hybridization stability and, 195
TDT see Transmission disequilibrium test mutation rate estimation, 88 see also Base pairing
(TDT) genetic mapping and see Genetic maps/mapping DNA content see Base composition
principles/theory, 477–490 recessive traits/diseases, 64f, 66b DNA vs. RNA, 3, 3f
linkage disequilibrium and, 479–481, 516, compound heterozygotes, 435 nomenclature, 3, 4t
622 founder effects, 89 pairing see Base pairing
linkage vs., 479, 480f gene tracking, 593f purines see Purines
relatedness assumptions, 479, 480f, 481f Hardy–Weinberg distribution and, 85 pyrimidines see Pyrimidines
size of shared segments, 480–481, 481f mutation rate estimation, 88 see also specific types
see also Linkage disequilibrium (LD) see also specific traits/conditions Base composition, 7
problems/weaknesses, 491–494, 516 Autosomal recessive hearing loss, GC content see GC content
chance effects (striking lucky), 485, 623 autozygosity mapping, 458, 458f hybridization effects, 195
CNV effects, 491 functional complementation and gene nonrandom nature, 7, 341
common gene–common variant model and, identification, 510, 510f PCR primers and, 185
491, 493, 493t, 516 Autosomal recessive skeletal dysplasia, 508 sequence prediction/annotation, 238
early studies, 485 Autosomes, 20, 31, 260 Base excision repair (BER), 414b, 414f, 557
epistasis and, 479 aneuploidy see Aneuploidy DNA polymerase b and, 11t
functional tests and, 417–518 DNA content/chromosome, 261t Base pairing, 7, 7f
hidden heritability and, 534 inheritance patterns see Autosomal inheritance A-T base pair, structure of, 7f
locus heterogeneity, 492 X–autosome translocations, 505–506, 506f, 520 denaturation and, 193
mutation–selection hypothesis and, Autozygosity, G-C, base pair, structure of, 7f
492–493, 493t, 494 ancestral chromosome segments, 457, 458, 459 intramolecular, 9f
non-coding DNA and, 518–519 consanguineous marriages and likelihood, nucleic acid hybridization and, 192, 194–195
non-Hardy–Weinberg distribution and 457–458, 458f in RNA molecules, 8, 9f
errors, 516 definition, 457 G-U base pairing, 8, 9f
pathogenic vs. neutral variations, 493 gene mapping using, 457–459 inosine base pairing, 374
population stratification, 479, 487 autosomal recessive hearing loss, 458, 458f splicing and, 8
type I error (false positives), 479, 485 benign recurrent intrahepatic cholestasis, wobble and, 23, 23t, 259, 280b
replicated associations, 488, 489–490, 489t, 490f, 458–459 Watson–Crick rules, 7, 7f
516, 532, 622–623, 623t haplotype blocks and, 458, 458f Base wobble, 23, 23t, 259, 280b
see also specific disorders Average linkage method, microarray data, 247b Basic amino acids, 4, 5f
Assortative mating, Axis specification and polarity, 140b, 141b, 141f Basic helix-loop-helix (bHLH) transcription factors,
Hardy–Weinberg distribution and, 85–86 anteroposterior axis, see Anteroposterior axis dominant-negative mutations and, 432
740 Index
neuronal differentiation and, 157, 157f see also specific defects therapeutic mAbs, 686t
Basic proteins, 7 Bisulfite modification, Breast cancer, genes, 526–528
Basophils, 121t DNA methylation status and, 358b, 358f, 578 association studies, 518–519, 527–528
Batten disease, 245f Bivalent chromosome formation, 34 case-contol studies, 527
Bax, apoptosis induction and, 113–114, 114f Bladder cancer, 543, 562f clinical validity and, 625–626, 626f
Bayesian calculations, Blaschko’s lines, 76 common disease–common variant model
gene tracking and genetic risk, 594–595, 594b, Blast cells, 115 and, 491
594f Blastocele, 148f, 149 overall pattern of associations, 527–528,
lod score and, 453, 453b Blastocyst, 136, 147, 148f, 149f, 150 528t
Bayes’s theorem, 594b DNA methylation, 360 susceptibility genes, 491
Baylor College of Medicine, 233 hatching, 149f, 150 three-stage genome-wide study, 527, 527f
BC200 RNA, Alu repeat origin, 278t, 336 human, 119f ATM, 491, 527, 555
B cells (= B lymphocytes), 93, 120t, 123 ICM and ES cells, 118–119, 119f BRCA1/BRCA2, 474, 491, 510, 526–527, 526f
cell surface receptors, 123 polarity and axis specification, 140b genetic testing, 582f, 586f
secreted form see Immunoglobulin(s) Blastodisc, 147 risk associated, 491, 527
clonal selection, 123 Blastomeres, 147, 148f, 688 BRIP1, 491, 510, 527
deficiency in SCID, 711 Blastula, 147 candidate genes, 527
high MHC II antigen expression, 126b DNA methylation, 360 CASP8, 491
humoral immunity role, 123–124 X-inactivation and, 366 CHEK2, 491, 527, 555
monospecificity, 128, 131 Blindness, treatment in mice using human ES-derived familial, 491, 526, 556
recombination and rearrangements, 53, 93 retinal cells, 696t FGFR2, 527, 625
see also specific subtypes Blood cells, 95t lifestyle factors vs., 626, 626f
BCR–ABL1 rearrangement, chronic myeloid leukemia, lymphocytes see Lymphocytes linkage analysis, 474, 526
48f, 543–544, 544f, 545t renewal/replenishment, 116, 116f LSP1, 527
B-DNA, 7 see also Bone marrow male, 527
BeadArray™, 209 see also specific types MAP3K1, 527
Becker muscular dystrophy, Blood clotting, 617, 617f non-positional candidate gene approach, 510
clinical heterogeneity and, 72, 436 Blood group loci, PALB2, 491, 510, 527
dystrophin frameshifts and, 421f as genetic marker, 449t TNRC9, 527
Beckwith–Wiedemann syndrome, recessive trait mimicking dominant inheritance, Bright-field microscopy, ISH and, 206, 206f
imprinting and, 75, 75f 72, 73f Brittle bone disease, 431, 431f
methylation assays, 577 Blood pressure, regulation, 615 Bromodomains, 354
Bedside genotyping, 616 Bloom syndrome, DNA repair defect, 416 Bucindolol, ADRB1 variation and response to, 615
Behavioral traits, B lymphocytes see B cells Burkitt lymphoma, MYC activation and, 544–545, 545f
multifactorial inheritance and, 63 BMPs see Bone morphogenetic proteins (BMPs) Burst size, 169
genetic mapping and, 468, 469 Body plan(s), 139 Butyrylcholinesterase, genetic variation and drug
nature-nurture problem, 468 axis specification/polarization and, 139–140 metabolism, 613, 616
need for rigid diagnostic criteria, 468 orthogonal axes, 139
Benign recurrent intrahepatic cholestasis, vertebrate symmetry/asymmetry, 137, 139–140,
autozygosity mapping, 458–459 140f
C
Beta-2 agonists, ADRB2 variations and, 614–615 see also specific axes CAAT box, 14
Beta-blockers, Bone marrow, Cadherins, 98
ADRB1 variation and response to, 615 enriched in stem cells, 695, 696t adhesion junctions and, 99
CYP2D6 polymorphism and sensitivity to, 611 lymphoid organ, 120 Caenorhabditis elegans,
bgeo gene trap vector, 658, 659f mesenchymal stem cells (stromal cells), 116 aging, 302b
Beta-pleated sheets, 26, 26f renewal/replenishment, 115 cell lineages fully known, 135, 302b
Beta-turn, 27 toxicity in TPMT variants, 614 centromeres, 41f
Bias of ascertainment, 69,69f Bone marrow transplantation, conserved noncoding DNA, lack of, 333
in twin studies, 470 DNA profiling and, 598 developmental genes/pathways, 160, 161f
BiDil®, race-specific license, 619 in leukemia/lymphoma treament, 695, 696t evolutionary conservation and, 160, 161f, 502
Bilaminar germ disk, 151, 152f Bone morphogenetic proteins (BMPs), disease model, 641t
Bilaterians (= triploblasts), definition, 327b evolutionary conservation, 160, 161f gene family expansions, 332
phylogenetic relationships, 330f neural pattern formation and, 155, 157, 157f gene knockdown in, 390–391, 656
Bin matching, DNA fingerprinting, 596–597 primordial germ cell migration and, 158 gene numbers, 329, 331t
BioBank project, UK, 488 Bonferroni correction, 455, 485 genetic screening, 394t
Bioinformatics, Bootstrapping, phylogenetic trees, 329 genome sequencing, 228, 234–235, 236t
comparative genomics and, see Comparative Bootstrap value, 329 genome size, 331t
genomics Brain, holocentric chromosomes, 41f
databases see Databases development see Neural development in vivo drug screening, 680b
functional genomics and, 383, 384–388 gene expression database, 385t lack of DNA methylation and, 360
interaction proteomics and, 400, 400t stem cells, 115 microRNAs, 376
model-based searches, 386, 386f, 387b Brainbow transgenes, 662, 662f model organism, 135, 135t, 301, 302b, 394t, 502
genome browsers, 235, 236f, 236t, 312, 313f, 313t, Branchio-oto-renal syndrome, gene identification, nervous system, 135, 302b
501, 501f 506, 522 phylogenetic relationships, 330f
genome compilations, 236t BRCA1/BRCA2 proteins see under Breast cancer programmed cell death, 112, 302b
homology searching, see Sequence homology Breast cancer, RNA interference, 302b
searching BRAC1 and, 556, 622, 623t sex determination system, 321t
programs/algorithms see Computer programs/ inactivation (sporadic vs. familial cancer), Wormbase database, 394t
algorithms 547 CAGE analysis, 348b
Biological clock, cell senescence and, 111 minisequencing, 582 Cajal bodies, 94b
Biomarkers, trisomy 21, 630, 630t MLPA test, 586f small nuclear RNA, 278t, 281, 283
Biometrics, history, 79 as susceptibility gene see Breast cancer, Calcium ions (Ca2+),
Biotin–streptavidin system, 201–202, 202f genes apoptosis induction and, 113, 114f
in massively parallel DNA sequencing, 220 testing, value of, 609 as second messenger, 107, 107f
in phage display, 180f, BRAC2 and, 556, 622, 623t cAMP, see Cyclic AMP (cAMP)
in yeast two hybrid assays, 397f as susceptibility gene see Breast cancer, Campomelic dysplasia, 508
Bipolar disorder, association studies, 489, 490f, 623t genes Cancer(s),
Bird(s), ERBB2 amplification, 619 age-dependence, 539
as developmental models, 135t, 136, 303 oncogene expression, 542 benign vs. malignant, 538
see also Chick, as developmental model personalized medicine cell proliferation and cell death, 538
sex determination, 321, 321t clinical validity of susceptibility tests, cells see Cancer cell(s)
transgenesis, 645 625–626, 626f classification, 538, 538f, 560, 561, 562f, 620
vertebrate phylogeny and, 331f trastuzumab (Herceptin®) and, 619 Darwinian perspective, 538, 566
Birth defects, tumor profiling and, 620–621, 621f defenses against, 539
multifactorial inheritance and, 63 protein truncation test, 576 development see Carcinogenesis
Index 741
genetics see Cancer genetics oncogenes see Oncogene(s) tumor suppressors see Tumor suppressor
hereditary, delayed onset, 74 recombination and, 443b genes
viruses and, 540 stem cells and, 115, 538, 539 recombination and, 443b
acute transforming retroviruses, 540, 540f tumor suppressors see Tumor suppressor genes sporadic vs. familial cancers, 546, 547
Cancer cell(s), see also Carcinogenesis see also Cancer genetics
acquired characteristics, 539 Cancer Genome Anatomy Project, 559 Carcinoma,
cell cycle and see Cell cycle Cancer Genome Atlas, 223 adenomas and, 561
choices made, 567f Cancer Genome Atlas Research Network, 559 colorectal see Colorectal cancer
epigenetic changes, 513 Cancer immunotherapy, 684, 698-699, 698f, definition, 538
genetics see Cancer genetics Cancer modeling in mice, 667–670, 669t development, 538f
genomic instability as feature, 539, 553 oncogene activation, 670 lack of chromosome rearrangements, 544
see also Genomic instability sporadic cancers, 670 Cardiac muscle cells, 101
hypoxia and, 565 tumor suppressors, 668, 669t Cardiac neural crest, 156b, 156f, 156t
metastasis, 565 unexpected phenotypes, 668, 669t Cardiomyocytes, polyploidy, 96
PCD avoidance, 113 Cancer treatment, Case-control studies, 492–493, 516
selection and, 538, 539 bone marrow transplantation, 695, 696t odds ratio, 609b, 609t
stem cells and, 538 gene therapy, see Cancer gene therapy TDT vs., 487–488
Cancer cytogenetics, immunotherapy, see under Cancer gene therapy WTCCC and see Welcome Trust Case-Control
aneuploidy, 50 therapeutic mAbs and, 684, 686t Consortium (WTCCC)
chromosome rearrangements, 48, 48f, 53, 542t, TGN1412 mAb trial, 684 Caspase(s), role in apoptosis, 113, 114f
620 vaccines, see Cancer vaccines Caspase-activated DNase, 113
oncogenes see Oncogene(s) see also Cancer immunotherapy Catarrhine primates,
tumor-specific, 544, 545t Cancer vaccines, definition, 327b
see also specific rearrangements preventive, 687 vertebrate phylogeny and, 331f
clinical importance, 544 therapeutic, 687 Cat, as model organism, 305t
double minute chromosomes, 542 Candidate genes, calico cat and X-inactivation, 366f
FISH/chromosome painting and, 48, 48f, 553, 554f biological roles, 514, 515 b-Catenin,
homogeneously staining regions, 542 less obvious mechanisms of action and, 514 APC and, 563, 564
instability assessment, 553, 554f non-positional identification, 509–512 colorectal carcinogenesis and, 563, 564, 564f
karyotyping difficulty, 553 animal models and, 511, 511f C-banding, 45
microarray analysis, 559 DNA characteristics and, 511–512 Cd1 receptor, 125, 128
array-CGH, 553, 559 expression cloning see Expression cloning CD133, 713
Mitelman database, 544 functional complementation see Functional CD25, 127
see also specific techniques complementation CD28 receptor, 125f, 126f
Cancer gene therapy, 698, 698f, 713, 714, 713t, 714f immunoprecipitation, 509 CD34, 711, 711f
gene therapy strategies, 698, 698f, 713-714, 714f inferring gene sequence from known CD4 receptor, 125f, 126b, 126f, 127CD8 receptor, 125f,
gene therapy trials, statistics, 709, 710f protein, 509, 509f, 523, 523f 126b, 126f, 127
examples of, 713t known pathway components, 510–511 CD40, 126b
Cancer genetics, 537–568 protein–protein interactions, 510 Cdc25C, 550
aim, 538 RNAi and, 511 Cdk1/cyclin B, 550
cell biology perspective, 564–566 positional cloning and see Positional cloning Cdk2/cyclin E, G1/S checkpoint and, 550
angiogenesis, 565 resequencing, 492–493, 531–532 cDNA see Complementary DNA (cDNA)
metastases, 565 RNAi and, 511, 515 CDR (complementarity-determining region), 123f, 124
pathways vs. individual genes/proteins, testing, 512–515 Celera, human genome mapping, 232b, 233
564–565, 564f, 565f, 566, 566f, 567 control comparisons and mutation Cell(s), 91–132
systems biology, 566, 566f, 567f screening, 512–513 adhesion between see Cell adhesion
see also Cancer cell(s) epigenetic changes and, 513 aging/senescence see Cell senescence
cell proliferation/cell death balance, 538, 550–552 functional complementation, 515 cell cycle see Cell cycle
see also Cell cycle genetically modified models, 515 cell–tissue–organ relationship, 100, 100f
chromosome-level see Cancer cytogenetics identifying mechanism of action, 513 communication between see Cell–cell
complexity, 560, 564 importance of further studies, 515 communication
DNA damage/repair genes locus heterogeneity and, 514–515 death see Cell death
DNA damage response and, 546, 554, mutational homogeneity and, 512, 515 definition, 92
555–556 problems/difficulties, 513, 513f differentiation see Cell differentiation
proofreading failure and, 412, 557 Candidate region approach (cloning) see Positional diversity, 92–97
repair pathways/genes, 556–557, 558, 558f, cloning adult human cell types, 93, 95t
558t, 560 Cap analysis of gene expression (CAGE), 348b classification, 93
see also DNA damage; DNA repair Capping of 5’ end of RNA DNA content (C), 30, 96–97, 96t
DNA methylation role, 358, 513, 548, 549f, 560, mRNA cap structure, 18, 18f DNA variation in cells of single individual,
562, 577 function, 18 96–97
driver vs. passenger mutations, 559–560, 567 mechanisms, 18, 18f immune cells see Immune system
drug metabolizing enzymes and, 614 Lsm snRNA capping, 281f see also Cell differentiation; individual cell
genome instability see Genomic instability Sm snRNA capping, 281f types
genome-wide perspective, 559–561 Capillary sequencing, 218 division see Cell division
cytogenetics, 559 Captopril, ACE polymorphism and response to, 615 evolutionary perspective, 92
epigenetic/methylation changes, 513, 560, Carbohydrates, protein modification, metabolic networks vs. linear pathways, 379
562 glycoproteins, 23–24 polarity
expression profiling, 560–561, 562f, 620–621, proteoglycans, 24 cell fate specification and, 138
621f Carcinogenesis, mammalian embryo compaction, 148, 148f
massively parallel sequencing, 559–560 cell cycle role see Cell cycle proliferation see Cell proliferation
microarray analysis, 223, 559, 560–561 cell-type differences in, 539 specialized reproductive see Germ cells
hereditary/familial complexity, 560 stem cells see Stem cells
delayed onset, 74 DNA damage response and, 546, 554, 555–556 structure see Cell structure
gatekeeper hypothesis and, 563 driver vs. passenger mutations, 559–560, 567 tumors see Cancer cell(s)
Knudson’s two-hit hypothesis and, 546–547, Knudson’s two-hit hypothesis, 546–547, 546f, 567 unicellular model organisms and, 298–301
546f, 567 as multistep process, 539–540, 539b, 540f, 561– see also specific components
LOH see Loss of heterozygosity (LOH) 563, 563f, 567 Cell adhesion, 97–102
tumor suppressor genes and, 546, 547 genome instability and see Genomic CAMs see Cell adhesion molecules (CAMs)
see also specific disorders instability cell junctions see Cell junctions
microevolutionary process, 538, 539–540, 539b, initial mutation, 539–540, 539b, 563 definition, 97
540f, 561–563, 562f mutation targets, 540, 567 ECM and see Extracellular matrix (ECM)
see also Carcinogenesis ‘gatekeeper’ hypothesis, 563 importance of, 97
microRNA (miRNA) and, 378, 561 oncogenes see Oncogene(s) morphogenesis and, 145, 145t
molecular pathology, 435 pathways as, 564–565, 564f see also specific tissues
see also Cancer cytogenetics Cell adhesion molecules (CAMs), 97–98
742 Index
adhesion junctions and, 99 plane of division and, 145 Cell size, morphogenesis and, 144
classes, 98 meiosis see Meiosis Cell structure, 92–97
domains, 98 mitosis see Mitosis cytoplasm, 93
receptor–ligand interactions, 97 as M phase of cell cycle, 30 organelles and intracellular organization, 93,
see also specific molecules stem cells see Stem cells 94b, 94f
Cell asymmetry, origins, 118b, 118f see also Cell cycle; Cell proliferation prokaryotes vs. eukaryotes, 92–93, 94f
Cell-based DNA cloning, see DNA cloning Cell fate determination, 137f, 138, 138f selectively permeable membranes, 93, 94b
Cell–cell communication, 91–132 cell fate restriction and, 114, 115, 115t, 136–137 see also specific structures/organelles
cell signaling see Cell signaling cell lineage and (autonomous specification), Cell surface receptors,
model organisms, 301 138, 139f B- cells see B cells
synapses see Synapse(s) cell position and (induction), 137–138, 138f B-cells see Immunoglobulin(s)
see also Cell adhesion; Cell junctions dependent on plane of cell division, 139f oncogenes, 541t, 542
Cell clones, 165f, 167f, 171f Cell growth, developmental, 134 T-cells see T-cell receptors (TCRs)
Cell crisis, 555 Cell junctions, 97, 98–99, 98f Cell survival, 112
Cell culture, anchoring types, 98f, 99 see also Cell death
cell proliferation studies, 108 barrier functions, 98 Cell therapy, 679, 687–696
CEPH cell lines, 455–456, 462 communicating type see Gap junctions see Stem cell therapy
crisis point, 555 see also specific types Cell transplantation, experimental, 138f
embryonic pluripotent cell lines, 119 Cell lines, embryonic pluripotent, 119 see also Stem cell therapy
see also Embryonic stem (ES) cells Cell lysis, CEN element, S. cerevisiae, 40, 40f
expression cloning and see Expression cloning DNA isolation, 172 CenH3/CENP-A, 38, 40, 41f, 356
functional genomics and, 389–390 phage-induced, 169 CENP-A see CenH3/CENP-A
disadvantages, 389, 392 Cell-mediated immunity, 123, 125–128 CENPB, 40, 290, 336
gene knockdown in, 390–392, 391f, 392f see also T cells Centimorgan (cM), definition, 444
reverse genetics in mammalian cells, Cell membrane see Plasma membrane Central dogma, 2
389–390, 390t Cell migration, exceptions to
systems biology approach, 389 gastrulation, 151–152 reverse transcription and, 2
see also RNA interference (RNAi) mesoderm pathways, 152–153, 153f, 154f RNA editing and, 374
Hayflick limit and senescence, 111, 555 neural crest pathways, 156b, 156f Centric fusion, 54
hybrid cell lines see Hybrid cells primordial germ cells, 149b, 158–159 Centrifugation, DNA isolation/purification, 172
protein expression Cell polarity, 138 ultracentrifugation, 164
analysis, 244–245 Cell proliferation, 108–111 Centrioles, 32f, 39b, 39f
recombinant proteins see Expression cloning cell cycle see Cell cycle Centromere(s), 36, 39–40, 290
see also Fluorescence microscopy cell death balance, 108 alignment (interphase nuclei), 38, 38f
transformation see also Cancer(s); Cell death chromosome abnormalities and, 53, 555
retroviral, 540 developmental, 108, 134, 145 evolution/comparative biology, 39–40, 59
telomerase and immortality, 555 highly proliferative cells/tissues, 108 evolutionary increase in size, 40
Cell cycle, 30, 108–109, 550 PCD and, 112 histone H3 variants (CenH3/CENP-A), 38,
cancer cells and dysregulation, 110, 550–552 intrinsic vs. extrinsic factors, 109 40, 41f
checkpoint defects, 109, 109f, 550, 550f limits, 111 human tandem repeats, 40
chromosomal instability and, 554 see also Cell senescence mammalian rearrangements, 320
oncogenes and, 540, 541t, 542, 546 mitogens and control, 109–110, 109f, 110f S. cerevisiae elements, 40, 40f, 175
TSGs and, 540, 546, 551–552, 551f, 552f morphogenesis and, 145 species differences, 40, 41f
see also Oncogene(s); Tumor suppressor oncogenes and, 540 heterochromatin, 40, 261, 289, 290
genes see also Oncogene(s) histone variants in, 40
cells in culture, 108 rates, 108 mitosis and, 32
chromosomal changes, 30–31, 31f, 32, 36 tumor suppressors and, 540 kinetochores, 39–40, 39b, 39f
DNA damage and, 413 see also Tumor suppressor genes see also Mitosis
exit from (G0), 108 see also Cell division neocentromeres, 320
phases, 30, 31f, 108, 109f, 550 Cell senescence, 108–114, 555 structure,
interphase see Interphase concept of, 111 C. elegans, 41f
M phase see Cell division definition, 108 H. sapiens, 41f
see also specific phases Hayflick limit, 111 S. cerevisiae, 41f
regulation, 109–110 immortal cells vs., 111 S. pombe, 41f
checkpoints, 109, 109f, 550, 550f immortality and tumorigenesis, 555 tandem repeats (a-satellite), 40, 290, 408
cleavage and, 147–148 possible causes, 111 Cephalochordates, 313, 314f
mitogens and, 110, 110f telomere length and, 43, 111, 555 definition, 327b
proto-oncogenes, 540, 541t, 542, 546 see also Cell death; Cell proliferation CEPH (Centre d’Étude du Polymorphisme Humain) cell
tumor suppressors, 540, 546 Cell shape, morphogenesis and, 144 lines, 455–456, 462
see also specific regulators Cell signaling, 102–108 CFTR see under Cystic fibrosis
synchronization at same stage, 44 gene expression regulation and, 102–103 CGH see Comparative genome hybridization (CGH)
Cell death, 108–114 direct interactions, 103, 105 cGMP see Cyclic GMP
balance with cell proliferation, 108 see also Transcription factors Character, definition, 62
during development, 108 intercellular mechanisms, 102–103, 102f, 103t Charcot–Marie–Tooth disease, 434, 435f
necrosis, 111 intracellular mechanisms, 103t mutations, 580t
programmed see Programmed cell death (PCD) ligand activation of intracellular receptor, CHARGE syndrome, gene identification, 506, 525–526,
survival vs., 112 103, 104b, 105, 105f 525f
Cell differentiation, 114–119 second messengers see Second messengers CHD family, 357t
dedifferentiation and, 115 signal transduction pathways, 103, 103t, Checkpoints (cell cycle) see Cell cycle, regulation
definition, 114 105, 106f CHEK2, 555, 556f
developmental specialization, 134, 136–138, 137f kinase cascades and, 105, 106f Chemical cleavage of mismatches (CCM), 575, 575f
differentiation potentials, 114, 115, 115t, 136–137 ligand-receptor interaction, three types, 102f, Chemical DNA damage, 411–412
directed hierarchical decisions, 114–115, 136–137 103t see also under Mutagenesis, random
exit from cell cycle and, 108, 109f signaling molecules, 102 Chemical synapses, 107–108, 108f
see also Terminal differentiation cytokines, 121 Chemokine, 122
in very early human development, 137f molecules immobilized on cell surface, 103t Chemoreceptor genes, vertebrate differences in
neuronal, 157–158 molecules that migrate to cell surface, 103t number, 332, 332t
path choice/mechanism see Cell fate receptor interactions, 102, 102f, 103t, 105, Chiasma (chiasmata), 35, 35f
determination 105f counting and genetic map length, 446, 446f
transdifferentation, small hydrophobic molecules, 103, 103t, number in human meiosis, 35
see also Stem cells 105, 105f recombination interference, 445
Cell division, 30, 31–36, 108 see also Signal transduction pathways Chick, developmental model, 135t, 136, 303, 304b
chromosomal changes, 30–31, 31f synapses see Synapse(s) digit formation, 143, 143f
cytokinesis, 32, 32f transmitting vs. responding cells, 102 Chicken,
development and, 134 see also specific molecules/pathways phylogenetic relationships, 331f
Index 743
transgenic for making recombinant proteins, 683 ChIPArray (ChIP-on-chip), 403 positional cloning and see Positional cloning
Chimeras, 97 ChIP-seq, 403 somatic, 50
definition, 51–52 tumor cells, 560 see also Mosaicism; Somatic cells
mosaics vs., 78, 78f variants, 355b, 403 structural, 51t, 53–54
mouse, in gene targeting, 650f Chromatin landscape, 355. acentric chromosomes, 53, 55f, 58f
XX/XY, 78 Chromatin remodelling complexes, 356, 357t, 365 assessment see Cytogenetics
Chimeric (V/C) antibodies, 683, 684f Chromatography, chimeric genes and, 432
Chimeric genes, 432 affinity see Affinity chromatography oncogenes and see Oncogene(s)
Chimpanzee, denaturing HPLC, 574, 574t, 575f chromosome breaks and, 53, 54f
disease model, 302t DNA isolation/purification, 172 breakpoints see Chromosome breakpoints
human vs., 337, 342 Chromodomains, 354, 361 see also DNA repair
gene inactivation/modification, 339, 339t Chromosomal instability (CIN), 553 chromosome painting and see Chromosome
gene synteny/banding patterns, 320 adenomas, 563 painting
orthologous genes, 333 analysis methods, 553, 554f clinical consequences, 55–56, 543
smaller number of Alu repeats, 337 see also Cancer cytogenetics balanced vs. unbalanced rearrangements,
model organism, 305t mechanisms, 553–554 55–56, 56f, 504, 504b
phylogenetic relationships, 331f cell cycle dysregulation, 554 cancer and see Cancer cytogenetics
Chinese hamster ovary (CHO) cells, 681 DNA damage response defects, 554, contiguous gene syndromes, 426–427,
ChIPArray, 403 555–556, 556f 504b, 520
ChIP assays see Chromatin immunoprecipitation (ChIP) DNA repair defects and, 556–557 mental retardation and, 504b
ChIP-PET, 355b telomere shortening and, 554–555 pointers to presence of in genetic disease,
ChIP-seq, 355b, 403 see also DNA damage; Telomere(s) 504b
Chlamydomonas reinhardtii, as model organism, 299b Chromosome(s), 29–59, 93, 260–261 see also Cancer cytogenetics
CHLC map, 456 abnormalities see Chromosome abnormalities deletions see Chromosome deletions
Chloroplast DNA, genetic code variations, 23 analysis dicentric chromosomes, 53, 55f, 58f, 555
Cholestasis, autozygosity mapping, 458–459 human, 43–50 fragile sites, 409, 410f, 423
Cholesterol, complexed to siRNA, 704, 717 in situ hybridization, 205–206 genome evolution and, 319–320, 319f
Chondrodysplasia in short-legged dog breeds, see also Cytogenetics; specific techniques inversions see Chromosome inversion(s)
due to FGF4 retrogene, 338, 338f artificial, 36 isochromosomes, 54
Chordates, 313 see also Human artificial chromosome (HAC); meiosis and see Meiosis
definition, 327b Yeast artificial chromosome (YAC) deletions see Microdeletions
evolution, 314 autosomes see Autosomes microduplications, 426, 506
phylogenetic relationships, 330f bacterial, 168 positional cloning and see Positional
Chordin, evolutionary conservation, 160, 161f cell cycle changes, 30–31 cloning
Choriocarcinoma, 57 see also Cell cycle ring chromosomes, origin, 53, 54f
Chorion, 136, 149, 149b, 150f cell division and see Cell division translocations see Chromosome
Chorionic villus (villi), 149b chromatids and, 31f, 32 translocation(s)
sampling (CVS), 587, 629 DNA content, 260–261, 261t see also DNA damage; specific
Chromatids, 32 see also DNA rearrangements
crossing over see Recombination evolution Chromosome banding, 30, 44–45, 205
same chromosome see Sister chromatids ancestral conserved segments, 447, 457 C-banding, 45
separation, 32–33, 32f centromeres, 39–40, 59, 320 G-banding, 44, 263
kinetochore microtubules and, 39–40, 39b, comparative genome hybridization and, gene-rich vs. gene-poor chromosomes, 263
39f 48–50 great ape–human conservation, 320
nondisjunction, 51 gene order (synteny) and, 320, 320f, 342 human chromosome groups, 45t
Chromatid breaks, 53 major rearrangements and, 319–320, 319f, Q-banding, 44–45
Chromatin, 36–38, 37f, 260–261, 498 342 R-banding, 45
10nm filament, 37, 37f phenotype evolution vs., 319, 319f reporting, 45–46
30nm filament, 37, 37f, 38 sex chromosomes see Sex chromosomes resolution, 45, 46f
analysis, 355bb, 560 telomeres, 41–42 T-banding, 45
ChIP see Chromatin immunoprecipitation function, 36 Chromosome breakpoints, 504b
(ChIP) gene density, 39, 263, 266 BCR–ABL1 rearrangement, 543–544, 544f
G-banding see G-banding nuclear location and, 39 defining/localizing, 505, 505f
‘beads on a string’ appearance, 37, 37f see also Chromatin; Gene(s) long-range effects, 508–509
chromosome-specific content, 261t groups, 45t maps, 230t
compaction, different levels of, 37-38, 37f haplotype blocks see Haplotype blocks positional cloning and see Positional cloning
definition, 37 homlogous (homologs), 34 see also Balanced chromosome abnormalities
dynamic nature, 356 human chromosomes, see Human chromosomes Chromosome breaks, 53, 54f
euchromatin see Euchromatin number/cell (ploidy) see Ploidy structural abnormalities and, 53
heterochromatin see Heterochromatin replication see also DNA repair; Double-strand breaks
histone proteins see Histone(s) end-replication problem, 42–43, 43f Chromosome conformation capture (3C, 4C), 355b,
in interphase, 38 see also Telomere(s); DNA replication 356f, 362
linker DNA, 37, 37f, 353 sex chromosomes see Sex chromosomes Chromosome deletions, 51t, 53, 54f
histone H1 and, 38 structure see Chromosome structure submicroscopic level see Microdeletions
long-range effects, 508 Chromosome abnormalities, 50–58 terminal deletions, 428
position effect, 351, 508 assessment see Cytogenetics X chromosomes, 426–427, 427f
maintenance, 360–362 cell culture crisis and immortal cells, 555 Y chromosome and potential extinction, 324
non-histone proteins, 354 constitutional, 50 Chromosome engineering, 653-654, 654f
nucleosomes see Nucleosome(s) definitions, 50 inducing deletions, 653, 654f
transcription/gene expression and, 37, 37f, 38, modeling by chromosome engineering, 666-668 induced large-scale duplication, 667, 667f
232, 260, 353–364, 378 nomenclature, 51t inducing inversions, 653, 645f
ATP-dependent chromatin remodeling numerical, 50–53, 51t Chromosome evolution, 319-326,
complexes, 356, 357t aneuploidy see Aneuploidy see Evolution
DNA loops and, 351, 351f, 364 clinical consequences, 52–53, 52t Chromosome inversion(s), 51t, 53, 54f
DNA methylation and, 357 gene dosage effects, 425 balanced, gene-level effects, 504–505
see also DNA methylation see also Gene dosage defining breakpoint, 505
histone modification and see Histone mixoploidy, 51–52 mammalian evolution and, 320
modification see also Chimeras; Mosaicism meiosis outcomes in inversion carriers, 56, 58f
open vs. closed configuration, 354–356, polyploidy, 50, 51f, 52, 52t, 53 paracentric vs. pericentric, 54f
354f, 354t sex chromosomes see Sex chromosome positional cloning and, see Positional cloning
transcription start sites and, 364, 364f aneuploidies recombination suppression, 323
see also Epigenetics/epigenetic memory; parental origin abnormalities, 56–58 sex chromosomes and, 323
Euchromatin; Heterochromatin uniparental diploidy, 57 see also Chromosome breakpoints
Chromatin immunoprecipitation (ChIP), 355b, 355f, uniparental disomy see Uniparental disomy Chromosome jumping,
403, 577 see also Imprinting; specific disorders cystic fibrosis gene identification, 521
744 Index
method, 521, 521f enhancers see Enhancers case-control studies, 609b, 609t
Chromosome painting, 35f, 47–48, 47f, 48f evolution of, 335–337 clinical heterogeneity and, 607–608
CGH see Comparative genome hybridization mutation and new binding sites, 335–336 genetic susceptibility tests and see Susceptibility
(CGH) random mutation, 336 testing
chromosomal instability, 553, 554f transposition and, 335–337, 336f, 337f prostate-specific antigen (PSA) test, 607
M-FISH/SKY, 48, 49f, 553, 554f within introns, 372, 508 relative risk see Relative risk
Chromosome rearrangement see Chromosome iron response element, 376, 376f Clonal selection of B and T cells,123
abnormalities, structural morphological divergence role, 333–335, 335f Clone contigs (clone maps) see under Framework maps
Chromosome segregation, chiasmata and, 35, 35f constraints and evolutionary conservation, Clone fingerprinting, 227–228
Chromosome set/number (n), 30 334, 334f Clones,
Chromosome spread, 46 pleiotropic developmental regulators, 335, bacterial cell, 165, 167f, 171f
Chromosome structure, 36–43 335f human, natural, see Twins, identical, 691
abnormalities see Chromosome abnormalities, nuclear receptor binding elements, 352 vertebrates, experimentally produced, 645-646,
structural promoters see Promoter(s) 646f, 690b
cell cycle changes, 30–31, 31f, 36 silencers see Silencers abnormalities in, 690b
sister chromatids, 32 structure, 335 Cloning DNA, see DNA cloning; DNA libraries;
centromeres see Centromere(s) translocation and oncogene activation, 544–546, Polymerase chain reaction
coiling and compaction, 30, 36–38, 37f, 353 545f Cloning amphibians, 690b
banding patterns see Chromosome banding see also specific elements Cloning mammals, 115, 645-646
cohesins and, 32 Citalopram, HT2RA polymorphism and sensitivity Dolly the cloned sheep, 646, 646f
levels of/variations, 38 to, 616 Cloning humans, see Reproductive cloning;
protein complexes, 36 Clade, definition, 327b Therapeutic cloning
solenoids, 38 Cladogram, 328, 328f Clozapine, HT2RA polymorphism and sensitivity to,
see also Chromatin Classical genetics (phenotype to gene), 616
copy-number variation (CNV), 263 functional genomics and, 389, 403 Clustering methods, microarray data, 247b, 247f
see also Repetitive DNA genome mapping and, 225 Cnidarians, definition, 327b
DNA content/chromosome, 261t Mendelian vs. quantitative models, 79 Co-activators, 350, 352–353
fragile sites, 409, 410f, 423 Mendel’s experiments, 62 transcription factors, 104b
grouping and types, 45t vs. reverse genetics, 389 Coagulation, 617, 617f
insulators, 352 see also Monogenic (Mendelian) inheritance; Coat color, X-inactivation and, 366, 366f
interphase Multifactorial inheritance Coated pits, endocytosis and, 705f
condensation (M phase vs.), 38 Classical pathway, complement, 122 Cockayne syndrome, DNA repair defect, 415
gene expression and, 36, 37, 37f Class (isotype) switching, 130–131, 130f Codeine, CYP2D6 polymorphism and sensitivity to,
nuclear territories, 38–39, 38f, 361 Clathrin, in endocytosis, 705f 611
Mus musculus, 367 Cleavage, embryonic 147–148 Coding DNA (= DNA coding for protein),
nomenclature and, 45b maternal cell cycle regulation and, 147–148 highly conserved sequences
normal variations, 409 physiological context, 149f repetitive sequences within, 266, 314f
replication origins, 36, 40–41 rotational, 148, 148f tiny fraction of human genome, 256, 256f
telomeres see Telomere(s) specific mammalian features, 148, 148f see also Protein-coding genes
Chromosome territories, 38, 38f Cleavage bodies, 94b CODIS, 597, 597t, 599, 601
Chromosome translocation(s), 51t, 54 Cleft palate, 63, 82 Co-dominant, definition, 65
balanced Clinical genetics, Co-dominant selection, 307b
in Sotos syndrome, 505, 505f counseling see Genetic counseling Codons, 2, 20
and unbalanced, 55–56, 56f cytogenetics see Cytogenetics anticodon recognition, 21, 21f, 280b, 280f
see also Balanced chromosome place in 21st century, 634–635, 634f, 635f wobble and, 23, 23t, 259, 280b, 280f
abnormalities polymorphism definition, 406b see also Transfer RNA (tRNA)
carriers, 55–56 testing see Genetic testing degenerate, 22–23, 22f, 417, 509
outcomes, 56f Clinical heterogeneity, 71, 72 initiation, 21, 420
chromosome painting and, 48f clinical validity and, 608–609 nuclear vs. mitochondrial, 22f
clinical consequences, 55–56, 504–505, 504b definition, 71 termination, 22, 22f, 418, 420
defining breakpoint, 505, 505f loss-of-function mutations, 435–436, 435f, 436t see also Genetic code
gene-level effects, 504–505 mitochondrial disorders, 436–437 Coefficient of inbreeding, 86
mammalian evolution, 320 modifier genes and, 437 Coefficient of relationship, 86
mechanism, 55f see also Allelic heterogeneity; Genotype– Coefficient of selection, 88
oncogenes, 542t phenotype relationship; Locus Coelenterates (cnidarians), definition, 327b
novel chimeric gene creation, 48f, 542t, heterogeneity; specific disorders Coelacanth, LF-SINE family, 336-337, 337f
543–544, 545t Clinical test(s), Coelom, 153
translocation into highly active chromatin, analog vs. digital results, 608, 629 Coelomates, definition, 327b
542t, 544–546, 545f assay vs. test, 606 Cofactor(s), protein conformation and, 25
positional cloning and see Positional cloning ‘diagnose and treat’ vs. ‘predict and prevent,’ 606 Cohesins, 32, 550
reciprocal, 54, 55f, 56, 56f evaluation, 606–609, 636 Coiled bodies see Cajal bodies
Robertsonian (centric fusion), 54-56, 55f, 57f ACCE framework, 606, 606b, 629 Coiled coils, 26
segmental duplication and, 264, 264f analytical validity see Analytical validity Co-immunoprecipitation (co-IP), 396t, 399, 399f
X–autosome translocations, 505–506, 506f clinical utility, 606b, 608–609 Colcemid, 44
see also Chromosome breakpoints; specific clinical validity see Clinical validity Colinearity reconstruction problem, 309
rearrangements ethical issues, 606b, 608–609 Collagen(s),
Chromosome walking, 227 quality assurance, 607 connective tissue, 101
dystrophin gene identification, 520 susceptibility tests see Susceptibility testing exon duplication and evolution, 315
Chronic granulomatous disease, X-linked, 426, 427f, threshold for action and, 609 fibrillar
499, 520, genetic see Genetic testing dominant-negative mutations, 431, 431f, 436
gene therapy for, 710b, 712-713 population screening see Population screening haploinsufficiency and, 436
Chronic myeloid leukemia, BCR–ABL1 rearrangement, relative risk see Relative risk missense mutations, 431
48f, 543–544, 544f, 545t sensitivity vs. specificity, 607, 607b, 608, 630, 630t, structure, 431
Cilia, 94b, 94f 631–632, 632t Colon,
cell signaling and, 105 Clinical trials/protocols, 618f, 619, 680b, 696t, 709, aberrant crypt foci, 561, 562
cell types with cilia, 95t 710t, 710f, 713t cancer see Colorectal cancer
primary cilia, 94b, 94f cancer gene therapy, examples, 713t Colony, of cloned bacterial cells, 171, 171f
Ciliates, as model organisms, 299b, 300 gene therapy, 709, 710t, 710f, 713t Colony hybridization, 172, 206–207, 207f, 215–216,
Ciona intestinalis (tunicate = ascidian) in drug development, 618f 216f
model organism, 303t, 314f phase I, II and III defined, 618f Color blindness, Hardy–Weinberg distribution and, 85
cis-acting, definition, 14 phase III, 619 Colorectal cancer,
cis-regulatory elements (CREs), 14, 312, 333 Clinical utility, 606b, 608–609 familial see Familial colorectal cancer
chromosome breakpoints and, 504–505 Clinical validity, 606b, 636 as multistage process, 561–563, 563f
long-range effects, 508–509 analog vs. digital results, 608, 629 pathways approach to understanding, 564–565,
clusters (locus control regions), 351–352, 352f analytical validity vs., 607 564f
Index 745
mapping using shared ancestral segments, Cytotoxic T cells (CTLs), 120t, 127 Decapentaplegic, 335
459, 459t class I MHC proteins and, 126b, 127 Decidua, 149b
positional cloning approach and, 499, 501 killing virus-infected cells, 127 DeCode project (Iceland), 488
zooblots, 522 Cytotrophoblast cells, 150, 151f, 152f Dedifferentiation, see under Cell differentiation
gene therapy difficulties, 712 Degeneracy, genetic code, 22–23, 22f, 417, 509
genetic testing, 570 Degenerate oligonucleotide-primed PCR (DOP-PCR),
allelic heterogeneity and, 572
D 186b, 188
ARMS test, 581f Danio rerio see Zebrafish (Danio rerio) Degenerate oligonucleotide probes, 509, 509f
carrier screening see under Population DAPI, excitation/emission wavelengths, 201t Degenerate primers, PCR, 186b, 188
screening Dapsone, genetic variations in metabolism, 614 Degrees of relationship, 86
DMD vs., 570t Darier–White disease, 500f Delamination (development), definition, 145t
DNA sequencing, 573f Dark-field microscopy, ISH and, 206 in gastrulation, 152f
gene tracking and, 592 Darwinian selection see Positive selection Deletion(s),
geographical variations and, 589, 591t Databases, chromosome-level see Chromosome deletions
heteroduplex/SSCP analysis, 575f candidate gene identification, 501 dystrophin gene, 586–587, 586f
OLA, 582f forensic DNA profiling, 601 indels, 407, 407f, 411, 429f, 430f, 491, 589f, 615,
phases, 589 genetic variation and, 407, 407b, 578 615f
heterozygote advantage and, 88–89, 89b, 459 copy-number variation, 410 microdeletions see Microdeletions
mouse disease models, 664b, 664t genome data, 235, 235t, 236f, 236t nomenclature/terminology, 407b
difficulty in replicating CF phenotype, 664b mitochondrial mutations, 437 terminal deletions, 428
leaky mutations, 664b, 664t model organism-specific, 394t whole-exon, 585
missense mutations, 664b, 664t noncoding RNA, 279t testing for see Genetic testing
null alleles, 664b, 664t nucleotide sequences, 235t Deletion/insertion polymorphisms (DIPs; indels), 407,
pig model, 671 protein sequence motifs and domains, 388t 407f, 411, 429f, 430f, 491, 589f,
population genetics, 589, 591t proteomic, 235t, 385t 615, 615f
Cystic fibrosis transmembrane conductance regulator protein interactions, 400, 400t Delta–Notch signaling,
(CFTR) see under Cystic fibrosis repetitive DNA, 291, 410 neural stem cells, polarity and cell fate, 138, 139f
Cytidine, 4t restriction sites, 166b neuronal differentiation and, 157, 157f
Cytochrome c, apoptosis induction and, 114, 114f specific databases see Databases (individual) Denature/denaturation, 47, 193, 193f
Cytochrome P450 Drug Interaction Table, 611 transcriptomic, 385t definition, 192b
Cytochrome P450s, 610, 610f Databases (individual), DNA isolation/purification, 172
evolution, 611 Allen Brain Atlas, 385t factors affecting, 194–195
genetic variation and, 610–613 ArrayExpress, 385t mutation screening and, 574, 574t, 575f
copy number variation, 612, 612f BioGRID, 400t PCR and, 183
CYP1A2, 613, 617, 617f dbEST, 235t Denaturing HPLC, 574, 574t, 575f
CYP2A6, 613 dbSNP, 407, 407b, 417, 578 Dendrites, 101, 101f
CYP2C9, 612, 612f, 617, 617f DDBJ, 235t Dendritic cells, types and roles, 121t, 126b,
CYP2C19, 612, 612f, 616 Decipher database, 410 high class II MHC expression, 126b
CYP2D6, 611–612, 611f, 612f, 616 DIP (database of interacting proteins), 400t see also specific cell types
CYP3A4, 612, 617, 617f DrolD (Drosophila interacting proteins), 400t Dendrograms, microarray data, 247b, 247f
drug development and, 619 EBI genomes, 236t Deoxynucleotides, nomenclature, 4t
drug labeling and, 616 EMAGE, 385t Deoxynucleotide triphosphates (dNTPs), 4t
extensive metabolizers, 611, 611f, 612f EMBL, 235t, 520 phospholinked fluorescent dNTPs, 222, 222f
intermediate metabolizers, 611, 611f, 612f ENCODE, 311, 346 structure, 3f
polygenic effects, 616 Flybase, 236t Deoxyribonucleic acid see DNA
poor metabolizers, 611, 611f, 612f GenBank, 235t, 520 Deoxyribose, structure, 3f
population frequencies, 612, 612f Gene Expression Omnibus, 385t Depurination, 412, 412f, 413f
race/ethnicity and, 612 GOLD (Genomes online database), 236t Designer babies, see Genetic enhancement
ultra-rapid metabolizers, 611, 611f, 612f HRPD, 400t Desmocollins, 99
human gene numbers, 611 Human Proteinpedia, 400t Desmogleins, 99
humanized mice, 670 InterPro, 388t Desmoplakins, 99
mechanism of action, 610–611 MGI (Mouse Genome Informatics), 236t Desmoplastic small round cell tumor, 545t
specificities, 611 MINT (Molecular interactions db), 400t Desmosomes, 98f, 99
Cytogenetic maps, 230t MIPS (Mamm. prot.-prot. interaction db), 400t Determinants, polarity determination, 140
Cytogenetics, 43–50 Mitelman, 544, 559 Deuterostomes, definition, 327b
cancer see Cancer cytogenetics MITOMAP, 68, 437 Development, 133–162
cell types used, 44, 46f Mouse Atlas of Gene Expression, 385t aging as part of, 134
chromosome groups, 45t NCBI, 236t amphibians, 135t, 136
chromosome identification, 30, 44–46 OMIM, 62, 63b, 69 animal models, 135–136, 135t, 300, 301–303
banding patterns see Chromosome banding Pfam, 388t see also Model organisms
molecular techniques see Molecular PIR (Protein Information Resource), 235t cell adhesion and tissue formation, 97, 145, 145t
cytogenetics ProDom, 388t cell division and, 134
chromosome nomenclature, 45b PROSITE, 388f, 388t cell proliferation vs. cell death, 108, 145–146
karyograms, 46, 46f Proteomic World, 385t programmed cell death, 108, 112, 112t, 113,
karyotypes, 45 pseudogene database, 272b 145–146
mitosis vs. meiosis, 44 REBASE (restriction nucleases db), 166b proliferation role, 108, 134, 145
normal variations, 409 Repbase (repeat sequences db), 291 see also Apoptosis; Cell proliferation
preparation methods, 44 SAGE Genie, 385t cellular specialization see Cell differentiation
reporting, 45–46 SGD, (Saccharomyces genome db) 236t cleavage, 147–148, 148f
resolution, 45, 46f, 47 SMART, 388t embryonic vs. postembryonic, 134
see also specific techniques SWISS-PROT, 235t evolutionary conservation see under
Cytoglobin, 317 TCAG, 410 Development, genetic control
Cytokines, 121, 122 TREMBL, 235t fertilization, 146–147
Crohn disease, 529 UK National DNA Database, 601 gastrulation see Gastrulation
Cytokinesis, 32, 32f US CODIS database, 601 human see Human development
Cytoplasm, 93, 94b Wormbase (C. elegans), 236t invertebrates vs. vertebrates, 112, 135, 135t, 142
Cytosine, 3, 3f, 4t ZFIN (Zebrafish Information Network), 236t mammalian see Mammal(s)
5-methycytosine derivative, 357, 357f DEAD box motif, 270, 271f morphogenesis see Morphogenesis
deamination and, 262f, 412, 412f, 413f, 415 Deafness see Hearing loss nervous system see Neural development
see also DNA methylation Deamination, 412, 412f, 413f numerical chromosome abnormalities and,
RNA editing C¨ÆU 374, 375f 5-methylcytosine, 262f, 415 52–53, 52t
structure of, 3f RNA editing, 374, 375f overview, 134–136
Cytoskeleton, 94b Death signals, 113–114 pattern formation see Pattern formation
see also specific components Debrisoquine, CYP2D6 polymorphism and sensitivity (development)
Cytosol, 94b to, 611, 611f phylotypic stage, 135
Index 747
progressive complexity, 134 twin studies see Twin studies recombinant DNA production, 165, 166–168,
reproductive system, 158–159 dn/ds ratios (see Ka/Ks ratios) 167f, 169f
teratogens, 160 DNA, 2–7 cloning vectors see Vectors
twinning, 149, 150b, 150f antisense strand, 13, 13f screening for, 170–171, 170f, 172
see also specific processes/mechanisms base composition see Base composition type II restriction endonucleases and, 165,
Development, genes and gene expression, 134–135, cell content (C), 30 166–167, 168t
346 paradox of, 96, 96f, 329 single-stranded DNA production, 176–177, 177f
alternative promoters and, 372 cell-specific functions, 12 uses, 176, 176f
ATP-dependent chromatin remodeling chromosomes see Chromosome(s) transformation see Transformation
complexes and, 356 complexity/size, 164 vectors, different types, 173t, see also Vectors,
DNA methylation and, 359–360, 360f electrophoresis, 204b cloning
enhancers, loss-of-function mutations and, 351, eukaryotic functions, 30 DNA damage, 411–416
352f information storage, 2, 11 carcinogenesis see Carcinogenesis
evolutionary conservation of, 160–161, 161f, central dogma and, 2 detection, 429, 555, 556f
334, 334f see also Gene(s); Genetic code; Genome(s) DNA metabolism and, 412, 415
apterous, 334, 334f, 502–503 isolation methods, 164 effects of, 413, 415
dorsoventral specification and, 160, 161f labeling see Nucleic acid labeling 5-methyl cytosine deamination and C to T
eyes absent, 520 long-range interactions, 508–509 changes, 262f, 415
Hox genes, 143, 161, 334, 334f see also cis-regulatory elements (CREs) apoptosis and cell death, 413, 416
morphological evolution and, 334, 334f microarray see Microarray(s) cancer role, 412
see also Homologous genes; individual noncoding cell cycle and, 413
species functional RNAs see RNA genes germ line, 415
gene dosage effects, 430 introns see Intron(s) mutation see Mutation(s)
globin gene expression during, 318 ‘junk DNA,’ 276b environmental agents, 411–412
heterochrony, 334–335 pseudogenes see Pseudogenes internal chemical events, 412, 412f, 413f
heterotopy, 334 regulatory sequences see Regulatory response, 416, 551f, 555–556, 556f
homeotic genes see Homeotic genes (Hox genes) elements apoptosis see Apoptosis
long ncRNAs and, 288 repetitive see Repetitive DNA BRCA1/2, 556, 556f
mosaic pleiotropism, 334, 334f transcription of, 362–364, 364f cell cycle arrest, 109, 546
orthologous genes, 334, 334f see also ENCODE project see also Cell cycle, regulation
pleiotropic developmental regulators, 335, 335f protein-coding see Protein-coding genes CHEK2, 555, 556f
transcription factors, 134–135 recombinant, definition, 165 defects and cancer, 546
neural development and, 157–158 repetitive sequences see Repetitive DNA direct reversal, 414b
see also Transcription factors replication see DNA replication genomic instability and, 554, 555–556, 556f
Diabetes Type 1, 486b, 486f, 489, 489t, 490f, 635 RNA binding see DNA–RNA duplexes nibrin/MRN complex, 555, 556f
Association study, 623t sense strand, 13, 13f oncogene activation and, 546
TDT, 486b, 486f Southern blots, 204, 205f p53 and see p53
Diabetes Type 2, 422, 489, 489t, 490f, 623t, 635 structure see DNA structure repair see DNA repair
calpain-10 and, 518 transposons, see DNA transposons see also specific types of damage
susceptibility tests, 622,623t viral genomes, 7, 12t, 13f DNA fingerprinting, 596–597, 596f, 602
predictive value of, 623t, 625 see also Genome(s) definition, 596
Diacylglycerol, as second messenger, 107, 107f DNA amplification, 163–190 zygosity testing, 599
Diagnostic and Statistical Manual of Mental Disorders approaches, 164–165, 164f DNA footprinting, 401–402, 402f
(DSM), 468 cell-based see DNA cloning (cell-based) DNA helicase(s), 9, 10b, 10f
Diagnostic criteria, multifactorial disorders, 468 genetic testing and, 571 DNA repair and, 415
Diagnostic microarrays, 210 meta-PCR technique, 573 TFIIH and, 349
Diakinesis, 34, 34f multiple displacement amplification, 186b DNA libraries, 172–173, 214–217
Diastrophic dysplasia, 436 whole genome amplification using phi29 DNA amplification/dissemination, 216–217
Dicentric chromosomes, 53, 555 polymerase, 186b, 489 differential growth rates and distortion, 217
Dicer (cytoplasmic RNase III), 283, 284b, 285f PCR see Polymerase chain reaction (PCR) gridded arrays, 216
knockdown and cancer, 377 DNA-binding domains, 350 plating out see Plating out
Dictyostelium discoideum, common motifs, 104b, 104f, 350 cDNA, 172, 214–215, 215f
as model organism, 299b, 300 nuclear receptor superfamily, 104b, 105, 105f genomic, 172, 183, 214, 215f
multicellular evolution and, 92 transcription factors, 104b, 396 screening, 215–216
Dideoxy DNA sequencing, see DNA sequencing DNA-binding proteins, colony hybridization see Colony
Dideoxynucleotides (ddNTPs), 217, 217f binding domains see DNA-binding domains hybridization
Difference gel electrophoresis (DIGE), 253 methyl-CpG-binding domain, 358 markers, 215, 216f
Differentiation, cellular see Cell differentiation oncogenes, 541t, 542 PCR-based, 216, 216f
Diffuse enhancers, 312 transcription factors see Transcription factors sequencing see DNA sequencing
DiGeorge/VCFS syndrome, 427–428, 427t see also Protein–DNA interactions starting material, 214
Digit formation, 139, 143, 143f DNA chip, 209, 210f, 245–246 DNA ligase(s), 10b
apoptosis and, 146, 146f definition, 192b DNA repair and, 415
Digoxigenin, see also Affymetrix microarrays; Microarray(s) recombinant DNA production, 167
DNA labeling, 202 DNA cloning (cell-based), 164f, 165–172, 167f DNA loops, gene expression regulation, 351, 351f, 364
structure, 202f advantages, 165 DNA markers, 226, 226b, 227t, 228, 449, 449t, 463
Diplobasts, definition, 327b cell division/amplification, 165, 167f, 171, 171f density/genome spread, 449–450
phylogenetic relationships, 330f disadvantages, 182 DNA profiling and, 596–598, 596f, 597t, 598f
Diploid cells, 96 DNA purification, 166, 167f, 172 ESTs see Expressed sequence tag (EST)
Diploid, definition, 30 denaturation/precipitation steps, 172 hybrid cell panels, 226b, 231
Diploidization, 318 see also Colony hybridization ideal characteristics, 448–449
Diploidy, uniparental, 57 essential steps, 165-166, 167f map assembly, 227b
Diplotene, 34, 34f gene expression systems see Expression cloning microsatellites as see Microsatellite DNA
Direct repeats, 264 history/background, 165 minisatellites, 449t, 596–597, 596f
Direct reversal of DNA damage, 414b host cells RFLPs see Restriction fragment length
Disability rights, prenatal screening and, 633–634 bacterial, 165, 168 polymorphism (RFLP)
Disease genes see Genetic disease competence, 171 SNPs see SNP (single nucleotide polymorphism)
Disease models, genetic deficiency and, 182 STSs, 227b, 227f, 230–231, 230t
animal, 305t, 641t, 662-672, insect, 181 see also Genetic variation; Polymorphism(s);
human cellular, via iPS cells, 693, 694f mammalian, 181–182 specific markers/types
Disease treatment, genetic, 679-715 yeast, 175 DNA methylation, 262b
Dispermy, 50 large insert cloning, 172–177, 173t animal cells and, 262b
Distal (chromosome nomenclature), 45b problems, 173, 174 bacterial, 165, 166b
Distance matrix tree construction, 328, 328f PCR vs., 185 cancer cells, 358, 548, 549f, 560, 562, 577
Disulfide bridges, protein structure, 27, 27f plating out, 165–166, 171–172, 171f DNA damage and (nonenzymatic methylation),
Dizygotic (fraternal) twins, 150b, 150f, 470 412, 412f, 415
748 Index
gene expression and, 262b, 357–360 DNA rearrangements, immunoglobulin genes, 53, 93, minisequencing, 581-582, 602
chromatin regulation and, 357, 360–362 128–130 next generation sequencing, see DNA
development role, 359–360, 360f, 370 DNA repair, 411–416 sequencing, massively parallel
hypomethylated CpG islands, 357–358 chromosome rearrangements and, 53, 54f resequencing
effect of lack of, 360 common DNA metabolism proteins, 415 candidate genes, 492–493, 531–532
gene silencing, 358 damage detection, 429, 555, 556f data analysis, 573–574
imprinting and, 358, 370 defects and disease, 415–416, 555 genetic testing, 573–574, 573f, 602
methyl-CpG binding proteins, 357, 358–359 allelic heterogeneity and, 429, 429f meta-PCR method, 573
tumor suppressor inactivation and cancer, cancers, 556–557, 558, 558f, 558t, 560 microarray-based, 223–224, 224f
358, 548, 549, 549f complementation groups, 416, 416f substrate sources, 217
see also Epigenetics/epigenetic memory see also specific disorders whole genome see Genome Projects; Genomics
identification/assay, 560 DNA polymerases, 10–11, 415 DNA sequencing, gel-based
by bisulfite modification, 358b, 358f, 577 double-strand break repair, 414b, 414f automated in part, 218, 219f,
ChIP see Chromatin immunoprecipitation see Homlogous recombination; capillary sequencing, 218
(ChIP) Nonhomologous end joining dideoxy (Sanger) sequencing, 217–218, 217f,
diagnostic utility, 577 (NHEJ) 218f, 219, 219f
mutation screening, 577–578, 577f gene identification used in, 416 single stranded templates, 176, 217, 220
RFLP analysis, 577, 577f gene nomenclature, 416 DNA sequencing, massively parallel, 220–223,
limited in Drosophila, 262b Ig hypermutation and, 131 amplified DNA methods, 220
locations, 357 mechanisms, 414bb, 414f, 557 errors, 221
methyltransferases see DNA methyltransferases see also specific mechanisms/lesions iterative pyrosequencing, 219-220, 220f,
(DNMTs) DNA replication, 7–12 221f,
repetitive DNA, 357 cell senescence and, 111 micorarray-based DNA capture for, 223-224,
sites (CpGs), 357, 357f, 358 see also Cell senescence 224f
variation in, 358, 359f common DNA metabolism proteins, 415 sequencing-by-ligation, 220, 221t
DNA methyltransferases (DNMTs), 357 DNA damage and, 412, 415 sequencing-by-synthesis, 220, 221f, 221t
de novo, 365 end-replication problem, 41, 42–43 methods compared, 221t
human, 359t, 365 telomerase and, 43, 43f single molecule (third generation) sequencing,
imprinting and, 370 see also Telomere(s) 220–223, 221t
maintenance methyltransferases, 365, 370 machinery, 9–11, 10b advantages, 221
O-6-methylguanine-DNA methyltransferase, 414b helicase, 9 HeliScope™, 221
restriction–modification systems, 166b polymerases see DNA polymerase(s) nanopore sequencing, 222–223
RNA interference and, 284b see also specific components phospholinked fluorescent dNTPs, 222, 222f
DNA nanoparticles, compacted, 704, microsatellite generation and, 408, 409f, 423 polymerase processivity and, 222
DNA polymerase(s), 9–11, 10b mitochondrial genome, 257, 258f SMRT (single molecule real time)
absolute 3’ OH requirement, 43 origins see Origins of replication sequencing, 222, 223f
DNA recombination, 10 repair and, 10, 557 DNA structure, 7
DNA repair and, 10–11, 415 cancer links, 412, 557 3’ and 5’ ends, 8, 8f
heat-stable polymerases for use in PCR, 183, 185 proofreading, 10, 412 A-DNA, 7
high-fidelity and proofreading, 10 translesion synthesis, 11, 415 A-T base pair, structure of, 7f
low-fidelity, 10–11, 415 replication fork, 9 bases, 3, 3f, 7
phi29 DNA polymerase, isothermal amplification replication origin, 9 relative composition see Base composition
and, 186b rolling circle, phage l, 174 see also Base(s)
primers, 9, 43 semi-conservative nature, 9, 9f B-DNA, 7
processivity, 222 semi-discontinuous nature, 9 charge and, 7, 172
stutter and microsatellite generation, 408, 409f, end-replication problem and, 41, 42 chemical bonds, 6–7, 6t
423 leading vs. lagging strand, 9, 9f, 10f hydrogen bonds, 6, 6b, 6t, 7
types/families, 10, 11t Okazaki fragments, 9, 42, 423 phosphodiester bonds, 7, 7f
DNA-directed, 9–10, 11t see also Lagging strand double helix, 7
RNA-directed (reverse transcriptases), 11t telomere replication, 43, 43f anti-parallel nature, 8, 8f
see also specific enzymes DNA replicons, 168 see also DNA replication
DNA polymerase I (E. coli), nick translation, 198, 198f DNA resequencing see under DNA sequencing base pairing see Base pairing
DNA probes, DNA response elements, nuclear receptor superfamily, chromatin and see Chromatin
advantages/disadvantages, 197 103, 105, 105f, 352 coiling/supercoiling, 36–37
labeling, 197–198, 198f, 199f DNA–RNA duplexes, 8 major and minor groove, 7, 8f
see also Nucleic acid labeling hydrogen bonds, 6b pitch of helix, 7, 8f
molecular cytogenetics and, 46, 47 see also Heteroduplex(es); Nucleic acid see also Chromosome structure
preparation, 197 hybridization G-C base pair, structure of, 7f
DNA profiling, 571, 596–601, 602 DNase(s), caspase-activated, 113 linear sequences, 3–4, 3f, 7
clinical samples, origin identification, 598 DNase-hypersensitive sites (DHSs), 352f, 356 nomenclature issues, 3, 4t, 8, 13
definition, 596 DNase I footprinting, 401-402, 402f nonrandom base composition, 7, 341
DNA fingerprinting, 596–597, 596f, 599, 602 DNA sequence, phosphate, 3, 3f, 7
DNA markers/polymorphisms used, 596–598 changes in see Mutation(s) RNA vs., 3
amelogenin as sex marker, 597 genomic analysis see Genomics sugar (deoxyribose), 3, 3f, 7
CODIS set, 597t maps, 230t topology, 7
microsatellites, 597, 597t, 598f, 599 nomenclature, 13–14 viral genomes, 7,12t, 13f
minisatellites, 596–597, 597f prediction from known protein, 509, 509f Z-DNA, 7
mitochondrial markers, 597–598 sequencing methodology see DNA sequencing DNA synthesis,
SGM+ set, 597t variation PCR and, 183
Y chromosome markers, 597–598, 599–600 between cells, 96–97 pyrophosphate release, 219
forensic science and see Forensic DNA profiling see also Chimeras see also DNA replication
low copy number technique and, 599 between individuals see Genetic variation DNA technology see Recombinant DNA technologies
mixed profile problem, 599 see also Mutation(s) DNA transposons, 290
paternity/non-paternity and, 598 DNA sequencing, 217–218 cut-and-paste transposition, 659
fingerprinting and, 596f dideoxy sequencing, 217-219, 218f, 219f human (fossils), 291, 291fr
single locus markers, 598, 598f candidate gene testing, 512–513 local hopping, 656
sample material ChIP-seq, 355b DNA transposons, specific types
crime-scene, 599 gene expression analysis, MER1, MER2, 291, 291f
quality/quantity issues, 599 MPSS, 248 P element (Drosophila), 302b, 659-660
zygosity tests, 598–599 SAGE, 248–249, 384 piggyBac, 660
DNA purification, see also DNA sequencing, massively parallel Sleeping Beauty, 659-660, 660f
by CsCl density gradients, 164 genome-wide association studies, 489 DNA vaccines, 687
by Quiagen DEAE-based resins, 172 genome-wide cancer studies, 559–560 DNA viruses, 7, 12t, 13f
see also DNA cloning; Polymerase chain reaction human genome see Human Genome Project DNMT1, 365, 365f
(PCR) (HGP) Dog, disease models, 641t
Index 749
additional exons/promoters, 372 polycomb proteins, 365 vertebrate phylogeny and, 331f
approaches used, 362, 363f trithorax proteins, 365 Evolution, 297–344
genome browser limitations and, 501 see also Chromatin; DNA methylation cancer cells and microevolution, 538, 539–540,
goals, 362 paramutations and transgenerational effects, 365, 539b, 540f, 561–563, 562f
identifying pathogenic variants, 417 370, 371f see also Carcinogenesis
model organisms (ModENCODE), 346 Epigenetics, 378, 498 comparative genomics see Comparative
novel exons, 363 definition, 365 genomics
transcription start sites, 363–364, 364f developmental regulation and, 359–360, 360f, conservation of development pathways,
pervasive transcription, 362–364, 364f 370 160–161, 161f
transcription factor binding sites, 351, 351f epimutations, 513 drug metabolism enzymes, 610, 611
Encyclopedia of DNA elements project see ENCODE as primary vs. secondary cause of disease, exon see under Genome evolution
project 513 genes/genomes see under Genome evolution
End-labeling, 197 in cancer imprinting role, 370
Endocytosis, different mechanisms, 705f gene silencing, 549, 549f molecular relationships see Molecular
Endoderm, 137, 151, 152, 152f genome-wide studies (cancer epigenome), 560 phylogenetics
extraembryonic, 137f possible role in normal variation, 406 multicellular organisms, 92
origin of definitive embryonic, 137f, 152f hidden heritability and, 534 nucleic acid(s), 275b
principal derivatives, 154f imprinting see Imprinting Out of Africa model, 484
visceral (hypoblast), 137f monoallelic gene expression, 365, 371, 371f phylogenetics see Phylogenetics
Endogenous antigen, 127 X-inactivation, see X-inactivation RNA interference (RNAi), 390
Endogenous short interfering RNA (endo-siRNA), 286, Epilepsy/seizures, copy-number variants and, 532 RNA World hypothesis, 275b
288f, 279t Episomes, 168, 180 selection and see Selection
Endometrial cancer, 558 Epistasis, see also specific processes
Endometrium, 149f locus heterogeneity and, 71 Ewing sarcoma, chromosome rearrangements, 545t
Endomitosis, definition, 96, 97f Epithelial cells, 95t Exaptation,
constitutional tetraploidy and, 51f cancer see Carcinoma creating novel regulatory elements, ,
mosaic polyploidy and, 96,97f cell junctions, 98, 98f, 99 creating novel exons, 272b, 336, 336f, 337f
polytene chromosomes and, 225 Epithelial–mesenchymal transition, neural crest cells, of pseudogenes, 272b
Endonucleases, 156b of transposons, 336-337, 337f
DNA repair and, 415 Epithelium, 100–101, 100f Excitation wavelength, fluorophores, 200, 201b, 201t
DNA replication and, 423 asymmetry, 101 Exclusion mapping, 454
Endophenotypes, gene identification and, 532 cells see Epithelial cells Exogenous antigen, 127
Endoplasmic reticulum (ER), 94b, 94f lymphoid tissue, 121 Exome, 224, 224f
Ca2+ levels and apoptosis induction, 113, 114f Epitope tagging, 244, 245t Exon(s), 16, 276b
Endothelial cells, vascular, 95t, 126b sequences of common epitope tags, 245t alternative first exons/promoters, 372
progenitor cells, 690 Erlotinib (Tarceva®), as genotype-specific drug, 620 CDKN2A gene, 552, 552f
Enhancers, 15, 349 Escherichia coli, UGT1A gene, 614
competition for, 371, 371f gene density, 266 duplication, 314–315, 314f, 341, 585
diffuse enhancers, 312 genome sequencing, 228, 234, 306 genetic tests see under Genetic testing
evolutionary conservation, 335 as model organism, 298, 299b, 300 human–mouse conservation, 504
immunoglobulin genes, oncogene activation mutator genes and homologs, 558, 558t immunoglobulin rearrangement see
and, 545–546, 545f Estriol (unconjugated), trisomy 21 and, 630 Immunoglobulin genes
LCRs, 352 ESTs see Expressed sequence tag (EST) lineage-specific, 335-337, 337f
LF-SINEs and, 337 Ethical concerns, number, and genetic testing, 572
loss-of-function mutations and, 351, 352f animal models, 303–304 number, variation between genes, 265t, 267t
mode of action, 351, 351f clinical tests, 606b, 608–609 repetitive DNA, 266
role in imprinting, 369-370, 370f designer babies, 697b shuffling, 315, 315f, 341, 388, 432
splicing and, see Splicing enhancers DNA profiling, 601 size statistics, 265t, 266, 267t
tissue-specific, 313f genetic enhancement, 697b skipping of, 373
ENSEMBL genome browser, 236t, 236f, 501 germline gene therapy, 697b splicing and, 16–17, 16f
ENU (N-ethyl-N-nitrosurea), human embryonic stem (ES) cells, 688 alternative see Alternative splicing
alkylating agent, 658, 658f human reproductive cloning, 692b see also Splicing
mutation frequency, 658 population screening and, 633–634 transposon origins, 336, 336f, 337f
preference for modifying A-T base pairs, 658 Ethidium bromide, staining DNA, 204b VDJ and VJ exons, 129f, 130, 130f
structure of, 568, 658f Ethnicity/race, Exon duplication, 314-315, 314f, 341, 585
Estrogen, 647 DNA profiling and, 601 Exonization, 336, 336f, 337f
Estrogen receptor, 105f, 647, 647f genetic testing and specific mutations, 589, 590t, Exon junction complex (EJC), 418
ligand-binding domain, 105f, 647, 647f 591t Exon shuffling, transposon-mediated, 315, 315f
Environmental agents, DNA damage, 411–412 genotype-specific drugs, 619 Exon skipping, induced by antisense oligonucleotides,
Environmental factors, lactose tolerance, 524, 524f 707, 708f
genetic factor interactions see Gene– pharmacokinetics see Pharmacokinetics therapy for Duchenne muscular dystrophy, 707,
environment interactions see also Genetic variation 708f
genetic factors vs. Eubacteria, see Bacteria Exonucleases,
family studies, 469 Euchromatin, 38, 260, 354 DNA repair and, 415
twin studies, 470–471 amount in human chromosomes, 261t transcription termination, 349
G value paradox and simple organism adaptation, conformation, 354, 354f Expressed sequence tag (EST), 227t, 230, 230t, 232,
331–332 fraction of human genome, 260 248, 309, 346
lineage-specific gene family expansion, 332–333, genome mapping, 232, 260 dbEST database, 235t
332t histone code and gene expression, 354, 354t functional genomics searches, 386
sex determination and, 320 human chromosome content, 261, 261t Expression arrays see Gene expression analysis
Environmental genes, lineage-specific expansion, Eukaryote(s), Expression cloning, 178–182
332–333 cell structure see Cell structure definition, 178
Environmental variance (VE), 81 classification, 92, 92f disease gene identification, 509
Eosinophils, 121t genomes, 2, 93 eukaryotic cell lines, 180–182
Epiblast, 151, 152f, 153f evolution see Genome evolution advantages, 180–181
source of three germ layers and germ cells, 137f introns and, 314 dominant selectable markers, 182, 183f
Epidermal growth factor (EGF), 160, 161f see also Chromosome(s); DNA; Gene(s) functional complementation, 182, 182f,
Epidermal growth factor receptor (EGFR), as model organisms see Model organisms 509–511
inhibitors and cancer therapy, 620, 620f phylogenies, 329, 330f insect cells, 181
Epidermis, prokaryotes vs., 92–93, 94f mammalian cells, 181–182, 681
renewal/replenishment, 115 see also specific classes/phyla stable expression, 180–181, 181–182, 182f
stem cell niches, 117, 117f Euploidy, 50 transient expression, 180, 181
Epigenetic memory, 365-371 European Court of Human Rights, DNA databases large volume protein synthesis in bacteria,
mechanisms, 353–364, 365, 370 and, 601 178–179
maintenance methyltransferase and, 365, 365f, Eutherian (placental) mammals, advantages, 178
370 definition, 327b disadvantages/problems, 178–179, 180, 681
Index 751
fusion proteins, 179, 179f Farnesylation, 23t use in quantitative PCR (qPCR), 243b, 243f
inducible vectors, 178, 178f Fas signaling in apoptosis, 113, 114f, 127 see also specific molecules
phage display, 179–180, 180f FASTER (First and Second Trimester Evaluation of Risk) Focal adhesions, 98f, 99
problems/limitations study, 630 FokI DNA cleaving domain,
complementation markers, 182 Fc receptor, structure, 125f separable from DNA-binding domain, 654,
inclusion bodies, 179 FDA (US Food and drug adminstration), 683-684, 686, use in zinc finger nucleases, 655, 655f, 708
post-translational processing/folding and, 695, 706 Follistatin, evolutionary conservation, 160, 161f
178, 180 Feature (microarray), definition, 192b Forensic DNA profiling, 599–601, 602
protein product toxicity, 178 Female(s), databases, 601
stable expression, 180, 181 meiosis ethical/political issues, 601
therapeutic recombinant protein, used to chiasmata counts, 446, 446f fingerprinting and, 596f
produce, 681, 683, 682f male vs., 33, 33f legal issues, 600–601, 602
transgenic animals, 681, 683, 682f recombination frequencies, 446, 446f, 447, 447f appropriate ethnic gene frequency, 601
transient expression, principle, 180 sex determination, 159 multiplicative principle and, 600–601
in baculovirus system, 181 X-inactivation see X-inactivation prosecutor’s fallacy, 600, 600b
in mammalian cells, 181 Feminizing genes, 159 obstacles to use, 600
uses/purpose, 178 FEN1 endonuclease, 423 personal information and, 599–600
vectors see Vectors Ferritin, iron-induced changes in expression, 375–376, appearance prediction, 600
Expression profiling see Gene expression analysis 376f ethnicity and, 601
Expression quantitative trait loci (e-QTLs), 371 Fertilization, 146–147 familial relationships, 601
Expression screening see Gene expression analysis Fetal testing see Prenatal testing Y haplotype and surname, 599–600
Extended transmission disequilibrium test (ETDT), 486 Fiber FISH, 47 probabilities and, 599, 600–601
Extracellular matrix (ECM), 97, 99–100 Fibrillin, 502 sample quality/quantity, 599
basal lamina, 100, 100f Fibroblast growth factor 4 (FGF4), lineage-specific technical issues, 599–600
connective tissue, 100f, 101 ectopic expression, 338, 338f uses, 599
epithelial tissue, 100 Fibroblasts, cytogenetics, 44 Forkhead domain, 270t
functional roles of components, 99,100 Fibrous connective tissue, 101 gene family, 270t
GAGs and proteoglycans, 99 Filaggrin, eczema role, 533–534 Formaldehyde, protein and DNA crosslinking, 403,
morphogenesis and, 145 Filopodia, 94b 403f
role in cell behavior, 99 Fingerprinting, see Clone fingerprinting; DNA Formylglycine residues, sulfatase enzymes, 522, 523f
secretion, 95t fingerprinting Forward genetics see Classical genetics
support role, 99 First-degree relatives, definition, 86 Fosmids, 173t, 175
Extraembryonic membranes , 137f, 148–151, 149b gene sharing by, 86 Founder effects, 89
see also specific structures FISH see Fluorescence in situ hybridization (FISH) association studies and, 488, 488f
Extravasation, 123 Fish, loss-of-function mutations, 433
Extrinsic pathway, apoptosis, 113, 114f model organisms, 302, 304b recessive disease and, 589
Eye disorders, promising for gene/cell therapy, 712, polyploidy in, 319 454 Life Sciences, see Massively parallel DNA
gene therapy success, 710b, 712 sex determination and, 320, 321t sequencing
teleost, definition, 327b Fragile sites, 409, 410f, 423
transgenesis, 644 Fragile X syndrome, 409, 410f, 423, 425f, 513, 514
F vertebrate phylogeny and, 331f allelic heterogeneity, 429
Factor VIII, see also individual species anticipation and, 74, 424
difficulty in finding mutations, 513, 513f Fisher, RA, 79 genetic testing, 588, 589f
DNA sequence prediction from known protein, Fitness, organism, 88, 307b methylation assays, 577
509, 509f see also Selection mutations, 580t
Facultative heterochromatin, 38, 354, 360 FLAG epitope tag, 245t premutation alleles, 424
FADD, 113, 114f Flagellates, as model organisms, 300 Fragment ion searching, 251
Falconer, DS, 79 Flagellum, 94b Frameshift mutations, 420–421, 420f
see also Polygenic threshold theory sperm, 94b, 147, 147f APC, 563
Familial adenomatous polyposis (FAP), Floor plate, 157 consequences, 420
genetic testing, protein truncation test, 576 Fluorescein, connexin 26 gene and, 420, 420f
loss of heterozygosity and, 548, 557 emisson/excitation wavelengths, 201t dystrophin gene and, 421f
pathways approach to, 564 structure, 201f mechanisms, 420–421
three-hit carcinogenesis and, 548 Fluorescence in situ hybridization (FISH), 47 Framework maps, see under Genome Projects
Familial colorectal cancer, breakpoint identification, 505, 505f, 506f Free fetal DNA, 572
familial adenomatous polyposis and, 548, 557, direct vs. indirect detection, 47 FRET (fluorescence resonance energy transfer), 396t,
564, 576 gene expression studies, 241 398–399, 399f
genetic testing, protein truncation test, 576 microdeletion/duplication identification, 506 Friedreich ataxia, allelic heterogeneity, 429
hereditary non-polyposis colon cancer (Lynch molecular cytogenetics, 47, 553, 554f Frogs, see Xenopus and species of
syndrome), 557–558, 564 chromosome painting/karyotyping see Fruitfly see Drosophila melanogaster
loss of heterozygosity see Loss of heterozygosity Chromosome painting Fugu, see Takifugu rubripes
(LOH) conventional, 47, 47f, 48f Functional assays,
positional cloning approach, 499 interphase FISH, 47, 48f genetic testing, 571
tumorigenesis, 563 metaphase FISH, 47, 47f, 48f mutations vs. normal variants, 578
Familial male precocious puberty, 433 pseudocolors, 48, 48f Functional complementation, 182, 182f, 509–510, 515
Familial medullary thyroid carcinoma, 434 resolution, 47 cell lines, 182, 182f, 509–510
Familial relationships, DNA profiling, 596, 596f, 598, oncogene amplification, 542, 543f whole animal, 510, 510f
598f, 601 see also specific techniques Functional genomics, 381–404
Family environment, shared, 469 Fluorescence microscopy, 201b, 201t see also ENCODE project
Family studies, 468–471 protein expression analysis, 244–245, 245f definition, 382
adoption studies, 471, 471t Fluorescence resonance energy transfer (FRET), 396t, functionally important sequences, 234
Crohn disease, 528 398–399, 399f gene inactivation/inhibition, 383, 389–393, 403
diagnostic criteria and, 468 Fluorophores, classical (forward) genetics, 389, 403
genes vs. environment, 469, 470–471 automated DNA sequencing, 218, 219f germ line, 392–393
limitations, 469–471 single molecule sequencing, 222, 222f reverse genetics, 389, 390, 390t, 403
risk ratio (λ) and familial clustering, 468–469, 469t definition, 200, 201b RNAi see RNA interference (RNAi)
twin studies see Twin studies emission wavelength, 200, 201b, 201t types of manipulations/approaches,
Fanconi anemia, excitation wavelength, 200, 201b, 201t 389–390
BRCA2 and, 556 nucleic acid hybridization and, 200–203 see also Cell culture; Genetically modified
DNA repair defect, 415, 416, 556 commonly used molecules, 201t organisms
functional complementation and gene direct assays, 47, 200, 201b integrated analyses, 384, 385t
identification, 510 FISH and, 47 levels of study, 382, 383
FAP see Familial adenomatous polyposis (FAP) fluorescence microscopy and, 201b, 201f biochemical level, 382
Farthest-neighbor method, microarray data, 247b indirect assays, 47, 200–202, 201b, 202f cellular level, 382
Fascio-scapulo-humeral muscular dystrophy,71 structure, 201f organismal level, 382
752 Index
problems, 248 multiple sulfatase deficiency, 510, 522–523, CFH, 533, 533f
qPCR, 186b, 189, 241–242, 243b, 243f, 346 523f, 524f CFTR, 88, 419f, 422, 423f, 428, 513, 514, 522, 712
ribonuclease protection assay, 241 see also success (below); specific disorders CHD7, 525–526, 525f
RT-PCR, 346, 348b clinical importance, 515 CHEK2, 491, 527, 555
SAGE, 248, 348t complex multifactorial disease see Susceptibility CLN3, 245, 245f
tissue in situ hybridization, 206, 206f, 241, genes CMAH, 339t
241f endophenotypes and, 532 COL1A1, 436
transcript size/abundance, 241 mapping see Genetic maps/mapping COL1A2, 436
tumor cells, 560–561, 562f, 620–621, 621f monogenic disease see Positional cloning CXYorf1, 326
see also Complementary DNA (cDNA); position-dependent methods see Positional CYP2C9, 617
Messenger RNA (mRNA); Nucleic cloning CYP2C19, 616
acid hybridization; Transcriptome position-independent methods, 509–512 CYP2D6, 611, 612, 616
source material, 239–240, 240f success, 531–534 DAX, 159
transcription start site identification, 348b clinically useful findings, 532–534 DCC, 562
see also Expression cloning; specific methods genome-wide association studies and, 532 Delta, 157
Gene families, 256, 263–264, 268–269, 270t hidden heritability and, 534 DFNB3 genes, 510, 510f
classes, 269–271 monogenic disorders, 531–532 DMD, 712
closely related, highly homologous, 269 see also case studies/examples (above) DMPK, 423
common protein domains, 270, 270t susceptibility genes see Susceptibility genes Dnmt1, 370
common protein motifs, 270, 271f see also specific methods DNMT3B, 361
functionally related, weak homology Gene knockdown, 393 Dscam, 315
(superfamilies), 270–271 C. elegans, 390–391 DSCR1, 425
clustered, 256, 268, 269f, 270t genetically modified organisms, 393 DSRAD, 375
Hox genes see Homeotic genes (Hox genes) mammalian cell lines, 390, 391–392 DTDST, 436
imprinting and, 369, 369f morpholino oligonucleotides, 390, 393f DYRK1A, 425
LCRs and, 352, 352f RNAi and, see RNA interference (RNAi) dystrophin, 373, 373f, 421f, 436, 499, 506, 506f,
multiple clusters, 269, 269f Gene knock-in, 650-651, 651f, 652f 520, 520f, 586–587, 586f, 595
pseudogenes, 271 Gene knockout(s), 389, 393, 393f, 394 env, 540, 540f, 701, 702f
see also Pseudogenes candidate gene testing, 515 ERBB, 542
RNA genes, 277, 279 mouse genetics and, 503 ERBB2 (HER2/Neu), 542, 619
dispersed, 268, 269, 270t random, 393f ERCC2, 416
dispersal mechanisms, 264 RNA editing studies, 375 EVI2A, 268
pseudogenes and, 271 targeted, 393f EVI2B, 268
snRNA genes, 280–283 Gene loss, EVC/Evc, 642f
feature of metazoan genomes, 315 after whole genome duplication (WGD), 318, 318f EVC2, 642f
gene fragments, 271, 273f human lineage, 337–339 EYA1, 520
generation by gene duplication see Gene Gene number, different species, 329, 331t EYA2, 520
duplication difficulty in estimating, 239t EYA3, 520
lineage-specific expansion G-value paradox, 329, 331-332, 331t, 342 EYA4, 520
environmental genes and, 331, 332–333, Gene Ontology, 238b, 382 eyes absent (eya), 520
332t Gene Ontology (GO) Consortium, 238b F8A, 513, 513f
genomic drift and, 333 Genoscope, 233 FBN2, 502
G-value paradox and, 331 Gene pool, FGF4, 338, 338f
human lineage and, 337–339 definition, 84 FGFR2, 527, 625
lineage-specific regulation, 338–339, 338f frequency of genes/alleles in see Gene frequency FKHR, 433
pseudogenes see Pseudogenes Gene prediction programs, 236 FLG, 533
see also specific gene/protein families Genes (list), FMS, 542
Gene fragments, 271 ABL1, 48f, 542, 543–544, 544f FOXP2, 340, 340f, 340t
Gene frequency, 84–89 ACE, 615, 615f FRA16A, 423
definition, 84 Adar1, 375 FRA16B, 423
genes identical by descent, 86 ADRB1, 615 fragilis, 158, 158f
genotype frequency relationship, 84–87 ADRB2, 614 FRAXA, 409, 424
genetic counseling and, 84–85 AMELX, 597 FRAXE, 409
inbreeding effects, 85–86 AMELY, 597 GABRA3, 374
see also Hardy–Weinberg distribution AMH, 159 gag, 540, 540f, 701, 702f
population screening effects on, 634 AMPR, 170, 170f, 175f, 179f, 183f GJB2, 420, 420f, 591
variation over time, 87–89, 307b AMY1, 316, 337 globin genes
coefficient of selection, 88 APC/Apc, 548, 550, 562, 564, 576, 661b HBA1, 317f
fitness and, 307b APOB, 374, 375f HBA2, 317f, 422f
founder effects, 89 APOE, 474 HBB, 418f, 422f
genetic drift, 87, 307b APP, 474, 532 HBD, 317f
heterozygote advantage and, 88–89, 89b apterous, 161, 334, 334f, 502–503, 522 HBE1, 317f
interlocus sequence exchange and, 307b AR, 434t HBG1, 317f
population size effects, 89, 307b, 307f ARMD loci, 533, 533f HBG2, 317f
see also Mutation rate; Selection ATG16L1, 516, 518, 530 HBZ, 317f
Gene function, overview of methods for studying in ATM, 429, 429f, 491, 527, 555 MB, 317f
vivo, 389, 393f BAX, 551 NGB, 317f
Gene fusion, BCR, 48f, 544, 544f see also Globin genes/proteins
oncogene activation and, 48f, 543–544, 544f BCRYRN1, 293 GNAS1, 432t, 434t
see also Fusion proteins blimp1, 158, 158f GPR35, 518
Gene identification, 497–536 BLM, 415b GRIA2, 374
approaches, 498, 499f, 534–535 BRAF, 543, 564 GRIK2, 374
candidate genes see Candidate genes BRCA1, 474, 491, 510, 526, 526f, 547, 556, 576, H19, 287, 288t, 370f
case studies/examples, 519–531 582f, 586f, 609, 622, 623t, 713, 713f HAR1A (HAR1F), 340t, 341
Alzheimer disease, 532–533 BRCA2, 474, 491, 510, 526, 527, 556, 622, 623t HBII snoRNA genes, 282, 369f
AMD, 533, 533f BRIP1, 491, 510, 527 HD, 424, 424t, 432t, 442
branchio-oto-renal syndrome, 506, 522 cap, 703, 703f HEXA, 589
breast cancer see Breast cancer, genes CAPN10, 518 HFE, 608
CHARGE syndrome, 506, 525–526, 525f CARD15 (NOD), 528, 530 HIC1, 549, 549f
Crohn disease see Crohn disease CASP8, 491 HLA genes, 126b, 126f, 263, 266, 268f, 270t, 271,
cystic fibrosis see Cystic fibrosis CCND1, 542 273f, 339, 485
DMD see Duchenne muscular dystrophy CCR5, 709 HLA*A1, 479
(DMD) CDC14Bretro, 338t HLA-DR4, 477
Eczema, 533–534 CDKN2A, 373, 552, 552f screening, 635
lactose tolerance, 523–525, 524f CENPB, 336 Hox genes, 142f, 143, 270t, 288, 334
754 Index
penetrance and, 73 non-Mendelian disorders vs., 468 QF-PCR in fetal trisomies, 587–588, 588f,
see also Pedigree(s) overview/principles, 448–451 598, 602
see also Genetic testing pedigree size and, 450, 456, 457 repeat expansions, 588
Genetic disease, positional cloning and see Positional cloning whole-exon duplication/deletion see below
chromosome abnormalities and, 504b resolution, 462, 464 specific mutations, 571, 579–585, 579t, 602
positional cloning and see Positional cloning sperm genotyping/mapping, 462–463, 462f allele-specific PCR (ARMS), 581, 581f, 602
see also Chromosome abnormalities success, 531–532 applications, 579, 580t
counseling see Genetic counseling see also Pedigree(s) ASO hybridization, 580–581, 602
gene identification see Gene identification model organisms, 225, 441 massively parallel SNP arrays, 583–585, 583f,
heterogeneity and, mouse, 503, 503b 584f, 585f
genotype–phenotype correlations see three-point crosses, 455, 456t microsequencing by primer extension,
Genotype–phenotype relationship see also Genome Projects; specific models 581–582, 582f, 602
limited range of mutations and, 580t non-Mendelian disorders, 463, 464, 467–494, MS genotyping, 583
see also Genetic variation; specific types of 498, 515 oligonucleotide ligation assay, 581, 581f,
heterogeneity association studies see Association studies 582f, 602
inheritance 62, 63, 63b diagnostic criteria and, 468 pyrosequencing, 583, 602
see also Monogenic (Mendelian) inheritance; difficulty, 468 RFLP analysis, 579–580
Multifactorial inheritance; family studies see Family studies susceptibility genes see Susceptibility testing
Pedigree(s) genes vs. environment, 469 whole-exon duplication/deletion, 585–587
multiple causes, 514 linkage disequilibrium and, 460 apparent non-maternity, 587, 587f
one disease many genes see Locus heterogeneity linkage studies see Linkage analysis array-CGH and, 585
one gene many diseases, 514 Mendelian disorders vs., 468 difficulties/problems, 585
as spectrum of genetic determination, 63, 63f, 73, segregation analysis, 471–473, 471t dystrophin deletion in males, 586–587, 586f
73f, 471, 494, 494f, 622 physical maps and, 446–448, 456 MLPA, 585–586, 586f, 602
testing see Genetic testing positional cloning see Positional cloning see also Genetic counseling; individual disorders;
timing of expression, 73 recombination and, 225, 442–448 specific methods
see also specific conditions chiasma counts and total map length, 446, Genetic variance (VG), 81
Genetic distance, 463 446f Genetic variation, 405–440
centimorgans (cMs), 444 gene conversion effects, 446 allelic heterogeneity see Allelic heterogeneity
chiasmata counts and total length, 446, 446f haplotype blocks, 444 copy number variation, see Copy number
defining minimal candidate region, 500 interference and, 445, 461 variation
mapping functions, 445–446 mapping functions, 445–446 effects of, 406
multipoint mapping and, 457 nonrandom distribution, 446–448, 447f, 448f clinical heterogeneity and see Clinical
physical distance vs., 446–448 recombinant vs. non-recombinant loci, 442, heterogeneity
recombination fraction and see Recombination 443f, 444f, 449, 450 difficulty in identifying, 416–417
fraction recombination fraction see Recombination drugs/pharmacology and see
Genetic diversity see Genetic variation fraction Pharmacogenetics/
Genetic drift, 87, 307b sex differences, 446, 446f, 447, 447f pharmacogenomics
population size and, 307b, 307f single vs. double-crossovers, 445, 445f functional tests, 416–417
pseudogenes and, 272b utility/need for, 442 pathogenic effects see below
Genetic enhancement, 697b see also Genetic testing; specific methods see also Phenotype
designer babies, 697b Genetic markers, 448–449, 449t, 463 epigenetic see Epigenetics/epigenetic memory
Genetic heterogeneity see Genetic variation density/genome spread, 449–450 gene identification studies, 498
Genetic Information Nondiscrimination Act (GINA), DNA see DNA markers success and, 531–532
633 ideal characteristics, 448–449 see also Gene identification; Genetic maps/
Genetic linkage, protein, 449, 449t mapping
association vs., 479, 480f Genetic modifier see Modifier, genetic generation/mechanisms, 34, 408, 409f
disequilibrium, 459 Genetic screens, in different eukaryotes, 394t DNA damage/repair and, 411–416
mapping Mendelian traits see Linkage analysis reverse, in cultured cells, 390t see also DNA damage; DNA repair
recombination fraction and, 443–444, 444f Genetic testing, 569–604 DNA replication and, 408, 409f
Genetic maps/mapping, 441–465 clinical relevance see Clinical test(s), evaluation independent assortment, 34, 34f, 442
aims, 442 DNA amplification and, 571 meiosis I and, 34, 34f
bioinformatics see Bioinformatics see also Polymerase chain reaction (PCR) recombination and see Recombination
case studies, 519–531 DNA profiling see DNA profiling genetic susceptibility and, 438
see also specific disorders DNA vs. RNA, 572, 602 see also Susceptibility genes
cytogenetic maps, 230t functional assays, 571 genetic testing and see Genetic testing
framework maps see Framework maps general mutation screens, 571, 572–579, 602 genotype–phenotype correlation see Genotype–
genetic distance see Genetic distance DNA sequencing (resequencing), 573–574, phenotype relationship
genetic markers, 442 573f, 602 hidden heritability and, 534
human, 228–230 methylation patterns, 577–578, 577f large-scale vs. small-scale, 406
Mendelian disorders see below microarray-based, 576–577 locus heterogeneity see Locus heterogeneity
microsatellite analysis, 229, 229f, 408–409, see also Microarray(s) major histocompatibility complex, 126b
446 pre-sequencing scans see rapid scanning nomenclature/terminology issues, 406b, 407b
non-Mendelian disorders see below (below) normal variants, 406–411, 438, 498
resolution, 448 unclassified variants and, 578–579 evolutionary conservation and, 579
RFLP analysis, 228–229, 229f gene size/structure and, 572 interspersed repeats, 408
SNP analysis, 230, 446 gene tracking see Gene tracking large-scale variations, 409–411
sperm genotyping, 462–463, 462f guidelines, 570, 578 SNPs see SNP (single nucleotide
see also Human Genome Project (HGP) monogenic vs. complex disease, 628t polymorphism)
linkage analysis see Linkage analysis personalized see Personalized medicine tandem repeats see Tandem repeats
markers see Genetic markers place in 21st century, 634–635, 634f unclassified and genetic testing, 578–579
Mendelian disorders, 448–463, 497, 498, 499f population screening see Population screening see also Polymorphism(s); Repetitive DNA;
ancestral chromosome segments and, 457, possible questions to address, 570 specific types
459–460, 459t, 460f prenatal see Prenatal testing pathogenic variants, 416–428, 438
autozygosity and see Autozygosity quality assurance, 570 co-segregation and, 578
candidate genes see Candidate genes rapid scanning, 574–576, 574t database searches and, 578
difficulties with standard lod score analysis, heteroduplex properties, 574–575, 575f de novo, 578
460–463 protein truncation test, 576, 576f gain-of-function see Gain-of-function
factors affecting, 498 single-strand conformation, 575f, 576 mutations
fine-mapping, 457–460 samples/source material, 571–572, 571t gene dosage effects see Gene dosage
informative vs. noninformative meioses, 449, special tests, 585–591 identification difficulty, 416–417, 578
450f, 450t, 451–452, 451f, 500 ethnic origin/geographical factors, 589, identification methods, 578–579
linkage analysis see Linkage analysis 590t, 591t in silico predictions of effects, 578–579
linkage disequilibrium and, 459, 460 locus heterogeneity and, 590–591 interpreting changes, 417
low resolution, 448–457 lack of evolutionary conservation, 578
756 Index
loss-of-function see Loss-of-function whole genome duplications, 264, 318–319, 318f, dystrophin mutations, 421f, 436
mutations 319f HPRT mutations, 436, 436t
molecular pathology see Molecular see also Gene frequency; Selection; specific see also Association
pathology mechanisms Gentamycin, 680
protein aggregation and, 418, 425b Genome Projects, 214, 224-239 Germ cells, 92, 93, 95t, 96
small scale changes, 417–422 DNA libraries, 173 development in mammals, 158–159
tandem repeats instability see Microsatellite framework maps, 225-226, 225f, 226b, 227b embryonic pluripotent, 117–119
DNA chromosome banding patterns and, 226 production, 33–34, 33f
see also Mutation(s); specific disorders; clone contigs and clone maps, 226–228, 226b, see also Gametogenesis; Meiosis
specific mutations 230t somatic cells vs., 93
relative amounts, 484 chromosome walking, 227, 500–501 stem cells, 117
uses/applications clone fingerprinting, 227–228 Germ layers, three major, 151f, 154f
forensic analysis and, 408–409 positional cloning and, 500–501 principal derivatives of, 154f
see also DNA markers schematic, 226f Germ line, 31, 96, 158
see also Mutation(s); Polymorphism(s) disease mapping, 449, 450 DNA damage effects, 415
Genome(s), CEPH families, 455–456, 462 gene inactivation
definition, 11, 214 errors, 461, 500 gene manipulation see Genetically modified
DNA vs. RNA, 11, 12t, 13f DNA markers see DNA markers organisms
evolution see Genome evolution human genome see under Human Genome Germ plasm, 158
human see Human genome Project (HGP) Geron, 695
instability see Genomic instability mouse genes, 503b GINA (Genetic Information Nondiscrimination Act),
repetitive DNA see Repetitive DNA YAC instability and, 227b, 230 633
size–complexity relationship, 331t gene expression, understanding, 346 Glial cells, 102
C value paradox, 96, 96t human see Human Genome Project (HGP) see also specific types
G value paradox, 329, 331–332 model organisms, 228, 234–235, 306 Glioblastoma multiforme,
size constraints, 329 whole genome shotgun sequencing, 225f in vivo gene therapy, 713t, 714, 714f
synthetic, 300 Genome size, of different organisms, 329, 331t pathways involved, 565f
viral see Viral genomes Genome transplantation, 300 sequencing, 559
see also Chromosome(s); Gene(s); specific animal Genome-wide association (GWA) studies, 485, Globin genes/proteins,
groups/species 488–490, 494, 495f, 498, 516 a globin, 317, 317f
Genome browsers, 235, 236f, 236t, 312, 313f, 313t Crohn disease, 530–531, 530t, 531f gene clusters, 269, 269f
positional cloning and, 501, 501f factors contributing to development, 488–489 gene dosage effects, 426, 426f
Genome compilations, 236t HapMap project and see HapMap project locus control regions, 352, 352f
Genome databases, 235, 235t, 236t hidden heritability and, 534 pseudogenes, 271
Genome evolution, 313–326 power, 491 a thalassemia, 421, 422, 422f, 426, 426f
chromosomes see under Chromosome(s) size of relative risk, 490 ancestral gene, 317
comparative genomics and see Comparative successful studies, 518–519, 532 b globin, 317, 317f
genomics success of, 532, 622 b thalassemia, 421–422, 422f
computer analysis see Bioinformatics WTCCC replicated associations, 489–490, 490f gene clusters, 269, 269f
evolutionary constraint and conservation, 256, see also specific disorders locus control regions, 352, 352f
256f, 306–308, 341 Genome-wide chromosome paints, 48, 49f missense mutations and sickle-cells,
defining pathogenic variants and, 579 Genomic DNA libraries, 172, 214 417–418, 418f
development pathways and, 143, 160–161, average insert size, 225 post-translational cleavage, 25
161f, 334, 334f, 520 construction, 214, 215f promoter mutations, 421, 422f
noncoding DNA, 310–312, 312f, 313f, 313t, contiguous clones, 226 pseudogenes, 271
341 genetic testing and, 572 transcription and translation, 20f
positional cloning and, 502–503 PCR, 183 developmental expression, 318
purifying (negative) selection and, 306–308, Genomic drift, 333 divergence following gene duplication, 317–318
307b, 308f, 341 Genomic imprinting see Imprinting gene clusters and, 269, 269f, 317f
X chromosome and, 323 Genomic instability, 553–558 subfunctionalization, 317–318, 317f
see also ENCODE project; Homologous chromosomal see Chromosomal instability (CIN) early cloning experiments, 172, 214
genes; Model organisms DNA damage response and, 555–556, 556f g globin, 318
exaptation, 336 repair defects and cancer, 556–557 mutations
gene complexity, 314–315, 341 microsatellite see Microsatellite instability (MIN) missense and sickle-cell anemia, 417–418,
exon duplication, 314–315, 314f, 341 Genomics, 214, 224–239 418f
exon shuffling, 315, 315f, 341, 388, 432 comparative see Comparative genomics regulatory elements and thalassemias,
intron evolution and, 314 databases/browsers, 235, 235t, 236f, 236t 421–422, 422f
gene duplication and, see Gene duplication see also Databases pseudogenes, 271, 317
gene family expansion, lineage-specific, 332, 332t data interrogation/annotation programs see ξ globin, 318
gene loss, Bioinformatics see also Cytoglobin; Myoglobin; Neuroglobin
following WGD, 318, 318f DNA libraries and see DNA libraries Globin superfamily, evolution of, 317-318, 317f
human lineage, 337–339 framework maps see Framework maps Globular proteins, b−pleated sheets, 26
human see Human genome evolution functional see Functional genomics Glomus tumors, 75, 75t
major chromosome rearrangements (mammals), gene ontology, 238b Glucuronidation, drug metabolism, 610, 610f
319–320 Genome Projects see Genome Projects genetic variation and, 614
inversions, 320 massively parallel DNA sequencing and, 220 Glutamic acid, structure, 5f
karyotype divergence, 319, 319f personal genome sequencing, 233, 263, 305, Glutamine, structure, 5f
synteny and, 320, 320f 410–411 Glutathione conjugation, drug metabolism, 610, 610f,
translocations, 320 questions raised, 298 614f
molecular phylogenetics see Molecular shotgun sequencing strategy, 224–225, 225f genetic variation and, 614
phylogenetics see also specific methods Glutathione-S-transferases,
neogenes, 336 Genoscope, 203 genetic variation, 614
noncoding DNA Genotype, mechanism of action, 614f
expansion, 333 analysis see Genomics Glycine, structure, 5f
regulatory elements, 312, 333–335 definition, 62 Glycomics, 384
repetitive DNA, 290 drug development and, 618–619 Glycoproteins, 23–24
RNA genes see RNA genes frequency, gene frequency relationship, 84–87 Glycosaminoglycans (GAGs), 99
organism complexity and, 331t see also Hardy–Weinberg distribution Glycosylation, 23–24
G value paradox and, 329, 331–332 genetic testing see Genetic testing Glycosylphosphatidylinositol (GPI) anchors, 23t, 24
noncoding DNA, 329, 333 phenotype correlation see Genotype–phenotype Gnathostomes, definition, 327b
positive selection and rapid changes, 308 relationship Goats, transgenic, 682f, 683
Ka/Ks ratios, 308b Genotype–phenotype relationship, 70–71, 435–438 Goblet cells, of intestine, 117f
RNA genes and, 308, 309f clinical validity and, 607–608 GoldenGate® system, 584–585, 585f
RNA World hypothesis and, 275b mitochondrial disorders and, 436–437 Golgi complex, 94b, 94f
transposition and see Transposons modifier genes/effects, 437 Gonad development, 159
residual function and, 435–436, 435f see also Gametes; Sex determination
Index 757
Gorilla, human genome vs., 342 nonparametric linkage and, 474–475, 474f sickle-cell mutation and, 418f
Gout, 71 mutation–selection hypothesis and, 492 see also Globin genes/proteins
GPCRs see G protein-coupled receptors (GPCRs) size variations, 484 Hemoglobinopathies,
GPI anchors, 23t, 24 see also Association difficulty in finding mutations, 513, 513f
G protein-coupled receptors (GPCRs), 106–107, 106t, HapMap project, 230, 481–484, 494, 622 molecular pathology, 435
107f, 271 CNV effects, 491 sickle-cell mutation and, 418f
gain-of-function mutations and, 433 genome-wide association studies and, 489 see also Globin genes/proteins; specific disorders
GPCR superfamily, 271 graphical LD representation, 483b, 483f Hemophilia,
see also specific receptors irregularity of LD patterns, 482, 483f hemophilia A, difficulty in identifying mutations,
G proteins, key findings, 482, 483t, 484 513, 513f
inactive vs. active form, 107, 433, 543 Out of Africa model and, 484 mosaicism due to X-inactivation, 67
Ras signaling and, 543, 543f parent–child trios, 482 Hemolytic disease of newborn, 63f
as second messengers, 106–107, 106t, 107f phase I report, 482, 483t Hepatitis B virus, see HBV
types, 107 phase II report, 482 Hepatitis C, contracted by hemophiliacs, 681
Granulocytes, origins, 116f populations studied, 482, 484, 488 Hepatocytes, polyploidy in, 96
Roles and types, 121t tag-SNPs and, 484, 494 HER2, genotype-specific drugs, 619
see also specific cell types HAR1, function and positive selection, 341 Herceptin® (trastuzumab), genotype-specific drug
Granzymes, 122,127 HAR2, positive selection, 341 license, 619
Green fluorescent protein (GFP), Hardy–Weinberg distribution, 84–87 cancer treatment with, 686b
autofluorescence, 245, 245f association studies and, 516 Hereditary neuropathy with pressure palsies
genetic variants, 245 departures from/factors affecting, 86–87 (tomaculous neuropathy), 430t,
protein labeling, 245, 245f heterozygote advantage and, 88–89, 89b 434, 435f
Growth factors, inbreeding and, 85–86, 87b Hereditary non-polyposis colon cancer see Lynch
angiogenesis, 565 laboratory error, 86 syndrome
oncogenes, 541t, 542 locus heterogeneity and, 87 Heritability, 81–82
Growth hormone deficiency, treatment causing migration and, 86 definition/derivation, 81
Creutzfeld-Jacob disease, 681 selective mortality, 86 genes vs. environment, 471
gene clusters, 269f, 270t derivation, 84, 85b hidden heritability, 534
GST-glutathione affinity chromatofgraphy, 179, 179f predicting risks using, 84–85 misunderstanding of term, 82
GTPases, Ras family, 110, 110f, 543, 543f possible effects of screening on gene frequency, Herpes simplex virus, 701t
Guanine, 3, 3f, 4t 634 genome, 13f
depurination, 412, 412f, 413f Hardy–Weinberg equilibrium, 87 use in gene therapy, 701t, 703
nonenzymatic methylation, 412, 412f Harvard University, 233 HERV (human endogenous retroviral sequence), 291
structure, 3f HATs (histone acetyltransferases), 353 Heterochromatin, 38, 260–261, 289–293, 354
Guanosine, 4t HAT selection, 182, 182f acrocentric chromosomes, 261
methylated in mRNA/snRNA caps, 18, 18f, 281f Hayflick limit, 111 centromeres, 40, 261, 289, 290
Guanylate cyclase, 106t HBII-52 snoRNA, gene regulation and, 277f, 283 conformation, 354, 354f
G-U base pairing in RNA, 8,9f HBII-85 snoRNAs, and Prader-Willi syndrome, 277f, constitutive see Constitutive heterochromatin
Guide RNAs, 275 369, 369f DNA methylation and, 357, 360
see also specific types HBV (hepatitis B virus), 687 facultative, 38, 354, 360
Gut, HD see Huntington disease (HD) features, 362
lymphoid tissue, 121 HDACs (histone deacetylases), 353 genome mapping, 232
renewal/replenishment, 115 Health care, histone code and gene expression, 354, 354t
stem cell niches, 117, 117f expenditure, changing profile, 635, 635f human chromosomes, content in, 261t
tissue organization, 100, 100f policy, 621 insulators, 508
Guthrie card, PCR from, 593 population screening and, 628–629 maintenance see Constitutive heterochromatin
G-value paradox, 329, 331–332, 331t, 342 ethical issues, 633 RNA interference, 284b, 376–377
GWAS see Genome-wide association studies see also Population screening telomeres, 41, 42t, 289, 290
‘predict and prevent’ medicine, 606, 634–635 tendency to spread, 351, 361, 506, 508
Hearing loss, Y chromosome sequence, 290
H autozygosity mapping, 458, 458f see also Repetitive DNA
3H (tritium), 200t
dominant-negative mutations and, 432 Heterochromatin protein 1 (HP1), 361
H19 RNA, 287, 288t functional complementation and gene Heterochrony, in developmental, 334–335
Haemophilus influenzae, genome sequence, 234 identification, 510, 510f Heterodisomy, 58
Hagfish, definition, 327b locus heterogeneity and, 71, 71f, 590–591 Heteroduplex(es), 193, 193f
Hair follicle, stem cell niches, 117, 117f Heart disease, polypill and see Coronary artery disease definition, 192b
Hair mineral analysis stem cell therapy for, 695, 695t DNA concentration and, 196
analytical and clinical validity, 607 Heatmaps, microarray data, 247b, 247f factors affecting stability, 194–195
Hairpins in nucleic acids, 8, 9f Heat shock response, Alu sequence and, 293 genetic testing/mutation screening and,
see also short hairpin RNA Heavy chain, immunoglobulin(s), 123, 123f, 128–130, 574–575, 574t
Hairy ears, Y-linked inheritance and, 67 129f probe–target formation, 193, 193f, 194f
Haldane’s function, 445 HEK293 human embryonic kidney cells, 389 stringency and see Hybridization stringency
Halothane, malignant hyperthermia and, 616 HeLa cells, 389 Heterogametic sex chromosomes, 321
Haploid/haploidy, 96 Helicases see DNA helicase(s) Heterogeneous ribonucleoprotein (hnRNP), 373
definition, 30 HeliScope™, 221 Heteromorphic chromosomes, 321, 321t
see also Germ cells Helix-loop-helix (HLH) motifs, 104b, 104f, 350 Heteroplasmy, 68, 437
Haploinsufficiency, Helix-turn-helix (HTH) motif, 104b, 104f, 350 Heterotopy, in developmental, 334
loss-of-function mutations and dominance, Helper T (Th) cells, 120t, 127 Heterozygote advantage,
429–430, 430f, 430t, 435f, 436 class II MHC proteins and, 126b, 127 ancestral chromosome segments and, 459
SHOX in Turner syndrome, 426 Th subclasses, 127 balancing selection, 307b
tumor suppressor genes and, 548 Hematopoietic stem cells (HSCs), 116, 116f gene frequency and, 88–89, 89b
X–autosome translocations and, 506 in bone marrow stem cells, 695, 695t loss-of-function mutations, 433
see also Loss of heterozygosity (LOH) Hematoxylin and eosin, staining, 642 population genetics and, 589
Haplotype blocks, 444, 516 Hemidesmosomes, 98f, 99 single nucleotide polymorphisms, 407
affected sib pair frequencies, 474, 474f Hemizygous/hemizygote, Heterozygous/heterozygote,
HapMap project see HapMap project definition, 62 allele-specific fitness and, 307b
identity by descent vs. identity by state, 474, male X chromosome and, 64 ‘carriers’ see Recessive traits/recessiveness
474f, 482 Hemochromatosis, co-dominant phenotype, 65
mapping using testing, clinical validity and, 608 co-dominant selection, 307b
association studies see Association studies tratment of, 678 compound heterozygotes, 435, 574
autozygosity analysis, 458, 458f Hemophilia, definition, 62
cystic fibrosis, 459, 459t AIDS induced in some patients, 681 dominant-negative effects, 431–432, 431f
defining minimal candidate region and, limited success in gene therapy, 710b manifesting heterozygotes, 67
500, 500f Hemoglobin, 317 mosaicism and phenotype, 67
lactose tolerance, 524–525, 524f Constant Spring, 422, 422f selective advantage and see Heterozygote
NBS, 459–460, 460f embryonic/fetal, 318 advantage
758 Index
Hierarchical clustering, microarray data, 247b, 247f apoptosis, PCD and, 112–113 developmental positional identity and, 142–144,
HIF1a, tumor cells and, 565 gene therapy strategies to make lymphocytes 142f, 334
High-performance liquid chromatography, denaturing resistant to infection, 714-715 knock-out effects, 143
(dHPLC), 574, 574t, 575f gene therapy vector, see under Lentiviruses morphogen regulation and, 143, 143f
High-throughput technologies, HLA (human leukocyte antigen) complex, 126b, 126f neural pattern formation, 156
DNA sequencing see Massively parallel DNA disease associations evolutionary conservation, 143, 161, 334, 334f
sequencing ankylosing spondylitis, 635 gene clusters,
gene expression analysis, 240 rheumatoid arthritis and, 477 evolution of, 334, 334f
invertebrate models and, 301 type 1 diabetes, 635 human gene family, 270t
see also specific methods gene cluster organization, 126b, 126f mouse gene clusters, 142, 142f, 143
Hirschsprung disease, 434 class I gene family, 270t, 273f multiple regulatory ncRNAs within, 288
animal models and, 511, 511f class III gene cluster, 268f whole genome duplications and, 264
parametric linkage analysis and, 474 gene density and overlapping genes, 266, 268f haploinsufficiency and, 426
positional cloning, 502 genetic marker, 449t protein product, 143
segregation analysis, 471, 471t HLAB*27, ankylosing spondylitis and, 635 HOXD13 protein, 423
Histidine, pseudogenes, 271, 273f HP1 (heterochromatin protein 1), 361
His6-affinity tagging, 179, 183f see also Major histocompatibility complex (MHC) HpaII, methylation assay, 577, 577f
protein crosslinking and, 403 HMTs (histone methyltransferases), 353 HPGRT, genotype–phenotype correlations, 436, 436t
structure, 5f HNPCC (hereditary non-polyposis colon cancer) see HPLC, denaturing (dHPLC), 574, 574t, 575f
zinc fingers and, 104b Lynch syndrome HRAS protein, 543f
Histone(s), 353 Holliday junctions, 443b, 443f Hsp90, 105f, 647, 647f
charge, 353 Holoprosencephaly, 160 HUGO gene nomenclature, 62b
core histones, 37, 37f, 38, 353 Homeobox, see also Genes (list)
DNA interactions definition, 143 Human accelerated regions (HARs), 341
ATP-dependent remodeling complexes, see also Homeotic genes; Hox genes) Human artificial chromosomes (HACs), 685f
356, 357t Homeodomain, definition, 143 Human chorionic gonadotropin, trisomy 21 and, 630
histone modification and, 355–356 Homeotic gene complex, HOM-C, 142–143, 142f Human cells, a classification of adult, 95t
transcriptional start sites and, 362 see Hox genes Human chromosomes,
gene family, 269, 270t Homeotic transformation, 142 abnormalities of, see Chromosome abnormalities,
H1 and linker DNA binding, 38, 353 definition, 143 chromosome banding, see Chromosome banding
lack of mRNA polyadenylation, 18, 282 Hominids, chiasmata, number of in meiosis, 35
post-translational modification see Histone definition, 327b DNA amounts in each, 261t
modification hominid-specific genes, 338, 338t euchromatin content, 261t
U7 snRNAs and, 282 vertebrate phylogeny and, 331f gene density variation, 233f, 263
variants/specialization, 38, 356 Hominins, groups, 45t
centromeric (CenH3/CENP-A), 38, 40, 356 definition, 327b heterochromatin content, 261t
H2A.Z, 356 vertebrate phylogeny and, 331f distribution of non-centromeric, see inside
H3.3, 356 Hominoids, back cover
macro-H2A, 356 definition, 327b ideogram, see inside back cover
see also specific types hominoid-specific genes, 338, 338t karyotype, 46f
Histone acetyltransferases (HATs), 353 vertebrate phylogeny and, 331f nomenclature, 45b, 51t
Histone code, 354–356, 354t Homoduplex, pseudoautosomal regions, 35, 67, 321
Histone deacetylases (HDACs), 353 definition, 192b structure of, 322-323, 322f
Histone demethylases, 353 stability, 195 sex chromosomes,
Histone H1, 38, 353 Homogametic sex chromosomes, 321 gametologs on X and Y, 322f
Histone H2A, 37, 353 Homologous chromosomes (homologs), 34 X-inactivation, see X-inactivation
Histone H2AX, 38 independent assortment, 34,34f Y chromosome structure, 324-326, 324f
Histone H2A.Z, 356, 362 unusual pairing of X and Y, 35 gene conversion in, 325-326
Histone H2B, 37, 353 Homologous genes, 306 genes, 324, 325, 325b
Histone H3, 37, 353 ‘human-defining’ DNA and, 339 see also Chromosomes; X chromosomes;
centromeric variants, 40 orthologs see Orthologous genes Y chromosomes
H3.3, 356, 362 paralogs see Paralogous genes Human cloning, see Human reproductive cloning;
H3K4 methylation and promoters, 355–356 positional cloning and candidate genes, 499f, Therapeutic cloning
specialized centromeric (CenH3/CENP-A), 38, 502–503, 520 Human development,
40, 41f sex chromosomes, 326 animal models of, 135–136, 135t
Histone H3.3, 356, 362 see also Pseudoautosomal regions disease and, 160
Histone H3K4, Homologous recombination (HR), early (fertilization to gastrulation), 146–153, 148f
HP1 binding and, 361 dsDNA break repair and, 414bb, 414f, 655f, 656, cleavage, 147–148, 148f, 149f
methylation and gene expression, 355–356, 367 708, 709f extraembryonic structures, 137f, 148–151
Histone H3K9, X-inactivation and, 367 nonallelic (NAHR), 426 fertilization, 146–147
Histone H3K27, 365 frequency in mammalian cells, 649 gastrulation see Gastrulation
Histone H3K36, 361 in gene targeting, 649 implantation, 136, 150–151, 151f
Histone H4, 37, 353 Homology searches, physiological context, 149f
Histone kinases, 353 functional genomics, 385–386 evolutionary conservation and, 160–161
Histone methyltransferases (HMTs), 353 sequence prediction/annotation, 236, 237b, 237f, morphogenesis see Morphogenesis
In RNA interference pathways, 284 237t, 309–310 neural tissue see Neural development
Histone modification, 37, 353–356 see also Bioinformatics pattern formation see Pattern formation
acetylation, 353, 353f Homo neanderthalensis, ‘human-defining’ DNA and, (development)
associations between types, 356 339 postembryonic, 134
enzymes, 353–354 Homoplasmy, 68, 437 see also specific processes/mechanisms
gene regulation and ‘histone code,’ 354–356, Homozygous/homozygote, Human endogenous retroviral sequence (HERV), 291
354f, 354t alleles identical by descent see Autozygosity Human genome, 224, 267t
alternative transcription start sites and, 364, definition, 62 base composition, 7, 261, 263
364f see also Recessive traits/recessiveness CpG islands see CpG islands
DNA methylation and, 361, 361f Hormone receptors, see also Base composition
gene silencing and, 356 GPCRs, gain-of-function mutations and, 433 chromosomes, see Human chromosomes
multiple interactions and, 356 nuclear receptor family, 103, 104b, 105, 105f, coding DNA, fraction of, 256, 256f
nonhistone protein interactions, 354 352–353 comparative genomics, 339–341, 342, 502
promoter regions, 355–356 response elements, 352 chimpanzee vs. see under Chimpanzee
methylation, 353, 353f, 355–356 Hot-start PCR, 186b mouse vs. see under Mouse (Mus musculus)
nomenclature, 353, 353b Housekeeping gene(s), 12 positional cloning and, 502–503
nonhistone protein binding domains and, 354 HOTAIR RNA, trans-acting regulator, 288t see also Comparative genomics
phosphorylation, 353, 353f Hox genes components of, 256f
Histone phosphatases, 353–354 anteroposterior axis and, 142, 142f CNV, see Copy number variation (CNV)
HIV (human immunodeficiency virus), 112-113, DNA content, 261t
Index 759
euchromatin content, 260t, 261 STS maps, 230–231 Hydroxylation, of polypeptides, 23t
evolution see Human genome evolution types uses, 230, 230t Hypertension, association studies, 489, 490f
gene expression see Gene expression YAC contig map, 230, 231 Hypervariable regions, of Igs, 123f, 124
gene numbers, 238–239, 261–262, 267t, 293, 382 private companies vs., 231, 233 Hypoblast, 136, 151
comparative biology and, 261–262 sequencing phase, 231–234, 463 Hypohidrotic ectodermal dysplasia, 67
current vs. early estimates, 238 composite sequence vs. individual genomes, Hypomorph, 663
difficulties in estimating, 239t, 261 233 Hypoxanthine, adenine deamination, 412
protein-coding genes, 238, 261, 275, draft sequence, 233 Hypoxanthine guanine phosphoribosyl transferase,
310–311, 382 EST mapping and, 232 genotype–phenotype correlations,
RNA genes, 238–239, 256, 262, 277, 279, 382 euchromatin and, 232, 260 436, 436t
gene and genome statistics, 267t gaps, 233–234 Hypoxia, tumor cells and, 565
heterochromatin content, 261t individual chromosomes, 233
highly conserved sequences, fraction, 256, 256f technological advances and, 231
highly repetitive DNA, 267, 289-293 strategies, 231f
I
heterochromatin, 261t, 269t, 289-290, 289t, timeline, 228, 231, 232b I-cell disease, 514
transposon-based repeats, 267t, 291-293, see also specific methods ICAM, 126f
291f, 292f Human Genome Variation Society, 407b ICAM-1 receptor, humanized mice, 670
imprinted regions, location of, 367 Humanized alleles, engineered into mice, 340, 663, ICF syndrome, 361
mapping project see Human Genome Project 664b ICM see Inner cell mass (ICM)
(HGP) Humanized antibodies, 683, 684f ICSI (intracytoplasmic sperm injection), 643f
mitochondrial see Mitochondrial DNA Humanized mice, Identity by descent (IBD), 86, 457, 474, 474f, 482
(mitochondrial genome) RMGR recombination for making, 670 Identity by state (IBS), 458, 458f, 474, 474f, 482
organization, 256-263, 256f CF missense mutations, 664b Ig-CAMs, 98
pseudogenes, 267t, 269f, 271, 272b, 273f, 274f, human cytochrome p450 Igs see Immunoglobulin(s)
339 human IgH, IgK , 683, 685f Illumina BeadArray™ microarrays, 209, 246
segmentally duplicated DNA, 264f, 267t melanocytes in outer skin layers, 670 Illumina GoldenGate® system, 584–585, 585f
sequencing, see Human Genome Project (HGP) receptor for common cold virus, 670 Immortal cells, 555
transcribed fraction, 311–312, 362–364, 364f Human leukocyte antigen complex see HLA (human senescent cells vs., 111
see also ENCODE project leukocyte antigen) complex Immune system,
variations see Genetic variation Human Microbiome Project, 300 adaptive immune system see Adaptive immune
Human genome evolution, Human reproductive cloning, ethics, 692b system
chimpanzee vs. see under Chimpanzee Human noncoding RNA (ncRNA), classes of, 278tt barrier defenses, 121
chromosomes see under Chromosome(s) see individual RNA types cells, 95t, 119–131
fastest evolving genome sequences, 341 Humoral immunity, 123–124 lymphocytes see Lymphocytes
gene family expansion, 337–339 see also B cells; Immunoglobulin(s) non-lymphocyte cells, classes of, 121t
see also Gene duplication; Gene families Huntington disease (HD), 423, 425f see also specific cell types
gene loss, 337–339 age of onset 73-74, 74f cell signaling and cytokines, 121, 122
human-defining DNA identification, 339–341 allelic homogeneity and, 429 disease and, 120
cognitive ability and language, 340, 340f, animal modeling, 665-666, 665t, 666f identification of self and, 120, 122, 127
340t anticipation and, 75 autoimmunity and, 128b
conserved noncoding regions, 340t, 341 gain-of-function and, 423, 428 innate immune system, 121–122, 122f
nonrandom base composition, 341 gene mapping/identification, 442 programmed cell death and, 112–113, 112t, 122
positive selection and, 339, 340t, 341 positional cloning approach, 499 tissue damage and, 120
human-specific miRNA genes, 338 genetic screens to identify modifiers of tissue rejection and, 120
lactose tolerance and, 523–525 polyglutamine toxicity, 665t see also specific components
lineage-specific changes, 337–339, 338t, 339t, genetic testing, 588, 589f, 622, 623t Immune tolerance,
342 clinical utility vs. clinical validity, 608, 631 autoimmunity and, 128b
gene inactivation/modification, 339, 339t gene tracking and, 592 central vs. peripheral, 128b
gene regulation, 338 mutations, 580t major histocompatibility complex and, 127
human-specific genes, rarity, 338 PCD role, 113 Immunoblotting (western blotting), 242, 244f
mouse vs. see under Mouse (Mus musculus) protein aggregation, 425b Immunocytochemistry, 242, 244f
Out of Africa model and, 484 tissue-specificity of gene defect, 514 Immunofluorescence microscopy, 244
susceptibility genes, common disease–common see also Programmed cell death (PCD) Immunogens, 244b
variant model, 491, 493, 493t, Hyaluronic acid, 99, 100 Immunoglobulin(s), 123–124
494, 516 Hybrid cells, affinity maturation, 131
see also Comparative genomics; Molecular complementation groups, 416, 416f allelic exclusion, 13
phylogenetics marker framework maps, 226b, 231, 232 B cell, secreted (antibodies), 123
Human Genome Organization (HUGO), microcell-mediated chromosome transfer, 523, B cell surface, 123, 125f
gene nomenclature, 62b 524f classes, 123, 124t
see also Genes (list) X-inactivation studies, 367, 367f complement activation and, 122, 122f, 124
Human Genome Project (HGP), 228, 257 Hybrid proteins, see Fusion proteins diversity generation, 130
applications, 298 Hybridization, class/isotope switching, 130–131, 130f
bioinformatics definition, 192 somatic hypermutation, 131
databases/browsers, 235 nucleic acids see Nucleic acid hybridization VDJ recombination, 128–130, 129f
data interrogation/annotation programs, Hybridization assay, definition, 192b see also Immunoglobulin genes;
235–238 Hybridization stringency, 195–196 Recombination
see also Bioinformatics; Databases definition, 192b expression levels, 545
free availability of data, 233 methods for increasing, 195 functional roles, 124
functionally important sequences, 234 mismatches and, 195–196, 196f activating effector cells, 124, 125f
see also Functional genomics Hybridomas, 244b blocking pathogen entry, 124, 125f
gene maps, 232, 232b Hybrid populations, association studies and, 488 neutralizing toxins, 124, 125f
gene number estimates, 238–239, 239t, 261 Hydatidiform moles, 57 fusion proteins and, 244b
genome centers, 228, 233 Hydralazine, genetic variations in metabolism, 614 genes see Immunoglobulin genes
goals, 228, 230, 234 Hydrogen bond(s), 6, 6b, 6t isotypes, roles of different, 124t
low resolution genetic maps, 228–230 breaking see Denature/denaturation light chain exclusion, 131
microsatellite analysis, 229, 229f, 408–409 definition, 6 monoclonal antibodies, 244b
RFLP analysis, 228–229, 229f factors affecting, 194–195 monospecificity generation, 131
SNP analysis, 230 formation, 194–195 polyclonal antibodies, 244b
see also Genetic maps/mapping see also Anneal/annealing production/raising, 242, 244b
physical maps, 230–231, 442 intermolecular vs. intramolecular, 6b protein tracking methods, 242, 244b
EST maps, 232 nucleic acid structure, 6, 6b, 7, 7f, 8, 193 structure, 123, 123f
human-rodent hybrid cell mapping, 231 see also Base pairing constant region, 123f, 124
radiation hybrids, 232 protein structure, 6, 6b, 25 heavy chain, 123, 123f, 128–130, 129f
strategies/approaches, 231f Hydrophobic forces, 6t hypervariable regions (CDRs), 123f, 124
760 Index
immunoglobulin domains, 124 gene identification, 528–531 checkpoints and, 109, 109f, 550, 550f
light chain, 123f shared components, 528, 529t chromosome location, 38–39, 38f, 361
variable regions, 123f, 124, 128 ulcerative colitis, 528 chromosome structure, 30, 38
superfamily, see Immunoglobulin superfamily Ingression (morphogenesis), definition, 145t gene expression and, 36
see also specific types during gastrulation, 152, 152f FISH, 47, 48f
Immunoglobulin A (IgA), 124t Inheritance patterns, 61–90 G1 phase see G1 phase of cell cycle
Immunoglobulin D (IgD), 124t genes as trait determinants, 62 G2 phase, 30, 31f, 108, 109f, 550, 550f
Immunoglobulin domains, 124 heritability vs., 82 mitogen effects, 110
Immunoglobulin E (IgE), 124t loss-of-function vs. gain-of-function mutations, nucleus structure, 361
effector cell activation, 124 428–429 S phase, 30, 31f, 32, 108, 109f
Immunoglobulin G (IgG), 124t Mendel’s work, 62 Interstitial dendritic cells, 121t
complement activation, 122, 122f, 124 monogenic vs. multifactorial, 62–63 Intervention programs, 635
Immunoglobulin genes, 124, 128–130 monogenic (Mendelian) see Monogenic Intestinal crypts and villi, 117, 117f
class (isotype) switching, 130–131, 130f (Mendelian) inheritance Intestinal lactase, persistence, 523–525
junctional diversity and, 130 multifactorial see Multifactorial inheritance Intrabodies, 393f, 684, 684, 684f, 715
monoallelic expression, 131, 371 as spectrum of genetic determination, 63, 63f, 73, Intracellular signaling,
regulatory elements, 545 73f, 494, 494f, 622 second messengers see Second messengers
oncogene translocation and activation, see also Pedigree(s); specific modes signal transduction pathways, 103, 103t, 105,
545–546, 545f Inhibin-A, trisomy 21 and, 630 105f, 106f
V,D,J gene segments, 128, 129f Initiation codons, 21 see also specific molecules/pathways
recombination, 128–130, 129f frameshifts and, 420 Intrauterine environment, twin studies and, 470
somatic hypermutation and, 131 Initiator procaspases, 113 Intrinsic pathway, apoptosis, 113–114, 114f
VDJ and VJ exons, 130 Innate immune system, 121–122 Intron(s), 16, 276b
Immunoglobulin M (IgM), 124t functional role, 122 autocatalytic (self-splicing), 314
complement activation, 122, 122f, 124 see also specific components evolution, 314
Immunoglobulin superfamily, 124, 125f, 271 Inner cell mass (ICM), of blastocyst, 136, 137f lack of in mitochondrial genes, 258, 258f
Ig-CAMs, 98 embryonic stem cells and, 118–119, 119f lack of in many nuclear genes, 265t, 274t
MHC antigens as, 126b epiblast formation and 151 regulatory elements
see also specific molecules extraembryonic structures and, 149, 149b alternative promoters and, 372
Immunohistochemistry, 242, 244f formation, 149 chromosome abnormalities and, 508
Immunological memory, 123 hypoblast formation and, 151 removal by splicing, 16–17, 16f
see also specific cells/mechanisms pluripotency, 118, 136 repeat expansion diseases, 423
Immunologically privileged sites, 689 polarity and axis specification, 140b, 141f repetitive sequences, 266
Immunoprecipitation, gene identification, 509 Innocence Project, DNA evidence and, 599 size variations, 265t, 266, 267t, 372
Immunotherapy, 684, 698-699, 698f Inosine, production from adenine, 374, 375f spliceosomal, 314
Implantation, 136, 150–151, 151f base-pairing rules, 280b see also Splicing
Imprinting, 75–76, 75f, 76, 365 Inositol 1,4,5-triphosphate (IP3), as second messenger, Invagination, 144
common features of imprinted regions, 369 107, 107f neural tube formation see Neurulation
conflict theory, 370 Inositol phospholipids, as second messengers, 107, Inverse PCR, 186b, 186f
functions, 370 107f Inversion, chromosomes see Chromosome inversion
human genes, 367, 368t Insertions, Invertebrate(s),
disease role, 368–370, 368f, 369f, 428 copy number variation and, 409 chordates and vertebrate evolution, 313
imprinting control center, 369 epigenetic phenomena and, 370 lack of DNA methylation, 360
see also Angelman syndrome; Prader–Willi exon shuffling and, 314, 315f lineage-specific regulatory elements, 335, 335f
syndrome; specific disorders indels, 407, 407f, 411, 429f, 430f, 491, 589f, 615, model organisms, 301, 302b, 303t
list of human imprinted regions, 367 615f animal models of development, 112, 135,
mechanism, 370 long-range control of gene expression and, 352f 135t, 142, 160, 161f
DNA methylation and, 358, 370 nomenclature/terminology, 407b disadvantages, 301
enhancer–promoter interactions and, pair end mapping, 410, 410f, 411f evolutionary conservation and, 160, 161f,
369–370, 370f RNA editing, 374 301
imprinting control center, 369 tandem gene duplication and, 264 high-throughput and, 301
noncoding RNA role, 287, 368 transposons and insertion mutation, 292, 336 sex determination, 321t
see also DNA methylation In situ hybridization (ISH), 205–206, 241 see also individual groups/species
methylation assays, 577 chromosomes, 205–206 Inverted repeats, 652f, 660f, 703f
parental origin and, 367 definition, 192b In vitro fertilization (IVF), obtaining embryonic stem
Prader–Willi vs. Angelman syndromes, FISH see Fluorescence in situ hybridization (FISH) cells and, 119
368–369 tissue sections, 206, 206f, 384 In vitro mutagenesis, 176f
uniparental disomy and, 57 whole-mount, 241, 241f Ion channels, dominant-negative mutations and, 432
tissue- or stage-specific, 368t, 369 Instability see Genomic instability Ionic bond(s), 6t
Imprinting control center (IC), 369 Insulators, 352, 361, 508 Ionizing radiation, DNA damage, 411
IN0802 family, 357t imprinting and, 370 iPS cells, see Induced pluripotent stem (iPS) cells
Inborn errors of metabolism, 522 Insulin, IQ, heritability of (model), 82
different classes of treatment, 678 disulfide bridges, 27f Irinotecan, genetic variation and drug metabolism,
see also specific disorders post-translational cleavage reactions, 25, 25f, 268 614
Inbreeding, 85–86, 87b recombinant human, 681, 682t Iron metabolism, control of gene expression, 375–376,
autozygosity and, 457–459, 458f Integrins, 98 376f
coefficient of, 86 adhesion junctions and, 99 Iron response element (IRE), 376, 376f
genetic risks of, 87b Interallelic complementation, 72 ISCN chromosome nomenclature, 45b
Hardy–Weinberg distribution and, 85–86 Intercalation (morphogenesis), 145t ISH see In situ hybridization (ISH)
mouse gene mapping, 503b Intercellular signaling, 102–103, 102f Isochromosomes, 54
pedigree analysis and, 76, 76f see also specific mechanisms Isodisomy, 58
Incontinentia pigmenti, male lethality and inheritance Interference in recombination, 445 Isoelectric focusing, 249, 250f
mode, 76, 76f Interferons, recombinant human, 682t Isoflurane, malignant hyperthermia and, 616
Indels see Deletion/insertion polymorphisms (DIPs; Intergenic DNA, transcription of, 363 Isoleucine, structure, 5f
Indels) interleukin-2 (IL-2), 126b Isomerism, right, 140f
Indirect repeats, 264 Interlocus sequence exchange, 307b Isomerism, left, 140f
Induced pluripotent stem (iPS) cells, 690bb Intermediate filaments, 94b Isoniazid, genetic variations in metabolism, 613–614,
functionally equivalent to ES cells, 690bb Intermediate mesoderm, 153, 154f 613f
disease-specific for disease modeling and drug Internal promoters, 15, 16f Isotope-coded affinity tags (ICATs), 253
screening, 694f International Cancer Genome Consortium, 559 Isw2, 356
made from diverse model organisms, 672 International Human Genome Sequencing ISWI family, 357t
patient-specific for autologous cell therapy, 694f Consortium, 228 Iterative pyrosequencing see Pyrosequencing
Induction, cell fate and, 137–138, 138f International System for Human Cytogenetic
Infertility, male, 44, 67 Nomenclature (ISCN), 45b
Inflammatory bowel disease (IBD), Internet, susceptibility testing and, 628
J
Crohn disease see Crohn disease Interphase, 30, 32f, 108 JAK–STAT pathway, 105, 106f
Index 761
Jansen metaphyseal chondrodystrophy, 433 Lectin pathway, complement, 122 multipoint mapping and, 455
Japanese killifish (medaka), as model organism, 302, Left–right axis, phase-known vs. phase-unknown
304b polarity and specification in mammals, 141b, 141f heterozygotes, 451, 451f
Jervell and Lange-Nielsen syndrome, 436 symmetry/asymmetry, 139, 140f lod score see Lod score
J gene segments, see Immunoglobulin genes internal asymmetry, 140f multipoint mapping, 455–457
Joint probability, 594b, 594f Legal issues, advantages, 455, 456, 461
Joubert syndrome, zebrafish model, 671t forensic DNA profiling, 600–601, 600b, 602 CEPH-based framework maps, 455–456
Junk DNA, 276b population screening and, 633 definition, 455
Juxtacrine signaling, 102, 102f Lemba, association studies and, 488 disease loci mapping using, 456–457, 457f
Fas signaling as example, 114f Lentivirus, 701-702, 701t, 702f nonparametric, 475
gene therapy vectors, 701-702, 701t, 702f three-point crosses, 455, 456t
safety profile, 702 non-Mendelian traits, 468, 473–476, 494, 498
K first gene therapy success, 710b association studies vs., 484, 486–487, 487b,
Ka/Ks ratios, 308b genome, 702f 487t, 492
Karyograms, 46, 46f Leptin, recombinant, 682t lack of success, 477
Karyotype(s), 45 Leptotene, 34, 34f nonparametric analysis see below
evolutionary divergence, 319, 319f Leri–Weill dyschondrosteosis, pseudoautosomal parametric analysis, 473–474
Karyotyping, inheritance, 67 problems/difficulties, 463, 473–474,
cancer difficulty, 553 Lesch–Nyhan syndrome, 71 475–476, 515
fetal aneuploidy, QF-PCR vs., 588, 602 Leucine, structure, 5f schizophrenia and, 474, 476, 477t, 478f
molecular see Chromosome painting Leucine zipper motif, 104b, 104f, 350 significance thresholds and, 475–476, 476f
see also Cytogenetics Leukemia, nonparametric, 474–475, 494
KEGG (Kyoto Encyclopedia of Genes and Genomes), consequence of gene therapy, 710b, 711 affected sib pair analysis, 474–475, 474f,
382 definition, 538 486–487, 487b, 487t, 521
Kennedy disease (spinobulbar muscular atrophy), bone marrow transplantation treatment, 695, chance role, 476, 476f
clinical heterogeneity and, 72 696t identity by descent vs. identity by state,
Kinase(s), tumor-specific chromosome rearrangements, 474, 474f
cascades and cell signaling pathways, 105, 106f 544, 545t, 620 lod, 475
kinomics, 384 BCR–ABL1 gene fusion, 48f, 543–544, 544f significance, 475–476, 476f
mitogen actions, 110, 110f see also specific disorders overview/principles, 448–451
tumor expression, 559 Lifestyle factors, parametric, 464
see also specific enzymes/types genetic factors vs., 626, 626f, 635 lod scores see Lod score
Kinetochore, 39–40, 39b, 39f, 550 ignoring of advice on, 635 Mendelian traits/disorders, 448–457
Kinetochore fibers, 39b, 39f Li–Fraumeni syndrome, p53 mutations, 552, 552f non-Mendelian traits/disorders, 473–474
Klenow fragment, random primed DNA labeling, 198, tumors, gene therapy for, 714 positional cloning role, 498, 499f, 534
199f Ligand-binding domain, nuclear receptor superfamily, threshold, 453, 453b
Klf-4, 693 105, 105f two-point mapping, 451–455
Klinefelter syndrome, X-inactivation and, 66 Light chain (immunoglobulin), 123f autosomal dominant migraine, 454–455,
Knockdown, see Gene knockdown exclusion, B cell monospecificity and, 131 454t
Knockouts, see Gene knockout(s) see also Immunoglobulin genes genome-wide threshold, 455
Knudson’s two-hit hypothesis, 546–547, 546f, 567 Likelihood ratio, 608b linkage vs. exclusion criteria, 453–455, 454f
Kosambi’s function, 445 Limb development, over-interpretation, 455
Kozak consensus sequence, 21 conserved tissue-specific enhancers, 313f see also Genetic maps/mapping; Pedigree(s)
Kringle domains, 266, 314f digits see Digit formation Linkage disequilibrium (LD), 459, 479–481, 515, 622
Kuppfer cells, 121t Hox genes, 143, 143f ancestral segment size and, 480–481, 481f
Kuru, in New Guinea, 469 rapid cell proliferation, 145 factors affecting, 483
Kyoto Encyclopedia of Genes and Genomes (KEGG), SHH and, 351, 352f, 508 graphical representation, 483b, 483f
382 whole mount ISH, 241f human genome analysis see HapMap project
Limbal stem cell therapy, 696t measures, 481, 482b
LIM homeodomain family, neuronal differentiation recombination fraction and, 480
L and, 157f, 158 selection and, 480
Labeling (nucleic acids) see Nucleic acid labeling Lineage-specific elements, see under Species study methods, 481
Lactose tolerance, differences, at molecular level see also Association studies
agriculture/pastoralist populations, 523–524, 524f Lineage switching, 690bb Linker oligonucleotides, see Oligonucleotide linkers
convergent evolution, 524 LINE-1 elements, 291–292, 292f Linker-primed PCR, 186b
gene identification, 523–525 active transposition and, 292 Lipid(s),
haplotype size and, 524–525, 524f exon shuffling and, 315f attachment to polypeptides, 23t, 24
racial/ethnic differences, 524, 524f reverse transcription role, 2, 292 lipidomics, 384
lacZ complementation assay, 170–171, 170f, 172, 176 LINE-2 elements, 290 Lipoplexes, 705f
lacZ reporter transgenes, 652 LINE-3 elements, 290 post-translational modification, 24
Lagging strand, 9, 9f, 10f, 42 LINE elements, 290, 291, 291f, 292f, 408 Lipoprotein Lp gene, 266, 314f, 315
end-replication problem, 41, 42–43 evolutionary history, 291 Liposarcoma, 545t
microsatellite instability and, 423 SINE homology, 292 Liposomes, cationic, 704, 705f
Lambda see Phage l see also Transposons; specific elements Liver, renewal/replenishment, 115
Lamellipodia, 94b Linkage analysis, 225 Livestock animals, as model organisms, 304, 305t
Langer–Giedon syndrome (trichorhinophalangeal allelic heterogeneity and, 492 Locus (loci),
syndrome type II), 427, 427t computer analysis, 453, 456, 462, 464, 475 definition, 62
Langerhans cells, 121t genetic risk calculation and, 595, 595f genetic heterogeneity see Locus heterogeneity
Language genes, 'human-defining' DNA and, 340, errors/problems, 460–463, 500 linkage and, 479
340f, 340t computational difficulties, 462 see also Genetic linkage; Linkage analysis
Language and reading disorders, 340 genotyping/diagnostic errors, 461, 461f, 500 Locus control regions (LCRs), 351–352
Laser capture microdissection, 240, 240f locus heterogeneity, 462 globulin gene clusters, 352f
Lateral inhibition, neuronal differentiation and, 157, exclusion mapping, 454 Locus heterogeneity,
157f fine-mapping and, 457–560 candidate genes and, 514–515
Laterality see Left–right axis framework maps and, 449, 450 complementation tests, 71, 72, 72t
Lateral plate mesoderm, 153, 154f CEPH families, 455–456, 462 definition, 71, 514
LCN (low copy number) technique, 599 errors, 461 genetic testing and problem of, 590–591
LD see Linkage disequilibrium (LD) physical–genetic map correlation, 456 Hardy–Weinberg distribution and, 87
Leading strand, 9, 9f, 10f, 42 see also Framework maps hearing loss, 71, 71f, 590–591
Leaky mutation, 658, 663, 664b, 664t genetic markers see Genetic markers interallelic complementation, 72
Learning difficulties see Mental retardation genetic model requirement, 463 linkage analysis and, 462, 500
Leber congenital amaurosis, informative vs. noninformative meioses, 449, nonsyndromic mental retardation, 514–515,
successful gene therapy for, 710b 450f, 450t 531–532, 590, 591
Leber hereditary optic neuropathy, clinical ambiguity and, 451–452, 451f susceptibility genes, 492
heterogeneity, 437 defining minimal candidate region, 500 Lod-1 rule, 454
762 Index
Lod score, GST activity and, 614 cleavage features, 148, 148f
calculation, 452b, 452f Luteinizing hormone receptor, gain-of-function difficult to study, 301
genetic risk and, 595, 595f mutations, 433 evolutionary conservation and, 160–161
complex pedigrees and, 452–453 Lymph nodes, 120 extraembryonic structures, 137f, 148–151
computer algorithms, 453, 456, 462, 464, 475 Lymphocytes, 93, 120, 120t fertilization, 146–147
difficulties with standard analysis, 460–463 antigen presentation to, 120–121 gastrulation see Gastrulation
apparent double recombinants, 461, 461f B lymphocytes (B cells) see B cells germ-cells and, 158–159
computing errors/problems, 462 cell surface receptors, 123 human see Human development
general difficulties, 460 classes of, 120t implantation, 136, 150–151, 151f
genotyping errors, 461 clonal selection, 123 invertebrates vs., 135, 160, 161f
limited resolution, 462–463 DNA rearrangements, 93 neural tissue see Neural development
locus heterogeneity, 462 self-reactive, 128b sex determination see Sex determination
misdiagnosis and, 461 T lymphocytes (T cells) see T cells see also specific processes/mechanisms
non-Mendelian disorders and, 473–474 see also specific types disease models, 641t
genetic model requirement, 463 Lymphoid organs, 120–121 eutherian, definition, 327b
linkage/exclusion criteria, 453–455 see also specific organs evolution, 323f
Bayesian calculations, 453, 453b Lymphoma, genome evolution
confidence intervals, 454, 454f bone marrow transplantation treatment, 695, chromosome rearrangements, 319–320, 342
exclusion (z less than -2), 454 696t homologous genes, 502
linkage (z greater than 3), 454 chromosome rearrangements human see Human genome evolution
lod score curves MYC activation and, 544–545, 545f see also Genome evolution
multipoint maps, 456–457, 457f tumor-specific, 544, 545t metatherian (marsupials), definition, 327b
two-point maps, 454, 454f definition, 538 model organisms, 303–305, 304b, 305t
non-Mendelian disorders and see also specific disorders advantages, 303
chance effects, 476, 476f Lynch syndrome, developmental, 135–136, 135t, 301–303
nonparametric lod score, 475 microsatellite instability, 557–558 ethical concerns, 303–304
significance and, 475–476, 476t pathways approach to, 564 humans as, 305–306
recombination fraction relationship, 453 Lyonization see X-inactivation livestock animals, 304
significance thresholds, 453, 453b, 475, 476t Lysine, structure, 5f non-human primates, 305, 305t
two-point scores for familial migraine family, Lysosomal disorders, tissue-specificity of gene defect, rodents, 303, 304–305
454–455, 454t 514 prototherian (monotremes), definition, 327b
Logarithm of the odds see Lod score Lysosomes, 94b, 94f regenerative capacity limited, 690b
Logistic regression, association studies, 517, 609b vertebrate phylogeny and, 331f
Long interspersed nuclear elements see LINE elements see also individual groups/species
Long regulatory RNAs, 279t, 287–288, 288t
M Mammalian cell display, 683
Long terminal repeats (LTR) Macaque, as model organism, 305t MammaPrint® system, 620
retrovirus, 702f Macro-H2A, 356 Manhattan plot, 531f
transposons, 291, 291f Macrophages, 121t, 122, 126b Manifesting heterozygotes, 67
Loose connective tissue, 101 high class II MHC expression, 126b MAP kinase, mitogens and, 110, 110f
Loss-of-function mutations, 389, 390, 390t motile, originating from monocytes, 121t Mapping functions, 445–446
allelic heterogeneity and, 429, 429f tissue, different classes of, 121t Marfan syndrome, fibrillin as candidate gene, 502
allelic homogeneity and, 433 Macular degeneration, age-related, Marker chromosome, 51t
cancer and aptamer treatment, 686 Marker genes, vector construction, 170–171, 170f, 172
mismatch repair genes in Lynch syndrome, siRNA therapy, 707 Marker–marker mapping, 456
558, 558f, 558t Magnetic resonance imaging (MRI) Markers, genetic see Genetic markers
TSGs see Tumor suppressor genes whole animal imaging, 642 Marmoset, as model organism, 305t
chromosome breaks and, 351, 504 Maintenance methyltransferases, 365, 365f germline transmission, 672
definition, 428 Major histocompatibility complex (MHC), 125, 126b, phylogenetic relationships, 331f
founder effects, 433 126f Marsupials (metatherian mammals),
gain-of-function mutations in same gene vs., class I, 125f, 126b, 127 definition, 327b
433–434, 434t, 435f almost ubiquitous expression, 126b vertebrate phylogeny and, 331f
genotype–phenotype correlation, residual class II, 125f, 126b, 127 Mass analyzers, 251b
function and, 435–436, 435f, 436t high expression in a few cell types, 126b Massively parallel signature sequencing (MPSS), 248
heterozygote advantage and, 433 class III, 126b Mass spectrometer, 251b
long-range effects of chromosome abnormalities, genetic variation, 126b Mass spectrometry (MS), 251b
508 human see HLA (human leukocyte antigen) affinity purification and (AP–MS), 396t, 397–398,
mechanisms of action, 429 complex 398f
mode of inheritance and, 428–429 MHC restriction, 127 genotyping and genetic testing, 583
dominant-negative mutations and, 430–431, proteins, 126b global protein expression analysis, 250–252
430f, 436 self-recognition and immune tolerance, 127, 128b annotation, 251–252, 252f
haploinsufficiency and dominance, 429–430, tissue typing/graft rejection and, 126b de novo sequencing, 252
435f, 436 Major (GU-AG) spliceosome, 17, 281t fragment ion searching, 251
recessive inheritance and, 429, 435, 435f MALDI, 250, 251, 251b, 251f, 583 peptide mass fingerprinting, 251
modeling in mice, 662-663, 664b MALDI-TOF, 251, 251f, 252f, 583 identifying proteins as vaccine targets, 681
molecular pathology and, 428–429 Malaria resistance, chimpanzee as model, 305t large molecule ionization (soft ionization), 250
repeat expansion diseases, 423 Male(s), ESI, 251b
Loss of function, genetic screens in culture, 390t ‘maleness’ and Y-linked genes, 67, 321, 325t MALDI, 250, 251, 251b, 251f
Loss of heterozygosity (LOH), meiosis mass analyzers, 251b
colorectal cancers, 562 chiasmata counts, 446, 446f mass spectrometers, 251b
FAP and, 548 female vs., 33, 33f Mast cells, 121t, 122
tumor suppressor genes, 548–549, 548f X–Y pairing and, 35 Maternal age, trisomy 21 (Down syndrome), 629–630,
p53, 552 see also Gametogenesis; Meiosis 630f, 630t
problems with identification, 549, 549f recombination frequencies, 446, 446f, 447f Maternal determinants, 148
two-hit hypothesis and, 540, 546, 546f, 547 sex determination, 159 Maternal genome regulation, 148
Low copy number (LCN) technique, 599 see also Y chromosome Matrilineal mitochondrial inheritance, 68, 257
loxP sequence of phage P1, structure, 652f Malignant hyperthermia, 616 see also Mitochondria
see Cre-loxP site-specific recombination MaLR repeats, 291 Matrix-assisted laser desorption/ionization see MALDI
Lsm snRNAs and proteins, 281, 281f, 281t MALT (mucosa-associated lymphoid tissue), 121 Maximum-likelihood methods, tree construction,
LTR transposons, 291, 291f Mammal(s), 328–329, 339
Luciferase, iterative pyrosequencing and, 219, 220f cultured cells, 389-390 Maximum-parsimony methods, tree construction, 328
Luminescence, iterative pyrosequencing and, 219, reverse genetic screens in, 390t MBD (methyl CpG-binding domain), 358
220f RNA interference in, 390-393, 391f, 392f, 393f mdx muscular dystrophy mouse, 641, 707
Lung cancer, development Mdm2 protein, p53 regulation, 551, 551f, 556
CHEK2 and, 555 animal models, 135–136, 135t, 301–303 Medaka (Japanese killifish), as model organism, 302,
gatekeeper hypothesis and, 563 axis specification and polarity, 140, 140bb, 304b
genotype-specific drugs, 620 141f different sex determination systems, 321t
Index 763
coding vs. noncoding sequences, 422–423, homoplasmy and, 68, 437 Molecular genetics, polymorphism definition, 406b
424t matrilineal nature, 68, 257 Molecular inversion probes (MIPs), 584, 584f
dynamic mutations see Repeat expansion light (L) strand, 257, 257f, 258f Molecular karyotyping, 47–48
diseases mutation rate, 68, 259 see also Chromosome painting
fragile sites and, 74, 423 nuclear genome vs., 260, 260t Molecular pathology, 428–434
instability see Microsatellite instability (MIN) replication, 257, 258f allelic heterogeneity and, 429
polyalanine tracts, 423 7S DNA, 257, 257f cancer, 435
protein aggregation and, 425b CR/D-loop region, 257 dominant-negative effects, 431–432, 431f
see also specific disorders strains, 650f, 660-661, 661b see also Dominant-negative mutations
production, 408, 409f, 423 transcription, 257–258, 258f gain-of-function see Gain-of-function mutations
Microsatellite instability (MIN), 229, 553, 557–558, 557f multigenic transcripts, 258, 258f genotype–phenotype correlation see Genotype–
colorectal cancer and, 557–558 overlapping genes, 258, 258f, 259f, 268 phenotype relationship
E.coli mutator gene homologs and, 558, 558f, 558t RNA polymerase, 347 hemoglobinopathies, 435
see also Repeat expansion diseases Mitochondrial matrix, 94b inheritance patterns and, 428–429
Microscopy, Mitochondrial pseudogenes, 272b gene dosage effects, 429–430, 430f, 430t
bright-field, ISH and, 206, 206f Mitochondrial ribosomes, 94b, 258 see also Inheritance patterns
dark-field, ISH and, 206 Mitogens, 110, 110f loss-of-function see Loss-of-function mutations
electron, 242 Mitosis, 31–33 Molecular phylogenetics, 326–341
fluorescence, 201b, 201t, 244–245, 245f cell cycle checkpoints and, 550, 550f gene families see Gene families
see also specific techniques cohesins and, 32 ‘human-defining’ sequences and, 339
Microtubules, 94b cytogenetics and, 44 lineage-specific DNA
cytoskeleton, 94b see also Cytogenetics environmental genes, 322–323
kinetochore, 39–40, 39b, 39f definition, 32 gene families see Gene families
in diverse centromere types, 41f increasing proportion of cells in, 44 G-value paradox and, 331
mitotic spindle, 39b, 39f intrinsically asymmetric divisions, 118b rarity of human-specific genes, 337–339
Midline, 137 meiosis vs., 36, 36t transposable elements and, 335–337
Miller–Dieker lissencephaly, 427t, 428 meiosis II similarity, 34, 35 mutation rate and, 328
Min (multiple intestine neoplasia) mouse, 661b see also Meiosis noncoding DNA expansion, 333
Minisatellite DNA, 289–290, 289t, 408 mtDNA and, 257 G-value paradox and, 329
generation, 408, 409f recombination and, 443b transposition and, 335–337
as marker, 449t requirement/functions, 31–32 regulatory elements and morphological
DNA fingerprinting and, 596–597, 596f stages, 32–33, 32f divergence, 333–335
Minisequencing, 581–582, 582f, 602 see also specific stages tree construction, 326–329
Minor (AU-AC) spliceosome, 17, 281f without cell division (endomitosis), 96, 97f assumptions, 328
MIPs (molecular inversion probes), 584, 584f Mitotic index, 44 bootstrapping, 329
MIR elements, 293 Mitotic spindle, 39, 39b, 39f branches, 327
miRNA see MicroRNA (miRNA) cell cycle checkpoint and, 552, 552f data set size and, 326–327
Misfolded proteins chromosomal instability and, 554 distance matrix method, 328, 328f
pathogenic nature, 417, 421, 425b disruption, 44 human phylogeny/classification, 329, 330f
Mismatch repair, 415b Mixoploidy, 51–52 maximum-likelihood methods, 328–329, 339
genes, 558, 558t see also Chimeras maximum-parsimony methods, 328
mechanisms, 558, 558f MLH1, chiasmata, 446, 446f neighbor relation method, 328
Missense mutation(s), 417–418 MLPA (multiplex ligation-dependent probe nodes, 327
conservative vs. nonconservative, 417 amplification), 585–586, 586f, 602 rooted (cladogram), 328, 328f
difficulty in assigning pathogenicity, 512 Modafinil, CYP2D6 variation and, 616 sequence alignments, 326–328
dominant-negative mutations vs., 431 Model organisms, 298–306, 341, 394t unrooted, 328, 328f
modeling in mouse, 664b, 664t animal models of development, 135-136, 135t validation and, 328–329
protein aggregates and, 418 invertebrate models, 301, 302b, 303t, see also Bioinformatics
sickle-cell anemia, 417–418, 418f non-mammalian vertebrate models, 301- Monkeys,
MIT (Massachusetts Institute of Technology), 233 303, 304b model organisms, 305t
Mitochondria, animal models of disease, 641t New World monkeys (NWM), definition, 327b
apoptosis induction and, 114, 114f applications, 298 Old World monkeys (OWM), definition, 327b
genome see Mitochondrial DNA (mitochondrial for gene function studies, 394t, 641t phylogenetic relationships, 331f
genome) gene identification Monoclonal antibodies (mAbs), 244b
structure, 94b, 94f functional complementation and, 510, 510f genetically engineered for therapy, 683-684, 684f,
Mitochondrial bottleneck, 437 non-positional methods, 511, 511f 685f, 686t
Mitochondrial disease(s), positional cloning and, 502–503 chimeric antibodies, 683, 684f
mtDNA mutation see Mitochondrial DNA genetically modified/recombinant see Genetically fully human, 683, 684f, 685f
(mitochondrial genome) modified organisms humanized, 683, 684f
nuclear DNA mutation, 68 Genome Projects, 228, 234–235 intrabodies, see Intrabodies
poor genotype–phenotype correlations, 436–437 humans as, 305–306 scFv, 684, 684f
Mitochondrial genome (or DNA), 68, 94b, 224, 256, invertebrates see Invertebrate(s) human only in genetically engineered mice,
256f, 257–260, 293 nonhuman primates, 641t 683-685f
CR/D-loop region, 257f regulatory elements project (ModENCODE), 346 therapeutic antibodies, examples, 686t
developmental bottleneck, 68, 437 unicellular, 298–301, 299b see also Immunoglobulin(s)
DNA profiling using, 597–598 eukaryotes, 300 Monocytes, 121t, 122
genes, 257–258, 259t prokaryotes, 298, 300 Monogenic (Mendelian) inheritance, 62, 64–78
density, 266 synthetic, 300–301 anticipation, 74–75
function of, 258, 259t see also Archaea; Bacteria; Yeast basic types, 64–68, 64f, 65f, 66b
nuclear pseudogenes and, 272b vertebrates, 304b autosomal see Autosomal inheritance
protein-coding genes, 258, 258f developmental models, 301–303 dominant see Dominant inheritance
rRNA genes, 258, 258f, 277 ethical issues, 303–304 mitochondrial see Mitochondrial DNA
tRNA genes, 258, 258f, 279 mammalian see under Mammal(s) (mitochondrial genome)
genetic code variations, 22f, 23, 258–260 see also specific models recessive see Recessive inheritance
genome evolution, ModENCODE project, 346 sex-linked see Sex-linked inheritance
bacteria similarity, 257-258 Modifier, genetic clinical heterogeneity see Clinical heterogeneity
evolution of different genetic code, 258-259 mouse, 661b databases of genetic diseases and, 62, 63b, 69
genes lack introns, 258 Molecular beacon probes, qPCR, 243f gene identification, 498
genes lost to nuclear genome, 264 Molecular cytogenetics, 46–50 approaches, 498, 499f
limited autonomy, 259t chromosome painting see Chromosome painting mapping see Genetic maps/mapping
genome sequence, 224, 257 comparative genome hybridization, 48–50 success, 531–532
heavy (H) strand, 257, 257f, 258f FISH see Fluorescence in situ hybridization (FISH) see also Pedigree(s); specific methods
heteroplasmy, 68 molecular karyotyping, 47–48 gene tracking and, 592
inheritance, 66b, 68 see also Nucleic acid hybridization; specific genotype–phenotype correlations see Genotype–
heteroplasmy and, 68, 437 techniques phenotype relationship
Index 765
heterozygote advantage and, 88–89, 89f gene synteny, 320, 320f see also individual traits/conditions
identifying mode of noncoding DNA and, 311 Multiple displacement amplification (MDA), 186b
importance of, 69 positional cloning and, 503–504 Multiple endocrine neoplasia type II, 434
pedigree analysis see Pedigree(s) pseudoautosomal regions, 322, 322f Multiple epiphyseal dysplasia 4, 436
segregation ratio see Segregation ratio ultraconserved elements, 312 Multiple exostoses type1, 430t
modifiers, 437 see also Orthologous genes Multiple melanoma, CDKN2A gene and, 552, 552f
multifactorial vs., 62–63, 468 imprinting, 367 Multiple sulfatase deficiency (MSD),
historical aspects/conflict, 79 interspecific crosses, 503b gene defects, 522, 523f
see also Multifactorial inheritance making human antibodies only, 683, 685f gene identification, 522–523
mutation rate estimates, 88 model organism, 304, 304b, 305t, 389, 394t, 502, functional complementation, 510
one gene–one enzyme hypothesis and, 70–71 503–504 microcell-mediated chromosome transfer,
pedigrees see Pedigree(s) modifier genes, 661b 523, 524f
see also individual diseases mutagenesis, 503 predicting gene sequence from peptide
Mononuclear phagocytes, 121t paramutation, 370 sequence, 523, 523f
see also specific cell types phylogenetic relationships, 331f Multiplex ligation-dependent probe amplification
Monosomy, 50, 51t recombinant inbred strains, 503b (MLPA), 585–586, 586f, 602
rescue and uniparental disomy, 58 transgenic, see Transgenic mouse Multiplicative principle, forensic DNA profiling and,
sex chromosomes, 53 transgenerational epigenetic regulation, 370, 600–601
Monosomy X see Turner syndrome 371f Multipoint mapping see under Linkage analysis
Monotremes, Mouse Genome Sequencing Consortium, 234 Multipotent cells, 115, 115t
definition, 327b Mozobil, and stem cell mobilization, 690 germ layers, 137
vertebrate phylogeny and, 331f M phase of cell cycle, 30,31f, 108 HSCs as, 116, 116f
Monozygotic (identical) twins, 150b, 150f, 469 see also Cell division; Meiosis; Mitosis Muntjac, chromosome evolution, 319, 319f
twin studies see Twin studies MRE11 protein, 555, 556f Muscle(s), 101
Morphogen(s), mRNA see Messenger RNA (mRNA) DMD gene identification and, 519
definition, 142 MRN complex, 555, 556f neuromuscular junction, 108
gradients and pattern formation, 141–143 MSCs see mesenchymal stem cells (MSCs) syncytium formation, 96, 97f
Hox gene regulation and, 143, 143f MspI, methylation assay, 577, 577f see also specific types
nodes and, 144 mtDNA see Mitochondrial DNA (mitochondrial Muscular dystrophy,
see also specific molecules genome) clinical heterogeneity, 72
Morphogenesis, 134, 144–146 Mucolipidosis II, 514 dystrophin frameshifts and, 421f
asymmetric vs. symmetric cell division, 145 Mucosa-associated lymphoid tissue (MALT), 121 genotype–phenotype relationship, 70–71, 436
cell adhesion and affinity, 145, 145t Mullerian ducts, 159 see also specific types
cell proliferation vs. cell death, 108, 134, 145–146 Multicellular organisms, Mus musculus see Mouse (Mus musculus)
see also Apoptosis; Cell proliferation classification, 92–93, 92f Mutagenesis,
cell size/shape changes, 144 C value paradox and, 96, 96t classical (forward) genetics, 389
gastrulation see Gastrulation evolution, 92 invertebrate models, 302b
invagination, 144 as model organisms, 301 oligonucleotide mismatch, 176, 176f
neural tube formation see Neurulation see also Model organisms PCR-based, 188–189, 189f, 580, 580f
key processes, 144t size variation, 134 random, principles of, 657
Morpholine, structure, 657f see also Metazoans chemical, 658
Morpholino antisense oligonucleotides, 390, 393f, Multi-drug resistance, MDR1 mutations, 422 insertional, 658-660
657, 657f Multifactorial inheritance, 79–83 gene trap-based, 658, 659, 659f
disadvantages, 657 behavioral traits and, 63, 469 transposon-based, 659-660, 660f
inducing alternative splicing, 657 birth defects and, 63 screens, 393f, 643, 657
inhibiting translation, 657 complicating factors, 463 site-specific see Site-specific mutagenesis
structure, 657, 657f assortative mating, 81 vertebrate models, 304b
Morphological evolution, cis-regulatory elements and, dominance and, 81 mouse models, 503
333–335 penetrance and, 492 Mutation(s), 97, 416–428
Morphological phenotyping, 612f continuous traits and, 63, 79 carcinogenesis and, 539–540, 539b, 540f
Morula, 136, 147 see also Quantitative genetics; Quantitative oncogenes see Oncogene(s)
DNA methylation, 360 trait loci (QTL) tumor suppressors see Tumor suppressor
polarized vs. nonpolar cells, 148 dichotomous traits and, 82–83 genes
Mosaicism, 51t disease susceptibility, 63, 82, 82f see also Cancer genetics; Carcinogenesis
chimeras vs., 78, 78f recurrence risks, 82–83 definitions, 406b
chromosome abnormalities, 50, 51, 52 threshold concept, 82 DNA damage and, 415
definition, 51 see also Susceptibility genes dominant-negative see Dominant-negative
DMD carriers, 595 diagnostic criteria, 468 mutations
germ-line, 77, 78f see also Susceptibility genes dynamic and anticipation, 74–75, 424, 512
definition, 78 environment factors, 469, 470, 472 see also Microsatellite instability (MIN);
pedigree analysis and, 77–78, 78f intrauterine, 470 Repeat expansion diseases
identifying by DNA analysis, 78f genetic counseling and empiric risks, 83, 84t fitness and, 307b
somatic, 77, 78f Hardy–Weinberg distribution and, 84–85 see also Selection
risk to organism, 77–78 genetic factors, 471–472 frameshifts see Frameshift mutations
see also Cancer(s) common disease–common variant model gain-of-function see Gain-of-function mutations
X-inactivation and, 66–67, 366, 366f and, 491, 493, 493t, 494, 516 gene expression changes and, 421–422
Mosaic pleiotropism, developmental genes, 334, 334f gene/genotype frequencies and see Gene heterozygosity and, 574
Mouse/mice (Mus musculus), frequency see also Heterozygous/heterozygote; Loss of
backcrossing, 660, 661b genetic mapping problem, 463, 464 heterozygosity (LOH)
chimeric, 650f mapping see Genetic maps/mapping interpreting effects of, 417, 512, 513, 514
chromosome structure, 367 mutation–selection hypothesis and, 88, leaky, see Leaky mutations
congenic strains, 503b 492–493, 493t, 494 loss-of-function see Loss-of-function mutations
disease model, 304b, 641t, 669b oligogenic, 63, 471 missense see Missense mutation(s)
difficulties in replicating human phenotype, penetrance, 492 modifier, 389, 390, 390t, 437
669b polygenic, 63, 471 neutral, 307b, 311
gene mapping, 503, 503b threshold theory see Polygenic threshold new mutations and trait maintenance, 76–77, 77f
genetic background, 661b theory nomenclature/terminology, 407b
genome database (MGI), 394t see also Heritability; Quantitative genetics; non-protein-coding regions and, 514
genome sequencing, 228, 234, 236t, 306 Susceptibility genes nonsense, 418–419, 419f, 421, 563
human gene identification and heritability and see Heritability null mutations, 431
functional complementation and, 510, 510f historical aspects, 79 pedigree interpretation complicated by, 77f
non-positional information and, 511, 511f mapping see Genetic maps/mapping post-zygotic, 77
positional cloning and, 503–504 monogenic vs., 62–63, 468 see also Mosaicism
human genome vs., 502, 503f historical aspects/conflict, 79 protein aggregation and, 418, 425b
chemoreceptors and, 332, 332t see also Monogenic (Mendelian) inheritance rate see Mutation rate
766 Index
regulatory elements, 421–422 Nematode model see Caenorhabditis elegans New World monkeys (NWM), definition, 327b
expansion and, 333 Neocentromeres, 320 phylogenetic relationships, 331f
morphological divergence and, 333–335 neo marker/transgene, 182, 183f, 650, 651f, 659f Next generation sequencing, see DNA sequencing,
thalassemia and, 421–422, 422f Neofunctionalization, 316, 316f massively parallel
screening and candidate gene testing, 512–513 Neogenes, created by transposons, 336 N-glycosylation, 24
small-scale pathogenic, 417–422 Neomycin, 182 Nibrin, 555, 556f
splicing effects see Splicing Neomycin phosphotransferase, 659f Nick translation, nucleic acid labeling, 198, 198f
synonymous (silent), 417 Neonatal screening, 629, 631 Nicotine, human mAbs against, 684
pathogenic, 422 Nephrotic syndrome, zebrafish model, 671t Nijmegen breakage syndrome,
Y chromosome accumulation, 324 Nervous system, 101–102 DNA repair defect, 415, 555
see also Genetic variation; Polymorphism(s); alternative splicing and, 374, 374f mapping using shared ancestral segments,
specific types/mechanisms development see Neural development 459–460, 460f
Mutation rate, differentiation of, 157,157f NimbleGen™ system for DNA capture, 223–224, 224f
estimating, 88 glia, 102 Nipped-B, 387b, 387f
human, 77 neurons see Neuron(s) NKT cells, 120t, 128
mitochondrial DNA, 68 pattern formation, 157f Noggin, evolutionary conservation, 160, 161f
molecular phylogenetics and, 328 PCD and disease, 113 Nomenclature/terminology,
RNA genomes, 12 Nested primer PCR, 186b alleles, 62b
Mutation–selection equilibrium, 88 Neural crest, 155, 155f amino acid variants, 407b
dystrophin mutations and, 595 cell fates/derivatives, 156b, 156t DNA polymerases, of mammals, 11t
Mutation–selection hypothesis, 492–493, 494 origin/migration pathways, 156b, 156f human chromosome(s), 45b
common disease–common variant model vs., Neural development, 154–158 abnormalities of, 51t
493, 493t axial mesoderm and induction, 153f, 155, 155f histone modification, 353, 353b
Hardy–Weinberg distribution, 88 chromosome abnormalities and, 504b human genes and alleles, 62b
Mutator genes and homologs, 558, 558t cranial vesicles, 155 mutation/polymorphism, 406b
MUTYH mutations, 557 disorders, 160 nucleosides, nucleotides, 4t
MVA (Modified vaccinia virus Ankara), 703 neural crest cells and, 155, 156b, 156f, 156t nucleic acid structure, 3, 4t
Myc, mitogens and, 110, 110f neuronal differentiation, 157–158, 157f nucleotide substitutions, 407b
Mycoplasma capricolum, genome transplantation neurulation see Neurulation taxonomic ranks, 326f
and, 300 pattern formation, 155–157, 157f Non-allelic homologous recombination (NAHR), 368,
Mycoplasma genitalium, as synthetic organism, 300 BMPs, 155, 157, 157f 368f, 426, 426f, 428, 443b
Mycoplasma mycoides, genome transplantation and, transcription factors, 155–156 Nonautonomous transposons, 290
300 Xenopus model, 155 Noncoding RNA (ncRNA), 2, 256, 274–288, 278tt, 293,
Myelination, 101f, 102 transcription factors and gene expression 346, 498
Myelin basic protein, mRNA transport, 375 Hox genes, 156 antisense transcripts, 287
Myeloid leukemia, neuronal differentiation, 157–158 comparative genomics and, 310–312
acute pattern formation, 155–156 databases, 279t, 311
SK gram, 554f see also specific structures/processes ENCODE project findings, 363
whole genome sequencing, 559 Neural folds, 155, 155f see also ENCODE project
chronic, BCR-ABL1 rearrangement, 48f, 543–544, Neural plate, 144, 153, 153f, 155 functional diversity, 277, 277f, 278tt
544f neural tube formation and see Neurulation functional importance, 274–275, 276b, 311
Myoblasts, 115 Xenopus laevis induction experiments and, gene expression regulation, 277f, 278tt
Myoglobin, 317, 317f 137–138, 138f epigenetic, 287
see also Globin genes/proteins Neural stem cells, polarity and cell fate, 138, 139f gene silencing, 275, 285–286
Myotonic dystrophy, 423, 425f Neural tube, 144, 155 guide RNAs, 275, 282, 282f
allelic homogeneity and, 429 defects, positional cloning approach, 501 medium-sized to large regulatory RNAs,
anticipation and, 74–75 recurrence risk, 83 279t, 287–288, 288t
gain-of-function mutations, 428 formation by invagination see Neurulation microRNAs and, 283–285
genetic testing, 589f gene expression and pattern formation in, snRNAs see Small nuclear RNA (snRNA)
gene tracking and, 592 155–157 translational controls, 376–378, 377f, 379
mouse models, 665 positional cloning of neural tube defect genes, transposition suppression, 285–286
mutations, 580t 501 genes see RNA genes
Myotonic dystrophy 1 and 2, repeat expansion and, Neurectoderm, 155 human noncoding RNA classes, 278tt
423 Neurexin-3, alternative splicing, 374, 374f libraries, 392, 392f
Myristoylation, 23tN Neuroblastoma, 542, 543f major types/classes, 278tt
Neurodegenerative disease, percentage of genome represented by, 311–312
programmed cell death and, 113 processing see RNA processing
N repeat expansion and see Repeat expansion ribosomal RNA see Ribosomal RNA (rRNA)
N-acetyltransferase (NAT), genetic variation and, diseases synthesis, 15–16
613–614, 613f see also specific conditions telomerase TERC, 43, 283
NADPH oxidase, 711-712 Neuroepithelium, 139f transfer RNA see Transfer RNA (tRNA)
NAHR see non-allelic homologous recombination Neurofibromatosis, see also specific types
Naive B cells, 120t, 123 type 1, 266, 268f, 271, 273f, 374 Noncovalent bonds, 6–7, 6b, 6t
Naive T cells, 123 type 2, chemical cleavage mismatch test, 575f see also specific types
Nanophotonic zero-mode waveguides (ZWMs), 222, Neurogenic genes, 157 Nondisjunction, 51
223f Neurogenin, 157f Nonhierarchical clustering, microarray data, 247b
Nanopore sequencing, 222–223 Neuroglobin, 317, 318 Nonhistone proteins, in chromatin, 37f
National Health Service (NHS), see also Globin genes/proteins Nonhomologous end joining (NHEJ),
personalized medicine and, 619 Neuron(s), 93, 95t dsDNA break repair, 414f, 415b
population screening criteria, 631 cell body, 101, 101f gene inactivation using, 655, 655f
National Institute for Health and Clinical Excellence different types, number, 93 Nonhomologous recombination, 443b
(NICE), 619 differentiation, 157–158, 157f Nonhuman primates, disease models, 641t
National Screening Committee (UK), population growth after terminal differentiation, 108 theoretically best models, 640
screening criteria, 631 interneuron connections, number, 93 Non-Mendelian inheritance see Multifactorial
Natural killer (NK) cells, 120t, 122 myelination, 101f, 102 inheritance
Natural selection see Selection structure, 101, 101f NOD (nonobese diabetic) mouse, 641
Nature–nurture debate, synapses see Synapse(s) Nonparametric linkage see Linkage analysis
multifactorial disorders and, 468, 470 tau mRNA transport, 375 Nonparametric lod (NPL), 475
see also Environmental factors; Gene– Neuronal cell body, 101, 101f Non-penetrance, 73, 73f
environment interactions; Neurotransmitters, 107–108, 108f Nonsense codons,
Genotype–phenotype relationship Neurulation, 144, 153, 153f suppression by gentamcyin, 680,
NCBI map viewer, 236t definition, 155 suppression by PTC124, 680-681
Nearest-neighbor analysis, microarray data, 247b Xenopus laevis induction experiments, 137–138, Nonsense-mediated decay (NMD), 418–419, 419f
Necrosis, 111 138f dominant-negative mutations and evolution, 431
Negative selection see Purifying (negative) selection Neutral mutations, 304b, 311 inhibition by puromycin, 572
Neighbor relation method, tree construction, 328 Neutrophils, 121t, 122 thalassemias and, 421
Index 767
Nonsense mutation(s), 418–419, 419f definition, 192b Nuffield Council on Bioethics, DNA databases, 601
APC, 563 labeling see Nucleic acid labeling Nulliploid c ells, 96
mRNA degradation and see Nonsense-mediated see also specific techniques Null mutations, dominant-negative vs., 431
decay (NMD) Nucleic acid labeling, 194, 197–203 Numb, asymmetric localization, 139f
thalassemias and, 421 end-labeling, 197
Northern blot hybridization, 204, 205f, 241 nonisotopic labels, 200–203
candidate gene approach and, 501 biotinylation, 201–202, 202f
O
Nortriptyline, CYP2D6 polymorphism and sensitivity direct assays, 47, 200, 201b Obesity, 368, 368f
to, 611, 611f fluorophores see Fluorophores Ochre termination codons, 22
Notch1, asymmetric localization, 139f indirect assays, 47, 200–202, 201b, 202f Odds ratio, 477, 490, 608b, 608t
in cell fate, probe detection see under Nucleic acid WTCCC studies, 622–623
Notochord, 153, 153f, 154fr hybridization O-glycosylation, 24
neural pattern formation, 157, 157f radioisotopic labels, 199–200 Oct-3/4, OCT4, 690, 693
Notochordal process, 152–153, 153f advantages/disadvantages, 200 Okazaki fragments, 9, 10f, 42, 423
Nuclear entry, autoradiography and, 200b Old World monkeys (OWM), definition, 327b
difficult for plasmid DNA, 704 commonly used labels, 200, 200t phylogenetic relationships, 331f
methods to enhance, 704, strand synthesis labeling, 197–198 Olfactory receptors, expression, 371, 371f
retrovirus, difficulties for some types, 701 nick translation, 198, 198f Olfactory receptors, gene family
Nuclear envelope, 94b, 94f PCR-based, 198 human gene family, 270t
Nuclear matrix, 93, 94b, 94f random primed DNA labeling, 198, 199f species differences, 332, 332f
Nuclear pores, 94b, 94 RNA labeling, 198, 199f Oligodendrocytes, 102
difficult traverse for plasmid DNA, 704 see also specific labels/methods precursors for treating spinal cord injury, 695,
traversed by many viruses, 702-703 Nucleic acid probe(s), 696t
Nuclear receptor superfamily, 103, 104b, 105, 105f, advantages/disadvantages, 197 Oligogenic inheritance, 471
352–353 biotinylated, 202 definition, 63
Nuclear reprogramming, definition, 192b see also Multifactorial inheritance
cell fusion and, 690b immobilized see under Nucleic acid hybridization Oligonucleotide ligation assay (OLA), 581, 581f, 582f,
induced by ectopic transcription factors, 690b labeling see Nucleic acid labeling 602
dedifferentiation and transdifferentation, 690b mismatches and, 196f MLPA variant, 585–586, 586f, 602
Nuclear speckles, 361 molecular cytogenetics and, 46 Oligonucleotide linkers,
Nucleic acid(s), nucleic acid hybridization and, 193, 193f in whole genome PCR, 187-188, 188f
base pairing see Base pairing preparation, 197 Oligonucleotide primers,
double-stranded regions, 193 in solution, 194 convergent in PCR, 183, 184, 184f
electrophoresis, 204b, 204f types/classes, 197 divergent (inverse PCR), 186b, 186f
evolution, 275b see also specific types in mismatch mutagenesis, 176, 176f
hybridization see Nucleic acid hybridization Nucleic acid structure, 1–27 oligo(dT) for making cDNA, 215, 215f
information storage, 2 bases see Base(s) random hexanucleotides, 198,199f
central dogma and, 2 charge and, 7 Oligonucleotide probes,
RNA world hypothesis and, 275b chemical bonds, 6–7, 6t advantages/disadvantages, 197, 208
see also Gene(s); Genetic code conformation and, 7 allele-specific, 186b, 187, 203, 203f
labeling see Nucleic acid labeling hydrogen bonds, 6, 6b end-labeling, 197
locations, 2 phosphodiester bonds, 7, 7f microarray hybridization and, 208–209, 245, 246
structure see Nucleic acid structure see also specific types see also Microarray(s)
see also Genome(s); specific types DNA see DNA structure mismatching consequences, 196f
Nucleic acid hybridization, 164, 191–211 DNA replication and see DNA replication molecular cytogenetics and, 46
aims/goals, 192 linear sequences, 3–4, 3f preparation, 197
annealing/reannealing, 193, 194–195 nomenclature, 3, 4t protein sequence-derived (degenerate), 509, 509f
applications, 192 phosphate, 3, 3f, 7 Oligonucleotide therapeutics, 704-707,706f, 708f
gene expression studies, 240–241, 241f repeating units in, 3f Oligonucleotides,
automation and, 207 RNA see RNA structure antisense and gene knockdown, 390
denaturation, 193, 194–195 sugars, 3, 3f, 7 hybridization probes see Oligonucleotide probes
detection see also specific components morpholino, see Morpholino oligonucleotides
autoradiography and, 199–200, 200b Nucleoid, 93 PCR primers see under Polymerase chain reaction
fluorescence and, 201b, 201f, 201t Nucleolus, 38, 94b, 94f, 361 (PCR)
see also Fluorophores Nucleoplasm, 93 site-specific mutagenesis, 176, 176f
indirect methods, 47, 201–202, 202f Nucleoside(s), definition, 4 Oligopotent cells, 115, 115t
factors affecting, 194–195 nomenclature, 4t Omics, 384
base composition, 195 structure, 3f see also specific types of analyses
hydrogen bond number, 194–195 see also individual nucleosides OMIM database see Online Mendelian Inheritance in
repetitive DNA and, 196 Nucleosome(s), 37, 37f, 353 Man (OMIM)
temperature and Tm, 192b, 195, 195t chromatin remodeling and, 356 Oncogene(s) cellular, 540–546, 567
glossary of terms used, 192b core histones, 37, 37f, 353 activation (gain-of-function mutations), 542–546
immobilization/capture on solid support, see also Histone(s) amplification, 542, 542t, 543fr
193–194, 194f, 203–207 transcriptional start sites and, 362 cell cycle arrest/apoptosis and, 546
colony hybridization, 206–207, 207f Nucleotide(s), 3f, 4, 4t chromosomal rearrangements see below
dot/slot blots, 203, 203f definition, 4 point mutation, 542t, 543, 543f
microarray-based see Microarray(s) nomenclature, 4t retrovirus vector-induced, 710b, 711-712
northern blots, 204, 205f RNA editing and, 374–375 specific circumstances of, 546
in situ hybridization see In situ hybridization structure, 3f analysis methods, 542
(ISH) triplet code, 2, 20 assay, 541
size-fractionated samples, 204, 204b, 205f see also Genetic code chromosomal rearrangements and, 542t
Southern blots, 204, 205f see also individual nucleosides; specific types gene promiscuity and, 544
interrogation of test sample by known probe, Nucleotide excision repair (NER), 414b, 414f, 557 immunoglobulin regulatory elements and,
192–193 xeroderma pigmentosum, 416, 557 545–546, 545f
kinetics, 196 Nucleotide sequence databases, 235t novel chimeric genes, 48f, 542t, 543–544,
labeling reactions see Nucleic acid labeling Nucleotide substitutions 545t
molecular cytogenetics and, 46–47 nomenclature, 407b translocation to active chromatin, 542t,
PCR and, 183 non-synonymous, 308b 544–546, 545f
principles, 192–196, 193f synonymous (silent), 308b tumor-specific, 544, 545t
probes see Nucleic acid probe(s) see also Ka/Ks ratio see also Cancer cytogenetics; specific
probe–target heteroduplex formation see Nucleus, 93 rearrangements
Heteroduplex(es) chromosomes see Chromosome(s) colorectal carcinogenesis, 562
stringency see Hybridization stringency chromosome territories, 38–39, 38f discovery, 540
target sequence, 46, 193, 193f structure, 38, 94b, 94f functions/mechanisms of action, 540, 541–542,
copy number effects, 196 sublocalization of functions, 361 541t
768 Index
cell cycle regulators, 541t, 542, 550, 551 Oxidative phosphorylation, mitochondrial genes and, PCR see Polymerase chain reaction (PCR)
cell surface receptors, 541t, 542 258, 258f, 259t PDGF, as oncogene, 541
DNA-binding proteins, 541t, 542 Oxidative stress, Pedigree(s), 62, 64–72
growth factors, 541, 541t, 542 cell senescence and, 111 basic types (Mendelian), 64–68, 66b
signal transduction components, 541t, 542, DNA damage, 412, 412f autosomal dominant, 64f
543, 543f autosomal recessive, 64f
miRNAs as, 561 mitochondrial inheritance, 68f
normal cellular (proto-oncogenes), 540, 541, 541t
P X-linked dominant, 65f
viral, 540–541, 540f, 541t P1 artificial chromosomes (PACs), 173t, 175, 231 X-linked recessive, 65f
One gene–one enzyme hypothesis, 70–71 p14IARF, 552, 552f Y-linked (hypothetical), 65f
Online Mendelian Inheritance in Man (OMIM), 62, 63b p16INK4A, 552, 552f complicating factors (to basic Mendelian type),
entry status symbols, 69 p21WAF1/CIP1, 556 72–78
see also specific disorders p53, age-related (delayed onset), 73–74, 74f
Oocyte, 146, 147f cell cycle regulation, 109, 551–552, 551f anticipation and, 74–75
cell polarity, species differences, 140 DNA damage response and, 551, 556, 556f chimeras and, 78
sperm entry point and, 140b loss of function in tumorigenesis, 552, 552f failure of dominant trait to manifest, 73–74,
cell structure, 146, 147f p63, 552 73f, 74f
DNA methylation, 359, 370 p73, 552 germinal mosaicism and, 77–78, 78f
fertilization, 146–147 P450 enzymes see Cytochrome P450s imprinting and, 75–76, 75f
mammalians vs. other vertebrates, 148 Pachytene, 34, 34f inbreeding effects, 76, 76f
meiosis and gametogenesis, 33–34, 33f Packaging cell, in gene therapy, 702, 702f male lethality in X-linked disorders, 76, 76f
intrinsic asymmetry, 118b Paired-end mapping, copy-number variation, 410, new mutations, 76–77, 77f
prophase arrest, 33 410f penetrance effects, 73, 73f
see also Meiosis Paired domain, motif, 388f recessive trait mimicking dominant pattern,
mRNA storage, 375 Paired box, 270t 72, 73f
Oogonia, 33, 33f Pair-end mapping, cystic fibrosis gene, 521 variable expression, 74–76, 74f, 75f
Opal termination codons, 22 Pairwise concordance, 469–470 see also specific factors
Oral cancer, development/stages, 538f Palmitoylation, 23 family identification and collection, 70
Organ(s), 100 Paneth cells, of intestine, 17f gene tracking and, 592, 592b, 592f
cell–tissue–organ relationship, 100, 100f Papillary thyroid carcinoma, 545t identifying mode of inheritance
development, 154–155 Papilloma virus, 687 ambiguity/problems, 69–72
neural tissue see Neural development PAPP-A (pregnancy-associated plasma protein A), gene sequence–character relationship,
internal asymmetry, 140f trisomy 21 and, 630 70–71
pattern formation (development), 139 PAR1, 322, 322f, 323, 324f genetic counseling and, 69
tumors as, 538 X-inactivation and, 326 genetic heterogeneity and, 71–72, 71f
see also Cell adhesion; Tissue(s); specific organs PAR2, 322, 322f, 324f importance of, 69
Organelles, 93, 94b, 94f X-inactivation and, 326 segregation ratios see Segregation ratio
see also specific types Paracrine signaling, 102, 102f see also complicating factors (above)
Organism complexity, Paralogous genes (paralogs), 316, 316f, 386 informative meioses and genetic mapping, 449,
C-value paradox, 96, 96f, 329 homology searches and functional genomics, 386 450f
G-value paradox see G-value paradox interpreting sequence changes, 417 ambiguity and, 451–452, 451f
noncoding DNA and, 329, 333 origin by genome duplication, 318, 318f autozygosity and, 457–459
Organization for Economic Growth and Development positional cloning and, 499f, 502 CEPH cell lines and, 455–456, 462
(OECD), genetic testing guidelines, see also Orthologous genes defining minimal candidate region, 500
570 Paramecium tetraurelia, genome duplication, 319 limited resolution, 462–463
Origins of replication, 9, 40–41 Paramagnetic beads, 711f size and, 70, 450, 456, 457
chromosome structure, 36, 40–41 Parametric analysis, 464 see also Linkage analysis
DNA replication, 9 lod score see Lod score symbols used, 64, 64f
mammalian cells, 41 monogenic linkage see Linkage analysis see also specific types of inheritance
mitochondrial DNA, 257, 258f Paramutations, 365, 370 P element of Drosophila, 302b, 659-660
origin of replication complex (ORC) candidate gene testing and, 513 PEG3 RNA, as tumor suppressor, 288t
yeast ARS elements, 40–41 Kit locus, 370, 371f PEG (polyethylene glycol),
Ornithine transcarbamylase defiency, 710b possible mechanism, 370 conjugation to therapeutic proteins, 678, 681
Orthologous genes (orthologs), 316, 316f Paraoxonase, 521 PEG-ADA (PEG-adenosine deaminase), 710b
developmental genes and, 334, 334f Parathyroid hormone receptor, gain-of-function PEG-CK30, in DNA nanoparticles, 704
homology searches and functional genomics, mutations, 433 PEGylation, 681
385–386 Paraxial mesoderm, 153, 154f Penetrance, 73, 73f
human–chimpanzee, 333 Parietal (somatic) mesoderm, 153, 154f factors affecting, 73
‘human-defining’ DNA and, 339 Parkinson disease, stem cell therapy, 695, 696t genetic mapping and, 463
interpreting sequence changes, 417 Lewy bodies in originally healthy fetal cell grafts, Peptide bonds, 21f, 22,
non-positional gene identification, 511, 511f 695 structure, 5f
positional cloning and, 499f, 502–503, 520 Parthenogenesis, 57 Peptide mass fingerprinting (PMF), 251
see also Paralogous genes Parvoviruses, gemoes, 13f Peptidyl transferase activity, ribosomal RNA, 22
Osteoblasts, 115 Patau syndrome see Trisomy 13 (Patau syndrome) perforins, 122, 127
Osteogenesis imperfecta type IIA, 431, 431f Paternity, DNA profiling see DNA profiling Perichromatin fibrils, 94b
Out of Africa model, haplotype blocks and, 484 Pathogen recognition, 122 Pericytes, as mesenchymal stem cells, 688
Ovarian cancer, Pathway-specific microarrays, 210 Peripheral myelin protein, loss-of-function vs. gain-of-
BRCA1, 527 Pattern-based functional searches, 386, 386f function mutations, 434, 435f
breast cancer and, 526 Pattern formation (development), 134, 139–144 perivitelline space, 147f
gene therapy trials, 713t axis specification/polarization and, 139–140, 140f Permutation test, 476
minisatellite instability, 558 evolutionary conservation, 160, 161, 161f Peroxisomes (microbodies), 94b
Ovarian teratomas, 57, 367 homeotic genes and see Homeotic genes Personal genome sequencing, 233, 305
Ovary, 149f (Hox genes) CNV and, 263, 410–411
Overdose, P450 variation and, 611 mammals, 140, 140bb Internet companies offering, 628
Overlapping transcription units, 266, 267, 268, 276b, signaling (morphogen) gradients, 141–142, see also Personalized medicine;
276f 143, 143f, 144 Pharmacogenetics/
bacterial genes, 268 species differences in, 140 pharmacogenomics
common promoters, 266, 268 neural see Neural development Personalized medicine, 602, 636
genes-within-genes, 266, 268, 268f organs, 139, 142 drug development and, 618–619
mitochondrial genes, 258, 258f, 259f, 268 orthogonal axes, 139 genetic treatment of disease and, 679, 679f
overlapping genes, 266, 268f see also Body plan; specific axes personalized prescribing and see Personalized
RNA genes and, 268 Pattern recognition, 122 prescribing
Oviduct (= Fallopian tube), 149f PCD see Programmed cell death (PCD) ‘predict and prevent’ medicine, 634–635, 635f
Ovulation, 149f PCNA (proliferating cell nuclear antigen), DNA repair susceptibility genes see Susceptibility testing
and, 415
Index 769
see also Personal genome sequencing; genotype and see Genotype–phenotype Platyrrhine (New World) monkeys, 327b
Pharmacogenetics/ relationship Pleiotropy, clinical heterogeneity vs., 71
pharmacogenomics mouse maps, 503b Ploidy, 30–31, 96, 260
Personalized prescribing, 609, 616–621 Phenotype studies, abnormal see Chromosome abnormalities
depersonalized medicine and ‘polypill’ vs., 621 humans as model organisms, 305–306 aneuploidy see Aneuploidy
drug labeling and, 616, 618 human-specific genes and, 339–341 chromosome set/number (n), 30
drugs licensed for specific genotypes, 619–620 Phenotyping, of animal models, definitions, 30
BiDil®, 619 anatomical, 642, 642f DNA content (C), 30
EGFR inhibitors, 620, 620f morphological 642, 642f whole genome duplication and, 318
trastuzumab (Herceptin®), 619 Phenylalanine, structure, 5f see also specific types
goals, 616 Phenylketonuria (PKU), population screening and, Pluripotency tests, 119
need for bedside genotyping, 616 629, 631 Pluripotent cells, 115, 115t
pharmaceutical industry and, 618–619 treatment of, 678 embryonic stem cells/germ cells, 117–119
polygenic drug effects, 616–618 Philadelphia chromosome, 543–544 in vitro vs. in vivo tests, 119
monogenic vs., 616 see also BCR–ABL1 rearrangement, chronic PML bodies, 94b
P450s, 616 myeloid leukemia Point mutations,
warfarin and, 617–618, 617f 3’,5’-Phosphodiester bond, structure, 7, 7f missense see Missense mutation(s)
slow acceptance/uptake, 621, 635 32Phosphorous (32P), 200, 200t nonsense see Nonsense mutation(s)
tumor expression profiling and, 620–621, 621f 33Phosphorous (33P), 200 oncogene activation, 542t, 543
Pfu polymerase, 183 Phosphorylation, of polypeptides, 23t synonymous (silent), 417, 422
P-glycoprotein, multi-drug resistance, 422 transphosphorylation of receptors, 106f see also SNP (single nucleotide polymorphism)
Phage, 169 Phosphorylation of RNA, Polar body, 33, 33f, 118b, 145
as cloning vectors see Vectors triphosphorylation at 5’ end, 18, 18f and asymmetric cell division, 118b ,118f
‘helper,’ 177 PHOX2B protein, 423 Polar fibers, 39b, 39f
host integration (prophage), 169, 173 Phylogenetics, Political issues, DNA profiling, 601
RNA polymerase(s), riboprobe labeling, 198, 199f classical approaches, 326, 326f Polyacrylamide gel electrophoresis (PAGE), 2D-PAGE,
see also specific phage below definition, 326 250f
Phage display, 179–180, 180f glossary, of common metazoan phylogenetic global protein expression and, 249–250
disease gene identification, 509 groups and terms, 327b resolution, 249
Phage λ, 173–174 molecular see Molecular phylogenetics sensitivity, 250
cloning vectors and, 173–174, 173t outgroup, 328 Polyadenylation (mRNA 3’ end), 18, 19f, 349
cos sequences, 174 rooted trees (cladograms), 328, 328f alternative splicing and, 374
genome, 174f taxonomic units (taxa), 326, 326f, 327b function, 18
rolling circle replication, 174 unrooted trees, 328, 328f mechanism, 18, 19f
Phage M13, 176 Phylogenies, Polyalanine tract disorders, 423
Phage P1, 175, 652, 652f definition, 326 see also Repeat expansion diseases
Phage T7, 178 eukaryotes, 329, 330f Polycistronic transcription unit, see under Transcription
Phagocytes, classes and roles, 121-122 metazoan, 329, 330f unit
NADPH oxidase and, 711 vertebrates, 329, 331f Polyclonal antibodies, 244b
Phagocytosis, innate immune response and, 122 Phylogenomics, 327 see also Immunoglobulin(s)
Pharmaceutical industry, personalized medicine and, see also Molecular phylogenetics Polycomb proteins, epigenetics/epigenetic memory,
618–619 Phylum, definition, 327b 365
Pharmacodynamics, Physical appearance prediction, 600 Polycystic kidney disease, positional cloning approach,
definition, 610 Physical maps/mapping, 499
genetic variation effects, 614–616, 636 chromosome banding and, 226 Polyethylene glycol see PEG (polyethylene glycol)
ACE polymorphisms, 615, 615f Drosophila melanogaster, 225 Polygenic threshold theory, 79–83
beta-adrenergic receptors and, 614–615, genetic distance vs., 446–448 dichotomous characters and, 82–83
615f genetic linkage maps and, 456 angel vs. demon model, 82, 82f
HT2RA polymorphisms, 615–616 HapMap project and, 483b, 483f recurrence risks and, 82–83, 83f, 84t
malignant hyperthermia and, 616 human see Human Genome Project (HGP) sex-specific thresholds, 82–83
Pharmacogenetics/pharmacogenomics, 609–616, 636 see also Genetic maps/mapping general principles, 79–81
adverse reaction prediction, 610, 619 PicoTitrePlate™, 221f hidden assumptions, 81
definitions, 610 Pig, animal model, 641t regression to the mean, 80, 80f
drug development and, 618–619 cystic fibrosis model, 671 heritability and see Heritability
drug labeling and, 616, 618 induced pluripotent stem cells, 672 see also Quantitative genetics
personalized prescribing and see Personalized piggyBac, DNA transposon, 660 Polygenic traits, 471
prescribing piRNA see Piwi-protein-interacting RNA (piRNA) definition, 63
polygenic effects, 616–618 Pittsburgh allele, 432–433, 433f see also Multifactorial inheritance
Pharmacokinetics, Piwi-protein-interacting RNA (piRNA), 279t, 285–286, Polyglutamine toxicity, screens for modifier genes,
definition, 610 287f 665t
genetic variation effects, 610–614, 636 coding within longer ncRNA transcripts, 287 Polyglutamine tract disorders, 423, 425f
fast vs. slow acetylators, 613–614, 613f translational regulation and, 377 genetic testing, 588, 589f
glutathione-S-transferases, 614, 614f Placenta, development, 136, 149b Huntington disease see Huntington disease (HD)
P450 enzymes and see Cytochrome P450s Placental (eutherian) mammals, see also Repeat expansion diseases; specific
phase 1 reactions, 610–613 definition, 327b disorders
phase 2 reactions, 613–614 vertebrate phylogeny and, 331f Polyhistidine-nickel ion affinity chromatography, 179
suxamethonium sensitivity and, 613, 616 Plakoglobin, 99 Polylinker sequences, 170
thiopurine methyltransferase, 614, 616 Plant homeodomain, 354 Polymerase chain reaction (PCR), 164f, 182–189
UGT1A1 glucuronosyltransferase, 614 Plants, advantages, 182–183
see also Drug metabolism transgenerational epigenetic regulation, 370 amplification of rare sequences, 183
Pharming, 683 whole genome duplication, 318 approaches used/specific methods, 186–189,
Phase I, II, III clinical trials, see Clinical trials Plasma cells, 120t 186b
Phase (in linkage), 451, 451f Plasma membrane, 94b, 94f allele-specific PCR, 186b, 187, 187f, 581, 581f
Phenocopies, genetic mapping and, 463, 500 selective permeability, 93, 94b Alu-PCR, 187
Phenome, 661 Plasmids, 168–169 anchored PCR, 186b
Phenotype, as cloning vectors see Vectors DNA library screening, 216, 216fr
candidate gene identification and, 501–502 copy number, 168 DOP-PCR, 186b, 188
definition, 62 F factor, 174 hot-start PCR, 186b
evolution integration, 168 inverse PCR, 186b, 186f
lineage-specific gene expression, 338–339, Plamodium falciparum, 681 large fragment cloning and, 185–186
338f. Platelet-derived growth factor (PDGF), as oncogene, linker-primed PCR, 186b
regulatory elements and, 333–335 541 microsatellite analysis, 229, 229f
uncoupling from chromosome evolution, Platelets, immune functions, 122 multiple targets, 186b, 187
319, 319f origin from megakaryocytes, 96, 97f mutagenesis using, 188–189, 189f
genetic background and, 661b Plating out of bacterial cells, 165–166, 171–172, 171f, nested primer PCR, 186b
genetic variation and, 406 216 RACE-PCR, 186b, 384b, 384f
770 Index
real-time (quantitative) PCR see Quantitative Polytene chromosomes, 225, 302b Position-specific scoring matrix (PSSM), 386, 386f
PCR (qPCR) Population attributable risk (PAR), 627 Positive-negative marker selection, 650, 651f
RT(reverse transcriptase)-PCR, 186b, Population genetics, Positive selection, 307b, 308, 308b
241–242, 346, 348b, 348f, 572 linkage disequilibrium and, 483, 488 causing sequence diversification, 308, 309f
sequencing and, 217, 220 polymorphism definition, 406b human-specific DNA and, 339, 340, 340t, 341
touch-down PCR, 186b population screening effects on gene frequency, signatures in coding DNA, 309f
whole genome, 634 signatures in noncoding DNA, 309f
using degenerate primers, 186b, 188 quantitative see Quantitative genetics Post-translational modification, 23–25
oligonucleotide linkers, 187–188, 188f recessive traits/recessiveness, 589 carbohydrate addition, 23–24
cell-based cloning vs., 185 susceptibility testing and, 624 cleavage reactions, 24–25, 268
disadvantages, 185–186 Population screening, 602–603, 628–634, 636 insulin, 25f, 268
gene identification and, 498 ACCE framework and validity, 629 methionine removal, 25
genetic testing and, 571 cystic fibrosis and signal sequences, 25
allele-specifc PCR, 581, 581f +/– couples and, 632 expression cloning and, 178, 180
DNA vs. RNA samples, 572 flowchart for carrier screening, 632f functional proteomics and, 394
MLPA and, 585 mutation screening problem, 631–632 histones, 37
multiplexing problem, 583 sensitivity/specificity issues, 631–633 lipid groups and, 23t, 24
multiplex screening for DMD deletions, subject choice/organization of, 633, 633t major types/functions, 23t
586–587, 586f diagnostic tests vs., 629 species differences, 681
PCR mutagenesis and RFLP induction, 580, ethical issues, 633–634 POT1, telosome component, 42
580f ethical principles of, 633 Potency, of cells, 114, 115t, 136
profiling and see DNA profiling gene frequency effects, 634 restriction see Cell fate determination
QF-PCR in fetal trisomies, 587–588, 588f, 598, 602 stigmatization/devaluation, 633–634 POU domain, 270t
see also specific types/applications false positives and, 630, 630t Prader–Willi syndrome (PWS), 368–370, 427, 427t
heat-stable polymerases, 183, 185 general aims, 628–629, 629f, 631 imprint control center and, 369
history/background, 165 intervention following methylation assays, 577
large-scale amplification of target DNA, 183 defining threshold, 629–630 microdeletions, 368, 368f, 428
mutagenesis methods, 188-189, 189f importance of, 629, 631 paternal origin of deletion and, 368
polishing enzymes, 185 see also Clinical validity snoRNA deficiency and, 283, 369, 369f
primers legal issues and, 633 SNURF–SNRPN transcripts and, 369, 369f
Alu-specific, 187 phenylketonuria and, 629, 631 PRC1, 365
degenerate, 186b, 188 ‘predict and prevent’ medicine, 634–635 PRC2, 365
design, 184–185 requirements/criteria, 631–633, 631t Prechordal (prochordal) plate, 152 153f
mismatched, 188–189, 189f sensitivity vs. specificity, 630, 630t, 631–633, 632t Precocious puberty, 433
multiplex, 573 as top-down process, 629 Precursor proteins, 24–25, 268
specificity and, 184–185 trisomy 21 screening, 629–630, 630f, 630t Pregnancy-associated plasma protein A (PAPP-A),
universal adaptors, 583 worried well and, 631 trisomy 21 and, 630
sequence information requirement, 184 Population stratification, association and, 479, 487 Pre-integration complex, 701
starting DNA, 183 Porcupine men, 67 Pre-miRNA, 283, 285
steps (thermocycling), 183, 184f ‘Porcupine men,’ Y-linked inheritance and, 67 Premutations, 424
strand synthesis labeling, 198 Positional cloning, 498–504, 499f, 534 Prenatal testing,
whole genome, 186b, 187–188 approaches, 498 b-thalassemia, 589
see also specific methods approximate chromosomal location, 499–500 DNA profiling role, 598
and Multiple displacement amplification (MDA) chromosome abnormalities and, 498, 499f, DNA sources, 571–572, 571t, 587
Polymorphism(s), 504–509, 534 indications, 587
definitions of, 406b additional mental retardation, 504b stigmatization/devaluation and, 633–634
‘human-defining’ DNA and, 339 balanced abnormality with unexplained trisomies, 587–588, 588f, 602, 608
as mapping markers, 226, 226b, 227t, 228 phenotype, 504–505, 504b, 505f population screening and intervention
see also DNA markers breakpoint localization, 505, 505f thresholds, 629–630
MHC genes, 126b CGH and, 506, 507–508, 507f, 525 Primary cilium, 94b, 94f
microarray analysis, 210–211 clinical points of interest, 504b cell signaling and, 105
molecular genetics definition, 406b confirmation of effects, 508–509 Primary oocytes, 33, 33f
normal variants, 406–411 contiguous gene syndromes, 504b, 520 Primase(s), 9, 10b
see also Genetic variation; specific types cytogenetic abnormality with standard Primate(s),
Polymorphism information content (PIC), 449 presentation, 504b Alu repeats, 293
Polypeptides, 2–7 FISH and, 505, 505f, 506, 506f brain expressed miRNA gene expansion, 332–333
chemical bonds, 6–7, 6b, 6t long-range effects and, 508–509 catarrhine, definition, 327b
see also specific types microdeletions/microduplications and, 506, ‘human-defining’ DNA and, 339
DNA-coding for see Protein-coding genes 507–508, 507f large-scale subgenomic duplications, 264
gene expression and, 14–15 SNP arrays and, 507–508 PKD1 pseudogenes and, 271, 273f
linear sequences, 4–6, 5f X–autosome translocations, 505–506, 506f, long ncRNAs, 288
amino acids see Amino acid(s) 520 model organisms, 136, 305, 305t
repeating units, 5f defining minimal candidate region, 500, 500f ethical concerns, 303
structure see Protein structure establishing candidate genes, 354, 500, 501 New World monkeys, definition, 327b
synthesis see Protein synthesis appropriate function and, 502 Old World monkeys, definition, 327b
post-translational cleavage, 24-25, 25f chromosome jumping, 521, 521f PAR regions, 322
post-translational modification, 23-25, 23t chromosome walking, 500–501 phylogenetic relationships, 331f
‘Polypill’ concept, 621 expression analysis/phenotype correlation, primate-specific genes, 338, 338t
Polyploidy, 50 501–502 taxonomy, 326f
constitutional, functional relationships and, 502 vertebrate phylogeny and, 331f
in humans, origins 50, 51f genome browsers, 501, 501f see also specific groups/species
clinical consequences, 52t, 53 homologous genes and, 499f, 502–503, 522 Primer extension assay, 581–582, 582f
natural in some vertebrates, 319 mouse models, 502, 503–504, 503b, 503f Arrayed primer extension (APEX), 582
whole genome duplication, 318 prioritizing candidate genes, 501–503, 534 Primers,
definition, 30 see also Comparative genomics; Model RNA for DNA polymerase(s), 9, 43
mosaic, 52 organisms RNA in end-replication,
common in some cell types, 96, 97f framework maps see Framework maps see also Oligonucleotide primers
due to cell fusion, 96, 97f linkage analysis see Linkage analysis Pri-miRNA, 283, 285f, 286f
due to endomitosis, 96, 97f logic of, 499–500, 500f Primitive endoderm, 151
polytene chromosomes and, 302b, 302f source of error, 500 Primitive groove, 151, 152f
polytene chromosomes, linkage analysis and see under Linkage Primitive mesenchymal cells, 101
Poly(A) polymerase, 18, 19f analysis Primitive node, 151, 152f
Polyps see Adenomas successes/genes cloned, 499, 501, 502, 505, 505f Primitive pit, 151, 152f
Polysomes, 22 see also individual disorders; specific techniques Primitive streak, 151, 152f, 158f
Poly(A) tail see Polyadenylation (mRNA 3’ end) Position effects, 351, 508 origin, 137f
Index 771
sources of embryonic germ cells, 119 stages, 34, 34f specific proteins
Primordial germ cells, 33, 33f, 96 mitosis, 32f antibody tracking, 242, 244f
developmental migration of, 158–159, 158f mitotic spindle, 39, 39b, 39f direct vs. indirect immunolabels, 242
environmental regulation, 159 see also specific stages electron microscopy, 242
isolation of and cell line formation, 119 Prosecutor’s fallacy, 600, 600b epitope tags, 244, 245t
migration, 149b Prostate cancer, fluorescence microscopy, 244–245, 245f
transgenesis route, 643f gatekeeper hypothesis and, 563 GFP fusion proteins, 245, 245f
Prion proteins, 425b gene therapy, 713t immunoblotting (western blotting), 242,
Prior probability, 594b, 594f prostate-specific antigen (PSA) test and, 607, 627 244f
Privacy, screening programs and, 633 clinical validity, 607 immunocytochemistry, 242, 244f
Probability calculations, 594–595, 594b, 594f susceptibility genes immunofluorescence, 244
lod score and, 453, 453b chromosome 8q24 and, 518, 519f see also specific methods
susceptibility tests, 624 clinical validity of testing, 627, 627t Protein folding, expression cloning and, 178, 180
zygosity testing, 599 Protein(s), Protein interactions, 25, 393, 403
Proband, 64 acidic vs. basic, dominant-negative mutations and, 432
Probandwise concordance, 470 aggregation, disease and, 418, 425b functional genomics/proteomics, 383, 393–403
Probes see Nucleic acid probe(s) crosslinking of, 403, 403f large-scale studies, 395–396
Procainamide, genetic variations in metabolism, 614 deficiency, possible causes, 514 NIPBL and, 387b, 387f
Procaspases, 113 DNA-coding see Protein-coding genes protein–DNA interactions see Protein–DNA
Prodrugs, in gene therapy, 698f, 714, 714f electrophoresis, 2D-PAGE, 249–250 interactions
Professional antigen-presenting cells, 126b expression protein–protein interactions see
Progenitor cells, analysis see Proteomics Protein–protein interactions
differentiation potential, 115, 115t change independent of transcription, see also specific methods
self-renewing see Stem cells 375–376 gene dosage effects, 430
Programmed cell death (PCD), 111 factors affecting, 514 protein conformation and, 25
apoptosis (type I) see Apoptosis inability to self replicate, 275b protein–DNA see Protein–DNA interactions
autophagy (type II), 112 interactions see Protein interactions protein–ligand, 25, 395
cell survival vs., 112 isoforms see Protein isoforms receptors see Receptor–ligand interactions
importance/functional roles, 112–113, 112t mass spectroscopy see Mass spectrometry (MS) protein–protein see Protein–protein interactions
defensive/immune system, 112, 112t, 122 ‘recombinant,’ 178 Protein isoforms,
developmental role, 108, 112, 112t, 145–146 size variations, 266, 267t alternative promoters and, 373, 373f
human disease and, 112–113 structure see Protein structure alternative splicing and, 373–374, 373f
in neurons, quality control, 112t synthesis see Protein synthesis see also Alternative splicing
tissue turnover and, 112, 112t see also Polypeptides differential cleavage see Post-translational
see also specific functions Protein A, 242 modification
Prokaryote(s), Protein chips, 394–395 gene expression analysis, 240
bacteria see Bacteria Protein-coding genes, 256, 265–273, 346 Protein kinase R, 391
cell structure, eukaryotes vs., 92–93, 94f coding RNA see Messenger RNA (mRNA) Protein kinases see Kinase(s)
classification, 92–93, 92f exons see Exon(s) Protein–ligand interactions, 395
genomes, 2, 93 gene families see Gene families protein conformation and, 25
huge variety of, 93 human numbers, 238, 261, 275, 293, 310–311, receptors see Receptor–ligand interactions
model organisms, 298, 299b, 300 382 see also Protein–DNA interactions;
Prolactin, in JAK-STAT signaling, 106f introns see Intron(s) Protein–protein interactions
Proliferating cell nuclear antigen (PCNA), DNA repair mitochondrial DNA, 258, 258f Protein motifs,
and, 415 multiple proteins from single gene, 372–375, common classes, 270, 271f
Proline, structure, 5f 378–379 common DNA-binding domains, 104b, 104f, 350
Prometaphase, mitosis, 32, 32f alternative first exons, 372 functional genomics searches, 388, 388f, 388t
cytogenetic analysis, 44 alternative splicing see Alternative splicing see also specific motifs
karyotype, 46f CDKN2A gene, 552, 552f Protein–protein interactions, 395–401, 395f
Promoter(s), 14, 347–353 multiple alternative promoters, 372–373, basal transcription apparatus, 352
alternative promoters, 372–373, 372f 372f candidate gene identification and, 510
CDKN2A gene, 552, 552f overlapping transcription units see genetic studies, 395
functions, 372 Overlapping transcription units large-scale studies, 395–396, 400
alternative start sites, 364, 364f RNA editing see RNA editing methods for detecting, 396t
basal transcription apparatus and, 15, 348, 348t, see also Protein isoforms affinity purification–MS, 396t, 397–398, 398f
349f, 355–356 non-intron containing, 265 TAP tagging, 396t, 398, 398f
co-regulators, 15, 350, 352–353 overlapping see Overlapping transcription units yeast two-hybrid system, 396–397, 396t,
core sequence elements, 347, 347f as percentage of conserved DNA, 276b 397f
CAAT box, 14 statistics, 267t see also specific methods
GC box, 14, 15f untranslated regions, evolutionary expansion, methods for validating, 396t
spacing and, 347, 347f 333 co-immunoprecipitation, 396t, 399, 399f
TATA box, 14, 15f, 347, 348 variations, 265–266, 265t FRET, 396t, 398–399, 399f
CpG islands in, 347 intron length, 266 pull-down assay, 396t, 399
enhancers see Enhancers repetitive DNA within, 266 protein interactome, 399-401, 401f
histone H3K4 methylation and, 355–356 size variation, 265–266, 267t human interactome, 401f
imprinting and, 369–370, 370f Protein–DNA interactions, 401–403 yeast interactome, 401f
inducible, 646, 647, 647f DNA binding domains see DNA-binding domains systems approach (interaction proteomics),
internal (within gene), 15, 16f, 372 transcription factors see Transcription factors 399–401, 403
mutation effects, 421, 422, 422f in vitro mapping, 401–402 data presentation, 400, 401f
shared, 266, 268 DNA footprinting, 401–402, 402f see also specific proteins
silencers see Silencers gel mobility shift assay, 402 Protein sequence(s), 4–6, 24t
transcription factor binding sites and, 350–351 SELEX assay, 402, 402f amino acids see Amino acid(s)
proximal vs. distal, 351, 351f in vivo mapping, 402–403 databases, 235t
see also Transcription factors ChIP, 355b, 355f, 403 de novo MS sequencing, 252
Proneural genes, 157, 157f formaldehyde-induced crosslinking, 403, linear nature, 4–6
Pronuclear microinjection, 643f, 647, 647f 403f predicting gene sequence from, 509, 509f, 523,
Pronucleus, 147, 644f Protein domains, 27 523f
Proofreading, exon duplication and, 314–315, 314f translation from mRNA see Translation
DNA polymerases, 10 exon shuffling and, 315, 315f, 388, 432 see also Proteomics
failure and DNA damage, 412 examples, human, 270t Protein structure,
Prophage, 169 functional genomics searches, 388, 388f, 388t charge and water solubility, 7
Prophase, gene families, 270, 270t chemical bonds, 6–7, 6b, 6t
meiosis I Protein expression analysis, conformation and, 7
female gametogenesis and arrest, 33, 33f, 34 comparative, 252 disulfide bridges, 27, 27f
recombination and see Recombination global protein analysis see Proteomics hydrogen bonds, 6, 6b, 25, 27
772 Index
peptide bonds, 21f, 22 exaptation of, 272b fluorescent (QF-PCR) in fetal trisomies, 587–588,
see also specific types functionality, 272b, 273 588f, 602,
cofactor/ligand interactions and, 25 gene duplication and, 256, 263, 271, 272b, 273, Quantitative trait loci (QTL), 63
domains see Protein domains 316 expression QTLs, 371
exon duplication and, 314–315, 314f in dispersed gene families, 271,273f
levels, 24t, 25 in gene clusters, 269f, 271, 273f
missense mutations and, 417–418 in human genome, 339
R
modular nature, 388 in RNA gene families, 272b Rabbit, novel ES cells, 672
domains see Protein domains mitochondrial, 272b Race differences see Ethnicity/race
isoforms and see Protein isoforms nonprocessed, 271, 272b, 272f, 273f RACE-PCR, 186b, 348b, 348f
motifs see Protein motifs origin same as retrogenes, 274f RAD50 proteins, 555, 556f
post-translational modification see Post- origins, 272b Rad51 proteins, 443b, 556
translational modification prevalence, 272b Radiata, definition, 327b
primary see Protein sequence(s) processed, 271, 272b, 272f, 273, 274f Radiation hybrids see Hybrid cells
quaternary, 24t, 27 role in regulating parent gene, 286, 288f Radiography, 642, 642f
dominant-negative mutations and, 432 segmental duplication origin, 271,273f Radioisotopes,
secondary, 24t, 25–26 selection and, 272b advantages/disadvantages, 200
b-pleated sheet, 26, 26f transcription nucleic acid labeling, 199–200, 200b, 200t
b-turn, 27 inducing endogenous siRNA, 286, 288f Random mating, multifactorial inheritance and, 81
coiled coils, 26 retrogenes and, 272b, 273, 274f, 274t Random primed DNA labeling, 198, 199f
a-helix, 24t, 25–26, 26f XIST origin from, 272b Ras signaling, 543, 543f
sequence effects, 25–27 Y chromosome, 324 mitogens and, 110, 110f
tertiary, 24t, 27 Pseudosequences, phylogenetic tree validation, 329 oncogenic, 543, 562
dominant-negative mutations and, 431, 431f Psoriasis, theraputic mAbs for, 686t Rats,
see also specific motifs PSSM (position-specific scoring matrix), 386, 386f animal models, 304–305, 304b, 305t, 641, 641t,
Protein synthesis, 20–27 Psychiatric disorders, 665t, 696t
condensation reactions, 4 copy-number variants and, 532 gene knockouts using zinc finger nucleases, 671
elongation, 21f, 22 diagnostic criteria, 468 induced pluripotent stem cells, 672
gene expression systems see Expression cloning difficulty of gene identification, 532 novel ES cells, 672
initiation, 20, 21–22, 21f drug therapy physiological relationships, 331f
mitochondrial genes and, 258, 258f, 259t CYP2D6 polymorphism and sensitivity to, Rb see Retinoblastoma (Rb) protein
mRNA translation see Translation 611 R-banding, 45
peptide bond formation, 21f, 22 HT2RA polymorphism and sensitivity to, Reactive oxygen species (ROS), DNA damage, 412,
post-translational modification see Post- 615–616 412f
translational modification endophenotypes and, 532 Reading frame, 22, 420, 420f
termination, 22 genetic mapping, 468 mutation effects (frameshifts) see Frameshift
see also Messenger RNA (mRNA); Ribosome(s); see also Genetic maps/mapping mutations
Transfer RNA (tRNA) see also specific disorders ORF identification and comparative genomics,
Protein truncation test (PTT), 576, 576f PTC124 (ataluren), 680-681 309, 310
Proteoglycans, 24 PTT see Protein truncation test (PTT) overlapping genes see Overlapping transcription
ECM and, 99 PU.1, transdifferentiation using, 690bb units
Proteome, 239, 245, 249 Public health measures, population screening, Real-time PCR see Quantitative PCR (qPCR)
analysis see Proteomics 628–629 Receiver-operating characteristic (ROC) curves,
databases, 235t, 385t see also Population screening susceptibility testing, 623–624,
Proteomics, 239, 239t, 249–252, 384, 393–395 Pufferfish, as model organisms, 302 624f
2D-PAGE, 249–250, 250f Pulsed-field gel electrophoresis, 204b Receptor–ligand interactions,
applications, 249, 252–253, 393–394, 395f Purifying (negative) selection, 306–308, 341 cell adhesion molecules, 97
functional analysis, 393–403 definition, 306, 307b cell signaling and, 102, 102f, 103t, 105, 105f, 106f
identifying proteins as vaccine targets, 681 signatures in noncoding DNA, 307, 308f see also Signal transduction pathways; specific
miRNA target site identification, 377, 377f protein-coding vs. RNA genes, 308f, 310 molecules
protein interactions see Protein interactions transposons and, 336 Receptor tyrosine kinases, mitogen actions, 110, 110f
sequence analysis, 393 Purines, 3f, 4, 4t Recessive inheritance,
definition, 393 definition, 4, autosomal, 64f, 66b
identifying proteins as vaccine targets, 681 depurination, 412, 412f, 413f founder effects, 89
major facets, 395f salvage pathways, 182f Hardy–Weinberg distribution and, 85
mass spectrometry see Mass spectrometry (MS) structure of, 3f mutation rate, 88
microarrays (protein chips), 394–395 see also Adenine; Guanine loss-of-function mutations, 429, 435, 435f
see also specific methods Pyloric stenosis, 83, 84t mimicking of dominant traits, 72, 73f
Protist, definition, 92 Pyrimidines, 3f, 4, 4t new mutations and, 76, 77f, 88
Protocadherins, alternative splicing, 374 definition, 4, segregation ratio calculation, 70b
Proto-oncogene(s) see Oncogene(s) salvage pathways, 182f variable expression and, 74
Protophyta, 92f structure of, 3f X-linked, 65f, 66b
Protostomes, definition, 327b see also Cytosine; Thymine; Uracil carrier phenotype, 67
within metazoan phylogeny, 330f Pyrophosphate, release during DNA synthesis, 219 estimated mutation rates, 88
Prototherian mammals (monotremes), Pyrophosphorolysis-activated polymerization, 581 manifesting heterozygotes, 67
definition, 327b Pyrosequencing, standard, 219, 220f mutation rate, 88
vertebrate phylogeny and, 331f genetic testing with, 583, 602 new mutations and, 76, 77f, 88
Protozoa(ns), 92f massively parallel, 220, 221f pedigree, 65f
model organisms, 299b, 300 Pyrosequencing, iterative, see also specific modes (autosomal, X-linked)
Proximal, in chromosome nomenclature, 45b in massively parallel DNA sequencing, 220, 220f, Recessive disorders/traits,
Pseudoautosomal inheritance, 67–68, 67f 221f as characteristics not genes, 64
Pseudoautosomal regions, 35, 67, 321, 322f definition, 62
evolution, 321, 322–323 founder effects see Founder effects
common X–Y ancestor and, 321, 322f
Q genetic mapping see Genetic maps/mapping
human–mouse divergence, 322, 322f Q-banding, 44–45 gene therapy, promising targerts for, 709-710
instability and, 322f, 323 Quadruple test, trisomy 21, 630, 630t gene tracking, 593f
recent origins (in humans), 323 Quadrupole mass analyzers, 251b Hardy–Weinberg distribution and, 85
PAR1, 322, 322f, 323, 324f, 326 Quantitative genetics, 63 heterozygote (carrier) phenotypes
obligate crossover in, 322 Gaussian distribution of traits, 79, 79f genetic risk calculation, 594–595, 594b, 594f
PAR2, 322, 322f, 324f, 326 genetic determination of traits see Polygenic mosaicism and, 67
pseudoautosomal inheritance and, 67–68, 67f threshold theory screening issues, 631–633, 632f
recombination frequencies, 447 heritability see Heritability selective advantage and see Heterozygote
X-inactivation and, 326, 367 historical studies, 79 advantage
Pseudocolors, 48, 48f see also Multifactorial inheritance inheritance pattern see Recessive inheritance
Pseudogenes, 256, 263, 271–273, 272b, 293 Quantitative PCR (qPCR), 186b, 189, 241–242, 243b, modeling in mice, 662-663, 664b
definition, 271 2043f mutation rate estimates and, 88
Index 773
population genetics, 589 suppression by chromosome inversions, 323 see also DNA methylation
possible screening effects on gene frequency, suppression near sex-determining regions, 323 noncoding regions, 289
634 synaptonemal complex, 34, 34f nucleic acid hybridization and, 196
see also specific traits/conditions X–Y pairing and, 35 PCR primers and, 184
Recessive inheritance, autosomal, 64f, 66b, Recombination fraction, Alu-specific PCR, 187
X-linked, 65f, 66b gene tracking and, 593 recombination and, 443b
Reciprocal translocation, 51t, 54 linkage disequilibrium and, 480 shotgun sequencing and, 225
mechanism, 55f lod score relationship, 453 tandem vs. interspersed, 408
potential outcomes in balanced translocation mapping function relationship, 445–446 transposon-derived see Transposons
carrier, 55–56, 56f maximum value (0.5), 445 types/classes, 263, 289–290, 289t
Recombinant DNA, as measure of genetic distance, 442–444, 448, 463 Y chromosome ampliconic sequences, 324f,
definition, 165 haplotype blocks, 444 325–326, 325t
libraries see DNA libraries synteny/linkage and, 443–444, 444f see also specific types
production, 165, 166–168, 167f, 169f see also Genetic linkage Replication fork, 9, 10f
single-stranded, 176–177, 177f number of informative meioses and, 449, 450, Replication origins see Origins of replication
see also Cloning; Vectors 450t Replication protein A (RPA), DNA repair and, 415
Recombinant DNA technologies, 163–190 sperm genotyping/mapping, 462–463, 462f Replicon, definition, 168
cell-based cloning see Cell-based DNA cloning Recombination hotspots, 447, 448, 448f Reporter(s)/reporter genes,
gene identification and, 498 Recombination nodules, 34–35 genes, vector construction, 170–171, 170f, 172
general approaches, 164, 164f Recurrence risk, 64-65ff, 66t, 82 indirect DNA labeling, 47, 202f
PCR see Polymerase chain reaction (PCR) Regenerative capacity, transgenes, 651, 652f, 659f
see also specific methods high in amphibians, 690b see also specific genes/proteins
Recombinant inbred strains, mouse (Mus musculus), higher in fetus than adult, 689 Reproductive system,
503b Regenerative medicine, 695 development, 158–159
Recombinant proteins, principle, 178, 679 Regression to the mean, 80, 80f gametes see Gametes
expression cloning origin, 681 Regulatory DNA elements, ‘human-defining’ DNA and, 339
safety of, 681 chromosome rearrangements and, 351, 352f Reptile(s),
therapeutic examples, 682t cis-acting see cis-regulatory elements (CREs) polyploidy, 319
Recombinant vaccines, 687 comparative genomics and identification, 312 sex determination, 320, 321, 321t
Recombinase, see Cre recombinase sensitivity–specificity tradeoff, 312, 312f vertebrate phylogeny and, 331f
Recombination, 34–35, 442, 443b, 443f ultraconserved elements, 312 Resequencing see DNA sequencing
carcinogenesis and, 443b Encyclopedia of DNA elements project see Respiratory chain genes, mtDNA and, 258, 258f, 259t
common DNA metabolism proteins, 415 ENCODE project Restriction endonucleases, 165, 166b
DNA polymerases, 10 expansion in complex metazoans, 333 genomic library construction, 214
distribution along chromosome, 446–448 gain-of-function mutations, 432–433, 432t, 433f partial digestion with, 214, 215f
conserved ancestral segments see Ancestral mutation effects see Mutation(s) sequence specificities, 166, 166b
chromosome segments trans-acting, 14 blunt vs. sticky ends, 166
fraction, estimation of, 446, 446f lineage-specific, 335, 335f common type II enzymes, 168t
hotspots, 447, 448, 448f transcription factors see Transcription factors methylated vs. unmethylated targets, 577,
individual variations, 448, 448f see also specific elements 577f
nonrandom nature, 446–448, 447f Regulatory T cells (Tregs), 120t, 127 site polymorphisms see Restriction fragment
recombination deserts and jungles, 446 self-tolerance and, 127, 128b length polymorphism (RFLP)
sex differences, 446, 447, 447f Relatedness, type I, 166, 166b
see also Genetic linkage common ancestors, 479, 481f type II, 165, 166–167, 166b, 168t
double-strand breaks and, 414bb, 414f, 443b gene sharing and, 86, 479–481 IIB, IIR, IIS, see 168t
dystrophin gene and, 595 Relative risk, 609b, 609t type III, 166, 166b
gene duplication and absolute risk vs., 609, 635 utility, 165
segmental duplication, 264, 264f association studies, 477, 490, 622–623 see also Zinc finger nucleases
tandem duplication, 264, 264f clinical validity and, 608, 609b, 609t Restriction fragment length polymorphism (RFLP),
genetic maps and see Genetic maps/mapping susceptibility test batteries, 624 229f, 407, 407f
Holliday junctions, 443b, 443f Repeat expansion diseases, 229, 423–425, 424t DNA markers, 227t, 229, 229f, 230t, 449t, 450
homologous, see Homologous recombination allelic heterogeneity and, 429 DNA methylation assays, 577, 577f
immunoglobulin/TCR rearrangements, 53, 93 anticipation and, 74–75, 424 human genome maps, 228–229, 230t
class switching and, 130–131, 130f coding sequence, 423, 424t specific mutation testing, 579–580, 579t
failure/incorrect, 131 gain-of-function mutations, 423, 428 PCR mutagenesis and, 580, 580f
recombination signal sequences, 130 genetic anticipation and, 512 Restriction–modification systems, 165, 166b
VDJ segments, 128–130, 129f genetic testing, 588, 589f Retinal progenitor cells, retinoblastoma and, 546
see also Immunoglobulin genes instability threshold, 423, 424, 588 Retinitis pigmentosa,
interference, 445, 461 mechanisms, 423, 425f contiguous gene syndromes, 426, 427f
intrachromosomal, 325–326, 442, 443b mitotic instability and, 425 rhodopsin as candidate gene, 502
see also Gene conversion noncoding sequences, 423, 424t Retinoblastoma,
inversion effects, 323 non-positional gene identification and, 511–512 Knudson’s two-hit hypothesis and, 546–547, 546f
linkage disequilibrium and, 483 polyalanine tracts, 423 linkage studies, 546–547
see also Ancestral chromosome segments; polyglutamine tracts, 423, 425f mechanisms of Rb mutation/loss, 547, 547f
Haplotype blocks; Linkage premutations, 424 Retinoblastoma (Rb) protein,
disequilibrium (LD) protein aggregation and, 425b cell cycle regulation, 109, 551, 551f
mitotic, 443b sex effects, 424 loss of function in retinoblastoma, 546–547, 546f
non-allelic homologous, 368, 368f, 426, 426f, see also specific disorders LOH analysis and, 548
428, 443b Repetitive DNA, 2, 256–257, 289–293 possible mechanisms, 547, 547f
nonhomologous, 443b cloning and, 173, 174 tissue-specificity of gene defect, 514
obligate in PAR1 pseudoautosomal region, 322 coding regions, 266, 289, 293 Retinoic acid, as developmental morphogen, 144
prophase of meiosis I and, 34, 34f, 442, 443b constitutive heterochromatin see Constitutive RET, loss vs. gain-of-function mutations, 433–434
alignment mechanisms, 34–35 heterochromatin Retrogenes, 272b, 273, 274f, 274t, 316
chiasmata, 35, 35f databases, 291 cause of chondrodystosis in dogs, 338, 338f
individual differences, 448, 448f exons, 266 definition, 273
minisatellite production, 408, 409f as feature of large genomes, 263 gene duplication to give, 274f
rates, sex-averaged and sex-specific, 446 genetic variation and, 263, 408–409 in humans, examples of , 274t
recombinant vs. non-recombinant chromatids, pathogenic, 422–425 in primates, 338
442, 443b, 444f, 446 see also Repeat expansion diseases rationale, 273
informative pedigrees, 449, 450f, 451–452, tandem repeats see Tandem repeats Retrotransposons/retrotransposition, 2, 264, 290,
451f human vs. chimpanzee, 337 291, 291f
regulatory elements and, 351, 352f introns, 266 copy-and-paste transposition, 315, 315f, 659
RMGR method, 670 isolation by ultracentrifugation, 164 creating novel exons, 335-337, 337f
site-specific, using Cre-loxP, 652-654, 652f, 653f, lack of evolutionary constraint, 290 disease role, 292
654f low amount in Drosophila, 329 exon shuffling mediated by, 315, 315f
using FLP-FRT, 652 652f methylation and, 357 intron evolution and, 314
774 Index
LINEs see LINE elements interlocus sequence exchange, 307b internal promoters in some, 15, 16f
processed pseudogenes and, 271, 273 internal promoters, 15, 16f medium-sized to long regulatory RNA, 278tt,
retrovirus-like elements, 290 mtDNA, 258, 258f, 259, 277 287–288, 288t
human LTR transposons, 291, 291f see also RNA genes microRNA, 283, 285
SINEs see SINE elements peptidyl transferase activity, 22 mitochondrial DNA, 258, 258f, 259
see also specific elements processing, 19–20, 19f noncoding RNA (ncRNA), human genes 278tt
Retrovirus(es), synthesis by RNA pol III, 15, 16f novel gene identification, 310
gene therapy vectors, 701-702, 701t, 702f Ribosome(s), 20 piwi-protein-interacting RNA (piRNA), 285–286
genomes, 540f, 701-702, 702f aminoacyl tRNA binding sites within protein-coding genes, 268, 276b, 276f
HIV, see Lentiviruses A-site, 21, 21f, 22 pseudogenes, 272b
lentiviral, see Lentiviruses P-site, 21, 21f, 22 ribosomal RNA (rDNA) see Ribosomal RNA (rRNA)
γ-retroviral, 701, 701t, 702f, 710b, 711 large (60S) subunit, 20 selection and
oncogenes, 540–541, 540f rough endoplasmic reticulum, 94b, 94f positive selection and, 308, 309f
reverse transcription see Reverse transcription rRNA see Ribosomal RNA (rRNA) purifying (negative) selection and, 308f
Retrovirus-like elements, 290 small (40S) subunit, 20 small Cajal body RNA (sacRNA), 278tt, 283
human LTR transposons, 291, 291f translocation, 21f small nuclear RNA (snRNA), 276b, 278tt, 280–283,
Rett syndrome, male lethality and inheritance Ribozymes, 2, 275b small nucleolar RNA (snoRNA), 278tt, 282-283
identification, 76 therapeutic, 706, 715 statistics, 267t
rev, HIV regulatory protein, 702, 714 Rice, transgenic, 683 transfer RNA, 15, 16f, 258, 258f, 259, 278t, 279
Reverse genetics, 389, 403, 498 RIKEN Institute, 233 see also specific RNAs
in cell lines, 389–3920, 390t Ring chromosomes, 51t, 53, 54f RNA genomes, in some viruses, 11, 12t, 13f
see also Genetically modified organisms RISC see RNA-induced silencing complex (RISC) RNA-induced silencing complex (RISC), 283, 285f
Reverse transcriptase(s), 11t, 12 Risk ratio (λ), 468–469 RNA interference, 284b, 284f, 391
cDNA production and, 215, 215f definition, 468–469 RNA-induced transcriptional silencing (RITS) complex,
telomerase TERT component, 43 schizophrenia, 469t RNA interference, 284b, 284f
Reverse transcriptase PCR (RT-PCR), 186b, 241–242, Risk, relative vs. absolute, 609 RNA interference (RNAi), 283, 284b, 284f, 376, 390
346 RITS complex, heterochromatin formation and, 361 candidate genes and
5’ RACE, 186b, 348b, 348f RMGR recombination, 670 identification, 511
candidate gene approach and, 501 RNA, 2–7 testing, 515
genetic testing and, 572 antisense see Antisense RNA/transcripts evolution, 390
Reverse transcription, 2, 12 advantages and disadvantages for genetic functional genomics and, 390–392, 403
see also Reverse transcriptase(s) testing, 572 cell lines, 390, 391–392, 391f
RFLP see Restriction fragment length polymorphism base pairing see Base pairing dsRNA and, 390–391, 391f
(RFLP) capping of 5’ ends efficiency and sequence, 392
Rhabdomyosarcoma, 433, 545t mRNA, 18, 19f global screens, 392, 392f
Rhesus haemolytic disease, 63f snRNA, 281f model organisms, 390–391, 391f, 393f
Rhesus macaque, coding vs. noncoding, 2 morpholino oligonucleotides, 390
ANDi, first transgenic primate, 645 DNA binding see DNA–RNA duplexes synthetic siRNA, 391–392, 391f
Huntinton disease model, 665t editing see RNA editing heterochromatin formation and, 361
iPS cells from, 672 electrophoresis, 204b identifying drug targets, 681
model organism, 305t evolutionary importance, 274, 275b translational regulation and, 376–377
phylogenetic relationships, 331f functional importance, 275, 275b, 277f see also MicroRNA (miRNA)
Rheumatoid arthritis (RA), association studies, 489, guide RNAs, 275, 283 RNA polymerase(s),
490f, 623t hybridization eukaryotic, 14, 14t, 15–16, 347
HLA-DR4 and, 477 northern blots, 204, 205f mitochondrial, 347
Rhinovirus, 670 probes see Riboprobe(s) transcription role, 13, 14, 15, 347–349
Rhodamine, see also Heteroduplex(es); specific use in RNA labeling, 198, 199f
emission/excitation wavelength, 201t techniques see also specific types
structure, 201f information storage, 2 RNA polymerase I, 14t, 347
Rhodopsin, retinitis pigmentosum, 502 central dogma and, 2 noncoding RNA synthesis, 15
Ribonuclease protection assay, 241 mutation rate and, 12 RNA polymerase II, 14, 14t, 347–349, 378
Ribonucleases (RNases), 278t RNA World hypothesis and, 275b basal transcription apparatus, 15, 348, 348t, 349,
RNase III in miRNA maturation, 283, 285f viral genomes, 11–12, 12t 349f
dicer (cytoplasmic), 283, 284b, 284f, 285f see also Gene expression; Genetic code; conformational changes, 349
drosha/Rnasen (nuclear), 283, 285f Genome(s) polypeptide-encoding genes, 14, 15, 347–349
RNase H, 215f maturation/processing see RNA processing promoter elements and, 14, 15f
RNase MRP, 278t noncoding, human classes, 278tt small noncoding RNA synthesis, 15
RNase P, 278t protein-coding RNA see Messenger RNA (mRNA) snRNAs and function of, 282
Ribonucleic acid see RNA structure see RNA structure RNA polymerase III, 14t
Ribonucleoprotein particles, 375 toxic, in myotonic dystrophy, 423, 425f internal promoters and, 15, 16f
Riboprobe(s), see also Genome(s); specific types/forms noncoding RNA synthesis, 15–16, 16f
advantages/disadvantages, 197 RNA binding proteins, mRNA transport, 375 Alu repeats and, 293
definition, 192b RNA, double-stranded RNA primers, DNA polymerase(s), 9, 43
expression vectors, 198 heterodimers involving guide RNAs, RNA processing, 12, 16–20
labeling by strand synthesis, 198, 199f miRNA, 283 antisense transcripts and long ncRNAs, 287
preparation, 197 piRNA, 282f editing see RNA editing
in situ hybridization and, 206 snRNA, 18f mRNA end modifications, 17–19
Ribose, 3, 3f snoRNA, 282f 3’ polyadenylation, 18, 19f, 349
structure, 3f intramolecular base pairing, 9f, 285f 5’ capping, 18, 18f, 21
Ribosomal RNA (rRNA), 275, 278t long dsRNA used in RNAi, 656 functional roles, 17, 18, 21
5.8S rRNA, 19, 277 siRNA duplexes, 284b, 390–391, 391f transcription termination and, 349
ribosomes, 20 viral dsRNA genomes, 13f rRNA, 19–20, 19f
synthesis, 19f RNA editing, 374–375 snRNA, 280
5S rRNA, 19, 277 C to U, 374, 375f see also Small nuclear RNA (snRNA)
ribosomes, 20 A to I, 374, 375f splicing see Splicing
synthesis, 15, 16f, 19f knockout effects, 375 tRNA gene products, 20
18S rRNA, 19, 277 U to C, 374 see also specific types of RNA
ribosomes, 20 RNA genes, 256, 274–288, 276b, 278tt, 498 Rnasen (Drosha), 283, 285f
synthesis, 19f comparative genomics, 310–312 RNA structure, 7
28S rRNA, 19, 277 dispersed gene families, 280–283 bases, 3, 3f, 7
ribosomes, 20 endogenous siRNA, 286 see also Base(s)
synthesis, 19f gene clusters, 277, 279 charge and, 7
evolution, 275b gene numbers and, 238–239, 256, 262, 277, 279, chemical bonds and, 6–7, 6t
genes (rDNA), 19, 277, 278t 382 hydrogen bonds, 6, 6b
duplication events, 315 G-value paradox and, 329 phosphodiester bonds, 7, 7f
in satellite stalk, 55f historical lack of interest, 274 DNA replication and see DNA replication
Index 775
DNA vs., 3 tumor-specific chromosome rearrangements, Self-organizing maps, microarray data, 247b
double-stranded, 8, 193 544, 545t Self-splicing (autocatalytic) introns, 314
gene knockdown and, 390 Satellite (of acrocentric chromosome), 55f Senescence see Cell senescence
hairpin structures, 8, 9f Satellite DNA, 289, 289t, 408 Sense stand of DNA, definition, 13, 13f
hydrogen bonds, 6b, 8 a-Satellite DNA, 40, 290, 408 Sensitivity of a test, 607b
miRNA synthesis, 283, 285f isolation by ultracentrifugation, 164 conflict with specificity, 608, 630
RNAi see RNA interference (RNAi) lack of evolutionary conservation, 290 Sensory cell types, 95t
viral genomes, 8 methylation, 357 Sequence alignment, 237b, 237f, 237t
see also Base pairing Scaffold, of nonhistone proteins, 37f aligning genome sequences, 309-310
linear sequences, 3–4, 3f, 7 Scanning electron microscopy, 642 comparative genomics and, 306, 308–309
nomenclature, 3, 4t scaRNA see Small Cajal body RNA (scaRNA) molecular phylogenetics, 326–328
phosphate, 3, 3f, 7 SCC2, 387b, 387f see also Comparative genomics
secondary structure, 8, 9f SCC4, 387b, 387f multiple sequence alignment, 386f
single-stranded, 8 scFv antibodies, 684, 684f, 713t programs for, 386, 386f
sugar (ribose), 3, 3f, 7 use as intrabodies, see Intrabodies pathogenic effect prediction, 578–579
viral genomes, 8, 11–12 Schizophrenia, see also Bioinformatics
RNA therapeutics, principles, 704-707 adoption studies, 471, 471t Sequence homology searching, 237b, 385-386, 388
RNA viruses, 11–12, 12t association studies, 489t consensus sequence searching, 386, 386f
genome diversity, 12, 12t, 13f in isolated populations, 488, 488f pattern searching, 386, 386f
mutation and, 12 copy-number variants and, 532 Sequence identity, 237b, 237f
phage, 169 linkage analysis and gene family classification and, 269–271, 270t
retroviruses, 12 nonparametric, 476, 477t, 478f human accelerated regions, 341
RNA World hypothesis, 275b parametric, 474 kringle domains and, 266
Robertsonian translocation, 51t, 54 risk ratio (λ), 469, 469t large-scale duplications and, 264
acrocentric chromosomes, 367 twin studies, 470, 470t NAHR and, 426
as balanced translocations, 55 Schizosaccharomyces pombe, ultra conserved elements and, 312
possible carrier outcomes, 56, 57f centromeres, 40, 41f Y chromosome and, 324, 325
mechanism, 55f evolutionary conservation, 502 see also Bioinformatics; Homologous genes
ROC (receiver–operator) curves, susceptibility testing, see also Orthologous genes Sequence similarity, 236, 237b, 237f
623–624, 624f heterochromatin formation and siRNAs, 361 definition, 192b
Roche AmpliChip®, 616 model organism, 299b, 300, 502 see also Bioinformatics; Gene families
Rodents, phylogenetic relationships, 330f Sequence tag analysis of genomic enrichment
model organisms, 303, 304–305, 304b, 305t telomere repeat sequence, 42t (STAGE), 355b
mouse see Mouse (Mus musculus) Schwann cells, role of, 102 Sequence-tagged site (STS), 227b, 227f, 227t, 230, 230t
rat see Rats SCID see Severe combined immunodeficiency (SCID) human genome map, 230–231
reproduction gene expansion, 332 Sealing strands, tight junctions, 98 Sequence tags, global transcript profiling, 248, 384
see also individual species Second degree relative, definition, 86 Sequencing-by-ligation, 220
Rofecoxib (Vioxx®), adverse reactions, 619 gene sharing by, 86 Serial analysis of gene expression (SAGE), 248, 348b,
Rolling circle replication, phage λ, 174 Secondary oocyte, 33, 33f 384
Romano–Ward syndrome, 436 Second messengers, 106–107, 107f candidate gene approach and, 501
Roof plate, 157 definition, 106 Serine, structure, 5f
Rooted phylogenetic tree (cladogram), 328, 328f hydrophilic, 106t Serotonin receptors,
Rostrocaudal axis, see Anteroposterior axis hydrophobic, 106t polymorphism and pharmacodynamics, 615–616
Rotational cleavage, 148, 148f see also specific types RNA editing, 374
Rough endoplasmic reticulum (RER), 94b Segmental aneuploidy syndromes, 427–428 Severe combined immunodeficiency (SCID), 711
see also Protein synthesis; Ribosome(s) Segmental duplication, 264, 264f, 271, 273f, 337, 338 due to ADA deficiency, 711,
RPA (replication protein A), DNA repair and, 415 NAHR and, 426 successful gene therapy,
rRNA see Ribosomal RNA (rRNA) Segregation analysis, 471–473 due to defective IL2RG gene, 711,
RSPs see Restriction fragment length polymorphism breast cancer, 526, 526f successful gene therapy,
(RFLP) Hirschsprung disease, 471, 471t Sex characteristics, primary vs. secondary, 159
RT-PCR see Reverse transcriptase PCR (RT-PCR) pitfalls/problems, 471–472, 471t Sex chromatin (Barr body), 66, 66f
Rubinstein–Taybi syndrome, 427t specific vs. general models, 471, 471t see also X-inactivation
Russell–Silver syndrome, methylation assays, 577 Segregation ratio, Sex chromosome aneuploidies, 50
Ryanodine receptor, malignant hyperthermia, 616 bias of ascertainment, 69–70, 69f chimeras, 78
calculating and correcting for recessive condition, clinical consequences, 52t, 53
70b gene dosage effects, 426
S definition, 69 sexual differentiation and, 159
Saccharomyces cerevisiae, Selectins, 98 X-inactivation and, 53, 66, 426
2 μm plasmid, 652 Selection, 307b Sex chromosomes, 31, 59, 158, 260
chromosomes, 175 Alu sequences and, 293 aneuploidy see Sex chromosome aneuploidies
autonomous replicating sequences, 40–41, AMY1 copy number and dietary starch, 316 chromosome inversion and, 323
40f balancing, 307b DNA content/chromosome, 261t
centromeres, 40, 40f, 41f, 175 see also Heterozygote advantage evolution, 320–326, 342
telomeres, 40f, 42t, cancer cells and, 538, 539 common ancestor, 321, 322f
see also Yeast artificial chromosomes co-dominant, 307b pseudoautosomal regions and, 321,
evolutionary conservation, 502 heterozygote advantage and see Heterozygote 322–323, 322f
FLP-FRT recombination system, 652, 652f advantage sex determining locus and X–Y divergence,
gene density, 266 human genome and, 339, 340t 323–324, 323f
genetic screening in, 394t natural, definition, 307b X-inactivation and, 326
genome database (SGD), 394t mutation–selection hypothesis of susceptibility Y chromosome gene conversion, 324–326
genome sequencing, 228, 234, 236t, 306 genes, 492–493, 494 gametologs in, 321, 322f
model organism, 299b, 300, 394t, 502 overdominant, 307b homologous genes, 326
phylogenetic relationships, 330f positive (Darwinian) see Positive (Darwinian) see also Pseudoautosomal regions
Sacral neural crest, 156b, 156f, 156t selection meiosis
SAGE (serial analysis of gene expression), 248, 348b, pseudogenes and, 272b evolution and, 322–323
384 purifying (negative) see Purifying (negative) X–Y pairing and, 35
candidate gene approach, 501 selection see also Meiosis
Salamanders, limb regeneration, 690b removal of deleterious traits, 88 sex determination and, 31, 31f
Salivary amylase, gene duplication, 315–316 signature of, 523-525, 524f different systems, 321t
Sanger Cancer Gene Census, 542 stabilizing, 307b heteromorphic chromosomes, 321
Sanger, Fred, 217 Y chromosome and, 323–324 homomorphic chromosomes, 321
Sanger sequencing see Dideoxy DNA sequencing Selenocysteine, 21st amino acid, 280b sex-specific regions, 321
Sarcoma, SELEX method, 402, 402f Z-W sex chromosomes, 321t, 321
definition, 538 to make oligonucleotide aptamers, 685 Sex determination, 31, 31f, 320
MDM2 and, 551 Selfish gene theory, imprinting, 370 evolution, 320–321
776 Index
sex chromosomes see Sex chromosomes therapeutic, 707 factors affecting allele frequency, 406, 407
X–Y divergence, 323–324 cholesterol conjugation, 707 high frequency and lack of phenotype effects,
different genetic mechanisms, 321t clinical trials, 707 407
genetic, 31, 321t transgene approach, 391, 391f indels, 407, 407f, 411, 429f, 430f, 491, 589f, 615,
environmental vs., 320 see also RNA interference (RNAi); short hairpin 615f
sex chromosomes see Sex chromosomes RNA (shRNA) microarray analysis see SNP microarray analysis
sex-determining loci, 323–324, 323f Sister chromatids, 31f, 32 nomenclature/terminology issues, 406, 407b
intrinsic vs. positional information, 159 crossovers, 444 normal vs. pathogenic variation, 578–579
male and female development, 159 see also Recombination see also Mutation(s); specific disorders
sex determining loci, gene duplication, 264, 264f restriction sites and see Restriction fragment
evolution, 323–324, 323f Site-specific mutagenesis, length polymorphism (RFLP)
SRY, 323, 323f oligonucleotide mismatch, 176, 176f simple alternative alleles, 406–407
Sex differences, PCR-based types, 407, 407f
recombination and, 446, 446f, 447, 447f 5’ add-on, 188, 189f SNP microarray analysis, 210–211
repeat expansion diseases and, 424 mismatched primers, 188–189, 189f cancer genetics
Sex hormones, sexual differentiation and, 159 polylinker sequences, 170 chromosomal instability assessment, 553
Sex-linked inheritance, 66b single-stranded DNA production for, 176–177, genome-wide studies, 559
pseudoautosomal inheritance, 67–68, 67f 176f, 177f oncogene analysis, 542
X-linked see X-linked inheritance vector construction, 170 for detecting copy number variants, 507-508
Y-linked, 65f, 66b, 67 Skeletal muscle cells, 101 massively parallel genotyping, 583–585, 602
Sexual differentiation, 159 syncytia, 96, 97f Affymetrix system, 583, 583f
see also Sex determination Skin, Illumina GoldenGate® system, 584–585, 585f
SGM+ markers for genetic profiling, 597, 597t, 598f, renewal, 115 molecular inversion probes, 584, 584f
599 stem cell niches, 117, 117f snRNA see Small nuclear RNA (snRNA)
Shiverer mouse, stem cell therapy, 696t Sleeping Beauty, DNA transposon, 659-660, 660f SNURF–SNRPN transcripts, 276b, 282
Short hairpin RNA (shRNA) Slime mold (Dictyostelium discoideum), multicellular coding for two proteins plus ncRNA, 369, 369f
expression vector, 392f evolution and, 92 Prader–Willi syndrome and, 369, 369f
gene libraries encoding, 392, 392f Slot-blot hybridization, 203 Solenoid, packing of nucleosomes, 37f, 38
transgenes for gene knockdown, 656 Sm proteins, 281t Solexa sequencing technology, 220, 582
Short interfering RNA see siRNA, Small Cajal body RNA (scaRNA), 278t, 281, 283 SOLiD™, 220, 221t
Short interspersed nuclear elements see SINE elements Small cytoplasmic RNAs, 278t Somatic cells,
Short tandem repeat polymorphisms (STRPs) see Small hydrophobic signaling molecules, 103t definition, 31
Microsatellite DNA Small interfering RNA (siRNA) see Short interfering germ cells vs., 93
Short tandem repeats (STRs) see Microsatellite DNA RNA (siRNA) mutations
Shotgun genome sequencing, 224–225 Small intestine, chromosome abnormalities, 50
Sickle-cell anemia, 433 stem cell niches, 117, 117f immunoglobulin hypermutation, 131
heterozygote advantage and, 407 tissue renewal/replenishment, 115 see also Mosaicism; Mutation(s)
as recessive trait, 64 Small nuclear ribonuclear proteins (snRNPs), 281 Somatic cell nuclear transfer, 643f, 690b, 691f
Sickle-cell disease, spliceosomes, 17, 18f to make pig cystic fibrosis model, 671
ASO hybridization, 203, 203f, 580 Small nuclear RNA (snRNA), 278t, 280–283 Somatic (parietal) mesoderm, 153, 154f
heterozygotes (sickling trait), 64–65 Cajal body RNA, 278t, 281, 283 Somatopleure, 153, 154f
homozygotes see Sickle-cell anemia functions, 282–283, 282f Somites, 153, 154f
missense mutations, 417–418, 418f non-spliceosomal, 282 Somitomeres, 153, 154f
one specific mutation, 580t spliceosomal, 17, 18f, 281, 281t Sonic hedgehog (Shh/SHH),
Sickling trait, as dominant trait, 64–65 genes, 272b, 278t, 280–283 distal enhancers, 351, 352f
Signal sequences, post-translational modification, 25 within longer ncRNA transcripts, 287 Hox gene regulation and positional identity,
Signal transduction pathways, 103, 103t, 104b, 105, non-spliceosomal, 282 143, 143f
105f, 106f spliceosomal, limb development, 351, 352f, 508
cancer role functions, 17, 18f, 281 mutation and disease, 160
oncogenes, 541t, 542 Lsm snRNA, 281t, 281f neural pattern formation, 157
pathways approach and, 564, 564f, 565, 565f, Sm snRNA, 281, 281t, 281f Sotos syndrome, 505, 505f, 506f
566, 566f synthesis Southern blot hybridization, 204, 205f
cross-talk, 107 by RNA pol II, 15 DNA profiling, fingerprinting with minisatellites,
gene dosage effects, 430 by RNA pol III, 15, 16f 596, 596f
kinase cascades, 105, 106f see also specific types fragile X syndrome tests, 588
second messengers see Second messengers Small nucleolar RNAs (snoRNAs), 278t, 281 genetic testing and, 572
see also specific molecules/pathways alternative splicing regulation (HBII-52), 283 Sox2/SOX2, 690, 693, 694f
Silencers, 15, 349 brain-specific, 282 Sparteine, CYP2D6 polymorphism and sensitivity
DNA looping and, 351, 351f disease locus (HBII-85 snRNA), 277f, 283, 369, 369f to, 611
spicing and see Splicing suppressors genes, 282–283 Species differences, at molecular level
Silent mutations see Synonymous (silent) mutation(s) HBII-85 disease locus, 277f, 283 amount of alternative splicing, 333
SINE elements, 290, 291f, 292, 292f, 408 within protein-coding introns, 369 cis-regulatory elements, 335-337, 337f
Alu see Alu sequences gene clusters, 282, 369f fraction of genome,
LF-SINES and gene evolution, 336–337, 337f imprinted, 369, 369f lineage-specific, 335-337, 337
LINE homology, 292 rRNA processing, 20 conserved developmental pathways used for
MIRs, 293 subclasses different purposes, 161, 161f, 334,
Single-molecule sequencing, see DNA sequencing, C/D box snoRNAs, 282, 282f 334f, 502-503, 522
massively parallel H/ACA box snoRNAs, 282, 282f developmental pathways not well conserved,
Single nucleotide polymorphism see SNP Smith–Magenis syndrome, 427, 427t DNA methlyation,
Single selection, 70b Smoking, nicotine-specific mAb to help stop, 684 gene number (G-value paradox), 329,
Single-strand conformation polymorphisms (SSCPs), Smooth muscle cells, 101 genome sizes (C-value paradox), 96
575f, 576 SMRT sequencing, 222, 223f lineage-specific expansion of gene families,
Single-stranded binding proteins, 10b snoRNA see Small nucleolar RNA (snoRNA) 332–333, 332t, 337-339, 338t, 338f
Single-stranded DNA production, cell-based DNA SNP (single nucleotide polymorphism), 406–408, 438 lineage-specific exons, 335-337, 337f
cloning, 176–177 association/linkage disequilibrium studies, 481, lineage-specific gene inactivation, 338–339, 339t
siRNA (short interfering RNA), endogenous, 279t, 286, 482, 483b, 484, 484f, 494, 516 noncoding DNA fraction, 333
376-377 breast cancer studies, 528–529, 529t ploidy, 319
discovery, 375 HapMap see HapMap project protein-coding gene number (G-paradox), 329,
heterochromatin formation and, 361 sequences with no known function, 518–519 331t
libraries, 392, 392f tag-SNPs, 484, 484f, 494, 516, 622 repetitive DNA content, 329, 332
mRNA degradation and, 375 variant identification and, 516–517, 517f transcription factor binding sites, 333
pseudogene transcription and, 286, 288f complex alternative alleles, 407 Specific language impairment (SLI), FOXP2 and, 340,
target sites (3’ UTRs), 377 databases, 407, 407b, 577 340f, 340t
siRNA, synthetic/experimental, 376, 391–392, 391f as DNA marker, 227t, 230, 449t, 450–451, 481 Specificity of a test, 607b, 632t
global screens, 392, 392f evolutionary stability, 408 conflict with sensitivity, 608, 630
Index 777
Speckles, 94b potency and, 115, 115t prostate cancer associations, 627, 627t
Spectral karyotyping (SKY) see Chromosome painting self-renewing progenitors, 115 type 2 diabetes associations, 625
Speech language disorder, 340 therapeutic use, 115 lifestyle changes vs., 626
Sperm, 147, 147f tissue stem cells, 115-117 limited value of, 624, 627–628, 635
DNA methylation, 359, 370 therapy, see Stem cell therapy population attributable risk and, 627
flagellum, 94b, 147, 147f transit amplifying cells and, 117, 117f population genetics and, 624
genotyping/mapping, 462–463, 462f Stem cell therapy, 687-690, 693-695, 694f, 696t population screening see Population screening
meiosis and gametogenesis, 33–34, 33f allogeneic, 689 predictive vs. preventative, 635, 636
see also Meiosis autologous, 689, 694f private companies and, 627–628
Spermatogonia, 33, 33f cell replacement, 687-688 relative risk see Relative risk
S phase of cell cycle, 30, 31f, 32, 108 disorders/injuries to treat, 689 ROC curves, 623–624, 624f
checkpoints, 550, 550f genetic modification of stem cells for, 689 test batteries, 623–624, 624f, 625f
mitogens and promotion, 110 practical difficulties, 689 see also specific disorders
Spheroplasts, 176 practice of, 696t Suxamethonium, genetic variation and metabolism,
Spinal cord, stem cell banks for, 689 613
injury, ES cell therapy trial for, 695 stem cell mobilization, induction, 689-690 SWI/SWF family, 357t
pattern formation, 157 Steroid hormones, cell signaling, 103, 103t, 104b, 105 SYBR green, for staining DNA, 204b, 243f
Spinal muscular atrophy (SMA), Stigma, Symmetry-breaking, during development, 140
SMN mutations Werdnig–Hoffmann disease, 419f, population screening and, 633 Synapse(s), 102, 107–108
420, 422 test results and, 608 cell–cell signaling, 102, 102f
zebrafish model, 671 Stop codons see Termination codons chemical synapses, 107–108, 108f
Spinobulbar muscular atrophy (Kennedy disease), 72 Streptavidin see Biotin–streptavidin system electrical synapses, 108
Spindle poles, 39–40, 39b, 39f Stroke, Synapsis, 34
Spino-cerebellar ataxia, 512, 514 PCD role, 113 Syncytium (-a),
Splanchnic (visceral) mesoderm, 153, 154f polypill and, 621 muscle cells, 96, 97f
Splanchnopleure, 153, 154f possible cell therapy, 696t syncytial blastoderm, 147
Spleen, as lymphoid organ, 120–121 Stromal cells, 116 syncytiotrophoblast, 150, 151f
Splice isoforms, 373, 373f Subfunctionalization, 316–317, 316f Synonymous (silent) mutation(s), 417
Splice junctions, 16 globin genes, 317–318, 317f Ka/Ks ratios and, 308b
acceptor vs. donor sites, 17, 17f Submetacentric chromosome, definition, 45t neutral variants and, 493
consensus sequences, 17, 17f, 419 Substitution, see Amino acid substitution; Nucleotide pathogenic, 419f, 420, 422
cryptic, 420, 513 substitution PTT and, 576
different efficiencies, 373 Subtraction cloning, dystrophin gene, 520 Synpolydactyly 1, 423
mutation effects, 419–420, 419f Sulfa drugs, genetic variations in metabolism, 614 Synteny, human-mouse conservation, 320, 320f
in silico prediction, 578 Sulfatase enzymes, Synthetic biology, 300
Spliceosome(s), 17, 18f deficiency diseases, 510, 522–523 Synthetic organisms/genomes, 300–301
alternative splicing and, 373 formylglycine residues, 522, 523f SYPRO® dyes, 250
major vs. minor, 17 35Sulfur (35S), 200, 200t Systemic lupus erythematosus (SLE), association
snRNAs, 17, 18f, 281, 281f, 281t Supercoiling, 36–37 studies, 489t
Splicing, RNA, 16–17, 16f, 276b Supravalvular aortic stenosis, 430t Systems biology, 389
alternative see Alternative splicing Surgery, genetic variation and, cancer genetics and, 566, 566f
branch site, 17, 17f malignant hyperthermia, 616
enhancers, 17, 373, 379, 419–420, 422 suxamethonium sensitivity and, 613
immunoglobulin class switching, 130f, 131 Surname, Y haplotypes and, 599–600
T
intron removal, 16 Survival factors, 112, 113 T-acute lymphoblastic leukemia, 711
‘memory,’ 418 Susceptibility genes, 63, 438 TAFs, 348, 348t
mechanism/steps, 17, 17f, 372, 372f clinical use see Susceptibility testing Tag-SNPs, 484, 484f, 494, 516, 622
nuclear location, 361 environmental factor interactions, 533, 623 Takifugu rubripes, as model organisms, 302
pathogenic mutations and, 419–420, 419f hidden heritability and, 534 Tamoxifen-inducible gene expression, 647, 647f
CFTR gene and, 419f, 422, 423f, 513 identification, 515–519, 622 Tandem affinity purification (TAP) tagging, 396t, 398,
cryptic splice sites and, 420, 513 approaches, 498, 499f, 535 398f
enhancers/suppressors and, 419–420, 422 association studies see Association studies Tandem duplication,
nonsense-mediated decay and, 418 clinically relevant findings, 532–534 exons, 314–315, 314f
prediction of effects, 420 difficulty, 532 genes, 264, 264f
splice junction changes, 419, 419f linkage studies see Linkage analysis, Tandem mass spectrometry (MS/MS), 251b, 252
thalassemias, 421 nonparametric Tandem repeats,
in silico prediction, 578 mapping see Genetic maps/mapping, non- constitutive heterochromatin see Constitutive
splice junction recognition, 16–17, 17f, 372 Mendelian disorders heterochromatin
spliceosomes and snRNPs see Spliceosome(s) see also Susceptibility testing individual genetic variation and, 408–409, 438
suppressors, 373, 379, 419–420, 422 lifestyle factors vs., 626, 626f copy-number variation see Copy-number
trans-splicing, 276b locus heterogeneity and, 492 variation (CNV)
Splicing enhancers, 17, 373, 379 pathogenic vs. neutral variations, 493 microsatellites see Microsatellite DNA
mutation and, 419–420, 422 penetrance, 492 minisatellites see Minisatellite DNA
Splicing suppressors, 17, 373, 379 population attributable risk and, 627 pathogenic repeats, 422–425
mutation and, 419–420, 422 relative risk see Relative risk short tandem repeats see Microsatellite DNA
STAGE technique, 355b testing see Susceptibility testing see also Repeat expansion diseases
Starch, amylase evolution and, 316 theoretical models lack of evolutionary constraint, 290
Stem cells, 114–119 common disease–common variant, 491, 493, methylation, 357
adult, 115 493t, 494, 516 nomenclature/terminology, 407b
cancer and, 115, 538, 539 mutation–selection hypothesis, 492–493, recombination and, 443b
cell division, 117 493t, 494 types/classes, 289–290, 289t, 408
asymmetric, 117, 118b, 118f, 138 synthesis of, 493, 493t uses/applications
see also Asymmetric cell division weakness of individual factors, 475, 486–487, 492, forensic analysis and, 408–409
polarity and cell fate, 138, 139f 516, 532, 622–623 pedigree analysis and, 71–72, 71f, 408–409
symmetric, 117, 118f see also specific disorders see also DNA markers
cord blood, 688, 697b Susceptibility testing, 532, 535, 622–628, 636 see also specific types
differentiation potential, 115, 115t analytical validity, 624 TAP tagging, 396t, 398, 398f
differentiation vs. renewal, 117, 118b, 118f clinically relevant findings, 532–534, 622–623, TaqMan® double-dye probes, qPCR, 243f
embryonic, see Embryonic stem (ES) cells 623t Taq polymerase, 183
functional role (tissue replenishment), 115–117 clinical factors vs., 625 Tarceva® (erlotinib), as genotype-specific drug, 620
hematopoietic, 116, 116f clinical validity see below Target DNA, for cloning, 166-168, 167f, 168f
mesenchymal, 116–117 clinical validity and, 608, 609b, 609t, 624–628 Target (nucleic acid hybridization), see Nucleic acid
neural, and cell fate, 139f breast cancer associations, 625–626, 626f hybridization
niches, 117, 117f lack of evidence for, 627–628 TATA-binding protein (TBP), 348, 348t
pluripotent embryonic cells, 117–119 monogenic disorders vs., 628t TATA box, 14, 15f, 347, 348
778 Index
mitochondrial DNA strands, 258, 258f Translation, 2, 20–27 Trisomy, 50, 51t
nuclear, making unrelated proteins, 268 gene expression regulation and, 375–378, 379 gene dosage effects, 425
nuclear, making proteins and ncRNA, 276b, 369f environmental influences, 375–376, 376f partial, generated by translocations, 56f, 57f
Transcriptome, 239, 245 small noncoding RNAs and, 376–378 prenatal testing, 587–588, 588f, 598, 602, 608
analysis see Transcriptomics spatial control, 375 biomarkers, 630, 630t
databases, 385t temporal control, 375–376, 376f population screening, 629–630, 630f, 630t
see also Messenger RNA (mRNA) see also MicroRNA (miRNA); Short interfering rescue and uniparental disomy, 57–58
Transcriptomics, 239, 239t, 384 RNA (siRNA) risk factors, 587
DNA sequencing, 220, 248–249, 384 human b-globin gene, 20f see also specific disorders
microarrays see Microarray(s) initiation, 20, 21–22, 21f Trisomy 13 (Patau syndrome), 52, 52t
proteomics vs., 249 alternative start sites, 373 prenatal testing, 587–588
see also Gene expression analysis mRNA code see Genetic code Trisomy 18 (Edwards syndrome), 52, 52t
Transdifferentiation, 690bb, 691f ribosomes see Ribosome(s) prenatal testing, 587–588
Transduction, 699 tRNA see Transfer RNA (tRNA) Trisomy 21 (Down syndrome), 50, 52, 52t
Transfection, 699 Translesion synthesis, 11, 11b, 415 gene dosage, 425
Transferrin receptor, iron-induced changes in Translocation(s), see Chromosome translocations maternal age and risk, 629–630, 630f, 630t
expression, 376, 376f Transmission disequilibrium test (TDT), 485–486, 486b modeling in mice, 667, 667f
Transfer RNA (tRNA), 275, 278t advantages, 486 segmental trisomy Mmu16 models, 667,
amino acid recognition, 20, 280b case-control designs vs., 487–488 667f
see also Aminoacyl tRNA Crohn disease studies, 528, 530 transchromosomic Hsa21 model, 667, 668f
anticodon, 9f extended, 486 prenatal testing, 587–588, 588f, 608
classes of, 280b, 280f internal control, 485 population screening and, 629–630
cloverleaf structure, 9f linkage analysis (affected sib pairs) vs., 486–487, Trithorax proteins, epigenetics, 365
G-U base pairing in stems, 9f 487b, 487t 3
Tritium ( H), 200t
hairpins in, 9f type 1 diabetes, 486b, 486f tRNA see Transfer RNA (tRNA)
codon–anticodon recognition, 21, 21f, 280b, 280f Transplantation, Trophectoderm, polarity and axis specification, 140b
base wobble and, 23, 23t, 259, 280b, 280f immune response and rejection, 120 Trophoblast, 136, 149, 150
see also Genetic code MHC and, 126b cytotrophoblast, 137f, 150, 151f, 152f
evolution, 275b Transposable elements see Transposons origins of, 51f
genes, 278t, 279 Transposons, 290–291, 408 syncytiotrophoblast, 137, 150, 151f, 152f
amino acid frequency and, 279 Alu repeats, 187, 272b, 292–293 see also Extraembryonic structures
chromosomal locations, 279 autonomous vs. nonautonomous, 290, 291f Tropism/tropic (of viruses), 700, 701t, 703
internal promoters, 15, 16f disease role, 336 Trunk neural crest, 156b, 156f, 156t
mtDNA, 258, 258f, 259, 279 DNA transposons, see under DNA transposons Tryptophan, structure, 5f
see also RNA genes exaptation of, 336-337, 337f TSGs see Tumor suppressor genes
processing, 20 exons, source of novel, TSH receptor, gain-of-function mutations, 433
diverse modified nucleotides, 9f families/classes, 290, 291f Tuberous sclerosis, locus heterogeneity, 462
synthesis by RNA pol III, 15–16, 16f functional importance and purifying selection, b-Tubulin, immunocytochemistry, 244f
translation initiation, 22 336 Tumor(s),
Transformation gene duplication events and, 316 children vs. adults, 546
cell-based DNA cloning, 165, 167f, 171f gene expression, effect on neighboring gene, cytogenetics see Cancer cytogenetics
circular vs. linear DNA, 171 336, 336f DNA sequencing, 559–560
co-transformation, 171 genome evolution roles, 335–337, 336f formation (tumorigenesis) see Carcinogenesis
key DNA fractionation step, 171–172 exaptation, 336 genetics of see Cancer genetics
large insert size and, 174, 175–176 novel exons, 336, 336f, 337f as organs, 538
number of recombinant molecules/cell, 171 novel regulatory elements335–337, 336f, 337f specific capabilities/characteristics, 564
large-insert size and, 174, 175–176 exon shuffling, mediation of, 315, 315f angiogenesis, 565
see also specific methods. neogene formation, 336 metastases, 565
Transforming growth factor b (TGFb), mutagenesis using, see Transposon mutagenesis types/classification, 538, 538f, 560, 561, 562f,
colorectal cancer role, 564 occurrence/DNA content, 336 620–621
neural pattern formation and, 157, 157f ‘parasites,’ 336 see also Cancer(s); specific malignancies
Transgene(s), P element (D. melanogaster), 302b Tumor cells see Cancer cell(s)
defective, in gene trap mutagenesis, 658, 659f regulation Tumor infiltrating lymphocytes (TILs), 713t
definition, 643 by endo-siRNA, 277f, 279t Tumor necrosis factor (TNF) superfamily, death
expression, by piRNAs, 285-286, 287f receptors, 113
candidate gene testing and, 515 retrotransposons see Retrotransposons/ Tumor suppressor genes, 540, 546–549, 567
in D. melanogaster, 302b retrotransposition cancer and inactivation
inducible, 647, 647f transposition mechanism biallelic (two-hit) inactivation, 540, 546,
leaky, 646 copy-and-paste, 315, 315f, 659 546f, 547
silencing, 645, 707 cut-and-paste, 659 cell cycle dysregulation and, 551–552, 551f,
by position effects, 648 see also specific sequences/elements 552f
due to transgene structure, 648 Transposon mutagenesis, 658-660, 659f, 660f FAP/APC and, 548
integration, piggyBac-mediated, 660 haploinsufficiency as growth advantage, 548
tandem repetition of DNA, 644, 644f Sleeping beauty-mediated, 659, 660f inactivation by methylation, 358, 548, 549,
post-zygotic, causing mosaicism, 644 Trans-splicing, 276b 549f
knocked-in, 651, 651f, 652f Trastuzumab (Herceptin®), genotype-specific drug Knudson’s two-hit hypothesis and, 546–547,
promoterless, 659f license, 619 546f, 567
YAC, 648 TRF1, TRF2, components of telosomes, 41, 42f one-hit inactivation in familial cancers, 546
Transgenesis methods, 643-648, 643f, 644f, 646f, 647f Trichorhinophalangeal syndrome, retinoblastoma and, 546–547, 546f, 547f
Transgenic birds, making of, 645 type 1, 430t variations on two-hit paradigm, 547–548
chickens producing recombinant proteins in type II (Langer–Giedon syndrome), 427, 427t functional role, 540, 546, 551–552, 551f, 555
eggs, 683 Trichothiodystrophy, DNA repair defect, 415 as ‘gatekeepers,’ 563
Transgenic cows, making of, 645 Trilaminar germ disk, 152, 152f identification
Transgenic Drosophila, making of, 645 Trinucleotide repeat disorders, familial cancers, 547, 548t
Transgenic fish, making of, 644 genetic anticipation and, 74–75, 424, 512 loss of heterozygosity and, 548–549, 548f
Transgenic frogs, making of, 645 genetic testing, 588, 589f miRNAs as, 561
Transgenic livestock, 683 see also Repeat expansion diseases; specific see also specific proteins
goats producing therapeutic proteins in milk, conditions Tunicates, definition, 327b
682f, 683 Triplet code, 2, 20 Turner syndrome, 50
Transgenic plants, uses, 683 see also Genetic code SHOX haploinsufficiency, 426
rice for increasing dietary vitamin A, 683 Triploblasts (bilaterians), definition, 327b X-inactivation and, 66, 426
Transgenic mice, making 643-645, 643f, 644f, within metazoan phylogeny, 330f Twins,
Huntington disease (HD) models, 665-666, 665t, 5’-5’ triphosphate linkage, structure, 18f dizygotic (fraternal) twins, 150b, 150f, 470, 598
666f Triploidy, constitutional human, 50, 51f, 51t monozygotic (identical) twins, 150b, 150f, 469,
Transgenic rats, HD models, 665t lethality of, 53 598
Transit amplifying (TA) cells, 117, 117f origin of, 51f twinning mechanisms, 149, 150b, 150f
780 Index
zygosity testing, 598–599 DNA vaccines, 687 chemoreceptor genes and, 332, 332t
Twin studies, recombinant vaccines, 687 chrodates and, 313
concordance, 469 Vagal neural crest, 156b, 156f, 156t sex determination systems, 320
MZ vs. DZ twins, 470 Validity, whole genome duplications, 318–319, 318f,
Pairwise vs. probandwise, 469–470 analytical see Analytical validity 319f
schizophrenia studies, 470, 470t clinical see Clinical validity see also Genome evolution
Crohn disease, 528 susceptibility gene tests see Susceptibility testing model organisms see Model organisms
limitations, 469–471 Valine, structure, 5f oocytes, 146, 147f, 148
separated MZ twins, 470–471 Van der Waals forces, 6t phylogenies, 329, 330f, 331f
zygosity errors and, 598 Variable expression, 74, 74f polyploidy, 319
zygosity testing, 598–599 Variable number tandem repeats see Tandem repeats symmetry and asymmetry, 137, 139–140, 140f
Two-dimensional polyacrylamide gel electrophoresis Variable (V) regions, immunoglobulin(s), 123f, 124, 128 see also specific groups/species
(2D-PAGE), global protein see also Immunoglobulin genes Vioxx® (rofecoxib), adverse reactions, 619
expression and, 249–250 Variance, partitioning in multifactorial theory, 81 Virus(es),
Two-point mapping see under Linkage analysis Vascular endothelial growth factor (VEGF), antiviral responses, RNAi and, 391
Type I error (false positives), association studies, 479, angiogenesis, 565 of bacterial cells see Phage
485 VDJ recombination, 128-129, 129f cytotoxic T cells and, 127
Typhoid, cystic fibrosis gene and heterozygote cell-specific events, 129f genomes
advantage, 88–89 DNA polymerases involved, 11b different classes of, 12, 13f
Tyrosine kinase, BRAF, 543 Vectors, for cloning DNA, 168–169 DNA, 7, 11, 12t, 13f, 701t, 702–703, 703f
Tyrosine, structure, 5f cyclization, 168 RNA, 8, 11–12, 12t, 13f, 701, 701t, 702f
Tyrosinemia, treatment, 678 definition, 168 segmented vs. multipartite, 13f
DNA insertion, 165, 166–168, 167f, 169f immunoglobulins and, 124, 125f
expression vectors, 178, 178f infection strategies, 12
U affinity tags and, 179, 179f innate immunity and, 122
U1, U2, U4, U5, and U6 snRNAs, 281, 281t, 282 baculovirus, 181 oncogenes, 540–541, 540f, 541t
components of major (GU-AG) spliceosome, 17 phage display and, 180, 180f see also Oncogene(s)
U11, U12, U4atac, U5, and U6atac snRNAs, 281, 281t, riboprobes and, 198 oncolytic, 714
282 SV40 origins and COS expression systems, vectors for gene therapy, 700–703, 701t, 702f,
components of minor (AU-AC) spliceosome, 17 181 703f
U3 snoRNA, 281 see also Expression cloning replication-competent, for oncolytic
U7 snRNA, 282 f1-based, 180f function, 701
U8 snoRNA, 281 genetic modifications and construction, 169–171 replication-defective, for safety, 7
Ubiquitin, co-expression with ribosomal proteins, 268 cloning sites, 169–170 see also specific viruses/types
Ubiquitin ligase, 369, 550 marker/reporter genes, 170–171, 170f, 172, Visceral endoderm, 151
UCEs (ultraconserved elements), 312 176 Visceral (splanchnic) mesoderm, 153, 154f
UGT1A1 glucuronosyltransferase, 614 polylinker sequences, 170 VISTA enhancer browser, 313t, 313f
Ulcerative colitis, 528 integration and, 169 VISTA genome browser, 312, 313t, 313f
Ultracentrifugation, satellite DNA isolation, 164 insert size preferences, 173t Vitamin K, blood clotting and warfarin effects, 617,
Ultraconserved elements (UCEs), 312, 336, 337f large insert vectors, 173–176, 173t 617f
Ultraviolet (UV) radiation, BACs, 131, 174–175 Vitelline envelope, 146, 147f
DNA damage, 411, 413f cosmids, 174 VNTRs see Tandem repeats
DNA repair defects and, 416 fosmids, 175 Vogelgram, 563, 563f
Umbilical cord blood stem cells, 688, 697b λ-based vectors, 173–174 Vomeronasal receptors, vertebrate differences, 332,
mesenchymal stem cells, 116 low copy-number replicons, 174–175 332t
treating blood cancers, 695 PACs, 175 von Hippel–Lindau syndrome, 565
Unclassified variants, in genetic testing, 578 problems, 227b, 231 Von Recklinghausen disease see Neurofibromatosis,
Unconjugated estriol, trisomy 21 and, 630 YACs, 175–176, 176f, 227b, 230, 231 type 1
Unequal crossover, M13-based, 176, 177f, 180f V regions see Variable regions, immunoglobulin(s)
exon duplication, 314 phage-based, 169–171, 173t
gene duplication, 264, 264f large insert size and see above
Unicellular organisms, phage display and, 180, 180f
W
classification, 92–93, 92f plasmid-based, 169–171, 173t Waardenburg syndrome,
C-value paradox and, 96, 96t inducible, 178, 178f testing, clinical validity and, 608
evolution, 92 marker genes, 170–171, 170f type 1, PAX3 mutations, 429, 430f, 433
as model organisms, 298–301 size limits, 173 type 2, MITF mutations, 511
see also Archaea; Bacteria; Yeast restriction sites, 165, 167 type IV, SOX10 mutations, 511, 511f
Uniparental diploidy, 57 multiple unique sites, 170 WAGR, 427t
Uniparental disomy (UPD), 57–58 see also Restriction endonucleases Warfarin,
pathogenicity and, 367 single-stranded DNA production, 176–177 genetic variation and metabolism, 617–618, 617f
Unipotent cells, 115, 115t, 137 M13 vectors, 176, 177f mechanism of action, 617, 617f
Unrooted phylogenetic tree, 328, 328f phagemid vectors, 177, 177f narrow therapeutic window, 617
Unstable repeats see Repeat expansion diseases size, 173 Washington University, 233
Untranslated region (mRNA) see under Messenger RNA Vectors, individual Watson–Crick base pairing rules, 7, 7f
(mRNA) pBluescript, 177, 177f W chromosome, 321
Uracil, 3, 3f, 4t pcDNA3.1/myc-His, 183f WD repeat, 270, 271f
product of deamination, 412, 412f, 413f pET-3 series, 178, 178f Wellcome Trust Case-Control Consortium (WTCCC),
RNA base pairing and, 8, 9f pGEX-4T series, 178f 487–488, 516
RNA editing pSP64, 199f replicated associations, 489–490, 490f, 622–623,
conversion from cytosine, 374, 375f pUC19, 170, 170f 623t
conversion to cytosine, 374 Vectors, for gene therapy see Gene therapy, vectors for lack of action following discovery, 635
structure of, 3f VEGF, angiogenesis, 565, 686, 686t size of relative risk, 490
Uridine, 4t in macular degeneration therapy, see also specific disorders
Urochordates, 313, 314f apatamer target, 686 Werdnig–Hoffmann disease (SMA; spinal muscular
definition, 327b mAb target, 686t atrophy), 419f, 420, 422
Usher syndrome, locus heterogeneity, 72 Velocardiofacial syndrome (VCFS), 427–428 Western blotting (immunoblotting), 242, 244f
Hardy–Weinberg distribution and, 87 Ventralizing factors, 157, 157f Whole genome amplification,
Uterus, 149f Vertebrate(s), isothermal amplification (phi29 DNA
UV see Ultraviolet (UV) radiation development, 133–162 polymerase), 186b, 489
mammalian see under Mammal(s) PCR-based methods, 186b
models, 301–303 Whole genome duplications (WGD), 264, 318–319,
V see also Development; specific stages 318f, 319f, 341–342
Vaccinia virus, in gene therapy, 701t, 703 disease models, 641t G-value paradox and, 329
Vaccines, genetic engineering of, 686-687 DNA methylation, 360 Whole genome PCR, 186b, 187–188
cancer vaccines, 687 evolution Whole-mount in situ hybridization, 241, 241f
Index 781
Williams–Beuren syndrome, 427, 427t, 428 X-linked inheritance, 64 large fragments and instability, 227b, 231
Wnt signaling pathway, evolutionary conservation, contiguous gene syndromes, 504b YAC cloning/libraries, 175-176, 175f, 216f, 227b,
334, 334f dominant, 66b 227f, 230t, 232b,
Wobble see Base wobble male lethality and identification of, 76, 76f YAC transgenics, 648, 661-662, 666, 666f
Wolf–Hirschhorn syndrome, 427t, 428 pedigree, 65f Yeast two-hybrid system, 396–397, 396t, 397f
Worried well, 631 genotype–phenotype correlations, 436, 436t Y-linked inheritance, 66b, 67
Hardy–Weinberg distribution and, 85 hypothetical pedigree, 65f
male hemizygosity and, 64 Yolk mass, 146, 147, 149b
X recessive, 66b Yolk sac, 149b
X chromosome(s), 321, 321t estimated mutation rates, 88 Y RNA, 272b, 282
amelogenin, 597 manifesting heterozygotes, 67
aneuploidies new mutations and, 76, 77f, 88
gene dosage and, 426 pedigree, 65f
Z
X-inactivation and, 66, 426 X-inactivation and see X-inactivation Z chromosome(s), 321, 321t, 323
see also specific disorders see also specific traits/conditions Z-DNA, 7
deletions, 426–427, 427f X-linked mental retardation (XLMR), 532 Zebrafish (Danio rerio),
DNA content, 261t XY body, 273 dedicated database (ZFIN), 394t
evolution and, 342 X-Y pairing in male meiosis, 35, 273 disease models, examples, 671t
autosomal ancestor, 323, 323f evolution and, 322–323 embryos in drug screening, 680b
common X–Y ancestor, 321, 322f gene pairs (gametologs), 321, 322 gene inactivation using zinc finger nucleases, 671
evolutionary conservation, 323 obligate crossover, 322 gene kockdown using morpholinos, 656–657,
gametologs, 321, 322f retrogenes and, 273 657f
gene synteny, 320, 320f see also Pseudoautosomal regions genetic screening in, 394t
inactivation, 326 X-Y sex determination system, 320–326, 321t genome sequencing, 236t
pseudoautosomal regions, 321, 322–323, model organism, 302, 304b, 394t
322f model of development, 135, 135t, 136
Y chromosome divergence from, 323–324 Y phylogenetic relationships, 331f
genes, 65, 326 YACs see Yeast artificial chromosomes (YACs) Zero-mode waveguides (ZWMs), 222, 223f
dosage effects, 426 Yamanaka, Shinya, 690b, 693 Zinc finger motif, 104b, 104f, 350
escaping inactivation (active genes), 367, Y chromosome, 321, 321t modular combinations, 655, 655f
367f, 426 amelogenin, 597 trinucleotide sequence specificity, 655, 655f
inactivation see X-inactivation DNA content, 261t, 324, 324f Zinc finger nucleases,
inheritance patterns see X-linked inheritance protein-coding see genes (below) construction of, 654–655
sex determination and, 158 pseudoautosomal regions see heterodimeric, 655, 655f
synteny conservation, 320 Pseudoautosomal regions structure, 655f
X–autosome translocations, 505–506, 506f, 520 DNA profiling using, 597–598 targeted gene inactivation,
X–Y pairing and meiosis, 35, 321 surname associations, 599–600 in animal models, 654–656, 655f
evolution and, 322–323 evolution and, 342 therapeutic possibilities, 709
retrogenes and, 273 autosomal ancestor, 323, 323f targeted gene repair, 708, 709f
Y homologous regions see Pseudoautosomal common X–Y ancestor, 321, 322f Zona pellucida, 146, 147, 147f
regions deletions and potential extinction, 324 Zone of polarizing activity (ZPA), 143, 143f
Xenopus laevis, divergence from X chromosome, 323–324 Zooblots, 204
induction experiments, 137–138, 138f gametologs, 321, 322f cystic fibrosis gene, 522
model of development, 135t, 136, 302–303, 304b gene conversion, 324–326, 325f dystrophin gene, 520
neural pattern formation and, 155 pseudoautosomal regions, 321, 322–323, Z score see Lod score
ploidy, 304b 322f Zygosity, errors in, 598
Xenopus spp., as model organisms, 302–303, 304b selection and, 323–324 Zygosity testing, 598–599
phylogenetic relationships, 331f gene conversion within, 324-326, 324f Zygote, 30
transgenesis, 645 genes, 65, 67, 324–326 animal–vegetal axis, 140b, 141f
see also individual species ampliconic sequences, 324f, 325–326, 325t cleavage, 147–148, 148f
Xenopus tropicalis, model organism, 304b depletion and X-inactivation, 326 DNA methylation, 360
Xeroderma pigmentosum, DNA repair defect, 415, gametologs, 321, 322f early genome activation in mammals, 148
416, 557 pseudogenes, 324 formation (at fertilization), 147
X-inactivation, mammalian, 59, 65–66, 365–367, 366f retrogenes, 273 totipotency, 114, 136
Barr body (sex chromatin), 66, 66f testes-specific, 324, 325t Zygotene, 34, 34f
copy number modulation, 326 X-chromosome homology, 326
counting mechanism, 362 X-degenerate sequences, 324f, 325, 325t
genes escaping (active genes), 367, 367f X-transposed sequences, 324, 324f, 325t
gene silencing, 66 heterochromatin, 161t, 261, 290
antisense Xist silencing, 367 inactivation, 326
DNA methylation and, 358, 367 inheritance and see Y-linked inheritance
facultative heterochromatin and, 38 MSY, male-specific non-recombining region, 67,
noncoding RNA role, 287 321, 324f, 325t
translocations and, 506 protein-coding genes within MSY, 325t
initiation/propagation, 365–366 see also genes (above) and Pseudoautosomal
mosaicism and, 66–67, 366, 366f regions
noncoding RNA and, 275 mutations, 324
nuclear location effects, 362 palindromes, 324f, 325
pathogenic gene dosage and, 426 possibility of extinction, 324
phenotype skewing, 67 sequencing, 290
pseudoautosomal regions and, 326, 367 sex determination and, 158
random nature of inactivation, 66, 367 structure, 324, 324f
sex chromosome aneuploidies and, 53, 66, 426 Y gene inactivation, limited, 326
species differences, 326 Yeast,
X–autosome translocations and positional centromere elements, 40, 40f, 41f, 175
mapping, 506, 506f chromatin remodeling factors, 356
XIST, 287, 288, 288t, 366–367 as model organisms, 299b, 300
X-inactivation center (XIC), 366 protein interactome, 401f
XIST RNA, 287, 288, 288t, 366–367 replication origins
regulation by TSIX RNA, 288t ARS elements, 40–41, 175, 175f
X-linked agammaglobulinemia, 514 assay, 40
X-linked Bruton agammaglobulinemia, mosaicism due whole genome duplication events, 319
to X-inactivation, 67 see also individual species
X-linked chronic granulomatous disease, positional Yeast artificial chromosomes (YACs), 173t, 175–176,
cloning, 499 175f
Quick reference
Human cells and tissues
A classification of adult human cells Table 4.1, page 95
A classification of immune system cells
lymphocyte classes and sublcasses Table 4.7, page120
non-lymphocyte cells Table 4.8, page 121
Extraembryonic membranes Box 5.2, page 149
Tissue origins from the three germ layers Figure 5.17, page 154
Human nomenclature
Chromosome groups Table 2.3, page 45
Chromosome nomenclature Box 2.2, page 45
Chromosome abnormalities Table 2.4, page 51
Genes and DNA sequences Box 3.1, page 62
DNA and amino acid variants Box 13.2, page 407
Pedigree symbols Figure 3.2, page 64
Model organisms
Animal models of development Table 5.1, page 135
Animal models of disease Table 20.1, page 641
Model eukaryotes for studying gene function Table 12.4, page 394
Unicellular models Box 10.1, page 299
Invertebrate models Box 10.2, page 302
Table 10.1, page 303
Vertebrate models Box 10.3, page 304
Mammalian models Table 10.2, page 305
Model organism databases, Table 12.4, page 394
Phylogenies
Eukaryotic phylogeny Figure 10.25A, page 330
Metazoan phylogeny Figure 10.25B, page 330
Vertebrate phylogeny Figure 10.26, page 331
Glossary of metazoan phylogenetic groups/terms Box 10.6, page 327
The Genetic code (see figure 1.25 for variant mitochondrial codes)
First Base
Second Base U C A G
UUC 冧 Phe
冧 冧 冧
U UUU CUU AUU GUU
CUC AUC lle GUC
Leu Val
UUG 冧
UUA CUA AUA GUA
Leu
CUG AUG Met GUG
冧 冧 冧
C UCU CCU ACU GCU
UCC
UCA
UCG
Ser
CCC
CCA
CCG
Pro
ACC
ACA
ACG
Thr
GCC
GCA
GCG
冧 Ala
450 46
UAC 冧 CAC 冧
A UAU CAU AAU GAU
Tyr His AAC 冧 Asn
GAC 冧 Asp
UAA STOP
UAG STOP CAG 冧
CAA Gln AAA
AAG 冧 Lys GAA
GAG 冧 Glu
UGC 冧 Cys
冧 冧
冧
G UGU CGU AGU GGU
Ser
CGC AGC GGC
Arg Gly
UGA STOP
UGG Trp
CGA
CGG
AGA
AGG 冧 Arg GGA
GGG
ATTCACACTTGAGTAACTC
4 EDITION
TH
450 460
46 0 470
470 480 490
49 0 500
500 510 520
450 460
0 470
4 480 490
490 500
5 00 510 520
www.garlandscience.com/
I S B N 978-0-8153-4149-9
9 780815 341499
TOM STRACHAN AND ANDREW READ
HMG4_FINAL.indd 1 11/03/2010 11:18